Dask - extending Python data tools for parallel and distributed computing

Formal Metadata

Title
Dask - extending Python data tools for parallel and distributed computing
Title of Series
Number of Parts
611
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2017

Content Metadata

Subject Area
Genre
Abstract
The growing Python data science ecosystem, including the foundational packages NumPy and Pandas, provides powerful tools for data analysis that are widely used in a variety of applications. Typically, these libraries were designed for data that fits in memory and for computations that run on a single core. Dask is a Python library for parallel and distributed computing, using blocked algorithms and task scheduling. By leveraging the existing Python data ecosystem, Dask makes it possible to compute on arrays and dataframes that are larger than memory, exploiting parallelism or distributed computing power through a familiar interface (mirroring NumPy arrays and Pandas dataframes).
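
The idea of a blocked algorithm combined with task scheduling can be illustrated with a minimal sketch that uses only the Python standard library. This is not Dask's implementation or API: the helper names (iter_blocks, block_sum_and_count, blocked_mean) are hypothetical, and a thread pool stands in for a real task scheduler. The sketch computes a mean by reducing each block independently and then combining the small per-block results, so the full dataset never has to be in memory at once.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_blocks(n_items, block_size):
    """Yield (start, stop) index pairs that cover range(n_items) in blocks."""
    for start in range(0, n_items, block_size):
        yield start, min(start + block_size, n_items)


def block_sum_and_count(start, stop):
    """One task: reduce a single block to a (sum, count) pair.

    The block is generated on demand here, standing in for reading one
    chunk from disk (an assumption made for illustration).
    """
    total = 0
    for value in range(start, stop):
        total += value
    return total, stop - start


def blocked_mean(n_items, block_size, max_workers=4):
    """Mean of 0..n_items-1, computed as independent per-block tasks."""
    blocks = list(iter_blocks(n_items, block_size))
    # Each block is an independent task; the pool decides when each one runs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda b: block_sum_and_count(*b), blocks))
    # Combine step: only the small (sum, count) pairs are held together.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


result = blocked_mean(n_items=1_000_000, block_size=100_000)
print(result)
```

Threads are used here only to keep the sketch short. For pure-Python loops they do not give real CPU parallelism in CPython, so the point is the structure (split into blocks, reduce each block as a task, combine the partial results) rather than the speed.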