Dask - extending Python data tools for parallel and distributed computing

FOSDEM VZW

Bossche, Joris Van den

Formal Metadata

Title

Dask - extending Python data tools for parallel and distributed computing

Title of Series

FOSDEM 2017

Number of Parts

611

Author

Bossche, Joris Van den

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/41958 (DOI)

Publisher

FOSDEM VZW

Release Date

2018

Language

English

Production Year

2017

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

The growing Python data science ecosystem, including the foundational packages NumPy and Pandas, provides powerful tools for data analysis that are widely used in a variety of applications. Typically, these libraries were designed for data that fits in memory and for computations that run on a single core. Dask is a Python library for parallel and distributed computing, using blocked algorithms and task scheduling. By leveraging the existing Python data ecosystem, Dask makes it possible to compute on arrays and dataframes that are larger than memory, while exploiting parallelism or distributed computing power, all through a familiar interface (mirroring NumPy arrays and Pandas dataframes).
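
The idea of a blocked algorithm combined with task scheduling, as mentioned in the abstract, can be illustrated with a small standard-library sketch. This is not Dask's API or implementation. The helper names (iter_blocks, block_sum_and_count, blocked_mean) are hypothetical, and a range of integers stands in for a dataset too large to hold in memory. Each block is reduced by its own task, and the small partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_blocks(n_items, block_size):
    """Yield (start, stop) index pairs that cover range(n_items) in blocks."""
    for start in range(0, n_items, block_size):
        yield start, min(start + block_size, n_items)


def block_sum_and_count(start, stop):
    """Task for one block: touch only this block and reduce it to two numbers."""
    block = range(start, stop)  # assumption: stands in for loading one chunk
    return sum(block), len(block)


def blocked_mean(n_items, block_size, max_workers=4):
    """Mean of 0..n_items-1, computed block by block and then combined."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Schedule one independent task per block.
        futures = [
            pool.submit(block_sum_and_count, start, stop)
            for start, stop in iter_blocks(n_items, block_size)
        ]
        partials = [future.result() for future in futures]
    # Combine the per-block partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    print(blocked_mean(1_000_000, block_size=100_000))
```

Only one block's worth of data is needed per task, which is what allows this style of computation to scale past available memory. The thread pool here only shows the shape of the scheduling; for pure-Python work it would not be expected to give a real speedup.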