Parallel computing in Python

EuroPython

Glaser, Pierre

Formal metadata

Title

Parallel computing in Python

Subtitle

Current state and recent advances

Series title

EuroPython 2019

Number of parts

118

Author

Glaser, Pierre

License

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify, copy, distribute and make publicly available the work or its content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or content, including in modified form, only under the terms of this license.

Identifiers

10.5446/44807 (DOI)

Publisher

EuroPython

Year of publication

2019

Language

English

Content metadata

Subject area

Computer science

Genre

Conference/Talk

Abstract

Modern hardware is multi-core, so it is crucial for Python to provide high-performance parallelism. This talk will present to both data scientists and library developers the current state of affairs and the recent advances in parallel computing with Python. The goal is to help practitioners and developers make better decisions on this matter.

I will first cover how Python can interface with parallelism, from leveraging the external parallelism of C extensions (especially the BLAS family) to Python's multiprocessing and multithreading APIs. I will touch upon use cases, e.g. single vs. multi-machine, as well as the pros and cons of the various solutions for each use case. Most of these considerations will be backed by benchmarks from the scikit-learn machine learning library.

From these low-level interfaces emerged higher-level parallel processing libraries, such as concurrent.futures, joblib and loky (used by dask and scikit-learn). These libraries make it easy for Python programmers to use safe and reliable parallelism in their code. They can even work in more exotic situations, such as interactive sessions, in which Python's native multiprocessing support tends to fail. I will describe their purpose as well as the canonical use cases they address.

The last part of this talk will focus on the most recent advances in the Python standard library, addressing one of the principal performance bottlenecks of multi-core/multi-machine processing, which is data communication. We will present a new API for shared-memory management between different Python processes, and performance improvements for the serialization of large Python objects (PEP 574, pickle extensions). These performance improvements will be leveraged by distributed data science frameworks such as dask, ray and pyspark.
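To illustrate the serialization improvement mentioned in the abstract, here is a minimal sketch of out-of-band pickling as introduced by PEP 574, using only the standard library `pickle` module (Python 3.8 or later). It is not taken from the talk: the `payload` variable and its 1 MB size are hypothetical stand-ins for a large object such as an array's data buffer.

```python
import pickle

# Hypothetical 1 MB payload standing in for a large data buffer.
payload = bytearray(b"x" * 1_000_000)

# In-band pickling: the payload bytes are copied into the pickle stream.
in_band = pickle.dumps(payload, protocol=5)

# Out-of-band pickling (PEP 574): the buffer is handed to buffer_callback
# instead of being copied into the stream. list.append returns None, which
# tells pickle to keep the buffer out-of-band.
out_of_band_buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(payload),
    protocol=5,
    buffer_callback=out_of_band_buffers.append,
)

# The receiving side must be given the same buffers, in the same order.
restored = pickle.loads(out_of_band, buffers=out_of_band_buffers)

print("in-band stream size:", len(in_band))
print("out-of-band stream size:", len(out_of_band))
print("round trip preserved data:", bytes(restored) == bytes(payload))
```

The out-of-band stream only carries a small marker for the buffer, so a transport layer is free to move the large buffer separately (for example without an extra copy). How frameworks such as dask or ray actually do this is outside the scope of this sketch.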

Keywords

Scientific Libraries (Numpy/Pandas/SciKit/...)