The Painless Route in Python to Fast and Scalable Machine Learning

Formal Metadata

Title
The Painless Route in Python to Fast and Scalable Machine Learning
Title of Series
Number of Parts
130
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Python is the lingua franca of data analytics and machine learning, and its productivity makes it the preferred tool for prototyping. However, traditional Python packages are not necessarily designed to provide high performance and scalability for large datasets. This talk shows how to get close-to-native performance with Intel-optimized packages such as numpy, scipy, and scikit-learn. The second part of the talk focuses on scaling performance from multiple cores on a single machine to large clusters of workstations. It demonstrates that Python can achieve the same performance and scalability as hand-tuned C++/MPI code:
- Scalable Dataframe Compiler (SDC) makes it possible to efficiently load and process huge datasets using pandas/Python.
- daal4py offers a convenient Python API to data analytics and machine learning primitives. While its interface is scikit-learn-like, its MPI-based engine allows machine learning algorithms to scale to bare-metal cluster performance.
The talk also shows how to use SDC and daal4py together to build an end-to-end analytics pipeline that scales to clusters while requiring only minimal code changes.