Scikit-learn to "learn them all"
Formal metadata
Part: 49 of 119
License: CC Attribution 3.0 Unported: You may use, adapt, copy, distribute, and publicly transmit the work or its contents, in unchanged or adapted form, for any legal purpose, as long as you credit the author/rights holder in the manner they specify.
Identifiers: 10.5446/20046 (DOI)
Production place: Berlin
Transcript: English (automatically generated)
00:15
So the talk today is about scikit-learn, or, in other words, why I think scikit-learn is so cool.
00:22
First of all, I would like to ask you three questions. Not "what's your favorite color", actually. The first: how many of you already know what machine learning is? Oh, great. Okay, perfect. The second one is: have you ever used scikit-learn?
00:44
Okay, and the third one is: how many of you also attended the great training on scikit-learn yesterday? Okay, it's just two. Brief questions, okay. So what does machine learning actually mean?
01:02
There are many definitions of machine learning. One of them is: machine learning teaches machines how to carry out tasks by themselves. This is a very trivial, very simple definition, and it really is that simple; the complexity comes with the details. It's a very general definition, but just to give you the intuition
01:24
behind machine learning: at a glance, machine learning is about algorithms that are able to analyze, to crunch the data, and in particular to learn from the data. Under the hood, it basically exploits statistical approaches.
01:41
That's why "statistics" is such a huge word in this cloud. Machine learning is closely related to data analysis techniques. There are many buzzwords around machine learning; you may have heard about data analysis, data mining, big data, and
02:01
data science. Data science actually is the study of the generalizable extraction of knowledge from data, and machine learning is related to data science. I tried to depict the combo with this Venn diagram: machine learning is in the middle, data science exploits machine learning, and machine learning is a
02:24
fundamental part of the data science pipeline. But what actually is the relation of data mining, and data analysis in general, with machine learning? Machine learning is about making predictions. So instead of only
02:43
analyzing the data we have, machine learning is also able to generalize from this data. The idea is: we have a bunch of data, and we may want to crunch this data to run statistical analyses on it, and that's it.
03:00
That is also called data mining, for instance. Machine learning is a bit different, because machine learning performs this analysis, but the goal is slightly different: the goal is to analyze this data and generalize, to learn from this data a general model for future data, for data that are
03:23
as yet unseen at this time. So the idea is: a pattern exists in the data. We cannot pin this pattern down manually, but we have data on it, so we may learn it from the data. In other words, this kind of learning is also known as learning by examples.
03:42
Machine learning comes in two different settings. There is the supervised setting. This is the general pipeline of a machine learning algorithm: you have all the data in the upper left corner, and you translate the data into feature vectors.
04:01
This is the common pre-processing step. Then you feed those feature vectors to your machine learning algorithm, and the supervised learning setting supplies also the labels, which are the set of expected results on this data. Then we
04:21
generate the model from feature vectors and labels, and we generalize: we get a model to predict on future data, in the bottom left corner of the figure. A classical example of supervised learning is classification. You have two different groups of data in this case, and you want to find a
04:43
general rule to separate this data. So in this case you find a function that separates the data, and for future data you will be able to tell which is the class. In this case it's a binary classification,
05:00
so you have two classes, and in the future, when you get new data, you will be able to predict which class is associated with that data. Another example is clustering. In this case, the setting is called unsupervised learning. The processing pipeline is this one: you have the same old processing,
05:24
but what you miss is the label part. That's why it is called unsupervised: you have no supervision on the data, no label to predict. As for clustering, the problem is: take a bunch of data and try to cluster it, in other words to
05:42
separate the data into different groups. So you have a bunch of data and you want to identify the groups inside this data. That was just a brief introduction. So what about Python? Python and data science are closely related nowadays. Actually, Python is
06:02
getting more and more packages for computational science. According to this graph, Python is an established technology for this kind of computation; it sits almost in the upper right corner, and
06:21
actually it's replacing and substituting other technologies such as R or Matlab. One of the advantages of Python is that it provides a single programming language across different applications, and it has a very huge set of libraries to exploit. This is
06:46
the reason why Python is nowadays the language of choice for data science, almost the language of choice, and it is displacing R and Matlab. By the way, there will also be a PyData
07:01
conference at the end of the week; it starts on Friday, so if you can, please come. As for data science in Python: Matlab can be easily substituted by technologies such as Python, NumPy, SciPy, and Matplotlib for plotting,
07:21
but there are many other possibilities nowadays, especially for plotting. R can be easily substituted with pandas, which is a great package. In the Python ecosystem we also have
07:42
efficient Python distributions that have been compiled for this kind of computation, such as Anaconda or Enthought Canopy, and we also have projects like Cython. Cython is a very great project that allows you to boost the computation of your Python code. The packages for machine learning in Python are manifold, actually.
08:04
I'll try to describe the set of well-known packages for machine learning code, and I would like to make some considerations on why scikit-learn is a very great one.
08:23
We have Spark machine learning (MLlib), PyML, the Natural Language Toolkit (NLTK), the Shogun machine learning toolbox (this morning there was a talk about it), scikit-learn of course, PyBrain, and MLPy. And there is a guy who
08:41
set up a list of these on GitHub, where everybody can add his or her contribution to the list, in order to spread the knowledge about the packages available in different languages, and Python is very well represented.
09:02
So we have Spark MLlib. Spark MLlib is actually implemented in Scala, not Python; there is a Python wrapper, which is called PySpark, but the machine learning library itself is at a very early stage.
09:21
Shogun is written in C++ and it offers a lot of interfaces, one of which is in Python. The other packages are Python powered, so let's focus on those. The Natural Language Toolkit is implemented in pure Python,
09:43
so no NumPy or SciPy involved, while the other packages are implemented on top of NumPy and SciPy, so their code is rather more efficient for large-scale computations. NLTK supports Python 2, and its Python 3 support is in an alpha stage.
10:01
PyML supports Python 2; whether it supports Python 3 is not so clear. PyBrain supports only Python 2, and these two guys, scikit-learn and MLPy, support both Python 2 and Python 3. What about the purpose of these packages? NLTK is for natural language processing.
10:23
It embeds some algorithms for machine learning, but it is not meant to be used as a complete machine learning environment; it is mostly about text analysis and natural language processing in general.
10:40
PyML is mostly focused on supervised learning, in particular on the SVM technique, which is support vector machines; it doesn't have many algorithms beyond those related to supervised learning. PyBrain is for neural networks, which are another
11:02
set of techniques in the machine learning ecosystem. The other two guys there are somewhat general purpose: scikit-learn and MLPy contain algorithms for supervised and unsupervised learning, plus some other, slightly different machine learning settings.
11:23
So from here on we will not consider PyML and PyBrain anymore, and we end up with these three libraries written in Python for our machine learning code. So why choose scikit-learn?
11:45
Ben Lorica, a big data guy, recommends scikit-learn for six reasons. The first one is the commitment to documentation and usability: scikit-learn has brilliant documentation, and
12:03
it is very, very useful for newcomers and for people without any background in machine learning. The second reason is that the models are chosen and implemented by a dedicated team of experts, and the set of models supported by the library covers most machine learning tasks.
12:25
Python and PyData improve the support for data science tools and data science problems. I don't know if you know Kaggle: Kaggle is a site where you may
12:42
enter data science competitions, and scikit-learn is one of the most used packages for this kind of competition. Another reason is the focus: scikit-learn is a machine learning library, and its goal is to provide a set of common algorithms to Python users through a consistent interface.
13:02
These are two of the features that I like the most; I will be more precise about this in a few slides. And finally, last but by no means least, scikit-learn scales to most data problems. Scalability is another feature that scikit-learn supports out of the box.
13:26
If you want to install scikit-learn, you only have to run a few pip commands: you need to install NumPy, SciPy, and Matplotlib (IPython is actually not needed, it's just for convenience), and then you install scikit-learn.
13:43
NumPy and SciPy in particular are required because scikit-learn is built on top of them. Anyway, if you install a Python distribution such as Anaconda, scikit-learn is already provided out of the box.
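The exact install commands from the slide are not captured in the transcript; they amount to something like this (package names as published on PyPI):

```
pip install numpy scipy matplotlib ipython   # IPython is optional, just for convenience
pip install scikit-learn
```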
14:01
The design philosophy of scikit-learn is one of the greatest features of this package, in my opinion. It includes all the batteries necessary for general-purpose machine learning code: it supports features and
14:21
functionalities for data and datasets, feature selection and feature extraction algorithms, machine learning algorithms in different settings (classification, regression, clustering, and so on), and finally evaluation functions for cross-validation, confusion matrices, and the like. We will see some examples in the next slides. The
14:42
algorithm selection philosophy for this package is: keep the core as light as possible, and include only the well-known and largely used machine learning algorithms. So the focus here is to be as general purpose as possible, in order to reach a broad audience of users.
15:05
At a glance, this is a great picture depicting all the features provided by scikit-learn. This figure has been taken from the documentation; it is a sort of map you may follow
15:24
that allows you to choose the particular machine learning technique you want to use in your machine learning code. There are some clusters in this picture: there is regression over there, classification, clustering, and dimensionality reduction, and you may follow the
15:43
paths over there to decide which setting is most suited for your problem. The API of scikit-learn is very intuitive and mostly consistent across every machine learning technique.
16:01
There are four different objects: the estimator, the predictor, the transformer, and the model. These interfaces are implemented by almost all the machine learning algorithms included in the library. For instance, let's make an example. The API for the estimator is
16:25
the method fit: an estimator is an object that fits a model based on some training data and is capable of inferring some properties of new data. For example, take the algorithm called KNN, or KNeighborsClassifier:
16:45
the KNN algorithm is a classifier, so it is for classification problems, hence supervised learning, and it has the fit method. But unsupervised learning
17:01
algorithms, such as k-means, are estimators as well: k-means implements the fit method too, and for feature selection it is much the same. Then the predictor: the predictor provides the predict and predict_proba methods.
17:21
Finally, the transformer is about the transform method, and sometimes there is also the fit_transform method, which applies the fit and then the transformation of the data. The transformer is used to make transformations of the data,
17:40
to bring the data into a form that can be processed by the algorithms. The last one is the model: the general model you create in your machine learning code, which exists for both supervised and unsupervised algorithms.
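A minimal sketch of these four interfaces on toy data (the classes shown are real scikit-learn estimators; the data is made up for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy data: 6 samples, 2 features, binary labels.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y = np.array([0, 0, 0, 1, 1, 1])

# Estimator: anything exposing fit(), supervised or not.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # supervised
km = KMeans(n_clusters=2).fit(X)                     # unsupervised

# Predictor: adds predict() and, for KNN, predict_proba().
print(knn.predict([[0.5, 0.5], [5.5, 5.5]]))         # -> [0 1]

# Transformer: adds transform() and the fit_transform() shortcut.
X_scaled = StandardScaler().fit_transform(X)
```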
18:02
Another great feature of scikit-learn is pipelines, because scikit-learn provides a great way to create pipeline processing. You may chain different processing steps,
18:23
just out of the box. You may apply SelectKBest, which is a feature selection step; then, after the feature selection, you may apply PCA, which is an algorithm for dimensionality reduction; and then you may apply logistic regression, which is a
18:44
classifier. So you can assemble a processing pipeline very easily; then you call the fit method on the pipeline, and then predict. The only constraint is that the last step of the pipeline must be a class that implements the predict method, so a predictor.
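The pipeline from the slide, reconstructed as a runnable sketch (the hyperparameter values here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# Feature selection -> dimensionality reduction -> classifier.
# Every intermediate step is a transformer; the last step is a predictor.
pipe = Pipeline([
    ("select", SelectKBest(k=3)),
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression()),
])
pipe.fit(iris.data, iris.target)   # fits each step in turn
print(pipe.predict(iris.data[:5]))
```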
19:06
So far so good, right? Let's see some examples of scikit-learn in action, starting with a very
19:21
introductory one. The first thing to consider is the data representation. scikit-learn is based on NumPy and SciPy, as you know, so all the data are usually represented as matrices and vectors. In machine learning, by definition, we have the X matrix over there, which is usually
19:43
written with a capital letter because it is a matrix: a matrix of n rows and d columns. Here n is the number of samples we have in our dataset, and d is the number of features,
20:00
that is, the number of relevant pieces of information we have about each data point. So the training data come in this flavor; under the hood they can also be held as scipy.sparse matrices, which, if I'm not mistaken, should be the CSR implementation,
20:24
compressed sparse row. And finally we have the labels, because we know the target value for each of these data points.
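In code, the convention looks like this (the numbers are illustrative, iris-like rows):

```python
import numpy as np
from scipy import sparse

# X: n samples as rows, d features as columns; y: one target per sample.
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [6.3, 3.3, 6.0, 2.5]])
y = np.array([0, 0, 2])
print(X.shape)   # (3, 4): n = 3 samples, d = 4 features

# Large sparse data can also be passed as a SciPy CSR matrix.
X_sparse = sparse.csr_matrix(X)
```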
20:40
The problem we are going to consider is about the iris dataset: we want to design an algorithm that is able to automatically recognize iris species. We have three different species of iris: Iris versicolor on the left, Iris virginica here, and Iris setosa here.
21:02
The features we're going to consider are four: the length of the sepal, the width of the sepal, the length of the petal, and the width of the petal. So every sample in this dataset comes as a vector of four features.
21:24
scikit-learn has a great package to handle datasets. This particular dataset is very well known in many fields and is already embedded in the scikit-learn library, so you only need to
21:43
import the datasets package and call the function load_iris. The iris object is a Bunch object that contains different keys: the target names, the data, the target, a description of the dataset, and the feature names.
22:01
The description is a verbose description of the dataset; the feature names are the four features I already mentioned on the previous slide; the target names are the targets we expect on this dataset, namely setosa, versicolor, and virginica, the three iris species we want to predict. Then we have the data:
22:24
iris.data comes as a NumPy array, and the shape of this matrix is 150 rows times four columns, the four features, and
22:41
the targets are 150, because we have a target value for each sample in the dataset. So n, the number of samples, is 150 in this case, and d, the number of features, is four. That's it.
23:01
The targets are the expected results: each is a value that ranges from zero to two, corresponding to the three classes we want to predict.
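A sketch of that loading step:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.keys())         # target_names, data, target, DESCR, feature_names
print(iris.data.shape)     # (150, 4): n = 150 samples, d = 4 features
print(iris.target.shape)   # (150,)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(iris.target[:5])     # integer labels in {0, 1, 2}
```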
23:24
We may try to treat this as a classification problem, and we want to exploit the KNN algorithm. The idea of the KNN classifier is pretty simple: for example, if we consider a K equal to six, given a new data point, we train our model with the training data and we predict the class of the new point based on
23:44
the classes of its six nearest neighbors. In this case it should be virginica, the red dot. Very simple. In scikit-learn it's a few lines of code: we import the dataset and we call the KNeighbors-
24:03
Classifier algorithm; in this case we select n_neighbors equal to one, then we call the fit method and we train our model. This is what we get if we plot the data: these are called the decision boundaries of the classifier. And if you want to know, for a new flower,
24:23
which species of iris has a three-by-five-centimeter sepal and a four-by-two-centimeter petal, right, let's check: iris.target_names of
24:40
knn.predict, because KNN is a classifier, so it may fit the data and also predict after the training. And it says: okay, it's virginica. So far so good, right?
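The few lines of code from the slide, reconstructed:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(iris.data, iris.target)

# New flower: sepal 3 x 5 cm, petal 4 x 2 cm.
pred = knn.predict([[3.0, 5.0, 4.0, 2.0]])
print(iris.target_names[pred])   # -> ['virginica']
```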
25:01
Then, instead of facing this problem as classification, we may also face it in an unsupervised setting, as a clustering problem. In this case we are going to use the k-means algorithm. The idea of k-means is pretty simple: we want to build clusters of objects such that each object is assigned to the cluster with the nearest center. And
25:28
that's it. In scikit-learn it's very simple: we have KMeans, and we specify the number of clusters we want; in this case we want three clusters, because we're going to predict three different
25:43
species of iris. This is the ground truth, the values we expected, and this is what we got after calling k-means. As you may already have noticed, the interface for the two algorithms is exactly the same, even if the machine learning settings are completely different: the former case was supervised, this latter case is unsupervised.
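And the clustering counterpart, as a sketch; note that the fit() call takes no labels this time:

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

iris = load_iris()
km = KMeans(n_clusters=3)   # three clusters, one per expected species
km.fit(iris.data)           # unsupervised: no target passed

# Cluster ids are arbitrary, so they need not match iris.target's numbering.
print(km.labels_[::10])
```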
26:06
So, classification versus clustering. Finally, a few slides to conclude. Another great battery included in scikit-learn, and I don't know how many other
26:21
machine learning libraries in Python are as complete in terms of batteries, is model evaluation. Model evaluation is necessary: how do we know whether our prediction model is any good? We apply model validation techniques. We may
26:44
simply try to verify that every prediction corresponds to the actual target, but this is almost meaningless, because we would be verifying on the very data we trained on. This kind of evaluation is very poor, because
27:04
it is based only on the training data: we are just checking whether we are able to fit the data, not testing whether the final model is able to generalize.
27:24
And generalization is the key feature of this kind of technique. Don't cling too closely to the training data, because you will end up with a problem which is called overfitting; you need to generalize, to be robust to noise, and to be able to predict even new data that are not
27:44
identical to the training data. One technique commonly used in machine learning is the so-called confusion matrix. scikit-learn, in the metrics package, provides different kinds of metrics to evaluate your
28:01
performance; in this case we're going to use the confusion matrix. The confusion matrix is very simple: it is a square matrix whose rows and columns correspond to the classes you want to predict,
28:23
so you have all the possible matchings of expected versus predicted classes. If all the data lie in the cells on the diagonal, you predicted all the classes perfectly. Is that clear? Okay, right. Thank you.
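A sketch with the metrics package (evaluating on the training data here, which, as noted above, is the weak form of evaluation):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=1).fit(iris.data, iris.target)
y_pred = knn.predict(iris.data)

# Rows = true classes, columns = predicted classes.
# A perfect prediction puts everything on the diagonal.
print(confusion_matrix(iris.target, y_pred))
```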
28:40
A technique that will be very well known to those of you already aware of machine learning is cross-validation. Cross-validation is a model validation technique for assessing how the results of a statistical analysis of the data generalize to independent datasets, not only to the dataset used for training.
29:02
scikit-learn already provides all the features to handle this kind of thing, so scikit-learn requires us to write very little code: just the few lines necessary to import the functions already provided in the library.
29:22
Otherwise we would be required to implement this kind of function over and over, every time, in our Python code. So this is very useful, even for lazy programmers like me.
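For instance, a full cross-validation is one import and one call. A sketch (in recent releases the helper lives in sklearn.model_selection; at the time of the talk it was sklearn.cross_validation):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
scores = cross_val_score(KNeighborsClassifier(), iris.data, iris.target, cv=5)
print(scores, scores.mean())   # per-fold accuracies and their average
```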
29:42
In this case we exploit train_test_split. The idea of the validation here is to split the training data into two different sets, the training set and the test set: we fit on the training set and we predict on the test set.
30:03
In this case we will see that there are some errors coming from this prediction. This is a more honest way to evaluate our prediction model.
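A sketch of that split (same module caveat as above):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(knn.score(X_test, y_test))   # accuracy on data the model never saw
```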
30:22
So, the last couple of things. Large scale out of the box: another great battery included in scikit-learn is the support for large-scale computation, already out of the box. You may combine scikit-learn code with any library you want to use for
30:41
multiprocessing, parallel computation, or distributed computation. But if you want to exploit the features already provided for this kind of thing, many techniques in the library accept a parameter called n_jobs. If you set this parameter to a value different from one, which is the default, it
31:07
performs the computation on the different CPUs you have in your machine; if you put the value minus one here, it is going to exploit all the CPUs of your single machine. And
31:25
this applies to different settings and different kinds of machine learning application: you may apply multiprocessing to clustering (the k-means example we made a few slides ago), to cross-validation, for instance, or to grid search. Grid search is another
31:43
great feature included in scikit-learn: it is able to identify the best parameters for a predictor, the ones that maximize the cross-validation score. So we want to get the best parameters for our model,
32:03
the ones that maximize the cross-validation score, so that the model generalizes best. Just to give the intuition.
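A grid search sketch with n_jobs=-1 (GridSearchCV is in sklearn.model_selection in recent releases, sklearn.grid_search in older ones; the parameter grid here is just an example):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

iris = load_iris()
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
    n_jobs=-1,   # fan the folds out over all available CPUs via joblib
)
grid.fit(iris.data, iris.target)
print(grid.best_params_, grid.best_score_)
```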
32:21
This is possible thanks to the joblib library, which works in the background: under the hood, the n_jobs parameter corresponds to a call to joblib. joblib is well documented as well, so you may read its documentation for any additional details. And last, but by no means least, scikit-learn meets other
32:40
libraries: scikit-learn can be integrated with NLTK, that is, the Natural Language Toolkit, and with scikit-image, just to make a couple of examples. In detail, scikit-learn meets the Natural Language Toolkit by design: NLTK includes an additional module, nltk.classify.scikitlearn, which is actually a wrapper in the NLTK library
33:07
that translates the API of scikit-learn into the API used in NLTK. So if you have code on NLTK and you want to apply a classifier exploiting the scikit-learn library, you may import the
33:26
classifier from scikit-learn, then use the SklearnClassifier class from the NLTK package over there, and wrap the interface of this classifier around the scikit-learn one,
33:40
in this case LinearSVC, which stands for linear support vector classifier. And then you may also include this kind of thing in a scikit-learn processing pipeline.
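A minimal sketch of the wrapper (NLTK feature dicts in, scikit-learn classifier underneath; the toy features are made up):

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.svm import LinearSVC

# NLTK represents each sample as a dict of features plus a label;
# the wrapper converts them to scikit-learn's matrix format internally.
train = [({"nice": True, "awful": False}, "pos"),
         ({"nice": False, "awful": True}, "neg")]

classif = SklearnClassifier(LinearSVC()).train(train)
print(classif.classify({"nice": True, "awful": False}))   # -> 'pos'
```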
34:01
So, in conclusion: scikit-learn is not the only machine learning library available in Python, but it is powerful and, in my opinion, easy to use, with very efficient implementations. It is based on NumPy, SciPy, and Cython under the hood, and it is highly integrated, for example with NLTK or scikit-image, just to make an example. So I really hope that you're looking forward to using it, and
34:24
thanks a lot for your kind attention. Thank you, thank you Valerio. We have six minutes left for your questions. Please raise your hand and I'll come by with a microphone.
34:49
Well, thanks for the talk. I have two short questions. Does scikit-learn provide any online learning methods? Yes, yes. Actually, this is a point I wasn't able to include in the slides: online learning is already provided, and
35:04
there are many classifiers and techniques that offer a method called partial_fit, so you have this method to feed the model one bunch of data at a time; the interface has been extended with a partial_fit method,
35:23
so some techniques allow for online learning. Another very great usage of this partial_fit is the case of so-called out-of-core learning. In the out-of-core learning setting, your data are too big to fit in memory,
35:45
so you provide the data one bunch at a time, because they're too big to fit in memory: you call the partial_fit method to train, in the case of a classifier, to fit your model one bunch at a time.
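A sketch of the idea with SGDClassifier, one of the estimators that implements partial_fit (the batches here are synthetic stand-ins for chunks read from disk):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()
all_classes = np.array([0, 1])   # every class must be declared up front

rng = np.random.RandomState(0)
for _ in range(10):              # pretend each batch is streamed from disk
    X_batch = rng.rand(100, 5)
    y_batch = (X_batch[:, 0] > 0.5).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=all_classes)

print(clf.predict(rng.rand(3, 5)))
```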
36:03
Okay, thanks. Second quick question: is there any support for missing values or missing labels, apart from just deleting them? In the case of online learning? No, just in general, for any machine learning: missing labels or missing data.
36:20
What do you mean? So, like, if you have a feature vector that just misses a value at the third component. Actually, I don't know. Okay, actually, I don't know. Yeah, thank you. [From the audience:] So, we have a very simple imputer that is going to impute by
36:45
median or mean along the different directions. So if you have very few missing data, it is going to work well. If you have a lot, then you might want to look at matrix completion methods, which we do not have; we had a Google Summer of Code project on this last year. They didn't finish; we welcome contributions, of course.
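A sketch of that imputer (in current releases it is sklearn.impute.SimpleImputer; at the time of the talk it was sklearn.preprocessing.Imputer):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],   # missing value in the second column
              [7.0, 8.0, 9.0]])

imp = SimpleImputer(strategy="mean")   # "median" also works
print(imp.fit_transform(X))            # NaN replaced by the column mean, 5.0
```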
37:00
Thank you. Hello, hi. I have some experience with scikit-learn, actually, and I'm a mathematician, and I had no
37:21
idea about all the stuff under the hood, and I didn't want to dive too deep into the whole algorithms and mathematics and such. The biggest problem for me was to figure out: what am I doing wrong?
37:40
So if you've got some kind of big dataset with features, labeled, supervised learning: what would you advise to someone who doesn't know how it works inside? Which steps, or which small, easy solutions, should I consider to improve the results of the
38:05
classification? Thanks. Yeah, actually, machine learning is about finding the right model with the right parameters, so there are many steps you may want to apply when training the different algorithms.
38:21
In general, you apply data normalization steps. First of all, the first step I suggest is pre-processing of the data: you analyze the data, you run some statistical tests on it, some pre-processing, some visualization of your data, in order to know
38:42
what kind of data you're dealing with. So this is the first step. The second one is: try the simplest model you may want to apply, and then improve it
39:04
one step at a time. Once you find the right model you want to use, you are then required to find the best settings for that model. In that case you might end up using the grid search method, for instance, which is provided out of the box, just to
39:29
find the best combination of parameters that maximizes the cross-validation score, for instance. And of course it's training on the job, right? So you
39:43
may find the right model for your predictions, or you may find the worst model, and then you start over again and look for different models. Hope this helps. Yes, thanks again, Valerio. I think he's going to give a talk at PyData as well, on Saturday, isn't it? Yep, on Saturday. So if you attend PyData, don't miss that talk as well. And yeah, thanks again.
40:06
Thank you very much.