| Valerio Maggio - Scikit-learn to "learn them all"
Scikit-learn is a powerful library, providing implementations for 
many of the most popular machine learning algorithms. 
This talk will provide an overview of the "batteries" included in 
Scikit-learn, along with working code examples and internal insights, in order to 
get the best for our machine learning code.
-----
**Machine Learning** is about *using the right features, to build the right 
models, to achieve the right tasks* 
However, to come up with a definition of what actually means **right** for 
the problem at the hand, it is required to analyse 
huge amounts of data, and to evaluate the performance of different algorithms 
on these data.
However, deriving a working machine learning solution for a given problem 
is far from being a *waterfall* process. 
It is an iterative process where continuous refinements are required for the 
data to be used (i.e., the *right features*), and the algorithms to apply 
(i.e., the *right models*).
In this scenario, Python has been found very useful for practitioners and 
researchers: its high-level nature, in combination with available tools and 
libraries, allows to rapidly implement working machine learning code 
without *reinventing the wheel*.
**Scikit-learn** is an actively 
developing Python library, built on top of the solid `numpy` and `scipy` 
packages.
Scikit-learn (`sklearn`) is an *all-in-one* software solution, providing 
implementations for several machine learning methods, along with datasets and 
(performance) evaluation algorithms.
These "batteries" included in the library, in combination with a nice and intuitive
software API, have made scikit-learn to become one of the most popular Python 
package to write machine learning code.
In this talk, a general overview of scikit-learn will be presented, along with 
brief explanations of the techniques provided out-of-the-box by the library.
These explanations will be supported by working code examples, and insights on 
algorithms' implementations aimed at providing hints on 
how to extend the library code.
Moreover, advantages and limitations of the `sklearn` package will be discussed 
according to other existing machine learning Python libraries
(e.g., "Shogun Toolbox", "PyML", "MLPy").
In conclusion, (examples of) applications of scikit-learn to big data and 
computational intensive tasks will be also presented.
The general outline of the talk is reported as follows (the order of the topics may vary):
*   Intro to Machine Learning
    *   Machine Learning in Python
    *   Intro to Scikit-Learn
*   Overview of Scikit-Learn
    *   Comparison with other existing ML Python libraries
*   Supervised Learning with `sklearn`
    *   Text Classification with SVM and Kernel Methods
*   Unsupervised Learning with `sklearn`
    *   Partitional and Model-based Clustering (i.e., k-means and Mixture Models)
*   Scaling up Machine Learning
    *   Parallel and Large Scale ML with `sklearn`
The talk is intended for an intermediate level audience (i.e., Advanced).
It requires basic math skills and a good knowledge of the Python language. |