Holger Peters - Using Scikit-Learn's interface for turning Spaghetti Data Science into Maintainable Software
Finding a good structure for number-crunching code can be a problem.
This applies especially to the routines that precede the core
algorithms: transformations such as data processing and cleanup, as
well as feature construction.
With such code, the programmer faces the problem that it easily turns
into a sequence of highly interdependent operations that are hard to
separate. Such "Data Science Spaghetti code" can be challenging to
test, maintain and reuse.
Scikit-Learn offers a simple yet powerful interface for data science
algorithms: the estimator, plus composite classes called
meta-estimators. I show how meta-estimators can encapsulate an
elaborate machine learning model in a maintainable tree of objects
that is both handy to use and simple to test.
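As a rough illustration of the idea, and not necessarily the example used in the talk, the sketch below wraps a cleanup step in a small estimator class and composes it with two stock Scikit-Learn estimators through the `Pipeline` meta-estimator. The `ClipOutliers` class, its quantile parameters and the synthetic data are assumptions made up for this sketch.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ClipOutliers(TransformerMixin, BaseEstimator):
    """Hypothetical cleanup step: clip each column to quantiles learned in fit."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned state gets a trailing underscore, by Scikit-Learn convention.
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


# The meta-estimator: one object that holds the whole chain of steps.
model = Pipeline([
    ("clip", ClipOutliers()),
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])

# Synthetic data, invented for this sketch only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

model.fit(X, y)                      # fits every step in order
predictions = model.predict(X[:5])   # applies the same steps, then predicts
```

Each step keeps its own fitted state, so the cleanup, scaling and regression code no longer share intermediate variables; the pipeline itself behaves like a single estimator.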
Working through examples, I will show how this approach simplifies
model development, testing and validation, and how it brings together
best practices from software engineering and data science.
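To make the testing claim concrete, here is a minimal sketch of how a single step can be unit-tested in isolation once it follows the estimator interface. The `LogFeature` transformer and both test functions are hypothetical, written for this illustration; only `BaseEstimator`, `TransformerMixin` and `clone` come from Scikit-Learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class LogFeature(TransformerMixin, BaseEstimator):
    """Hypothetical feature-construction step: append log1p of one column."""

    def __init__(self, column=0):
        self.column = column

    def fit(self, X, y=None):
        return self  # stateless: nothing is learned from the data

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.column_stack([X, np.log1p(X[:, self.column])])


def test_log_feature_appends_column():
    X = np.array([[0.0, 1.0], [np.e - 1.0, 2.0]])
    result = LogFeature(column=0).fit(X).transform(X)
    assert result.shape == (2, 3)
    np.testing.assert_allclose(result[:, 2], [0.0, 1.0])


def test_log_feature_survives_clone():
    # Meta-estimators copy their steps with clone(), so parameters must round-trip.
    assert clone(LogFeature(column=1)).column == 1


test_log_feature_appends_column()
test_log_feature_survives_clone()
```

Because the step needs no other part of the model to run, the test uses a two-row array rather than a full dataset; a test runner such as pytest would collect the `test_` functions without the explicit calls at the end.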
_Knowledge of Scikit-Learn is handy but not necessary to follow this talk._