
Testable ML Data Science


Formal Metadata

Title
Testable ML Data Science
Subtitle
How to make numeric code testable using Scikit-Learn's interfaces.
Alternative Title
Using Scikit-Learn's interface for turning Spaghetti Data Science into Maintainable Software
Series Title
Part
37
Number of Parts
173
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, adapt, copy, distribute and make the work or its content publicly available for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them and distribute the work or its content, including in adapted form, only under the terms of this license.
Identifiers
Publisher
Publication Year
Language
Production Place
Bilbao, Euskadi, Spain

Content Metadata

Subject Area
Genre
Abstract
Holger Peters - Using Scikit-Learn's interface for turning Spaghetti Data Science into Maintainable Software. Finding a good structure for number-crunching code can be a problem; this especially applies to routines preceding the core algorithms: transformations such as data processing and cleanup, as well as feature construction. With such code, the programmer faces the problem that their code easily turns into a sequence of highly interdependent operations, which are hard to separate. It can be challenging to test, maintain and reuse such "Data Science Spaghetti code". Scikit-Learn offers a simple yet powerful interface for data science algorithms: the estimator and composite classes (called meta-estimators). By example, I show how clever usage of meta-estimators can encapsulate elaborate machine learning models into a maintainable tree of objects that is both handy to use and simple to test. Looking at examples, I will show how this approach simplifies model development, testing and validation, and how it brings together best practices from software engineering as well as data science. _Knowledge of Scikit-Learn is handy but not necessary to follow this talk._
Keywords
Transcript: English (automatically generated)
Thank you for the kind introduction. Hello, I'm Holger Peters from Blue Yonder, and I'm presenting to you today how to use scikit-learn's good interfaces for writing maintainable and testable
machine learning code. So this talk will not really focus on the best model development or the best algorithm. It'll just show you a way to structure your code so that you can test it and use it reliably in production.
For some of you who might not know scikit-learn, scikit-learn is probably the most well-known machine learning package for Python, and it's really a great package. It comes with batteries included, and this is its interface.
All right, the problem I'm talking about in this talk is that of supervised machine learning, so just imagine a problem. On the left side here, we have a table with data.
There's a season column, that's spring, summer, fall and winter. We have a binary variable encoding whether a day is a holiday or not. Each row is a data point and each column is what we call a feature. On the right-hand side, we have some variable
that we'll call a target. It is closely associated with the features, and the target is a variable that we would like to predict from our features. So features are known data, targets are the data that we want to estimate from a given table on the left.
In order to do this, we actually have one data set with matching features and target data, and we can use this to train a model and then have a model that predicts.
So the interface is as follows. We have a class that represents a machine learning algorithm. It has a method fit that gets features named x and a target array called y. And that trains the model so the model learns
about the correlations between features and targets. And then we have a method predict that can be called upon the trained estimator and that gives us an estimate y for the given model and the given features x.
And this is the basic problem of machine learning. There are algorithms to solve this and I'm not gonna talk about these algorithms. I'd rather like to focus in this talk on how to prepare the data, the feature data x, and how to do it in a way that is both testable,
reliable and readable to software developers and data scientists. I'm sure you want to see what this looks like in a short code snippet, and this is actually quite succinct. So in this example here, we generate some data sets,
x_train, x_test and y_train, y_test. Then we create a support vector regressor, that is some algorithm that I take off the shelf
from scikit-learn. We fit the training data and we predict on the test data set, and in the end we can obtain a metric, we can test how good our prediction is based on our input features x_test.
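As an illustration, here is a minimal sketch of a snippet like the one described; the synthetic data, the train_test_split helper and the r2_score metric are assumptions for this sketch, not the exact code from the slides.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVR
    from sklearn.metrics import r2_score

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(200, 2))        # feature matrix
    y = np.sin(X[:, 0]) + 0.1 * rng.randn(200)   # target with a little noise

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    svr = SVR()                      # an off-the-shelf algorithm from scikit-learn
    svr.fit(X_train, y_train)        # train on the training set
    y_pred = svr.predict(X_test)     # predict on the held-out test set
    print(r2_score(y_test, y_pred))  # a metric: how good is the prediction?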
And so this is a trained model in scikit-learn. It's very simple, very easy, and the big question now is how can we best prepare the input data for our estimator, because that table that I showed you might come from an SQL database or from other inputs.
It sometimes has to be prepared for the model so we get a good prediction. And you can think of this preparation a bit like the preparation that is done in a factory. So there are certain steps that are executed
to prepare this data and you have to cut pieces into the right shape so that the algorithm can work with them. One typical preparation that we have for a lot of machine learning algorithms
is that of scaling to a normal distribution. So imagine that your data has very high numbers and very low numbers.
But your algorithm really would like to have values that are nicely distributed around zero with a standard deviation of one. And such a scaling can be easily phrased in Python code. So X is an array and we just take the mean
over all columns and subtract it from our array X, so we subtract the mean of each column from each column, and then, based on this, we calculate the standard deviation and divide by the standard deviation, so now each column should be distributed
around a mean of zero now with a standard deviation of approximately one. And I've prepared a small sample for this so you can see above an input array X and it has two columns.
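A minimal sketch of the scaling just described; the right-hand column (32, 80, 31) is taken from the talk's example, while the left-hand column values are made up for illustration.

    import numpy as np

    X = np.array([[1.0, 32.0],
                  [2.0, 80.0],
                  [3.0, 31.0]])

    X = X - X.mean(axis=0)   # subtract each column's mean from that column
    X = X / X.std(axis=0)    # divide each column by its standard deviation
    print(X)                 # each column is now centered around 0 with std ~1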
Let's first focus on the rightmost column. That would be a feature variable with values 32, 80 and 31. Of course, in reality, we would have huge arrays but for the example, a very small one is sufficient. And then we apply our scaling and in the end, that column now has values
that are centered around zero and are very close to zero. And now I introduce another problem that we have in data processing. We have a missing value. Just imagine, in the first slide I showed you an example where we have weather data.
Just imagine that the thermometer that measured the temperature was broken on a day, so you don't have a value here. But you would still like an estimation for that day, and in such cases, we have ways
and strategies to fill in this data with values. So one strategy is just to replace this not-a-number value with the mean of this feature variable.
So you could take the mean of temperatures of historic data to replace such a missing temperature slot. And if you apply our algorithm, subtracting the mean and dividing by the standard deviation, what you'll get in this example
is a data error from our code, because not-a-number values will just break the mean. So I've prepared code that does a bit more than our code before. So before, we just subtracted the mean
and divided by the standard deviation. And now we would like to replace not-a-number values by the mean. And the reason our code failed before was that taking the mean of a column that contains not-a-number values numerically
just yields not-a-number. So here I replaced our NumPy mean function by the function numpy.nanmean, which will yield a proper value for the mean even with NaN values in our array x. Then we can subtract the mean again as we did before
and divide by the standard deviation. And in the end, we'll execute the function numpy.nan_to_num, which will replace all not-a-number values by zero. And in our rescaled data, zero is the mean of the data. So we have replaced not-a-number values by the mean.
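A sketch of the NaN-aware variant described here; it assumes np.nanstd for the standard deviation so that the missing value does not propagate (the slide itself is not reproduced in this transcript).

    import numpy as np

    # the same example as before, but with a missing value in the left column
    X = np.array([[1.0,    32.0],
                  [np.nan, 80.0],
                  [3.0,    31.0]])

    X = X - np.nanmean(X, axis=0)  # nanmean ignores NaNs when computing the mean
    X = X / np.nanstd(X, axis=0)   # note: the std is estimated before NaNs are filled
    X = np.nan_to_num(X)           # replace NaNs by 0, the mean of the rescaled data
    print(X)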
And so how does this new code transform our data? And it actually seems to work pretty well. The same data example with the new code has a resulting array where both columns
are distributed around zero with a small standard deviation. And so this is an example of some data processing that you would apply maybe to your data before you feed it into the estimator.
And yeah, this is a very simple example, and this small example actually has a few properties that are very interesting. So, if we go back to our example, we actually transform our array x
and take the standard deviations of all columns and the mean of all columns before we call estimator.fit. But what about the next call, when we call estimator.predict? There's also an array x that is fed into it,
and we have to process the data that goes into predict in the same way as we transformed the data that went into fit. Why is this? Because our estimator has learned about the shapes and correlations of the data that we gave it in fit.
So the data has to look, has to have the same distributions, the same shape as the data that it saw during fit. And how can we do this? How can we make sure that the data has been transformed in the same way? And scikit-learn has a concept for this
and that's the transformer concept. A transformer is an object that has this notion of a fit and a transform step. So we can fit this transformer, we can train it with the method fit,
and we can transform data with the method transform, and there's a shortcut defined in scikit-learn, fit_transform, that does both at the same time. What's important about this is that transform returns a modified version of our feature matrix x given a matrix x, and during the fit
it can also see a y. And so now we can actually rephrase our code that did the scaling and not-a-number replacement in terms of such a transformer. So I wrote a little class, it's called the not-a-number-guessing scaler,
because it guesses replacement values for not-a-numbers and it scales the data. I implemented a method fit that has the mean calculation, as you can see, and it saves the means and the standard deviations of the columns as attributes of the object itself.
And then it has a method transform, and transform does the actual transformation. It subtracts the mean, it divides by the standard deviation, and it replaces not-a-number values by zeros, because zero is the mean of our transformed data.
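A reconstruction of such a transformer from the description above; the class name NaNGuessingScaler and the use of BaseEstimator/TransformerMixin (which provides fit_transform) are assumptions of this sketch, and the subtle flaw discussed next, computing the standard deviation before the NaNs are replaced, is kept on purpose.

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin

    class NaNGuessingScaler(BaseEstimator, TransformerMixin):
        """Replace NaNs by the column mean, then scale to zero mean and unit std."""

        def fit(self, X, y=None):
            self.means_ = np.nanmean(X, axis=0)  # per-column means, ignoring NaNs
            self.stds_ = np.nanstd(X, axis=0)    # per-column stds, ignoring NaNs
            return self

        def transform(self, X):
            X = (X - self.means_) / self.stds_   # scale with the learned parameters
            return np.nan_to_num(X)              # NaN -> 0, the mean of the scaled data

    # fit on training data, then transform the data used for predict the same way
    X_train = np.array([[1.0, 32.0], [np.nan, 80.0], [3.0, 31.0]])
    X_new = np.array([[2.0, 40.0]])
    scaler = NaNGuessingScaler()
    X_train_t = scaler.fit_transform(X_train)
    X_new_t = scaler.transform(X_new)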
And using this pattern we can fit our not-a-number-guessing transformer with our training data and then transform the data that we actually would like to use for predict. We can transform it in the very same way. And another opportunity here is,
since we have a nicely defined interface, we can actually start testing it. And I wrote a little test for our class. I think you remember our example array; I create a not-a-number-guessing scaler,
I invoke fit_transform to obtain a transformed data matrix, and then I start testing assumptions that I have about the outcome of this transformation. And now,
this test actually finds an issue. Our implementation was wrong, because if I calculate the standard deviation for each column and I expect the standard deviation for each column to be one, I realize that the standard deviation is not one
and that has a very simple reason. If we look back at the code, I calculate the standard deviation of the input sample before I replace not a number values with the mean.
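A sketch of such a test (pytest style, with assumed tolerances); run against the NaNGuessingScaler sketch above, the second assertion fails for the column containing the missing value, exactly as described.

    import numpy as np

    def test_nan_guessing_scaler():
        X = np.array([[1.0,    32.0],
                      [np.nan, 80.0],
                      [3.0,    31.0]])
        scaler = NaNGuessingScaler()
        X_t = scaler.fit_transform(X)

        # the column means of the transformed data should be (close to) zero
        np.testing.assert_allclose(X_t.mean(axis=0), 0.0, atol=1e-12)
        # this fails: the std of the first column is smaller than one, because
        # the std was estimated before the NaN was mapped onto the mean
        np.testing.assert_allclose(X_t.std(axis=0), 1.0)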
So in this example, the standard deviation of the input sample is wider than the actual spread of the data after replacing not-a-number values with the mean,
because the mean is in the center and we map not-a-number values also to the center of the data, and that makes the distribution kind of narrower. So in a way, if we want to fix this code, we have to think about this transform method
and the solution is actually to make two transformation steps. First, we want one transformation step that replaces not-a-number values with the mean, and then we want a second transformation step that does the actual scaling of the data.
So we want two transformations, and Scikit-Learn has a nice way to do this. It offers ways to compose several transformers, several transformations. In this case, we use a building block,
and I apologize for the low contrast, we use building blocks that are called pipelines. A pipeline is sequential, it's like a chain of transformers. And so during fit, when we are training
and learning from a feature matrix X, we use a first transformer, transformer one, and invoke fit_transform to obtain a transformed version of the data, and then we take our second transformer,
also apply fit_transform with the result of the first transformation, and finally we will obtain a transformed data set that was transformed by several steps. It can have an arbitrary number of transformers. In predict, when we have already learned the properties of the data, like in our example
the mean and the standard deviation, we can just invoke transform and get a transformed X in the end from the pipeline. In Scikit-Learn, we can build them pretty easily. There's a make_pipeline function; we pass it transformer objects and it returns a pipeline object,
and a pipeline object itself is a transformer, that means that it has the fit and the transform method, and we can just use it instead of our not-a-number-guessing scaler that I just presented. So we could go back and rewrite this class
into two classes, one doing the scaling and one doing the not-a-number replacement. Or, the question is, maybe someone has actually solved this for us already, and indeed Python has batteries included and Scikit-Learn has batteries included,
so we can actually also use two transformers from Scikit-Learn's library. One of these transformers is called the imputer, because it imputes missing values, and so here not-a-number would be replaced by the mean,
and then we have the standard scaler that scales the data, in this example represented by the red distribution, to a data set that is distributed around zero, and these two transformers can be joined by a pipeline.
So here you can see this. We just put together the building blocks that we already have. We saw make_pipeline, we use make_pipeline here and pass it an imputer instance and a standard scaler instance,
and then if we fit_transform our example array, we can actually make sure that our assumption holds true, that we would like to have a standard deviation of one. We could here also check for the means and perform other tests.
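A sketch of this composition; the talk used scikit-learn's Imputer, which in current scikit-learn releases is called SimpleImputer, so that name is assumed here.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.impute import SimpleImputer        # called Imputer in 2015-era scikit-learn
    from sklearn.preprocessing import StandardScaler

    X = np.array([[1.0,    32.0],
                  [np.nan, 80.0],
                  [3.0,    31.0]])

    pipe = make_pipeline(SimpleImputer(strategy="mean"),  # NaN -> column mean first
                         StandardScaler())                # then scale to mean 0, std 1
    X_t = pipe.fit_transform(X)

    np.testing.assert_allclose(X_t.mean(axis=0), 0.0, atol=1e-12)
    np.testing.assert_allclose(X_t.std(axis=0), 1.0)      # now the assumption holds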
We have wrapped the data processing with those Scikit-Learn transformers and we've done this in a way where we can individually test each building block so assume that these were not present in Scikit-Learn,
we could just write them ourselves and the tests would be fairly easy and yeah, I think that this is the biggest gain that we can have from this so if you're leaving this talk and you want to take something away from it, if you want to write maintainable software,
if you want to avoid spaghetti code in your numeric code, try to find ways to separate different concerns, different purposes in your code into independent composable units that you can then combine. You can test them individually, you can combine them, and then you can make a test for the combined model,
and that's really a good way to structure your numeric algorithms. So in the beginning, I showed you an example of a machine learning problem where we just used a machine learning algorithm
with a Scikit-Learn estimator that we fitted and predicted with. Now I've extended this example with a pipeline that does the pre-processing: make_pipeline, we use the imputer, we use a standard scaler, and we can also add our estimator to this pipeline, and now our object s does contain
our whole algorithmic pipeline. It does contain the pre-processing of the data and it does contain the machine learning code. And it also contains all the fitted and estimated parameters and coefficients that are present in our model.
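A sketch of this extended example, including the serialization mentioned next; the synthetic data, the SVR estimator and SimpleImputer (the modern name for Imputer) are assumptions of this sketch.

    import pickle
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    X_train = rng.uniform(-3, 3, size=(100, 2))
    X_train[::10, 0] = np.nan                      # sprinkle in some missing values
    y_train = np.sin(np.nan_to_num(X_train[:, 0]))

    # pre-processing and the estimator live in one composite object
    s = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(), SVR())
    s.fit(X_train, y_train)

    # the fitted pipeline, including imputation means, scaling parameters and
    # SVR coefficients, can be pickled, stored or shipped, and restored later
    restored = pickle.loads(pickle.dumps(s))
    print(restored.predict(X_train[:5]))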
So we could easily serialize this estimator object using pickle or another serialization library and store it to disk or send it across the world into a different network and then we could load it again, restore it
and make predictions from it. And so to summarize what Scikit-Learn and these interfaces can do for you and how you should use them, we found that it's really beneficial to use these interfaces that Scikit-Learn provides for you.
If you want to write pre-processing code and you can use the fit and transform interface of the transformers, use them. Write your own transformers if you don't find the ones that you need in a library.
If you write your own transformers, try to separate concerns, separate responsibilities. Estimating or scaling your data has nothing to do with correcting not-a-number values. So don't put them into the same transformer, just write two and compose a new transformer
out of the two for your model in the end. If you keep your transformers and your classes small, they're a lot easier to test. And if tests fail, you will find the issue a lot faster
if they are simple. And use the features like serialization because you can actually quality control your estimators. You can store them, you can look at them again in the future, it's really handy. And in this short time, I was not able
to tell you everything about the compositional and the testing things that you can do with Scikit-Learn. So I just wanted to give you an outlook on what else you could look at if you want to get into this topic. There are tons of other transformers and meta-transformers to compose in Scikit-Learn
that you can take a look at. For example, a feature union, where you can combine different transformers for feature generation. And also, estimators are composable in Scikit-Learn. So there's a cross-validation building block,
the grid search in Scikit-Learn that actually takes estimators and extends their functionality so their predictions are cross-validated according to statistical methods. So I'm at the end of my talk. I thank you for your attention. I'm happy to take questions if you like.
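A minimal sketch of those two building blocks, FeatureUnion (via make_union) and GridSearchCV; the particular transformers and the parameter grid are made up for illustration.

    import numpy as np
    from sklearn.pipeline import make_pipeline, make_union
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler, PolynomialFeatures
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(100, 2))
    y = np.sin(X[:, 0])

    # FeatureUnion: combine several transformers for feature generation
    features = make_union(StandardScaler(), PolynomialFeatures(degree=2))
    pipe = make_pipeline(SimpleImputer(strategy="mean"), features, SVR())

    # GridSearchCV wraps an estimator and cross-validates candidate parameters
    search = GridSearchCV(pipe, param_grid={"svr__C": [0.1, 1.0, 10.0]}, cv=3)
    search.fit(X, y)
    print(search.best_params_)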
And if you want to chat with me or talk with me, you can come up to me anytime. Hi.
Hi. Could you please describe your testing environment? Do you use the standard library, like unittest, stuff like that, too?
Well, basically we use unit testing frameworks like unittest or pytest. I personally prefer pytest as a test runner, and we structure our tests like we would structure unit tests in other situations. So in the most basic form, testing numeric code
is not fundamentally different from testing other code. It's code, it has to be tested. You have to think of inputs and outputs, and you have to structure your code in a way that, in most cases, you don't have to do too much work to get a test running.
And so, yeah, we have some tools to generate data and to get more tests that are more going into the direction of integration tests. But in general, we just use the Python tools
that non-data scientists also use. Other questions? To the data, do you also apply the transformations
once you have done all the training? Yes, that is, so if I understood the question correctly, the question was whether we also apply the transformations
to the test data. So you are talking about the data that I passed to predict, right, in the first example? Not the one that you used for the test. So, sorry, here, you're talking about...
Yeah, exactly. Yes, we do, this is the purpose of splitting the transformer into those two methods. So I'll just pull up the slide again. The whole purpose of splitting fit and transform here
is that we can repeat this transformation in transform without having to change the values of those estimated parameters, mean and standard deviation. If we were to execute the code in fit again, then we would not get the same kind of data
into our algorithm that the algorithm expects. Any other?
How do you track your model performance over time? So in some of our applications, we have data going for years,
and we have models that are built up. And then, for instance, for that model, the assumptions, the underlying probabilities of the data, so we're using mostly Bayesian models, the underlying probabilities are changing, and we want to revalidate to see, on previous data sets or versions of data sets, how the model is either overfitting or underfitting, depending on what we have.
So are you doing anything across versions of data sets to make sure that your assumptions aren't missing stuff or adding new stuff that you didn't have before? Okay, so you're asking how we actually test the stability of our machine learning models?
Well, this is done with cross-validation methods, and we have, yeah, we have cross-validation methods for sample data sets. We have reference scores,
and if the reference scores are getting worse in the future, then tests fail, basically. And then, if that happens, one has to look into why things are getting worse. There's not really a better way
than using cross-validation methods. Yeah, it's more of a monitoring thing. So this talk was more about actually testing the code, whereas your question was rather about testing the quality of the model.
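A sketch of the kind of reference-score check described here; the reference value, the synthetic data and the model are made up for illustration.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    REFERENCE_SCORE = 0.8  # score recorded from an earlier, accepted run

    def test_model_quality_has_not_degraded():
        rng = np.random.RandomState(0)
        X = rng.uniform(-3, 3, size=(200, 1))
        y = np.sin(X[:, 0])
        model = make_pipeline(SimpleImputer(), StandardScaler(), SVR())
        score = cross_val_score(model, X, y, cv=5).mean()
        # fail, and prompt a closer look, if the cross-validated score
        # drops below the recorded reference
        assert score >= REFERENCE_SCORE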
So I think these are two different concerns. Yes, they're complementary, yeah, definitely.
So I just got curious, when you do this, what do you work in? I mean, do you work in an IPython notebook, or do you do it as separate scripts, or what do you do for this?
Yeah, I'm personally not using IPython notebooks that much. I just use, I write tests in test files and execute my test runner on them, and then use continuous integration and all the tooling that is around unit testing.
Yeah, I personally, well, the IPython notebook is an environment that is really great for exploring things, but it's not an environment for test-driven development, and so there's no test runner
in the IPython notebook. And I personally think, for all the effort that I put into thinking about some test assertion that I could type into an IPython notebook, if I put it into a unit test and check it into my repository instead, it's run continuously over and over again, so I really prefer this
over extensive use of IPython notebooks. I do use it if I want to quickly explore something. This is just an add-on, so no question. Your talk was about the testing stuff, and this is really great with these modules,
let's say, or small units, but of course, it's also important to have reusability, because then you can really change a model or apply it to different problems reusing parts of your pipeline.
Any other questions? Okay, thank you. Thank you very much.