Testable ML Data Science

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use and modify the work or its content for any legal, non-commercial purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or its content, including in modified form, only under the terms of this license.

Identifiers

10.5446/20215 (DOI)

Publisher

EuroPython

Release Year

2015

Language

English

Production Place

Bilbao, Euskadi, Spain

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Holger Peters - Using Scikit-Learn's interface for turning Spaghetti Data Science into Maintainable Software

Finding a good structure for number-crunching code can be a problem. This applies especially to the routines that precede the core algorithms: transformations such as data processing and cleanup, as well as feature construction. Such code easily turns into a sequence of highly interdependent operations that are hard to separate, and this "Data Science Spaghetti code" can be challenging to test, maintain and reuse. Scikit-Learn offers a simple yet powerful interface for data science algorithms: the estimator and composite classes (called meta-estimators). By example, I show how clever usage of meta-estimators can encapsulate elaborate machine learning models into a maintainable tree of objects that is both handy to use and simple to test. Looking at examples, I will show how this approach simplifies model development, testing and validation, and how it brings together best practices from software engineering and data science. _Knowledge of Scikit-Learn is handy but not necessary to follow this talk._
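To make the estimator and meta-estimator idea from the abstract concrete, here is a minimal, dependency-free sketch. It is not code from the talk and it does not use Scikit-Learn itself: it only mimics the `fit` / `transform` / `predict` convention in plain Python so that it runs anywhere. The class names `MeanImputer`, `SumRatioRegressor` and `SimplePipeline`, and the toy data, are hypothetical and chosen for illustration; in real Scikit-Learn code the composite role would typically be played by `sklearn.pipeline.Pipeline`.

```python
from statistics import mean


class MeanImputer:
    """Transformer (hypothetical): replaces None by the column mean seen in fit()."""

    def fit(self, X, y=None):
        columns = list(zip(*X))
        # Learned state gets a trailing underscore, following the Scikit-Learn convention.
        self.means_ = [mean(v for v in col if v is not None) for col in columns]
        return self

    def transform(self, X):
        return [
            [m if v is None else v for v, m in zip(row, self.means_)]
            for row in X
        ]


class SumRatioRegressor:
    """Final estimator (hypothetical): models y as ratio_ * sum(row)."""

    def fit(self, X, y):
        totals = [sum(row) for row in X]
        self.ratio_ = sum(y) / sum(totals)
        return self

    def predict(self, X):
        return [self.ratio_ * sum(row) for row in X]


class SimplePipeline:
    """Meta-estimator (hypothetical): chains transformers and one final estimator."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        # Fit each transformer on the output of the previous one.
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        # Reuse the already fitted transformers, then delegate to the last step.
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


# Toy data, invented for this sketch.
X_train = [[1.0, 2.0], [None, 4.0], [3.0, 6.0]]
y_train = [6.0, 12.0, 18.0]

# Each step can be tested in isolation ...
imputer = MeanImputer().fit(X_train)
imputed = imputer.transform([[None, 1.0]])

# ... and the composed model is used like a single estimator.
model = SimplePipeline([MeanImputer(), SumRatioRegressor()]).fit(X_train, y_train)
predictions = model.predict([[None, 2.0]])
```

Because every step exposes the same small interface, cleanup and feature-construction code can be unit-tested on its own, and the composite can be validated as a whole without knowing what is inside it.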

Keywords

EuroPython Conference

EP 2015

EuroPython 2015