
Testable ML Data Science

Formal Metadata

Title
Testable ML Data Science
Subtitle
How to make numeric code testable using Scikit-Learn's interfaces.
Alternative Title
Using Scikit-Learn's interface for turning Spaghetti Data Science into Maintainable Software
Title of Series
Part Number
37
Number of Parts
173
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language
Production Place
Bilbao, Euskadi, Spain

Content Metadata

Subject Area
Genre
Abstract
Holger Peters - Using Scikit-Learn's interface for turning Spaghetti Data Science into Maintainable Software. Finding a good structure for number-crunching code can be a problem. This applies especially to the routines that precede the core algorithms: transformations such as data processing and cleanup, as well as feature construction. Such code easily turns into a sequence of highly interdependent operations that are hard to separate, and this "Data Science Spaghetti code" can be challenging to test, maintain and reuse. Scikit-Learn offers a simple yet powerful interface for data science algorithms: the estimator and composite classes (called meta-estimators). By example, I show how clever usage of meta-estimators can encapsulate elaborate machine learning models into a maintainable tree of objects that is both handy to use and simple to test. I will also show how this approach simplifies model development, testing and validation, and how it brings together best practices from software engineering and data science. Knowledge of Scikit-Learn is handy but not necessary to follow this talk.
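The estimator and composite idea mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. This is not code from the talk and it does not use Scikit-Learn itself: the classes `MeanImputer`, `RowSum`, `MeanRatioRegressor` and `SimplePipeline` are hypothetical stand-ins that only imitate the `fit` / `transform` / `predict` convention. In real Scikit-Learn code one would typically reach for `sklearn.pipeline.Pipeline` and subclass `sklearn.base.BaseEstimator` instead.

```python
from statistics import mean


class MeanImputer:
    """Hypothetical transformer: replaces None with the column mean learned in fit()."""

    def fit(self, X, y=None):
        columns = list(zip(*X))
        self.means_ = [mean(v for v in col if v is not None) for col in columns]
        return self

    def transform(self, X):
        return [
            [self.means_[j] if v is None else v for j, v in enumerate(row)]
            for row in X
        ]


class RowSum:
    """Hypothetical stateless transformer: builds one feature, the sum of each row."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[sum(row)] for row in X]


class MeanRatioRegressor:
    """Hypothetical toy predictor: y is modelled as k * x, with k = mean(y) / mean(x)."""

    def fit(self, X, y):
        self.k_ = mean(y) / mean(row[0] for row in X)
        return self

    def predict(self, X):
        return [self.k_ * row[0] for row in X]


class SimplePipeline:
    """Hypothetical composite: chains transformers and ends with a predictor."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        *transformers, final = self.steps
        for step in transformers:
            X = step.fit(X, y).transform(X)
        final.fit(X, y)
        return self

    def predict(self, X):
        *transformers, final = self.steps
        for step in transformers:
            X = step.transform(X)
        return final.predict(X)


# Each building block can be tested in isolation ...
imputer = MeanImputer().fit([[1.0, 4.0], [3.0, None]])
assert imputer.transform([[None, None]]) == [[2.0, 4.0]]

# ... and the composed model is then used like a single estimator.
model = SimplePipeline([MeanImputer(), RowSum(), MeanRatioRegressor()])
model.fit([[1.0, 1.0], [2.0, None], [3.0, 1.0]], [4.0, 6.0, 8.0])
predictions = model.predict([[2.0, 2.0]])
```

Because every step exposes the same small interface, cleanup and feature-construction code stops being an interdependent sequence of operations: each object holds only its own learned state (here `means_` and `k_`), so a unit test needs nothing more than a few rows of input.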
Keywords