How to write a scikit-learn compatible estimator/transformer
Formal Metadata
Title: How to write a scikit-learn compatible estimator/transformer
Series: FOSDEM 2020, talk 418 of 490
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/47287 (DOI)
Transcript: English (auto-generated)
00:08
Okay, thank you everyone. For the next talk, Adrin is going to tell us about custom scikit-learn transformers.
00:20
Okay, thank you. Hello everybody, I'm Adrin, I work at Anaconda and I'm one of the scikit-learn maintainers. Today I'm going to talk about how to write your own estimator.
00:40
Before I start, I need to know: how many people have used scikit-learn? Okay, cool, so I don't need to focus too much on the background, that's good. For those of us who are not familiar with it, it's a statistical machine learning library. That means it covers all the old-school stuff: support vector
01:03
machines, random forests, k-means and whatnot. It does not include the deep learning ones, and it doesn't cover GPU acceleration; that's just not in its scope. That said, when we look at the library, what are some of the main components of
01:21
the library? Before we start writing our own estimator, we need to understand that. We have estimators: estimators are either transformers, in which case they take some data, transform it, and spit it out, or they are predictors, that is, classifiers or regressors. Then we have scorers: we need to know how these models perform, so we have different
01:44
scorers to measure their performance in different ways. Then we have meta-estimators. Meta-estimators take an estimator and do something with it. Two of the important and relevant ones for this talk are Pipeline, which allows you
02:01
to have a set of transformers and then, if you will, a predictor at the end: you have your classifier at the end and your transformers before that. And then it lets you treat that whole pipeline as one single estimator. And then you have grid search, which is easier to explain with a little example.
02:22
In the usual pipeline, we have our data, and we need to preprocess and prepare the data to give to our classifier, in the case of classification. So in this case I have two steps to prepare the data, and then I feed that to an SGD classifier. But each of these steps usually has some hyperparameters that you can tune.
02:44
If it's a transformer doing principal component analysis, how many components do you want to return? That number. Are you doing k-means? That k. If you're regularizing, what is the regularization parameter?
03:01
How do you do that? That's your parameter set, and that set defines your space. And now you want to search in that space and find the best point for your data. Grid search does that for you. You pass it your estimator and your parameter space and it does the search.
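A sketch of this in code, with a two-step preprocessing pipeline feeding an SGD classifier as described above (the step names and the parameter grid are illustrative, not from the talk):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Two preprocessing steps followed by a classifier, treated as one estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", SGDClassifier(random_state=0)),
])

# The hyperparameters of every step define the search space,
# addressed as "<step name>__<parameter name>".
param_grid = {
    "pca__n_components": [2, 3],
    "clf__alpha": [1e-4, 1e-2],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Passing a `scoring` argument to `GridSearchCV` is how you would swap in a different scorer than the default one.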
03:23
If you want to use a different score than the default one, you can also pass that. So with all that flexibility, why would we want to write our own estimator? There are a couple of cases. One is that scikit-learn doesn't have all the algorithms out there. It does have the classical ones, but it's not really possible for us to include everything.
03:44
So if you fancy a new algorithm, it's probably not there. Or if you are a researcher who would like to implement their own and work on their own method, you probably want to write your own and then see how it works in combination with the other methods and transformers out there.
04:00
Or my favorite example: if you're doing ethics. Ethics, bias mitigation and bias detection are not in the scope of scikit-learn. So if you want to work on that, you would have to write your own scikit-learn compatible estimators and mitigate your bias. We also don't include things that are extremely specific to certain use cases.
04:20
If you need to do something that applies only to your data, you probably need to write that and it's not going to be included in the library. Another use case is writing meta-estimators. If you want to do something before or after every time you call an estimator, you can easily write a meta-estimator, wrap around your estimators, and then do logging or auditing or whatnot.
04:42
So what's the basic API? What does it look like? Estimators expose fit to train on the data; predict if they're doing classification or regression; transform if they're a transformer; and score if they're a predictor, since you need to know how they perform. When I look at people's code trying to write their own estimators, it looks as if
05:06
they watched this talk, or the equivalent of this talk, and then stopped here. So please don't. The estimator I'm going to write is not really doing anything fancy, it's just
05:21
to show how you could write it. What are the components that you need? Before that, it is a very opinionated API and it has its own design. I know probably half of us in this room may not agree with that design, but that's what it is.
05:41
That's not the discussion; we can talk about it later. We do composition. That means that if you're writing an estimator, you have to have BaseEstimator. If you're writing a classifier, it's ClassifierMixin, and then depending on what you do, you would need different mixins: RegressorMixin, MetaEstimatorMixin, and a bunch of other ones.
06:02
We have a bunch of really nice methods to do input validation. You really don't need to write your own input validation. You don't need to check if the input is a NumPy array or not. All of that is there. Then my classifier is going to wrap around an SVC, in a very poor way.
06:22
Things to note here. I have my init, and in the init I accept my hyperparameters, and the only thing I do is that I store them. I store them in public attributes and I do no validation. That is important. All the validation goes into fit. In fit, I do input validation, and if needed, I do validation on my parameters.
06:47
If two parameters are not compatible and I need to check only one of them is set, this is where I do that. And then here I'm just storing my trained SVC in an estimator with a trailing underscore,
07:02
and that's, again, important. The convention is that attributes are public. If there's a trailing underscore, it is set in fit. If it's a leading underscore, it's private, and backward compatibility is not guaranteed. Then I have predict.
07:20
I check if I'm fitted. If yes, then I check my input, and then I delegate to my estimator's predict. So what did I use there? One of them was check_is_fitted. It checks if there is any attribute with a trailing underscore.
07:44
You can tune the behavior. check_array is a really long and important one. It returns a NumPy array, unless you say you explicitly want to support sparse arrays, in which case it doesn't convert a sparse array to a dense one. And if it's a Pandas data frame, it converts that to NumPy.
08:04
If you want to, for example, get your feature names from your Pandas data frame, you do that before passing it to check_array. And then check_X_y does the same thing, plus some extra validation on y.
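Putting those pieces together, a minimal sketch of such an SVC wrapper could look like this (the class and attribute names are illustrative, not the speaker's exact code):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.svm import SVC
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class MyClassifier(ClassifierMixin, BaseEstimator):
    def __init__(self, C=1.0):
        # __init__ only stores hyperparameters in public attributes; no validation here.
        self.C = C

    def fit(self, X, y):
        # All input (and, if needed, parameter) validation happens in fit.
        X, y = check_X_y(X, y)
        self.classes_ = np.unique(y)  # classifiers must expose classes_
        # Trailing underscore: this attribute is set in fit.
        self.estimator_ = SVC(C=self.C).fit(X, y)
        return self

    def predict(self, X):
        check_is_fitted(self)  # passes if any attribute with a trailing underscore exists
        X = check_array(X)
        return self.estimator_.predict(X)


X = np.array([[0.0], [0.2], [2.8], [3.0]])
y = np.array([0, 0, 1, 1])
clf = MyClassifier(C=1.0).fit(X, y)
preds = clf.predict(np.array([[0.1], [2.9]]))
```

With just this in place, the estimator already works inside a pipeline or a grid search.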
08:22
Now that we have it, how can we be sure that it is compatible? Compatibility is usually checked through our common tests. We have check_estimator, which runs a whole bunch of tests, and we recently added the decorator parametrize_with_checks. You put that on top of your pytest test, and then it runs all the checks individually,
08:45
and you can easily check and debug what went wrong. For example, when I was writing this one, I forgot to set the classes_ attribute, which is needed if you're a classifier, and then it complained, and I went back and set it.
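Run against a built-in estimator, the common tests look roughly like this (a sketch; you would pass an instance of your own class instead of LogisticRegression):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# check_estimator runs the whole suite of common compatibility tests in one go
# and raises as soon as a convention is violated (for example, a classifier
# missing the classes_ attribute). With pytest installed, the
# parametrize_with_checks decorator instead turns each check into its own
# test case, which makes failures much easier to pinpoint and debug.
check_estimator(LogisticRegression())
print("all common checks passed")
```

The pytest form is `@parametrize_with_checks([MyClassifier()])` on a test function taking `(estimator, check)` and calling `check(estimator)`.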
09:00
Now that we have it, it's easy. You can use it the way you would use any other estimator. I have a bunch of data; I can fit on my data; I can get my score. I can put it in a pipeline: here I have a SelectKBest, and then my classifier. And then I can even pass that to a grid search.
09:22
I fit my grid search, and then if I check my best estimator, I see my classifier with the hyperparameter selected by the grid search. So what are some of the conventions? I pretty much mentioned all of them, except for the parameters passed to fit.
09:43
The one that you usually see in the existing scikit-learn API is sample_weight. But you could pass other stuff. You could pass groups. In the context of bias detection and fairness, we usually have
10:00
our protected attributes that are not part of the data, like gender, zip code, race, all of that. All of that you can pass to fit as a fit parameter. The convention is that everything you pass as a fit parameter should be sample-aligned. If you have feature attributes, probably don't pass them there. If you have something that you could pass as an init parameter, do that there.
10:21
And it's important because if you do pass things that are sample-aligned to grid search, when it does the folding and the cross-validation, it slices these extra parameters for you, and then passes them with your data to the fit function.
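A sketch of what that looks like with sample_weight (assuming a recent scikit-learn, where grid search slices sample-aligned fit parameters per cross-validation fold):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
sample_weight = rng.uniform(0.5, 1.5, size=len(y))  # one weight per sample

search = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}, cv=3)
# Because sample_weight is sample-aligned (length n_samples), grid search
# slices it along with X and y for every fold before calling the inner fit.
search.fit(X, y, sample_weight=sample_weight)
print(search.best_params_)
```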
10:45
These are the usual ones, but not all estimators follow all of that, and either other meta-estimators or some of the tests need to know that. So that's why we recently introduced estimator tags. They're still experimental, as in we may change them without prior notice, like they
11:04
don't go through the usual deprecation cycles, but they're pretty useful. You can tell the other meta-estimators or the tests what kinds of input you allow. Do you support multi-output? Do you accept NaNs? And then if you want to change any other defaults, you can do that by having a _more_tags
11:24
method. So what are we doing now? This is how it works now, but we are adding a bunch of stuff to the API, and they're useful,
11:46
but that means that you would also need to add to or change your API a little bit. The first one that is coming in, which hopefully will be there in the next release, is n_features_in_ and n_features_out_. We want to be able to inspect the models and know how many features went in, and
12:03
for a transformer, how many features are coming out. That's the first step, and it helps a lot for us also to clean up the code, but it also helps to understand what's going on in a pipeline. The step after that is that we want to have feature names.
12:22
Usually, if I have data which is not just a numerical block, say a Pandas data frame with a bunch of feature names, and I have a pipeline, I would like to follow how my features flow through the pipeline. If I have a bunch of transformers, and at the end a classifier, I want to know
12:41
what went into my classifier. If I have a linear model and I want to inspect its coefficients, I want to know which feature it is that now has a high coefficient there. Right now the API allows you to have get_feature_names, which returns the feature names, but it becomes ambiguous.
13:00
Are those the input feature names or the output feature names? Sometimes it's not clear. So we're deprecating that, and we're going to have feature_names_in_ and feature_names_out. Pretty clear. And that means that if you pass a Pandas data frame, it would extract the feature names for you, and then at the end you will have all of that propagated through the pipeline.
13:28
The next one that I'm really excited about is data properties: sample props, feature props and data props. Sample weight is an example.
13:40
Gender that I mentioned is another example. The issue there is that right now in the pipeline, if I have a pipeline and I want to pass that to a fit, I have to say, OK, pass this one to the fit of that step of the pipeline. And then if I want to pass the same sample weights to another fit, I need to copy that and say, well, also pass that to this one. And if I have a meta estimator, I don't know if the meta estimator should pass
14:04
that through or not. Maybe I have to duplicate that and pass the one that is used by the meta estimator and pass another one that is used by the one handled by the meta estimator. So it's really not clean. The idea here is that I could have a really nice routing and every step, the pipeline,
14:24
every meta estimator would know what needs to be passed to which step and not just fit, also score and predict. If you need to pass other features, other properties to them, then you should be able to pass them. That requires changing the API a little bit, and then there are some prototypes, and
14:41
hopefully they will go forward and we'll have them soon. But that's not all of it. I only showed you a really, really simple one. And for example, here, if I really wanted to write a clean estimator, if I am
15:01
wrapping around an estimator, I may be a classifier, but I'm also a meta-estimator, which means that I shouldn't have to care about which hyperparameters are there. The user should be able to pass an estimator, and I should be able to know what the parameters are. You don't have to do that yourself. You can use the MetaEstimatorMixin.
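A sketch of such a wrapping meta-estimator, here one that logs every call before delegating (an illustrative example, not code from the talk):

```python
import numpy as np
from sklearn.base import (BaseEstimator, ClassifierMixin,
                          MetaEstimatorMixin, clone)
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class LoggingClassifier(MetaEstimatorMixin, ClassifierMixin, BaseEstimator):
    def __init__(self, estimator):
        # The wrapped estimator is itself a hyperparameter: store it untouched,
        # so its own hyperparameters stay reachable via get_params / set_params.
        self.estimator = estimator

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        print(f"fit called with {X.shape[0]} samples")  # the "do something" part
        self.classes_ = np.unique(y)
        # clone() gives a fresh, unfitted copy with the same hyperparameters.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        print(f"predict called with {X.shape[0]} samples")
        return self.estimator_.predict(X)


X = np.array([[0.0], [0.2], [2.8], [3.0]])
y = np.array([0, 0, 1, 1])
wrapped = LoggingClassifier(LogisticRegression()).fit(X, y)
preds = wrapped.predict(np.array([[0.1], [2.9]]))
```

The wrapper behaves like the classifier it holds, but every fit and predict call goes through the logging code first.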
15:21
And all of that you can see in the pointers I give here. This documentation covers most of the stuff I talked about. The file base.py has everything except the meta-estimator parts. There are other mixins that you could probably use.
15:42
And then there's the meta-estimator one, and validation.py has a lot more utility functions that you could use. Thanks. I'll take questions now.
16:07
We have time for lots of questions. So, can this estimator work with TensorFlow data structures, or ones from Keras, or
16:25
only with basic data structures? If I have data in TensorFlow in such a structure, can I just insert it into this estimator from scikit-learn, or will it require some kind of other conversions
16:43
along the way. So if I understand the question, it's whether you could put, for example, a PyTorch model as an estimator here, or something like that. The idea is that the default API of any of those libraries doesn't
17:03
follow this API, but usually what happens is that they have an sklearn wrapper. So you could, I don't remember which one has it where, but for example, if I see pytorch.sklearn, then I know that that's where I can find my scikit-learn compatible estimators. And then those estimators, they wrap around their own estimators, but they expose an
17:24
API, which is compatible here. Therefore you could take that and plug it into a pipeline here. Okay. So in principle it can work. People do that. Okay. Thank you.
17:47
Don't be shy, raise your hands. Nothing? Okay. Thank you.