
Good features beat algorithms


Formal Metadata

Title
Good features beat algorithms
Title of Series
Number of Parts
132
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
In Machine Learning and Data Science in general, understanding the data is paramount. This understanding can come from many different sources and techniques: domain expertise, exploratory analysis, SMEs, some specific Machine Learning techniques, and feature engineering. As a matter of fact, most Machine Learning and Statistical analysis strongly depends on how the data is prepared, thus making feature engineering very important for any serious Machine Learning enterprise. "Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data." In this talk we will discuss what feature engineering and feature selection are; how to select important features in a real-world dataset and how to develop a simple, but powerful ensemble to measure feature importance and perform feature selection. Familiarity with intermediate concepts of the Python programming language is required to follow the implementation steps. General knowledge of the basic concepts of Machine Learning and data cleaning will be useful, but not strictly necessary, to follow the discussion on feature selection and feature engineering.
Transcript: English (auto-generated)
Good morning, everybody. My talk is going to be about feature selection; the title is up there on the screen. I'm going to do a few things with you today. I'll introduce myself, because I'm not rude, at least not too much, and I'm going to tell you a story that I hope you will like; I think some of you might relate to it. Then I'm going to show you some code and some results, some numbers. I know you're all wondering: could I be any more vague? I could have been, but this will be enough for today. So, first things first, a little bit about myself.
There are a few things going on in my life. I am a physicist who translated into a data scientist for some reason. I work with many different types of technologies that I'm very passionate about and that wouldn't have fit on the slide, so I just added the first few. And I'm a ham radio operator; is anybody here one? One, two, three, four. Great, I'm not alone, because I usually am. All of this can be summed up in the fact that I'm a generic nerd, so some of you might relate to that as well. First, I want to tell you a couple of things about the company I work for.
The company is called Optum; it's right there in the bottom right-hand corner. Has anybody here heard of Optum? No one? One hand. Oh, of course, it's you. Sorry guys, he's a colleague of mine; he sits about two meters from me. Anyway, let's try this again, because no one has ever heard of Optum. Have you ever heard of UnitedHealth Group? No one? Okay, good start; no one ever has, not in Europe at least. Optum is part of UnitedHealth Group, a very big American corporation that works around health care and health services. It's actually quite a big company: we're talking about almost 300,000 people working for it. This slide is actually a bit old, because it's from last year; it says we ranked sixth on the Fortune 500 list. That's not true anymore, because we're fifth now. Yay, go us. It's actually a pretty interesting company, and if anybody is interested in working on hardcore data science and cutting-edge work in healthcare, we are hiring worldwide, so have a look; there are a couple of careers websites at the bottom of the slide.
Anyway, back to us. The main purpose, as I mentioned earlier, is for me to tell you a story, and like many stories, many fairy tales, this story starts with a brave white knight in shining armor. By the way, that white knight is basically all of us; it's just a metaphor. And like any good story, this story also needs a villain, because this knight in shining armor has to defeat and slay a terrible dragon. You might be wondering: okay, we're all data scientists, we don't usually fight dragons. Again, it's a metaphorical dragon, and the way it usually comes into existence is this, and I think some of you might relate to this situation: it's the evil overlords atop their ivory tower going, "We have all of this data, let's just do something with it," even if they have no idea what to do. So what happens at this point is that you get handed your own dragon, and it usually comes in the form of a very big data set you know nothing about and are supposed to do something with. And yes, it is that vague sometimes; this happened to me not too long ago.
It might look something like this. You start a new project (and this video is not running), you start a new project, and this actually happened: you get handed a big data set, you know nothing about it, you just know that somebody processed it in some way and you have to do something with it. So you start by having a look at it, poking around, seeing what's in there, and a few things like this might happen. You realize that the data set is huge, you have no idea what's in there, and it has over 800 features. What are they? So you decide to have a look. And I kid you not, this happened: there were hundreds of features like that, and I had no idea what they were. There was no data dictionary; of course, there never is. And it goes on: there were about 200 features with reasonable names, where you could just about make out what they were, and then there were about 600 of these. What was the saying? Garbage in, garbage out, something like that. I mean, it's just too much.
I had been feeling like that for weeks while working on that project. Can anybody relate to that? Show of hands. Yeah, of course, because the sad reality, or actually the happy reality if you think about it, is that of all the information, we usually need only a tiny portion to reach our goals and accomplish our task. I'm not saying that everything else is definitely useless for what we're trying to do, but it's probably just adding something on top of what we need; it's a plus, but we might not strictly need it. And of course, when you're working with such a big amount of data, such a big amount of information, there are many problems that can arise.
Like the fact that you have no idea what's in there. Or the fact that even if you do have some idea of what you're trying to do and how, training time might increase, sometimes exponentially, with the number of features, and you don't know which features are relevant and which are garbage. Hardware requirements might increase, because sometimes you just need to load a good chunk of the data into memory, and then you have memory constraints or CPU constraints; or if you need to go for deep learning, you might have big constraints regarding GPU availability, which are not always satisfiable. For example, in our company we work with very sensitive data, so we are not allowed to use AWS: if it's not in our server farm, you can't have it. That's a problem. And there are also problems like decreased performance and overfitting, because again, you don't know which features are noise and which are relevant for your problem. And there's the risk of leaks, because
some information regarding the target might actually leak into some of your features. You might be thinking, "Yeah, sure, but it won't happen to me, I'm really careful when I do my feature engineering." Sadly, yes, it can happen to you, and it happened to me not too long ago. I was working on a classification project, and I spent a couple of days sifting through the data. I could understand most of it; I had a fair idea of what most of those features were. I did some clever feature engineering tricks, a week or so of that, and then I trained my first model just to have a baseline, and I got an incredibly high accuracy: 99.9%. So, as has probably happened to you: you train the first model, sky-high accuracy, and of course you go "woohoo, great, problem solved" — for the first two seconds or so, because then you go: yeah, this doesn't sound right, this doesn't look right, there's something wrong here. It turns out there was something wrong: there was some small leakage from the target into one variable, and once I realized that and fixed it, accuracy went back to a normal value, something I would have expected. And this is a problem, because if the accuracy caused by that leak had been in the order of 85 to 89 percent, I wouldn't have been that suspicious, but the results would still have been wrong. So you should always be careful about these things. But anyway, getting back to what we're here to talk about: feature selection. We can't really work with a very big data set here.
We can't really expect to tackle a big problem right here, because this is just a talk; we don't have that much time, and I'm pretty sure you don't have that much desire to do that either. So let's restrict the scope of the problem: let's work with a simple data set that has a target, so we're working with a classification problem, and let's see what we can do about it. Full disclosure: I'm using a small, public data set that most of you have probably worked on. I'm not going to tell you which one it is until the end of the talk, but some of you might guess. Where is my cursor? I need my cursor. Ah. This is what the data set looks like: we have 500-odd rows and 30 features, and these are the feature names, x00 through x29. So we have no idea what they are, and we have to deal with it.
This is a problem, because which features do we choose, and how do we choose them? This is of critical importance, because different problems demand different types of solutions, and we have to be really mindful of how we choose metrics, because sometimes we use techniques or metrics that are just not relevant, or plain wrong, for our type of problem. Here is an example. Are you familiar with Anscombe's quartet? Are you familiar with using correlation for choosing good features? Ah, you are. A lot of times we just use correlation and check whether a feature is correlated with the target; if it is, okay, it's important, let's keep it, and if it's not, well, maybe let's drop it. But this can be a bit deceiving, and Anscombe's quartet shows why: these are four famous example data sets with the same summary statistics. The four pairs of x's and y's have the same mean, variance, standard deviation, correlation, and regression coefficient. And that's a problem, because in some of those cases the linear relationship clearly exists, and in the others, not so much. So you can't just use one technique for everything: there's no technique that fits all problems.
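You can check this yourself. Here is a minimal sketch using seaborn's bundled copy of the quartet (the talk itself only shows the plots; the code is mine):

```python
# Verify that Anscombe's quartet has near-identical summary statistics
# despite wildly different shapes. Uses seaborn's bundled dataset.
import seaborn as sns

anscombe = sns.load_dataset("anscombe")  # columns: dataset, x, y
for name, g in anscombe.groupby("dataset"):
    print(name,
          round(g.x.mean(), 2), round(g.y.mean(), 2),
          round(g.y.var(), 2), round(g.x.corr(g.y), 3))
# All four groups print essentially the same numbers (~9.0, 7.5, 4.13, 0.816),
# yet only one of them is a clean linear relationship.
```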
You might be familiar with the no-free-lunch theorem: sometimes lunches are actually very expensive, not just not free. And whenever you're working with these kinds of tools for feature selection, always maintain a healthy dose of skepticism; that's never wrong. But then, how can we perform feature selection? Well, there are three main classes of algorithms that can do that for us, namely filter methods, wrapper methods, and embedded methods, and I'm going to tell you a little more about each in the next few slides. Filter methods are the simplest form. They usually perform univariate analysis between the features and the target, or on the features themselves. A clear example would be a variance threshold: you discard all features that have less than some amount of variance. This happens all the time to me; I always get handed data sets that contain at least 10% of features that just don't change, or that are all missing. They're clearly not really useful, at least not as they are. Some examples of filter techniques would be the F-test, ANOVA, Granger causality if you're working with time series, LDA — techniques like that.
Then there are wrapper methods. These are a bit more sophisticated. The way they work is that they train on subsets of features and look for which features are useful: for different subsets of features you train a model, usually a classifier or a regressor, and you determine which models perform best. So this is a bit more sophisticated than the first type of algorithm, but it's somewhat prone to overfitting, because you don't really have fine control over what the models do. And they have a big problem: you have to train a lot of models. So if you have a small data set, these are fine, but if you have a big data set, it's probably not the best idea.
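A minimal sketch of a wrapper method, using scikit-learn's recursive feature elimination on synthetic data (again, not the talk's code):

```python
# RFE repeatedly refits the estimator and prunes the weakest features,
# so the number of model fits grows with the number of features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=5, random_state=0)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
print([i for i, kept in enumerate(rfe.support_) if kept])  # surviving feature indices
```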
Then, as I mentioned, there are embedded methods. These try to take the best of both worlds, filter and wrapper methods.
The way they work is that they perform a classification or regression, but they also have their own internal way of performing feature selection. A few examples would be decision trees or random forests, which basically use their own internal representation of the features to determine the best splits and which features are useful. These are quite good: they are less prone to overfitting, they usually yield good results, and they usually perform their search using cross-validation.
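A minimal sketch of an embedded method: a random forest ranks features as a by-product of choosing its splits, and SelectFromModel keeps the ones above the mean importance (synthetic data; the API shown is scikit-learn's, not the speaker's library):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=5, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(SelectFromModel(forest, prefit=True).transform(X).shape)  # e.g. (500, 5)
```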
All right, let's close this small parenthesis and try to remember why we're here. Remember the knight in shining armor? I mean, come on, we have a dragon to defeat.
We have to slay that dragon, and the way we're going to fight it is with code. I'm going to show you a toy example, let's say, on the data set I showed you earlier. We're going to perform a small feature selection exercise using an embedded — sorry, an ensemble — system, because we want to make things a little more spicy and complicated; otherwise anybody would be able to do it. Word of advice: this code is formatted for slides and presentation, so don't just take it, use it, and expect correct results. I'm omitting a few things; I actually have part of this code embedded in a personal library, but there's a lot I'm leaving out here, like validation, error handling, and so on. Take inspiration by all means, but I would advise against using it as it is.
My requirements for this type of library were that it shouldn't be a free-for-all algorithm — it had to have some guidance as to what it was supposed to do — and it had to be extensible, so you should be able to just add new selectors to the ensemble algorithm and expect it to work. There are also a few ways we can combine the results at the end, because we're using different feature selection techniques and then combining their results, so we can add a weighing mechanism: for example, you might want results from a specific algorithm to be evaluated as more important than others.
That's not shown here, but that was the idea. So let's have a look at the code. Let's start simple: we just need to import the things we need. These will be the building blocks of the class and of the selector, and we're going to use just five techniques for this one, because it's enough for a demo. We'll use random forests, decision trees, and variance thresholds, in flavors for classification and regression, because we might want to work in both worlds and they do work in slightly different ways, so it's a good idea to keep them separate.
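The slide code isn't in the transcript, so here is a hypothetical sketch of the same building blocks — a handful of scikit-learn estimators keyed by problem type; the registry name is mine, not the speaker's:

```python
# Not the speaker's library: a minimal stand-in for the imports and the
# per-problem-type registry of selector techniques.
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.feature_selection import VarianceThreshold

MODELS = {
    "classification": [RandomForestClassifier, DecisionTreeClassifier],
    "regression":     [RandomForestRegressor, DecisionTreeRegressor],
    "generic":        [VarianceThreshold],   # target-free filter method
}
```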
Then we set up our class based on the type of problem we want to tackle: we might have a regression problem, a classification problem, or a generic problem, like when we don't have any target but still have to sort out all the garbage. In that case we're pretty much doomed to use filter methods and see which features are just not relevant; here I'm just using a variance threshold, which has to be used carefully, because it's only meaningful if all the features are on the same scale, especially if you're using a hard-set threshold. But this library takes care of that in the background.
So we're all good there. Then we have to initialize our class. We need to know how many features we want to get from each of those models, and we want to set a few thresholds: the actual library I'm using is able to compare train results and test results, compare the scores, and see if they differ by a certain amount, and if that happens, it raises a warning. If you get 90% accuracy on your training set and 20% accuracy on your test set, that's not a good sign, and you should review what you're doing. But again, we don't want to block results; we just want to
be notified of it. Then we need to select the models based on the analysis type: in this case, the analysis-type keyword would identify regression, classification, other, or a combination of them, and based on that we select the models from those available in our class, instantiate them, and put them somewhere safe. In this case, "somewhere safe" is just a list of live estimators — the ones that were chosen. By the way, if you have any questions about the code or about anything else, just raise your hand and ask. No? Okay. We have our set of models in our internal object, and now we need to fit them on the data.
This is actually quite easy: we just go over all of them with the data and fit them. At this point we have a set of models, or techniques, that are fitted and know about our data, and we can extract results from them.
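A hypothetical skeleton of those two steps, building on the MODELS registry sketched above — the class and attribute names are mine, not the speaker's library:

```python
class EnsembleFeatureSelector:
    def __init__(self, analysis_type="classification", n_features=5,
                 feature_names=None):
        self.n_features = n_features
        self.feature_names = feature_names
        # Instantiate every technique registered for this problem type.
        self.estimators = [cls() for cls in MODELS[analysis_type]]

    def fit(self, X, y=None):
        for est in self.estimators:
            est.fit(X, y)   # filter methods like VarianceThreshold ignore y
        return self
```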
We can see what they consider important in the data set. The way I did it — there are many other ways — is to select the relevant attribute from the model itself: some models have an attribute called feature importances, some have variances, some have scores, and so on. For the models I was dealing with, these were the important ones, and I didn't need to add anything else. I get the score or importance for each feature from each model and combine them, and, as you can see in the last row at the bottom, I only take the first n features from each model; this is specified in the init.
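A sketch of that extraction step (the helper name is mine): different scikit-learn estimators expose their per-feature score under different attribute names, so we probe for each in turn.

```python
import numpy as np

def top_features(estimator, names, n):
    # Probe the importance attributes used by the models in this demo.
    for attr in ("feature_importances_", "variances_", "scores_"):
        scores = getattr(estimator, attr, None)
        if scores is not None:
            order = np.argsort(scores)[::-1][:n]      # n highest-scoring features
            return [(names[i], scores[i]) for i in order]
    raise AttributeError("no known importance attribute on this estimator")
```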
So if I only want, for example, five features from each model, you will still end up with more than five once you combine them, because there's no guarantee that all the results will be exactly the same set; actually, that's quite a rare occurrence. Then we have to combine the results, and the way we do that is by having the models cast votes. I have a list of models with their results — the features they consider important, with their scores — and I count how many times each feature is identified by one of those models. Features that get more votes, that have been chosen by more models, get a higher value. And this is where a good weighing system might come into play, because some models might not be that relevant for a classification problem: a variance threshold is probably not as good at identifying useful variables as a decision tree, for example. Maybe; it depends on your data and on your problem.
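A sketch of the voting step under the same hypothetical names: each estimator nominates its top-n features, and a feature's score is how many estimators chose it.

```python
from collections import Counter

def vote(selector, names):
    votes = Counter()
    for est in selector.estimators:
        chosen = top_features(est, names, selector.n_features)
        votes.update(name for name, _ in chosen)   # one vote per nomination
    # A weighted variant would add a per-estimator weight here instead of 1.
    return votes  # e.g. Counter({'x07': 3, 'x21': 2, ...})
```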
Well, that's the basic implementation; we don't really need anything more at this point, so we are good to go. Let's see if it works. Remember, our data set was those 500-odd rows and 30 columns, with those column names. It's really hard to see my cursor there. We want to instantiate our ensemble feature selector, and we treat this as a generic classification problem, so we're going to use all the classifiers and all the other techniques. We set the names of the variables so that we have something more explicit than just indices, and we want to get five features from each model. Then we fit on the data set and we get the — sorry — we get the votes. What it looks like is this: nine features were selected, these here, and these are the numbers of votes they got. You can actually go inside the library, inside the object, and examine what scores those features got, so if you want to know how important the single features were according to the models, here are the single scores, five features per model.
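A rough sketch of that run, reusing the hypothetical class and vote helper above (the x00–x29 names and five-per-model setting come from the talk; the data and API are stand-ins):

```python
from sklearn.datasets import make_classification

names = [f"x{i:02d}" for i in range(30)]
X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=5, random_state=0)
selector = EnsembleFeatureSelector(analysis_type="classification",
                                   n_features=5, feature_names=names).fit(X, y)
print(vote(selector, names).most_common())  # features with their vote counts
```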
But since each model selects five features, you end up with more than five; we have nine. All right. So, are you satisfied? It works. Are you satisfied? You shouldn't be. Okay, it works, but does it work? I mean — sorry, excuse me — is this a correct result? Is this a useful result? We don't know. It works, but does it work? So here's what I did.
I trained the same type of algorithm, a logistic regression (this was a binary classification), first using the whole data set and then using just the subset that my ensemble identified, and I collected the results. What would you expect: better, same, worse? And by how much? It doesn't have to be a precise number. Twice as good? No, it's not that good; I'm sorry, I'll be a disappointment to you guys. Anyway, I'm using the same seed for both runs, and the model score using nine features is actually a tad better than the model using all the features.
But again, both those scores are really high, so I couldn't expect to get a 50% increase there. Full disclosure: I've run this a few times, and it's not always better, because there's a random component to it; using different seeds, all the results I could find were always compatible with each other, within two or three percent. Sometimes it's better, sometimes it's worse, but I think the worst score I've seen — using eight features, that time — was in the range of 94.8, something like that. So, still good.
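A minimal sketch of that kind of before/after sanity check — synthetic data and made-up feature indices, not the talk's actual run:

```python
# Same model, same seed: all 30 features vs. only the selected subset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=5, random_state=42)
selected = [0, 1, 2, 5, 8, 11, 14, 20, 25]      # stand-in for the nine voted features

full = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
sub = cross_val_score(LogisticRegression(max_iter=1000), X[:, selected], y, cv=5).mean()
print(f"all 30 features: {full:.3f}  |  9 selected: {sub:.3f}")
```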
"Yes, but why?" Sorry, I don't quite follow, so maybe find me later and we can discuss it.
Anyway, as I was saying, these results are always comparable, and it strongly depends on how the features are selected. This is a small example: we're working with fewer than 600 rows and 30 features. But imagine that you could safely drop a third of your data set and get comparable results, and imagine that your data set is something big, like millions of rows and thousands of features. There would be a huge speed-up in your training and in your work. I've used this in a couple of projects, and it actually worked. How well? Well enough: I got results comparable with the ones I got without the feature selection.
To sum up, the main points are these. You should always take care to know what's inside your data; you should always spend time getting to know it intimately. If you have a situation like the one I showed you earlier, then good luck, you have all my sympathy. You should always take care to select the features that are relevant for your problem, because you might have stuff in there that is really important for something else but that you don't need here.

Feature selection essentially simplifies the models, because it takes the garbage, or the stuff you're not interested in right now, out of the picture. And if you have a very specific problem, you can build your own feature selector, because you know what metric you're looking for, and you can stick it into something like the ensemble I've shown you and give it a very high weight. I've done that a couple of times and it really helped me; maybe it will help you too.

Feature selection increases generalization, because again you get rid of noise, and especially if you're working with linear models, you don't want too much noise in there, particularly if you have highly collinear variables, because that will throw all your coefficients off the roof and you won't be able to use them directly; that's a big problem. Sometimes it helps you avoid the curse of dimensionality: as you know, the higher the number of dimensions, the less meaningful distances become — that's essentially what the curse of dimensionality is — and therefore it's hard to compare objects and records with one another. In general it removes noise and simplifies everything. And again, if you have to choose between something complex and clever and something simple and clever, you should definitely choose simple and clever. That would be it for me; I'm open for questions now. Thank you very much.
Thank you for your talk. I'm also very interested in this kind of automated machine learning, where you don't have any clue about the features; it's an interesting sport. But have you looked at papers like the Data Science Machine, or at H2O, which try to do this automated machine learning?

I have, and personally I prefer to work on features myself, because at least I get to know the data; worst-case scenario, I get to know what's in there, even if I don't really know what a feature represents. I don't completely trust automatic feature selection when it comes from the outside, because I want to know what's going on under the hood. But that's one of my idiosyncrasies, so there's nothing wrong with that.
Hi, thanks for the talk. Do you have any particular class of models for which you think feature selection is really important compared to others — say, linear models compared to random forests?

Some models do perform feature selection within the implementation itself: if you're using random forests, you are performing feature selection before actually running the classification. That being said, some models are very sensitive to features and some not so much, but I would say that going through the effort is a worthwhile investment of time, because you get to know the data, and that's paramount in my opinion.

I agree, but do you think there are particular classes of models for which it's incredibly important to do feature selection?

Depending on what you're trying to do, definitely generalized linear models; that's one class I'm always very afraid to touch without having a fair understanding of the data.

Any more questions? They're making you travel today.
So, did you try regularization, like ridge in scikit-learn? Because that's usually quite okay at dealing with a lot of variables.

Mm-hmm, that's another technique that actually falls under the umbrella of embedded methods — ridge and lasso as well — because you can use the coefficients you get as a measure of how the variables are related to the target. I've done that, and it's a good technique. But again, there's no free lunch, so it might work very well on some types of data and be completely off on another.
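A minimal sketch of that idea — L1 regularization zeroes out weak coefficients, and SelectFromModel keeps the rest (scikit-learn on synthetic data, not the speaker's code):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=5, random_state=0)
# L1-penalized logistic regression: most coefficients shrink to exactly zero.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print(SelectFromModel(lasso, prefit=True).transform(X).shape)
```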
And which was the data set? Because I have my own pipeline, so I want to try it. — So what do you think the data set was? — It's a famous one. — No. I work in healthcare. Diabetes.
Thanks for the talk, very interesting. In machine learning, if I preprocess my data, I split it first and then apply the same transformations from the training set to the test set. Do you think the same should be done for feature selection as well — that you select your features only on the training set, so that leakage does not occur?

That's what I usually do. When I perform feature selection, I usually work just on my training set, and I usually have not just a test set but also a validation set that I put safely away long before I start working on it. That being said, you can work around this and have a look at different permutations of the data set, which I do sometimes when I want to test the stability of my selection methods.
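A minimal sketch of that leak-free pattern (synthetic data; SelectKBest stands in for whatever selector you actually use):

```python
# Fit the selector on the training split only, then apply the same mask
# to the test split, so no information from the test set leaks in.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

skb = SelectKBest(f_classif, k=5).fit(X_tr, y_tr)  # sees training data only
X_tr_sel, X_te_sel = skb.transform(X_tr), skb.transform(X_te)
print(X_tr_sel.shape, X_te_sel.shape)              # (400, 5) (100, 5)
```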
But that only works with data that is not time-dependent.

It often happens that not just a selection of features but also a selection of rows might help: since feature selection is essentially eliminating features that are duplicates or near-duplicates, you could also eliminate rows that are near-duplicates, so you would essentially have a smaller data set to learn your model on, without the noise. Is that something you look at as well, with a training curve?

It depends. Removing exact duplicates from the training set is always something you should take care of, because they don't really add extra information —
unless you're trying to oversample for imbalanced classes. Removing almost-duplicates, or near-duplicates, is a bit more complex, because they might actually add information; you just don't realize it by looking at the records. Some algorithms are more sensitive to this than others: if you were using deep learning and had a very deep neural network, I would probably leave all the near-duplicates in, because they might carry nuances and dependencies that you don't really see just by looking at them. But if you have 500 records and you're using a decision tree, you can safely remove duplicates.
Just to continue what I asked before: if you have very many features but very limited data — say you have 800 features but only 40 rows — what kind of feature selection can you do?

That's a very tough problem. I can't say I've had to work in situations like that, so I don't know exactly what I would do, but personally I would definitely look at the variance first, because with very few rows and a lot of columns there's bound to be — hopefully — a good chunk of features that don't really vary much, and I would probably start by eliminating those. After that, you kind of have to play it by ear.

Okay, so thank you for your talk.
I just want to continue the question about regularization. Did you compare your findings with traditional lasso, and did you use lasso regularization in your linear models after doing this initial feature selection?

There are two things I want to say about this. I have used lasso and ridge for feature selection, in a technique like this, in my own work; I haven't included them in this example because of time.
It would have added an extra layer of complexity, because their interface for getting the coefficients is slightly different, and I didn't really want to tackle that during the talk. But yes, it's definitely something you should investigate if you're interested in it; it works.

But do you happen to have a comparison? I mean, did you run a comparison of this voting system versus just using lasso, to say, okay, this voting system works better — or is it just something you wanted in order to have more control?

No, I haven't run a comparison strictly like that, although I have used results from lasso within algorithms like this, so I didn't really need to compare those results with the final one.

Thanks. Anyone else? I can't see any more hands; wave if you have a question. No? All right, okay. Well, can we thank our speaker?