CatBoost - the new generation of Gradient Boosting
Formal Metadata
Title: CatBoost - the new generation of Gradient Boosting
Series: EuroPython 2018 (21 / 132)
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/44984 (DOI)
Transcript: English (auto-generated)
00:05
Hi, yes, my name is Anna. I'm going to talk to you about CatBoost today. CatBoost is a gradient boosting library. It's open source. A few days ago, I think on the 18th of July, we had our first birthday, so we are one year in open source.
00:21
One day before that, we reached 3,000 stars, and I was really, really happy about that. So the project is growing. We have made many releases, we are working on the project actively, and many people are starting to use it. So first of all, let's start with the problems
00:41
that we are trying to solve using this library. So gradient boosting is a machine learning technique that works well for heterogeneous data. There is homogeneous data, like images, sound, video, text. For all this data, you need to use neural networks to get a good result.
01:01
Yeah, sure. Yes, okay, so for those kinds of tasks you need to use neural networks. And there is structured data: for example, for predicting a credit score (whether the person will pay back the credit), you have a table of data where each row is a person and each column is a feature,
01:21
and those features do not have that much internal structure between them. In an image, two pixels near each other have a lot of internal structure, and here you do not have that situation. For this type of data, gradient boosting usually gives the best results. The next thing is that it's very easy to use. You can actually use a gradient boosting model as a black box model.
01:41
You give to gradient boosting your data, it trains the model and gives you a good result. You cannot do that with neural networks because you really need to be an expert to build a good architecture. And it also works well if you have not a lot of data, and that is something that happens many times in life,
02:00
that you do not have huge amounts of labeled data, but gradient boosting will still give you good result. Because of this whole set of reasons, it's used in production in many companies. It can be used in finance for predicting credit scoring, for example. It can be used in recommendation systems for understanding what are the best songs
02:22
that the person will like, or it can be used for sales predictions. It is also used heavily on Kaggle. There are many machine learning competitions, and for this type of data, the winning solutions in many cases are based on gradient boosting. Now, neural networks are very powerful,
02:42
and it would be really cool to use neural networks together with gradient boosting, and that is something that we are doing at Yandex. So, Yandex is a very large Russian company. We do search, we have taxi, we have self-driving cars. We have many, many different things; it's like the Silicon Valley of Russia. And what we are doing for many tasks,
03:01
we are using together neural networks and gradient boosting. For example, if you have a query and you want to select images to show to the person, then you first calculate neural features on these images, then you combine these neural features with other knowledge that you have about this image.
03:21
For example, about the site on which the image was, and then these features you give to gradient boosting. So, it is a good idea to combine those methods. They are not contradicting. They can work very well together. Gradient boosting is an iterative algorithm. It works like that. It usually builds decision trees.
03:41
First, it builds one decision tree; the training error is still large. Then it builds another decision tree so that the training error decreases, and it does this hundreds or thousands of times until the training error is very small and you can already capture complicated dependencies in your data. That is about gradient boosting. Now about CatBoost.
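Roughly, the iterative scheme described above looks like the following sketch. This is a simplified illustration using scikit-learn decision trees and a squared-error objective (where the gradient step amounts to fitting the residuals), not CatBoost's actual implementation:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def boosting_sketch(X, y, n_trees=100, learning_rate=0.1, depth=3):
        # Start from a constant prediction; each new tree is fit to the current
        # residuals (the negative gradient of squared error) and added to the ensemble.
        y = np.asarray(y, dtype=float)
        prediction = np.full(len(y), y.mean())
        trees = []
        for _ in range(n_trees):
            residuals = y - prediction                    # training error left so far
            tree = DecisionTreeRegressor(max_depth=depth)
            tree.fit(X, residuals)                        # the next tree reduces that error
            prediction += learning_rate * tree.predict(X)
            trees.append(tree)
        return trees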
04:00
The main thing, the main reason why you should be interested in CatBoost, is on the slide, and it's the quality comparison. There are several gradient boosting libraries in open source; the main competitors are LightGBM and XGBoost, and there is also the H2O algorithm. And as you can see on the slide, CatBoost wins over
04:22
all of these libraries on this set of publicly available data sets, by different amounts, sometimes by really a lot, like on Amazon, and that is the comparison after parameter tuning. You have the algorithm; you need to tune its parameters to get the best quality. This is after parameter tuning.
04:40
We also have a comparison without parameter tuning, and CatBoost with no parameter tuning beats all the other algorithms with parameter tuning on these data sets in all cases except one, where LightGBM outperforms CatBoost by a little. So that is the quality comparison. It's a good reason to try the library.
05:00
Now I will deep dive into the differences between cat boost and other libraries. The first difference is the kind of trees that cat boost is building. Different libraries or different algorithms build different kinds of trees. LightGBM builds trees node by node, and can get very deep, not symmetric trees. XGBoost builds trees layer by layer,
05:23
and the trees cannot get very deep, but they are not symmetric. CatBoost builds symmetric trees. An example of a symmetric tree you can see on the slide, and that is not an error on the second level: the feature is the same, it's 'weight', in all the nodes in this layer. On the next layer, if the tree were deeper,
05:42
there would be four nodes with the same feature. We observed that this type of tree helps a lot with hyperparameters, because when you are using this type of tree, the resulting quality does not change by a lot
06:01
when you are changing the hyperparameters. So the algorithm is stable to hyperparameter changes, and because of that, it gives very good results from the first run. So you don't really need to put a lot of effort into parameter tuning. You can make sure that the algorithm has converged,
06:20
so you had enough iterations, and then you just get the first model that the algorithm has given to you. Yeah, okay. That is about hyperparameters, and the next thing is about prediction.
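As a quick aside, that "train it with the defaults and just give it enough iterations" workflow is only a few lines of Python. This is a minimal sketch on synthetic data, not code from the talk:

    from catboost import CatBoostClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # No parameter tuning: defaults plus enough iterations to converge.
    model = CatBoostClassifier(iterations=500, verbose=100)
    model.fit(X_train, y_train, eval_set=(X_val, y_val))
    print(model.predict(X_val)[:5])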
06:40
With this type of trees, the prediction can be done very fast, and I will talk about this later. So that is the first difference. The second difference is the type of data that we are able to work with. There is this numerical data, like height or weight, numerical features, and it is clear how to work with numerical features if you are using gradient boosting on decision trees. You put a feature into the tree,
07:02
and if the value is less than something, then you go to the one side. If it's greater than this value, then you go to another side. So this is the way to use numerical feature in decision trees. It is not that obvious how to use categorical features in decision trees. Categorical feature is a feature with discrete number of,
07:20
with a discrete set of values, where the values are not necessarily comparable with each other by greater or less. An example of that would be occupation, or there could be high-cardinality categorical features, like user ID, for example. And the high-cardinality categorical features are the ones
07:44
where it is the hardest to work in an optimal way. So what do we do with categorical features? The first thing that we are doing is very simple. It's one-hot encoding, and that is something that other libraries also do.
08:00
What is this? Instead of one categorical feature, like occupation, that the person is a manager or a cook or an engineer, you have three binary features. Is the person the manager? Is the person a cook? Is the person in PR, or what was the third thing I said? So instead of one categorical feature,
08:23
you have many binary features. You could do this during the pre-processing, but if you do this during the pre-processing, then your dataset grows, you have a very large dataset, and also the training time will grow by a lot. So the good way to have one-hot encoding
08:44
is to let the algorithm do one-hot encoding for you. So you just say, this is the categorical feature, please do one-hot encoding. And the algorithm does it for you. It will be better in terms of speed, and it also will be better in terms of quality.
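In code, "telling the algorithm which columns are categorical" is the cat_features argument, and one_hot_max_size controls up to which number of distinct values one-hot encoding is applied. A small sketch with made-up data:

    import pandas as pd
    from catboost import CatBoostClassifier

    # The raw string column goes in as-is; no manual encoding during pre-processing.
    df = pd.DataFrame({
        "occupation": ["manager", "cook", "engineer", "cook", "manager", "engineer"],
        "age": [34, 41, 29, 52, 38, 45],
    })
    target = [1, 0, 1, 0, 1, 1]

    model = CatBoostClassifier(
        iterations=50,
        one_hot_max_size=10,   # one-hot encode categories with at most 10 distinct values
        verbose=False,
    )
    model.fit(df, target, cat_features=["occupation"])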
09:00
There are details of the algorithm that allow for that. I don't have enough time to explain everything. So the first thing is one-hot encoding. Other libraries also do that. And then we have this whole set of things that we are doing with categorical features that are more sophisticated. And these things give a very large boost in quality.
09:21
So one-hot encoding we do for features that have a small number of values. For high-cardinality features, we do the following: we calculate statistics based on the label values of the objects with this category value. The simplest thing you could do is the following. Let's say you have the data set
09:41
that we have on the slide. And there is this categorical feature that is occupation with two possible values, software development engineer, SDE, and PR. Now instead of one categorical feature, this categorical feature, we introduce one numerical feature that is equal to average label values
10:03
of all objects with this categorical feature value. Instead of SDE, we will have three divided by four: there are three ones and one zero, so the average label value is three divided by four. This is called target encoding.
10:22
This could work, but the problem is it doesn't work. It doesn't work because it leads to overfitting, because it leads to target leakage. An example where you can understand that would be, let's say you have the single object with a category value. In this data set, you would have only one SDE.
10:40
Then this SDE will have a label value one, and your new numerical feature value will be just equal to your label value. So during training, the algorithm remembers that it has a very good feature that is equal to target, and it makes all decisions based on that. During the prediction, you will not have this magic feature that is equal to label.
11:01
Because of that, you should not do this. So what are we doing? We are doing the following. We make a random permutation of all the data, and now the data is permuted, and you are looking at the object with some categorical feature value, the i-th object. Now you are calculating the same averages, but not including this object.
11:22
You are only looking at the objects before this one in the permutation. For this object, the new feature value will be two divided by three, because there are three objects with categorical feature value SDE before this one. Now, what else can we do? We can also use priors.
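A rough illustration of this ordered, permutation-based statistic in plain Python. It is a simplified sketch of the idea rather than the library's exact formula, and it already includes the prior that is introduced next, so the first object in the permutation is well defined:

    import numpy as np

    def ordered_target_statistic(categories, labels, prior=0.5, prior_weight=1.0):
        # Encode each object's category using only the objects that come before it in a
        # random permutation (plus a prior), so its own label never leaks into the feature.
        rng = np.random.default_rng(0)
        order = rng.permutation(len(labels))
        label_sums, counts = {}, {}
        encoded = np.empty(len(labels))
        for i in order:
            c = categories[i]
            encoded[i] = (label_sums.get(c, 0.0) + prior_weight * prior) / (counts.get(c, 0) + prior_weight)
            label_sums[c] = label_sums.get(c, 0.0) + labels[i]
            counts[c] = counts.get(c, 0) + 1
        return encoded

    print(ordered_target_statistic(["SDE", "SDE", "PR", "SDE", "SDE"], [1, 1, 0, 1, 0]))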
11:41
For the first object, you don't have any objects before it, so it would be zero divided by zero. To avoid these problems, we introduce priors. We add a prior to the numerator and the denominator, as you see in the formula here, and it gives a boost in quality to try different priors
12:01
and to try to find out which prior is a good one for this particular feature. So what we are doing, we are calculating those averages. We are enumerating different priors. What else you could do? You could try different random permutations, but you cannot use two random permutations to train one model,
12:22
because it will lead to target leakage in the same way as the average over the whole data set. What you can do, and what we are doing: you can train several models simultaneously. We are training four models simultaneously, and on each iteration, when we are selecting the tree structure, we flip a coin and select one of those models.
12:41
Each model has its own permutation. So we select one of these models and one of these permutations. For this model, we select the tree structure. Then, after that, we give this tree structure to all four models that we have, and then we calculate leaf values based on one more permutation. This gives a good boost in quality,
13:03
and the important thing is that you cannot do this during pre-processing. That is something you can only do if you are writing this inside the library. The next thing you can do is to look at feature combinations. What are categorical feature combinations? Let's say you have two categorical features,
13:22
a pet and a color, and the new categorical feature that is a combination of those two features will have the following values: blue cat, black cat, blue dog, black dog. So it's a new categorical feature that is a combination of features. The problem here is,
13:40
if you have several categorical features, the number of possible combinations grows exponentially with the number of features. So you cannot really calculate those statistics for each combination. What we do inside the algorithm is enumerate combinations in a greedy fashion. So we do not enumerate all of them, but we try to enumerate only some of the best ones.
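To see why you cannot just pre-compute statistics for every combination, here is a tiny back-of-the-envelope sketch with made-up cardinalities; the greedy, level-by-level enumeration described next is how the library avoids this blow-up:

    from itertools import combinations
    from math import prod

    # Hypothetical categorical features and their numbers of distinct values.
    cardinalities = {"pet": 4, "color": 12, "city": 1000, "user_id": 100000}

    total = 0
    for size in range(2, len(cardinalities) + 1):
        for combo in combinations(cardinalities.values(), size):
            total += prod(combo)   # distinct values of that combined feature
    print(total)                   # already billions of category values to track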
14:02
When we are building the next tree, we first, on the first level, we try only combinations of size one, and then we, on the next level, we are trying combinations of size two by adding features to the feature that we have already selected.
14:20
And we also calculate other statistics, like the frequency of the category in the data set; that also helps. That is the big thing about categorical features. Now, the next thing that we are doing that gives a boost in quality is called ordered boosting. Classical boosting is prone to overfitting, and that means that the resulting model will lose quality.
14:42
And that is because when you are building the tree, when you are building the leaf value, the leaf value is the estimate of the gradient of all the objects that would be in this leaf. And this estimate in classical boosting is biased because you are making this estimate
15:00
on the same objects that you have built the model on. It is easier to see if you look at the error: if you estimate the error in the leaf on the same objects that you have built the model on, then the error will be smaller, so it will be biased. The same thing happens with gradients. To overcome this problem, we use the same idea
15:22
that we have used for categorical features. We use these random permutations. Now you have your random permutation, and when you are building the tree structure, then for each object you are making the estimates based on a model that has never seen this object. You are making the estimate based only on the objects before the given one in the permutation.
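In the library this behaviour corresponds to the ordered boosting scheme, which, as far as I know, is exposed through the boosting_type parameter (a minimal sketch; on small data sets the library may already pick this mode by default):

    from catboost import CatBoostClassifier

    # Request the ordered scheme explicitly instead of classical ("Plain") boosting.
    model = CatBoostClassifier(
        iterations=500,
        boosting_type="Ordered",   # per-object estimates use only preceding objects
        verbose=False,
    )
    # model.fit(X_train, y_train) as usual; nothing else changes.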
15:42
And that gives a boost in quality in case you have a small data set or a noisy data set. If you know that there might be overfitting, it really helps. Okay, now I have told you about the main algorithmic things that we have in the library. Now let me tell you about the modes
16:01
that the algorithm is working on. There are three main modes, classification, regression, and ranking. And those three modes are in all gradient boosting libraries. First one is classification. And there are binary classification and multi-classification. Binary classification problem is, for example, if you want to predict if the person
16:20
will pay the credit or not. In your training data set, you will have labels, one if the person has paid the credit, zero if the person has not paid the credit, or you might have probabilities there. And for multi-classification, multi-classification is when you have more than two possible answers. For example, if you want to predict weather for tomorrow,
16:40
you want to predict type of clouds, and there are like six or nine possible types of clouds, and for that you can use multi-classification. The regression is when you want to predict numerical value. For example, if you want to predict taxi drive duration or to predict dollar exchange rate.
17:00
Those are regression problems. And there is also ranking. And ranking is a little bit more tricky. An example of a ranking problem would be: for this particular city, say Edinburgh, give me the top N hotels in this city. Let's say that your input data has ratings, so for some of the hotels you have a rating,
17:20
and those hotels are your training data set. For prediction, for the other hotels you do not have the ratings; you need to predict the ratings first, and then rank the hotels and select the top N of them. How would you solve this problem? One way to solve this problem would be to solve it using regression.
17:41
That means you really will try to predict a rating for each hotel, and then you will sort the hotels by rating and select the top N. But you don't need to do this. In this case, let's say in city A, all the hotels are really good. In city B, all hotels are really bad. And what you're trying to force your algorithm to do
18:01
is to say that all the hotels in one city are worse than all the hotels in the other city. And that is not cheap. And you don't need to do this to find the top N here and the top N there. You don't need to compare them with each other. So you don't need to learn the real rating.
18:22
And because of that, what you are doing is grouping the objects into groups, like here you would group the hotels by city, and you are trying to rank objects only inside each group. That is ranking. We use ranking a lot at Yandex, because we have search, we have ads,
18:40
we have recommendations from music and video. We have very many places where we need ranking. And because of that, we have many very powerful ranking modes, which XGBoost and LightGBM do not have. The first one, let's say it's just ranking, is in case if you have something like ratings in your data set,
19:00
or relevance, as when you have a search query and documents, and then for each document an assessor writes a number which is the relevance of this document. That is ranking. We have two modes for ranking. They are called YetiRank and YetiRankPairwise, and the difference between them is that the first one is really fast,
19:20
and the second one is really powerful, but slow. In most cases, we use YetiRankPairwise. The next mode is the pairwise mode, and that is the mode that you use if you do not have any ratings. You only have pairs of objects, and for each pair, you know that the first object is better than the second object.
19:41
Third is better than fourth, and so on. So you only have pairs as the input. We have two modes for PairWise ranking, and the difference between them is the same. And we also have three other modes. One is the mix between ranking and classification, which might be useful if you want to do ranking, but you have the zero and one labels as the input.
20:04
Another is ranking plus regression, and one more is specific for the task when you want to select top one best candidate. That is also a ranking task, but there might be the case then when you are not really interested in top N,
20:22
you are only interested in the top one. So we have this whole set of ranking modes, and if you are interested in ranking, I would strongly recommend you try them. They work really well, and they are used in production in different services at Yandex. Okay, now what else?
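As an aside, here is roughly how such a grouped ranking task is passed to the library: relevance labels plus a group id per row, and one of the ranking losses such as YetiRank. A sketch on made-up data:

    import numpy as np
    from catboost import CatBoost, Pool

    # Each row is an object (say, a hotel) with a relevance label; group_id says which
    # group (city or query) it belongs to, and objects are only ranked inside a group.
    X = np.random.rand(8, 3)
    relevance = [3, 1, 2, 0, 2, 2, 1, 0]
    group_id = [0, 0, 0, 0, 1, 1, 1, 1]

    train_pool = Pool(X, label=relevance, group_id=group_id)
    ranker = CatBoost({"loss_function": "YetiRank", "iterations": 100, "verbose": False})
    ranker.fit(train_pool)
    scores = ranker.predict(train_pool)   # higher score means ranked higher within its group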
20:40
We have talked about the algorithm, how it works. We talked how to use it, which modes are there. Now about the speed. Important things are CPU speed, GPU speed, and prediction speed. CPU speed, when we just released, CatBoost was really slow, and everyone told us that, and we worked a lot to make speed ups,
21:02
and currently, the situation is the following. On most of the data sets, we will be two to three times slower than LightGBM, so LightGBM is the fastest one, and with XGBoost, we might be the same or about two times slower, something like that. So the difference is not that big,
21:21
but we still are a bit slower than other libraries. With GPU, the situation is completely different. We are very, very fast in GPU. We are about 20 times faster than XGBoost and two to three times faster than LightGBM. And the important thing is that CatBoost is super easy to use, as opposed to LightGBM.
21:43
You just pip install the library, there is a flag, task_type equals GPU, and you use it, so it's really fast and it's really easy to use. An important thing about GPU is that the GPU speed-up grows with the amount of data.
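That flag is literally one constructor argument (a minimal sketch; it needs a CUDA-capable GPU to actually run):

    from catboost import CatBoostClassifier

    model = CatBoostClassifier(
        iterations=1000,
        task_type="GPU",   # train on GPU instead of CPU
        devices="0",       # which GPU device(s) to use
    )
    # model.fit(...) as usual; nothing else in the code changes.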
22:01
The more data you have, the bigger the speed-up. For very large data, if you have millions of objects, you will get a speed-up of up to 40 to 50 times. That will be on the newest GPUs; on older GPUs, the speed-up will be about four times.
22:22
About prediction time. We care about prediction time also, and CatBoost prediction time is 30 to 60 times faster than XGBoost and LightGBM. It's not that everyone cares about speed up, but we care and we are proud that we are so fast.
22:41
And also, I wanted to mention a few other things, and that is how to explore your model. If you want to understand what your model is doing, you need to look at feature importances, so which features are the most important ones. You can look at feature interactions, which pairs of features work really well together.
23:00
There is also per-object feature importance: for this object, which features are the most important? For that, we are using SHAP values, and there is a SHAP library that shows the visualization for that. So each feature has an importance; it might have a positive importance,
23:20
because of this feature the predicted cost grows, because of that feature the cost is lower, and so on. There are different plots you can look at to understand more about your features. There is also a way to understand which objects are the most important ones. So let's say you have this object,
23:41
and you want to see which objects in the training data set are responsible for this result. There is an influential-documents tool for that. And there is also a way to understand whether a feature's influence is statistically significant. For that, we have feature evaluation.
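All of these explorations are available from the trained model object. A rough sketch on synthetic data (the string values for the type argument are what I believe the current API accepts):

    from catboost import CatBoostClassifier, Pool
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    pool = Pool(X, y)
    model = CatBoostClassifier(iterations=100, verbose=False).fit(pool)

    importances = model.get_feature_importance(pool)                       # per-feature importance
    interactions = model.get_feature_importance(pool, type="Interaction")  # pairs of features
    shap_values = model.get_feature_importance(pool, type="ShapValues")    # per-object contributions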
24:04
We have tutorials for all of that, and I recommend you look into these tutorials. There is also a bunch of other functionality in the library. Besides training, there is a lot of visualization.
24:21
You can look at how your error changes during training using a Jupyter notebook, using the separate CatBoost viewer, or using TensorBoard. You can also try training on a data set with missing feature values. You can use cross-validation. We also have visualization for cross-validation
24:41
that is also running inside Jupyter notebook. So there is a bunch of stuff to try. I just want to mention a few important parameters. So if you want to tune your algorithm, if you want to get the best quality, these are the parameters that you want to tune.
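The slide itself is not part of the transcript, but based on the talk and the Q&A, the parameters in question include the number of iterations together with the learning rate, the tree depth, and bagging_temperature and random_strength. Checking one candidate setting with the built-in cross-validation looks roughly like this (a sketch on synthetic data):

    from catboost import Pool, cv
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    params = {
        "loss_function": "Logloss",
        "iterations": 500,
        "learning_rate": 0.05,
        "depth": 6,
        "bagging_temperature": 1.0,
        "random_strength": 1.0,
        "verbose": False,
    }
    results = cv(Pool(X, y), params, fold_count=5)   # per-iteration metrics as a DataFrame
    print(results[["test-Logloss-mean", "test-Logloss-std"]].tail())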
25:04
And we have tutorials and documentation on how to tune parameters for quality. We also just published a tutorial on how to change parameters if you want to get the most speed. I encourage you to look into this documentation.
25:27
Yes, and here are a few links. The last one is the CatBoost tutorials repository on GitHub. And we have also published there a tutorial with homework.
25:41
So if you want to learn how to use gradient boosting, you can try this tutorial with homework. And do the task there, answer the questions. And we have tutorials for all the functionality that I told you about. So if you run through these tutorials,
26:01
you will learn everything. And with that, I am ready to answer the questions. Questions from the audience.
26:23
By the way, can I also ask the question? How many of you have heard or used Cat Boost before this talk? Okay, that's half, half, okay. Hi, thanks very much for the talk. So you mentioned at the start that it's possible
26:42
to combine neural networks and gradient boosting. Yes. Can you give us some example applications? Yes, so first of all, we have a tutorial how to use neural networks on text together with gradient boosting. The idea is the following. You train a separate neural network. For example, a neural network that will say,
27:04
that will compare the image with text. And then from that you get numerical features like distances. And those distances you combine with other features. Like you have the image and you have a lot of other information about this image. Not only what you see in the image,
27:21
you have information from the site, from how many people clicked there and so on. And then you combine this all together and then you put this into gradient boosting. Okay, thank you. I have a question about the categorical features you mentioned. How did you use those statistics when you do prediction?
27:43
Because you don't know the true value. I'm not sure and this is the question for you. So how we use those statistics during training? During prediction. So what we are doing, we are reading the training data set and for each category that you have seen,
28:03
you have a value that is calculated based on all the training data set. Then you write this value to the hash table and when you are predicting, what you are doing is you are adding this test object to the end of your training data set
28:21
and then the feature value for this object will be the average based on all the objects before given one, that means on all the training data set. And this is the value that you have saved in the hash table. So basically you use the average value of the training data set. Yes, exactly.
28:43
Thanks a lot for the talk and for the library. I was just wondering, why would you use scikit-learn anymore if you have something like this? Like do you have any application? No, scikit-learn has a lot of stuff including gradient boosting.
29:00
Gradient boosting libraries like CatBoost, LightGBM, and XGBoost work better than scikit-learn's gradient boosting. So if you want to use gradient boosting, you'd better go with a different library than scikit-learn, but there is a bunch of stuff in scikit-learn that is very useful.
29:22
Yeah, that's true, but for anything like classification or regression and all of this, you would only use CatBoost at Yandex, right? So we use CatBoost for many, many different tasks. I don't think we are training any scikit-learn classifier
29:40
for our production purposes. Wow, thank you. But we not only use gradient boosting, there are other also useful algorithms. There are neural networks, there are, I don't know, linear models, nearest neighbors, we use all of that. But gradient boosting is a very important algorithm.
30:06
Hi, thanks. I was just wondering, do you have any tools within the library to extract the prediction path of a particular instance similar to a tree interpreter for a random forest? Could you repeat, please?
30:20
So when you make prediction for a given instance, is it possible, do you have some tools to extract the path this particular instance took to make that? Oh, like for each, what path in each tree? So we are currently working on providing the JSON model, and that will be the way to discover the paths.
30:48
I have a quick question. How well do you integrate with scikit-learn? Because I've been using XGBoost for ranking problems, and it's insanely hard to not use the basic XGBoost thing if you want to do cross-validation
31:01
or anything with ranking. So how well do you integrate with the rest of the ecosystem, or do I have to use CatBoost all the way? Yeah, so we are trying to integrate as best as we can. We are very much compatible with XGBoost, so if you are already using XGBoost, it will not be that hard to switch.
31:21
There are some methods like, I don't know, there was some method that... so we do not support all the methods in scikit-learn, but we know about one of them that does fail, and we plan to fix it soon, and if you see anything that doesn't work, you can just open an issue on GitHub,
31:41
and we will fix it. Thank you for your talk. Do you have any experience about applying cat boost to natural language processing use case or data sets? Yes, so for natural language processing, what we do, we usually use neural networks,
32:02
and we sometimes do, on top of what we are doing when using neural networks, we do use on top of that some gradient boosting. For example, we have the dialogue assistant that generates the answers.
32:21
You ask something, then there is this neural generator for the answers of the assistant to you, and it generates many answers, and then on top of that, there is a re-ranker that is based on gradient boosting that extracts some features from each of the answers and then re-ranks them,
32:41
but the core is usually based on neural networks. Thank you for the talk. In production, we use neural networks and like GBM a couple of months ago, we tried to use cat boost,
33:01
and basically our use case is we create word embeddings, some other features, and one-hot encoding, and then we pass that to LightGBM, and we spent a lot of time fine-tuning LightGBM, but when we tried to substitute it with CatBoost, the results weren't better,
33:21
so how easy is it to fine-tune CatBoost? Do we need to spend, again, two months on grid search and so on? Any ideas? It should not be like that; there is a set of parameters that you can change to try to improve the quality,
33:43
and those parameters are listed here. It's here. So you can try to fine-tune these parameters. One of them, the number of iterations, you don't really need to fine-tune; you just need to find the point of convergence,
34:01
and these other ones, you probably have to fine-tune. With depth, the situation is the following. You don't need to enumerate through all the depths. You only, you basically try the depth six, that is the default one, and for some data sets, it is important to have bigger depth. So you try six, you try 10, and then if 10 is better, then you need to fine-tune
34:22
between eight and nine, something like that. So those are things to fine-tune. Yeah, and one more thing about the one-hot encoding that you mentioned: it might lead to slow training
34:41
for CatBoost. So if it is possible for you to not do it during pre-processing and to give the algorithm the possibility to do it for you, then it will probably be better. And one more thing about word embeddings: it usually doesn't work perfectly if you give the word embeddings to gradient boosting
35:02
raw, like this, as 300 numerical features. Usually it's better to try to find some distances. Yeah. Hi. I don't do ML as a part of my day job,
35:22
but I've used cat boost as a key ingredient in many of the ensembles in machine learning contests with great success, so thank you for that. So my question is related to the previous question. Do you have any tips on hyperparameter optimization? So I mean, the usual recommendation is grid search,
35:42
but usually I use something a little more intelligent when it comes to optimizing hyperparameters for neural nets and things like that. Anything along those lines for CatBoost? Yeah, there is hyperopt, which probably works better. I would recommend this library. Yeah, I incidentally use hyperopt for neural nets.
36:01
Okay. Is that what you'd recommend for parameter tuning on cat boost as well? Yeah. It's not particularly for cat boost, it's like for any. Yeah, it's quite generic. Yeah. Thank you.
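For what it's worth, a minimal sketch of what that looks like with hyperopt; the search space and metric here are just an example, not a recommendation from the talk:

    from catboost import CatBoostClassifier
    from hyperopt import fmin, hp, tpe
    from sklearn.datasets import make_classification
    from sklearn.metrics import log_loss
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    space = {
        "depth": hp.quniform("depth", 4, 10, 1),
        "learning_rate": hp.loguniform("learning_rate", -4, -1),
        "l2_leaf_reg": hp.loguniform("l2_leaf_reg", 0, 3),
    }

    def objective(params):
        model = CatBoostClassifier(
            iterations=300,
            depth=int(params["depth"]),
            learning_rate=params["learning_rate"],
            l2_leaf_reg=params["l2_leaf_reg"],
            verbose=False,
        )
        model.fit(X_tr, y_tr, eval_set=(X_val, y_val))
        return log_loss(y_val, model.predict_proba(X_val)[:, 1])   # hyperopt minimizes this

    best = fmin(objective, space, algo=tpe.suggest, max_evals=30)
    print(best)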
36:20
Hi, thanks for a nice talk and for open sourcing the library. I have two quick questions. One of the bullet points you showed previously was that the library handles missing values, which essentially means you can feed in NaNs without imputation. Which is very useful, yes. And the other question is, do you support sparse matrices as input for the.
36:41
Yes, that is one thing that we do not support yet. Okay. And we are currently working on that. Okay, so there's some. Yeah, the plan for the next few releases is the following. We will be adding the sparse matrix support. We are adding distributed training on Spark.
37:02
We are also working on that. And we are adding multi-classification on GPU. It will come really soon. And we are adding the JSON model and improvements in our package. That is the plan for the next releases. But sparse data is a very large thing to do,
37:22
so it's not yet ready. But we are working on that. Perfect, thanks. Hi, I found the quantile regression option in gradient boosting quite useful for my applications. So do you have something similar in cat boost, which could provide the prediction intervals?
37:42
Do you have what? The quantile regression option? Yes, we do have a quantile regression. So you could derive prediction intervals based on that? We do not provide prediction intervals, no.
38:00
So there is a loss function for quantile regression, but there are no prediction intervals. OK, thank you. We have time for a few more questions. If anyone have the microphone, yeah.
38:20
And I am very open to feature requests now. How will it handle imbalanced data?
38:43
So there is always a problem with imbalanced data. We have the possibility of re-weighting the objects, the scale_pos_weight parameter, the same as in XGBoost and LightGBM. But that is the only thing that we are doing specifically
39:01
for imbalanced data sets. So it depends. We know some imbalanced data sets where it works really well, like Amazon, which was on the slide, and some data sets where it does not work that well. Because your tree structures are balanced, right?
39:22
Yes. It's very sensitive to imbalanced data. I don't think that this tree structure is worse than other tree structure for imbalanced data sets.
39:42
So we know that there is this problem with imbalanced data sets, and we are trying to figure out how to fix this. Other libraries also have the same problem. OK, thank you. We have time for one last question.
40:10
Thanks for the talk. I have a question about those parameters there. What do bagging temperature and random strength represent? Yes. OK, that is a very good question.
40:20
The bagging temperature, so when you are selecting the tree structure or when you are selecting the next tree, you are doing some bagging. So what you can do, you could do Bernoulli sampling. We select some objects and do not select some other objects. But you also can do other sampling. What we are doing, we are, by default, in regression and classification, we
40:42
are sampling from exponential distribution. And what we want to do there, we want to balance between having no sampling at all and having sampling from exponential distribution or heavier. And this bagging temperature changes that. So if it's set to 0, then all the weights are equal to 1.
41:02
If it's set to 1, then you have sampling from the exponential distribution, and in between there is this balance. The random strength is one more parameter. When we are selecting the tree structure, we are trying to put a split in the tree; for each possible split that we try to put in the tree, the split gets a score.
41:22
And this score is how much this split improves my ensemble. What we are doing, we are adding to this score a normally distributed random variable. And this random strength is the multiplier for this variable.
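Both of these are just constructor parameters (a minimal sketch; the values shown are the kind of thing you would search over, not recommendations):

    from catboost import CatBoostClassifier

    model = CatBoostClassifier(
        iterations=500,
        bagging_temperature=1.0,  # 0 = no sampling (all weights equal 1), 1 = exponential weights
        random_strength=1.0,      # multiplier for the random noise added to split scores
        verbose=False,
    )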
41:41
And this helps a lot to overcome overfitting. That is one more surprising hack that helps to improve the quality. OK, I want to thank Anna again for this enlightening talk. And thank you very much for the audience. Thanks.