Machine Learning: Power of Ensembles


Formal Metadata

Title
Machine Learning: Power of Ensembles
Title of Series
Part Number
167
Number of Parts
169
Author
Subramanian, Bargava
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
DOI
Publisher
EuroPython
Release Date
2016
Language
English

Content Metadata

Subject Area
Abstract
Bargava Subramanian - Machine Learning: Power of Ensembles

In machine learning, combining many models has proven to provide better results than single models. The primary goal of the talk is to answer the following questions: 1) Why and how do ensembles produce better output? 2) When data scales, what is the impact, and what are the trade-offs to consider? 3) Can ensemble models eliminate expert domain knowledge?

-----

It is relatively easy to build a first-cut machine learning model. But what does it take to build a reasonably good model, or even a state-of-the-art one? Ensemble models. They are our best friends; they help us exploit the power of computing. Ensemble methods aren't new: they form the basis for some extremely powerful machine learning algorithms like random forests and gradient boosting machines. The key point about ensembles is that consensus from diverse models is more reliable than a single source. This talk will cover how we can combine model outputs from various base models (logistic regression, support vector machines, decision trees, neural networks, etc.) to create a stronger, better model output.

The talk will cover various strategies to create ensemble models. Using third-party Python libraries along with scikit-learn, it will demonstrate the following ensemble methodologies: 1) Bagging 2) Boosting 3) Stacking. Real-life examples from the enterprise world will be showcased where ensemble models consistently produced better results than the single best-performing model. There will also be emphasis on feature engineering, model selection, and the importance of bias-variance and generalization. Creating better models is the critical component of building a good data science product.
OK, so let us get started. This talk is about ensemble models. Some of you might have heard about them; some of you may have participated in a Kaggle competition, where almost invariably the winning solution is an ensemble model. So we will look at what ensemble models are and how to build them. I am a data scientist, and I have been doing machine learning for about 14 years. Before I start with the talk,
let me show you a picture: a tree riddled with bullet holes, about 150 of them, and a shooter who fires three shots and hits maybe 20 per cent of the time. The moral of the story is this: you can use complicated models, simple models, deep learning models, but never lose the big picture. That is extremely important. In practice, the models you build may be so complicated that you no longer see how they relate to the domain. So, what is the
machine learning process? We have input data; it gets fed into a learning algorithm, which creates a machine learning model, and that model is used to make predictions. That is what is taught in schools and in courses.
But in reality the input data comes in a different form: you have to transform it, modify it, clean it up, and do a whole lot of other work before you can create features from it. Those features get fed into the model, and you use the final model to predict. In reality it is even more complex, because there are so many ways in which you can create features, and so many different kinds of models you can fit: linear models such as linear regression, tree-based models such as decision trees, ensemble tree models such as random forests and gradient boosting machines, deep learning models, support vector machines, and so on. How would you choose among all of this? It is extremely difficult. There are two
important steps in the process: you have to create features from the input dataset, and you also need to select models in some way. The challenge here is that the model is only as good as its features, because you are the person creating the features. The model is only as good as the features created by you, and this is where a lot of
time is spent in traditional machine learning: you spend a lot of time identifying features, creating features, and generating transformations of all kinds. This was made famous by a New York Times article, which called it the janitorial work of the data science process and said it takes about 80 per cent of the time. The challenge after you
create features is this: with the same features, different models give different predictions. Why is that? Because the solution space is so huge. It is defined by your features and your models, and different models search different regions of that solution space. Different algorithms reach different solutions, so it is hard to figure out which one you should pick to get better generalization properties. OK, given that you
know how to create features, let us talk about how to improve model performance. You have different models to select from; each model has different parameters to tune, called hyperparameters; and you can try different sets of features for each of the models. How can you create the final model from all of this? You cannot try all possible combinations; the search space is exponential in nature, so exhaustive search is definitely out. This is where ensemble models help us come up with a strategy for solving this. Let us talk about
what ensemble models are, with a small example on a dummy dataset. I build three kinds of models: a linear model (logistic regression, since this is a classification problem), a random forest model, and a gradient boosting model. All of them are very easy to build: the logistic regression and the random forest come directly from scikit-learn, and the gradient boosting model is implemented by the XGBoost library. Each model then makes predictions on the test dataset. The three models give different predictions on the same data points, yet they all have the same accuracy, say 70 per cent. How do we come up with a final prediction
when all the models look equally good? If you use a cross-validation metric, you get 70 per cent for all three of them. But there is an easy way to combine these three models into something better: for each row, take the value that occurs most often, the max count. In the first row, 1 occurs the most, so I set my output to 1; in the second row, 0 occurs the most, so I assign 0. Do this everywhere and, almost miraculously, you end up with 90 per cent accuracy. This is the simplest ensemble, majority voting, which is also a kind of averaging in some sense.
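The voting trick described above can be reproduced with a tiny stdlib-only sketch. The labels and predictions below are hypothetical, constructed so that each model is 70 per cent accurate on its own while the majority vote reaches 90 per cent, mirroring the numbers in the talk.

```python
from collections import Counter

# Hypothetical true labels and three model predictions on 10 test points.
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
m1    = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]  # wrong on points 0, 1, 2
m2    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # wrong on points 2, 3, 4
m3    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # wrong on points 5, 6, 7

def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)

def majority_vote(*preds):
    # For each data point, output the most common label across the models.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*preds)]

ensemble = majority_vote(m1, m2, m3)
print(accuracy(m1, truth), accuracy(m2, truth), accuracy(m3, truth))  # 0.7 0.7 0.7
print(accuracy(ensemble, truth))  # 0.9
```

The vote only fails on point 2, the single point where two of the three models are wrong at once; everywhere else at least two models are right, which is exactly why diverse errors help.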
You are essentially using CPU cycles as a proxy for creating more features and better models. So let us talk about strategies, some clever techniques you can use to search the solution space.
Having said this, I should point out that ensemble models are not new; they have been around for a very long time. Techniques like random forests and gradient boosting machines are themselves ensemble models, but they use the idea at a fairly simple level. Advances in computing power have led to far more powerful ensemble techniques. Ensembles were used predominantly in academia for a long time, until the Netflix Prize competition around 2007, where the winning approach was an ensemble model. That is when industry took note that you can use ensemble models, and now you see a lot of ensemble models in production.
In essence, this is what it looks like: you have input data, you create different kinds of models (it is very important that they are different kinds of models, not the same model), you combine them using some logic, and then you use that to make your final prediction. This is what the architecture of an ensemble model looks like.
What are the advantages of doing this? Well, it definitely improves accuracy; not always, but most of the time it improves accuracy. The ensemble also becomes more robust: the variance of the output reduces. And because the pieces are independent, you can do things in parallel; it lends itself nicely to parallelization, so you can run things much faster. Two
things are important here: you have to select different models, so you need model diversity, and there has to be some way to combine the models. Those two things are very important for ensemble models. There are four important techniques you can use to create this diversity: you can use different training datasets by sampling the rows; you can sample the features; you can use different algorithms (random forests, neural networks, and so on); and each algorithm can have different hyperparameters. This allows you to build many different models.
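The first two diversity techniques, sampling the rows and sampling the features, can be sketched with the standard library alone. The dataset and feature names below are made up purely for illustration.

```python
import random

random.seed(42)  # reproducible sampling

# A made-up dataset: 8 rows, 4 named features.
features = ["age", "income", "clicks", "visits"]
rows = [[i, i * 2, i % 3, i + 1] for i in range(8)]

def bootstrap_rows(data):
    # Bagging-style: sample rows with replacement, same size as the original.
    return [random.choice(data) for _ in data]

def feature_subset(names, k):
    # Random-subspace-style: pick k features without replacement.
    return random.sample(names, k)

# Each base model would be trained on its own bootstrap sample and/or its
# own feature subset; the differing views of the data produce diverse models.
sample_a = bootstrap_rows(rows)
cols_a = feature_subset(features, 2)
print(len(sample_a), sorted(cols_a))
```

Random forests combine both tricks internally; doing it by hand like this is how you would diversify arbitrary base learners.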
Once you have the models, the combination logic can be voting, which is what we used earlier; it can be averaging, if you use probabilities rather than a single class label as your output; it can be weighted voting; or it can be blending and stacking. Stacking looks something like this: you split your training data into two parts; you use the first part to train the base models and produce their outputs, and those outputs are used as the input for your second-stage model. The second-stage model is itself a model whose input is the output of the previous models.
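A minimal stacking sketch along these lines, assuming scikit-learn is available; the synthetic dataset and the particular base and meta models are illustrative choices, not the speaker's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
# Hold out a test set, then split the rest into two parts as described.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_a, X_b, y_a, y_b = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Stage 1: train diverse base models on part A.
base = [LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(random_state=0),
        RandomForestClassifier(n_estimators=50, random_state=0)]
for model in base:
    model.fit(X_a, y_a)

def meta_features(models, X):
    # The base models' predicted probabilities become the second stage's input.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# Stage 2: the meta-model learns how to combine the base outputs, on part B.
meta = LogisticRegression()
meta.fit(meta_features(base, X_b), y_b)

acc = meta.score(meta_features(base, X_test), y_test)
print(round(acc, 3))
```

The important detail is that the meta-model is fitted on data the base models never saw (part B), otherwise it would just learn to trust whichever base model overfitted hardest.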
So how would you do this in Python? For the base models you can use scikit-learn, which gives you Pipelines, and you can plug in other third-party libraries, for example Keras for deep learning. You set everything up as a pipeline and use randomized search with cross-validation to tune it. Once you have your base models, if you want to do weighted averaging you can use the hyperopt library, a hyperparameter-optimization library. If you want to do stacking, you create yet another model, for example an XGBoost model or a logistic regression, on top of the base-model predictions to produce the final prediction.
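The pipeline-plus-randomized-search step might look like the following, again assuming scikit-learn (with SciPy for the parameter distributions); the dataset and parameter ranges are illustrative.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Preprocessing and model wrapped in a single Pipeline object.
pipe = Pipeline([("scale", StandardScaler()),
                 ("rf", RandomForestClassifier(random_state=1))])

# Randomized search samples a handful of points instead of the full grid.
params = {"rf__n_estimators": randint(20, 80),
          "rf__max_depth": randint(2, 8)}
search = RandomizedSearchCV(pipe, params, n_iter=5, cv=3, random_state=1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Each base model of the ensemble can be one such tuned pipeline, fitted independently of the others.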
Randomized search is faster than what you would do with a grid search, much faster. And hyperopt, as I said, is for optimization: it searches the parameter space, and it is a widely used optimization library.
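Setting hyperopt itself aside, the weighted-averaging idea it would be optimizing can be shown with a tiny stdlib search over a single blending weight; the probabilities and labels below are made up, and the coarse loop is only a stand-in for what a real optimizer does.

```python
# Hypothetical predicted probabilities from two base models, plus true labels.
p1 = [0.9, 0.8, 0.3, 0.4, 0.7, 0.2]
p2 = [0.6, 0.4, 0.1, 0.6, 0.9, 0.3]
truth = [1, 1, 0, 0, 1, 0]

def accuracy(w):
    # Blend the two probability streams with weight w, threshold at 0.5.
    blended = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
    preds = [1 if p >= 0.5 else 0 for p in blended]
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

# Coarse search over the blending weight; hyperopt would search this
# space far more cleverly, and over many weights at once.
best_w = max((w / 10 for w in range(11)), key=accuracy)
print(best_w, accuracy(best_w))
```

With these made-up numbers, any weight above 0.5 lets the stronger model dominate and classifies every point correctly, so the search settles on the first such weight.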
As for parallelization: ensemble models lend themselves nicely to it, because you can train each model independently. Normally, when you run one model, you run its cross-validation; instead of doing that serially, you can scale out and run each model in parallel, and joblib does this task extremely nicely. You can also log things, so that you know which model takes a lot of time and really should be optimized; joblib gives you enough flexibility for that.
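The run-each-model-in-parallel idea can be sketched with joblib (which ships alongside scikit-learn); the "models" here are stand-in functions doing dummy work, so the example stays self-contained.

```python
from joblib import Parallel, delayed

def train_model(name, workload):
    # Stand-in for fitting one base model and reporting a score.
    score = sum(i * i for i in range(workload)) % 97  # dummy work
    return name, score

jobs = [("logistic", 10_000), ("random_forest", 20_000), ("xgboost", 30_000)]

# Each base model "trains" in its own worker; results keep the input order.
results = Parallel(n_jobs=2)(delayed(train_model)(n, w) for n, w in jobs)
print(dict(results))
```

In a real ensemble, `train_model` would wrap `model.fit` plus cross-validation, and the per-model timings joblib can report tell you which base model to optimize first.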
If you really want to know more about ensembles, I would suggest you look into Kaggle competitions; the winning models these days are invariably ensemble models. Here is an example: this is the approach used by the winner of the CrowdFlower search-relevance competition on Kaggle, where search results had to be classified by their relevance. You can see that once you get past the feature-extraction stage, different models vote, and the final model is a combination of all of these models.
Having said all this, not everything is rosy about ensemble models. If you want an interpretable model, this is clearly not what you would use: interpretability goes for a toss. The same is true of time: the effort it takes to gain the last 2 per cent, or 1 per cent, of accuracy may not make sense in actual practice. That is a big disadvantage; it takes a really long time to improve accuracy. So you would probably go for a full-blown ensemble methodology only when accuracy is the most important metric.
As for the cool stuff around this: we actually created a package for building ensemble models a couple of years back; it is open source, and you can find it online. That is all I have. Thanks.
OK, we will take questions now. Question: Are the slides available? Answer: Yes, I will share them. The documentation is still a work in progress, but the repository has notebooks which show how we use the package. Thank you so much.