
Machine Learning: Power of Ensembles


Formal Metadata

Title
Machine Learning: Power of Ensembles
Title of Series
Part Number
167
Number of Parts
169
Author
Bargava Subramanian
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Bargava Subramanian - Machine Learning: Power of Ensembles

In machine learning, combining many models has proven to deliver better results than single models. The primary goal of the talk is to answer the following questions:

1) Why and how do ensembles produce better output?
2) When data scales, what's the impact? What are the trade-offs to consider?
3) Can ensemble models eliminate expert domain knowledge?

-----

It is relatively easy to build a first-cut machine learning model. But what does it take to build a reasonably good model, or even a state-of-the-art model? Ensemble models. They are our best friends. They help us exploit the power of computing. Ensemble methods aren't new: they form the basis for some extremely powerful machine learning algorithms like random forests and gradient boosting machines. The key point about ensembles is that consensus from diverse models is more reliable than a single source. This talk will cover how we can combine model outputs from various base models (logistic regression, support vector machines, decision trees, neural networks, etc.) to create a stronger/better model output.

This talk will cover various strategies to create ensemble models. Using third-party Python libraries along with scikit-learn, it will demonstrate the following ensemble methodologies:

1) Bagging
2) Boosting
3) Stacking

Real-life examples from the enterprise world will be showcased where ensemble models consistently produced better results than the single best-performing models. There will also be emphasis on the following: feature engineering, model selection, and the importance of bias-variance and generalization. Creating better models is the critical component of building a good data science product.
Transcript: English (auto-generated)
Subramanian. He will explain some stuff about machine learning. Okay, let's go. Good afternoon, everyone. This talk is about what ensemble models are. Some of you might have heard about them: if you've ever participated in a Kaggle competition, almost invariably every winning solution is an ensemble model. So in today's talk we'll learn what ensemble models are and how to build them using Python. I'm a data scientist at Cisco and have been doing machine learning for about 14 years.

Before I start the talk, I just want to give a short puzzle. There's a bunch of birds in a tree; 150 birds, actually. A hunter comes along and fires three shots at the birds. His probability of hitting the target is 0.2, so 20 percent of the time he hits. He has fired his three shots and one bird is hit. The question is: at the end of it, how many birds remain on the tree? Yes? Yes, they all fly away, right? So you can use complicated models, ensemble models, deep learning models, but never lose the big picture. It's extremely important. This is an example all of you can relate to, but in practice the models you build may be so complicated that you may not even understand whether they relate to the domain or not.
What's the machine learning process? We have input; it gets fed into a learning algorithm, which creates a machine learning model, and that is used to make predictions. That's easy to say, and that's what is taught in school, in any course. But in reality the input data arrives in a different format: you have to transform it, modify it, clean it, and do a whole lot of work before you create features from your input data. Those features get fed into models, and you use the final model to predict. In reality it's even more complex, because there are so many ways to create features and so many different kinds of models you can fit. It's not just one kind of model: you have linear models (linear regression, logistic regression), tree-based models (decision trees), ensemble tree-based models (random forests, gradient boosting machines), deep learning models, support vector machines. It's a lot. How would you do something like this? It's extremely difficult. How am I really going to select something from all of this?

Recapping, there are two important steps in the process: you have to create features from your input data set, and you also need to select models in some way. The challenge here is that the model is only as good as you, because you're the person creating the features. The model is only as good as the features you create, and this is where a lot of time is spent in traditional machine learning practice: identifying features, creating them, generating them, applying transformations, all kinds of work. This was made famous by a New York Times article that called it the janitorial work of the data science process, taking about 80 percent of the time.
The challenge after you create the features is that even with the same features, different models give different predictions. Why is that? Because the solution space is huge. It's defined by the entire space of your features, and different algorithms search in different regions of that solution space, so it's hard to figure out which one to pick so that it generalizes better.

Okay, given that you know how to create features, we'll now talk about how to improve model performance. You have different models to select from, each model has different parameters you can tune (these are called hyperparameters), and you can try different sets of features for each of the models. How can you create a final model out of all this? You cannot try all possible combinations; that really explodes, it's exponential in nature. Exhaustive search is definitely ruled out, and this is where ensemble models help us come up with a strategy for solving this.
We'll talk about what ensemble models are with a toy example. On a dummy data set we'll create three kinds of models for a classification problem: a linear model (logistic regression), a tree-based random forest model, and a gradient boosting model. It's very easy to build in Python. Logistic regression is directly there in scikit-learn; you just use the linear model called LogisticRegression. Gradient boosting machines are implemented by an awesome library called XGBoost, which is better than the one in scikit-learn. Then you have the predictions on the test data set. The assumption here is that the true labels on the test set are all ones, and this is how the models come out: they all make different predictions on the same 10 data points, but they all have the same accuracy, 70 percent. How do we build a final prediction when all three models are equal? If you use cross-validation as a metric, you're going to get 70 percent for all three of them.
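A rough sketch of that setup follows. The synthetic data here is an assumption, generated with scikit-learn's make_classification rather than the speaker's actual dummy set:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Dummy binary-classification data (stand-in for the talk's toy data set)
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three diverse base models: linear, tree ensemble, gradient boosting
models = {
    "logit": LogisticRegression(),
    "rf": RandomForestClassifier(n_estimators=100, random_state=42),
    "xgb": XGBClassifier(n_estimators=100, random_state=42),
}
preds = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds[name] = model.predict(X_test)
    print(name, (preds[name] == y_test).mean())  # per-model accuracy
```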
But if you look at what happens here, there's an easy way to combine these three models into something better. You take the value that occurs most often, a max count: in the first row, one occurs the most, so I set my output to one; in the second row, zero occurs more, so I set it to zero. Do this, and you miraculously end up at 90 percent accuracy. This is your simplest ensemble, maximum counting, which is just a majority vote. In some sense you're using CPU as a proxy for creating more features or more elaborate models. So we're going to talk about strategies that use clever techniques to search your solution space.
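A minimal sketch of that max-count trick. The exact prediction vectors from the slide aren't in the transcript, so these are constructed to match the stated numbers: each model 70 percent accurate, the vote 90 percent:

```python
import numpy as np

y_true = np.ones(10, dtype=int)  # assumption: the true test labels are all ones

# Constructed predictions matching the talk's numbers (70% accuracy each)
pred_lr  = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
pred_rf  = np.array([1, 1, 1, 1, 1, 0, 0, 1, 1, 0])
pred_gbm = np.array([1, 1, 1, 0, 0, 1, 1, 1, 1, 0])

for p in (pred_lr, pred_rf, pred_gbm):
    assert (p == y_true).mean() == 0.7

# Max count per column: with three 0/1 votes, the majority needs two agreeing
votes = np.vstack([pred_lr, pred_rf, pred_gbm])
majority = (votes.sum(axis=0) >= 2).astype(int)
print((majority == y_true).mean())  # 0.9
```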
Having said this, I should note that ensemble models are not new. They have been around for a very long time. Techniques like random forests and gradient boosting machines are themselves ensemble models, but they use the idea at a fairly simple level. Advances in computing power have led to far more powerful techniques, which we're going to see shortly. Ensemble models were used predominantly in academia for a long time, until the Netflix Prize competition (2006-2009). That was won by an ensemble model, and that's when industry started noticing that we can ensemble models; now you see a lot of ensemble models in production. In essence, this is how it looks: you have input data, you create different kinds of models (it's very important that these are different models, not the same model), you combine them using some logic, and then you use that to make your final prediction. That's what the architecture of an ensemble model looks like.

What are the advantages of doing this? It definitely improves accuracy; not always, but most of the time. It becomes very robust: the variance of your output reduces. And because you can do things in parallel, it lends itself nicely to parallelization, so you can work much faster. Two things are important here, as I said: you have to select different models, so you need base model diversity, and there has to be some way to aggregate the models. Those are the two things that matter most for ensemble models.
There are four important techniques you can use to create diverse base models: you can use different training data sets (sampling your observations), you can sample your features, you can use different algorithms (linear regression, random forests, neural networks), and each algorithm can have different hyperparameters. All of these give you different models. Once you have them, the combination logic can be voting, which is what we used above, or it can be averaging. If you want probabilities as your output, you average; if you just want the class, you vote. A sketch of both options follows this paragraph.
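As an illustrative sketch of both aggregation styles (scikit-learn's VotingClassifier is my choice here, not something the talk names; it reuses the train/test split from the earlier example), hard voting picks the majority class while soft voting averages predicted probabilities:

```python
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

estimators = [
    ("logit", LogisticRegression()),
    ("rf", RandomForestClassifier(n_estimators=100)),
    ("tree", DecisionTreeClassifier(max_depth=5)),
]

# voting="hard": majority class vote; voting="soft": average class probabilities
hard_vote = VotingClassifier(estimators=estimators, voting="hard")
soft_vote = VotingClassifier(estimators=estimators, voting="soft")
hard_vote.fit(X_train, y_train)
soft_vote.fit(X_train, y_train)
print(hard_vote.score(X_test, y_test), soft_vote.score(X_test, y_test))
```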
Then there's blending and stacking. Stacking works like this: you split your data into two parts. You use the first part to train your base models and produce base-learner outputs, and those outputs are used as the input for a secondary model. That's what you do in stacking: the second-stage model's input is the output of the previous models.
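A minimal sketch of that two-stage idea, continuing the earlier data. The 50/50 split and the logistic-regression meta-model are my assumptions, not the speaker's exact recipe:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Split the training data: part 1 trains the base models,
# part 2 generates the meta-model's training features
X1, X2, y1, y2 = train_test_split(X_train, y_train, test_size=0.5, random_state=0)

base_models = [RandomForestClassifier(n_estimators=100), XGBClassifier()]
for m in base_models:
    m.fit(X1, y1)

# Base-learner outputs (class-1 probabilities) become meta-features
meta_train = np.column_stack([m.predict_proba(X2)[:, 1] for m in base_models])
meta_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

meta_model = LogisticRegression().fit(meta_train, y2)
print("stacked accuracy:", meta_model.score(meta_test, y_test))
```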
How would you do this in Python? Here you have your base models, which can be set up using pipelines. Scikit-learn has Pipeline, so you can bring in different libraries: scikit-learn itself, Keras for deep learning, XGBoost for gradient boosting. You build each model, set everything up in a pipeline, and you can use randomized search CV to get your cross-validation score.
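A rough sketch of one such base-model pipeline with randomized hyperparameter search; the parameter ranges here are illustrative assumptions:

```python
from scipy.stats import randint
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier()),
])

# Sample hyperparameter settings at random instead of trying the full grid
param_dist = {
    "rf__n_estimators": randint(50, 300),
    "rf__max_depth": randint(3, 15),
}
search = RandomizedSearchCV(pipe, param_dist, n_iter=20, cv=5, random_state=0)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```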
Once you have your base models, if you want to do a weighted average, you would use a library called hyperopt; it's a hyperparameter optimization library. If you want to do stacking, you would feed the outputs to another model, probably an XGBoost or a logistic regression, to create the final prediction. A note on randomized search CV: it's much faster than grid search. And hyperopt, as I said, is for optimization over search spaces; it's an optimization routine, and it's widely used.
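A minimal hyperopt sketch for learning a blend weight. The two-model blend and the log-loss objective are my assumptions; it reuses the base models and holdout split from the stacking sketch, since weights should be tuned on held-out data, not the final test set:

```python
from hyperopt import fmin, tpe, hp
from sklearn.metrics import log_loss

# Held-out probability predictions from the two base models trained above
p1 = base_models[0].predict_proba(X2)[:, 1]
p2 = base_models[1].predict_proba(X2)[:, 1]

def objective(w):
    # Blend the two models with weight w and score the blended probabilities
    blend = w * p1 + (1 - w) * p2
    return log_loss(y2, blend)

best = fmin(fn=objective, space=hp.uniform("w", 0.0, 1.0),
            algo=tpe.suggest, max_evals=100)
print(best)  # e.g. {'w': 0.63}
```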
As for parallelization, ensemble models lend themselves nicely to it: you can run each model in parallel. Normally, when you run a single model, you run its cross-validation in parallel; here you can scale out and run each model in parallel, and joblib does this task extremely nicely. You can log and trace it, so you really know which model takes a lot of time and where you should optimize; it gives you enough flexibility for that.
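A sketch of scaling out across models with joblib, where the fit_one helper is hypothetical:

```python
from joblib import Parallel, delayed
from sklearn.base import clone

def fit_one(model, X, y):
    """Hypothetical helper: fit a fresh copy of one base model."""
    return clone(model).fit(X, y)

# Train every base model in parallel, one worker per model;
# verbose logging shows which model is taking the time
fitted = Parallel(n_jobs=-1, verbose=5)(
    delayed(fit_one)(m, X_train, y_train) for m in base_models
)
```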
Okay, to really see ensembles at work, I suggest you look into Kaggle competitions; the winning models are invariably ensemble models these days. Here's an example of how this was used by the winner of the CrowdFlower search relevance competition. CrowdFlower ran a competition where the relevance of search results had to be classified. You can see that once you get past the feature extraction stage, different models were built, and the final model was a combination of those models.

Having said this, not everything is fine with ensemble models. If you want an interpretable model, this is really not what you would use: interpretability goes for a toss. It's also true that on Kaggle, getting the last one or two percent of extra accuracy can take so long that it may not make sense in real-life practice. That's a big disadvantage: it takes a really long time to improve accuracy, so you would probably go for a full-blown ensemble
methodology only if accuracy is your most important metric.

Okay, what's the cool stuff around this? We actually created a package for building ensemble models. We pushed it two or three days back; it's on our GitHub site, and we've just submitted it to PyPI. That's my team, which worked on this. That's all I have. Thanks. Questions?

Okay, who has a question? Yeah, just the GitHub slide, please. The what? The GitHub slide? Yeah, right, yes, take a picture. The documentation is still not up, but it has notebooks that show how we use this. Okay, no other questions. Thank you so much. Thanks.