Machine Learning: Power of Ensembles
Formal Metadata
Title | Machine Learning: Power of Ensembles
Part Number | 167
Number of Parts | 169
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers | 10.5446/21111 (DOI)
Transcript: English(auto-generated)
00:00
Subramanian. He will explain some things about machine learning. Okay, let's go. Good afternoon, everyone. This talk is about what ensemble models are. Some of you might
00:24
have heard of them. If you've ever participated in a Kaggle competition, almost invariably every winning solution is an ensemble model. So in today's talk we'll learn what ensemble models are and how to build them using Python. I'm a data scientist at Cisco and
00:47
have been doing machine learning for about 14 years. Before I start the talk, I just want to give you a short puzzle. There's a bunch of birds sitting in a tree.
01:02
There are actually 150 birds. A hunter comes along and fires three shots at the birds. His probability of hitting the target is 0.2, so he hits 20% of the time. He has fired his three shots, and one bird is hit. The question is: at the end of it, how many birds
01:28
now remain in the tree? Yes? Yes, they all fly away, right? So you can use complicated models,
01:42
ensemble models, deep learning models, but never lose the big picture. It's extremely important. This is an example you can all relate to, but in practice the models you build may be so complicated that you may not even be able to tell whether they relate to the domain or not.
02:07
What's the machine learning process? We have input data. It gets fed into a learning algorithm, which creates a machine learning model, and that model is used to make predictions.
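In scikit-learn terms, that textbook loop is just fit and predict. A minimal sketch, on a synthetic data set used purely for illustration:

```python
# Textbook process: data -> learning algorithm -> model -> predictions.
# Synthetic data purely for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000)  # the learning algorithm
model.fit(X_train, y_train)                # produces the trained model
predictions = model.predict(X_test)        # the model makes predictions
print(model.score(X_test, y_test))
```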
02:22
That's easy to say. That's what is taught in school, in any course, but in reality the input data is in a different format. You'll have to transform it, modify it, clean it; you have to do a whole lot of work before you create features from your
02:41
input data, and those features get fed into models, and you use the final model to predict.
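A sketch of that cleaning-and-transforming step; the column names here are hypothetical, not from the talk:

```python
# Hypothetical example: raw input rarely arrives model-ready.
# Numeric columns are imputed and scaled; categorical columns one-hot encoded.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "income"]          # assumed column names
categorical_cols = ["country", "device"]  # assumed column names

features = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
# X_model = features.fit_transform(raw_df)  # raw_df is your cleaned-up input data
```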
03:02
In reality it's even more complex, because there are so many ways you can create features and so many different kinds of models you can fit. It's not just one kind of model. You have linear models: linear regression, logistic regression. You have tree-based models: decision trees. And you have ensemble tree-based models: random forests, gradient boosting machines,
03:22
deep learning models, support vector machines. It's a lot. How would you deal with something like this? It's extremely difficult. How am I really going to select something from all of this? Recapping, there are two important steps in the process. You have to create features from
03:42
your input data set, and you also need to select models in some way. The challenge here is that the model is only as good as you, because you're the person who's creating the features. So the model is only as good as the features you create,
04:04
and this is where a lot of time is spent in a traditional machine learning workplace. You spend a lot of time identifying features, creating them, generating them, transforming them, all kinds of work to create the features. This was made famous by the
04:25
New York Times article which says that this is the janitorial work of the data science process and that it takes about 80 percent of the time. The challenge after you create the features is that
04:41
even if you use the same features, different models give different predictions. And why is that? It's because the solution space is so huge. It's defined by the entire space of your features, and models can go and search in different regions of that solution space. Different
05:03
algorithms search in different regions of the solution space, and it's hard to figure out which one I should take so that it has better generalization properties. Okay, given that you know how to create features, we'll now talk about how to improve model performance. This is
05:24
where you'll have different models to select from. Each model has different parameters you can tune, called hyperparameters, and you can try different sets of features
05:40
for each of the models. How can you create a final model out of all this? You cannot try all possible combinations; they really explode. It's exponential in nature: even three algorithms with ten hyperparameter settings each and a handful of feature subsets already give you hundreds of candidate models. Exhaustive search is definitely ruled out, and this is where ensemble models help us come up with some kind of strategy for
06:03
solving this. We'll talk about what ensemble models are with a toy example. On a dummy data set we'll create three kinds of models. It's a classification problem, so we have a linear model, logistic regression; a tree-based random forest model;
06:26
and a gradient boosting model. These are very easy to build in Python. Logistic regression is directly there in scikit-learn: you just use the linear
06:40
model called LogisticRegression. Gradient boosting machines are implemented by an awesome library called XGBoost; it's better than the one that's in scikit-learn. And you have the predictions on the test data set. The assumption here is that the true labels of the test data set are all ones, and this is how the models come out. They all make different predictions on the same
07:07
10 data points, but they all have the same accuracy: 70 percent. How do we build a final prediction when all three models are equal? If you use cross-validation as a metric, you're going to get 70 percent for all three of them.
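A minimal sketch of those three base models on a dummy data set (parameters are illustrative):

```python
# The three base models from the toy example.
# xgboost is a separate install (pip install xgboost).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "xgb": XGBClassifier(n_estimators=200, random_state=0),
}
preds = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds[name] = model.predict(X_test)  # each model predicts on the same test set
```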
07:25
But if you look at what happens here, there's an easy way to combine these three models into something better. You take the number that occurs the most, the max count, and you see that, okay, in my first row one occurs the most, so I'm going to set my output to one,
07:46
and in the second row zero occurs more, so I'm going to set it to zero. You do this, and you miraculously end up with 90 percent accuracy. This is your simplest ensemble, maximum count, which for binary labels is the same as rounding the average. In some sense you're going to use CPU as a proxy
08:09
for creating more features or creating this particular kind of model. So we're going to talk about strategies that use some clever techniques to search your solution space.
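The max-count trick is just a majority vote. The talk's exact prediction table isn't in the transcript, so the values below are illustrative, constructed so that each model alone is 70 percent accurate while the vote reaches 90 percent, mirroring the example:

```python
# Majority vote across three models' 0/1 predictions on ten test points.
import numpy as np

y_true = np.ones(10, dtype=int)  # assumption from the talk: all true labels are 1
logit  = np.array([1, 0, 0, 1, 1, 0, 1, 1, 1, 1])
forest = np.array([1, 0, 1, 0, 1, 1, 0, 1, 1, 1])
xgb    = np.array([1, 1, 1, 1, 0, 1, 1, 0, 0, 1])

stacked = np.vstack([logit, forest, xgb])
# For binary labels, the per-point majority vote is the rounded column mean.
vote = (stacked.mean(axis=0) >= 0.5).astype(int)

for name, p in [("logit", logit), ("forest", forest), ("xgb", xgb), ("vote", vote)]:
    print(name, (p == y_true).mean())  # 0.7, 0.7, 0.7, then 0.9
```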
08:25
Having said this, I should say that ensemble models are not new. They've been around for a very long time. Techniques like random forests and gradient boosting machines are themselves ensemble models, but they use the idea at a very simple level. Advances in computing power have led to
08:47
far more powerful techniques, which we're going to see very shortly. Ensemble models were used predominantly in academia for a long time, until the Netflix competition. That happened in 2007.
09:04
It was won by creating an ensemble model, and that's when industry started noticing: okay, we can ensemble models. Now you see a lot of ensemble models in production. In essence, this is how it looks. You have input data, and you create different kinds of models. It's very
09:24
important that you create different kinds of models; these are not the same models. You combine them using some logic, and then you use that to make your final prediction. That's what the architecture of an ensemble model looks like. What are the advantages of doing this?
09:44
It definitely improves the accuracy. Not always, but most of the time it improves the accuracy. It becomes very robust: the variance of your output reduces.
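The variance claim has a standard back-of-the-envelope justification (not spelled out in the talk): if you average $n$ base models that each have prediction variance $\sigma^2$ and pairwise correlation $\rho$, the variance of the ensemble is

$$\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n}\hat{y}_i\right) = \rho\,\sigma^2 + \frac{1-\rho}{n}\,\sigma^2,$$

which falls toward $\rho\sigma^2$ as $n$ grows. The less correlated the base models are, the bigger the reduction, which is exactly why base model diversity matters.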
10:01
And because you can do things in parallel, it lends itself nicely to parallelization, and you can get things done much faster. Two important things, as I said: you have to select different models, so you should have base model diversity, and there should be some way to aggregate the models. These two things are very important for ensemble models. There are four
10:27
important techniques you can use to create base models. You can use different training data sets, sampling on your number of observations. You can sample on your
10:41
number of features. You can use different algorithms: linear regression, random forest, neural networks. And each one of them can have different hyperparameters. This will lead you to build different models. Once you have that, the combination logic can be voting. We saw voting was the one
11:07
we used. It could also be averaging. If you want a probability as your output, you would average. If you just want the class as your output, you would just vote.
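scikit-learn wraps both combination rules in a single estimator; a sketch, reusing the three base models from the earlier snippet:

```python
# Hard voting takes the majority class; soft voting averages predicted probabilities.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

estimators = [
    ("logistic", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("xgb", XGBClassifier(n_estimators=200, random_state=0)),
]
hard = VotingClassifier(estimators, voting="hard")  # vote on predicted classes
soft = VotingClassifier(estimators, voting="soft")  # average class probabilities
# hard.fit(X_train, y_train); soft.fit(X_train, y_train)  # reuse the earlier split
```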
11:21
And there's blending and stacking. Stacking works something like this: you split your data into two parts. You use the first part to train your base models and create the base learner outputs, and those outputs are used as the input for your
11:43
secondary model. That's what you do in stacking. The secondary model is also a model; its input is just the output of the previous models.
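Today's scikit-learn has this built in as StackingClassifier; the talk predates that API, so treat this as an equivalent sketch rather than the speaker's exact code (it reuses the estimators list from the previous snippet):

```python
# Stacking: the base models' out-of-fold predictions become the features
# for a secondary (meta) model, here a logistic regression.
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=estimators,                 # base models from the previous sketch
    final_estimator=LogisticRegression(),  # the secondary model
    cv=5,                                  # out-of-fold predictions avoid leakage
)
# stack.fit(X_train, y_train); stack.score(X_test, y_test)
```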
12:10
In Python, your base models can be built using pipelines. Scikit-learn has Pipeline, so you can mix different libraries: scikit-learn itself, Keras for deep learning, XGBoost for gradient boosting.
12:26
You build a model, you set everything up in a pipeline, and you can use RandomizedSearchCV to tune it and get your cross-validation score.
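A sketch of that step; the parameter distributions here are illustrative, not from the talk:

```python
# Pipeline + RandomizedSearchCV: scale features, then tune a random forest
# by sampling hyperparameter combinations instead of trying them all.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()),
                 ("forest", RandomForestClassifier(random_state=0))])

param_distributions = {
    "forest__n_estimators": [100, 200, 400],
    "forest__max_depth": [None, 5, 10, 20],
    "forest__max_features": ["sqrt", "log2"],
}
search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=5,
                            random_state=0)
search.fit(X_train, y_train)  # X_train/y_train from the earlier sketch
print(search.best_params_, search.best_score_)
```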
12:41
Once you have your base models, if you want to do a weighted average, you would use a library called hyperopt. It's a hyperparameter optimization library. If you want to do stacking, you would again feed the outputs to another model, probably an XGBoost or a logistic regression, to create the final prediction.
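A sketch of weight search with hyperopt; the probs_* arrays and y_valid are hypothetical stand-ins for each base model's predicted probabilities on a held-out validation set:

```python
# Searching for ensemble weights with hyperopt (pip install hyperopt).
# probs_logistic, probs_forest, probs_xgb and y_valid are assumed to be
# defined already: per-model validation probabilities and true labels.
import numpy as np
from hyperopt import fmin, hp, tpe
from sklearn.metrics import log_loss

def objective(w):
    weights = np.array([w["w1"], w["w2"], w["w3"]])
    weights = weights / (weights.sum() + 1e-9)   # normalize to sum to 1
    blend = (weights[0] * probs_logistic
             + weights[1] * probs_forest
             + weights[2] * probs_xgb)
    return log_loss(y_valid, blend)              # lower is better

space = {"w1": hp.uniform("w1", 0, 1),
         "w2": hp.uniform("w2", 0, 1),
         "w3": hp.uniform("w3", 0, 1)}
best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
print(best)
```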
13:00
A note on RandomizedSearchCV: it's faster than what you would do with grid search, much faster. Hyperopt, as I said earlier, is for optimization over search spaces; it's an optimization routine, and it's widely used. On parallelization: ensemble models
13:25
lend themselves nicely to parallelization: you can run each model in parallel. Generally, when you run one model, you run your cross-validation in parallel. Here, instead, you can scale out and run each model in parallel, and joblib does this task
13:45
extremely nicely. You can log it and trace it, so you really know which model takes a lot of time and where you should optimize; it gives you enough flexibility to do that.
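A sketch of that scale-out pattern with joblib, reusing the hypothetical models dictionary and training split from the earlier snippets:

```python
# Train each base model in parallel instead of sequentially.
from joblib import Parallel, delayed

def train_one(name, model):
    model.fit(X_train, y_train)  # X_train/y_train from the earlier sketch
    return name, model

trained = dict(
    Parallel(n_jobs=-1, verbose=10)(  # verbose logging shows per-task timing
        delayed(train_one)(name, model) for name, model in models.items()
    )
)
```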
14:00
Okay, to really learn more about ensembles at work, I suggest you look into Kaggle competitions. The winning models are invariably ensemble models these days. Here's an example of how this was used by the winner of the CrowdFlower search relevance competition. CrowdFlower ran a competition where the relevance of search results had
14:25
to be classified. You can see that once you get past the feature extraction stage, different models were built, and the final model was a combination of those models. Having said this, not everything is fine with ensemble
14:47
models. If you want an interpretable model, this is really not what you would use; interpretability goes for a toss. Sometimes, also, if you look at
15:01
Kaggle, the time it takes to get the last two or one percent of extra accuracy may not make sense in real-life practice. That's a big disadvantage: it takes a really long time to improve accuracy. So you would probably do this as a full-blown
15:22
methodology only if accuracy is your most important metric. Okay, what's the cool stuff around this? We actually created a package for building ensemble models. We pushed it two or three days back; it's on GitHub, and we've just submitted it to PyPI,
15:42
hoping to get it up there soon. So that's my team, which worked on this. Yes, that's all I have. Thanks. Questions? Okay, who has a question for our friend? No?
16:13
Yeah, just the GitHub slide, please. The... what? The GitHub slide. Yeah, right. Yes, take a picture.
16:21
Yeah, the documentation is still not up, but it has notebooks which talk about how we use this. Okay, no other questions? Thank you so much. Thanks.