Machine Learning: Power of Ensembles
Formal Metadata
Title 
Machine Learning: Power of Ensembles

Title of Series  
Part Number 
167

Number of Parts 
169

Author 

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
Identifiers 

Publisher 

Release Date 
2016

Language 
English

Content Metadata
Subject Area  
Abstract 
Bargava Subramanian - Machine Learning: Power of Ensembles. In machine learning, combining many models has proven to give better results than single models. The primary goal of the talk is to answer the following questions: 1) Why and how do ensembles produce better output? 2) When data scales, what is the impact? What are the trade-offs to consider? 3) Can ensemble models eliminate expert domain knowledge? It is relatively easy to build a first-cut machine learning model. But what does it take to build a reasonably good model, or even a state-of-the-art model? Ensemble models. They are our best friends. They help us exploit the power of computing. Ensemble methods aren't new. They form the basis for some extremely powerful machine learning algorithms like random forests and gradient boosting machines. The key point about ensembles is that consensus from diverse models is more reliable than a single source. This talk will cover how we can combine model outputs from various base models (logistic regression, support vector machines, decision trees, neural networks, etc.) to create a stronger/better model output. This talk will cover various strategies to create ensemble models. Using third-party Python libraries along with scikit-learn, this talk will demonstrate the following ensemble methodologies: 1) Bagging 2) Boosting 3) Stacking. Real-life examples from the enterprise world will be showcased where ensemble models produced better results consistently when compared against single best-performing models. There will also be emphasis on the following: feature engineering, model selection, importance of bias-variance and generalization. Creating better models is the critical component of building a good data science product.

00:00
[inaudible] OK, so we are going to begin. The
00:18
talk is about ensemble models. Some of you might have heard about them, and some of you might have participated in a Kaggle
00:29
competition, where almost invariably the winning solution is an ensemble model. So let's learn
00:38
what ensemble models are and how to use them. [inaudible]
00:42
I am a data scientist [inaudible] and have been doing machine learning for about 14 years. Before I start with the talk, I
00:55
just want to show you this: there is a bunch of birds sitting on a tree, actually 150
01:05
birds. A hunter comes along and fires three shots. His probability of hitting the target at this point is 20 percent of the time. [inaudible] The question is: how many
01:27
birds now remain on the tree? Yes, yes, they all fly away, right? So you can
01:40
use complicated models, simple models, deep learning models, but never lose the big picture. It's extremely important. This is an example that you can
01:53
relate to, but in practice the kind of models that you build may be so complicated that you may not even understand whether they relate to the domain or not. What's the
02:08
machine learning process? We have input; it gets fed into a learning algorithm, which creates a machine learning model, and that is used to make the prediction.
02:23
That's what is taught in schools and in courses.
02:27
But in reality the input data is in a different form: you have to transform it, modify it, clean it up, do a whole lot of stuff before you create features from the input data. That gets fed into a model,
02:47
and you use the final model to predict. In reality it's even more complex, because there are so many ways in which you can create features, and you can fit
03:04
so many different kinds of model, not just one kind. You can build so many kinds of models: linear models such as linear regression and logistic regression; tree-based models such as decision trees; ensemble tree-based models such as random forests and
03:20
gradient boosting machines; deep learning models; support vector machines; and a lot more. How would you do something like this? It's extremely difficult. How are you really going to select something from these? There are two
03:38
important steps in the process: you have to create features from the input dataset, and you also need to select models in some way. The challenge here
03:53
is that the model is only as good as you, because you are the person creating the features. So the model is only as good as the features, which are created by
04:02
you. And this is where a lot of
04:06
time is spent in traditional machine learning: you spend a lot of time identifying the features, creating the features, generating transformations, all kinds of stuff to create the
04:21
features. This was made famous by the New York Times article which says that this is the janitorial work in the data science process, and it takes about 80 percent of the time. The challenge after you
04:39
create features is that, even if you have the same features, different models give different predictions. Why is
04:48
that? It's because the solution
04:51
space is so huge. It's defined by your data and the set of features, and the models can go and search
05:00
in different regions of the solution space. Different algorithms search different regions of the solution space, so it's
05:08
kind of hard to figure out which one you should take so that you have better generalization properties. OK, given that you
05:18
know how to create features, I am going to talk about how to improve model performance. This is where you have different models to select
05:28
from. Each model has different parameters that you can use; these are called hyperparameters. And
05:37
you can try different sets of features for each of the models. How can you create the
05:46
final model using those? You cannot try all possible combinations; that really explodes, because it is exponential in nature. Exhaustive search is definitely not feasible, and this is where ensemble models help us come up with some kind of strategy for solving this. Let's talk about
06:06
what ensemble models are, starting with an example. I created, on a dummy dataset, three kinds of models: one linear model, logistic regression (it's a classification problem, so I have a logistic
06:20
regression model), a tree-based random forest
06:24
model, and a gradient
06:27
boosting model. It's very easy to build them all in
06:33
Python. Logistic regression is directly available in scikit-learn: you just
06:40
use the linear model called LogisticRegression. Gradient boosting is implemented
06:45
by the library called XGBoost; it's better than the one that's there in scikit-learn. And you have
06:54
the predictions on the test dataset. The assumption here is that the actual values in the test dataset are all 1, and this is how the models' predictions come out. They all have different predictions on the same data points, but they all have the same accuracy: 70 percent. How do we come up with a final prediction
07:17
when all the models are equally good? If you use a cross-validation metric, you have 70 percent for all three of them. But if you look into
07:26
it, here is an easy way in which you can combine these three models into something that you
07:33
can use: take the number that occurs the maximum number of times, a max count. You see that, OK, in my first row 1 occurs the most, so I'm going to
07:44
set my output as 1. And if
07:46
you do the same for the second one, 0 occurs the most, so I assign 0. If you do this, you miraculously end up with 90 percent accuracy. This is the simplest ensemble model: max counting, or voting, which is also like your average in some sense. You are
08:07
going to use CPU as
08:08
a proxy for creating more features or for creating this particular kind of model. So I will
08:17
talk about strategies with which you can use some clever techniques to search the solution space.
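The max-count combination described above can be sketched in a few lines of Python. The prediction vectors below are made up for illustration (they are not the values from the speaker's slide); they are chosen so that each model is right on 7 of 10 points while making its mistakes in different places.

```python
from collections import Counter

# Illustrative data only: ten test points whose true label is 1, and three
# hypothetical models that each get exactly 7 of the 10 points right.
y_true = [1] * 10
model_a = [1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
model_b = [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
model_c = [1, 0, 1, 1, 1, 1, 1, 0, 0, 1]


def accuracy(truth, predicted):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def max_count_vote(*predictions):
    """For each data point, return the label predicted by most models."""
    return [Counter(row).most_common(1)[0][0] for row in zip(*predictions)]


ensemble = max_count_vote(model_a, model_b, model_c)

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c)]:
    print("model", name, accuracy(y_true, preds))
print("ensemble", accuracy(y_true, ensemble))
```

The gain comes entirely from the models disagreeing on different points; three models that made the same mistakes would vote their way to the same 70 percent.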
08:25
Having said this, I should tell you that ensemble models are not
08:29
new. They have been around for a very long time. Techniques like random forests and gradient boosting machines are also ensemble models, but they just use
08:39
ensembling at a very simplistic level. Advances in computing power have led to far more powerful techniques. Ensemble models were used predominantly in
08:56
academia for a long time, until the Netflix competition. This happened in 2007, and it was won by creating an ensemble model. That's when the
09:08
industry started noticing that, oh, you can use ensemble models, and now you see a lot of ensemble models in production.
09:16
In essence, this is what it looks like: you have input data; you create different kinds of models (it's very important that you create different kinds of models, not the same model); you combine them using some logic; and then you use that to make your final prediction. This is what the architecture of an ensemble model looks like.
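As a rough sketch of that architecture, scikit-learn's `VotingClassifier` takes several different kinds of models and combines their predictions by majority vote. Everything concrete here is an assumption for illustration: the synthetic dataset, the hyperparameters, and the use of scikit-learn's own `GradientBoostingClassifier` in place of the XGBoost library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the input data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Different kinds of models: linear, bagged trees, boosted trees.
base_models = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("boost", GradientBoostingClassifier(random_state=0)),
]

# Combination logic: "hard" voting picks the majority class per data point.
ensemble = VotingClassifier(estimators=base_models, voting="hard")
ensemble.fit(X_train, y_train)

# Compare each fitted base model with the combined prediction.
for name, fitted in ensemble.named_estimators_.items():
    print(name, fitted.score(X_test, y_test))
ensemble_accuracy = ensemble.score(X_test, y_test)
print("ensemble", ensemble_accuracy)
```

Whether the ensemble beats its best member on a given dataset is not guaranteed; it depends on how diverse the base models' errors are.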
09:41
What are the advantages of doing this? Well, it improves the accuracy; not always, but most of the time. It becomes more robust: the variance of your output reduces. And because you can do things in parallel, it lends itself nicely to parallelization, meaning you can do things much faster. There are two
10:09
important things to say here. You have to select different models, so you should have model diversity, and there should be some way to aggregate the models. Those two things are very important for ensemble
10:25
models. There are four important techniques that you can use to create these models. You can use different training data
10:35
sets: you can sample on the number of observations, or you can sample on the number of
10:42
features. You can use different algorithms: you can use logistic regression, you can use random forests, you can use neural networks; those are different algorithms. And each of them can have different hyperparameters. This will cause you to build different models.
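A minimal sketch of the first two techniques, sampling observations and sampling features, using only the standard library. The helper name `sample_training_view` and the sizes are hypothetical; the idea is that each base model is trained on its own random "view" of the data, which is one way to obtain diversity.

```python
import random


def sample_training_view(n_rows, n_features, n_sub_features, rng):
    """Return row and column indices for one base model's training data."""
    # Observations: bootstrap sample, i.e. draw n_rows indices WITH replacement.
    row_idx = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Features: a random subset drawn WITHOUT replacement.
    feat_idx = sorted(rng.sample(range(n_features), n_sub_features))
    return row_idx, feat_idx


rng = random.Random(42)  # fixed seed so the sketch is repeatable
views = [
    sample_training_view(n_rows=100, n_features=8, n_sub_features=3, rng=rng)
    for _ in range(5)
]

for rows, feats in views:
    # Each view repeats some rows and leaves others out entirely.
    print("distinct rows:", len(set(rows)), "features:", feats)
```

Each model would then be fitted on `data[rows][:, feats]` (or the equivalent in your data structure) instead of on the full table.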
11:01
Once you have that, the combination logic can be voting, which was the one that we used. It could also be averaging: if you want to use probability as your output, then you would average; if you just want the class as output, you just vote. Then there is blending and
11:21
stacking. Stacking looks something like this: you would partition your data into two parts. You would use the first part to create the model outputs from the various base models, and those are used as input for your second model. So the second stage would also be a model, whose input would be the output of the previous models.
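One common way to realise the two-part stacking scheme is sketched below with scikit-learn. This is an illustration under assumptions, not the speaker's code: the dataset is synthetic, the choice of base models and of logistic regression as the second model is arbitrary, and the base models are fitted on the first part while their outputs on the second part train the second model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Partition the development data into two parts.
X_part1, X_part2, y_part1, y_part2 = train_test_split(
    X_dev, y_dev, test_size=0.5, random_state=1)

# Stage 1: fit the base models on the first part only.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=1),
    DecisionTreeClassifier(max_depth=3, random_state=1),
]
for model in base_models:
    model.fit(X_part1, y_part1)


def base_outputs(features):
    """One column per base model: its predicted probability of class 1."""
    return np.column_stack(
        [model.predict_proba(features)[:, 1] for model in base_models])


# Stage 2: the second model learns from the base models' outputs on part 2,
# data the base models have not seen during fitting.
second_model = LogisticRegression()
second_model.fit(base_outputs(X_part2), y_part2)

stacked_accuracy = second_model.score(base_outputs(X_test), y_test)
print("stacked accuracy:", stacked_accuracy)
```

Keeping the two parts separate matters: if the second model were trained on outputs for rows the base models had already memorised, it would learn to trust them more than it should.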
12:00
How would you do this in Python? Start with the base models. The base models can be built using Pipeline; scikit-learn has Pipeline. You can take any of the different libraries, such as Keras for deep learning or XGBoost for gradient boosting, build a model, and set everything up as a pipeline. You can use randomized search to find the best cross-validation score. Once you have your base models, if you want to do blending, say a weighted average, you can use the library called hyperopt, which is a hyperparameter optimization library. Or, if you want stacking, you again create another model, for example XGBoost or logistic regression, to create the final prediction.
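A small sketch of a base model set up as a scikit-learn `Pipeline` and tuned with `RandomizedSearchCV`. The scaler step, the candidate values for `C`, and the synthetic data are assumptions chosen to keep the example short; a real base model could be any estimator with the scikit-learn interface.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The pipeline bundles preprocessing and the model into one estimator.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameters of a pipeline step are addressed as "<step>__<parameter>".
param_distributions = {"clf__C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}

# Try 4 randomly chosen settings, scoring each with 5-fold cross-validation.
search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=4,
    cv=5,
    random_state=0,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validation score:", search.best_score_)
```

After fitting, `search.best_estimator_` is the whole pipeline refitted with the best setting, so it can be used directly as one base model of an ensemble.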
13:04
Randomized search is faster than what you would do
13:07
in a grid search; it's much faster. And hyperopt
13:13
is for optimizing over the search space; it is an optimization library that is widely used.
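To make the weighted-average idea concrete, here is a sketch that searches for a blending weight between two models' predicted probabilities. It deliberately uses a plain random search from the standard library as a stand-in for hyperopt, and the labels and probabilities are invented for illustration.

```python
import math
import random

# Hypothetical held-out labels and each model's predicted P(class == 1).
y_valid = [1, 0, 1, 1, 0, 0, 1, 0]
probs_a = [0.90, 0.40, 0.35, 0.80, 0.20, 0.60, 0.70, 0.10]
probs_b = [0.60, 0.10, 0.70, 0.40, 0.55, 0.30, 0.45, 0.30]


def blended(weight):
    """Weighted average: `weight` on model a, the rest on model b."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(probs_a, probs_b)]


def log_loss(truth, probs):
    """Average negative log-likelihood; lower is better."""
    return -sum(
        math.log(p if t == 1 else 1.0 - p) for t, p in zip(truth, probs)
    ) / len(truth)


# Random search over the weight; 0.0 and 1.0 are included so the result is
# never worse than using either model alone.
rng = random.Random(0)
candidates = [0.0, 1.0] + [rng.random() for _ in range(200)]
best_weight = min(candidates, key=lambda w: log_loss(y_valid, blended(w)))

print("model a alone:", log_loss(y_valid, blended(1.0)))
print("model b alone:", log_loss(y_valid, blended(0.0)))
print("best weight:", best_weight, "loss:", log_loss(y_valid, blended(best_weight)))
```

A dedicated optimizer becomes worthwhile when there are many models and therefore many weights; for two models, a search like this is usually enough. The weight should be chosen on held-out data, not on the data used to fit the models.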
13:20
One more, on parallelization: ensemble models lend themselves nicely to parallelization, since you can run each model in parallel. Generally, if you run one model, you would run your cross-validation in parallel; here, instead, you can scale out and run each model in parallel, and joblib does this task extremely nicely. Plus, you can log information so that
13:49
you know which model takes a lot of time and where you really should optimize. It gives you enough flexibility in doing that.
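A sketch of running each model in parallel with joblib and recording how long each one takes. The three models, the synthetic data, and the choice of thread-based workers (`prefer="threads"`) are assumptions made to keep the example small and portable.

```python
import time

from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}


def evaluate(name, model):
    """Cross-validate one model and report its score and wall-clock time."""
    start = time.perf_counter()
    score = cross_val_score(model, X, y, cv=3).mean()
    elapsed = time.perf_counter() - start
    return name, score, elapsed


# One task per model; results come back in the order the tasks were submitted.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(evaluate)(name, model) for name, model in models.items()
)

for name, score, elapsed in results:
    print(f"{name}: accuracy={score:.3f}, seconds={elapsed:.2f}")
```

The timing column is what tells you which model dominates the run and is therefore worth optimizing first.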
14:02
To really know more about ensembles, I would really suggest you look into Kaggle competitions; the winning models are invariably ensemble models these days.
14:14
Here's an example. This was used by the winner of the CrowdFlower Search Results Relevance competition, a Kaggle competition where the relevance of search results had to be classified. You can see that once you get past the feature extraction stage, there are different models, and the final model is a combination of all of
14:43
them. Having said this, not everything is fine with ensemble models. If you want an interpretable model, this is clearly not what you would use: interpretability goes for a toss. Sometimes, and this is also true if you look into Kaggle, the time it takes to get the last 2 percent or 1 percent of accuracy may not make sense in real-life practice. That's a big disadvantage: it takes a really long time to improve accuracy. So you would probably use a full-blown ensemble methodology only if accuracy is your most important metric.
15:29
[inaudible] We actually created a package for building ensemble models; that was about two years back. [inaudible] That's all I have. Thanks.
15:52
[inaudible] OK, we'll have
16:07
questions now. [inaudible] The documentation still needs a lot
16:25
of work, but it has notebooks which talk about how to
16:29
use the package.
16:36
No other questions? Thank you so much. [inaudible]
16:46
from