Explaining behavior of Machine Learning models with eli5 library
Formal Metadata
Title 
Explaining behavior of Machine Learning models with eli5 library

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
Release Date 
2017

Language 
English

Content Metadata
Abstract 
Explaining behavior of Machine Learning models with eli5 library [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 2] [Rimini, Italy] ML estimators don't have to be black boxes. Interpretability has many benefits: interpretable models are easier to debug, and humans trust the decisions of such models more. In this talk I’ll give an overview of ML model interpretation and debugging techniques. I’ll cover linear models, decision trees, tree ensembles, and arbitrary classifiers using the LIME algorithm. The talk focuses on explanation algorithms, because it is important to be aware of the pitfalls and limitations of an explanation method to be able to interpret an explanation correctly. I’ll also show how to use these techniques in practice, to debug and explain the behavior of estimators from Python ML libraries like scikit-learn and xgboost, using the open-source eli5 library: https://github.com/TeamHG-Memex/eli5 . Attendees will get both a practical and a theoretical understanding of these explanation methods. The target audience is ML practitioners who want to 1) get better quality from their ML pipelines: understanding why a wrong decision happens is often the first step to improving the quality of an ML solution; 2) explain ML model behavior to clients or stakeholders: inspectable ML pipelines are easier to “sell” to a client; humans trust such models more because they can check if an explanation is consistent with their domain knowledge or gut feeling, better understand the shortcomings of the solution, and make a more informed decision as a result.

00:07
Hi everyone, my name is Mikhail Korobov, and I'm a data scientist at Scrapinghub. We build a lot of smart web crawlers. Today I want to talk about how to inspect your machine learning models. Before we start, can you please raise your hands? Nice. I will be asking you questions to make sure you can understand my Russian accent. OK, let's go.
00:43
In a typical machine learning project, you collect some training data, then you extract features from it, then you put it all into your favorite classifier, and then you put it into production, right? What's missing? A lot of things are missing, but what's the main thing? Sorry? Yes. What else is missing? Yes, evaluation, right? Cross-validation. So yes, a lot of things are missing, but let's focus on this one point.
01:18
So let's say we did the evaluation: we computed the accuracy score, we performed cross-validation, and we got 90% accuracy. All is well, great. But what does it mean? What is the best possible quality here? Maybe a good quality for this model would be 99%, and we got only 90 because of a bug in the analysis. Or maybe it's a great result: it can be deployed to production, and we can go party. There are a lot of questions like these which can't be answered just by looking at evaluation scores. The main one is whether our model is reliable or not. Maybe it fails spectacularly on some examples, but we can't see that from our evaluation. So how do we solve these problems? There is no general recipe. I often don't know how to do this, and I don't have a complete answer. But there is a field of model inspection, where you can get additional information that can help you.
So, I came to the conference a day early, so I had time to walk along the shore. There are a lot of restaurants, and each and every one of them is just about the same: they serve pizza. Is anyone here from Rimini? Please raise your hands. Yeah. So there are people here, and this example is for you.
I visited these restaurants, and I created a formula for computing the price of a pizza. It is a linear regression formula. The price depends on the weight, on the distance from the sea, on the number of ingredients, and on whether there is an old man playing the guitar, which is very important for pizza. So the question is: what can you see from the coefficients of this linear regression? Well, you can see, for example, that the distance affects the price negatively: the farther you are from the sea, the cheaper the pizza, right? And you can see that an old man playing the guitar may push the price up. Yeah, this formula is awful. I made it up, I'm sorry about that, please don't use it to look for pizza. But it's a good example.
One more problem is that you can't compare the coefficients directly, because the distance from the sea doesn't have the same scale as an old man playing the guitar. So if you just look at the coefficients, you may think that weight is important and distance is not important at all. But this is not the case, because the scales are different. So, as we can see, checking model parameters is helpful, and we can get some understanding of what's going on, but we must know what we are looking at.
There are other pitfalls too. For example, let's say we live in a universe where pizzas always get meat and mushrooms together: they always, or almost always, come together. In this case we can't interpret the weights for meat and mushrooms individually, because we can shift weight between them without changing the predictions on these samples. We could make the weight minus a million for meat and a correspondingly huge positive weight for mushrooms, and the formula would give the same result, because meat and mushrooms always come together. So the weights depend on the training algorithm, and they are not informative on their own when features are correlated. You need to be aware of this. In practice it's usually not a big problem, because you usually use regularization. For example, if you use L2 regularization, the two coefficients will be about the same. If you use L1 regularization, one coefficient can become 0 and the other one twice as large. So if you sort the coefficients by absolute value, you need to be aware of how this linear regression was trained. OK.
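The meat-and-mushrooms pitfall can be reproduced in a few lines. This is a minimal sketch on made-up data; the feature names, numbers and alpha values are illustrative assumptions, not taken from the talk. Two binary features are exact copies of each other, and scikit-learn's Ridge (L2) and Lasso (L1) regressions are fitted to the same target. With identical columns, the L2 penalty splits the weight evenly between the copies. L1 tends to put all of it on one of them, so sorting by absolute coefficient would rank the two ingredients very differently.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: meat and mushrooms always come together (identical columns).
meat = rng.integers(0, 2, size=n).astype(float)
mushrooms = meat.copy()
X = np.column_stack([meat, mushrooms])
price = 5.0 + 3.0 * meat + rng.normal(scale=0.1, size=n)

ridge = Ridge(alpha=1.0).fit(X, price)    # L2: weight is shared between the copies
lasso = Lasso(alpha=0.01).fit(X, price)   # L1: weight tends to go to one copy

print("ridge:", ridge.coef_.round(3))
print("lasso:", lasso.coef_.round(3))
```

Which copy Lasso picks depends on the solver. Coordinate descent visits the first column first, so that is typically where the weight lands here.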
06:50
So, some people here use scikit-learn, and like in many other libraries, there is a way to look at the coefficients of the model. This is how to do it. But this code is incorrect. Who knows what's wrong here? Well, it's not entirely incorrect, but it won't give you the whole formula, because there is an extra coefficient, the intercept, which you don't see here. So we created a library called eli5; the name means "explain like I'm 5". It started from something similar to this snippet, but now it does much more work to get these coefficients from machine learning models. It supports 5 popular machine learning packages and more than 70 objects, and it has several methods to explain models and their predictions, including arbitrary black-box classifiers. But in the simplest case, it started from this stuff. By the way, this table doesn't make any sense, because the features are not on the same scale.
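Both problems with the scikit-learn snippet, the missing intercept and the incomparable scales, can be illustrated with the pizza example. The data below is invented for illustration; only LinearRegression, StandardScaler, coef_ and intercept_ are real scikit-learn names. The hypothetical formula helper prints the intercept together with the coefficients. Refitting on standardized features makes the coefficients comparable as "price change per standard deviation". For reference, eli5's weight tables list the intercept as a separate <BIAS> row.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical pizza data; the units are deliberately very different.
rng = np.random.default_rng(42)
n = 300
weight_g = rng.uniform(200, 600, size=n)            # grams
distance_m = rng.uniform(0, 2000, size=n)           # meters from the sea
guitar = rng.integers(0, 2, size=n).astype(float)   # old man playing the guitar: 0/1
X = np.column_stack([weight_g, distance_m, guitar])
price = (2.0 + 0.01 * weight_g - 0.001 * distance_m + 1.5 * guitar
         + rng.normal(0, 0.2, size=n))
names = ["weight_g", "distance_m", "guitar"]

model = LinearRegression().fit(X, price)


def formula(fitted, feature_names):
    """Full fitted formula: coef_ alone misses the intercept_ term."""
    terms = " ".join(f"{w:+.4f}*{f}" for w, f in zip(fitted.coef_, feature_names))
    return f"price = {fitted.intercept_:.4f} {terms}"


print(formula(model, names))

# Raw coefficients depend on units; after standardization each coefficient is
# the price change for a one-standard-deviation change of that feature.
scaled = LinearRegression().fit(StandardScaler().fit_transform(X), price)
for f, w in zip(names, scaled.coef_):
    print(f"{f:>10}: {w:+.3f} per standard deviation")
```

Standardizing is only one convention. It makes a binary flag like the guitar feature a bit awkward to read, so the scaled numbers are better treated as a ranking aid than as a definitive importance measure.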
08:20
The library is open source: you can use it, you can join us and contribute, and you can open issues if something doesn't work.
08:33
It supports scikit-learn, XGBoost, LightGBM and several less popular packages. It also contains an implementation of the LIME algorithm, which allows you to explain black-box models.
08:47
So much for the overview. Now I will give you a more detailed example of how you can use it. How many of you have followed the text processing tutorial in the scikit-learn docs? Who has done this? Yeah, there are people here, nice. scikit-learn has great docs and great tutorials, and one of them is a tutorial on text classification. There are messages from forums, and the task is to classify them by topic based on their text. The dataset is named 20 Newsgroups, and the tutorial tries several architectures.
09:40
The final model they arrive at in this tutorial uses TF-IDF features and an SVM classifier. Who is familiar with SVM? Yeah, there are people. We assume it has a linear kernel, so it is also a linear model, similar to our pizza formula. To decide whether the class is positive or negative, you just compute the score: if it's greater than 0 the answer is yes, and if it's less than 0 the answer is no. And who knows what TF-IDF is? Yeah, a lot of people. There's the Information Retrieval book by Christopher Manning, and I recommend its chapter about TF-IDF. There is a page in it which shows 60-something different ways to compute TF-IDF, and the scikit-learn variant is not one of them, so every machine learning library uses some form of its own. This approach to classifying text is a bag-of-words approach. You have a weight for each word, you check if the word is in the document, and then you multiply this weight by the TF-IDF score of the word.
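Because TF-IDF has so many variants, it helps to pin down which one scikit-learn's TfidfVectorizer uses by default. This minimal sketch recomputes it by hand on three made-up documents. The steps are raw term counts, then a smoothed idf of ln((1 + n) / (1 + df)) + 1, then L2 normalization of each row. A linear classifier such as the SVM from the tutorial then scores a document as the dot product of these values with its learned weights, plus the intercept.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the pizza is hot", "the sea is warm", "pizza by the sea"]

# scikit-learn defaults: raw counts, smooth_idf=True, l2 normalization per document.
tfidf = TfidfVectorizer()
X_sklearn = tfidf.fit_transform(docs).toarray()

# The same variant computed by hand, using the same vocabulary and column order.
counts = CountVectorizer(vocabulary=tfidf.vocabulary_).fit_transform(docs).toarray()
n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)                 # document frequency of each term
idf = np.log((1 + n_docs) / (1 + df)) + 1     # smoothed idf
X_manual = counts * idf
X_manual /= np.linalg.norm(X_manual, axis=1, keepdims=True)

print(np.allclose(X_sklearn, X_manual))
```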
11:18
For our trained model we can use eli5's show_weights function, which is also on the slide. These are the top features learned by the model for different classes. For example, for computer graphics you have words like "graphics", "image", "software", "images", "3d" and "file". So if one of these words appears in a document, it's more likely to be a document about computer graphics. That makes total sense. But if you look, for example,
11:59
at the atheism class, you can see that there are words related to atheism, like "atheism", "islamic", "atheists" and "morality". But there are also words which don't make any sense, like names such as "mathew". Is there a Mathew in this room? No. So if some document mentions a guy named Mathew, then for some reason the document is about atheism; this is what the model learnt. There are similar issues with documents about medicine: if some particular person is mentioned in a document, it is a document about medicine. And the most negative words for Christianity look strange too. So something's going on here, right? This doesn't feel right. So we can take a look at the documents and find some of these examples.
13:12
We can see that the documents are email messages, and they have headers: the sender's name, email addresses, etc. The model found an easier way to classify messages. Instead of figuring out how to classify them by content, it just remembered some of the authors, their names and parts of their email addresses. So it thinks: "Oh, I see, this is my old friend, he always writes about atheism, so this is a message about atheism", and it doesn't matter what he says. So, well.
14:00
So it depends on the task: maybe this is what we wanted from the model. But maybe you want to classify the messages using their content, what the message is about. Of course the model learns something about the content, but the top features highlighted here are names, parts of email addresses, etc. So what does it mean? If we don't look at the predictions and the coefficients, we may start trying different classifiers and tuning hyperparameters without being aware that the model uses this information. But if you look at this, you may see that there is a problem in our data.
14:55
scikit-learn provides a way to remove the headers, footers and quotes from the messages of this particular dataset. We can do this and retrain the model, and we see that the accuracy drops a lot: previously it was more than 92%, and now it is about 0.8. So why does that happen? Anyone? Yes: previously the model was overfitting. For example, there can be messages from the same person in both the training and the test data, so the evaluation doesn't show us that there is a problem. But now we also have some useful information, so let's try to improve the quality of the model.
We can see that there are words which are not related to a particular topic, like "you", "your" and "don't". You may not see it on the screen, but "what" is negative and "you" is positive, and that doesn't make any sense. Words like "what", "where", "when" and "why" are just background words. They appear in documents about all topics, and the model decided to use them as evidence for some classes. Maybe this is because our dataset is not large and these words are not distributed equally across documents, so some get positive weights and some get negative weights. One way to fix this is to remove such words to make the task easier for the classifier. They're called stop words, and scikit-learn provides a way to remove them: you can pass stop_words='english' to the TfidfVectorizer. And indeed, now these words are not highlighted, and the quality improves. So by looking at explanations we can figure out how to preprocess our data better, and we can try to find new features.
By the way, if you look carefully, you can see that the word "don't" is not removed: "don" is still highlighted, even after passing stop_words='english'. Yeah, exactly. The default tokenizer splits words on apostrophes, but the stop word list doesn't account for that. So by looking at explanations we found a bug.
18:29
OK, so we can add "don" to the stop words list, and the quality improves. So the lesson is: look at the explanations, they may show you a bug. Bugs are a challenge in machine learning today, because models can adjust to your bugs; once you fix them, the quality may improve.
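The tokenizer quirk behind the surviving "don" is easy to check directly. This is a minimal sketch on two made-up sentences. With the default token_pattern, "don't" is split into "don" and "t". The "t" is dropped as a one-character token, but "don" is not in scikit-learn's built-in English list, so it survives stop_words='english'. Extending ENGLISH_STOP_WORDS with "don" is one possible way to do the fix described above; the exact code used in the talk is not shown.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

docs = ["I don't like pineapple pizza", "Why don't we order pizza"]

# The default token_pattern keeps tokens of 2+ word characters, so "don't"
# becomes "don" (and "t" is dropped); "don" is not in the built-in list.
vec = TfidfVectorizer(stop_words="english").fit(docs)
print(sorted(vec.vocabulary_))

# One possible fix: extend the built-in list with the leftover fragment.
custom_stop_words = sorted(ENGLISH_STOP_WORDS | {"don"})
vec_fixed = TfidfVectorizer(stop_words=custom_stop_words).fit(docs)
print(sorted(vec_fixed.vocabulary_))

# For the 20 Newsgroups data itself (requires a download), headers, footers
# and quotes can be stripped when loading:
# fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
```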
18:56
Apart from words, there are other
19:00
ways to approach text. Instead of words, we may use character n-grams: sliding windows of size 3, 4 and 5 over the text, used as features. So let's see. If we train this model and visualize it, you can see that
19:27
the quality is worse. You can also see that words are not highlighted in full: some parts of words are more important and some are less important. Well, the quality is not good. Again, we can try the same approach and remove stop words.
19:47
What will happen to the quality: will it improve, decrease, or stay the same? Who thinks the quality will improve? Raise your hands. And who thinks it will decrease? And who doesn't know what will happen? Yeah. And who thinks it will stay the same? Yeah, so someone thinks the quality will stay the same, and indeed it does.
20:21
Why does this happen? Because the TfidfVectorizer documentation says that
20:26
stop_words has no effect if the analyzer is not 'word'. This is very easy to miss; I made this mistake myself.
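The documented behavior can be confirmed in a couple of lines. This is a minimal check on two made-up documents. With analyzer='char', passing stop_words='english' leaves the learned vocabulary unchanged, so "the" is still there as a character n-gram. Depending on the scikit-learn version, the second call may also emit a warning about the unused parameter.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the pizza is hot", "the sea is warm"]
params = dict(analyzer="char", ngram_range=(3, 5))

plain = TfidfVectorizer(**params).fit(docs)
with_stop_words = TfidfVectorizer(**params, stop_words="english").fit(docs)

# stop_words only applies when analyzer == "word", so both vocabularies are
# identical, and "the" is still present as a character n-gram.
print(plain.vocabulary_ == with_stop_words.vocabulary_)
print("the" in with_stop_words.vocabulary_)
```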
20:38
So by looking at explanations, you can find bugs in your own code.
20:45
So by inspecting our model and checking what's going on, we were able to do several things. We found dataset-specific issues in the data. We found a problem with the tokenizer and the stop words, and we found a bug in our own source code. We got a better understanding of the processing pipeline, and, well, we made the evaluation scores much worse. I'm not saying this to show that the scikit-learn tutorials are bad; they are very good. But I see these problems in every single machine learning project I work on, so I think an additional tool which shows you what's going on is helpful.
21:32
So there are two main ways to inspect models: you can inspect the model as a whole, or you can explain concrete predictions of the model. Like
21:46
here, we took a prediction and explained it, and
21:49
here we're looking at the model as a whole. Both ways are useful.
21:54
So now, it seems I spent too much time on the pizza, so I won't have time to discuss all my slides. So far I have talked only about linear models, but of course there are methods to inspect other models as well, like decision trees and tree ensembles. You can use feature importances, and there is also a way to explain predictions of tree-based models; see the link. eli5 has an implementation for XGBoost, LightGBM, and scikit-learn's gradient boosting and random forests. There are also ways of
22:51
inspecting black-box models. Probably the simplest method is called mean decrease accuracy. It comes from Leo Breiman's paper where random forests were introduced. I'm not sure about a better reference for this method; if someone knows one, please give me a link. The idea is simple. You train a model, then you run it and get predictions on a held-out part of the data. If you want to know how features affect the result, you could remove some features and train the model again. But training is slow, so there is a workaround: remove a feature only in the test dataset. You can't just drop a feature from the dataset, because the model uses it for its decisions, so you need to replace it with some values. You can't just take random values either, because they may have a different scale or a different distribution than the real data. So the workaround is to shuffle the values of the feature, so that each example gets a value from some other example. Then you run the model, without retraining, on this test dataset with the feature shuffled, and check how much the score drops. You can do this for all features and check which ones are important for the model. We don't have an implementation of this in the eli5 library, but there is an issue for it.
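The shuffling procedure described above can be sketched in plain NumPy. The data is synthetic, and feature 2 is pure noise by construction. permutation_importance_sketch is a hypothetical helper written for this illustration; newer scikit-learn releases (0.22 and later) also ship a ready-made sklearn.inspection.permutation_importance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def permutation_importance_sketch(model, X_test, y_test, n_repeats=5, seed=0):
    """Mean drop in score when one held-out column is shuffled (no retraining)."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X_test, y_test)
    drops = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        for _ in range(n_repeats):
            X_shuffled = X_test.copy()
            X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
            drops[j] += baseline - model.score(X_shuffled, y_test)
    return drops / n_repeats


# Synthetic data: feature 0 matters most, feature 1 less, feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print(permutation_importance_sketch(model, X_test, y_test).round(3))
```

Only the held-out copy of one column is shuffled at a time, so the model is trained once, which is the point of the workaround.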
24:41
There is also a way to inspect a black-box model as a whole. The main idea is that we have a black-box model we don't know how to look inside. We take an explainable model, for example a linear model or a decision tree, and train it to imitate the black-box model. So we don't train it to give the correct predictions; we train it to give the same predictions as our black-box model. Then, instead of inspecting the black box, which we can't do, we inspect this white-box model. Do you think this works? Well, I think in general it doesn't. If you had an inspectable model which can approximate the black-box model well, you would just use the inspectable model instead.
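The global-surrogate idea, and its catch, fit in a short sketch. A random forest stands in for the black box, and a depth-3 decision tree is trained on the forest's predictions rather than on the true labels. All data is synthetic and the model choices are illustrative assumptions. The fidelity score, how often the tree agrees with the forest on held-out data, tells you whether the tree's rules can be trusted as an explanation at all.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

# The "black box": a random forest trained on the true labels.
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# The white box is trained to imitate the black box, not to predict the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how often the surrogate agrees with the black box on held-out data.
fidelity = np.mean(surrogate.predict(X_test) == black_box.predict(X_test))
print(f"fidelity: {fidelity:.2f}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(X.shape[1])]))
```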
25:40
LIME. The main idea is that you do the same, but not globally, not on the whole dataset: you approximate the predictions in a small neighborhood around a single example. This method works, and it has become pretty popular recently, but there are some issues with using these
26:00
methods. Like, what's a neighborhood? A neighborhood means that we need examples similar to a given example. We can't get them from our dataset, because there won't be enough examples, so we need to generate them. And what does "similar" mean? We need a distance function between examples, and we also need to define the neighborhood: in theory we should choose some size for the neighborhood, and we must choose it right. To
26:32
generate fake examples for text data, we can remove some parts of the text, and for images we can remove some parts of the image. For arbitrary data, we need to model its distribution and then sample from this distribution. So here we
26:54
have a trade-off: we no longer care about the black-box model, but we need to care about the data. So instead of writing code for each black-box model, we write code for each data type, or even for each dataset. And there are challenges. Is the white-box model powerful enough to explain the black box, at least in a small neighborhood? Are we using the right neighborhood size? Are the generated examples diverse enough? If not, LIME may lie to us: it may give incorrect or incomplete information, and these errors are hard to detect. If you have chosen an incorrect neighborhood size, you can check how well your white-box model approximates the black-box model; if the score is low, you see that something is going wrong. But if your examples are not good enough, your simple model may approximate the black-box model very well on these examples, and that doesn't mean the explanation is correct.
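The LIME recipe for text can be written from scratch to show where the neighborhood, distance and kernel choices come in. This is a simplified sketch in the spirit of LIME, not the lime package's or eli5's implementation. Neighbors are generated by randomly dropping words. Similarity is the cosine between the word-keep mask and the all-ones mask. The kernel width of 0.25 is an arbitrary assumption. The black box is a toy TF-IDF + logistic regression pipeline trained on invented sentences, and lime_text_sketch is a hypothetical helper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical black box: a text classifier trained on a tiny toy corpus.
docs = ["great pizza here", "pizza with mushrooms", "best pizza in town",
        "cheap pizza by the sea", "nice beach walk", "the sea is warm",
        "walk by the beach", "warm sun and sea"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = about pizza
black_box = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(docs, labels)


def lime_text_sketch(text, predict_proba, n_samples=1000, kernel_width=0.25, seed=0):
    """Explain one prediction with a weighted linear model fit on perturbed texts."""
    words = text.split()
    rng = np.random.default_rng(seed)
    # Neighborhood: random binary masks saying which words are kept.
    masks = rng.integers(0, 2, size=(n_samples, len(words)))
    masks[0] = 1  # keep the original document as well
    samples = [" ".join(w for w, keep in zip(words, m) if keep) for m in masks]
    target = predict_proba(samples)[:, 1]
    # Similarity to the original: cosine distance between the mask and all-ones.
    distance = 1.0 - np.sqrt(masks.sum(axis=1) / len(words))
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # White-box model fitted locally, with closer samples weighted more.
    surrogate = Ridge(alpha=1.0).fit(masks, target, sample_weight=weights)
    return dict(zip(words, surrogate.coef_))


weights = lime_text_sketch("pizza by the sea", black_box.predict_proba)
for word, w in sorted(weights.items(), key=lambda kv: -abs(kv[1])):
    print(f"{word:>6}: {w:+.3f}")
```

Changing kernel_width or n_samples changes the weights, which is exactly the neighborhood-size sensitivity discussed above.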
28:08
So, to use it, there is a
28:10
popular and high-quality implementation from the LIME authors; it's a separate package with support for images and regression tasks. We also have an implementation of LIME in eli5, partly because some details are different. Also, the library already has a lot of inspectable models, so we can use any of them with LIME without implementing anything. And we have our export tools: we can show explanations in IPython notebooks and export them to data frames, and we don't have to rewrite all this code for each model. So LIME can reuse all of that, and you get it
28:55
for free, with a unified API and some cool features.
29:01
Please check the
29:03
documentation. What I was explaining today are very simple and basic methods. There is research going on about models which can explain themselves. For example, in deep learning you often use an attention mechanism, and there are new ways to visualize models, especially for images. There is also a DARPA program called Explainable Artificial Intelligence going on, so expect more research on this topic.
29:51
So the conclusion is: try inspecting your models if you can, but you should know what you're looking at, because explanations may lie to you. The eli5 library might help. It may not fit all your needs, but you can help improve eli5, so please join us. Thank you.
30:13
Questions?