Explaining behavior of Machine Learning models with eli5 library

Video in TIB AV-Portal: Explaining behavior of Machine Learning models with eli5 library

Formal Metadata

Explaining behavior of Machine Learning models with eli5 library
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Explaining behavior of Machine Learning models with eli5 library [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 2] [Rimini, Italy] ML estimators don't have to be black boxes. Interpretability has many benefits: it is easier to debug interpretable models, humans trust decisions of such models more. In this talk I’ll give an overview of ML models interpretation and debugging techniques. I’ll cover linear models, decision trees, tree ensembles, arbitrary classifiers using LIME algorithm. The talk focus is on explanation algorithms, because it is important to be aware of pitfalls and limitations of the explanation method to be able to interpret an explanation correctly. I’ll also show how to use these techniques in practice, to debug and explain behavior of estimators from Python ML libraries like scikit-learn and xgboost using open-source eli5 library: https://github.com/TeamHG-Memex/eli5 . Attendees will get both practical and theoretical understanding of these explanation methods. Target audience is ML practitioners who want to 1) get a better quality from their ML pipelines - understanding of why a wrong decision happens is often a first step to improve the quality of an ML solution; 2) explain ML model behavior to clients or stakeholders - inspectable ML pipelines are easier to “sell” to a client; humans trust such models more because they can check if an explanation is consistent with their domain knowledge or gut feeling, understand better shortcomings of the solution and make a more informed decision as a result
Hi everyone, my name is Mikhail. I'm a data scientist at Scrapinghub; we build a lot of smart web crawlers. I want to talk about how to inspect your machine learning models. So, how many of you [unclear]? Can you please raise your hand? That's nice. I will be asking you questions to make sure you can understand my Russian accent. OK, so let's go.
In a typical machine learning project you collect some training data, then you extract features from the data, then you put it all into your favorite classifier, and then you put that into production, right? What's missing? A lot of things are missing, but what is missing? Sorry, I did not hear that. Yeah. What else is missing? Yeah, evaluation, right, cross-validation. So there are a lot of things missing, but that is the main point.
So let's say we added evaluation: we computed an accuracy score, we performed cross-validation, and we got, say, 90% accuracy. But what does it mean? What quality is that? Is it the best quality? Maybe a good quality for this model would be 99 percent, and we got 90 only because of bugs. Or maybe that is a great result, and we can deploy it to production and go party. So there are a lot of open questions, questions which we can't answer just by looking at evaluation scores. The main one: we want to know if our model is reliable or not. Maybe it fails spectacularly on some examples, but we can't see that in our evaluation. So how do we solve these problems? Nobody knows how to do this, I often don't know how to do this, and I don't have an answer. But by inspecting models you can get additional information which can help you.
I came to the conference a bit early, so I had time to walk by the shore. There are a lot of restaurants there, and each and every one of them sells pizza. Who has had pizza here in Rimini? Please raise your hands. Yeah, so there are people; the next slide is for you. I visited these restaurants and I have created a formula for how to compute the pizza price. It is a linear regression formula. You can see that the price depends on the weight, on the distance to the sea, on the ingredients, and on whether there is an old man playing the guitar, which is very important for pizza. So the question is: what can you see from the coefficients of this linear regression? Well, you can see, for example, that distance affects the price negatively: the farther you are from the sea, the less the pizza costs, right? Or you can see that an old man playing the guitar might push the price up.
Yeah, this formula is awful; I made this formula up, I'm sorry about that. Please don't use it if you want to sell pizza. But it is great for examples. So, one more problem is that you can't compare coefficients directly, because distance from the sea doesn't have the same scale as an old man playing the guitar. If you just take the coefficients, you may think that one feature is very important and another is not important at all, but this is not the case, because the scales are different. So, as we can see, checking model parameters is helpful, we can get some understanding of what is going on, but you must know what you are looking at, and there are pitfalls.
For example, let's say we live in a universe where pizzas always get meat and mushrooms together, so they always, or almost always, come together. In this case we can't trust the weights of meat and mushrooms. We can change the weights as long as their sum stays the same: we can make the weight minus a million for meat and plus a million and two for mushrooms, and the formula will give the same result, because meat and mushrooms always come together. So which coefficients we get in the formula for correlated features depends on the training algorithm, and you need to be aware of this. In practice it is usually not a problem, because you usually use regularization. For example, if you use L2 regularization, the coefficients will be about the same, and if you use L1 regularization, one coefficient can get zero and the other one will be twice as large. This matters if you sort coefficients by absolute value, so you need to be aware of what you used for training this linear regression. OK.
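
The two pitfalls above (feature scale and correlated features) can be sketched in plain Python. This is a minimal illustration, not the speaker's code: the `predict` helper, the feature layout and all numbers are made up for the example.

```python
import math

def predict(weights, intercept, features):
    """Linear model: intercept plus the weighted sum of feature values."""
    return intercept + sum(w * x for w, x in zip(weights, features))

# Pitfall 1: feature scale. The same pizza, with the distance to the sea
# given in kilometres or in metres (hypothetical numbers).
pizza_km = [0.4, 1]          # [distance in km, old man playing guitar (0/1)]
pizza_m = [400, 1]           # [distance in m,  old man playing guitar (0/1)]
weights_km = [-2.0, 1.5]
weights_m = [-0.002, 1.5]    # 1000x smaller weight, yet the same model

price_km = predict(weights_km, 8.0, pizza_km)
price_m = predict(weights_m, 8.0, pizza_m)

# Pitfall 2: perfectly correlated features. Meat and mushrooms always
# appear together, so only the sum of their weights matters.
menu = [[1, 1], [0, 0], [1, 1]]   # [has_meat, has_mushrooms]
weights_a = [1, 1]
weights_b = [-1_000_000, 1_000_002]

prices_a = [predict(weights_a, 5, row) for row in menu]
prices_b = [predict(weights_b, 5, row) for row in menu]

print(math.isclose(price_km, price_m))  # same price despite different weights
print(prices_a == prices_b)             # same prices despite different weights
```

In both cases the raw coefficient values differ wildly while the predictions agree, which is why coefficients alone are a shaky measure of importance.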
How many people here use scikit-learn? In scikit-learn, as in many other libraries, there is a way to look at the coefficients of the model, right? This is how to do it in scikit-learn. But this code is incorrect. Who knows what is not correct here? Well, it is not entirely incorrect, but it won't give you the whole formula, because there is an extra coefficient, named the intercept, and you don't see it here. So we created a library called eli5; it means "explain like I'm five". It started from a need for something similar to this: a way to get these coefficients from machine learning models. It supports 5 popular machine learning packages and more than 70 objects, and it has some methods to explain models and predictions of arbitrary black-box classifiers. But in the simplest case it started from this stuff. By the way, this table doesn't make any sense, because the scales of the features in the two datasets are not the same.
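
The point about the intercept can be sketched as follows. Fitted scikit-learn linear models expose their weights as `coef_` and the constant term as `intercept_`; here plain Python values stand in for a fitted estimator, and the `linear_formula` helper, the feature names and the numbers are hypothetical.

```python
def linear_formula(coef, intercept, feature_names):
    """Render a linear model as a readable formula, intercept included."""
    terms = [f"{w:+g}*{name}" for w, name in zip(coef, feature_names)]
    return "y = " + " ".join([f"{intercept:g}"] + terms)

# Stand-ins for estimator.coef_ and estimator.intercept_ (made-up values).
coef = [0.5, -2.0]
intercept = 3.0
names = ["weight", "distance"]

formula = linear_formula(coef, intercept, names)
print(formula)
```

Printing only the coefficients would silently drop the leading constant, so the printed formula would not reproduce the model's predictions.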
The library is open source: you can use it, you can join us and contribute, or report issues if something doesn't work.
It supports scikit-learn, XGBoost, LightGBM and several less popular packages, and it has an implementation of the LIME algorithm, which can explain black-box models.
So much for that; now I will give you a more or less real example of how you can use it. How many of you have followed the text processing tutorial in the scikit-learn docs? Are there people who have done this? Yeah, there are people here, nice. scikit-learn has great docs and great tutorials, and one of them is the tutorial on text classification. There are messages from forums, and the task is to classify them based on their text. The dataset is named 20 newsgroups, but here we use only four categories.
The final model in this tutorial is TF-IDF features and an SVM classifier. Who knows what an SVM classifier is? Yeah, there are people. An SVM with a linear kernel is also a linear model; it is similar to our pizza formula, but when you need to decide which class it is, positive or negative, you just compare the score to zero: if it is greater than zero the answer is yes, if it is less than zero the answer is no. And who knows what TF-IDF is? Yeah, there are a lot of people. There is a book by Christopher Manning, the information retrieval book, with a chapter about TF-IDF, and there is a page which shows about 60 different ways to compute it, something like 64 formulas, and the scikit-learn one is not one of them. So every machine learning library uses it in some form of its own. Anyway, this approach to classifying text is a bag-of-words approach: you have a weight for each word, then you check whether the word is in the document, and you multiply this weight by the TF-IDF score for the word.
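
The scoring scheme just described (a weight per word, multiplied by the word's TF-IDF score, with the sign of the sum deciding the class) can be sketched in plain Python. Everything here is an assumption made for illustration: the tiny corpus, the weights, the bias, and the particular smoothed TF-IDF variant, which is only one of the many variants mentioned above.

```python
import math
from collections import Counter

def tfidf(tokens, corpus):
    """One simple smoothed TF-IDF variant: count * (ln((1+N)/(1+df)) + 1)."""
    n_docs = len(corpus)
    scores = {}
    for word, count in Counter(tokens).items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        scores[word] = count * (math.log((1 + n_docs) / (1 + doc_freq)) + 1)
    return scores

def explain(tokens, corpus, weights, bias):
    """Return the decision score and each word's contribution to it."""
    contributions = {
        word: weights.get(word, 0.0) * value
        for word, value in tfidf(tokens, corpus).items()
    }
    return bias + sum(contributions.values()), contributions

# Hypothetical corpus and hypothetical learned weights for one binary task
# ("is this document about computer graphics?").
corpus = [
    ["graphics", "image", "the"],
    ["god", "atheism", "the"],
    ["image", "file", "the"],
]
weights = {"graphics": 2.0, "image": 1.0, "god": -2.0, "atheism": -2.5}
bias = -0.5

score_0, contrib_0 = explain(corpus[0], corpus, weights, bias)
score_1, contrib_1 = explain(corpus[1], corpus, weights, bias)
print(score_0 > 0, score_1 > 0)  # positive score means "yes"
```

The per-word contributions are what an explanation of a single prediction boils down to for a linear model: words with no learned weight, such as "the" here, contribute nothing.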
But instead of doing this by hand, once our model is trained we can use eli5 to show its weights; the call is also on the slide. These are the top features learned by the model for the different classes. For example, for computer graphics you have words like "graphics", "image", "software", "images", "3D", "file". So if one of these words appears in a document, the document is more likely to be about computer graphics. That makes total sense.
But if you look, for example, at the atheism class, you can see that there are words which are related to atheism, like "atheism", "Islamic", "atheists", "morality", but there are also some words which don't make any sense, like [unclear] and "Mathew". Is there a Mathew in this room? No? So if a document mentions a guy named Mathew, then the document is for some reason about atheism; this is what the model learnt. There are similar issues with documents about medicine: if some guy named Pitt is mentioned in a document, the document is about medicine. And the most negative word for Christianity is [unclear]. So something is going on here, right? This doesn't feel right. So we can check a couple of documents and find some of them which mention Mathew.
And we can see that the documents are messages, and we are using them as is, with headers, email addresses, etc. The model found an easier way to classify messages: instead of figuring out how to classify them by content, it just remembered some of the authors, their names, parts of their emails. So it thinks: "Oh, I see, this is my old friend Mathew; he always writes about atheism, so this document is about atheism, and it doesn't matter what he says."
So it depends on the task: maybe this is what we wanted from the model. But maybe we want the model to classify messages using their content, what the message is about. Of course the model learned something about the content, but the top features that you see here, highlighted by the eli5 library, are these emails etc. What does it mean? It means that if we don't look at predictions and coefficients, we may start to try different classifiers and tune hyperparameters, and the best model will just be the model which can use this information better. But if we look at this, we can see what is going on.
So there is a problem in our data. scikit-learn provides a way to remove headers, footers and quotes from these messages for this particular dataset. So we can do this and retrain the model, and we can see that the accuracy drops a lot: for the previous model the accuracy was more than 0.92, and now it is around 0.8. So why does that happen? Yes: previously the model was overfitting to the data. For example, the same email address can be both in the training and in the test part of the data, and so evaluation doesn't show us that. So the scores are worse, but we have some useful information.
OK, so let's try to improve the quality of this model. We can see that there are words which are not really related to a particular topic, like "do", "have", "you", "are", "don". These are like background words, so they are not related to a particular topic. I know you can't see anything on the screen, but "not" and "I" are negative and "you" is positive, and it doesn't make any sense. "What", "where", "when", "why", "the": these are just background words, they appear in documents about all topics, and the model just decided to use these words as a kind of prior for some classes. Because our dataset is not large, these words are not distributed equally among documents, so some get positive weights and some get negative weights. One way to fix this is to use a list of such words to make the task for the classifier easier. They are called stop words, and scikit-learn provides a way to remove them. You can see that we pass stop_words='english' to the TF-IDF vectorizer, and indeed, now these words are not highlighted, and the quality improves. So by looking at explanations we can figure out how to preprocess our data better, and we can try to find new features.
By the way, if you look at it carefully, you can see that the word "don" is not removed: "don" is still highlighted, even though we are passing stop_words='english'. Why? Yeah, exactly. Actually, this is a scikit-learn bug: the default tokenizer splits contractions, but the stop word list doesn't account for that. So by looking at explanations we find bugs.
OK, so we can add an extra stop word, and now "don" is not used, and the quality improved. So the lesson is that even scikit-learn has bugs, and this is a real challenge in machine learning, because models can adjust to your bugs; but if you fix them, the quality may improve.
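
The tokenizer and stop-word mismatch described above can be reproduced with the standard library alone. The regular expression below is scikit-learn's default `token_pattern`; the stop list is a tiny hypothetical one, not scikit-learn's real list, and the helper names are made up.

```python
import re

# scikit-learn's default token_pattern: words of two or more characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

# Hypothetical stop list that contains the contraction as a whole word.
STOP_WORDS = {"i", "don't", "the", "you"}

def tokenize(text):
    """Lowercase the text and split it the way the default pattern does."""
    return TOKEN_PATTERN.findall(text.lower())

def remove_stop_words(tokens, stop_words):
    return [t for t in tokens if t not in stop_words]

tokens = tokenize("I don't know the answer")
filtered = remove_stop_words(tokens, STOP_WORDS)
fixed = remove_stop_words(tokens, STOP_WORDS | {"don"})

print(tokens)    # "don't" has already been split before filtering
print(filtered)  # so the fragment "don" slips through the stop list
print(fixed)     # adding "don" explicitly removes it
```

Because filtering happens after tokenization, a stop list written in terms of whole contractions never matches the fragments the tokenizer actually produces.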
Moving on from there: there are other ways to approach text. Instead of words we may use character n-grams. It means we are using sliding windows of size, say, 3, 4 and 5, and we use these 3-, 4- and 5-character sequences as features. If we train such a model and visualize it, you can see that the quality is worse, and you can see that words are not highlighted in full: some parts of words are more important, some are less important. The quality is not that good, so again we can try the same approach and try to remove stop words.
What will happen to the quality: will it improve, decrease or stay the same? Who thinks the quality will improve? Raise your hands. Who thinks the quality will decrease? Who doesn't know what will happen? And who thinks that the quality will stay the same? Yeah, so someone thinks the quality will stay the same, and indeed it stays the same.
Why does this happen? Because, as documented in scikit-learn, stop_words has no effect if the analyzer is not 'word'. This is very easy to miss; I made this mistake myself. So by looking at explanations you can find bugs in your own code.
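
The character n-gram features mentioned above can be sketched as a sliding window over the text. This is a minimal illustration with a hypothetical helper name; it does not reproduce any particular library's analyzer.

```python
def char_ngrams(text, min_n=3, max_n=5):
    """All character n-grams of length min_n..max_n, via a sliding window."""
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams

word_grams = char_ngrams("pizza")
phrase_grams = char_ngrams("the pizza")

print(word_grams)
print(len(phrase_grams))
```

The features are fragments of words, and in a longer text they also span word boundaries, which is why a word-level stop list has nothing to match against in this setting.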
So by inspecting our model, by checking what is going on, we were able to find issues in the data which are task-specific, we found a bug in scikit-learn, we ran into a bug in our own source code, and we got a better understanding of the processing pipeline. And we made our evaluation scores much worse. I am not saying this just to show that the scikit-learn tutorial is bad; it is very good. But I see these problems in every single machine learning project I work on. So an additional tool which can show you what is going on is, I think, helpful.
So there are two ways to deal with this, two main ways you can inspect models: you can look at the model as a whole and inspect its weights, or you can explain concrete predictions of the model.
So here we are explaining a prediction, and here we are looking at the model as a whole. Both ways are useful.
So, it seems I spent too much time talking about pizza, so I won't have time to discuss all my slides. So far I was talking only about linear models, but of course there are methods to inspect other models as well, like decision trees and tree ensembles: you can use feature importances, and there is a way to explain predictions of these models. You can follow the link; eli5 has an implementation of this for XGBoost, for LightGBM, and for scikit-learn gradient boosting methods, random forests and so on.
There are also ways of inspecting black-box models. Probably the simplest method is called mean decrease accuracy. I think it comes from the Breiman paper where random forest was introduced, so it is an old method. I don't have a good reference for this method; if someone knows one, please give me a link. The idea is simple: you train a model, then you get predictions on a test part of the data and compute a score. You want to know how features affect the result, so you could remove some feature and train the model again. But training is slow, so there is a workaround: remove the feature only from the test dataset. But you can't just remove a feature from the dataset, because the model expects it, so you replace its values with something else. And you can't just take random values, because they may have a different scale or a different distribution than in the dataset. So the workaround is that you just shuffle the values of the feature, so that each example gets a value from some other example. Then you run the model, without retraining, on this version of the dataset with the feature shuffled, and check how much the score decreases. You can do this for all features, and so check which features are important for the model. We don't have an implementation of this in the eli5 library, but there is an issue for it.
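
The shuffling procedure just described can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the toy "black box", the data, and the function names are all made up, and accuracy is used as the score.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(rows)

def permutation_importance(model, rows, labels, column, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        values = [row[column] for row in rows]
        rng.shuffle(values)  # each row gets this feature's value from another row
        shuffled = [
            row[:column] + [v] + row[column + 1:]
            for row, v in zip(rows, values)
        ]
        drops.append(base - accuracy(model, shuffled, labels))
    return sum(drops) / n_repeats

def toy_model(row):
    """Toy 'black box' that only ever looks at feature 0."""
    return int(row[0] > 0.5)

rows = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 4.0], [0.3, 2.0], [0.7, 6.0]]
labels = [toy_model(r) for r in rows]

used = permutation_importance(toy_model, rows, labels, column=0)
unused = permutation_importance(toy_model, rows, labels, column=1)
print(used, unused)
```

Shuffling keeps the feature's scale and distribution intact while breaking its link to the label, so no retraining is needed; a feature the model ignores gets an importance of exactly zero.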
There is also another way to debug black-box models. The main idea is that we have a black-box model, we don't know how to look inside it, and we take an explainable model, maybe a linear classifier or a linear regression model, and train it so that it approximates this black-box model. So we don't train it to give the correct predictions; we train it to give the same predictions as our black-box model. And then, instead of inspecting the black box, which we can't do, we inspect this white-box model. Do you think this works? I think this doesn't work, because if you have an inspectable model which can approximate the black-box model, then why don't you just use the inspectable model?
So the result is LIME. The main idea is that you do the same, but not globally, not on the whole dataset: you approximate the black-box predictions in a small neighborhood around a single example. This method works in many cases and has been pretty popular recently. But there are some issues with it.
For example, what is a neighborhood? A neighborhood means that we need examples similar to a given example. We can get them from our dataset, but likely there won't be enough examples, so we want to generate these examples. And also, what does "similar" mean? We need a distance function between examples, and we also need to define the neighborhood, so there should be some size of the neighborhood, and you must choose this size.
To generate fake examples: for text data we can remove some parts of the text; for images we can grey out some parts of the image; or, for arbitrary data, we can estimate its distribution and then sample from this distribution.
So here we have a trade-off: we no longer care about the black-box model, but we do care about the data. Instead of writing code for each model, we write code for each data type, or maybe even for each dataset. And there are challenges: the white-box model should be powerful enough to explain the black box at least in a small neighborhood; you must choose the neighborhood size correctly; and if the generated examples are not diverse enough, then LIME may lie to us: it may give us incorrect or incomplete information. These errors are hard to detect. If you have chosen an incorrect neighborhood size, you may check how well your white-box model approximates the black-box model, and if the score is low, you see that something is going wrong. But if your examples are not good enough, your simple model may approximate the black-box model very well on these examples, and that doesn't mean the explanation is correct.
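
The local-approximation idea above can be sketched for a one-dimensional input. This is a simplified illustration, not the LIME implementation: neighbours are taken from a fixed symmetric grid instead of being sampled randomly, the "black box" is just x squared, and the Gaussian kernel width plays the role of the neighborhood size. All names are hypothetical.

```python
import math

def black_box(x):
    """Stand-in for an opaque model; any function of x would do."""
    return x * x

def explain_locally(f, x0, width=0.5, n_side=20):
    """Fit y ~ a + b*x around x0 with Gaussian-weighted least squares.

    Returns (a, b, r2), where r2 is the weighted R^2 of the linear fit,
    i.e. how faithfully the simple model mimics f near x0.
    """
    # Neighbours on a symmetric grid spanning x0 +/- 3*width.
    xs = [x0 + 3 * width * i / n_side for i in range(-n_side, n_side + 1)]
    ys = [f(x) for x in xs]
    # Closer neighbours get larger weights.
    ws = [math.exp(-((x - x0) / width) ** 2 / 2) for x in xs]

    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    cov_xy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    b = cov_xy / var_x
    a = mean_y - b * mean_x

    ss_res = sum(w * (y - (a + b * x)) ** 2 for w, x, y in zip(ws, xs, ys))
    ss_tot = sum(w * (y - mean_y) ** 2 for w, y in zip(ws, ys))
    return a, b, 1 - ss_res / ss_tot

_, slope, fidelity = explain_locally(black_box, x0=3.0)
_, flat_slope, flat_fidelity = explain_locally(black_box, x0=0.0)
print(slope, fidelity)
print(flat_slope, flat_fidelity)
```

Around x0 = 3 the line is a faithful local stand-in, while around x0 = 0 the fidelity score collapses, which is the kind of check described above. A high score only says the line fits the generated neighbours; it says nothing about whether those neighbours were representative.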
So, how to get this? There is a popular and high-quality implementation from the LIME authors; it is a separate package, with support for images and for regression tasks. We also have an implementation of LIME in eli5, because some details are different, and because in the library we already have a lot of inspectable models, and we can use any of them with LIME without implementing anything. We also have export tools, to JSON and so on: we can show explanations in IPython notebooks, we can export them to data frames, and we don't have to rewrite all this code for each model. LIME can use all of this, so you get it for free: you have a unified API, and you have some cool features which are described in the documentation.
What I was explaining today is very ground-level; these are very simple and basic methods. There is research going on about models which can explain themselves; for example, in deep learning you often may use an attention mechanism. There are also new ways to visualize models, especially for images. And there is a DARPA program called Explainable Artificial Intelligence going on, so I expect a lot of research to come in on this topic.
So the conclusion is that it is worth inspecting your models if you can, but you should know what you are looking at, because explanations may lie to you. The eli5 library might help; it may not help yet, but you can help eli5, so please join us. Thank you.