
Machine Learning Under Test



Speech transcript
Thank you, thank you very much. Can everybody hear me OK? Thank you. I have a couple of introductory slides, so please let me introduce myself first. That's me: I got a PhD in computational science, and I'm currently a postdoc researcher at the University of Salerno, Italy. I consider myself a data scientist, whatever that means, and of course I have very diverse, lifelong nerdy interests, but please don't ask me about every computer game you might have seen. OK, let's get serious.
These are some of the topics I usually work on and work with: machine learning algorithms, information retrieval, text mining, and recently I joined a team in Salerno working with Linked Data and Semantic Web technologies. I usually apply all this stuff to software: in fact, my main research field is software maintenance, so basically I apply machine learning algorithms to support the analysis of software. Of course I do all this with Python, which, as you might know, is my preferred programming language, and these are more or less all the tools I use every day. In particular, the machine learning tools I use the most, and the ones I'm going to highlight, are scikit-learn and NumPy, and these are the tools I'm going to talk about in this presentation.
So let's get to the point: machine learning and testing. The presentation is more or less organized into two parts. In the first part we're going to try to understand what the common risks and pitfalls related to machine learning and machine learning algorithms are; in the second part I'm going to talk about testing machine learning code: what it actually means, and what tools are the right ones to use. But before we start, please let me ask you three questions. How many of you already know machine learning? How many of you apply testing to your code? And how many of you have already used, or at least heard about, scikit-learn? OK, perfect, so you're the right audience for this talk.
A quick introduction: what is machine learning? This is one of the most common definitions: machine learning is the systematic study of algorithms and systems that improve their knowledge and performance with experience. I like this definition because it points out the interesting part of machine learning, the algorithmic part. So basically machine learning means writing algorithms and writing code, and at a glance machine learning looks like this: it's all algorithms, data and statistics. In a few words, in a nutshell, machine learning can be summarized as algorithms that run on data; and from our point of view, I mean from the point of view of this talk, the question is how we should deal with the testing of algorithms that analyze data. We need to take some things into consideration to perform our testing properly.

Here are a few very common examples of machine learning problems. The first example is regression: in this case we have data, the blue dots, and we want to generalize a function that fits the data. Another very common problem is classification: we have data divided into classes, and we want an algorithm that divides the data properly into the two classes we have; in this case we have to find a hyperplane separating the two classes. Another well-known technique in machine learning is clustering: in the clustering problem we have data distributed in space, the blue dots, and we want to end up with an organization of the data like this, so we want an algorithm that is able to identify the different groups in the data.

The first two of the examples I presented, regression and classification, are examples of supervised learning. Supervised learning means that the processing pipeline of machine learning is more or less like this: we have data, and we transform the data into feature vectors; then these features are fed into the machine learning algorithm, the model we want to define and test; and then we have labels, and this is the supervision part, that's why this kind of learning is called supervised. After we have trained the model, we want to use it to predict labels for new data. So basically machine learning means trying to define a model that is able to generalize.
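The features-to-model-to-labels pipeline just described can be sketched with a tiny stand-in learner, a nearest-centroid rule; the data, function names and values here are made up for illustration, not taken from the talk's slides:

```python
import numpy as np

# Toy labeled training data: two well-separated 2-D classes.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

def fit_centroids(X, y):
    """'Training': compute one centroid per class label."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def predict(centroids, X_new):
    """'Prediction': assign each new point to the nearest centroid."""
    labels = sorted(centroids)
    dists = np.array([[np.linalg.norm(x - centroids[l]) for l in labels]
                      for x in X_new])
    return np.array([labels[i] for i in dists.argmin(axis=1)])

model = fit_centroids(X_train, y_train)
print(predict(model, np.array([[0.1, 0.0], [1.0, 0.9]])))  # -> [0 1]
```

Any supervised learner follows this same fit/predict shape; only the model inside changes.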
Generalization: that's the key word here. The unsupervised learning setting is something like this: it's almost the same, the differences are in the output, of course, and in the fact that the supervision is missing, there are no labels on the data. Let me just summarize: in supervised learning the output is the expected label, so we have an algorithm trained on a set of labeled data, and we expect the algorithm to generate the exact, or at least the proper, label for the new data coming after the training. In the unsupervised setting, since the labels are missing, the output is different: in general we may have a likelihood, or a cluster ID, which is the cluster, the group, the data belongs to.

So this was a general introduction to the techniques. Of course, the first tool we'll deal with is scikit-learn. Scikit-learn provides this famous cheat sheet, a sort of map you can use to decide which kind of technique to use for your specific problem. This is quite interesting because, as you can see, scikit-learn provides algorithms for classification problems, for clustering problems, and for regression problems, so the three examples I presented previously, and also for dimensionality reduction, which is another unsupervised learning problem. Here are some tips on how you can decide which technique to use. If you have labels, you might end up with a regression or a classification, a supervised learning approach; if you don't have labeled data, you end up with a clustering approach, because you have no supervision. This is very simple, but even after you decide which kind of approach to use, I mean regression, classification or clustering, you still need to decide which kind of technique to apply, because classification is itself a family of approaches, a family of techniques. Then you have to decide which algorithm to use; after that, which model; and after that, the set of parameters for your model that should produce the best results. So we have a lot of things to decide.
Basically, another definition of machine learning is this: machine learning teaches machines how to carry out tasks by themselves. It's that simple; the complexity comes with the details, and in this talk we're going to deal with exactly this kind of detail. So this is our starting point: we have the data, the historical data, we have decided which kind of model we want to use for our problem, and we end up with this pipeline, this sort of iterative process, because we want to test whether the model we have built, the model we have decided to use, is right for the problem at hand; we want to evaluate the performance of the model; and we want to optimize the model, which in this case means trying to tune the parameters of the model in order to improve its performance.

So we want to deal with this whole iterative process. And what about the risks related to machine learning? First of all, we may end up dealing with an unstable dataset: we may need to test against data that contains some noise. On the other hand, as I said before, machine learning is essentially algorithms, so we need to test whether the code we have written contains faults. We might also run into a problem which is called underfitting: underfitting means that the learning function we decided to use, and sometimes this means the set of parameters we decided to accept for our model, is not properly suited for the dataset, so the learned function does not take into account enough information; the model is not accurate enough to learn from our data. The opposite problem is overfitting, the counterexample, the completely opposite problem: here the learned function does not generalize at all, and this is a quite difficult phenomenon to discover; we will see that there are some techniques to deal with it. Finally, we have the unpredictable future: we don't actually know whether our model will keep working or not, so we need to track and test the performance of our model over time.
How do we cope with these kinds of issues? For each risk we have a remedy. If we want to reduce the problem of unstable data, we have a testing stage where we are required to do some testing work. If we want to avoid underfitting or overfitting problems, we have a technique which is called cross-validation, and we will see some examples in a moment. And to track the performance of a model we have metrics such as precision and recall; who knows what precision and recall are? OK, most of you, so I won't spend time on that. So let's start with dealing with unstable data. Basically the point is to test the code, and testing your code is one of the things that I suggest you do most of the time.
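As a quick refresher on the precision and recall metrics mentioned here, they can be computed by hand for a binary classifier; the label vectors below are invented for illustration:

```python
# Hand-computed precision and recall for a binary classifier.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # ground-truth labels (made up)
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # classifier predictions (made up)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the actual positives, how many were found
print(precision, recall)     # -> 0.75 0.75
```

Tracking these two numbers over time is one simple way to notice a model silently degrading.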
In Python we have a lot of tools for testing. We have the great unittest module, and basically unittest is based on a set of assertions: for instance, we have an assertion to test equality, and an assertion to test that something is an instance of a given class. We have a lot of assertions; the numbers you see here in the figure refer to the Python version in which each assertion was introduced in unittest. Let me just briefly remind you that the unittest module in Python 3 is a bit more extended and improved with respect to the one in Python 2; I will show you one example of that in a couple of slides. Moreover, we have assertions to test exceptions, assertions to test warnings, and even assertions to test logs, and this is an example of how you can use them: basically, here you test whether the output of the logger corresponds to what you expect.

But in the case of machine learning we need to take into consideration that we are basically dealing with numbers. In fact, one of the most important characteristics of this domain is that data are represented through matrices: in general we end up having the feature matrix X represented as a matrix of numbers, and we have labels, which are basically an array of numbers. So we have to deal with numbers, and the tests we're going to write, the unit tests we're going to write, have to deal with numerical problems: we need to test numbers, and in particular we need to compare arrays and floating point numbers.
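A minimal sketch of the kind of unittest-based assertions being described; the `divide` function and its tests are a made-up example, not taken from the slides:

```python
import unittest

def divide(a, b):
    """Tiny function under test."""
    if b == 0:
        raise ZeroDivisionError("b must be non-zero")
    return a / b

class TestDivide(unittest.TestCase):
    def test_value(self):
        # assertEqual / assertIsInstance: plain value and type checks.
        self.assertEqual(divide(6, 3), 2)
        self.assertIsInstance(divide(6, 3), float)

    def test_exception(self):
        # assertRaises used as a context manager.
        with self.assertRaises(ZeroDivisionError):
            divide(1, 0)

# Run the tests programmatically (unittest.main() would also work from the CLI).
suite = unittest.TestLoader().loadTestsFromTestCase(TestDivide)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```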
In this particular case NumPy comes to the rescue. NumPy has a testing module that includes some additional assertions, such as assertions for "almost equal" or approximately equal numbers, and some assertions related to array comparison. We'll see a couple of examples. First, if we want to assert that two numbers are almost equal, we might use the assert_almost_equal assertion in the numpy.testing module, and we might specify the number of decimal positions up to which the two numbers are compared. In the first case we want to test the numbers up to seven decimal places, so the test passes; in the second case, since only the last digits are different, and we restricted the number of decimal places to take into account, the test passes as well. We also have an assertion that says that two arrays should be almost equal; when it fails, you can see that the mismatches are reported, and this is one of the things we need to take into account when we deal with floating point numbers. Moreover, if we want to test that two arrays are equal, NumPy provides two different functions, assert_allclose and assert_array_equal. The assert_allclose function implements this comparison: basically, assert_allclose accepts some additional parameters, atol, which means absolute tolerance, and rtol, which is the relative tolerance, and in this case the test will pass. If instead we use assert_array_equal, these two arrays are different, and the assertion reports that the mismatch is 50%. Again, if we want to compare floating point numbers, we might take into account the so-called ULP, the unit in the last place, the unit of least precision, which is usually referred to as the machine epsilon. If you want to know what the epsilon is for NumPy, and for floating point numbers in general, you might get it by using np.finfo.
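The numpy.testing assertions and the machine-epsilon check being described can be tried directly; the specific values below are illustrative, not the ones from the slides:

```python
import numpy as np
from numpy import testing as npt

# assert_almost_equal compares up to `decimal` places (7 by default)...
npt.assert_almost_equal(0.12345678, 0.12345679)       # passes: within 7 places
npt.assert_almost_equal(1.001, 1.002, decimal=2)      # passes: coarser comparison

# assert_allclose takes relative (rtol) and absolute (atol) tolerances.
a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9
npt.assert_allclose(a, b, rtol=1e-7, atol=1e-8)       # passes

# assert_array_equal demands exact equality, and fails for the same arrays.
try:
    npt.assert_array_equal(a, b)
except AssertionError:
    print("exact comparison failed, as expected")

# The machine epsilon ("ULP" at 1.0) for float64, via np.finfo:
eps = np.finfo(float).eps
assert 1.0 + eps != 1.0       # one full epsilon is representable
assert 1.0 + eps / 2 == 1.0   # half an epsilon is rounded away
```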
np.finfo(float).eps gives us the absolute ULP for floating point numbers, and in this case, if you want to test whether two values are equal: in the first case the test passes, because we add just a single epsilon, and due to the floating-point representation the check behaves as discussed; in the second case the test fails, because we're adding a quantity greater than the absolute, I mean greater than the unit of least precision, so the two numbers are considered different: X and Y are not equal, being one ULP apart.

Finally, numpy.testing is nice because it also provides some more tools for your testing. First, it has some decorators that integrate with the nose testing framework; just as an example, one of these decorators is shown in this slide: it allows you to decorate a function, telling the framework that the test is supposed to run slowly, where what "slow" means depends on your personal definition. Then we have the mock framework, which is included in the unittest module of Python 3, and this is one of the features I was referring to when I said that the unittest module built into Python 3 comes a bit extended and enhanced with respect to the one in Python 2. In Python 3 you may do something like "from unittest import mock", and this works; in Python 2, if you try to import mock from unittest, you get an error, and if you want to use mock, all you should do is pip install mock, which is the mock package available on PyPI.
Let's see an example of mock. Do you know what a mock is? OK, so to the point: basically, here we define a function, factorial, which is a recursive function that computes the factorial. The factorial here prints a message, and this message is just used to detect whether the real code has been exercised, rather than the mock. In the first test we mock the factorial function, and in the second test we don't.

This is the output of the session. Here we have the output we want to test with the assertion; we create a mock, which replaces the real factorial, and instead of exercising the real code, we get the output of the mock that we have already defined. Note that no message has been printed: nothing has been printed here, because the real code has not been exercised by the test; the factorial has just been mocked. In the second case we get an AssertionError: here we call the real factorial of three, which is not supposed to raise any exception, and the assertion fails because its output differs from the mocked one; so here we are exercising the real code.
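A small sketch of the mocking idea just described, assuming a recursive factorial with a print call used as the "was the real code exercised?" marker; the names and values are reconstructed for illustration, not copied from the slides:

```python
try:
    from unittest import mock   # Python 3.3+
except ImportError:
    import mock                 # Python 2: pip install mock

def factorial(n):
    print("real factorial called")   # side effect: proves the real code ran
    return 1 if n <= 1 else n * factorial(n - 1)

def report(fact_fn, n):
    """Code under test: formats the result of whatever factorial it is given."""
    return "{}! = {}".format(n, fact_fn(n))

# Replace the real function with a mock: the real code is never exercised,
# so nothing is printed, and the mock's canned return value is used instead.
fake = mock.Mock(return_value=120)
print(report(fake, 5))               # -> 5! = 120 (and no "real factorial" line)
fake.assert_called_once_with(5)      # the mock also records how it was called

# With the real function, the printed messages show the code was exercised.
print(report(factorial, 5))
```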
OK, so this was the part related to unstable data. What about model generalization? I don't have the time to explain it in full, but I'll just show you the example. Basically, the two most important parts of this code are these, because we randomly generate some data in this example, and we try to apply an algorithm, in this case a linear regression algorithm, on these data using different features, different polynomial features in particular. The different polynomial features are generated by the polynomial-features class of the scikit-learn package, and they have different degrees: we try to fit features of degree 1, 4 and 15, and we test what the resulting models look like.

This is the output. Basically, the dots are the data; in green you may see the true function, and in blue the function approximated by the model. In the first case we have a model which is underfitting: the model we defined here, the linear model with degree-one features, is not taking into account enough information. In the second case we have a very good model, almost perfectly suited for our data. And in the third case the model is overfitting: it's trying to capture every single point, so we end up with a very convoluted approximation. If we look at this particular case, it might seem that if we define a model with polynomial features of degree 15 for this particular data, we have a perfect model; but indeed we don't, and this is because this particular model has been exercised only on training data, and the problem is precisely, in some sense, overfitting. What does that mean? It means that if we consider just the training data, we can perfectly fit every data point, but the model does not generalize anymore; it's not generalizing, in the sense that if the model is applied to new data it will fail, because the model has been trained too much on the training dataset, and no generalization is possible anymore.

So how can we cope with this kind of problem? One extremely important part of testing the model, of evaluating the behaviour of the model, is a technique which is called cross-validation, and here the scikit-learn package helps us with a lot of built-in functions that allow us to apply cross-validation and model evaluation techniques. In this particular case we apply a very simple form of cross-validation, which is called train and test split: basically, we take the input data and we split it into different sets, so we have the training set and the validation set.
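A NumPy-only sketch of the same hold-out idea: the talk uses scikit-learn's train/test split, PolynomialFeatures and LinearRegression, but here np.polyfit plays both roles, and the synthetic cosine data merely mimics the slides' example:

```python
import warnings
import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 40)
true_fn = lambda t: np.cos(1.5 * np.pi * t)           # the "green" true function
y = true_fn(x) + rng.normal(scale=0.1, size=x.shape)  # noisy observations

# Simple hold-out split: 70% training, 30% validation.
idx = rng.permutation(len(x))
train, test = idx[:28], idx[28:]

def validation_mse(degree):
    """Fit a polynomial of the given degree on the training split and
    return its mean squared error on the held-out split."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")   # high degrees are ill-conditioned
        coeffs = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coeffs, x[test])
    return np.mean((pred - y[test]) ** 2)

for degree in (1, 4, 15):                 # under-fit, good fit, over-fit
    print(degree, validation_mse(degree))
```

The held-out error exposes what the training error hides: a degree-1 model underfits, while the degree-4 model tracks the true function much more closely.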
We train the model on the training data and we evaluate the prediction performance of the model on the validation data. One technique to inspect the properties of the predictions in a different form is the so-called confusion matrix: this is for a classification problem, actually a three-class, multiclass problem, and in the matrix you can see where we misclassified items in this classification. Another, more complicated example, which I have just enough time to show you, is this one.
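A confusion matrix for a three-class problem can be built by hand; scikit-learn provides this as sklearn.metrics.confusion_matrix, so this NumPy version and its toy labels are just for illustration:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2]   # made-up ground truth
y_pred = [0, 1, 1, 1, 2, 0]   # made-up predictions
print(confusion_matrix(y_true, y_pred, 3))
```

Everything off the diagonal is a misclassification, which is exactly what the slide's matrix visualizes.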
This is an example of the k-nearest neighbours classifier applied to a dataset, and, just to conclude, this is very interesting because we want to compare the performance on the training data and on the cross-validated data. We apply here a function which is called shuffle split: basically, we take the samples, we have 150 samples, and we use a function to generate the true function, the x and y data, as a regression problem; and then we want to compare the learning curves. The learning curve is basically the training score plotted against the cross-validation score. This is the result for the degree-four polynomial: here we see that when we enlarge the number of training examples we consider, the distance between the training score and the cross-validation score decreases, so this is a good model. For the degree-one polynomial instead, which was the underfitting one, the error between the prediction and the training stays large even as the data grows: it's not a good model.
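The learning-curve comparison can be sketched without scikit-learn, using repeated random splits in the spirit of its ShuffleSplit; the data, training sizes and polynomial degree below are illustrative, not the slides' exact setup:

```python
import numpy as np

rng = np.random.RandomState(42)
X = rng.uniform(0, 1, 150)                                # 150 samples, as in the talk
y = np.cos(1.5 * np.pi * X) + rng.normal(scale=0.1, size=X.shape)

def scores(n_train, degree=4, n_splits=10):
    """Average train/validation MSE over random shuffles
    (a ShuffleSplit-like scheme, reimplemented by hand)."""
    tr_err, va_err = [], []
    for _ in range(n_splits):
        idx = rng.permutation(len(X))
        tr, va = idx[:n_train], idx[n_train:]
        c = np.polyfit(X[tr], y[tr], degree)
        tr_err.append(np.mean((np.polyval(c, X[tr]) - y[tr]) ** 2))
        va_err.append(np.mean((np.polyval(c, X[va]) - y[va]) ** 2))
    return np.mean(tr_err), np.mean(va_err)

# As the training set grows, the gap between the two scores should shrink.
for n in (10, 50, 120):
    print(n, scores(n))
```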
And finally, some conclusions; unluckily I don't have the time to show you the whole story, so, basically:
The first conclusion, a very important one: it's always important to have testing code, especially if you want to test numerical data and numerical algorithms. Another suggestion, I mean, just one hint, some reference to look into, is something which is called fuzz testing. Fuzz testing is very interesting: basically, it generates randomly perturbed data, and this technique is usually used to test the robustness of your code; I mean, it tests the behaviour of your algorithms in the case of randomly generated data.
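A minimal fuzz-testing sketch in the spirit described here: hammer a function with randomly generated inputs and check invariants rather than fixed expected outputs. The `normalize` function is a made-up target, not something from the talk:

```python
import random

def normalize(values):
    """Scale a non-empty list of numbers into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Fuzz test: random inputs, invariant checks (length preserved, output bounded).
random.seed(0)
for _ in range(1000):
    data = [random.uniform(-1e6, 1e6) for _ in range(random.randint(1, 50))]
    out = normalize(data)
    assert len(out) == len(data)
    assert all(0.0 <= v <= 1.0 for v in out)
print("1000 random inputs survived")
```

Libraries such as Hypothesis automate this style of property-based testing, generating and shrinking the random inputs for you.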
That's my time, so thank you a lot for your attention.


Formal Metadata

Title Machine Learning Under Test
Title of Series EuroPython 2015
Part Number 98
Number of Parts 173
Author Maggio, Valerio
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
DOI 10.5446/20217
Publisher EuroPython
Release Date 2015
Language English
Production Place Bilbao, Euskadi, Spain

Content Metadata

Subject Area Information technology
Abstract Valerio Maggio - Machine Learning Under Test One point usually underestimated or omitted when dealing with machine learning algorithms is how to write *good quality* code. The obvious way to face this issue is to apply automated testing, which aims at implementing (likely) less-buggy and higher quality code. However, testing machine learning code introduces additional concerns that have to be considered. On the one hand, some constraints are imposed by the domain, and the risks intrinsically related to machine learning methods, such as handling unstable data, or avoiding under-/overfitting. On the other hand, testing scientific code requires additional testing tools (e.g., `numpy.testing`), specifically suited to handle numerical data. In this talk, some of the most famous machine learning techniques will be discussed and analysed from the `testing` point of view, emphasizing that testing would also allow for a better understanding of how the whole learning model works under the hood. The talk is intended for an *intermediate* audience. The content of the talk is intended to be mostly practical, and code oriented. Thus a good proficiency with the Python language is **required**. Conversely, **no prior knowledge** about testing nor Machine Learning algorithms is necessary to attend this talk.
Keywords EuroPython Conference
EP 2015
EuroPython 2015
