
Extending Scikit-Learn with your own Regressor



Formal Metadata

Title Extending Scikit-Learn with your own Regressor
Title of Series EuroPython 2014
Part Number 64
Number of Parts 120
Author Wilhelm, Florian
License CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI 10.5446/19972
Publisher EuroPython
Release Date 2014
Language English
Production Place Berlin

Content Metadata

Subject Area Computer Science
Abstract Florian Wilhelm - Extending Scikit-Learn with your own Regressor. We show how to write your own robust linear estimator within the Scikit-Learn framework, using as an example the Theil-Sen estimator, known as "the most popular nonparametric technique for estimating a linear trend". ----- Scikit-Learn is a well-known and popular framework for machine learning that is used by data scientists all over the world. We show in a practical way how you can add your own estimator following the interfaces of Scikit-Learn. First we give a short introduction to the design of Scikit-Learn and its inner workings. Then we show how easily Scikit-Learn can be extended by creating your own estimator. To demonstrate this, we extend Scikit-Learn with the popular and robust Theil-Sen estimator, which is currently not in Scikit-Learn. We also motivate this estimator by outlining some of its superior properties compared to the ordinary least squares method (LinearRegression in Scikit-Learn).
Keywords EuroPython Conference
EP 2014
EuroPython 2014
OK, and now we're going to learn about extending scikit-learn with your own regressor, from Florian Wilhelm.
So, welcome to my talk, "Extending scikit-learn with your own regressor". First I'll give a short introduction to scikit-learn, which maybe most of you know. Then I'll talk about an estimator which is not yet included in scikit-learn, a robust estimator called Theil-Sen. With this as an example I will show you how you can implement your own estimator in scikit-learn, so how to extend scikit-learn, then a little bit about what you need to consider if you want to contribute your own estimator to scikit-learn, and a little bit about my own experiences in contributing to scikit-learn.
So, what is scikit-learn? Scikit-learn is a machine learning library. Whenever you have some kind of data and you want to extract some insight from this data, scikit-learn is something you can use. It is a simple and efficient tool for data mining and data analysis, so it's really simple to use, that makes it accessible for everyone, and you can really apply it to all kinds of problems. I took these marketing sentences right from the web page, but it's really true, it's really extremely simple. So if you haven't used it, you should definitely look into scikit-learn. It's built on NumPy, SciPy and Matplotlib, three famous libraries which are used all over the technical ecosystem. And what's really good: it's open source but still commercially usable, so it's BSD-licensed. If you maybe don't want to contribute everything you do with it back to scikit-learn, you can still use it, which makes it really good for commercial applications.
OK, so this picture can also be found on the scikit-learn website. I like it because it gives a nice overview of all the things you can do with scikit-learn, the basic areas of application. You can do classification; a good example would be if you have hand-written digits and you want a classifier for those digits. Then you have everything related to clustering: if you're just looking for patterns in the data without having some kind of labels, the real targets, so unsupervised learning, you can use clustering. It also supports dimensionality reduction techniques, so when you have too many features and want to avoid overfitting, for instance, you have a lot of tools to do PCA and so on. And of course there is the whole regression part, if you want to find the relationship of a target variable depending on some features, and this is what we're going to talk about.
But before we start, first a little refresher from school, where you have learned about the least squares method. It's called LinearRegression in scikit-learn, and I'll quickly explain how it works, because Theil-Sen is a kind of extension to this. So we have independent variables x_1 to x_d; in scikit-learn they're called features. And we have a dependent variable, the so-called target y. Now we want to build a model: we want to use the features to somehow predict the value y. A really simple approach is just a linear model, so you have a linear combination of x and the coefficients w, and you try to explain your target variable y with the features x. In order to find the w, you minimize the functional which is given here. This is least squares: you're minimizing the squared distances. A typical one-dimensional case is this picture here: the dots are your data, in one dimension, so the x-axis is the feature, and the line now minimizes the squared distances to all the dots.
This works really well if you have perfect data, because an internal assumption is that the error is normally distributed, and then it performs quite well. But in practice, in many, many projects that I worked on, the data you get, maybe from customers, is less than perfect. You have a lot of outliers, you have corrupted data because of measurement errors, or maybe because someone entered wrong values somewhere. And then quite often your data looks like this in one dimension: you directly see on the right-hand side a few values that don't really fit to the really dense line on the left side. So what do we do in this case? We could maybe just remove those dots, just by looking at this plot and deciding: OK, I don't want to take these into my fit. But what do you do if you are in a 10-dimensional space, or in an n-dimensional space? You can't just see by looking at a plot like this which points are outliers. You need to do some complicated pre-processing to eliminate those outliers.
So what happens if you now just apply ordinary least squares? You get, of course, completely wrong results. You would not expect the line to go like this; you would rather want the line to go through the dense line on the left side. So this is something you really need to consider whenever you look at new data: whether there are outliers in this new data, and that you should come up with something robust.
Theil-Sen, as a kind of natural generalization of the least squares method, is an algorithm that looks at all possible pairs of the sample points and calculates a list of slopes. In the end, when you have the list of slopes, you take the median. The median is what makes the method really robust, because the median doesn't care about a single value; it only cares about the rank, so the order of those values. I think this is easily shown and understood with an example. So here again is our plot with the outliers. We now take two points, the two red dots here, we calculate the slope connecting those two dots and add it to the list, just next to the x-axis. The slope is 3.1 in this case. And we just go on with all possible pairs of points. This time it's 3.1 again. And now we are not so lucky anymore: we have one outlier connected to one point we consider not to be an outlier, and we add that slope to the list as well. You see that the list is kept sorted as we go along. And another one, and this slope connects even two outliers. We could go on and on, but already here we see that if you look at the center of the list of slopes, the median, this center is correct: it's 3.0. And 3.0 is the slope we would expect the line to have, so that it lies inside the dense line of our sample points. So the whole principle is this: you take the median, you don't weigh all points equally, and with this method the outliers are not really considered anymore.
This is the case for a two-dimensional problem, so just one feature and a target variable. Of course this method can be extended to n-dimensional space, because in most cases, if you use scikit-learn, you will have a lot of features and not only one. For the n-dimensional case I've given here the citation of this paper. In n-dimensional space you don't have slopes anymore: the slopes become hyperplanes, and the list of slopes becomes a list of vectors. But you basically do the same thing: in an n-dimensional space you sample n plus 1 points, which define the hyperplane, and put the vector of this hyperplane into the list. Then it becomes a little tricky, because you need to decide what the median of a list of vectors is. A median of a list of vectors can be, for instance, the spatial median. The spatial median is just this: if you see the list of vectors as points in n-dimensional space, you try to find the one point such that the sum of the distances to all other points is minimized. This is the so-called Fermat-Weber problem. But basically it works exactly like it does here. OK, then again the comparison with ordinary least squares: if you do this iteration for all points, Theil-Sen finds the perfect line.
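The one-feature procedure described above (slopes of all pairs of points, then the median) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation from the talk: the function names `ols_line` and `theil_sen_line`, the sample data, and the choice of taking the intercept as the median of the residuals are all assumptions made for this sketch.

```python
from itertools import combinations
from statistics import median


def ols_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def theil_sen_line(xs, ys):
    """Median of the slopes of all pairs of sample points."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # a vertical pair has no slope
    ]
    slope = median(slopes)
    # Assumed intercept convention: median of the residuals y - slope * x.
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Hypothetical data: eight points exactly on y = 3x + 1, plus two outliers.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3 * x + 1 for x in xs]
ys[8] = 0.0
ys[9] = -5.0

print("OLS:      ", ols_line(xs, ys))
print("Theil-Sen:", theil_sen_line(xs, ys))
```

On this data, 28 of the 45 pairwise slopes come from uncorrupted pairs, so the median slope stays at 3 and the two outliers only occupy one end of the sorted list. The least squares slope, by contrast, is pulled far away from 3.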
OK, so this is about the motivation of Theil-Sen. In one project I had to deal with corrupted data and outliers that could not really be removed by hand, and then I thought: OK, how would I now implement this estimator inside of scikit-learn? The good thing about scikit-learn is that you have a lot of good documentation. I think scikit-learn is used so often because the documentation is just so good. So if you look for how to write your own regressor, you directly get a manual.
What you need to do if you want to write your own regressor: you have to provide four functions. set_params and get_params are of course for setting and getting the parameters of your estimator. Those methods are more or less used only internally. They are used, for instance, if you do cross-validation, or if you use another kind of meta-estimator; then those functions are used to set and get the parameters of your estimator. But you need to implement them for your own estimator. And of course you need a fit and a predict method. The BaseEstimator class, which is inside scikit-learn, already gives you an implementation of set_params and get_params, so you can just inherit from it. And since Theil-Sen is a linear model, we can also directly inherit from LinearModel, and this also gives you the predict method, because in the linear case, as we saw in the formula before, predicting for a feature or design matrix X is just the matrix-vector product: you just take X times the weights w we have calculated before. So if we inherit as shown on the right side, if we just let our Theil-Sen estimator inherit from LinearModel, we already get set_params, get_params and predict.
Additionally we have so-called mixins in scikit-learn. The principle of mixins is that you have some reusable code that can only work together inside something larger, and you can combine different mixins inside a class. In Python, mixins are done with the help of multiple inheritance. In scikit-learn there are a lot of mixins: classifier, regressor, cluster and transformer mixins. In our case, since we're writing a regressor, we of course also inherit from the RegressorMixin, which gives us additional functionality like, for instance, a score function. But that's already about it. So let's look at the source code of Theil-Sen.
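The interface described above can be sketched as a small, self-contained estimator. This is a hedged illustration and not the source code shown in the talk: the class name `MedianSlopeRegressor` is hypothetical, it handles only a single feature, and it inherits from `BaseEstimator` and `RegressorMixin` and writes `predict` by hand instead of inheriting from `LinearModel`, whose import path is not assumed here. It also assumes a recent scikit-learn in which `check_X_y` and `check_array` are available from `sklearn.utils`.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils import check_array, check_X_y


class MedianSlopeRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical one-feature robust regressor (median of pairwise slopes)."""

    def __init__(self, fit_intercept=True):
        # Only store the arguments. BaseEstimator reads them back by name
        # to provide get_params() and set_params().
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        X, y = check_X_y(X, y)  # validates shapes, converts to arrays
        if X.shape[1] != 1:
            raise ValueError("this sketch supports exactly one feature")
        x = X[:, 0]
        i, j = np.triu_indices(len(x), k=1)  # all index pairs with i < j
        keep = x[i] != x[j]  # skip vertical pairs
        slopes = (y[j][keep] - y[i][keep]) / (x[j][keep] - x[i][keep])
        # Trailing underscore marks attributes learned from the data.
        self.coef_ = np.array([np.median(slopes)])
        if self.fit_intercept:
            self.intercept_ = float(np.median(y - x * self.coef_[0]))
        else:
            # Simplification for this sketch: the intercept is fixed at 0.
            self.intercept_ = 0.0
        return self  # lets callers chain est.fit(X, y).predict(X)

    def predict(self, X):
        X = check_array(X)
        # Linear case: matrix-vector product plus intercept.
        return X @ self.coef_ + self.intercept_


# Hypothetical data: y = 3x + 1 with two corrupted targets.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X[:, 0] + 1.0
y[8:] = [0.0, -5.0]

est = MedianSlopeRegressor()
print(est.get_params())                   # provided by BaseEstimator
print(est.fit(X, y).predict([[10.0]]))    # fit returns self, so calls chain
print(est.score(X, y))                    # R^2, provided by RegressorMixin
```

Because the constructor does nothing except store its arguments, `get_params` and `set_params` work without any extra code, which is what lets tools such as cross-validation clone and reconfigure the estimator.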
And as I said before, we just inherit from LinearModel and RegressorMixin to get set_params, get_params and predict. I abbreviated the __init__ function here. Of course you state all the different kinds of parameters you have in the __init__ function, like fit_intercept, so whether you want to fit the intercept. In my implementation there are, I think, about ten different parameters, also for whether you want to work only on a subset of the sample points, whether you want to do the subsampling with the help of some random state, and so on.
The more interesting part is then the fit function. X is our design matrix, the feature matrix, and y the target, as usual in scikit-learn. Here I check the random state with the help of check_random_state; the random state is used to do some subsampling, some subpopulation of X, if you don't want to consider all combinations. And we also check the arrays X and y. So check_arrays and check_random_state are two functions from sklearn.utils, and if you write your own estimators you should have a look into sklearn.utils, the developer tools, which help you a lot with those repetitive things, like checking that an array is float and in dense format, and whether the random state is given as a number that should be used as a seed, or is the random state object itself and should just be passed on. So this is about the developer tools inside scikit-learn.
Then the actual algorithm comes, and I won't go into much detail about the algorithm. As I said before, it's basically quite simple; it's just a bit technical, because you need to create all those different combinations of sample points in n-dimensional space, and you also need to make sure that you don't do too much, depending on some maximum number of samples you might want to consider. I also added parallelization with the help of joblib, which is also included inside scikit-learn. So scikit-learn also comes with some external packages which are directly included, like six and joblib.
OK, and then, in this green part in the center, I calculate the coefficients. Of course the source code is online, so you can check it out. Now the coefficients need to be stored for the predict function to work: they are stored in self.intercept_ and self.coef_, so that the predict method can use those arrays. And in the end, of course, fit returns self, which allows us to chain different methods together: we can call fit and directly predict afterwards, for instance.
After having programmed this, I was really happy that it worked so well. Without being a scikit-learn developer or something, I could really easily take my Theil-Sen prototype and put it inside this framework, so that it can be used with things like cross-validation, for instance, and so on. And I thought: OK, why not just give this back to scikit-learn? So I got the OK from my boss and looked at what we now need to do to really contribute this. And again, contributing to scikit-learn is also well documented.
They have really high quality standards, so here is what you need to do if you also want to contribute something. Your code should of course be unit tested, at least 90 %, but better 100 %, to make sure your method works. Then of course documentation is really important. Looking back, I think the documentation took me way longer than actually writing the code, because you need to find good examples, you need to explain your method a little bit, you need to define all your parameters in docstrings, and so on. You should also consider what the complexity of your algorithm is, the space and runtime complexity. You also need to draw some figures, as maybe you want to compare your method to an already implemented method in scikit-learn. And if you got the idea of this method from some paper, you should of course make a reference to this paper or these papers. Then of course there are the coding guidelines: it's the usual PEP 8, and pyflakes is used inside scikit-learn, and they really help a lot to find quite obvious problems, and it's good that this can be checked automatically before you push. You should also make sure that you use sklearn.utils, so that you don't reimplement stuff that is already there. Another big barrier for me was that I had to make sure that my code runs on Python 2.6, 2.7, 3.4 and so on. This can be done with the help of six, which you have surely heard of and which is also included in scikit-learn, and the parallelization with the help of joblib. OK, so this is about the requirements for a contribution.
Now a little bit about my experiences. My first pull request started in March, and it was my first pull request of any kind in the open-source world. The community of scikit-learn is really great, so there were a lot of improvements due to really good remarks, and I could improve a lot. With the help of the scikit-learn maintainers, the performance was increased by a factor of 5 or even 10, so it was a really huge improvement. I also got hints on some coding guidelines that I still had wrong at this time, so this is really good. Of course, showing your code to other people always gets you good feedback. Then there was also a discussion about Theil-Sen being more of a statistical method than really machine learning, and that RANSAC, the random sample consensus method, is maybe almost always better than Theil-Sen. RANSAC is included in 0.15, and at this time there was scikit-learn 0.14, so it was not included at that time, and I didn't even know that it existed. So during that time I learned about new methods, and it was really, really cool.
If you want to follow up on this pull request: currently Theil-Sen is still not included in scikit-learn, so I'm still working on this. And if you want to read the discussion, it was a really interesting discussion. I can only recommend this to everyone: if you want to contribute to an open source project, it's always a good idea, because during that time you really learn a lot, just about how to improve things and what common standards are and so on.
OK, so that's about it with my talk. A little marketing slide: the company I work for is hiring. Maybe you've seen us just outside, and we will be here until Sunday, so even through to PyData. So if you want, come talk to us. OK.
Thanks a lot.
Thank you. Any questions?
So the question was whether there aren't already sufficient techniques, like ridge regression. Yes, ridge regression is included, but it really depends. What ridge or lasso regression does is remove features completely if you have too many features. With techniques like lasso, that's another one, and ridge, the problem is more that you want to avoid overfitting. So you have, let's say, 100 features but only 1000 samples, and this is really prone to overfitting, and then you give it to lasso or ridge, and it kind of says: OK, I throw out feature number 5. It's more like a model or parameter reduction thing. The thing with outliers is different, because you can have outliers inside one feature. So I think it's a good idea to also include some more robust estimators in scikit-learn. As of now, I mean, RANSAC is now included, and this is an algorithm coming more from computer vision. It's more heuristic, it's not that complicated: it tries to select the right points and checks if it can add other samples to this consensus set, and so on. So I think the scikit-learn developers are really now looking for more robust things in addition to what they already have. More questions?
So the question was whether Theil-Sen could be parallelized. Yes, it can, and it is parallelized. The thing is that taking all those different combinations of possible points can be done perfectly in parallel, calculating the hyperplanes can then be done in parallel, and writing back to some large arrays can be done in parallel. This is what I did with the help of joblib, which is included in scikit-learn, and this works really well. Only the last step is different, where you need to find this one single spatial median. This algorithm is based on an iteratively reweighted least squares scheme, it's called the modified Weiszfeld method, and this is inherently iterative and can't be parallelized. But the first part of course is easily parallelized.
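To make the last step more concrete, here is a minimal sketch of a spatial median computed with the plain Weiszfeld iteration. It is not the modified variant named in the answer and not the code from the pull request; the function name `spatial_median`, the stopping rule, and the sample vectors are assumptions for illustration. Each update is a weighted mean whose weights depend on the previous iterate, which is why this part runs sequentially.

```python
import math


def spatial_median(points, max_iter=300, tol=1e-9):
    """Approximate the point minimizing the sum of Euclidean distances."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        weights = []
        for p in points:
            d = math.dist(p, current)
            if d < 1e-12:
                # The iterate sits on a data point, where the plain update
                # is undefined, so this sketch simply stops here.
                return current
            weights.append(1.0 / d)
        total = sum(weights)
        # Reweighted mean: points far from the iterate get small weights.
        updated = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


# Hypothetical (slope, intercept) vectors: four agree, one is far off.
vectors = [(3.0, 1.0), (3.1, 0.9), (2.9, 1.1), (3.0, 1.05), (-20.0, 40.0)]
mean = [sum(v[k] for v in vectors) / len(vectors) for k in range(2)]

print("mean:          ", mean)
print("spatial median:", spatial_median(vectors))
```

On this data the coordinate-wise mean is dragged toward the single distant vector, while the spatial median stays within the cluster of the four vectors that agree with each other.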

