00:16
So the talk today is about scikit-learn, or in other
00:19
words, why I think scikit-learn is so cool. I would like to ask you
00:25
three questions. The first one: do you know what machine learning actually is? All right, perfect. The second one: have you ever used scikit-learn? OK. And the third one:
00:46
did you also attend the great training on scikit-learn yesterday? OK, OK. It's just to get a show of hands.
00:58
So what does machine learning actually mean? There are many definitions of machine learning, and one of them is this: machine learning teaches machines how to carry out tasks by themselves. It's a very simple definition, and it really is that simple; the complexity comes with the details. This is a very general definition, just to give you the intuition behind it.
01:25
Machine learning at a glance: machine learning is about algorithms that are able to analyze, to crunch the data, and in particular to learn from the data. To do so, they basically exploit statistical approaches, and that's why
01:43
machine learning is
01:48
closely related to data analysis techniques. There are many buzzwords around machine learning: you may have heard about data analysis, data mining, big data and data science. Data science is the generalizable extraction of knowledge from data, and machine learning is related to data science.
02:12
According to this Venn diagram, machine learning is in the middle, and data science partly overlaps with machine learning because it exploits it. Machine learning is a fundamental part of data science.
02:28
But what, actually, is the relation between data mining, and data analysis in general, and machine learning? Machine learning is
02:38
about making predictions. Instead of only analyzing the data we have, machine learning is also able to generalize from that data. With data mining, we have a bunch of data, we may want to crunch it to compute some statistics and analyses on it, and that's it. Machine learning is a bit different: it performs this analysis too, but the goal is slightly different. The goal is to analyze the data and generalize, that is, to find a way to learn from this data a general model for future data, data that are unseen at this point. So the idea is: a pattern exists in the data, we cannot pin this pattern down manually, but we have data on it, so we may learn it from the data. In other words, this is also known as learning by examples. Machine learning comes in two different settings. There is the
03:47
supervised setting. This is the general pipeline of a machine learning algorithm: you have the raw data in the upper left corner, and you translate the data into feature vectors, which is an almost universal preprocessing step. Then you feed these feature vectors to your machine learning algorithm. In the supervised learning setting you also supply the labels, which are the set of expected results on this data. We combine the feature vectors and the labels to generate a model, and we use that model to make predictions on future data, in the bottom left corner of the figure.
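To make the supervised pipeline concrete, here is a minimal sketch, assuming a recent scikit-learn release. The fruit records, their `colour`/`weight` fields and the labels are made up for illustration; `DictVectorizer` plays the role of the "translate raw data into feature vectors" box, and a 1-nearest-neighbor classifier stands in for the learning algorithm.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical raw records (the upper-left box of the pipeline figure).
raw_train = [
    {"colour": "red", "weight": 150.0},
    {"colour": "red", "weight": 175.0},
    {"colour": "yellow", "weight": 120.0},
    {"colour": "yellow", "weight": 110.0},
]
labels = ["apple", "apple", "banana", "banana"]  # expected results (supervision)

# Step 1: translate raw records into numeric feature vectors.
# String fields are one-hot encoded; numeric fields are kept as they are.
vectorizer = DictVectorizer(sparse=False)
X_train = vectorizer.fit_transform(raw_train)

# Step 2: combine feature vectors and labels to fit a model.
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, labels)

# Step 3: the same (already fitted) vectorizer maps new raw data into the
# same feature space, and the model predicts its label.
raw_new = [{"colour": "red", "weight": 160.0}]
X_new = vectorizer.transform(raw_new)
prediction = model.predict(X_new)
print(prediction)
```

Reusing the fitted vectorizer in step 3, rather than fitting a new one, is what guarantees that new data end up with the same feature columns as the training data.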
04:34
A classic example of supervised learning is classification. In this case you have two different groups of data, and you want to find a general rule to separate them: here you find a function that separates the data, so that for future data you will be able to tell which class they belong to. With two classes, when you get a new data point, you will be able to predict the class associated with it. Another example is clustering. In this case the setting is unsupervised learning, and the processing pipeline is this
05:20
one here: you have the same processing, but what is missing is the labels part. That's why it is called unsupervised: you have no supervision on the data, no labels to predict. As for clustering, the problem is
05:38
that you have a bunch of data and you try to cluster it, in other words to separate the data into different groups. So you have a bunch of data and you want to identify the groups inside it. Again, this is just a brief introduction. What
05:54
about Python? Python and data science are closely related nowadays: Python is getting more and more packages for computational science. According to this graph, Python is a cutting-edge technology for this kind of computation, sitting in the upper right corner, and it is actually
06:26
replacing other technologies such as R or MATLAB. One of the advantages of Python is that it provides a single programming language across different applications, and it has a huge set of libraries to exploit. This is the reason why Python is the language of choice for data science, almost replacing MATLAB. By the way, there will also be a PyData conference at the end of the week, starting on Friday, so if you can, please come.
07:10
Speaking of data science and Python, MATLAB can easily be substituted by technologies such as NumPy and SciPy, with matplotlib for plotting, though there are many other possibilities, especially for plotting. R can easily be substituted with the pandas package. In the Python ecosystem we also have some efficient Python interpreters and compilers built for this kind of computation, and projects like Cython, a great project to speed up the computation of Python code.
07:59
The packages for machine learning are manifold.
08:02
Actually, I am going to describe a bit the set of well-known packages for machine learning code, and to make some considerations on why scikit-learn is a great one. We have NLTK, the Natural Language Toolkit, and Shogun, the so-called Shogun machine learning toolbox (this morning there was a talk about it), and of course PyBrain and PyML. There is also a curated list of machine learning packages to which everybody can contribute, in order to spread knowledge about the packages available in different languages, and Python is well represented there. We also have Spark and MLlib: Spark is actually implemented in Scala, not Python, but it has a Python wrapper called PySpark; however, its machine learning library is at a very early stage. Shogun is written in C++ and offers a lot of interfaces, one of which is Python. The other packages here are Python-powered, so let's talk about them. NLTK is implemented in pure Python, so no NumPy or SciPy is needed, while the other packages are built on NumPy, so their code is slightly more efficient for large-scale computation. NLTK supports Python 2, and its Python 3 support is at an early stage; PyML supports Python 2, while its support for Python 3 is not so clear; PyBrain supports only Python 2; and the other two support both Python 2 and Python 3. What about the purpose of these packages? NLTK is for natural language processing: it has some algorithms for machine learning, but it is not meant to be used as a complete machine learning environment; it is mostly about text analysis and natural language processing in general. PyML mostly focuses on supervised learning, in particular on support vector machines, and doesn't include many algorithms for unsupervised learning. PyBrain is for neural networks, which are another set of techniques in machine learning. The other two are more general-purpose: scikit-learn and mlpy contain algorithms for supervised and unsupervised learning, as well as for other machine learning settings. So we set aside the more specialized packages, and we end up with these general-purpose libraries written in Python for machine learning. So, why
11:46
scikit-learn? Well, a Big Data guide recommends scikit-learn for six reasons. The first one is commitment to documentation and usability: scikit-learn has brilliant documentation, and it's very, very useful for newcomers and for people without any background in machine learning. The second reason is that models are chosen and implemented by a dedicated team of experts. Third, the set of models supported by the library covers most machine learning tasks. Fourth, Python and PyData improve the support for data science tools and data science problems. I don't know if you know Kaggle: Kaggle is a site where you may take part in data science competitions, and scikit-learn is one of the most used packages for this kind of competition. Fifth, the focus: scikit-learn is a machine learning library, and its goal is to provide a set of common algorithms to Python users through a consistent interface. These two are the features I like the most, and I will be a lot more precise about them in a few slides. And finally, but by no means least, scikit-learn scales to most data problems, so scalability is another feature scikit-learn supports out of the box. If you want to
13:27
install scikit-learn, you have to type very few commands: you need to install NumPy, SciPy, matplotlib
13:37
and IPython (actually, IPython is not needed; it's just for convenience), and then you install scikit-learn. NumPy and SciPy are required because scikit-learn is built on top of them, and matplotlib is used for plotting. Anyway, if you install a scientific distribution of the Python interpreter, they are already provided out of the box.
14:01
The design philosophy of scikit-learn. One of the greatest features of this package, in my opinion, is that it includes all the batteries necessary for general-purpose machine learning code. It supports features and functionalities for data and datasets; feature selection and feature extraction algorithms; machine learning algorithms in general, in different settings such as classification, regression, clustering and so on; and finally evaluation functions for cross-validation, confusion matrices and so on. We'll see some examples in the next slides. The algorithm selection philosophy for this package is to keep the core as light as possible and to include only well-known, widely used machine learning algorithms. So the focus here is to be as general-purpose as possible, in order to reach a broad audience of users. At a
15:06
glance, this is a great picture depicting all the features provided by scikit-learn; this figure is taken from the documentation. It is a sort of map you may follow to choose the particular machine learning technique you want to use for your problem. There are some clusters in this picture, namely regression, classification, clustering and dimensionality reduction, and you may follow these paths to decide which setting is best suited for your problem. The API of scikit-learn is very intuitive and mostly consistent across every machine learning technique. There are four different objects: the estimator, the predictor, the transformer and the model. These interfaces are implemented by almost all the learning algorithms included
16:20
in the library. For instance, let's look at the API of the estimator, which is one of the main objects. An estimator is an object that fits a model based on some training data and is capable of inferring some properties on new data. For example, if we create an instance of the k-nearest neighbors classifier, KNeighborsClassifier, that is the KNN algorithm, which is a classifier used for classification problems in supervised learning, it has the fit method.
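A minimal sketch of the estimator contract described above, assuming a recent scikit-learn release; the four two-dimensional training points and their labels are hypothetical. Hyperparameters go to the constructor, learning happens in `fit`, and what was learned is exposed as attributes with a trailing underscore.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: 4 samples (rows) with 2 features (columns).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])  # one label per sample (supervised setting)

# Hyperparameters go into the constructor...
knn = KNeighborsClassifier(n_neighbors=3)

# ...and learning happens in fit(), which returns the estimator itself,
# so calls can be chained, e.g. KNeighborsClassifier().fit(X, y).
fitted = knn.fit(X_train, y_train)
assert fitted is knn

# Properties inferred from the data are stored in attributes ending in "_".
print(knn.classes_)
```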
16:56
But the same holds for unsupervised learning algorithms such as k-means: KMeans is an estimator as well, since it implements the fit method, and so are the feature selection algorithms. Then, the predictor.
17:14
The predictor provides the predict and the predict_proba
17:20
methods. Then, the transformer: the transformer is about the transform method, and sometimes there's also the fit_transform method, which applies the fit and then the transformation of the data. The goal is to transform the data so that they can be processed by the algorithms.
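The predictor and transformer interfaces can be sketched as follows, assuming a recent scikit-learn release. `StandardScaler` is used here only as an example of a transformer and `LogisticRegression` as an example of a predictor; the small data matrix is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 4 samples, 2 features on very different scales.
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 120.0], [4.0, 100.0]])
y = np.array([0, 0, 1, 1])

# Transformer: fit() learns the per-feature mean and standard deviation,
# transform() applies them; fit_transform() does both in one call.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Predictor: after fit(), predict() returns one label per sample and
# predict_proba() returns one probability per class for each sample.
clf = LogisticRegression()
clf.fit(X_scaled, y)
labels = clf.predict(X_scaled)
probas = clf.predict_proba(X_scaled)
print(labels)
print(probas.round(3))
```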
17:50
Finally, the last one is the model: the general model you create with your machine learning algorithm, for both supervised and unsupervised algorithms; a model can also report its goodness of fit through the score method. Another great feature of scikit-learn is reflected in this point: scikit-learn provides a straightforward way to create processing pipelines, so you may chain different processing steps out of the box. You might apply SelectKBest, which is a feature selection step; then, after feature selection, you might apply PCA, which is an algorithm for dimensionality reduction; and then you may apply a logistic regression, which is a classifier. So you can chain processing steps very easily, then you call the fit method on the pipeline, and then predict. The only constraints here are that the intermediate steps must be transformers and, if you want to call predict on the pipeline, the last step must be a class that implements the predict method, that is, a predictor. Great, so let's see some examples.
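The pipeline described above (SelectKBest, then PCA, then logistic regression) might look like this on the Iris data. This is a sketch assuming a recent scikit-learn release; the choices `k=3`, `n_components=2`, `f_classif` as the scoring function and `max_iter=1000` are illustrative assumptions, not values from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

iris = load_iris()
X, y = iris.data, iris.target

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=3)),  # keep the 3 best features
    ("pca", PCA(n_components=2)),                         # reduce to 2 dimensions
    ("clf", LogisticRegression(max_iter=1000)),           # final step: a predictor
])

# fit() fits each step in turn, passing the transformed data forward;
# predict() transforms new data through the same steps, then predicts.
pipe.fit(X, y)
predicted = pipe.predict(X[:5])
print(predicted)
```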
19:17
This is a very introductory example. The first thing to consider is data representation. scikit-learn is based on NumPy and SciPy, so all the data are usually represented as matrices and vectors. In general, in machine learning, by definition, we have the data matrix, usually denoted by a capital X, with N rows and D columns. In this case,
19:56
N is the number of samples we have in our dataset and D is the number of features, that is, the relevant information on the data we have. So the training data come in this flavor, and under the hood they are implemented as NumPy arrays or SciPy sparse matrices; usually, if I'm not mistaken, the sparse implementation is the compressed sparse row (CSR) format. And finally we have the labels, y, because we know the values for each of these samples for the problem at hand.
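A small sketch of the N x D convention, with a hypothetical 3 x 4 matrix stored both as a dense NumPy array and in SciPy's compressed sparse row format:

```python
import numpy as np
from scipy import sparse

# Dense data matrix X: N = 3 samples (rows) x D = 4 features (columns).
X = np.array([
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
])
y = np.array([0, 1, 1])  # labels: one value per sample, shape (N,)

n_samples, n_features = X.shape

# The same matrix in compressed sparse row (CSR) format: only the non-zero
# values (and their positions) are stored, which pays off when most
# entries are zero, as with text features.
X_csr = sparse.csr_matrix(X)
print(n_samples, n_features, X_csr.nnz)
```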
20:40
The problem we are going to tackle is based on the Iris dataset: we want to design an algorithm that is able to automatically recognize iris species. We have three different species of iris: Iris versicolor on the left, Iris virginica here, and Iris setosa. The features we're going to consider are four: the length of the sepal, the width of the sepal, the length of the petal and the width of the petal. So every sample in this dataset comes as a vector of four different features. scikit-learn already has a great package to handle datasets. The Iris dataset in particular, because it is very well known in many fields, is already embedded in the scikit-learn library, so you only need to import the datasets package and call the load_iris function, and the iris object is a Bunch object that
21:52
contains different keys: target_names, data, target, DESCR and feature_names. DESCR is a description of the dataset; feature_names are the four different features I mentioned in the previous slide; target_names are the targets we expect on this dataset, in particular setosa, versicolor and virginica, the three different iris species we want to predict. Then we have the data.
22:22
So we go there: iris.data comes as a NumPy matrix, and if you check the shape of this matrix, it is (150, 4): 150 rows times 4 columns, one per feature. The target shape is (150,), because we have a target value for each sample in the dataset. So the number of samples N in this case is 150, the number of features D is 4, and the target here is
23:02
the array of classes: we have a value ranging from 0 to 2, corresponding to the three different classes we want to predict.
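Loading the dataset and checking the shapes mentioned above could look like this (a sketch assuming a recent scikit-learn release, where `load_iris` lives in `sklearn.datasets`):

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()               # a Bunch: a dict whose keys are also attributes

print(iris.feature_names)        # the four measurements, in centimetres
print(iris.target_names)         # the three species
print(iris.data.shape)           # (150, 4): N samples x D features
print(iris.target.shape)         # (150,): one integer label per sample
print(np.unique(iris.target))    # the class codes 0, 1 and 2
```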
23:12
We might try to treat this as a classification problem on
23:15
this data, exploiting the KNN algorithm. The idea of the KNN classifier is pretty simple. For
23:24
example, if we consider K equal to 6, we check the classes: this is the new data point we are trying to classify, plotted together with the training data, and to predict its class we look at the classes of its 6 nearest neighbors. In this case
23:50
it should be red, because the red dots are the majority among the neighbors of the new sample.
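The K-nearest-neighbors idea from the slide can be written from scratch in a few lines. This is an illustrative sketch, not scikit-learn's implementation: it uses plain Euclidean distance and a simple majority vote, and the red/blue points are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=6):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every training sample.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest samples.
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbours.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical 2-D data: "red" points near (1.5, 1.5), "blue" points near (4.5, 4.5).
X_train = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [1.5, 1.5],
                    [4, 4], [4, 5], [5, 4], [5, 5]], dtype=float)
y_train = np.array(["red"] * 5 + ["blue"] * 4)

print(knn_predict(X_train, y_train, np.array([2.5, 2.5]), k=6))  # red
```

With k=6 the query point above has five red neighbours and one blue one, so the vote goes to red; scikit-learn's `KNeighborsClassifier` follows the same idea but adds efficient neighbour search structures and options such as distance weighting.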
23:58
A few lines of code: we import the dataset, we instantiate the KNeighborsClassifier algorithm, in this case selecting n_neighbors equal to 1, then we call the fit method and we train our model.
24:13
This is what we get if we plot the data: these are the so-called decision boundaries of the classifier. And if you want to know, for new data, which species of iris has a 3 cm by 5 cm sepal and a 4 cm by 2 cm petal, let's check: we index iris.target_names with the result of knn.predict, because KNN is a classifier, so you can fit the data and then predict after training. And it tells us: it's virginica. Right.
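The few lines described above might look like this with a recent scikit-learn release. The query flower is the one from the talk (sepal 3 cm by 5 cm, petal 4 cm by 2 cm); features are ordered as sepal length, sepal width, petal length, petal width.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# n_neighbors=1, as in the talk: each new sample takes the label of its
# single closest training sample.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

# Sepal length, sepal width, petal length, petal width (cm).
new_flower = [[3, 5, 4, 2]]
species = iris.target_names[knn.predict(new_flower)]
print(species)  # ['virginica']
```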
24:56
Then we might also, instead of facing this problem as a classification task, face it in an unsupervised setting, as a clustering problem. In this case we're going to use the k-means algorithm. The idea of k-means is simple: we want to create clusters of objects in which each object is assigned to the cluster whose center is nearest, and
25:28
that's it. In scikit-learn it's simple: we instantiate KMeans, specifying the number of clusters we want; in this case three clusters, because we're going to predict three different iris species. This is the ground truth, the values we expect, and this is what we got after calling k-means. As you may have already noticed, the interface for the two algorithms is exactly the same, even though the machine learning settings are completely different: in the former case it was supervised, in the latter it is unsupervised. So, classification versus clustering.
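A sketch of the clustering version, assuming a recent scikit-learn release. `n_init=10` and `random_state=0` are set explicitly here for reproducibility; they are not from the talk. Note that k-means returns arbitrary cluster indices, so cluster 0 need not correspond to species 0.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # no labels are passed to fit(): unsupervised setting

# Three clusters, one per expected species.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

cluster_ids = kmeans.labels_            # cluster index assigned to each sample
print(np.bincount(cluster_ids))         # samples per cluster
print(np.bincount(iris.target))         # ground-truth species counts, for comparison
```

Apart from the missing `y`, the calls are the same `fit`-then-inspect pattern used for the KNN classifier, which is the consistency of interface the talk points out.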
26:12
Finally, to conclude: another great battery included in scikit-learn, and I haven't seen many other machine learning libraries in Python with such complete internal batteries, is model evaluation. Model evaluation is necessary to know whether our prediction model is good. So
26:40
we apply model validation techniques. We might simply try to verify that every prediction corresponds to the actual target, but this is meaningless, because we would be checking the model on the same data we trained it on. This kind of validation is very poor because it is based only on the training data: we are just checking whether we are able to fit the data, not whether the model is able to generalize. A key feature of this kind of technique is generalization: the model should not stick too closely to the training data, because then you would end up with a problem called overfitting. It needs to generalize, to be able to predict even new data that are not identical to the training data. One technique commonly used in machine learning is the so-called confusion matrix. scikit-learn provides, in the metrics package, different kinds of metrics to evaluate your performance, and in this case we're going to use the confusion matrix. The confusion matrix is very simple: it is a square matrix whose rows and columns correspond to the classes you want to predict, and each cell counts how many samples of an expected class (row) were predicted as a given class (column). So you have all the possible matchings, and if all the data lie on the diagonal, you predicted all the classes perfectly. OK, right.
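A minimal sketch of computing a confusion matrix with `sklearn.metrics.confusion_matrix`, assuming a recent scikit-learn release. It deliberately evaluates on the training data, the poor practice the talk warns against: with `n_neighbors=1` each training sample is its own nearest neighbor, so the matrix should come out essentially diagonal and tells us little about generalization.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
y_pred = knn.predict(X)  # predicting the training data itself

# Rows: true classes; columns: predicted classes (scikit-learn's convention).
# Off-diagonal cells count misclassified samples.
cm = confusion_matrix(y, y_pred)
print(cm)
```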
28:41
Another technique, very well known to those of you already familiar with machine learning, is cross-validation. Cross-validation is a model validation technique for assessing how the results of a statistical analysis of data will generalize to an independent dataset, not only to the set we used for training. scikit-learn already provides all the features to handle this kind of stuff, so it requires us to write very little code, just the few lines necessary to import the functions already provided in the library, whereas otherwise we would have to implement this kind of function over and over in our Python code. So this is very useful, even for lazy programmers like me. In this case we exploit train_test_split: the idea of cross-validation here is to split the data into different sets, a training set and a test set. We fit on the training set and we predict on the test set, and in this case we see that there are some errors in the predictions. This is a more reliable way to evaluate our prediction model. So, the last couple of things.
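A sketch of the hold-out split and of k-fold cross-validation. At the time of the talk these helpers lived in `sklearn.cross_validation`; current releases expose them from `sklearn.model_selection`, which is what this assumes. `test_size=0.25`, `stratify=y`, `random_state=0` and `cv=5` are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Hold out 25% of the samples; stratify keeps class proportions similar
# in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
y_pred = knn.predict(X_test)              # predictions on unseen data
print(confusion_matrix(y_test, y_pred))   # off-diagonal cells are errors

# k-fold cross-validation: 5 train/test splits, one accuracy score each.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5)
print(scores.mean())
```

The single split is the simplest form of the idea; `cross_val_score` repeats it over several folds so that the estimate does not hinge on one lucky or unlucky split.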
30:22
First, large scale out of the box. Another great battery included in scikit-learn is support for large-scale computation out of the box. You can combine scikit-learn code with any library you want for multiprocessing, parallel computation or distributed computation, but if you want to exploit the features already provided, there are many techniques in the library that accept a parameter called n_jobs. If you set it to a value different from 1, which is the default, the computation is performed on the different CPUs you have in your machine; if you set it to -1, it exploits all the CPUs you have on your single machine. This works in different settings for different kinds of machine learning applications: you may apply multiprocessing for clustering, as in the k-means example we saw a few slides ago, for cross-validation, and for grid search. Grid search is another of the great features included in scikit-learn: it identifies the best parameters for a predictor, those that maximize the cross-validation score, so that the model generalizes best. That's just to give you the intuition. This is possible thanks to the joblib library, which scikit-learn uses in the background: the n_jobs parameter here corresponds to a call to joblib. joblib is well documented as well, so you might review its documentation for any additional details.
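A sketch of a parallel grid search, assuming a recent scikit-learn release; the parameter grid is hypothetical. The set of estimators that accept `n_jobs` has changed across releases (`KMeans`, for instance, no longer takes it), so check the current API reference before relying on it.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Candidate hyperparameter values, tried exhaustively.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9], "weights": ["uniform", "distance"]}

# 10 candidates x 5 folds = 50 fits; n_jobs=-1 asks joblib to spread them
# over all available CPU cores.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)

print(search.best_params_)  # combination with the highest mean CV score
print(search.best_score_)   # that mean cross-validation score
```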
32:37
And last but by no means least, scikit-learn integrates with other libraries. As a sort
32:43
of example, scikit-learn can be integrated with NLTK, the Natural Language Toolkit, and with scikit-image, just to mention a couple. Let me detail the NLTK case. By design, NLTK includes an additional module, nltk.classify.scikitlearn, which is actually a wrapper in the NLTK library that bridges the NLTK API and the scikit-learn API so you can use them together. So if you have NLTK code and you want to apply a classifier from the scikit-learn library, you import the classifier from scikit-learn and then use the SklearnClassifier class from the NLTK package, which wraps the scikit-learn classifier behind the NLTK classifier interface. In this case we use SVC, which stands for support vector classifier. You may also include this kind of stuff in a scikit-learn pipeline. So, in conclusion,
33:57
scikit-learn is not the only machine learning library available in Python, but it is powerful and, in my opinion, easy to use, with very efficient implementations, since it is built on NumPy, SciPy and Cython under the hood, and it is highly integrated with other libraries; NLTK and scikit-image are just examples.
34:18
So I really hope that you are looking forward to using it, and thanks for your
34:25
kind attention. Thank you, thank you very much. We have a few minutes left for your
34:35
questions; please raise your hand and we'll bring you the microphone.
Question: Thanks for the talk. The first question: does scikit-learn provide any online learning methods?
Answer: Yes, actually this is a point I wasn't able to include in the slides. Online learning is already provided: there are many classifiers and techniques that offer a method called partial_fit, which lets you provide the model with one batch of data at a time. So the interface has been extended with a partial_fit method, and some techniques allow for online learning. Another great usage of partial_fit is so-called out-of-core learning: in that setting your data are too big to fit in memory, so you provide them one batch at a time, calling partial_fit to fit the model batch by batch.
Question: Thanks. Second question: for regression, is there any support for missing values or missing labels, apart from just deleting them? In the case of online learning, or in general?
Answer: Missing labels or missing data, what do you mean?
Question: Like, you get a feature vector that is missing the value of the third component.
Answer (another voice): Yes, we have a very simple imputer that can impute by the median or the mean along the different directions. So if you have very few missing values, that works well; if you have a lot, then you might want to look at matrix completion, which we do not have. We had a Google Summer of Code project on this, but it didn't finish; we welcome contributions, of course.
Question: I have some experience with scikit-learn, but I'm not a mathematician, so I have no way of knowing about all the stuff under the hood, and I don't want to go too deep into the algorithms and the mathematics. This is the biggest problem for me: understanding what to use. So if you've got a big dataset with features and labels, for supervised learning, what would you advise someone who doesn't know how it works? Which small, easy steps should I consider to improve the results of a specific classification?
Answer: Actually, machine learning is about finding the right model and the right parameters, so there are many steps you may want to apply when training different algorithms. In general you apply a data normalization step. The first step I suggest is preprocessing the data: analyze the data, run some statistical tests on it, do some preprocessing and some visualization, in order to know what kind of data you're dealing with. The second step is to try the simplest model you can apply and then improve it one step at a time. Once you find the right model, you are finally required to find the best settings for that model. In that case you might end up using grid search, for instance, which is provided out of the box, to find the combination of parameters that maximizes the cross-validation score. And of course it's a matter of training on the job: you may find that the model you picked doesn't work well for your predictions, and then you start over again with a different model.
Thanks again. [Name inaudible], who is here, is going to give a talk at PyData as well, I think on Saturday. Isn't it? Yes, on Saturday. So if you attend PyData, don't miss that talk. Well, thanks again.
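The online-learning and missing-value answers in the Q&A above can be sketched as follows, assuming a recent scikit-learn release. The imputer mentioned in the answer was later replaced by `sklearn.impute.SimpleImputer`, which this code uses; `SGDClassifier` is only one example of an estimator with `partial_fit`, and the streamed batches and their labelling rule are synthetic.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

# --- Online / out-of-core learning with partial_fit ---------------------
# Synthetic stream: 5 batches of 20 samples, 3 features, 2 classes.
clf = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])  # must be declared on the first call
for _ in range(5):
    X_batch = rng.randn(20, 3)
    y_batch = (X_batch[:, 0] > 0).astype(int)  # toy labelling rule
    # Each call updates the model with one batch; the full data never has
    # to be in memory at once.
    clf.partial_fit(X_batch, y_batch, classes=all_classes)

# --- Mean imputation of missing feature values ---------------------------
X_missing = np.array([[1.0, 2.0, np.nan],
                      [3.0, np.nan, 6.0],
                      [5.0, 4.0, 9.0]])
imputer = SimpleImputer(strategy="mean")  # or strategy="median"
X_filled = imputer.fit_transform(X_missing)
print(X_filled)  # each NaN replaced by the mean of its column
```

As the answer notes, simple column-wise imputation is reasonable when only a few values are missing; with many gaps, more elaborate approaches such as matrix completion are worth considering.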