Scikit-learn to "learn them all"
Automated media analysis
The TIB AV-Portal uses these automatic video analyses:
Scene recognition – shot boundary detection segments the video based on image features. A visual table of contents generated from this gives a quick overview of the video's content and offers pinpoint access.
Text recognition – intelligent character recognition captures, indexes, and makes written language (for example, text on slides) searchable.
Speech recognition – speech-to-text records the spoken language in the video as a searchable transcript.
Image recognition – visual concept detection indexes the moving images with subject-specific and cross-disciplinary visual concepts (for example, landscape, facade detail, technical drawing, computer animation, or lecture).
Keywording – named entity recognition describes the individual video segments with semantically linked subject terms. Synonyms or narrower terms of the entered search terms can thereby be included in the search automatically, which broadens the result set.
Recognized entities
Speech transcript
00:16
So, the talk today is about scikit-learn or, in other
00:19
words, why I think it is "scikit-learn to learn them all", if possible. I would like to ask you
00:25
three questions. The first: do you know what machine learning actually is? If you already know it, raise your hand. All right, perfect. The second one: have you ever used scikit-learn? OK. And the third one:
00:46
how many of you also attended the great training on scikit-learn yesterday? OK, it was just to have you raise your hands again.
00:58
So, what does machine learning actually mean? There are many definitions of machine learning. One is this: machine learning teaches machines how to carry out tasks by themselves. OK, a very simple definition, and it is that simple; the complexity comes with the details. It is a very general definition, just to give you the intuition behind
01:25
machine learning. At a glance, machine learning is about algorithms that are able to analyze, to crunch data, and in particular to learn models from the data. To do so, they basically exploit statistical approaches.
01:43
In other words, what you should remember from this class is that machine learning is
01:48
closely related to data analysis techniques. There are many buzzwords around machine learning; you may have heard about data analysis, data mining, big data, and data science. Data science actually is the generalizable extraction of knowledge from data, and machine learning is related to data science.
02:12
According to a well-known Venn diagram, machine learning is in the middle of it all, and data science partly overlaps with machine learning because it exploits it: machine learning is a fundamental part of the data science stack. OK.
02:28
But what is actually the relation with data mining, and the differences between the two? Machine learning is
02:38
about making predictions. Instead of only analyzing the data we have, machine learning is also able to generalize from the data. We may have a bunch of data that we want to crunch to make certain statistical analyses, and that's it: this is usually called data mining. Machine learning is a bit different, because the goal is slightly different: the goal is to analyze this data and to generalize, to find a way to learn from this data a general model for future data, for data that are unseen at this point. So the idea is: a pattern exists in the data; we cannot pin this pattern down manually, but we have data on it, so we may learn from this data. In other words, this is also known as learning by examples. But learning comes in two different settings. There is the
03:47
supervised setting. This is the general pipeline of a machine learning algorithm: you have the data in the upper left corner; you translate the data into feature vectors, which is a common preprocessing step; then you feed these feature vectors to your machine learning algorithm. The supervised learning setting also requires the labels, which are the set of expected results on this data. We then generate a model from feature vectors and labels, and we generalize: we get a model to predict on future data, in the bottom left corner of the figure.
04:34
A classical example of supervised learning is classification. You have two different groups of data, and you want to find a general rule to separate them; in this case you find a function that separates the data, and for future data you will be able to tell which is the class. That is classification: we have two classes, and in the future, when you get new data, you will be able to predict the class associated with it. Another example is clustering; in this case the setting is unsupervised learning. The pipeline is this
05:20
one: you have the same processing, but what you miss is the label part. That is why it is called unsupervised: you have no supervision on the data, no label to predict. And therefore, as for clustering, the problem is:
05:38
given a bunch of data, try to cluster it, in other words to separate the data into different groups. You have a bunch of data and you want to identify the groups inside it. OK. Now just a brief introduction to
05:54
Python. Python and data science are closely related nowadays; Python is getting more and more packages for computational science. According to this graph, Python is a cutting-edge technology for this kind of computation: it sits in the upper right corner. And actually it is not aiming at replacing
06:26
or substituting other technologies such as R or MATLAB. One of the advantages of Python is that it provides a single programming language across different applications, and it has a very large set of libraries to exploit. This is the reason why Python is the language of choice for data science, and it is largely displacing MATLAB. By the way, there will also be a PyData conference at the end of the week, starting on Friday, so if you can, please go.
07:10
For data science in Python, MATLAB can be easily substituted by NumPy, SciPy, and matplotlib for plotting, and there are many other possibilities, especially for plotting. R can be easily substituted with the pandas library. In the Python ecosystem we also have some efficient Python interpreters and compilers for this kind of computation, such as Numba and PyPy, and projects like Cython, a very great project that allows you to speed up the computation of Python code.
07:59
The packages for machine learning are manifold.
08:02
Actually, I will try to describe briefly a set of well-known packages for machine learning, and I would like to make some considerations on why scikit-learn is a very great one. We have, so far: mlpy, PyML, NLTK (the Natural Language Toolkit), Shogun (the Shogun machine learning toolbox; this morning there was a talk about it), scikit-learn, and of course PyBrain. There is a curated list of machine learning packages on the web, and everybody can offer a contribution to it, in order to spread the knowledge about available packages in different languages; the Python section is very long. We also have Spark and MLlib: MLlib is actually implemented in Scala, not Python, but there is a binding for Python, which is called PySpark; that machine learning library is at a very early stage, though. Shogun is written in C++ and offers a lot of interfaces, one of which is in Python. The other packages are Python-powered, so those are the ones I am going to talk about. NLTK is implemented in pure Python, so no NumPy or SciPy involved; the other packages are implemented on top of NumPy, so their code is a bit more efficient for large-scale computation. NLTK supports Python 2 and Python 3; PyML supports Python 2; Python 3 support for PyBrain is not so clear; mlpy, of course, only Python 2; and the other two support both Python 2 and Python 3. What about the purpose of these packages? NLTK is for natural language processing: it has some algorithms for machine learning, but it is not supposed to be used as a complete machine learning environment; it is mostly related to text analysis and natural language processing in general. PyML mostly focuses on supervised learning, in particular on one technique, the support vector machine, so it does not have many algorithms related to unsupervised learning. PyBrain is for neural networks, which are another family of techniques in the machine learning ecosystem. The other two are more general-purpose: scikit-learn and mlpy contain algorithms for supervised and unsupervised learning, and some also support further settings for machine learning. So we will not consider NLTK and PyBrain any more, and we end up with these three libraries written in, or with bindings for, Python. So why
11:43
choose scikit-learn?
11:46
Ben Lorica, the O'Reilly big data expert, recommends scikit-learn for six reasons. The first one is the commitment to documentation and usability: scikit-learn has brilliant documentation, which is very useful for newcomers and for people without any background in machine learning. The second reason is that models are chosen and implemented by a dedicated team of experts, and the set of models supported by the library covers most machine learning tasks. Then, Python and PyData improve the support for data science, and scikit-learn fits well with these data science tools. Actually, I do not know if you know Kaggle: Kaggle is a site where you may take part in competitions for data science, and scikit-learn is one of the most used packages for this kind of competition. Another reason is the focus: scikit-learn is a machine learning library, and its goal is to provide a set of common algorithms to Python users through a consistent interface. These two are the features that I like the most, and I will be a lot more precise about this in a few slides. And finally, but by no means least, scikit-learn scales to most data problems; so scalability is another feature that scikit-learn supports out of the box. If you want to
13:27
install scikit-learn, you have to type very few commands: you need to install NumPy, SciPy, and matplotlib.
13:37
IPython actually is not needed, it is just for convenience; and then you install scikit-learn and all the other packages it requires, because scikit-learn is based on NumPy. Anyway, if you use a bundled distribution of the Python interpreter, such as Anaconda, everything is already provided out of the box.
14:01
The design philosophy of scikit-learn is one of the greatest features of this package, in my opinion. It includes all the batteries necessary for general-purpose machine learning code: it supports functionalities for handling datasets, feature selection and feature extraction algorithms, machine learning algorithms in the different settings, such as classification, regression, clustering and so on, and finally evaluation functions for cross-validation and confusion matrices; we will see some examples in the next slides. The algorithm selection philosophy of this package is to keep the code as light as possible and to include only well-known, largely used machine learning algorithms. The focus here is to be as general-purpose as possible, in order to reach a broad audience of users. At a
15:06
glance, this is a great picture depicting the features provided by scikit-learn; this figure is taken from the documentation. It is a sort of map you may follow to choose the particular machine learning technique you want to use in your machine learning problem. There are some clusters in this picture: regression over there, classification, clustering, and dimensionality reduction; and you may follow the paths to decide which setting is best suited for your problem. The API of scikit-learn is very intuitive and mostly consistent across every machine learning technique. There are four different objects: the estimator, the predictor, the transformer, and the model. Through these interfaces almost all of the learning algorithms included
16:20
in the library are implemented. For instance, let's make an example of the API of the estimator, which is one of the main interfaces. An estimator is an object that fits a model based on some training data and is capable of inferring some properties on new data. For example, we can create a KNeighborsClassifier, the KNN algorithm, which is a classifier, so it is for classification problems and supervised learning, and it has the fit method.
16:56
But this also holds for unsupervised learning algorithms such as k-means: the KMeans object is an estimator as well, as it implements the fit method; for feature selection it is always the same. Then, the predictor:
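As a minimal sketch (not code from the slides, just illustrative), the shared estimator interface looks like this:

```python
# Both a supervised classifier (KNN) and an unsupervised algorithm
# (k-means) are estimators: they expose the same fit() method.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

iris = load_iris()

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(iris.data, iris.target)            # supervised: fit(X, y)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(iris.data)                          # unsupervised: fit(X) only
```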
17:14
the predictor provides the predict and the predict_proba
17:20
methods. And then the transformer: the transformer provides the transform method, and sometimes also the fit_transform method, which applies the fit and then the transformation of the data. Transformations are used to make the data ready to be processed by the algorithms. Finally, the
17:50
last one is the model. The model is the general model you may create with your machine learning algorithm; there are models for supervised and for unsupervised algorithms. Another great feature of scikit-learn is reflected in these points: scikit-learn provides a nice way to create a processing pipeline. You may create a pipeline of different processing steps out of the box: you might apply SelectKBest, which is a feature selection step; after the feature selection you might apply PCA, an algorithm for dimensionality reduction; and then you may apply logistic regression, which is a classifier. So you may assemble pipelined processing very easily, and then you call the fit method on the pipeline, and then predict. The only constraint here is that the last step of the pipeline should be a class that implements the predict method, that is, a predictor. OK, great. So let's see some examples.
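A sketch of the pipeline just described, assuming roughly these steps:

```python
# SelectKBest (feature selection) -> PCA (dimensionality reduction)
# -> LogisticRegression (the final predictor). Because the last step
# is a predictor, the whole pipeline supports fit() and predict().
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

iris = load_iris()
pipe = Pipeline([
    ("select", SelectKBest(k=3)),                # keep the 3 best features
    ("pca", PCA(n_components=2)),                # project down to 2 dimensions
    ("clf", LogisticRegression(max_iter=1000)),  # final step: a predictor
])
pipe.fit(iris.data, iris.target)
pred = pipe.predict(iris.data)
```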
19:17
Let's start with a very introductory example. The first thing to consider is the data representation. Scikit-learn is based on NumPy and SciPy, so all the data are usually represented as matrices and vectors. In general, in machine learning, by convention the data is a two-dimensional array, with the samples on the rows and the features on the columns. In this case,
19:56
N is the number of samples we have in our dataset and D is the number of features, that is, the number of relevant pieces of information we have on the data. So the training data comes in this shape, and under the hood it is implemented by NumPy arrays or SciPy sparse matrices; if I am not mistaken, the default sparse implementation is the compressed sparse row format. And finally we have the labels, because we know the expected values for each of these data points.
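A small sketch of that layout, with made-up numbers:

```python
# X is an (N samples x D features) array, y holds one label per sample;
# CSR (compressed sparse row) is the usual sparse representation.
import numpy as np
from scipy import sparse

X = np.array([[5.1, 3.5, 1.4, 0.2],   # each row: one sample
              [4.9, 3.0, 1.4, 0.2],
              [6.3, 3.3, 6.0, 2.5]])  # each column: one feature
y = np.array([0, 0, 2])               # one label per sample

X_sparse = sparse.csr_matrix(X)       # compressed sparse row format
```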
20:40
The problem we are going to consider is the Iris dataset: we want to design a model that is able to automatically recognize iris species. We have three different species: Iris versicolor on the left, Iris virginica here, and Iris setosa. The features we are going to consider are the length and the width of the sepal, and the length and the width of the petal. So every sample in this dataset comes as a vector of four different features. Scikit-learn already has a great package to handle datasets; the Iris dataset in particular is very well known in many fields and is already embedded in the scikit-learn library, so you only need to import the datasets package and then call the load_iris function. The iris object is a Bunch object that
21:52
contains different keys: the target names, the data, the target, a description of the dataset, and the feature names. The description is a textual description of the dataset; the feature names are the four different features I mentioned in the previous slide; the target names are the targets we expect on this dataset, namely setosa, versicolor, and virginica, the three different iris species we want to predict. Then we have the data:
22:22
iris.data comes as a NumPy matrix, and if you read the shape of this matrix it is 150 by 4, that is, 150 rows times 4 columns, while the target has 150 entries, because we have a value of the target for each sample in the dataset. So the number of samples in this case is 150, the number of features is 4, and the target here is
23:02
the list of expected results: we have values that range from 0 to 2, corresponding to the three different classes we want to predict.
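The loading step described above can be sketched like this:

```python
# Load the bundled Iris dataset and check the shapes quoted above.
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)     # (150, 4): 150 samples, 4 features
print(iris.target.shape)   # (150,): one class label per sample
print(sorted(set(int(t) for t in iris.target)))  # [0, 1, 2]
```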
23:12
We might try to treat this as a classification problem on
23:15
this data; we want to exploit the KNN algorithm. The idea of the KNN classifier is pretty simple. For
23:24
example, if we consider K equal to 6, we are going to check the classes of the 6 nearest neighbours. This is the new data point; we trained our model with the training data, and if we want to predict the class of this new point, we look at the classes of the 6 nearest samples from the training data. In this case
23:50
the new point should be assigned to the class that the majority of those six neighbours belong to. This takes a
23:58
few lines of code: we import the dataset, we create the KNeighborsClassifier algorithm, in this case with n_neighbors equal to 1, then we call the fit method and we train our model.
24:13
This is what we get if we plot the data: these are called the decision boundaries of the classifier. And if you want to know which species of iris has a sepal of 3 centimeters by 5 centimeters and a petal of 4 centimeters by 2 centimeters, let's check iris.target_names and call predict; KNN is a classifier, so you may fit the data and also predict after the training. And it tells us: it is virginica. Am I right? Then we
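The few lines behind that demo look roughly like this (the sample flower is the one quoted in the talk):

```python
# Train a 1-nearest-neighbour classifier on Iris, then predict the
# species of a flower with a 3 x 5 cm sepal and a 4 x 2 cm petal.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(iris.data, iris.target)

sample = [[3.0, 5.0, 4.0, 2.0]]  # sepal length/width, petal length/width
print(iris.target_names[knn.predict(sample)[0]])  # virginica
```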
24:56
might also try, instead of facing this problem as classification, to face it as a clustering problem, in an unsupervised setting. In this case we are going to use the k-means algorithm. The idea of k-means is pretty simple: we want to create clusters of objects such that each object is close to the center of its own cluster.
25:28
And that's it. In scikit-learn this is simple: we have the training, and we specify the number of clusters we want k-means to find, in this case 3 clusters, because we are going to predict three different iris species. Then this is the ground truth, the values we expect, and this is what we got after calling the k-means. As you might already have noticed, the interface for the two is exactly the same, even though the machine learning settings are completely different: in the former case it was supervised, in this latter case unsupervised; classification versus clustering.
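The unsupervised counterpart can be sketched as follows; note that k-means only sees the samples, never the species labels:

```python
# Cluster the same data without labels; the returned cluster ids
# (0 to 2) are arbitrary and need not match the species encoding.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

iris = load_iris()
km = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = km.fit_predict(iris.data)   # same fit-style interface as KNN
```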
26:12
Finally, to conclude, another great battery included in scikit-learn, and I have not seen many other machine learning libraries in Python with batteries this complete, is about model evaluation. Model evaluation is necessary to answer the question: how do we know if our predictor, our prediction model, is good?
26:40
We apply model validation techniques. We might simply try to verify that every prediction corresponds to the actual target, but this is meaningless if we verify it on the very data we used for training. That kind of validation is very poor, because it is based only on the training set: we are just checking whether we are able to fit the data, but we are not testing whether the final model is able to generalize. A key feature of this kind of technique is generalization: do not fit the training data too closely, or you will end up with a problem which is called overfitting; you need to generalize, to be able to predict even new data that are not identical to the training data. One tool usually used in machine learning is the so-called confusion matrix. Scikit-learn, in the metrics package, provides different kinds of metrics to evaluate your performance; in this case we are going to use the confusion matrix. The confusion matrix is very simple: it is a square matrix where the rows and the columns correspond to the classes you want to predict, and every cell counts the samples of the class you expect against the class you predict, so you have all the possible matchings. If you have all the data on the diagonal, you predicted all the classes perfectly.
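A sketch of the point being made: evaluated on its own training data, a 1-NN classifier looks perfect, which says nothing about generalisation:

```python
# Confusion matrix of 1-NN on its own training data: each point is its
# own nearest neighbour, so everything lands on the diagonal.
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(iris.data, iris.target)

cm = confusion_matrix(iris.target, knn.predict(iris.data))
print(cm)   # rows: expected class, columns: predicted class
```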
28:41
There is a very well-known technique, which those of you already into machine learning will be aware of: the cross-validation technique. Cross-validation is a model validation technique for assessing how the results of a statistical analysis of the data are able to generalize to independent datasets, not only to the set we used for training. Scikit-learn already provides all the features to handle this kind of stuff; it requires us to write very little code, just the few lines necessary to import the functions already provided in the library. In other cases we would be required to implement these functions over and over, every time, in our Python code; so this is very useful, even for lazy programmers like me. In this case we exploit the train/test split: the idea here is to split the training data into two different sets, the training set and the test set; we fit on the training set and we predict on the test set. In this case we will see that there are some errors coming from this prediction, and this is a more honest way to evaluate our prediction model. OK, so the last couple of things.
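The train/test split just described, sketched with the current sklearn.model_selection module layout:

```python
# Hold out a test set, fit on the rest, and score on the held-out part.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)             # fit only on the training set
accuracy = knn.score(X_test, y_test)  # evaluate only on the test set
```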
30:22
Here is large-scale support out of the box: a great battery included in scikit-learn is the support for large-scale computation out of the box. You may combine scikit-learn code with every library you want to use for multiprocessing, parallel computation, or distributed computation; but if you want to exploit the features already provided for this kind of stuff, many techniques in the library allow for a parameter which is called n_jobs. If you set this to a value different from 1, which is the default value, it performs the computation on the different CPUs you have in your machine; if you put the value -1 here, it is going to exploit all the CPUs you have on your single machine. This applies to different kinds of computations in machine learning: you may apply multiprocessing to clustering, as in the k-means example that we made a few slides ago, to cross-validation, and to grid search. Grid search is another of the great features included in scikit-learn: it is able to identify, for a predictor, the parameters that maximize the cross-validation score; so we get the best parameters for our model, the ones that make it generalize best. Just to give you the intuition: this is possible thanks to the joblib library, which is provided under the hood of scikit-learn, and the n_jobs parameter here corresponds to a call to joblib. It is well documented as well, so you may read the documentation for any additional details. Last, but by no
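A sketch of grid search with the n_jobs parameter just described:

```python
# Grid search over n_neighbors; n_jobs=-1 asks joblib to use all CPUs
# while looking for the value that maximises the cross-validation score.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
    n_jobs=-1,   # -1 means: use every CPU on this machine
)
grid.fit(iris.data, iris.target)
print(grid.best_params_)
```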
32:37
means least, scikit-learn plays nicely with other libraries, in a sort
32:43
of integration: scikit-learn can be integrated with NLTK, that is, the Natural Language Toolkit, and with scikit-image, just to make a couple of examples. In detail, NLTK by design includes an additional module which is called SklearnClassifier. It is actually a wrapper, included in the NLTK library, that allows translating the NLTK API into the scikit-learn API and using the two together. So if you have code based on NLTK and you want to apply a classifier exploiting the scikit-learn library, you may import the classifier from scikit-learn and then use the SklearnClassifier class from the NLTK package to wrap the interface of this classifier into the one of NLTK. In this case we used SVC, which stands for Support Vector Classifier. And then you may also include this kind of stuff in the pipeline processing of scikit-learn. So, in conclusion:
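NLTK's actual wrapper is nltk.classify.scikitlearn.SklearnClassifier; the sketch below reproduces the idea with scikit-learn alone, on made-up toy features, so it runs without NLTK installed:

```python
# NLTK-style feature dicts, vectorised and fed to an SVC (Support
# Vector Classifier), which is essentially what the wrapper does.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

train_feats = [{"contains(great)": True},                        # toy data
               {"contains(awful)": True},
               {"contains(great)": True, "contains(fun)": True},
               {"contains(awful)": True, "contains(dull)": True}]
train_labels = ["pos", "neg", "pos", "neg"]

clf = Pipeline([("vect", DictVectorizer()), ("svc", SVC())])
clf.fit(train_feats, train_labels)
guess = clf.predict([{"contains(great)": True}])[0]
```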
33:57
scikit-learn is not the only machine learning library available in Python, but it is powerful and, in my opinion, easy to use, with very efficient implementations provided. It is based on NumPy and SciPy under the hood, and it is highly integrated with other tools; NLTK and scikit-image are just examples.
34:18
So, I really hope that you are looking forward to using it, and thanks for your
34:25
kind attention. [Applause] Thank you very much. We have a few minutes left for your
34:35
questions; please raise your hand and we will come over with the mobile microphone. [Audience] First of all, thanks, really nice talk. A short question: does scikit-learn provide any online learning methods? [Speaker] Yeah, actually this is a point I was not able to include in the slides. Online learning is already provided: there are many classifiers, or techniques, that allow for a method which is called partial_fit. You have this method to provide the model a bunch of data at a time, so the interface has been extended with a partial_fit method, and some techniques allow for online learning. Another very great usage of this partial_fit is the so-called out-of-core learning: in the out-of-core learning setting your data are too big to fit in memory, so you provide the data one batch at a time, and you call the partial_fit method to fit the model batch by batch. [Audience] Thanks. Second question: is there any support for missing values or missing labels, apart from just deleting them, in the case of online learning or in general? [Speaker] For missing labels or missing data, what do you mean exactly? [Audience] Say you get a feature vector that misses a value at the third component. [Speaker] Actually I am not sure about that. [From the audience] Yes: we have a very simple imputer that can impute by median or mean along the different directions. If you have very few missing values, that works well; if you have a lot, then you might want to look at matrix completion, which we do not have; we had a Google Summer of Code project on this, it did not finish; we welcome contributions, of course. [Audience] Hi. I have used scikit-learn before, and I am not a mathematician, so I know little about all the stuff under the hood, and I do not want to dig too deep into how the algorithms and the mathematics work; this is the biggest problem for me: to realize what to use. So if you have got some kind of big dataset with features and labels, supervised learning, what would you advise to someone who does not know how it all works? Which small, easy solutions should I consider to improve the results of a specific estimator? [Speaker] Actually, machine learning is about finding the right model and the right parameters, and there are many steps you may want to apply when training the different algorithms. In general you apply a data normalization step, so the first step I suggest is preprocessing of the data: analyze the data, make some statistical tests on it, some preprocessing, some visualization, in order to know what kind of data you are dealing with. That is the first step. The second one is trying the simplest model you may want to apply, and then improving one step at a time. Once you find the right model you want to use, you finally have to find the best settings for that model; in that case you might end up using the grid search method, for instance, which is provided out of the box, just to find the combination of parameters that maximizes the value of the cross-validation. And of course there is training on the job: you may find that the model you chose is not the right one for your predictions, for your problem, and then you start over again with a different model. That is it. Thanks again. [Host] By the way, he is going to give a talk at PyData as well, I think on Saturday; yes, on Saturday. So if you attend PyData, do not miss that talk. Thanks again.
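The partial_fit / out-of-core idea from the first answer can be sketched with SGDClassifier, one of the estimators that supports it (the batching here is only simulated):

```python
# Feed the model the data in batches, as if it did not fit in memory;
# the full set of classes must be declared on the first call.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

iris = load_iris()
rng = np.random.RandomState(0)
order = rng.permutation(len(iris.target))   # shuffle so batches mix classes
X, y = iris.data[order], iris.target[order]

clf = SGDClassifier(random_state=0)
for start in range(0, len(y), 30):          # five batches of 30 samples
    clf.partial_fit(X[start:start + 30], y[start:start + 30],
                    classes=[0, 1, 2])
```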
00:00
Virtuelle Maschine
Vorlesung/Konferenz
Maschinelles Lernen
Wort <Informatik>
Kantenfärbung
Framework <Informatik>
Computeranimation
00:45
Task
Randverteilung
Wellenpaket
Maschinelles Lernen
01:24
Statistik
Datenanalyse
Datenanalyse
Klasse <Mathematik>
Virtuelle Maschine
Polygonnetz
Atomarität <Informatik>
Computeranimation
Data Mining
Algorithmus
Endlicher Graph
Statistische Analyse
Wort <Informatik>
Algorithmische Lerntheorie
Data Mining
Analysis
02:12
Fundamentalsatz der Algebra
Einfügungsdämpfung
Punkt
Datenanalyse
Relativitätstheorie
Mathematisierung
Maschinelles Lernen
Statistische Analyse
Codec
Computeranimation
Data Mining
Persönliche Identifikationsnummer
Mustersprache
Diagramm
Informationsmodellierung
Prognoseverfahren
Menge
Mereologie
Mustersprache
Wort <Informatik>
Beobachtungsstudie
Aggregatzustand
Analysis
03:45
Resultante
Lineares Funktional
Subtraktion
Prozess <Physik>
Klasse <Mathematik>
Gruppenkeim
Klassische Physik
Schlussregel
Vektorraum
Überwachtes Lernen
Informationsmodellierung
Algorithmus
Menge
Minimum
Algorithmische Lerntheorie
Cluster <Rechnernetz>
05:19
Subtraktion
Prozess <Physik>
Mereologie
Gruppenkeim
Wort <Informatik>
Aggregatzustand
05:53
Programmiersprache
Graph
Formale Sprache
Eindeutigkeit
Systemaufruf
Kartesische Koordinaten
Optimierung
Computeranimation
Schlussregel
Web log
Berline
Menge
Formale Sprache
Rechter Winkel
Datenverarbeitungssystem
Programmbibliothek
Informatik
Auswahlaxiom
07:08
Interpretierer
Web Site
Maschinencode
Datenverarbeitungssystem
Rechter Winkel
Mathematisierung
Projektive Ebene
Physikalisches System
Computerunterstütztes Verfahren
Topologische Mannigfaltigkeit
08:01
Subtraktion
Bit
Maschinencode
Prozess <Physik>
Mathematisierung
Formale Sprache
Virtuelle Maschine
Natürliche Zahl
Maschinelles Lernen
Computerunterstütztes Verfahren
Computeranimation
Netzwerktopologie
Virtuelle Maschine
Algorithmus
Datennetz
Stichprobenumfang
Programmbibliothek
Schnittstelle
Analysis
Zentrische Streckung
Mailing-Liste
Physikalisches System
Vektorraum
Bitrate
Natürliche Sprache
Menge
Formale Sprache
Dreiecksfreier Graph
Unüberwachtes Lernen
Ordnung <Mathematik>
Programmierumgebung
Neuronales Netz
11:44
Expertensystem
Zentrische Streckung
Web Site
Quader
Installation <Informatik>
Natürliche Zahl
Fokalpunkt
Hinterlegungsverfahren <Kryptologie>
Computeranimation
Rechenschieber
Task
Systemprogrammierung
Informationsmodellierung
Algorithmus
Skalierbarkeit
Menge
Tensor
Programmbibliothek
Installation <Informatik>
Wärmeleitfähigkeit
Schnittstelle
13:36
Subtraktion
Quader
Leistungsbewertung
Matrizenrechnung
Versionsverwaltung
Maschinelles Lernen
Algorithmus
Trennschärfe <Statistik>
Lineare Regression
Maschinencode
Speicherabzug
Leistungsbewertung
Lineares Funktional
Interpretierer
Algorithmus
Lineare Regression
Linienelement
Installation <Informatik>
Social Tagging
Fokalpunkt
Kreuzvalidierung
Rechenschieber
Modallogik
Funktion <Mathematik>
Menge
Dreiecksfreier Graph
Decodierung
Ordnung <Mathematik>
15:05
Schätzwert
Wellenpaket
EMail
Computeranimation
Informationsmodellierung
Algorithmus
Fermatsche Vermutung
Lineare Regression
Programmbibliothek
Ordnungsreduktion
Cluster <Rechnernetz>
Figurierte Zahl
Widerspruchsfreiheit
Schnittstelle
Schätzwert
Soundverarbeitung
Algorithmus
Kategorie <Mathematik>
Prognostik
Ordnungsreduktion
QuickSort
Mapping <Computergraphik>
Objekt <Kategorie>
Menge
Computerunterstützte Übersetzung
Instantiierung
16:55
Schätzwert
Algorithmus
Trennschärfe <Statistik>
Datenmodell
Unüberwachtes Lernen
Wärmeübergang
Karhunen-Loève-Transformation
Transformation <Mathematik>
Ordnung <Mathematik>
Computeranimation
Modallogik
17:47
Nebenbedingung
Subtraktion
Punkt
Prozess <Physik>
Quader
Klasse <Mathematik>
Gruppenoperation
Vektorraum
Kartesische Koordinaten
Gebäude <Mathematik>
Kombinatorische Gruppentheorie
Computeranimation
Schwach besetzte Matrix
Informationsmodellierung
Datensatz
Algorithmus
Lineare Regression
Trennschärfe <Statistik>
Dicke
Matrizenring
Stichprobe
Datenmodell
Vektorraum
Ordnungsreduktion
Matrizenring
Rechter Winkel
Zahlenbereich
Kantenfärbung
19:55
Computervirus
Wellenpaket
Zustandsmaschine
IRIST
Implementierung
Zahlenbereich
Vektorraum
Schwach besetzte Matrix
Stichprobenumfang
Programmbibliothek
Quellencodierung
ART-Netz
Gerade
Cliquenweite
Algorithmus
Lineares Funktional
Dicke
Stichprobe
Vektorraum
Objekt <Kategorie>
Matrizenring
Datenfeld
Last
Zahlenbereich
Dreiecksfreier Graph
ART-Netz
Information
21:52
Rechenschieber
Matrizenrechnung
Deskriptive Statistik
Shape <Informatik>
Linienelement
Stichprobenumfang
IRIST
Zahlenbereich
Wort <Informatik>
Programmierumgebung
Schlüsselverwaltung
Computeranimation
23:01
Resultante
Subtraktion
Wellenpaket
Algorithmus
Einheit <Mathematik>
Kommunikationsdesign
Inverse
Klasse <Mathematik>
IRIST
MIDI <Musikelektronik>
Computeranimation
Aggregatzustand
23:48
Maschinencode
Computervirus
Wellenpaket
Prognostik
IRIST
Computeranimation
Entscheidungstheorie
Task
Randwert
Algorithmus
Rechter Winkel
Stichprobenumfang
Uniforme Struktur
Mustersprache
Gerade
ART-Netz
Fitnessfunktion
24:55
Zahlenbereich
Sprachsynthese
Computeranimation
Objekt <Kategorie>
Unterring
Algorithmus
Menge
Chatten <Kommunikation>
Gruppentheorie
Dreiecksfreier Graph
Abstand
Cluster <Rechnernetz>
Klumpenstichprobe
Schnittstelle
26:10
Matrizenrechnung
Dualitätstheorie
Subtraktion
Wellenpaket
Leistungsbewertung
Klasse <Mathematik>
Zahlenbereich
Computeranimation
Virtuelle Maschine
Datensatz
Informationsmodellierung
Prognoseverfahren
Klon <Mathematik>
Programmbibliothek
Leistungsbewertung
Data Encryption Standard
Videospiel
Multifunktion
Matrizenring
Matching <Graphentheorie>
Linienelement
Relativitätstheorie
Validität
Datenmodell
Prognostik
Konditionszahl
Computerunterstützte Übersetzung
Diagonale <Geometrie>
28:39
Resultante
Programmiergerät
Subtraktion
Maschinencode
Prozess <Physik>
Wellenpaket
Quader
Zahlenbereich
Kartesische Koordinaten
Computerunterstütztes Verfahren
Zentraleinheit
Analysis
Computeranimation
Virtuelle Maschine
Informationsmodellierung
Prognoseverfahren
Prozess <Informatik>
Programmbibliothek
Schnitt <Graphentheorie>
Default
Korrelationsfunktion
Gerade
Softwaretest
Zentrische Streckung
Addition
Lineares Funktional
Parametersystem
Datenmodell
Systemaufruf
Statistische Analyse
Bitrate
Kontextbezogenes System
Kreuzvalidierung
Rechenschieber
Menge
Heegaard-Zerlegung
Wort <Informatik>
Decodierung
Lesen <Datenverarbeitung>
Fehlermeldung
32:36
SCI <Informatik>
Maschinencode
Transinformation
Prozess <Physik>
MIMD
Klasse <Mathematik>
Formale Sprache
Natürliche Zahl
Token-Ring
Vektorraum
Modul
QuickSort
Computeranimation
Eins
Arithmetisches Mittel
Formale Sprache
Digitaltechnik
Programmbibliothek
Term
Bildgebendes Verfahren
Modul
Schnittstelle
33:54
Programmbibliothek
Implementierung
Vorlesung/Konferenz
Chi-Quadrat-Verteilung
Softwarekonfigurationsverwaltung
34:34
Resultante
Matrizenrechnung
Maschinencode
Wellenpaket
Punkt
Quader
Selbst organisierendes System
Relationentheorie
Gruppenoperation
Schaltnetz
E-Learning
Richtung
Informationsmodellierung
Datenmanagement
Prognoseverfahren
Exakter Test
Webforum
Prozess <Informatik>
Lineare Regression
Datentyp
Visualisierung
Zusammenhängender Graph
Greedy-Algorithmus
Bildgebendes Verfahren
Leistung <Physik>
Schnittstelle
Umwandlungsenthalpie
Parametersystem
Vervollständigung <Mathematik>
Zwei
Vektorraum
Partielle Differentiation
Kreuzvalidierung
Arithmetisches Mittel
Rechenschieber
Energiedichte
Menge
Rechter Winkel
Festspeicher
Mereologie
Speicherabzug
Wort <Informatik>
Garbentheorie
Projektive Ebene
Fitnessfunktion
Metadaten
Formale Metadaten
Titel  Scikitlearn to "learn them all" 
Alternativer Titel  Why SCIKITLEARN is so cool 
Serientitel  EuroPython 2014 
Teil  49 
Anzahl der Teile  120 
Autor 
Maggio, Valerio

Lizenz 
CCNamensnennung 3.0 Unported: Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen. 
DOI  10.5446/20046 
Herausgeber  EuroPython 
Erscheinungsjahr  2014 
Sprache  Englisch 
Produktionsort  Berlin 
Content metadata
Subject area: Computer Science
Abstract: Valerio Maggio – Scikit-learn to "learn them all". Scikit-learn is a powerful library providing implementations of many of the most popular machine learning algorithms. This talk gives an overview of the "batteries" included in scikit-learn, along with working code examples and internal insights, in order to get the best out of our machine learning code.

**Machine learning** is about *using the right features, to build the right models, to achieve the right tasks*. However, to come up with a definition of what **right** actually means for the problem at hand, one has to analyse huge amounts of data and evaluate the performance of different algorithms on those data. Deriving a working machine learning solution for a given problem is far from a *waterfall* process: it is an iterative process in which the data to be used (i.e., the *right features*) and the algorithms to apply (i.e., the *right models*) are continuously refined. In this scenario, Python has proved very useful to practitioners and researchers: its high-level nature, combined with the available tools and libraries, allows working machine learning code to be implemented rapidly without *reinventing the wheel*.

**Scikit-learn** is an actively developed Python library, built on top of the solid `numpy` and `scipy` packages. Scikit-learn (`sklearn`) is an *all-in-one* software solution, providing implementations of several machine learning methods together with datasets and (performance) evaluation algorithms. These "batteries" included in the library, combined with a clean and intuitive API, have made scikit-learn one of the most popular Python packages for writing machine learning code. This talk presents a general overview of scikit-learn, with brief explanations of the techniques the library provides out of the box. The explanations are supported by working code examples and by insights into the algorithms' implementations, aimed at providing hints on how to extend the library code. Moreover, advantages and limitations of the `sklearn` package are discussed in comparison with other existing machine learning Python libraries (e.g., "Shogun Toolbox", "PyML", "MLPy"). In conclusion, (examples of) applications of scikit-learn to big data and computationally intensive tasks are also presented. The general outline of the talk is as follows (the order of the topics may vary):

* Intro to Machine Learning
* Machine Learning in Python
* Intro to Scikit-Learn
* Overview of Scikit-Learn
* Comparison with other existing ML Python libraries
* Supervised Learning with `sklearn`
  * Text Classification with SVM and Kernel Methods
* Unsupervised Learning with `sklearn`
  * Partitional and Model-based Clustering (i.e., k-means and Mixture Models)
* Scaling up Machine Learning
  * Parallel and Large-Scale ML with `sklearn`

The talk is intended for an intermediate-level audience (i.e., advanced). It requires basic math skills and a good knowledge of the Python language.
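The uniform estimator interface the abstract alludes to (every estimator implements `fit`, and predictors add `predict`) can be illustrated with the k-means clustering mentioned in the outline. The toy data points below are invented for the sake of the example, not taken from the talk.

```python
# Minimal sketch of scikit-learn's estimator API: construct an estimator,
# fit it to data, then predict. KMeans follows the same fit/predict
# convention as the supervised estimators.
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated pairs of 2-D points.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.predict(X)
# Each nearby pair lands in the same cluster, and the two pairs
# land in different clusters.
```

Because supervised models (e.g. `SVC`), clusterers (`KMeans`) and transformers (`PCA`, via `transform`) all share this interface, they can be swapped in and out of the same pipeline code, which is a large part of the library's appeal described above.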
Keywords: EuroPython Conference, EP 2014, EuroPython 2014