OpenML, R, mlr
Formal Metadata
| Title | OpenML, R, mlr |
| Series title | |
| Author | |
| Contributors | |
| License | CC Attribution 3.0 Germany: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify. |
| Identifiers | |
| Publisher | |
| Release year | 2014 |
| Language | English |
Content Metadata
| Subject area | |
| Abstract | I will first introduce an R package to interface with OpenML. We support querying and downloading, running experiments and uploading results, so that all your experiments are organized online. R itself allows many forms of machine learning methods and experiments, from completely custom code to powerful semi-automated frameworks. The OpenML package is framework-agnostic in that regard. The mlr package provides a generic, object-oriented, and extensible interface to a large number of machine learning methods in R. It enables researchers and practitioners to easily compare methods and implementations from different packages, rapidly conduct complex experiments, and implement their own meta-methods using mlr's building blocks. Classification, regression, survival analysis, and clustering are supported, as is virtually every resampling strategy. Meta-optimization can be performed by tuning, feature filtering, and feature selection, and most modeling steps can be parallelized. Its object-oriented structure provides in many cases a close match to the OpenML structure, and it can already be connected to the OpenML R package in a simple manner. The talk will conclude with an outlook regarding the next steps, open challenges, and ideas to improve upon the current state of the project. |

00:00
[Largely unintelligible remarks while the presentation display is being set up.] I am actually going to talk about four things, so I think and hope
01:04
I will be able to do this in about 30 minutes, with some time left for discussion, because at the end I have a couple of, well, discussion questions. As Joaquin said, I am going to introduce mlr, which is a package that I and many other people have been developing now for a couple of years. I also started
01:26
working on the OpenML R package about a year ago, together with [names unclear], who were actually the two people who wrote the very first lines of it, in the first or maybe the second meeting of this group here.
01:40
Let's go through these people quickly, because without them, everything I am presenting here today would look very different. So, there was
01:55
[Several contributors are named here; the names are unintelligible in the recording.] They all worked together with me on this package. One of them has been putting many hours into the OpenML package. Michel also has not stopped working on the OpenML package and is cooperating with [name unclear] on dozens of other R packages. So what is mlr? It is machine learning in R. It exists on CRAN, which is the official site that distributes R libraries and packages, and it also exists on GitHub, so if you just want to remember one link,
02:50
remember the GitHub link; everything
02:52
else is referenced from there. Version 2.1 is out right now, and we will release 2.2 during the next few days, while I am still here [unintelligible aside mentioning Brian Ripley]. The idea behind the package is to give you a unified interface for every basic machine learning
03:17
technique in R, supervised or unsupervised. That is actually the goal, and the
03:23
reason why we want to have that is that R is currently not very
03:27
well structured for many common operations. What we really want is not to tackle one specific aspect of machine learning; we really want to have an expressive language for machine learning. It gets interesting if you can combine all of the things that you need to do in your experiments, model them as a whole, and use them as a whole, and for that you need a language. The basic elements of this language are basically
03:59
these underlying machine learning methods, like regression and classification techniques, and many other things can come on top of
04:09
that and use these units as building blocks. I sometimes call this a Lego-like approach, where you plug different steps together, and in the end you can end up with something like this here, which I think I copied from the
04:28
RapidMiner tools. We do not really like to do this type of graphical programming, but you can do the same by writing small, expressive R programs, and I will show you how this works, hopefully in a few slides. So the idea is to provide structures for everything and the glue code to tie it together, so many, many things in the package are not
04:49
programmed by ourselves. I did not program the SVM algorithms that are in there, but we programmed many meta-algorithms on top of that; I will come to that. The package has grown quite a bit. I checked this a few minutes ago: it is now 14,000 lines of R code, just code, plus the associated roxygen documentation, and 6,000 lines of tests. So what is in
05:20
there in terms of basic machine learning tasks? We currently cover normal regression, supervised classification, and cost-sensitive classification in its general definition, although we do not have that many algorithms for it yet; this is experimental. Lars Kotthoff made it possible that we can now talk about clustering tasks and try to solve them with mlr. And [name unclear] worked quite a lot on making survival modeling possible, and I want to introduce this to you at the end, because I would like to see this in OpenML at some point.
05:58
[Unintelligible.]
06:06
Joaquin told you about how OpenML tasks look; ours are pretty similar. A task is a data frame with the
06:14
input, that is the features, and the output, and it is annotated with whatever else you might have: you might have weights for the observations, you might have misclassification costs. So it is data with extra meta-information, like in your case. Everything in mlr is an object; it is pretty object-oriented, and you can program on that as well. Everything is tagged with meta-information. You can list everything; you can ask for the subset of learning algorithms or measures that have a certain property, and so on, which in my opinion makes
06:49
it really nice to combine with something like OpenML. This is how a first modeling step might look with mlr. You create a classification task, you put in the data, in this case the famous iris data set, and you specify the target variable, like in your case, so you can see how you would construct an mlr task from an OpenML one. This is the printout, a nice summary of the task: you see how many observations there are and what types of features you have, here just four numeric features and no factor features, and whether the task has missing values, weights, or blocking. Blocking is exactly what you asked about
07:28
earlier. One of you asked whether it is possible to do [unintelligible]. What you can do with this blocking is say that certain observations belong together, and if they belong together, then during resampling they either all go to the training set or all go to the test set. In some scenarios it is very common that you need to do something like that: think about looking at images, where you look at different subsegments of an image, or at different subsegments of songs, sound bites, and so on. There you need to do something like this.
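The blocking rule described here can be illustrated with a minimal sketch in plain Python. This is not mlr code: the function names `blocked_folds` and `train_test_splits` and the block labels are hypothetical, and whole blocks are simply dealt out to folds round-robin, which is only one of several reasonable assignment schemes.

```python
from collections import defaultdict

def blocked_folds(block_ids, n_folds):
    """Assign observation indices to folds so that a block never spans two folds."""
    # Collect the indices that belong to each block, keeping first-seen order.
    members = defaultdict(list)
    for index, block in enumerate(block_ids):
        members[block].append(index)
    # Deal whole blocks out to the folds round-robin (an illustrative choice).
    folds = [[] for _ in range(n_folds)]
    for position, block in enumerate(members):
        folds[position % n_folds].extend(members[block])
    return folds

def train_test_splits(block_ids, n_folds):
    """Yield (train, test) index lists, one pair per fold."""
    folds = blocked_folds(block_ids, n_folds)
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(block_ids)) if i not in held_out]
        yield train, test

# Six image segments cut from three images; segments of one image share a block.
blocks = ["img_a", "img_a", "img_b", "img_b", "img_c", "img_c"]
splits = list(train_test_splits(blocks, n_folds=3))
```

In every split, all segments of one image end up on the same side, so a model is never tested on segments of an image it has already seen during training.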
08:08
[Unintelligible.] So how many methods do we have on
08:13
the side of learning algorithms? About 40 classification algorithms and a couple of clustering algorithms, which are still growing because this is very new, 23 regression techniques, and 7 survival methods. By the way, all of this, actually the whole talk, is
08:30
generated with [tool name unclear], so these are
08:36
the right numbers, at least for some
08:39
recent state of the package. We also have reduction algorithms for cost-sensitive classification. These are again meta-techniques, where you reuse the basic regression and classification techniques, usually in a weighted sense. And the functions are, of course, train and predict, which each of these
09:01
methods has; that is what the interface is made of for these learning algorithms. Each of these learners has an associated parameter set that we can also ask about, and I think that is on the
09:15
next slide. How would you use such a learning algorithm? You call makeLearner, and the naming is such that a classification learner gets the prefix classif, so here it is classif.rpart. rpart is recursive partitioning, just a decision tree, like CART if you know that. You
09:34
can again print it, and it tells you: I am from package rpart (by the way, this would be loaded automatically), I am a classification algorithm, my name is Decision Tree, and I have a shorter name, rpart, which is useful for tables, plots, and so on. It also has many different properties: it can handle two-class and multi-class classification, it can handle missing values, numerics, and factors as well, and it is a tree, so basically
10:02
everything that is nice about trees. It also has a predict type, so it will
10:08
predict class labels; you can change that to probabilities if the method knows how to supply them as well.
10:18
There is only one hyperparameter setting that we actually changed, and that is not to do the internal cross-validation in rpart, because
10:25
we do this ourselves and do not want to waste time, so that is switched off by
10:30
default. You might be interested as well in what the possible hyperparameters are, and we tried to annotate this for all of the integrated algorithms. You can see all of the different parameter settings for the decision tree: you can see the type, the length if it is a vector, the default value taken from the documentation, and the constraints, so this one goes from 1 to infinity because it is a counting variable. And you can say: this parameter is actually a dependent
11:04
parameter. Think about kernels in an SVM: the parameter which is usually called gamma only makes sense if you use the RBF kernel or one or two of the other kernels [unintelligible]. So you can model
11:19
dependency structures in the parameter space. That gets interesting if you think about tuning those parameters. You can also associate transformations with a parameter, so you can do something like automatically tuning on log scales. You can say: this is a parameter, and every time before you apply it, please compute 2 to the power of x, because that is what we do when we optimize things that go from 0 to infinity: we optimize in log space.
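The idea of attaching a transformation to a parameter can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the mlr parameter-set API: the class `NumericParam`, the parameter name `cost`, and the search range of -10 to 10 are all hypothetical.

```python
import random

class NumericParam:
    """A numeric hyperparameter searched on [lower, upper], with an optional transformation."""

    def __init__(self, name, lower, upper, trafo=None):
        self.name = name
        self.lower = lower
        self.upper = upper
        # Without a transformation, the searched value is passed on unchanged.
        self.trafo = trafo if trafo is not None else (lambda x: x)

    def sample(self, rng):
        # Draw uniformly on the search scale, then map to the scale the learner sees.
        raw = rng.uniform(self.lower, self.upper)
        return self.trafo(raw)

# Search the exponent in [-10, 10]; the learner receives 2 ** exponent.
cost = NumericParam("cost", lower=-10, upper=10, trafo=lambda x: 2.0 ** x)

rng = random.Random(1)
values = [cost.sample(rng) for _ in range(1000)]
```

Because the draw is uniform in the exponent, roughly half of the sampled values fall below 1 and half above, which is the point of searching a positive, unbounded parameter on a log scale.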
11:47
Again, there are many different performance measures in there, quite a lot for classification, and this huge number is due to the fact that we have all these ROC
11:59
measures, like the true positive rate and so on, and regression measures. There is only
12:05
one for survival analysis, because they are pretty hard to implement technically. There are some clustering measures and general measures like timing, so you can ask how long training the model
12:15
or prediction took.
12:19
Again, these classification measures have properties: you can see whether they are available for multi-class classification or only for binary problems, you know whether they should be minimized or maximized, their best and worst values, and so on. Now, this is all pretty basic. It becomes a bit more interesting when we come to resampling. This is what [name unclear]
12:45
already talked about. It is about doing performance estimation properly [unintelligible]. We have cross-validation, bootstrapping, subsampling, and stratification, and we support this blocking
13:02
structure. I think that is basically everything, plus the variants of bootstrapping like .632+, if you want that for your really small data sets with fewer than 200 observations. You create a learning algorithm first; again we use a decision tree. For cross-validation we create a description object, so we create a resampling description: cross-validation, tenfold. We create some measures, so we measure the mean misclassification error and, stupidly, also the accuracy, which is just 1 minus the error in this case. Then you call resample with the learner, the
13:49
task, the resampling description, and the measures, and you get back all the predictions and the
14:00
measurements, not just on the test set but also on the training set. In the end you are usually mainly interested
14:07
in the aggregated result over the whole cross-validation, but if you want detailed information, you get it out
14:14
of here. This one is really generic and contains all of the stuff, but we have a shortcut for this: you can just say crossval.
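The resampling loop just described (split the data, train on one part, measure the error on the other, aggregate over folds) can be sketched in plain Python as follows. This is not the mlr implementation: the majority-class stand-in learner, the function names, and the interleaved fold assignment are all assumptions made to keep the example self-contained.

```python
from collections import Counter

def train_majority(labels):
    """A stand-in learner: always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]

def cross_validate(labels, n_folds):
    """Return the per-fold misclassification errors and their mean (mmce)."""
    n = len(labels)
    fold_errors = []
    for fold in range(n_folds):
        # Interleaved fold assignment: observation i belongs to fold i % n_folds.
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        prediction = train_majority([labels[i] for i in train])
        wrong = sum(1 for i in test if labels[i] != prediction)
        fold_errors.append(wrong / len(test))
    mmce = sum(fold_errors) / n_folds
    return fold_errors, mmce

labels = ["a"] * 7 + ["b"] * 3
fold_errors, mmce = cross_validate(labels, n_folds=5)
accuracy = 1 - mmce  # accuracy is just one minus the misclassification error
```

The per-fold errors correspond to the detailed information mentioned in the talk, and the mean over folds to the aggregated result that one is usually interested in.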
14:24
That is for cross-validation, for example 5-fold, and we actually have this for many
14:32
of the things that I am showing. [Unintelligible.] The main aspect of the package is a clean interface to be able to do everything. I should also mention that if something is not implemented, it is well documented on the web page how to add it. You take a short piece of text, you copy and paste it, and you fill in a few details: you call the training function, you call the test function, and the most boring aspect is that somebody has to fill in all the parameter information once. You are allowed to leave that out, but then you do not get some nice extra checking functionality of the package; everything will still work. So integrating a learner can be really easy. But the most important thing is: if something is missing, please go to the issue tracker and tell us, "X is missing, here is my first try to integrate it." And you can
15:37
also do benchmarking. You can join a couple of tasks, in this case we took iris and [data set name unclear], we took three learning algorithms, decision tree, random forest, and SVM, and you can call one function, benchmark, and this will run all of the 10-fold cross-validations [unintelligible].
15:57
Underneath, this is just a couple of nested loops, and the nice thing is that they are not
16:02
loops but apply statements, parallel apply statements, so you can easily parallelize all of this. You can parallelize resampling, you can parallelize this benchmarking, you can parallelize feature selection, which I have not talked about yet, and you can do this on, I want to say every system, or at least every system that
16:24
I have worked with [unclear], because
16:28
we use our parallelMap package, which works on local machines, on multicore machines, and on socket and MPI clusters, and we also integrated another package
16:41
that we wrote for HPC, for high-performance computing, which was also written by [names unclear] together: BatchJobs. This enables you to use virtually every type of high-performance computing system: Torque systems, a SLURM cluster, LSF, [unintelligible]. Most of those out there will work once you have
17:04
configured this once for your site. Another really nice thing, and the reason why [name unclear] wrote this parallelMap package, is that you can tag the operations that should maybe run in parallel with names, and
17:19
you can then select from the outside whether they should be parallelized or not. Look at this here: there are many different stages of the experiment, and depending on
17:32
what you do, depending on whether you do holdout or
17:36
100-fold bootstrapping, and on how many data sets you have, you might want to parallelize different levels. What you could do is say: one job is running one cross-validated algorithm on one data set, so that is the outer level. But you could also parallelize over the resampling iterations, at the
17:57
inner level, and maybe there is feature selection or tuning in there as well, and then you could parallelize at that level. What is technically the best thing depends on sizes: it depends on the runtime behavior of the algorithm, and it depends on the size of the data set and on the nesting. We do not know that, and how should we know how big your data is or whether you use holdout or cross-validation? So you just select the right level, because usually you have the choice between two or three loops in the experiment.
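The mechanism of tagging loops with names and choosing from the outside which one runs in parallel can be sketched as follows. This is a plain-Python illustration, not the parallelMap API: the level names `"benchmark"` and `"resample"`, the helper `maybe_parallel_map`, and the use of a thread pool as a stand-in for a real backend (multicore, socket, MPI, or a batch system) are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def maybe_parallel_map(function, items, level, active_level, workers=2):
    """Map in parallel only if this loop's tag matches the level chosen from outside."""
    if level == active_level:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(function, items))  # pool.map keeps input order
    return [function(item) for item in items]

def resample(learner_name, dataset_name, n_iters, active_level):
    """Inner loop: one job per resampling iteration."""
    def one_iteration(iteration):
        # A real experiment would train and test here; we only record the job.
        return (learner_name, dataset_name, iteration)
    return maybe_parallel_map(one_iteration, range(n_iters), "resample", active_level)

def benchmark(learner_names, dataset_names, n_iters, active_level):
    """Outer loop: one job per (learner, data set) pair."""
    pairs = [(l, d) for l in learner_names for d in dataset_names]
    def one_pair(pair):
        return resample(pair[0], pair[1], n_iters, active_level)
    return maybe_parallel_map(one_pair, pairs, "benchmark", active_level)

# The experiment code is identical; only the selected level differs.
outer = benchmark(["tree", "forest"], ["iris"], n_iters=3, active_level="benchmark")
inner = benchmark(["tree", "forest"], ["iris"], n_iters=3, active_level="resample")
```

Both calls produce the same results; the caller decides whether the few large outer jobs or the many small inner jobs are distributed, without touching the experiment code.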
18:29
You just tell the system parallelStart, you tell the system the backend you are using, maybe [unclear], and that you want to parallelize the resampling or some other operation. There are also visualizations in mlr. These are useful for teaching, and sometimes to show students how a model might look in 2D. This is a random forest, again on the iris data set, and what you see here are the iris data points in the first two features of the data set. You can see the classes as different symbols, dots, triangles, and crosses, you can see whether there were mistakes made by the model, and in the color coding you can see the posterior probability distribution
19:20
of the random forest. You can do this for clustering or regression as well. It is just nice for working with beginners, but sometimes it still gives you something to look at. By the way, this is again related to an earlier talk: we always use ggplot2 for plotting like this as well, so you get back a ggplot2 object and you can change it in the end, because
19:42
these plots are objects, and you can apply ggplot2 operations to them if you do not like the color coding or whatever we did. There are many, many things that I cannot talk about, so I just summarize them here on the slide. We have pretty many preprocessing [unintelligible]. So all of this is pretty basic so far; it is nice,
20:13
but this thing here really
20:15
opens up another level of scaling; it gives you another order of magnitude of interesting things to do with the package. What you can do is take a basic learning
20:25
algorithm and wrap some meta-algorithm or extra functionality around it. For example, you can say: I want a certain preprocessing method. Actually, you can integrate any method you want, because you can use a preprocessing wrapper that can contain custom code, so you can say: before I fit the SVM or the tree, I want to apply my own preprocessing [unintelligible].
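The wrapper idea can be sketched in plain Python: a wrapper holds a base learner plus custom preprocessing code and exposes the same train and predict interface as the learner itself. This is an illustration, not mlr's wrapper API; the class names `NearestNeighbour` and `PreprocWrapper` and the absolute-value preprocessing are hypothetical.

```python
class NearestNeighbour:
    """A stand-in base learner: 1-nearest-neighbour on a single numeric feature."""

    def train(self, xs, ys):
        self.points = list(zip(xs, ys))
        return self

    def predict(self, xs):
        # For each query, return the label of the closest training point.
        return [min(self.points, key=lambda p: abs(p[0] - x))[1] for x in xs]

class PreprocWrapper:
    """Wraps a learner so custom preprocessing runs before both train and predict."""

    def __init__(self, learner, preprocess):
        self.learner = learner
        self.preprocess = preprocess

    def train(self, xs, ys):
        self.learner.train(self.preprocess(xs), ys)
        return self

    def predict(self, xs):
        return self.learner.predict(self.preprocess(xs))

def absolute(xs):
    """Custom preprocessing step: use the magnitude of the feature only."""
    return [abs(x) for x in xs]

plain = NearestNeighbour().train([1, 3], ["near", "far"])
wrapped = PreprocWrapper(NearestNeighbour(), absolute).train([1, 3], ["near", "far"])

plain_prediction = plain.predict([-3])
wrapped_prediction = wrapped.predict([-3])
```

Because the wrapped object offers the same train and predict methods as the base learner, anything that works on a learner, such as a resampling loop, also works on the wrapped version, and the preprocessing is applied consistently at training and prediction time.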
20:54
You can do feature filtering; we have, I think, more than 10 different techniques implemented now. There is feature selection in there: sequential forward search, which [name unclear] also talked about, [unintelligible], and a genetic algorithm, and we are working on other methods as well. There are different imputation methods; actually, we have a whole object-
21:21
oriented system to do any kind of imputation you want, and we have already implemented many useful ones.
21:31
[Unintelligible.] You can do generic bagging, so you can do ensemble construction if you want to create ensembles of the original models. There is over- and undersampling for imbalanced classification problems; we just did a huge study on those and are trying to publish the results [unintelligible]. And then there is something that I am very interested in, which is hyperparameter tuning and
22:03
optimization. The nice thing about this is that you can not only do hyperparameter tuning for the basic algorithms: you can construct a chain of preprocessing, modeling and postprocessing operations, and all of these, with their parameters, can be tuned jointly. So this is a bit bigger than just tuning the two parameters of the SVM. These are the tuning algorithms: there is grid search and random search, there is iterated F-racing in
22:33
there, and model-based optimization. This is about algorithm configuration, a topic that others here are looking into already, and it is a pretty hot topic currently. And well, we
22:42
like the model-based approach and are developing stuff for this as well, but you can also use racing if you like that more; it is nicely integrated into mlr. And this is possible as well, but not officially
22:56
supported yet: Bayesian optimization, or model-based optimization. That will take maybe 1 or 2
23:01
months until it is ready to be used for optimizing data mining models. And
23:11
finally, I want to show you something pretty complex: I will show you how to do efficient model selection, including different learning algorithms and different hyperparameters, with just a few lines of code. The idea is this. We have understood how the first level of model optimization works, right? It is something like maximum likelihood estimation, or penalized ML, or you might call it regularized loss minimization. We have understood how this works, so why not take the same principle to the next level? On the next level you usually try this model and this and this and this, and do what many applied people do: look at what works best; well, this one looks nice, now let me fiddle around with the parameters a bit, that one is the best, and report that, right? So we will just do the same thing that we did on the first level on the next level. This is sometimes called second-level estimation, and I think it is a good way to look at it. This was not invented by me; I stole it from a paper, so you can read about it there. So what you can do is use something (this is a term we invented) that we call a model multiplexer. You take different learning machines with their parameters, you put them together in a multiplexer, and that multiplexer has one parameter, and this hyperparameter encodes which learner is actually selected, right? And the idea is not to do an exhaustive search, but to use model-based optimization or racing to focus on the well-performing regions, to do an efficient optimization on this second level, and you can use every tool that you want. Here is what this code looks like, in 10 lines or so. You construct the base learners, a random forest and an SVM; you change their parameters if you want, here for the forest, because otherwise it takes too long; and you say: I want to do iterated F-racing with 400 experiments.
Then you create the parameter set for them with the model multiplexer. So for the random forest you would optimize its size, and for the SVM, in this case, one of its parameters. This is just an example, right; in reality you would tune more. And
25:51
you optimize this one on a log scale,
25:54
with constraints, box constraints on the parameters. Then you put a tuning wrapper around the learner, so the tuning sits on top of the learning algorithm, and now you can do two things at once. First of all, you optimize over this pretty large space (well, not so large, but just imagine a few more lines), and you do it efficiently with racing. And the other thing is that if you use the wrapper approach, you can do nested resampling, right? You know, when we do this tuning, we could just report the best result in the end.
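The joint search space described here can be sketched in a few lines. This is a minimal Python sketch and not the mlr API: the two toy learners, the parameter names (`selected_learner`, `k`, `log2_sigma`), the synthetic data, the single holdout split and the use of plain random search in place of iterated F-racing are all assumptions made for illustration.

```python
import math
import random

def make_data(n, rng):
    """Synthetic two-class data in 2D (toy data, not from the talk)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def predict_knn(train, x, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: sq_dist(p[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def predict_kernel(train, x, sigma):
    """Gaussian-kernel weighted vote with bandwidth sigma."""
    score = 0.0
    for xi, label in train:
        w = math.exp(-sq_dist(xi, x) / (2.0 * sigma ** 2))
        score += w if label == 1 else -w
    return 1 if score > 0 else 0

def sample_config(rng):
    """Draw one point from the multiplexed space: the selector parameter
    picks the learner, and only that learner's parameters are drawn."""
    if rng.choice(["knn", "kernel"]) == "knn":
        return {"selected_learner": "knn", "k": rng.randint(1, 15)}
    # Searched on a log2 scale inside box constraints, transformed on use.
    return {"selected_learner": "kernel", "log2_sigma": rng.uniform(-5.0, 5.0)}

def predict(config, train, x):
    if config["selected_learner"] == "knn":
        return predict_knn(train, x, config["k"])
    return predict_kernel(train, x, 2.0 ** config["log2_sigma"])

def holdout_accuracy(config, train, valid):
    hits = sum(predict(config, train, x) == y for x, y in valid)
    return hits / len(valid)

def random_search(train, valid, budget, rng):
    """Evaluate `budget` random configurations and keep the best one."""
    best_config, best_acc = None, -1.0
    for _ in range(budget):
        config = sample_config(rng)
        acc = holdout_accuracy(config, train, valid)
        if acc > best_acc:
            best_config, best_acc = config, acc
    return best_config, best_acc

rng = random.Random(1)
data = make_data(120, rng)
best_config, best_acc = random_search(data[:80], data[80:], budget=40, rng=rng)
print(best_config, round(best_acc, 3))
```

The point of the sketch is the shape of the space: one categorical parameter selects the learner, and the remaining parameters only exist for the learner that was selected. A racing or model-based tuner would spend its budget more cleverly than random search, but it would search the same space.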
26:34
But because we did maybe 1 billion experiments on the same data with the same cross-validation, we know this will be optimistically biased, and the paper would get rejected, hopefully, if the reviewer was paying attention. So we have to nest this into another level of performance estimation, and if you use the wrapper approach, now
26:53
you just say: well, resample the wrapper again with cross-validation, and this will do everything at once. Another line of code and we are done.
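The nesting can be illustrated with a small self-contained sketch. It is written in Python rather than R and is not the mlr implementation: the 1D toy data, the k-nearest-neighbour learner, the candidate values for `k` and the fold counts are assumptions chosen only to keep the example short.

```python
import random

def make_data(n, rng):
    """Toy 1D two-class data: class 1 is shifted to the right."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * y, 1.0), y))
    return data

def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > len(nearest) else 0

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def folds(data, n_folds):
    """Yield (train, test) pairs for n_folds-fold cross-validation."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, test

def cv_accuracy(data, k, n_folds):
    scores = [accuracy(tr, te, k) for tr, te in folds(data, n_folds)]
    return sum(scores) / len(scores)

def tune(train, candidates, n_folds=3):
    """Inner loop: choose k by cross-validation on the training part only."""
    return max(candidates, key=lambda k: cv_accuracy(train, k, n_folds))

def nested_cv(data, candidates, outer_folds=5):
    """Outer loop: tune on each outer training set, then score the tuned
    learner on the outer test fold, which the tuning never saw."""
    scores = []
    for train, test in folds(data, outer_folds):
        best_k = tune(train, candidates)
        scores.append(accuracy(train, test, best_k))
    return sum(scores) / len(scores)

rng = random.Random(7)
data = make_data(150, rng)
candidates = [1, 3, 5, 9, 15]
estimate = nested_cv(data, candidates)
print("nested CV accuracy:", round(estimate, 3))
```

Because `tune` is treated as part of the learner, resampling the tuned learner is the same operation as resampling any other learner, which is what makes it "one more line" in a wrapper-based design.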
27:09
If you want to read about this in a bit more detail, specifically for survival analysis (but what we did there is generally possible for other supervised modeling techniques), we wrote a paper
27:24
about optimizing preprocessing operations and models jointly, in this case with respect to predictive accuracy. And now on to the OpenML package, which is, I might say, far smaller, so it should not use up that much time. The current API allows you to explore data sets and tasks on the server and to download data sets and tasks; you can register learning algorithms and upload runs. So that's what we've got so far, and hopefully
27:58
we'll have more by the end of this week. It is
28:00
still mostly independent: we have not combined it with mlr. The idea is that you can use any R package, or you can write custom code if you want; it might get less convenient, but you can do everything you want, and mlr is already a bit more integrated, of course. This is what it looks like if you want to look at the data sets. You can ask: well, please tell me what the OpenML data sets are, and you see the name of the data set, the data set ID and the version. The same goes for the tasks: for the registered tasks you can see the IDs of the tasks, the names of the associated data sets, and again the version. And maybe the most useful operation gets you the data qualities: by calling this function you will get a table which consists of
28:51
one line per data set, which tells you the features or characteristics of that data set: how large it is, how many features of which type it has, whether there are missing values, and the number of classes. So what do you do with this? We
29:06
did a study, for example, a study on imbalanced
29:10
data sets. I called this function and looked at everything that had two classes and was pretty imbalanced, say at least a 10 to 1
29:20
ratio between the class sizes. Usually we dislike missing values, so many would exclude data sets with those, and so on. In the end you have maybe 30 to 50 data sets that define a benchmark study, and then you can just iterate over them.
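The selection step just described amounts to filtering a table of data qualities. The following Python sketch shows the idea on a made-up table; it does not call OpenML, and the column names (`n_classes`, `majority`, `minority`, `n_missing`), the toy records and the threshold are hypothetical.

```python
# Hypothetical data-quality table, one record per data set.
# Names and numbers are invented for illustration only.
qualities = [
    {"name": "toy_a", "n_classes": 2, "majority": 900, "minority": 60, "n_missing": 0},
    {"name": "toy_b", "n_classes": 3, "majority": 500, "minority": 20, "n_missing": 0},
    {"name": "toy_c", "n_classes": 2, "majority": 500, "minority": 450, "n_missing": 0},
    {"name": "toy_d", "n_classes": 2, "majority": 1000, "minority": 40, "n_missing": 12},
    {"name": "toy_e", "n_classes": 2, "majority": 2000, "minority": 100, "n_missing": 0},
]

def is_candidate(q, min_ratio=10.0):
    """Binary, no missing values, class-size ratio of at least min_ratio to 1."""
    return (
        q["n_classes"] == 2
        and q["n_missing"] == 0
        and q["majority"] / q["minority"] >= min_ratio
    )

selected = [q["name"] for q in qualities if is_candidate(q)]
print(selected)  # ['toy_a', 'toy_e']
```

The resulting list of names (or IDs) is what a benchmark loop would then iterate over.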
29:39
Actually, there is still a technical problem: this is available only for the data sets, and we really want to have it for the tasks. There is an issue for this in the tracker, and we want to get to it this week as well, because it is holding us up a bit. This is how you can download a data set: you can use either the name of the data set or its ID, and you see: well, downloading data set iris from the repository. Stuff gets stored in files on disk, and in the end these are just XML files and ARFF files. Those
30:13
files are transformed into an in-memory object again, and again, when you print the object, you get a summary. You can now work with the data; the data frame is in there again,
30:26
annotated with extra information. You can also download a task, and this will download not only the data set object but also the information on what you are supposed to do with this data, and it will also
30:41
download the cross-validation splits along with it, and again you get this nice object that you can look at. Now you can use the OpenML package as you want; even these two operations alone will be
30:56
useful in practice, but you can also now use every machine learning algorithm you wish on this. Here I again create a learner in mlr and run it on the task, which is a very convenient way to produce the predictions for all the cross-validation runs. We have some basic preprocessing, so we are dropping, apparently, a couple of columns because they are constant, and most algorithms don't like that. We get results, so 97 per cent accuracy and, most importantly, the predictions, and we might
31:37
want to upload them to the server. This is the only thing that I have not done so far: I have to authenticate now, because
31:46
I want to change data on the server, to create something new. Basically, I register the
31:51
learning algorithm (you won't have to do this any more in a couple of weeks, because these learners will already have been registered), and then you upload the run results, which are basically the predictions, and the server will evaluate them for you, I
32:11
think with every performance measure that the server knows. You could also evaluate them yourself. Enough of this. The other thing is that I'm going to bore you for another few slides with survival analysis, because we were
32:26
just talking about it, and I'm not going to invest much time in this because I may be over time anyway. The thing is, I wanted to talk about this because I want to convince you guys to make this possible in OpenML, because it is a pretty important
32:46
task for biostatisticians and for bioinformatics and computer science guys. So survival analysis is about predicting how long a certain
32:59
patient will survive, or maybe a certain group of patients that have a type of cancer. I don't really know much about the medical side. You have a couple of patients and you want to know when they are going to die, right? It's a sad topic, but the good thing about it is that we want to relate this information to their gene expression profiles and then hopefully figure out which genes are responsible for shortening their lifetime so drastically, so that we can help them, right? So basically it is like regression: you estimate how much time is left for the person, with one little twist which makes it more complicated. We have people in a medical study, and we have a certain event, that is the death, right, and we can measure the amount of time that passes until the event happens. But it might be the case that some of these patients leave the study for whatever reason, and then we don't get a measurement. That is censoring, and you can get right censoring, which is when people
34:28
leave the study and you cannot contact them any more, so we know they survived that long, but not how much longer, right? They can also enter the study when something has already happened and we don't know when, so this would be left censoring, and there can be censoring on both sides. This makes it more complicated, right, because we know some information but we don't have the
34:52
true measurement. And there is a whole area of statistics that deals with exactly that. The most
35:04
important thing for you is what the data sets look like. We have clinical covariates, which are just normal features of the patient: age, weight, whatever. And we have high-dimensional genetic data, expression data. So there are, I don't know, 10 to 100 clinical covariates, and on the
35:31
genetic side this will be high-dimensional, so this might be a
35:37
set of maybe 50 thousand features, and for next-generation sequencing data a million. And the nice extra is that you have only 100 or 300 observations, so this is really tough, right? We also have the timing information and whether the event happened or not, so the censoring status. And to motivate you that this is not a niche looked at by a few people, but is looked at by a
36:13
great many people: there are very many people working on this in statistics. There is the Kaplan-Meier estimator, which gives you a basic estimate of what the survival function looks like.
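To make right censoring concrete, here is a short Python sketch of the Kaplan-Meier product-limit estimate, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of observed events at t_i and n_i the number of patients still at risk. The five observations are invented toy values, and the function name `kaplan_meier` is my own choice.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    events[i] is 1 if the event was observed at times[i],
    and 0 if the observation was right-censored at times[i].
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Everyone whose recorded time is at least t is still at risk at t.
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Toy data: two patients are censored (at time 3 and at time 8).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

With these toy values the estimate drops to 0.8, 0.6 and 0.3 at times 2, 3 and 5. The censored patients never cause a drop themselves, but they count as "at risk" up to the time they leave, which is how the partial information is used.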
36:26
I think I looked this up a couple of days ago: it is one of the most cited statistical papers. And one of the simplest models that predicts survival time is the Cox proportional hazards model, the second most cited statistical paper. So I imagine that people from statistics will hopefully be more interested in OpenML if these are in there, and this underlines my first
37:02
premise: the more we open up to other communities, the more successful this will be. We will also learn from drawing in all of these different people, because they will give us another, hopefully complementary, perspective on what we are doing. We cannot get this right with 3 or 4 guys
37:23
designing how machine learning experiments should be done, completely and correctly. And this is
37:33
at least my plan for the rest of the week: discuss technical stuff with the people here and hopefully get those issues out of the way; discuss a possible integration of survival analysis; refactor and clean up some parts of the package, because the next step should really be to publish it on CRAN and make it available to people, to have people comment on it and use it, and we are not far away from that. One thing that is in a bad state currently, but is already being worked on, is the file caching mechanism, so that we don't need to
38:06
download everything again and again, every time. I really like that you guys provide all of these things from the OpenML server, the visualization and so on, and the comparisons, but R is a really nice tool for that as well, and we will be able to do all the nice statistical testing and the ggplot visualization in R with the results. So we need the data, and this is not perfectly integrated at the moment, because it's a bit hard to get this right. And I want to discuss with some people how we can really support custom modeling, just a few lines of code that somebody wrote; there must be a way, because again, that opens up the system to anything. So what might be the general next steps for the
38:55
R package and OpenML as a whole? Some of this has been mentioned already and might be obvious as well, because
39:07
for me the scenario is this: we should be able to do what is standard, so tuning and feature selection have to be possible, and to store the information from the experiment on the server, because in my opinion that is very standard for many papers. We must be able to do well-documented studies, right, scientific studies of the kind people put in their papers, and every aspect of these must be mappable to some OpenML workflow or concept. This is key, because otherwise scientists will not use it. I think we already have a problem: people are, I don't want to say lazy, but they have a tool and they stick to it, because they want to spend their time thinking about mathematics and formulas and not fool around with a new technology, at least most of the scientists. So we must make this general, flexible and easy to use. I'm going to skip this one. And we should do a large-scale study, in my opinion, and populate the database with interesting results, because actually we don't need anybody else to run standard learners on standard data; we can do this ourselves, and we have clusters for this. So let's just create millions of baseline experiments that we can then again data-mine and learn something from. Actually, maybe the smartest thing would be not to release this too early to anybody else, because I am of the opinion that if we do that, this will be a really rich resource that we have, and I'm
40:45
not sure anybody else has
40:49
this kind of information. We just need to exploit it, and we just need a way to do this. It would also be good to have some kind of a
40:58
short write-up where we present how it works in simple scenarios, to get people interested in this. Sorry that
41:06
I ran ten minutes over time. Thank you.
