OpenML, R, mlr
Formal Metadata
Title 
OpenML, R, mlr

Title of Series  
Author 

Contributors 

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2014

Language 
English

Content Metadata
Subject Area  
Abstract 
I will first introduce an R package to interface with OpenML. We support querying and downloading, running experiments, and uploading results, so that all your experiments are organized online. R itself allows many forms of machine learning methods and experiments, from completely custom code to powerful semi-automated frameworks. The OpenML package is framework-agnostic in that regard. The mlr package provides a generic, object-oriented, and extensible interface to a large number of machine learning methods in R. It enables researchers and practitioners to easily compare methods and implementations from different packages, rapidly conduct complex experiments, and implement their own meta-methods using mlr's building blocks. Classification, regression, survival analysis, and clustering are supported, as is virtually every resampling strategy. Meta-optimization can be performed by tuning, feature filtering, and feature selection, and most modeling steps can be parallelized. Its object-oriented structure provides in many cases a close match to the OpenML structure, and it can already be connected to the OpenML R package in a simple manner. The talk will conclude with an outlook regarding the next steps, open challenges, and ideas to improve upon the current state of the project.

00:00
[Opening remarks during setup are unintelligible.] I am actually going to talk about four things, so I think and hope
01:04
I will be able to do this in 13 minutes, with some time left for discussion, because at the end I have a couple of, well, questions. So, like Joaquin said, I am going to introduce mlr, which is a package that I and many other people have been developing now for a couple of years, and I've started
01:26
working on the OpenML project, I think, one year ago, together with [unintelligible]; we were actually the two guys who wrote the first two lines of this, at maybe the second meeting of this group here [unintelligible]
01:40
[unintelligible] And yeah, let's go through these guys quickly, because without them everything I am presenting here today would look very different. So there was
01:55
[unintelligible; several contributors are named here] who worked together with me on this package, [unintelligible] has been putting many hours now into the OpenML package, and Michel is also working on the OpenML package and is cooperating [unintelligible] on dozens of other R packages. So what is mlr? It is machine learning in R. It exists on CRAN, which is the official way to distribute R libraries and packages, and it also exists on GitHub, so if you just want to remember one link,
02:50
remember the GitHub link; everything
02:52
else is referenced from there. It's version 2.1 which is out right now, and we will release 2.2 during the next few days, while I'm still here [unintelligible]. The idea behind the package is to give you a unified interface for every basic machine learning
03:17
technique, supervised and unsupervised, in R. That is in fact the goal, and the
03:23
reason why we want to have that is that R is currently not very
03:27
well structured for many common operations. What we really want is not to tackle one specific aspect of machine learning; we really want to have an expressive language, right, because machine learning gets interesting if you can combine all of the things that you need to do in your experiments, model them as a whole, and use them as a whole. For that you need a language, and the basics of this language are basically
03:59
these underlying machine learning methods, like regression and classification techniques, and many other things can come on top of
04:09
that. [unintelligible] Some people call this a Lego-like approach, right, where you plug different steps together, and in the end you can end up with something like this here, which I think I copied from the
04:28
RapidMiner tools. We don't really like to do this type of programming, but you can do the same by writing small, expressive programs, and I will show you how this works, hopefully, in a few slides. So the idea is to provide structures for everything and glue code together, so many, many things in the package are not
04:49
programmed by ourselves. I didn't program the SVM algorithms that are in there, but we programmed many meta-methods on top of that; we will come to that. The package has grown quite a bit. I just checked a few minutes ago: it's now 14 thousand lines of R code, just code and the associated roxygen documentation, and 6 thousand lines of tests. So what is in
05:20
there? The basic machine learning task types. We currently cover regression, normal supervised classification, and cost-sensitive classification [unintelligible]. The experimental branch has made it possible that we can now talk about clustering tasks [unintelligible]. We also worked quite a lot on making survival modeling possible, and I want to introduce this to you at the end of the talk, because I want to see this in OpenML
05:58
while also during the and the and into early such a attacks smoking
06:06
[unintelligible] told you about how OpenML tasks are; ours are pretty similar. It is a data frame with the
06:14
input features and the output, annotated with which column is the target. You might have weights for the observations, you might have misclassification costs, so it is data with extra meta information. Everything in mlr is an object, so it is object-oriented, and you can program on that as well. Everything is tagged with meta information. You can list everything; you can ask for the subset of learning algorithms or measures that have a certain property, and so on, which in my opinion makes
06:49
it really nice to combine with something like OpenML. This is how a first modeling step might look with mlr: you create a classification task, you put in the data, in this case the iris data set, and you specify the target variable [unintelligible]. This is the printout, a nice summary of the task. You see how many observations there are and what types of features it has, here just 4 numeric features and no factor features, and whether it has missing values, weights, or blocking. Blocking is exactly what you asked about
07:28
earlier; one of you asked whether it is possible [unintelligible]. What you can do with this blocking is say that certain observations belong together, and if they belong together, then in every resampling they either all go to the training set or all go to the test set. In some scenarios it is very common that you need something like that: think about looking at images where you look at different subsegments of the image, or at different subsegments of songs, sound bites, and so on. We found that we need to do something like this.
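The blocking rule described here (observations that belong together always land on the same side of a split) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not mlr's R implementation; the function name `blocked_kfold` and the block ids are hypothetical.

```python
import random

def blocked_kfold(blocks, k, seed=0):
    """Return k (train_idx, test_idx) pairs; rows that share a block id
    are never split between training and test set."""
    unique_blocks = sorted(set(blocks))
    random.Random(seed).shuffle(unique_blocks)
    # Deal whole blocks round-robin into the k folds.
    fold_of_block = {b: i % k for i, b in enumerate(unique_blocks)}
    splits = []
    for fold in range(k):
        test = [i for i, b in enumerate(blocks) if fold_of_block[b] == fold]
        train = [i for i, b in enumerate(blocks) if fold_of_block[b] != fold]
        splits.append((train, test))
    return splits

# Six image segments cut from three images (hypothetical block ids).
blocks = ["img1", "img1", "img2", "img2", "img3", "img3"]
splits = blocked_kfold(blocks, k=3)
```

Because folds are assigned per block rather than per row, segments of the same image can never leak from the training set into the test set.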
08:08
[unintelligible] So what do we have? On
08:13
the side of learning algorithms: about 40 classification algorithms, a couple of clustering algorithms (these are still growing because this is very new), 23 regression techniques, and 7 survival methods. By the way, all of this, actually the whole talk, is
08:30
programmed [unintelligible], and this is, well,
08:36
[unintelligible] the right numbers, and some kind of
08:39
test of the package [unintelligible]. We have reduction algorithms for cost-sensitive classification; these are again meta-techniques, where you reuse the basic regression and classification techniques, usually in a weighted sense. The interface functions are of course train and predict, which each of these
09:01
methods has; that is what the interface is made of for these learning algorithms. Each of these learners has an associated parameter set that we can also ask about, and I think on the
09:15
next slide is exactly how you would use such a learning algorithm. You call makeLearner; the name given here is classif.rpart, recursive partitioning, just a decision tree, right, like CART, if you know that. And you
09:34
can again print it, and it tells you: well, I'm from package rpart (by the way, this would be loaded automatically), I'm a classification algorithm, and my name is Decision Tree. There is also a shorter name, rpart, which is useful for tables and plots and so on. It also has many different properties: it can handle two-class and multiclass problems, it can handle missing values, numerics, and factors, and it is a tree, right, so basically
10:02
everything that trees can do. And it has a predict type, so it will
10:08
predict class labels, but you can change that to probabilities; the method knows that it can supply that as well.
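The idea that every learner carries queryable properties can be sketched as a small registry lookup. This is a Python illustration, not mlr's R interface: the property list for `classif.rpart` follows the printout described in the talk, while the two `classif.example*` entries, the property spellings, and the function `list_learners` are hypothetical.

```python
# Hypothetical registry; only the classif.rpart entry mirrors the talk.
LEARNERS = {
    "classif.rpart": {
        "package": "rpart",
        "properties": {"twoclass", "multiclass", "missings",
                       "numerics", "factors", "prob"},
    },
    "classif.exampleA": {
        "package": "examplepkg",
        "properties": {"twoclass", "numerics"},
    },
    "classif.exampleB": {
        "package": "examplepkg",
        "properties": {"twoclass", "multiclass", "numerics", "prob"},
    },
}

def list_learners(required):
    """Names of all learners that have every required property."""
    required = set(required)
    return sorted(name for name, meta in LEARNERS.items()
                  if required <= meta["properties"])

with_missings = list_learners({"missings"})
multiclass_prob = list_learners({"multiclass", "prob"})
```

Storing properties as sets makes "has all of these" a plain subset test, which is what allows learners to be selected programmatically for a given task.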
10:18
And the only hyperparameter setting we actually changed is that it does not do the internal cross-validation, because
10:25
we do not need it and we don't want to waste time, so that is switched off by
10:30
default. You might again be interested in what the possible hyperparameters are [unintelligible]. You can see all the different parameter settings for the decision tree: you can see the type, a length if it is a vector parameter, the default value taken from the documentation, and constraints. So this guy goes from 1 to infinity [unintelligible], and you can say: well, this parameter is actually a dependent
11:04
parameter. Think about kernels, right: the kernel width, which is usually called gamma, only makes sense if you use the RBF kernel or one or two of the other kernels [unintelligible]. So you can model
11:19
dependency structures in parameter space; that gets interesting if you think about tuning those guys. You can also associate transformations with this, so you can do something like automatically tuning on log scales. You can say: this is a parameter where, every time you apply it, please do 2 to the x before you apply it, right, because that is what we do when we optimize things that go from 0 to infinity: we optimize in log space.
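A parameter space with a dependency and a log-scale transformation, as described above, might be sketched like this. It is a Python illustration rather than mlr's R API; the dictionary layout, the `cost` parameter, the bounds, and the function `sample_config` are assumptions made for the example.

```python
import random

# Hypothetical parameter descriptions: bounds are on the search scale,
# "trafo" maps a searched value to the value handed to the learner,
# and "requires" states when a dependent parameter is active.
PARAMS = {
    "kernel": {"values": ["linear", "rbf"]},
    "cost": {"lower": -5.0, "upper": 5.0, "trafo": lambda x: 2.0 ** x},
    "gamma": {"lower": -5.0, "upper": 5.0, "trafo": lambda x: 2.0 ** x,
              "requires": lambda cfg: cfg["kernel"] == "rbf"},
}

def sample_config(rng):
    """Draw one configuration, skipping inactive dependent parameters."""
    cfg = {}
    for name, spec in PARAMS.items():  # insertion order: kernel comes first
        if "requires" in spec and not spec["requires"](cfg):
            continue
        if "values" in spec:
            cfg[name] = rng.choice(spec["values"])
        else:
            x = rng.uniform(spec["lower"], spec["upper"])
            cfg[name] = spec["trafo"](x)  # search on x, apply 2**x
    return cfg

example = sample_config(random.Random(0))
```

Searching uniformly on x and handing 2^x to the learner spends equal effort on each order of magnitude, and the `requires` check keeps gamma out of configurations where it has no meaning.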
11:47
And yeah, again there are many different performance measures in there, quite a lot for classification, and this huge number is due to the fact that we have all these ROC
11:59
measures, right, like the true positive rate and so on; regression measures; only
12:05
1 for survival analysis, because these are harder to implement technically; some clustering stuff; and general measures like timing, so that you can really measure everything, right: you can ask how long training the model and the
12:15
prediction took. And
12:19
again, these measures have properties: you can see whether they are available for multiclass classification or only for binary, and so on; you know whether they should be minimized or maximized, their best and their worst value, and so on. Now, this is all pretty basic. It becomes a bit more interesting when we come to resampling. This is what [unintelligible]
12:45
already talked about, so it is about doing performance estimation properly in machine learning [unintelligible]. We have cross-validation, bootstrapping, subsampling, and so on; stratification is supported, this blocking
13:02
structure is supported, and [unintelligible] the variants of bootstrapping like the .632+ are there, if you want that for your really small data sets with less than 200 observations. So you create a learning algorithm first, again a decision tree; we create a resampling description object, here 10-fold cross-validation; we create some measures, so we measure the mean misclassification error and, stupidly, also the accuracy, which is just 1 minus the error in this case; and you call resample with the learner, the
13:49
task, the resampling description, and the measures, and you get back all the predictions and the
14:00
measurements, not just on the test set but optionally also on the training set, and in the end you are usually mainly interested
14:07
in the aggregated result over the whole cross-validation, but if you want the detailed information you can get it out
14:14
of here. So this object really contains all of the stuff. We also have a shortcut for this: you can just say crossval.
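The resampling workflow just described (a learner, a 10-fold cross-validation description, two measures, and one call that returns per-fold and aggregated results) can be sketched as follows. This is a self-contained Python illustration, not mlr's `resample` function; the majority-class learner, the toy data, and all function names here are assumptions.

```python
import random
from collections import Counter

def kfold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_majority(X, y):
    """Toy learner: remember the most frequent training label."""
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, X):
    return [model for _ in X]

def mmce(truth, pred):
    """Mean misclassification error."""
    return sum(t != p for t, p in zip(truth, pred)) / len(truth)

def resample(train, predict, X, y, k):
    per_fold = []
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        pred = predict(model, [X[i] for i in test_idx])
        per_fold.append(mmce([y[i] for i in test_idx], pred))
    aggregated = sum(per_fold) / k
    # Accuracy is reported only for convenience: it is 1 minus the error.
    return {"mmce_per_fold": per_fold, "mmce": aggregated,
            "acc": 1.0 - aggregated}

# Toy data: 70 rows of class "a" and 30 rows of class "b".
X = list(range(100))
y = ["a"] * 70 + ["b"] * 30
result = resample(train_majority, predict_majority, X, y, k=10)
```

Keeping the per-fold values next to the aggregate mirrors the point made in the talk: the aggregate is what is usually reported, but the details remain available.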
14:24
That works for cross-validation, for example 5-fold, and we actually have such shortcuts for many
14:32
of the things that I'm showing [unintelligible]. The main aspect of the package is the clean interface, to be able to do everything you want. I should also mention that if a learner you want is not implemented, it is documented on the web page how to add it. It takes about one page of text: you copy-paste it and fill in a few details. So you write the training function, you write the prediction function, and the most boring aspect is that somebody has to enter all the parameter information once. You are allowed to leave that out, but then you don't get some nice extra checking functionality of the package; everything will still work. So this can be really easy. But the most important thing is: if something is missing, please tell us. Go to the bug tracker and tell us: well, X is missing, here is my first try to integrate it. And you can
15:37
also do benchmarking. You can join a couple of tasks, in this case we took iris and [unintelligible]; we took 3 learning algorithms, decision tree, random forest, and SVM; and you can call one function, benchmark, and this will run all of the 10-fold cross-validations [unintelligible].
15:57
Right, this is just a couple of nested loops, and the nice thing is that these are not
16:02
loops but apply statements, parallel apply statements, and you can really easily parallelize all of this. You can parallelize resampling, you can parallelize this benchmarking, you can parallelize feature selection, which I have not talked about yet, and you can do this on, well, I want to say every system, but more precisely every system that
16:24
[unintelligible], because
16:28
we use our own parallelMap package, which works on local machines, on multicore machines, and on socket and MPI clusters, and we also integrated another package
16:41
that we wrote for HPC computing [unintelligible], which is called BatchJobs, and this enables you to use virtually every type of high-performance computing system: a Torque system, or a SLURM cluster, or LSF [unintelligible]. Most of this works out of the box once you have
17:04
configured this once for your site. Another really nice thing, and the reason why we wrote this parallelMap package, is that you can tag the operations that should maybe run in parallel with names, and
17:19
you can then select from the outside whether they should be parallelized or not. Look at this here, right: there are many different stages of the experiment, and depending on
17:32
what you do, depending on whether you do holdout or
17:36
100-fold bootstrapping, and on how many data sets you have, you might want to parallelize at a different level, right. So what you could do is say: well, one job is running one cross-validated algorithm on one data set, right, so that is the outer level. But you could also parallelize the resampling iterations, right, at the
17:57
inner level, and maybe there is feature selection or tuning in there as well, and that is another level. What is technically the best thing depends on sizes, right: it depends on the runtime behaviour of your algorithm, it depends on the size of the data set and of the resampling, and we don't know that. How should we know how big your data is, or whether you do holdout or cross-validation? So you just select the right level, because you usually have the choice between [unintelligible] experiment loops.
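The idea of tagging each loop with a level name and choosing from the outside which level runs in parallel can be sketched like this. It is a Python illustration using threads, not the parallelMap or BatchJobs API; `SETTINGS`, `tagged_map`, the level names, and the stand-in scoring in `run_fold` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Chosen from the outside; None means everything runs sequentially.
SETTINGS = {"level": None}

def tagged_map(fun, items, level):
    """Map fun over items; parallel only if this call's tag is selected."""
    items = list(items)
    if SETTINGS["level"] == level:
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(fun, items))  # map keeps input order
    return [fun(item) for item in items]

def run_fold(job):
    learner, task, fold = job
    # Stand-in score for "train on the other folds, test on this one".
    return (len(learner) * 7 + len(task) * 3 + fold) % 10 / 10

def resample(learner, task, folds=5):
    jobs = [(learner, task, fold) for fold in range(folds)]
    scores = tagged_map(run_fold, jobs, level="resample")  # inner level
    return sum(scores) / folds

def benchmark(learners, tasks):
    pairs = [(learner, task) for learner in learners for task in tasks]
    results = tagged_map(lambda p: resample(*p), pairs, level="benchmark")
    return dict(zip(pairs, results))  # outer level: one job per pair

SETTINGS["level"] = "resample"
table = benchmark(["tree", "forest", "svm"], ["iris", "other"])
```

The experiment code stays the same whichever level is selected; only the setting changes, which is the point of tagging rather than hard-coding the parallel loop.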
18:29
You just tell the system to do a parallel start, you tell the system the back end you are using, maybe [unintelligible], and you say: I want to parallelize resampling, or I want to parallelize [unintelligible] operation. There are also visualizations in mlr. This is useful for teaching, and sometimes to show students how a model might look in 2D. So this is a random forest again on the iris data set, and what you see here are the iris data points in the first two features of the iris data set. You can see the classes [unintelligible], whether there were mistakes made by the model, and the color encodes the posterior probability distribution
19:20
of the random forest. You can do this for clustering or regression as well, so it is just nice for working with beginners, but sometimes it is still something worth looking at. And this is, by the way, again related to [unintelligible]: we always use ggplot2 [unintelligible], so you get back a ggplot2 object and you can change it afterwards, right, because
19:42
these plots are objects, and you can apply further operations to them if you don't like the color coding or whatever we did. And there are many, many things that I cannot talk about, so I just summarized them here on the slide. We have pretty many preprocessing methods [unintelligible]. So this is all pretty basic, what I have shown so far; it's nice,
20:13
but this thing here really
20:15
opens up another level of scaling; it gives you another order of magnitude of interesting stuff to do with the package. What you can do is take a basic learning
20:25
algorithm and wrap some meta-algorithm or extra functionality around it. For example, you can say: well, I want a certain preprocessing method; actually you can integrate any method you want, because you can use a preprocessing wrapper that can contain custom code. So you can say: before I apply the tree, [unintelligible].
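A preprocessing wrapper of the kind described here (custom code wrapped around a basic learner, so that the combination behaves like a learner again) might be sketched as follows. This is a Python illustration, not mlr's wrapper API; the classes `NearestMean` and `PreprocWrapper`, the scaling functions, and the toy data are assumptions.

```python
class NearestMean:
    """Tiny base learner: predict the class with the closest training mean."""
    def train(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        self.means = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.means, key=lambda c: dist(row, self.means[c]))
                for row in X]

class PreprocWrapper:
    """Learner plus custom code: fitted on training data, replayed at predict."""
    def __init__(self, learner, fit, apply):
        self.learner, self.fit, self.apply = learner, fit, apply

    def train(self, X, y):
        self.state = self.fit(X)
        self.learner.train(self.apply(X, self.state), y)
        return self

    def predict(self, X):
        return self.learner.predict(self.apply(X, self.state))

def fit_scaling(X):
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return means, sds

def apply_scaling(X, state):
    means, sds = state
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

X = [[1.0, 1000.0], [2.0, 1100.0], [9.0, 1050.0], [10.0, 950.0]]
y = ["a", "a", "b", "b"]
plain = NearestMean().train(X, y)
wrapped = PreprocWrapper(NearestMean(), fit_scaling, apply_scaling).train(X, y)
```

Because the wrapper exposes the same `train` and `predict` methods as the base learner, it can be resampled or wrapped again, and the scaling parameters are always estimated on training data only.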
20:54
And you can do, well, feature filtering, right: we have, I don't know, more than 10 different techniques now implemented. There is feature selection in there: sequential forward search, which [unintelligible] already talked about, [unintelligible], and a genetic algorithm, and we are working on other methods as well. There are different imputation methods; actually we have a whole object-
21:21
oriented system to do any kind of imputation you want, and we have already implemented many useful variants of
21:31
those for you. You can do generic bagging, [unintelligible] if you want to create ensembles of the original models; over- and undersampling for imbalanced classification problems (we just did a huge study on those and are trying to publish the results [unintelligible]); [unintelligible]; and something that I'm very interested in, which is hyperparameter tuning and
22:03
optimization. The nice thing about this is that you can not only do parameter tuning for the basic algorithms: you can construct a chain of preprocessing, modeling, and postprocessing operations, all of these with parameters, and you can tune them jointly [unintelligible]. There is grid search and random search in there, there is iterated F-racing, and in
22:33
there, or model-based optimization. This is all about algorithm configuration, a topic that I am already looking into, and this is a pretty hot topic currently, and, well, we
22:42
like the model-based approach and are also developing stuff for this as well, but you can also use racing if you like that more; that is nicely integrated, and this is possible as well. But not officially
22:56
[unclear] is the Bayesian optimization, the model-based optimization, but in mlr this will take maybe one or two
23:01
months until [unclear] you can use this to optimize your data mining models.
23:11
And finally I want to show you something pretty complex. I will show you how to do efficient model selection, including different learning algorithms and including different hyperparameters, with just a few lines of code. The idea is this: we have understood how this first level of model optimization works, right? It's something like maximum likelihood estimation, or penalized ML, or you might call it regularized loss minimization. We have understood how this works, so why not take the same principle to the next level? On the next level, you usually try this model and this and this and this, and do many experiments and look at what works best, and, well, [unclear] fool around with the parameters, [unclear] take the one that is best and report that, right? We are machine learners, so we will just do the same thing that we did on the first level on the next level. This is sometimes called second-level estimation, and I think it's a good way to look at this. I should say this is not invented by me; I stole it from a paper [unclear]. So what you can do is something that we call (a term we invented) a model multiplexer. You take different learning machines with their parameters, you join them together in a multiplexer, and that multiplexer has one parameter, and this parameter encodes which learner is actually selected, right? The idea is not to do an exhaustive search; the way to do it is model-based optimization or racing, to focus on the well-performing regions, right, to do efficient optimization on this second level. And you can use every tuner that you want. And here is how this code looks like, in about 10 lines or so. You construct [unclear] base learners, in this case a random forest and the SVM; [unclear] you say I want [unclear] trees for the forest in this case, because otherwise it takes too long; you say I want to do iterated F-racing with 400 experiments,
and you create the parameter set for them with the model multiplexer. So for the random forest you would optimize [unclear] size, for the SVM in this case you would optimize [unclear]. That is just the example, right; in reality you would do more, and
25:51
you optimize this one on a log scale.
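The multiplexer idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mlr code shown on the slide: the parameter names `sigma` and `nodesize`, their ranges, and the function `toy_error` are hypothetical stand-ins, and plain random search replaces the iterated F-racing mentioned in the talk. What it shows is the structure: one categorical parameter selects the learner, only the selected learner's parameters are sampled, and one of them is searched on a log scale.

```python
import math
import random

# Hypothetical joint search space: the key is the value of the selector
# parameter; each learner owns its own box-constrained parameters.
SEARCH_SPACE = {
    "svm": {"log2_sigma": (-10.0, 10.0)},    # searched on a log2 scale
    "random_forest": {"nodesize": (1, 20)},  # integer box constraint
}


def sample_config(rng):
    """Draw the selector first, then only the selected learner's parameters."""
    learner = rng.choice(sorted(SEARCH_SPACE))
    config = {"selected_learner": learner}
    for name, (lower, upper) in SEARCH_SPACE[learner].items():
        if isinstance(lower, int):
            config[name] = rng.randint(lower, upper)
        else:
            config[name] = rng.uniform(lower, upper)
    if "log2_sigma" in config:
        # Transform back from the log scale to the actual parameter value.
        config["sigma"] = 2.0 ** config.pop("log2_sigma")
    return config


def toy_error(config):
    """Made-up stand-in for a resampled error estimate (not a real learner)."""
    if config["selected_learner"] == "svm":
        return 0.10 + 0.02 * abs(math.log2(config["sigma"]) + 3.0)
    return 0.12 + 0.01 * abs(config["nodesize"] - 5)


def random_search(budget, seed=0):
    """Evaluate `budget` sampled configurations and keep the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        config = sample_config(rng)
        error = toy_error(config)
        if best is None or error < best[1]:
            best = (config, error)
    return best


best_config, best_error = random_search(budget=400)
print(best_config, round(best_error, 4))
```

A racing or model-based tuner would replace `random_search` and spend the 400 evaluations more selectively; the search space definition would stay the same.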
25:54
These are box constraints on the parameters. And then you take the tuning wrapper, wrap the tuning around it and put it on top of the learning algorithm. And now you can do two things at once. First of all, you optimize over this pretty large space (well, not that large, but just imagine a few more lines), and you do it efficiently with racing. And the other thing is, if you use the wrapper approach, you can do nested resampling, right? We know that when we do this tuning we cannot just report the best result in the end, because we maybe did one
26:34
billion experiments on the same data with the same cross-validation. We know this will be optimistically biased, right, and the paper will get rejected, hopefully, if the reviewer is any good. So we have to nest this into another level of performance estimation, and if you use the wrapper approach, now
26:53
you just say: well, take the wrapper and resample it again, and this will do everything at once. Another line of code and we are done.
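Nested resampling as described here can be sketched without any library. The following is a minimal illustration, not the mlr implementation: the 1-D k-nearest-neighbour classifier, the synthetic data and all function names are assumptions made for the example. The inner cross-validation picks the hyperparameter using the training part only, and the outer cross-validation estimates the performance of the whole "tune, then fit" procedure.

```python
import random


def make_folds(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(data, fold):
    """Return (train, test), where test holds the rows indexed by `fold`."""
    held_out = set(fold)
    test = [data[i] for i in fold]
    train = [row for i, row in enumerate(data) if i not in held_out]
    return train, test


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (labels 0/1)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def error_rate(train, test, k):
    wrong = sum(knn_predict(train, x, k) != y for x, y in test)
    return wrong / len(test)


def cv_error(data, folds, k):
    errors = [error_rate(*split(data, fold), k) for fold in folds]
    return sum(errors) / len(errors)


def tune(train, candidates, rng):
    """Inner loop: choose k by 3-fold CV on the training data only."""
    folds = make_folds(len(train), 3, rng)
    return min(candidates, key=lambda k: cv_error(train, folds, k))


def nested_cv(data, candidates, n_outer, seed=0):
    """Outer loop: estimate the error of the whole tune-then-fit procedure."""
    rng = random.Random(seed)
    errors = []
    for fold in make_folds(len(data), n_outer, rng):
        train, test = split(data, fold)
        best_k = tune(train, candidates, rng)  # never sees the outer test fold
        errors.append(error_rate(train, test, best_k))
    return sum(errors) / len(errors)


# Synthetic two-class data: class 1 is shifted by 2 on the single feature.
data_rng = random.Random(42)
data = []
for _ in range(60):
    label = data_rng.randint(0, 1)
    data.append((data_rng.gauss(2.0 * label, 1.0), label))

estimate = nested_cv(data, candidates=[1, 3, 5, 7], n_outer=5)
print(round(estimate, 3))
```

Reporting the best inner `cv_error` instead of `estimate` would be the optimistically biased number the talk warns about.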
27:09
If you want to read about this a little bit more, especially for survival analysis (but what we did there is generally possible for other supervised modeling techniques), we wrote a paper
27:24
about optimizing preprocessing operations and [unclear], in this case with respect to the predictive accuracy. And that gets me to the OpenML R package now, which is, I might say, far smaller, so I am not going to use up that much time. The current API allows you to explore datasets and tasks on the server and to download datasets and tasks, and you can register learning algorithms and upload results. So that's what we got so far, and hopefully
27:58
we'll have more at the end of this week.
28:00
It is mostly agnostic; we don't have it tied to mlr alone. The idea is you can use any R package, or you can write custom code if you want; it might get less convenient, but you can do everything you want. And mlr is already a bit more integrated, of course. This is what it looks like if you want to look at the datasets: you can ask, well, please tell me what the OpenML datasets are. You see the name of the dataset, the dataset ID and the version. For the tasks [unclear] you can see the IDs of the tasks, the names of the datasets associated, and again the version. And maybe the most useful operation gets you the data qualities: by calling this function you will get a table which consists of
28:51
one line per dataset, which tells you the features or the characteristics of the dataset: how large it is, how many features it has, whether there are missing values, and the number of classes. So what do you do with this? If you
29:06
do a study, for example, I did a study for imbalanced
29:10
datasets. I called this function and looked at everything that had two classes and was pretty imbalanced, I don't know, at least a 10 to 1
29:20
ratio between the class sizes. Usually we dislike missing values, so you may want to exclude datasets with those, and so on. In the end you maybe have 30 to 50 datasets that define a benchmark study, and then you can just iterate over them.
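The selection step described here amounts to filtering a table of data qualities. A minimal sketch in Python follows; the rows, the dataset names and the column names are invented for illustration and are not the actual output of the OpenML package. Only the criteria come from the talk: two classes, at least a 10 to 1 ratio between class sizes, no missing values.

```python
# Invented example rows; a real table would come from the server.
qualities = [
    {"name": "toy_a", "n_classes": 2, "majority_size": 900,
     "minority_size": 50, "n_missing": 0},
    {"name": "toy_b", "n_classes": 2, "majority_size": 500,
     "minority_size": 400, "n_missing": 0},
    {"name": "toy_c", "n_classes": 3, "majority_size": 800,
     "minority_size": 20, "n_missing": 0},
    {"name": "toy_d", "n_classes": 2, "majority_size": 1000,
     "minority_size": 100, "n_missing": 12},
    {"name": "toy_e", "n_classes": 2, "majority_size": 1000,
     "minority_size": 100, "n_missing": 0},
]


def is_candidate(row, min_ratio=10.0):
    """Binary, imbalanced by at least min_ratio to 1, and no missing values."""
    if row["n_classes"] != 2 or row["n_missing"] > 0:
        return False
    return row["majority_size"] >= min_ratio * row["minority_size"]


selected = [row["name"] for row in qualities if is_candidate(row)]
print(selected)

# The benchmark study then simply iterates over the selected datasets.
for name in selected:
    pass  # download the dataset and run the experiment here
```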
29:39
There are still technical problems: this is available only for the datasets, and we really want to have it for the tasks. There is an issue for this on the tracker, and we can sort that out this week as well, because this is holding us up a bit. And this is how you can download a dataset. You can either use the name or the dataset ID, and you see: well, downloading dataset iris from the repository. Stuff gets stored in files on disk, and in the end, these are ARFF files, those
30:13
files are transformed into an in-memory object, and you get an object of the [unclear] and so on, right, and now you can work with it in R; the data is basically a data frame
30:26
annotated with extra information. You can download a task, and this will download not only the dataset object but also information on what you're supposed to do with this data, and it will also
30:41
download the cross-validation splits and so on. Again you get this nice object that you can look at. You can now use the OpenML package as you want; even these two operations will be
30:56
useful in practice. But you can also now use every machine learning algorithm you wish on this. Here I again create a learner in mlr and run it on the task, which is a very convenient way to produce the predictions. [unclear] you see the cross-validation runs, and we have some basic preprocessing, so we are dropping a couple of columns because they are constant, and most learning algorithms don't like that. We get results: 97 percent accuracy and, most importantly, the predictions, and we might
31:37
want to upload this. This is the only thing that I didn't switch on [unclear]. You have to authenticate now, because
31:46
I want to change data. [unclear] you basically register the
31:51
learning algorithm [unclear]; you won't have to do this any more in a couple of weeks, because these learning algorithms will already be registered. And you upload the run results, which are basically the predictions, and then the server will evaluate them for you.
32:11
I think [unclear] performance measures [unclear]; you could also evaluate it yourself. Enough of this. And now I am going to give you a few slides on survival analysis, because we were
32:26
just talking about it. I am not going to invest much time in this, because I may be over time anyway [unclear]. The thing is, I wanted to talk about this because I want to convince you guys to make this possible in OpenML, because it's a pretty important
32:46
task for biostatisticians and bio-computer-science guys. So survival analysis is about predicting how long a certain
32:59
patient will survive. Maybe you have a group of patients that all have a type of cancer [unclear]. You have a couple of patients and you want to know when they are going to die, right? It's a sad thing, but the good thing about it is that we want to relate this information to, say, their gene expression profiles and then hopefully figure out which genes are responsible, maybe, for shortening their lifetime so drastically, so we can help them, right? So basically it's like regression: you estimate how much time there is left for the person, with one little twist which makes it more complicated. We will have people in the medical study and we have a certain event, say death, right, and we can measure the amount of time that passes until the event happens. But then it might be the case that some of these patients leave the study for whatever reason, and we don't get a measurement. That's censoring. You can get right censoring, which is when people
34:28
leave the study and you can't contact them any more. So we know they survived these years, but maybe [unclear]. They can also enter the study [unclear] and we don't know since when, so this might be left censoring, and there could be censoring on both sides. This makes it more complicated, right, because we know some information but we don't have the
34:52
true measurement. And there is a whole area in statistics that deals with that, which is survival analysis. [unclear] the most
35:04
important thing [unclear]. Our datasets look like this: we have clinical covariates, these are just normal features, age of the patient and their weight, I don't know, whatever, and we have high-dimensional genetic data, gene expression data. So there are, I don't know, 10 to 100 of the clinical covariates, and on the
35:31
[unclear] side this will be high-dimensional, so this might be, like, a
35:37
set of maybe 50 thousand, and for next-generation sequencing, I don't know, a million. And the nice extra information is that you have only 100 or 300 observations, so it's a really tough problem, right? And we have this timing information, and whether the event happened or not [unclear], whether censoring took place. And to motivate you that this is not something that is looked at by a few people only, or looked at by a
36:13
couple of [unclear]: rather, there are many people working on this in statistics. So there is the Kaplan-Meier estimator, which gives you a basic estimate of what the survival function looks like.
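For readers unfamiliar with the Kaplan-Meier estimator mentioned here, the following is a minimal sketch for right-censored data; the function name and the toy data are made up for illustration. At every distinct time with at least one observed event, the current survival estimate is multiplied by (1 - d/n), where d is the number of events at that time and n is the number of subjects still at risk. Censored subjects count as at risk up to their censoring time and then simply drop out.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.

    times:  observed time per subject (event time or censoring time)
    events: 1 if the event was observed, 0 if the subject was right-censored
    Returns a list of (time, survival probability) at each event time.
    """
    observations = sorted(zip(times, events))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same observed time.
        while i < len(observations) and observations[i][0] == t:
            deaths += observations[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= leaving
    return curve


# Toy data: five patients, two of them right-censored (event = 0).
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

The patient censored at time 3 still counts in the risk set at time 3 but contributes no event, which is exactly the partial information the talk describes: we know the patient survived up to that point, but not the true survival time.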
36:26
I think I googled this a couple of days ago: it is one of the most cited statistical papers. And there is one of the simplest models that predicts survival time, which is the Cox proportional hazards model, the second most cited statistical paper. And if we get this integrated, [unclear] people from statistics will hopefully be more interested in OpenML [unclear]. And I'd underline my first
37:02
prejudice: the more we open up to the other communities, the more successful this will be. And we will also learn from drawing in all of these different people, because they will give us another, hopefully complementary, perspective on what we are doing. We can't get this right with only three or four guys
37:23
designing how machine learning experiments should be done, completely correct and [unclear]. And this is
37:33
at least my plan for the workshop: I want to discuss some technical stuff with [unclear] and hopefully get this out of the way; discuss possible integration of survival analysis; refactor and clean up some parts of the package, because the next step should really be to publish this and make it available to people, to get people to comment on this and use it, and we are not far away from this. [unclear], which is in a bad state currently; [unclear] is already working on the file caching mechanism so we don't need to
38:06
download everything again and again. And I really like that you guys [unclear] from the OpenML side also provide the visualization and so on and the comparison, but R is a really nice tool for that as well, and we want to be able to do all the nice statistical testing and the ggplot visualization in R with the results, so we need the data. And mlr is not perfectly integrated at the moment, because it's a bit hard to get this right. I want to discuss with some people how we can really support custom modeling, just a few lines of code that somebody wrote; there must be a way, because again, we want to open up the system to anything. And what might be the general next steps for the
38:55
R package and OpenML as a whole? So [unclear] classification and regression [unclear] might be there as well, because
39:07
[unclear] for me the thing is: we should be able to do what is standard, tuning and feature selection; this has to be something we support, and we should store the information from the experiment on the server, because in my opinion that's very standard for many papers. We must be able to do systematic studies, right, scientific studies as people do in their papers. Every aspect of this must be mappable to some OpenML workflow or concept. This is key, because [unclear] scientists [unclear] to use this. I think we already have the problem that people are, in a sense, lazy: they learn a tool and they stick to it, because they want to spend their time thinking about mathematics and experiments, and not fool around with a potential technology; at least most of the scientists. So we must make this general, flexible and easy to use. [unclear] And we should do a large-scale study, in my opinion, and populate the database with interesting results, because actually we don't need anybody to run a standard SVM on our data: we can do this ourselves, and we have clusters for this. So let's just create millions of baseline experiments that can then again be data-mined to learn something from them. Actually, maybe the smartest thing would be to build [unclear], because I am of the opinion that if we do that, this will be, well, a rich [unclear] that we have, and I'm
40:45
not sure anybody has
40:49
this kind of information; we just need to exploit it, we just need a way to do this. And there is also the idea of some kind of a
40:58
blog where we present how it works in simple scenarios, to get people interested in this. Sorry that
41:06
I ran over time. Thank you.