Machine learning and applications
Formal Metadata
Title 
Machine learning and applications

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Publisher 
Weierstraß-Institut für Angewandte Analysis und Stochastik (WIAS), Technische Informationsbibliothek (TIB)

Release Date 
2017

Language 
English

Production Year 
2017

Production Place 
Hannover

Content Metadata
Abstract 
For a few years now, Machine Learning (ML) has broadened the modeling toolbox for the sciences and industry. The talk will first remind the audience of the main ingredients for applying machine learning. Then various ML applications in the sciences, namely Brain-Computer Interfaces and Quantum Chemistry, will be discussed.

00:00
So I will present work on machine learning. I am with the Berlin Big Data Center and also with TU Berlin, and I have an affiliation with Korea University.
00:15
Because I don't expect people to know what machine learning is in all details, I have tried to explain it very simply on the one side and also to give something for the people who already know. So it is very technical what I am saying, but since you are a bunch of mathematicians you should be able to take it.
So this is the basics. What is machine learning about? We would like to learn from data, which means that we have data; these are some points x in 2D, and typically the data lives in very high-dimensional spaces. It has some labels y, and these could be continuous or they could be discrete, as in this classification problem where we want to distinguish between red and blue. The only thing that people in machine learning do is try to infer some unknown mapping between x and y, assuming that there is some joint probability distribution that only God knows.
The one important point here is that all the math behind this tries to make sure that you can generalize, meaning that you can actually get this mapping right for unseen data. That is a very strange concept if you think about it, because from given data you have to infer something about data that you have not seen. So this is the function f that you try to estimate. It could be a function that is very complex; it could also be another function, and there are infinitely many functions. Which one is the optimal one, the optimally generalizing one? That is the question.
If I show you this data point, most of you would attribute the red label to it, because in your brain, by eyeballing, you do some inference and assign the red color in this case. Of course, if you have million-dimensional data, you cannot eyeball things. This means that you need a proper mathematical framework to put this in place, and this is what our field is about.
Apart from this nice theory, we actually have a bunch of methods that work very well. One of them was already mentioned, neural networks, and another one is kernel methods like support vector machines. A lot of people are using them. I will first tell you the idea behind them, and then we will continue and see how they can be used.
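The idea of generalization described above can be illustrated with a minimal sketch: fit a rule on some labeled points and check it only on points that were not used for fitting. Everything specific here is an assumption made for illustration and does not come from the talk: the two Gaussian blobs standing in for the unknown joint distribution, the nearest-centroid rule standing in for the learned mapping f, and all function names.

```python
import random

def make_data(n, rng):
    """Draw n labeled 2-D points from two Gaussian blobs (a toy stand-in for P(x, y))."""
    data = []
    for _ in range(n):
        y = rng.choice([-1, +1])                 # -1 plays "blue", +1 plays "red"
        x = (rng.gauss(2.0 * y, 1.0), rng.gauss(0.0, 1.0))
        data.append((x, y))
    return data

def fit_centroids(train):
    """Estimate one mean vector per class, using the training points only."""
    sums = {-1: [0.0, 0.0, 0], +1: [0.0, 0.0, 0]}
    for (x0, x1), y in train:
        s = sums[y]
        s[0] += x0
        s[1] += x1
        s[2] += 1
    return {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()}

def predict(centroids, x):
    """Assign the label of the closest class centroid."""
    def dist2(label):
        c = centroids[label]
        return (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
    return min(centroids, key=dist2)

def accuracy(centroids, data):
    return sum(predict(centroids, x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(200, rng)      # data we are given
test = make_data(1000, rng)      # unseen data from the same distribution
centroids = fit_centroids(train)
test_accuracy = accuracy(centroids, test)
print(f"accuracy on unseen points: {test_accuracy:.3f}")
```

The number that matters is the accuracy on `test`, because those points played no role in fitting.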
03:34
The kernel method basically is: we have some function
03:40
where w are the parameters, phi is some mapping, x is the data that you have, and sign is the sign function.
The idea of a support vector machine is the following. You have some data points, again, now red and green rather than red and blue; the colors carry no meaning. We map this data to some high-dimensional space by the mapping phi, shown here, and in this high-dimensional space we do something very simple, for which we can prove that, by virtue of doing the optimal thing in this high-dimensional space, we do the optimal nonlinear thing in the original space.
Think about mapping your data with polynomials: you have a picture with a thousand by a thousand pixels, which is about one million pixels, and you map this with a polynomial of 10th order. Then you have approximately one million to the tenth dimensions in this space, which is quite large. Then you do something very simple there, for which you can prove things, and in the original space you have your function that classifies images very well.
I was among the first people who actually worked on this, together with my mentor Vapnik and my students [unclear] and Alex. At that time we were all told other things: the field called pattern recognition and the field called statistics always told us that we should not use many features, that we should use as few as possible, and we were doing exactly the opposite. The reason why we can do this is the very nice theory that allows us to show that if we are doing the optimal thing here, then we are also doing an optimal nonlinear thing there. That, in a nutshell, is the idea.
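To make the feature-space argument concrete, here is a small sketch under stated assumptions. It counts the monomials of degree at most d in n variables, which is the binomial coefficient C(n + d, d), and it checks for degree 2 that a polynomial kernel evaluated in the original space equals an inner product in an explicit feature space. The kernel form (x·y + 1)^d and all function names are illustrative choices, not taken from the talk.

```python
import math
from itertools import combinations_with_replacement

def poly_feature_count(n_inputs, degree):
    """Number of monomials of total degree <= degree in n_inputs variables."""
    return math.comb(n_inputs + degree, degree)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def phi_degree2(x):
    """Explicit feature map whose inner product equals (x.y + 1)**2."""
    feats = [1.0]
    feats += [math.sqrt(2.0) * xi for xi in x]
    for i, j in combinations_with_replacement(range(len(x)), 2):
        scale = 1.0 if i == j else math.sqrt(2.0)
        feats.append(scale * x[i] * x[j])
    return feats

def poly_kernel(x, y, degree=2):
    """Polynomial kernel computed in the original space."""
    return (dot(x, y) + 1.0) ** degree

x = [1.0, 2.0, -1.0]
y = [0.5, -1.0, 3.0]
explicit = dot(phi_degree2(x), phi_degree2(y))   # inner product in feature space
implicit = poly_kernel(x, y)                     # same number, no feature map needed

# Feature-space size for a 1000 x 1000 image and a 10th-order polynomial.
n_dims = poly_feature_count(10**6, 10)
print(explicit, implicit, f"{n_dims:.3e}")
```

For one million inputs and degree 10 the exact monomial count is of order (10^6)^10 / 10!, that is roughly 10^53, which is in the same astronomically large regime as the rough figure quoted in the talk. The kernel evaluation never builds that space explicitly.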
06:10
In practice, we solve an optimization problem, which is a quadratic program; we go to the dual, and this gives us the solution for the support vector machine. For those of you who have not seen quadratic programs, ignore it; for those of you who have, this is a nice quadratic program. And so the other learning machine
06:43
that is very popular these days is a neural
06:46
network. Now, neural networks have been around for quite a while; in fact, they have been dominating the brains of the species for
06:59
a couple of million years. But in science, or rather in computer science, neural networks came up sometime in the sixties of the last century, and basically they take the following structure. Back in the sixties they were called perceptrons: just a bunch of inputs connected with some weights that summed up the activities of the inputs, went through a nonlinearity, and that was it. That is a perceptron. Now, since perceptrons are not very capable of doing any really interesting nonlinear stuff, people thought that stacking a bunch of perceptrons next to each other and on top of each other would help.
So basically neural networks are universal function approximators. You have some input; you take this input, sum it up with some weights, and put it through a nonlinearity, shown here, and you do this again and again and again, and then you get bumps. Just for the mathematicians: that this is a universal function approximator I can show you with a sheet of paper. What does it do? Every component, every neuron, is basically this nonlinear function; it is a sigmoid, with a bend. So this is a sigmoid in a high-dimensional space, that's it, and with the parameters you can make it flatter or steeper. Now you take another sigmoid and combine it with the first one in the next layer, so you have a ridge. In the next step we take another ridge, and then you have a bump, and with bumps you can approximate any function. That is the universal approximation proof with a sheet of paper.
So these neural networks are quite useful, and the question is why they have become such a hot method again. The reason is that nowadays we have GPUs, so we have much more computing power, and we also have a lot of data; we have big data these days. Of course, there is nothing informative about big data as such; the data is just there.
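The sheet-of-paper argument (sigmoid, ridge, bump) can be sketched numerically. This is one illustrative construction under assumed parameter values such as the steepness and the ridge width; it is not the exact construction shown on the slide.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ridge(t, steepness=20.0, half_width=0.5):
    """Difference of two shifted sigmoids: near 1 for |t| < half_width, near 0 outside."""
    return sigmoid(steepness * (t + half_width)) - sigmoid(steepness * (t - half_width))

def bump(x1, x2, steepness=20.0):
    """Next layer: add two perpendicular ridges and squash the sum.

    The sum is about 2 where both ridges overlap and about 1 on a single ridge,
    so thresholding at 1.5 keeps only the overlap region.
    """
    s = ridge(x1, steepness) + ridge(x2, steepness)
    return sigmoid(steepness * (s - 1.5))

for point in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]:
    print(point, round(bump(*point), 4))
```

Weighted sums of such localized bumps, placed on a grid, are the building blocks behind the universal approximation statement.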
It becomes information when you actually ask interesting questions of it, that is, when you build an interesting model. But the interesting point about neural networks is that they are very efficient, in the sense that they can deal with a lot of data. What I mean by this: if you want to solve a quadratic program with a million parameters, the computing time is approximately between a million squared and a million cubed in terms of the variables, whereas a neural network scales linearly. That is a good thing, and this is why they are used so much.
Now let me put my statistician's hat on. If you think about estimators in general, there is something good about a lot of data, because if you want to estimate something, the more data you get, the closer you get to the solution. If you have a decent estimator, the error goes like 1 over n, where n is the number of data points, until you reach the noise level. This means that having a lot of data is good, and most of the techniques in machine learning outside neural networks are not able to make use of a lot of data. That is why neural networks are now very popular.
One thing about kernel methods is that they use this mapping to a feature space, and you have to say what the feature space is. It is some Hilbert space, so you have a representation in this Hilbert space that tells you how to compare things nonlinearly. The neural network, by contrast, learns this representation by itself from data, which means that multiscale information can also be learned and you do not have to specify it beforehand. That is the important point.
So machine learning has become a huge economic sector. It is being used in self-driving cars.
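The remark about estimator error shrinking with the number of data points can be checked on the simplest possible case. The sketch below is an assumption-laden illustration, not the speaker's example: it uses the sample mean of Gaussian noise, for which the mean squared error is exactly sigma^2 / n, so that n times the measured error should stay close to sigma^2.

```python
import random
import statistics

def mse_of_sample_mean(n, trials, rng, true_mean=0.0, sigma=1.0):
    """Monte Carlo estimate of E[(sample mean - true mean)^2] for samples of size n."""
    squared_errors = []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        squared_errors.append((statistics.fmean(sample) - true_mean) ** 2)
    return statistics.fmean(squared_errors)

rng = random.Random(0)
mse_by_n = {n: mse_of_sample_mean(n, 2000, rng) for n in (10, 100, 1000)}
for n, mse in mse_by_n.items():
    # n * mse should hover around sigma^2 = 1 if the error falls like 1/n.
    print(n, round(mse, 5), round(n * mse, 3))
```

Note that this 1/n rate is for the squared error; the error itself then falls like one over the square root of n.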
12:18
The best pattern recognition methods and all the best speech recognition methods these days work with machine learning, with neural networks. We find Higgs particles with it, we do neuroscience with it, and you use it in your cell phone when you use Google, Facebook, Amazon and whatnot. We can also do some decent science with it. So it is used everywhere, but what I
12:51
am much more interested in is to use machine learning as some kind of enabling technology for the sciences. Just as mathematical modelling is an enabling technology for the sciences, machine learning is an enabling technology for the sciences. It comes from a different corner: in mathematical modelling you make up your mathematical or kinetic model; here you are data-driven. Of course these things are not as far apart as they seem; they are actually growing together, and any reasonable person would think this is a good idea. So first I will tell you a bit about my
13:39
neuroscience hobby that I have been pursuing since about [unclear], and this is the Berlin Brain
13:47
Computer Interface project. There are three principal components to it: Benjamin Blankertz, who is a professor at TU Berlin, Gabriel Curio, who is a medical doctor at the Charité, and myself. We came to the conclusion that it would be interesting, given brain signals, to understand what cognitive state the brain was in; in other words, something like mind reading. Of course this is very simple-minded mind reading.
But let me give you the motivation why people are interested in this. From the medical side, there are a number of so-called locked-in patients. They have no means of communication, yet they have intact brains. If you could read their brain signals, decode them, and translate them into a control signal for, say, a wheelchair or some writing device, they could communicate with the outside world. This was one of the motivations that people have.
For me, because I am a machine learner, this looks like a machine learning problem: you have the EEG, say 60 channels of EEG at a thousand hertz each, you extract some features, and you classify. What I will be showing you is a translation of human intentions into a control signal without using muscle activity or any kind of peripheral nerves, because of course if I start wiggling my ears or making faces, there is some muscle activity that will also be reflected in the brain signals. But this is not available to ALS patients, because they cannot move. So if we want to help this patient group in the long run, we need to be able to communicate without muscle activity.
Practically, this is a complicated thing. You have this multivariate EEG, and in real time you have to do all sorts of filtering, you have to remove artifacts, and you have to have a lot of feature extractors that put in all the medical knowledge, the neuroscience knowledge, that we have about the brain doing things. We stack this into a huge vector, and then we put this through our favorite learning machine: this could be a support vector machine, some linear discriminant analysis, a neural network, or whatnot.
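The pipeline just described (signals in, features out, stack into a vector, classify) can be sketched on synthetic data. Everything below is an assumption for illustration and is not the project's actual processing chain: two toy channels carrying a 10 Hz rhythm that is attenuated on one side, signals treated as already band-pass filtered and artifact-free, log-variance as the band-power feature, and a nearest-class-mean rule standing in for the "favorite learning machine".

```python
import math
import random

def simulate_trial(label, rng, n_samples=200, fs=100.0):
    """Toy 2-channel trial; the channel with index `label` has its 10 Hz rhythm attenuated."""
    trial = []
    for ch in range(2):
        amp = 0.3 if ch == label else 1.0
        phase = rng.uniform(0.0, 2.0 * math.pi)
        trial.append([
            amp * math.sin(2.0 * math.pi * 10.0 * t / fs + phase) + rng.gauss(0.0, 0.2)
            for t in range(n_samples)
        ])
    return trial

def log_variance_features(trial):
    """One log band-power value per channel, stacked into a feature vector."""
    feats = []
    for channel in trial:
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        feats.append(math.log(var))
    return feats

def fit_class_means(feature_vectors, labels):
    """Training step: average feature vector per class."""
    means = {}
    for label in set(labels):
        rows = [f for f, l in zip(feature_vectors, labels) if l == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means

def classify(means, feats):
    """Assign the class whose mean feature vector is closest."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(feats, means[label]))
    return min(means, key=dist2)

rng = random.Random(1)
train_labels = [i % 2 for i in range(40)]
train_feats = [log_variance_features(simulate_trial(l, rng)) for l in train_labels]
means = fit_class_means(train_feats, train_labels)

test_labels = [i % 2 for i in range(100)]
hits = sum(
    classify(means, log_variance_features(simulate_trial(l, rng))) == l
    for l in test_labels
)
bci_accuracy = hits / len(test_labels)
print(f"accuracy on held-out synthetic trials: {bci_accuracy:.2f}")
```

Real EEG is far noisier than this toy signal, which is why the filtering, artifact removal, and feature design mentioned in the talk carry most of the work.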
17:12
When I got interested in this, about 22 years ago, the state of the art was that the subjects had to be trained for about 100 to 300 hours. This means that people had to wear an EEG cap like that, and then they had to be trained with biofeedback in order to get out some decent signal. So we said, well, let's do it the other way round: let the subjects think whatever they think, and let's just have the machines learn whatever needs to be learned. This reduced the necessary training time to about 5 minutes. Before this, there were maybe a dozen groups in the world that worked on this; now there are more than 400, or maybe even [unclear], groups in science and industry doing all sorts of things with the brain.
So what are the applications? Of course one application is help for patients; this could be ALS patients, or it could be patients with stroke, where we would enhance rehabilitation. But just to give you a better idea of what can be done: this is an old video, but it is still alive. You have the subject here wearing the cap. The data goes into the amplifier, from the amplifier it goes to this laptop here, and the latter gives control signals to the screen and moves the cursor around. As a subject you have to assume different brain states, and if the decoding is done right, the subject can move the cursor to play a game, which is the game of brain pong. The subject is sitting still, not doing anything, not using the eyes, and is controlling this by virtue of their brain signals. This is decoded in real time.
Now you could ask the subject: what did you think about in order to get this done? First of all I will teach you a bit of physiology. If I wave my hand like this to you, then on my left hemisphere, over the motor cortex, there is some activation; in fact, there is an idle rhythm that is suppressed. If I wave my left hand, then the idle rhythm is suppressed on the right side. Now the interesting thing, and this is not something that we invented, it is known physiology: if we only imagine the movement, without any waving, without any movement, we still have the same effect, the suppression of the idle rhythm over the contralateral motor cortex. This we can use for decoding purposes, and this is what is being used here.
So you see that in principle you can get one bit out, but you can get many more bits out. As we know nowadays, depending on the speller paradigm, you can get about 10 bits [unclear]. We had a demo on this at CeBIT, which was already
21:25
at [unclear] letters per minute. So this is help for patients, but I think the interesting point is that we can use this device to understand what the brain is actually doing, because the brain is a machine that is thinking and behaving in real time. We have a device that can decode these activities, and we can do interesting studies that help us to understand how the brain works.
There is also something quite interesting, which is that we can use this in technology. You have already seen someone play a game, and you can imagine all sorts of interaction with the new channel that is provided by the brain and that you can decode in real time. The most unobvious one is what I would like to tell you about. Unfortunately Professor Kleiner is not here anymore, otherwise I could complain to him; I mean, this is more of a joke. Some years ago, when this Excellence Initiative thing happened, the universities thought about strange projects as part of some excellence center, some moonshot, you know.
I have a colleague in Berlin, Thomas Wiegand. Thomas Wiegand is one of the fathers of H.264, which is the current video coding standard, and every second bit in the Internet is coded by it, so you are using it all the time without knowing. So I suggested to Thomas: look, there is MP3, where people have studied very carefully what the ear can perceive, so why should we not study with the brain-computer interface what we can perceive in video and just code that? That was the idea. We wrote this up, and sure enough the reviewers said [unclear] unkind things about it, but we did it anyway.
This became a line of research with a number of papers. The point is not to use a brain-computer interface to change something in your video codec; the point is to learn something about perception. So we learned something about perception, and we could use this later. In particular, the HHI people have used this to the extent that they could improve the video coding. For example, there was a Champions League final broadcast by Sky, I think one and a half years ago or something like that, [unclear], where a coding was used with this new mindset.
The interesting part is that you can save something; maybe this was a couple of percent. But if you think about a couple of percent on a global scale, because every second bit of the Internet is transmitted with H.264 or with the new coding standard, it amounts to a couple of nuclear power plants in terms of energy consumption. So the fact that you can actually understand something about the brain, translate this into a better coding, and thereby help the planet: this may be a bit unobvious, and it was clearly beyond the scope of these referees. So the conclusion from this is that it does not matter what people say when they review stuff [unclear]. Another part
25:49
that I would like to show you is maybe equally funny but even more heretical, so please prepare for some heresy. This was
26:03
in 2011. There is a very nice place on this planet which is called IPAM, the Institute for Pure and Applied Mathematics, at UCLA, and I was invited to participate in a program there for three months. My sabbatical was coming up anyway, and there is nothing wrong with going to California. Coming there, I realized that this was a program about quantum chemistry, so not really machine learning. But I happen to have a history of being a theoretical physicist who got some training in quantum mechanics and did some strings and things like that in my past.
So I heard people talk about the Schrödinger equation. The Schrödinger equation is a wonderful equation that you can see here, and it is a very complicated equation that cannot be solved other than in approximation. People got Nobel Prizes for these approximations; it is called density functional theory. In fact, when I first heard DFT, I thought, what are they talking about? It is not the discrete Fourier transform that they are talking about. So much for the earlier remark about a joint language that you have to develop.
When I sat in these talks, I thought, well, this is very nice: people come from first principles, they do all the things from first principles, and at some point they make an approximation. So I thought, why not treat the Schrödinger equation as a black box and just see this as a prediction problem? This is as if somebody takes Navier-Stokes and says: what goes in matters, what comes out as a result matters, and what is in between I can just predict with a neural network or whatnot. This is what I suggested to the people, and of course they were not amused; they were just not quite clear on how they would kill me.
So I survived, and the interesting point was that, as always, there were some young people, [unclear], who said: well, let's try this idea, crazy as it is. What they did was generate a lot of data with DFT. DFT for small molecules, in a reasonable approximation, costs about 5 hours of computing time per molecule, so, you know, next to the climate people these are the guys that use all your computing time. They generated something like 7,000 molecules with all their properties, and that was our training data. In fact it was not all training data: we took part of it, a thousand molecules, as training data, built a prediction model from it, and checked on the rest of the data, which we had not seen before, whether or not we could predict the outcome of the Schrödinger equation. This was 2011, and in 2012 our first Physical Review Letter appeared. So the way we did this was
30:07
the following. First of all, in machine learning you need an input. The input is the molecule: you take the coordinates and nuclear charges of all the atoms. Then you have to represent the similarity of two molecules somehow, and a good way of representing the structure within a molecule is to write down something which we called the Coulomb matrix. We take the Coulomb forces between the i-th and the j-th atom and write them into a matrix, so M_ij, the matrix element between the i-th and the j-th atom, is just given by the nuclear charges through the Coulomb law.
So this is a matrix representation, not a vector, but maybe people have told you that in machine learning we can deal with vectors, with matrices, with tensors, with graphs, and with whatever you have. If we compare two molecules M and M' as represented here, we just take the Frobenius norm of the difference between them. Then we do something very simple; I insisted that we do something simple.
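A minimal sketch of this representation follows. The off-diagonal entries Z_i Z_j / |R_i - R_j| follow the description above. The diagonal entries 0.5 * Z_i^2.4 are taken from the published Coulomb-matrix descriptor and are not stated in the talk, so treat them as an assumption here. The toy geometries are made up for illustration and use atomic units.

```python
import math

def coulomb_matrix(charges, coords):
    """Coulomb matrix for nuclear charges Z_i at positions R_i (atomic units)."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal convention assumed from the published descriptor.
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return m

def frobenius_distance(a, b):
    """Frobenius norm of the difference of two equally sized matrices."""
    return math.sqrt(sum(
        (a[i][j] - b[i][j]) ** 2 for i in range(len(a)) for j in range(len(a))
    ))

# Two toy diatomic "molecules": the same two hydrogen-like atoms at different separations.
m_short = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)])
m_long = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (2.8, 0.0, 0.0)])
print(m_short, frobenius_distance(m_short, m_long))
```

Molecules with different numbers of atoms give matrices of different sizes, so a real pipeline needs an additional convention (for example padding) before distances can be computed; that step is left out of this sketch.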
31:42
We first put this into a Gaussian, so this is the kernel, and then we do kernel ridge regression, which is really the most stupid nonlinear model that you can think of, but a very powerful one. It can be solved in closed form, so the computing time is negligible and the prediction time is negligible.
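The closed-form solution mentioned above is alpha = (K + lambda I)^(-1) y, with predictions f(x) = sum_i alpha_i k(x_i, x). The sketch below applies it to a toy one-dimensional regression problem rather than to molecules. The data, the kernel width `sigma`, the regularization strength `lam`, and the small hand-written linear solver are all illustrative assumptions.

```python
import math

def gaussian_kernel(a, b, sigma):
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def krr_fit(xs, ys, sigma, lam):
    """Closed form: alpha = (K + lam * I)^(-1) y."""
    n = len(xs)
    k = [[gaussian_kernel(xs[i], xs[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(k, ys)

def krr_predict(xs, alpha, x_new, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(x, x_new, sigma) for a, x in zip(alpha, xs))

xs = [[0.3 * i] for i in range(21)]        # training inputs on [0, 6]
ys = [math.sin(x[0]) for x in xs]          # training targets
alpha = krr_fit(xs, ys, sigma=1.0, lam=1e-6)
pred = krr_predict(xs, alpha, [1.0], sigma=1.0)   # 1.0 is not a training input
print(pred, math.sin(1.0))
```

Training is one linear solve and prediction is one weighted sum over training points, which is why both are cheap for a few thousand molecules.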
32:10
We had an error of about 10 kcal/mol, which is not good enough but not bad. If we take the mean across all the molecules as a predictor, the error would be around 350 kcal/mol, to give you a reference. A decent quantum mechanical approximation would be at [unclear]. A bit later we used neural nets and we were down at 3 kcal/mol; in 2015 we were already at [unclear], and recently we reached 1 kcal/mol, which is chemical accuracy. This is out of sample: we are just taking the molecules and predicting. Of course, instead of energies we can also predict polarizabilities and everything else, within less than a millisecond. So before I
33:25
come back to this, and I will come back to this once more, let me give you some perspective on machine learning. So if you have a machine learning model
33:46
that is a highly nonlinear beast, for example a deep neural network with all these layers, very complex, very nonlinear, you put something inside, this image, you know, and then the neural network classifies this input image into the respective class. So this is a classification problem where you have a couple of thousand classes: roosters, fossils, cars and whatnot, right? And so the neural network actually gives you the correct prediction, and you wonder: why does it give you this prediction? So there's a field in machine learning which is called feature selection. In feature selection you have your huge bulk of data and you ask the question which of the inputs are the most salient ones, and people use the lasso or other candidates, indeed they play the L1 trick, in order to find a few variables that are responsible. Now the interesting thing is, if you have, you know, a whole bulk of data, it is nice to know what is generally interesting. But if we think about medical diagnosis, say, you couldn't care less about the diagnosis of the ensemble; you would like to know what is your individual diagnosis, you would like to know what are the individual variables that the model thinks are important for this particular decision. And the reason why people in the sciences in general have been only working with linear models is because there is a way back. In the linear model there is an obvious way back, OK? So if I have this linear classifier in space and I put a data point here, then I know that this direction is the one that is responsible. But if I have this very highly
36:06
nonlinear, whatever, classifier, how do I get back what are the right things? This was an unsolved problem that is solved now for any kind of nonlinear machine learning model, for neural networks, for kernel machines and so on. This was by Sebastian
36:25
Bach et al., and I'm also among these authors. So the idea is the following: we take the classification and we go backwards from the result of the classification to the input and make a probability map, a heat map, which tells you which of these pixels, in this case, are the most salient ones for this particular case. And you see, you know, it's this thing here, mostly the head part of the rooster, that is considered the most salient for this particular case. So I would just
37:12
give you an idea about this. Mathematically it's very beautiful, because this is a very highly complex nonlinear thing, right, and so you could do some kind of a Taylor expansion around it. But the Taylor expansion is not so helpful, because it would be a global thing and you would have to go to many orders. So we could show that if you take a Taylor expansion that is local, local for every neuron, then it's easy to do: a locally linear Taylor expansion that, taken together, becomes global and describes the global nonlinear thing. It's called deep Taylor, this is a very popular paper, and in this manner you can show some mathematical properties and you can understand this problem. So let me just give you an idea. You see, you have the picture that goes in here, like this ladybird, I think, and then you have your neural network that classifies this as
38:20
a, sorry, not ladybird but ladybug, and so this is considered a ladybug by the network. And now what we're doing is we're going backwards, and we call this relevance propagation. So this node was the most relevant, and we propagate this relevance backwards. Basically you take all the activities of the network; you already have the weights because the network has been trained, and if you want to know the relevance here, then you sum this up, appropriately normalized with the activity that you got from the forward pass. And you can give this a theoretical interpretation in terms of Taylor
39:09
expansion and so on. So you can do all this, and you can ensure that relevance doesn't increase, doesn't become more or less, so there's a relevance conservation property that you need in this. And so then you see, OK, this is the part, this is mostly the body. OK, so here we have
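The backward pass described in this segment, where relevance is redistributed layer by layer in proportion to each input's contribution to the forward activity and the total relevance is conserved, can be sketched for a small fully connected ReLU network. This is a simplified illustration under assumptions, not the authors' implementation: biases are omitted so that conservation holds up to the stabilizer `eps`, and the function names are hypothetical.

```python
import numpy as np

def forward(x, weights):
    """Forward pass; ReLU on hidden layers, linear output layer, no biases."""
    activations = [np.asarray(x, dtype=float)]
    for i, W in enumerate(weights):
        z = activations[-1] @ W
        is_last = i == len(weights) - 1
        activations.append(z if is_last else np.maximum(z, 0.0))
    return activations

def relevance_propagation(activations, weights, target, eps=1e-9):
    """Propagate the score of output node `target` back to the inputs.

    Each lower-layer unit i receives a_i * sum_j W_ij * R_j / z_j, that is,
    its share of the forward contribution to every upper-layer unit j.
    """
    R = np.zeros_like(activations[-1])
    R[target] = activations[-1][target]           # start from the chosen output
    for W, a in zip(reversed(weights), reversed(activations[:-1])):
        z = a @ W                                 # forward contributions summed per unit
        z = z + eps * np.where(z >= 0, 1.0, -1.0) # stabilizer against division by zero
        R = a * (W @ (R / z))
    return R
```

Summing the returned input relevances gives back the value of the chosen output node up to `eps`, which is the conservation property mentioned above. For an image classifier the input relevances would be displayed as the heat map over pixels.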
39:33
pictures of spiders and the heat map. And then you see, of course, cats come in front of different backgrounds and with different shapes and races and everything, so all the feature selection aspects would be completely nonsensical, because they would say a cat is in the middle of the picture; they would not say, you know, what is the cat-like thing here. Now this is also an
40:05
interesting one. People in machine learning, and I just told you that, are obsessed by the generalization error: they want to minimize the error on unseen data. So assume that you take two models; in this case this one is a deep neural network, not even trained by us, this was trained by someone else, and this one is the Fisher vector model, which is a very popular computer vision model. We put this image inside and we get heat maps out. Now if we look at the out-of-sample performance of both models, they're the same, OK? So both generalize equally well, but they seem to solve the problem differently: whereas this beast here looks at the horses, this one looks at this lower left corner, and if you look at the
41:15
lower left corner, and I read it to you, then there's this tag which says [unintelligible]. So this is a public database of 20 million pictures; nobody has ever looked at them, but they are used as a benchmark for measuring the generalization error. So all of these models do a great job, but they solve the problem differently. So if we are using machine learning models in the sciences or engineering, we should better know why they behave the way they behave, and this we are able to do now. And this is extremely important, because if we think about physics or chemistry or engineering, we cannot afford to have this kind of thing, yeah? Although this is intelligent behaviour, looking at tags, right? It just happens that the horse class had this tag and nobody noticed. That's intelligent behaviour, but not exactly what we are thinking about in a model. So
42:34
understanding is a key concept that we need if we are using these kinds of models and if we are engaging in modeling. Now I come back to this physics part; there is a recent paper that we just... I have to see, I'm taking minutes from your coffee break, so I will... I think it's not something... OK, so these are the guys that
43:07
worked on all this. This paper appeared on the 9th of January in Nature Communications. So what we do is take a deep tensor neural network and learn some atomistic representation; same thing, Schrödinger equation again. So basically, again,
43:29
the molecules are transformed into some features, and now this is a bit more complex, so
43:38
we are now trying to use a neural network. Before, we had this kernel that basically compared molecules by using differences between Coulomb matrices. Now we would like to have something that is like a word2vec vector, which is the representation in language analysis where you put language context, which is something symbolic, into a vector embedding. So in a sense we would like to understand what are the local atomic properties within the molecule, and how can they be represented somehow as a vector or something, and we learn this. And so this is then a deep neural network: for every atom we have this kind of representation, where you see these equations, and then for every atom we get the energy contribution and then we sum it all up. Of course we can also have other contributions, like polarizabilities and so on, so we can use this model for this. We tried to think about the underlying chemistry and use the local interaction profiles that molecules have and implement our model like that. So there's this interesting feedback loop, which means that in fact this innocent-looking network is a network where you first model the local context of one atom, then in the next step you feed this into the local context of two-atom interactions, and then into three-atom interactions and so on, so that you share the weights.
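The structure described in this segment (a learned vector per atom, repeated interaction passes that reuse the same weights, one energy contribution per atom, and a sum over atoms) can be sketched as below. This is a heavily simplified illustration and not the published deep tensor neural network: the update rule, the distance expansion, the layer sizes and every name here are assumptions, and the weights are random rather than trained.

```python
import numpy as np

def make_params(rng, max_z=9, d=8, k=6, h=8):
    """Random, untrained weights; all sizes are arbitrary illustrative choices."""
    return {
        "embedding": rng.normal(size=(max_z + 1, d)),  # one vector per element
        "centers": np.linspace(0.0, 5.0, k),           # distance expansion grid
        "W_c": rng.normal(size=(d, d)) / np.sqrt(d),
        "W_d": rng.normal(size=(k, d)) / np.sqrt(k),
        "W_h": rng.normal(size=(d, h)) / np.sqrt(d),
        "w_out": rng.normal(size=h) / np.sqrt(h),
    }

def atomwise_energy(charges, coords, params, n_passes=3, gamma=10.0):
    """Return (total energy, per-atom contributions) for one molecule."""
    Z = np.asarray(charges, dtype=int)
    R = np.asarray(coords, dtype=float)
    n = len(Z)
    c = params["embedding"][Z]                               # (n, d) atom vectors
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    d_feat = np.exp(-gamma * (dist[..., None] - params["centers"]) ** 2)  # (n, n, k)
    not_self = 1.0 - np.eye(n)                               # exclude i == j
    for _ in range(n_passes):                                # same weights every pass
        # Message from atom j to atom i depends on c_j and on their distance.
        v = np.tanh((c @ params["W_c"])[None, :, :] * (d_feat @ params["W_d"]))
        c = c + np.sum(v * not_self[:, :, None], axis=1)
    e_atoms = np.tanh(c @ params["W_h"]) @ params["w_out"]   # one energy per atom
    return e_atoms.sum(), e_atoms
```

Because the model only uses element types and interatomic distances and sums over atoms, the total energy does not change when the atoms are reordered or the molecule is translated. After the first pass an atom's vector reflects its pairwise neighborhood, and later passes mix in information from neighbors of neighbors, which is one way to read the remark about going from one-atom to two-atom to three-atom context with shared weights.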
45:36
So this is what you get if you roll it out, OK, and there are some thoughts here. Now if you do this and you go
45:45
across chemical compound space, which means that you take some data on some molecules, you train the model, and then you take some other data that you haven't seen before, then you are below 1 kcal/mol. And if you take just the atomic neighborhoods, it's not enough; if you take interactions, pairwise interactions, interactions between pairs along the graph, it is becoming better. You can do the same game for
46:19
molecular dynamics, quantum-mechanically accurate molecular dynamics, and there of course you need much better accuracies, so we are way below 0.1 kcal/mol. And if we simulate some molecules like this one here, then we are very, very close to the true quantum mechanical molecular dynamics simulation. Now this is all
46:51
good, so this was the best model available, in that with the same architecture it could go across chemical compound space and for one single molecule could predict molecular dynamics behaviour. OK, so now you have this thing and everybody is thinking, oh, this neural network is a black box, but it's not black any more, so we can start looking at what this thing implemented, what it has learned about chemistry and physics. So we can now look at, for example, a bunch of molecules, and we can see where would an H atom bind, or where would a C atom bind. We can look at molecular properties like aromaticity, and that is not exactly in chemical databases in the way we could get it. So I will try to explain this. In the chemical databases we find the energies and aromaticities and all this stuff about full molecules, but for example we don't have the information, if we take all of the molecules that have a benzene ring, which of all these many molecules has the most stable benzene ring, which one has the most aromatic one. So this is not in the database, and so we can infer that, and that's a new tool that has learned some chemistry and so on. Of course, you know, I'm very short in this explanation, and I'm not the physicist who has been contributing to this, so I'm saying it in the way that I have understood it from the computer science perspective, but I think this is a very good starting point. In fact it's a field that is strongly developing. I think already the year before, so this was 2015, only in the U.S. there were 5 workshops on this topic, so it's a substantial growth that we experience. And if we think about it, we have some techniques that are blazing fast where we can learn something, and the point is that we need to put together insights from physics and chemistry and computer science, and it's a hard task, but it's very rewarding. So I'm coming to the
50:01
conclusion. So machine learning is a central, a driving technology in big data. It's one of the two driving technologies: data management is one, machine learning is the other, and together it makes big data; this is also the philosophy of the Berlin Big Data Center. If we use it for neuroscience, we can do brain-computer interfacing, decoding brain states, diagnosis, whatnot. In chemistry we can be a million times faster at high accuracy. There is some hope on materials, because I played with molecules, but I also played with materials, with the Max Planck Institute, with a group there, in searching for superconductors. But that's the beginning, and we are much better with molecules than with materials. And the question is, and that's a general question, how can we get a better understanding? So it's not only about prediction; we need to understand things. Of course in the case of industry it doesn't matter, we just predict, because a better prediction gives us more income, right? But if we want to understand science, then, you know, we shouldn't have horses in this. And I think there are a lot of open questions, and one of the open questions that we also tackle in Berlin is how to bring this very nice data-driven modeling technique together with the mathematical modeling world. Because although I have given my talk from this perspective, just to be a bit provocative here because you do the other part, in principle one could say, well, you know, the only thing that we need is data and learning, no need for models. But this is nonsense, because at some point we need to go back and have models, because we need to have some understanding. And this path is not well researched, and I think it needs to be researched. Thank you. Thank you.