Machine learning and applications


Formal Metadata

Machine learning and applications
Title of Series
Müller, Klaus-Robert
Gottfried Wilhelm Leibniz Universität Hannover (LUH)
FIZ Karlsruhe (zbMATH)
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Weierstraß-Institut für Angewandte Analysis und Stochastik (WIAS), Technische Informationsbibliothek (TIB)
Release Date
Production Year
Production Place

Content Metadata

Subject Area
In recent years, machine learning (ML) has broadened the modeling toolbox for the sciences and industry. The talk will first remind the audience of the main ingredients for applying machine learning. Then various ML applications in the sciences, namely brain-computer interfaces and quantum chemistry, will be discussed.
So I will present work on machine learning. I am with the Berlin Big Data Center and also with TU Berlin, and I have an affiliation with Korea University.
Because I do not expect people to know what machine learning is in all details, I will try to explain it very simply on the one side and also give something for the people who already know. So what I am saying is very technical, but since you are a bunch of mathematicians you should be able to take it.
So this is basic: what is machine learning about? We would like to learn from data, which means that we have data, and these are some points in 2D; typically the data is living in very high-dimensional spaces. It has some labels y, and these could be continuous or they could be discrete, like in this classification problem where we want to distinguish between red and blue. So the only thing that people in machine learning do is try to infer some unknown mapping between x and y, assuming that there is some joint probability distribution that only God knows. OK. The one important point here is that all the math that is behind this tries to make sure that you can generalize, meaning that you can actually get this mapping right for unseen data. That is a very strange concept if you think about it, because from given data you have to infer something that you have not seen.
OK, so this is like the function f that you try to estimate. This could be a function, it is quite complex; this could also be another function; there are infinitely many functions. Which one is the optimal one, the optimally generalizing one? That is the question. If I show you this data point, then most of you would attribute the red label to this data point, because in your brain, by eyeballing, you do some inference and put the red color to this case. And of course, again, if you have million-dimensional data or so, then you cannot eyeball things. This means that you need to have a proper mathematical background to put this in place, and this is what our field is about.
Apart from this nice theory, we actually have a bunch of methods that work very well. One of them was already mentioned, the neural networks, and another one is kernel methods like support vector machines, and a lot of people are using them. I will first tell you the idea about this, and then we will continue and see how they can be used. So the
kernel method basically is the following: we have some function f(x) = sign(w · φ(x)), where w is the parameters, φ is some mapping, x is the data that you have, and sign is the sign function.
OK, so the idea of a support vector machine is the following. You have some data points, again, now red and green, not red and blue; you see the difference, there is no meaning to it. We map this data to some high-dimensional space by the mapping φ, that's here, and in this high-dimensional space we do something very simple for which we can prove things. By virtue of doing the optimal thing in this high-dimensional space, we do the optimal nonlinear thing in the original space. Think about it: say you map your data with polynomials, and you have a picture with a thousand by a thousand pixels, which is about 1 million pixels, and you map this with a polynomial of 10th order. Then you have approximately 1 million to the 10th dimensions in this space, which is quite large. Then you do something very simple there for which you can prove things, and in the original space you have your function that classifies images very well.
I was among the first people who actually worked on this, together with my mentor Vapnik and my students, Alex and others. At that moment we were all told other things: the field called pattern recognition, or the field called statistics, always told us that we should not use many features, that we should use as few as possible, and we are doing exactly the opposite. The reason why we can do this is the very nice theory that allows us to show that if we are doing the optimal thing here, then we are also doing an optimal nonlinear thing there. So that is, in a nutshell, the idea.
Practically, we solve an optimization problem, which is a quadratic program, and we go to the dual, and this gives us the solution for this support vector machine. For those of you who have not seen quadratic programs, ignore it; for those of you who have seen quadratic programs, this is a nice quadratic program. So, the other learning machine.
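As a rough illustration of the feature-space idea described above, the following sketch checks numerically that a polynomial kernel evaluated in the original space equals an inner product in an explicitly constructed feature space, and counts the dimensions for the image example from the talk. It is a minimal sketch, not the speaker's code: the function names, the choice of a homogeneous kernel, and the use of ordered monomials as the feature map are assumptions made for illustration.

```python
import itertools
import math
import numpy as np

def poly_features(x, degree):
    """Explicit feature map (assumed form): all ordered products of
    `degree` input coordinates, i.e. d**degree features for d inputs."""
    return np.array([math.prod(c) for c in itertools.product(x, repeat=degree)])

def poly_kernel(x, z, degree):
    """Homogeneous polynomial kernel k(x, z) = (x . z)**degree,
    computed in the original space without building the features."""
    return float(np.dot(x, z)) ** degree

rng = np.random.default_rng(0)
x, z = rng.normal(size=4), rng.normal(size=4)

# Inner product in the explicit 4**3 = 64 dimensional feature space ...
explicit = float(np.dot(poly_features(x, 3), poly_features(z, 3)))
# ... equals the kernel evaluated on the 4-dimensional inputs.
implicit = poly_kernel(x, z, 3)

# The talk's example: 10**6 pixels mapped with a 10th-order polynomial
# gives on the order of (10**6)**10 = 10**60 dimensions.
log10_dims = 10 * math.log10(10 ** 6)
```

The point of the comparison is that `implicit` never touches the high-dimensional space, which is why a space with roughly 10^60 dimensions can still be used in practice.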
The one that is very popular these days is the neural network. Neural networks have been around for quite a while; in fact, they have been dominating the brains of the species for a couple of million years. But in science, or in computer science, neural networks have been coming up some time in the sixties of the last century, and basically they take the following structure. Back in the sixties they were called perceptrons, and this was just a bunch of inputs connected with some weights that summed up the activities of the inputs, went through a nonlinearity, and that was it. So that is a perceptron.
Now, since perceptrons are not very capable of doing any really interesting nonlinear stuff, people thought that stacking a bunch of perceptrons next to each other and on top of each other would help. So basically neural networks are a universal function approximator: you have some input, you take this input, sum it up, you have some weights here, put it through a nonlinearity, that's here, and you do this again and again and again, and then you get bumps.
Just for the mathematicians: this is a universal function approximator, and I can show you with a sheet of paper. What does it do? Every component, every neuron, basically is this nonlinear function; it is a sigmoid, so this is a sigmoid in a high-dimensional space, that's it, and with the parameters you can make it flatter or steeper. Now you take another sigmoid, this one against the other one, so in the next layer you have a ridge. Then you take another ridge and you have a bump, and with a bump you can do it. That is the universal approximation proof with a sheet of paper.
So these neural networks are quite useful, and the question is why they are becoming such a hot method again. The reason is that nowadays we have GPUs, so we have much more computing power, and also we have a lot of data; we have big data these days. Of course, there is nothing informative about big data as such: the data is just a thing. It becomes information when you actually ask interesting questions of it, when you build an interesting model.
But the interesting point about neural networks is that they are very efficient in the sense that they can deal with a lot of data. What I mean by this: if you want to solve a quadratic program with a million parameters, then you need approximately between a million squared and a million cubed in computing time in terms of the variables, whereas a neural network scales linearly. So that is a good thing, and this is why they are used so much.
Now, if I put my statistician's hat on: if you think about estimators in general, then there is something good about a lot of data, because if you want to estimate something, the more data you get, the closer you get to the solution. If you have a decent estimator, then this goes like 1 over n, where n is the number of data points, until you reach the noise level. This means that a lot of data is good, and most of the techniques in machine learning outside neural networks are not able to make use of a lot of data. That is why they are now very popular.
OK, one thing about kernel methods is that they use this mapping to feature space. You have to say what the feature space is for you; it is some new space, so you have a representation, this Hilbert space, that tells you how to compare things nonlinearly. Here, the neural network learns this representation by itself from data, which means that multi-scale information can also be learned, and you do not have to say it beforehand. That is the important point.
So machine learning has become a huge economic sector, and it is being used in self-driving cars and in the best pattern recognition.
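The sheet-of-paper argument for universal approximation described above can be sketched numerically. The version below is a one-dimensional simplification (the talk builds ridges and bumps in higher dimensions): the difference of two steep sigmoids gives a bump, and a weighted sum of bumps, which is a single hidden layer of sigmoid units, approximates a target function. The steepness value, the bin layout, and the sine target are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bump(x, left, right, steepness=200.0):
    """Difference of two steep sigmoids: close to 1 between `left` and
    `right`, close to 0 elsewhere."""
    return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))

def approximate(f, x, n_bumps):
    """One hidden layer with 2 * n_bumps sigmoid units on [0, 1]: a sum of
    bumps, each weighted by the target value at its bin centre."""
    edges = np.linspace(0.0, 1.0, n_bumps + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return sum(f(c) * bump(x, a, b)
               for c, a, b in zip(centres, edges[:-1], edges[1:]))

def target(t):
    return np.sin(2 * np.pi * t)

# Evaluate away from the interval ends, where the outermost bumps are cut off.
x = np.linspace(0.05, 0.95, 500)
err_coarse = np.max(np.abs(approximate(target, x, 10) - target(x)))
err_fine = np.max(np.abs(approximate(target, x, 100) - target(x)))
```

With more bumps the maximum error shrinks, which is the intuition behind the approximation argument; the sketch says nothing about how such weights would be learned from data.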
Recognition methods, and all the best speech recognition methods these days, work with machine learning, with neural networks. We find Higgs particles with it, we do neuroscience with it, you use it in your cell phones when you use Google, Facebook, Amazon and whatnot, and we can also do some decent science with it. OK, so it is ubiquitous, it is everywhere.
But what I am much more interested in is to use machine learning as some kind of enabling technology for the sciences. Similarly to how mathematical modelling is an enabling technology for the sciences, machine learning is an enabling technology for the sciences. It comes from a different corner: in mathematical modelling you make up your mathematical, kinetic model; here you are data-driven. Of course these things are not as far apart as they seem, so they are actually growing together; any reasonable person would think this is a good idea.
So first I will tell you a bit about my neuroscience hobby that I have been pursuing since about 2 22, and this is the Berlin Brain-Computer Interface project. There are three principal components of that: Benjamin Blankertz, who is a professor at TU Berlin, Gabriel Curio, who is a medical doctor at Charité, and myself. We came to the conclusion that it would be interesting, given brain signals, to understand what cognitive state the brain was in; in other words, something like mind reading. Of course this is very simple-minded mind reading.
But let me give you the motivation why people are interested in this. From the medical side, there is a number of so-called locked-in patients: they have no means of communication, and they have intact brains. So if you could read their brain signals, decode them and translate them into a control signal for, say, a wheelchair or some writing device, they could communicate with the outside world. This was one of the motivations that people have.
For me, because I am a machine learner, this looks like a machine learning problem: you have the EEG, say 60 channels of EEG with a thousand hertz each, you extract some features and you classify. What I will be showing you is a translation of human intentions into some control signal without using muscle activity or any kind of peripheral nerves. Of course, if I start wiggling my ears or making faces, then there is some muscle activity that will also reflect itself in the brain signals, but this is not available to ALS patients because they cannot move. So if we want to help this patient group in the long run, we need to be able to communicate without muscle activity.
Practically, this is a complicated thing. You have this multivariate EEG, and in real time you have to do all sorts of filtering, you have to remove artifacts, and you have to have a lot of feature extractors that put in all the medical knowledge or neuroscience knowledge that we have about the brain doing things. Then we stack this into a huge vector, and then we put this through our favorite learning machine: this could be a support vector machine, it could be some linear discriminant analysis, it could be a neural network or whatnot.
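The processing chain described above (filtering, feature extraction, stacking into a vector, then a learning machine) can be sketched on synthetic data. This is a toy under stated assumptions, not the project's pipeline: the two-channel signal, the 10 Hz rhythm, the 8 to 12 Hz band, the noise level, and all function names are invented for illustration, and linear discriminant analysis is picked as one of the classifiers the talk lists.

```python
import numpy as np

rng = np.random.default_rng(0)
FS, N_SAMPLES = 1000, 1000            # 1 kHz sampling, 1 s trials (assumed)
t = np.arange(N_SAMPLES) / FS

def make_trial(label):
    """Two synthetic channels (index 0: left hemisphere, 1: right).
    The 10 Hz rhythm is damped on the channel given by `label`."""
    amp = np.array([1.0, 1.0])
    amp[label] = 0.4
    phase = rng.uniform(0, 2 * np.pi, size=2)
    rhythm = amp[:, None] * np.sin(2 * np.pi * 10 * t + phase[:, None])
    return rhythm + rng.normal(scale=1.0, size=(2, N_SAMPLES))

def log_band_power(trial, lo=8.0, hi=12.0):
    """Feature extractor: log power in the lo-hi Hz band, per channel."""
    freqs = np.fft.rfftfreq(trial.shape[1], d=1 / FS)
    power = np.abs(np.fft.rfft(trial, axis=1)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return np.log(power[:, band].sum(axis=1))

def fit_lda(X, y):
    """Linear discriminant analysis with a pooled covariance estimate."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Xc = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
    cov = Xc.T @ Xc / (len(X) - 2)
    w = np.linalg.solve(cov, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b

labels = rng.integers(0, 2, size=400)
features = np.array([log_band_power(make_trial(l)) for l in labels])

# Train on the first 200 trials, evaluate on the 200 unseen ones.
w, b = fit_lda(features[:200], labels[:200])
predicted = (features[200:] @ w + b > 0).astype(int)
accuracy = (predicted == labels[200:]).mean()
```

Real EEG would need artifact removal and spatial filtering before such features become usable; the sketch only shows how the stages fit together.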
When I got interested in this, and this was about 22 years ago, the state of the art was that the subjects had to be trained for about 100 to 300 hours, which means that people had to wear an EEG cap like that and then had to be trained with biofeedback in order to get out some decent signal. So we said: well, let's do it the other way round. Let the subjects think whatever they think, and let's just have the machines learn whatever needs to be learned. This reduced the necessary training time to about 5 minutes. Before this, there were maybe a dozen groups in the world that worked on this; now it is more than 400, or maybe even more, groups in the sciences and industry doing all sorts of things with the brain.
So what are the applications? Of course, one application is to help patients; this could be ALS patients, or it could also be patients with stroke, where we would enhance rehabilitation. But just to get a better idea of what can be done, this is an old video, but it is still valid. You have the subject here wearing the cap. The data goes into the amplifier, from the amplifier it goes to this laptop here, and the latter gives control signals to the screen and moves the cursor around. As a subject you have to assume different brain states, and if the decoding is done right, the subject can move the cursor to play a game, which is the game of Brain Pong. The subject is sitting still, not doing anything, not using the eyes, and is controlling the cursor by virtue of their brain signals. So in real time this is decoded.
Now, I was the subject, so you can ask me: what did you think about in order to get this done? First of all, I will teach a bit of physiology. If I wave my hand like this to you, then on my left hemisphere, over the motor cortex, there is some activation; in fact there is an idle rhythm that is suppressed. If I wave my left hand, then the idle rhythm is suppressed on the right side. Now the interesting thing, and this is not something that we invented but is known from physiology: if we only imagine that, without any waving, without any movement, we still have the same effect, the suppression of the idle rhythm over the contralateral motor cortex. This we can use for decoding purposes. So you see that in principle you can get 1 bit out, but as we know nowadays you can get many more bits out, depending on the speller paradigm. We had a demo on this at CeBIT which was already at about 7 letters a minute.
So this helps patients, but I think the interesting point is that we can use this device to understand what the brain is actually doing, because the brain is a machine that is thinking and behaving in real time. So we have a device that can decode these activities, and we can do interesting studies that help us to understand how the brain does that.
There is also something quite interesting, which is that we can use this in technology. You have already seen someone play a game, so you can imagine all sorts of interaction with the new channel that is provided by the brain and that you can decode in real time. The most unobvious one is what I would like to tell you about. Unfortunately Professor Kleiner is not here anymore, so I could complain to him; I mean, this is more of a joke. Some years ago, when this Excellence Initiative thing happened, the universities thought about strange projects as part of some excellence center, some, you know, moonshot. I have a colleague in Berlin, his name is Thomas Wiegand, and Thomas Wiegand is one of the fathers of H.264, which is the current video coding standard. Every second bit in the Internet is coded by it, so you are using it all the time without knowing.
So I suggested to Thomas: look at MP3, where people have studied very carefully what the ear can perceive. Why shouldn't we study with the brain-computer interface what we can perceive in video, and just code that? That was the idea, and we wrote this up, and sure enough the reviewers did not like it, but we did it anyway. It is now a line of research with a number of papers. The point is not to use a brain-computer interface to change something in your video codec; the point is to learn something about perception. And so we learned something about perception
and we could use this later. In particular, the HHI people have used this to an extent that they could improve the video coding. For example, there was a Champions League final broadcast by Sky, I think it was one and a half years ago or something like that, where the coding was being used with this new mindset. The interesting part is: you can say maybe this saves you a couple of percent, but if you think about a couple of percent on a global scale, because every second bit of the Internet is transmitted with H.264 or this new coding standard, it amounts to a couple of nuclear power plants in terms of energy consumption. So the fact that you can actually understand something about the brain, translate this into better coding, and do something for the planet may be a bit unobvious, and it was clearly beyond the scope of these referees. So the conclusion from this is that it does not matter what people say when they review stuff.
So, another part that I would like to show you is maybe equally funny but even more heretical, so please prepare for some heresy. So this was
in 2011. There is a very nice place on this planet which is called IPAM, the Institute for Pure and Applied Mathematics, at UCLA, and I was invited to participate in a program there for 3 months. My sabbatical was up anyway, and there is nothing wrong with going to California; you cannot do anything wrong. Coming there, I realized that this was a program about quantum chemistry, so not really machine learning, but I happen to have a history of being a theoretical physicist who got some training in quantum mechanics and did some strings and things like that in my past.
So I heard people talk about the Schrödinger equation. The Schrödinger equation is a wonderful PDE that you can see here, and it is a very complicated equation that cannot be solved other than in approximation, and people got Nobel Prizes for these approximations; it is called density functional theory. In fact, when I first heard DFT, I thought: what are they talking about? This is not the discrete Fourier transform that they are talking about. So much for this remark about a joint language that you have to develop.
OK, so when I sat in these talks, I thought: well, this is very nice, people come from first principles and do all the things from first principles, and at some point they make an approximation. So I thought, why not treat the Schrödinger equation as a black box and just treat this as a prediction problem? This is as if somebody takes, you know, Navier-Stokes and says: everything that goes in matters, everything that is the result matters, and in between I just predict with a neural network or whatnot. This is what I suggested to the people, and of course they were not amused; they were just not clear on how they would kill me, so I survived. The interesting point was that, as always, there were some young people who got infected. They were skeptical, but one or two were newly infected and said: well, let's try this idea, this crazy one.
What they did then was generate a lot of data with DFT. DFT for small molecules, in a reasonable approximation, costs about 5 hours of computing time per molecule. So next to the climate people, these are the guys that use all your computing time. They generated something like 7 thousand molecules with all the properties, and that was our training data. In fact, it was not all training data: we thought we take part of it, a thousand, as training data, build a prediction model from it, and see on the rest of the data, which we have not seen before, whether or not we can predict the outcome of the Schrödinger equation. OK, so this was 2011, and in 2012 our first Physical Review Letter appeared.
The way we did this was the following. First of all, in machine learning you need the input. So this is now a molecule, and you take the coordinates and nuclear charges of all the atoms. Then you say: well, I have to represent the similarity of two molecules somehow. A good way of representing the similarity within a molecule is to write down something which we called the Coulomb matrix. We take the Coulomb forces between the i-th and the j-th atom and write this into a matrix, so M_ij, the matrix element between the i-th and the j-th atom, is just given by the nuclear charges and the Coulomb law. So this is a matrix representation, not a vector, but maybe people have told you that in machine learning we can deal with vectors, we can deal with matrices, we can deal with tensors, we can deal with graphs, whatever you have. So this is now a matrix representation, and if we compare two molecules M and M', as represented here, we just take the Frobenius difference between them. And then we do something very simple; I insisted that we do something simple.
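A minimal sketch of the matrix representation described above follows. The off-diagonal entries use the usual Coulomb form Z_i Z_j / |R_i - R_j|; the diagonal value 0.5 Z_i^2.4 is not stated in the talk and is taken here as an assumption from the published Coulomb-matrix descriptor. The function names and the two three-atom geometries are made up for illustration and are not reference geometries.

```python
import numpy as np

def coulomb_matrix(charges, coords):
    """Coulomb-style matrix for one molecule.
    Off-diagonal: Z_i * Z_j / |R_i - R_j| (atomic units assumed).
    Diagonal: 0.5 * Z_i**2.4 (assumption, not taken from the talk)."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)       # placeholder, avoids division by zero
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M

def frobenius_distance(Ma, Mb):
    """Frobenius norm of the difference of two equally sized matrices."""
    return float(np.linalg.norm(Ma - Mb, ord="fro"))

# Two illustrative three-atom geometries (charges 8, 1, 1); the second one
# has one bond stretched from 1.8 to 2.2 length units.
charges = [8, 1, 1]
M_a = coulomb_matrix(charges, [[0.0, 0.0, 0.0], [1.8, 0.0, 0.0], [-0.45, 1.74, 0.0]])
M_b = coulomb_matrix(charges, [[0.0, 0.0, 0.0], [2.2, 0.0, 0.0], [-0.45, 1.74, 0.0]])
d_ab = frobenius_distance(M_a, M_b)
```

Comparing molecules with different numbers of atoms, or with differently ordered atoms, needs extra conventions such as padding and sorting, which this excerpt of the talk does not cover.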
We first put this into a Gaussian, so this is the kernel model, and then we do kernel ridge regression, something that is really the most stupid nonlinear model that you can think of, but a very powerful one. It can be solved in closed form, so computing time is negligible and prediction time is negligible.
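The closed-form solution mentioned above can be sketched as follows. This is a generic Gaussian kernel ridge regression on a one-dimensional toy problem, not the molecular model from the talk: the data, the kernel width `sigma`, and the regularization strength `lam` are assumptions chosen for illustration. The train/test split and the comparison against a mean predictor mirror the evaluation protocol described in the talk.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def krr_fit(X, y, sigma, lam):
    """Closed form: alpha = (K + lam * I)^{-1} y."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    """f(x) = sum_i alpha_i k(x, x_i)."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=200)

# Fit on 150 points, predict the 50 held-out points.
alpha = krr_fit(X[:150], y[:150], sigma=1.0, lam=1e-3)
pred = krr_predict(X[:150], alpha, X[150:], sigma=1.0)

mae = np.mean(np.abs(pred - y[150:]))                   # out-of-sample error
baseline = np.mean(np.abs(y[:150].mean() - y[150:]))    # mean predictor
```

Training is a single linear solve and prediction is one matrix-vector product, which is what makes the method cheap at this scale.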
haven't and was a 10 K come from as quality which is not good enough but not bad so if we take the mean of across all the molecules as a predictor this would be around 350 k come the to give difference so a mechanical approximation would be some they get a decent 1 with these 3 for OK because something then all a bit later we use the neural nets we were down at 3 come and then 2015 we already have 1 K and recently we had the 1 K color which is chemical accuracy this is out of sample so we're just taking the molecular properties and just predicting and of course we can also predict instead of energies we can predict authorizations and everything within you know less than a millisecond so let me before I
come back to this, and I will come back to this more, let me give you some perspective on machine learning. So if you have a machine learning model,
this is a highly nonlinear beast, for example a deep neural network with all these layers, very complex, very nonlinear. You put something inside, and the neural network classifies this input image into the respective class. So this is a classification problem where you have a couple of thousand classes, roosters and many other things. The neural network actually gives you the correct prediction, and you wonder why it gives you this prediction. There is a field in machine learning which is called feature selection. In feature selection you have your huge bulk of data and you ask the question which of the inputs are the most salient ones, and people use the lasso and similar techniques in order to find a few variables that are responsible. Now the interesting thing is: if you have a whole bulk of data, it is nice to know what is generally interesting, but if we think about medical diagnosis, say, you couldn't care less about the diagnosis of the ensemble. You would like to know what your individual diagnosis is, and you would like to know which individual variables the model thinks are important for this particular decision. The reason why people in the sciences in general have been working only with linear models is that there is a way back. In the linear model there is an obvious way back: if I have this linear classifier in space and I put a data point here, then I know that this direction is the one that is responsible. But if I have this very highly
nonlinear classifier, whatever it is, how do I get back to what the relevant things are? This was an unsolved problem that is solved now for any kind of nonlinear machine learning model, for neural networks, for kernel machines and so on. This was by Sebastian Bach et al., and I am also among these authors. The idea is the following: we take the classification and go backwards from the result of the classification to the input and make a heat map, which tells you which of the pixels, in this case, are the most salient ones for this particular decision. And you see, it is this thing here, mostly the head part of the rooster, that is considered the most salient for this particular case. So I will just
give you an idea about this. Mathematically it is very beautiful, because this is a very highly complex nonlinear thing, right? So you could do some kind of Taylor expansion. A single Taylor expansion is not so helpful, because it would be a global thing and you would have to go to many orders. We could show that if you take a Taylor expansion that is local, for every firing neuron, then it is easy: you get locally linear Taylor expansions that together do the global nonlinear thing. This is called deep Taylor decomposition, in this very popular paper, and in this manner you can show some mathematical properties and understand this problem. So let me just give you an idea. You have the picture that goes in here, like this ladybird thing, and then you have your neural network that classifies this as
a, sorry, not ladybird but ladybug. So this is considered a ladybug by the network. Now what we are doing is going backwards, and we call this relevance propagation. This node was the most relevant, and we propagate this relevance backwards. Basically you take all the activities of the network; you already have the weights, because the network has been trained; and if you want to know the relevance here, then you sum this up, appropriately normalized with the activity that you got from the forward pass. And you can give this a theoretical interpretation in terms of Taylor
expansion and so on. So you can do all this, and you can assume that relevance does not become more or less on the way back, so there is a relevance conservation property that you need in this. And then you see, OK, this is the part, this is mostly the body.
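The backward pass described above can be sketched for a tiny network. This is a simplified illustration with one common propagation rule (a stabilized ratio of each input's contribution to the pre-activation), assuming ReLU layers without biases and random weights; it is not claimed to be the exact rule or network used in the talk, and all names are hypothetical.

```python
import numpy as np

def lrp_dense(a, W, R_out, eps=1e-9):
    """Redistribute the relevance R_out of a dense layer onto its inputs.

    a:     input activations from the forward pass, shape (n_in,)
    W:     trained weights, shape (n_in, n_out)
    R_out: relevance of the layer outputs, shape (n_out,)
    """
    z = a @ W                                     # pre-activations of the forward pass
    z = z + eps * np.where(z >= 0.0, 1.0, -1.0)   # stabilizer, keeps z away from 0
    s = R_out / z                                 # relevance per unit of pre-activation
    return a * (W @ s)                            # share of each input in each output

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))                      # stand-in for trained weights
W2 = rng.normal(size=(6, 3))
x = rng.uniform(0.1, 1.0, size=4)                 # stand-in for the input pixels

# Forward pass through a two-layer ReLU network without biases.
h = np.maximum(0.0, x @ W1)
out = h @ W2

# Start from the winning class and propagate its score backwards.
k = int(np.argmax(out))
R_out = np.zeros_like(out)
R_out[k] = out[k]
R_h = lrp_dense(h, W2, R_out)
R_x = lrp_dense(x, W1, R_h)                       # one relevance value per input
```

Summing `R_x` gives back, up to the stabilizer, the score `out[k]` that was propagated, which is the conservation property mentioned in the talk. Arranged in the shape of the input image, `R_x` would be the heat map.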
Here we have pictures of spiders and cats. And you see, of course, cats come in front of different backgrounds and with different shapes and breeds and everything. So all the feature selection approaches would be completely nonsensical, because they would say a cat is in the middle of the picture; they would not say what the cat-like thing here is. Now this is also an
interesting one. People in machine learning, and I just told you this, are obsessed with the generalization error: they want to minimize the error on unseen data. So assume that you take two models. In this case this one is a deep neural network, not even trained by us but by others, and this one is the Fisher vector model, which is a very popular computer vision model. We put this image inside and we get heat maps out. Now if we look at the out-of-sample performance of both models, they are the same, OK? Both generalize equally well, but they seem to solve the problem differently: whereas this beast here looks at the horses, this one looks at the lower left corner. And if you look at the
lower left corner, and I read it to you, then there is this tag with some text. So this is a public database of 20 million pictures; nobody has ever looked at them, but they are used as a benchmark for measuring the generalization error. All of these models do a great job, but they solve the problem differently. So if we are using machine learning models in the sciences or engineering, we had better know why they behave the way they behave and why they are able to do what they do. This is extremely important, because if we think about physics or chemistry or engineering, we cannot afford to have this. Yes, this is intelligent behaviour: we are looking at tags, right, and it just happens that the horse class had this tag and nobody noticed. That is intelligent behaviour, but not exactly what we have in mind for a model. So
understanding is a key concept that we need if we are using these kinds of models and if we are engaging in modeling. Now I come back to this physics part. This is a recent paper. I have to say, I am taking a minute or two of your coffee break, so I will be quick. OK, so these are the guys that
worked on all this. The paper appeared on the 9th of January in Nature Communications. We take here a deep tensor neural network and learn some atomistic representation. Same thing: Schrödinger equation again. So basically, again,
the molecules are transformed into some features, and now this is a bit more complex. So
we are now trying to use a neural network. Before, we had this kernel that basically compared molecules by using differences between Coulomb matrices. Now we would like to have something like a word2vec vector, which is the representation in language analysis where you put language context, which is something symbolic, into a vector embedding. In a sense we would like to understand what the local atomic properties within the molecule are and how they can be represented somehow as a vector, and we learn this. So this is a deep neural network: for every atom we have this kind of representation, where you see these equations, and for every atom we get the energy contribution, and then we sum it all up. Of course we can also have other contributions like polarization and so on, so we can use this model for that as well. We tried to think about the underlying chemistry and use the local interaction profiles that molecules have, and implement that in our model. So there is this interesting feedback loop, which means that in fact this innocent-looking network is a network where you first model the local context of one atom, then in the next step you feed this into the local context of two-atom interactions, and then into three-atom interactions and so on, and you share the weights.
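The structure described here (a learned vector per atom, repeated refinement through interactions with shared weights, and a sum of per-atom energy contributions) can be illustrated with a deliberately simplified sketch. It is not the deep tensor neural network of the paper: the Gaussian distance weighting, the `tanh` update, the random untrained weights and all names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                              # length of each atom vector (assumed)

# Hypothetical "learned" parameters, random here because nothing is trained.
embed = {z: rng.normal(size=D) for z in (1, 6, 8)}  # one start vector per element
W_int = rng.normal(scale=0.1, size=(D, D))          # interaction weights, shared by all passes
w_out = rng.normal(size=D)                          # readout: atom vector -> energy contribution

def total_energy(charges, coords, n_passes=3):
    """Sum of per-atom energy contributions after n_passes interaction passes."""
    R = np.asarray(coords, dtype=float)
    c = np.stack([embed[z] for z in charges])       # (n_atoms, D) atom vectors
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    g = np.exp(-dist ** 2)                          # nearby atoms interact more strongly
    np.fill_diagonal(g, 0.0)                        # no self-interaction
    for _ in range(n_passes):                       # the same W_int is reused in every pass
        c = c + g @ np.tanh(c @ W_int)              # refine each atom with its neighbours
    e_atom = c @ w_out                              # one contribution per atom
    return e_atom.sum(), e_atom

# Toy water-like geometry (assumption), and the same molecule with atoms reordered.
charges = [8, 1, 1]
coords = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
E, e_atom = total_energy(charges, coords)
E_perm, _ = total_energy([1, 8, 1], [coords[1], coords[0], coords[2]])
```

Because every atom is treated by the same weights and the contributions are summed, the predicted total does not depend on the order in which the atoms are listed, which is one reason the per-atom decomposition is convenient.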
So this is if you roll it out, OK, and there are some dots here. Now if you do this and you go
across chemical compound space, which means that you take data on some molecules, train the model, and then take other data that you haven't seen before, then you are below 1 kcal/mol. If you take just the atomic neighborhoods, it is not enough; if you take pairwise interactions, and interactions between pairs further along the graph, it becomes better. You can do the same game for
molecular dynamics, quantum-mechanically accurate molecular dynamics. There, of course, you need much better accuracies, so we are way below 0.1 kcal/mol. We simulate some molecules like this one here, and then we are very, very close to the true quantum mechanical molecular dynamics simulation. Now this is all
good. So this was the best model available that, with the same architecture, could go across chemical compound space and, for one single molecule, could predict molecular dynamics behaviour. OK, so now you have this thing, and everybody is thinking: oh, this neural network is a black box. But it is not black any more, so we can start looking at what this thing implemented and what it has learned about chemistry and physics. We can now look at, for example, a bunch of molecules, and we can see where an H atom would bind, where a C atom would bind, and so on. We can look at molecular properties like aromaticity that are not in the chemical databases in this form. I will try to explain this. In the chemical databases we find the energies and all this stuff about the molecules, but we do not, for example, have the information, if we take all the molecules that have a benzene ring with some groups attached, which of these many molecules has the most stable benzene ring, which one is the most aromatic. This is not in the database, and we can infer that. So that is a new tool that has learned some chemistry. Of course, I am being very short in this explanation, and I am not the physicist among those who have been contributing to this, so I am saying it the way I have understood it from the computer science perspective. But I think this is a very good starting point. In fact, it is a field that is developing strongly. I think already the year before, so this was 2015, in the U.S. alone there were five workshops on this topic. It is good, substantial growth that we experience, and if we think about it, we have some techniques that are blazing fast where we can learn something.
The point is that we need to put together insights from physics, chemistry and computer science. It's a hard task, but it's very great and rewarding. So I'm coming to the
conclusion. Machine learning is a driving technology in big data; it is one of the two driving technologies. Database management is one, machine learning is the other, and together they make big data. This is the whole philosophy of the Berlin Big Data Center. If we use it for neuroscience, we can do brain-computer interfacing, decoding brain states, diagnoses and what not. In chemistry we can be a million times faster at high accuracy. There is some hope for materials, because I played with molecules, but I also played with materials, with a Max Planck Institute group, searching for superconductors. But that is the beginning, and we are much better with molecules than with materials. The general question is how we can get a better understanding. It is not only about prediction; we need to understand things. Of course, in the case of industry it doesn't matter, we just predict, because a better prediction gives us more income, right? But if we want to understand science, then we shouldn't have horses in this. I think there are a lot of open questions, and one of the open questions, which we also tackle in Berlin, is how to bring this very nice data-driven modeling technique together with the mathematical modeling world. I have given my talk from this perspective just to be a bit provocative here, because you do the other part. In principle one could say: well, the only thing that we need is data, and then we learn it, no need for models. But this is nonsense, because at some point we need to go back and have models, because we need to have some understanding. This path is not well researched, and I think it needs to be researched. Thank you.

