An introduction to inverse problems with applications in machine learning
Formal Metadata
Title 
An introduction to inverse problems with applications in machine learning

Title of Series  
Author 

Contributors 

License 
CC Attribution  NonCommercial  NoDerivatives 3.0 Germany:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2017

Language 
English

Production Year 
2017

Production Place 
Hannover

Content Metadata
Subject Area  
Abstract 
The presentation starts with some motivating examples of Inverse Problems before introducing the general setting. We shortly review the most common regularization approaches (Tikhonov, iteration methods) and sketch some recent developments in sparsity and machine learning. Sparsity refers to additional expert information on the desired reconstruction, namely, that is has a finite expension in some predefined basis or frame. In machine learning we focus on 'multi colored' inverse problems, where part of the application can be formulated by a strict analytical framework but some part of the problem needs to modeled by a data driven approach. Those combined problems can be created by data driven linear low rank approximations or more general black box models. In particular we review deep learning approaches to inverse problems. Finally, machine learning techniques by themselves are often inverse problems. We highlight basis learning techniques and applications to hyperspectral image analysis.

00:00
Point (geometry)
Military base
Heat transfer
Cartesian coordinate system
Theory
Type theory
Mathematics
Collaborative software
Computational science
Theorem
Figurate number
Series (mathematics)
Reading (process)
01:05
State observer
Linear equation
State of matter
Combinational logic
Mereology
Perspective (visual)
Medical imaging
Mathematics
Machine learning
Different (Kate Ryan album)
Matrix (mathematics)
Mathematician
Error message
Descriptive statistics
Physical system
Social class
Digitizing
Moment (mathematics)
Bit
Measurement
Vector space
Fehlerschranke
Right angle
Summierbarkeit
Quicksort
Sinc function
Slide rule
Perfect group
Computer file
Transformation (genetics)
Diagonal
Connectivity (graph theory)
Virtual machine
Distance
Field (computer science)
Theory
Goodness of fit
Term (mathematics)
Operator (mathematics)
Energy level
Task (computing)
Condition number
Noise (electronics)
Dependent and independent variables
Approximation
Inverse problem
Personal digital assistant
Universe (mathematics)
Numerical analysis
06:00
Point (geometry)
Slide rule
Dataflow
Theory of relativity
Multiplication sign
Distance
Cartesian coordinate system
Measurement
Inverse problem
Profil (magazine)
Different (Kate Ryan album)
Speech synthesis
output
Nichtlineares Gleichungssystem
Quicksort
Determinant
Physical system
07:41
Game controller
Transportation theory (mathematics)
Multiplication sign
Virtual machine
Protein
Vibration
Field (computer science)
Attribute grammar
Power (physics)
Centralizer and normalizer
Different (Kate Ryan album)
Operator (mathematics)
Software cracking
Differential equation
Data structure
Physical system
Rotation
Standard deviation
Expression
Staff (military)
Software maintenance
Cartesian coordinate system
Measurement
Performance appraisal
Process (computing)
Inverse problem
Order (biology)
Universe (mathematics)
Video game
Iteration
Pattern language
Musical ensemble
Quicksort
10:32
Discrete group
Complex (psychology)
Context awareness
Range (statistics)
Numbering scheme
Mathematical model
Dimensional analysis
Sign (mathematics)
Different (Kate Ryan album)
Core dump
Office suite
Error message
Information security
Physical system
Constraint (mathematics)
Mapping
Software developer
Infinity
Sound effect
Bit
Maxima and minima
Funktionalanalysis
Mengenfunktion
Category of being
Process (computing)
Duality (mathematics)
Physical quantity
Quicksort
Spacetime
Point (geometry)
Slide rule
Diagonal
Civil engineering
Canonical ensemble
Regular graph
Element (mathematics)
Product (business)
Kritischer Punkt <Mathematik>
Profil (magazine)
Term (mathematics)
Energy level
Data structure
Form (programming)
Standard deviation
Information
Key (cryptography)
Expert system
Line (geometry)
Cartesian coordinate system
Vector potential
General relativity
Word
Uniform resource locator
Beschränktheit <Mathematik>
Numerical analysis
Iteration
Game theory
Discrepancy theory
Funktionenalgebra
Library (computing)
Gradient descent
State of matter
Direction (geometry)
Multiplication sign
1 (number)
Set (mathematics)
Parameter (computer programming)
Function (mathematics)
Replication (computing)
Mereology
Food energy
Subset
Korrekt gestelltes Problem
Mathematics
Strategy game
Matrix (mathematics)
Control theory
Endliche Modelltheorie
Position operator
Stability theory
OverlayNetz
Algorithm
Sigmaalgebra
Moment (mathematics)
Physicalism
Thermal expansion
Measurement
Flow separation
Type theory
Vector space
Linearization
Normal (geometry)
Selforganization
Parametrische Erregung
Linear map
Resultant
Sinc function
Row (database)
Finitismus
Game controller
Existence
Functional (mathematics)
Divisor
Observational error
Heat transfer
Distance
Theory
Field (computer science)
Natural number
Operator (mathematics)
Analytic continuation
Mathematical optimization
Addition
Noise (electronics)
Multiplication
Mathematical analysis
Analytic set
Variance
Inverse problem
Object (grammar)
Spectrum (functional analysis)
25:04
Game controller
Statistics
Software developer
Image processing
Physicalism
Cartesian coordinate system
Mereology
Theory
Field (computer science)
Inference
Sparse matrix
Term (mathematics)
Factory (trading post)
Representation (politics)
Software testing
Iteration
Data structure
Coefficient
Discrepancy theory
Resultant
Gradient descent
26:53
Group action
Interpolation
State of matter
Decision theory
Multiplication sign
Source code
Numbering scheme
Mereology
Theory
Mathematics
Term (mathematics)
Differential equation
Nichtlineares Gleichungssystem
Renewal theory
Position operator
Physical system
Collaborationism
Schmelze <Betrieb>
Bit
Cartesian coordinate system
Frame problem
Sparse matrix
Process (computing)
Universe (mathematics)
Iteration
Object (grammar)
Quicksort
28:34
Point (geometry)
Interpolation
Functional (mathematics)
Multiplication sign
Direction (geometry)
Source code
Computer
Proper map
Theory
Field (computer science)
Medical imaging
Term (mathematics)
Gastropod shell
Theorem
Contrast (vision)
Endliche Modelltheorie
Genetic programming
Hydraulic jump
Metropolitan area network
Addition
Projective plane
Analytic set
Cartesian coordinate system
Limit (category theory)
Sequence
Sphere
Approximation
Proof theory
Type theory
Sparse matrix
Word
Software
Vector field
Logic
Interpreter (computing)
Codec
Funktionenalgebra
Bounded variation
Resultant
Spacetime
31:29
Sensitivity analysis
Differential (mechanical device)
Combinational logic
Set (mathematics)
Shape (magazine)
Function (mathematics)
Mereology
Dimensional analysis
Derivation (linguistics)
Different (Kate Ryan album)
Cuboid
Endliche Modelltheorie
Physical system
Algorithm
Mapping
Instance (computer science)
Variable (mathematics)
Sequence
Vector space
Linearization
output
Website
Ranking
Quicksort
Linear map
Resultant
Server (computing)
Functional (mathematics)
Virtual machine
Black box
Student's ttest
Graph coloring
Theory
Wave packet
Term (mathematics)
String (computer science)
Data structure
Form (programming)
Projective plane
Mathematical analysis
Basis <Mathematik>
Mortality rate
Approximation
Word
Kernel (computing)
Software
Nonlinear system
Digitale Videotechnik
Numerical analysis
Iteration
Routing
Gradient descent
35:37
Dataflow
Direction (geometry)
Characteristic polynomial
1 (number)
Set (mathematics)
Theory
Computer programming
Subset
Wave packet
Expected value
Latent heat
Matrix (mathematics)
Alpha (investment)
Predictability
Projective plane
Constructor (objectoriented programming)
Variance
Xray computed tomography
10 (number)
Process (computing)
Software
output
Video game
Resultant
Tuple
Spacetime
37:36
Point (geometry)
Medical imaging
ACID
Numerical analysis
Port scanner
Complete information
Lie group
Cartesian coordinate system
Graph coloring
38:24
Virtual machine
Basis <Mathematik>
Regular graph
Approximation
Process (computing)
Vector space
Personal digital assistant
Different (Kate Ryan album)
Term (mathematics)
Orthogonality
Program slicing
Ranking
Series (mathematics)
Spectrum (functional analysis)
39:26
Sparse matrix
Standard deviation
Algorithm
Process (computing)
Helmholtz decomposition
Different (Kate Ryan album)
Orthogonality
Interpreter (computing)
Median
Basis <Mathematik>
Data structure
Resultant
40:05
Slide rule
Complex (psychology)
Software testing
Iteration
Graph coloring
Resultant
40:41
Observational study
Software
Personal digital assistant
Pattern language
Spectrum (functional analysis)
41:45
Type theory
Mathematics
Observational study
Inverse problem
Forcing (mathematics)
Virtual machine
Computer science
Numbering scheme
Right angle
Computer programming
00:00
figure out material was great pleasure to accept this invitation as he lots of familiar faces in the in the audience through the back more less to sit this type of groupware thing 10 years ago in a similar situation so I'm the center NASA mathematics so we are rooted in in
00:17
mathematics but we spend our activities the scientific computing I would say half the people are what we call a doing mathematics reading proving theorems and the people are using mathematics and scientific computing so the user bases of the mathematical theory in scientific computing and also in transfer so our center has the obligation to actually show the impact of our series also outside the academic girls this is written in our of constitution every 5 years we have to defend our budget and then there is always a big point how well they do in real industrial application so it's always difficult with such a mixed audience whom to address should material Alex indeed lower should address the librarian's amongst you were so I decided that I make half of the talk as the introduction on a very basic
01:07
sort of basic motivation level and then the 2nd half a just talk about what interests me at the moment so to get their research overview of what we're doing and I try to put this into perspective of the general interest in in the field of the universe and inverse problems in some sense it's a followup of what's what's the present the talked about yesterday since part of the talk will be on machine learning which has 2 sides 1 side is that it helps us to solve inverse problems gives little different perspective on the well known problems the class the universe problems and on the other hand many machine learning task I actually inverse problems and we can apply all theory to improve some parts of machine learning has a topic by itself Cynthia mathematicians in the room we started the very simple example many 2 by 2 matrix this is overly simplified to those who know inverse problems already and so I apologize for those who are rooted in the mathematics theory but it gives a flavor of what's really going on so you take a 2 by 2 matrix of and the colorful what problem is that we take a vector 1 long and compute its image under this operator if you like to say so often this matrix of this transform offers system description given by a linear 2 by 2 system and I think all able to check that this is hopefully the correct answer right and mechanistic and the inverse problem is that you're given the the matrix a and the right side y and you're asked to solve the linear equation y equals a x and this is given by the inverse applied to to Y at least in the twodimensional case is possible and then there's a little example but you can't just do issues say to the same problem with data are if less than 1 per cent error in the data and then we just cut off after the 2nd digit then this is rounded to 4 full well for should be for 2 years and then the inverse all of a sudden this minus 200 to 1 so even small change of less than 1 per cent in the data has enormous error in the In the reconstruction difficult applying the inverse reconstruction using the data why delta and those other the 2 typical features which we have to dress in all kinds of inverse problems namely that there always noisy measurements typically we tried to determine of certain observables X which cannot be measured directly but we observe it through some noisy measurements for the measurement matrix a and Y equals the measurements and the inverse is ill posed not only condition but also post which makes so and also give you the is the most common example how to remedy this on the 1st few slides since after the 1st 3 slides you probably half of the audience fades away and if you a little bit intimidated by the high number of false lights don't bother happy talk more than 2 minutes per slide so you can can check that this was not too long so how
04:06
to solve this is to stabilize this is simple way namely adding something on the diagonal if the matrix self is symmetric positive definite you can really get Chosun all fall on the diagonal and sold the system where you know that it does not give you the exact answer to accept that it solves a different problem but this should give you something which is more stable in terms of a sense that the total error can be controlled so the a is not symmetric then you just multiply both sides with the transposed matrix and now we get a system which is symmetric positive definite semidefinite and then you can apply the same trick shifting the respectable fall by adding all from the diagonal and then to solve the system of the perturbed perturbed matrix then to get this to picture but if you look at the error the distance between the difference between what you reconstruct using the the it change system with this all fired on the diagonal with the true solution this bit bearer he do you do with as if you would do the reconstruction perfect data about that this approximation excel file and you split up the 2nd part of the error which is the approximation error coming just from the difference was 0 noise but doing to reconstructions perfected don't recognize to get to error components namely the approximation error this gets smaller and smaller and smaller evolves like good sound is 0 if of noise state I would be perfectly choose all thinking 0 otherwise the data are the inverse of the data are my clothes and you want to control the sum of those 2 so you want to control the combination so this is a typical feature of inverse problems you do not try to make a perfect reconstruction you have to sacrifice some of the accuracy for and defined a perfect matches the task of response that they give you a 2nd example
06:04
which is slightly more and involving but still can be explained with 1 or 2 slides so this was actually done by by Kaplan many many years ago it serves the distances by some of the planets from from the Earth but he didn't want to have the distance he will give the path of the planet you want to determine the speed of the planet and the relation between measuring the distance and trying to determine the speech if she integrates at time 0 if a certain distance from your measurement point at 0 and you have to the speed profile F of towel for time tau and if you integrate the speed to get the distance so that's are a basic equation which comes from from my school and the quest is now you measured she and you want to interfere about so probably more modern application will
06:52
be trying to determine highway and everybody knows if you drive this constant speed you this is on the shown increases linearly and if you accelerate as much as it can whenever there's little gap in the and the flow and then they have to break down the river speed profile which is pumping up and down but the distance to cover is almost the same so now we see that this is also a typical or to be all set up that large differences in the input of the system difference speed profiles have been very different generate only various minor differences in the so the former problem this sort of thing we just integrate that's fine but the inverse problem you have to look into small differences to interfere large differences in the input data and this causes the instability and this will come from the from the will add to the complications of
07:44
inverse problems so I did choose this inverse problem which we had when we're still important develop Olson curler this was the use of all sorts are engines in and out of it and they had a very nice inverse problems namely what they typically do they have to make the maintenance staff to check the engines and they have to check whether for example central shaft has little bands cracks in it developed over time and what they do is they play some acceleration sensors on the outside and the let the machine wrong at certain rotation speeds and the measure the vibrations and from the measurements of the appraisal outside the try to interview the the internal structure of the shop and that's recognizing the problem and it's clear that is from the transportation a system is then it's meant to be then things and see if you have a vibration of the engine not elect Kaplan should be migration migrating so even if you have sort of comparatively large deviations from the mobile and the central shaft the measurements might not show this so strong and this attribute example which is given by by a complicated structure but it was sound to the classical mechanical iteration of of power and resources so forces some operations which is 2nd order differential equation OK and there's an overview and if
09:10
you look at our Dussel implications over the last 20 years almost all of them I inverse problems there's hardly ever a direct problems whenever you do there's hardly any quantity of interest which can be measured directly even if you look at the temperature if you have those little instruments what to show what you see is not temperature you see the elongation of this column off some something which which expands on the heat so you never measure anything more honest directly and this Beverly applies where's our main field application is quality control whenever you have a technical process you never measure directly what causes the the defects you always indirect measurements which also applies to to life sciences where the difference between being else and all tells the shows in the proteomic pattern but you never measure directly that cancer but you measure a protein which has a high expression as compared to the old have with status so I think you really inverse problems are the everywhere and if you think about its maybe in those problems even if you don't treat it as an hour of problems hidden inside it is an inverse problem and it will expand if it if you would apply some thinking about the problem from the viewpoint of the universe we construct of OK so this is our our topics
10:34
and now I would like to go with a bit more about the mathematical models on the same basic structure so as a century after always that the search for an unknown f which you can measure directly in a is transfer operator the passion matrix this the matrix whatever you have in your justification in your mind and the 2 different roles you can model this this greedily you as a matrix vector or you can do this in the continuous setting function spaces and I very much prefer the function space setting for the reason being modest lazy if you think about this could problems you have to combine the problems coming from the operator with assembling questions is a discretization model correctly did you have a good scheme for doing the numerics and whenever I get a result which is says something off the old off you have to worry about the constant from really is bounded independently of the dimension of discretization library much prefer that the function space setting where we look at operators believing function spaces and then leave the discretization through to the 2nd 2nd step and I think it gives a much bigger picture it's smiling and and gives you some insight on the core problem which is not and how to say disguised mixed up with some some additional problem but others come back to the the problem of trying and the German lives with different speed profiles and maybe the police just measures the distance doesn't measure the directly via the speed but it measures the distance from the from the laser cannon really have position variance instruments and lost instrument all this has a measurement error and even if the measurement error is only 1 per cent the the overlay a profile with small deviations with error you can not see the difference anymore through the so typically the error is much stronger influencing via the data and then the deviation to try to detect so there's a big problem what can you do in this situation how should you be able to do any meaningful reconstruction if the error is stronger than the difference you want to look at and this comes from by using some side information and we want to explain this in the next slides so the method that the model you're looking at is that we have this operator this mapping from the space of entities for their properties which should try to determine to the measurement space were actually the data and we could say the solution sets all parametres which explain the data within the given the accuracy of the measurement device so you say you look at the set of all paths subset of f minus a given data is bounded by the error level that you have to accept due to physical reasons of measurement reasons or whatever is the the course of your of your deviation from the true through measurement and the problem is illconditioned illposed problems is that this but set is unbounded so even if you are from the data side so you know the data lies close to the true data if you look to the potential elements in X which are mapped by to this set it's an unbounded set and still to try to do is you have 1 particular element in this small ball in Y and you somehow have have to 6 1 elements on the other side as being close to the true solution and there may be many problems 1 is that the set is unbounded but also that there might be multiple solutions nobody says that a has to be injective and it can be have multiple solutions to everything is combined in saying that the set of potential solutions is unbounded and the crystals what should you do to to do this and that's the topic of factorizations theory which you country month so the remains of regularizing this and really the most important 1 is standing and all from the diagonal which is called taken a functional applications if you extended to more general it generalizes to more complex settings and iteration methods and the purely analytic likability expansion the thing is of less interest for replication so the 1st idea is to reform relates solving a the equals the delta as a minimization problem instead of saying AF should be equal to T delta and tried to minimize the residual if you go for it you solution this is stephanie 0 but as we said to be sacrificed security for stability and we say we don't want this to be 0 but will just on that have a small and on the same it's on the same arguments we would like to to avoid this explosion of the reconstruction evil this 2 by 2 example where all of a sudden the small change in the DataCite cost the true solution to explode to minus 1 and 200 real white this by saying we penalize f to be too long and therefore you small civilization term also has to be chosen appropriately that's a different topic which we see in a moment and this means if you do the analysis property that's the normal equations of of all iterations then exactly do would be what he explained before use the transpose a or a at showing the blights after a at something on the diagonal and that's what sold with the data which has been transformed to the proper space but by the transpose of and this has the great effects at minus 1 is unbounded absolutely enormous in finite range of various stop found it's not the full space Kennedy applied to any data you have in mind this all is remedied by is shifting the spectrum by all 5 or by using a a possible 5 and this is well bounded linear operator which is part it and hence can be controlled what I and now we have the problem shifted a little bit that we need to determine the optimal all from the different strategies to do heuristically very little on theoretically but the jails general or the structures that the face to large then you penalize F to strongly it so it will be too small giving it to a small solutions if is too small on the other hand them this instability occurs and the norm of the solution so will explode to have to be really careful typically you just solve it for many many all fast and you have to develop a 2nd criterion for choosing the optimal of and this is really problemdependent applicationdependent and there's very little of useful from the channels you OK this can be it can be it can be generalized we nodes that only look at the at L 2 norm so we can have rather channel distance norms frequent distances war directions and so on properties and we can have more general penalty terms but the general theory is still a valid that if you don't regularizing don't steep stabilised by a penalty term that get in stable to large reconstructions and you can encode some expert information in designing a penalty term
17:50
saying we want if the norm of the of reconstructed small is just 1 piece of expert information may be saying that nature is lazy wants to do everything the least energy so if solution which does something of the small energy in the form of small L two norm then the preferred but this can be different types of expert information which can be encoded in the penalty term as much as you like can also encodes nonnegativity constraints incomes and context relaxation Properties problems as well and 1 thing we come to the difference to control theory in a minute this that's what on the theoretical side interested in is when we talk about convergence is typically a different convergence as when you talk to people in America analysis Ward optimization for us to be critical property the critical versus what happens if the data are those 0 to really approach the true solution if measurements get better and better so not and infinity no discretisation no iteration scheme but the problem the convergence property we are concerned officers what happens if silicosis 0 and this is the key question the the dressing on the theoretical side and then of course we need also numerical schemes for computing the minimizer of those and every Borel from from from control theory optimization so this is a typical result in results but my thing of course is a to of mathematics is Stephanie true but maybe it's it's useless in reality but it's not it really helps to to choose the regularization parameter correctly it also tells you if you know more about the Solution in terms of regularity of the solution is 2 times a frangible instead of 3 times the how should you change your organization it also tells you how you should choose the record the year the discretization so that the discretization error is not interfering with the accuracy of the analytic results so this is really a theoretical result if you know how to interpret traded it gives you a lot of information and it's definitely useful for applications and there are lots of 1 consequences coming from so let me highlight to papers in the talk of the height of 2 papers which we have changed the game of the field and models this people buying our coordination and about 189 where they transferred everything from the linear theory to loneliness theory and transfers several words since they use complete and it is completely new concepts in terms of a linear function analysis which were not known in the few before and they developed the full theory 1 paper and this was the starting point which I think all of us going for next week for the next decade from 1890 2001 as that's developing theories from all linear was the key and again the algorithms are surprisingly simple and at the end but you need to find the correct answer to it you just take the discrepancy term of critical functional and domestic trade in the sense which 1 nonlinear operators supplies iterative from the option of iterative make a step size control and then you make a gradient descent for the discrepancy terms and if the stop accordingly what choose of regularization for the pipeline that's fine before for fertilization purposes but the the says it's fine to do this in a more general case so that it's a few words about control theory and optimization as opposing most problems both feel the same topics that have discovers interspersed tend to transpose the minimization of functionals as 1 of the really most important core problems in both fields so it looks superficially it as it would be identical but if you go to the details there some some major differences In this problems we always say there is a real unknown physical quantity we try to determine like this a speed profile the cars driving the distance be profile there's is a physical quantity which is there we just don't have direct access to its that's why the measure the data you don't in control problems delete have a piece of ancient process which you want to drive to a certain and the states maybe production line that should produce certain quantities UTC cyan the outputs as wishful thinking you think OK will degrade to have this kind of control of parametres control after which buys system to stage which I desire that is by no means clear whether this is possible whether a allows you to find a controls such you those subset reached the it's completely different question as much harder in this respect on the control theory side Texas is clear problems spots controllability and existence this is much harder for the for the control problems on the other hand our major interest is how to control the measurement noise and since G was the science as a desirable it's artificial the exact there's no problem with noise so if you look at the theory in the till they are rather rather different we talk about different converge since we're really have to go to the X space while while for for the control problem you're interested outputs how well you were close to the desired state is really a different type of theory but when it comes to the numerics it's more or less the same you minimize functional and is among major development over the last 10 15 years that's lots of the very finds theory from from control theory was shifted those problems that's what my field there are many people who have much much better that's for example dual thinking that using in dual problems so we take use duality theory to explain the effects of inverse problems and also incorporate them like you reversible for for the signing very efficient algorithms there was a huge impact from control theory on the inverse problem so it's a much stronger than the than the other way around and there's some experts which are much more much of the FIL so I think I've used up know about soccer have 15 minutes OK now let's talk about 1 of the other topics rather quickly so again we have this problem that we don't know which element we should choose on the reconstruction side and typically a very smooth objects in this set of potential solutions which explain the data and the given accuracy and raped ones which already much localized and the question is which 1 you prefer and this comes from the application so if you say the norm has to be small typically this gives a smooth solutions but if you say defects are localized you would probably rather say OK if they they tell me the central shaft is sort of spend it all over non or I will this year as location of the defect there is only very few spots and I would probably prefer this solution and the question is how can I do my my algorithm to jump to this element in the the dataset and of this 1 and this actually is what
25:07
applies to most applications physical structures are sparse the deviation from healthy to to Cyc is encoded not in the full proteomic landscape on an array proteins the defects are localized so sparsity as it is called as a strong concept and the question is how can we control our algorithm into
25:27
sears and sparsity just means that if you have appropriate representation you can represent everything with the very few coefficients not knowing how many but with you and let me jump to run 1 problem this was introduced and this is the 2nd paper which change the field in 2004 I think she wrote this paper Bishop knowledge methods from image processing or statistics which were around there and developed the theory for you most problems with L 1 penalty terms articulation is p equals 1 that's what was called the the sparsity penalty term and she developed a nice theory and wonderful post get test there was an inference and to think it over she those years he came back in 2005 and since then for lasting years this was our main interests to develop sparsity theory which in fact is very similar still too the previous you do this gradient descent you can even include a stepsize control here so this is a gradient ascent of discrepancy term and then the interrelated with a shrinkage term which just throws away so small coefficients so it's a very nice iteration is spend it comes to to the other things it's easy to explain its gradient descent + French and this then developed a huge future factory of results we were 1 of the people who extended to nonlinear more problems it's still going on there some from reasons solves it's part is smallest federated the Austin
26:54
group before much out so did a lot on the theoretical side applications are still still of interest and fast wasn't so still a problem I think back and it would have a great so they don't have them and I think Figueiredo and nested ofyour having the quality the best iteration schemes for for solving those systems which come after the different names for the community so let me give you
27:19
1 application in a bit more detail so this was coming from the collaboration with the by cox from my work and you'll still at the end of the year university and then have her 1st position industry so this was from both those times when the moon was going from 25 frames per 2nd 200 frames per 2nd so you get 2 pictures and the question is just how doing asserts the frames in between there's some snowflakes flowing around she is shifting ahead a little bit the lady so if you would use interpolation everything would be smeared about and our approach was to do this by differential equation in we say part of the object as a transport equation no from flex transported price effusion like melting of the snowflake and part is the sole source for renewal objects occur and then we employ sparsity and sparsity zest that's it that's should every of those things should have been should have minimal support and some explain the state of the and new support means the other decides whether he he uses diffusion process here of low terms and it gives you a the mathematics of whites smearing out and sort of mediocre decisions sparsity it takes a strong position this is melting and this is the this is
28:36
from flowing and this is the creation of new project and this is a little sequence which will solve this so we inserted 10 images of this man closing the back of his car and look at the edge of the of the work and if you would use using the interpolation is would be smeared out but if you sparsity and the transport terms this meshes nicely transported with chocolate of America mostly you see is that the so contrast in the background jump since the it always boy sparsity in time and wants to do everything the source term chums at 1 time step from the background to make it so maybe this could be done better but in general this gives further results at the expense of terrible computation times so so it's really not useful if you want to use realtime applications but if you have time to
29:32
do with this type of some sequencing mystical interpretations yeah but don't let's come to the topic of what's Mr. Miller said yesterday and I think you really this is sick great point and I must say 20 years ago when you networks and so on simulated annealing genetic programming 1 definition of the local and addition you didn't work in this field for the reasons that people talk about the opinions rather about theorems that's not really a solid sphere really it's just say all my value is better than what is proof that 1 example better so it was really strange this topic of this has changed and that they say few words why the classical approach which was modeldriven is probably not not suitable for all applications naturally incomplete has has limits so it's absolutely clear but more importantly for Big Data applications running them the full model is too expensive in solving a PD a few hundred thousand times by be too complicated for automotive driving in the car no way so you need to have to introduce at the expense of accuracy the reduced model which gives a good approximation but my main point is that this function analytic theory has lots of drawbacks since it uses those function spaces and even for images we don't have a proper function space we don't know how to describe image so he's my assesses the directions of N H 3 what to vector field works well maybe maybe not be total variation is a good model but it's 17 not what the best model and then those models should apply for natural images as well as for a radio Logical Images they have rather different structures but before the same same same function so I think there isn't much finer model needed for really capturing the essence of certain subclasses which are not captured by the shell from here to the opposite this
31:31
datadriven approaches to throw away the model and just say of huge datasets where you know the mapping and then to construct a black box model so that's what's what sets yesterday and the 7 years and approaches which came up rather rather often in the last 10 years
31:50
and then I'll just give you 2 examples you can build a low rank approximation so we just look for some some data vectors on the 1 side which are the basis vectors and some output vectors and you construct a kernel which has a finite dimension and then use the projection on this basis as a function and this is called basis pursuit of basis learning approach machine learning you return the optimal basis for for your problem but what the people really how you can check it's try to find as much as came from very little they are either on the black box will slide or the white mold sites being being modeldriven data but why not use as much as it can the analytic model and only ended up with some correction for the parts which you don't want to model because this is computationally too expensive in which you is not covered by a physical model or for any other reasons and when I talked about was the last years and or works just coded multicolumn since you have a black box model many but not most of them aren't really black they have some color as we heard yesterday and assuring between living in between any use really both in advance it's neither white model or mortalism multicolor model and I think this is phrase very nice but we found out that even in wikipedia and there's a chapter about trade box models its modest exactly described in this and there 1 paper which you found which applies a string of problems trade models are well adapted for full models but there's very little and that they say what is the basis for all the analysis is the derivative so what you need to be able at 2 dif different shape the form what model with respect to x for designing gradient descent for doing then a sensitivity analysis for doing your characterization theory and this is the problem if you talk about networks no network servers had 5 minutes of so we have no networks are very simple structures this is a sequence of linear operations followed by nonlinearity so if you're right it's in a way it's just a combination of all linear maps and that's it so the differentiation is theoretically very nice but is also very nice in terms of algorithm whatever they do it's in training in your network they compute the derivative respect to the design variables but by the same way you can also compute the derivative of respect to the input our so did so did determined via the route to with respect to the input variables of the penalty functional instances simple so you can directly apply all trading descent approaches even for from undercover problems so that's the 1st concept I want to explain we just sort this about a year ago given to a 3 PhD students WordNet so we don't have too much results in a well we've just explained to you 2 more concepts coming from the next 1 is a is a great great concepts so this was Lacombe 1 of the big guys from institutional users who use that if you take this iterative soft shrinkage Ireland and that's exactly the structure often the red for you have linear operation followed by a nonlinearity if you fix the number of iterations that's exactly a Klevel will never so that's what he says we formalize everything as having nominees just combines the linear part systems who like and then be assessed instead of saying that this is composed from a words so use the
35:40
huge datasets to learn those matrices W and the matrix B and even the permit alpha and based on the specific data set you want to do of course iterates SoftRank is optimal in l 2 spaces respected you about your specific data set is only a subset of a certain characteristics and this is captured by the learning process where user data for and I would say much more elegantly than if you go to suppressing of problems of citizen of problems which is really a clumsy theories and they work they have to do from getting 1 more the variance instead of just the expectation values so horrible and he is really elegant and this really helps to give you better results and the experiments are nicely done and he's the done since programs for for training nor networks are all of the tens of flow for the honor of the ones and there was a paper by Michael answer which made me realize that angry that it's the approaches fine so the general approach would be computerized tomography take cytochrome useful protection get a reconstruction and said now takes the other constructions done by Philip the projection as inputs to an on your network and combine them to get a better predictions and they tested the meiosis and admire dataset and that it's better and that's not so surprising so used to propose specific network
37:07
and make me angry say that this at all we can apply to sparse data just 50 directions instead of 2 1 and get better results than for the black projection and that's a tuple mistake you can combine you compare your with a very bad them in the 1st and say it's better nobody would useful backprojections sparsely sampled sexrelated but this is OK I finally got which is no surprise that this general idea to use networks of life afterburner
37:36
for improvement is definitely good ideas and we just
37:39
test so last 3 minutes I will talk about what we have been working as an application on where we actually made it to a dozen applications was was mold imaging repetitious lies into the laser and the reverberated material is runs for mass spectrometry acid get every data point to get not 3 color images like RGB thousands of of of channels so it's a hyperspectral imaging applications those that 2 color channels we can can be taken as insulin and as a complete information disaster at the end you want have 1 number namely the probability that the person has scans or not and you create bill billions of numbers to find so the question is what you can do and the answer is pretty simple in the ever
38:28
slice or maybe 8 or 10 different metabolic processes running of thousands so you should be able to read presented data with the very few properly chosen basis vectors spectral vectors which represent the different metabolic processes and this is example exactly basis learning the machine
38:48
so we prefer according to 7 so we want to have k rank approximations of the kernel and so we add 2 more things namely we say about the 6 channels are nonnegative and pairwise orthogonal and this means that support has to be separate from the support of those differences case gives you clustering which was well known in the discrete case since 2008 and we said what happens if you if you apply this and the regularization series and this more general penalty terms we went of the year for but outside to prove what happens recognized spatially Korean cases sometimes if you 1 of all of a sudden you get
39:29
a median based on kmeans clustering and you're just to results that's what comes out of it if you use the standards sparsity basis for the for the year for the spectra and L 2 for the for the Chelsea gets nicely decomposition which you can perfectly interpretation as having different metabolic processes in which regions and if you apply TV yours is spatially Korean structure can completely different + orthogonality and sparsity much more easier to interpret takes picture and that's what we have published and some developed the
40:04
algorithms which is not so important to apply it's this
40:08
digital pathology so the standard approaches that the the pathologist takes additional slides makes some cancer biomarkers which us neutral with additional slice and that the colors the petitions slice where they expect that there there's some tumorous cells yes to make many tests to find out in a more complex iterations diagnosis and you want to replace this by digital staining test developed for use in this mass spectrometry and I can show you some
40:38
exclusive results we obtained so we do this what is routinely we have no some different datasets
40:44
more maybe those of the the bigger studies we have been analyzing this now things more than 800 patients if data from on the left hand side is what comes out from the standard 1 really pathologist s OK I know the biomarker I know which molecular weight this is having so we look at some some some spots in the spectrum and some some somewhat picked instead values compared to all American but like optimize schemes and as you can see it's not always better it's really not saying that the spectral patterns always beating it I if certain cases of sufficient advantage so that the people really
41:25
are starting to use and if build a spinoff company out of this and so true grows civil leading manufacturer of mass spectrometry equipment status and worldwide of Walt the instruments are using our our software which was our company until last month and then proposed audits completely so it's no regret company but it's a nice
41:46
story where of course the cost mathematics without them ever met metal never back background you would have never designed this type of classification classification schemes a contagion schemes about 80 % of the workers computer science programming adapting it to new machines with doing the study in basic right the courts mathematics and it would not have happened without its and its inverse problems Mr. driving in force but the
42:11
thing which of the future