
# An introduction to inverse problems with applications in machine learning

#### Automated media analysis

## The TIB|AV-Portal uses these automatic video analyses:

**Scene recognition** – **Shot Boundary Detection** segments the video based on visual features. A visual table of contents generated from this gives a quick overview of the video's content and provides precise access.

**Text recognition** – **Intelligent Character Recognition** captures, indexes and makes written language (for example, text on slides) searchable.

**Speech recognition** – **Speech to Text** records the spoken language in the video as a searchable transcript.

**Image recognition** – **Visual Concept Detection** indexes the moving image with subject-specific and interdisciplinary visual concepts (for example, landscape, facade detail, technical drawing, computer animation or lecture).

**Keyword assignment** – **Named Entity Recognition** describes the individual video segments with semantically linked subject terms. Synonyms or narrower terms of the entered search terms can thereby be included automatically, which broadens the result set.


Speech transcript

00:00

It was a great pleasure to accept this invitation. I see lots of familiar faces in the audience; I sat in a similar situation at this kind of workshop ten years ago. I am from the Center for Industrial Mathematics, so we are rooted in

00:17

mathematics, but we spread our activities across scientific computing. I would say half of our people are doing pure mathematics, proving theorems, and the other half are using mathematics in scientific computing, so there is a user base for the mathematical theory in scientific computing and also in transfer. Our center has the obligation to show the impact of our theories outside the academic world; this is written into our constitution. Every five years we have to defend our budget, and a big point is always how well we do in real industrial applications. With such a mixed audience it is always difficult to decide whom to address, for instance the librarians among you, so I decided to make half of the talk an introduction on a very basic

01:07

motivation level, and in the second half I will just talk about what interests me at the moment, a research overview of what we are doing, which I will try to put into the perspective of the general interest in the field of inverse problems. In some sense it is a follow-up of the talk presented yesterday, since part of my talk will be on machine learning, which has two sides: on one side it helps us to solve inverse problems and gives a different perspective on well-known classes of inverse problems, and on the other hand many machine learning tasks are actually inverse problems, and we can apply our theory to improve some parts of machine learning as a topic by itself.

Since there are mathematicians in the room, we start with a very simple example, a two-by-two matrix. This is overly simplified for those who already know inverse problems, and I apologize to those who are rooted in the mathematical theory, but it gives a flavor of what is really going on. You take a two-by-two matrix A, and the forward problem is that we take a vector x and compute its image under this operator: the matrix is the system description, given by a linear two-by-two system, and you are all able to check that the result is the correct answer. The inverse problem is that you are given the matrix A and the right-hand side y, and you are asked to solve the linear equation y = Ax; the solution is given by the inverse applied to y, which at least in the two-dimensional case is possible. And then there is a little exercise: do the same problem with data that carry less than one per cent error, say we just cut off after the second digit, and all of a sudden the inverse produces entries on the order of minus two hundred. So even a small change of less than one per cent in the data causes an enormous error in the reconstruction if you apply the inverse to the perturbed data y^delta.

These are the two typical features which we have to address in all kinds of inverse problems: there are always noisy measurements, since typically we try to determine some observable x which cannot be measured directly but is observed through noisy measurements y of the measurement matrix A, and the inverse is ill-posed, not only ill-conditioned, which makes it hard. I will also give you the most common remedy on the first few slides. And if you are a little bit intimidated by the high number of slides, don't worry, I rarely talk more than two minutes per slide, so you can check that this will not be too long.
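The slide's exact numbers are garbled in the transcript, but the effect is easy to reproduce with a hypothetical ill-conditioned two-by-two matrix (the matrix below is my own stand-in, not the one from the talk):

```python
import numpy as np

# Hypothetical ill-conditioned matrix: the two rows are nearly parallel.
A = np.array([[1.0, 1.0],
              [1.0, 1.01]])

x_true = np.array([1.0, 1.0])
y = A @ x_true                        # exact data: [2.0, 2.01]
x_exact = np.linalg.solve(A, y)       # recovers [1.0, 1.0]

# Perturb the data by less than one per cent (cut off the second digit).
y_delta = np.array([2.0, 2.0])
x_delta = np.linalg.solve(A, y_delta)  # jumps to [2.0, 0.0]

print(x_exact)   # ~ [1. 1.]
print(x_delta)   # ~ [2. 0.]
```

A half-per-cent change in the data produces a one-hundred-per-cent change in the reconstruction, which is exactly the amplification the speaker describes.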

04:06

So how do we solve this? The simplest way to stabilize it is adding something on the diagonal. If the matrix itself is symmetric positive definite, you can directly add a well-chosen alpha on the diagonal and solve that system. You know that it does not give you the exact answer, you accept that it solves a different problem, but it gives you something more stable, in the sense that the total error can be controlled. If A is not symmetric, then you just multiply both sides with the transposed matrix, and you get a system which is symmetric positive semi-definite; then you apply the same trick, shifting the spectrum by adding alpha on the diagonal, and solve the system with the perturbed matrix. If you look at the error, the difference between what you reconstruct using the shifted system and the true solution, you can split it into two components. The first is the approximation error, the error you would make if you did the reconstruction with perfect, noise-free data; it comes purely from the difference between the regularized and the original problem, and it gets smaller and smaller as alpha goes to zero. If the data were perfect you would choose alpha equal to zero; otherwise there is the data error, which behaves like the noise level delta divided by alpha and explodes as alpha goes to zero. You want to control the sum of those two, the combination. This is a typical feature of inverse problems: you do not try to make a perfect reconstruction; you have to sacrifice some of the accuracy for stability. Let me give you a second example.
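The error splitting just described can be sketched in a few lines (the toy matrix and noise level are my own choices, not from the slides):

```python
import numpy as np

def tikhonov(A, y, alpha):
    """Solve the shifted normal equations (A^T A + alpha I) x = A^T y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)

# Toy ill-conditioned system with slightly noisy data (assumed values).
A = np.array([[1.0, 1.0],
              [1.0, 1.01]])
x_true = np.array([1.0, 1.0])
y = A @ x_true
y_delta = y + np.array([0.0, -0.01])       # noise level delta ~ 1e-2

approx_errs, data_errs = [], []
for alpha in [1e-1, 1e-3, 1e-5]:
    x_clean = tikhonov(A, y, alpha)        # regularized, perfect data
    x_noisy = tikhonov(A, y_delta, alpha)  # regularized, noisy data
    approx_errs.append(np.linalg.norm(x_clean - x_true))  # -> 0 as alpha -> 0
    data_errs.append(np.linalg.norm(x_noisy - x_clean))   # ~ delta/alpha, grows
    print(alpha, approx_errs[-1], data_errs[-1])
```

The total error is the sum of the two terms, so the best alpha sits in between: small enough that the approximation error is acceptable, large enough that the noise is not amplified.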

06:04

It is slightly more involved, but it can still be explained with one or two slides. This was actually done by Kepler many, many years ago. He observed the distances of the planets from the Earth, but he did not want the distance: given the path of the planet, he wanted to determine the speed of the planet. The relation between measuring the distance and determining the speed is this: if you start at time zero at a certain distance g(0) from your measurement point, and you have the speed profile f(tau) for time tau, then you integrate the speed to get the distance. That is a basic equation we know from school, and the question is now: you measure g and you want to infer f. A probably more modern application would

06:52

be driving on the highway: everybody knows that if you drive at constant speed, the distance shown increases linearly, and if you accelerate as much as you can whenever there is a little gap in the traffic flow and then have to brake again, you get a speed profile which is pumping up and down, but the distance covered is almost the same. So we see that this is also a typical ill-posed setup: large differences in the input of the system, the different speed profiles, generate only very minor differences in the output. The forward problem is fine, we just integrate, but in the inverse problem you have to look at small differences in the data to infer large differences in the input, and this causes the instability.
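A minimal sketch of why the inverse direction is unstable (synthetic data, constant speed f = 1 assumed): integration smooths noise away, while differentiation multiplies it by one over the step size.

```python
import numpy as np

# Distance travelled at constant speed f = 1: g(t) = t.
t = np.linspace(0.0, 1.0, 101)
h = t[1] - t[0]
g = t.copy()

rng = np.random.default_rng(0)
g_delta = g + 1e-3 * rng.standard_normal(t.size)  # 0.1% measurement noise

# Inverse problem = differentiation; finite differences amplify the noise by 1/h.
f_exact = np.diff(g) / h          # the true speed, constant 1
f_noisy = np.diff(g_delta) / h    # swings wildly around 1

print(np.max(np.abs(f_exact - 1.0)))  # essentially zero (float round-off)
print(np.max(np.abs(f_noisy - 1.0)))  # orders of magnitude larger: noise times 1/h = 100
```

The finer you resolve time (smaller h), the worse the amplification, which is exactly the ill-posedness of recovering the speed profile from the measured distance.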

07:44

That adds to the complications of inverse problems. So I chose this inverse problem, which we had with an engine manufacturer. They had a very nice inverse problem: for maintenance they have to check the engines, for example whether the central shaft has little bends or cracks that developed over time. What they do is place acceleration sensors on the outside, let the machine run at certain rotation speeds, and measure the vibrations, and from those measurements on the outside they try to infer the internal structure of the shaft. You recognize the inverse problem, and it is clear that the system damps everything: a vibration of the engine should not be propagated too strongly, so even if you have comparatively large deviations from the nominal central shaft, the measurements might not show it strongly. It is a beautiful example, given by a complicated structure, but it comes down to the classical mechanical equations of forces and oscillations, a second-order differential equation. OK, and here is an overview: if

09:10

you look at our industrial applications over the last 20 years, almost all of them are inverse problems; there are hardly ever direct problems. There is hardly any quantity of interest which can be measured directly. Even if you look at temperature: with those little instruments, what you see is not temperature, you see the elongation of a column of something that expands with heat. You never measure anything directly, and this particularly applies to our main field of application, quality control: in a technical process you never measure directly what causes the defects, you always have indirect measurements. It also applies to the life sciences, where the difference between being healthy and ill shows in the proteomic pattern: you never measure the cancer directly, but you measure a protein which has a high expression compared to the healthy status. So I think inverse problems really are everywhere, and even if you do not treat your problem as an inverse problem, there may be one hidden inside, and it would pay off to think about the problem from the viewpoint of inverse reconstruction. OK, so these are our topics,

10:34

and now I would like to go a bit more into the mathematical models. They all have the same basic structure: there is the search for an unknown f which you cannot measure directly, and A is the transfer operator, the system matrix, whatever you have in your application in mind. There are two different ways to model this: discretely, as a matrix acting on vectors, or in the continuous setting of function spaces. I very much prefer the function space setting, the reason being that I am lazy: if you start with the discretized problem, you have to combine the problems coming from the operator with the discretization questions. Is the discretization modeled correctly, do you have a good scheme for doing the numerics? Whenever you get a result which says something about the error, you have to worry whether the constants are really bounded independently of the dimension of the discretization. So I much prefer the function space setting, where we look at operators between function spaces and leave the discretization to a second step. I think it gives a much bigger picture and gives you insight into the core problem, which is then not disguised and mixed up with additional issues.

But let us come back to the problem of cars driving with different speed profiles. Maybe the police do not measure the speed directly but measure the distance with a laser. Even high-precision instruments have a measurement error, and even if that error is only one per cent, once you overlay a speed profile having small deviations with this error, you cannot see the difference in the data anymore. Typically the error influences the data much more strongly than the deviation you are trying to detect. So there is a big problem: what can you do in this situation, how should you be able to do any meaningful reconstruction if the error is stronger than the difference you want to look at? The answer comes from using side information, and I want to explain this on the next slides.

The model you are looking at is this operator A, mapping from the space of parameters whose properties we try to determine to the measurement space where we actually have the data, and we consider the solution set of all parameters which explain the data within the given accuracy of the measurement device: the set of all f such that the norm of Af minus the given data is bounded by the error level delta, which you have to accept due to physical reasons, measurement reasons, or whatever the cause of the deviation from the true measurement is. The problem with ill-conditioned, ill-posed problems is that this set is unbounded. Even if on the data side you know the data lie close to the true data, the set of potential elements in X which are mapped by A into this small ball is unbounded. Still, what you try to do is: you have one particular element in this small ball in Y, and you somehow have to pick one element on the other side as being close to the true solution. There may be several difficulties; one is that the set is unbounded, but there might also be multiple solutions, since nobody says that A has to be injective. Everything is combined in saying that the set of potential solutions is unbounded, and the question is what you should do about it. That is the topic of regularization theory. The main ways of regularizing are: the most important one, adding an alpha on the diagonal, which is called Tikhonov regularization and generalizes to more complex settings; iteration methods; and the purely analytic singular value expansion, which is of less interest for applications.

The first idea is to reformulate solving Af = g^delta as a minimization problem. Instead of demanding that Af be equal to g^delta, we minimize the residual norm of Af minus g^delta. For the exact solution this residual is zero, but as we said, we sacrifice accuracy for stability: we do not want it to be zero, we just want it to be small. By the same argument, we would like to avoid the explosion of the reconstruction we saw in the two-by-two example, where a small change on the data side caused the solution to explode, and we avoid this by penalizing f for being too large. The regularization parameter alpha in front of this penalty term has to be chosen appropriately; that is a separate topic which we will see in a moment. If you do the analysis properly, the normal equations of this minimization are exactly what I explained before: you use the transpose of A, you shift A^T A by adding alpha on the diagonal, and you solve with the data transformed to the proper space by the transpose of A. This has the great effect that, while A^(-1) is unbounded and is only defined on the range of A, which is not closed, so you cannot apply it to arbitrary data, all of this is remedied by shifting the spectrum by alpha: (A^T A + alpha I)^(-1) is a bounded linear operator defined everywhere and hence can be controlled.

Now we have shifted the problem a little bit: we need to determine the optimal alpha. There are different strategies for this, quite a lot heuristically and rather little theoretically, but the general structure is: if alpha is too large, you penalize f too strongly and the solution will be too small; if alpha is too small, the instability occurs and the norm of the solution explodes. You have to be really careful. Typically you solve for many, many alphas and you have to develop a second criterion for choosing the optimal one, and this is really problem-dependent and application-dependent, with very little general guidance from theory. OK.
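The transcript does not say which second criterion the speaker has in mind; one standard textbook choice is Morozov's discrepancy principle, sketched here on the same kind of toy system (matrix and noise level are my own assumptions):

```python
import numpy as np

def tikhonov(A, y, alpha):
    """Solve the shifted normal equations (A^T A + alpha I) x = A^T y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)

# Near-parallel rows, slightly noisy data (assumed toy values).
A = np.array([[1.0, 1.0],
              [1.0, 1.01]])
y_delta = np.array([2.0, 2.0])   # true data would be [2.0, 2.01]
delta = 0.01                     # known noise level

# Solve for many alphas, then take the largest one whose residual is still
# compatible with the noise level (Morozov's discrepancy principle).
alphas = np.logspace(-8, 0, 50)
chosen = None
for alpha in sorted(alphas, reverse=True):
    f_alpha = tikhonov(A, y_delta, alpha)
    if np.linalg.norm(A @ f_alpha - y_delta) <= delta:
        chosen = alpha
        break

print(chosen, tikhonov(A, y_delta, chosen))
```

The idea: among all alphas whose residual matches the noise level, take the most stable (largest) one; resolving the data more accurately than delta would only mean fitting noise.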

17:17

It can be generalized: we do not need to look only at the L2 norm, we can have more general distance terms and divergences, and we can have more general penalty terms, but the general theory stays valid: if you do not regularize, if you do not stabilize with a penalty term, you get unstable, too-large reconstructions, and you can encode expert information in the design of the penalty term. Saying that the norm of the reconstruction should be small is just one piece of expert information; it amounts to saying that nature is lazy and does everything with the least energy, so a solution which explains the data with small energy, in the form of a small L2 norm, is preferred. But different types of expert information can be encoded in the penalty term as much as you like; you can also encode non-negativity constraints and sparsity properties, as we will see.

One difference to control theory, which we come to in a minute, is what we are interested in on the theoretical side when we talk about convergence. It is typically a different convergence than when you talk to people in numerical analysis or optimization. For us the critical property is: what happens as the data error delta goes to zero, do we really approach the true solution as the measurements get better and better? Not iterations going to infinity, no discretization, no iteration scheme: this convergence as delta goes to zero is the key question we address on the theoretical side. And then of course we also need numerical schemes for computing the minimizers, and those we borrow from control theory and optimization.

Here is a typical result, a convergence rates result. You might think this is pure mathematics, certainly true but useless in reality, but it is not: it really helps to choose the regularization parameter correctly. It also tells you, if you know more about the solution in terms of regularity, say the solution is two times differentiable instead of three times, how you should change your regularization. It also tells you how you should choose the discretization, so that the discretization error does not interfere with the accuracy of the analytic results. So this is a theoretical result which, if you know how to interpret it, gives you a lot of information; it is definitely useful for applications, and there are lots of consequences coming from it.

Let me highlight in this talk two papers which changed the game in the field. The first, from around 1989, transferred everything from the linear theory to the nonlinear theory, and "transferred" undersells it, since it used completely new concepts of nonlinear functional analysis which were not known in the field before; they developed the full theory in one paper, and this was the starting point of what the field was doing for the next decade, developing theories for nonlinear problems. And again the algorithms are, in the end, surprisingly simple, but you need the theory to find the correct version: you take the discrepancy term of the Tikhonov functional, and since we now have nonlinear operators, you minimize it iteratively, with a step-size control, making a gradient descent for the discrepancy term, and if you stop it accordingly, the stopping rule acts as the regularization. The theory says it is fine to do this in the more general nonlinear case.

Now a few words about control theory and optimization as opposed to inverse problems. Both fields see the same topics; both have converged on the minimization of functionals as one of their most important core problems, so superficially it looks as if they were identical, but in the details there are major differences. In inverse problems we always say there is a real, unknown physical quantity we try to determine, like the speed profile of the car driving in the distance: the quantity is there, we just do not have direct access to it, which is why we measure the data. In control problems it is different: you have a technical process which you want to drive to a certain desired state, maybe a production line that should produce certain quantities, and the desired output is wishful thinking. You think, OK, I would like to have control parameters which drive my system to the state I desire, but it is by no means clear whether this is possible, whether A allows you to find controls such that this desired state is reached. That is a completely different question and much harder: on the inverse problems side existence is clear, while controllability and existence are much harder for control problems. On the other hand, our major interest is how to control the measurement noise, and since the desired state is artificial, it is exact, so there is no problem with noise on that side. If you look at the theories, they are rather different: we talk about different convergences, we really have to go to the X space, while for the control problem you are interested in the output, how close you are to the desired state. It is really a different type of theory. But when it comes to the numerics, it is more or less the same, you minimize a functional, and it was a major development of the last 10 to 15 years that lots of very fine theory from control theory and optimization was shifted to inverse problems. There are many people who are much better at this than we are, for example in dual thinking, using duality theory to explain the effects in inverse problems and to design very efficient algorithms. There was a huge impact from control theory on the inverse problems side, much stronger than the other way around.
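The gradient-descent-with-stopping idea for nonlinear problems can be sketched on a toy operator (the operator, noise level, and step size below are all my own illustrative choices, not the engine model from the talk):

```python
import numpy as np

# Toy nonlinear forward operator F and its Jacobian (illustrative only).
def F(x):
    return np.array([x[0] ** 2 + x[1],
                     x[0] + x[1] ** 2])

def jacobian(x):
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, 2.0 * x[1]]])

x_true = np.array([1.0, 2.0])
y_delta = F(x_true) + np.array([1e-3, -1e-3])   # slightly noisy measurement
delta = 2e-3                                    # known noise level

# Landweber-type iteration: gradient descent on 0.5*||F(x) - y_delta||^2,
# stopped by the discrepancy principle (early stopping acts as regularization).
x = np.array([0.5, 0.5])
step = 0.05
for k in range(10000):
    residual = F(x) - y_delta
    if np.linalg.norm(residual) <= 2.0 * delta:
        break
    x = x - step * jacobian(x).T @ residual     # gradient step

print(k, x)   # x ends up close to x_true = [1.0, 2.0]
```

Here the number of iterations plays the role of the regularization parameter: stopping as soon as the residual reaches the noise level prevents the iteration from fitting the noise.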

24:05

So I think I have used up about half my time, I have 15 minutes left. OK, now let us talk about one of the other topics rather quickly. Again we have the problem that we do not know which element to choose on the reconstruction side. Typically there are very smooth objects in this set of potential solutions which explain the data within the given accuracy, and ragged ones which are much more localized, and which one you prefer comes from the application. If you say the norm has to be small, this typically gives smooth solutions, but if you know the defects are localized, you would probably say: if they tell me the central shaft is sort of bent all over, no, I would rather see the location of the defect; there are only very few spots, and I would prefer that solution. The question is how I can make my algorithm jump to this element of the solution set and not that one. And this actually is what

25:07

applies to most applications: physical structures are sparse. The deviation from healthy to sick is encoded not in the full proteomic landscape but in an array of a few proteins; the defects are localized. So sparsity, as it is called, is a strong concept, and the question is how we can steer our algorithm towards

25:27

such sparse solutions. Sparsity just means that in an appropriate representation you can represent everything with very few coefficients, without knowing in advance how many or which ones. And let me jump to one problem: this is the second paper which changed the field, in 2004. I think it was Ingrid Daubechies who wrote this paper; she took methods from image processing and statistics which were around and developed the theory for ill-posed problems with an L1 penalty term, p equals 1, what is called the sparsity penalty term, and she developed a nice theory. Since 2005, for the last ten years, this has been one of our main interests, developing sparsity theory, which in fact is still very similar to the previous scheme: you do a gradient descent on the discrepancy term, you can even include a step-size control, and then you interleave it with a shrinkage step which just throws away the small coefficients. It is a very nice iteration and easy to explain: gradient descent plus shrinkage. This then developed into a huge theory of results; we were among the people who extended it to nonlinear problems, and it is still going on.
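The iteration just described, gradient descent on the discrepancy term interleaved with soft shrinkage, is nowadays usually called ISTA; here is a minimal sketch on a random compressed-sensing style toy problem (sizes, seed, and parameters are my own choices):

```python
import numpy as np

def soft_shrink(x, threshold):
    """The shrinkage step: throw away small coefficients, shrink the rest."""
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

def ista(A, y, alpha, step, iters):
    """Minimize 0.5*||A f - y||^2 + alpha*||f||_1:
    a gradient step on the discrepancy term, then shrinkage."""
    f = np.zeros(A.shape[1])
    for _ in range(iters):
        f = soft_shrink(f - step * (A.T @ (A @ f - y)), step * alpha)
    return f

# Toy problem: 30 measurements of a 50-dim vector with only 3 nonzeros.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 50)) / np.sqrt(30)
f_true = np.zeros(50)
f_true[[3, 17, 31]] = [2.0, -1.5, 1.0]
y = A @ f_true

f_rec = ista(A, y, alpha=0.01, step=0.1, iters=2000)
print(np.nonzero(np.abs(f_rec) > 0.5)[0])   # large coefficients sit at 3, 17, 31
```

Compared with plain Tikhonov regularization, the reconstruction is forced to take a position: a coefficient is either active or exactly zero.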

26:54

The Austrian group also did a lot on the theoretical side. Applications are still of interest, and fast solvers are still a problem; I think Beck and Teboulle, Figueiredo, and Nesterov have the best iteration schemes for solving those systems, which come under different names in the different communities. So let me give you

27:19

one application in a bit more detail. This came from a collaboration with a colleague who was at the university until the end of the year and then took her first position in industry. It was from the times when television was moving from 25 frames per second to 100 frames per second: you get two pictures, and the question is simply how to insert the frames in between. There are snowflakes flying around, the lady is shifting her head a little bit; if you used linear interpolation, everything would be smeared out. Our approach was to model this with differential equations: we say part of the image is transported, a transport equation, the snowflakes are transported; part is diffusion-like, the melting of the snowflake; and part is a source term, for new objects that appear. And then we employ sparsity, and sparsity says that each of those terms should have minimal support. Minimal support means the algorithm has to decide whether it uses the transport, the diffusion, or the source term at each location; L2 mathematics would smear things out and make mediocre decisions, while sparsity takes a strong position: this part is melting, that part is a snowflake flowing, and that is the creation of a new object.
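The exact functional is garbled in the transcript; a hedged sketch of the described decomposition, with my own symbols, would look roughly like this:

```latex
% Image sequence u explained by transport, diffusion and a source term:
% motion of objects, melting of the snowflakes, appearance of new objects.
\partial_t u + v \cdot \nabla u \;=\; \operatorname{div}(\kappa \nabla u) + s
% Sparsity: each mechanism should have minimal support, so every component
% gets an L^1 penalty and the algorithm must decide, per location, which
% mechanism explains the data:
\min_{v,\,\kappa,\,s}\; \tfrac12\,\bigl\| u(v,\kappa,s) - u^{\delta} \bigr\|^2
  \;+\; \alpha_1 \|v\|_{1} \;+\; \alpha_2 \|\kappa\|_{1} \;+\; \alpha_3 \|s\|_{1}
```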

28:36

Here is a little sequence which we solved this way: we inserted ten images of this man closing the trunk of his car. Look at the edge of the car: with linear interpolation it would be smeared out, but with sparsity and the transport term the edges are nicely transported. What you also see is that the contrast in the background jumps, since the algorithm enforces sparsity in time and wants to do everything with the source term: it jumps at one time step from the old background to the new one. Maybe this could be done better, but in general it gives good results, at the expense of terrible computation times. So it is really not useful for real-time applications, but it is fine if you have the time to

29:32

do this type of frame interpolation. But now let us come to the topic Mr. Miller talked about yesterday, and I think he really made a great point. I must say, 20 years ago, neural networks, simulated annealing, genetic programming had a bad reputation; you did not work in this field, for the reason that people talked about opinions rather than theorems. There was no really solid theory; it was just "my method is better than yours", proved on one example. This has changed, and let me say a few words about why the classical, model-driven approach is probably not suitable for all applications. Naturally every model is incomplete and has its limits, that is absolutely clear, but more importantly, for big-data applications running the full model is too expensive: solving a PDE a few hundred thousand times would be far too complicated for autonomous driving, no way. So you need to introduce, at the expense of accuracy, a reduced model which gives a good approximation. But my main point is that the function-analytic theory has drawbacks, since it relies on those function spaces, and even for images we do not have a proper function space; we do not know how to describe an image. Is an image an element of some Sobolev space? Does a vector-field model work well? Maybe, maybe not. Total variation is a good model, but it is certainly not the best model, and those models are supposed to apply to natural images as well as to radiological images, which have rather different structures but end up in the same function spaces. So I think a much finer model is needed to really capture the essence of certain subclasses, which is not captured by what we have so far. The opposite of this is

31:31

the data-driven approach: throw away the model and just say, I have huge datasets where I know the mapping, and I construct a black-box model from them. That is what was presented yesterday, and these are the approaches which came up rather often in the last ten years,

31:50

and then I'll just give you two examples. You can build a low-rank approximation: you look for some data vectors on the one side, which are the basis vectors, and some output vectors, and you construct a kernel of finite dimension, and then you use the projection onto this basis as a function. This is called basis pursuit, or the basis-learning approach in machine learning: you determine the optimal basis for your problem. But what people really try is to get as much as they can from very little: they sit either on the black-box side or on the white-model side, being model-driven. Why not use the analytic model as much as you can, and only add a data-driven correction for the parts which you don't want to model — because it is computationally too expensive, or not covered by a physical model, or for any other reason? When I talked about this over the last years, I coined the term "multicoloured" models: many black-box models — in fact most of them — aren't really black, they have some colour, as we heard yesterday; and anything in between that really uses both is neither a white model nor a black model but a multicoloured model, and I think this phrases it very nicely. We found that even Wikipedia has a chapter about grey-box models, where this is almost exactly described, and there is one paper we found which applies grey-box models to inverse problems; grey models are well studied for full models, but beyond that there is very little. The basis for all the analysis is the derivative: you need to be able to differentiate the forward model with respect to x — for designing gradient descent, for doing a sensitivity analysis, for doing your characterization theory. And this is the issue when you talk about networks. Neural networks are very simple structures: a sequence of linear operations, each followed by a nonlinearity. So in a way a network is just a composition of linear maps and nonlinearities, and that's it. The differentiation is therefore theoretically very nice, but it is also very nice in terms of algorithms: whatever you do in training a neural network, you compute the derivative with respect to the design variables — but in the same way you can also compute the derivative with respect to the input. So the derivative of the penalty functional with respect to the input variables is simple, and you can directly apply all gradient-descent approaches, even for inverse problems. That is the first concept I wanted to explain. We started this about a year ago and gave it to three PhD students, so we don't have too many results yet. Let me explain two more concepts. The next one is a great concept, due to LeCun, one of the big names in the field: he observed that the iterative soft-shrinkage algorithm (ISTA) has exactly the structure of a network — a linear operation followed by a nonlinearity. If you fix the number of iterations, that is exactly a K-layer network. So instead of saying that the linear part is composed from the forward operator, you use the
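The point about differentiating with respect to the input can be sketched in a few lines of numpy. The two-layer network and penalty below are hypothetical, not code from the talk; the sketch only illustrates that backpropagation through the linear-plus-nonlinearity layers yields dJ/dx exactly the same way it yields derivatives with respect to the weights:

```python
import numpy as np

def forward(x, Ws, bs):
    """Network as a sequence of linear maps followed by tanh nonlinearities."""
    acts = [x]
    for W, b in zip(Ws, bs):
        x = np.tanh(W @ x + b)
        acts.append(x)
    return x, acts

def grad_wrt_input(x, Ws, bs, y_target):
    """d/dx of the penalty J(x) = 0.5*||f(x) - y||^2, via the chain rule."""
    y, acts = forward(x, Ws, bs)
    g = y - y_target                       # dJ/d(output)
    for W, a in zip(reversed(Ws), reversed(acts[1:])):
        g = W.T @ (g * (1.0 - a ** 2))     # back through tanh, then the linear map
    return g

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
bs = [np.zeros(4), np.zeros(2)]
x = rng.standard_normal(3)
g = grad_wrt_input(x, Ws, bs, np.zeros(2))

# finite-difference check of one component of the input gradient
eps = 1e-6
e0 = np.zeros(3); e0[0] = eps
J = lambda x: 0.5 * np.sum(forward(x, Ws, bs)[0] ** 2)
fd = (J(x + e0) - J(x - e0)) / (2 * eps)
print(abs(fd - g[0]) < 1e-5)
```

The same input gradient is what drives gradient descent on the data-misfit when the network appears inside an inverse problem.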

35:40

huge datasets to learn those matrices W, the vector b, and even the parameter alpha, based on the specific dataset you want to treat. Of course, iterative soft shrinkage is optimal in ℓ² spaces; but your specific dataset is only a subset with certain characteristics, and this is captured by the learning process, where you use the data — and, I would say, much more elegantly than if you go to the Bayesian theory of inverse problems, which is really a clumsy theory where you have to handle one more quantity, the variance, instead of just the expectation values. This idea is really elegant, it really helps to give better results, and the experiments are nicely done, since the programs for training neural networks — TensorFlow and the like — are all readily available. Then there was a paper by Michael Unser which made me both realize something and get angry. The approach itself is fine: the general approach for computerized tomography is to take the filtered back-projection, get a reconstruction, and then take the reconstructions produced by filtered back-projection as inputs to a neural network and combine them to get a better prediction. They tested this on a biomedical dataset and it is better — and that is not so surprising. They propose a purpose-specific network
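Gregor and LeCun's observation can be made concrete with a tiny numpy sketch — illustrative only, since LISTA itself learns these quantities from data. Fixing K iterations of ISTA for min_x ½‖Ax−y‖² + λ‖x‖₁ gives exactly a K-layer network with weight matrix W = I − tAᵀA, bias b = tAᵀy and threshold α = λt:

```python
import numpy as np

def soft_shrink(z, alpha):
    """Soft thresholding: the network's nonlinearity."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def ista_unrolled(y, A, lam, K):
    """K fixed ISTA steps for min 0.5*||Ax - y||^2 + lam*||x||_1.
    Each step is a linear map followed by a nonlinearity, i.e. one
    network layer; LISTA learns W, b and alpha from data instead."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2        # step size 1/L
    W = np.eye(A.shape[1]) - t * (A.T @ A)     # layer 'weights'
    b = t * (A.T @ y)                          # layer 'bias'
    alpha = lam * t                            # 'threshold' parameter
    x = np.zeros(A.shape[1])
    for _ in range(K):                         # K layers
        x = soft_shrink(W @ x + b, alpha)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 40]] = [1.0, -2.0, 1.5]         # sparse ground truth
y = A @ x_true
x_hat = ista_unrolled(y, A, lam=0.05, K=200)
print(np.linalg.norm(A @ x_hat - y) / np.linalg.norm(y))
```

Replacing the fixed W, b, alpha by trained ones, layer by layer, is exactly the step from ISTA to a learned K-layer network.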

37:07

and what made me angry is that they then say: this we can apply to sparse data — just 50 directions instead of the full set — and get better results than the filtered back-projection. And that is a typical mistake: you compare your method with a very bad method and then say it is better; nobody would use filtered back-projection for sparsely sampled CT data. But this is OK; what I finally took away, which is no surprise, is that this general idea of using networks as a kind of afterburner

37:36

for improvement is definitely a good idea, and we have just started to

37:39

test it ourselves. In the last three minutes I will talk about what we have been working on as an application — one where we actually made it into real use: MALDI imaging. The tissue slice goes into the laser, and the ablated material is run through mass spectrometry, so at every data point you get not a three-colour image like RGB but thousands of channels — it is a hyperspectral imaging application. Those thousands of colour channels can be taken as almost complete information, and as a complete data disaster: at the end you want to have one number, namely the probability that the patient has cancer or not, and you create billions of numbers to find it. So the question is what you can do, and the answer is pretty simple: in every

38:28

slice there are maybe 8 or 10 different metabolic processes running, not thousands. So you should be able to represent the data with very few, properly chosen basis vectors — spectral vectors which represent the different metabolic processes — and this is exactly basis learning in machine learning.
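The claim — thousands of channels but only a handful of underlying processes — is a low-rank statement, and a plain truncated SVD already illustrates it. The dimensions and data below are synthetic, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic hyperspectral data: 500 pixels, 2000 spectral channels,
# generated from only 8 underlying 'metabolic' spectra.
k_true = 8
spectra = rng.random((k_true, 2000))      # basis spectra
weights = rng.random((500, k_true))       # per-pixel abundances
D = weights @ spectra                     # data matrix, rank 8

# Basis learning in its simplest form: keep the k leading singular vectors.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
k = 8
D_k = (U[:, :k] * s[:k]) @ Vt[:k]         # rank-k approximation

rel_err = np.linalg.norm(D - D_k) / np.linalg.norm(D)
print(rel_err)   # essentially zero, since D has exact rank 8
```

Real spectra are noisy and the SVD basis is not interpretable, which is why the constrained factorization discussed next matters.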

38:48

So we factorize accordingly: we want a rank-k approximation of the kernel, and we add two more things — we require that the spectral channels are non-negative and pairwise orthogonal. This means that the support of one has to be separate from the supports of the others; in the discrete case this gives you a clustering, which has been well known since 2008. And we asked: what happens if you apply this within regularization theory, with more general penalty terms? We set out to prove what happens, and in the spatially coherent case, sometimes, all of a sudden you get
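Why non-negativity plus pairwise orthogonality turns the factorization into a clustering can be seen directly: if u, v ≥ 0 entrywise and ⟨u, v⟩ = 0, then uᵢvᵢ = 0 for every i, so the supports are disjoint and each pixel is attributed to exactly one basis spectrum. A toy numpy illustration — the factor H below is hypothetical, not output of the actual algorithm:

```python
import numpy as np

def cluster_labels(H, tol=1e-12):
    """Each column has at most one active row -> a cluster assignment."""
    active = np.abs(H) > tol
    assert np.all(active.sum(axis=0) <= 1), "supports overlap"
    return active.argmax(axis=0)

# hypothetical non-negative factor with pairwise orthogonal rows
# (3 'metabolic processes', 9 pixels)
H = np.array([
    [1.0, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 2.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.4, 1.1, 0.2, 0.0],
])

# check pairwise orthogonality: the Gram matrix is diagonal
G = H @ H.T
assert np.allclose(G - np.diag(np.diag(G)), 0.0)

print(cluster_labels(H))   # -> [0 0 1 1 2 2 2 2 0]
```

Relaxing or changing the penalties on the factors is what produces the different decompositions described next.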

39:29

a method based on k-means clustering — that is just one of the results that comes out of it. If you use the standard sparsity penalty (ℓ¹) for the spectra and ℓ² for the spatial part, you get a nice decomposition which you can perfectly interpret: which metabolic processes are active in which regions. If you apply TV, you get spatially coherent structures — completely different. Adding orthogonality and sparsity makes the picture much easier to interpret. That is what we have published, and we developed the

40:04

algorithms — the details are not so important here — in order to apply this to

40:08

digital pathology. The standard approach is that the pathologist takes additional slides and stains them with cancer biomarkers, which colour the tissue slice where tumorous cells are expected. You have to run many such tests to arrive, in a rather complex iterative procedure, at a diagnosis — and we want to replace this by digital staining based on this mass spectrometry. I can show you some

40:38

of the results we obtained. We do this routinely now; we have several different datasets —

40:44

among them maybe one of the bigger studies: we have by now analyzed data from more than 800 patients. On the left-hand side is what comes out of the standard approach — the pathologist says: OK, I know the biomarker, I know which molecular weight it has, so we look at certain spots in the spectrum and pick intensity values there — compared, on the other side, to our machine-learned, optimized schemes. As you can see, the learned scheme is not always better; I am really not saying that the learned spectral patterns always win. But in certain cases the advantage is sufficient that people really
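The comparison can be caricatured in a few lines — everything below (channel indices, synthetic data, the least-squares fit) is hypothetical and only meant to contrast the two scoring styles: reading off intensities at known biomarker channels versus fitting a weight vector over the whole spectrum on labelled data:

```python
import numpy as np

rng = np.random.default_rng(3)
n_channels = 300
biomarker_idx = [40, 120, 205]          # hypothetical known biomarker m/z bins

def make_spectra(n, tumour):
    """Synthetic spectra: tumour samples have raised biomarker peaks
    plus a diffuse pattern that the hand-picked score ignores."""
    X = rng.random((n, n_channels))
    if tumour:
        X[:, biomarker_idx] += 1.0
        X[:, 150:170] += 0.3
    return X

X = np.vstack([make_spectra(100, True), make_spectra(100, False)])
y = np.concatenate([np.ones(100), np.zeros(100)])

# 1) 'pathologist' score: sum of intensities at the known biomarker channels
score_picked = X[:, biomarker_idx].sum(axis=1)

# 2) learned score: least-squares weight vector over all channels
w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
score_learned = X @ w

def accuracy(score, y):
    return np.mean((score > np.median(score)) == y)

print(accuracy(score_picked, y), accuracy(score_learned, y))
```

On real data the learned pattern is fit and validated far more carefully; the sketch only shows the structural difference between the two scores.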

41:25

are starting to use it, and we built a spin-off company out of this. Bruker, the world-leading manufacturer of mass spectrometry equipment — worldwide, their instruments ship with our software. It was our company until last month, when they bought it completely; so it is no longer our company, but it is a nice

41:46

story where, of course, the core is mathematics: without the mathematical background, you would never have designed this type of classification scheme. About 80% of the work was computer science — programming, adapting it to new machines, doing the studies — but at the very basis is the core mathematics, and it would not have happened without it; and inverse problems were the driving force. But the

42:11

thing concerns the future.

### Metadata

#### Formal Metadata

Title | An introduction to inverse problems with applications in machine learning |

Series title | The Leibniz "Mathematical Modeling and Simulation" (MMS) Days 2017 |

Author | Maaß, Peter |

Contributors |
Weierstraß-Institut für Angewandte Analysis und Stochastik (WIAS) University Bremen FB 3 Mathematik und Informatik, Zentrum für Technomathematik |

License |
CC Attribution - NonCommercial - NoDerivatives 3.0 Germany: You may use, copy, distribute and make the work or content publicly available in unchanged form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they have specified. |

DOI | 10.5446/21899 |

Publisher | Technische Informationsbibliothek (TIB) |

Publication year | 2017 |

Language | English |

Production year | 2017 |

Production place | Hannover |

#### Content Metadata

Subject area | Computer Science |

Abstract | The presentation starts with some motivating examples of inverse problems before introducing the general setting. We briefly review the most common regularization approaches (Tikhonov, iteration methods) and sketch some recent developments in sparsity and machine learning. Sparsity refers to additional expert information on the desired reconstruction, namely that it has a finite expansion in some predefined basis or frame. In machine learning we focus on 'multicoloured' inverse problems, where part of the application can be formulated in a strict analytical framework but some part of the problem needs to be modeled by a data-driven approach. Such combined problems can be treated by data-driven linear low-rank approximations or by more general black-box models. In particular, we review deep learning approaches to inverse problems. Finally, machine learning techniques are themselves often inverse problems. We highlight basis learning techniques and applications to hyperspectral image analysis. |