An introduction to inverse problems with applications in machine learning
Automated media analysis
The TIB AV-Portal applies these automated video analyses:
Scene recognition – shot boundary detection segments the video based on visual features. A visual table of contents generated from this gives a quick overview of the video's content and allows precisely targeted access.
Text recognition – intelligent character recognition captures, indexes, and makes written language (for example, text on slides) searchable.
Speech recognition – speech-to-text records the spoken language in the video in the form of a searchable transcript.
Image recognition – visual concept detection indexes the moving image with subject-specific and interdisciplinary visual concepts (for example, landscape, facade detail, technical drawing, computer animation, or lecture).
Keyword assignment – named entity recognition describes the individual video segments with semantically linked subject terms. Synonyms or narrower terms of an entered search term can thereby be included in the search automatically, which broadens the result set.
Speech transcript
00:00
First of all, it was a great pleasure to accept this invitation; I see lots of familiar faces in the audience, and I was at this type of meeting in a similar situation about ten years ago. I am from a center for applied mathematics, so we are rooted in
00:17
mathematics, but we extend our activities into scientific computing. I would say half of our people are doing mathematics in the classical sense, proving theorems, and the other half are using mathematics in scientific computing, so they are users of the mathematical theory, in scientific computing and also in transfer. Our center has the obligation to actually show the impact of our theory outside the academic world; this is written into our constitution, and every five years we have to defend our budget, where it is always a big point how well we do in real industrial applications. It is always difficult with such a mixed audience to decide whom to address: should I address the researchers, or should I address the librarians among you? So I decided to make the first half of the talk an introduction on a very basic
01:07
motivation level, and in the second half I will just talk about what interests me at the moment, to give a research overview of what we are doing, and I will try to put this into the perspective of the general interest in the field of inverse problems. In some sense it is a follow-up of what was presented in the talk yesterday, since part of this talk will be on machine learning, which has two sides: on the one side it helps us to solve inverse problems and gives a somewhat different perspective on well-known classes of inverse problems, and on the other hand many machine learning tasks are actually inverse problems, and we can apply our theory to improve some parts of machine learning as a topic in itself. Since there are mathematicians in the room, we start with a very simple example, a 2-by-2 matrix. This is overly simplified for those who know inverse problems already, and I apologize to those who are rooted in the mathematical theory, but it gives a flavor of what is really going on. You take a 2-by-2 matrix A, and the forward problem is that we take a vector and compute its image under this operator; if you like, the matrix is the system description, given by a linear 2-by-2 system, and you are all able to check that this gives the correct answer. The inverse problem is that you are given the matrix A and the right-hand side y, and you are asked to solve the linear equation y = Ax; the solution is given by the inverse applied to y, which at least in the two-dimensional case is possible. Then there is a little example of what can go wrong: you do the same problem with data that have less than one percent error, we just cut off after the second digit, and all of a sudden the inverse reconstruction explodes. So even a small change of less than one percent in the data causes an enormous error in the reconstruction if you apply the inverse directly to the perturbed data y-delta. These are the two typical features which we have to address in all kinds of inverse problems: there are always noisy measurements, so typically we try to determine a certain quantity x which cannot be measured directly but which we observe through some noisy measurements, with A the measurement matrix and y the measurements; and the inverse is ill-posed, not only ill-conditioned but ill-posed. I will also give you the most common remedy for this on the first few slides, since after the first three slides probably half of the audience fades away. And if you are a little bit intimidated by the high number of slides, do not worry: I hardly ever talk more than two minutes per slide, so you can check that it will not be too long.
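The instability just described can be reproduced in a few lines of NumPy. The exact matrix from the slides is not recoverable from the transcript, so the 2-by-2 system below is an illustrative stand-in that shows the same effect: rounding the data to two digits, an error well below one percent, destroys the naive reconstruction.

```python
import numpy as np

# Illustrative ill-conditioned 2x2 system (the matrix from the slides is
# not recoverable from the transcript; this one shows the same behavior).
A = np.array([[1.0, 1.0],
              [1.0, 1.001]])
x_true = np.array([1.0, 1.0])
y = A @ x_true                         # exact data: [2.0, 2.001]

y_delta = np.round(y, 2)               # cut off after the second digit

x_exact = np.linalg.solve(A, y)        # recovers [1.0, 1.0]
x_noisy = np.linalg.solve(A, y_delta)  # jumps to [2.0, 0.0]
print(x_exact, x_noisy)
```

A relative data error of about 0.05 percent here produces a 100 percent error in the reconstruction.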
04:06
So how do we solve this? A simple way to stabilize it is to add something on the diagonal. If the matrix itself is symmetric positive definite, you can add a suitably chosen alpha on the diagonal and solve that system. You know that it does not give you the exact answer; you accept that it solves a different problem, but it gives you something more stable, in the sense that the total error can be controlled. If A is not symmetric, you first multiply both sides with the transposed matrix, and now you get a system which is symmetric positive semidefinite; then you can apply the same trick, shifting the spectrum by adding alpha on the diagonal, and solve the system with the perturbed matrix to get the reconstruction. If you look at the error, the difference between what you reconstruct using the changed system, with this alpha on the diagonal, and the true solution, you can split it into two components by comparing with the reconstruction you would get from perfect, noise-free data: the approximation error, which comes purely from the regularization and gets smaller and smaller as alpha goes to zero, and the data error, which behaves like the noise level divided by alpha and explodes as alpha goes to zero. If the noise were zero, you would perfectly well choose alpha equal to zero; otherwise you want to control the sum of those two components, the combination. This is a typical feature of inverse problems: you do not try to make a perfect reconstruction, you have to sacrifice some accuracy for stability, and finding the right balance is the task of parameter choice. Let me give you a second example.
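A minimal sketch of the Tikhonov remedy just described, again on an assumed stand-in 2-by-2 system: instead of inverting A directly, solve the shifted normal equations (A^T A + alpha I) x = A^T y; the value of alpha here is an illustrative choice.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.001]])
x_true = np.array([1.0, 1.0])
y_delta = np.round(A @ x_true, 2)   # noisy data, < 0.1% relative error

def tikhonov(A, y, alpha):
    """Solve the shifted normal equations (A^T A + alpha I) x = A^T y."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

x_naive = np.linalg.solve(A, y_delta)     # unstable: data error amplified
x_reg = tikhonov(A, y_delta, alpha=1e-3)  # stable but slightly biased

# We sacrifice a little accuracy (x_reg is not exactly x_true) and in
# return the total error stays controlled.
print(x_naive, x_reg)
```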
06:04
It is slightly more involved, but it can still be explained with one or two slides. This was actually done by Kepler many, many years ago: he observed the distances of some of the planets from the Earth, but he did not want the distance, he wanted the path of the planet, he wanted to determine the speed of the planet. The relation between measuring the distance and trying to determine the speed is an integral: if you start at a certain distance from your measurement point at time zero and you have the speed profile f(tau) for time tau, then you integrate the speed to get the distance, g(t) = integral from 0 to t of f(tau) d tau. That is a basic equation which we all know from school, and the question now is: you measure g and you want to infer f.
06:52
A probably more modern application would be trying to determine the speed of a car on the highway. Everybody knows that if you drive at constant speed, the distance shown increases linearly, and if you accelerate as much as you can whenever there is a little gap in the traffic flow and then have to brake down again, the speed profile pumps up and down, but the distance covered is almost the same. So we see that this is also a typical ill-posed setup: large differences in the input of the system, very different speed profiles, generate only minor differences in the output. The forward problem is fine, we just integrate; but in the inverse problem you have to look at small differences in the data to infer large differences in the input, and this causes the instability and adds to the complications.
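The asymmetry between the forward problem (integration) and the inverse problem (differentiation) is easy to see numerically; the grid and the noise level below are illustrative assumptions.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 1001)
dt = t[1] - t[0]

speed = np.ones_like(t)                      # constant speed profile
distance = np.cumsum(speed) * dt             # forward problem: integrate

rng = np.random.default_rng(0)
noise = 0.001 * rng.standard_normal(t.size)  # ~0.1% measurement error
distance_noisy = distance + noise

# Inverse problem: differentiate the noisy distance record.
speed_rec = np.gradient(distance_noisy, dt)

data_err = np.max(np.abs(noise))               # tiny error in the data...
speed_err = np.max(np.abs(speed_rec - speed))  # ...large error in the speed
print(data_err, speed_err)
```

Integration damps the noise; finite-difference differentiation amplifies it by roughly 1/dt, which is exactly the instability of this inverse problem.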
07:44
Next I chose an inverse problem which we had with an industrial partner who develops engines. They had a very nice inverse problem: in the maintenance step they have to check the engines, for example whether the central shaft has developed little bends or cracks over time. What they do is place some acceleration sensors on the outside, let the machine run at certain rotation speeds, and measure the vibrations; from the measurements of the vibrations outside they try to infer the internal structure of the shaft, and that is a prototypical inverse problem. It is clear that, since the system is damped, the vibrations are damped as well: even if you have comparatively large deviations from the nominal central shaft, the measurements might not show this very strongly. So this is a typical example given by a complicated structure, but it comes down to the classical mechanical equation of forces and oscillations, which is a second-order differential equation. OK, here is an overview, and if
09:10
you look at our industrial applications over the last 20 years, almost all of them are inverse problems; there is hardly ever a direct problem. Whatever you do, there is hardly any quantity of interest which can be measured directly. Even if you look at temperature: with one of those little instruments, what you see is not temperature, you see the elongation of a column of something which expands with heat. You never measure anything really directly. This applies everywhere. Our main field of application is quality control: in a technical process you never measure directly what causes the defects, you always have indirect measurements. It also applies to the life sciences, where the difference between being healthy and being ill shows in the proteomic pattern: you never measure the cancer directly, but you measure a protein which has a high expression compared to the healthy status. So I think inverse problems really are everywhere, and if you think about it, maybe in your problems, even if you do not treat them as inverse problems, there is an inverse problem hidden inside, and it would pay off to apply some thinking about the problem from the viewpoint of inverse reconstruction. OK, so these are our topics,
10:34
and now I would like to talk a bit more about the mathematical models. They all share the same basic structure: we search for an unknown f which we cannot measure directly, and A is the transfer operator, the system matrix, whatever you have in your application in mind. There are two different ways to model this: you can do it discretely, as a matrix-vector equation, or you can do it in the continuous setting of function spaces. I very much prefer the function space setting, the reason being that I am lazy: if you discretize from the start, you have to combine the problems coming from the operator with the discretization questions. Is the model discretized correctly, do you have a good scheme for doing the numerics? And whenever you get a result which says something about convergence, you have to worry whether the constants are really bounded independently of the dimension of the discretization. So I very much prefer the function space setting, where we look at operators between function spaces and leave the discretization to a second step; I think it gives a much clearer picture and some insight into the core problem, which is then not disguised and mixed up with additional discretization problems. Let me come back to the problem of the car driving with different speed profiles. Maybe the police just measure the distance, not the speed directly, with a laser instrument, and every instrument has a measurement error. Even if the measurement error is only one percent, if you overlay the distance profiles coming from rather different speed profiles with this error, you cannot see the difference anymore. Typically the error influences the data much more strongly than the deviation you are trying to detect, and then there is a big problem: what can you do in this situation, how should you be able to do any meaningful reconstruction if the error is stronger than the difference you want to look at? The answer comes from using side information, and I want to explain this on the next slides. The model we are looking at is this operator A mapping from the space of parameters, whose properties we try to determine, to the measurement space where we actually have the data, and we consider the solution set of all parameters which explain the data within the given accuracy of the measurement device. That is, you look at the set of all f such that the norm of Af minus the given data is bounded by the error level delta, the level you have to accept due to physical reasons, measurement reasons, or whatever is the cause of the deviation from the true measurement. The problem with ill-conditioned and ill-posed problems is that this set is unbounded: even though on the data side you know the data lie close to the true data, if you look at the potential elements in X which are mapped by A into this small ball, they form an unbounded set. What you would still like to do is take this one particular element, the small ball in Y, and somehow fix one element on the other side as being close to the true solution. There may be many problems: one is that the set is unbounded, but there might also be multiple solutions, since nobody says that A has to be injective. Everything is combined in saying that the set of potential solutions is unbounded, and the question of what to do about it is the topic of regularization theory. Among the ways of regularizing, really the most important one is adding an alpha on the diagonal, which is called Tikhonov regularization; it generalizes to more complex settings, then there are iteration methods, and the purely analytic ones like truncated singular value expansions, which are of less interest for applications. The first idea is to reformulate solving Af = g-delta as a minimization problem: instead of requiring Af to equal g-delta, we minimize the residual. If you could solve exactly, this would be zero, but as we said, we sacrifice accuracy for stability, so we do not want this to be zero, we just want it to be small. By the same argument, we would like to avoid the explosion of the reconstruction, as in the 2-by-2 example where a small change on the data side caused the reconstruction to explode, and we avoid this by penalizing f for being too large; the weight alpha of the stabilization term also has to be chosen appropriately, which is a different topic that we will see in a moment. If you do the analysis properly, the normal equations of this minimization problem are exactly what I explained before: you use the transpose, A-star A plus alpha times the identity on the diagonal, and solve this with the data transformed to the proper space by the transpose of A. This has the great effect that, while the inverse of A is unbounded, or simply not defined whenever the range of A is not the full space, so it cannot be applied to any data you have in mind, all this is remedied by shifting the spectrum by alpha: the operator (A-star A + alpha I) inverse times A-star is a well-defined bounded linear operator and hence can be controlled. Now we have shifted the problem a little, in that we need to determine the optimal alpha. There are different strategies to do this, heuristically and theoretically, but the general structure is: if alpha is too large, you penalize f too strongly and get too small a solution; if alpha is too small, on the other hand, the instability occurs and the norm of the solution explodes. You have to be really careful; typically you solve the problem for many, many alphas, and you have to develop a second criterion for choosing the optimal alpha, and this is really problem-dependent and application-dependent; there is very little generally useful theory for it. OK, this
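The "solve for many alphas and look at the trade-off" strategy can be sketched as follows; the discretized integration operator, the noise level, and the alpha grid are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = np.tril(np.ones((n, n))) / n       # crude discretized integration operator
f_true = np.sin(np.linspace(0.0, np.pi, n))
g_delta = A @ f_true + 1e-3 * rng.standard_normal(n)   # noisy data

def solve(alpha):
    """Tikhonov solution for one value of the regularization parameter."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ g_delta)

alphas = np.logspace(-8, 0, 30)
residuals = [np.linalg.norm(A @ solve(a) - g_delta) for a in alphas]
norms = [np.linalg.norm(solve(a)) for a in alphas]

# The trade-off described in the talk: growing alpha increases the
# residual but shrinks the solution norm; a second criterion (L-curve,
# discrepancy principle, ...) then picks alpha from these curves.
assert residuals[0] < residuals[-1]
assert norms[0] > norms[-1]
```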
17:17
This can be generalized: we do not need to only look at the L2 norm, we can have rather general distance terms, Banach space norms, Bregman distances and so on, and we can have more general penalty terms, but the general theory remains valid: if you do not regularize, if you do not stabilize by a penalty term, you get unstable, too large reconstructions, and you can encode expert information in designing the penalty term. Requiring the norm of the reconstruction to be small is just one piece of expert information; you may interpret it as saying that nature is lazy and wants to do everything with the least energy, so solutions with small energy, in the form of a small L2 norm, are preferred. But different types of expert information can be encoded in the penalty term, as much as you like; you can also encode non-negativity constraints and other convex relaxations of desired properties. One thing, and we come to the difference to control theory in a minute: what we on the theoretical side are interested in, when we talk about convergence, is typically a different convergence than when you talk to people in numerical analysis or optimization. For us the critical question is what happens as the data error delta goes to zero: do we really approach the true solution as the measurements get better and better? So not the iteration index to infinity, no discretization parameter, no iteration scheme; the convergence property we are concerned with is what happens as delta goes to zero, and this is the key question we address on the theoretical side. Then of course we also need numerical schemes for computing the minimizers, and there we borrow from control theory and optimization. Here is a typical result, a convergence-rate estimate. You might think this is pure mathematics, certainly true but maybe useless in reality, but it is not: it really helps to choose the regularization parameter correctly. It also tells you, if you know more about the regularity of the solution, say it is two times differentiable instead of three times, how you should change your regularization, and it tells you how you should choose the discretization so that the discretization error does not interfere with the accuracy of the analytic result. So this is a theoretical result which, if you know how to interpret it, gives you a lot of information; it is definitely useful for applications, and there are lots of consequences coming from it. Let me highlight two papers in this talk which changed the game in the field. The first is a paper from 1989 where the authors transferred everything from the linear theory to the nonlinear theory; they used completely new concepts in terms of nonlinear functional analysis which were not known in the field before, and they developed the full theory in one paper. This was the starting point of what kept all of us going for the next decade, from about 1990 to 2000: developing the theory for nonlinear problems was the key. And again the algorithms are surprisingly simple in the end, but you need the theory to find the correct answer: you take the discrepancy term of the Tikhonov functional, and since we now have nonlinear operators, you treat it iteratively, with a step-size control, and make a gradient descent for the discrepancy term; if you stop accordingly, with a suitable choice of the regularization, the theory says this is fine for regularization purposes even in the general nonlinear case. Now a few words about control theory and optimization as opposed to inverse problems. Both fields share the same topics, both deal with inverting processes, and the minimization of functionals is one of the most important core problems in both, so superficially it looks as if they were identical, but in the details there are some major differences. In inverse problems we always say there is a real, unknown physical quantity we try to determine, like the speed profile of the car driving on the highway: there is a physical quantity which is there, we just do not have direct access to it, and that is why we measure the data. In control problems you have a technical process which you want to drive to a certain desired state, maybe a production line that should produce certain quantities. You design the output as wishful thinking: you say, OK, I would like to have this kind of output, and you search for control parameters which drive the system to the state I desire. But it is by no means clear whether this is possible, whether A allows you to find controls such that this desired state is reached; that is a completely different question and much harder. In this respect, on the control theory side, existence and controllability are the hard problems, while for inverse problems existence is clear. On the other hand, our major interest is how to control the measurement noise, and since in control the desired state is designed, it is artificial and exact; there is no problem with noise. So if you look at the theories in detail, they are rather different: we talk about different convergences, we really have to go to the X space, while for the control problem you are interested in the output, how close you are to the desired state; it is really a different type of theory. But when it comes to the numerics, it is more or less the same: you minimize functionals, and a major development over the last 10 to 15 years is that lots of the very refined theory from control theory was shifted to inverse problems. That is not my field; there are many people who do this much better, for example using duality theory, working in the dual problems, to explain the effects of inverse problems and also to design very efficient algorithms, like primal-dual methods. There was a huge impact from control theory on inverse problems, much stronger than the other way around, and there are experts who know much more about this than I do.
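The simple iterative scheme mentioned above, gradient descent on the discrepancy term stopped early by a discrepancy principle, can be sketched for a linear operator as a Landweber-type iteration; the step size and the stopping factor tau are assumed choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = np.tril(np.ones((n, n))) / n              # discretized integration operator
f_true = np.sin(np.linspace(0.0, np.pi, n))
delta = 1e-2                                  # assumed noise level
g_delta = A @ f_true + delta * rng.standard_normal(n) / np.sqrt(n)

mu = 1.0 / np.linalg.norm(A, 2) ** 2          # step size from ||A||^2
tau = 1.5                                     # discrepancy factor > 1
f = np.zeros(n)
for k in range(10000):
    residual = A @ f - g_delta
    if np.linalg.norm(residual) <= tau * delta:
        break                                 # early stopping regularizes
    f -= mu * (A.T @ residual)                # gradient step on 0.5*||Af - g||^2

print(k, np.linalg.norm(A @ f - g_delta))
```

Stopping once the residual reaches the noise level, rather than iterating to convergence, is what plays the role of the regularization parameter here.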
24:05
So I think I have used up about half of my time; I have 15 minutes left. OK, now let us go through one of the other topics rather quickly. Again we have the problem that we do not know which element to choose on the reconstruction side: in this set of potential solutions which explain the data within the given accuracy there are typically very smooth objects and also rough ones which are much more localized, and the question is which one you prefer. This comes from the application. If you say the norm has to be small, this typically gives smooth solutions; but if you know the defects are localized, you would probably rather say: if they tell me the central shaft is sort of bent all over, no, I expect the defect to be located at only very few spots, and I would prefer that solution. The question is how I can make my algorithm jump to this element of the solution set and not to that one, and this actually is what
25:07
applies to most applications: physical structures are sparse. The deviation from healthy to sick is encoded not in the full proteomic landscape but in an array of a few proteins; the defects are localized. So sparsity, as it is called, is a strong concept, and the question is how we can steer our algorithm into
25:27
this direction, towards sparsity. Sparsity just means that in an appropriate representation you can represent everything with very few coefficients, not knowing how many, but few. Let me jump to one problem. This was introduced in the second paper which changed the field, in 2004 I think. She wrote this paper using methods from image processing and statistics which were around at the time, and developed the theory for inverse problems with an l1 penalty term, the regularization with p equal to one, which is what is called the sparsity penalty term. She developed a nice theory, and there was a wonderful conference where we discussed it; she came back in 2005, and since then, for the last ten years, developing sparsity theory has been one of our main interests. The algorithm is in fact very similar to what we had before: you do a gradient descent, you can even include a step-size control here, so this is a gradient descent on the discrepancy term, and it is interleaved with a shrinkage step which simply throws away the small coefficients. It is a very nice iteration, and when it comes to explaining it to others it is easy: gradient descent plus shrinkage. This then developed into a huge factory of results; we were among the people who extended it to nonlinear problems, and it is still going on, with some open problems left.
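The "gradient descent plus shrinkage" iteration (iterative soft-thresholding) is indeed only a few lines; the random operator, the sparse test signal, and the penalty weight below are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    # Shrinkage: pull every coefficient toward zero, kill the small ones.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(3)
m, n = 40, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
f_true = np.zeros(n)
f_true[[5, 37, 80]] = [1.0, -2.0, 1.5]        # sparse ground truth
g = A @ f_true

alpha = 0.01                                  # l1 penalty weight
mu = 1.0 / np.linalg.norm(A, 2) ** 2          # step size
f = np.zeros(n)
for _ in range(3000):
    # gradient step on the discrepancy term, then shrinkage
    f = soft_threshold(f - mu * (A.T @ (A @ f - g)), mu * alpha)

print(np.sum(np.abs(f) > 0.5))                # support of the reconstruction
```

Despite having far fewer measurements than unknowns, the iteration recovers the three-spike structure because the penalty prefers sparse explanations of the data.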
26:54
Our group did a lot on the theoretical side; the applications are still of interest, and fast algorithms were a problem for quite a while. By now, I think, the schemes of Figueiredo and the Nesterov-type accelerated iterations are among the best for solving those systems, which come under different names in the different communities. So let me give you
27:19
one application in a bit more detail. This came from a collaboration with a PhD student who did her thesis with us and now has her first position in industry. It was from the time when television was going from 25 frames per second to 100 frames per second, so you are given two pictures and the question is how to insert the frames in between. There are some snowflakes floating around, and the lady is shifting her head a little bit, so if you used linear interpolation, everything would be smeared out. Our approach was to do this with a partial differential equation: we say part of the image follows a transport equation, the snowflake is transported; part is diffusion-like, the melting of the snowflake; and part is a source term, for wholly new objects that appear. Then we employ sparsity, and sparsity says that each of those terms should have minimal support; at every point the model has to decide whether it uses the diffusion process, the transport term, or the source term. Plain L2 mathematics would give you a smearing out, a sort of mediocre decision everywhere; sparsity takes a strong position: this is melting, and this is
28:36
flowing, and this is the creation of a new object. Here is a little sequence which we solved this way: we inserted ten images into this scene of a man closing the back of his car, and if you look at the edge of the car door, with linear interpolation it would be smeared out, but with sparsity and the transport term the edges are transported nicely. What you also see is that the contrast in the background jumps: since the model employs sparsity in time as well, the source term wants to do everything at one time step, so it jumps from one background level to the other in a single step. Maybe this could be done better, but in general this gives very good results, at the expense of terrible computation times. So it is really not useful for real-time applications, but if you have time to
29:32
spend, this type of sequence interpolation works nicely. But now let us come to the topic of what was said yesterday, and I think that really was a great point. I must say, 20 years ago, when neural networks, simulated annealing, and genetic programming were around, one reason I did not work in this field was that people talked about opinions rather than about theorems: it was not really a solid theory, it was just "my value is better than yours," with one example as proof, which I found really strange. This has changed, so let me say a few words about why the classical, model-driven approach is probably not suitable for all applications. Naturally, every model is incomplete and has limits, that is absolutely clear; but more importantly, for big-data applications, running the full model is too expensive. Solving a PDE a few hundred thousand times would be far too complicated for autonomous driving in the car; no way. So you need to introduce, at the expense of accuracy, a reduced model which gives a good approximation. But my main point is that the function-analytic theory has drawbacks, since it uses those function spaces, and even for images we do not have a proper function space; we do not know how to describe images. Is it a Sobolev space, does a total variation model work well? Maybe, maybe not; total variation is a good model, but it is certainly not the best model. And those models are supposed to apply to natural images as well as to radiological images, which have rather different structures but are put into the same function space. So I think a much finer model is needed to really capture the essence of certain subclasses of images, which is not captured by the spaces we have so far. The opposite extreme is
31:31
the data-driven approach: throw away the model and just say, we have huge data sets where we know the mapping, and we construct a black-box model. That is what was presented yesterday, and these are the approaches which have come up rather often in the last ten years.
31:50
Let me give you two examples. First, you can build a low-rank approximation: you look for some data vectors on the one side, which serve as basis vectors, and some output vectors, and you construct a kernel of finite dimension; you then use the projection onto this basis as an approximation of the operator. This is called a basis pursuit or basis learning approach in machine learning: you learn the optimal basis for your problem. But what people rarely ask is how much you can keep from the analytic model. They are either on the black-box side or on the white-box, model-driven side; why not use as much of the analytic model as you can and only amend it with a correction for the parts which you do not want to model, because they are computationally too expensive, not covered by a physical model, or for any other reason? When I talked about this over the last years I called it a multicolor model: you have black-box models, but many, maybe most, of them are not really black, they have some color, as we heard yesterday, and everything in between is in use; it is neither a white model nor a black model, it is a multicolor, or grey-box, model. We found out that there is even a chapter in Wikipedia about grey-box models which describes more or less exactly this, and we found one paper which applies it to inverse problems. Grey-box models are well adapted for full models, but there is very little theory, and I would say the basis for all of the analysis is the derivative: what you need is to be able to differentiate the forward model with respect to x, for designing gradient descent, for doing the sensitivity analysis, for doing the regularization theory. This seems to be a problem if you talk about neural networks; now I have five minutes left, but neural networks are very simple structures: a network is a sequence of linear operations, each followed by a nonlinearity, so written out it is just a composition of linear maps and nonlinearities, and that is it. The differentiation is theoretically very nice, but it is also very nice in terms of algorithms: whatever they do in training a neural network, they compute the derivative with respect to the design variables, the weights, and in exactly the same way you can compute the derivative with respect to the input. So you can determine the derivative of the penalty functional with respect to the input variables, which is simple, and you can directly apply all our gradient-descent approaches even with a network as forward model for inverse problems. That is the first concept I wanted to explain; we started this about a year ago and gave it to a few PhD students, so we do not have too many results yet. Let me explain two more concepts. The next one is a great concept: this was, I think, LeCun, one of the big names in the field, who observed that if you take this iterative soft-shrinkage algorithm, it has exactly the structure of a network, a linear operation followed by a nonlinearity, and if you fix the number of iterations, that is exactly a K-layer network. So he formalizes everything as a network: the nonlinearity is the shrinkage, it combines with the linear parts, and instead of deriving those matrices from the operator A and its adjoint, he learns them.
35:40
huge datasets to learn those matrices W, the vector b, and even the parameter alpha, based on the specific data you want to handle. Of course, iterative soft shrinkage is optimal in l2 spaces, but your specific data set is only a subset with certain characteristics, and this is captured by the learning process where you use the data. And, I would say, much more elegantly than if you go to statistical formulations of inverse problems, which is really a clumsy theory: to get even one moment more, the variance instead of just the expectation values, the effort is horrible. This is really elegant, it really helps to give you better results, the experiments are nicely done, and the standard programs for training neural networks, TensorFlow and the other ones, do the work for you. Then there was a paper, by Michael Unser's group, which made me somewhat angry, although the approach is fine. The general approach for computerized tomography would be: take the sinogram, use filtered backprojection to get a reconstruction, and then take the reconstructions produced by filtered backprojection as inputs to a neural network which combines them into a better reconstruction. They tested this on medical data sets and it is better, which is not so surprising: they used a purpose-built network
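The iteration behind this can be written down in a few lines. Below is a minimal numpy sketch under my own assumptions; the names, sizes, and the tiny demo problem are illustrative, not from the talk. It runs classical iterative soft shrinkage, where W and b are derived from the forward operator A; in the learned variant one would instead fit W, b, and alpha to training data while keeping the fixed k-layer structure.

```python
import numpy as np

def soft_shrink(z, alpha):
    # pointwise nonlinearity: soft thresholding at level alpha
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def ista(A, y, alpha, step, n_iter):
    """Iterative soft shrinkage: x_{k+1} = S_alpha(W x_k + b).

    Here W = I - step * A^T A and b = step * A^T y are derived from the
    forward operator A. Fixing n_iter gives exactly an n_iter-layer
    network of linear maps followed by nonlinearities; in the learned
    variant, W, b, and alpha would be trained instead.
    """
    W = np.eye(A.shape[1]) - step * (A.T @ A)
    b = step * (A.T @ y)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_shrink(W @ x + b, alpha)
    return x

# tiny demo: recover a sparse vector from noisy linear measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20)) / np.sqrt(30)
x_true = np.zeros(20)
x_true[3], x_true[11] = 1.0, -2.0
y = A @ x_true + 0.01 * rng.standard_normal(30)
x_hat = ista(A, y, alpha=0.01, step=0.2, n_iter=200)
```

Each loop pass is one "layer"; differentiating the whole composition with respect to either the weights or the input is mechanical, which is exactly the point made above about gradients with respect to inputs.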
37:07
and what made me angry is that they said: we can apply this to sparse data, just 50 directions instead of several hundred, and get better results than filtered backprojection. That's a typical mistake: you compare your method with a very bad baseline and claim it is better, but nobody would use filtered backprojection on sparsely sampled data. Still, this is OK, and what I take from it, which is no surprise, is that this general idea of using networks as an afterburner
37:36
for improvement is definitely a good idea, and we have started to
37:39
test it ourselves. For the last three minutes I will talk about what we have been working on as an application, where we actually made it into practice: MALDI imaging. A tissue slice is put into the laser, the evaporated material is run through mass spectrometry, and at every data point you get not three colour channels like RGB but thousands of channels; it is a hyperspectral imaging application. Those thousands of channels can be taken as input, and it is a complete information disaster: at the end you want one number, namely the probability that the patient has cancer or not, and you create billions of numbers to find it. So the question is what you can do, and the answer is pretty simple: in every
38:28
slice there are maybe eight or ten different metabolic processes running, not thousands. So you should be able to represent the data with very few, properly chosen basis vectors, spectral vectors which represent the different metabolic processes, and this is exactly basis learning, the machine learning approach.
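The low-rank claim can be checked directly: a data matrix generated by a handful of spectral patterns is numerically low-rank, and a truncated SVD recovers it up to the noise level. A small synthetic sketch, with all dimensions and names made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic "hyperspectral" data: 500 pixels, 1000 channels,
# generated by only k = 5 underlying spectral patterns
n_pix, n_chan, k = 500, 1000, 5
spectra = rng.random((k, n_chan))      # characteristic spectra
weights = rng.random((n_pix, k))       # per-pixel abundances
D = weights @ spectra + 1e-3 * rng.standard_normal((n_pix, n_chan))

# truncated SVD: keep only the k leading singular triplets
U, s, Vt = np.linalg.svd(D, full_matrices=False)
D_k = (U[:, :k] * s[:k]) @ Vt[:k]

# the rank-k approximation captures the data up to the noise level
rel_err = np.linalg.norm(D - D_k) / np.linalg.norm(D)
```

The singular values drop sharply after the fifth, so five spectral basis vectors already explain essentially all of the data.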
38:48
So we proceed accordingly: we want a rank-k approximation of the kernel, and we add two more things, namely that the basis vectors are nonnegative and pairwise orthogonal, which means that the support of each has to be separate from the supports of the others. In the discrete case this gives you a clustering, which has been well known since 2008, and we asked what happens if you apply this in the setting of regularization theory, with more general penalty terms. We set out to prove what happens in the regularized, spatially coherent cases, and sometimes, all of a sudden, you get
39:29
a method based on k-means clustering; you can characterize the results, and that's what comes out of it. If you use the standard sparsity penalty for the spectra and L2 for the coefficients, you get a nice decomposition which you can perfectly interpret as different metabolic processes being active in different regions; if you apply TV instead, you see the spatially coherent structure, which is completely different. Adding orthogonality and sparsity makes the pictures much easier to interpret. That's what we have published.
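The decomposition itself is a nonnegative matrix factorization. As a minimal sketch, here are the classical Lee-Seung multiplicative updates for the plain Frobenius objective; the orthogonality, sparsity, and TV penalties discussed above require modified algorithms and are not included:

```python
import numpy as np

def nmf(D, k, n_iter=1000, eps=1e-9, seed=0):
    """Rank-k nonnegative factorization D ~ W @ H via Lee-Seung
    multiplicative updates for the Frobenius objective. Rows of H play
    the role of nonnegative spectral basis vectors; W holds the
    per-pixel abundances. The updates preserve nonnegativity by design."""
    rng = np.random.default_rng(seed)
    n, m = D.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        H *= (W.T @ D) / (W.T @ W @ H + eps)
        W *= (D @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# demo: an exactly rank-3 nonnegative matrix is recovered well
rng = np.random.default_rng(2)
D = rng.random((60, 3)) @ rng.random((3, 40))
W, H = nmf(D, k=3)
rel_err = np.linalg.norm(D - W @ H) / np.linalg.norm(D)
```

Both factors stay entrywise nonnegative throughout, which is what makes the rows of H interpretable as physical spectra.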
40:04
We also developed the algorithms, which are not so important here, and applied this to
40:08
digital pathology. The standard approach is that the pathologist takes additional slides, applies cancer biomarkers which stain those additional slices, and the stain colours the tissue where tumorous cells are expected. In the more complex situations you have to run many tests to reach a diagnosis, and you want to replace this by digital staining based on this mass spectrometry. I can show you some
40:38
of the results we obtained. We do this routinely now and have several different data sets;
40:44
maybe one of the bigger studies: by now we have been analyzing data from more than 800 patients. On the left-hand side is what comes out of the standard approach, where the pathologist says: OK, I know the biomarker and I know which molecular weight it has, so we look at particular spots in the spectrum and pick some intensity values. This is compared with our machine-learning-based optimized schemes, and as you can see, it is not always better; I'm really not saying that the learned spectral patterns always win. But in certain cases the advantage is sufficient, so that people really
41:25
are starting to use it, and we built a spin-off company out of this. The world-leading manufacturer of mass spectrometry equipment took it up, and worldwide their instruments now ship with our software. It was our company until last month, when they bought it completely, so it is no longer our company, but it's a nice
41:46
story, where of course without the core mathematics, without the mathematical background, you would never have designed this type of classification scheme. About 80% of the work is computer science: programming, adapting it to new machines, doing the studies. But at its heart sits the core mathematics, and it would not have happened without it. It's inverse problems that have been the driving force, but this, I believe, is the
42:11
thing of the future.
Metadata
Formal metadata
Title: An introduction to inverse problems with applications in machine learning
Series title: The Leibniz "Mathematical Modeling and Simulation" (MMS) Days 2017
Author: Maaß, Peter
Contributor: Gottfried Wilhelm Leibniz Universität Hannover (LUH)
License: CC Attribution - NonCommercial - NoDerivatives 3.0 Germany: You may use, copy, distribute, and make the work or its content publicly accessible in unchanged form for any legal, non-commercial purpose, provided you name the author/rights holder in the manner specified by them.
DOI: 10.5446/21899
Publisher: Weierstraß-Institut für Angewandte Analysis und Stochastik (WIAS), Technische Informationsbibliothek (TIB)
Publication year: 2017
Language: English
Production year: 2017
Production place: Hannover
Content metadata
Subject area: Mathematics
Abstract: The presentation starts with some motivating examples of inverse problems before introducing the general setting. We briefly review the most common regularization approaches (Tikhonov, iterative methods) and sketch some recent developments in sparsity and machine learning. Sparsity refers to additional expert information on the desired reconstruction, namely that it has a finite expansion in some predefined basis or frame. In machine learning we focus on 'multi-coloured' inverse problems, where part of the application can be formulated in a strict analytical framework but another part needs to be modeled by a data-driven approach. Such combined problems can be treated by data-driven linear low-rank approximations or more general black-box models. In particular, we review deep learning approaches to inverse problems. Finally, machine learning techniques are themselves often inverse problems. We highlight basis learning techniques and applications to hyperspectral image analysis.