
# Generalization properties of multiple passes stochastic gradient method


Speech transcript

00:04

Thank you for the introduction and for the invitation. We have learned during this workshop that, to deal with large-scale learning problems, the statistical aspects and the optimization aspects have to be treated jointly. One example of this is the stochastic gradient method, which is an algorithm of choice for this kind of problem. The motivation for our work started from the observation that there are several results studying the generalization properties of one-pass stochastic gradient descent, but in practice other heuristics are very often used, such as multiple passes over the data, usually combined with early stopping. So what I am going to present are some theoretical results about this kind of heuristics, and in particular about the generalization properties of this kind of algorithm.

I start from the problem setting. We consider linear regression in a Hilbert space with the square loss. The choice of the square loss is critical in our analysis, and what I am going to say depends heavily on this choice. So whenever I forget to say it, remember that the square loss plays a crucial role. The objective in learning, in the classical setting, is to minimize a functional E, which can be written as an expectation, in this case of the square loss, which depends linearly on w. The problem here is that the measure ρ is unknown: we only have a sample, a finite number of points called the examples, and I have access only to these points, which are independent and identically distributed, randomly sampled from ρ. This setting naturally includes regression

02:53

with noisy linear measurements: if I fix a w in H, I can generate pairs by randomly sampling points x_i and then taking linear measurements corrupted by noise. But this framework also includes, for instance, function regression, if my input points are mapped into a reproducing kernel Hilbert space. This is an equivalent way to write the learning problem in reproducing kernel Hilbert spaces. If you consider learning in an RKHS, what you want to do is again to minimize the expected risk, but now I want to find an element of H_K, a function, which depends nonlinearly on the input. But, as I think most of you know, if w is a function in the reproducing kernel Hilbert space generated by the kernel K, then we have the reproducing property: w evaluated at the point ξ is the scalar product between w and K_ξ in the space H_K. I am writing ξ and not x, and we will see in a moment why. Here ξ is a point of the input space, y is in the output space, and w is a function mapping the input into the output. What I can do is consider a map that identifies the input point ξ with the kernel evaluated at this point, K_ξ. If I apply this change of variables inside the functional, what I can write is that w(ξ) is the scalar product between w and K_ξ, and I am back to the model I presented at the beginning. OK, so in this setting
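In symbols, the setting described so far can be summarised as follows. This is a sketch only: the slides are not part of the transcript, so the notation ($\mathcal{E}$ for the expected risk, $\rho$ for the unknown measure, $K_\xi$ for the kernel section at $\xi$) is an assumption chosen for this note.

```latex
% Expected risk for linear regression with the square loss in a Hilbert space H
\min_{w \in \mathcal{H}} \mathcal{E}(w), \qquad
\mathcal{E}(w) = \int_{\mathcal{H} \times \mathbb{R}}
  \bigl( \langle w, x \rangle_{\mathcal{H}} - y \bigr)^{2} \, d\rho(x, y),
\qquad (x_1, y_1), \dots, (x_n, y_n) \ \text{i.i.d.} \sim \rho .

% Kernel case: identify the input \xi with x = K_\xi; the reproducing property gives
w(\xi) = \langle w, K_\xi \rangle_{\mathcal{H}_K} .
```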

05:21

a classical approach is Tikhonov regularization. What I want to do in the rest of my talk is to start from some very classical results about Tikhonov regularization for learning problems, list the assumptions and state the results, and then compare them with the ones that we can get for stochastic gradient.

The Tikhonov regularized estimator is defined in this way. I first replace ρ, which is unknown, by its empirical version, and then I have the empirical risk, where the loss function in this talk will be the square loss. What I do is minimize the regularized empirical risk. The estimator depends on λ, which is the regularization parameter that balances the weight of the regularizer against the data-fit term. The properties of this estimator have been studied a lot, but mainly from a statistical point of view, so now I want to present its statistical properties.

In order to do that I need to introduce some assumptions. The first one is a fairly standard assumption: it basically requires that the marginal measure has bounded support, so the norm of x is bounded almost surely, and the same holds for the output y. This second part can be relaxed for what I am going to say, but I will keep it for simplicity. Another assumption is that the expected risk has a minimizer; therefore the set of minimizers is nonempty, and I can consider the minimal norm solution. This is a less standard assumption. Under these assumptions you can already prove some consistency results, but more assumptions are needed if we want to have more precise bounds.

[Exchange with the audience about the existence of a minimizer in the infinite-dimensional case; largely unintelligible in the recording.]

I am presenting the results in this setting. Some of the results that I will present hold even without this assumption, but I will assume it mainly because I want to introduce the source condition, which is a technical condition that allows us to get error bounds. First I define T, which is the second moment operator, an operator from H to H, and it is well defined and bounded thanks to the boundedness assumption that I just made. What is the source condition? It can be read as a regularity assumption on the solution that I am assuming to exist. More precisely, it asks that the minimal norm solution belongs to the range of a power of T. I will use this strange indexing just to make the comparison with existing results easier, because this is the scaling that is usually used in the papers.

What does it mean? With a picture: we have the space H. If r is equal to 1/2 I am not asking for anything, since T to the power zero is the identity; I am just asking for the existence of the minimizer. If r is bigger than 1/2, the range of the corresponding power of T is a subspace of H, and as r increases my condition gets stronger. So as r increases I am requiring more regularity of my solution. There is

11:34

another interpretation. Because T is a compact operator from H to H, self-adjoint and positive, I can diagonalize it and build a sequence of eigenvectors and eigenvalues. My point belongs to H, and the eigenvectors form a basis of H, so this quantity, the sum of its squared coefficients, is finite; this is equivalent to the existence of the solution. What does the source condition ask? It asks that the solution is in the range of a power of this operator, and I can write this as: there exists an h in H such that the solution is that power of T applied to h. What I can do is compute this quantity using the eigenbasis, and I get this condition: I ask that this sum here is finite. What does it mean? Remember that the eigenvalues are going to zero, since the operator is compact. So requiring that this quantity is finite asks that the coefficients of the solution go to zero sufficiently fast. So this is the usual assumption, and this is the stronger assumption. OK, so
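The two readings of the source condition given in the talk can be written out as below. This is a sketch under assumptions: the symbols $w^{\dagger}$ (minimal norm solution), $(\sigma_i, v_i)$ (eigenpairs of $T$) and the exponent $r - 1/2$ are not visible in the transcript; the exponent is inferred from the remark that the smallest value of $r$, read here as $1/2$, imposes no extra condition.

```latex
% Second moment operator
T = \mathbb{E}\,[\, x \otimes x \,] : \mathcal{H} \to \mathcal{H}

% Source condition: the minimal norm solution lies in the range of a power of T
w^{\dagger} = T^{\,r - 1/2} h \quad \text{for some } h \in \mathcal{H}, \qquad r \ge \tfrac{1}{2}

% Equivalent form in the eigenbasis (\sigma_i, v_i) of T, with \sigma_i \to 0
\sum_{i} \frac{\langle w^{\dagger}, v_i \rangle^{2}}{\sigma_i^{\,2r - 1}} < \infty
```

For $r = 1/2$ the sum reduces to $\|w^{\dagger}\|^{2} < \infty$, which is just existence of the solution; for larger $r$ the coefficients $\langle w^{\dagger}, v_i \rangle$ must decay faster than the eigenvalues.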

13:03

under these conditions, boundedness plus the source condition, we can prove the following error bounds on the Tikhonov regularized estimator. The key point here is the choice of the regularization parameter, as usual. We can prove these bounds if I choose the regularization parameter λ depending on n in this way, which depends on the source condition. What I get here is that, if r is smaller than 3/2, the rate improves as r increases, but after this point the rate does not improve anymore: it stops. So for r bigger than 3/2, even if I am requiring stronger regularity of my solution, I will not be able to approximate it in a faster way.

The idea of the proof is a bias-variance trade-off. What I do here is introduce an auxiliary point, which is the minimizer of the regularized expected risk. I do not have access to this point, but I do not need to compute it; I just use it as a reference point. Then I decompose the norm of the difference between my estimator and the solution, and bound it by the sum of these two terms. As you can see here, we have a term that depends on the sample and a term that depends only on the choice of λ, so on the approximation properties of my approach. Balancing the two terms, where the behavior of the second one is determined by the source condition, we get the result.

These results are optimal in a minimax sense, in the sense that I fix a class of probability measures, in this case the class of probability measures that satisfy the assumptions I just introduced. I will have a different solution for each probability measure, and I can compute the distance between the output of my algorithm and my true parameter. Then I consider the worst case, in the sense that I put in front of my algorithm the problem on which it behaves worst, and among all the algorithms that I have at my disposal I take the best one, the one which has the best behavior. And as you can see here, the exponent you get for Tikhonov regularization is the same as the one in the lower bound.

[Audience question, not intelligible in the recording.] Yes. The results that we obtain for multiple passes are optimal in norm; for the risk we still do not have optimal results, and that is why I am presenting these results in norm. But basically you can do the same replacing the norm with the excess risk.

So what happens for r bigger than 3/2 is that, even if you are asking more of your solution, you do not gain anything. This is called the saturation of Tikhonov regularization, and it is a well-established fact. The problem in practice is that you do not know r, so you do not know how to choose the regularization parameter. So usually you need some adaptive results, and in this case, for Tikhonov regularization, you can use for instance Lepskii's method, also known as the balancing principle, which allows you to recover the same rates. If you make some further assumptions, more precisely that the eigenvalues of T decay at a certain rate, then you can prove improved rates, the so-called capacity dependent rates, which depend on this decay of the eigenvalues. In this talk we will consider the capacity independent setting.

But this analysis does not take into account the optimization. The point made in the paper by Bottou and Bousquet is that, in the large-scale scenario, computing the estimator can be costly, so the exact solution of the minimization problem is not the output of your computation: you only compute it approximately. So what makes more sense is to analyze the behavior of an approximation of the true minimizer. So I add another term, meaning that my estimator is now the outcome of the t-th iteration of an iterative procedure applied to the regularized empirical risk. So beside the classical bias-variance decomposition we get another term, which is an optimization error. A first comment one can make looking at this decomposition is that, when we use an optimization procedure to minimize the regularized empirical risk, it makes no sense to go beyond the statistical accuracy, since the additional accuracy will be lost. So, on the one hand, we have this
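The estimator and the two error decompositions mentioned here can be sketched as follows. The symbols $\hat{w}_{\lambda}$, $w_{\lambda}$, $\hat{w}_{\lambda,t}$ and $w^{\dagger}$ are notation assumed for this note (the slides are not in the transcript); the inequalities themselves are just the triangle inequality.

```latex
% Tikhonov estimator and its population counterpart (the auxiliary reference point)
\hat{w}_{\lambda} = \operatorname*{arg\,min}_{w \in \mathcal{H}}
  \frac{1}{n} \sum_{i=1}^{n} \bigl( \langle w, x_i \rangle - y_i \bigr)^{2} + \lambda \|w\|^{2},
\qquad
w_{\lambda} = \operatorname*{arg\,min}_{w \in \mathcal{H}} \mathcal{E}(w) + \lambda \|w\|^{2}

% Bias-variance decomposition
\| \hat{w}_{\lambda} - w^{\dagger} \|
  \le \underbrace{\| \hat{w}_{\lambda} - w_{\lambda} \|}_{\text{sample error}}
  + \underbrace{\| w_{\lambda} - w^{\dagger} \|}_{\text{approximation error}}

% With an iterative solver stopped after t iterations, a third term appears
\| \hat{w}_{\lambda, t} - w^{\dagger} \|
  \le \underbrace{\| \hat{w}_{\lambda, t} - \hat{w}_{\lambda} \|}_{\text{optimization error}}
  + \| \hat{w}_{\lambda} - w_{\lambda} \|
  + \| w_{\lambda} - w^{\dagger} \|
```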

19:57

decomposition. In parallel to this observation, in the last years there has been very active research on optimization methods to solve problems that have exactly this form, that can be read as regularized empirical risk functionals: they have a sum structure, they are possibly strongly convex or not, and they are especially relevant in the large-scale scenario. There has been a huge amount of work on methods that scale well with the dimension and, in this sense, are not blind with respect to the structure of the function we optimize, exploiting the fact that we are minimizing a sum. I am thinking of stochastic gradient methods, but also of incremental aggregated gradient, all this class of methods. What we can do, once we have the convergence properties, so the complexity bounds on these optimization methods, is to balance these terms and obtain a trade-off that also includes the number of steps needed to approximate my true parameter. What we do now is instead to take another direction and abandon this decomposition, trying to have one that does not separate the statistical part from the optimization one. OK, so stochastic

21:42

gradient descent. The idea is that we forget the empirical risk and we try to minimize directly our true risk, the expected error in this case. We do it with stochastic gradient descent, since we do not have access to the true gradient because the measure is unknown. One possibility is to use the simple version where, at each step, a stochastic approximation of the gradient is given by a single sample point. In this setting there is no explicit regularization: we do not explicitly add a penalty, and we have no constraint either. Under various assumptions, convergence results have been established for this iteration.

The point I want to make is the fact that in practice very often multiple passes over the data are used. Of course this is a stochastic gradient in which I know how many points I have, so it is not truly online, because I know in advance how many points I will see, and I have the possibility to visit them more than once. But this is done in practice, and our idea was to try to explain from a theoretical point of view why this works. So we focused on the simplest version of the multiple-pass stochastic gradient, in which the points are visited cyclically in a fixed order.

As you can see here, if you look at the multiple-pass stochastic gradient, it can be immediately interpreted as a minimization procedure for the empirical risk. So if we let it run, we will converge to the minimum of the empirical risk. Each step is a gradient step, a stochastic gradient step as before. I use this strange decomposition, with an inner and an outer iteration, because I want to keep t as the number of passes over the data, which is usually called an epoch. So the main
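The slide with the algorithm is not part of the transcript. The following is a minimal sketch of a cyclic multiple-pass incremental gradient for least squares, assuming the update with step size γ/n that is mentioned later in the talk; the function name, the argument names and the toy data are hypothetical.

```python
def incremental_gradient(xs, ys, gamma, epochs):
    """Cyclic multiple-pass incremental gradient for the square loss.

    xs: list of input vectors (lists of floats), ys: list of outputs.
    Returns the list of iterates, one per epoch (after each full pass).
    """
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    iterates = []
    for _ in range(epochs):                  # outer loop: t counts passes (epochs)
        for x, y in zip(xs, ys):             # inner loop: points in a fixed cyclic order
            residual = sum(wj * xj for wj, xj in zip(w, x)) - y
            step = gamma / n * residual      # step size gamma / n on a single point
            w = [wj - step * xj for wj, xj in zip(w, x)]
        iterates.append(list(w))
    return iterates


# Toy usage: two identical noiseless points with true parameter 2.0.
iterates = incremental_gradient([[1.0], [1.0]], [2.0, 2.0], gamma=1.0, epochs=50)
print(iterates[-1])
```

Keeping one iterate per epoch makes the number of passes the only quantity that has to be chosen once γ is fixed, which is the question the talk turns to next.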

24:35

question that we try to answer here is: how many passes should we do, how many times should we visit the data, in order to minimize the expected risk? So what is the point? As I told you, the convergence of this iteration to the minimizer of the empirical risk is well established, and there are many classical results. But what we would like is to provide this other kind of result. The idea is to exploit

25:20

the properties of our algorithm, the stochastic gradient. I first introduce an auxiliary iteration, which is gradient descent applied to the true risk, the expected risk. As you can see here, I fix the step size γ, which is a constant, and the number of points n, and I write gradient descent in this strange way in order to make the comparison with the incremental method easier: w at time t+1 will be the result of n inner steps of gradient descent on the true risk. What happens? We know that this is gradient descent applied to the expected risk, so with this iteration we converge to a minimizer of the expected risk. On the other hand, what I have is my iteration, the incremental gradient applied to the empirical risk, and I will prove that for a certain time these two iterations are close to each other, but then after a while my iteration deviates from my goal and goes towards the minimizer of the empirical risk. So the job here is to detect the zone where I am still approximating my true objective.

[Audience question about the convergence of this iteration, gradient descent applied to a convex function with this step size; largely unintelligible in the recording.] The function is not strongly convex, that is for sure. That is a valid point, but I think it is not essential for this presentation. So before giving the theoretical results, let's see what happens in practice.
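The two iterations compared in this passage can be sketched as follows. The notation ($\hat{w}_t$, $\hat{u}_t^{i}$ for the empirical iteration, $w_t$, $u_t^{i}$ for the auxiliary one) is assumed for this note, and constant factors coming from the gradient of the square loss are absorbed into $\gamma$.

```latex
% Incremental gradient on the empirical risk: one epoch = n inner steps, one per point
\hat{u}_t^{0} = \hat{w}_t, \qquad
\hat{u}_t^{i} = \hat{u}_t^{i-1}
  - \frac{\gamma}{n} \bigl( \langle \hat{u}_t^{i-1}, x_i \rangle - y_i \bigr) x_i
  \quad (i = 1, \dots, n), \qquad
\hat{w}_{t+1} = \hat{u}_t^{n}

% Auxiliary iteration: gradient descent on the expected risk, written in the same n-step form
u_t^{0} = w_t, \qquad
u_t^{i} = u_t^{i-1}
  - \frac{\gamma}{n} \int \bigl( \langle u_t^{i-1}, x \rangle - y \bigr) x \, d\rho(x, y)
  \quad (i = 1, \dots, n), \qquad
w_{t+1} = u_t^{n}
```

The second iteration cannot be computed, since $\rho$ is unknown; it only serves as the reference against which $\hat{w}_t$ is compared.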

28:03

We start with a very simple simulation. We generated a regression problem: we generated some random points, uniformly distributed, and some noisy measurements y_i, then we divided these points into a training and a test set, and we applied this incremental gradient. What one sees is that, if you look at the training error, it is more or less decreasing, as we know: it is an incremental gradient and it is converging to the minimum. If instead you look at the error on the test set, which should be an approximation of our true error, the expected error, what one sees is that after some iterations this curve starts to go up. So we are starting to overfit, and it would be better to stop the iterations early here instead of going on.

[Audience questions about the plot, not intelligible in the recording.] I do not remember exactly; I think it is around 50, in any case less than 100, and I think the training set is 30 points. On this axis is the number of epochs: the first point is the first epoch, and then the error starts decreasing. I do not remember which function it was; it was really just a toy example to show the behavior.

Here is another perspective on a similar example, not the same one. Here what we try to approximate is again a regression problem, but the function can be written as a combination of kernel functions, with, I do not remember exactly, 30 or 40 points. This is the result of what happens after the first epoch of the incremental gradient, that is what happens after some more, and that is what happens after 100 epochs. So the idea is to stop the iteration in order to achieve a reasonable approximation.

OK, so the results. First we assume only boundedness and the existence of a solution, without assuming any source condition. What we can prove is the following, for a fixed step size; as you can see here, the step size depends on the boundedness constant, which is not unknown in general. What we can prove is that the estimator is universally consistent, so it converges almost surely to the true one, if we assume two things: first, the number of epochs goes to infinity as the number of points grows, and second, it goes to infinity not too fast. A comment that we can make is that the step size here is a constant divided by n in this implementation. What this result is saying is that, on the one hand, early stopping is needed to achieve consistency, so this can be interpreted as an early stopping rule, since the number of epochs is not allowed to go to infinity too fast. But on the other hand we need multiple passes: with our analysis, with this choice of the step size, you need multiple passes.

[Audience question, not intelligible in the recording.] Just a brief comment on this. What we proved is that if you take a small step size, of order one over n, and do multiple passes over the data, of the order of the square root of n, then you have consistency. On the other hand, what you can prove is that if you do one pass of stochastic gradient with a larger step size, so just one epoch, then you achieve the same kind of result. So the comparison is useful: why should multiple passes make sense, why should I take smaller step sizes and pass over the data more than
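The experiment described here can be imitated with a few lines of code. The following is a sketch only: the data model, the dimensions, the noise level and all names are hypothetical (the talk does not specify them beyond "about 30 training points"), and the squared-norm bound is estimated from the training inputs rather than known in advance. It runs the cyclic incremental gradient with step size γ/n, records training and test error after every epoch, and reports the epoch with the smallest test error.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mean_squared_error(w, xs, ys):
    return sum((dot(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n, d, noise, rng):
    # Hypothetical synthetic problem: coordinate j of the input is scaled by 1/(j+1),
    # and the true parameter is the all-ones vector.
    xs = [[rng.uniform(-1.0, 1.0) / (j + 1) for j in range(d)] for _ in range(n)]
    ys = [sum(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def run(epochs=200, n_train=30, n_test=200, d=20, noise=0.3, seed=0):
    rng = random.Random(seed)
    x_tr, y_tr = make_data(n_train, d, noise, rng)
    x_te, y_te = make_data(n_test, d, noise, rng)
    kappa2 = max(dot(x, x) for x in x_tr)    # estimate of the bound on the squared input norm
    gamma = 1.0 / kappa2
    w = [0.0] * d
    train_err, test_err = [], []
    for _ in range(epochs):
        for x, y in zip(x_tr, y_tr):         # one pass over the training set, fixed order
            step = gamma / n_train * (dot(w, x) - y)
            w = [wj - step * xj for wj, xj in zip(w, x)]
        train_err.append(mean_squared_error(w, x_tr, y_tr))
        test_err.append(mean_squared_error(w, x_te, y_te))
    # Early stopping by hold-out: keep the epoch with the smallest test error.
    best_epoch = 1 + min(range(epochs), key=test_err.__getitem__)
    return train_err, test_err, best_epoch


train_err, test_err, best_epoch = run()
print("epoch with smallest test error:", best_epoch)
```

Whether the test error visibly turns upward depends on the noise level and the number of epochs; the sketch only records both curves so that the stopping epoch can be read off.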

33:53

once, instead of doing only one pass with a bigger step size? I will try to convince you that this makes sense in the following slides.

That is where the source condition comes into play. If we want finite sample bounds we need the source condition, and in this case what we can prove is that, with high probability, the estimator converges with these rates here, where the number of passes over the data depends on n and on the regularity of the solution. Some comments. The first is that these rates are again optimal in the minimax sense that I showed at the beginning. There is no saturation, so even for r bigger than 3/2 the rate keeps improving as r increases. And what this says is that the only thing you have to tune, the only thing in which you need to be adaptive, is this stopping rule, the number of passes over the data. Also here, as in the Tikhonov case, I can get adaptive results by using for instance the balancing principle.

Now I would like to do a second round of comparison, with one-pass stochastic gradient, comparing with the existing results that are in the same setting, taking into account the source condition. As I said at the beginning, stochastic gradient has been widely studied: convergence in finite dimension for strongly convex functions is classical and goes back to Robbins and Monro, with many other developments later. In the infinite-dimensional case instead, and specifically in reproducing kernel Hilbert spaces with the square loss, I would say that the analysis starts from the paper by Smale and Yao in 2006. There are a few papers with which I think it is useful to compare our results: the one by Ying and Pontil, the one by Tarrès and Yao, and the one by Francis Bach and coauthor.

First, the things that are less relevant for what I am going to say later. This is the only paper which is able to obtain capacity dependent bounds, so it is able to take into account the effective dimension of the problem, interpolating between the finite-dimensional and the infinite-dimensional cases; in the other cases the rates are capacity independent. As you can see there is another difference: some methods use averaging, as this one, and this one does not. And there are different kinds of results, in expectation and with high probability; usually the analysis to obtain high probability bounds is more involved.

Now the only comment that I think is relevant here. As you can see, all these papers obtain minimax rates, so they are indistinguishable from a statistical point of view from the method that I presented. The point is that, in order to achieve these rates, in these papers the step size depends on the source condition, on the parameter r. So the new picture is the following. Basically we have two regimes in which we can achieve minimax rates. One is the one in which you take a small step size, universal, so not dependent on the source condition, and the number of passes over the data depends on the source condition. The other one is the case where you do one pass with a bigger step size, possibly with averaging. So in

39:43

In both cases, what I want to say is that model selection is needed. Here, in order to achieve the optimal rate, you have to select the right step size, and here you have to select the right number of passes over the data. From this point of view, what I am claiming is that our method comes as a natural approach, in the sense that what you can do in practice is divide your data into a training set and a test set, monitor online the behavior of the test error, and stop when it goes up. So I think the main difference is what plays the role of the regularization parameter: here it is the step size, and here it is the number of passes. What I am saying is that tuning the number of passes of the algorithm is natural and can be done [unintelligible].

[Audience question, unintelligible.] Yes. [unintelligible] with a constant step size, a constant divided by n [unintelligible]. For the approximation error, that is, the optimization error, what really matters here is the sum of the step sizes [unintelligible]. From our analysis you can derive some convergence for this case [unintelligible], depending on the source condition. But the point is that we do not get optimal rates, and the step sizes that are derived from our analysis are too small. So an open problem, a future direction of research, would be to understand what analysis allows one to interpolate between the large step size with one pass and the small step size with multiple passes.

[unintelligible] what you have to do here in order to select the right step size [unintelligible]. In our case, at the moment, we are not able to do it; it would be great. [unintelligible] the stability increases with the [unintelligible]. From our analysis you can derive convergence [unintelligible]. Does this answer the question? OK.
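
The hold-out procedure described in this segment (split the data, run passes, watch the held-out error, stop when it goes up) can be sketched as follows. This is a minimal illustration under assumptions that are not from the talk: synthetic Gaussian data, a cyclic visiting order, a fixed step size, and a hypothetical `patience` parameter that tolerates a few non-improving passes before stopping.

```python
import random

def make_data(n, d, noise, rng, w_true):
    """Draw n samples (x, y) with y = <w_true, x> + Gaussian noise (synthetic, assumed)."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data

def mse(w, data):
    """Mean squared error of the linear predictor w on data."""
    total = 0.0
    for x, y in data:
        total += (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
    return total / len(data)

def one_pass(w, data, step):
    """One cyclic pass: a stochastic gradient step on each point in turn (square loss)."""
    for x, y in data:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - step * residual * xi for wi, xi in zip(w, x)]
    return w

def early_stopped_sgd(train, holdout, step, max_passes, patience=3):
    """Run passes over train; keep the iterate with the lowest hold-out error.

    `patience` is a hypothetical knob: stop once the hold-out error has not
    improved for that many consecutive passes.
    """
    w = [0.0] * len(train[0][0])
    best_w, best_err, best_pass = w, mse(w, holdout), 0
    history = [best_err]  # history[p] = hold-out error after p passes
    for p in range(1, max_passes + 1):
        w = one_pass(w, train, step)
        err = mse(w, holdout)
        history.append(err)
        if err < best_err:
            best_w, best_err, best_pass = w, err, p
        elif p - best_pass >= patience:
            break
    return best_w, best_pass, history

rng = random.Random(0)
w_true = [1.0] * 5 + [0.0] * 15
train = make_data(30, 20, 1.0, rng, w_true)
holdout = make_data(200, 20, 1.0, rng, w_true)
best_w, best_pass, history = early_stopped_sgd(train, holdout, step=0.005, max_passes=200)
print("selected number of passes:", best_pass)
print("hold-out error at that point:", history[best_pass])
```

In this sketch the step size is fixed in advance and the number of passes is the only quantity selected from data, which mirrors the role the talk assigns to the number of passes.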

43:57

So, just one last comparison, which is with gradient descent. We know that gradient descent on the empirical risk with early stopping works, and what I can say is that, from the computational point of view and from the statistical point of view, there is basically no difference between the two, the multiple-pass stochastic gradient and gradient descent. [unintelligible] You can prove that the stopping time is the same [unintelligible]: one epoch is equal to one pass [unintelligible], so it is exactly the same stopping time and the same step size. The step size is constant [unintelligible].

[Audience question, unintelligible.] Yes, for sure. At the moment we do not have the answer. [unintelligible] As I was saying, this is an open problem also in optimization: it is not clear what the advantage is of an incremental method applied to a sum of functions instead of a gradient method. What is known, and empirically observed, is that at least when you are far from the minimizer, at the beginning of the iteration, one incremental pass is more or less equivalent to [unintelligible] gradient [unintelligible], as long as you do not enter this confusion region, where [unintelligible] push you away from the minimizer. [unintelligible] Empirically, the incremental gradient has a better performance [unintelligible].

OK, so just to give some idea of the proof. [unintelligible] I am comparing my iteration with [unintelligible], and the bias-variance trade-off is obtained by bounding these quantities here: [unintelligible] the multiple-pass stochastic gradient iteration with [unintelligible] risk. Now, the bias is exactly the optimization error, because I am applying a gradient method to minimize a function [unintelligible]. As I showed you before with the picture, we prove that this quantity is increasing with t and decreasing with n, and the [unintelligible] quantity instead [unintelligible]. These are the two terms. OK.

I prepared just a few slides to sketch the proof. As I said at the beginning, and as you can see also from here, the square loss plays a crucial role, because otherwise we would not have these expressions [unintelligible]. The key is that I can write the iterate at the end of one epoch as a perturbation of the gradient descent iteration, so it is almost gradient descent with step size [unintelligible], but what we have here is [unintelligible]. Just to compare this with [unintelligible]: here I have an operator [unintelligible] and [unintelligible] is an element [unintelligible], and these two quantities are quite complex, as you can see. The same thing you can do for [unintelligible] the expected risk, and you get more or less a very similar expression [unintelligible]. So what you do, basically, is write the difference between these two iterations [unintelligible] by a sum, where the operator has norm less than one, so you forget it, and then you have this sum here, where you always have an empirical mean and the expectation. So you compare the empirical mean with the expectation, and basically you can apply [unintelligible] to this. The only problem is for these two complex terms here, because they are not sums of independent variables; [unintelligible] martingale [unintelligible] concentration [unintelligible], and then you get a bound of this form, increasing [unintelligible] and decreasing in n, with high probability. Instead, the analysis of the optimization error is standard and well known, and you can prove that the rate depends on this source condition and on this sum of the step sizes. The final result is obtained by simply balancing these two terms, and you get the expression that I showed you. OK.

So, the main contributions: I think that we add some results that are the first results trying to explain the generalization properties of some heuristics used for stochastic gradient, more specifically multiple passes and early stopping. And also some future work: I mentioned some [unintelligible]. We started from the cyclic one, thinking it was the easiest; [unintelligible] for the analysis it is the easiest one. But in any case [unintelligible] tried to generalize these results both to other losses and to other sampling techniques, in particular the stochastic one, where you visit [unintelligible] multiple times but you randomly select [unintelligible]. What would be very interesting, I think, is random shuffling, [unintelligible].

[unintelligible] I think the [unintelligible] comes from the fact that you have random training points [unintelligible]. After this, it is just an optimization procedure [unintelligible]: you can choose how to visit your points. You can visit them [unintelligible], or you can choose to visit your points by randomly shuffling them after each epoch. From the optimization point of view, this last policy seems to be the best one empirically. There are some results [unintelligible] showing that it works also in theory, [unintelligible] the constants. What is known, at least to my knowledge, is that the incremental choice of the points is usually worse than the stochastic one, but for [unintelligible] the strongly convex case they prove a better bound [unintelligible].

[Audience question, unintelligible.] Yes, but we do not see these effects here, and I do not think so. [unintelligible] If the optimization procedure is faster, you can stop earlier, but I would expect this more from an accelerated method than from [unintelligible].

[Audience question, unintelligible.] [unintelligible] but you cannot choose it, so it comes with the problem, right? [unintelligible] You have to be careful, because here we are looking at convergence of the [unintelligible] and not of the error; [unintelligible] that you are talking about is on the error. OK. [unintelligible]

[Audience question, unintelligible; it mentions using one pass.] Yes, but the point is that you have the noise, and the regularization comes from the early stopping. The idea, which is well known in inverse problems, is to exploit the regularizing effect of iterative procedures, so you can interpret them as regularization procedures, and early stopping allows you to get a form of implicit regularization. [unintelligible] so that you can choose and then cross-validate also on this parameter [unintelligible].

[Audience question, unintelligible.] Sorry, yes, of course. I compared my results with [unintelligible] the online setting where the horizon is known, because [unintelligible]. But of course [unintelligible] these quantities here are constant; in a true online scenario we would have something that varies with [unintelligible], so it would not be constant. [unintelligible] You could do it, but I guess that would be [unintelligible].

[Audience question, unintelligible; it mentions cross-validation.] Yes, you have guarantees: you can achieve the same rates with cross-validation. For [unintelligible] you need a balancing principle, but since the stopping time is the same both for [unintelligible] and [unintelligible], you can [unintelligible] with this and use cross-validation [unintelligible].
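
The bias-variance split mentioned in the proof discussion above can be written down in one standard form. This is a sketch under assumptions, not the exact statement from the slides: $w_*$ is assumed to be a minimizer of the expected risk $\mathcal{E}$ in the hypothesis space, $\Sigma = \mathbb{E}[x \otimes x]$ is the covariance operator, $\hat{w}_t$ is the iterate computed from the sample, and $w_t$ is the corresponding iterate of gradient descent on the expected risk. All of these symbols are introduced here for illustration.

```latex
\mathcal{E}(\hat{w}_t) - \mathcal{E}(w_*)
  = \bigl\| \Sigma^{1/2} (\hat{w}_t - w_*) \bigr\|^2
  \le 2 \underbrace{\bigl\| \Sigma^{1/2} (\hat{w}_t - w_t) \bigr\|^2}_{\text{sample error}}
    + 2 \underbrace{\bigl\| \Sigma^{1/2} (w_t - w_*) \bigr\|^2}_{\text{optimization error}}
```

The equality uses the square loss, and the inequality is $\|a + b\|^2 \le 2\|a\|^2 + 2\|b\|^2$. The stopping time is then chosen to balance the two terms on the right.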

### Metadata

#### Formal metadata

| Field | Value |
| --- | --- |
| Title | Generalization properties of multiple passes stochastic gradient method |
| Series title | Computational and statistical trade-offs in learning |
| Part | 10 |
| Number of parts | 10 |
| Author | Villa, Silvia |
| License | CC Attribution 3.0 Unported: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you name the author/rights holder in the manner they specify. |
| DOI | 10.5446/20848 |
| Publisher | Institut des Hautes Études Scientifiques (IHÉS) |
| Release year | 2016 |
| Language | English |

#### Content metadata

| Field | Value |
| --- | --- |
| Subject area | Mathematics |
| Abstract | The stochastic gradient method has become an algorithm of choice in machine learning, because of its simplicity and small computational cost, especially when dealing with big data sets. Despite its widespread use, the generalization properties of the variants of the stochastic gradient method used in practice are relatively little understood. Most previous works consider generalization properties of SGM with only one pass over the data, while in practice multiple passes are usually considered. The effect of multiple passes has been studied extensively for the optimization of an empirical objective, but the role for generalization is less clear. In this talk, we start filling this gap by studying the generalization properties of the multiple passes stochastic gradient method for least squares regression in an abstract nonparametric setting. We show that, if all other parameters are fixed a priori, the number of passes over the data indeed acts as a regularization parameter. The obtained bounds are sharp and match those obtained with other regularized techniques such as ridge regression. |
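
As a reading aid for the abstract, one common way to write a multiple-pass stochastic gradient iteration for the square loss is sketched below. The notation is assumed here for illustration and is not taken from the talk: $\gamma$ is a fixed step size (with the factor 2 from the gradient absorbed into it), $n$ is the sample size, $P$ is the number of passes, and the index $i_t$ cycles through the data. Other visiting orders, such as random selection or shuffling, are possible.

```latex
\begin{aligned}
\widehat{\mathcal{E}}(w) &= \frac{1}{n} \sum_{i=1}^{n} \bigl( \langle w, x_i \rangle - y_i \bigr)^2, \\
w_{t+1} &= w_t - \gamma \, \bigl( \langle w_t, x_{i_t} \rangle - y_{i_t} \bigr) \, x_{i_t},
\qquad i_t = (t \bmod n) + 1, \quad t = 0, 1, \dots, nP - 1 .
\end{aligned}
```

With $\gamma$ fixed a priori, the only free quantity is $P$, which is the sense in which the number of passes can act as a regularization parameter.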