Merken

# Algorithmic and statistical perspectives of randomized sketching for ordinary least-squares

#### Automatisierte Medienanalyse

## Diese automatischen Videoanalysen setzt das TIB|AV-Portal ein:

**Szenenerkennung**—

**Shot Boundary Detection**segmentiert das Video anhand von Bildmerkmalen. Ein daraus erzeugtes visuelles Inhaltsverzeichnis gibt einen schnellen Überblick über den Inhalt des Videos und bietet einen zielgenauen Zugriff.

**Texterkennung**–

**Intelligent Character Recognition**erfasst, indexiert und macht geschriebene Sprache (zum Beispiel Text auf Folien) durchsuchbar.

**Spracherkennung**–

**Speech to Text**notiert die gesprochene Sprache im Video in Form eines Transkripts, das durchsuchbar ist.

**Bilderkennung**–

**Visual Concept Detection**indexiert das Bewegtbild mit fachspezifischen und fächerübergreifenden visuellen Konzepten (zum Beispiel Landschaft, Fassadendetail, technische Zeichnung, Computeranimation oder Vorlesung).

**Verschlagwortung**–

**Named Entity Recognition**beschreibt die einzelnen Videosegmente mit semantisch verknüpften Sachbegriffen. Synonyme oder Unterbegriffe von eingegebenen Suchbegriffen können dadurch automatisch mitgesucht werden, was die Treffermenge erweitert.

Erkannte Entitäten

Sprachtranskript

00:03

I mean I'm longing only with US President thanks very much to the organizers for inviting me it's a real privilege to have the opportunity to present here home thanks to everyone for coming on said to be talking about and statistical and computational or really perspectives of randomized sketching younger than surge in particular kind of trying to serve unifier explains that 2 different perspectives of Saskatchewan which at which I introduced the kind that the broad focus of this this workshop is the idea behind when you have wide-scale data problems in machine learning statistics optimization is science you can never looking at me either statistical computational a celebrated perspectives of that and that's the real issue in many kind of modern life skills their problems and and an ideal world what you what you really want is something that new hires only weakened is computationally feasible the unique can run whatever computational resources you have available ,comma but then also has kind of good performance you know and some statistical or other other rhetoric that you might be interested in and I mean this many different ways that people have done and without you know a few different approaches like this workshop will help here more but 1 approach that's kind of received on a fair amount of attention loss of 5 to 10 years is this notion of sketching and also all defined what that precisely means in the context of the problem that I talk about what kind of sketching refers to is if you have a very wide scale data so Hialeah very large-scale the kind of Europe sketch are projected onto something one-dimensional and then run whatever algorithm you going to run on wide status said but are now sketched a smaller dataset that then you kind of world you know that that's 1 way to certainly improve willing reduce computation because you reduced the size of the data and ideally you sort of have not lost too much for computation from from statistical formula accuracy perspective on answered I mean this not like how you do this sketching so of it depends on the on the problem and but there has been and this is a very very sparse sampling of the sketching was been lots and lots of all the work done but here's the few relevant examples but you do particularly like and things like randomized projections the subsampling there are some results to show that you know In substance you you gain computation because even reduce the data size and writer of aluminum much smaller dataset but also potentially not lost too much in terms of in terms of accuracy all this is so broadly based on ideas you'd like to 200 Johnston the Johnson wasn't Lindenstrauss limit for dementia reduction and my concentration measure in particular for ordinary least squares which is what I'll be talking about today all those results needed to show that nearly potentially don't lose much of computationally all will somewhat related to the CRD composition also it's kind of a attracted to like SPD type become the compositions In multi tomorrow or in some kind of lower dimensional reduction techniques but using kind of Nunavut random projections to speed up computation and will also be in this kind of 1 of what and iterative sketching on for a kind of know if you wanna solve linear systems using spectral sparse occasions here just a couple of examples where this idea sketching or reducing the data size is potentially going to say allow us to do yeah Foster computations for things without losing too much serve on a U.S. Justice explaining the genesis of this from service In some ways like this this to serve respects is listed this like a lot of the white that's been done on sketching on prior to this has wisely been in the applied not theoretical computer science which some kind described the Aldrin maker computation perspective getting to people like Michael Mahoney dance film and others have had wiped out worked on this a lot and that the general principle what it is that you know by doing sketching you still get so close to optimal West case error bounds and all have defined what that means for this particular problem although even even by doing sketching so kind of Munich gaining sincere you're getting a lot computationally and basically not losing much from accuracy perspective and this it's some kind of results that that sort of supporters but survived not so originally trained in statistics and use the thinking about things in a successful way and so when I think about sort of the sketching universe sketching tight methods that are that being talked about in which you hear that in some sense you're throwing away a lot of data sir and I'll talk about exactly how much for the problem that that all focusing on their own like and this is a lot like more than half your data basically and in some sense what what what these results as saying is you're not losing much by kind of throwing away a lot of good data said this didn't really start make much sense to make his honestly if you have more data you're going to know it if you have like less Hockaday should be losing a lot of it and so this was something that at some level when I started working on this problem in Elliott's 2013 as did really understand you know why wish you gain something so incitement in the White would not losing too much inside the goal of this talk is baseless canard unified these these 2 perspectives since they identified as a ran into Michael at said at a conference at the beginning 2013 instead of having read some of this yeah this is what unsteady and so do really understand exactly the sort of how you know you're not losing too much on and will see that the answer to that in the context of this ordinary least squares off a regression type problem is that its system so it's were coming out from from a couple from 2 different perspectives so we had a lot of long discussions and arguments about this and I guess eventually we just decided to sell result majors Raheem paper together we tell which of about serve justice of the concrete about the problem that I'm willing and so this is kind of the simplest problem that you can think of that is still so very widely used its this year notion of doing we squares for large-scale problems and analogously to solving wide-scale when your systems also fretting about on really squares you have united to the data like your in your wiser and by the matrix and wise and dimensional vector on to beat to make things country were assumed the boys and the extremely large but ends much much larger than the on and for simplicity and without loss of generality resumed with the rank of X is pleased so you can kind of you easily apply this if you have Lawrence structure but Britain assume for simplicity that that the X equals the answer now we also our household you order really squares of problems so all you this is and and this is something that can be done but obviously the computational cost of that is order in the square which is you know perfectly reasonable and in many cases the when you have winner in extremely light large-scale data saying all this is no potentially you not not feasible and particularly you know if you're doing we squares iteratively you in some sense ideally would want sort of potentially reduces computation yes although this is a very simple

07:28

problem that you know some sense you'd think is or sold is as good as can be described in lots of earlier talks Sir Alex forced upon this in his pocket and minutes of pensioners that in some sense a lot of problems reduced to sort of solving the least squares problems any gains you can get a computationally to to solve that all our there certainly beneficial so the idea of doing sketching in the least squares senses is the pretty simple you apply song sketching matrix M S O S is this are by and matrix are as much much smaller than in and then you on you're going to do your least squares but now on your sketch status have assets and as wise enough Essex's are by and our Y is just an hour dimensional vector and urinated estimated that's kind of year based on minimizing this this least squares problem against the because computational complexity of this is Order of squared plus whatever computation was involved in getting that sketch so if cut could come up with a computationally efficient way to get that sketch then you can potentially know do you can potentially save a lot of computation by doing this and all and that's what we're about what a reasonable choices for sketching matrices later on but the idea is that you have some sense what you're beta data has to be a reasonably good approximation debated it all less under some metrics and all talk about you know what matrices and so how the choice of the metrics as is really important although so this is so the prior work and you know what was basically saying that U.S. sketching wax really well so this is not wide by tenacious and money back in 2011 on which we too base was supported the idea of doing sketching for police squares was basically you know you're gaining you're getting a lot and not losing much from it from accuracy perspectives so busy where you see him as you assume that wide of his white why an extra fixed and you know that they can be whatever whatever they are and what a reasonable way of measuring on you how how much you lose by by doing your own but but buyout by doing your sketching is you can't compare these kind of residual I mean squared so you know why minus expected squared divided by some of the original or so that the accuracy of the original all less estimated and here I am married to major West and Davies a prima all the cruelly and you question have is 1 take a supreme over acts as well and for most of the results I talk about you can apply that that is the only 1 result where it gets a little bit complicated that said essentially can assume that is so any any West case or winning any West itself and what the results show is basically that essentially you get like God that this this quantity is only serve 1 was dealt a wise than the original than the originally squares estimated so that's why it in some sense then and and this was that this was for various sketching schemes and although straight some of them shortly like you're doing random projections are going to some sort of sampling types sketching skin so this is kind of the computational our oral agreement perspective which was saying that you're not really losing anything by doing by doing sketching RIA and Sinaloa home so like going back to the original goal like this is something I didn't get 1 of the study's results like community throw away a lot of your data are much much smaller than and so you would think that you you'd be losing by taking serenely on like a fraction of R and samples they must be losing something and yet you would according to these results no really losing anything Celtic constant you're doing just as well as you were before and so this was something as wasn't wasn't really understanding so I'm used to start thinking about things to be question was a position he has held yes said the conditions said the condition the only condition you need is that the on yesterday the dog within our has to be bigger than like P 1 P so that was the barriers so that that's the only result is that odds with than and people in the city kind of like you're getting enough of the of the roads so that you're you're getting most of year following answered so the land so used to thinking about things and this is really the of the ways the statistical assistance pursuant to think about things to be too easily in China solely squares this comes from some underlying generated generative models like with to like a gas cylinder models young this year the simple gossamer model we have some true prominent data I mean you're adding some noise which you know we're assuming for simplicity's isotropic but that's sort of a fairly easy assumption to relaxes uniting 0 mean noise that's isotropic on and then others at a number of different metrics he can choose in a minute talk about 2 particular metrics Tier 1 and instances it to be leader you often use talking about efficiency when you're comparing estimators so you can think of your sketched a least squares on solution as an estimator and your reading original or less salt solution is an estimated 2 and so 1 of the best discussed metric which looks quite similar to the original 1 is the so-called residual efficiency which is just kind of comparing the residuals but the difference is that you're not taking expectations in its expectations being taken over the noise epsilon so it taking in expectation of the noise of Swan and this is how you're kind of comparing the East-West made is the 2nd 1 which is often what statisticians were more interested in is the so-called prediction efficiencies recent argument prediction error which is like expected at minus Baidoa Bay is the true parameter and all and we're in the top and and this is the 2nd major Korean organization when will say that there is busy going to be a very important distinction between these 2 these 2 metrics so ideally 1 at a time and good sketching schemes that sort near do well in terms of all 3 of these these criteria but are you did this to a released to school types a criterion in the original on what workers and the previous slide was a community hall renege on our computational criteria I guess and I'll talk a little bit about what sketching schemes are actually typically used in practice and so there's a number that you are are widely used essentially has 2 classes sketching the stage for a common there there there but based on kind of randomization there Alsace some determining who sketching schemes that are suggestive of computationally randomized sketching savings schemes generally work well so was based on skates sampling and in particular about talk about what leverage sampling is on a shortly on but yeah Sam trying to sample and you know in a way that's that surrounds the good thing about sampling Rosa poems of doing something more clever than that and the other is the approach that that's widely used is random projections they can imagine if your asses in our random Garcia number really matrix and started do a projection onto using that and then there's also a high-demand projection that as it's a different kind of production that

14:56

essentially has 2 classes of skate randomized sketching the widely used and all the focusing on these costs of sketching either doing so sampling or doing all doing projection I guess I just I mentioned this notion of leverage score sampling so this is not something that's gained particularly by about 1 million others spend a lot of the attraction of a loss may be the 5 years of service in the theory of computer science and machine learning shirt on and so what we courses so we're thinking about sampling kind of your your ex's natural phrase the thing to do would be just a sample uniformly and was that works OK but on like essentially more serious with some simulations later turned on if you do sampling in it in a more potentially clever where more efficient way that picks the samples that you do best in terms of mean squared error you can potentially you know do do more with less samples and so 1 way to do this that's that's sort China's depict the samples that have what are known as high leverage cause that explain high leverage yet as you can bring about the singular value decomposition of excellence music of the transporters and the points the so-called like have high leverage scored if you imagine that you know you have your own you know your new matrix which is essentially the right singular vectors and if you take it to the norm of these of the writers of the singular vectors then on these award a so-called believers scorers in these numbers between 0 and 1 such that the sum of these and leverage costs add up to pee on and in some sense the idea of where its core sampling is the you pick the the rule is that have the highest leverage scorers OK and so at an end and so this is this is a typical knew that the it's been shown that if the points with highly region some sense of best in terms of hand of reducing of the mean squared error that we've talked about season in simulations attend but tend to work pretty well I guess on another southern noted that this kind of quick terrorism from what I'm talking about but it's also it's also related is that in general this concept of leverage scorers and points with high leverage of like that actually goes back to somewhat can the above statistics of community in the 50's and 60's and in fact like if you have you presented this idea presented by sampling points that high leverage to statistics community often say that that's sort a really like not not a smart thing to do at all and the reason for that is because In some sense typically points of high leverage it reflects all you know reflect points that may be outliers and so it is nonsense like this then and I an answer this is being proposed as a scheme to include points that have been produced a mean squared error but a statistician my often often-stated European those points because they have because they're outliers and this idea and also illustrates maybe a different perspective in terms of how a statistician might look at high leverage points and someone in the numerical linear algebra computer-science because some sense the high leverage points are best in terms of reducing means squared error because they if if you've yearly squares problems having like no noise no allies than those those points are the ones that are in some sense contain with information and so they skew your your mean square estimate that you're you're off on really squares estimate the most but on the other hand if employer fitting is instead the fact that the allies the noise which is what did really people statistics care about then you're essentially just unit by senior year estimated in completely the wrong way said that some of I noticed and others that the difference in perspective but for now I'm concerned this is but for now focusing on the fact that these in some sense a models right which is obviously very strong assumption but you know we you know that that song then so were fitting days we're picking these leverage points because they reduce the overall mean square today while any questions at this point and so on a national question you might somehow related to this is that in our original goal was to reduce the computational on really squares and in some sense to get these leverage scorers going use for sampling we at the computer which is the same computation is on really squares so you know why we doing this and we gaining anything and so there is some notice some wiped that's seen and done by a tenacious and others that shows the Mexican Pepys leverage scholars that might not be exactly response but approximately rich guys barely efficiency at the daily efficiently so with computation of order and he's so you can potentially do this computation fairly efficiently as well so that's why you do it like doing this computation is isn't sort of you know you're not just new to solving the original probably something that's hard you will lose so this digital listen only to look formula all of the soldiers who moved here on yesterday that's a good question and I don't know of any scheme that allows you to do online because to complete the letters to the leaders of the good yeah yeah yeah absolutely so that's 1 of the IAG agrees 1 of the potential weaknesses that if you are doing this in an online Wyatt think there's been any work on and I don't know how you do that his use of a need to know all the all the roads a case of the home a sense of building up to the Nunavut the goal is to say explain these 2 different studies to perspectives are answered just starts to want to serve up to that there's a slide that are in those talks about how you can relate the the White House these to metrics perform or how these 3 metrics performed in terms of near properties of the simple projection matrix of doing it when your solely on really squares and recall we had the SPD of excimer previous life here just solving least squares such a projection onto plates projecting Wyant delivery spaces of X and so what if if we do sketching them where doing is rather than doing a projection onto the race of extra kind of doing a projection onto using this so blade a bleak projection matrix is a projection matrix so you know if you square you get back what you you had originally but then you know it's not it's a wake in the sense that it's not that you can't wake up to take a transfer was not the same thing answer on this season related to quantities that involved the Saudis simply projection matrix and is the main thing to take away from this is that if you look at these last 2 statistical metrics than in some sense that on you say that you not to constant that this prediction efficiency kind of scales like and I repeat times the residual efficiency that in a nutshell is expects explains that the difference between you know these 2 respective send why you potentially will do very well in terms of West case efficiency and residual efficiency but not so well in terms of production efficiency which wiped which is what kind of statisticians would potentially be most interested in

22:22

and services I guess the the main result probably the 1 thing that should to take most away from all this talk if we look at a comparison of these results in search of remember that P is kind of the way dimension of Europe really squares are as the number of rows that you've sketched and end is kind of big dimension and the number of samples that you have on if we look at on these at 3 particular us getting scheme so as stands for a kind of leverage scorers in with played against the that's that's with some sort of correction for the bias STP is kind of a gas interests of just-in-time production and the rest of it it is just had a marked reduction on but for all of the sketching schemes he stated for the worst-case and for the residual we say that we get can't A-1 +plus appear over our time so as long as late you are is a little bit bigger than P which is you know like P 1 P relating to what someone asked on then you can save you get the performance guarantees like White House imagination money we're getting our but if you look at the prediction efficiency you get something that's very different which can means that it which is consistent with what minimize intuition was initially that you're likely need are to be of water and even get close in terms of performance or according to this side of the coin to this metric which is what Stateshas statisticians to the carrots in some sense getting isn't really helping you in any reasonable way of if you kind of looking at this prediction efficiency metrics even very and what's fall from a West perspective it's not working well at all in terms of our in terms of production efficiency you have to look that's going going to that on the last leg idea that's a good question in this area so this is not yet pointed out these issues upper bounds on the sketching schemes like other law abounds and yet like there are handled all talk about it yet the the situation in which some say and get high probability results on my suspected probably years player we haven't done that but I certain maybe analysis lot easier to not do that but I think you can probably get away with that new things change that in his arms To like you mean to like to this idea think it would change something better than so I guess how long will soon out talk about low abounds in the end that suggests that you can't really do that on the hand must always have very specialized faces and I'll talk about that long on the next flight but general in in general you can't really do better than that again so this is like just a different from up around and and this is for a particular sketching scheme that I guess is not traditionally used by in a sketch in which forbid that this is on it's sort it's trying to take advantage of the fact that you potentially have wavered score distributions that you can you can take advantage of an exploits on and near that also supports some of the simulations also show and it shows you can potentially do better in terms of some of the statistical leverage cause if you do this leverage or sketching without doing a kind of correction provides on and so you get the KPK years typically be less and because it's taking largest start and like the quietest scale and so you're going to potentially be better using the sketching skin but it's under this very very strong assumptions again justice

26:06

to show some simulations opted to start compare difference sketching skins also to compare the of the different different metrics this can be 6 different sketching explains that he is the 1st 4 to on near solid sampling type approaches that I talked about and lost last couple are related to to you all To projection type approaches story doing this is not a very large status adding that have got some simulations and alive today as saying that you have got any calls a thousand people 50 and we very our own we you are as varied as will from you know around maybe 50 to about 2 about a thousand Missouri with everyone on the period P small and all and we're gonna sample are uniformly our according to wear user multivariate t distribution and the reason for that is because I talked a bit about leverage sketching Weaver's sampling and so why what this allows you to do is still allows us to generate X that has different leveraged score distributions of some acts that has had a very uniformly leverage scholars and summits that time search wariness that used to distributions with 3 different not tied 3 different not falling numbers of degrees of freedom of 1 2 and 10 degrees of freedom home and we're compared the 6 different dog 6 different dust sampling sketching schemes so in particular just illustrates why we're using these different distributions are this is a plot of what the leverage scorers look like for these 3 different studies 3 different values so in some sense that the intuition for this is that if you have a new equals 1 that's a very heavy tail distribution and so that means that you're going to have some sense more non-uniform with the new Eagles 1 has a very heavy tail distribution on Gaza and dividing have fewer degrees of freedom so in some sense you have more of this does not mean on the like of this less uniformly risk or distribution which is what you see here and for new equals 10 kind of getting close like almost revision as a very not very are uniformly Rich score so just comparing the default performance for different choices of In several again

28:22

answered to these standards crowded but if we look at these 6 different 6 different sketching skins on so the colony in 1 of the main points to take out of this and wife told you particularly relating to the results is that you look at the scale of this service is looking at the algorithm make or computation respectively wouldn't be the Y axis you everything's pretty close to 1 or more particularly for the 1st 2 cases everything's have fairly close to 1 Surry near this supports the theory that you're kind of women 1 was Delta if you look at the on and the west face of the worst case type performance that journalists and Miami were dealing with and then on the other

29:04

hand if you look at the US statistical perspective which is you know what you know about that .period metric that I talked about the prediction efficiency on Wednesday that the waxes years substantially larger than 1 because in some sense you're losing in terms of like Bye Bye you're losing in terms of efficiency by having known by having less stated to do it so this is kind of just distance to shine as the plots for when we look at these 2 different metrics they serve on and and and so we can say that our like injection in general produce like sampling schemes and possession protection schemes Nuremberg the seemed to work reasonably well on but a bit but in particular like the reason why assured that that 2nd result was that that seems to do the best are computationally even tho this does not not make not much in terms of theory for that said this was kind of doing these leverage score sketching schemes on without doing any kind of bias correction it is ludicrous to green that's just that's what destination originally do which is leverage score sketching the meadow bias correction a rescaling so that is in some sense what what's typically being proposed and has the most theoretical guarantees but it doesn't do so well in practice the Red is doing that but without doing the bias correction or a scaling so that's actually not required that the reason we do that is because that's not proposed in theory and has very little theoretical work is typically typically like when you do worry skills when you're when you're sampling state and you take our samples you typically need to Rexdale things so that you know you're not like so you In some sense correct for the fact that you have West was by the you've got less samples might multiplied by scores of irony I'd said to make sure that that everything is under control the same multiplied so so if you if you take our samples uniformly you need to multiply the square Nevada and to make sure that everything can worked out and so what this is doing on what but but for leverage scars in memory of to do things a little bit differently because every point has a different leverage score so instead of dividing by squared of flight are and like squared event arise like squared inflict and whatever leverage scorers "quotation mark situation yes exactly yeah that's exactly right to get the correct expectation but what's interesting is that according to the simulations you do a lot better if we don't do this correction most many many times and so that that's otherwise included this and that's why we had that 2nd of theoretical result because knew we wanted to get some theoretical justification for why was doing so much better in terms of the simulations that a man has to be .period al-Sadr having this works very well under this model but under model Miss specification it's unclear how well how well this window this would really do you have to get here but if the people in football Smith said the yeah exactly so you have a bath metrics screens much worse than than red in general but there's there's more theoretical justification for green and red all men in general like doing sampling as you might expect is less stable than doing projections this with projections the kind of keeping all the data but retaining it wears for a referral for sampling the kind of losing a lot of losing a lot of data and in the end it is you expect the performance also was 4 or more unstable for the Nunavut is 1 which is the the case we have more like non-uniform near in your leverage courts against a gang

32:54

back to differences are "quotation mark earlier questions sir on this is actually like it was done subsequently by quantity and Wainwright said we suspect that there was a lower bound but I guess when we're trapper we discovered that plant in Wainwright been working on this side of independently and they have done that done a bunch of other things too but if which looked at 1 of their results and you was written in this this way exactly but it sounds as if you assume that you're sketching matrix has this property services for randomized sketching said to be clear this expectation is taken over however asked over the sketching scheme that you're using and if you can't assume that it has this property which is more or less just says that you've got kind of art are raising it in your sketch if it has this property than because of like the prediction of fiction efficiency scales no better than an are basically said this suggests that you know all those upper bounds we had cannot really be improved by by using a different sketching scheme or that new analysis was not was not really was not really it's tied to this kind of really confounds knew what we we have a done with that tone In fact steel this prediction efficiency which is what new statisticians there to be thinking about is solved it is an intrinsically hard metric and what was kind of being looked at in the Europe in the algorithmic computational or in computational setting and this this was proved using pretty standard information theoretic law banning law banning techniques they answered on service and and this this this conditions kind satisfied for all of the young all of the answers that we talked about except for that I guess so

34:41

the on the red line years the red line saw that we

34:44

actually got a better result for for 4 for this particular case and that's because it

34:49

doesn't satisfy this saw it does not have satisfy this condition but all the other wines that I plotted basically satisfy this condition and so on and so did I guess the white debated also kind of went so well was basically saying is that during 1 sketch of your data when you do on really squares or even constrained least squares is really not enough if you want to do well in terms of the statistical metric and so what they had proposed is if you do iterative sketching schemes are and this is computationally more intensive because you're taking and warned sketches on if you do iterative sketching then can potentially so they get .period the original like this 1 was dealt a time down that that you want but you have to do kind of iterative sketching someone Saskatchewan is enough when you we're doing least squares in your interest in this signs prediction efficiency the is taken lawfully to rescind the law so look to those holding people each year home of this is that you have to go to the well so you can make this 28 bigger and make that half smaller so it's later you Italians lacks kind of you get like this Constance here in the air like a one-liners Delta and then it's like an area like that 128 depends on 1 of the probabilities so if you want to make that figure you you can make this close to to the 1 against said yes that's that's to be layout yesterday in his constant severity would exploit the place for the upper and lower bounds at the upper bounds of based on my conservation measure type arguments in his law abounds in using information theory techniques but here the constants really really lose lose can come in questions right arm

36:42

surge justice concluded and to talk about like this is over you know we looked at a very simple problems number of ways that this white guy can be extended search so this analysis shows is that it for interested and this year we had these 2 perspectives be in computational logarithmic perspective saying that in doing sketching 1 stars mirrored that preserves most of the information you want to preserve and and and does well but then you know it .period like never came out thinking that you're losing a lot data and said this was resolved by justified the word means in some sense would need different print different criteria and different models and so in terms of production efficiency which is what you care about the school setting all this is substantially more challenging than the standard kind of all the maker or west-facing on and then there was this other said interesting they make came up that when you do like this leverage scored sampling and without correcting for the buyers to make the expectation that it seems to do well from from a computational power from from simulation performance perspective on the Yemenis as point out there and new relating to some of these questions along a number of ways this why can be extended served almost flat is the online case of this disarmingly what's in the in the battery case a lot of these are what have these ideas but a few potentially didn't all year is is there a way to extended to to the online not sitting so this is this is kind of the simplest problem that look at this on really squares prominence what with yeah right now is when you kind of have like wearing matrices tenses like there's this notion of our doing sketching to come up with like an approximation to the SPD incident call this EUR decomposition it's a kind of doing words around our doings analysis or of offer that arm then and there's also a perspective of sketching that's related to regularization implicit regularization work and were exploring this saw this connection and also with the weather weather-related sketching can can in some sense recovery like were ranked tensile structures on general tenses are more difficult to deal with them than vectors her matrices and saying it sketching can potentially give you kind of computational gains there if you do the sketching in a in a known and reasonably so clever and efficiently thanks thanks for much thank you go to for his he sued the letters were sent to the resulting in the history of the simulations destination and then you have fast version of 1 the Livery year area yeah exactly so that's this as as a job which is certainly paying the 2 years of the Soviet below the sorrow of last year you have a lot of years and I went on the air so that that also does better in years as a bad does better than the S R 1 which is fast so the SNR 1 of Boston's better than the first one to this but also yet exactly so he is the best game testimony deliveries were the priceless agency you do what you do West than if you knew him exactly but the thing that the goal of the group unless you prove that you have to do this all the jail within this 2 years of moment in it and a reasonably well that's tank but here they feel compelled to do so you yeah that's rights obtained get inked agreements at the time of year that which which will like and that's folly because of that fact that on like you're doing some kind of regularization by doing the approximate and actually does better which is interesting because this is the statement to the fact that you almost to the point that the opposition was yeah that's all there is to the United think it is because it is a regularization thereby argument that right out of something so those only we tried to do and I guess that was what despair was kind of trying to get out but we need to assume something very specific and so we're trying to do that in a more formal way but this is not a little less close to the president of the school year is yet that exactly that's exactly right so that's exactly why does terribly when you have a bearing on you because you're dividing by something that's potentially requested a delay was like this what inflation aware pollution woes this morning is the only losers yet it's terrible so that's and yes that song I mean like 1 way that I mean like other people try commenters they'll do like when combination of leverage score was uniform the Frederick on that but it's still so that seems to still do worse than just not doing any reason resembling all waiting at all there is to which is almost politicians remove the watch for forms of the area that's right from the start of the you don't know what I did as a as I understand that's true yeah doing the random rotations to yeah that's as I understand it would offer little help it it expected to the 2nd militaries households in but this year political issues locations in the the problem of loopholes in the law of in world is in the hands of those hold although this year as yet I don't know that there's a fostered the the final around the world do you think people Over the years regardless of his follows from this school year and Wisconsin also when you say breakable you mean certainly was a legal battle the gun in hand was not mine sometimes said that there is a terrible like so when you say area and that those politicians who was in the energy users a like this and we will do yeah scissors only like yeah the law selling under this assumption search yeah you can potentially yes to half-life injury like a guest

43:44

for example interview if you the other levers distribution which is very like non-uniform than you can say you can get very cloistered together you like a apart like a might like the much much better than Wall abound but on this again this is under very restrictive assumptions are basically what it's saying is voluntarily risk masses in like TV if the samples then you can kind of get those piece samples and that takes care of everything but yeah is not like often satisfied here so that's kind of yeah said independent think what year it was Eurex matrix is like that for a general Accident if he can do much better than all of them the question is how was it with in which and artisans society assumption on so the intuition is that essentially like a few it is you have for example like like anything about an example where P leverage scorers have like the weird scored 1 and the rest 0 you could just take all those people scholars and it's like the rest of the roses 0 answered that would like essentially this would don kind of have a candidate like this SS transfers matrix would have a new very different condition and is what this is basically saying is effectively getting are are an infraction of the of the leverage score awaits answers if you don't do rescaling you're getting where you potentially getting 1 weigh less than that but the

00:00

Resultante

Matrizenrechnung

Einfügungsdämpfung

Subtraktion

Minimierung

Term

Punktspektrum

Stichprobenfehler

Ausdruck <Logik>

Übergang

Algebraische Struktur

Multiplikation

Rangstatistik

Perspektive

Reelle Zahl

Lineare Regression

Stichprobenumfang

Randomisierung

Inverser Limes

Vorlesung/Konferenz

Grundraum

Einflussgröße

Zentrische Streckung

Parametersystem

Statistik

Güte der Anpassung

Vektorraum

Physikalisches System

Schwach besetzte Matrix

Fokalpunkt

Ordnungsreduktion

Konzentrizität

Quadratzahl

Sortierte Logik

Projektive Ebene

Ordnung <Mathematik>

07:27

Mittelwert

Resultante

Matrizenrechnung

Einfügungsdämpfung

Gewichtete Summe

Punkt

Natürliche Zahl

Wärmeübergang

Singulärwertzerlegung

Kardinalzahl

Extrempunkt

Fastring

Raum-Zeit

Computeranimation

Eins

Numerisches Modell

Einheit <Mathematik>

Prognoseverfahren

Gruppe <Mathematik>

Ausgleichsrechnung

Randomisierung

Bestimmtheitsmaß

Vorlesung/Konferenz

Punkt

Schnitt <Graphentheorie>

Auswahlaxiom

Haar-Integral

Inneres Modell

Distributionstheorie

Zentrische Streckung

Bruchrechnung

Parametersystem

Statistik

Approximation

Physikalischer Effekt

Kategorie <Mathematik>

Kraft

Singularität <Mathematik>

Güte der Anpassung

Rekursionstheorie

Biprodukt

Residuum

Approximation

Sinusfunktion

Konstante

Rechenschieber

Arithmetisches Mittel

Ausreißer <Statistik>

Lemma <Logik>

Sortierte Logik

Konditionszahl

Projektive Ebene

Reelle Zahl

Ordnung <Mathematik>

Subtraktion

Ortsoperator

Fächer <Mathematik>

Inverse

Klasse <Mathematik>

Zahlenbereich

Geräusch

Transportproblem

Term

Obere Schranke

Stichprobenfehler

Ausdruck <Logik>

Erwartungswert

Perspektive

Form <Mathematik>

Endogene Variable

Stichprobenumfang

Lineare Geometrie

Gammafunktion

Beobachtungsstudie

Schätzwert

Matrizenring

Linienelement

Stichprobennahme

Schlussregel

Vektorraum

Quadratzahl

Surjektivität

Residuum

Quadratzahl

Normalvektor

Rangstatistik

Term

Numerisches Modell

Grenzwertberechnung

22:22

Mittelwert

Resultante

Distributionstheorie

Theorem

Subtraktion

Wasserdampftafel

Hausdorff-Dimension

Zahlenbereich

Extrempunkt

Term

Gesetz <Physik>

Computeranimation

Gebundener Zustand

Freiheitsgrad

Prognoseverfahren

Multivariate Analyse

Perspektive

Konstante

Stichprobenumfang

Bestimmtheitsmaß

Gleichmäßige Konvergenz

Auswahlaxiom

Analysis

Inklusion <Mathematik>

Beobachtungsstudie

Distributionstheorie

Zentrische Streckung

Statistik

Linienelement

Physikalischer Effekt

Singularität <Mathematik>

Indexberechnung

Paarvergleich

Biprodukt

Frequenz

Quadratzahl

Flächeninhalt

Sortierte Logik

Ruhmasse

Projektive Ebene

28:21

Mittelwert

Resultante

Umwandlungsenthalpie

Zentrische Streckung

Statistik

Subtraktion

Punkt

Linienelement

Stichprobennahme

Term

Physikalische Theorie

Computeranimation

Erwartungswert

Diskrete-Elemente-Methode

Prognoseverfahren

Quadratzahl

Perspektive

Stichprobenumfang

Projektive Ebene

Abstand

Aggregatzustand

Standardabweichung

Numerisches Modell

32:53

Mittelwert

Resultante

Zentrische Streckung

Matrizenrechnung

Theorem

Subtraktion

Stichprobennahme

Kategorie <Mathematik>

Gesetz <Physik>

Physikalische Theorie

Computeranimation

Gebundener Zustand

Erwartungswert

Prognoseverfahren

Zustandsdichte

Unentscheidbarkeit

Konditionszahl

Große Vereinheitlichung

Gerade

Analysis

Standardabweichung

34:43

Mittelwert

Theorem

Abgeschlossene Menge

Gesetz <Physik>

Term

Computeranimation

Gebundener Zustand

Prognoseverfahren

Vorzeichen <Mathematik>

Konstante

Gleichmäßige Konvergenz

Figurierte Zahl

Einflussgröße

Parametersystem

Distributionstheorie

Konstante

Quadratzahl

Flächeninhalt

Zustandsdichte

Rechter Winkel

Konditionszahl

Modulform

Ablöseblase

Ruhmasse

Energieerhaltung

36:41

Distributionstheorie

Matrizenrechnung

Theorem

Subtraktion

Prozess <Physik>

Punkt

Momentenproblem

Hochdruck

Besprechung/Interview

Gruppenkeim

Zahlenbereich

Wärmeübergang

Drehung

Bilinearform

Extrempunkt

Gesetz <Physik>

Inzidenzalgebra

Term

Analysis

Erwartungswert

Logarithmus

Regulärer Graph

Perspektive

Spieltheorie

Stichprobenumfang

Vorlesung/Konferenz

Gammafunktion

Analysis

Einfach zusammenhängender Raum

Parametersystem

Matrizenring

Approximation

Ruhmasse

Kombinator

Vektorraum

Biprodukt

Helmholtz-Zerlegung

Energiedichte

Quadratzahl

Flächeninhalt

Rechter Winkel

Konditionszahl

Rangstatistik

Numerisches Modell

### Metadaten

#### Formale Metadaten

Titel | Algorithmic and statistical perspectives of randomized sketching for ordinary least-squares |

Serientitel | Computational and statistical trade-offs in learning |

Teil | 06 |

Anzahl der Teile | 10 |

Autor | Raskutti, Garvesh |

Lizenz |
CC-Namensnennung 3.0 Unported: Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen. |

DOI | 10.5446/20249 |

Herausgeber | Institut des Hautes Études Scientifiques (IHÉS) |

Erscheinungsjahr | 2016 |

Sprache | Englisch |

#### Inhaltliche Metadaten

Fachgebiet | Mathematik |

Abstract | Garvesh Raskutti - Algorithmic and statistical perspectives of randomized sketching for ordinary least-squares In large-scale data settings, randomized 'sketching' has become an increasingly popular tool. In the numerical linear algebra literature, randomized sketching based on either random projections or sub-sampling has been shown to achieve optimal worst-case error. In particular the sketched ordinary least-squares (OLS) solution and the CUR decomposition have been shown to achieve optimal approximation error bounds in a worst-case setting. However, until recently there has been limited work on consider the performance of the OLS estimator under a statistical model using statistical metrics. In this talk I present some recent results which address both the performance of sketching in the statistical setting, where we assume an underlying statistical model and show that many of the existing intuitions and results are quite different from the worst-case algorithmic setting. |