
1/3 Operator limits of random matrices


Formal Metadata

Title
1/3 Operator limits of random matrices
Series Title
Number of Parts
27
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt, and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre

Transcript: English (automatically generated)
Thank you very much. Thank you for the invitation to this wonderful place. I already spent a week here listening to great lectures and it's a pleasure to
give lectures myself. So what I'm going to talk about is an approach to random matrices, which is a little bit different than the usual approaches.
In the sense, you may think about it as somehow making geometry more prominent than analysis. So I'm going to think about random matrices as having some geometric structure.
And when we try to understand limiting things, such as eigenvalue distributions and various things that you usually take limits of in random matrices, we'll try to keep this structure as opposed to forget it and just focus on the tiny thing that we care about. Another predecessor of
this approach is that of David Aldous, who actually had the whole program to understand probability this way, which is called the objective method. So when you want to understand limits of some random objects, for example, random trees, you don't just take a
statistic and take a limit of it, but try to capture as much of the object as you can. Not just one statistic, but perhaps the whole thing if possible. And then if you do that, it's often actually easier to do and you can deduce theorems that would be harder to prove
otherwise. So that's the point of this approach. So we're going to start with something that you have seen before, which is spectral measure.
So I was told to define everything, so we'll start with that. So let's say that A is a symmetric matrix. You know that it has real eigenvalues and real eigenvectors.
So for a symmetric matrix A, you can define a measure sigma. This is the spectral measure at the first entry, which is just sum over i equals one to n
of delta mass at lambda i. So far, it looks like the eigenvalue counting measure, but there is a weight which captures the fact that we want to understand this matrix at its first entry, and the weight is phi i one squared. So phi i is the i-th normalized eigenvector, and you take the one entry, the first entry of that, and square it. So these are the weights of the spectral measure. So this is the spectral measure at one.
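As a quick numerical sketch of the definition just given (an arbitrary small symmetric test matrix; it also previews the moment identity discussed next):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = (A + A.T) / 2                      # a symmetric test matrix

lam, phi = np.linalg.eigh(A)           # columns of phi are the normalized eigenvectors
weights = phi[0, :] ** 2               # phi_i(1)^2, the weight sitting at lambda_i
print(weights.sum())                   # = 1, so sigma is a probability measure

k = 3
print(np.sum(weights * lam ** k),                # k-th moment of the spectral measure at 1
      np.linalg.matrix_power(A, k)[0, 0])        # equals (A^k)_{11}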
And you can imagine how to define the spectral measure at any other vertex two and so on. Just replace this one by two. So what do you know about this spectral measure? Well, actually one thing that you can prove, so let me already give you an exercise. I'll
give you all kinds of exercises of things that I don't want to do here. So a simple exercise is the following, that if you look at the k-th moment of the spectral measure, so this is a probability measure because these things sum to one, because these are normalized eigenvectors. So then the k-th moment of the spectral measure
is just that you look at the k-th power of a and look at the one one entry of that. So a simple example, as you have seen before, is when you take G, a graph,
and A, the adjacency matrix. So A ij is one if i is a neighbor of j and zero otherwise.
You can decide what to do when there are loops. Okay, then in that case, this is just the number of paths of length k.
Right, number of closed paths from one to one, right? Vertex one to vertex one. So here is an exercise which you will be able to do in the next step.
So you need to wait maybe 10 minutes. So this of course works. This is obvious when the graph is finite, but let's say that A is infinite, or G infinite,
then you can still make sense of this formula, right? And you could think of this as defining the measure. So G infinite, and let's say that it's bounded degree, then this formula
defines a unique probability measure. So you do this exercise and you already see that
the spectral measure is defined for infinite graphs as well. Okay, so why do we care about the spectral measure? Well, nice thing about the spectral measure is that it behaves nicely with graph convergence.
So let's define some notions of graph convergence, okay? So this is called the local limit. Or actually, rooted limits. Rooted limit, rooted convergence. Okay, so you have a sequence of graphs. Typically, you'd like them to be bounded
degrees, so universal bound on their degree, but this can be relaxed. And every graph has a root, so it's just a specific vertex, okay? And we say this converges to some graph which is possibly infinite. This can also be infinite, it doesn't matter.
Okay, this means that for every r, the r-neighborhood of the root eventually stabilizes, okay, to that of the root in G.
So it's very simple, right? So the graphs converge. If you look in a ball around this root o, what you see eventually doesn't change and it's just going to be what you see in G. So let's look at, did I use this one? I'll use this one. Let's look at some examples.
So let's look at the n cycle. This is very simple. We have to root somewhere.
Well, what is the rooted limit of that? Well, this is obvious, right? It's just Z, the nearest neighbor graph on the integers, okay? And then you can look at the n path.
And now what is the rooted limit of that? Well, actually it could be many things, right? If you put the root here, then it goes to z plus, right? If you put the root some finite
distance away from the beginning, then it goes to z plus with a root at that finite distance away from the beginning. And if you put the root with the distance growing away from
both endpoints, then of course you'll get Z. So there are many options here. And again, there is a nice exercise. This is just for those of you who have seen the notion of d-regular graphs. Of course, if you have seen
Eyal's talk last time, then he has done this exercise for you. Okay, so the random d-regular graph, so G n d, okay, converges to T d,
which is the d-regular tree. Okay, and I guess this is in probability, right? With respect to rooted convergence. So that's a little bit harder exercise,
but it's not that hard. Just have to make sure that it's unlikely that you close a cycle. So this is rooted convergence. What is it good for? Well,
you know, we're talking about random matrices here and so on. So eventually eigenvalues have to come up and I've already discussed eigenvalues. So what is the statement? Well, of course, this is just a remark, that the spectral measure is continuous with respect
to rooted convergence, right? If you have G n, o n converging to some G, o,
then the sigma n, which is the spectral measure at the root, this converges to the sigma weakly. Okay, and why is that? Yeah?
If you take any bounded number, it converges. So this integral, this kth moment. Yes, because of this formula, right? The number of paths of length k is the kth moment,
so that stabilizes. This is an extremely strong version of convergence, right? Eventually this number, which is the number of paths of length k, stabilizes to the corresponding thing in G. So it's not only that the moments converge, the moments actually just freeze, actually. Of course, convergence of moments implies
convergence of probability measures. I don't need this point yet. Yeah, that's why they hid it from me. Okay, so let's look at, again, an example.
So let's look at the n-cycle. So let's look at what are the eigenvalues of this n.
What is the spectral measure? Well, let's look at, so this is actually a little bit tricky. So let's look at what are the eigenvalues.
So the adjacency matrix of the n-cycle, of course, you can write as t plus t inverse, where t is just the right shift matrix. It's just the adjacency matrix of this directed graph. Right? And of course, these things commute. And what are the eigenvalues? Well, you just
write down, it's just this matrix where you have ones just below the diagonal. This is diagonalized by the discrete Fourier transform. So the eigenvalues are actually just the roots of unity. So what are the eigenvalues of this?
The eigenvalues of this are the inverses of the eigenvalues of this, right? So the eigenvalues of this are the zeta plus zeta inverse, where the zeta are the roots of unity.
And of course, this is just twice the real part of zeta. Okay, so here is a picture for this. Let me do this picture. For the roots of unity, let me blow them up by two,
because of these two here, and then just take the real part. So the real part is just project them down to the real line. So this is now what we have. This we understand. This is the
mu, mu sub n, which is the eigenvalue distribution. It's just uniform in this. Now, what is the spectral measure? Well, first of all, one thing that you know is that the
spectral measure is of course the same at every vertex, because this graph looks the same. It's a transitive graph. The other thing you know from this formula is, this is an exercise, because this is again very nice and simple. If you look at the spectral measure,
it's actually immediate from that formula if you think about it. And sum over all the vertices, or all the indices, the same way, it doesn't matter. Not only sum, but you take the average.
Now, what do you get? Well, you get the eigenvalue distribution. OK. So the average of the spectral measure with respect to the choice of root gives you the eigenvalue distribution, just because when you sum this overall entries, you get one, because these are normalized eigenvector. That's all. That's the only
thing you use, right? The only thing you use is that these have length one. You don't even use that they're orthogonal. Excuse me? It's not an exercise. Yeah, it's too easy. I told you. It's the solution. OK. So we got to this point.
OK. So all of these guys in this case are equal, and their average is mu. So in this case, this is just sigma n. So what does this tell you? Well, sigma n converges to where? Well,
we know where it converges. On the one hand, this is just the projection of these equally distributed lines on points in the circle. When the number of points grow, then of course, it will just be the projection of the circle to the real line, right? So you take uniform
measure on the circle and project it to the real line. OK. So this measure, this projection is called the arc sine distribution on minus 2, 2. And of course, this converges to the sigma of z, because the graph converges to z. So we have just concluded that the
spectral measure of the integers is arc sine. It's just in this example.
OK. So let me say this. So sigma z is equal to arc sine. So nice and simple.
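A small numerical sketch of this example (the eigenvalues of the n-cycle are 2 cos(2 pi k / n), and their empirical distribution approaches the arcsine law on [-2, 2]):

import numpy as np

n = 100_000
eigs = 2 * np.cos(2 * np.pi * np.arange(n) / n)   # eigenvalues of the n-cycle

# arcsine CDF on [-2, 2]: F(x) = 1/2 + arcsin(x/2) / pi
xs = np.linspace(-1.99, 1.99, 9)
emp = np.array([(eigs <= x).mean() for x in xs])
arcsine = 0.5 + np.arcsin(xs / 2) / np.pi
print(np.max(np.abs(emp - arcsine)))              # small, and goes to 0 as n grows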
Let's go a little bit further and see. See, now we have understood somehow, if you think of this as understanding limits of spectral measures in terms of limits of graphs, right, or limits of structures. But how can you understand the limit of Eigenvalue distributions
in terms of limits of graphs? So this thing gives you a hint, this over here. So if you take the average of the spectral measures, you get an Eigenvalue distribution. OK. So what this shows you is that if you somehow take a limit of graphs in an average
sense, then you should get the limit of the eigenvalue distribution. And this is the content of, well, this is why it's useful to use the Benjamini-Schramm convergence.
So what is that? Now you have a sequence of, say, finite unrooted graphs. OK. And you make this into random rooted graphs by picking a root at random,
uniformly at random. OK. And you get G n, o n, which is now a random rooted graph. So let's say that what you'd like is that G n, o n should converge to some G, o,
which is a limiting random rooted graph. OK. And with what sense? Well, it's obvious what sense, right? It's just locally and weakly, right? So we know what local convergence is. So this is just weak convergence with respect
to the local convergence notion. We know what it means for a rooted graph to converge to a rooted graph. So we know what it means for a random rooted graph to converge to a random rooted graph, to this G, o.
OK. And what this really means is, if you want it more simply and not so abstractly, what this means is that if you look at the statistics of the neighborhoods that you see around your root, what is the probability of seeing this particular thing that converges for each particular thing? And for each R in each neighborhood, you have a probability
and those probabilities converge. So it doesn't mean anything more fancy than that. OK. And we again have a corollary, right? So a corollary of the definition, I don't know,
corollary of what we wrote, right? So if G n converges to some G, o, in the Benjamini-Schramm sense, then mu n, which is the eigenvalue distribution,
empirical eigenvalue distribution, converges to what? Well, this here, we wrote is actually the expectation of sigma n, right? It's the average of the spectrum measures. What does it mean to take an expectation of a probability measure? Well, it just means that
the measure of a set is just the expected measure of the set here. So this is very simple what it means. As long as you have linearity, you can take expectations. And everything is fine here because these are probability measures. All the numbers you care about are bounded. So you don't have to worry about any kind of technicalities. OK. So what do we know? So mu n actually
converges to the expectation of sigma of the G, o. Why? Because this is the expectation of sigma n. You know that sigma n
converges to sigma weakly, right? Because here is weak convergence, and that's a continuous function, so just by a continuity theorem. OK. And then by taking expectations, which you can, so bounded convergence,
this also converges to that. So it's extremely simple. OK. So what do we know?
Let me see. Now I need a sponge. OK. So let's just do an example.
Again, I'm going to do very simple things now. So if you look at the n path, then this converges in Benjamini-Schramm sense to Z. We have seen it because most of the
points are going to be growing distance away from the boundary. So when you pick a random root, it will be far away from the boundary. And all those sequences actually do converge to z. And so this tells you that mu of the n path converges to,
mu n here converges to arc sine. It's a proof. Notice that we did no computation so far. OK. Nice. So we have this nice notion of spectral measure.
And it works very well with graph convergence. So let me tell you.
So let me give you two exercises before I go further. One of them is kind of obvious. So if you look at the n by n box, you can do this more precisely if you like. It takes a bit of work to make it actually precise. The n by n box converges to Z 2, and the d-dimensional box converges to Z d for general d.
So this is one; we didn't make this precise. And the second exercise, which I really like: show that if you take finite trees, T n, then these never converge in Benjamini-Schramm sense
to the d-regular tree if d is greater than or equal to 3. That's actually surprising when you first see it. So really, try to approximate this tree. Try hard and it doesn't
work. You could do it that way also. That's the worst way. But when I tell you that you can't, you can prove that you can't with trees. OK. That's the right thing to do here.
Have you read that in some of the Texas schools, they have reintroduced corporal punishment?
Nothing, friends? Why is it an important piece of information? Hope not. Actually, you need the agreement of the parents, but let's see.
So have we been? OK. So we have seen that this spectral measure, sigma,
is something important about matrices. So it would be nice to introduce an equivalence
relation. So let's say that A is equivalent to B if sigma A equal to sigma B. And now we're talking about n by n symmetric matrices. So this is somehow something that comes up here
because you want to understand the spectral measure. What part of the matrix is responsible for the spectral measure? Can we extract that part? No, no. This is at 1, 1. Yes. So
it's at the first entry. So well, maybe this question, maybe you have seen this question, or if not, you have maybe seen a different question. What if you do the same thing with
mu, the eigenvalue distribution? Well, that question you know the answer to, right? So when do two matrices have the same eigenvalues? Well, there is a structural understanding of that. You just say, well, they can all be conjugated by an orthogonal matrix to
some diagonal matrix, which, say, has increasing values on the diagonal, or non-decreasing values on the diagonal. So on the one hand, you can say, well, there is a way to transform them to one another. So in fact, more simply, this happens if and only if the two matrices are
conjugate by an orthogonal matrix. So there is a notion of transformation if you have that. Then there is also a notion of class representative. There's a unique class representative, which is just a diagonal matrix with non-decreasing entries.
So the question is, what is the corresponding thing here? So actually, there is, and in fact, in many ways, it is nicer.
And here is the theorem. So first of all, A is equivalent to B in this sense, if and only if there exists an orthogonal matrix O, which is n minus 1 by n minus 1,
such that A is, you know, you can conjugate one to the other, so QBQ inverse, where Q is just, you know, this block diagonal thing, which is very simple. I just have a 1 here and an O here.
Okay, this is O, right? And 0 is here. So this is one thing. Well, you have to be, this is almost completely true, okay? You have to actually assume one thing for this,
which is, at least for the next part of the theorem. So we'll make an assumption. Okay, so the assumption is that the vector E1, so the first coordinate vector,
is cyclic for A. Okay, so what does that mean? It means that if you take the vector E,
let me just call it E, vector E and AE and A squared E, and so on, all the way to A to n minus 1 E, then these things are linearly independent. Okay, so for example,
you cannot just have, this cannot be an eigenvector. It's one of the things that it says. And of course, if they're linearly independent, then they form a basis.
So it's like, it's equivalent to asking that to form a basis. If they form a basis, you can Gram-Schmidt them and so on. There are various nice things you can do. I'll get to that. Okay, and the second thing is that each equivalence class, so again,
we're looking at matrices like this. Forget all others, we throw them out. Of course, if you look at random matrices, this will happen with probability one, so this won't be a restriction. So, each such equivalence class contains
a unique Jacobi matrix. Okay, and what's a Jacobi matrix?
Okay, I should be somewhat consistent with my board use here. Okay, so a Jacobi matrix is
just a tridiagonal matrix, with the a i on the diagonal and the b i on the off-diagonal, where the a i are real
and the b i are positive. That's what a Jacobi matrix is. So this is the world that we live in.
So if you want to understand spectral measure, it's very natural to try to understand Jacobi matrices for that. So how does that go? Well, actually, I want to do the proof. I'm going to do the proof of this at least one step.
Okay, so here's how the proof goes. We'll show that first of all, step one is there exists the Jacobi matrix, which is equivalent to A. Okay, that's the first step.
And the second step is that if two Jacobi matrices have the same spectral measure, then they're equal. Okay, so I leave this one as an exercise. It's actually a nice exercise.
And if you put these two together, then it implies both of these. Okay, just think about it. All right, so I forgot to tell you, this is really, I should have told you. So everything
I'm talking to you about today and maybe in the next lecture is you can find lecture notes on my website. Just Google my name, and it's right there. Okay, so let's see the first one.
So the first one actually, historically, the way this came up is it came up in the Lanczos algorithm. Okay, so what's the Lanczos algorithm? This was a way to,
let's say that you want to store the eigenvalues of a huge matrix. Okay, but you're only given the matrix. It's an n by n matrix, and you like to keep them for your grandchildren, but only the eigenvalues. So don't care about the matrix. So of course,
one way to do it is to compute all the eigenvalues, write them on a piece of paper, or upload them in the cloud or something. That's one way. But the problem is that when you want to compute the eigenvalues, you have to solve algebraic equations. That may not be that nice to do. So you lose precision and various things. You have to do some
hard things computationally and algebraically. So the Lanczos algorithm is a way to go around this. This is one of the goals, which is that you want to reduce your n-squared pieces of data to just order n pieces of data. And so what do you do? We just try to find
the Jacobi matrix, which has the same eigenvalues as your original matrix. So it's essentially this problem, except you only care about the eigenvalues. You'll get the weights just as a bonus from that. So how does it work? So you take your matrix.
Let's see. So you take your matrix A, which you're going to write like this. And you conjugate by Q. So let's look at Q. Let me write A also in the form that Q is.
So you take A. A is going to be written like this. So this matrix A, I'm going to write it like this. So there is a vector B. There is then B transpose. And then here is a small matrix C, which is n minus 1 by n minus 1. And you now conjugate with a Q
that is of the form that we wanted. So you want to put 1, 0, 0, and an O here. And you want to put 1, 0, 0, and an O inverse here. So what happens when you do that?
Well, that's not hard to compute, right? What you get is just
something like what you had before. So this A actually stays fixed. The B gets multiplied by O. Well, this is still going to stay a symmetric matrix. So you get your OB transpose. And the C just gets conjugated by O. So what is this good for you?
Well, it's good for now. You have a choice of this O. And you choose it so that it will rotate
your, this is just a general orthogonal matrix. So you can rotate your vector B to any direction you like. So choose your O so that O B is going to be this vector whose first entry is just the length of B. Well, the length cannot change, it has to stay the same. But apart from that,
it's going to be 0s otherwise, right? So this is going to be now like this. And again, the length of B. And you have already done a step towards your Jacobi matrix.
You have put 0s here where they're supposed to be 0s. And then you just repeat. So you do the same thing with a matrix where the top corner, orthogonal matrix where the top 2 by 2 corner is the identity. And you see that it will work. So keep going.
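Here is a minimal sketch of this procedure in code (a Householder-style choice of the O's; a library routine such as scipy.linalg.hessenberg performs an equivalent reduction, so this helper is only illustrative):

import numpy as np

def tridiagonalize(A):
    # Conjugate A by block-orthogonal matrices Q = diag(I, O) that fix e_1,
    # so the spectral measure at the first entry is preserved.
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 2):
        b = A[k + 1:, k].copy()            # the part of column k below the diagonal
        normb = np.linalg.norm(b)
        v = b.copy()
        v[0] -= normb                      # Householder vector sending b to (|b|, 0, ..., 0)
        if np.linalg.norm(v) < 1e-14:
            continue
        v /= np.linalg.norm(v)
        O = np.eye(n - k - 1) - 2.0 * np.outer(v, v)   # an orthogonal reflection
        Q = np.eye(n)
        Q[k + 1:, k + 1:] = O
        A = Q @ A @ Q.T
    return A

A = np.random.randn(6, 6); A = (A + A.T) / 2
J = tridiagonalize(A)                      # tridiagonal, same spectral measure at the first entry
print(np.allclose(np.linalg.eigvalsh(A), np.linalg.eigvalsh(J)))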
If you do this, you get the Jacobi matrix. And of course, it's clear that, I didn't tell you, but of course it's clear that conjugation by matrices of this form don't change the spectral measure. That follows simply from many things. For example, from the way the
eigenvectors change. So the eigenvectors will get multiplied by this matrix when you conjugate by O. And so the first entries won't change because this matrix doesn't
change the first entries. So that's one way of seeing it. But also by expanding the path, the O's will cancel. That's another way of doing so. OK. So that gave you basically a proof of that. And this is called,
I mean, you can actually have a choice of O's which are certain reflections. They're called Householder reflections. It's the standard choice. But it doesn't matter which O's you use. Now here is something funny. When you do this thing here, you see, you have the feeling that there are a huge amount of choice in O. Because the only
thing you care about is where it takes this vector B. So there are many, many O's that do that. And so it looks like there could be many Jacobi matrices that you get this way by choosing the O's. But in fact, there is just one. So this choice is an illusion.
OK. I'm not talking about life. I'm talking about this matrix.
All right. So what do we do? I think we can go to GOE now. OK. So you know GOE. The Gaussian orthogonal ensemble. So it's just m plus
m transpose over root 2, where m is the matrix with iid normal 0, 1.
OK. So that's the GOE, with the root 2. And you can check that O M O inverse has the same distribution as, sorry, let's call this A. So O A O inverse has the same distribution
as A for any orthogonal matrix O. That's why it's called the GOE. You wish an orthogonal ensemble contained orthogonal matrices, but it's not true. It's orthogonal invariant. So this idea, which is due to Trotter,
82. He was at Princeton. I actually met him in the 2000s, and he said
that the thing that he did, he didn't think it was going to be of any importance, but he liked it, so he wrote it down. He said that you can apply the same algorithm to GOE.
So what do you do? Well, OK, so now that's a GOE. So what's happening to A? Well, A is a normal. OK. And B is just a vector of normal, independent normal.
So when you do this first step, this is going to be a normal. And this is going to be what? A chi, right? Which is just the length of a normal vector, chi n minus 1,
because it's a normal vector with n minus 1 entries. Distribution of that is called chi. Now, what is this going to be? This is a bit more tricky. Well, if you look at the submatrix of a GOE from this definition,
you see that's also a GOE, just as lower dimensions. So the C is a GOE of lower dimensions. And if you conjugate a GOE by an orthogonal, you get a GOE. But you have to make sure that the orthogonal you conjugate by is independent of the GOE. Otherwise, you could get a diagonal matrix, which is not a GOE. So it's important. Well,
it should be either a constant, or independent is also OK. But you can see that it was independent, because C was independent of A and B from the definition, completely. And O, the choice of O just depended on B, nothing else.
So this is still independent. This is still a GOE. And the same is true for the next steps. So what does this tell you? You can imagine. OK, so basically, Trotter's theorem tells you that the GOE is equivalent, in the sense that we showed before, to this matrix, which has N 1, N 2, up to N n on the diagonal,
where the N i are normal 0, 2. Because of this, the diagonals are just added. And then you have the chi n minus 1, chi n minus 2, down to chi 1 on the off-diagonal.
OK, when you do the second step, you just get the normal vector of length n minus 2. And the chi's are just, I'm just writing the distributions.
And these chi i's are all independent. So there's a matrix with independent entries.
There's another matrix with independent entries, and they're equivalent, or you can couple them in an equivalent way. So that's Trotter's theorem. And guess what Trotter used this theorem for? Who can guess? This is an obvious guess.
He used it to prove the Wigner semicircle law. OK, and in 82, that wasn't new, but it was a new proof. So basically, what I'm going to show you is a modern version of Trotter's proof. It's set in a different language, but essentially, it's what he did.
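As a sketch of Trotter's tridiagonal model and the resulting semicircle, following the construction above (the chi entries are sampled as square roots of chi-squared variables; sizes are just for illustration):

import numpy as np

rng = np.random.default_rng(0)

def trotter_tridiagonal(n):
    diag = rng.normal(0.0, np.sqrt(2.0), size=n)                 # N(0, 2) on the diagonal
    off = np.sqrt(rng.chisquare(df=np.arange(n - 1, 0, -1)))     # chi_{n-1}, ..., chi_1
    return np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

n = 2000
eigs = np.linalg.eigvalsh(trotter_tridiagonal(n)) / np.sqrt(n)
# a histogram of eigs is close to the semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2]
print(eigs.min(), eigs.max())                                    # roughly -2 and 2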
And let's see. So here is, again, what we need is an exercise for this, is the following.
So if you look at chi n, then this is asymptotic to square root of n. OK, so that's important. So chi n over square root of n converges to 1 in
probability. But in fact, you can say something more precise, that this is asymptotic to square root of n plus a normal of variance one half. So you can deduce it from the CLT and some trickery, because chi squared is the sum of independent things. So chi squared
has a CLT, but then you have to push the CLT through the square root, which is not hard. So this means that if you take chi n minus root n, then it converges to a normal in distribution. OK, so let's think of this matrix here as a graph. OK,
so we have this Jacobi matrix, J n, corresponding to the GOE. What does it look like as a graph?
Well, it's a path with loops, right? All Jacobi matrices look like this. The weights in here are positive. Right, so this is the chi n minus 1.
And the weights here are just, you know, these normals. So it's a weighted graph, a graph with edge weights. And the edge weights go to infinity. So let's normalize. So let's take 1 over root n times this. So this means I'm rescaling the edge weights.
So I want to take a Benjamini-Schramm limit. Now, there is a slight issue that this is
a weighted graph. And I didn't tell you what the Benjamini-Schramm limit of a weighted graph is. But it's kind of obvious, right? A local limit of the weighted graph just means the structure stabilizes and the weights also converge. So then the Benjamini-Schramm limit is the same. And it's not so hard to check that all the statements about
spectrum and so on will go through. So where does it go? Well, you divide by root n. So all these labels, they disappear. OK, then these normals, they go away. And not only that, but the randomness in the chi's will go away too,
because that's also an order 1 normal. So you could say, essentially, this thing just looks like the following for this kind of convergence. And then you have here, at position k, root of n minus k over root n.
This kind of convergence, you may as well think of it as a fixed random graph. Sorry, a fixed graph, not a random graph. The randomness is not important for this kind of convergence. The randomness is not important for the Wigner semicircle law anymore.
So where does it converge? Well, this is just the path. So first of all, it has to converge to z. The graph structure has to converge to z.
OK, now you have to understand the labels. How do the labels converge? Well, what you do is pick a random place and you zoom in there to see what's around that. If you pick a random place and you zoom in there, the labels don't change very much. OK, so in the limit, all the labels are going to be the same. But they're not going to be
deterministic. They're going to be random because of the choice of location. That was random. And what are these? So if you look at this n minus k over n, where you choose k uniformly at random, this converges just to a uniform 0, 1 random variable.
Let's call it u. You choose k uniformly at random between 1 and n or 0 and n. This number just converges to that. So you have a graph of z with labels that are square root of u.
OK, and these are the same u. They're not independent. They're the same u. So it's really non-ergodic. This is the opposite, very far from ergodic. So you could just write it as square root of u times z. That's your graph.
So we have proved that, basically. And what is the limit here?
Benjamini-Schramm, of course, and in probability. So that's the proper notion, because this is random. This is something deterministic. You have convergence in probability. In distribution also, if you like, but it's the same.
So of course, we also know that the eigenvalue distribution, mu n, converges, again in probability, to the expected spectral measure of this. And the expectation is only over u, because everything else is
deterministic. So basically, you just take the spectral measure of z and scale it by square root of u. So let's put this geometrically. So you have a circle of radius 2.
You scale this circle by a square root of a uniform. You pick a random point and project it down to the line. That's the eigenvalue distribution of this. But when you scale a circle by a square root of a uniform and pick random points,
it's just the same as if I took the disk here and picked a random point and projected it down to the line. Because the distribution of the radius of a random point in the disk is just the square root of a uniform. The square is uniform.
So this thing is the projection of the disk to the line, with the uniform measure on the disk projected to the line. So this is the Wigner semicircle. So here is the proof.
No computation. Yes? When you have your graph convergence, is that a problem that you may have negative weights on the loops? No. It doesn't matter. There's no difference.
I mean, just think about it. All the moment arguments go through as long as things are bounded and so on. So that's a proof of the Wigner semicircle law. And it is a kind of proof
that I promised you. It's a proof of a simple thing, but it's a proof that takes the geometry out of these matrices and then tries to take a limit with the geometry there. And then deduces things, say, the eigenvalue distribution of the matrix, using this geometry.
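Here is a small sketch of the geometric step (projecting the uniform measure on the disk of radius 2 down to the line, which is the same as the square root of u scaling above):

import numpy as np

rng = np.random.default_rng(1)
m = 200_000
u = rng.uniform(size=m)
theta = rng.uniform(0, 2 * np.pi, size=m)
x = 2 * np.sqrt(u) * np.cos(theta)      # x-coordinate of a uniform point in the disk of radius 2
# a histogram of x matches the semicircle density sqrt(4 - t^2) / (2 pi) on [-2, 2]
print(np.mean(x ** 2))                  # second moment, close to 1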
So I have 30 more minutes. I want to tell you two things. One of them is another proof of the Wigner semicircle law, which is also very quick, even quicker than this. And the other one is
we're going to talk about the top eigenvalue. So we can use this approach to show the Füredi-Komlós result. And I'll tell you that in a second what it is.
So let's try to quickly do this other proof of the semicircle law. And the reason I'm telling you is because it's a proof that takes a different operator limit. And actually, it's surprising that you get the same answer. It's something for contemplation.
So here is another way of getting the Wigner semicircle law. So let's look at sigma n, so the spectral measure at 1 of the GOE.
OK, so you can take this in the Jacobi matrix. And now you take a rooted limit to understand sigma n. So you take the limit at this vertex.
So what do you see? Well, you have this chi n minus 1, chi n minus 2. Again, the randomness doesn't matter, the same idea. And what do you see?
You see that this measure, this graph, this rooted graph, which is a rooted graph, converges now in the rooted limit in probability again, because this is random and you get a deterministic thing. Well, what? To just z plus.
So z plus with this root. So this tells you that sigma n, which is continuous, it
converges to sigma of z plus. I know. OK? And this is true for a fixed choice of root. But it's true for every choice of root, no matter what, because the GOE was, of course, invariant. So it's also true in average.
Actually, this very simple way of concluding from here was just told to me by a student in my last lecture. So I was very, very happy to see that. So this is true for all choices.
So therefore, mu n converges to sigma of z plus at naught. OK? So now you can do various things. You could say, well, this is an independent proof of the
Wigner semicircle law. As long as you know that the spectral measure of z plus here is the semicircle law. You could actually prove that by simply computing the spectral measure of an n path at the endpoint. You can compute that and just check that it converges to the semicircle law by hand.
So that's one way. Or you can use the previous proof and just say, well, the conclusion is that z plus has the semicircle law here as a spectral measure. You can do that. That's probably the fastest thing. There's no computation. And what does this imply?
Well, it implies that the moments of the semicircle are actually counting the Dyck paths. So just the paths in z plus that return to the root. The k-th moment, so the integral of x to the k d sigma for the semicircle, is equal to,
write it like this, the Dyck paths from 0 to 0, the number of paths of length k in z plus that start from 0 and return to 0.
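A quick numerical check of this moment identity (a sketch; for even k these closed-walk counts are the Catalan numbers, and the truncation is harmless because a length-k walk never gets farther than k/2 from the root):

import numpy as np
from math import comb

N = 30
A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)   # truncated piece of z plus
for k in range(0, 13, 2):
    walks = int(round(np.linalg.matrix_power(A, k)[0, 0]))     # closed walks at 0 of length k
    catalan = comb(k, k // 2) // (k // 2 + 1)                  # the Catalan number C_{k/2}
    print(k, walks, catalan)                                   # the two counts agree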
OK, so the next thing I want to show you, and hopefully I will be able to, I completely lost track of these boards, I'm sorry.
Maybe I'm able to continue, get to a good point where you can stop and continue next time, is the Komlós-Füredi result, or Füredi-Komlós.
OK, so what is Füredi-Komlós? It just tells you what happens if you look at the top, the first eigenvalue, lambda 1 of the GOE. It's more general, but it still works.
And you divide by root n. Then, well, what does the semicircle law tell you? The eigenvalue distribution converges to the semicircle if you divide by root n. The semicircle is supported from minus 2 to 2.
So it tells you that this limit, of course this is random, but anything that it can converge to has to be at least 2. Because otherwise, that would contradict the semicircle; there has to be mass all the way up to 2. So Füredi-Komlós tells you that it is actually 2.
It's a simple thing you may learn. And so the only thing we have to show is that, so the GoE tells you, the semicircle law tells you this, that this is what follows from the semicircle
law, and then you have to get the other direction. So let's do a lemma for this.
And this is a lemma that's true for general Jacobi matrices. So for any Jacobi matrix, you have an upper bound on the top eigenvalue, which is given by the max over i, a i plus b i plus b i minus 1.
So remember, a is where the diagonals and b is where the off-diagonals. OK? So let's prove this.
So here I just recall that this was j. So let's take this matrix M, which is the following. So take 0, root b1 minus root b1, root b2, minus root b2,
root b3, minus root b3. I think you get the point. And I'm going to write this. So j is equal to minus M, M transpose,
plus a diagonal matrix with entries over here, or this.
Those are the entries of the diagonal matrix. Well, you know, that's not surprising, right? You take M, M transpose, you get this b squared. That's how we made it. And then you're going to get something on the diagonal. You have to fix that. So that's the diagonal matrix.
So the ij entry of this is just the inner product of the corresponding rows or columns. So that's how you get it. So this is obvious. And then we're done with the proof, actually. Because look at that, right? This is positive semidefinite, this M M transpose, as is every such matrix.
It just has non-negative eigenvalues. So minus M, M transpose just has non-positive eigenvalues. And so the max eigenvalue of this is less than or equal to the max eigenvalue of this plus the max eigenvalue of that. Why is that, actually? How do you prove that?
It's a norm, almost. It's not quite a norm, because there can be negative eigenvalues. But yeah, so yeah, essentially the norm. So you use the Rayleigh characterization, right? The max eigenvalue is the sup over all vectors of length one of the bilinear form. And then it just drops out of that.
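For concreteness, here is a numerical check of the decomposition just described (a sketch with arbitrary test entries, using the convention b_0 = b_n = 0):

import numpy as np

a = np.array([0.3, -1.0, 0.7, 0.2])      # diagonal entries a_i (real)
b = np.array([1.5, 0.4, 2.0])            # off-diagonal entries b_i (positive)
n = len(a)

J = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
M = np.zeros((n, n - 1))
M[np.arange(n - 1), np.arange(n - 1)] = np.sqrt(b)      # sqrt(b_i) on the diagonal of M
M[np.arange(1, n), np.arange(n - 1)] = -np.sqrt(b)      # -sqrt(b_i) just below it
bpad = np.concatenate([[0.0], b, [0.0]])
D = np.diag(a + bpad[1:] + bpad[:-1])                   # diag(a_i + b_i + b_{i-1})
print(np.allclose(J, -M @ M.T + D))                     # True, and -M M^T is negative semidefinite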
OK, so end of proof. This is going way too fast, actually. And so what does it tell you?
So the lambda 1 of the GOE is less than or equal to the max over i of, what? So you have a chi n minus i plus a chi n minus i minus 1,
these are the b i's, plus the normal a i. So what do these look like, right? So each of these chi's is roughly square root of n plus a normal,
which is just a normal. So you can easily say that this is less than or equal to twice square root of n for these two chi terms, plus some constant times square root of log n,
because this is the maximum of n normals. The chi's, it's easy to check that they also have Gaussian tails. And when you take maximum of n Gaussians, you get this kind of thing. And this is a random constant, but tight.
So you get a pretty good bound on the top eigenvalue. So it's not even terrible. So the truth is, in fact, this is 2 square root of n plus n to the minus 1/6 times the Tracy-Widom distribution, which
is some probability distribution we'll talk about later, so this is the actual truth. So we're still off by a little factor, a little power, but it's not terrible. It's much more than what we need.
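A sketch checking both bounds on Trotter's tridiagonal model (the lemma bound holds exactly, and lambda 1 over root n comes out close to 2):

import numpy as np

rng = np.random.default_rng(2)
n = 3000
a = rng.normal(0.0, np.sqrt(2.0), size=n)                    # diagonal: N(0, 2)
b = np.sqrt(rng.chisquare(df=np.arange(n - 1, 0, -1)))       # off-diagonal: chi_{n-1}, ..., chi_1

J = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
lam1 = np.linalg.eigvalsh(J)[-1]
bpad = np.concatenate([[0.0], b, [0.0]])                     # convention b_0 = b_n = 0
bound = np.max(a + bpad[1:] + bpad[:-1])                     # max_i (a_i + b_i + b_{i-1})
print(lam1 <= bound + 1e-9)                                  # True
print(lam1 / np.sqrt(n), bound / np.sqrt(n))                 # both close to 2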
All right, so maybe I can tell you about another thing we can get here.
Sorry, can you recall what you mean by Tw? So this is here. And I didn't tell you what this is. What I'm saying, this is a theorem, what I'm stating here, but I'm not proving the theorem. So the theorem is that there exists a probability
distribution, which is defined by this theorem, even, such that this holds. So if you want to make it precise, you take this thing, lambda 1 of the GOE, you subtract 2 root n, you multiply by n to the 1/6, and that converges in law to something.
That's what the statement is. So here is another thing I want to tell you, which you can understand through this operator limit. And it's very timely now, because Gérard Ben Arous just
had his 60th birthday. And his most famous result, which he's not especially proud of, because it's not one of his strongest results, but it's a nice result, is what's called the Baik-Ben Arous-Péché transition.
So did I do it wrong? No T. Terrible.
I'm sorry. Thanks. All right. So what's the Baik-Ben Arous-Péché transition? Well, anyway, Ben Arous says, it's
terrible that you do all these strong things, and then you get famous for the one that is not this strong. And then I told him that you're still better off than Fubini. Anyway, oh, yeah, he was a pretty strong geometer,
actually. No, he is, yes. There'll be some more of this. OK, so where does it come from? So it comes from where the entire random matrix theory is coming from, right? We do random matrix theory, and we know it doesn't really come from the Greeks, like almost all other parts of mathematics or the Egyptians.
But it comes from the 1920s from Scotland, from this guy, Wishart, who was studying correlation matrices, so covariance matrices. So the point is, you want to do principal component analysis. You take some piece of data.
You normalize the entries, so that you don't care, by variances and so on. And so you think of this as now some sample, various samples from a population, and there are various statistics like, I don't know, height and weight and my strength
and all kinds of things. So you read their numbers. And you want to understand if these things have some kind of non-trivial correlation, right? So what you do is you normalize these matrices. You form the sample covariance matrix, and you look at the top eigenvalue. And if the top eigenvalue is large, then you say, OK, there is a correlation that comes from this top eigenvalue.
So but of course, you want to understand what does it mean large, right? And the way you do it is you compare it to the null case. So you create fake data. OK, and the fake data is going to be this matrix M, just an n by n IID matrix for our case.
It doesn't have to be symmetric. It doesn't have to be square. It can be a rectangular matrix. But just for now, let's make it square. And you look at the covariances in here. So look at the empirical covariance matrix of the columns of this and see what is the top eigenvalue.
So whatever is smaller than that in an actual data set is just noise, obviously. So that's what Wishart did, essentially. I mean, he did sort of finite dimensional versions of this. And so it's kind of classical, and it's kind of not surprising that BBP became an important thing.
So let me give you the theorem, which we will not prove, but we'll prove versions, some analogs of this. So you look at 1 over n times the top eigenvalue of the following. So you take your matrix M. Remember,
this is an IID Gaussian matrix. And you look at the diagonal matrix. And it looks like the first entry is 1 plus a squared. And then you just put 1's. And then you look at M transpose. OK. So again, you take this. So what is it?
So you can think of this matrix as the population covariance matrix. And then what you get is you take some data, and you get an empirical covariance matrix, the sample covariance matrix. This is the top eigenvalue of the sample covariance matrix.
Now, you may be telling me I was cheating because I told you to normalize everything to roughly one variance. But here I made this variance 1 plus a squared. And actually, it's still diagonal. I should put here correlations and not make this matrix diagonal. But because of the invariance of the model, the only thing that matters of the population covariance matrix
is its eigenvalues. Nothing else matters. They're non-negative eigenvalues, but that's the only thing that shows up. If I change it to a different matrix here, with different eigenvalues, I still have exactly the same distribution because of the invariance of m. So actual correlations in the data, you can just encode as one large eigenvalue here.
And you can think of that things were normalized, and then I found some covariate correlations. And then I conjugated that matrix, and I got something like this. So this is the BBP theorem. OK, so actually they proved it for GUE.
And they proved much more, much, much more than this. It's just the simplest case. That this thing converges to some phi of a squared. And what is this phi of a? The BBP transition function.
So phi of a is equal to 2 if a is less than or equal to 1. And a plus 1 over a, if a is greater than or equal to 1.
So what does this mean?
So this is how the singular value, the top singular value, behaves. So what is this thing? So here is a, and here is phi of a. Let's see, so a is 1, and here is 2.
So we start at 2, and this thing actually is differentiable here, but not twice differentiable, I think. Well, I have to check. Certainly not three times differentiable.
And this is a plus 1 over a. So it sort of converges, it's asymptotic to this diagonal. This is not a great picture. So what you see is that for a while, if this a is small, if my correlations are small, then you don't see an effect.
Because they're the same as if there were none. So for a while, this just stays at 2. And then suddenly, some effect is going to stick its head out after a is greater than 1. And you can see in the data the correlation. And the interesting thing is that this transition happens not by scale, but actually by just changing a constant.
That's why it's a phase transition. So if you think about it, it's not that surprising, because you have all these eigenvectors, and you sort of do a rank 1, a small perturbation of your matrix. Then for a while, the perturbation you don't see, because it's washed away
by the other fluctuations. But suddenly, it will overtake everybody else. And then you'll see it. And this is the exact way that it happens. I should stop at 30. Is that the plan, or 20? What's usual? 30.
So I'm going to give you some version of this theorem. So first of all, this model also has a tri-diagonal version.
So this is an exercise, which is to tri-diagonalize this. And you can prove this theorem using that.
But I don't want to go through this way. Rather, I'm going to stick with our GOE, because we've seen that very well, and just change the theorem to the GOE. So this is a theorem, which is the GOE with mean. The mean GOE.
So you take the GOE and add to it some non-trivial expectation to all the entries. And you now look at, this is a natural question, right? What's going to happen to that? So you look at 1 over root n times lambda 1 of the GOE n plus
this all one matrix, right? That's the matrix with all ones. This is the all one vector, and if you multiply it by its transpose, you get the all one matrix.
So this is going to be actually a rank one matrix. And you add to the GOE this non-trivial mean. If I put here just this, then it's going to say that all the entries will have mean one. Okay, and I want to see what is the top eigenvalue of that. And actually, if we make this interesting,
this is way too much mean to add. So this will actually go to infinity. It should be root n here. What you have to add here is a over root n.
And then this converges to phi of a in probability. And that's the theorem we're going to prove. We're going to have a complete proof of this.
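Here is a simulation sketch of this statement, with the scaling as I read it above (the GOE plus a over root n times the all-ones matrix, top eigenvalue divided by root n):

import numpy as np

rng = np.random.default_rng(3)

def goe(n):
    m = rng.normal(size=(n, n))
    return (m + m.T) / np.sqrt(2.0)

def phi(a):
    return 2.0 if a <= 1 else a + 1.0 / a

n = 2000
for a in (0.5, 1.0, 2.0, 3.0):
    A = goe(n) + (a / np.sqrt(n)) * np.ones((n, n))
    print(a, np.linalg.eigvalsh(A)[-1] / np.sqrt(n), phi(a))   # the two columns roughly agree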
But let me tell you what is the motivation. Like, why is this phi of a? Could that be the whole square root of n? No, that's actually 1 over n. Yeah, it's because it's square. So that's a covariance thing. The scaling is slightly off, but they all have a reason.
Okay, so instead of giving the proof to you now, I'll give you another theorem. It's an exercise, actually. Okay, so give me a break. Okay, because it's an exercise, it's helpful to do it for the proof.
You can't really, at least I don't know how to, just deduce the proof from this exercise. But it certainly is helpful for understanding what's happening. And it does show you where this comes from. So if you look at lambda 1 of z plus,
so z plus plus a loop of weight a at 0. So the top eigenvalue, which you can define by the Rayleigh quotient formula, again, of this graph, where you just put a here, and all of these edge weights are 1.
Okay, and this is phi of a. Okay, so this is the Baik-Ben Arous-Péché transition for z plus.
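And a quick numerical sketch of this exercise, using a long truncated path as a stand-in for z plus (for a at most 1 the supremum 2 is not attained, so the truncated value approaches 2 only slowly):

import numpy as np

def lambda1_zplus_with_loop(a, N=2000):
    A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)   # truncated path graph
    A[0, 0] = a                                                    # loop of weight a at the root
    return np.linalg.eigvalsh(A)[-1]

for a in (0.5, 1.5, 2.0, 3.0):
    print(a, lambda1_zplus_with_loop(a), 2.0 if a <= 1 else a + 1.0 / a)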
And the lambda 1, you said which formula? The Rayleigh quotient formula. Okay, you'd like to know that? Okay, so in general, the top eigenvalue of some operator, where this works, so of course, you look at the adjacency matrix here.
You define it by the sup over all functions f in little l 2, in fact of length 1, of the inner product of f with A f, right?
And so when a is a matrix, for example, then if you plug in here the top eigenvector, then you get exactly the top eigenvalue. So it's actually a max in that case.
But in general, for z plus, this also makes perfect sense. So you can take any function in little L2. You can apply the adjacency matrix to it this way. This is just, again, and we can take this sup. So that's how you get this lambda 1.
Okay, so we'll continue from here next time, thank you.