
3/3 Spectral vs. geometric approaches to random walks on random graphs

Transcript: English (automatically generated)
So this is the third and last lecture.
And in today's talk, I'd like to cover two items. One is the item that we began last time, which is the second proof for the result on a cutoff for random regular graphs, the one that uses a spectral analysis. And that's the one that's going to also carry to settings where we don't have randomness.
So remember that diagram that we had at the beginning where we had the first proof that heavily relied on the randomness of the graph and on regularity. Then we'll have a second proof that will not
rely on randomness anymore, but rather on the spectral properties that we understand very well on the random regular graph, but still uses regularity very crucially. And then in the second part of today's talk,
I'll address what happens when you don't have a regular graph. Let's say it's random, but now, let's say, half of the degrees are 3 and half are 4. And then things are different. So also because this is the last lecture
and last time we did things very slowly and carefully, but we only have this one last lecture to go. So today, I'll try to give you the outline of arguments instead of spelling everything out. But feel free to come along in the break if you want to see the details of various statements.
So as I promised yesterday, we said a few things very quickly towards the end of the lecture. I want to just mention them a little more formally.
So here's a proposition. So for p equals 2. So I'll remind you, first of all, what is it that we're after? Well, this picture above, I'm going
to shut the lights when we get to it very, very soon. But this picture above is a Ramanujan graph. And you might not remember what that is. But a Ramanujan graph is a graph where the bound that you have on all the non-trivial eigenvalues is essentially the best one possible.
So you want to take your adjacency matrix before you divide it by d and turn it into a transition kernel of a simple random walk. Just take your adjacency matrix and want to somehow confine all the eigenvalues into the smallest possible interval around 0 that you can in magnitude. And it turns out that 2 square root d minus 1
is the magic number. You can't do any better than that. So if you have any sequence of d-regular graphs going to infinity, then you can say that the second largest eigenvalue in absolute value is at least 2 root d minus 1 minus a constant over the diameter.
And this is what Charles was referring to as the Alon-Boppana theorem. And he had discussed that theorem in a more general context yesterday and in this course in general this week. So 2 root d minus 1 is the best possible one. And if it so happens that all the eigenvalues, except the first one, satisfy
that they are at most this number in magnitude, and really this number, not plus some epsilon, not plus some little-o(1) error, really at most this number, then you call the graph a Ramanujan graph. That's all. So if you'd like to think of it differently, it's a d-regular graph where the spectral gap,
the absolute spectral gap, where you also take the most negative eigenvalue into account, is the best possible one. And as we saw from Charles' talks, G(n,d), the random d-regular graph, is almost Ramanujan.
It's weakly Ramanujan: it's 2 root d minus 1 plus a little-o(1) error. And there are precise and actually fascinating conjectures for the distribution of the second largest eigenvalue. And in particular, it is believed that with probability uniformly bounded away from 0 and 1,
a random d-regular graph will be Ramanujan, with some explicit conjectured constant. But that is far from what present technology is able to prove. OK, so there are various constructions, randomized
and non-randomized for Ramanujan graphs. And in order to deal with this object, as I mentioned yesterday, that's what I was getting to here.
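In symbols, the definition just recalled reads as follows (a sketch of the blackboard statement; A is the adjacency matrix of a d-regular graph G):

  \[ \max\{\, |\lambda| : \lambda \text{ an eigenvalue of } A,\ |\lambda| \neq d \,\} \;\le\; 2\sqrt{d-1}, \]

equivalently, every non-trivial eigenvalue of the transition kernel P = A/d has absolute value at most \( \rho := 2\sqrt{d-1}/d \). The Alon-Boppana theorem says that no sequence of d-regular graphs can beat 2 root d minus 1 by more than a vanishing amount.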
You have a random graph. And suppose that I tell you, well, forget about the random graph. You just want to use the fact that you have some information about the spectrum. And you want to exploit that in order to control the L1 mixing time. So the first thing that one would try would be to say, I don't know about L1 mixing,
but L2 is tightly related to the eigenvalues. I mean, if you know the eigenfunctions as well, then it's an equality. So let's see what we can do with the eigenvalues. And maybe L2 will bound L1 for us. With some luck, it's going to be a tight bound, a bound that we can match with a lower bound on L1.
Then we're done. This is the way many, many proofs for L1 cutoff actually were obtained. So we're using precise representations, complete representations of all the eigenvalues and all the eigenfunctions, going through L2 to bound L1 and providing some bound that matched using
some elementary arguments. OK, so for p equals 2, what can we say? I'll write the definition of what
we mean by the distance according to Lp norm. So this would be just starting from some point x. You can just say that this is just measuring yourself.
You're measuring the relative density with respect to pi, the stationary distribution. P is, as usual, the transition kernel. And you're measuring it with respect to, maybe this is written in small font, with respect to Lp. So this is the distance according to Lp at time t. And as usual, we'll say that t mix according
to Lp within epsilon is the minimum t such that d_p, without any x, is at most epsilon, where this guy, d_p without the x, is the worst-case d_p, so the worst starting
position. This is the standard definition. If you plug in p equals 1, then integrating against pi cancels this pi and multiplies the minus 1 by pi here.
And you get just the L1, or equivalently twice the total variation, mixing. So total variation distance is essentially the case p equals 1. And we're just going to use the p equals 2 case to bound the p equals 1 case, ideally.
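Written out, the definitions just given are (a sketch of the board, with pi the stationary distribution, uniform in the d-regular case):

  \[ d_p^{(x)}(t) \;=\; \Big\| \frac{P^t(x,\cdot)}{\pi(\cdot)} - 1 \Big\|_{L^p(\pi)},
     \qquad
     t_{\mathrm{mix}}^{(p)}(\varepsilon) \;=\; \min\big\{ t : \max_x d_p^{(x)}(t) \le \varepsilon \big\}, \]

and plugging in p = 1 gives \( d_1^{(x)}(t) = \sum_y |P^t(x,y) - \pi(y)| = 2\,\|P^t(x,\cdot) - \pi\|_{\mathrm{TV}} \).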
So here's the proposition. For every epsilon larger than 0, on G, which is G(n,d), the simple random walk with high probability satisfies the following.
So for every epsilon larger than 0, t mix in L2 of epsilon is equal to (1 half plus little o(1)) times log base 1 over rho of n, where rho is this magic number from above.
It's the bound we have on the eigenvalues, divided by d, because this is the normalization that makes the adjacency matrix into a transition kernel for the simple random walk. And what I mentioned yesterday very quickly
is that this inequality, so just the lower bound, applies to any d-regular graph.
Any d-regular graph, so in general, you can say that t mix in Lp, for p that
is between 2 and infinity, t mix of epsilon will always be at least (p minus 1 over p) times log base 1 over rho of n, minus some order log log n.
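So, in symbols, the proposition and the deterministic lower bound are (a sketch; rho = 2 root(d-1)/d as above):

  \[ t_{\mathrm{mix}}^{(2)}(\varepsilon) \;=\; \Big(\tfrac12 + o(1)\Big)\log_{1/\rho} n \quad\text{w.h.p. on } G(n,d),
     \qquad
     t_{\mathrm{mix}}^{(p)}(\varepsilon) \;\ge\; \frac{p-1}{p}\,\log_{1/\rho} n \;-\; O(\log\log n) \quad\text{on every } d\text{-regular graph},\ 2 \le p \le \infty. \]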
So this is a deterministic statement. And the proposition is telling us that actually these random regular graphs, or these Ramanujan graphs in general, because the only thing we need,
this "on G(n,d)" is kind of a red herring, what the proposition actually needs is just to look at the spectrum. So if you have a Ramanujan graph, or a weakly Ramanujan one where you lose a little o(1) here, but we don't care about that for now, then Ramanujan graphs actually
are the fastest possible for simple random walk for any Lp. And here I wrote p that is at least 2, going all the way to infinity. And we know, at least from what we've seen in the past two classes,
that g and d was the fastest one for L1. And in fact, this will be true for any p from 1 to infinity. So these are kind of funny objects. The proof of this lower bound, actually also the upper bound
is immediate, the proof of the lower bound is very short. It would be good to kind of get us into gear before we discuss how the actual proof of Ramanujan graphs for p equals 1 works. So I'm going to do it very quickly. So here's the proof. And the lower bound is essentially
exploiting the same old connection between the cover tree and g, which is something that we've seen earlier in this course and also in Charles' talks. Goes as follows.
So for the lower bound, this is what I'm going to start with. And I'm going to start doing it just for p equals 2. So any d-regular graph, p equals 2. So what you can say is for every x in your graph, you want to understand what sum over y of P t of x, y squared is.
And the reason is that P t of x, dot, with respect to the L2 of pi norm,
squared, well, what is that? So I'll write it like this. We have a sum over a, let's write it like this, sum over x of mu of x.
This is, I'll write it without writing it in terms of p to the t. I'll just write something that's true for any distribution mu. So for any distribution, every probability distribution mu, if you just write this, so the relative density
with respect to pi squared, this is my d pi. And this is a discrete space. So this is what we have here when I write mu, replace p to the t by mu. This, if we open it up, we get what?
We get the sum of squares term minus 1. Because in the cross term, again, the pi cancels. So we have a minus 2 plus 1, which is a minus 1. Yes, fancy, right? So with this simple observation, we see that this is the guy that we care about.
But our pi, we also know what our pi is. It would just be the uniform distribution. So let's look at what summation over x of mu squared is. So this is what I'm writing here. So this summation over x is actually this y here.
This is fine. So summation over y of P, because this is now becoming the source vertex for the walk. So summation over y of P t of x, y squared, what is that? That is nothing but P 2t of x, x. So these are the wonders of reversibility.
So this just measures the probability of returning to the origin. We are in G. We are still not on the tree. Returning to the origin in 2t steps. So now we can lower bound it by Q 2t of o, o,
where this is the random walk on the tree, using the exact same connection. It is just a lower bound, because many vertices of the tree carry the same labels from G.
So we really don't need to go back to the root in order to revisit the same vertex of G multiple times. But certainly, if we revisit the tree's root, then we do go back to the origin. So Q is the transition kernel of the walk on the tree.
We used this notation in the course, this script X on T_d, the d-regular tree.
OK. But what is this guy? This guy is nothing but a probability. And you can now say that my o also looks like a 0, but this is a 0. So it is the probability, starting from 0, that S at time 2t equals 0,
where S is the appropriate one-dimensional biased random walk reflected at 0. Because now we are just forgetting that this is a tree, and we just have this biased random walk that goes
to the right with probability d minus 1 over d and to the left with probability 1 over d, and we reflect it at 0 because we don't go negative. So this is nothing but the height, the height of X at that time. So this is known, this guy; there is a precise formula for it.
I'm writing a precise formula because it's nice that these things are known to this level of precision.
This is an equality. And let's write what this actually means. Asymptotically, if we are looking at t large, this behaves like rho to the 2t over t to the 3 halves, up to constants.
So what do we care about? We care about this t to the minus 3 halves and about this rho to the 2t. So this means, if we go back to why we needed this, that if we look at P t of x, dot, divided by pi, minus 1, in L2 of pi, squared, then, up to constants,
the important thing is that we have a rho to the 2t divided by t to the 3 halves, times n.
And the n comes from this ratio with pi that we said let's postpone till later. Here we did it without the division by pi. So dividing by pi gives the factor of n. So now we see that in order to actually kill this n
and bring our L2 of pi, our distance with respect to L2, to be, I don't know, a half or close to 0, then we have to pay. t should be, first of all, at least
1 half times log base 1 over rho of n. That's to kill the n. But we also have our t in the denominator, which we can handle with an extra C log log n. So that is the lower bound. It is trivial.
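Collecting the steps of this lower bound in one chain (a sketch; Q is the walk on the d-regular tree T_d, pi is uniform, and rho = 2 root(d-1)/d):

  \[ \Big\| \frac{P^t(x,\cdot)}{\pi} - 1 \Big\|_{L^2(\pi)}^2
     \;=\; n \sum_y P^t(x,y)^2 - 1
     \;=\; n\,P^{2t}(x,x) - 1
     \;\ge\; n\,Q^{2t}(o,o) - 1
     \;\gtrsim\; \frac{n\,\rho^{2t}}{t^{3/2}} - 1, \]

so making this small forces \( t \ge \tfrac12 \log_{1/\rho} n - O(\log\log n) \).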
And for the upper bound, we just want to use, I'm using this, by the way, you may not realize this, but one of my purposes in doing this exercise is to remind everyone the expansions, the spectral expansions with respect to l2 distance, which we are going
to use in the proof of the Ramanujan graph. So those of you who find this super elementary, bear with me slightly longer. So for the upper bound, well, all that you need to do is remind yourself of the fact that we can just
write the l2 distance as summation over i running from 2 all the way to n.
Now we have something like this. This is something that we did right on the board last time. We have f i evaluated at x, squared. And now we have lambda i divided by d, to the power of 2t. This was a different formula; it was an equality for this distance.
And now, if we look at it like this, then we see that in order to kill the n, if you know that all the lambdas, all these guys, are at most rho in absolute value, then you get essentially the same thing for Ramanujan.
Our assumption for Ramanujan was that the absolute value of this guy is at most rho. So we get here something that looks like n times rho to the 2t, which is exactly matching this guy up
to this log-log correction. So that's why the Ramanujan ones achieve this equality. So one last thing.
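And the spectral expansion behind the upper bound, in the form we will reuse, is (a sketch; f_1, ..., f_n orthonormal eigenfunctions of P = A/d in L2(pi)):

  \[ d_2^{(x)}(t)^2 \;=\; \sum_{i=2}^{n} f_i(x)^2 \Big(\frac{\lambda_i}{d}\Big)^{2t}
     \;\le\; \rho^{2t} \sum_{i=1}^{n} f_i(x)^2 \;=\; n\,\rho^{2t}
     \quad\text{on a Ramanujan graph}, \]

using that \( \sum_i f_i(x)^2 = 1/\pi(x) = n \).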
Yeah, OK, here's one last thing. So this is this formula, the fact that this equals n times,
oh, did I? I forgot the n. And yes, well, we can, OK, maybe we'll give it as an exercise. I think I'll give it as an exercise.
There is an additional way to see this lower bound, which is related to what Charles was talking about yesterday. If you don't want to look at the tree and want to immediately obtain a lower bound that matches this rho to the 2t times n, you can use the fact that, I mean,
how did our upper bound work? It said, well, all your eigenvalues are essentially at most rho. And rho is enough to kill an n in time. Well, rho squared is enough to kill an n
if you raise it to a power of t, which is a half log n. That was the upper bound. The lower bound that you could say if you don't want to look at the tree just uses Serre's theorem that tells you that not only do you have to have an eigenvalue close to rho, close to 2 root d minus 1, but there's
a linear number of such eigenvalues. Then if you know that there's a linear number of such eigenvalues, then you have to kill an n with that rate. So OK. Sorry, can I start? Yes, but I'll give this. But if it's related to the exercise. No.
OK. You put here the l2 square above where you looked at mu, but then you've not looked at the square below. I don't know what you're talking about. So this should be a minus 1. If you're looking for it to be close to 0. No, no, no. You're looking for this to be close to 0 after the minus 1.
The 1 here is pi, essentially. It's the... So in the previous line, you've got a minus 1. Yeah. The minus 1. But if you're looking for the quantity on the right-hand side to be like approximately... I do. So what? But there's no minus 1 on the second line.
But you can add the minus 1 if you want. I take t to make this other number approximately 0. Or like a small constant. OK. If t is small, then it's big. This is really simple. I'll be happy to walk you through it in the break.
What are you doing with the minus 1 for this one? Yeah, but for these terms, that's why I wrote it in kind of a faded font. I want to look at the term that multiplies n.
So until you become, the point you could, the question is, so when you start with total variation mixing, you start with a distance of 1 and you need to go to 0. When you talk about L2 mixing, you start with root n and you need to go where? So you could say that you need to go to 1. But you could also say that you need to go to 10.
And it would be the same. Because we know that there is a cutoff in these situations. Because there's a theorem by Saloff-Coste and Chen that tells you that Yuval's condition, this product criterion, is going to capture a cutoff for any Lp
if p is larger than 1. However, we don't know where that cutoff is. But if we know that there's cutoff, then setting a constant is really the same for all of them. So if you choose 10 and you are happy that it's OK. So the take-home message from this
was that naive spectral analysis does not work. Because this guy, which is our L2 mixing time, this is our L2 mixing time for G(n,d). We have it precisely. And we can do the same for any Lp larger than 2. And for p between 1 and 2, you could also calculate it.
But these guys all miss the bound that we are after. What is the bound that we are after for L1? It was, I'll remind you, this was the formula that we got from G(n,d). So if you think that the Ramanujan graph in general should be like the random regular graph,
then this is the bound that you are after. This looks kind of different. And one of the exercises was a simple calculus exercise to show that for any d that is 3 and above, this guy, the leading-order constant in front of the log n here, is strictly bigger than this one, as it has to be.
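Side by side, the two time scales being compared are (a sketch; the second one is the total-variation cutoff location from the previous lectures):

  \[ t_{\mathrm{mix}}^{(2)} \;\approx\; \tfrac12 \log_{1/\rho} n \;=\; \frac{\log n}{2\log\big(d/(2\sqrt{d-1})\big)},
     \qquad
     t_{\mathrm{mix}}^{(1)} \;\approx\; \frac{d}{d-2}\,\log_{d-1} n, \]

and the calculus exercise is to check that for every d at least 3 the constant in front of log n on the left is strictly larger than the one on the right.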
So as a method, it breaks down. So let's see what we can do. This is a picture of the Ramanujan graphs and some simulation that shows us.
So this is actually a simple random walk in L1, as opposed to L2. And you see that L1 and L2 look like this. So the blue curve is the L1, and it mixes strictly before the L2 does.
And this is the picture of the generalization of this proposition that I wrote down that works for any P. So for any P, it turns out that these Ramanujan graphs and also
the random regular graphs that behave the same way, they attain the fastest possible mixing time. And it goes like this.
And that's the last one, I think. This is a simulation for just the LP distance on one family of Ramanujan graphs, how it decays. And as you can see, you have L1 and then much farther L2
and so on. So all of them take longer and longer to mix. And L infinity is to the far right. Now, this is the theorem that we want to show. We want to say that exactly what we had on G(n,d) also applies on a Ramanujan graph.
So this is the theorem that we want to show. But we also know that spectral analysis isn't good enough, just the naive approach to spectral analysis. You just write it down. And we also know that our understanding of a general non-bipartite Ramanujan graph is limited.
So we cannot do the tricks that we did in G(n,d), which, I'll remind you, the bottom line was you needed to estimate the size of the cut between every two balls that had volume root n.
So for two typical balls with volume root n, you wanted to know that they have the right number of edges between them, that it's essentially like a Poisson with a fixed mean. If you knew that, then you're in business. But otherwise, you have nothing. So the solution in this case, let's keep this on the board
for a minute. The solution is simply to work with non-backtracking walks instead. So as we did for the case of the proof of the walk
in the random regular graph, there we moved to non-backtracking random walks, even though you could have carried out the same proof for the simple random walk, which we did in the first paper. Here it turns out that the move to non-backtracking random walks is imperative for the sake
of this spectral analysis. So let's remind you that A is the adjacency matrix of G, and B is the adjacency matrix of the non-backtracking walk,
let's just say the non-backtracking operator, which acts on directed edges. So B acts on the directed edges, and this one, A, acts on the vertices.
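For the record, the operator just described, acting on the dn directed edges, is (a sketch of the standard definition):

  \[ B_{(x \to y),\,(y' \to z)} \;=\; \mathbf{1}\{\, y = y' \ \text{and}\ z \neq x \,\}, \]

so a directed edge from x to y can step to an edge from y to z for any neighbour z of y other than x.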
And what we know about, if you recall Charles' talk from yesterday, there is this tight connection between the eigenvalues lambda 1 all the way to lambda n of simple random walk, or of A in this case.
And the eigenvalues theta i, like this, there are dn eigenvalues, so d for the degree and n for the number of vertices, dn being the number
of directed edges of B. And it turns out that this relation, maybe I have a picture of it, let me jump for a second to this picture. You see these two circles, you see a circle above,
there are two types of eigenvalue distributions. Now the eigenvalues are no longer real, unfortunately, because this matrix is no longer normal, but you can still plot out the complex eigenvalues.
And they sit over there in the complex plane, and it turns out that for Ramanujan graphs, they all have modulus exactly root d minus 1. Exactly. I mean, there are trivial ones that are plus or minus 1.
Let's ignore them. It's just like you have a trivial one for simple random walk that is d. For Ramanujan graphs, you have all of these eigenvalues, you have, let's say, 1 at d, and all the rest sit here in an interval between plus and minus 2
root d minus 1. It turns out that through this relation that Charles mentioned, all your complex eigenvalues, theta i, are going to be transported to have modulus exactly root d minus 1 in the complex plane.
So these are your thetas. They sit here, and you have the plus and minus 1, and you also have the d minus 1, which also comes from this d. So this is what the picture looks like.
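Here is a minimal numerical sketch of this picture, assuming numpy and networkx are available; the small sampled graph, the variable names and the tolerance are only illustrative (a random regular graph is typically only weakly Ramanujan, so a few eigenvalues may fall off the circle):

    import numpy as np
    import networkx as nx

    # A small random d-regular graph (not literally Ramanujan).
    d, n = 3, 40
    G = nx.random_regular_graph(d, n, seed=1)

    # Directed edges: each undirected edge {u, v} gives (u, v) and (v, u).
    darts = [(u, v) for u, v in G.edges()] + [(v, u) for u, v in G.edges()]
    idx = {e: i for i, e in enumerate(darts)}

    # Non-backtracking operator: B[(x,y), (y,z)] = 1 whenever z != x.
    B = np.zeros((len(darts), len(darts)))
    for (x, y) in darts:
        for z in G.neighbors(y):
            if z != x:
                B[idx[(x, y)], idx[(y, z)]] = 1.0

    theta = np.linalg.eigvals(B)                     # dn complex eigenvalues of B
    lam = np.linalg.eigvalsh(nx.to_numpy_array(G))   # n real eigenvalues of A

    # Expected picture: one Perron eigenvalue d-1, a batch of trivial +/-1
    # eigenvalues, and the rest of modulus (close to) sqrt(d-1).
    print("top adjacency eigenvalues:", np.round(np.sort(lam)[-3:], 3))
    print("largest |theta|:", np.round(np.sort(np.abs(theta))[-3:], 3))
    print("fraction with |theta| near sqrt(d-1):",
          np.mean(np.isclose(np.abs(theta), np.sqrt(d - 1), atol=1e-4)))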
And why is that helpful? Well, let's go back to the computation above. It is helpful for the following reason. Let's imagine that our B was a normal operator,
and we'd like to do exactly the same trick as we did before, which is to write down the L2 distance explicitly. What would we get? Well, we would get something like summation over i that is at least 2 up to n.
This is d times n. And there is some contribution from the eigenfunctions. I'm going to ignore it. We can ignore it. Technically, you don't need to raise your eyebrows that much. It's actually easy to eliminate the eigenfunctions if, for instance, you cared about just
the average starting position instead of the worst one. If you have something like this, if you have a summation over i going from 2 to n of fi evaluated, this is your eigenfunction evaluated at x, your starting position, squared.
And now you have some lambda i to the power of 2t. You have something like this. If I'm telling you that instead of this, you want to take 1 over n summation over x. For every x, you have the mixing time from x.
So this is a quenched property. So we first draw the random environment, the graph. Now there are various points. Each of them has a mixing time. And probably they are not all the same, because it's not transitive. If the graph were fixed and transitive, then this thing would be exactly the same.
Otherwise, you can still say what it means to mix from an average starting position. If you do that, you can interchange these two sums. And then you use the fact that summation, what do you get? You get summation over x of fi of x squared, which by Parseval is 1. This becomes a 1.
And you are left with exactly what you need. So this is a cheap trick to bypass lack of understanding of eigenfunctions. But it turns out that in this situation of the problem, we actually, even without this, we can go beyond eigenfunctions.
This is not the main issue. We can somehow bypass the fact that we don't understand it. So I'm going to ignore them. Now if we ignore them, what are we left with? Well, we have something that looks like summation over i of theta i divided by d minus 1
to the 2t for the non-backtracking random walk. And now I'm committing a serious crime here by treating B as a normal operator. That is a huge difference.
But let's just see whether this gives the right intuition or not. So the degree of B was here just a 0, 1 matrix. I put, just the way Charles defined it, I wrote 1 moving from xy to yz such that these are two edges
and z is not x and 0 otherwise. So this is a 0, 1 matrix. How many ways do I have to do this on a D regular graph? D minus 1. So D minus 1 is the first eigenvalue, the trivial one. That's why this point is here.
This is the Perron eigenvalue. And now what does this look like? If I am now in the situation, if you trust my, if you believe me that all the eigenvalues magically lined up on this circle at radius root D minus 1, then here I'm going to write just n.
Well, do you want me to write n minus 1? I can write an n minus 1 if you want. OK. What we have here is n times, so we write a root D minus 1 for the modulus of this theta i.
And we divide it by D minus 1, and we raise the entire thing to the power of 2t, which is n times D minus 1 to the minus t,
OK? Which allows us to kill this n at time log base D minus 1 of n, where the n here is really Dn, the number of directed edges. OK, so we see that moving to the non-backtracking walk,
of which the simple random walk is simply a marginal, it's just the endpoint, for instance, by this exact same argument of the cover tree, right? Remember that that argument, relating simple random walk to non-backtracking random walk,
was fine for any graph G. It didn't need to be random. So it always suffices to bound the mixing time of the non-backtracking random walk. And I'll remind you that that argument had a conclusion. Suppose you can show me that there is cutoff for the non-backtracking random walk at some time l. Cutoff at time l for the non-backtracking random walk
implied the same upper bound, just an upper bound on mixing at time D over D minus 2 times l for the simple random walk, plus there was some error, some plus order root l error.
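Putting the heuristic computation and this reduction side by side (a sketch; the first display pretends B is normal, which is exactly the crime flagged above):

  \[ \sum_i \Big| \frac{\theta_i}{d-1} \Big|^{2t}
     \;\approx\; dn \Big( \frac{\sqrt{d-1}}{d-1} \Big)^{2t}
     \;=\; dn\,(d-1)^{-t},
     \qquad\text{so the guess is}\quad \ell \approx \log_{d-1}(dn), \]

and cutoff at time \( \ell \) for the non-backtracking walk gives an upper bound of \( \tfrac{d}{d-2}\,\ell + O(\sqrt{\ell}) \) on the mixing time of the simple random walk.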
So that was the reduction. So if we know that we can put in log D minus 1 of Dn as an upper bound, then we get our D over D minus 2 upper bound for the theorem, which is exactly what we're after. And the lower bound is the same.
The lower bound was also fine for any deterministic graph. It was just relying on how long it takes you to actually see the vertices of any graph, which doesn't even have to be regular; it just has to have maximum degree D. You can't see all the vertices before that time. But here it's the L2 mixing time for the non-backtracking walk.
Ah, but that's an upper bound on the L1. Good question. So what you have here is that the non-backtracking random walk had the same lower bound of covering, of kind of seeing all the vertices. So what this calculation, which is, of course, erroneous,
because we pretended that it's normal, but what this calculation would say is that here is the L2 point in time where you mix. This is this log base D minus 1 of n, or of Dn, it doesn't really matter, for the non-backtracking walk.
And here is the L1 lower bound for the non-backtracking walk. So actually, the non-backtracking walk has the same cutoff location for both L1 and L2, as opposed to simple random walk where they are shifted apart. So you move to the non-backtracking random walk
and they glue to one another. But for the sake of simple random walk, you just needed the upper bound. So your question actually raises a point that I didn't say, that actually this would not only give us cutoff for simple random walk
at the rescaled time, but also cutoff for the non-backtracking random walk at the faster time, both with L1 and L2 at the same location. So this operator is not normal, but it turns out
that one can bypass that technicality. And we'll see that in a second. But before we do so, let's just see what one can achieve once you know, for instance, that the L1 mixing time is at time log base D minus 1 of
Dn. Once you know, for instance, that the non-backtracking walk has cutoff at this t. So we know that if we do a non-backtracking random walk now on the Ramanujan graph, if we believe that this argument can be corrected,
should have cutoff at this location. Well, what does that mean? This is actually the time that it takes us to see all the directed edges. And essentially at that time, we mix. So that must mean that the distance, and this is the worst case starting position.
So it means, in particular, that almost all vertices should be at that distance from us. We know that almost all vertices are at least that far. If there were a constant fraction that was farther, then the total variation distance at that time would be bounded away from 0, because that would be a distinguishing statistic.
So here is where we use the fact that l1 really has a physical interpretation, as opposed to an analytical or spectral interpretation. From l1, we can read distinguishing statistics. So it tells us various things, such as this one. So almost all vertices are at that distance from the origin.
And you can say more things. So for instance, the diameter; I won't go into all of them. The diameter one, for instance: you can say that the diameter is at most 2 times this quantity, up to lower-order terms.
How do you know that the diameter is at most twice that? Well, from one vertex, you see almost all vertices at distance 1 times log base d minus 1 of n. So for any two vertices, it suffices that each sees more than a half of the vertices at distance 1 times log base d minus 1
of n, and then there has to be a path between them of length 2 times log base d minus 1 of n. So this is trivial. And actually, you can say more things. For instance, you use the fact that this is a non-backtracking walk.
And that means that I can say, well, I don't want you to just reach z here, almost all z's, by a path. But I want you to reach z here by a path that starts with this edge and is non-backtracking. And that seems kind of like, OK, what's the big difference?
But for instance, these Ramanujan graphs, they have a huge girth. So some of these constructions have a girth that is 2 thirds log base d minus 1 of n, for instance. So if I force you to go through a specific edge,
suppose that the shortest path went through one of the other neighbors from w. In order to actually get to w, because you can't turn around, you have to somehow complete a cycle. And the cycle will cost you a macroscopic amount of time. So a huge amount. But actually, it turns out that there's
such a wealth of paths that you can always get from any direction that you want to any direction that you want to get to z from. Anyway, and there are various other statements. In the paper, we listed various more complicated corollaries.
OK, so the key to the proof, I think, so we started this proof actually after seeing Charles' paper. Because in Charles' paper from 2015, the one that he talked about yesterday, he somehow analyzes the non-backtracking random walks
in a way that really looked like, in order to show that it's weakly Ramanujan, in a way that really seemed close to what we need. We need to somehow do exactly the opposite. Use the fact that it's weakly Ramanujan to somehow count the non-backtracking random walks. And then it seemed like, well, this thing
has to give you the proper control. This slide summarizes what we said. It's just that simple spectral analysis fails. And this picture shows you this. You see how the L1 and L2 glue on the non-backtracking side and don't for the simple random walk.
So for simple, you have L1 and L2 different. For non-backtracking, they glue each other. OK, so two quick things. First, the reason that all the eigenvalues
are on this circle at radius root d minus 1, that's an old fact. It's a result of Hashimoto in 1989. It uses the Ihara zeta function. And Charles also referred to it as Bass's formula. Bass's formula also implies it and applies to graphs that are not necessarily
regular. So he wrote the Bass-Ihara formula. And it tells you that whenever you have an eigenvalue lambda of A, that eigenvalue gives rise to two eigenvalues, theta 1 and theta 2, of B. And then B
has many more states. It has dn as opposed to 2n, the two that I just counted. But all of these d minus 2 times n are trivial eigenvalues. These are these plus and minus ones that you get from, you have various degrees of freedom. And this is something that I'm going to dedicate
one of the exercises to today. OK, so one little calculation that Charles did yesterday, and he did it a little quickly, and I want to do it again just because I think it's important to bear in mind,
which is how you can actually see that every lambda gives rise to two such thetas. So suppose that AF equals lambda F.
And suppose that theta solves this equation. It's the root of this quadratic. Then I want to convince you that actually you can write down an eigenvalue of B that matches theta.
Also, by the way, if you look at this formula, maybe we'll write it very, very quickly. So this is the solution, right? This is the solution of the quadratic, theta squared minus lambda theta plus d minus 1 equals 0.
Notice that if lambda is less than 2 root d minus 1 in absolute value, that's exactly when the discriminant becomes negative. Then we have a lambda over 2 plus i times something.
In that situation, we'll have a plus i times the root of d minus 1 minus lambda squared over 4, or minus that thing. And we will end up with having a theta and a theta conjugate
in that situation, when lambda is less than 2 root d minus 1. And what is the modulus? Well, we'll see that its square is lambda over 2 squared plus this thing squared. The lambda over 2 squared cancels, and we're left with a d minus 1.
This is for the square. So the modulus is the root of that, right? So if you believe that the thetas that solve this quadratic give all the non-trivial eigenvalues of B, then you immediately see that being Ramanujan implies
that all of them have to be on this circle in c. Does it mean that b has at most 2n distinct eigenvalues? No. Well, it has at most 2n plus 3.
It has at most 2n distinct non-trivial eigenvalues. There are the plus ones, minus ones, and the d minus 1 that you can immediately write down. So actually, slightly less than that, because the d minus 1 and one of the plus 1 copies come from plugging in lambda equals d.
Right? If you solve it, you can see that if you put in lambda equals d, then you get d minus 1 and 1. And the rest, the n minus 1 eigenvalues, can give rise to non-trivial eigenvalues on the circle. So how do you see that this quadratic
is the right thing to look at? All that you need to do is to define the following. I'll say that g of x, y, this is a directed edge. So I'm defining an eigenfunction explicitly. It's just theta fy minus fx.
This is it. Now I'll just verify very quickly that this is an eigenfunction. So what is b times g of x, y? What do I need to do? I'm summing over z such that yz is an edge.
Non-directed, just a simple edge. It doesn't matter. And I want z to not be x. This is where the operator b would send xy, to an edge yz such that z is not x.
And now what do I need to do to that one? Well, I need to apply g on yz. So it is theta of fz minus fy. This is g of yz.
So what is that? What we have here is exactly theta times, Af of y minus f of x, minus d minus 1 times f of y.
Because f of y gets counted as many times as we have z's. We have exactly d minus 1 z's, because the graph is d-regular; d minus 1 is the out-degree of this B operator.
It's the Perron eigenvalue. So this d minus 1 times f of y is just by the regularity of B. And now we're left with this guy. If we had counted all the z's, including x, this would be exactly like applying the adjacency operator, which is Af of y, and multiplying it by theta.
Because it would say, look at all the neighbors of y and apply f to them. But we forgot one. We left out x. So minus f of x. And now we'll use the fact that Af is lambda f.
And we have here theta lambda minus d minus 1, multiplying our f of y, minus theta f of x. I rearranged it a little.
I took the theta f of x out. So I used the fact that Af equals lambda f. And I have that. And now, I want to use the solution of the quadratic to say that this is nothing more than theta times g of x, y.
Because you just see that this is the point where you plug in the fact that theta lambda minus d minus 1
is exactly theta squared. So this is theta squared because of the solution to the quadratic. So we have theta squared f of y minus theta f of x, which is exactly theta times the g that we started with.
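The verification just done on the board, written as one chain (a sketch, with Af = lambda f, theta^2 - lambda theta + (d-1) = 0 and g(x,y) = theta f(y) - f(x)):

  \[ (Bg)(x,y) \;=\; \sum_{z \sim y,\ z \neq x} \big( \theta f(z) - f(y) \big)
     \;=\; \theta\big( (Af)(y) - f(x) \big) - (d-1) f(y)
     \;=\; \big( \theta\lambda - (d-1) \big) f(y) - \theta f(x)
     \;=\; \theta^2 f(y) - \theta f(x)
     \;=\; \theta\, g(x,y). \]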
So Charles did it very quickly yesterday on the board, roughly. And now, we see exactly where these eigenvalues come from. And the problem that we still need to work with
is the fact that this operator is not normal. So even though we have exactly the right kind of control over the thetas, the operator doesn't really care about it. I mean, it doesn't let us work with it the way that we want to. So for this, that was the bit that required some delicate treatment in that paper.
We needed to decompose the operator and show that even though it's not normal, it's decomposable into two by two blocks. So it is unitarily similar to a block diagonal matrix whose block sizes are just two.
So even though they are not one, so it's not a diagonal matrix, they are just two by two. And they look like this. The theta 2 and theta 2 prime are, in the case of Ramanujan graphs, going to be a value and its conjugate, its complex conjugate, just like we saw here,
solutions to this quadratic equation. So in the real case, in the Ramanujan, in the non-Ramanujan case, then you have two real values that are marked here as a theta 2 and theta 2 prime. But they still come from being the two roots of the same quadratic equation coming
from the same lambda. So these can have lots of multiplicities, and that's what made the proof more delicate. Specifically, in the Ramanujan construction that I showed pictures of, eigenvalue multiplicities are as high as n to the 1 third. So if all the eigenvalues were different,
then there would be some easier ways to work around the fact that you can somehow nearly diagonalize. And they're not, but you can still carry through. So with this formula, all that remains
is there's still the business of the eigenfunctions, which I again want to ignore. But the only thing that one needs to now notice
is that if you have a 2 by 2 matrix that looks like this, I'll put here a theta and a theta prime, 0, and I'll write here alpha, just like we have over there. And we raise it to the power of t. So in the spectral decomposition, so you have your lambdas, then
you raise your matrix to the power of t, and the lambdas just get raised to lambda to the power of t. And then you square it when you do this Parseval. But now we have a 2 by 2 matrix. But it is still easy to see what's going on. So here we have a theta to the t.
Here we have a theta prime to the t, the 0 stays 0. And here we have something that looks like, I'll call it gamma. And gamma, we're going to call it gamma sub t. Gamma is just this sum.
This is well known. So it's alpha times the sum of theta to the j times theta prime to the t minus 1 minus j. This is all. So your gamma t, this value of gamma t, is at most alpha times t times the modulus of theta to the t minus 1.
So instead of just obtaining what we had above,
which is now below, something like a value of theta to the 2t, where we had theta to the t and then squared it, there's now an extra term that comes in
from this off-diagonal entry. And this extra term is actually becoming dominant, because it has an extra t in it. So what do we get? Instead of getting this, what we had before, we'll get instead a constant times
another t squared, times n times d minus 1 to the minus t, exactly what we had before but with the extra t squared. This comes from the gamma squared. So we need to take t to be the same thing, plus a log log to kill this t squared.
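In symbols, the extra term just described comes from (a sketch; the moduli of theta and theta prime are both root d minus 1 in the Ramanujan case):

  \[ \begin{pmatrix} \theta & \alpha \\ 0 & \theta' \end{pmatrix}^{t}
     = \begin{pmatrix} \theta^{t} & \gamma_t \\ 0 & (\theta')^{t} \end{pmatrix},
     \qquad
     \gamma_t = \alpha \sum_{j=0}^{t-1} \theta^{j} (\theta')^{\,t-1-j},
     \qquad
     |\gamma_t| \le \alpha\, t\,(d-1)^{(t-1)/2}, \]

so after squaring, the dominant contribution is of order \( n\,t^2 (d-1)^{-t} \) rather than \( n\,(d-1)^{-t} \), which costs an extra O(log log n) in t.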
And that's it. You get this by taking the trace of the power of the operator, or? You could, well, there are the eigenfunctions to get rid of, but yes.
OK, so this essentially concludes this proof. OK, now, in the little time that remains, I want to give you an overview, a very rough overview of why we needed to have yet a third proof of cutoff
for G(n,d), because 2 sounds like enough. And the point in having, OK, this is the calculation that we just did. The point in having the third proof is to deal with non-regular random graphs.
And if the graphs are non-regular, then this completely destroys the spectral proof. And it also makes the original proof, well, it also seriously damages the original one. But non-regular random graphs are
associated with fascinating features, one of which I highlighted in this slide. It turns out that the thing that plays the key role in understanding what the mixing time is on a graph that is, let's say, not 3 regular, but half the degrees are 3 and half are 4.
What do you do? Already from this exploration process that we saw, this BFS working with the configuration model, we saw that we developed the neighborhood of a vertex. And it was essentially like a D regular tree, right? Now, in the new situation, we can do exactly the same thing. And now the neighborhood of a vertex
will no longer be d-regular. But every time, we just sample the degree from one of the remaining half-edges. A half-edge, a uniform one, would come along. And it will bring all its friends with it. So this would give a Galton-Watson tree, a Poisson one
in the case of G(n,p), for instance, or, not Poisson, but some other Galton-Watson tree with an offspring distribution Z, which in the case of, let's say, half degree 3 and half degree 4, is not really half 2 and half 3.
It would be the size-biased version of this variable. Because if a guy has many, many half edges, you are more likely to choose it. And it will bring, then, all of its friends as the children. And this is well-known. OK, so as I was saying, the thing that plays a role is then the entropy of a random walk
on the Galton-Watson tree. So the degree sequence will describe, in the obvious way, using this size-biased transition, a distribution, an offspring distribution that I will call Z. And now we are forgetting about the random graph. We just have Z. And we picture this infinite tree,
OK, this infinite Galton-Watson tree. And we recall that when we had this covering map argument and everything was d-regular, then what was the cutoff time for the non-backtracking random walk,
for instance? We wanted to reach level which was log base d minus 1 of n. We wanted to reach a level where we would see n vertices. And that marked the onset of mixing. And simple random walk was just a time delay over this. Alternatively, I could say that this
is the time where your distribution of the non-backtracking random walk, OK, let's look at its entropy. At first, you have a point mass. Then it is uniform on d minus 1 points. And then it's uniform over d minus 1 squared points. So it grows.
The entropy is exactly growing. It is just uniform over the size of the layer, which is d minus 1 to the k at distance k. So this is the rate of growth of the entropy. And you are waiting for this to become just completely uniform, OK?
So the time that you need to wait is exactly that. This is for the non-backtracking random walk. And you could say the same for the simple random walk. It has a delay, but it is essentially the same thing by exactly this analogy between non-backtracking random walk and simple random walk. Simple random walk is nothing more than once you know the distance on the cover tree, then it is just non-backtracking random walk.
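The comparison being set up here, in one line (a sketch; D is the degree of a uniformly chosen vertex, Z the size-biased offspring variable just described, and h the entropy growth rate of the walk on the corresponding Galton-Watson tree):

  \[ \Pr(Z = k-1) \;=\; \frac{k\,\Pr(D = k)}{\mathbb{E}[D]},
     \qquad
     h = \log(d-1) \ \text{in the } d\text{-regular case},
     \qquad
     t_{\mathrm{cutoff}} \;\approx\; \frac{\log n}{h}. \]

In the regular case this recovers log base d minus 1 of n for the non-backtracking walk and, after the d over d minus 2 delay, the simple random walk location from the earlier lectures.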
OK, so it turns out that this entropy is the thing to look at if you don't have a regular setup anymore. But it's instead now just some offspring distribution, offspring variable z. So I write this Hx on the top left.
And I'm telling you, well, now if you, for instance, you were not listening the entire week. And all of a sudden, you looked at this slide, OK? Then you consider a biased random walk on a Galton-Watson
tree. And you can define the distribution. You can look at the distribution SRW_t, which is just the distribution of the random walk after t steps on this tree, and compute its entropy just like Yuval defined in the previous class. So the fact that this limit exists
and is a deterministic constant almost surely requires proof. But let's take a leap of faith here. The fact that these constants are well-defined and are non-random is known. And I'm asking, what can you say
about the relation between these three constants? One of the inequalities is trivial. We know the ratio. Actually, I can tell you already. We know the ratio between the top two. But we don't know the other two inequalities in this generality. So for instance, I'll say at the end what we know.
So in this generality, if Z can take the value 1 with positive probability, we have a conjecture, but we don't know it. And it's a very basic, seemingly naive question. It's just a growth rate of entropy.
Oh, that's loop-erased. Loop-erased, I just mean, this is a tree, so understanding loop-erase is very easy. What you have here, what you have on the left is just simple random walk. It goes up and then goes down somewhere else. The only loops on the tree are the trivial ones. So the loop-erase is just the trace
of the walk where you eliminate all the trivial going back and forth. So instead of, the non-backtracking did not allow you to go back and forth. So the non-backtracking is a ray. However, it is a very simple ray. It is one where you know exactly that if you have now k children, then you
go to each of them with probability 1 over k. Loop-erased also looks like a ray, but it assumes the harmonic distribution over these. It works like the simple random walk would, but then you eliminate all of these. Actually, these guys have an exponentially decaying tail. But the fact that the walk can actually,
this object is far more complicated than just the non-backtracking random walk. So here's what I was saying. Formally, you see that this Hx, this growth of entropy,
that's the key parameter for simple random walk on a graph with a given degree sequence. So you have a random graph with some degree sequence. You define a random variable Z that has the size-biased distribution that I mentioned.
And now, the growth of entropy on the corresponding Galton-Watson tree dictates when mixing should occur, both for simple and for non-backtracking random walk.
So for non-backtracking random walk, this Hy, you can write it explicitly, because you know exactly the growth of entropy, because the probability to visit a vertex is nothing complicated. So if you have d1 children here, then d2, d3, and so on, the probability to visit this guy is just 1 over d1 times 1 over d2, and so on,
because you choose exactly one of the children with equal probability. Now, this is Z1, let's write it like that. Now, it's a random guy, Z2, and so on. So Z1, Z2, Z3, and so on.
But this is a Galton-Watson tree. So the z's are independent, okay? So if I want to understand what this probability behaves like, I'll take a log, okay? And now, I know that I just am summing log of zi's
that are iid, and I have a central limit theorem, okay? So the growth, I'm trying to say, that the growth rate of this guy is nothing more than the expectation of log z. That's the rate at which entropy grows for a non-backtracking random walk. And I should say that Nicola allowed me to go faster.
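The computation just described, in symbols (a sketch; Z_1, Z_2, ... are the i.i.d. offspring numbers along the ancestral line of a vertex at depth k):

  \[ \Pr\big( \text{the non-backtracking walk passes through that vertex} \big) \;=\; \prod_{i=1}^{k} \frac{1}{Z_i},
     \qquad
     -\log \prod_{i=1}^{k} \frac{1}{Z_i} \;=\; \sum_{i=1}^{k} \log Z_i \;\approx\; k\,\mathbb{E}[\log Z] \]

by the law of large numbers (the central limit theorem then gives the fluctuations), so the entropy at depth k grows like k times the expectation of log Z, and that expectation is the rate for the non-backtracking walk.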
So, this is an infinite... ah, Z was at least one. If it can have leaves, then you need to define the non-backtracking walk to go back up, maybe, and then continue, and it should still be true, but then the proof is even more involved.
But you could still say that it stays something like this and ask what the relation between the corresponding parameters is. So whenever you are stuck, you have to go to your parent, and then continue. Okay, anyway, so expectation of log z will be this h y.
And Justin and Anna Ben-Hamou had a proof of the second statement using a different argument, and if you open their paper, you'll see expectation of log Z. And they have an argument
that also works through this exposing of a tree from the left and a tree from the right, but then this becomes delicate, and they use some exchangeable pairs technique. It's very nice.
It's going to appear soon, or it just appeared, in the Annals of Probability. In order to get the simple random walk case, the fact that the walk can go back, and somehow, you'd think that it wants to go forward, but actually, this was all part of some detour,
then goes back and chooses another path, and the fact that we do not really understand what the support is of random walk on a golden watson tree, and we understand, but we have just, there is understanding, but it is limited.
Meant that an entirely new approach had to be drafted to, at least in our paper for simple random walk. We also did non-backtracking with trees from the left and right. And okay, what I'll do,
I'll show you one quick thing, because we are running out of time. The picture on top shows you, for instance, the distribution of simple random walk on, I don't know, five or maybe six levels of a Galton-Watson tree, with an offspring distribution that is half-half, either one or three children. The fantastic idea of plotting the histogram above the tree is Nicola's; it appeared in a paper of Nicola and Jean-Francois. I stole the idea from the book. Oh, such a good idea. Then it comes from the Bible. Okay, and that paper discussed, again, the criticality I mentioned, which is what you want to understand there.
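As an aside, here is a minimal sketch of how one could reproduce that kind of histogram by simulation. It is not the code behind the actual figure: the offspring law (one or three children, each with probability one half) is taken from the lecture, while the depth, the number of walks, and the use of the first hit of the target level as the plotted distribution are illustrative choices of mine.

```python
import random
from collections import Counter

random.seed(0)

def gw_tree(depth):
    """Galton-Watson tree down to `depth`; offspring = 1 or 3, each w.p. 1/2.
    Returns (children, parent, level) maps; vertex 0 is the root."""
    children, parent, level = {0: []}, {0: None}, {0: 0}
    frontier, nxt = [0], 1
    for _ in range(depth):
        new_frontier = []
        for v in frontier:
            for _ in range(random.choice([1, 3])):
                children[v].append(nxt)
                children[nxt], parent[nxt], level[nxt] = [], v, level[v] + 1
                new_frontier.append(nxt)
                nxt += 1
        frontier = new_frontier
    return children, parent, level

def first_hit(children, parent, level, depth):
    """Simple random walk from the root; return the first vertex it sees at `depth`."""
    v = 0
    while level[v] < depth:
        nbrs = children[v] + ([parent[v]] if parent[v] is not None else [])
        v = random.choice(nbrs)
    return v

depth = 6
children, parent, level = gw_tree(depth)
hits = Counter(first_hit(children, parent, level, depth) for _ in range(20000))
# Normalizing `hits` gives the empirical measure on the level-6 vertices,
# i.e. a Monte Carlo version of the histogram drawn above the tree.
print(hits.most_common(5))
```

The taller and shorter bars of such a histogram then correspond to level-6 vertices with larger and smaller empirical mass.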
But this picture is to show you this quantity d: the fact that W_t over t converges almost surely to this quantity, where W_t is this log. This is exactly an equivalent formulation of it in terms of the entropy, and it goes back to the famous paper of Russ, Robin and Yuval from '95. And we, for instance, in this paper had to get a quantitative rate of convergence. Almost sure convergence was not enough; we needed to know, essentially, that the variance has order t. So not just that the expectation is d times t and that you converge almost surely, but that the variance is of order t.
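Just to record what that means in formulas, and only for the easy case: for the non-backtracking analogue of W_t, which is a sum of i.i.d. terms, both statements are immediate,

$$W_t=\sum_{i=1}^{t}\log Z_i,\qquad \mathbb{E}[W_t]=t\,\mathbb{E}[\log Z],\qquad \operatorname{Var}(W_t)=t\,\operatorname{Var}(\log Z)=\Theta(t);$$

the point in the simple random walk case is to obtain the analogous order-t variance bound without that independence.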
Okay, so I want to finish with two things. Let me hide this and put this picture here. I want to divide the next five minutes, and then we'll finish, into two parts. In the first, I will tell you what we actually did in order to handle this case, and why and how; that will be very easy, because I'm going to give you such a high-level sketch that it only takes a minute. And the second is to understand the phenomenon that is captured by these pictures,
which is something of a take-home message about the regular as opposed to the non-regular world. The idea: we already discussed exposing a tree from the left and exposing a tree from the right. The idea here is that when you are doing a simple random walk on a Galton-Watson tree, this dimension mentioned on the previous slide, the one that captures the rate at which entropy grows, tells you the following. Even though the boundary of the tree grows like the branching number, that is, like the expectation of Z to the power k at level k, what does this entropy growth rate mean? It means that the simple random walk, which we phrased in terms of the loop-erased random walk (simple random walk with all the trivial loops erased, looked at at distance k), was confined at distance k to an exponentially small segment of level k. Instead of growing like the expectation of Z to the k, that segment grew like d to the k for some d strictly less than the expectation of Z. It does grow exponentially, but with a different base, and that makes it exponentially small relative to the level.
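In symbols, with the quantities just described:

$$\#\{\text{vertices at level }k\}\approx \mathbb{E}[Z]^{k},\qquad \#\{\text{level-}k\text{ vertices the walk effectively charges}\}\approx d^{\,k},\qquad \frac{d^{\,k}}{\mathbb{E}[Z]^{k}}=e^{-k\,(\log \mathbb{E}[Z]-\log d)},$$

which tends to zero exponentially fast in k, since d is strictly less than E[Z].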
What does that mean, and how do you actually turn it into a proof that tells you when mixing occurs? First of all, it accounts for this picture here. How did we discuss what the diameter was, or what the typical distance in the graph was? We said: expose the tree until you see all the vertices. That was one of the exercises, when we said calculate the typical distance between vertices, and then also do it with exponential ones. Essentially you do a breadth-first search and you wait until this tree grows so that you see n vertices as the leaves, essentially; and that gives you the typical distance in a random graph. Now, since the expectation of Z is bigger than the growth rate of the segment to which simple random walk is confined, we can think of it as follows: the tree grows very fast, but simple random walk has its own little tree, and that one grows at a slower rate. This vertex x is going to see all the vertices in the graph at, let's say, here; that's where you have n vertices, and this dictates the typical distance. So when Durrett had his picture of the distance between the origin and the walk, the picture would still look like that, exactly as in Durrett's paper with Nathan Berestycki.
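Concretely, with the offspring law from the picture (so E[Z] = 2), this typical-distance heuristic reads

$$\text{typical distance}\;\approx\;\log_{\mathbb{E}[Z]} n\;=\;\frac{\log n}{\log \mathbb{E}[Z]},\qquad\text{e.g. } n=10^{6}:\ \log_{2} 10^{6}\approx 20.$$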
And you could say: well, as soon as that distance stabilizes, that's when I have mixing. This is a very natural marker; it is what we started the first class with, and it is what started this conjecture of Durrett from '07. If you look at that point, if you look just at when the distance stabilizes, you'll see that it happens exactly at log base expectation of Z of n. And that part is easy. However, at that point, simple random walk will still be confined to a smaller set, of size n to some exponent strictly less than one, which is exponentially smaller in terms of the level. We have a formula for that exponent, but we don't really know how to write it down explicitly; I can't tell you immediately what it is even for a distribution on two points. You have to run some numerical process to approximate it.
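For what it's worth, here is one crude Monte Carlo sketch of such a numerical procedure, for the two-point offspring law from the picture (one or three children, each with probability one half). Everything about it is an illustrative choice of mine rather than the procedure used in the paper: it estimates the entropy of the first-hit distribution of level k on a single sampled tree and exponentiates the per-level entropy, which is only a rough proxy for the growth base d (a serious estimate would use larger k, many trees, and a bias correction).

```python
import math
import random
from collections import Counter

random.seed(1)

def gw_tree(depth):
    """Galton-Watson tree down to `depth`; offspring = 1 or 3, each w.p. 1/2."""
    children, parent, level = {0: []}, {0: None}, {0: 0}
    frontier, nxt = [0], 1
    for _ in range(depth):
        new_frontier = []
        for v in frontier:
            for _ in range(random.choice([1, 3])):
                children[v].append(nxt)
                children[nxt], parent[nxt], level[nxt] = [], v, level[v] + 1
                new_frontier.append(nxt)
                nxt += 1
        frontier = new_frontier
    return children, parent, level

def first_hit(children, parent, level, depth):
    """Simple random walk from the root; return the first vertex it sees at `depth`."""
    v = 0
    while level[v] < depth:
        nbrs = children[v] + ([parent[v]] if parent[v] is not None else [])
        v = random.choice(nbrs)
    return v

k, walks = 10, 100000
children, parent, level = gw_tree(k)
counts = Counter(first_hit(children, parent, level, k) for _ in range(walks))
# Empirical entropy of the level-k distribution; exp(entropy / k) is then a
# crude stand-in for the base of exponential growth of the walk's support.
entropy = -sum((c / walks) * math.log(c / walks) for c in counts.values())
print("crude estimate of d:", math.exp(entropy / k), " versus E[Z] = 2")
```

On a typical run the printed estimate should come out strictly below 2, and the polynomial exponent discussed above would then be the log of this estimate divided by log E[Z].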
Anyway, so the walk will be confined here. And in order for the entropy to actually go up to log n, so that you are uniform over all the vertices, you have to wait longer, until this set becomes bigger. And that is why, in the irregular graph case, the blue curve, the mixing-time one, starts at one and has cutoff, but macroscopically later; whereas in the regular case, both happen at exactly the same time.
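One way to see why the regular case is special, under the setup of the lecture: for a random r-regular graph the offspring variable is deterministic, Z = r - 1, so there is no gap between the two growth rates, whereas any genuine randomness in Z creates one (for the non-backtracking quantity this is just Jensen's inequality, and for the simple-walk base d it is the strict inequality d < E[Z] quoted above):

$$Z\equiv r-1\ \Rightarrow\ \mathbb{E}[\log Z]=\log\mathbb{E}[Z]=\log(r-1),\qquad Z\ \text{non-constant}\ \Rightarrow\ \mathbb{E}[\log Z]<\log\mathbb{E}[Z].$$

That strict gap is exactly the macroscopic delay of the blue curve in the irregular pictures.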
So I think that, for this one-minute overview, all you need to keep in mind is this: we want to essentially explore the tree, just like we did before, only now the proof says to explore only the parts of the graph that the walk is likely to go to. By exposing the tree, you can already see: ah, this part of the tree looks like the walk is not going to visit it, so I'll stop the exploration here. Because if you just run the BFS and wait until the walk mixes, you'll run out of vertices and the tree approximation is going to die; that's very bad. We keep wanting to have lots of spare vertices so that the tree approximation stays accurate. So you are careful to explore only the really heavy part of the distribution, and you can do that, exploring as you go along. And then, as soon as you know that your entropy is not too big towards the end, you know that you're concentrated on a subset of the vertices. Maybe the walk escaped this tree; that's a bad event.
But if the walk did stay confined to the popular part, where it should be, then most of these vertices do not carry a large weight. And there is an endgame argument that uses the spectrum, which we won't get to; that's what I meant by a hybrid. We have a combinatorial part which understands the distribution of the walk on the graph well enough to bound the maximum weight of a vertex. And then we say: okay, at this point we punt. The punt is to use the spectrum, to say that the graph is an expander, and to say that if the maximum weight of a vertex is, let's say, at most n to the epsilon over n, so that we are off by a factor of n to the epsilon compared to the stationary distribution, then the time it takes to kill this with the spectral gap is the log of that, to some constant base, which is like epsilon log n, for an epsilon that is little o of one. And then we are more careful, to get the right window; so this factor will actually be e to the root log n, and then when you take a log, you get the root log n window that you want. But these are details.
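A standard way to make that spectral endgame precise, stated only as a sketch of the kind of estimate meant and not as the exact lemma from the paper: if mu is the law of the walk when the combinatorial phase ends, pi is the stationary distribution (of order 1/n here), and lambda is the second-largest absolute eigenvalue of the (reversible) transition matrix, then

$$\|\mu P^{t}-\pi\|_{\mathrm{TV}}\ \le\ \tfrac{1}{2}\,\Bigl(\sum_{x}\frac{\mu(x)^{2}}{\pi(x)}\Bigr)^{1/2}\lambda^{t}\ \le\ \tfrac{1}{2}\,\Bigl(\frac{\max_{x}\mu(x)}{\min_{x}\pi(x)}\Bigr)^{1/2}\lambda^{t}\ \le\ n^{\varepsilon/2+o(1)}\,\lambda^{t},$$

so on an expander, where lambda is bounded away from 1, an extra number of steps of order epsilon log n suffices, which is o(log n) once epsilon is o(1).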
Okay, so with that, I think I've said everything I wanted to say. I'll just show you the answers about these inequalities that I said we don't fully know. This is a paper we posted on Monday, or rather on Saturday of this week. So we know the left inequality is true if Z is at least two. If you allow some positive probability for Z to equal one, it should also be true, but we don't know it. And we don't know the right inequality at all, but it should also be true. Okay, nu here is the speed of the random walk, and these are the Hausdorff dimensions; that was the scary formula from the previous slide. Yeah, so I think I'm done. In the exercise session, I'll start by solving
this distance problem with exponential ones, and then I'll give you handouts of problems that we'll solve outside, like Charles' idea, which was very nice, and I'll walk around among you. So we'll start with five or ten minutes to go through that one exercise, and then we'll do the handouts outside. Thank you.