Phase transitions in inference: Tree reconstruction and community detection
Transcript: English (auto-generated)
00:18
It's an honor and a pleasure to be invited here in this prestigious place.
00:24
So I'll be talking about inference problems and especially tree reconstruction and community detection. So let me first say a few words about the kinds of inference problems that we have in mind in this area.
00:43
If you take the first one, community detection, typically the task consists in processing a graph, which represents relationships between entities. And you want to cluster these entities into groups such that nodes within a group are statistically similar.
01:01
So the picture on the top is the graph of quotations between bloggers in the US and some were Republicans, some were Democrats. And just by looking at the graph, you can recover who's a Democrat, who's a Republican, and that's what this picture illustrates.
01:22
So this is a very generic task, community detection. And that will be a focus of this talk. A second one where we are trying to retrieve some structure hidden in a large data set, that of graph alignment. So I'm giving this example just to show that there are many
01:45
problems beyond that of community detection. So graph alignment is the task of given two graphs, find mapping of the vertices in one graph to the vertices in the other graph so that you have more or less graph isomorphism.
02:01
And you have many tasks that can be phrased as a graph alignment problem. For instance, you can have a graph representing the interactions of users of an online social network. And you may have on one such graph identities that are revealed. And you may have another graph representing interactions of users of
02:23
another online social network for which the identities are not revealed. And so if you can find the isomorphism, you can de-anonymize the second graph. So that's one example, but you have many more. So there are algorithms for all of these tasks. There are plenty of algorithms, but it's hard to say which one is good,
02:44
what are the relative merits, what is the difficulty of the task at hand. So the philosophy of the work that I'm going to describe is, let's look at large random instances of these inference problems. And by analyzing those large random instances,
03:01
we'll have a handle on the hardness of the task. Is it difficult? What kinds of algorithms will succeed? And we will be able to prove theorems about the feasibility of the task. And we may also develop new algorithms in the process of analyzing the performance of existing ones. Okay, so I'll be focusing on community detection.
03:24
You can forget about graph alignment for now. And what I want to convey is this kind of picture. So you have a large instance of a community detection problem. You're trying to recover blocks underlying the data set.
03:41
And by varying the parameters of your probabilistic model, your generative model of the task, you may have three situations at least. So there are three depicted here. There is one set of parameters, so I'll go over the meaning of the axis. Don't bother trying to figure out what they mean for now.
04:01
You may have three different outcomes at least. One where the signal is just too weak: you cannot recover meaningful blocks from the observation of this large graph. There's a too-weak signal-to-noise ratio. This is impossible. There's a situation where it's not only possible but achievable in polynomial time:
04:22
Algorithms can be shown to succeed at extracting the information. That's the easy task. That's the easy region. So that's the triangle in green on the left. And then in between you have this very puzzling region where you have signal. You know you can extract it in exponential time,
04:41
but you don't know whether a polynomial time algorithm exists that will succeed at extracting the signal. And so you can take your point in this diagram, move up increasing the signal to noise ratio and cross boundaries. So the red line would be an information theoretic threshold. That's where useful information appears in your data set.
05:02
And when you cross the blue line then that's a computational threshold. That's when the problem becomes easy. Whereas it was feasible but hard before that. Okay so that's what I want to convey for the particular problem of community detection.
05:21
But before doing that I'll go over the tree reconstruction problem which is interesting in its own right and which also paves the way for the understanding of this community detection problem. All right so here's the outline. I'll describe two phase transitions that occur in the problem of tree reconstruction.
05:43
Then I'll move to community detection, related to the tree reconstruction problem. I'll talk about the three phases, showing that a hard phase exists indeed. I'll show that above the so-called Kesten-Stigum threshold
06:02
we can indeed recover the hidden information in polynomial time. And then the last bit I want to cover is a link between study of this community detection problem and random matrix theory. All right so let's start with tree reconstruction.
06:25
And I'm missing a picture on that slide so I'll start using the chalk on the board. So here's the setup. We have a genealogy tree, and so we have an ancestor, or a root, at the top of the tree, and then we have children.
06:44
And the root has a trait, which could be the color of its eyes, whether blue or brown. And these traits are passed from father to son, or mother to daughter if you prefer, in a probabilistic manner.
07:01
And so we can look at the tree down to a depth d. And so we would have a spin or a trait sigma i. And the rule for a transmission of traits in this model is probabilistic.
07:20
We have a stochastic matrix P which gives you P_{s,t}, the probability that σ_i equals t given that the spin of the parent of i equals s. And so we'll assume this is an irreducible matrix and we'll denote by ν
07:43
its stationary distribution, so νP = ν. So I'm almost done with specifying the model. The last thing I need to mention is that the root has a spin that is distributed according to the stationary measure,
08:00
And that these propagations are done independently of one another. So we have a description of the process and the task informally is given the tree to depth d. So I'll add a bit of notation.
08:21
T_d would be the tree to depth d. I also denote by V_d its vertices, E_d its edges, and by L_d the set of depth-d nodes.
08:40
And so the question will be: can one infer non-trivially the root spin from observation of the tree down to level d and the traits of the descendants? So it's as if, today, we looked at the colors of our eyes, we knew the genealogy tree back to Eve or Adam,
09:05
and we wanted to figure out whether Adam had blue eyes or brown eyes. Is the number of children fixed? So right now I have not said what the tree is. I will be assuming this is a Galton-Watson branching tree.
09:25
So that will be my hypothesis throughout. So formally what is the definition of reconstructability? We'll say that reconstructability holds if the mutual information between the spin
09:41
at the root that we're interested in, and the information G_d, which is the tree down to level d plus the spins at depth d, does not vanish as d goes to infinity. I assume I don't need to recall what mutual information is to this audience.
10:05
Don't hesitate to write it, if it's convenient. So I(X; Y), the mutual information between two random variables X and Y, would be the entropy of X minus the entropy of X given Y: I(X; Y) = H(X) − H(X|Y).
10:24
And H(X) would be the sum, over the values x it can take, of the probability of that value times the log of one over that probability: H(X) = Σ_x P(X = x) log(1/P(X = x)). And the conditional entropy would be the same, with a sum over x, y of the probability
10:51
that X = x and Y = y, times the log of one over the probability that X = x given Y = y: H(X|Y) = Σ_{x,y} P(X = x, Y = y) log(1/P(X = x | Y = y)).
11:05
So here's the mutual information, and it has the nice property that the variables are independent if and only if it is zero. So this reconstructability notion captures the fact that we have information correlated
11:25
with the root spin arbitrarily far down the tree. So that's the first task we'll be interested in. And then we may ask for a harder task which is so-called census reconstruction.
11:42
If I am given not the tree to depth d but only the spins of the nodes at depth d and even less than, well if I don't know the tree I can only count how many individuals at generation d have such and such traits.
12:03
This is what we call the census. If we do a census of the population today we count how many have brown eyes, how many have blue eyes and we have got these counts. Can we from these counts infer non-trivially what was the color of the eyes of Adam or Eve?
12:22
So that's census reconstruction, and formally, census reconstructability is going to hold if the mutual information between the spin at the root σ_r and the census at generation d does not vanish as d goes to infinity.
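(Editor's note: written out, the two notions just defined read as follows, in the notation of the talk.)

```latex
% Reconstructability: information about the root spin survives arbitrarily far down
\liminf_{d\to\infty} I\bigl(\sigma_r \,;\, G_d\bigr) > 0,
\qquad G_d = \bigl(T_d,\ (\sigma_i)_{i\in L_d}\bigr),

% Census reconstructability: the depth-d counts alone retain information
\liminf_{d\to\infty} I\bigl(\sigma_r \,;\, X_d\bigr) > 0,
\qquad X_d(s) = \#\{\, i \in L_d : \sigma_i = s \,\},\quad s = 1,\dots,q .
```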
12:42
And so indeed we need to specify what the tree is and for us it will be a Galton-Watson branching tree and a critical parameter will be the average number of children per individual. So that will be alpha. So the parameters of the model will be p the stochastic matrix and alpha the branching
13:02
number of the tree. And of particular interest will be the case where the number of children of a given node follows a Poisson distribution with parameter α. Do you assume that the stochastic matrix P is known or unknown? I assume it's known.
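(Editor's note: a minimal simulation sketch of the model just described, assuming Poisson(α) offspring and numpy; the parameter values are made up. It also prints the quantity αλ_2^2 that appears in the Kesten-Stigum threshold discussed later in the talk.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Example parameter values (arbitrary): q traits, branching number alpha,
# maximum depth, and a stochastic transition matrix P (rows sum to 1).
q, alpha, depth = 2, 3.0, 8
P = np.array([[0.8, 0.2],
              [0.2, 0.8]])

# Stationary distribution nu of P (left Perron eigenvector, normalized).
w, V = np.linalg.eig(P.T)
nu = np.real(V[:, np.argmax(np.real(w))])
nu = nu / nu.sum()

# Second-largest eigenvalue of P in modulus, and the Kesten-Stigum quantity.
lam2 = np.sort(np.abs(np.linalg.eigvals(P)))[-2]
print("alpha * lambda_2^2 =", alpha * lam2**2)   # > 1: census reconstruction feasible

def broadcast(depth):
    """One realization of the broadcast process: Galton-Watson tree with
    Poisson(alpha) offspring, spins transmitted from parent to child by P."""
    root = int(rng.choice(q, p=nu))         # root spin drawn from the stationary measure
    level = [root]                          # spins of the current generation
    census = [np.bincount(level, minlength=q)]
    for _ in range(depth):
        nxt = []
        for s in level:
            k = rng.poisson(alpha)          # number of children of this node
            nxt.extend(rng.choice(q, size=k, p=P[s]).tolist())
        if not nxt:                         # the branching process died out
            break
        level = nxt
        census.append(np.bincount(level, minlength=q))
    return root, census

root, census = broadcast(depth)
print("root spin:", root, "| census at the deepest level reached:", census[-1])
```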
13:21
Yes, I assume it's known. So we know the parameters of the model and we try to answer these two questions: when is the tree reconstruction problem feasible? When is census reconstruction feasible? Okay, so perhaps one thing to mention is that we can introduce notation
13:48
μ_r^d(s), which is the probability that σ_r equals s given G_d, and so we have that
14:02
I(σ_r; G_d) will be given by a sum over s. So I did not mention that, but I use q as the number of possible traits of each individual,
14:21
and so the spin values will run over the integers from 1 to q. And so this mutual information would then be the expectation of the sum over s of μ_r^d(s) log(μ_r^d(s)/ν(s)).
14:40
That's a convenient property. And from this, actually, we can see at once when the task is going to be impossible: this will go to zero as d goes to infinity if and only if the distribution
15:01
μ_r^d converges in probability to the deterministic distribution ν. Okay, and one more thing to note is that this must converge almost surely to a limit,
15:25
because this is the expectation of a random variable conditional to some information structure and actually we can see this we can enlarge this information structure by giving also the
15:44
spin values below level d — at d+1, etc. — revealing also the tree below that; the conditional independence properties we have in this model make sure that this does not change the conditional distribution. So maybe I can write that: H_d is the information containing the full tree
16:07
and the spins σ_{L_{d+j}} for j ≥ 0. So, because of the conditional independence of H_d and
16:28
σ_r given G_d, I have that μ_r^d(s) is also the probability that σ_r equals s
16:40
given H_d; and H_d decreases with d, so this must converge almost surely to a limiting distribution, and that would be the probability that σ_r equals s given the information
17:02
that I might call h infinity which is the limiting information structure infinitely far down the tree. All right so perhaps another remark is that these limits must exist
17:22
and that is something that follows from a basic inequality in information theory: I have that, for instance, σ_r and X_{d+1} are independent when I condition
17:44
on X_d — this is a Markov random field property of my propagation model. Sorry — yes, that's true for the Galton-Watson branching process.
18:04
So this implies — and this is the basic data processing inequality of information theory — that the information common to X_{d+1} and σ_r must be less than the information between
18:23
X_d and σ_r: I(σ_r; X_{d+1}) ≤ I(σ_r; X_d). So we actually have decreasing sequences, so the limits exist — there's no problem with that. So the first result that I want to present now is a characterization of when census reconstructability holds and when it does not, and that is covered,
18:50
in greater generality than I will tell you in the paper by Elhanan Mossel and Yuval Peres in 2003 and to state the result we need to describe the spectrum of the stochastic matrix p
19:07
and so the result is as follows we know it's a stochastic matrix we know its largest eigenvalue if we order them by absolute value is one so let's take lambda two the second largest
19:22
eigenvalue in modulus. And so the result is as follows: we have census reconstructability if α, the average number of children, times the square of this second eigenvalue is strictly larger than one — αλ_2^2 > 1 — and we do not have census reconstructability if it
19:41
is strictly less than one. Okay, and in the equality case they show that for certain types of trees it's not reconstructable, so we would want to say it fails if αλ_2^2 ≤ 1, but that's not shown for every situation. So I'd like to explain to
20:04
some extent the arguments in the proof. I'll be much more sketchy further in the talk, but for this one I think it's important, because it will allow us to understand why we keep talking about this Kesten-Stigum threshold and where it comes from, really. We will see that in trying to prove
20:25
this result. So let me — is there an eraser somewhere? Oh, there are plenty here, okay, a big one. No theorem about reconstructability, the non-census one? I'll talk
20:52
about it I'll talk about it yes so I'll quote theorems I will give one implication in one direction
21:03
and I'll talk about reconstructability afterwards, but I want first to convey what this Kesten-Stigum threshold is, and it characterizes census reconstructability. So the proof goes as follows — for the feasibility case, let's assume αλ_2^2 is larger than one.
21:27
Then we can form something from the census. Okay, so first let's take x in R^q, an eigenvector
21:44
of P which is associated with this λ_2 eigenvalue, and now we can form a random variable Z_d which is by definition the sum, over nodes i in level d, of the entry of this
22:06
vector x evaluated at the spin of that node, Z_d = Σ_{i∈L_d} x(σ_i) (the normalization comes in a moment). And this is a function of the census, because you can also write it as the sum over s of x(s) times X_d(s), the number of depth-d nodes with spin s. So that's
22:27
really a function of what we call the census and we show that in fact this already contains enough information to do non-trivial reconstruction so how do we do that we use martingale
22:41
convergence arguments and to do that we introduce a proper information structure filtration and the one that will be adequate for us is the information present in the tree
23:01
to level d and all the spins of nodes to level d — so F_d is generated by T_d and σ_{V_d}, where V_d is the set of nodes in the tree to level d. Okay, and so we'll show that this, properly normalized, is a martingale for this information structure, and that it converges to
23:24
a limit that still carries information about the spin at the root. So why is it a martingale? We look at the expectation of Z_d conditional on F_{d-1}. We can —
23:42
oh, I forgot to normalize. Yes, I need a term: I need (λ_2 α)^{-d} here if I really want it to be a martingale. So that will be (λ_2 α)^{-d} times the expectation, and I want to
24:08
make Z_{d-1} appear, so I write it this way: a sum over nodes i in level d-1; and then I actually move the expectation inside, and I have
24:25
the expectation of the sum, over nodes j whose parent is i, of x(σ_j), given F_{d-1}. So that's the first part of the calculation showing this is a
24:45
martingale. And so we know there are on average α children, and we know that conditionally on the spin of their parent σ_i, a child's spin will be t with probability P_{σ_i, t}.
25:01
So this reads (λ_2 α)^{-d} times the sum over i in L_{d-1} of α Σ_t P_{σ_i, t} x(t). So now we leverage the eigenvector property:
25:25
that summation over t is precisely x(σ_i) times λ_2, and so — okay, I can continue on this board — I will indeed get (λ_2 α)^{-(d-1)}
25:46
times the sum over i in L_{d-1} of x(σ_i), and that's precisely Z_{d-1}. So that's the part that does not depend on the assumption: it is always a martingale, whether we are above or below the
26:00
Kesten-Stigum threshold. And so now let's leverage this Kesten-Stigum assumption, and leverage it by showing that the martingale has a bounded second moment. And if it has a
26:20
bounded second moment, it is uniformly integrable, and if it is, then we have a theorem that tells us that it converges almost surely and in L1 to a limiting random variable. So the martingale convergence result is: if the supremum over d
26:44
of the expectation of Z_d^2 is finite, then there exists a limiting random variable Z_∞ such that Z_d converges to Z_∞ almost surely and in L1.
27:06
So that will suffice for us to prove that we have information in Z_d, arbitrarily far down, about the root spin, because we have that —
27:24
maybe I'll prove that and then I'll go back to how we extract information based on this. The variance now — and that's where we see the Kesten-Stigum condition appear. So the variance of Z_d
27:43
we write as the variance of the expectation of Z_d given F_{d-1}, plus the expectation of the variance of Z_d conditional on F_{d-1}. So the first one, by the martingale property,
28:08
is just the variance of Z_{d-1}, whereas the second one is the expectation of a conditional variance. And now, if we look at the Poisson situation, we will have something that will be
28:28
a sum over i in L_{d-1}, where we'll have, up to some constant, a term that is the variance of a Poisson random variable with a bounded mean. So maybe I
28:48
should not detail this too much; let me just say that we have a term that is less than a constant times one for each of the nodes in level d-1: the contribution
29:06
to the variance is order one. Okay — morally, each of them contributes: the offspring of type t, say, follow a Poisson distribution with parameter α
29:22
times P_{σ_i, t}, and this is going to be multiplied by x(t). So we'll have a constant — we can take the supremum of this x(t) — we'll just have things that can be bounded uniformly for all i, and then we have one contribution per node at level d-1. So,
29:45
except, okay, that I forgot again my (λ_2 α)^{-d} normalization, which pops out when I take the variance. Okay — it is d-1 for the first term.
30:03
Yes, you're right. Okay, so — all right, so now we can leverage the assumption I have,
30:22
because the expectation of the sum over i in L_{d-1} of 1 is precisely α^{d-1}, and so I have that the variance of Z_d is less than the variance of Z_{d-1} plus, up to a
30:54
constant, a contribution of order (αλ_2^2)^{-d}. And so, iterating, this is less than
31:01
C times the sum over all integers k of (αλ_2^2)^{-k}, which is less than another constant. So the Kesten-Stigum condition implies that
31:22
we have a uniformly integrable martingale. And now let's look at the expectation of this martingale conditional on the root spin — and that's precisely x(σ_r). And so one thing we need to
31:48
remark now is that since we have an eigenvector associated with an eigenvalue that is not the trivial one of this stochastic matrix — λ_2 is distinct from one — the eigenvector cannot be constant, because the
32:05
all-ones vector is the eigenvector associated with the eigenvalue one. So x is not constant, and hence the mean of this limiting random variable, conditionally on the spin at the root, depends on this spin at the
32:29
root: we cannot have independence. So I think I can skip detailing that, but this is really what drives the reconstructability property. Okay, so for the other direction, I just
32:47
mention a theorem by Kesten and Stigum that is instrumental in showing that we cannot hope to achieve reconstructability based on the census if
33:04
we are below the threshold. So the theorem by Kesten and Stigum says the following thing: if αλ_2^2 is less than one, we have that,
33:24
taking X_d(s), the number of individuals at depth d with spin s, and centering by subtracting the expected value, then with an appropriate normalization this vector over s = 1, ..., q will converge in distribution,
33:53
as d goes to infinity, to a Gaussian random variable with a covariance matrix which is
34:05
independent of σ_r. So conditionally on σ_r, the limiting distribution is independent of the root spin value. That's instrumental in the proof — you need further
34:21
steps in order to deduce the property, but this is really the crux of the proof, and that was done in '66 by Kesten and Stigum. Okay, so for this one I wanted to give some details; I'll be much more sketchy from now on. So I'll now mention the other threshold
34:45
we were considering, which was reconstructability. So that was all about census reconstructability; we have discussed a single threshold. Now what about reconstructability? Well, the information about the root that is captured by the
35:06
spins in level d and the tree between the root and level d can be summarized in the conditional distribution of the spin at the root given the information we have and it turns
35:21
out that there is an algorithm that allows us to recursively determine this distribution: this is the so-called belief propagation algorithm, which has been discovered several times and is usually attributed to Judea Pearl in '82. And so basically, to determine the distribution of
35:48
a spin at a particular node given its its descendants at level d so let's have a tree like that we have node i here and then it has some descendants at level d that i call
36:09
l id and then you have the whole of the level d here so by applying
36:21
the conditional independence property plus the Bayes formula, you can really, in a couple of lines, establish this recursive formula for these conditional distributions of a node's spin knowing its descendants in generation d. And so that's also called the sum-product algorithm. So basically,
36:43
this distribution is given by a product, over its sons, of a sum involving each son's distribution conditionally on its own descendants in generation d, and you have an appropriate
37:01
normalization which involves the matrix P as well as its stationary distribution ν. And so that's something you can propagate towards the root, and you need to initialize it properly by taking the Dirac mass at the true value of the spin that you do
37:22
observe at generation d. So that's an algorithm, and if you compute it, this gives you all the information there is to know about the quantity of interest, which is the spin at the root. But besides being an algorithm, this is also an analytical tool.
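(Editor's note: a minimal sketch of the upward sum-product recursion just described, assuming the tree is given as a children dictionary and the leaves carry the observed spins; the normalization follows the Bayes-rule form mentioned above. The small example values are made up.)

```python
import numpy as np

def bp_root_marginal(children, leaf_spin, P, nu):
    """Conditional distribution of the root spin given the observed leaf spins,
    computed bottom-up with the sum-product recursion.

    children[i]  : list of children of node i (node 0 is the root)
    leaf_spin[i] : observed spin for leaf i (ignored for internal nodes)
    """
    q = len(nu)

    def eta(i):
        if not children[i]:                  # leaf: Dirac mass at the observed spin
            e = np.zeros(q)
            e[leaf_spin[i]] = 1.0
            return e
        m = nu.copy()                        # prior factor nu(s)
        for j in children[i]:
            m *= P @ (eta(j) / nu)           # sum_t P[s, t] * eta_j(t) / nu(t)
        return m / m.sum()                   # normalization (Bayes rule)

    return eta(0)

# Tiny hypothetical example: a root with two leaf children, both observed as spin 0.
P = np.array([[0.8, 0.2], [0.2, 0.8]])
nu = np.array([0.5, 0.5])
children = {0: [1, 2], 1: [], 2: []}
leaf_spin = {1: 0, 2: 0}
print(bp_root_marginal(children, leaf_spin, P, nu))   # posterior of the root spin
```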
37:46
In particular, because of this, you can find a recursion for the distribution π_d of this conditional law of the root spin given the information at level d, conditionally on the event
38:04
that this spin at the root takes a particular value t. And so this is a simple fact: you can condition on the value of the spin at the root, let's say t; you can determine its number of
38:21
children, you can determine the spins of its children; then, conditionally on that, the messages that come back using the BP recursion — if you go down to level d+1 — will correspond to what you would have
38:43
obtained taking the information from level d. So in other words, we have for this π_d a recursion that is called density evolution, which is as follows: π_{d+1} is characterized as the law of the outcome of the
39:06
BP recursion if we plug in inputs that are distributed according to π_d. Okay, so it is in this sense that belief propagation is not just an algorithm, but
39:22
it is also an analytical tool because once we are equipped with this density evolution equation we can deduce things about the feasibility or not of the reconstruction task okay so uh something i've already mentioned is that non-reconstructibility amounts to this distribution
39:44
of the conditional law of the root spin converging to the Dirac mass on the unconditional distribution and so there is something that can be deduced from this consideration about this density evolution equation which is if there is a unique fixed point for
40:03
this density evolution equation it has to be the trivial one okay and since the conditional distribution of the root spin given the information at level d has a limit and that this limit must satisfy must be a fixed point hence if there's only one fixed point we know
40:27
for sure that reconstructability is doomed. Okay, and so in a paper by Andrea Montanari and Marc Mézard in 2006, they claimed this, but also they showed the converse: that, at least when the
40:42
spin distribution at equilibrium is uniform then non-uniqueness implies reconstructability so if you have non-uniqueness you can craft signals from the spins at level d propagate them upwards and you will get something that is non-trivial that contains non-vanishing information
41:04
about the spin at the root. So a further use of this density evolution equation was made by Allan Sly in 2009, where he established sharper reconstruction results. Now, for the related community detection problem: when tree reconstruction fails, for distant nodes in the graph, conditionally on the graph, their spin values will be
41:21
uncorrelated — that's what's given by that — and we can deduce that if tree reconstruction fails, then indeed block reconstruction in the associated community detection problem fails. But the converse is true as well, because — since locally you see a tree,
41:45
and locally your random field has the same structure as on the tree — if you can reconstruct on the tree, then you would be able to reconstruct on the graph. Well, on the tree you get to see the spins at a far-away distance, which you don't get in the graph, so
42:04
you are — yes, you are stuck at generation log n. It's not that — in the graph we have cycles, but this is okay. But if you can reconstruct on the tree, that means that if you look at the tree you see up to distance d, then you have
42:23
some information on the root, right? If I look at the shape of the tree and at the spins, right? And since locally in the graph you see a tree if you explore not too far, that means that in the graph, if you explore up to that distance and you see your tree shape, then, if tree reconstruction works, that means
42:47
you should be able to say something about the spin in the graph. Not necessarily, not necessarily. So — maybe I should point this out — you can't observe any labels; you don't have any
43:03
spins. That's the difference: on the tree they give you the spins at the bottom. Ah, yes — they don't give you any spins here. Ah, I see — you just have the graph.
43:21
Yeah, okay. But the converse — I mean, it is tempting to say: tree reconstruction holds, then I can show that reconstruction works. I think not. But we have the experts in the room to
43:40
give us more insight into that. But tree reconstruction means you are given the labels, yes, far away; here there are no labels — you want to cluster nodes but you just see the graph, no labels anywhere. Okay, so we have a sufficient
44:06
condition for impossibility of the task: we can just look at tree reconstruction; if it fails, community detection fails as well. Let's see now that there is indeed a region where we have reconstructability but we are still below the Kesten-Stigum threshold.
44:25
So that's an argument that was made by Banks et al. in a paper in 2016, for the symmetric stochastic block model, which is, if you like, the one corresponding to
44:41
the symmetric Potts model. So the way they parameterize it is to take, as the matrix of connectivities (c_{st}), two parameters: c_in on the diagonal and c_out off the diagonal. It's fairly straightforward to determine the Kesten-Stigum condition for that; the mean number of
45:02
neighbors would be (c_in + (q-1) c_out)/q, and — okay, never mind the exact value of this, it is simple algebra. And so what they show is that for
45:21
this model, for q larger than or equal to four, you can have reconstruction in the stronger of the two senses that I proposed — strictly positive overlap — while being below the Kesten-Stigum threshold. And so let me try to explain briefly the idea of the argument.
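(Editor's note: for the symmetric parametrization just given, the mean progeny matrix has eigenvalues λ_1 = (c_in + (q-1)c_out)/q — the average degree α — and λ_2 = (c_in - c_out)/q, so the Kesten-Stigum condition λ_2^2 > λ_1 reads (c_in - c_out)^2 > q (c_in + (q-1)c_out). A two-line check, with made-up parameter values:)

```python
def kesten_stigum_symmetric_sbm(q, c_in, c_out):
    """Kesten-Stigum condition for the symmetric stochastic block model."""
    lam1 = (c_in + (q - 1) * c_out) / q    # average degree alpha
    lam2 = (c_in - c_out) / q              # second eigenvalue of the mean progeny matrix
    return lam2**2 > lam1

print(kesten_stigum_symmetric_sbm(q=2, c_in=9, c_out=1))   # True with these made-up values
```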
45:43
So you can consider σ̂, an assignment of labels to nodes that creates a balanced partition, that is to say,
46:06
the number of nodes i labeled s is n/q for every s. And then we can consider good partitions: these are balanced, and moreover
46:29
they put the correct number of edges crossing the boundaries of the true blocks and those
46:41
internal to the blocks. So I can define M_in(σ̂) — this is the number of — so, okay, maybe it's a good time to take a five-minute break.
47:33
okay
48:11
Now we use in the count the number of edges that cross the boundaries — so that would be
48:36
M_out rather. The count — would you remind us? Okay, okay, yes. So we are in this
48:45
stochastic block model world: we have a graph that we observe; we want to show that there is a hard phase where you can reconstruct non-trivially the underlying blocks, but still you are below this Kesten-Stigum threshold. So the Kesten-Stigum threshold, we will see, is a good
49:06
candidate for a computational threshold. Yeah, and so the point here is to show that you can do something non-trivial even if you're below that, so that the white region in the first diagram I showed is not empty. And so they consider the symmetric stochastic block model
49:27
and they proved indeed that this white region exists and so i'm trying to give the gist of argument it goes as follows you take a candidate partition of the nodes into into
49:45
blocks of equal size a balanced partition then you will insist on the number of edges crossing these blocks you draw and the number of edges internal to these blocks you draw to be close to
50:00
what it would be on average if you had the true partition — so if you know the parameters of the model, you can compute that. So, similarly, I would define M_in(σ̂), the number of edges internal to the partition I'm considering, and so a good partition is one
50:30
such that these numbers don't deviate too much from what you would like them to be if
50:40
you have the true partition so that's the expected number under the true partition and so they have threshold conditions which look like that so they enforce that right and so given these conditions you can take exponential time check all the balanced partitions check if
51:02
these hold, and if one satisfies the criterion, you stop. And what they show is that for some parameter ranges, any balanced partition that satisfies this must have an overlap that is lower bounded away from zero, and the way they do that is by applying the first moment
51:27
method: they just look at the expectation of the number of balanced partitions that satisfy those conditions and yet have an overlap below a particular value, and so by playing with
51:41
combinatorics, they show this must go to zero as n goes to infinity. And so this is the idea behind their argument: using the first moment method and the proper notion of a good partition, they reduce the problem to a combinatorial problem. Okay, so the white region exists — and that's something that was expected from the result of Sly in 2009
52:05
on the tree that there is an intermediate region but okay again can we have a diagram where we superimpose the tree regions with the plural and the regions we know on graphs so okay let me try so okay so that was q and i guess c in that was and so the one i thought
52:35
three or four graphs so yes we know that we don't know for sure okay so that's ks and so
52:45
we know this is easy — well, that's what I'm going to tell you next: for graphs you can do something in polynomial time. And we know census reconstruction is in this region. So the diagram I was drawing was: here, tree reconstruction and non-reconstruction. And so we know that
53:07
graph as well yes and so and here the transition appears so it's distinct for q equals four okay they start diverging and for uh so that's the tree diagram and now you would put
53:27
something like that, and so it would be feasible but hard for graphs in this narrower white range, and impossible below. But we don't expect these two boundaries to be the same, as far as I
53:44
understand. All right, so let's move on to above the Kesten-Stigum threshold, and the point is to see what can be done above it. And so there's been a very inspiring
54:09
conjecture made in a paper by Lenka Zdeborová and co-authors — Decelle et al., 2011 — saying: well, if you are in this region you can consider belief propagation, which, if we take inspiration from
54:23
the tree situation this is the best thing there is you just need to initialize it correctly so their conjecture was if you initialize bp with random distributions and you let it run then it's going to converge and the converging uh uh distributions of belief propagation will
54:44
be such that you have non-vanishing mutual information between the true spins and the distributions that are the fixed point you converge to and numerically it just seems to be true so that's so so just it's just a structure that tells you so you see if you
55:06
initialize with the uniform distribution in the symmetric case, you stay at the uniform distribution — nothing happens; if you perturb it slightly, then some symmetry breaking will take place. And so, you know, reconstruction is up to permutation,
55:24
so one permutation will win and you'll end up having a fixed point that tends to predict correctly according to this permuted mapping of the labels. So that's their prediction, and it's really consistent with the numerical evaluations, but it's still standing and it's still
55:44
open; it's very hard to characterize sharply the dynamics of belief propagation in those models. So it's not known — well, it's strongly believed, but it's not proven, let's say.
56:02
And so here's a picture trying to convey some intuition for what's going on here. So here I'm drawing — if I take Q to be the posterior distribution of the spin vector given the graph that I observed, and x to be a candidate overlap value — I might consider
56:26
some kind of large deviations functional of this overlap, say minus (1/n) log Q of those spin assignments such that the overlap is a particular x. And so the intuition for
56:45
these transitions could be as follows. If you're in the non-reconstructible phase, the large deviations are like that: the bulk of the posterior distribution is on uncorrelated spins, so you don't have information, there's nothing you can do. In the hard phase, the
57:05
posterior distribution would have a non-vanishing mass on correlated spins, so there is useful signal: if you could sample from the posterior distribution, then you would get something useful. But the large deviations functional is like that, and belief propagation is trying to move
57:26
downhill along that curve, and so if the zero-overlap vectors are stable under those dynamics, then you get stuck — we don't know how to initialize BP. And so the hard phase is where
57:42
you get stuck even though there is information in the posterior distribution. And the easy phase would be: the trivial fixed point is unstable, so you perturb it slightly and you go to a good set of configurations. But, as I say, it's not proven yet. So it led, again, Lenka
58:03
and co-authors to try and propose something something else but before i move to that let me say that the next idea that comes to mind is take off the shelf methods so spectral methods are very popular in that context what that might mean is take the adjacency matrix extract
58:23
the eigenvectors of the leading eigenvalues, and then try to use the coordinates of the corresponding eigenvectors at each given node as information about the underlying spin. It turns out this fails in those random graph models when the average degree is order one,
58:45
because the spectrum tends to be you know flattening while extending away from a bulk and you have large eigenvalues that tend to be triggered by high degree nodes so in a sparse
59:01
random graph, the degrees tend to be distributed as Poisson random variables, and when the mean is order one you have fluctuations, you don't have concentration to the mean: so you have degrees of order log n over log log n typically, and this would induce eigenvalues that are like the square root of that, and the corresponding eigenvector would be a bit like the eigenvector
59:25
of a star graph centered at this point: high value at this vertex, small values at the neighboring sites, and even smaller further away. And so these are not correlated with the communities. So hence the proposal by Lenka et al. — they have a very nice name for it, they call it
59:46
the spectral redemption, because you need to salvage classical spectral methods. And so they propose to stick to BP but linearize it: so you take the BP iteration
01:00:00
and you linearize it around the trivial fixed point. So you can write your messages passed from node i to node j as the distribution ν(s) — that's the trivial fixed point — times one plus some perturbation, and you just do an expansion.
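(Editor's note: in formulas, the linearization just described looks roughly as follows; the talk identifies the q-by-q matrix K in the tensor product with the transition matrix P of the tree problem, though its exact form depends on the chosen normalization.)

```latex
% Expand each message around the trivial fixed point nu:
m_{i\to j}(s) \;=\; \nu(s)\,\bigl(1 + \varepsilon_{i\to j}(s)\bigr), \qquad \varepsilon_{i\to j}(s)\ \text{small}.

% To first order the BP update is linear and has the non-backtracking structure:
\varepsilon_{i\to j}(s) \;\approx\; \sum_{\substack{k\to i \\ k\neq j}} \sum_{t=1}^{q} K_{st}\,\varepsilon_{k\to i}(t),
\qquad\text{i.e.}\qquad \varepsilon' \;\approx\; (B\otimes K)\,\varepsilon .
```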
01:00:21
And so you find a very nice structure for the BP iteration. It's a tensor product between a matrix indexed by the oriented edges. So BP works on oriented edges passing on distributions along each oriented edge. So here you see the tensor product between a matrix
01:00:42
that is indexed by the oriented edges of the graph, and a matrix of size q by q, which is precisely the p matrix that you have in the tree reconstruction problem. So what is this matrix B? This is a non-backtracking matrix, which would put a one on entries u to v, x to y.
01:01:05
If u to v feeds into x to y, so that is v is equal to x, but x to y does not backtrack on u to v, that is y is not equal to u. So that's the picture here.
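(Editor's note: a small sketch of how B can be built from an undirected graph, following the rule just stated — a 1 from (u→v) to (x→y) exactly when v = x and y ≠ u. The example graph is made up.)

```python
import numpy as np

def non_backtracking_matrix(edges):
    """Non-backtracking matrix B of an undirected graph given by its edge list.

    Each undirected edge {u, v} contributes two oriented edges u->v and v->u.
    B[(u->v), (x->y)] = 1  iff  v == x and y != u  (feeds in, does not backtrack).
    Returns B (2m x 2m) and the list of oriented edges giving the row/column order.
    """
    oriented = []
    for (u, v) in edges:
        oriented.append((u, v))
        oriented.append((v, u))
    index = {e: k for k, e in enumerate(oriented)}
    B = np.zeros((len(oriented), len(oriented)))
    for (u, v) in oriented:
        for (x, y) in oriented:
            if v == x and y != u:
                B[index[(u, v)], index[(x, y)]] = 1.0
    return B, oriented

# Tiny made-up example: a triangle with a pendant edge.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
B, oriented = non_backtracking_matrix(edges)
print(B.shape)   # (8, 8); entries of powers of B count non-backtracking walks, as noted next
```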
01:01:21
And so their spectral redemption conjecture was if we cannot prove that BP works, then maybe we can prove that spectral method based on the non-backtracking matrix does work. One more word about this non-backtracking matrix,
01:01:42
it's asymmetric, so we are no longer in the realm of Hermitian matrices, so it's a bit nastier to analyze. And it encodes in a very succinct manner the number of non-backtracking walks that you can have of the graph G.
01:02:01
So if you take the kth power of B and look at the entry e to f, what you will find is the number of non-backtracking paths of length k plus one edges, starting with edge e and ending with edge f. All right, so what I want to describe next is a result we had with Charles Bordenave and Marc Lelarge,
01:02:23
where we indeed confirmed the prediction of this spectral redemption: you can do, in polynomial time, correlated reconstruction in this stochastic block model above the Kesten-Stigum threshold by processing the eigenvectors of this non-backtracking matrix.
01:02:44
So it's a bit heavy to parse this slide, but let's try that. So what it says in a nutshell is that there are some eigenvalues of what I call the mean progeny matrix, M,
01:03:02
which is p times alpha. I think I introduced that on an earlier slide, but I might not have mentioned it. So M is alpha times p, okay? And so it has q eigenvalues, lambda i of M, and some of them will be found almost identically
01:03:21
in the non-backtracking matrix spectrum, while others will be lost in a bulk. And those that will stand out and be present, unchanged, in the spectrum of the non-backtracking matrix B will be precisely those that satisfy the Kesten-Stigum condition. That is to say, those whose square,
01:03:43
lambda i of M squared, is larger than alpha, which is just lambda 1 of M. And so that's the statement about the eigenvalues. So R naught is the largest index for which an eigenvalue lambda i of M satisfies the Kesten-Stigum condition.
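Restated compactly (my paraphrase of the slide, with M = alpha P and lambda 1 of M equal to alpha):

```latex
% Kesten-Stigum condition for index i, with r_0 the largest index satisfying it:
\lambda_i(M)^2 \;>\; \alpha \;=\; \lambda_1(M), \qquad i \le r_0 .
% Then \lambda_1(M), \dots, \lambda_{r_0}(M) appear, asymptotically, as eigenvalues of the
% non-backtracking matrix B, while the rest of the spectrum of B stays inside the disk
\{\, z \in \mathbb{C} : |z| \le \sqrt{\lambda_1(M)} \,\}.
```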
01:04:04
And so they are present in the spectrum of B, whereas all the other eigenvalues of B are caught in a bulk, which is a disk of radius square root of lambda 1 of M. Maybe I'll not talk too much about the eigenvectors,
01:04:21
but we have results also on the corresponding eigenvectors. When an eigenvalue stands out, is out of the bulk, we can characterize the corresponding eigenvector, when this eigenvalue is simple, as being nearly parallel to a vector that we can construct from the matrix B somehow.
01:04:43
And this is what we leverage to prove that a spectral method works: take the matrix B, extract the eigenvalues and the corresponding eigenvectors, and pick the right one. This eigenvector is indexed by oriented edges, so we may project it down to a vector indexed by nodes.
01:05:03
This is what this projection does: for node u, I sum up the entries of the edges pointing out of u, and then I get my variate for node u, and this carries some information about the underlying spins.
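To make the recipe concrete, here is a minimal end-to-end sketch of this spectral method on a toy two-community symmetric stochastic block model; the parameters n, a, b and all variable names are illustrative choices, not from the talk, and only numpy is used:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric SBM with two communities; parameters are illustrative.
n, a, b = 200, 8.0, 2.0                  # n nodes, within/between expected degrees a and b
spins = rng.integers(0, 2, size=n)       # hidden community labels
prob = np.where(spins[:, None] == spins[None, :], a, b) / n
upper = np.triu(rng.random((n, n)) < prob, 1)
A = (upper | upper.T).astype(int)        # adjacency matrix of the sampled graph

# Oriented edges and the non-backtracking matrix B (dense; fine at this size).
und = [(u, v) for u in range(n) for v in range(u + 1, n) if A[u, v]]
oriented = und + [(v, u) for (u, v) in und]
idx = {e: k for k, e in enumerate(oriented)}
B = np.zeros((len(oriented), len(oriented)))
for (u, v) in oriented:
    for w in np.flatnonzero(A[v]):
        if w != u:                        # forbid backtracking v -> u
            B[idx[(u, v)], idx[(v, w)]] = 1.0

# Spectrum of B: Perron eigenvalue ~ alpha = (a+b)/2 = 5, bulk radius ~ sqrt(5);
# the second eigenvalue ~ (a-b)/2 = 3 satisfies the Kesten-Stigum condition 3^2 > 5.
vals, vecs = np.linalg.eig(B)
order = np.argsort(-vals.real)
informative = vecs[:, order[1]].real      # eigenvector of the second-largest real eigenvalue

# Project the edge-indexed eigenvector down to nodes by summing over outgoing edges.
stat = np.zeros(n)
for (u, v), k in idx.items():
    stat[u] += informative[k]
guess = (stat > np.median(stat)).astype(int)

overlap = max(np.mean(guess == spins), np.mean(guess != spins))
print("fraction of correctly classified nodes (up to label swap):", overlap)
```

With a = 8 and b = 2, the informative eigenvalue 3 is well separated from the bulk, so the projected eigenvector should classify well above half of the nodes correctly.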
01:05:27
So that's how one shows that the spectrum of B allows one to do correlated reconstruction in polynomial time. I think I don't want to parse this definition of the vector,
01:05:44
because it's not that enlightening, unless someone insists. So the output of this theorem is that you have a good way to construct this sigma hat, or a good choice of sigma hat?
01:06:01
Yes, so you take B, you extract the spectrum, you pick an eigenvector, you project it down to a vector in R to the n, and then you know that you get a scalar for each node, and this scalar is positively correlated with, carries information about, the spins.
01:06:23
How do you choose the eigenvalue? How do you choose the index i? So if we're still in that game where we know the parameters, we are given the matrices M, P, the parameter alpha, and so on, then we can just look it up and say, oh, well, there's an eigenvalue that satisfies the Kesten-Stigum condition,
01:06:43
so I know I should be able to find a corresponding eigenvalue in the spectrum of B. Maybe I'll show you this picture, so maybe this is the better answer. So that's an instance of this stochastic block model.
01:07:01
This is the spectrum of the non-backtracking matrix, so it's on the complex plane, because the matrix is not symmetric, and so you have the Perron-Frobenius eigenvalue, which is real positive, and you may draw a circle of radius the square root of this Perron-Frobenius eigenvalue,
01:07:24
and that's what you would call the bulk of this spectrum. And if you find eigenvalues that are close to the real axis and that are in between the Perron-Frobenius eigenvalue and the bulk, these are the right ones for your reconstruction. So that's something that you can do from a data-driven perspective.
01:07:44
If you know the model parameters, you can even predict that you will find them there. Okay, all right, so maybe I should say a word on the eigenvectors and why that holds.
01:08:04
My question is, if you have more than two eigenvalues outside of the circle, outside of the bulk, can you choose the first nontrivial one or the largest nontrivial one? Say lambda 3, lambda 4 are also outside? If you're just in the game of, I can extract signal,
01:08:23
I'm not insisting on catching as much signal as possible, then take any. But certainly you could do better by taking all of them that are outside the bulk. We don't quite know how we should do that if we wanted to, but okay, you just need one to prove that reconstruction can be done.
01:08:45
Maybe I should just say a word about this eigenvector construction. So it starts by taking a vector y in R^E, indexed by the oriented edges,
01:09:07
such that y_{u→v} = x_i(σ_u), where x_i is the eigenvector of M
01:09:27
that is associated with the eigenvalue λ_i(M). And now if I look, say, at (B^T)^L y on the edge u to v, for some power L,
01:09:47
then what I will have is something like this. If my neighborhood is tree-like, I will eventually have a summation over the nodes
01:10:03
at distance L of x_i(σ_w), for those nodes w at distance L. And so, invoking the coupling of the neighborhood with a tree,
01:10:22
I can relate this to the martingales I was using for the census reconstruction. And so that would converge on the tree, as the depth L goes to infinity, to some limiting random variable Δ^∞_{u→v},
01:10:41
if I renormalize by (α λ_i)^{-L}; after proper renormalization it would converge to a limiting object. And now you can get a feel for why this is a good candidate for an eigenvector, because if I replace the entries of this vector by (α λ_i)^L times the limiting value,
01:11:10
then B^T applied to (B^T)^L y, at the edge u to v, would again be something like (α λ_i)^{L+1} Δ^∞_{u→v}.
01:11:29
So approximately, the martingale convergence theorem gives me a kind of eigenvalue-eigenvector equation, where the eigenvalue would be α λ_i.
01:11:44
So here, okay, I've been using the notations from the census reconstruction, so that would be α λ_i(P), which is just λ_i(M). Okay, so that's one part of the intuition, but you need to work quite a bit to really show that indeed the eigenvectors can be constructed.
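In symbols, the board calculation just sketched reads roughly as follows (my reconstruction, with M = α P):

```latex
y_{u \to v} = x_i(\sigma_u), \qquad
\bigl((B^{\mathsf T})^{L} y\bigr)_{u \to v} \;\approx\; (\alpha \lambda_i)^{L}\, \Delta^{\infty}_{u \to v},
\qquad
\bigl(B^{\mathsf T} (B^{\mathsf T})^{L} y\bigr)_{u \to v} \;\approx\; (\alpha \lambda_i)^{L+1}\, \Delta^{\infty}_{u \to v},
```

so the limiting vector behaves like an approximate eigenvector of B transpose with eigenvalue α λ_i = λ_i(M).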
01:12:05
And in fact, this does not suffice. As you see, the construction of the candidate eigenvectors is a bit more complex: we need to apply B transpose L times, but then B L times. This is for technical reasons. Okay, so we then have the result that in polynomial time we can achieve correlated reconstruction above the Kesten-Stigum threshold.
01:12:33
So one thing I wanted to tell you about is the relationship between those results on the non-backtracking spectrum
01:12:43
and results in the random matrix theory literature. So there's a line of work about characterizing the spectra of deformed large random matrices. So the canonical model of a random matrix is the Wigner random matrix,
01:13:05
which is a matrix that is Hermitian, whose entries are independent up to this symmetry. And they are Gaussian, so W_ij would be a Gaussian with variance sigma squared over n for i less than j.
01:13:25
And on the diagonal you would double the variance, let's say. So that's the prototypical random matrix, and it's been studied a lot. In the 50s, Wigner showed that for fixed sigma as n goes to infinity, the spectral measure,
01:13:46
the empirical distribution of the eigenvalues converges to the semicircle law, which has the density given here. And so Baik, Ben Arous and Péché characterized the impact on the spectrum of such a Wigner matrix
01:14:06
if you add a low-rank matrix to it that is independent of the Wigner matrix W. Hence the name BBP or Bayek-Benares-Pechet phase transition. So specifically they consider an n times n symmetric matrix P with fixed rank Q,
01:14:28
fixed spectrum as well, with eigenvalues that stay of order one as we scale. And similarly to the notation I was using for discussing the Kesten-Stigum threshold,
01:14:41
I'll introduce R naught to be the largest i such that the square of the eigenvalue lambda i of P n exceeds the variance term of the Wigner matrix. And so here's a version I took from Benaych-Georges and Nadakuditi,
01:15:00
but this is really this Baik-Ben Arous-Péché transition. If i satisfies the BBP criterion, as you might call it, then the i-th largest eigenvalue of W plus the perturbation P is outside the bulk. So it's larger in absolute value than twice sigma.
01:15:24
And it's precisely given by the original eigenvalue lambda i of P plus sigma squared over this lambda i of P. Whereas for i larger than R naught, the eigenvalue is lost in the bulk.
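Written out, the statement being quoted reads (in the formulation of Benaych-Georges and Nadakuditi):

```latex
% W: Wigner matrix with entry variance \sigma^2 / n;  P_n: independent perturbation of fixed rank.
% If i \le R_0, i.e. \lambda_i(P_n)^2 > \sigma^2, the corresponding eigenvalue pops out of the bulk:
\lambda_i(W + P_n) \;\longrightarrow\; \lambda_i(P_n) + \frac{\sigma^2}{\lambda_i(P_n)},
\qquad \bigl|\lambda_i(W + P_n)\bigr| \;>\; 2\sigma ,
% whereas for i > R_0 the corresponding eigenvalue is absorbed by the bulk [-2\sigma, 2\sigma].
```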
01:15:41
You cannot see it. And in fact, there is a very strong parallel between this situation and the one we have in this non-backtracking matrix spectrum of stochastic block models. So I'll try to convey that now.
01:16:00
So first, let's try to read off from the stochastic block model the analogue of the variance parameter in the Wigner matrix case. So recall, we would view the adjacency matrix as a signal matrix with block structure, which is the expectation of the matrix conditional on the spins.
01:16:22
And then we have a noise matrix that is added to it. So sigma squared in BBP is the sum of the variances of the terms along a row or a column. So in our case, we would sum P sigma u sigma v over n times one minus P sigma u sigma v over n,
01:16:45
because this is the variance of a Bernoulli random variable. So ignoring the lower-order terms, we get the sum over v of P sigma u sigma v over n. And if you check the model parameters, this just gives you the parameter alpha.
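Spelled out, writing the edge probability between nodes u and v as P_{σ_u σ_v} / n, which is my guess at the slide's notation, the row sum of variances is:

```latex
\sigma^2 \;\approx\; \sum_{v} \frac{P_{\sigma_u \sigma_v}}{n}\Bigl(1 - \frac{P_{\sigma_u \sigma_v}}{n}\Bigr)
\;=\; \sum_{v} \frac{P_{\sigma_u \sigma_v}}{n} \;+\; O\!\Bigl(\frac{1}{n}\Bigr)
\;\approx\; \mathbb{E}\bigl[\deg(u)\mid\sigma\bigr] \;=\; \alpha .
```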
01:17:13
Another way to see this: in those random graph models, the node degrees are asymptotically distributed like a Poisson random variable,
01:17:21
and this has variance alpha. So the variance of the sum of terms is asymptotically the sum of the variances... Okay, it just works out. Okay, and then the signal matrix P sub n would correspond to the block matrix I was writing down.
01:17:41
And so the spectrum of this one is asymptotic to the spectrum of this mean progeny matrix M that we saw before. And so the Kesten-Stigum condition on the one hand says an eigenvalue will be seen in the non-backtracking spectrum,
01:18:02
lambda of M, if its square is larger than alpha. And BBP says an eigenvalue lambda of the deformation matrix P will be visible if its square is larger than sigma squared. And so by this identification, we find that this is one and the same criterion
01:18:24
for being visible in the spectrum of the matrix at hand. So let's perhaps try to push this parallel a bit further now, because, okay, we have the same criterion for eigenvalues being visible,
01:18:40
but in one case we see them in the non-backtracking spectrum, in the other case we see them directly in the spectrum of the matrix itself. And the difference is that in the SBM we look at sparse situations, we have sparse random graphs, whereas with Wigner matrices we are in a different regime, we have non-sparse matrices.
01:19:02
So let's try to push the parallel a bit further, and for that I will quote a formula, the so-called Ihara-Bass formula, that links the spectrum of... there's a B missing here on the slide, it should be the identity matrix minus z times B.
01:19:20
it should be identity matrix minus Z times B. So you have a formula that relates the determinant of identity minus ZB to the determinant of an n by n matrix, which involves the adjacency matrix of a graph, and a diagonal matrix with on its diagonal the degrees of the nodes minus one.
01:19:45
Okay, so that's the Ihara-Bass formula, this has been known for decades now, and it readily implies that an eigenvalue of B which is not one, minus one, or zero is necessarily, well, it's an if and only if,
01:20:04
it is such an eigenvalue if and only if the determinant of lambda squared I minus lambda A plus the diagonal matrix of degrees minus one is zero. So now we can process that a bit, and if we assume there is concentration of the degrees in our random graph models,
01:20:27
we let the alpha parameter go to infinity, sufficiently large so that the deviations, the relative deviations of degrees to their means vanish,
01:20:40
then any eigenvalue lambda in the spectrum of the non-backtracking matrix B that is small compared to alpha is necessarily such that lambda plus alpha over lambda, up to a tiny perturbation, is going to be in the spectrum of the adjacency matrix.
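In equations, this is the computation being referred to, under the assumption that the degrees concentrate so that D is approximately alpha times the identity:

```latex
% From Ihara-Bass, \lambda \notin \{0, 1, -1\} is an eigenvalue of B iff
\det\bigl(\lambda^{2} I - \lambda A + (D - I)\bigr) = 0 .
% With D \approx \alpha I and \lambda \neq 0, this becomes
\det\Bigl(\tfrac{\lambda^{2} + \alpha - 1}{\lambda}\, I - A\Bigr) = 0
\;\Longleftrightarrow\;
\lambda + \frac{\alpha - 1}{\lambda} \;\approx\; \lambda + \frac{\alpha}{\lambda} \;\in\; \operatorname{spec}(A).
```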
01:21:03
So the way to parse that is that if you let your average degree increase, so you move from sparse models to non-sparse ones, then you expect the eigenvalues of the non-backtracking matrix
01:21:24
to show up in the spectrum of the adjacency matrix, modulo the same kind of transformation that is present in the Baik-Ben Arous-Péché transition. So that makes perhaps the parallel even stronger. And perhaps another point to take from this calculation
01:21:45
is that all the hassle of constructing the non-backtracking matrix and of extracting its spectrum is only needed if you are after sharp thresholds in the sparse case,
01:22:02
whereas if your models are dense enough, you may just go with the adjacency matrix and stick with classical spectral methods. So I was planning to conclude about now. If it's too early, I can say more things, but if not, let's conclude.
01:22:27
So there are many exciting developments that I have not covered. So for instance, I was talking just now about denser models going beyond the sparse case. And in that case, there are very powerful tools that have been developed,
01:22:42
in particular by Andrea. There is the so-called approximate message passing set of techniques, which is really the tool of choice now for analyzing non-sparse models. And so if we wanted to have a version of Baik-Ben Arous-Péché in non-sparse cases,
01:23:03
then this would probably be a very promising approach. I've talked about the Kesten-Stigum transition a lot, and then about the information-theoretic transition for reconstruction. There are finer phenomena that you can look at,
01:23:22
and if you're interested, you can go and check a very recent paper by Lenka Zdeborová, Ricci-Tersenghi and Semerjian, where they distinguish cases where belief propagation might be non-trivial,
01:23:43
but yet not optimal. So there are even finer categorizations that you can draw. And, I guess, one thing that I think is very interesting is a better understanding of the hard phase.
01:24:02
To me, it's poorly understood as of now. The landscape, for instance, of BP: the magnitude of the basins of attraction, what the fixed points are, etc. So I think there's a very rich geometry here that is worth understanding. And we can also try to understand such energy landscapes for other dynamics,
01:24:24
not just BP. So that's a very exciting, perhaps, way to try and understand algorithmic hardness in those problems. And more generally, I would say that statistical physics provides us with a very rich perspective on these computational complexity problems.
01:24:45
By looking at those large instances, we see phenomena that we would not see otherwise. We start distinguishing between different phases, identifying cases where a very simple algorithm works,
01:25:02
situations where it starts to fail. All this is really a contribution of this perspective on the computational complexity. And with this, I will stop. Very nice lecture.
01:25:27
Do you have any questions? Maybe I'm wrong, but when you're sparse, don't you take your non-backtracking matrix and look at powers of it of some order?
01:25:43
Yes. So if you are sparse enough, you can do it in polynomial time. We look at powers to characterize the eigenvectors. And so we prove that the eigenvector carries some signal
01:26:01
by showing that the eigenvector... So you don't need to... Yes, I think the appeal of this spectral method is that you just do the spectral analysis, you pick the eigenvector and you don't need to power it. We had looked earlier on at other spectral methods where, instead of considering the non-backtracking matrix,
01:26:23
you would construct matrices counting self-avoiding paths. We have done some work more recently, with Ludovic, on counting graph distances. And so this takes a bit longer to construct. They have other advantages,
01:26:40
but the beauty of the non-backtracking matrix is that you just extract the spectrum off the shelf, pick the eigenvectors, and you have an algorithm. So you are able to describe precisely the eigenvectors of the outliers?
01:27:01
Yes. And what is the result? So the result is, it is as if conditional on the spin of the endpoints of an edge. So you know these are indexed by oriented edges.
01:27:20
So you pick an edge u to v, you have spins sigma u, sigma v attached. Conditional on these two spins, the entry of the eigenvector is distributed as the limit of the martingale that we can construct on the associated tree model, with an initialization that is dictated by those spins.
01:27:45
So this is something you can write down. It is deterministic at the end? It's not deterministic. There are fluctuations? There are fluctuations, but the means we know exactly.
01:28:01
And we can also compute the variances. These laws are typically complicated, martingale limits, so not an easy-to-describe distribution, but the mean and variance we can easily compute. And so it's as if you have, conditionally on the spins,
01:28:23
independent martingale limit samples with the proper mean and variance. Is there a distinctive description of this distribution?
01:28:44
I don't know either. I mean, we can write a fixed-point equation for the characteristic function, for instance. So that's something we can do. And if we are interested in estimating quantiles, we can do that numerically. I don't think that, in general, we can say a lot more.
01:29:07
Normally, in the limit of large degree, you get Gaussians, right? For instance, the theorem you reported at the beginning of the construction is obtained by approximating the distribution of the degrees of belief by Gaussians.
01:29:26
So as the sparsity lessens, with higher and higher degrees, as Andrea says, you have averaging phenomena. So the central limit theorem starts to kick in
01:29:41
and you have more explicit characterization. More and more deterministic or more and more concentrated? When we stick to this signal-to-noise ratio of order 1,
01:30:04
so second eigenvalue squared divided by first eigenvalue, say 1.2, when this stays order 1, you cannot hope to be perfect at reconstruction. So this is really the right notion of a signal-to-noise ratio in this reconstruction problem.
01:30:24
If this becomes very large, then indeed you can hope to be asymptotically exact. Any questions? So I have a question concerning the characterization of the phase transition.
01:30:43
Because in statistical mechanics in general, you don't just want to know where the transition is, but you're also interested in what happens when you come close to this transition. Do you have an idea? I mean, do people have heuristics for the behavior when you come close? I don't have an idea.
01:31:00
For instance, the reconstructability transition: we know, since very recently, exactly where it is in some cases. But in general, we are not sure where it is. So I think there is a conjecture for where it should be. That's been established rigorously in a paper by Leon Kaita in 2017
01:31:24
for disassortative symmetric stochastic block models. But okay, I would not venture this kind of universality. From a physics point of view, these are similar kinds of phase transitions as in mean-field systems in physics.
01:31:43
But people didn't look so much into the critical behaviors; maybe those are not the most interesting questions in this context. If this transition is between, let's say, a fast algorithm and a slow algorithm,
01:32:00
maybe you would like to know, when you approach the transition from the fast side, how fast it stays. How fast is it? How quickly does it degrade as you approach it? So if you have polynomial decay, for instance, there is a polynomial decay rate.
01:32:21
How does it change when you approach this? So in the dense systems, we have really simple equations that tell you about the convergence time of the algorithm. So there you could do it, it's just a scalar fixed-point equation, so you could analyze all of this. I have another question about the spectrum of the B matrix.
01:32:42
Do you still have to extract, if you're given the graph, do you have to extract by some numerical... Do you have to compute exactly the spectrum and then look at the first outliers and try to extract statistical information about this?
01:33:04
My question is, do you need to compute it by some numerical algorithm, for instance? Or can you use this B to the L, B transpose to the L method to have at least an approximation... I guess you could have a good starting point from this.
01:33:23
Maybe you would need to use Lanczos iterations, but you could initialize them in a smart manner. So you don't have to really compute the full spectrum, because that would be much more than you need. Another remark about computing the spectrum is this correspondence between the spectrum of B
01:33:43
and this Ihara-Bass formula. So in the paper on the spectral redemption, they also leveraged that to show that you can really work with n by n matrices, so you can cut down on the complexity of it.
01:34:05
Another question? Let's thank the speaker.