6/6 Nilsequences
Formal Metadata
Number of Parts: 18
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/17025
Transcript: English(auto-generated)
00:32
What I'm going to talk about today in this last lecture is applications of the material that I've been discussing to the primes.
00:47
So I'm going to start by stating the results that I want to sketch some proofs of. And in stating any results about primes, in fact, it's convenient to introduce the von Mangoldt function, and that's defined by lambda of n, which is going to be log p if
01:14
n is a power of a prime p and 0 otherwise.
01:24
And really, the reason you introduce this is to avoid having huge numbers of logs involved in your statements. So the prime number theorem is equivalent to the statement that the average value of
01:41
the von Mangoldt function is 1 as N tends to infinity. So the applications to primes that I want to talk about are about counting how often linear forms take prime values.
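As a quick sanity check of that normalization, here is a minimal computational sketch (my own illustration in Python, not part of the lecture): it computes the von Mangoldt function by a sieve and verifies that its average over n up to N is close to 1, as the prime number theorem predicts.

```python
import math

def von_mangoldt_sieve(N):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= N."""
    lam = [0.0] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            q = p
            while q <= N:          # Lambda(p^k) = log p on prime powers
                lam[q] = math.log(p)
                q *= p
    return lam

N = 10**6
lam = von_mangoldt_sieve(N)
print(sum(lam) / N)   # prime number theorem: this average should be close to 1
```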
02:03
So I'm interested in finding prime values of linear forms.
02:23
So I'm going to be interested in a collection of linear forms. So let psi equals psi 1 to psi t be a collection of t linear forms, affine linear forms.
02:51
So those are going to map, and they'll be in d variables, so they'll map Z^d to Z.
03:05
So for example, I could have t equals 4, d equals 2, and the forms defined as follows: psi 1 of n is n1, psi 2 of n is n1 plus n2, psi 3 of n is n1 plus 2 n2, and psi 4 of n is n1 plus 3 n2.
03:32
And this is the familiar system, the system we've been talking about quite a lot. This is talking about four-term arithmetic progressions.
03:47
So one thing I'm going to talk about is how to count the number of four-term arithmetic progressions of primes. But this is a lot more general. So to give another example of something that is covered by this, so I could take, again,
04:03
well t equals 3, and d equals 2, and the forms as follows. So psi 1 is n1, psi 2 is n2, and psi 3 is some constant big N minus n1 minus n2.
04:25
And then if I can count how often those three forms are prime, what I've done is represent N as the sum of three primes. So N equals p1 plus p2 plus p3. So that's what's called the ternary Goldbach problem.
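To make the dictionary between linear forms and additive problems concrete, here is a small illustrative computation (my own sketch, not from the lecture): it counts ordered triples of primes with p1 + p2 + p3 = N for one particular odd N, which is exactly counting simultaneous prime values of the system psi1 = n1, psi2 = n2, psi3 = N - n1 - n2.

```python
def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(N ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, N + 1, p))
    return [n for n in range(N + 1) if sieve[n]]

N = 10001                       # one particular odd number (an arbitrary choice)
primes = primes_up_to(N)
prime_set = set(primes)
# ordered triples (p1, p2, p3) of primes with p1 + p2 + p3 = N, i.e. simultaneous
# prime values of psi1 = n1, psi2 = n2, psi3 = N - n1 - n2
count = sum(1 for p1 in primes for p2 in primes if (N - p1 - p2) in prime_set)
print(count)
```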
04:42
Actually, I could state the binary Goldbach problem in the same formalism, of course, but that's not going to be covered by my main theorem, as you probably know. Anyway, so what is a general theorem about this?
05:01
So this is a theorem of myself and Tao, and with a crucial input by myself, Tao, and Ziegler, that being the inverse theorem for the Gowers norms, which I've talked about. So the theorem is if you take any collection of linear forms, so suppose that this collection,
05:23
psi, has finite complexity. I'll define precisely the complexity in just a moment, but what finite complexity means is essentially that no two of the forms are multiples of one another, which means
05:45
that no two of the homogeneous parts, so I'll call those psi i dot, so the homogeneous
06:02
parts, just what you get by ignoring the constant term, are multiples of one another. And then the conclusion is that you can count how often those forms are all prime.
06:21
Well, I'm going to count over a certain region, so suppose that K is a subset of a box of width N. I'm going to say it's convex because that's what we did in the paper where we worked these things out, but really any nice set could be decomposed into convex sets.
06:44
So just suppose K is a nice set that you're going to count over, then if you count over K intersected with the lattice points of Z^d, and you count how often those forms
07:06
are all prime in this weighted manner using the von Mangoldt function, then there's what we call a local-global principle.
07:20
So asymptotically, this can be computed as a product of certain local factors, beta sub infinity times the product over primes of beta sub p, plus an error that's expected to be small, so little o of N to the d.
07:44
So I'll tell you what the local factors are in just a second, and where this, well, the rate of decay of this, well, it will certainly depend on d and on t,
08:04
and it also depends on the size of the coefficients of the psi, so the coefficients of the homogeneous parts.
08:21
Now, I'm allowing affine linear forms, and I'm allowing the constant term actually to be rather big, so it also depends on 1 over n times the constant terms, but other than that, it's uniform.
08:41
So for example, you get a, this does decay for both of these systems that I've written up on this board. So here, so beta sub infinity, well, that's really just how many lattice points
09:01
I'm counting over, so it's the volume of k. It's not actually, it's not quite the volume of k, because primes are always supposed to be positive numbers, so it's the volume of that part of k on which
09:22
all of the psi i's are non-negative. So it's essentially just the number of points you're counting over, and the beta sub p's reflect local behavior of the primes, so beta sub p, well, it's kind of a local analog of,
09:47
at p, of this same average, so it's going to be the average over m, now ranging over (Z mod pZ) to the d, of the same expression, but with local variants
10:00
of the von Mangoldt function. So what should a local variant of the von Mangoldt function be? It should be a function with average value one, which is somehow detecting the local behavior
10:22
of the primes mod p. So what are the primes mod p? Well, apart from p itself, which is just one prime among infinitely many, the primes are equally distributed in the residue classes other than zero, and so that makes this definition quite sensible.
10:41
The local von Mangoldt function of x is going to equal p over p minus one if x is not divisible by p, and zero if x is divisible by p. So you can think of this as a local-global principle.
11:12
For the primes on linear forms.
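Here is a short numerical sketch of these local factors for the four-term-progression system (my own illustration, not from the lecture): beta_p is computed directly from the definition just given, as the average over (Z/pZ)^2 of the product of the local von Mangoldt functions at the four forms. One can check that beta_2 = 4, beta_3 = 9/8, and that beta_p tends to 1 as p grows, which is why the product over primes converges.

```python
from fractions import Fraction

def local_von_mangoldt(x, p):
    """Local model of the von Mangoldt function mod p: p/(p-1) off zero, 0 at zero."""
    return Fraction(p, p - 1) if x % p != 0 else Fraction(0)

def beta_p_four_ap(p):
    """beta_p for the system n1, n1+n2, n1+2n2, n1+3n2, averaged over (Z/pZ)^2."""
    total = Fraction(0)
    for n1 in range(p):
        for n2 in range(p):
            prod = Fraction(1)
            for j in range(4):
                prod *= local_von_mangoldt(n1 + j * n2, p)
            total += prod
    return total / (p * p)

for p in [2, 3, 5, 7, 11]:
    print(p, beta_p_four_ap(p))
```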
11:20
So in any given case with a little bit of work, you can go and compute exactly what this formula gives you, and I'm going to tell you the outcome in one case, so exercise. So this exercise will involve, well, substituting into the above, but then you have to do,
11:49
you have to remove the weights that I've given to the primes from the von Mangoldt function, so plus removing the log weight, which is a simple matter.
12:04
So it's a formula for the number of k-term progressions. So the number of k-term progressions
12:26
less than n of primes. Well, it's asymptotically the following constant,
12:43
so it's one over two times (k minus one), times the product over p of beta p, times N squared over log to the k of N, where beta sub p is given by the following formula.
13:00
So beta sub p is (1/p) times (p/(p-1)) to the power (k-1) if p is less than or equal to k, and if p is greater than or equal to k, it's (1 minus (k-1)/p)
13:20
times (p/(p-1)) to the power (k-1). So it's quite complicated, actually. Of course, if k is four or something, you could work out an explicit numerical value for this.
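For what it is worth, here is a rough numerical check of this exercise (my own sketch, not from the lecture). It is done for k = 3, where the counting is cheap: the constant is assembled from a truncated product of the beta_p above, and the prediction is compared with an actual count of three-term progressions of primes up to a modest N. Lower-order terms are significant at such small heights, so only rough agreement should be expected.

```python
import math

def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(N ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, N + 1, p))
    return [n for n in range(N + 1) if sieve[n]]

def beta_p(p, k):
    """The local factor from the exercise."""
    if p <= k:
        return (1 / p) * (p / (p - 1)) ** (k - 1)
    return (1 - (k - 1) / p) * (p / (p - 1)) ** (k - 1)

k, N = 3, 3 * 10**4
primes = primes_up_to(N)
prime_set = set(primes)

singular = 1.0
for p in primes_up_to(10**4):                # truncated product over primes
    singular *= beta_p(p, k)
prediction = singular * N**2 / (2 * (k - 1) * math.log(N) ** k)

# actual number of progressions p < q < 2q - p of primes, all at most N
actual = 0
for i in range(len(primes)):
    for j in range(i + 1, len(primes)):
        r = 2 * primes[j] - primes[i]
        if r > N:
            break
        if r in prime_set:
            actual += 1

print(prediction, actual)
```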
13:45
So that's the theorem whose proof I want to talk about. Are there any questions? I imagine not.
14:18
So mostly so far in these lectures, I've talked about just some very specific examples,
14:23
for example, four-term progressions. And I haven't talked about general systems of linear forms at all. So maybe I'll just say a couple more words about general systems of linear forms. The theory of Gowers norms, et cetera,
14:44
applies in the general context, well, in the general context that I've just described,
15:00
so for finite complexity systems of affine linear forms.
15:21
So in fact, there's a big generalization of what I call the generalized von Neumann theorem. So there is a generalization, a generalized generalized von Neumann theorem.
15:53
So remember the generalized von Neumann theorem said that if you want to count four-term progressions, the example I gave said if you want to count four-term progressions weighted by some functions,
16:03
then it's enough to control the Gowers norms of those functions. So let me introduce the obvious generalization. Introduce an operator T sub psi, and I guess it also depends on the convex body K,
16:21
of some functions, f1 up to ft, will just be basically the average value. So one over n to the d, let's say,
16:40
doesn't matter too much whether I take N to the d or two N to the d. And then the sum over this convex body, or the lattice points of the convex body, of the product of those functions evaluated at the appropriate linear forms.
17:06
So it's a generalization of the multilinear operators I considered before. So before I talked about, I think, just three-term progressions and four-term progressions. And so the generalized generalized von Neumann theorem
17:21
states that if the fi are bounded, we have that this is bounded by a Gowers norm: T of psi, K of f1 up to fT
17:42
is bounded by some constant times the Gowers U to the (S plus one) norm of any of the fi, where S is something called the complexity of this particular system.
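To see the shape of this inequality, here is a toy numerical check (my own sketch). It works in the cyclic-group model Z/NZ with N prime, rather than in the convex-body setting of the theorem, and for the four-term-progression system, where the complexity is 2 and the relevant norm is U^3; in that model the generalized von Neumann inequality for 1-bounded functions reads |T(f, f, f, f)| <= ||f||_{U^3}.

```python
import numpy as np

def ap4_average(f):
    """T(f,f,f,f) = average over x, d in Z_N of f(x) f(x+d) f(x+2d) f(x+3d)."""
    N = len(f)
    return np.mean([np.mean(f * np.roll(f, -d) * np.roll(f, -2 * d) * np.roll(f, -3 * d))
                    for d in range(N)])

def u3_norm(f):
    """Gowers U^3 norm of a real f on Z_N, using ||f||_{U3}^8 = E_h ||f . f(.+h)||_{U2}^4
    and ||g||_{U2}^4 = sum over frequencies of |g-hat|^4 (normalized Fourier transform)."""
    N = len(f)
    total = 0.0
    for h in range(N):
        g = f * np.roll(f, -h)               # the multiplicative derivative Delta_h f
        ghat = np.fft.fft(g) / N
        total += np.sum(np.abs(ghat) ** 4)   # = ||Delta_h f||_{U2}^4
    return (total / N) ** (1 / 8)

rng = np.random.default_rng(0)
N = 101                                      # a prime, so 2 and 3 are invertible mod N
f = rng.choice([-1.0, 1.0], size=N)          # a random 1-bounded function
print(abs(ap4_average(f)), u3_norm(f))       # the first number should be <= the second
```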
18:10
And there's a linear algebra recipe for computing the complexity.
18:34
I'll tell you what it is. It's not, well, I'll tell you what it is and then make some remarks about it.
18:40
So what it is, it's the smallest S such that for any i between one and T,
19:03
I can partition the forms other than psi i. So I can partition psi one up to psi i minus one,
19:22
psi i plus one up to psi T into S plus one classes with the property that psi i is not in the affine linear span of any of them.
19:58
So it's a strange sort of recipe.
20:00
I can explain, though, for example, what the complexity of four-term progressions is.
20:25
So for example, if capital psi equals psi one up to psi four, it's a system defining four-term progressions.
20:51
Then the complexity is two.
21:01
And the reason the complexity is two is clearly if I remove any one of those four forms, I can partition the remaining forms into three classes, just the trivial classes. And then the form I remove will certainly not be in the affine linear span of any of them because they're independent.
21:21
But I can't do it with two classes because if I try to do that, I'd have to have two forms in a class and then I can write every form in two variables as something in the affine linear span of those. So it's an exercise to convince yourself of that.
21:41
I wrote down some other ones. So the complexity of the Vinogradov three prime system is one. And somehow what this means,
22:02
so complexity one means that the Gowers U2 norm is all you need. And as we've seen, the Gowers U2 norm is something that's just classical Fourier analysis. And indeed, writing odd numbers as a sum of three primes
22:20
is something that was done quite some time ago using types of Fourier analysis. And thus can be handled using Fourier analysis, traditional Fourier analysis, or more accurately, the Hardy-Littlewood circle method.
22:44
So à la Hardy-Littlewood. I probably should have an accent on that A. If I'm gonna write two words of French in my six lectures, I should at least get them right.
23:01
Excuse me, are the coefficients in Z or Q? Ah, Z, yeah. I mean, actually, really all that's important is that they're Z-valued; you could allow coefficients in Q, but with Z values. No, I meant in forming your span.
23:20
For example, you have something like two and one and three and one in your terms. Yeah, over Q. So could be handled by Fourier analysis, Hardy-Littlewood and Vinogradov. And that was done in the 1930s.
23:42
And then I wrote down one more example, which is the example of a cube. So cubes, which would be of the form. So I take all of the forms N one plus the sum
24:01
over I in A of N I, where A ranges over all subsets of two up to D. So this is two to the D minus one forms.
24:23
And the complexity is D minus two. So there are various different types of systems
24:41
of forms that you can handle here. Now, as I said, I've just stated what the complexity is. Why is that the complexity? Well, that's a bit difficult to explain. But basically, the proof of this generalized von Neumann theorem is, again, some applications of the Cauchy-Schwarz inequality. And it turns out that the number
25:01
of Cauchy-Schwarz inequalities that you need to prove this bound depends on, you need to arrange the forms psi I by various changes of variables so that those Cauchy-Schwarz inequalities give you the right thing. And once you've decided upon that idea, working out what the correct number
25:21
of Cauchy-Schwarz inequalities you need is, it boils down to linear algebra. It comes down to just a linear algebra problem to which this is the solution. I don't think I can add any more intuition to that.
25:43
So that's the statement of the theorem. And, well, I'm going to tell you some of the ideas that go into the proof. And I think we'll lose nothing by specializing to the case of four-term progressions.
26:02
So for simplicity, but everything I say, I mean, it's for definiteness, not simplicity. Let's specialize to the system that we've been studying.
26:23
Psi equals psi one up to psi four of four-term progressions. And so let's again write T for the operator there.
26:41
So T of F1 up to F4 for the corresponding operator. So I talked about how you use this operator and generalized von Neumann theorems
27:02
to count arithmetic progressions. And the basic idea is that you split, you split the characteristic function of the set you're interested in into a structured and a random part.
27:21
So the way I did it when looking at Szemerédi's theorem, so the first idea would be just to split the von Mangoldt function, so to estimate T of lambda, lambda, lambda, lambda,
27:42
which is what we want to do, you might try subtracting off the average value, try writing lambda as one, which is its average, plus lambda minus one,
28:03
and then decompose, just as we did before, T of lambda, lambda, lambda, lambda as T of one, one, one, one plus 15 error terms,
28:25
terms involving lambda minus one. And then, so maybe this is equal to T of one, one, one, one,
28:47
plus big O of the Gowers U3 norm of lambda minus one. And then perhaps one could show, just as we did when looking at Szemerédi's theorem, that this is small,
29:00
and hence the number of four-term progressions of primes is dominated by the main term here. And now it turns out there are a very large number of issues with that plan.
29:36
So here are some problems.
29:50
Well, I'm going to list several problems with it. One is, if it works, it would give the wrong answer, so we know it's not going to work.
30:04
So the answer it would give is just wrong, and the way it's wrong is we've not seen any of this local behavior of the von Mangoldt function at primes p. So where are the beta p?
30:23
So there's no, we've not taken any account of irregularities of the von Mangoldt function modulo two, three, and five. Second problem is,
30:41
well, in making this assertion here, I applied the generalized von Neumann theorem. But it wasn't actually valid to do that because the function I applied it to is not bounded. So the application of the generalized von Neumann theorem
31:09
was invalid, and that's because lambda minus one is not bounded.
31:21
It's not bounded by one, it's not bounded uniformly. So I'm just going to see if I can find some chalk of appropriate length, maybe that will do. So that's not so great.
31:41
And then third, even if that had all been valid, well, how am I going to show that this U3 norm, lambda minus one in U3 is indeed small? So how do I propose to show,
32:11
so how do I propose to show that that's little o of one? The tool I have at my disposal is the inverse theorem. So the inverse theorem for the U3 norm
32:31
was again only valid for bounded functions. So I only stated it, and I also only proved it.
32:44
I didn't prove it, but the proofs that I discussed were only valid for bounded functions, for bounded functions. So there are, well there are three
33:02
very serious problems with the plan.
33:23
However, the plan can be made to work by addressing all of those issues. The first one is actually the easiest to address.
33:56
So point one is addressed using something
34:01
called the W trick. And the idea here is that you take W to be just the product of the first few primes.
34:21
You need to take a number of primes that tends to infinity, say the primes up to log log N. So there's nothing really special about log log N, just any slowly growing function would work, but you want to make sure that it doesn't grow too quickly. And consider, well, and write lambda as an average
34:48
of the functions, which I'll call lambda sub WB. And these are defined as follows. Lambda sub WB of N is lambda W over W,
35:05
sorry, phi of W, Euler phi of W, over W, times lambda of W n plus b. So that's basically saying, let's foliate the integers into progressions modulo W. So here, the highest common factor of b and W is one.
35:27
And this has the effect of smoothing out the irregularities, modulo small primes of lambda. So these new functions, lambda sub WB,
35:52
now do not have bad irregularities modulo primes p
36:06
that are less than the threshold I took. So to explain why that's so, well, and where this trick comes from, let me just show you
36:21
what the idea that led to this. So if I take the primes, three, five, I know two is a prime, but it's always best to ignore it. Let's list all of the primes.
36:41
So of course, those are all odd. But if I consider what happens, if I consider the N for which two N plus one is prime, well, then I instead get the sequence one, two, three, five, six, eight, nine, 11, and 14.
37:08
And you'll see that that is, well, it no longer consists entirely of odd numbers. And in fact, it's half even numbers and half odd numbers. So half, or at least it would be
37:20
if I continued it asymptotically. Half even, half odd. And the reason for that is that a number is equally likely, prime numbers are equally likely to be one mod four as they are three mod four.
37:46
So primes are equally likely to be one or three mod four. And that's a result of de la Vallée Poussin.
38:01
And appropriately enough, Hadamard, I think. Well, these guys proved the prime number theorem. And with what was available at the time, they could easily have proved the version of the prime number theorem in progressions one mod four or three mod four.
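Here is a small numerical illustration of that point and of the W-trick (my own sketch, not from the lecture): the primes themselves are all odd apart from 2, but the n for which 2n + 1 is prime split roughly evenly between even and odd; and with W the product of 2, 3 and 5 and b coprime to W, the n for which Wn + b is prime show no bias modulo 2, 3 or 5.

```python
from collections import Counter

def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(N ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, N + 1, p))
    return [n for n in range(N + 1) if sieve[n]]

N = 10**6
prime_set = set(primes_up_to(N))

print(Counter(p % 2 for p in prime_set))                 # primes: all odd except 2

shifted = [n for n in range(1, N // 2) if 2 * n + 1 in prime_set]
print(Counter(n % 2 for n in shifted))                   # roughly half even, half odd

W, b = 2 * 3 * 5, 7                                      # b coprime to W
tricked = [n for n in range(1, N // W) if W * n + b in prime_set]
for q in (2, 3, 5):                                      # no bias mod the primes dividing W
    print(q, Counter(n % q for n in tricked))
```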
38:22
So that's all I'll say about that. I don't want to continue working with lambda sub WB. So I will go back to working with lambda, but one should pretend that I've done this trick. And so in other words, we're going to pretend from now on that lambda itself is nicely distributed
38:41
modulo small primes. So we'll pretend that this trick has been applied and that lambda itself is nicely distributed
39:08
to small moduli. So in reality, you do have to include the W and the B and that really the only effect of that is that throughout the paper, there's a W and a B,
39:22
which is a bit of a pain. So two and three are more interesting. I'll deal with them a little bit together. So the point about two and three is that
39:41
although the von Mangoldt function is not bounded, it is bounded by something that's much easier to understand than the function itself. So for two and three, the key observation is that
40:03
although lambda is not bounded, it is bounded above by another function. It is pointwise bounded by another function, nu.
40:29
Well, by a constant multiple of some other function, which we call nu.
40:48
And this function nu is much better understood
41:02
than lambda itself. And this function nu comes from the theory of the sieve and in particular, what's called Selberg's sieve.
41:21
So Selberg's sieve. So I'll just tell you very briefly the key idea because I feel that everybody should see this idea at some point if they haven't seen it before. The amazing observation is just take any weights lambda d.
41:45
So let lambda d from d equals one to some threshold r. So r is going to be a small power of n.
42:00
r is n to some small power. Let this be any system of weights and consider the following function: the sum over d dividing n with d less than or equal to r,
42:26
of lambda d, all squared; I'll call it f of n. So it's just an arbitrary system of weights
42:42
but I do want to have that lambda one is one. So consider a function like that. Then I claim, I mean obviously this is a non-negative function. But furthermore, if n is prime
43:00
and if provided it's not tiny, so provided it's between r and n or bigger than n is also fine, then f of n is equal to one.
43:20
And that's clear. I mean the sum is over divisors of a prime number. There are of course only two such divisors and only one of them is in the range that I'm allowing myself. So manifestly f of n is always non-negative
43:46
and so f majorizes the characteristic function of the primes.
44:08
Now it may not be completely obvious to you, it shouldn't be, but f is an easier thing, at least if the weights lambda d are chosen in some sensible way. f is an easier thing to compute with. So for example, if you wanted to prove,
44:21
if you wanted to figure out the average of f, well you just sum over n less than or equal to n, and then you expand out the square, and you get a sum. But the point is if r is relatively small, it's a fairly short sum, so you can estimate it, generally you can estimate it without too much difficulty
44:41
provided r is not too big. So it's reasonably easy to compute with in cases that arise. F is often quite easy to compute with
45:09
if r is small, that is, if the exponent c with r equal to n to the c is small. But who said that there's a choice of the lambda d for which f is, I mean,
45:22
for a typical choice of lambda d, just if I choose them randomly, f is going to be much bigger than the primes. So although it majorizes the primes, it will also be supported on many other numbers and be big. So the remarkable fact is that you can choose the lambda d so that this is not too much bigger than the primes.
45:47
So the lambda d can be chosen so that, well, for example, the sum over n less than or equal to n of f of n is bounded above by a constant, depending only on c,
46:08
times the number of primes less than n. And that's what Selberg did. So the way, I mean, one way to make a sensible choice
46:22
is just to look at the expression you're interested in. We want to make the total weight of f small. And it's actually a quadratic form in the lambda ds, essentially. So you just minimize that quadratic form, and that's how you choose your weights lambda d. So that's a ridiculously short course in the Selberg sieve,
46:43
but this is basically the idea. And the one further thing I should say is that if you want to majorize the von Mangoldt function, you can take nu of n to be log of N, simply just log of big N, times f of n.
47:02
So then, I mean, it's also usually a good idea to divide through and renormalize so that the total mass is one. So let's also divide by a constant c, where c is chosen
47:33
so that the average value of nu is just precisely one.
47:42
Yes? Which of your n's are big and which are small? Very good, yes. Actually, they could probably both be small, but certainly at least one of them should be small, and it's the argument of f.
48:07
So choose c so that the average value of nu is one, and then this will majorize the von Mangoldt function up to a constant. So lambda of n will be bounded by a constant times nu of n pointwise.
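Here is a hedged sketch of such a majorant (my own, in Python). It uses the standard simplified choice of weights lambda_d = mu(d) log(R/d)/log R, which is not Selberg's exact optimizing choice but behaves similarly, and it checks the two claims: f equals 1 on primes in (R, N], and its total mass is only a bounded multiple (roughly log N over log R) of the number of primes up to N.

```python
import math

def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(N ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, N + 1, p))
    return [n for n in range(N + 1) if sieve[n]]

def mobius_up_to(N):
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            for m in range(p, N + 1, p):
                mu[m] *= -1
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0
    return mu

N = 10**5
R = int(N ** 0.25)                                  # R = N^c with a small exponent c
mu = mobius_up_to(R)

# lambda_d = mu(d) log(R/d)/log R, so that lambda_1 = 1 (a smoothed Selberg-style choice)
lam = [mu[d] * math.log(R / d) / math.log(R) if d >= 1 else 0.0 for d in range(R + 1)]

# f(n) = ( sum over d dividing n with d <= R of lambda_d ) squared
S = [0.0] * (N + 1)
for d in range(1, R + 1):
    if lam[d]:
        for m in range(d, N + 1, d):
            S[m] += lam[d]
f = [s * s for s in S]

primes = primes_up_to(N)
assert all(abs(f[p] - 1.0) < 1e-9 for p in primes if p > R)   # f = 1 on primes in (R, N]
print(sum(f) / len(primes))    # total mass of f relative to pi(N): a bounded ratio
```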
48:31
So the existence of this function nu allows you to address point two: it allows you to prove a generalized von Neumann theorem for certain, well, for functions
48:41
like the von Mangoldt function. So it's not bounded, but it is bounded by something that you can compute with. And I didn't show you the proof of the generalized von Neumann theorem, but I did remark that it uses the Cauchy-Schwarz inequality. And basically, you do the same Cauchy-Schwarz inequality, but whenever you were tempted
49:01
to just throw away a function because it's bounded by one, you must resist that temptation and instead make sure you include the weight nu. And then essentially, the same computation works. You find yourself needing to evaluate a large number of expressions involving nu.
49:21
But that can always be done because they've been chosen specifically so that you can compute with them. So I'll write a few remarks on that.
49:48
Well, actually, because I'm, I actually am not going to write down what I just said because I want to say a few more things. Don't want to run out of time.
50:09
So what about point three, the inverse theorem?
50:36
So it turns out that you can also prove the inverse theorem as I stated it, namely that functions with a large U3 norm
50:42
correlate with a class two nilsequence. That's also true without the assumption that the functions are bounded, provided they're bounded by, say, nu. So it turns out that we can prove a variant
51:05
of the inverse theorem for the Gowers norms
51:21
for functions bounded by nu. In fact, by nu plus one, bounded by a constant multiple of one plus nu.
51:49
So that's another function which has average value one. And an example of this would be lambda minus one, which is the function that we actually care about.
52:02
So the theory here does not use specific properties of the exact construction of this nu coming from the Selberg sieve, but rather a host of properties about linear forms involving nu that can be established in that particular case, but are also valid for more general functions.
52:23
Now, we don't do this by going through the proof of the inverse theorem step by step and making sure that we only used the fact that we're bounded by nu, but this is actually a consequence of the inverse theorem as a black box. So this follows from the inverse theorem
52:43
for bounded functions together with a decomposition result.
53:07
And this decomposition theorem was actually the key idea in my first joint work with Terry Tao, where we proved that there are arbitrarily long progressions of primes.
53:20
So this is from 2004. And the decomposition theorem is that lambda minus one, say, but actually any function bounded by a constant multiple of nu,
53:43
bounded by a constant multiple of nu or one plus nu or what have you. So for a wide range of functions, you can decompose them as a bounded function,
54:04
bounded by one, plus an error that's extremely small in the Gowers norm. So plus a function that's tiny in the U3 norm.
54:30
So why would this then imply the inverse theorem for lambda minus one? Well, if the U3 norm of the left-hand side is large, then because this is tiny,
54:42
the U3 norm of the bounded function is large. So I can then apply the inverse theorem for the bounded function, which then correlates with a class two nilsequence. This doesn't correlate with a class two nilsequence because it's tiny in the U3 norm.
55:00
And hence this correlates with a class two nilsequence. So that's the mode of argument. I should write down how that goes. So lambda, let me give these functions names. So let's call this little f, and let's call that little g. So lambda minus one in U3 of N
55:23
is greater than delta, let's say, implies that the U3 norm of f is greater than delta over two if I chose parameters correctly. And that implies by the inverse theorem for bounded functions
55:48
that f correlates with a nilsequence. The inner product of f with chi is large for some class two nilsequence.
56:06
And then that implies that lambda minus one correlates with a class two nilsequence, because g does not correlate
56:28
with a class two nilsequence. So the inner product of g with chi is tiny. And that's by the converse of the inverse theorem.
56:46
So remember I said that the inverse theorem for the Gowers norms comes with a converse. It says, so the theorem itself says if you have a function with a large U3 norm then it correlates with a class two nilsequence. Conversely, if you correlate with a class two nilsequence, then you have a large U3 norm.
57:02
But g does not have a large U3 norm. So we've managed to prove the inverse theorem for a function that's not bounded. And that has actually now addressed all of the problems that I foresaw with my plan.
57:24
Questions at all? You still need to reach a contradiction. Yes, exactly, I haven't. Right, so this is precisely it. There's no contradiction yet because I've not ruled out the possibility that lambda minus one does correlate
57:41
with a class two nilsequence. It would be strange if it did, because lambda minus one is to do with the primes and nilsequences have nothing to do with the primes. But that needs to be proven.
58:09
So the remaining task,
58:23
the remaining task then is to show precisely what Emmanuel just suggested.
58:42
So the remaining task is to show that the inner product of lambda minus one with chi, and maybe I should just clarify exactly what I mean here by the inner product.
59:02
This is the average value over n less than big N of lambda of n minus one, times chi of n. I would like to show that this is small, little o of one, for all class two nilsequences
59:24
of fixed complexity. And the error would be allowed to depend on that fixed complexity. So that's the task. And this involves techniques that come from
59:42
the classical additive prime number theory. But there's an initial step, which is that it reduces to a similar question. I mean, I'm oversimplifying quite a bit with what I'm about to say.
01:00:00
Something that comes up quite often is that questions about primes are related to questions about the Möbius function. So it reduces, using fairly standard techniques, based on the fact that the von Mangoldt function is,
01:00:22
well, it's the Dirichlet convolution of the Möbius function with a nice smooth analytic function, log d. So you can always take a sum over the von Mangoldt function and write it as a sum of things involving the Möbius function.
01:00:41
And this log d is usually very benign. So it reduces to establishing that the inner product of Möbius, I will just remind you what Möbius is in one moment. So Möbius inner product chi,
01:01:01
which is the average value of mu of n chi of n, is little o of one. The notation here is probably a bit unfortunate because in analytic number theory, chi is always a Dirichlet character. So I think probably I'll rewrite these notes
01:01:23
at some point and call it something else. So if anyone happens to watch this video at precisely this point, this is not a Dirichlet character, although it could be. No, because it's an average.
01:01:42
Yeah, so Möbius is just minus one to the power of the number of prime factors of n. Parity of the number of prime factors.
01:02:05
And what we're trying to prove here is an instance of what's known as, I think this terminology was introduced by Friedlander and Iwaniec, but it's something that's all-pervasive in analytic number theory.
01:02:23
It's an example of what's called the Möbius randomness principle, which is the idea that the Möbius function should just be orthogonal to everything.
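As a quick illustration of the heuristic (my own sketch, not from the lecture), the normalized correlations of mu with a few structured phases are computed below. The bracket phase e(alpha n floor(beta n)) is meant as a toy stand-in for the kind of bracket-quadratic phase that genuine class two nilsequences produce, and the correlation of mu with itself is included as one of the obvious exceptions: it tends to 6/pi^2, the density of the squarefree numbers.

```python
import math, cmath

def mobius_up_to(N):
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            for m in range(p, N + 1, p):
                mu[m] *= -1
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0
    return mu

N = 2 * 10**5
mu = mobius_up_to(N)

def correlation(F):
    """(1/N) * sum over n <= N of mu(n) F(n)."""
    return sum(mu[n] * F(n) for n in range(1, N + 1)) / N

alpha, beta = math.sqrt(2), (math.sqrt(5) - 1) / 2
tests = {
    "constant 1": lambda n: 1.0,
    "linear phase e(alpha n)": lambda n: cmath.exp(2j * math.pi * alpha * n),
    "bracket phase e(alpha n floor(beta n))":
        lambda n: cmath.exp(2j * math.pi * alpha * n * math.floor(beta * n)),
    "mu itself, an obvious exception": lambda n: float(mu[n]),
}
for name, F in tests.items():
    print(name, abs(correlation(F)))
```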
01:02:47
Is there some? Oh, I thought the Möbius function was zero if n was not square-free. Ah, yes.
01:03:06
Oh, well actually, I mean, everything that I would say would be true with the Liouville function, that is, if you don't include that square-free condition, which is in some ways more natural. It depends. In other ways, it's less natural.
01:03:20
So the Möbius randomness principle is that Möbius is orthogonal to everything, unless there's some obvious reason why it shouldn't be. So it's clearly not orthogonal to itself, and nor is it orthogonal to the von Mangoldt function. But if you take a function coming from somewhere else, so coming from an algebraic construction or a construction like these nilcharacters,
01:03:41
there's simply no reason that it should correlate with the Möbius function. So that's a heuristic. Proving it in any given case is not always so easy. There are basically two ways that I know of
01:04:01
of proving a Möbius randomness principle. So there are two methods for proving rigorously that the inner product of Möbius with a particular function f is little o of one.
01:04:31
And those methods are, roughly speaking, there are two cases. If f is kind of multiplicative, so if it is a Dirichlet character, or if it's just a constant function one,
01:04:42
then the techniques that you would use would be techniques involving L-functions, and the zeta function if f is just one. So if f is multiplicative in some vague sense, if it somehow has multiplicative structure, then we use L-function contour integration
01:05:05
type techniques, Perron's formula, et cetera. And if f is not multiplicative, if f looks to be far from multiplicative,
01:05:29
then we use a different method that's called the method of bilinear forms.
01:05:45
And I'm going to tell you, so it's the second method that's going to be, well actually, to show that Möbius is orthogonal to class two nilsequences, one actually has to use both methods, depending on what the nilsequence is.
01:06:03
Because actually the constant function one is itself a nilsequence. But if the nilsequence doesn't look like a constant function, then it's this method that's going to be used, this method of bilinear forms. And this is associated with various names such as Vinogradov, Linnik,
01:06:22
Heath-Brown, and Vaughan, and others. It's a well-traveled method in analytic number theory. And the idea, so for those of you
01:06:53
who have read or are interested in reading Zhang's proof of bounded gaps between primes,
01:07:00
this method is also important there. It doesn't come up in the Maynard-Tao approach to that theorem. So the idea is to decompose the Möbius function
01:07:22
into Dirichlet convolutions. So it's going to be written as a sum of a few functions.
01:07:44
So functions that are going to be of the form f star g of n. So this is Dirichlet convolution: the sum over d dividing n of f of d times g of n over d. A sum of such things.
01:08:02
And to get the method to work, you've got to be careful about exactly how you do that, and in particular about the ranges that those f and g are supported on. So there's a lot of flexibility, a lot of flexibility in how this is done.
01:08:23
And somehow there's one basic identity that governs this entire endeavor, which is called Linnik's identity. And that's the identity that Möbius, mu of n,
01:08:46
is the sum over all integers k of minus one to the k times the kth divisor function of n. Well, in fact, what's called the kth proper divisor function of n. So tau sub k prime of n is the number of ways
01:09:05
of writing n equals n1 times up to nk, with all of the ni's strictly bigger than one. So tau sub k prime is itself, it's in fact a k-fold Dirichlet convolution.
01:09:33
And so there's many ways this gives a lot of flexibility in decomposing mu into a sum of Dirichlet convolutions.
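The identity can be checked directly for small n (my own sketch, not from the lecture): tau_k_prime below counts ordered factorizations of n into k factors, each at least 2, and the alternating sum over k does recover mu(n) for every n at least 2, exactly as the comparison of Dirichlet series coefficients in the hint below suggests.

```python
from functools import lru_cache

def mu(n):
    """Mobius function by trial factorization."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0            # n is not squarefree
            result = -result
        p += 1
    return -result if m > 1 else result

@lru_cache(maxsize=None)
def tau_k_prime(k, n):
    """Number of ordered factorizations n = n1 * ... * nk with every factor >= 2."""
    if k == 0:
        return 1 if n == 1 else 0
    return sum(tau_k_prime(k - 1, n // d) for d in range(2, n + 1) if n % d == 0)

for n in range(2, 201):
    # tau_k_prime(k, n) vanishes once 2^k > n, so k <= 8 suffices for n <= 200
    rhs = sum((-1) ** k * tau_k_prime(k, n) for k in range(1, 9))
    assert rhs == mu(n), (n, rhs, mu(n))
print("Linnik's identity checked for 2 <= n <= 200")
```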
01:09:41
Linnik's identity is a fun exercise. And I think I can give you a hint that will let you do the exercise in about half a line. It relies on the fact that one over zeta, which is the Dirichlet series of the Möbius function, can be expanded as one over one plus (zeta minus one),
01:10:03
obviously, and that's the sum over j of minus one to the j times (zeta minus one) to the j, by the geometric series formula. So if you compare coefficients of both sides,
01:10:22
you will find that you've come out with Linnik's formula. So it's a really natural thing. I mean, actually, for actually doing this, Linnik's formula is not useful. And the reason for that is that the sum over k, k can be large.
01:10:42
So what one would actually use is a formula of Heath-Brown, which is a truncation of Linnik's formula, or, depending on your application, and for this application, we don't need too much sophistication, there's an identity of Vaughan.
01:11:02
This is Vaughan, by the way. I know that this is a, English pronunciation is pretty irregular, but that's about as irregular as it gets. So when I say Vaughan, I mean that. Vaughan's identity is popular.
01:11:22
It's easy enough to prove, and you'll find it in books, and that would also suffice for what I'm talking about. So you've decomposed your Möbius into Dirichlet convolutions, so now what do you do? So once we have a decomposition of Möbius
01:11:52
as a sum of Dirichlet convolutions f star g,
01:12:01
well then, let's try and evaluate the inner product of one of those, the average of f star g of n times chi of n, where chi is now a class two nilsequence, let's say.
01:12:26
So I can evaluate the inner product of Möbius with chi as a sum of these things. And let's suppose that I've done this decomposition in such a way that I've got nice control over the support of f and g, so for simplicity,
01:12:47
let me suppose that f and g are supported near, well, near the square root of n.
01:13:03
So near, so f of x and g of x are supported where x is roughly the square root of n. So that's just one case, but it's a case that would, could come up.
01:13:36
So the idea is that you take that inner product, let's call it star.
01:13:45
So star is going to equal, well, it's going to be roughly the average value over a and b of size about square root n of f of a g of b chi of ab.
01:14:06
It won't be quite that, there'll be some weights, but roughly that, and then by Cauchy-Schwarz, one application of Cauchy-Schwarz, that's at most the average over b and b primed of,
01:14:24
well, of the average over a, and all of these of size about square root N, of chi of a b times chi bar of a b primed.
01:14:44
So by using Cauchy-Schwarz, and also the fact that f and g are bounded by one, let's say, so I'm simplifying a vast amount here,
01:15:06
but the reason I'm doing that is that I want to illustrate what the key point is, which is the evaluation of this sum. To see what this is, I prefer to do a change of variables
01:15:21
just to have a look at it. So this is going to equal, I'm going to call it, the average over n less than root N of chi of d n times chi bar of d primed n. So here I just substituted a equals n
01:15:40
and b equals d, and b primed equals d primed, just because I feel that's more suggestive. So what is this? Well, if I expand it out, so chi is a nil sequence,
01:16:01
so it's phi of p of d of n times phi, this should be bars as well, and phi of p of d primed n bar, and that is the average.
01:16:37
Well, it looks like a correlation of two nilsequences,
01:16:42
but you can interpret it as one nilsequence. So let's just write it as the average over n less than or equal to root N of phi cross phi bar, evaluated at the pair p of d n, p of d primed n. So for fixed d and d primed,
01:17:01
so this is up to square root N, for fixed d and d primed. And if d and d primed are fixed, this is an average of a nilsequence on g cross g.
01:17:23
So g mod gamma cross itself, in which the polynomial sequence, so with automorphic function phi cross phi bar,
01:17:44
and polynomial sequence n maps to the pair p of dn, p of d primed n.
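Here is a toy version of that last step (my own sketch, not from the lecture). A genuine class two nilsequence is replaced by the simpler abelian quadratic phase chi(n) = e(alpha n^2); the same mechanism is then visible with bare hands: the inner averages of chi(dn) times the conjugate of chi(d'n) are quadratic Weyl sums in n, and for a generic alpha they are small for the vast majority of pairs (d, d'), while on the diagonal d = d' they equal 1. That is exactly the kind of equidistribution input, supplied in the real argument by the theory from lecture four, that makes the Cauchy-Schwarz step win.

```python
import math, cmath

alpha = math.sqrt(2)            # a generic (badly approximable) frequency
M = 2000                        # the inner variable n runs up to about sqrt(N)

def chi(x):
    """A toy degree-two phase standing in for a class two nilsequence."""
    return cmath.exp(2j * math.pi * alpha * x * x)

def inner_average(d, dprime):
    """E over n <= M of chi(d n) conj(chi(d' n)), a quadratic Weyl sum in n."""
    return sum(chi(d * n) * chi(dprime * n).conjugate() for n in range(1, M + 1)) / M

offdiag = [abs(inner_average(d, dp)) for d in range(1, 21) for dp in range(1, 21) if d != dp]
print("largest off-diagonal correlation:", max(offdiag))
print("average off-diagonal correlation:", sum(offdiag) / len(offdiag))
# on the diagonal d = d' the average is exactly 1, so the gain comes entirely from
# the fact that most pairs (d, d') are equidistributed
```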
01:18:01
So what it boils down to at the end of the day is understanding how the distribution of these polynomial sequences on g cross g for different d and d primed are related to the distribution of p itself on g. So more precisely, what one does is reduce to the case
01:18:20
where p is equidistributed on g, and it then turns out that most of these sequences are equidistributed as well. So for that, you need the whole theory of equidistribution of nilsequences that I discussed in lecture four. So a similar sort of technique,
01:18:41
but based not on Linnik's identity but rather on a criterion of Daboussi, I believe, and Kátai, there's some very interesting work of Bourgain, Sarnak, and Ziegler,
01:19:05
which establishes this Möbius randomness principle for things that are quite a bit more general than nilsequences. So in fact, our nilsequences come from discrete-time flows on nilpotent homogeneous spaces g mod gamma, but they deal with unipotent flows on more general g mod gamma,
01:19:22
the same generality as Ratner's theorem. I think so. I think it's written completely. The key point is that they prove a weaker result than we do. They just get little o of one of cancellation.
01:19:43
So what I didn't say is that we actually need to beat the little o of one by a big power of log, but I think for just little o of one, their result does give this. You only need this for fixed d and d primed, I'm pretty sure.
01:20:02
Right, well, I think that's about all I have time to sketch, so I'm going to stop there.
01:20:26
Yeah. How much of all of this remains, I mean, how much of all of this is needed if you just wanted to prove that there are arbitrarily long progressions in the primes?
01:20:43
Essentially none of it. So all of these lectures have been about work that was subsequent to my work with Tao on progressions of primes. So that work is quite a bit softer than this. You don't need nilsequences at all.
01:21:00
That work was used here on the top board, in fact. But you actually, to understand that paper, you wouldn't need to know, I mean, you need the Gowers norms, that's an important part of it, but you don't need anything about nilsequences. Suppose you wanted to still prove it,
01:21:21
the way you stated it, with psi one up to psi t. Yeah. But you only want to know that you can do many of those. Yeah, so for that, I believe you do need all of this theory in general, unless it's somehow, so there are sort of homogeneous systems of linear forms.
01:21:42
You need there to be an analog of Szemerédi's theorem. So already for the system of forms x, y, x plus y, there's no analog of Szemerédi's theorem. You can have a positive density set with no solutions to that equation, just the odd numbers. So as soon as you hit upon that issue,
01:22:02
you're going to need to use this more elaborate theory. So basically what you're saying is that getting the exact asymptotic is basically not more costly than proving a lower bound. Right, in fact, I even wonder whether,
01:22:23
it may even be the case that if you had lower bounds for just every system of forms, this may even automatically imply an asymptotic somehow. It's possible. They'd have to be compatible.
01:22:43
It's a technical thing, so at the end here, you have your f star g, and of course mu is a sum of tau k primes, and tau sub k prime is a k-fold convolution. Yes. So do you have to do what you did here with this k? Actually, no, because you can decompose
01:23:02
a k-fold convolution. You can just arrange it as a two-fold convolution. Yeah, in general, they're not going to be bounded by one. I mean, again, that was just for simplicity, so they'll be bounded by some power of the divisor function, but yes,
01:23:20
you only ever deal with, well, we only ever deal with the two-fold convolutions. I think that there are, in fact, some parts of Zhang's work that use more carefully the sort of triple convolution structure and so on. My last question is, can you envision
01:23:42
some arithmetic applications of this program, of their idea, yeah? Yeah, that's a good question. I don't know. So one thing I think, maybe when you were referring to some work of theirs that was not completed, I think maybe what they'd like to do is understand orbits in G mod gamma
01:24:02
at prime return times, for example, and I think even for SL2, that's open. So I don't know, but I don't believe, or I certainly don't know of any arithmetic applications that are parallel to the ones that I've been discussing in these lectures.
01:24:22
So they replace the nilsequence with the, with the full horocycle flow? Yeah, you could evaluate a horocycle flow at some automorphic function on G mod gamma. I don't know what the significance of this is in general. It's actually an interesting question, for sure.
01:24:45
Any more questions? So I suggest we thank Ben for his wonderful lectures.