key: cord-0194424-oky924gc
authors: Helfgott, Harald Andrés
title: Expansion, divisibility and parity: an explanation
date: 2022-01-03
journal: nan
DOI: nan
sha: 3c8f45656e3dd2ee2601efa731a3ae4662a87475
doc_id: 194424
cord_uid: oky924gc

After seeing how questions on the finer distribution of prime factorization, considered inaccessible until recently, reduce to bounding the norm of an operator defined on a graph describing factorization, we will show how to bound that norm. In essence, the graph is a strong local expander, with all eigenvalues bounded by a constant factor times the theoretical minimum (i.e., the eigenvalue bound corresponding to Ramanujan graphs). The proof will take us on a walk from graph theory to linear algebra and the geometry of numbers, and back to graph theory, aided, along the way, by a generalized sieve. This is an expository paper; the full proof has appeared as a joint preprint with M. Radziwiłł.

This paper is meant as an informal exposition of [HR]. The main result is a statement on how a linear operator defined in terms of divisibility by primes has small norm. In this exposition, we will choose to start from one of its main current applications, namely,

$$\frac{1}{\log x} \sum_{n\le x} \frac{\lambda(n)\lambda(n+1)}{n} = O\left(\frac{1}{\sqrt{\log\log x}}\right), \tag{1.1}$$

which strengthens results by Tao [Tao16] and Tao-Teräväinen [TT19]. (Here λ(n) is the Liouville function, viz., the completely multiplicative function such that λ(p) = −1 for every prime p.) There are other corollaries, some of them subsuming the above statement. It is also true that the above statement is an improvement on a bound, whereas the main result is new also in a qualitative sense. One may thus ask oneself whether it is right to center the exposition on (1.1).

All the same, (1.1) is a concrete statement that is obviously interesting, being a step towards Chowla's conjecture ("logarithmic Chowla in degree 2"), and so it is a convenient initial goal.

First, some meta comments. We may contrast two possible ways of writing a paper: what may be called the incremental and the retrospective approaches.

• In the incremental approach, we write a paper while we solve a problem, letting complications and detours accrete. There is much that can be said against this approach: it is hard to distinguish it from simply lazy writing; the end product may be unclear; just how one got to the solution of the problem may be inessential or even misleading.
• The retrospective approach consists in writing the paper once the proof is done, from the perspective that one has reached by the time one has solved the problem.

These are of course two extremes. Few people write nothing down while solving a problem, and the way one followed to reach the solution generally has some influence on the finished paper. In the case of my paper with Maksym, what we followed was mainly the retrospective approach, with some incremental elements, mainly to handle technical complications that arose after we had an outline of a proof. It is tempting to say that that is still too much incrementality, but, in fact, some of the feedback we have received suggests a drawback of the retrospective approach that I had not thought of before.

When faced with a result with a lengthy proof, readers tend to come up with their own "natural strategy". So far, so good: active reading is surely a good thing. What then happens, though, is that readers may see necessary divergences from their "natural strategy" as technical complications. They may often be correct; however, they may miss why the "natural strategy" may not work, or how it leads to the main, essential difficulty: the heart of the problem, which they may then miss for following the complications.

What I will do in this write-up is follow, not an incremental approach, but rather an idealized view of what the path towards the solution was or could have been like; a recreated incrementality with the benefit of hindsight, then, starting from a "natural strategy", with an emphasis on what turns out to be essential.

Notation. We will use notation that is usual within analytic number theory. In particular, given two functions f, g on R + or Z + , f (x) = O(g(x)) means that there exists a constant C > 0 such that |f (x)| ≤ Cg(x) for all large enough x, and f (x) = o(g(x)) means that lim x→∞ f (x)/g(x) = 0 (and g(x) > 0 for x large enough). By O * (B), we will mean "a quantity whose absolute value is no larger than B"; it is a useful bit of notation for error terms. We define ω(n) to be the number of prime divisors of an integer n.

1.1. Initial setup. Let us set out to reprove Tao's "logarithmic Chowla" statement, that is, $\frac{1}{\log x}\sum_{n\le x}\frac{\lambda(n)\lambda(n+1)}{n} \to 0$ as $x\to\infty$. Now, Tao's method gives a bound of $O(1/(\log\log\log\log x)^{\alpha})$ on the left side (as explained in [HU], with α = 1/5), while Tao-Teräväinen should yield a bound of $O(1/(\log\log\log x)^{\alpha})$ for some α > 0. Their work is based on depleting entropy, or, more precisely, on depleting mutual information. Our method gives stronger bounds (namely, $O(1/\sqrt{\log\log x})$) and is also "stronger" in ways that will later become apparent. Let us focus, however, simply on giving a different proof, and welcome whatever might come from it.
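To get a feel for the quantity under discussion, one can tabulate the logarithmically averaged sum numerically. What follows is only a minimal sketch: the function names `liouville_sieve` and `log_averaged_correlation` are ours and do not come from [HR], and a computation at small x of course says nothing about the rate of decay.

```python
import math


def liouville_sieve(limit):
    """Return a list lam with lam[n] = lambda(n) for 1 <= n <= limit (limit >= 1)."""
    # spf[n] will hold the smallest prime factor of n.
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        # Complete multiplicativity: lambda(n) = lambda(p) * lambda(n / p) = -lambda(n / p).
        lam[n] = -lam[n // spf[n]]
    return lam


def log_averaged_correlation(x):
    """Compute (1 / log x) * sum_{n <= x} lambda(n) * lambda(n + 1) / n for an integer x >= 2."""
    lam = liouville_sieve(x + 1)
    total = sum(lam[n] * lam[n + 1] / n for n in range(1, x + 1))
    return total / math.log(x)
```

Calling `log_averaged_correlation(x)` for a few increasing values of x gives a rough picture of the averaged correlation; the trivial bound on its absolute value is about 1.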

The first step will consist of a little manipulation as in Tao, based on the fact that λ is multiplicative. Let $W = \sum_{n\le x}\lambda(n)\lambda(n+1)/n$. For any prime (or integer!) p,

Objectives. Tao showed that there exists a set P of primes (very small compared to N) such that S = o(NL). It is our aim to prove that S = o(NL) for every set P of primes satisfying some simple conditions. (As we said, we assume p ≤ H, and it is not hard to see that we have to assume L → ∞; it will also be helpful to assume that no p ∈ P is tiny compared to H.) We will in fact be able to show that $S = O(N\sqrt{L})$, which is essentially optimal.

We now set out on our own.

It is a deep-seated instinct for an analytic number theorist to apply Cauchy-Schwarz:

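$$\cdots \sum_{\substack{p_1,p_2\in P\\ p_i\mid n,\ n+\sigma_ip_i\in N}} \lambda(n+\sigma_1p_1)\lambda(n+\sigma_2p_2) \le \sum_{n\in N}\ \sum_{\sigma_1,\sigma_2=\pm1}\ \sum_{\substack{p_1,p_2\in P\\ p_1\mid n,\ p_2\mid n+\sigma_1p_1\\ n+\sigma_1p_1,\ n+\sigma_1p_1+\sigma_2p_2\in N}} \lambda(n)\lambda(n+\sigma_1p_1+\sigma_2p_2),$$

where we are changing variables in the last step.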

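We iterate, applying Cauchy-Schwarz ℓ times:

$$\cdots \sum_{\substack{\sigma_i=\pm1,\ p_i\in P\\ \forall 1\le i\le 2^{\ell}:\ p_i\mid n+\sigma_1p_1+\dots+\sigma_{i-1}p_{i-1}}} \lambda(n)\lambda(n+\sigma_1p_1+\dots+\sigma_{2^{\ell}}p_{2^{\ell}}).$$

We can see $n+\sigma_1p_1+\dots+\sigma_{2^{\ell}}p_{2^{\ell}}$ as the outcome of a "walk" of length $2^{\ell}$.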

Suppose for a moment that, for $k = 2^{\ell}$ large, the number of walks of length k from n to m is generally about ψ(m), where ψ is a nice continuous function. Then $S^{2^{\ell}}$ would tend to

The main result (Theorem 1) in Matomäki-Radziwiłł [MR16] would then give us a bound on that double sum. Let us write that bound in the form

since $|\psi|_1$ should be about $L^k$, and write our statement on convergence to ψ in the form

Then S ≤ (err

Here already we would seem to have a problem. The "width" M of the distribution ψ (meaning its scale) should be $\sqrt{k}\cdot E(p : p\in P) \le \sqrt{k}H$; the distribution could be something like a Gaussian at that scale, say. Now, the bound from [MR16] is roughly of the quality $\mathrm{err}_2 \le 1/\log M$. One can use intermediate results in the same paper to obtain a bound on $\mathrm{err}_2$ roughly of the form $1/M^{\delta}$, δ > 0, if we remove some integers from N. At any rate, it seems clear that we would need, at the very least, k larger than any constant times log H.

As it turns out, all of that is a non-issue, in that there is a way to avoid taking the kth root of $\mathrm{err}_2$ altogether. Let us make a mental note, however.

The question now is how large ℓ has to be for the number of walks of length $k = 2^{\ell}$ from n to n + m to approach a continuous distribution ψ(m). Consider first the walks $n, n+\sigma_1p_1, \dots, n+\sigma_1p_1+\dots+\sigma_kp_k$ such that no prime $p_i$ is repeated. Fix $\sigma_i$, $p_i$ and let n vary. By the Chinese Remainder Theorem, the number of n ∈ N such that $p_1\mid n$, $p_2\mid n+\sigma_1p_1$, ..., $p_k\mid n+\sigma_1p_1+\dots+\sigma_{k-1}p_{k-1}$ is almost exactly $N/(p_1p_2\cdots p_k)$. In other words, the probability of that walk being allowed is almost exactly $1/(p_1\cdots p_k)$. We may thus guess that ψ has the same shape (scaled up by a factor of $L^k$) as the distribution of the endpoint of a random walk where each edge of length $p_i$ is taken with probability $1/p_i$ (divided by L, so that the probabilities add up to 1). That distribution should indeed tend to a continuous distribution, namely a Gaussian, fairly quickly. Of course, here, we are just talking about the contribution of walks with distinct edges $p_i$ to
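The counting claim based on the Chinese Remainder Theorem is easy to check by brute force. The sketch below is an illustration under simplifying assumptions: the name `count_allowed_starts` is ours, the set N is taken to be the integers in (N, 2N], and we ignore the requirement that intermediate vertices stay inside N (an edge effect).

```python
def count_allowed_starts(N, primes, signs):
    """Count n in (N, 2N] such that p_1 | n, p_2 | n + s_1 p_1, ...,
    p_k | n + s_1 p_1 + ... + s_{k-1} p_{k-1}."""
    count = 0
    for n in range(N + 1, 2 * N + 1):
        position = n
        allowed = True
        for p, s in zip(primes, signs):
            if position % p != 0:
                allowed = False
                break
            position += s * p  # take the step of length p in direction s
        if allowed:
            count += 1
    return count
```

For distinct primes the count should differ from N/(p_1 ⋯ p_k) by at most 1. With a repeated prime, as in `count_allowed_starts(9000, [3, 3], [1, -1])`, the second condition is implied by the first, and the count is about N/3 rather than N/9.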

without absolute values, and we do need to take absolute values as in (2.1). However, we can get essentially what we want by looking at the variance

and considering the contribution to this variance made by closed walks n,n + σ 1 p 1 , . . . , n + σ 1 p 1 + · · · + σ k p k = m,

The contribution of these closed walks is almost exactly what we would obtain from the naïve model we were implicitly considering, viz., a random walk where each edge $p_i$ is taken with probability $1/(Lp_i)$, and so we should have the same limiting distribution as in that model.

What about walks where some primes $p_i$ do repeat? At least some of them may make a large contribution that is not there in our naïve model. For instance, consider walks of length 2k that retrace their steps, so that the (k + 1)th step is the kth step backwards, the (k + 2)th step is the (k − 1)th step backwards, etc.: $n, n+\sigma_1p_1, \dots, n+\sigma_1p_1+\dots+\sigma_kp_k, n+\sigma_1p_1+\dots+\sigma_{k-1}p_{k-1}, \dots, n+\sigma_1p_1, n$,

The second row of divisibility conditions here is obviously implied by the first row. Hence, again by the Chinese Remainder Theorem, the walk is valid for almost exactly $N/(p_1p_2\cdots p_k)$ elements n ∈ N, rather than for $N/(p_1p_2\cdots p_k)^2$ elements. The contribution of such walks to

$$\sum_{n\in N}\ \sum_{\substack{\forall 1\le i\le 2k:\ \sigma_i=\pm1,\ p_i\in P\\ \forall 1\le i\le 2k:\ p_i\mid n+\sigma_1p_1+\dots+\sigma_{i-1}p_{i-1}\\ \sigma_1p_1+\dots+\sigma_{2k}p_{2k}=0}} 1$$

(which is the interesting part of the variance we wrote down before) is clearly $NL^k$. In order for it not to be of greater order than what one expects from the limiting distribution, we should have $NL^k \ll NL^{2k}/M$, where M, the width of the distribution, is, as we saw before, very roughly $\sqrt{k}H$. Thus, we need $k \gg (\log H)/(\log L)$.

There are of course other walks that make similar contributions; take, for instance, $n, n+p_1, n, n-p_3, n-p_3+p_4, n-p_3, n-p_3+p_6, n-p_3, n$ for k = 4. These are what we may call trivial walks, in the sense that a word is trivial when it reduces to the identity. It is tempting to say that their number is $2^kC_k$, where $C_k \le 2^{2k}$ is the kth Catalan number (which, among other things, counts the number of expressions containing k pairs of parentheses correctly matched: for example, ()(()()) would correspond to the trivial walk above). In fact, the matter becomes more subtle because some primes may reappear without taking us one step further back to the origin of the walk; for instance, in the above, we might have $p_4 = p_1$, and that is a possibility that is not recorded by a simple pattern of correctly matched parentheses; yet it must be considered separately. Here again we make a mental note.
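The Catalan numbers mentioned here, and the bound $C_k \le 2^{2k}$, can be checked directly. The following sketch (function names ours) computes $C_k$ from the closed formula and also enumerates the correctly matched parenthesis patterns themselves; it counts only patterns, and so does not capture the subtlety about reappearing primes just described.

```python
from math import comb


def catalan(k):
    """The kth Catalan number, C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def balanced_strings(k):
    """Generate all strings consisting of k correctly matched pairs of parentheses."""
    def extend(prefix, opened, closed):
        if closed == k:
            yield prefix
            return
        if opened < k:
            yield from extend(prefix + "(", opened + 1, closed)
        if closed < opened:
            yield from extend(prefix + ")", opened, closed + 1)

    yield from extend("", 0, 0)
```

Each pattern corresponds to a way of going out and retracing steps; an opening parenthesis is a new step, and a closing one undoes the most recent step not yet undone.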

It is, incidentally, no coincidence that, when we try to draw the trivial walk above, we produce a tree:

Any trivial walk gives us a tree (or rather a tree traversal) when drawn.

Now let us look at walks that fall into neither of the two classes just discussed; that is, walks where we do have some repeated primes $p_i = p_{i'}$ even after we reduce the walk. (When we say we reduce a walk, we mean an analogous procedure to that of reducing a word.) Then, far from being independent, the condition

either implies or contradicts the condition

We may draw another graph, emphasizing the two edges with the same label ±p i :

At this point it becomes convenient to introduce the assumption that $p \ge H_0$ for all p ∈ P. Then it is clear that, if $i' - i > 1$ and $p_j \ne p_i$ for all $i < j < i'$, the divisibility condition $p_i \mid \sigma_{i+1}p_{i+1}+\dots+\sigma_{i'-1}p_{i'-1}$ may hold only for a proportion $1/H_0$ of all tuples $(p_{i+1},\dots,p_{i'-1})$.

So far, so good, except that it is not enough to save one factor of $H_0$, and indeed we should save a factor of at least M, which is roughly in the scale of H, not $H_0$. Obviously, for L → ∞ to hold, we need $H_0 = H^{o(1)}$, and so we need to save more than any constant number of factors of $H_0$.

We have seen three rather different cases. In general, we would like to have a division of all walks into three classes:

1. walks containing enough non-repeated primes $p_i$ that their contribution is what one would expect from the hoped-for limiting distribution;
2. rare walks, such as, for example, trivial walks;
3. walks for which there are many independent conditions of the form

Some initial thoughts on the third case. We should think a little about what we mean or should mean by "independent". It is clear that, if we have several conditions $p\mid L_j(p_1,\dots,p_{2k})$, where the $L_j$ are linear forms spanning a space of dimension D, then, in effect, we have only D distinct conditions. It is also clear that, while having several primes $p_i$ divide the same quantity $L(p_1,\dots,p_{2k})$ ought to give us more information than just knowing one prime divides it, that is true only up to a point: if $L(p_1,\dots,p_{2k}) = 0$ (something that we expect to happen about $1/(\sqrt{k}H)$ of the time), then every condition of the form $p_i\mid L(p_1,\dots,p_{2k})$ holds trivially.

It is also the case that we should be careful about which primes do the dividing. Say two indices i, i' are equivalent if $p_i = p_{i'}$. Choose your equivalence relation ∼, and paint the indices i in some equivalence classes blue, while painting the indices i in the other equivalence classes red. It is not hard to show, using a little geometry of numbers, that, if $p_{i_j}\mid L_j(p_1,\dots,p_{2k})$ for some blue indices $i_j$ and linear forms $L_j$, j ∈ J, and the space spanned by the forms $L_j$, considered as formal linear combinations on the variables $x_i$ for i red, has dimension D, we can gain a factor of at least $H_0^D$ or so: the primes $p_i$ for i red have to lie in a lattice of codimension D and index ≥ $H_0^D$. A priori, however, it is not clear which primes we should color blue and which ones red.

We have, at any rate, arrived at what may be called the core of the problem: how to classify our walks in three classes as above, and how to estimate their contribution accordingly.

It is now time to step back and take a fresh look at the problem. Matters will become clearer and simpler, but, as we will see, the core of the problem will remain.

We have been talking about walks. Now, walks are taken in a graph. Thinking about it for a moment, we see that we have been considering walks in the graph Γ having V = N as its set of vertices and E = {{n, n + p} : n, n + p ∈ N, p ∈ P, p|n} as its set of edges. (In other words, we draw an edge between n and n + p if and only if p divides n.) We also considered random walks in what we called the "naïve model"; those are walks in the weighted graph Γ' having N as its set of vertices and an edge of weight 1/p between any n, n + p ∈ N with p ∈ P, regardless of whether p|n.

3.1. Adjacency, eigenvalues and expansion. Questions about walks in a graph Γ are closely tied to the adjacency operator $\mathrm{Ad}_\Gamma$. This is a linear operator on functions f : V → C, defined by

$$(\mathrm{Ad}_\Gamma f)(v) = \sum_{w:\ \{v,w\}\in E} f(w).$$

In other words, $\mathrm{Ad}_\Gamma$ replaces the value of f at a vertex v by the sum of its values f(w) at the neighbors w of v. The connection with walks is not hard to see: for instance, it is very easy to show that, if $1_v : V \to C$ is the function taking the value 1 at v and 0 elsewhere, then, for any w ∈ V and any k ≥ 0, $((\mathrm{Ad}_\Gamma)^k 1_v)(w)$ is the number of walks of length k from v to w.
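The link between powers of the adjacency operator and walk counts can be seen on a toy instance of Γ. This is a minimal sketch, assuming the vertex set is the integers in (N, 2N] and that P is passed as a short list of primes; the function names are ours.

```python
def build_divisor_graph(N, primes):
    """Adjacency lists of the graph on (N, 2N] with an edge {n, n + p}
    whenever p is in primes and p divides n."""
    vertices = range(N + 1, 2 * N + 1)
    neighbors = {n: [] for n in vertices}
    for n in vertices:
        for p in primes:
            if n % p == 0 and n + p <= 2 * N:
                # Each edge is recorded once, from its lower endpoint.
                neighbors[n].append(n + p)
                neighbors[n + p].append(n)
    return neighbors


def apply_adjacency(neighbors, f):
    """(Ad f)(v) = sum of f(w) over the neighbors w of v."""
    return {v: sum(f[w] for w in neighbors[v]) for v in neighbors}


def count_walks(neighbors, start, k):
    """Return Ad^k applied to the indicator of start; entry w counts walks of length k from start to w."""
    f = {v: 0 for v in neighbors}
    f[start] = 1
    for _ in range(k):
        f = apply_adjacency(neighbors, f)
    return f
```

For example, with N = 30 and primes 2, 3, 5, the vertex 36 has neighbors 33, 34, 38, 39, and the number of closed walks of length 2 from 36 equals its degree.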

The connection between Ad Γ and our problem is very direct, in that it can be stated without reference to random walks. We want to show that n∈N σ=±1 p∈P:p|n n+σp∈N

That is exactly the same as showing that

where ⟨·, ·⟩ is the inner product defined by

The behavior of random walks on a graph (in particular, the limit distribution of their endpoints) is closely related to the notion of expansion. A regular graph Γ (that is, a graph where every vertex has the same degree d) is said to be an expander graph with parameter ε > 0 if, for every eigenvalue γ of $\mathrm{Ad}_\Gamma$ corresponding to an eigenfunction orthogonal to constant functions, |γ| ≤ (1 − ε)d.

(A few basic remarks may be in order. Since Γ is regular of degree d, a constant function on V is automatically an eigenfunction with eigenvalue d. Now, Ad Γ is a symmetric operator, and thus it has full real spectrum: the space of all functions V → C is spanned by a set of eigenfunctions of Ad Γ , all orthogonal to each other; the corresponding eigenvalues are all real, and it is easy to see that all of them are at most d in absolute value.)
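These basic remarks can be illustrated on the simplest regular graph, a cycle on n vertices (degree d = 2). The cycle is our own choice of example and does not appear in the discussion above; its eigenfunctions are the cosines $v \mapsto \cos(2\pi jv/n)$, with eigenvalues $2\cos(2\pi j/n)$, a standard fact that the sketch below checks numerically.

```python
import math


def cycle_adjacency_apply(f):
    """Apply the adjacency operator of the n-cycle to f, given as a list of length n."""
    n = len(f)
    return [f[(v - 1) % n] + f[(v + 1) % n] for v in range(n)]


def cycle_eigenpair(n, j):
    """Return (gamma, f), where f(v) = cos(2 pi j v / n) and gamma = 2 cos(2 pi j / n)."""
    gamma = 2 * math.cos(2 * math.pi * j / n)
    f = [math.cos(2 * math.pi * j * v / n) for v in range(n)]
    return gamma, f
```

For j = 0 one recovers the constant function with eigenvalue d = 2, and every eigenvalue is at most d in absolute value, as stated in the remarks.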

It is clear that we need something both stronger and weaker than expansion.

(We cannot use the definition of expansion above "as is" anyhow, in that our graph Γ is not regular; its average degree is L .) We need a stronger bound than what expansion provides: we want to show, not just that | λ,

There is nothing unrealistically strong here: in the strongest kind of expander graph (Ramanujan graphs), the absolute value of every eigenvalue is at most $2\sqrt{d-1}$.

At the same time, we cannot ask for $\langle f, \mathrm{Ad}_\Gamma f\rangle/|f|_2^2 = o(L)$ to hold for every f orthogonal to constant functions. Take $f = 1_{I_1} - 1_{I_2}$, where $I_1$, $I_2$ are two disjoint intervals of the same length ≥ 100H, say. Then f is orthogonal to constant functions, but $(\mathrm{Ad}_\Gamma f)(n)$ is equal to ω(n)f(n), except possibly for those n that lie at a distance ≤ H of the edges of $I_1$ and $I_2$. Hence, $\langle f, \mathrm{Ad}_\Gamma f\rangle/|f|_2^2$ will be close to L. It follows that $\mathrm{Ad}_\Gamma$ will have at least one eigenfunction orthogonal to constant functions and with eigenvalue close to L; in fact, it will have many.

(This observation is related to the fact that the endpoint of a short random walk on Γ cannot be approximately equidistributed, as it is in an expander graph: the edges of Γ are too short for that. The most we could hope for is what we were aiming for, namely, that the distribution of the endpoint converges to a nice distribution, centered at the starting point.)

We could aim to show that $\langle f, \mathrm{Ad}_\Gamma f\rangle/|f|_2^2$ is small whenever f is approximately orthogonal to approximately locally constant functions, say. Since the main result in [MR16] can be interpreted as the statement that λ is approximately orthogonal to such functions, we would then obtain what we wanted to prove for f = λ.

We will find it cleaner to proceed slightly differently. Recall our weighted graph Γ', which was meant as a naïve model for Γ. It has an adjacency operator $\mathrm{Ad}_{\Gamma'}$ as well, defined as before. (Since Γ' has weights 1/p on its edges, $(\mathrm{Ad}_{\Gamma'}f)(n) = \sum_{p\in P}(f(n+p)+f(n-p))/p$.) It is not hard to show, using the techniques in [MR16], that

(In fact, what amounts to this statement has already been shown, in [Tao16, Lemma 3.4-3.5]; the main ingredient is [MRT15, Thm. 1.3], which applies and generalizes the main theorem in [MR16]. Their bound is a fair deal smaller than o(L).) We define the operator $A = \mathrm{Ad}_\Gamma - \mathrm{Ad}_{\Gamma'}$.

It will then be enough to show that

as then it will obviously follow that

It would be natural to guess, and try to prove, that f,

We cannot hope for quite that much. The reason is simple. For any vertex n, $\langle A1_n, A1_n\rangle$ equals the sum of the squares of the weights of the edges {n, n'} containing n. That sum equals $\sum_{p\in P,\ p\mid n}\cdots$

which in turn is greater than 1/4 times the number $\omega_P(n)$ of divisors of n in P. Thus, A has at least one eigenvalue greater than $\omega_P(n)/2$. Now, typically, n has about L divisors in P, but some integers n have many more; for some rare n, in fact, $\omega_P(n)$ will be greater than $L^2$, and so there have to be eigenvalues of A greater than L.

It is thus clear that we will have to exclude some integers, i.e., we will define our vertex set to be some subset X ⊂ N with small complement. We will set ourselves the goal of proving that all of the eigenvalues of the operator A| X defined by

(Here $f|_X$ is just the function taking the value f(n) for n ∈ X and 0 for n ∉ X.) Then, for f = λ, or for any other f with $|f|_\infty \le 1$,

where, if N \ X is small enough (as it will be), it will not be hard to show that the sum within O(·) is quite small. We will then be done: obviously $\langle f, (A|_X)f\rangle$ is bounded by the largest eigenvalue of $A|_X$ times $|f|_2$ (which is ≤ $|f|_\infty$ ≤ 1), and so we will indeed have $\langle f, Af\rangle = o(L)$. We will in fact be able to prove something stronger: there is a subset X ⊂ N with small complement such that all eigenvalues of $A|_X$ are

(This bound is optimal up to a constant factor.) This is our main theorem.

We hence obtain that

From (3.1), we deduce the bound

we stated at the beginning.

More generally, we get $\langle f, Af\rangle = O(\sqrt{L})$ for any f with $|f|_\infty \le 1$, or for that matter for any f with $|f|_4 \le e^{100L}$ and $|f|_2 \le 1$. We obtain plenty of consequences besides (3.2).

3.2. Powers, eigenvalues and closed walks. Now that we know what we want to prove, let us come up with a strategy.

There is a completely standard route towards bounds on eigenvalues of operators such as A (or $A|_X$), relying on the fact that the trace is invariant under conjugation. Because of this invariance, the trace of a power $A^{2k}$ is the same whether A is written taking a full family of orthogonal eigenvectors as a basis, or just taking the characteristic functions $1_n$ as our basis. Looking at matters the first way, we see that

$$\operatorname{Tr} A^{2k} = \sum_{i=1}^{N} \lambda_i^{2k},$$

where $\lambda_1, \lambda_2, \dots, \lambda_N$ are the eigenvalues corresponding to the basis made out of eigenvectors. Looking at matters the second way, we see that $\operatorname{Tr}(A|_X)^{2k} = N_{2k}$, where $N_{2k}$ is the sum over all closed walks of length 2k of the products of the weights of the edges in each walk:

where we adopt the convention $1_{\text{true}} = 1$, $1_{\text{false}} = 0$.
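The identity between the trace of a power and a weighted count of closed walks can be verified exactly on a small example. The sketch below makes several assumptions of ours: it takes X to be a short interval of integers (no vertices are excluded), gives the edge between n and n ± p the weight $1_{p\mid n} - 1/p$, and uses exact rational arithmetic; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def edge_weight(n, p):
    """Weight 1_{p | n} - 1/p of an edge of length p at the vertex n."""
    return Fraction(1 if n % p == 0 else 0) - Fraction(1, p)


def operator_matrix(vertices, primes):
    """Matrix of A restricted to the given list of vertices."""
    index = {n: i for i, n in enumerate(vertices)}
    size = len(vertices)
    matrix = [[Fraction(0)] * size for _ in range(size)]
    for n in vertices:
        for p in primes:
            for sign in (1, -1):
                m = n + sign * p
                if m in index:
                    matrix[index[n]][index[m]] = edge_weight(n, p)
    return matrix


def trace_of_power(matrix, exponent):
    """Trace of matrix**exponent, by repeated multiplication."""
    size = len(matrix)
    power = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(exponent):
        power = [[sum(power[i][t] * matrix[t][j] for t in range(size))
                  for j in range(size)] for i in range(size)]
    return sum(power[i][i] for i in range(size))


def closed_walk_sum(vertices, primes, length):
    """Sum, over closed walks of the given length, of the product of the edge weights."""
    vertex_set = set(vertices)
    steps = [(p, sign) for p in primes for sign in (1, -1)]
    total = Fraction(0)
    for start in vertices:
        for walk in product(steps, repeat=length):
            position, weight = start, Fraction(1)
            for p, sign in walk:
                target = position + sign * p
                if target not in vertex_set:
                    weight = Fraction(0)  # the walk leaves the vertex set
                    break
                weight *= edge_weight(position, p)
                position = target
            if position == start:
                total += weight
    return total
```

The weight is symmetric because p divides n exactly when it divides n ± p, so the matrix is symmetric and its eigenvalues are real, as used in the next step of the argument.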

Since all eigenvalues are real, it is clear that $\lambda_i^{2k} \le N_{2k}$ for every eigenvalue $\lambda_i$. Often, and also now, that inequality is not enough in itself for a good bound on $\lambda_i$. What is then often done is to show that every eigenvalue must have multiplicity ≥ M, where M is some large quantity. Then it follows that, for every eigenvalue γ, $M\gamma^{2k} \le N_{2k}$,

and so $|\gamma| \le (N_{2k}/M)^{1/2k}$. We do not quite have high multiplicity here (why would we?) but we have something that is almost as good: if there is one large eigenvalue, then there are many mutually orthogonal functions $g_i$ of norm 1 with $\langle g_i, Ag_i\rangle$ large. Then we can bound $\operatorname{Tr} A^{2k}$ from below, using these functions $g_i$ (and some arbitrary functions orthogonal to them) as our basis, and, since $\operatorname{Tr} A^{2k}$ also equals $N_{2k}$, we can hope to obtain a contradiction with an upper bound on $N_{2k}$.

For simplicity, let us start by sketching a proof that, if $|\langle f, Af\rangle|$ is large (≥ ρL, say) for some f with $|f|_\infty \le 1$, then there are many orthogonal functions $g_i$ of norm 1 with $\langle g_i, Ag_i\rangle$ large (with "large" meaning ≥ ρL/2, say). This weaker statement suffices for our original goal, since we may set f equal to the Liouville function λ.

Edges starting at a vertex v in I i end at another vertex in I i , unless they are close to the edge.

To prove truly that A has no large eigenvalues, we should proceed as we just did, but assuming only that |f | 2 ≤ 1, not that |f | ∞ ≤ 1. The basic idea is the same, except that (a) pigeonholing is a little more delicate, (b) if f is almost entirely concentrated in a small subset of N, then we can extract only a few mutually orthogonal functions g i from it. Recall that we are anyhow restricting to a set X ⊂ N. A brief argument suffices to show that we can avoid the problem posed by (b) simply by making X a little smaller (essentially: deleting the support of such g i , and then running through the entire procedure again), while keeping its complement N \ X very small.

In any event: we obtain that, if, for some X ⊂ N, $\operatorname{Tr}(A|_X)^{2k}$ is not too large (smaller than $(\rho L/2)^{2k}N/H$ or so) then there is a subset X' ⊂ X with X \ X' small such that every eigenvalue of $A|_{X'}$ is small (≤ ρL). It thus remains to prove that $\operatorname{Tr}(A|_X)^{2k}$ is small for some X ⊂ N with small complement N \ X.

Recall that Tr(A| X ) 2k = N 2k (with N 2k defined as above, except with X instead of X ) and that X should not include integers n with many more prime divisors in P than average. Our task is to bound N 2k .

We have come full circle, or rather we have arrived twice at the same place. We started with a somewhat naïve approach that led us to random walks. Then we took a step back and analyzed the situation in a way that turned out to be cleaner; for instance, the problem involving $\mathrm{err}_2^{1/k}$ vanished. As it happens, that cleaner approach took us to random walks again. Surely this is a good sign.

It is also encouraging to see signs that other people have thought in the same direction. The paper by Matomäki-Radziwiłł-Tao on sign patterns of λ and µ is based on the examination of a graph equivalent to Γ; what they show is, in essence, that Γ is almost everywhere locally connected. Being connected may be a much weaker property than expansion, but it is a step in the same direction. As for expansion itself, Tao ( §4) comments that "some sort of expander graph property" may hold for that graph (equivalent to Γ) "or [for] some closely related graph". He goes on to say:

"Unfortunately we were unable to establish such an expansion property, as the edges in the graph [. . . ] do not seem to be either random enough or structured enough for standard methods of establishing expansion to work."

And so we will set about to establish expansion by our methods (standard or not).

In any event, our initial discussion of random walks is still pertinent. Recall the plan with which we concluded, namely, to divide walks into three kinds: walks with few non-repeated primes, walks imposing many independent divisibility conditions, and rare walks. This plan will shape our approach to bounding N 2k in the next section.

Let us recapitulate. Let N = {n ∈ Z : N < n ≤ 2N}. We have defined a linear operator A on functions f : N → C as the difference of the adjacency operators of two graphs Γ, Γ':

We would like to show that there is a subset X ⊂ N with small complement N \ X such that, for some k that is not too small, the trace

is substantially smaller than $L^{2k}N$. Indeed, we will prove that there is a constant C such that

Incidentally, when we say "k not too small", we mean "k is larger than log H or so"; we already saw that we stand to lose a factor of $H^{1/k}$ when going from (a) a trace bound as above to (b) a bound on eigenvalues, which is our ultimate goal. If $k \gg \log H$, then $H^{1/k}$ is just a constant.

For comparison: if, as will be the case, we define X so that every n ∈ X has at most KL prime factors, the trivial bound is

We also saw that Tr(A| X ) 2k can be expressed as a sum over closed walks, i.e., walks that end where they start:

Here the double sum just goes over closed walks of length 2k in the weighted graph Γ − Γ', which has X as its set of vertices and an edge between any two vertices n, n' whose difference n − n' is a prime p in our set of primes P; the weight of the edge is then 1 − 1/p if p|n, and −1/p otherwise. The contribution of a walk equals the product of the weights of its edges.

Cancellation. It might be nicer to work with an expression with yet simpler weights. First, though, let us see what gains we can get from cancellation. Let $p_1, \dots, p_{2k} \in P$ and $\sigma_1, \dots, \sigma_{2k} \in \{-1, 1\}$ be given, and consider the total contribution of the paths they describe as n varies in X. Say there is a $p_i$ that appears only once, i.e., $p_j \ne p_i$ for all $j \ne i$. The weight of the edge from $n_{i-1} = n+\sigma_1p_1+\dots+\sigma_{i-1}p_{i-1}$ to $n_i = n+\sigma_1p_1+\dots+\sigma_ip_i$ is $1 - 1/p_i$ if $p_i \mid n_{i-1}$ and $-1/p_i$ otherwise. The weights of all the other edges depend only on the congruence classes n mod $p_j$ for all $j \ne i$.

Suppose for a moment that X = N. Then, for p, σ fixed, and n in a given congruence class n mod $p_j$ for every $j \ne i$ (that is, n in a given congruence class a + PZ for $P = \prod_{p\in\{p_1,\dots,p_{i-1},p_{i+1},\dots,p_{2k}\}} p$, by the Chinese remainder theorem), the probability that $p_i$ divides $n_{i-1}$ is almost exactly $1/p_i$: the number of n in N in our congruence class mod P is N/P + O*(1) (that is, no less than N/P − 1 and no more than N/P + 1), and, for such n, again by the Chinese remainder theorem, $p_i \mid n_{i-1}$ if and only if n lies in a certain congruence class modulo $p_i \cdot P$; the number of n in N in that congruence class is $N/(p_iP) + O^*(1)$.

Hence, among all n in N ∩ (a + P Z), a proportion almost exactly 1/p have a weight 1 − 1/p on the edge from n i−1 to n i , and a proportion almost exactly 1 − 1/p have a weight −1/p there instead. Since all other weights are fixed, we obtain practically total cancellation:
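The near-total cancellation can be observed numerically. In the sketch below (names and parameters ours), we sum the weight of the single edge labelled by the once-appearing prime p over all n in (N, 2N] lying in a fixed class modulo a modulus coprime to p; the fixed weights of the other edges are left out, since they would only multiply the sum by a constant. The parameter `offset` stands for the displacement $\sigma_1p_1+\dots+\sigma_{i-1}p_{i-1}$.

```python
from fractions import Fraction


def singleton_weight_sum(N, p, offset, modulus, residue):
    """Sum of (1_{p | n + offset} - 1/p) over n in (N, 2N] with n = residue mod modulus.

    Assumes gcd(p, modulus) = 1, as holds when p appears only once in the walk."""
    total = Fraction(0)
    for n in range(N + 1, 2 * N + 1):
        if n % modulus == residue:
            indicator = 1 if (n + offset) % p == 0 else 0
            total += indicator - Fraction(1, p)
    return total
```

There are about N/modulus terms, each of size up to 1, yet the counting argument in the text shows that the sum stays below 1 + 1/p in absolute value.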

In other words, the contribution of paths where at least one p i appears only once is practically nil. Hence, we can assume that, in our paths, every p i appears at least twice among p 1 , p 2 , . . . , p 2k .

Of course we do not actually want to set X = N, and in fact we cannot, as we have already seen. If X is well-distributed in arithmetic progressions, then we should still get cancellation, but it will not be total: there will be an error term. Much of the pain here comes from the fact that we have to exclude numbers with too many prime factors (meaning: > KL prime factors). Suppose for simplicity that X is the set of all numbers in N with ≤ KL prime factors. Recall that all vertices $n$, $n_1 = n+\sigma_1p_1$, $n_2 = n+\sigma_1p_1+\sigma_2p_2$, ... have to be in X; in particular, $n_{i-1} \in X$. As a consequence, the likelihood that $p \mid n_{i-1}$ is slightly lower than 1/p: if $n_{i-1} = pm$, then m is constrained to have ≤ KL − 1 prime factors, and it is slightly more difficult for m to satisfy that constraint than it is for an n ∈ N to have ≤ KL prime factors. We do have cancellation, but it is not total, as it is for X = N. The techniques involved in estimating how much cancellation we do have are standard within analytic number theory.

Later, we will also exclude some other integers from X, besides those having > KL prime factors. We will then need to show that the effect on cancellation is minor. Doing so will require some arguably new techniques; we will cross that bridge when we come to it.

To cut a long story short, the effect of cancellation will be, not that every $p_i$ appears at least twice among $p_1, p_2, \dots, p_{2k}$, but that the number of "singletons" (primes that appear only once) is small. More precisely, a path with m singletons will have to pay a penalty of a factor of $L^{-m/2}$.

Let us see what we have. Write k = {1, 2, . . . , 2k}. Let l range among all subsets of k . Here l will be our set of "lit" indices, corresponding to the set of indices i such that 

coming from p i |n + σ 1 p 1 + . . . + σ i p i and p i = p j |n + σ 1 p 1 + . . . + σ j p j . Let us write β i as shorthand for σ 1 p 1 + . . . + σ i p i ; then our condition becomes

Given a walk $n, n+\sigma_1p_1, n+\sigma_1p_1+\sigma_2p_2, \dots$, we define its shape to be (∼, σ), where ∼ is the equivalence relation it induces (as above). In fact, let us start with shapes, meaning pairs (∼, σ), where ∼ is an equivalence relation on {1, 2, ..., 2k} and $\sigma \in \{-1, 1\}^{2k}$. For any given shape, we will bound the contribution of all walks of that shape. There will be some shapes for which we will not be successful; we will later treat walks of those shapes, and show that their contribution is small in some other way.

To rephrase what we said just before: given l ⊂ k, the contribution of a shape (∼, σ) will be at most

where Π is the set of equivalence classes of ∼ and S(∼) is the set of singletons of ∼, where a "singleton" is an equivalence class with exactly one element. We write |S| for the number of elements of a set S.

What we have to do then is, in essence, bound the number of solutions (p [i] ) [i]∈Π to a system of divisibility conditions

It would be convenient if the divisors p [i] were all distinct from the primes in the sums being divided. Then we could apply directly the following Lemma, which is really grade-school-level geometry of numbers. 

The trivial bound is clearly $\prod_{i=1}^{m} (N_i + 1)$.

Let r be the dimension of the space spanned by the differences

Then we can select r divisibility relations of the form (4.2) and r red equivalence classes such that the matrix consisting of a row ( j∈[j] σ j ) [j] red for each equivalence relation is non-singular. (We are just saying that a matrix of rank r has a non-singular r-by-r submatrix.) We can then apply our lemma.

After some book-keeping, we obtain a bound on our sum from (4.1), namely,

Here the important factor is $1/H_0^r$. We see that we "win" if r is at least somewhat large. The question is then how to choose which equivalence classes to color red or blue so as to make the rank r large.

To address this question, let us define a new graph. First, though, let us define the reduction of a shape (∼, σ). A shape clearly induces a word

. This word can be reduced (if it isn't already), and the resulting reduced word induces a "reduced shape" (∼', σ'). If all representatives of an equivalence class of (∼, σ) disappear during the reduction, we color that class yellow. It is the non-yellow classes that we will color red or blue.

We define a graph G (∼,σ) to be an undirected graph having the non-yellow equivalence classes as its vertices, and an edge between two vertices v 1 , v 2 if there are i 1 ∈ v 1 , i 2 ∈ v 2 such that every equivalence class containing at least one index j ∈ {i 1 + 1, i 1 + 2, . . . , i 2 − 1} is yellow.

(We define matters in this way, rather than simply reduce the word and join two vertices $v_1$, $v_2$ if there are $i_1 \in v_1$, $i_2 \in v_2$ such that $i_2 = i_1 + 1$, because reducing the word could create more singletons. At any rate, the idea is that, if there are only yellow indices between $i_1$ and $i_2$, then v(

Given a subset V' of the set of vertices V of a graph G, write G|V' for the restriction of G to V', i.e., the subgraph of G having V' as its set of vertices and the set of all edges in G between elements of V' as its set of edges. What happens if we choose our coloring so that the restriction G|blue to the set of blue vertices (named blue) is connected?

Lemma 2. Let (∼, σ) be a shape, and let G = G (∼, σ) and v(i) be as above.

Color some non-yellow vertices red and some other non-yellow vertices blue, in such a way that, for blue the set of blue vertices, the restriction G | blue is connected.

Then the space V spanned by the vectors

The proof is an exercise, and its idea may be best made clear by an example.

Sketch of proof (or rather, a worked example). Say we have three blue equivalence classes, corresponding to letters x, y, z in the induced word, and let them be disposed as follows:

x z yx zy

Call the indices of the six letters we have written i 1 , i 2 , . . . , i 6 . Then the space V in the Lemma is the space spanned by

where the space W in the Lemma is the space spanned by all vectors

It is clear that V ⊂ W, but why is W ⊂ V? Why, say, is $v_{i_2} - v_{i_1}$ in V? Well, let us follow a path in G|blue going from x (the first blue letter, i.e., the letter at position $i_1$) to z (the second blue letter, i.e., the letter at position $i_2$): there is an edge from (the equivalence class labelled) x to (the equivalence class labelled) y, and an edge from y to z. So:

• $v_{i_4} - v_{i_1}$ is in V because $i_1$ and $i_4$ are both in the equivalence class x,

• $v_{i_4}$ equals $v_{i_3}$ because $i_4$ and $i_3$ are adjacent (meaning there cannot be red indices between them; note that there is an edge from x to y precisely because $i_4$ and $i_3$ are adjacent),

• $v_{i_6} - v_{i_3}$ is in V because $i_6$ and $i_3$ are both in y,

• $v_{i_5}$ equals $v_{i_6}$ because $i_6$ and $i_5$ are adjacent, as is again reflected in the fact that there is an edge from y to z,

• $v_{i_2} - v_{i_5}$ is in V because $i_2$ and $i_5$ are both in z.

Hence, v i 2 − v i 1 ∈ V , as we have shown by following a path in G | blue from x to z. The same argument works in general for any two indices in blue equivalence classes.

Now we have to bound the rank of W from below. We first reduce our word; yellow letters disappear. The most optimistic expectation would be that the rank of W equal the number of gaps between blue "chunks" indicated by braces in our example from before:

x z yx zy (Chunks may have merged during reduction.) The number of gaps here is 4, considered cyclically (so that the first and last gap become one). The gaps

Imagine for a moment that each red letter appeared only once. (From now on, all letters that are neither blue nor yellow will be colored red. In our reduced word, all letters that are not blue are red.) Then the optimistic expectation would hold:

would be a non-trivial formal linear combination of a non-zero number of symbols x [j] , each appearing only once altogether, and so those combinations must all be linearly independent.

Of course, we cannot ensure that each red letter will appear only once, and in fact we are usually treating cases where most of them appear at least twice (i.e., singletons are rare). Let us see what we can do with a weaker assumption. What if we assume that each red letter appears at most κ times?

Let us see an easy linear-algebra lemma.

Lemma 3. Let A be a matrix with n rows such that

• every row has at least one non-zero entry,

• no column has more than κ non-zero entries.

Then the rank of A is ≥ n/κ.

Proof. We will construct a finite list S of columns, starting with the empty list. At each step, if there is a row i such that the ith entry of every column in S is 0, include at the end of S a column whose ith entry is non-zero. Stop if there is no such row.

When we stop, we must have κ · |S| ≥ n, as otherwise there would still be a row in which no element of S would have a non-zero entry. Since, for each column in S, there is a row in which that column has a non-zero entry and no previous column in S does, we see that the columns in S are linearly independent. Hence rank(A) ≥ |S| ≥ n/κ.
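The greedy selection in the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `greedy_columns` and `rank`, and the small test matrix `A`, are hypothetical and do not come from the paper. The code builds the list S of columns as in the proof and compares |S| with the rank and with n/κ.

```python
from fractions import Fraction

def greedy_columns(A):
    """Build the list S from the proof: while some row is zero on every
    chosen column, append a column that is non-zero in that row."""
    n_rows, n_cols = len(A), len(A[0])
    chosen, covered = [], [False] * n_rows
    for i in range(n_rows):
        if covered[i]:
            continue  # some chosen column is already non-zero in row i
        j = next(c for c in range(n_cols) if A[i][c] != 0)
        chosen.append(j)
        for r in range(n_rows):
            if A[r][j] != 0:
                covered[r] = True
    return chosen

def rank(A):
    """Rank over the rationals, by Gaussian elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# Hypothetical example: 4 rows, every row non-zero.
A = [[1, 0, 0],
     [1, 1, 0],
     [0, -1, 0],
     [0, 0, 2]]
# kappa = largest number of non-zero entries in a column
kappa = max(sum(1 for row in A if row[c] != 0) for c in range(len(A[0])))
S = greedy_columns(A)
print("S =", S, " rank =", rank(A), " n/kappa =", len(A) / kappa)
```

Each chosen column is non-zero in a row where all earlier chosen columns vanish, which is exactly the triangular structure that gives linear independence in the proof.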

Now we see what to do: let A be a matrix with columns corresponding to red equivalence classes, and rows corresponding to gaps between blue chunks, with the entry $a_{i,[j]}$ being the number of times the red letter corresponding to column [j] appears in the gap corresponding to row i (counting appearances as $x_{[j]}^{-1}$ as negative appearances). Each column has no more than κ non-zero entries because, by assumption, each red equivalence class contains at most κ elements. We still need to show that no or few rows are full of zeros.

A row is full of zeros iff every letter x in the corresponding gap appears an equal number of times as x and as x −1 (i.e., every equivalence class has as many representatives i with σ i = 1 as with σ i = −1 within the gap). Let us call such gaps invalid.
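As a small sketch of the matrix just described, the following Python builds A from a reduced word and a choice of blue letters, reading gaps cyclically. The representation of a word as a list of (letter, exponent) pairs, the function name `gap_matrix`, and the sample word are assumptions made for illustration; the word imitates the blue chunks x, z, yx, zy of the earlier example with hypothetical red letters a, b, c, d inserted.

```python
def gap_matrix(word, blue):
    """word: reduced word as a list of (letter, exponent), exponent in {+1, -1}.
    Returns the sorted red letters and the matrix whose rows are the gaps
    between blue chunks (taken cyclically) and whose columns are red letters."""
    # Rotate so that the word starts at a blue letter; then the gap that wraps
    # around the end of the word is read as a single run.
    first_blue = next(i for i, (x, _) in enumerate(word) if x in blue)
    rotated = word[first_blue:] + word[:first_blue]
    gaps, current = [], []
    for x, e in rotated:
        if x in blue:
            if current:
                gaps.append(current)
                current = []
        else:
            current.append((x, e))
    if current:
        gaps.append(current)
    red = sorted({x for x, _ in word if x not in blue})
    # Signed count: appearances as x^{-1} count negatively.
    A = [[sum(e for x, e in gap if x == r) for r in red] for gap in gaps]
    return red, A

# a x b z c y x b^{-1} z y d, with blue letters x, y, z
word = [("a", 1), ("x", 1), ("b", 1), ("z", 1), ("c", 1), ("y", 1),
        ("x", 1), ("b", -1), ("z", 1), ("y", 1), ("d", 1)]
red, A = gap_matrix(word, blue={"x", "y", "z"})
invalid = [i for i, row in enumerate(A) if all(v == 0 for v in row)]
print(red, A, invalid)
```

Here the letter b appears in two gaps, so its column has two non-zero entries, and no gap is invalid.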

There is a condition that limits how many such gaps there can be while at the same time ensuring that an equivalence class contains at most κ elements (or not quite, but something that is as good). Let (∼', σ') (of length 2k') be the reduction of the shape (∼, σ). Let us say that we see a revenant when there are indices i, i' such that (a) i ∼ i', and (b) there is a j ≁ i with i < j < i'. (In other words, $x_{[i]}$ has come back after going away.) We say that there are κ disjoint revenants if there are

with $i_j \sim i'_j$ and $i_j \not\sim \imath_j$ for 1 ≤ j ≤ 2k. Thus, for example, in

Let us impose the condition that there cannot be more than κ disjoint revenants in our walk. (We will be able to assume this condition by rigging the definition of X later.) Then it follows immediately that the appearances of a letter form at most κ contiguous blocks in the reduced word. Hence, a red letter cannot appear in more than κ gaps. We also see that there cannot be more than κ invalid gaps: a gap is a non-empty reduced subword, and, if a letter x appears in a reduced, non-trivial word as many times as x and as $x^{-1}$, either the pattern $x \dots x^{-1}$ or the pattern $x^{-1} \dots x$ appears in the word, with ". . ." standing for a non-empty subword consisting of letters that are not x. Thus, > κ invalid gaps would give us > κ revenants, all disjoint. It then follows, by the easy linear-algebra lemma above, that
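To make the notion of disjoint revenants concrete, here is a hedged Python sketch that counts a maximal family of them in a sequence of class labels. It assumes that "disjoint" allows the right endpoint of one revenant to serve as the left endpoint of the next; that convention, like the function name `max_disjoint_revenants`, is an assumption made for illustration, since the defining chain of inequalities is not reproduced above. The scan is the usual earliest-endpoint greedy for disjoint intervals.

```python
def max_disjoint_revenants(labels):
    """labels[t] is the equivalence class of the t-th index of the word.
    A revenant is a pair i < i' in the same class with some index of a
    different class strictly in between."""
    count = 0
    seen = set()   # classes seen since the last reset
    gone = set()   # seen classes that were later followed by a different class
    for c in labels:
        if c in gone:
            # c went away and has now come back: close a revenant here,
            # and start afresh from this index.
            count += 1
            seen, gone = {c}, set()
        else:
            gone |= seen - {c}
            seen.add(c)
    return count

print(max_disjoint_revenants("xyx"))
print(max_disjoint_revenants("xxy"))
print(max_disjoint_revenants("xyxyx"))
```

A letter whose appearances form several separate blocks produces a revenant at every return, which is why a bound on disjoint revenants bounds the number of blocks.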

where s is the number of gaps.

The question is then: how do you choose which letters to color blue and which to color red so that the number s of gaps is large? contains an index i such that i and j are separated only by yellow letters; since there are no yellow letters, that means that i = j + 1 or i = j − 1 (or one of i, j is 1 and the other one is 2k)).

The question is then how to choose the set blue of equivalence classes to be colored blue in such a way that ∂blue is large. Here blue can be any set of vertices such that G | blue is connected. So, in general: given a connected undirected graph G , how do we choose a set blue of vertices so that G | blue is connected and ∂blue is large?

Recall that a spanning tree of a graph G is a subgraph of G that is a tree (i.e., is connected and has no cycles) and has the same set of vertices V as G. Given a spanning tree of G, we can define blue to be the set of internal nodes of the tree, that is, the set of vertices that are not leaves. Then G|blue is connected, and ∂blue equals the set of leaves. The question is then: is there a spanning tree of G with many leaves?

Here there is a result from graph theory that we can just buy off the shelf.

Proposition 1 (Kleitman-West, 1991; see also Storer, 1981, Payan-Tchuente-Xuong, 1984, and Griggs-Kleitman-Shastri, 1989). Let G be a connected graph with n vertices, all of degree ≥ 3. Then G has a spanning tree with ≥ n/4 + 2 leaves.

Using this Proposition, we prove:

Corollary 1. Let G be a connected graph such that ≥ n of its vertices have degree ≥ 3. Then G has a spanning tree with ≥ n/4 + 2 leaves.

We omit the proof of the corollary, as it consists of less than a page of casework and standard tricks. Alternatively, we can prove it from scratch in about a page by modifying Kleitman and West's proof.

(It is clear that some condition on the degrees, as here, is necessary: a spanning tree of a cyclic graph (every one of whose vertices has degree 2) is a path, and so has only two leaves, however many vertices the graph has.)
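For very small graphs one can check the bound of Proposition 1 by brute force. The sketch below is not the method of Kleitman and West; it simply enumerates all spanning trees. The function name `max_leaves` and the test graphs (the complete graph on 4 vertices, the 3-dimensional cube, a 6-cycle) are choices made here for illustration.

```python
from itertools import combinations

def is_forest(n, subset):
    """Union-find check that the given edges contain no cycle."""
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for a, b in subset:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True

def max_leaves(n, edges):
    """Largest number of leaves of a spanning tree (exhaustive search).
    An acyclic set of n - 1 edges on n vertices is a spanning tree."""
    best = 0
    for subset in combinations(edges, n - 1):
        if not is_forest(n, subset):
            continue
        degree = [0] * n
        for a, b in subset:
            degree[a] += 1
            degree[b] += 1
        best = max(best, sum(1 for d in degree if d == 1))
    return best

k4 = list(combinations(range(4), 2))
cube = [(a, a ^ (1 << b)) for a in range(8) for b in range(3) if a < a ^ (1 << b)]
cycle6 = [(i, (i + 1) % 6) for i in range(6)]

print(max_leaves(4, k4), max_leaves(8, cube), max_leaves(6, cycle6))
```

The first two graphs have all degrees equal to 3, so the bound n/4 + 2 applies to them; the cycle shows what happens when all degrees are 2.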

Before we go on to see what we do with shapes (∼, σ) such that $G_{(\sim,\sigma)}$ does not have many vertices with degree ≥ 3, let us remove the assumption that there are no yellow letters.

if there are representatives i ∈ [i], j ∈ [j] that survive in the reduced word, and such that all letters between i and j disappear during reduction. (If j < i, then "between" is to be understood cyclically, i.e., the letters between i and j are those coming after i or before j.) We draw each arrow only once, that is, we do not draw multiple arrows.

For instance, in our example

It is obvious that every vertex has an in-degree of at least 1.

For S a set of vertices, define the out-boundary ∂S to be the set of all vertices v not in S such that there is an arrow going from some element of S to v. Then, whether or not there are yellow letters, the number of red gaps in the reduced word is at least | ∂blue|.

Lemma 4. Let G be a directed graph such that every vertex has positive in-degree. Let S be a subset of the set of vertices of G. Then there is a subset S' ⊂ S with |S'| ≥ |S|/3 such that, for every v ∈ S', there is an arrow from some vertex not in S' to v.

Proof. The first step is to remove arrows until the in-degree of every vertex is exactly 1. Then G is a union of disjoint cycles. If all vertices in a cycle are contained in S, we number its vertices in order, starting at an arbitrary vertex, and include in S' the second, fourth, etc. elements. If no vertices in a cycle are in S, we ignore that cycle. If some but not all vertices in a cycle are in S, the vertices that are in S fall into disjoint subsets of the form $\{v_1, \dots, v_r\}$, where there is an arrow from some v not in S to $v_1$, and an arrow

We let S be the set of leaves of our spanning tree, and define red to be the set S' given by the Lemma; blue is the set of all other non-yellow equivalence classes. Then the number of gaps is ≥ (n/4 + 2)/3, where n is the number of vertices of degree ≥ 3 in $G_{(\sim,\sigma)}$. Hence, by our work up to now,

and so, if n is even modestly large, we win by a large margin: we obtain a factor nearly as small as $1/H_0^{n/(12\kappa)}$ in (4.4).

Note. Had we been a little more careful, we would have obtained a bound of dim(W ) ≥ n 50 log κ − κ 2 or so. This improvement (which involves drawing, and considering, multiple arrows) would affect mainly the allowable range of $H_0$ in the end. We will remind ourselves of the matter later.

4.5. Shapes with low freedom. Writer-reader arguments. The question now is what to do with walks of shapes (∼, σ) for which $G_{(\sim,\sigma)}$ does not have many vertices of degree ≥ 3.

Let us first give an argument that is sufficient when the word given by our walk is already reduced; we will later supplement it with an additional argument that takes care of the reduction. Let n ⊂ {1, 2, . . . , 2k} be the set of indices that survive the reduction. It is enough to define an equivalence relation ∼ on n to define the graph G ∼ = G ∼,σ we have been considering. (We do not need to specify σ, as its only role was to help determine which letters are yellow.) Assume that G ∼ has ≤ ν vertices of degree ≥ 3. Let κ be, as usual, an upper bound on the number of disjoint revenants; in particular, for any equivalence class [i] , there are at most κ elements i ∈ [i] such that the following element of n is not in [i] . We claim that the number of equivalence classes ∼ on n satisfying these two constraints (given by ν and κ) is

We will prove this bound by showing that we can determine an equivalence class of this kind by describing it by a string s on 5 letters with indices in n, together with some additional information at each of at most (κ − 1)ν + 2 indices. The idea is that, if an index lies in an equivalence class that is a vertex of degree 1 or 2 in G ∼ , then there are very few possibilities for the equivalence classes on which the index just thereafter may lie, namely, 1 or 2 possibilities.

We let the index i go through n from left to right. If [i] is in an equivalence class we have not seen before, we let s i = * . Assume otherwise. Let i − be the element of n immediately preceding i.

If i ∼ i−, we let $s_i = 0$. If [i−] is a vertex of degree ≤ 2 and i is in an equivalence class that we have already seen next to [i−] (that is, just before or just after [i−] in n), then we let $s_i = 1$ or $s_i = 2$ depending on which one of those ≤ 2 equivalence classes we mean (the first one or the second one to appear). In all remaining cases, we let $s_i = ·$, and specify our equivalence class explicitly, by giving an index j < i in the same equivalence class.

Let us give an example. Let k = 8, n = {1, 2, . . . , 2k}. Let our equivalence classes be

{1, 7, 15}, {2, 16}, {3, 4, 5, 11}, {6, 10, 12, 14}, {8}, {9, 13}.

Then $s_1 = s_2 = s_3 = s_6 = s_8 = s_9 = *$ and $s_4 = s_5 = 0$. The vertices of degree 3 are [1] and [6]; all other vertices are of degree 2. Hence, $s_{16} = ·$ (since 16 follows 15, which is in [1]) and $s_7 = s_{11} = s_{13} = s_{15} = ·$ (since these indices follow 6, 10, 12, 14, which are in [6]). Since 3 ∼ 4 ∼ 5, we let $s_4 = s_5 = 0$. It remains to consider i = 10, 12, 14. In the case i = 10, we see that [9] has degree 2, but, when we come to 10, we realize that no element of [10] has been seen next to an element of [9] before: 8 is next to 9, but 8 ∉ [10]. Hence, we let $s_{10} = ·$. In the case i = 12, we see that [12] has been seen next to [11] before (5 is next to 6), and that it was the second class to appear next to [11]; hence $s_{12} = 2$. The case i = 14 is analogous, and gives $s_{14} = 2$.

In summary, s = ***00*·**··2·2··, and, in addition to writing s, we specify the equivalence classes of the indices i with $s_i = ·$. A reader can now reconstruct our equivalence classes by reading s from left to right, given that additional information. (Try it!)

We should now count the number of dots ·, since that equals the number of times we have to give additional information. For a class [i'] that is a vertex of degree ≤ 2, it can happen at most once (that is, for at most one element i' of [i']) that $s_i \ne 0, 1, 2$ for the index i in n right after i', unless 1 ∈ [i'], in which case it can happen twice. (Someone who already has a neighbor and will end up with ≤ 2 neighbors in total can meet a new neighbor at most once.) For [i'] a vertex of arbitrary degree, it can happen at most κ times that $s_i \ne 0$. Hence, writing $n_{\le 2}$ for the number of vertices of degree ≤ 2 and $n_{\ge 3}$ for the number of vertices of degree ≥ 3, we see that the total number of indices i ∈ n with $s_i \in \{*, ·\}$ is at most $\kappa n_{\ge 3} + n_{\le 2} + 1 + 1$, where the last +1 comes from the first index i in n. The number of indices i with $s_i = *$ equals the number of classes, i.e., $n_{\le 2} + n_{\ge 3}$. Hence, the number of indices i with $s_i = ·$ is $\le \kappa n_{\ge 3} + n_{\le 2} + 2 - (n_{\ge 3} + n_{\le 2}) = (\kappa - 1) n_{\ge 3} + 2 \le (\kappa - 1)\nu + 2$.
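The writer-reader procedure can be tried out on the example above with a short Python sketch. Several choices here are assumptions made for illustration rather than statements from the paper: degrees are computed with the last index taken to be adjacent to the first, the additional information attached to a dot is the first index of the class, the ASCII character "." stands for the dot, and the names `encode` and `decode` are hypothetical.

```python
def link(seen_nbrs, p, c):
    """Record that classes p and c have been seen next to each other."""
    if p is not None and p != c:
        if c not in seen_nbrs[p]:
            seen_nbrs[p].append(c)
        if p not in seen_nbrs[c]:
            seen_nbrs[c].append(p)

def encode(classes, n):
    """Writer: return the string s and the extra information for the dots."""
    cls = {i: c for c, block in enumerate(classes) for i in block}
    # Degrees in the graph whose vertices are classes (cyclic adjacency assumed).
    nbrs = {c: set() for c in range(len(classes))}
    for i in range(1, n + 1):
        j = i % n + 1
        if cls[i] != cls[j]:
            nbrs[cls[i]].add(cls[j])
            nbrs[cls[j]].add(cls[i])
    seen_nbrs = {c: [] for c in range(len(classes))}
    first, s, hints = {}, [], {}
    for i in range(1, n + 1):
        c, p = cls[i], cls.get(i - 1)
        if c not in first:
            s.append("*")
            first[c] = i
        elif c == p:
            s.append("0")
        elif len(nbrs[p]) <= 2 and c in seen_nbrs[p]:
            s.append(str(seen_nbrs[p].index(c) + 1))
        else:
            s.append(".")
            hints[i] = first[c]  # an index j < i in the same class
        link(seen_nbrs, p, c)
    return "".join(s), hints

def decode(s, hints):
    """Reader: rebuild the classes from s and the hints alone."""
    cls, seen_nbrs = {}, {}
    for i, ch in enumerate(s, start=1):
        p = cls.get(i - 1)
        if ch == "*":
            c = len(seen_nbrs)
            seen_nbrs[c] = []
        elif ch == "0":
            c = p
        elif ch == ".":
            c = cls[hints[i]]
        else:
            c = seen_nbrs[p][int(ch) - 1]
        cls[i] = c
        link(seen_nbrs, p, c)
    blocks = {}
    for i, c in cls.items():
        blocks.setdefault(c, []).append(i)
    return sorted(blocks.values())

classes = [[1, 7, 15], [2, 16], [3, 4, 5, 11], [6, 10, 12, 14], [8], [9, 13]]
s, hints = encode(classes, 16)
print(s, hints)
print(decode(s, hints) == sorted(classes))
```

Note that the reader never needs to know the degrees: a symbol 1 or 2 only refers to the neighbors of the previous class that have been seen so far.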

Each equivalence class contributes a factor of at most $L = \sum_{p \in P} 1/p$ to our total in (4.1); singletons (equivalence classes with one element each) actually contribute $\sqrt{L}$, because of the factor of $L^{-|S(\sim)|/2}$. Recall that we are saving a factor of almost $H_0^{\nu/(12\kappa) - 1}$ through (4.4) (let us say $H_0^{\nu/(24\kappa)}$, to be safe). Thus, forgetting for a moment about the yellow equivalence classes, we conclude that the contribution to (4.4) of the equivalence relations ∼ such that $G_\sim$ has ν vertices of degree ≥ 3 is

where the factor of $4^{2k}$ is there because we also have to specify $\sigma \in \{-1, 1\}^{\{1,\dots,2k\}}$ and l, n ⊂ {1, 2, . . . , 2k}. Provided that we set our parameters so that $H_0^{1/(24\kappa)} \ge 2 (2k)^{\kappa - 1}$ (and it turns out that we may do so, provided that $\log H_0$ is larger than $(\log H)^{2/3+\epsilon}$, or rather, larger than $(\log H)^{1/2+\epsilon}$, if we make the improvement through multiple arrows we mentioned a little while ago), we are done; we have a bound of size

which is what we wanted all along.

But wait! What about the part of the word that disappears during reduction? It is partly described by a string of matched parentheses: for example, xx −1 x −1 yy −1 x gives us ()(()). (We also have to specify the exponents σ i separately.) The equivalence class of the index of a closing parenthesis is the same as that of the index of the matching opening parenthesis. Thus, we need only worry about specifying the equivalence classes of the opening parentheses. There are k − |n|/2 of them.
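The string of matched parentheses can be produced by the usual stack-based free reduction. The following Python sketch, with a hypothetical function name `reduce_with_parens` and words represented as lists of (letter, exponent) pairs, reproduces the example $x x^{-1} x^{-1} y y^{-1} x$ from the text.

```python
def reduce_with_parens(word):
    """Freely reduce a word given as a list of (letter, exponent) pairs.
    Returns the reduced word and a string with '(' and ')' at the positions
    of letters that cancel each other (surviving letters get no mark)."""
    stack = []                  # positions of letters not yet cancelled
    marks = [""] * len(word)
    for i, (x, e) in enumerate(word):
        if stack and word[stack[-1]] == (x, -e):
            j = stack.pop()     # letter i cancels letter j
            marks[j], marks[i] = "(", ")"
        else:
            stack.append(i)
    return [word[i] for i in stack], "".join(marks)

w = [("x", 1), ("x", -1), ("x", -1), ("y", 1), ("y", -1), ("x", 1)]
reduced, parens = reduce_with_parens(w)
print(reduced, parens)
```

In this example the reduced word is empty, so |n| = 0, and there are k − |n|/2 = 3 opening parentheses, in agreement with the count in the text.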

A naive approach would be to describe each such equivalence class [i] by specifying the first index i in it each time it occurs (except for the first time). The cost of that approach could be about as large as $k^{k - |n|/2}$, which is much too large. It would seem we are in a pickle. Indeed, we know we would have to be in a pickle, if we were not using the fact that we are not working in all of N, but in a subset X ⊂ N all of whose elements have ≤ KL divisors in P. (If we worked in all of N, even trivial walks, which are entirely yellow, would pose an insurmountable problem.) However, how can we use X, or the bound ≤ KL, by this point?

The point is that we need not consider all possible (p [i] ) in (4.1), but only those tuples that can possibly arise in a walk n, n + σ 1 p 1 , n + σ 1 p 1 + σ 2 p 2 , . . . , n + σ 1 p 1 + σ 2 p 2 + · · · + σ 2k p 2k = n all of whose nodes are in X. Now, if a prime p j has appeared before as p i (i.e., i < j and i ∼ j) and both i and j are "lit", that is i, j ∈ l, then, as we know, σ i p i + . . . + σ j−1 p j−1 must be divisible by p i . (Indices that are not lit do not pose a problem, due to the factors of the form 1/p that they contribute.) What is more: if i ∈ l, i < j with p i |σ i p i +. . .+σ j−1 p j−1 , then n+σ 1 p 1 +. . .+ σ j−1 p j−1 is forced to be divisible by p i (because n + σ 1 p 1 + . . . + σ i−1 p i−1 is divisible by p i ). Now, n + σ 1 p 1 + . . . + σ j−1 p j−1 has ≤ KL divisors. Hence, given j, there are at most KL distinct equivalence classes [i] having at least one representative i < j, i ∈ l such that p i |σ i p i + . . . + σ j−1 p j−1 . This is a property where n no longer appears. Now, as we describe ∼ to our reader, when we come to an index of the one kind that remains problematic -disappearing in the reduction, corresponding to an open parenthesis, in an equivalence class that has been seen before -we need only specify an equivalence class among those ≤ KL equivalence classes that have at least one representative i < j, i ∈ l such that p i |σ i p i + . . . + σ j−1 p j−1 .

The reader can figure out which ones those are, as that is a property given solely by $p_1, \dots, p_{j-1}$ and $\sigma_1, \dots, \sigma_{j-1}$. We can give them numbers 1 to KL by order of first appearance, and communicate to the reader the equivalence class we want by its number, rather than by an index. Thus we incur only a factor of KL, not 2k.

In the end, we obtain a total contribution of

which is what we wanted. In other words,

Incidentally, in earlier drafts of the paper, we did not have a "writer" and a "reader", but a mahout and an elephant:

They were unfortunately censored by my coauthor. As this is my exposition, here they are. The picture might be clearer now -the elephant-reader has no idea of n, or of our grand strategy, but it is an intelligent animal that can follow instructions and is endowed with a flawless memory (and the ability to test for divisibility, apparently). Then, for any

where the implied constants are absolute.

We have sketched a full proof, leaving out one, or rather two, passages, namely, the proof that we can take out from X two kinds of integers, and still keep X well-distributed enough in arithmetic progressions for cancellation to happen when we have too many lone primes. As we have said before, those two kinds of integers are: (a) integers n with ≥ KL divisors, (b) integers n that could give rise to too many disjoint revenants. Here (b) sounds a little vague, but, if we simply take out from X the set Y of those integers n for which there can be a "premature revenant", meaning that there exist $p \in P$, $p_1, \dots, p_l \in P$ with $p_i \ne p$ and $\sigma \in \{-1, 1\}^l$, $l \le \ell$, such that $p \mid n$, $p_1 \mid n$, $p_2 \mid n + \sigma_1 p_1$, . . . , $p_l \mid n + \sigma_1 p_1 + \dots + \sigma_{l-1} p_{l-1}$, $p \mid n + \sigma_1 p_1 + \dots + \sigma_l p_l$, then we have ensured that there cannot be more than $2k/\ell$ disjoint revenants.

(We have not really forgotten about the possibility that some intermediary indices may not be lit; those are taken care of by a different argument.) It is actually not hard to show that Y is a fairly small set; what takes work is showing that it is well-distributed. What we did was develop a new tool: a combinatorial sieve for conditions involving composite moduli. While it is somewhat technical, it may be interesting in that it will probably be useful for attacking other problems. Let us leave it to the appendix.

The main theorem has several immediate corollaries. First of all, we obtain what we set as our original goal.

For any e < w ≤ x such that w → ∞ as x → ∞,

We can also obtain substantially sharper results. A case in point: we can prove that λ(n + 1) averages to zero (with weight 1/n as above, or "at almost all scales") over integers ≤ N having exactly k prime factors, where k is a popular number of prime factors to have (e.g., log log N , or log log N + 2021). To see more such corollaries, look at the actual paper, or derive your own!

5.1. Subset of acknowledgments. Bonus track. I am grateful to many people; please read the full acknowledgments in the paper. Here I would like to thank two subsets in particular: (a) postdocs and students in Göttingen who patiently attended my online lectures during the first year of the COVID pandemic, as the proof was finally gelling, (b) inhabitants of MathOverflow.

In (b), one can find, for example, Fedor Petrov, who pointed us towards Kleitman-West, besides answering other questions, but you can also find some users who chose to remain anonymous. Among them was user "BS.", who explained how one of my questions about ranks was related to topology. That relation has gone well under the surface in the current version, so let us discuss it here, for our own edification.

Consider a word w of a special kind -a word w where every letter x 1 , . . . , x k appears twice, once as x i , once as , if a skew-symmetric matrix M has rank r, then it has a minor with disjoint row and column index sets and rank ≥ r/2. Since I was interested precisely in constructing such a minor with high rank (I and J giving us what we called "blue" and "red" vertices in the above), it made sense that I would want to know what the rank r of M might be. In particular, when is M non-singular?

What BS. showed to me is that one can construct a surface S with handles corresponding to the word w in a natural way. (Apparently this construction is standard, but it was completely unknown to me.) For instance, for w = x 1 x 2 x −1 1 x −1 2 x 3 x −1 3 , the surface S looks as follows:

The matrix M then corresponds to the intersection form of this surface. This form is defined as an antisymmetric inner product on $H_1(S, \mathbb{Z})$, counting the number of intersections (with orientation) of two closed paths in the way you may expect. For instance, in the following, $\langle z_1, z_2 \rangle = -1$, whereas $\langle z_1, z_3 \rangle = \langle z_2, z_3 \rangle = 0$:

Say S has genus g and b ≥ 1 boundary components. Then, for S g the surface of genus g without boundary, there is an embedding S → S g preserving the intersection form, with H 1 (S) → H 1 (S g ) having kernel of rank b − 1. The intersection form on H 1 (S g ) is non-singular. Hence, M has corank b − 1. In particular, M is non-singular iff b = 1, i.e., iff its boundary is connected.

It is an exercise to show that b equals the number of cycles in the permutation i → σ(i) + 1 mod 2k, where σ is the permutation of {1, 2, . . . , 2k} switching

I have no idea of how to define a surface S like the above for a word w of general form -the natural generalization of M is the matrix corresponding to the system (4.2) of divisibility relations, and that matrix need not be skew-symmetric, or even square. However, in S and its boundary, you can already see shades of our graph G (σ,∼) .

which is at most 1/H 0 ; the probability that there be p 1 , . . . , p l as above with σ 1 p 1 + . . . + σ l p l = 0 is also quite small.) The more complicated task is to show that Y is reasonably equidistributed in arithmetic progressions.

The main issue here is that we have a great number of conditions as in (A.1) to exclude. Inclusion-exclusion involves 2 m terms for m conditions -that is too many. There is a tool for dealing with that sort of issue in number theory, at least in some specific contexts: sieves.

We shall first show how to set up a general, abstract combinatorial sieve, for arbitrary logical conditions (rather than conditions of the form n ≡ a mod p).

We will then show how to apply it to conditions of the form n ≡ a mod m, that is, congruence conditions where the moduli may be composite (as opposed to being prime, as is common in sieve theory). The matter is tricky -one has to prevent combinatorial explosion again. Rota's cross-cut theorem will be our friend.

Lastly, we will show how to apply the sieve in our context (with moduli $p p_1 \cdots p_l$ coming from (A.1)) and sketch how to estimate the main and error terms. We will introduce sieve graphs.

For readers who have had some passing contact with sieve theory: while, in introductory texts on sieve theory, the emphasis is often on counting a set of elements S not obeying any of a set of conditions (e.g., the set S of primes n such that n + 2 is also a prime; its elements n do not fulfill the conditions n ≡ 0 mod p or n ≡ −2 mod p for any small prime p), the emphasis on much recent work, and also here, lies more generally on providing an approximation to the characteristic function 1 S of S by a function that is easier to deal with, or, if you wish, has a "simpler description" (in some precise sense). One label that has become attached to this use of sieves is "enveloping sieve", though that really describes one kind of approximation (a majorant of 1 S ) and at any rate should really be called an enveloping use of a sieve (many sieves can be used as enveloping sieves). At any rate, that is all more or less orthogonal to the main issue here, which is that we have to develop a genuinely more general sieve.

A.1. An abstract combinatorial sieve. Let Q be a set of conditions that an element x of a set Z may or may not fulfill. (For us, later, Z will be the set of integers, but that is of no importance at this point.) Denote by Q(x) ∈ 2 Q the set {Q ∈ Q : Q(x) is true}, i.e., the set of conditions in Q fulfilled by x. Define 1 ∅ (S) to be 1 if the set S is empty, and 0 otherwise. Then 1 ∅ (Q(x)) equals 1 when x satisfies none of the conditions in Q, and 0 otherwise.

We are interested in approximations to 1 ∅ (Q(x)), i.e., the function that takes the value 1 when x satisfies none of the conditions in Q, and 0 otherwise. This may seem to be a silly question, though it falls within the general framework we were discussing before. Let us put matters a little differently. A standard way to express 1 ∅ (Q(x)) would be as

and that might suit us, except that the number of subsets T ⊂ Q(x) is very large. Can we obtain a reasonable approximation by means of a sum of the form

where g : 2 Q → {0, 1} is a function -preferably one whose support is much smaller than Q(x)? (Here, as is usual, 2 Q denotes the set of all subsets of Q.)
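As a toy illustration of the two kinds of sums just discussed, the Python sketch below takes the conditions to be "p divides n" for a few small primes, and compares the full alternating sum over subsets T ⊂ Q(x) with sums weighted by a truncating function g. The choice of conditions, the choice g(T) = 1 exactly when |T| ≤ m, and the names `sieve_sum` and `fulfilled_by` are assumptions made for illustration; this is the classical inclusion-exclusion and Bonferroni setting, not the paper's sieve.

```python
from itertools import combinations

PRIMES = [2, 3, 5, 7, 11, 13]

def fulfilled_by(n):
    """The set Q(n) of conditions 'p divides n' that n fulfills."""
    return [p for p in PRIMES if n % p == 0]

def sieve_sum(fulfilled, g):
    """Sum of g(T) * (-1)^{|T|} over all subsets T of the fulfilled conditions."""
    total = 0
    for r in range(len(fulfilled) + 1):
        for T in combinations(fulfilled, r):
            total += g(frozenset(T)) * (-1) ** r
    return total

def truncate(m):
    """g with support {T : |T| <= m}."""
    return lambda T: 1 if len(T) <= m else 0

for n in (1, 6, 17, 30, 210):
    Qn = fulfilled_by(n)
    exact = 1 if not Qn else 0
    print(n, exact, sieve_sum(Qn, lambda T: 1),
          sieve_sum(Qn, truncate(2)), sieve_sum(Qn, truncate(1)))
```

With g identically 1 the sum recovers the indicator exactly; truncating at an even level gives an upper bound and at an odd level a lower bound, at the cost of an error term.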

It turns out to be possible to bound the error term in an approximation of this form in full generality. To be precise: the error term will be bounded in terms of the boundary of the support of g. Here we say that an S ⊂ Q is in the boundary of a collection $B \subset 2^Q$ if there is an element s of S such that exactly one of the two sets S, S \ {s} is in B.

The proof is short and basically trivial (for Q(x) non-empty, the second sum is just a reordering of the first sum, with opposite sign). It is inspired by a passage in the proof of Brun's combinatorial sieve (see, e.g., [CM06, §6.2, p. 87-89]). We do not need the linear ordering for Q to be in any sense natural.

A.2. Sieving by composite moduli. Let Q be a finite collection of arithmetic progressions. To each progression P ∈ Q, we can associate the condition n ∈ P , for n ∈ Z. Thus, we obtain a set Q of conditions corresponding to Q, and apply the framework above.

We are interested in approximating $1_{n \notin P\ \forall P \in Q}(n)$, that is, the characteristic function of the set of all n lying in no progression P ∈ Q, by a sum

where D ⊂ Q ∩ is some set of progressions.

We will denote by q(R) the modulus q of an arithmetic progression R = a + qZ.

Proposition 2. Let Q be a finite collection of distinct arithmetic progressions in Z with square-free moduli. Let D be a non-empty subset of

i.e., if S ∈ D, then every superset S ⊃ S in Q ∩ is also in D. Let F D be as above. Then

We can of course think of ∂D and ∂ out D as the boundary and the outer boundary of D.

Proof. The proof of the Proposition starts with an application of the Lemma above. In what then follows, the important thing is to prevent a combinatorial explosion. For instance, it is not a priori clear that $c_R$ can be bounded well: there could be very many ways to express a given R ∈ D as an intersection of a subcollection S of Q; in fact, the number of ways could be close to $2^{2^{\omega(q(R))}}$, that is, the number of collections of subsets of a set with ω(q(R)) elements. We can give the much better bound $2^{\omega(q(R))}$ by obtaining cancellation (by $(-1)^{|S|}$) among those different ways. To be more precise, we apply the following Lemma, which is an easy consequence of Rota's cross-cut theorem, but can also be proved from scratch in a couple of lines. The same Lemma allows us to deal with the same combinatorial explosion in the error terms.

Lemma 6. Let C be a collection of subsets of a finite set X. Then

Proof. Exercise.

How to apply the Proposition? We can define D to be the set of progressions in Q ∩ with "small modulus", for some notion of "small". Then its boundary consists of progressions that are "borderline small", i.e., not really small, and so the proportion of n in each one of them will not be large; we just need to control the size of the boundary to show that the total error term is acceptable.

A.3. Sieve graphs and their usage. We now come to our application of the sieve we have just developed. Our aim is to prevent our walks n, n + σ_1 p_1, n + σ_1 p_1 + σ_2 p_2, ... from having what we called premature revenants. We will do so by constraining each of n, n + σ_1 p_1, n + σ_1 p_1 + σ_2 p_2, ... to lie within the set Y of integers that cannot give rise to premature revenants.

To be precise: we define Y to be the set of all integers n except for those for which there are primes p_1, ..., p_l ∈ P and signs σ_1, ..., σ_l ∈ {−1, 1} with 1 ≤ l < ℓ such that

(A.2) p_1 | n, p_2 | n + σ_1 p_1, ..., p_l | n + σ_1 p_1 + ... + σ_{l−1} p_{l−1},

there are no repeated primes among p_1, ..., p_l except perhaps for consecutive primes p_i = p_{i+1} = ... = p_j with σ_i = σ_{i+1} = ... = σ_j, and one of the following two conditions holds:

• there exists a prime p_0 ∈ P distinct from p_1, ..., p_l such that (A.3) p_0 | n and p_0 | n + σ_1 p_1 + ... + σ_l p_l,

• we have (A.4) σ_1 p_1 + ... + σ_l p_l = 0.

The set of integers n that obey conditions (A.2) and (A.3) is an arithmetic progression to modulus [p_0, p_1, ..., p_l] (that is, the lcm of p_0, p_1, ..., p_l, i.e., the product of all distinct primes among them), unless it is empty. The set of integers n obeying (A.2) and (A.4) is an arithmetic progression to modulus [p_1, p_2, ..., p_l], unless it is empty. Let W_{ℓ,P} denote the set of all arithmetic progressions arising in this way. Then the condition n ∈ Y is equivalent to n not lying in any of the arithmetic progressions in W_{ℓ,P}. Likewise, for β_1, ..., β_{2k} ∈ Z, the condition that n + β_i ∈ Y for all 1 ≤ i ≤ 2k is equivalent to asking that n not be in any arithmetic progression of the form P − β_i with P ∈ W_{ℓ,P} and 1 ≤ i ≤ 2k.
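To see concretely that conditions of this shape cut out either a single residue class or nothing at all, one can enumerate residues by brute force. This is a minimal sketch, not taken from the source: the function name `revenant_residues` and the chosen primes are illustrative assumptions, and the search over one full period is only sensible for very small primes.

```python
from functools import reduce
from math import gcd


def lcm_all(numbers):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a // gcd(a, b) * b, numbers, 1)


def revenant_residues(p0, primes, signs):
    """Residues n modulo [p0, p_1, ..., p_l] obeying the divisibility
    conditions p_{i+1} | n + s_1 p_1 + ... + s_i p_i for 0 <= i < l,
    together with p0 | n and p0 | n + s_1 p_1 + ... + s_l p_l."""
    modulus = lcm_all([p0] + list(primes))
    # shifts[i] = s_1 p_1 + ... + s_i p_i, with shifts[0] = 0
    shifts = [0]
    for p, s in zip(primes, signs):
        shifts.append(shifts[-1] + s * p)

    def obeys(n):
        chain = all((n + shifts[i]) % primes[i] == 0
                    for i in range(len(primes)))
        closing = n % p0 == 0 and (n + shifts[-1]) % p0 == 0
        return chain and closing

    return modulus, [n for n in range(modulus) if obeys(n)]


# One residue class modulo 30:
modulus, residues = revenant_residues(2, [3, 5], [1, 1])

# An empty case: 2 | n and 2 | n + 3 cannot both hold.
modulus_empty, residues_empty = revenant_residues(2, [3], [1])
```

In the first call the conditions are 3 | n, 5 | n + 3, 2 | n and 2 | n + 8, which together single out n ≡ 12 mod 30; in the second, the two conditions on p_0 = 2 contradict each other, illustrating the clause "unless it is empty".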

We are thus in the kind of situation to which our sieve for composite moduli is applicable. Applying the Proposition above, we obtain, for any m ≥ 1,

for Q = W_{ℓ,P}(β) and some c_R ∈ ℝ with |c_R| ≤ 2^{ω(q(R))}. (The proof is one line: we define D to be the set of all non-empty R ∈ Q^∩ such that the modulus of R has ≤ m prime factors.)
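The reason this choice of D is admissible is that enlarging a progression can only shrink its modulus to a divisor, so the number of prime factors cannot go up. The sketch below checks this on a toy collection. It rests on assumptions: progressions are pairs (a, q) with 0 ≤ a < q, only non-empty intersections of non-empty subcollections are listed as a stand-in for Q^∩, and all function names are hypothetical.

```python
from itertools import combinations
from math import gcd


def intersect(p1, p2):
    """Intersect a1 + q1*Z with a2 + q2*Z; return (a, q), or None if empty."""
    (a1, q1), (a2, q2) = p1, p2
    g = gcd(q1, q2)
    if (a1 - a2) % g != 0:
        return None
    modulus = q1 // g * q2
    for n in range(a1, a1 + modulus, q1):
        if (n - a2) % q2 == 0:
            return (n % modulus, modulus)


def omega(q):
    """Number of distinct prime factors of q."""
    count, p = 0, 2
    while p * p <= q:
        if q % p == 0:
            count += 1
            while q % p == 0:
                q //= p
        p += 1
    return count + (1 if q > 1 else 0)


def nonempty_intersections(Q):
    """All non-empty intersections of non-empty subcollections of Q."""
    found = set()
    for r in range(1, len(Q) + 1):
        for S in combinations(Q, r):
            R = S[0]
            for P in S[1:]:
                R = intersect(R, P)
                if R is None:
                    break
            if R is not None:
                found.add(R)
    return found


def contains(big, small):
    """True if the progression `small` is a subset of the progression `big`."""
    (a1, q1), (a2, q2) = big, small
    return q2 % q1 == 0 and (a2 - a1) % q1 == 0


def small_modulus_set(Q, m):
    """Progressions in the closure whose modulus has at most m prime factors."""
    return {R for R in nonempty_intersections(Q) if omega(R[1]) <= m}


Q = [(0, 2), (0, 3), (1, 5), (0, 7)]
closure = nonempty_intersections(Q)
D = small_modulus_set(Q, 2)
# Upward closure: every progression in the closure containing some R in D is in D.
upward_closed = all(S in D for R in D for S in closure if contains(S, R))
```

With these four pairwise coprime moduli, the closure has 15 elements and D keeps the 10 whose modulus has at most two prime factors.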

The more obvious issue now is how to bound the error term here. (For us, in our application, there is also the related issue of showing that the main term is well-behaved; in particular, its sum over certain arithmetic progressions should not be too large.)

To keep track of the kind of conditions giving rise to a progression R ∈ Q^∩, we find it sensible to define a sieve graph, which is, one may say, a pictorial representation of those conditions, or rather of what their general shape is and how they relate to each other.

We define a sieve graph to be a directed graph consisting of:

(1) a path of length 2k, called the horizontal path;

(2) threads of length < ℓ, of two kinds:

(a) a closed thread, which is a cycle that contains some vertex of the horizontal path, and is otherwise disjoint from it,

(b) an open thread, which is a path that has an endpoint at some vertex of the horizontal path, and is otherwise disjoint from it;

(3) for each open thread and each of the two endpoints of that thread, an edge whose tail is that endpoint, but whose head belongs only to the edge (i.e., it is a vertex of degree 1). These two edges will be called the thread's witnesses; they are considered to be part of the thread.

We will work with pairs (G, ∼), where G is a sieve graph and ∼ is an equivalence relation on the edges of G such that the witnesses of a thread are equivalent to each other. We can put additional conditions, reflecting the conditions defining Y: we require that witnesses be equivalent to no other edges in their thread, and that, in any thread, the set of edges in an equivalence class form a connected subgraph (meaning: primes do not repeat in a thread unless they are consecutive).

At any rate, it is clear what we will do: we will go over different pairs (G, ∼), and, for each pair, we will consider the divisibility conditions resulting from assigning a distinct prime in P to each equivalence class of ∼. We recall that Y is defined as the set of integers that do not satisfy any conditions of a certain kind. A pair (G, ∼), together with an assignment of a prime in P to each equivalence class of ∼, corresponds to a conjunction Q_1 ∧ Q_2 ∧ · · · ∧ Q_j of some such conditions Q_i. (To be precise: the conditions are given by a pair (G, ∼) together with a subset l ⊂ {1, 2, ..., 2k}, corresponding to the "lit" edges. Only those edges in the horizontal path whose indices are in l impose divisibility conditions; the unlit edges are muted, so to speak.) We need to study these conditions Q_i (all of which are of the form "n belongs to an arithmetic progression", as we have seen) because they will appear in the approximation to 1_Y that a sieve will give us.

We say (G, ∼) is non-redundant if every thread contains at least one edge x (possibly a witness) whose equivalence class [x] contains no edge in any other thread. (A thread where every edge is equivalent to an edge in some other thread would correspond to a condition that either is redundant, given the conditions from the other threads, or contradicts them. As Caliph Omar did not say...) The cost κ(G, ∼) is the number of equivalence classes that contain at least one edge (possibly a witness) in some thread, i.e., the number of classes that do not contain only edges in the horizontal path. Let W_{k,ℓ,m} be the set of non-redundant pairs with given parameters k and ℓ and cost m. It is clear that W_{k,ℓ,m} must be finite, since any pair of cost m contains at most m threads. It is not hard to bound the number of elements of W_{k,ℓ,m} with a given number of threads r ≤ m.
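The two definitions just given are easy to make mechanical. In the sketch below, which is an illustrative encoding and not the one used in [HR], a pair (G, ∼) is reduced to a list of threads, each thread being the list of equivalence-class labels of its edges (witnesses included). The horizontal path is left out because it affects neither the cost nor non-redundancy as defined above; the function names are hypothetical.

```python
from collections import Counter


def cost(threads):
    """Number of equivalence classes containing at least one thread edge."""
    return len({label for thread in threads for label in thread})


def is_non_redundant(threads):
    """True if every thread has an edge whose class meets no other thread."""
    # For each class, count the number of threads in which it appears.
    threads_per_class = Counter(
        label for thread in threads for label in set(thread)
    )
    return all(
        any(threads_per_class[label] == 1 for label in thread)
        for thread in threads
    )


# The second thread only reuses classes already present in the first one.
redundant_example = [["w", "x", "a"], ["x", "a"]]

# Each thread owns a class ("w" and "y") that the other thread lacks.
good_example = [["w", "x"], ["x", "y"]]
```

In a non-redundant pair each thread owns at least one class of its own, which is why the number of threads cannot exceed the cost; `good_example` has two threads and cost 3.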

When we apply our sieve for composite moduli so as to approximate 1_{n+β_i ∈ Y ∀ 1≤i≤2k}, we define D to be the set of conditions corresponding to non-redundant pairs (G, ∼) with cost κ(G, ∼) ≤ m for some value of m we choose. Then the outer boundary ∂ out D corresponds to non-redundant pairs (G, ∼) of cost m < κ(G, ∼) ≤ m + ℓ. Our task is then to prove that the contribution of all (G, ∼) with cost between m and m + ℓ is small. The crucial part is to show that, given any such (G, ∼), together with a sign σ_y

75013 Paris CEDEX 13, France; Mathematisches Institut

We can give estimates on the main term of the sieve in much the same way, only keeping track of the set W_{k,ℓ,m} of strongly non-redundant pairs, meaning pairs (G, ∼) such that every thread contains at least one edge x (possibly a witness) whose equivalence class [x] contains no edge in any other thread and no lit edge in the horizontal path (that is, no edge in the horizontal path with index in l).

What we must address now may be seen as a technical task. However, the way we will address it most likely has more general applicability.

Our task in this appendix is to show how to exclude from our set X ⊂ N all integers n that could give rise to premature revenants, that is, edge lengths p_i = p_{i'} with i' − i small such that p_j ≠ p_i for some i < j < i'. (Without this last condition, we would be counting not only "revenants" but also mere repetitions.) As we already commented, it is enough to exclude the set Y of all integers n such that there exist p ∈ P, p_1, ..., p_l ∈ P with p_i ≠ p and σ ∈ {−1, 1}^l, l ≤ ℓ, for which (A.1) p | n, p_1 | n, p_2 | n + σ_1 p_1, ...,

It is actually easy (and in fact an exercise for the reader) to show that Y is quite small: not much larger than O(L ) N/H_0. (Outline: the probability that a given divisor p of σ_1 p_1 + ... + σ_l p_l divide a random n is about 1/p,

for each edge y and a subset l ⊂ k, the sum of

The proof is simple, and its main idea is as follows. Since (G, ∼) is non-redundant, each thread contains an edge in an equivalence class that does not appear in other threads (or elsewhere in the same thread, except for consecutive appearances). For each thread, we choose one such edge x. Then the thread binds the variable p_{[x]}, so to speak, and so we lose one degree of freedom for each of the r threads. In detail: