authors: Kai, Wataru
title: The Green-Tao theorem for affine curves over F_q
date: 2021-01-04

Green and Tao famously proved in a 2008 paper that there are arithmetic progressions of prime numbers of arbitrary length. Soon after, analogous statements were proved by Tao for the ring of Gaussian integers and by Lê for polynomial rings over finite fields. In 2020 this was extended to orders of arbitrary number fields by Kai-Mimura-Munemasa-Seki-Yoshino. We settle the case of the coordinate rings of affine curves over finite fields. The main contribution of this paper is a subtle choice of a polynomial subring of the given ring which plays the role of $\mathbb{Z}$ in the number field case. This choice and the proof of its pleasant properties ultimately depend on the Riemann-Roch formula.

In this paper we prove the following:

Theorem 1.1. Let $p$ be a prime. Let $O_0$ be an integral domain finitely generated over $\mathbb{F}_p$ whose fraction field has transcendence degree 1 over $\mathbb{F}_p$. Then for any positive integer $k \ge 1$, the set of prime elements of $O_0$ contains a $k$-dimensional affine subset.

Recall that an affine subset of a vector space is by definition a translate of a vector subspace (necessarily unique). Its dimension is defined to be that of the corresponding vector subspace.

We actually prove a density version of the theorem, which we now formulate. Let $O$ be the integral closure of $O_0$ in its fraction field. It is a Dedekind domain finite over $O_0$. There is a canonical linear norm $\|-\| : O \to \mathbb{R}_{\ge 0}$, defined in §2, which gives an increasing exhaustive filtration by finite subsets $O_{\le N} := \{\alpha \in O \mid \|\alpha\| \le N\}$. The density statement (Theorem 1.2) asserts that any subset $A$ of the set $P_O$ of prime elements of $O$ with positive upper relative density contains a $k$-dimensional affine subset for every $k \ge 1$. In fact, in our proof of Theorem 1.2, we search for $k$-dimensional affine subsets of a very specific form.
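To make Theorem 1.1 concrete, here is a minimal brute-force sketch for the simplest case $O_0 = \mathbb{F}_2[t]$ and $k = 2$. It is only an illustration and not the method of the paper: the encoding of polynomials as integer bitmasks and all function names (`is_prime_element`, `find_affine_subset`, and so on) are assumptions made for this example.

```python
# Polynomials over F_2 are encoded as Python ints: bit i is the coefficient of t^i.

def deg(f: int) -> int:
    """Degree of f, with the convention deg(0) = -1."""
    return f.bit_length() - 1

def poly_mod(f: int, g: int) -> int:
    """Remainder of f modulo g in F_2[t] (g must be non-zero)."""
    while deg(f) >= deg(g):
        f ^= g << (deg(f) - deg(g))
    return f

def is_prime_element(f: int) -> bool:
    """Trial division: f is irreducible iff no g with 1 <= deg g <= deg(f)/2 divides it."""
    d = deg(f)
    if d < 1:
        return False
    return all(poly_mod(f, g) != 0 for g in range(2, 1 << (d // 2 + 1)))

def find_affine_subset(k: int, max_deg: int):
    """Search for a k-dimensional F_2-affine subset made of primes of degree <= max_deg."""
    primes = {f for f in range(2, 1 << (max_deg + 1)) if is_prime_element(f)}

    def extend(points):
        if len(points) == 1 << k:
            return points
        for a in range(1, 1 << (max_deg + 1)):
            shifted = [p ^ a for p in points]  # translate by a candidate new direction a
            # shifted[0] not in points  <=>  a is outside the span of earlier directions
            if shifted[0] not in points and all(q in primes for q in shifted):
                found = extend(points + shifted)
                if found:
                    return found
        return None

    for beta in sorted(primes):
        found = extend([beta])
        if found:
            return sorted(found)
    return None

plane = find_affine_subset(2, 4)
print([bin(f) for f in plane])
```

Over $\mathbb{F}_2$, four distinct polynomials form a 2-dimensional affine subset exactly when their sum is zero, which is what the search exploits. The theorem guarantees that such configurations exist for every $k$; the brute-force search is only practical for very small $k$.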
For later use, note that these notions make perfect sense for any integral domain $o$, a torsion-free $o$-module $\mathfrak{a}$ and any subset $S \subset \mathfrak{a}$. For a suitable $o \subset O$ and an arbitrary $S$, we shall prove in Theorem 5.2 that any subset of $P_O$ with positive upper relative density contains a non-trivial $o$-homothetic copy of $S$. Theorem 1.2 then follows because if we take $S$ to be a $k$-dimensional linear subspace of $O$, then every non-trivial $o$-homothetic copy of it is a $k$-dimensional affine subset. This is why we propose to call Theorem 1.2 the Green-Tao theorem in positive characteristic, as the Green-Tao theorem for number fields is commonly formulated as follows.

Theorem 1.4 (Green-Tao theorem for number fields: [3], [9], [6]). Let $K$ be a number field and $O_K$ its ring of integers. Denote by $P_K$ the set of prime elements of $O_K$. Then any set $A \subset P_K$ with positive upper relative density contains a non-trivial $\mathbb{Z}$-homothetic copy of $S$ for an arbitrary finite subset $S$ of $O_K$.

In [6] the authors also prove a variant of this statement for the "prime elements" (in an appropriate sense) in a given non-zero ideal $\mathfrak{a} \subset O_K$. While we will also state and prove Theorem 5.2 in this generality, the reader is advised to assume $\mathfrak{a} = O$ on a first reading. We will stick to the case $\mathfrak{a} = O$ in the rest of the Introduction.

1.1. Overview of the proof. Theorem 1.2 for polynomial rings $O = \mathbb{F}_q[t]$ is due to Lê [7]. For Theorem 1.4, the case of $\mathbb{Z}$ is the renowned theorem of Green and Tao [3] and the case of $\mathbb{Z}[\sqrt{-1}]$ is due to Tao [9]. For the ring $O_K$ of integers in a general number field $K$, it is a result of Mimura, Munemasa, Seki, Yoshino and the present author [6]. See Table 1. All of them follow the strategy of Green-Tao [3]. Details of our arguments are closest to those of [6].
The new issue we have to face is that while the number ring $O_K$ has a canonical base ring $\mathbb{Z}$ which is simple enough and such that $O_K \cong \mathbb{Z}^n$ compatibly with the metrics on both sides, there is no canonical one for $O$ in positive characteristic. In §2 we use the Riemann-Roch formula to find an appropriate subring $o$ of $O$, which is isomorphic to the polynomial ring and over which $O$ is finite. The subtlety of our choice is that, moreover, the $o$-linear isomorphism $O \cong o^r$ (given once we choose a basis) is compatible with the metrics on both sides; see Proposition 2.2. Once we have done this, everything in [6] goes through. So we refer the reader to [6, §§1-2] for a detailed overview. Let us just recall the three main ingredients:
• the relative Szemerédi theorem (recalled in §3);
• the construction of a pseudorandom measure $\lambda : O \to \mathbb{R}_{\ge 0}$ (§4, completed in §5);
• the fact that the prime elements $P_O$ have positive density with respect to the measure $\lambda$ (§5).

Here, the relative Szemerédi theorem (Theorem 3.2) roughly asserts the following. Suppose we are given a function $\lambda : O \to \mathbb{R}_{\ge 0}$ which is a pseudorandom measure; this condition says that $\lambda$ is close enough to the constant function 1 in a certain sense (see Definition 3.1). Suppose also that a subset $A \subset O$ has positive (upper) density with respect to $\lambda$. Then $A$ contains a non-trivial $o$-homothetic copy of $S$ for every finite subset $S \subset O$.

The construction of $\lambda : O \to \mathbb{R}_{\ge 0}$ is ideal-theoretic in nature. The proof that it is pseudorandom ultimately relies on the knowledge that the zeta function $\zeta_O$ has a simple pole at 1 with a positive residue. That the prime elements have positive density with respect to $\lambda$ will be deduced from the Chebotarëv density theorem, an analog of the Prime Number Theorem in our setting. A pitfall in the construction of $\lambda$ is that there is a somewhat natural function $\lambda' : O \to \mathbb{R}_{\ge 0}$ but we are not going to use it directly because we do not know if it is pseudorandom in our sense.
Instead, we choose an element $W \in o$ with sufficiently many different prime factors and $b \in O$ coprime to $W$, and define the function $\lambda$ as the composite $O \xrightarrow{W(-)+b} O \xrightarrow{\lambda'} \mathbb{R}_{\ge 0}$ times a normalizing factor. This enables us to prove the pseudorandomness. Thus, the set $A \subset O$ in the relative Szemerédi theorem is going to be taken as the inverse image of $P_O$ by the affine linear map $W(-) + b$; this is of course equivalent to considering the prime elements which are congruent to $b$ modulo $W$. The reader will see how this trick (the so-called $W$-trick) works as the proof unrolls in §§4-5.

While the translation of the arguments in the number field case [6] into our situation is straightforward in many places, one cannot formally apply the results in loc. cit. because of the slight difference of languages between number fields and algebraic curves. So we include full proofs for the convenience of the reader. Some details get simpler, as the reader might naturally expect, partly thanks to the fact that the canonical norm $\|-\| : O \to \mathbb{R}_{\ge 0}$ is ultrametric, meaning that $\|\alpha + \beta\| \le \max\{\|\alpha\|, \|\beta\|\}$ for all $\alpha, \beta \in O$, so that the subsets $O_{\le N}$ are in fact subgroups.

Notation. For a function on a non-empty finite set $f : X \to \mathbb{C}$, we use the standard expectation notation: $\mathbb{E}_{x \in X} f(x) := \frac{1}{|X|} \sum_{x \in X} f(x)$. From §4 onwards, we will make extensive use of the big-O notation with dependence parameters as in [6, Notation in §2]: let $f$ and $g$ be $\mathbb{C}$-valued functions on a set $X$ which depend on additional parameters $a, b, c, \dots$. Assume that the values of $g$ are positive real numbers. We write $f(x) = O_{b,c,\dots}(g(x))$ to mean that there is a positive constant $C = C_{b,c,\dots} > 0$ depending only on the parameters in the subscript such that the inequality $|f(x)| < C g(x)$ holds for all $x \in X$. (Thus in this case the implied constant $C$ can be taken independent of $a$.) An expression like $O_{b,c,\dots}(1)$ would mean a positive constant depending only on the parameters $b, c, \dots$ in the subscript. Note in particular that $O(1)$ without subscript would mean an absolute constant depending on nothing at all.
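The $W$-trick described above can be watched in action in the toy case $O = o = \mathbb{F}_2[t]$. The sketch below is an illustration under assumptions of our own choosing: $W = t(t+1)$, $b = 1$, polynomials encoded as integer bitmasks, and hypothetical helper names. It checks that every prime of degree at least 2 is congruent to $b$ modulo $W$, so that pulling back along $x \mapsto Wx + b$ loses no such primes while making them denser.

```python
# Polynomials over F_2 are encoded as ints: bit i is the coefficient of t^i.

def deg(f: int) -> int:
    return f.bit_length() - 1

def poly_mod(f: int, g: int) -> int:
    """Remainder of f modulo g in F_2[t] (g non-zero)."""
    while deg(f) >= deg(g):
        f ^= g << (deg(f) - deg(g))
    return f

def poly_mul(a: int, b: int) -> int:
    """Carry-less product, i.e. multiplication in F_2[t]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def is_prime_element(f: int) -> bool:
    d = deg(f)
    if d < 1:
        return False
    return all(poly_mod(f, g) != 0 for g in range(2, 1 << (d // 2 + 1)))

W = 0b110  # W = t^2 + t = t(t + 1), the product of the two primes of degree 1
b = 0b1    # b = 1 is coprime to W

def affine_map(x: int) -> int:
    """The affine linear map x -> W x + b."""
    return poly_mul(W, x) ^ b

# A = inverse image of the primes under x -> W x + b, for non-zero x of degree <= 4.
A = [x for x in range(1, 1 << 5) if is_prime_element(affine_map(x))]

# All primes of degree 2..6, for comparison.
primes = [f for f in range(4, 1 << 7) if is_prime_element(f)]

density_in_progression = len(A) / 31      # 31 non-zero x of degree <= 4
density_overall = len(primes) / 124       # 124 polynomials of degree 2..6
print(density_in_progression, density_overall)
```

In this toy case the residue class $b$ modulo $W$ is the only class coprime to $W$, so the pullback set `A` is in bijection with the primes of degree 2 to 6. In the paper the role of $W$ is analogous but its purpose is to make the resulting measure pseudorandom, which this sketch does not attempt to verify.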
When $g(x)$ is a heavy formula, an alternative notation such as $f(x) = O_{b,c,\dots}(1) \cdot g(x)$ is sometimes preferred. A quantity written in the form $1 + O_{b,c,\dots}(g(x))$ is one whose difference with 1 is $O_{b,c,\dots}(g(x))$. When we say something like "$f(x, y) = O_{b,c,\dots}(g(x, y))$ for all sufficiently large $x, y \in \mathbb{R}$" we are taking the domain $X$ to be a subset of $\mathbb{R}^2$ consisting of the pairs of real numbers larger than certain thresholds. In practice, the thresholds often depend on the parameters $a, b, c, \dots$. We usually indicate how the thresholds depend on the parameters, especially when that piece of information is relevant.

The purpose of this section is to define the canonical norm $\|-\| : O \to \mathbb{R}_{\ge 0}$, set up necessary algebraic terminology and find an appropriate subring $o \subset O$. We use [8] as the main reference about algebraic background. We assume the reader is familiar with advanced undergraduate commutative algebra as in [1] and basic notions of algebraic curves (equivalently function fields in one variable), e.g. as in [8, Chapter 5] or [5, Chapter I, Section 6], but they do not have to know more than the Riemann-Roch theorem [8, Theorem 5.4].

By a linear norm or an ultrametric norm on an abelian group $G$ let us mean a non-negatively valued function $\|-\| : G \to \mathbb{R}_{\ge 0}$ which is ultrametric, that is, $\|\alpha + \beta\| \le \max\{\|\alpha\|, \|\beta\|\}$ for all $\alpha, \beta \in G$, and non-degenerate in that the only element with norm 0 is the zero element. A submultiplicative linear norm on an (always commutative) ring $A$ is a linear norm on the abelian group $A$ which moreover satisfies $\|\alpha\beta\| \le \|\alpha\| \cdot \|\beta\|$ for all $\alpha, \beta \in A$. In our examples the multiplicative unit will always have norm 1. A norm $\|-\|$ is said to be multiplicative if the above inequality is always an equality.

Let $\mathbb{F}_q$ be the integral closure of $\mathbb{F}_p$ in $O$. Let $\overline{X}$ be the complete non-singular curve over $\mathbb{F}_q$ which contains $X := \operatorname{Spec} O$ as an open subscheme. Let $n$ be the cardinality of the complement: $n := |\overline{X} \setminus X|$. Regard each $v \in \overline{X}$ as a discrete valuation $\operatorname{Frac}(O)^* \twoheadrightarrow \mathbb{Z}$. Let $\mathbb{F}(v)$ be its residue field and $\deg(v) := [\mathbb{F}(v) : \mathbb{F}_q]$ its degree. Define a linear norm $\|-\|_v$ on $O$ by $\|\alpha\|_v := q^{-\deg(v) \cdot v(\alpha)}$. The value $\|0\|_v$ is understood to be 0.
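The axioms above can be checked by machine in the simplest example, the polynomial ring $\mathbb{F}_3[t]$ with the norm $\|f\| = 3^{\deg f}$ coming from the single place at infinity. The following is a minimal sketch; the choice $q = 3$, the coefficient-tuple encoding and the helper names are assumptions made for illustration.

```python
from itertools import product

P = 3  # we work over F_3; any prime would do for this illustration

def trim(coeffs):
    """Drop trailing zero coefficients; the zero polynomial becomes ()."""
    c = list(coeffs)
    while c and c[-1] == 0:
        c.pop()
    return tuple(c)

def norm(f):
    """||f|| = P^(deg f) for f != 0, and ||0|| = 0."""
    return 0 if not f else P ** (len(f) - 1)

def add(f, g):
    n = max(len(f), len(g))
    f, g = f + (0,) * (n - len(f)), g + (0,) * (n - len(g))
    return trim((a + b) % P for a, b in zip(f, g))

def mul(f, g):
    if not f or not g:
        return ()
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return trim(out)

# All polynomials of degree <= 2 (coefficients listed from the constant term up).
elements = [trim(c) for c in product(range(P), repeat=3)]

ultrametric = all(norm(add(f, g)) <= max(norm(f), norm(g))
                  for f in elements for g in elements)
multiplicative = all(norm(mul(f, g)) == norm(f) * norm(g)
                     for f in elements for g in elements)

# The ball {f : ||f|| <= 3} is closed under addition, hence a subgroup.
ball = [f for f in elements if norm(f) <= P]
ball_is_subgroup = all(add(f, g) in ball for f in ball for g in ball)

print(ultrametric, multiplicative, ball_is_subgroup)
```

For a general $O$ with several places at infinity the canonical norm is only submultiplicative, so the multiplicativity check is special to this one-place example.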
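The canonical norm $\|-\|$ on $O$ is defined by $\|\alpha\| := \max_{v \in \overline{X} \setminus X} \|\alpha\|_v$. This is a submultiplicative linear norm because the following formulas hold for all $v \in \overline{X}$ and $\alpha, \beta \in \operatorname{Frac}(O)^*$: $v(\alpha + \beta) \ge \min\{v(\alpha), v(\beta)\}$ and $v(\alpha\beta) = v(\alpha) + v(\beta)$.

Define the norm of a non-zero ideal $\mathfrak{a} \subset O$ as the cardinality of the quotient $N(\mathfrak{a}) := |O/\mathfrak{a}|$, and for $\alpha \in O \setminus \{0\}$ write $N(\alpha) := N(\alpha O)$ for short. By convention we define $N(0) := 0$. We call it the ideal norm of $\alpha$ to avoid confusion with the linear norm $\|-\|$. For $\alpha \neq 0$ we know $N(\alpha) = \prod_{v \in X} |\mathbb{F}(v)|^{v(\alpha)}$, say by prime decomposition of ideals in $O$. It follows by the product formula for complete algebraic curves (e.g. [8, Proposition 5.1, p.47]) that the following equality holds for all $\alpha \in O$: $N(\alpha) = \prod_{v \in \overline{X} \setminus X} \|\alpha\|_v$. In particular $\alpha$ is in $O^*$ if and only if $\prod_{v \in \overline{X} \setminus X} \|\alpha\|_v = 1$.

For positive real numbers $N > 0$, set $O_{\le N} := \{\alpha \in O \mid \|\alpha\| \le N\}$. By the Riemann-Roch theorem for curves [8, Corollary 4 of Theorem 5.4, p.49], we know that $|O_{\le N}|$ is approximately proportional to $N^n$. To be more precise, consider the following invariants: $g :=$ the genus of $\overline{X}$, and $d_0 :=$ the least common multiple of the $\deg(v)$'s for $v \in \overline{X} \setminus X$. Let us denote by $\lfloor - \rfloor$ the floor function $x \mapsto$ (the largest integer not exceeding $x$) and consider the next divisor on $\overline{X}$ for $N \ge 1$: $D_N := \sum_{v \in \overline{X} \setminus X} \lfloor \log_q N / \deg(v) \rfloor \cdot v$. Its degree is $\sum_{v \in \overline{X} \setminus X} \deg(v) \lfloor \log_q N / \deg(v) \rfloor$, and $O_{\le N}$ is the space $L(D_N)$ in the notation of [8]. Therefore by Riemann-Roch [8, Corollary 4 of Theorem 5.4, p.49] we get $|O_{\le N}| = N^n / q^{g-1}$ for every $N \ge q^{(2g-1)/n}$ which is a power of $q^{d_0}$. From this we also get the following bound valid for all real numbers $N \ge q^{(2g-1)/n}$:

When we consider a non-zero ideal $\mathfrak{a} \subset O$ (which is relevant only if the reader is interested in the case $\mathfrak{a} \neq O$ of Theorem 5.2), we endow $\mathfrak{a}$ with the induced linear norm and write $\mathfrak{a}_{\le N} := \mathfrak{a} \cap O_{\le N}$. By the Riemann-Roch formula again (or from (3)) we get

The use of the canonical norm among other norms is not essential. We could have chosen an arbitrary positive integer $d_v \ge 1$ for each $v \in \overline{X} \setminus X$ and defined $\|\alpha\|_v := q^{-d_v \cdot v(\alpha)}$. The content of this paper would remain valid with minor modifications. However, it did not seem appealing to the author to allow the freedom of this choice at the cost of heavier notation.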
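The count $|O_{\le N}| = N^n / q^{g-1}$ can be sanity-checked in two standard one-place examples ($n = 1$, $d_0 = 1$), neither of which is taken from the paper: the polynomial ring $\mathbb{F}_q[t]$ (genus 0) and the coordinate ring of an elliptic curve in Weierstrass form (genus 1), where the monomials $x^i y^j$ with $j \in \{0, 1\}$ form a basis and have pole order $2i + 3j$ at the point at infinity. The sketch below, with hypothetical function names, counts basis elements of pole order at most $M$ and compares with the Riemann-Roch prediction for $N = q^M$.

```python
from fractions import Fraction

def predicted_ball_size(N: int, n: int, g: int, q: int) -> Fraction:
    """The Riemann-Roch count N^n / q^(g-1), as an exact rational number."""
    return Fraction(N) ** n / Fraction(q) ** (g - 1)

def dim_polynomial_ring(M: int) -> int:
    """F_q-dimension of {f in F_q[t] : deg f <= M}: basis t^i of pole order i."""
    return sum(1 for i in range(M + 1))

def dim_weierstrass(M: int) -> int:
    """F_q-dimension of the functions with pole order <= M at infinity on an
    elliptic curve y^2 + ... = x^3 + ...: basis x^i y^j (j in {0, 1}),
    where x^i y^j has pole order 2i + 3j."""
    return sum(1 for i in range(M + 1) for j in (0, 1) if 2 * i + 3 * j <= M)

for q in (2, 3, 5):
    for M in range(0, 7):
        # genus 0: valid for all N = q^M >= q^(-1)
        assert q ** dim_polynomial_ring(M) == predicted_ball_size(q ** M, 1, 0, q)
    for M in range(1, 7):
        # genus 1: valid for N = q^M >= q^((2g-1)/n) = q
        assert q ** dim_weierstrass(M) == predicted_ball_size(q ** M, 1, 1, q)

print("counts agree with N^n / q^(g-1)")
```

The genus 1 case shows why the threshold $N \ge q^{(2g-1)/n}$ is needed: for $M = 0$ the space consists of constants only and has $q$ elements, whereas the formula would predict $1$.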
Remark 2. The $A = P_O$ case of Theorem 1.2 can be reduced to the case where $n = |\overline{X} \setminus X| = 1$ (with a general $A \subset P_O$), for which the treatment in the rest of §2 can be much simpler because then we have $\|-\|_O = \|-\|_v = N(-)$ (where $v \in \overline{X} \setminus X$ is the unique element). Since we eventually prove Theorem 1.2 in full strength, we only give a sketch of this reduction argument. Take any point $v \in \overline{X} \setminus X$ and set $X' := \overline{X} \setminus \{v\}$ and $O' := \Gamma(X', O_{\overline{X}})$. We have a canonical injection $i : O' \hookrightarrow O$ and know that all but finitely many associate classes (corresponding to a subset of $X' \setminus X$) of prime elements of $O'$ remain prime elements in $O$. Since $O'^* = \mathbb{F}_q^*$, those exceptional prime elements are finite in number, so in particular have density zero in $P_{O'}$. Let $P^-_{O'}$ be the set of remaining prime elements. Now we apply Theorem 1.

Recall the definition $X = \operatorname{Spec} O$ and that $\overline{X}$ is the complete non-singular curve over $\mathbb{F}_q$ containing $X$ as an open subscheme. Also $n = |\overline{X} \setminus X|$. Let $d$ be a large enough common multiple of the $\deg(v)$ ($v \in \overline{X} \setminus X$). We claim that there exists a function $\varphi \in O$ which has a pole at each $v \in \overline{X} \setminus X$ of order exactly $d / \deg(v)$. For this, for each $v$ consider the set $\Gamma(\overline{X}, O_{\overline{X}}(\tfrac{d}{\deg v} v))$ of rational functions on $\overline{X}$ whose only possible pole is $v$ with order $\le d / \deg(v)$. By the Riemann-Roch formula (see [8]), the inclusion $\Gamma(\overline{X}, O_{\overline{X}}((\tfrac{d}{\deg v} - 1) v)) \subset \Gamma(\overline{X}, O_{\overline{X}}(\tfrac{d}{\deg v} v))$ is a proper one, so there is a function $\varphi_v$ whose only pole is at $v$ and of order exactly $d / \deg(v)$. Choose one such $\varphi_v$ for each $v$ with a common $d$. Then the function $\varphi := \sum_{v \in \overline{X} \setminus X} \varphi_v$ has the claimed property.

Denote also by $\varphi$ the corresponding finite map of curves $\varphi : \overline{X} \to \mathbb{P}^1$. By the choice of $\varphi$ we have an equality of divisors on $\overline{X}$: $\varphi^*(\infty) = \sum_{v \in \overline{X} \setminus X} \frac{d}{\deg(v)} \cdot v =: D$. One can also see that the degree $r$ of the map $\varphi$ equals $\deg(D) = nd$. Let $t$ be the coordinate of $\mathbb{A}^1 \subset \mathbb{P}^1$. We have $\varphi^{-1}(\mathbb{A}^1) = \overline{X} \setminus |D| = X$, so $t$ can be seen as an element of $O$ and assertion (1) holds. Next, let $f(t) \in \mathbb{F}_q[t]$ be a polynomial of degree $e$.
Since the valuation $v_\infty : \mathbb{F}_q(t)^* \to \mathbb{Z}$ at $\infty \in \mathbb{P}^1$ agrees with minus the degree function when restricted to $\mathbb{F}_q[t]$, we have $\|f\|_o := \|f\|_{v_\infty} = q^e$. By (5) we know $v(f) = -\frac{de}{\deg(v)}$ for each $v \in \overline{X} \setminus X$. It follows that $\|f\|_v = q^{de} = (\|f\|_o)^d$ for all $v$. Therefore the assertion (2) and the compatibility assertion hold.

We fix an $o \subset O$ as in Proposition 2.1 throughout the paper. To avoid overloaded notation, we will avoid the use of the canonical norm of $o$ as much as possible and reserve the symbol $\|-\|$ for the canonical norm of $O$. As a consequence we use the following potentially confusing piece of notation: $o_{\le N} := o \cap O_{\le N} = \{f \in o \mid \|f\| \le N\}$. Note that therefore the cardinality $|o_{\le N}|$ is equal, up to a bounded constant, to $N^{1/d}$. Despite this potential confusion, this notation is convenient in the bulk of our discussion.

Let $o \subset O$ be as in Proposition 2.1. We know $O$ is a free $o$-module of rank $r$. Let $\alpha_1, \dots, \alpha_r \in O$ be a basis. One can consider the max norm on $O$ with respect to this basis: $\|\sum_{i} f_i \alpha_i\|_{\alpha} := \max_{1 \le i \le r} \|f_i\|$ for $f_i \in o$. It is an ultrametric norm on the abelian group $O$. Let us recall that two norms $\|-\|_1$ and $\|-\|_2$ on an abelian group $G$ are said to be equivalent if there are positive real numbers $c, C > 0$ such that the next inequality holds on $G$: $c \|x\|_1 \le \|x\|_2 \le C \|x\|_1$. It is easy to see that the equivalence class of the norm $\|-\|_{\alpha}$ is independent of the choice of the basis $\alpha = (\alpha_1, \dots, \alpha_r)$.

The inequality in the other direction is slightly harder. For a notational reason, let us introduce the degree function $\deg := \log_q \|-\|$ on $O \setminus \{0\}$. For integers $M \ge 0$, denote by $O_{\deg \le M}$ the $\mathbb{F}_q$-vector subspace of elements with degree $\le M$; of course one has $O_{\deg \le M} = O_{\le q^M}$. Note that by Proposition 2.1 (2), the element $t \in O$ is multiplicative in the sense that the equality $\|t\alpha\| = \|t\| \cdot \|\alpha\|$ holds for all $\alpha \in O$ rather than a mere inequality. (Actually, all elements of $o$ are multiplicative by (6).) It follows that the following multiplication-by-$t$ map is injective for all $M \ge 0$, where we recall from Proposition 2.1 that $d = \deg(t)$: $t \cdot (-) : O_{\deg \le M} / O_{\deg \le M-d} \to O_{\deg \le M+d} / O_{\deg \le M}$. We claim that it is also surjective for all sufficiently large $M \ge 0$.
There are at least two ways to see this. One is to use the Riemann-Roch theorem which tells us that both sides of (8) have the same dimension for $M$ large enough. The second is more down-to-earth. Let $M_0$ be large enough that all the $\alpha_i$ lie in $O_{\deg \le M_0}$, let $M \ge M_0$, and write a given element $\alpha$ of the target in terms of the basis as $\alpha = \sum_i f_i(t) \alpha_i$ with $f_i(t) \in o$. Since $\alpha_i \in O_{\deg \le M_0} \subset O_{\deg \le M}$ is zero in the group in question, we may assume $f_i(t)$ has no constant term. Then we have a well-defined element $\alpha / t := \sum_i \frac{f_i(t)}{t} \cdot \alpha_i$ which is in $O_{\deg \le M}$ because $t$ is a multiplicative element. Then $\alpha$ is the image of $\alpha / t$ under the map (8).

In any case let $M_0$ be such that (8) is surjective for all $M \ge M_0$. Now since $O_{\deg \le M_0}$ is a finite set, there trivially exists an $e_0 \ge 0$ such that all $\alpha \in O_{\deg \le M_0}$ can be written in the form

By induction on $\deg(\alpha)$ using the bijection (8), the same holds for all $\alpha \in O$. In multiplicative terms, this precisely says there is a positive constant $C = q^{e_0}$ such that $\|x\|_{\alpha} \le C \cdot \|x\|$ for all $x \in O$. This completes the proof of Proposition 2.2.

For the "prime elements in an ideal" case of Theorem 5.2, let $\mathfrak{a} \subset O$ be a non-zero ideal. It is also a free $o$-module of rank $r$. We can consider the restriction of the canonical norm $\|-\|_O$ to $\mathfrak{a}$ and the max norm $\|-\|_{\beta}$ with respect to an $o$-basis $\beta = (\beta_1, \dots, \beta_r)$ of $\mathfrak{a}$.

Proof. While the proof of Proposition 2.2 works for this case just as well, here we present a proof using the proposition. Let $\alpha$ continue to be an $o$-basis of $O$. By Proposition 2.2, it suffices to show that the restriction of $\|-\|_{\alpha}$ to $\mathfrak{a}$ and $\|-\|_{\beta}$ are equivalent. Each $\beta_i$ can be written (uniquely) as $\beta_i = \sum_{1 \le j \le r} g_{ij} \alpha_j$. Take a positive number $C$ such that $C \ge \|g_{ij}\|_O$ for all $i, j$. For an element $x = \sum_i f_i \beta_i = \sum_j (\sum_i f_i g_{ij}) \alpha_j$, by the ultrametricity of $\|-\|_O$ and the choice of $C$ we have: $\|x\|_{\alpha} = \max_j \|\sum_i f_i g_{ij}\|_O \le C \cdot \max_i \|f_i\|_O = C \cdot \|x\|_{\beta}$. Next, by the theory of finitely generated modules over a principal ideal domain (say), we know that there is a non-zero element $f(t) \in o$ such that $f(t) \cdot O \subset \mathfrak{a}$. By the definition of the max norm we can isolate the $f(t)$-factors so that:

This completes the proof.
If we let $f$ be the induced finite map $f$ :

The element $t$ is not a multiplicative element for the canonical norm $\|-\|$ because for example $\|t/s\| = \|(s-1)^3/s^2\| = q^2$ which is not equal to $\|t\| \cdot \|1/s\| = q^2 \cdot q^1$. Instead $t$ is a multiplicative element for the following linear norm $\|-\|' : O \to \mathbb{R}_{\ge 0}$:

The arguments of the proof of Proposition 2.2 show that the max norm on $O \cong o^3$ is equivalent to $\|-\|'$. However, one easily sees that $\|-\|'$ is not equivalent to $\|-\|$; for example, one has $\|1/s^n\| = q^n$ and $\|1/s^n\|' = q^{2n}$ so their ratio is not bounded.

Recall from (7) that we endow $o$ with the induced norm from $O$ and so we have $N^{1/d} < |o_{\le N}| \le q \cdot N^{1/d}$ where the right hand inequality becomes an equality when $N$ is a power of $q^d$. Let us note the following simple observation.

Proof. Let us denote by $\|-\|_{o^t}$ and $\|-\|_{o^r}$ the max norms on $o^t$ and $o^r$ with respect to the standard bases. Choose an $o$-linear section $\sigma$ :

Also, for each choice of $a_{r+1}, \dots, a_t \in o_{\le N}$ we have by the same reasoning:

for all choices of $a_i$'s as above. Since this last element is in $\varphi^{-1}(x)$, our claim follows.

Corollary 2.5. Let $\varphi : o^t \to \mathfrak{a}$ be a surjective $o$-linear map. Then there exists a positive number $U > 0$ such that for all $N > 0$ and $x \in \mathfrak{a}_{\le N}$, the set $\varphi^{-1}(x) \cap o^t_{\le UN}$ has at least $(N^{1/d})^{t-r}$ elements.

Proof. This immediately follows from Proposition 2.2, Corollary 2.3 and Lemma 2.4.

It is known that its image has rank $n - 1$:

Since $\operatorname{Pic}(\overline{X}) = \mathbb{Z} \oplus (\text{finite})$ and the connecting map $\delta$ has nontrivial image into the $\mathbb{Z}$-part, the claim (9) follows. Now consider the map $L : O \setminus \{0\} \to \mathbb{R}^n$, $\alpha \mapsto (\log \|\alpha\|_v)_{v \in \overline{X} \setminus X}$. It is an obvious analog of the multiplicative Minkowski map $L$ that was also used in [6, §4]. By (2) we know that $L$ maps the subset $O^*$ into the hyperplane $H$ of $\mathbb{R}^n$ defined by: $x_1 + \dots + x_n = 0$.

is bounded from above. Note from (1) and (2)

Proof. Consider the function $f$ :

It fits into the following commutative diagram:

Let $H \subset \mathbb{R}^n$ be as in (10) and $\pi : \mathbb{R}^n \to H$ the projection along the vector $(1, \dots, 1)$. Since the value of $f$ is unchanged by the translation along the vector $(1, \dots, 1) \in \mathbb{R}^n$, it follows that the norm-length compatibility of a subset $D \subset O \setminus \{0\}$ is equivalent to the boundedness of the non-negatively valued function $(x_1, \dots, x_n) \mapsto \max_i x_i$ on the set $\pi(L(D)) \subset H$. This is equivalent to the boundedness of the set $\pi(L(D))$ itself; recall that one can define the notion of boundedness of a subset of $\mathbb{R}^n$ using any choice of a linear norm and the resulting notion is independent of the choice. It follows that for any given bounded subset $\Delta \subset H$,

Recall by (2) and (9) that $L(O^*)$ is a full-rank lattice of $H$. Let $\Delta \subset H$ be any bounded complete set of representatives for the quotient $H / L(O^*)$ (say a fundamental parallelogram). The inverse image $\pi^{-1}(\Delta) \subset \mathbb{R}^n$ is a complete set of representatives for the quotient $\mathbb{R}^n / L(O^*)$. By the previous paragraph, the set $L^{-1}(\pi^{-1}(\Delta)) \subset O \setminus \{0\}$ is norm-length compatible. It is acted on by the group $\mathbb{F}_q^* = \ker(L : O^* \to \mathbb{R}^n)$ and the natural map of quotient sets

For $\alpha \in O$, one can consider its $O^*$-orbit $\alpha O^*$. We will need the following bound. For the big-O notation, see Notation at the end of the Introduction.

The term "+1" is there only to cover the rare case $N(\alpha) = N^n$ (the largest possible). For the application in this paper, its corollary

Proof. We may assume $\alpha \neq 0$. For a real number $C$, let $\mathbb{R}^n_{\le C}$ be the set of points (

Sending the set in question by $L$, we see:

So it suffices to bound the size of the set on the right hand side. By the translation "$-L(\alpha)$" we have a bijection

Now since $\Gamma$ is a lattice in $H$, the cardinality of a set of the form $\Gamma \cap \Delta'_{\le C}$ is asymptotically proportional to $C^{n-1}$ as $C \to +\infty$ with error bounded independently of the specific translation $\Delta_{\le C} \rightsquigarrow \Delta'_{\le C}$; see for example [4, Appendix A]. This completes the proof.

3. Szemerédi's theorem

Let us keep the notation $O$ and $o$ from the previous section. Namely $O$ is a Dedekind domain finitely generated over $\mathbb{F}_q$ in which $\mathbb{F}_q$ is integrally closed and $o = \mathbb{F}_q[t]$ is a subring of $O$ as in Proposition 2.1.
In case the reader is interested in the "prime elements of ideals" case of Theorem 5.2, let $\mathfrak{a} \subset O$ be a non-zero ideal. (If not, they can always assume $\mathfrak{a} = O$.) Recall that an $o$-homothetic copy of a (finite) subset $S \subset \mathfrak{a}$ is by definition a subset of $\mathfrak{a}$ of the form $a \cdot S + \beta = \{a\alpha + \beta \mid \alpha \in S\}$ with $a \in o$ and $\beta \in \mathfrak{a}$. It is said to be non-trivial if we can take $a \neq 0$. The next definition is a rather straightforward translation of [6, Definition 5.3].

Definition 3.1. (1) A finite subset $S \subset \mathfrak{a}$ is said to be a standard shape if it contains 0 and generates $\mathfrak{a}$ as an $o$-module. (The word "shape" is used because $\mathbb{Z}$-homothetic copies of $S$ in an abelian group are also called constellations with shape $S$, especially in the context of torsion-free abelian groups.) In the rest of this Definition, assume $S$ is a standard shape.

(2) Write $k = |S|$ and give $S$ a numbering $S = \{s_1, \dots, s_{k-1}, s_k = 0\}$ for ease of notation. Let

Let us denote elements of the sum $o^k \oplus o^k$ by symbols like $(x_i^{\pm})_{1 \le i \le k}$. Given an index $1 \le j \le k$ and a function $\omega : \{1, \dots, \hat{j}, \dots, k\} \to \{\pm\}$, we obtain an element

Let us call this map the restriction along $\omega$ and denote it by

(3) Let $\rho > 0$ be a positive real number and $N_0 > 0$ a positive integer. A non-negatively valued function $\lambda : O \to \mathbb{R}_{\ge 0}$ is said to be an $(S, N_0, \rho, o)$-pseudorandom measure if for every choice of the data below, including a subset $\Omega \subset \bigsqcup_{1 \le j \le k} \{\pm\}^{\{1, \dots, \hat{j}, \dots, k\}}$ (namely a set of pairs $(j, \omega)$ of an index $1 \le j \le k$ and a function $\omega : \{1, \dots, \hat{j}, \dots, k\} \to \{\pm\}$), we have the inequality:

We will need the following form of the relative Szemerédi theorem.

Then the following inequality holds:

In particular there exist non-trivial $o$-homothetic copies of $S$ contained in $A$.

This statement and its proof are an immediate analog of [6, Theorem 5.4]. Nonetheless we write down the proof for the convenience of the reader. For this we have to recall the notion of weighted hypergraphs and borrow results on them from combinatorics.
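The notion of an $o$-homothetic copy recalled above can be illustrated in the toy case $\mathfrak{a} = O = o = \mathbb{F}_2[t]$. The sketch below uses an integer bitmask encoding and hypothetical helper names, both assumptions made for this example. It takes $S$ to be the $\mathbb{F}_2$-span of $1$ and $t$, which contains 0 and generates $o$ as an $o$-module, and checks that a non-trivial homothetic copy is a 2-dimensional affine subset, as used in the deduction of Theorem 1.2.

```python
# Polynomials over F_2 are encoded as ints: bit i is the coefficient of t^i.

def poly_mul(a: int, b: int) -> int:
    """Carry-less product, i.e. multiplication in F_2[t]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def homothetic_copy(a: int, S, beta: int):
    """The set a*S + beta = {a*s + beta : s in S}; it is non-trivial when a != 0."""
    return {poly_mul(a, s) ^ beta for s in S}

def is_affine_subset(points) -> bool:
    """A non-empty set is F_2-affine iff its translate through 0 is closed under addition."""
    pts = sorted(points)
    directions = {p ^ pts[0] for p in pts}
    return all(u ^ v in directions for u in directions for v in directions)

S = {0b00, 0b01, 0b10, 0b11}   # the F_2-span of 1 and t
a = 0b110                      # a = t^2 + t, a non-zero element of o
beta = 0b1011                  # beta = t^3 + t + 1

copy = homothetic_copy(a, S, beta)
print(sorted(copy), is_affine_subset(copy))

trivial = homothetic_copy(0, S, beta)   # a = 0 collapses S to the single point beta
print(trivial)
```

Multiplication by a non-zero $a$ is injective and $\mathbb{F}_2$-linear, so the copy has the same dimension as $S$; the trivial copy with $a = 0$ shows why non-triviality must be required.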
We will content ourselves with the following narrower definition than usual.

Definition 3.3. An $r$-uniform weighted hypergraph consists of the following data:
• a finite set $J$;
• a finite set $V_i$ of vertices given for each $i \in J$;
• for each subset $e \subset J$ with cardinality $r$, a weight function $\nu_e : \prod_{i \in e} V_i \to \mathbb{R}_{\ge 0}$.

The case where the $\nu_e$ have values in $\{0, 1\}$ recovers the notion of an $r$-uniform hypergraph by interpreting the value 0 as "no $r$-edge" and 1 as "an $r$-edge." The case $r = 2$ corresponds to classical (weighted, $|J|$-partite) graphs.

Consider the product $\prod_{i \in J} V_i \times \prod_{i \in J} V_i$ and denote its elements by symbols like $(x_i^{\pm})_i$. In parallel with the above, for a subset $e \subset J$ and a function $\omega : e \to \{\pm\}$ we get an element $(x_i^{\omega(i)})_{i \in e} \in \prod_{i \in e} V_i$. For a positive real number $\rho > 0$, an $r$-uniform weighted hypergraph as above is said to be $\rho$-pseudorandom if for all choices of a subset $\Omega \subset \bigsqcup_{e \subset J, |e| = r} \{\pm\}^e$ (namely a set of pairs $(e, \omega)$ of a subset $e \subset J$ with $|e| = r$ and a function $\omega : e \to \{\pm\}$), the following estimate holds:

The next theorem is a deep result from combinatorics.

Theorem 3.5. Let $0 < r \le k$ be positive integers and $\varepsilon > 0$ be a positive real number. Then there exist positive real numbers $\gamma = \gamma(r, k, \varepsilon)$ and $\rho = \rho(r, k, \varepsilon) > 0$ such that the following holds. Let $((V_i)_{i \in J}, (\nu_e)_{e \subset J, |e| = r})$ be a $\rho$-pseudorandom $r$-uniform weighted hypergraph. Suppose given a subset $E_e \subset \prod_{i \in e} V_i$ for each $e \subset J$ with $|e| = r$ and suppose that the following inequality holds:

Then there is a family of subsets $E'_e \subset E_e$ for $e \subset J$ with $|e| = r$ such that:

Proof. Conlon-Fox-Zhao [2, Theorem 2.12] state this in a slightly different way but their proof actually shows our current statement. See [6, Theorem 5.10] for details.

Recall that we write $X = \operatorname{Spec} O$ and let $\overline{X}$ be the complete non-singular curve containing $X$ as an open subscheme. Also let us recall some integer quantities:

Proof of Theorem 3.2. Let $S \subset \mathfrak{a}$ and $\delta > 0$ be as in the statement. Recall $k = |S|$. Consider the $o$-linear map $\varphi_S : o^{k-1} \twoheadrightarrow \mathfrak{a}$ in Definition 3.1.
By Corollary 2.5, there is a constant $U > 0$ such that for every $N \ge 1$ and $\alpha \in \mathfrak{a}_{\le N}$, the set $\varphi_S^{-1}(\alpha) \cap o^{k-1}_{\le UN}$ contains at least $(N^{1/d})^{k-1-r}$ elements. Using Theorem 3.5, we fix a positive number $\varepsilon$ and set $\gamma := \gamma(k-1, k, \varepsilon)$, $\rho := \rho(k-1, k, \varepsilon)$; the motivation for these choices will only be clear later.

Now suppose we are given an $(S, N_0, \rho, o)$-pseudorandom function $\lambda : \mathfrak{a} \to \mathbb{R}_{\ge 0}$ with the above $\rho$ and some $N_0 \ge 0$. Also let $N \ge N_0$ and suppose the subset $A \subset O_{\le N}$ satisfies (12) and (13). Out of these data, we construct a $(k-1)$-uniform weighted hypergraph $((V_i)_{1 \le i \le k}, \nu_j : \prod_{i \neq j} V_i \to \mathbb{R}_{\ge 0})$ as follows. The vertex sets are:

To define the weight functions, note that for any index $1 \le j \le k$ and tuple $(H_i)_i \in \prod_{i \in \{1, \dots, \hat{j}, \dots, k\}}$

be the map sending a given tuple to this point. Its restriction to the $j$th summand shall be denoted by $T_j$ if we need to emphasize the domain of definition. By abuse of notation

where $\mathrm{pr}_j$ is the projection dropping the $j$th entry. We define the weight functions for $1 \le j \le k$:

We can specify tuples $(H_i)_{1 \le i \le k}$ of hyperplanes by tuples $(h_i)_{1 \le i \le k} \in (o_{\le UN})^k$ of scalars appearing in their defining equations. This gives us the left-hand vertical map in the following commutative diagram, where the map $\psi_{S,j}$ was defined in Definition 3.1:

It follows that the estimate (11) implies the estimate (15) for the weighted hypergraph at hand. Thus it is $\rho$-pseudorandom. Define a subset $E_j \subset \prod_{i \neq j} V_i$ for each $j$ by:

The significance of these sets is as follows: given an

be the $i$-th standard vector and let $e_k = 0$ be the zero vector. We claim the following inequality:

Toward contradiction, suppose otherwise. The assumption (13) is equivalently formulated as

It follows that the expectation computed on $(o_{\le UN})^{k-1} \times o_{\le UN}$ is also $\le \gamma$.
By the definitions of $\nu_j$ and $E_j$ as pullbacks, we know that the following commutes:

Also recalling the definition of $T$ we find that this last inequality precisely says that the hypothesis (16) of Theorem 3.5 is satisfied for our situation. Therefore there is a family of subsets $E'_i \subset E_i$ as in Theorem 3.5. We claim that the existence of such $E'_i$ leads to the negation of (12). Define a map

We have an equality of maps

Endow this set with the following filtration; for the sake of space, we write also

Hence this set is the disjoint sum of the successive complements so we can define a map, the restriction $\mathrm{pr}$ :

Then define a map $\iota$ :

We see that $T \circ \iota = \mathrm{id}$ and in particular $\iota$ is injective. So far we have obtained the following commutative diagram:

where we know the fibers of the map $\varphi_S$ have cardinality $> (N^{1/d})^{k-1-r}$ by the choice of $U > 0$. It follows that

By (17) we know that the left hand side is bounded by

By these inequalities and the fact (4) (or (3)) that $|\mathfrak{a}_{\le N}| \ge N^n / (N(\mathfrak{a}) q^{n d_0 + g - 1})$ for $N \ge N(\mathfrak{a}) q^{2g-1}$ we get:

$= \delta$, contradicting (12). This shows the claimed inequality (20).

To conclude, let us deduce (14) from (20). First, if the elements $a \in o$ and $\beta \in \mathfrak{a}_{\le N}$ satisfy $as + \beta \in A \subset \mathfrak{a}_{\le N}$ for an $s \in S \setminus \{0\}$, it is necessarily true that $as \in \mathfrak{a}_{\le N}$ because $\mathfrak{a}_{\le N}$ is a subgroup. Since $\|s\| \ge 1$, we necessarily have $\|a\| \le N$. It follows that the terms with $\|a\| > N$ do not contribute to the expectation (20) so that we obtain

Note that the fibers of $\varphi_S : o^{k-1}_{\le UN} \to \mathfrak{a}$ have cardinality $\le |o_{\le UN}|^{k-1-r}$ and so the same is true for the vertical map in the next commutative diagram:

It follows that

If we divide both sides by $|o_{\le UN}|^{k-1} \cdot |o_{\le N}|$ this precisely says: $\frac{|\mathfrak{a}_{\le N}|}{|o_{\le UN}|^r} \cdot (\text{L.H.S. of (14)}) \ge (\text{L.H.S. of (20)}) > \gamma$. Note by (4) that the first factor on the left hand side is smaller than 1 at least if $g \ge 1$ or $U \ge q$. It follows that the asserted inequality (14) holds. This completes the proof of Theorem 3.2.

The definitions and results are parallel to those in [6, §6].
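Given a non-zero ideal $\mathfrak{a}$ of $O$, we define the function $\Lambda^{\mathfrak{a}}_{R,\chi} : \mathfrak{a} \to \mathbb{R}$ by the composition:

Note that the membership $\alpha \in \mathfrak{a}$ implies that $\alpha \mathfrak{a}^{-1}$ is a (non-zero) ideal of $O$ so that the above composition is well defined. Below we use the notation

be $o$-linear maps whose cokernels are finite and such that $\ker(\varphi_i)$ does not contain $\ker(\varphi_j)$ whenever $i \neq j$. Then there are large positive numbers $R_0 > 1$, $w_0 > 1$ and a small one $0 < f_0 < 1$ such that for every choice of the quantities below:
• real numbers $R \ge R_0$ and $w \ge w_0$ such that

Note that only those terms with $N(\mathfrak{b}_i) \le \log R$ and $N(\mathfrak{c}_i) \le \log R$ for all $i$ contribute to the sum because $\operatorname{Supp} \chi \subset [-1, 1]$. Define an $o$-linear map $\overline{\varphi}$ by

$\mathfrak{a}$ and $\overline{b}$ for its residue class in $\bigoplus_i \mathfrak{a} / \mathfrak{a} \cdot (\mathfrak{b}_i \cap \mathfrak{c}_i)$. Then for $x \in o^t$, the condition that

is equivalent to the equality $W \overline{\varphi}(x) + \overline{b} = 0$, namely the equality $\overline{F}(x) := 1_{\{0\}}(W \overline{\varphi}(x) + \overline{b}) = 1$. It follows that we have a commutative diagram:

The next assertion paves the way for the computation of the $E(-)$ term in (22).

where $\pi$ runs through the associate classes of prime elements of $o$ and $\mathfrak{b}^{(\pi)}$ is a $\pi$-ideal. We call $\mathfrak{b}^{(\pi)}$ the $\pi$-part of $\mathfrak{b}$. The $\pi$-part of a tuple of ideals shall mean the tuple of the $\pi$-parts of its entries:

for the quantity in (25). It depends also on the data of the $\varphi_i$ but we do not include it in the notation. The quantity $E((\mathfrak{b}, \mathfrak{c}); W, b)$ decomposes into the product of its $\pi$-parts; namely, $E((\mathfrak{b}, \mathfrak{c}); W, b) = \prod_{\pi} E((\mathfrak{b}, \mathfrak{c})^{(\pi)}; W, b)$, where $\pi$ runs through the associate classes of the prime elements of $o$.

Proof. By the Chinese Remainder Theorem, the map $\overline{\varphi} : (o/I)^t \to \bigoplus_i \mathfrak{a} / \mathfrak{a}(\mathfrak{b}_i \cap \mathfrak{c}_i)$ decomposes into its $\pi$-parts, that is, $\overline{\varphi} = \bigoplus_{\pi} \overline{\varphi}^{(\pi)}$, where $\overline{\varphi}^{(\pi)}$ is the map defined by (23) with $(\mathfrak{b}, \mathfrak{c})^{(\pi)}$ in place of $(\mathfrak{b}, \mathfrak{c})$. Then in (25), the $\{0, 1\}$-valued function $\overline{F}$ decomposes into the product of functions $\overline{F}^{(\pi)} : o/I^{(\pi)} \to \{0, 1\}$ which are defined exactly as $\overline{F}$ with $(\mathfrak{b}, \mathfrak{c})^{(\pi)}$ in place of $(\mathfrak{b}, \mathfrak{c})$. By a Fubini type computation our assertion now follows.

Our next task is to evaluate $E((\mathfrak{b}, \mathfrak{c})^{(\pi)}; W, b)$.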
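The "Fubini type computation" behind the product decomposition can be seen in a toy case with $o = \mathbb{F}_2[t]$ and $t = 1$ variable. In the sketch below the modulus $I = t^2 (t^2 + t + 1)$, the two indicator functions and all helper names are assumptions chosen for illustration; they are not the functions $\overline{F}^{(\pi)}$ of the paper. The point is only that, by the Chinese Remainder Theorem, the expectation over $o/I$ of a product of functions of the separate $\pi$-parts equals the product of the expectations.

```python
from fractions import Fraction

# Polynomials over F_2 are encoded as ints: bit i is the coefficient of t^i.

def deg(f: int) -> int:
    return f.bit_length() - 1

def poly_mod(f: int, g: int) -> int:
    """Remainder of f modulo g in F_2[t] (g non-zero)."""
    while deg(f) >= deg(g):
        f ^= g << (deg(f) - deg(g))
    return f

def poly_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

I1 = 0b100                 # t^2, the (t)-part of the modulus
I2 = 0b111                 # t^2 + t + 1, the part at another prime
I = poly_mul(I1, I2)       # I = I1 * I2, with I1 and I2 coprime

def residues(modulus: int):
    """Representatives of o/(modulus): all polynomials of degree < deg(modulus)."""
    return range(1 << deg(modulus))

def F1(x: int) -> int:
    """Indicator of x = 0 modulo I1."""
    return 1 if poly_mod(x, I1) == 0 else 0

def F2(x: int) -> int:
    """Indicator of x = 1 modulo I2."""
    return 1 if poly_mod(x, I2) == 1 else 0

def expectation(func, domain) -> Fraction:
    return Fraction(sum(func(x) for x in domain), len(domain))

# Chinese Remainder Theorem: x -> (x mod I1, x mod I2) is a bijection on residues mod I.
pairs = {(poly_mod(x, I1), poly_mod(x, I2)) for x in residues(I)}

lhs = expectation(lambda x: F1(x) * F2(x), residues(I))
rhs = expectation(F1, residues(I1)) * expectation(F2, residues(I2))
print(lhs, rhs)
```

Both sides come out as the same rational number, which is the one-variable shadow of the identity $E((\mathfrak{b}, \mathfrak{c}); W, b) = \prod_\pi E((\mathfrak{b}, \mathfrak{c})^{(\pi)}; W, b)$.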
The computation is divided into two cases, depending on whether $\pi$ is small or large in norm. Let us use Greek letters $\alpha, \beta, \gamma$ to denote ideals when they are assumed to be $\pi$-ideals for a fixed prime element $\pi \in o$.

Proof. Let us consider the case (1). In this case, we know $W \in \pi o$ by our assumption on the prime factors of $W$ in Theorem 4.2. We claim the value $W \overline{\varphi}(x) + \overline{b}$ is never 0 in $\bigoplus_i \mathfrak{a} / \mathfrak{a}(\beta_i \cap \gamma_i)$. Indeed, choose any $i$ with $\beta_i \cap \gamma_i \subsetneq O$ (which we are assuming to exist) and any of its prime factors $\mathfrak{p}$. It is a prime ideal over $\pi o$, and hence $W \in \mathfrak{p}$. It follows that

Our claim follows.

Next we consider the case (2). In this case the ideals $\beta_i, \gamma_i$ are all coprime to $W$ in $O$. For the case (2a), it suffices to show that the map $W \varphi_i(-) + b_i : (o/I)^t \to \mathfrak{a} / \mathfrak{a}(\beta_i \cap \gamma_i)$ is surjective. Since the translation $+ b_i$ and the multiplication-by-$W$ map on $\mathfrak{a} / \mathfrak{a}(\beta_i \cap \gamma_i)$ are both bijective, it suffices to show that the map $(o/I)^t \to \mathfrak{a} / \mathfrak{a}(\beta_i \cap \gamma_i)$ induced by $\varphi_i$ is surjective when $w$ is large enough. By assumption $\operatorname{coker}(\varphi_i)$ is an $o$-module which is a finite abelian group. Hence there are only finitely many prime ideals $(\pi)$ of $o$ satisfying $\pi \cdot \operatorname{coker}(\varphi_i) \subsetneq \operatorname{coker}(\varphi_i)$. Now assume $w$ exceeds the norms of those $\pi$'s. Then as long as $\beta_i$ and $\gamma_i$ are $\pi$-ideals and $N(\pi o) > w$, we have $(\beta_i \cap \gamma_i) \cdot \operatorname{coker}(\varphi_i) = \operatorname{coker}(\varphi_i)$, i.e., the map (26) is surjective.

Let us consider the case (2b). First we specify how large $w$ should be. We are assuming that the $\ker(\varphi_i)$'s do not contain each other. For each pair of distinct indices $i, j$, choose an element $x_{ij} \in \ker(\varphi_i) \setminus \ker(\varphi_j)$. Since $\varphi_j(x_{ij}) \in \mathfrak{a}$ is non-zero, there are only finitely many prime ideals $\mathfrak{p} \subset O$ with $\varphi_j(x_{ij}) \in \mathfrak{p}\mathfrak{a}$. Let $w$ exceed the norms of all the $\mathfrak{p}$'s appearing this way. To show (2b) it suffices to verify that the image of the map $\overline{\varphi}$ has cardinality $\ge N(\pi o)^2$. Suppose $i \neq j$ are among the indices with $\beta_i \cap \gamma_i \subsetneq O$ and let $\mathfrak{p}_i, \mathfrak{p}_j$ be prime ideals containing them. We show that the image of the next further composition has cardinality $\ge N(\pi o)^2$.
The images of the two elements $x_{ij},x_{ji}$ are respectively the residue classes of $(0,\varphi_j(x_{ij}))$ and $(\varphi_i(x_{ji}),0)$, and both are non-zero by the very choice of $w$. It follows that their $o/(\pi)$-linear combinations are all distinct (note that the target is an $o/(\pi)$-vector space). Therefore the image of the map (27) contains at least $|o/(\pi)|^2$ distinct elements.

Now we want to plug our results here into (22), but to proceed further, we need the help of Fourier analysis. Let $\hat\chi$ be the Fourier transform of the function $x\mapsto e^{x}\chi(x)$ so that by inverse Fourier transform:

By the theory of Fourier analysis, we know that $\hat\chi$ decays rapidly:

Proof. See any textbook on Fourier analysis or [6, Lemma 6.15 and its corollary].

The right hand side of (22) is written as:

For tuples $(\xi,\eta)\in I^{2m}$, consider the infinite sum

This subsection is devoted to the proof of:

Proposition 4.7. The sum $E((\xi,\eta),R,w,b)$ converges absolutely and uniformly in $(\xi,\eta)\in\mathbb{R}^{2m}$. For any given $A>0$, the quantity (28) is equal, up to an error $\pm O_{A,\chi,m,r}((\log R)^{-A})$, to:

Note that by the presence of the Möbius function, only those terms where all $\mathfrak{b}_i$ and $\mathfrak{c}_i$ are square-free contribute to the sum (29) and that the sum decomposes into the product of its $\pi$-parts by the multiplicativity of the functions involved (see Lemma 4.4 for the multiplicativity of $E((\mathfrak{b},\mathfrak{c});W,b)$). Namely:

Note that there are at most $r$ prime ideals of $O$ over a given $(\pi)$ and hence there are at most $2^r$ square-free $\pi$-ideals. By Lemma 4.5, if $\mathbf{N}(\pi o)\le w$ then $E^{(\pi)}((\xi,\eta),R,w,b)=1$; we also know the following when $\mathbf{N}(\pi o)>w$, supposing (as we shall always do) that $w$ is large enough to invoke the lemma:

• For those exceptional cases in the previous item, we know $E((\beta,\gamma);W,b)=1$ for the first case and $=1/\mathbf{N}(\pi o)$ for the others.

This gives us the following crude estimate, where $O_r(1)$ can be taken to be $2^r$:

In particular we have $E^{(\pi)}((\xi,\eta),R,w,b)=1+O_r\bigl(\mathbf{N}(\pi o)^{-1-\frac{1}{\log R}}\bigr)$ uniformly in $\xi,\eta$.
Therefore by basic facts on Euler products (such as [6, Lemma 6.19]) we conclude that the product $\prod_{(\pi)}E^{(\pi)}((\xi,\eta),R,w,b)$ converges absolutely. As a result, the sum of absolute values associated with the sum $E((\xi,\eta),R,w,b)$ in (29) can be estimated as:

proving the convergence claim in Proposition 4.7.

Now let us move on to the comparison of (28) and (30). We want to replace the domain $\mathbb{R}$ of integration in (28) by the bounded interval

We have the following estimate:

which has at most the claimed size by Lemma 4.6.

(2) Apply (1) to each of $\mathfrak{b}_i$ and $\mathfrak{c}_i$ and take the product, taking into account the bound

This completes the proof.

Apply the operation $(\log R)^{2m}\,\cdots\,E((\mathfrak{b},\mathfrak{c});W,b)$ to the estimate of Lemma 4.8 (2). The left hand side becomes precisely (28). The main term of the right hand side becomes

We claim that we can interchange the sum and integral here. Indeed, by the convergence part of Proposition 4.7, i.e., formula (33), the sum $\sum_{(\mathfrak{b},\mathfrak{c})}F((\mathfrak{b},\mathfrak{c}),\xi,\eta)$ converges absolutely and uniformly in $\xi$ and $\eta$ to a continuous function. Since $I$ is a bounded closed interval, our claim follows, so that the value (34) equals (30). The error term is at most the following, which we can bound again by (33):

The proof of Proposition 4.7 is now complete.

Intermission. Before proceeding further, let us recall basic facts from elementary calculus and the theory of the zeta function. The absolute constants $c_i$ and $C_i$ appearing below can be made explicit, but we do not seek to do so because their precise values are not important. Potentially big constants are denoted in upper case and potentially small ones in lower case. From §4.4 on, when we say some quantities should be small or large enough, we will be implicitly using these constants to specify the thresholds.

There is a positive real number $c_1>0$ such that for all $\varepsilon\in\mathbb{C}$ with $|\varepsilon|\le c_1$ one has (35) $e^{\varepsilon}=1+O(1)\cdot\varepsilon$ and

(Actually one can take $c_1:=1/2$, say.)
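The choice $c_1=1/2$ can be sanity-checked numerically. The sketch below samples the disc $|\varepsilon|\le 1/2$ and records the largest observed value of $|e^{\varepsilon}-1|/|\varepsilon|$. It also records $|\log(1+\varepsilon)|/|\varepsilon|$, under the assumption (not confirmed by the text as extracted) that the companion estimate is the analogous one for the logarithm. The function name `worst_ratios` and the grid sizes are hypothetical, and a finite grid only gives evidence, not a proof.

```python
import cmath
import math


def worst_ratios(c1=0.5, radial=50, angular=72):
    """Largest sampled values of |exp(eps)-1|/|eps| and |log(1+eps)|/|eps|
    over a polar grid of points eps with 0 < |eps| <= c1."""
    worst_exp = 0.0
    worst_log = 0.0
    for i in range(1, radial + 1):
        r = c1 * i / radial
        for k in range(angular):
            eps = cmath.rect(r, 2 * math.pi * k / angular)
            worst_exp = max(worst_exp, abs(cmath.exp(eps) - 1) / r)
            # 1 + eps has real part >= 1/2, so the principal branch is the right one.
            worst_log = max(worst_log, abs(cmath.log(1 + eps)) / r)
    return worst_exp, worst_log


ratio_exp, ratio_log = worst_ratios()
print(ratio_exp, ratio_log)
```

Both sampled ratios stay below $2$, consistent with the power series bounds $|e^{\varepsilon}-1|\le|\varepsilon|/(1-|\varepsilon|)$ and $|\log(1+\varepsilon)|\le|\varepsilon|/(1-|\varepsilon|)$ for $|\varepsilon|<1$.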
Next, for real numbers $A\ge 2$ we have Taylor expansion

Noting that the positively valued function $A\mapsto\frac{\log A}{A}$ has bounded range, there are $c_2>0$ and $C_3>0$ such that for all $A\ge 2$ and $\varepsilon\in\mathbb{C}$ with $|\varepsilon|\le\frac{c_2}{\log A}$ we have

Also, since the function $A\mapsto\frac{\log A}{A}$ decreases for $A\ge e$, for prime ideals $\mathfrak{p}\subset O$ with $\mathfrak{p}\cap o=(\pi)$ we have

(38) $\frac{\log\mathbf{N}(\mathfrak{p})}{\mathbf{N}(\mathfrak{p})}\le\frac{\log\mathbf{N}(\pi o)}{\mathbf{N}(\pi o)}$

at least if $\mathbf{N}(\pi o)\ge 3$. This inequality happens to be true even when $\mathbf{N}(\pi o)=2$ thanks to the equality $\frac{\log 4}{4}=\frac{\log 2}{2}$.

4.3.2. The zeta function. We need to recall the zeta function of $O$. For $s\in\mathbb{C}$ with $\operatorname{Re}(s)>1$, we set:

It is known that $\zeta_O(s)$ extends to a meromorphic function on $\mathbb{C}$. Actually we know [8, Theorem 5.9, p.53] that there is a polynomial $L(u)\in\mathbb{Z}[u]$ of degree $2g$, with $L(q^{-1})=|\mathrm{Pic}^0(X)|/q^g$, such that:

It follows that $\zeta_O(s)$ has a simple pole at $s=1$ with positive residue, say $\kappa_O>0$:

Moreover, by Weil's Riemann Hypothesis for algebraic curves [8, Theorem 5.10, p.55], we know that all the roots of $L(u)$ in $\mathbb{C}$ have magnitude $q^{-1/2}$. Chebotarëv's density theorem 5.3 below is an important consequence of this.

By explicit computation we know $\zeta_{\mathbb{F}_q[t]}(s)=1/(1-q^{1-s})$ (see [8, p.11]). From this it follows for integers $i\ge 1$:

see [8, Theorem 2.2, p.14]. This gives the following. The sums are over maximal ideals $\pi o\subset o$ satisfying the indicated conditions:

for some $C_4,C_5>0$ and all $w>1$.

Euler product. Now we compute the main term (30) using the Euler product presentation (31) and estimate (32). We start with some detailed estimate of the Euler product. Recall that the latter estimate requires $\mathbf{N}(\pi o)>w$ and that $w$ be large enough. Assume $w$ is large enough to match this requirement. Then by (32) and basic facts like

we have for $\mathbf{N}(\pi o)>w$:

Take the product of (43) over all $\pi o$ with $\mathbf{N}(\pi o)>w$. By the definition of the zeta function $\zeta_O$ in (39), we get the following.
There, the symbol $\prod_{\mathfrak{p}\text{ with }\mathbf{N}(\pi o)\le w}$ means the product over prime ideals $\mathfrak{p}$ of $O$ such that $\pi o:=\mathfrak{p}\cap o$ satisfies the indicated condition:

Here the last factor has been obtained using (35), (36) via $\exp\circ\log=\mathrm{id}$ and (41) as follows:

where for the last estimate we have to assume $O_r(1)/(w\log_q w)$ is small enough. Since we have $m$ such factors, we get the factor $1+mO_r(1/(w\log_q w))$.

Formula (40) can be written as

we find that the product of zeta functions in (44) has the following form when $R$ is large enough:

We have to compute the products $\prod_{\mathfrak{p}\text{ with }\mathbf{N}(\pi o)\le w}$ in (44) as well. By (37) we know for small complex numbers $\varepsilon$:

The product of the first factors is $\frac{\phi_O(W)}{\mathbf{N}(WO)}$. For the second factors, by (38), (42) and the fact that the number of prime ideals $\mathfrak{p}$ over a given $\pi o$ is at most $r$, for small $\varepsilon$ we have:

(product of the second factors in (45)) $=$ $(\pi)$ with $\mathbf{N}(\pi)\le w$

We apply this to $\varepsilon=\frac{1}{\log R}(-1+\xi\sqrt{-1})$, with $\xi=\xi_i$ or $\eta_i$, and to 1 log log R is smaller than $mr\log_q w$ times an absolute constant, then the product of the products $\prod_{\mathfrak{p}\text{ with }\mathbf{N}(\pi o)\le w}$ in (44) is of the form:

with $O(1)$ an absolute constant. By (44)-(46), we get an estimate:

The error factor above is $1+O_{m,r}\bigl(\frac{1}{w\log_q w}\bigr)+O_{O,r,m}\bigl(\frac{\log_q w}{\sqrt{\log R}}\bigr)$.

We are ready to compute (30). Let us recall for the convenience of reference:

Proof. By substituting (47) we get

The value above is estimated as:

We want to bound the integral $\int_{I^{2m}}|F(\xi,\eta)|\,d\xi\,d\eta$. By Lemma 4.6, we know for any $A>0$

and therefore for any $A>0$

It follows that (by taking $A:=2$ for example)

is also a finite value.

Next we consider the integral $\int_{I^{2m}}F(\xi,\eta)\,d\xi\,d\eta$. We want to replace $I^{2m}$ by $\mathbb{R}^{2m}$. Consider the following partition of the domain of integral:

where $J$ runs through maps $\{1,\dots,2m\}\to\{I,\mathbb{R}\setminus I\}$ except the constant map into the one-point set $\{I\}$, and $\Omega_J$ denotes the corresponding product $\Omega_J:=J_1\times\cdots\times J_{2m}\subset\mathbb{R}^{2m}$.

Proof of Lemma 4.10. Since $J$ is not the constant function at $\{I\}$, there is an index $1\le k\le 2m$ such that $J_k=\mathbb{R}\setminus I$.
By symmetry, we may assume $k=1$. By (50) and (51), we have for any $A>2$

This completes the proof of Lemma 4.10.

By Lemma 4.10, we can proceed as:

Now that we have estimated the integrals in (49) in (51) and (52), we obtain for any $A>0$:

The integral appearing here can be evaluated by a standard Fourier analysis computation (e.g. [9, p.170] or [6, Lemma 6.29]):

We conclude that

This completes the proof of Proposition 4.9.

Let us collect the computations we have done and finish the proof of the main result of this section.

Proof of Theorem 4.2. We wanted to evaluate up to error the average:

By (22) and (28), it equals:

By Proposition 4.7, this has been estimated as

for any $A>0$. By Proposition 4.9, this is further estimated as:

As always, let $O$ continue to be a Dedekind domain finitely generated over $\mathbb{F}_p$. We restate Theorem 1.2 in a slightly broader generality. In the number field case [6], the extra generality allowed one to prove a constellation theorem for prime-valued points on a binary quadratic form $ax^2+bxy+cy^2$ over $\mathbb{Z}$. Now we can state our main theorem in its proper generality.

For the proof, we need to recall Chebotarëv's density theorem. This is by far the deepest input from algebraic geometry in this work. Let $X$ be a complete non-singular geometrically irreducible curve over $\mathbb{F}_q$. Let $\mathrm{Pic}(X)\xrightarrow{\varphi}G$ be a finite quotient of $\mathrm{Pic}(X)$. The restriction of the degree map $\deg\colon\ker(\varphi)\to\mathbb{Z}$ is necessarily non-trivial. Let $D\ge 1$ be the order of its cokernel. (Let us always use $D$ in this sense when $G$ is understood.) It follows that we have the degree map of the following form:

The next result is a consequence of Weil's Riemann Hypothesis for algebraic curves over finite fields.

Theorem 5.3 (Chebotarëv's density theorem). Let $G$ be a finite quotient of $\mathrm{Pic}(X)$ and $P\in G$.
Then for positive integers $n>0$, we have the following cardinality estimate:

Via the transition of parameters from the degree $n=\deg(x)$ to the norm $L=q^n=:\mathbf{N}(x)$, we get the following:

The following is clear from definitions.

Lemma 5.5. If $\alpha\in\mathfrak{a}$ is a prime element, then:

When we apply Lemma 5.5, it will be convenient to have a bound for the number of elements $\alpha\in\mathfrak{a}$ with $\mathbf{N}(\alpha)<\mathbf{N}(\mathfrak{a})R$.

Also, let $\chi\colon\mathbb{R}\to[0,1]$ be a compactly supported $C^\infty$ function as in Definition 4.1. Take a norm-length compatible $O^*$-fundamental domain $\mathcal{D}$ of $\mathfrak{a}\setminus\{0\}$, which exists thanks to Proposition 2.6. This means that there is a small positive number $c_{\mathcal{D}}>0$ such that the following inclusion holds for all $M>0$:

Let $\delta>0$ be any positive number smaller than the upper density $\bar\delta_{P_{\mathfrak{a}}}(A)$ of $A\subset P_{\mathfrak{a}}$. Let $\delta_1$ be the positive number defined by (64) below (which is not very motivating) depending only on the preliminary data $O$, $\mathfrak{a}$, $\chi$, $S$, $\delta$ and $\mathcal{D}$ that are already available. Using the relative Szemerédi theorem 3.2, we fix the following positive numbers: $\rho:=\rho(o,\mathfrak{a},S,\delta_1)$, $\gamma:=\gamma(o,\mathfrak{a},S,\delta_1)$.

Let $w>1$ be a large integer to be specified in a moment and $R>1$ be a large real number to be specified much later, satisfying

It is routine to check that this family of maps satisfies the hypothesis of Theorem 4.2; see [6, Lemma 5.8] for details. Hence if $w$ is large enough depending on $S\subset\mathfrak{a}$ and $r$, and if $R$ is large enough depending in addition on $\chi$ and $w$, then the error terms in Theorem 4.2 can be made smaller than $\rho$:

We fix such $w$. The value of $R$ is yet to be fixed. Set $W:=\prod_{\mathbf{N}(\pi o)\le w}\pi\in o$, where the product $\prod_{\mathbf{N}(\pi o)\le w}$ is taken over the monic irreducible polynomials $\pi$ satisfying the indicated condition.

Let $e>1$ be a large positive integer to be specified toward the end of the proof. We consider the following positive real numbers determined by $e$:

Since $\delta<\bar\delta_{P_{\mathfrak{a}}}(A)$ by our choice, for infinitely many $e\in\mathbb{N}$ the following inequality holds:

(58) $|A\cap\mathfrak{a}_{\le M}|>\delta\cdot|P_{\mathfrak{a}}\cap\mathfrak{a}_{\le M}|$.
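By (54), the set $\mathfrak{a}_{\le M}$ contains $\mathfrak{a}(\mathbf{N}(\mathfrak{a})L)\cap\mathcal{D}$. Hence the right hand side is at least:

$\ge\delta\cdot|P_{\mathfrak{a}}\cap\mathfrak{a}(\mathbf{N}(\mathfrak{a})L)\cap\mathcal{D}|$.

For every element $\alpha\in P_{\mathfrak{a}}\cap\mathfrak{a}(\mathbf{N}(\mathfrak{a})L)\cap\mathcal{D}$, the ideal $\alpha\mathfrak{a}^{-1}\subset O$ is a prime ideal with norm $\mathbf{N}(\alpha)/\mathbf{N}(\mathfrak{a})$ and whose class in $\mathrm{Pic}(O)$ equals $-[\mathfrak{a}]$. Therefore the association $\alpha\mapsto\alpha\mathfrak{a}^{-1}$ establishes a bijection from $P_{\mathfrak{a}}\cap\mathfrak{a}(\mathbf{N}(\mathfrak{a})L)\cap\mathcal{D}$ to the following set:

Its cardinality is already estimated in Corollary 5.4. As a result we get:

where we have written C O := 1 q D D |Pic(O)| for short.

Since we want to use Lemma 5.5 later, we want to consider only those elements with ideal norm $>\mathbf{N}(\mathfrak{a})R$. By Corollary 5.6 we know

By (57) the right hand side has the order of $(\log L)^{n-1}L^{1/(2m+1)n}$ or less as a function of $e$, which is smaller than the right-most term of (59). Hence by replacing $\delta$ by a slightly smaller value if necessary, we see that the following variant of (59) is valid:

Lemma 5.7. For every $\alpha\in P_{\mathfrak{a}}\setminus\mathfrak{a}(\mathbf{N}(\mathfrak{a})R)$, the residue class of $\alpha$ in $\mathfrak{a}/W\mathfrak{a}$ generates it as an $O$-module.

Proof of Lemma. By the Chinese Remainder Theorem for $O$-modules, the assertion is equivalent to $\alpha\in\mathfrak{a}\setminus\bigl(\bigcup_{\mathfrak{p}\mid W}\mathfrak{p}\mathfrak{a}\bigr)$. Suppose there is a $\mathfrak{p}\mid W$ such that $\alpha\in\mathfrak{p}\mathfrak{a}$. Since $\alpha\in P_{\mathfrak{a}}$ it follows that $\alpha O=\mathfrak{a}\mathfrak{p}$. By the definition of $W$ the ideal $\pi o=\mathfrak{p}\cap o$ has norm $\le w$. It follows that $\mathbf{N}(\mathfrak{p})\le w^r$ and hence

This contradicts the assumption (55). This proves Lemma 5.7.

As $\mathfrak{a}$ is a rank 1 projective $O$-module, we have an isomorphism of $O$-modules $\mathfrak{a}/W\mathfrak{a}\cong O/WO$. The generators of $\mathfrak{a}/W\mathfrak{a}$ correspond to the elements of $(O/WO)^*$. It follows that:

$|\{\alpha\in\mathfrak{a}/W\mathfrak{a}\mid\alpha\text{ generates }\mathfrak{a}/W\mathfrak{a}\}|=\phi_O(W)$.

By Lemma 5.7, we see that the set $A\cap(\mathfrak{a}_{\le M}\setminus\mathfrak{a}(\mathbf{N}(\mathfrak{a})R))$ decomposes into the disjoint union of $\phi_O(W)$ subsets according to the mod $W$ classes. By the pigeonhole principle, it follows that for some residue class $[b]\in\mathfrak{a}/W\mathfrak{a}$ we have

(61) $|\{\alpha\in A\cap(\mathfrak{a}_{\le M}\setminus\mathfrak{a}(\mathbf{N}(\mathfrak{a})R))\mid\alpha=[b]\text{ in }\mathfrak{a}/W\mathfrak{a}\}|\ge\frac{1}{\phi_O(W)}\cdot(\text{R.H.S. of (60)})$.

Choose one such $[b]\in\mathfrak{a}/W\mathfrak{a}$.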
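The $W$-trick and the pigeonhole step above can be made concrete in the simplest possible case. The sketch below is an illustration under assumptions that are not the paper's general setting: it takes $O=o=\mathbb{F}_2[t]$ and $\mathfrak{a}=O$, picks the threshold $w=4$, and encodes polynomials over $\mathbb{F}_2$ as integer bitmasks. All helper names (`gf2_mul`, `gf2_mod`, `gf2_gcd`, `is_irreducible`) are hypothetical. It builds $W$ as the product of the irreducibles of norm at most $w$, counts the units of $O/WO$, and checks that every irreducible of larger norm (degrees 3 to 6 here) lands in a unit class, so that some class receives at least a $1/\phi_O(W)$ share of them.

```python
from collections import Counter


def gf2_mul(a, b):
    """Product of two polynomials over F_2 (bit i of an int = coefficient of t^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, b):
    """Remainder of a modulo a non-zero polynomial b over F_2."""
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a


def gf2_gcd(a, b):
    """Greatest common divisor over F_2 by the Euclidean algorithm."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def is_irreducible(f):
    """Trial division by every polynomial of degree 1 .. deg(f)//2."""
    d = f.bit_length() - 1
    if d < 1:
        return False
    return all(gf2_mod(f, g) != 0 for g in range(2, 1 << (d // 2 + 1)))


# W = product of the irreducibles pi with norm 2^deg(pi) <= w (assumed w = 4).
w = 4
small_primes = [f for f in range(2, 8)
                if is_irreducible(f) and 2 ** (f.bit_length() - 1) <= w]
W = 1
for pi in small_primes:
    W = gf2_mul(W, pi)

# Units of F_2[t]/(W): residues of degree < deg(W) that are coprime to W.
deg_W = W.bit_length() - 1
units = [r for r in range(1 << deg_W) if gf2_gcd(r, W) == 1]

# Irreducibles of norm > w (degrees 3..6), grouped by residue class mod W.
large_primes = [f for f in range(8, 128) if is_irreducible(f)]
classes = Counter(gf2_mod(f, W) for f in large_primes)
print(bin(W), units, dict(classes))
```

Here $W=t(t+1)(t^2+t+1)=t^4+t$, and the analogue of Lemma 5.7 is the statement that the keys of `classes` all lie in `units`. The pigeonhole principle then guarantees that the most populated class contains at least `len(large_primes) / len(units)` elements.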
Let us fix a $C>0$ depending only on $\mathfrak{a}$ and $W$ such that the projection $\mathfrak{a}_{\le C}\to\mathfrak{a}/W\mathfrak{a}$ is surjective (which exists because the target is a finite set) and choose a lift $b\in\mathfrak{a}_{\le C}$ of $[b]$. Let $\mathrm{Aff}_{W,b}\colon\mathfrak{a}\to\mathfrak{a}$ be the affine linear map $\alpha\mapsto W\alpha+b$. Set $B:=\mathrm{Aff}_{W,b}^{-1}\bigl(A\setminus\mathfrak{a}(\mathbf{N}(\mathfrak{a})R)\bigr)\subset\mathfrak{a}$. We have the following inclusion if $e>1$ is large enough:

Indeed, suppose $\alpha\in\mathfrak{a}$ satisfies $\|W\alpha+b\|\le M=N\|W\|$. Since $\|b\|\le C<N\|W\|$ for $e$ large enough, by the ultrametricity of $\|-\|$ this implies $\|W\alpha\|\le N\|W\|$. We get $\|\alpha\|\le N$ because $W$ is a multiplicative element for the norm $\|-\|$. This proves the inclusion (62). In particular the set on the left hand side of (61) is contained in $\mathrm{Aff}_{W,b}(B\cap\mathfrak{a}_{\le N})$.

Having fixed $W$ and $b$, we can finally define a function $\lambda\colon\mathfrak{a}\to\mathbb{R}_{\ge 0}$ by the formula:

By (56) the function $\lambda$ is $(R^{2m}/q,\rho,S,o)$-pseudorandom. Let us verify the other hypotheses in the Szemerédi theorem 3.2. By Lemma 5.5, the restriction of $\lambda$ to $B$ equals the constant function ϕ O (W ) W n κ O Cχ log R. This together with (60) and (62) implies:

We have $|\mathfrak{a}_{\le N}|\le M^n/\bigl(\|W\|^n\,\mathbf{N}(\mathfrak{a})\,q^{g-1}\bigr)$ (which is an equality if $N$ happens to be a power of $q^d$). By the definition (57) of our parameters we get for all sufficiently large $e>1$:

This establishes one of the two requirements in the relative Szemerédi theorem 3.2.

We have to establish one more inequality to invoke the relative Szemerédi theorem. By Lemma 5.5 we have $\lambda^k 1_B\le\mathrm{const.}\cdot(\log R)^{2k}$ where the constant comes from the coefficient in the definition of $\lambda$ in (63). So:

which is $<\gamma N$ for $e$ sufficiently large.

Now fix $e$ so that it satisfies (58) and is large enough to make all the above inequalities true. We can apply the relative Szemerédi theorem 3.2 to the current situation by (64), (65) and the pseudorandomness of $\lambda$. It follows that $B$ contains an $o$-homothetic copy of $S$. Sending it by the affine $o$-linear map $\mathrm{Aff}_{W,b}\colon B\to A$, we get an $o$-homothetic copy of $S$ in $A$. This completes the proof of Theorem 5.2.

Remark 4.
The above proof actually shows a finitary version of the theorem as in [6, Theorem A] because the dependence of the threshold for $e$ on the set $A$ is via its density $\delta$ (though of course the specific value of $e$ should be determined depending on $A$ to ensure (58)).

Remark 5. The assumption that $A$ has positive upper density in $P_{\mathfrak{a}}$ was used solely at (58). It follows that we could have assumed more directly that $A\subset P_{\mathfrak{a}}$ satisfies an inequality of the form:

$|A\cap\mathfrak{a}_{\le M}|>\mathrm{const.}\cdot\frac{M^n}{\log M}$

for arbitrarily large $M$, with the positive constant depending only on $\mathfrak{a}$ and $A$. See [6, §§8-9] for a fully axiomatic treatment in the number field context. In fact, not surprisingly, this inequality is equivalent to $A$ having positive upper density in $P_{\mathfrak{a}}$; see [6, Proposition 8.14] for the arguments in the number field case, which are also valid here.

We give only sketches. See also [6, §10] for a detailed account in the setting of number fields. Let $\mathfrak{f}$ be the conductor:

which is an ideal of $O$ contained in $O_0$. Let $P^{\mathfrak{f}}_O$ be a temporary notation for the set of prime elements of $O$ coprime to $\mathfrak{f}$, which is $P_O$ minus finitely many associate classes. One shows that the elements of $P_{O_0}$ are precisely those elements of $P^{\mathfrak{f}}_O$ which are contained in $O_0$. More explicitly, we have the following cartesian diagram:

Set $O^*_{\mathfrak{f}}:=\{f\in O^*\mid f\bmod\mathfrak{f}=1\text{ in }O/\mathfrak{f}\}$. It is a subgroup of $O_0^*$ which is of finite index in $O^*$. Let $\mathcal{D}\subset O\setminus\{0\}$ be a norm-length compatible $O^*_{\mathfrak{f}}$-fundamental domain whose existence easily follows from Proposition 2.6. It suffices to show the inequality (66) with $P_{O_0}\cap\mathcal{D}$ in place of $P_{O_0}$. By the norm-length compatibility of $\mathcal{D}$, we are reduced to showing the following inequality for infinitely many $L\in\mathbb{N}$:

with the constant depending only on $O_0$ and $\mathcal{D}$. Hence it suffices to show the $\alpha_0=1$ case (say) of the following claim:

The obvious map $P^{\mathfrak{f}}_{O,\alpha_0}\cap\mathcal{D}\to\operatorname{Spec}(O)_{\alpha_0}$; $\alpha\mapsto\alpha O$ is a bijection.
The Chebotarëv Density Theorem 5.3 holds with $\mathrm{Pic}(X)$ replaced by $\mathrm{Pic}(X,\mathfrak{f})$ with the same proof because the result [8, Theorem 9.24, p.141] we cited is stated in this generality. Thus for every finite quotient $G$ of $\mathrm{Pic}(X,\mathfrak{f})$, an element $P\in G$ and $n>0$, we have:

This proves Claim 5.9 and hence Proposition 5.8.

Acknowledgements. I have learned much of the technique used here through collaboration [6] with Masato Mimura, Akihiro Munemasa, Shin-ichiro Seki and Kiyoto Yoshino. Especially I owe much to Shin-ichiro, who was crazy enough to give us a 100-hour lecture series and teach us everything about the classical Green-Tao theorem. I thank Federico Binda for motivating conversations over lunch. Most of this work was done in the latter half of 2020. Amid all the irregularities caused by the COVID-19 pandemic, the Tohoku University staff has been so great that I was able to finish this work more quickly than I intended. During the work I was partially supported by JSPS KAKENHI Grant Number JP18K13382.

[1] Introduction to Commutative Algebra.
[2] A relative Szemerédi theorem.
[3] The primes contain arbitrarily long arithmetic progressions.
[4] Linear equations in primes.
[5] Algebraic Geometry.
[6] Constellations in prime elements of number fields.
[7] Green-Tao theorem in function fields.
[8] Number Theory in Function Fields.
[9] The Gaussian primes contain arbitrarily shaped constellations.