key: cord-0043271-1v41ya0m authors: Ji, Jing; Heinz, Jeffrey title: Input Strictly Local Tree Transducers date: 2020-01-07 journal: Language and Automata Theory and Applications DOI: 10.1007/978-3-030-40608-0_26 sha: 48a432943e1c2416fe598ac3c98ee46d06bde105 doc_id: 43271 cord_uid: 1v41ya0m

We generalize the class of input strictly local string functions (Chandlee et al. 2014) to tree functions. We show that they are characterized by a subclass of frontier-to-root, deterministic, linear tree transducers. We motivate this class from the study of natural language, as it provides a way to distinguish local syntactic processes from non-local ones. We give examples illustrating this kind of analysis.

Locally Testable sets of strings in the strict sense (Strictly Local, SL) are a subclass of the regular languages with interesting properties [16, 20]. Rogers [18] presents a generalization of SL to sets of trees and shows that they characterize the derivations of context-free languages. Chandlee et al. [2, 3] generalize SL formal languages in another direction: they present classes of strictly local string-to-string functions. In this paper, we generalize the SL class to a class of functions over trees. In particular, we present a characterization in terms of frontier-to-root, deterministic, linear tree transducers [5, 7].

One motivation comes from computational and theoretical linguistics, where one research program aims to identify and understand the minimally powerful classes of formal grammars that can describe aspects of natural language [4]. To this end, subregular sets and functions over strings have been used to distinguish and characterize phonological generalizations [11]. More recent research has begun studying natural language syntax from the perspective of subregular sets and functions over trees, as opposed to strings [9, 10].
One rationale for studying subclasses of regular string and tree sets and relations is that finite-state methods are known to be sufficient to describe aspects of natural language. For phonology and morphology, finite-state methods over strings appear sufficient [1, 17]. For syntax, finite-state methods over trees similarly appear sufficient. Rogers [19] showed that a syntactic theory of English can be understood in terms of Monadic Second Order (MSO) definable constraints over trees. Languages with more complex constructions can be understood in terms of regular tree languages undergoing regular tree transductions [8, 14]. Tree transducers have also found broad application in machine translation [13, 15]. It remains an open question, however, whether the full power of regular computation is necessary [11]. Another rationale for identifying subregular classes of languages is that learning problems may be easier to solve, in the sense of requiring less time and fewer resources [12]. By defining and characterizing the Input Strictly Local class of tree transducers, we hope to take a first step toward a more fine-grained perspective on the syntactic transformations present in natural languages.

The structure of the paper is as follows. Section 2 defines trees and associated properties and functions based on their recursive structure. In this we follow the tree transducer literature [5, 7]. However, we do not adopt the convention of ranked alphabets. Instead, we obtain their effects by bounding the largest number of children a tree in a given treeset can have and by requiring that the pre-image of the transition function of the tree automata be finite. While this is unconventional, we believe it simplifies our presentation and proofs. Section 2 also reviews strictly local treesets and the proof of their abstract characterization [18]. Section 3 presents the main theoretical results.
Deterministic, frontier-to-root, finite-state, linear tree transducers (abbreviated DFT) are defined, Input Strictly Local (ISL) tree functions are defined abstractly, and ISL functions are then characterized in terms of DFTs. Section 4 concludes.

Assume a finite alphabet $\Sigma$ and let $\Sigma^*$ denote the set of all strings of finite length that can be obtained by concatenating elements of $\Sigma$. We denote the empty string by $\lambda$. Consider an alphabet $\Sigma$ and symbols $[$ and $]$ that do not belong to it. A tree is defined inductively as follows:
- Base Case: For each $a \in \Sigma$, $a[\,]$ is a tree. The tree $a[\,]$ is also called a leaf. We also write $a[\lambda]$ for $a[\,]$.
- Inductive Case: If $a \in \Sigma$ and $t_1 t_2 \ldots t_n$ is a string of trees of length $n$ ($n \geq 1$), then $a[t_1 t_2 \ldots t_n]$ is a tree.

For a tree $t = a[t_1 t_2 \ldots t_n]$, the trees $t_1, t_2, \ldots, t_n$ are the children of $t$, and $t_i$ denotes the $i$th child. $\Sigma^T$ denotes the set of all trees of finite size over $\Sigma$.

The depth, size, yield, root, branch, and set of subtrees of a tree $t$, written $dp(t)$, $|t|$, $yld(t)$, $root(t)$, $branch(t)$ and $sub(t)$, respectively, are defined as follows. For all $a \in \Sigma$:
- If $t = a[\,]$, then $dp(t) = 1$, $|t| = 1$, $yld(t) = a$, $root(t) = a$, $branch(t) = 0$, and $sub(t) = \{t\}$.
- If $t = a[t_1 t_2 \ldots t_n]$, then $dp(t) = \max\{dp(t_i) \mid 1 \leq i \leq n\} + 1$, $|t| = 1 + \sum_{i=1}^{n} |t_i|$, $yld(t) = yld(t_1)\, yld(t_2) \ldots yld(t_n)$, $root(t) = a$, $branch(t) = n$, and $sub(t) = \{t\} \cup \bigcup_{i=1}^{n} sub(t_i)$.

The roots of the subtrees of a tree $t$ are called nodes. The root of a tree is also called its root node. Leaves are also called frontier nodes. The branching degree of a tree $t$ is $branch\_degree(t) = \max\{branch(u) \mid u \in sub(t)\}$. Let $\Sigma^T_n$ denote the set of trees $\{t \in \Sigma^T \mid branch\_degree(t) \leq n\}$. For example, $S[t_1 t_2 t_3]$ is a tree rooted in $S$; if no node inside $t_1$, $t_2$ or $t_3$ has more than three children, its branching degree is 3.

Let $\mathbb{N}^*$ be the set of all finite sequences of positive natural numbers. For $n \in \mathbb{N}^*$, the subtree of $t$ at $n$ is written $t.n$, and it is defined inductively:
- Base Case: $t.n = t$ iff $n = \lambda$.
- Inductive Case: If $t = a[t_1 \ldots t_k]$ and $n = n_1, n_2, \ldots, n_m$ ($m \geq 1$) with $1 \leq n_1 \leq k$, then $t.n = t_{n_1}.(n_2, \ldots, n_m)$.

These sequences are called Gorn addresses.
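The recursive definitions above translate directly into code. The following is a minimal Python sketch, assuming trees are represented by a hypothetical frozen `Tree` dataclass (a label plus a tuple of children), Gorn addresses by tuples of 1-based positive integers with the empty tuple playing the role of $\lambda$, and single-character labels so that the yield can be read as a string.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Tree:
    """A tree a[t1 ... tn]; a leaf a[ ] has an empty tuple of children."""
    label: str
    children: Tuple["Tree", ...] = ()


def T(label: str, *children: Tree) -> Tree:
    """Convenience constructor: T("a", t1, t2) builds a[t1 t2]."""
    return Tree(label, tuple(children))


def dp(t: Tree) -> int:
    """Depth: 1 for a leaf, otherwise 1 + the maximum depth of the children."""
    return 1 + max((dp(c) for c in t.children), default=0)


def size(t: Tree) -> int:
    """|t|: the number of nodes."""
    return 1 + sum(size(c) for c in t.children)


def yld(t: Tree) -> str:
    """Yield: the leaf labels read left to right."""
    if not t.children:
        return t.label
    return "".join(yld(c) for c in t.children)


def branch(t: Tree) -> int:
    return len(t.children)


def sub(t: Tree) -> Set[Tree]:
    """sub(t) = {t} ∪ sub(t1) ∪ ... ∪ sub(tn)."""
    result = {t}
    for c in t.children:
        result |= sub(c)
    return result


def branch_degree(t: Tree) -> int:
    return max(branch(u) for u in sub(t))


def at(t: Tree, address: Tuple[int, ...]) -> Tree:
    """t.n: follow a Gorn address of 1-based child positions from the root."""
    for i in address:
        t = t.children[i - 1]
    return t


example = T("S", T("a"), T("P", T("b"), T("c")), T("d"))
print(dp(example), size(example), yld(example), branch_degree(example))
print(at(example, (2, 1)))
```

Because `sub` returns a Python set, identical subtrees occurring at different addresses are counted once, matching the set-based definition of $sub(t)$.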
The Gorn addresses provide a natural ordering of the subtrees of $t$ in terms of the length-lexicographic ordering. For distinct $n = n_1, n_2, \ldots, n_k$ and $m = m_1, m_2, \ldots, m_\ell$, $n$ precedes $m$ iff either $k < \ell$, or $k = \ell$ and $n_1 < m_1$, or $k = \ell$ and $n_1 = m_1$ and $n_2, \ldots, n_k$ precedes $m_2, \ldots, m_\ell$. This orders the subtrees of $t$ so that those closer to the root of $t$ come earlier, and those on the same level of $t$ are ordered left to right. We make use of this ordering in our proof of Theorem 1.

The largest common subtrees of a set of trees $T$, denoted $lcs(T)$, are
$$lcs(T) = \Big\{ d \in \bigcap_{t \in T} sub(t) \;\Big|\; \forall d' \in \bigcap_{t \in T} sub(t),\ |d'| \leq |d| \Big\}.$$

The $k$-stem ($k \geq 1$) of a tree $t$, written $stem_k(t)$, is defined as follows.
- $stem_k(a[\,]) = a[\,]$.
- $stem_1(a[t_1 t_2 \ldots t_n]) = a[\,]$.
- $stem_k(a[t_1 t_2 \ldots t_n]) = a[stem_{k-1}(t_1)\, stem_{k-1}(t_2) \ldots stem_{k-1}(t_n)]$ for $k > 1$.

The stems of a tree $t$, denoted $stem(t)$, form the set $\{stem_k(t) \mid k \geq 1\}$.

It is useful to incorporate boundary markers into the roots and leaves of trees. Informally, given a $\Sigma$-tree $t$, boundary markers are added above the root and below the leaves. Formally, we employ symbols $\rtimes, \ltimes \notin \Sigma$ for this purpose. Thus for all $a \in \Sigma$ and $t \in \Sigma^T$, let $add_\rtimes(t) = \rtimes[t]$, $add_\ltimes(a[\,]) = a[\ltimes[\,]]$, and $add_\ltimes(a[t_1 \ldots t_n]) = a[add_\ltimes(t_1) \cdots add_\ltimes(t_n)]$. Then for any $\Sigma$-tree $t$, its augmented counterpart is $\hat{t} = add_\rtimes(add_\ltimes(t))$.

The $k$-factors of a tree $t$ are the $k$-stems of the subtrees of $\hat{t}$: for all $t \in \Sigma^T$, let $F_k(t) = \{stem_k(u) \mid u \in sub(\hat{t})\}$. We lift the definition of $k$-factors to treesets in the natural way: for all $T \subseteq \Sigma^T$, $F_k(T) = \bigcup_{t \in T} F_k(t)$.

A strictly $k$-local grammar is a pair $G = (\Sigma, S)$ where $S$ is a finite subset of $F_k(\Sigma^T)$, and the tree language of $G$ is defined as
$$L(G) = \{t \in \Sigma^T \mid F_k(t) \subseteq S\}.$$
Note that since $S$ is finite, there exists a smallest number $n$ such that every tree in $S$ has branching degree at most $n$. It follows that $L((\Sigma, S))$ is of branching degree $n$. A treeset $T \subseteq \Sigma^T$ is strictly $k$-local if there exists a strictly $k$-local grammar $G$ such that $L(G) = T$, and it is strictly local if it is strictly $k$-local for some $k$. These treesets form the class of strictly $k$-local treesets ($SL_k$).
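As a rough illustration of $k$-stems, boundary markers, $k$-factors and strictly $k$-local grammars, here is a minimal Python sketch. It represents trees with a small hypothetical frozen `Tree` dataclass, uses the characters `⋊` and `⋉` as the boundary markers (assumed not to occur in $\Sigma$), and builds a grammar by collecting the 2-factors of two sample trees from the tree language whose yield is $a^n b^n$; the helper `anbn` is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

LEFT, RIGHT = "⋊", "⋉"  # boundary markers, assumed not to occur in Σ


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()


def T(label: str, *children: Tree) -> Tree:
    return Tree(label, tuple(children))


def sub(t: Tree) -> Set[Tree]:
    result = {t}
    for c in t.children:
        result |= sub(c)
    return result


def stem(k: int, t: Tree) -> Tree:
    """k-stem (k >= 1): keep only the top k levels of t."""
    if k == 1 or not t.children:
        return Tree(t.label)
    return Tree(t.label, tuple(stem(k - 1, c) for c in t.children))


def add_leaf_markers(t: Tree) -> Tree:
    """add_⋉: put a ⋉ leaf below every leaf of t."""
    if not t.children:
        return Tree(t.label, (Tree(RIGHT),))
    return Tree(t.label, tuple(add_leaf_markers(c) for c in t.children))


def augment(t: Tree) -> Tree:
    """The augmented tree add_⋊(add_⋉(t)) = ⋊[add_⋉(t)]."""
    return Tree(LEFT, (add_leaf_markers(t),))


def factors(k: int, t: Tree) -> Set[Tree]:
    """F_k(t): the k-stems of all subtrees of the augmented tree."""
    return {stem(k, u) for u in sub(augment(t))}


def factors_of(k: int, trees: Iterable[Tree]) -> Set[Tree]:
    """F_k(T): the union of F_k(t) over all t in T."""
    result: Set[Tree] = set()
    for t in trees:
        result |= factors(k, t)
    return result


def in_language(k: int, grammar: Set[Tree], t: Tree) -> bool:
    """t ∈ L((Σ, S)) iff F_k(t) ⊆ S."""
    return factors(k, t) <= grammar


def anbn(n: int) -> Tree:
    """Hypothetical helper: S[a S[a ... S[a b] ... b] b], yield a^n b^n (n >= 1)."""
    t = T("S", T("a"), T("b"))
    for _ in range(n - 1):
        t = T("S", T("a"), t, T("b"))
    return t


G = factors_of(2, [anbn(1), anbn(2)])
print(len(G), in_language(2, G, anbn(5)), in_language(2, G, T("S", T("a"), T("b"), T("b"))))
```

The two samples contribute six 2-factors, which already license every tree of this shape, so `anbn(5)` is accepted, while `S[a b b]` is rejected because its 2-factor `S[a b b]` never occurs in the samples.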
Strictly local stringsets are a special case of strictly local treesets in which the branching degree is 1, so every node other than a leaf is unary branching. Strictly 2-local treesets have been called local treesets in previous literature [18]. Every strictly 2-local tree language can be generated by a context-free grammar [7, 18]. Just as strictly local stringsets are characterized by Suffix Substitution Closure [20], each strictly 2-local tree language satisfies Subtree Substitution Closure [18].

To explain this characterization, we first introduce the notion of subtree substitution. For $t, s \in \Sigma^T$ and $n = n_1, n_2, \ldots, n_m \in \mathbb{N}^*$, the operation of substituting the subtree of $t$ at $n$ by $s$, written $t.n \leftarrow s$, is defined as follows.
- Base Case: $t.\lambda \leftarrow s = s$.
- Inductive Case: If $t = a[t_1 \ldots t_k]$ and $1 \leq n_1 \leq k$, then $t.n \leftarrow s = a[t_1 \ldots t_{n_1 - 1}\, (t_{n_1}.(n_2, \ldots, n_m) \leftarrow s)\, t_{n_1 + 1} \ldots t_k]$.

We also define substitution by $s$ of all the subtrees of $t$ rooted at $x$ ($x \in \Sigma$).

Rogers [18] proves the following result, and we repeat the proof to set the stage for the sequel.

Theorem 1 (Rogers [18]). A treeset $T \subseteq \Sigma^T_n$ is strictly 2-local iff it satisfies Subtree Substitution Closure (SSC): for all $A, B \in T$ and all $n_1, n_2 \in \mathbb{N}^*$, if $root(A.n_1) = root(B.n_2)$ then $A.n_1 \leftarrow B.n_2 \in T$.

Proof. If $T$ is strictly 2-local, then there exists a strictly 2-local grammar $G$ such that $L(G) = T$. Thus there exists a finite set $S \subseteq F_2(\Sigma^T)$ such that $L((\Sigma, S)) = T$. Consider any $A, B \in T$ and $n_1, n_2 \in \mathbb{N}^*$ such that $root(A.n_1) = root(B.n_2)$. Let $t = A.n_1 \leftarrow B.n_2$. We show $t \in T$. First notice that $F_2(A) \subseteq S$ and $F_2(B) \subseteq S$ because $A, B \in T$ and $T = L((\Sigma, S))$. Next consider any element $u \in F_2(t)$. By the definitions of $t$ and of 2-factors, $u$ is the 2-stem of a subtree of $\hat{t}$. If $u$ is the 2-stem of a subtree of $B.n_2$, then $u \in F_2(B) \subseteq S$. If not, then, because the node at $n_1$ carries the same label in $A$ and in $t$, $u$ is the 2-stem of a subtree of $\hat{A}$, and so $u \in F_2(A) \subseteq S$. Either way, $u \in S$, and so $F_2(t) \subseteq S$. It follows that $t \in T$.

Conversely, consider a treeset $T \subseteq \Sigma^T_n$ such that for all $A, B \in T$ and all $n_1, n_2 \in \mathbb{N}^*$, whenever $root(A.n_1) = root(B.n_2)$, then $A.n_1 \leftarrow B.n_2 \in T$. To show that $T$ is strictly 2-local, we present a finite set $S \subseteq F_2(\Sigma^T)$ such that $L((\Sigma, S)) = T$.
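Subtree substitution and the closure property of Theorem 1 can be checked mechanically on finite samples. The sketch below is a minimal Python illustration, not a decision procedure: it assumes a small hypothetical `Tree` dataclass, 1-based Gorn addresses as tuples, and a strictly 2-local grammar over $a^n b^n$ trees (built with a hypothetical `anbn` helper), and it tests SSC only for the particular pair $A, B$ it is given.

```python
from dataclasses import dataclass
from typing import Iterator, Set, Tuple

Address = Tuple[int, ...]


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()


def T(label: str, *children: Tree) -> Tree:
    return Tree(label, tuple(children))


def sub(t: Tree) -> Set[Tree]:
    result = {t}
    for c in t.children:
        result |= sub(c)
    return result


def stem(k: int, t: Tree) -> Tree:
    if k == 1 or not t.children:
        return Tree(t.label)
    return Tree(t.label, tuple(stem(k - 1, c) for c in t.children))


def augment(t: Tree) -> Tree:
    """Add ⋊ above the root and ⋉ below every leaf."""
    def leaves(u: Tree) -> Tree:
        if not u.children:
            return Tree(u.label, (Tree("⋉"),))
        return Tree(u.label, tuple(leaves(c) for c in u.children))
    return Tree("⋊", (leaves(t),))


def factors(k: int, t: Tree) -> Set[Tree]:
    return {stem(k, u) for u in sub(augment(t))}


def at(t: Tree, n: Address) -> Tree:
    for i in n:
        t = t.children[i - 1]
    return t


def addresses(t: Tree, prefix: Address = ()) -> Iterator[Address]:
    """All Gorn addresses of t, in pre-order."""
    yield prefix
    for i, c in enumerate(t.children, start=1):
        yield from addresses(c, prefix + (i,))


def substitute(t: Tree, n: Address, s: Tree) -> Tree:
    """t.n <- s: replace the subtree of t at address n by s."""
    if not n:
        return s
    kids = list(t.children)
    kids[n[0] - 1] = substitute(kids[n[0] - 1], n[1:], s)
    return Tree(t.label, tuple(kids))


def ssc_holds_for(grammar: Set[Tree], A: Tree, B: Tree) -> bool:
    """Check SSC for one pair A, B against a strictly 2-local grammar."""
    for n1 in addresses(A):
        for n2 in addresses(B):
            if at(A, n1).label == at(B, n2).label:
                if not factors(2, substitute(A, n1, at(B, n2))) <= grammar:
                    return False
    return True


def anbn(n: int) -> Tree:
    t = T("S", T("a"), T("b"))
    for _ in range(n - 1):
        t = T("S", T("a"), t, T("b"))
    return t


G = factors(2, anbn(1)) | factors(2, anbn(2))
print(substitute(anbn(2), (2,), anbn(3)) == anbn(4), ssc_holds_for(G, anbn(2), anbn(3)))
```

Every substitution between same-labelled nodes of two $a^n b^n$ trees again yields a tree whose 2-factors lie in `G`, as the first half of the proof predicts.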
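Let $S = F_2(T)$. Since $T$ is of branching degree $n$ and $\Sigma$ is finite, $S$ is finite. To prove $L((\Sigma, S)) = T$, we need to show both $L((\Sigma, S)) \subseteq T$ and $T \subseteq L((\Sigma, S))$. Clearly $T \subseteq L((\Sigma, S))$, because for any $t \in T$, $F_2(t) \subseteq S = F_2(T)$. The following proves $L((\Sigma, S)) \subseteq T$ by repeated application of SSC. Consider any $t \in L((\Sigma, S))$. Let $t_1 = t.n_1, t_2 = t.n_2, \ldots, t_m = t.n_m$ be an enumeration of the $m$ subtrees of $t$ by their Gorn addresses in length-lexicographic order. (Note that $t_1 = t$.) The base step of the induction is to choose a tree $s_0 \in T$ that has the same root as $t$. Such an $s_0 \in T$ exists because $\rtimes[root(t)[\,]] \in F_2(t) \subseteq S = F_2(T)$. Next we assume, by the induction hypothesis, that $s_{i-1} \in T$ and that $s_{i-1}$ agrees with $t$ from the root of $t$ down to the root of $t_i$, and we construct $s_i$. The 2-stem of the subtree of $\hat{t}$ corresponding to $t_i$ belongs to $F_2(t) \subseteq S = F_2(T)$, so there exist $B \in T$ and $n' \in \mathbb{N}^*$ such that $B.n'$ has the same 2-stem. Because $root(s_{i-1}.n_i) = root(t_i) = root(B.n')$, SSC gives $s_i = s_{i-1}.n_i \leftarrow B.n' \in T$. Informally, this construction ensures that the nodes and children of $s_i$ are identical to those of $t$ from the root of $t$ to the root of the subtree $t_i$. Since each $s_i$ is built from $s_{i-1}$ and $s_0 \in T$, we conclude that $s_m \in T$. Furthermore, since the subtrees are ordered length-lexicographically and each step copies the 2-stem of a subtree of $t$ into $s_i$, it follows that $s_m = t$. As $t$ was arbitrary in $L((\Sigma, S))$, we obtain $L((\Sigma, S)) \subseteq T$.

The catenation operation of two trees, $u \cdot t$, is defined by substitution in the leaves. Let $\$$ be a new symbol, i.e., $\$ \notin \Sigma$. Let $\Sigma^T_\$$ denote the set of all trees over $\Sigma \cup \{\$\}$ that contain exactly one occurrence of the label $\$$, and in which that occurrence is a leaf. The operation of catenation is defined inductively for $u \in \Sigma^T_\$$ and $t \in \Sigma^T \cup \Sigma^T_\$$:
- $\$[\,] \cdot t = t$.
- $a[\,] \cdot t = a[\,]$ for $a \in \Sigma$.
- $a[u_1 u_2 \ldots u_n] \cdot t = a[(u_1 \cdot t)(u_2 \cdot t) \ldots (u_n \cdot t)]$.

Notice that the classical catenation of strings can be viewed as a special case of the catenation of trees with unary branching. This operation can also be used to represent subtrees: for $t \in \Sigma^T \cup \Sigma^T_\$$, if $t = u \cdot s$, then $s$ is a subtree of $t$. Furthermore, for any $t \in \Sigma^T$ and any tree language $T \subseteq \Sigma^T$, the quotient of $t$ w.r.t. $T$ is defined as $t^{-1}T = \{u \in \Sigma^T_\$ \mid u \cdot t \in T\}$. Canonical finite-state tree recognizers can be defined in terms of these quotients.

In this section we define functions that map trees to trees.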
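The length-lexicographic enumeration used in the proof and the catenation operation $u \cdot t$ are both easy to sketch. The Python below is a minimal illustration under stated assumptions: a hypothetical `Tree` dataclass, the literal label `"$"` for the hole, and Python's tuple ordering (sorting on the key `(len(n), n)`) standing in for the length-lexicographic order on Gorn addresses. It does not check that `u` contains exactly one `$` leaf.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Address = Tuple[int, ...]
HOLE = "$"  # the new symbol $, assumed not to be in Σ


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()


def T(label: str, *children: Tree) -> Tree:
    return Tree(label, tuple(children))


def catenate(u: Tree, t: Tree) -> Tree:
    """u · t: replace the $-leaf of u (assumed unique) by t."""
    if u.label == HOLE and not u.children:
        return t
    return Tree(u.label, tuple(catenate(c, t) for c in u.children))


def addresses(t: Tree, prefix: Address = ()) -> Iterator[Address]:
    yield prefix
    for i, c in enumerate(t.children, start=1):
        yield from addresses(c, prefix + (i,))


def length_lex(t: Tree) -> List[Address]:
    """Gorn addresses of t: shorter first, ties broken lexicographically."""
    return sorted(addresses(t), key=lambda n: (len(n), n))


s = T("S", T("P"), T(HOLE))     # the context S[P $]
w = T("P", T("P"), T("W"))      # the tree P[P W]
print(catenate(s, w))
print(length_lex(T("S", T("P", T("x"), T("y")), T("z"))))
# Strings as unary trees: a[b[$]] · c[ ] = a[b[c[ ]]], mirroring "ab" + "c" = "abc".
print(catenate(T("a", T("b", T(HOLE))), T("c")))
```

Sorting is needed because the pre-order produced by `addresses` visits a node's descendants before its right siblings, whereas the proof of Theorem 1 processes all nodes of one level before moving deeper.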
After reviewing some basic terminology, we introduce deterministic, frontier-to-root, linear, finite-state tree transducers (DFTs). We then define Input Strictly Local Tree Transducers (ISLTTs) in a grammar-independent way and prove that they correspond exactly to a type of DFT. Examples are provided along the way.

A function $f$ with domain $X$ and co-domain $Y$ can be written $f : X \to Y$. The image of $f$ is the set $\{f(x) \mid x \in X,\ f(x) \text{ is defined}\}$ and the pre-image of $f$ is the set $\{x \in X \mid f(x) \text{ is defined}\}$. Tree transducers compute functions that map trees to trees, $f : \Sigma^T_n \to \Gamma^T$.

DFTs are defined as tuples $(Q, \Sigma, \Gamma, F, \delta)$, where $Q$ is a finite set of states, $F \subseteq Q$ is a set of final states, and $\delta$ is a transition function that maps a sequence of states paired with an element of $\Sigma$ to a variably-leafed tree and a state. A variably-leafed tree is a tree that may include variables in its leaves. Let $X = \{x_1, x_2, \ldots\}$ be a countable set of variables. If $\Sigma$ is a finite alphabet, then $\Sigma^T[X]$ denotes the set of trees $t$ formed with the alphabet $\Sigma \cup X$ such that if the root of a subtree $s$ of $t$ is a variable, then $s$ is a leaf (so variables occur only at leaves). Thus, formally, the transition function is $\delta : Q^* \times \Sigma \to \Gamma^T[X] \times Q$. Importantly, the pre-image of the transition function must be finite. We sometimes write $(q_1 q_2 \ldots q_m, a, t, q) \in \delta$ to mean $\delta(q_1 q_2 \ldots q_m, a) = (t, q)$.

In the course of computing a tree transduction, the variables in variably-leafed trees are substituted with trees. Assume $t_1, t_2, \ldots, t_m \in \Gamma^T$ and $s \in \Gamma^T[X]$, a variably-leafed tree containing any subset of the variables $\{x_1, x_2, \ldots, x_m\}$. We define a substitution function $\phi$ such that $\phi(t_1 t_2 \ldots t_m, s)$ is the tree obtained from $s$ by replacing each occurrence of $x_i$ with $t_i$, for $1 \leq i \leq m$.

We define the process of transducing a tree recursively using a function $\pi$, which maps $\Sigma^T_n$ to $Q \times \Gamma^T$ and is defined inductively with $\delta$:
- $\pi(a[\,]) = (q, s)$ iff $\delta(\lambda, a) = (s, q)$.
- $\pi(a[t_1 \ldots t_m]) = (q, \phi(u_1 \ldots u_m, s))$ iff $\pi(t_i) = (q_i, u_i)$ for each $1 \leq i \leq m$ and $\delta(q_1 \ldots q_m, a) = (s, q)$.

The tree-to-tree function the transducer $M$ recognizes is the set of pairs $L(M) = \{(t, s) \mid t \in \Sigma^T_n,\ s \in \Gamma^T,\ \pi(t) = (q, s),\ q \in F\}$. We also write $M(t) = s$ whenever $(t, s) \in L(M)$. A DFT is linear provided that whenever $\delta(q_1 q_2 \ldots q_m, a) = (s, q)$, no variable occurs more than once in $s$.

Example 5. Wh-movement refers to a syntactic analysis of question words such as English what and who. It is common to analyze this as a relation between tree structures [21]. The input structure describes the relation of the wh-word to its verb (cf. "John thinks Mary believes Bill buys what?") and the yield of the output structure reflects the pronunciation (cf. "What does John think Mary believes Bill buys?"). We use a simplified transformation to make the point. In the alphabet, S represents the root node of an input tree, W stands for a wh-word, and P for everything else (P is for phrase). A transducer $M_{wh}$ for wh-movement can be constructed as a DFT. Figure 1 illustrates some of the transformations computed by the finite-state machine $M_{wh}$. The tree with a wh-word in Fig. 1(a) is transformed into the tree in Fig. 1(b). ($M_{wh}$ keeps the original wh-word in situ, but it could easily be removed or replaced with a trace.) The trees in Fig. 1(c) and (d) are the same because there is no wh-word in the input tree, so $M_{wh}$ leaves it unchanged.

Next we describe the canonical form of deterministic tree transducers. The quotient of a tree $t \in \Sigma^T$ with respect to a tree-to-tree function $f : \Sigma^T \to \Gamma^T$ is a key idea. It will be useful to develop some notation for the largest common subtree of the image under $f$ of the set of trees that include $t$ as a subtree:
$$lcsi_f(t) = lcs\big(f(\Sigma^T_\$ \cdot \{t\})\big).$$
When $f$ is understood from context, we just write $lcsi(t)$. Then the quotient is defined as follows:
$$qt_f(t) = \{(u, v) \in \Sigma^T_\$ \times \Gamma^T_\$ \mid f(u \cdot t) = v \cdot lcsi_f(t)\} \tag{1}$$
When $f$ is clear from context, we write $qt(t)$ instead of $qt_f(t)$. It is worth noting that for a tree $t \in \Sigma^T_n$, the largest common subtree of the image of a linear transducer on the inputs $\Sigma^T_\$ \cdot \{t\}$ is unique if it exists: if more than one tree belonged to $lcs(f(\Sigma^T_\$ \cdot \{t\}))$, they would have to be produced by copying, which a linear DFT does not allow.
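To make the definitions of $\phi$, $\pi$ and linearity concrete, the following Python sketch runs a DFT whose transition function is a finite dictionary. Everything here is an illustrative assumption: trees use a hypothetical `Tree` dataclass, variables $x_i$ are a separate `Var` class, and `build_wh_delta` is a hypothetical, simplified stand-in for $M_{wh}$, not the transducer of Example 5. It places a copy of W directly under the root S whenever the input contains a wh-word and, like $M_{wh}$, leaves the original W in situ. The dictionary is generated only up to a branching bound `n`, which keeps the pre-image of $\delta$ finite.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Var:
    i: int  # the variable x_i (1-based)


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple[Union["Tree", Var], ...] = ()


def T(label: str, *children) -> Tree:
    return Tree(label, tuple(children))


def phi(outputs: List[Tree], s) -> Tree:
    """φ(t1 ... tm, s): replace every x_i in s by t_i."""
    if isinstance(s, Var):
        return outputs[s.i - 1]
    return Tree(s.label, tuple(phi(outputs, c) for c in s.children))


def variables(s) -> List[int]:
    if isinstance(s, Var):
        return [s.i]
    return [i for c in s.children for i in variables(c)]


Delta = Dict[Tuple[Tuple[str, ...], str], Tuple[Tree, str]]


def is_linear(delta: Delta) -> bool:
    """No variable occurs more than once in any output template."""
    return all(len(v) == len(set(v)) for v in (variables(s) for s, _ in delta.values()))


def pi(delta: Delta, t: Tree) -> Optional[Tuple[str, Tree]]:
    """Frontier-to-root run: returns (state, output), or None if undefined."""
    results = [pi(delta, c) for c in t.children]
    if any(r is None for r in results):
        return None
    entry = delta.get((tuple(q for q, _ in results), t.label))
    if entry is None:
        return None
    s, q = entry
    return q, phi([u for _, u in results], s)


def run(delta: Delta, final: Set[str], t: Tree) -> Optional[Tree]:
    """M(t), defined only when the run ends in a final state."""
    r = pi(delta, t)
    return r[1] if r is not None and r[0] in final else None


def build_wh_delta(n: int) -> Delta:
    """Hypothetical wh-movement: N = no W below, Y = W below, F = root S done."""
    delta: Delta = {((), "P"): (T("P"), "N"), ((), "W"): (T("W"), "Y")}
    for m in range(1, n + 1):
        xs = tuple(Var(i) for i in range(1, m + 1))
        for states in product("NY", repeat=m):
            seen = "Y" in states
            delta[(states, "P")] = (Tree("P", xs), "Y" if seen else "N")
            front = (T("W"),) if seen else ()
            delta[(states, "S")] = (Tree("S", front + xs), "F")
    return delta


wh = build_wh_delta(2)
question = T("S", T("P"), T("P", T("P"), T("P", T("P"), T("W"))))
print(is_linear(wh), run(wh, {"F"}, question))
```

The fronted W is a constant in the root template rather than a copied variable, so the sketch stays linear; the state Y has to be passed up through every P node, which is exactly the unbounded information flow that Example 6 exploits.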
If trees $t_1, t_2 \in \Sigma^T$ have the same quotient with respect to a function $f$, they are quotient-equivalent with respect to $f$, and we write $t_1 \sim_f t_2$. Clearly, $\sim_f$ is an equivalence relation that partitions $\Sigma^T$. As in the string case, for each regular tree language $T$ there is a canonical deterministic tree automaton accepting $T$; the characterization given by the Myhill-Nerode theorem can be transferred to the tree case [6]. For any regular treeset $T$, the quotients of trees w.r.t. $T$ partition $\Sigma^T$ into a finite set of equivalence classes. Analogous to the smallest subsequential finite-state transducer for a subsequential function, we can construct the smallest linear DFT for a deterministic tree-to-tree function $f$, and we refer to this transducer as the canonical transducer for $f$, $\Psi^c_f$. For $t_1, t_2, \ldots, t_m \in \Sigma^T_n$ ($m \leq n$) and $a \in \Sigma$, let $cont_f(a, t_1 t_2 \ldots t_m) \in \Gamma^T[X]$ denote the contribution of $a$ w.r.t. $t_1 t_2 \ldots t_m$, that is, the part of the output that the node labeled $a$ adds on top of the outputs for its children. The canonical DFT for a deterministic tree-to-tree function $f$ has the quotients $qt_f(t)$ as its states and transitions of the form $\delta_c(qt(t_1) \ldots qt(t_m), a) = (cont_f(a, t_1 \ldots t_m), qt(a[t_1 \ldots t_m]))$. The presentation here differs from Friese et al. [6], but the only thing we require in the proof of Theorem 2 below is the existence of the canonical DFT whenever $\sim_f$ is of finite index.

We define ISLTTs as a subclass of linear DFTs. First, we define ISL tree functions abstractly.

Definition 1 (ISL tree functions). A function $f$ is Input Strictly $k$-Local ($k$-ISL) if there is an $n$ such that for all $t_1, t_2 \in \Sigma^T_n$, if $stem_{k-1}(t_1) = stem_{k-1}(t_2)$ then $qt_f(t_1) = qt_f(t_2)$. A function is ISL if it is $k$-ISL for some $k$.

In the same way that ISL string functions can be used to probe the locality properties of phonological processes, ISL tree functions can be used to probe the locality properties of syntactic transformations. To show that a syntactic transformation is not ISL, one need only construct a counterexample to Definition 1.

Example 6. We can show that the function computed by $M_{wh}$ from Example 5 is not ISL for any $k$, because there is no bound on the distance the wh-word can 'travel.' Suppose there are $k$ and $n$ such that for all $t_1, t_2 \in \Sigma^T_n$, if $stem_{k-1}(t_1) = stem_{k-1}(t_2)$ then $qt_f(t_1) = qt_f(t_2)$. Let $u_1 = u_2 = \ldots = u_{k-1} = P[P\,\$]$. Also let $u_k = P[P\,P]$, $s = S[P\,\$]$ and $w = P[P\,W]$. We construct two sentence structures, $s \cdot t_1$ and $s \cdot t_2$, where $t_1 = u_1 \cdot u_2 \cdots u_{k-1} \cdot w$ and $t_2 = u_1 \cdot u_2 \cdots u_{k-1} \cdot u_k$. Clearly $stem_{k-1}(t_1) = stem_{k-1}(t_2)$, since $t_1$ and $t_2$ differ only in a leaf below depth $k - 1$. However, $qt_f(t_1) \neq qt_f(t_2)$, since $(s, s) \in qt_f(t_2)$ but $(s, s) \notin qt_f(t_1)$. As such a pair of trees $t_1$ and $t_2$ can be found for any $k$, wh-movement is not ISL for any $k$.

Our main result, Theorem 2 below, establishes an automata-theoretic characterization of ISL tree-to-tree functions. As we illustrate after the proof, one can use this theorem to show that a tree transformation is ISL.

Theorem 2 (Input Strictly Local Tree Transducers). A function $f$ is ISL iff there are some $k$ and $n$ such that $f$ can be described with a DFT for which
(1) $Q = stem_{k-1}(\Sigma^T_n)$ and $F \subseteq Q$, and
(2) for all $q_1 q_2 \ldots q_m \in Q^*$ ($m \leq n$), $a \in \Sigma$ and $v \in \Gamma^T[X]$, if $\delta(q_1 q_2 \ldots q_m, a) = (v, q)$ then $q = stem_{k-1}(a[q_1 q_2 \ldots q_m])$.

The transducer is finite since $\Sigma$ is finite and $n$ bounds the branching degree of the pre-image of $f$, which ensures the finiteness of both $Q$ and $\delta$. Before our proof of the theorem, we make two remarks and prove a lemma based on them.

Remark 1. For all $k, m \in \mathbb{N}$ with $1 \leq k \leq m$, and for all $t \in \Sigma^T_n$, $stem_k(stem_m(t)) = stem_k(t)$, since $t$ and $stem_m(t)$ share the same $k$-stem from the root.

Remark 2. For all $k \in \mathbb{N}$, all $a \in \Sigma$ and all $t_1, t_2, \ldots, t_m \in \Sigma^T_n$, $stem_k(a[t_1 t_2 \ldots t_m]) = stem_k(a[stem_k(t_1)\, stem_k(t_2) \ldots stem_k(t_m)])$. This is a direct consequence of Remark 1.

Lemma 1. Let $\Psi$ be an ISLTT with the properties defined in Theorem 2. If $t \in \Sigma^T_n$, $u \in \Gamma^T$ and $\pi(t) = (q, u)$, then $q = stem_{k-1}(t)$.

Proof. The proof is by induction on the depth of the trees to which $\pi$ applies. The base case follows from the facts that $(\lambda, a, v, q) \in \delta$ iff $\pi(a[\,]) = (q, v)$, and that $q = stem_{k-1}(a[\,])$ by condition (2). Next assume, for all $t_1, t_2, \ldots, t_m \in \Sigma^T_n$ ($m \leq n$) and $v_1, v_2, \ldots, v_m \in \Gamma^T$, that $\pi(t_i) = (q_i, v_i)$ implies $q_i = stem_{k-1}(t_i)$ for each $1 \leq i \leq m$. By the definition of $\pi$, $\pi(a[t_1 t_2 \ldots t_m]) = (q, \phi(v_1 v_2 \ldots v_m, v))$ where $\delta(q_1 q_2 \ldots q_m, a) = (v, q)$. By condition (2), the assumption, and Remark 2, $q = stem_{k-1}(a[stem_{k-1}(t_1) \ldots stem_{k-1}(t_m)]) = stem_{k-1}(a[t_1 \ldots t_m])$.

Now we can prove the theorem.

Proof (Theorem 2). (⇐) Assume $k \in \mathbb{N}$ and let $f$ be a function described by $\Psi = (Q, \Sigma, \Gamma, F, \delta)$ constructed as in Theorem 2. Let $t_1, t_2 \in \Sigma^T_n$ be such that $stem_{k-1}(t_1) = stem_{k-1}(t_2)$. By Lemma 1, both $t_1$ and $t_2$ lead to the same state, so $qt_f(t_1) = qt_f(t_2)$. Therefore, $f$ is $k$-ISL.
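The construction in Example 6 can be replayed for several values of $k$. The Python below is a minimal sketch under explicit assumptions: `f_wh` is a hypothetical, directly coded stand-in for the simplified wh-movement (it fronts a copy of W under S and otherwise leaves the tree unchanged), and instead of computing quotients, which range over infinitely many contexts, it only checks the witness used in the text, namely whether $f(s \cdot t)$ has the form $s \cdot v$ for some tree $v$.

```python
from dataclasses import dataclass
from typing import Tuple

HOLE = "$"


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()


def T(label: str, *children: Tree) -> Tree:
    return Tree(label, tuple(children))


def stem(k: int, t: Tree) -> Tree:
    if k == 1 or not t.children:
        return Tree(t.label)
    return Tree(t.label, tuple(stem(k - 1, c) for c in t.children))


def catenate(u: Tree, t: Tree) -> Tree:
    if u.label == HOLE and not u.children:
        return t
    return Tree(u.label, tuple(catenate(c, t) for c in u.children))


def has_context(u: Tree, t: Tree) -> bool:
    """True iff t = u · v for some tree v (u has a single $-leaf)."""
    if u.label == HOLE and not u.children:
        return True
    return (u.label == t.label and len(u.children) == len(t.children)
            and all(has_context(a, b) for a, b in zip(u.children, t.children)))


def contains_w(t: Tree) -> bool:
    return t.label == "W" or any(contains_w(c) for c in t.children)


def f_wh(t: Tree) -> Tree:
    """Hypothetical simplified wh-movement: put a copy of W first under the root S."""
    if t.label == "S" and contains_w(t):
        return Tree("S", (T("W"),) + t.children)
    return t


S_CTX = T("S", T("P"), T(HOLE))    # s = S[P $]
U = T("P", T("P"), T(HOLE))        # u_1 = ... = u_{k-1} = P[P $]
U_K = T("P", T("P"), T("P"))       # u_k = P[P P]
W_TREE = T("P", T("P"), T("W"))    # w = P[P W]


def chain(k: int, last: Tree) -> Tree:
    """u_1 · u_2 ··· u_{k-1} · last."""
    t = last
    for _ in range(k - 1):
        t = catenate(U, t)
    return t


def witness(k: int) -> bool:
    t1, t2 = chain(k, W_TREE), chain(k, U_K)
    return (stem(k - 1, t1) == stem(k - 1, t2)
            and has_context(S_CTX, f_wh(catenate(S_CTX, t2)))
            and not has_context(S_CTX, f_wh(catenate(S_CTX, t1))))


print([witness(k) for k in range(2, 8)])
```

For every tested $k$, the two trees agree on their $(k-1)$-stems, yet only $s \cdot t_2$ is mapped to an output that still begins with the context $s$.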
(⇒) Consider any ISL tree-to-tree function $f$. Then there are some $k$ and $n$ such that for all $t_1, t_2 \in \Sigma^T_n$, $stem_{k-1}(t_1) = stem_{k-1}(t_2) \Rightarrow qt_f(t_1) = qt_f(t_2)$. We show that the corresponding ISL tree transducer $\Psi^{ISL}_f$ exists. Since $stem_{k-1}(\Sigma^T_n)$ is a finite set, the equivalence relation $\sim_f$ partitions $\Sigma^T_n$ into at most $|stem_{k-1}(\Sigma^T_n)|$ blocks. Thus there exists a canonical linear DFT $\Psi^c_f = (Q_c, \Sigma, \Gamma, F_c, \delta_c)$. Let $\pi_c$ be the process function derived from $\delta_c$ that maps $\Sigma^T_n$ to $Q_c \times \Gamma^T$. Construct $\Psi = (Q, \Sigma, \Gamma, F, \delta)$ as follows:
- $Q = stem_{k-1}(\Sigma^T_n)$;
- $F = \{q \in Q \mid qt(q) \in F_c\}$;
- $(q_1 q_2 \ldots q_m, a, v, stem_{k-1}(a[q_1 q_2 \ldots q_m])) \in \delta$ iff $(qt(q_1)\, qt(q_2) \ldots qt(q_m), a, v, qt(a[q_1 q_2 \ldots q_m])) \in \delta_c$.

$\Psi$ is ISL by construction, as its states and transitions meet requirements (1) and (2) of Theorem 2. We now show that $\Psi$ and $\Psi^c_f$ compute the same function; in other words, for all $t \in \Sigma^T_n$, $\pi(t) = (stem_{k-1}(t), u)$ with $stem_{k-1}(t) \in F$ iff $\pi_c(t) = (qt(t), u)$ with $qt(t) \in F_c$.

First, we show by induction that $\pi(t) = (stem_{k-1}(t), u)$ iff $\pi_c(t) = (qt(t), u)$. Clearly, the base case is satisfied: for all $a \in \Sigma$ and $v \in \Gamma^T[X]$, $(\lambda, a, v, q) \in \delta$ iff $(\lambda, a, v, qt(q)) \in \delta_c$. Thus $\pi_c(a[\,]) = (qt(a[\,]), v)$ and $\pi(a[\,]) = (stem_{k-1}(a[\,]), v)$. Next assume that there exist $t_1, t_2, \ldots, t_m \in \Sigma^T_n$ and $u_1, u_2, \ldots, u_m \in \Gamma^T$ such that $\pi(t_i) = (stem_{k-1}(t_i), u_i)$ iff $\pi_c(t_i) = (qt(t_i), u_i)$ for each $1 \leq i \leq m$. We show that for all $a \in \Sigma$ and $v \in \Gamma^T[X]$, $\pi(a[t_1 \ldots t_m]) = (stem_{k-1}(a[t_1 \ldots t_m]), \phi(u_1 \ldots u_m, v))$ iff $\pi_c(a[t_1 \ldots t_m]) = (qt(a[t_1 \ldots t_m]), \phi(u_1 \ldots u_m, v))$.

Suppose $\pi_c(a[t_1 \ldots t_m]) = (qt(a[t_1 \ldots t_m]), \phi(u_1 \ldots u_m, v))$, and let $q_i = stem_{k-1}(t_i)$ for each $1 \leq i \leq m$. By Remark 1, $stem_{k-1}(q_i) = stem_{k-1}(t_i)$, so $qt(q_i) = qt(t_i)$ because $f$ is $k$-ISL; likewise, by Remark 2, $stem_{k-1}(a[q_1 \ldots q_m]) = stem_{k-1}(a[t_1 \ldots t_m])$, so $qt(a[q_1 \ldots q_m]) = qt(a[t_1 \ldots t_m])$. By substitution, then, $\pi_c(t_i) = (qt(q_i), u_i)$ for each $1 \leq i \leq m$, and $(qt(q_1)\, qt(q_2) \ldots qt(q_m), a, v, qt(a[q_1 q_2 \ldots q_m])) \in \delta_c$. By construction of $\Psi$, $(q_1 q_2 \ldots q_m, a, v, stem_{k-1}(a[q_1 q_2 \ldots q_m])) \in \delta$. Since $\pi(t_i) = (q_i, u_i)$ by the induction hypothesis, it follows that $\pi(a[t_1 \ldots t_m]) = (stem_{k-1}(a[t_1 \ldots t_m]), \phi(u_1 \ldots u_m, v))$.

Conversely, consider any $a \in \Sigma$ and $v \in \Gamma^T[X]$ and suppose $\pi(a[t_1 t_2 \ldots t_m]) = (stem_{k-1}(a[t_1 t_2 \ldots t_m]), \phi(u_1 u_2 \ldots u_m, v))$. By assumption, $\pi(t_i) = (stem_{k-1}(t_i), u_i)$ for each $1 \leq i \leq m$. Thus $(stem_{k-1}(t_1)\, stem_{k-1}(t_2) \ldots stem_{k-1}(t_m), a, v, stem_{k-1}(a[t_1 t_2 \ldots t_m])) \in \delta$. Let $q_i = stem_{k-1}(t_i)$ for each $1 \leq i \leq m$ as before. It follows that $stem_{k-1}(t_i) = stem_{k-1}(q_i)$, so $qt(t_i) = qt(q_i)$. Likewise, $stem_{k-1}(a[t_1 t_2 \ldots t_m]) = stem_{k-1}(a[q_1 q_2 \ldots q_m])$, so $qt(a[t_1 t_2 \ldots t_m]) = qt(a[q_1 q_2 \ldots q_m])$. Therefore, $(stem_{k-1}(q_1)\, stem_{k-1}(q_2) \ldots stem_{k-1}(q_m), a, v, stem_{k-1}(a[q_1 q_2 \ldots q_m])) \in \delta$. By construction of $\Psi$, this means $(qt(q_1)\, qt(q_2) \ldots qt(q_m), a, v, qt(a[q_1 q_2 \ldots q_m])) \in \delta_c$. Since $\pi_c(t_i) = (qt(t_i), u_i)$ for each $i$ by assumption, it follows that $\pi_c(a[t_1 t_2 \ldots t_m]) = (qt(a[q_1 q_2 \ldots q_m]), \phi(u_1 u_2 \ldots u_m, v))$.

We need to further show that $stem_{k-1}(t) \in F$ iff $qt(t) \in F_c$. By construction, $q \in F$ iff $qt(q) \in F_c$. Thus $stem_{k-1}(t) \in F$ iff $qt(stem_{k-1}(t)) \in F_c$. By Remark 1, $stem_{k-1}(t) = stem_{k-1}(stem_{k-1}(t))$; hence, since $f$ is $k$-ISL, $qt(t) = qt(stem_{k-1}(t))$. Therefore, $stem_{k-1}(t) \in F$ iff $qt(t) \in F_c$. This concludes the proof that $\Psi$ and $\Psi^c_f$ compute the same function.

As mentioned earlier, the value of Theorem 2 is that it can be used to establish that certain tree transformations are ISL by presenting a transducer for the transformation that satisfies the properties specified by the theorem.

Example 7. This example shows that reversing the branch order of the trees in a regular treeset $T \subseteq \Sigma^T_n$ is ISL. We illustrate with the classic tree language whose yield is the string language $a^n b^n$. In other words, we wish to show that the transformation that maps each tree in this language to the tree obtained by reversing the order of the children at every node is ISL. The reader can verify that this transducer correctly reverses the branch order of the trees in its pre-image. Further, this construction shows that the function is ISL, since it satisfies the requirements of Theorem 2.

This paper took a first step in characterizing local syntactic transformations by generalizing Input Strictly Local string functions to trees. Future work includes defining Output SL tree functions (cf. [3]), studying whether these classes of tree functions can be learned more quickly and with fewer resources, and characterizing subclasses of tree transducers that capture the types of nonlocal processes found in syntax and machine translation.
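Theorem 2 suggests a simple recipe for ISL transducers: let the state of each node be its $(k-1)$-stem and let the output template depend only on the node label and the children's states. The Python below is a minimal sketch of that recipe under stated assumptions: `isl_run`, `reverse_rule` and `mark_rule` are hypothetical helpers, the transition function is computed on demand rather than tabulated, and Lemma 1 (the state reached equals $stem_{k-1}(t)$) is only checked on examples. `reverse_rule` illustrates the idea of Example 7 with $k = 2$: every state is a one-node stem, so condition (2) holds, and each template lists the children's variables in reverse order. It is an illustration only, not the transducer the example refers to.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    i: int  # the variable x_i (1-based)


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple[Union["Tree", Var], ...] = ()


def T(label: str, *children) -> Tree:
    return Tree(label, tuple(children))


def stem(k: int, t: Tree) -> Tree:
    if k == 1 or not t.children:
        return Tree(t.label)
    return Tree(t.label, tuple(stem(k - 1, c) for c in t.children))


def phi(outputs: List[Tree], s) -> Tree:
    """φ(t1 ... tm, s): replace every x_i in s by t_i."""
    if isinstance(s, Var):
        return outputs[s.i - 1]
    return Tree(s.label, tuple(phi(outputs, c) for c in s.children))


def yld(t: Tree) -> str:
    return t.label if not t.children else "".join(yld(c) for c in t.children)


Rule = Callable[[str, Tuple[Tree, ...]], Tree]


def isl_run(k: int, rule: Rule, t: Tree) -> Tuple[Tree, Tree]:
    """π for a DFT whose states are (k-1)-stems (assumes k >= 2)."""
    results = [isl_run(k, rule, c) for c in t.children]
    states = tuple(q for q, _ in results)
    q = stem(k - 1, Tree(t.label, states))  # condition (2) of Theorem 2
    return q, phi([u for _, u in results], rule(t.label, states))


def reverse_rule(label: str, states: Tuple[Tree, ...]) -> Tree:
    """Output template a[x_m ... x_1]: children in reverse order."""
    return Tree(label, tuple(Var(i) for i in range(len(states), 0, -1)))


def mark_rule(label: str, states: Tuple[Tree, ...]) -> Tree:
    """Relabel a as A when some child is rooted in b (a local change)."""
    xs = tuple(Var(i) for i in range(1, len(states) + 1))
    hit = label == "a" and any(q.label == "b" for q in states)
    return Tree("A" if hit else label, xs)


def anbn(n: int) -> Tree:
    t = T("S", T("a"), T("b"))
    for _ in range(n - 1):
        t = T("S", T("a"), t, T("b"))
    return t


state, out = isl_run(2, reverse_rule, anbn(3))
print(state, yld(out))
```

Because each state is computed as $stem_{k-1}(a[q_1 \ldots q_m])$ from the children's states, Remark 2 guarantees that it coincides with $stem_{k-1}$ of the input subtree, which is the content of Lemma 1.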
[1] Finite State Morphology
[2] Learning strictly local subsequential functions
[3] Output strictly local functions
[4] The Minimalist Program
[5] Tree automata techniques and applications
[6] Minimization of deterministic bottom-up tree transducers
[8] Closure properties of minimalist derivation tree languages
[9] Curbing feature coding: strictly local feature assignment
[10] C-command dependencies as TSL string constraints
[11] The computational nature of phonological generalizations
[12] Grammatical Inference for Computational Linguistics. Synthesis Lectures on Human Language Technologies
[13] Applications of weighted automata in natural language processing
[14] Minimalist tree languages are closed under intersection with recognizable tree languages
[15] Survey: tree transducers in machine translation
[16] Counter-Free Automata
[17] Computational Approaches to Morphology and Syntax
[18] Strict LT2: regular :: local : recognizable
[19] A Descriptive Approach to Language-Theoretic Complexity
[20] Aural pattern recognition experiments and the subregular hierarchy
[21] An Introduction to Syntactic Analysis and Theory