UIUCDCS-R-77-903                                        UILU-ENG 77 1762

GENERATING TREES AND OTHER COMBINATORIAL OBJECTS LEXICOGRAPHICALLY

by

S. Zaks and D. Richards

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

November 1977

This work was supported in part by the National Science Foundation under grant number NSF MCS 73-03408.

I. INTRODUCTION

Motivated by the problem of generating, ranking and unranking all k-ary trees with n nodes, we have solved it for the more general case of all trees with $n_i$ nodes having $k_i$ sons each, $i = 1, 2, \ldots, t$, and $n_0 + 1$ leaves (hence $n_0 = \sum_{i=1}^{t}(k_i - 1)n_i$). We establish a one-to-one correspondence between those trees and the integer sequences $a_1 a_2 \ldots a_n$, $n = \sum_{i=0}^{t} n_i$, which have $n_i$ occurrences of $k_i$ for $i = 1, 2, \ldots, t$, and $n_0$ 0's, such that in each prefix $a_1 a_2 \ldots a_\ell$, $1 \le \ell \le n$, the number of 0's is not greater than $\sum_{i=1}^{t}(k_i - 1)\cdot(\text{number of } k_i\text{'s in the prefix})$. It turns out that there is also a one-to-one correspondence between these sequences and the lattice paths $L = L_0 L_1 \ldots L_n$ in $(t+1)$-dimensional space, from the point $(n_0, n_1, \ldots, n_t)$ to the origin $(0, 0, \ldots, 0)$, which do not go below the hyperplane $x_0 = \sum_{i=1}^{t}(k_i - 1)x_i$.† These correspondences will be shown in Section II. The algorithm that lexicographically generates a modified version of the above sequences is discussed in Section III. The ranking and unranking procedures are the subject of Section IV.

† We assume that each step in the path is directed towards the origin, and we use this assumption throughout this paper.

II. TREES, SEQUENCES AND PATHS

Ordering of Trees

We will deal solely with ordered trees (or planted plane trees). We will follow the conventions in [5]. Let $K = (k_0, k_1, \ldots, k_t)$ and $N = (n_0, n_1, \ldots, n_t)$ be $(t+1)$-tuples of non-negative integers such that $k_t > k_{t-1} > \cdots > k_0 = 0$ and $n_0 = \sum_{i=1}^{t}(k_i - 1)n_i$. We are concerned with the set of trees $T(K,N)$, where each tree has $n_i$ nodes with $k_i$ sons each for $1 \le i \le t$ and, for convenience, $n_0 + 1$ nodes with no sons, i.e. leaves. If $t = 1$, then we have regular $k_1$-ary trees with $n_1$ internal nodes. (In a regular k-ary tree, each internal node has exactly k sons.)

There are two common ways to order k-ary trees in the current literature, and we generalize both to the set $T(K,N)$. Let $|T|$ be the number of nodes in tree $T$, $r_T$ the degree of its root, and $T_i$ the subtree rooted at the $i$-th son of the root of $T$. The ordering given in [4,5] for binary trees and in [9] for k-ary trees can be generalized to arbitrary trees as A-order, as follows. Two trees $T$ and $T'$ are in A-order, $T < T'$, if

1) $|T| < |T'|$, or
2) $|T| = |T'|$ and for some $i$, $1 \le i \le r_T$, we have
   a) $T_j = T'_j$ for $j = 1, 2, \ldots, i-1$, and
   b) $T_i < T'_i$.

Let $p_T$ be the sequence formed by consecutively numbering the nodes of $T$ in post-order and reading the numbers in pre-order. Two trees $T$ and $T'$ are in A-order if $p_T$ is lexicographically less than $p_{T'}$. For binary trees, in-order is interchangeable with post-order and is used in [4]. The proof of this correspondence is analogous to the proof used in the binary case.

A second ordering is given in [12,13] for k-ary trees, which we generalize to arbitrary trees as B-order, as follows. Two trees $T$ and $T'$ are in B-order, $T < T'$, if

1) $r_T < r_{T'}$, or
2) $r_T = r_{T'}$ and for some $i$, $1 \le i \le r_T$, we have
   a) $T_j = T'_j$ for $j = 1, 2, \ldots, i-1$, and
   b) $T_i < T'_i$.

Later we give our interpretation of B-order using sequences.
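Both orders first compare a quantity at the root (the size $|T|$ for A-order, the degree $r_T$ for B-order) and then compare subtrees from left to right, so each can be written as a recursive sort key. The following is a minimal Python sketch under our own assumption, not the report's, that an ordered tree is represented as the tuple of its subtrees, a leaf being the empty tuple `()`; the names `size`, `a_key` and `b_key` are hypothetical.

```python
Tree = tuple  # hypothetical representation: a tree is the tuple of its subtrees; a leaf is ()


def size(t: Tree) -> int:
    """|T|: the number of nodes of t."""
    return 1 + sum(size(child) for child in t)


def a_key(t: Tree) -> tuple:
    """A-order key: compare |T| first, then the subtrees T_1, T_2, ... left to right."""
    return (size(t),) + tuple(a_key(child) for child in t)


def b_key(t: Tree) -> tuple:
    """B-order key: compare the root degree r_T first, then T_1, T_2, ... left to right."""
    return (len(t),) + tuple(b_key(child) for child in t)


leaf = ()
star = (leaf, leaf, leaf)      # root with three leaf sons: 4 nodes
bent = ((leaf, leaf), leaf)    # root with two sons, the first of degree 2: 5 nodes

print(a_key(star) < a_key(bent))   # True: star has fewer nodes, so it comes first in A-order
print(b_key(bent) < b_key(star))   # True: bent has the smaller root degree, so it comes first in B-order
```

Python compares tuples element by element, which is exactly the "first differing subtree" rule of both definitions. When the leading entries tie, the keys in B-order have the same length; in A-order two different trees of equal size always differ in some subtree before either tuple runs out, so no integer is ever compared with a tuple.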
The best known algorithms for ranking and unranking trees are considerably more efficient when the trees are in B-order. Therefore, in this paper we will only be concerned with generation in B-order.

Tree Sequences and Lexicographic Ordering

Define $A(K,N)$ to be the set of integer sequences $a = a_1 a_2 \ldots a_n$ that have $n_i$ occurrences of the integer $k_i$ and possess the dominating property. A sequence $a$ has the dominating property if, for every prefix $a_1 a_2 \ldots a_\ell$, $1 \le \ell \le n$, the number of 0's is not greater than $\sum_{i=1}^{t}(k_i - 1)\cdot(\text{number of } k_i\text{'s})$. The following theorem was proved in [1] (using the palindromes of our sequences).

Theorem 1: There is a one-to-one correspondence between $T(K,N)$ and $A(K,N)$.

This correspondence is simple to understand and use. Given a tree $T$, construct the sequence $a_T$ by labeling each node with its number of sons and reading the labels in pre-order. The last node is not read, since it is always a leaf, and its omission simplifies matters. This maps a tree to a sequence. The inverse mapping is accomplished by building a tree node by node from the sequence $a_T$. Begin by creating a root with degree $a_1$, and position a pointer there. In general, process $a_i$ by creating a new son, of degree $a_i$, of the node $v$ currently pointed to, and move the pointer from $v$ to it. Whenever the node pointed to has its requisite number of sons (in particular, immediately after a leaf is created), move the pointer back to its father, repeating as long as necessary.

The dominating criterion arises naturally, since a tree in $T(K,N)$ has $\sum_{i=1}^{t}(k_i - 1)n_i + 1$ leaves. If the criterion were violated, it would indicate the existence of a completed tree, which cannot arise since the final leaf is omitted. The property can be written as $\sum_{j=1}^{\ell}(a_j - 1) \ge 0$, since the sum of the negative terms is minus the number of 0's, or more succinctly as
$$\sum_{j=1}^{\ell} a_j \ge \ell, \qquad 1 \le \ell \le n.$$
This has a simple interpretation: there are not more nodes than the collective number of sons. Pre-order search of trees shares with other search methods the property that it inspects all of a node's ancestors before inspecting that node. This is enough to ensure that the sequences read are in $A(K,N)$.
Breadth-first search is such a search method and is used in [1,3]. We will use pre-order exclusively because it simplifies the generation procedure. Similar sequences have been studied extensively in relation to the "ballot problem" (for example [12]). They have been related to binary trees in [2] and were independently given for k-ary trees in [3,13,14]. An overview of such sequences is found in [6].

It is vital to note that $T < T'$ (in B-order) if $a_T$ is lexicographically less than $a_{T'}$. This follows from the definitions of $a_T$ and B-order. Note that $r_T$ and $r_{T'}$ are equal to the first elements of $a_T$ and $a_{T'}$, respectively. Therefore, if $r_T < r_{T'}$, then $a_T$ precedes $a_{T'}$. If $r_T = r_{T'}$ and the first $i - 1$ subtrees are equal, then the corresponding prefixes of $a_T$ and $a_{T'}$ are equal, since $a_T$ is formed in a pre-order fashion. Then the argument recurs on $T_i$ and $T'_i$. Therefore, if we generate the sequences of $A(K,N)$ lexicographically, we will generate the trees of $T(K,N)$ in B-order.

In arranging the sequences of $A(K,N)$ in lexicographic order, as normally defined, only the relative values of the $k_i$'s are needed. Therefore, since $k_{i+1} > k_i$ and $k_0 = 0$, we find it convenient to map the sequences of $A(K,N)$ to sequences $b = b_1 b_2 \ldots b_n$, where $b_j = i$ if $a_j = k_i$. More formally, $b$ is an element of $B(K,N)$ if it contains $n_i$ occurrences of the integer $i$ and
$$\sum_{j=1}^{\ell} k_{b_j} \ge \ell$$
for every prefix $b_1 b_2 \ldots b_\ell$, $1 \le \ell \le n$. There is obviously a one-to-one correspondence between $A(K,N)$ and $B(K,N)$ that preserves the lexicographic ordering.

We note that there is also a correspondence between ordered forests and such sequences. An ordered forest consists of ordered trees which are in turn ordered. Define $F(K,N)$ in the same way as $T(K,N)$, except that the forest consists of $f$ trees and $n_0 = \sum_{i=1}^{t}(k_i - 1)n_i + (f - 1)$; thus $F(K,N) = T(K,N)$ if $f = 1$. $A(K,N)$ is defined analogously. If we introduce a new node $v$ of degree $f$ and connect it to the roots of the $f$ ordered trees, we create one ordered tree.
The correspondence is easily seen if we prefix a sequence $a \in A(K,N)$ with the integer $f$, obtaining the sequence that corresponds to the tree with root $v$, and note that this new sequence has the dominating property. Our generation and ranking procedures work identically on sequences from $F(K,N)$ as from $T(K,N)$, but we use $T(K,N)$ in our discussion for clarity.

Lattice Paths

Corresponding to these sequences are lattice paths within a bounded region of $(t+1)$-dimensional space from the point $(n_0, n_1, \ldots, n_t)$ to the origin. Each step of the path is one unit towards the origin, parallel to some $i$-th axis, and the path may not go below the hyperplane $x_0 = \sum_{i=1}^{t}(k_i - 1)x_i$. Let $P(K,N)$ be this set of paths. In [1] it is shown that there is a one-to-one correspondence between $B(K,N)$ and $P(K,N)$. To map a path to a sequence, let $b_j$ be $i$ if the $j$-th step of the path is parallel to the $i$-th axis. The inverse mapping follows immediately. More formally, an element of $P(K,N)$ is a sequence of lattice points $L_0 L_1 \ldots L_n$ where $L_0 = (n_0, n_1, \ldots, n_t)$, $L_n = (0, 0, \ldots, 0)$, and if $b_i = j$ and $L_{i-1} = (x_0, x_1, \ldots, x_t)$, then $L_i = (x_0, \ldots, x_{j-1}, x_j - 1, x_{j+1}, \ldots, x_t)$.

Example

As an example, consider the tree in Figure 1, which is an element of $T(K,N)$ where $K = (0,2,3)$ and $N = (4,2,1)$. The corresponding elements of $A(K,N)$, $B(K,N)$ and $P(K,N)$ are

(3 2 0 2 0 0 0) ∈ A(K,N)
(2 1 0 1 0 0 0) ∈ B(K,N)
(4,2,1) (4,2,0) (4,1,0) (3,1,0) (3,0,0) (2,0,0) (1,0,0) (0,0,0) ∈ P(K,N)

[Figure 1: the example tree of T(K,N), K = (0,2,3), N = (4,2,1).]

III. GENERATION OF TREES

In the preceding section we described a mapping from $B(K,N)$ to $A(K,N)$ and from $A(K,N)$ to $T(K,N)$. We also showed that the lexicographic order in $B(K,N)$ corresponds to B-order in $T(K,N)$. To generate the next tree after a given tree $T$, we produce its corresponding sequence $b \in B(K,N)$, generate the lexicographically next sequence $b'$, and map it to $T'$, the next tree. All the sequences of $B(K,N)$ are permutations of each other.
If it were not for the condition that each sequence must have the dominating property, generating the next sequence would simply be a matter of generating permutations in lexicographic order, which has been well studied [8]. Indeed, we have chosen one such algorithm and adapted it. In its original form, it can be described as follows: scan the permutation $c_1 c_2 \ldots c_n$ of $\{1, 2, \ldots, n\}$ from the right until the first occurrence of $c_i < c_{i+1}$. Substitute $c_i$ with the least $c_j$ such that $c_j > c_i$ and $j > i$, and append after it the first permutation of $\{c_i, c_{i+1}, \ldots, c_n\} - \{c_j\}$ in the ordering. (This is done efficiently by a single exchange and a subsequence reversal.)

Similarly, we scan from the right for the first $b_i < b_{i+1}$ and substitute $b_i$ with the appropriate $b_j$ (the least $b_j > b_i$ with $j > i$). And again, we append the first permutation of the multiset $b^* = \{b_i, b_{i+1}, \ldots, b_n\} - \{b_j\}$. If it were not for the dominating property, the first permutation of $b^*$ would be $0^{m_0} 1^{m_1} 2^{m_2} \cdots t^{m_t}$, where $S^x$ indicates $x$ repetitions of the sequence $S$, and there are $m_i$ occurrences of $i$ in $b^*$. Let $d = m_0 - \sum_{i=1}^{t} m_i(k_i - 1)$. The first permutation of $b^*$ is
$$0^{d}\,(1\,0^{k_1-1})^{m_1}\,(2\,0^{k_2-1})^{m_2} \cdots (t\,0^{k_t-1})^{m_t}.$$
To show this, note that, since $\sum_{j=1}^{n} k_{b_j} = n$, the dominating property can be rewritten as $\sum_{j=\ell+1}^{n} k_{b_j} \le n - \ell$ for $1 \le \ell \le n$. Therefore, if the first permutation of $b^*$ began with $d + 1$ 0's, the property would not hold. The next character must be the smallest non-zero character of $b^*$; otherwise, some other sequence would precede it. Say it is a 1; then at most $k_1 - 1$ 0's may follow before the property is again violated. The argument recurs, establishing the above permutation. Note that $d \ge 0$, since the original sequence had the dominating property and $b_i < b_j$ when $b_i$ and $b_j$ were interchanged.

We now state the preceding discussion as an algorithm. Note that the test for termination (whether $b$ is already the last sequence, i.e. non-increasing) is made before the algorithm is entered. It is easily seen to have time complexity $O(n)$, where $n = \sum_{i=0}^{t} n_i$.
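Before the formal statement, the pivot-and-complete step argued above can be checked with a short Python sketch. It is our illustration, not a transcription of GENERATE: it recounts the suffix into a list, assumes `K` is the tuple $(k_0, \ldots, k_t)$ with $k_0 = 0$, and the names `completion`, `next_b` and `all_b` are hypothetical. The first sequence of $B(K,N)$ is obtained as the completion of the empty prefix, for which $d = n_0 - \sum_{i=1}^{t} n_i(k_i - 1) = 0$.

```python
from typing import Iterator, List, Optional, Sequence


def completion(m: Sequence[int], K: Sequence[int]) -> List[int]:
    """First arrangement of the multiset with m[i] copies of i that keeps the
    dominating property: 0^d (1 0^(k_1 - 1))^m_1 ... (t 0^(k_t - 1))^m_t."""
    d = m[0] - sum(m[i] * (K[i] - 1) for i in range(1, len(K)))
    if d < 0:
        raise ValueError("no arrangement has the dominating property")
    tail = [0] * d
    for i in range(1, len(K)):
        tail += ([i] + [0] * (K[i] - 1)) * m[i]
    return tail


def next_b(b: Sequence[int], K: Sequence[int]) -> Optional[List[int]]:
    """Lexicographically next sequence of B(K,N) after b, or None if b is the last."""
    s = list(b)
    p = len(s) - 2
    while p >= 0 and s[p] >= s[p + 1]:      # scan from the right for b_p < b_(p+1)
        p -= 1
    if p < 0:
        return None                         # s is non-increasing: it is the last one
    suffix = s[p + 1:]
    c = min(x for x in suffix if x > s[p])  # least larger value to the right of b_p
    m = [0] * len(K)                        # m[i] = occurrences of i in b*
    for x in suffix:
        m[x] += 1
    m[s[p]] += 1                            # b_p joins the multiset b*
    m[c] -= 1                               # c leaves it and replaces b_p
    return s[:p] + [c] + completion(m, K)


def all_b(K: Sequence[int], N: Sequence[int]) -> Iterator[List[int]]:
    """All of B(K,N) in lexicographic order, i.e. the trees of T(K,N) in B-order."""
    b: Optional[List[int]] = completion(N, K)
    while b is not None:
        yield b
        b = next_b(b, K)


for index, seq in enumerate(all_b((0, 2, 3), (4, 2, 1)), start=1):
    print(index, "".join(map(str, seq)))
```

For K = (0,2,3) and N = (4,2,1) the loop should list the same 21 sequences as the table given after algorithm GENERATE. Each call does a constant number of passes over the suffix plus O(t) work on the counts, in line with the O(n) bound stated above.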
Algorithm GENERATE(b);
(This algorithm replaces the input sequence b = b[1] b[2] ... b[n] by the lexicographically
 next sequence of B(K,N); b is assumed not to be the last sequence.)
    for j ← 0 to t do m[j] ← 0;
    i ← n;
    while b[i-1] ≥ b[i] do
        begin m[b[i]] ← m[b[i]] + 1; i ← i - 1 end;
    m[b[i]] ← m[b[i]] + 1;           (m counts the non-increasing suffix b[i] ... b[n])
    ℓ ← b[i-1] + 1;
    while m[ℓ] = 0 do ℓ ← ℓ + 1;     (ℓ is the least value in the suffix exceeding b[i-1])
    m[b[i-1]] ← m[b[i-1]] + 1;
    b[i-1] ← ℓ;
    m[ℓ] ← m[ℓ] - 1;                 (m now counts the multiset b*)
    sum ← 0;
    for j ← 1 to t do sum ← sum + m[j]·(k[j] - 1);
    m[0] ← m[0] - sum;               (m[0] is now d)
    for j ← 0 to t do
        while m[j] > 0 do
            begin
                b[i] ← j; m[j] ← m[j] - 1; i ← i + 1;
                if j > 0 then
                    for r ← 1 to k[j] - 1 do
                        begin b[i] ← 0; i ← i + 1 end
            end
end.

This algorithm can be stated more succinctly in the case of k-ary trees, i.e. $t = 1$. In [14] this is done, but the more convenient reverse B-order is used, in which the precedence relations are merely reversed. Generation and ranking are both done differently there, but they use sequences from $B(K,N)$. In [9,10] k-ary trees are ranked and generated in B-order by using a mapping between k-ary trees and binary trees. In [7] binary trees are generated and ranked in B-order, using the sequence of the level numbers (i.e. the heights) of the leaves read in in-order. The correspondence of these two methods to B-order was established in [13].

Applying the generating algorithm to $B(K,N)$ for the example at the end of the previous section ($K = (0,2,3)$, $N = (4,2,1)$), we get the following sequences:

Index  Sequence      Index  Sequence
  1    1010200         12   1210000
  2    1012000         13   2001010
  3    1020010         14   2001100
  4    1020100         15   2010010
  5    1021000         16   2010100
  6    1100200         17   2011000
  7    1102000         18   2100010
  8    1120000         19   2100100
  9    1200010         20   2101000
 10    1200100         21   2110000
 11    1201000

IV. RANKING AND UNRANKING

In this section we compute the function Index(L) that, given a path $L \in P(K,N)$, computes its position in the lexicographic ordering of $P(K,N)$ (that is, of the corresponding sequences of $B(K,N)$); also, given an integer $w$, we construct the path $L \in P(K,N)$ such that Index(L) = w. As discussed earlier, the paths $L = L_0 L_1 \ldots L_n$ in $P(K,N)$ are those lattice paths from the point $(n_0, n_1, \ldots, n_t)$
to the origin which do not go below the hyperplane $x_0 = \sum_{i=1}^{t}(k_i - 1)x_i$. We make use of the multinomial coefficients
$$\binom{d}{d_1, d_2, \ldots, d_\ell} = \begin{cases} 0 & \text{if any } d_i < 0,\\[4pt] \dfrac{d!}{d_1!\, d_2! \cdots d_\ell!} & \text{otherwise,}\end{cases}$$
where $d = \sum_{i=1}^{\ell} d_i$. The multinomial coefficient has a familiar interpretation as the number of lattice paths from the point $(d_1, d_2, \ldots, d_\ell)$ to the origin. This interpretation gives a combinatorial proof of the following lemma, which is also easily proved directly from the above definition.

Lemma 1: If $d = \sum_{i=1}^{\ell} d_i \ge 1$ and all $d_i$ are integers, then
$$\binom{d}{d_1, d_2, \ldots, d_\ell} = \sum_{i=1}^{\ell} \binom{d-1}{d_1, \ldots, d_{i-1}, d_i - 1, d_{i+1}, \ldots, d_\ell}.$$

Let $C(n_0, n_1, \ldots, n_t)$ denote the number of lattice paths from the point $(n_0, n_1, \ldots, n_t)$ to the origin which do not go below the hyperplane $x_0 = \sum_{i=1}^{t}(k_i - 1)x_i$. The following theorem defines these entries recursively and solves the recurrence relation.

Theorem 2: The solution to the recurrence relation
$$C(n_0, n_1, \ldots, n_t) = \begin{cases} 0 & \text{if } n_i < 0 \text{ for } i = 1, 2, \ldots, \text{ or } t,\\ 1 & \text{if } n_0 = n_1 = \cdots = n_t = 0,\\ 0 & \text{if } n_0 = \sum_{i=1}^{t}(k_i - 1)n_i - 1,\\ \sum_{j=0}^{t} C(n_0, \ldots, n_j - 1, \ldots, n_t) & \text{otherwise,}\end{cases}$$
is given by
$$C(n_0, n_1, \ldots, n_t) = \binom{n}{n_0, n_1, \ldots, n_t} - \sum_{i=1}^{t}(k_i - 1)\binom{n}{n_0 + 1, n_1, \ldots, n_i - 1, \ldots, n_t} \qquad (*)$$
where $n = n_0 + n_1 + \cdots + n_t$.

Proof: We show that $C(n_0, n_1, \ldots, n_t)$, as given by (*), satisfies the recurrence relation and the boundary conditions. When $n_i < 0$ for $i = 1, 2, \ldots$, or $t$, (*) gives the value 0 by definition. The case $n_0 = n_1 = \cdots = n_t = 0$ is taken care of in the same way. When $n_0 = \sum_{i=1}^{t}(k_i - 1)n_i - 1$ and no $n_i$ is negative, (*) can be rewritten as
$$C(n_0, n_1, \ldots, n_t) = \frac{n!}{(n_0+1)!\, n_1! \cdots n_t!}\Big[\,n_0 + 1 - \sum_{i=1}^{t}(k_i - 1)n_i\Big],$$
from which it is clear that $C(n_0, n_1, \ldots, n_t)$ is 0 in this case.

If $(n_0, n_1, \ldots, n_t)$ falls under none of the above cases, we prove the recurrence by induction on $n$. For $n = 1$, (*) is correct. We assume that it holds for any $m < n$, and take $n = n_0 + n_1 + \cdots + n_t$. In the rest of the proof we write $\mathbf{n} = (n_0, n_1, \ldots, n_t)$, let $\mathbf{e}_j$ be the unit vector along the $j$-th axis, and write $\binom{m}{\mathbf{x}}$ for the multinomial coefficient whose lower entries are the coordinates of $\mathbf{x}$; thus (*) reads $C(\mathbf{n}) = \binom{n}{\mathbf{n}} - \sum_{i=1}^{t}(k_i - 1)\binom{n}{\mathbf{n} + \mathbf{e}_0 - \mathbf{e}_i}$. By the recursive definition of $C$ we have
$$C(\mathbf{n}) = \sum_{j=0}^{t} C(\mathbf{n} - \mathbf{e}_j).$$
For each of the terms on the right we use (*), by the induction hypothesis, and get
$$C(\mathbf{n}) = \sum_{j=0}^{t}\binom{n-1}{\mathbf{n} - \mathbf{e}_j} - \sum_{i=1}^{t}(k_i - 1)\sum_{j=0}^{t}\binom{n-1}{\mathbf{n} + \mathbf{e}_0 - \mathbf{e}_i - \mathbf{e}_j},$$
which, by Lemma 1 applied to $\mathbf{n}$ and to each $\mathbf{n} + \mathbf{e}_0 - \mathbf{e}_i$ (whose coordinates also sum to $n$), gives
$$C(\mathbf{n}) = \binom{n}{\mathbf{n}} - \sum_{i=1}^{t}(k_i - 1)\binom{n}{\mathbf{n} + \mathbf{e}_0 - \mathbf{e}_i},$$
as desired. □

This recurrence has previously been solved for the case $t = 1$: for $k_1 = 2$ it was solved in [11], and a solution for arbitrary $k_1$ is found in [12]. A solution of the general problem for points on the hyperplane $x_0 = \sum_{i=1}^{t}(k_i - 1)x_i$ is given in [1] using an involved generating-function argument.

Given a path $L \in P(K,N)$, we find its position Index(L) in the lexicographic ordering of $P(K,N)$ as follows.

Theorem 3: Let $b = b_1 b_2 \ldots b_n \in B(K,N)$ and let $L = L_0 L_1 \ldots L_n \in P(K,N)$ be the corresponding lattice path, where $L_i = (y_{i0}, y_{i1}, \ldots, y_{it})$. Then
$$\mathrm{Index}(L) = 1 + \sum_{i=0}^{n-1}\ \sum_{j=0}^{b_{i+1}-1} C(y_{i0}, y_{i1}, \ldots, y_{ij} - 1, \ldots, y_{it}),$$
where the values of $C$ are given by Theorem 2.

Proof: By definition, all the sequences that begin with any of $0, 1, \ldots, b_1 - 1$ come before the sequence $b$, and their number is given by the inner summation $\sum_{j=0}^{b_1 - 1}$ (the term $i = 0$). This follows from the definitions of $b_i$ and $y_{ij}$ and from Theorem 2. Next, all the sequences which begin with $b_1 0, b_1 1, \ldots$, or $b_1 (b_2 - 1)$ come before $b$, and their number is given by the summation $\sum_{j=0}^{b_2 - 1}$ (the term $i = 1$). The rest follows immediately by induction, following this line of argument. The constant 1 is added so that the indexing begins with 1 rather than 0. □

The time complexity depends on how the values $C(x_0, \ldots, x_t)$ are calculated.
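Theorems 2 and 3, and the UNRANK procedure given later in this section, translate almost line by line into code. The Python sketch below is our illustration under stated assumptions: the function names are hypothetical, `K` and `N` are the tuples $(k_0, \ldots, k_t)$ and $(n_0, \ldots, n_t)$, each value of $C$ is recomputed on demand from (*) rather than stored in an array, and (*) is guarded to return 0 at any point with a negative coordinate (such points have no paths). A direct memoized recursion is included only to cross-check (*) on small points; it counts every point below the hyperplane as 0, which agrees with the boundary cases of Theorem 2 on all points reachable from a valid start.

```python
from functools import lru_cache
from math import factorial
from typing import List, Sequence


def multinomial(parts: Sequence[int]) -> int:
    """(d; d_1, ..., d_l) with d = sum(parts); 0 if any part is negative."""
    if any(x < 0 for x in parts):
        return 0
    result = factorial(sum(parts))
    for x in parts:
        result //= factorial(x)            # the division is exact at every step
    return result


def c_closed(point: Sequence[int], K: Sequence[int]) -> int:
    """C(n_0, ..., n_t) from formula (*), guarded to 0 at negative coordinates."""
    if min(point) < 0:
        return 0
    value = multinomial(point)
    for i in range(1, len(K)):
        shifted = list(point)
        shifted[0] += 1                    # n_0 + 1
        shifted[i] -= 1                    # n_i - 1
        value -= (K[i] - 1) * multinomial(shifted)
    return value


def c_recurrence(point: Sequence[int], K: Sequence[int]) -> int:
    """C by direct recursion over the last step (cross-check only)."""
    @lru_cache(maxsize=None)
    def c(p: tuple) -> int:
        if min(p) < 0 or p[0] < sum((K[i] - 1) * p[i] for i in range(1, len(K))):
            return 0                       # negative coordinate or below the hyperplane
        if not any(p):
            return 1                       # the origin
        return sum(c(p[:j] + (p[j] - 1,) + p[j + 1:]) for j in range(len(p)))
    return c(tuple(point))


def rank(b: Sequence[int], K: Sequence[int], N: Sequence[int]) -> int:
    """Index(L) of Theorem 3 for the path of b."""
    y = list(N)
    index = 1
    for axis in b:
        for j in range(axis):              # directions 0, ..., b_(i+1) - 1
            y[j] -= 1
            index += c_closed(y, K)
            y[j] += 1
        y[axis] -= 1                       # the step from L_i to L_(i+1)
    return index


def unrank(w: int, K: Sequence[int], N: Sequence[int]) -> List[int]:
    """Inverse of rank, in the manner of algorithm UNRANK."""
    y = list(N)
    u = w - 1                              # zero-based rank of the remaining path
    b: List[int] = []
    for _ in range(sum(N)):
        total = 0
        for j in range(len(N)):
            y[j] -= 1
            s = c_closed(y, K)
            y[j] += 1
            if total + s > u:              # u falls among the paths stepping along axis j
                break
            total += s
        else:
            raise ValueError("w is out of range")
        b.append(j)
        y[j] -= 1
        u -= total
    return b


K, N = (0, 2, 3), (4, 2, 1)
print(c_closed(N, K), c_recurrence(N, K))   # 21 21
print(rank([2, 1, 0, 1, 0, 0, 0], K, N))     # 20
print(unrank(20, K, N))                      # [2, 1, 0, 1, 0, 0, 0]
```

Recomputing each value of $C$ from factorials corresponds to the "calculated as needed" option discussed next; tabulating all values of $C$ once would trade memory for the faster per-query bound.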
If storage is inexpensive or ranking is done frequently, all the values could be stored in a $(t+1)$-dimensional array of space complexity $O(n^*)$, computed in time proportional to $(t+1)n^*$, where $n^* = n_0 n_1 \cdots n_t$. Using this array allows each ranking to be done with time complexity $O((t+1)n)$. If ranking is done infrequently, each value $C(x_0, \ldots, x_t)$ can be calculated as needed in time $O((t+1)n)$. This leads to an $O((t+1)^2 n^2)$ ranking procedure. Note that for most applications $t + 1$ will be small and independent of $n$. In [9] a ranking procedure for k-ary trees is given which is $O((nk)^{\ldots})$ time-bounded. The best known ranking procedures [9] for k-ary trees in A-order are $O(n^{\ldots})$.

To illustrate this procedure, refer to the tree and path in our previous example. In Figure 2 the lattice points have been labeled with the values $C(x_0, x_1, x_2)$. The path $L$ corresponding to that tree gets the rank
$$\mathrm{Index}(L) = 1 + 12 + 5 + 0 + 2 + 0 + 0 + 0 = 20.$$

As for the unranking procedure, we follow Theorem 3 in reverse order. We are given a number $w$ and look for a path $L$ such that Index(L) = w. The idea is best explained by an example: suppose we want to find the 20th sequence in $P(K,N)$, where $K = (0,2,3)$ and $N = (4,2,1)$. Starting at the point $(4,2,1)$, we sum up the entries in directions $0, 1, \ldots$ (see Figure 2), in that order, as long as we do not exceed $20 - 1 = 19$. Here we take $0 + 12 = 12$ (adding the next entry, 9, would exceed 19), which corresponds to making the first move from $(4,2,1)$ to $(4,2,0)$, i.e. $b_1 = 2$. Starting from this point $(4,2,0)$, we again sum up the entries in directions $0, 1, \ldots$ (in that order) as long as we do not exceed $19 - 12 = 7$. Here we take 5 (adding the next entry, 4, would exceed 7), which corresponds to making the next move from $(4,2,0)$ to $(4,1,0)$, i.e. $b_2 = 1$, and so on. This is given more formally as algorithm UNRANK.

[Figure 2: the lattice points for K = (0,2,3), N = (4,2,1), each labeled with C(x_0, x_1, x_2).]

Algorithm UNRANK(w);
(This algorithm returns the sequence b corresponding to the lattice path L having rank w.
 The rank, counted from 0, of the remaining path from the current point L_i is always u.)
    u ← w - 1;
    (y[0], y[1], ..., y[t]) ← (n[0], n[1], ..., n[t]);
    for i ← 1 to n do
        begin
            (find the largest j such that the sum of the entries in the
             first j directions does not exceed u)
            j ← 0; sum ← 0;
            S ← C(y[0] - 1, y[1], ..., y[t]);
            while sum + S ≤ u do
                begin
                    sum ← sum + S;
                    j ← j + 1;
                    S ← C(y[0], ..., y[j] - 1, ..., y[t])
                end;
            b[i] ← j;
            y[j] ← y[j] - 1;
            u ← u - sum
        end
end.

The proof follows directly from the previous theorem. The space and time complexity considerations are the same as for the ranking procedure.

ACKNOWLEDGEMENT

The authors wish to thank Professor C. L. Liu for his valuable assistance and encouragement during this research.

REFERENCES

[1] I. Z. Chorneyko and S. G. Mohanty, "On the Enumeration of Certain Sets of Planted Plane Trees," Journal of Combinatorial Theory (B) 18, 209-221 (1975).
[2] N. G. de Bruijn and B. J. M. Morselt, "A Note on Plane Trees," Journal of Combinatorial Theory 2, 27-34 (1967).
[3] D. A. Klarner, "Correspondences Between Plane Trees and Binary Sequences," Journal of Combinatorial Theory 9, 401-411 (1970).
[4] G. D. Knott, "A Numbering System for Binary Trees," Communications of the ACM 20(2) (February 1977).
[5] D. E. Knuth, The Art of Computer Programming, vol. 1: Fundamental Algorithms, Addison-Wesley, 1968.
[6] R. C. Read, "The Coding of Various Kinds of Unlabeled Trees," in Graph Theory and Computing, R. C. Read (ed.), Academic Press, 1972.
[7] F. Ruskey and T. C. Hu, "Generating Binary Trees Lexicographically," SIAM Journal on Computing, to appear.
[8] R. Sedgewick, "Permutation Generation Methods," Computing Surveys 9, 152-154 (1977).
[9] A. E. Trojanowski, "On the Ordering, Enumeration and Ranking of k-ary Trees," Technical Report UIUCDCS-R-77-850, Department of Computer Science, University of Illinois at Urbana-Champaign, February 1977.
[10] A. E. Trojanowski, "Ranking and Listing Algorithms for k-ary Trees," SIAM Journal on Computing, to appear.
[11] W. A.
Whitworth, "Arrangements of m Things of One Sort and m Things of Another Sort Under Certain Conditions of Priority," Messenger of Mathematics 8, 105-114 (1878).
[12] A. M. Yaglom and I. M. Yaglom, Challenging Mathematical Problems with Elementary Solutions, vol. 1: Combinatorial Analysis and Probability Theory, Holden-Day, 1964.
[13] S. Zaks, "Generating Binary Trees Lexicographically," Technical Report UIUCDCS-R-77-888, Department of Computer Science, University of Illinois at Urbana-Champaign, August 1977.
[14] S. Zaks, "Generating k-ary Trees Lexicographically," Technical Report UIUCDCS-R-77-901, Department of Computer Science, University of Illinois at Urbana-Champaign, November 1977.

BIBLIOGRAPHIC DATA SHEET

Abstract

We show a one-to-one correspondence between all the ordered trees that have $n_0 + 1$ leaves and $n_i$ internal nodes with $k_i$ sons each, for $i = 1, \ldots, t$ (hence $n_0 = \sum_{i=1}^{t}(k_i - 1)n_i$), and all the lattice paths in $(t+1)$-dimensional space from the point $(n_0, n_1, \ldots, n_t)$ to the origin which do not go below the hyperplane $x_0 = \sum_{i=1}^{t}(k_i - 1)x_i$. Procedures for generating these paths (and thus the ordered trees) are presented, and the ranking and unranking procedures are derived.

Key Words: ordered tree, lattice path, lexicographic order, ranking, unranking