ON RANDOM 3-2 TREES 
 
 by 
 Andrew Chi-Chih Yao 
 
 October 1974
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 
Report No. UIUCDCS-R-74-679

 Department of Computer Science
 University of Illinois
 Urbana, Illinois 61801

 Research supported by NSF Grant GJ-41538.
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/onrandom32trees679yaoa 
 
 ABSTRACT
 
 It is shown that n(N), the average number of nodes in an
 N-key random 3-2 tree, satisfies the inequality $0.70N - \epsilon < n(N) < 0.79N + \epsilon$.
 A similar analysis is done for general B-trees. It is shown that
 storage utilization is essentially $\ln 2 \approx 69\%$ for B-trees of high
 orders.
 
1. Introduction 
 
 Balanced tree structures are often used in the organization of 
 information. One attractive scheme, called "3-2 trees," was introduced
 by J. Hopcroft [1] [4]. Some interesting questions concerning 3-2 trees
 have been raised [3] [4]. In this paper we present a partial solution to
 a problem posed in [3].
 
 A 3-2 tree is a tree in which every internal node contains either 
 1 or 2 keys, and all the leaves are at the same level (see Figure 1). In
 drawing 3-2 trees we shall adopt the notation used in [3]. Thus keys are
 represented by dots inside a node as we shall only be interested in the 
 structure of the trees. 
 
 To put a new key into a node that contains only one key, we 
 simply insert it as the second key. If the node already contains 2 keys, 
 we split the node into two nodes containing respectively the minimum and 
 the maximum of the three keys, and insert the middle key into the parent 
 node by repeating the process. When there is no node above, a new root 
 node will be created to hold the middle key. 
 
 Consider a 3-2 tree T with j-1 keys in it. These j-1 keys
 divide all possible key values into j intervals. The insertion of a new
 key $K_j$ into T is said to be a random insertion if $K_j$ is equally likely
 to fall in any one of the j intervals defined above.
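
The insertion rule and the notion of a random insertion can be sketched in Python as follows. This is a minimal illustration and not code from the report: the names `Node`, `insert`, `random_tree`, `count_nodes` and `leaf_depths` are hypothetical, the leaves of the paper are left implicit (only internal nodes are stored), and random insertions are simulated by inserting independent uniform random keys, which is assumed here to make each of the j intervals equally likely.

```python
import random


class Node:
    """Internal node of a 3-2 tree: 1 or 2 keys (3 only transiently)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty at the lowest internal level


def _insert(node, key):
    """Insert key below node; return (left, middle, right) if node split."""
    if node.children:
        i = sum(k < key for k in node.keys)        # child to descend into
        split = _insert(node.children[i], key)
        if split is None:
            return None
        left, mid, right = split
        node.keys.insert(i, mid)                   # middle key moves up
        node.children[i:i + 1] = [left, right]
    else:
        node.keys.append(key)
        node.keys.sort()
    if len(node.keys) <= 2:
        return None
    lo, mid, hi = node.keys                        # overflow: split the node
    return Node([lo], node.children[:2]), mid, Node([hi], node.children[2:])


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node([key])
    split = _insert(root, key)
    if split is None:
        return root
    left, mid, right = split
    return Node([mid], [left, right])              # new root holds the middle key


def random_tree(n_keys, rng):
    """Build a tree by n_keys random insertions (i.i.d. uniform keys)."""
    root = None
    for _ in range(n_keys):
        root = insert(root, rng.random())
    return root


def count_nodes(node):
    """n(T): the number of internal nodes."""
    return 0 if node is None else 1 + sum(count_nodes(c) for c in node.children)


def leaf_depths(node, depth=0):
    """Set of depths at which the lowest internal nodes occur."""
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))
```

Averaging `count_nodes(random_tree(N, rng))` over many runs gives an empirical estimate of n(N) that can be compared with the bounds derived in Section 2.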
 
 Now consider the building of 3-2 trees by successive random 
 insertions. The average cost involved is dependent on the specific 
 implementation of the insertion algorithm. There are, however, certain 
 quantities that are useful in general for the analysis. One quantity of 
 interest is n(N), the average number of internal nodes in a 3-2 tree after 
 N keys have been randomly inserted into the empty tree. In this paper we 
 
Figure 1 A 3-2 tree with 11 keys 
 
 Figure 2 The two types of 3-2 trees of height 1 
 
shall derive bounds on n(N) and on the corresponding quantity for
 B-trees (see Section 3). A systematic procedure for deriving improved
 bounds is discussed, but the computation involved appears to be
 prohibitive. The main results are contained in Theorems 2.6, 2.13, and 3.1.
 
 2. Number of Nodes in 3-2 Trees 
 
 Let T be any 3-2 tree. We shall use n(T) to denote the number 
 of internal nodes of T. Let $f_N(T)$ be the probability that T will result
 after N random insertions are made to the empty tree. Obviously $f_N(T)$ is
 zero unless T contains exactly N keys. In terms of $f_N(T)$ and n(T), the
 average number of nodes n(N) defined in Section 1 can be expressed as
 follows:

 $$n(N) = \sum_{T} n(T)\, f_N(T) \qquad (1)$$
 
 To derive bounds on n(N), we observe that most of the internal 
 nodes of any 3-2 tree appear on the lowest few levels. Therefore, a 
 good estimate of n(N) can be obtained by analyzing the number of internal 
 nodes in those levels of a random 3-2 tree. We shall carry out the 
 analysis for the lowest level first in the next subsection, and then 
 take the second lowest level into account in Section 2.2. 
 
 2.1 First Order Analysis 
 
 As shown in Figure 2, there are two types of 3-2 trees of
 height 1. The type 1 3-2 tree contains 1 key and the type 2 3-2 tree
 contains 2 keys. An arbitrary 3-2 tree T is said to be of class $(1; x_1, x_2)$
 if among the lowest height 1 subtrees of T, $x_1$ of them are of type 1 and
 $x_2$ are of type 2. The 3-2 tree shown in Figure 1 is of class (1;3,2). Let
 T be an N-key 3-2 tree of class $(1; x_1, x_2)$. The following lemmas are easy
 to obtain:
 
Lemma 2.1: $2x_1 + 3x_2 = N+1$.

 Proof: Both $N+1$ and $2x_1 + 3x_2$ are equal to the number of leaves of T. □

 Lemma 2.2: $\frac{3}{2}(x_1 + x_2) - \frac{1}{2} \le n(T) \le 2(x_1 + x_2) - 1$.

 Proof: There are $x_1 + x_2 - 1$ keys contained in the internal nodes above
 the lowest level. Thus the number of nodes above the lowest level,
 $n(T) - (x_1 + x_2)$, satisfies $\frac{1}{2}(x_1 + x_2 - 1) \le n(T) - (x_1 + x_2) \le x_1 + x_2 - 1$.
 Lemma 2.2 follows. □
 
 Definition 2.3: Let $\mathcal{T}(x_1, x_2)$ be the set of 3-2 trees of class $(1; x_1, x_2)$.
 Define

 $$P_N(x_1, x_2) = \sum_{T \in \mathcal{T}(x_1, x_2)} f_N(T).$$

 Definition 2.4: $A_i(N) \stackrel{\mathrm{def}}{=} \sum_{x_1, x_2} x_i\, P_N(x_1, x_2)$, $i = 1, 2$.

 $P_N(x_1, x_2)$ is obviously the probability for a random N-key 3-2
 tree to be of class $(1; x_1, x_2)$, and $A_i(N)$ is the average value of $x_i$ for
 random N-key 3-2 trees.
 
 Lemma 2.5: $\frac{3}{2}(A_1(N) + A_2(N)) - \frac{1}{2} \le n(N) \le 2(A_1(N) + A_2(N)) - 1$.
 Proof: This follows from Lemma 2.2 and the definitions of n(N), $A_1(N)$, and $A_2(N)$. □
 
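 Lemma 2.6: $A_1(N) = \frac{2}{7}(N+1)$, $A_2(N) = \frac{1}{7}(N+1)$ for $N \ge 6$.

 Proof: Let T be an (N-1)-key 3-2 tree of class $(1; x_1, x_2)$. By making a
 random insertion into T, we will obtain a 3-2 tree either of class
 $(1; x_1 - 1, x_2 + 1)$ or of class $(1; x_1 + 2, x_2 - 1)$. The former situation happens,
 with probability $2x_1/N$, when the new key is inserted into a subtree of
 type 1. Thus we have

 $$\begin{aligned}
 A_1(N) &= \sum_{x_1, x_2} x_1\, P_N(x_1, x_2) \\
 &= \sum_{x_1, x_2} P_{N-1}(x_1, x_2) \left[ \frac{2x_1}{N}(x_1 - 1) + \left(1 - \frac{2x_1}{N}\right)(x_1 + 2) \right] \\
 &= \sum_{x_1, x_2} P_{N-1}(x_1, x_2) \left( x_1 - \frac{6x_1}{N} + 2 \right) \\
 &= \left(1 - \frac{6}{N}\right) A_1(N-1) + 2. \qquad (2)
 \end{aligned}$$

 With initial condition $A_1(1) = 1$, it is easy to show from (2) that

 $$A_1(N) = \frac{2}{7}(N+1) \quad \text{for } N \ge 6. \qquad (3)$$

 Lemma 2.1 implies $2A_1(N) + 3A_2(N) = N+1$. This and (3) give

 $$A_2(N) = \frac{1}{7}(N+1) \quad \text{for } N \ge 6.$$

 □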
 
 Lemmas 2.5 and 2.6 lead immediately to the following theorem:
 Theorem 2.6: $\frac{9}{14}N + \frac{1}{7} \le n(N) \le \frac{6}{7}N - \frac{1}{7}$ for $N \ge 6$.
 Corollary: $0.64N \le n(N) \le 0.86N$ for $N \ge 6$.

 The above bounds should be compared with the obvious bounds $\frac{N}{2} \le n(N) \le N$,
 which can be regarded as the zero order approximation of n(N).
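
The first order analysis can be checked numerically by iterating recurrence (2) in exact arithmetic. The sketch below is an illustration only; the function names `first_order_averages` and `first_order_bounds` are hypothetical, and $A_2(N)$ is obtained from the averaged form of Lemma 2.1.

```python
from fractions import Fraction


def first_order_averages(n_max):
    """Iterate recurrence (2) exactly; return dicts A1[N], A2[N], 1 <= N <= n_max."""
    a1 = {1: Fraction(1)}                  # a 1-key tree is a single type 1 subtree
    for n in range(2, n_max + 1):
        a1[n] = (1 - Fraction(6, n)) * a1[n - 1] + 2
    # Lemma 2.1 averaged over all trees: 2*A1(N) + 3*A2(N) = N + 1
    a2 = {n: (n + 1 - 2 * a1[n]) / 3 for n in a1}
    return a1, a2


def first_order_bounds(n, a1, a2):
    """Lower and upper bounds on n(N) from the two sides of Lemma 2.5."""
    s = a1[n] + a2[n]
    return Fraction(3, 2) * s - Fraction(1, 2), 2 * s - 1


a1, a2 = first_order_averages(60)
low, high = first_order_bounds(20, a1, a2)
print(a1[6], a2[6], low, high)
```

The factor $1 - 6/N$ vanishes at N = 6, which is why the closed form (3) holds exactly from N = 6 onward regardless of the earlier values.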
 
 2.2 Second Order Analysis 
 
 Better bounds for n(N) can be derived by considering the internal
 nodes on the lowest 2 levels of 3-2 trees. Let us divide all 3-2 trees of
 height 2 into 9 types as shown in Figure 3. For any 3-2 tree T with no fewer
 than 3 keys, we can classify T by its height 2 subtrees. We shall say
 that T is of class $(2; x_1, x_2, \ldots, x_9)$ if there are $x_i$ height 2 subtrees of
 type i for each i (Figure 4). Let T be an N-key 3-2 tree of class
 $(2; x_1, x_2, \ldots, x_9)$. The following two lemmas are easy to prove.
 
Figure 3 There are 9 types of 3-2 trees of height 2 (leaves not shown);
 the panels are labelled Type 1 through Type 9.

Figure 4 A 3-2 tree of class (2;1,2,0,0,0,0,1,0,0)
 
 
 Lemma 2.7: $4x_1 + 5x_2 + 6x_3 + 6x_4 + 7x_5 + 7x_6 + 8x_7 + 8x_8 + 9x_9 = N+1$.

 Proof: Similar to the proof of Lemma 2.1. □

 Lemma 2.8: $\frac{7}{2}\sum_{i=1}^{3} x_i + \frac{9}{2}\sum_{i=4}^{9} x_i - \frac{1}{2} \le n(T) \le 4\sum_{i=1}^{3} x_i + 5\sum_{i=4}^{9} x_i - 1$.

 Proof: Similar to the proof of Lemma 2.2. □
 
 In analogy with the notation $P_N(x_1, x_2)$ defined in Section 2.1,
 we use $P_N(2; x_1, x_2, \ldots, x_9)$ to denote the probability for an N-key random
 3-2 tree to be of class $(2; x_1, x_2, \ldots, x_9)$. For each i ($1 \le i \le 9$), define

 $$A_i(N) = \sum_{x_1, \ldots, x_9} x_i\, P_N(2; x_1, \ldots, x_9).$$

 Lemma 2.9: $\frac{7}{2}\sum_{i=1}^{3} A_i(N) + \frac{9}{2}\sum_{i=4}^{9} A_i(N) - \frac{1}{2} \le n(N) \le 4\sum_{i=1}^{3} A_i(N) + 5\sum_{i=4}^{9} A_i(N) - 1$.

 Proof: Use Lemma 2.8 and the definitions of $A_i(N)$ and n(N). □

 We shall study the values of the $A_i(N)$'s. Once these numbers
 are known, Lemma 2.9 determines n(N) to within 15%.
 
 Consider any (N-1)-key 3-2 tree T of class $(2; x_1, x_2, \ldots, x_9)$.
 By examining the insertion process, it can be seen that there are 17
 classes of trees that T might become upon the random insertion of a key.
 These 17 possible classes together with their probabilities of occurrence
 are tabulated in Table 1. Recurrence relations for the $A_i(N)$'s can be
 obtained from Table 1 as in Section 2.1. For example, it is easy to show
 that

 $$\begin{aligned}
 A_1(N) &= \sum_{x_1, \ldots, x_9} P_{N-1}(2; x_1, \ldots, x_9) \left[ x_1 + \frac{4x_1}{N}(-1) + \frac{3x_5}{N}(2) + \frac{3x_6}{N}(2) + \frac{6x_7}{N}(1) + \frac{6x_8}{N}(1) + \frac{6x_9}{N}(1) \right] \\
 &= A_1(N-1) + \frac{1}{N}\bigl(-4A_1(N-1) + 6A_5(N-1) + 6A_6(N-1) + 6A_7(N-1) + 6A_8(N-1) + 6A_9(N-1)\bigr).
 \end{aligned}$$
 
 
 x_1'    x_2'    x_3'    x_4'    x_5'    x_6'    x_7'    x_8'    x_9'    Probability
 ------------------------------------------------------------------------------------
 x_1-1   x_2+1   x_3     x_4     x_5     x_6     x_7     x_8     x_9     4x_1/N
 x_1     x_2-1   x_3+1   x_4     x_5     x_6     x_7     x_8     x_9     2x_2/N
 x_1     x_2-1   x_3     x_4+1   x_5     x_6     x_7     x_8     x_9     3x_2/N
 x_1     x_2     x_3-1   x_4     x_5+1   x_6     x_7     x_8     x_9     6x_3/N
 x_1     x_2     x_3     x_4-1   x_5+1   x_6     x_7     x_8     x_9     4x_4/N
 x_1     x_2     x_3     x_4-1   x_5     x_6+1   x_7     x_8     x_9     2x_4/N
 x_1     x_2     x_3     x_4     x_5-1   x_6     x_7+1   x_8     x_9     2x_5/N
 x_1     x_2     x_3     x_4     x_5-1   x_6     x_7     x_8+1   x_9     2x_5/N
 x_1+2   x_2     x_3     x_4     x_5-1   x_6     x_7     x_8     x_9     3x_5/N
 x_1     x_2     x_3     x_4     x_5     x_6-1   x_7+1   x_8     x_9     4x_6/N
 x_1+2   x_2     x_3     x_4     x_5     x_6-1   x_7     x_8     x_9     3x_6/N
 x_1     x_2     x_3     x_4     x_5     x_6     x_7-1   x_8     x_9+1   2x_7/N
 x_1+1   x_2+1   x_3     x_4     x_5     x_6     x_7-1   x_8     x_9     6x_7/N
 x_1     x_2     x_3     x_4     x_5     x_6     x_7     x_8-1   x_9+1   2x_8/N
 x_1+1   x_2+1   x_3     x_4     x_5     x_6     x_7     x_8-1   x_9     6x_8/N
 x_1+1   x_2     x_3+1   x_4     x_5     x_6     x_7     x_8     x_9-1   6x_9/N
 x_1     x_2+2   x_3     x_4     x_5     x_6     x_7     x_8     x_9-1   3x_9/N

 Table 1 Transition under a random insertion:
 a tree of class (2;x_1,x_2,...,x_9)
 becomes a tree of class (2;x_1',x_2',...,x_9').
 Each row gives the values of x_1',x_2',...,x_9'
 for a possible resulting class with its
 probability of occurrence in the last
 column.
 
 
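 Similar formulas for $A_2(N), \ldots, A_9(N)$ can also be derived. These
 relations can be compactly written in the following form: Let A(N) be
 the 9-component column vector $(A_i(N))$, then

 $$A(N) = \left(I + \frac{1}{N} D\right) A(N-1) \qquad (4)$$

 where I is the 9x9 identity matrix and D is given by

 $$D = \begin{pmatrix}
 -4 & 0 & 0 & 0 & 6 & 6 & 6 & 6 & 6 \\
 4 & -5 & 0 & 0 & 0 & 0 & 6 & 6 & 6 \\
 0 & 2 & -6 & 0 & 0 & 0 & 0 & 0 & 6 \\
 0 & 3 & 0 & -6 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 6 & 4 & -7 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 2 & 0 & -7 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 2 & 4 & -8 & 0 & 0 \\
 0 & 0 & 0 & 0 & 2 & 0 & 0 & -8 & 0 \\
 0 & 0 & 0 & 0 & 0 & 0 & 2 & 2 & -9
 \end{pmatrix} \qquad (5)$$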
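
One way to cross-check the matrix D against Table 1 is to encode the 17 transitions and accumulate their expected effect on each count. The sketch below is illustrative only: the list `TRANSITIONS` is a transcription of Table 1 into a hypothetical tuple format, and `build_D` and `LEAVES` are names introduced here, not taken from the report.

```python
# Number of leaves under a height 2 subtree of type 1..9 (coefficients of Lemma 2.7).
LEAVES = [4, 5, 6, 6, 7, 7, 8, 8, 9]

# (source type s, weight w, {type: change}) means: with probability w * x_s / N,
# the counts x_type change by the listed amounts. Transcribed from Table 1.
TRANSITIONS = [
    (1, 4, {1: -1, 2: +1}),
    (2, 2, {2: -1, 3: +1}),
    (2, 3, {2: -1, 4: +1}),
    (3, 6, {3: -1, 5: +1}),
    (4, 4, {4: -1, 5: +1}),
    (4, 2, {4: -1, 6: +1}),
    (5, 2, {5: -1, 7: +1}),
    (5, 2, {5: -1, 8: +1}),
    (5, 3, {5: -1, 1: +2}),
    (6, 4, {6: -1, 7: +1}),
    (6, 3, {6: -1, 1: +2}),
    (7, 2, {7: -1, 9: +1}),
    (7, 6, {7: -1, 1: +1, 2: +1}),
    (8, 2, {8: -1, 9: +1}),
    (8, 6, {8: -1, 1: +1, 2: +1}),
    (9, 6, {9: -1, 1: +1, 3: +1}),
    (9, 3, {9: -1, 2: +2}),
]


def build_D():
    """D[i][j] = expected change of x_(i+1), times N, per subtree of type j+1."""
    D = [[0] * 9 for _ in range(9)]
    for src, weight, delta in TRANSITIONS:
        for typ, change in delta.items():
            D[typ - 1][src - 1] += weight * change
    return D


D = build_D()
for row in D:
    print(row)
```

Two sanity checks follow from the structure of the problem: the weights leaving a type must add up to its number of leaves, and each insertion must add exactly one leaf, so that $\sum_i L_i D_{ij} = L_j$ where $L_i$ is the number of leaves of a type i subtree.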
 
 To solve (4) for A(N), we define a 9-component column vector
 $a(N) = (a_i(N))$ by

 $$a_i(N) = A_i(N)/(N+1) \qquad (6)$$

 In terms of a(N), (4) can be written as

 $$a(N) = \left(I + \frac{1}{N+1}(D - I)\right) a(N-1) \qquad (7)$$
 
 To solve (7), the following two lemmas will be useful. Since recurrence
 relations of the form (7) have been studied in another context [5], we
 shall omit the proofs here.
 
 Lemma 2.10: Let G be a p × p real matrix with simple eigenvalues
 $\lambda_0, \lambda_1, \lambda_2, \ldots, \lambda_{p-1}$, where $\lambda_0 = 0$ and $\mathrm{Re}\,\lambda_{p-1} \le \mathrm{Re}\,\lambda_{p-2} \le \cdots \le \mathrm{Re}\,\lambda_1 < 0$.
 If v(1), v(2), ..., v(N), ... is a sequence of p-component vectors satisfying
 $v(j) = \left(I + \frac{1}{j+1} G\right) v(j-1)$, then there exists a vector u such that

 (i) $Gu = 0$,
 (ii) $|(v(N))_i - u_i| < C N^{\mathrm{Re}\,\lambda_1}$ for some constant C and all i, N,

 where $(v(N))_i$ and $u_i$ denote the i-th components of v(N) and u respectively.
 Corollary: $v(N) \to u$ as $N \to \infty$.
 
 Lemma 2.11: Let k be a positive integer. Then the polynomial
 $g(\lambda) = \prod_{j=0}^{l} (\lambda + k + j) - \prod_{j=0}^{l} (k + j)$ has only simple roots. Furthermore,
 the real parts of all the roots except the root $\lambda = 0$ are negative.
 
 We now turn to the determination of a(N) by (7). An explicit
 calculation shows that the characteristic polynomial of D-I is
 $-\lambda(\lambda+7)(\lambda+8)(\lambda+9)\bigl((\lambda+9)^5 + 25(\lambda+9)^3 + 210(\lambda+9)^2 + 136(\lambda+9) + 384\bigr)$. The roots of the
 polynomial, which are the eigenvalues of D-I, are $0$, $-6.55 \pm 6.25i$,
 $-7$, $-8$, $-9$, $-9.23 \pm 1.37i$, $-13.44$. Thus, D-I satisfies the conditions
 on G in Lemma 2.10. Therefore, there exists a vector $u = (u_i)$ such that

 $$|a_i(N) - u_i| < C_1 N^{-6.55} \quad \text{for some constant } C_1 \qquad (8)$$

 where u satisfies

 $$(D - I)u = 0 \qquad (9)$$

 In terms of the $u_i$'s, we can express Lemma 2.9 as follows:
 Lemma 2.12:

 $$(N+1)\Bigl(\tfrac{7}{2}\sum_{i=1}^{3} u_i + \tfrac{9}{2}\sum_{i=4}^{9} u_i\Bigr) - \tfrac{1}{2} - CN^{-5.55} \le n(N) \le (N+1)\Bigl(4\sum_{i=1}^{3} u_i + 5\sum_{i=4}^{9} u_i\Bigr) - 1 + CN^{-5.55}$$

 for some constant C.

 Proof: From equations (6) and (8), we obtain

 $$|A_i(N) - (N+1)u_i| < C_2 N^{-5.55} \quad \text{for some constant } C_2. \qquad (10)$$

 The lemma follows immediately from (10) and Lemma 2.9. □
 
 
 Now, to find the values of the $u_i$'s, we observe that
 equation (9) determines u up to a constant factor. The normalization
 constant can be determined as follows: Lemma 2.7 and equation (6)
 lead to the equation

 $$4a_1(N) + 5a_2(N) + 6a_3(N) + 6a_4(N) + 7a_5(N) + 7a_6(N) + 8a_7(N) + 8a_8(N) + 9a_9(N) = 1 \qquad (11)$$

 Since $a_i(N) \to u_i$ as $N \to \infty$, (11) implies that

 $$4u_1 + 5u_2 + 6u_3 + 6u_4 + 7u_5 + 7u_6 + 8u_7 + 8u_8 + 9u_9 = 1, \qquad (12)$$

 which is the equation we need in order to determine the normalization
 constant. Therefore, solving equations (9) and (12), we obtain:

 u_1 = 414/7991   = 0.052
 u_2 = 396/7991   = 0.050
 u_3 = 912/55937  = 0.016
 u_4 = 1188/55937 = 0.021
 u_5 = 1278/55937 = 0.023          (13)
 u_6 = 297/55937  = 0.005
 u_7 = 416/55937  = 0.007
 u_8 = 284/55937  = 0.005
 u_9 = 20/7991    = 0.003

 Substituting the values of the $u_i$'s into the inequality in Lemma 2.12,
 we obtain the main result of this section.

 Theorem 2.13: $0.70N + 0.2 - CN^{-5.55} \le n(N) \le 0.79N - 0.2 + CN^{-5.55}$ for some
 constant C.
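
The values in (13) and the constants in Theorem 2.13 can be verified in exact rational arithmetic. The following sketch is an illustration under the assumption that D is the matrix of equation (5); the names `residual` and `slopes` are hypothetical helpers introduced for this check.

```python
from fractions import Fraction as F

# Matrix D of equation (5).
D = [
    [-4, 0, 0, 0, 6, 6, 6, 6, 6],
    [4, -5, 0, 0, 0, 0, 6, 6, 6],
    [0, 2, -6, 0, 0, 0, 0, 0, 6],
    [0, 3, 0, -6, 0, 0, 0, 0, 0],
    [0, 0, 6, 4, -7, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, -7, 0, 0, 0],
    [0, 0, 0, 0, 2, 4, -8, 0, 0],
    [0, 0, 0, 0, 2, 0, 0, -8, 0],
    [0, 0, 0, 0, 0, 0, 2, 2, -9],
]

# The limiting vector u of equation (13).
U = [F(414, 7991), F(396, 7991), F(912, 55937), F(1188, 55937), F(1278, 55937),
     F(297, 55937), F(416, 55937), F(284, 55937), F(20, 7991)]

LEAVES = [4, 5, 6, 6, 7, 7, 8, 8, 9]      # coefficients of equation (12)


def residual(D, u):
    """Components of (D - I) u; all zero when u satisfies equation (9)."""
    return [sum(D[i][j] * u[j] for j in range(9)) - u[i] for i in range(9)]


def slopes(u):
    """Coefficients of (N + 1) on the two sides of Lemma 2.12."""
    low = F(7, 2) * sum(u[:3]) + F(9, 2) * sum(u[3:])
    high = 4 * sum(u[:3]) + 5 * sum(u[3:])
    return low, high


low, high = slopes(U)
print(residual(D, U))
print(sum(c * x for c, x in zip(LEAVES, U)))
print(float(low), float(high), float(low - F(1, 2)), float(high - 1))
```

The last line prints the slopes together with the constant terms `low - 1/2` and `high - 1` that appear when $(N+1)$ is expanded, which is where the $+0.2$ and $-0.2$ of Theorem 2.13 come from.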
 
 
 The technique used in this subsection can be used to compute
 higher moments of the number of nodes n(T). For example, we can set
 up a system of recurrence relations of the form of equation (4) for the
 quantities $A_{ij}(N)$, where

 $$A_{ij}(N) = \sum_{x_1, \ldots, x_9} P_N(2; x_1, \ldots, x_9)\, x_i x_j, \qquad i, j = 1, 2, \ldots, 9.$$

 Determination of the $A_{ij}(N)$'s will then lead (by Lemma 2.8) to bounds
 on the average value of $n(T)^2$ for N-key random 3-2 trees.
 
 2.3 Higher Order Analysis 
 
 The methods used in the previous two subsections obviously
 can be generalized to obtain better approximations of n(N). By computing
 the average number of nodes in the lowest k levels, we can determine
 n(N) to an accuracy of $\frac{1}{2(2^k - 1)} \times 100\%$. This is so because at most
 $1/2^k$ of the keys are in nodes above the lowest k levels.

 This general procedure, however, is not very useful in
 practice. If F(k) is the number of different types of trees of height k,
 then solution of this problem involves the manipulation of an F(k) × F(k)
 matrix. It is not difficult to show that $F(k) = \frac{1}{2}F(k-1)\bigl(F(k-1)+1\bigr)^2$.
 We have analyzed the cases k = 1, 2, where F(1) = 2 and F(2) = 9.
 However, F(3) = 450 is already such a large number that carrying out
 the computation appears to be very difficult.
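
The growth of F(k) is easy to tabulate from the recurrence. The short sketch below simply iterates the formula stated above, starting from F(1) = 2; the function name `num_types` is a hypothetical choice.

```python
def num_types(k):
    """F(k) from F(k) = F(k-1) * (F(k-1) + 1)**2 / 2 with F(1) = 2."""
    f = 2
    for _ in range(k - 1):
        # f * (f + 1)**2 is always even, so the integer division is exact
        f = f * (f + 1) ** 2 // 2
    return f


for k in range(1, 5):
    print(k, num_types(k))
```

Since the matrix to be handled has F(k) rows and columns, the roughly quadratic-to-cubic growth of F(k) per level is what makes the third order analysis impractical by hand.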
 
 3. An Analysis of B-trees
 3.1 Introduction
 
 A natural extension of 3-2 trees is the idea of "B-trees" [2]
 [4]. A B-tree of order m is a tree in which the number of keys contained
 in any internal node other than the root is no greater than m-1 and
 no less than $\lceil m/2 \rceil - 1$. 3-2 trees are just B-trees of order 3. To add
 a key to a node, we insert the new key among the other keys and check
 if the node now contains more than m-1 keys. If the answer is no,
 the insertion has been completed. Otherwise, we split the node into 2
 nodes, one of which contains the smallest $\lceil m/2 \rceil - 1$ keys and the other
 the $m - \lceil m/2 \rceil$ largest keys; the one remaining key is then inserted into
 the parent node. Random B-trees are defined in exactly the same way
 as random 3-2 trees are defined.
 
 We shall study $n_m(N)$, the average number of nodes in the
 B-trees of order m resulting from N random insertions. An obvious
 bound was given in [4]:

 $$\frac{1}{m-1} \le \frac{n_m(N)}{N} \le \frac{1}{\lceil m/2 \rceil - 1} + \frac{1}{N} \qquad (14)$$
 
 In this section we shall consider the nodes at the lowest level, and 
 do an analysis similar to the first order analysis done in Section 2.1. 
 As we shall see, this analysis yields better results than the corresponding 
 analysis for 3-2 trees. This is so because a greater proportion of keys 
 in a B-tree are stored in the lowest internal nodes as m becomes larger. 
 Define the following functions: 
 
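 $$H(N) = \sum_{k=1}^{N} \frac{1}{k} \quad \text{for } N \ge 1,$$

 $$r(m) = \begin{cases}
 \dfrac{1}{m+1}\bigl(H(m) - H(m/2)\bigr)^{-1} & \text{if m is even,} \\[2ex]
 \dfrac{1}{m+1}\bigl(H(m+1) - H((m+1)/2)\bigr)^{-1} & \text{if m is odd.}
 \end{cases}$$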
 
 It is well known [6] that $H(m) \sim \ln m + 0.58 + \frac{1}{2m} + \cdots$. A simple
 computation shows that $r(m) = \frac{1}{m \ln 2} + O\!\left(\frac{1}{m^2}\right)$. Our new bounds on
 $n_m(N)$ are given below:

 Theorem 3.1: For any $\epsilon > 0$ and fixed m,

 $$\Bigl(1 + \frac{1}{m-1}\Bigr) r(m) - \epsilon \le \frac{n_m(N)}{N} \le \Bigl(1 + \frac{1}{\lceil m/2 \rceil - 1}\Bigr) r(m) + \epsilon$$

 when N is sufficiently large.

 Corollary: For any $\epsilon > 0$ and fixed m, $\left| \frac{n_m(N)}{N} - \frac{1}{m \ln 2} \right| < \frac{C}{m^2} + \epsilon$ for all
 sufficiently large N, where C is a constant independent of m and N.

 The corollary follows from Theorem 3.1 and the approximation of r(m)
 given earlier. If all the nodes in a B-tree of order m contained m-1
 keys, there would be N/(m-1) nodes. The ratio $N/((m-1)\, n_m(N))$ can
 therefore be viewed as storage utilization [4]. Our corollary to
 Theorem 3.1 shows that, as N becomes large, the storage utilization
 is essentially $\ln 2 \approx 0.69$ for fixed large m (cf. equation (14)).
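
The function r(m) and the resulting storage utilization can be evaluated directly. The sketch below is an illustration, not part of the report: it assumes the definition of r(m) with the prefactor 1/(m+1) in both the even and odd cases, takes the limit $\epsilon \to 0$ in Theorem 3.1, and uses the hypothetical names `H`, `r` and `utilization_range`.

```python
from fractions import Fraction
from math import log


def H(n):
    """Harmonic number H(n) = 1 + 1/2 + ... + 1/n, computed exactly."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def r(m):
    """r(m) as defined in Section 3.1 (assumed prefactor 1/(m+1) in both cases)."""
    if m % 2 == 0:
        return 1 / ((m + 1) * (H(m) - H(m // 2)))
    return 1 / ((m + 1) * (H(m + 1) - H((m + 1) // 2)))


def utilization_range(m):
    """Limits of N / ((m-1) n_m(N)) implied by Theorem 3.1 as epsilon -> 0."""
    half = -(-m // 2)                                  # ceil(m / 2)
    fewest_nodes = (1 + Fraction(1, m - 1)) * r(m)     # lower bound on n_m(N)/N
    most_nodes = (1 + Fraction(1, half - 1)) * r(m)    # upper bound on n_m(N)/N
    return 1 / ((m - 1) * most_nodes), 1 / ((m - 1) * fewest_nodes)


for m in (3, 11, 121):
    lo, hi = utilization_range(m)
    print(m, float(r(m)), 1 / (m * log(2)), float(lo), float(hi))
```

For m = 3 the function returns r(3) = 3/7, which agrees with $A_1(N) + A_2(N) = \frac{3}{7}(N+1)$ from Lemma 2.6, so the two analyses are consistent on 3-2 trees.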
 
 3.2 Proof of Theorem 3.1 
 
 We will first introduce some notation. Note that there are
 $m - \lceil m/2 \rceil + 1$ types of B-trees of order m and height 1. As shown in
 Figure 5, a type i B-tree contains i keys in its node, for
 $i = \lceil m/2 \rceil - 1, \lceil m/2 \rceil, \ldots, m-1$. A B-tree of order m is said to be of class
 $(y_{\lceil m/2 \rceil - 1}, y_{\lceil m/2 \rceil}, \ldots, y_{m-1})$ if at the lowest level there are $y_i$ subtrees
 of type i for each i. Let $P_N(y_{\lceil m/2 \rceil - 1}, y_{\lceil m/2 \rceil}, \ldots, y_{m-1})$ be the
 probability for a random N-key B-tree of order m to be of class
 $(y_{\lceil m/2 \rceil - 1}, \ldots, y_{m-1})$.
 
 Figure 5 The height 1 B-trees of order m
 consist of $m - \lceil m/2 \rceil + 1$ types, Type $\lceil m/2 \rceil - 1$ through Type m-1.
 (Shown for m = 10)
 
 Definition 3.2: $A_i(N) \stackrel{\mathrm{def}}{=} \sum_{y_{\lceil m/2 \rceil - 1}, \ldots, y_{m-1}} y_i\, P_N(y_{\lceil m/2 \rceil - 1}, \ldots, y_{m-1})$, $i = \lceil m/2 \rceil - 1, \ldots, m-1$.

 For brevity, we have suppressed the dependence of $A_i(N)$ and $P_N$ on m
 in our notation.

 Lemma 3.3:

 $$\Bigl(1 + \frac{1}{m-1}\Bigr) \sum_{i=\lceil m/2 \rceil - 1}^{m-1} A_i(N) - \frac{1}{m-1} \le n_m(N) \le \Bigl(1 + \frac{1}{\lceil m/2 \rceil - 1}\Bigr) \sum_{i=\lceil m/2 \rceil - 1}^{m-1} A_i(N) - \frac{1}{\lceil m/2 \rceil - 1} + 1$$

 Proof: Similar to the proof of Lemma 2.1. The term +1 appearing on
 the right-hand side of the inequality arises from the fact that the root
 may contain fewer than $\lceil m/2 \rceil - 1$ keys. □

 The major effort in proving Theorem 3.1 is contained in the next lemma.

 Lemma 3.4: Let $g(N) \stackrel{\mathrm{def}}{=} \sum_{i=\lceil m/2 \rceil - 1}^{m-1} A_i(N)/(N+1)$. Then for any $\epsilon > 0$,
 $|g(N) - r(m)| < \epsilon$ for all sufficiently large N.

 Proof: We shall assume m = 2p to be an even number. The proof for
 odd m is similar.

 Let T be an (N-1)-key B-tree of order 2p. After a random
 insertion, T may become a B-tree of class $(y_{p-1}, \ldots, y_{i-1} - 1, y_i + 1, \ldots, y_{2p-1})$
 with probability $i\, y_{i-1}/N$ (for each i = p, ..., 2p-1), or it may become a
 B-tree of class $(y_{p-1} + 1, y_p + 1, y_{p+1}, \ldots, y_{2p-1} - 1)$ with probability $2p\, y_{2p-1}/N$.
 It follows that
 
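 $$\begin{aligned}
 A_{p-1}(N) &= A_{p-1}(N-1) + \frac{1}{N}\bigl(-p\,A_{p-1}(N-1) + 2p\,A_{2p-1}(N-1)\bigr), \\
 A_p(N) &= A_p(N-1) + \frac{1}{N}\bigl(p\,A_{p-1}(N-1) - (p+1)A_p(N-1) + 2p\,A_{2p-1}(N-1)\bigr), \\
 A_j(N) &= A_j(N-1) + \frac{1}{N}\bigl(j\,A_{j-1}(N-1) - (j+1)A_j(N-1)\bigr) \quad \text{for } p+1 \le j \le 2p-1.
 \end{aligned} \qquad (15)$$

 Denoting by A(N) the (p+1)-component vector $(A_j(N))$, (15) can be
 written in matrix notation as

 $$A(N) = \left(I + \frac{1}{N} B\right) A(N-1) \qquad (16)$$

 where I is the (p+1) × (p+1) identity matrix, and B is defined by
 (blank entries are zero)

 $$B = \begin{pmatrix}
 -p & & & & & 2p \\
 p & -(p+1) & & & & 2p \\
 & p+1 & -(p+2) & & & \\
 & & p+2 & -(p+3) & & \\
 & & & \ddots & \ddots & \\
 & & & & 2p-1 & -2p
 \end{pmatrix} \qquad (17)$$

 To solve (16) for A(N), we define a (p+1)-component vector $a(N) = (a_i(N))$
 by

 $$a_i(N) = A_i(N)/(N+1) \qquad (18)$$

 Equations (16) and (18) lead to the following recurrence relation:

 $$a(N) = \left[I + \frac{1}{N+1}(B - I)\right] a(N-1) \qquad (19)$$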
 
 The characteristic polynomial $q(\lambda)$ of B-I is computed to be

 $$q(\lambda) = (-1)^{p+1} (\lambda + 2p + 1) \Bigl[ \prod_{j=1}^{p} (\lambda + p + j) - \prod_{j=1}^{p} (p + j) \Bigr] \qquad (20)$$

 From Lemma 2.11, it is easy to see that the roots of $q(\lambda) = 0$ satisfy
 the following conditions:

 (i) All roots are simple roots,
 (ii) $\lambda = 0$ is a root,
 (iii) The real parts of all roots except $\lambda = 0$ are negative.

 Therefore, according to Lemma 2.10, there exists a vector
 $u = (u_{p-1}, u_p, \ldots, u_{2p-1})^T$ such that

 (i) $(B - I)u = 0$ (21)
 (ii) $|a_i(N) - u_i| < C_m N^{-\epsilon_m}$ for $p-1 \le i \le 2p-1$ (22)

 where $C_m$, $\epsilon_m$ are positive constants.

 (22) implies

 $$\Bigl| \sum_{i=p-1}^{2p-1} A_i(N)/(N+1) - \sum_{i=p-1}^{2p-1} u_i \Bigr| \le (p+1)\, C_m N^{-\epsilon_m} \qquad (23)$$

 Now, to determine the $u_i$'s, we note that the following
 equation can be proved easily (cf. the derivation of Equation (12)):

 $$\sum_{i=p-1}^{2p-1} (i+1)\, u_i = 1 \qquad (24)$$

 Solving (21) and (24), we obtain

 $$\begin{aligned}
 u_{p-1} &= \frac{1}{(p+1)(2p+1)} \bigl[H(2p) - H(p)\bigr]^{-1}, \\
 u_i &= \frac{1}{(i+1)(i+2)} \bigl[H(2p) - H(p)\bigr]^{-1}, \qquad p \le i \le 2p-1.
 \end{aligned} \qquad (25)$$

 Therefore,

 $$\sum_{i=p-1}^{2p-1} u_i = \frac{1}{2p+1} \bigl(H(2p) - H(p)\bigr)^{-1} = r(m). \qquad (26)$$

 Finally, substituting (26) in (23), the lemma is obtained. □

 Proof of Theorem 3.1: It is a direct consequence of Lemmas 3.3 and 3.4. □
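
The solution (25) can be checked against equations (21), (24) and (26) for small even orders m = 2p. The sketch below is illustrative only; the rows of B - I are read off from the recurrences (15), and the names `H`, `stationary_vector` and `residual` are hypothetical.

```python
from fractions import Fraction


def H(n):
    """Harmonic number H(n), computed exactly."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def stationary_vector(p):
    """u_{p-1}, ..., u_{2p-1} from equation (25), as a dict keyed by i."""
    c = 1 / (H(2 * p) - H(p))
    u = {p - 1: c / ((p + 1) * (2 * p + 1))}
    for i in range(p, 2 * p):
        u[i] = c / ((i + 1) * (i + 2))
    return u


def residual(p, u):
    """Components of (B - I) u, with B taken from the recurrences (15)."""
    res = {
        p - 1: -(p + 1) * u[p - 1] + 2 * p * u[2 * p - 1],
        p: p * u[p - 1] - (p + 2) * u[p] + 2 * p * u[2 * p - 1],
    }
    for j in range(p + 1, 2 * p):
        res[j] = j * u[j - 1] - (j + 2) * u[j]
    return res


for p in range(2, 6):
    u = stationary_vector(p)
    total = sum(u.values())
    print(p, set(residual(p, u).values()), sum((i + 1) * x for i, x in u.items()), total)
```

For each p the residual should consist of zeros only, the weighted sum should equal 1 as in (24), and the plain sum should equal $\frac{1}{2p+1}(H(2p) - H(p))^{-1}$ as in (26).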
 
 4. Conclusion

 We have derived bounds on the average number of nodes in an
 N-key random B-tree, which essentially is the average number of
 "splittings" in building the tree. One interesting result is that the
 asymptotic storage utilization is approximately $\ln 2 \approx 69\%$ for B-trees
 of high orders. This seems to agree well with one set of experimental
 data (m = 121, N = 5000, storage utilization = 67%, see [4]).

 Many problems about 3-2 trees remain to be investigated.
 What is the average number of splittings on the N-th random insertion
 [3]? How should 3-2 trees be analyzed when deletions are also present? Some
 upper bounds can be obtained for the former problem using the present
 approach, but it appears that very different methods would be required
 to answer these questions satisfactorily.
 
 
 References 
 
 [1] Aho, Hopcroft, Ullman, The Design and Analysis of Computer Algorithms,
 Addison-Wesley, 1974.

 [2] Bayer, R. and E. McCreight, Acta Informatica (1972), pp. 173-189.

 [3] Chvatal, V., D. A. Klarner, and D. E. Knuth, Selected Combinatorial
 Research Problems, Problem 37, STAN-CS-72-292, 1972.

 [4] Knuth, D. E., The Art of Computer Programming, Vol. 3, Addison-Wesley,
 1973, pp. 468-480.

 [5] Knuth, D. E., The Art of Computer Programming, Vol. 3, Addison-Wesley,
 1973, pp. 679-680, answer to exercise 10.

 [6] Knuth, D. E., The Art of Computer Programming, Vol. 1, Addison-Wesley,
 1968, p. 74.
 