Prob(k_i ahead of k_j) is the sum, over all states in which k_i is ahead of k_j, of terms of the form p_i^x p_j^y z, divided by the normalizing constant C, where x > y and z is a product of powers of the other p's. We can pair each p_i^x p_j^y z in the numerator with two terms (p_i^x p_j^y z and p_i^y p_j^x z) in the denominator. Since

    p_i^x p_j^y z / (p_i^x p_j^y z + p_i^y p_j^x z) = p_i^(x-y) / (p_i^(x-y) + p_j^(x-y)),

and since 1 <= x-y (and p_i >= p_j), we get

    [p_i / (p_i + p_j)] (p_i^x p_j^y z + p_i^y p_j^x z) <= p_i^x p_j^y z.

Summing over all states with x > y and dividing by C gives

    p_i / (p_i + p_j) <= Prob(k_i ahead of k_j).

Since p_i / (p_i + p_j) = Prob(k_i ahead of k_j) using the move to the front rule, and

    E(Cost) = 1 + Sum_{i=1}^{n} p_i Sum_{j != i} Prob(k_j ahead of k_i),

the transposition rule is better than (or the same as) the move to the front rule. □

So the transposition rule has lower asymptotic cost than the move to front rule. Rivest [2] has conjectured that this result extends to all permutation rules, i.e., that the transposition rule is the optimal rule (has lowest asymptotic cost for any probability distribution) out of all permutation rules. Intuitively, this conjecture is not surprising. The best we could possibly do (see Section 2.6) is to count the number of times each key has been accessed and keep the keys ordered with respect to this count. The rule which most closely approximates this strategy is the transposition rule. We can also look at the situation in a different way: after a long time, the high probability keys are near the front of the list and the low probability keys near the bottom. Occasionally, a low probability key will be accessed, and the move to the front rule will move it to the front of the list, increasing the expected cost since many high probability keys have moved down one position. The transposition rule does not do this, and it is difficult for the low probability keys to rise to high positions in the list.

While we cannot yet prove the transposition rule is optimal, it has been shown by Yao [4] that if an optimal permutation rule (optimal for all distributions) does exist, it must be the transposition rule. He does this by showing a particular distribution for which the transposition rule is optimal. Before discussing Yao's proof, we need a theorem by Rivest [2].

Theorem (Rivest [2]): An optimal permutation rule {τ_j, 1 <= j <= n} (τ_j is used when the key in position j is requested) must have the property that each τ_j:

(i) leaves positions j+1 to n of the list fixed;
(ii) if j > 1, moves the key in position j to some position j' < j.

Proof: Consider the probability distribution p_i = 1/k for 1 <= i <= k and p_i = 0 for k < i <= n, for some k < n. Any permutation rule satisfying (i) and (ii) above will have an asymptotic cost of (k+1)/2, since all of the keys with zero probability will move to the end of the list and stay there. Any permutation rule which violates (i) will occasionally move a key with zero probability in front of one with nonzero probability, and thus have greater asymptotic cost.
Any permutation rule which satisfies (i) but not (ii) will not be able to move any keys out of positions j such that t.(j') = j, so that the optimal ordering for this particular probability distribution cannot be reached, and, again, the asymptotic cost will be higher. Theorem (Yao [4]): Given a list of n elements with probability distri- 1-e bution p, = 1-e and p. = — y, 2 < i < n, there is an e small enough such that the transposition rule is optimal for this distribution. Proof : The Markov chain corresponding to this list has n distinct states, each one having k, (the key with probability 1-e) in a different position. Let q. be the steady state probability that k-j occupies 16 position i, using the transposition rule, and let r. be this probability using the optimal rule. The transposition steady state satisfies: q l = (1 "n^ q l + (1 " e) q 2 q 2 = H^T q l + ^f E q 2 + (1 " e) q : 'n-1 = iPT q n-2 + ^f e Vl + (1 " e) q n q n = TPT q n -l + e q n We solve this to get: Vl ■ (pTf T^ q i for J = l....»n-l n Since Y q. = 1, we obtain 1=1 ? q] = 1 + 0(e), qj = (^rr)^ 1 + 0(e j ) 2 < j < n. From Ri vest's Theorem, we know an optimal rule (if one exists) must have the form: 17 T l T 2 T 3 1 2 ... n\ 1 2 ... nj 1 2 3 ... n 2 1 3 ... n. 1 2 3 4 ... n 1 ld 31 a 32 a 33 4 ••• n Vl T n 1 2 ... n-1 n a n-l,l a n-l,2 ••• a n-l,n-l ' 1 2 ... n-1 n a nl a n2 ••• a n-l,n a nn The theorem now proceeds inductively in n-2 stages. At the 1L k stage we will show: 1. t. +2 is the same as the transposition rule. 2. a.. = k for i > k + 2. J ' r k+l n-1 1-e r k ^n-l j uu j * Note that after stage n-2, we will have proven x. is the same as the transposition rule for i=l,...,n, and hence the theorem will be proved. To begin the induction, we note that x-, and x 2 are the same as the transposition rule by Rivest's Theorem. Hence condition 1 is 18 initially satisfied. Note that condition 2 vacuously holds 1f k=0. Finally, any rule satisfying the condition of Ri vest's Theorem has oo r-.=l when c=0. Since r, = £ a -j e i' ^ c= 0> r i = ^ =d 0" Hence r i = l+0(e) and condition 3 is initially satisfied. The proof for stage k proceeds as follows: Let N(i,j) be the number of I such that an,- = j- This is just the number of requests that cause the key in position i to move to position j. The r. must satisfy their steady state equations. These give us the following bound r i £ N^' 1 ) * 7jfr r k k + 1 < i < n Here we have counted only those transitions from state k to state i and replaced the transition probability by a lower bound of -§y. Summing these inequalities gives n i i=k+l r w t - tr »= > LL" lki,|] iPT r k' n Set a = I N(k,i). Note that a is just the number of j's such that i=k+l a jk > k ' > a(^-) k+1 + 0(e k+2 ) by Condition 3 So Vl + --' + r n = A( n^T )k+1 + °( £k+2 ) for some A " a (1) From this we conclude 19 E xk+1 n / k+2< (k + l)r k+1 + ... + nr n nk + l)A(^) K+l + 0(en (2) r } + ... + r k = 1 - A^)** 1 + 0(e k+2 ). (3) We know that since (k+l)q k+1 + ... + n q n = (k+l)^)^ 1 + 0(e k+2 ) (4) q k+1 = ljtf) M and ^ < 0( e k+1 ) for i > k + 1 Subtracting (4) from (2) gives (k+l)r k+1 + ... +nr n - (k+l)q k+1 ... - n. q R > (k+l)(A-l)(^§ T ) k+1 + 0(e k+2 ) (5) Since 'l * — + t >k =1 "Vl " ••• - q n we have ^ + ... +q k = 1 - (^rr)^ 1 +0( e k+2 ) (6) 1 e i - 1 Now let c = — y y-r • We have q. = c q., i < k and from property 3, r. = c 1 " 1 r, 1 < k (7) Hence q, + q 2 + . . . + q k = q] (l+c+c 2 +. . .+c k_1 ) = 1 - (^) k+1 + 0(e k+2 ) 1 - (^) k+1 + 0(e k+2 ) q 1 = Qd - 1 CT (8) 1 1 + c + z L + ... 
+ c K ' Similarly 2 k-l r, + ... + r k = r^l+c+c +.. .+c ) using (3), = l - A Cri§T )k+1 + °( £k+2 )' 20 so r l = 1 " A(^) k+1 * 0(c k+2 ) 1 + c + c + ... + c FT (9) Now k-l q, + 2q 2 + ... + kq u = q., + 2cq 1 + . . . + kc q ■k ^1 = q 1 (l+2c+...kc k_1 ) Substituting for q, from (8) i - (H§r) k+1 + o(^ k+2 ) k-l l+2c+...+kc l+c+...+c k " Similarly k-l r-, + 2r 9 + . . . + kq u = q n + 2cq, + . . . + kc q l+2c+...+kc k-l H - A (n§T> k+1 + °( £k+2 ^ Subtracting (11) from (10) gives 1+C+...+C ET (10) (11) q ] + 2q 2 + ... + kq k - r ] - 2r 2 - ... - kr k = 21 [(A-lMpiy) 1 * 1 + 0(c k+2 )] < C(A-D( T5 § T ) k+1 + 0(e k+2 )] l+2c+...+kc k-1 1+C+...+C k+kc+...+kc k-T k-1 1+C+...+C R-T < WA-Dt^y)^ 1 + 0(e k+2 ) Finally, subtracting (12) from (5) gives (12) r, + 2r 9 +..-.+ nr- q, - 2q n H l .. - nq, < (A-l)(^) k+1 + 0(e k+2 ) (13) But the left hand side of (13) is just the cost of the optimal rule minus the cost of the transposition rule. If A > 1, then the trans- position has lower cost than the optimum rule, a contradiction. Hence A < 1 and therefore a < 1. Recall that a is the number of a., f k. Since t, + , is the same as the transposition rule by Condition 1, we have a, + , . = k+1 and hence a > 1. Hence a must equal 1. Thus all other a.. < k (else, a > 1). If a.. < k, Condition 2 will be violated since this value has already appeared in the permutation. Hence a.. = k : proving Condition 2 for k. This determines the equation for r. ,. If k=l, r l " < ] - 7PT>'1 + n-e)r 2 V (1-eHii-l) r 1 = CT +0 ^ 22 since r^ = l+0(e). If k > 1 r k-RVl + ^f r k + 0-«' Vi Solving this for r k+1 and substituting (iDlliLlSl) ^ for r^ (Condition 3) gives r k+1 = ( ( n _^( -| _ £ ) )r k and since r k = (^§y) k + k+1 0(e ) (again, Condition 3) r„ Al = (A) k+1 - ( c k+2 ) k+1 v n-l In either case, Condition 3 is proved. All that remains is to prove Condition 1. From Condition 2, we know a k+2 • = i for i < k - 1. We now know a k+2 i. ■ k, and hence x k+2 is the same as the transposition rule, completing the induction for Condition 1 and proving the theorem. I I A final question is how far these rules can possibly be from the optimum. This is answered for the move to the front rule by Rivest [2] and Burville and Kingman [7], If we assume p, > p 2 > ... > p are the key request probabilities, then n 1-1 P,-P I I. ILL MTF Cost . 1+2 i=1 j=l p i +p j ^ l+2x n 1=1 n Opt Cost " * 1+x n -i n _-| where x = J p . ( j-1 ) £ 2(1 - — py) since x < — . j=l Therefore, the move to the front rule never does more than twice the 23 work of the optimal ordering. The theorem also holds for the trans- position rule as its cost is less than or equal to that of the move to front rule. This may be a significant savings, as remarked in the introduction. In summary, the situation in the asymptotic case is quite clear: the transposition rule has asymptotic cost less than or equal to that of the move to front rule. Both rules compare quite favorably with the optimal cost. For the distributions we considered, the transposition rule was within 10 percent of the optimum and the move to front ranged from 25 percent to 38 percent. Finally, the cost of these rules is at most twice the optimal cost for any probability distribution, 2.2 Rate of Convergence In the previous section, we only considered asymptotic behavior and found the move to the front rule inferior to the trans- position rule. In this section we will consider how quickly the rules approach their asymptotes. 
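For reference, the following is a minimal sketch, in Python, of the two update rules being compared in this section. The function names and the list representation are our own choices; the analysis does not depend on any particular implementation. A "request" scans the list from the front, and the cost is the number of keys examined.

```python
# Minimal sketch (not from the original text) of the two list update rules.

def access_move_to_front(lst, key):
    """Find key, move it to the front, return the search cost."""
    i = lst.index(key)          # i + 1 comparisons were needed
    lst.insert(0, lst.pop(i))   # move the requested key to position 0
    return i + 1

def access_transpose(lst, key):
    """Find key, swap it with its predecessor, return the search cost."""
    i = lst.index(key)
    if i > 0:
        lst[i - 1], lst[i] = lst[i], lst[i - 1]   # move up one position
    return i + 1

if __name__ == "__main__":
    a = ["A", "B", "C", "D"]
    b = ["A", "B", "C", "D"]
    print(access_move_to_front(a, "D"), a)   # 4 ['D', 'A', 'B', 'C']
    print(access_transpose(b, "D"), b)       # 4 ['A', 'B', 'D', 'C']
```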
We will find that the move to the front rule approaches its asymptote more quickly, and initially has a lower expected cost than the transposition rule. The reason for this is clear. In the initial random ordering, many high probability elements are far down in the list. These must be brought to the front to reduce the cost. Obviously, the move to the front rule will do a better job here since these keys make large jumps and quickly rise to the top. The transposition rule allows keys to move only one step at a time, so the convergence should be rather slow. 24 When key k. is requested, it moves up one position, decreasing the cost by p. since we can locate k. with one less compare, and increasing the cost by p. -■ , since key k. , (the key above k.) moves down one position, resulting in a net decrease of p . - p - _ -. . If the p.'s are "close" in size, they are O(-), and this decrease is O(-), resulting in a very slow convergence. We would expect the move to the front rule to take Q(n) time to get very close to steady state, assuming ft(n) high probability keys. The transposition rule should require fi(n ) since each key must move fi(n) steps to get near the top. To begin the analysis, we determine the expected cost of the move to the front rule as a function of time. Theorem : Given keys k, ,kp,...,k having request probabilities p, ,p 2 ,...,p , the expected cost of accessing a list being modified by the move to front rule after t requests is 2 v P.- P.- y (P,--P.j) t 1+2 I JL_J_ + l 3 J M-d -d ) U1 1, and k. was not requested after time m. The probability for this is 25 I (l^.-p.)^ P1 = V (l-P rPi ) m p. m=l 1 J 1 m=0 1 J 1 ■ ( vpt» - (1 - p r»/ ^ Adding these gives P(k. ahead of k. at time t) = J i (1 - p r p j )t + 'p^pJ' " ( P7P7 ,(1 - p r p j )t P i P i" p i t = Vpt + Ttitt (1 - p r p / Then E(Cost) = 1 + I p. I P(k, ahead of k . ) i=l v j7i J ] y y p,P i P^P-P.) t 1 1=1 #1 P^Pj 2(p.+ Pj ) (1 p 1 p j } Y p i p i Y ( p i"Pi) = 1+9 L _L_J_ + Z 1_J fi_ D _ D } l 0. Using the transposition rule, k, is equally likely to start in any position and will move up one position at a time until 1t reaches the top. We will then have r t-n n , t < n - 2 Prob(k, is in position 1 at time t)=i 1, t > n - 1 v. Prob(k, is in position i^l at time t)= - if n - i < t n = otherwise For t < n - 2, the expected cost is: n-t i. SU £ i.(l).i ♦**%**- !♦!(¥» For t > n - 1 the expected cost is 1 since k, must have reached the top An interesting statistic to compute from these time varying costs is the overwork. This is defined as the area between the cost 27 curve and its asymptote. (See Figure 2.2.1) The overwork measures how quickly the cost converges to Its asymptote. Also, since the area under a cost curve measures the total cost, the overwork represents the total number of comparisons we do in addition to the asymptotic cost. The overwork can be determined by summing the time varying part of the equation for the cost. The overwork for the move to the front rule is then r r (Pi-Pi) 2 , t ,. (Pi-Pi) 2 £ I 2(p+p ) (Hyp/ 8 I , 3 %2 - t=0 l = "IP So we see that the move to the front rule does overwork ft(n) and the 2 transposition rule does fi(n ), and hence the move to the front rule approaches its asymptote more quickly. Also note that the move to the front rule converges in 1 request, but the transposition rule requires n-2 requests. We now consider a slightly more complicated case. Suppose there are n-1 elements of probability — y, and one element (k, ) of probability zero. 
This is not equivalent to the previous case in which k, was moved (unless it was at the top) after each request. Now k, may or may not move depending on which of the n-1 elements with nonzero probability is accessed. The overwork for the move to the front rule in this case is -s- . This can be obtained by substituting the p. in the overwork formula. In order to determine the overwork in the case of the trans- position rule, we calculate P(k,t), the probability k, is in position k at time t. Notice that k, will move down only when the key directly under it is accessed and that this occurs with probability — y . We then k have for k < n: P(k,t) = 4i P^ob(k 1 initially in position i)»Prob (k 1 moves down k-i positions in t time steps). 30 k 1 t 1 k_1 n 2 t-k+1 - 1 J 1 n ' ( k-i } fjCT J ( n=T J - 1 t£f** % c?) n - iiil = ( ? +1 > n - i=1 ticostj 2(n-l) n-T^ 2(n-l) fPI n . 1 #n-2v r J /tx ,n-j\ 31 This gives us the expected cost as a function of time. The overwork 1s then: X hiftt <£f> X id? ( j' (n 2 J) = ^X^ (n 2 J) X i ^ ) § By use of a Taylor Expansion, we can verify i °° i (l-x) k+1 1=0 k n 2 -l Using this substitution, we can show the overwork equals — g- ; which also is the same as our earlier model. Again, the move to the front rule does fi(n) overwork, and the transposition rule does ft(n ) overwork. For this case, it is also possible to obtain simple bounds on the residual cost , i.e. the difference between the cost and the asymptotic cost. By substituting the p. into the equation for the cost of the move to the front rule, we get C0ST MTF = ^ + i(^4") • The residual cost is then i(~?) ~ \ e" t/ ^ n " 2 ^for large n. For the transposition rule, note that if t < n - 2, all the terms of a binomial expansion are present in the time-varying cost, and the residual cost equals j-X^ t 2 -t(2n 2 -4n+3)+n(n-l) 3 s t 2 -2n 2 t+n 4 2n(n " 1) (n-1) 2 2n 4 for large n. 32 If t > n-2 we can add terms with n-2 < j < t to complete the binomial expansion and obtain this result as an upper bound on the convergence. This bound, however, gets progressively poorer as t becomes larger since we must add more and more terms. In fact, the bound goes to infinity as t -*■ °°. These two bounds illustrate the difference rates of convergence. Initially (when t £ n-2) the move to the front rule converges exponen- tially, and the transposition rule converges quadratically, so the move to the front rule converges considerably more quickly. To give an idea of the magnitude of these bounds, for t = n-2, Residual Cost MT p ~ .1839 o 1 n /n 9 1 1 ^ and at t = n Residual Cost MTF = ^-(e) ' = j{^) . On the other hand, 1 1 2 Residual Cost TO z -~ at t = n-2 and Residual Cost TO < *r for t ~ n . TR 2 TR = 2n In general, the transposition rule will converge exponentially, much more slowly than the move to front rule. The convergence of the cost, which is c,A, +...+C A (see appendix), is mainly determined by the size of the eigenvalues with largest modulus. These are much larger in the case of the transposition rule. As a comparison, for Zipf's Law with 3 elements, the eigenvalues which have nonzero c. are (l-p.-p.) for the move to the front rule (.545, .273, .182). For the transposition rule, these can be numerically calculated as: .710, .576, -.344, .175, -.117. Indeed, the "major" eigenvalues of the transposition rule are larger, and slower convergence will result. The overwork has been numerically calculated for more compli- cated distributions. 
We have already determined a simple form for the 33 overwork in the move to the front rule and can just put 1n the particular distribution. For the transposition rule, there 1s no known simple form. We can closely approximate the overwork by letting x Q = (' n T»---»7T) De the initial distribution over the states of the Markov chain. Then x~ P 1s the distribution after t requests. From this, we can calculate the expected cost at any t. The asymptotic cost (Acost) can be determined directly from the steady state probabilities, or approximated by the cost of XqP for large t. The overwork is then - i t 1 I [cost(x n P )-Acost] s I [cost(x n P )-Acost], i=0 u i=0 u for sufficiently large t. This is the quantity we calculate. The over- work for several distributions is shown in Table 2.2.2. By analyzing the differences between successive values of the overwork in the case of Zipf's Law, we can conclude that the trans- 3 position rule does ft(n ) overwork while the move to front rule does only o n(n ). Thus, for a more complicated distribution the transposition rule does much more overwork. In fact, assuming Zipf's Law, we can derive an exact form for the move to front rule overwork and prove 1t is ft(n ) and thus the bound f n l n ~') is of the right order. Theorem : Assume that the key request probabilities satisfy Zipf's Law. Then the overwork for the move to front rule with a list of n elements is 34 Table 2.2.2 The Overwork for Various Distributions OVERWORK FOR MOVE TO FRONT RULE ENGLISH LETTERS 52.7469 ENGLISH WORDS 122.3576 OVERWORK FOR GEOMETRIC DISTRIBUTION WITH N ELEMENTS N MOVE TO FRONT RULE 3 0.291 1 4 0.8291 5 1.7564 6 3.1250 7 4.9612 8 7.2860 9 1C. 1011 10 13.4123 11 17.2216 12 21.5299 13 26.3377 14 3 1.6452 15 37.4527 16 43.7600 17 50.5674 18 57.8747 19 65.6820 20 73.9893 OVERWORK FOR ZIPF'S LAW WITH N ELEMENTS N MOVE TC FRONT RULE TRANSPOSITION RULE 3 0.2006 0.4579 4 0.4463 1.6503 5 0.7978 3.9793 6 1.2576 7.7514 7 1.8272 13.3005 8 2.5076 9 3.2994 10 4.2031 11 5.2189 12 C.3473 13 7.5882 14 8.9420 15 10.4087 16 1 1.9884 17 13.6812 18 1 5.4871 19 17.4063 20 19.4387 35 5rT ■(^W^/ 11 ^^-^ where Hi 2 ' - I \. n i=i r Asymptotically, this is (| - ln2)n 2 ~ .057n 2 . Proof: Substituting p. = -Jr- Into the overwork formula gives 1 lH n (J- x> 2 i ™„ JH„' . .. .,2 | I _2 L_.J j UxiL U1). (3) To determine the asymptotic behavior, note that H ~ in n, so H 2 - H ~ ln2n - In n = ln2, so the second term is asymptotically 2 2 n Tn2. The third term is 0(log n) and is dominated by the n terms. 2n t (2^ (2) = y — Finally, we need to approximate Hi ' - H* ' \m-\ ^ ' Since the summand is a decreasing function, we can bound it using the following relation: b+1 b f(x)dx * I f(D * f(a) + i=a f(x)dx Substituting a = n+1 , b = 2n and f(x) = -*- gives x 2n+l 2n T * I _ TF * . ,v2 n+1 : 2 i=n+l i 2 ' (n+1)' 2n n+1 dx T x (2n+U(«M) * , J n+ 1 7 r\ c + 6n - 1 2n(n+l)' 1 _2 Since both the upper and lower bounds equal j- + 0(n ), we have 38 2n 1 1 -2 I —k = «- + 0(n ) and the fourth term 1n (4) approaches i=n+l r 1 2 2 n . Hence, the asymptotic value for (4) 1s 1^ n 2 - n 2 ln2 + | n 2 = (| - ln2)n 2 s .057n 2 rn We can get a graphic idea of the difference 1n convergence from Figure 2.6.1 and Table 2.6.1 in Section 2.6. These show the cost of accessing a list ordered by the two rules as a function of time and compare them with the frequency count rule (see Section 2.6) which is optimal. 
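The exact calculation above requires the full chain over all n! orderings. As a rough cross-check, the cost curve and the overwork can also be estimated by straightforward simulation; the sketch below (function names and parameters are our own choices, and the asymptote is estimated crudely from the tail of the run) does this for both rules under Zipf's Law.

```python
# Rough Monte Carlo estimate of the time-varying cost and the overwork.
# This is an approximation by simulation, not the exact Markov-chain
# computation described in the text.
import random

def zipf(n):
    h = sum(1.0 / i for i in range(1, n + 1))
    return [1.0 / (i * h) for i in range(1, n + 1)]

def avg_cost_curve(n, rule, horizon=200, trials=2000, seed=1):
    """Average search cost at each time step, from a random initial order."""
    p = zipf(n)
    rng = random.Random(seed)
    totals = [0.0] * horizon
    for _ in range(trials):
        lst = list(range(n))
        rng.shuffle(lst)                       # random initial ordering
        for t in range(horizon):
            key = rng.choices(range(n), weights=p)[0]
            i = lst.index(key)
            totals[t] += i + 1
            if rule == "mtf":
                lst.insert(0, lst.pop(i))      # move to front
            elif i > 0:                        # transposition
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
    return [s / trials for s in totals]

def overwork(curve, tail=20):
    asym = sum(curve[-tail:]) / tail           # crude asymptote estimate
    return sum(c - asym for c in curve)

if __name__ == "__main__":
    for rule in ("mtf", "transpose"):
        curve = avg_cost_curve(8, rule)
        print(rule, round(curve[0], 3), round(curve[-1], 3),
              round(overwork(curve), 3))
```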
From graphs like these, it is interesting to calculate the smallest number of requests for which it is better to use the transposition rule (see Table 2.2.3). Note that the value we are really interested in is not the point where the two cost curves cross, but the point where the integrals of the two curves cross. This is because we want the rule that does the least total work.

The slope of the cost crossover in Table 2.2.3 increases, so it is superlinear and may be about Ω(n log n). The integral crossover appears to be Ω(n²). We can also get an estimate of the integral crossover as follows. If we assume all the overwork has been done by time t, the integral crossover time, then the cost integral for the move to front rule is t times the asymptotic cost (AS_MTF) plus the overwork (OV_MTF), and similarly for the transposition rule. Since we are at the point where these integrals cross,

    t · AS_MTF + OV_MTF = t · AS_TR + OV_TR

    t = (OV_TR - OV_MTF) / (AS_MTF - AS_TR).

Table 2.2.3 Cost Crossover and Integral Crossover Times

    n     Cost Crossover    Integral Crossover
    3          3                   6
    4          5                  10
    5          7                  14
    6         10                  20
    7         13                  27
    10        22                  50
    20        75                 212

Points where the cost and the integral of the cost for the transposition rule become less than those of the move to front rule, for an n-element list with Zipf's Law as the probability distribution.

Earlier in this section, we found OV_TR = Ω(n³) and OV_MTF = Ω(n²). Since the asymptotic costs are bounded within twice the optimal cost (which is about n/ln n), AS_MTF - AS_TR = O(n/ln n), and hence we get t = Ω(n² ln n), which is slightly larger than shown in Table 2.2.3.

In summary, though the transposition rule has lower asymptotic cost than the move to front rule, it converges to that cost much more slowly, and, in fact, for Zipf's Law it will require Ω(n²) key requests before it becomes more economical to use the transposition rule.

2.3 Other Permutation Rules

We previously defined the idea of a permutation rule, where a permutation τ_i is performed on the list when the key in location i is requested. So far, we have only considered two such rules: the move to front rule and the transposition rule. There are a total of (n!)^n possible such rules, but most will just senselessly jumble the list, resulting in no decrease in cost.

Let us think intuitively about what a "sensible" rule must be like. We will see that a sensible rule should move the requested key up in the list by a certain amount (which may depend on the location of the requested key). This is the only good way to use the information that this key, having been requested, should have higher probability. Any permutation not of this form can be viewed as performing first a sensible permutation and then a permutation that leaves the requested element alone. This second permutation will only increase the disorder of the list, since no additional information has been given on these keys, and permuting them will work against the order we are trying to create.

We consider the following sort of sensible rule, which moves the requested key k positions ahead for some fixed k. Another type of rule that should behave similarly is one where the requested key is moved some fixed fraction of the distance to the top.
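As an illustration, a move ahead k rule might be implemented as in the following sketch (ours, with hypothetical names); k = 1 gives the transposition rule, and k >= n-1 gives the move to front rule. The "fixed fraction" variant mentioned above would compute the new position from a fraction of the current position i rather than from a fixed k.

```python
# Sketch of the "move ahead k" family: the requested key is moved k
# positions toward the front, but never past position 0.

def access_move_ahead_k(lst, key, k):
    """Find key, move it k positions closer to the front, return the cost."""
    i = lst.index(key)
    j = max(0, i - k)               # new position, clamped at the front
    lst.insert(j, lst.pop(i))
    return i + 1

if __name__ == "__main__":
    lst = list("ABCDEFG")
    access_move_ahead_k(lst, "F", 3)
    print("".join(lst))             # ABFCDEG  (F moved up three places)
```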
It can be seen from Figure 2.3.1 (due to Rivest [2]) and Table 2.3.1 that as the distance the requested key moves is increased, the asymptotic cost increases and the rules converge more quickly, forming a spectrum of rules ranging from the move ahead 1 (transposition) rule at one end to the move ahead n-1 (move to front) rule at the other.

[Figure 2.3.1 Asymptotic comparison of the move ahead k rules. This figure, due to Rivest [2], compares the cost of the different move ahead k rules (A_i refers to the move ahead i rule) for a list of seven elements whose probabilities are given by Zipf's Law.]

Table 2.3.1 Comparison of the Convergence of the Move Ahead k Rules

    k    Lowest Cost    Lowest Total Cost
    5       - 3             - 5
    4         4            6 - 8
    3       5 - 7          9 - 13
    2       8 - 15        14 - 38
    1      16 - ∞         39 - ∞

The results of a simulation using a list of 6 elements whose probabilities are given by Zipf's Law show the time interval for which each move ahead k rule has lowest cost and lowest total cost (the total cost is the cost summed over all previous requests).

2.4 A Hybrid Rule

We can get a rule that is superior to any of those we have considered so far by relaxing the restraint that the rule cannot vary with respect to time. A hybrid rule can be envisioned that moves keys to the front for some initial period of time, then switches and begins transposing. Such a rule will enjoy the advantages of both rules. Initially, it will move keys to the front and will therefore converge quite rapidly. Asymptotically, it will behave like the transposition rule and therefore will have a low asymptotic cost.

The question is when we should switch rules. To help answer this question, a simulation was run using Zipf's Law for the key request probabilities. Each trial of the simulation used the move to front rule until the expected decrease from using the transposition rule became larger than that of the move to front rule. The number of requests required for this to occur is an approximation to the correct time to switch. These times were then averaged over all trials to give the results shown in Table 2.4.1. The results of this simulation indicate .268n + .980 as the best time to switch rules. This time, of course, depends on the request probabilities, but we would not expect it to vary too much for different distributions. Furthermore, the choice is not too critical. Since the transposition rule converges so slowly, little is lost if we use the move to front rule for too long. We need only make sure that our choice is large enough to have the move to front rule be close to its asymptote. We would then switch after .5n requests, to make sure we had used the move to front rule long enough to significantly reduce the cost.

Another method would be to estimate our position on the cost curve by counting the number of compares we require and averaging over a period of time. Once this estimate stops decreasing, we suspect that we are in the flat part of the cost curve, and we switch to the transposition rule. This method has the overhead of counting the number of comparisons; in addition, we must be careful not to average over too short a period, or we may switch too soon.

This rule is best employed when we expect an intermediate number of requests. If few (O(n²)) requests are expected, then the move to front rule is used. A great number suggests the transposition rule.
An intermediate number means that both of the good features of the hybrid (fast convergence and low cost) will be valuable and the overhead incurred by using this rule will be worthwhile. 45 Table 2.4.1 Best Times to Switch Rules n Average Switch Time 3 1.90 4 2.20 5 2.29 6 2.52 7 2.84 8 2.94 9 3.35 10 3.55 20 6.608 30 8.924 Simulation showing the average best time to switch from the move to front rule to the transposition rule. The probability distri- bution is Zipf's Law over n elements. 46 2.5 The First Request Rule The first request rule 1s defined as follows: the first time a key is requested, it is moved up 1n the 11st until 1t comes to the top or a previously requested key. After that, it 1s not moved. Note that the keys occur in the 11st 1n order of their first request. After all keys have been requested, the ordering obtained is the same as if the keys had not been known a priori, and the list had been built by inserting a "new" key (one that had been requested for the first time) at the end of the list. The following theorem characterizes the performance of this rule. Theorem : Given any initial list, the probability of obtaining a given final list after any number of requests is the same for the move to front and first request rules. Proof : Consider any sequence of requests r, ...r. as inputs to the move to front rule, and the reverse sequence r....r, as inputs to the first request rule. Note that these two sequences have the same probability. Suppose that both rules start with the same list. We now show that these two sequences produce the same ordering. Consider any two keys k. and kj. If neither is requested, both rules will leave the initial order unchanged, and k. and k. will be ordered the same in the two final lists. If only one (say k. ) is requested, then both rules will have k. ahead of k. in the final list. If both are requested (say k. is requested after k. in the sequence r, ...r.), then k. will be ahead 47 of k. in the move to front list. Since k. is requested before k. in the sequence r. . ..r, , k. will also be ahead of k. in the first request list, hence the orderings 1n the final list will again be the same. In any case, k. is ahead of k. in one list if and only if it is ahead of k. in the other. Hence the two lists must have the same ordering. J Now consider any list. For each sequence of requests that will produce this list using one rule, there exists a sequence of equal probability that will produce this same list using the other rule. Hence the probability for either rule to produce this list must be equal. ~\ This theorem is easily extended to hold for a probability distribution over initial lists since the two rules will behave identically for each initial list. Also, it implies that the cost of the first request rule at any time will equal the cost of the move to front rule. Therefore all the previous results concerning the move to front rule apply to the first request rule. Suppose the keys were not known a priori and the list was constructed by inserting a "new" key at the end of the list. Clearly, the asymptotic distribution will be that of the first request rule. This theorem tells us that if the initial list was constructed in this manner, using the move to the front rule will not decrease the cost (since the Markov chain will be in steady state). The first request rule differs from the move to front rule in two important ways. First, since each key is moved only once, it is 48 cheaper to execute than the move to front rule. 
Second, since the list converges to a specific ordering (which may have very high cost), the variance of the cost is much higher than that of the move to front rule. The first request rule can be modeled by a Markov chain with (n+l)n! states. For each of the n! orderings of the list, the chain can be in n+1 different states, depending on whether 0,1,..., or n different keys have been requested. Unlike previous chains, this chain is reducible (see appendix). Once we reach a state in which all n keys have been requested, we are "trapped" and cannot leave this state, On the other hand, an irreducible chain cannot get trapped and must divide its time among all states that have nonzero steady state probability. In fact, the ergodic theorem tells us that if a state has steady state probability p, the chain will spend a fraction of its time equal to p in this state. We are now in a position to talk about the variance of the costs of these two rules. If we let c. be a random variable equal to the cost of the state the chain is in at time i, E(c), VAR(c.) and c,+c 2 +...+c. E( * ) are the same for both rules. However the variance of the c,+...+c cost averaged over some time period [VAR(— -)] is much greater for the first request rule. The fact that the move to front rule is lim c-1+...+c irreducible implies VAR( -) = (see appendix). However, for large n, c = c , using the first request rule (since the chain has reached a final state), and -~ c . Therefore the variance 49 of the average cost is VAFUc ) > 0. (An expression for this variance can be found in McCabe [10].) We can use the first request rule to form a hybrid rule with the transposition rule as follows: When a key is first requested, we use the first request rule and move the key up until a previously requested key is encountered. When the key is subsequently requested, we use the transposition rule to promote it in the list. The performance of this hybrid is better then the move to front/transpose hybrid. The only requests handled by the transposition rule are second and subsequent requests. Hence the initial list that the transposition part of the hybrid "sees" is a list ordered by the first request rule, which, as we have seen, is the move to front rule (or first request rule) steady state. Hence the transposition rule "starts" from the move to front rule steady state. This is an improve- ment over using the move to front rule initially since then the steady state is never reached. In addition, this hybrid will reduce the cost more quickly than the first request rule because it does a cost re- ducing transposition on second and subsequent requests of a key, while the first request rule alone does nothing. This hybrid also has the desirable feature that no guesswork need be done as to when to switch rules. This choice is performed automatically by the algorithm. 2.6 Frequency Count Rule Perhaps the most natural way to cause high frequency keys to move to higher positions in the data structure would be to keep count 50 of how many times each key has been requested. If we assume the request probabilities are constant with respect to time and then keep the keys sorted according to their frequency counts, high probability keys will move to the top. The primary advantage of this rule is that it has a lower access time than the other rules we have considered. In fact, its performance is optimal. In addition, frequency information is available for analysis, which may be desirable, and the changes required to execute the rule are quite simple. 
The primary disadvantage is that count fields must be kept, requiring extra storage. These points are now considered in greater detail. We first discuss the performance of this rule. The following theorem shows that it is asymptotically optimal. Theorem : As the number of requests, t ■* °°, a list ordered by the frequency count rule approaches the optimal ordering. Proof : If two keys k. and k. have probabilities p. and p. with p. > p., the probability that k. is ahead of k. after t requests approaches one n as t -> oo. Since E(Cost) = T p. (1 + T Prob(k. is ahead of k. )) we 1=1 ] j7i J 1 1 im I I have t ^ oo E(Cost) = I ip i which is the optimal cost. [_| Also, if we have no a priori reason to suspect k. is more probable than k., this rule is optimal at any time. J 51 Theorem : If we have no a priori knowledge of the probability distribu- tion, the frequency count rule provides the optimal ordering at any time. Proof : If we have no a priori knowledge, then all distributions of key requests must be considered equally likely, so if k. has occurred more times than k., Prob(p. > p.) > Prob(p. < p.), and an arrangement with k. ahead of k. will have a lower expected cost. Clearly, the arrangement with the lowest expected cost will be the one in which the keys are sorted by frequency count, and this, of course, is the arrangement given by the count rule. I — I A comparison with previous rules is given by Figure 2.6.1, which shows the results of a simulation done on a 15-element list using Zipf's Law. Table 2.6.1 shows a simulation for a list with 100 elements. These two simulations give us a good idea of the differing rates of convergence of the two previous rules and how they compare to the optimum, Initially, the move to the front cost decreases nearly as quickly as that of the count rule. This is intuitively reasonable: Initially, the count rule will move the requested item close to the top, so its behavior should be very close to the move to the front rule. On the other hand, the transposition rule's cost decreases very slowly, especially on the 100 element list. As mentioned before, the changes required by this rule after each request are small. Suppose k. is requested for the r time. The only change is to increase k. 's frequency count from r-1 to r and move 52 02.0 Simulation on a 15-element list using Zipf s Law "Time" is measured as the number of requests Figure 2.6.1 Comparison of Various Rules 53 Table 2.6.1 Another Comparison TIME VARYING COST FOR ZIPF'S LA* WITH 100 ELEMENTS TIME MOVE TO FRONT TRANSPOSIT ION FREQUENCY COUNT 50.1322 50.1322 50.1322 1 47.4562 50.0749 47.4556 2 45.4688 50.0307 45.4608 3 43.4281 49.9742 43.4257 4 41.8796 49.9329 41.8443 5 40.7099 49.8780 40.6515 6 39.7503 49.8341 39.6460 7 38.5905 » 49.7742 38.4920 8 37.8538 49.7196 37.6994 9 36.5972 49.6666 36.4395 10 36.1294 49.6216 35.8960 it ahead of all keys having frequency count r-1. We can easily determine to where k. should move in the following manner. During our search for k. we keep a pointer to the key furthest down in the list whose count is greater than the key we are currently examining. When we examine k. , this pointer will point to k.'s new location. Note that after many requests, the count fields will be widely separated, and these moves will rarely be required. The primary disadvantage of this rule is the additional storage required for the count fields. The storage required, however, can be reduced using very simple techniques. 
From the updating algorithm, we can see that actual count a key has is not important. What matters is the difference between successive counts, because this gives us all the information we need to keep the keys ordered with respect to count. If we store this difference instead of the full count, we will require 54 less storage, since the rate of growth of the difference fields is proportional to the difference in successive probabilities (which is small), while the count fields grow in proportion to the probabilities. Note that only a small amount of work is required to update the dif- ference fields since after a request, only once count field changes, and hence at most two difference fields must be updated. Thus, the count rule is a very attractive rule. Asymptotically it approaches the optimal ordering. At any time, it provides us with the list which has lowest cost, based on the requests we have seen so far. The work required to update the list is also very small. The primary disadvantage is the extra storage required. However, this disadvantage can be reduced by storing the differences between successive counts. 2.7 Limited Difference Rules We now consider a set of rules which limit the size of the difference fields in the frequency count rule. Once a difference field reaches this limit, additional requests of the more frequent key leave this field unchanged (requests to the other key, of course, decrease this field). If the maximum difference is zero, then the algorithm will move a key to the front when it is requested, and will perform exactly like the move to front rule. As the maximum difference is increased, the performance will improve, with the full count rule (no maximum dif- ference) as the limit. Therefore, performance approaches the optimum as the number of bits is increased. 55 To see how much the performance is effected by the number of bits, let us consider a list with only 2 elements, having probabilities a and b(=l-a). If the maximum difference is at most n, then the corresponding Markov chain has 2n+2 states: A. , < i < n where the key with probability a is first in the list and the difference is i. B. , * i < n where the key with probability b is first in the list with difference i. It is easy to verify that the steady state equations are: A„ = aA n , + aA n B n = bB„ , + bB n n n-1 n n n-1 n A-j = aA -j-i + bA i+i B i = bB i_i + aB i+r 2 ^ 1# ^ n- 1 A 1 = bA 2 + aA Q + aB Q B, = aB 2 + bB Q + bA Q A Q = bA 1 B Q = aB 1 n n and, in addition, I A. + I B. = 1. 1=0 1 i=0 n We solve this system of equations to get h n_1 h n+1 . n-i . n+i A. = A n (f) B. = A n (|) 1 < 1 < n and A_ = n a[l - (j^ n+1 ] The cost of the list is (a+2b) Prob(key with probability a is first in list) + (b+2a) Prob(key with probability b is first) 56 = (a+2b) I A. + (b+2a) £ B, 1=0 n 1-0 1 2b(b-a)(£) n - (b-a) = (1+a) * K 2n+l Let us now suppose that b > a. Then the optimal cost is b+2a = b+a+a = 1+a, which is the first term in the cost expression. The difference from the optimum is then given by 2b(b-.)(|)" - (b-a) , TTnTI . n+1 S1nce a '■ [(£> - 1] (£) Hence we see that the "use" of adding one to the maximum difference decreases expontially with base — . This tells us that the performance a should be improved by the addition of just a few bits. However, the "flatness" of the distribution (determined by how close — is to one in this simple case) determines how many bits will be required. 
The flatter the distribution, the more bits will be required to correctly distinguish the more probable elements. Table 2.9.1 shows the results of a simulation run on larger bits. Even using a small maximum difference provides nearly optimal results. The limited difference rule lets us use a limited amount of storage, while providing nearly optimal results. For a two element list, the cost of this rule approaches the optimum exponentially as we increase the maximum difference. 57 2.8 Wait c, Move and Clear Rules We now consider two classes of rules that use bit fields to store information about key requests. The first class uses the bit field as a counter, initially zero, that is incremented by one each time the key is accessed. Once the field exceeds to maximum value, the key is moved (using either the move to front or transposition rule) and the field of every key is reset to zero. The cost of performing this may be very significant. However, if all fields are stored in one area (instead of being directly associated with each key) we can set all fields to zero by zeroing a contiguous area of core, which may be done very efficiently. We will call these rules "wait c, move and clear" rules, where c is the maximum value of the field. A second class of rules (discussed in the next section) behaves in a similar fashion, except that when a key is moved, only its field is reset to zero. These rules will be called "wait n and move" rules. In analyzing these rules, we will find that using the count fields in the first manner will decrease the asymptotic cost more than the second method. However, the convergence of the first method will be much slower, since we will not move a key every request, and, if the maximum difference is very large, we will move keys only very rarely. We begin our analysis of the wait c, move and clear rules with the following theorem. 58 Theorem : Given key request probabilities p,,p 2 ,...p , the steady state probability of a given list using a wait c, move and clear rule is equal to the steady state probability of the list using the cor- responding permutation rule with modified key request probabilities P^c), P 2 (c),...,p n (c), where c c c c P,(c) = I ... I I ... I »r° a i-r° vr° v° (c + a 1 H-...ta i ._ 1 + a 1+1 ^...ta n )! c!a l ! -Vl ! Vl ! - s n ! n C+1 „ a i n a i'-l n a i+l /" Pi Pi •••P 1 _i P 1+ i ■••?„ • Proof : Consider the sequence of keys that have been moved by the wait c, move and clear rule. We have assumed that any two requests are in- dependent, and that the request probabilities are constant with respect to time. Because of these assumptions and the fact that we clear the counts after each move, the move sequence has the following properties: (1) Any two moves are independent. (2) The probability that the i move is a given key does not depend on i . 59 If we use the move sequence as inputs to a permutation rule, the resulting list will be the same as one obtained by inputting the original request sequence to the wait c, move and clear rule. We note that the properties of the move sequence are exactly those required for a request sequence, so the inputs to the permutation rule can be thought of as a sequence of requests. However, elements of this sequence are not chosen using the request probabilities, but using the probability that a key is moved. The probability that p. is moved is exactly the p. shown in the statement of this theorem. This formula is derived as follows: If k. was moved, we know that k. 
has been requested c+1 times, and that the last request (the one that caused k. to be moved) must have been for k. . Then for j^i, let k. be requested a. times (0 £ a. £ c) and sum over all possible J J J choices for the a.. This would complete the proof if every request to the wait c, move and clear rule caused a move. This is not the case since we must wait after each move while the counts build up. If this waiting time were dependent on the current state (as it will be for the wait c and move rules), states with longer waiting times would have proportionally greater probabilities. Fortunately, this is not the case. After each move, the counts are reset and hence each state will have the same expected waiting time. ^] This proof demonstrates the reason wait c, move and clear rules outperform permutation rules. In order to be moved, a low probability 60 key must be requested c+1 times before any other key 1s requested c+1 times. Hence these are less likely to be moved. On the other hand, high probability keys now have a proportionally greater chance. Notice, of course, that the probability that a key is requested remains the same; we are only being more selective about which key we move. Due to this correspondence between wait c, move and clear rules and permutation rules, many results from previous sections carry over. Specifically: Corollary : Let keys k, ,kp,...,k have request probabilities p,,p 2 ,...p and let p,(c) ,. . . ,p (c) be defined as in the previous theorem. Then (1) The asymptotic cost of the wait c, move to front and clear rule is 1 + I W P^CjPjU) (2) For the wait c, transpose and clear rule, the steady state probability of any given ordering (k, ...k ) is n n 'i n p.(c) i=l n N where N is a normalizing constant. (3) The wait c, transpose and clear rule has asymptotic cost less than or equal to that of the wait c, move to front and clear rule. 61 Proof : All result from replacing p^Cthe probability a key is moved by a permutation rule) by p. (c)(the probability for a wait c, move and clear rule). As in the case of the limited difference rule, the performance approaches the optimum as c -»• ». Theorem : As c -*• °°, the asymptotic costs of the wait c, move to front and clear rule and the wait c, transpose and clear rule approach the optimal cost. Proof : We first examine the wait c, move to front and clear rule. Consider the probability that k. is ahead of k. in the list. This will be the case if any only if k. was moved at the most recent time when either k. or k. was moved (i.e. k. was the most recently moved of k. and k.). Thus, the probability is Prob(k_- was moved k. or k. was moved). This equals the probability that k. was requested c+1 times before k. was requested c+1 times. By the law of large numbers, this approaches 1 if p. > p. and if p. < p.. Hence the expected cost which equals 1 + I p. Prob(k. ahead of k.) W J 1 approaches 1 + I PjO-U = I 1p r i i the optimal cost. 62 By (3) of the previous corollary, the wait c, transpose and clear rule has cost less than or equal to that of the wait c, move to front and clear rule, so 1t also approaches the optimum. So both the wait c, move and clear rules and the limited difference rule approach the optimum as the number of bits they use increases. The important question is: which converges more quickly? Table 2.9.1 shows the limited difference rule makes "better use" of its bits. This can also be demonstrated in the case of a list of two elements (A and B) having probabilities a and b (=l-a). 
Here the c • • probability that A is ahead of B equals I ( C V) a c+ V. Table 2.8.1 i=0 n shows this probability approaches one much more slowly than that of the limited difference rule. A major disadvantage of the wait c, move and clear rules is that they decrease the cost more slowly than the corresponding per- mutation rule with modified probabilities, since a counter must exceed c for a move to be done. The worst case occurs when every key is requested c times before any key is requested c+1 times. In this case, a move will be done every cn+1 requests. Thus, the convergence can be slowed by a factor fi(n). On the other hand, the best case occurs when the same key is requested c+1 times. Here, a move will be made every c+1 requests and the convergence must be slowed by at least this constant multiple. 63 Table 2.8.1 Probability A 1s ahead of B for a=.6 c LIMITED BIT «ULE 0*60000 1 0*66316 2 0*74218 3 0*81039 4 0*66446 5 0*90511 6 0*93457 7 0*95536 8 0*96977 9 0*97963 10 0*98632 11 0*99084 12 0*99387 13 0*99591 14 0*99727 15 0*99818 16 0.99878 17 0.99919 18 0.99946 19 0.99964 20 0. 99576 21 0.99984 22 0.99589 23 0.99593 24 0.99995 25 0.99997 26 0.99598 27 0.99999 28 0.95599 29 0.59999 30 1.00000 31 1.00000 32 1 .00000 33 1.00000 34 1.00000 35 1.00000 36 1 .00000 37 l.COOOO 38 1.00000 39 1.00000 40 1.00000 41 1 .00000 42 1.00000 43 1.00000 44 1.00000 45 1.00000 46 l.COCOO 47 1.00000 48 1 .00000 49 l.COOOO 50 1*00000 WAIT C RULE •60000 ,64800 •68256 •71021 •73343 •75350 .77116 ,78690 ,80106 .81391 ,82562 0, ,83636 ,84623 ,85535 0, ,86379 ,87162 0, ,87890 0. ,88569 0, ,89202 0, ,89794 0, ,90348 0. ,90868 0, ,91355 0, .91812 0< ► 92242 0, ,92647 0, ,9 30 28 0, ,93387 0, ,937 25 0, ,94045 0. ,94346 0, ,94631 0. ,94900 0, ,95154 0, ,95395 0. ,95623 0. ,95838 0, ,96042 0. ,96236 0, ,96419 o, 96593 0. ,96757 0, ,96914 o. ,97062 0« 97203 0, ,97336 o. 97463 0« 97584 0, ,97698 0. 97607 0* 97910 64 To get an idea of the average decrease 1n convergence, we consider n equally likely keys and c=l. Note that this 1s the least favorable key distribution. We now determine the expected number of requests before a key is requested for a second time. This is n I Prob(no key has been requested twice after i requests). 1*0 This probability equals the number of sequences of length i of distinct keys ( / "'• \ , ) divided by the total number of sequences (n ) ? n! 1 "i^O^^ 7 Replacing i by n-i gives -n e n nl I V~ =n!n- n [ ^ < n!n i=0 n ' i=0 '' Stirling's approximation gives 3 (n n e" n SZm) n" n e n = SZ™. Therefore, for c=l , the expected slowdown is ft(/n), for this unfavorable key distribution. The wait c, move and clear rules have an interesting cor- respondence with the permutation rules. They perform better than per- mutation rules because they are more selective about which keys are moved. However, the performance is not as good as the limited difference rule. 65 These rules have the further disadvantage of converging more slowly than permutation rules. For a list of n elements, the convergence is slowed by a factor between c+1 and nc+1 times. If c=l, the average slowdown is ~/27rn for the uniform distribution. 2.9 Wait c and Move Rules We now turn our attention to the wait c and move rules, and first consider the wait c and move to front rule. Theorem : Given key probabilities p,,p 2 ,...,p , the asymptotic cost of the wait c and move to front rule is 1 + I Pt x.., where Pa c P-: k c mxb P-; m J1 ( Pi + Pi )(c-H) 2 k=0 h + Pj m=0 m V p j (the probability k. is ahead of k. in the list). 
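For concreteness, a wait c, move and clear rule could be implemented as in the following sketch (ours, with hypothetical names); the transposition variant differs only in how the triggered move is performed.

```python
# Sketch of a "wait c, move and clear" rule: each key has a counter, a
# request increments the requested key's counter, and once that counter
# exceeds c the key is moved and *all* counters are cleared.

def access_wait_c_clear(lst, counts, key, c, transpose=False):
    """Returns the search cost; mutates lst and counts."""
    i = lst.index(key)
    counts[key] += 1
    if counts[key] > c:                      # (c+1)-st request since the last clear
        if transpose:
            if i > 0:
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
        else:
            lst.insert(0, lst.pop(i))        # move to front
        for k in counts:                     # clear every counter
            counts[k] = 0
    return i + 1

if __name__ == "__main__":
    lst = ["A", "B", "C"]
    counts = {k: 0 for k in lst}
    for req in ["C", "B", "C"]:              # with c = 1, the second C triggers a move
        access_wait_c_clear(lst, counts, req, c=1)
    print(lst, counts)                       # ['C', 'A', 'B'] {'A': 0, 'B': 0, 'C': 0}
```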
Proof : Recall that the expected cost is 1 + I p. Prob(k. ahead of k.) W J and therefore we must determine this probability. Consider any two keys, A and B, having probabilities a and b. Note that the relative ordering of A and B will not be effected when another key is moved. Also, their counts will remain the same since they are not cleared. Therefore, in determining Prob(A ahead of B), we can ignore all other keys and requests 66 to all other keys; we need only consider a 11st consisting of A and B, having probabilities -nr and -r of being requested. (For simplicity, we rename these probabilities "a" and "b"). 2 This list can be modeled by a Markov chain with 2 (c+1) states, A., and B. . for * i, j <; c. State A,, corresponds to the list with A (having count i) ahead of B (having count j). State B. . corresponds to B (having count j) ahead of A. Note that the first sub- script is always A's count. Before solving for the stationary distribution, we must first make sure it will give us Prob(A ahead of B). There are two possible troubles. First, as with the wait c, move and clear rule, we must wait in each state of the two element chain while keys other than A and B are being requested. However, since key requests do not depend on whether A is ahead of B, or the count of either key, the requests are independent of the state and hence the expected waiting time is the same for each state. Second, the chain is periodic with period c+1. If we let r. and r„ be the number of times A and B have been requested, we have i = r. mod(c+l) and j = r„ mod(c+l). Therefore i+j = (r. + r g ) mod(c+l). Since each transition increases r. + r D by one, if we start at A., (or A D I J B..), it will always take a multiple of (c+1) transition to return. Hence the chain has period c+1. A chain which is periodic does not converge to its steady state distribution in the sense that .™ p t^ x 0' x ^ = ^ x ^» where P t (x Q ,x) 67 is the probability of going from an initial state x Q to state x in t transitions, and p(x) is the steady state probability of state x. However, for an irreducible chain (which this one is), the ergodic 1 , im 1 theorem holds (see appendix). This states that ._ >oo j I P+^ x o» x ^ = p(x). Hence the "time average" of the probability approaches the steady state distribution. If C(t) is the expected cost at time t, we are guaranteed ™ T J C(i) = T p(x)c(x) where c(x) is the cost of z l i=0 x state x. The cost converges to the asymptotic cost in this sense. Note that the asymptotic cost is still the stationary probability of a state times its cost summed over all states, only the strength of con- vergence has been changed. We now proceed to determine the stationary probability. The steady state equations are: A ir a Vi,j + b Vj-i B ij =aB 1-l,j +bB i,j-l for0<1,j*c A 0j = bA 0>j . 1 ♦ aA c . + aB cj Bq. = bB . 1 for < j <; c A i0 " aA i-l,0 B i0 = aB 1-l.0 + bA ic + bB 1c forO<1«i l' (c+l)'(l-y) *~ by ' 00 ., 00 = - L -T ( Iy 1 )( I (ax+by) J ) (c+ir 1=0 j=0 Using the binomial theorem gives j-k k 7 ( Iy 1 )(.I I ( J k )(ax) (by) ) (c+l) fc 1=0 j=0 k=0 k ' = -^"T I I i ( J k )a^ k b y- k y i+k (c+1) 1=0 j=0 k=0 K Now substitute i' for j-k and j' for i+k and then drop the primes OO 00 J ■ - J -7 I I I ( 1 k k )a 1 b k xV (c+lr 1=0 j=0 k=0 K Therefore A.. = a * \ ( i t k )a i b k 1J (c+ir k=0 k 70 c c Prob(A ahead of B) is then I I A . . 
i=0 j=0 1J = - S -7 I I I C 1 t k )a 1 b k (c+lT 1=0 j=0 k=0 K = - J - T I I ( 1 t k )a i b k I 1 (c+1) k=0 i=0 K j=k a r / mukS ,1+k%_ 1 7 I (c-k+l)b* [ (': R )a (c+ir k=0 1-0 k Recalling that a and b were originally -Jt- and — nr and substituting into the cost formula finishes the proof. Another interesting fact about this rule is that for some distributions we can prove that it does not approach the optimum as we increase the number of bits. Theorem : Given a distribution of key request probabilities, if p. < p. < 2p. for some i and j, the wait c and move to front rule will not approach the optimum as c -► ». Proof : We show that Prob(k. ahead of k.) does not approach 1 as c * «, hence the cost is bounded away from the optimum. p. p. For convenience, let a = — | — and b = — J— - . From the p i +p j p i +p j preceeding theorem, Prob (k. ahead of k.) = J k r ,i+kx i -i-7 I (c-k+l)b K I (': K )a (c+1) k=0 i=0 K c . °° ... <- JL - T I (c-k+l)b k I C*)*' 1 (c+iy k=0 i=0 K 71 a ? ,_ ..-.xuk 7 I (c-k+l)b R ^ (c+ir k=0 (1-a)' Since 1-a = b, = — *—* I (c-k+1) b(c+l) k=0 = -^ T [(c + l) 2 -^-] b(c+ir a c+2 b 2c+2 p . , . which approaches It- = -s— as c ■> ». since p. < 2p., „ n Prob(k, ahead „ 2b 2p . r i r j' c-*>° v l "l ^ I — I k.) = j^~ K 1 anc ^ tne cost cannot approach the optimum as c ■+ ». I I J Indeed, it is reasonable to expect the theorem to hold for all distributions except the uniform and the distribution with a key of probability one. However, this conjecture has not yet been proved. It is interesting to determine why this method decreases the cost over the move to front rule. The wait c, move and clear rule achieved a decrease by altering the probability that a key is moved from the request probabilities to a more favorable distribution. However, the wait c and move rule does not do this. Since a key is moved after st eyery (c+1) request for it, the move probabilities remain unchanged in the sense that a key requested with probability p. will account for a fraction, equal to p., of the total number of moves. Consider any two keys, k. and k.. If we assume that moves occur at intervals which are independent of whether or not k. is ahead of p i k., then k. will be ahead ' of the time and the performance will be vJ 72 the same as the move to front rule. However, this 1s not the case. After k. has been moved (assume p, > p.), its count 1s set to zero. Asymptotically, k.'s count is uniformly distributed over {0,1, ...,c}. After k. has been moved, its count is zero, and k.'s count ranges from zero to c. Clearly, after k. has been moved, the next move will occur sooner; the roles of k. and k. have been interchanged, and after k. has been moved, the count of k. (the more probable key) 1s closer to causing a move. Therefore, the probability we find k. ahead of k. is increased because we must wait longer for the next move when it is ahead of k.. w Finally, we notice that this rule will have much faster con- vergence than the wait c, move and clear rule since on the average, it will move a key after ewery c+1 requests. The performance of this rule is compared with previous rules in Table 2.9.1. Having analyzed these rules, we can see that they are asympto- tically inferior to both the wait c, move and clear rules and the limited difference rule. The convergence is faster than the wait c, move and clear rules. It is at most c+1 times slower than the cor- responding permutation rule, while the wait c, move and clear rule may be as bad as nc+1 . 
A final interesting fact is that for some probability distributions, it can be proved that this rule does not approach the optimum as $c \to \infty$. We conjecture this to hold for any probability distribution except the uniform and the distribution with a key of probability one.

Table 2.9.1
Comparison of Rules that use Counters

  Rule                                        c=0     c=1     c=2     c=3     c=4     c=5
  Limited Difference Rule                    3.9739  3.4162  3.3026  3.2545  3.2288  3.2113
  Wait c, Move to Front and Clear (Exact)    3.9739  3.6230  3.4668  3.3811  3.3285
  Wait c, Transpose and Clear (Exact)        3.4646  3.3399  3.2929  3.2670  3.2501
  Wait c and Move to Front (Exact)           3.9739  3.8996  3.8591  3.8338  3.8165  3.8040
  Wait c and Transpose                       3.4646  3.3824  3.3576  3.3473  3.3312  3.3272

Asymptotic costs for various rules assuming a nine element list whose probabilities are given by Zipf's Law. Compare these with the optimal cost, which is 3.1814. Costs for the limited difference rule and the wait c and transpose rule were estimated by simulations consisting of 1000 requests. The average of 200 trials is shown.

2.10 Time Varying Distributions

In this section, we consider probability distributions that vary with respect to time. We first examine two examples concerning the move to front rule: one where the probability of a key decreases after it has been requested, and another where it increases.

The first example supposes we have n keys, $k_1, k_2, \ldots, k_n$. Assume the requests made to this list form a sequence of permutations of these n keys. The permutations are independently chosen, with each of the n! permutations being equally likely. A model that satisfies this constraint is a company that sends out bills each month; its customers then pay their bills in a random order. Assuming this model, we can prove that the move to back rule is the optimal rule. The proof is as follows: after t requests out of a permutation have been made, each of the remaining n-t requests is equally likely, and the best we can do is to have these n-t keys (and none of the t previously requested keys) in the first n-t positions of the list. Since each of these keys is equally likely to be requested, the ordering of the unrequested keys will make no difference. The move to back rule clearly achieves this and therefore must be optimal. Any other rule will occasionally move the requested key to one of the first n-t positions, resulting in a higher cost.

To derive the average cost for the move to back rule to retrieve all n keys of a permutation, we note that to retrieve the i-th key, we search through an unordered list of n-i+1 keys. The average cost is then

$$E(\mathrm{Cost}_{MTB}) \;=\; \frac{1}{n}\sum_{i=1}^{n} \frac{(n-i+1)+1}{2} \;=\; \frac{n+3}{4}.$$

If no rule is applied to the list, each key will be accessed exactly once, giving a cost of

$$E(\mathrm{Cost}_{RAND}) \;=\; \frac{1}{n}\sum_{i=1}^{n} i \;=\; \frac{n+1}{2}.$$

Finally, if the move to front rule is used, accessing the list at time i will first require i-1 comparisons with the previously requested keys. Then we search through an unordered list of n-i+1 keys. The cost is then

$$E(\mathrm{Cost}_{MTF}) \;=\; \frac{1}{n}\sum_{i=1}^{n} \left[ (i-1) + \frac{(n-i+1)+1}{2} \right] \;=\; \frac{3n+1}{4}.$$

This cost is roughly three times larger than that of the move to back rule, and 50 percent larger than doing no moves at all. The reason is obvious: once a key has been requested, its probability of being requested again decreases. In this case, our strategy must be to move requested keys back in the list. Using the move to back rule, the keys will appear in the list in the order that they were requested.
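As a quick check on the three averages just derived, the following sketch (an illustration, not part of the original derivation) plays one random permutation of requests against a list of n keys under each policy and compares the observed averages with (n+3)/4, (n+1)/2, and (3n+1)/4; the list size and trial count are arbitrary choices.

    # Simulate one full permutation of requests under three policies and
    # compare the average cost per request with the formulas in the text.
    import random

    def permutation_costs(n=10, trials=20_000):
        totals = {"move to back": 0.0, "no moves": 0.0, "move to front": 0.0}
        for _ in range(trials):
            perm = random.sample(range(n), n)        # one month's requests
            for rule in totals:
                lst = list(range(n))                 # initial list order
                cost = 0
                for key in perm:
                    i = lst.index(key)
                    cost += i + 1
                    if rule == "move to back":
                        lst.append(lst.pop(i))
                    elif rule == "move to front":
                        lst.insert(0, lst.pop(i))
                totals[rule] += cost / n             # average cost per request
        return {rule: t / trials for rule, t in totals.items()}

    print(permutation_costs())
    n = 10
    print("predicted:", (n + 3) / 4, (n + 1) / 2, (3 * n + 1) / 4)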
If our clients have regular habits and pay their bills at about the same time each month, the access time of the move to back rule will decrease further, and that of the move to front rule will increase. We now consider a second example. Suppose that with probabil- ity p, the requested key is the same as the previously requested key. 76 With probability 1-p some distribution (p, ,p 2> . . . ,p ) over the keys 1s used. The move to front rule would seem to be the logical choice here (if p is not extremely small) since the first key in the list will have a good chance of being requested again. To analyze this rule, we note that the probability of a given ordering is not effected by p, but depends only on the p.. We can view the chain as waiting in each state until a "normal" request is made. During this wait, only requests to the first key are made, and these do not change the order of the list. In addition, the wait time is the same for all states. Therefore, with probability p, the first key is found in one comparison. With probability 1-p, the p,,p 2 ,...,p distribution is p.p. used. The cost here is just 1+2 £ F~+d ' t ' le norma ^ move t0 Ui Figure 3.1.2 87 where y is in either of the shaded subtrees. Note that y cannot be in the left most subtree since then z would be between x and y. Case (2): z > x. The transformation is: ^ Figure 3.1 .3 where, again y must be in either shaded subtree. In either case, x is still an ancestor of y. Since z is no longer a descendant of x, any further rotations will leave x as an ancestor of y, and therefore Observation 2 must be true. Observation 3 : If neither x nor y is the ancestor of the other and a key that is not between x and y is requested, then neither x nor y will become the ancestor of the other. Proof : If neither x nor y is the ancestor of the other, there exists some w that is between x and y, and an ancestor of both. Since z is not between x and y, it cannot be between x and w and it cannot be between y and w. By Observation 2, w will still be an ancestor of both x and y in the resulting tree and hence neither x nor y will become the ancestor of the other. [~| w We now give a lemma that characterizes exactly when one node will be an ancestor of another, based on the sequence of requests and the initial tree. Lemma 1 : Node x will be an ancestor of node y using the move to root rule if and only if: (1) Neither x, nor y, nor any key between them in ordering on the keys has been requested, and x was an ancestor of y in the initial tree. OR (2) Neither y nor any key between x and y has been requested after the most recent request for x. Proof : ("if" part) (1) => Lemma follows from Observation 2. (2) => Lemma follows from Observation 2 and the fact that when x is requested, it becomes the root of the tree and hence is an ancestor of every other node. ("only if" part) Case 1 (x has not been requested) Suppose that x has not been requested, and it is an ancestor of y. We will show that this must imply (1). From the observations it is clear that the only way x can become an ancestor of y (if it is not already) is for x to be requested, Since x was never requested, it must have originally been an ancestor of y. Then, from Observation 2, no key between x 89 and y can have been requested since then x would no longer be an ancestor of y. Similarly, y cannot have been requested. Therefore (1) holds. Case 2 (x has been requested) Here we show (2) must hold. 
Consider the situation after the most recent request for x: x is an ancestor of y, and x will not be requested again. This is the same situation as in Case 1 and by using its proof, we can show (2) must hold. Q We also prove the following lemma about the first request rule. Lemma 2 : Node x will be an ancestor of node y using the first request rule if and only if: (1) Neither x nor y nor any node between them has been requested and x was an ancestor of y in the original tree. OR (2) Neither y nor any node between x and y was requested before the first request for x. Proof : Case 1 (x has not been requested.) First note that the three observations still hold if x has not been requested. Then, as we noted before, once the requested node (z) is no longer a descendant of x, further rota- tions involving z do net effect the tree rooted at x. Hence the v, two rules "look the same" to an unrequested x because the only differences occur after z is no longer a descendant of x. Therefore the proof for Lemma 1 is valid and Case 1 is proved. Case 2 (x has been requested) To see what happens when x is first requested, consider the previously requested keys and label them k,,kp,...,k so that k, < k 2 <...< k . They occur in a group at the top of the tree and divide the unrequested nodes into n+1 different sub- trees. The leftmost of these subtrees contains all keys less than k, , and the rightmost contains all greater than k . Each 1 n of the remaining consist of all keys between two "adjacent" k. . (See Figure 3.1.4) Thus, two unrequested nodes, x and y, are in the same subtree if and only if no key between them has been requested. When x is first requested, it moves to the root of the subtree it is in and becomes an ancestor of all nodes in that subtree. Therefore x becomes the ancestor of y if and only if neither y nor any key between x and y has been requested, x will then remain the ancestor of y since no node can move up past x and out of its subtree, proving Case 2. I — I We can now prove the main theorem. Theorem : Given any initial tree, the probability of obtaining a given final tree after any number of requests is the same for the move to root rule and the first request rule. 91 Keys k,,k 2 ,k 3 and k. have been requested. S, contains all keys less than k, . S 5 contains all keys greater than k.. S. contains all keys between k. •, and k. for 1 = 2,3,4. Figure 3.1.4 How the requested keys divide the tree. 92 Proof : Consider any sequence of requests r,,r^,...r. as inputs to the move to root rule and the reversed sequence r^ . , r*. _ -. , . . . ,r, as inputs to the first request rule. Note that these two sequences have the same probability. Trivially, the conditions of Lemma 1 hold if and only if the conditions to Lemma 2 hold. This means that x is an ancestor of y in one tree if and only if it is an ancestor of y in the other. Since this information allows us to uniquely construct a tree, the two trees are the same and the theorem is proved. Note also that the theorem also holds if we are given a probability distribition over the initial trees since the two rules perform identically on each tree. As is the case with linked lists, the first request rule creates the same tree as if the keys were not known a priori, and each "new" key (one requested for the first time) was inserted into the tree. If the initial tree was created in this manner, the move to root rule will not decrease the cost. 
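For reference, the move to root rule itself is easy to state in code. The sketch below is a minimal illustration, not code from the thesis: the Node class, the recursive formulation, and the assumption that the requested key is present are all choices made for the example. The requested key is located and then promoted to the root by a single rotation at each node on the search path; applying this after every successful search gives the move to root rule.

    # Minimal sketch of the move to root rule on a binary search tree.
    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def move_to_root(root, key):
        """Search for key (assumed present) and rotate it up to the root."""
        if root is None or root.key == key:
            return root
        if key < root.key:
            root.left = move_to_root(root.left, key)   # key is now at root.left
            x = root.left
            root.left, x.right = x.right, root         # single rotation about root
        else:
            root.right = move_to_root(root.right, key)
            x = root.right
            root.right, x.left = x.left, root          # single rotation about root
        return x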
The characterization given in Lemma 1 allows us to determine the time varying and steady state costs for the move to root rule. As stated in the theorem, these will equal the cost of the first request rule. Theorem : If key k. has probability p. of being requested and the keys are ordered k, < k < ...< k , the cost for the move to root rule after I c n t requests is: 93 PiP-i 2p.p. t UT which bears a striking resemblance to the asymptotic l<1 A. > B. B will move down one level if either of its sons is l-l i requested or the son of A. that is not A. +1 is requested. Thus, we can see that the movements of B are controlled by much more than just its probability. If B is far from the root, it may be difficult for B to 97 1.00 0.75 ■• 0.50 -- 0.25 -- 0.00 0.00 1.00 The curve in this figure shows those a and b where the cost of the move to root rule equals that of the move up one rule. The move to root rule has lower cost in the region to the right of the curve (53% of the total area) and the move up one rule has lower cost in the region to the left. Figure 3.5.1 y< move up in the tree. On the other hand, the move to root rule promotes nodes all the way to the root of the tree, so high probability nodes cannot spend a lot of time "trapped" far from the root. We derived a closed form for the cost of the move to root rule as a function of time which bore a striking resemblance to that of the move to front rule. The move to root rule was also shown to be identical to the first request rule. A simulation estimating the cost of the move up one rule suggested it was often inferior to the move to root rule (see Table 3.1.1). Both rules performed well and provided reasonable decreases over the cost of a random tree. The move to root rule averaged within 38 percent of the optimum, while the move up one rule was within 45 percent. These average costs suggest that the move to root rule would be the better choice. 3.2 Monotonic Trees Another method for getting more frequently accessed nodes high in the tree is to keep a frequency count associated with each node. The node with the largest count becomes the root of the tree, and each subtree is formed recursively, using the same rule. Such a tree is called monotonic because the frequency count for any given node is greater than or equal to that of any of its descendants. (This property is the same as the one required for a heap, see Williams [17].) It is a simple matter to keep the tree ordered in this manner. Rotations are used to promote the requested key until a key with equal or greater count is encountered. The resulting tree will have the monotonic property. 99 Asymptotically, the most probable key will become the root of the tree (by the Law of Large Numbers, it will be requested the most times), and each subtree will have its most probable node as its root. The asymptotic tree will be monotonic, with probabilities as weights. This allows us to easily calculate the asymptotic cost for this method. Table 3.2.1 show it averages within 15 percent of the optimum. However, this method is very poor for some distributions. Suppose key k. has probability p. and that the lexicographic ordering of the keys is k, < k <...< k 1 ^ n* Table 3.2.1 The Performance of Monotonic Trees ZIPF'S LAW English Letters #1 #2 #3 #4 #5 Average Random Cost 5.15 7.26 7.50 7.27 7.33 7.63 7.40 Optimal Cost 3.32 4.10 3.93 4.16 4.06 3.96 4.04 Monotonic Tree Cost (Exact) 3.77 4.91 4.18 5.32 4.68 4.20 4.66 Increase Over Optimal 13.6°/ ]Q 1°L £ Z°/ 07 no/ See Table 3.1.1 for explanation. 
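The maintenance step described earlier in this section, promoting the requested key by rotations until a key with an equal or greater count is encountered, can be sketched as follows. The Node fields, the recursive formulation, and the behavior for absent keys are illustrative assumptions, not details fixed by the text.

    # Sketch of monotonic-tree maintenance: bump the requested key's counter
    # and rotate it upward while its count exceeds its parent's count.
    class Node:
        def __init__(self, key, count=0, left=None, right=None):
            self.key, self.count = key, count
            self.left, self.right = left, right

    def request(root, key):
        """Find key, increment its count, and restore the monotonic property."""
        if root is None:
            return None                     # key not present; nothing to do
        if key == root.key:
            root.count += 1
            return root
        if key < root.key:
            root.left = request(root.left, key)
            child = root.left
        else:
            root.right = request(root.right, key)
            child = root.right
        if child is not None and child.count > root.count:
            # promote the child one level with a single rotation
            if child is root.left:
                root.left, child.right = child.right, root
            else:
                root.right, child.left = child.left, root
            return child
        return root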
TOO If the p. are approximately equal and p, > p ? >...> p then the skewed tree shown below will result. Figure 3.2.1 A worst case monotonic tree. A theorem by Mel home [13] shows how bad this can be. Theorem (Melhorne [13]): The ratio between the cost of a monotonic tree and the optimal tree may be as high as n/(4 log n) for trees with n nodes. This theorem depends on a \/ery unfavorable choice for the ordering of the keys and only gives an idea of the worst case per- formance of monotonic trees. We now consider how these trees perform on the average by assuming the probabilities are randomly chosen in some way. The first method we consider chooses the probabilities from a given set of n probabilities. The second chooses the probabilities from some given probability density function. We now investigate the first method. Theorem (Knuth [3, p. 432]): Given n keys and n probabilities (p.. ^ p 2 * ... > p ), if each of the nl assignments of probabilities to keys is n n equally likely, the expected cost of a monotonic tree is [2 £ H-p.] - 1. i=l q n 101 Proof: An assignment of probabilities to the keys imposes an ordering on the probabilities. Probability p. is to the left of p. if the key to which p. has been assigned is to the left of the key to which p. has been assigned. We have assumed that each of the n! orderings is equally likely. The cost of a monotonic tree is solely determined by this ordering imposed on the probabilities. Hence, the problem is equivalent to assigning p. to key ^ and then randomly ordering the k. since each of the n! orderings on the probabilities will still be equally likely. This restatement turns out to be simpler, and we work with it instead. Let £. be a random variable denoting the level of k.. By definition, C0St = I p.£. i = l ■• ] E(Cost)= E( I p£) = I p E ( £ .) i=l 1 1 i=i i i' So Define R. -( ^ 1fk J 1s an dncestor of k i otherwise T ^n £. =R 1+ R 2+ ... + Ri%i + 1 (R . = if j,i) E(^) = E(R 7 ) + E (R 2 ) + ...+ E (R N1 ) + 1 i-1 = J Prob(k, is an ancestor of k.) + 1 j=l J l' To determine this probability, we discuss some properties of a random ordering. Consider any two keys, k. and k.. There are only two 102 distinct orderings of k. and k.; k. to the left of k, and k. to the right of k., each having probability 1/2. For either of these two orderings, a third key can be in three different regions: to the left of both keys, between the two keys, and to the right of both keys, each with probability 1/3. In general, any ordering of i keys creates i+1 regions, each having probability 1/i+l of containing a given key. Consider any ordering of k, ,...,k,_ , and k.. Now k. will be an ancestor of k. if no key with probability greater than k. (that is, k 1 ,k«»...»k. -.) occurs between k. and k. . For this to happen, k. must occur in either the region to the left of k. or the region to the right. 2 Since there are j keys in the ordering, this probability is -rrr. 1-1 ? Hence E{i.) = ( I -nr) + 1 1 J-l J ' = 2H. - 1 l n and hence E(Cost) = I p.(2H.-l) i=l 1 n The following theorem tells us the cost of a tree built by a random sequence of insertions. Theorem : Given n keys (k, < k 2 <...< k ) and a set of n probabilities {p. : 1 £ i £ n}, if the probabilities are randomly assigned to the keys and then a tree is built by a random sequence of insertions, its expected cost will be 2 ^ n ' H - 3 for any set of probabilities. 103 Proof : Let p(k.) be random variables denoting the probability chosen for k. and let I. denote the level of k. . As before, n Cost = I p(k.U. 
i=l 1 n n n E(Cost) = E( I p(k.U.) = I E(p(k.H.) i=l n 1 i=l ] n The insertion sequence (and hence I.) does not depend on p(k.)- These two random variables are independent and E(Cost) = I E(p(k.)) EU_.) = i I E(£.) 1=1 1 n n i=l n E(£.) = 1 + 1 + £ Prob (k. is an ancestor of k, ) Now k. will be an ancestor of k. if and only if it occurs in the insertion sequence before k. and any key between k. and k.. This probability is -|jTn+T. Therefore "V-'^n^r i-1 1 n , ■ 1 ♦ [H r l] ♦ [H n . i+1 -1] 104 ? n - I H, - 1 2 y n-i + 1 1 2 ? n+1 2 n -lusiiiH -3 n n n I I This quantity is the same as that derived by Hibbard [18]. However, he assumed that the keys were equally probable, and our result holds for any set of probabilities as long as they are randomly assigned to the keys. To compare the monotonic and random tree costs, note that if we substitute p. = — into the formula for the monotonic tree cost, l n we get exactly the expression for the random tree cost. = -[(n+l)?i- ? 1] - 1 =^+U.H -3. Clearly, n -j^i 1 i = i n n p. = — is the worst case since p n > p >...> p„ and the coefficients of r i n I c n the p. increase with i. Hence, except for the case where p. = — , the monotonic tree is better than the random tree. If some of the p. are large, the savings can be quite substantial. 105 To demonstrate this, we first consider a set of probabilities satisfying the geometric distribution, p. = -5-, r < 1, 1 * i < n, where n+1 1 R r= — i s a normalizing constant. Substituting this into the formula for the cost of a monotonic tree, we get n r\ , . 2 ? i I 1 1=1 n K K i = l j=l J , 2 ? 1 r j -r n+1 n t n Wi=j V,J 1-' 2 r ? r J n+1 ? ■ [ !V--r" T| T4-] - 1 r . r n+l ^J j^J If n is large, this is approximately 2 1 — ln(y- -) - 1, a constant independent of n, If the probabilities satisfy Zipf's Law, the cost is n 1 ? n H. 1=1 n n i=l " H n 2 (H n " H n ) " ] where fo\ n i °° i 2 H (2) = 1 1 < y 1 .* 1=1 i 2 1=1 i 2 6 106 Thus, for large n, the cost is approximately H , which is half of the cost of the random tree. For both distributions the monotonic tree gives significant gains over the random tree. We now consider a method of selecting the key probabilities that has been studied by Nievergelt and Wong [19]. Here we are given a probability density, f(x), and the key probabilities are chosen with respect to that density. It is necessary to drop the requirement that our choices must sum to one, so instead of probabilities, we must con- n sider key weights . The cost of a tree is now £ w.£. where w. is the i=l 1 ] ] weight of k. . We now need two standard definitions Define : E(f) = xf(x)dx, the mean of f(x) Define : F (x) = f(y)dy, the distribution function of f(x) J -oo F (x) is the probability a number chosen according to the density func- tion is less than or equal to x. The following theorem defines the cost of a monotonic tree for an arbitrary density function. Theorem : Given n keys (k, < k„ <...< k ) and a density function f(x), if the weights of the keys are independently chosen from this density function, the expected cost of the resulting monotonic tree is 107 n-1 yf(y)F (y) 1 ' dy 2E(f)[nH -("!)]- 2 I ?^- n-l ^ i=1 i j — oo Proof : As before, we have n n E(Cost) = I E(w,(l+ I A..)) = nE(f) + J 7 E(w.A..) 1=1 1 #1 J1 1-1 j?i J where w- is the weight chosen for k^ , 1 if k. is an ancestor of k. and A.. =< j l J1 1 if not Note that w. and A., are not independent. ' ECw.A^.) = y ProbfwjAj. = y)dy A., can just equal or 1 , and if A.. = the only y having nonzero probability is y = 0. 
Since this will be multiplied by y = 0, the case with A., can be ignored and oo r E'(tr f Aj 1 ) = y Prob(w i = y and A^. = l)dy To determine Probfw. = y and A.. = 1) we note that k. will be an ancestor of k. if and only if w. > w. and w. is greater than the weight of any key between k. and k. in the ordering on the keys. The probability that w. = y is f(y)dy. We then chose an x ^ y for w. . Any specific x is chosen with probability f(x)dx. For this x, we must chose the |i-j|-l=m keys between k. and k. to have weight less than or equal to x. The probability for this isF(x) m . The product of these must be integrated over x ^ y, giving Prob(w. = y and A. . = 1 ) = rr: f(y) f(x)F(x) m dxdy Then v m+l ■ f(y) ^Itf . since ^- - f(x) and F(») - 1 m+T dx E(w iAji ) D yf(y) [ ] ' F ^( ] dy. Note that this quantity depends only on m, and not the values of i and j Since there are 2(n-m-l) distinct ordered (i,j) pairs having a given value of m, n-2 r , r-/..xm+l I m=0 n " 2 f l-FM m+1 E(Cost) = nE(f) + I 2(n-m-l) yf (y) [-^j^p ] dy = n E (f , + z n f #i ' m=0 yf(y)dy - 2^ 5^1 f yf (y )F(y) m+1 dy m=0 J n-1 n-m 2E(f)[nH i - l^-l)] - 2 I lOL- yf(y)F(y) m dy m=l Nievergelt and Wong give us two measures to which we can compare this cost. Theorem (Nievergelt and Wong [19]): The cost of the optimal tree whose weights are chosen according to f(x) is E(f) n log n + 0(n). 109 Theorem (Nievergelt and Wong [19]): The cost of a random tree whose weights are chosen according to f(x) is (2 £n 2) E(f) n log n + 0(n). Nievergelt and Wong also considered choosing the weights for a monotonic tree from a uniform distribution. The resulting cost was (2 In 2) E(f) n log n + 0(n), asymptotically equal to that of the random tree. They conjectured that this held for any probability distribution. We now show this conjecture to be true. I am grateful, to D. L. Burkholder for the proof of the following lemma: Lemma: For any density function f with a finite mean, n-1 i=l i yf(y)F 1 (y)dy = o(n log n) Proof : F irst note that for any i >, 1, yf(y)?Hy) yf(y) Hence Lebesgue's Dominated Convergence Theorem (see [21]) applies and we have lim i-H» lim r-i yf(y)F 1 (y)dy = j yf(y) j™ F^yjdy = since lim ,-i "|-M» F n (y) = if F(y) < 1 = and f(y) = if F(y) = 1 no Now, n-1 n-i 1=1 1 yf(y) F 1 (y)dy n-l -| N'. Therefore the limit of this ratio is zero as n ■> °° and the lemma is proved. I 1 Theorem: If n keys have their weights chosen according to any density function with finite mean, the expected cost of a monotonic tree is (2 In 2)E(f) n log n + 0(n), asymptotically equal to the cost of a tree built from a random insertion sequence. Proof: The cost of a monotonic tree is 2E(f)[nH , - (£-1)] - 2 n "l Iti fyf(y) F^y) dy n "' * i= i i J -co The first term is asymptotically equal to 2E(f) n In n = E(f) n log n. The final term was shown by the lemma to be o (n log n). Hence the asymptotic cost is (2 In 2) E(f) n log n. We now show that the cost of a monotonic tree is less than or equal to that of a random tree, proving that the cost of a monotonic tree equals (2 In 2)E(f) n log n + 0(n). 112 The method we are using to select key weights choses n weights independently from a density function. An equivalent method first selects a set of n weights from an n-dimensional density function. This function is constructed so that the probability of choosing a given set equals the probability of obtaining it (in any order) from n selections from the original function. We then choose a permutation of the set. Now consider any set. 
We have already studied the case where the key probabilities (easily generalized to include key weights) were selected from a set and found the expected cost of a monotonic tree to be less than or equal to that of a random tree. Since this holds for es/ery set in the n-dimensional probability density, the theorem is proved. I I Finally, we cite the results of a simulation run by Walker and Gottlieb [14] that showed the performance of monotonic trees to be poor. They state that although these poor results are partially explained by the fact that the leaf weights cannot influence the structure of the tree, even the tests with all leaf weights equal to zero did not produce acceptable nearly optimum trees. Indeed the majority of the results concerning monotonic trees are quite discouraging. This method performs well only when we are quaranteed that the key probabilities will differ significantly from a uniform distribution (i.e., have low entropy). If this is not the case (as in the situation described by Nievergelt and Wong), the performance is asymptotically the same as randomly built trees. 113 3.3 Cost Balanced Trees The previous methods have focused on the fact that a rotation moves a certain node up in the tree, ignoring the fact that it also dis- turbs two (possibly large) subtrees. The method of cost balancing considers the entire tree. We do a rotation only when it appears to be profitable, that is, when the number of accesses to the nodes that will move up exceeds the number to those that will move down. This method has the advantage that it is possible to do the rebalancing during the search for the requested key since we know in which subtree it lies. For example in Figure 3.3.1, we perform a rotation to promote A, if w(A) + w(S, ) > w(B) + w(C) + w(S 3 ) + w(S«). We promote C if w(C) + w(S 4 ) > w(B) + w(A) + w(S ] ) + w(S 2 ). Here, w(A) is the number of times A has been requested, and w(S->) is the number of times any node in S, has been requested. All this information is avail- able at node B, and any rebalancing can be done there. Figure 3.3.1 114 Another advantage of this rule is that the leaf weights play a rule in the balancing of the tree. Since a rotation that promotes the nodes in subtree S, also must promote the leaves, the "weight" of S, must be the number of accesses to both the nodes and the leaves of S,. If the weight of the leaves is considerable, this is a significant advantage over previous rules, all of which ignored accesses to leaves. However, balancing at one node may cause other nodes to become unbalanced. (See below). ^ Figure 3.3.2 Here, both node A and node C may require rebalancing. (However, no rebalancing would be required at node A if node C had been its left son). An attempt to correct these imbalances (and all the imbalances resulting from the corrections) could be quite costly. A more rea- sonable policy is to ignore the imbalances and rebalance at a later request when the search path passes through the unbalanced node. Tables 3.3.1 and 3.3.2 compare these two rules. The total rotation rule (which corrects all imbalances) has a slightly lower cost, Table 3.3.1 The Performance of the Limited Single Rotation (LSR) Rule 115 ZIPF'S LAW English Letters Random Cost 5.15 Optimal Cost 3.32 LSR Cost 3.44 Increase Over Optimum 3.55% Average Number of Rotations/Request .111 Average Over the Last 100 Requests #1 n #3 f 4 ? 
5 Average 7.26 7.50 7.27 7.33 7.63 7.40 4.10 3.93 4.16 4.06 3.96 4.04 4.33 4.14 4.46 4.28 4.20 4.28 5.46% 5.31% 7.29% 5.42% 5.98% 5.89% .199 .204 .199 .197 .200 .200 033 .041 .040 .039 .034 038 See Table 3.1.1 for explanation. Table 3.3.2 The Performance of the Total Single Rotation (TSR) Rule 116 ZIPF'S LAW English Letters 1 *2 13 «4 t 5 Average Random Cost 5.15 Optimal Cost 3.32 TSR Cost 3.41 Increase Over Optimum 2.93% Average Number of Rotations/Request .113 Average Over the Last 100 Rotations 7.26 7.50 7.27 7.33 7.63 7.40 4.10 3.93 4.16 4.06 3.96 4.04 4.33 4.11 4.41 4.22 4.17 4.25 5.57% 4.57% 6.02% 3.92% 5.27% 5.07% .220 .219 .217 .209 .213 .215 .040 .048 .044 .036 .036 .041 See Table 3.1.1 for explanation, 117 It gives an increase of 5.07% over the optimal cost (as compared with 5.89 percent for the limited rotation rule) with asuprisingly small increase in the number of rotations required (an average of .215 per request as compared with .200). However, there is much more overhead associated with a rotation in the total rotation rule. Since imbal- ances can propogate throughout the tree, either a pointer to a node's father must be maintained or we must stack the nodes encountered during the search for the requested key. These tables also show how much work the rules do after many requests. We consider the last 100 requests out of 500 in the simula- tion. The limited rotation rule does an average of .038 rotations per requested during this period, or approximately one rotation every 27 requests. The total rotation rule averages .041 rotations per request, or one rotation e\/ery 24 requests. A weakness of these rotation rules is that they do not con- sider the "inside" subtrees (the right subtree of a node's left son, or the left subtree of its right son, see Figure 3.3.3). Figure 3.3.3 The inside subtrees of node A are darkened. 118 A rotation can promote either exterior subtree, but the interior sub- trees remain at the same level. This can lead to very poor trees that are still "stable" in that no rotation can be performed. Figure 3.3.4 shows an example. This tree is stable as long as the weight of a node is less than or equal to that of his father. Figure 3.3.4 While the worst case performance on such a distribution is quite bad, a simulation suggests the average case is acceptable. The probability distribution was p, . ^ , p 2 . ^ P 25 = tItb ' 1 3 49 P 26 ~ 7275 ' p 27 " 7275 •" p 50 = 7275 ' The tree shown inFi 9ure 3.3.4 is stable for this probability distribution. Yet, after 500 requests the limited rotation rule reduced the cost to 4.7593, a mere 3.06 percent increase over the optimal cost of 4.6180. The limited single rotation rule has several desirable features. The necessary rotations can be performed during the search for the requested key. The performance for the distributions we con- sidered was good, within 5.89 percent of the optimal. After an initial 119 period that reorganizes the tree, the rule required a rotation approxi- mately every 27 requests, a very low maintenance cost. The total single rotation rule has little more to offer. It decreases the cost to within 5.07 percent of the optimum, and sur- prisingly does only slightly more rotations. However the overhead required by this rule to allow changes to propogate throughout the tree is not justified by the relatively small decrease in cost, making the limited rotation rule a better choice. 
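For concreteness, the balance test of Figure 3.3.1 can be written out as follows. This is a sketch under assumed bookkeeping: each node carries its own request count and the request total of its whole subtree, the field and function names are invented for the illustration, and accesses to leaves (which the discussion above notes should also be charged to the subtree weights) are not modeled. A search would increment the count of the requested node and the weight of every node on its path, and each node on the path can then be replaced by the result of rebalance, either during the descent or on the way back up.

    # Sketch of the single rotation balance test at a node B (cf. Figure 3.3.1).
    class Node:
        def __init__(self, key):
            self.key = key
            self.left = self.right = None
            self.count = 0      # requests for this key
            self.weight = 0     # requests for every key in this subtree

    def w(t):
        return t.weight if t is not None else 0

    def fix(t):
        t.weight = t.count + w(t.left) + w(t.right)
        return t

    def rebalance(b):
        """Return the new root of b's subtree after at most one rotation."""
        a, c = b.left, b.right
        # Promote the left son A if the accesses that would move up (A and its
        # left subtree) outnumber those that would move down (B and its right
        # subtree).
        if a is not None and a.count + w(a.left) > b.count + w(c):
            b.left, a.right = a.right, b
            fix(b)
            return fix(a)
        # Symmetric test for promoting the right son C.
        if c is not None and c.count + w(c.right) > b.count + w(a):
            b.right, c.left = c.left, b
            fix(b)
            return fix(c)
        return b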
3.4 Double Rotations A transformation (called a "double rotation") that allows the promotion of inside subtrees is shown below. -> Figure 3.4.1 A double rotation, 120 A rule that uses both single and double rotations has several ad- vantages over a rule that is limited to single rotations. First, such a rule will always be able to promote a requested node if it is profitable to do so. Single rotations can be used to promote nodes in the outside subtrees and double rotations for those in the inside sub- trees. A double rotation actually consists of two successive single rotations, each promoting node B one level. The double rotation has an advantage when doing both of these rotations will reduce the cost, but doing only the first will not. A rule that is restricted to performing single rotations will check if the first rotation can be done. Since it cannot be, the tree is left unchanged, and the second rotation is not considered. A rule which also considers double rotations will be able to reduce the cost in this situation. Table 3.4.1 shows the cost of a rule that uses both single and double rotations. For the distribution considered, this method averaged within 3.84 percent of the optimal cost. The total number of rotations required per request (counting both single and double rotations) is .208, which is very close to the averages for the limited rotation rule (.200) and the total rotation rule (.215). However, after many requests, fewer rotations are required than for either single rotation rule. In fact, the average over the last 100 requests was .027 single rotations (one e\/ery 36 requests) and .008 double rotations (one eyery 129 requests). Table 3.4.1 The Performance of the Double Rotation (DR) Rule 121 ZIPF'S LAW English Letters # 1 #2 §3 #4 # 5 Average Random Cost 5.15 Optimal Cost 3.32 DR Cost 3.40 Increase Over Optimum 2.35% Average Number of Single Rotations/ Request .074 Average Over Last 100 Requests Average Numbers of Double Rotations/ Request .037 Average Over Last 100 Requests 7.26 7.50 7.27 7.33 7.63 7.40 4.10 3.93 4.16 4.06 3.96 4.04 4.29 4.06 4.32 4.23 4.09 4.20 4.62% 3.45% 3.80% 4.11% 3.22% 3.84% .129 .129 .127 .126 .126 .127 .029 .027 .025 .028 .028 .027 .082 .081 .082 .079 .080 .081 .007 .007 .008 .007 .009 .008 See Table 3.1.1 for explanation. 122 These results are supported by a simulation run by Baer [20]. He assumed all keys to be equally likely and found the cost of the double rotation rule to range from 1.2 percent to 3.6 percent of the optimum. This cost is lower than that we obtained because more re- quests were made in Baer's simulation. In addition, his trees have fewer nodes. This would also explain the lower cost since the cost of smaller trees tends to be closer to the optimum (see Table 2.9.1, compare the cost for the English letters (26 nodes) with the others (100 nodes)). Baer also gives statistics on the number of rotations done by this rule. Again his results agree with ours. His most extensive simulation (85,000 requests) required 23 rotations for the first 850 requests (one every 37 requests). The next 7650 requests caused 6 rotations (one every 1,275 requests), and the final 76,500 also caused 6 requests (on every 12,750 requests). These results indicate that the • cost of "maintaining" the tree might be extremely low. Bruno and Coffman [12] have considered an extension of this rule that can promote a node any number of levels by using a sequence of rotations. 
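For reference, the double rotation of Figure 3.4.1 can be expressed as two successive single rotations. The sketch below is illustrative only; it assumes node objects with left and right fields, and shows the orientation in which the promoted node B is the right son of its parent A, with A the left son of the grandparent C; the mirror-image case is symmetric.

    # The double rotation of Figure 3.4.1 written as two single rotations;
    # the inside grandchild B rises two levels.
    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def rotate_left(a):               # promote a's right son one level
        b = a.right
        a.right, b.left = b.left, a
        return b

    def rotate_right(c):              # promote c's left son one level
        a = c.left
        c.left, a.right = a.right, c
        return a

    def double_rotation(c):
        """Promote the inside node c.left.right above both c.left and c."""
        c.left = rotate_left(c.left)  # first rotation: B moves above A
        return rotate_right(c)        # second rotation: B moves above C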
They, however, were concerned with an algorithm to build a nearly optimal tree from a set of known key probabilities and used this set of transformations to reduce the cost of the initial tree. Every final tree in their simulation was within 5 percent of the optimum, and the average was within 2.6 percent. This suggests further rules, where we consider promoting the requested node i levels for i = 1, 2, ..., k, where k is a parameter of 123 the rule. Note that the single rotation rules have k = 1, and the double rotation rule has k = 2. Increasing k will increase the work the rule must do, but will result in decreased retrieval times. The results of Bruno and Coffman suggest that the retrieval time will not be greatly improved by increasing k beyond 2, while the increase in the complexity of the algorithm to execute the rule would be substantial 3.5 Summary and Conclusion We have examined several methods for dynamically altering binary search trees to decrease their access time. The first two methods were analogs of the linked list case : the move to root rule and the move up one rule . The move to root rule was shown to be identical to the first request rule (analogous to the case of the linked list) and a formula for the cost was derived. Calculations showed this method to be an improvement over a tree built by a random sequence of insertions. The analogy breaks down when we consider the move up one rule; it is often outperformed by the move to root rule. A simulation showed the move to root rule to have lower average cost than the move up one rule (38 percent of the optimum compared with 45 percent), in- dicating that of these two rules, the move to root rule would be the superior choice. However, these rules should be used only if we cannot associate a counter with each key. If we can, the following rules will give better performance. We next considered rules that use counters. The first of these was the monotonic tree rule. Its performance was found to be 124 disappointing. Melhorne [13] has shown that the ratio of the cost of a monotonic tree to that of the optimal tree may be as high as n/(4 log n), for a tree of n nodes. If the weights of the nodes are chosen according to a probability density function (a case considered by Nievergelt and Wong [19]) the performance is asymptotically the same as a random tree for any probability distribution. A simulation by Walker and Gottlieb [14] also confirms the poor performance of this method. Only if we assume the probabilities are chosen from a fixed set (guaranteeing they will be "spread out") does this method signifi- cantly improve over the cost of a random tree. A formula was derived for this case, and significant decreases were obtained for Zipf's Law and the geometric distribution. This assumption was also true in the simulation we discussed. It showed the cost of this method to average within 15 percent of the optimal cost. We then discussed the most promising methods. Simulations showed that the cost of the limited single rotation rule averaged with- in 5.89 percent of the optimum. However, the worst case performance was \zery bad, resulting in the tree shown in Figure 3.3.4. The total single rotation rule reduced the average cost to 5.07 percent of the optimum. However, since this rule requires much more overhead, this small decrease in cost does not justify its use. Finally, we considered the double rotation rule . Its cost averaged approximately 3.84 percent of the optimum. 
Though this rule must check for both single and double rotations, it averages a rotation eyery 36 requests and a double rotation e\/ery 129 after the initial period of reorganization. Compared with the limited single rotation 125 rule (one rotation every 27 requests), the double rotation rule does less work after the initial period. It then appears to be the best choice of the counter rules. 126 4. CONCLUSION The purpose of this thesis was to examine various heuristics that dynamically alter data structures by moving frequently accessed keys near the "top" of the data structure. The first data structure we considered was the linked list. If relatively few requests (compared to the number of keys) are anticipated, the fast convergence of the move to front rule (nearly as fast as the optimum) makes it the best choice. If many requests are expected, the transposition rule gives the best performance because its asymptotic cost is close (10 percent) to the optimum. For an intermediate number of requests, the first request/transposition rule combines both of these features with a small additional overhead, making it the best choice in this case. If space is available for counters, there are much better rules. If enough space is available so that the counters will never overflow, the frequency count rule should be used. If this is not the case, the limited difference rule uses whatever space can be spared and gives nearly optimal results for even a small number of bits. The second data structure we considered was the binary search tree. Here we found the move to root rule to give the best performance of the rules that do not use counters, approximately 38 percent of the optimum. If counters are available, the double rotation rule appears to be best. Its performance averages 3.84 percent of the optimum, and after a period of initial organization of the tree, it is 127 very inexpensive to execute. On the average, a single rotation is done es/ery 36 requests, and a double rotation every 129 requests. The methods we have considered are simple and inexpensive to execute. In addition, they significantly reduce the average access time over data structures in which the keys are randomly arranged; in some cases these methods keep the structure very close to the one of optimum cost. 128 APPENDIX We will make great use of Markov chains, so a summary of their important properties is required. These can all be found in [1]. To define a Markov chain, we first consider a set S of states and a sequence (x , n=0,l,...} of random variables which take their values from S. The value x tells us which state the chain is "in" n at time n. In addition, the Markov property must be satisfied. This is Prob{x n+] = S n+1 |x Q = S Q ,...,x n = S n } = Prob{x n+1 = S n+] |x n = S n >, which says that the probability of being in a given state depends only on the previous state, and not any before that. We then define the on probabilities Prob{x , = j|x = i} (or just P ( i , j ) ) as the trans iti probabilities of the chain and can form a matrix whose (i,j) element is Prob{x ,, = jlx = i}. This is called the transition matrix P. If n+ 1 ' n we are given a probability distribution x over the states, then xP gives us the probability distribution at the next time step. This defines the basic idea of Markov chains. Several more definitions are required. Definition : A state x leads to a state y if there exists a sequence of states x,,...,x such that P(x,x ] ) P(x r x 2 ) ... P(x n ,y) > 0. 
Definition : A set C of states is closed if no state in C leads to a state outside of C. 129 Definition : A closed set C is irreducible (also called ergodic ) if x leads to y for all choices of x and y in C. Most of the chains we are dealing with will be irreducible. That is the set of all states, S, will be irreducible. S is then closed since there are no states outside of S. For these chains, it will be clear that some sequence of requests can be designed to cause any state to lead to any other state. Definition : Define the period of a state x as the greatest common divisor (g.c.d.) of the set {n ^ 1 : P (x,x) > 0}. It can be shown that all states have equal periods and this is defined as the period of this chain. A chain with a period of 1 is called aperiodic . Nearly all of the chains we deal with will be aperiodic. If the top element in a configuration is requested, none of these schemes will alter the configuration and hence we will have P (x,x) > 0. Hence 1 is in the set {n > 1 : P n (x,x) > 0} so the g.c.d. must be 1 and the chain must be aperiodic. Definition : A steady state distribution (also called a stationary distribution is defined as any probability distribution x over the states such that xP = x. Note that we can easily determine this distribution by solving the system x(P-I) = 0. The following theorem shows that this distribu- tion tells us the asymptotic behavior of the chain. 130 Theorem : Any closed and irreducible chain with a finite number of states has a unique steady state distribution. If the chain is aperiodic, it approaches the steady state distribution from any initial distribution. If the chain has period d, then for < i - d, the chain x. > x i+H' x i+2d ••• nas a un1c l ue steady state distribution and approaches it. Another useful theorem that characterizes asymptotic behavior is the ergodic theorem. Theorem [Ergodic Theorem]: Let N.(s) be a random variable denoting the number of times the chain has been in state s after t transitions. Suppose that s has steady state probability p(s). Then if the chain is closed and irreducible (ergodic) Urn V s > , , • Note the ergodic theorem holds for both periodic and aperiodic chains. The chains we are dealing with have a cost associated with each state. Let the probability of being in state s at time t be P t (s), and suppose s has steady state probability p(s) and cost c(s). Finally define the cost of the chain at time t (C0ST t ) as J p t (s)c(s). seS For an aperiodic irreducible chain, we have .^ P t (s) = p(s). Hence 1™ COST. = I p(s)c(s). L u seS For any irreducible chain, we can use the Ergodic Theorem to show 131 1 l lim j I COST. = 5! (s)c(s). Therefore £ p(s)c(s) determines the t-*» i=l ] seS seS asymptotic cost of the chain. We also use the following theorem. Theorem : Let c. be random variables denoting the cost of the state the chain is in at time i. Then for any irreducible chain, -,. C-,+C 9 +.. .+C. i™ e( j_^ — £)= lP(s)c(s) L l seS •,. c.+c +.. .+c. lim VAR( J2 t )=Q t-X" t Proof : To prove the first statement, note E(c.) = COST.. Then t n ^ c +c + +c I C0ST i lim rr c r c 2 + --- +c t , _ lim ifl \ - , , , el t-o E( 1 ) " t-*» 1 Ip(s)c(s). seS To prove the second statement, 1-im c,+c«+...+c + 1™ VAR(-1— ^ h t-*» t "Mm c,+c +.. ,+c. 2 . lim E[( J_L 1 } j. ( Zp(s)c(s)) 2 seS lnm Ct+C +. . .+C, 2 = E[(J™-J-L 1) ] - ( I p(s)c(s)) 2 (1) 1 L seS lim C 1 +C«+...+C. We now determine ' (— — ^r -). Let N.(s) be the number of times state s is visited in t transitions. Then c,+c 2 +...+c. 
equals the number of times we visit each state times its cost summed over all states 132 Therefore c +c + +c E N (s)c(s) lim ( _l_^__t ) _ lim seS z lim M s > scS r l seS by the Ergodic Theorem. Substituting this into (1) shows the variance is zero. I I Another important question is how quickly the chain approaches steady state. We can tell this from the eigenvalues of the transition matrix. To see this, suppose the n eigenvectors y-.,...,y span the space of all probability distributions. We can then write an initial distribution x Q as a linear combination of the y. x« = c,y, + ... + c y rl n-'n So after t transitions, the distribution will be "*O pt = C l^l + ••■ + c Ar Since the chain has a stationary distribution, there is some x such that xP = x. Hence x is an eigenvector with eigenvalue 1. If the chain is closed, irreducible and aperiodic, we can show that all other eigenvalues have modulus strictly less than 1. If we suppose X, is the eigenvalue -t-t- t- t - equal to 1 , we get x Q P = x + X-c^yp + ^3C 3 y 3 + ... + ^ n c n y n - Tnen as t -*- °°, x n P -*• x, and the rate of convergence is limited by the size of the other n-1 eigenvalues and eigenvectors. 133 REFERENCES [I] Hoel, P. B. , S. C. Port, and C. J. Stone, Introduction to Stochas- tic Processes , Houghton Mifflin Company, Boston, 1972. [2] Rivest, R. L., "On Self Organizing Sequential Search Heuristics," CACM, 19 (1976), 63-67. [3] Knuth, D. E., The Art of Computer Programming , Vol. 3, Addison- Wesley Publishing Co., Reading, Mass., 1973. [4] Yao, A. C. , Personal communication. [5] Kahn, D., The Codebreakers , Macmillan Company, New York, 1967. [6] Kucera, H. and W. N. Francis, Computational Analysis of Present-Day American English , Brown University Press, Providence, R.I., 1967. [7] Burville, P. J. and Kingman, J.F.C., "On a Model for Storage and Search," J. Appl. Prob. , 10 (1973), 697-701. Hendricks, W. J., "The Stationary Distribution of an Interesting Markov Chain," J. Appl. Prob. , 9 (1972), 231-233. [9] Hendricks, W. J., "An Extension of a Theorem Concerning an Inter- esting Markov Chain," J. Appl. Prob. , ]0_ (1973), 886-890. [10] McCabe, J., "On Serial Files with Relocatable Records," Operations Res. , 12 (1965), 609-618. [II] Knuth, D. E., "Optimum Binary Search Trees," Acta Informatica , 1 (1971), 14-25. [12] Bruno, J. and E. G. Coffman, "Nearly Optimal Binary Search Trees," Proc. IFIP Congress 71 (1971). [13] Mel home, K. , "Nearly Optimal Binary Search Trees," Acta Informatica , 5 (1975), 287-295. [14] Walker, W. A. and C. C. Gottlieb, "A Top-Down Algorithm for Con- structing Nearly Optimal Lexicographic Trees," in Graph Theory and Computing (ed. R. C. Reid), Academic Press (1972), 303-323. [15] AdeTson-VeTskii, G. M. and E. M. Landis, Dokl . Akad. Nauk SSSR 146 (1962), 263-266; English translation in Soviet Math 3, 1259-1263. [16] Nievergelt, J. and E a M. Reingold, "Binary Search Trees of Bounded Balance," SI AM J Comput . 2 (1973), 33-43. 134 [17] Williams, J.W.J. , Algorithm 232 - Heapsort. , CACM 7 (1964), 347- [18] Hibbard, T. N., "Some Combinatorial Properties of Certain Trees with Application to Searching and Sorting," JACM 9 (1962), 16-17. [19] Nievergelt, J. and Wong, C. K. , "On Binary Search Trees," Informa- tion Processing 71 , vol. 1, North Holland, Amsterdam, pp. 91-93. [20] Baer, J. L. , "Weight Balanced Trees," National Computer Conferenc e 1975, pp. 467-472. [21] Rudin, W., Prin ciples of Mathematical Analy sis, McGraw-Hill, New York, 1976, p". 321. 
" 135 VITA James Richard Bitner was born August 2, 1953, in Minneapolis, Minnesota. He attended the University of Illinois at Urbana-Champaign, receiving a B.S. in Mathematics and Computer Science (June 1973) with highest honors and distinction in Mathematics and Computer Science. He continued at the University of Illinois for graduate study, receiving an M.S. (December 1974) in Computer Science. During this time, he was employed as a research assistant for Drs. C. L. Liu and E. M. Reingoldo He is also a member of Phi Beta Kappa and Phi Kappa Phi. The titles of his published articles are: "Use of Macros in Backtrack Programming," "Tiling 5n x 12 Rectangles with Y-Pentominos" and "Efficient Generation of the Binary Reflected Gray Code and Its Applications." .IOGRAPHIC DATA ET 1. Report No. UIUCDCS-R-76-8I8 ; |e anj Subtitle EURISTICS THAT DYNAMICALLY ALTER DATA STRUCTURES TO UREASE THEIR ACCESS TIME ithor(s) James R. Bitner rtorraing Organization Name and Addtess department of Computer Science Jniversity of Illinois at Urbana-Champaign Jrbana, Illinois 6l801 ftonsoring Organization Name and Address National Science Foundation •ashington, D.C. ipf Irmt-ntary Notes 3. Recipient's Accession No. 5. Report Date July 1976 8. Performing Organization Rept. No. 10. Project/Task/Work Unit No. 11. Contract/Grant No. NSF GJ-i+1538 13. Type of Report & Period Covered 14. In evaluating the access times of data structures, we often assume : each key is equally likely to be requested. This is seldom the case ) practice: some keys will be requested more often than others. The data uctures considered are lists and trees, and access times can be substan- Lally reduced in these data structures by the use of several simple "istics that cause the more frequently accessed key to move to the :op" of the data structure. These methods are analyzed and compared. .' *ords and Document Analysis. 17a. Descriptors ita structures, linked lists, binary search trees, access times J nficrs /Open-Ended Te OkTI Field/Group ■ Statement 35 (10-70) 19. Security Class (This Report) UNCLASSIFIED 20. Security Class (This Page UNCLASSIFIED 21- No. of Pages 135 22. Price USCOMM-DC 40329-P7 1 JVN 14 1577