Prob(k_i ahead of k_j) is the sum, over all states in which k_i is ahead of k_j, of terms of the form p_i^x p_j^y z, divided by the normalizing constant C, where x > y and z is a product of powers of the other p's. We can pair each p_i^x p_j^y z in the numerator with two terms (p_i^x p_j^y z and p_i^y p_j^x z) in the denominator. Since

    p_i^x p_j^y z / (p_i^x p_j^y z + p_i^y p_j^x z) = p_i^(x-y) / (p_i^(x-y) + p_j^(x-y)),

and since 1 <= x-y (and p_i >= p_j), we get

    [p_i / (p_i + p_j)] (p_i^x p_j^y z + p_i^y p_j^x z) <= p_i^x p_j^y z.

Summing over all states with x > y and dividing by C gives

    p_i / (p_i + p_j) <= Prob(k_i ahead of k_j).

Since p_i / (p_i + p_j) = Prob(k_i ahead of k_j) using the move to the front rule, and

    E(Cost) = 1 + Sum_{i=1}^{n} p_i Sum_{j != i} Prob(k_j ahead of k_i),

the transposition rule is better than (or the same as) the move to the front rule. □

So the transposition rule has lower asymptotic cost than the move to front rule. Rivest [2] has conjectured that this result extends to all permutation rules, i.e., that the transposition rule is the optimal rule (has lowest asymptotic cost for any probability distribution) out of all permutation rules. Intuitively, this conjecture is not surprising. The best we could possibly do (see Section 2.6) is to count the number of times each key has been accessed and keep the keys ordered with respect to this count. The rule which most closely approximates this strategy is the transposition rule. We can also look at the situation in a different way: after a long time, the high probability keys are near the front of the list and the low probability keys near the bottom. Occasionally, a low probability key will be accessed, and the move to the front rule will move it to the front of the list, increasing the expected cost since many high probability keys have moved down one position. The transposition rule does not do this, and it is difficult for the low probability keys to rise to high positions in the list.

While we cannot yet prove the transposition rule is optimal, it has been shown by Yao [4] that if an optimal permutation rule (optimal for all distributions) does exist, it must be the transposition rule. He does this by showing a particular distribution for which the transposition rule is optimal. Before discussing Yao's proof, we need a theorem by Rivest [2].

Theorem (Rivest [2]): An optimal permutation rule {τ_j, 1 <= j <= n} (τ_j is used when the key in position j is requested) must have the property that each τ_j:

(i) leaves positions j+1 to n of the list fixed;
(ii) if j > 1, moves the key in position j to some position j' < j.

Proof: Consider the probability distribution p_i = 1/k for 1 <= i <= k and p_i = 0 for k < i <= n, for some k < n. Any permutation rule satisfying (i) and (ii) above will have an asymptotic cost of (k+1)/2, since all of the keys with zero probability will move to the end of the list and stay there. Any permutation rule which violates (i) will occasionally move a key with zero probability in front of one with nonzero probability, and thus have greater asymptotic cost.
Any permutation rule which satisfies (i) but not (ii) will not be able to move any keys out of positions j such that t.(j') = j, so that the optimal ordering for this particular probability distribution cannot be reached, and, again, the asymptotic cost will be higher. Theorem (Yao [4]): Given a list of n elements with probability distri- 1-e bution p, = 1-e and p. = — y, 2 < i < n, there is an e small enough such that the transposition rule is optimal for this distribution. Proof : The Markov chain corresponding to this list has n distinct states, each one having k, (the key with probability 1-e) in a different position. Let q. be the steady state probability that k-j occupies 16 position i, using the transposition rule, and let r. be this probability using the optimal rule. The transposition steady state satisfies: q l = (1 "n^ q l + (1 " e) q 2 q 2 = H^T q l + ^f E q 2 + (1 " e) q : 'n-1 = iPT q n-2 + ^f e Vl + (1 " e) q n q n = TPT q n -l + e q n We solve this to get: Vl ■ (pTf T^ q i for J = l....»n-l n Since Y q. = 1, we obtain 1=1 ? q] = 1 + 0(e), qj = (^rr)^ 1 + 0(e j ) 2 < j < n. From Ri vest's Theorem, we know an optimal rule (if one exists) must have the form: 17 T l T 2 T 3 1 2 ... n\ 1 2 ... nj 1 2 3 ... n 2 1 3 ... n. 1 2 3 4 ... n 1 ld 31 a 32 a 33 4 ••• n Vl T n 1 2 ... n-1 n a n-l,l a n-l,2 ••• a n-l,n-l ' 1 2 ... n-1 n a nl a n2 ••• a n-l,n a nn The theorem now proceeds inductively in n-2 stages. At the 1L k stage we will show: 1. t. +2 is the same as the transposition rule. 2. a.. = k for i > k + 2. J ' r k+l n-1 1-e r k ^n-l j uu j * Note that after stage n-2, we will have proven x. is the same as the transposition rule for i=l,...,n, and hence the theorem will be proved. To begin the induction, we note that x-, and x 2 are the same as the transposition rule by Rivest's Theorem. Hence condition 1 is 18 initially satisfied. Note that condition 2 vacuously holds 1f k=0. Finally, any rule satisfying the condition of Ri vest's Theorem has oo r-.=l when c=0. Since r, = £ a -j e i' ^ c= 0> r i = ^ =d 0" Hence r i = l+0(e) and condition 3 is initially satisfied. The proof for stage k proceeds as follows: Let N(i,j) be the number of I such that an,- = j- This is just the number of requests that cause the key in position i to move to position j. The r. must satisfy their steady state equations. These give us the following bound r i £ N^' 1 ) * 7jfr r k k + 1 < i < n Here we have counted only those transitions from state k to state i and replaced the transition probability by a lower bound of -§y. Summing these inequalities gives n i i=k+l r w t - tr »= > LL" lki,|] iPT r k' n Set a = I N(k,i). Note that a is just the number of j's such that i=k+l a jk > k ' > a(^-) k+1 + 0(e k+2 ) by Condition 3 So Vl + --' + r n = A( n^T )k+1 + °( £k+2 ) for some A " a (1) From this we conclude 19 E xk+1 n / k+2< (k + l)r k+1 + ... + nr n nk + l)A(^) K+l + 0(en (2) r } + ... + r k = 1 - A^)** 1 + 0(e k+2 ). (3) We know that since (k+l)q k+1 + ... + n q n = (k+l)^)^ 1 + 0(e k+2 ) (4) q k+1 = ljtf) M and ^ < 0( e k+1 ) for i > k + 1 Subtracting (4) from (2) gives (k+l)r k+1 + ... +nr n - (k+l)q k+1 ... - n. q R > (k+l)(A-l)(^§ T ) k+1 + 0(e k+2 ) (5) Since 'l * — + t >k =1 "Vl " ••• - q n we have ^ + ... +q k = 1 - (^rr)^ 1 +0( e k+2 ) (6) 1 e i - 1 Now let c = — y y-r • We have q. = c q., i < k and from property 3, r. = c 1 " 1 r, 1 < k (7) Hence q, + q 2 + . . . + q k = q] (l+c+c 2 +. . .+c k_1 ) = 1 - (^) k+1 + 0(e k+2 ) 1 - (^) k+1 + 0(e k+2 ) q 1 = Qd - 1 CT (8) 1 1 + c + z L + ... 
+ c K ' Similarly 2 k-l r, + ... + r k = r^l+c+c +.. .+c ) using (3), = l - A Cri§T )k+1 + °( £k+2 )' 20 so r l = 1 " A(^) k+1 * 0(c k+2 ) 1 + c + c + ... + c FT (9) Now k-l q, + 2q 2 + ... + kq u = q., + 2cq 1 + . . . + kc q ■k ^1 = q 1 (l+2c+...kc k_1 ) Substituting for q, from (8) i - (H§r) k+1 + o(^ k+2 ) k-l l+2c+...+kc l+c+...+c k " Similarly k-l r-, + 2r 9 + . . . + kq u = q n + 2cq, + . . . + kc q l+2c+...+kc k-l H - A (n§T> k+1 + °( £k+2 ^ Subtracting (11) from (10) gives 1+C+...+C ET (10) (11) q ] + 2q 2 + ... + kq k - r ] - 2r 2 - ... - kr k = 21 [(A-lMpiy) 1 * 1 + 0(c k+2 )] < C(A-D( T5 § T ) k+1 + 0(e k+2 )] l+2c+...+kc k-1 1+C+...+C k+kc+...+kc k-T k-1 1+C+...+C R-T < WA-Dt^y)^ 1 + 0(e k+2 ) Finally, subtracting (12) from (5) gives (12) r, + 2r 9 +..-.+ nr- q, - 2q n H l .. - nq, < (A-l)(^) k+1 + 0(e k+2 ) (13) But the left hand side of (13) is just the cost of the optimal rule minus the cost of the transposition rule. If A > 1, then the trans- position has lower cost than the optimum rule, a contradiction. Hence A < 1 and therefore a < 1. Recall that a is the number of a., f k. Since t, + , is the same as the transposition rule by Condition 1, we have a, + , . = k+1 and hence a > 1. Hence a must equal 1. Thus all other a.. < k (else, a > 1). If a.. < k, Condition 2 will be violated since this value has already appeared in the permutation. Hence a.. = k : proving Condition 2 for k. This determines the equation for r. ,. If k=l, r l " < ] - 7PT>'1 + n-e)r 2 V (1-eHii-l) r 1 = CT +0 ^ 22 since r^ = l+0(e). If k > 1 r k-RVl + ^f r k + 0-«' Vi Solving this for r k+1 and substituting (iDlliLlSl) ^ for r^ (Condition 3) gives r k+1 = ( ( n _^( -| _ £ ) )r k and since r k = (^§y) k + k+1 0(e ) (again, Condition 3) r„ Al = (A) k+1 - ( c k+2 ) k+1 v n-l In either case, Condition 3 is proved. All that remains is to prove Condition 1. From Condition 2, we know a k+2 • = i for i < k - 1. We now know a k+2 i. ■ k, and hence x k+2 is the same as the transposition rule, completing the induction for Condition 1 and proving the theorem. I I A final question is how far these rules can possibly be from the optimum. This is answered for the move to the front rule by Rivest [2] and Burville and Kingman [7], If we assume p, > p 2 > ... > p are the key request probabilities, then n 1-1 P,-P I I. ILL MTF Cost . 1+2 i=1 j=l p i +p j ^ l+2x n 1=1 n Opt Cost " * 1+x n -i n _-| where x = J p . ( j-1 ) £ 2(1 - — py) since x < — . j=l Therefore, the move to the front rule never does more than twice the 23 work of the optimal ordering. The theorem also holds for the trans- position rule as its cost is less than or equal to that of the move to front rule. This may be a significant savings, as remarked in the introduction. In summary, the situation in the asymptotic case is quite clear: the transposition rule has asymptotic cost less than or equal to that of the move to front rule. Both rules compare quite favorably with the optimal cost. For the distributions we considered, the transposition rule was within 10 percent of the optimum and the move to front ranged from 25 percent to 38 percent. Finally, the cost of these rules is at most twice the optimal cost for any probability distribution, 2.2 Rate of Convergence In the previous section, we only considered asymptotic behavior and found the move to the front rule inferior to the trans- position rule. In this section we will consider how quickly the rules approach their asymptotes. 
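For reference, the following is a minimal sketch, in Python, of the two update rules being compared in this section. The function names and the list representation are our own choices; the analysis does not depend on any particular implementation. A "request" scans the list from the front, and the cost is the number of keys examined.

```python
# Minimal sketch (not from the original text) of the two list update rules.

def access_move_to_front(lst, key):
    """Find key, move it to the front, return the search cost."""
    i = lst.index(key)          # i + 1 comparisons were needed
    lst.insert(0, lst.pop(i))   # move the requested key to position 0
    return i + 1

def access_transpose(lst, key):
    """Find key, swap it with its predecessor, return the search cost."""
    i = lst.index(key)
    if i > 0:
        lst[i - 1], lst[i] = lst[i], lst[i - 1]   # move up one position
    return i + 1

if __name__ == "__main__":
    a = ["A", "B", "C", "D"]
    b = ["A", "B", "C", "D"]
    print(access_move_to_front(a, "D"), a)   # 4 ['D', 'A', 'B', 'C']
    print(access_transpose(b, "D"), b)       # 4 ['A', 'B', 'D', 'C']
```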
We will find that the move to the front rule approaches its asymptote more quickly, and initially has a lower expected cost than the transposition rule. The reason for this is clear. In the initial random ordering, many high probability elements are far down in the list. These must be brought to the front to reduce the cost. Obviously, the move to the front rule will do a better job here since these keys make large jumps and quickly rise to the top. The transposition rule allows keys to move only one step at a time, so the convergence should be rather slow. 24 When key k. is requested, it moves up one position, decreasing the cost by p. since we can locate k. with one less compare, and increasing the cost by p. -■ , since key k. , (the key above k.) moves down one position, resulting in a net decrease of p . - p - _ -. . If the p.'s are "close" in size, they are O(-), and this decrease is O(-), resulting in a very slow convergence. We would expect the move to the front rule to take Q(n) time to get very close to steady state, assuming ft(n) high probability keys. The transposition rule should require fi(n ) since each key must move fi(n) steps to get near the top. To begin the analysis, we determine the expected cost of the move to the front rule as a function of time. Theorem : Given keys k, ,kp,...,k having request probabilities p, ,p 2 ,...,p , the expected cost of accessing a list being modified by the move to front rule after t requests is 2 v P.- P.- y (P,--P.j) t 1+2 I JL_J_ + l 3 J M-d -d ) U1 1, and k. was not requested after time m. The probability for this is 25 I (l^.-p.)^ P1 = V (l-P rPi ) m p. m=l 1 J 1 m=0 1 J 1 ■ ( vpt» - (1 - p r»/ ^ Adding these gives P(k. ahead of k. at time t) = J i (1 - p r p j )t + 'p^pJ' " ( P7P7 ,(1 - p r p j )t P i P i" p i t = Vpt + Ttitt (1 - p r p / Then E(Cost) = 1 + I p. I P(k, ahead of k . ) i=l v j7i J ] y y p,P i P^P-P.) t 1 1=1 #1 P^Pj 2(p.+ Pj ) (1 p 1 p j } Y p i p i Y ( p i"Pi) = 1+9 L _L_J_ + Z 1_J fi_ D _ D } l 0. Using the transposition rule, k, is equally likely to start in any position and will move up one position at a time until 1t reaches the top. We will then have r t-n n , t < n - 2 Prob(k, is in position 1 at time t)=i 1, t > n - 1 v. Prob(k, is in position i^l at time t)= - if n - i < t n = otherwise For t < n - 2, the expected cost is: n-t i. SU £ i.(l).i ♦**%**- !♦!(¥» For t > n - 1 the expected cost is 1 since k, must have reached the top An interesting statistic to compute from these time varying costs is the overwork. This is defined as the area between the cost 27 curve and its asymptote. (See Figure 2.2.1) The overwork measures how quickly the cost converges to Its asymptote. Also, since the area under a cost curve measures the total cost, the overwork represents the total number of comparisons we do in addition to the asymptotic cost. The overwork can be determined by summing the time varying part of the equation for the cost. The overwork for the move to the front rule is then r r (Pi-Pi) 2 , t ,. (Pi-Pi) 2 £ I 2(p+p ) (Hyp/ 8 I , 3 %2 - t=0 l = "IP So we see that the move to the front rule does overwork ft(n) and the 2 transposition rule does fi(n ), and hence the move to the front rule approaches its asymptote more quickly. Also note that the move to the front rule converges in 1 request, but the transposition rule requires n-2 requests. We now consider a slightly more complicated case. Suppose there are n-1 elements of probability — y, and one element (k, ) of probability zero. 
This is not equivalent to the previous case in which k, was moved (unless it was at the top) after each request. Now k, may or may not move depending on which of the n-1 elements with nonzero probability is accessed. The overwork for the move to the front rule in this case is -s- . This can be obtained by substituting the p. in the overwork formula. In order to determine the overwork in the case of the trans- position rule, we calculate P(k,t), the probability k, is in position k at time t. Notice that k, will move down only when the key directly under it is accessed and that this occurs with probability — y . We then k have for k < n: P(k,t) = 4i P^ob(k 1 initially in position i)»Prob (k 1 moves down k-i positions in t time steps). 30 k 1 t 1 k_1 n 2 t-k+1 - 1 J 1 n ' ( k-i } fjCT J ( n=T J - 1 t£f** % c?) n - iiil = ( ? +1 > n - i=1 ticostj 2(n-l) n-T^ 2(n-l) fPI n . 1 #n-2v r J /tx ,n-j\ 31 This gives us the expected cost as a function of time. The overwork 1s then: X hiftt <£f> X id? ( j' (n 2 J) = ^X^ (n 2 J) X i ^ ) § By use of a Taylor Expansion, we can verify i °° i (l-x) k+1 1=0 k n 2 -l Using this substitution, we can show the overwork equals — g- ; which also is the same as our earlier model. Again, the move to the front rule does fi(n) overwork, and the transposition rule does ft(n ) overwork. For this case, it is also possible to obtain simple bounds on the residual cost , i.e. the difference between the cost and the asymptotic cost. By substituting the p. into the equation for the cost of the move to the front rule, we get C0ST MTF = ^ + i(^4") • The residual cost is then i(~?) ~ \ e" t/ ^ n " 2 ^for large n. For the transposition rule, note that if t < n - 2, all the terms of a binomial expansion are present in the time-varying cost, and the residual cost equals j-X^ t 2 -t(2n 2 -4n+3)+n(n-l) 3 s t 2 -2n 2 t+n 4 2n(n " 1) (n-1) 2 2n 4 for large n. 32 If t > n-2 we can add terms with n-2 < j < t to complete the binomial expansion and obtain this result as an upper bound on the convergence. This bound, however, gets progressively poorer as t becomes larger since we must add more and more terms. In fact, the bound goes to infinity as t -*■ °°. These two bounds illustrate the difference rates of convergence. Initially (when t £ n-2) the move to the front rule converges exponen- tially, and the transposition rule converges quadratically, so the move to the front rule converges considerably more quickly. To give an idea of the magnitude of these bounds, for t = n-2, Residual Cost MT p ~ .1839 o 1 n /n 9 1 1 ^ and at t = n Residual Cost MTF = ^-(e) ' = j{^) . On the other hand, 1 1 2 Residual Cost TO z -~ at t = n-2 and Residual Cost TO < *r for t ~ n . TR 2 TR = 2n In general, the transposition rule will converge exponentially, much more slowly than the move to front rule. The convergence of the cost, which is c,A, +...+C A (see appendix), is mainly determined by the size of the eigenvalues with largest modulus. These are much larger in the case of the transposition rule. As a comparison, for Zipf's Law with 3 elements, the eigenvalues which have nonzero c. are (l-p.-p.) for the move to the front rule (.545, .273, .182). For the transposition rule, these can be numerically calculated as: .710, .576, -.344, .175, -.117. Indeed, the "major" eigenvalues of the transposition rule are larger, and slower convergence will result. The overwork has been numerically calculated for more compli- cated distributions. 
We have already determined a simple form for the 33 overwork in the move to the front rule and can just put 1n the particular distribution. For the transposition rule, there 1s no known simple form. We can closely approximate the overwork by letting x Q = (' n T»---»7T) De the initial distribution over the states of the Markov chain. Then x~ P 1s the distribution after t requests. From this, we can calculate the expected cost at any t. The asymptotic cost (Acost) can be determined directly from the steady state probabilities, or approximated by the cost of XqP for large t. The overwork is then - i t 1 I [cost(x n P )-Acost] s I [cost(x n P )-Acost], i=0 u i=0 u for sufficiently large t. This is the quantity we calculate. The over- work for several distributions is shown in Table 2.2.2. By analyzing the differences between successive values of the overwork in the case of Zipf's Law, we can conclude that the trans- 3 position rule does ft(n ) overwork while the move to front rule does only o n(n ). Thus, for a more complicated distribution the transposition rule does much more overwork. In fact, assuming Zipf's Law, we can derive an exact form for the move to front rule overwork and prove 1t is ft(n ) and thus the bound f n l n ~') is of the right order. Theorem : Assume that the key request probabilities satisfy Zipf's Law. Then the overwork for the move to front rule with a list of n elements is 34 Table 2.2.2 The Overwork for Various Distributions OVERWORK FOR MOVE TO FRONT RULE ENGLISH LETTERS 52.7469 ENGLISH WORDS 122.3576 OVERWORK FOR GEOMETRIC DISTRIBUTION WITH N ELEMENTS N MOVE TO FRONT RULE 3 0.291 1 4 0.8291 5 1.7564 6 3.1250 7 4.9612 8 7.2860 9 1C. 1011 10 13.4123 11 17.2216 12 21.5299 13 26.3377 14 3 1.6452 15 37.4527 16 43.7600 17 50.5674 18 57.8747 19 65.6820 20 73.9893 OVERWORK FOR ZIPF'S LAW WITH N ELEMENTS N MOVE TC FRONT RULE TRANSPOSITION RULE 3 0.2006 0.4579 4 0.4463 1.6503 5 0.7978 3.9793 6 1.2576 7.7514 7 1.8272 13.3005 8 2.5076 9 3.2994 10 4.2031 11 5.2189 12 C.3473 13 7.5882 14 8.9420 15 10.4087 16 1 1.9884 17 13.6812 18 1 5.4871 19 17.4063 20 19.4387 35 5rT ■(^W^/ 11 ^^-^ where Hi 2 ' - I \. n i=i r Asymptotically, this is (| - ln2)n 2 ~ .057n 2 . Proof: Substituting p. = -Jr- Into the overwork formula gives 1 lH n (J- x> 2 i ™„ JH„' . .. .,2 | I _2 L_.J j UxiL U1). (3) To determine the asymptotic behavior, note that H ~ in n, so H 2 - H ~ ln2n - In n = ln2, so the second term is asymptotically 2 2 n Tn2. The third term is 0(log n) and is dominated by the n terms. 2n t (2^ (2) = y — Finally, we need to approximate Hi ' - H* ' \m-\ ^ ' Since the summand is a decreasing function, we can bound it using the following relation: b+1 b f(x)dx * I f(D * f(a) + i=a f(x)dx Substituting a = n+1 , b = 2n and f(x) = -*- gives x 2n+l 2n T * I _ TF * . ,v2 n+1 : 2 i=n+l i 2 ' (n+1)' 2n n+1 dx T x (2n+U(«M) * , J n+ 1 7 r\ c + 6n - 1 2n(n+l)' 1 _2 Since both the upper and lower bounds equal j- + 0(n ), we have 38 2n 1 1 -2 I —k = «- + 0(n ) and the fourth term 1n (4) approaches i=n+l r 1 2 2 n . Hence, the asymptotic value for (4) 1s 1^ n 2 - n 2 ln2 + | n 2 = (| - ln2)n 2 s .057n 2 rn We can get a graphic idea of the difference 1n convergence from Figure 2.6.1 and Table 2.6.1 in Section 2.6. These show the cost of accessing a list ordered by the two rules as a function of time and compare them with the frequency count rule (see Section 2.6) which is optimal. 
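The exact calculation above requires the full chain over all n! orderings. As a rough cross-check, the cost curve and the overwork can also be estimated by straightforward simulation; the sketch below (function names and parameters are our own choices, and the asymptote is estimated crudely from the tail of the run) does this for both rules under Zipf's Law.

```python
# Rough Monte Carlo estimate of the time-varying cost and the overwork.
# This is an approximation by simulation, not the exact Markov-chain
# computation described in the text.
import random

def zipf(n):
    h = sum(1.0 / i for i in range(1, n + 1))
    return [1.0 / (i * h) for i in range(1, n + 1)]

def avg_cost_curve(n, rule, horizon=200, trials=2000, seed=1):
    """Average search cost at each time step, from a random initial order."""
    p = zipf(n)
    rng = random.Random(seed)
    totals = [0.0] * horizon
    for _ in range(trials):
        lst = list(range(n))
        rng.shuffle(lst)                       # random initial ordering
        for t in range(horizon):
            key = rng.choices(range(n), weights=p)[0]
            i = lst.index(key)
            totals[t] += i + 1
            if rule == "mtf":
                lst.insert(0, lst.pop(i))      # move to front
            elif i > 0:                        # transposition
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
    return [s / trials for s in totals]

def overwork(curve, tail=20):
    asym = sum(curve[-tail:]) / tail           # crude asymptote estimate
    return sum(c - asym for c in curve)

if __name__ == "__main__":
    for rule in ("mtf", "transpose"):
        curve = avg_cost_curve(8, rule)
        print(rule, round(curve[0], 3), round(curve[-1], 3),
              round(overwork(curve), 3))
```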
From graphs like these, it is interesting to calculate the smallest number of requests for which it is better to use the transposition rule (see Table 2.2.3). Note that the value we are really interested in is not the point where the two cost curves cross, but the point where the integrals of the two curves cross. This is because we want the rule that does the least total work.

The slope of the cost crossover in Table 2.2.3 increases, so it is superlinear and may be about Ω(n log n). The integral crossover appears to be Ω(n²). We can also get an estimate of the integral crossover as follows. If we assume all the overwork has been done by time t, the integral crossover time, then the cost integral for the move to front rule is t times the asymptotic cost (AS_MTF) plus the overwork (OV_MTF), and similarly for the transposition rule. Since we are at the point where these integrals cross,

    t · AS_MTF + OV_MTF = t · AS_TR + OV_TR

    t = (OV_TR - OV_MTF) / (AS_MTF - AS_TR).

Table 2.2.3 Cost Crossover and Integral Crossover Times

    n     Cost Crossover    Integral Crossover
    3          3                   6
    4          5                  10
    5          7                  14
    6         10                  20
    7         13                  27
    10        22                  50
    20        75                 212

Points where the cost and the integral of the cost for the transposition rule become less than those of the move to front rule, for an n-element list with Zipf's Law as the probability distribution.

Earlier in this section, we found OV_TR = Ω(n³) and OV_MTF = Ω(n²). Since the asymptotic costs are bounded within twice the optimal cost (which is about n/ln n), AS_MTF - AS_TR = O(n/ln n), and hence we get t = Ω(n² ln n), which is slightly larger than shown in Table 2.2.3.

In summary, though the transposition rule has lower asymptotic cost than the move to front rule, it converges to that cost much more slowly, and, in fact, for Zipf's Law it will require Ω(n²) key requests before it becomes more economical to use the transposition rule.

2.3 Other Permutation Rules

We previously defined the idea of a permutation rule, where a permutation τ_i is performed on the list when the key in location i is requested. So far, we have only considered two such rules: the move to front rule and the transposition rule. There are a total of (n!)^n possible such rules, but most will just senselessly jumble the list, resulting in no decrease in cost.

Let us think intuitively about what a "sensible" rule must be like. We will see that a sensible rule should move the requested key up in the list by a certain amount (which may depend on the location of the requested key). This is the only good way to use the information that this key, having been requested, should have higher probability. Any permutation not of this form can be viewed as performing first a sensible permutation and then a permutation that leaves the requested element alone. This second permutation will only increase the disorder of the list, since no additional information has been given on these keys, and permuting them will work against the order we are trying to create.

We consider the following sort of sensible rule, which moves the requested key k positions ahead for some fixed k. Another type of rule that should behave similarly is one where the requested key is moved some fixed fraction of the distance to the top.
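As an illustration, a move ahead k rule might be implemented as in the following sketch (ours, with hypothetical names); k = 1 gives the transposition rule, and k >= n-1 gives the move to front rule. The "fixed fraction" variant mentioned above would compute the new position from a fraction of the current position i rather than from a fixed k.

```python
# Sketch of the "move ahead k" family: the requested key is moved k
# positions toward the front, but never past position 0.

def access_move_ahead_k(lst, key, k):
    """Find key, move it k positions closer to the front, return the cost."""
    i = lst.index(key)
    j = max(0, i - k)               # new position, clamped at the front
    lst.insert(j, lst.pop(i))
    return i + 1

if __name__ == "__main__":
    lst = list("ABCDEFG")
    access_move_ahead_k(lst, "F", 3)
    print("".join(lst))             # ABFCDEG  (F moved up three places)
```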
It can be seen from Figure 2.3.1 (due to Rivest [2]) and Table 2.3.1 that as the distance the requested key moves is increased, the asymptotic cost increases and the rules converge more quickly, forming a spectrum of rules ranging from the move ahead 1 (transposition) rule at one end to the move ahead n-1 (move to front) rule at the other.

[Figure 2.3.1 Asymptotic comparison of the move ahead k rules. This figure, due to Rivest [2], compares the cost of the different move ahead k rules (A_i refers to the move ahead i rule) for a list of seven elements whose probabilities are given by Zipf's Law.]

Table 2.3.1 Comparison of the Convergence of the Move Ahead k Rules

    k    Lowest Cost    Lowest Total Cost
    5       - 3             - 5
    4         4            6 - 8
    3       5 - 7          9 - 13
    2       8 - 15        14 - 38
    1      16 - ∞         39 - ∞

The results of a simulation using a list of 6 elements whose probabilities are given by Zipf's Law show the time interval for which each move ahead k rule has lowest cost and lowest total cost (the total cost is the cost summed over all previous requests).

2.4 A Hybrid Rule

We can get a rule that is superior to any of those we have considered so far by relaxing the restraint that the rule cannot vary with respect to time. A hybrid rule can be envisioned that moves keys to the front for some initial period of time, then switches and begins transposing. Such a rule will enjoy the advantages of both rules. Initially, it will move keys to the front and will therefore converge quite rapidly. Asymptotically, it will behave like the transposition rule and therefore will have a low asymptotic cost.

The question is when we should switch rules. To help answer this question, a simulation was run using Zipf's Law for the key request probabilities. Each trial of the simulation used the move to front rule until the expected decrease from using the transposition rule became larger than that of the move to front rule. The number of requests required for this to occur is an approximation to the correct time to switch. These times were then averaged over all trials to give the results shown in Table 2.4.1. The results of this simulation indicate .268n + .980 as the best time to switch rules. This time, of course, depends on the request probabilities, but we would not expect it to vary too much for different distributions. Furthermore, the choice is not too critical. Since the transposition rule converges so slowly, little is lost if we use the move to front rule for too long. We need only make sure that our choice is large enough to have the move to front rule be close to its asymptote. We would then switch after .5n requests, to make sure we had used the move to front rule long enough to significantly reduce the cost.

Another method would be to estimate our position on the cost curve by counting the number of compares we require and averaging over a period of time. Once this estimate stops decreasing, we suspect that we are in the flat part of the cost curve, and we switch to the transposition rule. This method has the overhead of counting the number of comparisons; in addition, we must be careful not to average over too short a period, or we may switch too soon.

This rule is best employed when we expect an intermediate number of requests. If few (O(n²)) requests are expected, then the move to front rule is used. A great number suggests the transposition rule.
An intermediate number means that both of the good features of the hybrid (fast convergence and low cost) will be valuable and the overhead incurred by using this rule will be worthwhile. 45 Table 2.4.1 Best Times to Switch Rules n Average Switch Time 3 1.90 4 2.20 5 2.29 6 2.52 7 2.84 8 2.94 9 3.35 10 3.55 20 6.608 30 8.924 Simulation showing the average best time to switch from the move to front rule to the transposition rule. The probability distri- bution is Zipf's Law over n elements. 46 2.5 The First Request Rule The first request rule 1s defined as follows: the first time a key is requested, it is moved up 1n the 11st until 1t comes to the top or a previously requested key. After that, it 1s not moved. Note that the keys occur in the 11st 1n order of their first request. After all keys have been requested, the ordering obtained is the same as if the keys had not been known a priori, and the list had been built by inserting a "new" key (one that had been requested for the first time) at the end of the list. The following theorem characterizes the performance of this rule. Theorem : Given any initial list, the probability of obtaining a given final list after any number of requests is the same for the move to front and first request rules. Proof : Consider any sequence of requests r, ...r. as inputs to the move to front rule, and the reverse sequence r....r, as inputs to the first request rule. Note that these two sequences have the same probability. Suppose that both rules start with the same list. We now show that these two sequences produce the same ordering. Consider any two keys k. and kj. If neither is requested, both rules will leave the initial order unchanged, and k. and k. will be ordered the same in the two final lists. If only one (say k. ) is requested, then both rules will have k. ahead of k. in the final list. If both are requested (say k. is requested after k. in the sequence r, ...r.), then k. will be ahead 47 of k. in the move to front list. Since k. is requested before k. in the sequence r. . ..r, , k. will also be ahead of k. in the first request list, hence the orderings 1n the final list will again be the same. In any case, k. is ahead of k. in one list if and only if it is ahead of k. in the other. Hence the two lists must have the same ordering. J Now consider any list. For each sequence of requests that will produce this list using one rule, there exists a sequence of equal probability that will produce this same list using the other rule. Hence the probability for either rule to produce this list must be equal. ~\ This theorem is easily extended to hold for a probability distribution over initial lists since the two rules will behave identically for each initial list. Also, it implies that the cost of the first request rule at any time will equal the cost of the move to front rule. Therefore all the previous results concerning the move to front rule apply to the first request rule. Suppose the keys were not known a priori and the list was constructed by inserting a "new" key at the end of the list. Clearly, the asymptotic distribution will be that of the first request rule. This theorem tells us that if the initial list was constructed in this manner, using the move to the front rule will not decrease the cost (since the Markov chain will be in steady state). The first request rule differs from the move to front rule in two important ways. First, since each key is moved only once, it is 48 cheaper to execute than the move to front rule. 
Second, since the list converges to a specific ordering (which may have very high cost), the variance of the cost is much higher than that of the move to front rule. The first request rule can be modeled by a Markov chain with (n+l)n! states. For each of the n! orderings of the list, the chain can be in n+1 different states, depending on whether 0,1,..., or n different keys have been requested. Unlike previous chains, this chain is reducible (see appendix). Once we reach a state in which all n keys have been requested, we are "trapped" and cannot leave this state, On the other hand, an irreducible chain cannot get trapped and must divide its time among all states that have nonzero steady state probability. In fact, the ergodic theorem tells us that if a state has steady state probability p, the chain will spend a fraction of its time equal to p in this state. We are now in a position to talk about the variance of the costs of these two rules. If we let c. be a random variable equal to the cost of the state the chain is in at time i, E(c), VAR(c.) and c,+c 2 +...+c. E( * ) are the same for both rules. However the variance of the c,+...+c cost averaged over some time period [VAR(— -)] is much greater for the first request rule. The fact that the move to front rule is lim c-1+...+c irreducible implies VAR( -) = (see appendix). However, for large n, c = c , using the first request rule (since the chain has reached a final state), and -~ c . Therefore the variance 49 of the average cost is VAFUc ) > 0. (An expression for this variance can be found in McCabe [10].) We can use the first request rule to form a hybrid rule with the transposition rule as follows: When a key is first requested, we use the first request rule and move the key up until a previously requested key is encountered. When the key is subsequently requested, we use the transposition rule to promote it in the list. The performance of this hybrid is better then the move to front/transpose hybrid. The only requests handled by the transposition rule are second and subsequent requests. Hence the initial list that the transposition part of the hybrid "sees" is a list ordered by the first request rule, which, as we have seen, is the move to front rule (or first request rule) steady state. Hence the transposition rule "starts" from the move to front rule steady state. This is an improve- ment over using the move to front rule initially since then the steady state is never reached. In addition, this hybrid will reduce the cost more quickly than the first request rule because it does a cost re- ducing transposition on second and subsequent requests of a key, while the first request rule alone does nothing. This hybrid also has the desirable feature that no guesswork need be done as to when to switch rules. This choice is performed automatically by the algorithm. 2.6 Frequency Count Rule Perhaps the most natural way to cause high frequency keys to move to higher positions in the data structure would be to keep count 50 of how many times each key has been requested. If we assume the request probabilities are constant with respect to time and then keep the keys sorted according to their frequency counts, high probability keys will move to the top. The primary advantage of this rule is that it has a lower access time than the other rules we have considered. In fact, its performance is optimal. In addition, frequency information is available for analysis, which may be desirable, and the changes required to execute the rule are quite simple. 
The primary disadvantage is that count fields must be kept, requiring extra storage. These points are now considered in greater detail. We first discuss the performance of this rule. The following theorem shows that it is asymptotically optimal. Theorem : As the number of requests, t ■* °°, a list ordered by the frequency count rule approaches the optimal ordering. Proof : If two keys k. and k. have probabilities p. and p. with p. > p., the probability that k. is ahead of k. after t requests approaches one n as t -> oo. Since E(Cost) = T p. (1 + T Prob(k. is ahead of k. )) we 1=1 ] j7i J 1 1 im I I have t ^ oo E(Cost) = I ip i which is the optimal cost. [_| Also, if we have no a priori reason to suspect k. is more probable than k., this rule is optimal at any time. J 51 Theorem : If we have no a priori knowledge of the probability distribu- tion, the frequency count rule provides the optimal ordering at any time. Proof : If we have no a priori knowledge, then all distributions of key requests must be considered equally likely, so if k. has occurred more times than k., Prob(p. > p.) > Prob(p. < p.), and an arrangement with k. ahead of k. will have a lower expected cost. Clearly, the arrangement with the lowest expected cost will be the one in which the keys are sorted by frequency count, and this, of course, is the arrangement given by the count rule. I — I A comparison with previous rules is given by Figure 2.6.1, which shows the results of a simulation done on a 15-element list using Zipf's Law. Table 2.6.1 shows a simulation for a list with 100 elements. These two simulations give us a good idea of the differing rates of convergence of the two previous rules and how they compare to the optimum, Initially, the move to the front cost decreases nearly as quickly as that of the count rule. This is intuitively reasonable: Initially, the count rule will move the requested item close to the top, so its behavior should be very close to the move to the front rule. On the other hand, the transposition rule's cost decreases very slowly, especially on the 100 element list. As mentioned before, the changes required by this rule after each request are small. Suppose k. is requested for the r time. The only change is to increase k. 's frequency count from r-1 to r and move 52 02.0 Simulation on a 15-element list using Zipf s Law "Time" is measured as the number of requests Figure 2.6.1 Comparison of Various Rules 53 Table 2.6.1 Another Comparison TIME VARYING COST FOR ZIPF'S LA* WITH 100 ELEMENTS TIME MOVE TO FRONT TRANSPOSIT ION FREQUENCY COUNT 50.1322 50.1322 50.1322 1 47.4562 50.0749 47.4556 2 45.4688 50.0307 45.4608 3 43.4281 49.9742 43.4257 4 41.8796 49.9329 41.8443 5 40.7099 49.8780 40.6515 6 39.7503 49.8341 39.6460 7 38.5905 » 49.7742 38.4920 8 37.8538 49.7196 37.6994 9 36.5972 49.6666 36.4395 10 36.1294 49.6216 35.8960 it ahead of all keys having frequency count r-1. We can easily determine to where k. should move in the following manner. During our search for k. we keep a pointer to the key furthest down in the list whose count is greater than the key we are currently examining. When we examine k. , this pointer will point to k.'s new location. Note that after many requests, the count fields will be widely separated, and these moves will rarely be required. The primary disadvantage of this rule is the additional storage required for the count fields. The storage required, however, can be reduced using very simple techniques. 
From the updating algorithm, we can see that actual count a key has is not important. What matters is the difference between successive counts, because this gives us all the information we need to keep the keys ordered with respect to count. If we store this difference instead of the full count, we will require 54 less storage, since the rate of growth of the difference fields is proportional to the difference in successive probabilities (which is small), while the count fields grow in proportion to the probabilities. Note that only a small amount of work is required to update the dif- ference fields since after a request, only once count field changes, and hence at most two difference fields must be updated. Thus, the count rule is a very attractive rule. Asymptotically it approaches the optimal ordering. At any time, it provides us with the list which has lowest cost, based on the requests we have seen so far. The work required to update the list is also very small. The primary disadvantage is the extra storage required. However, this disadvantage can be reduced by storing the differences between successive counts. 2.7 Limited Difference Rules We now consider a set of rules which limit the size of the difference fields in the frequency count rule. Once a difference field reaches this limit, additional requests of the more frequent key leave this field unchanged (requests to the other key, of course, decrease this field). If the maximum difference is zero, then the algorithm will move a key to the front when it is requested, and will perform exactly like the move to front rule. As the maximum difference is increased, the performance will improve, with the full count rule (no maximum dif- ference) as the limit. Therefore, performance approaches the optimum as the number of bits is increased. 55 To see how much the performance is effected by the number of bits, let us consider a list with only 2 elements, having probabilities a and b(=l-a). If the maximum difference is at most n, then the corresponding Markov chain has 2n+2 states: A. , < i < n where the key with probability a is first in the list and the difference is i. B. , * i < n where the key with probability b is first in the list with difference i. It is easy to verify that the steady state equations are: A„ = aA n , + aA n B n = bB„ , + bB n n n-1 n n n-1 n A-j = aA -j-i + bA i+i B i = bB i_i + aB i+r 2 ^ 1# ^ n- 1 A 1 = bA 2 + aA Q + aB Q B, = aB 2 + bB Q + bA Q A Q = bA 1 B Q = aB 1 n n and, in addition, I A. + I B. = 1. 1=0 1 i=0 n We solve this system of equations to get h n_1 h n+1 . n-i . n+i A. = A n (f) B. = A n (|) 1 < 1 < n and A_ = n a[l - (j^ n+1 ] The cost of the list is (a+2b) Prob(key with probability a is first in list) + (b+2a) Prob(key with probability b is first) 56 = (a+2b) I A. + (b+2a) £ B, 1=0 n 1-0 1 2b(b-a)(£) n - (b-a) = (1+a) * K 2n+l Let us now suppose that b > a. Then the optimal cost is b+2a = b+a+a = 1+a, which is the first term in the cost expression. The difference from the optimum is then given by 2b(b-.)(|)" - (b-a) , TTnTI . n+1 S1nce a '■ [(£> - 1] (£) Hence we see that the "use" of adding one to the maximum difference decreases expontially with base — . This tells us that the performance a should be improved by the addition of just a few bits. However, the "flatness" of the distribution (determined by how close — is to one in this simple case) determines how many bits will be required. 
The flatter the distribution, the more bits will be required to correctly distinguish the more probable elements. Table 2.9.1 shows the results of a simulation run on larger bits. Even using a small maximum difference provides nearly optimal results. The limited difference rule lets us use a limited amount of storage, while providing nearly optimal results. For a two element list, the cost of this rule approaches the optimum exponentially as we increase the maximum difference. 57 2.8 Wait c, Move and Clear Rules We now consider two classes of rules that use bit fields to store information about key requests. The first class uses the bit field as a counter, initially zero, that is incremented by one each time the key is accessed. Once the field exceeds to maximum value, the key is moved (using either the move to front or transposition rule) and the field of every key is reset to zero. The cost of performing this may be very significant. However, if all fields are stored in one area (instead of being directly associated with each key) we can set all fields to zero by zeroing a contiguous area of core, which may be done very efficiently. We will call these rules "wait c, move and clear" rules, where c is the maximum value of the field. A second class of rules (discussed in the next section) behaves in a similar fashion, except that when a key is moved, only its field is reset to zero. These rules will be called "wait n and move" rules. In analyzing these rules, we will find that using the count fields in the first manner will decrease the asymptotic cost more than the second method. However, the convergence of the first method will be much slower, since we will not move a key every request, and, if the maximum difference is very large, we will move keys only very rarely. We begin our analysis of the wait c, move and clear rules with the following theorem. 58 Theorem : Given key request probabilities p,,p 2 ,...p , the steady state probability of a given list using a wait c, move and clear rule is equal to the steady state probability of the list using the cor- responding permutation rule with modified key request probabilities P^c), P 2 (c),...,p n (c), where c c c c P,(c) = I ... I I ... I »r° a i-r° vr° v° (c + a 1 H-...ta i ._ 1 + a 1+1 ^...ta n )! c!a l ! -Vl ! Vl ! - s n ! n C+1 „ a i n a i'-l n a i+l /" Pi Pi •••P 1 _i P 1+ i ■••?„ • Proof : Consider the sequence of keys that have been moved by the wait c, move and clear rule. We have assumed that any two requests are in- dependent, and that the request probabilities are constant with respect to time. Because of these assumptions and the fact that we clear the counts after each move, the move sequence has the following properties: (1) Any two moves are independent. (2) The probability that the i move is a given key does not depend on i . 59 If we use the move sequence as inputs to a permutation rule, the resulting list will be the same as one obtained by inputting the original request sequence to the wait c, move and clear rule. We note that the properties of the move sequence are exactly those required for a request sequence, so the inputs to the permutation rule can be thought of as a sequence of requests. However, elements of this sequence are not chosen using the request probabilities, but using the probability that a key is moved. The probability that p. is moved is exactly the p. shown in the statement of this theorem. This formula is derived as follows: If k. was moved, we know that k. 
has been requested c+1 times, and that the last request (the one that caused k. to be moved) must have been for k. . Then for j^i, let k. be requested a. times (0 £ a. £ c) and sum over all possible J J J choices for the a.. This would complete the proof if every request to the wait c, move and clear rule caused a move. This is not the case since we must wait after each move while the counts build up. If this waiting time were dependent on the current state (as it will be for the wait c and move rules), states with longer waiting times would have proportionally greater probabilities. Fortunately, this is not the case. After each move, the counts are reset and hence each state will have the same expected waiting time. ^] This proof demonstrates the reason wait c, move and clear rules outperform permutation rules. In order to be moved, a low probability 60 key must be requested c+1 times before any other key 1s requested c+1 times. Hence these are less likely to be moved. On the other hand, high probability keys now have a proportionally greater chance. Notice, of course, that the probability that a key is requested remains the same; we are only being more selective about which key we move. Due to this correspondence between wait c, move and clear rules and permutation rules, many results from previous sections carry over. Specifically: Corollary : Let keys k, ,kp,...,k have request probabilities p,,p 2 ,...p and let p,(c) ,. . . ,p (c) be defined as in the previous theorem. Then (1) The asymptotic cost of the wait c, move to front and clear rule is 1 + I W P^CjPjU) (2) For the wait c, transpose and clear rule, the steady state probability of any given ordering (k, ...k ) is n n 'i n p.(c) i=l n N where N is a normalizing constant. (3) The wait c, transpose and clear rule has asymptotic cost less than or equal to that of the wait c, move to front and clear rule. 61 Proof : All result from replacing p^Cthe probability a key is moved by a permutation rule) by p. (c)(the probability for a wait c, move and clear rule). As in the case of the limited difference rule, the performance approaches the optimum as c -»• ». Theorem : As c -*• °°, the asymptotic costs of the wait c, move to front and clear rule and the wait c, transpose and clear rule approach the optimal cost. Proof : We first examine the wait c, move to front and clear rule. Consider the probability that k. is ahead of k. in the list. This will be the case if any only if k. was moved at the most recent time when either k. or k. was moved (i.e. k. was the most recently moved of k. and k.). Thus, the probability is Prob(k_- was moved k. or k. was moved). This equals the probability that k. was requested c+1 times before k. was requested c+1 times. By the law of large numbers, this approaches 1 if p. > p. and if p. < p.. Hence the expected cost which equals 1 + I p. Prob(k. ahead of k.) W J 1 approaches 1 + I PjO-U = I 1p r i i the optimal cost. 62 By (3) of the previous corollary, the wait c, transpose and clear rule has cost less than or equal to that of the wait c, move to front and clear rule, so 1t also approaches the optimum. So both the wait c, move and clear rules and the limited difference rule approach the optimum as the number of bits they use increases. The important question is: which converges more quickly? Table 2.9.1 shows the limited difference rule makes "better use" of its bits. This can also be demonstrated in the case of a list of two elements (A and B) having probabilities a and b (=l-a). 
Here the c • • probability that A is ahead of B equals I ( C V) a c+ V. Table 2.8.1 i=0 n shows this probability approaches one much more slowly than that of the limited difference rule. A major disadvantage of the wait c, move and clear rules is that they decrease the cost more slowly than the corresponding per- mutation rule with modified probabilities, since a counter must exceed c for a move to be done. The worst case occurs when every key is requested c times before any key is requested c+1 times. In this case, a move will be done every cn+1 requests. Thus, the convergence can be slowed by a factor fi(n). On the other hand, the best case occurs when the same key is requested c+1 times. Here, a move will be made every c+1 requests and the convergence must be slowed by at least this constant multiple. 63 Table 2.8.1 Probability A 1s ahead of B for a=.6 c LIMITED BIT «ULE 0*60000 1 0*66316 2 0*74218 3 0*81039 4 0*66446 5 0*90511 6 0*93457 7 0*95536 8 0*96977 9 0*97963 10 0*98632 11 0*99084 12 0*99387 13 0*99591 14 0*99727 15 0*99818 16 0.99878 17 0.99919 18 0.99946 19 0.99964 20 0. 99576 21 0.99984 22 0.99589 23 0.99593 24 0.99995 25 0.99997 26 0.99598 27 0.99999 28 0.95599 29 0.59999 30 1.00000 31 1.00000 32 1 .00000 33 1.00000 34 1.00000 35 1.00000 36 1 .00000 37 l.COOOO 38 1.00000 39 1.00000 40 1.00000 41 1 .00000 42 1.00000 43 1.00000 44 1.00000 45 1.00000 46 l.COCOO 47 1.00000 48 1 .00000 49 l.COOOO 50 1*00000 WAIT C RULE •60000 ,64800 •68256 •71021 •73343 •75350 .77116 ,78690 ,80106 .81391 ,82562 0, ,83636 ,84623 ,85535 0, ,86379 ,87162 0, ,87890 0. ,88569 0, ,89202 0, ,89794 0, ,90348 0. ,90868 0, ,91355 0, .91812 0< ► 92242 0, ,92647 0, ,9 30 28 0, ,93387 0, ,937 25 0, ,94045 0. ,94346 0, ,94631 0. ,94900 0, ,95154 0, ,95395 0. ,95623 0. ,95838 0, ,96042 0. ,96236 0, ,96419 o, 96593 0. ,96757 0, ,96914 o. ,97062 0« 97203 0, ,97336 o. 97463 0« 97584 0, ,97698 0. 97607 0* 97910 64 To get an idea of the average decrease 1n convergence, we consider n equally likely keys and c=l. Note that this 1s the least favorable key distribution. We now determine the expected number of requests before a key is requested for a second time. This is n I Prob(no key has been requested twice after i requests). 1*0 This probability equals the number of sequences of length i of distinct keys ( / "'• \ , ) divided by the total number of sequences (n ) ? n! 1 "i^O^^ 7 Replacing i by n-i gives -n e n nl I V~ =n!n- n [ ^ < n!n i=0 n ' i=0 '' Stirling's approximation gives 3 (n n e" n SZm) n" n e n = SZ™. Therefore, for c=l , the expected slowdown is ft(/n), for this unfavorable key distribution. The wait c, move and clear rules have an interesting cor- respondence with the permutation rules. They perform better than per- mutation rules because they are more selective about which keys are moved. However, the performance is not as good as the limited difference rule. 65 These rules have the further disadvantage of converging more slowly than permutation rules. For a list of n elements, the convergence is slowed by a factor between c+1 and nc+1 times. If c=l, the average slowdown is ~/27rn for the uniform distribution. 2.9 Wait c and Move Rules We now turn our attention to the wait c and move rules, and first consider the wait c and move to front rule. Theorem : Given key probabilities p,,p 2 ,...,p , the asymptotic cost of the wait c and move to front rule is 1 + I Pt x.., where Pa c P-: k c mxb P-; m J1 ( Pi + Pi )(c-H) 2 k=0 h + Pj m=0 m V p j (the probability k. is ahead of k. in the list). 
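For concreteness, a wait c, move and clear rule could be implemented as in the following sketch (ours, with hypothetical names); the transposition variant differs only in how the triggered move is performed.

```python
# Sketch of a "wait c, move and clear" rule: each key has a counter, a
# request increments the requested key's counter, and once that counter
# exceeds c the key is moved and *all* counters are cleared.

def access_wait_c_clear(lst, counts, key, c, transpose=False):
    """Returns the search cost; mutates lst and counts."""
    i = lst.index(key)
    counts[key] += 1
    if counts[key] > c:                      # (c+1)-st request since the last clear
        if transpose:
            if i > 0:
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
        else:
            lst.insert(0, lst.pop(i))        # move to front
        for k in counts:                     # clear every counter
            counts[k] = 0
    return i + 1

if __name__ == "__main__":
    lst = ["A", "B", "C"]
    counts = {k: 0 for k in lst}
    for req in ["C", "B", "C"]:              # with c = 1, the second C triggers a move
        access_wait_c_clear(lst, counts, req, c=1)
    print(lst, counts)                       # ['C', 'A', 'B'] {'A': 0, 'B': 0, 'C': 0}
```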
Proof : Recall that the expected cost is 1 + I p. Prob(k. ahead of k.) W J and therefore we must determine this probability. Consider any two keys, A and B, having probabilities a and b. Note that the relative ordering of A and B will not be effected when another key is moved. Also, their counts will remain the same since they are not cleared. Therefore, in determining Prob(A ahead of B), we can ignore all other keys and requests 66 to all other keys; we need only consider a 11st consisting of A and B, having probabilities -nr and -r of being requested. (For simplicity, we rename these probabilities "a" and "b"). 2 This list can be modeled by a Markov chain with 2 (c+1) states, A., and B. . for * i, j <; c. State A,, corresponds to the list with A (having count i) ahead of B (having count j). State B. . corresponds to B (having count j) ahead of A. Note that the first sub- script is always A's count. Before solving for the stationary distribution, we must first make sure it will give us Prob(A ahead of B). There are two possible troubles. First, as with the wait c, move and clear rule, we must wait in each state of the two element chain while keys other than A and B are being requested. However, since key requests do not depend on whether A is ahead of B, or the count of either key, the requests are independent of the state and hence the expected waiting time is the same for each state. Second, the chain is periodic with period c+1. If we let r. and r„ be the number of times A and B have been requested, we have i = r. mod(c+l) and j = r„ mod(c+l). Therefore i+j = (r. + r g ) mod(c+l). Since each transition increases r. + r D by one, if we start at A., (or A D I J B..), it will always take a multiple of (c+1) transition to return. Hence the chain has period c+1. A chain which is periodic does not converge to its steady state distribution in the sense that .™ p t^ x 0' x ^ = ^ x ^» where P t (x Q ,x) 67 is the probability of going from an initial state x Q to state x in t transitions, and p(x) is the steady state probability of state x. However, for an irreducible chain (which this one is), the ergodic 1 , im 1 theorem holds (see appendix). This states that ._ >oo j I P+^ x o» x ^ = p(x). Hence the "time average" of the probability approaches the steady state distribution. If C(t) is the expected cost at time t, we are guaranteed ™ T J C(i) = T p(x)c(x) where c(x) is the cost of z l i=0 x state x. The cost converges to the asymptotic cost in this sense. Note that the asymptotic cost is still the stationary probability of a state times its cost summed over all states, only the strength of con- vergence has been changed. We now proceed to determine the stationary probability. The steady state equations are: A ir a Vi,j + b Vj-i B ij =aB 1-l,j +bB i,j-l for0<1,j*c A 0j = bA 0>j . 1 ♦ aA c . + aB cj Bq. = bB . 1 for < j <; c A i0 " aA i-l,0 B i0 = aB 1-l.0 + bA ic + bB 1c forO<1«i l' (c+l)'(l-y) *~ by ' 00 ., 00 = - L -T ( Iy 1 )( I (ax+by) J ) (c+ir 1=0 j=0 Using the binomial theorem gives j-k k 7 ( Iy 1 )(.I I ( J k )(ax) (by) ) (c+l) fc 1=0 j=0 k=0 k ' = -^"T I I i ( J k )a^ k b y- k y i+k (c+1) 1=0 j=0 k=0 K Now substitute i' for j-k and j' for i+k and then drop the primes OO 00 J ■ - J -7 I I I ( 1 k k )a 1 b k xV (c+lr 1=0 j=0 k=0 K Therefore A.. = a * \ ( i t k )a i b k 1J (c+ir k=0 k 70 c c Prob(A ahead of B) is then I I A . . 
i=0 j=0 1J = - S -7 I I I C 1 t k )a 1 b k (c+lT 1=0 j=0 k=0 K = - J - T I I ( 1 t k )a i b k I 1 (c+1) k=0 i=0 K j=k a r / mukS ,1+k%_ 1 7 I (c-k+l)b* [ (': R )a (c+ir k=0 1-0 k Recalling that a and b were originally -Jt- and — nr and substituting into the cost formula finishes the proof. Another interesting fact about this rule is that for some distributions we can prove that it does not approach the optimum as we increase the number of bits. Theorem : Given a distribution of key request probabilities, if p. < p. < 2p. for some i and j, the wait c and move to front rule will not approach the optimum as c -► ». Proof : We show that Prob(k. ahead of k.) does not approach 1 as c * «, hence the cost is bounded away from the optimum. p. p. For convenience, let a = — | — and b = — J— - . From the p i +p j p i +p j preceeding theorem, Prob (k. ahead of k.) = J k r ,i+kx i -i-7 I (c-k+l)b K I (': K )a (c+1) k=0 i=0 K c . °° ... <- JL - T I (c-k+l)b k I C*)*' 1 (c+iy k=0 i=0 K 71 a ? ,_ ..-.xuk 7 I (c-k+l)b R ^ (c+ir k=0 (1-a)' Since 1-a = b, = — *—* I (c-k+1) b(c+l) k=0 = -^ T [(c + l) 2 -^-] b(c+ir a c+2 b 2c+2 p . , . which approaches It- = -s— as c ■> ». since p. < 2p., „ n Prob(k, ahead „ 2b 2p . r i r j' c-*>° v l "l ^ I — I k.) = j^~ K 1 anc ^ tne cost cannot approach the optimum as c ■+ ». I I J Indeed, it is reasonable to expect the theorem to hold for all distributions except the uniform and the distribution with a key of probability one. However, this conjecture has not yet been proved. It is interesting to determine why this method decreases the cost over the move to front rule. The wait c, move and clear rule achieved a decrease by altering the probability that a key is moved from the request probabilities to a more favorable distribution. However, the wait c and move rule does not do this. Since a key is moved after st eyery (c+1) request for it, the move probabilities remain unchanged in the sense that a key requested with probability p. will account for a fraction, equal to p., of the total number of moves. Consider any two keys, k. and k.. If we assume that moves occur at intervals which are independent of whether or not k. is ahead of p i k., then k. will be ahead ' of the time and the performance will be vJ 72 the same as the move to front rule. However, this 1s not the case. After k. has been moved (assume p, > p.), its count 1s set to zero. Asymptotically, k.'s count is uniformly distributed over {0,1, ...,c}. After k. has been moved, its count is zero, and k.'s count ranges from zero to c. Clearly, after k. has been moved, the next move will occur sooner; the roles of k. and k. have been interchanged, and after k. has been moved, the count of k. (the more probable key) 1s closer to causing a move. Therefore, the probability we find k. ahead of k. is increased because we must wait longer for the next move when it is ahead of k.. w Finally, we notice that this rule will have much faster con- vergence than the wait c, move and clear rule since on the average, it will move a key after ewery c+1 requests. The performance of this rule is compared with previous rules in Table 2.9.1. Having analyzed these rules, we can see that they are asympto- tically inferior to both the wait c, move and clear rules and the limited difference rule. The convergence is faster than the wait c, move and clear rules. It is at most c+1 times slower than the cor- responding permutation rule, while the wait c, move and clear rule may be as bad as nc+1 . 
A final interesting fact is that for some probability distributions, it can be proved that this rule does not approach the optimum as $c \to \infty$. We conjecture this to hold for any probability distribution except the uniform and the distribution with a key of probability one.

Table 2.9.1
Comparison of Rules that use Counters

  Rule                                        c=0     c=1     c=2     c=3     c=4     c=5
  Limited Difference Rule                    3.9739  3.4162  3.3026  3.2545  3.2288  3.2113
  Wait c, Move to Front and Clear (Exact)    3.9739  3.6230  3.4668  3.3811  3.3285
  Wait c, Transpose and Clear (Exact)        3.4646  3.3399  3.2929  3.2670  3.2501
  Wait c and Move to Front (Exact)           3.9739  3.8996  3.8591  3.8338  3.8165  3.8040
  Wait c and Transpose                       3.4646  3.3824  3.3576  3.3473  3.3312  3.3272

Asymptotic costs for various rules assuming a nine element list whose probabilities are given by Zipf's Law. Compare these with the optimal cost, which is 3.1814. Costs for the limited difference rule and the wait c and transpose rule were estimated by simulations consisting of 1000 requests. The average of 200 trials is shown.

2.10 Time Varying Distributions

In this section, we consider probability distributions that vary with respect to time. We first examine two examples concerning the move to front rule: one where the probability of a key decreases after it has been requested, and another where it increases.

The first example supposes we have n keys, $k_1, k_2, \ldots, k_n$. Assume the requests made to this list form a sequence of permutations of these n keys. The permutations are independently chosen, with each of the n! permutations being equally likely. A model that satisfies this constraint is a company that sends out bills each month; its customers then pay their bills in a random order. Assuming this model, we can prove that the move to back rule is the optimal rule. The proof is as follows: after t requests out of a permutation have been made, each of the remaining n-t requests is equally likely, and the best we can do is to have these n-t keys (and none of the t previously requested keys) in the first n-t positions of the list. Since each of these keys is equally likely to be requested, the ordering of the unrequested keys will make no difference. The move to back rule clearly achieves this and therefore must be optimal. Any other rule will occasionally move the requested key to one of the first n-t positions, resulting in a higher cost.

To derive the average cost for the move to back rule to retrieve all n keys of a permutation, we note that to retrieve the i-th key, we search through an unordered list of n-i+1 keys. The average cost is then

$$E(\mathrm{Cost}_{MTB}) \;=\; \frac{1}{n}\sum_{i=1}^{n} \frac{(n-i+1)+1}{2} \;=\; \frac{n+3}{4}.$$

If no rule is applied to the list, each key will be accessed exactly once, giving a cost of

$$E(\mathrm{Cost}_{RAND}) \;=\; \frac{1}{n}\sum_{i=1}^{n} i \;=\; \frac{n+1}{2}.$$

Finally, if the move to front rule is used, accessing the list at time i will first require i-1 comparisons with the previously requested keys. Then we search through an unordered list of n-i+1 keys. The cost is then

$$E(\mathrm{Cost}_{MTF}) \;=\; \frac{1}{n}\sum_{i=1}^{n} \left[ (i-1) + \frac{(n-i+1)+1}{2} \right] \;=\; \frac{3n+1}{4}.$$

This cost is roughly three times larger than that of the move to back rule, and 50 percent larger than doing no moves at all. The reason is obvious: once a key has been requested, its probability of being requested again decreases. In this case, our strategy must be to move requested keys back in the list. Using the move to back rule, the keys will appear in the list in the order that they were requested.
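As a quick check on the three averages just derived, the following sketch (an illustration, not part of the original derivation) plays one random permutation of requests against a list of n keys under each policy and compares the observed averages with (n+3)/4, (n+1)/2, and (3n+1)/4; the list size and trial count are arbitrary choices.

    # Simulate one full permutation of requests under three policies and
    # compare the average cost per request with the formulas in the text.
    import random

    def permutation_costs(n=10, trials=20_000):
        totals = {"move to back": 0.0, "no moves": 0.0, "move to front": 0.0}
        for _ in range(trials):
            perm = random.sample(range(n), n)        # one month's requests
            for rule in totals:
                lst = list(range(n))                 # initial list order
                cost = 0
                for key in perm:
                    i = lst.index(key)
                    cost += i + 1
                    if rule == "move to back":
                        lst.append(lst.pop(i))
                    elif rule == "move to front":
                        lst.insert(0, lst.pop(i))
                totals[rule] += cost / n             # average cost per request
        return {rule: t / trials for rule, t in totals.items()}

    print(permutation_costs())
    n = 10
    print("predicted:", (n + 3) / 4, (n + 1) / 2, (3 * n + 1) / 4)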
If our clients have regular habits and pay their bills at about the same time each month, the access time of the move to back rule will decrease further, and that of the move to front rule will increase. We now consider a second example. Suppose that with probabil- ity p, the requested key is the same as the previously requested key. 76 With probability 1-p some distribution (p, ,p 2> . . . ,p ) over the keys 1s used. The move to front rule would seem to be the logical choice here (if p is not extremely small) since the first key in the list will have a good chance of being requested again. To analyze this rule, we note that the probability of a given ordering is not effected by p, but depends only on the p.. We can view the chain as waiting in each state until a "normal" request is made. During this wait, only requests to the first key are made, and these do not change the order of the list. In addition, the wait time is the same for all states. Therefore, with probability p, the first key is found in one comparison. With probability 1-p, the p,,p 2 ,...,p distribution is p.p. used. The cost here is just 1+2 £ F~+d ' t ' le norma ^ move t0 Ui Figure 3.1.2 87 where y is in either of the shaded subtrees. Note that y cannot be in the left most subtree since then z would be between x and y. Case (2): z > x. The transformation is: ^ Figure 3.1 .3 where, again y must be in either shaded subtree. In either case, x is still an ancestor of y. Since z is no longer a descendant of x, any further rotations will leave x as an ancestor of y, and therefore Observation 2 must be true. Observation 3 : If neither x nor y is the ancestor of the other and a key that is not between x and y is requested, then neither x nor y will become the ancestor of the other. Proof : If neither x nor y is the ancestor of the other, there exists some w that is between x and y, and an ancestor of both. Since z is not between x and y, it cannot be between x and w and it cannot be between y and w. By Observation 2, w will still be an ancestor of both x and y in the resulting tree and hence neither x nor y will become the ancestor of the other. [~| w We now give a lemma that characterizes exactly when one node will be an ancestor of another, based on the sequence of requests and the initial tree. Lemma 1 : Node x will be an ancestor of node y using the move to root rule if and only if: (1) Neither x, nor y, nor any key between them in ordering on the keys has been requested, and x was an ancestor of y in the initial tree. OR (2) Neither y nor any key between x and y has been requested after the most recent request for x. Proof : ("if" part) (1) => Lemma follows from Observation 2. (2) => Lemma follows from Observation 2 and the fact that when x is requested, it becomes the root of the tree and hence is an ancestor of every other node. ("only if" part) Case 1 (x has not been requested) Suppose that x has not been requested, and it is an ancestor of y. We will show that this must imply (1). From the observations it is clear that the only way x can become an ancestor of y (if it is not already) is for x to be requested, Since x was never requested, it must have originally been an ancestor of y. Then, from Observation 2, no key between x 89 and y can have been requested since then x would no longer be an ancestor of y. Similarly, y cannot have been requested. Therefore (1) holds. Case 2 (x has been requested) Here we show (2) must hold. 
Consider the situation after the most recent request for x: x is an ancestor of y, and x will not be requested again. This is the same situation as in Case 1 and by using its proof, we can show (2) must hold. Q We also prove the following lemma about the first request rule. Lemma 2 : Node x will be an ancestor of node y using the first request rule if and only if: (1) Neither x nor y nor any node between them has been requested and x was an ancestor of y in the original tree. OR (2) Neither y nor any node between x and y was requested before the first request for x. Proof : Case 1 (x has not been requested.) First note that the three observations still hold if x has not been requested. Then, as we noted before, once the requested node (z) is no longer a descendant of x, further rota- tions involving z do net effect the tree rooted at x. Hence the v, two rules "look the same" to an unrequested x because the only differences occur after z is no longer a descendant of x. Therefore the proof for Lemma 1 is valid and Case 1 is proved. Case 2 (x has been requested) To see what happens when x is first requested, consider the previously requested keys and label them k,,kp,...,k so that k, < k 2 <...< k . They occur in a group at the top of the tree and divide the unrequested nodes into n+1 different sub- trees. The leftmost of these subtrees contains all keys less than k, , and the rightmost contains all greater than k . Each 1 n of the remaining consist of all keys between two "adjacent" k. . (See Figure 3.1.4) Thus, two unrequested nodes, x and y, are in the same subtree if and only if no key between them has been requested. When x is first requested, it moves to the root of the subtree it is in and becomes an ancestor of all nodes in that subtree. Therefore x becomes the ancestor of y if and only if neither y nor any key between x and y has been requested, x will then remain the ancestor of y since no node can move up past x and out of its subtree, proving Case 2. I — I We can now prove the main theorem. Theorem : Given any initial tree, the probability of obtaining a given final tree after any number of requests is the same for the move to root rule and the first request rule. 91 Keys k,,k 2 ,k 3 and k. have been requested. S, contains all keys less than k, . S 5 contains all keys greater than k.. S. contains all keys between k. •, and k. for 1 = 2,3,4. Figure 3.1.4 How the requested keys divide the tree. 92 Proof : Consider any sequence of requests r,,r^,...r. as inputs to the move to root rule and the reversed sequence r^ . , r*. _ -. , . . . ,r, as inputs to the first request rule. Note that these two sequences have the same probability. Trivially, the conditions of Lemma 1 hold if and only if the conditions to Lemma 2 hold. This means that x is an ancestor of y in one tree if and only if it is an ancestor of y in the other. Since this information allows us to uniquely construct a tree, the two trees are the same and the theorem is proved. Note also that the theorem also holds if we are given a probability distribition over the initial trees since the two rules perform identically on each tree. As is the case with linked lists, the first request rule creates the same tree as if the keys were not known a priori, and each "new" key (one requested for the first time) was inserted into the tree. If the initial tree was created in this manner, the move to root rule will not decrease the cost. 
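For reference, the move to root rule itself is easy to state in code. The sketch below is a minimal illustration, not code from the thesis: the Node class, the recursive formulation, and the assumption that the requested key is present are all choices made for the example. The requested key is located and then promoted to the root by a single rotation at each node on the search path; applying this after every successful search gives the move to root rule.

    # Minimal sketch of the move to root rule on a binary search tree.
    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def move_to_root(root, key):
        """Search for key (assumed present) and rotate it up to the root."""
        if root is None or root.key == key:
            return root
        if key < root.key:
            root.left = move_to_root(root.left, key)   # key is now at root.left
            x = root.left
            root.left, x.right = x.right, root         # single rotation about root
        else:
            root.right = move_to_root(root.right, key)
            x = root.right
            root.right, x.left = x.left, root          # single rotation about root
        return x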
The characterization given in Lemma 1 allows us to determine the time varying and steady state costs for the move to root rule. As stated in the theorem, these will equal the cost of the first request rule. Theorem : If key k. has probability p. of being requested and the keys are ordered k, < k < ...< k , the cost for the move to root rule after I c n t requests is: 93 PiP-i 2p.p. t UT which bears a striking resemblance to the asymptotic l<1 A. > B. B will move down one level if either of its sons is l-l i requested or the son of A. that is not A. +1 is requested. Thus, we can see that the movements of B are controlled by much more than just its probability. If B is far from the root, it may be difficult for B to 97 1.00 0.75 ■• 0.50 -- 0.25 -- 0.00 0.00 1.00 The curve in this figure shows those a and b where the cost of the move to root rule equals that of the move up one rule. The move to root rule has lower cost in the region to the right of the curve (53% of the total area) and the move up one rule has lower cost in the region to the left. Figure 3.5.1 y< move up in the tree. On the other hand, the move to root rule promotes nodes all the way to the root of the tree, so high probability nodes cannot spend a lot of time "trapped" far from the root. We derived a closed form for the cost of the move to root rule as a function of time which bore a striking resemblance to that of the move to front rule. The move to root rule was also shown to be identical to the first request rule. A simulation estimating the cost of the move up one rule suggested it was often inferior to the move to root rule (see Table 3.1.1). Both rules performed well and provided reasonable decreases over the cost of a random tree. The move to root rule averaged within 38 percent of the optimum, while the move up one rule was within 45 percent. These average costs suggest that the move to root rule would be the better choice. 3.2 Monotonic Trees Another method for getting more frequently accessed nodes high in the tree is to keep a frequency count associated with each node. The node with the largest count becomes the root of the tree, and each subtree is formed recursively, using the same rule. Such a tree is called monotonic because the frequency count for any given node is greater than or equal to that of any of its descendants. (This property is the same as the one required for a heap, see Williams [17].) It is a simple matter to keep the tree ordered in this manner. Rotations are used to promote the requested key until a key with equal or greater count is encountered. The resulting tree will have the monotonic property. 99 Asymptotically, the most probable key will become the root of the tree (by the Law of Large Numbers, it will be requested the most times), and each subtree will have its most probable node as its root. The asymptotic tree will be monotonic, with probabilities as weights. This allows us to easily calculate the asymptotic cost for this method. Table 3.2.1 show it averages within 15 percent of the optimum. However, this method is very poor for some distributions. Suppose key k. has probability p. and that the lexicographic ordering of the keys is k, < k <...< k 1 ^ n* Table 3.2.1 The Performance of Monotonic Trees ZIPF'S LAW English Letters #1 #2 #3 #4 #5 Average Random Cost 5.15 7.26 7.50 7.27 7.33 7.63 7.40 Optimal Cost 3.32 4.10 3.93 4.16 4.06 3.96 4.04 Monotonic Tree Cost (Exact) 3.77 4.91 4.18 5.32 4.68 4.20 4.66 Increase Over Optimal 13.6°/ ]Q 1°L £ Z°/ 07 no/ See Table 3.1.1 for explanation. 
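The maintenance step described earlier in this section, promoting the requested key by rotations until a key with an equal or greater count is encountered, can be sketched as follows. The Node fields, the recursive formulation, and the behavior for absent keys are illustrative assumptions, not details fixed by the text.

    # Sketch of monotonic-tree maintenance: bump the requested key's counter
    # and rotate it upward while its count exceeds its parent's count.
    class Node:
        def __init__(self, key, count=0, left=None, right=None):
            self.key, self.count = key, count
            self.left, self.right = left, right

    def request(root, key):
        """Find key, increment its count, and restore the monotonic property."""
        if root is None:
            return None                     # key not present; nothing to do
        if key == root.key:
            root.count += 1
            return root
        if key < root.key:
            root.left = request(root.left, key)
            child = root.left
        else:
            root.right = request(root.right, key)
            child = root.right
        if child is not None and child.count > root.count:
            # promote the child one level with a single rotation
            if child is root.left:
                root.left, child.right = child.right, root
            else:
                root.right, child.left = child.left, root
            return child
        return root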
TOO If the p. are approximately equal and p, > p ? >...> p then the skewed tree shown below will result. Figure 3.2.1 A worst case monotonic tree. A theorem by Mel home [13] shows how bad this can be. Theorem (Melhorne [13]): The ratio between the cost of a monotonic tree and the optimal tree may be as high as n/(4 log n) for trees with n nodes. This theorem depends on a \/ery unfavorable choice for the ordering of the keys and only gives an idea of the worst case per- formance of monotonic trees. We now consider how these trees perform on the average by assuming the probabilities are randomly chosen in some way. The first method we consider chooses the probabilities from a given set of n probabilities. The second chooses the probabilities from some given probability density function. We now investigate the first method. Theorem (Knuth [3, p. 432]): Given n keys and n probabilities (p.. ^ p 2 * ... > p ), if each of the nl assignments of probabilities to keys is n n equally likely, the expected cost of a monotonic tree is [2 £ H-p.] - 1. i=l q n 101 Proof: An assignment of probabilities to the keys imposes an ordering on the probabilities. Probability p. is to the left of p. if the key to which p. has been assigned is to the left of the key to which p. has been assigned. We have assumed that each of the n! orderings is equally likely. The cost of a monotonic tree is solely determined by this ordering imposed on the probabilities. Hence, the problem is equivalent to assigning p. to key ^ and then randomly ordering the k. since each of the n! orderings on the probabilities will still be equally likely. This restatement turns out to be simpler, and we work with it instead. Let £. be a random variable denoting the level of k.. By definition, C0St = I p.£. i = l ■• ] E(Cost)= E( I p£) = I p E ( £ .) i=l 1 1 i=i i i' So Define R. -( ^ 1fk J 1s an dncestor of k i otherwise T ^n £. =R 1+ R 2+ ... + Ri%i + 1 (R . = if j,i) E(^) = E(R 7 ) + E (R 2 ) + ...+ E (R N1 ) + 1 i-1 = J Prob(k, is an ancestor of k.) + 1 j=l J l' To determine this probability, we discuss some properties of a random ordering. Consider any two keys, k. and k.. There are only two 102 distinct orderings of k. and k.; k. to the left of k, and k. to the right of k., each having probability 1/2. For either of these two orderings, a third key can be in three different regions: to the left of both keys, between the two keys, and to the right of both keys, each with probability 1/3. In general, any ordering of i keys creates i+1 regions, each having probability 1/i+l of containing a given key. Consider any ordering of k, ,...,k,_ , and k.. Now k. will be an ancestor of k. if no key with probability greater than k. (that is, k 1 ,k«»...»k. -.) occurs between k. and k. . For this to happen, k. must occur in either the region to the left of k. or the region to the right. 2 Since there are j keys in the ordering, this probability is -rrr. 1-1 ? Hence E{i.) = ( I -nr) + 1 1 J-l J ' = 2H. - 1 l n and hence E(Cost) = I p.(2H.-l) i=l 1 n The following theorem tells us the cost of a tree built by a random sequence of insertions. Theorem : Given n keys (k, < k 2 <...< k ) and a set of n probabilities {p. : 1 £ i £ n}, if the probabilities are randomly assigned to the keys and then a tree is built by a random sequence of insertions, its expected cost will be 2 ^ n ' H - 3 for any set of probabilities. 103 Proof : Let p(k.) be random variables denoting the probability chosen for k. and let I. denote the level of k. . As before, n Cost = I p(k.U. 
i=l 1 n n n E(Cost) = E( I p(k.U.) = I E(p(k.H.) i=l n 1 i=l ] n The insertion sequence (and hence I.) does not depend on p(k.)- These two random variables are independent and E(Cost) = I E(p(k.)) EU_.) = i I E(£.) 1=1 1 n n i=l n E(£.) = 1 + 1 + £ Prob (k. is an ancestor of k, ) Now k. will be an ancestor of k. if and only if it occurs in the insertion sequence before k. and any key between k. and k.. This probability is -|jTn+T. Therefore "V-'^n^r i-1 1 n , ■ 1 ♦ [H r l] ♦ [H n . i+1 -1] 104 ? n - I H, - 1 2 y n-i + 1 1 2 ? n+1 2 n -lusiiiH -3 n n n I I This quantity is the same as that derived by Hibbard [18]. However, he assumed that the keys were equally probable, and our result holds for any set of probabilities as long as they are randomly assigned to the keys. To compare the monotonic and random tree costs, note that if we substitute p. = — into the formula for the monotonic tree cost, l n we get exactly the expression for the random tree cost. = -[(n+l)?i- ? 1] - 1 =^+U.H -3. Clearly, n -j^i 1 i = i n n p. = — is the worst case since p n > p >...> p„ and the coefficients of r i n I c n the p. increase with i. Hence, except for the case where p. = — , the monotonic tree is better than the random tree. If some of the p. are large, the savings can be quite substantial. 105 To demonstrate this, we first consider a set of probabilities satisfying the geometric distribution, p. = -5-, r < 1, 1 * i < n, where n+1 1 R r= — i s a normalizing constant. Substituting this into the formula for the cost of a monotonic tree, we get n r\ , . 2 ? i I 1 1=1 n K K i = l j=l J , 2 ? 1 r j -r n+1 n t n Wi=j V,J 1-' 2 r ? r J n+1 ? ■ [ !V--r" T| T4-] - 1 r . r n+l ^J j^J If n is large, this is approximately 2 1 — ln(y- -) - 1, a constant independent of n, If the probabilities satisfy Zipf's Law, the cost is n 1 ? n H. 1=1 n n i=l " H n 2 (H n " H n ) " ] where fo\ n i °° i 2 H (2) = 1 1 < y 1 .* 1=1 i 2 1=1 i 2 6 106 Thus, for large n, the cost is approximately H , which is half of the cost of the random tree. For both distributions the monotonic tree gives significant gains over the random tree. We now consider a method of selecting the key probabilities that has been studied by Nievergelt and Wong [19]. Here we are given a probability density, f(x), and the key probabilities are chosen with respect to that density. It is necessary to drop the requirement that our choices must sum to one, so instead of probabilities, we must con- n sider key weights . The cost of a tree is now £ w.£. where w. is the i=l 1 ] ] weight of k. . We now need two standard definitions Define : E(f) = xf(x)dx, the mean of f(x) Define : F (x) = f(y)dy, the distribution function of f(x) J -oo F (x) is the probability a number chosen according to the density func- tion is less than or equal to x. The following theorem defines the cost of a monotonic tree for an arbitrary density function. Theorem : Given n keys (k, < k„ <...< k ) and a density function f(x), if the weights of the keys are independently chosen from this density function, the expected cost of the resulting monotonic tree is 107 n-1 yf(y)F (y) 1 ' dy 2E(f)[nH -("!)]- 2 I ?^- n-l ^ i=1 i j — oo Proof : As before, we have n n E(Cost) = I E(w,(l+ I A..)) = nE(f) + J 7 E(w.A..) 1=1 1 #1 J1 1-1 j?i J where w- is the weight chosen for k^ , 1 if k. is an ancestor of k. and A.. =< j l J1 1 if not Note that w. and A., are not independent. ' ECw.A^.) = y ProbfwjAj. = y)dy A., can just equal or 1 , and if A.. = the only y having nonzero probability is y = 0. 
Since this will be multiplied by y = 0, the case with A., can be ignored and oo r E'(tr f Aj 1 ) = y Prob(w i = y and A^. = l)dy To determine Probfw. = y and A.. = 1) we note that k. will be an ancestor of k. if and only if w. > w. and w. is greater than the weight of any key between k. and k. in the ordering on the keys. The probability that w. = y is f(y)dy. We then chose an x ^ y for w. . Any specific x is chosen with probability f(x)dx. For this x, we must chose the |i-j|-l=m keys between k. and k. to have weight less than or equal to x. The probability for this isF(x) m . The product of these must be integrated over x ^ y, giving Prob(w. = y and A. . = 1 ) = rr: f(y) f(x)F(x) m dxdy Then v m+l ■ f(y) ^Itf . since ^- - f(x) and F(») - 1 m+T dx E(w iAji ) D yf(y) [ ] ' F ^( ] dy. Note that this quantity depends only on m, and not the values of i and j Since there are 2(n-m-l) distinct ordered (i,j) pairs having a given value of m, n-2 r , r-/..xm+l I m=0 n " 2 f l-FM m+1 E(Cost) = nE(f) + I 2(n-m-l) yf (y) [-^j^p ] dy = n E (f , + z n f #i ' m=0 yf(y)dy - 2^ 5^1 f yf (y )F(y) m+1 dy m=0 J n-1 n-m 2E(f)[nH i - l^-l)] - 2 I lOL- yf(y)F(y) m dy m=l Nievergelt and Wong give us two measures to which we can compare this cost. Theorem (Nievergelt and Wong [19]): The cost of the optimal tree whose weights are chosen according to f(x) is E(f) n log n + 0(n). 109 Theorem (Nievergelt and Wong [19]): The cost of a random tree whose weights are chosen according to f(x) is (2 £n 2) E(f) n log n + 0(n). Nievergelt and Wong also considered choosing the weights for a monotonic tree from a uniform distribution. The resulting cost was (2 In 2) E(f) n log n + 0(n), asymptotically equal to that of the random tree. They conjectured that this held for any probability distribution. We now show this conjecture to be true. I am grateful, to D. L. Burkholder for the proof of the following lemma: Lemma: For any density function f with a finite mean, n-1 i=l i yf(y)F 1 (y)dy = o(n log n) Proof : F irst note that for any i >, 1, yf(y)?Hy) yf(y) Hence Lebesgue's Dominated Convergence Theorem (see [21]) applies and we have lim i-H» lim r-i yf(y)F 1 (y)dy = j yf(y) j™ F^yjdy = since lim ,-i "|-M» F n (y) = if F(y) < 1 = and f(y) = if F(y) = 1 no Now, n-1 n-i 1=1 1 yf(y) F 1 (y)dy n-l -| N'. Therefore the limit of this ratio is zero as n ■> °° and the lemma is proved. I 1 Theorem: If n keys have their weights chosen according to any density function with finite mean, the expected cost of a monotonic tree is (2 In 2)E(f) n log n + 0(n), asymptotically equal to the cost of a tree built from a random insertion sequence. Proof: The cost of a monotonic tree is 2E(f)[nH , - (£-1)] - 2 n "l Iti fyf(y) F^y) dy n "' * i= i i J -co The first term is asymptotically equal to 2E(f) n In n = E(f) n log n. The final term was shown by the lemma to be o (n log n). Hence the asymptotic cost is (2 In 2) E(f) n log n. We now show that the cost of a monotonic tree is less than or equal to that of a random tree, proving that the cost of a monotonic tree equals (2 In 2)E(f) n log n + 0(n). 112 The method we are using to select key weights choses n weights independently from a density function. An equivalent method first selects a set of n weights from an n-dimensional density function. This function is constructed so that the probability of choosing a given set equals the probability of obtaining it (in any order) from n selections from the original function. We then choose a permutation of the set. Now consider any set. 
We have already studied the case where the key probabilities (easily generalized to include key weights) were selected from a set and found the expected cost of a monotonic tree to be less than or equal to that of a random tree. Since this holds for es/ery set in the n-dimensional probability density, the theorem is proved. I I Finally, we cite the results of a simulation run by Walker and Gottlieb [14] that showed the performance of monotonic trees to be poor. They state that although these poor results are partially explained by the fact that the leaf weights cannot influence the structure of the tree, even the tests with all leaf weights equal to zero did not produce acceptable nearly optimum trees. Indeed the majority of the results concerning monotonic trees are quite discouraging. This method performs well only when we are quaranteed that the key probabilities will differ significantly from a uniform distribution (i.e., have low entropy). If this is not the case (as in the situation described by Nievergelt and Wong), the performance is asymptotically the same as randomly built trees. 113 3.3 Cost Balanced Trees The previous methods have focused on the fact that a rotation moves a certain node up in the tree, ignoring the fact that it also dis- turbs two (possibly large) subtrees. The method of cost balancing considers the entire tree. We do a rotation only when it appears to be profitable, that is, when the number of accesses to the nodes that will move up exceeds the number to those that will move down. This method has the advantage that it is possible to do the rebalancing during the search for the requested key since we know in which subtree it lies. For example in Figure 3.3.1, we perform a rotation to promote A, if w(A) + w(S, ) > w(B) + w(C) + w(S 3 ) + w(S«). We promote C if w(C) + w(S 4 ) > w(B) + w(A) + w(S ] ) + w(S 2 ). Here, w(A) is the number of times A has been requested, and w(S->) is the number of times any node in S, has been requested. All this information is avail- able at node B, and any rebalancing can be done there. Figure 3.3.1 114 Another advantage of this rule is that the leaf weights play a rule in the balancing of the tree. Since a rotation that promotes the nodes in subtree S, also must promote the leaves, the "weight" of S, must be the number of accesses to both the nodes and the leaves of S,. If the weight of the leaves is considerable, this is a significant advantage over previous rules, all of which ignored accesses to leaves. However, balancing at one node may cause other nodes to become unbalanced. (See below). ^ Figure 3.3.2 Here, both node A and node C may require rebalancing. (However, no rebalancing would be required at node A if node C had been its left son). An attempt to correct these imbalances (and all the imbalances resulting from the corrections) could be quite costly. A more rea- sonable policy is to ignore the imbalances and rebalance at a later request when the search path passes through the unbalanced node. Tables 3.3.1 and 3.3.2 compare these two rules. The total rotation rule (which corrects all imbalances) has a slightly lower cost, Table 3.3.1 The Performance of the Limited Single Rotation (LSR) Rule 115 ZIPF'S LAW English Letters Random Cost 5.15 Optimal Cost 3.32 LSR Cost 3.44 Increase Over Optimum 3.55% Average Number of Rotations/Request .111 Average Over the Last 100 Requests #1 n #3 f 4 ? 
5 Average 7.26 7.50 7.27 7.33 7.63 7.40 4.10 3.93 4.16 4.06 3.96 4.04 4.33 4.14 4.46 4.28 4.20 4.28 5.46% 5.31% 7.29% 5.42% 5.98% 5.89% .199 .204 .199 .197 .200 .200 033 .041 .040 .039 .034 038 See Table 3.1.1 for explanation. Table 3.3.2 The Performance of the Total Single Rotation (TSR) Rule 116 ZIPF'S LAW English Letters 1 *2 13 «4 t 5 Average Random Cost 5.15 Optimal Cost 3.32 TSR Cost 3.41 Increase Over Optimum 2.93% Average Number of Rotations/Request .113 Average Over the Last 100 Rotations 7.26 7.50 7.27 7.33 7.63 7.40 4.10 3.93 4.16 4.06 3.96 4.04 4.33 4.11 4.41 4.22 4.17 4.25 5.57% 4.57% 6.02% 3.92% 5.27% 5.07% .220 .219 .217 .209 .213 .215 .040 .048 .044 .036 .036 .041 See Table 3.1.1 for explanation, 117 It gives an increase of 5.07% over the optimal cost (as compared with 5.89 percent for the limited rotation rule) with asuprisingly small increase in the number of rotations required (an average of .215 per request as compared with .200). However, there is much more overhead associated with a rotation in the total rotation rule. Since imbal- ances can propogate throughout the tree, either a pointer to a node's father must be maintained or we must stack the nodes encountered during the search for the requested key. These tables also show how much work the rules do after many requests. We consider the last 100 requests out of 500 in the simula- tion. The limited rotation rule does an average of .038 rotations per requested during this period, or approximately one rotation every 27 requests. The total rotation rule averages .041 rotations per request, or one rotation e\/ery 24 requests. A weakness of these rotation rules is that they do not con- sider the "inside" subtrees (the right subtree of a node's left son, or the left subtree of its right son, see Figure 3.3.3). Figure 3.3.3 The inside subtrees of node A are darkened. 118 A rotation can promote either exterior subtree, but the interior sub- trees remain at the same level. This can lead to very poor trees that are still "stable" in that no rotation can be performed. Figure 3.3.4 shows an example. This tree is stable as long as the weight of a node is less than or equal to that of his father. Figure 3.3.4 While the worst case performance on such a distribution is quite bad, a simulation suggests the average case is acceptable. The probability distribution was p, . ^ , p 2 . ^ P 25 = tItb ' 1 3 49 P 26 ~ 7275 ' p 27 " 7275 •" p 50 = 7275 ' The tree shown inFi 9ure 3.3.4 is stable for this probability distribution. Yet, after 500 requests the limited rotation rule reduced the cost to 4.7593, a mere 3.06 percent increase over the optimal cost of 4.6180. The limited single rotation rule has several desirable features. The necessary rotations can be performed during the search for the requested key. The performance for the distributions we con- sidered was good, within 5.89 percent of the optimal. After an initial 119 period that reorganizes the tree, the rule required a rotation approxi- mately every 27 requests, a very low maintenance cost. The total single rotation rule has little more to offer. It decreases the cost to within 5.07 percent of the optimum, and sur- prisingly does only slightly more rotations. However the overhead required by this rule to allow changes to propogate throughout the tree is not justified by the relatively small decrease in cost, making the limited rotation rule a better choice. 
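For concreteness, the balance test of Figure 3.3.1 can be written out as follows. This is a sketch under assumed bookkeeping: each node carries its own request count and the request total of its whole subtree, the field and function names are invented for the illustration, and accesses to leaves (which the discussion above notes should also be charged to the subtree weights) are not modeled. A search would increment the count of the requested node and the weight of every node on its path, and each node on the path can then be replaced by the result of rebalance, either during the descent or on the way back up.

    # Sketch of the single rotation balance test at a node B (cf. Figure 3.3.1).
    class Node:
        def __init__(self, key):
            self.key = key
            self.left = self.right = None
            self.count = 0      # requests for this key
            self.weight = 0     # requests for every key in this subtree

    def w(t):
        return t.weight if t is not None else 0

    def fix(t):
        t.weight = t.count + w(t.left) + w(t.right)
        return t

    def rebalance(b):
        """Return the new root of b's subtree after at most one rotation."""
        a, c = b.left, b.right
        # Promote the left son A if the accesses that would move up (A and its
        # left subtree) outnumber those that would move down (B and its right
        # subtree).
        if a is not None and a.count + w(a.left) > b.count + w(c):
            b.left, a.right = a.right, b
            fix(b)
            return fix(a)
        # Symmetric test for promoting the right son C.
        if c is not None and c.count + w(c.right) > b.count + w(a):
            b.right, c.left = c.left, b
            fix(b)
            return fix(c)
        return b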
3.4 Double Rotations A transformation (called a "double rotation") that allows the promotion of inside subtrees is shown below. -> Figure 3.4.1 A double rotation, 120 A rule that uses both single and double rotations has several ad- vantages over a rule that is limited to single rotations. First, such a rule will always be able to promote a requested node if it is profitable to do so. Single rotations can be used to promote nodes in the outside subtrees and double rotations for those in the inside sub- trees. A double rotation actually consists of two successive single rotations, each promoting node B one level. The double rotation has an advantage when doing both of these rotations will reduce the cost, but doing only the first will not. A rule that is restricted to performing single rotations will check if the first rotation can be done. Since it cannot be, the tree is left unchanged, and the second rotation is not considered. A rule which also considers double rotations will be able to reduce the cost in this situation. Table 3.4.1 shows the cost of a rule that uses both single and double rotations. For the distribution considered, this method averaged within 3.84 percent of the optimal cost. The total number of rotations required per request (counting both single and double rotations) is .208, which is very close to the averages for the limited rotation rule (.200) and the total rotation rule (.215). However, after many requests, fewer rotations are required than for either single rotation rule. In fact, the average over the last 100 requests was .027 single rotations (one e\/ery 36 requests) and .008 double rotations (one eyery 129 requests). Table 3.4.1 The Performance of the Double Rotation (DR) Rule 121 ZIPF'S LAW English Letters # 1 #2 §3 #4 # 5 Average Random Cost 5.15 Optimal Cost 3.32 DR Cost 3.40 Increase Over Optimum 2.35% Average Number of Single Rotations/ Request .074 Average Over Last 100 Requests Average Numbers of Double Rotations/ Request .037 Average Over Last 100 Requests 7.26 7.50 7.27 7.33 7.63 7.40 4.10 3.93 4.16 4.06 3.96 4.04 4.29 4.06 4.32 4.23 4.09 4.20 4.62% 3.45% 3.80% 4.11% 3.22% 3.84% .129 .129 .127 .126 .126 .127 .029 .027 .025 .028 .028 .027 .082 .081 .082 .079 .080 .081 .007 .007 .008 .007 .009 .008 See Table 3.1.1 for explanation. 122 These results are supported by a simulation run by Baer [20]. He assumed all keys to be equally likely and found the cost of the double rotation rule to range from 1.2 percent to 3.6 percent of the optimum. This cost is lower than that we obtained because more re- quests were made in Baer's simulation. In addition, his trees have fewer nodes. This would also explain the lower cost since the cost of smaller trees tends to be closer to the optimum (see Table 2.9.1, compare the cost for the English letters (26 nodes) with the others (100 nodes)). Baer also gives statistics on the number of rotations done by this rule. Again his results agree with ours. His most extensive simulation (85,000 requests) required 23 rotations for the first 850 requests (one every 37 requests). The next 7650 requests caused 6 rotations (one every 1,275 requests), and the final 76,500 also caused 6 requests (on every 12,750 requests). These results indicate that the • cost of "maintaining" the tree might be extremely low. Bruno and Coffman [12] have considered an extension of this rule that can promote a node any number of levels by using a sequence of rotations. 
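For reference, the double rotation of Figure 3.4.1 can be expressed as two successive single rotations. The sketch below is illustrative only; it assumes node objects with left and right fields, and shows the orientation in which the promoted node B is the right son of its parent A, with A the left son of the grandparent C; the mirror-image case is symmetric.

    # The double rotation of Figure 3.4.1 written as two single rotations;
    # the inside grandchild B rises two levels.
    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def rotate_left(a):               # promote a's right son one level
        b = a.right
        a.right, b.left = b.left, a
        return b

    def rotate_right(c):              # promote c's left son one level
        a = c.left
        c.left, a.right = a.right, c
        return a

    def double_rotation(c):
        """Promote the inside node c.left.right above both c.left and c."""
        c.left = rotate_left(c.left)  # first rotation: B moves above A
        return rotate_right(c)        # second rotation: B moves above C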
They, however, were concerned with an algorithm to build a nearly optimal tree from a set of known key probabilities and used this set of transformations to reduce the cost of the initial tree. Every final tree in their simulation was within 5 percent of the optimum, and the average was within 2.6 percent. This suggests further rules, where we consider promoting the requested node i levels for i = 1, 2, ..., k, where k is a parameter of 123 the rule. Note that the single rotation rules have k = 1, and the double rotation rule has k = 2. Increasing k will increase the work the rule must do, but will result in decreased retrieval times. The results of Bruno and Coffman suggest that the retrieval time will not be greatly improved by increasing k beyond 2, while the increase in the complexity of the algorithm to execute the rule would be substantial 3.5 Summary and Conclusion We have examined several methods for dynamically altering binary search trees to decrease their access time. The first two methods were analogs of the linked list case : the move to root rule and the move up one rule . The move to root rule was shown to be identical to the first request rule (analogous to the case of the linked list) and a formula for the cost was derived. Calculations showed this method to be an improvement over a tree built by a random sequence of insertions. The analogy breaks down when we consider the move up one rule; it is often outperformed by the move to root rule. A simulation showed the move to root rule to have lower average cost than the move up one rule (38 percent of the optimum compared with 45 percent), in- dicating that of these two rules, the move to root rule would be the superior choice. However, these rules should be used only if we cannot associate a counter with each key. If we can, the following rules will give better performance. We next considered rules that use counters. The first of these was the monotonic tree rule. Its performance was found to be 124 disappointing. Melhorne [13] has shown that the ratio of the cost of a monotonic tree to that of the optimal tree may be as high as n/(4 log n), for a tree of n nodes. If the weights of the nodes are chosen according to a probability density function (a case considered by Nievergelt and Wong [19]) the performance is asymptotically the same as a random tree for any probability distribution. A simulation by Walker and Gottlieb [14] also confirms the poor performance of this method. Only if we assume the probabilities are chosen from a fixed set (guaranteeing they will be "spread out") does this method signifi- cantly improve over the cost of a random tree. A formula was derived for this case, and significant decreases were obtained for Zipf's Law and the geometric distribution. This assumption was also true in the simulation we discussed. It showed the cost of this method to average within 15 percent of the optimal cost. We then discussed the most promising methods. Simulations showed that the cost of the limited single rotation rule averaged with- in 5.89 percent of the optimum. However, the worst case performance was \zery bad, resulting in the tree shown in Figure 3.3.4. The total single rotation rule reduced the average cost to 5.07 percent of the optimum. However, since this rule requires much more overhead, this small decrease in cost does not justify its use. Finally, we considered the double rotation rule . Its cost averaged approximately 3.84 percent of the optimum. 
Though this rule must check for both single and double rotations, it averages a rotation eyery 36 requests and a double rotation e\/ery 129 after the initial period of reorganization. Compared with the limited single rotation 125 rule (one rotation every 27 requests), the double rotation rule does less work after the initial period. It then appears to be the best choice of the counter rules. 126 4. CONCLUSION The purpose of this thesis was to examine various heuristics that dynamically alter data structures by moving frequently accessed keys near the "top" of the data structure. The first data structure we considered was the linked list. If relatively few requests (compared to the number of keys) are anticipated, the fast convergence of the move to front rule (nearly as fast as the optimum) makes it the best choice. If many requests are expected, the transposition rule gives the best performance because its asymptotic cost is close (10 percent) to the optimum. For an intermediate number of requests, the first request/transposition rule combines both of these features with a small additional overhead, making it the best choice in this case. If space is available for counters, there are much better rules. If enough space is available so that the counters will never overflow, the frequency count rule should be used. If this is not the case, the limited difference rule uses whatever space can be spared and gives nearly optimal results for even a small number of bits. The second data structure we considered was the binary search tree. Here we found the move to root rule to give the best performance of the rules that do not use counters, approximately 38 percent of the optimum. If counters are available, the double rotation rule appears to be best. Its performance averages 3.84 percent of the optimum, and after a period of initial organization of the tree, it is 127 very inexpensive to execute. On the average, a single rotation is done es/ery 36 requests, and a double rotation every 129 requests. The methods we have considered are simple and inexpensive to execute. In addition, they significantly reduce the average access time over data structures in which the keys are randomly arranged; in some cases these methods keep the structure very close to the one of optimum cost. 128 APPENDIX We will make great use of Markov chains, so a summary of their important properties is required. These can all be found in [1]. To define a Markov chain, we first consider a set S of states and a sequence (x , n=0,l,...} of random variables which take their values from S. The value x tells us which state the chain is "in" n at time n. In addition, the Markov property must be satisfied. This is Prob{x n+] = S n+1 |x Q = S Q ,...,x n = S n } = Prob{x n+1 = S n+] |x n = S n >, which says that the probability of being in a given state depends only on the previous state, and not any before that. We then define the on probabilities Prob{x , = j|x = i} (or just P ( i , j ) ) as the trans iti probabilities of the chain and can form a matrix whose (i,j) element is Prob{x ,, = jlx = i}. This is called the transition matrix P. If n+ 1 ' n we are given a probability distribution x over the states, then xP gives us the probability distribution at the next time step. This defines the basic idea of Markov chains. Several more definitions are required. Definition : A state x leads to a state y if there exists a sequence of states x,,...,x such that P(x,x ] ) P(x r x 2 ) ... P(x n ,y) > 0. 
Definition : A set C of states is closed if no state in C leads to a state outside of C. 129 Definition : A closed set C is irreducible (also called ergodic ) if x leads to y for all choices of x and y in C. Most of the chains we are dealing with will be irreducible. That is the set of all states, S, will be irreducible. S is then closed since there are no states outside of S. For these chains, it will be clear that some sequence of requests can be designed to cause any state to lead to any other state. Definition : Define the period of a state x as the greatest common divisor (g.c.d.) of the set {n ^ 1 : P (x,x) > 0}. It can be shown that all states have equal periods and this is defined as the period of this chain. A chain with a period of 1 is called aperiodic . Nearly all of the chains we deal with will be aperiodic. If the top element in a configuration is requested, none of these schemes will alter the configuration and hence we will have P (x,x) > 0. Hence 1 is in the set {n > 1 : P n (x,x) > 0} so the g.c.d. must be 1 and the chain must be aperiodic. Definition : A steady state distribution (also called a stationary distribution is defined as any probability distribution x over the states such that xP = x. Note that we can easily determine this distribution by solving the system x(P-I) = 0. The following theorem shows that this distribu- tion tells us the asymptotic behavior of the chain. 130 Theorem : Any closed and irreducible chain with a finite number of states has a unique steady state distribution. If the chain is aperiodic, it approaches the steady state distribution from any initial distribution. If the chain has period d, then for < i - d, the chain x. > x i+H' x i+2d ••• nas a un1c l ue steady state distribution and approaches it. Another useful theorem that characterizes asymptotic behavior is the ergodic theorem. Theorem [Ergodic Theorem]: Let N.(s) be a random variable denoting the number of times the chain has been in state s after t transitions. Suppose that s has steady state probability p(s). Then if the chain is closed and irreducible (ergodic) Urn V s > , , • Note the ergodic theorem holds for both periodic and aperiodic chains. The chains we are dealing with have a cost associated with each state. Let the probability of being in state s at time t be P t (s), and suppose s has steady state probability p(s) and cost c(s). Finally define the cost of the chain at time t (C0ST t ) as J p t (s)c(s). seS For an aperiodic irreducible chain, we have .^ P t (s) = p(s). Hence 1™ COST. = I p(s)c(s). L u seS For any irreducible chain, we can use the Ergodic Theorem to show 131 1 l lim j I COST. = 5! (s)c(s). Therefore £ p(s)c(s) determines the t-*» i=l ] seS seS asymptotic cost of the chain. We also use the following theorem. Theorem : Let c. be random variables denoting the cost of the state the chain is in at time i. Then for any irreducible chain, -,. C-,+C 9 +.. .+C. i™ e( j_^ — £)= lP(s)c(s) L l seS •,. c.+c +.. .+c. lim VAR( J2 t )=Q t-X" t Proof : To prove the first statement, note E(c.) = COST.. Then t n ^ c +c + +c I C0ST i lim rr c r c 2 + --- +c t , _ lim ifl \ - , , , el t-o E( 1 ) " t-*» 1 Ip(s)c(s). seS To prove the second statement, 1-im c,+c«+...+c + 1™ VAR(-1— ^ h t-*» t "Mm c,+c +.. ,+c. 2 . lim E[( J_L 1 } j. ( Zp(s)c(s)) 2 seS lnm Ct+C +. . .+C, 2 = E[(J™-J-L 1) ] - ( I p(s)c(s)) 2 (1) 1 L seS lim C 1 +C«+...+C. We now determine ' (— — ^r -). Let N.(s) be the number of times state s is visited in t transitions. Then c,+c 2 +...+c. 
equals the number of times we visit each state times its cost summed over all states 132 Therefore c +c + +c E N (s)c(s) lim ( _l_^__t ) _ lim seS z lim M s > scS r l seS by the Ergodic Theorem. Substituting this into (1) shows the variance is zero. I I Another important question is how quickly the chain approaches steady state. We can tell this from the eigenvalues of the transition matrix. To see this, suppose the n eigenvectors y-.,...,y span the space of all probability distributions. We can then write an initial distribution x Q as a linear combination of the y. x« = c,y, + ... + c y rl n-'n So after t transitions, the distribution will be "*O pt = C l^l + ••■ + c Ar Since the chain has a stationary distribution, there is some x such that xP = x. Hence x is an eigenvector with eigenvalue 1. If the chain is closed, irreducible and aperiodic, we can show that all other eigenvalues have modulus strictly less than 1. If we suppose X, is the eigenvalue -t-t- t- t - equal to 1 , we get x Q P = x + X-c^yp + ^3C 3 y 3 + ... + ^ n c n y n - Tnen as t -*- °°, x n P -*• x, and the rate of convergence is limited by the size of the other n-1 eigenvalues and eigenvectors. 133 REFERENCES [I] Hoel, P. B. , S. C. Port, and C. J. Stone, Introduction to Stochas- tic Processes , Houghton Mifflin Company, Boston, 1972. [2] Rivest, R. L., "On Self Organizing Sequential Search Heuristics," CACM, 19 (1976), 63-67. [3] Knuth, D. E., The Art of Computer Programming , Vol. 3, Addison- Wesley Publishing Co., Reading, Mass., 1973. [4] Yao, A. C. , Personal communication. [5] Kahn, D., The Codebreakers , Macmillan Company, New York, 1967. [6] Kucera, H. and W. N. Francis, Computational Analysis of Present-Day American English , Brown University Press, Providence, R.I., 1967. [7] Burville, P. J. and Kingman, J.F.C., "On a Model for Storage and Search," J. Appl. Prob. , 10 (1973), 697-701. Hendricks, W. J., "The Stationary Distribution of an Interesting Markov Chain," J. Appl. Prob. , 9 (1972), 231-233. [9] Hendricks, W. J., "An Extension of a Theorem Concerning an Inter- esting Markov Chain," J. Appl. Prob. , ]0_ (1973), 886-890. [10] McCabe, J., "On Serial Files with Relocatable Records," Operations Res. , 12 (1965), 609-618. [II] Knuth, D. E., "Optimum Binary Search Trees," Acta Informatica , 1 (1971), 14-25. [12] Bruno, J. and E. G. Coffman, "Nearly Optimal Binary Search Trees," Proc. IFIP Congress 71 (1971). [13] Mel home, K. , "Nearly Optimal Binary Search Trees," Acta Informatica , 5 (1975), 287-295. [14] Walker, W. A. and C. C. Gottlieb, "A Top-Down Algorithm for Con- structing Nearly Optimal Lexicographic Trees," in Graph Theory and Computing (ed. R. C. Reid), Academic Press (1972), 303-323. [15] AdeTson-VeTskii, G. M. and E. M. Landis, Dokl . Akad. Nauk SSSR 146 (1962), 263-266; English translation in Soviet Math 3, 1259-1263. [16] Nievergelt, J. and E a M. Reingold, "Binary Search Trees of Bounded Balance," SI AM J Comput . 2 (1973), 33-43. 134 [17] Williams, J.W.J. , Algorithm 232 - Heapsort. , CACM 7 (1964), 347- [18] Hibbard, T. N., "Some Combinatorial Properties of Certain Trees with Application to Searching and Sorting," JACM 9 (1962), 16-17. [19] Nievergelt, J. and Wong, C. K. , "On Binary Search Trees," Informa- tion Processing 71 , vol. 1, North Holland, Amsterdam, pp. 91-93. [20] Baer, J. L. , "Weight Balanced Trees," National Computer Conferenc e 1975, pp. 467-472. [21] Rudin, W., Prin ciples of Mathematical Analy sis, McGraw-Hill, New York, 1976, p". 321. 
" 135 VITA James Richard Bitner was born August 2, 1953, in Minneapolis, Minnesota. He attended the University of Illinois at Urbana-Champaign, receiving a B.S. in Mathematics and Computer Science (June 1973) with highest honors and distinction in Mathematics and Computer Science. He continued at the University of Illinois for graduate study, receiving an M.S. (December 1974) in Computer Science. During this time, he was employed as a research assistant for Drs. C. L. Liu and E. M. Reingoldo He is also a member of Phi Beta Kappa and Phi Kappa Phi. The titles of his published articles are: "Use of Macros in Backtrack Programming," "Tiling 5n x 12 Rectangles with Y-Pentominos" and "Efficient Generation of the Binary Reflected Gray Code and Its Applications." .IOGRAPHIC DATA ET 1. Report No. UIUCDCS-R-76-8I8 ; |e anj Subtitle EURISTICS THAT DYNAMICALLY ALTER DATA STRUCTURES TO UREASE THEIR ACCESS TIME ithor(s) James R. Bitner rtorraing Organization Name and Addtess department of Computer Science Jniversity of Illinois at Urbana-Champaign Jrbana, Illinois 6l801 ftonsoring Organization Name and Address National Science Foundation •ashington, D.C. ipf Irmt-ntary Notes 3. Recipient's Accession No. 5. Report Date July 1976 8. Performing Organization Rept. No. 10. Project/Task/Work Unit No. 11. Contract/Grant No. NSF GJ-i+1538 13. Type of Report & Period Covered 14. In evaluating the access times of data structures, we often assume : each key is equally likely to be requested. This is seldom the case ) practice: some keys will be requested more often than others. The data uctures considered are lists and trees, and access times can be substan- Lally reduced in these data structures by the use of several simple "istics that cause the more frequently accessed key to move to the :op" of the data structure. These methods are analyzed and compared. .' *ords and Document Analysis. 17a. Descriptors ita structures, linked lists, binary search trees, access times J nficrs /Open-Ended Te OkTI Field/Group ■ Statement 35 (10-70) 19. Security Class (This Report) UNCLASSIFIED 20. Security Class (This Page UNCLASSIFIED 21- No. of Pages 135 22. Price USCOMM-DC 40329-P7 1 JVN 14 1577