 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/notesonavltrees441rein 
 
May 10, 1971 
 
 NOTES ON AVL TREES 
 
 Edward M. Reingold 
 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
Report No. 441 
 
 
Abstract 
 
 These notes introduce AVL trees and related algorithms. They 
 were written as supplementary material for a course in data structures 
 given by the Department of Computer Science of the University of Illinois 
 at Urbana-Champaign during the second semester of the 1970-71 academic 
 year. A background of §2.2 and §2.3 in The Art of Computer Programming by
 Knuth (Addison-Wesley, 1968) is assumed.
 
 Key words and phrases 
 
 AVL trees; information retrieval. 
 
 CR Categories 
 
 3.74, 3.79 (AVL trees)
 
One popular method for storing and retrieving information by its 
 "name" is to store the names in a binary tree. Each node of this tree would 
 look as in Figure 1, 
 
     +-------+------+-------+
     | LLINK | NAME | RLINK |
     +-------+------+-------+

               Figure 1.

 where LLINK is a pointer to the left subtree of the node, NAME is the name 
 of the information (along with a pointer to the information), and RLINK 
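 is a pointer to the right subtree of the node. Generally, the names are
 stored in the tree in such a fashion that the postorder list of the nodes
 (postorder in the sense of the first edition of Knuth; the same traversal is
 now usually called inorder or symmetric order) is an alphabetized list of
 the names. For example, suppose the information to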
 be stored is the documentation for various programming languages; the tree 
 might be constructed as in Figure 2. 
 
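     [Figure: a binary tree of programming-language names. The layout is not
     recoverable from the scan; the legible node labels are LISP, COBOL, BCPL,
     CUPL, GEDANKEN, RUSH, NUCLEOL, SNOBOL, and FORTRAN.]

               Figure 2.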
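
As a minimal sketch of the node layout of Figure 1 and of the ordering property just described, assuming Python and hypothetical names (`Node`, `alphabetical`) that do not appear in these notes, the traversal "left subtree, then node, then right subtree" lists the names in alphabetical order. The small tree is a hand-made example, not Figure 2.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    """One tree node with the three fields of Figure 1."""
    name: str
    llink: Optional["Node"] = None  # left subtree; None stands for the null link
    rlink: Optional["Node"] = None  # right subtree


def alphabetical(t: Optional[Node]) -> Iterator[str]:
    """Visit the left subtree, then the node itself, then the right subtree."""
    if t is not None:
        yield from alphabetical(t.llink)
        yield t.name
        yield from alphabetical(t.rlink)


# Assumed example tree, built by hand so that the ordering property holds.
tree = Node("FORTRAN",
            Node("COBOL", Node("BCPL")),
            Node("LISP", None, Node("SNOBOL")))
names = list(alphabetical(tree))
```

Here `names` comes out already sorted, which is the property the search algorithm relies on.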
 
To find a given name in the tree we perform the following algorithm,
which is closely related to postorder traversal of the nodes:

 Algorithm S

     Let T be a pointer to the tree to be searched and let
     ITEM be the name for which we are searching.

     Step 1 (initialize)

         Set P ← T.

     Step 2 (is it the root?)

         If P = Λ then the name we are looking for is not in
         the tree and we are done. If NAME(P) = ITEM then we have found
         the name and we are done.

     Step 3 (Go down one level)

         If NAME(P) > ITEM set P ← LLINK(P).
         If NAME(P) < ITEM set P ← RLINK(P).
         Go back to Step 2.

 One special case of this method is the binary search and its variants.
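
The three steps of Algorithm S can be sketched in Python as follows. This is only an illustration under assumptions: the names `Node` and `search` are hypothetical, `None` stands for the null pointer, and the seven-node tree is a hand-made example rather than one of the figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def search(t: Optional[Node], item: str) -> Optional[Node]:
    """Return the node whose name equals item, or None if it is absent."""
    p = t                          # Step 1: start at the root
    while p is not None:           # Step 2: a null pointer means "not in the tree"
        if p.name == item:
            return p               # Step 2: found the name
        # Step 3: go down one level, left for smaller names, right for larger
        p = p.llink if p.name > item else p.rlink
    return None


# Assumed example tree in which every left subtree holds smaller names.
tree = Node("GEDANKEN",
            Node("COMIT", Node("BCPL"), Node("DIALOG")),
            Node("PL/1", Node("NUCLEOL"), Node("SNOBOL")))
found = search(tree, "DIALOG")
missing = search(tree, "AMBIT")
```

Each pass through the loop makes one comparison of names, so the number of comparisons is bounded by the number of levels in the tree.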
 
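     [Figure: a poorly balanced binary tree of the same names. Levels from top
     to bottom, as they survive in the scan: BCPL; ALGOL, COMIT; COBOL, DIALOG;
     CUPL, GEDANKEN; FORTRAN, NUCLEOL; LISP, PL/1; OL/2, SNOBOL; RUSH.]

               Figure 3.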
 
It is clear that if we apply Algorithm S to the tree in Figure 2 
 then at most five comparisons will be needed to determine whether or not a 
 given name is in the tree. On the other hand, had the tree structure been 
 as shown in Figure 3, as many as eight comparisons may have been required. 
 Clearly the structure of the tree in Figure 2 is more desirable. 
 
 In general, it is desirable to have the tree structure as "balanced" 
 as possible. Thus from the criterion of searching the tree, the optimal 
 structure for n names, assuming each name is equally probable, would be the 
 complete binary tree of n nodes; then Algorithm S is a binary search. (See
 page 401 in Volume 1 of Knuth.) However, the names to be stored are usually
 dynamic in nature and so one must allow for frequent additions and deletions
 to the trees; if we require the tree to be completely balanced, then the
 addition or deletion of a name would require the complete restructuring of
 the tree. For example, the completely balanced tree for the names in the
 trees given in Figures 2 and 3 is given in Figure 4.
 
     [Figure: the completely balanced tree. Levels from top to bottom:
     GEDANKEN; COMIT, PL/1; BCPL, DIALOG, NUCLEOL, SNOBOL;
     ALGOL, COBOL, CUPL, FORTRAN, LISP, OL/2, RUSH.]

               Figure 4.
 
 Adding a new language name, say AMBIT, would require a complete
 restructuring of the tree to obtain the tree illustrated in Figure 5.
 
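     [Figure: the completely balanced tree after AMBIT is added. Labels
     legible in the scan: COBOL; AMBIT, CUPL; ALGOL, BCPL, COMIT, DIALOG;
     FORTRAN; GEDANKEN, NUCLEOL, PL/1, SNOBOL.]

               Figure 5.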
 
 Clearly there is a tradeoff between the maximum time required to
 add and delete items and the maximum time required to retrieve an item. To
 summarize, if there are n names then the minimal retrieval time is
 approximately log₂ n comparisons for a balanced tree, but addition and
 deletion of nodes require a complete reorganization of the tree, taking a
 number of steps proportional to n. On the other hand, if the tree is allowed
 to grow randomly, so that addition and deletion of nodes is facilitated, then
 retrieval time can take as long as n/2 comparisons.
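
To make the tradeoff concrete, here is a small sketch, assuming Python and hypothetical helper names (`insert`, `levels`, `build`), of insertion with no rebalancing at all. The same fourteen names give a tree of fourteen levels when they arrive in alphabetical order and a tree of four levels when they arrive in an order picked by hand to produce a balanced tree.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.llink = None
        self.rlink = None


def insert(t, name):
    """Add name as a new leaf with no rebalancing; return the root."""
    if t is None:
        return Node(name)
    p = t
    while True:
        if name == p.name:
            return t                      # already present, nothing to do
        side = "llink" if name < p.name else "rlink"
        child = getattr(p, side)
        if child is None:
            setattr(p, side, Node(name))  # hang the new leaf here
            return t
        p = child


def levels(t):
    """Number of levels on the longest path from the root down to a leaf."""
    return 0 if t is None else 1 + max(levels(t.llink), levels(t.rlink))


def build(names):
    t = None
    for n in names:
        t = insert(t, n)
    return t


ALPHABETICAL = ["ALGOL", "BCPL", "COBOL", "COMIT", "CUPL", "DIALOG", "FORTRAN",
                "GEDANKEN", "LISP", "NUCLEOL", "OL/2", "PL/1", "RUSH", "SNOBOL"]
# Assumed arrival order, chosen so that the unbalanced insert happens to
# produce a completely balanced tree.
FAVOURABLE = ["GEDANKEN", "COMIT", "PL/1", "BCPL", "DIALOG", "NUCLEOL", "SNOBOL",
              "ALGOL", "COBOL", "CUPL", "FORTRAN", "LISP", "OL/2", "RUSH"]

chain_levels = levels(build(ALPHABETICAL))
balanced_levels = levels(build(FAVOURABLE))
```

Since the arrival order is normally outside our control, the shape of a tree grown this way is a matter of luck.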
 
 AVL Trees
 
 About ten years ago Adel'son-Vel'skii and Landis proposed a tree 
 structure which provides a good compromise between the two extremes of 
 complete balancing and unrestricted growth. Their structures, now known as
 AVL trees, are binary trees which have the property that the heights of the
 two subtrees of any node in the tree differ by at most one. For example,
 the following trees are not AVL:

     [Figure: examples of trees that are not AVL; not legible in the scan.]

On the other hand, the following trees are AVL:

     [Figure: examples of AVL trees; not legible in the scan.]
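
The defining property can be checked mechanically. The following is a minimal sketch, assuming Python and a hypothetical function `avl_levels`; the four small trees are illustrations made up for this sketch and are not the trees of the original figures.

```python
class Node:
    def __init__(self, name, llink=None, rlink=None):
        self.name = name
        self.llink = llink
        self.rlink = rlink


def avl_levels(t):
    """Return the number of levels of t if every node satisfies the AVL
    property, or None as soon as some node violates it."""
    if t is None:
        return 0
    hl = avl_levels(t.llink)
    hr = avl_levels(t.rlink)
    if hl is None or hr is None or abs(hl - hr) > 1:
        return None
    return 1 + max(hl, hr)


# Not AVL: a chain of three nodes (subtree heights 0 and 2 at the root).
chain = Node("A", None, Node("B", None, Node("C")))
# Not AVL: the root has a left subtree of two levels and no right subtree.
lopsided = Node("D", Node("B", Node("A"), Node("C")))
# AVL: heights differ by exactly one at the root.
small = Node("B", Node("A"))
# AVL: left subtree has two levels, right subtree has one.
larger = Node("C", Node("B", Node("A")), Node("D"))
```

A tree passes only if the test succeeds at every node, not just at the root.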
 
 To discover an upper bound on the number of comparisons required
 to retrieve a name in the worst case, we calculate the least number of
 elements required to form an AVL tree of k levels. Such a tree is sometimes
 called a mintree and will be symbolized M_k. A mintree of k levels may be
 constructed by taking one item as the root of the tree and placing a mintree
 of height k-1 as one subtree and a mintree of height k-2 as the other
 subtree (compare this with Fibonacci trees). Counting the elements of this
 tree we get a recursive formula for N_k, the number of items in M_k:

     \[ N_k = N_{k-1} + N_{k-2} + 1. \]

 This equation can be solved by standard techniques, and the solution is

     \[ N_k = \left(1 + \frac{2}{\sqrt{5}}\right)\left(\frac{1+\sqrt{5}}{2}\right)^{k-1}
            + \left(1 - \frac{2}{\sqrt{5}}\right)\left(\frac{1-\sqrt{5}}{2}\right)^{k-1} - 1. \]

 Thus the height, h, of an AVL tree with N nodes is bounded above by

     \[ h < \tfrac{3}{2} \log_2 (N + 1) - 1, \]

 and so the maximum number of comparisons to retrieve a name is (3/2) log₂ n,
 where n is the number of names in the tree. So the number of comparisons
 required in the worst case for retrieval in AVL trees compares favorably
 with the number required for completely balanced trees.
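
The recurrence for the size of a mintree is easy to tabulate. The sketch below assumes Python, hypothetical function names, and the starting values N_0 = 0 (empty tree) and N_1 = 1 (single node). It checks two things numerically: the well-known identity N_k = F_{k+2} - 1 with the Fibonacci numbers, and a bound of the form k <= (3/2) log2(N_k + 1). The additive constant in such a bound depends on whether height counts levels or links; the check counts levels and leaves the constant out.

```python
import math


def mintree_sizes(k_max):
    """N_k for k = 0..k_max, from N_k = N_{k-1} + N_{k-2} + 1."""
    sizes = [0, 1]                      # N_0 = 0, N_1 = 1
    for k in range(2, k_max + 1):
        sizes.append(sizes[k - 1] + sizes[k - 2] + 1)
    return sizes


def fibonacci(k_max):
    """F_k for k = 0..k_max with F_0 = 0, F_1 = 1."""
    fib = [0, 1]
    for k in range(2, k_max + 1):
        fib.append(fib[k - 1] + fib[k - 2])
    return fib


K = 40
N = mintree_sizes(K)
F = fibonacci(K + 2)

# Mintree sizes are Fibonacci numbers shifted by two places, minus one.
identity_holds = all(N[k] == F[k + 2] - 1 for k in range(K + 1))
# A tree of k levels needs at least N_k nodes, so k <= (3/2) log2(N_k + 1).
bound_holds = all(k <= 1.5 * math.log2(N[k] + 1) for k in range(1, K + 1))
```

The first few sizes are 0, 1, 2, 4, 7, 12, so the sparsest AVL tree of five levels already holds twelve names.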
 
Addition of Nodes to AVL Trees 
 
 In order for AVL trees to be useful, we must show that it is not
 too difficult for the tree to change dynamically. This section gives an
 algorithm requiring time proportional to log₂ n to insert a new node in an
 AVL tree of n nodes; the difficulty in this problem is clearly the fact
 that the tree must be an AVL tree after the insertion of the new node.
 
 We will suppose that we have been given an AVL tree, T, and a
 new name, N, to be added to that tree. When Algorithm S is applied to find
 N in T, it terminates at Step 2 with P = Λ, where P is the left or right son
 of some node in T; it is at this point in the tree that N should be added
 as a new leaf, provided that the augmented tree would still be an AVL tree.
 
 In any AVL tree exactly one of three possible conditions will 
 be true at any node of the tree: 
 
 C1. The height of the left and right subtrees of the node
 are equal. 
 
 C2. The height of the right subtree of the node is one 
 greater than the height of the left subtree of the 
 node. 
 
 C3. The height of the right subtree of the node is one less 
 than the height of the left subtree of the node. 
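
The three conditions can be read off from the subtree heights. A minimal sketch, assuming Python and hypothetical names (`levels`, `cond`), with the return values 1, 2, and 3 standing for C1, C2, and C3:

```python
class Node:
    def __init__(self, name, llink=None, rlink=None):
        self.name = name
        self.llink = llink
        self.rlink = rlink


def levels(t):
    """Number of levels in the tree t (0 for the empty tree)."""
    return 0 if t is None else 1 + max(levels(t.llink), levels(t.rlink))


def cond(node):
    """Return 1, 2, or 3 according to which of C1, C2, C3 holds at node."""
    hl, hr = levels(node.llink), levels(node.rlink)
    if hl == hr:
        return 1          # C1: both subtrees have the same height
    if hr == hl + 1:
        return 2          # C2: right subtree is one taller
    if hl == hr + 1:
        return 3          # C3: left subtree is one taller
    raise ValueError("heights differ by more than one: not an AVL node")


leaf = Node("A")
right_heavy = Node("A", None, Node("B"))
left_heavy = Node("B", Node("A"))
```

Recomputing heights at every node is wasteful; the sketch only shows what the conditions mean.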
 
 The algorithm for the insertion of a new node will insert that 
 node in the position where Algorithm S discovered it was not in the tree; 
 that is it will be added as the son of a node already in the tree. We 
 will refer to the tree with the node added as the augmented tree. If this 
 new node was added as the left (right) son of a node which previously 
 had only a right (left) son, then the tree is clearly still an AVL tree 
 and no more has to be done. On the other hand, if this new node was
 added as the son (right or left) of what was previously a leaf, we may 
 have destroyed the AVL property by increasing the length of a subtree. The 
 
crucial part of the insertion algorithm is to discover if the AVL property 
 of the tree has been ruined and if so to restructure the tree to regain 
 the AVL property. 
 
 The augmented tree is tested for the AVL property by going
 up the path from the newly added node to the root of the tree. This
 is facilitated in the algorithm by storing that path in a push
 down stack as the tree is being searched to find where the node is to
 be added. The construction of such a stack is a simple, inexpensive task,
 for in an AVL tree of a billion nodes a maximum of about fifty stack entries
 will be needed (see Exercise 3).
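
A sketch of the path stack, assuming Python and hypothetical names (`search_with_path`, `max_levels`): the search pushes one (node, side) pair per level it descends, so the stack never holds more entries than the tree has levels. The second function uses the mintree recurrence N_k = N_{k-1} + N_{k-2} + 1, with N_0 = 0 and N_1 = 1 assumed, to find the largest number of levels an AVL tree of n nodes can have.

```python
class Node:
    def __init__(self, name, llink=None, rlink=None):
        self.name = name
        self.llink = llink
        self.rlink = rlink


def search_with_path(t, item):
    """Search for item; return (node or None, stack of (node, side) pairs)."""
    path = []                               # the push-down stack
    p = t
    while p is not None and p.name != item:
        side = "L" if item < p.name else "R"
        path.append((p, side))              # remember which son we follow
        p = p.llink if side == "L" else p.rlink
    return p, path


def max_levels(n):
    """Largest k such that a k-level AVL tree fits in n nodes."""
    if n < 1:
        return 0
    prev, cur, k = 0, 1, 1                  # N_0, N_1
    while prev + cur + 1 <= n:              # is N_{k+1} still at most n?
        prev, cur = cur, prev + cur + 1
        k += 1
    return k


tree = Node("GEDANKEN",
            Node("COMIT", Node("BCPL"), Node("DIALOG")),
            Node("PL/1", Node("NUCLEOL"), Node("SNOBOL")))
node, path = search_with_path(tree, "AMBIT")
billion = max_levels(10**9)
```

For a billion nodes the recurrence gives 42 levels, comfortably within the figure of about fifty quoted above.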
 
 As we follow the path up from the newly added node, we must check
 at each node which of C1, C2, or C3 holds. This will be done by having each
 node contain, in addition to the NAME, LLINK, and RLINK, a field called
 COND which indicates which condition holds. As the path is traced upward
 from the new node, in addition to checking the conditions, we will also
 update them to take the new node into account. This is where the
 difficulties arise: What if in adding the new node we have lengthened a
 subtree which was already one longer than its brother subtree? At this point
 we must restructure the tree.

 When we consider the existing left-right symmetries, we find that
 the only ways the tree can be unbalanced are given in the three cases below.
 
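 Case 1:

     [Figure: the unbalanced tree; only the label r of the new node survives
     in the scan.]

 Here the new node, r, has been added to a leaf of the tree,
 causing a subtree containing that node to be too long. The tree is
 rebalanced by changing it to

     [Figure: the rebalanced tree; not legible in the scan.]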
 
Case 2:

     [Figure: tree rooted at p with subtree A of height n+1, subtree B of
     height n, and subtree C of height n.]

 Here the new node has been added in subtree A, causing the height
 of the left subtree of p to be n+2 while the height of the right subtree is
 n (see Exercise 4). The tree is rebalanced by changing it to

     [Figure: the rebalanced tree, with subtree A of height n+1, subtree B of
     height n, and subtree C of height n.]
 
 Case 3:

     [Figure: tree rooted at p with subtree A of height n+1, subtree B of
     height n+1, subtree C (height not legible in the scan), and subtree D of
     height n+1.]

 Here the new node has been added in subtree B and so the height of
 the left subtree of p is n+3 while the height of the right subtree is n+1.
 The tree is rebalanced by changing it to

     [Figure: the rebalanced tree, with subtree A of height n+1, subtree B of
     height n+1, subtree C of height n, and subtree D of height n+1.]
 
 (Why are these three cases exhaustive? In Case 3, what if the new
 node was added to subtree C? See Exercise 5.)
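
Restructurings of this kind are usually described today as single and double rotations. The sketch below assumes Python, hypothetical function names, and the assumption that Case 2 corresponds to a single rotation and Case 3 to a double rotation; since the figures are not available, this correspondence is an inference from the subtree heights given in the text, not a transcription of the figures.

```python
class Node:
    def __init__(self, name, llink=None, rlink=None):
        self.name = name
        self.llink = llink
        self.rlink = rlink


def levels(t):
    return 0 if t is None else 1 + max(levels(t.llink), levels(t.rlink))


def inorder(t):
    return [] if t is None else inorder(t.llink) + [t.name] + inorder(t.rlink)


def rotate_right(p):
    """Single rotation: the left son of p becomes the root of the subtree."""
    q = p.llink
    p.llink = q.rlink      # the middle subtree changes parent
    q.rlink = p
    return q


def rotate_left(p):
    """Mirror image of rotate_right."""
    q = p.rlink
    p.rlink = q.llink
    q.llink = p
    return q


def rotate_left_right(p):
    """Double rotation: the left son's right son becomes the root."""
    p.llink = rotate_left(p.llink)
    return rotate_right(p)


# Single rotation with n = 0: left subtree of p has 2 levels, right has 0.
single = Node("c", Node("b", Node("a")))
single_before = (inorder(single), levels(single))
single = rotate_right(single)

# Double rotation with n = 0: the extra height sits in the inner subtree.
double = Node("e", Node("b", Node("a"), Node("d", Node("c"))), Node("f"))
double_before = (inorder(double), levels(double))
double = rotate_left_right(double)
```

Both rotations keep the alphabetical order of the names and reduce the number of levels by one, which is why nothing above the rotated node needs further adjustment after an insertion.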
 
 On the basis of the above description, we now give Algorithm I to 
 insert a new node into an AVL tree, and restructure that tree (if need be) 
 to keep it AVL. The algorithm will assume nodes of the form indicated in 
 Figure 6,

     +------+------+-------+-------+
     | NAME | COND | LLINK | RLINK |
     +------+------+-------+-------+

               Figure 6.

 where NAME, LLINK, and RLINK are as in Figure 1, and COND is equal to one,
 two, or three according as C1, C2, or C3 holds at that node.
 
 Algorithm I : Insertion into AVL trees 
 
 NEW is the name to be added to the AVL tree pointed to by T. P will 
 be used in the search through the tree to find out where NEW should be added; 
 PP is always one step behind P in the tree, that is, PP will point to the 
 father of the node pointed to by P. PATH is a push down stack used to store 
 the nodes of the tree on the path from the newly added node up to the root. 
 Each element on the stack is an ordered pair (p,q) where p is a pointer to the 
 node and q is either "L" or "R", indicating which son of node p is followed in 
 going down the path. 
 
 This algorithm uses the notation a ← b ← c to represent assigning the
 value of c to both a and b. Also, if S is a variable whose value is either
 "L" or "R" then S·LINK is either LLINK or RLINK according to the value of S.
 For example, if S = "L" then

     S·LINK(P) ← P

 has the same effect as

     LLINK(P) ← P.
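
In a language with named fields, the link selected by S can be written as a lookup from "L" or "R" to a field name. A minimal sketch, assuming Python and hypothetical helpers `get_link` and `set_link`:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.llink = None
        self.rlink = None


FIELD = {"L": "llink", "R": "rlink"}   # which field each value of S selects


def get_link(node, s):
    """Read the link of node selected by s ("L" or "R")."""
    return getattr(node, FIELD[s])


def set_link(node, s, target):
    """Assign target to the link of node selected by s ("L" or "R")."""
    setattr(node, FIELD[s], target)


p = Node("LISP")
x = Node("COBOL")
set_link(p, "L", x)        # same effect as p.llink = x
```

This lets one piece of code serve both the left and the right variant of a step.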
 
 Step 1 (initialize)

     Set P ← T
         PATH ← PP ← Λ

 Step 2 (Save the path)

     If PP ≠ Λ then PATH ⇐ (PP,S)

 Step 3 (Have we found the place?)

     If P = Λ, we have found the place to add the new node: go to Step 5.

 Step 4 (Go down the tree)

     If NAME(P) = NEW then the name is already in the tree so it needn't
     be added and we are done.

     Otherwise set PP ← P and
         if NAME(P) < NEW set P ← RLINK(P)
                              S ← "R"
         if NAME(P) > NEW set P ← LLINK(P)
                              S ← "L"

     Go back to Step 2.

 Step 5 (Add new node to the tree)

     X ⇐ AVAIL
     NAME(X) ← NEW
     COND(X) ← 1
     LLINK(X) ← RLINK(X) ← Λ
     S·LINK(PP) ← X

 Step 6 (Update condition indicators)

     If PATH = Λ, we are done. Otherwise, (P,S) ⇐ PATH and do as
     specified in the table below.

                      |  S = "R"              |  S = "L"
         -------------+-----------------------+-----------------------
         COND(P) = 1  |  COND(P) ← 2,         |  COND(P) ← 3,
                      |  repeat Step 6        |  repeat Step 6
         -------------+-----------------------+-----------------------
         COND(P) = 2  |  Go to Step 7 to      |  COND(P) ← 1
                      |  rebalance the tree   |  and we are done
         -------------+-----------------------+-----------------------
         COND(P) = 3  |  COND(P) ← 1          |  Go to Step 7 to
                      |  and we are done      |  rebalance the tree
 
 Step 7 (Case 1 and its symmetric variants)

     If NAME(LLINK(LLINK(P))) = NEW then
         COND(P) ← COND(LLINK(P)) ← COND(LLINK(LLINK(P))) ← 1
         RLINK(LLINK(P)) ← P
         if PATH = Λ, set T ← LLINK(P), otherwise
             (F,S) ⇐ PATH
             S·LINK(F) ← LLINK(P)
         LLINK(P) ← Λ and we are done.

     If NAME(RLINK(LLINK(P))) = NEW then
         COND(P) ← COND(LLINK(P)) ← COND(RLINK(LLINK(P))) ← 1
         LLINK(RLINK(LLINK(P))) ← LLINK(P)
         RLINK(RLINK(LLINK(P))) ← P
         if PATH = Λ, set T ← RLINK(LLINK(P)), otherwise
             (F,S) ⇐ PATH
             S·LINK(F) ← RLINK(LLINK(P))
         RLINK(LLINK(P)) ← Λ
         LLINK(P) ← Λ and we are done.

     If NAME(LLINK(RLINK(P))) = NEW then
         COND(P) ← COND(RLINK(P)) ← COND(LLINK(RLINK(P))) ← 1
         LLINK(LLINK(RLINK(P))) ← P
         RLINK(LLINK(RLINK(P))) ← RLINK(P)
         if PATH = Λ, set T ← LLINK(RLINK(P)), otherwise
             (F,S) ⇐ PATH
             S·LINK(F) ← LLINK(RLINK(P))
         LLINK(RLINK(P)) ← Λ
         RLINK(P) ← Λ and we are done.

     If NAME(RLINK(RLINK(P))) = NEW then
         COND(P) ← COND(RLINK(P)) ← COND(RLINK(RLINK(P))) ← 1
         LLINK(RLINK(P)) ← P
         if PATH = Λ, set T ← RLINK(P), otherwise
             (F,S) ⇐ PATH
             S·LINK(F) ← RLINK(P)
         RLINK(P) ← Λ and we are done.

 Step 8 (Case 2 and its symmetric variants)
     See Exercise 6.

 Step 9 (Case 3 and its symmetric variants)
     See Exercise 7.
 
 It can be shown that Algorithm I requires time proportional to log₂ n
 in the worst case (see Exercise 9). In addition, it can be shown that the
 expected time is also proportional to log₂ n (see Exercise 10).
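
For comparison, here is a compact sketch of AVL insertion in Python. It is not a transcription of Algorithm I: it deliberately swaps in a different bookkeeping, storing a `height` field in each node instead of COND and using recursion instead of the explicit PATH stack, and all names (`Node`, `insert`, `rot_left`, `rot_right`, `check`) are hypothetical. It is meant only to show the overall shape: add a leaf, then repair at most one unbalanced node on the way back up.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.llink = None
        self.rlink = None
        self.height = 1                    # number of levels below and including this node


def h(t):
    return t.height if t is not None else 0


def fix(t):
    t.height = 1 + max(h(t.llink), h(t.rlink))


def rot_right(p):
    q = p.llink
    p.llink, q.rlink = q.rlink, p          # q moves up, p becomes its right son
    fix(p)
    fix(q)
    return q


def rot_left(p):
    q = p.rlink
    p.rlink, q.llink = q.llink, p          # q moves up, p becomes its left son
    fix(p)
    fix(q)
    return q


def insert(t, new):
    """Insert new into the AVL tree t and return the (possibly new) root."""
    if t is None:
        return Node(new)
    if new == t.name:
        return t                           # already in the tree
    if new < t.name:
        t.llink = insert(t.llink, new)
    else:
        t.rlink = insert(t.rlink, new)
    fix(t)
    if h(t.llink) > h(t.rlink) + 1:        # left side two levels taller
        if h(t.llink.rlink) > h(t.llink.llink):
            t.llink = rot_left(t.llink)    # inner grandchild: double rotation
        return rot_right(t)
    if h(t.rlink) > h(t.llink) + 1:        # right side two levels taller
        if h(t.rlink.llink) > h(t.rlink.rlink):
            t.rlink = rot_right(t.rlink)
        return rot_left(t)
    return t


def check(t):
    """Recompute heights from scratch; return (is_avl, levels, names in order)."""
    if t is None:
        return True, 0, []
    ok_l, hl, nl = check(t.llink)
    ok_r, hr, nr = check(t.rlink)
    return ok_l and ok_r and abs(hl - hr) <= 1, 1 + max(hl, hr), nl + [t.name] + nr


NAMES = ["ALGOL", "BCPL", "COBOL", "COMIT", "CUPL", "DIALOG", "FORTRAN",
         "GEDANKEN", "LISP", "NUCLEOL", "OL/2", "PL/1", "RUSH", "SNOBOL"]
root = None
for name in NAMES:                         # alphabetical order, the bad case for a plain tree
    root = insert(root, name)
is_avl, n_levels, in_order = check(root)
```

Even with the names arriving in alphabetical order, the resulting tree has four levels rather than fourteen.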
 
 
 Exercises 
 
 1. (M15) Prove that Algorithm S works correctly.

 2. (M10) Show that Fibonacci trees are among the most unbalanced
    possible AVL trees. The Fibonacci trees are defined as follows:

        [Figure: the base trees F_1 and F_2, drawn as small diagrams that
        are not legible in the scan, and F_{n+2}, the tree whose root has
        F_n as its left subtree and F_{n+1} as its right subtree.]

 3. (0 In an AVL tree of n nodes, what is the length of the longest path
    from a leaf to the root?

 4. (3) In Cases 2 and 3, why must the heights of the subtrees A, B, C and
    D be as indicated in the text?

 5. (10) Explain in detail why Cases 1, 2, and 3 and their symmetric variants
    are exhaustive.

 6. (15) Write the necessary steps to handle Case 2 and its symmetric
    variants.

 7. (15) Write the necessary steps to handle Case 3 and its symmetric
    variants.

 8. (M25) Prove that Algorithm I works correctly.

 9. (M20) Analyze the time required for Algorithm I in the worst case. Show
    that the time is proportional to log₂ n where n is the number of nodes in
    the tree; that is, determine constants α₁, α₂, β₁, β₂ such that if T(n) is
    the time to add a node to a tree of n nodes, then

        \[ \alpha_1 \log_2 n + \beta_1 < T(n) < \alpha_2 \log_2 n + \beta_2 . \]

 10. (HM30) Analyze the expected time required for Algorithm I. Show that it
     too is proportional to log₂ n.

 11. (20) Design an algorithm to delete a leaf from an AVL tree so that the
     tree retains the AVL property.

 12. (30) Extend your algorithm in Exercise 11 to delete any node of an AVL
     tree so that the tree retains the AVL property. Analyze the time required
     for the algorithm (it can be done in time proportional to log₂ n). Prove
     that the algorithm works correctly.

 13. (40) Design an efficient algorithm to merge two AVL trees so that the
     result is an AVL tree. Analyze the time required by your algorithm.
 
 
 Bibliography 
 
 [1] Adel'son-Vel'skii, G. M. and Landis, Ye. M., An algorithm for the
     organization of information, Dokl. Akad. Nauk SSSR 146
     (1962), 263-266 (Russian). English translation in
     Soviet Math. Dokl. 3 (1962), 1259-1262.

 [2] Foster, C. C., Information storage and retrieval using AVL trees,
     Proc. of ACM 20th National Conf. (1965), 192-205.

 [3] Foster, C. C., A study of AVL trees, Report Number GER-12158, Goodyear
     Aerospace Corporation, Akron, Ohio (1965).

 [4] Knuth, D. E., The Art of Computer Programming, Volume 1, Addison-
     Wesley, Reading, Mass. (1968).
 