Report No. 441

NOTES ON AVL TREES

By Edward M. Reingold

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

May 10, 1971

Abstract

These notes introduce AVL trees and related algorithms. They were written as supplementary material for a course in data structures given by the Department of Computer Science of the University of Illinois at Urbana-Champaign during the second semester of the 1970-71 academic year. A background of §2.2 and §2.3 in The Art of Computer Programming by Knuth (Addison-Wesley, 1968) is assumed.

Key words and phrases: AVL trees; information retrieval.

CR Categories: 3.74, 3.79

One popular method for storing and retrieving information by its "name" is to store the names in a binary tree. Each node of this tree would look as in Figure 1,

    | LLINK | NAME | RLINK |

    Figure 1.

where LLINK is a pointer to the left subtree of the node, NAME is the name of the information (along with a pointer to the information), and RLINK is a pointer to the right subtree of the node. Generally, the names are stored in the tree in such a fashion that the postorder list of the nodes (left subtree, node, right subtree; also called symmetric order) is an alphabetized list of the names. For example, suppose the information to be stored is the documentation for various programming languages; the tree might be constructed as in Figure 2.

[Figure 2. A binary tree whose nodes hold programming-language names, among them LISP, COBOL, BCPL, CUPL, GEDANKEN, RUSH, NUCLEOL, SNOBOL, and FORTRAN.]
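As an illustration of the node layout of Figure 1 and of the ordering property just described, here is a minimal Python sketch. The class name `Node`, the function `symmetric_order`, and the five-name example tree are assumptions made for this illustration; the tree is not the one in Figure 2.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    """One tree node with the three fields of Figure 1."""
    name: str
    llink: Optional["Node"] = None  # left subtree, None plays the role of the null link
    rlink: Optional["Node"] = None  # right subtree


def symmetric_order(t: Optional[Node]) -> Iterator[str]:
    """Yield names in the order: left subtree, node, right subtree."""
    if t is not None:
        yield from symmetric_order(t.llink)
        yield t.name
        yield from symmetric_order(t.rlink)


# A small hypothetical tree, built by hand so that the ordering property holds.
tree = Node(
    "FORTRAN",
    Node("COBOL", Node("BCPL")),
    Node("LISP", None, Node("SNOBOL")),
)
names = list(symmetric_order(tree))
print(names)
```

If the names are stored as described, the traversal lists them alphabetically.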
To find a given name in the tree we perform the following algorithm, which is closely related to postorder traversal of the nodes:

Algorithm S

Let T be a pointer to the tree to be searched and let ITEM be the name for which we are searching.

Step 1 (Initialize) Set P ← T.

Step 2 (Is it the root?) If P = Λ then the name we are looking for is not in the tree and we are done. If NAME(P) = ITEM then we have found the name and we are done.

Step 3 (Go down one level) If NAME(P) > ITEM set P ← LLINK(P). If NAME(P) < ITEM set P ← RLINK(P). Go back to Step 2.

One special case of this method is the binary search and its variants.

[Figure 3. A less balanced tree: the nodes BCPL, COMIT, DIALOG, GEDANKEN, NUCLEOL, PL/1, and SNOBOL form a chain of right sons starting at the root BCPL, and they have the left sons ALGOL, COBOL, CUPL, FORTRAN, LISP, OL/2, and RUSH, respectively.]

It is clear that if we apply Algorithm S to the tree in Figure 2 then at most five comparisons will be needed to determine whether or not a given name is in the tree. On the other hand, had the tree structure been as shown in Figure 3, as many as eight comparisons might have been required. Clearly the structure of the tree in Figure 2 is more desirable.

In general, it is desirable to have the tree structure as "balanced" as possible. Thus from the criterion of searching the tree, the optimal structure for n names, assuming each name is equally probable, would be the complete binary tree of n nodes; then Algorithm S is a binary search. (See page 401 in Volume 1 of Knuth.) However, the names to be stored are usually dynamic in nature and so one must allow for frequent additions and deletions to the trees; if we require the tree to be completely balanced, then the addition or deletion of a name would require the complete restructuring of the tree.
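Algorithm S above can be sketched in Python as follows. This is an illustrative rendering rather than part of the original notes: the names `Node` and `search`, the use of `None` for Λ, and the small example tree are assumptions. The function also reports how many nodes were examined, which is the quantity the discussion of Figures 2 and 3 calls "comparisons".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    name: str
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def search(t: Optional[Node], item: str) -> Tuple[Optional[Node], int]:
    """Return (node holding item or None, number of nodes examined)."""
    p = t                      # Step 1: start at the root
    examined = 0
    while p is not None:       # Step 2: a null link means the name is absent
        examined += 1
        if p.name == item:     # Step 2: found the name
            return p, examined
        # Step 3: go down one level, left if the node's name is larger
        p = p.llink if p.name > item else p.rlink
    return None, examined


# Hypothetical example tree with seven names.
tree = Node(
    "GEDANKEN",
    Node("COMIT", Node("BCPL"), Node("DIALOG")),
    Node("PL/1", Node("NUCLEOL"), Node("SNOBOL")),
)
found, steps_found = search(tree, "DIALOG")
missing, steps_missing = search(tree, "AMBIT")
```

Here `found` is the node for DIALOG, while `missing` is `None` because the search for AMBIT runs off the tree at the left link of BCPL, which is exactly the place where a new name would be attached.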
For example, the completely balanced tree for the names in the trees given in Figures 2 and 3 is given in Figure 4.

[Figure 4. The completely balanced tree, level by level from the root: GEDANKEN; COMIT, PL/1; BCPL, DIALOG, NUCLEOL, SNOBOL; ALGOL, COBOL, CUPL, FORTRAN, LISP, OL/2, RUSH.]

Adding a new language name, say AMBIT, would require a complete restructuring of the tree to obtain the tree illustrated in Figure 5.

[Figure 5. The completely balanced tree after adding AMBIT, level by level from the root: FORTRAN; COBOL, OL/2; AMBIT, CUPL, LISP, RUSH; ALGOL, BCPL, COMIT, DIALOG, GEDANKEN, NUCLEOL, PL/1, SNOBOL.]

Clearly there is a tradeoff between the maximum time required to add and delete items and the maximum time required to retrieve an item. To summarize, if there are n names then the minimal retrieval time is approximately log_2 n comparisons for a balanced tree, but addition and deletion of nodes requires a complete reorganization of the tree, taking a number of steps proportional to n. On the other hand, if the tree is allowed to grow randomly, so that addition and deletion of nodes is facilitated, then retrieval time can take as long as n/2 comparisons.

AVL Trees

About ten years ago Adel'son-Vel'skii and Landis proposed a tree structure which provides a good compromise between the two extremes of complete balancing and unrestricted growth. Their structures, now known as AVL trees, are binary trees which have the property that the heights of the two subtrees of any node in the tree differ by at most one. For example, the following trees are not AVL:

[Figure: examples of trees that are not AVL.]

On the other hand, the following trees are AVL:

[Figure: examples of trees that are AVL.]

To discover an upper bound on the number of comparisons required to retrieve a name in the worst case, we calculate the least number of elements required to form an AVL tree of k levels. Such a tree is sometimes called a mintree and will be symbolized M_k. A mintree of k levels may be constructed by taking one item as the root of the tree and placing a mintree of height k-1 as one subtree and a mintree of height k-2 as the other subtree (compare this with Fibonacci trees).
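The mintree construction just described can be tried out with a short Python sketch. It is only an illustration under stated assumptions: the names `Shape`, `mintree`, `height`, `size`, and `is_avl` are invented here, nodes carry no names because only the shape matters, and the height of a tree is taken to be its number of levels (an empty tree has height 0).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shape:
    """A tree node without a name; only the shape of the tree is modelled."""
    llink: Optional["Shape"] = None
    rlink: Optional["Shape"] = None


def mintree(k: int) -> Optional[Shape]:
    """Build a mintree of k levels from mintrees of k-1 and k-2 levels."""
    if k <= 0:
        return None
    return Shape(mintree(k - 1), mintree(k - 2))


def height(t: Optional[Shape]) -> int:
    """Number of levels; the empty tree has height 0."""
    return 0 if t is None else 1 + max(height(t.llink), height(t.rlink))


def size(t: Optional[Shape]) -> int:
    """Number of nodes in the tree."""
    return 0 if t is None else 1 + size(t.llink) + size(t.rlink)


def is_avl(t: Optional[Shape]) -> bool:
    """True if subtree heights differ by at most one at every node."""
    if t is None:
        return True
    balanced_here = abs(height(t.llink) - height(t.rlink)) <= 1
    return balanced_here and is_avl(t.llink) and is_avl(t.rlink)


sizes = [size(mintree(k)) for k in range(1, 7)]
print(sizes)  # node counts of the mintrees with 1 through 6 levels
```

Under these assumptions the node counts for 1 through 6 levels come out as 1, 2, 4, 7, 12, 20, each one less than a Fibonacci number, and every mintree passes the `is_avl` check. The recomputation of heights in `is_avl` is wasteful but keeps the sketch short.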
Counting the elements of this tree we get a recursive formula for N_k, the number of items in M_k:

    N_k = N_{k-1} + N_{k-2} + 1.

This equation can be solved by standard techniques, and with N_1 = 1 and N_2 = 2 the solution is

    N_k = (1 + 2/\sqrt{5}) ((1 + \sqrt{5})/2)^{k-1} + (1 - 2/\sqrt{5}) ((1 - \sqrt{5})/2)^{k-1} - 1.

The second term tends to zero as k grows, so N_k + 1 is approximately (1 + 2/\sqrt{5}) ((1 + \sqrt{5})/2)^{k-1}. Thus the height, h, of an AVL tree with N_k nodes is bounded above by

    h < (3/2) \log_2 (N_k + 1) - 1,

and so the maximum number of comparisons to retrieve a name is approximately (3/2) log_2 n, where n is the number of names in the tree. So the number of comparisons required in the worst case for retrieval in AVL trees compares favorably with the number required for completely balanced trees.

Addition of Nodes to AVL Trees

In order for AVL trees to be useful, we must show that it is not too difficult for the tree to change dynamically. This section gives an algorithm requiring time proportional to log_2 n to insert a new node in an AVL tree of n nodes; the difficulty in this problem is clearly the fact that the tree must be an AVL tree after the insertion of the new node.

We will suppose that we have been given an AVL tree, T, and a new name, N, to be added to that tree. Applying Algorithm S to find N in T, the algorithm terminates at Step 2 with P = Λ, P being the left or right son of some node in T; it is at this point in the tree that N should be added as a new leaf, provided that the augmented tree would still be an AVL tree.

In any AVL tree exactly one of three possible conditions will be true at any node of the tree:

C1. The heights of the left and right subtrees of the node are equal.

C2. The height of the right subtree of the node is one greater than the height of the left subtree of the node.

C3. The height of the right subtree of the node is one less than the height of the left subtree of the node.

The algorithm for the insertion of a new node will insert that node in the position where Algorithm S discovered it was not in the tree; that is, it will be added as the son of a node already in the tree. We will refer to the tree with the node added as the augmented tree.
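The three conditions C1, C2, and C3 can be expressed as a small Python function that inspects the heights of a node's two subtrees. This is a sketch for illustration only: the names `Node`, `height`, and `condition` are assumptions, and recomputing heights on demand is done purely for clarity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def height(t: Optional[Node]) -> int:
    """Number of levels; the empty tree has height 0."""
    return 0 if t is None else 1 + max(height(t.llink), height(t.rlink))


def condition(node: Node) -> int:
    """Return 1, 2, or 3 according to which of C1, C2, C3 holds at node."""
    diff = height(node.rlink) - height(node.llink)
    if diff == 0:
        return 1   # C1: both subtrees have the same height
    if diff == 1:
        return 2   # C2: the right subtree is one level taller
    if diff == -1:
        return 3   # C3: the left subtree is one level taller
    raise ValueError("heights differ by more than one: not AVL at this node")


leaf = Node("LISP")
right_heavy = Node("COBOL", None, Node("LISP"))
left_heavy = Node("LISP", Node("COBOL"))
```

With these hypothetical trees, `condition(leaf)` is 1, `condition(right_heavy)` is 2, and `condition(left_heavy)` is 3; a node whose subtrees differ in height by two or more raises an error, since none of the three conditions can hold there.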
If this new node was added as the left (right) son of a node which previously had only a right (left) son, then the tree is clearly still an AVL tree and no more has to be done. On the other hand, if this new node was added as the son (right or left) of what was previously a leaf, we may have destroyed the AVL property by increasing the length of a subtree. The crucial part of the insertion algorithm is to discover if the AVL property of the tree has been ruined and, if so, to restructure the tree to regain the AVL property.

The augmented tree is tested for the AVL property by going up the path from the newly added node to the root of the tree. This is facilitated in the algorithm by storing that path in a push down stack as the tree is being searched to find where the node is to be added. The construction of such a stack is a simple, inexpensive task, for in an AVL tree of a billion nodes a maximum of about fifty stack entries will be needed (see Exercise 3).

As we follow the path up from the newly added node, we must check at each node which of C1, C2, or C3 holds. This will be done by having each node contain, in addition to the NAME, LLINK, and RLINK, a field called COND which indicates which condition holds. As the path is traced upward from the new node, in addition to checking the conditions, we will also update them to take the new node into account. This is where the difficulties arise: What if in adding the new node we have lengthened a subtree which was already one longer than its brother subtree? At this point we must restructure the tree.

When we consider the existing left-right symmetries, we find that the only ways the tree can be unbalanced are given in the three cases below.

Case 1:

[Figure: Case 1 before rebalancing; the new node is labeled r.]

Here the new node, r, has been added to a leaf of the tree, causing a subtree containing that node to be too long.
The tree is rebalanced by changing it to

[Figure: Case 1 after rebalancing.]

Case 2:

[Figure: Case 2 before rebalancing. The left son of p has left subtree A of height n+1 and right subtree B of height n; the right subtree of p is subtree C of height n.]

Here the new node has been added in subtree A, causing the height of the left subtree of p to be n+2 while the height of the right subtree is n (see Exercise 4). The tree is rebalanced by changing it to

[Figure: Case 2 after rebalancing. The former left son of p is the root, with left subtree A of height n+1 and right son p; p has left subtree B of height n and right subtree C of height n.]

Case 3:

[Figure: Case 3 before rebalancing. The left son of p has left subtree A of height n+1 and a right son whose subtrees are B of height n+1 and C of height n; the right subtree of p is subtree D of height n+1.]

Here the new node has been added in subtree B and so the height of the left subtree of p is n+3 while the height of the right subtree is n+1. The tree is rebalanced by changing it to

[Figure: Case 3 after rebalancing. The node that was the right son of the left son of p is the root; its left son is the former left son of p, with subtrees A of height n+1 and B of height n+1; its right son is p, with subtrees C of height n and D of height n+1.]

(Why are these three cases exhaustive? In Case 3, what if the new node was added to subtree C? See Exercise 5.)

On the basis of the above description, we now give Algorithm I to insert a new node into an AVL tree, and restructure that tree (if need be) to keep it AVL. The algorithm will assume nodes of the form indicated in Figure 6,

    | NAME | COND | LLINK | RLINK |

    Figure 6.

where NAME, LLINK, and RLINK are as in Figure 1, and COND is equal to one, two, or three according as C1, C2, or C3 holds at that node.

Algorithm I: Insertion into AVL trees

NEW is the name to be added to the AVL tree pointed to by T. P will be used in the search through the tree to find out where NEW should be added; PP is always one step behind P in the tree, that is, PP will point to the father of the node pointed to by P. PATH is a push down stack used to store the nodes of the tree on the path from the newly added node up to the root. Each element on the stack is an ordered pair (p,q) where p is a pointer to the node and q is either "L" or "R", indicating which son of node p is followed in going down the path.

This algorithm uses the notation a ← b ← c to represent assigning the value of c to both a and b.
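The restructuring described for Cases 2 and 3 above corresponds to what are now usually called a single and a double rotation. The following Python sketch shows only these two pointer rearrangements; it is not Algorithm I, it does not maintain the COND field, and the names `Node`, `rebalance_case2`, `rebalance_case3`, `height`, and `names_in_order` are assumptions made for the illustration. Each function returns the new root of the subtree, which the caller would store in the father's link.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def height(t: Optional[Node]) -> int:
    """Number of levels; the empty tree has height 0."""
    return 0 if t is None else 1 + max(height(t.llink), height(t.rlink))


def names_in_order(t: Optional[Node]) -> List[str]:
    """Names in left subtree, node, right subtree order."""
    if t is None:
        return []
    return names_in_order(t.llink) + [t.name] + names_in_order(t.rlink)


def rebalance_case2(p: Node) -> Node:
    """Case 2: the left son q of p becomes the root and p its right son."""
    q = p.llink
    p.llink = q.rlink   # subtree B moves under p
    q.rlink = p
    return q


def rebalance_case3(p: Node) -> Node:
    """Case 3: r, the right son of the left son of p, becomes the root."""
    q = p.llink
    r = q.rlink
    q.rlink = r.llink   # subtree B moves under q
    p.llink = r.rlink   # subtree C moves under p
    r.llink = q
    r.rlink = p
    return r


# Case 2 with n = 0: subtree A is the single node "a"; B and C are empty.
case2 = Node("c", Node("b", Node("a")))
fixed2 = rebalance_case2(case2)

# Case 3 with n = 0: A = "a", B = "c", C empty, D = "g".
case3 = Node("f", Node("b", Node("a"), Node("d", Node("c"))), Node("g"))
fixed3 = rebalance_case3(case3)
```

In both examples the rearranged tree is one level shorter than before and lists its names in the same order, so the alphabetical ordering of the tree is preserved. The mirror-image variants would swap the roles of `llink` and `rlink`.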
Also, if S is a variable whose value is either "L" or "R" then S·LINK is either LLINK or RLINK according to the value of S. For example, when S = "L",

    S·LINK(P) ← P

has the same effect as

    LLINK(P) ← P.

Step 1 (Initialize) Set P ← T, PATH ← PP ← Λ.

Step 2 (Save the path) If PP ≠ Λ then PATH ⇐ (PP,S).

Step 3 (Have we found the place?) If P = Λ, we have found the place to add the new node: go to Step 5.

Step 4 (Go down the tree) If NAME(P) = NEW then the name is already in the tree so it needn't be added and we are done. Otherwise set PP ← P and
    if NAME(P) < NEW set P ← RLINK(P), S ← "R";
    if NAME(P) > NEW set P ← LLINK(P), S ← "L".
Go back to Step 2.

Step 5 (Add new node to the tree)
    X ⇐ AVAIL
    NAME(X) ← NEW
    COND(X) ← 1
    LLINK(X) ← RLINK(X) ← Λ
    S·LINK(PP) ← X

Step 6 (Update condition indicators) If PATH = Λ, we are done. Otherwise, (P,S) ⇐ PATH and do as specified in the table below.

    COND(P)   S = "R"                               S = "L"
    1         COND(P) ← 2; repeat Step 6            COND(P) ← 3; repeat Step 6
    2         Go to Step 7 to rebalance the tree    COND(P) ← 1 and we are done
    3         COND(P) ← 1 and we are done           Go to Step 7 to rebalance the tree

Step 7 (Case 1 and its symmetric variants)
If NAME(LLINK(LLINK(P))) = NEW then
    COND(P) ← COND(LLINK(P)) ← COND(LLINK(LLINK(P))) ← 1
    RLINK(LLINK(P)) ← P
    if PATH = Λ, set T ← LLINK(P), otherwise (F,S) ⇐ PATH, S·LINK(F) ← LLINK(P)
    LLINK(P) ← Λ
and we are done.
If NAME(RLINK(LLINK(P))) = NEW then
    COND(P) ← COND(LLINK(P)) ← COND(RLINK(LLINK(P))) ← 1
    LLINK(RLINK(LLINK(P))) ← LLINK(P)
    RLINK(RLINK(LLINK(P))) ← P
    if PATH = Λ, set T ← RLINK(LLINK(P)), otherwise (F,S)