 
 
COO-1018-1178

Report No. 321

SUGGESTIONS FOR USE OF A
PARTICULAR DIRECTORY SCHEME

by

D. Austin Henderson, Jr.*
David E. Gold

April 10, 1969+

Department of Computer Science
University of Illinois
Urbana, Illinois 61801

*Presently at Massachusetts Institute of Technology, Project MAC.
+Document originally completed in April 1967.
 
Abstract 
 
The storage/directory method described in this paper allows flexible
modification of a rapidly growing directory while keeping the number of
searches needed to locate an item whose key is stored in the directory
reasonably small. A mathematical analysis shows that the average number
of searches is of the same order of magnitude as for a binary-chop method
of table look-up, while the directory itself need not be reordered to
accommodate new entries. The analysis is followed by a series of
procedures, and suggestions for procedures, which should prove useful
when employing this method.
 
 Introduction 
 
Two classical directory schemes are the binary-chop and
sequential-link pointer* methods. The method to be described, which is
essentially that of Hibbard [1], embodies many of the desirable features
of these two schemes while avoiding many of their disadvantages. This
storage/directory system will be referred to as the Binary-Chop Pointer
(BCP) method. As will be shown later in this paper, the BCP method
reduces to either the binary-chop or the sequential-pointer linked
directory method at the two ends of its functional spectrum.
 
 General 
 
The binary-chop search method is a highly efficient directory
look-up procedure. Given a directory of n elements in collating sequence,
this scheme locates a key being searched for in, on the average,
$\log_2 n - 1$ searches. The main disadvantages of this type of table are
the necessary reordering of the entries as additions are made and the
necessity of using contiguous storage space for the table.

The sequential-pointer linked directory allows for rapid
modification and use of non-contiguous storage by merely setting
pointers. On the other hand, the search of a table of this type is
necessarily a linear process and hence requires, on the average, n/2
searches for a table of n entries. A BCP table allows for addition of
entries with no reordering, and does not require contiguous storage. The
average number of searches nevertheless remains small, being less than
$1.38 \log_2 n$.

*i.e., a directory in which a pointer in each entry indicates the next
entry.
 
 Format of the Table 
 
A BCP table is made up of n entries, each entry consisting of
the following items:

1) the key on which the search is made.

2) a low pointer, or LOP.

3) a high pointer, or HIP.

4) the argument which is associated with the key in that entry.

The LOP in an entry refers (points) to an entry whose key is
low with respect to the present entry's key. Similarly, the HIP in an
entry points to an entry whose key is high with respect to the present
key. Thus, an examination of a particular key in the table during a
search operation results in a ternary decision branch:

i) an equal compare, in which case the argument corresponding
to that entry is retrieved.
ii) the key being searched for is lower than the one being
examined, in which case the LOP indicates the location
of the next key to be examined.
iii) the key being searched for is higher than the one being
examined, in which case the HIP indicates the location of
the next key to be examined.
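
The entry format and the ternary branch can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Entry` and `search`, the use of a dictionary from location to entry, and the `arg-...` argument values are not from the report. The sample table is the five-name table of the example that follows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    key: str
    argument: str
    lop: Optional[int] = None  # location of an entry with a lower key; None = blank
    hip: Optional[int] = None  # location of an entry with a higher key; None = blank


def search(table, key, top=1):
    """Follow LOP/HIP pointers from `top`; return (argument, entries examined)."""
    examined = 0
    loc = top if top in table else None
    while loc is not None:
        entry = table[loc]
        examined += 1
        if key == entry.key:              # i) equal compare
            return entry.argument, examined
        # ii) lower: follow the LOP; iii) higher: follow the HIP
        loc = entry.lop if key < entry.key else entry.hip
    return None, examined                 # reached a blank pointer: key is absent


# Hypothetical encoding of the five-name example table (location -> entry).
table = {
    1: Entry("MAC", "arg-MAC", lop=2, hip=3),
    2: Entry("BOB", "arg-BOB", hip=5),
    3: Entry("PETE", "arg-PETE", hip=4),
    4: Entry("SAM", "arg-SAM"),
    5: Entry("JOE", "arg-JOE"),
}

print(search(table, "JOE"))   # JOE is reached via MAC and BOB
print(search(table, "MIKE"))  # MIKE is not in this table
```

The second value returned is the number of entries examined, which is the quantity the later analysis averages.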
 
Example:

Assume that the following five names appear as keys in a BCP table:

MAC, BOB, PETE, SAM, JOE

The table might be:

Location   Key    LOP   HIP   Argument
   1       MAC     2     3       XX
   2       BOB           5       XX
   3       PETE          4       XX
   4       SAM                   XX
   5       JOE                   XX

If one desires to insert an entry whose key is MIKE, the
following operations occur:

1) Insert the entry in an available storage location (in this
case, location 6) with the LOP and HIP blank.

2) Compare MIKE to the key at location 1. Examine the
appropriate pointer (in this case the HIP, since MIKE > MAC).
If it is not blank, continue the search at the indicated
location (i.e., now make the comparison at location 3).

3) When the examined pointer is blank, the location of the key
being added is inserted there. In this case, MIKE < PETE and
the LOP at location 3 is blank, so it is set to 6. The
resultant table is:
 
Location   Key    LOP   HIP   Argument
   1       MAC     2     3       XX
   2       BOB           5       XX
   3       PETE    6     4       XX
   4       SAM                   XX
   5       JOE                   XX
   6       MIKE                  XX
 
Suppose one started with no entries in the table. If one then
inserted MAC, followed by BOB, PETE, SAM, JOE, MIKE, in that order, the
table above would result. However, if one were to insert the entries in
the order BOB, MIKE, MAC, JOE, SAM, PETE, one would obtain the table:
 
Location   Key    LOP   HIP
   1       BOB           2
   2       MIKE    3     5
   3       MAC     4
   4       JOE
   5       SAM     6
   6       PETE

 
 Note that the structure of a BCP table is highly dependent upon 
 the ordering of keys as they are entered. 
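
The insertion steps and the dependence on entry order can be checked with a short sketch. It assumes, for illustration only, that locations are handed out sequentially from 1 and that a duplicate key is an error; the helper names `insert`, `build` and `pointers` are hypothetical. The second ordering used below is the one that reproduces the six-location table beginning with BOB.

```python
def insert(table, key, argument="XX"):
    """Add `key` to a BCP table held as {location: entry dict}; return its location."""
    new_loc = len(table) + 1              # next free location (no deletions assumed)
    cur = 1 if table else None
    while cur is not None:
        entry = table[cur]
        if key == entry["key"]:
            raise KeyError(f"duplicate key {key!r}")
        side = "lop" if key < entry["key"] else "hip"
        if entry[side] is None:           # blank pointer found: link the new entry here
            entry[side] = new_loc
            break
        cur = entry[side]                 # otherwise continue at the indicated location
    table[new_loc] = {"key": key, "lop": None, "hip": None, "arg": argument}
    return new_loc


def build(keys):
    table = {}
    for key in keys:
        insert(table, key)
    return table


def pointers(table):
    """Compact view of a table: {location: (key, LOP, HIP)}."""
    return {loc: (e["key"], e["lop"], e["hip"]) for loc, e in table.items()}


first = build(["MAC", "BOB", "PETE", "SAM", "JOE", "MIKE"])
second = build(["BOB", "MIKE", "MAC", "JOE", "SAM", "PETE"])
for loc, row in pointers(first).items():
    print(loc, row)
for loc, row in pointers(second).items():
    print(loc, row)
```

Both tables hold the same six keys, yet their pointer structures differ because only the order of entry differs.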
 
The algorithm used in determining the location of the pointer to
the new entry in the last example is the same method used to search the
table to retrieve an entry already in the table; an equal compare results
when the desired entry is encountered. An example of this basic algorithm
was used by Knowlton [2]. A convenient form for representing the BCP
table is a tree in which each node corresponds to an entry in the table.
From each node there may be both left and right arrows, corresponding to
the LOP and HIP respectively. The tree representation of the last table
is shown in Figure 1. A node X which has a pointer to a node Y is said to
subsume node Y and all nodes subsumed by Y.
 
 Analysis 
 
A measure of the worth of a table searching algorithm is the
average time required to locate an item in the table. Here, this time
is directly proportional to the average number of entries examined in
obtaining a required entry. For a given table this is the average of the
search times to each node in the tree that the table represents. A search
time of 1 is assumed to the top node, 2 to each node subsumed immediately
below this, and so on; a table with no nodes is defined to have a
search time of 0 to each element, and hence an average search time of 0.
The analysis is based on the notion that for a table of n+1 entries there
are exactly (n+1)! different orders in which the entries can be made. For
simplicity and without loss of generality it is assumed that the values
of the keys of the entries are the integers 1 through n+1. Every order of
entry is assumed equally likely, i.e., to occur with a probability of
1/(n+1)!.
 
Each ordering of entries determines exactly one tree structure of
the table. The (n+1)! resulting trees are partitioned according to which
of the (n+1) integers is entered first and hence becomes the top node.
Clearly there are n! trees in each block of this partitioning; each block
is equally likely.

The top node in the tree determines the number of elements to
the left of the top, and the number to the right. For example, for
tables consisting of the 5 keys 1, 2, 3, 4, 5, if 4 is the top node, the
subtree to the left of 4 will have 1, 2, and 3 in it, and that to the
right will have the node 5.

In any given block, say the (k+1)th, all trees can be
characterized by the form shown in Figure 2; i.e., the top node is k+1,
the k nodes for 1 to k are in the subtree to the left, and the (n-k)
nodes for k+2 to n+1 are in the subtree to the right.

Within the block we will have all trees of this form. Hence, on
the average, the search time to an element in the left subtree will be
S(k)+1, where S(k) is the average search time to an element in a table
with k elements. There will be k such elements. Similarly, the average
search time to an element in the right subtree will be S(n-k)+1, and
there will be (n-k) such elements. The search time to the top element,
k+1, is 1.
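
For small tables the averaging described above can be carried out by brute force, which gives a check on the analysis. The sketch below is an illustration, not part of the original report: it enumerates every order of entry of the keys 1 through n, builds the corresponding tree, and averages the search times exactly. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def total_search_time(order):
    """Sum of search times (top node = 1) in the tree built by entering `order`."""
    lop, hip = {}, {}                     # key -> key of the node it points to
    top = None
    total = 0
    for key in order:
        depth = 1
        if top is None:
            top = key
        else:
            cur = top
            while True:
                depth += 1                # the new key lies at least one level below cur
                side = lop if key < cur else hip
                if cur not in side:       # blank pointer: the new key is linked here
                    side[cur] = key
                    break
                cur = side[cur]
        total += depth
    return total


def average_search_time(n):
    """Exact S(n): average over all n! orders of entry and all n elements."""
    if n == 0:
        return Fraction(0)
    orders = list(permutations(range(1, n + 1)))
    return Fraction(sum(total_search_time(o) for o in orders), n * len(orders))


for n in range(6):
    print(n, average_search_time(n))
```

Because the enumeration grows as n!, this is practical only for very small n; it is meant purely as a cross-check.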
 
Let the average search time over the (k+1)th block of the set of
all trees having n+1 elements be denoted $S_{k+1}(n+1)$. Then

\[
S_{k+1}(n+1) = \frac{k[S(k)+1] + (n-k)[S(n-k)+1] + 1}{n+1}
             = \frac{kS(k) + (n-k)S(n-k) + n + 1}{n+1}
\]

Now to obtain the average search time over all blocks we take
an average (unweighted, as all blocks are equally likely) of
$S_{k+1}(n+1)$. Thus, we arrive at the basic recursion formula:

\[
S(n+1) = \frac{1}{n+1} \sum_{k=0}^{n} \frac{kS(k) + (n-k)S(n-k) + n + 1}{n+1}
       = \frac{\sum_{k=0}^{n} [kS(k) + (n-k)S(n-k)]}{(n+1)^2} + 1 \qquad (1)
\]

[Figure 2: general form of the trees in the (k+1)th block]

 
As the summation index k runs from 0 to n, kS(k) runs from 0S(0) to
nS(n) and (n-k)S(n-k) runs from nS(n) to 0S(0). Also 0S(0) = 0. This
leads to

\[
S(n+1) = \frac{2\sum_{k=1}^{n} kS(k)}{(n+1)^2} + 1 \qquad (2)
\]
 
This formula involves, in the calculation of S(n+1), the use of the
values S(1) ... S(n). A more useful formula involving only S(n) can be
obtained as follows:

\[
S(n+1) = \frac{2\sum_{k=1}^{n} kS(k)}{(n+1)^2} + 1
\]

Or:

\[
(n+1)^2 [S(n+1) - 1] = 2\sum_{k=1}^{n} kS(k) \qquad (3)
\]
\[
= 2nS(n) + 2\sum_{k=1}^{n-1} kS(k)
\]

But from (3) with n replaced by n-1:

\[
2\sum_{k=1}^{n-1} kS(k) = n^2 [S(n) - 1]
\]

Hence,

\[
(n+1)^2 [S(n+1) - 1] = 2nS(n) + n^2 [S(n) - 1]
\]
\[
S(n+1) - 1 = \frac{2nS(n) + n^2 S(n) - n^2}{(n+1)^2}
\]
\[
S(n+1) = \frac{(n+2)n\,S(n) - n^2 + (n+1)^2}{(n+1)^2}
       = \frac{(n+2)n\,S(n) + 2n + 1}{(n+1)^2} \qquad (4)
\]

This formula can also be written:

\[
(n+1)\,S(n+1) = \frac{(n+2)n\,S(n) + 2n + 1}{n+1}
\]

And clearly S(n) behaves as log n for n sufficiently large.
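
As a check on the algebra, formulas (2) and (4) can be evaluated side by side in exact rational arithmetic. The sketch below is illustrative only; the function names are hypothetical. The closed form in `s_closed_form`, written with harmonic numbers, does not appear in the report; it is the standard result for this recurrence and is included only as an independent cross-check.

```python
from fractions import Fraction


def s_by_formula_2(n_max):
    """S(0..n_max) from (2): S(n+1) = 2*sum_{k=1..n} k*S(k) / (n+1)^2 + 1."""
    s = [Fraction(0)]
    weighted_sum = Fraction(0)            # running value of sum_{k=1..n} k*S(k)
    for n in range(n_max):
        s.append(2 * weighted_sum / (n + 1) ** 2 + 1)
        weighted_sum += (n + 1) * s[-1]
    return s


def s_by_formula_4(n_max):
    """S(0..n_max) from (4): S(n+1) = ((n+2)*n*S(n) + 2n + 1) / (n+1)^2."""
    s = [Fraction(0)]
    for n in range(n_max):
        s.append(((n + 2) * n * s[n] + 2 * n + 1) / Fraction((n + 1) ** 2))
    return s


def s_closed_form(n):
    """Assumed cross-check, not from the report: S(n) = 2*(1 + 1/n)*H_n - 3."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (1 + Fraction(1, n)) * harmonic - 3


values = s_by_formula_4(8)
for n, value in enumerate(values):
    print(n, value, float(value))
```

Formula (4) needs only the previous value, so it is the convenient one for tabulating S(n) over a large range.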
 
Now, knowing that S(n) = k ln n in the limit, the value of k
can be determined from (4). Clearly:

\[
k \ln(n+1) \approx \frac{(n+2)n}{(n+1)^2}\, k \ln(n) + \frac{2n+1}{(n+1)^2}
\]

As $n \to \infty$,

\[
\frac{(n+2)n}{(n+1)^2} \to 1 .
\]

Also

\[
\ln(n+1) = \ln(n) + \ln\left(1 + \frac{1}{n}\right)
         = \ln(n) + \frac{1}{n} - \frac{1}{2n^2} + \cdots
\]

Hence

\[
k \ln(n) + \frac{k}{n} - \frac{k}{2n^2} + \cdots = k \ln(n) + \frac{2n+1}{(n+1)^2}
\]

Comparing first order terms in:

\[
\frac{k}{n} - \frac{k}{2n^2} + \cdots = \frac{2}{n} - \frac{3}{n^2} + \cdots
\]

gives, in the limit as $n \to \infty$, $k \to 2$.
 
However, by use of the formula exactly as calculated by a computer, we
obtain the results shown in Figure 3. Hence, for tables in the range of
normal usage, say up to $10^7$ entries,

\[
k < 1.83, \quad \text{and} \quad S(n) < 1.83 \ln(n) = 1.31 \log_2(n).
\]
 
[Figure 3: results of the exact recursion formula as calculated by computer]

 
Deletions in a BCP Table

Deletion of an item from a basic BCP table is not always a
trivial operation. A node can have one of three pointer structures within
the BCP table. These are represented graphically in Figures 4(a) - 4(c),
where X is the node to be deleted.

The deletion in case a) is merely the removal of the pointer to
node X. In case b), the pointer to node X is reset to point to node Y,
as shown schematically in Figure 4(d).
 
 In either case, the entry corresponding to the deleted node X 
 will, in general, occur somewhere imbedded in the actual BCP table. This 
 space is now freed and may be used for a subsequent entry (addition) in 
 the table. 
 
 Case c) is somewhat more difficult to handle because there 
 exists a single pointer to X but two from X. Clearly this single pointer 
 cannot be made to point to two different nodes at the same time. 
 
One possible solution would be to reset this lone pointer to
point to one of the two nodes subsumed by X, say the left (node Y). A
pointer to the other subsumed node (node Z) would be established at the
first node not having a HIP in use which is reached by successively
following HIP's from Y. (Note that any other form of linking node Z from
a node subsumed by Y will yield an incorrectly structured BCP table. In
particular, when there exists any other linkage to Z, the Z entry is
irretrievable.) This solution is undesirable because the search path
to Z and any node subsumed by it must now go through Y and, in general,
other nodes subsumed by Y.

As an example, consider the representation of a BCP table shown
in Figure 5(a), where the number of searches for each node is listed
below the respective node.
 
 It is desired to delete node e. If the pointer is now changed 
 to indicate the left node subsumed by e, the resultant situation is 
 shown in Figure 5(b). For the case in which the pointer is reset to 
 indicate the right node first, the table is represented by Figure 5(c). 
 
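[Figure 4(a)-(c): the three pointer structures of a node X to be deleted;
Figure 4(d): the result of deletion in case b)]

[Figures 5(a)-5(d): example BCP table before and after deletion of node e
(* indicates null node)]
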
 
A more acceptable solution to the problem is reached by
allowing the "image" of the deleted node to remain for comparison
purposes. The entry in the table corresponding to this node is flagged
to indicate that it no longer exists as a standard entry in the table,
but rather is there only to allow the search algorithm to continue to
reach nodes subsumed by it. Such a node is called a null node. The
tree representing such a deletion is shown in Figure 5(d). Note that
a null node may be entirely deleted if one or both of the two nodes
subsumed by it becomes deleted through some sequence of later
deletions. When this occurs, the null node is handled as in the earlier
non-troublesome cases (cases a and b above).
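
The three deletion cases, together with the later clean-up of a null node, might be sketched as follows. This is an assumed illustration and not the authors' algorithm: the names `Node`, `find`, `delete` and `_remove`, the boolean `null` flag, and the choice to return the (possibly new) top location are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    lop: Optional[int] = None
    hip: Optional[int] = None
    null: bool = False        # True once the entry survives only as an "image"


def find(table, top, key):
    """Return (location, parent location, side) for `key`; location None if absent."""
    parent, side, loc = None, None, top
    while loc is not None and table[loc].key != key:
        parent, side = loc, ("lop" if key < table[loc].key else "hip")
        loc = getattr(table[loc], side)
    return loc, parent, side


def _remove(table, top, key):
    """Cases a) and b): free the entry and re-point its single incoming pointer."""
    loc, parent, side = find(table, top, key)
    node = table.pop(loc)                          # this space is now free
    child = node.lop if node.lop is not None else node.hip
    if parent is None:
        return child                               # the top node itself was removed
    setattr(table[parent], side, child)
    p = table[parent]
    if p.null and (p.lop is None or p.hip is None):
        return _remove(table, top, p.key)          # a null node can now go as well
    return top


def delete(table, top, key):
    """Delete `key`; return the location of the top node afterwards."""
    loc, _, _ = find(table, top, key)
    if loc is None or table[loc].null:
        raise KeyError(key)
    node = table[loc]
    if node.lop is not None and node.hip is not None:
        node.null = True                           # case c): keep the image as a null node
        return top
    return _remove(table, top, key)


table = {1: Node(4, 2, 3), 2: Node(2, 4, 5), 3: Node(6), 4: Node(1), 5: Node(3)}
top = delete(table, 1, 2)      # key 2 subsumes two nodes: it becomes a null node
top = delete(table, top, 1)    # now the null node has one subsumed node and is removed
print(top, table)
```

A retrieval routine built on `find` would have to treat a hit on a node with `null` set as a miss.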
 
 It is also possible to re-utilize the space in the table 
 containing a null node entry when a subsequent addition to the table 
 falls anywhere in the allowable range. The authors have written an 
 algorithm which does this in a manner which is essentially no more 
 complicated than the standard search algorithm. 
 
 Balance of a BCP Table 
 
 The structure of a BCP table varies between two extremes. 
 These endpoints are referred to as best case and worst case conditions. 
The best case condition occurs when S is a minimum for a given number
of entries in the table and is the same (with respect to search time)
as a binary-chop method. This case is equivalent to the method
described by Brooks and Iverson [3]. A schematic representation for such
a table containing seven keys (the integers 1 - 7) is shown in
Figure 6(a). Note that the tree corresponding to this condition is
unique only when the number of entries in the table is one less than
a power of two.
 
 The worst-case condition is satisfied when S is a maximum for 
 a given number of entries in the table. The search time for the table 
 in this case is the same as for a sequential-linked-pointer directory. 
 Such a situation is that depicted in Figure 6(b). 
 
[Figure 6(a): best-case tree for the keys 1 - 7; Figure 6(b): worst-case tree]

 
One method of eliminating this situation comes to mind when 
 one considers that most tables or directories are not formed by starting 
 with an empty list and merely adding entries. In general an initial 
 table is established by some kind of declaration and this table is 
 modified by later adding or deleting entries. The entries thus 
 initially declared should be established in the table in a best-case 
 fashion. A practical alternative to this method is applicable in 
 cases where there exists previous knowledge as to the usage of the 
 initial entries. 
 
In such a situation, the approximate frequencies of use
(retrieval) could also be declared and this information would be used to
establish an optimum BCP table. Such a table will minimize S over the
retrieval of all entries in the table, where the retrieval of each entry
is weighted according to its frequency of use. Note that in general this
is not the same as a best-case table.

Regardless of the initial method used to start the table, the
tree representing the table might become highly asymmetric or skewed
through subsequent additions and/or deletions. One way to detect this
would be to establish a measure of skewness which would be calculated
periodically. When this measure exceeded some allowable limit, a garbage
collection routine could take over and restructure the table. Note that
a larger well-structured table will require more modifications than a
smaller one before becoming adversely skewed; hence a very large
directory need be examined for skewness less frequently than a smaller
one.
 
 A simple, although not necessarily best, measure of skewness 
 is merely S for a given table. This can then be compared to S for the 
 best-case condition and suitable action can then be taken. 
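
One way to realize this measure is sketched below, under assumptions that are not in the report: the tree is represented as nested tuples `(key, left, right)`, the skewness is taken as the ratio of S to the best-case S for the same number of entries, and the function names are hypothetical. What limit to allow on the ratio is left to the user.

```python
def average_search_time(tree):
    """Return (S, n) for a tree of nested tuples (key, left, right); None = blank."""
    total = count = 0
    stack = [(tree, 1)]                   # the top node has search time 1
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        _key, left, right = node
        total += depth
        count += 1
        stack.append((left, depth + 1))
        stack.append((right, depth + 1))
    return (total / count if count else 0.0), count


def best_case_search_time(n):
    """S for n entries when every level is filled before the next is started."""
    # Counting nodes level by level, the i-th node lies at depth i.bit_length().
    return sum(i.bit_length() for i in range(1, n + 1)) / n if n else 0.0


def skewness(tree):
    """Ratio of S to its best-case value; 1.0 means the table is best-case."""
    s, n = average_search_time(tree)
    return s / best_case_search_time(n) if n else 1.0


def leaf(key):
    return (key, None, None)


balanced = (4, (2, leaf(1), leaf(3)), (6, leaf(5), leaf(7)))
chain = None
for key in range(7, 0, -1):               # worst case: every LOP blank
    chain = (key, None, chain)

print(skewness(balanced), skewness(chain))
```

For the seven-key worst case the ratio is 28/17, since S is 4 for the chain and 17/7 for the best-case tree.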
 
 Multiple Keying 
 
A method suggested by Brooks and Iverson [3] for a particular
type of binary-chop table is also applicable to the BCP table, namely
multiple keying. In the BCP table this is equivalent to imagining many
columns instead of a single column for each entry (i.e., for a single
argument), each containing a key and a set of pointers. This allows
searching on any set of keys and can be thought of as an alternate
indexing scheme. Note that it is also possible to use one or more of
the keys of an entry as the actual argument in some cases. An example
of such a case would be telephone directory information, where the sets
of keys would be name, address, and telephone number, thus allowing the
retrieval of either or both of the remaining corresponding keys when
searching on any one of the three.
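
A minimal sketch of multiple keying follows. It is an illustration under assumptions: each entry carries one LOP/HIP pair per column, locations are list indices starting at 0, keys are unique within a column, and the names `add_entry` and `lookup` as well as the sample names and telephone numbers are made up for the example.

```python
def add_entry(table, tops, record):
    """Append `record` (dict column -> key) and link it into every column's tree."""
    loc = len(table)
    table.append({"keys": record, "ptr": {col: [None, None] for col in record}})
    for col, key in record.items():
        if col not in tops:               # first entry becomes the top for this column
            tops[col] = loc
            continue
        cur = tops[col]
        while True:
            side = 0 if key < table[cur]["keys"][col] else 1   # 0 = LOP, 1 = HIP
            nxt = table[cur]["ptr"][col][side]
            if nxt is None:
                table[cur]["ptr"][col][side] = loc
                break
            cur = nxt
    return loc


def lookup(table, tops, col, key):
    """Search on column `col`; return the whole record (all keys) or None."""
    cur = tops.get(col)
    while cur is not None:
        k = table[cur]["keys"][col]
        if key == k:
            return table[cur]["keys"]
        cur = table[cur]["ptr"][col][0 if key < k else 1]
    return None


table, tops = [], {}
add_entry(table, tops, {"name": "MAC", "number": "555-0142"})
add_entry(table, tops, {"name": "BOB", "number": "555-0199"})
add_entry(table, tops, {"name": "PETE", "number": "555-0107"})

print(lookup(table, tops, "number", "555-0107"))   # search by number, recover the name
print(lookup(table, tops, "name", "BOB"))          # search by name, recover the number
```

Here the keys themselves serve as the argument, as in the telephone directory case: a search on one column returns the remaining keys of the entry.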
 
 Segmentation 
 
When using a BCP table with multiple keys, it is not always
possible to obtain a system of segmenting the table which is universal
to all keys. In the telephone directory example, one would be inviting
trouble to suggest that the table can be broken into two tables by
merely inserting people with last names beginning with A-L in one table
and those with M-Z in the other. Which of these tables does one look in
when one wants to retrieve a name but knows just a telephone number?
 
One special feature of the BCP table is helpful in this
respect: if we always require null nodes when deleting items, all
pointers in the table point down in the table (i.e., away from the first
entry). The entire table can then be thought of as one long continuous
table, from which different segments may be retrieved and searched as
needed.
 In general, a pointer from an entry in a segment may point to an entry 
 in another segment. However, because these pointers only point down- 
 ward, it will never be necessary to call a segment into memory twice 
 to locate an entry. Because a pointer need not point from an entry in 
 a segment to an entry in the next segment, some segments might be 
 entirely passed over. It would also be possible to structure the initial 
 table (at declaration time) such that a minimal number of segments need 
 be entered (and hence retrieved). 
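
The effect of downward-only pointers on segment retrieval can be illustrated with a small sketch. The assumptions here are not from the report: segments are taken to be fixed blocks of consecutive locations, the function name `segments_visited` is hypothetical, and the table used is the six-entry MAC, BOB, PETE, SAM, JOE, MIKE table given earlier.

```python
def segments_visited(table, key, segment_size, top=1):
    """Return the segment numbers retrieved, in order, while searching for `key`.

    table maps location -> (key, LOP, HIP); segment s holds locations
    s*segment_size + 1 through (s + 1)*segment_size.
    """
    loaded = []
    loc = top
    while loc is not None:
        seg = (loc - 1) // segment_size
        if not loaded or loaded[-1] != seg:    # a new segment must be called in
            loaded.append(seg)
        k, lop, hip = table[loc]
        if key == k:
            break
        loc = lop if key < k else hip
    return loaded


table = {
    1: ("MAC", 2, 3),
    2: ("BOB", None, 5),
    3: ("PETE", 6, 4),
    4: ("SAM", None, None),
    5: ("JOE", None, None),
    6: ("MIKE", None, None),
}

for name in ("MIKE", "JOE", "SAM"):
    print(name, segments_visited(table, name, segment_size=2))
```

With two entries per segment, the search for JOE goes from the first segment directly to the third, so the middle segment is passed over entirely, and no segment number ever repeats.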
 
 
Multiple Entry Pointers 
 
Previously, the top node of the tree defining a table was the
initial entry in that table, and hence this entry contained the first key
to be examined in any search. This entry can be chosen at the initial
declaration of the directory from those entries thus declared, but when
multiple-keying the best candidate is not obvious. (Indeed, a best
candidate may not exist: the best choice with respect to one set of keys
might be the worst with respect to another.) A method of resolving this
is to assign an entry pointer for each column. In the case where the
table is to be segmented, there need only exist the further restrictions
that each of these pointers indicate an entry in the first segment and
that pointers not in this first segment point downward as before; i.e.,
the pointers in the first segment may point up or down. This creates no
problems because this first segment is always the first to be retrieved
and there are none before (higher than) it which could be recalled.
 
 Variable Segment Boundaries 
 
In searching through a segmented table, it is obviously
advantageous to minimize the number of segments which must be retrieved.
The fact that pointers only point downward can sometimes prove useful
here, too, depending upon the configuration of the table in tertiary
storage.
 
If the table is stored contiguously such that a number of
records comprise each segment, the segment boundaries may be varied to
optimize the transfer of data (table segments) into primary storage.
When a pointer indicates an entry which is outside of the segment
containing that pointer, the next segment to be retrieved is the one
starting with the record containing the new entry and continuing until
the proper number of records is reached. For example, suppose that
segments are made up of four records each, and that the entire table
is stored contiguously. If a pointer in the first segment indicates
an entry in the seventh record, the second segment (consisting of
records 5-8) need not be retrieved. Instead, a segment consisting
of records 7-10 can be retrieved. If this new entry's pointer now
routed the algorithm to an entry in the first half of the third segment
(records 9 and 10), no new segment need be immediately retrieved.
A new retrieval would have been necessary had records 5-8 been
retrieved.
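
The four-record example can be worked through with a short sketch that compares fixed and variable segment boundaries. The function name `retrievals` and the representation of a search as a list of record numbers are assumptions made for illustration.

```python
def retrievals(record_path, records_per_segment, variable=True):
    """Return the (first, last) record windows fetched while following record_path.

    With variable=True a new window starts at the record actually wanted;
    with variable=False windows are the fixed segments 1-4, 5-8, 9-12, ...
    """
    windows = []
    for rec in record_path:
        if windows and windows[-1][0] <= rec <= windows[-1][1]:
            continue                       # record already in primary storage
        if variable:
            first = rec
        else:
            first = ((rec - 1) // records_per_segment) * records_per_segment + 1
        windows.append((first, first + records_per_segment - 1))
    return windows


path = [1, 7, 9]   # entry in record 1 points to record 7, which points to record 9
print(retrievals(path, 4, variable=True))
print(retrievals(path, 4, variable=False))
```

With variable boundaries the search costs two retrievals (records 1-4 and 7-10); with fixed boundaries it costs three.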
 
 Duplications 
 
A whole area of thought is opened up when we consider the
effects of having keys duplicated in any column.
 
 If no special precautions are taken we will have an equal 
 compare resulting while adding an entry to the table. Normally this 
 would trigger an error condition. 
 
The simplest solution is to allow a duplicate key to be
considered as higher than the equal entry already in the table. This
inserts duplicates like any other entry. However, when searching to
locate an element, one would have to search all the way to the bottom of
the tree to determine whether there were any duplicate entries. A partial
solution is to flag entries which have duplicates; one need then continue
the search only when the flag shows that a duplicate exists. A further
disadvantage of this system is that to reach entries lower in the tree
one will in general have to search many duplicates, a time-wasting
procedure.
 
The best system is to provide a pointer from a node which is
being duplicated by a new entry to that new entry. If an entry has SAM
as a key, for example, a third pointer is provided to the next entry
with SAM as a key (the HIP and LOP being the first two). The first SAM
is flagged to indicate that a duplicate exists, or the presence or absence
of the third pointer (referred to as a duplicate pointer, or DUP) can be
tested. A third entry with SAM as a key will simply be pointed to in a
sequential-pointer-linked fashion from the second SAM-keyed entry, and
so forth. To allow this scheme, there must be storage provided (for each
column, if multiple-keying) in each original (non-duplicate) entry for
a possible later duplicate pointer. This usually represents a fairly
expensive use of storage.
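
The DUP-pointer scheme can be sketched as follows, assuming for illustration that every entry reserves a DUP field, that locations are list indices starting at 0, and that the presence of a DUP pointer is tested rather than a separate flag. The names `insert` and `find_all` and the argument values are hypothetical.

```python
def insert(table, key, argument):
    """Add an entry; a duplicate key is chained from the existing entry by DUP."""
    new = len(table)
    table.append({"key": key, "arg": argument, "lop": None, "hip": None, "dup": None})
    if new == 0:
        return new
    cur = 0
    while True:
        entry = table[cur]
        if key == entry["key"]:
            while entry["dup"] is not None:    # walk to the end of the DUP chain
                entry = table[entry["dup"]]
            entry["dup"] = new
            return new
        side = "lop" if key < entry["key"] else "hip"
        if entry[side] is None:
            entry[side] = new
            return new
        cur = entry[side]


def find_all(table, key):
    """Return the arguments of every entry with `key`, in order of entry."""
    cur = 0 if table else None
    while cur is not None and table[cur]["key"] != key:
        cur = table[cur]["lop" if key < table[cur]["key"] else "hip"]
    found = []
    while cur is not None:                     # follow the DUP chain sequentially
        found.append(table[cur]["arg"])
        cur = table[cur]["dup"]
    return found


table = []
for key, arg in [("MAC", "a1"), ("SAM", "a2"), ("BOB", "a3"), ("SAM", "a4"), ("SAM", "a5")]:
    insert(table, key, arg)

print(find_all(table, "SAM"))
```

The duplicates never enter the LOP/HIP structure, so searches for other keys are not lengthened by them; the cost is the reserved DUP field in every entry.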
 
 
A scheme which overcomes the necessity of wasting core on
possibly unused duplicate pointers is the following. The second
SAM-keyed entry has its HIP set to point to the entry which the HIP of
the first SAM-keyed entry indicated. The HIP in the original SAM-keyed
entry is then set to indicate the new entry. The LOP of the new entry can
be used as the first pointer in a chained set to the third and
higher-numbered SAM-keyed entries (see Figure 7). A drawback of this
system is that the pointer from the second SAM-keyed entry to the entry
indicated by the original HIP may be pointing physically upward in the
table. If segmentation is being used, this is unacceptable.
 
 If both the boundary conditions above are in force (segmenta- 
 tion, no space for DUP pointers) as may happen in large files, a solu- 
 tion may be achieved at the expense of some search time. When a new 
 entry duplicates an old, the old entry is flagged and the new entry 
 entered in a BCP table containing only duplicate entries. Space is left 
 in this table for DUP pointers. One must search both tables to locate 
 a duplicate. But hopefully this duplicate table will be small relative 
 to the main table and both of these draw-backs will be minimal. 
 
If one is multiple-keying, an entry may duplicate an existing
key in one column and be a new key in a second. One cannot store the
entry in two different tables at once unless they are interleaved.
This scheme necessitates having a pointer to the top node of the duplicate
table in each column, as these tables will be made up of keys from
different entries in each column. The following table illustrates this
concept of a multiple-keyed table with duplicate keys which is capable
of being segmented.
 
[Figure 7: chaining of duplicate SAM-keyed entries through the HIP and LOP]

 
The keys to be searched here are name and age:

           Name column                         Age column

      Loc.¹  Key   LOP  HIP  DUP²         Loc.¹  Key   LOP  HIP  DUP²
        1    MAC     3    5                  2   *22     8    6
        3    DOW    13    7           →³     4    22    24   22   10
        5    TIM     9   19                  6    29    12   16
        7    JIM                             8    19    14   18
        9    RON         15                 10    22
 →³    11    JIM    23        21            12    27
       13    BOB    17                      14    17
       15    SAM                            16    32         26
       17   *AL                             18   *20         20
       19    TOM                            20    21
       21    JIM    25                      22    27
       23    AL                             24    20
       25    JIM                            26    33

*indicates that a key has been flagged to show that it has been
duplicated.

Notes:

1 The locations here appear in sequential order only to simplify
the example. In actual practice, entries may occupy any fixed
size of memory, depending on length of keys, pointers, etc.

2 Space for the DUP is not left for entries in which none appears,
but only in those entries in which one exists in the table.

3 The pointers for entering the table to search on the two keys,
name and age, are assumed to be set at locations 1 and 2
respectively. The arrows indicate the pointers which reference
the first entries in the duplicate table.

 
If the added device of entering the key being duplicated
in the duplicate table as well is used, then when one desires to search
for a key which one knows to be duplicated, one can search the duplicate
table first, bypassing the (hopefully larger) search on the main table.
As one cannot recopy keys in other columns which are not duplicates,
one copies only the one column into the duplicate table and uses, as its
argument, a pointer (admittedly upward) to the original entry. As this
upward pointer is used only when the final entry has been retrieved,
we will have at most one segment of the table to recall.
 
 
Summary 
 
The Binary-Chop Pointer directory scheme seems to have
considerable promise. The authors have noted several possibilities and
have given suggestions for incorporating them into a computer system.
These results are by no means exhaustive but, rather, merely suggestive
of the possibilities of a little-explored method.
 
 
Bibliography 
 
1. Hibbard, T. N. Some Combinatorial Properties of Certain Trees
with Applications to Searching and Sorting. JACM 9 (Jan. 1962),
13-28.

2. Knowlton. Movie on L.

3. Brooks and Iverson. Automatic Data Processing, Ch. F: Searching
and Sorting.
 