Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/listmergingproce762holl 
 
Report No. UIUCDCS-R-75-762 NSF - OCA - DCR73-07980 A02 
 
 A LIST MERGING PROCESSOR 
 FOR INVERTED FILE INFORMATION RETRIEVAL SYSTEMS 
 
 Lee Allen Hollaar 
 
 October 1975 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 
Report No. UIUCDCS-R-75-762 
 
 A LIST MERGING PROCESSOR * 
 
 FOR INVERTED FILE INFORMATION RETRIEVAL SYSTEMS 
 
 by 
 
 Lee Allen Hollaar 
 
 October 1975 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801 
 
 * This work was supported in part by the National Science Foundation under 
 Grant No. US NSF DCR73-07980 A02 and was submitted in partial fulfillment 
 of the requirements for the degree of Doctor of Philosophy in Computer 
 Science, October 1975. 
 
A LIST MERGING PROCESSOR 
 FOR INVERTED FILE INFORMATION RETRIEVAL SYSTEMS 
 
 Lee Allen Hollaar, Ph.D. 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign, 1975 
 
 In large scale inverted file information retrieval systems implemented on 
 conventional digital computers, it is possible for more time to be spent 
 processing and merging the index lists than on any other activity. However, 
 this non-numeric processing cannot be performed efficiently by general purpose 
 digital computers, which either lengthens the response time of the system or 
 reduces the number of simultaneous users it can support. This paper describes 
 a simple processor which can efficiently process these lists while the main 
 computer devotes itself to other tasks. 
 
 It is also possible to combine these merge processors into networks which 
 can process complex expressions directly. Because intermediate results never 
 have to be stored and later refetched, the memory cycles otherwise spent on 
 them are eliminated, effectively increasing the available memory bandwidth. 
 
 Designs for both word-parallel and bit-serial implementations of the 
 merge processor are presented. Also discussed are the modifications necessary 
 for these processors to be connected as a network, in particular as a binary 
 tree, and algorithms for parsing both expressions which fit in the available 
 tree and expressions which are too large and must be subdivided. 
 
 
 "But you can't look up all those license numbers 
 in time," Drake objected. 
 
 "We don't have to, Paul. We merely arrange a list 
 and look for duplications." 
 
 -Perry Mason 
 (The Case of the Angry Mourner, 1951) 
 
 
 TABLE OF CONTENTS 
 
 CHAPTER 1 -- INTRODUCTION TO THE PROBLEM 

     1.1 Information Retrieval Queries 
     1.2 Database Structure 
     1.3 Zipf's Law and the Level of Inversion 
     1.4 Processing on Conventional Computers 

 CHAPTER 2 -- A SPECIALIZED LIST MERGING PROCESSOR 

     2.1 Previous List Merging Processors 
     2.2 A Simple List Merging System 
     2.3 Parallel Element Implementation 
     2.4 Serial Element Implementation 

 CHAPTER 3 -- MERGE PROCESSOR NETWORKS 

     3.1 Network Bandwidth Considerations 
     3.2 Merge Network Hardware Considerations 
     3.3 A Serious Problem 
     3.4 Parsing Expressions for a Fixed Tree Size 

 CHAPTER 4 -- FUTURE RESEARCH 

     4.1 Higher Bandwidth Memories 
     4.2 Multi-User Systems 

 CONCLUSIONS 

 LIST OF REFERENCES 

 VITA 
 
LIST OF FIGURES 
 
     1.1  Context Hierarchies 
     1.2  Logical Database Organization 
     1.3  Table of Word Frequency in "Ulysses" 
     1.4  Zipf Curve for "Ulysses" 
     1.5  Zipf Curves for State Statutes 
     1.6  Truncated Zipf Curve 
     1.7  Savings Using Complementary Lists 
     1.8  AND Subroutine from EUREKA System 
     1.9  List Entry Format for AND Subroutine 
     1.10 Operations Table 

     2.1  Stellhorn/Batcher Merge System 
     2.2  HARVEST Functional Units 
     2.3  Merge System Configuration 
     2.4  Complex Merge System Configuration 
     2.5  Merge Element Block Diagram 
     2.6  Ripple Carry Number Comparator 
     2.7  Parallel Count Field Generator 
     2.8  Serial Merge Element Block Diagram 
     2.9  Basic Serial Merge Element 
     2.10 Complete Serial Merge Element 
     2.11 Serial Element Control Flow Chart 
     2.12 Element Performance 

     3.1  Network to Process a Complex Expression 
     3.2  Binary Tree Network Configuration 
     3.3  Network with PASS Element 
     3.4  Network Gate Counts 
     3.5  Network Deadlock 
     3.6  Modified Operations Table 
     3.7  Modified Network Operation 
     3.8  Multi-Pass Expression Processing 
     3.9  Input Pair Classes 

     4.1  Simulation Results 
 
CHAPTER 1 -- INTRODUCTION TO THE PROBLEM 
 
 This thesis deals with the design and use of a specialized data 
 processing system in connection with a conventional digital computer running a 
 large scale information retrieval system using an inverted file database 
 structure. It describes the operation of a representative information 
 retrieval system implemented on a standard digital computer, with emphasis on 
 the structure of the database and the types of queries generally made of it. 
 It then shows that the processing of standard queries frequently 
 requires the merging of two or more ordered lists of index term pointers. The 
 design of a simple processor for aiding in this, and a technique for 
 increasing the effective memory bandwidth available by connecting a number of 
 these merge processors in a network, are then presented. The third chapter 
 concludes with the implications for both the hardware design and the methods 
 used to parse expressions when the network is in the form of a fixed binary 
 tree. Finally, a number of areas for future research are proposed, with 
 preliminary results in these areas given. 
 
 This thesis will not try to justify the use of a particular file 
 structure, data format, or form of query. These problems have been discussed 
 at great length in the past, and to do so here would only obscure the issue 
 of a specialized merge processor. Suffice it to say that inverted file 
 information retrieval systems exist, that large scale systems of this type 
 show a decrease in performance due to the disproportionate time spent handling 
 the merging and correlation of the index terms, and that a specialized 
 processor presents a possible solution to the problem. 
 
 1.1 Information Retrieval Queries 
 
 A representative information retrieval system can be considered to have a 
 command which locates items* within the database which match a given pattern. 
 Any additional commands, such as those which print out the results of the 
 search, create sets of items based on previous searches, and the like, are not 
 important to the discussion of list merging. This command can find all items 
 which contain one or more of a set of explicitly specified character strings 
 (with the command in the form "FIND 'AARDVARK'" to locate all occurrences of 
 AARDVARK, or "FIND 'AARDVARK' OR 'AARDWOLF'" to find all items which contain 
 either AARDVARK or AARDWOLF). This union of terms can also be specified 
 implicitly by the use of "explosion" or "wildcard" functions. For example, 
 EXPLODE('HOUND') may produce an operation equivalent to 'BASSET HOUND' OR 
 'BEAGLE' OR ... . Similarly, PROGRAM# (where # is assumed to match any 
 arbitrary character string, including the null string) may be equivalent to 
 'PROGRAM' OR 'PROGRAMMED' OR 'PROGRAMMER' OR ..., with as many terms as 
 there are words in the database which begin with PROGRAM. 
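As a rough illustration of how a trailing wildcard might be turned into an implicit OR, the following Python sketch looks up every index term that begins with a given prefix. The function names and the small vocabulary are hypothetical and are not taken from EUREKA or any other system; the only assumption is that the index terms are available as a sorted list.

```python
from bisect import bisect_left

def expand_prefix(prefix, vocabulary):
    """Return every term in the sorted list `vocabulary` that begins with `prefix`."""
    # In a sorted list, all terms sharing a prefix are contiguous and
    # start at the insertion point of the prefix itself.
    start = bisect_left(vocabulary, prefix)
    terms = []
    for term in vocabulary[start:]:
        if not term.startswith(prefix):
            break
        terms.append(term)
    return terms

def as_or_expression(terms):
    """Write the expanded terms as an explicit OR expression."""
    return " OR ".join(f"'{t}'" for t in terms)

# Hypothetical vocabulary, used only for illustration.
vocabulary = sorted(["PROGRAM", "PROGRAMMED", "PROGRAMMER", "PROGRESS", "PROJECT"])
terms = expand_prefix("PROGRAM", vocabulary)
print(as_or_expression(terms))
# prints: 'PROGRAM' OR 'PROGRAMMED' OR 'PROGRAMMER'
```

The number of terms produced, and therefore the number of index lists that must later be merged, depends entirely on the vocabulary of the database.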
 
 In addition, the command allows searching for two or more terms occurring 
 within a given context. While the meaning of context varies from system to 
 system depending upon the inherent hierarchy of the data stored, the contexts 
 given in Figure 1.1 for EUREKA will be assumed. An example of this second 
 form of the command is "FIND 'BEOWULF' AND 'GRENDEL' IN PARAGRAPH", which 
 matches all items which have a paragraph containing both BEOWULF and GRENDEL. 
 If no context is specified, it is assumed that the request is for the 
 co-occurrence of the terms in the same item. A special form of the AND 
 connective exists in the form "FIND 'AREA NAVIGATION'". This expression asks 
 the system to find a sentence which contains both AREA and NAVIGATION, 
 occurring adjacently and in that order. 

 * An item is the primary entity in the representative information retrieval 
 system. When inverted files are present, it represents the context to 
 which the inversion was made, and which must be searched if a lower context 
 is specified. For EUREKA [1], an item corresponds to a document. 

 EUREKA PDP-11 System 

     Database 
     Document 
     Author 
     Title 
     Source 
     Date 
     Keys 
     Abstract 
     Body 
     Paragraph 
     Sentence 
     Notes 
     References 
     Comments 

 State Statutes 

     Statutes 
     Chapter 
     Index 
     Act 
     Title 
     Enabling Clause 
     Section 
     Paragraph 
     Sentence 
     Footnotes 

 Figure 1.1 - Context Hierarchies 
 
 A third form of an expression is "FIND 'COMPUTER' AND NOT 'ANALOG'", which 
 finds all items which contain the word COMPUTER but not the word ANALOG. It 
 is easy to see that the OR connective increases the number of items meeting 
 the test, while the two forms of the AND decrease the number. This is 
 important, since the primary function of the information retrieval system is 
 to reduce the number of items which must be examined for relevancy. 
 
 These three forms can also be combined to make a more complex expression, 
 for example: 
 
     FIND EXPLODE('RNAV') OR 'OMEGA' AND 'DIGITAL COMPUTER' IN 
          SENTENCE OR 'PROGRAM#' AND NOT 'INERTIAL' 
          AND 'NAVIGATION' IN PARAGRAPH 
 
 1.2 Database Structure 
 
 Probably the easiest database organization, both conceptually and in 
 terms of implementation, is one consisting of all the textual material 
 organized in a single file which is searched sequentially in response to a 
 user's query. The programming necessary to implement the previously discussed 
 connectives is fairly obvious. The only disadvantage to this type of 
 organization is that the time required for each search on a conventional 
 digital computer increases linearly with the number of characters (or items) 
 stored. For a batch processing system this may not be a serious problem, 
 since many users' requests can be satisfied in a single pass through the 
 database. However, for a system like the National Library of Medicine's 
 MEDLINE [2], with 500,000 documents available for online user inquiries, the 
 batching of requests would be impractical, while the initiation of a search 
 through the entire database for each user request is impossible if adequate 
 response times for a large number of users are to be achieved. 
 
 The answer to this problem is to provide an index to the material which 
 can be checked to eliminate the needless searching of items which do not 
 contain the desired terms. This index can be prepared manually by trained 
 indexers, or automatically by a computer. In the latter case, all words can 
 be indexed, only certain words from a predefined list can be indexed, or all 
 words except those on an exception list (such as THE, AND, A, etc.) can be 
 indexed. 
 
 Figure 1.2 illustrates the logical organization of the database in an 
 inverted file structure. The index file contains lists of pointers to items 
 in the text file, and other data necessary for the operation of the system. 
 Additionally, the list entries may contain context flags, to indicate that the 
 word is also contained in some context outside of the body of the item (such 
 as the title), and other data, such as a count field to indicate the frequency 
 of occurrence of the word in the item. 
 
 Index File 

     List Header A:   Index Term, Entry Count (=N) 
     List Entry A1:   Item Pointer (=X), Context Flags, Other Data 
     List Entry A2:   Item Pointer, Context Flags, Other Data 
     ... 
     List Entry AN:   Item Pointer, Context Flags, Other Data 
     List Header B:   Index Term, Entry Count 
     List Entry B1:   Item Pointer, Context Flags, Other Data 

 Text File 

     Item 1 
     Item 2 
     ... 
     Item X-1 
     Item X 
     Item X+1 

 Figure 1.2 - Logical Database Organization 

 As long as the expressions imply the item level for a context, the 
 requested operations can be performed without actually searching the textual 
 material. For an OR operation, all that is required is to form the union of 
 two or more index lists; for AND, the intersection; and for AND NOT, the 
 removal of entries contained in the second argument's list from the list 
 specified by the first. For requests specifying a lower level of context, 
 such as the sentence, the result of the merging of the index lists gives a set 
 of items which have a chance of success in a full-text search. This saves 
 processor time by not requiring the searching of items which cannot possibly 
 match the specified search expression. 
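The three list operations can be sketched in Python as a single pass over two lists of item pointers. This is a minimal sketch, assuming each list holds plain integers in ascending order with no duplicates; the function name and the string operator codes are illustrative only.

```python
def merge_lists(a, b, op):
    """Combine two ascending lists of item pointers in one pass.

    op is "OR" (union), "AND" (intersection), or "AND NOT"
    (entries of a that do not appear in b).
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Pointer present in both lists.
            if op in ("OR", "AND"):
                out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Pointer present only in the first list.
            if op in ("OR", "AND NOT"):
                out.append(a[i])
            i += 1
        else:
            # Pointer present only in the second list.
            if op == "OR":
                out.append(b[j])
            j += 1
    # At most one of the two lists still has entries left.
    if op in ("OR", "AND NOT"):
        out.extend(a[i:])
    if op == "OR":
        out.extend(b[j:])
    return out

a = [1, 3, 5, 8]
b = [3, 4, 8, 9]
print(merge_lists(a, b, "OR"))       # prints: [1, 3, 4, 5, 8, 9]
print(merge_lists(a, b, "AND"))      # prints: [3, 8]
print(merge_lists(a, b, "AND NOT"))  # prints: [1, 5]
```

Each entry of either list is examined once, so the work grows with the combined length of the two lists rather than with the size of the text file.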
 
 1.3 Zipf's Law and the Level of Inversion 
 
 Since the storage requirements for an inverted file system depend upon 
 not only the total number of words in the database (the number of tokens), but 
 also the number of distinct words (the types) and the number of times each 
 type occurs (the number of tokens per type), estimating the requirements is 
 more difficult than with a full text searching system. However, it has been 
 observed for a number of natural phenomena, including large collections of 
 text, that when the types are ranked according to their number of tokens, the 
 product of the rank of the type and the number of its tokens is constant. 
 This is referred to as Zipf's Law*. When a constant product is plotted with 
 the rank of the type on the x-axis, and the number of tokens in the type on 
 the y-axis, the graph is in the form of a rectangular hyperbola. If this same 
 curve is plotted on log-log coordinates, the result will be a straight line. 
 
 * After George Kingsley Zipf, a professor of German at Harvard University. 
 Zipf initially studied the distribution of words in a variety of languages 
 and from a number of sources, observing approximately the same results. He 
 regarded these results as some sort of universal truth, which he called the 
 Principle of Least Effort, and attempted to use it to explain the Civil War, 
 Committees of Congress, chamber music, the Chicago Tribune, and sex. [3,4] 
 
 Figure 1.3 represents data collected by Dr. Miles Hanley and Dr. M. 
 Joos on the distribution of words in James Joyce's novel "Ulysses" [5]. This 
 work was initially selected to demonstrate that the Zipf distribution would 
 not exist in a sample of this size (260,430 tokens). It can be seen from the 
 product column in the table and from the graph of the data on log-log 
 coordinates in Figure 1.4 that even a sample of this size confirms Zipf's 
 observations. The abnormality at the right side of the curve exists because a 
 word can only occur an integral number of times; the final step, for example, 
 is between words which occur twice and those which occur once. 
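The near-constancy of the product can be checked directly from the rank and frequency values of Figure 1.3. The short Python sketch below is only an illustration; the frequencies at ranks 200 and 300 are taken here as the listed product divided by the rank, and the variable names are arbitrary.

```python
# (rank, frequency) pairs from the "Ulysses" table of Figure 1.3.
# The frequencies at ranks 200 and 300 are computed as product / rank.
ULYSSES = [
    (10, 2653), (20, 1311), (30, 926), (40, 717), (50, 556),
    (100, 265), (200, 133), (300, 84), (400, 62), (500, 50),
    (1000, 26), (2000, 12), (3000, 8), (4000, 6), (5000, 5),
    (10000, 2), (20000, 1), (29899, 1),
]

# Zipf's Law says rank * frequency should be roughly constant.
products = [rank * freq for rank, freq in ULYSSES]
frequencies = [freq for _, freq in ULYSSES]

print(min(frequencies), max(frequencies))  # spread of the raw frequencies
print(min(products), max(products))        # spread of the products
```

Over these rows the frequency varies from 1 to 2,653, while the product stays between 20,000 and 29,899.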
 
 Figure 1.5 shows the curves for a large database consisting of the 
 statutes of a state. The curve for inversion to the word level corresponds to 
 the actual Zipf curve, and approximates a straight line when plotted log-log. 
 The other two curves indicate that as the inversion is made to a higher level, 
 the curve flattens on the left side, as a result of a greater number of words 
 which appear in all or nearly all items at the level to which the inversion 
 was made. At the sentence level, about a dozen words (such as A and THE) 
 occur in at least half the items, while at the document level, with a document 
 corresponding to a chapter of the statutes, more than 300 words occur in over 
 half the items. 
 
       RANK     FREQUENCY     PRODUCT OF RANK 
                               AND FREQUENCY 

         10         2,653          26,530 
         20         1,311          26,220 
         30           926          27,780 
         40           717          28,680 
         50           556          27,800 
        100           265          26,500 
        200           133          26,600 
        300            84          25,200 
        400            62          24,800 
        500            50          25,000 
      1,000            26          26,000 
      2,000            12          24,000 
      3,000             8          24,000 
      4,000             6          24,000 
      5,000             5          25,000 
     10,000             2          20,000 
     20,000             1          20,000 
     29,899             1          29,899 

 Figure 1.3 - Table of Word Frequency in "Ulysses" 

 [Log-log plot of FREQUENCY against RANK for the James Joyce data.] 

 Figure 1.4 - Zipf Curve for "Ulysses" 

 [Plot against LOG (RANK) with curves labeled WORD LEVEL, PARAGRAPH LEVEL, 
 and DOCUMENT LEVEL.] 

 Figure 1.5 - Zipf Curves for State Statutes 

 It is customary in many information retrieval systems to delete common 
 words from the index file, both to conserve space and to prevent the user from 
 making a query which would result in an inordinate number of items matching. 
 However, this imposes a rather arbitrary restriction on the system user, since 
 it is possible for him to meaningfully use these words in a query. This query 
 would generally take the form of A AND B or A AND NOT B, where A is a list 
 which has a small number of entries and B is a list with a large number. Both 
 produce a result list with no more entries than A, while the result of A OR B 
 is of the same order as B (the large list). However, the first two 
 expressions can be rewritten as A AND NOT (NOT B) and A AND (NOT B). 
 Therefore, a list indicating the items which do not contain a term can be 
 stored if the term is contained in more than half the total number of items. 
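The rewriting of A AND B and A AND NOT B in terms of a stored complement can be sketched as follows. This is an illustrative Python sketch only: it assumes items are numbered 1 through the total number of items, uses sets for brevity where a real system would merge ordered lists, and all function names are hypothetical.

```python
def store_list(postings, total_items):
    """Return (stored_list, is_complement).

    The complement (items NOT containing the term) is stored when the
    term occurs in more than half of the items.
    """
    if len(postings) > total_items / 2:
        present = set(postings)
        complement = [i for i in range(1, total_items + 1) if i not in present]
        return complement, True
    return list(postings), False

def intersect(a, b):
    b_set = set(b)
    return [x for x in a if x in b_set]

def difference(a, b):
    b_set = set(b)
    return [x for x in a if x not in b_set]

def and_with(a, stored, is_complement):
    # A AND B  ==  A AND NOT (NOT B) when only NOT B is stored.
    return difference(a, stored) if is_complement else intersect(a, stored)

def and_not_with(a, stored, is_complement):
    # A AND NOT B  ==  A AND (NOT B) when only NOT B is stored.
    return intersect(a, stored) if is_complement else difference(a, stored)

# A common term B occurs in 8 of 10 items, so only [4, 9] is stored.
stored_b, b_is_complement = store_list([1, 2, 3, 5, 6, 7, 8, 10], total_items=10)
a = [2, 4, 7]
print(stored_b, b_is_complement)                      # prints: [4, 9] True
print(and_with(a, stored_b, b_is_complement))         # prints: [2, 7]
print(and_not_with(a, stored_b, b_is_complement))     # prints: [4]
```

In both rewritten forms the result is still bounded by the size of the short list A, and the long list for B never has to be stored or read.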
 
 The saving achieved by using this technique depends upon the frequency of 
 occurrence of the words and the total number of tokens at the inversion level 
 selected. A simple model which can be used consists of a line with slope -1 
 for the Zipf curve of the database inverted to the word level, as illustrated 
 in Figure 1.6. When the inversion is made to a higher level, all points on 
 the curve with a frequency greater than the total number of items at the 
 inversion level are made equal to the total number of items. In reality, the 
 changing of the inversion level not only truncates the left hand portion of 
 the curve, but also decreases the slope, moving to the left the point where 
 the negatively-sloped line meets the horizontal line. 
 
 [Plot against LOG (RANK) with curves labeled IDEAL ZIPF CURVE, THEORETICAL 
 TRUNCATED ZIPF CURVE, and ACTUAL TRUNCATED ZIPF CURVE.] 

 Figure 1.6 - Truncated Zipf Curve 

 [Plot of DOCUMENTS against types, marking the TOTAL NUMBER OF DOCUMENTS, the 
 TOTAL NUMBER OF TYPES, the NUMBER OF TYPES CONTAINED IN ALL DOCUMENTS, and 
 the region of STORAGE SAVED BY USING COMPLEMENTARY LISTS. NOTE: AREAS ARE 
 DISTORTED BECAUSE OF LOG-LOG AXES.] 

 Figure 1.7 - Savings Using Complementary Lists 

 Figure 1.7 illustrates the savings achieved by using complementary lists. 
 It must be remembered that the results are plotted on log-log coordinates, so 
 the relative sizes of the areas are misleading. However, it does illustrate 
 where the saving is achieved. With the simple model for a higher level 
 inversion, using complementary lists results in a savings of 15 to 25 percent 
 in the size of the index file. On actual data (the statutes mentioned 
 previously) the decrease in size is 15 percent at the document level and 10 
 percent at the sentence level, due to differences between the simple model's 
 curve and the actual curve. 
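The simple model can be written out as a short calculation. The Python sketch below is one possible reading of it, not the computation used to obtain the percentages quoted above: each type of rank r is assumed to have C/r tokens, its list length is capped at the number of items, and a complementary list is stored whenever it is shorter. The parameter values in the example are arbitrary.

```python
def index_sizes(zipf_constant, num_types, num_items):
    """Total list entries without and with complementary lists.

    zipf_constant is the assumed rank * frequency product C, so the
    type of rank r has C / r tokens on the ideal (slope -1) curve.
    """
    plain = 0.0
    with_complements = 0.0
    for rank in range(1, num_types + 1):
        frequency = zipf_constant / rank
        # Truncated curve: a list cannot have more entries than items.
        entries = min(frequency, num_items)
        plain += entries
        # Store whichever is shorter: the list or its complement.
        with_complements += min(entries, num_items - entries)
    return plain, with_complements

def savings_fraction(zipf_constant, num_types, num_items):
    plain, with_complements = index_sizes(zipf_constant, num_types, num_items)
    return 1 - with_complements / plain

# Arbitrary illustrative parameters, not taken from any real database.
print(savings_fraction(zipf_constant=100_000, num_types=1_000, num_items=1_000))
```

The fraction saved depends strongly on how many types lie on the truncated (horizontal) part of the curve; if no type occurs in more than half the items, nothing is saved.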
 
 Another consideration is the extra processing required to check contexts 
 either above or below the item level [6]. Earlier, it was noted that if the 
 context specified is lower than the item level, full-text searching may be 
 required to determine if a match actually exists. If the context is higher 
 than the item level, then a change in the method used for combining terms is 
 required. This can be done either by having the item pointer contain encoded 
 information regarding the higher level structure (i.e., for inversion at the 
 sentence level, have the item number consist of fields indicating the document 
 and paragraph numbers which contain the sentence, as well as the number of the 
 sentence within the paragraph), or by defining each higher entry as any item 
 within a fixed range of another item. The first technique is inefficient in 
 its use of bits in the item pointer, while the second is inaccurate due to the 
 use of a non-standard definition for the higher levels. 
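Both techniques can be illustrated with a small Python sketch. The field widths, the function names, and the window size are assumptions made purely for the example; no particular encoding is specified in the text.

```python
# Assumed field widths for a sentence-level item pointer (illustrative only).
PARAGRAPH_BITS = 8   # paragraph number within the document
SENTENCE_BITS = 6    # sentence number within the paragraph

def pack_pointer(document, paragraph, sentence):
    """First technique: encode the higher-level structure in the pointer."""
    if not (0 <= paragraph < (1 << PARAGRAPH_BITS)
            and 0 <= sentence < (1 << SENTENCE_BITS)):
        raise ValueError("field does not fit in its assumed width")
    return ((document << (PARAGRAPH_BITS + SENTENCE_BITS))
            | (paragraph << SENTENCE_BITS)
            | sentence)

def unpack_pointer(pointer):
    sentence = pointer & ((1 << SENTENCE_BITS) - 1)
    paragraph = (pointer >> SENTENCE_BITS) & ((1 << PARAGRAPH_BITS) - 1)
    document = pointer >> (PARAGRAPH_BITS + SENTENCE_BITS)
    return document, paragraph, sentence

def same_paragraph(p, q):
    # Two sentences share a paragraph when all higher fields agree.
    return (p >> SENTENCE_BITS) == (q >> SENTENCE_BITS)

def same_document(p, q):
    shift = PARAGRAPH_BITS + SENTENCE_BITS
    return (p >> shift) == (q >> shift)

def within_range(p, q, window):
    """Second technique: treat sequentially numbered items that lie
    within a fixed range of each other as sharing the higher context."""
    return abs(p - q) <= window

p = pack_pointer(3, 2, 5)
q = pack_pointer(3, 2, 9)
r = pack_pointer(3, 4, 0)
print(same_paragraph(p, q), same_paragraph(p, r), same_document(p, r))
# prints: True False True
```

The packed form wastes pointer values whenever a paragraph has fewer sentences than the field allows, while the fixed range test can pair sentences that straddle a real paragraph boundary.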
 
 However, in many systems operating on data with an inherent hierarchical 
 structure (such as Figure 1.1 shows for state statutes in a legal database), 
 it is possible to invert to an optimal level which minimizes the number of 
 context requests either above or below the item level. 
 
 1.4 Processing on Conventional Computers 
 
 Most currently implemented inverted file information retrieval systems 
 run on standard digital computers (for example, MEDLINE on an IBM System/370 
 and EUREKA on a PDP-11/40). Estimates made by the operators of these systems 
 indicate that a majority of the processor time is spent in the routines used 
 for fetching and merging posting lists. 
 
 Figure 1.8 is representative of the instructions used in the EUREKA 
 system to produce the AND of two lists both contained in memory. The program 
 has been written to take advantage of the high speed registers available on 
 the PDP-11 computer, and is close to the maximum efficiency for doing the 
 operations required. Figure 1.9 illustrates the data element on which the 
 program operates. The first part of the program checks the context bits to 
 determine if the entry occurs in the proper context, and if not, fetches the 
 next entry on the list. When a valid entry from each list has been found, the 
 two document number fields are compared, and if they are not equal, a new 
 entry is fetched from the list containing the lower document number. If they 
 are equal, an output entry is formed consisting of a document number from 
 either of the two input entries, a count field equal to the minimum of the two 
 input count fields, and a tag bit equal to the OR of the two input tags; the 
 context bits of the output entry are cleared. The tag bit is used as an 
 indication that full-text searching is required for an entry, since if one of 
 the input entries required full-text searching to determine if it is a valid 
 entry, then any entry formed from an AND operation with that entry will 
 require full-text searching. Figure 1.10 summarizes the operations used to 
 generate the fields for all three operators. 
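The behavior described for the AND subroutine can be restated as a Python sketch. It is not a transcription of the PDP-11 code of Figure 1.8. The bit positions of Context-I and the document number follow Figure 1.9; the positions assumed for Context-II (bits 14-6) and the count (bits 5-0) are inferred from the 13 flag bits and the maximum count of 63, and the meaning given to the context mask is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    doc: int   # document number
    ctx: int   # 13 context flag bits
    cnt: int   # occurrence count, saturating at 63
    tag: int   # 1 if full-text searching is still required

def unpack(even_word, odd_word):
    """Decode one two-word list entry (field positions partly assumed)."""
    ctx1 = (even_word >> 12) & 0xF     # bits 15-12: Context-I
    doc = even_word & 0xFFF            # bits 11-0: document number
    tag = (odd_word >> 15) & 1         # bit 15: tag
    ctx2 = (odd_word >> 6) & 0x1FF     # assumed bits 14-6: Context-II
    cnt = odd_word & 0x3F              # assumed bits 5-0: count
    return Entry(doc, (ctx1 << 9) | ctx2, cnt, tag)

def in_context(entry, mask):
    # Assumption: a zero mask means "no restriction"; otherwise at least
    # one of the requested context flags must be set in the entry.
    return mask == 0 or (entry.ctx & mask) != 0

def and_lists(xs, ys, mask_x=0, mask_y=0):
    """AND of two entry lists ordered by ascending document number."""
    out = []
    i = j = 0
    while True:
        # Skip entries that do not occur in the requested context.
        while i < len(xs) and not in_context(xs[i], mask_x):
            i += 1
        while j < len(ys) and not in_context(ys[j], mask_y):
            j += 1
        if i == len(xs) or j == len(ys):
            break
        x, y = xs[i], ys[j]
        if x.doc < y.doc:
            i += 1                     # advance the list with the lower number
        elif x.doc > y.doc:
            j += 1
        else:
            # Matching documents: minimum count, OR of tags, context cleared.
            out.append(Entry(x.doc, 0, min(x.cnt, y.cnt), x.tag | y.tag))
            i += 1
            j += 1
    return out

xs = [Entry(1, 0, 3, 0), Entry(4, 0, 2, 1), Entry(7, 0, 9, 0)]
ys = [Entry(4, 0, 5, 0), Entry(7, 0, 1, 0), Entry(9, 0, 1, 0)]
print(and_lists(xs, ys))
```

For the two sample lists the result contains documents 4 and 7, with counts 2 and 1, and the tag set only on document 4.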
 
 The number to the right of the comment field for each instruction is its 
 number of memory references, which on the PDP-11 is directly proportional to 
 the time required to perform that instruction. Assuming that an average merge 
 operation will require the reading of one entry and the writing of one entry 
 (it can read either one or two and write zero or one), and counting the number 
 
15 
 
 * rt + * CM tH M M * ri * t< N d M M M h^Wh N»nt-(N-I MH^rld H (M i ) h M CM * d M CM T H H H 
 
 Lt 
 
 K > 
 u. Cl 
 
 I I 
 
 K X t- H 
 
 Ll U. iij uj 
 
 hOI- K 
 
 2 2 2 
 
 in G 
 
 X Cl X 
 
 I CJ 
 
 Uj uj 
 
 J I 
 
 a. 
 
 a. a. 
 
 K- > 2 
 
 V 
 
 ili U. CJ 
 
 K C O 
 
 U; 
 
 u. 
 
 X 
 
 0. 
 
 s > > 
 
 O II V 
 
 cj x :■•: 
 
 Uj 
 > 2 
 
 a. > 
 E x 
 
 x > 
 i- x 
 
 q. t- 
 
 t- :•'.' 2 
 X o 
 
 UJ U. Ul h- 
 
 h- 5 w Z 
 
 2 X 3 
 
 5 ui i 5 
 
 O H- Cl CJ 
 
 Of 05 6 N 
 
 x u 
 
 uj U 
 
 -J c u- 
 
 l- to 
 .* 0- 
 CJ > 
 
 2 'I 
 
 i- a 
 
 2 K 
 
 3 a. 
 
 in ll 
 
 Ui r a 
 
 O u lu 
 2 12 
 
 I- LL 
 
 2 K 
 
 i- :■. 
 
 > h 
 ll 
 
 ll r 
 
 O G 
 
 2 x 
 
 u. 
 
 ll- c 
 
 Uj 2 
 2 Uj 
 
 01 Ci U 
 UJ 2 lu 
 
 :>• uj > 
 
 m m + 2 
 
 or a: -■ x 
 
 2 2 ts 
 
 rt * CM *» X Ti 
 
 X «H X rt v to 
 
 (S3 + + 
 
 " to ■ 
 
 X •- CM CM 
 
 a. I ll a. 
 
 ". CS '- 
 
 cs cs in ui 
 
 N CS X I 
 
 '- N 2 2 
 
 N 
 
 vi MH CM *t a. 
 
 * * > M > i 
 
 (S3 X 
 
 CS - 
 CS - 
 
 N rt 
 
 <s in 
 
 LL LX 
 
 rt W rt #t Ct rt 
 
 >: CO X CB >-• Cl 
 
 CS 
 
 '■ Ct 
 
 CS ■-• 
 
 Cl I 
 
 - CS 
 
 CS CS 
 
 CS CS 
 
 N CS 
 
 p.. r- 
 
 N v 1 , ■ 
 
 * * 
 
 L~ CS 
 
 Cl -i 
 
 + 
 
 Cl 
 
 
 + 
 
 
 a. 
 
 
 
 
 LL 
 
 1 
 
 
 to 
 
 
 I 
 
 
 
 . 
 
 + + 
 
 + 
 
 
 + 
 
 + 
 
 + 
 
 
 
 2 
 
 ■■*■ " 
 
 r\ 
 
 
 /"■ 
 
 .-■ 
 
 - 
 
 
 
 I 
 
 (53 CM 
 
 r ■ 
 
 <n 
 
 CM 
 
 CS 
 
 CS 
 
 
 
 
 Cl Cl 
 
 to 
 
 CM 
 
 Cl 
 
 to 
 
 LL 
 
 rt 
 
 *n 
 
 M % 
 
 V V 
 
 *-' 
 
 rt 
 
 ■" 
 
 '-■' 
 
 ■-' 
 
 Cl 
 
 \- 
 
 Cl r- 
 
 I: g £ U £ g Lu C U J- Ul H UJ h x uj u o a, C :■ h- C, 03 UJ C I- Uj (- UJ !•■ X UJ L' O C!^;-^l->hSi > K Ul CJ LU CC 2 
 ciSuj^HL^ tt ajaj£coLOHi?^ffi^Sffi£a : ^&cEc:£S£ 
 
 
 
 
 
 
 
 r 
 
 * 
 
 
 in 
 
 «t *t #t 
 UJ N 01 
 
 
 u. 
 
 
 
 
 
 
 
 a 
 
 z 
 
 
 
 O UJ 
 
 
 
 
 
 
 
 C 2 
 
 UJ 
 
 
 
 I 
 
 
 
 
 
 rt 
 
 
 J CC 
 
 Ul 
 
 
 
 Ct H- 
 
 irt 
 
 
 2 
 
 
 U3 
 
 
 3 
 
 CO 
 
 
 o 
 
 UJ 
 
 Ui 
 
 
 UJ 
 
 
 
 
 O Ci 
 
 
 
 1- 
 
 guj 
 
 i. 
 
 
 H 2 
 
 
 O 
 
 
 2 CC 
 
 > 
 
 
 
 CC 
 
 
 M C 
 
 
 h- 
 
 
 Ul 
 
 -J 
 
 
 Q£ 
 
 2 CC 
 
 1- 
 
 
 Ct 
 
 
 
 
 a. x 
 
 m 
 
 
 CC 
 
 2 2 
 
 
 
 UJ Ll 
 
 
 ai 
 
 
 X to 
 
 3 
 
 
 _J 
 
 
 a 
 
 
 2 
 
 
 * 
 
 
 _l UJ 
 
 o 
 
 
 
 Ui X 
 
 2 
 
 
 i- in 
 
 
 
 
 ct > 
 
 t-* 
 
 
 2 
 
 I o 
 
 
 
 UJ 
 
 
 2 
 
 
 UJ o 
 
 > 
 
 
 h-t 
 
 1- ct 
 
 i. 
 
 
 o ^ 
 
 
 O 
 
 
 > 
 
 UJ 
 
 
 in 
 
 cl 
 
 U 
 
 
 1- CC 
 
 
 to 
 
 
 O 2 
 
 to 
 
 
 
 UJ LL Ci 
 
 h- CC UJ 
 
 Ui 
 
 
 1- 
 
 
 to 
 
 
 UJ 
 
 to 
 
 
 
 2 
 
 
 3 1 , 
 
 
 
 
 X h- 
 
 
 
 in 
 
 CC 01 
 
 CJ 
 
 
 
 in 
 
 
 CS Ul 
 
 Ui 
 
 
 K 
 
 U d Ul 
 
 
 
 _J c 
 
 
 UJ 
 
 
 CM > 
 
 ii- 
 
 
 m 
 
 m 2 Ui 
 
 X. 
 
 
 CC UJ 
 
 
 a 
 
 
 Ul 
 
 X 
 
 
 M 
 
 Ci CC u 
 
 W 
 
 
 > F 
 
 
 CC 
 
 
 1- 
 
 2 
 
 
 _l 
 
 2 O 
 
 I 
 
 
 u 
 
 
 H 
 
 
 3b 
 
 
 
 
 " , Ct 
 
 ^T 
 
 
 2 UJ 
 
 
 
 
 Ul 
 
 
 K 
 
 2 Cl 
 
 
 
 
 
 2 
 
 
 CC Uj 
 
 to 
 
 
 -1 
 
 in C 
 
 LL 
 
 
 ui 
 
 
 UJ 
 
 
 Ul 
 
 
 0. CS 
 
 l-HL!i 
 
 I 
 
 
 Ct Ct 
 
 
 t- 
 
 
 
 K 
 
 
 2 * 
 
 2 h- CO 
 
 K 
 
 
 UJ 
 
 
 *-* 
 
 
 X « 
 
 U1 
 
 
 ►■* \ 
 
 Ui U 
 
 
 
 2 UJ 
 O CO 
 
 
 
 
 1- Ul 
 
 »-» 
 
 
 rt 
 
 2 3 O 
 
 01 
 
 
 
 t- 
 
 
 M 3 
 
 CD 
 
 
 O ^ 
 
 2 Ct 1- 
 
 _J 
 
 
 _J 
 
 ui 
 
 —i 
 
 
 2 C 
 
 Ul 
 
 
 2 1 
 
 O (- 
 
 
 
 O 
 
 3 
 
 to 
 
 
 X 
 
 Ct 
 
 
 i- a. 
 
 u in z 
 
 CC 
 
 
 HI H 
 
 
 H 
 
 
 Ul UJ 
 
 
 
 5 
 
 2 a 
 
 LL 
 
 
 »-i 
 
 U^ 
 
 -i 
 
 
 1- 
 
 UJ 
 
 
 U. Cl 
 
 Ui w w 
 
 
 
 h- 
 
 
 5 
 
 
 Ul ~ 
 
 X 
 
 
 O 
 
 I t- 
 
 I 
 
 
 2 - 
 
 oi 
 
 
 
 ►- in 
 
 (- 
 
 
 . u 
 
 1- Ul u 
 
 (J 
 
 
 U 
 
 N 
 
 2 
 
 
 _J c 
 
 
 
 2 3 
 
 t— I 
 
 
 M CS 
 
 
 CC 
 
 
 2 
 
 H 
 
 
 O H 
 
 U. H Ct 
 
 I 
 
 
 2 2 
 
 O 
 
 
 
 2 O 
 
 X 
 
 
 2 
 
 O 1- 
 
 
 
 2 w 
 
 1- 
 
 Ul 
 
 ui 
 
 Ui CJ 
 
 2 
 
 
 CC 2 
 
 > Ul 
 
 
 
 Ul 
 
 
 u 
 
 — J 
 
 1- Ui 
 
 1- 
 
 
 : O 
 
 1- CO 2 
 
 2 
 
 
 2 2 
 
 H 
 
 -1 
 
 
 « Ul 
 
 
 
 
 2 w 
 
 UJ 
 
 
 UJ CC 
 
 
 5 
 
 &l 
 
 
 m 
 
 
 Ui CC 
 
 a cj 
 
 t- 
 
 
 1- CJ 
 
 N 
 
 o 
 
 
 CS tf 
 
 UJ 
 
 
 I X. 
 
 - Ul Lu 
 
 M 
 
 ui 
 
 M 
 
 -1 
 
 ct 
 
 \p 
 
 CS 
 
 2 
 
 
 y- to 
 
 Ct Ct 2 
 
 
 2 
 
 
 
 Ct 
 
 in 
 
 CS (S 
 
 ~i 
 
 
 z% 
 
 >- 1- 
 
 LU 
 
 
 UJ h- 
 
 LL 
 
 
 
 ^ 
 
 Ul 
 
 
 UJ 3 
 
 
 N 
 
 _J m 
 
 Q 
 
 o 
 
 o 
 
 rt > 
 
 Ul • 
 
 
 % w 
 
 i e ct 
 
 H Ui o 
 
 3 
 
 2 
 
 -J 
 
 S M 
 
 2 —J 
 
 
 h- 
 
 h 
 
 _J 
 O UJ 
 
 X Ul 
 
 H- 
 
 to > 
 
 Ct LL 
 
 
 rt 
 
 
 01 
 
 2 
 
 H 
 
 2 1- 
 
 UJ c 
 
 Ul 
 
 CD 
 
 o 
 
 Ul 
 
 
 m ct 
 
 lu 
 
 2 
 
 
 H X 
 
 2 _J 
 
 ►«■« 
 
 O 
 
 t- in a 
 
 
 to 
 
 Ui 
 
 ■J 
 
 »-i 
 
 T 
 
 2 
 
 -< J 
 
 _J X > N 
 
 i- a 
 
 Ui Ui 
 
 'I 
 
 a 
 
 CC 2 
 
 U 
 
 Ul 
 
 * 
 
 U. w 
 
 1- 
 
 
 UJ 
 
 Ul _J to 
 
 
 
 t- 
 
 > 
 
 Ul 
 
 
 O X 
 
 3 U. 
 
 X U. > U. M U. 
 
 UJ£ 
 
 to U M 
 
 a 
 
 ^ 
 
 ca a 
 
 o 
 
 UJ 
 
 to 
 
 o 
 
 G 
 
 UJ > 3 
 
 2 
 
 Ul 
 
 2 
 
 
 u 
 
 a 
 
 : Ct 
 
 ct in 
 
 O G C 
 
 M 
 
 CO U C2 
 
 
 UJ 
 
 M UJ 
 
 Ifl 
 
 c 
 
 
 u Cl Ll 
 
 CO X 
 
 K t- (- h- t- H 
 
 t- Ui 
 
 2 UJ 
 
 Ul 
 
 
 Ul 2 
 
 OJ 
 
 to 
 
 x 
 
 2 Cl 2 
 XX" 
 
 3 
 
 2 £ s 
 
 Ct 3 Ct 3 X 3 
 
 2 2 
 
 3 > to 
 
 Ul 
 
 u 
 
 U"! H 
 
 
 to 
 
 UT 
 
 ui a 
 
 O C 
 
 2 Ct 
 
 UJ 
 
 > 
 
 Ul 
 
 Q 
 
 
 UJ 
 
 : H 
 
 Lu 
 
 1- O 1- O 1- O 
 
 Cl 
 
 5 LU 
 
 u 
 
 CJ 
 
 U 2 
 
 H 
 
 J 
 
 _ J 
 
 Ui 
 
 Ul Q 
 
 LL L 1 LL '_■ CL '_ 
 
 CD UJ 
 2 I 
 
 Ui 2 2 
 2 Ul m 
 
 r, 
 
 LL 
 
 f>1 
 
 c o 
 ct ct 
 
 (J\ 
 
 _j 
 
 (J 
 > 
 
 m 
 
 2 O 
 
 1! II 1 II li II 
 CS rt CM f 1 t IP 
 
 Ul h- 
 
 t- 2 H 
 
 Cl 
 
 rt 
 
 Cl ll 
 
 d 
 
 u. 
 
 CJ 
 
 K H « 
 
 t- _J 
 
 X LL Cl LL LL U 
 
 § 
 
 CO 
 
 CO 
 
 < 
 
 Cd 
 CC 
 
 23 
 W 
 
 E 
 O 
 
 d) 
 C 
 •H 
 
 O 
 t, 
 J3 
 
 3 
 CO 
 
 O 
 
 z 
 
 CO 
 
 •rt 
 Cv, 
 
16 
 
 15 12 
 
 CONTEXT-T 
 
 11 
 
 DOCUMENT NUMBER 
 
 EVEN WORD 
 
 15 
 
 T 
 
 14 
 
 CONTEXT- II 
 
 COUNT 
 
 ODD WORD 
 
 Context-I and Context-II form a group of 13 flag bits [Ctx(A)] 
 flags which indicate the contexts within which the term 
 occurs. The assignment is arbitrary, but must match 
 the assignment used for the context check mask. In 
 general, it indicates contexts other than Body, Paragraph, 
 or Sentence. 
 
 Document Number [Doc (A)] is a pointer or indicator of the 
 
 document which contains the term. Due to implementation 
 restrictions, it only allows 4094 documents in the 
 database. 
 
 Count [Cnt(A)] indicates how many times the term occurs within 
 the body of the document. Zero means it is only in 
 another context (Title, Author, etc.) and ^ means it 
 occurs 6 3 or more times. 
 
 T is a special tag bit [Tag(A)] that indicates a result of 
 the merge operations must be full-text searched to 
 determine if it actually satisfies the specified 
 conditions of the query. 
 
 Figure 1.9 - List Entry Format for AND Subroutine 
 
[Figure 1.10: table of merge operations; contents illegible in this scan]
 
 of cycles required on the average to carry out this operation, the average 
merge time is about 30 microseconds, with an effective bandwidth of 2.17
megahertz. The actual bandwidth of the memory (16 bits per 650 nanoseconds)
is 24.6 MHz. Memory efficiency of a process can be defined as the effective
bandwidth of the process divided by the available memory bandwidth. For the
previous example, this is about 8.8%. This number is similar to those
calculated for other general purpose digital computers; for example, the IBM
System/360 Model 75 has an efficiency of 6.4%, due to its higher available
bandwidth.
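
The efficiency figure can be checked with a few lines of arithmetic. The sketch below assumes that one merge moves 64 bits of list data (one 32-bit input entry and one 32-bit output entry) in the 30 microsecond merge time; that assumption and the function names are illustrative and are not taken from the measurements above.

```python
def bandwidth_hz(bits: float, seconds: float) -> float:
    """Bits delivered per second."""
    return bits / seconds

def memory_efficiency(effective_hz: float, available_hz: float) -> float:
    """Effective bandwidth of a process divided by available memory bandwidth."""
    return effective_hz / available_hz

# Available bandwidth: one 16-bit word every 650 ns.
available = bandwidth_hz(16, 650e-9)

# Assumed useful traffic: 64 bits of list data per 30 microsecond merge.
effective = bandwidth_hz(64, 30e-6)

efficiency = memory_efficiency(effective, available)

if __name__ == "__main__":
    print(f"available = {available / 1e6:.1f} MHz")
    print(f"effective = {effective / 1e6:.2f} MHz")
    print(f"efficiency = {efficiency:.1%}")
```

Under these assumptions the result lands close to the efficiency quoted above; the small difference comes from rounding the merge time to 30 microseconds.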
 
 This low efficiency for conventional digital processors can be easily 
 explained by examining the program. Before an instruction is executed it must 
 be fetched from storage, requiring an overhead memory cycle. Because of this, 
 even if every instruction completely processed a word of the input or output 
data, the efficiency would be only 50%. In addition, on the PDP-11 and many
 other computers, an instruction may consist of more than a single word, 
 reducing the efficiency even more. 
 
 In addition, there are instructions in the program which do not process 
 any input or output data. These can be divided into two classes — flow of 
 control and locating and aligning. The flow of control instructions include 
 branches necessary to reach other statements of the program either 
 conditionally or unconditionally. The second class is used to find the next 
 input data element in a list or the next available output location (adjustment 
of pointers), or to transform data to a form which can be processed by the
 machine (bit masking, shifting, etc.). 
 
 It is clear from the preceding discussion that the problem of merging 
 lists of entries does not nicely match a conventional digital computer's 
 architecture. What is needed is a processor which could execute instructions 
 more compatible with the problem, reducing both the number of instructions 
 which must be fetched and the need for flow of control instructions. 
 
 CHAPTER 2 — A SPECIALIZED LIST MERGING PROCESSOR 
 
 Very few types of processors have been proposed to conveniently handle 
 the generally non-numeric task of merging two lists of data. Most non-numeric 
 processors are associative processors, which are ideal for searching large 
 bodies of data, but not for combining two lists and eliminating unwanted 
entries.
 
 The implementation of a specialized processor to merge two input lists 
 into a single output list is simplified by the nature of the problem. The 
 operations required are both simple and well defined, allowing a hardwired, 
 rather than programmed, sequencer for speed and efficiency. Operations such 
 as pointer and count manipulations can be performed in parallel with the 
actual merging operation, further increasing the speed. Finally, the data
alignment problem present on conventional processors is non-existent, since
the data can easily be routed to the appropriate points in the processor
(assuming the data format is fixed or falls within a small set of previously
defined formats).
 
 2.1 Previous List Merging Processors 
 
 Two different styles of list merging processors have previously been 
 proposed: the bit serial/entry parallel unit discussed by Stellhorn, and the 
 HARVEST non-numeric extension to the IBM STRETCH computer designed for the 
 National Security Agency. 
 
 Stellhorn [7] proposed using a Batcher merge network [8] to combine the 
 two input lists (see Figure 2.1). Since it is not practical to build a 
 Batcher network capable of merging two large lists (since for two lists, each 
 containing N entries, it requires order N log N* Batcher merge elements), a 
 technique for merging the lists in parts was devised. This consists of 
 merging the next sublist from one of the input lists, selected based on the 
 lower first entry, with the last half of the results of the previous merge 
 (which are fed back from the outputs to the inputs of the merge network). 
 Stellhorn proved that this technique will always produce a properly merged 
 list. 
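
The merge-in-parts technique can be sketched in software. In the sketch below the fixed-size merge network is stood in for by heapq.merge, the block size n and the function name merge_in_parts are hypothetical, and the code follows the description above rather than Stellhorn's actual hardware: the upper half of each merge is fed back and merged with the next block, which is taken from whichever list has the lower first entry.

```python
from heapq import merge

def merge_in_parts(xs: list[int], ys: list[int], n: int) -> list[int]:
    """Merge two sorted lists using only merges of at most 2*n entries."""
    out: list[int] = []
    carry: list[int] = []          # upper half fed back from the previous merge
    i = j = 0
    while i < len(xs) or j < len(ys):
        # Choose the next block from the list whose next entry is lower.
        take_x = j >= len(ys) or (i < len(xs) and xs[i] <= ys[j])
        if take_x:
            block, i = xs[i:i + n], i + n
        else:
            block, j = ys[j:j + n], j + n
        merged = list(merge(carry, block))   # stands in for the merge network
        keep = min(n, len(merged))
        out.extend(merged[:len(merged) - keep])   # lower part is final
        carry = merged[len(merged) - keep:]       # upper part is fed back
    return out + carry

if __name__ == "__main__":
    print(merge_in_parts([1, 4, 9, 12, 15, 20], [2, 3, 5, 6, 7, 30], n=2))
```

Only the entries held back in carry can still be overtaken by entries not yet fetched, which is why the lower part of each merge can be released immediately.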
 
 However, this network only produces a list consisting of the merged 
 entries of the two input lists; no action is taken to remove duplicate 
 entries in an OR operation or, more importantly, to identify these duplicates 
 as the only correct results of an AND. This action must be handled by an 
additional unit, the coordination network. This unit must examine the entries
 and eliminate those which are not proper results. It then must repack the 
 data in the output buffers and wait until these buffers are full, because some 
 of the entries from the Batcher merge network may have been eliminated. 
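
The job left to the coordination network can be illustrated in a few lines. This is a sketch only, assuming that each input list is sorted and contains no repeated document numbers, so that a value appears twice in the merged stream exactly when it was present in both inputs; the function name coordinate is hypothetical.

```python
def coordinate(merged: list[int], op: str) -> list[int]:
    """Post-process a fully merged stream into the result of an OR or an AND."""
    if op not in ("OR", "AND"):
        raise ValueError(f"unsupported operation: {op}")
    out: list[int] = []
    i = 0
    while i < len(merged):
        duplicate = i + 1 < len(merged) and merged[i + 1] == merged[i]
        # OR keeps one copy of everything; AND keeps only the duplicated entries.
        if op == "OR" or duplicate:
            out.append(merged[i])
        i += 2 if duplicate else 1
    return out

if __name__ == "__main__":
    stream = sorted([1, 3, 5, 7] + [3, 4, 7, 9])
    print(coordinate(stream, "OR"))
    print(coordinate(stream, "AND"))
```

Because entries are dropped, the output is shorter than the input stream, which is the reason the hardware unit has to repack its output buffers.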
 
 It is possible, when a large number of list entries are being processed 
 in parallel, for either the processing time (with the unit proposed by 
 Stellhorn) or the number of gates (as proposed by Lawrie [9,10]) of the 
coordination network to be greater than that of the Batcher merge network.
 
* In all instances in this thesis, log n will mean the logarithm to base two
of n, if n is an integral power of two, or the logarithm of the next higher
power of two, if it is not.
 
[Block diagram; legible labels: DATA MEMORY, NETWORK, CONTROL COMPUTER, DISK UNITS, COORDINATION NETWORK, CONTROL, DATA, TO DISK]

Figure 2.1 - Stellhorn/Batcher Merge System
 
[Block diagram; legible labels: STREAM UNIT P, STREAM UNIT Q, STREAM UNIT R, REGISTER P, REGISTER Q, REGISTER R, INDEXING UNIT, MATRIX, TABLE STORED IN MEMORY]

Figure 2.2 - HARVEST Functional Units
 
 The second form of merge processor is similar to the IBM 7950 HARVEST 
 processor, an extension to the IBM 7030 STRETCH system [11]. Figure 2.2 is a 
 simplified diagram of the functional units within the processor. HARVEST is 
 programmed by having STRETCH pass it a list of setup instructions. When the 
 processor has been successfully programmed by these instructions, a start 
 command is issued by STRETCH, and HARVEST processes the streams of data based 
 on the instructions. Facilities exist for the transformation of data 
 controlled by table lookup, in addition to logical transformation. This table 
 lookup scheme causes the processor to fetch data from the current table based 
 on a function of the input characters. This data can consist of an arbitrary 
 number of output characters, including none, and an address for the table to 
 be used next. 
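
A table-driven transformation of this kind can be sketched as follows. This is not HARVEST's instruction or table format: the table layout, the DEFAULT key, and the function name transduce are hypothetical, and the example tables merely collapse runs of blanks. What it shows is the mechanism described above, where each lookup yields zero or more output characters and the table to be used next.

```python
DEFAULT = None  # hypothetical key for "any other character"

# Each table maps an input character to (output characters, next table).
# An output of None means "copy the input character".
TABLES = {
    "text":  {" ": (" ", "space"), DEFAULT: (None, "text")},
    "space": {" ": ("", "space"),  DEFAULT: (None, "text")},
}

def transduce(tables: dict, start: str, data: str) -> str:
    """Run the input through the tables, switching tables as directed."""
    current = start
    out = []
    for ch in data:
        row = tables[current]
        emit, current = row.get(ch, row[DEFAULT])
        out.append(ch if emit is None else emit)
    return "".join(out)

if __name__ == "__main__":
    print(repr(transduce(TABLES, "text", "a  b   c")))
```

Since a lookup may emit nothing, the output stream can be shorter than the input, much as a merge can drop entries.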
 
2.2 A Simple List Merging System
 
Figure 2.3 shows the major data paths connecting the components of a
large scale data processing system, such as the IBM System/360 Model 75. The
large, high bandwidth memory is connected to the various processors by a
memory bus control unit (BCU), which acts as an arbitrator between the
potential memory users. The channels have the highest priority access to the
memory, and the central processor the lowest. The channels are used to
relieve the need for character assembly by the central processor, and to
better match the high bandwidth of the memory to the low bandwidth of the
peripheral units. In a smaller system, the BCU is replaced by a simple bus
arbitration protocol, and the channels by including direct memory access
capability in control units which transfer large amounts of data.
 
[Block diagram; legible labels: DATA PATH, CPU CONTROL PATH, DEVICE CONTROLLERS, TERMINALS, ETC.]

Figure 2.3 - Merge System Configuration
 
[Block diagram; legible labels: HOST SYSTEM CHANNEL, TO OTHER DEVICE CONTROLLERS, CHANNEL INTERFACE, MERGE SYSTEM LOCAL MEMORY, DISK CONTROLLER, MERGE PROCESSOR, SCHEDULING PROCESSOR, INDEX FILE DISK STORAGE, DATA PATH, CONTROL PATH]

Figure 2.4 - Complex Merge System Configuration
 
 The merge processor is added as if it were an additional processor or 
 channel, with control information exchanged between the central processor and 
 the merge processor, and the merge processor transferring data to and from 
 memory thru the BCU. In this configuration, the central processor issues a 
 command to the merge processor indicating the memory locations for the input 
 and output lists and the type of operation desired. The central processor can 
 then execute other tasks until the merge processor completes the operation or 
 detects an error; at this point it will interrupt the central processor for a 
 new command. Due to the high data requirements of the merge processor, as 
 compared to normal input/output devices, and the fact that the bus control 
 unit grants access to the central processor only when another unit is not 
 requesting it, the merge processor can effectively halt execution of the 
 program running on the central processor if care is not taken to periodically 
 relinquish memory ownership to the central processor. 
 
 A more complex merge processing system is illustrated in Figure 2.4. 
 This system contains its own memory and disk files, so its interference during 
 operation of the conventional data processing system is minimized. It 
 consists of a merge processor, a disk system, a channel interface, memory, and 
 a scheduling processor. This scheduling processor receives requests from the 
 host processor, queues them until the appropriate resources are available, 
 fetches the data from disk into memory, merges the entries, and transfers the 
 result either to disk for later usage or to the host processor via the channel 
 interface. To the host system, this configuration appears to be a very 
 intelligent disk system which has all possible combinations of lists stored. 
 
 The merge processor can be implemented either as a parallel or serial 
 unit. As is generally true, the parallel unit operates considerably faster, 
 but requires an increase in gates greater than the increase in its speed over 
the serial unit. However, the parallel unit is easier to understand, and will
 be discussed first. 
 
 2.3 Parallel Element Implementation 
 
 A general block diagram of the parallel merge processor [12,13] is given 
 in Figure 2.5. Data is fetched from memory by either the X or Y list fetch 
 logic and delivered to the appropriate mask checker. Here the context bits 
 are checked using the specified mask to determine if the entry is for an item 
 in the proper context; if it is, the entry is placed in the appropriate input 
 holding register and that register is marked full*. Fetching of list entries 
 continues until both holding registers are full. At this time, the two 
 document number fields are compared, and the action to be taken is determined 
based on the operation specified. This consists of forming the output fields,
marking the output register as full if the operations table specifies the
creation of an output entry, and indicating that either or both of the input
registers are empty. This action continues until the lists are exhausted**.
 
* The merge processor and the memory interface are interlocked using a bit
for each of the inputs and for the output. These bits are set by the data
source when it places data in the buffer, to indicate the connection is
full, and reset by the data sink to indicate the connection is empty, and
new data should be placed in it.
 
 ** In the case of A AND B, processing can be stopped when either list A or 
 list B is exhausted, rather than waiting for both lists to be exhausted. 
 For A AND NOT B, it can be stopped when list A is exhausted. The amount of 
 time this saves is highly data dependent, and will be ignored in future 
 discussions. 
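
The behaviour of a single merge element can be modelled in software. The sketch below is an approximation under stated assumptions: an entry is reduced to a (document number, context flags) pair, an entry passes the mask check when it shares at least one set bit with the mask (the real check may differ), and the function name merge_element is hypothetical. The consume and emit rules follow the comparator outputs: the lower input is consumed, and an output is created on a match for AND, always for OR, and only when X is strictly lower for AND NOT.

```python
END = float("inf")  # stands for an exhausted list

def merge_element(xs, ys, op, xmask=None, ymask=None):
    """Merge two sorted lists of (doc, ctx) pairs; return result doc numbers."""
    if op not in ("AND", "OR", "AND NOT"):
        raise ValueError(f"unsupported operation: {op}")
    # Mask checkers: drop entries that are not in a wanted context.
    x = [d for d, c in xs if xmask is None or c & xmask]
    y = [d for d, c in ys if ymask is None or c & ymask]
    out, i, j = [], 0, 0
    while i < len(x) or j < len(y):
        dx = x[i] if i < len(x) else END
        dy = y[j] if j < len(y) else END
        xlow, ylow = dx <= dy, dy <= dx       # comparator outputs
        if op == "AND":
            emit = xlow and ylow
        elif op == "OR":
            emit = True
        else:                                 # AND NOT
            emit = not ylow
        if emit:
            out.append(min(dx, dy))
        if xlow:
            i += 1                            # X holding register marked empty
        if ylow:
            j += 1                            # Y holding register marked empty
    return out

if __name__ == "__main__":
    X = [(1, 0b01), (3, 0b10), (5, 0b01), (7, 0b11)]
    Y = [(3, 0b01), (4, 0b01), (7, 0b01)]
    for op in ("AND", "OR", "AND NOT"):
        print(op, merge_element(X, Y, op))
    print("AND, X masked", merge_element(X, Y, "AND", xmask=0b01))
```

This loop always runs both lists to exhaustion; the early stopping mentioned in the second footnote is omitted for simplicity.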
 
[Block diagram; legible labels: Memory, X Input Fetcher, Y Input Fetcher, Z Output Storer, X Mask Checker, Y Mask Checker, X Holding Register, Y Holding Register, Document Number Selector, Document Number Comparator, Field 1 Generator through Field N Generator, Action Selector, Command Registers]

Figure 2.5 - Merge Element Block Diagram
 
 The major unit in the element is the document number comparator, which is 
 used to decide which input holding registers should be marked as empty and how 
 the output entry should be formed. Figure 2.6 illustrates a ripple-carry 
 design for this comparator, including the equations for each stage and the 
 approximate number of gates and the delay. The output XLOW is true (high) if 
 the document number in the X input register is less than or equal to the one 
 in the Y register; YLOW is true if Y is less than or equal to X. For higher 
speed operation, a parallel comparator similar to the SN7485 MSI unit [14] can
 be used. The document number in the output entry is generated by using a two 
 input selector driven by the document number fields of the two input list 
 entries, and using either the XLOW or YLOW signal as required to select the 
 lower document number. 
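
A bit-level model makes the comparator's behaviour concrete. The stage equations used below are an assumed reading of a ripple comparator that works from the most significant bit downward, with the internal signal XL meaning "X is already known to be lower" and YL the same for Y; the complemented terms and the bit ordering are assumptions, as is the function name ripple_compare.

```python
def ripple_compare(x: int, y: int, n: int) -> tuple[bool, bool]:
    """Return (XLOW, YLOW) for two n-bit document numbers."""
    xl = yl = 0                                  # XL_0 = YL_0 = 0
    for i in range(n - 1, -1, -1):               # most significant bit first
        xi, yi = (x >> i) & 1, (y >> i) & 1
        # X is lower at the first position where X has a 0 and Y has a 1,
        # provided Y has not already been found lower (and vice versa).
        new_xl = ((1 - xi) & yi & (1 - yl)) | xl
        new_yl = (xi & (1 - yi) & (1 - xl)) | yl
        xl, yl = new_xl, new_yl
    # XLOW means X <= Y, that is, Y was never found strictly lower.
    return (not yl), (not xl)

if __name__ == "__main__":
    for x, y in [(3, 5), (5, 3), (4, 4)]:
        print(x, y, ripple_compare(x, y, n=12))
```

Each stage in this model uses two AND terms and two OR terms, which matches a gate count that grows as four gates per bit.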
 
 Figure 2.7 shows the simple ALU/selector used to form the output count 
 field. A comparator examines the two input count fields to determine the 
 lesser, which is selected if an AND operation is being performed. Since only 
 one of the four AND gates is selected in this case, the adder passes the 
desired count field directly to the output. If either an OR or an AND NOT
operation is specified, the proper inputs to the adder are selected based on
 the XLOW and YLOW control signals — if the appropriate nLOW control signal is 
 true, the Cnt(n) data is fed to the adder, while if nLOW is false, a zero is 
 sent. Other output generators, such as for the tag bit, can be added in a 
 similar fashion. 
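
The count field rule can be written out directly. In this sketch the saturation of the sum at 63 is an assumption (suggested by the count format, in which the largest value stands for "63 or more"), and the function name count_field is hypothetical.

```python
def count_field(op: str, cnt_x: int, cnt_y: int, xlow: bool, ylow: bool) -> int:
    """Form the output count from the two input counts and comparator outputs."""
    if op == "AND":
        # The lesser count is passed through the adder unchanged.
        return min(cnt_x, cnt_y)
    if op not in ("OR", "AND NOT"):
        raise ValueError(f"unsupported operation: {op}")
    # Each adder input is the count if its nLOW signal is true, otherwise zero.
    a = cnt_x if xlow else 0
    b = cnt_y if ylow else 0
    return min(a + b, 63)   # assumed saturation at the largest count value

if __name__ == "__main__":
    print(count_field("AND", 3, 5, True, True))       # matching documents
    print(count_field("OR", 3, 5, True, True))        # matching documents
    print(count_field("OR", 3, 5, True, False))       # only X is the lower entry
    print(count_field("AND NOT", 7, 2, True, False))  # X strictly lower
```

For an OR of matching documents both signals are true, so the two counts are added; when only one input holds the lower document number, its count passes through alone.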
 
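XL_0 = 0
YL_0 = 0
XL_i = \overline{X_i} \cdot Y_i \cdot \overline{YL_{i-1}} + XL_{i-1}
YL_i = X_i \cdot \overline{Y_i} \cdot \overline{XL_{i-1}} + YL_{i-1}
XLOW = \overline{YL_n}
YLOW = \overline{XL_n}

GATES = 4 * n
MAX DELAY = n * AVERAGE GATE DELAY

Figure 2.6 - Ripple Carry Number Comparator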
 
[Logic diagram; legible labels: CNT(X)<CNT(Y), YLOW, AND OPERATION, CNT(Z)]

Figure 2.7 - Parallel Count Field Generator
 
2.4 Serial Element Implementation
 
 Figure 2.8 is a block diagram of the serial implementation of the 
 processor. The data is placed in parallel into the X or Y shift registers 
from memory. From here, it is sent one bit at a time to the merge element,
 and is also routed back to the shift register in case the data is needed by 
 the merge element on the next cycle. When the control determines that a 
 register is empty, new data is loaded into it in parallel replacing the old 
 data collected from the feedback path. 
 
 For the serial element to function properly, it must be sent the most 
 significant bit of the document number first. If other fields are to be 
 processed, the order must be: document number, context field, tag, and count, 
 with all numbers sent most significant digit first. Since this order can be 
 produced by the connection between the holding shift registers and the memory, 
 there are no special format requirements for data in the memory. 
 
 The basic logic of the serial element is shown in Figure 2.9, and 
 consists of two flip-flops (5 and 6, and 8 and 9), logic to set these 
 flip-flops, and a network to select one of the inputs based on the state of 
 the flip-flops. At the start of the operation, both flip-flops are reset, 
 allowing both inputs to pass thru gates 1, 2, and 3 to the output. This 
 continues as long as both input bit streams are equal, since the output of 
 gate ? inhibits either gate 4 or gate 7 from setting the flip-flops. However, 
 when the first difference in the inputs occurs, the output bit is forced to be 
 a zero (since this is the bit in the lowest entry at this time) and one of the 
 flip-flops is set. This causes the output to follow the input which contained 
 the zero bit. 
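
The basic serial element can be simulated one bit at a time. The sketch below models the two flip-flops as Boolean flags and uses hypothetical names (serial_merge, to_bits); it follows the description above rather than the exact gate network, passing the AND of the two inputs while they agree and then following whichever input held the zero at the first difference.

```python
def to_bits(value: int, n: int) -> list[int]:
    """n-bit representation, most significant bit first."""
    return [(value >> i) & 1 for i in range(n - 1, -1, -1)]

def serial_merge(x_bits: list[int], y_bits: list[int]):
    """Return (output bits, XLOW, YLOW) for two equal-length bit streams."""
    follow_x = follow_y = False        # both flip-flops reset at the start
    out = []
    for xb, yb in zip(x_bits, y_bits):
        if follow_x:
            out.append(xb)
        elif follow_y:
            out.append(yb)
        else:
            out.append(xb & yb)        # equal bits pass; a difference gives 0
            if xb != yb:
                # Latch onto the input that contained the zero bit.
                follow_x, follow_y = (xb == 0), (yb == 0)
    xlow = not follow_y                # X <= Y unless Y was found lower
    ylow = not follow_x
    return out, xlow, ylow

if __name__ == "__main__":
    bits, xlow, ylow = serial_merge(to_bits(5, 4), to_bits(6, 4))
    print(bits, xlow, ylow)
```

Because the decision is made at the first differing bit, the numbers have to arrive most significant bit first, as stated earlier in this section.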
 
[Block diagram; legible labels: X SHIFT REGISTER, Y SHIFT REGISTER, SERIAL MERGE ELEMENT, Z COLLECTION REGISTER]

Figure 2.8 - Serial Merge Element Block Diagram

[Logic diagram; legible label: CLOCK]

Figure 2.9 - Basic Serial Merge Element
 
 The addition of extra fields to the serial element is not as simple as it 
 was for the parallel element. Figure 2.10 illustrates the complete data 
 section for a unit which handles count, tag, and context flags in addition to 
 the document number. Gates 10, 11, 12, and 13 are used to examine the context 
 flags against two masks sent bit serially from the control section. If a 
 match failure occurs, the CONTEXT signal is reset and XLOW or YLOW is set to 
indicate that the input is empty. A conventional serial adder (gates 14, 15,
 16, 17, 18, and 19) is used to form the sum as required for the count field in 
an OR operation. For an AND or AND NOT operation, the basic network is used
 to form the minimum or select the desired entry. 
 
The actual output bit is selected by gates 20, 21, and 22. In the case
of the tag bit, the selection of the AND of the input tags required by the OR
operation can be done using gate 3, while the OR necessary for the AND
operation can be produced using both 3 and 19. (Remember that A OR B is
equivalent to A XOR B OR A AND B.) Figure 2.11 is a flow diagram illustrating
the order in which these control signals are generated for the different
operations.
 
 Figure 2.12 summarizes the time required by both a fully parallel and a 
 bit serial implementation of the merge processor to handle entries of 
 representative sizes. Both standard TTL logic (11 ns nominal gate delay) and 
Schottky-clamped TTL (3 ns gate delay) speeds are given. Correspondingly
higher speeds would be available using ECL 10000 or other logic families with
gate delays of 1 ns or less. In the case of the parallel implementation,
speeds for similar MSI devices were used instead of calculating the gate
delays for the selectors, comparators, and adders.
 
[Logic diagram; legible signal names: RESET1, RESET2, CLK1, CLK2, SELSUM, XMASK, CARRY, CONTEXT, XLOW, YLOW]

Figure 2.10 - Complete Serial Merge Element
 
[Flow chart. Recoverable steps: pulse RESET1, RESET2, and RESET3 and initialize SELSUM and SELLOW; wait for Full(X), Full(Y), and Empty(Z); shift inputs and load output, pulsing CLK1, until all document number bits are processed; shift masks and inputs and load output until all context bits are processed; set Empty(X) = XLOW and Empty(Y) = YLOW; if the context flag is set, set Full(Z) to XLOW AND YLOW for "AND", to 1 for "OR", or to NOT YLOW for "AND NOT"; select SELSUM or SELLOW for the count field according to the tests "AND"? and "OR" AND XLOW AND YLOW?, then shift inputs and load output until all count bits are processed, or shift input and output by the number of remaining bits; signal end-of-cycle to the memory interface control.]

Figure 2.11 - Serial Element Control Flow Chart
 
[Figure 2.12: table of merge times and bandwidths for the parallel and serial merge elements in standard and Schottky TTL, with the bandwidths of representative memory devices; values illegible in this scan]
 
 The bandwidths of various memory devices are also given in Figure 2.12. 
These are simply the number of bits they can supply in one second. For the
word oriented memories, such as processor core memories, this is the word
length divided by the cycle time. The bandwidth for a merge processor is
twice the number of bits processed in one second, since it is assumed that, on
the average, each cycle will use one input entry and produce one output entry.
The clock cycles for the serial processors are 30 ns and 110 ns, to allow for
cable delays and control signals outside the merge processor's data section.
 
 As the figure illustrates, even the serial Schottky TTL processor is 
 faster than the memory on the PDP-11. Both parallel implementations are 
 considerably faster than the processors' memories, and up to 200 times faster 
 than the 3330-type disk. Since conventional computers used only about one out 
 of fifteen memory cycles to actually process the list entries, given 
 sufficiently fast local memory, a merge processor system can operate over one 
 hundred times faster than conventional systems. 
 
 CHAPTER 3 — MERGE PROCESSOR NETWORKS 
 
 Many times, a query consists of more than two terms, combined using a 
complex expression of AND's, OR's, and NOT's. It is highly desirable to be
 able to process a query of this type directly, both to save time and to reduce 
 the requirement for storing intermediate results. This can be done by taking 
 the simple merge element considered in Chapter 2, making minor modifications 
 to the design, and connecting a number of them in a network which directly 
 produces the desired result. 
 
 The network at the top of Figure 3.1 illustrates how AND, NOT, and OR 
 elements can be connected to form a more complex expression. Although the 
 network performs its functions on list entries, creating new intermediate 
 lists, it can be considered much the same as Boolean combinational logic, with 
 the same laws (association, distribution, etc.) applying to the expression. 
 For example, the network which is described by (A OR B) AND (C OR D) produces 
the same results as (A AND C) OR (A AND D) OR (B AND C) OR (B AND D).
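
The equivalence can be checked on small lists. The sketch below models each list as a sorted list of document numbers and implements the operations with Python sets; the lists A, B, C, and D are made-up sample data and the helper names are hypothetical.

```python
def OR(a, b):
    return sorted(set(a) | set(b))

def AND(a, b):
    return sorted(set(a) & set(b))

# Made-up sample lists of document numbers.
A, B, C, D = [1, 4, 7], [2, 4, 9], [4, 7, 8], [2, 3, 9]

lhs = AND(OR(A, B), OR(C, D))
rhs = OR(OR(AND(A, C), AND(A, D)), OR(AND(B, C), AND(B, D)))

if __name__ == "__main__":
    print(lhs)
    print(rhs)
```

The first form needs three merge elements and the second needs seven, so applying the Boolean laws before building a network can reduce the hardware required.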
 
3.1 Network Bandwidth Considerations
 
 In this chapter, the network is assumed to operate in a synchronized, 
 pipelined fashion, with the memory able to either fetch or store a single list 
 entry each network cycle. In most cases, this assumption regarding the memory 
 speed is valid, since the simple design of the merge processor allows for very 
 high speed operation. As can be seen in Figure 2.12, even a serial merge 
 processor implemented in conventional 7400 series TTL logic has a higher 
bandwidth than the memory of a DEC PDP-11/40, and the parallel processor
constructed using Schottky TTL has a bandwidth from ?. to 10 times that of the
interleaved memory on the IBM System/360 Model 75. Chapter 4 will briefly
discuss some of the results obtained when the memory speed of the network is
some multiple of the element speed greater than one.

[Network diagram not recoverable from the scan]

R = (J + K) \cdot \overline{L \cdot M}

J = [5, 8, 12, ...
K = [6, 12, ...
L = [5, 6, 7, 9, ...
M = [5, 7, 10, ...

Time   J    K    L    M    C1   C2   R
  1    5    X    X    X    X    X    X
  2    5    6    X    X    X    X    X
  3    8    6    X    X    5    X    X
  4    8    6    5    X    5    X    X
  5    8    6    5    5    5    X    X
  6    8    6    6    X    5    5    X
  7    8    12   6    X    6    X    X
  8    8    12   6    7    6    X    X
  9    8    12   7    7    6    X    X
 10    8    12   9    X    6    7    X
 11    X    12   9    10   8    7    6
 12    X    12   X    10   8    X    X

X indicates the connection is empty

Figure 3.1 - Network to Process a Complex Expression
 
 During each cycle, every network element examines its two input entries 
and forms an output according to the rules given in Chapter 1. If the
 connection to its successor is already full from a previous cycle and the 
 operation produces a new output, it waits for a future cycle when the 
 successor connection is empty. Otherwise, the element places its result, if 
 any, on the connection, and marks its inputs empty as indicated by Figure 
 1.10. The table in Figure 3.1 illustrates how data flows thru the network. 
 
 This operation continues until an end-of-processing entry, which is a 
 number greater than the highest item number allowed, reaches the output 
 connection. This entry is introduced to the network when the fetch logic 
 detects the end of a list. An OR element outputs this special entry when both 
inputs are equal to it, an AND when either input is equal to it (since there
will be no use trying to match inputs if one of the input lists is exhausted),
 and an AND NOT when the X input contains the special entry. 
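
A network of this kind can be imitated with Python generators, where each element pulls entries from its predecessors one at a time, so no intermediate list is ever stored. This is a software sketch, not a cycle-accurate model of the pipelined hardware: the names source, element, and run are hypothetical, the end-of-processing entry is represented by infinity, and the sample lists and the expression (J OR K) AND NOT (L AND M) are chosen for illustration.

```python
END = float("inf")   # end-of-processing entry, above any document number

def source(entries):
    """Fetch logic: deliver the list, then the end-of-processing entry."""
    yield from entries
    yield END

def element(op, xs, ys):
    """One merge element pulling from two input connections."""
    x, y = next(xs), next(ys)
    while True:
        if op == "OR":
            done = x == END and y == END
        elif op == "AND":
            done = x == END or y == END
        else:                              # "AND NOT"
            done = x == END
        if done:
            yield END                      # pass the special entry downstream
            return
        xlow, ylow = x <= y, y <= x
        emit = {"OR": True, "AND": xlow and ylow, "AND NOT": not ylow}[op]
        if emit:
            yield min(x, y)
        if xlow:
            x = next(xs)
        if ylow:
            y = next(ys)

def run(network):
    """Collect the output list, dropping the end-of-processing entry."""
    return [doc for doc in network if doc != END]

J, K, L, M = [5, 8, 12], [6, 12], [5, 6, 7, 9], [5, 7, 10]
c1 = element("OR", source(J), source(K))
c2 = element("AND", source(L), source(M))
result = run(element("AND NOT", c1, c2))

if __name__ == "__main__":
    print(result)
```

The termination rules are what keep an element from reading past the end of an input: once the special entry would be needed again, the element has already passed it on and stopped.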
 
 Why should a number of merge processors be connected together to form a 
 network when the memory is capable only of fetching or storing a single entry 
 in the time that it takes the element to perform its operation? Because the 
 processing of any expression involving more than two inputs requires the 
 generation of intermediate results, which must also be both stored and 
refetched from memory. This reduces the number of useful memory cycles
 available in a unit of time (the memory's effective bandwidth), increasing the 
 time required to process the expression. 
 
 In contrast, if the expression is processed directly by a network of 
 merge elements, no intermediate results are generated. The time required for 
 the network to process an expression in this case is approximately equal to 
the sum of the lengths of the input lists and the output list, multiplied by
 the memory cycle time. If intermediate results are produced, the time 
 required is this basic time plus twice the length of the intermediate results 
 (since they are both stored and fetched). 
 
 The lengths of the intermediate results are dependent upon the lengths of 
the input lists and the amount of overlap between these inputs. If the
operation is to form the AND of a number of lists with negligible overlap, the
size of the intermediate lists will be small and the savings insignificant.
However, if the operation is the OR of N input lists, again with negligible
overlap and length L, the first stage in the tree will have a total input
length of N L, and an output length of (2 N L)/2, or N L. In fact, each stage
will have inputs and outputs equal in length to N L. Since there are log N
 stages in the tree, the total number of memory cycles required is 2 N L log N, 
 with only 2 N L being used for fetching input lists or storing the output 
 list. The effective increase in bandwidth using a tree structure which does 
 not require the storing and refetching of intermediate results is simply the 
 ratio of these two quantities, or log N. 
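 The log N ratio can be checked numerically. The sketch below is an 
 illustration under the same assumptions as the text (equal-length lists, 
 negligible overlap, one memory cycle per entry); the function names are 
 hypothetical. 

```python
import math


def or_tree_cycles(n_lists, length):
    """Memory cycles when N lists are OR'ed two at a time, stage by stage,
    storing and refetching every intermediate list."""
    lengths = [length] * n_lists
    cycles = 0
    while len(lengths) > 1:
        merged = []
        for i in range(0, len(lengths), 2):
            pair = lengths[i:i + 2]
            out = sum(pair)              # negligible overlap: output = sum of inputs
            if len(pair) == 2:
                cycles += sum(pair) + out  # fetch both inputs, store the output
            merged.append(out)
        lengths = merged
    return cycles


def network_cycles(n_lists, length):
    """Memory cycles when a merge network evaluates the OR directly:
    N*L fetches for the inputs plus N*L stores for the output."""
    return 2 * n_lists * length


ratio = or_tree_cycles(16, 100) / network_cycles(16, 100)
```

 For sixteen lists the ratio equals log 16 = 4, matching the factor derived 
 above. 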
 
 In other words, for an expression containing sixteen input lists, it is 
 possible to increase memory performance by a factor of one to four by simply 
 using fourteen additional low-cost processing elements. 
 
 
 A secondary benefit occurs if a list is used in more than one place in an 
 expression. Using conventional processing, this list must be fetched each 
 place it is used. However, it is only necessary to fetch it once for all the 
 times it is used in the network, again reducing the number of memory cycles 
 required to process an expression. 
 
 This effective increase in memory speed allows two options. Either the 
 system can process a user's request faster, thereby shortening response times 
 or allowing more simultaneous users of the system, or, if this higher speed is 
 not necessary, low speed memory can be used to give speeds comparable to a 
 single processor using high speed memory. If a lower speed memory is 
 utilized, this would allow the purchase of a larger memory at the same cost. 
 With this larger memory available, it is possible that fewer of the final 
 results will have to be placed on the disk, being saved instead in the high 
 speed memory for use with later queries. This further increases the speed of 
 the system by reducing the number of lists which must be stored on and 
 fetched from the comparatively slow disk memory. 
 
 3.2 Merge Network Hardware Considerations 
 
 There are two reasonable forms for the network to take — either the 
 connections between the elements are fixed, or they can be changed under 
 control of the host computer. In either case, the function of the individual 
 elements is controlled by setup instructions from the host computer. 
 
 For a network with N inputs, N - 1 processing elements are required. 
 However, if an Omega network [15] is used to interconnect the individual 
 elements and the memory, order N log N switching elements are required. Other 
 switching networks require at least as many switching elements as the Omega 
 network. For large networks, the number of gates in the switching network is 
 more than in all the processing elements. Because of this, only the fixed 
 network will be discussed in this chapter. 
 
 Since each element has two inputs and one output, the network takes the 
 form of a binary tree. This means that if a network has a possible N inputs 
 (where N is a power of two), it contains N - 1 processing elements in log N 
 stages. Figure 3.2 illustrates this binary tree, and some of the terms used 
 when referring to parts of the tree and the individual elements. Notice that 
 the only processing element inputs and outputs which are connected to the 
 memory are at the extreme ends of the tree. Within the tree, an element's 
 output can only be connected to another element's input. 

 Since it is probable that the expression specified by the user does not have 
 exactly N input lists, some method must exist to accommodate it without the 
 necessity of expanding or contracting it to N inputs. This can be done by 
 defining a fourth operation (in addition to AND, OR, and AND NOT) which an 
 element can perform. This is the PASS operation, which transfers data from 
 the X-input, if it is full, to the output, if the output connection is empty, 
 and marks the input empty when the transfer occurs. The Y-input is marked 
 empty unconditionally each cycle to prevent an improperly specified operation 
 from blocking the network. However, any connection which has an input list as 
 a predecessor should not be used as a Y-input to the PASS element. Figure 3.3 
 illustrates the operation of a network which includes a PASS element. 
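 One cycle of the PASS operation can be sketched as follows. The `Connection` 
 class and the function name are hypothetical; an empty connection is modelled 
 here by the value `None`, which is an assumption of this sketch rather than a 
 description of the hardware. 

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    """A connection between elements; value None models an empty connection."""
    value: Optional[int] = None

    @property
    def full(self) -> bool:
        return self.value is not None


def pass_step(x: Connection, y: Connection, out: Connection) -> None:
    """One cycle of a PASS element."""
    # Transfer X to the output only if X is full and the output is empty,
    # marking X empty when the transfer occurs.
    if x.full and not out.full:
        out.value = x.value
        x.value = None
    # The Y-input is marked empty unconditionally each cycle.
    y.value = None


x, y, out = Connection(7), Connection(3), Connection()
pass_step(x, y, out)   # 7 moves to the output; the stray Y entry is discarded
```

 Because the Y-input is discarded every cycle, an entry placed there by mistake 
 cannot hold up the rest of the network. 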
 
 [Figure 3.2 - Binary Tree Configuration: diagram of the N input lists, the 
 distributor, and the tree of processing elements; not recoverable from the 
 scanned source] 
 
 
 R = S + T + U 

 S = [5,8,12, ... ] 
 T = [6,12, ... ] 
 U = [2,4,10,15, ... ] 

 Time    S    T    U   C1   C2    R 
    1    5    X    X    X    X    X 
    2    5    6    X    X    X    X 
    3    8    6    X    5    X    X 
    4    8    6    2    5    X    X 
    5    8    6    4    5    2    X 
    6    8    6   10    5    4    2 
    7    8    6    X    5   10    4 
    8    8    X    X    6   10    5 
    9    8    X    X    X   10    6 
   10    8    X    X    X   10    X 
   11    8   12    X    X   10    X 
   12   12   12    X    8   10    X 

 X indicates the connection is empty 

 Figure 3.3 - Network with PASS Element 
 
 
 Operation of the network is controlled by the host system. Commands are 
 sent from the host system to the controllers for the processing elements to 
 indicate which of the four operations each should perform. The starting 
 addresses in memory and the lengths of the input lists and the output area are 
 passed to the memory interface. Finally, all connections are marked empty, 
 and the processing is started. If an error is encountered, or when processing 
 has finished, the host processor is interrupted, and the network waits for the 
 next command. 
 
 Memory operation is a function of the output being full or an input 
 empty. For each cycle of the network, the output connection is checked and 
 the output list element stored if the connection is full. If no output was 
 stored, the distributor finds an input which is empty, and fetches the next 
 entry of that input list, marking the input connection full. If a list is the 
 input to more than one network element, the distributor does not fetch a new 
 entry until all elements which use the input have marked it empty. Since the 
 lists are long (and of course if they weren't, this special hardware would 
 not be necessary), no special priority scheme to decide which list entry to 
 fetch next is necessary. For any scheme used, the time required to evaluate 
 the expression only differs by the time required to flush out the pipeline 
 when the ends of the input lists are reached, which is proportional to the 
 height of the tree. 
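 The memory discipline described above can be illustrated with a small 
 software simulation of the expression R = S + T + U, built from two OR 
 elements and one PASS element. This is a sketch under stated assumptions, not 
 a model of the actual hardware timing: the end-of-list marker is represented 
 by infinity, elements are evaluated from the output end toward the input end 
 within a cycle, and the distributor simply picks the first empty input. All 
 names are hypothetical. 

```python
END = float("inf")  # assumed end-of-list marker, larger than any entry


def or_step(c, x, y, out):
    """Unmodified OR: fires only when both inputs are full and the output empty."""
    if c[x] is None or c[y] is None or c[out] is not None:
        return
    a, b = c[x], c[y]
    c[out] = min(a, b)
    if a <= b:          # release the lower input (both when equal)
        c[x] = None
    if b <= a:
        c[y] = None


def pass_step(c, x, out):
    """PASS: move X to the output when X is full and the output is empty."""
    if c[x] is not None and c[out] is None:
        c[out], c[x] = c[x], None


def simulate(s, t, u, max_cycles=10_000):
    lists = {"S": list(s) + [END], "T": list(t) + [END], "U": list(u) + [END]}
    pos = dict.fromkeys(lists, 0)
    c = dict.fromkeys(("S", "T", "U", "C1", "C2", "R"))  # None means empty
    result = []
    for cycle in range(1, max_cycles + 1):
        or_step(c, "C1", "C2", "R")
        or_step(c, "S", "T", "C1")
        pass_step(c, "U", "C2")
        if c["R"] is not None:
            # One memory operation per cycle: storing the output has priority.
            value, c["R"] = c["R"], None
            if value == END:
                return result, cycle
            result.append(value)
        else:
            # Otherwise fetch the next entry of the first empty input list.
            for name, entries in lists.items():
                if c[name] is None and pos[name] < len(entries):
                    c[name] = entries[pos[name]]
                    pos[name] += 1
                    break
    raise RuntimeError("simulation did not finish")


result, cycles = simulate([5, 8, 12], [6, 12], [2, 4, 10, 15])
```

 Since at most one entry is fetched or stored per cycle, the cycle count can 
 never be less than the total number of entries moved, whichever input the 
 distributor happens to choose. 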
 
 Figure 3.4 summarizes the number of gates required to construct the tree 
 using serial and parallel processing elements. The time required for the 
 network to process an entry is approximately that given for a single processor 
 in Figure 2.12, since the network operates as a pipelined processor. 
 
 
 N term expression (which requires N-1 processing elements) 
 X bit list entry 

 Total gates in network = (N-1)proc + X(N+1)mem + (N-2)conn + control 

 control = number of gates in central control and interface to host 
           system (depends greatly on host system requirements) 

 conn    = number of gates in inter-element connection 
         = O(2) 

 mem     = number of gates in memory interface 
         = O(2X) 

 proc    = number of gates in processing element and its local control 
         = O(30) for serial processing element 
         = O(10X) for parallel processing element using ripple-carry 
           comparators and adders (more if carry-lookahead used) 

 Figure 3.4 - Network Gate Counts 
 
 
 3.3 A Serious Problem 
 
 Unfortunately, the network structure presented above will not work in all 
 cases. For example, Figure 3.5 illustrates a simple network which produces 
 (L AND M) OR (L AND N) OR (M AND N). For clarity, the portion of the network 
 consisting of PASS elements has been deleted. The value X indicates that the 
 connection is empty. It is clear from the table of connection values vs. 
 time that the tree rapidly reaches a point where no changes occur. Element E3 
 is prevented from placing its next result on C3. It therefore cannot mark 
 either of its inputs M and N empty. However, new entries from list M and list 
 N must be fetched before E1 and E2 can produce results. These in turn are 
 needed by E4 and E5 to empty C3. The network is hopelessly deadlocked. 
 
 This deadlock only occurs in a network which contains AND or AND NOT 
 processing elements, and only if an input list is used as the input to more 
 than one processor. A network which does not have any shared input lists 
 cannot deadlock, since if one input connection to an element is full, the 
 other is capable of supplying entries until the full connection can be marked 
 as empty. 
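 The deadlock can be reproduced with a small simulation of the unmodified 
 network. This is only a sketch: the wiring of E1 through E5 is inferred from 
 the description above, the element rules are an assumed reading of the 
 original operations table (an element acts only when both of its inputs are 
 full), and end-of-list handling is omitted. A shared input list is refetched 
 only after every element that uses it has released the current entry. 

```python
LISTS = {"L": [3, 8, 10], "M": [1, 2, 6, 9, 12], "N": [1, 2, 5, 8, 11]}

# (element, operation, X input, Y input, output); wiring inferred from the text.
ELEMENTS = [
    ("E1", "AND", "L", "M", "C1"),
    ("E2", "AND", "L", "N", "C2"),
    ("E3", "AND", "M", "N", "C3"),
    ("E4", "OR", "C1", "C2", "C4"),
    ("E5", "OR", "C4", "C3", "R"),
]


def run(lists, elements, max_cycles=1000):
    consumers = {}
    for name, _, x, y, _ in elements:
        consumers.setdefault(x, []).append(name)
        consumers.setdefault(y, []).append(name)
    conns = set(lists) | {e[4] for e in elements}
    value = dict.fromkeys(conns)
    holders = {c: set() for c in conns}  # who still needs the current entry
    pos = dict.fromkeys(lists, 0)
    result = []

    def emit(conn, v):
        value[conn] = v
        holders[conn] = set(consumers.get(conn, ["MEMORY"]))

    def step(name, op, x, y, out):
        if name not in holders[x] or name not in holders[y]:
            return False                      # an input is empty for this element
        a, b = value[x], value[y]
        if op == "AND" and a != b:
            holders[x if a < b else y].discard(name)  # discard the lower input
            return True
        if holders[out]:
            return False                      # output still full: element waits
        emit(out, min(a, b))
        if a <= b:
            holders[x].discard(name)
        if b <= a:
            holders[y].discard(name)
        return True

    for _ in range(max_cycles):
        progress = False
        for element in elements:
            progress |= step(*element)
        if holders["R"]:                      # store the output if it is full
            result.append(value["R"])
            holders["R"].clear()
            progress = True
        else:                                 # otherwise fetch one input entry
            for name, entries in lists.items():
                if not holders[name] and pos[name] < len(entries):
                    emit(name, entries[pos[name]])
                    pos[name] += 1
                    progress = True
                    break
        if not progress:
            state = {c: value[c] if holders[c] else None for c in sorted(conns)}
            input_left = any(pos[n] < len(lists[n]) for n in lists)
            return result, state, input_left
    raise RuntimeError("no stall detected")


result, state, stalled_with_input_left = run(LISTS, ELEMENTS)
```

 Under these assumptions the run stalls with L = 3, M = 2, N = 2 and the value 
 1 stuck on C3, which is the steady state shown in Figure 3.5, although the 
 cycle at which each value appears need not match the figure. 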
 
 This problem can be solved by modifying the operation of the element and 
 extending the concept of a connection being full or empty. Instead of a 
 connection being full, it will be called valid. The term empty will be 
 replaced by invalid. The difference is that invalid data on a connection has 
 a definite value, while the value of an empty connection is undefined. Figure 
 3.6 is the operations table, similar to the one in Figure 1.10, for the 
 modified merge processing element. It shows the input and output control and 
 the formation of the document number field. The count and tag fields are 
 produced as before, except in the case where the two inputs are equal. Here, 
 an input is only used to form the output field if it is valid. The major 
 change consists of unconditionally placing the value of the lower input on the 
 output connection if the output connection is free (does not contain valid 
 data). In addition, the element compares the two inputs regardless of whether 
 both are valid, and sets the validity of the output based not only on the 
 relative magnitudes of the two inputs, but also on their validity. Figure 3.7 
 shows the operation of the network in Figure 3.5, but with the modified 
 element operation. A value in parentheses indicates that the value is 
 invalid. 

 [Network diagram of elements E1 through E5 and connections C1 through C4 not 
 recoverable from the scanned source] 

 R = LM + LN + MN 

 L = [3,8,10, ... ] 
 M = [1,2,6,9,12, ... ] 
 N = [1,2,5,8,11, ... ] 

 Time    L    M    N   C1   C2   C3   C4    R 
    1    3    X    X    X    X    X    X    X 
    2    3    1    X    X    X    X    X    X 
    3    3    1    1    X    X    X    X    X 
    4    3    2    X    X    X    1    X    X 
    5    3    2    2    X    X    1    X    X 
    6    3    2    2    X    X    1    X    X 
    7    3    2    2    X    X    1    X    X 
    8    3    2    2    X    X    1    X    X 
    9    3    2    2    X    X    1    X    X 

 X indicates the connection is empty 

 Figure 3.5 - Network Deadlock 

 [Figure 3.6: operations table for the modified merge processing element; not 
 recoverable from the scanned source] 
 
 [Network diagram of elements E1 through E5 and connections C1 through C4 not 
 recoverable from the scanned source] 

 R = LM + LN + MN 

 L = [3,8,10, ... ] 
 M = [1,2,6,9,12, ... ] 
 N = [1,2,5,8,11, ... ] 

 Time    L    M    N    C1    C2    C3    C4     R 
    1    3   (0)  (0)  (0)   (0)   (0)   (0)   (0) 
    2    3    1   (0)  (0)   (0)   (0)   (0)   (0) 
    3    3    1    1   (1)   (0)   (0)   (0)   (0) 
    4    3    2   (1)  (1)   (1)    1    (0)   (0) 
    5    3    2    2   (2)   (1)    1    (1)   (0) 
    6    3    6   (2)  (2)   (2)    2    (1)    1 
    7    3    6   (2)  (3)   (2)    2    (2)   (1) 
    8    3    6    5   (3)   (2)   (2)   (2)    2 
    9    3    6    5   (3)   (3)   (5)   (2)   (2) 
   10    8    6    5   (3)   (3)   (5)   (3)   (2) 
   11    8    6    8   (6)   (5)   (5)   (3)   (3) 
   12    8    9    8   (6)    8    (6)   (5)   (3) 
   13   10    9   (8)  (8)    8    (8)   (6)   (5) 
   14   10    9   11   (9)   (8)   (8)    8    (6) 
   15   10   12   11   (9)  (10)   (9)   (8)    8 

 (A) indicates A is a value marked as invalid 

 Figure 3.7 - Modified Network Operation 

 A modified network is one constructed from processing elements following 
 the operations table given in Figure 3.6. An unmodified network is one formed 
 according to the original operations table in Figure 1.10. The following 
 results show that a modified network will never deadlock, and will produce 
 valid output list entries identical to those of the unmodified network. 

 Lemma The output of any subtree of a modified network consists of the 
 union of its input lists, in increasing order with all duplicates 
 removed. Furthermore, if an output list entry is invalid, it cannot 
 become valid at some later time. 

 Proof First consider a subtree consisting of a single merge processing 
 element whose inputs are the input lists. By the operations table, it 
 takes the lower of its two inputs and places it on its output. It then 
 gets a higher valued input to replace the one transferred to the output. 
 Therefore, its output is the union of its two inputs, in increasing 
 order. If the same entry occurs in both its inputs, only one output 
 entry is produced, so that no duplicates exist in the output. Finally, 
 once an output has been produced, the input that was used to form it is 
 replaced by a higher valued entry, so that any future output entry, valid 
 or invalid, must be greater than the current output entry. 
 
 Assume the Lemma is true for all subtrees consisting of N stages. A 
 subtree of N + 1 stages consists of a single merge processing element, 
 with two subtrees of N stages as its inputs. Since the outputs of these 
 two subtrees are assumed to be in the form given by the Lemma, which is 
 the same form as for an input list, the arguments given for the single 
 element subtree also hold for the final element in the N + 1 stage 
 subtree. Therefore, the output of this subtree is in the form given by 
 the Lemma. By induction, the Lemma is true. 
 
 Lemma If an N stage subtree of a modified network is not deadlocked, 
 and its output connection remains free (either because no valid items are 
 placed on it or because the successor element of the subtree immediately 
 marks it as invalid), the lowest of its inputs is transferred to its output 
 in not more than N cycles. 
 
 Proof For a subtree consisting of a single element, the Lemma is 
 obvious. Assume the Lemma is true for all subtrees of N - 1 stages. 
 Remember that a tree of N stages consists of two subtrees of N - 1 stages 
 as inputs to a single element at stage N. After N - 1 cycles, each of 
 these subtrees has transferred the lowest of its inputs to its output. 
 At cycle N, the final stage's element takes the lower of these two 
 subtree outputs, which is the lowest of the inputs to both subtrees, and 
 places it on its output connection. Therefore, the Lemma holds for all 
 subtrees of N stages, and by induction is true. 
 
 
 
 Theorem A modified network cannot deadlock. 
 
 Proof Assume the network is deadlocked. Therefore, one or more 
 elements is unable to place a valid entry on its output connection 
 because a previous valid entry has not been marked invalid by the element 
 which has the connection for an input. Furthermore, this condition has 
 been in existence for an arbitrary length of time. This is a blocked 
 connection. 
 
 Consider the blocked connection B closest to the input end of the 
 tree. The subtree which has B as its output connection will be called X, 
 the element with B as its input, E, and the subtree which feeds E's other 
 input will be called Y. Because connection B is blocked, E only takes 
 its input from subtree Y. 
 
 If there are no input lists in common between X and Y, Y will 
 continue to transfer its input list entries to element E's input. 
 Eventually, an entry (possibly the end-of-list marker) greater than the 
 value in connection B will occur at the output of Y. This will allow E 
 to process the value in B, unblocking it. Hence, the network cannot 
 deadlock if there is not at least one input list in common between X and 
 Y. 
 
 If there is an input in common, the value of its list entry is 
 greater than or equal to the value in connection B. This is because if 
 there were an input to a subtree less than the value of its output, at 
 some later time the lower input would occur as an output entry. But this 
 cannot occur because of the first Lemma above. 
 
 
 Therefore, subtree Y has at least one input which is greater than or 
 equal to the value in connection B. Since Y is not blocked, eventually 
 the value at the common input will be the lowest input to Y. By the 
 second Lemma, shortly thereafter this value will be the output of Y. 
 Since this value is greater than or equal to the value in connection B, 
 connection B will be marked invalid by element E. Hence, the tree is not 
 deadlocked. 
 
 Theorem An OR element modified according to the operations table in 
 Figure 3.6 produces the same valid items in the same order as one 
 constructed according to the operations table in Figure 1.10. 
 
 Proof The only parts of the new operations table which must be examined 
 are those which differ from the original table. These fit two different 
 categories: the input with the lower value is valid but the higher input 
 is invalid, or both inputs are equal, but only one is valid: 
 
 In the first case, a valid result is produced and the lower input is 
 marked as invalid, where previously no action was taken. However, this 
 can produce an incorrect action only if at the next time both inputs are 
 valid, the input which held the higher invalid input now contains a valid 
 entry less than or equal to the original lower input. However, by the 
 above Lemma, this cannot occur. Hence, the OR element functions 
 correctly in this case. 
 
 
 In the second case, the element produces a valid result even though 
 only one of the two equal inputs is valid. Again, this is an incorrect 
 action only if the input which contains the invalid entry were to contain 
 a valid entry less than or equal to the current invalid entry at some 
 later point in time. Since by the Lemma this cannot occur, the operation 
 is performed correctly. 
 
 A similar proof can be used to show that the AND, AND NOT, and PASS 
 elements function correctly. Since all elements in the network function 
 correctly, it is clear that the network will always yield the correct results. 
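 A conventional software merge gives a reference result against which the 
 output of a network can be compared. The sketch below is illustrative only; 
 the function name is hypothetical, and it evaluates the expression of Figures 
 3.5 and 3.7 on just the list prefixes shown there. 

```python
def merge(x, y, op):
    """Two-pointer merge of sorted, duplicate-free lists.

    op is "AND" (entries in both), "OR" (entries in either),
    or "AND NOT" (entries in x that are not in y).
    """
    out = []
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            if op in ("AND", "OR"):
                out.append(x[i])
            i += 1
            j += 1
        elif x[i] < y[j]:
            if op in ("OR", "AND NOT"):
                out.append(x[i])
            i += 1
        else:
            if op == "OR":
                out.append(y[j])
            j += 1
    # Whatever remains in x or y has no partner in the other list.
    if op in ("OR", "AND NOT"):
        out.extend(x[i:])
    if op == "OR":
        out.extend(y[j:])
    return out


L = [3, 8, 10]
M = [1, 2, 6, 9, 12]
N = [1, 2, 5, 8, 11]

# R = LM + LN + MN
R = merge(merge(merge(L, M, "AND"), merge(L, N, "AND"), "OR"),
          merge(M, N, "AND"), "OR")
```

 For these prefixes R contains 1, 2, and 8, which are the entries that appear 
 as valid values on R in Figure 3.7. 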
 
 3.4 Parsing Expressions for a Fixed Tree Size 
 
 If the expression can be contained in the available tree, how the 
 expression is parsed has no effect on the required processing time. 
 Disregarding end-of-list effects, the time required to process the expression 
 is identical for all forms of the expression. It is simply proportional to 
 the lengths of the input and output lists, since the network is pipelined and 
 only one entry can be transferred to or from the memory in any one network 
 cycle. However, if the expression cannot be contained in the available tree 
 in any form, the problem of reducing the processing time becomes more complex. 
 
 In the following discussion, subexpression will mean that portion of the 
 total expression which can be processed directly by the available tree. The 
 processing of a subexpression by the tree will be termed a pass, with the 
 first subexpression processed during pass one. Figure 3.8 illustrates a 
 possible scheme for numbering the passes. All the passes performed at the 
 same level of trees in the processing of an expression will be referred to as 
 a level. 
 
 
 The following discussion assumes that the overlap between lists is 
 negligible, so that the length of the list produced by an OR operation is the 
 sum of the two input lists' lengths, while the length of the list produced by 
 an AND is extremely small, and can be regarded as zero. 
 
 The time required to process an expression is proportional to the sum of 
 the lengths of the inputs to all the passes plus the lengths of the outputs 
 from these passes. In particular, each entry of an intermediate list must be 
 counted twice — once as an output and once as an input. It is therefore 
 important to minimize the lengths of the intermediate lists. In addition, if 
 a list appears as the input to more than one pass, it must be counted 
 separately for each pass to determine the processing time. The major 
 trade-off is then whether to fetch a list more than once in hopes of reducing 
 the length of the intermediate results. 
 
 The easiest expressions to parse into multiple passes are those 
 consisting of all AND's or all OR's. It is obvious that it is not necessary 
 for an input list to be used more than once, so the only problem is to 
 minimize the lengths of the intermediate results. For an expression 
 consisting of all AND's, the length of any intermediate list is negligible, 
 and therefore the results of any parsing scheme will require the same 
 processing time. 
 
 For an OR expression, however, the length of an intermediate list is 
 equal to the sum of the lengths of the inputs to the pass creating the 
 intermediate list. In this case, it is possible to reduce the lengths of the 
 intermediate lists by including the longest input lists in the highest level 
 possible. However, each time an input list is used as the input to a pass, it 
 reduces by one the number of passes available in the next lower level. 
 Therefore, if too many input lists are used at a higher level, the number of 
 passes remaining at the lower level may be unable to handle the expression, 
 necessitating an additional lower level, which will create more intermediate 
 results. If the length of the input list being included in the higher level 
 is more than the lengths of the input lists which are used to form the passes 
 in the new lower level, the input list should be included in the higher level; 
 if not, it should be included in a lower level. 
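 The effect of where the longest lists are placed can be seen with a small 
 cost calculation. This is a sketch under the negligible-overlap assumption 
 used in this section; the nested-list representation of a parse, the function 
 name, and the list lengths are all hypothetical. 

```python
def or_pass_cost(plan):
    """Return (output_length, memory_cycles) for a multi-pass OR expression.

    A plan is a list whose items are either an integer (the length of an
    input list) or a nested list (a pass at the next lower level whose
    output is fed to this pass). With negligible overlap the output of a
    pass is as long as the sum of its inputs, and each pass costs one cycle
    per input entry fetched plus one per output entry stored.
    """
    total_in = 0
    cycles = 0
    for item in plan:
        if isinstance(item, list):
            length, sub_cycles = or_pass_cost(item)
            cycles += sub_cycles
        else:
            length = item
        total_in += length
    return total_in, cycles + 2 * total_in


# Seven lists on a four-input tree: two passes are needed.
longest_high = [[10, 20, 30, 40], 500, 600, 700]   # longest lists in the top pass
longest_low = [[40, 500, 600, 700], 10, 20, 30]    # longest lists in the lower pass

_, cost_high = or_pass_cost(longest_high)
_, cost_low = or_pass_cost(longest_low)
```

 With these assumed lengths, keeping the three longest lists out of the lower 
 pass shortens the intermediate list from 1840 entries to 100 and so reduces 
 the total number of memory cycles. 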
 
 A more interesting expression is the one studied by Jane Liu [16] for 
 conventional processors. This is a sum of terms multiplied by a single term, 
 such as 

 (A1 OR A2 OR A3 OR ... OR AN) AND B 

 An expression of this type requires a tree whose height is equal to 1 + log N. 
 This means that at least half the elements of the tree which produces the 
 expression are PASS elements. 
 
 Each element at the input end of a tree has two input connections. There 
 are three classes of inputs which can be applied to this input pair 
 (illustrated in Figure 3.9): 

 Class AB, the simplest class, consisting of one of the A inputs and the B 
 input. In this case, the input element is used as an AND, 
 effectively distributing B to the selected A input. (Note that while 
 this is a reasonable way to view the operation, if this B connection 
 and any other input connection were interchanged, the time required 
 to process the expression would remain the same, although the 
 elements would be assigned different functions.) 

 Class SB, consisting of B and a pass from a lower level which does not 
 include B in any of its inputs. 

 Class SS, consisting of two passes from the next lower level, each of 
 which contains B in its inputs. It is important to note that the 
 number of A inputs available at the next lower level is identical for 
 both the SB and SS input pair classes. 

 [Figure 3.8 - Multi-Pass Expression Processing: diagram showing the levels 
 and pass numbers; not recoverable from the scanned source] 

 [Figure 3.9 - Input Pair Classes: diagrams of the AB, SB, and SS input 
 pairs; not recoverable from the scanned source] 
 
 The following algorithm assigns input pairs so as to reduce the time 
 required to process the expression given above: 
 
 1. Start at the highest level of the expression, setting B* = 0. 
 
 2. Find the longest remaining A list. If using it in an AB input pair would 
 force another level in order to process the expression, go to Step 3. 
 Otherwise, use it in an AB input pair and set B* = Length(B). If there 
 are no more A's left, the algorithm is finished. If only one input pair 
 remains unfilled and more than one A input remains, go to Step 4. 
 Otherwise, repeat this step. 

 3. If the longest remaining A causes the creation of a new level, and the sum 
 of the input lengths to this new level is more than the length of A, go to 
 Step 4. Otherwise, use it in an AB input pair and set B* = Length(B). If 
 there are no more A's left, the algorithm is finished. If only one input 
 pair remains unfilled and there is more than one remaining A input, go to 
 Step 4. Otherwise, repeat this step. 
 
 
 4. Group the remaining A's into as many subexpressions as there are remaining 
 input pairs. As much as possible, include the longest remaining A inputs 
 at the highest levels and in the same subexpressions. 
 
 5. For each remaining subexpression, if the sum of the lengths of its inputs 
 is greater than the length of B plus B*, go to Step 6. Otherwise, use the 
 subexpression in an SB input pair and set B* = Length(B). If no more 
 subexpressions exist, the algorithm is finished. Otherwise, repeat this 
 step. 

 6. Use this algorithm to process the subexpression AND'ed with B. Go to Step 
 5. 
 
 The algorithms for parsing other forms of expressions are similar, with 
 the same trade-off between fetching an input more than once and producing 
 longer intermediate results. 
 
 
 CHAPTER 4 - FUTURE RESEARCH 
 
 The previous chapters discussed merge processors and trees constructed 
 from these processors, in the form which would be used in most cases. 
 However, certain assumptions were made regarding the configuration of the 
 tree, the relative bandwidths of the processors and their memory, and the 
 actual implementation of both processors and networks. In addition, no 
 special considerations were made for a system which can support more than one 
 user at a time. 
 
 Initial research has been conducted in these areas. This chapter 
 presents the preliminary results of this research. 
 
 4.1 Higher Bandwidth Memories 
 
 In the previous discussions, it was assumed that the cycle time of the 
 memory was less than or equal to the time required for a merge processor to 
 handle a single entry. This assumption is valid for cycle times of large 
 semiconductor memories currently available. Given a merge processor 
 implemented in commercially available Schottky TTL logic, the memory cycle 
 time would have to be greater than 80 ns for the assumption to be invalid. 
 Available and projected logic families would allow the processor to operate at 
 even higher speeds. 
 
 Furthermore, it is the memory which holds the various lists, rather than 
 the high speed buffer memory, which must be considered. This is because the 
 lists must be brought from the bulk memory to the high speed memory before the 
 merge processor can use them. The time required for this is a function of the 
 bulk memory transfer rate. Current technology dictates that this bulk memory 
 be some form of disk system, because of the high volume of data. Therefore, 
 the processor is substantially faster than the memory. Until faster 
 technologies are available for low cost mass storage of data (such as CCD 
 shift register memories), the assumption regarding the processor vs. memory 
 speeds will hold. 
 
 It is still important to understand the network behavior when the memory 
 cycle time is less than the time a merge processor needs to handle an entry. 
 However, due to the nature of the network, much of its behavior depends on 
 the data contained in the lists specified by the expression. Certain 
 characteristics of the processors can be examined, to better assess the 
 network's behavior. The action of an AND element reduces the size of a list, 
 while an OR increases it. Assuming negligible list overlap, when connected in 
 a network an OR element transfers one of its inputs to its output whenever 
 the output connection is free. On the average, the input lists are processed 
 at half the rate that the output list of the OR element is being processed by 
 the OR element's successor. For an AND element, the normal action is to 
 process one input element every network cycle, regardless of the action of 
 the successor. 
 
 It can be shown that the maximum memory parallelism (expressed as a ratio 
 between the memory cycle time and the processor speed) utilized in the steady 
 state by a network of all OR's is 2. This is because the network produces an 
 output and requires only one input during each memory cycle. However, for a 
 network whose input stage consists of all AND's, the maximum parallelism which 
 can be utilized (again assuming negligible list overlap) is equal to the 
 number of input stage AND processors. 
 
 
 Complicating the analysis is the case where a list is used by more than 
 one processor as an input. In this case, the network actions depend greatly 
 on the data contained in the input lists. The best method to analyze the 
 network behavior in this case is with a simulator. 
 
 Figure 4.1 illustrates the results of a number of simulations of 
 representative expressions with different amounts of memory parallelism. For 
 each expression, the first column is the memory parallelism, the second is the 
 number of network cycles required to completely process the expression, the 
 third is the memory utilization, and the final column is the processor 
 requirement. The memory utilization is simply the number of memory cycles 
 actually used divided by the total number of memory cycles available (which is 
 the product of the parallelism and the time required to process the 
 expression). The processor requirement is the number of processors needed by 
 the expression, which is given next to the expression in the figure, times the 
 time required to process the expression. The data used for the simulation 
 consisted of lists of 100 random numbers, with 5 numbers common to all lists, 
 and from 1 to 15 different numbers common between pairs of lists. 
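
The two derived columns follow directly from the definitions just given. The sketch below is an illustration only: the function names are hypothetical, the processor requirement example uses the 3 processors and 433 network cycles listed in Figure 4.1 for (A+B)(C+D) at a parallelism of one, and the inputs to the utilization example are made-up numbers, not simulation data.

```python
def memory_utilization(cycles_used, parallelism, network_cycles):
    """Memory cycles actually used divided by memory cycles available."""
    return cycles_used / (parallelism * network_cycles)


def processor_requirement(processors, network_cycles):
    """Processors needed by the expression times the time to process it."""
    return processors * network_cycles


# (A+B)(C+D): 3 processors, 433 network cycles at a parallelism of one.
req = processor_requirement(3, 433)

# Hypothetical numbers: 400 memory cycles used, parallelism 2, 250 network cycles.
util = memory_utilization(cycles_used=400, parallelism=2, network_cycles=250)
```

Here req is 1299, agreeing with the corresponding entry in the figure, and util is 0.8 for the made-up inputs.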
 
 The differences in time required to process equivalent expressions when 
 the parallelism is one are due to end-of-processing conditions. A decrease of
 one cycle in the processing time occurs with each increase in parallelism, 
 since one more input can be initially loaded. 
 
 The results show that distributing a product of sums expression into its 
 sum of products form reduces the time required and increases the memory 
 utilization. However, it greatly increases the processor requirement, far 
 more than the increase in memory utilization. In the case of a fixed tree, it
 
 
 EXPRESSION                         MEMORY        CYCLES   MEMORY        PROCESSOR
                                    PARALLELISM            UTILIZATION   REQUIREMENT

 (A+B)(C+D)                         1             433      0.98          1299
 [3 Processors Required]            2             363      0.59          1089
                                    3             358      0.40          1074
                                    4             357      0.30          1071

 (AC+AD+BC+BD)                      1             431      0.99          3017
 [7 Processors Required]            2             310      0.69          2170
                                    3             300      0.47          2100
                                    4             299      0.36          2093

 (A+B+C)(D+E+F)                     1             654      0.99          3270
 [5 Processors Required]            2             524      0.62          2620
                                    3             512      0.42          2560
                                    4             512      0.31          2560
                                    5             511      0.25          2555
                                    6             511      0.21          2555

 (AD+AE+AF+BD+BE+BF+CD+CE+CF)       1             656      0.98          11152
 [17 Processors Required]           2             441      0.73          7497
                                    3             409      0.53          6953
                                    4             408      0.40          6936
                                    5             408      0.32          6936
                                    6             407      0.26          6919

 (A+B)(C+D)(E+F)                    1             610      0.99          3050
 [5 Processors Required]            2             403      0.75          2015
                                    3             386      0.52          1930
                                    4             384      0.39          1920

 (AB+CD)                            1             417      0.99          1251
 [3 Processors Required]            2             216      0.96          648
                                    3             199      0.69          597
                                    4             198      0.52          594

 Figure 4.1 - Simulation Results
 
 
 is therefore advantageous to distribute to increase the number of input stage 
 AND's, as long as the expression can be contained in the tree. For the last 
 expression given in Figure 4.1, that of a sum of products whose product terms
 have independent inputs, it is clear that ideal memory utilization exists when
 the parallelism is less than or equal to the number of input AND's.
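
The cost of distributing a product of sums can be sketched as follows. This is an illustration under stated assumptions, not the parsing algorithm of the earlier chapters: every element is assumed to have two inputs, so combining k lists takes k-1 elements, and the function names are hypothetical. Under that assumption the counts reproduce the processor requirements listed beside the expressions in Figure 4.1.

```python
from itertools import product


def distribute(sums):
    """Expand a product of sums into the product terms of its sum-of-products form."""
    return list(product(*sums))


def processors_for_product_of_sums(sums):
    # k-1 OR elements per sum of k lists, plus m-1 AND elements to join m sums.
    return sum(len(s) - 1 for s in sums) + (len(sums) - 1)


def processors_for_sum_of_products(terms):
    # k-1 AND elements per product of k lists, plus m-1 OR elements to join m terms.
    return sum(len(t) - 1 for t in terms) + (len(terms) - 1)


pos = [("A", "B"), ("C", "D")]                    # (A+B)(C+D)
sop = distribute(pos)                             # AC+AD+BC+BD
pos_count = processors_for_product_of_sums(pos)   # 3
sop_count = processors_for_sum_of_products(sop)   # 7
input_stage_ands = len(sop)                       # 4 AND's at the input stage
```

The distributed form places four AND's at the input stage instead of two OR's, which is what allows more memory parallelism to be used, at the price of more than twice as many processors.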
 
 Simulations were also made to determine the effects of first in/first out 
 buffers between all the elements in the tree. In all but a few extreme cases, 
 there was no benefit in using FIFO's, and in the cases where there was benefit 
 it was negligible. Different techniques for determining which empty input to 
 fill were also tried, again with the differences being slight. 
 
 4.2 Multi-User Systems
 
 Since it is reasonable to assume that any system requiring this 
 specialized hardware is able to support more than one user at a time, it may 
 be convenient to use more than one processor tree in the system. This can be 
 done so that a time consuming request does not block smaller requests, or to 
 take better advantage of high memory parallelism. In this case, many medium 
 sized trees (the size dependent on the normal expression lengths for the 
 system) can be used. Again, the expression is distributed to fill the tree,
 but this is less important than previously, since the effective parallelism is 
 equal to the memory parallelism divided by the number of trees in use at a 
 given time. To avoid a single tree monopolizing the memory, the following 
 memory priority scheme can be used if the parallelism is greater than twice 
 the number of trees. First, store all outputs from the trees; then fill one 
 input of each tree; if more memory cycles are available, fill any remaining 
 inputs. If the parallelism is less than twice the number of trees, a similar
 scheme can be used, but with a commutator to decide which tree should be given
 the first try at using the memory.
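
The priority scheme can be sketched as a three-pass allocation of the memory cycles available in one network cycle. This is a hypothetical software model only: the function name, the dictionary keys, and the representation of the commutator as a rotating starting index (the parameter first) are all assumptions made for illustration.

```python
def allocate_cycles(trees, parallelism, first=0):
    """Grant up to `parallelism` memory cycles for one network cycle.

    trees: list of dicts with 'output_ready' (bool) and 'empty_inputs' (int).
    first: index of the tree given the first try (the commutator position).
    """
    order = [(first + k) % len(trees) for k in range(len(trees))]
    empty = {t: trees[t]["empty_inputs"] for t in order}
    grants = []

    # 1. Store every waiting output.
    for t in order:
        if len(grants) < parallelism and trees[t]["output_ready"]:
            grants.append((t, "store"))
    # 2. Fill one input of each tree.
    for t in order:
        if len(grants) < parallelism and empty[t] > 0:
            grants.append((t, "fetch"))
            empty[t] -= 1
    # 3. Spend any cycles left over on the remaining empty inputs.
    for t in order:
        while len(grants) < parallelism and empty[t] > 0:
            grants.append((t, "fetch"))
            empty[t] -= 1
    return grants


trees = [
    {"output_ready": True, "empty_inputs": 2},
    {"output_ready": False, "empty_inputs": 3},
]
plenty = allocate_cycles(trees, parallelism=5)           # more than twice the trees
scarce = allocate_cycles(trees, parallelism=2, first=1)  # commutator favors tree 1
```

In the first call every tree receives at least one input before any tree receives a second. In the second call the rotating index decides which tree is served first, so advancing it each cycle keeps one tree from monopolizing the memory.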
 
 A special use for multiple processor trees is presented by the EXPLODE 
 command, as discussed in Chapter 1. This command forms the OR of a large 
 number of lists. It is not reasonable to have every tree large enough to 
 handle EXPLODE's, which occur frequently enough to be considered but not 
 frequently enough to warrant making every user's tree large enough to handle 
 them.
 
 Two possible solutions exist. Either one or more large tree networks can 
 be constructed to be used especially for EXPLODE's, or a Batcher merge 
 configuration similar to that proposed by Stellhorn can be used. In the 
 latter case, it may not be necessary to construct as complex a coordination 
 network. If the overlap in the list entries is slight, then very few entries 
 must be eliminated. In this case, it may be easier and faster to leave them 
 in the list, with a special flag indicating that they are not to be 
 considered. Since it is likely that the results of the EXPLODE will be used 
 as an input to some later expression, the merge processing tree can be 
 modified to ignore any list entry with this special bit set, much as it 
 presently ignores entries which do not have the proper context bits set. 
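
The idea of flagging rather than eliminating duplicate entries can be sketched as follows. This is an illustration only: the boolean in each pair stands in for the special flag bit, the function names are hypothetical, and the input is assumed to be an ascending merged list in which duplicates are adjacent.

```python
def flag_duplicates(sorted_entries):
    """Mark repeated entries in an ascending list instead of deleting them."""
    flagged, previous = [], None
    for value in sorted_entries:
        flagged.append((value, value == previous))  # True means "ignore this entry"
        previous = value
    return flagged


def live_entries(flagged):
    """What a later merge stage sees when it skips flagged entries."""
    return [value for value, ignore in flagged if not ignore]


merged = [3, 7, 7, 12, 15, 15, 15, 20]
flagged = flag_duplicates(merged)
visible = live_entries(flagged)
```

The flagged list keeps its original length, so no entries have to be squeezed out of the merge output, while a later stage that honors the flag sees each entry only once.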
 
 The primary benefit of the Stellhorn/Batcher network is that it can be 
 expanded without limit to match the bandwidth of the memory. (However, it 
 should be remembered that the cost of the network is order N log N.) The
 expansion of the merge processing tree discussed in Chapter 3 is limited by 
 the length of the list element (which determines how many input bits can be 
 processed in parallel), and the complexity of the expression (which determines
 the maximum practical tree size). But the Stellhorn/Batcher network has the
 same problem with intermediate results as the single merge processor discussed 
 in Chapter 2. Although it is possible to connect Stellhorn's merge processor 
 to form a tree, the amount of hardware which would be necessary for a tree to 
 handle an EXPLODE would make it currently impractical. 
 
 Although it depends on the size of the list entry, the amount of
 parallelism in the Batcher network, and the number of terms in the EXPLODE, it
 appears that for EXPLODE's on the order of 64 terms the time required
 using the tree network is less than that using a Stellhorn/Batcher system.
 Slightly more gates are required for the tree network, but the product of
 gates times time is less for the tree network.
 
 
 CONCLUSIONS 
 
 It has been observed that much of the time required to process a request
 on a large scale inverted file information retrieval system is used to
 transfer and merge lists of pointers to items in the text files. This problem
 is ill suited for conventional digital computers, which are organized
 primarily for numeric operations. Less than ten percent of the memory cycles
 used by the merge subroutines are used to directly process the data.
 
 It is possible to construct specialized merge processors, which can 
 operate at speeds faster than the memories generally available on conventional 
 processors, thus increasing the processing speed by a factor greater than ten. 
 In fact, it is possible for a processor of this type to operate ten times 
 faster than the memory on the IBM System/360 Model 75, so that with 
 sufficiently high speed memory, the processing time for a merge can be reduced 
 by a factor of over one hundred. 
 
 These simple processors can also be connected together to form networks,
 primarily binary trees. This further reduces the time required to process an
 expression, since the number of memory cycles required to store and later
 refetch intermediate results is reduced. In fact, if the tree is large enough
 to accommodate the complete expression, no intermediate results will be
 produced. For an expression with N terms, this can result in an increase in
 the effective bandwidth by a factor of log N. An additional increase in
 effective memory bandwidth occurs if a list is used more than once in an
 expression. In this case, it can be fetched only once and distributed to the
 various network inputs, while conventional approaches would require it to be
 fetched each time it was used.
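
The log N figure can be made plausible by counting memory cycles. The sketch below is a simplified model, not the analysis of the earlier chapters: it assumes an OR of N lists, N a power of two, equal list lengths, negligible overlap, one memory cycle per entry fetched or stored, and a single processor that merges the lists pairwise level by level. The function names are hypothetical.

```python
def levels(n_lists):
    """Depth of a balanced binary tree over n_lists inputs (a power of two)."""
    if n_lists < 2 or n_lists & (n_lists - 1):
        raise ValueError("n_lists must be a power of two, at least 2")
    return n_lists.bit_length() - 1


def cycles_single_processor(n_lists, list_len):
    # Each level fetches all N*L entries and stores all N*L merged entries.
    return levels(n_lists) * 2 * n_lists * list_len


def cycles_tree(n_lists, list_len):
    # Inputs are fetched once and only the final result is stored.
    return 2 * n_lists * list_len


single = cycles_single_processor(64, 100)
tree = cycles_tree(64, 100)
ratio = single // tree
```

For 64 lists the single processor uses six times as many memory cycles as the tree in this model, that is, log N with the logarithm taken to base 2.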
 
 
 As long as the expression can be contained in the tree, any scheme used 
 for parsing it will result in the same processing time. If it is larger than 
 the available tree, it can be divided into a number of passes through the tree
 so that the time required is reduced. This technique was illustrated for a 
 number of different forms of expressions. 
 
 The total speed increase is highly dependent on the structure of the 
 input lists and the form of the expression. The factors mentioned above 
 combine multiplicatively, so even if the memory used has the same bandwidth as
 for the conventional computer, it is possible to achieve increases of greater 
 than ten for a simple two term expression. If the user specifies an EXPLODE 
 which forms the OR of 64 different lists, it is possible that the speed 
 increase will be from 60 to 100. The use of higher speed memory, to better 
 match the speed of the merge processors, increases these speeds by a factor of 
 10. It is therefore possible to process large EXPLODE's up to 1000 times 
 faster than a conventional general purpose digital computer. 
 
 The introduction of a specialized list merging system of this type into a 
 large scale inverted file information retrieval system either can drastically 
 improve the response time for the existing users, or can allow substantially 
 more users to utilize the system at the same time. 
 
 
 LIST OF REFERENCES 
 
 [1] L. A. Hollaar, et al., "System Architecture for Information Retrieval,"
 EUREKA Project Memorandum, 1975.

 [2] D. B. McCarn and J. Leiter, "On-Line Service in Medicine and Beyond,"
 Science, pp. 318-324, July 27, 1973.

 [3] George Kingsley Zipf, The Psycho-biology of Language: an Introduction to
 Dynamic Philology, MIT Press, 1965.

 [4] George Kingsley Zipf, Human Behavior and the Principle of Least Effort,
 Hafner Publishing Company, 1965.

 [5] M. L. Hanley, Word Index to James Joyce's Ulysses, Madison, Wis., 1937.

 [6] B. J. Hurley, EUREKA Project Internal Memorandum, 1975.

 [7] W. H. Stellhorn, "A Specialized Computer for Information Retrieval,"
 Ph.D. thesis, University of Illinois, Department of Computer Science
 Report No. 637, October 1974.

 [8] K. E. Batcher, "Sorting Networks and Their Applications," Spring Joint
 Computer Conference, pp. 307-314, 1968.

 [9] D. H. Lawrie, "A Solution to the Squeeze Problem in Log N," EUREKA
 Project Internal Memorandum, 1974.

 [10] D. H. Lawrie, "Squeezing Harder," EUREKA Project Internal Memorandum,
 1974.

 [11] W. Buchholz, ed., Planning a Computer System: Project Stretch,
 McGraw-Hill, 1962.

 [12] L. A. Hollaar, "PDP-11/40 Experimental Merge Unit - Engineering
 Drawings," EUREKA Project Internal Documentation, 1974.

 [13] L. A. Hollaar, "PDP-11/40 Experimental Merge Unit - Preliminary
 Programming Notes," EUREKA Project Internal Documentation, 1974.

 [14] Texas Instruments Incorporated, The TTL Data Book for Design Engineers.

 [15] D. H. Lawrie, "Memory-Processor Connection Networks," Ph.D. Thesis,
 University of Illinois, Department of Computer Science Report No.
 557, February 1973.

 [16] J. W. S. Liu, "Algorithms for Parsing Search Queries in Inverted File
 Document Retrieval Systems," University of Illinois Department of
 Computer Science Report No. 75-718, April 1975.
 
 
 VITA 
 
 Lee Allen Hollaar was born on March 9, 1947, in Litchfield, Minnesota.
 He received his B.S. degree in Electrical Engineering from the Illinois
 Institute of Technology in 1969, and his M.S. in Computer Science from the
 University of Illinois in 1974. While a student, Mr. Hollaar was employed by
 the First National Bank of Chicago; Burroughs Corporation Defense, Space, and
 Special Systems Group; Automation Technology, Inc., now a part of Gould Data
 Systems; and Datalogics, a Chicago computer systems engineering firm. He was
 responsible for the design of a number of hardware and software systems,
 including development of fast, efficient methods for the automated
 photocomposition of complex material.
 
 Mr. Hollaar was a Research Assistant with the Department of Computer 
 Science, University of Illinois, from 1970 to 1975, where he was associated 
 with the EUREKA information retrieval system project. From 1974 to the
 present, he has also been associated with the University of Illinois' Aviation
 Research Laboratory, investigating architectures and algorithms for digital
 avionics systems.
 
 He is a member of Eta Kappa Nu, Phi Kappa Phi, IEEE, ACM, and the 
 Institute of Navigation. He is also an associate member of Sigma Xi. 
 
 BIBLIOGRAPHIC DATA SHEET

 Report No.: UIUCDCS-R-75-762
 Title and Subtitle: A LIST MERGING PROCESSOR FOR INVERTED FILE INFORMATION
 RETRIEVAL SYSTEMS
 Report Date: October 1975
 Author(s): Lee Allen Hollaar
 Performing Organization Rept. No.: UIUCDCS-R-75-762
 Performing Organization Name and Address: University of Illinois at
 Urbana-Champaign, Department of Computer Science, Urbana, Illinois 61801
 Contract/Grant No.: US NSF DCR73-07980 A02
 Sponsoring Organization Name and Address: National Science Foundation,
 Washington, D. C.
 Type of Report & Period Covered: Doctoral - 1975

 Abstract:

 In large scale inverted file information retrieval systems implemented on
 conventional digital computers, it is possible for more time to be spent
 processing and merging the index lists than on any other activity. However,
 this non-numeric processing cannot be performed efficiently by general purpose
 digital computers, thus reducing either the response time of the system or the
 number of simultaneous users. This paper describes a simple processor which
 can efficiently process these lists while the main computer devotes itself to
 other tasks.

 It is also possible to combine these merge processors into networks which
 can process complex expressions directly, with no requirement for the storing
 and later refetching of intermediate results. This eliminates the need for
 memory cycles to store and later refetch these intermediate results,
 effectively increasing the available memory bandwidth.

 Designs for both word-parallel and bit-serial implementations of the merge
 processor are presented. The modifications necessary for these processors to
 be connected as a network, and in particular to form a binary tree, along with
 algorithms for parsing expressions which can be contained in the available
 tree and which are too large and must be subdivided, are also discussed.

 Key Words: Binary Tree Networks, Computer System Architecture, Information
 Retrieval Systems, Inverted File Databases, List Merging, Non-numeric
 Processing, and Pipelined Networks.

 Availability Statement: RELEASE UNLIMITED
 Security Class (This Report): UNCLASSIFIED
 Security Class (This Page): UNCLASSIFIED
 No. of Pages: 78
 