Em 
 
 
 Km 
 
 WM&b HIS 
 
 i iSI Hi 
 
 MMbBINI Mb H H 
 
 ■mm 
 nllTTTnnnfl 
 
 HMHUH 
 
 IB 
 
 
 ■i28 
 
 p 
 
 K$ 
 
 H §11 
 
 H Hi 
 
 H ■ 
 
 ■MB 
 
 ■Bt 
 
 I 
 
 m ma wm 
 
 BBsBBBBBSm BH89B 
 
 hBH 
 
 MS 
 
 
 IBS 
 
 ■ 
 
 
 J 
 
 bhj 968 Ma ma s 
 BPBl flags Kb ■Mbm 
 
 HB 
 
 1 H 
 
 nils 
 
 H 
 
 ansa tHWMUMl 
 
 
 BBS 
 
 
 "V.^ 
 
 HHH1 
 
 ?.»! 
 
 
 ■ 
 
 
 ■1 
 
 ■I I 
 
 ■■ 
 
 ■■I 
 
 ffi*< 
 
 
 I 
 
 ■F 
 
 EH '. ~i**- H 
 ■MOMm 
 
 ■H 
 
 ■ 
 
 l» 
 
 m 
 
 
 ■mvsmi 
 
 ■ 
 
 
 BbEB 
 
 H 
 
 HIS M BS 
 
 ■ V HH ShMBBKE 
 
 Bl 
 
 M] 
 
 HI i 
 nHMffi 
 
LIBRARY OF THE 
 
 UNIVERSITY OF ILLINOIS 
 
 AT URBANA-CHAMPAICN 
 
 510.84 
 
 TiQr 
 
 ho.C31-64Z 
 
 Cop. Z 
 
The person charging this material is re- 
 sponsible for its return to the library from 
 which it was withdrawn on or before th™ 
 Latest Date stamped below. 
 
 Theft mu ,il t i on, and underlining of books are reasons 
 
 z jsxt oc,ion and may resu,t in dismi " a ' '- 
 
 To renew call Telephone Center, 333-8400 
 
 ^^!L^^!ii!!!ARY AT URBANA-CHAMPA.GN 
 
 SEP 2 2 
 
 L161— O-1096 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/specializedcompu637stel 
 
OIU. If 
 
 )ie/r37 Re P° rt No - UIUCDCS-R-74-637 
 
 NSF - OCA - GJ -36936 - 000003 
 A SPECIALIZED COMPUTER FOR INFORMATION RETRIEVAL 
 
 by 
 William Howard Stellhorn 
 
 October 1974 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 THE LIBRARY OF THE 
 
 UNIVERSITY OF ILLINOIS 
 AT URBAf\ "\<\!GN 
 
» 
 
 Report No. UIUCDCS-R-74-637 
 
 A SPECIALIZED COMPUTER FOR INFORMATION RETRIEVAL* 
 
 by 
 
 William Howard Stellhorn 
 
 October 1974 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801 
 
 * 
 This work was supported in part by the National Science Foundation under 
 
 Grant No. US NSF-GJ-36936 and was submitted in partial fulfillment of the 
 
 requirements for the degree of Doctor of Philosophy in Computer Science, 
 
 October 1974. 
 
Ur 
 
 A SPECIALIZED COMPUTER FOR INFORMATION RETRIEVAL 
 
 William Howard Stellhorn, Ph.D. 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign, 1974 
 
 Response time in large, inverted file document retrieval systems 
 is determined primarily by the time required to access files of document 
 identifiers on disk and perform the processing associated with a Boolean 
 search request. This paper describes a specialized computer system capable 
 of performing these functions in hardware. Using this equipment, a com- 
 plicated sample search involving 70 terms and over 60,000 document ref- 
 erences can be performed from 12 to 60 times faster than with a conventional 
 machine, and many small searches can be processed concurrently with very 
 little effect upon system performance. 
 
 A detailed description of the system, which can be realized with 
 currently-available technology, is presented; and algorithms for controlling 
 the progress of a search are discussed. Results from numerous simulations 
 involving various system configurations and other factors are also reported. 
 
m 
 
 ACKNOWLEDGMENT 
 
 Several persons have contributed significantly to the success 
 of this project, and their help is gratefully acknowledged. Special thanks 
 are due to my advisor, Professor David J. Kuck, for his constant enthusi- 
 asm and continuing help and support. 
 
 Thanks, too, to officials at the National Library of Medicine, 
 especially Messrs. W. H. Caldwell and D. B. McCarn, for information con- 
 cerning the MEDLARS and MEDLINE systems. 
 
 Much of the test data reported here was obtained through the 
 efforts of Mr. B. J. Hurley in maintaining, modifying and running the 
 simulation program. 
 
 Finally, the financial support of the Department of Computer 
 Science and the typing and drafting services of Mrs. Vivian Alsip and 
 Messrs. S. Zundo and R. Bright have been invaluable. 
 
IV 
 
 TABLE OF CONTENTS 
 
 Page 
 
 1. INTRODUCTION 1 
 
 1.1 Overview 1 
 
 1.2 Operational Environment 3 
 
 2. TERM COORDINATION HARDWARE 4 
 
 2.1 System Description 4 
 
 2.2 Example 8 
 
 2.3 Hardware Requirements 11 
 
 2.3.1 Merge Network 16 
 
 2.3.2 Coordination Network 21 
 
 2.3.2.1 General Description 21 
 
 2.3.2.2 Step 1 Processing 24 
 
 2.3.2.3 Step 2 Processing 27 
 
 2.3.2.4 Step 3 Processing 31 
 
 2.3.2.5 Merge and Coordination Control Requirements. . 34 
 
 2.3.3 Data Memory 35 
 
 2.3.4 Disk File System 36 
 
 2.3.5 Control Computer 36 
 
 2.4 System Integration 37 
 
 2.5 Summary 48 
 
 3. BASIC ALGORITHMS 50 
 
 3.1 Sublist Sequencing 50 
 
 3.2 Intermediate Results 53 
 
 3.3 List Splitting 54 
 
 3.4 Special Requirements of OR, AND and NOT Processing .... 54 
 
 3.5 Processing Algorithms for the Experimental System 57 
 
 3.5.1 Overview 57 
 
 3.5.2 List Selection 57 
 
 3.5.3 Merge Initiation 58 
 
 3.5.4 File S Processing 59 
 
 3.5.5 Result Processing 59 
 
 3.5.6 Standard Parameters 60 
 
 3.6 Example 61 
 
 4. PERFORMANCE 66 
 
 4.1 Preliminaries 66 
 
 4.2 Monoprogrammed Results 73 
 
Page 
 
 4.2.1 Basic Tests 73 
 
 4.2.2 Data Base Expansion 79 
 
 4.2.3 Discussion of Performance Curves 83 
 
 4.2.4 Other Parameters 87 
 
 4.2.4.1 Overlap 88 
 
 4.2.4.2 Buffering Delay 88 
 
 4.3 Multi programmed Results 91 
 
 4.4 Algorithmic Development 96 
 
 4.5 Merge Activity 100 
 
 5. CONCLUSION 103 
 
 LIST OF REFERENCES 106 
 
 VITA 108 
 
VI 
 
 LIST OF TABLES 
 
 Page 
 
 2.1 Processing Summary for Term Coordination Example. . . 9 
 
 2.2 Design Parameters for Standard System 15 
 
 2.3 Merge Network Characteristics 19 
 
 2.4 Coordination Step 2 Control Signals for First Three Cycles 
 
 of Operation 29 
 
 2.5 Cumulative Timing for Hardware Cycle with 256 Parallel 
 
 Data Paths 41 
 
 2.6 Cumulative Timing for Hardware Cycle with 16 Parallel 
 
 Data Paths 42 
 
 3.1 Definition of Sample Search 62 
 
 3.2 Progress of Sample Search 63 
 
 4.1 Data for Long Search [15] 68 
 
 4.2 Data for Short Search [15] 70 
 
 4.3 Long Search Performance for Three Standard Systems 75 
 
 4.4 Speed Improvement Factors Relative to Conventional Processor. . 77 
 
 4.5 Elapsed Time for Short Sample Search 78 
 
 4.6 Variation of Processing Cycle Length 90 
 
 5.1 Component Cost Estimates 104 
 
vn 
 
 LIST OF FIGURES 
 
 Page 
 
 2.1 Hardware Configuration 5 
 
 2.2 Symbol for a Comparison Element 7 
 
 2.3 Logic Symbols and Device Characteristics 13 
 
 2.4 Logic Diagram for a Comparison Element 17 
 
 2.5 Merge Network Feedback Connections 20 
 
 2.6 Coordination Network Interconnections 23 
 
 2.7 Details of Coordination Steps 1 and 2 25 
 
 2.8 Coordination Step 2 Interconnections, Final Three Stages. ... 28 
 
 2.9 Transfer Mechanism Between Coordination Steps 2 and 3 33 
 
 2.10 Timing Summary for Hardware Subsystems 39 
 
 2.11 Time Distribution for Hardware Activities in the Standard 
 
 System 43 
 
 2.12 Memory Conflict Analysis 45 
 
 3.1 Definitions for Sublist Sequence Discussion 52 
 
 3.2 Processing Example: "OR" Eleven Terms 65 
 
 4.1 Basic Performance Analysis 74 
 
 4.2 Effects of Data Base Expansion 80 
 
 4.3 Comparison of Small and Large Systems with Expanded Data 
 
 Bases 82 
 
 4.4 Comparison of Large and Small System Performance with 4X 
 
 Data Base 84 
 
 4.5 Overlap Factor Variations 89 
 
 4.6 Average Search and Response Time Presentation 93 
 
 4.7 Multi programmed Results 94 
 
 4.8 Algorithm Development 97 
 
 4.9 Merge and Coordination Hardware Utilization 101 
 
1. INTRODUCTION 
 
 1 .1 Overview 
 
 During the last few years, the growth of on-line information 
 retrieval services has been rapid, and this expansion is expected to 
 continue on a major scale for a long time to come. As such a system grows 
 and prospers, two problems often arise. First, the data base tends to 
 grow--rapidly, sometimes--and it is often difficult both to justify de- 
 leting old material and to select items to be discarded. Second, the 
 number of users desiring service may also tend to increase. Both of these 
 developments increase the load on the system, until eventually it becomes 
 difficult to provide sufficiently fast response to satisfy on-line users. 
 
 A number of systems already in operation are large enough to 
 experience these problems, and many have prospects for nearly unlimited 
 growth. To cite a single example, it is reported [1, 2] that Mead Data 
 Central, Incorporated 's LEXIS (formerly OBAR) now contains the full text 
 of all New York and Ohio statutes and supreme and appellate court deci- 
 sions plus all United States Supreme Court decisions and a number of other 
 federal materials. The complete United States Code and all federal court 
 of appeals and district court decisions are to be available in the spring 
 of 1974. This data base contains well over 100 million words of English 
 text and grows at a rate of several million words per year, and the nature 
 of the material makes deletion of old documents unacceptable. Besides ex- 
 panding the coverage of its data base, Mead is said to be planning to offer 
 retrieval services in a number of states not already served. 
 
 Other large retrieval systems include those maintained by the 
 
National Library of Medicine [3], to be discussed in more detail later, and 
 by the United States Patent Office [4]. 
 
 This report describes a specialized hardware subsystem for per- 
 forming the time-consuming term access and coordination functions in large, 
 inverted file, document retrieval systems. It is conservatively estimated 
 that the time to perform these functions for a large search involving 70 
 terms can be reduced by factors between 12 and 60 depending upon the size 
 of the hardware system employed. The speed-up is not so great for a smaller 
 search involving only a few terms; but in this case, a number of searches 
 can be performed in parallel with yery little effect upon the system, so 
 that the average elapsed time per search can still be reduced dramatically. 
 
 The remaining section of Chapter 1 describes the organization of 
 inverted file retrieval systems and identifies that portion of their oper- 
 ation which the proposed hardware will perform. Chapter 2 describes the 
 hardware components in detail, analyzes the timing constraints imposed by 
 each and shows that several processors of different sizes and capacities 
 can be built using currently-available subsystems and logic devices. Chap- 
 ter 3 describes several fundamental software procedures which must be pro- 
 vided to control the operation of the hardware and presents details of the 
 processing algorithms which have been used in simulating the proposed 
 system. Chapter 4 presents results of a large number of simulation ex- 
 periments in which the performance of the system has been evaluated in a 
 realistic retrieval situation. These tests are based on parameters of 
 actual searches which could be performed in a particular, large, operational 
 document retrieval system. Variations in the capacity of the hardware, the 
 size of the data memory, the size of the data base, and several other 
 
factors are considered. A few results which illustrate the potential of 
 the system to process multiple independent searches simultaneously are also 
 discussed. Conclusions are presented in Chapter 5. 
 
 1 .2 Operational Environment 
 
 Nearly all the mechanized document retrieval systems currently 
 in operation employ inverted files for data base organization. Certain 
 index terms (possibly all the information-bearing words in the original 
 text) are selected as descriptors for each document in the system. Each 
 index term is entered into a directory, the index file » along with certain 
 information including a pointer into a second directory, the postings file , 
 which contains a list of all the contexts (documents or document sub- 
 divisions) identified by the index term in question. To request infor- 
 mation from such a system, a user provides a list of index terms and 
 specifies the Boolean relationships (OR, AND, AND NOT) among them which 
 must be satisfied in any document that is retrieved. The system then 
 consults the index file to obtain the required postings file addresses, 
 reads the postings lists, and coordinates them, i.e., selects from them 
 those context identifiers which satisfy the search logic. This last pro- 
 cedure requires at least one disk access per search term and, if there is 
 a large number of search terms or if some of the associated postings lists 
 are very long, it may require a substantial amount of central processor 
 time as well. The new system described in this report accepts a list of 
 postings file addresses and performs the access and coordination operations 
 automatically, at disk speeds. 
 
2. TERM COORDINATION HARDWARE 
 
 This chapter describes in detail the proposed hardware for in- 
 verted file processing. Section 2.1 contains a brief general description 
 of the system and the functions of its various components. Section 2.2 
 presents a fairly detailed example of its operation, illustrating the 
 parallel nature of the design and some of the timing constraints which 
 must be satisfied. Hardware requirements are presented in detail, along 
 with logic designs for the critical components, in section 2.3. Section 
 2.4 contains a systems-oriented analysis, showing how the components inter- 
 act with one another and how their activity is distributed within the 
 available time. Several design alternatives are discussed, and the as- 
 sociated limitations are identified. Section 2.5 summarizes the principal 
 results of the chapter. 
 
 2.1 System Description 
 
 Term coordination in inverted file systems can be performed almost 
 entirely by hardware operating at disk speeds using the configuration shown 
 in Figure 2.1. Suppose the search "LI OR L2" is to be performed, i.e., two 
 ordered lists, LI and L2, are to be merged into a single ordered list with 
 duplicate elements removed. Suppose further that LI will be available for 
 reading before L2. LI and L2 are initially stored on disk in n-word blocks. 
 When LI becomes available, it is read into data memory and held there until 
 L2 comes under the read heads. Merging and coordination (selection of the 
 desired elements from the merged list) proceed in parallel with the reading 
 of L2, and the entire operation is completed shortly after the last block 
 of L2 has been read. The output list may be retained in data memory if 
 
cot 
 
 tt 
 
 >- 
 
 •-2 
 < 2 
 
 O UJ 
 
 ■!-> 
 
 i- 
 en 
 
 c 
 o 
 o 
 
 <u 
 
 s- 
 <o 
 
 s 
 ■o 
 
 s- 
 
 CM 
 
 i- 
 
space is available or written back on disk while the coordination procedure 
 continues. 
 
 The heart of the system is the merge network, an "odd-even" merge 
 of the type proposed by Batcher [5-7]. The basic building block of this 
 network is the comparison element, Figure 2.2, which accepts two input 
 numbers and routes their minimum and maximum to its "MIN" and "MAX" output 
 terminals, respectively. Batcher shows how these elements can be combined 
 to form a hardware merge network whose input is two n-element ordered lists 
 and whose output is a single ordered list containing 2n elements. 
 
 The coordination network selects from the output of the merge net- 
 work those elements which satisfy the Boolean logic specified in the search 
 request, and returns the edited list to the data memory. This function is 
 accomplished by comparing adjacent terms on the merged list and accepting or 
 rejecting terms as required. 
 
 To match the high speed parallel processing capabilities of the 
 merge and coordination networks, it is necessary to provide a wide-band data 
 memory and a disk system, preferably equipped with a hardware queuer, which 
 has the capability of reading simultaneously from n tracks while at the same 
 time writing simultaneously on n other tracks. Such a disk is currently in 
 use with the Illiac IV computer. 
 
 The function of the control computer during the merging operation 
 is one of supervision and bookkeeping. It provides memory management ser- 
 vices and guarantees that data are routed to the merge network and to the 
 disk in the proper sequence. 
 
MIN(X lt X 2 ) 
 
 MAXfX^Xg) 
 
 Figure 2.2. Symbol for a Comparison Element 
 
2.2 Example 
 
 As an example of the operation of the system consider the two lists 
 
 LI: 3, 5, 8, 10, 12, 27 
 
 L2: 2, 3, 5, 6, 10, 13, 25, 27, 33 
 
 and the search request "LI OR L2". Again assume that LI is available from 
 the disk before L2. Let n=3, so that three elements from either list may be 
 read or written simultaneously. This effectively divides the two original 
 lists into five sublists 
 
 LIT: 3, 5, 8 
 
 L12: 10, 12, 27 
 
 L21: 2, 3, 5 
 
 L22: 6, 10, 13 
 
 L23: 25, 27, 33. 
 
 Let one hardware cycle be defined as the time required to read or 
 write one such n-word sublist, i.e., the time required for a conventional 
 disk to transmit one computer word. This is the amount of time available 
 for one complete processing sequence in the hardware. The term cycle will be 
 used in this sense throughout this report except where a different meaning 
 is specified explicitly or is clearly intended from context. 
 
 The first step in processing the sample search request is to read 
 LI into data memory. Then, when L2 becomes available, the actions described 
 below and summarized in Table 2.1 occur during successive hardware cycles. 
 In the table, Li j refers to sublist j of input list i, LRj refers to sublist 
 
 Three is a convenient value of n to use for purposes of illustration. In 
 practice, because of the requirements of the merge network, n must be a 
 power of two. 
 

 
 
 
 
 
 
 
 
 
 
 
 , „ 
 
 
 
 J 
 
 
 
 
 
 
 
 
 
 
 O 
 
 
 LO 
 CM 
 
 A 
 
 
 re 
 
 A 
 
 
 LxJ 
 
 ^ 
 
 
 
 
 
 
 
 LO 
 
 
 i— 
 
 
 CO 
 
 
 CO 
 
 
 I— 
 
 o 
 
 
 
 
 
 
 
 A 
 
 
 fi 
 
 
 r— 
 
 
 CO 
 
 
 >— i 
 
 >— i 
 
 
 
 
 
 
 
 CO 
 
 
 CO 
 
 
 ft 
 
 
 A 
 
 
 at 
 
 i— 
 
 
 
 
 
 
 
 A 
 
 
 A 
 
 
 CM 
 
 
 r^ 
 
 
 3 
 
 O- 
 
 
 
 
 
 
 
 CM 
 
 
 vo 
 
 
 n— 
 
 
 ?NJ 
 
 
 o 
 
 
 
 
 
 
 
 ^-^ 
 
 
 V — ' 
 
 
 *. ^ 
 
 
 N^^ 
 
 
 
 
 
 
 
 
 
 at 
 —1 
 
 
 CM 
 
 Ot 
 
 
 CO 
 
 _i 
 
 
 •vi- 
 ce: 
 
 _l 
 
 
 
 
 
 
 
 
 
 
 
 _ 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 LO 
 
 
 , — » 
 
 
 
 
 o 
 
 
 
 
 
 
 
 
 . s. 
 
 
 CM 
 
 
 re 
 
 
 
 
 1— 
 
 >- 
 
 
 
 
 
 LO 
 
 
 o 
 
 
 CO 
 
 
 CO 
 
 
 
 
 sr 
 
 at 
 
 
 
 
 
 A 
 
 
 A 
 
 
 r— 
 
 
 CO 
 
 
 
 
 ce: 
 
 o 
 
 
 
 
 
 CO 
 
 
 CO 
 
 
 A 
 
 
 A 
 
 
 
 
 m 
 
 21 
 
 
 
 
 
 ft 
 
 
 A 
 
 
 CM 
 
 
 r-. 
 
 
 
 
 1— 
 
 LU 
 
 
 
 
 
 CM 
 
 
 vo 
 
 
 r— 
 
 
 CM 
 
 
 
 
 LU 
 
 s: 
 
 
 
 
 
 v—* 
 
 
 V ' 
 
 
 V * 
 
 
 **^* 
 
 
 
 
 at 
 
 
 
 
 
 
 j— 
 
 
 CM 
 
 
 CO 
 
 
 ^t 
 
 
 
 
 
 
 
 
 
 q: 
 
 
 Ot 
 
 
 ce: 
 
 
 at 
 
 
 
 
 
 
 
 
 
 i 
 
 
 _l 
 
 
 _j 
 
 
 _i 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 <s 
 
 ) LU 
 
 
 
 
 
 
 
 , — . 
 
 
 LO 
 
 
 CO 
 
 
 
 u 
 
 J l— 
 
 
 
 
 
 
 
 o 
 
 
 CM 
 
 
 CO 
 
 
 
 C/- 
 
 ) <=E 
 
 
 
 ^■^ 
 
 
 t N 
 
 
 r— 
 
 
 A 
 
 
 ** 
 
 
 
 c 
 
 ) 2: 
 
 
 
 ro 
 
 
 KO 
 
 
 ft 
 
 
 CO 
 
 
 1 — 
 
 
 
 Ll 
 
 j ►— i 
 
 
 
 #% 
 
 
 A 
 
 
 o 
 
 
 r— 
 
 
 CM 
 
 
 
 <_ 
 
 ) Q 
 
 
 
 ro 
 
 
 LO 
 
 
 ^— 
 
 
 A 
 
 
 ft 
 
 
 
 c 
 
 ) at 
 
 
 
 #\ 
 
 
 *\ 
 
 
 A 
 
 
 CM 
 
 
 f»» 
 
 
 
 Q: 
 
 : o 
 
 
 
 CM 
 
 
 LO 
 
 
 00 
 
 
 r - 
 
 
 CM 
 
 
 
 a 
 
 o 
 o 
 
 
 
 CM 
 
 Lu 
 
 
 CO 
 
 Lu 
 
 
 
 
 LO 
 
 Lu 
 
 
 Lu 
 
 
 
 
 
 J 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 Ll 
 
 J 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 ] 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 — 
 
 J 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 a. 
 
 D. 
 
 
 
 
 
 
 
 
 
 „ , 
 
 , — . 
 
 , — s 
 
 «. — s 
 
 
 
 
 
 
 
 
 
 , .. 
 
 , — % 
 
 r^ 
 
 LO 
 
 CO 
 
 CO 
 
 
 
 
 
 
 
 
 
 CO 
 
 o 
 
 CM 
 
 CM 
 
 CO 
 
 CO 
 
 
 
 
 
 
 00 
 
 CO 
 
 00 
 
 ^O 
 
 a 
 
 a 
 
 CO 
 
 CO 
 
 1^ 
 
 r-s. 
 
 3Z 
 
 
 
 
 
 ft 
 
 a 
 
 PI 
 
 r» 
 
 o 
 
 o 
 
 t— 
 
 n— 
 
 CM 
 
 CM 
 
 A 
 
 
 
 
 
 LO 
 
 CO 
 
 LO 
 
 LO 
 
 
 1 — 
 
 ft 
 
 ti 
 
 n 
 
 ft 
 
 m 
 
 
 
 
 
 ft 
 
 n 
 
 a 
 
 ft 
 
 n 
 
 ft 
 
 CM 
 
 CM 
 
 r^ 
 
 t^» 
 
 A 
 
 
 
 
 
 CO 
 
 CM 
 
 LO 
 
 LO 
 
 CO 
 
 CO 
 
 *— 
 
 r " 
 
 CM 
 
 CM 
 
 rc 
 
 
 
 
 
 ^ 
 
 CM 
 
 CM 
 
 CO 
 
 CO 
 
 «3- 
 
 <* 
 
 LO 
 
 LO 
 
 <X> 
 
 io 
 
 
 
 LU 
 
 
 at 
 
 Lu 
 
 at 
 
 LU 
 
 at 
 
 Lu 
 
 at 
 
 LU 
 
 at 
 
 Lu 
 
 a: 
 
 
 
 CD 
 C£ 
 
 f 
 
 
 + 
 
 
 + 
 
 
 + 
 
 
 + 
 
 
 t 
 
 
 
 
 LU 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 „ — _ 
 
 
 r*. 
 
 - N 
 
 CO 
 
 * — » 
 
 
 
 
 
 
 
 
 
 CO 
 
 „ S, 
 
 CM 
 
 r». 
 
 CO 
 
 CO 
 
 
 
 
 
 
 , , 
 
 
 „ — » 
 
 
 r— 
 
 CO 
 
 *\ 
 
 CM 
 
 A 
 
 CO 
 
 ' — » 
 
 
 
 
 , , 
 
 00 
 
 , „ 
 
 LO 
 
 , — . 
 
 ft 
 
 f— 
 
 CM 
 
 A 
 
 r»» 
 
 ft 
 
 zc 
 
 
 
 
 o 
 
 ft 
 
 00 
 
 A 
 
 00 
 
 o 
 
 A 
 
 ^~ 
 
 CO 
 
 CM 
 
 l^~ 
 
 " 
 
 
 
 
 n 
 
 ID 
 
 A 
 
 CO 
 
 ft 
 
 r— 
 
 o 
 
 ft 
 
 r— ■• 
 
 ri 
 
 CM 
 
 re 
 
 
 
 
 o 
 
 ft 
 
 LT) 
 
 ft 
 
 LO 
 
 ft 
 
 1 — 
 
 o 
 
 #% 
 
 LO 
 
 A 
 
 ft 
 
 
 
 
 #1 
 
 CO 
 
 a 
 
 CM 
 
 A 
 
 <X) 
 
 9\ 
 
 r— 
 
 CM 
 
 CM 
 
 r>« 
 
 zc 
 
 
 
 
 • o 
 
 
 CO 
 
 
 LO 
 
 v -* 
 
 CO 
 
 1 
 
 r-* 
 
 •* — ■* 
 
 CM 
 
 N — ^ 
 
 
 
 
 
 ,__ 
 
 
 r _ 
 
 
 CM 
 
 1 — ' 
 
 CM 
 
 ■> — ' 
 
 co 
 
 *~-^ 
 
 1 
 
 
 
 
 'O 
 
 ,__ 
 
 r-. 
 
 CM 
 
 CM 
 
 CM 
 
 CO 
 
 i— 
 
 «5d" 
 
 CM 
 
 LO 
 
 1 
 
 
 
 
 >at 
 
 _J 
 
 a: 
 
 _J 
 
 at 
 
 _1 
 
 at 
 
 _J 
 
 Qt 
 
 1 
 
 at 
 
 1 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 ^— v 
 
 
 CO 
 
 
 
 
 
 
 
 
 
 
 
 
 
 CO 
 
 
 CO 
 
 
 
 
 
 
 
 
 
 
 
 *■ — s 
 
 
 r— - 
 
 
 ft 
 
 
 
 
 
 
 
 
 
 
 
 LO 
 
 
 A 
 
 
 r-^ 
 
 
 
 
 
 
 
 
 
 
 Q 
 
 ■1 
 
 
 o 
 
 
 CM 
 
 
 
 
 
 
 
 
 
 
 <: 
 
 CO 
 
 
 r— * 
 
 
 #■. 
 
 
 
 
 
 
 
 
 
 
 LU 
 
 n 
 
 
 »* 
 
 
 LO 
 
 
 
 
 
 
 
 
 
 
 a; 
 
 CM 
 CM 
 
 
 CM 
 CM 
 
 
 CM 
 
 CO 
 CM 
 
 
 
 
 
 
 
 
 
 1 
 
 
 1 
 
 
 1 
 
 
 1 
 
 
 
 
 
 
 
 
 
 LU 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 1 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 C_> 
 
 i — 
 
 
 CM 
 
 
 CO 
 
 
 ** 
 
 
 LO 
 
 
 <£> 
 
 
 r-v 
 
 >- 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 c_; 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 at 
 
 Q. 
 
 X 
 
 o 
 
 ro 
 
 T3 
 
 S- 
 
 o 
 o 
 
 
 o 
 
 S- 
 ro 
 
 3 
 CO 
 
 C7> 
 
 C 
 
 to 
 
 CO 
 
 ai 
 o 
 o 
 
 S- 
 
 CM 
 
 CD 
 
 ro 
 
10 
 
 j of final result, and Fj and Rj refer to the cycle j merge outputs at F 
 and R as shown in Figure 2.1. F is passed to the coordination system and 
 R is returned to the merge network for further processing. The letter H 
 denotes the computer word (111... Ill), which is used as a filler to provide 
 all postings lists with an integral multiple of n entries. 
 Processing proceeds as follows: 
 Cycle 1. A. Read L21. 
 
 B. Set R=(0,0,0), and merge R with Lll. This will pro- 
 duce outputs F1=(0,0,0) and R1=L11. Ignore Fl . 
 Cycle 2. A. Read L22. 
 
 B. Merge Rl (Lll) and L21 . Result F2 contains the n 
 (three) smallest elements in the combined list, and 
 can be passed to the coordination network. Result 
 R2 is returned to the merge network for further 
 processing. Note that R2 does not contain the data 
 element 6, which should be on the next sublist of the 
 merged result. 
 
 C. Coordinate F2, i.e., eliminate the duplicate element 3. 
 Cycle 3. A. Read L23. 
 
 B. Compare the first element of LI 2 with that of L22. It 
 will be shown later that the smaller of these determines 
 which sublist should be transmitted next to the merge 
 network. In this case, the answer is L22. 
 
 C. Merge R2 with L22. 
 
 D. Coordinate terms in F3 and combine with the previous 
 result to form the completed sublist LR1 (2,3,5) and the 
 partial result: 6. Return LR1 to data memory. 
 
11 
 
 Cycles 4 and 5 proceed as cycle 3 except that no new data are 
 read from disk. If the output is to be returned to disk, writing can begin 
 in cycle 4. During cycle 6, "high value" inputs are supplied internally to 
 the merge network in order to force R5 to the F output terminals. The 
 resulting F6 is then coordinated, padded with one "H" entry and returned to 
 memory. If the result is being returned to disk, the last sublist is 
 written during cycle 7. 
 
 In general, if lists LI and L2 contain i and j sublists, re- 
 spectively, then one new sublist is processed during each of the first 
 (i+j) hardware cycles, one additional cycle is required to generate the 
 last sublist of the result and return it to memory, and one extra cycle is 
 needed if the result is to be written on disk. The total number of cycles 
 required, t, is given by 
 
 t = i+j+w+1, (2.1) 
 
 where w is 1 if the result is written on disk and otherwise. 
 
 2.3 Hardware Requirements 
 
 Wide variation is possible in the parameters of the proposed sys- 
 tem, especially with respect to the degree of parallelism employed. A corre- 
 sponding variation exists in the demands which are placed on the various 
 system components and in the level of performance which can be achieved. 
 This section defines hardware requirements in detail, proposes ways in which 
 they can be satisfied, and identifies the factors which limit the design. 
 The performance capabilities of various configurations are the subject of 
 Chapter 4. 
 
 Throughout this analysis, the objective is to show that the re- 
 quired subsystems can be realized using currently-available technology and 
 
12 
 
 that they can operate within the time constraints imposed by the process. 
 Where detailed logic designs and their associated timings are discussed, 
 standard ECL-10,000 components have been assumed [8], and many of the 
 attractive characteristics of this family of devices have been employed. 
 In particular, fast Exclusive-OR gates, the "wired OR" and the availability 
 of both true and complement outputs from most devices have all been used. 
 Propagation delays have been increased approximately by a factor of 2 (1.5 
 for shift register parallel input and output) from published typical values 
 in order to produce a conservative design. Logic symbols and device charac- 
 teristics employed in these designs are defined in Figure 2.3. 
 
 This discussion concentrates mainly on the characteristics of 
 the largest system which is currently considered practical and useful. In 
 some cases alternative designs are mentioned, especially where slower, 
 cheaper components could be employed, but a detailed design optimization 
 study is beyond the scope of this report. 
 
 The standard design chosen for study is a system with 256 parallel 
 transmission paths throughout (n=256). Smaller systems would, of course, 
 have correspondingly less stringent requirements: results in Chapter 4 
 indicate that a yery powerful system can be built using only 16 parallel 
 paths. 
 
 A head-per-track disk with the required parallel transmission 
 facilities and with the other parameters shown in Table 2.2 has been 
 assumed. The characteristics in the table are typical of a number of well- 
 established disk units, and a head-per-track disk with parallel transmission 
 facilities and approximately the required transfer rate has also been 
 installed. 
 
13 
 
 O 
 
 NOR Gate 
 Propagation Delay: 5ns 
 
 O 
 
 Exclusive OR Gate 
 Propagation Delay: 7ns 
 
 Implied (wired) OR 
 
 T 
 
 T 
 
 SEL 
 
 A 
 
 A 
 
 Shift Register 
 
 Propagation Delay: 
 Operating Features 
 
 6ns per operation 
 
 Parallel or serial input 
 Parallel or serial output 
 Left or right shift 
 
 Q ► 
 
 D 
 
 Q ► 
 
 C 
 
 T 
 
 Latch 
 Propagation Delay: 7ns 
 Truth Table: 
 
 c 
 
 D 
 
 Q t +A 
 
 L 
 
 L 
 
 L 
 
 L 
 
 H 
 
 H 
 
 H 
 
 L 
 
 Qt 
 
 H 
 
 H 
 
 Qt 
 
 Outputs are latched on 
 positive clock transitions 
 
 Figure 2.3. Logic Symbols and Device Characteristics 
 
14 
 
 Type D, Master-Slave Fl i p-Flop 
 Propagation Delay: 7ns 
 
 Clocked Truth Table 
 
 R-S Truth Table: 
 
 c 
 
 D iQt + A 
 
 L 
 
 L 
 
 Qt 
 
 L 
 
 H 
 
 Qt 
 
 H 
 
 L 
 
 L 
 
 H 
 
 H ( H 
 
 R 
 
 S 
 
 Qt + A 
 
 L 
 
 L 
 
 Qt 
 
 L 
 
 H 
 
 H 
 
 H 
 
 L 
 
 L 
 
 H 
 
 H 
 
 N.D. 
 
 Clock "H" is positive 
 clock transition 
 
 "N.D." means not defined 
 
 R and S inputs are independent 
 of the clock 
 
 Figure 2.3 (continued). Logic Symbols and Device Characteristics 
 
15 
 
 Disk rotation time 
 
 25. ms 
 
 * 
 Word size 
 
 32. bits 
 
 Storage density 
 
 1800. words/physical track 
 
 ** 
 Tracks transmitted in parallel 
 
 256. physical tracks/logical track 
 
 Read time per sublist (one word 
 per physical track) 
 
 13.89 ys 
 
 Transfer rate 
 
 2.30(10 6 ) bits/sec. /physical track 
 
 589. (10 6 ) bits/sec. for total 256- 
 head parallel transmission 
 
 Each document identifier in a postings file occupies one computer word. 
 
 ** 
 
 Values examined range from 1 through 512 
 
 Table 2.2. Design Parameters for Standard System 
 
16 
 
 Postings files are assumed to be organized as n-word sublists 
 stored in consecutive locations on one or more logical tracks. Each entry 
 is one 32-bit word which uniquely identifies one document in the data base. 
 No identifier may appear more than once on any given list except H, the 
 "high value" filler, which may appear as many as n-1 times, but only on the 
 last sublist in the file. 
 
 2.3.1 Merge Network 
 
 For this analysis, a merge network composed of bit-serial compari- 
 son elements is employed, and it is assumed that data items on each input 
 list are arranged in nondescending order. In [5], Batcher gives a simple 
 iterative rule for constructing odd-even merge networks of any desired size 
 provided the number of elements, n, on each input list is a power of 2. He 
 also shows that a 2' D x2P merging network constructed according to this rule 
 requires p(2 p ) + 1 comparison elements and that the longest path through such 
 a network contains p + 1 comparison elements. 
 
 Batcher states that a bit-serial comparison element can be imple- 
 mented with 13 NORS, but does not give a specific design. One possible 
 implementation is shown in Figure 2.4. An initial Reset signal, R, leaves 
 y-, = y 2 = 0. As long as x-, = x 2 , no change occurs, and the outputs are 
 
 equal. As soon as x, and x ? differ, y, and y ? are changed to establish the 
 
 appropriate output connections and locked into their new states until another 
 reset signal is received. 
 
 The longest path through one comparison element contains seven 
 gates and thus requires a propagation time of 35ns under the assumptions of 
 
X^ X 2 K 
 
 17 
 
 E>^>1 
 
 O- [>J 
 
 yi =Ry 2 (Xi + yiMx 2 + yi) 
 ya s R7x (xx+yzitxa + ya) 
 
 WIN = Xiy2+x 2 72 
 
 MAX = x 1 y 2 + x 2 y2 
 
 Figure 2.4. Logic Diagram for a Comparison Element 
 
18 
 
 Figure 2.3. If 7ns latches are installed at the outputs of each comparison 
 element, then new inputs may be accepted every 42 nanoseconds. 
 
 Table 2.3 lists the number of comparison elements, the gate counts 
 
 and the maximum path lengths for network sizes of interest. The table also 
 
 2 
 shows the time required to merge two, n-element lists of 33-bit words in 
 
 networks with and without latches at each stage. 
 
 The example in the previous section emphasized the fact that only 
 half the output of the merge network (n terms) is available for coordination 
 after each hardware cycle; the other half must be fed back into the network 
 for comparison with the next input list. A group of n, 33-bit shift regis- 
 ters is required to collect the bit serial output from one cycle and present 
 it for processing in the next, as illustrated in Figure 2.5. 
 
 Special inputs to the merge network are used during the first and 
 last cycles of a merge procedure in order to force the first and last sub- 
 lists to the proper output terminals. During the first cycle, the shift 
 registers are cleared to and then applied to the lower input terminals. 
 Consequently, after the first cycle, the upper outputs are all zero and the 
 lower outputs contain valid data from the upper inputs. During the last 
 cycle of operation, the upper inputs are all set to 1 so that after that 
 cycle is. complete, the lower inputs appear at the upper outputs and all the 
 shift registers at the lower outputs are filled with l's. 
 
 2 
 The thirty-third bit is supplied by the system as required for the "AND NOT" 
 
 coordination algorithm to be described in section 2.3.2. 
 
19 
 
 rime 
 
 33-bit Words 
 With Latches 
 
 co 
 
 3- 
 
 LO 
 00 
 CO 
 
 00 
 CM 
 ■si- 
 
 O 
 
 CM 
 LO 
 
 LO 
 LO 
 
 LO 
 LO 
 
 CO 
 CO 
 LO 
 
 o 
 
 00 
 LO 
 
 CM 
 CM 
 
 LO 
 
 ..... 
 
 Merge 
 for Two Lists of 
 Without Latches 
 
 CO 
 
 PL 
 
 IT) 
 
 lo 
 
 CD 
 CO 
 CM 
 
 LO 
 
 lo 
 co 
 
 O 
 CM 
 LO 
 
 LO 
 LT) 
 
 o 
 
 CO 
 Cn 
 
 LO 
 
 LO 
 
 CO 
 
 o 
 
 CO 
 
 o 
 
 CM 
 
 cn 
 
 LO 
 
 cn 
 
 CO 
 
 o 
 
 o 
 
 LO 
 LO 
 
 Total 
 Gates 
 
 CO 
 
 CO 
 
 r-- 
 
 LO 
 CM 
 
 CO 
 
 LT) 
 
 co 
 
 CO 
 
 cn 
 o 
 
 CM 
 
 LO 
 
 o 
 o 
 
 LO 
 
 LO 
 LO 
 
 CO 
 LO 
 
 LO 
 CM 
 
 cn 
 cn 
 
 LO 
 
 Total 
 
 Comparison 
 
 Elements 
 
 - 
 
 CO 
 
 CT) 
 
 LO 
 CM 
 
 LO 
 LO 
 
 LO 
 
 r— 
 
 LO 
 00 
 CO 
 
 cn 
 
 CO 
 
 cn 
 o 
 
 CM 
 
 cn 
 
 o 
 
 LO 
 
 «3- 
 
 Comparison 
 Elements on 
 Longest Path 
 
 - 
 
 CM 
 
 CO 
 
 "* 
 
 LO 
 
 LO 
 
 r-- 
 
 CO 
 
 cn 
 
 c 
 
 Input 
 
 List 
 
 Length 
 
 - 
 
 CM 
 
 *d" 
 
 CO 
 
 LO 
 
 CM 
 CO 
 
 LO 
 
 CO 
 CM 
 
 LO 
 LO 
 CM 
 
 CM 
 
 LO 
 
 to 
 
 <_> 
 
 S- 
 Oi 
 
 o 
 
 (0 
 
 J- 
 
 o 
 
 ^. 
 
 i- 
 o 
 
 B 
 
 o> 
 
 O) 
 
 S- 
 Q) 
 
 CO 
 CM 
 
 0) 
 
 to 
 
20 
 
 co 
 co i- 
 
 UJ o 
 
 9 o 
 
 or 
 
 o 
 
 
 
 
 g* 
 
 
 
 
 
 
 11 
 
 
 
 
 
 
 
 
 
 
 H 
 
 
 
 
 
 
 
 
 
 
 OT < 
 
 
 
 10 
 
 
 
 
 UJ z 
 
 
 
 tr 
 
 
 
 
 3 o 
 
 
 
 t- Id 
 
 
 
 
 
 
 ~ (O 
 
 
 
 
 
 
 I — 
 
 
 
 
 
 
 to o 
 
 
 
 
 
 
 o 
 
 
 
 UJ 
 
 
 
 
 
 
 o 
 
 
 
 
 
 
 
 
 
 o 
 1— 
 
 
 
 
 A r 
 
 
 
 
 
 
 
 
 
 
 ^VX 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 ' 
 
 
 N 
 
 i 
 
 1 i 
 
 i i 
 
 1 
 
 i 
 
 1 
 
 T 
 
 \ T 
 
 - 
 
 » 
 
 
 
 
 
 
 
 
 
 
 i fVJ 
 
 IO 
 
 
 
 
 
 1-1 
 
 CM 
 
 K) 
 
 • • 
 
 > z 
 
 
 1- + 
 » z 
 
 + • • •* 
 
 z 
 
 
 
 
 
 
 
 * 
 
 
 
 
 
 
 
 
 ui or 
 
 
 
 
 
 
 
 
 o o 
 
 
 
 
 
 
 
 
 o: $ 
 
 
 
 
 
 
 
 
 UJ h 
 
 
 
 
 
 
 
 
 2 UJ 
 
 
 
 
 
 
 
 
 z 
 
 
 
 
 
 
 i 
 
 L i 
 
 i i 
 
 \ 
 
 j 
 
 I 
 
 * oj 
 
 1 J, i 
 
 ■? z 
 + • • «w 
 
 E — 
 
 
 
 
 —1 
 
 c\j 
 
 Kl 
 
 • • 
 
 • z 
 
 - 
 
 - + 
 
 
 
 
 
 
 
 
 
 
 
 r z 
 
 z 
 
 
 
 
 
 
 
 
 
 
 
 V 
 
 
 
 
 >- 
 
 
 
 
 
 
 
 
 
 
 or 
 
 
 
 
 
 
 
 
 pi 
 
 ^ UJ 
 
 
 
 
 
 c 
 o 
 
 •r— 
 4-> 
 
 o 
 
 OJ 
 
 c 
 c 
 o 
 o 
 
 -^ 
 o 
 
 fO 
 -Q 
 "U 
 OJ 
 OJ 
 
 o 
 
 OJ 
 
 OJ 
 
 S- 
 0J 
 
 
 OJ 
 
 cr> 
 
 £2 
 
 s * 
 
 UJ o 
 Q. 
 
 a 2 
 
 3 O 
 
 or 
 
 CO 
 
 i- 
 
 D 
 
 co a. 
 
 I- h- 
 
 a. o 
 z 
 
 — or 
 
 or w 
 
 uj £ 
 
 8 -J 
 
 o 
 or 
 u. 
 
21 
 
 2.3.2 Coordination Network 
 
 2.3.2.1 General Description 
 
 The function of the coordination network is to select from the 
 output of the merge network those document identification numbers which 
 satisfy the current search request. 
 
 Suppose that the current output of the merge network is 
 
 V V 2 A' 3 A' 3 B' V 5 A' H B' 
 
 where the subscripts indicate the list of origin and "H" represents the 
 filler word which may occur at the end of a list. Then, the three allow- 
 able searches and the desired results are 
 
 A OR^ B = 1, 2, 3, 4, 5 
 A AND B = 1 , 3 
 
 and 
 
 A AND NOT B = 2, 5. 
 
 In order to make this selection, the coordination network employs 
 n identical logic circuits which compare adjacent postings as they arrive 
 in bit serial form from the merge network, and generate the appropriate 
 control, signals for the search procedure at hand. These signals are then 
 tested in reverse sequence from n to 1 , and the signal at stage i is used 
 either to retain the output at stage i or to eliminate it by shifting up 
 one stage all current outputs from stages i+1 through n. If shifting 
 occurs, the filler word is entered into stage n. A collection of shift 
 registers is used for assembling the outputs from the merge network and for 
 retaining the appropriate entries during the compression process. The 
 
22 
 
 arrangement of these components is shown in Figure 2.6. In addition, the 
 coordination network contains a collection of n registers not shown in the 
 figure which serve as a buffer for data being transferred into memory. 
 
 On the left side of the figure are the circuits which generate 
 the required control signals. Each of these has the following inputs: 
 
 RESET - initializes the circuit for a new cycle of 
 operation. 
 
 ■f-h 
 
 M., M. - i — output from the merge network. Both the 
 true signal and its complement are assumed to 
 be available. M. is used internally and is 
 
 also passed directly to the output as x. . 
 
 st 
 x. + , - i+1 — output from the merge network. 
 
 A, 0, N - control signals used to select the desired co- 
 ordination function. Each of these signals is 
 normally 1, and is changed to when in active 
 use. 
 
 C, CI - timing signals. 
 
 El . , - a control signal from the next higher numbered 
 
 stage. 
 
 E2. n - a control signal from the next lower numbered 
 i-l 3 
 
 stage. 
 REQ - a two-way line used to broadcast the current in- 
 struction to all stages during the compression 
 operation. 
 
23 
 
 SHIFT 
 N A E2 El. REQUEST 
 
 RESET 
 
 TTT 
 
 E2, El 2 X 2 
 
 N A E2 i . 1 El i X, 
 
 aim t t 
 
 m 
 
 E2 i + 1 El i+2 x i + 2 
 
 N A E2 n _! El„ 
 
 liili t t 
 
 rr 
 
 PARALLEL C 2 
 
 ENTRY 
 
 CONTROL 
 
 X 
 
 z 
 
 TT 
 
 S 
 
 ¥ 
 
 z 
 
 TT 
 
 c 2 
 
 s: 
 
 z 
 
 R la (PRIMARY) 
 
 R u ( SECONDARY) 
 
 '2a 
 
 Ria 
 
 '14-la 
 
 R l + 2a 
 
 R„«,»lll...lll 
 
 Figure 2.6. Coordination Network Interconnections 
 
24 
 
 Outputs include El. and E2. which are transmitted to the neighboring stages; 
 and x., the 33-bit serial output from the merge network, which is collected 
 
 in the primary register at stage i for further processing and also trans- 
 
 3 
 mitted to the control cirtuit for stage i-1. The SEL and REQ signals 
 
 control shift register operation. 
 
 Each stage of the coordination network contains one primary and 
 one secondary 32-bit processing register. The primary registers must be 
 able to perform shifts (for collecting serial data) and have parallel input 
 and output facilities. The secondary registers, which may be simpler 
 devices than the primary registers, serve to isolate the primary registers 
 and hold data temporarily on its way from one stage to another: no shifting 
 capability is required. The secondary register in stage n should have all 
 its input lines permanently set at 1. 
 
 Operation of this network proceeds in three phases, to be referred 
 to as steps 1, 2 and 3, where steps 1 and 2 can be implemented as shown in 
 Figure 2.7. 
 
 2.3.2.2 Step 1 Processing 
 
 The stage i output of step 1 is the signal z. , which must be 
 
 available for all i before step 2 begins. The system has been designed so 
 that no matter which procedure (AND, OR, NOT) is being performed the signal 
 z. = 1 causes the contents of primary register i to be "erased", while the 
 
 signal z. = causes the contents of primary register i to be retained. The 
 
 3 
 
 Only the 32-bit document identification number must be saved in the regis- 
 ters. 
 
25 
 
 to 
 
 _l 
 
 UJ 
 
 > 
 
 UJ 
 
 o 
 
 CO 
 
 I 
 
 o 
 I 
 
 < 
 
 z 
 
 X 
 
 X 
 
 _i 
 
 o 
 
 X 
 
 _J 
 
 -i 
 
 < 
 
 _l 
 
 X 
 
 X 
 
 UJ 
 
 or 
 
 z> 
 
 Q 
 UJ 
 O 
 O 
 
 or 
 
 0. 
 
 o 
 
 z 
 < 
 
 or 
 o 
 
 t- 
 o 
 
 z 
 
 CO 
 
 ._ 
 
 »- 
 
 
 
 h- 
 
 ? 
 
 X 
 
 O 
 
 
 — 
 
 
 X. o 
 
 »- 
 
 
 lD 
 
 V- 
 
 o b 
 
 
 
 rO 
 
 < 
 
 o < 
 
 to 
 
 
 CO 
 
 
 _i _i 
 
 Q 
 
 
 >-. 
 
 o 
 
 Z 
 
 
 U. 
 O 
 
 -J 
 
 < 
 
 UJ t 
 -1 OQ 
 
 o 
 
 0. 
 
 to 
 
 
 (O 
 
 rr 
 
 CD 
 
 UJ 
 
 
 H 
 
 UJ 
 
 5. "1 
 
 or 
 
 
 to 
 
 CO 
 
 — C\J 
 
 h- 
 
 
 to 
 
 n 
 
 Q lO 
 
 ■— 
 
 
 z 
 
 UJ 
 
 
 rr 
 
 
 o 
 
 t- 
 
 -^Ifi 
 
 iii 
 
 1- 
 
 
 z 
 
 UJ 
 
 
 i- 
 u. 
 
 £□ 
 
 3 
 
 0- 
 
 z 
 
 CO 
 
 Ul 
 
 Q < 
 
 < 
 
 ?l 
 
 or 
 
 Z _ 
 
 CVJ 
 
 fO 
 
 Q. 
 
 < Q 
 
 _l 
 
 to 
 
 UJ 
 
 
 
 
 
 1- 
 
 
 
 
 
 o 
 
 
 
 
 
 CVJ 
 
 -a 
 
 c 
 
 in 
 
 Q. 
 
 CD 
 4-> 
 00 
 
 (T3 
 
 C 
 
 "O 
 
 S- 
 
 o 
 o 
 c_> 
 
 tf- 
 o 
 
 (O 
 
 QJ 
 Q 
 
 CM 
 
 CU 
 
 en 
 
 + 
 X 
 
26 
 
 following rules determine z. , where w. is the complete document identifi- 
 cation number associated with stage i: 
 
 For "A OR B", z. = 1 if and only if w i = w i+1 , (2.2) 
 For A AND B, z i = 1 if and only if vj. f w. + , (2.3) 
 For A AND NOT B, z. = 1 if and only if either (2.4) 
 
 w i = w i+i or w i originated 
 
 on list B. 
 
 rd 
 The origin of word w. is determined from the 33 — bit at stage i, which is 
 
 for items from list A and 1 for items from list B. This last bit is dis- 
 carded after the necessary determination has been made. 
 
 As shown in Figure 2.7, the equality or inequality of w. and 
 
 w. + , is determined by means of flip-flop Dl and latch LI. Output Q of the 
 
 flip-flop is initially set to 1 . As long as successive bits of w. and 
 
 w. + , remain equal, the output of the exclusive OR (and of LI) is 0, and Dl 
 
 does not change. However, as soon as any pair of bits from w. and w. + , fail 
 
 to match, Dl is switched. Output Q then remains zero until another reset 
 signal is received. The clock signal to latch LI is normally kept high to 
 
 prevent spurious signals from affecting Dl . LI is "opened" after each of 
 
 rd 
 the first 32 bits is received, but not after the 33 — bit. In this way the 
 
 Q output of Dl is made to indicate whether or not the two adjacent document 
 
 identification numbers are equal, but it is not affected by values of the 
 
 rd 
 source tags transmitted as the 33 — bit. The effect of source tags is 
 
 registered by gate N3, whose output can be non-zero only when the "AND NOT" 
 
 operation is being performed. 
 
27 
 
 Finally, the control signal z. is generated as the implied OR of 
 
 rd 
 the outputs from gates Nl , N2 and N3 after transmission of the 33 — bit and 
 
 retained at the output of latch L2 until after step 2 of the coordination 
 
 processing is complete. When the coordination procedure is AND or OR, only 
 
 gate Nl or N2 conducts, and z. is formed according to (2.2) or (2.3). When 
 
 the coordination procedure is NOT, both N2 and N3 conduct; and, as (2.4) 
 
 rd 
 requires, z. = 1 whenever w. = w. +1 or whenever the 33 — bit transmitted at 
 
 J.L. 
 
 stage i is 1, i.e., when the i — document identification number originated 
 on list B. 
 
 The propagation time through step 1 depends upon the path of 
 interest (Figure 2.7), but in any case it is always shorter than the propa- 
 gation time through a comparison element in the merge network. Thus, the 
 only delay of real interest is the 12ns propagation time of bit 33. For the 
 sake of uniformity in hardware timing and operation, it is assumed that 33- 
 bit inputs are always used and that signal z. is available 12ns after the 
 
 rd 
 thirty-third input is received at the step 1 terminals. The 33—=- bit, of 
 
 course, has no effect on the value of z. for AND and OR processing. 
 
 .2.3.2.3 Step 2 Processing 
 
 The operation of coordination step 2 will be explained with the 
 aid of Figure 2.8 and Table 2.4. Figure 2.8 is a logic diagram for the last 
 three stages (n-2 through n) of step 2, illustrating the interconnections 
 between stages. Table 2.4 lists the signal states in these stages during 
 the first three cycles of operation. 
 
 The D2 flip-flops constitute a shift register which is used to 
 
28 
 
 
 
 D2 n -2 
 
 E2 n - 
 
 RESET 
 STAGE n-2 
 
 S Q 
 
 R 
 
 c ° 
 
 T 
 
 ci 
 
 n-3 
 
 Eln-2 
 
 = n-2 
 
 RESET 
 STAGE n-1 
 
 D2 n . 
 
 S Q 
 
 R 
 
 c ° 
 
 T 
 
 ci 
 
 E2 
 
 n-2 
 
 Hn-! 
 
 Eln-] 
 
 -Wn^O-» 
 
 N5 n 
 
 D2, 
 
 RESET 
 
 STAGE n 
 
 Ein + 1 
 
 S 
 
 R 
 
 D C ° 
 
 E2 n _: 
 
 Zn 
 
 Eln 
 
 JnOo- 
 
 CI 
 
 E2, 
 
 SHIFT 
 
 REQUEST 
 
 LINE 
 
 Sn-2 
 
 'n-i 
 
 i 
 
 Figure 2.8. Coordination Step 2 Interconnections, Final Three Stages 
 
29 
 
 c 
 
 £ 
 
 CVJ 
 
 1 
 e 
 
 00 
 
 o 
 
 o 
 
 o 
 
 CVJ 
 
 E 
 
 j a: 
 
 o 
 
 N C 
 
 |M= 
 
 CVJ 
 
 u 
 
 c 
 
 e: 
 h 
 O 
 
 J CO 
 
 c e 
 
 CVJ 
 
 o 
 
 o 
 
 o 
 
 O 
 
 1 
 
 CVJ 
 
 c 
 
 LU 
 
 - 
 
 - 
 
 - 
 
 o 
 
 £ 
 
 e 
 oo 
 
 o 
 
 o 
 
 1 
 
 E 
 
 CVJ 
 
 1 
 
 E 
 M 
 
 c£ 
 
 o 
 
 N c 
 
 |M= 
 
 CVJ 
 
 u 
 
 c 
 
 < 
 1/ 
 
 J CM 
 3 I 
 
 C E 
 
 CO 
 "> LU 
 
 o 
 
 o 
 
 O 
 
 - 
 
 1 
 
 1 
 
 e 
 Hi 
 
 - 
 
 - 
 
 O 
 
 O 
 
 C 
 
 E 
 
 oo 
 
 o 
 
 E 
 
 1 
 
 E 
 tvj 
 
 CVJ 
 
 1 
 
 E 
 
 c£ 
 
 o 
 
 |M = 
 
 |M= 
 
 CVJ 
 
 u 
 
 c 
 < 
 h 
 u 
 
 J i 
 
 3 E 
 C CVJ 
 LU 
 
 1 
 
 o 
 
 O 
 
 r— 
 
 •- 
 
 1 
 
 1 = = 
 
 - 
 
 O 
 
 o 
 
 O 
 
 
 + 
 E 
 
 LU 
 
 - 
 
 o 
 
 o 
 
 O 
 
 
 
 _J 
 
 < 
 
 1—1 
 
 h- 
 
 1— 1 
 
 1— 1 
 
 LU 
 
 _l 
 o 
 >- 
 o 
 
 CVJ 
 
 LU 
 _J 
 C_J 
 
 >- 
 
 CO 
 LU 
 
 _1 
 o 
 
 >- 
 o 
 
 E 
 O 
 
 +-> 
 03 
 S- 
 <U 
 Q. 
 
 O 
 
 4- 
 O 
 
 (/) 
 CD 
 
 a 
 >> 
 
 c_> 
 
 a> 
 
 0) 
 
 s- 
 .e 
 
 
 i- 
 
 o 
 
 4- 
 
 fO 
 
 E 
 C7> 
 
 oo 
 
 o 
 
 s- 
 
 E 
 O 
 
 o 
 
 CVJ 
 
 Q. 
 
 CD 
 +-> 
 00 
 
 E 
 O 
 
 +-> 
 ro 
 
 E 
 
 ■a 
 
 s- 
 o 
 o 
 o 
 
 ■3" 
 
 CVJ 
 
 CD 
 
 -O 
 ro 
 
30 
 
 control the selection of successive control signals. As Table 2.4 shows, the 
 true outputs of all flip-flops (signals El.) are initially set to 1, causing 
 
 R and all S. to be 0. For the first cycle of operation, El , is changed to 
 
 and CI is pulsed, reversing the outputs of D2 , but not those of any other 
 
 flip-flop. At this point, El = E2 , = 0, and R = z . Note that gate N4 
 
 "t"h 
 
 is "off" in all stages except the n — because the El signal in all other 
 
 stages is 1. For the same reason, S. = for all i f n, but S = Z . Now 
 a 1 ' ' n n 
 
 a different clock signal, C2 (see Figure 2.6), causes each primary shift 
 register whose SELECT signal (S.) is 1 to accept inputs from the stage below. 
 
 During the second cycle of operation, the at El is clocked 
 
 through D2 15 reversing its state and leaving El , = and E2 , = 1. 
 3 n- 1 3 3 n-1 n-1 
 
 Stage n-1 now behaves like stage n did in the previous cycle, setting 
 
 R = Z -J and S , = Z ■, . At the same time, E2 , turns off gate N4 , 
 
 isolating Z from the request line. El , however, is still 0, and so 
 3 n ^ n 
 
 S = R = Z , . All other stages are unaffected by these changes and gener- 
 ate S. = 0. Now, if Z -J = 1 , C2 will cause the primary register in each of 
 the last two stages to accept inputs from the stage below. If Z , = 0, 
 
 however, no further action will occur during the second cycle. 
 
 Allowing a 5ns delay per gate, 7ns to switch D2 and 10ns to broad- 
 cast the REQUEST signal, Z_. , the first phase of the step 2 operating cycle 
 
 requires 27ns. During this same time period, under the control of CI, all 
 secondary processing registers are loaded from the primary registers in the 
 stage below. Six nanoseconds are required for loading primary registers 
 
31 
 
 during the second phase of the cycle, giving a total cycle time of 33ns. 
 
 A coordination network with a 33 nanosecond operating cycle would 
 require t-, = 8.448 microseconds to process a list of 256 inputs. Of course 
 
 it may be practical to bypass stages for which Z = 0, but 8.448 y sec. 
 remains as a worst case possibility. 
 
 Faster operation can be achieved by implementing step 2 as k 
 smaller units which operate in parallel on input lists of length n/k and 
 complete the task in t-,/k y sec. Except, perhaps, for the duplication of 
 
 control signals, no special provisions of any kind would be required to 
 implement step 2 in this way--and problems associated with broadcasting the 
 REQUEST signal, R, would be reduced. A small amount of additional com- 
 plexity would be introduced into the control of step 3, where the outputs 
 from the separate units would have to be combined into a single list. 
 
 2.3.2.4 Step 3 Processing 
 
 At the completion of step 2, the n primary registers in the step 2 
 processor contain some unpredictable number, k (0 <_ k <_ n), of valid document 
 identifiers followed by n-k "fillers". These n words cannot simply be re- 
 turned to memory because they may constitute only a small part of the output 
 from the current coordination procedure, which may take several hardware 
 cycles to complete. Retaining the output from step 2 after each cycle would 
 produce a final result containing groups of valid pointers separated by groups 
 of fillers. That would be unacceptable as input for further processing either 
 as a part of the present search or in a subsequent search. Thus, it is neces- 
 sary to collect valid results from step 2 until a complete sublist of n 
 document identifiers is available or until the current process has been 
 
32 
 
 completed. The last sublist returned to memory from any coordination pro- 
 cedure may, of course, contain fillers. 
 
 The required "packing" is accomplished by means of a second set of 
 registers similar to those in the step 2 processor. Again a system of pri- 
 mary and secondary registers or equivalent physical devices is employed; and 
 again the primary registers should be capable of serial shifting as well as 
 parallel input and output, while the secondary registers need only perform 
 parallel input and output operations. These registers serve both as a 
 collection device for results from step 2 and as a transposer and buffer for 
 results returning to memory. Their relationship to other parts of the sys- 
 tem is shown in Figure 2.9. 
 
 This compression system is controlled by means of counters CS2 and 
 CS3 in the step 2 and step 3 processing units, respectively. Before step 2 
 begins, CS2 is loaded with the number n. The n — stage SELECT signal, S , 
 
 and the C2 clock are used to decrement the counter each time the contents of 
 the step 2 registers are shifted up one stage. When the procedure is com- 
 plete, the first k registers contain valid results, the last (n-k) registers 
 contain fillers, and the number k appears in CS2. This counter can then be 
 used to control the number of shift cycles performed in moving the results 
 into the step 3 unit. As Figure 2.9 indicates, data items move from the top 
 of the list in step 2 to the bottom of the list in step 3. 
 
 CS3 is loaded initially with the value n and decremented each time 
 an input is received from step 2. When the step 3 counter reaches 0, all 
 step 3 registers contain valid results. At this time, further transfers 
 from step 2 are suspended, the contents of the step 3 registers are returned 
 to memory, CS3 is reinitialized and transfers are resumed. During all 
 
33 
 
 
 1 
 
 STEP 2 OUTPUT 
 REGISTERS 
 
 
 
 STEP 3 
 
 i 
 
 
 
 
 
 
 
 
 
 
 
 1 
 
 
 
 
 
 
 
 
 ,r 
 
 
 
 
 
 i 
 
 
 
 1 
 
 1 
 
 
 
 2 
 
 
 i 
 
 2 
 
 
 
 
 
 
 
 
 
 ^ 1 
 
 C; 
 
 
 FROM 
 
 1 
 
 i — ^— i 
 
 L TO 
 
 COORDINATION - 
 
 n 
 
 1 
 
 1 
 
 
 n 
 
 MEMORY 
 
 STEP 1 
 
 ^ 1 
 
 "fc ~n 
 
 
 ,r 
 
 • 
 
 
 
 i 
 
 
 
 1 
 
 i 
 
 
 
 Sn 
 
 
 •L 
 
 i 
 
 
 
 
 i 
 
 1 
 
 
 
 
 
 1 
 
 i 
 
 
 
 
 
 
 
 
 
 
 
 i i 
 
 
 
 COUNTED CS2 
 
 COUNTER CSS 
 
 
 
 
 
 
 
 -T" 
 
 v 
 
 
 
 
 
 
 
 (a) One Step 2 Processor 
 
 
 1 
 
 STEP 2 OUTPUT 
 REGISTERS 
 
 STEP 3 
 
 l 
 
 
 
 
 
 
 
 
 1 
 
 
 
 
 
 
 -w — i 
 
 
 
 n 
 
 
 
 \ 
 
 • 
 
 — •< ' 
 
 1 
 
 1 
 
 
 
 
 l 
 
 2 
 
 
 
 
 !E 
 
 
 l 
 
 
 
 1 
 
 i 
 
 
 
 \ 
 
 H 
 
 c 
 
 
 
 COUNTER CS2o 
 
 J 
 
 n 
 
 
 
 c 
 
 
 
 
 
 ^ 1 
 
 
 
 
 
 1 1 
 
 
 
 1 
 
 
 
 
 
 
 
 7T\t 
 
 ■ 
 
 
 
 
 
 
 
 
 
 
 '* 
 
 f ' 
 
 
 
 
 
 
 
 1 
 ER CS2b 
 
 
 FROM 
 COORDINATION -< 
 STEP 1 
 
 S2 "/4 
 
 ?n / 1 1 
 
 COUNT 
 
 ► T0 
 
 
 
 MEMORY 
 
 
 
 
 
 
 1 
 
 
 
 
 ^ 
 
 
 
 In', 
 
 • | 
 
 
 
 
 
 
 
 
 
 
 'A 
 
 t 
 
 
 
 
 
 
 
 
 
 
 
 S '»/4 
 3n / 4 + 1 
 
 
 
 
 
 COUNTER CS2c 
 
 
 
 
 
 
 
 
 1 
 
 ) 
 
 
 
 
 
 
 
 ■ C 
 
 
 1 _ ( 
 
 1 ~~1 
 
 ► 
 
 
 
 
 
 
 
 
 c 
 
 
 
 
 
 
 
 
 s n 
 
 
 ■*i 
 
 1 1 
 
 
 
 COUNTER CSZd 
 
 
 COUNTER CS3 
 
 
 
 
 T 
 
 v 
 
 
 
 
 
 (b) Four Parallel Step 2 Processors 
 Figure 2.9. Transfer Mechanism Between Coordination Steps 2 and 3 
 
34 
 
 hardware cycles except the last, transfers between steps 2 and 3 stop when 
 CS2 reaches 0. In the last cycle only, CS2 and CS3 must both be decremented 
 to in order to eliminate unwanted entries from the top of the last sub- 
 list and place fillers in their proper positions at the bottom. 
 
 If several step 2 processors operate in parallel, then step 3 
 control becomes slightly more complicated in that each of these units must 
 be emptied in turn into the step 3 registers. The control unit will have 
 to provide for the necessary switching and supervision. 
 
 An estimate of the time required in step 3 for a single move from 
 one primary register to the next is 13ns, with 3.33 ys needed to transfer 
 the entire contents of 256 step 2 registers. 
 
 2.3.2.5 Merge and Coordination Control Requirements 
 
 Control unit requirements for the hardware system described here 
 are very modest. A 20 MHz (50ns pulse interval) clock and a signal indi- 
 cating the availability of data from the memory are required to control the 
 merge network, the latch in coordination step 1, and the shift register in- 
 puts in step 2. In addition, one or two counters are needed to initiate 
 and terminate various step 1 operations at the proper times relative to the 
 operation of the merge network. 
 
 Step 2 requires a timing interval of about 33ns (50 or 25ns might 
 be acceptable) and a provision to produce a second clock pulse at a fixed 
 interval relative to the first. This step generates internally a signal 
 pair (El, = 0, E2, = 1) which can be used to determine when processing is 
 
 complete without any additional control unit activity. Step 3 requires a 
 single clock for the primary registers and appropriate circuitry to delay 
 
35 
 
 the clock signal for the secondary registers and to monitor the various 
 counters and generate requests for memory transfers. A clock interval of 
 12.5ns is adequate for step 3 so that a basic clock frequency of 80MHz and 
 the submultiples of 20 and possibly 40 MHz can be used to control the entire 
 hardware system. 
 
 2.3.3 Data Memory 
 
 The proposed design requires a memory with a very high data rate 
 
 Q 
 
 (0(10 ) bits/sec.) and short effective cycle time (100ns or less). While 
 these requirements are stringent, they can presently be met either directly 
 by means of bipolar devices or indirectly by interleaving slower M0S units. 
 
 Because blocks of information are routed through the system in a 
 serial -by-bit, parallel -by -word fashion, it is natural to store data in 
 "transposed" format. That is, the i — n-bit physical word in the memory 
 may actually contain the i — bit from each of n data items. For the stan- 
 dard system under discussion k-word by 256-bit memory modules would be used, 
 where k is an integral multiple of 32. 
 
 Under the most severe operating conditions, simultaneous input 
 and output to both the disk and the hardware coordination system would be 
 required, and high priority memory transactions would occur at the rate of 
 
 about one every 100ns. With n = 256, the corresponding overall transfer 
 
 9 
 rate would be approximately 2x10 bits/second. As shown in section 2.4 
 
 below, all required transfers can be accomplished simply and without serious 
 conflicts using either a single 100ns cycle memory module or a collection 
 of four interleaved submodules each with a cycle time of 400ns. 
 
 At least one semiconductor manufacturer, Intel [9], is now pro- 
 moting a 100ns bipolar memory system with the required characteristics at a 
 
36 
 
 price of about ten cents per bit. More such systems and lower prices are 
 to be expected in the near future. A number of 400ns MOS units are also 
 available. 
 
 2.3.4 Disk File System 
 
 To match the high speed parallel processing capabilities of the 
 merge and coordination networks, a wideband mass storage device is required 
 for the postings file. Analyses to date have assumed the use of a head-per- 
 track disk with the capability of transmitting n tracks simultaneously. The 
 system should be capable of reading n tracks from one channel while writing 
 n tracks on another. It should also be equipped with a hardware queuer 
 which would permit the servicing of a group of outstanding I/O requests in 
 the order in which the referenced addresses became available, reducing 
 considerably the time required to process search requests involving large 
 numbers of terms. 
 
 The Illiac IV disk file system [10-12], effectively meets all 
 these requirements. It consists of two Burroughs Model II disk files 
 operating on separate channels, each with sufficient electronic circuitry 
 for reading or writing simultaneously on 128 tracks of one disk. Both 
 channels can operate concurrently at full capacity. The system employs a 
 
 disk file optimizer (hardware queuer) which accommodates up to 24 out- 
 
 9 
 standing I/O requests. The total storage capacity of this system is 10 
 
 q 
 bits, and its maximum transmission rate is 10 bits/second. 
 
 2.3.5 Control Computer 
 
 At the present stage of design, no specific implementation can be 
 given for the control computer. Conceptually, it is a computer with 
 
37 
 
 responsibility for a number of supervisory functions to be performed before 
 or during the operation of the specialized hardware. Some of these functions 
 may be distributed among the controllers for the individual devices, they 
 may be performed by one or more dedicated processors which have been opti- 
 mized for this application, or they may reside mainly within the computer 
 which has overall control of the retrieval system. 
 
 The required functions are of four broad types: communication 
 with the rest of the system, memory management , routing of sublists, and 
 internal control . The communication function consists of accepting re- 
 quests for service and postings file addresses from the main system and 
 generating the appropriate signals and control information when a search 
 is complete. Memory management refers to the dynamic allocation of space 
 in the data memory and on any scratch disks which may be used to process 
 a search. This function represents a heavy computational load since, during 
 any hardware cycle, two sublists may be removed from the data memory and two 
 more may enter. Routing in the present context means providing sublists to 
 the merge network (and to the disk) in the proper order (see section 3.1). 
 Finally, internal control refers to algorithmic decisions such as when to 
 write intermediate results on disk and whether to read, merge or skip a 
 particular list when it becomes available. These decisions are dictated by 
 the system resources available and the nature of the search at hand. They 
 have a crucial role in determining the overall performance capabilities of 
 the system. 
 
 2.4 System Integration 
 
 The purpose of this section is to illustrate how the various sub- 
 systems function together, especially with respect to their separate timing 
 
38 
 
 requirements and to the total time available for a complete operational 
 cycle. Without presenting an exhaustive catalog of possible designs, the 
 range of alternatives available is outlined. The approach here is to 
 choose a basic design which satisfies all constraints and then examine 
 various departures from this design which may be desirable and various 
 problems which can arise. It is assumed that the data memory contains a 
 number of independent modules which may be accessed simultaneously. The 
 term memory conflict implies multiple simultaneous requests for access to 
 a single module: only one such request can be honored during any given 
 memory cycle. Multiple requests involving different modules may be ser- 
 viced concurrently and do not present conflicts. Timing information from 
 sections 2.3.1 and 2.3.2 will be used extensively, and it may be helpful 
 to refer to Figure 2.7. 
 
 First, consider the overall organization of the system and the 
 timing requirements for individual operations, as shown in Figure 2.10. 
 For the basic design analysis, assume that n, the number of parallel data 
 paths, equals 256. A hardware cycle begins with 32 memory fetches to obtain 
 data. The rate at which this information may be applied to the hardware is 
 determined by the cycle rate of the memory, subject to the 42ns minimum 
 interval between bits required by the merge network. A thirty-third bit 
 supplied by the control system propagates through the merge network in 
 378ns, and requires an additional 12ns in step 1 of the coordination system. 
 If step 2 is implemented as a single unit with 256 registers, then it 
 requires a processing time of 8.448 ps. If a cluster of four identical 
 64-register units operating in parallel is employed, then the processing 
 time can be reduced to 2.112 ys. Coordination step 3 contains 256 primary 
 
39 
 
 t/> 
 
 3. S_ 
 O 
 o 4- 
 o 
 
 CM (U •/) 
 . r— CD 
 
 O O i— 
 
 I >> u 
 
 <J >1 
 
 u 
 
 s- 
 ai cm 
 
 CLOO 
 
 to • 
 
 i/> 
 
 O 
 LO 
 
 o 
 
 -o 
 s- 
 o 
 2 ■•- 
 
 d) -t-> J3 
 
 c c 
 
 O O CM 
 
 — (j on 
 
 
 
 1 
 to 
 
 
 
 
 
 
 CD 
 
 
 
 
 
 
 O T3 
 
 00 
 
 
 
 
 
 O S- 
 
 T3 
 
 
 
 
 CO 
 
 S- O 
 Q- 2 
 
 O 00 
 
 
 
 
 Q- 
 
 
 2 a 
 
 
 
 
 UJ 
 
 CO S- 
 
 
 
 
 
 1— 
 
 3 a) 
 
 to CO 
 
 
 
 
 u~> 
 
 .013 
 ing p 
 
 LO CM 
 CM CO 
 
 s_ co 
 o 
 
 
 
 
 
 O to 
 
 u_ 
 
 
 
 
 
 cn 
 
 
 
 
 
 
 C 
 
 
 s- 
 
 
 
 
 •i — 
 
 <u 
 
 3 
 
 
 .^ 
 
 
 to 
 
 c 
 
 o 
 
 1/1 
 
 o 
 
 
 to 
 
 o 
 
 4- 
 
 +-> 
 
 »— 1 
 
 
 CD 
 
 
 
 • r— 
 
 1— 
 
 
 <-> 
 
 tO C 
 
 c 
 
 c 
 
 <£ 
 
 CM 
 
 O 
 
 -O T- 
 
 • r— 
 
 Z3 
 
 Z 
 
 
 J- 
 
 s- ' 
 
 
 
 i— i 
 
 D- 
 
 Q. 
 
 O tO 
 
 to 
 
 1 — 
 
 Q 
 
 LU 
 
 
 S 3. 
 
 3. 
 
 0) 
 
 C£ 
 
 I— 
 
 to "O 
 
 
 
 
 O 
 
 oo 
 
 3. S- 
 
 tO 00 •« 
 
 C\J 
 
 1 — 
 
 o 
 
 
 O 
 
 un *a- ■»-> 
 
 r— 
 
 ai 
 
 c_> 
 
 
 co 2 
 
 CM *3" •!- 
 
 1 — 
 
 l» 
 
 
 
 CO 
 
 • c 
 
 
 ai 
 
 
 
 o s_ 
 
 S.CO 3 
 
 CM 
 
 n 
 
 
 
 • ai 
 
 o 
 
 
 
 
 
 O Q. 
 
 Li_ 
 
 
 
 
 
 
 >, CD 
 
 
 
 
 
 
 to cn 
 
 
 
 
 
 >> 
 
 >— s- 
 
 
 
 
 
 to 
 
 CD CM CD 
 
 
 
 
 1 
 
 'aj 
 
 ■a co e 
 1 
 
 
 
 
 O- 
 
 T3 CO 
 
 to .— >, 
 
 
 
 
 LU 
 
 CO 
 
 3 -O 
 
 c 
 
 
 
 1— 
 
 to 
 
 to 
 
 o 
 
 
 
 oo 
 
 3. -4-> 
 
 co +-> -o 
 
 ■ 1— 
 
 
 
 
 ■r- 
 
 CO -r- ai 
 
 +-> 
 
 
 
 
 CM -Q 
 
 Oi3^ 
 
 <a 
 
 
 
 
 ^— 
 
 to 
 
 s- 
 
 
 
 
 O i. 
 
 O S- TO 
 
 CD 
 
 
 
 
 • o 
 
 I ° e 
 
 Q. 
 
 
 
 
 O 4- 
 
 v|4 
 
 o 
 
 
 
 
 
 to 
 
 
 
 
 
 >> 
 
 -M 
 
 
 
 
 
 ra 
 
 •r- 
 
 
 
 
 
 1 — 
 
 -Q 
 
 
 
 
 
 ai 
 
 
 
 
 
 
 T3 
 
 E c 
 
 3 CD 
 
 
 
 
 
 r— 
 
 E <u 
 
 
 
 UJ 
 
 
 <o 
 
 •r- 2 
 
 
 
 o 
 
 
 ■M 
 
 C -M 
 
 
 
 a: 
 
 
 O +J 
 
 •r- a; 
 
 
 
 UJ 
 
 
 +J -r- 
 
 E -O 
 
 
 
 s: 
 
 
 J3 
 
 to 
 
 3. cu 
 
 c 
 
 CO O 
 
 CO $- 
 • o 
 
 O 4- 
 
 to i— 
 3. ro 
 
 > 
 
 CM J- 
 «3" CD 
 O 4-> 
 
 • c 
 
 O -r- 
 
 
 
 to 
 
 E 
 
 a> 
 
 4J 
 
 CO 
 
 <D 
 
 i_ 
 (T3 
 
 s 
 -o 
 
 j_ 
 
 CO 
 
 o 
 
 
 OO 
 
 
 to 
 
 S- CD 
 
 CD i— 
 
 Q. O 
 
 >> 
 
 tO (_> 
 
 3. 
 CM 
 
 o co 
 o 
 
 CM S_ 
 
 • o 
 
 O 4- 
 
 I 
 
 O CD 
 
 LT> r— 
 
 O O 
 
 • >> 
 O U 
 
 c 
 o 
 u 
 
 -a 
 
 o to 
 
 2 -M 
 
 CD -Q 
 C 
 
 O CM 
 CO 
 
 s- 
 
 
 o 
 
 
 4- 
 
 
 -a 
 
 
 ai 
 
 
 +-> 
 
 
 S- 
 
 
 eu 
 
 oo 
 
 00 
 
 -t-> 
 
 £Z 
 
 (_> 
 
 •i — 
 
 ■r- 
 
 
 t— 
 
 CD 4- 
 
 X) 
 
 C 
 
 
 o 
 
 >1 
 
 o 
 
 ie 
 
 
 E 
 
 5^ 
 
 
 !>- 
 
 CO 
 
 O 
 
 
 hi 
 
 T3 
 
 CLI 
 
 c 
 
 E 
 
 13 
 
 
 
 4- 
 
 < 
 
 O 
 
 (/I 
 
 i — 
 
 S- 
 
 o 
 
 CD 
 
 s_ 
 
 4- 
 
 -M 
 
 M- 
 
 c 
 
 3 
 
 o 
 
 CO 
 
 o 
 
 <X1 
 
 ai 
 
40 
 
 registers and requires up to 3.328 ys for its operation. Finally, 32 memory 
 cycles are required to transfer results back into memory. Cumulative pro- 
 cessing time requirements, assuming no memory conflicts, are shown in Table 
 2.5 for systems with effective memory cycle times of 50, 100 and 200ns and 
 with 1 and 4 step 2 processing units. Table 2.6 contains the corresponding 
 data for a system with n = 16, except that only one step 2 processor is 
 considered. 
 
 The time required for the disk to read one word from each of n 
 tracks is approximately 14 ys (13.89 ys), and this determines the maximum 
 allowable processing time for one hardware cycle. Two candidates from Table 
 2.5 meet this criterion. Both employ four parallel step 2 processors; one 
 has a 50ns memory cycle, and the other has a 100ns cycle. The 100ns system 
 is chosen as a standard, and the remainder of this chapter is devoted to a 
 further analysis of its timing requirements. 
 
 A time distribution for processing activities in a typical oper- 
 ating cycle of the standard system is shown in Figure 2.11. In the absence 
 of memory conflicts, the entire procedure including the return of results 
 to memory can be completed within 12.33 ys, leaving about 10% of the avail- 
 able time (region f in the figure) free for contingencies. Times shown 
 include generous allowances for delays within the circuit components, and 
 they reflect the worst-case situation in which two, 256-input lists, with- 
 out any common entries are processed using the operation OR. Duplication of 
 elements tends to reduce the size of region d. For AND's, which normally 
 produce only a few result postings, region d is usually much shorter and 
 region e frequently disappears altogether as the results of several hardware 
 cycles may be collected in the step 3 output buffer. 
 
41 
 
 UJ C\J 
 
 
 
 
 
 
 
 1— co 
 
 
 
 
 
 
 
 •— i O'-^ 
 
 CO 
 
 
 
 
 
 
 CH ^ CO 
 
 3. 
 
 
 
 
 
 
 3 UJ I— 
 
 
 
 
 
 
 
 \— «<_> 
 
 CO 
 
 o 
 
 IX) 
 
 o 
 
 LO 
 
 o 
 
 >- LU CO •— < 
 
 r— 
 
 CO 
 
 LO 
 
 CO 
 
 LO 
 
 CO 
 
 OC _l LU _l 
 
 «3" 
 
 o 
 
 LO 
 
 CO 
 
 t— 
 
 CO 
 
 O Q 1 U_ 
 
 
 
 
 
 
 
 s: s: o z 
 
 LO 
 
 en 
 
 CO 
 
 CM 
 
 LO 
 
 CO 
 
 UJOVO 
 
 n— 
 
 
 ^~ 
 
 r— 
 
 CM 
 
 t— 
 
 2:000 
 
 
 
 
 
 
 
 * 
 
 
 
 
 
 
 
 * 
 
 
 
 
 
 
 
 ^ — * 
 
 CO 
 
 
 
 
 
 
 co 
 
 3- 
 
 
 
 
 
 
 llj a: 
 
 
 
 
 
 
 
 1— LU 
 
 LO 
 
 o 
 
 LO 
 
 o 
 
 LO 
 
 o 
 
 • CO LjJ u_ 
 
 i^ 
 
 CO 
 
 LO 
 
 CO 
 
 CO 
 
 CO 
 
 O _J 00 
 
 00 
 
 >^- 
 
 "vf 
 
 r— 
 
 1^ 
 
 <d- 
 
 a: q_ o_ <o ^ 
 
 
 
 
 
 
 
 oujsm< 
 
 CO 
 
 r-^ 
 
 LO 
 
 CTi 
 
 CO 
 
 CM 
 
 oi-owa 
 
 ^— 
 
 
 r— 
 
 
 1 — 
 
 1— 
 
 uwo h- 
 
 
 
 
 
 
 
 * 
 
 
 
 
 
 
 
 * 
 
 
 
 
 
 
 
 LU 
 
 co 
 
 
 
 
 
 
 1— 
 
 p- 
 
 
 
 
 
 
 . CM LU 
 
 
 
 
 
 
 
 Q _l 
 
 co 
 
 CM 
 
 CO 
 
 CM 
 
 00 
 
 CM 
 
 C£ Q_ Q_ 
 
 CO 
 
 LO 
 
 CO 
 
 O 
 
 CO 
 
 o 
 
 O LU S 
 
 *3- 
 
 i — 
 
 1 — 
 
 CO 
 
 •sl- 
 
 r— 
 
 O I— O 
 
 
 
 
 
 
 
 Ol/)U 
 
 O 
 
 *3" 
 
 CM 
 i— - 
 
 LO 
 
 LO 
 
 Ol 
 
 _J 
 
 
 
 
 
 
 
 LU 
 
 
 
 
 
 
 
 U 1 <\J 
 
 
 
 
 
 
 
 O —1 CO 
 
 r— 
 
 *tf- 
 
 i — 
 
 «d- 
 
 r— 
 
 «* 
 
 <ah 
 
 
 
 
 
 
 
 • OC LU t-H 
 
 
 
 
 
 
 
 o<i-z 
 
 
 
 
 
 
 
 2: Q_ CO ZD 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 Q- 1— 
 
 
 
 
 
 
 
 LU LU ZD 
 
 CO 
 
 
 
 
 
 
 y— \— o 
 
 ■P- 
 
 
 
 
 
 
 CO LU 
 
 
 
 
 
 
 
 _1 CO 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 • D_ CO 
 
 «* 
 
 <z* 
 
 o-. 
 
 CTi 
 
 CJ\ 
 
 CTi 
 
 q s: 
 
 o 
 
 o 
 
 LO 
 
 vo 
 
 CTi 
 
 CTi 
 
 q: o i— 
 
 
 
 
 
 
 
 o o •— < 
 
 CM 
 
 CM 
 
 CO 
 
 CO 
 
 LO 
 
 LO 
 
 o on 
 
 
 
 
 
 
 
 <_> i 
 
 
 
 
 
 
 
 -X 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 1— 
 
 
 
 
 
 
 
 _3 
 
 CO 
 
 
 
 
 
 
 o 
 
 3. 
 
 
 
 
 
 
 LU 
 
 
 
 
 
 
 
 1— CO 
 
 cc 
 
 CO 
 
 CO 
 
 CO 
 
 CO 
 
 CO 
 
 LU CO 
 
 CM 
 
 CM 
 
 r^ 
 
 r-- 
 
 |-» 
 
 r-«~ 
 
 LU _l 
 
 o 
 
 o 
 
 CO 
 
 LO 
 
 CTi 
 
 CT) 
 
 CJ3 O- (— 
 
 
 
 
 
 
 
 CC SL i— i 
 
 CM 
 
 CM 
 
 CO 
 
 CO 
 
 LO 
 
 LO 
 
 LU O CO 
 
 
 
 
 
 
 
 ZC ' 
 
 
 
 
 
 
 
 ^_^ 
 
 
 
 
 
 
 
 ^ 
 
 CO 
 
 
 
 
 
 
 i — i 
 
 3. 
 
 
 
 
 
 
 , — 
 
 O 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 • LU (— 
 
 LO 
 
 LO 
 
 o 
 
 o 
 
 o 
 
 o 
 
 O C£ I— 
 
 o 
 
 o 
 
 I— 
 
 i— 
 
 CM 
 
 CM 
 
 o:< m 
 
 
 
 
 
 
 
 LU I— CO 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 C 
 
 21 O0^~- 
 
 
 
 
 
 
 
 * 
 
 
 
 
 
 
 
 CO 
 
 
 
 
 
 
 
 CO CM 
 
 
 
 
 
 
 
 LU CO 
 
 CO 
 
 
 
 
 
 
 C_J — 'O'— 
 
 3- 
 
 
 
 
 
 
 O 2W 
 
 
 
 
 
 
 
 <=C LU |— 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 O 
 
 J— ^C_) 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 >- LU CO I— I 
 
 LO 
 
 LO 
 
 CM 
 
 CM 
 
 *d- 
 
 «3- 
 
 Od _J LU _l 
 
 
 
 
 
 
 
 O O ILL 
 
 1— 
 
 r— 
 
 CO 
 
 CO 
 
 LO 
 
 LO 
 
 2:2:0^ 
 
 
 
 
 
 
 
 uo>o 
 
 
 
 
 
 
 
 2L" o o o 
 
 
 
 
 
 
 
 * 
 
 
 
 
 
 
 
 LU 
 
 
 
 
 
 
 
 > 
 
 
 
 
 
 
 
 1— I 
 
 CO 
 
 
 
 
 
 
 1— >- 
 
 c 
 
 
 
 
 
 
 cj> or lu 
 
 
 
 
 
 
 
 LU o — 1 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 u. s. o 
 
 LO 
 
 LO 
 
 o 
 
 o 
 
 c 
 
 o 
 
 U_ LU >- 
 
 
 
 1— 
 
 r^ 
 
 CM 
 
 CM 
 
 LU 2 O 
 
 
 
 
 
 
 
 LO 
 LO 
 CM 
 
 CO 
 
 s- 
 o 
 
 CO 
 
 E 
 O 
 
 o 
 o 
 
 •1— 
 
 +j 
 
 
 CO 
 
 ra 
 
 
 c 
 
 o_ 
 
 CO 
 
 <L> 
 
 
 +-> 
 
 E 
 
 ro 
 
 O 
 
 •r— 
 
 -^ 
 
 •r— 
 
 O 
 
 rC 
 
 i — 
 
 
 Q 
 
 4- 
 
 CO 
 
 
 C 
 
 r— 
 
 i — 
 
 o 
 
 3 
 T3 
 
 CO 
 
 C_) 
 
 O 
 
 ■ — 
 
 >1 
 
 >' 
 
 fO 
 
 s_ 
 
 
 S- 
 
 o 
 
 ^ 
 
 rtS 
 
 E 
 
 i~ 
 
 o_ 
 
 co 
 
 O 
 
 
 2" 
 
 F 
 
 LO 
 
 
 <L) 
 
 LO 
 
 o 
 
 s: 
 
 CM 
 
 ^ 
 
 o 
 
 
 
 •— 
 
 c 
 
 
 a. 
 
 Q- 
 
 CO 
 
 
 f3 
 
 CU 
 
 CO 
 
 
 +J 
 
 4-> 
 
 CO 
 
 
 fO 
 
 CO 
 
 o_ 
 
 
 Q 
 
 c 
 
 
 
 r^ 
 
 o 
 
 <+- 
 
 
 CO 
 
 •1 — 
 
 »— t 
 
 
 ^™ 
 
 4-> 
 
 
 
 ^~ 
 
 (0 
 
 
 
 fO 
 
 c 
 
 • 
 
 
 $- 
 
 •1— 
 
 0) 
 
 
 (O 
 
 ■a 
 
 o 
 
 
 o_ 
 
 s. 
 
 ra 
 
 
 
 o 
 
 f^ 
 
 
 LO 
 
 o 
 
 CL 
 
 
 LO 
 
 o 
 
 co 
 
 
 CM 
 
 -o 
 
 J>c 
 
 
 -C 
 
 c 
 
 <0 
 
 
 4-> 
 
 fO 
 
 -(-> 
 
 
 •r- 
 
 2 
 
 n 
 
 -l-> 
 
 
 
 CO 
 
 o 
 
 
 CO 
 
 a> 
 
 c 
 
 
 ^~ 
 
 s- 
 
 
 
 o 
 
 CD 
 
 >> 
 
 
 >> 
 
 E 
 
 E 
 
 
 C_3 
 
 m 
 
 
 
 CO 
 
 >, 
 
 S- 
 
 
 s- 
 
 s- 
 
 o 
 
 
 fO 
 
 o 
 
 
 
 2 
 
 E 
 
 >> 
 
 
 "O 
 
 (U 
 
 (O 
 
 
 S- 
 
 E 
 
 E 
 
 
 n: 
 
 F 
 
 +-> 
 
 
 
 o 
 
 13 
 
 
 s_ 
 
 i- 
 
 a. 
 
 
 o 
 
 Lj- 
 
 +-> 
 
 • 
 
 4- 
 
 
 13 
 
 CO 
 
 
 +-> 
 
 o 
 
 
 cn 
 
 3 
 
 
 a. 
 
 c 
 
 o_ 
 
 >, a 
 
 "I - 
 
 c 
 
 S- 
 
 +-> 
 
 E 
 
 • r- 
 
 o 
 
 CO 
 
 •r- 
 
 
 E 
 
 
 1— 
 
 
 CO 
 
 +-> 
 
 
 • • 
 
 E 
 
 a. 
 
 CO 
 
 >> 
 
 
 3 
 
 > 
 
 
 CN 
 
 S_ 
 
 •I— 
 
 CO 
 
 CO 
 
 i. 
 
 -M 
 
 Z3 
 
 ^~ 
 
 CO 
 
 fO 
 
 O 
 
 o 
 
 -M 
 
 r~ 
 
 O) 
 
 >) 
 
 c 
 
 3 
 
 c 
 
 o 
 
 ■r - 
 
 E 
 
 »o 
 
 
 
 3 
 
 +-> 
 
 CU 
 
 >> 
 
 c_> 
 
 ^~ 
 
 S- 
 
 fO 
 
 
 13 
 
 fO 
 
 F 
 
 
 E 
 
 2 
 
 
 • 
 
 •i — 
 
 -o 
 
 +-> 
 
 LO 
 
 co 
 
 S- 
 
 •r— 
 
 • 
 
 
 «d 
 
 
 CM 
 
 "O 
 
 JZ 
 
 s- 
 
 
 CU 
 
 
 o 
 
 CO 
 
 OJ 
 
 s- 
 
 
 r— 
 
 o 
 
 ra 
 
 ri 
 
 -Q 
 
 o 
 
 r— 
 
 CO 
 
 fO 
 
 s- 
 
 ZS 
 
 
 f— 
 
 Q. 
 
 CJ 
 
 Q. 
 
 
 
 ■r" 
 
 CO 
 
 
 co 
 
 -(-> 
 
 +-> 
 
 
 a) 
 
 i- 
 
 co 
 
 
 CO 
 
 f0 
 
 
 
 CO 
 
 Q. 
 
 c 
 
 
 O 
 
 
 O 
 
 
 U 
 
 >> 
 
 •r— 
 
 
 O 
 
 C 
 
 +-> 
 
 
 S- 
 
 fC 
 
 «J 
 
 
 Q. 
 
 
 C 
 
 
 
 CD 
 
 •i — 
 
 
 CO 
 
 C TD 
 
 
 CO 
 
 •i — 
 
 s- 
 
 
 $- 
 
 S- 
 
 o 
 
 
 ^z 
 
 3 
 
 o 
 
 
 1— 
 
 
 <J 
 
 
MEMORY WRITE 
 COMPLETE (32 
 CYCLES, NO 
 CONFLICTS) 
 
 co 
 
 CO 
 CO 
 
 «3- 
 
 00 
 LO 
 
 00 
 
 o 
 en 
 
 CO 
 
 CO 
 
 LO 
 
 o 
 
 CM 
 
 CO 
 
 LO 
 
 cr> 
 
 CO 
 CM 
 
 COORD. 
 STEP 3 
 COMPLETE 
 (16 
 TRANSFERS) 
 
 CO 
 
 00 
 
 LO 
 CM 
 
 <* 
 
 CO 
 
 o 
 
 LO 
 
 CO 
 LO 
 CO 
 
 o 
 
 CO 
 LO 
 
 COORD. 
 STEP 2 
 COMPLETE 
 
 to 
 00 
 
 o 
 
 LO 
 CM 
 
 o 
 
 LO 
 
 o 
 
 o 
 
 LO 
 CO 
 
 1^ 
 
 o 
 
 LO 
 CO 
 
 o 
 
 O 
 LO 
 CT> 
 
 CO 
 
 NO. OF 
 PARALLEL 
 STEP 2 
 UNITS 
 
 
 
 
 
 
 COORD. STEP 
 1 COMPLETE 
 (BIT 33 OUT) 
 
 CO 
 
 o 
 
 00 
 
 cr> 
 
 CsJ 
 CNJ 
 LO 
 
 CO 
 
 CM 
 CM 
 
 CO 
 
 CO 
 
 CM 
 CM 
 
 o 
 
 CM 
 
 CM 
 
 co 
 
 MERGE 
 COMPLETE 
 (BIT 33 OUT) 
 
 00 
 
 O 
 CO 
 
 00 
 
 O 
 LO 
 
 CO 
 
 o 
 
 CO 
 CO 
 
 o 
 o 
 
 o 
 
 CO 
 
 MERGE 
 START 
 (BIT 1 IN) 
 
 CO 
 
 o 
 
 LO 
 
 o 
 o 
 
 o 
 o 
 
 o 
 
 o 
 
 o 
 
 CM 
 
 o 
 
 o 
 o 
 
 CO 
 
 o 
 
 o 
 o 
 
 o 
 
 MEMORY ACCESS 
 COMPLETE (32 
 CYCLES, NO 
 CONFLICTS) 
 
 CO 
 
 3. 
 
 c 
 c 
 
 CD 
 
 o 
 o 
 cvj 
 
 CO 
 
 o 
 o 
 
 co 
 
 o 
 o 
 
 CO 
 
 o 
 
 o 
 
 CO 
 CM 
 
 i >- 
 
 O OC LU 
 LU LU O —1 
 Ll_ > 21 O 
 U_ •— I LU >- 
 LU 1— s: C_> 
 
 CO 
 
 e 
 
 O 
 LO 
 
 o 
 
 o 
 
 o 
 
 o 
 
 CM 
 
 o 
 
 o 
 
 CO 
 
 o 
 o 
 
 *3- 
 
 42 
 
 2 
 o 
 
 o 
 
 4- 
 >> 
 
 E 
 
 CO 
 
 -E 
 
 <o 
 
 E 
 O 
 O 
 
 
 
 
 Q. 
 ClJ 
 
 E 
 CO 
 
 to 
 
 
 03 
 +J 
 03 
 
 O 
 
 
 
 
 +-> 
 to 
 
 CO 
 
 
 
 
 
 E 
 O 
 
 Q. 
 4- 
 
 
 'a; 
 
 
 
 
 
 
 
 
 
 
 
 4-> 
 
 03 
 
 e 
 
 •r— 
 -O 
 
 co 
 
 
 
 03 
 
 s- 
 
 03 
 
 o_ 
 
 
 
 
 O 
 
 fO 
 
 
 CO 
 
 
 
 
 O 
 
 Q. 
 
 
 
 
 
 
 U 
 
 -0 
 
 CO 
 
 
 -E 
 
 to 
 
 
 
 c 
 
 03 
 
 03 
 
 
 ■r— 
 
 •I— 
 
 
 
 n 
 
 +-> 
 
 
 CO 
 
 -Q 
 
 
 
 CD 
 
 O 
 
 
 
 
 CO 
 
 i — 
 
 
 
 CD 
 
 <u 
 
 E 
 >> 
 
 
 X 
 
 
 
 E 
 
 03 
 
 E 
 
 
 CO 
 
 to 
 
 
 
 >> 
 
 4- 
 
 
 03 
 
 -0 
 
 s- 
 
 03 
 
 a: 
 
 -a 
 
 s- 
 o 
 
 
 
 S- 
 O 
 
 E 
 
 O 
 
 >> 
 
 
 2 
 
 
 
 CO 
 
 E 
 
 E 
 
 
 -^ 
 
 
 
 E 
 O 
 
 S- 
 
 4-> 
 
 Z3 
 
 a. 
 
 
 
 
 to 
 
 
 
 4- 
 
 +-> 
 
 • 
 
 E 
 
 c 
 
 
 
 
 3 CO 
 
 o 
 
 co 
 
 
 +-> 
 
 O 
 
 
 •1 — 
 
 -E 
 
 
 =J 
 
 
 Q. 
 
 E 
 
 CO 
 
 +J 
 
 
 Q. 
 
 >> a 
 
 c 
 
 03 
 
 to 
 
 E 
 
 S- 
 
 4-> 
 
 1— 
 
 CO 
 
 D_ 
 
 +J 
 
 •r— 
 
 
 
 to 
 
 E 
 
 
 O 
 
 
 E 
 
 
 CO 
 
 > 
 
 •r— 
 
 03 
 
 •i — 
 
 
 CO 
 
 +J 
 
 Q 
 
 +-> 
 
 r— 
 
 • . 
 
 E 
 
 a_ 
 
 
 o3 
 
 4- 
 
 >> 
 
 
 3 
 
 p 
 
 co 
 
 CJ 
 
 E 
 
 ^- 
 
 n 
 
 i_ 
 
 03 
 
 r— 
 
 
 o 
 
 CO 
 
 CO 
 
 S- 
 
 ZS 
 
 t— 
 
 (_) 
 
 =3 
 
 ^~ 
 
 CO 
 
 3 
 
 E 
 3 
 
 -o 
 
 CD 
 
 
 O 
 
 
 
 4-> 
 
 o 
 
 i— 
 
 >> 
 
 <L) 
 
 >> 
 
 E 
 
 2: 
 
 1— 
 
 S- 
 
 E 
 
 
 
 •r- 
 
 O 
 
 
 03 
 
 o 
 
 03 
 
 
 
 >> 
 
 i- 
 
 E 
 
 +J 
 
 CO 
 
 >> 
 
 
 s- 
 
 03 
 
 cu 
 
 p— 
 
 s_ 
 
 03 
 
 
 o 
 
 o_ 
 
 s: 
 
 =3 
 
 03 
 
 E 
 
 CO 
 
 E 
 
 
 
 E 
 
 2 
 
 
 CI) 
 
 CO 
 
 o 
 
 •p* 
 
 -0 
 
 +-> 
 
 CM 
 
 ^~ 
 
 1 — 
 
 z: 
 
 CO 
 
 s- 
 
 •1 — 
 
 
 
 
 CU 
 
 fO 
 
 JE 
 
 
 
 CO 
 C3 
 
 
 
 
 CD 
 
 S- 
 
 
 OJ 
 
 
 
 
 O 
 O 
 
 03 
 
 01 
 CO 
 
 1— 
 
 
 
 
 s- 
 
 Z3 
 
 
 
 
 
 
 O- 
 
 U 
 
 •r- 
 
 Q- 
 CO 
 
 
 
 
 
 co 
 
 +-> 
 
 +-> 
 
 
 
 
 
 CO 
 
 s- 
 
 to 
 
 
 
 
 
 CO 
 
 03 
 
 
 
 
 
 
 to 
 
 Q. 
 
 E 
 
 
 
 
 
 CU 
 
 
 O 
 
 
 
 
 
 
 
 >> 
 
 •r— 
 
 
 
 
 
 
 
 E 
 
 +-> 
 
 
 
 
 
 S- 
 
 03 
 
 03 
 
 
 
 
 
 O- 
 
 a-. 
 
 E 
 •r— 
 
 
 
 
 
 cu 
 
 E"0 
 
 
 
 
 
 CO 
 
 •r™ 
 
 S- 
 
 
 
 
 
 i- 
 
 s- 
 
 
 
 
 
 
 
 x: 
 
 13 
 
 
 
 
 
 
 
 1— 
 
 d 
 
 
 
 
 
 
 
 * 
 
 * 
 
 * 
 
 
 
43 
 
 2 
 O 
 
 <* ° < 
 
 > -J 
 
 si" 
 
 IO 
 
 — O) 
 
 co 
 
 CO 
 
 e 
 o 
 
 o 
 
 
 o 
 
 
 
 o 
 
 
 
 ' — 
 
 
 CO 
 
 
 
 Lf> 
 
 • • 
 
 
 CNJ 
 
 CO 
 
 CU 
 
 
 
 r— 
 
 «3- 
 
 • ■ 
 
 C_> 
 
 
 CO 
 
 >. 
 
 
 -E 
 
 o 
 
 • • 
 
 +■> 
 
 
 co 
 
 ro 
 
 >^ 
 
 i. 
 
 Q. 
 
 S- 
 
 O 
 
 
 o 
 
 CO 
 
 03 
 
 E 
 
 CO 
 
 4-> 
 
 d) 
 
 <D 
 
 fO 
 
 E 
 
 
 
 ■o 
 
 
 
 
 
 CD 
 
 s- 
 
 n— 
 
 > 
 
 Q. 
 
 cu 
 
 •i - 
 
 
 ^-« 
 
 +-> 
 
 CNJ 
 
 f^ 
 
 U 
 
 
 03 
 
 CU 
 
 Q. 
 
 s_ 
 
 <+- 
 
 CU 
 
 to 
 
 4- 
 
 4-> 
 
 Q_ 
 
 UJ 
 
 OO 
 
 CO 
 
 Id 
 
 — to 
 
 m 
 
 to 
 
 >> 
 
 E 
 
 03 
 
 CU 
 
 E 
 
 +-> 
 
 
 CO 
 
 -(-> 
 
 >> 
 
 r— 
 
 CO 
 
 A 
 
 T3 
 
 -t-> 
 
 S- 
 
 E 
 
 03 
 
 CU 
 
 T3 
 
 co 
 
 C 
 
 CU 
 
 03 
 
 S- 
 
 ■»-> 
 
 Q. 
 
 CO 
 
 cu 
 
 
 
 
 
 
 
 CJ 
 
 
 CO 
 
 
 
 
 
 
 
 
 
 
 CU 
 
 
 
 
 
 
 
 
 
 
 +-> 
 
 
 
 
 
 
 
 +-> 
 
 
 •r— 
 
 
 
 
 
 
 
 
 
 
 > 
 
 
 
 
 
 
 
 E 
 
 
 •r— 
 
 
 en 
 
 
 
 
 
 
 
 +-> 
 
 
 e 
 
 
 
 
 
 >> 
 
 
 O 
 
 
 •1 — 
 
 
 
 
 
 03 
 
 
 ■=t 
 
 
 CO 
 
 
 
 
 
 E 
 
 
 
 
 CO 
 
 
 
 
 
 
 
 cu 
 
 
 CU 
 
 
 
 
 
 S- 
 
 
 s- 
 
 
 
 
 
 
 
 
 O 
 
 
 03 
 
 
 
 
 
 
 
 
 
 
 ■s 
 
 
 s- 
 
 
 
 
 
 >, 
 
 
 -0 
 
 
 Q. 
 
 
 
 
 
 03 
 
 E 
 
 
 s- 
 03 
 
 
 1 
 
 , 
 
 
 
 
 4J 
 
 
 3: 
 
 
 Q. 
 
 
 
 
 
 3 
 
 
 s- 
 
 
 CU 
 
 Q. 
 
 
 
 
 a. 
 
 
 
 
 
 +-> 
 
 CU 
 
 
 
 
 4-> 
 
 
 4- 
 
 
 CO 
 
 4-> 
 CO 
 
 
 
 
 3 
 
 
 CO 
 
 E 
 
 
 E 
 
 
 
 
 
 
 
 O 
 
 
 O 
 
 E 
 
 
 
 
 >> 
 
 Q. 
 
 •r— 
 
 
 •r- 
 
 O 
 
 
 
 
 s- 
 
 CU 
 
 +J 
 
 
 +J 
 
 "r— 
 
 
 
 
 
 
 +J 
 
 13 
 
 
 03 
 
 +-> 
 
 
 
 
 E 
 
 co 
 
 -Q 
 
 
 E 
 
 03 
 
 
 
 
 CU 
 
 
 •i— 
 
 
 # ^- 
 
 E 
 
 
 
 
 E 
 
 4-> 
 
 S- 
 
 
 "O 
 
 ■n- 
 
 
 
 
 
 Q. 
 
 -l-> 
 
 
 S- 
 
 -0 
 
 
 
 
 0\ 
 
 Z3 
 
 CO 
 
 
 
 
 s- 
 
 
 
 
 CU 
 
 S- 
 
 'r- 
 
 
 
 
 
 
 
 
 
 r— 
 
 S- 
 
 Q 
 
 
 
 
 
 
 
 
 
 U 
 
 QJ 
 
 
 
 
 
 
 
 
 
 >>+-> 
 
 CU 
 
 
 -0 
 
 
 
 
 
 
 
 E 
 
 E 
 
 
 e 
 
 -0 
 
 
 
 
 
 •1— 
 
 •1— 
 
 
 03 
 
 e 
 03 
 
 
 
 
 cu 
 
 >. 
 
 1— 
 
 
 *> 
 
 
 
 
 
 03 
 
 03 
 
 
 
 CU 
 
 CU 
 
 CNJ 
 
 00 
 
 
 s 
 
 E 
 
 • 
 
 
 CD 
 
 CD 
 
 
 
 
 -0 
 
 
 i— 
 
 
 s- 
 
 s- 
 
 a. 
 
 0. 
 
 
 S- 
 
 +-> 
 
 r— 
 
 
 CU 
 
 CU 
 
 cu 
 
 cu 
 
 
 03 
 
 •r— 
 
 • 
 
 
 E 
 
 E 
 
 CO 
 
 to 
 
 
 _e 
 
 S_ 
 
 CO 
 
 
 #\ 
 
 4- 
 
 
 
 
 S- 
 
 O 
 
 CU 
 
 
 .c 
 
 O 
 
 E 
 
 E 
 
 cu 
 
 03 
 
 
 S- 
 
 
 
 
 
 O 
 
 O 
 
 s- 
 
 t— 
 
 CO 
 
 3 
 
 
 4-> 
 
 E 
 
 • 1 — 
 
 •r— 
 
 
 
 3 
 
 
 CD 
 
 
 CU 
 
 O 
 
 +-> 
 
 +J 
 
 +-> 
 
 CJ 
 
 Q. 
 
 •1— 
 
 
 4- 
 
 • r- 
 
 03 
 
 03 
 
 CO 
 
 • i— 
 
 CU 
 
 Li_ 
 
 
 
 +-> 
 
 E 
 
 E 
 
 
 +-> 
 
 +-> 
 
 
 
 >^ 
 
 CU 
 
 •r" 
 
 •r- 
 
 >> 
 
 S- 
 
 CO 
 
 
 
 S- 
 
 r— 
 
 -a 
 
 T3 
 
 i. 
 
 03 
 
 
 
 
 O 
 
 a. 
 
 s- 
 
 S- 
 
 cu 
 
 Q. 
 
 E 
 
 
 
 E 
 
 E 
 
 
 
 O 
 
 E >— 
 
 
 O 
 
 
 • • 
 
 CU 
 
 O 
 
 
 
 O 
 
 cu -a 
 
 >> 
 
 ■r™ 
 
 
 CO 
 
 2: 
 
 
 
 
 
 C_J 
 
 ;>, >— 1 
 
 E 
 
 -4-> 
 
 
 CU 
 
 
 
 
 
 
 03 
 
 03 
 
 
 • 1 — 
 
 
 
 
 
 
 
 E 
 
 
 -M 
 
 
 
 
 
 
 cn 
 
 •r™ 
 
 
 • j— 
 
 03 
 
 -O 
 
 
 
 ■a 
 
 CU 4- 
 
 E 
 
 "O 
 
 
 > 
 
 
 
 
 
 
 •1— 
 
 S_ 
 
 
 •1 — 
 
 
 
 
 
 
 S- 
 
 O 
 
 
 +-> 
 
 
 
 
 
 
 ZJ 
 
 O 
 
 
 u 
 
 
 
 
 
 
 Q 
 
 a 
 
 
 < 
 
 
 
 
 
 
 * 
 
 
 
44 
 
 Finally, it is important to note that memory activity associated 
 with a hardware cycle is concentrated near the beginning and the end of 
 that cycle. Therefore, as long as the 14 ys time constraint is satisfied, 
 no memory conflicts between phases a and e can ever occur. The only con- 
 flicts which may arise involve either phase a or phase e and disk I/O. 
 
 Figure 2.11 is based on clock intervals of 100ns for merge and 
 step 1, 33ns for step 2 and 13ns for step 3. Use of an 80 MHz clock with 
 its related frequencies as described in section 2.3.2.5 would increase the 
 total time for a hardware cycle to 13.29 ys , which is still within the 14 ys 
 limit. 
 
 So far in this analysis conflicts among the four groups of memory 
 transfers which take place during a hardware cycle have been ignored. Figure 
 2.12 illustrates certain conflict situations and methods for controlling 
 them. Each time line in the figure represents one hardware cycle (13.89 ys), 
 and each vertical spike marks the beginning of one, 100ns memory cycle used 
 for the indicated series of transfers. Processing times are based on data 
 for the "standard" system in Figure 2.10 and Table 2.5. The reference line 
 at the bottom of Figure 2.12 represents disk 1/0 requirements for all cases 
 and is to be used separately with each of the other lines: a, b, c and d. 
 
 Figure 2.12(a) presents the same no-conflict situation as Figure 
 2.11. Disk accesses are spread uniformly throughout the cycle, and hardware 
 transfers are grouped together near the beginning and end. While hardware 
 input and output conflicts cannot occur if a strict 13.89 ys time limit is 
 observed, other types of conflicts are nearly inevitable since, for example, 
 data being read from disk and data being transferred to hardware are 
 typically different sublists of the same postings file and therefore should 
 
45 
 
 rs 
 
 o> 
 
 c 
 
 r— 
 
 
 u 
 
 LU 
 
 >» 
 
 or 
 
 u 
 
 cC 
 
 
 3 
 
 t- 
 
 Q 
 
 o 
 
 or 
 
 «4- 
 
 < 
 
 ^-^ 
 
 = 1 
 
 
 ■s °«3 
 
 o J o 
 
 z Q > 
 u a ° 
 
 (VJ 
 
 1 
 
 
 
 
 »- 
 
 •^ 
 
 :d 
 
 *^— s. f— 
 
 D- 
 
 •^" 
 
 »- 
 
 <u 
 
 ZD 
 
 OJ E 
 
 c 
 
 r— »r— 
 
 
 U 4-> 
 
 LU 
 
 >S 
 
 OT 
 
 U t_ 
 
 <t 
 
 0) 
 
 3 
 
 I- -M 
 
 C 
 
 O M- 
 
 OT 
 
 «4- ITS 
 
 <c 
 
 ^— *• 
 
 3: 
 
 3 
 
 Z3 
 
 o 
 
 U. a 
 
 is 
 
 UJ< 
 
 I 
 
 _IO 
 
 CM 
 
 - en 
 
 - CO 
 
 - h- 
 
 - to 
 
 V) 
 
 to 
 +J 
 o 
 
 c 
 o 
 u 
 
 (C 
 
 ID 
 
 - <T 
 
 
 1 
 
 1— 
 
 
 zd 
 
 "-— s 
 
 D_ 
 
 •r— 
 
 «££ 
 
 
 1— 1 
 
 0) 
 
 
 ^— 
 
 LU 
 
 o 
 
 or 
 
 >> 
 
 < 
 
 u 
 
 
 
 _i 
 
 
 o 
 
 S- 
 
 QT 
 
 o 
 
 <: 
 
 <*- 
 
 3C 
 
 
 -en 
 
 (0 
 
 -N — 
 
 UJ 
 
 S 
 
 -10 
 
 - If) 
 
 Q. 
 
 or 
 
 Q 
 
 or 
 
 0J 
 
 u 
 
 >> 
 o 
 
 J- 
 o 
 
 ro 
 
 ■M 
 
 Q. 
 +J 
 
 13 
 
 o 
 
 <1J 
 
 -Q 
 XJ 
 
 e 
 fd 
 
 to 
 
 E 
 <a 
 a> 
 s- 
 +-> 
 to 
 
 f0 
 
 -a 
 
 a. 
 
 c 
 
 CD 
 
 a» 
 +J 
 
 aj 
 _q 
 
 to 
 
 +-> 
 u 
 
 o 
 a 
 
 a> 
 
 -Q 
 (/> 
 
 to 
 
 o 
 a. 
 
 13 
 -Q 
 
 CD 
 S- 
 +-> 
 to 
 
 +J 
 
 A3 
 
 -a 
 
 
 
 
 
 
 
 
 ro 
 
 
 
 
 
 — 
 
 _c\J 
 
 
 
 
 ■— i 
 
 
 - 
 
 — 
 
 
 
 -O 
 
 u. or 
 O < 
 
 o£ 
 
 z or 
 uj < 
 
 X 
 
 ff) 
 
 oo 
 
 -N — 
 
 
 Ul 
 
 
 </) 
 
 
 < 
 
 
 o 
 
 (A 
 
 _l 
 
 =L 
 
 _l 
 
 
 < 
 
 LU 
 
 
 5 
 
 CC 
 
 tD 
 
 -1- 
 
 CO 
 
 LU 
 
 z 
 
 LU 
 CC 
 LU 
 Li. 
 LU 
 
 or 
 
 to 
 
 >> 
 
 n3 
 
 c 
 
 u 
 
 c 
 o 
 (_) 
 
 o 
 
 E 
 
 CO 
 
 CD 
 
 i- 
 13 
 
 -tvj 
 
 V) 
 
 c 
 O 
 O 
 
 LU 
 
 _l 
 o 
 >- 
 o 
 
 >- 
 or 
 o 
 
 2 
 
 LU 
 
 s 
 
 ' UJ 
 
 *: t- 
 oor a J 
 
 i/> 
 
 < .«- 
 
46 
 
 _ * 
 
 3 
 Q- 
 ■»-> 
 
 to 
 
 O 
 
 ■o 
 
 > 
 s- 
 
 <D 
 CO 
 
 <D 
 
 i- 
 
 o 
 
 c 
 
 in 
 
 CD 
 
 o 
 o 
 
 c 
 
 CO 
 
 o. 
 
 3 
 
 o 
 o 
 
 o 
 
 >> 
 
 E 
 ■P 
 
 3 
 Q. 
 +■> 
 3 
 O 
 
 CD 
 
 &. 
 
 fO 
 
 5 
 
 "2 
 
 31 
 
 -CM 
 
 13 
 
 
 Q. 
 
 
 I— 
 
 
 ZD 
 
 
 C 
 
 *—^ 
 
 
 r~ 
 
 UJ 
 
 1 
 
 q; 
 
 • r— 
 
 <c 
 
 
 3 
 
 a> 
 
 O 
 
 ^— 
 
 o; 
 
 u 
 
 <c 
 
 >> 
 
 a: 
 
 <j 
 
 LU 
 
 t- 
 
 — i 
 
 o 
 
 CO 
 
 t)- 
 
 H-H 
 
 *»— <" 
 
 to 
 
 
 to 
 
 
 o 
 
 1 
 
 a. 
 
 UJ 
 
 
 u. q: uj 
 
 
 o < _J 
 
 
 
 
 
 zgu 
 
 
 w < 
 
 (/) 
 
 z 
 
 0) 
 
 
 
 _o 
 
 -CD 
 
 (A 
 
 UJ 
 
 2 
 
 1 
 
 
 1— 
 
 
 ZD 
 
 ^■^ 
 
 a. 
 
 •r* 
 
 
 
 
 
 *— 1 
 
 <D 
 
 
 r~ 
 
 UJ 
 
 L> 
 
 a: 
 
 >i 
 
 < 
 
 u 
 
 3 
 
 
 O 
 
 u 
 
 a: 
 
 o 
 
 < 
 
 t»- 
 
 3C 
 
 
 3 
 
 
 Q. 
 
 
 c 
 
 
 •r~ 
 
 
 c 
 
 
 OJ 
 
 
 a) 
 
 
 ■5 
 
 
 4-> 
 
 
 OJ 
 
 
 -Q 
 
 
 to 
 
 
 +J 
 
 
 o 
 
 
 •^ 
 
 
 ^— 
 
 
 M- 
 
 
 C 
 
 • 
 
 o 
 
 t/> 
 
 o 
 
 E 
 
 
 <o 
 
 O) 
 
 OJ 
 
 r - 
 
 s- 
 
 XI 
 
 +-> 
 
 •i - 
 
 </) 
 
 to 
 
 
 to 
 
 (O 
 
 o 
 
 -p 
 
 a. 
 
 fD 
 
 
 T3 
 
 . 
 
 4-> 
 
 ■p 
 
 3 
 
 3 
 
 a. 
 
 Q.-P 
 
 ■P 
 
 3 
 
 3 
 
 O 
 
 O 
 
 
 
 c 
 
 a> 
 
 CD 
 
 s- 
 
 aj 
 
 ca 
 
 ■s 
 
 ^ 
 
 -p 
 
 -a 
 
 CD 
 
 i- 
 
 JD 
 
 <o 
 
 
 JC X 
 
 
 C 
 
 •p 
 
 fO 
 
 ro 
 
 
 
 (/I 
 
 n 
 
 E 
 
 5- 
 
 (O 
 
 OJ 
 
 d) 
 
 4- 
 
 s- 
 
 4- 
 
 ■p 
 
 3 
 
 to 
 
 _Q 
 
 
 
 ro 
 
 01 
 
 -P 
 
 c 
 
 fO 
 
 o 
 
 "O 
 
 1 
 
 I— 
 ZD 
 
 a. 
 
 t- 
 
 ZD 
 O 
 
 LU 
 
 < 
 
 o 
 
 Q£ 
 
 «t 
 X 
 
 UJ 
 _! 
 0Q 
 t— i 
 
 to 
 to 
 o 
 
 i 
 
 01 
 
 r— 
 U 
 >> 
 
 u 
 
 o 
 
 U- 
 
 
 
 (- 
 => 
 
 a. 
 z 
 •— < 
 
 UJ 
 
 a: 
 
 <c 
 
 Q 
 
 a: 
 
 < 
 a: 
 
 CD 
 
 O 
 
 >> 
 
 u 
 
 t- 
 o 
 4- 
 
 zcr b 
 
 UJ< 
 
 z 
 
 ro 
 
 _ C\J 
 
 - 0> 
 
 <A 
 LU 
 
 5 
 
 - to 
 
 <_> 
 
 03 
 
 ■p 
 
 fO 
 
 en 
 c 
 o 
 E 
 re 
 
 to 
 
 -P 
 O 
 
 o 
 
 u 
 
 CD 
 
 to 
 to 
 
 o 
 a. 
 
 a. 
 -p 
 
 CD 
 
 s- 
 
 CO 
 
 -o 
 
 s- 
 
 lO 
 
 -P 
 
 CD 
 
 u. tr 
 z a o 
 
 UJ < 
 
 z 
 
 ro 
 
 - O 
 
 cn 
 
 CO 
 
 JD (O 
 
 CD 
 
 CD S- 
 
 C -P 
 
 O to 
 
 ^-o 
 
 
 to 
 
 
 LU 
 
 
 CO 
 
 
 < 
 
 
 (_> 
 
 
 _l 
 
 
 < 
 
 *— ' 
 
 oc 
 
 UJ 
 
 o 
 
 2 
 
 Li- 
 
 to 
 
 LU 
 O 
 
 LU 
 li. 
 LU 
 
 or 
 
 CO 
 
 c 
 
 O 
 O 
 
 LU 
 
 -I 
 
 >- 
 
 u 
 
 > 
 tr 
 o 
 
 LU 
 
 2 
 
 to 
 
 >, 
 
 fO 
 
 c 
 <c 
 
 -t-> 
 u 
 
 o 
 
 >1 
 
 s_ 
 o 
 
 E 
 
 CD 
 
 -a 
 
 CD 
 
 c: 
 
 +-> 
 
 c 
 o 
 u 
 
 CD 
 
 s- 
 
 3 
 CD 
 
 1 UJ 
 
 air Q? 
 
47 
 
 require access to the same memory module. A similar relationship exists 
 between data transfers from hardware to memory and from memory to disk. 
 
 Suppose that the two input data streams do require access to the 
 same memory module, i, and that the two output data streams require access 
 to some other module, j. This case is illustrated in Figure 3.12(b). Here, 
 disk transactions "steal" a number of memory cycles from the hardware input 
 process, delaying its completion by approximately 1 ys. This delays com- 
 pletion of the hardware processing by the same amount; and another series 
 of conflicts in the output module causes the total time required for the 
 process to be 14.2 ys, about 300ns more than the allowable time. 
 
 One solution for this problem is to insert buffers between the 
 memory and the hardware (points A or B or both in Figure 2.10) to spread 
 memory access requirements more evenly throughout the operational cycle. 
 Each buffer used in this way increases the total time required to process 
 two lists by one hardware cycle, since, with a buffered input, each input 
 sublist reaches the merge network one cycle later than before, and simi- 
 larly at the output. Simulation experiments (Chapter 4) show that for a 
 complicated sample search, the use of buffers has very little effect upon 
 performance and, for any given set of starting conditions, may either 
 increase or decrease the total time required. When buffers are used, 
 Equation 2.1 becomes 
 
 t = i+j+w + 1+N B , (2.5) 
 
 where N g is the number of buffers employed in the hardware path. 
 
 Returning to the conflict situation defined above in which two 
 input data streams share one memory module and two output data streams share 
 
48 
 
 another, suppose that data from coordination step 3 were collected in a 
 buffer at Point B (Figure 2.10) and returned to memory at any convenient 
 time during the next hardware cycle. The effect would be to distribute 
 memory access requirements for hardware output over an entire cycle in- 
 stead of concentrating them near the end. As Figure 2.12(c) shows, all 
 timing requirements can be satisfied easily using this configuration. 
 
 Figure 2.12(d) illustrates the situation in which all data trans- 
 fers reference a single 100ns memory module. About half the memory cycles 
 are required for disk I/O; and, as a result, the hardware input phase re- 
 quires 6 us. During the next 5.9 ys disk I/O continues, and the hardware 
 output from the previous cycle is returned to memory. The entire cycle of 
 operation including all hardware-related memory transactions can be com- 
 pleted in 11.9 ys. Again, only one buffer located at Point B of Figure 
 2.10 is needed. 
 
 If, instead of using a single memory module with a true 100ns 
 cycle, an effective 100ns cycle were achieved by interleaving four cheaper 
 400ns submodules, then it can be shown that it would still be possible to 
 satisfy the timing constraints with the aid of buffers at Points A and B 
 of Figure 2.10 and with wery little performance degradation. 
 
 2.5 Summary 
 
 This chapter has described in detail the hardware requirements of 
 the proposed system. The operation of each of the hardware subsystems has 
 been defined and designs have been outlined for a comparison element (the 
 basic building block of the merge network) and for the various parts of the 
 coordination network. Other subsystems can be obtained from existing devices 
 either directly or through relatively minor modifications. Timing 
 
49 
 
 constraints have been analyzed, and the interaction of the various components 
 during a typical cycle of operation has been discussed at length. The 
 effects and control of memory conflicts have also been evaluated. 
 
 It is concluded that hardware term coordination systems capable 
 of processing up to 256 items simultaneously can reasonably be built using 
 current technology. 
 
50 
 
 3. BASIC ALGORITHMS 
 
 In order to operate the proposed hardware coordination system, 
 a number of procedural decisions are required. The system is intended for 
 operation in a large, on-line retrieval environment where frequently a 
 single search request may involve a large number of search terms and hence 
 require the manipulation of a large number of postings files. Further, the 
 number of entries in these files may vary radically from one file to an- 
 other and may be expected frequently to exceed the number of data paths in 
 the system and even the available memory. In the data base used as a model 
 for this study, for example, some terms index as few as one or two documents 
 while others index as many as half a million. As a result, procedures are 
 needed for insuring a proper sequence of inputs to the merge network, for 
 handling intermediate results in large searches, and for processing exces- 
 sively long lists. Problems of this type are considered in the present 
 chapter, and a brief description of the standard algorithms which have been 
 adopted for performance evaluation studies is presented. Variations exam- 
 ined in the interest of optimizing performance are discussed in Chapter 4. 
 It is beyond the scope of this report to specify explicit algorithms for 
 use by the control computer. Rather, it is assumed that the processing which 
 must be done there can be performed within the available time. 
 
 3.1 Sublist Sequencing 
 
 Consider the problem of processing two lists, each of which contains 
 more than n postings, where n is the number of data paths in the system. Each 
 of the two input lists may be divided into sublists of length n, and one new 
 sublist may be processed each cycle. The problem then becomes one of choosing 
 
51 
 
 a proper sequence of sublists to assure the success of the overall merge. 
 Clearly, some sequences are not appropriate since, for example, one could 
 not normally process first all the sublists from one file and then all the 
 sublists from the other. During each hardware cycle, n new inputs are 
 introduced into the merge and the n smallest elements currently in the 
 system are released as finished results and become unavailable for further 
 sorting. 
 
 Refer now to Figure 3.1, where Lists 1 and 2 represent files to 
 be merged. Items in each file are assumed to be arranged in nondecreasing 
 order. Let a be the last n-element sublist processed from List 1, 3 the 
 last sublist from List 2, and y and 6 the next sublists available on Lists 1 
 and 2, respectively. The last elements on a, $, y and 6 are a, b, e and f, 
 
 respectively; and c and d represent the leading items on lists y and 6. 
 
 k+1 
 Define: N = a list of n new inputs to the merge for cycle k+1. 
 
 k 
 F = a list of n finished results, f. , from merge cycle k 
 
 ( f i < f i+r ! < i < n - D 
 
 k 
 R = a list of n elements, r. , retained for further 
 
 processing after merge cycle k 
 (rj < rj +1 . 1 < i < n - 1). 
 
 Theorem: Proper sequencing of sublists is assured if, for every hardware 
 cycle, the next available sublist having the smaller leading 
 element is chosen. If the two leading elements are equal, either 
 sublist may be used. 
 
52 
 
 LIST 1 
 
 LIST 2 
 
 L^rr-.j 
 
 LAST INPUTS 
 PROCESSED 
 
 NEXT INPUTS 
 AVAILABLE 
 
 N = 
 
 NEW 
 INPUTS 
 
 n + 1 
 n + 2 
 
 2n 
 
 (a) Sublists and Data Elements 
 
 
 
 
 
 
 
 MERGE 
 NETWORK 
 
 
 • 
 
 
 • 
 
 
 • 
 
 
 
 • 
 
 
 • 
 
 — fri 
 
 
 • 
 
 — ^ 
 
 • 
 
 — » 
 — » 
 
 
 • 
 
 
 • 
 
 
 
 • 
 
 
 • 
 
 
 
 • 
 
 
 
 
 
 
 
 1 ' 
 
 2 
 
 n 
 
 n + 1 
 n + 2 
 
 2n 
 
 Y F = RESULTS 
 
 (b) Merge Network Inputs and Outputs 
 
 R = 
 
 FEEDBACK 
 RESULTS 
 
 Figure 3.1. Definitions for Sublist Sequence Discussion 
 
53 
 
 Proof: Consider the k step of the merge. 
 
 R k_1 merged with y or 6 + F k + R k 
 
 k-1 k-1 k-1 
 The last element of R is r = max R = a or b, say a. 
 
 Then a >_ b. Note also that c >_ a since List 1 is arranged in nondescending 
 order. 
 
 If d ^ c, then d >^ a and 
 
 R k_1 merged with y •+ F k + Y » where F k = R k_1 and R k = Y 
 
 and R k_1 merged with 6 ■* F k + 6, where F k = R k_1 and R k = 6. 
 
 If, however, d < c, then the relationship between a and d is unknown and 
 
 k-1 k 
 
 R must be merged with 6 so that any 6. <_ a may be included in F . □ 
 
 An alternative rule which will produce an acceptable sequence and 
 which may be more convenient to apply in practice is based on comparison of 
 a and b rather than c and d. As each new sublist is entered into the system, 
 compare its largest (last) element with the largest item currently in the 
 system and update an indicator showing which file has given rise to the 
 current largest element. Then choose the next sublist from the other file. 
 
 3.2 Intermediate Results 
 
 When the length of the result of a particular search exceeds the 
 available space in memory a series of intermediate results must be stored 
 temporarily for later processing. In a given search, many but not all of 
 these intermediate results tend to be of comparable lengths (greater than 
 one memory load). It might seem appropriate to generate several such runs 
 on one pass, combine them in pairwise fashion on the next pass beginning 
 with whichever list becomes available first, and proceed in this fashion 
 
54 
 
 until only one list remains. However, it has been found that this procedure 
 results in a large amount of idle time spent waiting for the disk. Further- 
 more, if the number of intermediate results to be processed is odd, care 
 must be taken to avoid an "infinite loop" situation in which any particular 
 list serves alternately as a source on one rotation and a sink on the next. 
 (In the example of the previous chapter, LI was a source and L2 was a sink.) 
 
 It has proved more effective to identify the longest list at the 
 beginning of a search and use it exclusively as a sink. Whenever the memory 
 fills above a certain threshold, processing is suspended until the longest 
 list becomes available, the contents of the memory are processed against the 
 longest list and the result is left on disk. Then normal processing is re- 
 sumed until the memory is full again. 
 
 3.3 List Splitting 
 
 It frequently becomes necessary to split a list into two sections, 
 read the first part, and leave the other for future use. This facility is 
 essential whenever it is necessary to process a source list which is too 
 large to fit the available memory; it may be used (sparingly) at other times 
 to improve performance by improving the utilization of the data memory. The 
 procedure can be implemented as a simple bookkeeping transaction in the con- 
 trol computer. 
 
 3.4 Special Requirements of OR, AND and NOT Processing 
 
 Most of the discussion up to this point has dealt explicitly or 
 implicitly with "OR" processing. From an operational point of view, no 
 substantial difference exists among the OR, AND and NOT procedures, but 
 certain details should be examined. For the remainder of this discussion 
 
 
55 
 
 let LI refer either to search term one or to its associated postings file, 
 and let £, be the number of document postings in that file. Let L 2 , £«> LR 
 
 and £ have corresponding definitions with respect to search term two and 
 
 the result of the coordination procedure at hand. Assume that £ 2 ± h • 
 
 For the search request 
 
 LI OR L2 , 
 
 The condition £ = a, + ju implies that no documents are common to both in- 
 put lists, and processing proceeds exactly as described in Chapter 2. How- 
 ever if £ r < £, + £« then the smooth flow of results from the coordination 
 
 network through the memory and onto the disk will be interrupted from time 
 to time as it becomes necessary to wait for complete n-element sublists of 
 LR. If the results are destined for memory alone, this delay presents no 
 difficulty; but if they are to be written on disk, then "gaps" will appear 
 in the disk files. The problem can be controlled by storing information on 
 the disk to indicate which blocks contain valid data, or by supplying ap- 
 propriate accounting procedures in the control computer. It can be elimi- 
 nated by providing sufficient buffer space in memory to contain one complete 
 logical track (n physical tracks) of information. In practical retrieval 
 systems, the degree of overlap between any two pairs of postings files is 
 believed typically to be quite small, perhaps 2%, so that in most searches 
 only a few gaps might develop and a very small buffer would provide complete 
 protection. In the worst possible case ("LI OR LI") the density of infor- 
 mation in the output file cannot drop below 1/2 its normal value since the 
 result must contain at least £■, postings (£« <. £-i <. £ ). Gaps in one 
 
56 
 
 intermediate result may propagate to another, but this need not necessarily 
 occur. 
 
 For the two search requests 
 
 LI AND L2, 
 and 
 
 LI AND NOT L2, 
 
 the problem of gaps on the disk need never arise since LR is never longer 
 than LI and hence the results of the search can be collected in memory until 
 the procedure is complete. If the search involves very long input files, 
 however, it may be necessary for the control computer to conduct these 
 searches in several phases. Consider the search "LI AND L2" in which 
 £« <_ £, , but ju still contains km postings, where m represents the available 
 
 memory space. L2 must be divided into k sections, L2, , L2 2 , ..., L2. , each 
 
 of the length m. The search may then be conducted in k+1 phases to form 
 the desired LR: 
 
 LR ] = L2 1 AND LI 
 LR 2 = L2 2 AND LI 
 
 LR k = L2 k AND LI 
 
 LR = LR ] OR LR 2 OR ... OR LR k . 
 
 A similar procedure is required to perform the search "LI AND NOT 
 L2" when LI is too long for the available space. 
 
57 
 
 3.5 Processing Algorithms for the Experimental System 
 
 For experimental purposes, a number of search simulations have 
 been performed using a collection of standard procedures and parameters. 
 This section describes these standard elements; Chapter 4 describes specific 
 test conditions and presents results. As in other parts of this report, 
 the term "merge" will often be used to refer to the complete hardware merge 
 and coordination procedure. 
 
 3.5.1 Overview 
 
 Consider a search request involving the disjunction of several 
 terms, and let the longest of the associated postings files be designated 
 File S. Processing begins as soon as the disk addresses of the required 
 files are determined. With the exception of File S, postings lists are 
 accumulated in memory as they are encountered on the disk; merging is 
 initiated whenever two lists are available and the merge system is free. 
 When free memory drops below a specified threshold, t, further accumulation 
 is suspended until some core is released or until the present contents are 
 fully processed, coordinated with File S, and left on disk. Then normal 
 processing is resumed. 
 
 3.5.2 List Selection 
 
 When a list other than File S is encountered on the disk it may 
 be read into core, rejected or split. Normally it will be read in its 
 entirety. A list may be rejected only when the required transmission 
 facilities are busy (e.g., another list is being read) or when memory is 
 
58 
 
 filled above the threshold level of (100-t)%. A list rejected on one 
 rotation is reconsidered on each succeeding rotation until it is finally 
 processed. If at least t% of the total memory is free but the new list is 
 still too long to fit the available space, then the list is split into two 
 sections, A and B. Part A, which just fills the available space, is read 
 immediately; the remainder of the list, Part B, is left for another ro- 
 tation. 
 
 3.5.3 Merge Initiation 
 
 Merging is initiated whenever the merge system is free and any 
 two lists are available. A list is considered available when either a) 
 it is completely contained in core, or b) its first block is encountered 
 on the disk. A list in the process of transmission from disk to core does 
 not become available until after that transmission is complete. If a 
 choice exists among more than two lists, the two shortest are selected for 
 processing. Thus an attempt is made on a local basis to optimize the merge 
 and coordination procedure. It can be shown [13,14] that merge time would 
 be minimized if all lists were available initially and if the shortest two 
 remaining lists were chosen for each new processing cycle. In the present 
 context, minimizing the use of the merge system is not equivalent to 
 minimizing the total elapsed time for a search; nevertheless, a strong 
 interdependence between the two has been observed. 
 
 One additional restriction in the present implementation can cause rejection: 
 no more than 20 files for any given search may exist in core at one time. 
 This limit is occasionally reached. 
 
59 
 
 3.5.4 File S Processing 
 
 The longest file in a search is designated from the beginning as 
 the sink and is used to collect and coordinate intermediate results. This 
 list is accepted for processing only when no other lists remain on disk or 
 when the memory is nearly full and must be cleared to make room for other 
 files. 
 
 Certain other conditions must also be satisfied before File S 
 can be processed, namely, all the required transmission facilities (input 
 and output channels) must be free, the merge system must be idle, and 
 adequate space must be available on some disk to receive the output. If 
 any of these conditions fails, processing is deferred until the situation 
 can be corrected. 
 
 As the simulation is presently implemented, the merge network is 
 assigned whenever two lists are ready for merging, and merge processing is 
 not interrupted before its completion. File S, however, cannot be proces- 
 sed unless the merge system is free. As a result, when the memory gets full, 
 all files in core are combined into a single long intermediate result before 
 File S is processed. During this period of consolidation, memory space can 
 be released as unwanted data items are eliminated. If as a result of this 
 process, the amount of free memory rises above the threshold value, new in- 
 puts can again be accepted from the disk. There is reason to believe that 
 these policies leads to inefficiencies and that further algorithmic refine- 
 ments are in order. See section 4.4. 
 
 3.5.5 Result Processing 
 
 All results are retained in core except those which involve File 
 S and which therefore are left on disk. Results retained in core become 
 
60 
 
 available for further processing; those on disk constitute a new "longest 
 list". 
 
 In a practical retrieval system, the length of the file which 
 results from a particular coordination procedure depends upon the operation 
 being performed and upon the number of postings which are common to the two 
 input lists. In order to simulate the effects of element duplication, an 
 overlap factor , c. , has been associated with each term, i. This factor 
 
 reflects the extent to which term i indexes documents in common with other 
 terms of interest. Using the notation of section 3.4, the length and over- 
 lap factor for the output file from the search "LI OR L2" are given by 
 
 V = *-, + 0-c m U 2 > c r = c l • 
 where a, _> ju ap d c m = max(c, ,c 2 ). 
 
 Corresponding equations for AND and NOT processing are not used 
 in the present study. If this rule is applied repeatedly in a search in- 
 volving many terms, the length of the final result depends upon the order 
 in which the terms are processed. Experimentally, this has not proved to 
 be a serious problem: the result length from trial to trial has been found 
 to deviate from the overall mean value by only a few percentage points. 
 
 3.5.6 Standard Parameters 
 
 Disk-related parameters used throughout this study are those shown 
 in Table 2.2. In addition, standard values of 10% for the overlap factor 
 and 10% for the memory threshold have been adopted. A merge is assumed to 
 require £, + ju + 1 hardware cycles to complete. 
 
61 
 
 3.6 Example 
 
 The unified operation of the procedures discussed in this chapter 
 is best described by means of an example'. Suppose a search request has 
 been received for any document indexed by one or more of eleven specified 
 terms. Table 3.1 shows the number of documents posted to each term and also 
 the time interval after the start of the search during which each file will 
 first be available. A standard merge system with 16 parallel data paths 
 and a 6K word memory has been assumed. Table 3.2 lists the important events 
 which occur during the progress of the search. Many of the essential time 
 relationships are illustrated graphically in Figure 3.2. For each of the 
 three rotations required to process this request, the figure shows the 
 initial arrangement (in time) of data on the disk and the distribution of 
 merge activity during the period. Element heights in Figure 3.2 are not 
 significant but have been chosen merely to differentiate between adjacent 
 or overlapping activities. 
 
 During the first rotation all but three and part of a fourth of 
 eleven original lists are processed, and the merge network is occupied for 
 16.3 out of 25ms. The remainder of the process requires about 1-1/2 addi- 
 tional rotations and 23.8ms of additional merge time. 
 
62 
 
 Term 
 
 Postings 
 
 LI 
 
 
 1976 
 
 L2 
 
 (File S) 
 
 2384 
 
 L3 
 
 
 199 
 
 L4 
 
 
 292 
 
 L5 
 
 
 750 
 
 L6 
 
 
 1680 
 
 L7 
 
 
 1600 
 
 L8 
 
 
 220 
 
 L9 
 
 
 100 
 
 no 
 
 
 1414 
 
 
 (L10A 
 
 1280 
 
 
 (L10B 
 
 134 
 
 Lll 
 
 
 156 
 
 Start Address End Address 
 (ms. past reference) (ms. past reference) 
 
 0.500 
 
 5.014 
 
 5.167 
 
 6.875 
 
 12.236 
 
 12.570 
 
 15.556 
 
 17.222 
 
 17.431 
 
 21.445 
 
 21.445 
 
 22.556 
 
 21.806 
 
 2.222 
 
 7.084 
 
 5.347 
 
 7.139 
 
 12.889 
 
 14.028 
 
 16.945 
 
 17.417 
 
 17.528 
 
 22.681 
 
 22.556) 
 
 22.681) 
 
 21.945 
 
 Table 3.1. Definition of Sample Search 
 
63 
 
 Time (ms. 
 
 past reference) Event 
 
 0.000 Start of search 
 
 0.500 Read LI 
 
 5.014 Skip L2 (File S) 
 
 5.167 Read L3 and start merge (LI and L3) 
 
 6.875 Read L4 and hold in core 
 
 7.084 End merge: result = Tl (2155 postings) 
 
 7.139 End read L4 and start merge (Tl and L4) 
 
 9.292 End merge: result = T2 (2417 postings) 
 
 12.236 Read L5 and start merge (T2 and L5) 
 
 12.570 Skip L6 (Read channel busy) 
 
 15.014 End merge: result = T3 (3092 postings) 
 
 15.556 Read L7 and start merge (T3 and L7) 
 
 17.222 Read L8 and hold in core 
 
 17.431 Read L9 and hold in core 
 
 19.653 End merge: result = T4 (4532 postings) 
 
 19.653 Start merge (L8 and L9) 
 
 19.959 End merge: result = T5 (310 postings) 
 
 19.959 Start merge (T4 and T5) 
 
 21.445 Split L10. Read L10A and hold in core 
 
 21.806 Skip Lll (Read channel busy) 
 
 22.556 End Read L10A 
 
 22.556 Skip L10B (Memory full) 
 
 Table 3.2. Progress of Sample Search 
 
64 
 
 Time (ms. 
 
 past reference) Event 
 
 24.195 End merge: result = T6 (4811 postings) 
 
 24.195 Start merge (T6 and L10A) 
 
 25.000 End of Rotation 1 
 
 29.500 End merge: result T7 (5963 postings) 
 
 30.014 Read L2 and start merge (T7 and L2) 
 
 37.278 End merge: result = *R1 (on disk) (8108 postings) 
 
 37.292 End write *R1 
 
 37.570 Read L6 
 
 46.806 Read Lll and start merge (L6 and Lll) 
 
 47.556 Read L10B 
 
 48.417 End merge: result = T8 (1820 postings) 
 
 48.417 Start merge (T8 and L10B) 
 
 50.000 End of Rotation 2 
 
 50.139 End merge: result T9 (1940 postings) 
 
 55.052 Read *R1 and start merge (T9 and *R1) 
 
 63.792 End merge: result = *R2 (on disk) (9854 postings) 
 
 63.806 End write *R2. End of search. 
 
 Table 3.2 (continued). Progress of Sample Search 
 
 
a. 
 
 65 
 
 4A 
 
 —r~ 
 10 
 
 % 
 
 © 
 
 iZLfl 
 
 — r - 
 15 
 
 11 
 
 10A1 
 
 — r - 
 20 
 
 @}|V§) 
 
 TIME (ms) 
 DISK LAYOUT 
 
 — r- 
 
 10 
 
 i 
 
 20 
 
 25 
 
 —I 
 25 
 
 r - 
 25 
 
 15 
 
 <D 
 
 TIME (ms) 
 MERGE ACTIVITY 
 
 (a) First Rotation 
 
 © 
 
 35 
 
 JZZL 
 
 - 1 — 
 40 
 
 — r~ 
 45 
 
 (ii;(iob 
 
 JUL 
 
 — i 
 50 
 
 30 
 
 TIME (ms) 
 DISK LAYOUT 
 
 
 
 
 
 
 
 1 £ 
 
 s 
 
 
 
 1 
 
 
 1 ' 
 25 
 
 3 
 
 
 
 i 
 35 
 
 l 
 40 
 
 i 
 45 
 
 i 
 50 
 
 TIME (ms) 
 MERGE ACTIVITY 
 
 (b) Second Rotation 
 
 
 1 
 
 s 
 
 1 
 
 
 
 1 
 50 
 
 *1 
 
 r 
 
 55 
 
 
 1 1 
 60 65 
 
 TIME (ms) 
 DISK LAYOUT 
 
 i 
 
 70 
 
 I 
 75 
 
 
 
 1 
 
 
 1 
 50 
 
 r 
 
 55 
 
 
 i i 
 
 60 65 
 TIME (ms) 
 MERGE ACTIVITY 
 
 r 
 
 70 
 
 1 
 75 
 
 (c) Third Rotation 
 Figure 3.2. Processing Example: "OR" Eleven Terms 
 
66 
 
 4. PERFORMANCE 
 
 4.1 Preliminaries 
 
 The goal of the performance studies reported in this chapter is 
 to assess the capabilities of the new system operating in a realistic re- 
 trieval environment. To this end, sample searches have been selected from 
 the full MEDLARS Master MESH as of November, 1972, [15] with the aid of [3] 
 and other data obtained partly from the National Library of Medicine de- 
 scribing the MEDLARS and MEDLINE retrieval systems. Any errors in inter- 
 preting this information, of course, lie entirely with the author of this 
 report. 
 
 While characteristics of the MEDLARS data base have been used to 
 add realism to these tests, the simulated system differs in certain regards 
 from both MEDLARS and MEDLINE and is not intended to be a direct representa- 
 tion of either one. 
 
 The Master MESH is a listing of the complete Medical Subject Head- 
 ings (MESH) Index together with a tally of the documents indexed under each 
 term. The MESH index language is a carefully controlled, hierarchically 
 structured vocabulary of over 8500 terms employed by professional indexers 
 to classify technical articles from 2200 journals. The data base used in 
 this study contains over 1,000,000 citations dating from January 1964 to 
 November 1972. Individual terms reference from one to nearly 500,000 
 documents. It is interesting to note that the data base for MEDLINE, the 
 on-line version of this service, currently contains about 450,000 citations 
 and is limited in coverage to approximately the most recent three years. 
 This restriction is necessary in part because of the prohibitively long 
 search times required to process the larger data base in an on-line, real 
 
67 
 
 time environment and still provide adequately fast response for a large and 
 growing number of users. 
 
 The search language employed in the MEDLARS system permits the 
 retrieval of all documents indexed under one or more terms joined by the 
 logical connectives "AND", "OR", and "AND NOT" in a standard way. Several 
 techniques exist for modifying the basic search pattern, and one--the 
 explosion--is of special interest here. The request EXPLODE (TERM) is a 
 shortcut for searching simultaneously a general term and all of its sub- 
 ordinates in the hierarchy. It accomplishes the same objective as ORing 
 all terms with the same classification number. More than one EXPLODE can 
 be included in a search statement, and such requests can result in searches 
 involving a very large number of terms and a large amount of processing time. 
 
 The primary example used in these studies is the moderately long 
 search, 
 
 EXPLODE (CENTRAL NERVOUS SYSTEM), 
 
 which involves the coordination of 70 terms having a combined total of 
 67,527 document postings (see Table 4.1). A shorter search, 
 
 PARALYSIS OR PARAPLEGIA OR QUADRIPLEGIA, 
 
 (Table 4.2) is mentioned occasionally for comparison. Currently, short 
 searches occur more frequently than longer ones although both are common. 
 However, as data bases expand and user communities grow, demands on a re- 
 trieval system increase. This analysis is oriented toward the longer search 
 because it provides a better description of the system's performance under 
 heavy load. 
 
68 
 
 Term 
 
 Documents 
 Referenced 
 
 CENTRAL NERVOUS SYSTEM 
 
 2976 
 
 BRAIN 
 
 16277 
 
 BRAIN STEM 
 
 1680 
 
 MEDULLA OBLONGATA 
 
 1125 
 
 OLIVARY NUCLEUS 
 
 199 
 
 PONS 
 
 750 . 
 
 CEREBELLOPONTILE ANGLE 
 
 156 
 
 VESTIBULAR NUCLEI 
 
 292 
 
 RETICULAR FORMATION 
 
 1254 
 
 CEREBELLUM 
 
 2000 
 
 DIENCEPHALON 
 
 812 
 
 HYPOTHALAMUS 
 
 4931 
 
 HYPOTHALAMO-HYPOPHYSEAL SYSTEM 
 
 1440 
 
 MAMMILLARY BODIES 
 
 24 
 
 THALAMUS 
 
 1737 
 
 GENICULATE BODIES 
 
 845 
 
 THALAMIC NUCLEI 
 
 395 
 
 MESENCEPHALON 
 
 1471 
 
 CORPORA QUADRIGEMINA 
 
 403 
 
 INFERIOR COLLICULUS 
 
 93 
 
 OPTIC LOBE 
 
 226 
 
 SUPERIOR COLLICULUS 
 
 171 
 
 RED NUCLEUS 
 
 186 
 
 SUBSTANTIA NIGRA 
 
 294 
 
 TELENCEPHALON 
 
 434 
 
 CEREBRAL CORTEX 
 
 6877 
 
 CORPUS CALLOSUM 
 
 472 
 
 FRONTAL LOBE 
 
 818 
 
 GYRUS CINGULI 
 
 106 
 
 MOTOR CORTEX 
 
 313 
 
 OCCIPITAL LOBE 
 
 553 
 
 VISUAL CORTEX 
 
 1448 
 
 PARIETAL LOBE 
 
 398 
 
 Table 4.1. Data for Long Search [15] 
 
69 
 Term 
 
 Documents 
 Referenced 
 
 SOMATOSENSORY CORTEX 
 
 281 
 
 TEMPORAL LOBE 
 
 619 
 
 AUDITORY CORTEX 
 
 502 
 
 HIPPOCAMPUS 
 
 1515 
 
 AMYGDALOID BODY 
 
 526 
 
 LIMBIC SYSTEM 
 
 1077 
 
 CEREBRAL VENTRICLES 
 
 1655 
 
 CEREBRAL AQUEDUCT 
 
 35 
 
 CHOROID PLEXUS 
 
 363 
 
 CISTERNA MAGNA 
 
 172 
 
 EPENDYMA 
 
 338 
 
 MENINGES 
 
 . 344 
 
 ARACHNOID 
 
 243 
 
 SUBARACHNOID SPACE 
 
 305 
 
 DURA MATER 
 
 513 
 
 PIA MATER 
 
 145 
 
 NEURAL ANALYZERS 
 
 257 
 
 SPINAL CORD 
 
 3773 
 
 CAUDA EQUINA 
 
 195 
 
 EXTRAPYRAMIDAL TRACT 
 
 294 
 
 PYRAMIDAL TRACTS 
 
 376 
 
 SPINOTHALAMIC TRACTS 
 
 28 
 
 ANTERIOR HORN CELLS 
 
 170 
 
 AUDITORY PATHWAYS 
 
 102 
 
 CRANIAL FOSSA, POSTERIOR 
 
 234 
 
 TEGMENTUM MESENCEPHALI 
 
 48 
 
 VISUAL PATHWAYS 
 
 257 
 
 LISSAUER'S TRACT 
 
 4 
 
 NEURAL INTERCONNECTIONS 
 
 900 
 
 POSTERIOR COLUMNS 
 
 4 
 
 RESPIRATORY CENTER 
 
 211 
 
 CEREBELLAR CORTEX 
 
 588 
 
 CEREBELLAR NUCLEI 
 
 126 
 
 CORPUS STRIATUM 
 
 72 
 
 Table 4.1 (continued). Data for Long Search [15] 
 
70 
 
 Documents 
 
 Term Referenced 
 
 SEPTAL NUCLEI 9 
 
 OLFACTORY BULB 75 
 
 OLFACTORY PATHWAYS 15_ 
 
 Total 67527 
 
 Average 965 
 
 Table 4.1 (continued). Data for Long Search [15] 
 
 PARALYSIS 2026 
 
 PARAPLEGIA 1033 
 
 QUADRIPLEGIA 374 
 
 Total 3433 
 
 Average 1144 
 
 Table 4.2. Data for Short Search [15] 
 
71 
 The experimental procedure has been to generate a series of 
 
 performance curves showing the average time required to perform a search 
 under various conditions using various system configurations. Every point 
 plotted explicitly represents the average of results from 30 trials where 
 each trial involves a complete simulation of the search in question, be- 
 ginning with the random assignment of a disk address to each data file and 
 the random choice of an initial rotational position. Coordination proceeds 
 according to the algorithms in the previous chapter under the control of a 
 supervisory program which allocates system resources, performs the initial 
 address assignments and collects performance data. Whenever multiple, 
 independent searches are conducted simultaneously, each term in each search 
 receives a separate disk address; and the coordination algorithms are 
 applied to each search independently. Thus, if two of the long searches 
 defined in Table 4.1 are processed in parallel, it is as if two users had 
 requested searches having identical parameters but entirely different index 
 terms. Independent searches compete for limited system resources such as 
 memory space, the merge network and the I/O facilities. 
 
 The monitor program controls the progress of a simulation by main- 
 taining a time-ordered queue listing all events of interest to the system. 
 These include the access times for all files to be processed and the sched- 
 uled completion times for all I/O and merge procedures in progress. This 
 routine can refuse any request for service if the required facilities are in 
 use or if other conflicts arise. In this event, the postings file in ques- 
 tion is considered again on the next rotation. 
 
 System configurations examined in these tests include 1, 16, 32, 
 64, 128, 256 and 512 parallel data paths and standard memory sizes of 4K, 
 8K, 16K, 32K, 40K, 50K, and 64K words (K=1024). Other memory sizes have 
 
72 
 
 been used under special circumstances as noted in the text. Three con- 
 figurations receive particular attention: in the remainder of this report 
 the terms "large", "small" and "conventional" are used to describe systems 
 having 256, 16 and 1 data path, respectively. As mentioned previously, a 
 system with 256 parallel paths is the largest considered technologically 
 feasible at the present time. A sixteen-path system performs well and 
 should be relatively cheap and easy to build and maintain. The use of a 
 one-path system to represent a conventional machine is somewhat arbitrary, 
 but it is believed to be a conservative choice for the following reasons. 
 Consider a conventional movable head disk with a 25ms rotation 
 time, a 60ms average track access time and a track capacity of 1800 words. 
 Thirty-seven and one-half rotations, or approximately 0.94 seconds, are 
 needed to transfer the data required for this problem. If the seventy 
 files are located randomly on the disk, then, on the average, additional 
 penalties of 0.88 seconds for latency and 4.2 seconds for head motion are 
 required to access the data. Finally, on the basis of published execution 
 times for a large general purpose computer and a short segment of code 
 written to perform the term coordination function on this machine, it is 
 estimated that approximately six microseconds of processing time per data 
 element are required to perform this task. At that rate, 2.3 seconds of 
 CPU time are required to merge 64 lists of 1000 items each using an 
 optimal 2-way merge. Adding these times yields a rough estimate of 8.32 
 seconds for a conventional machine to perform this search. No allowance 
 is made for interruptions other than disk 1/0, and a memory adequate to 
 
 IBM 360/75 with four-way interleaved memory and IBM 2314 A-series disks 
 
73 
 
 hold all data is assumed. The corresponding figure from simulation of the 
 one-path hardware merge system is 4.23 seconds. With a memory restricted 
 to 4K words, the time for the one-path system is 9.47 seconds. Thus it is 
 felt that the one-path simulation produces a time estimate which is com- 
 parable with but generally shorter than the processing time that would be 
 required by a conventional computer with the same size memory. 
 
 4.2 Monoprogrammed Results 
 
 4.2.1 Basic Tests 
 
 Figure 4.1 contains the primary experimental result of this paper: 
 a description of the time required to process a long search using hardware 
 systems and data memories of various sizes under the standard conditions 
 defined in Chapter 3. Presentation of the actual merge times associated 
 with these trials is deferred until section 4.5 since those curves require 
 some special interpretation. Since the longest file is never retained in 
 core, a 64K word memory is sufficient to contain all the data which must be 
 processed internally for the long sample search: no further increase in 
 memory capacity can affect the results. 
 
 Figure 4.1 includes all the configurations discussed in the 
 previous section except the conventional system, whose performance curve 
 lies beyond the range of the graph. Data for that system appears under the 
 headings "Conventional System" and "Mean" in Table 4.3, which also contains 
 average values for the small and large parallel systems. The remaining 
 columns in the table show the standard deviations for the various samples 
 and the corresponding 95% confidence intervals, assuming that use of the 
 Central Limit Theorem is justified. With a probability of 95%, every point 
 
74 
 
 «/> 
 
 1000 
 
 900 
 
 800 
 
 700 
 
 UJ 
 
 
 2 
 
 600 
 
 H- 
 
 
 Q 
 
 500 
 
 UJ 
 
 
 </> 
 
 
 0. 
 
 
 < 
 
 400 
 
 UJ 
 
 300 
 
 200 
 
 100 
 
 SEARCH: ONE 70- TERM OR 
 CONDITIONS: STANDARD 
 
 -L 
 
 10K 20K 30K 40K 50 K 
 
 MEMORY SIZE 
 (WORDS; K = 1024) 
 
 16 PATHS 
 
 32 PATHS 
 
 64 PATHS 
 128 PATHS 
 256 PATHS 
 512 PATHS 
 
 1 
 
 60K 70K 
 
 Figure 4.1. Basic Performance Analysis 
 
75 
 
 1 
 
 
 a> 
 
 
 00 
 
 
 
 
 
 
 
 
 
 o 
 
 
 E 
 
 
 
 
 
 
 
 
 
 C i— c 
 
 
 <y\ 
 
 co 
 
 
 
 CO 
 
 r— 
 
 CM 
 
 CM 
 
 1 
 
 
 <u ro <o 
 
 
 LO 
 
 p-^ 
 
 Pv. 
 
 CO 
 
 LO 
 
 «* 
 
 P>» 
 
 E 
 
 
 -a > <u 
 
 
 
 
 
 
 
 
 
 OJ 
 
 
 •r- s- s: 
 
 
 O 
 
 00 
 
 p«* 
 
 LO 
 
 LO 
 
 «d- 
 
 CO 
 
 4-> 
 
 
 4- co 
 
 
 ^— 
 
 
 
 
 
 
 
 CO 
 
 ^^s 
 
 &S C 4-> i- 
 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 >> 
 
 CO 
 
 LO O C O 
 
 
 
 
 
 
 
 
 
 CO 
 
 -c 
 
 CTiOmH- 
 
 
 
 
 
 
 
 
 
 , 
 
 4-> 
 ITS 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 O) 
 
 Q. 
 
 • 
 
 
 
 
 
 
 
 
 
 r— 
 
 
 > 
 
 
 00 
 
 
 
 
 
 
 
 r— 
 
 lO 
 
 a> 
 
 
 E 
 
 
 
 
 
 
 
 rO 
 
 4-> 
 
 Q 
 
 
 O 
 
 00 
 
 *d- 
 
 00 
 
 p*. 
 
 CO 
 
 CO 
 
 S- 
 
 A3 
 
 
 
 «3- 
 
 «* 
 
 CO 
 
 
 
 P-» 
 
 00 
 
 CTi 
 
 ro 
 
 -a 
 
 
 
 
 
 
 
 
 
 
 Q_ 
 
 
 -a 
 
 
 00 
 
 CO 
 
 
 
 LO 
 
 ^f- 
 
 r— 
 
 CTt 
 
 
 CO 
 
 -4-> 
 
 
 CM 
 
 CM 
 
 CM 
 
 ^-» 
 
 r- 
 
 r— 
 
 
 (L) 
 
 LO 
 
 CO 
 
 
 
 
 
 
 
 
 
 CD 
 
 C\J 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 rO 
 
 
 
 
 00 
 
 
 
 
 
 
 
 _1 
 
 
 
 
 E 
 
 
 
 
 
 
 
 
 
 c 
 
 
 CO 
 
 cr> 
 
 r^ 
 
 r— 
 
 
 
 CM 
 
 O 
 
 
 
 
 rO 
 
 
 cr> 
 
 LO 
 
 co 
 
 CM 
 
 
 
 CTv 
 
 r— 
 
 
 
 
 OJ 
 
 
 
 
 
 
 
 
 
 
 
 
 2: 
 
 
 r-» 
 
 LO 
 
 LO 
 
 «* 
 
 LO 
 
 <Ti 
 
 00 
 
 
 
 
 
 
 LO 
 
 r— 
 
 CO 
 
 CXt 
 
 CT> 
 
 CO 
 
 CO 
 
 1 
 
 
 
 
 
 CO 
 
 CM 
 
 1 
 
 
 
 
 
 1 
 
 
 0) 
 
 
 
 
 
 
 
 
 
 T 
 
 
 o 
 
 
 00 
 
 
 
 
 
 
 
 
 
 C i— c 
 
 
 E 
 
 
 
 
 
 
 
 i 
 
 
 (U ro ra 
 
 
 C\J 
 
 ^— 
 
 CTt 
 
 LO 
 
 O 
 
 P^ 
 
 r— 
 
 E 
 
 
 T3 > CD 
 
 
 r>. 
 
 r— 
 
 p-. 
 
 CM 
 
 ^— 
 
 CM 
 
 ^J" 
 
 CD 
 
 
 •r- i. 21 
 
 
 
 
 
 
 
 
 
 +■> 
 
 
 4- O) 
 
 
 CO 
 
 *d" 
 
 CO 
 
 «* 
 
 LO 
 
 ^d" 
 
 «* 
 
 CO 
 
 .*— ■% 
 
 &« C 4-> i- 
 
 
 r— 
 
 r— 
 
 r— 
 
 1— 
 
 ^— 
 
 
 
 >> 
 
 oo 
 
 LO O C O 
 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 CO 
 
 -C 
 
 tr> o >-i 4- 
 
 
 
 
 
 
 
 
 
 , 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 CD 
 
 ti- 
 
 • 
 
 
 
 
 
 
 
 
 
 r— 
 
 
 > 
 
 
 00 
 
 
 
 
 
 
 
 i — 
 
 ro 
 
 0) 
 
 
 E 
 
 
 
 
 
 
 
 rO 
 
 •<-> 
 
 Q 
 
 
 r^. 
 
 CO 
 
 r-^ 
 
 O 
 
 00 
 
 LO 
 
 CO 
 
 S- 
 
 (O 
 
 
 
 p-». 
 
 00 
 
 CT> 
 
 CM 
 
 *d- 
 
 *fr 
 
 00 
 
 rC 
 
 T3 
 
 
 
 
 
 
 
 
 
 
 Q_ 
 
 
 ■a 
 
 
 CO 
 
 p^ 
 
 CO 
 
 00 
 
 
 
 n— 
 
 n— 
 
 
 CO 
 
 4-> 
 
 
 CO 
 
 CO 
 
 CO 
 
 CO 
 
 <3- 
 
 r— 
 
 ^~ 
 
 1 — 
 
 i— 
 
 CO 
 
 
 
 
 
 
 
 
 
 ro 
 
 ' 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 E 
 
 
 
 
 00 
 
 
 
 
 
 
 
 CO 
 
 
 
 
 E 
 
 
 
 
 
 
 
 
 
 c 
 
 
 CT> 
 
 CM 
 
 ^~ 
 
 LO 
 
 co 
 
 CO 
 
 00 
 
 
 
 
 ro 
 
 
 O 
 
 CM 
 
 CT> 
 
 LO 
 
 LO 
 
 CTi 
 
 CO 
 
 
 
 
 ai 
 
 
 
 
 
 
 
 
 
 
 
 
 2: 
 
 
 CO 
 
 CTi 
 
 CO 
 
 O 
 
 CM 
 
 CM 
 
 
 
 
 
 
 
 
 CO 
 
 LO 
 
 CM 
 
 r— 
 
 "3- 
 
 O 
 
 
 
 1 
 
 1 
 
 
 
 
 00 
 
 LO 
 
 «3- 
 
 *3" 
 
 *fr 
 
 CO 
 
 CO 
 
 
 . 
 
 
 OJ 
 
 
 00 
 
 
 
 
 
 
 
 
 
 o 
 
 
 E 
 
 
 
 
 
 
 
 
 
 
 C r— C 
 
 
 pv. 
 
 «* 
 
 CT> 
 
 «3" 
 
 LO 
 
 «3- 
 
 CTi 
 
 
 
 
 CD fO ro 
 
 
 
 
 1— 
 
 CO 
 
 p-» 
 
 LO 
 
 p^. 
 
 CM 
 
 
 
 -o > o 
 
 
 
 
 
 
 
 
 
 E 
 
 
 •r- s- s: 
 
 
 00 
 
 CM 
 
 LO 
 
 CO 
 
 n~ 
 
 «* 
 
 LO 
 
 CD 
 
 
 4- CD 
 
 
 CO 
 
 ^- 
 
 O 
 
 
 
 CT> 
 
 •* 
 
 ^~ 
 
 •4-> 
 
 
 ci-S G 4-> S- 
 
 
 
 t— 
 
 ^™ 
 
 t— 
 
 t—~ 
 
 
 
 oo 
 
 .- s 
 
 LO O C O 
 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 +1 
 
 >> 
 
 sz 
 
 OlUMlf. 
 
 
 
 
 
 
 
 
 
 CO 
 
 -M 
 
 
 
 
 
 
 
 
 
 
 
 ro 
 
 
 
 
 
 
 
 
 
 
 t— 
 
 Q- 
 
 • 
 
 
 
 
 
 
 
 
 
 ro 
 
 
 > 
 
 
 00 
 
 
 
 
 
 
 
 c 
 
 ro 
 
 CD 
 
 
 E 
 
 
 
 
 
 
 
 o 
 
 ■*-> 
 
 Q 
 
 
 CD 
 
 P^. 
 
 LO 
 
 CM 
 
 CO 
 
 CO 
 
 <Ti 
 
 •r— 
 
 rO 
 
 
 
 O 
 
 O 
 
 CO 
 
 r— 
 
 LO 
 
 <Ti 
 
 CTi 
 
 +J 
 
 -a 
 
 
 
 
 
 
 
 
 
 
 C 
 
 
 -a 
 
 
 CM 
 
 t— 
 
 CO 
 
 CO 
 
 CO 
 
 CD 
 
 O 
 
 ID 
 
 CD 
 
 -M 
 
 
 O 
 
 00 
 
 00 
 
 r^ 
 
 r— 
 
 r— 
 
 <^- 
 
 > 
 
 c 
 
 CO 
 
 
 !—• 
 
 CO 
 
 CM 
 
 CM 
 
 LO 
 
 r— 
 
 
 c 
 O 
 
 o 
 
 o 
 
 
 
 
 
 
 
 
 
 
 
 
 
 00 
 
 
 
 
 
 
 
 
 
 
 
 E 
 
 
 
 
 
 
 
 
 
 
 
 
 CM 
 
 CO 
 
 CO 
 
 CM 
 
 r— 
 
 CO 
 
 CTi 
 
 
 
 
 c 
 
 
 r*» 
 
 I— 
 
 CO 
 
 p>» 
 
 P^ 
 
 CO 
 
 O 
 
 
 
 
 ro 
 
 
 
 
 
 
 
 
 
 
 
 
 a> 
 
 
 CM 
 
 CO 
 
 LO 
 
 CO 
 
 r~. 
 
 LO 
 
 ^^ 
 
 
 
 
 s: 
 
 
 p>. 
 
 
 
 n- 
 
 CO 
 
 00 
 
 CO 
 
 CO 
 
 
 
 
 
 
 «3- 
 
 LO 
 
 r^ 
 
 
 
 CO 
 
 CO 
 
 CM 
 
 1 
 
 1 
 
 
 
 
 CT> 
 
 CO 
 
 LO 
 
 LO 
 
 LO 
 
 «3- 
 
 <3" 
 
 
 
 >> ^> 
 
 
 
 
 
 
 
 
 
 
 
 S- "O 
 
 
 
 
 
 
 
 
 
 
 
 O CD S- 
 
 
 ^ 
 
 ^ 
 
 ±Z 
 
 ^. 
 
 :*: 
 
 M 
 
 ^ 
 
 
 
 E N O 
 
 
 <d- 
 
 00 
 
 CO 
 
 CM 
 
 
 
 O 
 
 «5d" 
 
 
 
 O) -r- s 
 
 
 
 
 r— 
 
 CO 
 
 ^f 
 
 LO 
 
 co 
 
 
 
 2T CO^— 
 
 
 
 
 
 
 
 
 
 00 
 
 E 
 OJ 
 
 +J 
 
 00 
 
 >5 
 
 CO 
 
 -o 
 
 s- 
 
 ro 
 "O 
 
 ro 
 +J 
 CO 
 
 <D 
 CD 
 
 i- 
 
 o 
 
 CD 
 U 
 
 c 
 ro 
 E 
 S- 
 O 
 4- 
 S- 
 0) 
 Q. 
 
 O 
 
 S- 
 rO 
 <D 
 CO 
 
 CD 
 
 c 
 o 
 
 CO 
 
 cu 
 !n 
 
 ro 
 
76 
 
 plotted for the large and small parallel systems lies within 15.1ms of the 
 true average value. In most cases the interval is actually much smaller. 
 For the conventional system, the confidence interval extends as far as 
 ±192ms from the calculated average value, a deviation of less than 3.5% 
 from the mean. 
 
 Ignore, for now, the local maximum which the performance curves 
 exhibit around 40K words and consider the performance potential these results 
 represent. Table 4.4 shows the speed-up which can be achieved by the 
 various systems relative to the assumed conventional machine. For a small 
 system, the speed-up is roughly a factor of 12 at all memory sizes; for a 
 large system, the factor varies from 26.46 with a 4K memory to approximately 
 62 with a memory of 50K words or more. In absolute terms, the large system 
 can coordinate 70 files containing a total of over 67,000 postings in an 
 average time equivalent to about 2-1/2 disk rotations. 
 
 In this test, the large system outperforms the small one by a 
 factor which ranges approximately from 2.5 to 4.5--a small improvement con- 
 sidering the additional cost, complexity and bandwidth involved. The 
 reason is that this search does not represent a very heavy load for these 
 machines, especially the larger one. A greater separation can be seen in 
 some of the data base size experiments to be presented in the next section. 
 
 For comparison, Table 4.5 presents results achieved in processing 
 the short, three-term, sample search. All figures in the table represent 
 system configurations having sufficient memory to hold all data files. In 
 processing a single search of this magnitude, the parallel systems out- 
 perform the conventional system by a factor of only about 3, and there is 
 yery little difference between the two parallel systems. When 10 independent 
 
77 
 
 +-> 
 
 a. 
 i 
 
 CM 
 
 in 
 
 
 1X5 
 
 cr> 
 
 CM 
 
 CT< 
 
 tr> 
 
 oo 
 
 <X5 
 O 
 
 00 
 
 oo 
 
 00 
 
 oo 
 
 1X5 
 
 CO 
 1X5 
 
 1X5 
 
 <X5 
 <X5 
 
 O 
 00 
 
 O 
 1^ 
 
 256-Path 
 
 ("Large 
 
 System") 
 
 
 to 
 
 st 
 
 <X5 
 
 CM 
 
 oo 
 o 
 
 CO 
 
 CM 
 
 oo 
 
 CT> 
 
 OO 
 
 LO 
 
 St 
 OO 
 
 cr> 
 
 LO 
 
 oo 
 
 si- 
 
 CM 
 <X5 
 
 OO 
 
 CM 
 1X5 
 
 +J 
 to 
 
 Q_ 
 
 CO 
 C\J 
 
 
 in 
 
 St 
 CM 
 
 sl- 
 o 
 
 CM 
 
 LO 
 
 oo 
 
 CTi 
 
 CM 
 st 
 
 00 
 
 00* 
 
 st 
 
 1X5 
 
 oo 
 
 1X5 
 
 OO 
 OO 
 
 CTl 
 
 St 
 
 +J 
 a. 
 st 
 
 <X5 
 
 
 sj- 
 CO 
 
 CM 
 
 o 
 
 <X5 
 
 CM 
 CM 
 
 sl- 
 
 CM 
 
 05 
 IT) 
 
 O 
 OO 
 
 O 
 
 oo 
 
 1X5 
 OO 
 
 «X5 
 
 oo 
 
 oo 
 oo 
 
 x: 
 
 to 
 
 a. 
 
 i 
 
 CM 
 
 oo 
 
 
 CJ5 
 
 sj- 
 
 1X5 
 
 1X5 
 
 sf 
 
 O 
 
 LO 
 
 cr» 
 
 r— 
 
 st 
 CM 
 
 St" 
 1X5 
 
 CM 
 
 CM 
 CT5 
 
 St 
 
 CM 
 
 00 
 st 
 
 CM 
 
 16-Path 
 ("Small 
 System") 
 
 
 CO 
 
 1X5 
 
 00 
 CTi 
 
 05 
 OO 
 
 CM 
 
 st 
 CM 
 
 St 
 st 
 
 <X5 
 
 o 
 
 st 
 
 Memory 
 
 Size 
 
 (words) 
 
 
 st 
 
 00 
 
 <X5 
 
 CM 
 OO 
 
 o 
 st 
 
 o 
 
 1X5 
 
 st 
 
 VX5 
 
 i. 
 o 
 to 
 to 
 CO 
 
 o 
 o 
 
 $- 
 
 Q- 
 
 (O 
 
 c 
 o 
 
 CO 
 
 > 
 o 
 
 C_5 
 
 o 
 
 CO 
 
 > 
 
 ns- 
 co 
 
 or 
 
 to 
 
 s- 
 o 
 
 4-> 
 O 
 
 CO 
 
 E 
 CO 
 
 > 
 o 
 
 s- 
 
 Q. 
 
 E 
 
 T5 
 <U 
 CO 
 
 Q- 
 
 oo 
 
 St 
 St 
 
 to 
 
78 
 
 Search 
 
 Conventional 
 System 
 
 Small Parallel 
 System 
 
 Large Parallel 
 System 
 
 
 
 
 
 One Short Sample 
 
 102.92ms 
 
 34.19ms 
 
 30.42ms 
 
 Ten Simultaneous 
 Short Samples 
 
 807.38 
 
 125.44 
 
 55.49 
 
 Table 4.5. Elapsed Time for Short Sample Search 
 
79 
 
 short searches are processed simultaneously, performance differentials 
 became more apparent; and improvement factors over the conventional machine 
 range from 6.4 for the small system to 14.6 for the large one. 
 
 4.2.2 Data Base Expansion 
 
 One major concern in the design of any information retrieval 
 system is its potential for growth: how rapidly will performance degrade 
 as the data base expands? To answer this question, the same 70 term sample 
 search was used, but the lengths of all the postings files were multiplied 
 by factors of 1/4, 2 and 4; and new curves were generated for these modi- 
 fied data bases. Results are shown in Figures 4.2(a) and 4.2(b) for the 
 small and large systems, respectively. Only one point (30 trials), which 
 does not appear in the figures, was obtained for the conventional system 
 because of the prohibitively high cost of simulating this configuration. 
 Its average processing time is 16.81 sec. for the 4X expansion with a 220K 
 word memory. This is slower than the corresponding small parallel system 
 by a factor of 15.4 and slower than the large system by a factor of 160.7. 
 
 For reasons to be discussed in section 4.2.3, it is considered 
 valid to compare these systems at the points where minima occur in the 
 performance curves: 50K, 100K and >200K memory sizes. Figure 4.3 shows 
 the average elapsed time for the search as a function of data base size 
 and memory size for two parallel configurations. In both cases, the large- 
 memory curves are yery nearly linear while the 50K curves show an increasing 
 slope with increasing data base size. This effect, which reflects degraded 
 performance under heavy load, is much more pronounced for the small system 
 than the large one. Considering the best performance available from both 
 systems, a four-fold increase in the data base size (from X to 4X) increases 
 
80 
 
 S800 
 
 3000 
 
 SEARCH: ONE 70- TERM "OR' 
 
 PARAMETER ON CURVE: 
 
 DATA BASE EXPANSION FACTOR 
 
 ( X« STANDARD) 
 
 2800 
 
 6 
 
 2000 
 
 Ul 
 
 a 
 
 LU 
 
 GO 
 
 a. 
 
 UJ 
 
 1500 
 
 1000 
 
 500 
 
 • ° 4X 
 
 2X 
 
 \- 
 
 % 
 
 _i_ 
 
 _L 
 
 J_ 
 
 20K 40K 60K 80K 100K 120K 140K 160K 180K 200K 220K 240K 
 
 MEMORY SIZE 
 (WORDS; K=1024) 
 
 Figure 4.2. Effects of Data Base Expansion 
 
81 
 
 700 
 
 600 
 
 Z, 500 
 
 LJ 
 
 UJ 
 
 400 - 
 
 300 
 
 Q. 
 
 < 
 
 UJ 200 
 
 100 - 
 
 SEARCH: ONE 70-TERM OR 
 
 PARAMETER ON CURVE: 
 
 DATA BASE EXPANSION FACTOR 
 
 STANOARO) 
 
 '■■■'■ ■ ■ i i ■ 
 
 20K 40K 60K 80K 100K 120K 140K 160K 180K 200K 220K 
 
 MEMORY SIZE 
 
 (WORDS; K = 1024) 
 
 (b) For Large System 
 
 Figure 4.2 (continued). Effects of Data Base Expansion 
 
82 
 
 </) 
 
 Ld 
 
 1600 
 
 1400 
 
 1200 
 
 2 
 
 1000 
 
 1- 
 
 
 Q 
 
 
 LJ 
 
 eoo 
 
 (/) 
 
 
 0. 
 
 
 < 
 
 
 _l 
 
 600 
 
 UJ 
 
 
 400 - 
 
 200 - 
 
 MEMORY 
 
 50 K 
 
 100 K " 
 
 >200K 
 
 16 
 
 PATHS 
 
 EMORY 
 
 50 K 
 
 100 K 
 
 >200K 
 
 256 
 
 PATHS 
 
 X 2X 3X 
 
 RELATIVE DATA BASE SIZE 
 
 4X 
 
 Figure 4.3. Comparison of Small and Large Systems 
 with Expanded Data Bases 
 
83 
 
 the response time for the small system by a factor of 3.6 (to about 1.0 
 second) and that of the large system by a factor of only 1.3 (to about 
 0.1 second). Evidently there is room in the large system for considerably 
 more than a four-fold expansion in the assumed data base before response 
 times on the order of a few seconds will be encountered. 
 
 4.2.3 Discussion of Performance Curves 
 
 All the performance curves presented thus far have exhibited 
 certain common characteristics. For small memories, processing time de- 
 creases rapidly with increasing memory size up to a certain point. For 
 large memories--! arge enough to hold all the files to be coordinated except 
 File S--processing time reaches a constant minimum value. Between these two 
 extremes a peculiar but wery consistent system of oscillations occurs. These 
 oscillations are a direct result of the necessity to alternately fill the 
 memory with new data and then combine the resulting intermediate list with 
 the output file on disk. 
 
 Consider again the 4X data base performance curves presented in 
 Figures 4.2(a) and 4.2(b), shown together for convenience in Figure 4.4. 
 The curve for the small system contains sharply defined minima at 50, 70, 
 100 and 210K, with distinct peaks at 60, 90 and 170K words. The curve for 
 the large system contains a series of "plateaus" extending from 40-60K, 
 70-90K, 100-1 70K and upwards from 200K. Here the divisions are not as well 
 defined as on the other curve because the peak-to-peak variation is much 
 smaller. Nevertheless, the curve is clearly divided into several regions, 
 and the region boundaries correspond closely to the minima on the first 
 curve. Each of these regions corresponds to a different number of repeti- 
 tions of the memory filling and clearing cycle. 
 
3500 
 
 3000 
 
 2300 
 
 M 
 
 £ 2000 
 
 Id 
 
 o 
 w 
 
 </> 
 
 0- 
 
 < 
 _i 
 Id 
 
 1500 
 
 1000 
 
 500 
 
 84 
 
 SEARCH: ONE 70- TERM "OR" 
 4 X DATA BASE 
 
 16- PATHS 
 
 -• 256- PATHS 
 
 j_ 
 
 _i_ 
 
 20 K 40K 60 K 80K 100 K 120 K 140 K 
 
 MEMORY SIZE 
 (WORDS; K=1024) 
 
 160K 180K 200K 220K 
 
 Figure 4.4. Comparison of Large and Small System Performance 
 with 4X Data Base 
 
85 
 
 Under the list selection algorithms defined previously, there 
 exists some critical memory size, M, , above which it is always possible 
 
 to perform this search by filling the memory and processing the longest list 
 only once. Similarly, there exist other values NL, M~, ..., above which the 
 
 longest list need be processed no more than twice, three times, etc. These 
 critical values are located approximately at the minima on the performance 
 curve. Similarly, there exists a second set of critical memory sizes 
 N-,, N«» ...» N-, etc., below which it is never possible to complete the 
 
 — L _ J_ 
 
 search by processing the longest list 1, 2, ..., j times. These points 
 correspond to the maxima on the performance curves. 
 
 Processing the longest list is a significant event in this system 
 because it requires an unusually long merge (involving all the data in 
 memory and all the postings on the current longest list), and it also 
 entails a disk latency penalty of one-half rotation on the average. During 
 all this time, no other list collection or processing can proceed. 
 
 From the curves in Figure 4.4, M, for the sample search lies 
 
 around 210K. If so, one would expect to find NL near M,/2 = 105K, NL at 
 
 M,/3 = 70K, M. near M,/4 = 52. 5K, etc. In fact, these are the observed 
 
 locations of other minima on the curves. As the memory decreases further 
 in size, the critical points lie closer and closer together, the peaks tend 
 to become smaller, and the curves rise very steeply. 
 
 2 
 No data exists at 105K; minimum occurs at TOOK, 
 
86 
 
 Other performance curves behave in a similar fashion. Thus, in 
 Figure 4.1, all curves exhibit a local maximum at 40K (except the 64 and 
 128 path systems, which never quite reverse their direction), and all reach 
 their final minimum values at 50K. In Figures 4.2(a) and 4.2(b), only the 
 4X curves have critical points (minima) near 70K; the 2X and 4X curves both 
 have critical points at 100K and 50K; and the X curve has critical points 
 near 50K and 30K. The 2X curve may also contain a minimum between 10K and 
 30K, but the sampling interval is too large to show this clearly. These 
 results are all consistent with the present theory. 
 
 Between points N. and M. there is a transition region in which it 
 
 is sometimes necessary to process the longest list i times and sometimes 
 i+1 . In this area, performance improves rapidly with increasing memory 
 size. Between the points M. and N._ -, there is a larger region where the 
 
 long list cycle is always repeated i times, but where larger memories may 
 be less effective than smaller ones. This results from the interaction of 
 several phenomena, most of which cannot be observed consistently to favor 
 one memory or another. On the average, however, their combined effect is 
 to discriminate against larger memories. 
 
 First, it takes longer to fill a large memory initially than a 
 smaller one. Then, too, it takes longer with a large memory to process 
 the last few sublists in core because these lists tend to be longer than 
 they would be in a smaller memory. If, as the algorithms now stand, in the 
 course of this final processing, the amount of free core rises above a 
 certain level, a few new, relatively short lists may be read. They have 
 to be incorporated with the long lists which are already there, a process 
 which again favors a smaller memory. This threshold crossing can occur 
 
87 
 
 several times, and it yields very little of value in exchange for the 
 
 3 
 processing time it requires. Processing the long list itself often takes 
 
 longer with a large memory than with a smaller one because the processing 
 
 time is proportional to the total number of data items which enter into 
 
 the merge. Finally, during the last cycle of activity, the system with a 
 
 larger memory tends to finish faster than one with a smaller memory because 
 
 less data remains to be processed. This advantage of the large-memory 
 
 configuration, however, is not sufficient to compensate for its several 
 
 disadvantages. 
 
 This analysis of the performance curve may be useful in planning 
 
 memory allocation procedures for processing multiple simultaneous searches. 
 
 The following procedure is proposed without verification. Determine the 
 
 combined total length of all files in the search except the longest one, 
 
 and regard this number as an approximation to M, . Then allocate a region 
 
 size equal to M. = M./i (i=l,2,...), where i is determined by other factors 
 such as the number of searches to be multiplexed, the desired response 
 time, etc. 
 
 4.2.4 Other Parameters 
 
 Three other factors considered in this study which might influence 
 system performance have all been found to be of minor significance. The 
 effects of varying the overlap between postings files and of changing the 
 
 3 
 
 A new algorithm which suppresses this activity has been tested with 
 
 promising results. See section 4.4. 
 
88 
 
 total processing time for the coordination of two files are considered in 
 this section. Changes in the memory threshold are examined under Al- 
 gorithmic Studies in section 4.4. In some cases these discussions are 
 limited to the small parallel system when experimental results have shown 
 that the effect of a given perturbation is more pronounced in the small 
 system than in the large one. 
 
 4.2.4.1 Overlap 
 
 Overlap is a measure of the extent to which different terms 
 index the same documents. In general, increasing the overlap factor de- 
 creases the effective size of the data base. This is clearly shown in 
 Figure 4.5, which presents long-search performance curves for overlap 
 factors of 0, 10% and 20%. For both large and small memories, processing 
 time varies inversely with the overlap factor; between these extremes, 
 the critical points tend to move toward smaller memories as the overlap 
 factor increases. 
 
 4.2.4.2 Buffering Delay 
 
 It was shown in Chapter 2 that certain memory constraints can 
 be relaxed if buffers are installed at the input and output of the special 
 purpose hardware. The proposed change would increase the processing time 
 for two lists from £-, + ju + 1 to £, + £« + 3 cycles, where lists 1 and 2 
 
 contain i-. and £ ? n-word sublists, respectively. Table 4.6 presents per- 
 formance data for both the small and large systems with processing times 
 of £, + i~ + 1 cycles and a conservative ju + £ 2 + 5 cycles. For this 
 
 small variation, neither system is consistently faster, and the average 
 
89 
 
 lOOOr 
 
 900 
 
 eoo 
 
 700 
 
 600 
 
 E 
 
 LJ 
 
 ? 500 
 
 Q 
 UJ 
 (/) 
 
 < 
 _l 
 UJ 
 
 400 
 
 300 
 
 LIST OVERLAP EFFECTS 
 
 SEARCH: ONE 70- TERM "or" 
 CONDITIONS: STANDARD, EXCEPT OVERLAP 
 OVERLAP: 
 
 o b) 10% ' 
 c) 20% 
 
 ,0% 
 
 ■x. 
 
 10% """l 
 
 o>\6 PATHS 
 
 20% 
 
 
 200 - 
 
 100 - 
 
 _L 
 
 _1_ 
 
 10K 
 
 20K 30K 
 
 40 K 
 
 50 K 
 
 60K 70K 
 
 MEMORY SIZE 
 
 (WORDS ; K=1024) 
 
 80 K 
 
 Figure 4.5. Overlap Factor Variations 
 
90 
 
 1 1 
 
 cu 
 
 
 
 
 
 
 
 
 
 
 en 
 
 &« 
 
 
 
 
 
 
 
 
 
 c 
 
 
 
 <T> 
 
 
 
 r— 
 
 
 
 
 «3 
 
 cvj 
 
 co 
 
 CO 
 
 CO 
 
 CT> 
 
 LO 
 
 *fr 
 
 
 
 .c 
 
 
 
 
 
 
 
 
 
 
 o 
 
 1 
 
 1 
 
 O 
 1 
 
 + 
 
 1 
 
 o 
 
 + 
 
 LO 
 
 + 
 
 E 
 
 
 C5-5 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 ( 
 
 J 
 
 to 
 
 
 
 
 
 
 
 
 + 
 
 j 
 
 0) 
 
 
 
 
 
 
 
 
 
 
 i 
 
 ^~- 
 
 
 
 
 
 
 
 
 >, <L> 
 
 o 
 
 CO 
 
 
 
 
 
 
 
 1/ 
 
 } E 
 
 >> 
 
 E 
 
 
 
 
 
 
 
 
 •r~ 
 
 a 
 
 co 
 
 cr> 
 
 00 
 
 CT« 
 
 CNJ 
 
 CO 
 
 LO 
 
 
 i— 
 
 
 r^ 
 
 r». 
 
 o 
 
 00 
 
 CNI 
 
 CNI 
 
 r^ 
 
 1 
 
 j 
 
 ID 
 
 
 
 
 
 
 
 
 
 CD 
 
 + 
 
 CO 
 
 CNJ 
 
 LO 
 
 LO 
 
 CO 
 
 O 
 
 r— 
 
 
 c 
 
 CNJ 
 
 LO 
 
 r— 
 
 CO 
 
 cn 
 
 CT> 
 
 r^. 
 
 r^ 
 
 n 
 
 3 •!- 
 
 <^ 
 
 CO 
 
 CNJ 
 
 r— 
 
 
 
 
 
 1 
 
 CO 
 
 + 
 
 
 
 
 
 
 
 
 n 
 
 3 CO 
 
 ^~ 
 
 
 
 
 
 
 
 
 Q 
 
 <L) 
 
 o* 
 
 
 
 
 
 
 
 
 
 O 
 
 
 
 
 
 
 
 
 
 a 
 
 J s- 
 
 
 
 
 
 
 
 
 
 Cn Q- 
 
 to 
 
 
 
 
 
 
 
 
 s. 
 
 
 CD 
 
 
 
 
 
 
 
 
 n 
 
 3 
 
 r— 
 
 
 
 
 
 
 
 
 _ 
 
 J 
 
 O 
 
 CO 
 
 
 
 
 
 
 
 = 
 
 
 >> 
 
 E 
 
 
 
 
 
 
 
 
 
 o 
 
 CO 
 
 CT> 
 
 t— 
 
 p— 
 
 O 
 
 CNJ 
 
 o 
 
 
 
 
 CT> 
 
 in 
 
 CO 
 
 CNJ 
 
 O 
 
 CT> 
 
 r— 
 
 
 
 r— 
 
 
 
 
 
 
 
 
 
 
 + 
 
 r^ 
 
 LO 
 
 LO 
 
 «3- 
 
 LO 
 
 CTi 
 
 00 
 
 
 
 CM 
 
 LO 
 
 r— 
 
 CO 
 
 CT> 
 
 cn 
 
 CO 
 
 CO 
 
 
 
 o* 
 
 CO 
 
 CNJ 
 
 r— 
 
 
 
 
 
 
 
 + 
 
 
 
 
 
 
 
 
 
 
 n~ 
 
 
 
 
 
 
 
 
 ' 
 
 ' 
 
 o? 
 
 
 
 
 
 
 
 
 1 
 
 1 
 
 cu 
 en 
 
 c 
 
 <?s 
 
 
 
 r->. 
 
 LO 
 
 
 
 
 
 <0 
 
 CO 
 
 O 
 
 r*. 
 
 LO 
 
 *d- 
 
 r^ 
 
 CO 
 
 
 
 -C 
 
 
 
 
 
 
 
 
 
 
 O 
 
 t— 
 
 ^— 
 
 CNJ 
 
 o 
 
 o 
 
 CNJ 
 
 o 
 
 £ 
 
 
 ■&§ 
 
 + 
 
 + 
 
 
 + 
 
 + 
 
 + 
 
 + 
 
 
 
 
 
 
 
 
 
 
 a 
 
 3 
 
 to 
 
 
 
 
 
 
 
 
 + 
 
 j 
 
 cu 
 
 
 
 
 
 
 
 
 u 
 
 7 
 
 a 
 
 CO 
 
 
 
 
 
 
 
 
 
 ) E 
 
 >> 
 
 E 
 
 
 
 
 
 
 
 
 •r— 
 
 o 
 
 00 
 
 <T> 
 
 CT> 
 
 00 
 
 00 
 
 CO 
 
 CNJ 
 
 
 h- 
 
 
 CO 
 
 CT> 
 
 CNJ 
 
 CO 
 
 LO 
 
 CNI 
 
 00 
 
 c 
 
 j 
 
 UO 
 
 
 
 
 
 
 
 
 
 o> 
 
 + 
 
 co 
 
 <>»- 
 
 LO 
 
 CNJ 
 
 -d- 
 
 r— 
 
 r— 
 
 
 c 
 
 CO 
 
 *d- 
 
 CO 
 
 n~ 
 
 r— 
 
 "vl- 
 
 r— 
 
 O 
 
 n 
 
 3 -i- 
 
 o* 
 
 00 
 
 LO 
 
 ^f 
 
 *d- 
 
 «* 
 
 CO 
 
 CO 
 
 s. 
 
 CO 
 
 + 
 
 
 
 
 
 
 
 
 n 
 
 3 CO 
 
 t— 
 
 
 
 
 
 
 
 
 Q. 
 
 CD 
 O 
 
 c* 
 
 
 
 
 
 
 
 
 — 
 
 o 
 
 
 
 
 
 
 
 
 
 
 j_ 
 
 
 
 
 
 
 
 
 
 
 Q_ 
 
 co 
 
 
 
 
 
 
 
 
 n 
 
 3 
 
 cu 
 
 
 
 
 
 
 
 
 E 
 
 
 r— 
 
 
 
 
 
 
 
 
 c/ 
 
 i 
 
 o 
 
 co 
 
 
 
 
 
 
 
 = 
 
 
 >> 
 
 E 
 
 
 
 
 
 
 
 
 
 u 
 
 CT> 
 
 CNJ 
 
 r~ 
 
 LO 
 
 CO 
 
 CO 
 
 00 
 
 
 
 
 O 
 
 CN1 
 
 a> 
 
 LO 
 
 LO 
 
 CT> 
 
 00 
 
 
 
 t-~ 
 
 
 
 
 
 
 
 
 
 
 + 
 
 CO 
 
 CT> 
 
 CO 
 
 O 
 
 CNJ 
 
 CNJ 
 
 o 
 
 
 
 CNJ 
 
 CO 
 
 LO 
 
 CNJ 
 
 r-* 
 
 -3- 
 
 o 
 
 o 
 
 
 
 ■=>? 
 
 00 
 
 LO 
 
 «si- 
 
 «* 
 
 <d- 
 
 CO 
 
 CO 
 
 
 
 + 
 
 
 
 
 
 
 
 
 
 
 ^~ 
 
 
 
 
 
 
 
 
 
 
 o« 
 
 
 
 
 
 
 
 
 ] 
 
 
 f 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 >> 
 
 co 
 
 
 
 
 
 
 
 
 
 S- 
 
 ■o 
 
 ^ 
 
 ^ 
 
 1*£ 
 
 ^C 
 
 ^ 
 
 ^ 
 
 i^ 
 
 
 o cu 
 
 S- 
 
 *d- 
 
 CO 
 
 CO 
 
 CNI 
 
 o 
 
 o 
 
 ^- 
 
 
 E N 
 
 o 
 
 
 
 ^- 
 
 CO 
 
 •3- 
 
 LO 
 
 CO 
 
 
 CD •■- 
 
 s 
 
 
 
 
 
 
 
 
 
 s: co 
 
 
 
 
 
 
 
 
 
 cn 
 c 
 cu 
 
 cu 
 
 f— 
 
 (J 
 
 >> 
 
 C_3 
 
 en 
 
 c 
 
 •r— 
 CO 
 CO 
 
 cu 
 a 
 
 o 
 
 s_ 
 
 Q_ 
 
 <4- 
 O 
 
 C 
 
 o 
 
 ra 
 
 
 CO 
 
 cu 
 
 -Q 
 
91 
 
 processing times differ by less than one-half rotation in every case except 
 one. 
 
 4.3 Multi programmed Results 
 
 It is only possible at the present time to give an indication of 
 the new system's capabilities for handling multiple simultaneous searches. 
 In this situation the number of parameters and combinations of parameters 
 which might be considered increases tremendously over the monoprogrammed 
 case, and so does the cost of simulation. This discussion, therefore, may 
 be regarded only as a point of departure. 
 
 Two important parameters—average response time as seen by the 
 user and average elapsed time per search as seen by the system—and a 
 simple model for performance evaluation will be considered. Average re- 
 sponse time is defined to be the average time required to process a par- 
 ticular search request, and is closely related to user satisfaction. 
 Average search time is simply the total time required to process a batch 
 of n searches, divided by n, without regard for the completion times of 
 individual searches. It is a measure of system throughput. To see the 
 difference between average response and average search time, consider two 
 searches processed concurrently beginning at time t=0 and ending at t=4 
 units and t=5 units, respectively. The average response time is 4.5 units, 
 but the average search time is only 2.5 units. In applying these defi- 
 nitions in the present analysis, no allowance is made for time spent in 
 preliminary processing (parsing and index file access) or for waiting time 
 spent in various queues, which may be considerable. The object here is to 
 examine those time requirements that are related directly to the use of the 
 hardware coordination system. 
 
92 
 
 Figure 4.6 presents a pair of hypothetical curves for response 
 and search time as functions of the number of searches processed. Also 
 shown are two broken lines: one is the constant value t=t, , the average 
 
 time required to process a single search by itself, and the other is the 
 
 t 1 
 
 function t = 7>— (n+1), which represents the average response time that 
 
 would be experienced by n users if their requests were submitted simul- 
 taneously and processed individually in sequence, each with processing 
 time t-. . As long as the average search time for a group of searches is 
 
 less than t, , throughput can be increased by multiprogramming. The best 
 
 performance, system-wise, occurs at the minimum on the search time curve. 
 As long as the multi programmed response curve lies below the line t , 
 
 users will experience improved performance over a monoprogrammed system 
 with a comparable work load. 
 
 The somewhat limited test results which are available are shown 
 in Figure 4.7. All tests were conducted with a 16-path system and ap- 
 proximately 64K words of memory (exceptions are noted below). Essentially 
 similar results have been obtained for a 256-path system except that the 
 times are shorter for tests involving the long search. 
 
 Part (a) of Figure 4.7 shows average search and response times 
 for batches of from one to ten short searches. Over this range of work 
 loads, the average response time rises only from 34.19ms for a single 
 search to 71.77ms (three disk rotations) for a group of ten. This is well 
 below the monoprogrammed reference curve. At the same time, the average 
 search time drops from 34.19ms to 12.54ms and is still falling at the ten- 
 search level, indicating that the most efficient operational load has not 
 
93 
 
 9t, 
 
 
 
 
 
 
 RESPONSE 
 
 8t, 
 
 
 / / 
 
 7t, 
 
 
 // 
 
 
 
 // 
 
 
 
 
 // 
 
 
 6t, 
 UJ 
 
 ■ 
 
 / / 
 
 
 2 
 
 
 / / 
 
 
 
 
 / / 
 
 SEARCH 
 
 »= 5., 
 
 
 / / 
 / / 
 
 
 Q 
 
 
 $ ' / 
 
 
 UJ 
 
 
 x / / 
 
 
 t/> 4t, 
 
 Q. 
 
 ' 
 
 'A' / 
 
 
 < 
 
 
 
 _l 
 
 UJ 3t, 
 
 - 
 
 / / 
 / / 
 
 
 2t, 
 
 -■ 
 
 / / / 
 
 
 t. 
 
 
 /^^ / 
 
 t-t, 
 
 
 
 
 1 
 
 / 
 
 
 
 
 / 
 
 ' : i 
 
 i i i i i ., 
 
 2 4 6 8 10 12 14 16 
 
 NUMBER OF SIMULTANEOUS SEARCHES (n) 
 
 Figure 4.6. Average Search and Response Time Presentation 
 
to 
 
 
 CD 
 
 
 -C 
 
 
 u 
 
 x: 
 
 i~ 
 
 u 
 
 fO 
 
 i. 
 
 aj 
 
 fO 
 
 oo 
 
 ai 
 
 
 CO 
 
 4-> 
 
 
 s- 
 
 O) 
 
 o 
 
 c 
 
 -C 
 
 o 
 
 to 
 
 —1 
 
 ,— 
 
 CD 
 
 ai 
 
 c 
 
 ^~ 
 
 o 
 
 t— 
 
 
 ta 
 
 x: 
 
 s_ 
 
 +J 
 
 (O 
 
 ■p" 
 
 Q- 
 
 2 
 
 (Suj) awn Q3SdV~13 
 
 .111 
 
 
 
 
 
 
 
 
 
 \ 
 
 \ 
 
 ■y 
 
 \ 
 
 ux 
 
 c < 
 
 V >v> 
 
 V 4 - 
 \ A 
 \ \ 
 
 \ \ / 
 
 \. N, / 
 
 *v 
 
 
 -J 
 
 
 1 
 
 
 
 1 1 1 
 
 » 
 
 
 s. 
 
 3 UJ 
 
 2 x 
 
 < 
 
 U. UJ 
 O </> 
 
 to 
 
 <D 
 
 XT 
 
 o 
 
 J- 
 <o 
 
 CD 
 CO 
 
 en 
 
 c 
 o 
 
 CD 
 
 «a 
 
 s- 
 
 Q_ 
 
 (A 
 +-> 
 
 to 
 CD 
 Q£ 
 
 T3 
 CD 
 
 i- 
 en 
 O 
 s- 
 a. 
 
 8 
 
 8 
 
 o 
 o 
 
 CO 
 
 o 
 o 
 <0 
 
 o 
 o 
 
 o 
 o 
 
 CM 
 
 (««*) 3WI1 Q3SdV~l3 
 
 ai 
 
 Z3 
 C7> 
 
 S . Q 
 
 < - co 
 
 «o »-. 
 
 
 VI 
 
 
 CD 
 
 
 -C 
 
 
 o 
 
 
 i. 
 
 
 id 
 
 
 CD 
 
 
 CO 
 
 ♦O 
 
 
 3 
 
 +* 
 
 O 
 
 S- 
 
 Ul 
 
 z 
 
 o 
 
 x: 
 
 < 
 1- 
 
 CO 
 
 -!«/» 
 
 
 3U 
 
 r "~ 
 
 2 x 
 
 CD 
 
 ^« 
 
 ■ — 
 
 u»cc 
 
 ^~ 
 
 < 
 
 us 
 
 U. Ui 
 
 s- 
 
 O (A 
 
 «* 
 
 
 Q. 
 
 ae 
 
 
 Ul 
 
 
 CO 
 
 ^-*. 
 
 3E 
 
 «o 
 
 3 
 
 * — * 
 
 o 
 o 
 •o 
 
 o 
 o 
 
 CM 
 
 o 
 o 
 
 («») 3WIX Q3SdV13 
 
95 
 
 yet been reached. 
 
 Part (b) presents the corresponding results for loads of from 
 one to four parallel long searches. In this case multiprogramming appears 
 to be detrimental since both the average search and average response time 
 curves lie above the corresponding curves for a monoprogrammed system. 
 
 For tests reported in Figure 4.7(b), each search was assigned a 
 fixed memory partition before the start of processing. This memory 
 allocation procedure was found to be superior to the system used for all 
 other tests in which the several searches compete for available core on a 
 dynamic basis. Partitions used are for one search, 52K; two searches, 
 40K; three searches, 24K; and four searches, 16K. Attempts to employ 
 critically-sized regions as discussed in section 4.2.3 produced in- 
 conclusive results in which performance was sometimes improved by the 
 use of critical region sizes and sometimes not. 
 
 Part (c) of the figure presents results for a mixed job load 
 containing one long search and a number of short ones. This is likely to 
 be a very common situation for an operational system. In this case, the 
 monoprogrammed average response curve is assumed to have the same slope 
 as in part (a) for parallel short searches alone. The important point is 
 that the response and search time curves behave in much the same way in 
 the presence of a long search as they do without one. Specifically, 
 average response time increases by about thirty ms as the number of short 
 searches increases from one to eight; and the average time per short 
 search drops consistently throughout this range, indicating that more 
 short searches could be included without serious adverse effects. 
 
 Further testing is required to develop a clear picture of system 
 
96 
 
 behavior when processing multiple searches in parallel. Results presented 
 in this section do, however, indicate the level of performance which can 
 be achieved. 
 
 4.4 Algorithmic Development 
 
 Up to the present time, algorithmic development and refinement 
 have been performed on an empirical basis. No claim is made to an optimal 
 solution; however, certain experimental observations are worthy of note. 
 In general, procedures have been avoided which would be difficult to imple- 
 ment or time-consuming to execute in an operational system. 
 
 The general problem is to merge N ordered lists of various lengths 
 located initially at random positions on one or more disk drives. The 
 available memory may or may not be adequate to contain all the data ele- 
 ments to be processed, but in general it is not. 
 
 In the initial experiments, all lists were processed strictly in 
 their order of occurrence. When the memory became full its contents were 
 combined with the next available list and the result was left on disk. For 
 large problems, this procedure eventually resulted in a need to process a 
 number of intermediate results, each too large to fit in core, located ran- 
 domly on the disk. This proved to be a time-consuming job, and better per- 
 formance was achieved when a single list (the longest) was reserved from 
 
 4 
 the start for collecting intermediate results. Curve A in Figure 4.8 was 
 
 produced using this procedure. 
 
 Cases have been observed in which performance is improved if the collector 
 list is not chosen until it is needed, i.e., if the longest file is treated 
 like any other until after the memory has been filled once, and the longest 
 remaining list is chosen as a collector. However, this procedure has no 
 clear-cut advantage over reserving the longest list from the start; and the 
 latter is easier to implement. 
 
97 
 
 1000 
 
 900 - 4\ 
 
 ALGORITHM DEVELOPMENT 
 
 UJ 
 
 800 
 
 700 
 
 600 
 
 500 
 
 Q 
 UJ 
 
 tn 400 
 
 Q_ 
 
 < 
 -J 
 
 UJ 300 
 
 D D A 
 
 £> A B 
 
 o « C 
 
 16 PATHS 
 
 200 
 
 100 
 
 _i_ 
 
 _l~ 
 
 -L. 
 
 10K 20K 30K 40K 50K 60K 70K 
 
 MEMORY SIZE 
 (WORDS; K=1024) 
 
 Figure 4.8. Algorithm Development 
 
98 
 
 Curve A exhibits a large "hump" in the range of memory sizes be- 
 tween 16K and 50K words. In this operating region, it was found to be a 
 very common occurrence for the memory to fill within a few blocks of its 
 capacity and for the system to split a list in order to fill that small 
 space. A long merge would follow and, because of the assumed overlap 
 between the two lists, a few blocks of core would again become available 
 at its completion. The whole process would then be repeated. Eventually 
 the result would grow long enough to fill the memory completely, but mean- 
 while a great many opportunities for more useful work would be missed. To 
 correct this problem, list-splitting was suppressed whenever the amount of 
 free memory dropped below some threshold level, t. Curve B is the result 
 for t=10%. 
 
 System performance was found to be fairly insensitive to the 
 exact value of t above some low level. Curves for t=5, 10 and 20% all lie 
 close together at most points tested. As one might expect, critical points 
 show a tendency to shift towards large memories as t increases, indicating 
 a reduction in the effective memory size of any given configuration. 
 
 In the next refinement, two additional activities were keyed to 
 the level of memory usage. First, all reading of new data was suppressed 
 when free core dropped below the threshold. More important, collector list- 
 processing was permitted only when less than t% of the total memory was free. 
 In this way a number of unnecessary, long merge procedures were eliminated 
 and earlier access was permitted to short lists located at the same rotational 
 position as the collector list. These improvements yielded Curve C of 
 Figure 4.8, and this procedure has been used to generate all test results 
 discussed elsewhere in this report. 
 
99 
 
 Examination of the merge trees, or patterns of list combinations 
 produced by Algorithm C reveals that, as the memory fills, the level of 
 memory occupancy oscillates back and forth about the threshold, with the 
 result that inefficient merges involving one very long and one very short 
 list still occur frequently enough to degrade performance. 
 
 Recently (too recently for extensive use in this report), tests 
 have been conducted using a new algorithm in which, essentially, no fur- 
 ther reading is permitted after the memory fills beyond the threshold 
 
 5 
 level. Instead, all pending work on files in core is completed and the 
 
 sink list is processed at the earliest opportunity. This algorithm was 
 used to generate Curve D in the figure. Curve D lies below all other 
 curves at all points tested and yields a particular improvement in the 
 middle range of memory sizes. It does not, however, completely eliminate 
 the local maximum at 40K which results from inefficient handling of long 
 lists in memory sizes between the critical values NL and N, , defined in 
 section 4.2.3. 
 
 In view of the successful tests with Algorithm D, it now appears 
 that the threshold system might be abandoned entirely in favor of a pro- 
 cedure which fills the available memory completely and then allows the work 
 in progress to be completed and the memory emptied before accepting any new 
 inputs. Other possibilities are also being considered. 
 
 5 
 
 The precise procedure tested is that reading is suppressed after either a) 
 the memory fills completely, or b) an opportunity to split a list is re- 
 jected because less than 10% of the memory is free. 
 
100 
 
 4.5 Merge Activity 
 
 Records of merge and coordination hardware utilization have 
 proved less interesting than expected, although they do support some of the 
 analysis presented earlier. Figure 4.9 shows merge time as a function of 
 memory size for the collection of basic test runs presented in Figure 4.1. 
 
 For small systems (1 (not shown), 16 and 32 paths), the merge 
 time curves have nearly the same shape as those for total elapsed time. In 
 particular, they exhibit a local maximum near 40K which, as discussed pre- 
 viously, results from inefficient merge scheduling as the memory becomes 
 nearly full. When the memory becomes large enough to hold all files of 
 interest, this inefficiency disappears, and the merge time drops to its 
 overall minimum value. 
 
 For larger systems (64 or more parallel data paths) the phe- 
 nomenon above becomes much less pronounced in the total elapsed time curves 
 and it disappears altogether from the merge time records. In fact, for 
 systems with 128 and 256 paths, the merge time shows a definite increase 
 around 50K (where total elapsed time decreases suddenly). Merge time for 
 the largest system tested (512 paths) increases almost linearly with 
 memory size above 8K while the total elapsed time decreases steadily. 
 Thus, the efficiency of network utilization drops steadily as the overall 
 performance of the system improves. 
 
 The explanation for this may be found by considering the nature 
 of these large systems and examining the detailed progress of a search. 
 In these configurations a great many data items are transmitted simul- 
 taneously from disk with the result that any given postings file occupies 
 a much smaller angular region than would otherwise be the case, and the 
 
101 
 
 
 SEARCH: ONE 70-TERM OR 
 CONDITIONS: STANDARD 
 
 600 
 
 </> 
 
 500 
 
 E 
 
 
 W 
 
 400 
 
 3 
 
 
 H 
 
 
 UJ 
 
 300 
 
 e> 
 
 
 or 
 
 
 hi 
 
 
 ^ 
 
 200 
 
 100 - 
 
 o 16 PATHS 
 
 ■o 32 PATHS 
 
 64 PATHS 
 
 ©128 PATHS 
 <>256 PATHS 
 "512 PATHS 
 
 10K 20K 30K 40K 50K 
 
 MEMORY SIZE 
 (WORDS; K = 1024) 
 
 60K 70K 
 
 Figure 4.9. Merge and Coordination Hardware Utilization 
 
102 
 
 disk appears to be less densely populated. Furthermore, because the hard- 
 ware coordination system also processes more data on each cycle, a given 
 merge is completed more rapidly. These two factors combine to prevent the 
 accumulation of unprocessed lists in core so that instead of forming small 
 intermediate results, new lists tend to be merged directly into a single, 
 large, constantly expanding, combined list. This kind of process has 
 already been identified as a source of inefficiency in discussing the 
 behavior of the total elapsed time for a search as a function of memory 
 size. 
 
 It may be possible to improve the efficiency of hardware uti- 
 lization by waiting until several lists are available in core before doing 
 any processing. However, the price of such an improvement may well be an 
 increase in the total time required to complete the task. 
 
103 
 
 5. CONCLUSION 
 
 A specialized processor for performing postings file access and 
 coordination functions in inverted file retrieval systems has been pre- 
 sented. Design studies and simulation experiments indicate that the pro- 
 posed system can be built using current technology and that it can process 
 a complicated search in a large data base from 12 to 60 times faster than 
 a large conventional computer. The speed-up is not as great for a short 
 search involving only a few terms, but ten or more such searches can be 
 processed concurrently with very little effect upon the system. In this 
 way, the average elapsed time per search can be reduced drastically. 
 
 Application of the new system need not be restricted to infor- 
 mation retrieval. It can be employed for any merging application, and in 
 many cases it can be simplified considerably by the elimination of the co- 
 ordination network. 
 
 While an exhaustive analysis of development costs is beyond the 
 scope of this report, a fairly realistic estimate of component costs can be 
 given since the subsystem designs presented in Chapter 2 are based on "off- 
 the shelf" devices. Because semiconductor prices have been declining sharply 
 in recent years, these figures should be regarded as conservative. 
 
 Table 5.1 lists several components and shows the number of units 
 required and their approximate cost for the 16- and 256-path parallel sys- 
 tems. Hardware for a small system, excluding the control unit and the disk 
 but including a 16K word x 32-bit 100ns memory, is estimated at about 
 $50,000; a large unit would cost around $200,000. Addition of a control unit 
 should not affect these numbers significantly. 
 
104 
 
 
 
 
 
 
 
 
 
 1 
 
 1 
 
 
 
 
 
 
 
 
 
 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 O 
 
 
 
 ■•-* 
 
 LO 
 
 o 
 
 CO 
 
 o 
 
 o 
 
 CO 
 
 
 
 </) 
 
 LO 
 
 o 
 
 CO 
 
 o 
 
 LO 
 
 «d" 
 
 
 
 O 
 
 
 
 
 
 
 
 
 
 o 
 
 o 
 ■bo- 
 
 cr> 
 
 
 o 
 
 cr> 
 
 CO 
 
 o 
 o 
 
 CM 
 -bO- 
 
 to 
 
 
 
 
 
 
 
 
 -C 
 
 
 
 
 
 
 
 
 4-> 
 
 
 
 
 
 
 
 
 fO 
 
 
 
 
 
 
 
 
 1 
 
 VO 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 LO 
 
 
 
 
 
 
 
 
 C\J 
 
 -o 
 
 
 
 
 
 
 
 
 
 S- CD 
 
 o 
 
 o 
 
 CM 
 
 o 
 
 t— 
 
 
 
 
 <D S- 
 
 o 
 
 LO 
 
 r— 
 
 CO 
 
 
 
 
 
 -Q T- 
 
 o 
 
 CO 
 
 LO 
 
 CO 
 
 
 
 
 
 E z» 
 
 « 
 
 r. 
 
 
 m 
 
 
 
 
 
 =j cr 
 
 CO 
 
 «* 
 
 
 r— 
 
 
 
 
 
 ^ <u 
 
 CM 
 
 
 
 
 
 
 
 
 or 
 
 
 
 
 
 
 
 } 
 
 f 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 i 
 
 I 
 
 
 
 
 
 
 
 
 
 
 
 o 
 
 o 
 
 O 
 
 o 
 
 o 
 
 O 
 
 
 
 ■m 
 
 LO 
 
 o 
 
 CO 
 
 LO 
 
 o 
 
 CD 
 
 
 
 to 
 
 CO 
 
 CO 
 
 
 LO 
 
 to 
 
 en 
 
 
 
 o 
 
 
 
 
 n 
 
 n 
 
 A 
 
 
 
 o 
 
 -bO- 
 
 
 
 CO 
 
 CTi 
 CO 
 
 00 
 
 i/> 
 
 
 
 
 
 
 
 
 sz 
 
 
 
 
 
 
 
 
 4-> 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 LO 
 
 
 
 
 
 
 
 
 1 — 
 
 
 
 
 
 
 
 
 
 
 -a 
 
 
 
 
 
 
 
 
 
 s- <u 
 
 
 
 
 
 
 
 
 
 CD S- 
 
 
 
 
 
 
 
 
 
 -O "i- 
 
 o 
 
 o 
 
 CM 
 
 o 
 
 r— 
 
 
 
 
 E 3 
 
 IT) 
 
 LO 
 
 CO 
 
 CO 
 
 
 
 
 
 3 cr 
 
 CD 
 
 r— 
 
 
 
 
 
 
 
 ■Zl CD 
 
 
 
 
 
 
 
 
 
 CtL 
 
 
 
 
 
 
 
 1 
 
 ' 
 
 
 
 
 
 
 
 
 
 
 
 
 
 V) 
 
 
 
 
 
 
 
 
 S- 
 
 
 
 
 
 
 
 
 <d 
 
 
 
 
 
 
 
 
 +J 
 
 
 
 
 
 
 
 
 V) 
 
 
 
 
 
 
 
 V) 
 
 ■r" 
 
 
 
 
 +-> 
 
 
 
 CL 
 
 en 
 
 
 
 
 C 
 
 
 
 O 
 
 aj 
 
 
 
 
 <D 
 
 
 l/> 
 
 1 — 
 
 or 
 
 
 
 
 C 
 
 
 a> 
 
 u_ 
 
 
 >1 
 
 
 
 o 
 
 (/) 
 
 .c 
 
 1 
 
 +-> 
 
 s- 
 
 r— 
 
 
 Q. 
 
 a) 
 
 u 
 
 Q. 
 
 4- 
 
 o 
 
 to 
 
 
 E 
 
 4-i 
 
 +J 
 
 ■!■■ 
 
 •r— 
 
 E 
 
 +-> 
 
 
 o 
 
 fl3 
 
 to 
 
 r— 
 
 -C 
 
 CD 
 
 o 
 
 
 o 
 
 o 
 
 _l 
 
 u_ 
 
 CO 
 
 * 
 
 * 
 
 * 
 
 1— 
 
 to 
 
 a> 
 +-> 
 
 4-> 
 
 to 
 
 to 
 o 
 <_) 
 
 +-> 
 
 c 
 
 0) 
 
 c 
 o 
 a. 
 E 
 o 
 o 
 
 
 LO 
 
 a; 
 
 
 r— 
 
 a> 
 
 O 
 
 r— 
 
 >5 
 
 ja 
 
 o 
 
 <a 
 
 
 l— 
 
 to 
 
 
 c 
 
 
 o 
 
 
 
 CM 
 
 
 CO 
 
 
 X 
 
 
 to 
 
 
 -o 
 
 to 
 
 s- 
 
 +J 
 
 o 
 
 •r- 
 
 5 
 
 -Q 
 
 
 
 ^ 
 
 <M 
 
 to 
 
 CO 
 
 r— 
 
 
 * 
 
 
 * 
 
105 
 
 q 
 The Illiac IV head-per-track disk with a capacity of 10 bits and 
 
 with the other capabilities assumed in this study cost approximately 
 
 $500,000. 
 
 Mass storage alternatives under consideration include the vari- 
 ous shift register technologies and the use of a moving head disk modified 
 for parallel transmission from several tracks. If this last alternative 
 proves attractive, a system containing a controller and eight drives com- 
 parable to the IBM 2314 could be obtained for about $70,000. 
 
 In summary, it appears that the hardware for a 16-path parallel 
 retrieval system could be built for about $100K--$150K. A 256-path system 
 would cost in the neighborhood of $1M. 
 
 Much remains to be done in the area of algorithm optimization 
 and in the development of a theoretical basis for describing the perfor- 
 mance of the system. Several unexpected phenomena have been observed in 
 connection with the interaction between the disk and the hardware system, 
 and a rigorous explanation for these is not yet available. Nevertheless, 
 the system performs well and exhibits promise for extending the capa- 
 bilities of inverted file retrieval systems. 
 
106 
 
 LIST OF REFERENCES 
 
 [1] "Putting Law Libraries into Computers," Business Week, p. 36; 
 January 26, 1974. 
 
 [2] Mead Data Central, Inc., Lexis, A Computer Based Legal Research 
 Service (Undated Brochure). 
 
 [3] D. B. McCarn and J. Leiter, "On-Line Service in Medicine and 
 Beyond," Science , pp. 318-324; July 27, 1973. 
 
 [4] W. R. Nugent, "The U. S. Patent Office Data Base: A Full Text 
 Communications Format for Computer-Aided Classification, 
 Retrieval and Examination of Patents," ASIS Proceedings, 
 Vol. 8, pp. 179-184; 1971. 
 
 [5] K. E. Batcher, "Sorting Networks and Their Applications," Spring 
 Joint Computer Conference , pp. 307-314; 1968. 
 
 [6] K. E. Batcher, "A New Internal Sorting Method," Goodyear Aero- 
 space Corporation Report No. GER-11759; September 1964. 
 
 [7] K. E. Batcher, "Minimum-Time Merging Networks," Goodyear Aero- 
 space Corporation Report No. GER-14122; December 1968. 
 
 [8] Motorola Semiconductor Products, Inc., MECL Integrated Circuits 
 Data Book , Third Edition; 1973. 
 
 [9] Intel Data Catalog , Intel Corporation, Santa Clara, California; 
 
 February 1973. 
 [10] Illiac IV Systems Characteristics and Programming Manual , IL4-PM1 , 
 Revision 4, Change 1; June 1970. 
 
107 
 
 [11] G. H. Barnes, R. M. Brown, M. Kato, D. J. Kuck, D. L. Slotnick 
 and R. A. Stokes, "The.Illiac IV Computer," IEEE Trans- 
 actions on Computers , Vol. C-17, No. 8, pp. 746-757; 
 August 1968. 
 
 [12] D. E. Mclntyre, "An Introduction to the Illiac IV Computer," 
 Datamation , Vol. 16, No. 4, pp. 60-67; April 1970. 
 
 [13] D. Huffman, "A Method for the Construction of Minimum-Redundancy 
 Codes," Proceedings of the IRE , pp. 1098-1101; September 
 1952. 
 
 [14] D. E. Knuth, The Art of Computer Programming, Vol. 3, Addi son- 
 Wesley, pp. 362-371; 1973. 
 
 [15] National Library of Medicine, Master MESH; November 1972. 
 
108 
 
 VITA - 
 
 William Howard Stellhorn was born in Fort Wayne, Indiana, in 
 1943. He received the B.S.E.E. and M.S.E.E. degrees from Purdue Uni- 
 versity in 1965 and 1966, respectively. His research activity at Purdue 
 dealt with computer-aided analysis and synthesis of nonlinear networks. 
 From 1967 to 1970 Mr. Stellhorn was employed as an Aerosystems Engineer 
 with the General Dynamics Corporation in Fort Worth, Texas, where he 
 developed and extended models for aircraft penetration studies. Since 
 entering the University of Illinois in 1970, he has held a Research As- 
 sistantship in the Department of Computer Science. He is a member of the 
 Association for Computing Machinery, the Institute of Electrical and 
 Electronics Engineers, and Tau Beta Pi and an associate member of Sigma Xi 
 
BIBLIOGRAPHIC DATA 
 SHEET 
 
 1. Report No. 
 
 UIUCDCS-R-74-637 
 
 l. Title and Subtitle 
 
 A SPECIALIZED COMPUTER FOR INFORMATION RETRIEVAL 
 
 3. Recipient's Accession No. 
 
 5- Report Date 
 
 October 1974 
 
 6. 
 
 '. Author(s) 
 
 William Howard Stellhorn 
 
 8. Performing Organization Rept. 
 
 N °- UIUCDCS-R-74-637 
 
 Performing Organization Name and Address 
 
 University of Illinois at Urbana-Champaign 
 Department of Computer Science 
 Urbana, Illinois 61801 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract/Grant No. 
 
 US NSF GJ-36936 
 
 2. Sponsoring Organization Name and Address 
 
 National Science Foundation 
 Washington, D. C. 
 
 13. Type of Report & Period 
 Covered 
 
 Doctoral - 1974 
 
 14. 
 
 5. Supplementary Notes 
 
 6. Abstracts 
 
 Response time in large, inverted file document retrieval systems is deter- 
 mined primarily by the time required to access files of document identifiers on disk 
 and perform the processing associated with a Boolean search request. This paper 
 describes a specialized computer system capable of performing these functions in 
 hardware. Using this equipment, a complicated sample search involving 70 terms and 
 over 60,000 document references can be performed from 12 to 60 times faster than with 
 a conventional machine, and many small searches can be processed concurrently with 
 very little effect upon system performance. 
 
 A detailed description of the system, which can be realized with currently- 
 available technology, is presented; and algorithms for controlling the progress of a 
 search are discussed. Results from numerous simulations involving various system 
 configurations and other factors are also reported. 
 
 7. Key Words and Document Analysis. 17a. Descriptors 
 
 Information Retrieval 
 Inverted Files 
 Parallel Processing 
 Merging 
 Computer Systems 
 
 7b. Identifiers/Open-Ended Terms 
 
 7c. COSATI Field/Group 
 
 8. Availability Statement 
 
 RELEASE UNLIMITED 
 
 ORM NTIS-35 ( 10-70) 
 
 19. Security Class (This 
 Report) 
 
 UNCLASSIFIED 
 
 20. Security Class (This 
 
 Page 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 115 
 
 22. Price 
 
 USCOMM-DC 40329-P7 1 
 
*a 2 
 
 &B 
 

 i, ' 
 
 
 UNlVBf, 
 
 s 'T-VOF, 
 
 sr,«- »I1KS5? 
 
 
 ■ ■ 
 
 I ^^B 
 
 I 
 
 
 ►A-* : 
 
 •.V.*Atf 
 
 
 t.<$ 
 
 
 
 MM 
 
 ■