afflB 
 
 HSU 
 
 BBjnffi HO WW rWt 
 
 m mm 
 
 « Hfin US 
 
 ■ 
 
 
 f^ 
 
 fjlllllllffl 
 
 H IB Hi Bn 
 
 ta BBSS nBI 
 
 HAS BE BE 
 
 •T"K BSH 9b£ HHhBH 
 
 ■8 H 
 
 ^B m BBS 
 
 ■ 
 
 ■ 
 
 Jl 
 
 1 1 
 
 ■ 
 
 h 
 
 
 BB 
 
 
 ■ 
 
 
 
 I ■ 
 
 I^B 
 
 H SB 
 
 fl B 
 
 m 3HB jyfll 
 
 Boh Sh 
 
 a— HHBwHfllua wSSSSSSSsSSBSat 
 
 Bgnmi bi bub 
 
 ^■Bmn »|m Iff 
 
 — —aMhiiiniMiiB^w— an 
 
 KfOSfllMIilMUjsi 
 
LIBRARY OF THE 
 
 UNIVERSITY OF ILLINOIS 
 
 AT URBANA-CHAMPAIGN 
 
 510.84 
 
 TiGy 
 
 ho. 637 -£42 
 
 cop. 2- 
 
The person charging this material is re- 
 sponsible for its return to the library from 
 which it was withdrawn on or before the" 
 Latest Date stamped below. 
 
 f T or e 'L mU r ,a,i ° n ' ° nd ° nder,ini "9 »' "oaks are reasons 
 
 z u„r:e p :;;7 ac " on and maY resu,t in *— *- 
 
 To renew call Telephone Center, 333-8400 
 
 ^S^L^^!i_^^L^ URBANA-CHAMPAIGN 
 
 L161— O-1096 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/programspeedupth638stre 
 

 \uCSf 
 
 Report No. UIUCDCS-R-74-638 
 
 NSF - OCA - GJ -36936 - 000004 
 
 PROGRAM SPEEDUP THROUGH CONCURRENT RECORD PROCESSING 
 
 by 
 
 Richard Ernest Strebendt 
 
 October 1974 
 
 THE LIBRARY OF THE 
 
 JUL 9 1974 
 
Report No. UIUCDCS-R-74-638 
 
 PROGRAM SPEEDUP THROUGH CONCURRENT RECORD PROCESSING* 
 
 by 
 Richard Ernest Strebendt 
 
 October 1974 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801 
 
 * 
 This work was supported in part by the National Science Foundation under 
 Grant No. US NSF-GJ-36936 and was submitted in partial fulfillment of the 
 requirements for the degree of Doctor of Philosophy in Computer Science, 
 October 1974. 
 
m 
 
 ACKNOWLEDGMENT 
 
 The author wishes to express his gratitude to Professor 
 David J. Kuck for the suggestions, incisive questions, and advice which 
 were invaluable in developing this thesis. 
 
 Thanks is due also to the personnel of the Administrative Data 
 Processing Office of the Urbana-Champaign campus of the University of 
 Illinois who provided the COBOL programs for the analyses presented in 
 this paper. Special thanks in this regard is expressed to Mr. Pinaki R. 
 Das. 
 
 Above all, the author wishes to express his gratitude to his 
 wife, Frances, for her encouragement, patience, and assistance. 
 
IV 
 
 PROGRAM SPEEDUP THROUGH CONCURRENT RECORD PROCESSING 
 
 Richard Ernest Strebendt, Ph.D. 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign, 1974 
 
 Much effort in the past has been devoted to speeding up 
 computational programs through the use of multiprocessing. This paper 
 examines the problem of speeding up data processing programs which typ- 
 ically do not contain a great deal of computation. 
 
 A machine organization is proposed which is capable of execu- 
 ting several instruction streams concurrently. Compiler algorithms are 
 described which automatically insert the necessary commands to start 
 and stop instruction streams and to protect common variables which must 
 be accessed sequentially. 
 
TABLE OF CONTENTS 
 
 Page 
 
 1. INTRODUCTION 1 
 
 1.1 Approaches to Program Speedup 1 
 
 1.2 Characteristics of COBOL 3 
 
 1.3 Assumptions and Restrictions 4 
 
 2. DESCRIPTION OF THE METHOD 8 
 
 2.1 Types of Hardware Units Needed 11 
 
 2.2 Address Counters and Interlocks 13 
 
 3. COMPILER ALGORITHMS 18 
 
 3.1 Source Text Scan 18 
 
 3.2 Phase and Link Identification 23 
 
 3.3 Statement Migration 24 
 
 3.4 Variable Type Identification . 32 
 
 3.5 Storage Assignment 37 
 
 3.6 Positioning FORK, HOLD, and QUIT Instructions .... 42 
 
 3.7 Inserting Interlocks 47 
 
 4. PROOF OF METHOD 53 
 
 4.1 Theorem 4.1 53 
 
 4.2 Theorem 4.2 54 
 
 4.3 Discussion 57 
 
 5. MACHINE DESIGN 60 
 
 5.1 Over-all Structure 60 
 
 5.2 Program Memory 62 
 
 5.3 Address Counters 65 
 
 5.4 Instruction Dispatch Unit 67 
 
 5.5 Address Counter Coordinator 71 
 
 5.6 Processors 81 
 
 5.7 Data Memory and Buses 84 
 
 5.8 I/O Processors 88 
 
 5.9 Routing Network 90 
 
 5.10 Modifications Needed for Multiprogramming 90 
 
 6. EXPERIMENTAL RESULTS 92 
 
 6.1 Introduction . . . . * 92 
 
 6.2 Variable Size Counts 92 
 
VI 
 
 Page 
 
 6.3 Statement Type Counts 93 
 
 6.4 Program Analyses 97 
 
 6.5 Program Simulation 106 
 
 7. MACHINE PARAMETERS 115 
 
 7.1 Speed Limitation 115 
 
 7.2 Number of Address Counters 116 
 
 7.3 Data Memory Word Size 116 
 
 7.4 Data Character Size 118 
 
 7.5 Number of Data Memory Units 119 
 
 7.6 Size of Data Memory 119 
 
 7.7 Size of Program Memory 122 
 
 7.8 Numbers of Processors 125 
 
 7.8.1 IF Tree Processor 125 
 
 7.8.2 Arithmetic/Logical Processors 125 
 
 7.8.3 I/O Processors 126 
 
 7.8.4 SORT Processor . 127 
 
 7.9 Instruction Dispatch Unit Memory Sizes 127 
 
 7.10 Other Devices 134 
 
 7.10.1 Program Memory 134 
 
 7.10.2 Address Counter 135 
 
 7.10.3 Instruction Dispatch Unit 136 
 
 7.10.4 Address Counter Coordinator 137 
 
 7.10.5 Data Memory Unit 138 
 
 7.10.6 Routing Network 139 
 
 7.10.7 Arithmetic/Logical Unit 139 
 
 7.10.8 IF Tree Processor 140 
 
 7.10.9 1/0 Processor 140 
 
 7.10.10 SORT Network 142 
 
 7.11 Package Counts 143 
 
 7.11.1 Memories 143 
 
 7.11.2 Queues 145 
 
 7.11.3 Bus Drivers and Receivers 145 
 
 7.11.4 Other Devices 145 
 
 7.11.5 Total Package Requirement 145 
 
 8. PROBLEM PROGRAMS 151 
 
 8.1 Characteristics of Problem Programs 151 
 
 8.2 Speeding Up Problem Programs 151 
 
 9. SOFTWARE DESIGN 155 
 
 9.1 Language Features Which Hinder 155 
 
 9.1.1 ALTER 155 
 
 9.1.2 Subroutines 156 
 
 9.2 Language Features Which Help 158 
 
 9.2.1 Complex Operations 158 
 
 9.2.2 SORT 158 
 
Vll 
 
 Page 
 
 9.3 Programming Techniques Which Hinder 159 
 
 9.3.1 Sequence Checking 159 
 
 9.4 Programming Techniques Which Help 159 
 
 9.4.1 Super-records ..... 159 
 
 9.4.2 Parallel Tasking 160 
 
 10. CONCLUSIONS . . . 161 
 
 10.1 Summary of Results 161 
 
 10.2 Areas for Further Inquiry 163 
 
 10.3 Final Comment 165 
 
 LIST OF REFERENCES 166 
 
 APPENDIX - Program Analysis Example 171 
 
 A.l Source Text Scan 171 
 
 A. 2 Phase and Link Identification 179 
 
 A. 3 Statement Migration 181 
 
 A. 4 Variable Type Identification 181 
 
 A. 5 Storage Assignment 185 
 
 A. 6 Positioning FORK, HOLD, and QUIT Instructions .... 185 
 
 A. 7 Inserting Interlocks 185 
 
 VITA 191 
 
vm 
 
 LIST OF TABLES 
 
 Page 
 
 6.1 Frequency Count of Variable Sizes 94 
 
 6.2 Frequency Count of Statement Types 98 
 
 6.3 Frequency Count of Operator Types 101 
 
 6.4 Statistics for Analyzed Programs 104 
 
 6.5 Memory Allocation Results 107 
 
 6.6 Speedup Results 112 
 
 7.1 Memory Requirements 120 
 
 7.2 Program Size Statistics 124 
 
 7.3 Memory Package Requirements 144 
 
 7.4 Queue Package Requirements 146 
 
 7.5 Bus Driver and Receiver Package Requirements 147 
 
 7.6 Other Device Package Requirements 148 
 
 7.7 Total Package Requirements 150 
 
 A.l Program Listing 172 
 
 A. 2 Variable Names 175 
 
 A. 3 Program Summary 176 
 
 A. 4 Variable Types 184 
 
 A. 5 Affinity and Segregation Sets 186 
 
 A. 6 Storage Unit Assignment 187 
 
IX 
 
 LIST OF FIGURES 
 
 Page 
 
 2.1 Concurrent Execution Example 9 
 
 3.1 IF Tree Example 22 
 
 3.2 Statement Migration Example 1 25 
 
 3.3 Statement Migration Example 2 28 
 
 3.4 Statement Migration Example 3 30 
 
 3.5 FORK, HOLD, QUIT Insertion 48 
 
 3.6 Interlock Insertion Example ... 51 
 
 4.1 FORK Insertion Example 59 
 
 5.1 Over-all Machine Structure 63 
 
 5.2 Program Memory 64 
 
 5.3 Address Counter 66 
 
 5.4 Instruction Dispatch Unit 68 
 
 5.5 Fetch and Tag Generator Logic . 72 
 
 5.6 Instruction Dispatch Controller Logic 74 
 
 5.7 Tag Status Register Logic 75 
 
 5.8 FORK Control Sequence 76 
 
 5.9 HOLD Control Sequence 78 
 
 5.10 QUIT Control Sequence 80 
 
 5.11 TEST Control Sequence 82 
 
 5.12 RELEASE Control Sequence 82 
 
 5.13 Data Memory Unit 85 
 
Page 
 
 6.1 Frequency Count Plot - Variable Sizes 96 
 
 6.2 Histogram - Statement Types by Class 99 
 
 6.3 Histogram - Statement Types by Statement 100 
 
 6.4 Histogram - Operator Types by Class 102 
 
 6.5 Histogram - Operator Types by Operator 103 
 
 7.1 Cumulative Variable Size Distribution 117 
 
 7.2 Memory Requirements vs. Program Size 121 
 
 7.3 Instruction Transfer Rate Model 129 
 
 9.1 ALTER Instruction Example 157 
 
 A.l Program Graph 180 
 
 A. 2 Phase I Statement Migration 182 
 
 A. 3 Phase II Statement Migration 183 
 
 A. 4 Final Program Graph 188 
 
1. INTRODUCTION 
 
 1 .1 Approaches to Program Speedup 
 
 A continuing concern [CHE71a, BEL72, WIT72] in Computer Science 
 is the problem of speeding up the execution of programs. In the early 
 days of computers this was primarily attacked by speeding up the cir- 
 cuitry of the machine itself. Faster devices were developed as relays 
 gave way to vacuum tubes which gave way in turn to semiconductors. More 
 efficient algorithms for arithmetic were found and continue to be inves- 
 tigated. Now that physical limits are in sight for the speed of devices, 
 the emphasis [GIL58, MUR66] on program speedup is being placed on 
 parallelism in the execution of the program. Machines have been devel- 
 oped [BAR68] to exploit the parallelism inherent in array operations. 
 The parallelism present in algorithms for arithmetic operations has also 
 been utilized in pipelined arithmetic units [SEN65, SEN67, HIN72, 
 WAT72]. 
 
 Some consideration has been given [ASC67, F0S71 , FLY71 , FLY72, 
 CUR73, BAE73] to the possibility of using more than one processing unit 
 to execute a program, with each processor executing independently of the 
 others. Two problems arise when this is done. The first is the problem 
 of conflicts in accessing data common to several processors. For a 
 particular class of programs this problem is solved by inserting a com- 
 plex set of tests around the instructions referencing the common data 
 [DIJ68a, DIJ68b, C0U71 , EIS72, HAB72] to allow any processor to access the 
 common data so long as no other processor is accessing that data. More 
 
commonly, however, the problem is avoided by allowing processors to 
 simultaneously execute only tasks which are independent. This leads to 
 the second problem, that of identifying independent tasks. Mechani- 
 cally finding independent tasks within a program can be done [BER66, 
 RAM69, RUS69, TJA70], but for a large program this can be expensive in 
 machine time. The approach suggested by several investigators [C0N63, 
 AND65, 0PL65, WIR66] is that of requiring the programmer to specify in 
 his program where he thinks the processors should be started and 
 stopped. For an occasional program this might be a workable technique, 
 but for a programmer with a heavy work load it would be too time con- 
 suming and error-prone to be a useful technique. 
 
 In this paper we also attack the problem of program speedup 
 through the concurrent operation of more than one processor. Our 
 approach, however, is different from that in previous work done and 
 yields a potentially very high speedup without searching for indepen- 
 dent tasks within a program. We attain a program speedup by executing 
 the program concurrently with itself , with each instruction stream (or 
 copy of the program) processing a different set of input data. No 
 parallel tasking is attempted within an instruction stream. It is 
 shown in this paper that this method of achieving concurrency has the 
 following advantages: 
 
 1) It is not necessary to compare all tasks with all 
 others to find those which are parallel executable. 
 
 The bulk of this paper assumes that only one program at a 
 time is in execution. The modifications needed to extend the machine 
 proposed in this paper to multiprogramming are discussed in section 5.10, 
 
2) The instructions to start and stop processors can 
 be inserted very easily by the compiler into a 
 program written for a single processor machine. 
 This relieves the programmer of this burden. Also, 
 the locations of these instructions can change with 
 each compilation as the program changes. Thus the 
 programmer can concentrate on what he wants the 
 program to do and not on how the machine does it. 
 
 3) The interlocking operations can be inserted by the 
 compiler without the intervention of the programmer. 
 
 4) The interlocking conditions are fairly simple and 
 can be relegated to an inexpensive hardware unit. 
 
 1.2 Characteristics of COBOL 
 
 Much recent work on the speedup of programs through multipro- 
 cessing emphasizes the speedup of arithmetic [MUR71 , KRA72, KUC72b]. 
 In many computational programs the speedups thus gained are substan- 
 tial. In many data processing programs written in COBOL, however, 
 little is gained in this way since there is little arithmetic in them. 
 It might be asked: why worry about COBOL programs? The answer is 
 quite straightforward; more programs are written in COBOL than in any 
 other language. Indeed, a recent survey [PHI73] of language users 
 indicates that more programs are written in COBOL than in all of the 
 other languages combined . The economic benefits resulting from improv- 
 ing the execution speed of COBOL programs should be well worth the 
 effort. 
 
The characteristics which make COBOL programs as long running 
 as they often are suggested our method of speeding up such programs. 
 
 A typical program, judging from the examination of a number of pro- 
 
 * 
 grams, involves wery little processing of the data compared to, say, 
 
 a FORTRAN numerical program. Commonly, a set of input data is ac- 
 quired, particular items are selected, simple calculations (if any) 
 are carried out, then the data is reformatted and written out, and 
 another set of data is acquired for processing. While the amount of 
 processing done on each set of data is relatively small, the number of 
 sets of data processed per run may be quite large. 
 
 This profile of a typical COBOL program suggests three 
 things. First, any arithmetic speedup we can obtain is useful, 
 although it may not be as dramatic as that in a numerical program. 
 Second, since much of the work in a COBOL program involves manipulating 
 data, it seems desirable to build these capabilities into the memory 
 [ST070] where this would avoid transferring large amounts of data back 
 and forth to special processors. Finally, our greatest speed improve- 
 ment can be expected to come from overlapping the processing of the 
 sets of input data. 
 
 1 .3 Assumptions and Restrictions 
 
 In attacking any problem of the potential magnitude of this 
 one, it is necessary to make some assumptions about the environment of 
 
 Our samples are described in Chapter 6, "Experimental 
 Results." 
 
the proposed solution and to set bounds on the degree to which we are 
 willing to modify the original COBOL programs. 
 
 An important consideration is the hardware available for im- 
 plementing the machine. We show in Chapter 7 that the machine proposed 
 in this paper could be built with components which are currently avail- 
 able or are within the capabilities of the current state-of-the-art. 
 In that chapter it is also indicated where capabilities which are not 
 yet available could be put to good use. 
 
 Another consideration is the software with which the compiler 
 is implemented. The compiler algorithms presented in this paper are 
 intended to demonstrate the feasibility of concurrent record process- 
 ing. In an actual implementation of these algorithms we assume that 
 the implementer would use techniques which take advantage of the capa- 
 bilities of the machine for which he was designing the compiler. Such 
 capabilities would include the ability to execute more than one instruc- 
 tion stream at a time. 
 
 For the purposes of this thesis we assume no extensions to 
 COBOL to facilitate the solution of the problem, although we suggest 
 some extensions in Chapter 9 that might be useful. We attack the 
 problem of speeding up COBOL programs which are presented to us as 
 they are now written for single processor sequential machines. This is 
 done for several reasons. First, concocting parallel-COBOL test pro- 
 grams for parallel machines could result in a yery small set of test 
 programs which is not representative of real data processing programs. 
 Second, were a machine such as the one proposed in this paper actually 
 put into use, it would have to be able to handle the huge number of 
 
previously existing programs in some way, or else force the users to 
 rewrite all of their programs. While many programs should be rewritten 
 to make best use of the abilities of the machine, we show in this paper 
 that many programs intended for a sequential machine can be made to run 
 well on a concurrent machine without requiring the user to modify the 
 programs. Third, while there is a constant quest for higher throughput 
 rates for business computers, a language extension which could improve 
 throughput but complicates programming is likely to be shunned. Quite 
 often in a business data processing environment the efficiency of a 
 program is less important than the ease with which a programmer, unfa- 
 miliar with the program, can make changes in it. To bring about the 
 kind of speed improvement possible in a concurrent machine, it is not 
 necessary to complicate a program by requiring that the programmer 
 insert additional instructions to control instruction sequencing. 
 These additional instructions can be inserted by the compiler and need 
 not appear in the language. Finally, because of the wide use of COBOL 
 and the normal conservative tendency of people to resist change, any 
 language radically different from COBOL would be slow to gain accep- 
 tance among business programmers. 
 
 In attempting a solution to an interesting problem, it is 
 easy to get carried away and to propose grandiose schemes which would 
 be costly to implement and might have relatively little applicability 
 to real programs. To avoid this pitfall we limit ourselves to adding 
 code to, and rearranging the code in the original COBOL program. We 
 do, of course, make use of the hardware we must introduce. We do not 
 attempt to transform the algorithm used in the program by attempting 
 
to discover the programmer's intentions and implementing them in a 
 better way than he did. Besides the obvious pitfalls inherent in 
 attempting to outprogram the programmer, such an approach could lead 
 to a very expensive system whose execution speedup was offset by a yery 
 long compilation time. Since programs are constantly being revised, 
 compilation cost is not to be ignored. Likewise, we do not try to 
 restructure the data files into forms more amenable to concurrent 
 processing. Again, the cost of restructuring each file on every run 
 to suit the needs of each program could eliminate any benefits derived 
 from the resulting faster execution. We do, in Chapter 9, point out 
 programming techniques and file structures which are good for concur- 
 rent processing and should be used by programmers in programming our 
 machine. 
 
8 
 
 2. DESCRIPTION OF THE METHOD 
 
 The technique described in this thesis achieves program 
 speedup by concurrently processing as many input records as possible, 
 while interlocking the processing to preserve any sequentially which 
 is essential to the correct operation of the program. This is done by 
 starting the processing of a record of input data as soon as it is 
 known which READ statement in the program is the next to be executed 
 (i.e.: when it is known what processing is to be done on the next 
 record). At those points in the program at which a sequential execu- 
 tion constraint exists, processing is suspended until the condition 
 inhibiting further processing has been removed. 
 
 To indicate how this technique works, consider Figure 2.1. 
 Figure 2.1(a) shows a program written for a single processor machine. 
 A record is read in block 1 to obtain values for A, B, and C. A test 
 is made in block 3 which compares A with its preceding value. If A 
 satisfies the test, X is computed in block 4 and written out in 
 block 5. In block 6 the value of A is saved. Figure 2.1(b) shows the 
 same program after we have inserted instructions to overlap processing 
 of different input records and to provide the necessary interlocking 
 between the concurrent processing. Block b causes the next record to 
 be read as soon as the decision at block 3 is made to remain in the 
 1-2-3-4-5-6 loop. Block d breaks this loop by releasing the hardware 
 used to process a record. Block a tests an interlock indication to be 
 certain that the correct value of OLD-A is used in block 3. Block c 
 
i 
 
 READ 
 A,B,C 
 
 2 ,, 
 
 D ^- B +C 
 
 I 
 
 X -*• D/A 
 
 5 \ 
 
 OLD-A-*- A 
 
 WRITE 
 ERROR-MSG 
 
 (a) 
 
 Figure 2.1 
 Concurrent Execution Example 
 
10 
 
 i 
 
 READ 
 A, B, C 
 
 I 
 
 D •+- B + C 
 
 I 
 
 WAIT UNTIL CORRECT 
 
 VALUE OF OLD-A 
 
 IS AVAILABLE 
 
 SIGNAL TO START 
 
 PROCESSING OF 
 
 NEXT RECORD 
 
 X <+- D/A 
 
 OLD-A 
 
 I 
 
 SIGNAL THAT OLD-A 
 HAS CORRECT VALUE 
 FOR NEXT RECORD 
 
 TERMINATE PROCESSING 
 
 OF THIS RECORD 
 
 1 
 
 WAIT UNTIL 
 
 OTHER RECORDS 
 
 TERMINATE 
 
 I 
 
 WRITE 
 ERROR-MSG 
 
 (b) 
 
 Figure 2.1 (continued) 
 Concurrent Execution Example 
 
11 
 
 releases the interlock as soon as OLD-A has been assigned its proper 
 value for the next record's processing. Block e is used to guarantee 
 that the processing occurring beyond that block is not entered until 
 all of the processing of preceding records is completed. 
 
 2.1 Types of Hardware Units Needed 
 
 To implement the speedup technique discussed in this paper 
 we need the following hardward units: 
 
 1) Multiple processors are needed. By the word 
 "processor" we do not mean a complete Central 
 Processing Unit. A processor, as referred to in 
 this paper, is either an Arithmetic Unit, a 
 Conditional Branch Tree (IF Tree) Processor, or 
 other type of special purpose unit. 
 
 a) In view of the fact that there may be many 
 records in process concurrently, we expect 
 enough demand for computation to require a 
 number of Arithmetic Units, even if there is 
 very little arithmetic per record. 
 
 b) A Conditional Branch Tree Processor [DAV72a] 
 is a device which accepts as input the results 
 of a collection of comparisons as a set of 
 Boolean values, and returns the identity of 
 the path to be taken for subsequent processing. 
 It has been shown [DAV72b] that such a processor 
 can select the appropriate exit point for up to 
 
12 
 
 an eight level tree of IF statements in about 
 two major clock cycles. The formation of 
 IF Trees is discussed in section 3.1. 
 c) Other types of processors include a unit used 
 to sort files, such as that described by 
 Batcher [BAT68], and a collection of I/O 
 processors. 
 
 2) To attain the necessary memory bandwidth to satisfy 
 the demands of a number of processors for data, we 
 need a number of memory units. In addition, more 
 memory units are needed to hold the program being 
 executed. We propose to separate the data memory 
 from the program memory so that there is no inter- 
 ference between them. This also allows the design 
 of the program memory to take advantage of the 
 fact that fetches of instructions tend to be from 
 locations relatively close together in memory 
 [C0F72]. The data memory can be designed to 
 include the capability of doing some types of pro- 
 cessing, such as replacing all occurrences of one 
 character by another, without the need to send the 
 data to another unit for processing. 
 
 3) To allow any processor to fetch data from any 
 memory and to allow transfers of data from any 
 memory to any other, we need some sort of Routing 
 Network. 
 
13 
 
 4) To control instruction sequencing, an Address 
 Counter is needed for each record being concur- 
 rently processed. When we reach a point, during 
 the execution of a program, at which another 
 input record could begin to be processed, we 
 activate a previously inactive Address Counter. 
 It then fetches instructions to perform the 
 processing for the next input record. When an 
 Address Counter completes the processing of a 
 set of input data, it is deactivated and returns 
 to a pool of units available for assignment to 
 subsequent records. 
 
 2.2 Address Counters and Interlocks 
 
 Since the use of multiple Address Counters is a key part of 
 this method of speeding up a program's execution, we examine here the 
 starting and stopping of Address Counters and the constraints under 
 which they must operate. 
 
 To start an inactive Address Counter into activity, an 
 Address Counter which is already active executes a FORK instruction. 
 One of the operands of the FORK instruction is the program location at 
 which the new Address Counter is to begin execution, which we call the 
 initiation point for that FORK. 
 
 While active, the jobs of the Address Counter are instruc- 
 tion sequencing and address calculation. An Address Counter fetches 
 instructions, computes the effective data addresses, and passes the 
 
14 
 
 instruction on to the rest of the machine for execution until one of 
 the following conditions arises: 
 
 1) A conditional branch instruction is encountered. 
 In this case the Address Counter generates the 
 request for the evaluation of the condition, then 
 awaits the result. It resumes execution at the 
 address computed from the information supplied by 
 the IF Tree Processor. 
 
 2) A QUIT instruction is encountered. 
 
 3) A HOLD instruction is encountered. Either of two 
 things causes the instruction stream to be resumed. 
 If the Address Counter was the only one active, it 
 
 is signalled to resume execution at the next instruc- 
 tion. If other Address Counters are still active, 
 this Address Counter halts as though it had encoun- 
 tered a QUIT instruction. The last active Address 
 Counter, after it executes a QUIT, resumes execu- 
 tion where the one executing the HOLD was halted. 
 
 4) An instruction is encountered which causes a value 
 to be transferred from one of the data memory units 
 into one of the Address Counter's index registers. 
 Execution resumes at the next instruction after the 
 value is received. 
 
 5) An instruction testing an interlock is encountered. 
 Execution resumes at the next instruction when the 
 
15 
 
 Address Counter is signalled that the interlocking 
 
 condition has been removed. 
 An Address Counter is restricted to only two classes of addressing in 
 the calculation of data addresses. The first is the class of addresses 
 which all Address Counters can access—those corresponding to common 
 variables. The second is the class of addresses which only a single 
 Address Counter can access—those corresponding to private variables. 
 To accomplish this separation of data we conceptually partition memory 
 into one area common to all Address Counters, and a number of private 
 areas, each accessable to only one Address Counter. By using base 
 registers containing the appropriate base addresses for the partitions 
 allowed to particular Address Counters, and by making the layout of 
 each copy of the private areas the same, we can easily implement this 
 partitioning. 
 
 In order to insure that the results of a program executed 
 using our concurrent record processing technique are correct, three 
 types of interlocks are needed in a program. 
 
 1) Those required to insure that instructions in the 
 same instruction stream which access the same 
 variable are executed in the correct sequence. 
 
 2) Those required to protect common variables which 
 can be modified by any instruction stream at any 
 time. It is necessary in this case to make sure 
 that only one instruction stream accesses such a 
 variable at a time. 
 
16 
 
 3) Those required to protect variables for which 
 sequential execution constraints exist. These 
 variables, such as OLD-A in Figure 2.1, must not 
 be accessed by an instruction stream until the 
 preceding instruction stream is no longer able to 
 access them. 
 The first type of interlock is obtained if we do not allow an 
 instruction to go into execution until all of its operands are available 
 for use. We show in Chapter 5 that this operation, and the others 
 necessary to handle this interlock problem, can be handled by a hard- 
 ware unit we call an Instruction Dispatch Unit. 
 
 For the second type of interlock, we could associate a bit 
 with each such variable and use it as a semaphore [DIJ65]. It turns 
 out, however, that the Instruction Dispatch Unit intended to handle the 
 first type of interlock problem also solves the second type of inter- 
 lock problem. It should be noted that no work by the compiler is 
 needed for either of these interlocks. 
 
 The third type of interlock problem does require both com- 
 piler algorithms and hardware to handle it. First, the variables which 
 require this type of interlock must be identified. These variables are 
 those which must be accessed by Address Counters (or, equivalently, 
 instruction streams) in the order in which the Address Counters are 
 activated. Then those blocks of code (nodes in the program graph) 
 which contain references to these variables must be identified. Finally 
 the compiler must insert instructions in appropriate places to test and 
 to release interlock indicators for each of these interlocked variables. 
 
17 
 
 These indicators are included in the circuitry of the Address Counter 
 Coordinator. We could implement this type of interlock by constructing 
 an unlocking function attached to the variable, but this could lead to 
 a problem. The simpler interlocks pose no problem with respect to 
 degrading performance by tying up resources while an instruction waits 
 for access to a variable. The reason for this is that the expected 
 length of the wait should be relatively short. For this third type of 
 interlock it is not sufficient that the variable is not being accessed; 
 it must no longer be able to be accessed by a given Address Counter's 
 predecessor in order for that Address Counter to be able to access the 
 variable. The length of the wait for that condition to be satisfied 
 could be quite long, especially if there are many active Address 
 Counters. With the interlock attached to the variable we could have 
 several statements per Address Counter which are half executed waiting 
 for locked variables, with intermediate results and unfi liable fetch 
 requests tying up a great deal of hardware. Instead, we do not allow 
 a block to enter execution until all of the interlocks attached to that 
 block are satisfied. Equipment is thus available to handle the pro- 
 cessing for the Address Counters which are not locked out of their data, 
 and we avoid a possible deadlock producing condition. 
 
18 
 
 3. COMPILER ALGORITHMS 
 
 This chapter is intended to demonstrate the feasibility of our 
 speedup technique by presenting a set of algorithms which could be used 
 to implement this process. The algorithms are presented in the same 
 order that they could appear in a compiler. 
 
 3.1 Source Text Scan 
 
 The two sections of a COBOL program which provide the bulk of 
 the information in which we are interested are the Data Division and the 
 Procedure Division, as they are named in the language [IBM72]. The 
 former describes the attributes of the files used by the program and 
 defines all of the variables used in the program. The Procedure 
 Division contains the executable instructions of the program. 
 
 During the scanning of the program we need to collect the 
 following information in addition to that normally collected by a com- 
 piler for a sequential machine. No great difficulty, however, is 
 entailed in accumulating this information since it is readily available 
 during the usual scanning process. 
 
 1) For each statement in the program we build two sets 
 of variables: 
 
 a) The identities of those variables fetched for 
 use in the statement comprise the input set, 
 or set of input variables, for the statement. 
 
19 
 
 b) The identities of those variables whose values 
 are set by the execution of the statement 
 comprise the output set, or set of output vari- 
 ables, for that statement. 
 2) A graph of the control flow of the program is built 
 from the contents of the Procedure Division, as is 
 often done for purposes of optimizing code. Each 
 statement in the program is represented by a single 
 node in the graph except for the following special 
 cases: 
 
 a) Contiguous assignment statements, such as 
 arithmetic and MOVE statements are lumped 
 together in a single node, so long as no other 
 type of statement or an intervening label is 
 encountered. Thus, a single node in the program 
 graph is generated for a block of assignment 
 statements. 
 
 b) PERFORM statements are expanded wherever possi- 
 ble. Where the PERFORM statement simply calls 
 for a single execution of a section of the pro- 
 gram, that section is copied into the program 
 in place of the PERFORM statement. Where there 
 is a fixed number of iterations specified in 
 the PERFORM statement, the performed section of 
 code is replicated with the appropriate value 
 of the iteration variable inserted for each 
 
20 
 
 replication. For more complex PERFORM state- 
 ments the performed section of code is copied 
 in place of the PERFORM statement and imbedded 
 in a construct similar to the PL/1 DO block. 
 These blocks are handled by a compiler as in 
 the FORTRAN analyzer described by Kuck et al 
 [KUC72b]. 
 3. For each variable used in the program two sets have 
 to be constructed. 
 
 a) The set of input references is the set of 
 statements for which the variable appears as 
 an input variable. 
 
 b) The set of output references is the set of 
 statements for which the variable appears as 
 an output variable. 
 
 The compiler can do two things during the construction of the 
 internal data base that can yield a speedup at little additional cost. 
 
 The first is to forward substitute [KUC72b] within a block of 
 assignment statements. In this technique, any occurrence of an output 
 variable as an input variable in a subsequent statement is replaced by 
 the expression in the assignment statement for that output variable. 
 For example, the block of assignment statements 
 
 A + B + C + D 
 E +■ A + F 
 G «- H + E + A 
 
21 
 
 would become, after forward substitution 
 
 A + B + C + D 
 
 E^B + C + D + F 
 
 G«-H + B + C + D + F + B + C + D 
 In the latter case, unlike the former, there are no interdependences 
 between the statements in the block, so that all three statements could 
 be executed in parallel. 
 
 The second thing the compiler can do during this phase to 
 improve execution speed is to form IF Trees. In this technique we 
 combine individual IF statements into a tree structure which can be 
 executed by an IF Tree Processor. Unlike Davis [DAV72a, DAV72b], 
 however, when we are building an IF Tree, we do not move all assignment 
 statements from within the tree upward to a point ahead of the tree. 
 Instead, we move upward all statements upon which the execution of the 
 conditional branches in the tree depend. All other statements are 
 moved down to be collected at the exits from the tree. This is 
 illustrated in Figure 3.1. Figure 3.1(a) shows a collection of 
 conditional branch statements with assignment statements occurring 
 between them. Figure 3.1(b) shows the IF Tree and associated assign- 
 ment blocks we form. The conditions have been transformed into 
 assignments of logical values to a set of temporary variables &1, &2, 
 and &3 which we call the conditional result set . This result set is 
 used by the IF Tree Processor to determine which exit is to be used. 
 The identity of the exit is then used by an Address Counter to select 
 the next instruction to be executed. 
 
22 
 
 L 
 
 S «*- T 
 
 ■(a) 
 
 
 1 
 
 61 
 
 **- A = B 
 
 &2 
 
 **-D<E 
 
 &3 
 
 *«-F>G 
 
 (b) 
 
 Figure 3.1 
 IF Tree Example 
 
23 
 
 3.2 Phase and Link Identification 
 
 A program typically consists of a collection of loops con- 
 nected by code which is not included in the loops. There can, of 
 course, be many loops within the outer loops. In terms of the program 
 graph, we define a phase to be a maximal strongly connected set of nodes 
 [CHE71b]. That is, a phase is defined in such a way that any node in 
 the phase can be reached from any node in the phase (including itself) 
 by way of some directed path in the program graph. Any node not found 
 in a phase is in a link . In terms of program execution, control remains 
 within a phase until a link is entered. Once a link is entered, control 
 never re-enters the exited phase. 
 
 We are particularly interested in phases for a couple of 
 reasons. Obviously, the address mapping used for variables referenced 
 within a phase must be invariant within the phase to avoid ambiguities 
 in calculating data addresses. Different mappings can be used in dif- 
 ferent phases. More importantly, we are concentrating our efforts on 
 speeding up the execution of a phase rather than of a link because the 
 link is executed no more than once, while the number of times the code 
 in a phase is executed is potentially \/ery large. 
 
 Identification of phases is simply the problem of identifying 
 maximal strongly connected subgraphs in the program graph. An algorithm 
 for this problem has been given by Ramamoorthy [RAM66], and a more effi- 
 cient technique has been found by Chappell [CHA69]. 
 
24 
 
 3.3 Statement Migration 
 
 It has been found during our analyses that a program can be 
 prevented from being sped-up as much as possible because the programmer 
 happened to code a crucial instruction at a point late in the program, 
 while it was actually possible to have placed the instruction earlier 
 in the instruction stream. On a sequential machine this is no problem. 
 When such an instruction involves the assignment of a value to an inter- 
 locked variable, however, this causes the associated interlock on our 
 concurrent machine from being released as early as it could be released. 
 This likewise unnecessarily delays the processing of data by succeeding 
 Address Counters. 
 
 Consider, for example, Figure 3.2. The loops in Figure 3.2 
 are identical except for the location of the assignment of SEQ to OSEQ. 
 In Figure 3.2(a), if an Address Counter is waiting to execute the condi- 
 tional branch, it cannot be allowed to proceed until its predecessor has 
 executed the assignment of DATA to WDATA, written out WDATA, and 
 assigned OSEQ the proper value. In Figure 3.2(b), the assignments can 
 both take place concurrently, thus requiring a shorter wait by an 
 Address Counter before the value of OSEQ is set. Since the speedup 
 attainable in a situation such as that in Figure 3.2(b) is potentially 
 much greater than in one such as that in Figure 3.2(a), it is worth our 
 while to reorder statements so that they are executed as early as 
 possible. 
 
 Because we migrate statements, instructions whose operands are 
 not available until late in the instruction stream tend to be placed 
 after instructions whose operands are available earlier. Because of 
 
25 
 
 < tu 
 Q V) 
 
 < o 
 
 8° 
 
 o 
 or k 
 
 ^ UJ 
 
 O- 
 E 
 CO 
 X 
 
 CO c 
 
 o 
 
 CO •!- 
 
 
 < 
 
 
 h- 
 
 
 Q < 
 
 
 < Q 
 
 k 
 
 UJ . 
 
 
 UJ 
 
 
 <0 
 
 
 s 
 
 
 
 < 
 
 < 
 
 
 
 
 ♦ 
 
 
 
 
 A 
 
 /c 
 
 L 
 C 
 
 \ 
 
 
 < 
 
 < 
 
 i- 
 Z5 
 
 en 
 
 +-> 
 re 
 
 s- 
 
 C7> 
 
 4-> 
 
 C 
 CD 
 E 
 <D 
 +-> 
 fO 
 
 +J 
 
 n3 
 
26 
 
 this ordering, statements do not usually wait for a long time in the 
 Instruction Dispatch Unit for their operands to become available, thus 
 reducing the amount of queue space required in that unit. 
 
 When we migrate statements, we only move those statements 
 which change the values of variables. Such statements as IF and WRITE 
 should not be moved. 
 
 The algorithm which follows is a modified version of one 
 reported by Foster and Riseman [F0S72a, F0S72b]. 
 
 Algorithm 3.1 - Statement Migration 
 
 1) Start at the head node, corresponding to the first 
 entry point, of the program graph. 
 
 2) Compute the earliest possible dispatch time, t ., 
 for each of the output variables. This is done by 
 computing the execution time, t , for the statement 
 (minimum tree height for blocks of assignment state- 
 ments in [KUC72b]) and finding the maximum of the 
 dispatch times for the input variable set 
 
 t m = max it d ( input variables )t • 
 
 Then 
 
 t . = t + t . 
 d e m 
 
 The dispatch time for a variable is, thus, the 
 earliest time along a particular path in the pro- 
 gram graph that the variable is available for use 
 as an input variable. 
 
27 
 
 3) If the node under consideration is the destination 
 of one or more branch instructions, examine the 
 locations of all branches to this node. There are 
 two possibilities for each. Either the branch is 
 looping back to an earlier point on the path 
 reaching it, or the branch causes the reconvergence 
 of paths which separated at an earlier point in the 
 processing. In the first case we do not attempt 
 further migration since we could end up moving this 
 block endlessly around the loop without real gain. 
 If all of the paths are reconvergent, we attempt to 
 carry out the migration process along all of the 
 paths. If we are able to migrate a statement up 
 any of the paths, then we move the statement into 
 the other paths as well. In Figure 3.3(b) the 
 assignment statement G ■*- B + D can be migrated 
 farther up the first two paths, but it cannot be 
 migrated farther up the third path because of the 
 conflict between the input variable set of the 
 WRITE statement and the output variable set of the 
 assignment statement, as discussed in step (5) 
 below. 
 
 4) If the node prior to the current node corresponds 
 to a conditional branch, we can migrate a state- 
 ment upward past the conditional branch only if 
 the same statement appears at all destinations of 
 
28 
 
 A 
 
 i 
 
 1 
 
 A <+- B+C 
 
 
 D «*- E+F 
 
 
 WRITE 
 
 
 
 
 
 
 
 G 
 
 
 -<'- 
 
 
 
 \ ' 
 
 
 
 G -*- B+D 
 
 
 
 1 
 
 
 
 
 
 (2 
 
 
 
 
 
 
 1 
 
 
 1 
 
 
 1 
 
 A -*- B + C 
 
 G «*- B + D 
 
 
 D -*- E + F 
 G -*- B+ D 
 
 
 WRITE 
 G 
 
 
 
 
 
 
 
 
 
 ,J 
 
 
 G -*- B + D 
 
 
 '-- 
 
 
 
 
 
 
 i 
 
 1 
 
 
 
 
 (b) 
 
 Figure 3.3 
 Statement Migration Example 2 
 
29 
 
 the branch. In Figure 3.4(a) the assignment to 
 A of B + C occurs on all paths leading from the 
 conditional branch. Also, as discussed in 
 step (5), there is no conflict between the output 
 set of this assignment and the input set of the 
 conditional branch instruction. Thus, this assign- 
 ment statement can be migrated up past the condi- 
 tional branch as shown in Figure 3.4(b). In the 
 same example, two paths from the conditional branch 
 contain the assignment of -3 to E. The third 
 branch, however, does not contain this statement. 
 Until the conditional branch is executed, it is 
 not known which value E takes on. This prevents 
 us from migrating this assignment statement. 
 5) For assignment statements there are two cases to 
 consider. The block of assignment statements may 
 be preceded by another block of assignment state- 
 ments or by some other type of block. If the 
 preceding block is another block of assignment 
 statements, the two blocks should be concatenated, 
 subject to the constraints in step (3). If the 
 preceding block is of another type, an assignment 
 can be moved ahead of the predecessor if the 
 following are true: 
 
 a) The dispatch time of the assignment state- 
 ment is less than that of the predecessor. 
 
30 
 
 A -*- B + C 
 E -*- - 3 
 
 T 
 
 A -*- B+ C 
 
 E -*- E + G 
 
 T 
 
 (a) 
 
 1 
 
 A -*- B + C 
 
 E **- -3 
 
 I 
 
 E-*- - 3 
 
 T 
 
 E -*- E + G 
 
 T 
 
 (b) 
 
 Figure 3.4 
 
 Statement Migration Example 3 
 
31 
 
 b) The relation 
 
 (I. n 0.) » (0. n I..) u (0. n Oj) = cf> (3.1) 
 
 is satisfied [BER66, RUS69], where I i and 0.. 
 are the input and output variable sets for the 
 assignment statements, I. and 0. are the input 
 and output variable sets for the predecessor, 
 and tj) denotes the empty set. With test (a) 
 having been performed, the test 
 0. - I . = * 
 
 is redundant. Relation 3.1 thus reduces to 
 
 0. n (I. u 0.) - <J> . (3.2) 
 
 6) If the movable node under consideration corresponds 
 to other than a block of assignment statements, it 
 can be moved ahead of its predecessor if the follow- 
 ing are satisfied: 
 
 a) Its dispatch time is less than that of all 
 statements in the previous block. 
 
 b) Relation 3.2 is satisfied, where I. and 0. are 
 input variable sets for the movable node, and 
 0. is the output variable set for the 
 predecessor, 
 
 7) If any assignment statements were moved in step (5), 
 forward substitute them if possible in their new 
 position. Then continue working them up the path 
 starting at step (2). 
 
32 
 
 8) If the statement moved is not an assignment state- 
 ment, continue migration with step (3). 
 
 9) If no migration was done in steps (5) or (6), 
 attempt migration starting at step (2) with the 
 next node which has not been examined for migra- 
 tion on the path. At a conditional branch, take 
 one of the paths emanating from it and put the 
 identities of the initial nodes of the rest of 
 the paths into a queue. 
 
 10) If there is no further node on the path which 
 
 has not been examined for migration possibilities, 
 take the next node from the queue built in 
 step (9). 
 
 11) If the queue is empty, the algorithm is completed. 
 
 3.4 Variable Type Identification 
 
 We next separate the set of variables referenced in a phase 
 into four classes. Identifying these classes of variables accomplishes 
 two things. First, it identifies those for which interlock instructions 
 must be generated. These variables are required to be accessed by 
 instruction streams in the order in which the streams are started into 
 execution, that is, variables for which sequential execution constraints 
 exist. Second, we can identify the private and common sets of variables. 
 The former have to be provided for each Address Counter, while the latter 
 are shared by all Address Counters. 
 
 The four classes in which we are interested are the following: 
 
33 
 
 1) Constants . These are storage locations whose values 
 are set outside of the phase under consideration. 
 Constants do not impose sequential execution con- 
 straints because they are never assigned values 
 during the phase. Thus they may be accessed by 
 Address Counters in any order. 
 
 2) Local variables . During the execution of a phase 
 a separate copy of each of these variables is 
 maintained by each of the active Address Counters. 
 Separate copies are needed since these variables 
 include the ones being used to contain the data 
 from several records undergoing processing 
 concurrently. Local variables do not impose 
 sequential execution constraints since each 
 Address Counter has its own copy of the Local 
 variable set for the phase and no Address Counter 
 can change the value of another Address Counter's 
 Local variables. 
 
 3) Reference Independent variables . The remaining 
 variables in the program are shared by all Address 
 Counters active in the phase. All of them must be 
 protected by interlocks to guarantee that they 
 have correct values when referenced. The Reference 
 Independent variables have less stringent require- 
 ments for their use than the Reference Dependent 
 variables described below. Reference Independent 
 
34 
 
 variables characteristically are modified during 
 the phase, but their values do not influence the 
 choice of paths through the program. For example, 
 a counter which is incremented for each input 
 record read is of this type. There is no sequential 
 execution constraint generated by the presence of 
 a Reference Independent variable in a phase since 
 the only place the value can be tested is beyond 
 the range of the phase. Only the final value of 
 the variable must be correct; intermediate values 
 are never examined. We must, of course, require 
 that only one Address Counter at a time have 
 access to the variable, but this can be implemented 
 without including interlock instructions in the 
 program by using the Instruction Dispatch Unit 
 described in section 5.4. 
 4) Reference Dependent variables . These are the vari- 
 ables for which sequential execution constraints 
 exist. Included in this class of variables are the 
 files used in the phase. No Address Counter is 
 allowed to access one of these variables until the 
 nearest active predecessor to that Address Counter 
 is no longer able to access that variable. The 
 following section of COBOL code demonstrates the 
 need for interlocks on these variables: 
 
35 
 
 1 MOVE DATA INTO PRINT-LINE. 
 
 2 IF LINE-COUNTER > 60 THEN 
 
 3 WRITE PRINT-FILE FROM PAGE-HEADER 
 
 4 AFTER POSITIONING NEW-PAGE LINES, 
 
 5 MOVE ZERO TO LINE-COUNTER. 
 
 6 WRITE PRINT-FILE FROM PRINT-LINE 
 
 7 AFTER POSITIONING 1 LINES. 
 
 8 ADD 1 TO LINE-COUNTER. 
 
 Since the variable LINE-COUNTER is tested in line 2, 
 we must force Address Counters to access LINE-COUNTER 
 in the order in which the Address Counters were acti- 
 vated, and make each wait until its predecessors no 
 longer alter the value of LINE-COUNTER. Otherwise it 
 could not be guaranteed that each Address Counter 
 would follow its proper path through this section of 
 code. 
 To accomplish variable classification, we make use of the set 
 
 of input references, I, and the set of output references, 0, for each 
 
 variable. 
 
 Algorithm 3.2 - Variable Type Identification 
 
 1) If I and are both empty, the variable is not 
 referenced during the phase and can be discarded. 
 
 2) If the variable is the name of a file, it is 
 considered a Reference Dependent variable. 
 
36 
 
 3) If the variable appears as an argument in a CALL 
 statement, it is considered a Reference Dependent 
 variable. 
 
 4) If is empty, then the variable is never assigned 
 a value during the execution of the phase. The 
 variable is thus a Constant during the phase. 
 
 5) If I is empty or I = 0, then the variable is never 
 used in a conditional branch test (i.e.: I contains 
 no elements not in 0), so that it does not determine 
 the flow of control in the phase. Thus the variable 
 is a Reference Independent variable for the phase. 
 
 6) If, for any path through the phase from one primary 
 READ statement to another primary READ statement, 
 the variable appears as an input variable before it 
 appears as an output variable, then it is a 
 Reference Dependent variable. 
 
 7) If none of the above conditions is met, the vari- 
 able is a Local variable. 
 
 8) If any item in a record has been made a Reference 
 Dependent variable, then all the items in that 
 record must also be made Reference Dependent to 
 insure that the whole record is available with the 
 correct values assigned when it is to be written 
 out. Otherwise part of the record could be lost 
 when a predecessor to the outputting Address 
 Counter is deactivated and releases its storage. 
 
37 
 
 Since one of the conditions tested in this algorithm must 
 hold for each variable, but no more than one condition per variable, it 
 is possible to uniquely determine to which class a variable should be 
 assigned. 
 
 3.5 Storage Assignment 
 
 Within each block of statements we would like to fetch and 
 store all variables without storage access conflicts. We also would 
 like, while avoiding access conflicts, to assign source and destination 
 locations for a data movement (COBOL MOVE instruction) to the same 
 memory unit to avoid needlessly using the Inter-Memory Bus. We also 
 would like to group elements of a data structure which can be fetched 
 together into the same memory word. To accomplish these objectives, we 
 assign variables to memory units according to Algorithm 3.3. In this 
 algorithm, the following four sets of variables are constructed for 
 each variable v: 
 
 D - The Data Division Affinity Set. This is the set of 
 
 v J- 
 
 variables which appear in the same record description 
 in the Data Division of a COBOL program. By assign- 
 ing v to the same word as an element of D , we can 
 fetch both items in the same memory cycle, and we 
 can also simplify the transfer of information 
 between the Data Memory and the I/O Processors. 
 A - The Procedure Division Affinity Set . This is the 
 
 set of variables with which we would like to group v. 
 Variables are grouped in A because of their 
 
38 
 
 relation to v in statements in the Procedure 
 Division of the program. 
 
 S - The Segregation Set . This set consists of vari- 
 ables from which we must separate v in assigning 
 memory units to avoid access conflicts. 
 
 Q - The Indefinite Set . This set consists of vari- 
 ables which we would like to put into the same 
 memory word as v if possible; but, if it is not 
 possible, we must place them in separate memory 
 units. 
 
 Algorithm 3.3 - Storage Assignment 
 
 1) Passing through the Data Division of the program, 
 form D for each variable v. Include in D all 
 variables appearing in the same record description 
 as v. 
 
 2) Passing through the Procedure Division of the 
 program, form the A, S, and Q sets for each 
 variable. 
 
 a) If u, v e I (where I is the set of input vari- 
 ables for a block of code), put u into S . 
 However, if u e (A u D v ), put u into Q y . 
 Similarly, put v into S , unless v e (A u D u ), 
 in which case put v into Q . 
 
 b) If u, v e (where is the set of output 
 variables for a block of code), then put u into 
 S and v into S . 
 
39 
 
 c) If u e I. and v e 0. (where I. and 0. are 
 the input and output sets for statement i), 
 then put u into A . However, if u e S , put 
 u into Q . Similarly, put v into A unless 
 v e S , in which case put v into Q . 
 
 3) For each variable v examine A , S , and Q . 
 
 a) If 3 ueO and u e S , then delete u from 
 
 v 
 
 b) If 3 u e Q and u e A , then delete u from 
 
 V 
 
 4) Assign variables to memory units. We do this by 
 examining S , A , D , and Q for each variable v 
 in turn. 
 
 a) Arbitrarily assign some variable to memory 
 unit 1. A heuristic for selecting this vari- 
 able is to use the one with the largest S set. 
 
 b) For each variable u e S and assigned to 
 memory unit m , mark m unavailable to v. 
 
 c) If all of the memory units to which variables 
 have been assigned are marked unavailable to 
 variable v, assign v to a previously unassigned 
 memory unit. 
 
 d) For each variable u e A and assigned to 
 memory m , determine whether or not m is 
 available to v. Assign v to the memory, in 
 the set of available m units, which has the 
 
40 
 
 fewest words of the appropriate type (common 
 or local ) assigned. 
 
 e) If v is not assigned in step (b), then for 
 each variable u e I assigned to memory 
 unit m , compute 
 
 L = length (v) + length (u 1 ) . 
 We introduce u' to represent u and all other 
 items assigned to the same word as u. That is, 
 if s and t are assigned to the same word, 
 
 s' = t' = {s,t} , 
 and 
 
 length (s 1 ) = length (f) = length (s) + length (t) 
 If L £ length (1 memory word), assign v to the 
 same word as u. Otherwise mark m unavailable 
 to v. 
 
 f) If v is not assigned in one of the steps above, 
 then for each variable u e D and assigned to 
 memory unit m , compute 
 
 L = length (v) + length (u 1 ) . 
 If L < length (1 memory word), assign v to the 
 same word as u. 
 
 g) If no assignment is made for a variable during 
 steps (a), (c), (d), (e), or (f), mark the vari- 
 able "unassigned" and start processing the next 
 variable at step (b). 
 
41 
 
 h) After the first pass through the variables, 
 try again to assign the variables marked 
 "unassigned" in step (g). Iterate this step 
 until either all variables are assigned, or 
 until no variable is assigned during the 
 iteration. 
 
 i) If some variables are still unassigned, 
 
 assign them to memory units in such a way as 
 to balance the number of words used for each 
 type (common and local) across the memories. 
 5) Calculate the address function for each variable. 
 
 We do not present an algorithm here, but, instead, 
 
 present the following remarks which are germane to 
 
 this problem. 
 
 a) Constants, Reference Independent, and Reference 
 Dependent variables are assigned to the common 
 area of each memory unit, using the same base 
 register. Local variables are assigned to a 
 replicated area using a different base register 
 or set of base registers. 
 
 b) It is helpful in assigning memory locations to 
 locate variables, which are in each other's 
 affinity sets, at the same displacements from 
 the start of a memory word to allow data to be 
 moved without the need for shifting to align 
 the data with the destination of the move. 
 
42 
 
 c) Since links are transitions from one phase to 
 another, we need instructions in the links to 
 move data items used in both phases from their 
 locations in the storage mapping of the exited 
 phase to their locations in the storage mapping 
 of the entered phase. Generating these 
 instructions and the storage mapping for the 
 links after the storage allocation for the 
 phases has been done should not present any 
 great difficulties. 
 
 While this algorithm has generated good storage allocations 
 for the programs we have analyzed, no claim of optimal ity is made for 
 it. 
 
 3.6 Positioning FORK, HOLD, and QUIT Instructions 
 
 We define the FORK, HOLD, and QUIT instructions as follows: 
 FORK - When a FORK instruction is encountered by an 
 
 Address Counter, it causes another Address Counter 
 to start executing at the program address included 
 in the FORK instruction. We refer to this address 
 as the initiation point for that FORK. 
 QUIT - When a QUIT instruction is encountered in an 
 
 instruction stream, it results in the release of 
 the private storage that had been assigned to the 
 Address Counter executing the instruction. That 
 Address Counter then returns to the pool of 
 
43 
 
 inactive Address Counters available for assign- 
 ment to new processing work. 
 HOLD - If the Address Counter executing a HOLD is the 
 
 last active Address Counter, it executes the next 
 instruction in its instruction stream. If it is 
 not the only active Address Counter, the location 
 of the HOLD instruction is saved and the Address 
 Counter is released. The last active Address 
 Counter then resumes processing at the instruc- 
 tion following the HOLD instruction after it 
 executes a QUIT instruction. Our rules for 
 inserting FORK instructions guarantee that only 
 one HOLD instruction can be executed to leave a 
 phase. Note that the HOLD instruction is not 
 the same as the JOIN instruction defined by 
 Conway [C0N63]. In Conway's machine, only the 
 n processor to reach a JOIN instruction is 
 allowed to proceed beyond it, where n is set by 
 a FORK instruction. In our machine, the last 
 active Address Counter executes the code follow- 
 ing the HOLD instruction. 
 The idea of using FORK instructions to initiate parallel 
 processing is not a new one [C0N63]. Usually it is proposed [0PL65, 
 AND65, WIR66] that the programmer insert these instructions into his 
 code at places he believes will yield correct parallel operation. The 
 type of concurrency we are attempting to exploit, however, leads to 
 
44 
 
 very simple rules for inserting FORK, HOLD, and QUIT instructions so 
 that this can be done by the compiler. Note that we make no assump- 
 tions when inserting these instructions about the independence of the 
 processing that may coincide in time during program execution. 
 
 Our goal in inserting FORK instructions is to cause the next 
 input record to enter processing as early as possible. Only when a 
 path has been selected to a specific READ statement can the appropriate 
 FORK be executed. A FORK, then, is always located after a conditional 
 branch instruction which selects between paths leading to different 
 READ statements. 
 
 In a program involving more than one input file, we select 
 only one of the files as the one from which we concurrently process 
 records. This file is referred to as the primary input file . An 
 initiation point is associated with each READ statement accessing the 
 primary input file. READs of other files are not specially handled. 
 We want, as the primary input file, the file which in some way controls 
 the processing, such as a "finder" deck identifying records to be 
 selected from another file, or an update deck which selects particular 
 records from a master file for updating. Because this controlling file 
 is typically accessed less often than, say, a master file, a programmer 
 tends to put READ statements for the controlling file into the outer 
 loop of a phase. Heuristically, by selecting the first file encoun- 
 tered in the outermost loop of a phase as the primary file, we did not 
 select the wrong one in any of our sample programs. 
 
 * 
 We consider the end-of-file option of a READ statement to be 
 
 a conditional branch instruction. 
 
45 
 
 To locate the position at which we want to place the FORK 
 instruction, we use the following algorithm for each primary READ 
 statement. 
 
 Algorithm 3.4 - FORK Insertion 
 
 1) Starting at the node in the program graph 
 corresponding to the primary READ statement, 
 follow one path within the phase at a time 
 backward. Ignore any node which does not 
 correspond to a conditional branch. 
 
 2) At a conditional branch, 
 
 a) If the paths leaving the conditional 
 branch lead to more than one different 
 primary READ statement, or any path has 
 not yet been traced, position a FORK 
 instruction on the path we are following 
 at a point immediately after the condi- 
 tional branch, setting the initiation 
 point address to the appropriate state- 
 ment as in Algorithm 3.5. 
 
 b) If all of the paths from the conditional 
 branch reach the same primary READ, 
 follow back along the path(s) entering 
 the conditional branch as in step (1). 
 
46 
 
 To locate the initiation point for a READ statement, we must 
 examine the node immediately preceding the READ on each path to it. 
 
 Algorithm 3.5 - Initiation Point Identification 
 
 1) If the block preceding the READ is not a block of 
 assignment statements, then the initiation point 
 address is the address of the READ statement. 
 
 2) If the block preceding the READ is a block of 
 assignment statements, we have to split it into 
 two blocks as follows. Since forward substitution 
 was done as a part of the source text scan, each 
 
 of the statements in the block of assignment state- 
 ments is independent of the rest. We can thus 
 reorder them so that all of the statements assign- 
 ing values to Local variables follow the statements 
 having other types of variables as output variables. 
 This block is then split to put the assignments to 
 Local variables into a separate block which follows 
 the remainder of the block. The initiation point 
 address is the address of this block of assignments 
 to Local variables. This modification of the orig- 
 inal block of assignment statements is necessary to 
 insure that all initialization needed is done. 
 
 Algorithm 3.6 - QUIT Insertion 
 
 1) The QUIT instructions are located after all terminal 
 nodes in the program. Terminal nodes, which include 
 
47 
 
 such instructions as STOP and GOBACK, are nodes 
 whose execution on a single Address Counter machine 
 would cause termination of the program. 
 2) QUIT instructions are positioned immediately before 
 all initiation points. 
 
 Algorithm 3.7 - HOLD Insertion 
 
 At the entries to links, other than the link from the 
 program entry point, we position HOLD instructions to 
 insure that all of the processing in a phase is com- 
 pleted before the processing in the link and the subse- 
 quent phase is entered. 
 
 Figure 3.5(a) illustrates a simple program before FORK, HOLD, 
 and QUIT instructions are inserted. Figure 3.5(b) shows the same pro- 
 gram after these instructions are inserted. 
 
 3.7 Inserting Interlocks 
 
 In Chapter 2 it was pointed out that there are three differ- 
 ent types of interlocking problems. It was also noted that two of 
 these problems could be solved by the use of an Instruction Dispatch 
 Unit, which is discussed in section 5.4. Since compiler algorithms are 
 not needed to handle these two problems, we need concern ourselves now 
 with only the third type of interlock. 
 
 We need to insert two types of instructions for each inter- 
 locked variable, RELEASE and TEST. For each path from a primary READ 
 to a QUIT we want to insert a TEST immediately before the first use of 
 
48 
 
 00 o 
 
 ♦ * 
 
 
 < < 
 
 c 
 o 
 
 •I— 
 
 s- 
 
 CD 
 to 
 
 c 
 
 
 Q 
 
 _l 
 O 
 
 o 
 
 ra 
 
 
 
 2 1 
 
 m o 
 
 
 o \ 
 
 H 
 
 
 
 -H OJ 
 
 — ► 
 
 WRI 
 
 TREC 
 
 HEAI 
 
 < < 
 
 
 
 
 3 
 
 
 
 O i 
 
49 
 
 the interlocked variable on that path. Similarly, for each path from a 
 primary READ to a QUIT we want to insert a RELEASE immediately after the 
 last use of the interlocked variable on that path. 
 
 Algorithm 3.6 - Interlock Instruction Insertion 
 
 1) Working backward from each QUIT, for each Reference 
 Dependent variable except the primary input file, 
 examine each block for a reference to the Reference 
 Dependent variable. Immediately after the first 
 such block, insert a RELEASE for that variable. At 
 a conditional branch, put a RELEASE immediately 
 after the exit from the conditional branch unless 
 all exits have RELEASES for the same variable. In 
 this case delete all of those RELEASES and continue 
 following the path backward. If the READ statement 
 is reached before a RELEASE instruction is posi- 
 tioned on that path, insert it immediately after 
 the READ. 
 
 2) Working forward from each initiation point, for 
 each Reference Dependent variable except the pri- 
 mary input file, examine each block for a reference 
 to the Reference Dependent variable. Immediately 
 ahead of the first such block, insert a TEST for 
 that variable. 
 
 As an example of this process consider Figure 3.6. We assume 
 in this example that the variable C and the file, FILE-0, associated 
 
50 
 
 with OUTREC, are Reference Dependent variables. Figure 3.6(a) shows the 
 
 program after FORK, HOLD, and QUIT instructions have been inserted. 
 
 Figure 3.6(b) shows the same program after TEST and RELEASE instructions 
 
 have been inserted. Note in Figure 3.6(b) that only one TEST and one 
 
 RELEASE have been inserted for each variable on each path. 
 
51 
 
 I 
 
 yes 
 
 WRITE 
 OUTREC FROM 
 HEADER 
 
 WRITE 
 
 OUTREC FROM 
 
 A 
 
 (a) 
 
 Figure 3.6 
 Interlock Insertion Example 
 
52 
 
 yes 
 
 < 
 
 RELEASE C 
 
 <Qtest file-o^> 
 
 > 
 
 WRITE 
 
 OUTREC FROM 
 
 HEADER 
 
 WRITE 
 
 OUTREC FROM 
 
 A 
 
 ^RELEASE FILE-O ^> 
 
 (b) 
 
 Figure 3.6 (continued) 
 Interlock Insertion Example 
 
53 
 
 4. PROOF OF THE METHOD 
 
 In section 3.4 we demonstrated that it is possible to 
 unambiguously separate the set of variables used in a phase into four 
 subsets. It was further shown that only one subset, the set of 
 Reference Dependent variables, cause the existence of sequential exe- 
 cution constraints. In order to show that our method of executing a 
 program will yield the same output as a single processor sequential 
 machine, we need to prove the following: 
 
 1) If our machine and a single processor both access 
 nodes containing references to Reference Dependent 
 variables in the same order, then both yield the 
 same output. 
 
 2) The interlocking method we proposed in section 3.7 
 guarantees the proper sequence of Reference 
 Dependent variable accessing. 
 
 In developing these proofs we are concerned with only an indi 
 vidual phase since this is where we apply our method to achieve a 
 speedup. 
 
 4.1 Theorem 4.1 
 
 Given the sets of references to Reference Dependent variables 
 in a phase, a single processor machine and a multiple Address Counter 
 machine both yield the same output if both machines execute, in the 
 same order, nodes containing references to Reference Dependent variables 
 
54 
 
 Consider, first, the processing of two consecutive records as 
 it is done by a single processor machine. The first record follows some 
 path through the program executing a sequence of nodes: 
 
 S l = n lT n 12' n 13' •'• ■ n li 
 until the flow of control in the program causes the next record to be 
 read. The second record then follows a path through the program execu- 
 ting a sequence of nodes: 
 
 Thus the processing of the two records requires the execution of a 
 sequence of nodes: 
 
 S 12 == n H' n 12' n 13' •'• » n li' n 2T •" ' n 2j * 
 However, we found previously that not all of the variables involved in 
 the execution of these nodes caused problems in concurrently processing 
 the data. In fact, we need consider only those nodes which contain 
 references to Reference Dependent variables. The sequence of nodes S, 2 
 then reduces to the sequence: 
 
 S = n-j , n£, r\y ... , n^ 
 
 when we omit any node in S,p that involves only Constants, Local vari- 
 ables, and Reference Independent variables. As long as the nodes in 
 set S are executed in the sequence given in S the results of the execu- 
 tion are the same regardless of what the method is. 
 
 4.2 Theorem 4.2 
 
 Given the execution sequence S from Theorem 4.1, the inter- 
 locking method proposed in section 3.7 preserves this sequence. 
 
55 
 
 For the case of the Reference Independent variables in the 
 program, we do not have to worry that accessing them affects the output 
 unless more than one Address Counter can access the variable at one 
 time. The Instruction Dispatch Unit discussed in section 5.4 prevents 
 this occurrence. 
 
 For Reference Dependent variables we must return to the dis- 
 cussion of the required sequence, S, for the execution of nodes contain- 
 ing Reference Dependent variables. Note that the sequence S, ? is 
 composed of two sequences: 
 
 S 12 = V S 2 
 where S-. and S 2 are defined as above. Since S is formed from S,« by 
 dropping nodes with no rearrangement, it follows that S is composed of 
 two subsequences: 
 
 S = S\ S" 
 where S' 5 S, , and S" E$o' Consider now the sequence of nodes, 0*. , 
 in which a variable, v., is referenced. 
 
 
 ' Q i 
 
 a. c. S 
 
 
 °y ^ s 1 
 
 t 
 
 0." c S' 
 
 
 It is apparent that two different Reference Dependent variables, v. and 
 v., can be accessed independently of each other and their sets of nodes, 
 
 J 
 
 a. and a., can be executed concurrently unless there is a node, n , 
 which is common to both sequences. When node n is encountered during 
 the execution of one of the sequences, the execution of that sequence 
 
56 
 
 is halted until the execution of the other sequence reaches n , 
 whereupon both sequences are continued in execution. Thus it is possi- 
 ble to rearrange sequence S into a sequence having three parts. The 
 
 first part is composed of the portions of o. and a. preceding n , 
 
 I j c 
 
 interleaved in any manner. The second part is n . The third part is 
 
 composed of the portions of a. and a. which follow n in S, interleaved 
 
 i j i» 
 
 in some way. This argument extends in a straightforward way to any 
 number of variables and any collection of nodes in S which are common 
 to more than one of the a. sequences. 
 
 In order for us to guarantee the correctness of the results of 
 the program while concurrently executing the processing of the records, 
 we must obey the following: 
 
 1) We allow the sequences of nodes containing references 
 to different Reference Dependent variables to proceed 
 independently until a node is encountered which con- 
 tains reference to more than one such variable. This 
 node cannot be executed until all sequences in which 
 it appears have reached it. 
 
 2) For a given variable, v., the execution of the pro- 
 gram must preserve the sequence a.. In particular, 
 the subsequence a." must follow the subsequence a.'. 
 
 The first condition is satisfied by the fact that we do not 
 allow a block to be executed until all of the interlocks on that block 
 have been satisfied. 
 
 The second condition is met through three features of our 
 technique. 
 
57 
 
 1) Each Address Counter executes its own subset of 
 nodes sequentially, taking advantage of arithmetic 
 parallelism where possible, just as a single 
 Address Counter machine would. 
 
 2) The Instruction Dispatch Unit protects the ordering 
 within the sequences a.' and a.". 
 
 3) The interlocks allow a variable to be accessed by 
 
 an Address Counter only after that Address Counter's 
 predecessor is finished with the variable, thus 
 guaranteeing that a." does not start being executed 
 until after a.' is finished. 
 
 4.3 Discussion 
 
 Having shown that the constraints contained in the method we 
 are proposing are sufficient to guarantee the correctness of the results 
 of the program, we now ask if they are necessary. There are two ques- 
 tions which we must investigate. 
 
 1) If each Address Counter has access to only its own 
 set of Local variables in addition to the global (or 
 common) variables (a restriction we examine in 
 Chapter 8), can we start Address Counters into oper- 
 ation sooner that we do now? 
 
 2) Can we relax or remove some of the interlock 
 constraints? 
 
 Considering question (1) we demonstrate that the Address 
 Counters cannot be put into operation any sooner than they are now. 
 
58 
 
 Figure 4.1(a) shows a portion of a program graph. The rules for setting 
 our FORK instructions cause a FORK instruction to be placed at the 
 earliest point in the program at which the identity of the next primary 
 READ statement to be executed has been decided. Thus our technique 
 would position the FORKs at the beginning of path 1 and of path 2, 
 immediately after the decision node as shown in Figure 4.1(b). If we 
 attempt to bring another record into processing at any point prior to 
 the locations of the FORKs we clearly get into trouble since we do not 
 know until the decision node has been executed just which READ we exe- 
 cute next and what processing ensues. 
 
 As to question (2) concerning removal of interlock constraints, 
 we have already demonstrated the necessity of interlocking Reference 
 Dependent variables. However, we acknowledge that it is not necessary 
 to protect these variables by preventing the whole block in which such 
 a variable is referenced from entering execution. This decision, 
 presented in section 3.7, is an engineering decision based on the belief 
 that it is more important to prevent a potentially deadlocking condition 
 than to achieve the ultimate in speedup between deadlocks. 
 
59 
 
 
 
 V) 
 
 * 
 
 
 V) 
 UJ 
 
 O 
 
 
 o 
 o 
 
 u. 
 
 
 0. 
 
 3 
 
 A 
 
 < 
 
 UJ 
 
 
 
 
 
 0) 
 
 
 
 
 
 CO 
 
 
 
 * 
 
 
 UJ 
 
 
 
 <r 
 
 
 o 
 
 
 o 
 
 
 o 
 
 ^1 4 
 
 u. 
 
 
 cc 
 
 %\ r 
 
 
 
 Q. 
 
 «?L 
 
 
 
 
 
 
 / 
 
 < 
 
 UJ 
 
 iK 
 
 CD 
 
 S- 
 
 en 
 
 CO 
 
 
 
 (0 
 UJ 
 
 o 
 
 
 Q 
 
 < 
 
 O 
 
 
 UJ 
 
 <r 
 
 
 cc 
 
 Q. 
 
 
 \ 
 
 <L> 
 Q- 
 
 X 
 
 c 
 o 
 
 +-> 
 
 c 
 
 o 
 
 A3 
 
 / 
 
 </> 
 
 
 
 (0 
 
 
 
 UJ 
 
 
 o 
 
 o 
 
 _ __M — ^ 
 
 < 
 
 o 
 
 
 UJ 
 
 or 
 
 
 or 
 
 Q. 
 
 
 
60 
 
 5. MACHINE DESIGN 
 
 5.1 Over-all Structure 
 
 The following features must be available in a machine designed 
 for concurrent record processing: 
 
 1) A number of independent program counters to bring 
 about the concurrent execution of different 
 instruction streams. 
 
 2) A number of arithmetic units capable of operating 
 independently of one another [FLY72b]. There also 
 must be no correspondence between instruction 
 streams and arithmetic units; an instruction from 
 any instruction stream is executed by any available 
 arithmetic unit. 
 
 3) A number of memories which can each be addressed 
 by any of the arithmetic units. 
 
 4) A device which prevents instructions which access 
 the same variables from being executed in an 
 incorrect sequence. 
 
 The following items are desirable in a machine designed for 
 concurrent record processing: 
 
 1) Program and data storage should be kept separate 
 to reduce the problem of access conflicts. 
 
 2) Each program counter should have associated with it 
 the necessary logic and registers to calculate the 
 
61 
 
 effective addresses of all operands. By the appro- 
 priate settings of the base and index registers, 
 each program counter could execute the same instruc- 
 tions but refer to different data storage areas 
 when necessary. 
 
 3) A device should be included to supervise the opera- 
 tion of all program counters. Any communications 
 between program counters could travel through this 
 device. When all units are active but requests are 
 generated for the activation of further units, this 
 device could handle the enqueuing of these requests 
 until program counters are available to satisfy the 
 requests. 
 
 4) Since much of the activity in a COBOL program is 
 memory oriented (e.g.: the MOVE, TRANSFORM, and 
 EXAMINE verbs), it seems desirable to build into 
 the memory units some processing capability to 
 avoid the necessity of transferring this data back 
 and forth, to and from a processor. Thus, opera- 
 tions which require only one operand could be done 
 in the memory processor, while those operations 
 needing more operands, which would be found in 
 separate memory units, would be handled by separate 
 processing units. 
 
62 
 
 The design we are proposing, shown in Figure 5.1, incorporates 
 the necessary and the desirable features. It is composed of the follow- 
 ing units: 
 
 1 ) Address Counters 
 
 2) Address Counter Coordinator 
 
 3) Instruction Dispatch Unit 
 
 4) Processors 
 
 5) Program Memory 
 
 6) Data Memory 
 
 7) I/O Processors 
 
 8) Routing Network 
 
 The design of each of these units is now discussed, but the 
 various numbers and sizes of units recommended on the basis of our 
 experimental results is deferred until Chapter 7. 
 
 5.2 Program Memory 
 
 The Program Memory, shown in Figure 5.2, is designed as a 
 hierarchy [KUC70, MAT72] of memory devices. The program comes ini- 
 tially from an external storage medium, such as disk or drum storage, 
 to the Primary Program Memory. Address Counters obtain instructions 
 from the fast Cache memory [BAR72a, C0N69, KAP73, MEA71] which holds 
 several segments of the program. 
 
 The design of this memory is similar to that of the IBM 360/85 
 memory [LIP68], with the addition of the Fetch Queuing and Routing Unit. 
 This unit allows any of the Address Counters to obtain data from the 
 Cache. 
 
63 
 
 ADDRESS 
 COUNTERS 
 
 ]'jJl TTf. 1111 
 
 • • • 
 
 PROGRAM 
 MEMORY 
 
 INDEX BUS 
 
 U J 
 
 A/ 
 
 From 
 
 IF Tree 
 
 Processors 
 
 ADDRESS 
 COUNTER 
 COORDINATOR 
 
 PROCESSOR OPERATION BUS 
 
 Figure 5.1 
 Over-all Machine Structure 
 
64 
 
 From 
 
 External 
 
 Devices 
 
 i 
 
 PRIMARY 
 
 PROGRAM 
 
 MEMORY 
 
 £_ 
 
 CACHE 
 
 I 
 
 FETCH 
 
 CONTROL 
 
 UNIT 
 
 FETCH QUEUING 
 & ROUTING UNIT 
 
 T 
 
 T 
 
 From 
 
 From 
 
 Address 
 
 Address 
 
 Counter 
 
 Counter 
 
 1 
 
 2 
 
 T 
 
 From 
 Address 
 Counter 
 n 
 
 Figure 5.2 
 Program Memory 
 
65 
 
 5.3 Address Counters 
 
 An Address Counter, shown in Figure 5.3, operates in the fol- 
 lowing manner: 
 
 1) The instruction whose address is in the Program 
 Address Register is fetched from the Program Memory 
 and placed in the Memory Buffer. 
 
 2) The Op Code Decoder examines a portion of the opera- 
 tion code of the instruction to determine the 
 instruction type. The six instruction types recog- 
 nized are unconditional branch, conditional branch, 
 set internal registers, fetch index, Address Counter 
 control , and other. 
 
 When an unconditional branch instruction is encountered, the 
 effective address is calculated and inserted into the Program Address 
 Register for the next instruction fetch. 
 
 When a conditional branch is found, the effective address of 
 the conditional result set is calculated and the conditional test 
 instruction is sent to the Instruction Dispatch Unit. The Address 
 Counter ID Match Unit is also armed to respond to the appearance on the 
 Index Bus of this Address Counter's identification number. After the 
 IF Tree Processor evaluates the conditional test, it sends out on the 
 Index Bus the Address Counter identification number and the jump dis- 
 placement from the current instruction to the next instruction to be 
 executed. This displacement is then stored in an index register. The 
 Address Calculation Unit uses the program address and the jump displace- 
 ment to compute the address of the appropriate entry in a transfer 
 
66 
 
 To 
 Memory 
 
 From 
 
 Address Counter 
 
 Coordinator 
 
 From 
 Memory 
 
 PROGRAM 
 ADDRESS 
 REGISTER 
 
 T7 
 
 MEMORY 
 
 BUFFER 
 
 INCREMENTER 
 
 OP CODE DECODER 
 & CONTROL 
 LOGIC 
 
 From 
 Address Counter 
 Coordinator 
 
 ADDR CNTR ID 
 
 MATCH 
 
 UNIT 
 
 ADDRESS 
 
 CALCULATION 
 
 UNIT 
 
 BASE & INDEX 
 REGISTERS 
 
 From 
 
 •Address Cntr 
 
 Coordinator 
 
 From 
 Index 
 
 ' Bus 
 
 INSTRUCTION 
 BUFFER 
 
 To 
 
 Instruction 
 
 Dispatch 
 
 Unit 
 
 Figure 5.3 
 Address Counter 
 
67 
 
 vector table and inserts this address into the Program Address Register 
 for the next instruction fetch. 
 
 When the operation code indicates that the current instruction 
 loads one of the internal registers, either the operand of the instruc- 
 tion or the contents of the Program Address Register, as the instruction 
 requires, is placed in the selected index register. 
 
 When the operation code indicates that the current instruction 
 fetches an item from the Data Memory to an index register, the effective 
 address of the data item is computed and the instruction is passed on to 
 the Instruction Dispatch Unit. The Address Counter ID Match Unit is 
 also armed to respond to the appearance on the Index Bus of this Address 
 Counter's identification number. When the Data Memory puts this identi- 
 fication number and the data item on the Index Bus, the Address Counter 
 loads the appropriate index register from the bus. 
 
 An Address Counter must recognize the QUIT, HOLD, and TEST 
 instructions and halt after passing them on to the Instruction Dispatch 
 Unit. At an appropriate time, the Address Counter is restarted by the 
 Address Counter Coordinator. 
 
 For any of the other instructions, the effective addresses of 
 the operands are calculated and the instruction, now containing full 
 memory addresses for the operands, is sent to the Instruction Dispatch 
 Unit. 
 
 5.4 Instruction Dispatch Unit 
 
 Figure 5.4 shows a design for the Instruction Dispatch Unit. 
 The primary function of this unit is to insure that no instruction is 
 allowed to go into execution until all of its operands have been set to 
 
68 
 
 To 
 
 Memory 
 Op Bus 
 
 Operand 
 Fetches 
 
 INSTRUCTION 
 
 WAITING 
 
 REGISTERS 
 
 From 
 
 Address 
 
 Counters 
 
 1 
 
 ARRIVING 
 
 INSTRUCTION 
 
 QUEUE 
 
 (AIQ) 
 
 I 
 
 FETCH & TAG 
 GENERATOR 
 
 (FTG) 
 
 T 
 
 (IWR) 
 
 Memory Instructions 
 
 £1 
 
 PROCESSOR 
 
 INSTRUCTION 
 
 QUEUE 
 
 (PIQ) 
 
 /Processor 
 / Status 
 Information 
 
 T 
 
 Processor 
 
 Instructions 
 
 To 
 Processors 
 
 Address 
 Counter 
 Instructions 
 
 INSTRUCTION 
 
 DISPATCH 
 
 CONTROLLER 
 
 (IDC) 
 
 To 
 
 Address 
 
 Counter 
 
 Coordinator 
 
 TAG STATUS 
 
 REGISTER 
 
 ARRAY 
 
 (TSR) 
 
 TAG 
 QUEUE 
 
 (TQ) 
 
 T 
 
 Tags 
 
 From 
 Memories 
 
 Figure 5.4 
 Instruction Dispatch Unit 
 
69 
 
 the proper value and are available in the Data Registers of the Data 
 Memory . 
 
 The technique for accomplishing this objective was inspired 
 by the method used to solve the same sort of problem in the IBM 360/91 
 [T0M67] but differs from that solution in several particulars. The 
 method reported by Tomasulo made use of a tag associated with each 
 operand. The tag was attached to the register(s) into which the 
 operand would be placed when it became available and represented the 
 identity of the source of that operand. In our method the tag has no 
 correlation with the identity of the source of the operand. Rather, the 
 tag is the identity of the Tag Register which contains the Data Memory 
 address and status of that operand. The tags are passed around the 
 machine to identify results and operands when needed, with the tag 
 always eventually returning to the Instruction Dispatch Unit as an indi- 
 cation that the associated data item is available for use. The follow- 
 ing description of the operation of the Instruction Dispatch Unit 
 explains our technique: 
 
 1) An instruction is accepted from the Arriving 
 Instruction Queue. 
 
 2) If the instruction is destined for the Address 
 Counter Coordinator, it is immediately dispatched 
 to that unit. 
 
 3) For other instruction types (memory and processor 
 instructions), the first operand is sent to the 
 Tag Status Register Array. That unit returns to 
 the Fetch and Tag Generator the identity of the 
 
70 
 
 register assigned to the operand and an indication 
 of whether or not the operand was previously in a 
 Tag Status Register. 
 
 4) If the operand was new to the Tag Status Registers, 
 the Fetch and Tag Generator issues a fetch request 
 to the memory, sending both the operand address 
 and the register identity, the tag, sent by the 
 Tag Status Registers. 
 
 5) Steps (3) and (4) are repeated for a second operand 
 if it exists. 
 
 6) The address of the result of the operation is then 
 sent to the Tag Status Register Array and the 
 resulting tag is returned. 
 
 7) The instruction with tags appended is moved into 
 an idle Instruction Waiting Register. 
 
 8) When the tags for all of the operands return from 
 data memory, the instruction is transferred either 
 onto the Memory Bus or into a Processing Instruction 
 Queue. In the former case the appropriate memory 
 accepts the instruction for processing. In the 
 latter case an idle processor of the appropriate 
 type is selected and the instruction is routed to 
 it. 
 
 9) When the operation has completed, the memory sends 
 the tag of the result to the Tag Queue in the 
 Instruction Dispatch Unit. 
 
71 
 
 10) When the tag is processed by the Instruction 
 
 Dispatch Controller, the corresponding Tag Status 
 Register is released and any instruction for 
 which all the other operands are also available 
 is started into execution. 
 Logic flow diagrams for the Fetch and Tag Generator, the 
 Instruction Dispatch Controller, and the Tag Status Register Array 
 appear as Figures 5.5 to 5.7. 
 
 5.5 Address Counter Coordinator 
 
 The Address Counter Coordinator is one unit which could be 
 implemented with a large portion of it contained in the operating 
 system software. It could also, on the other hand, be implemented com- 
 pletely in hardware. Because the type of implementation of this unit 
 would be affected by many considerations beyond the scope of this 
 paper, no structure is proposed here. Rather, Figures 5.8 to 5.12 give 
 the control sequence for each of the five instructions executed by this 
 unit. The functions given in these figures would have to be executed 
 regardless of the software/hardware proportion of the implementation. 
 
 In Figure 5.8 the control sequence of the FORK instruction is 
 given. In the event that all Address Counters are already active, the 
 FORK request is enqueued until one does become available. When an 
 Address Counter is assigned to begin execution at the initiation point 
 specified in the FORK instruction, several things must be done before 
 that Address Counter can begin execution. 
 
72 
 
 i 
 
 ACCEPT INSTRUCTION FROM 
 ARRIVING INSTRUCTION QUEUE 
 
 Memory or Processor / \ Address Counter Control 
 
 r — <f INSTRUCTION^) — n 
 
 1 \ TYPE /_ %_ 
 
 SEND FIRST OPERAND 
 ADDRESS TO TSR 
 
 SEND INSTRUCTION TO 
 ADDR. CNTR. COORDINATOR 
 
 ACCEPT TAG & STATUS 
 FROM TSR 
 
 yes 
 
 1 
 
 ISSUE FETCH REQUEST 
 FOR OPERAND 
 
 no 
 
 I 
 
 SEND SECOND OPERAND 
 ADDRESS TO TSR 
 
 yes 
 
 I 
 
 ACCEPT TAG & STATUS 
 FROM TSR 
 
 /** 
 
 no 
 
 1 
 
 ISSUE FETCH REQUEST 
 FOR OPERAND 
 
 3 
 
 w 
 
 Figure 5.5 
 Fetch and Tag Generator Logic 
 
73 
 
 SEND RESULT ADDRESS 
 TO TSR 
 
 ACCEPT RESULT TAG 
 FROM TSR 
 
 I 
 
 MOVE INSTRUCTION 
 AND TAGS TO IWR 
 
 i 
 
 © 
 
 Figure 5.5 (continued) 
 Fetch and Tag Generator Logic 
 
74 
 
 ACCEPT TAG 
 FROM TAG QUEUE 
 
 I 
 
 SEND TAG TO 
 TAG STATUS REGISTERS, 
 INSTRUCTION WAITING REGISTERS 
 
 yes 
 
 I 
 
 UPDATE MATCHING 
 INSTRUCTION'S STATUS 
 
 no 
 
 no 
 
 yes 
 
 1 
 
 SEND INSTRUCTION TO 
 
 MEMORY BUS OR 
 
 PROCESSOR INSTRUCTION QUEUE 
 
 Figure 5.6 
 Instruction Dispatch Controller Logic 
 
75 
 
 f 
 
 /•* 
 
 MARK REGISTER IDLE 
 
 no 
 
 SEARCH FOR 
 OPERAND ADDRESS 
 
 yes 
 
 PUT RESULT ADDRESS 
 
 INTO IDLE 
 
 STATUS REGISTER 
 
 I 
 
 MARK REGISTER 
 OCCUPIED 
 
 I 
 
 SEND REGISTER ID 
 TO FTG AS TAG 
 
 Z 
 
 SEARCH FOR 
 RESULT ADDRESS 
 
 no 
 
 SET MATCHING 
 REGISTER TO NULL 
 
 I 
 
 PUT OPERAND ADDRESS 
 INTO IDLE 
 STATUS REGISTER 
 
 I 
 
 MARK REGISTER 
 OCCUPIED 
 
 I 
 
 SEND REGISTER ID 
 TO FTG AS TAG 
 
 Figure 5.7 
 Tag Status Register Logic 
 
76 
 
 no 
 
 i 
 
 ADD REQUEST TO 
 FORK QUEUE 
 
 I 
 
 (done) 
 
 SELECT 
 ADDRESS COUNTER 
 
 COMPUTE STORAGE 
 BASE ADDRESSES 
 
 I 
 
 SET ADDRESS 
 COUNTER REGISTERS 
 
 I 
 
 INITIALIZE 
 INTERLOCKS 
 
 I 
 
 SET 
 PREDECESSOR/SUCCESSOR 
 INFORMATION 
 
 I 
 
 SET PROGRAM COUNTER 
 IN ADDRESS COUNTER 
 
 I 
 
 START 
 ADDRESS COUNTER 
 
 I 
 
 f done") 
 
 Figure 5.8 
 FORK Control Sequence 
 
77 
 
 1) Storage must be allocated [IS071] for private 
 variables. The size of the allocation is fixed 
 for all initiation points within a phase. 
 
 2) The base registers for the private and common 
 storage areas must be set in the Address Counter. 
 The base address for common storage is fixed 
 within a phase, and the base address of the 
 private storage area results from the storage 
 allocation process in step (1). 
 
 3) Interlocks must be initialized so that all inter- 
 locks associated with the phase initially block 
 the successor to this Address Counter. 
 
 4) Information must be maintained which allows each 
 Address Counter's immediate predecessor and 
 successor to be quickly identified. This infor- 
 mation is needed whenever an Address Counter tests 
 an interlock, to determine whether or not its 
 predecessor has released it. It is also needed 
 whenever an Address Counter releases an interlock, 
 to restart the successor if it is waiting for this 
 interlock to be released. 
 
 The control sequence for the HOLD instruction is shown in 
 Figure 5.9. If the Address Counter executing the HOLD instruction is 
 the only active Address Counter, it immediately is restarted on the next 
 instruction. If other Address Counters are still active, it is neces- 
 sary that they be allowed to complete their tasks before the instructions 
 
78 
 
 I 
 
 yes 
 
 1 
 
 START 
 ADDRESS COUNTER 
 
 SAVE ADDRESS COUNTER 
 REGISTERS IN 
 HOLD SAVE-AREA 
 
 I 
 
 ADD REQUEST TO 
 HOLD QUEUE 
 
 MARK 
 ADDRESS COUNTER 
 AS IDLE 
 
 I 
 
 (done) 
 
 Figure 5.9 
 HOLD Control Sequence 
 
79 
 
 beyond the HOLD instruction are executed by some Address Counter. There 
 are several ways the Address Counter which executes a HOLD under these 
 conditions can be handled. One way is simply to leave it waiting at the 
 HOLD instruction and restart it when the last Address Counter in the 
 phase executes a QUIT instruction. Unfortunately, this approach could 
 be quite inefficient if the waiting time is \/ery long, since it leaves 
 the Address Counter locked up. The approach we prefer, shown in 
 Figure 5.9, assumes that the waiting time is relatively long, that the 
 Address Counter might be usefully employed elsewhere, and that the over- 
 head in saving the necessary information to start another Address Counter 
 at this location at a later time is not prohibitive. In this approach 
 to handling the waiting period, the Address Counter registers are stored 
 in a memory for later restoration. The memory used to save the registers 
 could be one dedicated to this purpose, or a part of a data memory used 
 by the operating system. A request is added to the HOLD queue. If 
 there are any requests in the FORK queue, the Address Counter now begins 
 execution at the first requested initiation point. Otherwise the 
 Address Counter can be added to the pool of idle Address Counters. 
 
 When a QUIT instruction is executed, the sequence in 
 Figure 5.10 is followed. Storage assigned to the Address Counter is 
 released. If there are any requests in the FORK queue, the Address 
 Counter now begins execution at the first requested initiation point. 
 If, instead, there is a request on the HOLD queue and the Address 
 Counter is the last one active, it resumes execution of the program at 
 the instruction following the HOLD instruction. This is done by loading 
 the Address Counter registers with the contents of the registers saved 
 
80 
 
 F 
 
 FORK 
 
 I 
 
 yes 
 
 RESTORE 
 REGISTERS FROM 
 HOLD SAVE-AREA 
 
 I 
 
 START 
 ADDRESS COUNTER 
 
 1 
 
 RELEASE 
 STORAGE 
 
 MARK 
 
 ADDRESS COUNTER 
 
 AS IDLE 
 
 I 
 
 UPDATE 
 
 PREDECESSOR/SUCCESSOR 
 
 INFORMATION 
 
 + y * 
 
 I 
 
 Cdonej 
 
 Figure 5.10 
 QUIT Control Sequence 
 
81 
 
 when the HOLD was enqueued. If neither of these conditions hold true, 
 then the Address Counter is returned to the pool of idle Address 
 Counters. 
 
 In handling the interlocks which regulate the access to 
 Reference Dependent variables, it is sufficient to have an n x v bit 
 array, where n is the number of Address Counters and v is the number of 
 Reference Dependent variables in the program phase. All bits in the 
 row associated with an Address Counter are initially set when that 
 Address Counter is started into operation at some initiation point. 
 Whenever a TEST instruction is encountered, the bit corresponding to the 
 combination of the variable designated in the instruction and the Address 
 Counter's predecessor is tested, as shown in Figure 5.11. If the bit is 
 not set, execution by that Address Counter is resumed. If the bit is 
 set, the Address Counter waits at that instruction until the interlock 
 is released by its predecessor. This is done rather than storing the 
 registers and freeing the Address Counter for other work because the 
 waiting period in this case is expected to be relatively short. When a 
 RELEASE instruction is encountered, the appropriate interlock bit is 
 reset and a test is made to see if the successor to that Address Counter 
 is waiting for that interlock to be released, as shown in Figure 5.12. 
 If so, the successor is reactivated. The Address Counter releasing the 
 interlock executes its next instruction without waiting for the execu- 
 tion of the RELEASE instruction. 
 
 5.6 Processors 
 
 There are a number of processors in the machine we are pro- 
 posing in this chapter. A number of these processors are simply 
 
82 
 
 
 UJ 
 
 1- 
 
 _j 
 
 GD 
 
 CD 
 < 
 
 ^ 
 
 cc 
 
 o 
 
 < 
 
 o 
 
 > 
 
 _l 
 
 CC 
 UJ 
 
 UJ 
 
 H 
 
 u. 
 
 z 
 
 o 
 
 
 UJ 
 
 h- 
 
 n 
 
 Ul 
 
 CO 
 
 CO 
 
 
 UJ 
 
 cc 
 
 cc 
 
 o 
 
 
 u. 
 
 
 
 cc 
 
 <o 
 
 
 o 
 
 
 cc 
 
 CO 
 CO 
 
 — » 
 
 < 
 
 UJ 
 
 
 1- 
 
 o 
 
 
 (/) 
 
 o 
 
 CO 
 
 UJ 
 
 z 
 o 
 
 Q 
 
 CM 
 
 s- 
 en 
 
 <1> 
 
 o 
 
 c 
 
 cr 
 
 o 
 
 s- 
 
 4-> 
 
 C 
 
 o 
 
 
 UJ 
 
 a: 
 
 uj 
 
 C05 
 
 Ul 
 
 
 <r 
 
 
 
 
 
 UJ 
 
 
 
 H 
 
 
 
 Z 
 
 
 
 1- 3 
 
 
 
 cc 
 
 
 
 < 
 
 1- 
 
 CO CO 
 
 
 
 
 
 
 UJ CO 
 
 
 
 
 cc Ul 
 
 
 
 
 DC 
 
 
 
 
 a 
 
 
 
 
 
 
 
 
 * X 
 
 < 
 
 
 S~~>s. 
 
 S2 HV 
 
 
 
 
 
 gHj\ 
 
 ^ 
 
 r 
 
 Ul 
 
 z 
 
 
 
 LJ 1- / 
 
 1 
 
 I 
 
 Q 
 
 55/ 
 
 
 
 
 w 
 
 
 >■ ?; 
 
 
 
 
 t: 
 
 
 
 
 1- _j 
 
 
 
 
 z cc 
 
 Ul Ul 
 
 a h- 
 
 
 
 
 
 
 — z 
 
 
 
 Ul 
 
 
 
 ^ u - 
 
 
 
 < 
 
 
 
 CO 
 
 
 
 
 Cu»~ ^ 
 CO 
 
 a> 
 
 CT> 
 
 ai 
 
 
 c 
 
 =3 
 
 cr 
 cu 
 
 CO 
 
 o 
 s- 
 +-> 
 c 
 o 
 o 
 
 CO 
 
83 
 
 arithmetic and logical units. As with past machines intended for busi- 
 ness data processing [ADA60], these arithmetic processors should be 
 designed to do decimal, rather than binary, arithmetic. 
 
 In addition to the arithmetic processors, several other types 
 are included. 
 
 The IF Tree Processor proposed by Davis [DAV72b] accepts as 
 input the conditional result set. Each bit in the conditional result 
 set represents the result of evaluating the conditional expression from 
 one IF statement. The processor returns as output the identification 
 of the branch of a conditional tree traversed. By evaluating all of 
 the conditional expressions concurrently, passing the results to an 
 IF Tree Processor, then executing a small number of assignment state- 
 ments concurrently, this device allows parallel execution of conditional 
 branches which would otherwise degrade speedup badly [TJA70> RIS72]. 
 Because COBOL programs tend to have large and complex decision trees 
 compared to those in a typical numerical program, and because several 
 data records are undergoing processing concurrently, several of these 
 IF Tree Processors are needed. 
 
 Very commonly COBOL programs sort a file on one or more keys 
 contained in each record. Because of this use of the SORT operation and 
 the time-consuming nature of software methods of sorting large amounts 
 of data, there should be a sorting network included as one of the pro- 
 cessors. The networks described by Batcher [BAT68] are good candidates 
 for this job. 
 
84 
 
 5.7 Data Memory and Buses 
 
 Each Data Memory Unit, shown in Figure 5.13, includes a 
 Primary Memory Module, a Data Register Array, and a Function Unit. 
 
 Data items are stored in the Primary Memory Module until re- 
 quested by a fetch request from the Instruction Dispatch Unit. Fetch 
 requests are enqueued by the Control Logic. The requested data is 
 transferred to one of the Data Registers before it is required by a 
 processor, rather than after as in the case of slave memories [WIL65] 
 or cache memories [LIP68]. Associated with each register is a word in 
 the Address Memory, an associative memory which contains the Primary 
 Memory address of the contents of the register. This address is set 
 during a fetch from the Primary Memory Module or during the transfer of 
 a result value from a processor. To avoid unnecessary Primary Memory 
 fetches, the Address Memory is searched for each fetch request in hopes 
 of recovering previously used data. For each operand request from a 
 processor, it is searched to determine which register contains the re- 
 quested operand. When a result is received from a processor, the 
 Address Memory is searched. If a register has already been allocated 
 for this item, the item is placed in that register. Otherwise, the 
 least-recently used register is allocated for this item. The address 
 in the Address Memory is altered only when a new item is to be written 
 into the register. Flag bits are provided for each register to indi- 
 cate its status. The indications are: 
 
 1) Waiting for Request - Data has recently been fetched 
 but has not been requested by a processor. 
 
85 
 
 
 
 
 1 
 
 
 
 
 
 UJ 
 
 O (E 
 UJ 
 
 §5 
 
 
 
 _l 
 O 
 
 o o 
 
 O -J 
 
 
 1 
 
 
 
 
 
 
 
 
 
 ^ 
 
 
 i 
 
 I i 
 
 ' t 
 
 
 
 
 
 
 
 * 
 
 
 
 V) > 
 
 V) oc 
 UJ o 
 
 Q uJ 
 
 ' 
 
 ROUTING 
 
 NETWORK 
 
 INTERFACE 
 
 1 
 
 
 
 
 
 $ 
 
 
 
 
 
 1 
 
 r 
 
 
 
 t 
 
 
 
 v. 
 
 1 
 
 >- 
 
 K 
 O 
 Z UJ 
 
 £ § 
 
 z 
 
 K 
 CL 
 
 
 
 0E 
 UJ 
 
 0) 
 
 < < 
 
 < 
 a 
 
 
 INTER -MEMORY 
 
 BUS 
 INTERFACE 
 
 p 
 
 
 
 
 
 
 
 
 z 
 
 1! 
 
 u. 
 
 
 
 
 
 
 
 
 
 
 
 
 (O UJ 
 
 m if 
 
 X cc 
 
 UJ Ul 
 
 z z 
 
 1 
 
 •ft 
 
 
 
 
 cr> 
 
 en 
 
 o 
 
 E 
 <1J 
 
 eo 
 
 O 
 
86 
 
 2) Waiting for Store - A result has been sent by a 
 processor but has not yet been stored in the 
 Primary Memory. 
 
 3) Not Waiting - Data has been fetched and has been 
 sent on to a processor, or a store has been com- 
 pleted. The register is available for reassignment. 
 
 Within each Data Memory Unit a set of special processors is 
 provided to handle those functions which do not require a full processor 
 of one of the types described in section 5.6. These memory processors 
 reduce the demand on the Routing Network. Functions which these memory 
 processors perform include the following: 
 
 1) Data Transformation - In COBOL the TRANSFORM state- 
 ment is used to change all occurrences of one set 
 of characters into another set of characters 
 within a data item. For example, 
 
 TRANSFORM A FROM '$<£' TO 'DC. 
 results in the change of all occurrences of the 
 $ character in data item A to the letter D and 
 all occurrences of the character t to the letter C. 
 
 2) Character Examination - In COBOL the EXAMINE state- 
 ment is used to count the number of occurrences of 
 a character within a data item. It can also be 
 used to transform each occurrence of that charac- 
 ter to another character. COBOL also includes 
 tests to determine if a data item is numeric or if 
 it is alphabetic. 
 
87 
 
 3) Counter Incrementing - A very common statement 
 in COBOL programs is 
 
 ADD 1 TO item. 
 Since we can regard incrementing a value as a 
 monadic operation on that variable, there is no 
 need to route the value through the Routing 
 Network to a processor, perform the operation, 
 and return the result through the Routing 
 Network back to the same memory unit. 
 
 4) Another common type of COBOL statement is 
 
 MOVE SPACES TO item. 
 or 
 
 MOVE ZEROS TO item, 
 where the number of spaces or zeros is determined 
 by the length of the item. The operation of 
 jamming one of these values into an item could be 
 done by logic built into the Data Register Array 
 [ST070]. 
 In Figure 5.1 it can be seen that communications between 
 memories occur over the Inter-Memory Bus. This time-shared bus is ade- 
 quate since the storage assignment algorithm described in section 3.5 
 attempts to keep data items which have an affinity for each other in 
 the same memory unit. Calculations related to the necessary bus 
 bandwidth are given in Chapter 7. 
 
88 
 
 5.8 I/O Processors 
 
 In the paper up to this point we have been assuming that read- 
 ing and writing data records takes very little time—little enough time 
 that an I/O Processor can keep up with the demands of several instruc- 
 tion streams. Obviously, a very sophisticated I/O Processor is needed. 
 
 We are not attempting to design such a processor here since 
 such a design depends heavily on the capabilities of the bulk storage 
 devices with which it interfaces and the technologies available for its 
 implementation. There are some comments we can make regarding such a 
 design, however, which derive from observations of our example programs. 
 
 Since some COBOL programs operate on several input and output 
 files, it seems advantageous, in view of the throughput rates needed, 
 to have several I/O Processors. As long as the number of files being 
 accessed is not greater than the number of I/O Processors, each file 
 should be assigned to a separate I/O Processor. 
 
 One way of achieving a very fast I/O rate is to place all of 
 the data in a random access memory. Reading or writing then amounts to 
 a transfer of information from one memory to another. If the file 
 sizes are yery large, the amount of such buffer memory needed becomes 
 prohibitively expensive. However, it is apparent that the larger the 
 buffer memory can be made the closer we can approach this ideal. A 
 large memory, filled before program execution starts, could be at least 
 partially refilled while the original data is processed. When an Address 
 Counter encounters a READ or WRITE instruction which cannot be executed 
 immediately, because of the unavailability of data or buffer space, 
 there are several things that could be done. One is simply to wait 
 
89 
 
 until it is possible to proceed. In view of the disparity between 
 machine operation times and rotating storage access times, this approach 
 could be very wasteful of execution resources. A better way is to 
 create an I/O queue and an I/O save area in the Address Counter 
 Coordinator similar to those used for the enqueued HOLD instructions. 
 Subsequent instructions accessing the same file would have to be chained 
 together to insure that they are executed in the proper order, and a 
 mechanism for restarting an instruction stream when the data is avail- 
 able would also have to be implemented; but neither of these problems 
 seems especially difficult. 
 
 To keep the amount of data transferred between memories dur- 
 ing an I/O instruction small, it seems apparent that the I/O Processor 
 should have a description of the record format. This information 
 allows the following economies: 
 
 1) Only those items actually used from an input record 
 would be transferred from an input record to the 
 Data Memory Units. It is quite common for only a 
 few items, from a large set of items in a record, to 
 be used during the execution of the program. 
 Fillers and unused data items would be discarded. 
 
 2) Any constants appearing in output records, such as 
 page headings, could be retained in the I/O 
 Processor, eliminating the need to continually 
 transfer this invariant information between data 
 memory and the I/O Processor. 
 
90 
 
 5.9 Routing Network 
 
 It is necessary to provide some method [KUC72a] of allowing 
 any processor to interrogate any Data Memory Unit. Two methods are com- 
 bined in Figure 5.1. The first, the Switching Network, is a crossbar 
 switch, using a few of the high order bits of the operand to select a 
 path through the network. The second method is a system of time-shared 
 buses connecting groups of Data Memory Units and groups of processors to 
 the Switching Network. The sizes of the groups are determined by the 
 number of memories and processors, and are affected by tradeoffs between 
 the size of the Switching Network, bus bandwidth, and the bus holding 
 times. Some calculations relating to this problem are given in 
 Chapter 7. 
 
 5.10 Modifications for Multiprogramming 
 
 Thus far we have been assuming that only one program at a time 
 is in execution. We now briefly consider the modifications needed to 
 execute more than one program concurrently. 
 
 There need be no changes in the algorithms used in the com- 
 piler. All of the FORKs, HOLDs, QUITs, and interlocks operate only 
 between different records being processed by the same program. No vari- 
 ables are shared between different programs, although they may share 
 files of data. 
 
 The processors and memories need not be changed, except in 
 memory size, since they are indifferent to the source of the instructions 
 that pass through them. 
 
91 
 
 The Address Counters do not operate differently for different 
 programs. Each is independent of the others except for the handling of 
 interlock conditions. Since interlock handling is done by the Address 
 Counter Coordinator and not by individual Address Counters, the design 
 of the Address Counter need not be modified to support multiprogramming. 
 
 The largest changes must be made in the Instruction Dispatch 
 Unit and in the Address Counter Coordinator. Since instruction streams 
 for different programs are independent, it is possible to route instruc- 
 tions from different programs through different Instruction Dispatch 
 Units. This appears to be a good thing to do since it is the Instruction 
 Dispatch Unit which limits the speed of our machine, as discussed in 
 section 7.1. In order to implement this modification, a switch has to 
 be inserted to allow any Address Counter to send instructions to any 
 Instruction Dispatch Unit. The Address Counter Coordinator would, then, 
 have to be modified to allow it to control this switch and to keep inter- 
 locks from one program distinct from those in another program. 
 
92 
 
 6. EXPERIMENTAL RESULTS 
 
 6.1 Introduction 
 
 Several analyses were made of a sample set of COBOL programs. 
 These programs were obtained from the Student Data Area, the 
 Institutional Area, and the Financial Area in the University of Illinois 
 Office of Administrative Data Processing at the Urbana campus. Most of 
 the programs were considered by the programmers to be typical of the 
 types of processing done at that installation, but some programs were 
 selected because they were atypical and might be expected to tax any 
 method chosen for their execution. While this sample is limited to 
 those programs available from a single university administrative com- 
 puter center, we feel that they are comparable to programs found in 
 various businesses. For one example, a program which generates a 
 report from a student master file is similar to one which generates a 
 report from a file of an insurance company's policy holders. As another 
 example, consider the similarity of a program which prints student grade 
 reports and one which prints charge account bills. Of course, there are 
 also many functions common to the business and academic worlds, such as 
 maintenance of inventory records and payroll records. Thus, while our 
 sample is limited, it does seem to be representative of COBOL programs 
 in general . 
 
 6.2 Variable Size Counts 
 
 One analysis we made was of the frequency of occurrence of 
 variables of various sizes in our sample of 42 programs. Fortunately, 
 
93 
 
 the Data Division of a COBOL program contains a description of every 
 variable used in the program. This description contains a PICTURE 
 clause which contains information about the number of characters needed 
 to hold an item's value. Also included in the Data Division are a 
 number of entries with the name FILLER. These are typically used to 
 indicate items in a file which are not accessed, or to hold spaces or 
 text for printout format. Since we discard unused data items, such as 
 those in the former case, and since we hold format information and 
 constant printout text in the I/O Processors, we discarded all FILLER 
 items in this analysis. 
 
 A summary of the results of this analysis is given in 
 Table 6.1. It should be noted that some entries in this table have two 
 values. In these cases it was found that one or more of the programs 
 in our sample used an unusually large table (i.e.: a vector or 2- or 
 3-dimensional array). Had we had a very large sample, we would expect 
 such single occurrences of large counts to be swamped out by the rest 
 of the mass of the data. In our sample of 42 programs, however, such 
 single occurrences swamped out the rest of the counts. To overcome this 
 problem we have deleted the portion of the data attributable to large 
 single tables; but we have given, as the parenthesized value in the 
 table, the counts which do include such tables. A plot of the fre- 
 quency counts in Table 6.1 is given in Figure 6.1. 
 
 6.3 Statement Type Counts 
 
 Another type of analysis we did was count the frequency of 
 occurrence of each type of statement. This analysis was done for 34 of 
 
94 
 
 Table 6.1 
 Frequency Count of Variable Sizes 
 
 
 Number 
 
 Size 
 
 
 of 
 
 
 (Char) 
 
 Occurrences 
 
 1 
 
 4711 
 
 (24912) 
 
 2 
 
 2461 
 
 
 
 3 
 
 2217 
 
 ( 3917) 
 
 4 
 
 2062 
 
 (12618) 
 
 5 
 
 2980 
 
 ( 4996) 
 
 6 
 
 2438 
 
 ( 4201) 
 
 7 
 
 118 
 
 
 
 8 
 
 142 
 
 
 
 9 
 
 285 
 
 ( 
 
 410) 
 
 10 
 
 593 
 
 ( 
 
 1093) 
 
 11 
 
 27 
 
 
 
 12 
 
 34 
 
 
 
 13 
 
 24 
 
 ( 
 
 524) 
 
 14 
 
 28 
 
 ( 
 
 94) 
 
 15 
 
 24 
 
 
 
 16 
 
 16 
 
 
 
 17 
 
 14 
 
 
 
 18 
 
 55 
 
 ( 
 
 73) 
 
 19 
 
 5 
 
 
 
 20 
 
 103 
 
 ( 
 
 299) 
 
 21 
 
 31 
 
 
 
 22 
 
 2 
 
 ( 
 
 70) 
 
 23 
 
 24 
 
 
 
 24 
 
 4 
 
 
 
 25 
 
 7 
 
 
 
 26 
 
 2 
 
 
 
 27 
 
 4 
 
 
 
 28 
 
 3 
 
 ( 
 
 41) 
 
 29 
 
 8 
 
 
 
 30 
 
 8 
 
 ( 
 
 28) 
 
 31 
 
 2 
 
 
 
 32 
 
 
 
 
 
 33 
 
 5 
 
 ( 
 
 13) 
 
 34 
 
 1 
 
 
 
 35 
 
 1 
 
 
 
 36 
 
 
 
 
 
 37 
 
 2 
 
 
 
 38 
 
 
 
 
 
 39 
 
 2 
 
 
 
 40 
 
 6 
 
 
 
 41 
 
 2 
 
 
 
 42 
 
 1 
 
 
 
 43 
 
 1 
 
 
 
 Mean 
 
 Median 
 
 Percent 
 
 per 
 
 per 
 
 of 
 
 Program 
 
 Program 
 
 Sample 
 
 112 
 
 33 
 
 25.29 
 
 58.7 
 
 18 
 
 13.21 
 
 52.8 
 
 13 
 
 11.90 
 
 49.1 
 
 11 
 
 11.07 
 
 71.0 
 
 14 
 
 16.00 
 
 58.0 
 
 7 
 
 13.09 
 
 2.8 
 
 
 
 0.63 
 
 3.4 
 
 1 
 
 0.76 
 
 6.8 
 
 3 
 
 1.53 
 
 14.1 
 
 5 
 
 3.18 
 
 0.6 
 
 
 
 0.14 
 
 0.8 
 
 
 
 0.18 
 
 0.6 
 
 
 
 0.13 
 
 0.7 
 
 
 
 0.15 
 
 0.6 
 
 
 
 0.13 
 
 0.4 
 
 
 
 0.09 
 
 0.3 
 
 
 
 0.08 
 
 1.4 
 
 
 
 0.30 
 
 0.1 
 
 
 
 0.03 
 
 2.5 
 
 1 
 
 0.55 
 
 0.7 
 
 
 
 0.17 
 
 
 
 
 
 0.01 
 
 0.6 
 
 
 
 0.13 
 
 0.1 
 
 
 
 0.02 
 
 0.2 
 
 
 
 0.04 
 
 
 
 
 
 0.01 
 
 0.1 
 
 
 
 0.02 
 
 
 
 
 
 0.02 
 
 0.2 
 
 
 
 0.04 
 
 0.2 
 
 
 
 0.04 
 
 
 
 
 
 0.01 
 
 
 
 
 
 - 
 
 0.1 
 
 
 
 0.03 
 
 
 
 
 
 0.01 
 
 
 
 
 
 0.01 
 
 
 
 
 
 - 
 
 
 
 
 
 0.01 
 
 
 
 
 
 - 
 
 
 
 
 
 0.01 
 
 0.1 
 
 
 
 0.03 
 
 
 
 
 
 0.01 
 
 
 
 
 
 0.01 
 
 
 
 
 
 0.01 
 
95 
 
 Table 6.1 (continued) 
 Frequency Count of Variable Sizes 
 
 
 Number 
 
 Mean 
 
 Median 
 
 Percent 
 
 Size 
 
 
 of 
 
 per 
 
 per 
 
 of 
 
 (Char) 
 
 Occu 
 
 rrences 
 
 Program 
 
 Program 
 
 Sample 
 
 44 
 
 2 
 
 
 
 
 
 
 0.01 
 
 45 
 
 
 
 
 
 
 
 
 - 
 
 46 
 
 1 
 
 
 
 
 
 
 0.01 
 
 47 
 
 1 
 
 
 
 
 
 
 0.01 
 
 48 
 
 4 
 
 
 0.1 
 
 
 
 0.02 
 
 49 
 
 4 
 
 
 0.1 
 
 
 
 0.02 
 
 50 
 
 4 
 
 
 0.1 
 
 
 
 0.02 
 
 Counts for Sizes > 50 
 
 Size 
 
 Count 
 
 Size 
 
 Count 
 
 51 
 
 1 
 
 52 
 
 1 
 
 54 
 
 2 
 
 55 
 
 2 
 
 57 
 
 3 
 
 58 
 
 3 
 
 60 
 
 8 
 
 62 
 
 1 
 
 63 
 
 1 
 
 66 
 
 27 
 
 70 
 
 4 
 
 72 
 
 2 
 
 74 
 
 2 
 
 76 
 
 2 
 
 80 
 
 14 
 
 81 
 
 8 
 
 83 
 
 1 
 
 84 
 
 1 
 
 85 
 
 2 
 
 89 
 
 2 
 
 90 
 
 1 
 
 92 
 
 1 
 
 93 
 
 2 
 
 100 
 
 12 
 
 104 
 
 1 
 
 107 
 
 1 
 
 109 
 
 2 
 
 no 
 
 1 
 
 116 
 
 1 
 
 120 
 
 2 
 
 126 
 
 1 
 
 132 
 
 2 
 
 133 
 
 24 
 
 136 
 
 1 
 
 150 
 
 4 
 
 162 
 
 1 
 
 181 
 
 1 
 
 191 
 
 1 
 
 203 
 
 1 
 
 205 
 
 1 
 
 264 
 
 1 
 
 287 
 
 1 
 
 310 
 
 1 
 
 312 
 
 1 
 
 379 
 
 1 
 
 392 
 
 1 
 
 398 
 
 1 
 
 405 
 
 1 
 
 449 
 
 1 
 
 700 
 
 1 
 
 1500 
 
 1 
 
96 
 
 5000 
 
 Figure 6.1 
 
 Frequency Count Plot 
 
 Variable Sizes 
 
 1 * 1 I l I I I I I I M M I I I | I M I I I M I I ■ 
 20 25 30 35 40 45 50 
 
 10 15 
 
 Variable Size (Characters) 
 
97 
 
 our programs, since eight of the 42 programs were too large for us to 
 transform into an analyzable form. Table 6.2 gives a summary of the 
 results of this analysis. Figure 6.2 is a plot of the data in Table 6.2 
 broken down into classes of statements. Figure 6.3 is a plot of the data 
 in Table 6.2 broken down into individual statement types. 
 
 As a part of the same analysis, we counted the frequency of 
 occurrence of each of the operators available in COBOL. A summary of 
 this data is shown in Table 6.3. Since incrementing is very common, it 
 was counted separately from other types of ADD instructions. Figure 6.4 
 presents the data of Table 6.3 broken down by operator class. 
 Figure 6.5 shows the data of Table 6.3 broken down by individual opera- 
 tor type. 
 
 6.4 Program Analyses 
 
 A number of the programs were examined to determine what sort 
 of speedup our method might actually deliver. In the process, informa- 
 tion relating to machine parameters was also generated. 
 
 Table 6.4 gives a brief summary of the statistics relating to 
 the sizes of the programs that were analyzed. It should be noted that 
 these are not large programs since the analyses were done by hand. They 
 seem to be typical of the small- to medium-sized programs found at any 
 COBOL computer center. The following is a list of the programs we 
 analyzed with a brief description of each one: 
 
 B7510363 A report and an error listing are generated 
 from one input file. This program required 
 heavy interlocking. 
 
98 
 
 Table 6.2 
 Frequency Count of Statement Types 
 
 Statement 
 Type 
 
 All Statements 
 
 I/O 
 
 READ 
 
 WRITE 
 
 REWRITE 
 
 ACCEPT 
 
 DISPLAY 
 
 EXHIBIT 
 
 RETURN 
 
 RELEASE 
 
 OPEN 
 
 CLOSE 
 
 Assignment 
 
 MOVE 
 
 TRANSFORM 
 Ari thmeti c 
 
 Control 
 
 IF 
 PERFORM 
 
 Misc. 
 
 EXAMINE 
 SORT 
 ON, AT 
 CALL 
 
 Number 
 
 of 
 
 Occurrences 
 
 Percent 
 of all 
 Occurrences 
 
 Mean 
 per 
 Program 
 
 Medi an 
 
 per 
 Program 
 
 16945 
 
 100 
 
 498.4 
 
 143 
 
 1583 
 
 9.3 
 
 46.5 
 
 26 
 
 134 
 
 0.8 
 
 3.9 
 
 2 
 
 900 
 
 5.3 
 
 26.5 
 
 7 
 
 3 
 
 
 
 0.1 
 
 
 
 18 
 
 0.1 
 
 0.5 
 
 
 
 242 
 
 1.4 
 
 7.1 
 
 5 
 
 56 
 
 0.3 
 
 1.6 
 
 
 
 22 
 
 0.1 
 
 0.6 
 
 
 
 67 
 
 0.4 
 
 2.0 
 
 
 
 69 
 
 0.4 
 
 2.0 
 
 2 
 
 72 
 
 0.4 
 
 2.1 
 
 2 
 
 10735 
 
 63.4 
 
 315.7 
 
 84 
 
 8852 
 
 52.2 
 
 260.3 
 
 62 
 
 101 
 
 0.6 
 
 3.0 
 
 
 
 1782 
 
 10.5 
 
 52.4 
 
 16 
 
 4037 
 
 23.8 
 
 118.7 
 
 27 
 
 3998 
 
 23.6 
 
 117.6 
 
 27 
 
 39 
 
 0.2 
 
 1.1 
 
 
 
 590 
 
 3.5 
 
 17.4 
 
 4 
 
 4 
 
 
 
 0.1 
 
 
 
 65 
 
 0.4 
 
 1.9 
 
 
 
 492 
 
 2.9 
 
 14.5 
 
 4 
 
 29 
 
 0.2 
 
 0.9 
 
 
 
99 
 
 63.4 
 
 
 Figure 6.2 
 Histogram 
 Statement Types by Class 
 
100 
 
 
 
 +J 
 
 
 
 c 
 
 
 
 a> 
 
 
 
 E 
 
 
 
 <u 
 
 
 
 4-> 
 
 
 
 rO 
 
 
 
 4-> 
 
 
 
 ir> 
 
 CO 
 
 s 
 
 
 t 
 
 ra 
 
 >•> 
 
 <£> 
 
 
 -Q 
 
 CD 
 
 o 
 
 CO 
 
 X- 
 
 +-> 
 
 CD 
 
 3 
 
 l/> 
 
 a. 
 
 O) 
 
 •r— 
 
 >> 
 
 »r- 
 
 n: 
 
 h- 
 
 CD 
 E 
 CD 
 +-> 
 fO 
 4-> 
 
 O 
 O 
 
 o 
 
 O 
 
 o 
 
 41. 
 
 o 
 
 41. 
 
 iV 
 
 
 «* 
 
 
 
 
 <*> 
 
 * 
 
 ^ 
 
 -^> 
 
 
 ■% 
 
 'ft. 
 
 >* 
 
 <* 
 
101 
 
 Table 6.3 
 Frequency Count of Operator Types 
 
 Operator 
 Type 
 
 Number 
 
 of 
 
 Occurrences 
 
 Percent 
 of all 
 Occurrences 
 
 Mean 
 per 
 Program 
 
 Median 
 
 per 
 
 Program 
 
 Operators 
 
 7391 
 
 
 100 
 
 217.4 
 
 59 
 
 Arithmetic 
 
 2122 
 
 
 28.7 
 
 62.4 
 
 16 
 
 Increment 
 
 
 390 
 
 5.3 
 
 11.5 
 
 9 
 
 + 
 
 
 934 
 
 12.6 
 
 27.8 
 
 4 
 
 - 
 
 
 50 
 
 0.7 
 
 1.5 
 
 
 
 • 
 
 
 375 
 
 5.1 
 
 11.0 
 
 
 
 / 
 
 
 341 
 
 4.6 
 
 10.0 
 
 
 
 ** 
 
 
 15 
 
 0.2 
 
 0.4 
 
 
 
 Comparison 
 
 4655 
 
 
 62.9 
 
 136.9 
 
 29 
 
 = 
 
 
 3216 
 
 43.5 
 
 94.6 
 
 21 
 
 < 
 
 
 170 
 
 2.3 
 
 4.9 
 
 2 
 
 > 
 
 
 496 
 
 6.7 
 
 14.6 
 
 3 
 
 f 
 
 
 641 
 
 8.7 
 
 18.9 
 
 5 
 
 i 
 
 
 24 
 
 0.3 
 
 0.7 
 
 
 
 f 
 
 
 68 
 
 0.9 
 
 2.0 
 
 
 
 Connective 
 
 584 
 
 
 7.9 
 
 17.2 
 
 4 
 
 OR 
 
 
 216 
 
 2.9 
 
 6.4 
 
 2 
 
 AND 
 
 
 368 
 
 5.0 
 
 10.8 
 
 
 
 Misc. 
 
 30 
 
 
 0.4 
 
 0.9 
 
 
 
 NUMERIC 
 
 
 30 
 
 0.4 
 
 0.9 
 
 
 
 ALPHABETIC 
 
 
 
 
 
 
 
 
 
 
102 
 
 6* 
 
 
 V 1 G 
 
 Figure 6.4 
 
 Histogram 
 Operator Types by Class 
 
103 
 
 O 
 
 o 
 
 
 
 o 
 
 
 
 
 *0 
 
 o 
 
 4-> 
 
 i- 
 
 <D 
 Q- 
 O 
 
 O) 
 
 S- 
 
 
 S- 
 
 O co 
 
 4-> a; 
 
 LO Q. 
 
 •r- _>, 
 
 
 ^ 
 
 i- 
 O 
 +-> 
 ra 
 
 s- 
 <1J 
 a. 
 o 
 
 
104 
 
 Table 6.4 
 Statistics for Analyzed Programs 
 
 Number Number Number Number Number 
 of of of of of 
 
 Program ID 
 
 Data Cards 
 
 Variables 
 
 Proc. Cards 
 
 Statements 
 
 Phasi 
 
 B7510360 
 
 149 
 
 60 
 
 156 
 
 140 
 
 1 
 
 15156040 
 
 378 
 
 202 
 
 212 
 
 172 
 
 2 
 
 15156050 
 
 46 
 
 94 
 
 70 
 
 49 
 
 2 
 
 15210030 
 
 200 
 
 75 
 
 88 
 
 65 
 
 1 
 
 15212005 
 
 158 
 
 66 
 
 229 
 
 183 
 
 3 
 
 SSN512 
 
 69 
 
 160 
 
 81 
 
 84 
 
 2 
 
 S7510025 
 
 68 
 
 4 
 
 7 
 
 5 
 
 1 
 
 S7550180 
 
 217 
 
 85 
 
 84 
 
 73 
 
 2 
 
 S7550181 
 
 215 
 
 33 
 
 73 
 
 59 
 
 2 
 
 S7550182 
 
 68 
 
 40 
 
 106 
 
 101 
 
 2 
 
 S7550183 
 
 230 
 
 90 
 
 121 
 
 169 
 
 2 
 
105 
 
 15156040 From one input file this program prints Avery 
 labels, sorts the file, then prints a report. 
 
 15156050 Records from the input file are read, and 
 
 selected items are copied to an intermediate 
 file which is then sorted. A report is gen- 
 erated from the sorted file. 
 
 15210030 From an old master and finder cards, this 
 program copies the old master into a new 
 master, updating selected records. 
 
 15212005 This includes two programs in one. The first 
 one reads one file, edits the input, then 
 outputs a modified file. The second one 
 uses a finder file to select master file 
 records, then outputs records with data from 
 both finder and master records. 
 
 SSN512 Data from one input file with additional data 
 from a master file is copied to the output 
 file. 
 
 S7510025 This program generates an output file of 1350 
 identical records. It does no input. 
 
 57550180 This program generates an output file from an 
 input file plus information from matching 
 master records. 
 
 57550181 This program is similar to S7550180, but it 
 does more calculation of output data. 
 
106 
 
 57550182 This program is similar to S7550180, but it 
 copies more data into an output file. 
 
 57550183 This program is similar to S7550182. 
 
 As a part of the analysis, storage allocation was done for 
 each phase. In view of the peaks in Figure 6.1 at variable sizes of 5, 
 10, and 20 characters, an allocation was made for each of these sizes. 
 Table 6.5 presents the number of words needed and the percentage utili- 
 zation for each of these allocations. The significance of these 
 results is discussed in Chapter 7. 
 
 6.5 Program Simulation 
 
 After the detailed analysis of a program was complete, its 
 execution was simulated to gather further information about machine 
 parameters and to obtain an estimate of the speedup possible using our 
 method. For this simulation the following rules were used: 
 
 1) The effects of storage access conflicts were 
 ignored. This is justified since the multiplicity 
 of Data Memory Units and the use of the Storage 
 Allocation Algorithm of section 3.5 should keep 
 the number of such conflicts small. 
 
 2) IF Trees of four or fewer levels were given a con- 
 current execution time of one unit time, while 
 larger IF Trees were given an execution time of 
 two units of time. All of the IF Trees we found 
 could be arranged in eight or fewer levels. These 
 execution times are in line with Davis's results 
 [DAV72a]. 
 
107 
 
 r— LT> CTl CT> OO in CM S CM m CO LT) in CTl «3"0 CTlLO O CO 
 
 ^O , # ## p a # ■ ■ • • • •• •• ft ft 
 
 i— +■> r*» oooo oo in in <o «3- *a- co co ooo s co cm cr> ooo 
 
 raocsj om «d-i — i — i — i — i— r— 
 
 -Q i— 
 
 in o 
 
 ■ — to 
 
 II O =**= "O CO i— i— r— i— >— COCOr- r— i— i— r— t— i— i— f— ■— r— i— 
 
 <u — 
 
 N i — W MW OO M COON <t<31 >* i — S COO MO COCO 
 
 •I- &5 T- 
 
 oo +-> n mm ini — cor-».r^i — cos s toco inoo too <^-i — 
 
 r— rj co coco coi — in innvo coco cti r- ct» coco co ct> cm r— 
 -o (0 
 
 u o 
 
 O O to U) CON (Mi — COi — COi — O0CVI N COM OOl CVJCM OHO 
 
 IS-J^-O "vi" r— r— l— 
 
 lo r^. r-* «3- «3- o oooon cm o co oooo coco i— i— «^-^j- 
 
 CM CMM Oi — i — i — OOCOCMOOCMCMCOCMCOCO 
 
 =tfe Q) 
 
 r— «d- oo oo m cm ooo in oo co men cm in loco oco 
 
 fc« »l- 
 
 i— +j vo oo old in m co m «3- co co coo coco cm oo ooo 
 
 (O O CM i— r— in CM «3- r— i— i— i— i— r— 
 
 o 
 
 i — to 
 
 O =8= TO CO i — l — r— i— i — COCOi — l — i — i— i — l — l — i — l — r— i—i — 
 
 
 
 N 
 
 
 r- 
 
 m 
 
 CO 
 
 00 
 
 in cm 
 
 CO 
 
 CO r-. 
 
 O 
 
 «d" CT> 
 
 CO 
 
 r— r^ 
 
 in 
 
 r^ 
 
 r— CM 
 
 00 00 
 
 
 
 •r— 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 00 
 
 
 •4-> 
 
 in 
 
 in 
 
 in 
 
 f>» CO 
 
 CO 
 
 r^ i — 
 
 in 
 
 oo r^» 
 
 CO 
 
 in co 
 
 i^ 
 
 (T» 
 
 "=d- cr> 
 
 <=3- r— 
 
 
 
 
 p*™ 
 
 rD 
 
 CO 
 
 en 
 
 ct> 
 
 r— r— 
 
 in 
 
 in «=d- u 
 
 CO 00 
 
 CTl 
 
 r^ a> 
 
 CO 
 
 CO 
 
 oo r-» 
 
 CM i— 
 
 
 to 
 
 -a 
 
 fO 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 +J 
 
 S- 
 
 u 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 r— 
 
 o 
 
 o 
 
 to 
 
 in 
 
 CO 
 
 CM 
 
 cm r-^ 
 
 CO 
 
 i— CO 
 
 r— 
 
 CM CM 
 
 CM 
 
 CO CM 
 
 o 
 
 <y> 
 
 CM CM 
 
 cr> in 
 
 
 =3 
 
 3 
 
 _l 
 
 =8= "O 
 
 
 
 
 r— 
 
 
 
 
 
 
 
 
 
 
 
 
 to 
 CD 
 
 
 
 3 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 or 
 
 
 
 E 
 
 CO 
 
 CM 
 
 CM 
 
 "vj- «3" 
 
 o 
 
 ct> oo a 
 
 r— O 
 
 CM 
 
 <d- ^a- 
 
 r— 
 
 r— 
 
 in cm 
 
 1^ h». 
 
 
 
 
 
 =«= <D 
 
 ^— 
 
 r— 
 
 r-— 
 
 
 in 
 
 
 
 «vf «=!- 
 
 r— 
 
 in in 
 
 
 
 
 
 in 
 
 c 
 
 
 
 s: 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 • 
 
 o 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 m 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 in 
 
 
 
 +J 
 
 
 
 r^ 
 
 in 
 
 in 
 
 in 
 
 o in 
 
 in 
 
 o in 
 
 o 
 
 in co 
 
 CO 
 
 in en 
 
 CO 
 
 r»» 
 
 i — CM 
 
 «3" CO 
 
 a> 
 
 (0 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 i— 
 
 u 
 
 
 r— 
 
 +> 
 
 sj- 
 
 oo a 
 
 in cm 
 
 m 
 
 CTi CM 
 
 o 
 
 CO CM 
 
 CO 
 
 CO o 
 
 «* 
 
 o 
 
 en in 
 
 CTl 00 
 
 -Q 
 
 o 
 
 
 (0 
 
 o 
 
 CM 
 
 
 
 CM r— 
 
 
 CM i— 
 
 r— 
 
 
 
 
 r— 
 
 
 
 
 rcs 
 
 r— 
 
 O 
 
 -O 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 1— 
 
 5 
 
 o 
 
 CM 
 
 II 
 a> 
 
 N 
 
 o 
 o 
 
 to 
 
 3 
 
 CO 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 ,— 
 
 CTl 
 
 r— 
 
 r— 
 
 00 i— 
 
 CO 
 
 1— I-»~ 
 
 CO 
 
 r— r— 
 
 CO 
 
 r— r-» 
 
 ^f 
 
 CM 
 
 r— «3" 
 
 ^r in 
 
 
 E 
 
 •r - 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 0) 
 
 OO 
 
 
 +-> 
 
 CO 
 
 "=3- CM 
 
 oo in 
 
 CO 
 
 r^ r— 
 
 CO 
 
 r-«. in 
 
 CO 
 
 ID 00 
 
 CTl 
 
 r— 
 
 r— CTl 
 
 CO o 
 
 
 s: 
 
 "O" 
 
 
 o 
 
 CO 
 
 00 
 
 00 
 
 1 
 
 in 
 
 CO "?f 
 
 "3- 
 
 m in 
 
 cr> 
 
 r-^ cr> 
 
 CM 
 
 CO 
 
 m in 
 
 CM r— 
 
 
 
 O 
 
 o 
 
 to 
 
 in 
 
 CO 
 
 CM 
 
 CM CT> 
 
 CO 
 
 r— CM 
 
 r— 
 
 CM CM 
 
 CM 
 
 CO CM 
 
 O O 
 
 CM CM 
 
 en in 
 
 
 
 3 
 
 _l =0 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 3 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 E 
 
 s 
 
 r^ 
 
 r». 
 
 "^ «* 
 
 in 
 
 s m in 
 
 r«. r-. 
 
 in 
 
 r-» s 
 
 r^» 
 
 r*. 
 
 i— 00 
 
 cri cr> 
 
 
 
 
 
 =8: a> 
 
 
 
 
 
 CM 
 
 
 
 CM CM 
 
 
 CM CM 
 
 
 
 r-— 
 
 
 
 
 
 
 s: 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 ,__ 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 to <e 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 QJ -Q 
 
 CTi 
 
 CM CN 
 
 ^f CM 
 
 CO 
 
 co in cv 
 
 in co 
 
 ■ — 
 
 CTl i— 
 
 <* c 
 
 ^j" CM 
 
 in co 
 
 
 
 
 
 r— o 
 
 p— 
 
 
 
 
 
 CM 
 
 
 
 
 
 
 
 
 
 
 
 
 
 J2 «— 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 =*fc 
 
 (O O 
 S- i— 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 «a <a 
 
 CM 
 
 in 
 
 r— 
 
 r»» in 
 
 «=*- 
 
 s.cor^ 
 
 r— «^- 
 
 CM 
 
 cm in 
 
 00 CO 
 
 r» in 
 
 r— CTl 
 
 
 
 
 
 > u 
 
 CM 
 
 in 
 
 CM 
 
 CO 
 
 in 
 
 r— CM 
 
 i — 
 
 in co 
 
 
 s. in 
 
 00 CO 
 
 CM r— 
 
 CO «3- 
 
 
 
 
 
 o 
 
 
 r~~ 
 
 r— 
 
 
 
 
 
 r— ,— 
 
 
 
 
 
 
 
 
 
 
 
 _i 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 QJ 
 
 to 
 
 fO 
 O- 
 
 i — i — CM i — CM i — i — cm ro 
 
 CM 
 
 r— CM i— CM p— CM r— CM 
 
 CT) 
 O 
 
 Q. 
 
 CO 
 
 in 
 co 
 
 o 
 
 in 
 
 co 
 
 o 
 
 o 
 in 
 in 
 
 in 
 
 o 
 in 
 o 
 in 
 in 
 
 in 
 
 o m 
 
 co o 
 
 o o 
 
 O CM 
 
 CM 
 
 in 
 
 CM 
 
 in 
 
 
 in 
 
 o 
 
 i — 
 
 CM 
 
 CO 
 
 
 CM 
 
 00 
 
 CO 
 
 00 
 
 00 
 
 CM 
 
 o 
 
 r— 
 
 r— 
 
 r— 
 
 r— 
 
 r— 
 
 o 
 
 o 
 
 o 
 
 o 
 
 o 
 
 in 
 
 1— 
 
 in 
 
 in 
 
 in 
 
 in 
 
 •z. 
 
 in 
 
 in 
 
 in 
 
 in 
 
 in 
 
 oo 
 
 s 
 
 r^ 
 
 r^ 
 
 r^» 
 
 r*. 
 
 oo 
 
 00 
 
 oo 
 
 oo 
 
 oo 
 
 oo 
 
108 
 
 3) Reading records from the primary input file was 
 assumed to take no time. We assume that data from 
 the primary input file is transferred from the 
 
 I/O Processor to the Data Memory before the READ 
 statement is executed. For statements accessing 
 other files, we assume that it takes one cycle to 
 move the data between the Data Memory and the 
 I/O Processor. 
 
 4) Fetching and storing data were assumed to take no 
 execution time. In the case of store instructions, 
 data to be stored is kept in one of the Data 
 Registers in a Data Memory Unit until the Primary 
 Memory has a free cycle. Allotting no time to a 
 fetch of data by a processor is justified on the 
 following grounds. We can consider the Instruction 
 Dispatch Unit to be a "black box" which releases 
 each instruction some time after it arrives from 
 
 an Address Counter. While the delay is not the 
 same for each instruction, we can say that there is 
 some, unspecified, average delay for each program. 
 We would expect this average delay to be much 
 smaller than the total execution time of the pro- 
 gram, hence we ignore this delay in our calculation 
 of program speedup. Since the Instruction Dispatch 
 Unit does not release an instruction until its data 
 is available in a Data Register, and since we expect 
 
109 
 
 transfers of data from a Data Register to a 
 processor to take much less than the time needed 
 to execute the operation, it seems reasonable to 
 ignore data fetch time. 
 
 5) An exception to Rule (4) was the MOVE operation. 
 Since a MOVE is nothing more than a fetch followed 
 by a store, it is tempting to say that MOVEs take 
 no time. In view of the large fraction of a COBOL 
 program that MOVEs comprise, however, this would 
 not be realistic. Thus, we used a time of one unit 
 for a MOVE instruction. Further, we assumed that a 
 MOVE CORRESPONDING instruction required one unit of 
 time for each item to be moved. 
 
 6) All arithmetic, comparison, and connective operations 
 were assumed to take a single unit of execution time. 
 
 7) FORK and RELEASE instructions were assumed to take 
 no execution time. This is justified since these 
 instructions go directly to the Address Counter 
 Coordinator which executes them while the Address 
 Counter continues on with the next instruction. 
 
 8) QUIT instructions were assumed to take one unit of 
 time to execute. This was done to simulate the 
 delay between the start of execution of a QUIT 
 and the time the Address Counter is again avail- 
 able for assignment. 
 
no 
 
 9) The operation of TEST instructions was simulated 
 by delaying the execution of an interlocked block 
 until the preceding Address Counter had passed an 
 appropriate RELEASE instruction. 
 
 10) TRANSFORM instructions were assumed to take one 
 unit of time to execute. 
 
 11) Those instructions containing subscripted vari- 
 ables were assumed to require a unit of time for 
 effective address calculation in addition to the 
 execution time of the operation. This was done 
 to allow for the time needed to transfer the 
 subscript value from the Data Memory to an Address 
 Counter. For a block of assignment statements 
 using the same subscript more than once, the 
 additional unit of time was added to the whole 
 block once, rather than to each subscripted state- 
 ment in the block. 
 
 12) ON and AT conditions were assumed to require no 
 time to execute since they could be implemented 
 as interrupts causing a branch to the appropriate 
 section of code during the execution of the 
 instruction to which the ON or AT condition was 
 attached. 
 
 13) Each exit from a conditional branch was assigned 
 a probability of execution according to the 
 following criteria: 
 
Ill 
 
 a) Exits leading to error processing were assigned 
 a probability of zero. While these paths in a 
 program are undoubtedly executed in the real 
 world, they should normally be executed in yery 
 small proportion to the non-error processing. 
 
 b) Any exit leading to an early termination of the 
 program was assigned an execution probability 
 of zero. 
 
 c) For some programs, information was available to 
 us in the form of counter values accumulated 
 during the actual execution of the program. 
 From these counter values, rough estimates of 
 execution probabilities for some paths could 
 
 be made. 
 
 d) In all other cases, it was assumed that exits 
 from a conditional branch instruction were 
 equally probable. 
 
 The results of the simulations are given in Table 6.6. The 
 following items are tabulated for each phase: 
 
 # STMT - Number of statements found within the phase. 
 
 Note that this is a static, rather than 
 
 dynamic value, 
 t-, - The amount of time a sequential processor 
 
 needs to process one record. 
 T, - The amount of time a sequential processor 
 
 needs to execute the segment of the simulated 
 
112 
 
 Table 6.6 
 Speedup Results 
 
 Program # 
 ID Phase STMT 
 
 B7510363 
 15156040 
 
 15156050 
 
 15210030 
 15212005 
 
 SSN512 
 
 S7510025 
 S7550180 
 
 S7550181 
 
 S7550182 
 
 S7550183 
 
 1 
 
 74 
 
 34 
 
 1 
 
 105 
 
 136 
 
 2 
 
 22 
 
 15 
 
 All 
 
 127 
 
 103 
 
 1 
 
 14 
 
 11 
 
 2 
 
 16 
 
 13 
 
 All 
 
 30 
 
 12.3 
 
 1 
 
 50 
 
 42 
 
 1 
 
 100 
 
 49.5 
 
 2 
 
 36 
 
 14.1 
 
 3 
 
 9 
 
 9 
 
 All 
 
 145 
 
 36.8 
 
 1 
 
 54 
 
 37.5 
 
 2 
 
 18 
 
 17 
 
 All 
 
 72 
 
 32.9 
 
 1 
 
 4 
 
 4 
 
 1 
 
 64 
 
 34 
 
 2 
 
 5 
 
 6 
 
 All 
 
 69 
 
 30.0 
 
 1 
 
 40 
 
 25 
 
 2 
 
 8 
 
 10 
 
 All 
 
 48 
 
 22.0 
 
 1 
 
 63 
 
 65.5 
 
 2 
 
 29 
 
 60 
 
 All 
 
 92 
 
 63.5 
 
 1 
 
 93 
 
 68.5 
 
 2 
 
 47 
 
 45 
 
 All 
 
 140 
 
 65.0 
 
 34 
 
 136p 
 
 15p 
 
 151p 
 
 Up 
 24 
 llp+24 
 
 42 
 
 257.4 
 14.1 
 24 
 
 295.8 
 
 75 
 17p 
 17p+75 
 
 4 
 
 102 
 6p 
 6p+102 
 
 50 
 
 lOp 
 10p+50 
 
 131 
 120 
 251 
 
 137 
 
 45p 
 45p+137 
 
 12 
 
 16 
 
 6 
 
 22 
 
 5 
 
 10 
 15 
 
 13.5 
 7.0 
 6.0 
 
 26.5 
 
 14 
 
 4 
 
 18 
 
 36 
 
 6 
 
 42 
 
 12 
 
 3 
 
 15 
 
 9 
 
 5 
 
 14 
 
 17 
 
 3 
 
 20 
 
 max 
 
 12p 
 
 4 
 
 12p 
 
 4p 
 2 
 
 4p 
 
 25 
 
 5 
 
 3 
 25 
 
 3 
 
 3 
 
 1 
 
 4 
 
 2p 
 
 2p 
 
 P 
 
 P 
 
 3 
 3 
 3 
 
 7 
 
 2p 
 
 2p 
 
 
 1.3 
 
 
 P 
 
 
 P 
 
 
 P 
 
 
 P 
 1.3 
 
 3 
 
 .7p+8.7 
 
 
 1.9 
 
 
 5.2 
 
 
 1.1 
 
 
 2.3 
 
 
 3.5 
 
 
 1.9 
 
 
 
 P 
 .2p+1.5 
 
 
 1 
 
 
 1.4 
 
 
 
 P 
 .lp+1.2 
 
 
 1.6 
 
 
 
 P 
 .2p+1.3 
 
 
 1.5 
 
 
 2.0 
 
 
 1.7 
 
 
 2.3 
 
 
 
 P 
 .2p+2.0 
 
 2.8 
 
 8.5p 
 2.5p 
 6.9p 
 
 2.2p 
 2.4 
 0.7p+1.6 
 
 4.7 
 
 19.1 
 2.0 
 4.0 
 
 11.2 
 
 5.4 
 
 4.3p 
 0.9p+4.2 
 
 1 
 
 2.8 
 
 8p 
 l.lp+2.4 
 
 4.2 
 
 3.3p 
 0.7p+3.3 
 
 14.6 
 24.0 
 17.9 
 
 8.1 
 
 15p 
 2.3p+6.9 
 
113 
 
 code on which the speedup calculation is 
 
 based. 
 
 T - The amount of time our concurrent processor 
 
 needs to execute the same segment of code 
 
 which generated the value given for T-. . 
 
 p - The maximum number of arithmetic units 
 r max 
 
 required to execute the simulated code. 
 a~ - The average number of Address Counters in 
 
 use during the execution of the program, 
 n - The maximum number of Address Counters 
 
 needed during the execution of the program. 
 S - The average speedup found from the ratio: 
 
 s -A- 
 
 P T. 
 
 P 
 In addition, these items are also tabulated for the program as a whole, 
 
 In generating this information, however, the effect of the execution 
 
 time of links was ignored. Compared to the execution time of a phase 
 
 we expect link execution time to be negligible. The total values, 
 
 then, are for all of the phases combined, as follows: 
 
 N 
 
 T > = ,£, \ 
 
 N 
 
 P fa Pi 
 Pm,v = maX {Pm.v 1 (1 = !» 2 > •••> N ) 
 
114 
 
 a = 
 
 1 N - - 
 
 T" I d i T P 
 
 T i=l n p i 
 
 n = max {n.} (i = 1 ,2, . . . ,N) 
 
 s - T i 
 
 P T 
 P 
 
 where N = number of phases in the program. We have assumed that the 
 same number of records are processed by each phase. The factor p 
 appearing in some of the entries is defined as the minumum of the 
 number of records available to be processed and the number of Address 
 Counters available to direct the processing. For example, if there 
 are 10,000 records available and 10 Address Counters, a speedup of 5p 
 is a speedup of 50. 
 
115 
 
 7. MACHINE PARAMETERS 
 
 To complete the design of our concurrent record processing 
 machine, we present in this chapter the values of various parameters for 
 the machine design proposed in Chapter 5. 
 
 7.1 Speed Limitation 
 
 By selecting the appropriate numbers of memories and proces- 
 sors, we can supply any memory and computational bandwidth needed. To 
 provide the necessary instruction fetch rate from the Address Counters, 
 we can use replicated interleaved Program Memories, multiple-instruction 
 fetches, and a pipelined Address Counter design. Compared to the 
 instruction rates handled by the Data Memory and the processors, the 
 rate handled by the Address Counter Coordinator should be quite small. 
 Thus, none of these units should limit the speed of our machine. 
 
 The speed of the Instruction Dispatch Unit, however, does 
 impose a limitation on the speed in our design. All instructions except 
 branch instructions have to pass through it. Pipelining steps (1) 
 through (7) of the operation of this unit (section 5.4) and executing 
 steps (3) through (6) in parallel by maintaining three copies of the 
 operand address in the Tag Status Register Array should allow us [GUN70, 
 VAD71] to process a single instruction in on the order of 250 nanosec- 
 onds. With a five stage pipeline, this gives us a dispatch time of 
 50 nanoseconds per instruction. Taking one microsecond as a Primary 
 Memory cycle time, this gives us a throughput rate of about 20 instruc- 
 tions per cycle. 
 
116 
 
 7.2 Number of Address Counters 
 
 From the estimated throughput rate for the Instruction 
 Dispatch Unit, it is apparent that 20 Address Counters, each fetching 
 an instruction per cycle, would generate the maximum load the 
 Instruction Dispatch Unit could handle. In order to have a power of two 
 to simplify addressing, we choose 16 as the number of Address Counters 
 in our machine. 
 
 7.3 Data Memory Word Size 
 
 As a result of our analysis of variable size (Table 6.1) and 
 of storage allocation (Table 6.5), we can select a word length for the 
 Data Memory. Considering Figure 6.1 we note that there are peaks at 
 values of 5, 10, and 20 characters. Clearly, we would prefer to use 
 these sizes rather than, say, 4, 8, or 16. We note, from data presented 
 in Table 6.5, that the efficiency of memory usage is lowest for a word 
 size of 20 characters, much better for a word size of 10 characters, but 
 only marginally improved, beyond that, for a word length of 5 characters. 
 The choice of word size is made more obvious if we plot the percentage 
 of the variables in our sample having a length less than or equal to a 
 particular size, against that variable size. This is shown in 
 Figure 7.1. As can be seen from this plot, almost 80% of the variables 
 have a length less than or equal to 5, and over 96% have a length less 
 than or equal to 10. On the basis of these statistics, it seems appar- 
 ent that 10 characters per word is a good Data Memory word length. 
 
117 
 
 lOOl 
 
 tt + i i i *+ ++ 
 
 It I I I I I I I I M I I | I I I I I 
 
 10 
 
 20 
 
 30 
 
 - T 
 40 
 
 Variable Size (Characters) 
 
 Figure 7.1 
 Cumulative Variable Size Distribution 
 
118 
 
 7.4 Data Character Size 
 
 Since the number of bits in a character did not enter into any 
 of our analyses, we lack a statistical basis from which we can derive a 
 good character size. Instead, we present the following remarks as jus- 
 tification for our choice. 
 
 Judging from the programs we have examined and from discus- 
 sions with COBOL programmers, it appears that data items are most 
 commonly viewed as character strings. This includes alphabetic informa- 
 tion such as a student's name, non-computational numeric data such as 
 his Social Security Number, and computational data such as the number of 
 hours of credit he has accumulated. Furthermore, even high usage 
 numeric data such as record counters are often retained in character 
 form. During the execution of an IBM/360 COBOL program, this results in 
 much transformation of data from character form to packed decimal form 
 and back again. It seems apparent that the use of a single representa- 
 tion for all data could result in a modest improvement in execution 
 speed by simply avoiding all of these unnecessary transformations. 
 Doing arithmetic with such a representation should prove to be no prob- 
 lem [SCH72]. 
 
 Another consideration in determining character size is the 
 number of characters needed in the character set. The COBOL programs we 
 examined had no need for a huge set of characters. Some machines [BUR67] 
 executing COBOL programs today have as few as 64 characters in their 
 character sets without ill effect. 
 
 Thus, we propose that our machine use 6-bit characters. Fur- 
 ther, we propose that this be the only way data is encoded. Our 
 
119 
 
 arithmetic units use the low order four bits of a character as a BCD 
 encoding of a number, signalling an error condition if the high order 
 bits are not the proper code for a numeric character. 
 
 7.5 Number of Data Memory Units 
 
 There is no apparent correlation between program size and the 
 number of memory units needed for the program's data. This is shown in 
 Table 7.1 and in Figure 7.2. It is apparent that 16 memory units would 
 be adequate for all but a few programs. The data for these few programs 
 could be made to fit into 16 memories at some sacrifice in execution 
 speed due to memory access conflicts. However, this degradation would 
 be mitigated through our use of high speed Data Registers in the Data 
 Memory. Thus, we propose that 16 Data Memory Units be used. 
 
 7.6 Size of Data Memory 
 
 From Table 7.1 we find that the number of words needed to hold 
 the globally accessable data is small compared to that needed for locally 
 accessable data. The largest number of words needed for local storage 
 by our sample programs is 19 words per memory unit. Allowing for the 
 size of our programs and the growth of memory requirements with program 
 size, 128 words per memory unit per Address Counter seems a reasonable 
 amount. Programs needing more than this will not be able to use all 
 16 Address Counters, but will be able to execute. This gives a require- 
 ment of about 2K words of local memory. In terms of characters and 
 bits, this would provide 20K characters and 120K bits of storage. For 
 
 *1K = 1024; 1M = 1,048,576. 
 
120 
 
 
 
 Tab 
 
 le 7.1 
 
 
 
 
 
 
 
 Memory Requirements 
 
 
 
 
 Program ID 
 
 # 
 Cards 
 
 Phase 
 
 # Wds 
 Global 
 
 § Wds 
 
 Local 
 
 # 
 Memories 
 
 Local Wds 
 x Memories 
 
 B7510363 
 
 305 
 
 1 
 
 3 
 
 
 5 
 
 13 
 
 65 
 
 15156040 
 
 590 
 
 1 
 2 
 
 
 
 3 
 2 
 
 12 
 12 
 
 36 
 24 
 
 15156050 
 
 116 
 
 1 
 2 
 
 
 
 2 
 17 
 
 4 
 4 
 
 8 
 68 
 
 15210030 
 
 288 
 
 1 
 
 
 
 3 
 
 50 
 
 150 
 
 15212005 
 
 387 
 
 1 
 2 
 3 
 
 3 
 3 
 
 
 1 
 3 
 
 1 
 
 9 
 8 
 8 
 
 9 
 
 24 
 8 
 
 SSN512 
 
 150 
 
 1 
 2 
 
 
 
 2 
 2 
 
 41 
 40 
 
 82 
 
 80 
 
 S7510025 
 
 75 
 
 1 
 
 
 
 2 
 
 12 
 
 24 
 
 S7550180 
 
 301 
 
 1 
 2 
 
 
 
 3 
 2 
 
 54 
 54 
 
 162 
 108 
 
 S7510181 
 
 288 
 
 1 
 2 
 
 
 
 10 
 9 
 
 11 
 11 
 
 no 
 
 99 
 
 S7550182 
 
 174 
 
 1 
 2 
 
 
 
 2 
 2 
 
 16 
 12 
 
 32 
 
 24 
 
 S7550183 
 
 351 
 
 1 
 2 
 
 
 
 19 
 16 
 
 17 
 17 
 
 323 
 272 
 
121 
 
 o 
 o 
 
 o 
 o 
 m 
 
 o 
 o 
 
 o 
 o 
 to 
 
 o 
 o 
 
 CM 
 
 i 
 
 CD 
 
 S- 
 
 <1J 
 
 N 
 
 OO 
 
 03 
 
 o 
 
 s- 
 
 Q- 
 
 > 
 
 CO 
 
 +-> 
 
 c 
 
 O) 
 
 E 
 <D 
 
 i- 
 
 cr 
 <u 
 
 o 
 
 E 
 0J 
 
 o 
 o 
 
 5 
 
 T 
 2 
 
 T 
 8 
 
 
 8 
 
 
 <0 
 
 
 
 1 
 
 
 1 
 
 
 1 
 
 o 
 
 1 
 o 
 
 1 
 O 
 
 
 1 
 o 
 
 * 
 
 ro 
 
 r\i 
 
 
 — 
 
 s//u/j Ajoujdw jo jaquinN 
 
122 
 
 a 16 memory unit machine, then, we have 32K words, 320K characters, or 
 1.9M bits. In terms of 8 bit bytes, this would be slightly more than 
 243K bytes of data storage capacity. 
 
 From Table 7.1 it appears that 4 Data Registers per Data 
 Memory Unit per Address Counter should be adequate. Thus, we need 64 
 Data Registers per Data Memory Unit for a total of 1024 words of high 
 speed memory. In each Data Memory Unit we also need 64 words x 11 bits 
 of content addressable Address Memory. 
 
 7.7 Size of Program Memory 
 
 In order to calculate the size of a Program Memory word, we 
 need to make the following assumptions: 
 
 1) Each instruction can contain two operand addresses 
 and a result address. 
 
 2) Each address in the instruction is composed of a 
 base register field, an index register field, and 
 a displacement field. 
 
 3) Each Address Counter contains 8 registers as base 
 or index registers, requiring 6 bits per address. 
 
 4) The maximum displacement necessary is 250 charac- 
 ters (25 words), requiring 8 bits per address. 
 
 5) Eight bits are sufficient for the operation code. 
 
 6) Six bits are sufficient to contain the length of 
 each operand and result. 
 
 Each operand, then, requires 20 bits to hold the address and 
 length. This gives us a total requirement of 68 bits per instruction. 
 
123 
 
 Since not all types of instructions have two operands and a 
 result, short-format instructions can be defined which use up to 34 bits 
 and are stored two per word. 
 
 To obtain an estimate of the total number of words needed in 
 the Program Memory, we need to estimate the size of the largest programs 
 we can reasonably expect a user to attempt to run. Table 7.2 gives some 
 information which was obtained from the Student Data Area and the 
 Financial Area of the University of Illinois Office of Administrative 
 Data Processing. This data shows that there are few yery large programs, 
 with none of these large programs containing more than 6,000 cards. In 
 addition, we were informed that large programs tended to be overlayed to 
 reduce the memory required for the program. Thus, it seems reasonable 
 to assume that 6,000 cards produce about as much code as we would expect 
 to be required to hold in the Program Memory at one time. 
 
 To obtain an estimate of the ratio of the number of cards per 
 statement in a large program, we examined our original sample of 42 pro- 
 grams and computed this ratio for each program of more than 1,000 cards. 
 The average value of this ratio was approximately 1.5. If we assume 
 that a 6,000 card program will generate a statement for each 1.5 cards, 
 this implies that we need about 4,000 words of Program Memory. To imple- 
 ment a memory of 4,096 words of 68 bits per word requires 272K bits. 
 This is equivalent to a 34K byte memory. 
 
 Another part of the Program Memory for which we need to com- 
 pute a size is the Cache. For most programs, we would like the Cache 
 to hold an entire phase. From Table 6.4 we find none of the programs 
 we analyzed had phases which exceeded 200 statements in total size. 
 
124 
 
 Table 7.2 
 Program Size Statistics 
 
 Student Data Area programs for which data was readily available: 
 
 Percent of Sample 
 
 56 
 28 
 11 
 
 3 
 
 1 
 > 1 
 
 
 
 Financial Area largest programs for which data was at hand: 
 
 3348 cards 
 
 3293 
 
 5627 
 
 Program Size 
 
 Numbe 
 
 < 500 cards 
 
 292 
 
 < 1000 
 
 147 
 
 < 2000 
 
 60 
 
 < 3000 
 
 17 
 
 < 4000 
 
 5 
 
 < 5000 
 
 3 
 
 > 5000 
 
 
 
125 
 
 However, allowing for the fact that our sample programs are small, we 
 propose a Cache size of IK words. 
 
 7.8 Numbers of Processors 
 
 7.8.1 IF Tree Processor 
 
 Table 6.2 shows that IF statements form about a quarter of the 
 statements in our sample. If we assume that we can group an average of 
 five IF statements per IF Tree, which is a fairly conservative assump- 
 tion, then roughly 5% of the operations executed in our machine will be 
 IF Tree execution operations. With 16 instruction streams active, this 
 implies that we need one IF Tree Processor. 
 
 7.8.2 Arithmetic/Logical Processors 
 
 For the Arithmetic/Logical Processor we look first at 
 Table 6.2 and find that the Arithmetic and IF statements, in which the 
 operators of Table 6.3 are found, comprise 34.1% of the program. Com- 
 paring the number of operators with the number of Arithmetic and IF 
 statements, we find an average of 1.2 operators per statement. This 
 implies that almost 41% of the instructions in the program require an 
 Arithmetic/Logical Processor. Doing a similar calculation for memory- 
 only operations (MOVE, Increment, etc.), we arrive at an estimate that 
 about half the operations involve only the memory. 
 
 * 
 For this calculation we have not included the Increment 
 
 operator and the NUMERIC and ALPHABETIC tests which are executed in 
 the Data Memory. 
 
126 
 
 Next, we consider the interaction between the memories and the 
 processors. Using access times and cycle times of an existing machine 
 as a guide [IBM73a, IBM73b], it seems reasonable to assume the following 
 operation times: 
 
 Fetch from Primary Memory 
 
 to a Data Register 1000 nsec. 
 
 Transfer between Data Register 
 
 and processor 200 nsec. 
 
 Add instruction time 600 nsec. 
 
 With operation times on this order, it is apparent that a pair of memory 
 units could keep a pair of processors supplied with data and still have 
 time for two to three memory-only operations per Primary Memory cycle. 
 This is roughly the 40%: 50% proportion of Arithmetic/Logical to memory- 
 only operations we found in the first part of this section. Thus, there 
 should be approximately the same number of Arithmetic/Logical Processors 
 as Data Memory Units. To make the total number of Arithmetic/Logical 
 and IF Tree Processors a power of two, we choose 15 as the number of 
 Arithmetic/Logical Processors. 
 
 7.8.3 1/0 Processors 
 
 For most programs we would like to be able to assign an 1/0 
 Processor to each file in use in a phase. For our sample of 42 programs 
 we found that the average number of files used per phase was 3.4, with 
 a maximum of nine. Since the larger programs tended to use more files, 
 about six 1/0 Processors are needed. To get a power of two for address- 
 ing purposes, we select eight as the number of 1/0 Processors. 
 
127 
 
 7.8.4 SORT Processor 
 
 None of the programs we examined had more than one file being 
 sorted at one time. Thus, one sorting network should be sufficient. 
 
 7.9 Instruction Dispatch Unit Memory Sizes 
 
 There are two memories in the Instruction Dispatch Unit whose 
 sizes we must find. These are the Tag Status Register Array and the 
 Instruction Waiting Register Array. The following analysis yields these 
 sizes: 
 
 Let 1 unit time = 1 memory cycle time 
 = 1 add time 
 
 Assume: 
 
 1) n = number of Address Counters 
 
 2) m = number of memory units 
 
 3) p = number of Arithmetic/Logical Processors + number 
 
 of IF Tree Processors 
 
 4) Each Address Counter issues r instructions per unit time 
 at highest speed. 
 
 5) Each Address Counter is fetching instructions for a 
 fraction f f <_ 1 of the time. 
 
 6) The instruction stream contains the following fractions 
 of each instruction type: 
 
 f- Arithmetic 
 
 f M Memory-only 
 
 f c Address Counter Control 
 
128 
 
 f R Branch 
 
 f. Conditional Branch 
 
 f A +f M +f C +f B +f I =1 . f 7 - 9 " 1 ' 
 
 7) The number of operands for instructions reaching 
 the Instruction Dispatch Unit are: 
 
 3 Arithmetic 
 
 2 Memory 
 
 1 Address Counter Control 
 
 8) Average holding times for tags are: 
 
 Arithmetic instructions 
 
 Sources 2 
 
 Results 3 
 Memory-only instructions 
 
 Sources 1 
 
 Results 2 
 
 9) Average holding times in the Instruction Dispatch 
 Unit for instructions are: 
 
 Arithmetic 2 (1 fetch, 1 previous operation) 
 Memory 1 (1 previous operation) 
 10) Consider a condition in which all Address Counters 
 are active, with conditional and unconditional branch 
 instructions being encountered, but no interlocks 
 inhibiting any of the Address Counters. 
 Considering the machine to be represented by the model in 
 Figure 7.3, we define: 
 
129 
 
 2 o o 
 < o o 
 
 'c 
 ID 
 
 o 
 
 E 
 
 ZE 
 
 o 
 
 ro 
 
 CD 
 
 CD 
 
 = b 5 
 
 -a 
 o 
 
 CD 
 
 +-> 
 
 q; 
 
 i_ 
 cu 
 
 4- 
 
 t/» 
 
 c 
 
 c 
 o 
 
 (_> 
 
 3 
 S- 
 
 +-> 
 if) 
 
 
 
 
 ^r >/ 
 
 ^i 
 
 k^ 
 
 >v << 
 
 
 
 
 
 
 
 
 ^<" 
 
 
 
 
 
 
 
 
 
 CM 
 
 • • • 
 
 c 
 
 
 ~1 
 
 I 
 
 
 1 
 
 1 
 
 i 
 
 i 
 
 «^ 
 
 
 
 ••- 
 
 
 •^ 
 
 
 -< 
 
 
 
 ^< 
 
 
 y< 
 
 
 
 v> 
 
 M w 
 
 « «- 
 
 2 ° 
 
 E >. 
 
 2 I 
 
 a. 2E 
 
130 
 
 X f = rate at which instructions are being fetched from 
 Program Memory by each Address Counter 
 - f - x (maximum fetch rate) 
 
 X. = rate at which instructions are being transferred from 
 each Address Counter to the Instruction Dispatch Unit 
 
 A, = rate at which instructions reach the Instruction 
 Dispatch Unit 
 = rate at which instructions leave the Instruction 
 Dispatch Unit 
 
 A c = rate at which Address Counter Coordinator instructions 
 leave the Instruction Dispatch Unit 
 
 A p = rate at which processor instructions leave the 
 Instruction Dispatch Unit 
 
 A = rate at which processor instructions reach each 
 processor 
 
 A M = rate at which memory-only operations leave the 
 Instruction Dispatch Unit 
 
 A = rate at which memory-only operations reach each 
 
 Data Memory Unit 
 Since we are assuming that interlock instructions are not 
 encountered, we neglect the contribution of Ap. 
 
131 
 
 Since each processor is capable of executing an operation each 
 cycle, 
 
 A p < p operations per cycle. 
 Since each Data Memory Unit can handle about twice as many memory-only 
 operations each cycle (in addition to processor operations) as there are 
 memories 
 
 X M <_ 2m operations per cycle. 
 Thus, 
 
 L < 2m + p operations per cycle. (7.9-2) 
 Comparing this result with the discussions of sections 7.1, 7.6, and 7.8 ; 
 we see that 
 
 Xj = 20 
 
 m = 16 
 
 p = 16 
 which does satisfy relation 7.9-2. 
 
 Since we are assuming that interlocks do not interfere with 
 Address Counter functioning, the only things that do interfere are 
 branches, both conditional and unconditional. If we assume that a con- 
 ditional branch takes one unit of time to be resolved, then a fraction 
 1 - fx - f B of the instructions fetched are passed on to the 
 Instruction Dispatch Unit. Thus, 
 
 x i - (1 - f, - f B ) x f 
 
 negl . 
 
 ■ (f A + f M + X> X f 
 
 Since the instruction streams coming from each Address Counter are inde- 
 pendent (by our assumption of no interlock activity, the rate at which 
 
132 
 
 instructions reach the Instruction Dispatch Unit is n times the rate 
 instructions are issued by an Address Counter. 
 
 h = nA i 
 
 = nX f (f A + f M ) (7.9-3) 
 
 The rate at which instructions leave the Instruction Dispatch Unit is 
 equal to the rate at which they arrive. Clearly, this must be true or 
 infinite queues would be needed. From relation 7.9-3 the arrival rate 
 for tag requests is 
 
 A T = nX f (3f A + 2f M ) 
 
 using assumption (7). 
 
 The number of tags in use at any point in time is 
 
 
 t Y (number of arrivals at t.) 
 
 n t = y i=k 
 
 types of 
 requests 
 
 where t. is the smallest average time such that there are no tags 
 
 remaining at time zero from those that arrived t. units of time earlier. 
 
 From assumption (8) we have 
 
 n(3 + 3 + l)(3f.X f ) n(2 + D(2f„X f ) 
 
 1 3 2 
 
 N T = 7f A X f n + 3f M X f n 
 
 = nX f (7f A + 3f M ) (7.9-4) 
 
 In a similar manner we find that the number of instructions 
 
 waiting, per unit time, for their operands to become available is 
 
 Nj = nX f (f A + f M ) (7.9-5) 
 
133 
 
 We have previously calculated the following values: 
 n = 16 Address Counters 
 
 
 f A 
 
 ~ 
 
 0.4 
 
 
 
 
 f M 
 
 ~ 
 
 0.5 
 
 
 
 
 X l 
 
 = 
 
 20 operations 
 
 per 
 
 cycle 
 
 From 
 
 equation 7. 
 
 9 
 
 -3 we find 
 
 
 
 
 v 
 
 = 
 
 X I 
 
 
 
 
 n(f A + f M ) 
 
 
 
 
 = 
 
 20 
 
 
 
 
 16(.9) 
 
 
 = 1.4 operations per cycle per Address Counter, 
 From equation 7.9-4 we then have 
 
 N T = 16(1. 4)(7 x 0.4 + 3 x 0.5) 
 =96.3 tags in use. 
 From equation 7.9-5 we find 
 
 Nj = 16(1.4)(0.9) 
 
 =20.2 instructions waiting. 
 Clearly, 128 Tag Status Registers and 32 Instruction Waiting 
 Registers are adequate. The total sizes of these two memories are found 
 as follows: 
 
 1) For each Tag Status Register we need three content 
 addressable address fields, each of which is com- 
 prised of 15 bits (log 2 (32K)). In addition we 
 need a 128 word x 48 bit content addressable memory 
 for the Tag Status Register Array. 
 
134 
 
 2) Each Instruction Waiting Register is comprised of 
 the following fields: 
 
 a) Instruction Operation Code ( 8 bits) 
 
 b) Operand and result addresses (3 x 15 bits) 
 
 c) Operand and result lengths (3x6 bits) 
 
 d) Operand and result tags (3x5 bits) 
 
 e) Status bits ( 2 bits) 
 This gives a total of 88 bits of which the 15 tag 
 storage bits must be content addressable. Thus, 
 
 the Instruction Waiting Registers can be constructed 
 of 32 words x 73 bits of random access memory and 
 32 words x 15 bits of content addressable memory. 
 
 7.10 Other Devices 
 
 Within each unit of the machine there are a number of devices 
 which we have not yet discussed. To allow us to estimate in section 7.11 
 the number of packages needed to build the machine, we briefly describe, 
 in this section, the circuitry for each unit. Not included in this 
 section, however, are the control circuitry for each unit and the bus 
 drivers and receivers needed for the various control signals sent 
 between units. These units are included in the calculations of 
 section 7.11 . 
 
 7.10.1 Program Memory 
 
 In addition to the memory itself we need the following devices 
 in the Program Memory: 
 
135 
 
 Queue 
 
 - 16 words x 16 bits to hold instruction addresses 
 (12 bits) and Address Counter numbers (4 bits) in 
 the Fetch Queuing and Routing Unit. 
 
 Decoder 
 
 - 4 bits of l-of-16 for routing instructions to the 
 proper Address Counter. 
 
 Bus Drivers and Receivers 
 
 - 68 drivers for sending instructions to Address 
 Counters. 
 
 - 12 receivers for instruction addresses. 
 
 7.10.2 Address Counter 
 
 Each Address Counter requires the following devices: 
 Incrementer 
 
 - One 12 bit counter. 
 Registers 
 
 - One 12 bit Program Address Register. 
 
 - One 68 bit Memory Buffer. 
 
 - Eight 15 bit index registers. 
 Decoder 
 
 - 3 bits to l-of-6 for the Op Code Decoder. 
 Matcher 
 
 - 4 bit matcher for the Address Counter ID Match 
 Unit. 
 
136 
 
 Adder 
 
 - Three 15 bit adders for the Address Calculation 
 Unit. 
 
 Bus Drivers and Receivers 
 
 - 12 drivers for sending the program address to 
 the Program Memory. 
 
 - 71 drivers for sending the instruction to the 
 Instruction Dispatch Unit. 
 
 - 15 drivers for transferring index registers to 
 the Address Counter Coordinator. 
 
 - 68 receivers for instructions from the Program 
 Memory. 
 
 - 12 receivers for the initiation point address 
 from the Address Counter Coordinator. 
 
 - 15 receivers for index register values from 
 the Address Counter Coordinator. 
 
 - 19 receivers for the Index Bus. 
 
 7.10.3 Instruction Dispatch Unit 
 
 In addition to the Instruction Waiting Registers and the Tag 
 Status Register Array, the following devices are needed in the 
 Instruction Dispatch Unit: 
 
 Queues 
 
 - 16 words x 71 bits for the Arriving Instruction 
 Queue. 
 
 - 48 words x 5 bits for the Tag Queue. 
 
137 
 
 - 8 words x 76 bits for the Processor Instruction 
 Queue. 
 
 Registers 
 
 - One 16 bit register for processor status 
 information. 
 
 - Five 71 bit registers used for pipelining the 
 operation of the Fetch and Tag Generator. 
 
 Bus Drivers and Receivers 
 
 - 76 drivers for the Memory Operation Bus. 
 
 - 71 drivers for sending instructions to the 
 Address Counter Coordinator. 
 
 - 76 drivers for the Processor Operation Bus. 
 
 - 71 receivers for instructions from the Address 
 Counters. 
 
 - 5 receivers for the Tag Bus. 
 
 7.10.4 Address Counter Coordinator 
 
 A detailed design for this unit was not presented in 
 Chapter 5; thus it is difficult to specify precisely the necessary 
 devices for the unit. If we assume an implementation completely of 
 hardware, however, it is apparent that devices of the following types 
 would be needed: 
 
 Queues 
 
 - 8 words x ~ 32 bits for the FORK queue. 
 
 - 2 words x ~ 32 bits for the HOLD queue. 
 
138 
 
 Registers 
 
 - Sixteen 16 bit registers for interlock status 
 information. 
 
 - Three 16 bit registers for Address Counter 
 status information. 
 
 - Sixteen 8 bit registers for predecessor/successor 
 information. 
 
 - Nine 15 bit registers for saving index registers 
 for a HOLDing Address Counter. 
 
 - Fifteen 4 bit registers for the list of inter- 
 locks whose RELEASE is being awaited. 
 
 Bus Drivers and Receivers 
 
 - 12 drivers for sending an initiation point 
 address to an Address Counter. 
 
 - 15 drivers for sending index register values. 
 
 - 15 receivers for receiving index register values 
 from an Address Counter. 
 
 - 71 receivers for instructions from the 
 Instruction Dispatch Unit. 
 
 7.10.5 Data Memory Unit 
 
 The Data Memory Unit includes the following devices: 
 Queue 
 
 - 2 words x 76 bits for the Memory Operation Bus. 
 Incrementer 
 
 - A 10 BCD digit counter. 
 
139 
 
 Match Circuits 
 
 - Ten 6 bit matchers in the Function Logic for 
 EXAMINE and TRANSFORM operations. 
 
 Register 
 
 - One 6 bit register associated with the match 
 circuits. 
 
 Bus Drivers and Receivers 
 
 - 19 drivers for the Index Bus. 
 
 - 60 drivers for the Inter-Memory Bus. 
 
 - 60 drivers to the Routing Network. 
 
 - 60 drivers for the I/O Bus. 
 
 - 5 drivers for the Tag Bus. 
 
 - 60 receivers for the Inter-Memory Bus. 
 
 - 60 receivers from the Routing Network. 
 
 - 60 receivers for the I/O Bus. 
 
 - 76 receivers for the Memory Operation Bus. 
 
 7.10.6 Routing Network 
 
 The Routing Network can be implemented as a 16 x 16 crossbar 
 switch. For 60 lines per path through the network we then need a 
 total of 15,360 crosspoints in the network. 
 
 7.10.7 Arithmetic/Logical Unit 
 
 Each Arithmetic/Logical Unit has the capability of operating 
 on a pair of 10 digit BCD numbers. In addition, logic is included to 
 interpret two low order digits as an exponent to simulate floating point 
 operations. This requires the following devices for each of these units: 
 
140 
 
 Adder 
 
 - 10 BCD digits. 
 Multiplier 
 
 - 10 BCD digits. 
 Logical Unit 
 
 - Ten 6 bit characters. 
 Shift Register 
 
 - 8 BCD digits. 
 
 Bus Drivers and Receivers 
 
 - 60 drivers to send to the Routing Network. 
 
 - 60 receivers to receive from the Routing Network. 
 
 - 76 receivers to receive from the Instruction 
 Dispatch Unit. 
 
 7.10.8 IF Tree Processor 
 
 The design of the IF Tree Processor has been detailed by Davis 
 [DAV72b: page 32] and is not repeated here. 
 
 7.10.9 1/0 Processor 
 
 No design for an I/O Processor is given in Chapter 5. In 
 order to arrive at an estimate of the hardware required for each of 
 these units, we first examine the throughput required for a unit. 
 
 From Table 6.6 we find that the average time spent processing 
 a record for phases able to use p Address Counters is on the order of 
 10 units of time. During this time we can have 16 records being pro- 
 cessed concurrently. If we assume that each READ statement results in 
 the transfer to the Data Memory of the equivalent of an entire card 
 
141 
 
 image of useful data, then we must transfer 480 bits for each READ. If 
 
 we assume that a unit of time is about 1 microsecond, we have 
 
 R = (480 bits/READ)(16 READs) 
 10 ysec 
 
 = 768 M bits/sec 
 for the rate at which an I/O Processor must supply data to the Data 
 Memory. Transmitting all data in parallel, the I/O Processor must exe- 
 cute 1.6 M transmissions per second. With all eight I/O Processors 
 active (a rare occurrence), the I/O Bus must handle 12.8 M transmissions 
 per second. If three quarters of each record read is discarded by the 
 I/O Processor, then each file must supply data to its I/O Processor at 
 four times the rate the I/O Processor supplies data to the Data Memory. 
 The data rate required from a file is then 6.4 M transmissions per 
 second or 3072 M bits per second. We can achieve this transfer rate if 
 we can obtain large head-per-track disks capable of reading five words 
 (300 tracks) in parallel at a rate of 10 M bits per second. Such disks 
 are not yet available, but bulk storage devices which have the necessary 
 bandwidth, notably those built of semiconductor devices, are becoming 
 feasible. 
 
 If we can obtain large head-per-track disks which can supply 
 data at high rates, then the amount of buffer storage needed in the I/O 
 Processor can be quite small. We will assume for the purposes of this 
 section that the I/O Processor is composed of a register for the cur- 
 rent five words of the current record, a register for data to be 
 transferred to the Data Memory, a memory containing record format 
 information, and a network to route ten characters at a time to one of 
 
142 
 
 the words in the buffer register. On this basis it appears that the 
 following devices are needed for each I/O Processor: 
 Memory 
 
 - 256 60 bit words of format memory. 
 Registers 
 
 - One 300 bit record buffer. 
 
 - One 960 bit data buffer. 
 Crosspoints 
 
 - A 50 x 50 array of crosspoints where each 
 crosspoint carries six bits in parallel. 
 
 Bus Drivers and Receivers 
 
 - 960 drivers for the I/O Bus to supply all 
 16 Data Memory Units simultaneously. 
 
 - 300 drivers to send data to a file. 
 
 - 960 receivers for the I/O Bus. 
 
 - 300 receivers to receive data from a file. 
 
 7.10.10 SORT Network 
 
 As in the case of the I/O Processor, no design has been pro- 
 posed for the SORT Network. Consequently, only a rough estimate of the 
 required hardware is given here. By supplying the SORT Network with 
 data from the Data Memory Units, we can sort 16 words at a time. From 
 Batcher's results [BAT68] the number of comparators needed to sort 2 P 
 numbers is 
 
 N = (p 2 + p) 2 (p " 2) 
 
 = (16 + 4)(4) 
 
 = 80 
 
143 
 
 for our case, arranged in 10 levels with 8 comparators in each level. 
 Thus we need eighty 60 bit comparators in the network. We also need 
 drivers and receivers for sixteen 60 bit words. 
 
 7.11 Package Counts 
 
 In this section we present an estimate of the number of pack- 
 ages of circuitry needed to build this machine. The packages we use 
 for these counts are the Dual In-line Packages available from a number 
 of manufacturers. For those cases for which appropriate devices exist, 
 we have used them for these counts. For those cases for which we found 
 no existing device, we estimated the number of packages over which it 
 would be necessary to split the device. Further comments are included 
 in the discussion of each type of device. 
 
 7.11.1 Memories 
 
 For the purposes of counting memory packages we have different 
 package sizes. For slow bulk memory we assume a package contains up to 
 a 4096 x 1 bit array. For fast register memory we assume a 1024 x 1 bit 
 array per package. For Content Addressable Memory we assume a 16 x 1 bit 
 array per chip. For a small memory, such as the Data Register set in a 
 Data Memory Unit, we use a 64 x 4 bit array. For individual registers 
 we use a 4 bit register chip. Table 7.3 gives a summary of the memory 
 package requirement. Totaling the number of packages, we have the 
 following requirements: 
 
 Slow Memory 1028 packages 
 
 Fast Memory 3927 packages 
 
 Content Addressable Memory 1120 packages 
 
144 
 
 
 Table 
 
 7.3 
 
 
 
 
 
 
 Memory Package Requirements 
 
 
 
 
 Unit 
 
 Memory 
 Size 
 
 
 Memory 
 Type 
 
 Pkgs 
 
 per 
 
 Unit 
 
 # 
 Units 
 
 # 
 Pkgs 
 
 Program Memory 
 
 4096 
 
 X 
 
 68 
 
 Slow 
 
 68 
 
 1 
 
 68 
 
 
 1024 
 
 X 
 
 68 
 
 Fast 
 
 68 
 
 1 
 
 68 
 
 Data Memory 
 
 2048 
 
 X 
 
 60 
 
 Slow 
 
 60 
 
 16 
 
 960 
 
 
 64 
 
 X 
 
 60 
 
 Fast 
 
 15 
 
 16 
 
 240 
 
 
 64 
 
 X 
 
 11 
 
 CAM 
 
 44 
 
 16 
 
 704 
 
 
 6 
 
 X 
 
 1 
 
 Fast 
 
 2 
 
 16 
 
 32 
 
 Address Counter 
 
 12 
 
 X 
 
 1 
 
 Fast 
 
 3 
 
 16 
 
 48 
 
 
 68 
 
 X 
 
 1 
 
 Fast 
 
 17 
 
 16 
 
 272 
 
 
 15 
 
 X 
 
 8 
 
 Fast 
 
 2 
 
 16 
 
 32 
 
 Instruction Dispatch 
 
 128 
 
 X 
 
 48 
 
 CAM 
 
 384 
 
 
 384 
 
 Unit 
 
 73 
 
 X 
 
 32 
 
 Fast 
 
 16 
 
 
 16 
 
 
 15 
 
 X 
 
 32 
 
 CAM 
 
 32 
 
 
 32 
 
 
 16 
 
 X 
 
 1 
 
 Fast 
 
 4 
 
 
 4 
 
 
 71 
 
 X 
 
 5 
 
 Fast 
 
 4 
 
 
 4 
 
 Address Counter 
 
 16 
 
 X 
 
 16 
 
 Fast 
 
 4 
 
 
 4 
 
 Coordinator 
 
 16 
 
 X 
 
 3 
 
 Fast 
 
 1 
 
 
 1 
 
 
 8 
 
 X 
 
 16 
 
 Fast 
 
 4 
 
 
 4 
 
 
 15 
 
 X 
 
 9 
 
 Fast 
 
 2 
 
 
 2 
 
 
 4 
 
 X 
 
 15 
 
 Fast 
 
 4 
 
 
 4 
 
 Ari thmeti c/Logi cal 
 Unit 
 
 8 digit 
 (shift 
 
 x 1 
 
 reg.) 
 
 Fast 
 
 8 
 
 15 
 
 120 
 
 IF Tree Processor 
 
 255 
 
 X 
 
 1 
 
 Fast 
 
 64 
 
 1 
 
 64 
 
 
 12 
 
 X 
 
 1 
 
 Fast 
 
 3 
 
 1 
 
 3 
 
 
 34 
 
 X 
 
 1 
 
 Fast 
 
 9 
 
 1 
 
 9 
 
 I/O Processor 
 
 256 
 
 X 
 
 60 
 
 Fast 
 
 60 
 
 8 
 
 480 
 
 
 1 
 
 X 
 
 300 
 
 Fast 
 
 75 
 
 8 
 
 600 
 
 
 1 
 
 X 
 
 960 
 
 Fast 
 
 240 
 
 8 
 
 1920 
 
145 
 
 If we can obtain 16 bit register chips for the I/O Processor registers, 
 then the number of packages needed for fast memory would be reduced to 
 2037 packages. 
 
 7.11.2 Queues 
 
 In order to allow various units in the machine to operate 
 asynchronously from one another, we have included a number of queues. 
 For the purposes of Table 7.4, we have assumed that a single package 
 contains four bits of storage plus logic to control the first-in/ 
 first-out operation of the queue. From Table 7.4 it can be seen that 
 we need 1072 of these packages. 
 
 7.11.3 Bus Drivers and Receivers 
 
 Because we have a number of buses connecting units in the 
 machine, we are able to have a number of items of data and a number of 
 instructions in transit simultaneously. For this same reason we require 
 a number of bus drivers and receivers. We assume that we can have four 
 driver/ receivers per package for the purposes of Table 7.5. From this 
 table we find that we require 5349 packages. 
 
 7.11.4 Other Devices 
 
 A number of other devices are needed to complete the machine. 
 These are summarized in Table 7.6. The number of packages needed for 
 these other devices is 7380. 
 
 7.11.5 Total Package Requirement 
 
 Adding together the numbers of packages computed in this 
 section, we find that we need 19,876 packages for the devices we have 
 
146 
 
 Table 7.4 
 Queue Package Requirements 
 
 Unit 
 
 Queue Size 
 
 # 
 Pkgs/Unit 
 
 # 
 Units 
 
 # 
 Pkgs 
 
 Program Memory 
 
 16 x 16 
 
 64 
 
 
 64 
 
 Instruction Dispatch 
 Unit 
 
 16 x 71 
 48 x 5 
 
 72 
 96 
 
 
 72 
 96 
 
 
 8 x 76 
 
 152 
 
 
 152 
 
 Address Counter 
 Coordinator 
 
 8 x 32 
 2 x 32 
 
 64 
 16 
 
 
 64 
 16 
 
 Data Memory Unit 
 
 2 x 76 
 
 38 
 
 16 
 
 608 
 
147 
 
 Table 7.5 
 Bus Driver and Receiver Package Requirements 
 
 Unit 
 
 Per 
 # Drivers 
 
 Unit 
 #Re 
 
 iceivers 
 
 # Units 
 
 Total 
 Packages 
 
 Program Memory 
 
 68 
 
 
 12 
 
 1 
 
 20 
 
 Address Counter 
 
 98 
 
 
 114 
 
 16 
 
 800 
 
 Address Counter 
 Coordinator 
 
 27 
 
 
 86 
 
 1 
 
 25 
 
 Instruction Dispatch 
 Unit 
 
 223 
 
 
 76 
 
 1 
 
 75 
 
 Data Memory Unit 
 
 185 
 
 
 196 
 
 16 
 
 1120 
 
 Arithmetic/ Logical 
 Unit 
 
 60 
 
 
 136 
 
 15 
 
 510 
 
 I/O Processor 
 
 1260 
 
 
 1260 
 
 8 
 
 2520 
 
 SORT Network 
 
 960 
 
 
 960 
 
 1 
 
 240 
 
 IF Tree Processor 
 
 19 
 
 
 36 
 
 1 
 
 39 
 
148 
 
 CO 
 
 c 
 co 
 
 O 
 
 c_> 
 
 
 
 
 >? 
 
 
 cn 
 
 
 
 
 
 
 
 o 
 
 
 
 
 CO 
 
 CL 00 
 
 
 1 — 
 
 co 
 
 
 O) 
 
 CD 
 
 ■I- c 
 
 
 
 CD 
 
 
 en 
 
 to 
 
 +-> o 
 
 
 CO 
 
 03 
 
 Ol 
 
 to 
 
 J* 
 
 r— •!— 
 
 at 
 
 +-> 
 
 _^ 
 
 CD 
 
 .m 
 
 o 
 
 3 4-> 
 
 en 
 
 to 
 
 u 
 
 ro 
 
 u 
 
 fO 
 
 E to 
 
 <o 
 
 -t-> 
 
 rtj 
 
 _^ 
 
 <o 
 
 Q. 
 
 J- 
 
 jx. 
 
 CO 
 
 CL 
 
 O 
 
 Q. 
 
 *\ 
 
 " CO 
 
 o 
 
 
 "**s^ 
 
 tO 
 
 ~->^ 
 
 .c 
 
 T3 Q. 
 
 fO 
 
 + 
 
 -C 
 
 Q. 
 
 5- 
 
 o 
 
 -a o 
 
 CL 
 
 
 o 
 
 ^^, 
 
 CD 
 
 +j 
 
 to 
 
 ^^ 
 
 CD 
 
 ■»-> 
 
 4-> 
 
 -!Z 
 
 •»— 
 
 r— 
 
 to 
 
 -^ 
 
 • p— 
 
 • i— 
 
 o 
 
 s 
 
 CO to 
 
 CO 
 
 a. 
 
 2 
 
 CD 
 
 +-> 
 
 to 
 
 co a 
 
 T3 
 
 •^ 
 
 co 
 
 • i — 
 
 to 
 
 
 -o 1- 
 
 O 
 
 to 
 
 
 "O 
 
 E 
 
 «3- 
 
 >— o 
 
 C 
 
 -t-> 
 
 »d- 
 
 <u 
 
 co 
 
 X 
 
 a r— 
 
 o 
 
 .a 
 
 X 
 
 c 
 
 c 
 
 
 c 
 
 2' 
 
 
 
 o 
 
 o 
 
 «3- 
 
 •1— 
 
 4-> 
 
 «* 
 
 *3- 
 
 
 
 to 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 i— CD 
 
 «3- 
 
 00 
 
 to 
 
 to 
 
 «5l- 
 
 o 
 
 o 
 
 o 
 
 o 
 
 CO 
 
 CM 
 
 o 
 
 CM 
 
 
 
 to -^ 
 
 
 «3- 
 
 r— 
 
 r— 
 
 «3- 
 
 to 
 
 to 
 
 to 
 
 LO 
 
 CM 
 
 CM 
 
 o 
 
 r^ 
 
 
 
 4-> Q_ 
 
 
 
 
 
 1— 
 
 r— 
 
 1^ 
 
 CD 
 
 C\J 
 
 r— 
 
 
 to 
 
 00 
 
 
 
 O 
 
 
 
 
 
 
 
 
 
 CM 
 
 
 
 ^« 
 
 n- - 
 
 
 
 h- =*fc 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 to 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 to 
 
 ••-> 
 
 ■ — 
 
 U3 
 
 to 
 
 «o 
 
 to 
 
 vo 
 
 to 
 
 1 — 
 
 LO 
 
 r— 
 
 1 — 
 
 r— 
 
 00 
 
 
 -l-> 
 
 ^fe "P" 
 
 
 n— 
 
 r— 
 
 r— 
 
 r»* 
 
 r*— 
 
 1 — 
 
 
 r— 
 
 
 
 
 
 
 c 
 
 c 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 CO 
 
 ID 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 E 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 CO 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 s- 
 
 ■* — ( 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 •r— 
 
 to +-> 
 
 •3- 
 
 CO 
 
 r— 
 
 ^~ 
 
 CD 
 
 o 
 
 o 
 
 o 
 
 o 
 
 00 
 
 CM 
 
 o 
 
 «3- 
 
 
 3 
 
 CD-r- 
 
 
 
 
 
 
 f— 
 
 r- 
 
 to 
 
 LO 
 
 CM 
 
 CM 
 
 o 
 
 CO 
 
 
 cr 
 
 -i^ C 
 
 
 
 
 
 
 
 
 CD 
 
 ^~ 
 
 t— 
 
 
 to 
 
 CM 
 
 
 CO 
 
 Q- =) 
 
 
 
 
 
 
 
 
 
 
 
 
 r - — 
 
 
 
 cc 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 MD 
 
 
 =t*= 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 • 
 
 CO 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 r~- 
 
 CD 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 >> 
 
 CO 
 
 -^ 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 to 
 
 ^— 
 
 O 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 S- 
 
 -Q 
 
 to 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 s_ 
 
 (O 
 
 Q- 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 <o 
 
 1— 
 
 CO 
 
 o 
 
 
 
 
 
 
 
 s- 
 
 to 
 
 
 
 CO 
 
 CO 
 
 ■a 
 o 
 
 to 
 
 fc. 
 
 o 
 +J 
 
 4-> 
 
 
 > 
 
 C 
 
 
 
 
 
 
 CO 
 
 CO 
 
 
 
 -a 
 
 o 
 
 ra 
 
 o 
 
 
 CO 
 
 o 
 
 
 
 
 
 
 4-> 
 
 SZ 
 
 to 
 
 
 o 
 
 CO 
 
 s- 
 
 CL 
 
 
 Q 
 
 ■1— 
 
 
 
 
 
 to 
 
 c 
 
 o 
 
 +-> 
 
 
 o 
 
 Q 
 
 rt} 
 
 00 
 
 
 
 4-> 
 
 
 
 
 
 S- 
 
 CO 
 
 +-> 
 
 c 
 
 ZD 
 
 CO 
 
 
 CL 
 
 to 
 
 
 s- 
 
 CL 
 
 S- 
 
 
 
 
 QJ 
 
 E 
 
 to 
 
 ■r™ 
 
 _l 
 
 Q 
 
 S- 
 
 E 
 
 O 
 
 
 CO 
 
 •r~ 
 
 aj 
 
 s- 
 
 S- 
 
 
 -a 
 
 CO 
 
 E 
 
 O 
 
 <C 
 
 
 O 
 
 o 
 
 S- 
 
 
 -C 
 
 i- 
 
 -o 
 
 CO 
 
 CO 
 
 s- 
 
 ■o 
 
 S- 
 
 
 CL 
 
 
 CO 
 
 +J 
 
 o 
 
 o 
 
 
 +-> 
 
 O 
 
 o 
 
 +J 
 
 T3 
 
 CO 
 
 to 
 
 o 
 
 • 
 
 to 
 
 S- 
 
 CO 
 
 u 
 
 
 
 
 o 
 
 to 
 
 o 
 
 c 
 
 O 
 
 sz 
 
 
 c 
 
 s- 
 
 CO 
 
 a> 
 
 s- 
 
 CO 
 
 -»-> 
 
 to 
 
 
 
 CO 
 
 CO 
 
 =J 
 
 O 
 
 a 
 
 4-> 
 
 •t— 
 
 (O 
 
 o 
 
 +j 
 
 1— 
 
 CO 
 
 •P" 
 
 
 
 
 a 
 
 -o 
 
 o 
 
 CO 
 
 +-> 
 
 •r— 
 
 
 .c 
 
 S- 
 
 o 
 
 
 
 JD 
 
 X 
 
 
 
 
 
 o 
 
 -a 
 
 rO 
 
 -Q 
 
 +J 
 
 u 
 
 u 
 
 to 
 
 CO 
 
 CO 
 
 
 
 
 
 
 to 
 
 
 
 E 
 
 
 •r- 
 
 
 
 i- 
 
 +-> 
 
 +-> 
 
 o 
 
 o 
 
 
 
 
 r— 
 
 ■4-> 
 
 to 
 
 
 LO 
 
 CD 
 
 to 
 
 o 
 
 (0 
 
 to 
 
 to 
 
 to 
 
 ^~ 
 
 
 
 
 1 
 
 •r— 
 
 i 
 
 +J 
 
 |— 
 
 •r— 
 
 
 to 
 
 JZ 
 
 CD 
 
 CD 
 
 
 
 
 
 
 t»- 
 
 -Q 
 
 if- 
 
 •r- 
 
 
 XI 
 
 ^--^ 
 
 CO 
 
 o 
 
 
 
 *•—** 
 
 X 
 
 
 
 
 o 
 
 
 o 
 
 -Q 
 
 * — ■* 
 
 
 o 
 
 n 
 
 
 «sf 
 
 <o 
 
 o 
 
 
 
 
 
 1 
 
 CM 
 
 i 
 
 
 co 
 
 o 
 
 r— 
 
 LO 
 
 o 
 
 to 
 
 CM 
 
 00 
 
 o 
 
 
 
 
 r— 
 
 r— 
 
 r— 
 
 ^i- 
 
 — ' 
 
 r— 
 
 *> — ' 
 
 i — 
 
 r— 
 
 r^ 
 
 T— 
 
 *~-' 
 
 LO 
 
 
 S- 
 
 >1 
 
 co 
 
 S- 
 
 +-> 
 
 o 
 
 c 
 
 E 
 
 3 
 
 CO 
 
 o 
 
 s: 
 
 <_) 
 
 E 
 
 to 
 
 to 
 
 to 
 
 i~ 
 
 CO 
 
 CD 
 
 i. 
 
 O 
 
 -a 
 
 S- 
 
 T3 
 
 Q_ 
 
 <c 
 
 
 
 to 
 
 S- 
 
 
 
 -M 
 
 
 o 
 
 o 
 
 
 
 •r- 
 
 JXZ. 
 
 •^ 
 
 to 
 
 
 
 C 
 
 S- 
 
 CD 
 
 to 
 
 
 
 zo 
 
 o 
 
 O 
 
 CO 
 
 
 s- 
 
 
 s 
 
 _l 
 
 o 
 
 Js£ 
 
 o 
 
 >, 
 
 4-> 
 
 •^ i- 
 
 o 
 
 i. 
 
 to 
 
 i? 
 
 CO 
 
 o o 
 
 s_ 
 
 o 
 
 to 
 
 o 
 
 •zz 
 
 •i- to 
 
 CL. 
 
 s 
 
 CO 
 
 E 
 
 
 +■> to 
 
 
 4-> 
 
 o 
 
 CO 
 
 CD 
 
 CO CO 
 
 CO 
 
 CO 
 
 o 
 
 s: 
 
 c 
 
 E O 
 
 CO 
 
 2: 
 
 S- 
 
 
 ■f 
 
 ^Z o 
 
 s_ 
 
 
 Q_ 
 
 <a 
 
 +J 
 
 +-> s- 
 
 1— 
 
 1— 
 
 
 +-> 
 
 3 
 
 •r- CL. 
 
 
 or 
 
 o 
 
 <o 
 
 O 
 
 s- 
 
 u_ 
 
 o 
 
 •^^ 
 
 Q 
 
 cc: 
 
 «=c 
 
 ►—I 
 
 00 
 
 1— 1 
 
149 
 
 discussed. This does not, however, include control logic or signal paths 
 between units. To allow for this we assume that this logic requires half 
 again as many packages as we have counted. Thus we arrive at a total 
 package count of about 30,000 packages for this machine. Table 7.7 
 summarizes the counts generated in this section. The parenthesized 
 values include control logic, while the other values are the ones gener- 
 ated in this section. 
 
150 
 
 
 
 Table 7. 
 
 7 
 
 
 
 
 Total 
 
 Package Req 
 
 uirements 
 
 
 
 Unit 
 
 Memory 
 Packages 
 
 Queue 
 Packages 
 
 Driver 
 Packages 
 
 Other 
 Packages Pc 
 
 Total 
 ickages 
 
 Program Memory 
 
 136 
 ( 204) 
 
 64 
 
 ( 96) 
 
 20 
 
 ( 30) 
 
 4 . 
 ( 8) 1 
 
 224 
 ' 336) 
 
 Data Memory 
 
 1936 
 (2904) 
 
 608 
 ( 912) 
 
 1120 
 (1680) 
 
 320 
 (480) 1 
 
 3984 
 [ 5976) 
 
 Address 
 Counters 
 
 352 
 ( 528) 
 
 
 800 
 (1200) 
 
 224 
 ( 336) 1 
 
 1376 
 
 ; 2064) 
 
 Inst. Disp. 
 Unit 
 
 440 
 ( 660) 
 
 320 
 ( 480) 
 
 75 
 ( 113) 
 
 
 835 
 [ 1253) 
 
 Addr. Cntr. 
 Coordinator 
 
 15 
 ( 23) 
 
 80 
 ( 120) 
 
 25 
 
 ( 38) 
 
 
 120 
 ; 180) 
 
 Arith. /Logical 
 Units 
 
 120 
 ( 180) 
 
 
 510 
 ( 765) 
 
 2250 
 ( 3375) 1 
 
 2880 
 ; 4320) 
 
 IF Tree 
 Processor 
 
 76 
 ( 114) 
 
 
 39 
 ( 59) 
 
 150 
 ( 225) 1 
 
 265 
 ; 398) 
 
 I/O Processors 
 
 3000 
 (4500) 
 
 
 2520 
 (3780) 
 
 1872 
 ( 2808) 1 
 
 7392 
 [11088) 
 
 SORT Processor 
 
 
 
 240 
 ( 360) 
 
 1600 
 ( 2400) 1 
 
 1840 
 [ 2760) 
 
 Routing Network 
 
 
 
 
 960 
 ( 1440) 1 
 
 960 
 ; 1440) 
 
 Total Packages 
 
 6075 
 (9113) 
 
 1072 
 (1608) 
 
 5349 
 (8024) 
 
 7380 
 (11070) ( 
 
 19876 
 !29815) 
 
151 
 
 8. PROBLEM PROGRAMS 
 
 8.1 Characteristics of Problem Programs 
 
 In examining the results of the analyses of a number of COBOL 
 programs, we find that phases of programs fall into three classes. The 
 first class consists of those phases which are parallel processable. 
 These phases yield the best results for our method of concurrent pro- 
 cessing. The second class of programs consists of those which contain 
 sequential constraints but which do allow a number of Address Counters 
 to concurrently process data. While the speedup of these phases is not 
 as spectacular as in the first class, our method does yield an apprecia- 
 ble improvement over single Address Counter execution. The third class, 
 which we examine in this chapter, consists of phases which allow yery 
 little concurrent processing and consequently yery little speedup. 
 
 Some of the phases are in this last class because they contain 
 a large number of Reference Dependent variables and the resulting inter- 
 lock constraints. There is little we can do for these phases without 
 completely redesigning them. Some phases, however, are encumbered only 
 by the presence of a few Reference Dependent variables. It is possible 
 in some of these phases to improve their speedup by making a change in 
 one of the restrictions imposed in section 2.2. 
 
 8.2 Speeding Up Problem Programs 
 
 The change in our method needed to accomplish this improve- 
 ment is to allow each Address Counter, when necessary, the ability to 
 
152 
 
 examine the contents of its immediate prodecessor's Local variable set. 
 This ability could, of course, be extended to allow access to Local 
 variables between Address Counters at any fixed separation. As an exam- 
 ple of the usefulness of this ability, consider the following program 
 fragment: 
 
 1 LOOP. 
 
 2 READ IN-FILE AT END GO TO DONE. 
 
 3 IF IN-SEQ > OLD-SEQ 
 
 4 THEN MOVE CORRESPONDING IN-DATA TO OUT-DATA 
 
 5 WRITE OUT-DATA 
 
 6 ELSE DISPLAY IN-SEQ ' SEQ ERROR'. 
 
 7 MOVE IN-SEQ TO OLD-SEQ. 
 
 8 GO TO LOOP. 
 
 Under the constraint that each Address Counter has access only to its 
 own Local variable set, this loop is completely sequential. The problem 
 is that OLD-SEQ cannot be accessed by Address Counter i until Address 
 Counter i-1 has set its proper value. Relaxing this constraint, let us 
 substitute IN-SEQ. , for OLD-SEQ, and use IN-SEQ. where the program 
 indicates IN-SEQ. Now, by allowing Address Counter i to access 
 IN-SEQ. , we can process the input records concurrently. 
 
 In order to allow an Address Counter to access its predeces- 
 sor's Local variable set, the following conditions must exist: 
 1) It must be possible to replace the Reference 
 Dependent variable causing the problem (OLD-SEQ 
 in the example) with the same Local variable in 
 all cases. 
 
153 
 
 2) The phase being transformed by replacement must 
 contain only one primary READ statement. 
 
 3) Every path within the phase must contain an assign- 
 ment of the value of the Local variable to the vari- 
 able we want to remove. 
 
 4) There is no assignment statement in the phase for 
 which the Local variable is an output variable. 
 
 These conditions can be identified as follows: 
 
 1) Examine all occurrences of a Reference Dependent 
 variable as an output variable. If it is always 
 assigned a value from the same Local variable, the 
 Reference Dependent variable is a candidate for 
 removal . 
 
 2) The presence of only one primary READ statement can 
 be detected during the source text scan (section 3.1) 
 
 3) Each path in the phase must be traced, starting at 
 the primary READ. If the primary READ is again 
 encountered, this attempt at replacement must be 
 abandoned. If a statement causing the assignment 
 of the value of the Local variable to the Reference 
 Dependent variable is reached, the path satisfies 
 condition (3) above. 
 
 4) By finding that the set of output references for the 
 Local variable is empty, we know that no assignment 
 is made to the Local variable, satisfying 
 condition (4) above. 
 
154 
 
 If the Reference Dependent variable can be removed, we can 
 delete the statements in which the variable to be removed occurs as an 
 output variable. In each statement in which the removed variable 
 appears as an input variable, we substitute the preceding Address 
 Counter's copy of the Local variable. To give an Address Counter access 
 to its predecessor's copy of the Local variable, we use a different 
 index register for the base address of the predecessor's storage than 
 for the Address Counter's own storage. This index register's value is 
 set when an Address Counter is activated. It is also necessary to mark 
 Address Counters used in this way so that an Address Counter's storage 
 is not released until both that Address Counter and its successor are 
 ready to return to the pool of available Address Counters. 
 
 Unfortunately, allowing an Address Counter access to its pre- 
 decessor's Local variable set, while conceptually simple, causes an 
 increase in the complexity of the machine, a larger number of algorithms 
 in the compiler, and a more complicated operating system. Since the 
 conditions under which variable replacement can be done are seldom 
 satisfied in a real program, it appears that the cost of this improve- 
 ment outweighs the benefits derived from it. 
 
155 
 
 9. SOFTWARE DESIGN 
 
 While examining many COBOL programs we found various language 
 features and programming practices which hindered concurrent record pro- 
 cessing. Conversely, some features and techniques were found which 
 aided our method of speedup. The language features and programming 
 techniques which help and which hinder our method are discussed in this 
 chapter. 
 
 9.1 Language Features Which Hinder 
 
 9.1.1 ALTER 
 
 A COBOL feature [IBM72] which makes our method impossible to 
 apply is the ALTER command. This instruction causes the program code to 
 be modified during execution by overwriting the destination of a branch 
 instruction. For example, consider the code: 
 
 A. READ FILE-1. 
 
 B. GO TO'Pl. 
 
 pi. alter'b TO PROCEED TO P2. 
 
 GO T0*A. 
 P2. ADD 1 TO FILE-1-IN. 
 
 READ FILE-1. 
 The program analysis needed for concurrent record processing 
 is dependent on a priori knowledge of the program graph. Because the 
 ALTER command allows any branch in the program to be changed during 
 
156 
 
 execution, the program as analyzed by the compiler and the program as 
 executed may have entirely different program graphs. Our algorithms 
 would form the graph in Figure 9.1(a) after insertion of FORKs, HOLDs, 
 and QUITs. After the block labeled PI is executed, however, the 
 program graph would look like the one in Figure 9.1(b), which will give 
 erroneous results. 
 
 If a dialect of COBOL is implemented for our machine, the 
 ALTER command should be dropped from that dialect. The same function 
 can be performed by setting and testing in-line switches, as in the 
 following code: 
 
 77 FIRST-CARD-SW PICTURE 9 VALUE 1. 
 
 A. READ FILE-1. 
 
 B. IF FIRST-CARD-SW = 1 , GO TO PI 
 
 ELSE GO TO P2. 
 
 P2. ADD 1*T0 FILE-1-IN. 
 
 READ FILE-1. 
 This is properly handled by the algorithms presented in Chapter 3. 
 
 9.1.2 Subroutines 
 
 Subroutine calls are seldom used in COBOL since the PERFORM 
 command provides a yery powerful alternative. Those calls which do 
 appear, however, cause a problem. Since we lack information about the 
 way variables are used in a user-supplied subroutine, we must assume 
 the worst. We must assume that all variables in the argument list are 
 
157 
 
 
 1 
 
 
 a 
 c 
 
 i 
 
 
 , 
 
 
 i 
 
 / 
 
 READ 
 FILE 
 
 
 Li. 
 
 Y 
 
 
 Q 
 
 a 
 < 
 
 
 READ 
 FILE 
 
 "S 
 
 
 
 
 
 
 
 
 
 eu 
 
 
 ^— 
 
 
 Q. 
 
 
 E 
 
 
 ro 
 
 
 X 
 
 
 LxJ 
 
 , — 
 
 C 
 
 • 
 
 o 
 
 CT) 
 
 • r— 
 
 
 -l-> 
 
 <JJ 
 
 O 
 
 s- 
 
 ZJ 
 
 3 
 
 S- 
 
 en 
 
 +J 
 
 ■i™* 
 
 CO 
 
 Li- 
 
 c 
 
 d; 
 
 
 
 
 it 
 
 • 
 
 
 1 
 
 
 o 
 
 
 Q W 
 
 
 U_ 
 
 <" 
 
 < d 
 
 UJ Li 
 
 
 
 k* 
 
 V 
 
 ^ 
 
 
 • 
 
 
 ^ 
 
 a: 
 
 _ H . M _^to 
 
 H 
 
 UJ 
 
 
 3 
 
 »- 
 
 
 L oJ 
 
 _J 
 
 
 < 
 
 
 
 fO 
 
158 
 
 used as both input and output variables. Further, we must assume that 
 all arguments are Reference Dependent variables. For system-supplied 
 subroutines we may be able to relax these restrictions since we should 
 be able to determine which variables are not Reference Dependent and 
 which variables are not in the input or the output variable set of the 
 CALL statement. In the dialect of COBOL implemented for our machine, 
 the programmer should be able to supply information which informs the 
 compiler of the characteristics of the variables used in subroutine 
 calls. 
 
 9.2 Language Features Which Help 
 
 9.2.1 Complex Operations 
 
 There are a few statements in COBOL, such as TRANSFORM and 
 EXAMINE, which call for complex operations. In FORTRAN these operations 
 would require a subroutine call or appreciably more than one statement, 
 rather than the single COBOL statement. These statements are used 
 frequently enough to justify building hardware units to execute them. 
 Since a hardware unit can execute one of these instructions in a matter 
 of a few clock cycles rather than a number of instruction cycles, they 
 contribute to speeding up the execution of a program. 
 
 9.2.2 SORT 
 
 The SORT command is another which can contribute to program 
 speedup by being implemented in hardware. The networks described by 
 Batcher sort a large number of records in a relatively small number of 
 cycles compared to doing the same operation in a software algorithm. 
 
159 
 
 9.3 Programming Techniques Which Hinder 
 
 9.3.1 Sequence Checking 
 
 Many COBOL programs would run very slowly on a multiple 
 Address Counter machine because of techniques which use variables which 
 must be Reference Dependent. The sequence checking done in the example 
 in section 8.2 is an example of such a technique. In many cases there 
 is an alternative to sequence checking. This alternative is sorting the 
 input file on the sequence number before using the file. On a conven- 
 tional monoprocessor this would be a yery time-consuming alternative. 
 On a multiple Address Counter machine with sorting hardware, the oppo- 
 site would be true. 
 
 9.4 Programming Techniques Which Help 
 
 9.4.1 Super-records 
 
 Some programs would run relatively slowly on a multiple 
 Address Counter machine simply because there are READ statements on the 
 primary file scattered throughout the program. If these READs could be 
 combined into a single READ, the program would execute more rapidly 
 because the decision of whether or not that READ would be executed again 
 could be made sooner in the instruction stream. One way of accomplish- 
 ing this is through the use of what we have come to refer to as a 
 "super-record." An example should make this concept clear. 
 
 Consider a program which uses a "finder" file to update rec- 
 ords in a master file. Quite possibly there is more than one update 
 record for a given master record. If all of the updates for a master 
 
160 
 
 record are combined to form a super-record, then one READ statement 
 handles all of the data for updating one master record. Instead of 
 sequencing through the individual update records, requiring many inter- 
 locks to keep instruction streams from interfering, a number of update 
 super-records, requiring fewer interlocks, could be concurrently 
 applied to a number of master records. This could yield an appreciable 
 speed improvement. 
 
 9.4.2 Parallel Tasking 
 
 Occasionally two phases of a COBOL program are completely 
 independent. Because of the rarity of this occurrence and because of 
 the cost of searching for this occurrence, no such search is included 
 among the algorithms proposed in Chapter 3. However, in a machine 
 capable of concurrently executing more than one instruction stream, it 
 seems a shame not to provide a programmer with the necessary instruc- 
 tions for getting two or more phases into concurrent execution. The 
 same three instructions described in section 3.6 should be sufficient. 
 In COBOL- like syntax, they might look like the following in the dialect 
 implemented on our multiple Address Counter machine: 
 
 FORK TO procedure-name. 
 
 HOLD. 
 
 QUIT. 
 Since these instructions would generate the corresponding machine 
 instructions, no special hardware or compiler algorithms are needed to 
 handle them. The only restriction on their use would be that all links 
 from the concurrently executed phases would have to terminate on the 
 same HOLD instruction. 
 
161 
 
 10. CONCLUSIONS 
 
 10.1 Summary of Results 
 
 In this thesis we have tried to demonstrate the following: 
 
 1) The technique of concurrently processing a number 
 of records has the potential for greatly speeding 
 up the execution of a business data processing 
 program. 
 
 2) Compiler algorithms can be formulated to insert 
 the necessary FORK, HOLD, and QUIT commands to 
 start and stop the concurrent processing of 
 instruction streams. 
 
 3) Compiler algorithms can be formulated to insert 
 the necessary TEST and RELEASE commands to protect 
 those variables we call Reference Dependent 
 variables. 
 
 4) Hardware, rather than compiler routines, can be 
 designed to protect those variables we call 
 Reference Independent variables and to prevent 
 out-of-sequence references to all variables. 
 
 5) A machine can be designed to implement the 
 technique of concurrent record processing. 
 
 From the results of an analysis of a number of COBOL programs, 
 we have been able to determine values for a number of machine parameters 
 These parameters and their values are the following: 
 
162 
 
 1 ) Program Memory # 
 
 4096 words (RAM) 
 68 bits per word 
 
 2) Program Memory - Cache 
 
 IK words (RAM) 
 68 bits per word 
 
 3) Address Counters 
 
 16 units 
 
 4) Instruction Dispatch Unit - Tag Status Register Array 
 
 128 words (CAM)* 
 48 bits per word 
 
 5) Instruction Dispatch Unit - Instruction Waiting 
 Register Array 
 
 32 words 
 
 73 bits per word (RAM) 
 
 15 bits per word (CAM) 
 
 6) Data Memory 
 
 16 units 
 
 2K words (RAM) 
 
 10 characters per word 
 
 6 bits per character 
 
 7) Data Memory - Data Registers 
 
 64 words (RAM) 
 
 RAM - Random Access Memory 
 
 CAM - Content Addressable (Associative) Memory 
 
163 
 
 8) Data Memory - Address Registers 
 
 64 words (CAM) 
 11 bits per word 
 
 9) Processors 
 
 15 Arithmetic/Logical Processors 
 1 IF Tree Processor 
 8 I/O Processors 
 1 SORT Processor 
 
 10.2 Areas for Further Inquiry 
 
 As it might be expected, the research described in this paper 
 has raised a number of questions which could bear further research. 
 Questions which have occurred to us include the following: 
 
 1) How can the limit imposed on our machine by the 
 Instruction Dispatch Unit (section 7.1) be avoided? 
 The design of the Instruction Dispatch Unit given 
 in section 5.4 does perform the function for which 
 it was designed—protecting sequential access to a 
 variable where this protection is needed (section 2.2) 
 --but the speed we feel is attainable falls short of 
 the speed we feel is desirable. If it is not possi- 
 ble to make a large improvement in the speed of this 
 unit, say by an order of magnitude, then some other 
 way of implementing this function should be found. 
 
 2) What better compiler algorithms should be designed 
 for this machine? In Chapter 3 we presented a set 
 
164 
 
 of compiler algorithms to demonstrate the feasibility 
 of our technique. Due to our training and experience 
 in programming single instruction stream machines, 
 these algorithms are designed to use only one Address 
 Counter. For a machine capable of concurrently exe- 
 cuting several instruction streams, there must be 
 better algorithms and better approaches to compiler 
 design than simply using the single instruction 
 stream techniques developed for past machines. 
 
 3) What techniques, beyond those discussed in 
 section 9.4, will allow a machine such as this one 
 to perform as well as it has the potential to 
 perform? In writing programs for a machine with a 
 different structure, it is apparent that some new 
 techniques will have to be learned and some old ones 
 unlearned. 
 
 4) How does the availability of a number of Address 
 Counters affect the design of an operating system 
 for this machine? It should make some design 
 aspects easier than for a single instruction stream 
 machine. For example, instead of enqueuing some 
 requests for system resources, it may be possible 
 to execute a FORK instruction to provide these 
 resources. It could also make some design aspects 
 more challenging, such as trying to recover infor- 
 mation about the cause of a program failure 
 
165 
 
 (e.g.: from which instruction stream did an attempt 
 to divide by zero originate?). 
 
 10.3 Final Comment 
 
 Despite the questions and problems the technique of concurrent 
 record processing raises, there appears to be a very real potential for 
 program speedup using this technique. 
 
166 
 
 LIST OF REFERENCES 
 
 [ADA60] Adams, C. W., "A Chart for EDP Experts," Datamation , Vol. 6, 
 No. 6, November/December 1960, pp. 13-17. 
 
 [AND65] Anderson, J. P., "Program Structures for Parallel Processing," 
 Comm. ACM , Vol. 8, No. 12, December 1965, pp. 786-788. 
 
 [ASC67] Aschenbrenner, R. A., M. J. Flynn, and G. A. Robinson, 
 
 "Intrinsic Multiprocessing," Proc. 1967 Spring Joint 
 Computer Conf . , pp. 81-86. 
 
 [BAE73] Baer, J. L., "A Survey of Some Theoretical Aspects of Multi- 
 processing," ACM Computing Surveys , Vol. 5, No. 1, 
 March 1973, pp. 31-80. 
 
 [BAR68] Barnes, G. H., R. Brown, M. Kato, D. Kuck, D. Slotnick, and 
 R. Stokes, "The Illiac IV Computer," IEEE Trans, on 
 Computers , Vol. 17, No. 8, August 1968, pp. 746-757. 
 
 [BAR72a] Barsamian, H., and A. L. DeCegama, "Some Design Considerations 
 of Cache Memories," C0MPC0N 72 Digest of Papers , 
 pp. 107-110. 
 
 [BAT68] Batcher, K. E., "Sorting Networks and Their Applications," 
 Proc. 1968 Spring Joint Computer Conf. , pp. 307-314. 
 
 [BEL72] Bell, C. G., R. Chen, and S. Rege, "Effect of Technology on 
 Near Term Computer Structures," IEEE Computer , Vol. 5, 
 No. 3, March 1972, pp. 29-38. 
 
 [BER66] Bernstein, A. J., "Analysis of Programs for Parallel Proces- 
 sing," IEEE Trans, on Computers , Vol. 15, No. 5, 
 October 1966, pp. 757-763. 
 
 [BUR67] Burroughs B5500 Information Processing System COBOL Reference 
 Manual , Burroughs Corporation, 1967, Form No. 1024247. 
 
 [CHA69] Chappell, S. G., "LAMP System- -Topology Evaluation Programs," 
 Bell Telephone Laboratories internal publication, 1969. 
 
 [CHE71a] Chen, T. C, "Parallelism, Pipelining, and Computer 
 
 Efficiency," Computer Design , Vol. 10, No. 1, January 
 1971, pp. 69-74. 
 
167 
 
 [CHE71b] Chen, W. K. , Applied Graph Theory , American Elsevier, 
 New York, 1971. 
 
 [C0F72] Coffman, E. G., and T. A. Ryan, Jr., "A Study of Storage 
 Partitioning Using a Mathematical Model of Locality," 
 Comm. ACM , Vol. 15, No. 3, March 1972, pp. 185-190. 
 
 [C0N63] Conway, M. E., "A Multiprocessor System Design," Proc. 1963 
 Fall Joint Computer Conf. , pp. 139-146. 
 
 [C0N69] Conti, C. J., "Concepts for Buffer Storage," IEEE Computer 
 Group News , March 1969, pp. 9-13. 
 
 [C0U71] Courtois, P. J., F. Heymans , and D. L. Parnas, "Concurrent 
 Control with 'Readers' and 'Writers,'", Comm. ACM , 
 Vol. 14, No. 10, October 1971, pp. 667-668. 
 
 [CUR63] Curtin, W. A., "Multiple Computer Systems," Advances in 
 
 Computers , Vol. 4, Academic Press, 1963, pp. 245-303. 
 
 [DAV72a] Davis, E. W. , "Concurrent Processing of Conditional Jump 
 Trees," C0MPC0N 72 Digest of Papers , pp. 279-281. 
 
 [DAV72b] Davis, E. W. , "A Multiprocessor for Simulation Applications," 
 Ph.D. Thesis, University of Illinois at Urbana-Champaign, 
 Department of Computer Science Report No. 527, June 1972. 
 
 [DIJ65] Dijkstra, E. W. , "Solution of a Problem in Concurrent 
 
 Programming Control," Comm. ACM , Vol. 8, No. 9, September 
 1965, p. 569. 
 
 [DIJ68a] Dijkstra, E. W., "Co-operating Sequential Processes," 
 
 Programming Languages , Academic Press, 1968, pp. 43-112. 
 
 [DIJ68b] Dijkstra, E. W., "The Structure of the 'THE' --Multiprogramming 
 System," Comm. ACM , Vol. 11, No. 5, May 1968, 
 pp. 341-343. 
 
 [EIS72] Eisenberg, M. A., and M. R. McGuire, "Further Comments on 
 Dijkstra 's Concurrent Programming Control Problem," 
 Comm. ACM , Vol. 15, No. 11, November 1972, p. 999. 
 
 [FLY71] Flynn, M. J., "Shared Internal Resources in a Multiprocessor," 
 Proc. IFIP Congress 1971 , Ljubljana, Yugoslavia, 
 pp. 565-569. 
 
 [FLY72] Flynn, M. J., and A. Podvin, "Shared Resource Multiprocessing, 
 IEEE Computer , March/April 1972, pp. 20-28. 
 
168 
 
 [F0S71] Foster, C. C, "Uncoupling Central Processor and Storage 
 Device Speeds," The Computer Journal , Vol. 14, No. 1, 
 February 1971, pp. 45-48. 
 
 [F0S72a] Foster, C. C, and E. M. Riseman, "A Study of the Constraints 
 Upon the Parallel Dispatching and Execution of Machine 
 Code Instructions," Department of Computer and Informa- 
 tion Science, University of Massachusetts Technical 
 Report 72A-1, February 1972. 
 
 [F0S72b] Foster, C. C, and E. M. Riseman, "Percolation of Code to 
 
 Enhance Parallel Dispatching and Execution," IEEE Trans, 
 on Computers , Vol. 21, No. 12, December 1972, 
 pp. 1411-1415. 
 
 [GIL58] Gill, S., "Parallel Programming," Computer Journal , Vol. 1, 
 No. 1, April 1958, pp. 2-10. 
 
 [GUN70] Gunderson, D. C, "Some Effects of Advances in Memory System 
 Technology on Computer Organization," IEEE Computer , 
 Vol. 3, No. 6, November/December 1970, pp. 7-11. 
 
 [HAB72] Habermann, A. N., "Synchronization of Communicating Processes," 
 Comm. ACM , Vol. 15, No. 3, March 1972, pp. 171-176. 
 
 [HIN72] Hintz, R. G., and D. P. Tate, "Control Data STAR-100 Processor 
 Design," C0MPC0N 72 Digest of Papers , pp. 1-4. 
 
 [IBM72] IBM OS Full American National Standard COBOL , International 
 
 Business Machines Corporation, 1972, Order No. GC28-6396-3. 
 
 [IBM73a] A Guide to the IBM System/370 Model 158 , International Business 
 Machines Corporation, 1973, Order No. GC20-1754-0. 
 
 [IBM73bJ IBM System/370 Model 158 Functional Characteristics , 
 
 International Business Machines Corporation, 1973, 
 Order No. GA22-7011-2. 
 
 [IS071] Isoda, S., E. Goto, and I. Kimura, "An Efficient Bit Table 
 Technique for Dynamic Storage Allocation of 2 n -Word 
 Blocks," Comm. ACM , Vol. 14, No. 9, September 1971, 
 pp. 589-592. 
 
 [KAP73J Kaplan, K. R. , and R. 0. Winder, "Cache-based Computer 
 
 Systems," Computer , Vol. 6, No. 3, March 1973, pp. 30-36. 
 
 [KRA72] Kraska, P. W., "Parallelism Exploitation and Scheduling," 
 
 Ph.D. Thesis, University of Illinois at Urbana-Champaign, 
 Department of Computer Science Report No. 518, 1972. 
 
169 
 
 [KUC70] Kuck, D. J., and D. H. Lawrie, "The Use and Performance of 
 Memory Hierarchies: A Survey," Software Engineering , 
 Vol. 1, Academic Press, 1970, pp. 45-77. 
 
 [KUC72a] Kuck, D. J., D. H. Lawrie, and Y. Muraoka, "Interconnection 
 Networks for Processors and Memories in Large Systems," 
 C0MPC0N 72 Digest of Papers , pp. 131-134. 
 
 [KUC72b] Kuck, D. J., Y. Muraoka, and S. C. Chen, "On the Number of 
 Operations Simultaneously Executable in FORTRAN-Like 
 Programs and Their Resulting Speedups," IEEE Trans, on 
 Computers , Vol. 21, No. 12, December 1972, pp. 1293-1310. 
 
 [LIP68] Liptay, J. S., "Structural Aspects of the SYSTEM/360 Model 85, 
 Part II, The Cache," IB M Systems Journal , Vol. 7, No. 1, 
 January 1968, pp. 15-21. 
 
 [MAT72] Mattson, R. L., and I. L. Traiger, "Storage Hierarchy Design," 
 C0MPC0N 72 Digest of Papers , pp. 145-148. 
 
 [MEA71] Meade, R. M. , "Design Approaches for Cache Memory Control," 
 Computer Design , January 1971, pp. 87-93. 
 
 [MUR66] Murtha, J. C, "Highly Parallel Information Processing 
 
 Systems," Advances in Computers , Vol. 7, Academic Press, 
 New York, 1966. 
 
 [MUR71] Muraoka, Y., "Parallelism Exposure and Exploitation in 
 
 Programs," Ph.D. Thesis, University of Illinois at Urbana- 
 Champaign, Department of Computer Science Report No. 424, 
 1971. 
 
 [0PL65] Opler, A., "Procedure-Oriented Language Statements to Facilitate 
 Parallel Processing," Comm. ACM , Vol. 8, No. 5, May 1965, 
 pp. 306-307. 
 
 [PHI73] Philippakis, A. S., "Programming Language Usage," Datamation , 
 Vol. 19, No. 10, October 1973, pp. 109-114. 
 
 [RAM66] Ramamoorthy, C. V., "Analysis of Graphs by Connectivity 
 Considerations," Journal of the ACM , Vol. 13, No. 2, 
 April 1966, pp. 211-222. 
 
 [RAM69] Ramamoorthy, C. V., and M. J. Gonzalez, "A Survey of Techniques 
 for Recognizing Parallel Processable Streams in Computer 
 Programs," Proc. 1969 Fall Joint Computer Conf. , pp. 1-15. 
 
 [RIS72] Riseman, E. M. , and C. C. Foster, "The Inhibition of Potential 
 Parallelism by Conditional Jumps," IEEE Trans, on 
 Computers , Vol. 21, No. 12, pp. 1405-1411. 
 
170 
 
 [RUS69] Russell, E. C, "Automatic Program Analysis," Ph.D. Thesis, 
 
 University of California at Los Angeles Report No. 69-12, 
 March 1969. 
 
 [SCH72] Schmookler, M. S. , "Considerations in the Design of a High 
 Speed Decimal Unit," Proc. IEEE-TCCA Symposium on 
 Computer Arithmetic , College Park, Maryland, May 1972. 
 
 [SEN65] Senzig, D. N., and R. V. Smith, "Computer Organization for 
 
 Array Processing," Proc. 1965 Fall Joint Computer Conf. , 
 pp. 117-128. 
 
 [SEN67] Senzig, D. N., "Observations on High-Performance Machines," 
 Proc. 1967 Fall Joint Computer Conf. , pp. 791-799. 
 
 [ST070] Stone, H. S., "A Logic- in -Memory Computer," IEEE Trans, on 
 Computers , Vol. 19, No. 1, January 1970, pp. 73-78. 
 
 [TJA70] Tjaden, G. S., and M. J. Flynn, "Detection and Parallel 
 
 Execution of Independent Instructions," IEEE Trans, on 
 Computers , Vol. 19, No. 10, October 1970, p. 889. 
 
 [T0M67] Tomasulo, R. M., "An Efficient Algorithm for Exploiting 
 Multiple Arithmetic Units," IBM Journal of Res, and 
 Devel. , Vol. 11, No. 1, January 1967. 
 
 [VAD71] Vadasz, L. L., H. T. Chua, and A. S. Grove, "Semiconductor 
 Random-Access Memories," IEEE Spectrum , May 1971, 
 pp. 40-48. 
 
 [WAT72] Watson, W. J., "The Texas Instruments Advanced Scientific 
 Computer," COMPCON 72 Digest of Papers , pp. 291-293. 
 
 [WIL65] Wilkes, M. V., "Slave Memories and Dynamic Storage 
 
 Allocation," IEEE Trans, on Computers , Vol. 14, No. 2, 
 April 1965, pp. 270-271. 
 
 [WIR66] Wirth, N., "A Note on Program Structures for Parallel 
 Processing," Comm. ACM , Vol. 9, No. 5, May 1966, 
 pp. 320-321. 
 
 [WIT72] Withington, F. G., "The Next (and Last?) Generation," 
 Datamation , May 1972, pp. 71-74. 
 
171 
 
 APPENDIX 
 
 Program Analysis Example 
 To demonstrate the algorithms given in Chapter 3, we present 
 in this Appendix a simplified example of the analysis of a COBOL pro- 
 gram. The program, given in Table A.l, used for this analysis is a 
 very simple program constructed with many features commonly found in the 
 real programs we analyzed. In essence, this program selects records 
 from an input file, builds an output file by excerpting data from the 
 selected input records, and sorts this output file. The program then 
 matches the sorted records with records from a master file and prints a 
 report from the data included in the two sets of records. 
 
 A.l Source Text Scan 
 
 Two tables are presented which contain the results of the scan 
 of the source text. Table A. 2 gives the correspondence between the 
 variable names used in the program and the variable numbers we use in 
 various tables. Note that files are treated as variables and that 
 FILLER items have been dropped. Table A. 3 presents the following 
 information for each statement: 
 
 STMT # - The sequence number we have assigned to the 
 
 statement. 
 STATEMENT - An abbreviated copy of the statement. 
 NODE TYPE - The type of node representing this state- 
 ment in the program graph. The following 
 abbreviations are used: 
 
172 
 
 Table A.l 
 Program Listing 
 
 000001 
 000002 
 000003 
 000004 
 000005 
 000006 
 000007 
 000008 
 000009 
 000010 
 000011 
 000012 
 000013 
 000014 
 000015 
 000016 
 000017 
 000018 
 000019 
 000020 
 000021 
 000022 
 000023 
 000024 
 000025 
 000026 
 000027 
 000028 
 000029 
 000030 
 000031 
 000032 
 000033 
 000034 
 000035 
 000036 
 000037 
 000038 
 000039 
 000040 
 000041 
 000042 
 000043 
 000044 
 000045 
 000046 
 
 IDENTIFICATION DIVISION. 
 PROGRAM-ID. SAMPLE. 
 ENVIRONMENT DIVISION. 
 INPUT-OUTPUT SECTION. 
 FILE-CONTROL 
 
 SELECT DEPT-MAST ASSIGN TO UT-S-DPTMAST, 
 SELECT SDT-MAST ASSIGN TO UT-S-SDTMAST. 
 SELECT PRT-FILE ASSIGN TO UT-S-PRTFILE, 
 SELECT SORT-FILE ASSIGN TO UT-S-SRTFILE, 
 DATA DIVISION. 
 FILE SECTION. 
 FD DEPT-MAST 
 
 RECORDING MODE IS F 
 BLOCK CONTAINS 1240 CHARACTERS 
 RECORD CONTAINS 124 CHARACTERS 
 LABEL RECORDS ARE STANDARD 
 DATA RECORD IS DEPT-REC. 
 01 DEPT-REC. 
 
 PIC X(30). 
 PIC 9(9). 
 PIC X(82). 
 PIC XX. 
 PIC X. 
 
 FD 
 
 01 
 
 FD 
 
 01 
 SD 
 
 01 
 
 02 
 02 
 02 
 02 
 02 
 
 NAME 
 
 SSN 
 
 FILLER 
 
 CODE-1 ■ 
 
 CODE-2 
 SDT-MAST 
 
 RECORDING MODE IS F 
 BLOCK CONTAINS 800 CHARACTERS 
 RECORD CONTAINS 80 CHARACTERS 
 LABEL RECORDS ARE STANDARD 
 DATA RECORD IS MAST-REC. 
 MAST-REC. 
 
 02 SSNO PIC 9(9). 
 02 FILLER PIC X(64). 
 02 UNITS PIC 99V9. 
 02 GPA PIC 9V999. 
 PRT-FILE 
 
 RECORDING MODE IS F 
 BLOCK CONTAINS 1300 CHARACTERS 
 RECORD CONTAINS 130 CHARACTERS 
 LABEL RECORDS ARE STANDARD 
 DATA RECORD IS PRT-BUF. 
 PRT-BUF PIC X(130). 
 SORT-FILE 
 
 RECORDING MODE IS F 
 RECORD CONTAINS 40 CHARACTERS 
 DATA RECORD IS SORT-REC. 
 SORT-REC. 
 
173 
 
 Table A.l (continued) 
 Program Listing 
 
 000047 
 000048 
 000049 
 000050 
 000051 
 000052 
 000053 
 000054 
 000055 
 000056 
 000057 
 000058 
 000059 
 000060 
 000061 
 000062 
 000063 
 000064 
 000065 
 000066 
 000067 
 000068 
 000069 
 000070 
 000071 
 000072 
 000073 
 000074 
 000075 
 000076 
 000077 
 000078 
 000079 
 000080 
 000081 
 000082 
 000083 
 000084 
 000085 
 000086 
 000087 
 000088 
 000089 
 000090 
 000091 
 000092 
 000093 
 
 02 SSN PIC 9(9). 
 02 NAME PIC X(30). 
 02 C0DE-2 PIC X. 
 
 WORKING-STORAGE SECTION. 
 
 77 DEPT-CNT 
 
 77 DEPT-SELECTED 
 
 77 SORT-CNT 
 
 77 MAST-CNT 
 
 77 PAGES 
 
 77 LINES 
 
 01 PRINT-REC. 
 02 SSN 
 02 FILLER 
 02 NAME 
 02 FILLER 
 02 CODE-2 
 02 FILLER 
 02 UNITS 
 02 FILLER 
 02 GPA 
 02 FILLER 
 
 01 PRT-HEAD. 
 02 FILLER 
 02 FILLER 
 FILLER 
 PAGENO 
 FILLER 
 
 PIC 9(5) 
 PIC 9(5) 
 PIC 9(5) 
 PIC 9(5) 
 PIC 9(5) 
 PIC 9(2) 
 
 VALUE 
 VALUE 
 VALUE 
 VALUE 
 VALUE 
 
 0. 
 0. 
 0. 
 0. 
 0. 
 
 VALUE 52 
 
 PIC 9(9). 
 
 PIC XX VALUE SPACES. 
 
 PIC X(30). 
 
 PIC X(5) VALUE SPACES. 
 
 PIC X. 
 
 PIC XXX VALUE SPACES. 
 
 PIC 99 9 
 
 PIC X(4)" VALUE ' — 
 
 PIC 9.999 
 
 PIC X(68) VALUE SPACES. 
 
 02 
 02 
 02 
 
 PIC X(50) VALUE SPACES. 
 PIC X(20) VALUE ' MATCH REPORT 
 PIC X(40) VALUE SPACES. 
 PIC 9(5). 
 
 PIC X(35) VALUE SPACES. 
 PROCEDURE DIVISION. 
 
 SORT SORT-FILE ASCENDING SSN OF SORT-REC 
 INPUT PROCEDURE SELECT-DEPT 
 OUTPUT PROCEDURE PRINT-REPORT. 
 GOBACK. 
 SELECT-DEPT SECTION. 
 
 OPEN INPUT DEPT-MAST. 
 READ-DEPT. 
 
 READ DEPT-MAST AT END TO TO END-SELECT. 
 ADD 1 TO DEPT-CNT. 
 
 IF CODE-1 = '26' GO TO READ-DEPT-DONE. 
 IF CODE-1 = '37' OR CODE-2 OF DEPT-REC = 'M', 
 MOVE '*' TO CODE-2 OF DEPT-REC, 
 GO TO READ-DEPT-DONE. 
 GO TO READ-DEPT. 
 READ-DEPT-DONE. 
 
 MOVE CORRESPONDING DEPT-REC TO SORT-REC. 
 
 RELEASE SORT-REC. 
 
 ADD 1 TO DEPT-SELECTED. 
 
 GO TO READ-DEPT. 
 
174 
 
 Table A.l (continued) 
 Program Listing 
 
 000094 END-SELECT. 
 
 000095 CLOSE DEPT-MAST. 
 
 000096 PRINT-REPORT SECTION. 
 
 000097 OPEN INPUT SDT-MAST, OUTPUT PRT-FILE. 
 
 000098 RETURN-DEPT 
 
 000099 RETURN SORT-FILE AT END GO TO END-JOB. 
 
 000100 ADD 1 TO SORT-CNT. 
 
 000101 READ-SDT. 
 
 000102 READ SDT-MAST AT END 
 
 000103 DISPLAY ' RAN OUT OF MASTER RECORDS ' SORT-REC, 
 
 000104 GO TO END-JOB. 
 
 000105 ADD 1 TO MAST-CNT. 
 
 000106 MATCH-RECS. 
 
 000107 IF SSN OF SORT-REC > SSNO, GO TO READ-SDT. 
 
 000108 IF SSN OF SORT-REC < SSNO, 
 
 000109 DISPLAY ' NO MASTER RECORD FOR STUDENT ' SORT-REC, 
 
 000110 PERFORM RETURN-DEPT, 
 
 000111 GO TO MATCH-RECS. 
 
 000112 MOVE CORRESPONDING SORT-REC TO PRINT-REC. 
 
 000113 MOVE CORRESPONDING MAST-REC TO PRINT-REC. 
 
 000114 IF LINES > 50, 
 
 000115 ADD 1 TO PAGES, 
 
 000116 MOVE PAGES TO PAGENO, 
 
 000117 WRITE PRT-BUF FROM PRT-HEAD, 
 
 000118 MOVE TO LINES. 
 
 000119 WRITE PRT-BUF FROM PRINT-REC. 
 
 000120 ADD 1 TO LINES. 
 
 000121 GO TO RETURN-DEPT. 
 
 000122 END-JOB. 
 
 000123 CLOSE SDT-MAST, PRT-FILE. 
 
 000124 DISPLAY DEPT-CNT, DEPT-SELECTED, SORT-CNT, MAST-CNT. 
 
175 
 
 Table A. 2 
 Variable Names 
 
 No. Name 
 
 1 
 
 DEPT-MAST 
 
 2 
 
 DEPT-REC.NAME 
 
 3 
 
 .SSN 
 
 4 
 
 .C0DE-1 
 
 5 
 
 .CODE-2 
 
 6 
 
 SDT-MAST 
 
 7 
 
 MAST-REC.SSNO 
 
 8 
 
 .UNITS 
 
 9 
 
 .GPA 
 
 10 
 
 PRT-FILE 
 
 11 
 
 PRT-BUF 
 
 12 
 
 SORT-FILE 
 
 13 
 
 SORT-REC.SSN 
 
 14 
 
 .NAME 
 
 15 
 
 .CODE-2 
 
 16 
 
 DEPT-CNT 
 
 17 
 
 DEPT-SELECTED 
 
 18 
 
 SORT-CNT 
 
 19 
 
 MAST-CNT 
 
 20 
 
 PAGES 
 
 21 
 
 LINES 
 
 22 
 
 PRINT-REC.SSN 
 
 23 
 
 .NAME 
 
 24 
 
 .CODE-2 
 
 25 
 
 .UNITS 
 
 26 
 
 .GPA 
 
 27 
 
 PRT-HEAD.PAGENO 
 
 28 
 
 SYSOUT 
 
176 
 
 o 
 oo 
 
 •O C\J 
 
 cm co ■ — Looor-. oo 01 
 
 •O 
 
 o n n ^- lo ^ n 
 
 a. 
 
 <0 Lf> 
 
 Oi — cm co ^i- in <n i^ 
 
 oo w n 
 
 co 
 
 i— CM CO 
 
 LO 
 
 cd 
 o <c 
 
 Lf) 
 
 co 
 
 CM 
 
 <0 
 
 IT) 
 
 r-. 
 
 CO 
 
 oo r^ 
 
 in 
 
 «3- 
 
 O r— 
 
 CM "CO 
 
 f— l£> i— 
 
 00 
 
 or 
 
 CO 
 
 <u 
 
 (T3 
 
 OO 
 
 
 
 
 r»» 
 
 
 
 LO 
 
 
 
 
 — 
 
 CO 
 
 
 
 ^~ 
 
 
 
 
 <0 
 
 - 
 
 
 
 *i 
 
 
 
 ^— 
 
 CM 
 
 n 
 
 
 in 
 
 *tf" 
 
 n— 
 
 
 II 
 
 — 
 
 ** 
 
 — 
 
 tt 
 
 ^— 
 
 II 
 
 
 A 
 
 II 
 
 LO 
 
 * 
 
 ro 
 
 a 
 
 f\ 
 
 
 CO 
 
 ft 
 
 A 
 
 — 
 
 #% 
 
 CO 
 
 r-~ 
 
 1 — 
 
 1 1— 
 
 <3- 
 
 ^f 
 
 II 
 
 CM 
 
 1 — 
 
 t— 
 
 CM "CvJ 
 
 i— co i— 
 
 II 
 
 m 
 CO 
 
 JD 
 
 <o 
 
 
 
 
 
 
 
 
 
 
 UJ 
 
 
 
 
 
 
 
 
 rt3 
 
 i- 
 
 uj lu 
 
 1— 
 
 Q 
 
 
 
 
 
 
 
 1— 
 
 
 \— 
 
 h- 
 
 1— 
 
 Q 
 
 
 
 t— 
 
 C7> 
 
 q o_ 
 
 oo 
 
 «=c 
 
 
 oo 
 
 
 
 oo 
 
 OO 
 
 M 
 
 oo 
 
 OO 
 
 oo 
 
 oo 
 
 <x. 
 
 
 00 
 
 
 O 
 
 o >- 
 
 >- 
 
 LU 
 
 z: 
 
 et 
 
 U_ 
 
 U_ 
 
 ■=c 
 
 <: 
 
 Qd 
 
 < 
 
 >- 
 
 >- 
 
 >- 
 
 UJ 
 
 z: 
 
 <=c 
 
 
 s- 
 
 2: (— 
 
 oo 
 
 a; 
 
 o 
 
 CO 
 
 i — i 
 
 1 — 1 
 
 CQ 
 
 CQ 
 
 :s 
 
 CQ 
 
 oo 
 
 00 
 
 oo 
 
 on 
 
 o 
 
 CO 
 
 
 a. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 CM 
 
 
 
 
 
 
 
 - 
 
 UJ 
 
 en 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 s: 
 
 Q 
 
 1 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 — 
 
 O 
 c_> 
 
 1— 
 
 O- 
 
 
 o 
 
 
 
 UJ 
 
 —I 
 
 
 
 
 
 
 
 
 
 
 n 
 
 o 
 
 UJ 
 Q 
 
 
 UJ 
 
 1— 
 
 
 
 1— 1 
 
 u. 
 
 
 
 
 1— 
 
 
 
 
 
 
 Qd CM 
 
 UJ 
 
 
 
 o 
 
 
 
 1 
 
 
 
 
 2: 
 
 
 
 
 
 
 O 1 
 
 en 
 
 CD 
 
 
 UJ 
 
 
 
 1— 
 
 
 
 
 UJ 
 
 
 
 
 1— 
 
 
 UJ 
 
 1 
 
 2T 
 
 
 _l 
 
 
 
 Qd 
 
 
 
 1— 
 
 2: 
 
 
 
 
 2: 
 
 — 
 
 - Q 
 
 h- 
 
 i— i 
 
 o 
 
 UJ 
 
 
 
 Q_ 
 
 UJ 
 
 
 •Zd 
 
 UJ 
 
 
 
 
 o 
 
 <o 
 
 r-. o 
 
 Q_ 
 
 Q C_> 
 
 UJ 
 
 oo 
 
 1— 
 
 
 
 _l 
 
 
 
 
 I— 
 
 h- 
 
 h- 
 
 
 i 
 
 CM 
 
 CO <_> 
 
 UJ 
 
 2T UJ 
 
 ct: 
 
 1 
 
 OO 
 
 UJ 
 
 ft 
 
 1 — 1 
 
 
 1 
 
 <C 
 
 oo 
 
 oo 
 
 
 h- 
 
 — 
 
 — • 
 
 Q 
 
 o cd 
 
 i 
 
 I— 
 
 «t 
 
 _J 
 
 1— 
 
 u. 
 
 
 1— 
 
 h- 
 
 «=c 
 
 <: 
 
 
 D_ 
 
 
 O 
 
 
 D_ 1 
 
 l— 
 
 Q_ 
 
 s: 
 
 l-H 
 
 OO 
 
 1 
 
 
 Qd 
 
 oo 
 
 s 
 
 s: 
 
 
 UJ 
 
 II 
 
 II UJ 
 
 O 
 
 O0 1— 
 
 Dd 
 
 UJ 
 
 i 
 
 Li_ 
 
 <c 
 
 h- 
 
 
 O 
 
 
 1 
 
 i 
 
 
 a 
 
 
 Qd 
 
 1— 
 
 uj on 
 
 O 
 
 Q 
 
 i— 
 
 1 
 
 s: 
 
 Cd 
 
 
 O0 
 
 
 1— 
 
 I— 
 
 
 
 r— 
 
 i— 1 
 
 
 on o 
 
 OO 
 
 
 Q_ 
 
 H- 
 
 i 
 
 O 
 
 
 
 
 a. 
 
 D_ 
 
 
 o 
 
 1 
 
 1 h- 
 
 — 
 
 ad oo 
 
 
 O 
 
 LU 
 
 Cd 
 
 i— 
 
 00 
 
 
 O 
 
 
 UJ 
 
 UJ 
 
 
 i— 
 
 UJ 
 
 UJ Q_ 
 
 * 
 
 o 
 
 UJ 
 
 1— 
 
 Q 
 
 o 
 
 a 
 
 
 
 1— 
 
 
 Q 
 
 Q 
 
 Q 
 
 
 Q 
 
 Q UJ 
 
 — 
 
 o o 
 
 OO 
 
 
 
 oo 
 
 oo 
 
 ^z 
 
 O 
 
 
 
 
 
 2: 
 
 r— 
 
 O 
 
 O Q 
 
 
 \— 
 
 <c 
 
 ■ — 
 
 UJ 
 
 
 
 Qd 
 
 Sd 
 
 r— 
 
 
 2: 
 
 Q 
 
 UJ 
 
 
 O 
 
 O 
 
 UJ 
 
 UJ 
 
 UJ 
 
 
 OO 
 
 1— 
 
 2: 
 
 ZD 
 
 UJ 
 
 
 
 UJ 
 
 < 
 
 
 Q 
 
 
 
 > 
 
 > 
 
 _i 
 
 Q 
 
 o 
 
 Qd 
 
 UJ 
 
 1— 
 
 
 O 
 
 
 O- 
 
 UJ 
 
 l— 
 
 Q 
 
 u_ 
 
 u_ 
 
 o 
 
 o 
 
 UJ 
 
 Q 
 
 _J 
 
 o 
 
 D_ 
 
 UJ 
 
 »— 
 
 O 
 
 
 O 
 
 DC 
 
 ■=c 
 
 < 
 
 l-H 
 
 1 — 1 
 
 s: 
 
 s: 
 
 c*: 
 
 <C 
 
 o 
 
 00 
 
 O 
 
 Qd 
 
 < 
 
 < 
 
 =*= 
 
 1— CM CO «3" LO <0 
 
 OO 
 
 00 
 
 CTlOr- cmco^-lovo 
 
177 
 
 o 
 oo 
 
 o 
 
 CM 
 
 c\j r>« 
 
 CM CM 
 
 CM 
 
 cocn^Oi — r^co«^-Lovoi — oo 
 i— i — to (\l i — w w w m w w 
 
 cr> 
 
 CM 
 
 -3- 
 ro 
 
 o i— 
 ro en 
 
 cm 
 ro 
 
 en 
 en 
 
 ro 
 
 at 
 
 UDr^OOCOOr— ojco^j-loud 
 
 ■ — i — i — i — CM (VI W CM N N W 
 
 CM 
 
 CO Ol O r- 
 
 cm cm en en 
 
 CM 
 
 ro 
 
 
 
 
 
 
 ID 
 
 
 «3- 
 
 
 
 
 
 
 
 r— 
 
 
 CM 
 
 
 
 
 
 
 
 r\ 
 
 
 n 
 
 
 on 
 
 cr> 
 
 
 
 
 «=fr 
 
 
 CO 
 
 VO 
 
 o <=£ 
 
 ft 
 
 
 
 
 ^~ 
 
 
 CM 
 
 CM 
 
 > 
 
 00 
 
 
 
 
 m 
 
 
 A 
 
 ft 
 
 
 n 
 
 00 
 
 C7i 
 
 00 
 
 en 
 
 00 
 
 CM 
 
 LO 
 
 
 i^ 
 
 1 CM 
 
 i — 1 
 
 1 CM 
 
 i — i 
 
 1 1 — 
 
 CM 
 
 CM 
 
 O t-* O i— 
 CM CM r- CM 
 
 CD 
 
 
 3 
 
 >> 
 
 C 
 
 s- 
 
 •p— 
 
 <U 
 
 4-> 
 
 § 
 
 O 
 
 3 
 
 o 
 
 00 
 
 
 E 
 
 ro 
 
 (O 
 
 • 
 
 s- 
 
 «=c 
 
 CD 
 
 
 o 
 
 CD 
 
 S- 
 
 f> 
 
 Q_ 
 
 
 
 Ot 
 
 LU LU 
 
 O >- 
 
 V£> 
 
 < 
 
 LO 
 
 ro 
 
 Ct: O IS 
 
 M CO 
 
 Ot <c 
 
 0Q 
 
 IT) 
 
 ro 
 
 cr> ro ro - 
 
 II r— i— i— || 
 
 at 
 
 LO 
 
 
 m 
 
 
 o 
 
 
 
 
 
 r— 
 
 ^t 
 
 
 LO 
 
 r^ 
 
 
 
 
 II 
 
 t— 
 
 
 II 
 
 II 
 
 
 
 
 •s 
 
 n 
 
 CTl 
 
 r> 
 
 #* 
 
 
 
 
 CO 
 
 ro 
 
 #» 
 
 r— 
 
 o 
 
 o 
 
 r^ 
 
 o 
 
 r— ■ 
 
 ■ — 
 
 CO 
 
 CM 
 
 CM 
 
 CM 
 
 CM 
 
 ii 
 
 Q 
 
 
 
 
 
 
 
 
 \— 
 
 
 <: 
 
 
 oo 
 
 oo 
 
 oo 
 
 
 OO 
 
 oo 
 
 t— i 
 
 oo 
 
 UJ 
 
 si 
 
 <=c 
 
 cC 
 
 < 
 
 U_ 
 
 <=c 
 
 <c 
 
 at 
 
 < 
 
 at 
 
 o 
 
 en 
 
 CO 
 
 CO 
 
 i— i 
 
 CO 
 
 CO 
 
 3 
 
 CO 
 
 
 
 
 
 
 OO 
 
 oo 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 oo 
 
 00 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 • 
 
 • 
 
 
 
 
 
 CJ 
 
 
 C_) 
 
 
 
 
 
 Q 
 
 
 
 
 
 
 
 o 
 
 C_3 
 
 
 
 
 
 LU 
 
 
 LU 
 
 
 
 
 
 «=C 
 
 
 
 
 
 
 
 LU 
 
 LU 
 
 
 
 
 
 at 
 
 
 at 
 
 
 
 
 
 LU 
 
 
 
 
 
 
 
 Ot 
 1 
 
 at 
 i 
 
 
 
 
 
 1— 
 
 
 i 
 
 I— 
 
 
 
 
 
 i 
 
 
 
 
 
 C_) 
 
 
 1— 
 
 I— 
 
 o 
 
 
 
 
 q; 
 
 
 oo 
 
 
 
 
 
 l— 
 
 
 
 
 
 LU 
 
 
 OO 
 
 oo 
 
 LU 
 
 
 
 
 o 
 
 
 < 
 
 
 
 
 
 at 
 
 
 
 
 
 ct: 
 
 
 <c 
 
 <c 
 
 at 
 
 
 
 
 oo 
 
 
 s: 
 
 
 
 
 o 
 
 Q. 
 
 
 I— 
 
 
 
 i 
 
 
 s: 
 
 s: 
 
 i 
 
 
 
 
 
 
 
 
 
 
 2: 
 
 
 
 St 
 
 
 
 h- 
 
 
 
 
 1— 
 
 
 
 
 CJ3 
 
 
 o 
 
 
 
 
 LU 
 
 s: 
 
 
 LU 
 
 
 
 at 
 
 I— 
 
 A 
 
 V 
 
 at 
 
 
 
 1— 
 
 st 
 
 
 ~^^ 
 
 
 
 
 o 
 
 o 
 
 
 2£ 
 
 
 
 o 
 
 st 
 
 
 
 o 
 
 LU 
 
 
 st 
 
 t— 4 
 
 <_> 
 
 »-H 
 
 o 
 
 
 
 >=C 
 
 ct: 
 
 
 LU 
 
 
 
 oo 
 
 o 
 
 2: 
 
 ^ 
 
 oo 
 
 _l 
 
 
 o 
 
 Q 
 
 LU 
 
 Q 
 
 LU 
 
 
 
 Q_ 
 
 Li_ 
 
 oo 
 
 1— 
 
 
 
 
 i 
 
 oo 
 
 oo 
 
 
 i— I 
 
 
 1 
 
 Z 
 
 at 
 
 St 
 
 q; 
 
 
 oo 
 
 
 
 LU 
 
 =c 
 
 h- 
 
 
 — 
 
 l— 
 
 oo 
 
 oo 
 
 — 
 
 Ll_ 
 
 
 1— 
 
 O 
 
 i 
 
 O 
 
 i 
 
 o 
 
 LU 
 
 O 
 
 U- 
 
 St 
 
 1— 
 
 oo 
 
 
 • 
 
 oo 
 
 • 
 
 • 
 
 • 
 
 1 
 
 
 q: 
 
 Q_ 
 
 h- 
 
 D_ 
 
 1— 
 
 tn 
 
 CD 
 
 1— 
 
 rs 
 
 t— 1 
 
 oo 
 
 < 
 
 
 • 
 
 <c 
 
 o 
 
 c_> 
 
 • 
 
 1— 
 
 
 o 
 
 00 
 
 ^ 
 
 OO 
 
 ZZL 
 
 
 ■< 
 
 
 CO 
 
 _J 
 
 
 s: 
 
 
 • 
 
 s: 
 
 LU 
 
 LU 
 
 • 
 
 at 
 
 
 oo 
 
 LU 
 
 i— i 
 
 LU 
 
 i— « 
 
 A 
 
 Q. 
 
 OO 
 
 1 
 
 
 
 | 
 
 
 — 
 
 
 Q£ 
 
 c£ 
 
 — 
 
 o 
 
 
 
 C£ 
 
 at 
 
 a: 
 
 ct: 
 
 
 
 LU 
 
 h- 
 
 o 
 
 
 i— 
 
 
 
 o 
 
 1 
 
 i 
 
 
 oo 
 
 
 o 
 
 ct: 
 
 a. 
 
 ct: 
 
 Q. 
 
 OO 
 
 O 
 
 o 
 
 at 
 
 1— 
 
 
 Q 
 
 
 >- 
 
 i— 
 
 h- 
 
 l— 
 
 >- 
 
 
 
 1— 
 
 o 
 
 
 o 
 
 
 LU 
 
 1— 
 
 cC 
 
 Q_ 
 
 
 
 OO 
 
 Q 
 
 <c 
 
 
 oc 
 
 at 
 
 <=E 
 
 st 
 
 Q 
 
 
 o c 
 
 o c 
 
 St 
 
 
 Q_ 
 
 
 o 
 
 
 
 z: 
 
 _I 
 
 r— 
 
 o 
 
 o 
 
 _J 
 
 at 
 
 St 
 
 f— 
 
 
 1— 
 
 
 1— 
 
 I— I 
 
 n- 
 
 
 LU 
 
 
 
 Q 
 
 LU 
 
 Q_ 
 
 
 oo 
 
 oo 
 
 Q_ 
 
 =3 
 
 LU 
 
 
 LU 
 
 
 LU 
 
 
 _l 
 
 
 LU 
 
 1— 
 
 LU 
 
 
 =j: 
 
 
 OO 
 
 Q 
 
 
 
 OO 
 
 I— 
 
 
 Q 
 
 > 
 
 
 > 
 
 
 
 Q 
 
 > 
 
 l—i 
 
 > 
 
 
 LU 
 
 I— 
 
 1— < 
 
 Q 
 
 LL. 
 
 Ll- 
 
 t— i 
 
 LU 
 
 h- 
 
 Q 
 
 o 
 
 
 o 
 
 
 u_ 
 
 a 
 
 O 
 
 ct: 
 
 o 
 
 
 Ot 
 
 ■=a: 
 
 Q 
 
 <C 
 
 HH 
 
 t— t 
 
 Q 
 
 at 
 
 <=C 
 
 < 
 
 s: 
 
 
 s: 
 
 
 1— < 
 
 ct 
 
 s: 
 
 3 
 
 s: 
 
 =«= 
 
 co cr> 
 
 oo 
 
 o 
 
 r— 
 
 CM 
 
 CO 
 
 «3" 
 
 LO 
 
 <£> 
 
 |-~. 
 
 CO 
 
 CT> 
 
 O 
 
 i — 
 
 CM 
 
 ro 
 
 CM 
 
 CM 
 
 CM 
 
 CM 
 
 CM 
 
 CM 
 
 CM 
 
 CM 
 
 CM 
 
 CM 
 
 CO 
 
 CO 
 
 ro 
 
 ro 
 
178 
 
 o 
 
 ID 
 OO 
 
 IT) 
 CO 
 
 r— CO 
 
 
 
 
 LT) 
 
 
 
 
 
 CM 
 
 
 Q 
 
 
 
 •1 
 
 
 Ll) 
 
 CO 
 
 
 CX» 
 
 
 q; 
 
 CO 
 
 
 p"~ 
 
 
 Q_ 
 
 #» 
 
 
 n 
 
 
 
 <T> 
 
 «3" 
 
 LT) 
 
 l£> 
 
 
 CM 
 
 CO 
 
 «— 
 
 co 
 
 o <c 
 
 O >— "00 
 
 r- M (D N 
 
 -o 
 
 
 O) 
 
 
 3 
 
 >> 
 
 C 
 
 s- 
 
 •r- 
 
 (T3 
 
 •4-> 
 
 E 
 
 C 
 
 E 
 
 o 
 
 3 
 
 o 
 
 OO 
 
 •«^_-' 
 
 
 
 E 
 
 CO 
 
 <o 
 
 • 
 
 s- 
 
 cC 
 
 o> 
 
 
 o 
 
 CD 
 
 s- 
 
 i^ 
 
 Q_ 
 
 jD 
 
 
 fO 
 
 
 oc 
 
 UJ LjJ 
 Q D- 
 O >- 
 
 <JD 
 CM 
 
 uo 
 
 CO 
 
 Cvl 
 
 CO 
 CM 
 
 CM 
 CM 
 
 II 
 
 oo 
 
 o >— 
 
 <£> 
 
 UD 
 
 1 — 1 
 
 OO 
 
 00 
 
 1— 1 
 
 DC 
 
 et 
 
 >- 
 
 DC 
 
 3 
 
 CO 
 
 oo 
 
 3 
 
 
 CJ> 
 
 
 
 
 
 
 UJ 
 
 
 
 UJ 
 
 
 i 
 
 
 
 _J 
 
 UJ 
 
 
 1— 
 
 
 UJ 
 
 00 
 
 
 2: 
 
 
 _i 
 
 1 
 
 
 »— 1 
 
 
 1— 1 
 
 1— 1— 
 
 
 DC 
 
 
 u_ 
 
 Q_ -Z. 
 
 I— 
 
 Q_ 
 
 
 1 
 
 1— 
 
 UJ O 
 Q 1 
 
 ^ 
 
 S 
 
 
 DC 
 
 1— 
 
 UJ 
 
 O 
 
 
 Q. 
 
 "O0 
 
 s: 
 
 q; 
 
 
 
 1— <c 
 
 UJ 
 
 u. 
 
 
 #1 
 
 z: 2: 
 
 h- 
 
 
 OO 
 
 1— 
 
 
 
 e£ 
 
 U_ 
 
 UJ 
 
 OO 
 
 1 *> 
 
 1— 
 
 ZD 
 
 ■z. 
 
 < 
 
 1— 1— 
 
 OO 
 
 CO 
 1 
 
 1 — 1 
 
 _J 
 
 SI 
 1 
 
 q. z: 
 
 UJ 
 
 
 I— 
 
 
 I— 
 
 1 
 
 
 DC 
 
 
 
 O 
 
 1— 
 
 
 D_ 
 
 1— 
 
 OO 
 
 >- DC 
 
 
 UJ 
 
 1 — 
 
 UJ 
 
 _l 00 
 
 
 1— 
 
 
 O0 
 
 D_ 
 
 
 1 — 1 
 
 Q 
 
 O 
 
 O0 
 
 
 DC 
 
 O 
 
 _l 
 
 t— i 
 
 
 Z2 
 
 <C 
 
 c_> 
 
 Q 
 
 =*t= 
 
 00 
 
 <3- uo <jo r~- 
 co co CO CO 
 
179 
 
 BAS - Block of assignment statements. 
 
 IF - IF statement. 
 
 ON - ON or AT condition attached to 
 
 the preceding statement. 
 READ - Input statement. 
 SYST - Routine other than I/O which 
 
 should be provided by the opera- 
 ting system. 
 WRITE - Output statement. 
 
 I VAR - The set of input variables for the state- 
 ment. Items prefaced by an equal sign (=) 
 are constants having the appropriate value. 
 For example, ='M' is a constant whose value 
 is the character M. 
 
 VAR - The set of output variables for the statement. 
 
 PRED - The numbers of the statements which imme- 
 diately precede the statement on some path 
 through the program. 
 
 SUC - The numbers of the statements which imme- 
 diately follow the statement on some path 
 through the program. 
 These last two entries provide the program graph. 
 
 A. 2 Phase and Link Identification 
 
 Figure A.l is a sketch of the program graph. It is apparent 
 that there are two phases in this program. One consists of nodes 2 
 
180 
 
 Figure A.l 
 Program Graph 
 
181 
 
 through 10, and the other consists of nodes 14 through 35 with the 
 exclusion of node 19. Nodes 1, 11, 12, 13, 19, 36, and 37 are in vari- 
 ous links in the program. The statement numbers shown in parentheses 
 at some nodes are the comparison operations needed for the corresponding 
 IF statement; (29), for example, is the comparison (of LINES to the 
 value 50) whose result is used in executing the conditional branch at 
 node 29. 
 
 A. 3 Statement Migration 
 
 Very little statement migration is possible in this program. 
 Figure A. 2 shows the movement of statements 8 and 10 in Phase 2. 
 Figure A. 3 shows the movement of statements 33 and 35. In both figures, 
 only the portion of the phase affected is shown. 
 
 A. 4 Variable Type Identification 
 
 Table A. 4 shows the types of each of the variables used in the 
 program. The sets of nodes in which the variables appear as input and 
 as output variables are also shown. The abbreviations used for the 
 variable types are the following: 
 
 C - Constant 
 L - Local 
 
 RI - Reference Independent 
 RD - Reference Dependent 
 NU - Not used in this phase 
 
182 
 
 <\ 
 
 «VJ 
 
 
 £ 
 
 
 
 
 .5 
 
 
 i 
 
 i 
 
 
 
 i 
 
 
 
 
 
 
 
 
 
 
 g 
 
 00 
 
 
 
 
 
 <o 
 
 
 
 r „ 
 
 
 m 
 
 
 
 / 
 
 I * 
 
 
 
 
 
 
 0> 
 
 
 
 
 
 
 
 
 
 
 o 
 
 03* 
 
 
 
 
 o 
 
 +-> 
 
 en 
 
 CM 
 
 CD 
 
 S- 
 
 CT> 
 
 c 
 
 CD 
 E 
 CD 
 +-> 
 (0 
 4-> 
 00 
 
 CD 
 CO 
 <T3 
 
183 
 
 
 ID 
 
 
 
 
 - 
 
 
 
 fO 
 
 ro 
 
 
 
 
 
 
 
 
 
 o 
 
 ro 
 
 <M 
 
 
 
 to 
 
 ro 
 
 ro 
 
 
 
 
 1 
 
 
 i 
 
 
 
 
 
 
 
 
 
 in 
 
 
 
 
 to 
 
 
 
 
 £ 
 
 S- 
 CJ> 
 
 CO 
 
 < 
 
 C 
 
 
 <D 
 
 (U 
 
 1= 
 
 S- 
 
 0) 
 
 13 
 
 4-> 
 
 CD 
 
 A3 
 
 00 
 
 £ 
 
 <1J 
 
 to 
 
 o 
 
 ro 
 
 ro 
 
 ro 
 ro 
 
 ro 
 
 in 
 ro 
 
184 
 
 
 
 
 Table 
 
 A. 4 
 
 
 
 
 
 
 V 
 
 ariable 
 
 Types 
 
 
 
 
 Variable 
 
 I-set 
 
 Phase 1 
 0-set 
 
 Type 
 
 I-set 
 
 Phase 
 
 2 
 0-set 
 
 Ty 
 
 1 
 
 2 
 
 - 
 
 RD 
 
 - 
 
 
 -. 
 
 NU 
 
 2 
 
 8 
 
 2 
 
 L 
 
 - 
 
 
 - 
 
 NU 
 
 3 
 
 8 
 
 2 
 
 L 
 
 - 
 
 
 - 
 
 NU 
 
 4 
 
 5,6,8 
 
 2 
 
 L 
 
 - 
 
 
 - 
 
 NU 
 
 5 
 
 6,8 
 
 2,7 
 
 L 
 
 - 
 
 
 - 
 
 NU 
 
 6 
 
 - 
 
 - 
 
 NU 
 
 17 
 
 
 - 
 
 RD 
 
 7 
 
 - 
 
 - 
 
 NU 
 
 21,22 
 
 
 17 
 
 L 
 
 8 
 
 - 
 
 - 
 
 NU 
 
 28 
 
 
 17 
 
 L 
 
 9 
 
 - 
 
 - 
 
 NU 
 
 28 
 
 
 17 
 
 L 
 
 10 
 
 - 
 
 - 
 
 NU 
 
 - 
 
 
 32,34 
 
 RD 
 
 11 
 
 - 
 
 - 
 
 NU 
 
 - 
 
 
 32,34 
 
 L 
 
 12 
 
 - 
 
 9 
 
 RD 
 
 14,24 
 
 
 - 
 
 RD 
 
 13 
 
 9 
 
 8 
 
 L 
 
 19,21, 
 
 22,23,27 
 
 14,24 
 
 L 
 
 14 
 
 9 
 
 8 
 
 L 
 
 19,23, 
 
 27 
 
 14,24 
 
 L 
 
 15 
 
 9 
 
 8 
 
 L 
 
 19,23, 
 
 27 
 
 14,24 
 
 L 
 
 16 
 
 4 
 
 4 
 
 RI 
 
 - 
 
 
 - 
 
 NU 
 
 17 
 
 10 
 
 10 
 
 RI 
 
 - 
 
 
 - 
 
 NU 
 
 18 
 
 - 
 
 - 
 
 NU 
 
 16,26 
 
 
 16,26 
 
 RI 
 
 19 
 
 - 
 
 - 
 
 NU 
 
 20 
 
 
 20 
 
 RI 
 
 20 
 
 - 
 
 - 
 
 NU 
 
 30,31 
 
 
 30 
 
 RI 
 
 21 
 
 - 
 
 - 
 
 NU 
 
 29,35 
 
 
 33,35 
 
 RD 
 
 22 
 
 - 
 
 - 
 
 NU 
 
 34 
 
 
 27 
 
 
 23 
 
 - 
 
 - 
 
 NU 
 
 34 
 
 
 27 
 
 
 24 
 
 - 
 
 - 
 
 NU 
 
 34 
 
 
 27 
 
 
 25 
 
 - 
 
 - 
 
 NU 
 
 34 
 
 
 28 
 
 
 26 
 
 - 
 
 - 
 
 NU 
 
 34 
 
 
 28 
 
 
 27 
 
 - 
 
 - 
 
 NU 
 
 32 
 
 
 31 
 
 RD 
 
 28 
 
 _ 
 
 _ 
 
 NU 
 
 _ 
 
 
 19,23 
 
 RD 
 
185 
 
 A. 5 Storage Assignment 
 
 Table A. 5 shows the D, A, Q, and S sets, defined in 
 section 3.5, for each variable for which storage is to be assigned. 
 Note that the files and the buffer named PRT-BUFR have been dropped. 
 
 Table A. 6 shows our storage unit assignment. This assign- 
 ment is shown for a memory size of 10 characters per word. The 
 utilization of the memory is given as a percentage of the number of 
 characters available in the memories needed for this program. 
 
 A. 6 Positioning FORK, HOLD, and QUIT Instructions 
 
 Figure A. 4 shows the program after we have inserted the FORK, 
 HOLD, and QUIT instructions needed in each phase. 
 
 A. 7 Inserting Interlocks 
 
 Figure A. 4 also shows the program after we have inserted the 
 necessary TEST and RELEASE instructions for each phase. 
 
186 
 
 CO 
 
 C\J C\J CM 
 
 
 ft #1 A 
 
 
 LO LO CTl 
 
 
 r— i— i— 
 
 
 a *t * ^— (— 
 
 
 cri *3- ^f cn cm cm 
 
 r^ 
 
 
 OO CM 
 
 •> « « oo cr> cr> 
 
 I - ~ ** 
 
 CO CO oo «>#>«» 
 
 "i— o 
 
 1 r-i-i-NCOCO 1 
 
 1 1 1^ CM CM 
 
 CM 
 
 CM 
 
 CD 
 
 to 
 
 o- 
 
 LO LO «3" 
 
 «t n n 
 
 I I I I CT> 00 i — i— i — I I I 
 
 CM 
 
 C\J CM CSJ CM CM 
 CM CM CM CM CM 
 
 CM CM CM CM CM 
 
 OO CM CM CM CM O 
 CM CM CM CM CM CM 
 
 I I I I ICMCMCMCMCMI I I 
 
 OO «=3- LO 
 
 i — i — r — 00 CT> 
 
 LO 
 4-> 
 
 CD 
 CO 
 
 LO 
 
 
 +-> 
 
 
 
 
 *d- 
 
 
 fO 
 
 
 
 
 r^ 
 
 LO 
 
 en 
 
 
 
 
 n 
 
 • 
 
 CD 
 
 
 
 
 oo 
 
 < 
 
 CD 
 
 
 
 
 n 
 
 CD 
 
 CO 
 
 
 r>- 
 
 
 LO 
 
 I— 
 
 CO 
 
 
 ■ — 
 
 
 n 
 
 -O. 
 
 
 ^» 
 
 01 
 
 
 LO OO 
 
 fl3 
 
 -o 
 
 
 r>. r~» lo lo 
 
 r*>. r~. r~- 
 
 n «t 
 
 I— 
 
 c 
 
 •r— 
 
 c 
 
 •1— 
 
 CD CO 
 
 CO 
 
 fO 
 
 <3- 
 
 #t 
 
 LO LO OO 
 
 i i ■ — i — i — 
 
 «^- CM 1 
 
 
 M- 
 
 o- 
 
 0O CM LD CM 1 1 
 
 i i i i i 
 
 1 1 1 
 
 
 << 
 
 
 
 
 
 I I I I 
 
 I I I I 
 
 "=3" OO LO 
 
 i — i — li — I I IOOCMLOI I I I I I I I I I I 
 
 LO LO LO LO LO 
 CM CM CM CM CM 
 
 ft 01 ft ft A 
 
 LO LO LO «3" «* 
 CM CM CM CM CM 
 
 ft ft A A A 
 
 LOLOLO't LOLO"5J" ^" <* CO CO fO 
 
 « « «i «< i — i — r— CMCMCMCMCM 
 
 ^]- <^J- 0O OO O") 0"> 00 •»■»«! •>«•»•>« 
 
 «nn«n«i«^-CQrO 0OCMCMCMCM 
 
 ncMWCMCONNr- i— i— I I I I I ICMCMCMCMCMI 
 
 CI) 
 
 oo><Mi-crin<tcnoi — lololololocmoOi — «^-lolo 
 oo oo oo 
 
 
 CM00"^-LOr^.00CTl00«^-LOLOr-.000~>Oi — CM0O«=d-LOLOr^ 
 i — i — i — i — i — i — i — CMCMCMCMCMCMCMCM 
 
Phase 1 
 
 187 
 
 Table A. 6 
 Storage Unit Assignment 
 
 Word size = 10 characters 
 
 Memory Units 1 
 
 2 
 
 3 
 
 4 
 
 5 
 
 6 
 
 2a 
 £ (Local) 
 
 _Q 
 
 2b 
 
 2c 
 
 3 
 
 4 
 5 
 
 
 £ 14a 
 
 14b 
 
 14c 
 
 13 
 
 15 
 
 
 > (Global) 16 
 
 
 
 
 
 17 
 
 # Variables 
 § Memories 
 
 # Local words 
 
 # Global words 
 
 # Local characters i 
 
 # Local characters i 
 
 # Global characters 
 
 # Global characters 
 
 ivailable 
 jsed 
 
 avail abl 
 used 
 
 e = 
 
 9 
 
 6 
 
 2 
 
 1 
 120 
 
 82 (68.3%) 
 60 
 10 (16.7%) 
 
 
 Phase 2: 
 
 Word size = 10 characters 
 
 Memory Units 1 
 
 2 
 
 3 
 
 4 
 
 5 
 
 6 
 
 7 
 
 8 
 
 13 
 
 14a 
 
 14b 
 
 14c 
 
 co (Local) 
 
 9 
 
 15 
 
 
 
 
 £ 26 
 
 25 
 
 22 
 
 23a 
 
 23b 
 
 23c 
 
 -Q 
 fO 
 
 
 24 
 
 
 
 
 1 (Global) l° 7 
 
 21 
 
 
 19 
 
 
 
 # Variables 
 
 
 = 
 
 15 
 
 
 
 # Memories 
 
 
 = 
 
 6 
 
 
 
 # Local words 
 
 
 = 
 
 2 
 
 
 
 # Global words 
 
 
 = 
 
 1 
 
 
 
 # Local characters 
 
 avail a 
 
 ble = 
 
 120 
 
 
 
 # Local characters 
 
 used 
 
 = 
 
 102 (85.0%) 
 
 
 # Global characters 
 
 avail 
 
 able = 
 
 60 
 
 
 
 # Global characters 
 
 used 
 
 = 
 
 17 (28.3%) 
 
 
188 
 
 / TEST \ 
 \ SORT- FILE / 
 
 T 
 
 / RELEASE \ 
 \ SORT-FILE / 
 
 CQUIT 
 
 12 
 
 13 
 
 f 
 
 I 
 
 h 
 
 Figure A. 4 
 Final Program Graph 
 
189 
 
 Figure A. 4 (continued) 
 Final Program Graph 
 
RELEASE 
 PA6EN0 
 
 I 
 
 35 
 
 < 
 
 TEST 
 PRT-FILE 
 
 /RELEASE \ 
 \ LINES / 
 
 190 
 
 \ PAGE NO / 
 
 30,31,33,35 
 
 I 
 
 > 
 
 / RELEASE 
 ( PAGE NO 
 \ LINES 
 
 ~r. 
 
 / TEST \ 
 \ PRT-FILE / 
 
 32 
 
 I 
 
 34 
 
 / RELEASE \ 
 \ PRT-FILE / 
 
 ^LMT> 
 
 36 
 
 T 
 
 37 
 
 <$UIT> 
 
 Figure A. 4 (continued) 
 Final Program Graph 
 
191 
 
 VITA 
 
 Richard Ernest Strebendt was born in Detroit, Michigan, in 
 1943. He received the BSEE degree in 1966 and the MSEE degree in 1968 
 from Wayne State University in Detroit. 
 
 From 1968 to 1971 he was employed by Bell Telephone 
 Laboratories in Naperville, Illinois, as a Member of Technical Staff. 
 During this time he participated in the design of a computer-assisted 
 Directory Assistance system and he was responsible for the design and 
 implementation of a descriptive language for logic circuits which is 
 now in use in a design automation system. In 1971 he obtained an 
 educational leave-of-absence. 
 
 In 1971 he entered the University of Illinois Department of 
 Computer Science as a Graduate Research Assistant. While at the 
 University of Illinois he was a member of a group investigating prob- 
 lems in multiprocessor system design. 
 
 Mr. Strebendt is a member of Tau Beta Pi, Eta Kappa Nu, 
 Sigma Xi , IEEE, and ACM. 
 
MBLIOGRAPHIC DATA 
 ,HEET 
 
 1. Report No. 
 
 UIUCDCS-R-74-638 
 
 3. Recipient's Accession No. 
 
 lit !e and Subtitle 
 
 PROGRAM SPEEDUP THROUGH CONCURRENT RECORD PROCESSING 
 
 5. Report Date 
 
 October 1974 
 
 Author(s) 
 
 Richard Ernest Strebendt 
 
 8- Performing Organization Rept. 
 
 No -UIUCDCS-R-74-638 
 
 Performing Organization Name and Address 
 
 University of Illinois at Urbana-Champaign 
 Department of Computer Science 
 Urbana, Illinois 61801 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract /Or ant No. 
 
 US NSF GJ 36936 
 
 2. Sponsoring Organization Name and Address 
 
 National Science Foundation 
 Washington, D. C. 
 
 13. Type of Report & Period 
 Covered 
 
 Doctoral - 1974 
 
 14. 
 
 5. Supplementary Notes 
 
 6. Abstracts 
 
 Much effort in the past has been devoted to speeding up computational programs 
 through the use of multiprocessing. This paper examines the problem of speeding up 
 data processing programs which typically do not contain a great deal of computation. 
 
 A machine organization is proposed which is capable of executing several instruction 
 streams concurrently. Compiler algorithms are described which automatically insert 
 the necessary commands to start and stop instruction streams and to protect common 
 variables which must be accessed sequentially. 
 
 7. Ke\ Word' and Document Analysis. 17a. Descriptors 
 
 Concurrent Processing 
 
 Data Interlocks 
 
 Multiprocessors 
 
 COBOL 
 
 Data Processing 
 
 7b. Identifiers/Open-Ended Terms 
 
 7c. . OSATI Field/Group 
 
 
 
 
 : lability Statement 
 
 RELEASE UNLIMITED 
 
 19. Security (lass (Thi;. 
 Report ) 
 
 UNCLASSIFIED 
 
 rfi. no. of 
 
 201 
 
 Pages 
 
 20. Security Class (Tins 
 Page 
 
 UNCLASSIFIED 
 
 22. Price 
 
 ORM NTIS- 35 ( 10-70) 
 
 
 USCOMM- DC 
 
 <10329- P 
 

^2 
 
 ' #5 
 

 
 EJHn 
 
 UNlVEf}o. T 
 
 ■ I ^H 
 
 n 
 
 ■ I 
 
 IBB H <"> **--*.,.*: 
 
 
 M 
 
 
 ■ ■ 
 
 I 
 
 
 
 w 
 
 
 Hi 
 
 I I 
 H M H 
 
 ■ I 
 
 H 
 
 
 
 vt 
 
 i tl**AXj& 
 
 
 
 A 
 
 ■ 
 
 H