 Report No. 398 
 
 A LIMITED CONNECTION ARITHMETIC UNIT 
 
 by 
 
 Michael John Pisterzi 
 
 June 1, 1970 
 
 Department of Computer Science 
 University of Illinois 
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation
under Grant No. US NSF GJ812 and Grant No. US NSF GJ813 and was
submitted in partial fulfillment for the Doctor of Philosophy degree
in Electrical Engineering, 1970.
 
 ACKNOWLEDGEMENT 
 
 I would like to acknowledge the guidance, encouragement, and 
 helpful suggestions of my advisor, Professor James E. Robertson. 
 I would like to thank Mr. Webb T. Comfort for several suggestions 
 that contributed significantly to the clarity of this paper. 
 
 I would also like to thank Miss Cheryl Becker for her skillful 
 typing and the members of the Drafting Department of the Depart- 
 ment of Computer Science for the figures that they drew. 
 
 Finally, I would like to thank my family, particularly my wife 
 Candace, for their encouragement. 
 
TABLE OF CONTENTS

1. INTRODUCTION

   1.1 Statement of the Problem
   1.2 Relation to Prior Work
   1.3 Structure of the Remainder of the Paper

2. INTRODUCTION TO THE METHOD OF PERFORMING THE PROCESSING

   2.1 Organization of the Arithmetic Unit
   2.2 Generalized Examples
   2.3 The Basic Micro-Instruction Repertoire of the DPUs

3. THE ARITHMETIC CONSIDERATIONS OF IMPLEMENTING A
   LIMITED-CONNECTION ARITHMETIC UNIT

   3.1 Introduction
   3.2 Applicable Number Representations and Addition Methods
   3.3 Multiplication Considerations
   3.4 Multiplier Recoding
   3.5 Normalization Considerations
   3.6 Division Considerations

4. INTERACTION WITH MEMORY

   4.1 Introduction
   4.2 Methods Applicable When the Memory Byte is the Digit
   4.3 Methods Applicable When the Memory Byte is a Number of Digits

5. OPERATIONAL SPECIFICATION OF THE MODULES

   5.1 Introduction
   5.2 The Digit Processing Unit
   5.3 The Primitive Control Unit
   5.4 The End Unit
   5.5 Exponent Arithmetic Unit
   5.6 The Sense Micro-Instruction Detector

6. SUMMARY AND CONCLUSIONS

   6.1 Discussion of Results
   6.2 Suggestions for Related Work

APPENDIX

   I.   CHARACTERISTICS OF THE SYMMETRIC RADIX TWO SIGNED DIGIT ADDER
   II.  MINIMAL RIGHT-DIRECTED RECODER OF RADIX TWO SIGNED DIGIT NUMBERS
   III. SELF-INITIALIZING MODULES

REFERENCES

VITA
 
A LIMITED CONNECTION ARITHMETIC UNIT 
 
 Michael John Pisterzi, Ph.D. 
 Department of Electrical Engineering 
 University of Illinois, 1970 
 
 A method of designing digital arithmetic units which are capable 
 of performing floating point addition, subtraction, multiplication, 
 division, and normalization is presented in this paper. The resulting 
 arithmetic unit designs will be particularly appropriate for imple- 
 mentation in Large Scale Integration. The major characteristics 
 of a limited connection arithmetic unit are: 
 
 1. It is composed of a large number of complex
 modules.
 
 2. A very small number (three to eight) of specific 
 module types are used. 
 
 3. The number of signal paths required between any 
 module and the remainder of the arithmetic unit 
 is small. 
 
 4. Each specific intermodule signal must be sent to a 
 very small number of modules (one to three). 
 
 The paper shows that the complexity of the modules which are
 required to construct a limited connection arithmetic unit can be
 adjusted by selecting the number system and the details of the
 processing algorithms. By this means the design can be tailored to the
 specific technology with which it is to be implemented.
 
 The general arithmetic considerations of limited connection 
 arithmetic units were also investigated. The major conclusion of this 
 investigation is that signed digit number systems must be employed. 
 
 Several studies pertinent to radix two limited connection arith- 
 metic units were also conducted. The probability of occurrence of 
 each digit pair was determined for one of the three radix two adders. 
 The results of this analysis were used in a study of normalization 
 techniques. Finally, an optimum right-directed multiplier recoder 
 was developed. 
 
1. INTRODUCTION 
 
 1.1 Statement of the Problem
 
 This study reports on a method of designing arithmetic 
 units in a technology in which: 
 
 • the fan-out of the output signal from any logic
 element is limited,
 
 • the basic building blocks are networks con- 
 taining large numbers of logic elements, 
 
 • the number of signal paths through which a 
 basic building block communicates to other 
 basic building blocks is severely limited, and 
 
 • a small number of basic building block types 
 must be employed. 
 
 These are, of course, characteristics of that class of technologies
 popularly known as Large Scale Integration, or LSI. The resulting
 arithmetic unit has some rather desirable properties not shared
 by other arithmetic unit organizations proposed for implementation
 in LSI (9, 10, 12).*

 *Numbers in parentheses refer to articles listed in the References.

 The result of a given step of the calculation is determined
 digit by digit. Each digit of every result is determined in the
 order in which it will be required when that result must be
 manipulated further, and it is stored in the same building block in
 which it will be needed when that result is to take part in
 additional processing. These properties of the arithmetic unit
 allow the processing of a second step to begin as soon as a
 sufficient (and small) number of digits of the results of the first
 step have been determined.
 
 This paper presents a design procedure which has a 
 high probability of evolving a design which effectively utilizes 
 the potentialities of the technology. For the purposes of this 
 paper, a given technology is considered to be effectively utilized 
 when the average number of logic elements utilized per basic 
 building block exceeds a fixed, predefined percentage of the 
 number which can be constructed on it. The arithmetic unit 
 will be developed as a large number of appropriately connected 
 modules. These modules are functionally defined and are dis- 
 tinguished from basic building blocks so that we may postpone 
 the question of whether a module can be implemented as one 
 basic building block until after the modules are specified. We 
 are able to do a significant amount of analysis without including 
 specific technological considerations. The results of this analysis 
 indicate that the major module type, the Digit Processing Unit
 (DPU), can be tailored to the technology by the selection of the
 number representation employed by the arithmetic unit. The
 number of logic elements in the other principal building block,
 the Primitive Control Unit (PCU), is related to the details of
 the algorithms employed to perform multiplication, division
 and normalization, in addition to the number representation
 employed.
 
 An arithmetic unit will consist of a large number of 
 DPUs and only one PCU. If the performance objectives of the 
 arithmetic units are not achieved by the design in which the 
 DPUs and the PCU are implemented as single basic building 
 blocks, more sophisticated algorithms may be employed to 
 perform the basic operations. This would tend to increase 
 the number of logic elements in the PCU without significantly 
 affecting the design of the DPU. The PCU would then have to 
 be implemented as several basic building blocks. 
 
 The study concentrates on the former problem; namely, 
 the specification of module types required to construct an arith- 
 metic unit. This problem is essentially independent of the 
 technology with which the arithmetic unit is to be implemented. 
 The results will show that the design has a sufficiently large 
 number of parameters to allow this method of designing an 
 arithmetic unit to be applied to any technology which has the 
 characteristics mentioned earlier. 
 
The design of the fractional part processor of a floating 
 point arithmetic unit will be developed in detail in this paper. 
 It will be capable of performing addition, subtraction, multi- 
 plication, division, and normalization. This emphasis on the 
 fractional part processor is based on the relative number of 
 digits and the processing required by the fractional part in con- 
 trast to that required by the exponent. The fractional part of 
 a typical floating point number contains several times as many 
 digits as the exponent does. In addition, the processing required 
 by the fractional part of the operands is much more complex than 
 that required by the exponent. Before an addition or subtraction 
 is performed, the radix points of the operands must be aligned, 
 which may require a number of right shifts. The result may also 
 have to be shifted left to normalize it. In contrast, the exponents 
 of the operands need only be subtracted to determine the number 
 of radix alignment shifts required. The only other operation 
 necessary is the addition of the number of normalization shifts 
 to form the exponent of the result. 
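
 The division of labor described above can be sketched in a few lines
 of Python. This is only an illustration under stated assumptions: it
 uses a conventional sign-magnitude radix two fraction held as an
 integer scaled by 2**n, not the signed-digit representation developed
 later in this paper, and the name fp_add and its return values are
 hypothetical.

```python
def fp_add(fa, ea, fb, eb, n=8):
    """Add fa*2**ea and fb*2**eb, where fa and fb are n-bit fractions
    stored as signed integers scaled by 2**n (illustrative only)."""
    # Exponent processing: one subtraction gives the alignment shift count.
    if ea < eb:
        fa, ea, fb, eb = fb, eb, fa, ea
    align_shifts = ea - eb

    # Fraction processing: right-shift the operand with the smaller exponent.
    sign_b = -1 if fb < 0 else 1
    fb = sign_b * (abs(fb) >> align_shifts)
    total = fa + fb
    exp = ea

    # A carry out of the fraction costs one extra right shift.
    if abs(total) >= 1 << n:
        total = (abs(total) >> 1) * (1 if total > 0 else -1)
        exp += 1

    # Normalization: left shifts until the leading digit is non-zero.
    norm_shifts = 0
    while total != 0 and abs(total) < 1 << (n - 1):
        total *= 2
        norm_shifts += 1

    # Exponent processing again: adjust by the normalization shift count.
    return total, exp - norm_shifts, align_shifts, norm_shifts
```

 Note that the exponent path performs only a subtraction and a final
 adjustment, while the fraction path performs every shift and the add.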
 
 The difference in processing is even more extreme for
 multiplication and division. The fractional part calculation
 consists of a sequence of additions and shifts,* while the exponent
 calculation is basically a single addition.

 *This implies, of course, that there is only one adder in the fractional
 part processor. This assumption will be justified later.
 
 Hence the processing required by the fractional part is 
 more difficult than that required by the exponent. It is possible 
 to evolve a design for the fractional part processor, as will be 
 shown in this paper. The same design, or a simplified version 
 of it may be employed for the exponent processor, although it 
 may not be desirable to do so. The complete result of the 
 exponent calculation is required to determine the processing to 
 be performed on the fractional part, while only an estimate of the 
 current results of the fractional part processing is required to 
 determine what additional processing is necessary. For
 example, if an add is being performed, the difference of the
 exponents indicates the number of shifts required to align the
 radix points of the fractional parts, while only the first several
 digits of the sum of the fractional parts are necessary to determine
 if the sum is normalized. Hence, it may be desirable to
 perform the exponent calculations by a means which allows the
 entire results to be available in the shortest time.
 
 1.2 Relation to Prior Work 
 
 The two known research efforts into the development of 
 limited connection arithmetic units have evolved designs in which 
 operations are performed by special purpose units (9, 10, 12). 
 
The two efforts, while not identical, have several charac- 
 teristics in common. The first of these is the development of 
 special purpose units, namely the successful development of inde- 
 pendent units to perform addition, subtraction, and multiplication. 
 The second is the mode of operation. In both, operands travel 
 through the arithmetic unit as a string of digits and must be gated 
 to the appropriate execution unit. The pattern in which the basic 
 building blocks are connected defines the processing that will be 
 performed on the operands streaming through them. Finally, in 
 both efforts control information is inherent in the logical structures 
 of the mechanism. 
 
 The arithmetic unit developed in this paper is opposite in
 approach to those efforts. It has only one execution unit, capable
 of addition, subtraction, multiplication, division, and
 normalization. The operands remain relatively stationary in
 this unit while being processed. The processing is controlled by
 control signals which propagate through the arithmetic unit.
 
 This approach was taken because it holds promise of not
 requiring modifications to the programming systems employed on
 the computer. Computers having several specialized execution
 units (for the same class of operands) require either additional
 effort on the part of the programmer, more complicated compilers
 (1, 13, 22, 23), or special hardware (2, 24) to assure that all units
 are kept busy as much of the time as practical while retaining
 the appropriate ordering of the operations where necessary.
 
 This study evolves an arithmetic unit design in which all
 floating point operations* are performed by the same execution
 unit. The intent of this effort was to evolve an arithmetic unit 
 that is operationally equivalent to the Von Neumann single adder 
 arithmetic unit. Since there is one execution unit, there is no 
 need for expending effort in attempting to maximize and coordi- 
 nate the activity of independent execution units. Hence, the 
 arithmetic unit developed here may be able to replace the 
 arithmetic unit in existing computers with little or no change 
 to the remainder of the computer and its programs. There is 
 no need to develop specialized compilers; schemes similar to 
 the Common Data Bus (24) are needed only in those systems 
 which require buffering between the arithmetic unit and memory. 
 The arithmetic unit described in this paper requires little or 
 no additional development in other areas. 
 
 *Fixed point operations may also be performed by the unit.

 1.3 Structure of the Remainder of the Paper

 In Chapter 2, the concept of the autonomous Digit Processing
 Unit is developed. The Digit Processing Units (or DPUs)
 each contain one digit of each of the active operands in their
 registers. The DPUs communicate with their neighbors in
 such a way that when one DPU is executing a given processing
 step (or micro-instruction), no other DPU is. The DPUs are
 organized so that the micro-instructions are passed from one
 DPU to the next in a specific order. The method of performing
 useful processing by causing the same sequence of
 micro-instructions to be performed by each of the DPUs will also be
 presented, as will the concept of micro-instructions streaming
 through the DPUs as they are executed by the DPUs in sequence.
 The chapter concludes by defining the micro-instructions which
 the DPUs must be able to perform and by indicating how these
 micro-instructions are combined to perform 'machine' instructions.
 
 Chapter 3 treats the arithmetic aspects of the design. 
 It discusses the number systems which can be employed, 
 how addition overflow may be handled and how multiplication, 
 division, and normalization may be performed. It also contains 
 the development of a minimal right-directed recoder of radix 
 two signed-digit numbers. The various possible methods of 
 normalization are evaluated for radix two signed-digit numbers, 
 and the optimum method is shown to depend on the ratio of 
 shift time to the examination time. 
 
Chapter 4 describes how the arithmetic unit and the
 memory containing its operands may communicate. The methods
 discussed are applicable when the memory byte* consists of an
 integral number of digits.** The methods discussed range from
 the very simple (entailing no additional equipment in the repetitive
 portion of the arithmetic unit) to the rather complex,
 employing a significant amount of special logic.
 
 The operational descriptions of the modules required to
 construct a limited connection arithmetic unit are given in
 Chapter 5. An attempt has been made to relate the types of modules
 required and their characteristics to such parameters
 as the size of the memory byte, the number system, and the
 algorithms for performing the machine instructions.
 
 *Byte is used in this paper as the quantity of data which a device (e.g.,
 the storage unit) operates on simultaneously. Byte is used in its
 arbitrary sense in this paper, which stresses that no assumption was made
 concerning the relative size of the data unit of the storage unit with
 respect to the sizes of digits and operands (which are denoted as words).

 **In the unlikely event that several storage unit bytes are required to
 contain one digit, digit assembly/disassembly logic may be included in the
 DPUs or the PCU, and the method applicable to one digit per byte can be
 employed.

 Conclusions and suggestions for related research are presented
 in Chapter 6. As indicated there, the attempt to construct
 a Limited Connection Arithmetic Unit in an appropriate
 technology would be the best way of determining what additional
 considerations should be made.
 
 Three appendices are included, which provide specific 
 information required to design radix two arithmetic units. The 
 first presents the analysis and results of a study of the steady 
 state probabilities of each possible pair of digits in the repre- 
 sentation of sums of the symmetric radix two signed-digit adder. 
 The radix two minimal multiplier recoder developed in Chapter 3 
 is discussed further in Appendix II. A method of initializing the 
 arithmetic unit that requires no additional connections is presen- 
 ted in Appendix III. 
 
 2. INTRODUCTION TO THE METHOD OF PERFORMING THE PROCESSING

 2.1 Organization of the Arithmetic Unit
 
 A basic Limited Connection Arithmetic Unit consists of 
 a Primitive Control Unit, a number of Digit Processing Units, 
 and an End Unit. The Primitive Control Unit (PCU) receives 
 instructions from some external device and converts them into 
 a sequence of micro-instructions to be executed by the Digit Pro- 
 cessing Units (DPUs). The conversion which the PCU performs 
 is very similar to the conversion performed by the adder con- 
 trol logic of contemporary single adder arithmetic units. For 
 example, a multiply is converted into a number of shifts and adds. 
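
 The conversion performed by the PCU can be pictured with the
 following sketch, which expands a radix two multiply into a list of
 shift and add steps and then replays them. It assumes ordinary
 unsigned binary multiplier digits supplied most significant digit
 first; the step names SHIFT_LEFT and ADD and both function names are
 hypothetical and are not the micro-instruction repertoire defined
 later in this chapter.

```python
def expand_multiply(multiplier_digits):
    """Expand a multiply into shift/add steps (most significant digit first)."""
    steps = []
    for digit in multiplier_digits:
        steps.append("SHIFT_LEFT")      # make room for the next partial product
        if digit:
            steps.append("ADD")         # add the multiplicand when the digit is 1
    return steps


def replay(steps, multiplicand):
    """Apply the step list to an accumulator that starts at zero."""
    acc = 0
    for step in steps:
        if step == "SHIFT_LEFT":
            acc *= 2
        elif step == "ADD":
            acc += multiplicand
    return acc
```

 For the multiplier digits 1, 0, 1 the expansion is SHIFT_LEFT, ADD,
 SHIFT_LEFT, SHIFT_LEFT, ADD, and replaying it on a multiplicand of 7
 gives 35.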
 
 The DPUs collectively contain the fractional parts of all
 active operands and do the processing on them. The DPUs have
 the capability of performing micro-instructions which will (when
 performed by all DPUs) form sums, perform shifts, and do
 inter-register transfers.

 The End Unit allows the last DPU to be identical to all the
 other DPUs and to operate as though it had a DPU on its right.

 *Chapters 4 and 5 will discuss variations which employ additional modules.
 
 [Figure 1. Diagram of the inter-module connections of a typical
 limited connection arithmetic unit; not reproduced.]
                DPU_1   DPU_2   DPU_3   ...   DPU_n

 A operand      a_1     a_2     a_3     ...   a_n
 M operand      m_1     m_2     m_3     ...   m_n
 ... operand    ...     ...     ...     ...   ...
 Z operand      z_1     z_2     z_3     ...   z_n

 $$A = \sum_{i=1}^{n} a_i r^{-i}, \qquad M = \sum_{i=1}^{n} m_i r^{-i}, \qquad \text{etc.}$$

 where r is an integer greater than one known as the radix
 of the arithmetic unit.

 Figure 2 - The distribution of operand digits in the DPUs of a limited
 connection arithmetic unit.
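
 The distribution shown in Figure 2 can be mimicked with a small data
 structure. The sketch below is an illustration only: it assumes that
 digit i carries weight r**(-i), so that DPU 1 holds the most
 significant digit of each fractional part, and the names distribute
 and operand_value are hypothetical.

```python
from fractions import Fraction


def distribute(operands):
    """Split each operand's digit list so that DPU i holds digit i of every operand."""
    names = list(operands)
    n = len(operands[names[0]])
    return [{name: operands[name][i] for name in names} for i in range(n)]


def operand_value(dpus, name, radix=2):
    """Reassemble one operand's value from the digits spread across the DPUs."""
    return sum(Fraction(dpu[name], radix ** i) for i, dpu in enumerate(dpus, start=1))
```

 Each element of the list returned by distribute plays the role of one
 DPU's register set, holding one digit of every active operand.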
 
 The inter-module connections of a typical Limited
 Connection Arithmetic Unit are shown in Figure 1. Each DPU
 retains the values of one digit of each of the active operands
 in its register, as shown in Figure 2.

 As mentioned earlier, each DPU performs the same
 sequence of micro-instructions.* From Figure 1 it can be
 seen that a given micro-instruction cannot be executed by all
 DPUs in synchronization, but rather must be executed by them
 in sequence (i.e., first by DPU$_1$, then DPU$_2$, ...). As soon
 as all the DPUs which contain information required by DPU$_1$
 to perform micro-instruction j+1 (referred to as $\mu_{j+1}$) have
 executed $\mu_j$ and have sent the required information to DPU$_1$,
 $\mu_{j+1}$ may be performed by DPU$_1$. The micro-instructions
 are defined to have regular data requirements, so that as each
 additional DPU executes $\mu_j$, one more DPU may execute $\mu_{j+1}$.
 The micro-instructions may be viewed as flowing through
 successive DPUs.

 *This is not absolutely necessary, as certain micro-instructions are
 performed to classify the value of an operand, as in selecting quotient
 digits. The DPUs not containing information required to make this
 classification need not perform these micro-instructions.

 This initial description brings out two characteristics
 that the set of micro-instructions is to have; namely, the need
 for information from as small a number of other DPUs as
 possible, and well-matched intrinsic execution rates. Both
 decrease the time between micro-instruction executions by
 DPU$_1$, and hence tend to yield high performance arithmetic
 units. This paper shows that the number of DPUs which must
 transmit information to a given DPU can be limited to one.
 The execution rate could not be taken into account at the level
 of this analysis. It is felt that differences in the execution rates
 of the micro-instructions can be minimized by employing more
 logic elements to perform the more complex (and hence
 potentially slower) micro-instructions.
 
 The above description of the operation of the arithmetic
 unit indicates that the DPU registers do not contain entire
 operands as long as any of the DPUs are actively executing
 micro-instructions. Each DPU contains the digits of the results of the
 last micro-instruction it has executed. For example, if at a
 particular instant of time the following DPUs are active:
 DPU$_3$ (executing $\mu_{75}$), DPU$_5$ ($\mu_{74}$), DPU$_8$ ($\mu_{73}$),
 DPU$_{10}$ ($\mu_{72}$), DPU$_{15}$ ($\mu_{71}$), ..., the accumulator register
 distributed through the DPUs would contain:

 ${}_{75}a_1$, ${}_{75}a_2$, (-), ${}_{74}a_4$, (-), ${}_{73}a_6$, ${}_{73}a_7$, (-),
 ${}_{72}a_9$, (-), ${}_{71}a_{11}$, ${}_{71}a_{12}$, ${}_{71}a_{13}$, ${}_{71}a_{14}$, (-), ...

 The dashes (-) indicate that the
 corresponding digit cannot be explicitly identified as it may be
 changing. The number preceding the letter 'a' indicates which
 of the operands the given digit is part of. For example, ${}_{75}a_1$
 is the value of the first digit of the 'A' register after the first
 seventy-five micro-instructions have been executed. This
 terminology will be used throughout this paper.
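
 The register contents listed in the example above follow mechanically
 from the set of active DPUs, as the sketch below suggests. It assumes
 that every idle DPU has completed exactly the micro-instruction being
 executed by the nearest active DPU on its right, which is how the
 example reads; the function name register_snapshot is hypothetical,
 and the listing stops at the right-most active DPU.

```python
def register_snapshot(active, letter="a"):
    """Label each digit up to the right-most active DPU.

    `active` maps a DPU index to the micro-instruction it is executing.
    """
    last = max(active)
    labels = []
    for i in range(1, last + 1):
        if i in active:
            labels.append("(-)")        # digit may be changing
        else:
            # The nearest active DPU to the right fixes the last completed step.
            nearest = min(k for k in active if k > i)
            labels.append(f"{active[nearest]}{letter}{i}")
    return labels
```

 Calling register_snapshot({3: 75, 5: 74, 8: 73, 10: 72, 15: 71})
 yields the fifteen entries of the example, with 75a1 standing for the
 first digit of the 'A' register after seventy-five micro-instructions.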
 
 The processing performed by the DPUs can be described
 by the following:

 $${}_{j}X_{i} = \Psi_{j}\left({}_{j-1}X_{i},\ {}_{j}F_{i-1},\ {}_{j}G_{i+1},\ \ldots,\ {}_{j}G_{i+\alpha_{j}}\right) \qquad (2.1.1)$$

 $${}_{j}F_{i} = \Phi_{j}\left({}_{j-1}X_{i},\ {}_{j}F_{i-1}\right) \qquad (2.1.2)$$

 and

 $${}_{j}G_{k} = \Gamma_{j}\left({}_{j-1}X_{k},\ {}_{j}F_{k-1}\right) \qquad (2.1.3)$$

 where

 ${}_{j}X_{i}$ is the operand information contained in the i-th DPU
 immediately following the execution of micro-instruction j. It is
 indicated to be a vector as it consists of the i-th digit of each
 of the active operands,

 $\Psi_{j}$ is the function employed to obtain the new operand
 set. It is dependent only on the micro-instruction to be performed,

 ${}_{j}F_{i}$ is a 'modifier' value which DPU$_i$ transmits to
 DPU$_{i+1}$ with the micro-instruction to be performed next,

 $\Phi_{j}$ is the function which each DPU performs to determine
 ${}_{j}F_{k}$,

 $\alpha_{j}$ is the number of DPUs which must cooperate with
 the DPU performing $\mu_j$ by transmitting to this
 'active' DPU a value of ${}_{j}G_{k}$,

 ${}_{j}G_{k}$* is the value which DPU$_k$ transmits to the DPUs
 with which it cooperates when they are executing $\mu_j$, and

 $\Gamma_{j}$ is the function all DPUs employ to determine the
 value of ${}_{j}G_{k}$ which it transmits to all DPUs with
 which it cooperates.

 *The value of ${}_{j}G_{k}$ must be defined for $k \ge n+1$ where n is the
 number of DPUs in the arithmetic unit. One possibility is
 ${}_{j}G_{k} = 0$ for all $k \ge n+1$.

 The operation of a typical DPU, DPU$_i$, is as follows. It
 begins in a state in which it is receptive to information defining
 the next micro-instruction to be performed. DPU$_i$ receives
 this information and the value of ${}_{j}F_{i-1}$ from its left neighbor,
 DPU$_{i-1}$. When DPU$_i$ receives this information, it determines
 ${}_{j}G_{i}$ (i.e., performs Equation 2.1.3) and places this value on
 signal lines which are connected to several of its left neighbors.
 It also determines ${}_{j}F_{i}$ by performing Equation 2.1.2, and
 transmits this value and the identity of $\mu_j$ to DPU$_{i+1}$. Some time
 later DPU$_i$ receives a signal from DPU$_{i-1}$ indicating that DPU$_{i-1}$
 has executed $\mu_j$. DPU$_i$ then executes $\mu_j$ (i.e., performs
 $\Psi_j$), altering the value of one or more of its internal registers.
 DPU$_i$ transmits a signal at this time to DPU$_{i+1}$ which indicates
 that DPU$_{i+1}$ may execute $\mu_j$. When DPU$_i$ receives an
 acknowledgment from DPU$_{i+1}$, it goes into the state where it is
 receptive to information concerning $\mu_{j+1}$. The sequence above then
 repeats.
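
 Equations 2.1.1 through 2.1.3 can be exercised with a short sketch
 that applies one micro-instruction to every DPU. It is an illustration
 under stated assumptions rather than the design specified later in
 this paper: the handshaking between neighbors is collapsed into two
 passes over a list, values of G beyond the last DPU are taken as a
 fixed constant, and the two shift micro-instructions used as examples
 are hypothetical choices of the three functions.

```python
def run_micro_instruction(psi, phi, gamma, alpha, x_prev, f_in, g_beyond=0):
    """Apply one micro-instruction to all DPUs.

    x_prev[i] is the operand information of DPU i+1 before the step;
    f_in is the modifier presented to the first DPU.
    """
    n = len(x_prev)
    f, g = [], []
    left = f_in
    for i in range(n):
        g.append(gamma(x_prev[i], left))    # Equation 2.1.3
        f.append(phi(x_prev[i], left))      # Equation 2.1.2
        left = f[i]
    x_new = []
    for i in range(n):
        f_left = f_in if i == 0 else f[i - 1]
        # G values from the alpha DPUs to the right (constant past the end).
        g_right = [g[k] if k < n else g_beyond
                   for k in range(i + 1, i + alpha + 1)]
        x_new.append(psi(x_prev[i], f_left, g_right))   # Equation 2.1.1
    return x_new


def shift_left(x):
    """Hypothetical step with alpha = 1: each DPU takes its right neighbor's digit."""
    return run_micro_instruction(lambda x_i, f, g: g[0], lambda x_i, f: 0,
                                 lambda x_i, f: x_i, 1, x, f_in=0)


def shift_right(x):
    """Hypothetical step with alpha = 0: each digit travels rightward as F."""
    return run_micro_instruction(lambda x_i, f, g: f, lambda x_i, f: x_i,
                                 lambda x_i, f: 0, 0, x, f_in=0)
```

 In this sketch the left shift needs a value of G from one DPU to the
 right, while the right shift needs only the modifier F from the left
 neighbor.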
 
 Notice that in this formulation the operations are completely
 independent of the significance of the digits retained by
 the DPU. All DPUs may then be identical. A second thing to
 note is that $\alpha_j$, the number of DPUs to the right of the DPU
 performing the micro-instruction, is assumed to be a function
 of the micro-instruction being performed. Its value determines
 the desirability of including the micro-instruction in the
 repertoire of the DPUs. The larger the value of the parameter,
 the less desirable the micro-instruction. There are two
 reasons for this. The first reason is that the subject
 micro-instruction cannot be performed by a DPU until the $\alpha_j$ DPUs
 to its right* have completed all previous micro-instructions.
 The execution rate of the subject micro-instruction is inversely
 proportional to $\alpha_j + 1$. The second reason is that the number of
 data paths required between the DPUs is determined by the
 maximum value of this parameter. This is shown in Figure 3, where
 $\alpha$ is the maximum of the $\alpha_j$ for the complete set of
 micro-instructions. Hence, the number of connections which must be
 made to a DPU is directly related to $\alpha$. The communication of
 control information is not included specifically in the discussion
 above. Status information must be communicated from a given
 DPU to all DPUs to which it may supply data. Hence this
 communication network will also appear as shown in Figure 3.

 *Or all the DPUs to its right, if there are fewer than $\alpha_j$ remaining in
 the sequence.

 [Figure 3 consists of four panels, for $\alpha$ = 1, 2, 3, and 4, each
 showing DPU$_1$ through DPU$_5$ and the data paths between them;
 diagrams not reproduced.]

 Figure 3. Inter-DPU data paths required for various values of $\alpha$.

 A second item which must be minimized is the number
 of values taken on by ${}_{j}F_{i}$, the variable presented to DPU$_{i+1}$ by
 DPU$_i$ when it identifies the next (j-th) micro-instruction. This
 factor has a less profound effect on the data communications
 requirements. It affects only the number of transmission paths
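 necessary between adjacent DPUs over and above that required
 due to $\alpha$.

 Table 1. $\alpha_j$ of the Micro-instructions of the Example.

   j            1   2   3   4   5   6
   $\alpha_j$   2   1   0   1   2   0

        Micro-instruction          Operand Register
 Time   M_1  M_2  M_3  M_4  M_5    O_1  O_2  O_3  O_4  O_5
   1     .    .   S_1   .    .      .    .    .    .    .
   2     1    .    .   S_1   .      1    .    .    .    .
   3    S_2   1    .    .   S_1     .    1    .    .    .
   4     .   S_2   1    .    .      .    .    1    .    .
   5     2    .   S_2   1    .      2    .    .    1    .
   6     3    2    .   S_2   1      3    2    .    .    1
   7    S_4   3    2    .   S_2     .    3    2    .    .
   8     .   S_4   3    2    .      .    .    3    2    .
   9     4    .   S_4   3    2      4    .    .    3    2
  10    S_5   4    .   S_4   3      .    4    .    .    3
  11     .   S_5   4    .   S_4     .    .    4    .    .
  12     .    .   S_5   4    .      .    .    .    4    .
  13     5    .    .   S_5   4      5    .    .    .    4
  14     6    5    .    .   S_5     6    5    .    .    .
  15     .    6    5    .    .      .    6    5    .    .
  16     .    .    6    5    .      .    .    6    5    .
  17     .    .    .    6    5      .    .    .    6    5
  18     .    .    .    .    6      .    .    .    .    6

 (A period marks an empty entry.)

 Figure 4. Example of the operation of a limited connection
 arithmetic unit.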
 
 2. 2 Generalized Examples 
 
 An example of generalized operations will now be pre- 
 sented to illustrate that the processing of several micro- 
 instructions may take place simultaneously in the arithmetic unit 
 
 each by a different DPU. The a . are given in Table 1. The 
 
 J 
 
 arithmetic unit will have five DPUs and one operand. The 
 
 operands will be indicated as A., the value of the operand after 
 
 the j micro-instruction. This operand will be composed of 
 
 five digits .a. , . . . , .a_ such that digit .a. is the digit contained 
 J 1 J 5 j i 
 
 in DPU. after the j micro-instruction, 
 l J 
 
The operation of the arithmetic unit is presented in tabular form
in Figure 4. The columns labelled $O_i$ will indicate the operand
contained in DPU$_i$. The occurrence of 'j' in the $i$th operand
column will indicate that ${}_ja_i$ has just been determined and
placed in the operand register of DPU$_i$. The columns labelled
$M_i$ will indicate when a micro-instruction has just been
completed by DPU$_i$. The occurrence of 'j' in an instruction column
will be used to indicate that the associated DPU has just executed
micro-instruction j. The occurrence of '$S_j$' in the $i$th
instruction column indicates that DPU$_i$ has just received the
identity of $\mu_j$ and will begin determining ${}_jG_i$. The
progression of time will be indicated by the rows, each row
equivalent to the time required for a DPU to execute one
micro-instruction.
 
In Figure 4, the arithmetic unit is shown to be in a steady state
at time 1. No micro-instructions are being executed and $A_0$ is in
the operand register. We will assume that the identity of $\mu_1$
has reached DPU$_3$ and that ${}_1G_1$, ${}_1G_2$, and ${}_1G_3$ are
available. At time 2, DPU$_1$ receives ${}_1G_2$ from DPU$_2$ and
${}_1G_3$ from DPU$_3$ and executes $\mu_1$. This causes ${}_0a_1$
to be replaced by ${}_1a_1$. During the next four time intervals,
 
$\mu_1$ is performed consecutively by each of the remaining DPUs,
since an additional ${}_1G_k$ becomes available just as it is
required by a DPU to perform $\mu_1$. The identity of the second
micro-instruction, $S_2$, is received by a DPU one time unit after
that DPU performs $\mu_1$. Since DPU$_1$ requires ${}_2G_2$ to
execute $\mu_2$ ($\alpha_2 = 1$), this micro-instruction is not
performed by DPU$_1$ until time 5, since it is not until that time
that DPU$_2$ is able to determine this value and send it to DPU$_1$.
Just as with $\mu_1$, $\mu_2$ is executed sequentially by each of
the remaining DPUs during each of the next four time intervals.
 
*This presentation of the example is not intended to suggest that
synchronous operation is necessary.
 
Micro-instruction 3 is performed by each of the DPUs one time
unit after that DPU has performed $\mu_2$ because $\alpha_3 = 0$,
and no outside information (${}_3G_k$) is required. The other
micro-instructions are performed in the same pattern.
 
In general, DPU$_i$ performs $\mu_j$ the time unit following its
execution of $\mu_{j-1}$ if $\alpha_j = 0$; DPU$_i$ performs $\mu_j$
after it has received ${}_jG_{i+\alpha_j}$ from DPU$_{i+\alpha_j}$
if $\alpha_j \ne 0$.
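The timing pattern described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the author's model: the function name `schedule` and the list `ASSUMED_ALPHAS` are hypothetical, and Table 1 is not reproduced in this section, so the $\alpha_j$ values are inferred. The text gives $\alpha_2 = 1$ and $\alpha_3 = 0$ and implies $\alpha_1 = 2$; the remaining values are assumptions chosen to reproduce the completion time of 18 quoted for Figure 4. The sketch also assumes that successive DPUs execute a given micro-instruction one time unit apart, and that for $\alpha_j \ne 0$ DPU 1 waits $\alpha_j + 2$ units after its previous micro-instruction (the time for DPU $1+\alpha_j$ to finish the previous micro-instruction, receive $S_j$, and return its G value).

```python
def schedule(alphas, n_dpus, first_exec=2):
    """Return times[j][i]: the time DPU i+1 executes micro-instruction j+1.

    Assumed timing model (a reading of the Figure 4 example, not a proof):
      * successive DPUs execute the same micro-instruction one unit apart;
      * alpha == 0: DPU 1 executes one unit after its previous execution;
      * alpha != 0: DPU 1 executes alpha + 2 units after its previous one.
    """
    times = []
    prev = None  # time at which DPU 1 executed the previous micro-instruction
    for alpha in alphas:
        if prev is None:
            t1 = first_exec
        elif alpha == 0:
            t1 = prev + 1
        else:
            t1 = prev + alpha + 2
        times.append([t1 + i for i in range(n_dpus)])
        prev = t1
    return times


# alpha_1 .. alpha_6: partly taken from the text, partly ASSUMED (see lead-in).
ASSUMED_ALPHAS = [2, 1, 0, 1, 2, 0]

times = schedule(ASSUMED_ALPHAS, n_dpus=5)
print([row[0] for row in times])  # DPU 1 execution times: [2, 5, 6, 9, 13, 14]
print(times[-1][-1])              # last DPU finishes micro-instruction 6 at 18
```

Under this model the execution times of DPU 1 do not change when `n_dpus` is increased, which matches the observation that the number of DPUs does not affect the rate of execution.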
 
 Note that the number of DPUs in the arithmetic unit does 
 not affect the rate of execution of micro-instructions. 
 
To illustrate that the correct result is determined, the
operation of the arithmetic unit is completed after executing six
micro-instructions. The processing will then be completed by
the end of time 18, at which time the contents of all the operand
registers are indicated. The reader will note that their contents
are ${}_6a_1$, ${}_6a_2$, ${}_6a_3$, ${}_6a_4$, and ${}_6a_5$, which
constitute $A_6$, as required.
 
If the six micro-instructions whose processing was depicted in
Figure 4 were the micro-instructions required to accomplish one
regular machine instruction, and are to be followed by other
micro-instructions, the latter can clearly be initiated by DPU$_1$
at time $16 + \alpha_7$ (where $\mu_7$ is the first
micro-instruction for the next instruction). Hence, the
micro-instructions of successive instructions can be overlapped in
the same manner that the micro-instructions of a given operation
can. The time required to 'perform' an instruction is therefore not
significantly affected by the number of DPUs comprising the
arithmetic unit. The time required to perform an instruction is
essentially the time between the performance by DPU$_1$ of the
first micro-instruction of that instruction and its performance of
the first micro-instruction of the next instruction.* This assumes,
of course, that the time required for a typical micro-instruction
to be performed by all of the other DPUs of the arithmetic unit is
negligible in comparison to the time the arithmetic unit is busy
executing a continuous string of micro-instructions.**
 
*For the last instruction, the time when a typical micro-instruction
could be performed by DPU$_1$ should be used instead of the time the
first micro-instruction of the next instruction is performed.

**This continuous string of micro-instructions is a result of the
mapping of a continuous string of instructions into micro-instructions
by the primitive control unit. The continuous string of instructions
may be a result of the effective concatenation by a supervisory
program of the instructions required by a sequence of tasks to be
performed by the arithmetic unit. The time the arithmetic unit is
busy on a continuous string of micro-instructions may then be on
the order of several hours.
 
2.3 The Basic Micro-Instruction Repertoire of the DPUs
 
 In this section we will discuss the micro-instructions 
 which must be included in the repertoire of the DPUs so that 
 the overall arithmetic unit is able to do addition, subtraction, 
 multiplication, division, and normalization. The micro- 
 instructions may be placed in four classes for the purposes 
 of this discussion. These four classes are: 
 
1. the inter-register transfers,

2. the shift micro-instructions,

3. the arithmetic micro-instructions, and

4. the memory accessing micro-instructions.

Micro-instructions in the first class cause operands to be
transferred from one register to another. This allows the
results of one machine instruction to be used as an operand in
a subsequent instruction. For example, let us consider the
case in which a third number is to be added to the quotient of
a division that has just been performed. In additions, the
number in the $\Phi$ register is added to the contents of the A
register. Since the $\Phi$ register is used as the interface with
the storage device, the contents of the MQ register (in this
case the quotient of the division) must be transferred to the
A register before the addition can be performed.
 
A second application of the inter-register transfer
micro-instructions is in the exchange of operands when
normalization or radix point alignment is required. If, as in a
classical von Neumann arithmetic unit, only the A register
and the MQ register have the required shifting ability, an
operand in the $\Phi$ register which must be shifted to align its
radix point to that of the other operand must be moved to the
A or MQ register.
 
In all the inter-register transfers, all of the data
required by a DPU to perform the micro-instruction is contained
within that DPU. This can be seen in Figure 2. Each
DPU contains one digit of each of the operands. Therefore,
$\alpha_j = 0$ for all inter-register transfers and ${}_jF_i$ is not
required to transmit data. The value of ${}_jF_i$ may be used,
therefore, to identify one or both of the registers taking part in
the transfer. The number of micro-instruction codes which must be
assigned to inter-register transfer micro-instructions is therefore
dependent on the number of values which may be taken on by
${}_jF_i$ in addition to the number of pairs of registers between
which inter-register transfers are to be performed.
 
In the notation of Equations 2.1.1 through 2.1.3 the
inter-register transfer micro-instructions may be formulated as:

$${}_jx_i = {}_{j-1}y_i, \qquad i = 1, 2, \ldots, n \qquad (2.3.1)$$

$${}_jF_i = {}_jF_{i-1}, \qquad i = 1, 2, \ldots, n \qquad (2.3.2)$$

$${}_jG_i = \langle\text{null}\rangle, \qquad i = 1, 2, \ldots, n \qquad (2.3.3)$$

where

Y is the register to be copied into the X register,

${}_jx_i$ is the $i$th digit of the X register after the transfer,

${}_{j-1}y_i$ is the $i$th digit of the Y register before the
transfer, and

<null> indicates that the value of ${}_jG_i$ is not required when
performing inter-register transfers.


The second class of micro-instructions is the shift
micro-instructions. They are used during radix point alignment prior
to addition or subtraction, for normalization, and for multiplication
and division by the radix during the repetitive steps for
multiplication and division. We will assume that shifts of more than
one digital position will be performed as a number of successive
shifts of one digital position each.
 
The left shift can be accomplished by causing the DPU
to the immediate right of the DPU performing the micro-instruction
to transmit the value of the digit of the operand contained in its
register to the DPU performing the micro-instruction. This DPU
stores the digit it receives in its operand register. The equations
defining a left shift micro-instruction are:

$${}_jx_i = {}_jG_{i+1}, \qquad i = 1, 2, \ldots, n \qquad (2.3.4)$$

$${}_jF_i = {}_jF_{i-1}, \qquad i = 1, 2, \ldots, n \qquad (2.3.5)$$

$${}_jG_i = {}_{j-1}x_i, \qquad i = 1, 2, \ldots, n \qquad (2.3.6)$$

$${}_jG_{n+1} = {}_jF_n \quad \text{if } {}_jF_n \text{ is a valid digit; otherwise see text} \qquad (2.3.7)$$

where

X is the operand being shifted,

${}_jx_i$ is the $i$th digit of the shifted operand,

${}_{j-1}x_i$ is the $i$th digit of X before the shift,

${}_jF_i$ is the modifier value passed along with the
micro-instruction.

${}_jF_0$ is the value that the PCU sends to DPU$_1$ with the
left shift micro-instruction to indicate the value that is to go
into the last DPU. If ${}_jF_n$ is a valid digit, it becomes the
digit shifted into the last DPU. If it is not a valid digit, it
causes the End Unit to shift in the digit shifted out during the
last right shift.
 
One should also note that the left shift micro-instructions
make it possible to transmit the most significant digit
of an operand to the PCU. The value of this digit will be on
the ${}_jG_1$ lines just prior to the execution of the left shift
micro-instruction by DPU$_1$. The left shift can therefore be used
by the PCU to examine operands.
 
The right shift micro-instruction does not have the
complexity of the left shift micro-instruction. The value
stored into a DPU is the value transmitted by its left neighbor
DPU with the indication that a right shift is to be performed.
The value of the digit stored in the first DPU is determined
by the PCU. In the terminology of Equations 2.1.1 through
2.1.3,

$${}_jx_i = {}_jF_{i-1}, \qquad i = 1, 2, \ldots, n \qquad (2.3.8)$$

$${}_jF_i = {}_{j-1}x_i, \qquad i = 1, 2, \ldots, n \qquad (2.3.9)$$

$${}_jG_i = \langle\text{null}\rangle, \qquad i = 1, 2, \ldots, n \qquad (2.3.10)$$

where

${}_jF_0$ is the digit which the PCU transmits with the indication
that a right shift is to be performed. This value becomes the value
of the most significant digit of the shifted operand.

The value of ${}_jF_n$ is transmitted by DPU$_n$ to the
'End Unit', where it is stored as the new top element in the
push-down stack.
 
A final note concerning shifts is that $\alpha_{LS} = 1$
and $\alpha_{RS} = 0$, and that in the worst case (left shift),
${}_jF_i$ must take on one value more than the number of values a
digit may assume. Two micro-instruction codes are required for each
register which has shifting capabilities.
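The behaviour of the two shift micro-instructions, including the push-down stack in the End Unit, can be sketched as follows. This is a minimal sketch rather than the author's design: the names `right_shift`, `left_shift`, and `INVALID` are hypothetical, the register is modelled as a Python list with `x[0]` held in DPU 1 (the most significant digit), and the digit-by-digit propagation through the DPUs is collapsed into a single list operation.

```python
INVALID = None  # hypothetical marker for an F value that is "not a valid digit"


def right_shift(x, f0, stack):
    """Right shift: DPU i stores the value passed by its left neighbor.

    f0 is the digit supplied by the PCU; the digit leaving DPU n is
    pushed onto the End Unit's stack.
    """
    stack.append(x[-1])
    return [f0] + x[:-1]


def left_shift(x, f, stack):
    """Left shift: DPU i stores the digit held by its right neighbor.

    If f is a valid digit it enters the last DPU; otherwise the End Unit
    supplies the digit shifted out by the most recent right shift.
    Also returns the old most significant digit, which the PCU can observe.
    """
    fill = f if f is not INVALID else stack.pop()
    return x[1:] + [fill], x[0]


stack = []
x = [1, 0, -1, 1]
y = right_shift(x, 0, stack)           # [0, 1, 0, -1], stack == [1]
z, msd = left_shift(y, INVALID, stack)  # restores [1, 0, -1, 1], msd == 0
print(y, z, msd)
```

A right shift followed by a left shift with an invalid F value restores the original operand, which is the reason for keeping the shifted-out digits on a stack.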
 
The third class of micro-instructions are those
micro-instructions required to control the arithmetic processing of
the operands. Multiplication and division will be implemented
as a number of additions and shifts, just as they were in the
classical von Neumann arithmetic unit. The only micro-instructions
necessary in this case are those which cause A
to be replaced by the following expression

$$A_j = A_{j-1} + (k * \Phi_j) \qquad (2.3.11)$$

where

$A_j$ is the value of the operand contained in the A register
after the arithmetic operation,

$A_{j-1}$ is the value of the operand contained in the A register
prior to the arithmetic operation,

$\Phi_j$ is the value of the operand contained in the $\Phi$
register, and

k is a number whose magnitude is less than the radix employed by
the arithmetic unit.

It is shown in Section 3.1 that k does not take on values
which are not digits of the number representation. The ${}_jF_i$
interconnections may therefore be employed to distribute the
value of k to the DPUs with no need to increase the number of
interconnections, since the ${}_jF_i$ data paths must convey all
possible digit values for the shifting micro-instruction.
 
In Section 3.2 we show that the addition function can
be implemented such that $\alpha$ is 1 or $\alpha$ is 2. In the
implementations for which $\alpha$ is 1, either the radix and the
values which may be taken on by k are restricted, or two
micro-instructions must be performed to complete each addition.
In the implementation for which $\alpha$ is 2, no such restrictions
are encountered. The choice of which method of addition is
implemented must be based on trade-off considerations. The
major points to be considered are:

• The average time taken to perform additions by
the three possible methods.

• The cost of implementing an arithmetic unit
whose DPUs have $\alpha = 2$ compared to the cost if
$\alpha = 1$. This cost has three components. The
first component is the larger number of interconnections
which must be made. In addition to transmitting ${}_jG_i$ and
receiving ${}_jG_{i+1}$, DPU$_i$ must be able to receive
${}_jG_{i+2}$ when $\alpha = 2$. The second component of this cost
is the additional logic elements required because of the larger
number of signals it must receive, the more complex control
logic, and the more complex addition logic. The
third component is the increased time to perform
all micro-instructions; the control logic in the
DPU is more complex when $\alpha = 2$ and, quite
likely, slower.

The last class of micro-instructions are those which
cause exchanges of data between the arithmetic unit and the
storage unit. There will be no micro-instructions in this class
if the PCU acts as an intermediary in the exchange as suggested
by Comfort (8). The micro-instructions in this class are similar
to the inter-register transfer micro-instructions in that ${}_jF_i$
may be used to identify (or aid in identifying) the register taking
part in the exchange and the type of exchange (i.e., load or
store). The number of micro-instruction codes required is
related to the number of registers which communicate with the
storage unit, and to the number of values which ${}_jF_i$ may take on.
Interfacing the storage unit to the arithmetic unit is discussed
in depth in Chapter 4.
 
3. THE ARITHMETIC CONSIDERATIONS OF IMPLEMENTING
A LIMITED CONNECTION ARITHMETIC UNIT
 
 3.1 Introduction 
 
 The limited connection arithmetic unit is organized to 
 process floating point operands. The fractional part of the 
 operands will be contained in, and processed by, the DPUs. 
 The processing will begin with the most significant digits of 
 the operands and proceed to those with decreasing significance. 
 This is necessitated by the requirements of the normalization 
 and division processes. In both of these processes, the values 
 of the most significant digits of the operands determine what 
additional processing is required. In normalization, the values
of the most significant digits are examined to determine if the
number has been normalized, and if not, the next step to be
taken. During division, the approximate value of the partial
remainder, determined by examining several of the most
significant digits, is used in the selection of the next quotient
digits. In both of these cases, the results of examining several
of the most significant digits of the operands determine the
next several micro-instructions to be performed, and hence
the examination must be performed in as short a time as possible.
This makes it necessary for the most significant digits of the
operands to be placed in the DPU which is adjacent to the PCU,
since it is the first DPU to execute each micro-instruction.
 
The various methods of implementing the addition
micro-instruction (see Equation 2.3.11) are discussed in Section 3.2,
together with the implications of each method for the complexity
and performance of the arithmetic unit. Section 3.3 discusses the
overflow recoder which must be implemented in the PCU if the
arithmetic unit is to be able to form double length product
fractions. The desirability of recoding multiplier digits is
discussed in Section 3.4. A class of optimum recodings for radix 2
signed-digit numbers is also developed in this section.
Normalization is discussed in Section 3.5. The optimum algorithm for
normalizing radix 2 signed-digit numbers is determined; it is shown
that the optimum algorithm for a specific arithmetic unit is
dependent on the value of $\delta$ of that arithmetic unit.*
Division is analyzed in Section 3.6. The relationship between the
parameters of the number system and the number of quotient digits
determined during each examination of the partial remainder is
determined in this section. This analysis shows that it is possible
to implement division without requiring special data paths if one
quotient digit is determined during each examination of the
partial remainder and if an $\alpha = 2$ adder is implemented.

*The ratio of the time for the PCU to sense the value of the first two
digits of the number normalized to the time to shift that number one
digital position to the left.
 
3.2 Applicable Number Representations and Addition Methods

In order to perform multiplication and division by
alternately doing one addition and shifting, the arithmetic
unit must include micro-instructions to add or subtract
various multiples of one of the operands to the other.
That is, the arithmetic micro-instructions must be
characterized as follows:

$$A' = A + (k * \Phi) \qquad (3.2.1)$$

where

A', A, and $\Phi$ are consistently represented
numbers, and
 
 k is a multiplier or quotient digit such that 
 
 | k|<K, *j±- <K^r-l. (3.2.2) 
 
Because of the necessity of beginning addition and subtraction
with the most significant digit and with limited information
concerning values of the operands, the conventional,
non-redundant number representations cannot be used. Three
methods of performing addition on signed-digit numbers were
investigated. Because it must be possible for the sum of one
addition to be used as an operand of a subsequent addition, the
Avizienis Adder and the Second Order Simple Transformation
Adder were found to be equivalent. The designer has three
methods of implementing addition in the Limited Connection
Arithmetic Unit, two of which have $\alpha = 1$ and one of which
has $\alpha = 2$.
 
The Avizienis Adder requires that the $\alpha$ of the arithmetic
unit be at least one. The number of addition micro-instructions
which must be issued to perform one addition
operation, as defined by Equations (3.2.1) and (3.2.2), is
determined by the value of k. The number of micro-instructions
which must be issued varies from one to three. Furthermore,
the Avizienis Adder cannot be used on radix two numbers.
 
The two remaining methods employ the Three Level
Adder. An adder of this type may be designed for any signed-digit
number system. The first of these methods is a straightforward
implementation and requires that the $\alpha$ of the arithmetic
unit be at least two. Only one micro-instruction must be issued
for each addition operation. The second of these methods
implements the Three Level Adder as a two step process, and hence
requires two micro-instructions to be executed for each
arithmetic operation. This method requires that the $\alpha$ of
the arithmetic unit be at least one. It has one major drawback
compared with the other methods: the accumulator register must
be able to assume approximately $r/2$ times as many states as
it must for either of the other methods.
 
 3.2.1 The Avizienis Adder 
 
The Avizienis Signed Digit Adder (4, 5, 6) can be
characterized in the notation of Equations (2.1.1) through
(2.1.3) by $\alpha = 1$. That is, a digit of sums and differences
can be formed based on the operand digits of that digital position
and those of the digital position to its "right". The arithmetic
operations as proposed by Avizienis and these number
representations cannot satisfy the condition defined by Equations
(3.2.1) and (3.2.2) but can be characterized by
$K \le \frac{1}{2}(r-1)$ (5).
Avizienis suggests that either an odd radix be employed and the
multiplier digits be recoded to meet the restriction on the value
of k, or that two arithmetic operations be performed between
shifts when k exceeds the range. The first of these is not
applicable to this structure as it requires that the recoder yield a
non-redundant representation of the operand. This is not
possible because the recoder must begin at the most significant
digital position, precluding a non-redundant output. However,
the second approach is applicable; the arithmetic operations can
be completed in two additions for any value of k if an odd radix
is employed. However, if an even radix is used, $|k| = r-1$
would require three additions. It seems clear that the relative
frequency with which the adder circuitry must be used twice
between shift operations can be reduced. Also, the possibility
that the adder must be used three times can be eliminated by
an appropriate recoding of multipliers and choice of division
parameters. An additional restriction on the use of this scheme
is that the radix must be greater than two.
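The count of one to three additions can be illustrated with a short calculation. The sketch below assumes that a single use of the Avizienis Adder can handle a multiple of magnitude at most $(r-1)/2$ (rounded down to an integer), and that a larger $k$ is split into the fewest such pieces; the function name `avizienis_additions` is hypothetical and the splitting rule is an assumption made for illustration, not a description of the author's control logic.

```python
def avizienis_additions(r, k):
    """Number of adder uses needed to add k times an operand, radix r.

    Assumes each use handles a multiple of magnitude at most (r - 1) // 2.
    """
    per_add = (r - 1) // 2
    if per_add == 0:
        raise ValueError("the scheme requires a radix greater than two")
    return -(-abs(k) // per_add)  # ceiling division


# Worst case |k| = r - 1 for a few radices.
for r in (3, 4, 5, 10):
    print(r, avizienis_additions(r, r - 1))
```

Under these assumptions the worst case $|k| = r-1$ takes two additions for an odd radix and three for an even radix, in agreement with the text.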
 
 3.2.2 The Second Order Simple Transformation Adder 
 
Let us now consider an alternative addition scheme proposed
by Rohatsch (21). This method requires the values of one
additional digit of the operands (i.e., $\alpha = 2$). It is
described as the second order simple transformation by Rohatsch,
and is like the Avizienis structure in having two levels. Rather
than two intermediate digits, this structure employs three, with
relative weights of $r^2$, $r$, and 1 for the f's, t's, and w's
respectively (see Figure 5).

Figure 5. The two level adder based on the second order simple
transformation.

The operand digits ($a'_i$, $a_i$, $\phi_i$) are all assumed to be
chosen from the same digit set for reasons of compatibility.
That is, if

$$B_i = \{\bar{i}, \overline{(i-1)}, \ldots, \bar{1}, 0, 1, \ldots, i\}, \qquad i \ge 1 \qquad (3.2.3)$$

where an overbar indicates negation,

then

$$a_i \in B_{r-\ell}, \quad a'_i \in B_{r-\ell}, \quad \text{and} \quad \phi_i \in B_{r-\ell} \qquad (3.2.4)$$
 
The multiplier digit k may be selected either from the
set $B_{r-\ell}$ above or the set $B_{r-1}$. The latter option is
included as a possibility because it may be desirable to recode
multiplier
 digits in an attempt to minimize the number of uses of the adder, 
 and it may be desirable to conduct the division with a more 
 redundant number representation so as to decrease the number 
 of digits of each partial remainder which must be examined (3). 
 In the latter case, the quotient digits thus obtained must be 
 applied to a recoder which converts them to a compatible repre- 
 sentation. We will investigate adders compatible with both con- 
 straints on the multiplier digits. 
 
The equations describing the adder of Figure 5 are

$$\sigma_i = a_i + k\phi_i = f_{i-2}r^2 + t_{i-1}r + w_i \qquad (3.2.5)$$

and

$$a'_i = f_i + t_i + w_i \qquad (3.2.6)$$

where

$a_i$, $\phi_i$ are the $i$th digits of the operands, as discussed
above,

$a'_i$ is the $i$th digit of a representation of $A + k * \Phi$,

$\sigma_i$ is the $i$th digit of an intermediate representation of
$A + k * \Phi$,

$f_{i-2}$ is one of the components into which $\sigma_i$ is
recoded. It has weight $r^2$.

$t_{i-1}$ is another of the components into which $\sigma_i$ is
recoded. It has weight $r$.

$w_i$ is the last of the components into which $\sigma_i$ is
recoded. It has weight 1.
Using the analysis of Rohatsch,

$$S(T) \ge r \qquad (3.2.7)$$

$$S(W) \ge r \qquad (3.2.8)$$

are necessary conditions for $r^2F + rT + W$ to be contiguous,
and hence "cover" $\Sigma$, where

F, T, W, and $\Sigma$ are digit sets such that
$f_i \in F$, $t_i \in T$, $w_i \in W$, and $\sigma_i \in \Sigma$, and

S(X) is the number of elements in the set X.

Rohatsch has shown that the choice of equality in (3.2.7)
and (3.2.8) above allows the size of $\Sigma$ to be maximum since
the size of A' is fixed. Making this choice and applying it to the
condition that F + T + W is covered by A',

$$S(A') \ge 2r + S(F) - 2.$$

But from (3.2.3) and (3.2.4) we see that

$$S(A') = 2(r-\ell) + 1,$$

so

$$S(F) \le 3 - 2\ell.$$

But since

$$\ell \ge 1,$$

then

$$S(F) \le 1. \qquad (3.2.9)$$

Note that if S(F) = 1, the f input of the upper stage is
constant, and the structure therefore degenerates to the
Avizienis structure. The conclusion is that it is not possible to
devise a two level adder such that the conditions of Equations
(3.2.1) and (3.2.2) can be met.
 
Figure 6. The three level adder based on the application of two
simple transformations.

3.2.3 The Three Level Adder
 
We will now analyze a three level structure which can
also be characterized by $\alpha = 2$. The notation for this adder,
shown in Figure 6, is as follows:

$$\sigma_i = a_i + k\phi_i \qquad (3.2.10)$$

$$w_i = \sigma_i - t_{i-1}r \qquad (3.2.11)$$

$$\sigma'_i = t_i + w_i \qquad (3.2.12)$$

$$w'_i = \sigma'_i - t'_{i-1}r \qquad (3.2.13)$$

$$a'_i = t'_i + w'_i \qquad (3.2.14)$$

where

$a_i$, $\phi_i$ are the $i$th digits of the operands,

$a'_i$ is the $i$th digit of a representation of $A + k * \Phi$,

$\sigma_i$, $\sigma'_i$ are the $i$th digits of intermediate
representations of $A + k * \Phi$,

$t_{i-1}$, $w_i$ are the components into which $\sigma_i$ is recoded,

$t'_{i-1}$, $w'_i$ are the components into which $\sigma'_i$ is
recoded.

In the analysis below, the sets from which the digits are
chosen are as follows:

$$a_i \in A, \quad a'_i \in A', \quad \phi_i \in \Phi, \quad k \in K, \quad \sigma_i \in \Sigma,$$

$$\sigma'_i \in \Sigma', \quad t_i \in T, \quad t'_i \in T', \quad w_i \in W, \quad w'_i \in W'.$$
 
Equation (3.2.14) yields

$$S(A') \ge S(T') + S(W') - 1.$$

Using (3.2.3), (3.2.4), and assuming that

$$S(W') = r,$$

since this maximizes the set $\Sigma'$ covered by $T' + W'$,* the
equation becomes an equality

$$2(r-\ell) + 1 = S(T') + r - 1,$$

or

$$S(T') = r - 2\ell + 2. \qquad (3.2.15)$$

*Since A' has a fixed size.

Since the size of T' must be 2 or greater to assure that the three
level adder does not degenerate into the Avizienis adder,

$$r - 2\ell + 2 \ge 2,$$

or

$$r \ge 2\ell.$$

This condition is satisfied for all number systems considered
in this paper.

Now, from (3.2.13) and (3.2.15) the size of $\Sigma'$ is

$$S(\Sigma') = S(T') \cdot S(W') = r(r - 2\ell + 2). \qquad (3.2.16)$$

From Equation (3.2.12) we see that

$$S(\Sigma') \ge S(T) + S(W) - 1.$$

Using (3.2.16) and assuming that

$$S(W) = r,$$

since this maximizes the size of $\Sigma$ covered by T and W,** the
equation above becomes the following equality

$$S(\Sigma') = S(T) + r - 1.$$

**Since the size of $\Sigma'$ is fixed by our earlier assumption.

Applying (3.2.16) to the above yields

$$S(T) = r^2 - 2r\ell + r + 1. \qquad (3.2.17)$$

From Equations (3.2.10) and (3.2.11) we see that

$$S(T) \cdot S(W) \ge S(\Sigma) \ge S(A) + S(k \cdot \Phi) - 1.$$

Applying (3.2.17) and the assumption that S(W) = r, this
becomes

$$r^3 - 2r^2\ell + r^2 + r \ge 2(K+1)(r-\ell) + 1. \qquad (3.2.18)$$

Setting K = r-1, which is equivalent to assuming that
$k \in B_{r-1}$, (3.2.18) reduces to

$$\ell \le \frac{r}{2} + \frac{1}{2r} \quad \text{when } r > 1. \qquad (3.2.19)$$

Since $\ell \le \frac{r}{2}$ for all signed-digit number systems, it
is possible to implement a three level adder for any signed-digit
number system. This conclusion is valid whenever all multiplier
digits k are chosen from $B_{r-1}$ or one of its subsets. Hence, the
redundancy of multiplier and quotient digits has only a
second-order effect on the adder complexity. The number of DPUs to
which a DPU must send information for performing additions
is independent of the range of multiplier digits employed,
while the amount of information is dependent on this factor.

[Figure 7: schematic of the data processing portion of the structure
for the three level adder. The rotated caption is not legible in the
scan.]
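The set sizes derived for the three level adder can be checked numerically. The following is a small sketch, assuming the relations read as $S(A) = 2(r-\ell)+1$, $S(T') = r-2\ell+2$, $S(\Sigma') = r\,S(T')$, and $S(T) = S(\Sigma') - r + 1$, with the existence condition $S(T') \ge 2$ and $r\,S(T) \ge 2(K+1)(r-\ell)+1$; the function names are hypothetical.

```python
def three_level_sizes(r, l):
    """Set sizes for radix r and digit set B_(r-l), following (3.2.15)-(3.2.17)."""
    s_a = 2 * (r - l) + 1          # S(A) = S(A'), from (3.2.3) and (3.2.4)
    s_t_prime = r - 2 * l + 2      # (3.2.15)
    s_sigma_prime = r * s_t_prime  # (3.2.16), with S(W') = r
    s_t = s_sigma_prime - r + 1    # (3.2.17), with S(W) = r
    return s_a, s_t_prime, s_sigma_prime, s_t


def adder_exists(r, l, k_max):
    """True if T' has at least two elements and (3.2.18) holds for |k| <= k_max."""
    _, s_t_prime, _, s_t = three_level_sizes(r, l)
    needed = 2 * (k_max + 1) * (r - l) + 1  # right-hand side of (3.2.18)
    return s_t_prime >= 2 and r * s_t >= needed


print(three_level_sizes(2, 1))  # radix 2: (3, 2, 4, 3)
print(all(adder_exists(r, l, r - 1)
          for r in range(2, 17) for l in range(1, r // 2 + 1)))
```

For radix 2 with $\ell = 1$ this gives $S(T) = 3$, $S(T') = 2$, and $S(\Sigma') = 4$, and the check succeeds for every $\ell \le r/2$ in the range tried, which is what (3.2.19) predicts.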
 
 3.2.4 Implementing the Three Level Adder 
 
The designer has two methods of implementing the
three level adder in the limited connection arithmetic unit.
The first method is to implement it so that one micro-instruction
is executed for each addition operation. If this method is chosen,
the data processing portion of the structure can be represented
schematically as in Figure 7. The registers, of course, retain
one digit of each of the operands. Adder Part I determines the
appropriate $\sigma_i$ from the data retained in the registers of
the Digit Processing Unit and the value of k, which is carried by
the control (micro-instruction) stream. That is, each Adder Part
I implements Equation 3.2.10. It transmits this information to
the Adder Part II of its own DPU, and to those of the two DPUs
to its left. Each Adder Part II then determines its appropriate
$a'_i$ digit based on the values of the $\sigma$'s presented to it.
That is, the transfer function of Adder Part II is the combined
transformations of Equations 3.2.11 through 3.2.14.

Note that in this case, the $\alpha$ of the arithmetic unit must
be two or more. A second method allows $\alpha$ to be one by
requiring two micro-instructions to be executed for each addition
operation. Equations (3.2.10) through (3.2.12) must be performed
by the first micro-instruction, which causes an interim
representation, $\Sigma'$, of the sum to be placed in the accumulator
register. The second micro-instruction then recodes this
interim representation to an acceptable signed-digit number by
performing Equations (3.2.13) and (3.2.14).
 
This method of implementing addition is subtly different
from the Avizienis Adder, described in Section 3.2.1. From one
to three uses of the Avizienis Adder are required to perform an
addition operation, while two addition micro-instructions are
always required in the method just described. Moreover, this method
of performing addition requires that the number of states taken
on by a digit of the accumulator be larger than that required in
the Avizienis adder by approximately a factor of $r/2$, where $r$ is
the radix of the number system employed (see Equation 3.2.16).
Hence, if the arithmetic unit must be implemented such that
$\alpha = 1$, the Avizienis Adder appears to be a better design than
the two step implementation of the three level adder.
 
 3.2.5 A Detailed Look at the Radix Two Three Level Adder 
 
We will now discuss the simplest of the adders of the
above type, the radix two three level adder. It is presented
as both an example and a means of examining in greater detail
the requirements of the adder.

In the case of radix two, $K = 1$ and $\ell$ must be chosen to
be one. There are three possible adders. The first, which
will be referred to as Adder 1, can be characterized by

$$T = (1, 0, \bar{1}), \quad W = T' = (0, \bar{1}), \quad \text{and} \quad W' = (1, 0).$$

The second adder, which will be referred to as Adder 2, can
be characterized by

$$T = (1, 0, \bar{1}), \quad W = T' = (1, 0), \quad \text{and} \quad W' = (0, \bar{1}).$$

The third adder, Adder 3, can be characterized by

$$T = W = T' = W' = (1, 0, \bar{1}).$$

The operands for all adders are

$$A = \sum_{i=1}^{n} a_i 2^{-i} \quad \text{and} \quad \Phi = \sum_{i=1}^{n} \phi_i 2^{-i},$$

the sum is

$$A' = \sum_{i=0}^{n} a'_i 2^{-i}.$$

The relationships between them are given by Equations (3.2.10)
through (3.2.14), in which $r = 2$ and $|k| \le 1$. The choices in
Equation (3.2.11) are given for all values of $\sigma_i$, and for all
three adders, in Table 2.
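Adder 1 can be exercised directly from Equations (3.2.10) through (3.2.14). The sketch below is an illustration under stated assumptions, not a reproduction of the hardware: digits are Python integers with $-1$ standing for $\bar{1}$, the function names are hypothetical, and the digit choices in `LEVEL1` and `LEVEL2` are the ones forced by Adder 1's sets $T = (1, 0, \bar{1})$, $W = T' = (0, \bar{1})$, $W' = (1, 0)$, since each $\sigma$ then has only one decomposition $2t + w$. For simplicity the result is allowed to grow two positions to the left instead of modelling the overflow recoder in the PCU.

```python
from fractions import Fraction
from itertools import product

# sigma -> (t, w) with sigma = 2*t + w, t in {-1,0,1}, w in {0,-1}
LEVEL1 = {2: (1, 0), 1: (1, -1), 0: (0, 0), -1: (0, -1), -2: (-1, 0)}
# sigma' -> (t', w') with sigma' = 2*t' + w', t' in {0,-1}, w' in {1,0}
LEVEL2 = {1: (0, 1), 0: (0, 0), -1: (-1, 1), -2: (-1, 0)}


def recode(digits, table):
    """Split each digit into (carry, rest) and add each carry one place left.

    The output has one more (leading) position than the input.
    """
    carries = [table[d][0] for d in digits] + [0]
    rests = [0] + [table[d][1] for d in digits]
    return [c + r for c, r in zip(carries, rests)]


def add(a, phi, k):
    """Return the digits of A + k*Phi, most significant first."""
    sigma = [x + k * y for x, y in zip(a, phi)]  # (3.2.10)
    sigma_prime = recode(sigma, LEVEL1)          # (3.2.11), (3.2.12)
    return recode(sigma_prime, LEVEL2)           # (3.2.13), (3.2.14)


def value(digits, top_exp):
    """Value of a digit list whose first digit has weight 2**top_exp."""
    return sum(Fraction(d) * Fraction(2) ** (top_exp - i)
               for i, d in enumerate(digits))


def exhaustive_check(n):
    """Check every pair of n-digit operands and every k in {-1, 0, 1}."""
    for a in product((-1, 0, 1), repeat=n):
        for phi in product((-1, 0, 1), repeat=n):
            for k in (-1, 0, 1):
                s = add(a, phi, k)
                if any(d not in (-1, 0, 1) for d in s):
                    return False
                if value(s, 1) != value(a, -1) + k * value(phi, -1):
                    return False
    return True


print(add([0, 1], [1, 0], 1))  # [0, 1, -1, 1], i.e. 1 - 1/2 + 1/4 = 3/4
```

Each result digit depends only on $\sigma_i$, $\sigma_{i+1}$, and $\sigma_{i+2}$, because a carry moves one position left at each of the two recoding levels; this is the sense in which the adder has $\alpha = 2$.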
 
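Table 2. Digit Choices in (3.2.11) For The
Radix 2 Adder Types

 Input         Adder 1                 Adder 2                 Adder 3
 $\sigma_i$    $t_{i-1}$   $w_i$       $t_{i-1}$   $w_i$       $t_{i-1}$   $w_i$

 2             1           0           1           0           1           0
 1             1           $\bar{1}$   0           1           1           $\bar{1}$
 0             0           0           0           0           0           0
 $\bar{1}$     0           $\bar{1}$   $\bar{1}$   1           $\bar{1}$   1
 $\bar{2}$     $\bar{1}$   0           $\bar{1}$   0           $\bar{1}$   0
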
 
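Table 3. Digit Choices in (3.2.13) For The
Radix 2 Adder Types

 Input         Adder 1                 Adder 2                 Adder 3
 $\sigma'_i$   $t'_{i-1}$  $w'_i$      $t'_{i-1}$  $w'_i$      $t'_{i-1}$  $w'_i$

 2             -           -           1           0           1           0
 1             0           1           1           $\bar{1}$   0           1
 0             0           0           0           0           0           0
 $\bar{1}$     $\bar{1}$   1           0           $\bar{1}$   0           $\bar{1}$
 $\bar{2}$     $\bar{1}$   0           -           -           $\bar{1}$   0
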
 
 Table 4. Combined Adder Tables

 [Table body not legible in the scanned source. For each of Adder 1,
 Adder 2, and Adder 3 the table lists the sum digit for every
 combination of $\sigma_i$, $\sigma_{i+1}$, and $\sigma_{i+2}$.]
 
 The choices in Equation (3.2.13) are given in Table 3
 for all values of $\sigma'_i$ and for all three adders. Table 4 gives
 the sum digit $a'_i$ for all possible combinations of values of
 $\sigma_i$, $\sigma_{i+1}$, $\sigma_{i+2}$ and for all three adders. It shows the
 overall result of Equations (3.2.11) through (3.2.14). Specific
 design details of radix two signed digit adders are discussed
 in papers by Robertson (16) and Borovec (7).
 
 3.2.6 Addition Overflow Correction 
 
 The sum produced by the adder may have a non-zero
 $a'_0$ digit. This is the indication of potential overflow; it is only
 potential overflow because it may be possible to recode this
 sum into an arithmetically equivalent representation A'' which
 has $a''_0 = 0$ (e.g., $A' = 1.00\bar{1}\ldots$ may be recoded into
 $A'' = 0.111\ldots$).
 
 The arithmetic unit may be designed to shift all such 
 sums until a" = 0, or to attempt to recode the sums prior to 
 shifting. It will be shown below that normalization is required 
 to perform division, so the ability to perform the recodings 
 mentioned above must exist in the arithmetic unit. Therefore 
 the decision above will have a minor effect on the complexity 
 of the arithmetic unit. If the arithmetic unit shifts all non-zero
 sum digits into the DPUs without attempting to recode them,
 additions will be faster but will tend to discard digits of the
 result unnecessarily.
 
 3. 3 Multiplication Considerations 
 
 As indicated in Section 3. 1, the requirements of 
 division and normalization dictate that the operands are placed 
 in the DPUs so that the digits of each of the operands are 
 available in order of decreasing significance. Hence, the 
 multiplication algorithm must be right-directed; i.e., the 
 greater the significance of a given multiplier digit, the earlier 
 it determines what multiple of the multiplicand will be added to 
 the partial product. This, in turn, implies that the partial 
 product must be shifted left one digital position with respect 
 to the multiplicand for each step in the algorithm. 
 
 One possibility is that of shifting the multiplicand one 
 digital position to the right each step. While this method has 
 merit and may be sufficient in some circumstances, it does not 
 allow the product to be determined to the greatest precision 
 possible, as the digits shifted beyond the structure containing 
 addition capability are functionally truncated. * 
 
 * Multiple precision arithmetic operations, while possible in 
 the limited connection arithmetic unit, appear to require an 
 excessive amount of time to perform. Hence, a single pre- 
 cision product may be all that is required. This method 
 would then be adequate. 
 
 We will go on to discuss an implementation of multiplication
 in the limited connection arithmetic unit which allows the
 product to be determined to maximum precision. It is clear that
 the partial product, not the multiplicand, must be shifted during
 each step, since all digits of the multiplicand must take part in
 each step of the multiplication.
 
 3.3.1 Manipulation of the Partial Product 
 
 Since the arithmetic unit possesses a single length adder, 
 a scheme similar to that employed on the Illiac (14) and the IBM 
 7094 (11) must be incorporated in order to retain all the product 
 digits determined. The digits of the multiplier which are 
 retired can be used to make room for the digits of the product 
 which need no longer be processed by the adder (although this 
 destroys the multiplier in the process). 
 
 There are two problems associated with the adaptation 
 of this technique to the proposed arithmetic structure. The 
 first problem is the necessity of communicating information 
 from the most significant digital position of the adder unit to 
 the least significant digital position of the multiplier register. 
 Not only are the DPUs containing these digital positions of the
 operands not in direct communication with one another, they
 are not performing the same micro-instruction, as can be seen
 from the example of Section 2.2. Special connections causing
 them to exchange this information would, therefore, be of no
 use. The solution to this problem is to send the value of the
 digit to be placed in the least significant digital position with
 the left shift multiplier micro-instruction. In this way, the
 product digit which is to be placed in the least significant
 position of the multiplier register is available to the least
 significant DPU when that DPU is executing the shift. The
 interface between the DPU containing the most significant digit of
 the operands and the Primitive Control Unit should also be
 identical to that between any two adjacent DPUs. The Primitive
 Control Unit should contain a "multiplier register" into
 which the information to be shifted into the least significant
 digital position is placed and which is caused to exchange
 multiplier digits with the most significant DPU. This not only allows
 the DPU containing the most significant digits of the operands
 to be treated exactly like any other DPU, but affords a method
 of making multiplier digits available to the Primitive Control Unit.
 The digit which is in this register at the completion of the
 exchange between the register and the multiplier register of the
 first DPU is the next multiplier digit to be employed.*
 
 *This is true if one or more exchanges, dependent on whether a recoder 
 is incorporated into the design, precede the first use of the adder. 
 
 
 3.3.2 A Multiplication Example 
 
 An example of the procedure just described is shown in 
 Figure 8. In this example the following assumptions are made: 
 
 1) the operands are assumed to be three digits in 
 length, 
 
 2) three registers (M, A, and $\Phi$) take part in the
 multiplication. These registers play the following roles
 in the process. The A register contains zero prior
 to the multiplication and contains the least significant
 digits of the product at the conclusion of the process.
 The M register initially contains the multiplier and
 holds the most significant digits of the product at the
 conclusion of the multiplication. The $\Phi$ register
 contains the multiplicand throughout the process;

 3) a = for all micro-instructions,

 4) the adder never causes overflow (i.e., $a'_0 = 0$ following
 an addition),

 5) multiplier recoding is not employed,

 6) the 'Pr Data' register is the only data register in the
 PCU which takes part in the multiplication. Its
 contents define the multiple of the $\Phi$ register to
 be added to the A register when an addition (+)
 micro-instruction is performed. Just prior to the
 execution of a left shift micro-instruction by $DPU_1$
 the 'Pr Data' register contains the digit which is to
 be placed in the least significant DPU ($DPU_3$) by that
 micro-instruction. The 'Pr Data' register receives
 the digit shifted out of $DPU_1$ when $DPU_1$ executes a
 left shift micro-instruction.

 *This assumption is definitely unrealistic. The figure is made more
 compact by this assumption.
 
 The following conventions are used in Figure 8: 
 
 $M_j$ is the heading of the column indicating the
 contents of the M register of $DPU_j$,
 $A_j$ is the heading of the column indicating the
 contents of the A register of $DPU_j$,
 $\nu_j$ is the heading of the column indicating the
 micro-instruction which has just been
 executed by $DPU_j$,
 'Pr Data' is the heading of the column indicating the
 contents of the register in the PCU taking
 part in the multiplication,
 
 [Figure 8: table of register contents by line number; body not
 legible in the scanned source. Columns: Line No.; $M_1$, $M_2$,
 $M_3$; 'Pr Data'; $A_1$, $A_2$, $A_3$; $\nu_1$, $\nu_2$, $\nu_3$.]

 Figure 8. Multiplication example

 +k indicates that the corresponding DPU is performing
 an addition micro-instruction in which
 $A \leftarrow A + k \cdot \Phi$,
 EM or EA indicates that the corresponding DPU is performing
 a left shift of the M or A register respectively,
 $m_i$ is the i-th multiplier digit,
 ${}_jp_i$ is the i-th digit of the portion of the j-th partial
 product which is in the A register, and
 $P_j$ is the j-th digit of the product, with
 $P_j = {}_3p_{j-2}$ for j = 3, 4, 5, and $P_6 = 0$.
 The arithmetic unit is prepared for the multiplication at
 time 1. The M register contains the multiplier ($m_1 m_2 m_3$), the
 $\Phi$ register contains the multiplicand, and the A register contains
 zero. At time 2 a 'left shift M register' micro-instruction (EM)
 is launched. This causes the most significant multiplier digit
 ($m_1$) to be placed in 'Pr Data'. An addition micro-instruction is
 launched at time 3, which causes the first partial product
 (${}_1p_1\ {}_1p_2\ {}_1p_3$) to be formed in the A register and clears
 'Pr Data'. A 'left shift A register' micro-instruction (EA) is
 launched next. This causes the most significant digit of the
 partial product (${}_1p_1$) to be placed into 'Pr Data', which is also
 the most significant digit of the product, $P_1$. A 'left shift M
 register' micro-instruction is issued at time 5. This causes:

 a) $P_1$ to be carried along with the micro-instruction
 and placed into $DPU_3$,

 b) $m_2$ to be shifted into 'Pr Data', and

 c) the remainder of the M register to be
 shifted left one DPU (or digital position).

 The second of the repetitive steps now begins at time 6 with the
 launching of an addition micro-instruction. When all five of the
 repetitive steps are completed by all DPUs (time 21)

 a) the most significant part of the product
 ($P_1 P_2 P_3$) is contained in the M register,

 b) the least significant part of the product
 ($P_4 P_5 0$) is contained in the A register, and

 c) 'Pr Data' contains the digit (x) that it contained
 just prior to the multiplication.

 *The $\Phi$ register is not shown in the figure because it remains
 unchanged throughout the multiplication.
 
 
 3.3.3 The Multiplication Overflow Recoder 
 
 A second problem associated with the shifting of the 
 partial product in a right multiplier is that not one but two 
 digits are shifted out at each step. That is, in addition to the 
 most significant digit being shifted out, a non-zero overflow 
 digit may have been formed and must be taken into account. 
 It is clear that the digits shifted out can be converted so that 
 only one digit need be stored in the multiplier register during 
 each step, since the product of two numbers which are less 
 than one in magnitude is also less than one in magnitude. 
 
 The multiplication algorithm can be characterized by

 $$P_{-1} = 0 \qquad (3.3.1)$$

 $$P_i = r \cdot P_{i-1} + m_i \cdot \Phi, \quad i = 0, 1, \ldots, n \qquad (3.3.2)$$

 where

 $P_i$ is the i-th partial product,
 $m_i$ is the i-th digit of the multiplier,
 r is the radix, and
 $\Phi$ is the multiplicand.

 The product is then

 $$M \cdot \Phi = r^{-n} P_n \qquad (3.3.3)$$

 where M is the multiplier, i.e., $M = \sum_{i=0}^{n} m_i r^{-i}$.
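
 The recurrence can be checked numerically. The Python sketch below is
 illustrative only: the function name and the use of exact fractions
 are assumptions, the recurrence is taken to start from a zero partial
 product, and only the arithmetic of Equations (3.3.2) and (3.3.3) is
 modelled, not the DPU pipeline or the signed-digit adder.

```python
from fractions import Fraction

def right_directed_multiply(m_digits, phi, r):
    """Multiply by scanning multiplier digits most significant first.

    m_digits[i] is the digit m_i with weight r**-i (i = 0..n) and phi is
    the multiplicand. Each step forms P_i = r * P_(i-1) + m_i * phi; the
    product is r**-n * P_n.
    """
    partial = Fraction(0)                    # assumed starting value
    for m in m_digits:
        partial = r * partial + m * phi      # Equation (3.3.2)
    n = len(m_digits) - 1
    return partial / r ** n                  # Equation (3.3.3)

if __name__ == "__main__":
    # Radix 2 signed-digit multiplier 0.1 0 -1, i.e. 1/2 - 1/8 = 3/8.
    digits = [0, 1, 0, -1]
    multiplicand = Fraction(5, 8)
    multiplier = sum(Fraction(d, 2 ** i) for i, d in enumerate(digits))
    print(right_directed_multiply(digits, multiplicand, 2))
    print(multiplier * multiplicand)
```

 Both printed values agree, which is the content of Equation (3.3.3):
 after the last step the accumulated partial product is the true product
 scaled by r to the power n.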
 
 Each partial product can be written in the form

 $$P_i = \sum_{j=h}^{i} {}_ip_j\, r^{j} + z_i + \sum_{j=2}^{n} {}_ia_j\, r^{-j} \qquad (3.3.4)$$

 where

 ${}_ip_j$ is the digit of the i-th partial product which
 has weight $r^j$, $|{}_ip_j| \le (r - \ell)$,

 h is an integer to be determined,

 $z_i$ is the i-th value of a variable indicating the
 value of the overflow from the adder which
 has not been included in the ${}_ip_j$'s, and

 ${}_ia_j$ is the j-th digit of the i-th partial product,
 which will remain in the adder when the
 (i+1)-st step of the algorithm is performed.

 Applying the representation of $P_i$ above to Equation (3.3.2) we
 obtain

 $$P_{i+1} = \sum_{j=h}^{i} {}_ip_j\, r^{j+1} + r z_i + \sum_{j=2}^{n} {}_ia_j\, r^{1-j} + \sum_{j=1}^{n} m_{i+1} \phi_j\, r^{-j} \qquad (3.3.5)$$

 Recognizing that the adder combines the last two portions of
 the partial product, i.e.,

 $$\sum_{j=0}^{n} {}_{i+1}a_j\, r^{-j} = \sum_{j=2}^{n} {}_ia_j\, r^{1-j} + \sum_{j=1}^{n} m_{i+1} \phi_j\, r^{-j} \qquad (3.3.6)$$

 we get

 $$P_{i+1} = \sum_{j=h}^{i} {}_ip_j\, r^{j+1} + r z_i + {}_{i+1}a_0 + {}_{i+1}a_1\, r^{-1} + \sum_{j=2}^{n} {}_{i+1}a_j\, r^{-j} \qquad (3.3.7)$$

 Identifying

 $${}_{i+1}p_{j+1} = {}_ip_j \qquad (3.3.8)$$

 yields

 $$P_{i+1} = \sum_{j=h+1}^{i+1} {}_{i+1}p_j\, r^{j} + r z_i + {}_{i+1}a_0 + {}_{i+1}a_1\, r^{-1} + \sum_{j=2}^{n} {}_{i+1}a_j\, r^{-j} \qquad (3.3.9)$$

 To obtain the form of Equation (3.3.4), the following relation
 must hold

 $${}_{i+1}p_h\, r^{h} + z_{i+1} = r z_i + {}_{i+1}a_0 + {}_{i+1}a_1\, r^{-1} \qquad (3.3.10)$$

 Hence the problem of converting the digits shifted out
 of the adder during a multiplication into a sequence of single
 digits is the determination of the value of h and the range of
 values taken on by the $z_i$ in Equation (3.3.10) above.
 
 In order to simplify further discussion, we will let

 $$\gamma_i = {}_ip_h \qquad (3.3.11)$$

 $$\delta_i = r \cdot {}_ia_0 + {}_ia_1 \qquad (3.3.12)$$

 and

 $$Z_i = r z_i. \qquad (3.3.13)$$

 Multiplying Equation (3.3.10) by r and making the above
 substitutions, it becomes

 $$\gamma_{i+1}\, r^{h+1} + Z_{i+1} = r Z_i + \delta_{i+1} \qquad (3.3.14)$$

 To determine h and the set of values taken on by $Z_i$ in
 the equation above, we will use the analysis formulated by
 Rohatsch (21).

 We see from Equation (3.3.12), the characteristics of
 signed digit adders, and the assumptions of symmetric number
 systems that

 $$\delta_i \in B_{(r-\ell)(r+1)} \qquad (3.3.15)$$

 Clearly

 $$S(\delta_i) > r, \qquad (3.3.16)$$

 so the set of values taken on by $r Z_i + \delta_{i+1}$ is contiguous
 in the terminology of Rohatsch.

 Hence, the set of values taken on by $\gamma_{i+1}\, r^{h+1} + Z_{i+1}$
 must also be contiguous. This implies that

 $$S(Z_{i+1}) \ge r^{h+1}. \qquad (3.3.17)$$

 Since $\delta_i$ and $\gamma_{i+1}$ are symmetric, we will assume
 that $Z_i$ and $Z_{i+1}$ are also symmetric. Therefore, we will
 assume that

 $$|Z_i| \le \frac{r^{h+1} - \delta}{2} \qquad (3.3.18)$$

 where $\delta = r \bmod 2$.

 The set defined by the left side of Equation (3.3.14)
 must contain the set defined by the right side, and the largest
 element of the former must be at least as large as the largest
 element of the latter,* or

 $$r^{h+1} (r - \ell) + \frac{r^{h+1} - \delta}{2} \ge r \cdot \frac{r^{h+1} - \delta}{2} + (r - \ell)(r + 1) \qquad (3.3.19)$$

 After manipulation to isolate $\ell$, this becomes

 $$\ell \le r - \frac{(r-1)(r^{h+1} - \delta)}{2\,(r^{h+1} - r - 1)} \qquad (3.3.20)$$

 In this form it is not difficult to see that h = 1 may be
 employed for all but the minimally redundant ($\ell = r/2$) even radix
 number systems. Therefore, the overflow digit recoder
 requires approximately $r^3$ states to retain the value of $z_i$
 when the minimum redundancy even radix representations
 are employed. In all other cases approximately $r^2$ states
 will suffice.

 *By symmetry, the smallest element of the former will then be no
 larger than the smallest element of the latter.

 For the radix 2 arithmetic unit, the h = 2 overflow
 digit recoder must be employed, as r = 2, $\ell = 1$ must be
 considered a minimally redundant system.
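
 The conclusion about h can be checked by evaluating the bound for
 particular radices. The Python sketch below is illustrative only; it
 assumes that Equation (3.3.20) reads
 l <= r - (r-1)(r^(h+1) - d) / (2(r^(h+1) - r - 1)) with d = r mod 2,
 where l is the redundancy parameter, and the function names are
 hypothetical.

```python
from fractions import Fraction

def max_redundancy_parameter(r, h):
    """Right side of the assumed form of Equation (3.3.20) for radix r and a given h."""
    d = r % 2
    top = (r - 1) * (r ** (h + 1) - d)
    bottom = 2 * (r ** (h + 1) - r - 1)
    return r - Fraction(top, bottom)

def smallest_h(r, ell):
    """Smallest h >= 1 for which the redundancy parameter ell satisfies the bound."""
    h = 1
    while ell > max_redundancy_parameter(r, h):
        h += 1
    return h

if __name__ == "__main__":
    for r in (2, 3, 4, 5, 10):
        for ell in range(1, r // 2 + 1):
            print(f"r={r:2d} ell={ell}: h={smallest_h(r, ell)}")
```

 For the radices tried here the sketch gives h = 2 only when the radix is
 even and the redundancy parameter equals r/2 (including radix 2 with
 parameter 1), and h = 1 otherwise, in agreement with the statement above.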
 
 3 . 4 Multiplier Recoding 
 
 The expected time to perform a multiplication can be 
 decreased in this arithmetic unit, as in all arithmetic units 
 employing the shift and add algorithm to implement multipli- 
 cation, if the fractional part of the multiplier is recoded to 
 optimize the operations to be performed. 
 
 The multiplier recoding employed in this arithmetic 
 unit should maximize the number of zero digits in the multi- 
 plier. A step of the multiplication algorithm for which the 
 controlling multiplier digit is non-zero consists of two sub- 
 steps, the shift and the addition; while a step for which the 
 multiplier digit is zero consists only of the shift. A shift 
 can be launched when only two modules have completed the
 previous micro-instruction, and an addition requires three
 modules to have completed the previous micro-instruction.
 Hence, the addition cannot be launched until the last
 micro-instruction associated with the previous step of the algorithm
 has been executed by the first four modules. A step with a
 non-zero multiplier digit, therefore, takes at least twice as long
 to perform as a step with a zero digit; hence the goal of
 maximizing the number of zeros in the multiplier.
 
 3.4.1 Recoder Development Philosophy 
 
 We will now look at the problem of devising a multiplier 
 recoder for a radix 2 arithmetic unit which employs Adder 2 
 (see Section 3.2). 
 
 This adder is such that the probability of digits with 
 value zero in the sum is 1/2, as shown in Appendix I. It is 
 possible to increase this statistic to 2/3 (15), making possible 
 a potential decrease of 1/9 in the time required to perform the 
 repetitive steps of the multiplication. Since the time required 
 to do the other operations necessary to perform a multipli- 
 cation is expected to be a very small fraction of the time required 
 to do the repetitive steps, the multiplication time is expected to 
 be decreased by this same amount, 1/9, by the inclusion of a 
 recoder. 
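
 The figure of 1/9 follows from a simple expected-time calculation. The
 Python sketch below is illustrative only; it assumes that a step with a
 non-zero multiplier digit costs exactly twice a step with a zero digit
 (the text says "at least twice"), and the function name is hypothetical.

```python
from fractions import Fraction

def expected_step_time(p_zero, shift_time=1, add_step_time=2):
    """Expected time per multiplier digit, in units of one shift-only step."""
    return p_zero * shift_time + (1 - p_zero) * add_step_time

if __name__ == "__main__":
    before = expected_step_time(Fraction(1, 2))   # zero probability without recoding
    after = expected_step_time(Fraction(2, 3))    # zero probability with recoding
    print(before, after, (before - after) / before)
```

 With these assumptions the expected time per digit drops from 3/2 to 4/3
 shift times, a relative saving of 1/9.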
 
 The work on such recoders with which the author is
 familiar is applicable only to numbers in non-redundant
 form. An approach for extending this work to radix two
 signed-digit numbers is discussed below. The recoder is
 assumed to consist of two processors acting in cascade.
 The first of these is an assimilator which reduces the
 redundancy of the number as much as possible. The
 second is a recoder which converts the assimilated
 intermediate representation of the number to one in which the
 probability of zero is maximized.
 
 If it is possible to show that the recoder produces
 the same number of zeros for all arithmetically equivalent
 intermediate representations, then the probability of zero
 digits is related only to the distribution of values presented
 to the recoder and not to the distribution of their specific
 representations. We will define an assimilator which accepts
 signed-digit numbers and converts them into numbers which are
 in conventional form except for the possible replacement of
 $10\cdots00$ sequences by $01\cdots12$ sequences. We will then show
 that one of the minimal right-directed recoders developed by
 Penhollow (15) can be extended to produce the same number of
 zero digits for all possible occurrences of the two arithmetically
 equivalent sequences above. Therefore this assimilator-recoder
 cascade will recode radix 2 signed-digit numbers into minimal
 form.
 
 3.4.2 The Assimilator 
 
 The proposed assimilator is shown in Table 5. The 
 terminology employed in all of the tables of this section is 
 as follows: The input representation is of the form

 $$X = \sum_{i=1}^{n} x_i 2^{-i}, \qquad (3.4.1)$$

 that of the intermediate representation is

 $$Y = -y_0 + \sum_{i=1}^{n} y_i 2^{-i}, \qquad (3.4.2)$$

 and the recoded representation is

 $$Z = \sum_{i=0}^{n} z_i 2^{-i}, \qquad (3.4.3)$$

 where

 $x_i$ and $z_i$ are signed digits, $x_i, z_i \in \{\bar{1}, 0, 1\}$,

 $y_i$ is a digit of the assimilated intermediate
 representation, $y_i \in \{0, 1, 2\}$,

 $\bar{1}$ indicates the -1 digit,

 - indicates the entry does not affect the choice.
 
 1 indicates any number of consecutive one digits 
 (including none). 
 
 Table 5. Right-Directed Assimilator.

 [Table body not legible in the scanned source. Column groups:
 "Choose" and "Known", the latter including $x_i$, $x_{i+1}$, and
 $x_{i+2}$.]

 Table 6. Simplest Right-Directed Extended Recoder.

 [Table body not legible in the scanned source. Column groups:
 "Choose" ($c_i$, $z_i$) and "Known" ($c_{i-1}$, $y_i$, $y_{i+1}$,
 $y_{i+2}$).]

 Table 7. Cases in Which a 2 Digit Is Presented to the Recoder Above.

 [Table body not legible in the scanned source. Columns: Mode Digit;
 Digit Sequences Containing 2 Digits; Arithmetically Equal Digit
 Sequences; Recoder Output.]

 
 The basic relationship of the assimilator is

 $$y_i = x_i + b_i - 2b_{i-1}, \quad 1 \le i \le n \qquad (3.4.4)$$

 where $b_i$ is a borrow digit, $b_i \in \{0, \bar{1}\}$.

 $x_i$ and $b_{i-1}$ are known; $y_i$ and $b_i$ are to be chosen.
 
 3.4.3 The Extended Penhollow Minimal Recoder 
 
 Table 6 is an extension of the Penhollow simplest
 minimal right-directed recoder (15). The equation governing
 the recoder is

 $$z_i = y_i + c_i - 2c_{i-1}, \qquad (3.4.5)$$

 where

 $c_i$ is a carry digit, $c_i \in \{0, 1\}$,

 $y_i$ and $c_{i-1}$ are known; $z_i$ and $c_i$ are to be chosen.

 This extended recoder accepts the output of the assimilator
 described above. The output of the recoder was determined
 for the six cases (see Table 7) in which the recoder encounters
 a 2 digit. The output was compared with the output which would
 have been produced if the assimilator had been able to produce
 an intermediate representation which did not contain the 2 digit.
 In all six cases, the recoder produced an equal number of 0 digits
 whether the assimilator output contained a 2 digit or not. The
 output of the assimilator and extended recoder cascade therefore
 has the same shift average as the Penhollow recoder (i.e., 3).
 
 
 In four of the six cases, the output of the recoder is the 
 same whether the assimilator produced a 2 digit or not. In both 
 of the other cases the recoder output had more 1 digits and fewer 
 1 digits when the assimilated number contained a 2 than when it 
 didn't. Hence, the probability of 1 and 1 digits in the recoder 
 output are unequal and not independent of the digit sequence used 
 to represent the number. For example, both '011001' and 
 '011111' represent 25 , yet the first is recoded '011001 
 while the second is recoded ' 1 1 1 '. Appendix II presents 
 the overall recoder table and some additional discussion. 
 
 3.5 Normalization Considerations 
 
 There are several situations in which it is necessary to 
 normalize numbers, that is, restrict the range of values which 
 the fractional part may assume. These are the preparation of 
 operands and the processing of results. 
 
 3.5.1 Definition of Normalized Numbers and Their Range of Values 
 The major design consideration in the implementation 
 of normalization is the choice of data to examine to determine 
 if additional processing of the number is necessary. 
 
 
 We will define a signed-digit number to be normalized when
 either:

 2. $|x_1| = 1$, $x_2 = x_3 = \cdots = x_{\tau-1} = 0$ and $x_1 \cdot x_\tau \ge 1$, $\tau \le \nu$,

 3. $|x_1| = 1$, and $x_i = 0$, $i = 2, 3, \ldots, \nu$, or

 4. $|x_1| = 1$, and $x_1 \cdot x_2 \ge (1 - \ell)$ if $\ell > 1$,

 where $x_i$ is the i-th digit of the number, with weight
 $r^{-i}$ (i.e., $X = \sum_{i=1}^{n} x_i r^{-i}$), $x_i \in B_{r-\ell}$,

 $\nu$ is the number of digits examined to determine
 if X is normalized, and

 $\ell$ is an integer parameter indicating the redundancy
 of the number system, $1 \le \ell \le r/2$.

 X therefore consists of two components $X_i$ and $X_{\bar{i}}$, and

 $$X = X_i + X_{\bar{i}},$$

 where

 $X_i$ is the value of the first i digits of the number,
 i.e., $X_i = \sum_{j=1}^{i} x_j r^{-j}$, and

 $X_{\bar{i}}$ is the value of the remaining digits of the number,
 i.e., $X_{\bar{i}} = \sum_{j=i+1}^{n} x_j r^{-j}$.
 
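 From the definition of normalized numbers above, we
 see that, for $\ell = 1$,

 $$r^{-1} \le |X_\nu| \le (1 - r^{-\nu}), \text{ and}$$

 $$0 \le |X_{\bar{\nu}}| \le (r^{-\nu} - r^{-n}).$$

 These can be combined to yield

 $$r^{-1} - r^{-\nu} + r^{-n} \le |X| \le 1 - r^{-n} \qquad (3.5.1)$$

 When $\ell > 1$, we see from condition 4 that $\nu = 2$ and the analysis
 becomes

 $$\frac{r + 1 - \ell}{r^2} \le |X_2| \le \frac{(r - \ell)(r + 1)}{r^2},$$

 $$0 \le |X_{\bar{2}}| \le \frac{r - \ell}{r - 1} \left[ \frac{1}{r^2} - \frac{1}{r^n} \right],$$

 which yields

 $$\frac{r + 1 - \ell}{r^2} + \frac{\ell - r}{r - 1} \left[ \frac{1}{r^2} - \frac{1}{r^n} \right] \le |X| \le \frac{r - \ell}{r - 1} \left[ 1 - \frac{1}{r^n} \right] \qquad (3.5.2)$$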
 
 We may see from the above that the range to which numbers can 
 be normalized is fixed for all but the maximally redundant num- 
 bers. For maximally redundant numbers, the range of values 
 decreases as the number of digits increases. 
 
 3.5.2 Normalization Recodings 
 
 The procedure for normalizing signed-digit numbers is 
 complicated by the existence of representations for which x. is 
 not zero but which do not meet the definition of normalized 
 numbers. These representations must be recoded into an arith- 
 metically equivalent representation which, after shifting out 
 leading digits, does meet the definition. 
 
 The recoding converts the most significant digits of this
 representation,

 $$x_1 = \pm 1, \quad x_2 = x_3 = \cdots = x_{\tau-1} = \mp(\ell - 1), \quad x_\tau = \mp(r - a) \qquad (3.5.3)$$

 into the following digits

 $$x_1 = 0, \quad x_2 = x_3 = \cdots = x_{\tau-1} = \pm(r - \ell), \quad x_\tau = \pm a \qquad (3.5.4)$$

 where either the upper or the lower sign is chosen uniformly in
 the description, and $\tau$ is the number of digits altered by the
 recoding, $\tau \ge 2$.

 A decimal example of this procedure, assuming $\ell = 2$, is
 the recoding of '$\bar{1}\,1\,1\,6\,5\,3$' into
 '$0\,\bar{8}\,\bar{8}\,\bar{4}\,5\,3$'. In this example
 $x_1 = -1$, $x_2 = x_3 = +(\ell - 1)$, and $x_4 = +6$.
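
 The recoding can be expressed as a short procedure. The Python sketch
 below is illustrative only: the function names are hypothetical, digits
 are held in a list with the first entry having weight 1/r, and the range
 test on the terminal digit (magnitude between l and r - l, where l is the
 redundancy parameter) is an assumption made so that both the old and the
 new digit stay within the digit set.

```python
from fractions import Fraction

def value(digits, r):
    """Value of a signed-digit fraction; digits[0] has weight r**-1."""
    return sum(Fraction(d, r ** (j + 1)) for j, d in enumerate(digits))

def normalization_recode(digits, r, ell):
    """Apply the recoding of (3.5.3) into (3.5.4) when the leading digits match.

    Returns a new list; the input is returned unchanged (as a copy) when the
    leading digits do not have the form that requires recoding.
    """
    x = list(digits)
    if len(x) < 2 or abs(x[0]) != 1:
        return x
    s = x[0]                                   # sign of the leading digit
    t = 1
    while t < len(x) and x[t] == -s * (ell - 1):
        t += 1                                 # skip the run of -s*(ell-1) digits
    if t == len(x):
        return x                               # no terminal digit found
    magnitude = -s * x[t]                      # positive if opposite in sign to x[0]
    if not (ell <= magnitude <= r - ell):      # assumed admissible range for r - a
        return x
    x[0] = 0
    for j in range(1, t):
        x[j] = s * (r - ell)
    x[t] = s * (r - magnitude)                 # the digit a, with the sign of x[0]
    return x

if __name__ == "__main__":
    before = [-1, 1, 1, 6, 5, 3]               # the decimal example, ell = 2
    after = normalization_recode(before, 10, 2)
    print(after, value(before, 10) == value(after, 10))
    print(normalization_recode([1, 0, 0, -1], 2, 1))
```

 The value is unchanged by the recoding, and the leading digit becomes
 zero so that a further left shift can bring the number into normalized
 form.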
 
79 
 
 3.5.3 Methods of Normalization 
 
 There are two basic methods of normalizing numbers in 
 this arithmetic unit. They differ in the manner in which they 
 obtain information about the number. 
 
 In the first method, the values of the most significant 
 digits are sensed only by shifting them into the PCU. This can 
 be thought of as over-normalization followed by restoration, 
 since the number will have a non-zero integer part stored in the 
 PCU just prior to restoration. 
 
 In the second method the values of required digits are 
 transmitted to the PCU by some micro -instruction. 
 
 Provision for sensing the contents of the A register will 
 be included to sense partial remainders during division. Hence, 
 this normalization technique would not appreciably add to the 
 complexity of the DPUs . In addition, this latter method affords 
 the designer two options to minimize the time to perform the 
 partial normalization. 
 
 The first option is the inclusion of sense micro-instruction
 detectors among the DPUs. Each time the Primitive Control Unit
 receives notification that additional digits of the number are
 available it can determine whether it has sufficient information
 to complete the partial normalization and also whether additional
 left shift micro-instructions must be issued. Note that the
 only case for which neither is true is the case for which the
 recoding may have to be performed. Hence, in all but this
 one case the time required to sense the number is largely
 overlapped with the time to perform the shifts required, and
 the time to partially normalize may be less than the time for
 a micro-instruction to propagate through $\nu$ digit processing units.
 
 The second option is the inclusion of micro-instructions
 to perform the recodings indicated above. These
 micro-instructions would be such that they would add $\pm(r-1)$ to the
 register of the DPU containing the representation being normalized if
 it initially contained $\mp(\ell-1)$ and would then transmit the
 micro-instruction to its right neighbor. When the register of the DPU
 executing the micro-instruction contains any other digit, the
 micro-instruction causes $\pm r$ to be added to the register contents,
 hence effecting the $\mp(r-a)$ to $\pm a$ transformation. This is the
 form of the transformation required of the last digit to be altered.
 This last DPU does not pass the micro-instruction to its neighbor.
 
 Comparisons between the two basic methods are very 
 difficult to make because of the options available with the latter 
 method. 
 
 
 3.5.4 Analysis of Radix Two Normalization Methods 
 
 Now we will look in detail at the problem of normalizing
 radix two numbers when $\nu = 2$. The choice of examining the first
 two digits of the number is compatible with the implementation
 of the simplest division algorithm. Because of the small value
 of $\nu$, we will assume that the arithmetic unit does not contain
 sense micro-instruction detectors and that the DPUs do not
 have micro-instructions to do the recodings.
 
 Numbers in this system which do not require recoding 
 during normalization are of the form 
 
 . . . , x X i^> 
 
 i 
 
 where x is either +1 or -1 for any particular number, and
 X is either 0 or x.
 
 Numbers in this system which do require recoding are of the 
 
 form 
 
 where x̄ = -x.
 
 The above numbers will be referred to as (i) and (i, j) in the 
 
 remainder of the discussion. 
 
 
 We will assume that the probability distribution of a 
 given digital position is a function only of the value of the 
 digit to its left, and that these probabilities are 
 
 P (leading zero) = j P(0 1 1) = P(0 1 1 ) = j 
 
 P (0|0) =| P(l|T) = P(T|l) = j 
 
 P(l|0) + P(l 0) = 7 P(l|l) = P(l |i) = 7, 
 
 where P(y|z) = P(x_{i+1} = y, given x_i = z).
 
 Then the probabilities of the two numbers are 
 
 P ((i)) = ^r— , and (3.5.5) 
 
 3-2 
 
 P((i. J)) = V~7 (3.5.6) 
 
 36-2 -6 J 
 
 Four methods for performing this normalization will be considered.
 
 Method A consists of shifting the representation left until 
 the portion of the representation shifted into the Primitive Control 
 Unit can be recoded into two digits of the same sign or a non-zero 
 digit followed by a zero. The two digits are then shifted back into 
 the DPUs. 
 
 *This is based on the analysis of a radix two adder in Appendix I.
 
 
 Method B consists of examining the digits contained in 
 the two most significant DPUs. If the terminal digit is not 
 detected, two left shifts are launched and the process repeated. 
 If the terminal digit is detected, the additional shifts necessary 
 to normalize the number are performed and the process termi- 
 nated. 
 
 Method C consists of examining the digits contained in 
 the two most significant digital positions. If the terminal digit 
 is not detected, one left shift is performed if the first non-zero 
 digit is in the second digital position, two left shifts are per- 
 formed otherwise. The process is then repeated. If the ter- 
 minal digit is detected, then the appropriate final shifts, if any, 
 are performed and the process terminated. 
 
 Method D consists of shifting the representation to be 
 normalized left until a non-zero digit is shifted into the primitive 
 control unit. The first two digital positions are then examined. 
 If the terminal digit is not detected, two left shifts are performed. 
 The (examination, two left shift) portion of the procedure is 
 repeated until a terminal digit is detected and the appropriate 
 final steps performed. 
 
 *The recoding is performed by shifting the digit to be changed into the
 primitive control unit, which recodes the digit and shifts the altered
 digit right (into the first DPU).
 
Table 8. Normalization Procedure for All Possible Radix 2 Signed Digit Numbers

 Method  Class Number             Left Shifts  Number of Examinations  Right Shifts

 A       (i)                      i+2          0                       2
 A       (i, j)                   i+j+3        0                       2

 B       (i), i even              i            (i+2)/2                 0
 B       (i), i odd               i+1          (i+3)/2                 1
 B       (i, j), i+j even         i+j+2        (i+j+4)/2               1
 B       (i, j), i+j odd          i+j+3        (i+j+3)/2               1

 C       (i), i even              i            (i+2)/2                 0
 C       (i), i odd               i            (i+3)/2                 0
 C       (i, j), i+j even         i+j+2        (i+j+4)/2               1
 C       (i, j), i even, j odd    i+j+2        (i+j+3)/2               1
 C       (i, j), i odd, j even    i+j+3        (i+j+5)/2               1

 D       (i)                      i+1          1                       1
 D       (i, j), j even           i+j+3        (j+2)/2                 1
 D       (i, j), j odd            i+j+2        (j+3)/2                 1
 
Table 9. Average Number of Operations Required During a Normalization
 
 Normalization 
 Method 
 
 <L.S.> 
 
 . <R.S.> 
 
 <E> 
 
 A 
 
 3| 
 
 2 
 
 
 
 B 
 
 1^ 
 
 45 
 
 5 
 9 
 
 1^ 
 
 35 
 
 C 
 
 'if 
 
 1_ 
 3 
 
 2 l 
 
 105 
 
 D 
 
 *! 
 
 1 
 
 ^ 
 
 
 Table 8 gives the number of left shifts, right shifts, and 
 examinations which must be performed for each possible number 
 and for each of the above methods of normalization. 
 
 Applying the probabilities given in (3.5.5) and (3.5.6)
 and assuming that the numbers are not limited in length, one
 obtains the average number of left shifts, right shifts, and
 examinations shown in Table 9. Assuming that the time to perform
 a left shift is the same as the time to perform a right shift
 and neglecting the time required to make decisions, the expected
 number of shift times required to perform a normalization is

      <T> = <LS> + <RS> + ξ <E>                              (3.5.7)
 
 where

      <T>  is the expected number of shift micro-instruction
           times required to normalize an output of a
           symmetric base 2 signed digit adder.
      <LS> is the expected number of left shift micro-instructions
           required.
 
 ; The error in these calculations is approximately 3x2 , and 2 
 for the average number of left shifts and examinations required, 
 respectively, where n is the number of digits in a number. 
 
 Figure 9. Average time to normalize a radix two signed digit number.
 (Plot of the expected number of shift times <T> against ξ, with one
 curve for each of Methods A, B, C, and D.)

 
      <RS> is the expected number of right shift
           micro-instructions required.
      ξ    is the ratio of the time required to transmit the
           two most significant digits of the number undergoing
           normalization to the primitive control unit
           to the time required to launch a shift
           micro-instruction.
      <E>  is the expected number of times the digits retained
           in the first two digital positions of the register
           containing the number being normalized must be
           transmitted to the primitive control unit.

 A graph of <T> versus ξ as determined from Table 9
 and (3.5.7) is presented as Figure 9. This graph shows that
 
 2 
 Normalization Method C is optimum when ^< 1 - , Method D 
 
 when 1 — <T 5 <T 1 ~, and Method A when <if > 1 — . The value 
 of ξ is approximately 2 if the 'red tape' in issuing micro-
 instructions is negligible and becomes smaller as the 'red tape'
 becomes more predominant. Hence, Method A is the optimum
 when 'red tape' is negligible; Methods D and then C become
 optimum as the 'red tape' increases.
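
 The comparison made with Equation (3.5.7) can be sketched in a few
 lines of Python. This is only an illustration: the function names are
 invented here, and the averages in EXAMPLE are hypothetical
 placeholders, not the entries of Table 9.

```python
from fractions import Fraction


def expected_shift_times(ls, rs, e, xi):
    """Evaluate <T> = <LS> + <RS> + xi * <E>, the form of Equation (3.5.7)."""
    return ls + rs + xi * e


def best_method(averages, xi):
    """Return the method name whose <T> is smallest for the given ratio xi.

    `averages` maps a method name to a tuple (<LS>, <RS>, <E>).
    """
    return min(averages, key=lambda m: expected_shift_times(*averages[m], xi))


# Hypothetical averages (<LS>, <RS>, <E>); these are NOT the Table 9 values.
EXAMPLE = {
    "few shifts, many examinations": (Fraction(2), Fraction(1, 2), Fraction(3, 2)),
    "many shifts, no examinations": (Fraction(4), Fraction(2), Fraction(0)),
}

for xi in (Fraction(1), Fraction(3)):
    print(xi, best_method(EXAMPLE, xi))
```

 A method that performs no examinations has a <T> that is independent
 of ξ, which is why such a method wins once examinations become
 expensive relative to shifts.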
 
 
 3.6 Division Considerations

 Division, like multiplication, must be performed by
 repetitive additions and shifts in the limited connection arithmetic
 unit because it contains a single adder. Unlike multiplication,
 however, the partial remainder must be examined
 periodically to determine one or more quotient digits, which
 will control the use of the adder in subsequent steps. That is,
 while the repetitive steps of division must be performed radix
 r because of the existence of a single adder, the quotient digit
 determination may be performed with radix r^λ, where λ is
 the number of radix r quotient digits determined in one comparison.
 
 In the arithmetic unit under investigation, the time 
 required to transmit a given number of digits of the partial 
 remainder or divisor to the selection mechanism is directly 
 proportional to the number of digits transmitted. The values
 of several digits of the partial remainder are required to determine
 one quotient digit; the value of one additional digit is
 required for each additional quotient digit. Hence, the total
 time spent obtaining information from the partial remainders 
 decreases as the number of quotient digits determined in each 
 step increases. This effect is opposed by two factors. The 
 first is the accuracy to which the value of the divisor must be
 known. The number of digits of the divisor whose value must
 be available to the quotient digit selection logic has the same 
 form as that for the partial remainder; several digits for the 
 first quotient digit, one for each additional quotient digit deter- 
 mined. The second factor is the time required for the quotient 
 digit selection logic to yield the quotient digits after the appro- 
 priate information is presented to it. This last factor is not 
 related in a simple way to the micro-instruction execution time. 
 Therefore, rather than determining a specific division algorithm, 
 the number of digits of the divisor and partial remainders which 
 must be presented to the quotient digit selection logic will be 
 determined as a function of the number of quotient digits 
 selected per step, and the radix and redundancy of the numbers 
 employed. The class of divisions requiring the minimum infor- 
 mation of the value of the partial remainders and divisors will 
 be analyzed. 
 
 Figure 10. P-D plot for general SRT division.
 
 3.6.1 Analysis of Division for Maximally Redundant Numbers 
 
 We will now begin the analysis of the number of digits 
 of the partial remainder required to determine a number of 
 quotient digits, extending the work of Robertson (18) and 
 Atkins (3). The number of digits of the partial remainder and 
 of the divisor which must be available to the quotient digit 
 selection mechanism is determined by the situation depicted 
 graphically in Figure 10. This figure is a P-D plot, as 
 suggested by C. V. Freiman (18). The abscissa is the divisor
 value, the ordinate the value of the shifted partial remainder.
 The graph is divided into regions for which a given quotient 
 digit value may be chosen. 
 
 The boundaries of such regions are lines of the form

      r^{\lambda} p_j = (q - K) d                            (3.6.1)

 and

      r^{\lambda} p_j = (q + K) d                            (3.6.2)

 where

      r    is the radix of the number representation system,
      λ    is the number of quotient digits determined by a
           single comparison (hence the division radix is
           effectively r^λ),
      p_j  is the value of the partial remainder at the
           conclusion of step j,
      q    is the quotient digit value which may be selected,
      n    is the largest possible quotient digit, i.e.,
           q ∈ {-n, -(n-1), ..., -1, 0, 1, ..., n},
      d    is the divisor value,
      K    is a constant determined by the redundancy of the
           quotient and is

      K = \frac{n}{r^{\lambda} - 1}.                         (3.6.3)

 Note that since quotient digits will be determined by truncated
 versions of the divisor and partial remainder, it is only known
 that the point representing a given divisor and partial remainder
 is within a rectangular region of the truncated values. Hence, each
 such region must lie entirely within one of the quotient digit regions
 defined by Equations (3.6.1) and (3.6.2).
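
 The region boundaries can be made concrete with a short Python sketch.
 It assumes that Equation (3.6.3) reads K = n / (r^λ - 1) and that the
 boundary lines themselves belong to the region; the function names are
 illustrative only.

```python
from fractions import Fraction


def redundancy_constant(n, r, lam):
    """K = n / (r**lam - 1), the assumed reading of Equation (3.6.3)."""
    return Fraction(n, r**lam - 1)


def selection_interval(q, d, K):
    """Values of the shifted partial remainder for which digit q may be chosen.

    The bounds are the lines (q - K) d and (q + K) d of (3.6.1) and (3.6.2).
    """
    return ((q - K) * d, (q + K) * d)


def overlap(q, d, K):
    """Width of the band in which either q - 1 or q is an acceptable choice."""
    lower_q, _ = selection_interval(q, d, K)
    _, upper_q_minus_1 = selection_interval(q - 1, d, K)
    return upper_q_minus_1 - lower_q


K = redundancy_constant(n=3, r=4, lam=1)   # maximally redundant radix 4
print(selection_interval(1, Fraction(1, 2), K), overlap(1, Fraction(1, 2), K))
```

 The overlap equals (2K - 1) d, so it shrinks as the redundancy is
 reduced. It is this band that must contain the rectangle of truncated
 values.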
 
 In Figure 10, line 1 represents the lower boundary of
 the region for which the choice q=n can be made. Line 2 is the
 upper boundary of the region for which q=n-1 may be chosen. The
 interior of the dashed-line rectangle represents the range of
 possible divisor, partial remainder pairs which have truncated
 values of d̂ and (n - 1/2)d̂, respectively, where d̂ is the
 minimum positive truncated divisor value.
 When this rectangle lies entirely within the two lines, all
 other such rectangles of the same size will be entirely
 within some quotient digit region*. Atkins has shown that
 insuring that point 3 on line 2 is not below point 4 on the
 rectangle guarantees that the rectangle is within the two
 lines. This condition is:

      (n - 1 + K)(\hat{d} - \Delta d) \ge (n - \tfrac{1}{2}) \hat{d} + \Delta p      (3.6.4)

 where

      d̂   is the smallest positive truncated divisor value,
      Δp  is the truncation error in the shifted partial
          remainder,
      Δd  is the truncation error in the divisor value.

 Note that this equation is based on the assumption that the
 partial remainder will be shifted λ digital positions before
 the comparison is made. Since the most significant quotient
 digit has weight r^{λ-1}, the more usual procedure is to shift
 the partial remainder left only one digital position, and the
 λ next quotient digits are determined. The quotient digits
 are then disposed of beginning with the most significant digit.
 One left shift is performed between each such pair of steps.
 Hence, the value of Δp determined by Equation (3.6.4) will
 be r^{λ-1} times larger than it will have to be if the method above
 is used for determining new partial remainders.

 *This is a conservative solution. It will be discussed later that
 solutions were found which had smaller values of ρ and δ.
 
 Taking the above consideration into account, Equation
 (3.6.4) can be manipulated into the following more useful form:

      r^{\lambda - 1} \Delta p + (n - 1 + K) \Delta d \le (K - \tfrac{1}{2}) \hat{d}      (3.6.5)

 From the section on normalization we see that for maximally
 redundant numbers the division parameters are

      \hat{d} = \frac{1}{r}, \quad \Delta d = r^{-\delta}, \quad \Delta p = r^{-\rho}      (3.6.6)

 where

      δ  is the number of digital positions of the divisor
         which are transmitted to the quotient digit selection
         device, and
      ρ  is the number of digital positions to the right of the
         radix point which are transmitted to the quotient
         digit selection mechanism.
 
Table 10. Requirements for Performing Division with Maximally Redundant Numbers.

      r        λ        ρ        δ

      r ≥ 4    λ ≥ 1    λ + 1    λ + 2
      r = 3    λ ≥ 2    λ + 2    λ + 2
      r = 3    λ = 1    2        3
      r = 2    λ ≥ 3    λ + 2    λ + 3
      r = 2    λ = 2    4        4
      r = 2    λ = 1    2        2

 
 The maximum quotient 'digit' consists of λ digits all
 equal to (r-1), hence

      n = \sum_{i=0}^{\lambda - 1} (r - 1) r^{i} = r^{\lambda} - 1      (3.6.7)

 which yields from (3.6.3) that

      K = 1.                                                  (3.6.8)

 Applying (3.6.6), (3.6.7) and (3.6.8) to (3.6.5),

      r^{\lambda - 1} r^{-\rho} + (r^{\lambda} - 1) r^{-\delta} \le \frac{1}{2r}      (3.6.9)

 Now letting

      \delta = \rho + 1                                       (3.6.10)

 yields that

      r^{\rho} \ge \left(4 - \frac{2}{r^{\lambda}}\right) r^{\lambda}      (3.6.11)

 Therefore

      \rho = \lambda + 1 \text{ and } \delta = \lambda + 2 \text{ for } r \ge 4.      (3.6.12)

 The other cases are, in general, no more difficult to
 solve. Optimum solutions are given in Table 10; the r=3,
 λ=1; r=2, λ=2 and λ=1 entries were not, however, determined
 from Equation (3.6.9). For these cases, requiring that
 the rectangle lie entirely within the region bounded by lines 1
 and 2 in Figure 10 is overly conservative. In these cases it is
 possible to choose ρ and δ such that point 2 of the rectangle
 coincides with point 1 of line 1. This condition is expressed
 algebraically as

      r^{\lambda - 1} r^{-\rho} + (r^{\lambda} - 2) r^{-\delta} \le \frac{1}{2r}      (3.6.13)

 For all but r = 2 and λ = 1, this equation determines ρ and δ.
 However, the coefficient of r^{-δ} is zero when r=2, λ=1
 in Equation (3.6.13); hence δ could not be determined by that
 equation. To determine the value of δ, the P-D plot for this
 case was drawn. It was seen that Δd must be such that the
 region of divisor values and partial remainder values centered
 at (1/2, 0) will remain within the q=0 region. As seen in Figure
 11, this value is Δd = 1/4; hence δ=2.

 Figure 11. P-D plot used to determine δ for r=2, λ=1.
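
 The entries of Table 10 can be checked mechanically. The Python sketch
 below assumes that (3.6.9) reads r^(λ-1-ρ) + (r^λ - 1) r^(-δ) ≤ 1/(2r)
 and that (3.6.13) has the same form with the coefficient (r^λ - 2);
 both readings, and all function names, are assumptions made for this
 illustration. Exact rational arithmetic is used so that the boundary
 cases, where equality holds, are not disturbed by rounding.

```python
from fractions import Fraction


def holds_3_6_9(r, lam, rho, delta):
    """Assumed reading of (3.6.9): rectangle inside the band of lines 1 and 2."""
    lhs = Fraction(r) ** (lam - 1 - rho) + (r**lam - 1) * Fraction(r) ** (-delta)
    return lhs <= Fraction(1, 2 * r)


def holds_3_6_13(r, lam, rho, delta):
    """Assumed reading of (3.6.13), the less conservative condition."""
    lhs = Fraction(r) ** (lam - 1 - rho) + (r**lam - 2) * Fraction(r) ** (-delta)
    return lhs <= Fraction(1, 2 * r)


def table_10(r, lam):
    """Return (rho, delta) as listed in Table 10."""
    if r >= 4:
        return lam + 1, lam + 2
    if r == 3:
        return (lam + 2, lam + 2) if lam >= 2 else (2, 3)
    if r == 2:
        if lam >= 3:
            return lam + 2, lam + 3
        return (4, 4) if lam == 2 else (2, 2)
    raise ValueError("radix must be at least 2")


# Rows obtained from (3.6.9): the entry holds, and one digit fewer of the
# partial remainder does not.
for r, lam in [(4, 1), (8, 3), (3, 2), (2, 3)]:
    rho, delta = table_10(r, lam)
    print(r, lam, holds_3_6_9(r, lam, rho, delta),
          holds_3_6_9(r, lam, rho - 1, delta))
```

 Under these readings the three special rows (r=3, λ=1; r=2, λ=2;
 r=2, λ=1) fail the first condition and satisfy the second, which
 agrees with the statement that they were not determined from (3.6.9).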
 
 The reader should notice that when λ = 1, the values
 of only the first two digits of the partial remainder are required
 to select the quotient digit. Since, as we have seen in Section 3.2,
 the primitive control unit must receive information from the first
 two DPUs to detect overflow when addition or subtraction is performed,
 no additional data paths are required to sense the value
 of the partial remainder. For r = 2, δ = 2 also, indicating that
 no special data paths are necessary. For r ≥ 3, δ = 3, indicating
 that special provision would have to be made to transmit
 the value of the divisor digit from the third DPU to the PCU.
 While it is possible to include a special data path from the
 third DPU to the PCU, an alternative which appears attractive
 is to normalize by method D as discussed in Section 3.4,
 since this allows the value of three digits of the operand to be
 accessible to the PCU. These are the digit shifted into the
 PCU, and the digits in the first two DPUs. Hence, it is
 possible to build an arithmetic unit for an arbitrary radix
 which has very regular and repetitive interconnections.

 Whether this is adequate, or whether a division algorithm
 in which more than one quotient digit is determined at each comparison
 is needed, can only be decided when some very implementation-
 dependent factors are taken into account. The major factors
 are the time required for the quotient digit selection logic to
 produce its output and the decrease in the speed of micro-
 instruction execution caused by the additional connection to certain
 of the data paths.
 
 3.6.2 Analysis of Division for Other Number Systems 
 
 The analysis of division for an arithmetic unit in which
 other than a maximally redundant representation is employed is
 much more complicated. From the discussion of the preceding
 section, we see that, in general,
 
 4 r -r-rA + 21-1 (r-l) -6 
 
 d = 5 + i TV r 
 
 r 2 (r-l) f" 1 ' 
 
 (3.6.14) 
 
 Ap = 
 
 iizn -p 
 
 (r-l) 
 
 and 
 
 A d = 1 rf- r 
 (r- 1) 
 
 The maximum quotient digit is 
 
 X-l 
 
 (v- 0\ r _1 = 
 
 (r-l) 
 
 i=0 
 
 1) 
 
 (3.6. 15) 
 
 (3.6.16) 
 
 (3.6.17) 
 
 from which we obtain 
 
 K = 
 
 i 
 
 r-l 
 
 (3.6.18) 
 
 Applying the above to Equation (3.6.5) and doing some manipula- 
 tion we obtain 
 
 X-l-p 
 
 + 
 
 'LlH r x . 1 + (^-D 
 
 (r-l) 2 (r-l) 
 
 -6 
 
 - 6 r-2^+1 
 
 r £ — r 
 
 2r 
 
 1 + 
 
 (*-*) 
 
 (r-/)(r-l) 
 
 Neglecting all but the first r and right-hand term and letting 
 
      \delta = \rho + 1,

 and doing further manipulation we obtain
 
 p 2r X+1 (2r-^-l) 
 
 r 2 (r-1) (r-2i + l ) ' (3.6.19) 
 
 From which we find that 
 
 />= X +1 forJ> < | - 1. (3.6.20) 
 
 When minimally redundant representations are employed 
 in the arithmetic unit, the solution is 
 
 f>= X +2 for |- l<i < | (3.6.21) 
 
 The solutions given in lines (3.6.20) and (3.6.21) are
 pessimistic when λ is small because of the simplification which
 had to be made to obtain Equation (3.6.19). The analysis does
 show that, in general, the number of digits of the partial remainder
 and divisor which must be examined to determine the next
 quotient digits is a constant plus the number of quotient digits
 determined. The constant is a function of the radix and
 redundancy employed by the arithmetic unit.
 
 3.6.3 Using the Multiplication Overflow Recoder During Division

 The next problem associated with the implementation
 of division is the fact that the shifted partial remainder may 
 become greater than unity, and hence could not be repre- 
 sented entirely in the accumulator register of the adder. 
 Robertson (17) has shown that the shifted partial remainder 
 
 satisfies 
 
 rp j 
 
 — r- 1 
 
 The multiplication overflow recoder is shown in Section 3.3.3 
 
 to allow the value of the accumulator register to become as 
 
 r 2 -l 
 
 large as — in absolute value. Hence, it can also be 
 
 employed to act as an extension of the adder during division. 
 Since in this arithmetic unit both multiplication and division 
 are right-directed, there is little fundamental difference 
 between the functions to be performed by the overflow recoder 
 when it is employed for division and when it is employed for 
 multiplication. During multiplication it will overflow and yield
 non-zero digits, while the choice of quotient digits prevents this
 during division. During division, the value retained must be
 available to the quotient digit selection mechanism, while no
 such connection is necessary for multiplication. In every
 other respect, the two uses of the unit are identical. Hence,
 the multiplication overflow recoder can be employed to retain
 the integer part of the partial remainders by the addition of a
 very minimal amount of logic.

 *Assuming that one shift, rather than λ, is made prior to determining
 the quotient digits. This is the same assumption made in obtaining
 Equation (3.6.5).
 
 3.6.4 Placing the Quotient Digits into the DPUs 
 
 The final problem which the implementation of division
 poses is that of the storage of quotient digits and the accommodation
 of a double precision dividend. The most significant digit of the
 quotient is obtained first; hence shifting the register receiving
 the quotient right one position and inserting the newly determined
 quotient digit would result in the quotient being stored in reverse
 order. Hence, the quotient digits must be "inserted" into the least
 significant digital position after the register is shifted left.
 This is essentially the same problem as accumulating product digits
 during multiplication. Just as in the case of multiplication, the
 solution is to use a left shift micro-instruction. In this case, the
 quotient digit to be stored is identified as the digit which is to be
 placed in the least significant digital position. The register that
 contains the least significant part of the dividend (if it exists)
 should be used to store the quotient digits, since one digit of the
 least significant part of the dividend must be moved to the
 accumulator for each quotient digit which must be stored.
 Furthermore, since the digits of the least significant part of the
 dividend must be sent to the accumulator in the order of decreasing
 significance, the digit shifted into the PCU by this left shift
 micro-instruction is the digit which must be moved to the accumulator.
 This digit can then be sent to the least significant digital position
 of the accumulator by identifying it as the digit which is to be placed
 there when the accumulator left shift micro-instruction is issued
 to multiply the partial remainder* by the radix.
 
 3.6.5 A Division Example 
 
 An illustration of how division may be performed on a 
 double-precision dividend is presented as Figure 12. In this 
 example the register contains the divisor. The A register 
 initially contains the most significant part of the dividend and 
 contains the remainder at the conclusion of the operation. The 
 M register initially contains the digits of lesser significance and
 contains the quotient digits after the division.
 
 *Which is contained in the accumulator during division.
 
 Figure 12. Division example. (Table of lines 1 through 14 showing the
 contents of Pr Data, M1 to M3, and A1 to A3 at each time step.)

 
 The conventions of the multiplication example, Section
 3.3.2, will be used, together with the following additions:

      SA     indicates that the corresponding DPU has
             transmitted its digit of the A register to
             the PCU to allow the PCU to select the next
             quotient digit to be used,
      _jp_i  is the i-th digit of the j-th partial remainder,
      q_i    is the i-th quotient digit, and
      n_i    is the i-th digit of the dividend.

 The arithmetic unit is prepared for the division at time 1.
 At time 2, the first quotient digit (q_0) is determined by the
 PCU and placed in the 'Pr Data' register. At time 3, the PCU
 issues an addition micro-instruction to the first DPU, which causes
 the first partial remainder to be determined. The 'Pr Data'
 register does not participate in this addition. A left shift M
 micro-instruction is issued next by the PCU. This causes q_0
 to be sent into the M register and causes n_4 to be placed in
 'Pr Data'. At time 5, a left shift A micro-instruction is
 launched by the PCU, causing n_4 to become the least significant
 digit of the A register. The cycle above is repeated for each
 of the quotient digits*.

 *The final left shift of A is not performed.
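
 The register movements of this cycle can be sketched in Python. The
 sketch is a simplification under stated assumptions: it uses ordinary
 non-negative digits and picks each quotient digit by integer division,
 as a stand-in for the signed-digit selection logic of the PCU, and it
 keeps the partial remainder as a single integer rather than as digits
 spread over DPUs. The function name and argument layout are invented
 for the illustration.

```python
def divide(a_digits, m_digits, divisor, radix=10):
    """Long division with the quotient accumulated in the M register.

    a_digits: most significant part of the dividend (the A register).
    m_digits: digits of lesser significance (the M register); the digit
              shifted out by the last left shift of M is not used.
    Returns (quotient digits left in M, remainder left in A).
    """
    a = 0
    for digit in a_digits:
        a = a * radix + digit
    m = list(m_digits)
    for step in range(len(m)):
        q = a // divisor                # quotient digit selection (stand-in)
        if q >= radix:
            raise OverflowError("quotient digit does not fit in one position")
        a -= q * divisor                # addition micro-instruction
        pr_data = m.pop(0)              # left shift M: top digit goes to the PCU
        m.append(q)                     # and q enters the least significant position
        if step < len(m) - 1:           # the final left shift of A is not performed
            a = a * radix + pr_data     # left shift A, dividend digit enters at the right
    return m, a


print(divide([4, 5], [3, 2, 0], 7))
```

 With these inputs the dividend is 4532 (the trailing 0 in M is the
 unused position) and the divisor is 7; the quotient digits gathered in
 M and the remainder left in A satisfy 4532 = 647 x 7 + 3.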
 
 4. INTERACTION WITH MEMORY

 4.1 Introduction
 
 The arithmetic structure must interact with a memory 
 in order to obtain operands and to return results. The major 
 parameter which determines the method of communicating 
 between the DPUs and memory is the number of digits con- 
 tained in the memory byte, where 'byte' is used in its 
 generalized sense of operational data unit. 
 
 The methods applicable in two cases will be discussed; 
 those for which the memory byte is one digit and those for which 
 it is a number of digits. It is unnecessary to consider systems 
 in which several memory bytes are required to represent one 
 digit, because the elements of such a system would be structurally
 identical* to a system in which the memory byte is one
 digit.
 
 We will assume that memory micro-instructions make 
 up a small fraction of the micro-instructions executed by the 
 DPUs and that the number of locations in the memory is very 
 large. 
 
 *The major differences are the reduced width of the memory busses 
 and the added complexity of the DPU caused by the requirement to make 
 a number of memory accesses to execute one memory reference micro- 
 instruction. 
 
 
 These assumptions lead to the conclusion that the memory
 address should not be carried as part of the micro-instruction
 stream. Instead, a pointer to a register may be sent along with
 the micro-instruction (as _jF_i in Equations 2.1.1 through 2.1.3),
 since this would decrease the number of interconnections required
 to transmit this address information.
 
 4.2 Methods Applicable When the Memory Byte Is the Digit
 
 There are two methods of causing data transfers between 
 the DPUs and memory when the memory byte is one digit of the 
 number representation. 
 
 The simpler of the two methods, suggested by Comfort (8),
 is to cause all communication to take place via the PCU. The digits
 are placed in or removed from the DPUs by means of left shift
 micro-instructions. The left shift micro-instruction is defined
 so that the digit which will appear in the last DPU is fixed by the
 PCU when it issues the micro-instruction. To 'load' a register
 from memory, the PCU would alternately perform a read memory
 cycle and a left shift which causes the digit just read from memory
 to be the least significant digit of the register. To store a
 register, the PCU alternately performs left shifts and write
 memory cycles where the digit stored in the memory is the digit
 shifted out of the first DPU. A load and store of the same register can be
 accomplished simultaneously. To do this, the most significant 
 digit of the number to be loaded is obtained from memory. Then 
 the left shift micro-instruction is performed such that the digit 
 obtained from memory becomes the least significant digit. The 
 digit shifted into the PCU by that micro-instruction is then 
 stored in the memory as the most significant digit of the number 
 being stored. The next digit of the number to be loaded is then 
 obtained and the sequence above repeated. The sequence is 
 repeated as many times as there are digits in the numbers. 
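
 The simultaneous load and store can be sketched as follows. The memory
 is modeled as a Python list of digits, and operands are assumed to
 occupy consecutive addresses with the most significant digit first;
 both choices, and the function names, are assumptions made for the
 illustration.

```python
def left_shift(register, incoming):
    """Left shift micro-instruction.

    The PCU fixes the digit that enters the least significant position and
    receives the digit shifted out of the most significant position.
    """
    outgoing = register[0]
    register[:] = register[1:] + [incoming]
    return outgoing


def load_and_store(register, memory, load_addr, store_addr):
    """Load a new operand while storing the old register contents."""
    for k in range(len(register)):
        incoming = memory[load_addr + k]            # read memory cycle
        outgoing = left_shift(register, incoming)   # one left shift
        memory[store_addr + k] = outgoing           # write memory cycle


memory = [1, 2, 3, 0, 0, 0]
register = [7, 8, 9]
load_and_store(register, memory, load_addr=0, store_addr=3)
print(register, memory)
```

 One left shift is issued per digit, so the number of micro-instructions
 grows with the operand length. This is the cost that the second method
 trades against additional hardware.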
 
 The other method of loading and storing the arithmetic 
 registers involves the use of generally distributed memory buses, 
 sets of address registers, and scheduling mechanisms. There 
 are two sets of each of these items, one for loading operands and 
 one for storing results. This method requires only one micro- 
 instruction to be issued per data transfer. The diagram of the 
 system is presented as Figure 13. 
 
 The operation of the system is as follows. When the PCU
 detects a load or store operation, it transmits the memory address
 to the address register indicated by the free pointer register
 (PI or PO). If there are fewer address registers than
 DPUs, the storage of this new address must be preceded by a check
 that this address register is available*. If the register is free,
 the address is stored in the register. The pointer register is
 incremented by one, modulo the number of registers in the
 group. The appropriate micro-instruction is then issued to
 the first DPU by the primitive control unit. This DPU requests
 that its information digit be transferred to or from the storage
 unit. When the scheduling mechanism sends the signal to proceed,
 the DPU sends the value of the appropriate pointer
 register (PI_x or PO_x) to the address registers, which convert
 it to a memory address by a table-lookup process in the input
 address registers or output address registers (whichever is
 appropriate). It transmits or accepts the data digit, and passes
 the micro-instruction to its immediate neighbor, which then
 goes through the same cycle of events. The address which was
 referenced is also incremented to point to the location of the
 next digit of the operand.
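
 A minimal Python sketch of one group of address registers follows. It
 models only the bookkeeping described above (the free pointer, the
 availability check, the table lookup, and the address increment). The
 class and method names are invented, and the point at which a register
 is released, here after the last DPU has been served, is an assumption
 that the text does not state.

```python
class AddressRegisters:
    """One group of address registers (input or output)."""

    def __init__(self, count):
        self.address = [None] * count    # None marks a free register
        self.free_pointer = 0

    def allocate(self, memory_address):
        """PCU side: store an address in the register named by the free pointer.

        Returns the pointer value sent with the micro-instruction, or None
        when the register is still in use and the arithmetic unit must wait.
        """
        slot = self.free_pointer
        if self.address[slot] is not None:
            return None
        self.address[slot] = memory_address
        self.free_pointer = (slot + 1) % len(self.address)
        return slot

    def next_digit_address(self, slot):
        """DPU side: table lookup, then advance to the operand's next digit."""
        current = self.address[slot]
        self.address[slot] = current + 1
        return current

    def release(self, slot):
        """Assumed: the register becomes free once the last DPU is served."""
        self.address[slot] = None


def load(registers, slot, memory, dpu_count):
    """Each DPU in turn fetches its digit as the micro-instruction passes."""
    digits = [memory[registers.next_digit_address(slot)] for _ in range(dpu_count)]
    registers.release(slot)
    return digits


regs = AddressRegisters(2)
memory = list(range(100))
first = regs.allocate(10)
second = regs.allocate(20)
print(regs.allocate(30), load(regs, first, memory, 3), regs.allocate(30))
```

 Only one pointer value travels with the micro-instruction, which is the
 reduction in interconnections argued for in Section 4.1.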
 
 Note that since all "store" micro-instructions prior to a
 given "load" micro-instruction are guaranteed to have been performed
 by a given DPU, there is no danger that an inappropriate
 value of a given variable is used by the adder. The choice which
 the designer has in interfacing a digit oriented storage device to
 the arithmetic unit is between a method which entails the minimum
 additional hardware and a method which may be faster but
 which requires an extensive amount of additional hardware. The
 second method is expected to be faster by a factor of approximately
 the number of digits contained in each operand.

 *The arithmetic unit must wait until it is available.
 
 4.3 Methods Applicable When the Memory Byte Is a Number of Digits
 
 We will now discuss the methods of communicating between 
 the arithmetic unit and memories which transfer a number of digits 
 per transaction. One problem, not present when the memory byte 
 is the digit, then faces the designer. He must incorporate 
 serializing and deserializing mechanisms into the interfaces. 
 
 In the case of the first method discussed in the preceding 
 section, in which all communication takes place through the PCU, 
 the extension is very straightforward. The serializer and
 deserializer simply become part of the PCU, since only one data 
 transfer can be occurring at any given time in each direction. 
 
 The approach of employing a centralized serializer and 
 deserializer does not appear to be appropriate when the method 
 depicted in Figure 13 is extended. There are two methods of pro-
 viding communication between the memory and the arithmetic unit
 in this case. The first method, shown in Figure 14, uses regis- 
 ters capable of storing one memory byte to convert data lengths. 
 
 Each of the registers is interfaced with as many DPUs as
 there are digits in the memory byte. When a load micro-
 instruction reaches the first DPU connected to a given input
 register, the control circuitry associated with the register
 causes the appropriate information to be put into the register.*
 Each DPU connected to the register then gates in its digit
 from the register when it executes the load micro-instruction.
 Each DPU executes a store micro-instruction by storing the
 appropriate digit into its portion of the output register to which
 it is connected.** When all of the positions of a register have
 been filled, the contents of that register are stored in the
 memory.
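
 A minimal sketch of the buffer registers just described is given below. The class names, the representation of a memory byte as a tuple of digits, and the `position` argument (the index of the DPU within its group) are assumptions made for illustration only.

```python
class InputRegister:
    """Hypothetical one-byte input buffer shared by several DPUs."""

    def __init__(self, digits_per_byte):
        self.slots = [None] * digits_per_byte

    def load_digit(self, position, memory, address):
        if position == 0:
            # The first DPU of the group triggers the memory fetch.
            self.slots = list(memory[address])
        return self.slots[position]  # each DPU gates in its own digit


class OutputRegister:
    """Hypothetical one-byte output buffer shared by several DPUs."""

    def __init__(self, digits_per_byte):
        self.slots = [None] * digits_per_byte

    def store_digit(self, position, digit, memory, address):
        self.slots[position] = digit
        if all(s is not None for s in self.slots):
            # All positions filled: write the whole byte and clear the buffer.
            memory[address] = tuple(self.slots)
            self.slots = [None] * len(self.slots)
            return True
        return False


memory = {7: (1, 2, 3)}
inp, out = InputRegister(3), OutputRegister(3)
digits = [inp.load_digit(p, memory, 7) for p in range(3)]
written = [out.store_digit(p, d, memory, 8) for p, d in enumerate(digits)]
```

 Only the last store of a group causes a memory write, which is why a second load or store must wait for the whole group to finish the first one.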
 
 The storage address register scheme described in 
 Section 4.2 can also be used with this method, except that now
 addresses are associated with buffer registers, not DPUs. 
 
 *The register is filled only if all of the DPUs connected to the register
 have executed the preceding load micro-instruction.

 **The DPU must be inhibited from executing the store instruction until
 the preceding information has been sent to the store unit.

 Since data which a given DPU presents to its output
 register for storage is not guaranteed to have been stored before
 it performs subsequent micro-instructions, a mechanism must
 be included which assures that subsequent fetches will always
 retrieve the appropriate data. A simple method of guaranteeing
 this is to compare the address of the operand which is to be 
 fetched with the address of the last store operation. If they 
 are not the same, then obviously the memory will contain the 
 most recently calculated data when it is requested. A given 
 DPU cannot obey a store micro-instruction unless the store 
 buffer register is assigned to collect data for that store opera- 
 tion, therefore implying that all prior store instructions have 
 been completed. 
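
 The address comparison described above amounts to a single test, sketched here. The function name and the string results are hypothetical; "interlock" merely stands for whichever interlock mechanism the designer chooses.

```python
def load_source(load_address, last_store_address):
    """Return where a load may take its operand from.

    "memory"    - addresses differ, so memory already holds current data.
    "interlock" - addresses match; some DPU may not yet have stored its digit.
    """
    if load_address != last_store_address:
        return "memory"
    return "interlock"


decisions = [load_source(0x20, 0x10), load_source(0x10, 0x10)]
```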
 
 If, on the other hand, the address of the last store opera-
 tion is the same as that of the operand, there is a possibility that
 for some DPU the store has not been made prior to the load.
 This condition can be controlled by means of an interlock. This
 interlock could consist of altering the micro-instruction from a
 storage unit reference to a reference to the contents of the output
 register.* This may necessitate the inclusion of register-to-
 register transfer micro-instructions not otherwise required. A
 second method of implementing the required interlock consists
 of a micro-instruction** which would not proceed past the first
 DPU attached to an output register unless that register had
 stored all operands previously presented to it. This has the
 undesirable effect of requiring either two types of DPU or some
 way of altering the action of the universal DPU on the basis of
 whether it is a "first" unit or not. This method of byte length
 conversion has one distinct disadvantage. A second memory
 micro-instruction of the same type, i.e., the second of two
 loads or two stores, must not be allowed to be performed by
 the first DPU associated with a register until after the last
 unit associated with that register has completed the previous
 similar micro-instruction and the appropriate memory action
 has been taken.

 *This has some analogy to the Common Data Bus (24).

 **The most obvious micro-instruction meeting these conditions is a
 second store micro-instruction.

 [Figure 14: diagram not recoverable from the scanned text.]

 If this causes intolerable delays, a second method of
 byte length conversion could be employed. This is depicted in
 Figure 15. In this approach the information exchanged between
 the adder and the memory passes through shift register-like
 structures. An element of this structure can store one digit and
 will accept data from its input neighbor when its indicator shows
 that it does not contain information. The element subsequently
 causes its own indicator to turn on and sends a signal to its 
 input neighbor to turn that indicator off. These elements are 
 arranged in chains; one extremity of each chain is connected 
 to a DPU and the other is connected to a memory bus. There 
 must be two chains associated with each DPU, one of which 
 stages input operand digits, the other of which accumulates 
 digits until a memory byte is collected. The former is used 
 in loading operands from memory. The memory bus acts as 
 the input neighbor of the first element and the DPU receives 
 digits from the last element in each chain. The latter is used 
 in storing results in the memory. The DPU is the input 
 neighbor of the first element and the memory bus receives data 
 from the last element of each chain. The shift register chains 
 need not be equally long. For example, it is advantageous for 
 the store chain associated with the last DPU of a given memory 
 byte to be made smaller than the store chain receiving data 
 from the first DPU of that storage byte. The chain of the first 
 DPU must not only have room to store digits until the memory 
 may be accessed but must store digits while the storage unit byte 
 is being accumulated. 
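
 The behavior of one such chain can be sketched as follows. The class `Chain`, its method names, and the use of `None` to stand for an indicator that is off are assumptions made for this illustration; the sketch models one chain in one direction only.

```python
class Chain:
    """A chain of one-digit elements, each with a 'full' indicator."""

    def __init__(self, length):
        self.cells = [None] * length  # None means the indicator is off

    def offer(self, digit):
        """The input neighbor presents a digit to the first element."""
        if self.cells[0] is None:
            self.cells[0] = digit
            return True
        return False  # first element still full; the sender must wait

    def step(self):
        """Every empty element accepts data from its full input neighbor."""
        # Scan from the output end so each digit moves at most one place.
        for k in range(len(self.cells) - 1, 0, -1):
            if self.cells[k] is None and self.cells[k - 1] is not None:
                self.cells[k] = self.cells[k - 1]  # indicator k turns on
                self.cells[k - 1] = None           # indicator k-1 turns off

    def take(self):
        """The output neighbor removes the digit held by the last element."""
        digit, self.cells[-1] = self.cells[-1], None
        return digit


chain = Chain(3)
accepted = [chain.offer(5), chain.offer(6)]  # second offer must wait
chain.step()
chain.offer(6)
chain.step()
first = chain.take()
chain.step()
second = chain.take()
```

 For a load chain the memory bus calls `offer` and the DPU calls `take`; for a store chain the roles are reversed.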
 
 The address information can be handled very much like the
 methods above, viz. by a number of registers in which storage
 addresses can be stored and pointers indicating the storage
 address register that each data register is to use.
 
 More extensive checking is required to assure that the 
 references to each specific memory location will be done in the 
 order in which they were issued. Associated with each address 
 register must be two dependency registers and a flag. The 
 registers identify which other data transfers are also accessing 
 this memory location, and the flag indicates whether or not 
 the address register is currently in use. The registers that 
 are used to indicate dependencies must have a null state, indi- 
 cating no dependency. Both dependency registers point to address 
 registers of the other type.* One points to the operation which
 must be completed (on a memory byte basis) before its operation
 may be performed. Registers of this type are labelled $xPP_j$ in
 Figure 15, where j is the register number and x may be either I
 or O. I indicates that the register is associated with an input
 transfer, while O indicates output. The other dependency registers,
 labelled $xPF_j$, point to the last transfer found to be dependent on
 its transfer. These registers are used to eliminate unnecessary
 testing.
 
 *That is, the dependency registers associated with a load point to address 
 registers associated with store operations, while those associated with a 
 store point to address registers associated with load operations. 
 
 
 When a load or store appears in the instruction stream, 
 the memory address is sent to the storage unit control. When 
 the address register which is indicated by the appropriate free 
 register pointer is available, the memory address is placed into 
 it. The PCU is then allowed to issue the appropriate micro- 
 instructions to the DPUs. The memory address is also com- 
 pared with those contained in the currently active address 
 registers of the other type.* If there are any matches, the
 register number of the most recently initialized matching trans-
 fer is placed in the $xPP_j$ register of the transfer being initialized
 and the register number of the transfer being initialized is placed
 into the $xPF_j$ register of this matching transfer. If there are no
 matches, null is placed into $xPP_j$. The $xPF_j$ register is always
 set to null when a transfer is initialized.
 
 When a data transfer is requested, the transfer indicated 
 by $xPP_j$ is checked to be sure that it has been performed at least
 to the byte of the current request. The transfer must not be 
 allowed to take place until it satisfies the above condition. 
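
 The initialization and checking of the dependency registers can be sketched as below. This is a simplified model under stated assumptions: object references take the place of register numbers, `None` plays the role of the null state, the list `active` is kept in order of initialization, and the names `Transfer`, `initialize` and `may_transfer` are hypothetical.

```python
class Transfer:
    """One address register together with its dependency registers."""

    def __init__(self, kind, address):
        self.kind = kind        # "I" for a load, "O" for a store
        self.address = address
        self.bytes_done = 0     # memory bytes transferred so far
        self.pp = None          # xPP: transfer that must stay ahead of this one
        self.pf = None          # xPF: last transfer found dependent on this one


def initialize(active, kind, address):
    """Set up a new transfer and link it to the most recent match."""
    new = Transfer(kind, address)
    matches = [t for t in active if t.kind != kind and t.address == address]
    if matches:
        prior = matches[-1]     # most recently initialized matching transfer
        new.pp = prior
        prior.pf = new
    active.append(new)
    return new


def may_transfer(t):
    """A byte may move only if the xPP transfer has completed that byte."""
    return t.pp is None or t.pp.bytes_done > t.bytes_done


active = []
store = initialize(active, "O", 40)
load = initialize(active, "I", 40)
other = initialize(active, "I", 41)
blocked_before = may_transfer(load)
store.bytes_done = 1
allowed_after = may_transfer(load)
```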
 
 With this scheme for interfacing the memory to the DPUs, 
 it appears very desirable to pre-fetch operands. 
 
 While it is possible not to initiate the request for the 
 first memory byte until the PCU issues the appropriate com-
 mand to the DPUs, this introduces an avoidable delay equal to
 the time required by the memory to service this request. Of 
 course, when operand look ahead is employed and a branch 
 dependent upon the arithmetic properties of a result is encoun- 
 tered, the device must either cease operand look ahead or 
 fetch operands that may be required and find some way of 
 discarding operands which are not required. A micro- 
 instruction which causes the operand in the input shift- 
 register adjacent to the DPU to be discarded accomplishes 
 this function. This appears to be the only instance of non- 
 productive activity occurring in a limited connection arithmetic 
 unit. Unfortunately, non-productive activity takes place in the 
 DPUs after the branch is resolved, in addition to the non- 
 productive memory activity prior to its resolution. 
 
 5. OPERATIONAL SPECIFICATION OF THE MODULES 
 
 5.1 Introduction 
 
 The design parameters of a limited connection arith- 
 metic unit determine the number of distinct types of modules 
 required and the detailed specification of these modules. This 
 chapter discusses each of the modules with particular emphasis
 on the effects of the following parameters:

 1. the method of communicating with memory,
 
 2. the number of quotient digits determined at 
 each examination of the partial remainder, 
 
 3. the number of digits examined in normalizing 
 number representations, 
 
 4. whether a push-down stack is included in the 
 End Unit, 
 
 5. the method of performing addition and subtraction, 
 
 6. whether multiplier recoding is performed, and 
 
 7. the parameters of the number representation 
 scheme. 
 
 Item 1, the memory communication scheme, determines
 whether special modules, special data and control paths, and
 special micro-instructions will be included for implementing the
 communications paths between the arithmetic unit and memory.
 This area was discussed in detail in Chapter 4. The number of
 registers in the arithmetic unit for holding intermediate results
 is also dependent on the scheme for interfacing the arithmetic
 unit and memory. Items 2 and 3, the number of quotient digits
 per examination and the normalization parameter, will deter-
 mine the number of additional DPUs* which must have a direct
 data path to the PCU, and whether special modules are required to
 signal the PCU when information is available on these paths.
 Item 4 determines the complexity of the End Unit. Item 5
 determines the number of DPUs with which each DPU com-
 municates. Item 6 determines whether the complexity of the
 PCU is increased by containing a multiplier recoder within it.
 Item 7, the parameters of the number representation, deter-
 mines the size of all data registers and data paths. The method
 of performing arithmetic, item 5, also affects the size of the
 inter-DPU data paths.

 *The DPUs other than those which will have a direct data path to the
 PCU because of its role as $DPU_0$.
 
 5.2 The Digit Processing Unit
 
 5.2.1 The Role of the DPU 
 
 The DPUs collectively perform the fractional part pro- 
 cessing of the arithmetic unit. Each DPU contains one digit of 
 each of the active operands; the significance of the digits con- 
 tained in a DPU is inversely related to its 'distance' from the 
 PCU, as shown in Figures 1 and 2. Each DPU contributes to 
 the processing by executing a sequence of micro-instructions.
 The sequences executed by the DPUs of an arithmetic unit are
 identical except for minor substitutions or omissions.* A
 given micro-instruction in the sequence is executed by the DPUs
 from left to right (i.e., $DPU_1$, $DPU_2$, ..., $DPU_n$). At any
 given time there are a number of micro-instructions being
 processed by the DPUs.
 
 *Recoding and 'transmit digit to PCU' micro-instructions will, in 
 general, be replaced by a no-op or removed from the sequence 
 after being executed by some of the DPUs. 
 
 5.2.2 DPU Registers 
 
 Each DPU retains one digit of each of the active operands.
 There must be at least three registers distributed through the
 DPUs: an accumulator register, a multiplier-quotient register,
 and an operand register. Each DPU must also contain a register 
 to hold the micro-instruction it is executing. It must also have 
 a serial number counter if the arithmetic unit employs request- 
 response signals. This serial number is transmitted with the 
 inter-DPU data and identifies the micro-instruction to which the 
 data is to be applied. Additional registers may be distributed 
 through the DPUs. One possible use of such registers is to 
 hold the number temporarily displaced from the accumulator 
 while the accumulator registers are used to shift the number 
 that had been residing in an operand register that cannot be 
 shifted. Only one register need be included for this purpose. 
 A second use of such registers is to hold intermediate results 
 which are needed so soon after they are calculated that storing 
 them and retrieving them from memory would delay the pro- 
 cessing. The number of intermediate result registers that are 
 desirable to include for this purpose is dependent upon the 
 method of communicating between the arithmetic unit and memory. 
 The number is determined by trade-off considerations. The 
 decrease in the average time spent waiting for operands to
 become available must be compared with the additional hard-
 ware required to cause the decrease.
 
 5.2.3 The Micro-Instruction Repertoire 
 
 There are a number of micro-instructions which may
 be included in the repertoire of the DPUs in addition to those
 discussed in Section 2.3 and Chapter 4.

 The first of these micro-instructions causes the digits
 of a specific register to be placed on the inter-DPU data paths.
 These micro-instructions would be used during normalization
 and division. With one of these micro-instructions the PCU
 can obtain the information it requires to determine what
 additional processing is required to complete the operation.
 While the left shift micro-instruction may be used for this pur-
 pose, special micro-instructions have an advantage in some designs.
 For example, modules which determine when the information
 required by the PCU is available are much simpler when special
 micro-instructions are used.
 
 The second of these micro-instructions causes one of the
 normalization recodings (Equations 3.5.3 and 3.5.4) to be per-
 formed. These micro-instructions make it possible to reduce
 the number of shifts required to perform a recoding. When the
 recoding micro-instructions are not implemented and a number
 must be recoded, all the digits to be changed must be shifted
 into the PCU. All but the leading zero digits of the recoded
 number must then be shifted back into the DPUs.

 When the normalization recoding micro-instructions
 are implemented, a normalization recoding is begun by shifting
 out all digits which are to be recoded into leading zeros. The
 recoding micro-instruction is then issued to $DPU_1$, which passes
 to $DPU_2$ the indication that a recoding micro-instruction is to
 be performed. $DPU_2$ then indicates by the value of $G_2$ whether
 it will participate in the recoding.* $DPU_1$ recodes its digit when
 it receives this information. $DPU_2$ then passes to $DPU_3$ the
 indication that a recoding is to be performed. $DPU_3$ responds
 by indicating whether or not it will participate in the recoding
 by the value of $G_3$ which it sends to $DPU_2$. $DPU_2$ then recodes
 its digit. Each successive DPU goes through this process until
 the micro-instruction reaches the first DPU which cannot
 recode its digit. After this DPU sends its response to its left
 neighbor, it terminates the recoding by passing either no micro-
 instruction or a no-op micro-instruction to its right neighbor.
 
 *This is necessary because the recoding of the last digit is different 
 from the recoding of all the other digits. 
 
 
 A no-op and an execute micro-instruction must be
 included in the repertoire of the DPUs if they are designed
 to execute micro-instructions periodically and synchronously.
 No-op micro-instructions must be executed prior to any micro-
 instruction which requires information from neighboring DPUs.
 The number of no-ops which must be given after the set-up 
 micro-instruction is equal to the number of DPUs that must 
 send information to the DPU executing the micro-instruction. 
 These no-ops are followed by an execute micro-instruction. 
 Note that if each DPU saves the information about the last 
 micro-instruction it has set up, only one execute micro- 
 instruction is necessary. 
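
 The padding rule described above can be sketched as a simple expansion of the micro-instruction stream. The operation names, the tuple used for the set-up micro-instruction, and the `senders` table (the number of DPUs that must send information for each operation) are hypothetical.

```python
def expand(stream, senders):
    """Insert set-up, no-op and execute micro-instructions where needed.

    stream  : list of micro-instruction names
    senders : dict mapping a name to the number of DPUs that must send
              information to the DPU executing it (absent means zero)
    """
    out = []
    for op in stream:
        n = senders.get(op, 0)
        if n == 0:
            out.append(op)                # needs nothing from neighbors
        else:
            out.append(("SETUP", op))     # announce the operation
            out.extend(["NOOP"] * n)      # one no-op per sending DPU
            out.append("EXECUTE")         # perform the set-up operation
    return out


padded = expand(["LOAD", "ADD", "STORE"], {"ADD": 2})
```

 Because each DPU remembers the last micro-instruction it set up, the single `EXECUTE` code needs no operand.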
 
 The last of the additional micro-instructions that a 
 designer may wish to include in the repertoire of the DPU is 
 one that places a constant, such as zero, in a register. The 
 representation of the constant determines the details of the
 micro-instruction. In the most useful cases, where all of the
 digits of the constant are identical,* the DPUs do not have to
 cooperate with any of their neighbors in performing the micro-
 instruction. The value of the digit may be implicit in the micro- 
 instruction or it may be sent as the modifier value sent along 
 with the micro-instruction. 
 
 *The most important examples of numbers whose digits are identical 
 are zero, the largest number, and the smallest number. 
 
 5.2.4 The Sequencing and Coordination of the DPUs 
 
 The operation of the DPUs contained in an arithmetic 
 unit must be coordinated to obtain useful results. Each DPU 
 must execute the same sequence of micro-instructions, which
 is determined by the processing to be performed and the specific
 operand values. After executing micro-instruction j-1, a
 typical DPU, $DPU_i$, must determine the value of ${}_jG_i$ and place
 this value on its inter-DPU data lines (see Equations 2.1.1
 through 2.1.3). This information is required by $DPU_{i-\alpha}$,
 ..., $DPU_{i-2}$, and $DPU_{i-1}$ to perform micro-instruction j. $DPU_i$
 also passes the $j$th micro-instruction and modifier to $DPU_{i+1}$ so
 that $DPU_{i+1}$ will determine ${}_jG_{i+1}$. When $DPU_i$ receives ${}_jG_{i+1}$,
 ..., ${}_jG_{i+\alpha}$ it performs micro-instruction j and begins the pro-
 cedure for micro-instruction j+1.
 
 The method of achieving this coordination is determined 
 by the method of implementing the DPUs. We will discuss how 
 this coordination may be achieved for two implementations which 
 are at the extremes of implementation philosophies. 
 
 In the first method of implementing the arithmetic unit,
 all of the DPUs periodically and synchronously execute micro-
 instructions. Each DPU goes through the same two-cycle opera-
 tion. On the first cycle each DPU executes the micro-
 instruction it has just received. On the second cycle each DPU
 simultaneously passes the micro-instruction it has just executed
 to its right neighbor and receives from its left neighbor the
 micro-instruction which that DPU has just executed.

 [Figure 16 consists of three flow diagrams, labelled MAIN SEQUENCE,
 ACKL, and ACKR; the diagrams are not recoverable from the scanned text.]

 NOTE: Required 'G' values are available to $DPU_i$ when all DPUs
 which must send information to $DPU_i$ indicate that the
 information they send is valid ($V = 1$) and send a serial
 number equal to the serial number counter of $DPU_i$.
 That is, the following condition must be satisfied:

 $$\bigwedge_{e=1}^{\alpha} \left[ (V_{i+e} = 1) \cdot (SN_{i+e} = SN_i) \right] = 1 \ \text{(TRUE)}$$

 Figure 16. Flow diagram of the control logic of $DPU_i$.
 
 Coordination of the DPUs can be accomplished by the
 use of no-op micro-instructions. Each micro-instruction which
 requires the cooperation of several DPUs is preceded by a set-up
 micro-instruction and a number of no-op micro-instructions.
 These micro-instructions assure that the required information
 is available when a DPU executes the micro-instruction.
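
 The way micro-instructions ripple through a synchronous chain of DPUs can be illustrated with a short simulation. It is a sketch only: the function name, the list representation of the micro-instruction registers, and the choice to record the registers after each pass are assumptions, and the execution performed on the first cycle is not modeled.

```python
def simulate(stream, n_dpus, periods):
    """Track which micro-instruction each DPU holds after every period."""
    held = [None] * n_dpus      # micro-instruction register of each DPU
    pending = list(stream)      # micro-instructions still to be issued
    history = []
    for _ in range(periods):
        # Cycle 1: every DPU executes what it holds (not modeled here).
        # Cycle 2: each DPU passes its micro-instruction to the right and
        # receives one from the left; the leftmost DPU takes the next one
        # issued, or nothing if the stream is exhausted.
        incoming = pending.pop(0) if pending else None
        held = [incoming] + held[:-1]
        history.append(list(held))
    return history


history = simulate(["A", "B"], n_dpus=3, periods=4)
```

 In this model micro-instruction "B" reaches a given DPU one period after "A" does, which is why enough no-ops must separate a set-up from its execute for the neighbors' information to be in place.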
 
 At the other extreme of implementation philosophies are
 the arithmetic units in which all activities are controlled by
 request-response signals. The control logic of each DPU may
 be composed of three interacting sequential machines in this
 case. The flow diagrams of these machines are shown in
 Figure 16, where the control signals are:

 $T_i$, which indicates that the lines between $DPU_i$
 and $DPU_{i+1}$ which carry the micro-
 instruction operation code and modifying data
 ${}_jF_i$ are valid,

 $ACK_i$, which indicates that $DPU_i$ has accepted the micro-
 instruction from $DPU_{i-1}$,

 $SN_i$, a serial number which indicates the micro-
 instruction for which the ${}_jG_i$ data was determined,
 and

 $V_i$, which indicates that the ${}_jG_i$ and $SN_i$ lines are stable
 and may be examined.
 
 All of the sequential machines of all DPUs are initialized
 to their respective state 1 in the figure. The $V_i$ signals are
 initialized to 1, the $T_i$ and $ACK_i$ signals are initialized to 0.

 $DPU_i$ begins the execution of a micro-instruction when
 it is in its dormant state and $T_{i-1}$ goes to 1. This indicates
 that $DPU_{i-1}$ is transmitting to it the next micro-instruction it
 is to execute. $DPU_i$ gates this micro-instruction into its micro-
 instruction register and activates the ACKL and ACKR machines.
 ACKL indicates to $DPU_{i-1}$ that it has received the micro-
 instruction, while ACKR passes the micro-instruction to $DPU_{i+1}$.
 If $DPU_i$ must send information to other DPUs which they require
 to execute this micro-instruction, it turns off $V_i$. It then
 increments the serial number, $SN_i$, determines the required
 information and places the information and identifying serial
 number on the inter-DPU data paths. $DPU_i$ then turns $V_i$ back
 on and begins checking the inter-DPU data paths for the infor-
 mation that it requires to perform the micro-instruction. When
 this information becomes available or if information from other
 DPUs is not required, $DPU_i$ performs the micro-instruction
 (i.e., it changes the value of some operand digit it contains).
 
 When the micro-instruction has been performed, ACKL is 
 in the dormant state, and ACKR is in state 4 or the dormant 
 state, the main sequence goes into the dormant state to wait 
 for the next micro-instruction. 
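
 The test that a DPU applies before performing a micro-instruction, as stated in the note to Figure 16, can be sketched as a single predicate: every one of the DPUs that must supply information has its validity signal on and presents a serial number equal to the DPU's own. The function name and the list representation of the V and SN lines are assumptions; the lists are assumed to extend far enough to the right, the missing entries being supplied by the End Unit.

```python
def g_values_available(i, alpha, V, SN):
    """True when the alpha right neighbors of DPU i present valid data
    tagged with the serial number DPU i is currently working on."""
    return all(
        V[i + e] == 1 and SN[i + e] == SN[i]
        for e in range(1, alpha + 1)
    )


V = [1, 1, 1, 0]
SN = [3, 3, 3, 3]
ready_0 = g_values_available(0, 2, V, SN)   # both neighbors valid and matching
ready_1 = g_values_available(1, 2, V, SN)   # V[3] is off, so DPU 1 must wait
stale = g_values_available(0, 2, V, [4, 3, 3, 3])  # neighbors lag one step
```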
 
 5.2.5 The Number of Connections to a DPU 
 
 Figure 3 and the discussion above may be used to deter-
 mine the number of electrical connections (pins) that must be
 made to each DPU. In an arithmetic unit which uses request-
 response signals this number is

 $$C_{RR} = P + 4 + 2(OPS + SF) + (\alpha + 1)(SG + SSN + 1) + MEM \qquad (5.2.1)$$

 where

 $C_{RR}$ is the number of connections required by each DPU
 when request-response signals are used,
 P is the number of pins required to power the DPU,
 OPS is the number of bits required for the operation
 code of the micro-instruction,
 SF is the number of bits of the modifier value
 accompanying the micro-instruction,
 SG is the number of bits required to represent a ${}_jG_i$
 value,
 SSN is the number of bits in the serial number, and
 MEM is the number of pins required by the DPU to
 communicate with memory.
 
 If we assume that the only values of ${}_jF_i$ and ${}_jG_i$ are those
 required to perform addition and shifts, SF and SG are then

 $$SF = 1 + \operatorname{Lg}(r - \ell + 1), \text{ and}$$

 $$SG = \operatorname{Lg}(2r(r-1) + 1),$$

 where

 r is the radix of the number system,
 $\ell$ is its redundancy parameter, and
 Lg(x) is the smallest integer equal to or greater than
 $\log_2(x)$.

 If we further assume that the PCU does not have any unusual
 data requirements so that SN has to take on $\alpha$ values, or
 $SSN = \operatorname{Lg}(\alpha)$, (5.2.1) becomes

 $$C_{RR} = P + 6 + 2 \cdot OPS + 2 \cdot \operatorname{Lg}(r - \ell + 1) + (\alpha + 1) \cdot [\operatorname{Lg}(2r(r-1) + 1) + \operatorname{Lg}(\alpha) + 1] + MEM \qquad (5.2.2)$$
 
 When the arithmetic unit is implemented so that all
 DPUs execute micro-instructions periodically and synchronously,
 the request-response signals and serial numbers are unnecessary.
 Synchronizing signals are required instead; the number of con-
 nections then becomes

 $$C_s = P + 2(OPS + SF) + (\alpha + 1) \cdot SG + SYNC + MEM \qquad (5.2.3)$$

 where

 $C_s$ is the number of connections required by each
 DPU in a synchronous arithmetic unit, and
 SYNC is the number of connections required to
 transmit synchronizing signals to a DPU.

 If the same assumptions are made regarding the values
 taken on by ${}_jF_i$ and ${}_jG_i$ as were made above, this becomes

 $$C_s = P + 2 + 2 \cdot OPS + 2 \cdot \operatorname{Lg}(r - \ell + 1) + (\alpha + 1) \cdot \operatorname{Lg}(2r(r-1) + 1) + SYNC + MEM. \qquad (5.2.4)$$
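
 The two pin-count expressions, (5.2.1) and (5.2.3), can be evaluated with a few lines of code. The sketch below is illustrative: the function names are hypothetical, `ell` stands for the redundancy parameter, and the numerical values in the example are arbitrary and are not taken from any design in this report.

```python
def Lg(x):
    """Smallest integer equal to or greater than log2(x), for integer x >= 1."""
    return (x - 1).bit_length()


def field_sizes(r, ell):
    """SF and SG when only addition and shifts are required."""
    SF = 1 + Lg(r - ell + 1)
    SG = Lg(2 * r * (r - 1) + 1)
    return SF, SG


def pins_request_response(P, OPS, alpha, r, ell, MEM):
    """Equation (5.2.1) with SSN = Lg(alpha)."""
    SF, SG = field_sizes(r, ell)
    SSN = Lg(alpha)
    return P + 4 + 2 * (OPS + SF) + (alpha + 1) * (SG + SSN + 1) + MEM


def pins_synchronous(P, OPS, alpha, r, ell, MEM, SYNC):
    """Equation (5.2.3)."""
    SF, SG = field_sizes(r, ell)
    return P + 2 * (OPS + SF) + (alpha + 1) * SG + SYNC + MEM


# Arbitrary example values, chosen only to exercise the formulas.
c_rr = pins_request_response(P=2, OPS=4, alpha=2, r=4, ell=3, MEM=0)
c_s = pins_synchronous(P=2, OPS=4, alpha=2, r=4, ell=3, MEM=0, SYNC=1)
```

 For these values the request-response design needs more pins per DPU than the synchronous one, the difference coming from the four handshake lines and from the serial number and validity lines in each of the $\alpha + 1$ sets of inter-DPU signals.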
 
 5.3 The Primitive Control Unit 
 
 5.3.1 The Role of the PCU 
 
 The main function of the PCU is to convert a sequence of
 instructions (e.g., add, multiply, divide) into a sequence of micro-
 instructions which must be performed by the DPUs and to issue
 these micro-instructions to $DPU_1$. This conversion process is
 similar to the process that the adder control logic of the IBM
 7094 (11) performs in interpreting instructions. The major
 difference between the two processes is that the multiplication 
 algorithm must be right-directed in the limited connection arith- 
 metic unit, as discussed in Section 3.3. 
 
 The PCU may also perform subsidiary functions, 
 such as processing exponents or communicating with the 
 memory. 
 
 5.3.2 The Registers in the PCU 
 
 The PCU must contain an instruction register which 
 contains the instruction currently being converted into micro- 
 instructions. It also contains the integer extension of the 
 accumulator, which must be three digits in length (see Section 
 3.3). The PCU must also have a counter to control the number 
 of shifts and the number of repetitive steps during multiplication 
 and division. 
 
 If the arithmetic unit is implemented with request-
 response signals and 'sense micro-instruction' detectors are
 not used, the PCU must contain a serial number counter
 identical to those in the DPUs. The PCU must be able to store
 the value of several multiplier digits if it recodes multipliers.
 Memory byte assembly and disassembly registers are required
 in the PCU of arithmetic units that communicate with memory
 through the PCU and in which the memory byte contains several
 digits.

 The exponents of all operands whose fractional parts are
 contained in the DPUs may be contained in and processed by
 the PCU to obtain higher performance. It is possible to pro-
 cess exponents in a special module or in a second set of DPUs, 
 but these choices tend to increase the time between when the 
 manipulation of the exponents is begun and when its result is 
 available. Since the result of the exponent manipulation deter- 
 mines the processing to be performed on the fractional parts of 
 the operands, such delays decrease performance. 
 
 5.3.3 The Sequencing of the PCU 
 
 The PCU is sequenced very much like the DPUs are 
 (see Figure 16). The PCU does not have an ACKL machine, 
 since it has no module to its left. When the PCU executes a 
 micro-instruction it is primarily determining the next micro-
 instruction that it must issue to $DPU_1$.
 
 5.3.4 The Connections to the PCU 
 
 The PCU must communicate with three major com- 
 ponents of the computer system. The first of these is the 
 complex of DPUs; these connections are shown in Figure 1. 
 In arithmetic units where the PCU requires from the DPUs 
 only that information which it gets because it is in effect
 '$DPU_0$', the number of pins required by the PCU for this
 information is less than the number required by a DPU.
 The PCU requires one, rather than two, sets of connections
 for issuing micro-instructions. It also requires one fewer
 set of inter-DPU signals (${}_jG_i$, $SN_i$, and $V_i$) in this case since
 it has no left neighbor.
 
 If exponents are processed external to the PCU, some 
 connections will be required to indicate what processing is 
 required and to return an indication of the results of the pro- 
 cessing. If a special module is employed to process exponents, 
 the number of connections required by the PCU to communicate 
 with the special module can be kept small. If a second complex 
 of DPUs is used to process exponents, the number of connections 
 can be expected to be quite large. 
 
 The second component of the computer system with which 
 the PCU must communicate is the memory unit, from which it 
 obtains instructions and possibly obtains operands and returns 
 results. The number of pins required is determined by the 
 memory unit byte, the number of bits required to address the 
 memory, and whether results must be stored. 
 
 Finally, the PCU must be connected to power sources 
 and possibly synchronizing signals. The number of pins 
 required for this purpose is the same as the number of pins 
 required by a DPU. 
 
 5.4 The End Unit 
 
 The End Unit performs two major functions in the arith- 
 metic unit. The first function is to act like a terminator for the 
 DPUs. Terminating a set of DPUs with an End Unit is analogous 
 to terminating a transmission line with its characteristic impe- 
 dance: the DPUs operate as if DPUs extended indefinitely to the 
 right. The End Unit supplies all of the signals and data that are 
 required by the DPUs but are not supplied by other DPUs or the 
 PCU. 
 
 The second function of the End Unit is to cause all cal- 
 culations to be performed with the maximum possible precision. 
 It does this by saving the digits shifted out of the last DPU in one 
 or more push-down stacks. These digits can then be returned 
 to the DPUs when left shifts are given or be used in forming
 sums.
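
 The saving and returning of digits described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the names EndUnit, shift_right, and shift_left, the representation of a register as a list of signed digits, and the choice to supply a zero digit when the stack is empty are all hypothetical and are not taken from the design itself.

```python
class EndUnit:
    """Toy model of one push-down stack in the End Unit (hypothetical names)."""

    def __init__(self):
        # Digits shifted out of the last DPU; the most recent digit is on top.
        self.stack = []

    def right_shift_in(self, digit):
        """Save the digit shifted out of the rightmost DPU."""
        self.stack.append(digit)

    def left_shift_out(self):
        """Return the most recently saved digit; assume 0 if none is saved."""
        return self.stack.pop() if self.stack else 0


def shift_right(register, end_unit):
    """Shift a register one place right, saving the lost digit in the End Unit."""
    end_unit.right_shift_in(register[-1])
    return [0] + register[:-1]


def shift_left(register, end_unit):
    """Shift a register one place left, refilling from the End Unit."""
    return register[1:] + [end_unit.left_shift_out()]


end_unit = EndUnit()
a = [1, 0, -1, 1]                      # signed digits, most significant first
shifted = shift_right(shift_right(a, end_unit), end_unit)
restored = shift_left(shift_left(shifted, end_unit), end_unit)
print(shifted, restored)
```

 In this sketch two right shifts followed by two left shifts restore the original digits, which is the sense in which no precision is lost.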
 
 The complexity of the End Unit is essentially independent
 of the complexity of the DPUs or the PCU. The End Units of
 arithmetic units designed with request-response signals must
 have a serial number counter. If an End Unit is used to increase
 the precision of calculation it must have a micro-instruction
 register in addition to the push-down stacks for holding the digits
 shifted out of the DPUs. The number and capacity of the push-down
 stacks are determined by economic considerations. In
 the simplest scheme, the End Unit has one stack which is
 associated with the A register. The register cannot have
 shifting capability in this case.
 
 The End Unit requires fewer pins than either the PCU
 or the DPUs. Not only does the End Unit need only one
 micro-instruction bus, but it needs only one validity signal and one
 serial number bus. The End Unit also requires one less
 inter-DPU data bus. The connection to the End Unit of a typical
 arithmetic unit is shown in Figure 1.

 The control of the End Unit in an arithmetic unit
 employing request-response signals is very much like the
 control of a DPU, shown in Figure 16. It does not require
 the ACKR machine since it does not have any right neighbors.
 
 *In Figure 16, $i = n+1$ for the End Unit. The End Unit must supply
 multiple $V_{i}$, $SN_{i}$, and ${}_{j}G_{i}$ signals. We will define that
 $V_{n+1} = V_{n+2} = \dots = V_{n+a}$ and $SN_{n+1} = SN_{n+2} = \dots = SN_{n+a}$
 and will interpret ${}_{j}G_{n+1}$ to read ${}_{j}G_{n+1}, {}_{j}G_{n+2}, \dots, {}_{j}G_{n+a}$.
 $a$ is the maximum of the individual $a$ values of the micro-instruction
 repertoire of the DPUs.
 
 5.5 Exponent Arithmetic Unit
 
 There are three methods of processing exponents in a 
 limited connection unit. The first method is to process the 
 exponents in a second complex of DPUs . This method will 
 result in a low performance arithmetic unit and will require 
 the PCU to have a large number of electrical connections. 
 The second method is to perform exponent processing in the 
 PCU. 
 
 This method makes it possible to obtain higher perfor- 
 mance and to decrease the number of pins required by the PCU. 
 It, however, increases the complexity of the PCU by requiring 
 the exponents of all active operands to be stored in the PCU. 
 
 The third method, employing a special module to hold
 and process exponents, appears to make it possible to achieve
 high performance while not increasing the complexity or the
 number of pins of the PCU excessively.
 
 The Exponent Arithmetic Unit would best be designed 
 to send to the PCU only the information that it requires to con- 
 trol the processing of the DPUs. For example, if an addition 
 is to be performed, the Exponent Arithmetic Unit would respond 
 with one of the following: 
 
 a) shift A register, 
 
 
 b) shift register, 
 
 c) issue add micro-instruction. 
 
 The Exponent Arithmetic Unit must have commands to set up 
 multiplication and division in addition to those which have 
 analogies to the micro-instructions of the DPUs . These set up 
 commands cause the exponent of the result to be determined 
 and the repetitive step counter to be set. The PCU would then 
 precede each of the repetitive steps with an inquiry of the 
 Exponent Arithmetic Unit, which would indicate whether 
 another repetitive step is to be performed or if the operation 
 has been completed. 
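
 The exchange between the PCU and the Exponent Arithmetic Unit can be sketched in Python as follows. This is a minimal model under stated assumptions, not the design itself: the class and method names are hypothetical, the second register is called "other" because its name is not legible here, alignment is assumed to proceed one digit position per inquiry by shifting the operand with the smaller exponent, and the result exponent of a multiplication is assumed to be the sum of the operand exponents.

```python
SHIFT_A = "shift A register"
SHIFT_OTHER = "shift other register"
ISSUE_ADD = "issue add micro-instruction"


class ExponentArithmeticUnit:
    """Toy model: holds exponents and answers inquiries from the PCU."""

    def __init__(self, exp_a, exp_other):
        self.exp_a = exp_a
        self.exp_other = exp_other
        self.exp_result = None
        self.steps_left = 0

    def add_inquiry(self):
        """Tell the PCU what to do next for an addition (assumed policy)."""
        if self.exp_a < self.exp_other:
            self.exp_a += 1          # one right shift raises the exponent by one
            return SHIFT_A
        if self.exp_other < self.exp_a:
            self.exp_other += 1
            return SHIFT_OTHER
        return ISSUE_ADD

    def setup_multiply(self, repetitive_steps):
        """Set-up command: fix the result exponent and load the step counter."""
        self.exp_result = self.exp_a + self.exp_other
        self.steps_left = repetitive_steps

    def step_inquiry(self):
        """Tell the PCU whether another repetitive step is to be performed."""
        if self.steps_left == 0:
            return "done"
        self.steps_left -= 1
        return "step"


def pcu_add(eau):
    """PCU side of an addition: inquire until the add may be issued."""
    responses = []
    while True:
        response = eau.add_inquiry()
        responses.append(response)
        if response == ISSUE_ADD:
            return responses


print(pcu_add(ExponentArithmeticUnit(3, 5)))
```

 The point of the sketch is that the PCU never sees the exponents themselves; it receives only the few responses it needs to control the DPUs.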
 
 *Each inquiry could be overlapped with the previous step of the
 algorithm.

 5.6 The Sense Micro-Instruction Detector

 Another PCU function which may be performed external
 to the PCU is that of determining when the necessary information
 is available from the DPUs. This function may be performed
 within the PCU just as it is within each of the DPUs.
 However, performing some or all of this function in other
 modules reduces the complexity and the pin requirements of
 the PCU. The unit that detects micro-instructions which were
 (or could have been) issued to examine the digits of one of
 the operands is placed between the last DPU which supplies
 information to the PCU and the first DPU which does not,
 as shown in Figure 17. When one of these micro-instructions
 reaches the sense micro-instruction detector, this unit sends
 a signal to the PCU indicating that it has reached the detector.
 Hence, it indicates to the PCU when the required information
 is available. The PCU then does not have to receive the
 serial numbers associated with the data it requires.

 [Figure 17. Sense micro-instruction detector placed between DPUs (diagram not reproduced)]
 
 If unique micro-instructions are employed for this
 sensing operation, this simple scheme will suffice. However, any
 micro-instruction which causes operands to be placed on the
 inter-DPU data connections may be employed to sense operands
 contained in the DPUs. Some mechanism must
 be included in this case to distinguish the instances in which
 these micro-instructions are being used to sense an
 operand from those in which they are not. The basic method employs
 a counter which contains the count of the subsequent 'sense-type'
 instructions which were not launched for sense purposes.
 There are two possible implementations of the sense
 micro-instruction detector when a counter must be included.
 
 
 In the first, shown in Figure 18, the counter is located 
 in the PCU; the detector module is little different from that 
 used if unique micro-instructions were employed for sensing 
 DPU contents. The major difference is that a second sense- 
 type micro-instruction will not be allowed to propagate beyond 
 the sense micro-instruction detector until a proceed signal 
 is received from the PCU. This proceed signal indicates that 
 the last sense micro-instruction has been tallied on the counter. 
 
 In the second, shown in Figure 19, the counter is in 
 the detector. This has the advantage of minimizing the amount 
 of hardware in the PCU. It also is less likely to cause delays 
 because the signal to increment the counter is sent at the same 
 time the micro-instruction is launched. Hence, incrementing 
 the counter is overlapped with the micro-instruction propagating 
 through the DPUs.
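
 The second implementation can be sketched in Python as follows. This is a simplified model under stated assumptions: the names SenseDetector and run are hypothetical, every launch is taken to complete before any arrival (standing in for the overlap of counting with propagation), and the PCU is assumed to launch no further sense-type micro-instruction behind a sensing one until the detector has signalled.

```python
class SenseDetector:
    """Toy detector holding the counter of non-sensing sense-type launches."""

    def __init__(self):
        self.pending_non_sense = 0

    def increment(self):
        """Sent by the PCU when it launches a sense-type micro-instruction
        that is not intended for sensing."""
        self.pending_non_sense += 1

    def arrive(self, is_sense_type):
        """Called when a micro-instruction reaches the detector.
        Returns True when the PCU should be signalled."""
        if not is_sense_type:
            return False
        if self.pending_non_sense > 0:
            self.pending_non_sense -= 1   # tally it off; not a real sense
            return False
        return True


def run(launches):
    """launches: list of (is_sense_type, for_sensing) in launch order."""
    detector = SenseDetector()
    for is_sense_type, for_sensing in launches:
        if is_sense_type and not for_sensing:
            detector.increment()
    return [detector.arrive(is_sense_type) for is_sense_type, _ in launches]


signals = run([(True, False), (False, False), (True, False), (True, True)])
print(signals)
```

 Only the last micro-instruction in the example was launched for sensing, so it is the only one for which the detector signals the PCU.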
 
 [Figure 18. Sense micro-instruction detector with the counter located in the PCU (diagram not reproduced)]

 [Figure 19. Sense micro-instruction detector with the counter located in the detector (diagram not reproduced)]
 
 6. SUMMARY AND CONCLUSIONS 
 
 6.1 Discussion of Results

 Section 1.1 describes the characteristics that a
 mechanism must have in order for it to be particularly
 appropriate for implementation in the newly emerging
 technologies commonly called 'large scale integration'
 or 'LSI'. This paper presents a design for an arithmetic
 unit which admirably meets those requirements.

 One of the primary desiderata of a unit to be implemented
 in LSI is that the item be composed of a small
 number of rather complex module types (i.e., the modules
 employ a large number of logic elements). The approach
 proposed in this paper will yield designs that will consist
 of from three to eight types of modules. Furthermore,
 the complexity of these modules can be adjusted over a
 wide range by such factors as the number system and
 algorithms used, so that a design may be tailored to the
 technology with which it is to be implemented.

 Another major desideratum of designs to be implemented
 in LSI is that each module must require a relatively
 small number of connections (i.e., pins) to communicate
 with its environment.
 
 
 The number of connections required by the modules
 of a limited connection arithmetic unit is limited by the need
 of each module to communicate with a small number of other
 modules. The major module, the Digit Processing Unit, must
 communicate with either two or four other modules. The
 End Unit must correspondingly communicate with one or two
 other modules. The number of modules that communicate
 with the Primitive Control Unit is no less than the number
 that communicate with the End Unit.

 A third desideratum of designs to be implemented in
 LSI is that no inter-module signal be routed to a large number
 of modules. No inter-module signal in arithmetic units organized
 as proposed in this paper need be sent to more than three
 modules; the number may be decreased to two or one with
 attendant decreases in performance.

 *This case requires less than 50% more pins because of the nature
 of the signals.

 **Arithmetic units in which the modules communicate with a larger
 number of other modules can be expected to have higher performance.
 
 Hence, the approach proposed in this paper is very
 well suited to designing arithmetic units in LSI. Furthermore,
 the proposed organization has several advantages over the
 other arithmetic unit organizations proposed for implementation
 in LSI (9, 10, 12) because it is operationally a single
 multi-purpose arithmetic unit rather than a multiplicity of
 special-purpose units.
 
 The first advantage is that no effort is required of the 
 programmer, compiler, assembler, or monitoring program to 
 take full advantage of the potential parallelism of the arithmetic 
 unit. The arithmetic unit is organized so that the operations 
 within a single instruction stream that can be performed con- 
 currently are evoked automatically. 
 
 The second advantage of the proposed scheme is that no 
 unnecessary processing is performed when a conditional branch 
 is encountered in the instruction stream. After the controlling 
 element (the PCU) is presented with a conditional branch 
 instruction, it can immediately initiate the testing required to 
 resolve the branch. Computers with multiple execution units 
 may have to delay the testing until the operand is available. 
 While this operand is being formed by one of the execution units
 and is unavailable for testing, the instruction decoder either
 waits (with the attendant loss of performance), or goes into a
 mode in which it continues to decode instructions and issues
 them conditionally to the execution units. The execution units 
 are allowed to proceed but are not allowed to make any irrevoc- 
 able changes to the state of the computer until the branch 
 decision is made. This requires a great deal of interlocking 
 and control hardware. 
 
 Like the other organizations proposed for implementing 
 arithmetic units in LSI (9, 10, 12), the arithmetic unit proposed 
 in this paper has the property of modularity. That is, the word 
 length is determined by the number of modules and not by the 
 design of the modules. The same basic building blocks could 
 therefore be used to construct a variety of arithmetic units 
 with a wide range of computational capability. 
 
 6.2 Suggestions for Related Work
 
 The number system and arithmetic algorithms are the 
 principal factors with which the design of a limited connection 
 arithmetic unit may be adjusted to a specific technology and to a 
 specific performance requirement. The designer has no general 
 analytic tools to aid him in making these choices. He must 
 therefore go through an iterative process of selecting tentative 
 parameters, designing an arithmetic unit based on these
 parameters, and evaluating the desirability of the resulting design.
 Studies which would be particularly useful to the designer of 
 limited connection arithmetic units include: 
 
 1. the determination of the complexity of a
 signed-digit adder as a function of its 
 number system, 
 
 2. the determination of the improvement in 
 performance that will occur if multiplier 
 recoding is employed or if additional digits 
 of the partial remainder are examined during 
 division, 
 
 3. the determination of the complexity of multi- 
 plier recoders, quotient digit selectors, and 
 normalization control circuitry. 
 
 Reliability and availability considerations were not 
 addressed in this paper. The arithmetic unit as proposed in 
 this paper will operate properly only if all the modules are oper- 
 ating properly. The modules can be designed so that they will 
 stop if they detect an error, so that maintenance becomes less 
 burdensome. Determining organizational modifications that are 
 necessary to yield an arithmetic unit that will operate properly 
 in the presence of failures is a very important area to be investi- 
 gated. 
 
 Another unresolved problem in the area of the organization
 of the limited connection arithmetic unit is the provision
 of a reasonable method of performing multiple precision addition
 and subtraction. The method which would have to be used in the
 arithmetic unit as described here requires very many shift
 micro-instructions to be performed whenever radix point alignment
 is necessary. The digits shifted out of the active adder
 register during radix point alignment are not shifted into another
 active register, as they are in the classical von Neumann arithmetic
 unit. Instead, they are shifted into the End Unit, from
 which they can be returned to the active registers only by left
 shifts. Reconstructing these digits from the initial operands
 would also require a significant amount of shifting.
 
 A suggestion for improving the performance of the
 limited connection arithmetic unit was recently made by
 Robertson (20). He observed that more nearly optimal normalization
 and quotient digit selection algorithms could be employed if
 each zero digit is assigned the sign of the first non-zero digit
 to its right. For example, if a number to be normalized has
 the form 10 ... 01 , it would be clear that the number is normalized
 after examining the first zero digit, whereas all of the
 zero digits would have to be examined and shifted if the zero
 digits were unsigned. Organizing the mechanism for associating
 signs with zero digits so that the appropriate zero digit receives
 the sign is the fundamental problem to be solved in applying this
 technique to limited connection arithmetic units. If a sum has
 a number of adjacent zero digits, the DPU containing the first
 of these zero digits will have performed a number of
 micro-instructions before the sign to be associated with that zero
 digit arrives at the DPU. These micro-instructions may have
 made additional copies of the zero, sent it to another DPU, or
 obliterated it. Developing a scheme for keeping the appropriate
 records with a reasonable amount of hardware appears to be a
 challenging problem.
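
 The sign assignment itself, apart from the timing problem just described, can be sketched in Python as follows. The function name sign_zeros and the list representation (signed digits, most significant first) are assumptions made for the example. The sketch scans the whole number at once from right to left, which is exactly what the distributed DPUs cannot do.

```python
def sign_zeros(digits):
    """Return a sign (+1, -1, or 0) for every digit position.

    A non-zero digit keeps its own sign. A zero digit receives the sign of
    the first non-zero digit to its right, or 0 if there is none.
    """
    signs = [0] * len(digits)
    sign_to_right = 0
    for i in range(len(digits) - 1, -1, -1):   # scan right to left
        if digits[i] != 0:
            sign_to_right = 1 if digits[i] > 0 else -1
        signs[i] = sign_to_right
    return signs


print(sign_zeros([1, 0, 0, 1]))
print(sign_zeros([1, 0, 0, -1]))
```

 With signed zeros, the first zero after the leading digit already tells whether the trailing non-zero digit is positive or negative, so the remaining zeros need not be examined.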
 
 
 APPENDIX I 
 
 CHARACTERISTICS OF THE SYMMETRIC RADIX 
 TWO SIGNED DIGIT ADDER 
 
 The probability of each possible pair of adjacent digits in the 
 sum was determined for Adder 3 of Section 3.2. The analysis is 
 based on the assumption that one input representation, A, has the 
 same pair probabilities as the sum representation, A', while the 
 other input, 0, is characterized parametrically by an analogy to 
 SRT division (19), (21). The analysis was performed for the case
 for which the probability of zero for a digit in the parametrically
 defined input representation was in the range $\frac{2}{5}$ to $\frac{2}{3}$.
 
 This restriction is justified by the results of the analysis of 
 the single digit probabilities of Adder 2 by Rohatsch (21). He dis- 
 covered that the probability of zero digits in the sum representation 
 varied little from — when the probability of zero digits of both oper- 
 and representations varied from to 1 . 
 
 This analysis takes a somewhat different tack than that taken
 by Rohatsch. He analyzed the output representations on the basis of
 the triplet probabilities of both operand representations, where the
 representations of both operands were defined parametrically. In
 the present analysis, only one operand, , is parametrically defined.
 The other operand, A, is assumed to have the same distribution as
 the sum representation, A'. This is justified by the observation that
 at least one of the operands taking part in an addition is the sum of a
 previous addition in practically all of the uses of the adder in a
 typical calculation. The transfer digits are assumed to reach a
 steady-state distribution independent of digital position.
 
 Each possible combination of digits for the A operand and the
 input transfers ($t_{i+1}$, $t'_{i+1}$) was identified as the present state. Table
 11 lists these, indicating which states are equivalent because of the
 symmetry of the adder. Each possible combination of digits for the
 sum and the output transfers ($t_{i}$, $t'_{i}$) was identified as the next
 state, also employing Table 11, with the following equivalences

 $$t_{i} \equiv t_{i+1} \qquad (A1.1)$$

 $$t'_{i} \equiv t'_{i+1} \qquad (A1.2)$$

 $$a'_{j} \equiv a_{j}, \quad j = i,\ i+1 \qquad (A1.3)$$

 *Three consecutive digits of the number.
 
 Table 11. States of the Markov Process Used to Analyze Adder 3.

 The digit columns of this table ($t_{i+1}$, $t'_{i+1}$, $a_{i}$, $a_{i+1}$ for each
 'positive' state and for its 'negative' mate) are not legible in the
 source. The state labels and steady-state numbers are as follows.

 State   Steady State No.
 A       -
 B       -
 C       -
 D       -
 E       1
 F       2
 G       3
 H       4
 I       5
 J       -
 K       6
 L       7
 M       8A
 N       9
 O       10
 P       -
 Q       -
 R       -
 S       -
 T       11
 U       12
 V       8B
 W       13
 
 [Table 12. Transition probabilities of the persistent states, indexed by
 present state and next state. The entries (0, P1, P2, P3, and P4) are
 not legible in the source.]
 
 The transition probabilities were then determined, based on
 the probabilities of pairs of the operand. The steady-state digit
 statistics can then be determined from the steady-state
 distribution of states in the Markov process defined above.
 Examination of the transition matrix showed that fourteen of the
 twenty-three states are persistent, and that states M and V are
 equivalent with respect to their transition probabilities to subsequent
 states. This is also indicated in Table 11. The transition probabilities
 of the persistent states are given as Table 12. In this
 table,
 
 PI = - (A1.4) 
 
 4 
 
 P2 = | (A1.5) 
 
 P3= j - | (A1.6) 
 
 P4 = j - &■ (A1.7) 
 
 where

 $z$ is the probability of zero of the parametrically defined
 input representation, $\frac{2}{5} \le z \le \frac{2}{3}$.
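
 The step from a transition matrix to a steady-state distribution of states can be sketched in Python as follows. The three-state matrix P_TOY is purely hypothetical and is not Table 12; the function name steady_state, the convention that P[i][j] is the probability of moving from state i to state j, and the use of repeated multiplication rather than solving the linear equations directly are all assumptions made for the example.

```python
def steady_state(P, iterations=200):
    """Approximate the steady-state distribution D satisfying D = D * P.

    P[i][j] is taken to be the probability of going from state i to state j,
    so every row of P must sum to 1.
    """
    n = len(P)
    d = [1.0 / n] * n                      # start from a uniform distribution
    for _ in range(iterations):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d


# Hypothetical three-state chain used only to exercise the function.
P_TOY = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

D = steady_state(P_TOY)
print([round(x, 6) for x in D])
```

 For the fourteen persistent states the same procedure would be applied to the matrix of Table 12 for each value of z.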
 
 The probabilities of the two equivalent states can be determined
 by finding the probability of state V (or 8b) and subtracting that from
 the probability of state 8 to find the probability of state M (or 8a).
 The equations to accomplish this are

 $$D(8b) = P2\,[D(1) + D(10)] + P1\,[D(2) + D(4) + D(7) + D(9)] + P4\,D(3) \qquad (A1.8)$$

 $$D(8a) = D(8) - D(8b) \qquad (A1.9)$$

 where

 $D(x)$ is the probability of state $x$.

 Tables 13, 14, and 15 present the results of the analysis. Table
 13 gives the pair probabilities, where

 $$(x, y) = P(a'_{i} = x,\ a'_{i+1} = y).$$

 Table 13. Pair Probabilities of Sums Produced by Adder 3

   z     (0,0)     (0,1)         (1,1)†    (1,0)         (1,1)†
                   (0,\bar{1})   (1,1)†    (\bar{1},0)   (1,1)†

  .40   .256339   .129070       .078276   .128170       .036316
  .41   .255566   .128686       .079022   .127783       .036726
  .42   .254825   .128297       .079753   .127413       .037125
  .43   .254117   .127902       .080468   .127059       .037513
  .44   .253441   .127501       .081167   .126721       .037891
  .45   .252795   .127095       .081851   .126398       .038258
  .46   .252180   .126684       .082520   .126090       .038616
  .47   .251593   .126269       .083173   .125796       .038965
  .48   .251034   .125850       .083810   .125517       .039306
  .49   .250504   .125427       .084431   .125252       .039638
  .50   .250000   .125000       .085037   .125000       .039963
  .51   .249523   .124570       .085626   .124761       .040281
  .52   .249071   .124136       .086200   .124535       .040593
  .53   .248644   .123700       .086757   .124322       .040899
  .54   .248243   .123261       .087298   .124121       .041199
  .55   .247865   .122819       .087822   .123932       .041493
  .56   .247511   .122375       .088330   .123756       .041784
  .57   .247180   .121929       .088821   .123590       .042069
  .58   .246872   .121480       .089296   .123436       .042351
  .59   .246587   .121030       .089753   .123294       .042630
  .60   .246324   .120578       .090193   .123162       .042906
  .61   .246082   .120124       .090615   .123041       .043179
  .62   .245863   .119668       .091020   .122931       .043450
  .63   .245664   .119210       .091407   .122832       .043719
  .64   .245486   .118751       .091775   .122743       .043988
  .65   .245330   .118290       .092125   .122665       .044255
  .66   .245194   .117827       .092457   .122597       .044523

 †The overbars that distinguish the like-signed pair $(1,1)$, $(\bar{1},\bar{1})$
 from the unlike-signed pair $(1,\bar{1})$, $(\bar{1},1)$ in these two column
 headings are not legible in the source.
 
 Note that at $z = 0.5$,

 $$P(a'_{i} = 0) = (0,0) + (0,1) + (0,\bar{1}) = 0.500000 \qquad (A1.10)$$

 and

 $$P(a'_{i+1} = 0) = (0,0) + (1,0) + (\bar{1},0) = 0.500000 \qquad (A1.11)$$

 Since each of these is the probability of zero digits in sum
 representations, the steady-state probabilities of the sum as given by
 $z = \frac{1}{2}$ closely approximate the steady-state probability when both
 operands are the result of previous additions. Calculations similar to
 (A1.10) and (A1.11) for $z = 0.40$ yield

 $$P(a'_{i} = 0) = 0.514479, \quad P(a'_{i+1} = 0) = 0.512679 \qquad (A1.12)$$
 
 Table 14. Conditional Probabilities of Digits in the Sums
 Produced by Adder 3.

   z     (0|0)     (1|0)         (1|1)†    (0|1)         (1|1)†
                   (\bar{1}|0)   (1|1)†    (0|\bar{1})   (1|1)†

  .40   .498251   .250875       .322440   .527967       .149594
  .41   .498238   .250881       .324484   .524709       .150808
  .42   .498271   .250865       .326466   .521562       .151972
  .43   .498347   .250827       .328386   .518523       .153091
  .44   .498465   .250767       .330246   .515588       .154166
  .45   .498624   .250688       .332045   .512754       .155201
  .46   .498824   .250588       .333783   .510019       .156198
  .47   .499062   .250469       .335463   .507378       .157159
  .48   .499338   .250331       .337083   .504830       .158087
  .49   .499651   .250175       .338644   .502371       .158984
  .50   .500000   .250000       .340147   .500000       .159853
  .51   .500384   .249808       .341591   .497713       .160696
  .52   .500803   .249599       .342976   .495509       .161514
  .53   .501254   .249373       .344304   .493386       .162311
  .54   .501739   .249130       .345573   .491340       .163087
  .55   .502256   .248872       .346783   .489371       .163845
  .56   .502804   .248598       .347936   .487477       .164587
  .57   .503383   .248308       .349030   .485656       .165315
  .58   .503993   .248004       .350065   .483906       .166030
  .59   .504632   .247684       .351041   .482225       .166734
  .60   .505301   .247349       .351957   .480612       .167430
  .61   .505999   .247000       .352814   .479067       .168119
  .62   .506726   .246637       .353611   .477587       .168803
  .63   .507482   .246259       .354347   .476171       .169483
  .64   .508266   .245867       .355021   .474818       .170161
  .65   .509079   .245460       .355634   .473527       .170840
  .66   .509921   .245040       .356183   .472296       .171520

 †The overbars that distinguish the like-signed entries $(1|1)$, $(\bar{1}|\bar{1})$
 from the unlike-signed entries $(\bar{1}|1)$, $(1|\bar{1})$ in these two column
 headings are not legible in the source.
 
 Table 15. Transfer Digit Probabilities in Adder 3

   z     (0,0)     (1,1)†    (1,0)
                   (1,1)†    (\bar{1},0)

  .40   .351448   .126288   .197988
  .41   .353988   .124556   .198450
  .42   .356484   .122849   .198909
  .43   .358938   .121167   .199363
  .44   .361351   .119510   .199815
  .45   .363722   .117875   .200264
  .46   .366054   .116262   .200711
  .47   .368347   .114671   .201156
  .48   .370602   .113099   .201600
  .49   .372819   .111547   .202043
  .50   .375000   .110014   .202486
  .51   .377145   .108498   .202930
  .52   .379256   .106998   .203374
  .53   .381333   .105514   .203819
  .54   .383377   .104045   .204266
  .55   .385389   .102590   .204716
  .56   .387369   .101148   .205168
  .57   .389318   .099718   .205623
  .58   .391238   .098299   .206081
  .59   .393129   .096891   .206544
  .60   .394992   .095493   .207011
  .61   .396827   .094103   .207484
  .62   .398635   .092721   .207962
  .63   .400417   .091345   .208446
  .64   .402174   .089976   .208937
  .65   .403907   .088611   .209435
  .66   .405615   .087251   .209941

 †The overbars in this column heading are not legible in the source.
 
 Calculations for $z = 0.60$ indicate that

 $$P(a'_{i} = 0) = 0.487480, \quad P(a'_{i+1} = 0) = 0.492648 \qquad (A1.13)$$

 The small deviation from $\frac{1}{2}$ illustrates the strong tendency of the
 sum representation of this adder toward $P(a'_{i} = 0) = \frac{1}{2}$.

 Table 14 gives the conditional probabilities for the steady-state
 sum representation, where

 $$P(x \mid y) = P(a'_{i+1} = x \mid a'_{i} = y) \qquad (A1.14)$$

 The entries for $z = \frac{1}{2}$, which are required to evaluate the
 various normalization strategies, can be approximated by very
 simple fractions. The approximations are $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{3}$,
 $\frac{1}{2}$, $\frac{1}{6}$ in the
 order of their appearance. These values are employed in the
 calculations to evaluate the possible normalization strategies.

 Table 15 gives the probability of occurrence of the various
 transfer digit combinations, where

 $$(x, y) = P(t_{i} = x,\ t'_{i} = y). \qquad (A1.15)$$
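
 The relations among the tabulated values can be checked with a short Python sketch. The three rows below are copied from Table 13; the names TABLE_13, zero_probabilities, and conditionals are hypothetical. The sketch assumes, as equations (A1.10) and (A1.11) suggest, that each column after the first gives the common probability of each member of a symmetric pair, so that it is counted twice in a total over all pairs.

```python
# Rows copied from Table 13, keyed by z, in column order.
TABLE_13 = {
    "0.40": (0.256339, 0.129070, 0.078276, 0.128170, 0.036316),
    "0.50": (0.250000, 0.125000, 0.085037, 0.125000, 0.039963),
    "0.60": (0.246324, 0.120578, 0.090193, 0.123162, 0.042906),
}


def zero_probabilities(row):
    """Return (P(a'_i = 0), P(a'_{i+1} = 0)) as in (A1.10) and (A1.11)."""
    p00, p0x, _, px0, _ = row
    return p00 + 2 * p0x, p00 + 2 * px0


def conditionals(row):
    """Return the five Table 14 entries implied by one Table 13 row."""
    p00, p0x, c3, px0, c5 = row
    p_zero = p00 + 2 * p0x     # P(a'_i = 0)
    p_one = c3 + px0 + c5      # P(a'_i = 1), equal to P(a'_i = -1) by symmetry
    return (p00 / p_zero, p0x / p_zero, c3 / p_one, px0 / p_one, c5 / p_one)


for z, row in TABLE_13.items():
    first, second = zero_probabilities(row)
    print(z, round(first, 6), round(second, 6))

print([round(c, 6) for c in conditionals(TABLE_13["0.50"])])
```

 The values printed for z = 0.40 and z = 0.60 can be compared with (A1.12) and (A1.13), and the conditional values for z = 0.50 with the corresponding row of Table 14, allowing for rounding in the last place.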
 
 APPENDIX II 
 
 MINIMAL RIGHT-DIRECTED RECODER OF 
 RADIX TWO SIGNED DIGIT NUMBERS 
 
 Table 16 gives the overall right-directed recoder which is the 
 result of applying the extended Penhollow recoder (Table 6) to the 
 output of the assimilator of Table 5. This recoder should be
 symmetric*, since the numbers to be recoded are symmetric. Table
 16 was checked for symmetry and ten lines did not have symmetric 
 mates. It was found that altering any of the ten lines to make it part 
 of a symmetric pair did not alter the probability of zero digits in the 
 recoder output. Hence, the recoded digit and mode digit of these ten 
 lines are pseudo don't-care conditions. They can be chosen from two 
 of the three possible values and still not affect the probability of zero 
 digits (e.g. , the shift average) of the recoder output. Viewed in this 
 way, Table 16 is one member of a class of minimal recoders that 
 contains 2 members. 
 
 *That is, for each line in the table a second line should be found whose
 entries are the negative of the entries of the first line.

 **Lines 5,27; 12,34; 17,39; 49,65; 54,70. The pairs separated by
 semi-colons are the pairs which should be symmetric.
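
 The symmetry check described in the first footnote can be sketched in Python as follows. The function name unmatched_lines and the small table TOY_TABLE are hypothetical; the toy entries are not taken from Table 16. Each line is represented as a tuple of signed digits, and a line is considered to have a symmetric mate if the table also contains the tuple with every entry negated.

```python
def unmatched_lines(table):
    """Return the line numbers that have no symmetric mate.

    table maps a line number to a tuple of signed digits. The mate of a
    line is the line whose entries are the negatives of its entries, so an
    all-zero line is its own mate.
    """
    rows = set(table.values())
    return sorted(
        number
        for number, row in table.items()
        if tuple(-digit for digit in row) not in rows
    )


# Hypothetical four-line table used only to exercise the function.
TOY_TABLE = {
    1: (0, 0, 0),
    2: (1, 0, -1),
    3: (-1, 0, 1),
    4: (1, 1, 0),
}

print(unmatched_lines(TOY_TABLE))
```

 Applied to the full recoder table, a check of this kind would list the lines that lack symmetric mates.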
 
Table 16. Recoder Based on the Cascaded Assimilator-Recoder of Tables 5 and 6.
 
 Line No. | Choose: m_i, z_i | Known: m_{i-1}, x_i, x_{i+1}, x_{i+2}, x_{i+3}, x_{i+4}
 
 
 
 
 
 
 
 
 
 
 
 - 
 
 - 
 
 1 
 
 
 
 
 
 
 
 1 1 
 
 - 
 
 - 
 
 2 
 
 
 
 
 
 
 
 1 
 
 - 
 
 - 
 
 3 
 
 
 
 
 
 
 
 1 1 
 
 1 
 
 - 
 
 4 
 
 
 
 
 
 
 
 1 1 
 
 
 
 1 
 
 5 
 
 
 
 
 
 
 
 1 1 
 
 
 
 
 
 6 
 
 1 1 
 
 
 
 
 
 1 1 
 
 
 
 1 
 
 7 
 
 1 1 
 
 
 
 
 
 1 1 
 
 1 
 
 - 
 
 8 
 
 1 
 
 
 
 1 
 
 1 1 
 
 - 
 
 - 
 
 9 
 
 1 
 
 
 
 1 
 
 1 
 
 - 
 
 - 
 
 10 
 
 1 
 
 
 
 1 
 
 1 1 
 
 1 
 
 - 
 
 11 
 
 1 
 
 
 
 1 
 
 1 1 
 
 
 
 1 
 
 12 
 
 1 
 
 
 
 1 
 
 I 1 
 
 
 
 
 
 13 
 
 1 
 
 
 
 1 
 
 I 1 
 
 
 
 1 
 
 14 
 
 1 
 
 
 
 1 
 
 1 1 
 
 1 
 
 - 
 
 15 
 
 1 
 
 
 
 1 
 
 1 
 
 1 
 
 _ 
 
 16 
 
 1 
 
 
 
 1 
 
 1 
 
 
 
 1 
 
 17 
 
 1 
 
 o 
 
 1 
 
 1 
 
 
 
 
 
 18 
 
 1 
 
 
 
 1 
 
 1 
 
 
 
 1 
 
 19 
 
 1 
 
 
 
 1 
 
 1 
 
 1 
 
 - 
 
 20 
 
 1 
 
 
 
 1 
 
 
 
 _ 
 
 _ 
 
 21 
 
 1 
 
 
 
 1 
 
 1 
 
 - 
 
 - 
 
 22 
 
 1 
 
 
 
 1 
 
 1 
 
 - 
 
 - 
 
 23 
 
 
 
 
 
 
 
 1 1 
 
 - 
 
 - 
 
 24 
 
 
 
 
 
 
 
 1 
 
 - 
 
 - 
 
 25 
 
 
 
 
 
 
 
 1 1 
 
 1 
 
 _ 
 
 26 
 
 
 
 
 
 
 
 1 1 
 
 
 
 1 
 
 27 
 
 1 1 
 
 
 
 
 
 1 1 
 
 
 
 
 
 28 
 
 1 1 
 
 
 
 
 
 1 1 
 
 
 
 T 
 
 29 
 
 1 1 
 
 
 
 
 
 1 1 
 
 1 
 
 - 
 
 30 
 
 1 
 
 
 
 1 
 
 1 1 
 
 _ 
 
 - 
 
 31 
 
 1 
 
 
 
 1 
 
 1 
 
 - 
 
 - 
 
 32 
 
 1 
 
 
 
 1 
 
 1 1 
 
 1 
 
 - 
 
 33 
 
 1 
 
 
 
 1 
 
 1 1 
 
 
 
 1 
 
 34 
 
 1 
 
 
 
 1 
 
 1 I 
 
 
 
 
 
 35 
 
 1 
 
 
 
 1 
 
 1 1 
 
 
 
 1 
 
 36 
 
 1 
 
 
 
 1 
 
 1 1 
 
 1 
 
 - 
 
 37 
 
 1 
 
 
 
 1 
 
 1 
 
 1 
 
 - 
 
 38 
 
 1 
 
 
 
 1 
 
 1 
 
 
 
 1 
 
Table 16 (Continued).

 Line No. | Choose: m_i, z_i | Known: m_{i-1}, x_i, x_{i+1}, x_{i+2}, x_{i+3}, x_{i+4}
 
 39 
 
 I 
 
 
 
 1 
 
 1 
 
 
 
 
 
 40 
 
 I 
 
 
 
 1 
 
 1 
 
 
 
 I 
 
 41 
 
 1 
 
 
 
 I 
 
 1 
 
 I 
 
 - 
 
 42 
 
 1 
 
 
 
 1 
 
 
 
 - 
 
 - 
 
 43 
 
 1 
 
 
 
 T 
 
 T 
 
 - 
 
 - 
 
 44
 
 1 
 
 
 
 1 
 
 1 
 
 _ 
 
 _ 
 
 45 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 - 
 
 - 
 
 46 
 
 1 
 
 1 
 
 1 
 
 1 
 
 - 
 
 - 
 
 47 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 1 
 
 - 
 
 48 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 
 
 1 
 
 49 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 
 
 
 
 50 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 
 
 1 
 
 51 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 1 
 
 - 
 
 52 
 
 1 
 
 1 
 
 1 
 
 1 
 
 1 
 
 - 
 
 53 
 
 1 
 
 1 
 
 1 
 
 1 
 
 
 
 1 
 
 54 
 
 1 
 
 1 
 
 1 
 
 1 
 
 
 
 
 
 55 
 
 o T 
 
 1 
 
 1 
 
 1 
 
 
 
 1 
 
 56 
 
 1 
 
 1 
 
 1 
 
 1 
 
 1 
 
 - 
 
 57 
 
 1 
 
 1 
 
 1 
 
 
 
 - 
 
 - 
 
 58 
 
 1 
 
 1 
 
 1 
 
 1 
 
 - 
 
 - 
 
 59 
 
 1 
 
 1 
 
 1 
 
 1 
 
 _ 
 
 - 
 
 60 
 
 1 1 
 
 1 
 
 
 
 1 
 
 - 
 
 - 
 
 61 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 - 
 
 - 
 
 62 
 
 1 
 
 1 
 
 1 
 
 1 
 
 - 
 
 - 
 
 63 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 1 
 
 - 
 
 64 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 
 
 I 
 
 65 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 
 
 
 
 66 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 
 
 1 
 
 67 
 
 1 
 
 1 
 
 1 
 
 1 1 
 
 1 
 
 - 
 
 68 
 
 I 
 
 1 
 
 1 
 
 1 
 
 1 
 
 - 
 
 69 
 
 I 
 
 1 
 
 T 
 
 1 
 
 
 
 1 
 
 70 
 
 1 
 
 1 
 
 l 
 
 1 
 
 
 
 
 
 71 
 
 1 
 
 1 
 
 l 
 
 1 
 
 
 
 1 
 
 72 
 
 1 
 
 I 
 
 l 
 
 1 
 
 1 
 
 - 
 
 73 
 
 1 
 
 1 
 
 I 
 
 
 
 - 
 
 - 
 
 74 
 
 1 
 
 1 
 
 I 
 
 1 
 
 - 
 
 - 
 
 75 
 
 1 
 
 1 
 
 I 
 
 1 
 
 - 
 
 - 
 
 76 
 
 1 1 
 
 1 
 
 
 
 1 
 
 - 
 
 " 
 
 [Figure 20: circuit diagram]
 
 
 APPENDIX III 
 SELF-INITIALIZING MODULES
 
 Initializing a limited connection arithmetic unit with a
 centralized circuit requires that each module have one electrical
 connection for the initializing signal. Since one of the major
 requirements of LSI is minimizing the number of connections to the
 module, these pins should be eliminated if possible.
 
 In one method which does not require pins for initializing 
 signals, a circuit which activates an initializing signal for a fixed 
 period of time after power is applied is placed on each module. This 
 signal is distributed to all points on the module that require an 
 initializing signal. An example of this technique is shown in Figure 
 20, where the initializing signal is active while C-j, is charging. 
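
 To give a feel for how long such a power-on initializing signal might
 last, the following Python sketch computes the time for a charging
 capacitor to reach a switching threshold. It assumes the simplest
 possible model, a single resistor charging a capacitor from a supply
 that steps on at time zero, with the signal active until the capacitor
 voltage crosses a fixed threshold. The circuit details, the function
 name, and the component values are assumptions and are not taken from
 the report.

```python
import math


def init_pulse_duration(r_ohms, c_farads, v_supply, v_threshold):
    """Time for an RC node to charge from 0 V to v_threshold.

    Assumed model: v(t) = v_supply * (1 - exp(-t / (R * C))), so
    t = R * C * ln(v_supply / (v_supply - v_threshold)).
    """
    if not 0 < v_threshold < v_supply:
        raise ValueError("threshold must lie strictly between 0 and the supply")
    return r_ohms * c_farads * math.log(v_supply / (v_supply - v_threshold))


# Hypothetical values: 10 kilohm, 1 microfarad, 5 V supply, 2.5 V threshold.
duration = init_pulse_duration(10e3, 1e-6, 5.0, 2.5)
```

 Under this model the duration scales linearly with both R and C, so the
 length of the initializing period can be set on each module without any
 external connection.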
 
 

 REFERENCES

 1. Allard, R. W., Wolf, K. A., and Zemlin, R. A., "Some
 Effects of the 6600 Computer on Language Structures,"
 Communications of the ACM, Vol. 7 No. 2, Feb. 1964,
 pp. 112-119.

 2. Anderson, S. F., et al., "The IBM System/360 Model 91:
 Floating Point Execution Unit," IBM Journal of Research
 and Development, Vol. 11 No. 1, Jan. 1967, pp. 34-53.

 3. Atkins, D. E. III, "The Theory and Implementation of SRT
 Division," Report 230, Department of Computer Science,
 University of Illinois, Urbana, Illinois, June 1967.

 4. Avizienis, Algirdas, "A Study of Redundant Number
 Representations for Parallel Digital Computers," Report 101,
 Department of Computer Science, University of Illinois,
 Urbana, Illinois, May 1960.

 5. Avizienis, Algirdas, "Signed Digit Number Representation for
 Fast Parallel Arithmetic," IRE Transactions on Electronic
 Computers, Vol. EC-10 No. 3, Sept. 1961, pp. 389-400.

 6. Avizienis, Algirdas, "A Flexible Implementation of Digital
 Computer Arithmetic," Information Processing 1962, North Holland
 Publishing Company, Amsterdam, 1962, pp. 664-670.

 7. Borovec, R. T., "The Logical Design of a Class of Limited
 Carry-Borrow Propagation Adders," Report 275,
 Department of Computer Science, University of Illinois,
 Urbana, Illinois, August 1968.

 8. Comfort, W. T., Private Communication, July 1968.

 9. Elspas, B., et al., "Investigation of Propagation-Limited
 Computer Networks," AFCRL-64-376, Stanford Research
 Institute, Menlo Park, California, April 1964.
 

 10. Elspas, B., et al., "Investigation of Propagation-Limited
 Computer Networks," AFCRL-64-376 (II), Stanford Research
 Institute, Menlo Park, California, July 1965.

 11. IBM Corporation, "IBM 7094 Principles of Operation," A22-6703,
 Poughkeepsie, New York, 1962.

 12. Muller, D. E., "Asynchronous Logics and Application to
 Information Processing," Switching Theory in Space
 Technology, Ed.: Aiken and Main, Stanford University
 Press, 1963, pp. 289-297.

 13. Nakata, I., "A Note on Compiling Algorithms for Arithmetic
 Expressions," Communications of the ACM, Vol. 10
 No. 8, Aug. 1967, pp. 492-494.

 14. Nash, J. P., Ed., "Illiac Programming, A Guide to the
 Preparation of Problems for Solution by the University of
 Illinois Digital Computer," Department of Computer
 Science, University of Illinois, Urbana, Illinois, 1956.

 15. Penhollow, J. O., "A Study of Arithmetic Recodings with
 Applications in Multiplication and Division," Report
 128, Department of Computer Science, University of
 Illinois, Urbana, Illinois, Sept. 1962.

 16. Robertson, J. E., "A Deterministic Procedure for the Design
 of Carry-Save Adders and Borrow-Save Subtractors,"
 Report 235, Department of Computer Science, University
 of Illinois, Urbana, Illinois, July 1967.

 17. Robertson, J. E., "A New Class of Digital Division Methods,"
 IRE Transactions on Electronic Computers, Vol. EC-7 No. 3,
 Sept. 1958, pp. 218-222.

 18. Robertson, J. E., "Methods of Selecting Quotient Digits During
 Division," Proceedings of IFIP Congress 1965, Spartan
 Books, Washington, D. C., 1966, pp. 444-445.
 

 19. Robertson, J. E., "The Correspondence Between Methods of
 Digital Division and Multiplier Recoding Procedures," Report
 252, Department of Computer Science, University of
 Illinois, Urbana, Illinois, Dec. 1967.

 20. Robertson, J. E., Private Communication, July 1969.

 21. Rohatsch, F. A., "A Study of Transformations Applicable to
 the Development of Limited Carry-Borrow Propagation
 Adders," Report 226, Department of Computer Science,
 University of Illinois, Urbana, Illinois, June 1967.

 22. Stone, H. S., "One-Pass Compilation of Arithmetic Expressions
 for a Parallel Processor," Communications of the ACM,
 Vol. 10 No. 4, April 1967, pp. 220-223.

 23. Thorlin, J. F., "Code Generation for PIE (Parallel Instruction
 Execution) Computer," AFIPS Conference Proceedings,
 1967 Spring Joint Computer Conference, Thompson Book
 Company, Washington, D. C., 1967, pp. 641-643.

 24. Tomasulo, R. M., "An Efficient Algorithm for Exploiting
 Multiple Arithmetic Units," IBM Journal of Research
 and Development, Vol. 11 No. 1, Jan. 1967, pp. 25-33.
 
 
 VITA 
 
 Michael John Pisterzi was born in Chicago, Illinois, on 
 September 28, 1936. 
 
 He was awarded the Bachelor of Science degree, with High 
 Honors, in Electrical Engineering by the University of Illinois in 
 June 1961. He received the Honeywell Award and the University of 
 Illinois Outstanding Electrical Engineering Student Award the year 
 of his graduation. He received the Master of Science degree in 
 Electrical Engineering from the University of Illinois in June 1962. 
 
 He is on the staff of the System Development Division of the
 IBM Corporation. He has had positions with the Admiral Corporation,
 Motorola, Incorporated, and the Douglas Aircraft Company. He has
 had teaching appointments with the Electrical Engineering
 Department of the University of Illinois and research appointments
 with the Department of Computer Science of the University of Illinois.
 
 He is a member of Phi Kappa Phi, Tau Beta Pi, Sigma Tau,
 Eta Kappa Nu, Pi Mu Epsilon, Chi Gamma Iota, Phi Alpha Mu, the
 Institute of Electrical and Electronics Engineers, and the Association
 for Computing Machinery.
 