Report No. 398

A LIMITED CONNECTION ARITHMETIC UNIT*

by

Michael John Pisterzi

June 1, 1970

Department of Computer Science
University of Illinois
Urbana, Illinois 61801

*This work was supported in part by the National Science Foundation under Grant No. US NSF GJ812 and Grant No. US NSF GJ813 and was submitted in partial fulfillment for the Doctor of Philosophy degree in Electrical Engineering, 1970.

ACKNOWLEDGEMENT

I would like to acknowledge the guidance, encouragement, and helpful suggestions of my advisor, Professor James E. Robertson. I would like to thank Mr. Webb T. Comfort for several suggestions that contributed significantly to the clarity of this paper. I would also like to thank Miss Cheryl Becker for her skillful typing and the members of the Drafting Department of the Department of Computer Science for the figures that they drew. Finally, I would like to thank my family, particularly my wife Candace, for their encouragement.

TABLE OF CONTENTS

1. INTRODUCTION
1.1 Statement of the Problem
1.2 Relation to Prior Work
1.3 Structure of the Remainder of the Paper
2. INTRODUCTION TO THE METHOD OF PERFORMING THE PROCESSING
2.1 Organization of the Arithmetic Unit
2.2 Generalized Examples
2.3 The Basic Micro-Instruction Repertoire of the DPUs
3. THE ARITHMETIC CONSIDERATIONS OF IMPLEMENTING A LIMITED CONNECTION ARITHMETIC UNIT
3.1 Introduction
3.2 Applicable Number Representations and Addition Methods
3.3 Multiplication Considerations
3.4 Multiplier Recoding
3.5 Normalization Considerations
3.6 Division Considerations
4. INTERACTION WITH MEMORY
4.1 Introduction
4.2 Methods Applicable When the Memory Byte is the Digit
4.3 Methods Applicable When the Memory Byte is a Number of Digits
5. OPERATIONAL SPECIFICATION OF THE MODULES
5.1 Introduction
5.2 The Digit Processing Unit
5.3 The Primitive Control Unit
5.4 The End Unit
5.5 Exponent Arithmetic Unit
5.6 The Sense Micro-Instruction Detector
6. SUMMARY AND CONCLUSIONS
6.1 Discussion of Results
6.2 Suggestions for Related Work
APPENDIX
I. CHARACTERISTICS OF THE SYMMETRIC RADIX TWO SIGNED DIGIT ADDER
II. MINIMAL RIGHT-DIRECTED RECODER OF RADIX TWO SIGNED DIGIT NUMBERS
III. SELF-INITIALIZING MODULES
REFERENCES
VITA

A LIMITED CONNECTION ARITHMETIC UNIT

Michael John Pisterzi, Ph.D.
Department of Electrical Engineering
University of Illinois, 1970

A method of designing digital arithmetic units capable of performing floating point addition, subtraction, multiplication, division, and normalization is presented in this paper. The resulting arithmetic unit designs will be particularly appropriate for implementation in Large Scale Integration. The major characteristics of a limited connection arithmetic unit are:

1. It is composed of a large number of complex modules.
2. A very small number (three to eight) of specific module types are used.
3. The number of signal paths required between any module and the remainder of the arithmetic unit is small.
4. Each specific intermodule signal must be sent to a very small number of modules (one to three).
The paper shows that the complexity of the modules required to construct a limited connection arithmetic unit can be adjusted by selecting the number system and the details of the processing algorithms. By this means the design can be tailored to the specific technology with which it is to be implemented.

The general arithmetic considerations of limited connection arithmetic units were also investigated. The major conclusion of this investigation is that signed digit number systems must be employed.

Several studies pertinent to radix two limited connection arithmetic units were also conducted. The probability of occurrence of each digit pair was determined for one of the three radix two adders. The results of this analysis were used in a study of normalization techniques. Finally, an optimum right-directed multiplier recoder was developed.

1. INTRODUCTION

1.1 Statement of the Problem

This study reports on a method of designing arithmetic units in a technology in which:

• the fan-out of the output signal from any logic element is limited,
• the basic building blocks are networks containing large numbers of logic elements,
• the number of signal paths through which a basic building block communicates with other basic building blocks is severely limited, and
• a small number of basic building block types must be employed.

These are, of course, characteristics of the class of technologies popularly known as Large Scale Integration, or LSI.

The resulting arithmetic unit has some rather desirable properties not shared by other arithmetic unit organizations proposed for implementation in LSI (9, 10, 12).* The result of a given step of the calculation is determined digit by digit. Each digit of every result is determined in the order in which it will be required when that result must be manipulated further, and it is stored in the same building block in which it will be needed when that result takes part in additional processing. These properties of the arithmetic unit allow the processing of a second step to begin as soon as a sufficient (and small) number of digits of the results of the first step have been determined.

*Numbers in parentheses refer to articles listed in the References.

This paper presents a design procedure which has a high probability of evolving a design that effectively utilizes the potentialities of the technology. For the purposes of this paper, a given technology is considered to be effectively utilized when the average number of logic elements utilized per basic building block exceeds a fixed, predefined percentage of the number which can be constructed on it.

The arithmetic unit will be developed as a large number of appropriately connected modules. These modules are functionally defined and are distinguished from basic building blocks so that we may postpone the question of whether a module can be implemented as one basic building block until after the modules are specified. We are able to do a significant amount of analysis without including specific technological considerations. The results of this analysis indicate that the major module type, the Digit Processing Unit (DPU), can be tailored to the technology by the selection of the number representation employed by the arithmetic unit. The number of logic elements in the other principal building block, the Primitive Control Unit (PCU), is related to the details of the algorithms employed to perform multiplication, division, and normalization, in addition to the number representation employed. An arithmetic unit will consist of a large number of DPUs and only one PCU.
If the performance objectives of the arithmetic unit are not achieved by the design in which the DPUs and the PCU are each implemented as a single basic building block, more sophisticated algorithms may be employed to perform the basic operations. This would tend to increase the number of logic elements in the PCU without significantly affecting the design of the DPU. The PCU would then have to be implemented as several basic building blocks.

The study concentrates on the first of these problems, namely, the specification of the module types required to construct an arithmetic unit. This problem is essentially independent of the technology with which the arithmetic unit is to be implemented. The results will show that the design has a sufficiently large number of parameters to allow this method of designing an arithmetic unit to be applied to any technology which has the characteristics mentioned earlier.

The design of the fractional part processor of a floating point arithmetic unit will be developed in detail in this paper. It will be capable of performing addition, subtraction, multiplication, division, and normalization. This emphasis on the fractional part processor is based on the number of digits in, and the amount of processing required by, the fractional part relative to the exponent. The fractional part of a typical floating point number contains several times as many digits as the exponent does. In addition, the processing required by the fractional parts of the operands is much more complex than that required by the exponents. Before an addition or subtraction is performed, the radix points of the operands must be aligned, which may require a number of right shifts. The result may also have to be shifted left to normalize it. In contrast, the exponents of the operands need only be subtracted to determine the number of radix alignment shifts required.
The only other operation necessary is the addition of the number of normalization shifts to form the exponent of the result.

The difference in processing is even more extreme for multiplication and division. The fractional part calculation consists of a sequence of additions and shifts,* while the exponent calculation is basically a single addition. Hence the processing required by the fractional part is more difficult than that required by the exponent.

*This implies, of course, that there is only one adder in the fractional part processor. This assumption will be justified later.

It is possible to evolve a design for the fractional part processor, as will be shown in this paper. The same design, or a simplified version of it, may be employed for the exponent processor, although it may not be desirable to do so. The complete result of the exponent calculation is required to determine the processing to be performed on the fractional part, while only an estimate of the current results of the fractional part processing is required to determine what additional processing is necessary. For example, if an add is being performed, the difference of the exponents indicates the number of shifts required to align the radix points of the fractional parts, while only the first several digits of the sum of the fractional parts are necessary to determine whether the sum is normalized. Hence, it may be desirable to perform the exponent calculations by a means which allows the entire result to be available in the shortest time.

1.2 Relation to Prior Work

The two known research efforts into the development of limited connection arithmetic units have evolved designs in which operations are performed by special purpose units (9, 10, 12). The two efforts, while not identical, have several characteristics in common.
The first of these is the development of special purpose units, namely the successful development of independent units to perform addition, subtraction, and multiplication. The second is the mode of operation. In both, operands travel through the arithmetic unit as a string of digits and must be gated to the appropriate execution unit. The pattern in which the basic building blocks are connected defines the processing that will be performed on the operands streaming through them. Finally, in both efforts control information is inherent in the logical structure of the mechanism.

The arithmetic unit developed in this paper is opposite in approach to those efforts. It has only one execution unit, capable of addition, subtraction, multiplication, division, and normalization. The operands remain relatively stationary in this unit while being processed. The processing is controlled by control signals which propagate through the arithmetic unit.

This approach was taken because it holds promise of not requiring modifications to the programming systems employed on the computer. Computers having several specialized execution units (for the same class of operands) require either additional effort on the part of the programmer, more complicated compilers (1, 13, 22, 23), or special hardware (2, 24) to assure that all units are kept busy as much of the time as practical while retaining the appropriate ordering of the operations where necessary.

This study evolves an arithmetic unit design in which all floating point operations* are performed by the same execution unit. The intent of this effort was to evolve an arithmetic unit that is operationally equivalent to the von Neumann single adder arithmetic unit. Since there is one execution unit, there is no need to expend effort in attempting to maximize and coordinate the activity of independent execution units.
Hence, the arithmetic unit developed here may be able to replace the arithmetic unit in existing computers with little or no change to the remainder of the computer and its programs. There is no need to develop specialized compilers; schemes similar to the Common Data Bus (24) are needed only in those systems which require buffering between the arithmetic unit and memory. The arithmetic unit described in this paper requires little or no additional development in other areas.

*Fixed point operations may also be performed by the unit.

1.3 Structure of the Remainder of the Paper

In Chapter 2, the concept of the autonomous Digit Processing Unit is developed. The Digit Processing Units (or DPUs) each contain one digit of each of the active operands in their registers. The DPUs communicate with their neighbors in such a way that when one DPU is executing a given processing step (or micro-instruction), no other DPU is executing that same step. The DPUs are organized so that the micro-instructions are passed from one DPU to the next in a specific order. The method of performing useful processing by causing the same sequence of micro-instructions to be performed by each of the DPUs will also be presented, as will the concept of micro-instructions streaming through the DPUs as they are executed by the DPUs in sequence. The chapter concludes by defining the micro-instructions which the DPUs must be able to perform and by indicating how these micro-instructions are combined to perform 'machine' instructions.

Chapter 3 treats the arithmetic aspects of the design. It discusses the number systems which can be employed, how addition overflow may be handled, and how multiplication, division, and normalization may be performed. It also contains the development of a minimal right-directed recoder of radix two signed-digit numbers.
The various possible methods of normalization are evaluated for radix two signed-digit numbers, and the optimum method is shown to depend on the ratio of shift time to examination time.

Chapter 4 describes how the arithmetic unit and the memory containing its operands may communicate. The methods discussed are applicable when the memory byte* consists of an integral number of digits.** They range from the very simple (entailing no additional equipment in the repetitive portion of the arithmetic unit) to the rather complex, employing a significant amount of special logic.

*Byte is used in this paper as the quantity of data which a device (e.g., the storage unit) operates on simultaneously. Byte is used in its arbitrary sense in this paper, which stresses that no assumption was made concerning the size of the data unit of the storage unit relative to the sizes of digits and operands (which are denoted as words).

**In the unlikely event that several storage unit bytes are required to contain one digit, digit assembly/disassembly logic may be included in the DPUs or the PCU, and the method applicable to one digit per byte can be employed.

The modules required to construct a limited connection arithmetic unit are described operationally in Chapter 5. An attempt has been made to relate the types of modules required and their characteristics to such parameters as the size of the memory byte, the number system, and the algorithms for performing the machine instructions.

Conclusions and suggestions for related research are presented in Chapter 6. As indicated there, the attempt to construct a Limited Connection Arithmetic Unit in an appropriate technology would be the best way of determining what additional considerations should be made.

Three appendices are included, which provide specific information required to design radix two arithmetic units.
The first presents the analysis and results of a study of the steady state probabilities of each possible pair of digits in the representation of sums of the symmetric radix two signed-digit adder. The radix two minimal multiplier recoder developed in Chapter 3 is discussed further in Appendix II. A method of initializing the arithmetic unit that requires no additional connections is presented in Appendix III.

2. INTRODUCTION TO THE METHOD OF PERFORMING THE PROCESSING

2.1 Organization of the Arithmetic Unit

A basic* Limited Connection Arithmetic Unit consists of a Primitive Control Unit, a number of Digit Processing Units, and an End Unit. The Primitive Control Unit (PCU) receives instructions from some external device and converts them into a sequence of micro-instructions to be executed by the Digit Processing Units (DPUs). The conversion which the PCU performs is very similar to the conversion performed by the adder control logic of contemporary single adder arithmetic units. For example, a multiply is converted into a number of shifts and adds. The DPUs collectively contain the fractional parts of all active operands and do the processing on them. The DPUs have the capability of performing micro-instructions which will (when performed by all DPUs) form sums, perform shifts, and do inter-register transfers. The End Unit allows the last DPU to be identical to all the other DPUs and to operate as though it had a DPU on its right.

*Chapters 4 and 5 will discuss variations which employ additional modules.

[Figure 1. Inter-module connections of a typical limited connection arithmetic unit.]

DPU$_1$: $a_1$ (A operand), $m_1$ (M operand), ..., $z_1$ (Z operand)
DPU$_2$: $a_2$, $m_2$, ..., $z_2$
DPU$_3$: $a_3$, $m_3$, ..., $z_3$
...
DPU$_n$: $a_n$, $m_n$, ..., $z_n$

$$ A = \sum_{i=1}^{n} a_i r^{-i}, \qquad M = \sum_{i=1}^{n} m_i r^{-i}, \qquad \ldots, \qquad Z = \sum_{i=1}^{n} z_i r^{-i} $$

where $r$ is an integer greater than one known as the radix of the arithmetic unit.

Figure 2. The distribution of operand digits in the DPUs of a limited connection arithmetic unit.

The inter-module connections of a typical Limited Connection Arithmetic Unit are shown in Figure 1. Each DPU retains the value of one digit of each of the active operands in its registers, as shown in Figure 2. As mentioned earlier, each DPU performs the same sequence of micro-instructions.* From Figure 1 it can be seen that a given micro-instruction cannot be executed by all DPUs in synchronization, but rather must be executed by them in sequence (i.e., first by DPU$_1$, then DPU$_2$, ...). As soon as all the DPUs which contain information required by DPU$_i$ to perform micro-instruction $j+1$ (referred to as $\mu_{j+1}$) have executed $\mu_j$ and have sent the required information to DPU$_i$, $\mu_{j+1}$ may be performed by DPU$_i$. The micro-instructions are defined to have regular data requirements, so that as each additional DPU executes $\mu_j$, one more DPU may execute $\mu_{j+1}$. The micro-instructions may thus be viewed as flowing through successive DPUs.

*This is not absolutely necessary, as certain micro-instructions are performed to classify the value of an operand, as in selecting quotient digits. The DPUs not containing information required to make this classification need not perform these micro-instructions.

This initial description brings out two characteristics that the set of micro-instructions should have: namely, the need for information from as small a number of other DPUs as possible, and well-matched intrinsic execution rates. Both decrease the time between micro-instruction executions by DPU$_1$, and hence tend to yield high performance arithmetic units. This paper shows that the number of DPUs which must transmit information to a given DPU can be limited to one. The execution rate could not be taken into account at the level of this analysis.
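To make the digit distribution of Figure 2 concrete, here is a minimal Python sketch. It assumes, purely for illustration, nonnegative fractions and conventional (non-signed) radix-$r$ digits obtained by truncation. The paper itself concludes that signed digit number systems must be employed. The functions `to_digits`, `distribute`, and `operand_value` are hypothetical names, and each DPU's registers are modelled simply as a dictionary keyed by operand name.

```python
from fractions import Fraction
import math


def to_digits(x: Fraction, r: int, n: int) -> list[int]:
    """First n radix-r digits of 0 <= x < 1, most significant first (truncated)."""
    if r < 2:
        raise ValueError("the radix r must be an integer greater than one")
    if not 0 <= x < 1:
        raise ValueError("this sketch assumes 0 <= x < 1")
    digits = []
    for _ in range(n):
        x *= r
        d = math.floor(x)  # the integer part is the next digit
        digits.append(d)
        x -= d
    return digits


def distribute(operands: dict[str, Fraction], r: int, n: int) -> list[dict[str, int]]:
    """Register contents of DPU_1..DPU_n: DPU_i holds digit i of every operand."""
    per_operand = {name: to_digits(v, r, n) for name, v in operands.items()}
    return [{name: ds[i] for name, ds in per_operand.items()} for i in range(n)]


def operand_value(dpus: list[dict[str, int]], name: str, r: int) -> Fraction:
    """Reassemble A = sum_{i=1}^{n} a_i * r**(-i) from the DPU registers."""
    return sum(
        (Fraction(dpu[name], r**i) for i, dpu in enumerate(dpus, start=1)),
        Fraction(0),
    )


# Two operands, radix 2, five DPUs (values chosen for illustration only).
dpus = distribute({"A": Fraction(5, 8), "M": Fraction(3, 8)}, r=2, n=5)
print(dpus[0])                        # {'A': 1, 'M': 0}
print(operand_value(dpus, "A", r=2))  # 5/8
```

Each DPU sees only its own column of digits, while the weight $r^{-i}$ of a digit is implied by the DPU's position in the chain rather than stored anywhere. This is consistent with the later observation that the DPU operations are independent of digit significance.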
It is felt that differences in the execution rates of the micro-instructions can be minimized by employing more logic elements to perform the more complex (and hence potentially slower) micro-instructions.

The above description of the operation of the arithmetic unit indicates that the DPU registers do not contain entire operands as long as any of the DPUs are actively executing micro-instructions. Each DPU contains the digits of the results of the last micro-instruction it has executed. For example, if at a particular instant of time the following DPUs are active: DPU$_3$ (executing $\mu_{75}$), DPU$_5$ ($\mu_{74}$), DPU$_8$ ($\mu_{73}$), DPU$_{10}$ ($\mu_{72}$), DPU$_{15}$ ($\mu_{71}$), ..., the accumulator register distributed through the DPUs would contain:

${}_{75}a_{1}$, ${}_{75}a_{2}$, (-), ${}_{74}a_{4}$, (-), ${}_{73}a_{6}$, ${}_{73}a_{7}$, (-), ${}_{72}a_{9}$, (-), ${}_{71}a_{11}$, ${}_{71}a_{12}$, ${}_{71}a_{13}$, ${}_{71}a_{14}$, (-), ...

The dashes (-) indicate that the corresponding digit cannot be explicitly identified, as it may be changing. The number preceding the letter 'a' indicates which version of the operand (that is, the operand after how many micro-instructions) the given digit is part of. For example, ${}_{75}a_{1}$ is the value of the first digit of the 'A' register after the first seventy-five micro-instructions have been executed. This notation will be used throughout this paper.

The processing performed by the DPUs can be described by the following:

$$ {}_{j}X_{i} = \Psi_{j}\left({}_{j-1}X_{i},\ {}_{j}F_{i-1},\ {}_{j}G_{i+1},\ \ldots,\ {}_{j}G_{i+\alpha_{j}}\right) \qquad (2.1.1) $$

$$ {}_{j}F_{i} = \Phi_{j}\left({}_{j-1}X_{i},\ {}_{j}F_{i-1}\right) \qquad (2.1.2) $$

and

$$ {}_{j}G_{k} = \Gamma_{j}\left({}_{j-1}X_{k},\ {}_{j}F_{k-1}\right) \qquad (2.1.3) $$

where

${}_{j}X_{i}$ is the operand information contained in the $i$-th DPU immediately following the execution of micro-instruction $j$. It is indicated to be a vector as it consists of the $i$-th digit of each of the active operands,
$\Psi_{j}$ is the function employed to obtain the new operand set. It is dependent only on the micro-instruction to be performed,
${}_{j}F_{i}$ is a 'modifier' value which DPU$_i$ transmits to DPU$_{i+1}$ with the micro-instruction to be performed next,
$\Phi_{j}$ is the function which each DPU performs to determine ${}_{j}F_{i}$,
$\alpha_{j}$ is the number of DPUs which must cooperate with the DPU performing $\mu_j$ by transmitting to this 'active' DPU a value of ${}_{j}G_{k}$,
${}_{j}G_{k}$ is the value which DPU$_k$ transmits to the DPUs with which it cooperates when they are executing $\mu_j$,* and
$\Gamma_{j}$ is the function all DPUs employ to determine the value of ${}_{j}G_{k}$ which each transmits to all DPUs with which it cooperates.

*The value of ${}_{j}G_{k}$ must be defined for $k \ge n+1$, where $n$ is the number of DPUs in the arithmetic unit. One possibility is ${}_{j}G_{k} = 0$ for all $k \ge n+1$.

The operation of a typical DPU, DPU$_i$, is as follows. It begins in a state in which it is receptive to information defining the next micro-instruction to be performed. DPU$_i$ receives this information and the value of ${}_{j}F_{i-1}$ from its left neighbor, DPU$_{i-1}$. When DPU$_i$ receives this information, it determines ${}_{j}G_{i}$ (i.e., performs Equation 2.1.3) and places this value on signal lines which are connected to several of its left neighbors. It also determines ${}_{j}F_{i}$ by performing Equation 2.1.2, and transmits this value and the identity of $\mu_j$ to DPU$_{i+1}$. Some time later DPU$_i$ receives a signal from DPU$_{i-1}$ indicating that DPU$_{i-1}$ has executed $\mu_j$. DPU$_i$ then executes $\mu_j$ (i.e., performs $\Psi_j$), altering the value of one or more of its internal registers. At this time DPU$_i$ transmits a signal to DPU$_{i+1}$ which indicates that DPU$_{i+1}$ may execute $\mu_j$. When DPU$_i$ receives an acknowledgment from DPU$_{i+1}$, it goes into the state where it is receptive to information concerning $\mu_{j+1}$. The sequence above then repeats.

Notice that in this formulation the operations are completely independent of the significance of the digits retained by the DPU. All DPUs may therefore be identical. A second thing to note is that $\alpha_j$, the number of DPUs to the right of the DPU performing the micro-instruction that must supply it with data, is assumed to be a function of the micro-instruction being performed. Its value determines the desirability of including the micro-instruction in the repertoire of the DPUs: the larger the value of the parameter, the less desirable the micro-instruction. There are two reasons for this. The first reason is that the subject micro-instruction cannot be performed by a DPU until the $\alpha_j$ DPUs to its right* have completed all previous micro-instructions. The execution rate of the subject micro-instruction is inversely proportional to $\alpha_j + 1$. The second reason is that the number of data paths required between the DPUs is determined by the maximum value of this parameter. This is shown in Figure 3, where $\alpha$ is the maximum of the $\alpha_j$ for the complete set of micro-instructions. Hence, the number of connections which must be made to a DPU is directly related to $\alpha$.

*Or all the DPUs to its right, if there are fewer than $\alpha_j$ remaining in the sequence.

[Figure 3. Inter-DPU data paths required for various values of $\alpha$ (panels for $\alpha$ = 1, 2, 3, 4, each showing DPU$_1$ through DPU$_5$).]

The communication of control information is not included specifically in the discussion above. Status information must be communicated from a given DPU to all DPUs to which it may supply data. Hence this communication network will also appear as shown in Figure 3.

A second item which must be minimized is the number of values taken on by ${}_{j}F_{i}$, the variable presented to DPU$_{i+1}$ by DPU$_i$ when it identifies the next ($j$-th) micro-instruction. This factor has a less profound effect on the data communications requirements. It affects only the number of transmission paths necessary between adjacent DPUs over and above that required due to $\alpha$.

2.2 Generalized Examples

An example of generalized operations will now be presented to illustrate that the processing of several micro-instructions may take place simultaneously in the arithmetic unit, each by a different DPU. The $\alpha_j$ are given in Table 1. The arithmetic unit will have five DPUs and one operand. The operand will be indicated as $A_j$, the value of the operand after the $j$-th micro-instruction. This operand will be composed of five digits ${}_{j}a_{1}, \ldots, {}_{j}a_{5}$ such that digit ${}_{j}a_{i}$ is the digit contained in DPU$_i$ after the $j$-th micro-instruction.

Table 1. $\alpha_j$ of the Micro-instructions of the Example

| Micro-instruction $j$ | 1 | 2 | 3 | 4 | 5 | 6 |
|-----------------------|---|---|---|---|---|---|
| $\alpha_j$            | 2 | 1 | 0 | 1 | 2 | 0 |

Figure 4. Example of the operation of a limited connection arithmetic unit.

| Time | M1 | M2 | M3 | M4 | M5 | O1 | O2 | O3 | O4 | O5 |
|------|----|----|----|----|----|----|----|----|----|----|
|  1   | S1 |    |    |    |    |    |    |    |    |    |
|  2   | 1  | S1 |    |    |    | 1  |    |    |    |    |
|  3   | S2 | 1  | S1 |    |    |    | 1  |    |    |    |
|  4   |    | S2 | 1  | S1 |    |    |    | 1  |    |    |
|  5   | 2  |    | S2 | 1  | S1 | 2  |    |    | 1  |    |
|  6   | 3  | 2  |    | S2 | 1  | 3  | 2  |    |    | 1  |
|  7   | S4 | 3  | 2  |    | S2 |    | 3  | 2  |    |    |
|  8   |    | S4 | 3  | 2  |    |    |    | 3  | 2  |    |
|  9   | 4  |    | S4 | 3  | 2  | 4  |    |    | 3  | 2  |
| 10   | S5 | 4  |    | S4 | 3  |    | 4  |    |    | 3  |
| 11   |    | S5 | 4  |    | S4 |    |    | 4  |    |    |
| 12   |    |    | S5 | 4  |    |    |    |    | 4  |    |
| 13   | 5  |    |    | S5 | 4  | 5  |    |    |    | 4  |
| 14   | 6  | 5  |    |    | S5 | 6  | 5  |    |    |    |
| 15   |    | 6  | 5  |    |    |    | 6  | 5  |    |    |
| 16   |    |    | 6  | 5  |    |    |    | 6  | 5  |    |
| 17   |    |    |    | 6  | 5  |    |    |    | 6  | 5  |
| 18   |    |    |    |    | 6  |    |    |    |    | 6  |

The operation of the arithmetic unit is presented in tabular form in Figure 4. The columns labelled O$_i$ indicate the operand contained in DPU$_i$. The occurrence of '$j$' in the $i$-th operand column indicates that ${}_{j}a_{i}$ has just been determined and placed in the operand register of DPU$_i$. The columns labelled M$_i$ indicate when a micro-instruction has just been completed by DPU$_i$. The occurrence of '$j$' in an instruction column indicates that the associated DPU has just executed micro-instruction $j$. The occurrence of 'S$_j$' in the $i$-th instruction column indicates that DPU$_i$ has just received the identity of $\mu_j$ and will begin determining ${}_{j}G_{i}$.
The progression of time will be indicated by the rows, each row equivalent to the time required for a DPU to execute one micro-instruction.*

*This presentation of the example is not intended to suggest that synchronous operation is necessary.

In Figure 4, the arithmetic unit is shown to be in a steady state at time 1. No micro-instructions are being executed and $A_0$ is in the operand register. We will assume that the identity of $\mu_1$ has reached DPU$_1$ and ${}_{1}G_{1}$, ${}_{1}G_{2}$, and ${}_{1}G_{3}$ are available. At time 2, DPU$_1$ receives ${}_{1}G_{2}$ from DPU$_2$ and ${}_{1}G_{3}$ from DPU$_3$ and executes $\mu_1$. This causes ${}_{0}a_{1}$ to be replaced by ${}_{1}a_{1}$. During the next four time intervals, $\mu_1$ is performed consecutively by each of the remaining DPUs, since an additional ${}_{1}G_{k}$ becomes available just as it is required by a DPU to perform $\mu_1$.

The identity of the second micro-instruction, S$_2$, is received by a DPU one time unit after that DPU performs $\mu_1$. Since DPU$_1$ requires ${}_{2}G_{2}$ to execute $\mu_2$ ($\alpha_2 = 1$), this micro-instruction is not performed by DPU$_1$ until time 5, since it is not until that time that DPU$_2$ is able to determine this value and send it to DPU$_1$. Just as with $\mu_1$, $\mu_2$ is executed sequentially by each of the remaining DPUs during each of the next four time intervals.

Micro-instruction 3 is performed by each of the DPUs one time unit after that DPU has performed $\mu_2$, because $\alpha_3 = 0$ and no outside information (${}_{3}G_{k}$) is required. The other micro-instructions are performed in the same pattern. In general, DPU$_i$ performs $\mu_j$ in the time unit following its execution of $\mu_{j-1}$ if $\alpha_j = 0$; DPU$_i$ performs $\mu_j$ after it has received ${}_{j}G_{i+\alpha_j}$ from DPU$_{i+\alpha_j}$ if $\alpha_j \neq 0$. Note that the number of DPUs in the arithmetic unit does not affect the rate of execution of micro-instructions.

To illustrate that the correct result is determined, the operation of the arithmetic unit is completed after executing six micro-instructions.
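The timing rule stated above can be checked against Figure 4 with a short Python simulation, using $\alpha_1, \ldots, \alpha_6 = 2, 1, 0, 1, 2, 0$. This is a sketch whose assumptions are read off the example rather than taken from a stated general model. DPU$_i$ receives the identity of $\mu_1$ at time $i$, and all ${}_{1}G_{k}$ are already available. For $j \ge 2$, DPU$_i$ receives the identity of $\mu_j$ (and determines ${}_{j}G_{i}$) one time unit after executing $\mu_{j-1}$. The End Unit, standing in for positions beyond DPU$_n$, is assumed to add no delay. The names `schedule` and `instruction_columns` are hypothetical.

```python
def schedule(alpha: list[int], n: int) -> tuple[dict, dict]:
    """Simulate the timing of the Section 2.2 example.

    alpha[j - 1] holds alpha_j. Returns (E, S): E[i, j] is the time unit in which
    DPU_i executes mu_j, and S[i, j] the time unit in which DPU_i receives the
    identity of mu_j. Assumptions (taken from the example, not a general model):
      * DPU_i receives the identity of mu_1 at time i and every 1G_k is already
        available, so mu_1 never waits for neighbours;
      * for j >= 2, DPU_i receives mu_j (and determines jG_i) one time unit
        after it executes mu_{j-1};
      * the End Unit, standing in for DPUs beyond DPU_n, adds no delay.
    """
    E: dict[tuple[int, int], int] = {}
    S: dict[tuple[int, int], int] = {}
    for j in range(1, len(alpha) + 1):
        a = alpha[j - 1]
        for i in range(1, n + 1):
            S[i, j] = i if j == 1 else E[i, j - 1] + 1
        for i in range(1, n + 1):
            # DPU_i may not execute mu_j before DPU_{i-1} has done so.
            t = E[i - 1, j] + 1 if i > 1 else 0
            if a == 0:
                # No outside data: the time unit after mu_{j-1} (mu_1: after S).
                t = max(t, E[i, j - 1] + 1 if j > 1 else S[i, j] + 1)
            else:
                t = max(t, S[i, j] + 1)
                if j > 1:
                    # Wait until the alpha_j right neighbours have determined jG_k.
                    for k in range(i + 1, min(i + a, n) + 1):
                        t = max(t, S[k, j] + 1)
            E[i, j] = t
    return E, S


def instruction_columns(alpha: list[int], n: int) -> list[list[str]]:
    """Rows of the M_1..M_n columns: 'j' when DPU_i has just executed mu_j and
    'S<j>' when it has just received mu_j's identity (shown only if alpha_j != 0)."""
    E, S = schedule(alpha, n)
    rows = [[""] * n for _ in range(max(E.values()))]
    for (i, j), t in E.items():
        rows[t - 1][i - 1] = str(j)
    for (i, j), t in S.items():
        if alpha[j - 1] != 0:
            rows[t - 1][i - 1] = f"S{j}"
    return rows


ALPHA = [2, 1, 0, 1, 2, 0]   # alpha_1 .. alpha_6 of the example
E, _ = schedule(ALPHA, n=5)
print(max(E.values()))       # 18: DPU_5 finishes mu_6
for t, row in enumerate(instruction_columns(ALPHA, n=5), start=1):
    print(t, row)
```

Under these assumptions the printed rows should match the instruction columns of Figure 4, and the operand columns follow directly, since an operand digit changes exactly when its DPU executes a micro-instruction. Changing `n` shifts only the tail of the schedule. The spacing between DPU$_1$'s executions is unchanged, which agrees with the remark that the number of DPUs does not affect the execution rate.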
The processing will then be completed by the end of time 18, at which time the contents of all the operand registers are indicated. The reader will note that their contents are ${}_{6}a_{1}$, ${}_{6}a_{2}$, ${}_{6}a_{3}$, ${}_{6}a_{4}$ and ${}_{6}a_{5}$, which constitute $A_6$, as required.

If the six micro-instructions whose processing was depicted in Figure 4 were the micro-instructions required to accomplish one regular machine instruction, and are to be followed by other micro-instructions, the latter can clearly be initiated by DPU$_1$ at time $16 + \alpha_7$ (where $\mu_7$ is the first micro-instruction of the next instruction). Hence, the micro-instructions of successive instructions can be overlapped in the same manner that the micro-instructions of a given operation can. The time required to 'perform' an instruction is therefore not significantly affected by the number of DPUs comprising the arithmetic unit. The time required to perform an instruction is essentially the time between the performance by DPU$_1$ of the first micro-instruction of that instruction and its performance of the first micro-instruction of the next instruction.* This assumes, of course, that the time required for a typical micro-instruction to be performed by all of the other DPUs of the arithmetic unit is negligible in comparison to the time the arithmetic unit is busy executing a continuous string of micro-instructions.**

*For the last instruction, the time when a typical micro-instruction could be performed by DPU$_1$ should be used instead of the time the first micro-instruction of the next instruction is performed.

**This continuous string of micro-instructions is a result of the mapping of a continuous string of instructions into micro-instructions by the primitive control unit. The continuous string of instructions may be a result of the effective concatenation by a supervisory program of the instructions required by a sequence of tasks to be performed by the arithmetic unit.
The time the arithmetic unit is busy on a continuous string of micro-instructions may then be on the order of several hours.

2.3 The Basic Micro-Instruction Repertoire of the DPUs

In this section we will discuss the micro-instructions which must be included in the repertoire of the DPUs so that the overall arithmetic unit is able to do addition, subtraction, multiplication, division, and normalization. The micro-instructions may be placed in four classes for the purposes of this discussion. These four classes are:
1. the inter-register transfers,
2. the shift micro-instructions,
3. the arithmetic micro-instructions, and
4. the memory accessing micro-instructions.

Micro-instructions in the first class cause operands to be transferred from one register to another. This allows the results of one machine instruction to be used as an operand in a subsequent instruction. For example, let us consider the case in which a third number is to be added to the quotient of a division that has just been performed. In an addition, the number in the $\Phi$ register is added to the contents of the A register. Since the $\Phi$ register is used as the interface with the storage device, the contents of the MQ register (in this case the quotient of the division) must be transferred to the A register before the addition can be performed.

A second application of the inter-register transfer micro-instructions is in the exchange of operands when normalization or radix point alignment is required. If, as in a classical von Neumann arithmetic unit, only the A register and the MQ register have the required shifting ability, an operand in the $\Phi$ register which must be shifted to align its radix point to that of the other operand must be moved to the A or MQ register.

In all the inter-register transfers, all of the data required by a DPU to perform the micro-instruction is contained within that DPU. This can be seen in Figure 2: each DPU contains one digit of each of the operands. Therefore, $\alpha = 0$
for all inter-register transfers, and ${}_jF_i$ is not required to transmit data. The value of ${}_jF_i$ may be used, therefore, to identify one or both of the registers taking part in the transfer. The number of micro-instruction codes which must be assigned to inter-register transfer micro-instructions therefore depends on the number of values which may be taken on by ${}_jF_i$ in addition to the number of pairs of registers between which inter-register transfers are to be performed.

In the notation of Equations 2.1.1 through 2.1.3 the inter-register transfer micro-instructions may be formulated as:

$${}_jx_i = {}_{j-1}y_i, \qquad i = 1, 2, \ldots, n \qquad (2.3.1)$$
$${}_jF_i = {}_jF_{i-1}, \qquad i = 1, 2, \ldots, n \qquad (2.3.2)$$
$${}_jG_i = \varnothing, \qquad i = 1, 2, \ldots, n \qquad (2.3.3)$$

where Y is the register to be copied into the X register, ${}_jx_i$ is the $i$th digit of the X register after the transfer, ${}_{j-1}y_i$ is the $i$th digit of the Y register before the transfer, and $\varnothing$ indicates that the value of ${}_jG_i$ is not required when performing inter-register transfers.

The second class of micro-instructions is the shift micro-instructions. They are used during radix point alignment prior to addition or subtraction, for normalization, and for multiplication and division by the radix during the repetitive steps for multiplication and division. We will assume that shifts of more than one digital position will be performed as a number of successive shifts of one digital position each.

The left shift can be accomplished by causing the DPU to the immediate right of the DPU performing the micro-instruction to transmit the value of the digit of the operand contained in its register to the DPU performing the micro-instruction. This DPU stores the digit it receives in its operand register. The equations defining a left shift micro-instruction are:

$${}_jx_i = {}_jG_{i+1}, \qquad i = 1, 2, \ldots, n \qquad (2.3.4)$$
$${}_jF_i = {}_jF_{i-1}, \qquad i = 1, 2, \ldots, n \qquad (2.3.5)$$
$${}_jG_i = {}_{j-1}x_i, \qquad i = 1, 2, \ldots, n \qquad (2.3.6)$$
$${}_jG_{n+1} = {}_jF_n \ \text{ if } {}_jF_n \text{ is a valid digit; otherwise see text} \qquad (2.3.7)$$

where X is the operand being shifted, ${}_jx_i$ is the $i$th digit of the shifted operand, ${}_{j-1}x_i$ is the $i$th digit of X before the shift, and ${}_jF_i$ is the modifier value passed along with the micro-instruction.

The value of ${}_jF$ that the PCU sends to DPU$_1$ with the left shift micro-instruction indicates the value that is to go into the last DPU; by (2.3.5) it reaches DPU$_n$ unchanged as ${}_jF_n$. If it is a valid digit, it becomes the digit shifted into the last DPU. If it is not a valid digit, it causes the End Unit to shift in the digit shifted out during the last right shift.

One should also note that the left shift micro-instructions make it possible to transmit the most significant digit of an operand to the PCU. The value of this digit will be on the ${}_jG_1$ lines just prior to the execution of the left shift micro-instruction by DPU$_1$. The left shift can therefore be used by the PCU to examine operands.

The right shift micro-instruction does not have the complexity of the left shift micro-instruction. The value stored into a DPU is the value transmitted by its left neighbor DPU with the indication that a right shift is to be performed. The value of the digit stored in the first DPU is determined by the PCU. In the terminology of Equations 2.1.1 through 2.1.3,

$${}_jx_i = {}_jF_{i-1}, \qquad i = 1, 2, \ldots, n \qquad (2.3.8)$$
$${}_jF_i = {}_{j-1}x_i, \qquad i = 1, 2, \ldots, n \qquad (2.3.9)$$
$${}_jG_i = \varnothing, \qquad i = 1, 2, \ldots, n \qquad (2.3.10)$$

where ${}_jF_0$ is the digit which the PCU transmits with the indication that a right shift is to be performed. This value becomes the value of the most significant digit of the shifted operand. The value of ${}_jF_n$ is transmitted by DPU$_n$ to the End Unit, where it is stored as the new top element in the push-down stack.

A final note concerning shifts is that $\alpha_{LS} = 1$ and $\alpha_{RS} = 0$, and that in the worst case (left shift), ${}_jF_i$ must take on one value more than the number of values a digit may assume.
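As an illustration of Equations (2.3.1) through (2.3.10), the following hypothetical Python model treats one register as a list of digits, one per DPU, with the End Unit's push-down stack as a list. It models only the data movement, not the timing of the G and F signals. The class name `RegisterColumn`, the helper `transfer`, and the use of `None` for the extra, non-digit value of F are assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterColumn:
    digits: list                                    # digits[0] is held by DPU_1 (MSD)
    end_stack: list = field(default_factory=list)   # End Unit push-down stack

    def left_shift(self, f):
        """(2.3.4)-(2.3.7): each DPU stores its right neighbour's digit (alpha = 1).

        f is the F value sent with the micro-instruction; None stands for the
        non-digit value that makes the End Unit supply the digit shifted out by
        the last right shift.  Returns the digit leaving DPU_1 (seen on G_1)."""
        out = self.digits[0]
        fill = self.end_stack.pop() if f is None else f
        self.digits = self.digits[1:] + [fill]
        return out

    def right_shift(self, f0):
        """(2.3.8)-(2.3.10): each DPU stores its left neighbour's digit (alpha = 0).

        f0 from the PCU becomes the new MSD; the digit leaving DPU_n is pushed
        onto the End Unit stack."""
        self.end_stack.append(self.digits[-1])
        self.digits = [f0] + self.digits[:-1]


def transfer(dst, src):
    """(2.3.1)-(2.3.3): x_i <- y_i inside every DPU; no G values are needed."""
    dst.digits = list(src.digits)
```

A right shift followed by a left shift with the non-digit F value restores the original operand, because the digit pushed onto the End Unit stack is the one shifted back in.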
Two micro-instruction codes are required for each register which has shifting capabilities.

The third class of micro-instructions are those required to control the arithmetic processing of the operands. Multiplication and division will be implemented as a number of additions and shifts, just as they were in the classical von Neumann arithmetic unit. The only micro-instructions necessary in this case are those which cause A to be replaced by the following expression:

$$A_j = A_{j-1} + k\,\Phi \qquad (2.3.11)$$

where $A_j$ is the value of the operand contained in the A register after the arithmetic operation, $A_{j-1}$ is the value of the operand contained in the A register prior to the arithmetic operation, $\Phi$ is the value of the operand contained in the $\Phi$ register, and k is a number whose magnitude is less than the radix employed by the arithmetic unit. It is shown in Section 3.1 that k does not take on values which are not digits of the number representation. The ${}_jF_i$ interconnections may therefore be employed to distribute the value of k to the DPUs with no need to increase the number of interconnections, since the ${}_jF_i$ data paths must convey all possible digit values for the shifting micro-instructions.

In Section 3.2 we show that the addition function can be implemented such that $\alpha$ is 1 or $\alpha$ is 2. In the implementations for which $\alpha$ is 1, either the radix and the values which may be taken on by k are restricted, or two micro-instructions must be performed to complete each addition. In the implementation for which $\alpha$ is 2, no such restrictions are encountered. The choice of which method of addition is implemented must be based on trade-off considerations. The major points to be considered are:

• The average time taken to perform additions by the three possible methods.
• The cost of implementing an arithmetic unit whose DPUs have $\alpha = 2$ compared to the cost if $\alpha = 1$. This cost has three components.
The first component is the larger number of interconnections which must be made: in addition to transmitting ${}_jG_i$ and receiving ${}_jG_{i+1}$, DPU$_i$ must be able to receive ${}_jG_{i+2}$ when $\alpha = 2$. The second component of this cost is the additional logic elements required because of the larger number of signals the DPU must receive, the more complex control logic, and the more complex addition logic. The third component is the increased time to perform all micro-instructions; the control logic in the DPU is more complex when $\alpha = 2$ and, quite likely, slower.

The last class of micro-instructions are those which cause exchanges of data between the arithmetic unit and the storage unit. There will be no micro-instructions in this class if the PCU acts as an intermediary in the exchange, as suggested by Comfort (8). The micro-instructions in this class are similar to the inter-register transfer micro-instructions in that ${}_jF_i$ may be used to identify (or aid in identifying) the register taking part in the exchange and the type of exchange (i.e., load or store). The number of micro-instruction codes required is related to the number of registers which communicate with the storage unit and to the number of values which ${}_jF_i$ may take on. Interfacing the storage unit to the arithmetic unit is discussed in depth in Chapter 4.

3. THE ARITHMETIC CONSIDERATIONS OF IMPLEMENTING A LIMITED-CONNECTION ARITHMETIC UNIT

3.1 Introduction

The limited connection arithmetic unit is organized to process floating point operands. The fractional part of the operands will be contained in, and processed by, the DPUs. The processing will begin with the most significant digits of the operands and proceed to those with decreasing significance. This is necessitated by the requirements of the normalization and division processes. In both of these processes, the values of the most significant digits of the operands determine what additional processing is required.
In normalization, the values of the most significant digits are examined to determine whether the number has been normalized and, if not, the next step to be taken. During division, the approximate value of the partial remainder, determined by examining several of the most significant digits, is used in the selection of the next quotient digits. In both of these cases, the results of examining several of the most significant digits of the operands determine the next several micro-instructions to be performed, and hence the examination must be performed in as short a time as possible. This makes it necessary for the most significant digits of the operands to be placed in the DPU which is adjacent to the PCU, since it is the first DPU to execute each micro-instruction.

The various methods of implementing the addition micro-instruction (see Equation 2.3.11) are discussed in Section 3.2, together with a discussion of the implications of each method for the complexity and performance of the arithmetic unit. Section 3.3 discusses the overflow recoder which must be implemented in the PCU if the arithmetic unit is to be able to form double length product fractions. The desirability of recoding multiplier digits is discussed in Section 3.4. A class of optimum recodings for radix 2 signed-digit numbers is also developed in this section. Normalization is discussed in Section 3.5. The optimum algorithm for normalizing radix 2 signed-digit numbers is determined; it is shown that the optimum algorithm for a specific arithmetic unit is dependent on the value of $\xi$* of that arithmetic unit. Division is analyzed in Section 3.6. The relationship between the parameters of the number system and the number of quotient digits determined during each examination of the partial remainder is determined in this section. This analysis shows that it is possible to implement division without requiring special data paths if one quotient digit is determined during each examination of the partial remainder and if an $\alpha = 2$ adder is implemented.

*$\xi$ is the ratio of the time for the PCU to sense the value of the first two digits of the number being normalized to the time to shift that number one digital position to the left.

3.2 Applicable Number Representations and Addition Methods

In order to perform multiplication and division by alternately doing one addition and shifting, the arithmetic unit must include micro-instructions to add or subtract various multiples of one of the operands to the other. That is, the arithmetic micro-instructions must be characterized as follows:

$$A' = A + k\,\Phi \qquad (3.2.1)$$

where $A'$, $A$, and $\Phi$ are consistently represented numbers, and k is a multiplier or quotient digit such that $|k| < r$. …

… $S(A') \ge 2r + S(F) - 2$. But from (3.2.3) and (3.2.4) we see that $S(A') = 2(r - \rho) + 1$, so $S(F) \le 3 - 2\rho$. But since $\rho \ge 1$, then

$$S(F) \le 1. \qquad (3.2.9)$$

Note that if $S(F) = 1$, the f input of the upper stage is constant, and the structure therefore degenerates to the Avizienis structure. The conclusion is that it is not possible to devise a two level adder such that the conditions of Equations (3.2.1) and (3.2.2) can be met.

Figure 6. The three level adder based on the application of two simple transformations.

3.2.3 The Three Level Adder

We will now analyze a three level structure which can also be characterized by $\alpha = 2$. The notation for this adder, shown in Figure 6, is as follows:

$$\sigma_i = a_i + k\,\phi_i \qquad (3.2.10)$$
$$w_i = \sigma_i - r\,t_{i-1} \qquad (3.2.11)$$
$$\sigma'_i = t_i + w_i \qquad (3.2.12)$$
$$w'_i = \sigma'_i - r\,t'_{i-1} \qquad (3.2.13)$$
$$a'_i = t'_i + w'_i \qquad (3.2.14)$$

where $a_i$, $\phi_i$ are the $i$th digits of the operands; $a'_i$ is the $i$th digit of a representation of $A + k\Phi$; $\sigma_i$, $\sigma'_i$ are the $i$th digits of intermediate representations of $A + k\Phi$; $t_{i-1}, w_i$ ($t'_{i-1}, w'_i$) are the components into which $\sigma_i$ ($\sigma'_i$) is decomposed; and $a_i \in A$, $\phi_i \in \Phi$, $k \in K$, $\sigma_i \in \Sigma$, $\sigma'_i \in \Sigma'$, $a'_i \in A'$, $t_i \in T$, $t'_i \in T'$, $w_i \in W$, $w'_i \in W'$.
Equation (3.2.14) yields $S(A') \le S(T') + S(W') - 1$. Using (3.2.3) and (3.2.4), and assuming that $S(W') = r$, since this maximizes the set $A'$ covered by $T' + W'$, the relation becomes an equality,

$$2(r - \rho) + 1 = S(T') + r - 1, \quad\text{or}\quad S(T') = r - 2\rho + 2. \qquad (3.2.15)$$

Since the size of $T'$ must be 2 or greater to assure that the three level adder does not degenerate into the Avizienis adder, $r - 2\rho + 2 \ge 2$, or $r \ge 2\rho$. This condition is satisfied for all number systems considered in this paper. Now, from (3.2.13) and (3.2.15) the size of $\Sigma'$ is

$$S(\Sigma') = S(T')\,S(W') = r(r - 2\rho + 2). \qquad (3.2.16)$$

From Equation (3.2.12) we see that $S(\Sigma') \ge S(T) + S(W) - 1$,* while (3.2.11) requires that

$$r\,S(T) \ge S(\Sigma) = S(A) + S(k\Phi) - 1. \qquad (3.2.17)$$

Applying (3.2.16), (3.2.17), and the assumption that $S(W) = r$, this becomes

$$r^3 - 2r^2\rho + r^2 + r \ge 2(K + 1)(r - \rho) + 1. \qquad (3.2.18)$$

Setting $K = r - 1$, which is equivalent to assuming that $k \in B_{r-1}$, (3.2.18) reduces to $(r - 1)(r^2 - 2r\rho + 1) \ge 0$, that is,

$$\rho \le \frac{r^2 + 1}{2r}. \qquad (3.2.19)$$

Since $\rho \le r/2$ for all signed-digit number systems, it is possible to implement a three level adder for any signed-digit number system. This conclusion is valid whenever all multiplier digits k are chosen from $B_{r-1}$, or one of its subsets. Hence, the redundancy of multiplier and quotient digits has only a second-order effect on the adder complexity. The number of DPUs to which a DPU must send information for performing additions is independent of the range of multiplier digits employed, while the amount of information is dependent on this factor.

*Since the size of $\Sigma'$ is fixed by our earlier assumption.

3.2.4 Implementing the Three Level Adder

The designer has two methods of implementing the three level adder in the limited connection arithmetic unit. The first method is to implement it so that one micro-instruction is executed for each addition operation. If this method is chosen, the data processing portion of the structure can be represented schematically as in Figure 7.
The registers, of course, retain one digit of each of the operands. Adder Part I determines the appropriate $\sigma_i$ from the data retained in the registers of the Digit Processing Unit and the value of k, which is carried by the control (micro-instruction) stream. That is, each Adder Part I implements Equation 3.2.10. It transmits this information to the Adder Part II of its own DPU and to those of the two DPUs to its left. Each Adder Part II then determines its appropriate $a'_i$ digit based on the values of the $\sigma$'s presented to it. That is, the transfer function of Adder Part II is the combined transformations of Equations 3.2.11 through 3.2.14. Note that in this case, the $\alpha$ of the arithmetic unit must be two or more.

A second method allows $\alpha$ to be one by requiring two micro-instructions to be executed for each addition operation. Equations (3.2.10) through (3.2.12) must be performed by the first micro-instruction, which causes an interim representation, $\Sigma'$, of the sum to be placed in the accumulator register. The second micro-instruction then recodes this interim representation to an acceptable signed-digit number by performing Equations (3.2.13) and (3.2.14). This method of implementing addition is subtly different from the Avizienis Adder described in Section 3.2.1. From one to three uses of the Avizienis Adder are required to perform an addition operation, while two addition micro-instructions are required in the method just described. Moreover, this method of performing addition requires that the number of states taken on by a digit of the accumulator be larger than that required in the Avizienis adder by approximately a factor of $r/2$, where r is the radix of the number system employed (see Equation 3.2.16). Hence, if the arithmetic unit must be implemented such that $\alpha = 1$, the Avizienis Adder appears to be a better design than the two step implementation of the three level adder.
3.2.5 A Detailed Look at the Radix Two Three Level Adder

We will now discuss the simplest of the adders of the above type, the radix two three level adder. It is presented both as an example and as a means of examining in greater detail the requirements of the adder. In the case of radix two, K = 1 and $\rho$ must be chosen to be one. There are three possible adders. The first, which will be referred to as Adder 1, can be characterized by $T = \{1, 0, \bar{1}\}$, $W = T' = \{0, \bar{1}\}$, and $W' = \{1, 0\}$. The second adder, which will be referred to as Adder 2, can be characterized by $T = \{1, 0, \bar{1}\}$, $W = T' = \{1, 0\}$, and $W' = \{0, \bar{1}\}$. The third adder, Adder 3, can be characterized by $T = W = T' = W' = \{1, 0, \bar{1}\}$.

The operands for all adders are

$$A = \sum_{i=1}^{n} a_i\,2^{-i} \quad\text{and}\quad \Phi = \sum_{i=1}^{n} \phi_i\,2^{-i},$$

and the sum is

$$A' = \sum_{i=0}^{n} a'_i\,2^{-i}.$$

The relationships between them are given by Equations (3.2.10) through (3.2.14), in which r = 2 and |k| = 1. The choices in Equation (3.2.11) are given for all values of $\sigma_i$, and for all three adders, in Table 2.

Table 2. Digit Choices in (3.2.11) for the Radix 2 Adder Types (for each input $\sigma_i \in \{2, 1, 0, \bar{1}, \bar{2}\}$, the chosen $t_{i-1}$ and $w_i$ for Adders 1, 2, and 3)

Table 3. Digit Choices in (3.2.13) for the Radix 2 Adder Types (for each input $\sigma'_i$, the chosen $t'_{i-1}$ and $w'_i$ for Adders 1, 2, and 3)

…

… (3.3.16)

so the set of values taken on by $rZ_i + \delta_i$ is therefore contiguous in the terminology of Rohatsch. Hence, the set of values taken on by … must also be contiguous. This implies that … (3.3.17)

Since $\delta_i$ and $\gamma_{i+1}$ are symmetric, we will assume that $Z_i$ and $Z_{i+1}$ are also symmetric. Therefore, we will assume that … (3.3.18), where … = r mod 2.

The set defined by the left side of Equation (3.3.14) must contain the set defined by the right side, and the largest element of the former must be larger than the largest element of the latter, or … (3.3.19). After manipulation to isolate …, this becomes … .

…

… where $x_i$ is the $i$th digit of the number, with weight $r^{-i}$ (i.e., $X = \sum_{i=1}^{n} x_i r^{-i}$) and $x_i \in B_{r-\rho}$; v is the number of digits examined to determine if X is normalized; and $\rho$ is an integer parameter indicating the redundancy of the number system, $1 \le \rho$ … .

From the definition of normalized numbers above, we see that, for $\rho = 1$, with $X_v$ denoting the value of the first v digits,

$$r^{-1} \le |X_v| \le 1 - r^{-v} \quad\text{and}\quad 0 \le |X - X_v| \le r^{-v} - r^{-n}.$$

These can be combined to yield

$$r^{-1} - r^{-v} + r^{-n} \le |X| \le 1 - r^{-n}. \qquad (3.5.1)$$

When $\rho > 1$, we see from condition 4 that v = 2 and the analysis becomes

$$\frac{r + 1 - \rho}{r^2} < |X_2| \le \frac{(r - \rho)(r + 1)}{r^2}, \qquad 0 \le |X - X_2| \le \frac{r - \rho}{r - 1}\left[r^{-2} - r^{-n}\right],$$

which yields

$$\frac{1}{r^2}\left[r + 1 - \rho + \frac{\rho - r}{r - 1}\right] + \frac{r - \rho}{r - 1}\,r^{-n} < |X| \le \frac{r - \rho}{r - 1}\left(1 - r^{-n}\right). \qquad (3.5.2)$$

We may see from the above that the range to which numbers can be normalized is fixed for all but the maximally redundant numbers. For maximally redundant numbers, the range of values decreases as the number of digits increases.

3.5.2 Normalization Recodings

The procedure for normalizing signed-digit numbers is complicated by the existence of representations for which $x_1$ is not zero but which do not meet the definition of normalized numbers. These representations must be recoded into an arithmetically equivalent representation which, after shifting out leading digits, does meet the definition. The recoding converts the most significant digits of this representation,

$$x_1 = \pm 1, \quad x_2 = x_3 = \cdots = x_{\tau-1} = \mp(\rho - 1), \quad x_\tau = \mp(r - a) \qquad (3.5.3)$$

into the following digits

$$x_1 = 0, \quad x_2 = x_3 = \cdots = x_{\tau-1} = \pm(r - \rho), \quad x_\tau = \pm a \qquad (3.5.4)$$

where either the upper or the lower sign is chosen uniformly in the description, and $\tau$ is the number of digits altered by the recoding, $\tau \ge 2$. A decimal example of this procedure, assuming $\rho = 2$, is the recoding of $1\,\bar{1}\,\bar{1}\,\bar{6}\,5\,3$ into $0\,8\,8\,4\,5\,3$. In this example $\tau = 4$, $x_2 = x_3 = -(\rho - 1) = \bar{1}$, and $x_4 = -(r - a) = \bar{6}$, with $a = 4$.
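The recoding of (3.5.3) into (3.5.4) can be carried out digit by digit in the way the recoding micro-instruction of Section 3.5.3 would: add $\pm(r-1)$ to a digit equal to $\mp(\rho-1)$ and pass on, or add $\pm r$ to the terminal digit and stop. The following is an illustrative Python sketch; `recode_leading` and `value` are hypothetical names, and the error checks exist only for the sketch.

```python
from fractions import Fraction


def recode_leading(digits, r, rho):
    """Turn the leading pattern (3.5.3) of `digits` (most significant first,
    digits in B_(r-rho)) into the arithmetically equivalent pattern (3.5.4)."""
    s = digits[0]                      # +1 selects the upper signs, -1 the lower
    if s not in (1, -1):
        raise ValueError("pattern (3.5.3) needs x_1 = +1 or -1")
    out = list(digits)
    out[0] = 0                         # x_1: +-1 becomes 0
    i = 1
    while i < len(out) and out[i] == -s * (rho - 1):
        out[i] += s * (r - 1)          # -+(rho - 1) becomes +-(r - rho); pass it on
        i += 1
    if i == len(out) or s * out[i] >= 0:
        raise ValueError("no terminal digit -+(r - a) found")
    out[i] += s * r                    # -+(r - a) becomes +-a; recoding stops here
    if abs(out[i]) > r - rho:
        raise ValueError("recoded digit falls outside B_(r-rho)")
    return out


def value(digits, r):
    """Exact value of a fraction whose first digit has weight r**-1."""
    return sum(Fraction(d, r ** (k + 1)) for k, d in enumerate(digits))
```

Applied to the decimal example above (r = 10, $\rho$ = 2), the sketch turns $1\,\bar{1}\,\bar{1}\,\bar{6}\,5\,3$ into $0\,8\,8\,4\,5\,3$ with the value unchanged; for radix two with $\rho = 1$ the middle digits are zeros, so $1\,0\,0\,\bar{1}$ becomes $0\,1\,1\,1$.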
3.5.3 Methods of Normalization

There are two basic methods of normalizing numbers in this arithmetic unit. They differ in the manner in which they obtain information about the number. In the first method, the values of the most significant digits are sensed only by shifting them into the PCU. This can be thought of as over-normalization followed by restoration, since the number will have a non-zero integer part stored in the PCU just prior to restoration. In the second method, the values of the required digits are transmitted to the PCU by some micro-instruction. Provision for sensing the contents of the A register will be included to sense partial remainders during division; hence, this normalization technique would not appreciably add to the complexity of the DPUs.

In addition, this latter method affords the designer two options to minimize the time to perform the partial normalization. The first option is the inclusion of sense micro-instruction detectors among the DPUs. Each time the primitive control unit receives notification that additional digits of the number are available, it can determine whether it has sufficient information to complete the partial normalization and also whether additional left shift micro-instructions must be issued. Note that the only case for which neither is true is the case for which the recoding may have to be performed. Hence, in all but this one case the time required to sense the number is largely overlapped with the time to perform the shifts required, and the time to partially normalize may be less than the time for a micro-instruction to propagate through v digit processing units.

The second option is the inclusion of micro-instructions to perform the recodings indicated above. These micro-instructions would add $\pm(r - 1)$ to the register of the DPU containing the representation being normalized if it initially contained $\mp(\rho - 1)$, and would then transmit the micro-instruction to its right neighbor. When the register of the DPU executing the micro-instruction contains any other digit, the micro-instruction causes $\pm r$ to be added to the register contents, hence effecting the $\mp(r - a)$ to $\pm a$ transformation. This is the form of the transformation required of the last digit to be altered. This last DPU does not pass the micro-instruction to its neighbor.

Comparisons between the two basic methods are very difficult to make because of the options available with the latter method.

3.5.4 Analysis of Radix Two Normalization Methods

Now we will look in detail at the problem of normalizing radix two numbers when v = 2. The choice of examining the first two digits of the number is compatible with the implementation of the simplest division algorithm. Because of the small value of v, we will assume that the arithmetic unit does not contain sense micro-instruction detectors and that the DPUs do not have micro-instructions to do the recodings.

Numbers in this system which do not require recoding during normalization are of the form

$$\underbrace{0\,0\cdots0}_{i}\;x\;X\cdots$$

where x is either +1 or −1 for any particular number, and X is either 0 or x. Numbers in this system which do require recoding are of the form

$$\underbrace{0\,0\cdots0}_{i}\;x\;\underbrace{\bar{x}\,\bar{x}\cdots\bar{x}}_{j}\cdots$$

where $\bar{x} = -x$. The above numbers will be referred to as (i) and (i, j) in the remainder of the discussion.

We will assume that the probability distribution of a given digital position is a function only of the value of the digit to its left,* and that these probabilities are

P(leading zero) = …
P(0 | 1) = P(0 | 1̄) = …
P(0 | 0) = …
P(1 | 1̄) = P(1̄ | 1) = …
P(1 | 0) + P(1̄ | 0) = …
P(1 | 1) = P(1̄ | 1̄) = …

where $P(y \mid z) = P(x_{i+1} = y, \text{ given } x_i = z)$. Then the probabilities of the two numbers are

$$P((i)) = \ldots \qquad (3.5.5)$$
$$P((i, j)) = \ldots \qquad (3.5.6)$$

*This is based on the analysis of a radix two adder in Appendix I.

Four methods for performing this normalization will be considered.

Method A consists of shifting the representation left until the portion of the representation shifted into the Primitive Control Unit can be recoded into two digits of the same sign or a non-zero digit followed by a zero. The two digits are then shifted back into the DPUs.

Method B consists of examining the digits contained in the two most significant DPUs. If the terminal digit is not detected, two left shifts are launched and the process is repeated. If the terminal digit is detected, the additional shifts necessary to normalize the number are performed and the process is terminated.

Method C consists of examining the digits contained in the two most significant digital positions. If the terminal digit is not detected, one left shift is performed if the first non-zero digit is in the second digital position, and two left shifts are performed otherwise. The process is then repeated. If the terminal digit is detected, then the appropriate final shifts, if any, are performed and the process is terminated.

Method D consists of shifting the representation to be normalized left until a non-zero digit is shifted into the primitive control unit. The first two digital positions are then examined. If the terminal digit is not detected, two left shifts are performed. The (examination, two left shifts) portion of the procedure is repeated until a terminal digit is detected and the appropriate final steps are performed.**

**The recoding is performed by shifting the digit to be changed into the primitive control unit, which recodes the digit and shifts the altered digit right (into DPU$_1$).

Table 8. Normalization Procedure for All Possible Radix 2 Signed Digit Numbers

Method  Class   Condition        Left Shifts  Examinations  Right Shifts
A       (i)                      i+2          0             2
        (i,j)                    i+j+3        0             2
B       (i)     i even           i            (i+2)/2       0
        (i)     i odd            i+1          (i+3)/2       1
        (i,j)   i+j even         i+j+2        (i+j+4)/2     1
        (i,j)   i+j odd          i+j+3        (i+j+3)/2     1
C       (i)     i even           i            (i+2)/2       0
        (i)     i odd            i            (i+3)/2       0
        (i,j)   i+j even         i+j+2        (i+j+4)/2     1
        (i,j)   i even, j odd    i+j+2        (i+j+3)/2     1
        (i,j)   i odd, j even    i+j+3        (i+j+5)/2     1
D       (i)                      i+1          1             1
        (i,j)   j even           i+j+3        (j+2)/2       1
        (i,j)   j odd            i+j+2        (j+3)/2       1

Table 9. Average Number of Operations Required During a Normalization (for each of Methods A through D: the average numbers of left shifts, right shifts, and examinations)

Table 8 gives the number of left shifts, right shifts, and examinations which must be performed for each possible number and for each of the above methods of normalization. Applying the probabilities given in (3.5.5) and (3.5.6) and assuming that the numbers are not limited in length,* one obtains the average numbers of left shifts, right shifts, and examinations shown in Table 9.

*The error in these calculations is approximately $3 \times 2^{\ldots}$ and $2^{\ldots}$ for the average number of left shifts and examinations required, respectively, where n is the number of digits in a number.

Figure 9. Average time to normalize a radix two signed digit number. (Plot of the expected number of shift times, about 4.0 to 6.5, against $\xi$ from 1.0 to 2.0, with one curve for each of Methods A, B, C, and D.)

Assuming that the time to perform a left shift is the same as the time to perform a right shift, and neglecting the time required to make decisions, the expected number of shift times required to perform a normalization is

$$\langle T \rangle = \langle LS \rangle + \langle RS \rangle + \xi\,\langle E \rangle \qquad (3.5.7)$$

where
$\langle T \rangle$ is the expected number of shift micro-instruction times required to normalize an output of a symmetric base 2 signed digit adder,
$\langle LS \rangle$ is the expected number of left shift micro-instructions required,
$\langle RS \rangle$ is the expected number of right shift micro-instructions required,
$\xi$ is the ratio of the time required to transmit the two most significant digits of the number undergoing normalization to the primitive control unit to the time required to launch a shift micro-instruction, and
$\langle E \rangle$ is the expected number of times the digits retained in the first two digital positions of the register containing the number being normalized must be transmitted to the primitive control unit.

A graph of $\langle T \rangle$ versus $\xi$, as determined from Table 9 and (3.5.7), is presented as Figure 9. This graph shows that Normalization Method C is optimum for the smallest values of $\xi$, Method D for intermediate values, and Method A for the largest values. The value of $\xi$ is approximately 2 if the "red tape" in issuing micro-instructions is negligible, and becomes smaller as the "red tape" becomes more predominant. Hence, Method A is the optimum when "red tape" is negligible; Methods D and then C become optimum as the "red tape" increases.

3.6 Division Considerations

Division, like multiplication, must be performed by repetitive additions and shifts in the limited connection arithmetic unit because it contains a single adder. Unlike multiplication, however, the partial remainder must be examined periodically to determine one or more quotient digits, which will control the use of the adder in subsequent steps. That is, while the repetitive steps of division must be performed in radix r because of the existence of a single adder, the quotient digit determination may be performed with radix $r^\lambda$, where $\lambda$ is the number of radix r quotient digits determined in one comparison.

In the arithmetic unit under investigation, the time required to transmit a given number of digits of the partial remainder or divisor to the selection mechanism is directly proportional to the number of digits transmitted. The values of several digits of the partial remainder are required to determine one quotient digit; the value of one additional digit is required for each additional quotient digit.
Hence, the total time spent obtaining information from the partial remainders decreases as the number of quotient digits determined in each step increases. This effect is opposed by two factors. The first is the accuracy to which the value of the divisor must be known. The number of digits of the divisor whose values must be available to the quotient digit selection logic has the same form as that for the partial remainder: several digits for the first quotient digit, and one for each additional quotient digit determined. The second factor is the time required for the quotient digit selection logic to yield the quotient digits after the appropriate information is presented to it. This last factor is not related in a simple way to the micro-instruction execution time.

Therefore, rather than determining a specific division algorithm, we will determine the number of digits of the divisor and partial remainders which must be presented to the quotient digit selection logic as a function of the number of quotient digits selected per step and of the radix and redundancy of the numbers employed. The class of divisions requiring the minimum information about the values of the partial remainders and divisors will be analyzed.

Figure 10. P-D plot for general SRT division.

3.6.1 Analysis of Division for Maximally Redundant Numbers

We will now begin the analysis of the number of digits of the partial remainder required to determine a number of quotient digits, extending the work of Robertson (18) and Atkins (3). The number of digits of the partial remainder and of the divisor which must be available to the quotient digit selection mechanism is determined by the situation depicted graphically in Figure 10. This figure is a P-D plot, as suggested by C. V. Frieman (18). The abscissa is the divisor value, and the ordinate is the value of the shifted partial remainder. The graph is divided into regions for which a given quotient digit value may be chosen.
The boundaries of such regions are lines of the form

$$r^\lambda p_j = (q-K)\,d \qquad (3.6.1)$$

and

$$r^\lambda p_j = (q+K)\,d \qquad (3.6.2)$$

where
$r$ is the radix of the number representation system,
$\lambda$ is the number of quotient digits determined by a single comparison (hence the division radix is effectively $r^\lambda$),
$p_j$ is the value of the partial remainder at the conclusion of step $j$,
$q$ is the quotient digit value which may be selected,
$n$ is the largest possible quotient digit, i.e., $q \in \{-n, -(n-1), \ldots, -1, 0, 1, \ldots, n\}$,
$d$ is the divisor value, and
$K$ is a constant determined by the redundancy of the quotient:

$$K = \frac{n}{r^\lambda - 1}. \qquad (3.6.3)$$

Note that since quotient digits will be determined from truncated versions of the divisor and partial remainder, it is only known that the point representing a given divisor and partial remainder lies within a rectangular region around the truncated values. Hence, each such region must lie entirely within one of the quotient digit regions defined by Equations (3.6.1) and (3.6.2).

In Figure 10, line 1 represents the lower boundary of the region for which the choice $q = n$ can be made. Line 2 is the upper boundary of the region for which $q = n-1$ may be chosen. The interior of the dashed-line rectangle represents the range of possible (divisor, partial remainder) pairs which have truncated values of $\underline{d}$ and $(n - \tfrac{1}{2})\underline{d}$, respectively, where $\underline{d}$ is the minimum positive truncated divisor value. When this rectangle lies entirely within the two lines, all other such rectangles of the same size will lie entirely within some quotient digit region.* Atkins has shown that ensuring that point 3 on line 2 is not below point 4 on the rectangle guarantees that the rectangle is within the two lines. This condition is

$$(n-1+K)\,(\underline{d} - \Delta d) \ge \left(n - \tfrac{1}{2}\right)\underline{d} + \Delta p \qquad (3.6.4)$$

where $\underline{d}$ is the smallest positive truncated divisor value, $\Delta p$ is the truncation error in the shifted partial remainder, and $\Delta d$ is the truncation error in the divisor value.
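A direct reading of the region boundaries, in which digit $q$ may be chosen when $(q-K)d \le r^\lambda p_j \le (q+K)d$ with $K = n/(r^\lambda - 1)$, can be sketched as a function that lists every quotient digit whose region contains a given (divisor, shifted partial remainder) point. This is an illustrative sketch with hypothetical names; Python's `fractions.Fraction` is used so that points on a boundary are classified without rounding error.

```python
from fractions import Fraction
from typing import List


def admissible_digits(shifted_p: Fraction, d: Fraction, r: int, lam: int, n: int) -> List[int]:
    """Digits q in -n..n with (q - K) d <= shifted_p <= (q + K) d.

    shifted_p is r**lam * p_j, d is the divisor, and K = n / (r**lam - 1)
    as in (3.6.3). Exact fractions keep boundary points on the boundary.
    """
    K = Fraction(n, r ** lam - 1)
    return [q for q in range(-n, n + 1) if (q - K) * d <= shifted_p <= (q + K) * d]


# r = 2, lambda = 1, n = 1 (K = 1): a point inside the overlap of two regions
print(admissible_digits(Fraction(1, 2), Fraction(3, 4), r=2, lam=1, n=1))  # [0, 1]
```

For $r = 2$, $\lambda = 1$, $n = 1$, the point $d = 3/4$, $r p_j = 1/2$ lies in the regions of both $q = 0$ and $q = 1$; this overlap is the redundancy that allows the selection logic to work from truncated values.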
Note that Equation (3.6.4) is based on the assumption that the partial remainder will be shifted $\lambda$ digital positions before the comparison is made. Since the most significant quotient digit has weight $r^{\lambda-1}$, the more usual procedure is to shift the partial remainder left only one digital position and then determine the next $\lambda$ quotient digits. The quotient digits are then disposed of beginning with the most significant digit, and one left shift is performed between each such pair of steps. Hence, the value of $\Delta p$ determined by Equation (3.6.4) will be $r^{\lambda-1}$ times larger than it will have to be if this method is used for determining new partial remainders.

*This is a conservative solution. It will be discussed later that solutions were found which had smaller values of $\rho$ and $\delta$.

Taking the above consideration into account, Equation (3.6.4) can be manipulated into the following more useful form:

$$r^{\lambda-1}\,\Delta p + (n-1+K)\,\Delta d \le \left(K - \tfrac{1}{2}\right)\underline{d} \qquad (3.6.5)$$

From the section on normalization we see that for maximally redundant numbers the division parameters are

$$\underline{d} = \frac{1}{r}, \qquad \Delta d = r^{-\delta}, \qquad \Delta p = r^{-\rho} \qquad (3.6.6)$$

where $\delta$ is the number of digital positions of the divisor which are transmitted to the quotient digit selection device and $\rho$ is the number of digital positions of the partial remainder to the right of the radix point which are transmitted to the quotient digit selection mechanism.

Table 10. Requirements for Performing Division with Maximally Redundant Numbers.

r | λ | ρ | δ
r ≥ 4 | λ ≥ 1 | λ + 1 | λ + 2
r = 3 | λ ≥ 2 | λ + 2 | λ + 2
r = 3 | λ = 1 | 2 | 3
r = 2 | λ ≥ 3 | λ + 2 | λ + 3
r = 2 | λ = 2 | 4 | 4
r = 2 | λ = 1 | 2 | 2

The maximum quotient 'digit' consists of $\lambda$ digits all equal to $(r-1)$; hence

$$n = \sum_{i=0}^{\lambda-1} (r-1)\,r^{i} = r^\lambda - 1 \qquad (3.6.7)$$

which yields from (3.6.3) that

$$K = 1. \qquad (3.6.8)$$

Applying (3.6.6), (3.6.7) and (3.6.8) to (3.6.5),

$$r^{\lambda-1}\,r^{-\rho} + (r^\lambda - 1)\,r^{-\delta} \le \frac{1}{2r}. \qquad (3.6.9)$$

Now letting

$$\delta = \rho + 1 \qquad (3.6.10)$$

yields

$$r^{\rho} \ge \left(4 - \frac{2}{r^\lambda}\right) r^\lambda. \qquad (3.6.11)$$

Therefore

$$\rho = \lambda + 1 \quad \text{and} \quad \delta = \lambda + 2 \quad \text{for } r \ge 4. \qquad (3.6.12)$$

The other cases are, in general, no more difficult to solve. Optimum solutions are given in Table 10; the entries for r = 3, λ = 1 and for r = 2 with λ = 2 and λ = 1 were not, however, determined from Equation (3.6.9). For these cases, requiring that the rectangle lie entirely within the region bounded by lines 1 and 2 in Figure 10 is overly conservative. In these cases it is possible to choose $\rho$ and $\delta$ such that point 2 of the rectangle coincides with point 1 of line 1. This condition is expressed algebraically as

$$r^{\lambda-1}\,r^{-\rho} + (r^\lambda - 2)\,r^{-\delta} \le \frac{1}{2r}. \qquad (3.6.13)$$

For all but r = 2 and λ = 1, this equation determines $\rho$ and $\delta$. However, the coefficient of $r^{-\delta}$ in Equation (3.6.13) is zero when r = 2 and λ = 1; hence $\delta$ could not be determined from that equation. To determine the value of $\delta$, the P-D plot for this case was drawn. It was seen that $\Delta d$ must be such that the region of divisor and partial remainder values centered at $(\tfrac{1}{2}, 0)$ will remain within the q = 0 region. As seen in Figure 11, this value is $\Delta d = \tfrac{1}{4}$; hence $\delta = 2$.

Figure 11. P-D plot used to determine δ for r = 2, λ = 1.

The reader should notice that when λ = 1, the values of only the first two digits of the partial remainder are required to select the quotient digit. Since, as we have seen in Section 3.2, the primitive control unit must receive information from the first two DPUs to detect overflow when addition or subtraction is performed, no additional data paths are required to sense the value of the partial remainder. For r = 2, δ = 2 also, indicating that no special data paths are necessary. For r ≥ 3, δ = 3, indicating that special provision would have to be made to transmit the value of the divisor digit from the third DPU to the PCU. While it is possible to include a special data path from the third DPU to the PCU, an alternative which appears attractive is to normalize by Method D as discussed in Section 3.5, since this makes the values of three digits of the operand accessible to the PCU.
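The entries of Table 10 can be re-checked mechanically. The sketch below assumes the two selection conditions have the common form $r^{\lambda-1} r^{-\rho} + (r^\lambda - c)\,r^{-\delta} \le 1/(2r)$, with $c = 1$ for the conservative condition (3.6.9) and $c = 2$ for the condition (3.6.13) used for r = 3, λ = 1 and for r = 2, λ ≤ 2. It tests that each entry satisfies its inequality and that lowering $\rho$ or $\delta$ by one violates it; the $\delta$ test is skipped when the coefficient $r^\lambda - c$ is zero (r = 2, λ = 1), where the text fixes δ = 2 from the P-D plot instead. Function names are hypothetical, and the rows with general λ are only sampled.

```python
from fractions import Fraction


def lhs(r: int, lam: int, rho: int, delta: int, c: int) -> Fraction:
    """r^(lam-1) * r^-rho + (r^lam - c) * r^-delta, computed exactly.

    c = 1 gives the conservative inequality (3.6.9); c = 2 gives (3.6.13).
    """
    R = Fraction(r)
    return R ** (lam - 1 - rho) + (r ** lam - c) * R ** (-delta)


def feasible(r: int, lam: int, rho: int, delta: int, c: int) -> bool:
    return lhs(r, lam, rho, delta, c) <= Fraction(1, 2 * r)


def locally_minimal(r: int, lam: int, rho: int, delta: int, c: int) -> bool:
    """Entry satisfies its inequality, but rho - 1 (and delta - 1) would not."""
    if not feasible(r, lam, rho, delta, c):
        return False
    if feasible(r, lam, rho - 1, delta, c):
        return False
    # when the divisor coefficient vanishes, delta is not fixed by the inequality
    if r ** lam - c != 0 and feasible(r, lam, rho, delta - 1, c):
        return False
    return True


# Table 10 rows as (r, lam, rho, delta, c); rows with general lambda are sampled
TABLE_10 = (
    [(r, lam, lam + 1, lam + 2, 1) for r in (4, 5, 8, 10) for lam in (1, 2, 3)]
    + [(3, lam, lam + 2, lam + 2, 1) for lam in (2, 3, 4)]
    + [(3, 1, 2, 3, 2)]
    + [(2, lam, lam + 2, lam + 3, 1) for lam in (3, 4, 5)]
    + [(2, 2, 4, 4, 2), (2, 1, 2, 2, 2)]
)

print(all(locally_minimal(*row) for row in TABLE_10))  # True
```

The check is local: it confirms that no single entry can be reduced by one, not that no other $(\rho, \delta)$ pair exists. It also shows why the less conservative condition is needed for the special cases: `feasible(2, 2, 4, 4, 1)` is False, so the conservative form would not admit the r = 2, λ = 2 entry.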
These three digits are the digit shifted into the PCU and the digits in the first two DPUs. Hence, it is possible to build an arithmetic unit for an arbitrary radix which has very regular and repetitive interconnections. Whether this is adequate, or whether a division algorithm in which more than one quotient digit is determined at each comparison should be used, can only be decided when some very implementation-dependent factors are taken into account. The major factors are the time required for the quotient digit selection logic to produce its output and the decrease in the speed of micro-instruction execution caused by the additional connections to certain of the data paths.

3.6.2 Analysis of Division for Other Number Systems

The analysis of division for an arithmetic unit in which a representation other than a maximally redundant one is employed is much more complicated. From the discussion of the preceding section, we see that, in general, the minimum positive truncated divisor value $\underline{d}$ is given by a more complicated expression (3.6.14), while the truncation errors are

$$\Delta p = \frac{r-\ell}{r-1}\, r^{-\rho} \qquad (3.6.15)$$

and

$$\Delta d = \frac{r-\ell}{r-1}\, r^{-\delta}, \qquad (3.6.16)$$

where $\ell$ is the redundancy parameter of the number representation ($\ell = 1$ for maximally redundant numbers, in which case these reduce to (3.6.6)). The maximum quotient digit is

$$n = \sum_{i=0}^{\lambda-1} (r-\ell)\, r^{i} = (r-\ell)\,\frac{r^\lambda - 1}{r-1} \qquad (3.6.17)$$

from which we obtain

$$K = \frac{r-\ell}{r-1}. \qquad (3.6.18)$$

Applying the above to Equation (3.6.5), we obtain

$$\frac{r-\ell}{r-1}\left[r^{\lambda-1-\rho} + (n-1+K)\, r^{-\delta}\right] \le \frac{r-2\ell+1}{2(r-1)}\,\underline{d}.$$

Neglecting all but the leading terms, letting $\delta = \rho + 1$, and doing further manipulation, we obtain

$$r^{\rho} \ge \frac{2\, r^{\lambda}\, (r-\ell)(2r-\ell-1)}{(r-1)(r-2\ell+1)}, \qquad (3.6.19)$$

from which we find that

$$\rho = \lambda + 1 \quad \text{for } \ell \le \frac{r}{2} - 1. \qquad (3.6.20)$$

When minimally redundant representations are employed in the arithmetic unit, the solution is $\rho = \lambda + 2$ for …

… accomplished simultaneously.
To do this, the most significant digit of the number to be loaded is obtained from memory. Then the left shift micro-instruction is performed such that the digit obtained from memory becomes the least significant digit. The digit shifted into the PCU by that micro-instruction is then stored in the memory as the most significant digit of the number being stored. The next digit of the number to be loaded is then obtained and the sequence above repeated. The sequence is repeated as many times as there are digits in the numbers.

The other method of loading and storing the arithmetic registers involves the use of generally distributed memory buses, sets of address registers, and scheduling mechanisms. There are two sets of each of these items, one for loading operands and one for storing results. This method requires only one micro-instruction to be issued per data transfer. The diagram of the system is presented as Figure 13.

The operation of the system is as follows. When the PCU detects a load or store operation, it transmits the memory address to the address register indicated by the free pointer register (PI or PO). If there are fewer address registers than DPUs, the storage of this new address must be preceded by a check that this address register is available.* If the register is free, the address is stored in the register. The pointer register is incremented by one, modulo the number of registers in the group. The appropriate micro-instruction is then issued to the first DPU by the primitive control unit. This DPU requests that its information digit be transferred to or from the storage unit. When the scheduling mechanism sends the signal to proceed, the DPU sends the value of the appropriate pointer register (PI_x or PO_x) to the address registers, which convert it to a memory address by a table-lookup process in the input address registers or output address registers (whichever is appropriate).
It transmits or accepts the data digit and passes the micro-instruction to its immediate neighbor, which then goes through the same cycle of events. The address which was referenced is also incremented to point to the location of the next digit of the operand. Note that since all "store" micro-instructions prior to a given "load" micro-instruction are guaranteed to have been performed by a given DPU, there is no danger that an inappropriate value of a given variable is used by the adder.

*The arithmetic unit must wait until it is available.

The choice which the designer has in interfacing a digit-oriented storage device to the arithmetic unit is between a method which entails the minimum additional hardware and a method which may be faster but which requires an extensive amount of additional hardware. The second method is expected to be faster by a factor approximately equal to the number of digits contained in each operand.

4.3 Methods Applicable When the Memory Byte Is a Number of Digits

We will now discuss the methods of communicating between the arithmetic unit and memories which transfer a number of digits per transaction. One problem, not present when the memory byte is the digit, then faces the designer. He must incorporate serializing and deserializing mechanisms into the interfaces. In the case of the first method discussed in the preceding section, in which all communication takes place through the PCU, the extension is very straightforward. The serializer and deserializer simply become part of the PCU, since only one data transfer can be occurring at any given time in each direction. The approach of employing a centralized serializer and deserializer does not appear to be appropriate when the method depicted in Figure 13 is extended.
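The first interfacing method of Section 4.2, in which all communication passes through the PCU, stores the old register contents and loads a new number simultaneously with one left shift per digit. A minimal sketch, assuming a register modeled as a list of digits with the most significant digit first; the function name is hypothetical.

```python
from collections import deque
from typing import List, Tuple


def load_and_store(register: List[int], incoming: List[int]) -> Tuple[List[int], List[int]]:
    """Store `register` and load `incoming` at the same time, one left shift per digit.

    Both numbers are digit lists, most significant digit first, of equal length.
    Returns (new register contents, digits written to memory in order).
    """
    if len(register) != len(incoming):
        raise ValueError("operands must have the same number of digits")
    reg = deque(register)
    stored: List[int] = []
    for digit in incoming:            # memory supplies the most significant digit first
        stored.append(reg.popleft())  # digit shifted out into the PCU is written to memory
        reg.append(digit)             # fetched digit enters as the least significant digit
    return list(reg), stored


print(load_and_store([1, 2, 3], [7, 8, 9]))  # ([7, 8, 9], [1, 2, 3])
```

After as many shifts as there are digits, the register holds the loaded number and memory has received the old contents in order, which is why this method costs one micro-instruction per digit rather than one per transfer.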
There are two methods of providing communication between the memory and the arithmetic unit when the method of Figure 13 is extended in this way. The first method, shown in Figure 14, uses registers capable of storing one memory byte to convert data lengths. Each of the registers is interfaced with as many DPUs as there are digits in the memory byte. When a load micro-instruction reaches the first DPU connected to a given input register, the control circuitry associated with the register causes the appropriate information to be put into the register.* Each DPU connected to the register then gates in its digit from the register when it executes the load micro-instruction. Each DPU executes a store micro-instruction by storing the appropriate digit into its portion of the output register to which it is connected.** When all of the positions of a register have been filled, the contents of that register are stored in the memory. The storage address register scheme described in Section 4.2 can also be used with this method, except that now addresses are associated with buffer registers, not DPUs.

*The register is filled only if all of the DPUs connected to the register have executed the preceding load micro-instruction.
**The DPU must be inhibited from executing the store instruction until the preceding information has been sent to the store unit.

Since data which a given DPU presents to its output register for storage is not guaranteed to have been stored before it performs subsequent micro-instructions, a mechanism must be included which assures that subsequent fetches will always retrieve the appropriate data. A simple method of guaranteeing this is to compare the address of the operand which is to be fetched with the address of the last store operation. If they are not the same, then the memory will contain the most recently calculated data when it is requested.
A given DPU cannot obey a store micro-instruction unless the store buffer register is assigned to collect data for that store operation, which implies that all prior store instructions have been completed. If, on the other hand, the address of the last store operation is the same as that of the operand, there is a possibility that for some DPU the store has not been made prior to the load. This condition can be controlled by means of an interlock. This interlock could consist of altering the micro-instruction from a storage unit reference to a reference to the contents of the output register.* This may necessitate the inclusion of register-to-register transfer micro-instructions not otherwise required. A second method of implementing the required interlock consists of a micro-instruction** which would not proceed past the first DPU attached to an output register unless that register had stored all operands previously presented to it. This has the undesirable effect of requiring either two types of DPU or some way of altering the action of the universal DPU on the basis of whether it is a "first" unit or not.

*This has some analogy to the Common Data Bus (24).

This method of byte-length conversion has one distinct disadvantage. A second memory micro-instruction of the same type, i.e., the second of two loads or two stores, must not be allowed to be performed by the first DPU associated with a register until after the last unit associated with that register has completed the previous similar micro-instruction and the appropriate memory action has been taken. If this causes intolerable delays, a second method of byte-length conversion could be employed. This is depicted in Figure 15. In this approach the information exchanged between the adder and the memory passes through shift-register-like structures.
**The most obvious micro-instruction meeting these conditions is a second store micro-instruction.

An element of this structure can store one digit and will accept data from its input neighbor when its indicator shows that it does not contain information. The element subsequently causes its own indicator to turn on and sends a signal to its input neighbor to turn that indicator off. These elements are arranged in chains; one extremity of each chain is connected to a DPU and the other is connected to a memory bus. There must be two chains associated with each DPU, one of which stages input operand digits, the other of which accumulates digits until a memory byte is collected. The former is used in loading operands from memory: the memory bus acts as the input neighbor of the first element, and the DPU receives digits from the last element in the chain. The latter is used in storing results in the memory: the DPU is the input neighbor of the first element, and the memory bus receives data from the last element of the chain.

The shift register chains need not be equally long. For example, it is advantageous for the store chain associated with the last DPU of a given memory byte to be made shorter than the store chain receiving data from the first DPU of that storage byte. The chain of the first DPU must not only have room to store digits until the memory may be accessed but must also store digits while the storage unit byte is being accumulated.

The address information can be handled very much like the methods above, viz. by a number of registers in which storage addresses can be stored and pointers indicating the storage address register that each data register is to use. More extensive checking is required to assure that the references to each specific memory location will be done in the order in which they were issued. Associated with each address register must be two dependency registers and a flag.
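The element-to-element handshake of these shift-register chains can be sketched as a software model. This is a simplification under stated assumptions: the asynchronous indicator signalling is modeled as a synchronous `step()` in which each empty element takes the digit held by its input neighbor, whose indicator is then cleared; `None` stands for an indicator that is off, and the class and method names are hypothetical.

```python
from typing import List, Optional


class Chain:
    """Chain of one-digit staging elements; index 0 is the input end."""

    def __init__(self, length: int) -> None:
        # None models an element whose indicator is off (no information held)
        self.slots: List[Optional[int]] = [None] * length

    def offer(self, digit: int) -> bool:
        """Input neighbour (memory bus or DPU) presents a digit to the first element."""
        if self.slots[0] is None:
            self.slots[0] = digit
            return True
        return False  # first element still full: the input end must wait

    def step(self) -> None:
        """Each empty element takes its input neighbour's digit; that neighbour becomes empty.

        Scanning from the output end moves every digit at most one position per step.
        """
        for i in range(len(self.slots) - 1, 0, -1):
            if self.slots[i] is None and self.slots[i - 1] is not None:
                self.slots[i] = self.slots[i - 1]
                self.slots[i - 1] = None

    def take(self) -> Optional[int]:
        """Output neighbour removes the digit held by the last element, if any."""
        digit = self.slots[-1]
        self.slots[-1] = None
        return digit


chain = Chain(3)
chain.offer(1)
chain.step()
chain.offer(2)
chain.step()
print(chain.take())  # 1
```

Digits leave the chain in the order in which they entered, and the input end is refused (`offer` returns False) while the first element is still full; this back-pressure is what allows chains of different lengths to absorb timing differences between the DPUs and the memory.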
The two dependency registers identify which other data transfers are also accessing this memory location, and the flag indicates whether or not the address register is currently in use. The registers that are used to indicate dependencies must have a null state, indicating no dependency. Both dependency registers point to address registers of the other type.* One points to the operation which must be completed (on a memory byte basis) before its own operation may be performed. Registers of this type are labelled xPP_j in Figure 15, where j is the register number and x may be either I or O; I indicates that the register is associated with an input transfer, while O indicates output. The other dependency registers, labelled xPF_j, point to the last transfer found to be dependent on its transfer. These registers are used to eliminate unnecessary testing.

*That is, the dependency registers associated with a load point to address registers associated with store operations, while those associated with a store point to address registers associated with load operations.

When a load or store appears in the instruction stream, the memory address is sent to the storage unit control. When the address register which is indicated by the appropriate free register pointer is available, the memory address is placed into it. The PCU is then allowed to issue the appropriate micro-instructions to the DPUs. The memory address is also compared with those contained in the currently active address registers of the other type. If there are any matches, the register number of the most recently initialized matching transfer is placed in the xPP_j register of the transfer being initialized, and the register number of the transfer being initialized is placed into the xPF register of the matching transfer. If there are no matches, null is placed into xPP_j. The xPF_j register is always set to null when a transfer is initialized. When a data transfer is requested, the transfer indicated by xPP_j
is checked to be sure that it has been performed at least to the byte of the current request. The transfer must not be allowed to take place until it satisfies the above condition.

With this scheme for interfacing the memory to the DPUs, it appears very desirable to pre-fetch operands. While it is possible not to initiate the request for the first memory byte until the PCU issues the appropriate command to the DPUs, this introduces an avoidable delay equal to the time required by the memory to service this request. Of course, when operand look-ahead is employed and a branch dependent upon the arithmetic properties of a result is encountered, the device must either cease operand look-ahead or fetch operands that may be required and find some way of discarding operands which are not required. A micro-instruction which causes the operand in the input shift register adjacent to the DPU to be discarded accomplishes this function. This appears to be the only instance of non-productive activity occurring in a limited connection arithmetic unit. Unfortunately, non-productive activity takes place in the DPUs after the branch is resolved, in addition to the non-productive memory activity prior to its resolution.

5. OPERATIONAL SPECIFICATION OF THE MODULES

5.1 Introduction

The design parameters of a limited connection arithmetic unit determine the number of distinct types of modules required and the detailed specification of these modules. This chapter discusses each of the modules with particular emphasis on the effects of the following parameters:

1. the method of communicating with memory,
2. the number of quotient digits determined at each examination of the partial remainder,
3. the number of digits examined in normalizing number representations,
4. whether a push-down stack is included in the End Unit,
5. the method of performing addition and subtraction,
6. whether multiplier recoding is performed, and
7. the parameters of the number representation scheme.

Item 1, the memory communication scheme, determines whether special modules, special data and control paths, and special micro-instructions will be included for implementing the communication paths between the arithmetic unit and memory. This area was discussed in detail in Chapter 4. The number of registers in the arithmetic unit for holding intermediate results is also dependent on the scheme for interfacing the arithmetic unit and memory. Items 2 and 3, the number of quotient digits per examination and the normalization parameter, will determine the number of additional DPUs* which must have a direct data path to the PCU, and whether special modules are required to signal the PCU when information is available on these paths. Item 4 determines the complexity of the End Unit. Item 5 determines the number of DPUs with which each DPU communicates. Item 6 determines whether the complexity of the PCU is increased by containing a multiplier recoder within it. Item 7, the parameters of the number representation, determines the size of all data registers and data paths. The method of performing arithmetic, item 5, also affects the size of the inter-DPU data paths.

*That is, DPUs other than any which will have a direct data path to the PCU because of its position in the arithmetic unit.

5.2 The Digit Processing Unit

5.2.1 The Role of the DPU

The DPUs collectively perform the fractional part processing of the arithmetic unit. Each DPU contains one digit of each of the active operands; the significance of the digits contained in a DPU is inversely related to its 'distance' from the PCU, as shown in Figures 1 and 2.
Each DPU contributes to the processing by executing a sequence of micro-instructions. The sequences executed by the DPUs of an arithmetic unit are identical except for minor substitutions or omissions.* A given micro-instruction in the sequence is executed by the DPUs from left to right (i.e., DPU_1, DPU_2, ..., DPU_n). At any given time there are a number of micro-instructions being processed by the DPUs.

*Recoding and 'transmit digit to PCU' micro-instructions will, in general, be replaced by a no-op or removed from the sequence after being executed by some of the DPUs.

5.2.2 DPU Registers

Each DPU retains one digit of each of the active operands. There must be at least three registers distributed through the DPUs: an accumulator register, a multiplier-quotient register, and an operand register. Each DPU must also contain a register to hold the micro-instruction it is executing. It must also have a serial number counter if the arithmetic unit employs request-response signals. This serial number is transmitted with the inter-DPU data and identifies the micro-instruction to which the data is to be applied.

Additional registers may be distributed through the DPUs. One possible use of such registers is to hold the number temporarily displaced from the accumulator while the accumulator registers are used to shift the number that had been residing in an operand register that cannot be shifted. Only one register need be included for this purpose. A second use of such registers is to hold intermediate results which are needed so soon after they are calculated that storing them in memory and retrieving them would delay the processing. The number of intermediate result registers that it is desirable to include for this purpose depends on the method of communicating between the arithmetic unit and memory. The number is determined by trade-off considerations.
The decrease in the average time spent waiting for operands to become available must be compared with the additional hardware required to cause the decrease.

5.2.3 The Micro-Instruction Repertoire

There are a number of micro-instructions which may be included in the repertoire of the DPUs in addition to those discussed in Section 2.3 and Chapter 4. The first of these micro-instructions causes the digits of a specific register to be placed on the inter-DPU data paths. These micro-instructions would be used during normalization and division. With one of these micro-instructions the PCU can obtain the information it requires to determine what additional processing is needed to complete the operation. While the left shift micro-instruction may be used for this purpose, special micro-instructions have advantages in some designs. For example, modules which determine when the information required by the PCU is available are much simpler when special micro-instructions are used.

The second of these micro-instructions causes one of the normalization recodings (Equations 3.5.3 and 3.5.4) to be performed. These micro-instructions make it possible to reduce the number of shifts required to perform a recoding. When the recoding micro-instructions are not implemented and a number must be recoded, all the digits to be changed must be shifted into the PCU. All but the leading zero digits of the recoded number must then be shifted back into the DPUs. When the normalization recoding micro-instructions are implemented, a normalization recoding is begun by shifting out all digits which are to be recoded into leading zeros. The recoding micro-instruction is then issued to DPU_1, which passes to DPU_2 the indication that a recoding micro-instruction is to be performed. DPU_2 then indicates by the value of G whether it will participate in the recoding. DPU_1 recodes its digit when it receives this information.*
DPU_2 then passes to DPU_3 the indication that a recoding is to be performed. DPU_3 responds by indicating whether or not it will participate in the recoding by the value of G which it sends to DPU_2. DPU_2 then recodes its digit. Each successive DPU goes through this process until the micro-instruction reaches the first DPU which cannot recode its digit. After this DPU sends its response to its left neighbor, it terminates the recoding by passing either no micro-instruction or a no-op micro-instruction to its right neighbor.

*This is necessary because the recoding of the last digit is different from the recoding of all the other digits.

A no-op and an execute micro-instruction must be included in the repertoire of the DPUs if they are designed to execute micro-instructions periodically and synchronously. No-op micro-instructions must be executed prior to any micro-instruction which requires information from neighboring DPUs. The number of no-ops which must be given after the set-up micro-instruction is equal to the number of DPUs that must send information to the DPU executing the micro-instruction. These no-ops are followed by an execute micro-instruction. Note that if each DPU saves the information about the last micro-instruction it has set up, only one execute micro-instruction is necessary.

The last of the additional micro-instructions that a designer may wish to include in the repertoire of the DPU is one that places a constant, such as zero, in a register. The representation of the constant determines the details of the micro-instruction. In the most useful cases, where all of the digits of the constant are identical,* the DPUs do not have to cooperate with any of their neighbors in performing the micro-instruction. The value of the digit may be implicit in the micro-instruction or it may be sent as the modifier value accompanying the micro-instruction.
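For synchronously clocked DPUs, the no-op rule above can be sketched as a stream expander: a micro-instruction that needs neighbor data becomes a set-up, one no-op per sending DPU, and then an execute. Assumptions: micro-instructions are plain labels, `senders` gives the number of DPUs that must send information to the executing DPU, and the function name and tuple format are hypothetical; the variant in which each DPU remembers its last set-up is not modeled.

```python
from typing import Dict, List, Tuple


def synchronous_stream(ops: List[str], senders: Dict[str, int]) -> List[Tuple[str, str]]:
    """Expand a micro-instruction sequence for synchronously clocked DPUs.

    senders[op] is the number of DPUs that must send information to the DPU
    executing op; a missing entry or 0 means no neighbour data is needed.
    """
    stream: List[Tuple[str, str]] = []
    for op in ops:
        k = senders.get(op, 0)
        if k == 0:
            # self-contained micro-instruction (e.g. setting a constant): issued as is
            stream.append(("op", op))
        else:
            # set-up, one no-op per sending DPU, then the execute micro-instruction
            stream.append(("setup", op))
            stream.extend(("noop", "") for _ in range(k))
            stream.append(("execute", op))
    return stream


print(synchronous_stream(["clear", "add"], {"add": 2}))
```

Here an addition whose result digit depends on two neighboring DPUs costs four issue slots instead of one, which is the price the synchronous design pays for dispensing with request-response signals.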
*The most important examples of numbers whose digits are identical are zero, the largest number, and the smallest number.

5.2.4 The Sequencing and Coordination of the DPUs

The operation of the DPUs contained in an arithmetic unit must be coordinated to obtain useful results. Each DPU must execute the same sequence of micro-instructions, which is determined by the processing to be performed and by the specific operand values. After executing micro-instruction $j-1$, a typical DPU, $\mathrm{DPU}_i$, must determine the value of ${}_jG_i$ and place this value on its inter-DPU data lines (see Equations 2.1.1 through 2.1.3). This information is required by $\mathrm{DPU}_{i-\alpha_j}, \ldots, \mathrm{DPU}_{i-2}$, and $\mathrm{DPU}_{i-1}$ to perform micro-instruction $j$. $\mathrm{DPU}_i$ also passes the micro-instruction and its modifier to $\mathrm{DPU}_{i+1}$ so that $\mathrm{DPU}_{i+1}$ will determine ${}_jG_{i+1}$. When $\mathrm{DPU}_i$ receives ${}_jG_{i+1}, \ldots, {}_jG_{i+\alpha_j}$, it performs micro-instruction $j$ and begins the procedure for micro-instruction $j+1$.

The method of achieving this coordination is determined by the method of implementing the DPUs. We will discuss how this coordination may be achieved for two implementations which lie at the extremes of implementation philosophy.

In the first method of implementing the arithmetic unit, all of the DPUs execute micro-instructions periodically and synchronously. Each DPU goes through the same two-cycle operation. On the first cycle each DPU executes the micro-instruction it has just received. On the second cycle each DPU simultaneously passes the micro-instruction it has just executed to its right neighbor and receives from its left neighbor the micro-instruction which that DPU has just executed. Coordination of the DPUs can be accomplished by the use of no-op micro-instructions. Each micro-instruction which requires the cooperation of several DPUs is preceded by a set-up micro-instruction and a number of no-op micro-instructions. These micro-instructions ensure that the required information is available when a DPU executes the micro-instruction.

At the other extreme of implementation philosophy are the arithmetic units in which all activities are controlled by request-response signals. In this case the control logic of each DPU may be composed of three interacting sequential machines. The flow diagrams of these machines are shown in Figure 16, where the control signals are:

    $T_i$, which indicates that the lines between $\mathrm{DPU}_i$ and $\mathrm{DPU}_{i+1}$ which carry the micro-instruction operation code and the modifying data ${}_jF_i$ are valid,
    $ACK_i$, which indicates that $\mathrm{DPU}_i$ has accepted the micro-instruction from $\mathrm{DPU}_{i-1}$,
    $SN_i$, a serial number which indicates the micro-instruction for which the ${}_jG_i$ data was determined, and
    $V_i$, which indicates that the ${}_jG_i$ and $SN_i$ lines are stable and may be examined.

[Figure 16. Flow diagram of the control logic of $\mathrm{DPU}_i$: the main sequence, the ACKL machine, and the ACKR machine. Note: the required G values are available to $\mathrm{DPU}_i$ when all DPUs which must send information to $\mathrm{DPU}_i$ indicate that the information they send is valid ($V_{i+e} = 1$) and send a serial number equal to the serial number counter of $\mathrm{DPU}_i$ ($SN_{i+e} = SN_i$). That is, the following condition must be satisfied: $\bigwedge_{e=1}^{\alpha_j} \left[ (V_{i+e} = 1) \cdot (SN_{i+e} = SN_i) \right] = 1$ (TRUE).]

All of the sequential machines of all DPUs are initialized to their respective state 1 in the figure. The $V_i$ signals are initialized to 1; the $T_i$ and $ACK_i$ signals are initialized to 0.

$\mathrm{DPU}_i$ begins the execution of a micro-instruction when it is in its dormant state and $T_{i-1}$ goes to 1. This indicates that $\mathrm{DPU}_{i-1}$ is transmitting to it the next micro-instruction it is to execute. $\mathrm{DPU}_i$ gates this micro-instruction into its micro-instruction register and activates the ACKL and ACKR machines. ACKL indicates to $\mathrm{DPU}_{i-1}$ that the micro-instruction has been received, while ACKR passes the micro-instruction to $\mathrm{DPU}_{i+1}$. If $\mathrm{DPU}_i$ must send other DPUs information which they require to execute this micro-instruction, it turns off $V_i$. It then increments its serial number, $SN_i$, determines the required information, and places the information and the identifying serial number on the inter-DPU data paths. $\mathrm{DPU}_i$ then turns $V_i$ back on and begins checking the inter-DPU data paths for the information that it requires to perform the micro-instruction. When this information becomes available, or if information from other DPUs is not required, $\mathrm{DPU}_i$ performs the micro-instruction (i.e., it changes the value of some operand digit it contains).

When the micro-instruction has been performed, ACKL is in the dormant state, and ACKR is in state 4 or the dormant state, the main sequence goes into the dormant state to wait for the next micro-instruction.

5.2.5 The Number of Connections to a DPU

Figure 3 and the discussion above may be used to determine the number of electrical connections (pins) that must be made to each DPU.
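Several of the lines counted in this section, the validity signals $V$ and the serial-number buses $SN$, exist only to support the readiness test stated in the note to Figure 16. The Python sketch below models that test under stated assumptions: the class and function names are hypothetical, serial numbers are plain integers, and the counter is assumed to wrap around after the number of values $SN$ can take on.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NeighborLines:
    """Lines driven by one right-hand neighbour, DPU_{i+e} (hypothetical model)."""
    valid: bool   # V_{i+e}: the G and SN lines are stable and may be examined
    serial: int   # SN_{i+e}: micro-instruction for which G was determined
    g: int        # _jG_{i+e}: transferred information (encoding not modelled)


def next_serial(sn: int, modulus: int) -> int:
    """Increment a serial-number counter that takes on `modulus` values (assumed wrap-around)."""
    return (sn + 1) % modulus


def required_g_available(own_serial: int,
                         right_neighbors: Sequence[NeighborLines],
                         alpha_j: int) -> bool:
    """Note to Figure 16: AND over e = 1..alpha_j of
    [(V_{i+e} = 1) and (SN_{i+e} = SN_i)]."""
    if len(right_neighbors) < alpha_j:
        raise ValueError("need the lines of DPU_{i+1} .. DPU_{i+alpha_j}")
    return all(n.valid and n.serial == own_serial
               for n in right_neighbors[:alpha_j])


# DPU_i has just moved to serial number 1 and needs two right neighbours.
sn_i = next_serial(0, modulus=2)
lines = [NeighborLines(True, 1, g=3), NeighborLines(False, 1, g=0)]
print(required_g_available(sn_i, lines, alpha_j=2))  # False: DPU_{i+2} is not yet valid
```

In the request-response design each DPU keeps evaluating this condition after turning $V_i$ back on; in the synchronous design the no-op padding makes an explicit test unnecessary.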
In an arithmetic unit which uses request-response signals this number is

$$C_{RR} = P + 4 + 2(OPS + SF) + (\alpha + 1)(SG + SSN + 1) + MEM \tag{5.2.1}$$

where
    $C_{RR}$ is the number of connections required by each DPU when request-response signals are used,
    $P$ is the number of pins required to power the DPU,
    $OPS$ is the number of bits required for the operation code of the micro-instruction,
    $SF$ is the number of bits of the modifier value accompanying the micro-instruction,
    $SG$ is the number of bits required to represent a ${}_jG_i$ value,
    $SSN$ is the number of bits in the serial number, and
    $MEM$ is the number of pins required by the DPU to communicate with memory.

If we assume that the only values of ${}_jF_i$ and ${}_jG_i$ are those required to perform addition and shifts, $SF$ and $SG$ are

$$SF = 1 + \mathrm{Lg}(r - \ell + 1), \qquad SG = \mathrm{Lg}(2r(r - \ell) + 1),$$

where $r$ is the radix of the number system, $\ell$ is its redundancy parameter, and $\mathrm{Lg}(x)$ is the smallest integer equal to or greater than $\log_2 x$. If we further assume that the PCU does not have any unusual data requirements, so that $SN$ has to take on $\alpha$ values, or $SSN = \mathrm{Lg}(\alpha)$, (5.2.1) becomes

$$C_{RR} = P + 6 + 2 \cdot OPS + 2 \cdot \mathrm{Lg}(r - \ell + 1) + (\alpha + 1) \cdot \left[ \mathrm{Lg}(2r(r - \ell) + 1) + \mathrm{Lg}(\alpha) + 1 \right] + MEM \tag{5.2.2}$$

When the arithmetic unit is implemented so that all DPUs execute micro-instructions periodically and synchronously, the request-response signals and serial numbers are unnecessary. Synchronizing signals are required instead; the number of connections then becomes

$$C_S = P + 2(OPS + SF) + (\alpha + 1) \cdot SG + SYNC + MEM \tag{5.2.3}$$

where $C_S$ is the number of connections required by each DPU in a synchronous arithmetic unit, and $SYNC$ is the number of connections required to transmit synchronizing signals to a DPU. If the same assumptions are made regarding the values taken on by ${}_jF_i$ and ${}_jG_i$ as were made above, this becomes

$$C_S = P + 2 + 2 \cdot OPS + 2 \cdot \mathrm{Lg}(r - \ell + 1) + (\alpha + 1) \cdot \mathrm{Lg}(2r(r - \ell) + 1) + SYNC + MEM \tag{5.2.4}$$

5.3 The Primitive Control Unit

5.3.1 The Role of the PCU

The main function of the PCU is to convert a sequence of instructions (e.g., add, multiply, divide) into a sequence of micro-instructions which must be performed by the DPUs, and to issue these micro-instructions to $\mathrm{DPU}_1$. This conversion process is similar to the process that the adder control logic of the IBM 7094 (11) performs in interpreting instructions. The major difference between the two processes is that the multiplication algorithm must be right-directed in the limited connection arithmetic unit, as discussed in Section 3.3.

The PCU may also perform subsidiary functions, such as processing exponents or communicating with the memory.

5.3.2 The Registers in the PCU

The PCU must contain an instruction register which holds the instruction currently being converted into micro-instructions. It also contains the integer extension of the accumulator, which must be three digits in length (see Section 3.3). The PCU must also have a counter to control the number of shifts and the number of repetitive steps during multiplication and division. If the arithmetic unit is implemented with request-response signals and sense micro-instruction detectors are not used, the PCU must contain a serial number counter identical to those in the DPUs. The PCU must be able to store the values of several multiplier digits if it recodes multipliers. Memory byte assembly and disassembly registers are required in the PCU of arithmetic units that communicate with memory through the PCU and in which the memory byte contains several digits.

The exponents of all operands whose fractional parts are contained in the DPUs may be held in and processed by the PCU to obtain higher performance. It is possible to process exponents in a special module or in a second set of DPUs, but these choices tend to increase the time between when the manipulation of the exponents is begun and when its result is available.
Since the result of the exponent manipulation determines the processing to be performed on the fractional parts of the operands, such delays decrease performance.

5.3.3 The Sequencing of the PCU

The PCU is sequenced very much as the DPUs are (see Figure 16). The PCU does not have an ACKL machine, since it has no module to its left. When the PCU executes a micro-instruction, it is primarily determining the next micro-instruction that it must issue to $\mathrm{DPU}_1$.

5.3.4 The Connections to the PCU

The PCU must communicate with three major components of the computer system. The first of these is the complex of DPUs; these connections are shown in Figure 1. In arithmetic units where the PCU requires from the DPUs only the information which it receives because it is in effect '$\mathrm{DPU}_0$', the number of pins required by the PCU for this information is less than the number required by a DPU. The PCU requires one, rather than two, sets of connections for issuing micro-instructions. In this case it also requires one fewer set of inter-DPU signals (${}_jG_i$, $SN_i$, and $V_i$), since it has no left neighbor.

If exponents are processed external to the PCU, some connections will be required to indicate what processing is required and to return an indication of the results of the processing. If a special module is employed to process exponents, the number of connections required by the PCU to communicate with the special module can be kept small. If a second complex of DPUs is used to process exponents, the number of connections can be expected to be quite large.

The second component of the computer system with which the PCU must communicate is the memory unit, from which it obtains instructions and possibly operands, and to which it possibly returns results. The number of pins required is determined by the memory unit byte, the number of bits required to address the memory, and whether results must be stored.

Finally, the PCU must be connected to power sources and possibly to synchronizing signals.
The number of pins required for this purpose is the same as the number of pins required by a DPU.

5.4 The End Unit

The End Unit performs two major functions in the arithmetic unit. The first function is to act as a terminator for the DPUs. Terminating a set of DPUs with an End Unit is analogous to terminating a transmission line with its characteristic impedance: the DPUs operate as if DPUs extended indefinitely to the right. The End Unit supplies all of the signals and data that are required by the DPUs but are not supplied by other DPUs or by the PCU.

The second function of the End Unit is to cause all calculations to be performed with the maximum possible precision. It does this by saving the digits shifted out of the last DPU in one or more push-down stacks. These digits can then be returned to the DPUs when left shifts are given, or be used in forming sums.

The complexity of the End Unit is essentially independent of the complexity of the DPUs or the PCU. The End Units of arithmetic units designed with request-response signals must have a serial number counter. If an End Unit is used to increase the precision of calculation, it must have a micro-instruction register in addition to the push-down stacks for holding the digits shifted out of the DPUs. The number and capacity of the push-down stacks are determined by economic considerations. In the simplest scheme, the End Unit has one stack, which is associated with the A register; the other operand register cannot have shifting capability in this case.

The End Unit requires fewer pins than either the PCU or the DPUs. Not only does the End Unit need only one micro-instruction bus, but it also needs only one validity signal and one serial number bus. The End Unit also requires one fewer inter-DPU data bus. The connections to the End Unit of a typical arithmetic unit are shown in Figure 1.
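The pin counts derived in Section 5.2.5 are easy to tabulate for candidate number systems. The Python sketch below evaluates Equations (5.2.1) and (5.2.3) with $SF$, $SG$, and $SSN$ substituted under the addition-and-shift assumptions stated there, which is how (5.2.2) and (5.2.4) are obtained. The function names are hypothetical, and the parameter values in the example are purely illustrative assumptions, not figures from this report.

```python
from typing import Tuple


def lg(x: int) -> int:
    """Lg(x) of Section 5.2.5: the smallest integer >= log2(x), for integer x >= 1."""
    if x < 1:
        raise ValueError("Lg(x) is only needed here for x >= 1")
    return (x - 1).bit_length()  # exact ceil(log2 x), no floating point


def field_widths(r: int, ell: int, alpha: int) -> Tuple[int, int, int]:
    """SF, SG and SSN under the addition-and-shift assumptions of Section 5.2.5."""
    sf = 1 + lg(r - ell + 1)
    sg = lg(2 * r * (r - ell) + 1)
    ssn = lg(alpha)
    return sf, sg, ssn


def pins_request_response(P: int, OPS: int, r: int, ell: int, alpha: int, MEM: int) -> int:
    """C_RR: Equation (5.2.1) with the field widths substituted, i.e. (5.2.2)."""
    sf, sg, ssn = field_widths(r, ell, alpha)
    return P + 4 + 2 * (OPS + sf) + (alpha + 1) * (sg + ssn + 1) + MEM


def pins_synchronous(P: int, OPS: int, r: int, ell: int, alpha: int,
                     SYNC: int, MEM: int) -> int:
    """C_S: Equation (5.2.3) with the field widths substituted, i.e. (5.2.4)."""
    sf, sg, _ = field_widths(r, ell, alpha)
    return P + 2 * (OPS + sf) + (alpha + 1) * sg + SYNC + MEM


# Illustrative values only (assumed): radix 2, ell = 1, alpha = 2,
# 4 power pins, a 4-bit operation code, 1 sync line, memory pins ignored.
print(pins_request_response(P=4, OPS=4, r=2, ell=1, alpha=2, MEM=0))     # 35
print(pins_synchronous(P=4, OPS=4, r=2, ell=1, alpha=2, SYNC=1, MEM=0))  # 26
```

The gap between the two results, $4 + (\alpha + 1)(SSN + 1) - SYNC$, is the pin cost of the request-response signals and serial numbers relative to the synchronous design.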
The control of the End Unit in an arithmetic unit employing request-response signals is very much like the control of a DPU, shown in Figure 16.* It does not require the ACKR machine, since it does not have any right neighbors. The End Unit must supply multiple $V_i$, $SN_i$, and ${}_jG_i$ signals. We define $V_{n+1} = V_{n+2} = \cdots = V_{n+\alpha}$ and $SN_{n+1} = SN_{n+2} = \cdots = SN_{n+\alpha}$, and interpret ${}_jG_{n+1}$ to read ${}_jG_{n+1}, {}_jG_{n+2}, \ldots, {}_jG_{n+\alpha}$, where $\alpha$ is the maximum of the $\alpha_j$ of the micro-instruction repertoire of the DPUs.

*In Figure 16, $i = n+1$ for the End Unit.

5.5 Exponent Arithmetic Unit

There are three methods of processing exponents in a limited connection arithmetic unit. The first method is to process the exponents in a second complex of DPUs. This method will result in a low-performance arithmetic unit and will require the PCU to have a large number of electrical connections.

The second method is to perform exponent processing in the PCU. This method makes it possible to obtain higher performance and to decrease the number of pins required by the PCU. However, it increases the complexity of the PCU by requiring the exponents of all active operands to be stored in the PCU.

The third method, employing a special module to hold and process exponents, appears to make it possible to achieve high performance without excessively increasing the complexity or the number of pins of the PCU.

The Exponent Arithmetic Unit would best be designed to send to the PCU only the information that it requires to control the processing of the DPUs. For example, if an addition is to be performed, the Exponent Arithmetic Unit would respond with one of the following:
    a) shift the A register,
    b) shift the other operand's register,
    c) issue the add micro-instruction.
The Exponent Arithmetic Unit must have commands to set up multiplication and division in addition to those which have analogies to the micro-instructions of the DPUs.
These set-up commands cause the exponent of the result to be determined and the repetitive step counter to be set. The PCU would then precede each of the repetitive steps with an inquiry* to the Exponent Arithmetic Unit, which would indicate whether another repetitive step is to be performed or the operation has been completed.

*Each inquiry could be overlapped with the previous step of the algorithm.

5.6 The Sense Micro-Instruction Detector

Another PCU function which may be performed external to the PCU is that of determining when the necessary information is available from the DPUs. This function may be performed within the PCU just as it is within each of the DPUs. However, performing some or all of this function in other modules reduces the complexity and the pin requirements of the PCU.

[Figure 17: the sense micro-instruction detector placed in the chain of DPUs.]

The unit that detects micro-instructions which were (or could have been) issued to examine the digits of one of the operands is placed between the last DPU which supplies information to the PCU and the first DPU which does not, as shown in Figure 17. When one of these micro-instructions reaches the sense micro-instruction detector, this unit sends a signal to the PCU indicating that the micro-instruction has reached the detector. Hence, it indicates to the PCU when the required information is available, and the PCU does not have to receive the serial numbers associated with the data it requires. If unique micro-instructions are employed for this sensing operation, this simple scheme will suffice.
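The simple scheme, with a unique sense micro-instruction, can be illustrated with a small simulation. This is a sketch under loudly stated assumptions rather than the report's design: micro-instructions are assumed to advance exactly one DPU per clock tick (a simplification of both implementations in Section 5.2.4), `SENSE` is a hypothetical opcode name, and the class and function names are invented for illustration.

```python
from collections import deque
from typing import List, Optional, Sequence

SENSE = "SENSE"  # hypothetical opcode name for the unique sense micro-instruction


class SenseDetector:
    """Sits between DPU_k (last DPU that feeds the PCU) and DPU_{k+1}.

    Every micro-instruction passes through unchanged; each SENSE that goes
    by leaves one pending 'information available' signal for the PCU.
    """

    def __init__(self) -> None:
        self.pending: deque = deque()

    def pass_through(self, micro_instruction: str) -> str:
        if micro_instruction == SENSE:
            self.pending.append(micro_instruction)
        return micro_instruction  # forwarded to DPU_{k+1}

    def pcu_poll(self) -> bool:
        """PCU side: True once for each SENSE that has reached the detector."""
        if self.pending:
            self.pending.popleft()
            return True
        return False


def simulate(stream: Sequence[str], k: int, n: int) -> List[int]:
    """Clock `stream` through DPU_1..DPU_n (one DPU per tick, an assumption)
    and return the ticks at which the PCU learns sensed data is available."""
    chain: List[Optional[str]] = [None] * n  # chain[m] is held by DPU_{m+1}
    detector = SenseDetector()
    source = iter(stream)
    ready_ticks = []
    for tick in range(len(stream) + n):
        leaving_k = chain[k - 1]                   # moves DPU_k -> DPU_{k+1}
        chain = [next(source, None)] + chain[:-1]  # every instruction shifts right
        if leaving_k is not None:
            detector.pass_through(leaving_k)
        if detector.pcu_poll():
            ready_ticks.append(tick)
    return ready_ticks


print(simulate(["ADD", SENSE, "NOP"], k=2, n=4))  # [3]
```

In this model a signal at tick 3 means the SENSE micro-instruction has already passed through $\mathrm{DPU}_1$ and $\mathrm{DPU}_2$, so the PCU learns that the sensed data is ready without examining any serial numbers.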
However, any micro-instruction which causes operands to be placed on the inter-DPU data connections may be employed to sense operands contained in the DPUs. Some mechanism must be included in this case to distinguish the instances in which these micro-instructions are being used to sense an operand from those in which they are not. The basic method employs a counter which contains the count of the subsequent 'sense-type' micro-instructions which were not launched for sense purposes. There are two possible implementations of the sense micro-instruction detector when a counter must be included.

In the first, shown in Figure 18, the counter is located in the PCU; the detector module differs little from that used when unique micro-instructions are employed for sensing DPU contents. The major difference is that a second sense-type micro-instruction will not be allowed to propagate beyond the sense micro-instruction detector until a proceed signal is received from the PCU. This proceed signal indicates that the last sense micro-instruction has been tallied on the counter.

In the second, shown in Figure 19, the counter is in the detector. This has the advantage of minimizing the amount of hardware in the PCU. It is also less likely to cause delays, because the signal to increment the counter is sent at the same time the micro-instruction is launched. Hence, incrementing the counter is overlapped with the propagation of the micro-instruction through the DPUs.

6. SUMMARY AND CONCLUSIONS

6.1 Discussion of Results

Section 1.1 describes the characteristics that a mechanism must have in order to be particularly appropriate for implementation in the newly emerging technologies commonly called 'large scale integration' or 'LSI'. This paper presents a design for an arithmetic unit which admirably meets those requirements.
One of the primary desiderata of a unit to be implemented in LSI is that it be composed of a small number of rather complex module types (i.e., modules which employ a large number of logic elements). The approach proposed in this paper will yield designs that consist of three to eight types of modules. Furthermore, the complexity of these modules can be adjusted over a wide range by such factors as the number system and algorithms used, so that a design may be tailored to the technology with which it is to be implemented.

Another major desideratum of designs to be implemented in LSI is that each module require a relatively small number of connections (i.e., pins) to communicate with its environment. The number of connections required by the modules of a limited connection arithmetic unit is limited by the need of each module to communicate with only a small number of other modules. The major module, the Digit Processing Unit, must communicate with either two or four† other modules.* The End Unit must correspondingly communicate with one or two other modules. The number of modules that communicate with the Primitive Control Unit is no less than the number that communicate with the End Unit.

A third desideratum of designs to be implemented in LSI is that no inter-module signal be routed to a large number of modules. No inter-module signal in arithmetic units organized as proposed in this paper need be sent to more than three modules; the number may be decreased to two or one with attendant decreases in performance.

†This case requires less than 50% more pins because of the nature of the signals.
*Arithmetic units in which the modules communicate with a larger number of other modules can be expected to have higher performance.

Hence, the approach proposed in this paper is very well suited to designing arithmetic units in LSI.
Furthermore, the proposed organization has several advantages over the other arithmetic unit organizations proposed for implementation in LSI (9, 10, 12), because it is operationally a single multipurpose arithmetic unit rather than a multiplicity of special-purpose units.

The first advantage is that no effort is required of the programmer, compiler, assembler, or monitoring program to take full advantage of the potential parallelism of the arithmetic unit. The arithmetic unit is organized so that the operations within a single instruction stream that can be performed concurrently are evoked automatically.

The second advantage of the proposed scheme is that no unnecessary processing is performed when a conditional branch is encountered in the instruction stream. After the controlling element (the PCU) is presented with a conditional branch instruction, it can immediately initiate the testing required to resolve the branch. Computers with multiple execution units may have to delay the testing until the operand is available. While this operand is being formed by one of the execution units and is unavailable for testing, the instruction decoder either waits (with the attendant loss of performance) or goes into a mode in which it continues to decode instructions and issues them conditionally to the execution units. The execution units are allowed to proceed but are not allowed to make any irrevocable changes to the state of the computer until the branch decision is made. This requires a great deal of interlocking and control hardware.

Like the other organizations proposed for implementing arithmetic units in LSI (9, 10, 12), the arithmetic unit proposed in this paper has the property of modularity. That is, the word length is determined by the number of modules and not by the design of the modules. The same basic building blocks could therefore be used to construct a variety of arithmetic units with a wide range of computational capability.

6.2 Suggestions for Related Work

The number system and arithmetic algorithms are the principal factors with which the design of a limited connection arithmetic unit may be adjusted to a specific technology and to a specific performance requirement. The designer has no general analytic tools to aid him in making these choices. He must therefore go through an iterative process of selecting tentative parameters, designing an arithmetic unit based on these parameters, and evaluating the desirability of the resulting design. Studies which would be particularly useful to the designer of limited connection arithmetic units include:

    1. the determination of the complexity of a signed-digit adder as a function of its number system,
    2. the determination of the improvement in performance that will occur if multiplier recoding is employed or if additional digits of the partial remainder are examined during division, and
    3. the determination of the complexity of multiplier recoders, quotient digit selectors, and normalization control circuitry.

Reliability and availability considerations were not addressed in this paper. The arithmetic unit as proposed in this paper will operate properly only if all the modules are operating properly. The modules can be designed to stop if they detect an error, which makes maintenance less burdensome. Determining the organizational modifications that are necessary to yield an arithmetic unit that will operate properly in the presence of failures is a very important area for investigation.

Another unresolved problem in the organization of the limited connection arithmetic unit is the provision of a reasonable method of performing multiple precision addition and subtraction. The method which would have to be used in the arithmetic unit as described here requires very many shift micro-instructions to be performed whenever radix point alignment is necessary.
The digits shifted out of the active adder register during radix point alignment are not shifted into another active register, as they are in the classical von Neumann arithmetic unit. Instead, they are shifted into the End Unit, from which they can be returned to the active registers only by left shifts. Reconstructing these digits from the initial operands would also require a significant amount of shifting.

A suggestion for improving the performance of the limited connection arithmetic unit was recently made by Robertson (20). He observed that better normalization and quotient digit selection algorithms could be employed if each zero digit were assigned the sign of the first non-zero digit to its right. For example, if a number to be normalized has the form 10...01, it would be clear that the number is normalized after examining the first zero digit, whereas all of the zero digits would have to be examined and shifted if the zero digits were unsigned. Organizing the mechanism for associating signs with zero digits so that the appropriate zero digit receives the sign is the fundamental problem to be solved in applying this technique to limited connection arithmetic units. If a sum has a number of adjacent zero digits, the DPU containing the first of these zero digits will have performed a number of micro-instructions before the sign to be associated with that zero digit arrives at the DPU. These micro-instructions may have made additional copies of the zero, sent it to another DPU, or obliterated it. Developing a scheme for keeping the appropriate records with a reasonable amount of hardware appears to be a challenging problem.

APPENDIX I

CHARACTERISTICS OF THE SYMMETRIC RADIX TWO SIGNED DIGIT ADDER

The probability of each possible pair of adjacent digits in the sum was determined for Adder 3 of Section 3.2.
The analysis is based on the assumption that one input representation, A, has the same pair probabilities as the sum representation, A', while the other input is characterized parametrically by analogy with SRT division (19), (21). The analysis was performed for the case in which the probability of zero for a digit in the parametrically defined input representation was in the range 2/5 to 2/3. This restriction is justified by the results of Rohatsch's analysis of the single-digit probabilities of Adder 2 (21). He discovered that the probability of zero digits in the sum representation varied little when the probability of zero digits of both operand representations varied from 0 to 1.

This analysis takes a somewhat different tack from that taken by Rohatsch. He analyzed the output representations on the basis of the triplet* probabilities of both operand representations, where the representations of both operands were defined parametrically. In the present analysis, only one operand is parametrically defined. The other operand, A, is assumed to have the same distribution as the sum representation, A'. This is justified by the observation that, in practically all of the uses of the adder in a typical calculation, at least one of the operands taking part in an addition is the sum of a previous addition.

*Three consecutive digits of the number.

The transfer digits are assumed to reach a steady-state distribution independent of digital position. Each possible combination of digits of the A operand and the input transfers ($t_{i+1}$, $t'_{i+1}$) was identified as the present state. Table 11 lists these, indicating which states are equivalent because of the symmetry of the adder. Each possible combination of digits of the sum and the output transfers ($t_i$, $t'_i$) was identified as the next state, also employing Table 11, with the following equivalences:

$$t_i = t_{i+1} \tag{A1.1}$$

$$t'_i = t'_{i+1} \tag{A1.2}$$

$$a'_j = a_j, \qquad j = i, i+1 \tag{A1.3}$$

[Table 11. States of the Markov Process Used to Analyze Adder 3. For each state A through W the table gives $t_{i+1}$, $t'_{i+1}$, $a_i$, and $a_{i+1}$ for the 'positive' state and for the equivalent 'negative' state, together with the steady-state state number:

    State              A  B  C  D  E  F  G  H  I  J  K  L  M   N  O   P  Q  R  S  T   U   V   W
    Steady-state no.   -  -  -  -  1  2  3  4  5  -  6  7  8A  9  10  -  -  -  -  11  12  8B  13
]

[Table 12. Transition probabilities of the persistent states, by present state and next state (1 through 13), expressed in terms of $P_1$ through $P_4$.]

The transition probabilities were then determined, based on the pair probabilities of the operand. The steady-state digit statistics can then be determined from the steady-state distribution of states in the Markov process defined above. Examination of the transition matrix showed that fourteen of the twenty-three states are persistent, and that states M and V are equivalent with respect to their transition probabilities to subsequent states. This is also indicated in Table 11. The transition probabilities of the persistent states are given in Table 12.
In this table, $P_1$, $P_2$, $P_3$, and $P_4$ are the functions of $z$ given by Equations (A1.4) through (A1.7), where $z$ is the probability that a digit of the parametrically defined input representation is zero, $\frac{2}{5} \le z \le \frac{2}{3}$.
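The steady-state calculation described above amounts to finding the stationary distribution $\pi$ of the transition matrix $P$, that is, the solution of $\pi = \pi P$ with $\sum \pi = 1$. The Python sketch below uses power iteration, which is only one of several ways to solve this system. The three-state matrix in the example is hypothetical; for a chosen $z$, the entries of Table 12 written in terms of $P_1$ through $P_4$ would take its place.

```python
from typing import List, Sequence


def stationary_distribution(P: Sequence[Sequence[float]],
                            tol: float = 1e-12,
                            max_iter: int = 100_000) -> List[float]:
    """Return pi with pi = pi P for a row-stochastic matrix P.

    Power iteration from the uniform distribution; it converges when the
    persistent states form a single aperiodic class (assumed here).
    """
    n = len(P)
    if any(abs(sum(row) - 1.0) > 1e-9 for row in P):
        raise ValueError("each row of P must sum to 1")
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical 3-state chain: state 0 is transient, states 1 and 2 persistent.
P = [[0.0, 0.5, 0.5],
     [0.0, 0.9, 0.1],
     [0.0, 0.5, 0.5]]
print([round(p, 6) for p in stationary_distribution(P)])  # [0.0, 0.833333, 0.166667]
```

The transient state receives zero steady-state probability, which is why only the persistent states of Table 11 need to appear in Table 12.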