Digitized by the Internet Archive in 2013
http://archive.org/details/designofarithmet333atki

Report No. 333
COO-1018-1183

DESIGN OF THE ARITHMETIC UNITS OF ILLIAC III: USE OF REDUNDANCY AND HIGHER RADIX METHODS*

by Daniel E. Atkins

May 1969

Department of Computer Science
University of Illinois
Urbana, Illinois 61801

*To be presented at the Workshop on the Theory of Computer Arithmetic, Third Annual IEEE Computer Conference, Minneapolis, June 16, 1969. This work was supported in part by the U.S. Atomic Energy Commission under Contract No. USAEC AT(11-1)-1018 and in part by the National Science Foundation under Grant No. NSF-GP-4636.

ABSTRACT

In keeping with the experimental nature of the Illinois Pattern Recognition Computer (Illiac III), the arithmetic units are intended to be a practical testing ground for recent theoretical work in computer arithmetic. This paper describes the use of redundant number systems and the design of a structure with which multiplication and division are executed radix 256. The heart of the unit is the stored-sign subtracter, a recently discovered member of the family of borrow-save subtracters and carry-save adders. A cascade of these subtracters, controlled by a multiplier recoder, provides multiplication. The same structure, controlled by a "model division" (a quotient recoder), performs division.
ACKNOWLEDGEMENT

The author wishes to acknowledge and thank Professor James E. Robertson, Professor Bruce H. McCormick and Mrs. Tuh-Kai Koo for their assistance in the design effort described in this paper. Mrs. Koo wrote extensive simulation programs which were used to validate the arithmetic algorithms.

TABLE OF CONTENTS

INTRODUCTION
  Adder-Subtracter
  Multiplication
  Division
ADDER-SUBTRACTER
  Background
  Definition
  Properties
    Input-Output Compatibility
    Limited Borrow
    Unique Zero
    Negation
    Least Significant Digit
  Overflow Detection
  Truncation Error
  Sign Detection
  Assimilation
  Implementation
MULTIPLICATION
  Background
  Recoding Scheme
  Multiplication Structure
  Brief Operational Description
  Truncation Error
DIVISION
  Background
  Model Division
  Operational Description of Model Division
  Division Structure
  Brief Operational Description of Full Precision Division Scheme
  Truncation Error
REFERENCES
APPENDIX
  Proof of the Validity of the Correction Scheme for Bogus Overflow
  Brief Description of Illiac III Computer System

INTRODUCTION

In keeping with the experimental nature of the Illinois Pattern Recognition Computer (Illiac III), the arithmetic units are intended to be a practical testing ground for some recent theoretical work in computer arithmetic. The bulk of this work centers upon the use of redundant number systems and/or the use of higher radix methods. The design of the arithmetic units of Illiac III exhibits both techniques. They are of primary importance in the adder-subtracter structure, the multiplication structure, and the division structure.

Adder-Subtracter

A key factor in the rapid execution of the iterative sequences of multiplication and division is the operation time of the adder-subtracter. The design used in Illiac III is a member of a family of limited carry-borrow propagation adder-subtracters.
The necessity for propagation of carries or borrows is eliminated by permitting the results of an operation to be represented in a redundant form. Redundancy is achieved by using a signed-digit format. Associated with each digit is a magnitude of either 1 or 0, and a sign of either positive or negative. Changing a number in a signed-digit format to a conventional non-redundant representation requires a carry or borrow propagation, but only one such conversion is required per arithmetic operation and it may be accelerated by use of lookahead techniques. The adder-subtracter structure exhibits several other interesting properties not found in the conventional carry-save adder or borrow-save subtracter.

Multiplication

Outside the adder-subtracter complex, high-speed operation is also obtained by extensive use of redundancy and by executing operations in radices greater than two. Multiplication, for example, is performed radix 256, i.e. 8 bits of the multiplier are retired in one pass from the primary to the secondary rank of the accumulator. By recoding, redundancy is introduced in such a manner that all the required multiples of the multiplicand may be formed merely by shifting.

Division

In division, redundancy is introduced into the representation of the quotient. As a consequence quotient digits may be determined from relatively few high-order bits of the divisor and partial remainder; full precision comparison of the divisor and partial remainder is not required. The division algorithm makes efficient use of the large amount of hardware devoted to high-speed multiplication and is also performed radix 256. Eight bits of the quotient are generated in one pass from the primary to the secondary rank of the accumulator.

Appendix

The Appendix includes the proof of the validity of the bogus overflow correction scheme and an introduction to the entire Illiac III system.
ADDER-SUBTRACTER

Background

It has long been realized that the execution of multiplication is substantially accelerated by the use of adders in which carry propagation is eliminated until a terminal step. Recently, Robertson [1] has noted that the traditional carry-save adder and borrow-save subtracter, derived by the modification of conventional adders or subtracters, are but two members of a larger family of limited carry-borrow propagation adder-subtracters. At least two of the designs obtained using his deterministic procedures appear to be new and of practical importance. They are the stored sign adder and the stored sign subtracter. The design properties of both are similar and in the final analysis both are actually capable of either addition or subtraction. The stored sign subtracter has been implemented in Illiac III. This device is also referred to as a signed-digit subtracter. The two names will be used interchangeably in this paper.

Definition

A typical position of a signed-digit subtracter is shown in Figure 1. Each position is a three-input, two-output device together with an interpositional connection and a "NEG" control line. The symbol $Y_i$ represents the ith bit of the subtrahend (minuend - subtrahend = difference) in conventional binary form*. $S_i$ and $X_i$ together comprise the ith minuend digit in a redundant format. $X_i$ is interpreted as a magnitude, either 1 or 0, and $S_i$ as a sign; 0 is positive and 1 is negative. The digital values 1, 0, or $\bar{1}$ (overbar denotes negation) are thus represented as shown in the table following Figure 1.

*The design described here employs one operand in conventional form and one in redundant form. Designs have been proposed in which both operands are represented redundantly. See Rohatsch [2] and Borovec [3].

[Figure 1 - Typical Position of a Signed-Digit Subtracter. Position i receives the minuend digit ($S_i$, $X_i$), the subtrahend bit $Y_i$ and the interpositional input $C_i$, and produces the difference digit ($T_i$, $Z_i$) and the interpositional output $C_{i-1}$.]

  $S_i$ = sign of minuend digit
  $X_i$ = magnitude of minuend digit
  $Y_i$ = subtrahend in conventional binary form
  $T_i$ = sign of difference digit
  $Z_i$ = magnitude of difference digit
  NEG = control to complement $T_i$. If NEG = 1 then $T_i$ is complemented, else not
  G = gate on interpositional connections
  $C_i$ = interpositional connection

  $T_i = C_i \oplus \mathrm{NEG}$
  $Z_i = C_i \oplus X_i \oplus Y_i$
  $C_{i-1} = (S_i X_i \lor \bar{X}_i Y_i)\, G$
  $C_i = (S_{i+1} X_{i+1} \lor \bar{X}_{i+1} Y_{i+1})\, G$

  $S_i$  $X_i$   Digital Value
   0     0       +0
   0     1       +1
   1     0       -0
   1     1       -1

The logical equations for a stored sign adder may be derived by changing the sign of all non-zero digits in a truth table for the equations given in Figure 1.

The gate signal, G, shown in Figure 1 is not inherent in the logical design of a stored sign adder or subtracter but is necessary for a particular application in Illiac III. During the assimilation of a redundantly represented result into conventional form the requirement arises for the Z output to be identical to the X input. In general, the addition or subtraction of zero will not guarantee this. However, with G = 0, all interpositional inputs, $C_i$, are 0, and thus $Z_i = X_i \oplus Y_i$; the subtracter will perform the exclusive OR function with G = 0. Furthermore if all $Y_i$ are also 0, then $Z_i = X_i$. The signal G will always be 1 whenever the device is actually being used for addition or subtraction.

Properties

Input-Output Compatibility - An important property of the subtracter in the execution of iterative operations such as multiplication is the fact that the output is in the same signed-digit format as the input. $Z_i$ is the magnitude and $T_i$ is the sign of the ith digit of the output.

Limited Borrow - The introduction of redundancy in the output of the subtracter has permitted the length of the borrow propagation chain to be drastically limited. The interpositional connection, $C_i$, is a function of only the inputs to the adjacent position, i+1. It is not a propagating borrow.

Unique Zero - Note that although the representation is redundant, the representation of zero is unique except for sign.
A number in the signed-digit format is zero if and only if all magnitude bits are zero. For a signed-digit representation in radix r, the requirement for a unique representation of zero demands that the magnitude of allowed digit values not exceed r-1.

Negation - Another property of this logical structure is the ability to algebraically negate a number in signed-digit format by merely logically complementing all the sign bits. There is no analogous property for the conventional carry-save adder or borrow-save subtracter. This feature of the signed-digit subtracter permits additions and subtractions in a cascade of such devices to be interleaved in any manner desired. In Figure 1, NEG is a control signal which when set to logical 1 complements $T_i$ and when set to logical 0 allows $T_i$ to pass unchanged.

Now consider a subtracter consisting of adjacent, interconnected positions such as shown in Figure 1 and let

  Y = the algebraic value of the subtrahend in conventional form;
  X* = the algebraic value of the minuend in signed-digit form, and
  Z* = the algebraic value of the difference.

With NEG = '0' the device is truly a subtracter and Z* = X* - Y. With NEG = '1' the output is negated and thus Z* = -(X* - Y). Now note that if complementing circuits are added to the $S_i$ inputs so that both X* and Z* may be independently negated it is possible to form Z* = -(-X* - Y) = X* + Y and the device is adding. For many applications the negating circuits for the sign bits, $S_i$, need not be included in the subtracter per se; rather, the same result is achieved by gating the complement outputs of the register containing S or, when the subtracters are cascaded, by negating the output of the previous stage.

This ability to negate a result while it is still in a redundant form also expedites the execution of floating point addition and subtraction. In the floating point format adopted for Illiac III the mantissa is considered to be positive, i.e. to be a magnitude. The sign is given by a bit apart from the mantissa.
In multiplication and division the sign of the result is the exclusive OR of the signs of the operands. In addition and subtraction the sign determination is more complicated: it depends upon the signs and the relative magnitude of the operands. Consider two operands with magnitudes A and B and with signs SIGNA and SIGNB, respectively. A logical one denotes a negative quantity; a logical zero denotes a positive quantity. The table below gives the sign of the result as a function of the sign of the operands and their relative magnitude:

                     SIGN(A+B)       SIGN(A-B)
  SIGNA  SIGNB      A>B    A<B      A>B    A<B
    0      0         0      0        0      1
    0      1         0      1        0      0
    1      0         1      0        1      1
    1      1         1      1        1      0

If the exponents of the operands are different, then the relative magnitude is readily determined from the difference of the exponents. But if the exponents are equal and SIGNA ≠ SIGNB for addition, or SIGNA = SIGNB for subtraction, then the sign of the result cannot be determined prior to actually performing the operation.

First consider the cases in which the sign of the result may be determined. If the sign is known to be negative the result is negated prior to the conversion to a conventional form. The ability to negate the redundant form of the result permits this. In cases in which the sign is not known prior to calculation, the sign of the result is assumed to be positive, the operation is performed and the result is then converted into a conventional form. The high order bit of the converted result is the sign. If it is negative then the redundant result (still present on the outputs of the subtracter) is negated and then again converted to a conventional form. The necessity for two conversions would be avoided if the sign of the result could be determined from the redundant form. However, as discussed in the next section, sign determination is complicated by use of redundant notation. The logic required is of the same order of complexity as that required to convert the redundant result to a conventional form in which the sign is apparent.
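The position equations of Figure 1 and the negation property described above can be exercised with a small bit-level model. The following Python sketch is an illustration only, not the Illiac III logic: the function names (`sds`, `sd_value`, `bin_value`, `check`) are hypothetical, integer weights are used instead of fractional ones, and the borrow leaving the leftmost position is returned explicitly so that the arithmetic identity can be checked.

```python
from itertools import product

def sds(S, X, Y, neg=0, gate=1):
    """One pass through an n-position signed-digit subtracter (MSB first).

    S, X: sign and magnitude bits of the minuend digits.
    Y:    subtrahend bits in conventional binary.
    Returns (T, Z, borrow_out); borrow_out is the interpositional
    signal that leaves the leftmost position.
    """
    n = len(X)
    # Signal generated by position i for its left neighbour:
    # C[i-1] = (S[i] X[i] or not-X[i] Y[i]) and G
    gen = [((S[i] & X[i]) | ((1 - X[i]) & Y[i])) & gate for i in range(n)]
    # Position i receives the signal generated by position i+1 only.
    C = gen[1:] + [0]
    T = [C[i] ^ neg for i in range(n)]          # T[i] = C[i] xor NEG
    Z = [C[i] ^ X[i] ^ Y[i] for i in range(n)]  # Z[i] = C[i] xor X[i] xor Y[i]
    return T, Z, gen[0]

def sd_value(S, X):
    """Integer value of a signed-digit number (sign bit 1 means negative)."""
    n = len(X)
    return sum((1 - 2 * S[i]) * X[i] * 2 ** (n - 1 - i) for i in range(n))

def bin_value(Y):
    """Integer value of a conventional binary bit sequence."""
    return int("".join(map(str, Y)), 2)

def check(n=3):
    """Exhaustively check subtraction, negation, addition and G = 0."""
    for bits in product((0, 1), repeat=3 * n):
        S, X, Y = bits[:n], bits[n:2 * n], bits[2 * n:]
        x, y = sd_value(S, X), bin_value(Y)
        T, Z, b = sds(S, X, Y)
        assert sd_value(T, Z) - b * 2 ** n == x - y        # Z* = X* - Y
        Tn, Zn, _ = sds(S, X, Y, neg=1)
        assert sd_value(Tn, Zn) == -sd_value(T, Z)         # NEG negates
        Ta, Za, ba = sds([1 - s for s in S], X, Y, neg=1)
        assert sd_value(Ta, Za) + ba * 2 ** n == x + y     # Z* = X* + Y
        _, Zg, _ = sds(S, X, [0] * n, gate=0)
        assert Zg == list(X)                               # G = 0, Y = 0
    return True

# 5 - 3 with a positive, non-redundant minuend
T, Z, b = sds([0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1])
print(T, Z, b, sd_value(T, Z))
```

Note that the borrow signal received by a position depends only on the inputs of its right-hand neighbour, so every position can be evaluated in parallel; this is the limited borrow property in executable form.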
Least Significant Digit - A basic property of a stored sign adder or subtracter is that the position of the least significant digit need not be known. A conventional adder used only for addition does not require the insertion of a carry into the least significant digital position. Similarly, a subtracter used only for subtraction does not require borrow insertion. Hence, the combination adder-subtracter does not require insertion of a carry during addition or a borrow during subtraction.

Since there is no requirement for a carry or borrow insertion in the least significant position, a signed-digit subtracter of a given length may be partitioned into several subtracters of smaller length. Furthermore, by suitably partitioning the NEG control signal, addition could be performed in some groups while subtraction occurs in others. This facility is of application in variable length operand formats and for parallel vector arithmetic. Although neither of these is available in the initial version of the Illiac III arithmetic units, the potential usefulness of vector operations influenced the decision to implement a signed-digit subtracter. Vector facilities could be included in a subsequent version of the arithmetic unit without major modifications.

A very limited use of this facility is being made in performing integer division. To be compatible with the floating point division algorithm an integer divisor or dividend in two's complement negative form is converted to sign-magnitude form during a preliminary step. In performing this conversion a 64-bit signed-digit subtracter is used as two 32-bit subtracters.

Overflow Detection

For redundant representations it is possible to derive sufficient but not necessary conditions for overflow detection. Let

  $Z^* = Z^*_0 + \sum_{i=1}^{n} Z^*_i 2^{-i}$

with the constraint $-1 < Z^* < 1$. Inspection of $Z^*_0$ and $Z^*_1$ gives rise to three possible range conditions: overflow, no overflow, or maybe overflow.
The latter condition means that overflow may or may not occur on assimilation to conventional form. Table 1 defines the range of Z* for all possible combinations of $Z^*_0$ and $Z^*_1$.

[Table 1 - Range of Z* and the resulting overflow condition for each combination of $Z^*_0$ and $Z^*_1$; table contents not recoverable from the source.]

In Illiac III overflow is checked only after the result has been converted to conventional form. There are sufficient subtracter positions to the left of the radix point to insure that no high-order digit is lost. But in using redundant representations it is not obvious what constitutes a "sufficient" number of subtracter positions. The decision is complicated by the fact that although the algebraic value of a redundantly represented number may be within the range of, say, n non-redundant digits, the actual form of the redundantly represented number may require more than n digits. This point is illustrated by an example.

Consider an 8-bit integer represented in a conventional binary format. If I denotes this integer then the allowable positive range of I is $0 \le I \le 255$. Conversely, given a conventionally represented binary integer, I, in the range $0 \le I \le 255$, an 8-bit register should be adequate to hold I. Now let I* be a signed-digit version of I. We must now assign two bits per digital position of our 8-digit register: one for the sign and one for the magnitude. The term "digit" will now refer to one of these sign-magnitude positions. The temptation is to reason as follows: Due to the range restrictions imposed on I, it may be stored in an 8-bit register. Every I* is equivalent in value to an I, therefore I* may be stored in an 8-digit register.
This reasoning may well be incorrect, as illustrated by the following specific example. Let

  I  = 10000000. = 128 and
  I* = 1$\bar{1}$0000000. = 128

Although both I and I* are equivalent in value and both are in the range 0 to 255, I* is in a form requiring 9 digits. This behavior gives rise to a condition we shall call bogus overflow. The essence of the problem is the fact that a signed-digit subtracter or adder will sometimes transform a digit pattern of 0 1 into 1 $\bar{1}$, or a pattern of 0 $\bar{1}$ into $\bar{1}$ 1.

One method of coping with bogus overflow is to provide auxiliary register positions. It may be shown that if $I^*_1$ and $I^*_2$ are both represented within n digits or less, the sum or difference of $I^*_1$ and $I^*_2$ is representable within n+1 digits. Note, however, that when repetitive additions or subtractions are performed (even addition or subtraction of zero) each operation may generate another non-zero digit to the left. Once bogus overflow begins it tends to propagate leftward unless corrected. The implementation of positions to store the bogus overflow not only adds hardware costs to the subtracters and registers, it also burdens the assimilation logic. Although the assimilated number will be contained in only n digits, the assimilation logic must propagate borrows across n+k digits of the redundant form of the number. The maximum value of k is the number of additions or subtractions which take place prior to assimilation.

But fortunately a procedure is available to control bogus overflow. We shall first state the procedure and then prove that it is valid.

Statement: Consider the high-order byte of an Illiac III signed-digit subtracter. The positions are numbered 1 through 8. The radix point is to the right of position 8. If the inputs to position 1 are such that $S_1 X_1 = 1$ then a bogus overflow will occur. Without implementing the 0th position, it may be corrected by complementing the sign of the result of position 1, i.e. by replacing $T_1$ by $\bar{T}_1$.
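The leftward creep of bogus overflow, and the effect of the stated correction in one particular case, can be seen in a digit-level model. The Python sketch below is an illustration under stated assumptions: it handles only the subtraction of zero, works on digit values -1, 0, +1 in a hypothetical 5-digit register with integer weights, and uses hypothetical function names. It demonstrates the behavior for a single operand; the general validity of the correction rests on the proof in the Appendix, not on this example.

```python
def subtract_zero(digits):
    """Pass signed digits (-1, 0, +1; most significant first) through a
    signed-digit subtracter whose subtrahend is all zeros.
    Returns (result_digits, borrow_out_of_leftmost_position)."""
    # With Y = 0, a position sends a borrow to its left neighbour
    # exactly when its own digit is -1 (the case S X = 1).
    gen = [1 if d == -1 else 0 for d in digits]
    w = [abs(d) for d in digits]          # X xor Y with Y = 0
    c_in = gen[1:] + [0]                  # signal from the right neighbour
    out = [wi - ci for wi, ci in zip(w, c_in)]
    return out, gen[0]

def subtract_zero_corrected(digits):
    """Same pass, but the 0th position is not implemented; when the leftmost
    input digit is -1 the sign of the leftmost result digit is complemented."""
    out, _ = subtract_zero(digits)
    if digits[0] == -1:
        out[0] = -out[0]
    return out

def value(digits):
    """Integer value of a signed-digit number, most significant digit first."""
    n = len(digits)
    return sum(d * 2 ** (n - 1 - i) for i, d in enumerate(digits))

# The number -1, written with a single negative digit in a 5-digit register.
d = [0, 0, 0, -1, 1]
for _ in range(3):
    d, borrow = subtract_zero(d)
    print(d, value(d), borrow)   # the leading -1 moves one place left each pass

# One more pass pushes the borrow out of the register ...
uncorrected, borrow = subtract_zero(d)
print(uncorrected, value(uncorrected), borrow)
# ... unless the sign of the leftmost result digit is complemented.
print(subtract_zero_corrected(d), value(subtract_zero_corrected(d)))
```

In this example the value stays at -1 throughout, while the non-zero digits spread one position leftward per pass until the correction holds them within the register.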
Proof: The proof is presented in the Appendix.

Truncation Error

Let

  $Z^* = \sum_{i=1}^{n} Z^*_i 2^{-i}$

The first column of Table 2 gives the possible digital values of $Z^*_i$ for the output of a signed-digit subtracter or adder, for the output of a conventional carry-save adder, and for the output of the conventional borrow-save subtracter, all of length n to the right of the radix point.

[Table 2 - Possible values of $Z^*_i$ (1, 0, $\bar{1}$ for the signed-digit form) and the error due to truncation to the right of position n, for the signed-digit, conventional carry-save and conventional borrow-save forms; table contents not recoverable from the source.]

[The sections Sign Detection and Assimilation, Figure 2, and the opening of the section Implementation, including the first item of the list below, are not recoverable from the source.]

2. $T_i$ and $Z_i$ are formed in one additional collector delay.

3. The complements of $T_i$ and $Z_i$ are formed in one collector delay. The complements are necessary as inputs to the next subtracter in the cascade.

Using this logic, parallel addition or subtraction takes place in three collector delays. A block diagram of the entire adder-subtracter complex is shown in the section describing multiplication. It consists of a cascade of four subtracters, each 64 positions wide.

MULTIPLICATION

Background

Multiplication in a digital arithmetic unit is generally accomplished by over-and-over addition of multiples of the multiplicand with the contents of an accumulator. One way to accelerate the execution of multiplication is to decrease the time required to add the multiplicand to the partial product. The efficacy of a reduced add time is the primary motivation for the use of a borrow-save device such as the signed-digit subtracter.

Another technique for accelerating the execution of multiplication is to accommodate more than one bit of the multiplier per iteration. Such a scheme may be viewed as multiplication in radix r, where $r = 2^k$, with k equal to the number of bits inspected per iteration.
While use of a higher radix has the advantage of reducing the number of iterations by a factor of k over the binary case, it has the disadvantage of requiring additional multiples of the multiplicand. For a non-redundant number system, multiplication radix r requires the multiples 0, 1, 2, ..., (r-1) times the multiplicand. If, however, a redundant number system is adopted then the multiples 0, 1, 2, ..., (r-1) may be transformed into the multiples -r/2, -(r/2 - 1), ..., 0, 1, ..., r/2 (for even radices). In this new set of multiples, half of the members are merely the complement of the others. For the specific case r = 4, the set {0, 1, 2, 3} may be replaced by the set {$\bar{2}$, $\bar{1}$, 0, 1, 2}. Note that in fact we do have redundancy in the second set, since there are more than r (in this case five) digit symbols. The multiple of 3 in the first set is awkward or costly to form, but in the second set all multiples may be formed by shifting and complementation.

It is useful to view this transformation as a recoding of groups of k bits of the multiplier represented in conventional form into digits belonging to the redundant set in such a manner that algebraic equivalence is maintained. Additional information on the theory of multiplier recoding may be found in references [4] and [5]. Parts of these works are concerned with recodings which permit the probability of a 0 digit to be high. This property is important in an implementation in which an adder is bypassed if a multiple of 0 is selected. In Illiac III, however, this property is not stressed since the addition time is at least as fast as a bypass.

Recoding Scheme

The recoding scheme adopted for Illiac III was suggested by Wallace [6]. It is first defined for radix 4 but will be extended to radix 256. The recoding actually requires the parallel inspection of three bits of the multiplier. If $X_i$ is the low-order bit of the multiplier, then the bits inspected are $X_{i-1}$, $X_i$, and $X_{i+1}$. The bit $X_{i+1}$ is an extra position at the right of the least significant bit of the multiplier. It is initially 0, but after the first right shift of the multiplier it will equal the previous $X_{i-1}$, which may not be 0. In a sense, $X_{i+1}$ is the indicator of what "mistake" was made on the previous cycle. The recoding is shown in Table 3. It will accommodate a negative number in two's complement representation.

  $X_{i-1}$  $X_i$  $X_{i+1}$    Recoded Digit / Multiple Selected
     0        0       0            0
     0        0       1           +1
     0        1       0           +1
     0        1       1           +2
     1        0       0           -2
     1        0       1           -1
     1        1       0           -1
     1        1       1            0

TABLE 3 - Multiplier Recoding Scheme

The Wallace recoding scheme has the following advantages:

1. It requires little logic.
2. All selections can be made simultaneously; the recoding is not a serial process.
3. The multiples used can be obtained from the multiplicand by the processes of complementation and displacement.
4. It applies without alteration to the leftmost digits of the multiplier.

Multiplication has been further accelerated by cascading four signed-digit subtracters between the primary and secondary ranks of the accumulator. A radix 4 multiplication takes place at each subtracter: the result is a radix 256 multiplication for a complete pass. Eight bits of the multiplier are retired per iteration. The motivation for cascading subtracters is demonstrated by the following. Let

  $t_m$ = the time required to execute the iterative part of the multiplication;
  $t_a$ = the time required to add or subtract;
  $t_s$ = the summation of the following times: time to load the secondary accumulator, propagation time through the shift gates into the primary accumulator, time to load the primary accumulator, propagation time through the gates on the output of the primary accumulator, control overhead time;
  $n_a$ = the number of additions;
  $n_s$ = the number of shifts.
Thus

  $t_m = n_a t_a + n_s t_s$

If N is the number of bits in the multiplier, r' is the radix of the multiplication performed at each subtracter and K is the number of subtracters in cascade, then

  $n_a = \frac{N}{\log_2 r'}$,  $n_s = \frac{n_a}{K}$

  $t_m = \frac{N}{\log_2 r'} \left( t_a + \frac{t_s}{K} \right)$

The radix of the multiplication from accumulator to accumulator is given by $r = 2^{K \log_2 r'}$. For the Illiac III implementation, $t_a$ = 3 delays, $t_s$ = 8 delays, N = 56, and r' = 4. The table below gives $t_m$ for K = 1 to 6.

  K    $t_m$ (collector delays)    Percent Decrease
  1          308                        -
  2          196                        36
  3          158                        49
  4          140                        55
  5          129                        58
  6          121                        61

Increasing the number of subtracters decreases $t_m$, but by a decreasing amount. The 36% decrease in $t_m$ for doubling the number of subtracters is substantial. The 55% decrease for quadrupling the subtracters is less impressive but was nevertheless deemed justifiable in light of the anticipated high demand for multiplications. The following factors also contributed to this decision:

1. A radix 256 structure is highly compatible with byte oriented data formats.
2. Control complexity and overhead are decreased.
3. The structure can be used to accelerate division and thus the cost is amortized across both operations.

Multiplication Structure

Figure 3 is a block diagram of an Illiac III arithmetic unit. The conventions used in this figure are as follows:

1) Functional sub-blocks are denoted by rectangles. Inside each box is the name of the block followed by a list of the names of signals which control it.

2) The lines between boxes denote data buses.

3) Selector signal names are of the form F X T, where F is the name of the register from which the data is transferred, and T is the name of the register to which the data is transferred.
   X = D if the transfer is direct, i.e. without shifting.
   X = Rn if the data is shifted n places to the right during the transfer.
   X = Ln if the data is shifted n places to the left during the transfer.
4) A register name standing alone, for example, UQ, denotes the true output of all positions of the register. A subsection of a register is specified in the following form: np, where n is the number of the first byte (8 bits per byte) of the subsection and p is the number of the last byte of the subsection. Byte numbering is 0 through 7. Example: VDUH47 means V-BUS Direct to UH-Register, bytes 4 through 7.

5) If R denotes the name of a register, then RSEL denotes the output of the associated input selector.

6) If R denotes the name of a register, then LDR denotes the signal which loads the output of the associated selector into the register flip-flops.

7) All selectors, registers, subtracters and shift gates are 64 bits (8 bytes) wide, except for the M-Register which is 56 bits wide.

8) The signed-digit subtracters are denoted SDS1 through SDS4.

[Figure 3 - Block diagram of an Illiac III arithmetic unit; diagram not recoverable from the source.]

One multiplication cycle consists of a sequence of four radix 4 multiplications: multiplication is performed radix 256. Nine bits of the multiplier stored in the UQ Register are recoded simultaneously to control the gates of the M Shift Array and the NEG signals of the signed-digit subtracters, which determine whether addition or subtraction is performed. The shifters are all logically identical; however, they are connected to the appropriate subtracter so that, with respect to the radix point of the subtracters (between the first and second byte), the values of the multiples are as shown below:

  SDS No.    Multiples Selected
     1       0, ±128, ±64
     2       0, ±32, ±16
     3       0, ±8, ±4
     4       0, ±2, ±1

The recoding is performed in three-bit, overlapping groups according to the specifications in Table 3. Figure 4 illustrates the low-order byte plus the extra right-most bit of the UQ register and the shift gates each group controls.

  UQ bits inspected    Signals produced
  57, 58, 59           ML7Y1, ML6Y1
  59, 60, 61           ML5Y2, ML4Y2
  61, 62, 63           ML3Y3, ML2Y3
  63, 64, 65           ML1Y4, MDY4

FIGURE 4. MULTIPLIER BIT, SHIFT GATE CORRESPONDENCE.

The logic equations actually implemented in the Multiplier Recode box are shown below. The MYNEG (Multiply Negation) signals are used to set the NEG controls of the SDS to select whether the multiple is added or subtracted.

  $ML7Y1 = (\overline{UQ}_{57}\, UQ_{58}\, UQ_{59}) \lor (UQ_{57}\, \overline{UQ}_{58}\, \overline{UQ}_{59})$
  $ML6Y1 = UQ_{58} \oplus UQ_{59}$
  $ML5Y2 = (\overline{UQ}_{59}\, UQ_{60}\, UQ_{61}) \lor (UQ_{59}\, \overline{UQ}_{60}\, \overline{UQ}_{61})$
  $ML4Y2 = UQ_{60} \oplus UQ_{61}$
  $ML3Y3 = (\overline{UQ}_{61}\, UQ_{62}\, UQ_{63}) \lor (UQ_{61}\, \overline{UQ}_{62}\, \overline{UQ}_{63})$
  $ML2Y3 = UQ_{62} \oplus UQ_{63}$
  $ML1Y4 = (\overline{UQ}_{63}\, UQ_{64}\, UQ_{65}) \lor (UQ_{63}\, \overline{UQ}_{64}\, \overline{UQ}_{65})$
  $MDY4 = UQ_{64} \oplus UQ_{65}$
  $NEG0 = \overline{UQ}_{57}$
  $NEG1 = UQ_{57} \oplus UQ_{59}$
  $NEG2 = UQ_{59} \oplus UQ_{61}$
  $NEG3 = UQ_{61} \oplus UQ_{63}$
  $NEG4 = \overline{UQ}_{63}$

Brief Operational Description

The fractional part of the multiplier is loaded into the UQ-Register from the V-BUS. The fractional part of the multiplicand is loaded into the UH-Register from the V-BUS and then forwarded to the M-Register. Both fractions are 7 bytes (56 bits) long. The low-order byte of the UQ-Register plus an additional position, $UQ_{65}$ (initially 0), drive the multiplier recoder. One multiplication loop consists of the following sequence of steps:

1) Recoder sets up shift gates and NEG signals. Contents of US-UM (accumulated result in signed-digit format) gated into subtracter cascade.

2) Output of subtracter cascade loaded into secondary rank of accumulator, LS-LM.

3) Multiplier shifted right 8 bits. Secondary rank of accumulator (LS-LM) shifted right 8 bits into primary rank (US-UM).

This loop is executed seven times, once for each byte of the multiplier. At the end of seven loops, $UQ_{65}$ may be 1.
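The recoding of Table 3 and the seven-pass loop described above can be followed at the arithmetic level with a short Python sketch. It is an illustration under assumptions, not a model of the hardware: ordinary integers stand in for the signed-digit accumulator, each partial product is shifted left rather than the accumulator right, and the names `recode_byte` and `multiply` are hypothetical. The final step adds the multiplicand once more when the extra bit is 1 after the last pass.

```python
def recode_byte(byte, extra):
    """Recode 8 multiplier bits plus the extra right-hand bit into four
    radix-4 digits in {-2, ..., +2}, most significant digit first.

    The 9 bits correspond to UQ57 ... UQ65 and are inspected in the
    overlapping groups (57,58,59), (59,60,61), (61,62,63), (63,64,65).
    """
    bits = [(byte >> k) & 1 for k in range(7, -1, -1)] + [extra]
    # Table 3: digit = -2 * left bit + middle bit + right bit
    return [-2 * bits[j] + bits[j + 1] + bits[j + 2] for j in (0, 2, 4, 6)]

def multiply(multiplicand, multiplier, nbytes=7):
    """Radix-256 multiplication of two non-negative integers, one multiplier
    byte per pass, four radix-4 digits per byte."""
    acc = 0
    extra = 0                                  # the extra bit, initially 0
    for loop in range(nbytes):
        byte = (multiplier >> (8 * loop)) & 0xFF
        d = recode_byte(byte, extra)
        # Multiples as seen by subtracters 1 to 4: x64, x16, x4, x1
        multiple = 64 * d[0] + 16 * d[1] + 4 * d[2] + d[3]
        acc += (multiplicand * multiple) << (8 * loop)
        extra = byte >> 7                      # high bit becomes the next extra bit
    # If the extra bit is 1 after the last pass, add the multiplicand once more
    acc += (extra * multiplicand) << (8 * nbytes)
    return acc

a = 0x123456789ABCDE
b = 0xFEDCBA98765432
print(recode_byte(b & 0xFF, 0))
print(multiply(a, b) == a * b)
```

Each byte contributes its value minus 256 times its high bit plus the incoming extra bit, so the contributions telescope from pass to pass and only the high bit of the last byte needs the terminal correction.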
If $UQ_{65}$ is 1 after the seventh loop, then 1 times the multiplicand must be added to the partial product. This is accomplished during the assimilation pass, the steps of which are as follows:

1) Turn off subtracters 2, 3, 4 by setting G2 = G3 = G4 = 0. Set PDY4 (Propagation Logic Direct to Y4 input on subtracter 4). Set MDY1 if $UQ_{65}$ = 1. Set NEG0 = NEG1 = 1; other NEG signals set to 0.

2) Gate US-UM into subtracter cascade. The T and Z outputs of signed-digit subtracter 1 (SDS1) drive the propagation logic. Meanwhile the Z bits from SDS1 propagate through SDS2 and SDS3. In SDS4 the output of the propagation logic and the Z bits are combined in an exclusive OR to produce the result in a conventional form.

This assimilated result is stored in the LM-Register and then forwarded to the UQ-Register. The UQ-Register serves as an input-output buffer.

The range of a normalized, non-zero fraction, f, is given by

  $\frac{1}{16} \le f < 1$

The product of two such fractions, $f_1$ and $f_2$, therefore lies in the range

  $\frac{1}{256} \le f_1 f_2 < 1$

A product may require a terminal left shift of 4 bits accompanied by a reduction of its exponent. If zeros were inserted in the low-order 4 bits during the shift then the precision of the result would be impaired. The value of these bits, although actually computed, would normally be lost in the last right shift from the LS-LM Registers to the US-UM Registers. Logic has been added which assimilates and stores the four digits before they are lost by shifting. The borrow from the assimilation is coupled into position 64 of the Propagation Logic. If a terminal shift is required these four bits, rather than zeros, are shifted into the low-order position.

Truncation Error

It is difficult to identify a normalized result while it is represented redundantly. For this reason the 4 extra low-order bits of the product are always assimilated but used only if the full-precision result requires a left shift for normalization.
For purposes of error analysis we may assume that 60 rather than 56 bits of the product are assimilated. The range of the truncation error, $e$, due to truncation after 60 signed digits is given by -2 [...]

[The source text breaks off here. The remainder of this section and the opening of the Division section, up to Figure 5, are missing.]

[Figure 5 - Connection of Model Division to Full Precision Structure.]

Note: The symbol . represents the radix point for the radix 4 model division. The symbol : represents the radix point for the full precision division, radix 256.

SETTING OF NEG SIGNALS:

Division     Positive Partial         Negative Partial
Stage No.    Remainder ($A_0 = 0$)    Remainder ($A_0 = 1$)
1            NEG0 = NEG1 = 1          NEG0 = NEG1 = [value illegible]
2            NEG1 = NEG2 = 1          NEG1 = NEG2 = [value illegible]
3            NEG2 = NEG3 = 1          NEG2 = NEG3 = [value illegible]
4            NEG3 = NEG4 = 1          NEG3 = NEG4 = [value illegible]

[Figure, presumably Figure 6: content not recoverable from the source text.]

The Assimilation box produces a two's complement version of the estimate of the shifted partial remainder. This, together with the Division Interval Select Logic, drives the Quotient Select Table. The quotient digits are represented in the same signed-digit format as produced by the subtracters.

[Table giving the signed-digit representation of each quotient digit value (+2, +1, +0, -0, -1, -2); the representations are illegible in the source.]

Note that a distinction is made between a positive and a negative zero. The sign of all digits, including zero, is the same as the sign of the partial remainder. If the digit +0 is formed, then zero is subtracted from the partial remainder. If the digit -0 is formed, then zero is added to the partial remainder. As shown in the proof in the Appendix, this method of handling a zero quotient digit eliminates bogus overflow at position 1 for division. The quotient digits are buffered until eight are collected.
They are then gated to the low-order byte of the UH-UQ Register. The quotient digits also set up the shift gates and NEG signals in accordance with the description in Figure 6. The operating time of the model is summarized in Table 5.

Block                                   No. of Collector Delays
Input Gating                            2
Assimilation                            3
Quotient Selection                      2
Quotient Storage and Shift Control      3
Total                                   10

Table 5 - Operating Times of the Model Division

It should be emphasized that the scheme used in the model division is but one of many possibilities. Since the amount of logic involved is quite small (10 cards), has a well-defined interface and is physically one package, it is quite feasible to replace the model with new, hopefully improved, versions. The operating time for division relative to the operating time for multiplication is primarily a function of the relative operating times of the multiplier recoder and the model division. The concept of a model division and the analogy to the multiplier recoder offer several interesting areas of research, some of which are being explored by the author in Ph.D. thesis research.

Division Structure

As mentioned earlier, a primary motivation for use of the model division approach is its high compatibility with multiplication. The division structure is the same as the multiplication structure described in conjunction with Figure 3.

Brief Operational Description of Full Precision Division Scheme

The fractional part of the dividend is loaded into the UQ-Register from the V-Bus. The fractional part of the divisor is loaded into the UH-Register from the V-Bus. Both fractions are 7 bytes (56 bits) long. The range of a normalized fraction, $f$, is given by $1/16 \le f < 1$. The model division scheme requires that the divisor, $d$, be in the range $1/2 \le d < 1$. If the given divisor is not in this range, then both the divisor and dividend are shifted left until the divisor is in range.
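The pre-normalization step just described can be sketched as follows. This is a minimal illustration using exact rational arithmetic, not the register-level procedure; the function name `prenormalize` and the use of `Fraction` are assumptions made for the example.

```python
from fractions import Fraction


def prenormalize(dividend, divisor):
    """Shift both operands left until 1/2 <= divisor < 1.

    Returns (dividend, divisor, shifts). The divisor is assumed to be a
    normalized fraction, 1/16 <= divisor < 1, so at most 3 shifts occur.
    """
    if not (Fraction(1, 16) <= divisor < 1):
        raise ValueError("divisor must satisfy 1/16 <= divisor < 1")
    shifts = 0
    while divisor < Fraction(1, 2):
        divisor *= 2   # one-bit left shift of the divisor
        dividend *= 2  # matching left shift keeps the quotient unchanged
        shifts += 1
    return dividend, divisor, shifts


if __name__ == "__main__":
    n, d, s = prenormalize(Fraction(3, 4), Fraction(1, 16))
    print(n, d, s)
```

Because both operands are scaled by the same power of two, the quotient is unchanged; the shifted dividend may, however, exceed 1, which is why it can occupy more than 7 bytes.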
After normalization, the divisor is forwarded to the M-Register and the dividend is forwarded to the UM-Register. The US-Register is cleared, i.e. all sign bits are set to 0. One division loop consists of the following sequence of steps:

1) The contents of US-UM (dividend) are gated into the subtracter cascade. The model division successively sets up the shift gates and NEG signals in accordance with the previous description.

2) The output of the subtracter cascade is loaded into the secondary rank of the accumulator, LS-LM.

3) The quotient (sign bits in UH, magnitude bits in UQ) is shifted left 8 bits and the 8 digits buffered in the model division are inserted into the low-order byte of UH-UQ. The secondary rank of the accumulator (LS-LM) is shifted left 8 bits into the primary rank (US-UM).

Due to the initial normalization of the divisor and the corresponding shifting of the dividend, the dividend may extend across 8 bytes. The division loop must therefore be executed 8 times. After the last loop the quotient in the UH and UQ Registers is transferred to the US and UM Registers, respectively. The quotient is then assimilated in the same manner as described in the brief operational description of multiplication.

The range of the quotient for the division of two non-zero fractions $f_1$ and $f_2$ is given by $1/16 < f_1/f_2 < 16$. A quotient may therefore require a terminal right shift of 4 bits accompanied by an increase of the exponent. Division by zero or into zero is detected during the preliminary steps of the division operation.

Truncation Error

The range of the truncation error, $e$, due to truncation after 56 signed digits is given by $-2^{-56} < e < 2^{-56}$. If a terminal right shift of the assimilated result takes place, $e$ is brought into the range $-2^{-60} < e < 2^{-60}$; however, the first range is the best case that can be guaranteed.

REFERENCES

[1] J. E.
Robertson, "A deterministic procedure for the design of carry-save adders and borrow-save subtracters," University of Illinois, Department of Computer Science, Report No. 235, July 5, 1967.

[2] F. A. Rohatsch, "A study of transformations applicable to the development of limited carry-borrow propagation adders," University of Illinois, Department of Computer Science, Report No. 226, June 1, 1967.

[3] R. T. Borovec, "The logical design of a class of limited carry-borrow propagation adders," University of Illinois, Department of Computer Science, Report No. 275, August 1, 1968.

[4] J. E. Robertson, "The correspondence between methods of digital division and multiplier recoding procedures," Department of Computer Science, Report No. 252, University of Illinois, Urbana, December 1967.

[5] J. O. Penhollow, "A study of arithmetic recoding with applications to multiplication and division," Department of Computer Science, Report No. 128, University of Illinois, Urbana, September 1962.

[6] C. S. Wallace, "Suggest design for a very fast multiplier," Department of Computer Science, Report No. 133, University of Illinois, Urbana, February 11, 1963.

[7] J. E. Robertson, Internal memo, February 11, 1968.

[8] J. E. Robertson, "Methods of selection of quotient digits during digital division," Department of Computer Science, University of Illinois, Urbana, File 663, 1965.

[9] D. E. Atkins, "Higher radix division using estimates of the divisor and partial remainders," IEEE Trans. Computers, vol. C-17, no. 10 (Oct. 1968), pp. 925-934.

NOTE: All references except [7] and [9] are available upon request to the following:

Department of Computer Science Mailing Center
Room 23[digit illegible] DCL
University of Illinois
Urbana, Illinois 61801

A reprint of reference [9] is available from the author.

APPENDIX

The Appendix includes the proof of the validity of the bogus overflow correction scheme and an introduction to the entire Illiac III system.
PROOF OF THE VALIDITY OF THE BOGUS OVERFLOW CORRECTION SCHEME

We are concerned with the value of the output of positions 0 and 1 of the subtracter. Inspection of the equations for the subtracter defined in Figure 3 reveals that the values of these outputs, $Z^*_0$ and $Z^*_1$, are functions only of the inputs to positions 0, 1 and 2. Since the 0th position is not implemented, $S_0$ and $X_0$ are implicitly both zero. Furthermore, since the subtrahend is always considered to be positive and can never be greater than 128, $Y_0$ and $Y_1$ are also both zero. Table A-1 enumerates $Z^*_0$ and $Z^*_1$ as functions of $X^*_1$, $X^*_2$ and $Y_2$. Recall the notational convention defined below (the input digit $X^*_i$ is formed from $S_i$ and $X_i$ in the same way):

$T_i$    $Z_i$    $Z^*_i$
0        0        $0$
0        1        $1$
1        0        $\bar{0}$
1        1        $\bar{1}$

A digit 0 under the $X^*_1$ or $X^*_2$ columns may be either a positive or a negative zero. The table is defined for NEG = 0. If NEG were 1, the signs of all output digits would be complemented, but the magnitudes, and thus the bogus overflow conditions, would be the same. It is therefore sufficient to complete the proof for NEG = 0; the proof for NEG = 1 follows immediately by symmetry.

For all the cases in Table A-1 for which $Z^*_0$ is zero no problem arises. Note that $Z^*_0$ is non-zero if and only if $X^*_1 = \bar{1}$, in other words, when $S_1 X_1 = 1$. For the cases marked with * the bogus overflow scheme is valid. For those marked with # the scheme is not valid, but we shall show that within the constraints of the Illiac III implementation these cases cannot occur. The proof is considered for the three classes of operations in which the subtracter cascade is used, namely addition-subtraction, division and multiplication.

Entries in the table are $Z^*_0$ (weight 256) and $Z^*_1$ (weight 128). Overbars on the entries are not legible in the source; those in the $X^*$ columns are restored from the row references in the text.

Row No.   $X^*_1$     $X^*_2$     (a) $Y_2 = 0$   (b) $Y_2 = 1$
          (128)       (64)        (64)            (64)
1         0           0           00              01
2         0           1           00              00
3         0           $\bar{1}$   01              01
4         1           0           01              00
5         $\bar{1}$   0           11*             10#
6         1           1           01              01
7         1           $\bar{1}$   00              00
8         $\bar{1}$   1           11*             11*
9         $\bar{1}$   $\bar{1}$   10#             10#

Notes: $S_0 = X_0 = Y_0 = Y_1 = \mathrm{NEG} = 0$. A digit 0 may be positive or negative. Numbers in parentheses indicate the weight of the digital position.
* Indicates bogus overflow which will be corrected by complementing the sign of $Z^*_1$ and disposing of $Z^*_0$.
# Indicates cases in which this correction scheme is not valid.

TABLE A-1 - Possible Values of $Z^*_0$ and $Z^*_1$

Addition - Subtraction

The radix point for floating point operations is between positions 8 and 9 of the subtracters. All operands are less than 1, therefore $X^*_1 = X^*_2 = 0$ and $Y_2 = 0$. We are therefore restricted to row 1, column a (denoted 1-a) of the table. Bogus overflow is therefore avoided.

Division

We have shown that bogus overflow arises if and only if $X^*_1 = \bar{1}$. For the case of division the sign of $X^*_1$ is also the sign of the partial remainder; when the partial remainder is negative it is added to the selected multiple of the divisor. Addition requires that $X^*_1$ be complemented prior to entry into the subtracter, and thus $X^*_1$ becomes 1. In division the subtracters always see only positive inputs and therefore the states in rows 5, 8 and 9 cannot occur. Bogus overflow is avoided altogether.

Multiplication

The multiplicand, M, is stored in the M-Register and is in the range 1/16 to 1. The maximum multiple of M which may be formed is 128 times M (at the first subtracter) and thus $Y_2$ can equal 1 only at the first subtracter. The contents of the accumulator (US-UM), the signed-digit input to the first subtracter, are always less than 1 in magnitude and therefore $X^*_1 = X^*_2 = 0$. Thus all entries in column b except 1-b are eliminated as possibilities. The remaining task is to show that case 9-a cannot occur.

At this point we must note a property of the multiplier recoding scheme defined in Table 3. This property is that 128 is the maximum multiple of the multiplicand which may be combined with the partial product in any one pass through the subtracter cascade. This may be established by considering a group of nine bits which are to be recoded.
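The nine-bit claim can also be checked exhaustively. The sketch below is not taken from the report: it assumes the recoding table is the usual three-bit overlapping rule, digit = -2*(left bit) + (middle bit) + (right bit), which is consistent with the shift-gate equations given for the multiplier recoder, and it takes the per-subtracter weights 64, 16, 4, 1 from the table of selected multiples. The function names are hypothetical. It also checks a sign property of the assumed rule: after a leading +2 or -2, the nearest non-zero digit to the right has the opposite sign.

```python
from itertools import product

# Multiple selected at subtracters 1..4 by a recoded digit of magnitude 1.
WEIGHTS = (64, 16, 4, 1)


def recode(bits):
    """Recode nine bits (UQ57..UQ65, leftmost first) into four digits in -2..2.

    Assumed rule: each digit comes from an overlapping three-bit group.
    """
    return [-2 * bits[2 * k] + bits[2 * k + 1] + bits[2 * k + 2] for k in range(4)]


def combined_multiple(bits):
    """Net multiple of the multiplicand applied in one pass through the cascade."""
    return sum(w * d for w, d in zip(WEIGHTS, recode(bits)))


def check_all_groups():
    """Enumerate all 512 nine-bit groups; return (min, max) net multiple."""
    totals = []
    for bits in product((0, 1), repeat=9):
        digits = recode(bits)
        totals.append(combined_multiple(bits))
        if abs(digits[0]) == 2:
            # Nearest non-zero digit to the right, or 0 if there is none.
            nearest = next((d for d in digits[1:] if d != 0), 0)
            assert nearest * digits[0] <= 0
    return min(totals), max(totals)


if __name__ == "__main__":
    print(check_all_groups())
```

Under the assumed rule the net multiple never exceeds 128 in magnitude, which is the property the argument relies on.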
If +2 or -2 is selected as the recoded version of the leftmost trio of bits, then all recoded digits to the right are either zero or of opposite sign. Recall from Figure 4 that the selection of 2 at the left of the recoding logic generates a 7-bit left shift of M into the first subtracter.

Having ruled out cases 5-b and 9-b, we may state that the 10 condition in 9-a occurs if and only if $X^*_1 = X^*_2 = \bar{1}$. This means that the algebraic value of the signed-digit input to the subtracter is more negative than -128. This clearly cannot be the case unless a multiple of 128 x M has been combined with a non-zero partial product of the same sign as the multiple. This may occur only in the first subtracter. If case 1-a occurs, the partial product is less than 128 in magnitude and thus cannot become more negative than -128 by subsequent operations in the subtracter cascade. Case 1-b can occur only if a multiple of 128 is selected, and thus the subsequent subtracters can only either preserve the value of the partial product by subtraction of zeros or decrease it in magnitude. If the magnitude is decreased, it will be decreased to less than 128 and thus case 9-a is immediately ruled out. If subtraction or addition of zero occurs at subsequent subtracters, the 01 pattern in 1-b will propagate through and cannot become 10. This is demonstrated by the following reasoning. [Subscripts, overbars and row numbers in the next two sentences are illegible in the source.] For case 1-b with X* = 0, Z* is never 1. Z* and Z* are the X*, X* inputs to the next subtracter and thus we are brought to either row, but will be corrected back to either the case Z* = 1, Z* = 1 or to the case Z* = 1, Z* = 0. The 01 pattern of case 1-b will therefore pass through all of the subtracters and will never be reformed as 11 in 9-a.

BRIEF DESCRIPTION OF THE ILLIAC III COMPUTER SYSTEM*

The Illinois Pattern Recognition Computer, Illiac III, is a digital processor for visual information.
It is primarily designed for automatic scanning and analysis of massive amounts of relatively homogeneous visual data. In particular the design is an outgrowth of studies at this laboratory of a computer system capable of scanning, measuring and analyzing in excess of 3 x 10^[exponent illegible] bubble chamber negatives per year.

Illiac III, though specifically designed to process visual information, also provides complete facilities for standard general-purpose computation. Both the picture processing and general-purpose computation facilities of Illiac III will be available to users on a time-sharing basis.

As can be seen in Figure A-1, Illiac III is a multi-processor computer system. Six processors (4 Taxicrinic Processors and 2 Input/Output Processors) access in parallel the computational/storage units consisting of 2 Arithmetic Units, 1 Interrupt Unit, 1 Pattern Articulation Unit, and 4 Storage Units. Each computational/storage unit of the computer system specializes in a particular activity. Thus, for example, all floating-point computation is done in the Arithmetic Units, while picture processing is performed primarily by the Pattern Articulation Unit. Processors, on the other hand, analyze user jobs and route their constituent tasks to the appropriate specialized processing units. The individual processors of the system can operate simultaneously and independently (within the limits imposed by the System Supervisor) with a consequent increase in overall efficiency.

* From Section 1 of the Illiac III Programming Manual.

The Input/Output Processors (IOP) are attached via Channel Interface Units and Device Controllers to various input and output devices. Among facilities important for the ingestion of visual information are 8 CRT flying spot scanners: two for 70 mm film, two for 46 mm film, two for microfilm/microfiche, and two for
microscope slides. These scanners can also operate as film cameras and thus serve as both input and output devices. Monitor stations have also been attached to the Input/Output system. These each consist of a CRT display, a typewriter, and a magnetic tape unit, and are provided to assist human control of the analysis.

[Figure A-1. Schematic of Illiac III Computer. Legible labels: Taxicrinic Processors, Arithmetic Units (AU), Exchange Net, PAU, I/O Processors, Central Units, Channel Interface Units, Secondary Storage, Scan/Display, Inter-Machine Links, Low Speed Term.]

The duty of the Pattern Articulation Unit (PAU) is to perform local preprocessing on the input from the scanners, such as track thinning, gap filling, line element recognition, etc. The logical design of this all-digital processor has been optimized for the idealization of the input image to a line drawing. Nodes representing end points, points of inflection, points of intersection, etc. are labeled in parallel by appropriate programming under overall control of the Taxicrinic Processor. The abstract graph describing the interconnection of labeled nodes is then extracted as a list structure, which comprises the normal output of the Pattern Articulation Unit.

This output is then operated on by a Taxicrinic Processor (TP), which assembles such graphs into coherent list structures subject to a recognition grammar and then syntactically categorizes them to complete the visual recognition process. The Taxicrinic Processors are primarily responsible for the execution of user programs, that is, for overseeing the operations of the Pattern Articulation Unit and the Arithmetic Unit and for initiating input/output operations in the IOPs by making requests to the Interrupt Unit.

The Arithmetic Unit (AU) is used exclusively for performing arithmetic operations for the TP.
Although there are a few simple arithmetic operations which can be done in a TP (e.g., integer addition), the more complicated operations are done in the AU. The AU has been optimized for double-word floating point arithmetic.

The Interrupt Unit (IU) handles all the interrupt requests from the TP and IOP. When an interrupt is requested it notifies the proper processors, which then take appropriate action.

All of the Illiac III processors and units communicate with each other through the Exchange Net (XN) as shown in Figure A-1. The Exchange Net is responsible for all the necessary queueing and priority checking.

As noted above, there is indeed a reason for calling one piece of equipment a processor and another a unit, even though the operations they perform may both appear to be "processing" operations. In the Illiac III system all major modules are designated as either "processors" or "units" according to their position in the Exchange Net. In Figure A-1, the processors are shown at the top and bottom and the units are shown on the right. The effect of this division is that processors may communicate directly with units and vice versa, but may not communicate directly with each other. If a processor needs to communicate with another processor it must get help from a unit (normally the Interrupt Unit), and if a unit (say the PAU) wants to communicate with another unit (say a storage unit) the information must be transferred through a processor (the TP in this case).