Digitized by the Internet Archive 
 
 in 2013 
 
 http://archive.org/details/suggesteddesignf133wall 
 
DIGITAL COMPUTER LABORATORY

 UNIVERSITY OF ILLINOIS

 URBANA, ILLINOIS

 REPORT NO. 133
 
 SUGGESTED DESIGN FOR A VERY FAST MULTIPLIER 
 
 by 
 C. S. Wallace 
 
 February 11, 1963 
 
 This work was supported in part by the
 Atomic Energy Commission under Contract No. AT(11-1)-415

Abstract
 
 It is suggested that the economics of present large-scale scientific
 computers could benefit from a greater investment in hardware to mechanize
 multiplication and division than is now common. As a move in this direction
 a design is developed for a multiplier which generates the product of two
 40-digit numbers using purely combinational logic, i.e., in one gating
 step. This design is described in some detail to establish that no
 exceptional cases invalidate the assertion made about its speed of operation.
 Using straightforward diode-transistor logic, it appears presently possible
 to obtain products in under one microsecond, and quotients in three. A
 rapid square root process is also outlined. Approximate component counts
 are given for the proposed design.
 
 I. Introduction 
 
 A contemporary computer spends a large percentage of its time
 executing multiplication, and to a lesser extent, division. The recent
 advent in very large machines of bookkeeping controls (operating in advance
 of the arithmetic unit to execute memory fetches, stores and address
 modification, etc.) has tended to increase this percentage by relieving the
 arithmetic unit of many trivial burdens. The arithmetic unit of such a
 machine, when used for scientific computations, will spend nearly half 
 its time multiplying or dividing. Paradoxically, the amount of hardware 
 built into large machines specifically for these operations is rarely very 
 great. Thus the situation has arisen, viewed in the context of a very large 
 machine involving a heavy investment in memory, peripheral equipment and 
 controls, that it may be advantageous to the economy of the machine as a whole 
 to increase the hardware investment in the operations of multiplication and 
 division, even beyond the point where an increment of this investment yields 
 an equal incremental increase in multiplication-division speed. Consistent 
 with this point of view, this paper will describe the logical design and 
 economics of a multiply-divide unit designed for maximum possible speed. 
 
 For multiplication, which will be discussed first, obvious ways 
 to get high speed are to (a) reduce the number of partial products to be 
 summed, and (b) to extend the parallelism used in their addition. The 
 limiting case of the latter course in which the product is formed by 
 combinatorial logic in one gating step is treated below. 
 
 
This approach, while clearly involving a great deal of hardware, 
 has some by-product advantages. First, control complexity is reduced to 
 the minimum of a single step. Second, with present transistor technology 
 the time for the distribution of gate signals to a flipflop register 
 augmented by the time required for the flipflops to settle into their new 
 state generally exceeds by a considerable factor the propagation delay 
 through a combinatorial logic element. Thus there is a strong argument 
 toward performing many levels of logic in each gating step. 
 
 In this paper attention will be restricted to the multiplication
 and division of 40-digit two's complement binary numbers.
 
 II. The Adder Tree 
 
 Given a large number of numbers to be summed by combinatorial
 logic, it is clearly unnecessary and undesirable to have carry propagation
 at each intermediate stage of the additions. A straightforward approach,
 used here, employs a sufficient number of full-adder words, each consisting
 of as many full-adder circuits as there are significant digits in the
 numbers to be added. The full-adder circuits are not interconnected in
 any way. A full-adder word gives two output numbers, sum and carry, whose
 sum equals the sum of the three input numbers. If there are n numbers to
 be summed, n - 2 full-adder words will be needed to express the sum as two
 numbers. These two numbers must then be summed in a carry-propagating adder
 to produce the final result.
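The reduction described above can be illustrated with a minimal Python sketch. It models a full-adder word on non-negative integers only (the report works with two's complement words of fixed length), and the function names `full_adder_word` and `sum_by_carry_save` are hypothetical, chosen for this illustration.

```python
def full_adder_word(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three non-negative integers to a (sum, carry) pair with the same total.

    Each bit position is an independent full adder: no carry travels
    between positions inside the word.
    """
    s = a ^ b ^ c                                   # per-position sum digit
    carry = ((a & b) | (b & c) | (c & a)) << 1      # per-position carry, one place left
    return s, carry


def sum_by_carry_save(numbers):
    """Sum a list of numbers using full-adder words until two numbers remain.

    Returns (total, adder_words_used). The final `sum(words)` stands in for
    the carry-propagating adder.
    """
    words = list(numbers)
    adder_words_used = 0
    while len(words) > 2:
        a, b, c = words.pop(), words.pop(), words.pop()
        words.extend(full_adder_word(a, b, c))
        adder_words_used += 1
    return sum(words), adder_words_used


if __name__ == "__main__":
    nums = list(range(1, 21))            # twenty summands, as in the proposed design
    print(sum_by_carry_save(nums))
```

With twenty summands the sketch uses 18 full-adder words, in agreement with the n - 2 count stated in the text. The order in which words are combined here is arbitrary and is not the tree arrangement of Figure 1.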
 
 All partial products to be summed are generated simultaneously.
 An arrangement of the n - 2 full-adder words to start work on all partial
 products simultaneously, and to produce the result after as few full-adder
 propagation delays as possible, is desired. This suggests a tree
 structure of the type shown in Figure 1. In this figure each box represents
 a full-adder word, the three incoming numbers identified at the top. The
 sum and carry numbers leaving the bottom of the box are identified by
 the letters s and c.
 
 As can be seen, starting with the carry-propagating adder, each
 additional level of full-adder words increases the number of available
 inputs by a factor of 1.5 or less. The inputs shown by $W_1$, $W_3$, etc.,
 to $W_{39}$ are the partial product numbers. The example shown in the figure,
 with twenty input numbers, corresponds to the particular multiplier
 design to be developed below.
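The growth factor of 1.5 per level fixes the depth of the tree. The following sketch counts levels under the assumption that every level uses as many full-adder words as possible (one word per three available numbers). This greedy grouping is an assumption made for illustration; the arrangement in Figure 1 places individual partial products differently, although the level count agrees. The function name `adder_tree_shape` is hypothetical.

```python
def adder_tree_shape(n: int) -> list[int]:
    """Return the number of full-adder words used at each level, inputs first.

    Assumes a greedy arrangement: at every level, each group of three
    numbers is fed to one full-adder word, which returns two numbers.
    """
    per_level = []
    while n > 2:
        adders = n // 3
        per_level.append(adders)
        n -= adders          # three numbers in, two out: a net loss of one per word
    return per_level


if __name__ == "__main__":
    for inputs in (20, 14):
        shape = adder_tree_shape(inputs)
        print(inputs, "inputs:", len(shape), "levels,", sum(shape), "adder words")
```

For twenty inputs this gives seven levels and 18 adder words; for fourteen inputs it gives six levels and 12 adder words.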
 
[Figure 1. Adder Tree. The twenty partial products W1, W3, ..., W39 enter at the top; the full-adder words, each with sum (s) and carry (c) outputs, are arranged in levels 7 (top) to 1 (bottom); the outputs of adder word 1 feed the carry-propagating adder, which delivers the final sum. Entry points A, B, C and D, referred to in Section VIII, are marked on the original figure.]
 
Certain complications arise from the fact that the summands, being
 partial products, are shifted relative to one another. Thus the three input
 numbers to any one adder word do not in general cover the same range of
 digital positions. At the less significant end of the adder word, there will
 be digital positions having only one or two inputs. Since the function of an
 adder word is to reduce three input numbers to two output numbers, these
 digital positions of the adder word need not contain full adder circuits. In
 some cases, they need contain only one or two inverter circuits. Unfortunately,
 the same simplification does not apply at the more significant end. Each
 partial product, and hence in general any input number to an adder word, may
 be negative. In the two's complement representation, a number to be added in
 an adder word to a number of greater significance must be augmented to the
 left of its sign digit with copies of its sign digit. Thus, the adder word
 must contain full adder circuits as far left as the most significant (i.e., sign)
 digit of its most significant input number if all input numbers may be negative.
 Also, where the outputs of an adder word enter another adder word together
 with an input number of greater significance, those outputs must be augmented
 by copies of their sign digits, so that the most significant full adder circuit
 of the first adder word must be capable of driving several full adder circuit
 inputs. However, in such cases, it is possible to arrange that only the carry
 output number need be so augmented, the sum output being restricted to
 positive values.
 
 As will be shown when the circuitry proposed for the full adder
 is described, the carry output may be provided with extra fanout without overall
 loss of speed (see Figure 2). If the three sign digit inputs to the most
 significant adder stage of an adder word are denoted x, y and z, the adder
 stage may take the normal form, giving two outputs

 \[ c = xy \lor yz \lor zx, \qquad s = (x \lor y \lor z) \cdot \bar{c} \lor xyz \]

 where, of course, the digit c enters the next adder word displaced one digital
 position to the left, and provided that both s and c are augmented by copies
 as far left as is necessary. This form will be used only where the s output
 does not in fact require augmenting, as provision for the extra fanout would
 slow the addition. To ensure that the sum output word is always positive,
 use will be made of the fact, true in all cases where this maneuver is
 required, that the three input numbers are not of equal significance. Two of
 the numbers will have identical digits in both the sign and next less
 significant digital positions. Thus the left hand two adders of the adder word
 must sum three numbers having the form at their left-hand ends
 
[Figure 2. A Full Adder Circuit. Inputs are at +0.5 or -1.5 v; the circuit delivers a sum output and a carry output, with an emitter follower for fanout of the carry when necessary.]
 
        x y - - - - - -
        a a - - - - - -
        b b - - - - - -

 into two numbers of the form

 Sum (non-negative)      0 p q - - - - - -

 Carry             - - - r r r t - - - - - -

 q is developed as the sum modulo two of y, a and b in the usual way.
 The logical expressions for the remaining digits are:

 \[ r = x \lor a \lor b \]

 \[ t = (a \lor b) \cdot y \cdot \overline{(ab)} \]

 \[ p = r \cdot \overline{[x \cdot (a \lor b)]} \]

 The only digit requiring fanout in any circumstances will be r, which can be
 amplified without loss of speed (see Figure 3).
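The expressions for p, q, r and t can be checked exhaustively. The sketch below assumes the following reading of the layout, which is an interpretation and not stated explicitly in the text: the second digit position carries weight 1, the sign position carries weight -2 for each input, the sum output is the non-negative number 2p + q, and the carry output has t at weight 2 with r as its (extended) sign digit at weight -4. It also assumes that the complement bars lost in reproduction fall on (ab) in the expression for t and on [x(a v b)] in the expression for p. The function names are hypothetical.

```python
from itertools import product


def sign_stages(x: int, y: int, a: int, b: int) -> tuple[int, int, int, int]:
    """Digits p, q, r, t of the two most significant stages (assumed reading)."""
    q = y ^ a ^ b                          # sum modulo two of y, a and b
    r = x | a | b
    t = (a | b) & y & (1 - (a & b))        # (a v b) . y . not(ab)
    p = r & (1 - (x & (a | b)))            # r . not[x . (a v b)]
    return p, q, r, t


def outputs_match_inputs() -> bool:
    """Check input total == sum word + carry word for all 16 digit combinations."""
    for x, y, a, b in product((0, 1), repeat=4):
        p, q, r, t = sign_stages(x, y, a, b)
        # Inputs: x y (sign weight -2), a a and b b (sign-extended numbers).
        inputs = (-2 * x + y) + (-2 * a + a) + (-2 * b + b)
        # Outputs: non-negative sum 0 p q, and carry ... r r t with r as sign.
        outputs = (2 * p + q) + (-4 * r + 2 * t)
        if inputs != outputs:
            return False
    return True


if __name__ == "__main__":
    print(outputs_match_inputs())
```

Under these assumptions the sum word never needs sign extension, and only r has to be copied to the left.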
 
 When this form of sum and carry outputs from one adder stage both enter
 the same adder word in the next stage together with a third input of greater
 significance, the left hand end of the second adder word will have stages
 with only two significant input digits. In this case, the circuits used in
 these stages can be half-adders.

 At least one adder word in each level of the adder tree will have
 at its more significant end adder circuits handling digits of the weight
 of the sign digit of the final product. Since two's complement representation
 is proposed, carry outputs from these circuits may be ignored. No adder word
 will contain digital positions to the left of the sign digit of the final
 product.
 
 At the other end of the adder words, each level of the adder tree
 will contain an adder word some of whose least significant output digits have 
 less significance than any output digits of any other adder word in the same 
 level. These digits may bypass all remaining levels of the tree and enter 
 directly into the (double-length) carry propagating adder. Thus, at the time 
 when adder word one produces its output numbers, these will not contain the 
 less significant end of the product, which will have already been produced 
 in its final form by the right-hand end of the carry propagating adder. 
 
 
[Figure 3. Possible circuit for the two most significant stages of an adder word giving a non-negative sum output. Biasing resistors and clamps not shown. The outputs marked on the figure are p, formed as the sum modulo two of (a v b) and x, and r = x v a v b.]
 
III. Generation of Partial Products
 
 Each partial product will be selected from a limited number of
 multiples of the multiplicand on the basis of some of the multiplier digits.
 It is proposed that the available multiples be +2, +1, 0, -1, and -2 times
 the multiplicand. Partial product $W_i$, where i is odd, will depend on
 multiplier digits $x_{i-1}$, $x_i$ and $x_{i+1}$, where the multiplier digits
 are labelled from $x_0$ (the sign digit) to $x_{39}$ (the least significant).
 Digit $x_{40}$ is taken as zero. The rules for selection of a multiple are:

 +2 if $\bar{x}_{i-1} \cdot x_i \cdot x_{i+1}$

 +1 if $\bar{x}_{i-1} \cdot \bar{x}_i \cdot x_{i+1} \lor \bar{x}_{i-1} \cdot x_i \cdot \bar{x}_{i+1}$

  0 if $\bar{x}_{i-1} \cdot \bar{x}_i \cdot \bar{x}_{i+1} \lor x_{i-1} \cdot x_i \cdot x_{i+1}$

 -1 if $x_{i-1} \cdot \bar{x}_i \cdot x_{i+1} \lor x_{i-1} \cdot x_i \cdot \bar{x}_{i+1}$

 -2 if $x_{i-1} \cdot \bar{x}_i \cdot \bar{x}_{i+1}$
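The selection rules amount to assigning to each odd i the multiple -2·x(i-1) + x(i) + x(i+1). The sketch below, under that reading, recodes a two's complement fraction and confirms that the recoded digits, weighted by 2^(-i), reproduce the value of the multiplier. The function names `recode`, `value` and `recoded_value` are hypothetical, and exact rational arithmetic is used only for checking.

```python
from fractions import Fraction


def recode(bits):
    """Recode multiplier digits x_0 (sign) .. x_{n-1}, n even, into multiples.

    Returns a dict {i: multiple} for odd i, each multiple in {-2, -1, 0, 1, 2}.
    A zero digit is appended to the right, as in the text.
    """
    x = list(bits) + [0]
    return {i: -2 * x[i - 1] + x[i] + x[i + 1] for i in range(1, len(bits), 2)}


def value(bits):
    """Value of a two's complement fraction x_0 . x_1 x_2 ..."""
    return -bits[0] + sum(Fraction(b, 2 ** j) for j, b in enumerate(bits) if j > 0)


def recoded_value(bits):
    """Value represented by the recoded digits, digit i having weight 2**-i."""
    return sum(Fraction(m, 2 ** i) for i, m in recode(bits).items())


if __name__ == "__main__":
    # An arbitrary 40-digit negative multiplier, for illustration only.
    multiplier = [1, 0, 1, 1, 0, 0, 1, 0] * 5
    digits = recode(multiplier)
    print(len(digits), value(multiplier) == recoded_value(multiplier))
```

Every recoded digit depends on three adjacent multiplier digits only, so all twenty selections can be made at once.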
 
 This recoding scheme has the following advantages: 
 
 i) It requires little logic. 
 
 ii) All selections can be made simultaneously; the 
 recoding is not a serial process. 
 
 iii) The multiples used can be obtained from the multiplicand 
 by the trivial processes of complementation and 
 displacement . 
 
 iv) It produces only 20 partial products from the 40-digit
 multiplier.

 v) It applies without alteration to the leftmost digits
 of the multiplier.
 
 Alternative schemes involving a smaller number of partial products,
 each selected from more possibilities, are considered inadvisable in this
 context. If, say, eight or nine multiples are allowed, the number of partial
 products is reduced to 14. The time saving however is small; the adder tree,
 while using 12 rather than 18 adder words, is shortened by only one level
 in seven. Such a recoding scheme would require multiples not obtainable by shifting and
 complementation as above. The generation of these multiples would almost certainly
 require longer than the propagation delay of the one adder tree level saved. Moreover, it
 is not clear that any equipment saving could be made in this way, as the circuits
 required for selection of the partial products involve a considerable amount of equipment,
 which increases linearly with the number of possible multiples.
 
 A two's complement number representation has been assumed. When a negative
 multiple of the multiplicand is selected, the complement of the multiplicand is used,
 and a correction applied to this complement by adding one to its least-significant
 digit. To add this correction directly to the complement in a special adder provided for
 the purpose would be both time-consuming and expensive. Instead, the correction digit
 for some partial product $W_i$, occurring in digital position i + 39, can be appended to
 the right-hand end of the next more significant partial product $W_{i-2}$, thus extending
 this partial product to the right from position i + 37 to i + 39. A slight improvement
 of this method is to so recode the last digit of $W_i$ that the correction bit will occur
 in position i + 38, thus extending $W_{i-2}$ by one digital position rather than two.
 
 Suppose the least-significant digit of the possibly shifted, but as yet
 uncomplemented, multiplicand is x. Instead of setting, for a negative multiple, digit
 i + 39 of $W_i$ equal to $\bar{x}$, and digit i + 39 of $W_{i-2}$ equal to 1, we set
 digit i + 39 of $W_i$ equal to x, and digit i + 38 of $W_{i-2}$ equal to $\bar{x}$.

 Thus, after allowing a possible left displacement of the multiplicand of one
 place for multiples of modulus two, the range of digital positions occupied by significant
 digits of $W_i$ is i - 1 to i + 40.
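The equivalence behind this recoding of the correction digit is a two-case identity, shown in the sketch below. It assumes the reading in which the complemented digit and the correction 1 both sit at position i + 39 in the straightforward scheme, while the improved scheme places x at position i + 39 and the complement of x one place to the left. Weights are expressed in units of the position i + 39, and the function names are hypothetical.

```python
def straightforward(x: int) -> int:
    """Complemented last digit plus a correction 1, both at position i + 39."""
    return (1 - x) + 1


def improved(x: int) -> int:
    """Digit x at position i + 39, complement of x at position i + 38 (weight 2)."""
    return x + 2 * (1 - x)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, straightforward(x), improved(x))
```

Both schemes contribute the same amount for either value of x, but the improved one extends the neighbouring partial product by one position only.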
 
 The correction to $W_1$ cannot be so treated, as it is the most significant
 partial product. Its correction digit, which, by use of the above technique, is made to
 lie in position 39, is instead added to the least-significant partial product $W_{39}$. This
 can be done without loss of time in the following way.

 Digits of $W_{37}$ and $W_{39}$ in positions to the left of and including position
 39 are not fed directly into adder word 17. Instead, they are fed into a short adder
 word section (number 19) in level seven of the adder tree, having adder stages in
 digital positions 36 to 39. This section also receives, in position 39, the $W_1$
 correction digit. The sum and carry outputs of this section cover digital positions 36
 to 39 and 34 to 38 respectively. These, together with positions 34 to 39 of $W_{35}$, enter
 positions 34 to 39 of adder word 17. Since level seven of the adder tree is necessary
 in any case, this additional section does not delay the final result.
 
IV. Dimensions of the Adder Tree 
 
 Having decided upon the formation of the partial products, and
 the general scheme for their addition, one can now fix the length and
 relative significance of the numbers appearing at the inputs to the various
 adder words, and hence the dimensions of the adder words themselves. In
 the following list, the inputs to each adder word are listed with their
 ranges of significant digital positions. Partial product input numbers, as
 modified by the addition of correction bits, are called $W_i$. Sum and carry
 output numbers of adder word j are called $s_j$ and $c_j$. Where the last few
 output digits of an adder are fed directly to the carry propagating adder,
 this is shown by the range of digital positions involved and the word "out."
 In this case, the digits going "out" are not included in the listed sum and
 carry outputs. Where the last few stages of an adder word have two or less
 inputs, and hence do not involve the use of full adder circuits, the range
 of digital positions is shown with the word "void."
 
 Adder
 Word   Inputs                             Outputs      Stages        Remarks

 19     38, 39 of W39                      s: 36-39     36-39 (4)
        36-39 of W37                       c: 34-38
        Correction to W1 at position 39

 18     W3 (2-43)                          s: 1-47      2-43 (42)     44-47 void
        W5 (4-45)                          c: 1-45
        W7 (6-47)

 17     W39 (40-78), s19 (36-39)           s: 34-73     34-75 (42)    74-78 out
        W37 (40-77), c19 (34-38)           c: 32-73                   76-78 void
        W35 (34-75)

 16     W33 (32-73)                        s: 28-73     28-69 (42)    70-73 void
        W31 (30-71)                        c: 26-71
        W29 (28-69)

 15     W27 (26-67)                        s: 21-67     22-63 (42)    64-67 void
        W25 (24-65)                        c: 21-65
        W23 (22-63)

 14     W21 (20-61)                        s: 16-61     16-57 (42)    58-61 void
        W19 (18-59)                        c: 14-59
        W17 (16-57)

 13     W15 (14-55)                        s: 10-55     10-51 (42)    52-55 void
        W13 (12-53)                        c: 8-53
        W11 (10-51)

 12     W9 (8-49)                          s: 0-49      0-45 (46)     46-49 void
        s18 (1-47)                         c: 0-47
        c18 (1-45)

 11     s17 (34-73)                        s: 28-71     28-71 (44)    72, 73 out
        c17 (32-73)                        c: 26-71
        s16 (28-73)

 10     c16 (26-71)                        s: 21-71     21-65 (45)    66-71 void
        s15 (21-67)                        c: 19-67
        c15 (21-65)

 9      s14 (16-61)                        s: 9-61      10-55 (46)    56-61 void
        c14 (14-59)                        c: 9-59
        s13 (10-55)

 8      c13 (8-53)                         s: 0-53      0-47 (48)     48-53 void
        s12 (0-49)                         c: 0-49
        c12 (0-47)

 7      s11 (28-71)                        s: 21-67     21-71 (51)    68-71 out
        c11 (26-71)                        c: 19-67
        s10 (21-71)

 6      c10 (19-67)                        s: 9-67      9-59 (51)     60-67 void
        s9 (9-61)                          c: 7-61
        c9 (9-59)

 5      s8 (0-53)                          s: 0-53      0-40 (41)     41-53 void
        c8 (0-49)                          c: 0-49
        W1 (0-40)

 4      s7 (21-67)                         s: 9-61      9-61 (53)     62-67 out
        c7 (19-67)                         c: 7-61
        s6 (9-67)

 3      c6 (7-61)                          s: 0-61      0-49 (50)     50-61 void
        s5 (0-53)                          c: 0-53
        c5 (0-49)

 2      s4 (9-61)                          s: 0-53      0-61 (62)     54-61 out
        c4 (7-61)                          c: 0-53
        s3 (0-61)

 1      s2 (0-53)                          s: 0-53      0-53 (54)     All digits out
        c2 (0-53)                          c: 0-52
        c3 (0-53)
 
 Total adder steps: 7V7 
 
 
V. Circuitry
 
 The suggested circuits employ diode OR-AND logic and transistor
 inverting amplifiers, run saturated. Figure 2 shows a possible circuit for
 a full-adder stage. It is designed to be fed from, and to feed, exactly
 complementary circuits using npn transistors with their emitters tied to
 -2 volts and collectors caught at +0.5 volts. Either output can drive one
 input of a complementary circuit. Thus, odd-numbered levels of the adder
 tree would employ one polarity of circuit, and even-numbered levels would
 employ the other. The circuit shown generates the complement of the sum
 and carry outputs as normally defined. However, since the logical equations
 for both sum and carry of a full adder are self-dual, the same circuit
 configuration is employed in both varieties of circuit. Another way of looking
 at the polarity of signals is to define the output of a transistor of either
 sort as one if the transistor is on, in which case the circuit shown will
 give true signals at all adder tree levels. Notice that if all inputs to the
 adder tree are zero in this latter convention, all transistors of the tree
 will be cut off. This point will be of importance to the discussion of
 division.
 
 The component values shown guarantee, in an assumed worst-case
 combination of ±3 per cent resistor and power supply variations, base turn-on
 and turn-off currents of about one-twelfth of the maximum collector standing
 current. The use of modern epitaxial transistors of around $1 in cost, and
 of diodes around $.30 cost should, with reasonable care in packaging, give
 a circuit propagation delay of about 3 nsec per transistor, or 60 nsec for
 a full adder.
 
 It is proposed that the partial products be generated using
 OR-AND-NOT circuits of the same general type as used in the full adder. One
 such circuit, and hence one transistor, would be required for each digit of
 each of the twenty partial products, a total of 800 transistors. A circuit
 producing the digit of weight $2^{-(i+j)}$ in partial product $W_i$ would produce the
 function

 \[ (p_{j+1} \lor \overline{+2}) \cdot (p_j \lor \overline{+1}) \cdot (\bar{p}_j \lor \overline{-1}) \cdot (\bar{p}_{j+1} \lor \overline{-2}) \]

 where $p_j$ are the digits of the multiplicand, numbered from 0 in order of
 decreasing significance, and the signals +2, +1, -1 and -2 are the signals
 generated by recoding the multiplier digits as described in Section III to
 give the multiple for $W_i$. The recoded multiplier digit zero can be obtained
 by making signals +1 and -1 true simultaneously. Partial products generated
 for introduction to level seven of the adder tree must differ in polarity
 from those for introduction to levels four or six. Complementary forms of
 the selector circuit would be used for the two polarities.
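The behaviour of such a selector can be sketched as follows. The sketch assumes that the selector function is a product of four OR terms, each pairing one multiplicand digit (true or complemented) with the complement of one selection signal; this is a reading of the garbled printed expression, not a confirmed transcription. The argument names and the function name `selector_digit` are hypothetical.

```python
def selector_digit(p_j: int, p_j1: int,
                   plus2: int, plus1: int, minus1: int, minus2: int) -> int:
    """One digit of a partial product under the assumed product-of-sums form.

    p_j  : multiplicand digit selected for multiples of +1 / -1
    p_j1 : next less significant multiplicand digit, selected for +2 / -2
    The four selection signals are 0/1; zero is selected by raising both
    plus1 and minus1.
    """
    def inv(v: int) -> int:
        return 1 - v

    return ((p_j1 | inv(plus2)) & (p_j | inv(plus1))
            & (inv(p_j) | inv(minus1)) & (inv(p_j1) | inv(minus2)))


if __name__ == "__main__":
    for p_j in (0, 1):
        for p_j1 in (0, 1):
            row = (
                selector_digit(p_j, p_j1, 1, 0, 0, 0),   # +2: shifted digit
                selector_digit(p_j, p_j1, 0, 1, 0, 0),   # +1: digit itself
                selector_digit(p_j, p_j1, 0, 1, 1, 0),   #  0: always zero
                selector_digit(p_j, p_j1, 0, 0, 1, 0),   # -1: complemented digit
                selector_digit(p_j, p_j1, 0, 0, 0, 1),   # -2: complemented shifted digit
            )
            print(p_j, p_j1, row)
```

Raising +1 and -1 together forces the product of a digit and its complement, which is zero, so no separate zero signal is needed.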
 
 Each recoded multiplier digit signal must drive 40 inputs, as
 must each polarity of multiplicand digit. Thus, 160 driver circuits of about
 200 ma. capacity are required. Modern silicon epitaxial transistors can be
 used to make such drivers with delay times of about 30 nsec. However, in
 view of the possibly large spatial fanout of these signals, a delay time
 estimate of 100 nsec might be more realistic.

 The logical conditions for the recoded multiplier digit signals
 would each be generated from an OR-AND-NOT single transistor circuit forming
 the input to the associated driver. Eighty such circuits are needed.
 
 The circuitry required for the 79-digit carry-propagation adder
 will not be discussed. Several designs have appeared in the literature
 capable of performing the carry propagation in a time of the order of 100 nsec.

 It should be noted that of the 79 digital positions, only the
 propagation time over the most significant 54 will be additive to the
 propagation delay through the adder tree.
 
 VI. The Possibilities for Division 
 
 Although a case can be made for the use of a structure of the form
 described solely for the purpose of multiplication, it is of interest to see
 whether it can be used to execute a reasonably rapid division when suitably
 augmented. The author has been unable to discover any very effective method
 for direct division in the multiplier. However, it appears possible to use it
 in a four-step process to obtain the reciprocal of a 40-digit number.
 
 If the multiplier structure is to be used efficiently, advantage 
 must be taken of its ability to sum many numbers simultaneously and rapidly. 
 In normal division processes, the usual direction taken to accelerate the 
 process is to inspect the more significant digits of partial remainder (or 
 dividend) and divisor, and to guess on the basis of these, the next few 
 quotient digits. The product of the guessed digits and the divisor is then 
 formed, possibly using simultaneous addition and recoding of the guessed 
 quotient digits, and subtracted from the partial remainder to give a new 
 
 -Ik- 
 
partial remainder. This method can in principle be carried to whatever extent 
 desired, but the logic required for guessing quotient digits becomes rapidly more 
 complex as the number of digits guessed per step increases. The practical limit 
 is probably not much more than six quotient digits per step. The partial re- 
 mainders may be left in a carry-unassimilated form to save time, but this con- 
 siderably complicates the circuits required for guessing.. In any case, the 
 guesses can never be always correct, so the quotient must normally be developed in ' 
 a redundant form, e.g., as two numbers which must be summed to give the final 
 quotient . Excellent though this method and its variants may be for conventional 
 arithmetic units, it does not seem feasible to the author to extend it to the 
 point where it would make good use of the a.dder tree. 
 
 The proposed method is essentially based on the following iterative division
 process: Given x and y, to divide y by x, set

 \[ a_1 = xp, \qquad b_1 = yp \]

 where p is some approximation to the reciprocal of x, and iterate

 \[ a_{n+1} = a_n (2 - a_n), \qquad b_{n+1} = b_n (2 - a_n) \]

 This process converges quadratically, $a_n$ to one, and $b_n$ to the required quotient.
 That is, the number of correct digits in $b_{n+1}$ is double that in $b_n$. If p is
 sufficiently good an approximation to 1/x that xp differs from one by $2^{-5}$ at most,
 then three of the repetitive steps will give a 40-bit quotient. The part of the
 third step which generates $a_4$ is not needed.

 The values of $(2 - a_n)$ used at each step need not be exact, provided that
 the same value is used to form both $a_{n+1}$ and $b_{n+1}$.
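The quadratic convergence can be seen in a short sketch using exact rational arithmetic. It implements the iteration with exact values of (2 - a_n), not the shortened multipliers introduced later in the report; the function name `iterate_division` and the sample operands are chosen for illustration only.

```python
from fractions import Fraction


def iterate_division(x: Fraction, y: Fraction, p: Fraction, steps: int = 3):
    """Run a_{n+1} = a_n (2 - a_n), b_{n+1} = b_n (2 - a_n) from a_1 = xp, b_1 = yp.

    Returns the list of (a_n, b_n) pairs, starting with (a_1, b_1).
    """
    a, b = x * p, y * p
    history = [(a, b)]
    for _ in range(steps):
        m = 2 - a                 # the same multiplier is applied to a and b
        a, b = a * m, b * m
        history.append((a, b))
    return history


if __name__ == "__main__":
    x, y = Fraction(3, 4), Fraction(1, 2)
    p = Fraction(21, 16)          # xp = 63/64, within 2**-5 of one
    for n, (a, b) in enumerate(iterate_division(x, y, p), start=1):
        print(n, float(abs(1 - a)), float(abs(b - y / x)))
```

Since 1 - a_{n+1} = (1 - a_n)^2 and b_n = (y/x) a_n, an initial error of at most 2^-5 falls to at most 2^-40 after three steps.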
 
VII. The Iterative Division

 For the moment, consider only the part of the iteration involved in
 forming the $a_n$. This part is independent of the formation of the $b_n$. We will assume
 that the divisor, x, is positive and normalized to lie in the range 1/2 ≤ x < 1.
 By inspection of the first seven digits of x following the binary point, the
 approximation p will be generated. If x has the form

 0.1abcdef,

 and p is given the form 1.qrst, then suitable expressions for the digits of p are
 
q = ab v ac 
 
 r = be ^ abc ^ acde ^ bdef 
 
 s = ace v abc v bde v abce v abed ^ abed v abedf v a ,bcef 
 
 t = acd v/ -bed v bde v bede v abc"e v bede v abdf ^ aedf v abed 
 ^ abedf ^ abede v abedf 
 
 These expressions are essentially raw minterm forms and may not be 
 minimal. However, even as they stand, they could be realized quite cheaply 
 and quickly with diode logic. 
 
 The values of p yielded by these expressions are such that, xp always 
 has one of the forms 
 
 0.11111. . . 
 
 or 
 
 1. 00000. . . 
 
 (The digit in position zero should be interpreted with positive weight.) 
 
 The set of p values chosen are not unique in having this property 
 but they appear to require the simplest logical expressions for their 
 generation. 
 
 The first step of the process will consist of forming p, recoding it
 to give three partial products each either -2, -1, 0, +1 or +2 times x, and
 summing these. (It may possibly be advantageous to generate the recoded
 multiplier digits directly from the digits of x.)

 Only digital positions 5 et seq. of the product need be explicitly
 formed.

 The next step of the process should be to take $a_1$ as formed and
 multiply it by $(2 - a_1)$, with the aim of producing as $a_2$ a number with digits
 1 to 10 all the complement of digit zero. This aim can be achieved by using a
 multiplier which approximates $(2 - a_1)$, but which has many fewer significant
 digits.
 
Consider a number of the form of $a_1$, viz.,

 $p.\bar{p}\bar{p}\bar{p}\bar{p}\bar{p}qrstuv\ldots$

 and the following approximation to $2 - a_1$:

 \[ 1 - 2^{-5} (\bar{p}.qrstu) \]

 where the number in brackets is interpreted
 as a signed two's complement fraction, hereafter called d, in the range
 -1 to 1 - 1/32. We may write $a_1$ as $1 + 2^{-5} d + e$, where e is in the range
 $0 \le e < 2^{-10}$. The product of the two numbers will be

 \[ 1 - 2^{-10} d^2 + e (1 - 2^{-5} d) \]

 This number will have a minimum value, when e = 0 and d = -1, of $1 - 2^{-10}$
 and a maximum value, when e is just less than $2^{-10}$ and d = 0, of just less than
 $1 + 2^{-10}$. (Although, with e at its maximum value, differentiation of the
 above expression with respect to d would give a stationary value with d
 slightly negative, in fact the smallest negative allowable value for d, viz.,
 -1/32, is already past the stationary point and yields a value for the product
 of $1 + e - 2^{-10} (2^{-10} - e)$, which is slightly less than the value quoted above.)

 Thus the approximation to $(2 - a_1)$ given above always yields a
 product of the desired form, differing from one by an amount in the range
 $-2^{-10}$ to just under $2^{-10}$.
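The stated bounds on the step-two product can be checked numerically. The sketch below assumes that d takes the 64 values k/32 for k from -32 to 31 (a six-digit signed fraction) and samples e at zero, at mid-range, and just below 2^-10; the sample points and the function names are choices made for this illustration.

```python
from fractions import Fraction


def step_two_product(d: Fraction, e: Fraction) -> Fraction:
    """Product of a_1 = 1 + d/32 + e with the shortened multiplier 1 - d/32."""
    a1 = 1 + d / 32 + e
    multiplier = 1 - d / 32
    return a1 * multiplier


def bounds_hold() -> bool:
    """Check 1 - 2**-10 <= product < 1 + 2**-10 over all d and sampled e."""
    low = 1 - Fraction(1, 2 ** 10)
    high = 1 + Fraction(1, 2 ** 10)
    e_values = [Fraction(0), Fraction(1, 2 ** 11),
                Fraction(1, 2 ** 10) - Fraction(1, 2 ** 30)]
    for k in range(-32, 32):
        d = Fraction(k, 32)
        for e in e_values:
            if not (low <= step_two_product(d, e) < high):
                return False
    return True


if __name__ == "__main__":
    print(bounds_hold())
    print(step_two_product(Fraction(-1), Fraction(0)))   # the minimum, 1 - 2**-10
```

The minimum is attained exactly at d = -1 with e = 0, where the product equals 1023/1024.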
 
 Thus this multiplier, which can be recoded to give four partial
 products, is used in the second step of the iteration to give $a_2$.

 Similarly, the number formed by adding one to $2^{-10}$ times the
 signed two's-complement fraction represented by digits 10 to 20 of $a_2$ is an
 adequate multiplier for use in step three. It may be recoded to yield seven
 partial products, and will give an $a_3$ whose digits 1 to 20 will all differ
 from its digit zero.

 The multiplier for step four will be one plus $2^{-20}$ times the signed
 two's-complement fraction represented by digits 20 to 40 of $a_3$. However, no
 $a_4$ will be generated.
 
 In forming $b_4$, the final answer, we could start with the dividend
 y and multiply it successively by the four multipliers used in the four steps.
 This would require four multiplication times in addition to the three needed
 to form $a_1$, $a_2$ and $a_3$. If, however, we instead generate the reciprocal of x,
 then y = 1 and $b_1$ is simply p, which is available. As will be shown below, it
 is possible, because of the fact that $b_1$ is then a number of only five digits,
 to obtain $b_2$ at the same time as $a_2$ and $b_3$ at the same time as $a_3$. Thus $b_4$,
 the reciprocal, can be obtained in four multiplication times, and a true division
 can be done in five multiplication times, as opposed to seven. Also, the
 reciprocal yielded as a byproduct may often be useful to the programmer.
 
 VIII. Detailed of Reciprocal Generator 
 
Assume the existence of a register R having digital positions R_i,
where i = 0, 1, 2, .... Initially, the positive normalized number x occupies
R_0 to R_39. The leading digits of R are decoded to give p and hence a recoded multiplier
of the form s0q0r, where s, q and r take values +2, +1, 0, -1, -2. Three
partial products are formed by selector circuits other than those normally used
to produce W_i. These selectors introduce their output words into the adder
tree at the points labelled on Figure 1 as B, C and D. If the normally
used multiplier input is made zero during this and subsequent steps, all
transistors of the adder tree above the levels at which numbers are specifically
introduced will be off, so that the collectors of the selector circuits
introducing numbers at these points may simply be commoned to the adder-tree
collectors that normally supply these points. We will have
 
at B: (R_{2-39}) s in positions 0-37 (a two-place left displacement)

at C: (R_{1-39}) q in positions 1-39 (no displacement)

at D: (R_{1-39}) r in positions 3-41 (a two-place right displacement)
 
In generating these partial products, correction bits are applied in the
usual way. That for the entry at B can be introduced into the appropriate
digital position at point A of Figure 1. (If x is positive, s will never
in fact be negative. However, a simple way to produce reciprocals of
negative numbers is to use a negative multiplier in step one.)
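
The way a recoded multiplier with digits in the range -2 to +2 selects a few displaced multiples of the multiplicand can be sketched in Python. The sketch uses the familiar radix-4 Booth recoding as a stand-in; it is an assumption that this matches the recoder of this report, and the function names are hypothetical. A six-bit multiplier yields three digits, each selecting 0, plus or minus one, or plus or minus two times the multiplicand, with successive digits displaced by two places.

```python
def booth_radix4(value, nbits):
    """Recode a two's-complement integer into radix-4 digits in -2..+2.

    Digits are returned least-significant first. nbits must be even and
    value must satisfy -2**(nbits-1) <= value < 2**(nbits-1).
    """
    if nbits % 2 or not -(1 << (nbits - 1)) <= value < (1 << (nbits - 1)):
        raise ValueError("value does not fit in an even number of bits")
    pattern = value & ((1 << nbits) - 1)     # two's-complement bit pattern
    digits, prev = [], 0
    for i in range(0, nbits, 2):
        low = (pattern >> i) & 1
        high = (pattern >> (i + 1)) & 1
        digits.append(prev + low - 2 * high)  # overlapping-triplet rule
        prev = high
    return digits

def multiply_by_selection(multiplicand, multiplier, nbits):
    """Sum one selected multiple (0, +-1x, +-2x) per radix-4 digit."""
    total = 0
    for k, digit in enumerate(booth_radix4(multiplier, nbits)):
        total += (digit * multiplicand) << (2 * k)  # two-place displacement
    return total

print(booth_radix4(27, 6))
print(multiply_by_selection(1234, 27, 6))
```

Since every digit is at most two in magnitude, each partial product is obtained by selection, a one-place shift and possibly complementing, with no addition required before the adder tree.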
 
If the output of the adder tree is now gated back without shifting
into R, we will have in R_{0-41} digits 2-43 of a_1, of which digits 2-5, i.e.,
R_{0-3}, will be identical. At the same time we gate digits 0 to 4 of the
multiplier p just used into R_{51-55}. This completes step one.
 
In the second step, R_{3-8}, i.e., digits 5 to 10 of a_1, are decoded
to give a recoded multiplier of the form 1.00000c0d0e, giving four partial
products introduced at the adder-tree points A, B, C and D.
 
 
at A: R_{8-41} in positions 0-33, R_{52-55} in positions 44-47
      (an eight-place left displacement)
at B: (R_{2-41}) c in positions 0-39, (R_{51-55}) c in positions 49-53
      (two-place left shift)
at C: (R_{0-41}) d in positions 0-41, (R_{51-55}) d in positions 51-55
      (no displacement)
at D: (R_{0-41}) e in positions 2-43, (R_{51-55}) e in positions 53-57
      (two-place right shift)
 
Note that the entries to B, C and D are displaced in the same way as for
step one. Thus the same selector inputs may be used. Carries from digital
position 44 to position 43 of adder words 1 and 2 of the adder tree are
inhibited, to give effectively independent multiplication of a_1 and b_1
by the same multiplier. The b_1 part of the entries at B, C and D must be
augmented by sign digit copies as far left as position 44.
 
Gating the tree output back into R gives digits 10 to 53 of a_2 in
R_{0-43}, and digits 1-14 of b_2 in R_{44-57}. Digit 0 of b_2 is known to be one,
and need not be formed. It can be thought of as occupying an additional
flipflop at position 43 of the b_2 field for the discussion of later steps.
 
In the third step, digits 10-20 of a_2 in R_{0-10} are decoded to give
a multiplier of the form

1.000000000f0g0h0i0j0k

giving seven partial products, which must be formed in specially-provided
selectors and introduced into adder tree points E, F, G, H, I, J and K.
 
at E: R_{10-43} in positions 0-33, R_{43-57} in positions 25-39

at F: R_{0-33} in positions 0-33, R_{43-57} in positions 35-49

at G: R_{0-31} in positions 2-33, R_{43-57} in positions 37-51

at H: R_{0-29} in positions 4-33, R_{43-57} in positions 39-53

at I: R_{0-27} in positions 6-33, R_{43-57} in positions 41-55

at J: R_{0-25} in positions 8-33, R_{43-57} in positions 43-57

at K: R_{0-23} in positions 10-33, R_{43-57} in positions 45-59
 
 
Carry from position 34 to position 33 is inhibited in all relevant adder words. As
described above, only digits 10 onwards of b_3 are formed. However, if, in the carry
propagating adder, an additional 10-digit section is added to the left of position
34, this section and the stage in position 34 can receive digits 0-9 of b_2, i.e.,
R_{43-52}, and any carry (which may be negative) into position 34 generated in the adder
tree. Since, in this step, positions of the double-length carry propagation adder
to the right of position 59 are not used, ten of these could be switched to form the
required additional section.
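
The placement of the seven step-three entries follows a regular pattern, which the Python sketch below tabulates. It rests on several assumptions read from the description above rather than confirmed by the report: the multiplier digits for points E to K stand at places 0, 10, 12, ..., 20; the a field is cut off at position 33; and the b field (R_{43-57}) lands in positions 35-49 for the entry at F. The function name is hypothetical.

```python
def step_three_layout():
    """Return {point: ((a_lo, a_hi), (b_lo, b_hi))} of adder positions."""
    places = dict(zip("EFGHIJK", (0, 10, 12, 14, 16, 18, 20)))
    layout = {}
    for point, place in places.items():
        shift = place - 10                  # right displacement relative to F
        src_lo = max(0, -shift)             # first register digit that fits
        src_hi = min(43, 33 - shift)        # a field is cut off at position 33
        a_field = (src_lo + shift, src_hi + shift)
        b_field = (35 + shift, 49 + shift)  # R_{43-57}, assumed 35-49 at F
        layout[point] = (a_field, b_field)
    return layout

for point, (a_field, b_field) in step_three_layout().items():
    print(point, a_field, b_field)
```

Under these assumptions the a fields all end at position 33 and the b fields begin no further left than position 25, so inhibiting the carry from position 34 to position 33 keeps the two products separate.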
 
The adder output thus contains digits 20 to 53 of a_3 in positions 0 to 33,
and digits 0 to 34 of b_3 in positions 25-59. In this third step some truncation of
a_3 has occurred. However, digits of a_3 are retained as far right as digit 53, and
only digits 20 to 40 are required in step four. Thus the truncation error introduced
is very small.

Digits 20 to 40 of a_3 are gated into R_{0-20}, and digits 0-34 of b_3 are gated
into R_{25-59}.
 
In the fourth and final step, R_{0-20} are decoded to give a recoded multiplier
specifying twelve partial products. These could be introduced into level five
of the adder tree, but only at considerable expense. The multiplier is therefore used
to control a standard multiplication step, using little or no additional equipment,
to produce the reciprocal, b_4. Thus the formation of a reciprocal requires 15 adder
word delays, four carry propagating addition delays, of which only the last is of the
normal length, and four recoding and selection delays.
 
If the range of digital positions of the numbers involved in each step is
examined and compared with the list of adder word dimensions, it will be found that
all numbers will fit into the adder word dimensions already prescribed, with the
exception of the partial products input in step 3 at points I, J and K of the adder
tree. To accommodate these, adder word 4 must be extended by three digital positions
at its left-hand end. The reciprocal-forming process also requires an additional
eleven words of selector circuits, with drivers.
 
IX. A Square-Root Process
 
An iterative procedure for generating the reciprocal of the square root
of a number, very similar to that used above for obtaining the reciprocal, is: given
a number x, and an approximation p to the reciprocal of its root, set a_0 = x p^2, b_0 = p
and iterate

a_{n+1} = a_n [1 + (1 - a_n)/2]^2,    b_{n+1} = b_n [1 + (1 - a_n)/2]
 
 
The process converges quadratically, a_n to 1, and b_n to the required
reciprocal root. If p and p^2 are provided by inspection of x, eight multiplications
are required to obtain b_3 which, if x p^2 differs from one by less than 1/32, will have
about 40 correct digits. It is possible that the eight multiplications could be done
in six steps, by a partitioning of the adder tree similar to that suggested above.
Even if this were not so, it would still be quite a rapid process. The true square
root can of course be obtained from its reciprocal by multiplication by x.
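
The multiplication count and accuracy quoted above can be checked with a short Python sketch. It assumes that each iteration multiplies b_n once, and a_n twice, by the factor 1 + (1 - a_n)/2 (the Newton form of the reciprocal-root iteration). This factor, the function name and the use of exact rational arithmetic are assumptions made for illustration, not details confirmed by the report.

```python
from fractions import Fraction

def reciprocal_root_sketch(x, p, steps=3):
    """Return (b, multiplications) after `steps` iterations.

    x is the operand, p an approximation to 1/sqrt(x); p*p is treated
    as given, as the text supposes, so forming a_0 costs one product.
    """
    a = x * (p * p)          # a_0 = x p^2 : one multiplication
    b = p                    # b_0 = p
    mults = 1
    for n in range(steps):
        m = 1 + (1 - a) / 2  # needs only complementing and a shift
        b = b * m            # one multiplication
        mults += 1
        if n < steps - 1:    # the final a is never needed
            a = a * m * m    # two multiplications
            mults += 2
    return b, mults

x = Fraction(1, 2)
p = Fraction(45, 32)         # x p^2 differs from one by less than 1/32
b, mults = reciprocal_root_sketch(x, p)
print(mults, float(b), float(abs(x * b * b - 1)))
```

The invariant a_n = x b_n^2 holds throughout, so the error in a_n measures how far b_n is from the reciprocal root; under the stated assumption it is roughly squared at each iteration.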
 
X. Speed and Cost
 
For the limit of a large number of digits per word, most conventional multiplier
structures have a product of equipment cost and multiplication time which varies
as the square of the number of digits. In the present structure the equipment cost
varies as the square of the number of digits, and the multiplication time varies as the
logarithm of the number of digits. Thus, the present structure has a cost-time product
increasing more rapidly with increasing word length than that of conventional structures.
It is therefore less efficient in the long-word limit. However, it is not necessarily as
inefficient as one might suppose for the proposed word length of 40 bits, particularly
in the context of existing transistor technology. The apparent logarithmically
increasing inefficiency is a reflection of the fact that, while the multiplication time
depends upon the propagation delay of signals passing through the logarithmically
increasing number of adder tree levels, each logical element of the structure is used
only once during the multiplication process. Thus, if one defines the useful duty
cycle of a logical element as the ratio between its propagation delay and the period
between meaningful and distinct uses of its output, this duty cycle is in the present
structure logarithmically decreasing with word length. However, the duty cycle in the
case of 40 digits is about 1/15, which is not greatly below the upper limit set by the
characteristics of the circuits used. Typical transistor circuits having propagation
delays of 15 to 30 nsec are very difficult to operate at repetition rates above 10 mc,
especially when allowance is made for the reliable distribution time of clock and
gating signals, and for the settling time of flip flops.
 
 The equipment requirements of this structure are approximately as follows: 
 
 
For multiplication

Circuit type            Transistors   Diodes       Number     Total         Total
                        (per unit)    (per unit)   of units   transistors   diodes

Full adders                  3           18           750        2,250       13,500
Selectors                    1           13           840          840       10,920
Recoders                     4          ~10            80          320          800
Multiplicand drivers         4            3            80          320          240

Totals for multiplication:                                       3,730       25,460

Additional requirements for division:

Recoders                     4           10            15           60          150
Multiplicand drivers         4            3            60          240          180
Selectors                    1           13           561          561        7,293

Grand totals:                                                    4,591       33,083
 
 Not included in the above estimates are the carry propagating adder, 
 the registers necessary to hold operands and results, and the control circuitry. 
 This equipment would almost certainly be present in the computer arithmetic 
 unit for addition-subtraction, and should not be charged specifically to the 
multiplier-divider. That is, the totals above represent the additional equipment
required for the proposed multiply-divide scheme over and above that necessary
for even the most primitive parallel arithmetic unit. The additional equipment
is perhaps ten per cent of the semiconductor complement of a modern, large-scale
computer, but would almost certainly represent much less than ten per cent
of the cost of a large computer.
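
The component totals can be reproduced from the per-unit figures with a few lines of Python. The dictionary and function names are hypothetical, and the approximate figure of ten diodes per recoder is taken as exactly ten.

```python
# (transistors per unit, diodes per unit, number of units), as tabulated
MULTIPLICATION = {
    "full adders": (3, 18, 750),
    "selectors": (1, 13, 840),
    "recoders": (4, 10, 80),
    "multiplicand drivers": (4, 3, 80),
}
DIVISION_EXTRA = {
    "recoders": (4, 10, 15),
    "multiplicand drivers": (4, 3, 60),
    "selectors": (1, 13, 561),
}

def totals(table):
    """Return (total transistors, total diodes) for one equipment table."""
    transistors = sum(t * n for t, _, n in table.values())
    diodes = sum(d * n for _, d, n in table.values())
    return transistors, diodes

mult_t, mult_d = totals(MULTIPLICATION)
div_t, div_d = totals(DIVISION_EXTRA)
print(mult_t, mult_d)                    # totals for multiplication
print(mult_t + div_t, mult_d + div_d)    # grand totals
```

On these figures, division adds roughly a quarter to the transistor count and a little under a third to the diode count of the multiplier alone.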
 
To estimate the time required for a multiplication, it is assumed that

i) The propagation delay per transistor is 30 nsec; the delay
per adder tree level accordingly is 60 nsec.

ii) The propagation delay of the high-current drivers is 100 nsec.

iii) The settling time of the carry-propagating adder is 100 nsec.

iv) The result will be gated into a register with a gating time
of 100 nsec.
 
 
On this basis, the multiplication time becomes 750 nsec. This 
 should be a fairly conservative estimate, the circuit delays being those 
 obtainable at reasonable cost in 1962 using readily available components. 
 
 On the same basis, the time required for the generation of a 
 reciprocal, excluding the pre-normalization time, is 2220 nsec. The time for 
a full division is therefore about 3 µsec.
 
XI. A Simpler Version
 
If a 40 x 40 bit multiplication is performed in two steps, the adder
tree technique described above can be used in a simpler form. In the first
step, the 22 least-significant multiplier digits would be recoded to yield 11
partial products. A carry can arise in the recoding process, to be incorporated
in the recoding of the remaining multiplier digits in the second step. An adder
tree of five levels containing nine adder words is used to reduce the 11
partial products to a sum word and a carry word having digits in positions
17 to 78 (approximately). Of these words, digits in positions 57 to 78 are
final, and can be added in the carry-propagating adder to give digits of the
final product. The remaining digits of both words, together with the nine
partial products formed by recoding digits 0-17 of the multiplier, are summed
in the same tree, with the output words added in the carry-propagating adder to
yield the rest of the product.
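
The figures of five levels and nine adder words for eleven summands can be reproduced by a simple count, sketched below in Python. The sketch assumes that every level uses as many three-input adder words as possible, each replacing three words by a sum word and a carry word, and passes any leftover words on unchanged; the function name is hypothetical and the actual wiring of the tree may differ.

```python
def reduce_to_two(words):
    """Count (levels, adder words) needed to reduce `words` summands to 2."""
    levels = adders = 0
    while words > 2:
        used = words // 3     # adder words at this level
        words -= used         # each adder: three words in, two out
        adders += used
        levels += 1
    return levels, adders

print(reduce_to_two(11))      # (5, 9): five levels, nine adder words
```

In the second step the nine new partial products together with the two words carried over again make eleven summands, so the same count applies.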
 
The equipment cost of this scheme would be about half that of the
one described above, and, making the same assumptions as above, the multiplication
time would be about 1.2 µsec. The time for reciprocal formation
remains unchanged, since by increasing the number of adders in the tree to
ten, the tree can be made capable of all operations required by the process
described. However, the time required for a division is increased by the
increase in the multiplication time to about 3.5 µsec. Such a scheme might
well be attractive in some circumstances.
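
The division times quoted for the two versions follow from adding one multiplication time to the reciprocal-generation time. The short Python sketch below reproduces the arithmetic from the figures given in the text; the names are hypothetical.

```python
RECIPROCAL_NSEC = 2220            # reciprocal generation, as stated in the text

def division_time_nsec(multiplication_nsec):
    """A division is one reciprocal followed by one multiplication."""
    return RECIPROCAL_NSEC + multiplication_nsec

print(division_time_nsec(750))    # one-step multiplier: 2970 nsec, about 3 usec
print(division_time_nsec(1200))   # two-step multiplier: 3420 nsec, about 3.5 usec
```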
 
XII. Conclusion
 
A method of performing multiplications has been described using a
large amount of equipment to produce the product in a one-step combinatorial
manner. Although in principle rather inefficient, this process is reasonably
well matched to the characteristics of saturating diode-transistor circuitry,
and the considerable increase it could yield in the overall speed of a large
 
 
computer might well justify its cost. A four-step method for obtaining
reciprocals can be employed using essentially the same equipment to give
a fairly rapid substitute for division. A perhaps slightly more efficient
scheme employing a little more than half as much equipment can multiply
in a little less than twice the time required by the more expensive scheme,
using two steps. This cheaper version is as fast as the more expensive
when generating reciprocals.
 