Report No. 230
COO-1018-1115

THE THEORY AND IMPLEMENTATION OF SRT DIVISION*

by Daniel E. Atkins III

June 1, 1967

Department of Computer Science
University of Illinois
Urbana, Illinois 61801

*This work was submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering, June 1967, and was supported in part by the AEC under Contract No. USAEC AT(11-1)1018.

Digitized by the Internet Archive in 2013: http://archive.org/details/theoryimplementa230atki

ACKNOWLEDGEMENT

I wish to thank Professor S. R. Ray for his most helpful advice and assistance in the preparation of this report. I also thank Professor J. E. Robertson for the enlightening discussions concerning the material in Chapter 2. I further acknowledge and thank Mr. Richard Borovec for his discussions concerning the cost determinations (Section 2.6), Mrs. L. A. Prendergast and Mr. Ronald C. Morrison for the drawings, and Mrs. Anita Worthington for the typing of the final draft.

TABLE OF CONTENTS

1. INTRODUCTION
2. THE THEORY OF SRT DIVISION
   2.0 Introduction
   2.1 The Recursive Relationship
   2.2 The Representation of Quotient Digits
   2.3 Range Restrictions
   2.4 Redundancy in the Quotient Representation
   2.5 The P-D Plot
   2.6 The Cost of Quotient Digit Selection
       2.6.1 General
       2.6.2 Cost Determination for an Arithmetic Model
       2.6.3 Cost Determination for a Table Look-Up Model
   2.7 Quotient Conversion
3. IMPLEMENTATION OF SRT DIVISION
   3.0 Introduction
   3.1 General Considerations for Implementation
       3.1.1 Relative Occurrence of Division
       3.1.2 Acceleration of Division
       3.1.3 Compatibility of Division with the Multiplication Scheme
   3.2 A High-Speed Multiplication Scheme
       3.2.1 Notation
       3.2.2 Description and Operation
   3.3 Design of Division Scheme
       3.3.1 General
       3.3.2 An Arithmetic Model
       3.3.3 A Table Look-Up Model
   3.4 Estimate of Speed of Execution
4. SUMMARY AND CONCLUSION
   4.1 Summary
   4.2 Conclusion
LIST OF REFERENCES

1. INTRODUCTION

Perhaps the major complication associated with digital division is best illustrated by your performing the following long-division problem and noting carefully the steps you follow.

396 A 1057 6 2 1 A 1 A 2 A 3 (A = decimal point marker)

Your operations in selecting the first quotient digit are summarized in the flow chart, Figure 1. The salient point is that division is a trial and error process requiring an initial "guess" of a quotient digit followed by a subtraction, or at least a comparison, to determine whether the guess is correct. If it is not, the initial choice is modified and the process repeated. It is the trial and error nature of division, whether performed by man or machine, which complicates its execution.

In building a computer arithmetic unit, division is the most difficult basic operation to implement efficiently. But despite the complexity, the literature is replete with themes and variations for implementing digital division. Flores, for example, states four methods for increasing speed of division and then proceeds to describe no less than twenty-four schemes which incorporate some or all of these speed-up techniques.
MacSorley [2] describes four division techniques demanding various divisor multiples to accelerate execution.

* Numbers in brackets refer to the corresponding entry under References.

[Figure 1. Flowchart of manual execution of division. Legend: j = index, d = divisor, $p_j$ = partial remainder, $p_0$ = dividend, $q_j$ = quotient digit.]

There is far less in the literature, however, describing theory and analytic tools to be used in designing a division scheme. Most of the articles describe schemes which are products more of art than of science. This report is an attempt to contribute to the science of computer arithmetic implementation.

This report describes a class of division techniques especially suited for implementation in an electronic digital computer. For historic reasons, this class will be referred to as SRT division. The name is derived from the fact that the binary case of this type of division was discovered independently, at about the same time, by Dura Sweeney of IBM, J. E. Robertson of the University of Illinois, and T. D. Tocher of Imperial College, London [3]. The paper, however, incorporates more recent work, due exclusively to Professor Robertson, which extends the binary SRT division to a radix higher than two. Much of Chapter 2 is based upon his report [5] and upon numerous personal communications.

After a description of the theory and properties of SRT division, the report turns to the problem of actually implementing the scheme and presents an example of one possible realization.

2. THE THEORY OF SRT DIVISION

2.0 Introduction

This chapter introduces a recursive relationship for describing division and from it develops the nature of SRT division. The discussion is augmented with two graphical representations: one to determine the range restrictions associated with SRT, and the other to aid in computing the "cost" of quotient digit selection.

Most of the following analysis will be developed for a general radix, r.
At first this generality may appear superfluous, for after all, isn't a digital computer a binary machine, and doesn't binary imply radix two? It is true that the basic storage elements of a digital computer are two-state devices and that numbers are represented internally by strings of "1's" and "0's". Computer arithmetic, however, is often facilitated by considering groups of bits rather than each bit individually. Such grouping may be interpreted as use of digits of higher radix than two. For example, a pair of bits becomes one radix four digit; a trio of bits, a radix eight (octal) digit.

In the literature of arithmetic unit design, one finds references to such techniques as inspection of bits "two at a time," or perhaps "generation of several quotient bits simultaneously". In this report such techniques would be described in terms of higher radix arithmetic.

2.1 The Recursive Relationship

Digital division as implemented in an electronic computer consists of preliminary operations, i.e., normalization, a recursive process, and a terminal operation, i.e., changing the form of the remainder. Although preliminary and terminal operations vary from machine to machine, they generally consume much less of the execution time than the recursive operations. For restoring, non-restoring, and the SRT division scheme to be described in this report, this recursive relationship is defined by

    $p_{j+1} = r p_j - q_{j+1} d$    (2.1.1)

where the symbols are defined as follows:

    j = the recursive index = 0, 1, ..., m-1
    $p_j$ = the partial remainder used in the jth cycle
    $p_0$ = the dividend
    $p_m$ = the remainder
    $q_j$ = the jth quotient digit, in which the quotient is of the form $q_0 . q_1 q_2 \ldots q_m$ (the radix point follows $q_0$)
    m = the number of digits, radix r, in the quotient
    d = the divisor
    r = the radix

This relationship and the symbols as defined will be used throughout this report. The relationship is used specifically in the development of range restrictions on the partial remainders in Section 2.3.
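The recursion (2.1.1) can be exercised directly. The following is a minimal sketch, not part of the original report: it assumes the conventional (restoring) digit set 0, 1, ..., r-1 and a dividend satisfying 0 <= p_0 < d, and the function name `restoring_divide` is hypothetical. Exact rational arithmetic is used so that the identity $p_0 = d \sum_j q_j r^{-j} + p_m r^{-m}$ can be checked without rounding error.

```python
from fractions import Fraction

def restoring_divide(p0, d, r=2, m=8):
    """Apply p[j+1] = r*p[j] - q[j+1]*d for m cycles.

    Illustrative assumption: digits are restricted to 0..r-1 and
    0 <= p0 < d, as in manual or restoring division.
    Returns the digits q_1..q_m and the final partial remainder p_m.
    """
    p, d = Fraction(p0), Fraction(d)
    if not (0 <= p < d):
        raise ValueError("this sketch requires 0 <= p0 < d")
    digits = []
    for _ in range(m):
        shifted = r * p          # r * p_j: shift left one digit position
        q = shifted // d         # largest digit keeping the remainder non-negative
        p = shifted - q * d      # p_{j+1}
        digits.append(int(q))
    return digits, p

digits, remainder = restoring_divide(Fraction(1, 4), Fraction(3, 4), r=2, m=8)
```

The same loop serves for any radix; only the rule for choosing each digit changes between restoring, non-restoring, and SRT division.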
Although not germane to the theory of SRT division, it is interesting to note in passing that this relation points to possibilities for accelerating the execution of division. Verbally, the equation says that each partial remainder must be multiplied by the radix ($r p_j$), i.e. shifted left one digital position, and that the selected quotient digit must then be multiplied by the divisor ($q_{j+1} d$) and subtracted from this shifted partial remainder. The division process will thus be accelerated if the shift and/or the subtraction time is decreased. In practice, all values of $q_{j+1} d$ are stored in registers or are readily available via shift gates from the register containing the divisor. The rapid formation of $q_{j+1} d$ thus reduces to minimizing the necessity for forming awkward multiples requiring an addition, and to accelerating the selection of $q_{j+1} d$ at the divisor input to the adder/subtractor.

Secondly, note that the recursive index, j, is implicitly an inverse function of the radix. When actually implemented on a machine, digits of a higher radix than two are represented by two or more binary bits. A string of $\ell$ binary digits (bits) is equivalent to $\ell/2$ radix four digits. In general, to $\ell$ bits of radix two there correspond $m = \ell / \log_2 r$ digits of radix r, where for practical cases $r = 2^n$, n an integer > 0. Thus to produce a quotient of given precision, the number of iterations required and, concomitantly, the execution time is decreased as the radix is increased.

2.2 The Representation of Quotient Digits

As noted in the last section, the use of a higher radix reduces the number of cycles required to perform a division of given precision. The implementation of such a scheme may, however, be costly, and costlier still if quotient digits are represented as they are in manual methods or machine restoring division. In these cases quotient digits have the values 0, 1, 2, ..., r-1.
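The dependence of the cycle count on the radix, $m = \ell / \log_2 r$, can be tabulated with a few lines of code. This is only an illustrative sketch: the function name `radix_digits` is hypothetical, and rounding up when the bit count is not a multiple of $\log_2 r$ is an assumption not stated in the text.

```python
def radix_digits(bits, radix):
    """Number of radix-r digits (division cycles) covering `bits` binary digits.

    Implements m = bits / log2(radix) for radix = 2**exponent; the result is
    rounded up when the division is not exact (an assumption of this sketch).
    """
    exponent = radix.bit_length() - 1          # log2(radix) when radix is a power of two
    if radix < 2 or (1 << exponent) != radix:
        raise ValueError("radix must be a power of two, at least 2")
    return -(-bits // exponent)                # integer ceiling division

# Cycles needed for a 48-bit quotient at several radices.
cycles = {r: radix_digits(48, r) for r in (2, 4, 8, 16)}
```

Doubling the number of bits retired per cycle halves the number of cycles, which is the motivation for the higher-radix schemes discussed here.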
With the radix, r, equal to four, the possible digit values are 0, 1, 2, and 3. A radix four restoring division therefore requires that multiples of 1, 2, and 3 times the divisor be available for subtraction from the partial remainder. The 1 times multiple is of course readily available, and the 2 times multiple is formed merely by shifting left one binary position; the 3 times multiple, however, requires extra time and/or hardware. It may be formed by a tripler circuit or by addition of 1 times and 2 times the divisor, which is then stored in an auxiliary register. For radix eight, multiples of 3, 5, and 7 times the divisor must be computed and stored.

With SRT division the problem of forming divisor multiples is mitigated by using both plus and minus quotient digit values. The quotient digits are of the form -n, -(n-1), ..., -1, 0, 1, ..., n, where n is an integer such that $\frac{1}{2}(r-1) \le n \le r-1$. Within this range the actual choice of n for a given r is largely a function of design details. The choice is considered further in Section 2.6.

The necessity for the range restriction is as follows. At least r unique digits are required to represent a number, radix r. In the representation introduced above, there are 2n+1 unique digits, thus the requirement $2n+1 \ge r$. On the other hand, for radix r, the maximum value of a quotient digit, n, should not be greater than the value of the maximum digit representable, thus $n \le r-1$. Combining these two inequalities yields the restriction stated above.

With plus and minus quotient digits, a higher radix division may be implemented with fewer awkward multiples of the divisor. Now the quotient digits for a radix 4 division are -2, -1, 0, +1, +2. All the necessary multiples of the divisor may be formed by shifting and complementation and require no auxiliary registers.

The second, but probably more significant, consequence of this representation of quotient digits is that it introduces redundancy into the representation of the quotient.
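The bounds on n and the notion of an "awkward" multiple can be illustrated with a short sketch. The helper names below are hypothetical, and the definition used for an awkward multiple (an odd factor greater than 1, so that the multiple cannot be obtained from the divisor by shifting and complementation alone) is an assumption chosen to agree with the examples given in the text.

```python
def signed_digit_set(r, n):
    """Digits -n..n, after checking (r-1)/2 <= n <= r-1."""
    if not ((r - 1) / 2 <= n <= r - 1):
        raise ValueError("need (r-1)/2 <= n <= r-1")
    return list(range(-n, n + 1))

def odd_part(x):
    """Strip factors of two (left shifts) from a non-negative integer."""
    while x and x % 2 == 0:
        x //= 2
    return x

def awkward_multiples(digits):
    """Odd divisor multiples above 1 that a shift or complement cannot supply."""
    return sorted({odd_part(abs(q)) for q in digits} - {0, 1})

conventional_radix4 = awkward_multiples(range(4))           # digits 0..3
conventional_radix8 = awkward_multiples(range(8))           # digits 0..7
srt_radix4 = awkward_multiples(signed_digit_set(4, 2))      # digits -2..2
```

Under these assumptions the conventional radix four and radix eight digit sets require the 3 times, and the 3, 5 and 7 times multiples respectively, while the signed set -2..2 requires none.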
If $2n > r-1$, then there are more symbols available to represent a number than actually necessary. Some numerical values may therefore be represented in more than one form. For example, with r = 4, n = 2, and with an overbar representing negation, the number 6 could be represented as $12$ or as $2\bar{2}$. As explained in the next sections, this redundancy permits less precision in comparing the divisor and partial remainder in selecting a quotient digit. This statement seems intuitively correct since, without redundancy, each quotient digit may be represented only one way and thus must be selected precisely. With redundancy, the quotient digit, and thus the comparison of divisor and partial remainder, need not be precise. This non-unique representation does, however, complicate the division in that the redundant form must eventually be converted to a conventional representation.

2.3 Range Restrictions

With the quotient representation now defined, consider the derivation of range restrictions on the partial remainders. Recall from the manual execution of a division that in determining whether a quotient digit is correct or not, one is essentially applying the restriction that $0 \le p_{j+1} < d$, where $p_{j+1}$ is the result of the subtraction of $q_{j+1}$ times the divisor from the jth partial remainder. If $p_{j+1}$ is not within this range then $q_{j+1}$ is changed until it is. For non-restoring division, negative partial remainders and negative quotient digits are allowable, thus the range restriction is $|p_{j+1}| \le |d|$. It seems reasonable, therefore, to hypothesize other division techniques for which $|p_{j+1}| \le k|d|$, and which utilize the quotient digit representation introduced in the last section. The upper limit on k will be 1. The lower limit, although not yet obvious, is 1/2, thus $1/2 \le k \le 1$.

To show that this is in fact the case, first reconsider the recursive relationship described in Section 2.1 and restated below.

    $p_{j+1} = r p_j - q_{j+1} d$    (2.3.1)

After $p_{j+1}$ is formed on the jth cycle, it is multiplied by the radix r (shifted left); j is increased by one and it becomes $r p_j$ of the present cycle. Since $|p_{j+1}| \le k|d|$, it follows that $p_j$ must obey the same restriction, i.e.

    $r|p_j| \le rk|d|$    (2.3.2)

Substituting 2.3.1 into 2.3.2 yields

    $-kd \le r p_j - q_{j+1} d \le kd$    (2.3.3)

At this point the divisor is assumed to be normalized, i.e., restricted to the range $1/2 \le d < 1$. Furthermore, (2.3.1) is normalized with respect to the divisor and rewritten letting $z_j = p_j/d$ and $z_{j+1} = p_{j+1}/d$:

    $z_{j+1} = r z_j - q_{j+1}$    (2.3.4)

Equation (2.3.4) may be interpreted graphically as a plot of $z_{j+1}$ versus $r z_j$ with the quotient digit, $q_{j+1}$, as a parameter. Such a representation shall be called a z-z plot. Recall that the quotient digits assume values -n, -(n-1), ..., -1, 0, +1, ..., n. Figure 2 is such a graph. To facilitate discussion, each plot corresponding to a different quotient digit is called a q-line.

The goal of this section is to demonstrate that a correct division procedure exists which incorporates the above range restrictions and quotient representation. This existence is substantiated if for each value of $r z_j$ in the allowed range there corresponds a quotient digit and a $z_{j+1}$ also in their allowed ranges. In terms of Figure 2, this means that for any point on the $r z_j$ axis such that $-rk \le r z_j \le rk$, one must be able to move on a line segment normal to the $r z_j$ axis and intersect a q-line at a point corresponding to a $z_{j+1}$ within the range $-k \le z_{j+1} \le k$. This allowed range is enclosed between the lines $z_{j+1} = k$ and $z_{j+1} = -k$ in Figure 2.

[Figure 2. The z-z plot.]

To satisfy the foregoing requirements, the maximum value of $r z_j$, i.e. rk, must occur at the intersection of $z_{j+1} = k$ and the q-line $z_{j+1} = r z_j - n$. Similarly, the minimum value must occur at the intersection of $z_{j+1} = -k$ and the q-line $z_{j+1} = r z_j + n$. These bounds on $r z_j$ are indicated by the dashed vertical lines of Figure 2.

Figure 2 now points to the value of k in terms of r and n. At the upper right vertex of the bounding rectangle, $z_{j+1} = k = r z_j - n$. But since $r z_j = rk$,

    $k = \frac{n}{r-1}$    (2.3.5)

The division is now characterized by tangible parameters, namely the radix and the maximum value of quotient digits. Combining (2.3.5) with the restriction on n, $\frac{r-1}{2} \le n \le r-1$, verifies the statement at the beginning of this section, $1/2 \le k \le 1$.

2.4 Redundancy in the Quotient Representation

Section 2.2 indicated that the quotient digit representation of SRT division introduces redundancy into the quotient. This fact is also manifested in Figure 2 in the regions on the $r z_j$ axis for which either one of two q-lines may be legitimately selected. For example, at point A one may move vertically upward to the $q_{j+1} = 0$ line or downward to the $q_{j+1} = +1$ line. In either case the quotient digit is correct. Figure 3, a specific case of Figure 2, testifies to the fact that this freedom of choice is not merely the result of an inaccurately drawn graph. Here r = 4, n = 2. The vertical dashed lines define the overlap regions.

[Figure 3. The z-z plot for r = 4, n = 2.]

The production of a redundant quotient requires extra hardware, and perhaps time, to convert it to a conventional binary representation acceptable by programmers and other sections of a machine. This conversion is discussed at greater length in Section 2.7. The conclusion of this section is that the positive consequences of a freedom in quotient digit selection overshadow the cost of conversion. With no redundancy, the divisor and the shifted partial remainder must be compared (usually by subtraction) to the full precision defined for the machine. With redundancy, the designer is at liberty to inspect fewer bits of the divisor and shifted partial remainder than define full precision.
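The existence argument of Section 2.3 and the freedom of choice described in Section 2.4 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the range is sampled on a finite grid (so it is a spot check, not a proof), and exact rationals stand in for the normalized quantities $r z_j$ and k.

```python
from fractions import Fraction

def range_constant(r, n):
    """k = n / (r - 1), equation (2.3.5)."""
    return Fraction(n, r - 1)

def valid_digits(rz, r, n):
    """Digits q for which z_next = rz - q stays within -k <= z_next <= k."""
    k = range_constant(r, n)
    return [q for q in range(-n, n + 1) if abs(rz - q) <= k]

def check_existence(r, n, steps=200):
    """Sample -rk <= rz <= rk and confirm that some digit is always valid."""
    k = range_constant(r, n)
    return all(valid_digits(r * k * Fraction(t, steps), r, n)
               for t in range(-steps, steps + 1))

overlap_choice = valid_digits(Fraction(3, 2), 4, 2)   # a point inside an overlap region
unique_choice = valid_digits(Fraction(0), 4, 2)       # a point with a single valid digit
```

With r = 4 and n = 2 every sampled point has at least one valid digit, and points inside an overlap region have two. Choosing n below (r-1)/2, for instance r = 4 and n = 1, leaves gaps on the $r z_j$ axis, which is the numerical counterpart of the lower limit k = 1/2.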
Handling fewer bits may save time and hardware; these ramifications are explored further in the chapter concerning implementation. In Figure 3, for example, a correct quotient digit is selected knowing $r z_j = r p_j / d$ to a precision only great enough to contain it within an overlap region. Exactly what precision is required for a given value of r and n is the subject of the next section.

In terms of z-z plots such as Figures 2 and 3, the redundancy is proportional to the width of the overlap regions. The width of this region in terms of n and r is found as follows. Consider two adjacent lines of Figure 2, i.e., $z_{j+1} = r z_j - i$ and $z'_{j+1} = r z_j - (i-1)$. The overlap, $\Delta r z_j$, is the difference between $r z_j$ for $z'_{j+1} = \frac{n}{r-1}$ and $r z_j$ for $z_{j+1} = -\frac{n}{r-1}$. Solving for this difference yields $\Delta r z_j = \frac{2n}{r-1} - 1$. The ratio $\frac{n}{r-1}$ is therefore a measure of redundancy.

As redundancy (width of overlap region) is increased, the required precision of inspection of divisor and partial remainder, and thus hopefully the execution time, is decreased. It therefore appears that for a given r, n should be as large as possible, i.e., n should equal r-1. Such a choice may not be practical, however, since n = h requires the ability to form h multiples of the divisor. The choice of n is therefore bound up in the usual trade-off between time and hardware.

2.5 The P-D Plot

Now consider another graphical representation of the division procedure. This construction, suggested by C. V. Freiman of the IBM Corporation [5], is useful in further describing SRT division and in computing the required precision of inspection of the divisor and shifted partial remainder.
The basis for the plot is the recursive relationship

    $p_{j+1} = r p_j - q_{j+1} d$    (2.1.1)

as described in Section 2.1, together with the range restriction

    $|p_{j+1}| \le \frac{n}{r-1} |d|$

developed in Section 2.3. The figure is thus essentially a plot of partial remainder versus divisor values and therefore in this report shall be referred to as a P-D plot.

Solving the recursive relationship for $r p_j$ yields

    $r p_j = p_{j+1} + q_{j+1} d$    (2.5.1)

For a fixed quotient digit, the upper limit of $r p_j$ as a function of the divisor, d, occurs when $p_{j+1}$ is maximum, i.e. when $p_{j+1} = \frac{n}{r-1} d$, thus

    $r p_{j\,max} = \left( \frac{n}{r-1} + q_{j+1} \right) d$    (2.5.2)

Likewise, the lower limit occurs with $p_{j+1} = -\frac{n}{r-1} d$, thus

    $r p_{j\,min} = \left( -\frac{n}{r-1} + q_{j+1} \right) d$    (2.5.3)

These linear equations may be plotted as functions of d with $q_{j+1}$ as a parameter ranging from -n to +n in steps of 1. The area between $r p_{j\,max}$ and $r p_{j\,min}$ for a given $q_{j+1} = i$ will be denoted the q(i) area.

The division procedure is now determined. A given value of divisor, d, and the jth shifted partial remainder will specify a point in a q(i) area. The digit i will be the value of the next quotient digit $q_{j+1}$, which in turn is used in forming the next partial remainder.

In this representation the redundancy is manifested as overlapping of the q(i) regions, i.e. some pairs of d and $r p_j$ will specify a point for which either $q_{j+1} = i$ or $q_{j+1} = i-1$ is a valid choice.

Figure 4 is an example of a P-D plot for a division with r = 4, n = 2. The equations for the lines plotted, 2', 2, etc., are given in Table 1. The region for which $q_{j+1} = 2$ is a valid choice, i.e. the q(2) area, is between lines 2' and 2; the q(1) area is between lines 1' and 1, and so forth. Note the overlap between q(i) areas, for example, the region between lines 1' and 2 in which either the choice $q_{j+1} = 1$ or $q_{j+1} = 2$ is correct. Note further that the figure is symmetric about both axes.
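The procedure just described (locate the point $(d, r p_j)$ in a q(i) area, take i as the next digit, form the next partial remainder) can be simulated at full precision. The following sketch is illustrative and is not the limited-precision selection developed later in the report: the function names are hypothetical, a positive divisor is assumed, and where two areas overlap the digit of smaller magnitude is chosen arbitrarily.

```python
from fractions import Fraction

def q_area_bounds(i, d, r=4, n=2):
    """(rp_min, rp_max) of the q(i) area at divisor d, eqs. (2.5.3) and (2.5.2)."""
    k = Fraction(n, r - 1)
    return (i - k) * d, (i + k) * d

def select_digit(rp, d, r=4, n=2):
    """Return a digit whose q(i) area contains the point (d, rp).

    In an overlap region either digit is valid; this sketch arbitrarily
    prefers the digit of smaller magnitude.
    """
    for i in sorted(range(-n, n + 1), key=abs):
        lo, hi = q_area_bounds(i, d, r, n)
        if lo <= rp <= hi:
            return i
    raise ValueError("point lies outside every q(i) area")

def srt_divide(p0, d, m, r=4, n=2):
    """Run m cycles of p[j+1] = r*p[j] - q[j+1]*d with signed digits.

    Assumes d > 0 and |p0| <= n/(r-1) * d.
    """
    p, d = Fraction(p0), Fraction(d)
    digits = []
    for _ in range(m):
        rp = r * p
        q = select_digit(rp, d, r, n)
        p = rp - q * d
        digits.append(q)
    return digits, p

def quotient_value(digits, r=4):
    """Numeric value of the signed-digit fraction 0.q1 q2 ... qm."""
    return sum(Fraction(q, r ** (j + 1)) for j, q in enumerate(digits))

digits, remainder = srt_divide(Fraction(1, 4), Fraction(3, 4), m=6)
```

Because the q(i) areas cover the whole allowed range of $r p_j$, the selection never fails for an in-range dividend, and each new partial remainder again satisfies $|p_{j+1}| \le \frac{n}{r-1} d$.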
On the right half of Figure 4 (the same may be done on the left), "steps" have been drawn within the overlap of the q(i) regions. The width of a "tread" (constant $r p_j$, d varying) defines a divisor interval, the value of $r p_j$ for each tread defines a comparison constant, and the distance between comparison constants defines a partial remainder interval.

Phrased in this terminology, division consists of locating a given divisor value within the appropriate divisor interval, locating the shifted partial remainder within the appropriate interval (using comparison constants), and selecting a value of q enclosed by the intersection of the boundaries of these intervals. Since a divisor and partial remainder must be located only to within an interval, they need not be inspected to full precision in selecting a correct quotient digit. Here is where the redundancy pays dividends.

[Figure 4. P-D plot with r = 4, n = 2.]

Table 1. Equations Defining the Regions of Figure 4, from $r p_j = \left( \pm\frac{n}{r-1} + q_{j+1} \right) d$ with r = 4, n = 2.

    q_{j+1}      p_{j+1}      Equation r p_j =    Designation in Figure 4
    2            2/3 d        8/3 d               2'
    2            -2/3 d       4/3 d               2
    1            2/3 d        5/3 d               1'
    1            -2/3 d       1/3 d               1
    0            2/3 d        2/3 d               0'
    0            -2/3 d       -2/3 d              0
    $\bar{1}$    2/3 d        -1/3 d              $\bar{1}'$
    $\bar{1}$    -2/3 d       -5/3 d              $\bar{1}$
    $\bar{2}$    2/3 d        -4/3 d              $\bar{2}'$
    $\bar{2}$    -2/3 d       -8/3 d              $\bar{2}$

Techniques for selecting divisor intervals and comparison constants are detailed in the next two sections. At this point, however, we shall make several general observations. First, as we shall soon discover, the comparison constants are compared with the high order $N_p$ bits of the shifted partial remainder and, similarly, the end points of the divisor intervals are compared with the $N_d$ high order bits of the divisor. The comparison constants and end points of the divisor intervals should therefore be numbers which are representable with $N_p$ and $N_d$ bits, respectively. The choices illustrated in Figure 4, which maximized the width of the divisor intervals, do not meet this requirement.

In Figure 5, however, more practical choices are shown.
The dashed lines represent the theoretical choices used in Figure 4. Now, although the number of steps has been increased, the boundaries fall at points easily representable in binary notation. Note that inspection of 4 bits plus sign of the partial remainder and divisor is sufficient to locate the correct choice of quotient digit.

The second observation is that the choice of divisor intervals and comparison constants is bound up with the required precision of inspection of the partial remainder and divisor; if, for example, the divisor interval widths are increased, the required precision of divisor inspection (number of bits) may be decreased. Furthermore, the maximum precision of inspection of the divisor is determined by the divisor interval of smallest width. By inspection of Figure 5, the reader might guess where this step is, but we shall now locate it analytically. The result of this derivation will be useful in the next sections.

The length of a divisor interval is limited by the boundaries of the overlap region. The maximum precision of inspection is required where the divisor interval is minimum. To determine where this minimum divisor interval occurs, consider the detail of the overlap of the q(i) and q(i-1) regions shown in Figure 6.

[Figure 5. Divisor intervals and comparison constants with r = 4, n = 2.]

[Figure 6. Detail of a P-D plot overlap region. Upper bound: $r p_j = \left[ \frac{n}{r-1} + i - 1 \right] d$; lower bound: $r p_j = \left[ -\frac{n}{r-1} + i \right] d$.]

For a given value of $r p_j$, the maximum width of a divisor interval is [...]

[...] $\rho_i$, which is defined as the ratio of the slope of the lower bound of an overlap region to the slope of the upper bound. This ratio is a relative measure of the width of the divisor interval for which a single comparison constant is valid. From Figure 6, it appears that $\rho_i$ (the selection ratio between q = i and q = i-1) is

    $\rho_i = \frac{(r-1)i - n}{(r-1)(i-1) + n}$

The difficulty of selection is proportional to $\rho$ and, as indicated earlier, is most difficult for i = n.

The selection ratio may be used to compute another parameter, the minimum number of divisor intervals necessary to span a given overlap region. If [...], then this minimum is the smallest integer [...] such that [...]

[...] dividend and a 13 bit divisor. The results of this limited precision division (eight bits) are returned to the full precision mechanism as part of the full precision quotient and are used in forming the next full precision partial remainder. Note that the number defining full precision may be changed in discrete steps by changing the number of "calls" to the model division. Furthermore, the model division scheme may be quite different from that of the full precision division.

For purposes of computing costs of quotient selection, we shall consider two classes of model division procedures. The first will be those involving the use of an auxiliary arithmetic unit and employing addition and/or subtraction in forming the quotient digits. Examples of schemes in this class include a radix four SRT division performed in the exponent arithmetic unit, or the procedure suggested by Wallace [9], which is logically equivalent to forming the approximate reciprocal of the divisor and multiplying by the partial remainder. This class will be referred to as arithmetic models.

The second class consists of those methods which are the logical equivalent of a table look-up. This technique may be viewed as the direct implementation of a P-D plot, i.e., decoding the divisor interval and the partial remainder interval and producing the quotient digit indicated by their intersection. This class will be referred to as table look-up models.

Before considering these two types of model in further detail, let us state more precisely the conditions which must be obtained in the choice of model division and precision of inspection.
Let

    m = the number of bits to the right of the radix point of divisor and dividend.
    $\hat{rp}_j$ = the truncated version of the shifted partial remainder.
    $\epsilon$ = the number of bits to the right of the radix point in $\hat{rp}_j$.
    $\Delta p = \pm(2^{-\epsilon} - 2^{-m}) \approx \pm 2^{-\epsilon}$, the uncertainty in $\hat{rp}_j$.
    $\hat{d}$ = the truncated version of the divisor.
    $\delta$ = the number of bits to the right of the radix point in $\hat{d}$.
    $\Delta d = \pm(2^{-\delta} - 2^{-m}) \approx \pm 2^{-\delta}$, the uncertainty in $\hat{d}$.

The following cost criterion summarizes the requirements on the quotient selection mechanism, $\Delta d$ and $\Delta p$.

Cost criterion: Given the approximations $\hat{rp}_j \pm \Delta p$ and $\hat{d} \pm \Delta d$, the integer result of $\hat{rp}_j / \hat{d} = i$ performed in the model must be such that on the appropriate P-D plot, the rectangle defined by $(\hat{d} \pm \Delta d,\ \hat{rp}_j \pm \Delta p)$ is entirely within the q(i) region.

2.6.2 Cost Determination for an Arithmetic Model

We first consider the determination of the cost for a division using an arithmetic model. In this case $\hat{rp}_j$ and $\hat{d}$ are presented to a limited precision arithmetic unit and the division carried out to produce a rounded integer quotient. If the bit position to the right of the radix point in the model is "1", the integer portion is increased by one and truncated; otherwise the result is merely truncated. This rounding is necessary if the cost criterion is to hold for an arithmetic model.

Equation 2.5.4 indicated that maximum precision is required in the overlap of the q(n) and q(n-1) regions in the vicinity of d = 1/2. The precision determined here will be sufficient for any other region of the P-D plot. Figure 7 is a detail of this region.

Two additional factors must now be considered: a redundantly represented partial remainder and a negative divisor. As illustrated in the next chapter, a division scheme which meshes well with multiplication must cope with redundantly represented partial remainders.
One consequence of the representation is that the truncation error ($\Delta p$) attributable to considering only a few higher order bits of the partial remainder may be either positive or negative. When a negative (2's complement) divisor is permitted, truncation error may also be negative.

In the divisor interval $1/2 \pm \Delta d$, the dividing line between the selection of q = n and q = n-1 is $r p_j = \frac{1}{2}(n - \frac{1}{2})$, since $r p_j / d = 2 \times \frac{1}{2}(n - \frac{1}{2}) = n - \frac{1}{2}$, which must be rounded to n. For the cost criterion to hold, the rectangle $(1/2 \pm \Delta d,\ \frac{1}{2}(n - \frac{1}{2}) \pm \Delta p)$ must not extend below the bottom of the overlap region defined by $r p_j = (n - 2/3) d$. Such a rectangle is indicated by the dashed lines in Figure 7. Since this rectangle is not unique, there is some available trade-off between $\Delta p$ and $\Delta d$. To achieve more quantitative results, we now limit the analysis to a special but useful case: that in which the radix is of the form $r = 2^{2k}$, where k is a positive (non-zero) integer.

[Figure 7. Cost calculation from P-D plot. Lines shown: $r p_j = (n - 1/3) d$, $r p_j = (n - 2/3) d$, $r p_j = \frac{1}{2}(n - \frac{1}{2})$.]

A division with $r = 2^{2k}$ may be implemented with a cascade of k adder/subtractors with multiples of 1 times and 2 times the divisor available to the first stage of the cascade, 4 times and 8 times to the second, and so forth through $2^{2k-2}$ times and $2^{2k-1}$ times available to the kth stage. In this case, n, the largest multiple of the divisor which may be formed, is the sum of the largest multiple which may be formed at each stage in the cascade, i.e. $n = 2 + 8 + \ldots + 2^{2k-1}$. Furthermore, the sum of this geometric series gives $\frac{n}{r-1} = 2/3$. Thus we shall consider the case $r = 2^{2k}$, $n = \frac{2}{3}(r-1)$.

For practical implementation, the rectangular region defined horizontally by $\Delta d$ and vertically by $\Delta p$ will be symmetric about d = 1/2 and $r p_j = \frac{1}{2}(n - \frac{1}{2})$. Referring to Figure 7, note that $\Delta d$ must be smaller than the smaller of $\Delta d_{1\,max}$ and $\Delta d_{2\,max}$. The following demonstrates that $\Delta d_{1\,max} < \Delta d_{2\,max}$.

    $\Delta d_{2\,max} = \frac{1}{2} \left( \frac{n - 1/2}{n - 2/3} - 1 \right)$
    $\Delta d_{1\,max} = \frac{1}{2} \left( -\frac{n - 1/2}{n - 1/3} + 1 \right)$    (2.6.1)

    $\Delta d_{1\,max} - \Delta d_{2\,max} = 1 - \frac{n^2 - n + 1/4}{n^2 - n + 2/9}$    (2.6.2)

Since

    $\frac{n^2 - n + 1/4}{n^2 - n + 2/9} > 1$,

    $\Delta d_{1\,max} - \Delta d_{2\,max} < 0$
    $\Delta d_{1\,max} < \Delta d_{2\,max}$    (2.6.3)

Thus choosing $\Delta d \le \Delta d_{1\,max}$ will insure that the rectangle will fit horizontally.

Similarly

    $\Delta p_1 = (n - 1/3) d_1 - \frac{1}{2}(n - \frac{1}{2})$    (2.6.4)
    $\Delta p_2 = -(n - 2/3) d_2 + \frac{1}{2}(n - \frac{1}{2})$

    $\Delta p_1 - \Delta p_2 = (n - 1/3) d_1 + (n - 2/3) d_2 - (n - 1/2)$    (2.6.5)

Let

    $d_1 = 1/2 - \Delta d$
    $d_2 = 1/2 + \Delta d$    (2.6.6)

Substituting (2.6.6) into (2.6.5) yields

    $\Delta p_1 - \Delta p_2 = -\frac{\Delta d}{3}$

thus

    $\Delta p_1 < \Delta p_2$    (2.6.7)

As implied earlier, if we are certain that $r p_j = \frac{1}{2}(n - \frac{1}{2})$ will produce the quotient selection q = n, then $\Delta p \le \Delta p_2$ will be sufficient. If we cannot guarantee this, then $\Delta p \le \Delta p_1$ must hold. We shall adopt the latter, more cautious approach. If we selected the former, then the (n - 1/3) term in equation 2.6.13 would be replaced by (n - 2/3). The results in Table 2, however, will be the same.

Recalling that $\Delta d = 2^{-\delta}$, we want

    $2^{-\delta} \le \Delta d_{1\,max}$    (2.6.8)

which from 2.6.1 becomes $2^{-\delta}$ [...]

[...] The sample implementation presented in the next chapter incorporates this approach.

3. IMPLEMENTATION OF SRT DIVISION

3.0 Introduction

Armed with the theory and techniques unfolded in the last chapter, now consider an example implementation of SRT division. This example is not presented as a detailed construction proposal, but is rather intended to contribute the following:

1. A description of several fairly general considerations for implementing digital division and of how SRT division meshes within these considerations.
2. An elaboration, in a rather concrete way, of the concept of limited precision modeling.
3. A notion as to the hardware demands and operation time of functional blocks required in implementing SRT division.
Throughout this chapter, it is assumed that the designer has already made the decisions as to the speed of the electronic components he will use, and that now he is attempting to organize these components into a faster, more efficient system.

3.1 General Considerations for Implementation

Chapter 2 introduced a class of division techniques which appear especially suited for implementation in a digital machine. Having accepted this premise and having decided to tackle SRT division, the designer is still faced with many decisions and dirty design details. These details are strongly related to the structure of the allied parts of the arithmetic unit and to such real life questions as available logic, speed demands, available packaging space, and to a large extent to the price the designer is willing to pay for a high-speed divide. A thorough exploration into these factors is well beyond the scope of this paper; however, there are several more general guidelines which may apply.

3.1.1 Relative Occurrence of Division

The first guideline emerges from the observation that division is usually the least frequently executed of the basic arithmetic operations: add, subtract, multiply, and divide. The designers of the IBM STRETCH computer [6] estimated that on an average, out of 16 operations of a general purpose computer, the relative occurrence by operation type is as follows:

    1 division
    3 multiplications
    6 additions
    6 control transfers

These figures indicate that the designer should pay more to accelerate multiplication than division: that in a conflict between accelerating multiplication and division, the former should be the victor.

3.1.2 Acceleration of Division

With decreasing hardware costs, increasing packaging density, and demands for still faster arithmetic units, the first guideline may not be as significant as it was in the days of STRETCH. Today the designer will probably aim both for very high-speed multiply and divide.
The design question is not merely how to implement division, but rather, how to implement high-speed division, or yet more specifically, high-speed SRT division. The next guidelines, therefore, relate to organizational factors affecting the speed of execution of division. Of course, in selecting the SRT method, the designer has already seized upon the possibility of accelerating execution by decreasing the precision and thus reducing the time required in selecting a quotient digit. There are, however, other possibilities beyond this fundamental decision. As mentioned in Section 2.1, the recursive relationship points directly to four possibilities for accelerating division. A fifth, obvious, but important factor is added here. These possibilities are as follows:

1. Decrease the time for forming $rp_j$, i.e. the left shift time.
2. Decrease the selection time for multiples of the divisor at the divisor input to the adder/subtractor.
3. Decrease the add/subtract time.
4. Increase the radix and thus decrease the number of cycles required to generate a quotient of specified precision.
5. Decrease the time for selecting a quotient digit, i.e. for comparing the divisor and shifted partial remainder.

The first of these is essentially the problem of minimizing the number of logic stage delays required to transfer and shift the contents of the secondary rank of the accumulator back to the primary rank. Similarly, the second item relates primarily to minimizing control delay in operating a shift gate once a quotient digit is selected.

In approaching the third factor of this list, decreasing the add/subtract time, the designer is likely to turn to a carry/borrow save type unit which eliminates propagation until a terminal step [7]. This is a standard technique in implementing multiplication, but must be approached cautiously for the case of division.
The necessity for caution arises from the fact that such schemes actually introduce redundancy into the representation of a sum or difference and thus, for division, produce a redundant partial remainder. As mentioned in Section 2.5.2, redundancy in the partial remainder complicates the quotient selection and, for a practical scheme, requires that at least part of the partial remainder be converted to conventional form after each pass through the subtractor(s).

Increasing the radix, although it does decrease the number of cycles required, also carries with it some disadvantages. For a fixed $n$ (the upper limit of a quotient digit) an increase of $r$ decreases the redundancy $n/(r-1)$ and thus requires either greater precision in selecting quotient digits, or an increase of $n$. As noted earlier, an increase in the value of $n$ demands the availability of more multiples of the divisor and thus more hardware.

The fifth factor is explored further in Section 3.3 with reference to the selection of the model division. Note that the question of minimizing control set-up time is largely beyond the scope of this paper. It is, however, a very real and related problem to be faced in accelerating an arithmetic process. There is little efficiency in building a system which operates faster than control signals can service it.

3.1.3 Compatibility of Division with the Multiplication Scheme

According to the STRETCH statistics mentioned in Section 3.1.1, multiplications occur half as often as additions. Multiplication, however, is usually executed as a series of considerably more than two additions and thus requires the use of acceleration techniques if the speed of multiplication and addition are to be compatible.
These techniques essentially reduce to the first four of those mentioned in Section 3.1.2 with the word "divisor" replaced by "multiplicand", "left shift" replaced by "right shift", and "quotient" by "product." Thus, at least to a first approximation, acceleration of multiplication and division are compatible. A high-speed arithmetic unit usually includes a substantial investment in hardware to accelerate the execution of multiplication. Hopefully, much of this investment may also be used for division.

With this in mind and accepting the premise that acceleration of division should place second to accelerated multiplication, we adopt the following strategy: design a high-speed multiplication scheme, then embed division within it. Although not the ideal, it is, in fact, a practical strategy which has been used in arithmetic unit design. In a sense, this guideline summarizes the guidelines mentioned in both of the previous sections.

3.2 A High-Speed Multiplication Scheme

Having adopted the design strategy "multiply then divide", we must now propose a high-speed multiplication scheme with which we hope to mesh division. The description of the scheme will necessarily be at the block diagram level and will by no means be fully justified. Also, details such as overflow and handling of the exponent will not be discussed. The scheme, however, has been studied and, in fact, simulated by the author. It is similar to that proposed for implementation in the Illinois Pattern Recognition Computer (Illiac III). The number format to be handled by this device is assumed to be an 8 byte (8 bits per byte) normalized floating point number with 1 byte of exponent and 7 bytes of mantissa. Figure 9 is a simplified block diagram of the proposed unit.

[Figure 9. Simplified block diagram of the proposed arithmetic unit.]

3.2.1 Notation

The conventions used in Figure 9 are as follows:

1. Flipflop registers are denoted by rectangles with the horizontal subdivisions indicating bytes. For example, the M register (M REG) is 7 bytes (56 bits) long.
2. Groups of combinatorial logic are shown in circles or rectangles with rounded corners. Any gating is represented in terms of AND ($\cdot$), OR ($\vee$), and EXCLUSIVE OR ($\oplus$).
3. The widest lines indicate a bus for data in SD format (2 bits per digit, see Section 3.2.2), the next widest for numbers in conventional notation (1 bit per digit).
4. Gating signal names are of the form $F_1 F_2\, X\, T_1 T_2$ where:
   a. $F_1$ and $F_2$ ($F_2$ is optional) are the names of the registers from which data is transferred.
   b. X = D if the transfer is direct, i.e. not shifted.
      X = Rn if the data is shifted n places to the right during the transfer.
      X = Ln if the data is shifted n places to the left during the transfer.
   c. $T_1$ and $T_2$ ($T_2$ is optional) are the names of the registers to which data is transferred from $F_1$ and $F_2$ respectively.
   d. The concatenation of register names starting with the same letter such as UM and US is further abbreviated as UMS.
5. Examples of gating signal names:
   a. VDM - Gate the data on the V-Bus directly into the M-Register.
   b. ML7Y1 - Gate the contents of the M-Register shifted left seven positions into the Y input of signed-digit subtractor S1.
   c. UHQDLHQ is equivalent to the two names UHDLH and UQDLQ.
6. The label TO MD or FROM MD indicates connections to the Model Division to be described in Section 3.3.3.

3.2.2 Description and Operation

As mentioned earlier, multiplication is substantially accelerated by the use of an adder or adders which eliminates carry propagation until a terminal step. The "adder" proposed for this model, S1-S4, is actually a signed-digit subtractor (SDS): it incorporates facilities for postponing borrow propagation. Actually, the device performs both addition and subtraction under control of the "NEG" signal. We shall digress a moment for a brief description of this device.
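The gating signal naming rule of Section 3.2.1 is regular enough to decode mechanically. The following Python sketch is only an illustration of that rule, not part of the proposed hardware: the function names are hypothetical, and the expansion rule (a three-letter group stands for two registers sharing the first letter) is an assumption inferred from the examples UMS and UHQ.

```python
import re

# Assumed pattern, following Section 3.2.1:
#   <source registers> <D | Ln | Rn> <destination registers>
_SIGNAL = re.compile(r"([A-Z]+?)(D|[LR]\d+)([A-Z]+\d?)")


def expand(names):
    """Expand an abbreviated register group, e.g. 'UHQ' -> ['UH', 'UQ'].

    Assumption: three letters denote two registers sharing the first
    letter; anything else ('M', 'UH', 'Y1') is a single register.
    """
    if len(names) == 3 and names.isalpha():
        return [names[0] + names[1], names[0] + names[2]]
    return [names]


def parse_gating_signal(signal):
    """Return a list of (source, shift, destination) transfers."""
    match = _SIGNAL.fullmatch(signal)
    if match is None:
        raise ValueError(f"not a gating signal name: {signal!r}")
    sources, shift, destinations = match.groups()
    src, dst = expand(sources), expand(destinations)
    if len(src) != len(dst):
        raise ValueError(f"unpaired registers in {signal!r}")
    return [(s, shift, d) for s, d in zip(src, dst)]


if __name__ == "__main__":
    for name in ("VDM", "ML7Y1", "UHQDLHQ", "LSML8USM"):
        print(name, parse_gating_signal(name))
```

The lazy source group works here only because none of the register names in Figure 9 contains the letter D or an L or R followed by a digit; a different register set would need an explicit list of names.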
Each stage of the signed-digit subtractor (SDS), as shown in Figure 10, is a 3-input, 2-output device together with an interstage connection and a "NEG" control line. $Y_i$ is a bit of the subtrahend (minuend - subtrahend = remainder) in conventional binary form. $S_i$ and $X_i$ together comprise the minuend in a redundant notation which will be called SD format. Each digit of the minuend is of the form $S_i X_i$, where $X_i$ is interpreted as a magnitude, 1 or 0, and $S_i$ as a sign, 0 = +, 1 = -. The SD format digits are therefore represented as follows:

    S_i   X_i   DIGITAL VALUE
     0     0        +0
     0     1        +1
     1     0        -0
     1     1        -1

[Figure 10. Stage of a Signed-Digit Subtractor. Signals of stage i:
    S_i          sign of minuend digit
    X_i          magnitude of minuend digit
    Y_i          subtrahend in conventional binary form
    T_i          sign of difference digit
    Z_i          magnitude of difference digit
    NEG          control to complement T_i (NEG = 0: T_i not complemented; NEG = 1: T_i complemented)
    C_i, C_{i-1} interstage interconnection, but not a propagating borrow/carry
The logic equations of the stage are not legible in this copy.]

The output of the subtractor is in this same format, i.e. $Z_i$ is the magnitude of the digit, $T_i$ is the sign. $C_i$ and $C_{i-1}$ are interstage connections and, as may be seen from the logic equations, are not propagating borrows. Another advantage of SD format is that a number may be negated merely by complementing the sign (S) bits.

Note that the postponing of borrow propagation is achieved only at the expense of introducing redundancy into the representation of the result. Actually two registers, for example US and UM, are required to store a number in this redundant form. We must also pay the price of conversion, or assimilation, to conventional form. This assimilation actually requires a borrow propagation and one additional subtraction.
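As a small illustration of the SD format just described, the following Python sketch evaluates a number held as separate sign and magnitude bit lists (as US and UM would hold it) and negates it by complementing the sign bits. The function names and the choice of weight 1/2 for the first digit are assumptions made for the example only.

```python
from fractions import Fraction

# One SD-format digit is a (sign, magnitude) pair of bits:
# S = 0 means +, S = 1 means -, and X is the magnitude (0 or 1).


def sd_value(sign_bits, mag_bits):
    """Value of an SD-format fraction; index 0 is assumed to weigh 1/2."""
    total = Fraction(0)
    for i, (s, x) in enumerate(zip(sign_bits, mag_bits)):
        digit = -x if s else x
        total += Fraction(digit, 2 ** (i + 1))
    return total


def sd_negate(sign_bits, mag_bits):
    """Negate by complementing every sign bit; magnitudes are untouched."""
    return [1 - s for s in sign_bits], list(mag_bits)


# Redundancy: the same value +1/4 written two ways.
a = ([0, 0], [0, 1])   # digits  0, +1
b = ([0, 1], [1, 1])   # digits +1, -1

if __name__ == "__main__":
    print(sd_value(*a), sd_value(*b), sd_value(*sd_negate(*b)))
```

Both digit strings evaluate to the same number, which is the redundancy the text refers to, and negation never touches the magnitude register.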
The propagation is accelerated by use of look-ahead techniques, but is still rather time-consuming and expensive. The propagation occurs in the propagation logic, the output of which is then applied to the Y input of S4 to produce the assimilated result. The propagation logic forms the outputs

$$B_{i-1} = B_i \overline{Z_i} \vee T_i Z_i$$

and S4 is used to produce the assimilated result with bits

$$A_i = Z_i \oplus B_i$$

The SDS is described in more detail in reference [ ].

In the proposed scheme, four of the signed-digit subtractors are cascaded to provide multiplication, radix 256, i.e. 8 bits of the multiplier are used simultaneously. The multiplicand is loaded from the V-BUS into M, the multiplier into UQ. The low order byte of UQ drives recoding logic which couples to the control lines in the shift array. This recoding, suggested by Wallace, requires plus and minus multiples of 128, 64, 32, 16, 8, 4, 2, and 1 times the multiplicand. The multiples are formed by the shift array; the signs by the NEG controls, i.e. by adding or subtracting the multiple. The MDY1 input is used only for an ADD or SUBTRACT instruction, not for MULTIPLY. After passing through the SDS cascade, the contents of LS-LM and LH-LQ (partial product and multiplier) are shifted right 8 bits back into the US-UM and UQ Registers. This continues for 8 cycles; the 9th is an assimilation cycle. Here the product in SD format is applied to the propagation logic, the output of the propagation logic to S4, and consequently converted to conventional representation.

Admittedly the scheme just outlined is expensive and in many cases may not be justified. The designer may wish to choose a similar scheme but with fewer levels of cascade, i.e. smaller radix. Although the division scheme to be designed is built upon this radix 256 multiplication scheme, the techniques and procedures should be easily reducible to a lower radix case.
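The two propagation logic equations given earlier in this section can be exercised directly. The Python sketch below is an illustration under stated assumptions: it ripples the borrow serially from the low order end, whereas the text calls for look-ahead, and it reads the final borrow out as the sign of a 2's complement result. The function names are hypothetical.

```python
from itertools import product


def assimilate(t_bits, z_bits):
    """Convert SD digits (sign T_i, magnitude Z_i) to conventional bits.

    Index 0 is the most significant digit. Returns (bits, borrow_out);
    borrow_out = 1 means the 2's complement result is negative.
    """
    n = len(z_bits)
    a_bits = [0] * n
    borrow = 0  # no borrow enters the least significant position
    for i in range(n - 1, -1, -1):
        a_bits[i] = z_bits[i] ^ borrow                              # A_i = Z_i xor B_i
        borrow = (borrow & (1 - z_bits[i])) | (t_bits[i] & z_bits[i])  # B_{i-1}
    return a_bits, borrow


def sd_integer(t_bits, z_bits):
    """Integer value of the SD digits, most significant first."""
    value = 0
    for t, z in zip(t_bits, z_bits):
        value = 2 * value + (-z if t else z)
    return value


def twos_complement_integer(a_bits, borrow_out):
    """Integer value of the assimilated bits, using borrow_out as the sign."""
    value = 0
    for a in a_bits:
        value = 2 * value + a
    return value - borrow_out * 2 ** len(a_bits)


if __name__ == "__main__":
    # Check every 4-digit SD number, including the "-0" digit encoding.
    for t in product((0, 1), repeat=4):
        for z in product((0, 1), repeat=4):
            assert twos_complement_integer(*assimilate(t, z)) == sd_integer(t, z)
    print("all 256 cases agree")
```

The exhaustive check covers the "-0" digit as well, for which the equations simply pass the incoming borrow through.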
Before concluding this section, we must admit a slight diversion from our design strategy. The reader may have noticed that all four of the SDS in Figure 9 have been extended to the left one byte. Actually, if the multiples of M were added in the order 1, 2, 4, 8, 16, 32, 64, 128 rather than the way shown, only S4 would have to be extended a full 8 bits. Since, however, quotient digits are formed most significant first (the product is formed least significant first) and we wish to use this same shift array for divide, the arrangement must be as shown. The extra SDS stages must be included and thus the division scheme has, to some extent, infringed upon the design of the multiplication scheme.

3.3 Design of Division Scheme

3.3.1 General

Now begins the task of embedding a division scheme within the multiplication scheme described in the last section. Since the SDS cascade will perform both addition and subtraction of the contents of the M-Register and the number in SD format in the UM-US Registers, the obvious extension is to place the divisor in M and the dividend and subsequent partial remainders in UM-US. The quotient digits will be produced in redundant form. In this case a logical choice would be to produce quotient digits in SD format so that they may be assimilated by the same circuits as used in multiplication. The contents of UH-UQ may be gated to US-UM via UHQDUSM and then assimilated as in the final cycle of multiplication. The quotient is thus stored in UH-UQ: the sign bits in UH and magnitude bits in UQ. Furthermore, division with the hardware will require an 8 bit shift from LS-LM to US-UM (LSML8USM) and from LH-LQ to UH-UQ (LHQL8UHQ).

The full precision division is now generally defined. The divisor is first stored in M, the dividend in UM and the sign of the dividend in all positions of US. Quotient digits are then formed by a model division using $\hat{d}$ and $r\hat{p}_j$.
The quotient digits are stored in SD format in UH-UQ and also used to set the multiples of the divisor in M to be subtracted from the dividend. The next partial remainder is formed in the SDS cascade (S1, S2, S3, S4), stored in LS-LM, and then shifted left 8 bits into US-UM. These cycles continue until the full precision quotient has been generated. The quotient is then gated directly from UH-UQ into US-UM, assimilated, and gated into LM where it is available to the central processing unit.

We must now design a model division to select the quotient digits to be stored in UH-UQ and to be used to control the M-shift array in forming a full precision partial remainder. Note that the division scheme here is of the class with radix $r = 2^{2k}$, $n = \tfrac{2}{3}(r-1)$ as mentioned in Section 2.5.2. The number of cascades, $k$, is 4 in this case. The value of $n$ is the sum of the maximum multiples of the divisor which may be formed at each stage of the SDS cascade and here is 128 + 32 + 8 + 2 = 170. The radix point is between the leftmost and next leftmost byte of the UM-US and LM-LS Registers.

3.3.2 An Arithmetic Model

First considering an arithmetic model, we select case 3 of Section 2.5.2 and calculate that for $k = 4$, $N_p = 12$ bits and $N_d = 13$ bits. The first 12 bits of the shifted partial remainder could therefore be assimilated into conventional form and divided by the 13 high order bits of the divisor to produce 8 quotient bits. This operation could be performed by a non-restoring scheme in auxiliary hardware such as the exponent arithmetic unit. Since an exponent unit normally does not perform division, some augmentation is required. The minimum addition would be a left shift path from the secondary to the primary rank of the accumulator. Also, since we have specified only a 7 bit exponent, the width of the exponent unit would require an extension of 5 bits. These additions would, however, be relatively inexpensive.
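The figures quoted for this scheme ($n = 170$, $N_p = 12$, $N_d = 13$) can be checked against equations (2.6.1) and (2.6.4) with a few lines of Python. This is a sketch under two assumptions that are not stated in the surviving text: that the divisor truncation is $\Delta d = 2^{-N_d}$, and that 8 of the 12 partial remainder bits lie to the left of the radix point, so that $\Delta p = 2^{-4}$. The function names are hypothetical.

```python
from fractions import Fraction


def max_quotient_digit(k):
    """n for a cascade of k stages: the larger multiple available at each stage."""
    return sum(2 ** (2 * i - 1) for i in range(1, k + 1))


def delta_d1_max(n):
    """Equation (2.6.1): largest divisor truncation to the left of d = 1/2."""
    return Fraction(1, 2) * (1 - (n - Fraction(1, 2)) / (n - Fraction(1, 3)))


def delta_p1(n, delta_d):
    """Equation (2.6.4) evaluated at d_1 = 1/2 - delta_d, per (2.6.6)."""
    d1 = Fraction(1, 2) - delta_d
    return (n - Fraction(1, 3)) * d1 - Fraction(1, 2) * (n - Fraction(1, 2))


def fits(n, n_d, frac_bits_p):
    """True if the truncation rectangle satisfies the cautious bounds.

    Assumes delta_d = 2**-n_d and delta_p = 2**-frac_bits_p.
    """
    delta_d = Fraction(1, 2 ** n_d)
    delta_p = Fraction(1, 2 ** frac_bits_p)
    return delta_d <= delta_d1_max(n) and delta_p <= delta_p1(n, delta_d)


if __name__ == "__main__":
    k = 4
    r = 2 ** (2 * k)
    n = max_quotient_digit(k)
    print(n, Fraction(n, r - 1))          # 170 and 2/3
    print(fits(n, 13, 4), fits(n, 12, 4))  # True, False
```

Under these assumptions 13 divisor bits are just sufficient for 12 partial remainder bits, and 12 divisor bits are not, which agrees with the values quoted above.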
The exponent unit, which normally sits idle during most of the division operation, could be used more efficiently.

There is, however, a major disadvantage to the arithmetic models: the necessity to round the quotient digits produced by the model before being used by the full precision mechanism. This rounding was mentioned in Section 2.5.2 and is obligatory if the cost criterion is to hold. Without this requirement the quotient bits could be used sequentially as they are generated to set the gates of the M-Shift array. In this case, the full precision partial remainder would be formed in LS-LM very shortly after the last quotient bit was produced by the model. Since, however, the rounding may affect the most significant bit of the quotient returned from the model, the propagation through the SDS array cannot begin until the model division is complete. This restriction severely limits the feasibility of the arithmetic models and, due to this rounding requirement, a table look-up model will be used in the example developed here.

3.3.3 A Table Look-Up Model

As described in Section 2.6.3, the round-off problem does not arise in a table look-up model. The major disadvantage here is hardware cost and large fanout requirements of $\hat{d}$ and $r\hat{p}_j$ to the selection logic. In the example arithmetic unit being developed here, multiplication is radix 256. For compatibility we would also like division to be radix 256, and consequently, would like a radix 256 table look-up model which would produce 8 bits of the quotient in parallel. By considering a P-D plot for radix 256, $n = 170$, or merely the fact that $N_p = 12$ bits and $N_d$ = [value illegible in this copy] bits, the reader may quickly convince himself that the hardware requirements for such a scheme are prohibitive, at least with conventional logic.
A radix 16 table look-up is probably possible with integrated circuitry and perhaps with more conventional circuitry if the designer is willing to pay the price: approximately 250 5-input NANDs; 160 8-input NANDs; 250 8-input NORs; and 160 drivers which will drive up to 50 NOR loads. In this example we will adopt a more modest approach in implementing a radix 4 table look-up and apply it successively at four positions of the SDS cascade. In a sense, we have been forced to reduce the radix 256 division to 4 radix 4 divisions.

From Section 2.5.3 a radix 4 table look-up model requires $N_d = 4$, $N_p = 6$. The 6 bits of the partial remainder are supplied sequentially from four stages of the full precision hardware labelled "TO MD" in Figure 9. The first stage is the output of US-UM, the other three from the output of S1, S2, and S3. The high order bit supplied to the model is displaced 2 bits right at each stage. Thus if the subscript 1 denotes the high order digital position, the first $r\hat{p}_j$ to the model is $US_1, UM_1$ through $US_6, UM_6$. The second input is the third through eighth output of S1, etc.

A block diagram of the proposed table look-up model is shown in Figure 11 and described in Table 4. The P-D plot which is actually implemented is shown in Figure 12. Table 5 explicitly illustrates the quotient digit selection for each $r\hat{p}_j$ and $\hat{d}$. Note the correspondence between the steps in the overlap regions of Figure 12 and the steps shown in the table.

Before studying these figures and tables note the following considerations which are incorporated in the design:

1. Only the first quadrant of the P-D plot is actually implemented. The approximations $\hat{d}$ and $r\hat{p}_j$ are considered to be positive and the real sign is computed as with a sign-magnitude representation. If $r\hat{p}_j$ is negative when presented to the model, it is made positive before assimilation by complementing the sign bits.
2. The divisor and thus the selected divisor interval is a constant for a given division and thus the speed of selecting the divisor interval is much less critical than that of forming the partial remainder interval.
3. The QUOTIENT SELECT TABLE actually implements ZERO and TWO regions of the P-D plot in Figure 12 and forms ONE as $\overline{\text{ZERO}} \cdot \overline{\text{TWO}}$. The TWO and ZERO regions are easier to implement than the ONE region since they are bounded on one side by the range restrictions on $r\hat{p}_j$.

The inputs to the model and the controls are supplied from the full precision unit as shown in Figure 9 and are designated as follows:

    i, j     = integer subscripts
    US_j     = the true output of the j-th position of the US Register containing the sign bits of the partial remainder
    UM_j     = the true output of the j-th position of the UM Register containing the magnitude bits of the partial remainder
    T_{i,j}  = the j-th sign bit of the output of signed-digit subtractor Si
    Z_{i,j}  = the j-th magnitude bit of the output of signed-digit subtractor Si
    M_j      = the true output of the j-th position of the M Register containing the divisor. One position of M (its subscript is illegible in this copy) is the sign of the divisor.
    C_i      = sequence control signals
    Σ        = logical summation (OR)
    Π        = logical product (AND)

The other symbols used in Figure 11 are defined in Table 4.

[Figure 11. Block diagram of the table look-up model division.]
[Table 4. Description of the table look-up model of Figure 11. The contents are not recoverable from this copy.]
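To make the radix 4 table look-up idea concrete, the following Python sketch builds a selection table for $N_d = 4$ and $N_p = 6$ and confirms that every cell of the first quadrant admits at least one safe quotient digit. It is an illustration, not the table actually implemented: it assumes simple truncation of a positive, already assimilated $rp_j$ to a resolution of 1/16, it takes the region in which digit $q$ is valid to be $(q - 2/3)d \le rp_j \le (q + 2/3)d$ (which follows from $n/(r-1) = 2/3$), and it does not reproduce the particular choices made in the overlap regions of Table 5. All names are hypothetical.

```python
from fractions import Fraction

STEP = Fraction(1, 16)   # resolution of both truncated inputs (assumed)
RHO = Fraction(2, 3)     # redundancy n/(r - 1) for r = 4, n = 2


def valid_digits(d_hat, p_hat):
    """Digits q in {0, 1, 2} that are safe for every point of the cell
    d_hat <= d < d_hat + STEP, p_hat <= rp < p_hat + STEP that also
    satisfies the range restriction rp <= (2 + RHO) * d."""
    d_lo, d_hi = d_hat, d_hat + STEP
    p_lo, p_hi = p_hat, p_hat + STEP
    digits = []
    for q in (0, 1, 2):
        # rp >= (q - 2/3) d must hold at the largest d of the cell.
        lower_ok = p_lo >= (q - RHO) * d_hi
        # rp <= (q + 2/3) d must hold at the smallest d of the cell;
        # for q = 2 this bound is the range restriction itself.
        upper_ok = q == 2 or p_hi <= (q + RHO) * d_lo
        if lower_ok and upper_ok:
            digits.append(q)
    return digits


def build_table():
    """Map (d_hat, p_hat) to the list of safe digits for every cell
    that contains at least one point inside the allowed range."""
    table = {}
    for m in range(8, 16):                 # d_hat = .1000 ... .1111
        d_hat = m * STEP
        p_hat = Fraction(0)
        while p_hat < (2 + RHO) * (d_hat + STEP):
            table[(d_hat, p_hat)] = valid_digits(d_hat, p_hat)
            p_hat += STEP
    return table


if __name__ == "__main__":
    table = build_table()
    print(all(table.values()))                          # every cell has a digit
    print(table[(Fraction(1, 2), Fraction(3, 4))])      # a cell in an overlap region
```

Cells for which the list holds two digits lie in an overlap region; the freedom to pick either one there is what produces the steps in a P-D plot such as Figure 12.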

[Figure 12. P-D plot for table look-up model. The divisor axis runs from 1/2 to 1.0 in steps of 1/16; the quotient digit regions are bounded by lines of the form $p_j = c\,d$.]

[Table 5. Quotient Select Table. Rows: $r\hat{p}_j$ from 10.1100 down to 00.0000. Columns: divisor $\hat{d}$ = .1000, .1001, .1010, .1011, .1100, .1101, .1110, .1111. Entries: quotient digit selected. The entries are not recoverable from this copy.]

3.4 Estimate of Speed of Execution

Although in this report we have described the division scheme only at the block diagram level, a detailed simulation has been programmed and will be available in [ ]. Based upon this simulation and actual logic design of the arithmetic unit of Illiac III we can estimate the execution time of this division scheme in terms of transistor collector delays. The actual logic is of the direct coupled saturated DTL type. Table 6 summarizes the number of transistor collector delays associated with operation of each block of the model division, Figure 11, and with the relevant blocks of the complete arithmetic unit shown in Figure 9. These figures are used in Table 7 in tracing the operations involved in performing one division cycle, i.e. making one pass through the SDS cascade and producing 8 quotient digits in SD format. The final cycle assimilates the redundantly represented quotient as described under ASSIMILATION.

To estimate the execution time in seconds we shall assume a collector delay of 15 ns and thus 8 bits of quotient require 76 x 15 ns, or approximately 1.1 µsec. A 56 bit division such as proposed for Illiac III therefore requires 7.7 µsec plus 0.3 µsec for assimilation, or a total of 8 µsec. Initial and terminal shifting of operands have not been included but represent a negligible time compared to the execution time of the recursive operations.

    BLOCK                                               NUMBER OF COLLECTOR DELAYS

    Model Division (Figure 11)
      Input Gating                                        2
      Sign Detect                                         1
      Negate                                              1
      Borrow Generate                                     3
      Quotient Select Table                               2
      Quotient Storage and Shift Control                  3
      Total for Model per 2 Digits of Quotient           12

    Full Precision Division (Figure 9)
      Signed-Digit Subtracter (each) (S1, S2, S3, S4)     3
      M-Shift Gates (including Driver)                    3
      Register to Register Transfer                       2
      Propagation Logic                                   7

Table 6. Transistor Collector Delays of Blocks of the Division Scheme

Initial Conditions: Divisor in M-Register. Dividend in UM-Register. Sign of Dividend in All Positions of US-Register.

    EVENT                                               NUMBER OF COLLECTOR DELAYS

    QUOTIENT GENERATION
      Perform Model Division                             12
      Set ML7Y1 or ML6Y1                                  3
      Perform Add/Subtract in S1                          3
      Perform Model Division                             12
      Set ML5Y2 or ML4Y2                                  3
      Perform Add/Subtract in S2                          3
      Perform Model Division                             12
      Set ML3Y3 or ML2Y3                                  3
      Perform Add/Subtract in S3                          3
      Perform Model Division                             12
      Set ML1Y4 or MDY4                                   3
      Perform Add/Subtract in S4                          3
      Store Result in LS-LM                               2
      Left Shift via LSML8USM                             2
      Total Time per 8 Bits of Quotient                  76

    ASSIMILATION
      Gate Quotient in UH-UQ to US-UM via UHQDUSM         2
      Direct through S1                                   4
      Generate Borrows in Propagation                     7
      Assimilate to Conventional Form in S4               3
      Store in LM                                         2
      Total Time for Assimilation                        18

Table 7. Transistor Collector Delays in Execution of Division.

4. SUMMARY AND CONCLUSION

4.1 Summary

The first half of this report was largely a constructive definition of SRT division. It introduced a recursive relationship defining division, a representation of the quotient allowing both positive and negative digits, and range restrictions on the partial remainders.
It was then shown that the consequence of this quotient representation and range restriction was that correct quotient digits could be selected by inspection of truncated versions of the divisor and shifted partial remainders. The P-D plot was described and used as a key tool in the development.

Next, the report turned to the more specific task of determining the number of bits necessary in these approximations. The cost criterion was stated as the fundamental requirement on the precision of inspection. Although this criterion is general, to obtain numerical results the discussion was restricted to a radix of the form $r = 2^{2k}$ and to models of the arithmetic or table look-up type. The chapter concluded with a short discussion of the conversion of the redundantly represented numbers to conventional form.

The second major section of the report attempted to relate the equations, graphs, and statements of the first section to real-world problems of designing a digital arithmetic unit. It described some general design considerations and pointed to compatibility of division with multiplication as one of the most important.

At this point, the discussion of division digressed to one of proposing a multiplication scheme and to the block structure of an arithmetic unit with which it could be realized. The focus then returned to division where, after rejecting an arithmetic model, a table look-up model division was proposed. The model was described at the black-box level and some estimate was given as to the expected operation time of such a scheme implemented with conventional DTL.

4.2 Conclusion

To a large extent, this report has been directed to the designer faced with the task of implementing digital division. The mode of presentation, however, has not been intended to be of an algorithmic style, but is rather aimed at a basic understanding of SRT division in hopes that the designer will be able to adapt it to his particular specifications and hardware.
The chapter on implementation was included merely to indicate one way of applying SRT division.

The author also hopes that this report will support exploration into development of higher radix quotient selection models, e.g. a true radix 256 model which can select 8 quotient bits in parallel. Note that the operating speed of the model in the example implementation is by far the slowest link.

Much of the delay in quotient select is, however, chargeable to the necessity for assimilating the redundantly represented $r\hat{p}_j$. It would therefore appear appropriate to explore models which could select quotients directly from a redundantly represented partial remainder. Perhaps this could be accomplished with analog techniques in which $rp_j$ was converted to a voltage proportional to the weighted sum of the bits. Such a converter could handle both plus and minus weights. It may also be possible to mitigate the round-off problem associated with the arithmetic models. The P-D plot could then be implemented with analog-digital rather than strictly digital circuits.

Also note that the form of the quotient selected by the model in the example implementation is by no means unique. In this case, the SD format was selected so as to be compatible with the M-Shift Array control signals and the assimilation circuitry used for multiplication. There may, however, be more efficient recodings. Perhaps the goals could best be summarized as attempting to implement division so that it is actually performed as the inverse of multiplication.

LIST OF REFERENCES

[1] Ivan Flores, The Logic of Computer Arithmetic, Englewood Cliffs, New Jersey, Prentice-Hall, Inc., 1963, pp. 246-347.
[2] O. L. MacSorley, "High Speed Arithmetic in Binary Computers," Proceedings of the IRE, 49, January, 1961, pp. 80-91.
[3] J. E. Robertson, "A New Class of Digital Division Methods," IRE Transactions on Electronic Computers, EC-7, No. 3, September, 1958, pp. 218-222.
[4] J. E. Robertson, "Lecture Notes for Math/EE 394," University of Illinois, Urbana, Illinois, 1965.
[5] J. E. Robertson, "Methods of Selection of Quotient Digits During Digital Division," File No. 663, Department of Computer Science, University of Illinois, Urbana, Illinois, 1965.
[6] J. E. Robertson, Private Communication, September, 1966.
[7] Roger E. Wiegel, "Methods of Binary Addition," Report No. 195, Department of Computer Science, University of Illinois, Urbana, Illinois, 1966.
[8] D. E. Atkins, "Arithmetic Unit of Illiac III: Simulation and Logical Design-Part I," File No. 713, Department of Computer Science, University of Illinois, Urbana, Illinois, 1966.
[9] C. S. Wallace, "Suggested Design for a Very Fast Multiplier," Report No. 133, Department of Computer Science, University of Illinois, Urbana, Illinois, 1963, pp. 8-9.
[10] D. E. Atkins, "Arithmetic Unit of Illiac III: Simulation and Logical Design-Part II," File Note in progress, Department of Computer Science, University of Illinois, Urbana, Illinois, 1967.