Digitized by the Internet Archive in 2013: http://archive.org/details/studyofmethodsfo397atki

COO-1018-1204
Report No. 397

A STUDY OF METHODS FOR SELECTION OF QUOTIENT DIGITS DURING DIGITAL DIVISION*

by Daniel E. Atkins

June 1970

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

*This work was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, June 1970, and was supported in part by the National Science Foundation under Grant Nos. US NSF GJ 812 and US NSF GJ 813, and in part by the U.S. Atomic Energy Commission under Contract USAEC AT(11-1)-1018.

ACKNOWLEDGMENT

The author gratefully acknowledges the continued support of the Department of Computer Science during the past five years and particularly thanks the following professors, colleagues and employees of the department. First to his thesis adviser, Professor James E. Robertson, whose guidance, encouragement, and friendship are highly valued. Second, to Professor Bruce H. McCormick, whose never failing loyalty, support, and enthusiasm are equally valued. The author's colleague, V. G. Tareski, is the author of the very efficient prime implicant generation algorithm so essential to this work and mentioned in Section 4.
He was also a source of encouragement and enlightening discussions. C. R. Baugh, T. K. Liu, and T. Ibaraki developed the program used to solve the massive covering problems encountered in the course of minimization. B. G. DeLugish has assisted in offering valuable discussions and in the arduous task of proofreading.

The final typing is the fine work of Mary Ann Davis and Betty Gunsalus. The excellent drawings were done by Stanislav Zundo and the equally excellent offset printing is the work of Dennis Reed.

And finally, acknowledgment is due Miss Peppermint Patty whose incisive comments, presented below, have been a source of comfort in times of confusion.

"I like twos the best... They're sort of gentle. Threes and fives are mean, but a four is always pleasant. I like sevens and eights, too, but nines always scare me... Tens are great... Have you done those division problems for tomorrow? Nothing spoils numbers faster than a lot of arithmetic!"
© 1968 United Features Syndicate

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Background
   1.2 Present Work
2. DEFINITION OF THE DIVISION PROCEDURE
   2.1 Formal Definition of the Full Precision Division
   2.2 Graphical Representation of the Division Procedure
   2.3 Formal Definition of the Quotient Selection Procedure
   2.4 Physical Model of the Quotient Selection Mechanism
3. DEFINITION OF COST AND PERFORMANCE
   3.1 Preliminary Remarks
   3.2 Definition of Cost
      3.2.1 Preliminary Remarks
      3.2.2 Structure for Finding Cost of Table 2
      3.2.3 Structure for Finding Cost of Table 1 and the Multipliers
   3.3 Definition of Performance
      3.3.1 Performance of the Model Division
      3.3.2 Performance of the Full Precision Division
4. ALGORITHMS FOR SYNTHESIS AND ANALYSIS
   4.1 Preliminary Remarks
   4.2 Deriving a Minimal Cost Design for Table 2
      4.2.1 Defining the Output Functions
      4.2.2 Minimizing the Output Functions
   4.3 Deriving a Minimal Cost Design for Table 1
      4.3.1 Defining the Output Functions
      4.3.2 Minimizing the Output Functions
5. RESULTS FROM DESIGN PROGRAMS
   5.1 Preliminary Remarks
   5.2 Numerical Results from Design Programs
      5.2.1 Cost of Table 2 for Type 2 Structure
      5.2.2 Cost of Table 1 for Type 1 Structure
   5.3 Analytic Results Concerning Cost of Table 2
      5.3.1 Preliminary Remarks
      5.3.2 Definition of s_i, s_i', and s_i''
      5.3.3 An Estimate of Cost as a Function of s_i'
      5.3.4 Discrepancies
   5.4 Analytic Results Concerning Cost of Table 1
      5.4.1 Preliminary Remarks
      5.4.2 Worst Case Bounds on Transformed Parameters
      5.4.3 An Estimate of the Cost of Table 1
6. ESTIMATES OF COST AND PERFORMANCE
   6.1 Preliminary Remarks
   6.2 Type 2 Structures
      6.2.1 Cost versus Radix
      6.2.2 Performance versus Radix
      6.2.3 Cost versus Performance
   6.3 Type 1 Structures
      6.3.1 Cost versus Radix
      6.3.2 Performance versus Radix
      6.3.3 Cost versus Performance
   6.4 Hybrid Structures
      6.4.1 Cost versus Radix and Number of Adders in Multiplier 1
      6.4.2 Performance versus Radix and Numbers of Adders in Multiplier 1
      6.4.3 Cost versus Performance
7. SUMMARY AND CONCLUSIONS
   7.1 General Summary
   7.2 Cost and Performance
   7.3 Analytic Results
   7.4 Suggestions for Further Investigation
APPENDIX
   A. Algorithm for Generating Minimum Cost Sum-of-Products Definitions of the q-Regions of Table 2
   B. Example of Results of QS4 and Minimization Program
REFERENCES
VITA

LIST OF TABLES

1. Equations Defining the Regions of Figure 1
2. Summary of Cost Calculations for Table 2 with r=16, n=10, a=1/2, b=1, γ=1/16, λ=1/16, α=0, β=1/256
3. Summary of Cost Calculations for Table 2 with r=16, n=10, a=3/4, b=9/8, γ=1/16, λ=1/16, α=0, β=1/128
4. Summary of Cost Calculations for Table 1 with a=1/2, b=1, γ=1/16, λ=1/16, α=0
5. Results of Least Squares Fit of M'(i), F'(i), and C'(i) for Data from Table 2
6. Comparison of Results from Estimating Equations and the QS3 Program for Δrp = 1/32
7. Cost of Table 2 versus Radix
8. Performance of Type 2 Structure versus Radix
9. Cost Bounds versus Performance for Type 2 Model Division
10. Cost of Type 1 Structure versus Radix
11. Performance of Type 1 Structure versus Radix
12. Cost versus Performance for Type 1 Model Division
13. Cost Computations for Hybrid Structures
14. Performance Calculations for Hybrid Structures
15. Cost versus Performance for Hybrid Model Division Structures

LIST OF FIGURES

1. P-D Plot with r=4, n=2
2. Generalized Structure of Model Division (Quotient Selector)
3. Network Definition of Table 2
4. Network Definition of Table 1
5. Structure of Multipliers
6. Portion of P-D Plot Illustrating Segmentation of rp-line
7. Portion of a P-D Plot Illustrating Constraints in Finding Divisor Transition Interval
8. Flowchart of QS3 Program
9. Portion of a P-D Plot Illustrating Constraints in Finding A(d)
10. Flowchart of QS4 Program
11. Cost of Implementing q(i) Region vs. i for Data in Table 2
12. Graphical Interpretation of s_i
13. Graphical Interpretation of s_i'
14. Model of the q(i) Region Used in Approximating Effects of Minimization
15. M'(i) versus s
16. F'(i) versus s
17. C'(i) versus s
18. C(i) versus s for Δrp=1/16 and Δrp=1/32
19. Geometry for Derivation of Estimates of d
20. Cost versus Performance for Samples of Model Division Structures

A STUDY OF METHODS FOR SELECTION OF QUOTIENT DIGITS DURING DIGITAL DIVISION

Daniel Evell Atkins, III, Ph.D.
Department of Computer Science
University of Illinois, 1970

This study concerns a class of non-restoring division schemes in which redundancy is introduced into the representation of the quotient, thereby permitting quotient digits to be selected from highly truncated versions of the divisor and partial remainders.
The mechanism for selection of quotient digits is a limited precision model of the full precision division, which it controls by the generation of simple microprogram instructions. A major advantage of this approach to division is a high degree of congruity with commonly used multiplication structures, including those making use of limited propagation adder-subtracters, for example, carry-save adders.

A cost versus performance analysis for a large class of quotient selection mechanisms (model divisions) is developed. The class is defined in terms of a block diagram and a set of ten design parameters. By varying the structure of the sub-blocks and the values of the parameters, the model division scheme ranges from that of forming quotient digits by multiplying the dividend by the inverse of the divisor, to that of a direct table look-up of the quotient digit. So-called hybrid structures exist between these two cases.

Algorithms are described which synthesize near minimal cost realizations of the most complicated sub-blocks: a combinatorial logic network to produce appropriate estimates of the reciprocal of the divisor, and a combinatorial logic network to generate a quotient digit directly as a function of the bits in the estimates of the divisor and partial remainder. Formulas are given for the cost of the remaining sub-blocks.

For a given type of structure, the primary determinant of performance is the radix of the model division, $r = 2^k$, where k is the number of bits of quotient produced per access to the model division. A FORTRAN implementation of the synthesis routines is used to obtain the near minimal cost for several different structures and sets of design parameter values. The numerical results, together with the insight gained in obtaining them, are applied to hypothesize a formula for minimal cost.
The analysis includes a multi-variable expression which relates cost to the radix of the model division, r, the degree of redundancy in the quotient representation, and the magnitude and direction of the maximum truncation error in the divisor and partial remainder estimates.

The cost formulas, together with easily derived performance formulas, are used to tabulate expected cost and performance for a variety of structures. It is found that for most schemes the cost varies exponentially with performance and, consequently, that many of the higher radix schemes are not practicable. A radix 4, direct table look-up, however, can be built with about ten 10-input gates and, assuming 10 ns logic, could produce 60 bits of quotient in about 4 µs. The study is concluded with suggestions for further investigation.

1. INTRODUCTION

1.1 Background

Since division is the mathematical inverse of multiplication, one might hope that the cost of implementing both a multiplication and a division operation would not be much different from the cost of implementing multiplication alone. Furthermore, for a given operand length, one might expect the execution times for the operations to be about the same. In actual practice this hope has not been realized, largely because division, unlike multiplication, is inherently a trial-and-error process.

In multiplication, a product is accumulated by the successive addition of multiples of the multiplicand to a partial product. The selection of which multiple to add is dependent upon a digit, radix r, of the multiplier, a quantity which is known a priori.

Now consider a recursive relationship for a class of division techniques based upon subtraction. This relationship is defined by

$$p_{j+1} = r p_j - q_{j+1} d$$

...

3) The cost terms C , C , and C are functionally related to the design parameters such as radix, maximum quotient digit, range of divisor, and uncertainty in the estimates of the divisor and remainders.
The terms $C_{T1}$ and $C_{T2}$ in C are the most complex and will be studied by computer synthesis. Estimates of C and the remaining terms of C will be obtained manually as required. In most cases, the term C p is dominated by C +C and may be neglected.

3.2.2 Structure for Finding Cost of Table 2

Table 2 will be studied as a multiple-output logic network. It may be represented as shown in Figure 3. The functions $f_0$ through $f_n$ are Boolean functions of the bit vectors corresponding to $\hat{d}$ and $r\hat{p}$. These vectors are denoted $\underline{d}$ and $\underline{rp}$, respectively.

[Figure 3. Network Definition of Table 2: a multi-output logic network with inputs $\underline{rp}$ and $\underline{d}$ and outputs $f_0(\underline{d},\underline{rp})$, $f_1(\underline{d},\underline{rp})$, ..., $f_n(\underline{d},\underline{rp})$.]

In specifying the quotient selection criterion (Section 2.3), every pair $(\hat{d}, r\hat{p})$ has been associated with a set, I, of quotient digits which the quotient selection mechanism may generate when given inputs $(\hat{d}, r\hat{p})$. The functions $f_0, f_1, \ldots, f_n$ must be found such that for every ordered pair $(\hat{d}, r\hat{p})$ with allowable quotient digit set I,

$$f_i(\underline{d}, \underline{rp}) = 1 \quad \text{for one and only one } i \in I, \tag{3.4}$$

and

$$f_k(\underline{d}, \underline{rp}) = 0 \quad \text{for all other values of } k. \tag{3.5}$$

In other words, every pair $(\hat{d}, r\hat{p})$ in the set $\hat{D} \times \hat{P}$ must cause one and only one of the outputs to be true, and this output must correspond to a correct quotient digit. Due to the overlap of adjacent quotient regions produced by redundancy, many elements in $\hat{D} \times \hat{P}$ may have sets, I, containing more than one element; thus many sets of different functions are allowable for given design parameters. But our wish to compare minimal* costs imposes another constraint, namely, that the cost of the multiple-output network (as defined in Section 3.2.1) is minimal. Symbolically stated, the requirement is that

$$\text{Cost}(f_0 + f_1 + f_2 + \cdots + f_n)$$

be minimal.
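The requirement of Equations 3.4 and 3.5 can be illustrated with a small sketch. It is not part of the original programs: the function names are hypothetical, truncation errors are ignored, and the region q(i) is assumed to be bounded by the lines rp = (i - ρ)d and rp = (i + ρ)d, as labeled on the P-D plots.

```python
from fractions import Fraction

def allowed_digits(d, rp, n, rho):
    """Return the set I of digits i in 0..n whose region contains (d, rp).

    Assumption: region q(i) is (i - rho)*d <= rp <= (i + rho)*d,
    with no truncation error in d or rp.
    """
    return {i for i in range(n + 1) if (i - rho) * d <= rp <= (i + rho) * d}

def one_hot(selected, n):
    """Outputs f_0..f_n with only f_selected true."""
    return [1 if i == selected else 0 for i in range(n + 1)]

def is_valid_selection(outputs, allowed):
    """Check (3.4)/(3.5): exactly one output is true and its index lies in I."""
    return sum(outputs) == 1 and outputs.index(1) in allowed

# Radix 4, n = 2, rho = 2/3: the point (d, rp) = (3/4, 1) lies in the
# overlap of q(1) and q(2), so either digit is an acceptable choice.
I = allowed_digits(Fraction(3, 4), Fraction(1), 2, Fraction(2, 3))
```

Because the set I may hold more than one digit, several different one-hot assignments pass the check for the same point, which is the freedom the minimization exploits.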
In the general minimization of a two-level, AND-OR realization of a multiple-output network, it is necessary to generate the prime implicants of each of the individual output functions, plus the prime implicants of the functions which are equal to all possible products of two output functions, three output functions, etc. Each product is a multiple-output prime implicant. McCluskey [33] states the following theorem of use here:

Theorem: For any definition of network cost such that the cost does not increase when a gate or gate input is removed, there exists at least one minimum-cost, two-stage network in which the corresponding expressions for the output functions, $f_j$, are all sums of multiple-output prime implicants. All the product terms which occur only in the expression for $f_j$ are prime implicants of $f_j$; all the product terms which occur in both the expressions for $f_j$ and $f_k$ but in no other expressions are prime implicants of $f_j \cdot f_k$, etc.

But in the present case, no two functions are ever simultaneously true and thus none of the prime implicants of $f_j$ are contained in any other function $f_k$, $k \neq j$. Thus, by the theorem stated above, there exists a minimum cost two-stage network which may be found by minimizing each function independently of the rest, i.e.,

$$\text{Min Cost}(f_0 + f_1 + \cdots + f_n) = \text{Min Cost}(f_0) + \text{Min Cost}(f_1) + \cdots + \text{Min Cost}(f_n).$$

*The term minimal implies that we wish to find any one of possibly more than one minimum cost implementations.

3.2.3 Structure for Finding Cost of Table 1 and the Multipliers

As with Table 2, Table 1 will be defined as a multiple-output logic network, as shown in Figure 4. The input is $\underline{d}$, the bit-vector representation of $\hat{d}$. The outputs are the variables $a_{-1} = g_{-1}(\underline{d})$, $a_0 = g_0(\underline{d})$, ..., $a_j = g_j(\underline{d})$, where $g_i$ is a Boolean function. The bits $a_{-1}$ through $a_j$ comprise the binary representation of the inverse of d, A.
Unfortunately, in this case, we cannot constrain the problem so that none of the outputs are simultaneously true. For purposes of estimation, however, it will be assumed that the results obtained by minimizing each function independently will yield an adequate estimate of the minimum cost, i.e.,

$$C_{T1} = \text{Min Cost}(g_{-1}) + \text{Min Cost}(g_0) + \cdots + \text{Min Cost}(g_j).$$

[Figure 4. Network Definition of Table 1: a multi-output logic network with input $\underline{d}$ and outputs $a_{-1}, a_0, a_1, \ldots, a_j$, where $A = a_{-1}\,a_0\,.\,a_1 \cdots a_j$.]

We now consider the cost of the multipliers. It is beyond the scope of this work to develop a cost-performance analysis for multiplication structures. The approach adopted here is to present a structure which experience has shown to be efficient and to approximate C from the structure. More information about such a structure may be found in [8].

The multiplier is illustrated in Figure 5. It consists of a cascade of limited carry-borrow adder-subtracters together with shift gates (S.G.) which form the necessary multiples of the multiplicand (rp). Shift gate SG0, in conjunction with complementing circuits, forms the multiples ±1 and ±2 times; SG1 forms ±4 and ±8 times; and, in general, SGi forms multiples of $\pm 2^{2i}$ and $\pm 2^{2i+1}$ times the multiplicand. The multiples are selected by a recoding of $a_{-1}$ through $a_j$. Appended to the output of the last adder is hardware which converts the products from the redundant representation produced by the limited carry or borrow device to a non-redundant format.

The cost of Multiplier 1, $C_{M1}$, will be defined by

$$C_{M1} = \ldots + N_A N_B C_A + \ldots$$

4. ALGORITHMS FOR SYNTHESIS AND ANALYSIS

4.1 Preliminary Remarks

The derivation of cost and performance functions by a direct, analytic approach is complicated by the discrete nature of these functions and by the large number of variables. An empirical, constructive approach was therefore adopted.
The first phase of the experiment (the topic of this section) required the formulation of a systematic approach to the synthesis of a minimal cost, mathematically accurate quotient selection mechanism for a given set of design parameter values. Although the synthesis routines in themselves would be of use in designing a quotient selection mechanism, in this study they are used as tools in studying the cost and performance functions. We are performing analysis by means of computer-aided synthesis.

In the second phase of the experiment, the programs developed in the first phase were run with various combinations of parameter values in order to estimate cost and performance. The results of each run might be thought of as determining a point on a cost versus performance curve. The hope is that only a few runs, relative to all possible parameter combinations, would be necessary in order to find approximations which would be useful for interpolation and extrapolation.

But this empirical approach is not without major practical problems. There are a huge number of possibilities for parameter values, and the minimization problems are very large and demanding of computer time. These problems were mitigated by restricting the values of parameters to those of practical importance and by concentrating on the effects of dominant parameters.

As discussed in Section 3, the dominant cost term for a Type 2 structure is $C_{T2}$, the cost of Table 2. For a Type 1 structure, although the cost of Table 1 ($C_{T1}$) may not dominate the cost of the multiplier, it is the least studied term. The following sub-sections describe algorithms which generate logic equations defining Table 1 and Table 2 for given values of design parameters. The algorithms do not produce a definition of the other blocks of Figure 2, but do place some constraints upon their structure.
4.2 Deriving a Minimal Cost Design for Table 2

Conceptually, Table 2 in Figure 2 is a direct implementation of a P-D plot. To implement a given P-D plot, a relation must be defined from the set $D \times P$ to a subset of $D \times P$, $\hat{D} \times \hat{P}$, such that each element of $D \times P$ maps into an element of $\hat{D} \times \hat{P}$ and with error bounds for each element $(\hat{d}, r\hat{p})$ such that the quotient selection criterion is satisfied. Note that we have not required that the relation be a function since, due to redundant representation, the same rp-value or d-value may map into different $r\hat{p}$ or $\hat{d}$ values; uniqueness is not guaranteed.

For practical reasons the relation is restricted to those which may be defined by the successive operations of truncation and assimilation (conversion to a non-redundant form). Even within this restriction, however, there are many possible alternatives. The maximum amount of truncation error which may be tolerated for a given pair (d, rp) depends upon the location of the point. There is also a trade-off between ε and δ, the points of truncation of rp and d, respectively.

The following is a list of the steps in the process of deriving a minimal cost design for Table 2.

1. Set the values for the design parameters: n, r, a, b, α', β', γ', λ', ε, δ.*
2. Run the program QS3 (described in Section 4.2.1) to produce a sum-of-products (minterm) definition of each output function of Table 2.
3. Run the program PI with each set of minterms produced by QS3 as input. The program PI finds all prime implicants of the functions, identifies the essential prime implicants, and generates the constraints which must be satisfied in order to cover the function.
4. Run an Integer Linear Programming routine to find a minimal cost set of prime implicants which satisfy the constraints produced in step 3. The cost of a prime implicant is the number of literals. The combination of the prime implicants selected in this step, together with the essential prime implicants identified in step 3, defines the Boolean function.
5. Tabulate the total number of literals required to define each output function. The total of these values will be taken as the cost of implementing Table 2.

*Initially, Table 2 is studied apart from T1, M1, and M2. A = P(d) = 1.

4.2.1 Defining the Output Functions

As described in Section 3.2.2, Table 2 is treated as a multiple-output network. This section describes an algorithm for specifying these functions as sums-of-products of minterms. The minterms are formed by concatenating bit vectors, $\underline{rp}$, with bit vectors, $\underline{d}$. A Fortran program called QS3 (Quotient Selection Program 3) was written to accept design parameters and to produce the minterm definitions of each of the output functions, $f_0(\underline{rp}, \underline{d}), \ldots, f_n(\underline{rp}, \underline{d})$.

The derivation will be restricted to the first quadrant (positive rp and d) of the P-D plot. The full P-D plot is symmetric about both axes and thus the cost of implementing one quadrant is a good estimate of the cost of implementing any other.

Figure 6 illustrates a portion of the first quadrant of a P-D plot. Three adjacent quotient regions, q(i+1), q(i), and q(i-1), are designated together with the horizontal line $rp = r\hat{p} = m\Delta rp$. Every line of this form will be designated an "rp-line". The quantity m is an integer, and Δrp is a negative integer power of 2.

The task of defining the output functions for Table 2 may be reduced to that of assigning adjacent sections of every rp-line to one and only one q-region. For example, the segment of the rp-line between d = a and d = b must be subdivided into three segments: one in each q-region shown. The dividing line between adjacent line segments assigned to q(i) and q(i+1) will be called the "divisor transition value between q(i) and q(i+1)."
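The segmentation of an rp-line can be sketched in a few lines. This is only an illustration under assumed values: the function name, the transition values 5/8 and 7/8, the step Δd = 1/8, and the top digit 3 are hypothetical and are not taken from QS3. The sketch follows the convention that a transition value itself belongs to the lower-numbered region.

```python
from bisect import bisect_right
from fractions import Fraction

def segment_rp_line(a, b, delta_d, transitions, top_digit):
    """Assign each divisor value d = a, a + delta_d, ... < b on one rp-line
    to exactly one quotient digit.

    transitions: increasing divisor transition values on this rp-line.
    Values of d below the first transition get top_digit; each transition
    value reached or passed lowers the assigned digit by one.
    """
    assignment = {}
    d = a
    while d < b:
        crossed = bisect_right(transitions, d)  # number of transitions <= d
        assignment[d] = top_digit - crossed
        d += delta_d
    return assignment

# Hypothetical rp-line over 1/2 <= d < 1 split into three segments.
line = segment_rp_line(
    Fraction(1, 2), Fraction(1), Fraction(1, 8),
    [Fraction(5, 8), Fraction(7, 8)], top_digit=3,
)
```

Each d value then contributes one minterm (the rp bits concatenated with the d bits) to the output function of the digit it was assigned.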
A divisor transition value between q(i) and q(i+1) may be picked from a sub-range of the divisor between the intersections of the rp-line and the boundaries of the overlap region. The range in which the divisor transition value may be chosen is determined as follows.

[Figure 6. Portion of P-D Plot Illustrating Segmentation of rp-line. Shown: the rp-line $rp = r\hat{p} = m\Delta rp$ between d = a and d = b, crossing the lower bound of q(i+1), the upper and lower bounds of q(i), and the upper bound of q(i-1).]

Let $d_t$ be the divisor transition value, for $rp = r\hat{p}$, between q(i) and q(i-1). Then the ordered pair $(r\hat{p}, d_t)$ will be representative of all (rp, d) in the rectangle shown in Figure 7. Since $d_t$ is a transition value, $(d_t, r\hat{p})$ implies a quotient digit of i-1 and $(d_t - \Delta d, r\hat{p})$ implies a quotient digit of i.

The rectangle corresponding to $(d_t, r\hat{p})$ must be completely within the q(i-1) region. The strictest bound is therefore at the upper, left-hand corner of the rectangle in Figure 7, and thus the following must hold:

$$r\hat{p} + \gamma \le (i - 1 + \rho)(d_t - \alpha) \tag{4.1}$$

[Figure 7. Portion of a P-D Plot Illustrating Constraints in Finding Divisor Transition Interval. Shown: the upper bound of q(i-1), $rp = (i-1+\rho)d$, and the lower bound of q(i), $rp = (i-\rho)d$.]

Similarly, the rectangle corresponding to $(d_t - \Delta d, r\hat{p})$ must be completely within the q(i) region. The strictest bound in this case is at the lower, right-hand corner of the rectangle, where the following must hold:

$$r\hat{p} - \lambda \ge (i - \rho)(d_t - \Delta d + \beta) \tag{4.2}$$

In practical cases, to ensure that all d values map into at least one $\hat{d}$ value, $\Delta d = \beta$ and thus (4.2) becomes

$$r\hat{p} - \lambda \ge (i - \rho)\, d_t \tag{4.3}$$

Combining (4.1) and (4.3) yields a range restriction on $d_t$, namely,

$$\frac{r\hat{p} + \gamma}{i - 1 + \rho} + \alpha \;\le\; d_t \;\le\; \frac{r\hat{p} - \lambda}{i - \rho} \tag{4.4}$$

Note that the strategy is to select the size of the rp-steps, Δrp, and to allow the algorithm to find the maximum size steps allowable for d. Theoretically, the program could be designed such that Δd would be specified and the precision requirements for the partial remainder would be determined.
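As a worked illustration of the range restriction (4.4), the following sketch evaluates the interval for one rp-line and then picks, from that interval, a value representable with the fewest fractional bits, which is the selection rule QS3 applies to the transition value. The function names are hypothetical; ρ = 2/3, γ = λ = 1/16, and α = 0 are parameter values used in this study, while the rp-line at 1 and the digit i = 2 are chosen only for the example.

```python
from fractions import Fraction
from math import ceil

def transition_interval(rp, i, rho, gamma, lam, alpha):
    """Bounds (low, high) on the divisor transition value d_t between
    q(i) and q(i-1), from Equation 4.4. Assumes i >= 1 and 0 < rho < 1."""
    low = (rp + gamma) / (i - 1 + rho) + alpha
    high = (rp - lam) / (i - rho)
    return low, high

def simplest_binary(low, high, max_bits=32):
    """Return a fraction N / 2**M in [low, high] with the smallest M,
    or None if no such value exists within max_bits fractional bits.
    When several numerators fit for that M, the smallest is returned."""
    for m in range(max_bits + 1):
        scale = 2 ** m
        n = ceil(low * scale)          # smallest N with N / 2**m >= low
        if Fraction(n, scale) <= high:
            return Fraction(n, scale)
    return None

low, high = transition_interval(
    Fraction(1), 2, Fraction(2, 3), Fraction(1, 16), Fraction(1, 16), Fraction(0)
)
d_t = simplest_binary(low, high)
```

With these values the interval is 51/80 ≤ d_t ≤ 45/64, and the first binary fraction that fits needs four fractional bits, giving d_t = 11/16.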
The former approach, fixing Δrp and solving for Δd, is taken because control of Δrp is more critical. The precision of the estimate of the partial remainder (the number of bits) should be kept low in order to keep down the time required to convert from a redundant to a non-redundant form. The logic paths involving rp, as opposed to those involving d, are changing with each call to the model division. For this reason there is motivation to simplify the logic involving only rp at the expense of complicating the logic involving only d. It should also be realized that the precision requirements on the estimate of the partial remainder are based upon worst case calculations. Although QS3 uses this worst case precision uniformly in generating the division precision requirements, the minimization routines will remove unneeded precision.

The quantity $d_t$ may be any value in the range defined in Equation 4.4. Since the design goal is to minimize the total number of literals required to implement the table, $d_t$ is picked to be a number which can be represented with the fewest bits. In other words, if all numbers in the range specified by (4.4) are represented as the ratio of two integers in the form $N/2^M$, the $d_t$ selected is one satisfying (4.4) and with the minimum value of M.

Using the algorithm of selecting the simplest binary number in the allowable divisor transition ranges, the rp-line in Figure 6 is divided into three segments, as follows:

    Segment              Assigned to
    a ≤ d < d_t1         q(i+1)
    d_t1 ≤ d ...

...

[Figure 9. Portion of P-D Plot Illustrating Constraints in Finding A(d). Shown: the boundary line $rp = (i - \rho)d$.]

Each rp-line has a divisor transition range between i and i-1 with left end given by

$$d_l(rp) = \frac{rp + \gamma}{i - 1 + \rho} + \alpha \tag{4.9}$$

and right end given by

$$d_r(rp) = \frac{rp - \lambda}{i - \rho} \tag{4.10}$$

This derivation is given in Section 4.2.1. If

$$d \le d_l(rp) \tag{4.11}$$

then a quotient digit of i must be selected and thus a value of A(d) must be found such that

$$(i - 1/2)/rp \le A(d) < (i + 1/2)/rp. \tag{4.12}$$
Similarly, if

$$d + \Delta d > d_r(rp) \tag{4.13}$$

then a quotient digit of i-1 must be selected and thus an estimate must be found such that

$$(i - 3/2)/rp \le A(d) \le (i - 1/2)/rp \tag{4.14}$$

For a given value of i and d, find the minimum value of rp such that Equation 4.11 is true. Denote this quantity $rp_{top}$. Also find the maximum value of rp such that Equation 4.13 is true. Denote this value $rp_{bot}$. Substituting these quantities into Equations 4.12 and 4.14, respectively, yields

$$(i - 1/2)/rp_{top} \le A(d) \le (i + 1/2)/rp_{top} \tag{4.15}$$

$$(i - 3/2)/rp_{bot} \le A(d) \le (i - 1/2)/rp_{bot} \tag{4.16}$$

A value of A(d) is needed which satisfies both Equations 4.15 and 4.16. Such a value must be within the range

$$(i - 1/2)/rp_{top} \le A(d) \le (i - 1/2)/rp_{bot} \tag{4.17}$$

Denote the lower bound of this range LB(i) and the upper bound UB(i). Now, for all i, find the maximum value of LB(i) and designate it $LB_{max}$. Find the minimum UB(i) and designate it $UB_{min}$. Then select A(d) such that $LB_{max} \le A(d) \le UB_{min}$ and A(d) is the simplest binary number in the range.

Every value of d is of the form mΔd, where m is an integer and Δd is a negative integer power of 2. The index m is therefore a unique minterm definition of d. Let $a_{-1}, a_0, a_1, \ldots, a_j$ be a bit string representation of A(d). Each bit corresponds to a Boolean function of d and thus a Boolean function of m:

$$a_{-1} = g_{-1}(m), \quad a_0 = g_0(m), \quad a_1 = g_1(m), \quad \ldots, \quad a_j = g_j(m)$$

Each function $g_i$ is defined as the OR of all d-minterms for which $a_i$ is 1 in the bit string version of A(d). In other words, the set of minterms, $M_i$, corresponding to $g_i$ is

$$M_i = \{\, m \mid a_i \text{ in } A(m\Delta d) \text{ is } 1 \,\}.$$

Figure 10 is an annotated flowchart of the program (QS4) which actually produces the definitions of the output functions for Table 1.

For given values of r, n, α, β, λ, γ, find the maximum Δd which will satisfy the precision requirements everywhere on the P-D plot.
DELD = Δd

Generate the array NDT(I), where NDT(I) is the numerator of the Ith value of d, d = (I - M) * Δd, and M is a constant determined by the minimum value of d. Let MM1 be the number of elements in NDT.

DO 290 I = 1, MM1
    This loop increments the value of d. MM1 is the number of d values.
D = NDT(I) * DELD
    D = d.
Q = N
    Set quotient digit value at N. Work from Q = N down to Q = 1.
ERPP = 1./DELRP
ERPN = ERPP
    Define maximum truncation error in rp.
NP1 = N + 1
NM1 = N - 1
DO 95 K = 1, N
Q = NP1 - K
    Work from Q = N down to Q = 1.
J = D * (Q-NR) * DELRP
    Find minimum rp for which transition interval could intersect d. Note: DELRP = 1/Δrp.
RP = J/DELRP
RPU = RP + ERPP
DL = RPU/(Q-1+NR)
    Find left end, DL, of divisor transition interval for present RP. ERPP = γ; DL = d_l.
J = J + 1
IQ = Q
DIMIN(IQ) = (Q-.5)/RP
    RPTOP has been found. IQ is an integer version of Q.
J = J - 1
    Move down to next lower rp.
RP = J/DELRP
RPL = RP - ERPN
DR = RPL/(Q-NR)
    Find right end, DR, of divisor transition interval for present RP.
DIMAX(IQ) = (Q-0.5)/RP
    RPBOT has been found.
For J = 1, N find
LBMAX = max(DIMIN(J))
UBMIN = min(DIMAX(J))
CALL DT(LBMAX, UBMIN, DIN(I), DID(I))
    Subroutine DT finds a value for the inverse of D, DI, such that DI = DIN(I)/DID(I), LBMAX ≤ DI ≤ UBMIN, and DI is the simplest binary fraction in the interval.
TN = DIN(I)
TD = DID(I)
DO 68 K = 1, 12
    This DO-loop assigns each minterm corresponding to a d value to the appropriate output functions.
TN = 2 * TN
IP(K) = IP(K) + 1
A(K,IP(K)) = NDT(I)
    If D = NDT(I) * DELD implies bit K of the output is 1, then NDT(I) is added to the minterm list for A(K). IP(K) is the pointer for the Kth list.

Figure 10. Flowchart of QS4 Algorithm

4.3.2 Minimizing the Output Functions

The same techniques used to minimize the output functions of Table 2 are used to minimize the output functions of Table 1. These were described in Section 4.2.2.

5. RESULTS FROM DESIGN PROGRAMS

5.1 Preliminary Remarks

The series of computer runs of the design and analysis routines described in the last chapter gave rise to four types of results. First, the algorithm produced numerical results for the cost of implementing Table 1 or Table 2 for various values of design parameters. But in retrospect it appears that the value of the computer was more insight than numbers. Studying the numerical results gave rise to some theoretical results with which to attack the problem of determining cost without actual design. A third result was a discrepancy. For some parameter values the theoretical results and the results obtained from the computer-aided synthesis were in disagreement. Closer study revealed a weakness in the QS3 algorithm. The fourth and final result of the work to date was therefore an improved algorithm for designing Table 2.

5.2 Numerical Results from Design Programs

5.2.1 Cost of Table 2 for Type 2 Structure

Considering the large number of possible combinations of parameter values, even if restricted to practical cases, very few designs were actually generated in this present work. After generating the cost data for Table 2 with r = 16, n = 10, a = 1/2, b = 1, γ = λ = 1/16, α = 0, and β = 1/256, sufficient insight was gained to propose an analytic expression for the cost of implementing each quotient region of the table. Two additional runs of the Table 2 routines with different parameter values tended to substantiate the predicted costs, but several points stood out as discrepancies.
In attempting to reconcile the disagreement, a flaw in the QS3 algorithm was discovered: selecting divisor transition values as the simplest binary number in the transition interval does not necessarily produce a minimum cost design. In view of this flaw, further runs of the algorithm were not justified. The major emphasis shifted to developing a reasonable derivation of an analytic cost expression and to developing an algorithm which would in fact yield correct results, which could then be used to verify the expression.

The parameter values selected correspond to practical cases. Let r denote the radix and assume a multiplication structure in which the following multiples of the multiplicand are available: $\pm 1$ or $\pm 2$, $\pm 4$ or $\pm 8$, $\pm 16$ or $\pm 32$, ..., $\pm r/4$ or $\pm r/2$. Each of the groups, such as $\pm 4$ or $\pm 8$, corresponds to a two-way shift gate. Only one of the two multiples in a group may be selected at a time. The magnitude of the maximum multiple which may be formed, n, is therefore 2 + 8 + 32 + ... + r/2 = 2(r - 1)/3. Since the same structure is used for division, the maximum quotient digit is also n. In the cases studied, therefore, n = 2(r - 1)/3 and the redundancy ratio, $\rho = n/(r - 1)$, is 2/3.

As mentioned earlier, the study was restricted to the first quadrant of the P-D plot. The divisor ranges considered were the binary normalized case, in which $1/2 \le d < 1$, and a second case for which $3/4 \le d < 9/8$. The second case arises when a divisor in the range $1/2 \le d < 1$ is multiplied by 3/2 if d < 3/4.

The maximum truncation errors in rp, $\gamma$ and $\lambda$, are initially set to the maximum value for which the criterion in Section 2.3 is satisfied, 1/16. Error was assumed in both directions so that the results would be applicable to symmetric adders or subtracters [10]. The divisor is strictly positive and non-redundantly represented; thus $\alpha = 0$.
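The relation n = 2(r - 1)/3 can be checked by summing the larger multiple of each two-way shift gate. The sketch below is only an illustration: the function names are hypothetical, and it assumes the radix is a power of 4 so that the gate groups end exactly at r/4 or r/2.

```python
from fractions import Fraction


def max_multiple(radix: int) -> int:
    """Largest multiple n = 2 + 8 + 32 + ... + r/2 (radix assumed to be a power of 4)."""
    exponent = radix.bit_length() - 1
    if radix < 4 or radix != 1 << exponent or exponent % 2:
        raise ValueError("radix is assumed to be a power of 4 (4, 16, 64, ...)")
    total, term = 0, 2
    while term <= radix // 2:
        total += term      # each gate contributes the larger of its two multiples
        term *= 4          # next gate: 2 -> 8 -> 32 -> ...
    return total


def redundancy_ratio(radix: int) -> Fraction:
    """Redundancy ratio rho = n / (r - 1)."""
    return Fraction(max_multiple(radix), radix - 1)


for r in (4, 16, 64, 256):
    print(r, max_multiple(r), redundancy_ratio(r))
```

For r = 16 and r = 64 this gives n = 10 and n = 42, the values used for Table 2 and Table 1 in this section.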
The positive truncation error was the maximum necessary to satisfy the selection criterion (Section 2.3) everywhere on the P-D plot for the given values of $\gamma$ and $\lambda$.

Table 2 summarizes the cost computations for a Table 2 structure with r = 16, n = 10, a = 1/2, b = 1, $\gamma = 1/16$, $\lambda = 1/16$, $\alpha = 0$, and $\beta = 1/256$. Radix 16 was selected as sufficiently large to be interesting but not so large as to demand great expense of computer time. Table 3 presents corresponding results for divisors in the range $3/4 \le d < 9/8$.

No cost values are given for the upper quotient region, q(n). These regions were not minimized since the results would be highly inaccurate without the ability to include don't care minterms. The upper boundary of q(n) need not be implemented, since the range restrictions imposed by the division algorithm prohibit (d, rp) values from occurring above the q(n) region. All minterms corresponding to points above the line $rp = (n + \rho) d$ are therefore don't care minterms, which sharply reduce the cost of implementing the adjacent q(n) region.

Note that the cost of a Table 2 structure for r = 4, n = 2 is also contained within Table 2. Neglecting the upper region q(2), the cost is the cost of q(0) + q(1) for radix 16 less 2 literals per required prime implicant.

Table 2. Summary of Cost Calculations for Table 2 with r = 16, n = 10, a = 1/2, b = 1, $\gamma = 1/16$, $\lambda = 1/16$, $\alpha = 0$, $\beta = 1/256$.

Columns: (1) q-region i; (2) number of bits in rp; (3) minimum number of bits in d to define the region; (4) minimum number of minterms required to define the region, estimated; (5) the same, actual; (6) minimum number of prime implicants to define the region, M'(i); (7) to (9) number of literals to define the region, C'(i), in rp, in d, and total; (10) average fan-in, F'(i).

  i   Bits   Bits   Minterms   Minterms   M'(i)   Literals   Literals   Literals   F'(i)
      rp     d      Est.       Act.               rp         d          Total
  0   8      2      12         12         4       25         6          31         7.75
  1   8      4      96         99         13      82         27         109        8.38
  2   8      5      192        195        21      138        62         200        9.52
  3   8      6      384        384        36      236        129        365        10.14
  4   8      6      384        385        45      296        190        486        10.80
  5   8      7      768        765        60      389        269        658        10.96
  6   8      7      768        774        72      464        334        798        11.08
  7   8      7      768        764        84      541        424        965        11.49
  8   8      7      768        771        96      627        507        1134       11.81
  9   8      8      1536       1526       109     711        584        1295       11.88
  Totals                                  540     3509       2532       6041

Table 3. Summary of Cost Calculations for Table 2 with r = 16, n = 10, a = 3/4, b = 9/8, $\gamma = 1/16$, $\lambda = 1/16$, $\alpha = 0$, $\beta = 1/128$.

Columns are as in Table 2.

  i   Bits   Bits   Minterms   Minterms   M'(i)   Literals   Literals   Literals   F'(i)
      rp     d      Est.       Act.               rp         d          Total
  0   8      4      24         24         2       10         7          17         8.50
  1   8      4      45         44         7       40         26         66         9.43
  2   8      5      90         91         14      84         57         141        10.07
  3   8      6      180        180        23      139        101        240        10.43
  4   8      6      180        181        27      166        127        293        10.85
  5   8      7      360        359        32      199        160        359        11.22
  6   8      7      360        362        40      246        212        458        11.45
  7   8      7      360        358        54      332        301        633        11.72
  8   8      7      360        363        54      339        305        644        11.93
  9   8      7      360        358        61      383        351        734        12.03
  Totals                                  314     1938       1647       3585

5.2.2 Cost of Table 1 for Type 1 Structure

The design of Table 1 is considerably less complicated than that of Table 2, since it is a function of only one input rather than two. The costs for radix 4, 16, and 64 were generated and are summarized in Table 4. The complexity of the table is adequate to produce a quotient digit in the leading bits of the product $A \cdot rp$, where A = f(d) and is of the form $a_0 . a_1 a_2 \cdots a_j$.

Table 4. Summary of Cost Calculations for Table 1 with a = 1/2, b = 1, $\gamma = 1/16$, $\lambda = 1/16$, $\alpha = 0$.

Note: NPI = Minimum Number of Prime Implicants
      NL = Minimum Number of Literals

  Output    r = 4, n = 2        r = 16, n = 10       r = 64, n = 42
  Bit       beta = 1/16         beta = 1/256         beta = 1/1024
            NPI     NL          NPI     NL           NPI     NL
  a_0       -       1           1       1            1       1
  a_1       -       4           3       8            4       13
  a_2       -       7           8       29           9       38
  a_3       -       7           12      56           18      91
  a_4                           16      79           28      153
  a_5                           19      95           41      239
  a_6                           18      109          63      401
  a_7                                                80      554
  a_8                                                80      579
  a_9                                                9       70
  Totals    -       19          77      377          333     2139

(The NPI entries for r = 4 are not legible in the source.)

5.3 Analytic Results Concerning Cost of Table 2

5.3.1 Preliminary Remarks

Figure 11 is a plot of cost in literals of implementing q(i) versus i for results given in Table 2.
To a first approximation the cost varies linearly with i. This observation led to a comparison of the empirical results with the theoretical, indirect measure of the cost of selection of quotient digits suggested by Robertson [5]. This cost function also exhibits a similar behavior with i. In the following we will review aspects of Robertson's work, suggest extensions, and then propose an expression for the cost of implementing Table 2 as a function of design parameters.

[Figure 11. Cost in literals of implementing q(i) versus i, for the results given in Table 2.]

where

$$d'' = \frac{\lambda + \gamma - \Delta rp + 2^{-\epsilon} + v(\alpha + 2^{-\delta}) + w(\beta - \Delta d)}{2\rho - 1} \qquad (5.26)$$

The actual number of steps required, $s_{i,act}$, is therefore bounded by

$$s'_i \le s_{i,act} \le s''_i \qquad (5.27)$$

Equation 5.26 may be used to determine the minimum values of $\epsilon$ and $\delta$ required for a given P-D plot. The quantity $s''_i$, and thus the cost, will tend to infinity as d'' approaches a. To ensure that every region of the P-D plot may be correctly defined for given values of $\lambda$, $\gamma$, $\alpha$, $\beta$, the quantities $\epsilon$ and $\delta$ must therefore be selected such that d'' < a.

5.3.3 An Estimate of Cost as a Function of $s'_i$

In this section we will hypothesize an expression for the cost of implementing the q(i) region of a given P-D plot. Consider the region to be defined by a set of minterms corresponding to the set of ordered pairs $(\hat{d}, \hat{rp})$ for which q = i. Let $\Delta d$ for the region be $2^{-\delta(i)}$ and $\Delta rp$ for the region be $2^{-\epsilon(i)}$. The number of minterms to define the region will be

$$M(i) = (b^2 - a^2)\, 2^{\epsilon(i) + \delta(i) - 1} \qquad (5.28)$$

The fan-in to each minterm, F(i), is given by

$$F(i) = \epsilon' + \delta' \qquad (5.29)$$

where

$$\epsilon' = \log_2 r + \epsilon(i) \qquad (5.30)$$

$$\delta' = I\left(\log_2(b - 2^{-\delta(i)}) + 1\right) + \delta(i) \qquad (5.31)$$

The term $I(\log_2(b - 2^{-\delta(i)}) + 1)$ is merely the number of bits of the divisor to the left of the radix point. Recall that I(x) has been defined as the integer portion of x. The cost before minimization is given by

$$C_T(i) = M(i)\, F(i) + M(i) \qquad (5.32)$$

The term MF is the number of literals in the AND gates; the term M is the number of literals in the OR gate.
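As a numerical illustration of the unminimized cost, the sketch below evaluates the minterm count, the fan-in, and their combination M F + M for one region. It assumes the reading M(i) = (b^2 - a^2) 2^(eps(i) + delta(i) - 1) and F(i) = log2 r + eps(i) + I(log2(b - 2^(-delta(i))) + 1) + delta(i); the function names are hypothetical, and the radix is assumed to be a power of 2.

```python
import math
from fractions import Fraction


def integer_portion(x: float) -> int:
    """I(x): the integer portion of x (truncation toward zero)."""
    return int(x)


def minterm_count(a: Fraction, b: Fraction, eps_i: int, delta_i: int) -> Fraction:
    """M(i) = (b^2 - a^2) * 2^(eps(i) + delta(i) - 1)."""
    return (b * b - a * a) * Fraction(2) ** (eps_i + delta_i - 1)


def fan_in(radix: int, b: Fraction, eps_i: int, delta_i: int) -> int:
    """F(i) = eps' + delta', counting the divisor bits left of the radix point."""
    eps_prime = (radix.bit_length() - 1) + eps_i  # log2(r) + eps(i)
    left_bits = integer_portion(math.log2(float(b - Fraction(1, 2 ** delta_i))) + 1)
    return eps_prime + left_bits + delta_i


def unminimized_cost(a: Fraction, b: Fraction, radix: int, eps_i: int, delta_i: int) -> Fraction:
    """C_T(i) = M(i) F(i) + M(i): AND-gate literals plus OR-gate literals."""
    m = minterm_count(a, b, eps_i, delta_i)
    return m * fan_in(radix, b, eps_i, delta_i) + m


# Radix 16, 1/2 <= d < 1, delta_rp = 1/16 (eps = 4), four bits of d (delta = 4)
print(minterm_count(Fraction(1, 2), Fraction(1), 4, 4))
print(fan_in(16, Fraction(1), 4, 4))
print(unminimized_cost(Fraction(1, 2), Fraction(1), 16, 4, 4))
```

With these inputs the minterm count is 96, which agrees with the estimated minterm count listed for q(1) in Table 2.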
After minimization

$$C'_T(i) = M'(i)\,(F'(i) + 1) \qquad (5.33)$$

where M'(i) is the number of prime implicants and F'(i) is the average fan-in to each prime implicant.

In order to obtain approximations of M'(i) and F'(i), we now approximate the effects of minimization by the following algorithm. Figure 14 illustrates a portion of a quotient region. Note that it may be defined by a set of adjacent rectangles (denoted by heavy lines), each of which is defined by a set of minterms (denoted by small squares). Consider one of the rectangles of width W and height H. Assume that minimization proceeds first in the d-direction by combining adjacent minterms which differ by only the low order bit. If there were initially M minterms in the rectangle, after the first step there are M/2 implicants. Next, the implicants which differ only in the next to low-order position may combine to produce M/4 implicants, etc. The minimization in the d-direction continues for $k_d = I(\log_2 W)$ steps to form $M/2^{k_d}$ implicants. Similarly, combinations take place in the rp-direction, further reducing the number of implicants to $M/2^{k_d + k_{rp}}$, where $k_{rp} = I(\log_2 H)$.

[Figure 14. Model of the q(i) Region Used in Approximating Effects of Minimization]

The minimization of the quotient region will be characterized by an average rectangle of dimensions WH. The width is defined by

$$W = 2^{\delta}\,(b - a) / \bar{s}'_i \qquad (5.34)$$

where

$$\bar{s}'_i \equiv (s'_i + s'_{i+1}) / 2 \qquad (5.35)$$

The quantity W is therefore the average width of the minimum-number treads defining the upper and lower boundary of q(i). The height is defined by

$$H = 2^{\epsilon}\,(b + a) / 4 \qquad (5.36)$$

which is the average value of the distance between rp = (i + 1/2) d (nominal upper boundary) and rp = (i - 1/2) d (nominal lower boundary).
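The merging model described above can be sketched in a few lines. This is only an illustration of the approximation, not of a real minimizer: the function names are hypothetical, and the width and height are assumed to be given as whole numbers of minterms.

```python
def merge_steps(extent: int) -> int:
    """I(log2(extent)) for a positive integer extent measured in minterms."""
    if extent < 1:
        raise ValueError("extent must be a positive number of minterms")
    return extent.bit_length() - 1


def implicants_after_merging(width: int, height: int) -> float:
    """Approximate implicant count M / 2^(k_d + k_rp) for a width-by-height rectangle."""
    minterms = width * height
    k_d = merge_steps(width)     # halvings available in the d-direction
    k_rp = merge_steps(height)   # halvings available in the rp-direction
    return minterms / 2 ** (k_d + k_rp)


print(implicants_after_merging(8, 4))   # dimensions that are powers of two
print(implicants_after_merging(6, 4))   # a width that is not a power of two
```

When both dimensions are powers of two the model collapses the rectangle to a single implicant; otherwise it returns a fractional count, which is why it is treated only as an average over the region.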
The preceding argument suggests a cost expression of the following form:*

$$C'(i) = \theta_1 \frac{M}{2^k}\left(F(i) - k + \log_2 \theta_2\right) \qquad (5.37)$$

where M and F are defined by Equations 5.28 and 5.29, respectively, and k is defined by

$$k = k_d + k_{rp} = \log_2 WH \qquad (5.38)$$

The factors $\theta_1$ and $\theta_2$ are constants which will be determined empirically. Equation 5.37 may be rewritten as

$$C'(i) = M'(i)\, F'(i) \qquad (5.39)$$

where

$$M'(i) = 2\,\theta_1\, \bar{s}'_i \qquad (5.40)$$

and

$$F'(i) = \log_2 \frac{4\,\theta_2\, r\, \bar{s}'_i}{b^2 - a^2} + I\left(\log_2(b - 2^{-\delta}) + 1\right) \qquad (5.41)$$

*Note that C'(i) is the number of literals in the AND gates; $C'_T(i) = C'(i) + M'(i)$ is the total number of literals for the region.

M'(i) is the minimal number of prime implicants required to implement the Boolean function for q(i) and F'(i) is the average fan-in to each prime implicant.

We now use numerical results from Tables 2 and 3 to find values for $\theta_1$ and $\theta_2$ and to test the predictive worth of Equation 5.39. The value of $\theta_1$ is obtained by a least squares fit of the actual values of M'(i) to Equation 5.40. The value of $\theta_2$ is obtained by a least squares fit of the actual values of F'(i) to Equation 5.41. Values of $\theta_1 = 2.12$ and $\theta_2 = 1.68$ were obtained. Table 5 summarizes the results of the fit. Figures 15, 16, and 17 display the results graphically with $\bar{s}'_i$ as the independent variable. The heavy line denotes the predicted values; the circles denote actual values.

Note that Equations 5.40 and 5.41 do not explicitly account for the discrete effects resulting from the fact that the treads and risers of the q-region boundaries are restricted to integer multiples of $2^{-\epsilon}$ and $2^{-\delta}$, respectively. The effect is included empirically in the choice of $\theta_1$ and $\theta_2$. There are indications that a more explicit cost function of both $s'_i$ and $s''_i$, which does include discrete effects, might be found. For present purposes, however, the estimates given by Equations 5.40 and 5.41 were judged to be adequate.

Table 5. Results of Least Squares Fit of M'(i), F'(i), and C'(i) for Data from Table 2.

a = 1/2, b = 1
Columns: i; $\bar{s}'_i$; M'(i) from the estimating equation and from QS3; F'(i) from the equation and from QS3; C'(i) from the equation and from QS3.

  i   s-bar    M'(i)          F'(i)          C'(i)
               Eq.    QS3     Eq.    QS3     Eq.    QS3
  0   1.38     5      4       7.6    7.7     44     31
  1   2.83     12     13      8.6    8.4     103    109
  2   5.72     24     21      9.6    9.5     234    200
  3   8.59     36     36      10.2   10.1    373    365
  4   11.46    48     45      10.6   10.8    519    486
  5   14.33    60     60      11.0   11.0    668    658
  6   17.20    72     72      11.2   11.0    821    798
  7   20.06    85     84      11.4   11.5    977    965
  8   22.93    97     96      11.6   11.8    1135   1134
  9   25.80    109    109     11.8   11.8    1296   1295

a = 3/4, b = 9/8

  i   s-bar    M'(i)          F'(i)          C'(i)
               Eq.    QS3     Eq.    QS3     Eq.    QS3
  0   0.74     3      2       7.8    8.5     24     17
  1   1.51     6      7       8.9    9.4     56     66
  2   3.06     13     14      9.9    10.0    127    141
  3   4.60     19     23      10.5   10.4    203    240
  4   6.13     26     27      10.9   10.9    282    293
  5   7.66     32     32      11.2   11.2    363    359
  6   9.19     39     40      11.5   11.5    446    458
  7   10.72    45     54      11.7   11.7    531    633
  8   12.26    52     54      11.9   11.9    617    644
  9   13.79    58     61      12.0   12.0    704    734

[Figure 15. M'(i) versus $\bar{s}'_i$.]

[Figure 16a. F'(i) versus $\bar{s}'_i$.]

[Figure 16b. F'(i) versus $\bar{s}'_i$.]

[Figure 17a. C'(i) versus $\bar{s}'_i$.]

[Figure 17b. C'(i) versus $\bar{s}'_i$.]

5.3.4 Discrepancies

The two cases for which numerical results were presented in Section 5.2.1 differ only in the range of the divisor. We should also consider the effect of varying the precision in the estimates of the operands. The program QS3 was therefore also run for the same parameter values as listed in Table 2 (Section 5.2.1), except that $\Delta rp$, $\gamma$, and $\lambda$ were decreased from 1/16 to 1/32. The minimized results are shown in Table 6. Numbers under the heading 'Eq.' are from the evaluation of Equation 5.39; numbers under the heading 'QS3' are from the QS3 and minimization programs.

Table 6. Comparison of Results from Estimating Equation and the QS3 Program for $\Delta rp$ = 1/32.

  i   s-bar    M'(i)          F'(i)            C'(i)
               Eq.    QS3     Eq.     QS3      Eq.    QS3
  0   1.16     5      3       7.37    7.66     36     23
  1   2.38     10     10      8.41    8.20     84     82
  2   4.80     20     20      9.43    9.65     191    193
  3   7.21     31     34      10.01   10.02    306    346
  4   9.62     41     44      10.43   10.8     425    476
  5   12.03    51     62      10.75   10.9     548    679
  6   14.44    61     67      11.02   11.4     674    749
  7   16.85    71     84      11.24   11.5     802    970
  8   19.25    82     90      11.43   11.9     933    1067
  9   21.66    92     110     11.60   11.8     1065   1303

In Figure 18, the data from the C'(i)-QS3 column of Table 6 have been added (denoted by X's) to Figure 17(a). Note that these X-points start near the predicted values (solid line) but increasingly fall above the expected values.

[Figure 18. C'(i) versus $\bar{s}'_i$ for $\Delta rp$ = 1/16 and $\Delta rp$ = 1/32.]

The source of this discrepancy turns out not to be the predictive equations, as might be first suspected, but rather the QS3 algorithm; specifically, the decision to pick divisor transition values as the simplest binary fraction in the allowable interval. This choice was made in the early stages of the research, when other measures of cost were being used, and in changing to the minterm approach it was not evaluated critically. Fortunately, as will be explained, it was possible to salvage the numerical results produced by QS3. A correct algorithm has also been found and is described in the Appendix.

The essence of the problem is the failure to fully appreciate the two-dimensional nature of the minimization problem. For several of the q-regions which produced doubtful results, the areas corresponding to the prime implicants of the reduced function were drawn on a P-D plot. The upper and lower stairstep boundaries were thereby made apparent. By close inspection of the boundaries, it could be seen that the decision to force the location of risers to the simplest binary fraction sometimes over-constrained the location of the tread. In other words, in some cases for which a divisor interval would have been spanned with one tread, the algorithm generated two treads. Furthermore, each of these extra treads required an extra prime implicant to define it. Thus, although the output function was minimal for the given definition of the q-region, the given definition of the q-region was unduly complicated and therefore not truly minimal.
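The "Equation" columns of Tables 5 and 6 can be reproduced from the fitted constants. The sketch below is a hedged reading of Equations 5.40 and 5.41, taken here as M'(i) = 2 c1 s and F'(i) = log2(4 c2 r s / (b^2 - a^2)) + I(log2(b - 2^(-delta)) + 1), with c1 = 2.12 and c2 = 1.68 the two fitted constants and s the averaged step count for the region. The names THETA_1, THETA_2, and estimate_region_cost are labels chosen for this example, not names from the report.

```python
import math

THETA_1 = 2.12  # fitted constant for the prime implicant count (assumed label)
THETA_2 = 1.68  # fitted constant for the average fan-in (assumed label)


def estimate_region_cost(s_bar: float, radix: int, a: float, b: float, delta: int):
    """Return (M', F', C') for one quotient region under the assumed closed forms."""
    m_prime = 2 * THETA_1 * s_bar
    left_bits = int(math.log2(b - 2.0 ** -delta) + 1)  # divisor bits left of the radix point
    f_prime = math.log2(4 * THETA_2 * radix * s_bar / (b * b - a * a)) + left_bits
    return m_prime, f_prime, m_prime * f_prime


# Rows i = 1 and i = 9 of Table 6 (r = 16, a = 1/2, b = 1)
for s_bar in (2.38, 21.66):
    m, f, c = estimate_region_cost(s_bar, radix=16, a=0.5, b=1.0, delta=8)
    print(f"s = {s_bar:5.2f}  M' = {m:6.2f}  F' = {f:5.2f}  C' = {c:7.1f}")
```

Truncated to integers, these values agree with the tabulated "Equation" entries for the two rows, which suggests the closed forms above are at least consistent with the numbers in the tables.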
By manually revising the boundary to eliminate the superfluous prime implicants, it was found that the cost was reduced to close agreement with the predicted values.

But the two constants in the equation for estimating cost were specified based upon results from the QS3 program. Why should they be trusted? The answer to this question is found in the following argument. If we think of the transition region between q(i) and q(i-1) as being defined by a grid of vertical spacing $\Delta rp$ and horizontal spacing $\Delta d$, then the set of all boundaries between q(i) and q(i-1) is the set of all stairsteps which can be drawn along these grids and still remain inside the transition region. As $\Delta d$ and $\Delta rp$ are decreased, the number of different boundaries increases exponentially. The problem is to pick boundaries that will minimize the number of literals in the Boolean function defining the area enclosed by the boundaries. (Such an algorithm is described in the Appendix.) Fortunately, for the parameter values used to derive the constants, there was very little choice in selecting the boundaries due to the dimensions of the transition regions. It is, therefore, asserted that the boundary produced by the QS3 algorithm and that produced by a correct algorithm would be very nearly the same. A graphical spot check of several of the boundaries confirmed this assertion. When, however, $\Delta rp$ was reduced from 1/16 to 1/32, the number of possible boundaries increased and thus the discrepancy became apparent.

There is one other case for which a discrepancy is apparent. In Table 5 for a = 3/4, b = 9/8, and i = 7, notice that M'(i) from QS3 is 54 while the predicted value is 45. This difference accounts for the high points at $\bar{s}'_i = 10.72$ in Figures 15 and 17(b). The prime implicant covering for this case (q(7)) was drawn, and it was thus discovered that six extra prime implicants had been generated.
In this case, although $\Delta rp$ is also 1/16, the shifting of the divisor range to the right increases the width of the transition region to the extent that the QS3 algorithm may fail badly for d values near the upper limit, b. Fortunately, it did not, except in the one region.

5.4 Analytic Results Concerning Cost of Table 1

5.4.1 Preliminary Remarks

The program QS4 produces a cost estimate of Table 1 for a Type 1 structure for which the precision of A is such that the rounded integer portion of $A \cdot rp$ is a correct quotient digit. As mentioned in Section 2.4, we are also interested in hybrid structures in which Table 1 and the multipliers are used to transform the divisor and remainders before they are applied to Table 2. In the following sections we consider the effect of the transformation on the design parameters for Table 2 and then propose an expression to estimate the cost of implementing Table 1 for given precision in A and d.

5.4.2 Worst Case Bounds on Transformed Parameters

As in Section 2.2, assume that we are given $\hat{d}$, which is representative of divisor values in the range $\hat{d} - \alpha \le d \le \hat{d} + \beta$, and $\hat{rp}$, which is representative of remainders in the range $\hat{rp} - \lambda \le rp \le \hat{rp} + \gamma$. Let $A = F(\hat{d})$ be generated by Table 1. The range of the transformed divisor, $d^T$, now represented is given by

$$A\hat{d} - A\alpha \le d^T \le A\hat{d} + A\beta \qquad (5.42)$$

and the range of the transformed remainder, $rp^T$, is given by

$$A\,\hat{rp} - A\lambda \le rp^T \le A\,\hat{rp} + A\gamma \qquad (5.43)$$

The divisor range which must be accommodated by Table 2 is $(a^T, b^T)$, where

$$a^T = (A\hat{d})_{min} - A_{max}\,\alpha \qquad (5.44)$$

and

$$b^T = (A\hat{d})_{max} + A_{max}\,\beta \qquad (5.45)$$

The worst-case transformed values of $\alpha$, $\beta$, $\lambda$, and $\gamma$ are merely $A_{max}\alpha$, $A_{max}\beta$, $A_{max}\lambda$, and $A_{max}\gamma$. If $2^{-j}$ is the weight of the low order bit in A, then

$$\Delta d^T = \Delta d\; 2^{-j} \qquad (5.46)$$

and

$$\Delta rp^T = \Delta rp\; 2^{-j}$$
(5.47)

Assuming that $A_{max} = 2$, then d' (Equation 5.19) becomes

$$d' = \frac{2\lambda + 2\gamma - 2^{-(\epsilon + j)} + 2v\alpha + w\left(2\beta - 2^{-(\delta + j)}\right)}{2\rho - 1} \qquad (5.48)$$

Assume that $\alpha = 0$ and that j is sufficiently large to permit the terms $2^{-(\epsilon + j)}$ and $2^{-(\delta + j)}$ to be neglected relative to $\lambda$, $\gamma$, $\alpha$, and $\beta$; then

$$d' \approx \frac{2\,\Delta rp\,(\lambda' + \gamma') + 2w\beta}{2\rho - 1} \qquad (5.49)$$

This value of d' for given $\lambda'$, $\gamma'$, $\Delta rp$, and $\beta$ is greater than d' as defined in Equation 5.24. Furthermore, d' increases with i due to the $2w\beta$ term. This comparison indicates that although the transformation reduces cost by narrowing the divisor range for Table 2, it increases cost by increasing restrictions on the q-region boundaries.

The most difficult terms to evaluate in this analysis are $(A\hat{d})_{min}$ in Equation 5.44 and $(A\hat{d})_{max}$ in Equation 5.45. This is the subject of the remainder of this section.

The design problem for Table 1 may be viewed as that of implementing an estimate of the function $f(d) = d^{-1}$. In the following analysis we shall treat divisors in the range $1/2 \le d < 1$. The approach adopted here is to specify the precision in A, the estimate of $d^{-1}$, and then to determine the precision in d required to guarantee that dA is within a certain interval in the vicinity of one. The precision of A is selected as the independent variable since it determines the number of additions required in forming the product dA. The number of additions is the dominant factor in determining the operating time of the T1, M1, M2 part of the quotient selector.

Let the set of discrete values of the output of Table 1 be defined by

$$A = \{ m\tau \} \qquad (5.50)$$

where $\tau = 2^{-j}$ for some positive integer j, and m is an integer ranging from $1/\tau$ through $2/\tau$. The tick marks on the ordinate of Figure 19 designate such a set for $\tau = 2^{-2}$.

[Figure 19. Geometry for Derivation of Estimates of $d^{-1}$]

For every element of A we must define a divisor interval for which $m\tau$ is used as the estimate of the reciprocal of divisor values in the interval. Interpreted graphically, the elements of A determine the location of the treads of a stairstep approximation to $d^{-1}$. The remaining task is to specify the location of the risers (the dotted lines in Figure 19).

Let $d_{l,m}$ and $d_{r,m}$ denote the left and right ends, respectively, of the divisor interval for which $A = m\tau$ is taken as the inverse of divisor values in the range $d_{l,m} \le d \le d_{r,m}$. It may be shown that the optimum values for $d_{l,m}$ and $d_{r,m}$, in the sense of minimizing the maximum value of $|1 - dA|$, are

$$d_{l,m} = \frac{2}{\tau\,(2m + 1)} \qquad (5.51)$$

and

$$d_{r,m} = \frac{2}{\tau\,(2m - 1)} \qquad (5.52)$$

These equations correspond to the reciprocal of the average value of $\tau m$ and $\tau(m+1)$, and of $\tau m$ and $\tau(m-1)$. For divisor values, d, in the range $d_{l,m} \le d \le d_{r,m}$, the range of dA is given by

$$1 - e^-(m) \le dA \le 1 + e^+(m) \qquad (5.53)$$

where

$$e^+(m) = 1 / (2m - 1) \qquad (5.54)$$

$$e^-(m) = 1 / (2m + 1) \qquad (5.55)$$

The negative error is maximum for $m = m_{min} = 1/\tau$, but since $1/2 \le d < 1$, the positive error, $e^+(m)$, is maximum at $m = m_{min} + 1$.

In practice $d_{l,m}$ and $d_{r,m}$ are also discrete values and thus, in general, cannot be placed precisely as specified by Equations 5.51 and 5.52. In this case the determination of the error bounds on the product dA is more complicated.

If $d_{l,m}$ and $d_{r,m}$ are represented to $\delta$ places to the right of the radix point, then the actual end points can be within $2^{-\delta}$ of the theoretically optimal point. Let $\Delta = 2^{-\delta}$; for the worst case, replace $d_{l,m}$ by $d_{l,m} - \Delta$ and replace $d_{r,m}$ by $d_{r,m} + \Delta$. Now,

$$1 - e^-(m) \le dA \le 1 + e^+(m) \qquad (5.56)$$

where

$$e^+(m) = m\Delta\tau + 1 / (2m - 1) \qquad (5.57)$$

$$e^-(m) = m\Delta\tau + 1 / (2m + 1) \qquad (5.58)$$

Note that due to the range restriction of d,

$$e^+(m_{min}) = m_{min}\,\Delta\tau \qquad (5.59)$$

and

$$e^-(m_{max}) = m_{max}\,\Delta\tau \qquad (5.60)$$

Since we require

$$2^{-\delta} \le \frac{1}{\tau}\left(\frac{1}{m} - \frac{1}{m + 1}\right) \qquad (5.61)$$

for all allowable m, the maximum value of $2^{-\delta}$ should be less than or equal to $\tau/4$ (the bound is tightest at $m = 2/\tau$, where the right-hand side is approximately $\tau/4$), and $\delta$ should therefore be greater than or equal to j + 2.

For given values of $\tau$ and $\Delta$,

$$(A\hat{d})_{min} = 1 - e^-(m)_{max} \qquad (5.62)$$

$$(A\hat{d})_{max} = 1 + e^+(m)_{max} \qquad (5.63)$$

with the maxima taken over all m in the range $1/\tau$ to $2/\tau$.

5.4.3 An Estimate of the Cost of Table 1

We now derive an expression with which to estimate the minimum cost in literals of Table 1 when structured as specified in Section 3.2.3. Let the outputs of value A be of the form $A = a_0 . a_1 a_2 \cdots a_j$ and consider the d axis of Figure 19 to be equally divided in units of $2^{-\delta}$. After all values of $d_{l,m}$ and $d_{r,m}$ are specified, each bit of A may be defined by a sum-of-products of minterms of the form $k\,2^{-\delta}$.

With A written in this form, we will now derive an estimate of the cost of implementing $a_i = f_i(d)$. In the range $1 \le A \le 2$, each bit, $a_i$, is 1 in $2^{i-1}$ intervals, each of length $2^{-i}$. Let $y'_{i,k}$ be the value of the bottom of the kth interval along the $d^{-1}$ axis for bit $a_i$ and let $y''_{i,k}$ be the top of the interval. Thus,

$$y'_{i,k} = 1 + (2k - 1)\, 2^{-i} \qquad (5.64)$$

$$y''_{i,k} = 1 + 2k\, 2^{-i} \qquad (5.65)$$

for i = 1, 2, ..., j and k = 1, 2, ..., $2^{i-1}$. Let $X_{i,k}$ be the width of the corresponding interval along the d-axis; thus

$$X_{i,k} = \frac{2^{-i}}{(4k^2 - 2k)\, 2^{-2i} + (4k - 1)\, 2^{-i} + 1} \qquad (5.66)$$

Let each interval of width $2^{-\delta}$ along the d-axis correspond to a minterm, each with a fan-in of $\delta$. The number of minterms required to define $X_{i,k}$ is

$$M_{i,k} = 2^{\delta}\, X_{i,k} \qquad (5.67)$$

and the number of literals is

$$C_{i,k} = \delta\, M_{i,k} \qquad (5.68)$$

Using the same approximation to the minimization algorithm as described in Section 5.3.3, the cost in literals, after minimization, for implementing the $X_{i,k}$ interval is

$$C'_{i,k} = M'_{i,k}\, F'_{i,k} \qquad (5.69)$$

where, with $\mu = I(\log_2 M_{i,k})$,

$$M'_{i,k} = M_{i,k} / 2^{\mu}$$

is an approximation to the number of prime implicants required and

$$F'_{i,k} = \delta - \mu \qquad (5.70)$$

is an approximation to the average fan-in. The cost of implementing $a_i = f_i(d)$ is therefore

$$C'_i = \sum_{k=1}^{2^{i-1}} C'_{i,k} \qquad (5.71)$$

The total number of prime implicants required is

$$M'_i = \sum_{k=1}^{2^{i-1}} M'_{i,k} \qquad (5.72)$$

The cost for the entire table is therefore

$$C_{T1} = \sum_{i=1}^{j} \left( C'_i + M'_i \right)$$
(5.73)

6. ESTIMATES OF COST AND PERFORMANCE

6.1 Preliminary Remarks

In this section we use the analytic tools developed in Section 5 together with the definitions in Section 3 to tabulate samples of expected cost and performance. Results are given for Type 2 structures, Type 1 structures, and finally for a family of hybrid structures. Since the radix of the model division is the primary determinant of performance, for each structure we first consider cost versus radix, then performance versus radix, and finally cost versus performance.

Some of the results depend upon assignment of numerical values to quantities used in the definitions of Section 3. The values selected are based upon experience in arithmetic unit design. A different set of realistic values would only shift the location of the cost-performance curves and not materially alter their shape. General conclusions inferred from them would not change.

6.2 Type 2 Structures

6.2.1 Cost versus Radix

The cost of Table 2, $C_{T2}$, is given by

$$C_{T2} = \sum_{i=0}^{n-1} \left( C'(i) + M'(i) \right) \qquad (6.1)$$

where C'(i) is defined by Equation 5.39 and M'(i) is defined by Equation 5.40.

Tables 7a and 7b summarize cost versus radix for several values of $\Delta rp$. Table 7a is for a divisor in the range 1/2 to 1 and Table 7b is for a divisor in the range 3/4 to 9/8. In all cases, $\rho = 2/3$, $\gamma' = \lambda' = 1$, $\beta' = 1$, and $\alpha' = 0$. The quantity $\Delta d$ is $2^{-\delta}$, where $\delta$ is given for each entry in the tables. The limiting cases (4 and 8) are based upon the assumption that the precision in rp and d is increased such that $s'_i = s''_i$. A near minimal cost should lie between Cases 1 and 4 for the first divisor range or between Cases 5 and 8 for the second divisor range. The cost entries are given in the following form:

18 (Prime Implicants)
111 (Literals in AND Gates)
129 (Total Cost)

Table 7a.
Cost of Table 2 versus Radix

  r     Entry                    Case 1         Case 2         Case 3         Case 4
                                 Drp = 1/16     Drp = 1/32     Drp = 1/64     Drp = 0
  4     delta                    5              5              3              infinity
        Prime implicants         18             15             14             13
        Literals in AND gates    111            90             81             74
        Total cost               129            105            95             87
  16    delta                    8              7              7              infinity
        Prime implicants         552            464            430            400
        Literals in AND gates    6170           5064           4646           4291
        Total cost               6722           5528           5076           4691
  64    delta                    9              9              9              infinity
        Prime implicants         10470          8792           8148           7595
        Literals in AND gates    160526         132578         121971         112928
        Total cost               170996         141370         130119         120523
  256   delta                    11             11             11             infinity
        Prime implicants         174597         146610         135871         126656
        Literals in AND gates    3381283        2802307        2582126        2394169
        Total cost               3555880        2948987        2717997        2520825

(Drp denotes $\Delta rp$.)

Table 7b. Cost of Table 2 versus Radix

  r     Entry                    Case 5         Case 6         Case 7         Case 8
                                 Drp = 1/16     Drp = 1/32     Drp = 1/64     Drp = 0
  4     delta                    5              4              3              infinity
        Prime implicants         10             8              8              7
        Literals in AND gates    61             52             49             46
        Total cost               71             60             57             53
  16    delta                    7              6              6              infinity
        Prime implicants         296            261            247            234
        Literals in AND gates    3353           2920           2742           2583
        Total cost               3649           3181           2989           2817
  64    delta                    8              8              8              infinity
        Prime implicants         5597           4953           4684           4443
        Literals in AND gates    86870          75988          71481          67470
        Total cost               92467          80941          76165          71913
  256   delta                    10             10             10             infinity
        Prime implicants         93341          82590          78097          74090
        Literals in AND gates    1825332        1600477        1505408        1424130
        Total cost               1918673        1683067        1583505        1498220

6.2.2 Performance versus Radix

The following equations from Section 3 are relevant to the calculations in this section.

Operating Time of Model Division:

$$T_Q = T_{PREF} + T_{M1} + T_{T2} + T_R \qquad (3.7)$$

Performance of Model Division:

$$P_Q = \frac{\log_2 r}{T_Q} \qquad (3.8)$$

Operating Time of Full Precision Division:

$$T_D = M \left[ \frac{T_A}{2} + \frac{T_C + T_Q}{\log_2 r} \right] \qquad (3.11)$$

Performance of Full Precision Division:

$$P_D = \frac{2 \log_2 r}{T_A \log_2 r + 2\,(T_C + T_Q)} \qquad (3.12)$$

Table 8 is a summary of $P_Q$ and $P_D$ for several radices with $T_{PREF} = 3$, $T_{M1} = 0$, $T_{T2} = 2$, $T_R = 1$, $T_A = 3$, $T_C = 4$. For these values $T_Q = 6$. Note that we have actually computed a best case for performance, since we have assumed that Table 2, even for the higher radices, can be implemented in two delays ($T_{T2} = 2$).

Table 8. Performance of Type 2 Structure versus Radix

  r      P_Q (bits/delay)    P_D (bits/delay)
  4      0.33                0.15
  16     0.67                0.25
  64     1.00                0.32
  256    1.33                0.36

6.2.3 Cost versus Performance

Neglecting the cost terms $C_{PREF}$, $C_{DEF}$, and $C_R$, the cost of implementing a Type 2 structure is $C_{T2}$. Table 9 summarizes the bounds on $C_{T2}$ versus performance of the full precision division. The actual cost should
The actual cost should 99 lie between the lower bound (LB) and the least upper bound (LUB) correspond- ing to Case 1 in Table 7a and Case 5 in Table 7b. These results are plotted and discussed further in the summary and conclusions (Section 7)« Table 9« Cost Bounds versus Performance for Type 2 Model Division P D C T2 (literals) (bits /delay) Times a = 1/2, b = 1 Times a = 3A, b = 9/8 Increase LB LUB Increase LB LUB .15 1.00 87 129 1 53 71 .25 1.67 U.691 6,722 5^ 2,817 3,61*9 .32 2.13 120,523 170,996 1385 71,913 92,1+67 36 2.U0 2,520,825 3,555,880 28975 1,1*98,220 1,918,673 6.3 Type 1 Structures 6.3.1 Cost versus Radix Neglecting the cost terms C TmriTli C^^^ and C„, the cost of imple- PREF DEF R menting a Type 1 model division is the sum of C and C . Values for C .. are taken from the results given in Table k. The term C is computed from Equat i on 3.6, namely , C M1 " i C R + \ N B C A + (H A + X) M B C SG + ^ C C" The following values are assumed: C R =10, C A = 50, C SG =6, c c -U, N B = 8. Table 10 summarizes the results. 100 Table 10. Cost of Type 1 Structure versus Radix r C T1 J N A C M1 C T0T k 28 3 2 1230 1258 16 k5k 6 3 1708 2162 Gk 21+72 9 5 263U 5106 6.3.2 Performance versus Radix In computing the operating time for a Type 1 structure we assume that T pREF = 3, T T2 = 0, T R = 1, T A = 3, T Q = h, and T^ = 3 N A , and there- fore from Equation 3.7, \ - 3 \ + k - Table 11 presents P (Equation 3-8) and P (Equation 3.12) for the cases which were described in Table k. Table 11. Performance of Type 1 Structure versus Radix r P (bits/delay) P (bits/delay) y d k .20 .12 16 .31 .17 61+ .32 .19 6.3.3 Cost versus Performance Table 12 merges the computations of the previous sections. P D (bits/delay) Times Increase .12 1.00 • IT 1.1+2 .19 1.58 101 Table 12. 
Cost versus Performance for Type 1 Model Division Times C (literals) Increase 1258 1.00 2162 1.72 5106 U.06 6.k Hybrid Structures 6.^.1 Cost versus Radix and Number of Adders in Multiplier 1 For hybrid structures the cost is computed in several stages. First, T T C and the worst-case bounds on the transformed divisor range (a , b ) are computed for the cases of 1, 2, 3, and k adders in Multiplier 1. The number of adders, N , is the dominant factor in the performance of the model division and furthermore specifies the cost of Table 1 under the assumptions presented in Section 5.4.2. Recall that the maximum uncertainty in A, x, is 2 where -1+2 where j = 2 N ; that the maximum uncertainty in d, 6, is 2 ; and that i and 6 determine C . Next the transformed parameters are computed for each of the four designs. The cost equation for Table 2 is evaluated for each set of trans- formed parameters , each for four different radices , to yield a total of six- teen designs. The total cost for each hybrid structure is taken to be C T1 + C M1 + C M2 + C T2* Table 13 summarizes the costs for the sixteen cases. The quantities T T a and b are defined by Equations 5.62 and 5-63, respectively, and C is defined by Equation 5-73. The terms C.. and £ are computed from Equation 3.6 with C. =50, C = 10, C _ = 6, C_ = k, and e = 5. The cost term, C m _, is 102 computed from Equations 5.33, 5-39, and 5.1+0 with the transformed parameters specified as follows : m m X = 1/16, a = 0, | max T -6+1 1=2 = 2, Arp T = 2 J 5 , Ad = 2 J ~°' Y T = l/l6, P = 2/3. Table 13. Cost Computations for Hybrid Structures Table 1 Parameters Case No. 'Tl Ml 'M2 'T2 Total V 1, j=2, 6=U, 1 1+ 17 512 332 332 10 928 T a = 27/32 2 16 17 61+8 332 295 3233 1+525 b T . 
41/32 3 61+ 17 7Qk 332 5597 81+615 9131+5 4 256 17 920 332 9331+1 178713 ,882,323 V 2, j=l+, 6=6, 5 1+ 126 972 892 2 11+ 2006 T a = 123/128 6 16 126 1220 892 76 81+3 3157 t T = 137/128 7 61+ 126 11+68 892 li+1+9 22059 25991+ 8 256 126 1716 892 2U169 h6[ 1+92530 N A= 3, j=6, 6=8 9 It 688 ikkk 1708 1 3 381+1+ T a = 507/512 10 16 688 182I+ 1708 19 212 1+1+51 b T = 521/512 11 61+ 688 218U 1708 367 5583 10530 12 256 688 25I+1+ 1708 6121 118070 129131 N A= 1+, j=8, 6=10 13 1+ 31+69 1988 2780 8237 T a = 201+3/201+8 lU 16 31+69 21+60 2780 5 1+5 8759 b T = 15 61+ 3I+69 2932 2780 85 1286 10552 2056/20I+8 16 256 3I+69 3U0I+ 2780 11+26 27I+63 385I+2 6.1+ .2 Performance versus Radix and N umber of Adders in Mult iplier 1 In computing the operating time for the hybrid structures we assume that T pREF = 3, T T2 = 2, T R = 1, T A = 3, T Q = k and T^ = 3 N A , and therefore from Equation 3.7 T Q = 3 N A + 6. 103 Table Ik presents P (Equation 3-8) and P (Equation 3.12) for the cases in Table 13. Table Ik. Performance Calculations for Hybrid Structures Case No. P (bits/delay) P (bits/delay) 1 .22 .13 2 .1*5 .21 3 .67 .27 U • 89 .32 5 • 17 .11 6 .33 .18 7 • 50 .2k 8 .67 .29 9 .13 .09 10 .27 .16 11 .40 .21 12 .53 .26 13 .11 .08 14 .22 .Ik 15 .33 .19 16 .hk ,2k 10U 6.1+.3 Cost versus Performance Table 15 merges the cost and performance (P n ) data for the hybrid structures. These results are plotted and discussed further in the next section. Table 15. Cost versus Performance for Hybrid Model Division Structures Case No. P D Times C Times (bits/delay) Increase (literals) Increase 1 .13 1.00 928 1 2 .21 1.62 H525 5 3 .27 2.08 9131+5 98 1+ .32 2.U6 1882323 2028 5 .11 1.00 2006 1.0 6 .18 1.63 3157 1.6 7 .2U 2.18 25991+ 13 8 .29 2.63 1+92530 2l+5 9 .09 1.00 381+1+ 1.0 10 .16 1.78 1+1+51 1.2 11 .21 2.33 10530 2.7 12 .26 2.88 129131 31+ 13 .08 1.00 8237 1.0 lU .111 1.75 8759 1.1 15 .19 2.37 10552 1.3 16 .2k 3.00 385I+2 k.l 7. 
SUMMARY AND CONCLUSIONS 7.1 General Summary In the summary and conclusions it is convenient to distinguish be- tween the definitive, synthetic, and analytic aspects of this study. Sections 2 and 3 are definitive. Section 2 defines the class of division techniques to be studied and Section 3 defines the measure of cost and performance to be applied. It is noted that an advantage of the model division approach is congruity with commonly used multiplication structures including the capacity to form the partial remainders using non- propagating adders or subtractors . The attendant disadvantages are the necessity to store two bits per quotient digit and the requirement for a terminal step to convert the redundant to non- redundant form. The fact that for division, unlike multiplication, the selection of the jth quotient digit cannot be straightforwardly overlapped with the formation of the jth partial remainder, prompts consideration of high-speed division techniques for the model. Furthermore, the overhead required to "call" and "return" from the model division prompts study of higher radix structures which produce several bits per call. A variable radix block structure of a class of model division schemes is proposed for study. Section k describes algorithms with which to synthesize the most complicated sub-blocks of the family of proposed quotient selectors: a combi- natorial network to produce an estimate of the reciprocal of the divisor (Table l), and a combinatorial network to generate a quotient digit when given d and rp (Table 2). Although these synthesis routines generate a logic equation definition of the structure, the intent in this study is merely to 105 106 determine the cost; essentially the number of literals in the logic equations. After the cost vs. performance behavior is sufficiently understood to permit specification of parameters of a practicable model, the synthesis routines may be applied as a first step in implementation. 
Section 5 includes the bulk of the analytic work. The section opens with a tabulation of costs for several cases synthesized by the previously defined algorithms. But since there exist many variants of the model division and since even computer synthesis in this case is expensive, the numerical results and insight are applied to hypothesize formulas rather than algorithms with which to estimate cost. The formulas take account of the ten variables of the model division. Although one of the formulas is normalized with two empirically defined quantities, it is assumed that these quantities are sufficiently constant to permit meaningful prediction of cost for cases other than those used in the normalization. In Section 6, the formulas for both cost and performance are applied to tabulate expected values of cost and performance.

The present section is an attempt to summarize the work in the previous sections, to reach some conclusions about the feasibility of the investigated quotient selection schemes, and to suggest areas for further investigation. The section is subdivided into consideration of numerical cost and performance results and of analytic results, and concludes with additional remarks about areas for further research.

7.2 Cost and Performance

Figure 20 is a graphical summary of the cost versus performance estimates tabulated in Section 6. The necessity for a five cycle semi-log plot emphasizes the extreme range of costs and disappointing cost-performance behavior. It is apparent that many of the results are negative; they indicate what not to attempt to implement.

[Figure 20. Cost versus Performance for Samples of Model Division Structures. Semi-log plot; horizontal axis: P_D (bits/delay).]
The points on the graph are taken from Tables 9, 12, and 15. Points corresponding to the same type of structure but differing in radix are connected by straight-line segments. Each of these "curves" is labeled with a Roman numeral.

Curves Ia and Ib, with points from Table 7b, are the lower and upper bounds on the cost of a Type 2 structure (direct table look-up) for divisors in the range (3/4, 9/8). Curves IIa and IIb, with points from Table 7a, are the lower and upper bounds for a similar structure with divisors in the range (1/2, 1). To a first approximation all four curves (log C) vary linearly with performance and thus

$\text{Cost} \propto 10^{k P_D}$

where k is about 18. This exponential behavior is not surprising considering that performance varies as $\log r$ (see Equation 3.12) and that cost varies as $r^2 \log r$. This latter statement is derived from Equations 5.39, 5.40, and 5.41.

The radix 4 Type 2 structure is quite practicable, requiring about ten 10-input gates to yield performance of .15 bits per logic delay. Assuming 10 ns logic, the scheme would generate 60 bits of quotient in about 4 µs. A radix 16 Type 2 structure theoretically increases performance by 5/3, consequently reducing divide time, under the same assumptions, to 2.4 µs. The cost, however, increases over 50 times.

Statements about the radix 16 structure must be qualified by the observation that, due to fan-in and fan-out restrictions, the table cannot actually be implemented in two levels of logic. Since the divisor is constant, the d portion of each prime implicant can be formed in a cascade of many logic levels without degradation of performance. But going to additional levels to form functions of rp, although cost may be reduced, will decrease performance below the ideal value assumed in Figure 20. Justification for a radix 16 Type 2 structure is discussed further in connection with a "quotient lookahead" scheme mentioned in Section 7.4.
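The divide times quoted above follow from the tabulated performance by simple arithmetic. The sketch below is illustrative only: the function names are not from the report, and the two data points are the radix 4 and radix 16 lower-bound entries of Table 9 for a = 1/2, b = 1. It also estimates the exponent k of the exponential cost relation from those two points.

```python
import math

def divide_time_us(quotient_bits, p_d, gate_delay_ns):
    """Time, in microseconds, to form quotient_bits at p_d bits per logic delay."""
    logic_delays = quotient_bits / p_d
    return logic_delays * gate_delay_ns / 1000.0

def cost_exponent(p1, c1, p2, c2):
    """Slope of log10(cost) versus performance between two tabulated points."""
    return (math.log10(c2) - math.log10(c1)) / (p2 - p1)

# Radix 4 and radix 16 Type 2 entries of Table 9 (LB, a = 1/2, b = 1).
t_radix4 = divide_time_us(60, 0.15, 10)    # about 4 microseconds
t_radix16 = divide_time_us(60, 0.25, 10)   # about 2.4 microseconds
k = cost_exponent(0.15, 87, 0.25, 4691)    # roughly 17, near the quoted 18

print(f"{t_radix4:.1f} us, {t_radix16:.1f} us, k = {k:.1f}")
```

The slope taken from a single pair of points is only a rough check on the "about 18" figure, which describes the trend of all four curves.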
Type 2 structures beyond radix 16 are too expensive to consider further.

Based upon Figure 20, curve III, it appears that a Type 1 structure is never preferable to a Type 2 structure. Although this is probably true, the Type 1 structures might be studied further with the following points in mind:

1. The structures studied here employ a rather conventional multiplier requiring one cascaded adder per two bits of multiplier. Perhaps faster multipliers may be found. It is doubtful, however, that they would be less expensive.

2. For all structures studied the estimates of the partial remainders have been converted to a conventional form. For structures requiring a transformation of rp, the assimilation is performed after the multiplication. The conversion to conventional form has been required as a concession to reducing the cost of Table 2. For Type 1 structures, Table 2 is not required and thus perhaps the redundantly represented result could be used directly by the shift gates in the full precision arithmetic unit. The elimination of the conversion is roughly equivalent to eliminating one adder from the multiplier structure.

The cost versus performance of the hybrid structures is shown in curves IV-VII, corresponding to 1 through 4 adders in the multipliers, M1 and M2. The curves initially rise slowly relative to the Type 2 curves but soon become steep as the cost of Table 2 for the higher radices dominates. The $r^2 \log r$ behavior of $C_{T2}$ is not easy to suppress. Again, based upon results shown in Figure 20, it appears that hybrid structures should not be chosen over a Type 2 structure.

It is apparent from Equation 3.12 that $P_D$ as a function of r has an upper limit of $2/T_A$. This limit is the theoretical upper bound on the performance of the iterative steps of multiplication. With $T_A = 3$, the theoretical ratio of performance of division to performance of multiplication for cases in Figure 20 ranges from 0.09 to 0.53.
For practicable cases, the range is 0.225 to 0.375.

7.3 Analytic Results

Only a few of the cases studied appear to be feasible. But negative results are valuable, and furthermore it should be kept in mind that the main purpose of this thesis is not to present an exhaustive enumeration of quotient selection schemes, but rather to develop general techniques for analysis.

It is important to appreciate the generality of the extension of Robertson's cost measurement ($s_i$) to the imprecise cases ($s'_i$ and $s''_i$). Although the estimate of cost as a function of $s'_i$ is not rigorous and includes empirically defined constants, the derivation of $s'_i$ is rigorous. The analysis developed in Section 5.3.2 leads to a succinct statement of worst-case precision requirements in rp and d ($d'' > a$), and to insight into the effect of the parameters of the model division on the cost of quotient selection.

The $s'_i$ cost measurement is applicable to structures other than those fitting within the structure of the model division shown in Figure 2. For example, as mentioned earlier, the treads of the staircase boundaries between quotient regions may be viewed as comparison constants against which rp is compared to determine in which quotient region it belongs. The divisor range is partitioned into intervals such that for each interval there is a single comparison constant between each pair of adjacent quotient regions. The comparison constants could be stored in a read only memory. A given divisor value would determine a column of comparison constants which would be read out to become one input to a set of comparators; the other input to the comparators would be rp. If $c_i$ is the comparison constant between q(i) and q(i-1), then q = k, where k is the greatest index such that $rp \ge c_k$. The number of sets of comparison constants has a lower bound of $s'_n$ and an upper bound of $s''_n$. The number of comparison constants in each set is n (assuming implementation of only the first quadrant of the P-D plot).
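The comparison-constant scheme just described can be sketched in a few lines. This is a minimal illustration, not the report's design: the column of constants is hypothetical, chosen for radix 4 (n = 2), a redundancy ratio of 2/3, and divisors in [1/2, 9/16], and only the first quadrant (rp ≥ 0) is handled. The validity check assumes that q(i-1) and q(i) overlap where (i - ρ)d ≤ rp ≤ (i - 1 + ρ)d, which is the overlap region used in Appendix A.

```python
from bisect import bisect_right
from fractions import Fraction as F

RHO = F(2, 3)  # redundancy ratio used in the report's numerical work

def constant_is_valid(i, c, d_lo, d_hi, rho=RHO):
    """True if c can separate q(i-1) from q(i) for every divisor in [d_lo, d_hi]."""
    # c must stay inside the overlap region at both ends of the divisor interval.
    return (i - rho) * d_hi <= c <= (i - 1 + rho) * d_lo

def select_digit(rp, constants):
    """Greatest k with rp >= c_k, or 0 if rp is below every constant.

    constants holds c_1 .. c_n in ascending order, so the number of
    constants that are <= rp is exactly the selected digit.
    """
    return bisect_right(constants, rp)

# Hypothetical column read out for divisors in [1/2, 9/16]: c_1 and c_2.
column = [F(1, 4), F(3, 4)]

for rp in (F(1, 8), F(1, 2), F(7, 8)):
    print(rp, "->", select_digit(rp, column))
```

A read only memory in this scheme would hold one such column per divisor interval; the binary search stands in for the bank of parallel comparators.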
Among others, the analytic results prompt the following observations:

1. There are minimum requirements for the precision in the estimates of rp and d.

2. For given precision above the minimum required, there is a limit, $s'_i$, to the minimum number of comparison constants required between q(i) and q(i-1).

3. The actual number of steps, $s_{i\,act}$, is greater than $s'_i$ due to discrete effects, i.e. due to the fact that the locations of treads and risers are restricted to discrete values.

4. The upper bound on $s_{i\,act}$, including the discrete effects, is $s''_i$.

5. Increasing precision in d and rp moves $s'_i$ closer to $s_i$ and $s_{i\,act}$ closer to $s'_i$, but by a decreasing amount.

7.4 Suggestions for Further Investigation

The following topics for further investigation have emerged in the course of this study. The order of listing does not imply any priority.

1. Compare the cost and performance of the model division approach to other division algorithms such as the Wallace algorithm [32] as implemented in the IBM 360/91 [14], and division schemes in other large machines such as the CDC 7600.

2. Consider the use of a radix 4, Type 2 structure in a pipeline arithmetic unit. Assuming that the divisors and quotients may be streamed along with the partial remainders, it appears that a set of the inexpensive radix 4, Type 2 model division structures may be used to effectively pipeline the division operation. Multiplication and division could be intermixed in the same pipeline; however, assuming synchronous control, the clock frequency is limited by the quotient selection time and thus the multiply time is degraded.

3. Consider a "quotient lookahead" scheme. Assume that each adder in a cascade of adders is capable of performing a multiplication radix $2^k$. Then the shift gates for each adder may be controlled by a model division of the same radix. If the radix of the model is greater than $2^k$ then more quotient digits are formed than can be used in forming the present partial remainder. It is conceivable, however, that as soon as they are formed they could be used to set shift gates to form the next partial remainder, thus overlapping control time. For example, if k = 2 but the model division is radix 16, control signals for the shift gates of two successive adders are generated simultaneously. If a radix 16 quotient selector is coupled to the output of every adder in the cascade, then for each addition/subtraction four bits are formed, two of which overlap with the previously formed bits. The formation of the jth partial remainder may therefore be overlapped with formation of the (j+1)th radix 4 quotient digit. After startup, the effective control time per addition would be the quotient selection time minus the add time. If the times were equal, then division could proceed at multiply speed.

4. Study the variation in cost of the entire arithmetic unit as a function of $\rho$, the redundancy ratio. Recall that $\rho$ is one variable in the equation for $s'_i$. In all numerical work produced in this study $\rho = n/(r-1) = 2/3$. The decision to keep $\rho$ constant excluded the explicit study of radix 8, 32, and 128, for which there is no integer n such that $\rho = 2/3$.

5. Study a model division structure based upon simultaneous comparisons of rp with comparison constants selected by the value of the divisor.

6. Consider the engineering details of a radix 16, Type 2 structure.

7. Program the correct algorithm (Appendix A) for producing the minimal cost definition of a Table 2 structure. Reference [34] defines the minimization algorithm. Compare the results with those produced by the QS3 algorithm (Section 4).

APPENDIX A

Algorithm for Generating Minimum Cost Sum-of-Products Definitions of the q-Regions of Table 2

1. Consider the P-D plot to be covered by a uniform grid with spacing of $\Delta d$ along the d-axis and with spacing $\Delta rp$ along the rp-axis. The intersection of each pair of grid lines is defined by the ordered pair (d, rp), where d is an integer multiple of $\Delta d$ and rp is an integer multiple of $\Delta rp$. Every pair (d, rp) is representative of full precision quantities in the ranges defined by Equations 2.11 and 2.14. A sufficient condition for the choice of $\lambda$, $\gamma$, $\alpha$, $\beta$, $\Delta rp$, $\Delta d$, $\delta$, and $\epsilon$ is that $d''$ (Equation 5.26) be greater than a, the lower bound of the divisor range. If $\Delta d$ and/or $\Delta rp$ are smaller than necessary, the excess precision is removed by minimization. However, the smaller $\Delta d$ and $\Delta rp$, the closer the boundaries between the q-regions may approach the theoretical limit, i.e. the smaller will be the discrete effects.

2. Every pair (d, rp) corresponds to a minterm, rp||d. (See page 38 for definition of the notation.)

3. Let $R_i$ be the set of minterms which are required to define q(i), i.e. which must be assigned to the output function $f_i$. Thus,

$R_i$ = {rp||d | all or any part of the area corresponding to (d, rp) lies within the area defined by the lines $rp = (i+1-\rho)d$, $rp = (i-1+\rho)d$, $d = a$, and $d = b$.}

Let $T_i$ be the set of minterms which lie completely within the overlap region between q(i) and q(i+1). Thus,

$T_i$ = {rp||d | the area corresponding to (d, rp) is completely within the area defined by the lines $rp = (i+\rho)d$, $rp = (i+1-\rho)d$, $d = a$, and $d = b$.}

Let D be the set of all minterms which correspond to pairs (d, rp) which do not represent area within the boundaries of the P-D plot, i.e. area not within any q-region.

Assume a minimization algorithm, such as described in Section 4.2.2, which will accept both a set of true minterms, Θ, and a set of don't care minterms, Δ, of a given function. The result of the minimization process is a minimal set of prime implicants, Π. Let Ω be the set of minterms implied by Π, i.e. all minterms for which the function defined by the OR of the elements of Π is true.
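The definitions of R_i and T_i can be tried out on individual grid cells. The sketch below is an illustration under stated assumptions, not the QS programs of the report: it assumes that the pair (d, rp) stands for the cell [d, d + Δd] × [rp, rp + Δrp] (the actual ranges are those of Equations 2.11 and 2.14, which are not reproduced here), that rp ≥ 0 and d > 0, and that the caller has already restricted d to the divisor range [a, b]. Because every region boundary is a line through the origin, a cell can be classified from the smallest and largest value of rp/d over the cell. All function names are hypothetical.

```python
from fractions import Fraction as F

def slope_range(d, rp, dd, drp):
    """Smallest and largest rp/d over the cell [d, d+dd] x [rp, rp+drp] (rp >= 0, d > 0)."""
    return rp / (d + dd), (rp + drp) / d

def in_T(i, d, rp, dd, drp, rho):
    """Cell lies completely inside the overlap of q(i) and q(i+1)."""
    lo, hi = slope_range(d, rp, dd, drp)
    return (i + 1 - rho) <= lo and hi <= (i + rho)

def in_R(i, d, rp, dd, drp, rho):
    """Some part of the cell lies where q(i) is the only valid digit."""
    lo, hi = slope_range(d, rp, dd, drp)
    # Strict inequalities: touching a boundary line does not count as overlap.
    return hi > (i - 1 + rho) and lo < (i + 1 - rho)

rho, step = F(2, 3), F(1, 16)
for d, rp in [(F(1, 2), F(0)), (F(1, 2), F(1, 4)), (F(1, 2), F(5, 16))]:
    labels = [f"R{i}" for i in range(3) if in_R(i, d, rp, step, step, rho)]
    labels += [f"T{i}" for i in range(2) if in_T(i, d, rp, step, step, rho)]
    print(d, rp, labels)
```

A cell that straddles the upper edge of an overlap region, such as the third one above, falls into R_1 and is therefore forced to q(1), which is one way the discrete effects mentioned in item 1 arise.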
The following is the proposed algorithm for defining the output functions, $f_i$, for i = 0, 1, ..., n.

a) Let Θ = $R_0$, Δ = $T_0$ ∪ D.

b) Execute the minimization algorithm to produce $P_0$ = Π, and construct $M_0$ = Ω. Output function $f_0$ is the OR of the elements of $P_0$.

c) For i = 1, 2, ..., n do the following: Let Θ = $R_i$ ∪ ($T_{i-1}$ − ($T_{i-1}$ ∩ $M_{i-1}$)), and Δ = $T_i$ ∪ D. Execute the minimization algorithm to produce $P_i$ = Π and construct $M_i$ = Ω. Output function $f_i$ is the OR of the elements of $P_i$.

APPENDIX B

Example of Results of QS4 and Minimization Program

Note: r = 4, n = 2, a = 1/2, b = 1; rp = p1 p2 . p3 p4 p5 p6; d = . d1 d2 d3 d4

In the following, '1' implies that the variable is present in true form; '0' implies that the variable is present in complement form; 'x' implies that the variable is absent. Variable d is deleted by inspection.

Minimal cost prime implicants for q(0):

p1 p2 p3 p4 p5 p6 d1 d2
000x00xx
0000xxxx
000xx0x1
000x0xx1

Minimal cost prime implicants for q(1):

p1 p2 p3 p4 p5 p6 d1 d2 d3 d4
0001x1x0xx
00011xx0xx
00x111x1xx
0010xxxxxx
001xx0xx1x
001x0xxxx1
01xxxxx1x
001xxxx1xx
010000xx11
0100xxx1xx
010xx0x11x
010x0xx11x

REFERENCES

[1] J. E. Robertson, "A new class of digital division methods," IRE Transactions on Electronic Computers, vol. EC-7, pp. 218-222, September 1958.

[2] T. D. Tocher, "Techniques of multiplication and division for automatic binary computers," Quart. Jour. Mech. Appl. Math., vol. 11, Part 3, pp. 364-384, 1958.

[3] C. V. Freiman, "Statistical analysis of certain binary division algorithms," Proceedings of the IRE, vol. 49, pp. 91-103, January 1961.

[4] M. Nadler, "A high speed electronic arithmetic unit for automatic computing machines," Acta Technica, no. 6, pp. 464-478, 1956.

[5] J. E. Robertson, "Methods of selection of quotient digits during digital division," File No. 663, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1965.

[6] D. E. Atkins, "The theory and implementation of SRT division," Report No. 230, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1967.

[7] D. E. Atkins, "Higher radix division using estimates of the divisor and partial remainders," IEEE Transactions on Computers, vol. C-17, no. 10, pp. 925-934, October 1968.

[8] D. E. Atkins, "Design of the arithmetic units of Illiac III: Use of redundancy and higher radix methods," IEEE Transactions on Computers, (to appear) August 1970.

[9] D. E. Atkins, "Illiac III computer system manual: Arithmetic units, vol. I," Report No. 366, Department of Computer Science, University of Illinois, Urbana, December 1969.

[10] J. E. Robertson, "A deterministic process for the design of carry-save adders and borrow-save subtractors," Report No. 235, Department of Computer Science, University of Illinois, Urbana, July 1967.

[11] R. T. Borovec, "The logical design of a class of limited carry-borrow propagation adders," Report No. 275, Department of Computer Science, University of Illinois, Urbana, Illinois, August 1968.

[12] F. A. Rohatsch, "A study of transformations applicable to the development of limited carry-borrow propagation adders," Report No. 226, Department of Computer Science, University of Illinois, Urbana, June 1967.

[13] J. E. Robertson, "The correspondence between methods of digital division and multiplier recoding procedures," Department of Computer Science Report No. 252, University of Illinois, Urbana, Illinois, December 1967.

[14] S. F. Anderson, J. G. Earle, R. E. Goldschmidt, D. M. Powers, "The IBM System/360 Model 91: Floating-point execution unit," IBM Journal of Research and Development, vol. 11, no. 1, pp. 34-53, January 1967.

[15] A. Avizienis, "Binary-compatible signed-digit arithmetic," AFIPS, Fall Joint Computer Conference, vol. 26, pp. 663-672, 1964.

[16] V. S. Burtsev, "Accelerating multiplication and division operations in high-speed digital computers," in report by The Institute of Exact Mechanics and Computing Technique, The Academy of Sciences of the USSR, Moscow, 1958.

[17] M. Combet, H. van Zonneveld, and L. Verbeek, "Computation of the base two logarithm of binary numbers," IEEE Transactions on Electronic Computers, vol. EC-14, no. 6, pp. 863-867, December 1965.

[18] K. J. Dean, "A precision code converter for reciprocals of binary numbers," The Computer Bulletin, vol. 12, no. 2, pp. 55-58, June 1968.

[19] D. Ferrari, "A division method using a parallel multiplier," IEEE Transactions on Electronic Computers, vol. EC-16, no. 2, pp. 224-226, April 1967.

[20] R. E. Gilman, "A mathematical procedure for machine division," Communications of the ACM, vol. 2, no. 4, pp. 10-12, April 1959.

[21] R. E. Goldschmidt, "Applications of division by convergence," M.S. Thesis, MIT, June 1964.

[22] Ernest F. Hall, David D. Lynch, Richard E. Young, "Generation of products and quotients using approximate binary logarithms for digital filtering applications," IEEE Transactions on Computers Repository, no. R-68-164, 1968.

[23] Jiri Klir, "A note on Svoboda's algorithm for division," Stroje Na Zpracovani Informaci (Information Processing Machines), no. 9, pp. 35-39, 1963.

[24] E. V. Krishnamurthy, "On range-transformation techniques for division," IEEE Transactions on Computers, vol. C-19, no. 2, pp. 157-160, February 1970.

[25] John N. Mitchell, Jr., "Computer multiplication and division using binary logarithms," IRE Transactions on Electronic Computers, EC-11, no. 4, pp. 512-518, August 1962.

[26] Ray G. Saltman, "Reducing computing time for synchronous binary division," IRE Transactions on Electronic Computers, vol. EC-10, no. 2, pp. 169-174, June 1961.

[27] A. Soceneantu, "Binary iterative division," (Report in Progress), Department of Computer Science, University of Illinois, Urbana, Illinois, 1970.

[28] R. Stefanelli, "A suggestion for an high-speed parallel binary divider," IEEE Transactions on Computers Repository, no. R-69-3, October 1968.

[29] A. Svoboda, "An algorithm for division," Stroje Na Zpracovani Informaci (Information Processing Magazine), no. 9, pp. 25-32, 1963.

[30] C. Tung, "A division algorithm for signed-digit arithmetic," IEEE Transactions on Computers, vol. C-17, no. 9, pp. 887-889, September 1968.

[31] R. M. Wade, "A carry-independent quarternary division scheme," IEEE Transactions on Computers Repository, no. R-68-52, November 1967.

[32] C. S. Wallace, "A suggestion for a fast multiplier," IEEE Transactions on Electronic Computers, vol. EC-13, pp. 14-17, February 1964.

[33] E. J. McCluskey, Introduction to the Theory of Switching Circuits, McGraw-Hill, New York, 1965, pp. 135-136.

[34] V. G. Tareski, "Minimization of two level switching circuits involving many variables," Ph.D. Thesis in preparation, Department of Computer Science, University of Illinois, Urbana, Illinois.

[35] Chester C. Carroll and George E. Jordan, "A fast algorithm for Boolean function minimization," Auburn University Report No. AD 680 305, December 1968.

[36] Tso-Kai Liu, "A code for zero-one integer linear programming by implicit enumeration (A Programming Manual for ILLIP)," Department of Computer Science, Report No. 302, December 1968.

[37] T. Ibaraki, et al., "An implicit enumeration program for zero-one integer programming," Department of Computer Science, Report No. 305, January 1969.

VITA

Daniel Ewell Atkins, III was born in Jacksonville, Florida on April 12, 1943. He received the B.S. degree in Electrical Engineering from Bucknell University, Lewisburg, Pa., in 1965; the M.S. degree in Electrical Engineering from the University of Illinois, Urbana, in 1967; and the Ph.D. in Computer Science from the University of Illinois in 1970.
Between 1963 and 1967 he held summer positions with the Freas-Rooke Computing Center, Bucknell University, and the U.S. Naval Ordnance Laboratory, White Oaks, Md. While attending the University of Illinois he was employed as a research assistant in the Department of Computer Science. He designed the floating point arithmetic units for the Illinois Pattern Recognition Computer (Illiac III) under the direction of Professor Bruce H. McCormick, and conducted research in the area of computer arithmetic under the direction of Professor James E. Robertson. Mr. Atkins has published papers evolving from this work in the IEEE Transactions on Computers of October 1968 and August 1970.

Mr. Atkins is a member of Tau Beta Pi, Sigma Xi, Pi Mu Epsilon, Pi Delta Epsilon, the Association for Computing Machinery, the Institute of Electrical and Electronic Engineers, and the American Association of University Professors.

Form AEC-427 (6/68), ECM 3201
U.S. ATOMIC ENERGY COMMISSION
UNIVERSITY-TYPE CONTRACTOR'S RECOMMENDATION FOR DISPOSITION OF SCIENTIFIC AND TECHNICAL DOCUMENT

AEC Report No.: COO-1018-1204 (Report No. 397)
Title: A STUDY OF METHODS FOR SELECTION OF QUOTIENT DIGITS DURING DIGITAL DIVISION
Type of document: Scientific and technical report
Recommended announcement and distribution: AEC's normal announcement and distribution procedures may be followed.
Submitted by: Daniel E. Atkins, Research Assistant, Department of Computer Science, University of Illinois
Date: May 28, 1970