
 COO-1018-1204
 
 Report No. 397 
 
 A STUDY OF METHODS FOR SELECTION OF 
 QUOTIENT DIGITS DURING DIGITAL DIVISION 
 
 by 
 
 Daniel E. Atkins 
 
 June 1970 
 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN, URBANA, ILLINOIS
 
 
 Department of Computer Science 
 University of Illinois 
 Urbana, Illinois 61801
 
 This work was submitted in partial fulfillment of the requirements
 for the degree of Doctor of Philosophy in Computer Science, June 1970,
 and was supported in part by the National Science Foundation under
 Grant Nos. US NSF GJ 812 and US NSF GJ 813, and in part by the U.S.
 Atomic Energy Commission under Contract USAEC AT(11-1-1018).
 
 
 ACKNOWLEDGMENT

 The author gratefully acknowledges the continued support of the
 Department of Computer Science during the past five years and particularly
 thanks the following professors, colleagues and employees of the department.
 
 First to his thesis adviser, Professor James E. Robertson, whose 
 guidance, encouragement, and friendship are highly valued. Second, to 
 Professor Bruce H. McCormick, whose never failing loyalty, support, and 
 enthusiasm are equally valued. 
 
 The author's colleague, V. G. Tareski, is the author of the very
 efficient prime implicant generation algorithm so essential to this work and
 mentioned in Section 4. He was also a source of encouragement and enlightening
 discussions. C. R. Baugh, T. K. Liu, and T. Ibaraki developed the program
 used to solve the massive covering problems encountered in the course of
 minimization. B. G. DeLugish assisted by offering valuable discussions
 and in the arduous task of proofreading.
 
 The final typing is the fine work of Mary Ann Davis and Betty 
 Gunsalus. The excellent drawings were done by Stanislav Zundo and the 
 equally excellent offset printing is the work of Dennis Reed. 
 
 And finally, acknowledgment is due Miss Peppermint Patty whose 
 incisive comments, presented below, have been a source of comfort in times of 
 confusion. 
 
 I LIKE TWOS THE BEST...THEY'RE
 SORT OF GENTLE. THREES AND
 FIVES ARE MEAN, BUT A FOUR IS
 ALWAYS PLEASANT... I LIKE SEVENS
 AND EIGHTS, TOO, BUT NINES ALWAYS
 SCARE ME... TENS ARE GREAT...

 HAVE YOU DONE THOSE DIVISION
 PROBLEMS FOR TOMORROW?

 NOTHING SPOILS NUMBERS FASTER
 THAN A LOT OF ARITHMETIC!
 
 © 1968 - United Features Syndicate 
 
 
 TABLE OF CONTENTS

 1. INTRODUCTION
    1.1 Background
    1.2 Present Work
 2. DEFINITION OF THE DIVISION PROCEDURE
    2.1 Formal Definition of the Full Precision Division
    2.2 Graphical Representation of the Division Procedure
    2.3 Formal Definition of the Quotient Selection Procedure
    2.4 Physical Model of the Quotient Selection Mechanism
 3. DEFINITION OF COST AND PERFORMANCE
    3.1 Preliminary Remarks
    3.2 Definition of Cost
       3.2.1 Preliminary Remarks
       3.2.2 Structure for Finding Cost of Table 2
       3.2.3 Structure for Finding Cost of Table 1 and the Multipliers
    3.3 Definition of Performance
       3.3.1 Performance of the Model Division
       3.3.2 Performance of the Full Precision Division
 4. ALGORITHMS FOR SYNTHESIS AND ANALYSIS
    4.1 Preliminary Remarks
    4.2 Deriving a Minimal Cost Design for Table 2
       4.2.1 Defining the Output Functions
       4.2.2 Minimizing the Output Functions
    4.3 Deriving a Minimal Cost Design for Table 1
       4.3.1 Defining the Output Functions
       4.3.2 Minimizing the Output Functions
 5. RESULTS FROM DESIGN PROGRAMS
    5.1 Preliminary Remarks
    5.2 Numerical Results from Design Programs
       5.2.1 Cost of Table 2 for Type 2 Structure
       5.2.2 Cost of Table 1 for Type 1 Structure
    5.3 Analytic Results Concerning Cost of Table 2
       5.3.1 Preliminary Remarks
       5.3.2 Definition of sᵢ, sᵢ', and sᵢ''
       5.3.3 An Estimate of Cost as a Function of sᵢ'
       5.3.4 Discrepancies
    5.4 Analytic Results Concerning Cost of Table 1
       5.4.1 Preliminary Remarks
       5.4.2 Worst Case Bounds on Transformed Parameters
       5.4.3 An Estimate of the Cost of Table 1
 6. ESTIMATES OF COST AND PERFORMANCE
    6.1 Preliminary Remarks
    6.2 Type 2 Structures
       6.2.1 Cost versus Radix
       6.2.2 Performance versus Radix
       6.2.3 Cost versus Performance
    6.3 Type 1 Structures
       6.3.1 Cost versus Radix
       6.3.2 Performance versus Radix
       6.3.3 Cost versus Performance
    6.4 Hybrid Structures
       6.4.1 Cost versus Radix and Number of Adders in Multiplier 1
       6.4.2 Performance versus Radix and Number of Adders in Multiplier 1
       6.4.3 Cost versus Performance
 7. SUMMARY AND CONCLUSIONS
    7.1 General Summary
    7.2 Cost and Performance
    7.3 Analytic Results
    7.4 Suggestions for Further Investigation
 APPENDIX
    A. Algorithm for Generating Minimum Cost Sum-of-Products Definitions of the q-Regions of Table 2
    B. Example of Results of QS4 and Minimization Program
 REFERENCES
 VITA
 
 
 LIST OF TABLES

 1. Equations Defining the Regions of Figure 1
 2. Summary of Cost Calculations for Table 2 with r=16, n=10, a=1/2, b=1, γ=1/16, λ=1/16, α=0, β=1/256
 3. Summary of Cost Calculations for Table 2 with r=16, n=10, a=3/4, b=9/8, γ=1/16, λ=1/16, α=0, β=1/128
 4. Summary of Cost Calculations for Table 1 with a=1/2, b=1, γ=1/16, λ=1/16, α=0
 5. Results of Least Squares Fit of M'(i), F'(i), and C'(i) for Data from Table 2
 6. Comparison of Results from Estimating Equations and the QS3 Program for Δrp = 1/32
 7. Cost of Table 2 versus Radix
 8. Performance of Type 2 Structure versus Radix
 9. Cost Bounds versus Performance for Type 2 Model Division
 10. Cost of Type 1 Structure versus Radix
 11. Performance of Type 1 Structure versus Radix
 12. Cost versus Performance for Type 1 Model Division
 13. Cost Computations for Hybrid Structures
 14. Performance Calculations for Hybrid Structures
 15. Cost versus Performance for Hybrid Model Division Structures
 
 
 LIST OF FIGURES

 1. P-D Plot with r=4, n=2
 2. Generalized Structure of Model Division (Quotient Selector)
 3. Network Definition of Table 2
 4. Network Definition of Table 1
 5. Structure of Multipliers
 6. Portion of P-D Plot Illustrating Segmentation of rp-line
 7. Portion of a P-D Plot Illustrating Constraints in Finding Divisor Transition Interval
 8. Flowchart of QS3 Program
 9. Portion of a P-D Plot Illustrating Constraints in Finding A(d)
 10. Flowchart of QS4 Program
 11. Cost of Implementing q(i) Region vs. i for Data in Table 2
 12. Graphical Interpretation of sᵢ
 13. Graphical Interpretation of sᵢ'
 14. Model of the q(i) Region Used in Approximating Effects of Minimization
 15. M'(i) versus s
 16. F'(i) versus s
 17. C'(i) versus s
 18. C(i) versus s for Δrp=1/16 and Δrp=1/32
 19. Geometry for Derivation of Estimates of d
 20. Cost versus Performance for Samples of Model Division Structures
 
A STUDY OF METHODS FOR SELECTION OF 
 QUOTIENT DIGITS DURING DIGITAL DIVISION 
 
 Daniel Evell Atkins, III, Ph.D. 
 
 Department of Computer Science 
 
 University of Illinois, 1970 
 
 This study concerns a class of non-restoring division schemes in
 which redundancy is introduced into the representation of the quotient, thereby
 permitting quotient digits to be selected from highly truncated versions of
 the divisor and partial remainders. The mechanism for selection of quotient
 digits is a limited precision model of the full precision division, which it
 controls by the generation of simple microprogram instructions. A major
 advantage of this approach to division is a high degree of congruity with
 commonly used multiplication structures, including those making use of limited
 propagation adder-subtracters, for example, carry-save adders.
 
 A cost versus performance analysis for a large class of quotient
 selection mechanisms (model divisions) is developed. The class is defined in
 terms of a block diagram and a set of ten design parameters. By varying the
 structure of the sub-blocks and the values of the parameters, the model
 division scheme ranges from that of forming quotient digits by multiplying the
 dividend by the inverse of the divisor, to that of a direct table look-up of
 the quotient digit. So-called hybrid structures exist between these two cases.
 Algorithms are described which synthesize near minimal cost realizations of the
 most complicated sub-blocks: a combinatorial logic network to produce appropriate
 estimates of the reciprocal of the divisor, and a combinatorial logic
 network to generate a quotient digit directly as a function of the bits in the
 estimates of the divisor and partial remainder. Formulas are given for the
 cost of the remaining sub-blocks. For a given type of structure, the primary
 determinant of performance is the radix of the model division, $r = 2^k$, where
 $k$ is the number of bits of quotient produced per access to the model division.
 A FORTRAN implementation of the synthesis routines is used to obtain
 the near minimal cost for several different structures and sets of design
 parameter values. The numerical results, together with the insight gained in
 obtaining them, are used to hypothesize a formula for minimal cost. The
 analysis includes a multi-variable expression which relates cost to the radix
 of the model division, r, the degree of redundancy in the quotient representation,
 and the magnitude and direction of the maximum truncation error in the
 divisor and partial remainder estimates. The cost formulas, together with
 easily derived performance formulas, are used to tabulate expected cost and
 performance for a variety of structures. It is found that for most schemes
 the cost varies exponentially with performance and, consequently, that many of
 the higher radix schemes are not practicable. A radix-4 direct table look-up,
 however, can be built with about ten 10-input gates and, assuming 10 ns
 logic, could produce 60 bits of quotient in about 4 μs. The study is concluded
 with suggestions for further investigation.
 
1. INTRODUCTION

 1.1 Background
 
 Since division is the mathematical inverse of multiplication, one
 might hope that the cost of implementing both a multiplication and a division
 operation would not be much different from the cost of implementing multiplication
 alone. Furthermore, for a given operand length, one might expect the
 execution times for the two operations to be about the same. In actual practice
 this hope has not been realized, largely because division, unlike
 multiplication, is inherently a trial-and-error process.

 In multiplication, a product is accumulated by the successive
 addition of multiples of the multiplicand to a partial product. The selection
 of which multiple to add depends upon a digit, radix r, of the multiplier,
 a quantity which is known a priori.
 
 Now consider a recursive relationship for a class of division
 techniques based upon subtraction. This relationship is defined by

 $$p_{j+1} = r p_j - q_{j+1} d, \qquad j = 0, 1, \ldots, m-1 \qquad (1.1)$$

 in which

 $p_0$ is the dividend,
 $p_j$ is the partial remainder used in the $j$th recursion,
 $p_m$ is the remainder,
 $j$ is the recursion index,
 $q_j$ is the $j$th quotient digit,
 $d$ is the divisor, and
 $r$ is the radix.
 
In forming the partial remainder $p_{j+1}$, a multiple of the divisor
 is subtracted from the previous partial remainder shifted left by one digit
 position. The selection of which multiple to subtract depends upon a
 digit of the quotient; but it is precisely this quotient digit that we must
 compute. It is not known a priori. As it stands, this relationship for division
 does not adequately specify how $q_{j+1}$ is selected until we add a restriction
 such as $|p_j| < |d|$. The important point here is that division not
 only requires an addition or subtraction, as in multiplication, but also the
 selection of a quotient digit such that the value of the contents of the accumulator
 after the subtraction is within a specified range. If it is not
 within this range, then some correction is required.
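To make the trial-and-error character concrete, the sketch below runs recurrence (1.1) in exact rational arithmetic with a conventional non-negative digit set 0, ..., r-1. That digit set is an assumption for illustration; it is not the redundant, signed digit set used later in this study. Each digit is found by trying multiples of the divisor against the full-precision shifted partial remainder, which is the step multiplication never needs. The function name `trial_division` is hypothetical.

```python
from fractions import Fraction


def trial_division(p0, d, r, m):
    """Run recurrence (1.1) for m steps with the digit set 0..r-1.

    Assumes 0 <= p0 < d, so every partial remainder stays in [0, d).
    Returns the quotient digits q_1..q_m and the final remainder p_m.
    """
    p, d = Fraction(p0), Fraction(d)
    assert 0 <= p < d
    digits = []
    for _ in range(m):
        shifted = r * p  # shifted partial remainder r*p_j
        # Trial and error: the largest multiple of d that can be subtracted
        # without making the new partial remainder negative.
        q = max(k for k in range(r) if shifted - k * d >= 0)
        p = shifted - q * d  # p_{j+1} = r*p_j - q_{j+1}*d
        digits.append(q)
    return digits, p


digits, rem = trial_division(Fraction(1, 3), Fraction(3, 4), 4, 6)
print(digits, rem)  # prints [1, 3, 0, 1, 3, 0] 1/3
```

Unrolling (1.1) shows that the result always satisfies $p_0 = d\sum_{j=1}^{m} q_j r^{-j} + r^{-m} p_m$.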
 
 Several effective techniques have been developed for accelerating 
 the execution of multiplication. Foremost among them are the following: 
 
 1. Use of adders or subtracters which postpone carry or 
 borrow propagation until a terminal step. 
 
 2. The use of a higher radix (greater than 2) so that 
 several bits of the multiplier are retired in one 
 iteration. 
 
 3. The introduction of redundancy* into the multiplier 
 by multiplier recoding. 
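The first of these techniques can be illustrated with a carry-save adder, sketched below on Python integers. Three operands are reduced to a sum word and a carry word by independent full adders, one per bit position, so nothing propagates between positions. A single carry-propagate addition is deferred to the terminal step. The helper names are hypothetical. The shift-and-add multiplier is only a minimal binary example and is not a structure taken from this study.

```python
def carry_save_add(a, b, c):
    """Reduce three non-negative integers to a (sum, carry) pair.

    Every bit position is an independent full adder: the sum word holds the
    XOR of the three input bits and the carry word holds their majority,
    moved one position left. No carry travels between positions.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def multiply_carry_save(x, y):
    """Binary shift-and-add multiplication of non-negative integers."""
    s, c = 0, 0
    shift = 0
    while y:
        if y & 1:
            # Accumulate the shifted multiplicand without carry propagation.
            s, c = carry_save_add(s, c, x << shift)
        y >>= 1
        shift += 1
    return s + c  # the single carry-propagate addition at the terminal step


print(multiply_carry_save(123, 45))  # prints 5535
```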
 
 The success of such techniques in multiplication raises the question 
 of their applicability to division. A significant contribution to the answer 
 was made with the discovery of SRT division. 
 
 * Redundancy or redundant representation refers to a number representation in
 which each radix r digit may assume more than r different values.
 
In the middle 1950s, D. Sweeney of IBM, J. E. Robertson of the
 University of Illinois [1]*, and T. D. Tocher [2] of Imperial College,
 London, independently discovered a binary division technique especially suited
 for implementation in an electronic digital computer. SRT division was named
 by C. V. Freiman of IBM in a paper discussing its statistical properties [3],
 although an example of the technique may actually have been presented by
 Nadler [4] in a 1956 paper describing a computer designed and built by the
 Institute of Mathematical Machines of the Czechoslovak Academy of Science
 under the direction of Dr. Antonin Svoboda. Whether or not the Nadler work
 is equivalent to SRT is obscured by the fact that it is discussed in conjunction
 with a stored-carry adder and accumulator.
 
 The basis of SRT division is the discovery that introducing redundancy
 into the representation of the quotient yields more freedom in the
 selection of a quotient digit at each step of the recursion. In SRT division
 this freedom is used to increase the probability of a zero quotient digit, for
 which the next partial remainder is produced merely by a shift rather than by
 a subtraction followed by a shift. This flexibility is in contrast to conventional
 restoring or non-restoring division, which requires a full-precision
 subtraction for each quotient bit generated. Even though we are considering
 a binary number system, the digit values for SRT division are 1, 0, and 1̄ (the
 overbar denotes negation, i.e., 1̄ = -1), and thus we have redundancy.
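The sketch below shows how the redundant digit set 1̄, 0, 1 lets each binary quotient digit be chosen from a coarse estimate of the shifted partial remainder. It assumes a normalized divisor $1/2 \le d < 1$ and $|p_j| \le d$, and it truncates $2p_j$ toward minus infinity to two fractional bits. The thresholds 1/4 and -1/2 were derived here for exactly those assumptions and are not presented as the original SRT comparison constants. Whenever the zero digit is chosen, the step is a pure shift. The function name is hypothetical.

```python
import math
from fractions import Fraction


def radix2_redundant_division(p0, d, m):
    """Radix-2 division with quotient digits -1, 0, 1 chosen from an estimate.

    Assumes 1/2 <= d < 1 and |p0| <= d. The estimate is 2*p_j truncated toward
    minus infinity to two fractional bits, so est <= 2*p_j < est + 1/4.
    """
    p, d = Fraction(p0), Fraction(d)
    assert Fraction(1, 2) <= d < 1 and abs(p) <= d
    digits = []
    for _ in range(m):
        shifted = 2 * p
        est = Fraction(math.floor(shifted * 4), 4)
        if est >= Fraction(1, 4):
            q = 1    # shifted >= 1/4 >= 0, so shifted - d stays within [-d, d]
        elif est >= Fraction(-1, 2):
            q = 0    # -1/2 <= shifted < 1/4, so |shifted| <= 1/2 <= d: shift only
        else:
            q = -1   # shifted < -1/2 < 0, so shifted + d stays within [-d, d]
        p = shifted - q * d
        assert abs(p) <= d  # |p_{j+1}| <= |d| (rho = 1 when r = 2, n = 1)
        digits.append(q)
    return digits, p
```

For $p_0 = 1/3$ and $d = 3/4$, the second and third digits are zeros, so those steps need no subtraction.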
 
 In 1965, Robertson [5] extended the concepts inherent in SRT
 division to higher radix division schemes. The fundamental tenets of the
 method remain, namely, that by introducing redundancy into the representation
 of the quotient, the selection of a quotient digit at each step of the recursion
 need not be precise. For the higher radix cases, a larger set of quotient
 digits is necessary, and thus the probability of a zero quotient digit
 is reduced to the extent that adder bypass no longer yields significant speed
 improvement. However, the redundancy may still be put to advantage: it
 permits the selection of a quotient digit based only upon high-order digits
 of the divisor and high-order digits of the shifted partial remainder.

 *Numbers in brackets refer to entries under References.
 
 In reference [5], Robertson introduces the notion of a quotient
 selection mechanism with inputs consisting of estimates of the divisor and
 shifted partial remainder. He notes that the mechanism for selection of quotient
 digits may be thought of as a limited precision model of the full
 precision division. The procedures in the model need not be the same as the
 procedure of the full precision scheme which it controls. The model division
 generates simple microprogram instructions to the full-precision unit. His
 paper also presents an indirect, relative measure of the cost of selection of
 quotient digits.
 
 The author's 1967 Master's thesis is based largely upon Robertson's
 work as described in references [1] and [5]. The complete thesis, including
 an example of an actual implementation of a model division scheme, is
 available in report form [6]; the more theoretical aspects of the work are
 available in journal form [7]. Implementation is also discussed in a more
 recent report in conjunction with the development of the arithmetic units of
 the Illiac III Computer [8], [9].
 
 The author's paper [7] is largely tutorial. It presents a detailed
 review of Robertson's proof of the validity of the class of division techniques
 to which the model division approach is applicable. The proof will not
 be repeated in the present work. The paper also describes a graphical representation,
 the so-called P-D plot, suggested by C. V. Freiman [5], which is
 useful in describing the division procedure, and then develops expressions
 for the maximum number of bits of the divisor and partial remainder which must
 be inspected in order to determine a correct quotient digit for a given radix,
 a given lower limit on the divisor, and a given amount of redundancy* in the
 representation of the quotient. These expressions, which provide a worst-case
 measure of cost, also account for redundancy in the representation of
 the partial remainder such as that produced by a member of the family of carry-save
 adders or borrow-save subtracters [10], [11], [12].
 
 We are now in a position to consider the design of division schemes 
 which are highly compatible with multiplication structures. The model divi- 
 sion determines which multiples of the divisor are to be combined with the 
 partial remainder. In this respect it is analogous to the multiplier recoder 
 and may be thought of as a quotient recoder. Multiplier recoding logic is 
 usually entirely combinatorial and grows in complexity only linearly with the 
 radix. The model division is complicated by the fact that the quotient digit 
 is a function of both the divisor and the partial remainder and the fact that 
 the partial remainder, unlike the divisor or multiplier, is not constant 
 throughout a given operation. An analysis of the growth of the complexity of 
 a model division with increase of radix is one aspect of this thesis work. 
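The analogy can be made concrete with the familiar radix-4 (modified Booth) multiplier recoding, which yields digits in {-2, ..., 2}, the same digit set as a radix-4 quotient with n = 2. This is standard multiplier recoding rather than material from this study, and the function name is hypothetical. Each recoded digit depends only on three fixed bits of the multiplier, whereas a quotient digit depends on the divisor and on a partial remainder that changes at every step.

```python
def booth_radix4_digits(y, bits):
    """Recode a bits-bit two's-complement multiplier y into radix-4 digits.

    Digit k depends only on bits 2k+1, 2k and 2k-1 of y (bit -1 is taken as 0):
        digit_k = -2*y[2k+1] + y[2k] + y[2k-1],  so digit_k is in {-2, ..., 2}.
    Assumes bits is even and -2**(bits-1) <= y < 2**(bits-1).
    Returns digits least significant first; y == sum(digit_k * 4**k).
    """
    def bit(i):
        # Python's >> on negative ints acts like an infinitely sign-extended
        # two's-complement shift, so this reads two's-complement bit i.
        return (y >> i) & 1 if i >= 0 else 0

    return [-2 * bit(2 * k + 1) + bit(2 * k) + bit(2 * k - 1)
            for k in range(bits // 2)]


print(booth_radix4_digits(-3, 4))  # prints [1, -1]
```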
 
 But despite these complications, the strong analogy between multiplier
 recoding and the concept of the model division leads to a division
 structure which is potentially highly compatible with a given multiplication
 scheme. The difference in execution time between the iterative portion
 of multiplication and that of division is essentially the difference between the total
 time required to recode the multiplier and that required to recode the quotient. The
 bulk of the logic accounting for the difference between the cost of a multiplier and
 the cost of a multiplier-divider may then be associated with the cost of
 implementing the model division.

 *A measure of redundancy will be defined later in this study.

 Robertson has made a formal correspondence between multiplier recodings and
 quotient recodings produced by SRT division. See Ref. [13].
 
 1.2 Present Work 
 
 With this background in mind, we now turn to an introduction to the 
 present work. Section 2 begins by defining a class of full-precision multi- 
 plication-division structures. We then define a rather general block 
 structure of a quotient selection mechanism suitable for use as a model 
 division. The parameters of the model include the radix, the magnitude of 
 the largest quotient digit, the range of the divisor, and the truncation 
 error in the estimates of the divisor and partial remainder. 
 
 The flexibility of the model division approach and the generality
 of the model proposed in Section 2 offer a large number of design possibilities.
 A major goal of this work is to investigate the cost versus performance
 of various designs and to attempt to extract analytic results. Such an
 attempt requires the definition of measures of cost and performance. A
 useful cost measurement should, in some sense, be minimal, and therefore we
 must consider a minimization criterion and a minimization algorithm. These
 topics are discussed in Section 3.
 
 The first approach taken to determining the cost and performance of
 various quotient selectors is that of computer-aided generation of a specific
 design followed by analysis. In Section 4, algorithms are described which,
 when supplied with parameter values, will generate logic definitions of the
 sub-blocks of the model. Most of the logic will be defined in a minimal
 sum-of-products form which could serve as input to a logic design program
 customized for a given class of logic.
 
 To this point we will have developed a mechanism for generating 
 and comparing various designs for a model division. The approach has been 
 one of computer-aided design followed by computer-aided minimization. The 
 results from the computer work are tabulated in Section 5. Although the 
 design and minimization programs are quite efficient, the large number of 
 design possibilities together with the large number of terms in the logic 
 equations for the higher radix models strongly discourages an exhaustive 
 analysis. An additional result described in Section 5 has been insight which 
 led to development of analytic expressions for the cost of a structure. 
 
 Section 6 is a tabulation of estimates of cost and performance
 based upon the equations and computer-generated results described in
 Section 5. The final section, Section 7, is a summary and some conclusions as to the
 relative merits of various members of the family of model division schemes considered.
 The section includes a list of suggestions for further investigation.
 
2. DEFINITION OF THE DIVISION PROCEDURE 
 
 2.1 Formal Definition of the Full Precision Division 
 
 The members of the class of division algorithms which may be employed
 to perform the full-precision division are those defined by the
 recursive relationship (1.1) and the list of restrictions given below. The
 recursive relationship is repeated here for convenience.
 
 $$p_{j+1} = r p_j - q_{j+1} d, \qquad j = 0, 1, \ldots, m-1 \qquad (1.1)$$

 in which

 $p_0$ is the dividend,
 $p_j$ is the partial remainder used in the $j$th recursion,
 $p_m$ is the remainder,
 $j$ is the recursion index,
 $q_j$ is the $j$th quotient digit (radix $r$),
 $d$ is the divisor, and
 $r$ is the radix.

 The quantity $rp_j$ is referred to as the shifted partial remainder.
 
 Restrictions which apply are as follows:

 1. Allowable quotient digits are

    $$-n, -n+1, -n+2, \ldots, 0, 1, 2, \ldots, n,$$

    where $n$ is an integer such that $n \ge (r-1)/2$. (2.1)

 2. The dividend, $p_0$, must be in the range defined by

    $$|p_0| \le \rho |d| \qquad (2.2)$$

    where $\rho = n/(r-1)$. (2.3)

 3. The divisor must be within a given range, i.e., the
    quantities $a$ and $b$ must be defined such that

    $$a \le |d| \le b. \qquad (2.4)$$

 4. Every quotient digit $q_{j+1}$, for $j$ from 0 through $m-1$,
    must be chosen such that $p_{j+1}$, as defined by (1.1), is
    within the range

    $$|p_{j+1}| \le \rho |d|. \qquad (2.5)$$

    The derivation of these restrictions is given in
    [6] and [7]. Note that the forms of $rp_j$ and $d$ have
    not been limited to non-redundant representations.
    They may be in forms such as those produced by carry-save
    adders or borrow-save subtracters.
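A full-precision sketch of recurrence (1.1) under restrictions (2.1) through (2.5) follows. For illustration only, it picks each digit by rounding $rp_j/d$ to the nearest integer and clipping to $[-n, n]$. Rounding leaves $|p_{j+1}| \le |d|/2 \le \rho|d|$, and clipping at $\pm n$ is safe because $r\rho - n = \rho$. This exact rule is an assumption of the sketch, not the estimate-based selection the study is about. The divisor range (2.4) is not checked, and the function name is hypothetical.

```python
from fractions import Fraction


def redundant_division(p0, d, r, n, m):
    """Recurrence (1.1) in exact arithmetic with the signed digits -n..n.

    Each digit is r*p_j/d rounded to the nearest integer and clipped to
    [-n, n]; this full-precision rule is only an illustration.
    """
    p, d = Fraction(p0), Fraction(d)
    rho = Fraction(n, r - 1)           # (2.3)
    assert 2 * n >= r - 1              # (2.1): n >= (r - 1)/2
    assert abs(p) <= rho * abs(d)      # (2.2)
    digits = []
    for _ in range(m):
        shifted = r * p
        q = max(-n, min(n, round(shifted / d)))
        p = shifted - q * d            # (1.1)
        assert abs(p) <= rho * abs(d)  # (2.5) holds at every step
        digits.append(q)
    return digits, p
```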
 
 2.2 Graphical Representation of the Division Procedure 
 
 This division procedure may be defined graphically with a construction
 suggested by C. V. Freiman [5]. The basis for the construction is
 the recursive relationship (1.1) together with the range restriction (2.5).
 The figure is essentially a plot of partial remainder versus divisor values
 and is thus designated a P-D plot.
 
 
 Solving the recursive relationship for $rp_j$ yields $rp_j = p_{j+1} + q_{j+1} d$.
 For a fixed quotient digit, the upper limit of $rp_j$ as a function of the
 divisor $d$ occurs when $p_{j+1}$ is maximum, i.e., when $p_{j+1} = \rho d$, and thus

 $$rp_{j,\max} = (\rho + q_{j+1}) d. \qquad (2.6)$$

 Likewise the lower limit is defined by

 $$rp_{j,\min} = (-\rho + q_{j+1}) d. \qquad (2.7)$$

 These linear functions of $d$ may be plotted as a family of curves with $q_{j+1}$ as
 a parameter ranging from $-n$ through $+n$ in steps of 1. The area between
 $rp_{j,\max}$ and $rp_{j,\min}$ for a given $q_{j+1} = i$ will be denoted the "q(i) region."
 
 For given r, n, a, and b, the division procedure is specified by
 the corresponding P-D plot. A given value of $d$ and $rp_j$ will specify a point
 in a q(i) region. The quotient digit $q_{j+1}$ is therefore $i$ and is used in
 forming $p_{j+1}$.
 
 Figure 1 is an example of a P-D plot with r = 4, n = 2, a = 1/2,
 and b = 1. The equations for the lines denoted 2', 2, etc. are defined in
 Table 1. Note that as a consequence of the redundancy introduced into the
 representation of the quotient, there is overlap between adjacent quotient
 regions. Some pairs $(d, rp_j)$ will specify a point for which either
 $q_{j+1} = i$ or $q_{j+1} = i - 1$ is a valid choice. It is this overlap which permits
 quotient digit selection to be made on the basis of estimates of the full
 precision divisor and shifted partial remainder.
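Equations (2.6) and (2.7) can be tabulated directly. The sketch below assumes a positive divisor. It returns the lower and upper slopes of each q(i) region and the set of valid digits for a point $(d, rp_j)$. For r = 4 and n = 2 the slopes reproduce the entries of Table 1, and a point in an overlap yields two digits. The function names are hypothetical.

```python
from fractions import Fraction


def region_bounds(r, n):
    """Slopes (lower, upper) of each q(i) region, from (2.7) and (2.6).

    For a positive divisor d the q(i) region is (i - rho)*d <= rp <= (i + rho)*d,
    so both boundaries are straight lines through the origin.
    """
    rho = Fraction(n, r - 1)
    return {i: (i - rho, i + rho) for i in range(-n, n + 1)}


def valid_digits(rp, d, r, n):
    """The set I of digits whose q(i) region contains the point (d, rp)."""
    return {i for i, (lo, hi) in region_bounds(r, n).items()
            if lo * d <= rp <= hi * d}


print(valid_digits(Fraction(1, 2), Fraction(3, 4), 4, 2))  # prints {0, 1}
```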
 
 2.3 Formal Definition of the Quotient Selection Procedure 
 
 The quotient selection mechanism may be defined as a device that 
 
 
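 [Figure 1. P-D Plot with r=4, n=2]

 Designation    $q_{j+1}$    Equation $rp_j = \pm\tfrac{2}{3}d + q_{j+1}d$
 in Figure 1
 2'             2            $rp_j = \tfrac{2}{3}d + 2d = \tfrac{8}{3}d$
 2              2            $rp_j = -\tfrac{2}{3}d + 2d = \tfrac{4}{3}d$
 1'             1            $rp_j = \tfrac{2}{3}d + d = \tfrac{5}{3}d$
 1              1            $rp_j = -\tfrac{2}{3}d + d = \tfrac{1}{3}d$
 0'             0            $rp_j = \tfrac{2}{3}d$
 0              0            $rp_j = -\tfrac{2}{3}d$
 1̄'             -1           $rp_j = \tfrac{2}{3}d - d = -\tfrac{1}{3}d$
 1̄              -1           $rp_j = -\tfrac{2}{3}d - d = -\tfrac{5}{3}d$
 2̄'             -2           $rp_j = \tfrac{2}{3}d - 2d = -\tfrac{4}{3}d$
 2̄              -2           $rp_j = -\tfrac{2}{3}d - 2d = -\tfrac{8}{3}d$

 Table 1. Equations Defining the Regions of Figure 1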
 
 
 when given estimates of the divisor and shifted partial remainder of "suffi- 
 cient" precision, will produce a quotient digit, i, such that restriction 
 (2.5) is satisfied. The definition of sufficient precision is given in 
 the following. 
 
 With a, b, n, and r given, the P-D plot is specified. Let D be the
 set of all divisor values for a given operand length and range specified by
 (a, b). Let P be the set of all values of allowable shifted partial remainders.
 The area of the P-D plot is the Cartesian product of P and D, i.e., the
 area is the set

 $$P \times D = \{(rp, d) \mid rp \in P \text{ and } d \in D\}. \qquad (2.8)$$
 
 Every element of $P \times D$ is contained in one or more q(i) regions;
 thus each element implies a set $I = \{i \mid (rp, d) \text{ is within the q(i) region}\}$.
 In Figure 1, every pair (d, rp) is contained in either one or two q(i)
 regions. This will be the case for all examples discussed in this study;
 however, for $\rho = n/(r-1)$ greater than 1, a given (d, rp) would be contained
 within two or more q(i) regions.
 
 The inputs to the quotient selection mechanism are estimates of the
 divisor and shifted partial remainder. Let $\hat{d}$ and $\widehat{rp}$ denote these estimates,
 respectively, and let $Q(\widehat{rp}, \hat{d})$ be the output of the quotient selection
 mechanism (a quotient digit) for given estimates $(\widehat{rp}, \hat{d})$.

 The sets of $\widehat{rp}$ and $\hat{d}$ values are of sufficient precision, and the
 quotient selection procedure is correct, if for every pair $(rp, d) \in P \times D$
 there exists an ordered pair of estimates $(\widehat{rp}, \hat{d})$ such that
 $Q(\widehat{rp}, \hat{d}) = i$, where $i$ belongs to the set I implied by (rp, d).
 
 
 In actual practice, $\hat{d}$ and $\widehat{rp}$ are formed by uniformly truncating, or
 truncating and rounding, $d$ and $rp$, respectively. Assume that a binary
 representation of $d$ is truncated between positions $\delta$ and $\delta + 1$ to the right of the
 binary point, and that a binary representation of $rp$ is truncated between
 positions $\epsilon$ and $\epsilon + 1$ to the right of the binary point. Let

 $$\Delta d = 2^{-\delta}, \quad\text{and} \qquad (2.9)$$
 $$\Delta rp = 2^{-\epsilon}. \qquad (2.10)$$

 The $\hat{d}$ values are therefore integer multiples of $\Delta d$, and the
 $\widehat{rp}$ values are integer multiples of $\Delta rp$. A given value of $\hat{d}$ is
 representative of the range of full precision divisor values given by

 $$\hat{d} - \alpha \le d \le \hat{d} + \beta, \qquad (2.11)$$

 where

 $$\alpha = \alpha' \Delta d \quad\text{and} \qquad (2.12)$$
 $$\beta = \beta' \Delta d. \qquad (2.13)$$

 Similarly, $\widehat{rp}$ is representative of the full precision shifted partial
 remainders in the range

 $$\widehat{rp} - \lambda \le rp \le \widehat{rp} + \gamma, \qquad (2.14)$$

 where

 $$\lambda = \lambda' \Delta rp \quad\text{and} \qquad (2.15)$$
 $$\gamma = \gamma' \Delta rp. \qquad (2.16)$$

 The quantities $\alpha'$, $\beta'$, $\lambda'$, and $\gamma'$ are in the range 0 to 2 and depend
 upon the truncation procedure and the form of representation of $d$ and $rp$.
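The sketch below illustrates where these uncertainty bounds come from. It assumes non-negative values and truncation toward minus infinity, which is one of the procedures the text allows. A conventionally represented divisor truncated after $\delta$ bits satisfies $\hat d \le d < \hat d + \Delta d$, so $\alpha' = 0$ and $\beta' = 1$. If the shifted partial remainder is held as a carry-save pair and each component is truncated separately, the two truncation errors add, so $\lambda' = 0$ and $\gamma' = 2$. The helper names are hypothetical.

```python
import math
import random
from fractions import Fraction


def truncate(x, frac_bits):
    """Truncate x toward minus infinity to frac_bits bits right of the binary point."""
    step = Fraction(1, 2 ** frac_bits)
    return math.floor(x / step) * step


def carry_save_estimate(s, c, frac_bits):
    """Estimate s + c from a carry-save pair by truncating each component separately."""
    return truncate(s, frac_bits) + truncate(c, frac_bits)


def max_errors(trials=1000, frac_bits=4, seed=1):
    """Largest observed (value - estimate) for plain and carry-save truncation."""
    rng = random.Random(seed)
    worst_plain = worst_cs = Fraction(0)
    for _ in range(trials):
        d = Fraction(rng.randrange(2 ** 11, 2 ** 12), 2 ** 12)  # d in [1/2, 1)
        s = Fraction(rng.randrange(2 ** 12), 2 ** 12)
        c = Fraction(rng.randrange(2 ** 12), 2 ** 12)
        worst_plain = max(worst_plain, d - truncate(d, frac_bits))
        worst_cs = max(worst_cs, (s + c) - carry_save_estimate(s, c, frac_bits))
    return worst_plain, worst_cs  # below 1*delta and 2*delta respectively
```

For example, with four fractional bits the pair s = c = 31/512 has a sum of 31/256, but both components truncate to 0. The error therefore exceeds one $\Delta rp = 1/16$ while staying below two.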
 
 2.4 Physical Model of the Quotient Selection Mechanism
 
 We now turn to the question of the physical realization of a quotient
 selection mechanism: the device which performs the operation $\widehat{rp}/\hat{d}$ to
 produce a quotient digit $i$ such that $i$ belongs to the set of quotient digits $I$
 implied by $(rp, d)$. Since the operation time of division relative to multiplication
 is limited by the model division, the requirements for a high-performance
 arithmetic processor would demand the design of a high-speed model
 division. One way to achieve this would be to use a higher-speed class of
 logic in building the model division than in building the remainder of the
 arithmetic processor, but in this work we assume one given class of
 logic and constrain the design problem such that speed advantages must
 be gained by organization.
 
 Any valid division technique is a candidate for a model division.
 One aspect of this study was a survey of known division techniques suitable
 for implementation in a digital computer. References [14] through [32] are
 some of the works consulted. In evaluating possible candidates we should
 keep in mind the advantages of dealing with relatively short operands, coupled
 with the potential requirement for low operating times.
 
 Digital division schemes may be classified as additive, multiplicative,
 tabular, or some combination of the three. Additive techniques are
 those such as restoring and non-restoring division, in which addition and
 subtraction are the fundamental operations; the divisor remains invariant.
 Multiplicative schemes are those in which both the dividend and divisor are
 multiplied by factors in such a manner that the modified divisor converges to 1,
 and thus the modified dividend converges to the quotient. Tabular techniques
 are those based upon a combinational network: the quotient digit is
 produced by table look-up. Note that neither of the two latter techniques
 produces a remainder, but a remainder is not needed for a model division.
 
 
 We have eliminated analog schemes and threshold logic from consid- 
 eration in this study. We have also ruled out logarithmic techniques since, 
although the division is transformed to a subtraction, the equipment-time ratio suffers due to the necessity of forming logarithms and antilogarithms.
 
We now propose a generalized structure into which multiplicative and tabular techniques will fit. These schemes appear to have a potential for higher operating speeds than the additive techniques. Since it is an additive (non-restoring) scheme which is controlled by the model division, it seems intuitively justifiable to consider a higher performance class for the model. Emphasis on hardwired table look-up techniques is also justified by the trend of technology toward LSI.
 
Figure 2 shows the generalized structure. The parameters and blocks are as follows:
 
Divisor Estimate Formation - This block accepts a full precision divisor, d, in the range $a \le d < b$, and from it produces an estimate of the divisor, $\hat{d}$, with maximum negative uncertainty, $\alpha$, and maximum positive uncertainty, $\beta$. This box may also incorporate provisions for changing the form of representation of $\hat{d}$ from that of d. For example, if the model division structure accepts only positive quantities, but d may be either negative or positive, this box could convert $\hat{d}$ to sign and magnitude form. The magnitude would serve as input to the model. The sign would be used together with the sign of the partial remainder in determining the sign of the quotient digit. This block is part of the interface between the full precision structure and the model division.
 
In addition to this interfacing function, the divisor estimate formation box also serves as a selector. Note that the output of Multiplier 2 is coupled back into this box. This feedback loop, together with the one from Multiplier 1 to the partial remainder estimate formation box, admits iterative multiplicative schemes into this structure.

Figure 2. Generalized Structure of Model Division (Quotient Selector) [figure: Partial Remainder Estimate Formation, Divisor Estimate Formation, Table 1 ($A = f(\hat{d})$), Multiplier 1 ($P = A\,\widehat{rp}$), Multiplier 2 ($D = A\,\hat{d}$), Table 2 (output q) and Quotient Recode (output i)]
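As a minimal sketch of the interfacing role of the Divisor Estimate Formation block, the code below assumes (this is not specified in the report) that the estimate is produced by converting d to sign-and-magnitude form and truncating the magnitude to $\delta$ fraction bits; the helper name `divisor_estimate` is hypothetical. Under that truncation rule the estimate never exceeds the magnitude, so $\alpha = 0$ and $\beta$ is just under $2^{-\delta}$.

```python
from fractions import Fraction
import math


def divisor_estimate(d: Fraction, delta: int) -> tuple[int, Fraction]:
    """Hypothetical Divisor Estimate Formation: sign and truncated magnitude.

    The magnitude |d| is truncated to `delta` fraction bits, so the estimate
    satisfies d_hat <= |d| < d_hat + 2**-delta, i.e. alpha = 0 and beta is
    just under 2**-delta for this particular truncation rule.
    """
    sign = -1 if d < 0 else 1
    scale = 2 ** delta
    d_hat = Fraction(math.floor(abs(d) * scale), scale)  # drop low-order bits
    return sign, d_hat


sign, d_hat = divisor_estimate(Fraction(-27, 32), 3)  # (-1, 3/4)
```

The returned sign would be combined with the sign of the partial remainder to fix the sign of the quotient digit, while `d_hat` is what the model division sees.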
 
Table 1 - This block accepts $\hat{d}$ as the input and produces a value, A, as a function of $\hat{d}$, i.e. $A = f(\hat{d})$. The quantity A is a factor by which both $\hat{d}$ and $\widehat{rp}$ are multiplied (the quotient is therefore not changed). It may be interpreted as a scale factor used to transform the range of $\hat{d}$ or as an estimate of the inverse of d.
 
Partial Remainder Estimate Formation - This block accepts a full precision shifted partial remainder, rp, and from it produces an estimate of the shifted partial remainder, $\widehat{rp}$, with maximum negative uncertainty, $\lambda$, and maximum positive uncertainty, $\gamma$. As with divisor estimate formation, the estimate is in practice a truncated version of the full precision quantity. The block may also incorporate provisions for changing the form of the representation.
 
In actual implementations the full precision remainder may be the result of operations using an adder-subtracter which produces a redundant representation. The estimate of the remainder, $\widehat{rp}$, however, is restricted to non-redundant representations. We have assumed, although not rigorously demonstrated, that use of a redundantly represented value would unduly complicate the structure of the quotient selection mechanism. Merely determining the sign, for example, is of the same order of complexity as converting the value into a non-redundant form. It is important to realize, however, that the estimate consists of only the high order digits of the full precision remainder. In practice this estimate is sufficiently short to permit conversion to a non-redundant form using full-lookahead techniques.
 
 
The partial remainder estimate formation block also enables the output of Multiplier 1 to couple back into the input side. As with the divisor loop, this path is necessary for the inclusion of iterative multiplicative division schemes.
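The remark that the remainder estimate is short enough to assimilate quickly can be illustrated with a carry-save (sum, carry) pair, assumed here as the redundant form produced by the adder-subtracter; the report does not commit to a particular representation, and both helper names are hypothetical. Truncating each component before adding them yields an estimate that is never larger than the truncated true value and at most one unit in position k smaller, so the extra negative uncertainty introduced is bounded by one unit.

```python
def estimate_from_carry_save(s: int, c: int, k: int) -> int:
    """Assimilate only the bits above position k of a carry-save pair.

    s and c are the two integer components of a redundant remainder whose
    value is s + c. Each component is truncated first, then only the short
    high-order parts are added (the cheap, few-digit conversion).
    Python's >> rounds toward minus infinity, so negative values work too.
    """
    return (s >> k) + (c >> k)


def extra_error(s: int, c: int, k: int) -> int:
    """Truncated assimilated value minus the estimate, in units of 2**k.

    Always 0 or 1: truncating before assimilating loses at most one unit.
    """
    return ((s + c) >> k) - estimate_from_carry_save(s, c, k)
```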
 
Multipliers - Multiplier 1 and Multiplier 2 form, respectively, the quantities $P = A\,\widehat{rp}$ and $D = A\,\hat{d}$. The outputs of both multipliers are the inputs to the second table look-up structure, Table 2. P may be thought of as a transformed version of $\widehat{rp}$. The maximum negative uncertainty in P is $\lambda A_{max}$; the maximum positive uncertainty is $\gamma A_{max}$, where $A_{max} = f(b)$. If the product $A\,\widehat{rp}$ is truncated so that non-zero digits are lost, additional uncertainties $\lambda_m$ and $\gamma_m$ are introduced. In this case P represents transformed rp values in the range

$$P - A\lambda - \lambda_m \le A\,rp \le P + A\gamma + \gamma_m \qquad (2.17)$$

Similarly, the maximum uncertainties in D are $\alpha A_{max}$ and $\beta A_{max}$, with $A_{max} = f(b)$. If D is truncated with maximum truncation errors $(\alpha_m, \beta_m)$, then D is representative of transformed d values in the range

$$D - A\alpha - \alpha_m \le A\,d \le D + A\beta + \beta_m \qquad (2.18)$$
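A small numerical check can make the bookkeeping in (2.17) concrete. The sketch assumes, purely for illustration, that $\widehat{rp}$ is rp truncated to $\varepsilon$ fraction bits (so $\lambda = 0$ and $\gamma = 2^{-\varepsilon}$), that A is a fixed positive scale factor, and that Multiplier 1 truncates its product to a given number of bits (so $\lambda_m = 0$ and $\gamma_m$ is one unit of that precision). The function names are hypothetical.

```python
from fractions import Fraction
import math


def trunc(x: Fraction, bits: int) -> Fraction:
    """Truncate x (round toward minus infinity) to `bits` fraction bits."""
    return Fraction(math.floor(x * 2 ** bits), 2 ** bits)


def check_2_17(rp: Fraction, A: Fraction, eps: int, m_bits: int) -> bool:
    """Check that A*rp lies in the interval of Equation (2.17).

    Assumed model: rp_hat = trunc(rp, eps), so lambda = 0, gamma = 2**-eps;
    P = trunc(A * rp_hat, m_bits), so lambda_m = 0, gamma_m = 2**-m_bits.
    """
    rp_hat = trunc(rp, eps)
    lam, gam = Fraction(0), Fraction(1, 2 ** eps)
    P = trunc(A * rp_hat, m_bits)           # output of Multiplier 1
    lam_m, gam_m = Fraction(0), Fraction(1, 2 ** m_bits)
    return P - A * lam - lam_m <= A * rp <= P + A * gam + gam_m
```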
 
 Table 2 - This structure is an implementation of the function 
 which relates quotient digits, q, to the products P and D, the scaled 
 remainder and divisor, respectively, for the model division. 
 
Quotient Recode - The quotient recode block represents the interface between the output of the model division and the full precision division. The output of Table 2, q, may require recoding into a form directly usable by the shift gate complex which selects the next multiple of the divisor to be used in forming the subsequent partial remainder.
 
 
 At this point we narrow the scope of the present research to exclude 
 iterative multiplicative schemes: the feedback loops of Figure 2 will not 
 be used. The remaining structure includes what might be considered two 
extremes or boundary cases. In one structure, designated Type 1,
Table 1 is defined such that the rounded integer portion of the product $A\,\widehat{rp}$
is the correct quotient digit for the division $\widehat{rp}/\hat{d}$. For a Type 1 structure
 neither Table 2 nor Multiplier 2 need be implemented. The other extreme 
 occurs when A = f(d) = 1. In this case, designated Type 2, Table 2 bears the 
 full burden of quotient selection and neither Table 1 nor the multipliers 
 are required. 
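To make the Type 1 case concrete, the sketch below assumes, for illustration only, that Table 1 stores A as $1/\hat{d}$ truncated to j fraction bits and that "rounded integer portion" means adding 1/2 and taking the floor. The names are hypothetical, and the sketch makes no claim that these choices satisfy the selection criterion for any particular radix or redundancy.

```python
from fractions import Fraction
import math


def table1_reciprocal(d_hat: Fraction, j: int) -> Fraction:
    """Hypothetical Table 1 entry: 1/d_hat truncated to j fraction bits."""
    return Fraction(math.floor(2 ** j / d_hat), 2 ** j)


def type1_digit(rp_hat: Fraction, d_hat: Fraction, j: int) -> int:
    """Type 1 selection: the rounded integer part of A * rp_hat.

    No Table 2 or Multiplier 2 is involved; the product alone gives q.
    """
    A = table1_reciprocal(d_hat, j)
    return math.floor(A * rp_hat + Fraction(1, 2))  # round half up
```

Whether such a digit always lies in the allowable set I depends on j, on the truncation of $\widehat{rp}$ and $\hat{d}$, and on the redundancy; the sketch does not check this.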
 
But there are also intermediate, or hybrid, structures in which neither Table 1 nor Table 2 is degenerate. In these structures Table 1 and the multipliers are used to transform the divisor estimate into $A\,\hat{d}$, which lies in a range closer to 1 than $\hat{d}$ does. The effect of this range transformation is to simplify Table 2. In the next chapter we shall examine the design of Table 1 and Table 2 independently and then make some observations about hybrid structures. The shift from a Type 1 structure to a Type 2 structure, and the accompanying trade-off between speed and hardware, is but one example of the trade-offs available between sequential networks and their combinational equivalents.
 
3. DEFINITION OF COST AND PERFORMANCE 
 
 3.1 Preliminary Remarks 
 
To this point in the thesis we have defined a division procedure which generates a quotient by successive calls to a lower precision model division unit. A generalized structure of the model division was proposed, and now we begin to consider the synthesis of such a unit.
 
 Besides the definitive aspects of this work, a major goal is to 
 derive useful estimates of minimal cost and performance as functions of the 
 design parameters of the generalized structure in Figure 2. Design parameters 
 include such quantities as radix, r; magnitude of maximum quotient digit, n; 
 and the point of truncation of rp and d. In this section, the important boxes 
 of Figure 2 are made sufficiently specific to allow a measure of minimal cost 
 and performance to be proposed. 
 
 In finding a measure of cost or performance, the designer is faced 
 with a trade-off between generality and accuracy. Determining absolute cost 
 or absolute performance is very much dependent upon hardware and details of 
 implementation; but restricting the study to a specific class of logic limits 
 the significance of the work. Questions of minimization are further complicated 
 by controversy as to what to minimize. 
 
This work makes a compromise. Since much of the emphasis is on comparison, a relative measure of cost and performance is adequate. On the other hand, some estimate of absolute cost is desirable. The higher-radix, table look-up schemes offer potentially high performance but require a larger number of gates to construct. Whether, in fact, they are at all feasible for a real machine strongly depends upon the absolute cost.
 
 3.2 Definition of Cost 
 
 3.2.1 Preliminary Remarks 
 
For this study, the cost of a logic network is defined as the total number of literals required to implement the network in two-level, sum-of-products (AND-OR or equivalent) logic. The choice ignores fan-in and fan-out restrictions, but this shortcoming is outweighed by the following considerations.
 
1. The logical definitions of the networks are in a canonical form which can serve as an input to a specific minimization and/or design automation package.

2. The networks are realized in the theoretical minimum number of circuit delays and thus give an upper bound on speed and cost.

3. The tables for higher-radix structures are candidates for LSI. In this case the number of literals is a measure of the required silicon area and power dissipation.

4. A very efficient computer program for sum-of-products minimization is available to the author.
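Under this definition, cost is a simple count. As a sketch, assuming each product term is written as a cube string over {0, 1, -} (a common notation, not one the report prescribes; the function name is hypothetical), the literal count of a sum-of-products expression is the number of specified positions summed over its terms:

```python
def literal_cost(cubes: list[str]) -> int:
    """Count literals in a sum-of-products given as cubes such as '1-0'.

    Each '0' (complemented input) or '1' (uncomplemented input) is one
    literal; '-' marks a variable that does not appear in that product term.
    """
    return sum(sum(ch != '-' for ch in cube) for cube in cubes)


cost = literal_cost(["1-0", "01-", "--1"])  # 2 + 2 + 1 literals
```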
 
The cost of implementing the structure shown in Figure 2 is the sum of the costs of implementing each sub-block. Symbolically,

$$C = C_{DEF} + C_{T1} + C_{M1} + C_{M2} + C_{PREF} + C_{T2} + C_R \qquad (3.1)$$
 
where

C is the total cost,

$C_{DEF}$ is the cost of the Divisor Estimate Formation block,

$C_{T1}$ is the cost of Table 1,

$C_{M1}$ is the cost of Multiplier 1,

$C_{M2}$ is the cost of Multiplier 2,

$C_{PREF}$ is the cost of the Partial Remainder Estimate Formation block,

$C_{T2}$ is the cost of Table 2, and

$C_R$ is the cost of the Quotient Recode block.
 
At this point, it is convenient to introduce intermediate variables, $C_{TMM}$ and $C_{DPQ}$, and group the cost terms as follows:

$$C_{TMM} = C_{T1} + C_{M1} + C_{M2} \qquad (3.2)$$

$$C_{DPQ} = C_{DEF} + C_{PREF} + C_R \qquad (3.3)$$
 
The cost terms $C_{T2}$, $C_{TMM}$, and $C_{DPQ}$ are functionally related to the design parameters such as radix, maximum quotient digit, range of divisor, and uncertainty in the estimates of the divisor and remainders. The terms $C_{T2}$ and $C_{T1}$ in C are the most complex and will be studied by computer synthesis. Estimates of $C_{DPQ}$ and the remaining terms of $C_{TMM}$ will be obtained manually as required. In most cases, the term $C_{DPQ}$ is dominated by $C_{T2} + C_{TMM}$ and may be neglected.
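Equations (3.1) through (3.3) are plain sums; the sketch below only records the grouping. Block names follow Figure 2, the class is hypothetical, and the costs in the example are invented for illustration, not results of this study. In a Type 2 structure, Table 1 and both multipliers are absent, so $C_{TMM} = 0$.

```python
from dataclasses import dataclass


@dataclass
class BlockCosts:
    """Literal costs of the blocks of Figure 2 (hypothetical values)."""
    DEF: int = 0   # Divisor Estimate Formation
    T1: int = 0    # Table 1
    M1: int = 0    # Multiplier 1
    M2: int = 0    # Multiplier 2
    PREF: int = 0  # Partial Remainder Estimate Formation
    T2: int = 0    # Table 2
    R: int = 0     # Quotient Recode

    def c_tmm(self) -> int:   # Equation (3.2)
        return self.T1 + self.M1 + self.M2

    def c_dpq(self) -> int:   # Equation (3.3)
        return self.DEF + self.PREF + self.R

    def total(self) -> int:   # Equation (3.1), regrouped
        return self.c_tmm() + self.T2 + self.c_dpq()


# Invented Type 2 example: no Table 1 and no multipliers.
type2 = BlockCosts(DEF=20, PREF=30, T2=900, R=15)
```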
 
 3.2.2 Structure for Finding Cost of Table 2 
 
Table 2 will be studied as a multiple-output logic network. It may be represented as shown in Figure 3. The functions $f_0$ through $f_n$ are Boolean functions of the bit vectors corresponding to $\hat{d}$ and $\widehat{rp}$. These vectors are denoted $\underline{d}$ and $\underline{rp}$, respectively.
 
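Figure 3. Network Definition of Table 2 [figure: Table 2 drawn as a multiple-output logic network with inputs $\underline{d}$ and $\underline{rp}$ and outputs $f_0(\underline{d}, \underline{rp}), f_1(\underline{d}, \underline{rp}), \ldots, f_n(\underline{d}, \underline{rp})$]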
 
In specifying the quotient selection criterion (Section 2.3), every pair (d, rp) has been associated with a set, I, of quotient digits which the quotient selection mechanism may generate when given inputs (d, rp). The functions $f_0, f_1, \ldots, f_n$ must be found such that for every ordered pair (d, rp) with allowable quotient digit set I,

$$f_i(\underline{d}, \underline{rp}) = 1 \quad \text{for one and only one } i \in I, \qquad (3.4)$$

and

$$f_k(\underline{d}, \underline{rp}) = 0 \quad \text{for all other values of } k. \qquad (3.5)$$

In other words, every pair (d, rp) in the set D × P must cause one and only one of the outputs to be true, and this output must correspond to a correct quotient digit.
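Conditions (3.4) and (3.5) can be checked mechanically for a candidate set of output functions. The sketch assumes, for illustration only, that each $f_i$ is supplied as a Python predicate on the pair (d, rp) and that a helper returns the allowable set I for a pair; neither interface comes from the report, and the toy data (digits 0 and 1, small integer d and rp, and an illustrative $\rho = 2/3$ defining which digits are allowed) is invented.

```python
from fractions import Fraction
from typing import Callable, Iterable

Pair = tuple[int, int]  # (d, rp), small integers standing for scaled values


def satisfies_selection(pairs: Iterable[Pair],
                        funcs: dict[int, Callable[[Pair], bool]],
                        allowed: Callable[[Pair], set[int]]) -> bool:
    """Check (3.4)-(3.5): exactly one f_i is true, and that i is in I."""
    for pair in pairs:
        true_outputs = [i for i, f in funcs.items() if f(pair)]
        if len(true_outputs) != 1 or true_outputs[0] not in allowed(pair):
            return False
    return True


# Toy data (illustrative only): digits {0, 1} with redundancy rho = 2/3.
RHO = Fraction(2, 3)
PAIRS = [(d, rp) for d in range(4, 8) for rp in range(0, d + 1)]


def allowed(pair: Pair) -> set[int]:
    """Digits i whose region (i - rho)d <= rp <= (i + rho)d contains the pair."""
    d, rp = pair
    return {i for i in (0, 1) if (i - RHO) * d <= rp <= (i + RHO) * d}


good = {0: lambda p: 2 * p[1] < p[0], 1: lambda p: 2 * p[1] >= p[0]}
overlapping = {0: lambda p: 2 * p[1] < p[0], 1: lambda p: p[1] >= 1}
```

Here `good` splits each rp-line at rp = d/2, inside the overlap of the two regions, while `overlapping` makes both outputs true at (4, 1) and so violates (3.5).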
 
Due to the overlap of adjacent quotient regions produced by redundancy, many elements in D × P may have sets, I, containing more than one element; thus many sets of different functions are allowable for given design parameters. But our wish to compare minimal* costs imposes another constraint, namely, that the cost of the multiple-output network (as defined in Section 3.2.1) is minimal. Symbolically stated: the requirement is that $\mathrm{Cost}(f_0 + f_1 + f_2 + \cdots + f_n)$ be minimal.
 
In the general minimization of a two-level, AND-OR realization of a multiple-output network, it is necessary to generate the prime implicants of each of the individual output functions, plus the prime implicants of the functions which are equal to all possible products of two output functions, three output functions, etc. Each such prime implicant is a multiple-output prime implicant. McCluskey [33] states the following theorem, which is of use here:
 
Theorem: For any definition of network cost such that the cost does not increase when a gate or gate input is removed, there exists at least one minimum-cost, two-stage network in which the corresponding expressions for the output functions, $f_\alpha$, are all sums of multiple-output prime implicants. All the product terms which occur only in the expression for $f_\alpha$ are prime implicants of $f_\alpha$; all the product terms which occur in both the expressions for $f_\alpha$ and $f_\beta$ but in no other expressions are prime implicants of $f_\alpha \cdot f_\beta$, etc.
 
But in the present case, no two functions are ever simultaneously true, and thus none of the prime implicants of $f_j$ are contained in any other function $f_k$, $k \ne j$. Thus, by the theorem stated above, there exists a minimum-cost two-stage network which may be found by minimizing each function independently of the rest, i.e.

$$\min \mathrm{Cost}(f_0 + f_1 + \cdots + f_n) = \min \mathrm{Cost}(f_0) + \min \mathrm{Cost}(f_1) + \cdots + \min \mathrm{Cost}(f_n).$$
 
*The term minimal implies that we wish to find any one of possibly more than one minimum-cost implementation.
 
 
 3.2.3 Structure for Finding Cost of Table 1 and the Multipliers 
 
As with Table 2, Table 1 will be defined as a multiple-output logic network, as shown in Figure 4. The input is $\underline{d}$, the bit-vector representation of $\hat{d}$. The outputs are the variables $a_0 = g_0(\underline{d})$, $a_1 = g_1(\underline{d})$, ..., $a_j = g_j(\underline{d})$, where each $g_k$ is a Boolean function. The bits $a_0$ through $a_j$ comprise the binary representation of A, the approximate inverse of $\hat{d}$. Unfortunately, in this case we cannot constrain the problem so that none of the outputs are simultaneously true. For purposes of estimation, however, it will be assumed that minimizing each function independently yields an adequate estimate of the minimum cost, i.e.

$$C_{T1} = \min \mathrm{Cost}(g_0) + \min \mathrm{Cost}(g_1) + \cdots + \min \mathrm{Cost}(g_j).$$
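The multiple-output view of Table 1 can be made concrete by enumerating its truth table. The sketch below rests on assumptions chosen only for illustration: $\hat{d}$ lies in [1, 2) with $\delta$ fraction bits (the leading 1 is not an input), A is $1/\hat{d}$ truncated to j fraction bits and written $a_0.a_1 \ldots a_j$, and each output $g_k$ is returned as the list of input codes (minterms) for which $a_k = 1$. The report does not fix how A is rounded, and the function name is hypothetical.

```python
def table1_minterms(delta: int, j: int) -> dict[int, list[int]]:
    """Minterm lists for outputs a_0..a_j of a hypothetical Table 1.

    Input code k (delta bits) stands for d_hat = 1 + k / 2**delta in [1, 2).
    A = floor(2**j / d_hat) / 2**j, i.e. 1/d_hat truncated to j fraction
    bits, so A = a_0 . a_1 ... a_j with a_0 = 1 only when d_hat = 1.
    """
    outputs = {t: [] for t in range(j + 1)}
    for k in range(2 ** delta):
        a_int = (2 ** (j + delta)) // (2 ** delta + k)  # A * 2**j as an integer
        for t in range(j + 1):
            if (a_int >> (j - t)) & 1:                  # bit a_t of A
                outputs[t].append(k)
    return outputs
```

Each list could then be minimized on its own, in keeping with the estimate of $C_{T1}$ as a sum of independently minimized costs.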
 
Figure 4. Network Definition of Table 1 [figure: Table 1 drawn as a multiple-output logic network with input $\underline{d}$ and outputs $a_0, a_1, \ldots, a_j$ forming A]
 
We now consider the cost of the multipliers. It is beyond the scope of this work to develop a cost-performance analysis for multiplication structures. The approach adopted here is to present a structure which experience has shown to be efficient and to approximate $C_{M1}$ and $C_{M2}$ from that structure. More information about such a structure may be found in [8].
 
The multiplier is illustrated in Figure 5. It consists of a cascade of limited carry-borrow adder-subtracters together with shift gates (S.G.) which form the necessary multiples of the multiplicand ($\widehat{rp}$). Shift gate SG0, in conjunction with complementing circuits, forms the multiples ±1 and ±2 times; SG1 forms ±4 and ±8 times; and, in general, SGi forms multiples of $\pm 2^{2i}$ and $\pm 2^{2i+1}$ times the multiplicand. The multiples are selected by a recoding of $a_0$ through $a_j$. Appended to the output of the last adder is hardware which converts the products from the redundant representation produced by the limited carry-borrow device to a non-redundant format. The cost of Multiplier 1, $C_{M1}$, will be defined by

$$C_{M1} = j\,C_R + N_A N_B C_A + (N_A + 1) N_B C_{SG} + N_B C_C \qquad (3.6)$$

where

$C_R$ is the cost per input digit of the recoding logic,

$N_A$ is the number of adders in the multiplier and is given by $N_A = \lfloor (j + 1)/2 \rfloor$ (the integer portion of $(j + 1)/2$),

$N_B$ is the number of bit positions per adder and is given by $N_B = \varepsilon + \log_2 r$,

$C_A$ is the cost of one position of an adder,

$C_{SG}$ is the cost of one position of a shift gate, and

$C_C$ is the cost of converting one digit from redundant to non-redundant form (assuming the use of look-ahead techniques).
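The description of the shift gates (SGi supplying $\pm 2^{2i}$ and $\pm 2^{2i+1}$ times the multiplicand) is consistent with a radix-4 recoding of the bits of A into digits in {-2, -1, 0, 1, 2}, one digit per shift gate. The sketch below assumes that scheme, which the report does not spell out, treats A as a non-negative integer of `nbits` bits, and uses hypothetical function names. It recodes A and forms A·x by adding one shifted multiple per digit, as the adder cascade would.

```python
def radix4_recode(a: int, nbits: int) -> list[int]:
    """Radix-4 (modified Booth) recoding of an unsigned nbits-bit integer.

    Returns digits d_0, d_1, ... in {-2, -1, 0, 1, 2} with
    a == sum(d_i * 4**i). Digit i is -2*b[2i+1] + b[2i] + b[2i-1], where
    b[-1] = 0 and bits above nbits are 0, so nbits // 2 + 1 digits suffice.
    """
    def bit(k: int) -> int:
        return (a >> k) & 1 if k >= 0 else 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(nbits // 2 + 1)]


def multiply_by_recoding(a: int, nbits: int, x: int) -> int:
    """Form a*x by summing one shifted multiple per recoded digit.

    Digit i selects +-1 or +-2 times x shifted left by 2i positions, i.e.
    one of +-2**(2i), +-2**(2i+1) times x: the multiples assigned to SGi.
    """
    total = 0
    for i, digit in enumerate(radix4_recode(a, nbits)):
        total += digit * (x << (2 * i))   # output of one shift gate
    return total
```

With `nbits = j + 1` the recoder produces $\lfloor (j+1)/2 \rfloor + 1$ digits, which agrees with the $N_A + 1$ shift gates and $N_A$ adders counted in (3.6).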
 
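Figure 5. Structure of Multipliers [figure: A from Table 1 drives the recoder, whose outputs select shift gates SG0, SG1, SG2, ..., SGn acting on $\widehat{rp}$; the shift-gate outputs feed a cascade of adders ending in a converter whose output goes to the Table 2 input; A is also passed to Multiplier 2]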
 
The quantity j is the index of the low-order bit of A, the approximation of $\hat{d}^{-1}$ ($A = a_0.a_1a_2 \ldots a_j$); $\varepsilon$ is the number of bits to the right of the radix point in $\widehat{rp}$, and r is the radix of the model division. As the need arises, estimates of minimum values of $C_R$, $C_A$, $C_{SG}$, and $C_C$ may be obtained. The cost of Multiplier 2, $C_{M2}$, is given by Equation 3.6 with $(\varepsilon + \log_2 r)$ replaced by $\delta$, the number of bits in $\hat{d}$.
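Equation (3.6) is straightforward to evaluate once per-position costs are known. The sketch below transcribes it; the function name and the unit costs in the example are placeholders, not values from the report. Following the text, $C_{M2}$ is obtained by using $\delta$ in place of $\varepsilon + \log_2 r$ as the adder width.

```python
import math


def multiplier_cost(j: int, width: int, c_r: float, c_a: float,
                    c_sg: float, c_c: float) -> float:
    """Equation (3.6): j*C_R + N_A*N_B*C_A + (N_A+1)*N_B*C_SG + N_B*C_C.

    j: index of the low-order bit of A = a_0.a_1...a_j
    width: N_B, bits per adder; eps + log2(r) for Multiplier 1,
           delta for Multiplier 2
    """
    n_a = (j + 1) // 2          # number of adders: integer part of (j+1)/2
    n_b = width
    return j * c_r + n_a * n_b * c_a + (n_a + 1) * n_b * c_sg + n_b * c_c


# Placeholder parameters and unit costs, for illustration only.
eps, r, delta, j = 4, 8, 5, 5
c_m1 = multiplier_cost(j, eps + int(math.log2(r)), 3, 20, 4, 6)
c_m2 = multiplier_cost(j, delta, 3, 20, 4, 6)
```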
 
 3.3 Definition of Performance 
 
 3.3.1 Performance of the Model Division 
 
Performance will be measured in units of number of bits of quotient generated per gate delay. For practical cases, the number of bits of quotient generated by the model division is $\log_2 r$. Since the divisor is constant for a given division operation, the operating time of the model division is limited by the paths driven by the remainder. The time, $T_Q$, in gate delays, required to produce a quotient digit, radix r, is given by

$$T_Q = T_{PREF} + T_{M1} + T_{T2} + T_R \qquad (3.7)$$

where

$T_{PREF}$ is the number of logic delays required in forming the estimate of the remainder,

$T_{M1}$ is the number of logic delays required to form $A\,\widehat{rp}$ in Multiplier 1,

$T_{T2}$ is the number of logic delays to select a quotient digit in Table 2, and

$T_R$ is the number of logic delays to recode the output of Table 2.

Performance of the quotient selector, $P_Q$, is therefore given by

$$P_Q = \frac{\log_2 r}{T_Q} \qquad (3.8)$$
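Equations (3.7) and (3.8) in code form, as a sketch with hypothetical function names and invented delay counts for a Type 2 selector, which has no Multiplier 1, so $T_{M1} = 0$:

```python
import math


def selector_time(t_pref: int, t_m1: int, t_t2: int, t_r: int) -> int:
    """Equation (3.7): gate delays to produce one radix-r quotient digit."""
    return t_pref + t_m1 + t_t2 + t_r


def selector_performance(r: int, t_q: int) -> float:
    """Equation (3.8): quotient bits produced per gate delay."""
    return math.log2(r) / t_q


# Invented delays for a Type 2 selector (no Multiplier 1, so T_M1 = 0).
t_q = selector_time(t_pref=4, t_m1=0, t_t2=2, t_r=2)
p_q = selector_performance(16, t_q)   # 4 bits in 8 delays
```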
 
 3.3.2 Performance of the Full Precision Division 
 
The measure of primary interest is the performance of the full precision division. We shall assume a full precision multiplication structure similar to that shown in Figure 5. It consists of a cascade of K adder-subtracters, each of which is capable of retiring K' bits of the multiplier. The effective radix for multiplication is therefore $r_M = 2^{KK'}$.
 
Let

M be the quotient length in bits,

$T_D$ be the number of logic delays required for the iterative steps of division,

$T_A$ be the number of logic delays required to add two full precision numbers,

$T_C$ be the number of logic delays required for control after the quotient bits have been generated by the quotient selector, and

$N_Q$ be the number of calls to the quotient selector.

Then,

$$T_D = \frac{M}{K'}\,T_A + N_Q\,(T_Q + T_C) \qquad (3.9)$$

where, if r is the radix of the model division,

$$N_Q = \frac{M}{\log_2 r} \qquad (3.10)$$
 
For this study, K' = 2; thus

$$T_D = \frac{M\,T_A}{2} + \frac{M\,(T_Q + T_C)}{\log_2 r} \qquad (3.11)$$

The performance of the full precision division is defined by

$$P_D = \frac{M}{T_D} = \frac{2 \log_2 r}{T_A \log_2 r + 2\,(T_Q + T_C)} \qquad (3.12)$$
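The full-precision timing model can be evaluated the same way. This sketch (hypothetical names; r assumed to be a power of two; exact rational arithmetic so that $N_Q$ need not be an integer in the algebra) evaluates (3.9) through (3.12):

```python
import math
from fractions import Fraction


def division_time(m: int, r: int, t_a: int, t_q: int, t_c: int,
                  k_prime: int = 2) -> Fraction:
    """Equations (3.9)-(3.11): T_D = (M/K') T_A + N_Q (T_Q + T_C)."""
    log2r = int(math.log2(r))            # r assumed to be a power of two
    n_q = Fraction(m, log2r)             # calls to the quotient selector (3.10)
    return Fraction(m, k_prime) * t_a + n_q * (t_q + t_c)


def division_performance(r: int, t_a: int, t_q: int, t_c: int) -> Fraction:
    """Equation (3.12), K' = 2: P_D = 2 log2 r / (T_A log2 r + 2(T_Q + T_C))."""
    log2r = int(math.log2(r))
    return Fraction(2 * log2r, t_a * log2r + 2 * (t_q + t_c))


# Invented delays, for illustration only.
t_d = division_time(48, 16, t_a=4, t_q=8, t_c=2)
p_d = division_performance(16, t_a=4, t_q=8, t_c=2)
```

For M = 48, r = 16, $T_A = 4$, $T_Q = 8$ and $T_C = 2$ this gives $T_D = 216$ and $P_D = 2/9$, which equals 48/216 as (3.12) requires.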
 
4. ALGORITHMS FOR SYNTHESIS AND ANALYSIS

4.1 Preliminary Remarks
 
The derivation of cost and performance functions by a direct,
analytic approach is complicated by the discrete nature of these functions and
by the large number of variables. An empirical, constructive approach was
 therefore adopted. The first phase of the experiment (the topic of this 
 section) required the formulation of a systematic approach to the synthesis of 
 a minimal cost, mathematically accurate, quotient selection mechanism for a 
 given set of design parameter values. Although the synthesis routines in 
 themselves would be of use in designing a quotient selection mechanism, in 
 this study they are used as tools in studying the cost and performance 
 functions. We are performing analysis by means of computer-aided synthesis. 
 
In the second phase of the experiment, the programs developed in the first phase were run with various combinations of parameter values in order to estimate cost and performance. The results of each run might be thought of as determining a point on a cost versus performance curve. The hope is that only a few runs, relative to all possible parameter combinations, would be necessary in order to find approximations which would be useful for interpolation and extrapolation.
 
But this empirical approach is not without major practical problems. There is a huge number of possible parameter values, and the minimization problems are very large and demanding of computer time. These problems were mitigated by restricting the values of parameters to those of practical importance and by concentrating on the effects of dominant parameters.
 
As discussed in Section 3, the dominant cost term for a Type 2 structure is $C_{T2}$, the cost of Table 2. For a Type 1 structure, although the cost of Table 1 ($C_{T1}$) may not dominate the cost of the multiplier, it is the least studied term. The following sub-sections comprise a description of
 algorithms which generate logic equations which define Table 1 and Table 2 
 for given values of design parameters. The algorithms do not produce a defi- 
 nition of the other blocks of Figure 2, but do place some constraints upon 
 their structure. 
 
4.2 Deriving a Minimal Cost Design for Table 2
 
Conceptually, Table 2 in Figure 2 is a direct implementation of a P-D plot. To implement a given P-D plot, a relation must be defined from the set D × P to a subset of D × P, $\hat{D} \times \hat{P}$, such that each element of D × P maps into an element of $\hat{D} \times \hat{P}$, with error bounds for each element $(\hat{d}, \widehat{rp})$ such that the quotient selection criterion is satisfied. Note that we have not required that the relation be a function, since, due to redundant representation, the same rp-value or d-value may map into different $\widehat{rp}$ or $\hat{d}$ values; uniqueness is not guaranteed. For practical reasons the relation is restricted to those which may be defined by the successive operations of truncation and assimilation (conversion to a non-redundant form). Even within this restriction, however, there are many possible alternatives. The maximum amount of truncation error which may be tolerated for a given pair (d, rp) depends upon the location of the point. There is also a trade-off between $\varepsilon$ and $\delta$, the points of truncation of rp and d, respectively.
 
 The following is a list of the steps in the process of deriving a 
 minimal cost design for Table 2. 
 
1. Set the values for design parameters: $n, r, a, b, \alpha', \beta', \gamma', \lambda', \varepsilon, \delta$.*

2. Run the program QS3 (described in Section 4.2.1) to produce a sum-of-products (minterm) definition of each output function of Table 2.

3. Run the program PI with each set of minterms produced by QS3 as input. The program PI finds all prime implicants of the functions, identifies the essential prime implicants, and generates the constraints which must be satisfied in order to cover the function.

4. Run an Integer Linear Programming routine to find a minimal cost set of prime implicants which satisfy the constraints produced in step 3. The cost of a prime implicant is the number of literals. The combination of the prime implicants selected in this step, together with the essential prime implicants identified in step 3, defines the Boolean function.

5. Tabulate the total number of literals required to define each output function. The total of these values will be taken as the cost of implementing Table 2.
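Steps 3 through 5 can be imitated on small functions. The sketch below is an illustration, not the PI program or the ILP routine used in the study: it generates prime implicants by Quine-McCluskey merging and then, in place of integer linear programming, searches exhaustively for a minimum-literal cover, which is practical only for a handful of prime implicants. Cubes are strings over {0, 1, -} with the most significant variable first, don't-care conditions are not handled, and all names are hypothetical.

```python
from itertools import combinations


def merge(a: str, b: str):
    """Combine two cubes that differ only in one position holding 0 vs 1."""
    diff = [k for k in range(len(a)) if a[k] != b[k]]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        k = diff[0]
        return a[:k] + '-' + a[k + 1:]
    return None


def prime_implicants(minterms, nvars: int) -> set[str]:
    """Quine-McCluskey: repeatedly merge cubes; cubes never merged are prime."""
    cubes = {format(m, f'0{nvars}b') for m in minterms}
    primes: set[str] = set()
    while cubes:
        merged, used = set(), set()
        for a, b in combinations(sorted(cubes), 2):
            c = merge(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= cubes - used
        cubes = merged
    return primes


def covers(cube: str, m: int) -> bool:
    """True if minterm m lies inside the cube."""
    bits = format(m, f'0{len(cube)}b')
    return all(c in ('-', b) for c, b in zip(cube, bits))


def min_literal_cover(minterms, nvars: int):
    """Exhaustive minimum-literal cover (a stand-in for the ILP of step 4)."""
    primes = sorted(prime_implicants(minterms, nvars))
    best = None
    for size in range(1, len(primes) + 1):
        for subset in combinations(primes, size):
            if all(any(covers(p, m) for p in subset) for m in minterms):
                cost = sum(ch != '-' for p in subset for ch in p)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best
```

For the three-variable function with minterms 0, 1, 2, 5, 6 and 7 there are six two-literal prime implicants, and `min_literal_cover([0, 1, 2, 5, 6, 7], 3)` returns a cost of 6 literals using three of them.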
 
4.2.1 Defining the Output Functions

As described in Section 3.2.2, Table 2 is treated as a multiple-output network. This section describes an algorithm for specifying these functions as sums-of-products of minterms. The minterms are formed by concatenating bit vectors $\underline{rp}$ with bit vectors $\underline{d}$. A Fortran program called QS3 (Quotient Selection Program 3) was written to accept design parameters and to produce the minterm definitions of each of the output functions, $f_0(\underline{rp}, \underline{d}), \ldots, f_n(\underline{rp}, \underline{d})$.

*Initially, Table 2 is studied apart from T1, M1, and M2; that is, $A = f(\hat{d}) = 1$.
 
 The derivation will be restricted to the first quadrant (positive 
 rp and d) of the P-D plot. The full P-D plot is symmetric about both axes and 
 thus the cost of implementing one quadrant is a good estimate of the cost of 
 implementing any other. 
 
Figure 6 illustrates a portion of the first quadrant of a P-D plot. Three adjacent quotient regions, q(i+1), q(i), and q(i-1), are designated, together with the horizontal line $rp = \widehat{rp} = m\,\Delta rp$. Every line of this form will be designated an "rp-line". The quantity m is an integer, and $\Delta rp = 2^{-\varepsilon}$. The task of defining the output functions for Table 2 may be reduced to that of assigning adjacent sections of every rp-line to one and only one q-region. For example, the segment of the rp-line between d = a and d = b must be subdivided into three segments: one in each q-region shown. The dividing line between adjacent line segments assigned to q(i) and q(i+1) will be called the "divisor transition value between q(i) and q(i+1)." A divisor transition value between q(i) and q(i+1) may be picked from a sub-range of the divisor between the intersections of the rp-line and the boundaries of the overlap region. The range in which the divisor transition value may be chosen is determined as follows.
 
Figure 6. Portion of P-D Plot Illustrating Segmentation of rp-line [figure: the rp-line $rp = \widehat{rp} = m\,\Delta rp$ drawn from d = a to d = b across the lower bound of q(i+1), the upper bound of q(i), the lower bound of q(i) and the upper bound of q(i-1)]
 
Let $d_t$ be the divisor transition value on the line $rp = \widehat{rp}$ between q(i) and q(i-1). Then the ordered pair $(\widehat{rp}, d_t)$ will be representative of all (rp, d) in the rectangle shown in Figure 7. Since $d_t$ is a transition value, $(d_t, \widehat{rp})$ implies a quotient digit of i-1 and $(d_t - \Delta d, \widehat{rp})$ implies a quotient digit of i. The rectangle corresponding to $(d_t, \widehat{rp})$ must be completely within the q(i-1) region. The strictest bound is therefore at the upper, left-hand corner of the rectangle in Figure 7, and thus the following must hold:

$$\widehat{rp} + \gamma \le (i - 1 + \rho)(d_t - \alpha) \qquad (4.1)$$
 
Figure 7. Portion of a P-D Plot Illustrating Constraints in Finding Divisor Transition Interval [figure: upper bound of q(i-1), $rp = (i - 1 + \rho)d$, and lower bound of q(i), $rp = (i - \rho)d$]
 
Similarly, the rectangle corresponding to $(d_t - \Delta d, \widehat{rp})$ must be completely within the q(i) region. The strictest bound in this case is at the lower, right-hand corner of the rectangle, where the following must hold:

$$\widehat{rp} - \lambda \ge (i - \rho)(d_t - \Delta d + \beta) \qquad (4.2)$$

In practical cases, to ensure that all d values map into at least one $\hat{d}$ value, $\Delta d = \beta$, and thus (4.2) becomes

$$\widehat{rp} - \lambda \ge (i - \rho)\,d_t \qquad (4.3)$$

Combining (4.1) and (4.3) yields a range restriction on $d_t$, namely,

$$\frac{\widehat{rp} + \gamma}{i - 1 + \rho} + \alpha \le d_t \le \frac{\widehat{rp} - \lambda}{i - \rho} \qquad (4.4)$$
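For given parameter values, (4.4) is a direct computation. The sketch below (hypothetical function name; exact rational arithmetic so that boundary cases are not disturbed by rounding) returns the admissible interval for $d_t$, or `None` when the chosen $\Delta rp$, uncertainties and $\rho$ leave no room for a transition value on that rp-line. It does not clip the interval to [a, b].

```python
from fractions import Fraction


def transition_interval(rp_hat, i, rho, alpha, gamma, lam):
    """Closed range of admissible d_t between q(i) and q(i-1), from (4.4).

    Lower limit (from 4.1): (rp_hat + gamma)/(i - 1 + rho) + alpha
    Upper limit (from 4.3): (rp_hat - lam)/(i - rho)
    Returns (low, high), or None if the range is empty.
    """
    low = (rp_hat + gamma) / (i - 1 + rho) + alpha
    high = (rp_hat - lam) / (i - rho)
    return (low, high) if low <= high else None


# Illustrative values: rho = 2/3, gamma = 1/16, alpha = lambda = 0.
iv = transition_interval(Fraction(1, 2), 1, Fraction(2, 3), 0, Fraction(1, 16), 0)
```

With $\widehat{rp} = 1/2$ and i = 1 these illustrative values give the interval [27/32, 3/2].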
 
Note that the strategy is to select the size of the rp-steps, $\Delta rp$, and to allow the algorithm to find the maximum size steps allowable for d. Theoretically, the program could be designed such that $\Delta d$ would be specified and the precision requirements for the partial remainder would be determined. The former approach is taken because control of $\Delta rp$ is more critical. The precision of the estimate of the partial remainder (the number of bits) should be kept low in order to keep down the time required to convert from a redundant to a non-redundant form. The logic paths involving rp, as opposed to those involving d, change with each call to the model division. For this reason there is motivation to simplify the logic involving only rp at the expense of complicating the logic involving only d. It should also be realized that the precision requirements on the estimate of the partial remainder are based upon worst-case calculations. Although QS3 uses this worst-case precision uniformly in generating the division precision requirements, the minimization routines will remove unneeded precision.
 
The quantity $d_t$ may be any value in the range defined in Equation 4.4. Since the design goal is to minimize the total number of literals required to implement the table, $d_t$ is picked to be a number which can be represented with the fewest bits. In other words, if all numbers in the range specified by (4.4) are represented as the ratio of two integers in the form $N/2^M$, the $d_t$ selected is one satisfying (4.4) and with the minimum value of M.
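Choosing the transition value with the minimum M amounts to finding, in the closed interval (4.4), the number $N/2^M$ with the fewest fraction bits. One way to do this, offered as an illustrative sketch rather than the QS3 code (the function name is hypothetical), is to try M = 0, 1, 2, ... and test whether the smallest multiple of $2^{-M}$ not below the lower limit also lies below the upper limit; the first success has the minimal M.

```python
import math
from fractions import Fraction


def simplest_dyadic(low: Fraction, high: Fraction, max_bits: int = 64) -> Fraction:
    """Return N / 2**M in the closed interval [low, high] with the smallest M."""
    for m in range(max_bits + 1):
        scale = 2 ** m
        n = math.ceil(low * scale)        # smallest multiple of 2**-m >= low
        if Fraction(n, scale) <= high:
            return Fraction(n, scale)
    raise ValueError("no dyadic value with at most max_bits fraction bits")
```

For the interval [27/80, 3/8] the search returns 3/8 (M = 3), while for [27/32, 3/2] it returns 1 (M = 0).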
 
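Using the algorithm of selecting the simplest binary number in the allowable divisor transition ranges, the rp-line in Figure 6 is divided into three segments, as follows:

Segment                          Assigned to
$a \le d < d_{t1}$               q(i+1)
$d_{t1} \le d < d_{t2}$          q(i)
$d_{t2} \le d < b$               q(i-1)

where $d_{t1}$ is the divisor transition value between q(i+1) and q(i), and $d_{t2}$ is the divisor transition value between q(i) and q(i-1).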
 
The segments are next defined as minterms, and the minterms are assigned to the appropriate output function, $f_{i+1}$, $f_i$, $f_{i-1}$, etc.

The complete family of rp-lines is produced by stepping along the rp-axis (beginning at 0) in increments of $\Delta rp$. By segmenting the rp-lines at the boundaries of the P-D plot and at the divisor transition values, each quotient region, q(i) for i = 0 through n, is defined by a set of triplets of the form

$$(d_{l,m},\ d_{r,m},\ m\,\Delta rp)$$

where

$d_{l,m}$ is the left end of the segment of the mth rp-line in q(i);

$d_{r,m}$ is the right end of the segment of the mth rp-line in q(i); and

$m\,\Delta rp$ defines the value of rp.

Rather than being stored as triplets, each segment is stored as a set of minterms.

Given the ordered pair (d, rp), the minterm equivalent is $\underline{rp} \mid \underline{d}$, where $\mid$ denotes bit-string concatenation. The minterm may be represented as a bit string or as the decimal integer equivalent of $\underline{rp} \mid \underline{d}$ treated as a binary integer. Each triplet $(d_{l,m}, d_{r,m}, m\,\Delta rp)$ is transformed into a pair of minterms, $(\mathrm{MINTRM}_l, \mathrm{MINTRM}_r)$. Under this transformation, each segment of the rp-line is logically defined by $\mathrm{MINTRM}_l \lor (\mathrm{MINTRM}_l + 1) \lor \cdots \lor \mathrm{MINTRM}_r$. The triplets are converted to minterms as follows.
 
The quantities $d_{l,m}$ and $d_{r,m}$ are all divisor transition values and are therefore of the form $N/2^M$. For a given q(i) region, find the largest $\delta$, $\delta_{max}$, required to represent $d_{l,m}$ or $d_{r,m}$. Then $2^{-\delta_{max}}$ is the maximum precision required to represent d. Given $d_{l,m} = N_l/D_l$ and $d_{r,m} = N_r/D_r$ (both fractions in reduced form), $rp = m\,\Delta rp$, and NBDL = the number of bits of the divisor to the right of the radix point, then

$$\mathrm{MINTRM}_l = m\,2^{(\delta_{max} + NBDL)} + \frac{2^{\delta_{max}} N_l}{D_l} \qquad (4.5)$$

$$\mathrm{MINTRM}_r = \left(m\,2^{(\delta_{max} + NBDL)} + \frac{2^{\delta_{max}} N_r}{D_r}\right) - 1 \qquad (4.6)$$
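Equations (4.5) and (4.6) place the rp index m in the high-order part of the minterm and the divisor, in units of $2^{-\delta_{max}}$, in the low-order part. The sketch below computes the pair of minterm numbers for one segment $d_{l,m} \le d < d_{r,m}$. It assumes, for illustration, that the divisor field is $\delta_{max} + NBDL$ bits wide, which is the shift used in (4.5), and the function name is hypothetical.

```python
from fractions import Fraction


def segment_minterms(m: int, d_left: Fraction, d_right: Fraction,
                     delta_max: int, nbdl: int) -> tuple[int, int]:
    """(MINTRM_l, MINTRM_r) for the segment d_left <= d < d_right, (4.5)-(4.6).

    The segment lies on the rp-line rp = m * delta_rp. Both ends must be of
    the form N / 2**M with M <= delta_max, so d * 2**delta_max is an integer.
    """
    shift = 2 ** (delta_max + nbdl)       # m occupies the high-order bits
    left = d_left * 2 ** delta_max        # 2**delta_max * N_l / D_l
    right = d_right * 2 ** delta_max      # 2**delta_max * N_r / D_r
    if left.denominator != 1 or right.denominator != 1:
        raise ValueError("segment ends need at most delta_max fraction bits")
    return m * shift + int(left), m * shift + int(right) - 1
```

Every integer from $\mathrm{MINTRM}_l$ to $\mathrm{MINTRM}_r$ inclusive is then a minterm of the output function for that q-region; for example, m = 3, the segment [1/2, 1), $\delta_{max} = 2$ and NBDL = 1 give minterms 26 and 27.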
 
A useful estimate of the number of minterms required to define a given q(i) region may be derived. The boundaries that the QS3 algorithm actually selects for each q(i) region are stairsteps within the transition regions between q(i), q(i+1), and q(i-1). For purposes of this estimate, assume that the dividing line between q(i) and q(i+1) is the average of the upper boundary of q(i), $rp = (i + \rho)d$, and the lower boundary of q(i+1), $rp = (i + 1 - \rho)d$. The boundary between q(i) and q(i+1) is thus defined by $rp = (i + 1/2)d$. The area of each q(i) region will be defined as the area between the lines d = a, d = b, $rp = (i + 1/2)d$, and $rp = (i - 1/2)d$. Since the vertical extent of this region at d = x is $(i + 1/2)x - (i - 1/2)x = x$,

$$\mathrm{Area}(q(i)) = \int_a^b x\,dx = \frac{b^2 - a^2}{2} \qquad (4.7)$$
 
The area is independent of the value of the quotient digit. Let e be the number of bits to the right of the radix point in rp ($\Delta rp = 2^{-e}$), and let δ be the number of bits to the right of the radix point in d. Note that the minimum value of δ may increase with i. If the worst-case value of δ is applied uniformly in defining all quotient regions, the bits of excess precision become don't-care literals in the course of minimization. To reduce the minimization problem, δ may be treated as a function of i. Define δ(i) as the minimum number of bits required in d to correctly define the q(i) region for the given value of e. Each minterm covers an area of $2^{-e} \cdot 2^{-\delta(i)}$ on the P-D plot. The number of minterms for each q(i) region, M(i), is therefore

$$M(i) = (b^2 - a^2)\, 2^{\,e + \delta(i) - 1}. \qquad (4.8)$$
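Eq. 4.8 can be evaluated directly. In the Python sketch below the function name is hypothetical, and e = 4 (Δrp = 1/16) is an assumption. With a = 1/2, b = 1, and the δ(i) values listed in Table 2, the sketch reproduces that table's estimated minterm counts for q(1) through q(9).

```python
from fractions import Fraction


def estimated_minterms(a, b, e, delta_i):
    """Eq. 4.8: M(i) = (b**2 - a**2) * 2**(e + delta(i) - 1)."""
    return (Fraction(b) ** 2 - Fraction(a) ** 2) * Fraction(2) ** (e + delta_i - 1)


# delta(i) for q(1) .. q(9), read from the "bits in d" column of Table 2; e = 4 is assumed.
deltas = [4, 5, 6, 6, 7, 7, 7, 7, 8]
estimates = [int(estimated_minterms(Fraction(1, 2), 1, 4, d)) for d in deltas]
print(estimates)
```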
 
Figure 8 is an annotated flowchart of the program (QS3) which actually produces the definitions of the output functions for Table 2. Note the following assumptions and conventions:

1. The program was written in Fortran, and thus Fortran notation and variable names are used in the flowchart.

2. In most cases, the Fortran variable names differ from those used in Section 2. The comments include statements which relate each Fortran name to the name used in the derivations. For example, DLEFT ≡ a (2.1). The number in parentheses is the section number in which "a" is defined.

3. The divisor is restricted to positive values in a non-redundant representation, and thus α = 0 in Equation 4.4.

4. Single circles on the flowchart denote entrances; double concentric circles denote exits.
 
 
[READ DLNUM, DLDENO, DRNUM, DRDENO]
[DLEFT = DLNUM/DLDENO; DRIGHT = DRNUM/DRDENO]
The endpoints of the divisor interval are read in fractional form. DLNUM and DLDENO are the numerator and denominator, respectively, of the left end. DRNUM and DRDENO are the numerator and denominator, respectively, of the right end.
DLEFT ≡ a (2.1); DRIGHT ≡ b (2.1)

[READ ERR RP P, ERR RP N]
ERR RP P is the maximum positive truncation error in rp; ERR RP N is the maximum negative truncation error in rp.
ERR RP P ≡ γ (2.3); ERR RP N ≡ λ (2.3)

[READ N, R]
N is the maximum allowable quotient digit. R is the radix.
N ≡ n (2.1); R ≡ r (2.1); NR ≡ ρ (2.1). Note: NR is REAL.

Figure 8. Flowchart of QS3 Program
 
 
[DELRP = 1./DENOM]
DELRP is the increment between successive values of rp. DENOM is defined by an assignment statement prior to this step. DELRP ≡ Δrp.

JMAX is the upper limit on the index used to step along the rp-axis. This is the beginning of the outer loop, which steps along the rp-axis.

[FJ = J - 1; RP = FJ/DENOM; RPU = RP + ERR RP P; RPL = RP - ERR RP N; JM1 = J - 1]
Compute the present value of RP, to be used as $\widehat{rp}$. Also compute the upper (RPU) and lower (RPL) bounds of the rp values represented by $\widehat{rp}$.

[IZCK = 0; IWHICH = 1]
Initialize two control variables. If IZCK remains at 0 through the inner loop, which varies the quotient digit, then no divisor transition intervals occur in (a, b). IWHICH = 1 indicates that we are looking for the first divisor transition interval for the present value of rp. In this case, a = DLEFT will be used as the left end of the segment.

Figure 8 (continued). Flowchart of QS3 Program
 
 
[QI = (RPU/DLEFT) + 1 - NR; I = QI + 1; IF (I.GT.N) I = N; QI = I]
QI, the quotient digit value, is initialized at the greatest value for which the part of the line segment RP = FJ/DENOM between the endpoints of the divisor interval, (a, b), lies in the QI-region.

[DUL = RPU/(QI - 1 + NR)]
DUL is the left endpoint of the divisor transition interval between QI and QI - 1.

This tests whether or not the transition interval is to the left of the left boundary of the P-D plot. If so, QI is decremented.

A divisor transition interval within (a, b) has been found.

This tests whether or not the divisor transition interval is to the right of the right boundary of the P-D plot. If so, continue with a new RP value.

Figure 8 (continued). Flowchart of QS3 Program
 
 
[DUR = RPL/(QI - NR)]
DUR is the right endpoint of the divisor transition interval between QI and QI - 1.

[CALL DT (DUL, DUR, NN, MM)]
Subroutine DT selects the divisor transition value between DUL and DUR. The value selected is returned in fractional form (NN/MM). MM = $2^m$, where m is the smallest integer such that DUL < NN/MM < DUR.

[CALL MINTAL (IWHICH, NN, MM, J-1, QI)]
Subroutine MINTAL creates the minterm definition of the rp-line segments. If IWHICH = 1, then DLNUM/DLDENO is the left end of the segment and NN/MM is the right end. If IWHICH = 0, then the value of NN/MM from the previous call to MINTAL is the left end and the present NN/MM is the right end. J-1 denotes the rp-line and QI the quotient region.

[IWHICH = 0]
Set IWHICH = 0.

Figure 8 (continued). Flowchart of QS3 Program
 
 
[Decrement QI]
Check whether or not all QI-regions have been accounted for.

[NN = DRNUM; MM = DRDENO]
[CALL MINTAL (IWHICH, NN, MM, J-1, QI)]
Use DRNUM/DRDENO as the right end of the last rp-line segment for the present rp.

End of DO-loop which increments rp.

Figure 8 (continued). Flowchart of QS3 Program
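The rp-line segmentation performed by QS3 can be sketched in Python. This is a simplified interpretation of Figure 8, not a transcription of the Fortran. It assumes the following:

- α = 0.
- All inputs are exact fractions.
- The selection criterion holds, so each transition interval is non-empty.

MINTAL's minterm bookkeeping is omitted; each segment is returned as a (left end, right end, QI) triple instead. The names `dt` and `segment_rp_line` are hypothetical; `dt` follows the description of subroutine DT.

```python
from fractions import Fraction
from math import floor


def dt(lo, hi):
    """Simplest binary fraction NN/2**m with lo < NN/2**m < hi (smallest m), as DT is described."""
    if hi <= lo:
        raise ValueError("empty transition interval")
    m = 0
    while True:
        scale = 2 ** m
        nn = floor(lo * scale) + 1              # smallest numerator strictly above lo
        if Fraction(nn, scale) < hi:
            return Fraction(nn, scale)
        m += 1


def segment_rp_line(rp, a, b, gamma, lam, n, rho):
    """Split one rp-line over a <= d <= b into (left end, right end, QI) segments."""
    rpu, rpl = rp + gamma, rp - lam             # RPU and RPL
    qi = min(floor(rpu / a + 1 - rho) + 1, n)   # initial QI, capped at N
    left, segments = Fraction(a), []
    while qi >= 1:
        dul = rpu / (qi - 1 + rho)              # DUL: left end of the QI / QI-1 transition interval
        if dul < a:                             # interval starts left of the P-D plot: decrement QI
            qi -= 1
            continue
        if dul >= b:                            # interval lies right of the P-D plot: line is done
            break
        dur = rpl / (qi - rho)                  # DUR: right end of the transition interval
        t = dt(dul, min(dur, Fraction(b)))      # transition value chosen as DT would
        segments.append((left, t, qi))
        left, qi = t, qi - 1
    segments.append((left, Fraction(b), qi))    # last segment ends at DRIGHT
    return segments


rho = Fraction(2, 3)
for j in range(4):                              # a few rp-lines with delta_rp = 1/4 (illustrative)
    print(j, segment_rp_line(Fraction(j, 4), Fraction(1, 2), Fraction(1),
                             Fraction(1, 16), Fraction(1, 16), 10, rho))
```

Take rp = 1 with a = 1/2, b = 1, γ = λ = 1/16, n = 10, and ρ = 2/3. The sketch returns two segments: (1/2, 11/16) in q(2) and (11/16, 1) in q(1). The breakpoint 11/16 is the simplest binary fraction inside the transition interval (51/80, 45/64).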
 
 
 4.2.2 Minimizing the Output Functions 
 
A discussion of the minimization of two-level switching circuits is beyond the scope of this thesis. However, this section sketches the approach used in this work and references a detailed description of the algorithms. These algorithms are noteworthy because they can minimize functions of many variables involving many minterms. In the present work they have been used to minimize functions of 19 variables with over 3100 minterms.
 
The program QS3 generates a sum-of-products definition of each output function, in which each product is a minterm. For each function, two tasks remain: 1) to obtain all the prime implicants of the function; and 2) to select a minimal cover which consists of some subset of all prime implicants.
 
The program used to accomplish step 1 was recently developed by V. G. Tareski [34]. It is an extension of an algorithm developed by Carroll et al. [35] in late 1968. Tareski has coded his improved version of the algorithm in both PL/1 and Fortran IV on the IBM 360/75.
 
The output from the program PI (for Prime Implicant) is a list of prime implicants, each in the form TTTTTT, where T is

1 if the corresponding variable appears in the true form;
0 if the corresponding variable appears in the complement form; and
X if the corresponding variable is not present.

Each prime implicant is assigned an identification number. The PI program also partially solves the covering problem in that it identifies all essential prime implicants. A prime implicant is essential (must be selected for the covering) if it covers a cell in the n-cube representation of the function which no other prime implicant covers.
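The TTTTTT notation can be made concrete with a small Python sketch. It is not Tareski's PI program. It expands a prime implicant into the minterms it covers, then flags essential prime implicants by the definition above. The function names and the three-variable function with minterms 0, 1, 3, and 7 are made up for illustration.

```python
from itertools import product


def covered_minterms(pi: str) -> set:
    """Minterms covered by a prime implicant written over {'1', '0', 'X'}."""
    choices = [("0", "1") if t == "X" else (t,) for t in pi]
    return {int("".join(bits), 2) for bits in product(*choices)}


def essential_prime_implicants(pis: dict, minterms: set) -> set:
    """IDs of prime implicants that are the only cover of some true minterm."""
    cover = {pid: covered_minterms(p) & minterms for pid, p in pis.items()}
    essential = set()
    for m in sorted(minterms):
        owners = [pid for pid, cells in cover.items() if m in cells]
        if len(owners) == 1:
            essential.add(owners[0])
    return essential


# Made-up 3-variable example: f = m0 + m1 + m3 + m7.
pis = {1: "00X", 2: "0X1", 3: "X11"}
print(essential_prime_implicants(pis, {0, 1, 3, 7}))
```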
 
The program generates a set of constraint equations which must be simultaneously satisfied to guarantee covering. Each equation is a Boolean sum of prime implicant identification numbers. An identification number is "true" if the prime implicant is selected and "false" otherwise. For example, two such equations might be

2 ∨ 5 ∨ 7 = '1'
5 ∨ 9 ∨ 11 = '1'

The set of constraint equations poses a covering problem, i.e., the problem of finding a set of prime implicants which satisfies every equation. The problem is further constrained by the requirement that the sum of the literals of the selected prime implicants be minimal. Fortunately, Liu [36] and Ibaraki et al. [37] recently developed a very efficient algorithm and computer program which solves this problem. The program accepts the constraint equations together with the number of literals in each prime implicant, and produces a minimal-cost covering. These prime implicants, together with the essential prime implicants found by the PI program, constitute the total function. An example is given in Appendix B.
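The covering step can be illustrated with a deliberately naive Python sketch. It is not the Liu/Ibaraki algorithm cited above. It simply enumerates subsets of prime implicant IDs and keeps the cheapest subset that satisfies every constraint equation, so it is practical only for a handful of non-essential prime implicants. The function name and the literal counts in the example are invented for illustration.

```python
from itertools import combinations


def minimal_cover(constraints, literals):
    """Cheapest set of prime-implicant IDs satisfying every OR-constraint.

    constraints: list of sets of PI IDs; each set must contain a selected ID.
    literals: dict mapping PI ID -> number of literals (the cost to minimize).
    """
    ids = sorted(literals)
    best, best_cost = None, None
    for k in range(len(ids) + 1):
        for chosen in combinations(ids, k):
            picked = set(chosen)
            if all(c & picked for c in constraints):     # every equation evaluates to '1'
                cost = sum(literals[i] for i in picked)
                if best_cost is None or cost < best_cost:
                    best, best_cost = picked, cost
    return best, best_cost


# The two constraint equations from the text, with made-up literal counts.
print(minimal_cover([{2, 5, 7}, {5, 9, 11}], {2: 3, 5: 5, 7: 2, 9: 2, 11: 3}))
```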
 
Note that the minimization program does not make explicit use of "don't care" minterms. Let e' be the total number of bits in rp and δ' the total number of bits in d. Then the total number of minterms which can be formed by concatenating rp and d is $2^{\delta' + e'}$. Many of these minterms may not correspond to area within the range of the P-D plot. They are therefore don't cares, in the sense that each may be added to or deleted from a function, whichever yields the simpler function. In the cases actually designed, the number of don't cares far exceeds the number of true minterms. For example, take a divisor in the range 1/2 to 1, a P-D plot with ρ = 2/3, and a uniform grid of spacing $2^{-e}$ in rp and $2^{-\delta}$ in d. The number of minterms required to define the plot is $0.25\,r\,2^{e+\delta}$, and the number of don't care minterms is $0.75\,r\,2^{e+\delta}$. In the cases studied δ + e may be as great as 14, so the don't care minterms would severely tax the minimization routines. They have, therefore, not been included explicitly. The potential effect of the don't cares can be approximated in specific cases by considering the following observations:
 
1. For d in the range 1/2 ≤ d < 1, consider the don't care minterms corresponding to area of the P-D plot to the left of d = 1/2. They would eliminate the d bit of weight 1/2 from all output functions of Table 2. The cost in literals is, therefore, reduced by the number of prime implicants.

2. If the don't care minterms above the upper boundary of the q(n) region are combined with the true minterms defining q(n), the output function for q(n) is greatly minimized. The cost of q(n) will, therefore, be neglected in estimating the total cost of Table 2.

3. Suppose the don't care minterms above the upper boundary of the q(n) region are combined with the true minterms defining q(i), i ≠ n. Then some literals may drop out of the bit string corresponding to the integer portion of rp, but none are removed from the fractional part of rp or d. This reduction may be approximated by studying the problem of minimizing a decoder of the integers 0 through n, each of bit length log₂ r. The minterms n + 1 through r - 1 should be treated as don't cares. It has been estimated that this effect will reduce the total cost of Table 2 by about 15%.
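The factors 0.25 and 0.75 quoted above follow from the area of the P-D plot. With n + ρ = 2r/3, the region 1/2 ≤ d < 1, 0 ≤ rp < (n + ρ)d has area (2r/3)(3/8) = r/4. The full rp|d space, with rp in [0, r) and d in [0, 1), has area r. The Python sketch below checks this ratio by counting grid cells. It assumes those ranges and treats a cell as a true minterm when its centre lies inside the plot; the function name is hypothetical.

```python
from fractions import Fraction
from math import ceil


def true_minterm_fraction(r, e, delta):
    """Fraction of all rp|d minterms whose grid-cell centre lies inside the P-D plot.

    Assumes n = 2(r - 1)/3, rho = 2/3, 1/2 <= d < 1, 0 <= rp < (n + rho) d,
    with rp over [0, r) in steps 2**-e and d over [0, 1) in steps 2**-delta.
    """
    n = Fraction(2 * (r - 1), 3)
    rho = Fraction(2, 3)
    d_cells, rp_cells = 2 ** delta, r * 2 ** e
    inside = 0
    for k in range(d_cells):
        d = (k + Fraction(1, 2)) / d_cells              # centre of the kth d cell
        if Fraction(1, 2) <= d < 1:
            top = (n + rho) * d * 2 ** e                # upper boundary of q(n), in rp cells
            inside += min(max(ceil(top - Fraction(1, 2)), 0), rp_cells)
    return inside / (d_cells * rp_cells)


print(float(true_minterm_fraction(16, 4, 8)))
```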
 
4.3 Deriving a Minimal Cost Design for Table 1

This section describes the algorithms used to synthesize a design for Table 1 of a Type 1 structure. The approach can yield only an estimate of minimal cost, since the minimization algorithm is applied to each output function independently of the others. Furthermore, it has not been demonstrated that the algorithm used to define the output functions necessarily produces a minimal-cost design. Despite these shortcomings, the algorithms appear to produce sufficiently accurate results for two purposes: cost comparison, and studying trade-offs between the cost of Table 1 and Table 2.
 
Table 1 is generated and its cost evaluated in the following steps:

1. Set the values for the design parameters n, r, a, b, α, β, γ, λ.

2. Run the program QS4 (described in Section 4.3.1) to produce a sum-of-products (minterm) definition of each output function of Table 1.

3. Run the program PI (Section 4.2.2) with each set of minterms produced by QS4 as input.

4. Run an Integer Linear Programming routine to find a minimal-cost set of prime implicants which satisfy the constraints produced in step 3.

5. Tabulate the total number of literals required to define each output function. The total of these values will be taken as the cost of implementing Table 1.
 
4.3.1 Defining the Output Functions

A Type 1 structure generates a quotient digit as follows:

1. Given d, form an estimate of d, $\hat{d}$, and from $\hat{d}$ form an estimate of 1/d, A.

2. Form y = rp · A + 1/2.

3. Take the integer portion of y as the quotient digit, i.e., q = I(y).

The algorithm consists of two steps:

1. For given Δrp, γ, λ, n, r, α, β, find a Δd such that the selection criterion is satisfied everywhere on the P-D plot.

2. The $\hat{d}$-values are of the form jΔd, where j is an integer. Each $\hat{d}$ represents a divisor interval from $\hat{d}$ to $\hat{d} + \Delta d$. For every $\hat{d}$, we must find a value of the function $A(\hat{d})$ with the following property: whenever $(\hat{d}, rp)$ implies q = i, then $I(A(\hat{d})\,rp + 1/2) = i$.
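The Type 1 selection rule itself is a single rounding step. The Python sketch below uses a hypothetical function name and a made-up reciprocal estimate A = 43/32 for $\hat{d}$ = 3/4; 43/32 is a five-fractional-bit approximation of 4/3.

```python
from fractions import Fraction
from math import floor


def type1_quotient_digit(rp, A):
    """q = I(y), with y = rp * A + 1/2 and I the integer part (rp >= 0)."""
    return floor(rp * A + Fraction(1, 2))


A = Fraction(43, 32)                 # hypothetical estimate of 1/d_hat for d_hat = 3/4
digits = [type1_quotient_digit(Fraction(k, 4), A) for k in range(13)]
print(digits)
```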
 
 
 
The strictest bounds occur in the vicinity of the transitions between adjacent quotient regions. For a given $\hat{d}$, consider rp-lines near the points where $\hat{d}$ intersects the upper boundary of q(i - 1) and the lower boundary of q(i). See Figure 9.

[Plot: the lines rp = (i - 1 + ρ)d and rp = (i - ρ)d near a given divisor value.]

Figure 9. Portion of P-D Plot Illustrating Constraints in Finding A(d)
 
 
Each rp-line has a divisor transition range between i and i - 1 with left end given by

$$d_l(rp) = \frac{rp + \gamma}{i - 1 + \rho} + \alpha \qquad (4.9)$$

and right end given by

$$d_r(rp) = \frac{rp - \lambda}{i - \rho}. \qquad (4.10)$$

This derivation is given in Section 4.2.1.

If

$$\hat{d} < d_l(rp) \qquad (4.11)$$

then a quotient digit of i must be selected, and thus a value of $A(\hat{d})$ must be found such that

$$\frac{i - 1/2}{rp} \le A(\hat{d}) < \frac{i + 1/2}{rp}. \qquad (4.12)$$

Similarly, if

$$\hat{d} + \Delta d > d_r(rp) \qquad (4.13)$$

then a quotient digit of i - 1 must be selected, and thus an estimate must be found such that

$$\frac{i - 3/2}{rp} \le A(\hat{d}) < \frac{i - 1/2}{rp}. \qquad (4.14)$$

For a given value of i and $\hat{d}$, find the minimum value of rp such that Equation 4.11 is true; denote this quantity $rp_{top}$. Also find the maximum value of rp such that Equation 4.13 is true; denote this value $rp_{bot}$.

Substituting these quantities into Equations 4.12 and 4.14, respectively, yields

$$\frac{i - 1/2}{rp_{top}} \le A(\hat{d}) < \frac{i + 1/2}{rp_{top}} \qquad (4.15)$$

$$\frac{i - 3/2}{rp_{bot}} \le A(\hat{d}) < \frac{i - 1/2}{rp_{bot}}. \qquad (4.16)$$

A value of $A(\hat{d})$ is needed which satisfies both Equations 4.15 and 4.16. Such a value must be within the range

$$\frac{i - 1/2}{rp_{top}} \le A(\hat{d}) < \frac{i - 1/2}{rp_{bot}}. \qquad (4.17)$$
 
 
Denote the lower bound of this range LB(i) and the upper bound UB(i). Over all i, find the maximum value of LB(i) and designate it $LB_{max}$. Find the minimum UB(i) and designate it $UB_{min}$. Then select $A(\hat{d})$ such that $LB_{max} \le A(\hat{d}) \le UB_{min}$ and $A(\hat{d})$ is the simplest binary number in the range.
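The selection of $A(\hat{d})$ for a single divisor cell can be sketched in Python with exact fractions, following Eqs. 4.9 to 4.17. This is an illustrative sketch under the following assumptions:

- The function names are hypothetical.
- $rp_{top}$ and $rp_{bot}$ are found in closed form on the grid rp = jΔrp, rather than by the stepping search of Figure 10.
- The example parameters (n = 10, ρ = 2/3, γ = λ = Δrp = 1/16, α = 0, Δd = 1/256) are chosen only for illustration.

With these parameters and $\hat{d}$ = 1/2, the selected $A(\hat{d})$ is 2, i.e., exactly $1/\hat{d}$.

```python
from fractions import Fraction
from math import ceil, floor


def simplest_binary_closed(lo, hi):
    """Binary fraction N/2**m in [lo, hi] with the smallest m (as DT is described for QS4)."""
    m = 0
    while True:
        scale = 2 ** m
        num = ceil(lo * scale)
        if Fraction(num, scale) <= hi:
            return Fraction(num, scale)
        m += 1


def select_reciprocal(d_hat, delta_d, delta_rp, gamma, lam, alpha, rho, n):
    """Return (LB_max, UB_min, A(d_hat)) following Eqs. 4.9 to 4.17."""
    lbs, ubs = [], []
    for i in range(1, n + 1):
        # rp_top: smallest grid rp with d_hat < d_l(rp)            (Eqs. 4.9, 4.11)
        t = (d_hat - alpha) * (i - 1 + rho) - gamma
        rp_top = max(floor(t / delta_rp) + 1, 1) * delta_rp
        # rp_bot: largest grid rp with d_hat + delta_d > d_r(rp)   (Eqs. 4.10, 4.13)
        u = (d_hat + delta_d) * (i - rho) + lam
        rp_bot = (ceil(u / delta_rp) - 1) * delta_rp
        lbs.append((i - Fraction(1, 2)) / rp_top)                 # LB(i), Eq. 4.17
        if rp_bot > 0:
            ubs.append((i - Fraction(1, 2)) / rp_bot)             # UB(i), Eq. 4.17
    lb_max, ub_min = max(lbs), min(ubs)
    if lb_max > ub_min:
        raise ValueError("no admissible A(d_hat); delta_d is too coarse")
    return lb_max, ub_min, simplest_binary_closed(lb_max, ub_min)


print(select_reciprocal(Fraction(1, 2), Fraction(1, 256), Fraction(1, 16), Fraction(1, 16),
                        Fraction(1, 16), Fraction(0), Fraction(2, 3), 10))
```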
 
Every value of $\hat{d}$ is of the form mΔd, where m is an integer and Δd is a negative integer power of 2. The index m is therefore a unique minterm definition of $\hat{d}$. Let $a_{-1}, a_0, a_1, \ldots, a_j$ be a bit-string representation of $A(\hat{d})$. Each bit corresponds to a Boolean function of $\hat{d}$ and thus a Boolean function of m:

$$a_{-1} = g_{-1}(m),\quad a_0 = g_0(m),\quad a_1 = g_1(m),\quad \ldots,\quad a_j = g_j(m).$$

Each function $g_i$ is defined as the OR of all d-minterms for which $a_i$ is 1 in the bit-string version of $A(\hat{d})$. In other words, the set of minterms $M_i$ corresponding to $g_i$ is

$$M_i = \{\, m \mid a_i \text{ in } A(m\,\Delta d) \text{ is } 1 \,\}.$$
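The assignment of divisor minterms to the bit functions $g_i$ can be sketched in Python. The sketch makes these assumptions:

- $A(m\Delta d)$ has already been chosen for every m. The values in the example are made up, with Δd = 1/8 and j = 3.
- A < 4, and bit $a_i$ has weight $2^{-i}$.

The function name is hypothetical. The last page of Figure 10 appears to extract the same bits by repeatedly doubling the numerator TN.

```python
from fractions import Fraction


def bit_minterm_lists(A_of_m, j):
    """Group divisor indices m by the bits of A(m * delta_d).

    A_of_m: dict m -> A(m * delta_d), a Fraction that is a multiple of 2**-j and < 4.
    Returns {i: sorted list M_i} for i = -1, 0, 1, ..., j (bit a_i has weight 2**-i).
    """
    lists = {i: [] for i in range(-1, j + 1)}
    for m, A in sorted(A_of_m.items()):
        scaled = A * 2 ** j
        if scaled.denominator != 1 or not 0 <= scaled < 2 ** (j + 2):
            raise ValueError("A must be a multiple of 2**-j in [0, 4)")
        word = int(scaled)                     # a_-1 a_0 a_1 ... a_j as one integer
        for i in range(-1, j + 1):
            if (word >> (j - i)) & 1:          # a_i sits j - i places from the right
                lists[i].append(m)
    return lists


# Made-up A values for d_hat = m/8, m = 4..7 (roughly 1/d_hat).
print(bit_minterm_lists({4: Fraction(2), 5: Fraction(3, 2), 6: Fraction(11, 8), 7: Fraction(9, 8)}, 3))
```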
 
Figure 10 is an annotated flowchart of the program (QS4) which actually produces the definitions of the output functions for Table 1.
 
 
For given values of r, n, α, β, λ, γ, find the maximum Δd which will satisfy the precision requirements everywhere on the P-D plot. DELD ≡ Δd.

Generate the array NDT(I), where NDT(I) is the numerator of the Ith value of $\hat{d}$. Here $\hat{d}$ = (I - M)Δd, and M is a constant determined by the minimum value of $\hat{d}$. Let MM1 be the number of elements in NDT.

[DO 290 I = 1, MM1]
This loop increments the value of $\hat{d}$. MM1 is the number of $\hat{d}$ values.

[D = NDT(I) * DELD]
D ≡ $\hat{d}$.

[Q = N]
Set the quotient digit value at N. Work from Q = N down to Q = 1.

Figure 10. Flowchart of QS4 Algorithm
 
 
[ERPP = 1./DELRP; ERPN = 0]
[ERPP = 1./DELRP; ERPN = ERPP]
Define the maximum truncation error in rp.

[NP1 = N + 1; NM1 = N - 1]

[DO 95 K = 1, N]
[Q = NP1 - K]
Work from Q = N down to Q = 1.

[J = D * (Q - NR) * DELRP]
Find the minimum rp for which the transition interval could intersect $\hat{d}$. Note: DELRP = 1/Δrp.

Figure 10 (continued). Flowchart of QS4 Algorithm
 
 
(Entrance 200)
[RP = J/DELRP; RPU = RP + ERPP; DL = RPU/(Q - 1 + NR)]
Find the left end, DL, of the divisor transition interval for the present RP. ERPP ≡ γ, DL ≡ $d_l$, α = 0.

Yes branch: [IQ = Q; DIMIN(IQ) = (Q - .5)/RP]
RPTOP has been found. IQ is an integer version of Q.

No branch: [J = J + 1]

[J = J - 1]
Move down to the next lower rp.

[RP = J/DELRP; RPL = RP - ERPN; DR = RPL/(Q - NR)]
Find the right end, DR, of the divisor transition interval for the present RP.

Figure 10 (continued). Flowchart of QS4 Algorithm
 
 
 
[DIMAX(IQ) = (Q - 0.5)/RP]
RPBOT has been found.

[For J = 1, N, find LBMAX = max(DIMIN(J)) and UBMIN = min(DIMAX(J))]

[CALL DT (LBMAX, UBMIN, DIN(I), DID(I))]
Subroutine DT finds a value DI for the inverse of D. DI satisfies DI = DIN(I)/DID(I) and LBMAX ≤ DI ≤ UBMIN, and it is the simplest binary fraction in the interval.

[TN = DIN(I); TD = DID(I)]

Figure 10 (continued). Flowchart of QS4 Algorithm
 
 
[DO 68 K = 1, 12]
This DO-loop assigns each minterm corresponding to a $\hat{d}$ value to the appropriate output functions.

[TN = 2 * TN]

[IP(K) = IP(K) + 1; A(K,IP(K)) = NDT(I)]
If D = NDT(I) * DELD implies that bit K of the output is 1, then NDT(I) is added to the minterm list for A(K). IP(K) is the pointer for the Kth list.

Figure 10 (continued). Flowchart of QS4 Algorithm
 

 
4.3.2 Minimizing the Output Functions

The output functions of Table 1 are minimized with the same techniques used for the output functions of Table 2. These techniques were described in Section 4.2.2.
 
5. RESULTS FROM DESIGN PROGRAMS 
 
 5.1 Preliminary Remarks 
 
The computer runs of the design and analysis routines described in the last chapter gave rise to four types of results. First, the algorithms produced numerical results for the cost of implementing Table 1 or Table 2 for various values of the design parameters. In retrospect, however, the computer proved more valuable for the insight it provided than for the numbers themselves. Second, studying the numerical results gave rise to some theoretical results with which to attack the problem of determining cost without actual design.

A third result was a discrepancy. For some parameter values, the theoretical results disagreed with the results obtained from the computer-aided synthesis. Closer study revealed a weakness in the QS3 algorithm. The fourth and final result of the work to date was therefore an improved algorithm for designing Table 2.
 
 5.2 Numerical Results from Design Programs 
 
 5.2.1 Cost of Table 2 for Type 2 Structure 
 
The number of possible combinations of parameter values is large, even when restricted to practical cases, so very few designs were actually generated in the present work. The cost data for Table 2 were first generated with r = 16, n = 10, a = 1/2, b = 1, γ = λ = 1/16, α = 0, and β = 1/256. These data gave sufficient insight to propose an analytic expression for the cost of implementing each quotient region of the table. Two additional runs of the Table 2 routines with different parameter values tended to substantiate the predicted costs, but several points stood out as discrepancies. Attempts to reconcile the disagreement revealed a flaw in the QS3 algorithm: selecting the divisor transition values as the simplest binary numbers in the transition intervals does not necessarily produce a minimum-cost design. In view of this flaw, further runs of the algorithm were not justified. The emphasis shifted to two goals: deriving an analytic cost expression, and developing an algorithm that yields correct results which could be used to verify the expression.

The parameter values selected correspond to practical cases. Let r denote the radix, and assume a multiplication structure in which the following multiples of the multiplicand are available: ±1 or ±2, ±4 or ±8, ±16 or ±32, ..., ±r/4 or ±r/2. Each group, such as ±4 or ±8, corresponds to a two-way shift gate, and only one of the two multiples may be selected at a time. The magnitude of the maximum multiple which may be formed, n, is therefore 2 + 8 + 32 + ... + r/2 = 2(r - 1)/3. The same structure is used for division, so the maximum quotient digit is also n. Therefore, in the cases studied, n = 2(r - 1)/3, and thus the redundancy ratio, ρ, is 2/3.
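This arithmetic can be checked with a small Python sketch. It assumes r is a power of 4 and that each shift gate may also contribute 0; the function name is hypothetical. The sketch forms every signed sum of one multiple per gate. It then confirms that the largest sum is 2(r - 1)/3 and that every digit from -n to n can be formed.

```python
from itertools import product


def shift_gate_digits(r):
    """All signed sums using one multiple (or none) from each gate {±4**k, ±2*4**k}, 4**k < r."""
    gates, p = [], 1
    while p < r:
        gates.append((0, p, -p, 2 * p, -2 * p))
        p *= 4
    return {sum(choice) for choice in product(*gates)}


for r in (4, 16, 64):
    digits = shift_gate_digits(r)
    n = max(digits)
    assert n == 2 * (r - 1) // 3                 # n = 2(r - 1)/3
    assert set(range(-n, n + 1)) <= digits       # every digit from -n to n can be formed
    print(r, n, n / (r - 1))                     # redundancy ratio rho = n / (r - 1)
```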
 
As mentioned earlier, the study was restricted to the first quadrant of the P-D plot. Two divisor ranges were considered. The first is the binary normalized case, 1/2 ≤ d < 1. The second is 3/4 ≤ d < 9/8, which results when a divisor in the range 1/2 ≤ d < 1 is multiplied by 3/2 whenever d < 3/4.
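The mapping between the two divisor ranges is shown in the short Python illustration below. It assumes exact rational divisors, and the function name is hypothetical. The check confirms that both subranges of the normalized divisor land in [3/4, 9/8).

```python
from fractions import Fraction


def scale_divisor(d):
    """Map 1/2 <= d < 1 onto 3/4 <= d' < 9/8 by multiplying by 3/2 when d < 3/4."""
    if not Fraction(1, 2) <= d < 1:
        raise ValueError("d must be binary normalized")
    return d * Fraction(3, 2) if d < Fraction(3, 4) else d


scaled = [scale_divisor(Fraction(k, 256)) for k in range(128, 256)]
assert all(Fraction(3, 4) <= x < Fraction(9, 8) for x in scaled)
```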
 
The maximum truncation errors in rp, γ and λ, were initially set to 1/16. This is the maximum value for which the criterion in Section 2.3 is satisfied. Error was assumed in both directions so that the results would apply to symmetric adders or subtracters [10].

The divisor is strictly positive and non-redundantly represented; thus α = 0. The positive truncation error in d, β, was set to the largest value that satisfies the selection criterion (Section 2.3) everywhere on the P-D plot for the given values of γ and λ.
 
Table 2 summarizes the cost computations for a Table 2 structure with r = 16, n = 10, a = 1/2, b = 1, γ = 1/16, λ = 1/16, α = 0, and β = 1/256. Radix 16 was selected as large enough to be interesting but small enough to avoid great expense of computer time. Table 3 presents corresponding results for divisors in the range 3/4 ≤ d < 9/8.

No cost values are given for the upper quotient region, q(n). These regions were not minimized, since the results would be highly inaccurate without the ability to include don't care minterms. The upper boundary of q(n) need not be implemented, because the range restrictions imposed by the division algorithm prevent (d, rp) values from occurring above the q(n) region. All minterms corresponding to points above the line rp = (n + ρ)d are therefore don't care minterms. These don't cares sharply reduce the cost of implementing the adjacent q(n) region.

Note that Table 2 also contains the cost of a Table 2 structure for r = 4, n = 2. Neglecting the upper region q(2), this cost is the cost of q(0) + q(1) for radix 16, less 2 literals per required prime implicant.
 
 
Table 2. Summary of Cost Calculations for Table 2 with
r = 16, n = 10, a = 1/2, b = 1, γ = 1/16, λ = 1/16, α = 0, β = 1/256.

Columns: (1) q-region, i; (2) minimum number of bits in rp to define the region; (3) minimum number of bits in d to define the region; (4) estimated and (5) actual number of minterms required to define the region, M'(i); (6) minimum number of prime implicants to define the region; (7) rp, (8) d, and (9) total number of literals to define the region, C'(i); (10) average fan-in, F'(i).

 (1)  (2)  (3)   (4)   (5)   (6)   (7)   (8)   (9)   (10)
   0    8    2    12    12     4    25     6    31   7.75
   1    8    4    96    99    13    82    27   109   8.38
   2    8    5   192   195    21   138    62   200   9.52
   3    8    6   384   384    36   236   129   365  10.14
   4    8    6   384   385    45   296   190   486  10.80
   5    8    7   768   765    60   389   269   658  10.96
   6    8    7   768   774    72   464   334   798  11.08
   7    8    7   768   764    84   541   424   965  11.49
   8    8    7   768   771    96   627   507  1134  11.81
   9    8    8  1536  1526   109   711   584  1295  11.88
Totals                       540  3509  2532  6041
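The totals, the fan-in column, and the radix-4 figure mentioned above can be recomputed from the per-region entries of Table 2. The short Python check below uses an arbitrary data layout. It reads "less 2 literals per required prime implicant" literally, which gives 106 literals for the r = 4, n = 2 structure.

```python
# Per-region rows of Table 2 (r = 16): (prime implicants, rp literals, d literals).
TABLE2 = [
    (4, 25, 6), (13, 82, 27), (21, 138, 62), (36, 236, 129), (45, 296, 190),
    (60, 389, 269), (72, 464, 334), (84, 541, 424), (96, 627, 507), (109, 711, 584),
]

totals = [sum(column) for column in zip(*TABLE2)]
fan_in = [(rp + d) / pis for pis, rp, d in TABLE2]      # F'(i) = C'(i) / prime implicants

# Radix-4 cost per the text: q(0) + q(1) of radix 16, less 2 literals per prime implicant.
radix4_literals = sum(rp + d - 2 * pis for pis, rp, d in TABLE2[:2])
print(totals, sum(totals[1:]), radix4_literals)
print([round(f, 2) for f in fan_in])
```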
 
 
Table 3. Summary of Cost Calculations for Table 2 with
r = 16, n = 10, a = 3/4, b = 9/8, γ = 1/16, λ = 1/16, α = 0, β = 1/128.

Columns: (1) q-region, i; (2) minimum number of bits in rp to define the region; (3) minimum number of bits in d to define the region; (4) estimated and (5) actual number of minterms required to define the region, M'(i); (6) minimum number of prime implicants to define the region; (7) rp, (8) d, and (9) total number of literals to define the region, C'(i); (10) average fan-in, F'(i).

 (1)  (2)  (3)   (4)   (5)   (6)   (7)   (8)   (9)   (10)
   0    8    4    24    24     2    10     7    17   8.50
   1    8    4    45    44     7    40    26    66   9.43
   2    8    5    90    91    14    84    57   141  10.07
   3    8    6   180   180    23   139   101   240  10.43
   4    8    6   180   181    27   166   127   293  10.85
   5    8    7   360   359    32   199   160   359  11.22
   6    8    7   360   362    40   246   212   458  11.45
   7    8    7   360   358    54   332   301   633  11.72
   8    8    7   360   363    54   339   305   644  11.93
   9    8    7   360   358    61   383   351   734  12.03
Totals                       314  1938  1647  3585
 
 
 5.2.2 Cost of Table 1 for Type 1 Structure 
 
The design of Table 1 is considerably less complicated than that of Table 2, since Table 1 is a function of only one input rather than two. The costs for radix 4, 16, and 64 were generated and are summarized in Table 4. The complexity of the table is adequate to produce a quotient digit in the leading bits of the product A · rp, where A = f($\hat{d}$) and is of the form $a_0$
 
Table 4. Summary of Cost Calculations for Table 1 with
a = 1/2, b = 1, γ = 1/16, λ = 1/16, α = 0.

Note: NPI = minimum number of prime implicants; NL = minimum number of literals.

Output    r = 4, n = 2   r = 16, n = 10   r = 64, n = 42
Bit       β = 1/16       β = 1/256        β = 1/1024
           NPI    NL      NPI    NL        NPI    NL
1                  1        1     1          1     1
2                  4        3     8          4    13
3                  7        8    29          9    38
4                  7       12    56         18    91
5                          16    79         28   153
6                          19    95         41   239
7                          18   109         63   401
8                                           80   554
9                                           80   579
10                                           9    70
Totals            19       77   377        333  2139
 
 
5.3 Analytic Results Concerning Cost of Table 2

5.3.1 Preliminary Remarks

Figure 11 plots the cost in literals of implementing q(i) against i for the results given in Table 2. To a first approximation, the cost varies linearly with i. This observation led to a comparison of the empirical results with a theoretical, indirect measure of the cost of selecting quotient digits suggested by Robertson [5]. That cost function behaves similarly with i. In the following, we review aspects of Robertson's work, suggest extensions, and then propose an expression for the cost of implementing Table 2 as a function of the design parameters.
 
[Plot: cost in literals (vertical axis, marked 200 to 1200) versus i, the quotient region q(i) (horizontal axis).]

Figure 11. Cost of Implementing q(i)-Region vs. i for Data in Table 2
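The near-linear trend in Figure 11 can be quantified with an ordinary least-squares fit to the Total column of Table 2. This Python sketch is an illustration, not part of the original analysis. It gives a slope of roughly 145 literals per quotient region, with r² above 0.99.

```python
# Total literals C'(i) for q(0) .. q(9), from the Total column of Table 2.
cost = [31, 109, 200, 365, 486, 658, 798, 965, 1134, 1295]


def least_squares_line(ys):
    """Fit y = slope * i + intercept for i = 0 .. len(ys) - 1; also return r**2."""
    n = len(ys)
    xs = range(n)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


slope, intercept, r2 = least_squares_line(cost)
print(round(slope, 1), round(intercept, 1), round(r2, 3))
```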
 
 
5.3.2 Definition of $s_i$, $s'_i$, and $s''_i$
 
In Robertson's work, the design problem consists of two tasks: choosing comparison constants against which $rp_j$ is compared, and determining the divisor range for which each comparison constant is valid. The proposed measure of the cost of selecting between q(i) and q(i - 1) is the minimum number of comparison constants required to cover the given range of the divisor.
 
The selection ratio, $\sigma_i$, is first defined. It is the ratio of the slope of the line defining the lower boundary of q(i) to the slope of the line defining the upper boundary of q(i - 1), i.e.,

$$\sigma_i = \frac{i - \rho}{i - 1 + \rho}. \qquad (5.1)$$

The selection ratio is a relative measure of the width of the divisor interval for which a single comparison constant is valid. Correctly distinguishing between q = i and q = i - 1 requires a minimum number of divisor intervals. This number corresponds to the number of treads in the staircase between the upper boundary of q(i - 1) and the lower boundary of q(i).

Let $s_i$ denote the minimum number of steps required to span the overlap region between q(i) and q(i - 1) for the divisor range a to b, as shown in Figure 12. The slope of the upper boundary is v = i - 1 + ρ, and the slope of the lower boundary is w = i - ρ. Let $\Delta_1$ be the width of the rightmost tread, $\Delta_2$ the width of the second tread (moving from right to left), and so on. The quantity h is the height of the riser between tread 1 and tread 2.
 
By definition,

$$w = h/\Delta_1, \qquad v = h/\Delta_2, \qquad (5.2)$$

and thus

$$\Delta_2 = \sigma_i \Delta_1. \qquad (5.3)$$
 
 
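[Plot: staircase between the upper boundary of q(i - 1), slope v = i - 1 + ρ, and the lower boundary of q(i), slope w = i - ρ.]

Figure 12. Graphical Interpretation of $s_i$

In general,

$$\Delta_k = \sigma_i \Delta_{k-1} = \sigma_i^{\,k-1}\Delta_1. \qquad (5.4)$$

By definition,

$$\sum_{k=1}^{s_i} \sigma_i^{\,k-1}\Delta_1 = b - a. \qquad (5.5)$$

The left side of Equation 5.5 is the sum of a geometric series, and thus

$$\Delta_1 \frac{\sigma_i^{\,s_i} - 1}{\sigma_i - 1} = b - a. \qquad (5.6)$$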
 
 
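Since $\Delta_1 = b(1 - \sigma_i)$, Equation 5.6 reduces to $b(1 - \sigma_i^{\,s_i}) = b - a$. Therefore $s_i$ is the smallest integer that satisfies

$$\sigma_i^{\,s_i} \le a/b. \qquad (5.7)$$

For present purposes, consider $s_i$ to be a continuous variable rather than an integer. Then

$$s_i = \frac{\log(a/b)}{\log \sigma_i}. \qquad (5.8)$$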
 
We will now rewrite the expression for $s_i$ in a form which makes its linear behavior with i apparent. By the properties of logarithms,

$$\log \sigma_i = \log(i - \rho) - \log(i - 1 + \rho) = \log(1 + x) - \log(1 - x), \quad \text{where } x = \frac{1 - 2\rho}{2i - 1}.$$

With ρ restricted to the range 1/2 < ρ < 1, we have -1 < x < 1, and thus a series form of log(1 + x) - log(1 - x) may be used. Therefore,

$$\log \sigma_i = 2\sum_{m=1}^{\infty} \frac{x^{2m-1}}{2m-1} \qquad (5.9)$$

$$= 2\left(x + \frac{x^3}{3} + \frac{x^5}{5} + \cdots\right) \qquad (5.10)$$

$$= 2x + \text{h.o.t.},$$

and thus

$$s_i \approx \frac{\log(b/a)\,(i - 1/2)}{2\rho - 1}. \qquad (5.11)$$
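Equations 5.7 and 5.11 can be compared numerically. The Python sketch below computes $s_i$ exactly, as the smallest integer satisfying Eq. 5.7, alongside the first-order approximation of Eq. 5.11. The parameters ρ = 2/3, a = 1/2, b = 1 mirror the binary normalized case studied in Section 5.2 and serve only as an example. For i = 2, the exact count is 4, the continuous value of Eq. 5.8 is about 3.11, and Eq. 5.11 gives about 3.12.

```python
import math
from fractions import Fraction


def selection_ratio(i, rho):
    """Eq. 5.1: sigma_i = (i - rho) / (i - 1 + rho)."""
    return (i - rho) / (i - 1 + rho)


def steps_exact(i, rho, a, b):
    """Eq. 5.7: smallest integer s with sigma_i**s <= a/b (requires rho > 1/2)."""
    sigma, bound, s = selection_ratio(i, rho), Fraction(a) / Fraction(b), 1
    while sigma ** s > bound:
        s += 1
    return s


def steps_approx(i, rho, a, b):
    """Eq. 5.11: s_i ~ log(b/a) * (i - 1/2) / (2 rho - 1)."""
    return math.log(b / a) * (i - 0.5) / (2 * rho - 1)


rho, a, b = Fraction(2, 3), Fraction(1, 2), Fraction(1)
for i in range(1, 11):
    print(i, steps_exact(i, rho, a, b), round(steps_approx(i, float(rho), float(a), float(b)), 2))
```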
 
The quantity $s_i$, as defined so far, assumes full precision in the representation of the divisor and partial remainder. The expression for $s_i$ will now be modified for the case in which only estimates of rp and d, $\widehat{rp}$ and $\hat{d}$ respectively, are available. The modified $s_i$ is the minimum number of steps required to traverse the transition region between q(i) and q(i - 1) in that case. Assume as before that $\widehat{rp}$ is representative of rp values in the range $\widehat{rp} - \lambda \le rp \le \widehat{rp} + \gamma$. Likewise, $\hat{d}$ is representative of d values in the range $\hat{d} - \alpha \le d \le \hat{d} + \beta$. For the time being, assume that $\widehat{rp}$ and $\hat{d}$ may take any value, not merely discrete values.
 
If we consider the staircase to be the upper boundary of the $(\hat d, \widehat{rp})$ values defining the q(i-1) region, then for all pairs $(\hat d_\ell, \widehat{rp}_\ell)$ defining the risers and treads, the restriction

$$ \widehat{rp}_{\ell-1} + \gamma \le v(\hat d_{\ell-1} - \alpha) \tag{5.12} $$

must hold. Thinking of the staircase as the lower boundary of values defining the q(i) region, then for all pairs $(\hat d_\ell, \widehat{rp}_\ell)$ defining the risers and treads, the restriction

$$ \widehat{rp}_\ell - \lambda \ge w(\hat d_\ell + \beta) \tag{5.13} $$

must hold.
 
Since adjacent values of $\widehat{rp}$ are separated by $\Delta rp$ and adjacent values of $\hat d$ are separated by $\Delta d$,

$$ \hat d_\ell = \hat d_{\ell-1} - \Delta d, \tag{5.14} $$

$$ \widehat{rp}_\ell = \widehat{rp}_{\ell-1} + \Delta rp . \tag{5.15} $$

The staircase must satisfy both restrictions 5.12 and 5.13 subject to Equations 5.14 and 5.15. Substituting Equations 5.14 and 5.15 into 5.13 yields another restriction in terms of $\widehat{rp}_{\ell-1}$ and $\hat d_{\ell-1}$, namely

$$ \widehat{rp}_{\ell-1} + \Delta rp - \lambda \ge w(\hat d_{\ell-1} - \Delta d + \beta) . \tag{5.16} $$

For a given value of $\widehat{rp}_{\ell-1}$, the maximum tread width is therefore the distance between the intersections of the line $rp = \widehat{rp}_{\ell-1}$ with the lines

$$ rp = w(d - \Delta d + \beta) + \lambda - \Delta rp, \tag{5.17} $$

$$ rp = v(d - \alpha) - \gamma . \tag{5.18} $$
 
Figure 13. Graphical Interpretation of $s_i'$. Lines shown: (1) $rp = vd$; (2) $rp = v(d - \alpha) - \gamma$; (3) $rp = w(d - \Delta d + \beta) + \lambda - \Delta rp$; (4) $rp = wd$; with $v = i - 1 + p$ and $w = i - p$.
 
Figure 13 is a graphical interpretation of the minimum-step boundary between q(i) and q(i-1) for this imprecise case.

The effect of the imprecision on $s_i$ may be thought of as shifting the divisor range of the P-D plot by an amount $d'$ given by

$$ d' = \frac{\lambda + \gamma - \Delta rp + v\alpha + w(\beta - \Delta d)}{2p - 1} . \tag{5.19} $$

The value of $s_i$ in this case, denoted $s_i'$, is given by

$$ s_i' = \frac{\log\big((a - d')/(b - d')\big)}{\log \alpha_i} . \tag{5.20} $$
 
Note that this equation is equivalent to replacing a by $a - d'$ and b by $b - d'$ in Equation 5.8. This may be verified by replacing $\Delta_1$ in Equation 5.6 by the appropriate expression in the present case, namely by

$$ \Delta_1 = b - \frac{w(b + \beta - \Delta d) + (\gamma + \lambda - \Delta rp)}{v} - \alpha . \tag{5.21} $$

Geometrically, $d'$ is the value of d at the intersection of the lines defined by Equations 5.17 and 5.18.
 
Equation 5.19 implies that it is not merely the imprecision but rather the redundancy in the representation of rp and d which increases the number of treads in the boundary staircase. First, note that to ensure covering, i.e., that every value of rp and d maps into at least one $\widehat{rp}$ and $\hat d$, respectively, the inequalities

$$ \lambda + \gamma - \Delta rp \ge 0, \tag{5.22} $$

$$ \alpha + \beta - \Delta d \ge 0 \tag{5.23} $$

must hold. This restriction forbids $d'$ from being negative and thus $s_i'$ from being less than $s_i$. If $\lambda + \gamma - \Delta rp = 0$, $\alpha = 0$, and $\beta - \Delta d = 0$, then $s_i' = s_i$. This corresponds to the case in which every rp and d value maps into one and only one $\widehat{rp}$ and $\hat d$, respectively.
 
In terms of the P-D plot, this means that there is no overlap between the areas represented by the pairs $(\hat d, \widehat{rp})$. In this case, even though $\widehat{rp} \ne rp$ and $\hat d \ne d$, the selection is theoretically no more complicated than in the full-precision case.
 
For the cases treated in this study $\lambda = \lambda'\Delta rp$, $\gamma = \gamma'\Delta rp$, $\alpha = 0$, $\beta = \Delta d$, and thus

$$ d' = \frac{\Delta rp\,(\lambda' + \gamma' - 1)}{2p - 1} . \tag{5.24} $$
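As a numerical illustration of Equations 5.19, 5.20, and 5.24, the Python sketch below (function and argument names are hypothetical) computes the shift $d'$ and the imprecise tread count $s_i'$. It assumes λ = γ = Δrp = 1/16, α = 0, β = Δd, and p = 2/3 for the divisor range 1/2 to 1. The half-sum $(s_1' + s_2')/2$ it prints comes out close to the value 2.83 listed for i = 1 in Table 5.

```python
import math


def d_shift(i, p, lam, gam, alpha_d, beta_d, d_rp, d_d):
    """Equation 5.19: shift d' of the divisor range caused by imprecise estimates.

    lam, gam        : remainder errors (rp_hat - lam <= rp <= rp_hat + gam)
    alpha_d, beta_d : divisor errors   (d_hat - alpha_d <= d <= d_hat + beta_d)
    d_rp, d_d       : spacing of adjacent rp_hat and d_hat values
    """
    v, w = i - 1 + p, i - p
    return (lam + gam - d_rp + v * alpha_d + w * (beta_d - d_d)) / (2 * p - 1)


def treads_with_shift(i, p, a, b, shift):
    """Equation 5.20: tread count with a and b replaced by a - shift and b - shift."""
    alpha_i = (i - p) / (i - 1 + p)
    return math.log((a - shift) / (b - shift)) / math.log(alpha_i)


p, a, b = 2 / 3, 0.5, 1.0
d_rp = 1 / 16
d_d = 2 ** -8  # any value works here: it cancels because beta_d = d_d
dp = d_shift(1, p, d_rp, d_rp, 0.0, d_d, d_rp, d_d)  # reduces to Eq. 5.24: d_rp / (2p - 1)
s1 = treads_with_shift(1, p, a, b, dp)
s2 = treads_with_shift(2, p, a, b, d_shift(2, p, d_rp, d_rp, 0.0, d_d, d_rp, d_d))
print(dp, s1, (s1 + s2) / 2)  # approximately 0.1875, 1.3785, 2.8303
```

With zero redundancy (shift = 0) the same function returns the full-precision count of Equation 5.8.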
 
The analysis so far has allowed for an error in representing d and rp but has not restricted the values of $\hat d$ and $\widehat{rp}$. In practice these are formed by truncation and are therefore restricted to integral multiples of $\Delta d = 2^{-\delta}$ and $\Delta rp = 2^{-\varepsilon}$, where $\delta$ and $\varepsilon$ are the number of bits to the right of the binary point in the representation of $\hat d$ and $\widehat{rp}$, respectively. The location of the treads and risers of the actual staircase which can be implemented may therefore simultaneously differ by as much as $\Delta d$ and $\Delta rp$, respectively. The maximum number of steps (taking into account both error and discrete effects) required to define the boundary between q(i) and q(i-1) may therefore be given by

$$ s_i'' = \frac{\log\big((a - d'')/(b - d'')\big)}{\log \alpha_i} \tag{5.25} $$

where

$$ d'' = \frac{\lambda + \gamma - \Delta rp + 2^{-\varepsilon} + v(\alpha + 2^{-\delta}) + w(\beta - \Delta d)}{2p - 1} . \tag{5.26} $$
 
The actual number of steps required, $s_{i,\mathrm{act}}$, is therefore bounded by

$$ s_i' \le s_{i,\mathrm{act}} \le s_i'' . \tag{5.27} $$
 
Equation 5.26 may be used to determine the minimum values of $\varepsilon$ and $\delta$ required for a given P-D plot. The quantity $s_i''$, and thus the cost, tends to infinity as $d''$ approaches a. To ensure that every region of the P-D plot may be correctly defined for given values of $\lambda$, $\gamma$, $\alpha$, $\beta$, the quantities $\varepsilon$ and $\delta$ must therefore be selected such that $d'' < a$.
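The selection rule $d'' < a$ can be applied mechanically. Below is a minimal Python sketch under stated assumptions. The parameter choice is that of Equation 5.24: λ = γ = Δrp = $2^{-\varepsilon}$, α = 0, β = Δd = $2^{-\delta}$. The caller supplies the largest quotient digit `i_max`, and all helper names are hypothetical. Because $d''$ grows with $v = i - 1 + p$, the largest i is the binding case, but the code simply checks every boundary.

```python
import math


def d_double_prime(i, p, lam, gam, alpha_d, beta_d, eps, delta):
    """Equation 5.26 with Delta_rp = 2**-eps and Delta_d = 2**-delta."""
    v, w = i - 1 + p, i - p
    d_rp, d_d = 2.0 ** -eps, 2.0 ** -delta
    num = lam + gam - d_rp + 2.0 ** -eps + v * (alpha_d + 2.0 ** -delta) + w * (beta_d - d_d)
    return num / (2 * p - 1)


def s_double_prime(i, p, a, b, ddp):
    """Equation 5.25: worst-case tread count, meaningful only while ddp < a."""
    alpha_i = (i - p) / (i - 1 + p)
    return math.log((a - ddp) / (b - ddp)) / math.log(alpha_i)


def min_delta(eps, p, a, i_max, delta_limit=32):
    """Smallest delta giving d'' < a on every boundary i = 1..i_max, or None if none does."""
    d_rp = 2.0 ** -eps
    for delta in range(1, delta_limit + 1):
        d_d = 2.0 ** -delta
        if all(d_double_prime(i, p, d_rp, d_rp, 0.0, d_d, eps, delta) < a
               for i in range(1, i_max + 1)):
            return delta
    return None


print(min_delta(eps=4, p=2 / 3, a=0.5, i_max=10))  # 8
```

If $\varepsilon$ itself is too small, no divisor precision can satisfy the rule and the function returns None.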
 
5.3.3 An Estimate of Cost as a Function of $s_i'$
 
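In this section we will hypothesize an expression for the cost of implementing the q(i) region of a given P-D plot. Consider the region to be defined by a set of minterms corresponding to the set of ordered pairs $(\hat d, \widehat{rp})$ for which q = i. Let $\Delta d$ for the region be $2^{-\delta(i)}$ and $\Delta rp$ for the region be $2^{-\varepsilon(i)}$. The nominal region, lying between $rp = (i - 1/2)d$ and $rp = (i + 1/2)d$, has area $(b^2 - a^2)/2$, and each minterm covers an area $\Delta d \, \Delta rp$, so the number of minterms needed to define the region will be

$$ M(i) = (b^2 - a^2)\, 2^{\varepsilon(i) + \delta(i) - 1} . \tag{5.28} $$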
 
The fan-in to each minterm, F(i), is given by

$$ F(i) = \varepsilon' + \delta' \tag{5.29} $$

where

$$ \varepsilon' = \log_2 r + \varepsilon(i), \tag{5.30} $$

$$ \delta' = I\big(\log_2(b - 2^{-\delta(i)}) + 1\big) + \delta(i) . \tag{5.31} $$

The term $I\big(\log_2(b - 2^{-\delta}) + 1\big)$ is merely the number of bits of the divisor to the left of the radix point. Recall that I(x) has been defined as the integer portion of x.
 
The cost before minimization is given by

$$ C_T(i) = M(i)F(i) + M(i) . \tag{5.32} $$

The term MF is the number of literals in the AND gates; the term M is the number of literals in the OR gate.

After minimization,

$$ C_M(i) = M'(i)\,\big(F'(i) + 1\big) \tag{5.33} $$

where M'(i) is the number of prime implicants and F'(i) is the average fan-in to each prime implicant.
 
In order to obtain approximations of M'(i) and F'(i), we now approximate the effects of minimization by the following algorithm.

Figure 14 illustrates a portion of a quotient region. Note that it may be defined by a set of adjacent rectangles (denoted by heavy lines), each of which is defined by a set of minterms (denoted by small squares). Consider one of the rectangles, of width W and height H. Assume that minimization proceeds first in the d-direction by combining adjacent minterms which differ only in the low-order bit. If there were initially M minterms in the rectangle, after the first step there are M/2 implicants. Next, the implicants which differ only in the next-to-low-order position may combine to produce M/4 implicants, and so on. The minimization in the d-direction continues for $k_d = I(\log_2 W)$ steps to form $M/2^{k_d}$ implicants. Similarly, combinations take place in the rp-direction, further reducing the number of implicants to $M/2^{k_d + k_{rp}}$, where $k_{rp} = I(\log_2 H)$. Since each combination step eliminates one variable, it also lowers the fan-in of every implicant by one.

Figure 14. Model of the q(i) Region Used in Approximating Effects of Minimization.
 
The minimization of the quotient region will be characterized by an average rectangle of dimensions W × H. The width is defined by

$$ W = 2^{\delta(i)}\,(b - a)/\bar s_i \tag{5.34} $$

where

$$ \bar s_i \equiv (s_i' + s_{i+1}')/2 . \tag{5.35} $$

The quantity W is therefore the average width of the minimum-number treads defining the upper and lower boundaries of q(i). The height is defined by

$$ H = 2^{\varepsilon(i)}\,(b + a)/4 \tag{5.36} $$

which is the average value of the distance between $rp = (i + 1/2)d$ (nominal upper boundary) and $rp = (i - 1/2)d$ (nominal lower boundary).
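The minterm count, fan-in, and rectangle model of Equations 5.28 through 5.36 can be combined into a small calculator. In this Python sketch (hypothetical names; `s_bar` stands for the average tread count of Equation 5.35), k is taken as the continuous quantity $\log_2(WH)$, as Equation 5.38 approximates it. The sketch returns the approximate number of implicants $M/2^k$ and their fan-in $F - k$ after the idealized minimization.

```python
import math


def minterms(a, b, eps, delta):
    """Equation 5.28: number of minterms covering the nominal q(i) region."""
    return (b * b - a * a) * 2 ** (eps + delta - 1)


def minterm_fanin(r, b, eps, delta):
    """Equations 5.29-5.31: fan-in of one minterm (remainder bits plus divisor bits)."""
    int_bits = math.floor(math.log2(b - 2.0 ** -delta) + 1)  # I(log2(b - 2^-delta) + 1)
    return (math.log2(r) + eps) + (int_bits + delta)


def rectangle_model(r, a, b, eps, delta, s_bar):
    """Idealized minimization on an average W x H rectangle (Section 5.3.3).

    Each halving step merges pairs of implicants and removes one literal, so
    after k = log2(W*H) steps there are about M / 2**k implicants of fan-in F - k.
    """
    W = 2 ** delta * (b - a) / s_bar   # Eq. 5.34, in units of Delta_d
    H = 2 ** eps * (b + a) / 4         # Eq. 5.36, in units of Delta_rp
    k = math.log2(W * H)               # continuous version of k_d + k_rp
    return minterms(a, b, eps, delta) / 2 ** k, minterm_fanin(r, b, eps, delta) - k


implicants, fanin = rectangle_model(r=16, a=0.5, b=1.0, eps=4, delta=8, s_bar=2.83)
print(implicants, fanin)  # implicants equals 2 * s_bar = 5.66 up to rounding
```

Note that $M/2^k$ reduces algebraically to $2\bar s_i$, independent of $\varepsilon$ and $\delta$. The radix r = 16 in the usage line is an arbitrary example.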
 
The preceding argument suggests a cost expression of the following form:*

$$ C'(i) = \varphi_1 \frac{M(i)}{2^{k}} \big( F(i) - k + \log_2 \varphi_2 \big) \tag{5.37} $$

where M and F are defined by Equations 5.28 and 5.29, respectively, and k is defined by

$$ k = k_d + k_{rp} \approx \log_2 WH . \tag{5.38} $$

The factors $\varphi_1$ and $\varphi_2$ are constants which will be determined empirically. Since $M(i)/2^k = 2\bar s_i$ by Equations 5.28, 5.34, and 5.36, Equation 5.37 may be rewritten as

$$ C'(i) = M'(i)\,F'(i) \tag{5.39} $$

where

$$ M'(i) = 2\varphi_1 \bar s_i , \tag{5.40} $$

$$ F'(i) = \log_2 \frac{4\varphi_2\, r\, \bar s_i}{b^2 - a^2} + I\big(\log_2(b - 2^{-\delta}) + 1\big) . \tag{5.41} $$

*Note that C'(i) is the number of literals in the AND gates; $C_M(i) = C'(i) + M'(i)$ is the total number of literals for the region.
 
M'(i) is the minimal number of prime implicants required to implement the Boolean function for q(i), and F'(i) is the average fan-in to each prime implicant.
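Equations 5.39 through 5.41 are easy to evaluate directly. The Python sketch below uses the constants $\varphi_1 = 2.12$ and $\varphi_2 = 1.68$ reported later in this section. The radix r = 16 in the usage line is an assumption, since the radix behind Tables 2 and 5 is not restated here. With it, the predicted values for $\bar s_i = 2.83$ come out close to the "Equation" entries for i = 1 in Table 5.

```python
import math

PHI1, PHI2 = 2.12, 1.68  # empirical constants reported in Section 5.3.3


def predicted_cost(s_bar, r, a, b, delta, phi1=PHI1, phi2=PHI2):
    """Equations 5.39-5.41: predicted prime implicants M', mean fan-in F', and C' = M'F'."""
    m_prime = 2 * phi1 * s_bar                                               # Eq. 5.40
    int_bits = math.floor(math.log2(b - 2.0 ** -delta) + 1)                  # I(log2(b - 2^-delta) + 1)
    f_prime = math.log2(4 * phi2 * r * s_bar / (b * b - a * a)) + int_bits   # Eq. 5.41
    return m_prime, f_prime, m_prime * f_prime                               # Eq. 5.39


m, f, c = predicted_cost(s_bar=2.83, r=16, a=0.5, b=1.0, delta=8)
print(round(m, 2), round(f, 2), round(c, 1))
```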
 
We now use numerical results from Tables 2 and 3 to find values for $\varphi_1$ and $\varphi_2$ and to test the predictive worth of Equation 5.39. The value of $\varphi_1$ is obtained by a least-squares fit of the actual values of M'(i) to Equation 5.40. The value of $\varphi_2$ is obtained by a least-squares fit of the actual values of F'(i) to Equation 5.41. Values of $\varphi_1 = 2.12$ and $\varphi_2 = 1.68$ were obtained.
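The fit for $\varphi_1$ can be reproduced in outline. The Python sketch below rests on assumptions about the procedure, since the exact data set and weighting behind the reported value are not stated here. It takes the QS3 values of M'(i) and the corresponding tread averages listed in Table 5 for both divisor ranges. It then fits Equation 5.40 by least squares through the origin, which gives $\varphi_1 = \sum M'\bar s / (2\sum \bar s^2)$. The result is close to the reported 2.12.

```python
# Tread averages and QS3 prime-implicant counts copied from Table 5.
S_BAR = [1.38, 2.83, 5.72, 8.59, 11.46, 14.33, 17.20, 20.06, 22.93, 25.80,   # a = 1/2, b = 1
         0.74, 1.51, 3.06, 4.60, 6.13, 7.66, 9.19, 10.72, 12.26, 13.79]      # a = 3/4, b = 9/8
M_QS3 = [4, 13, 21, 36, 45, 60, 72, 84, 96, 109,
         2, 7, 14, 23, 27, 32, 40, 54, 54, 61]


def fit_phi1(s_values, m_values):
    """Least squares for M' = 2 * phi1 * s_bar with no intercept.

    Setting d/dphi1 of sum((M - 2 phi1 s)^2) to zero gives
    phi1 = sum(M * s) / (2 * sum(s^2)).
    """
    num = sum(m * s for s, m in zip(s_values, m_values))
    den = 2 * sum(s * s for s in s_values)
    return num / den


print(round(fit_phi1(S_BAR, M_QS3), 3))
```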
 
Table 5 summarizes the results of the fit. Figures 15, 16, and 17 display the results graphically, with $\bar s_i$ as the independent variable. The heavy line denotes the predicted values; the circles denote actual values.
 
Note that Equations 5.40 and 5.41 do not explicitly account for the discrete effects resulting from the fact that the treads and risers of the q-region boundaries are restricted to integer multiples of $2^{-\varepsilon}$ and $2^{-\delta}$, respectively. The effect is included empirically in the choice of $\varphi_1$ and $\varphi_2$. There are indications that a more explicit cost function of both $s_i'$ and $s_i''$, which does include discrete effects, might be found. For present purposes, however, the estimates given by Equations 5.40 and 5.41 were judged to be adequate.
 
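Table 5. Results of Least Squares Fit of M'(i), F'(i), and C'(i) for Data from Table 2.

a = 1/2, b = 1

| i | $\bar s_i$ | M'(i) Equation | M'(i) QS3 | F'(i) Equation | F'(i) QS3 | C'(i) Equation | C'(i) QS3 |
|---|---|---|---|---|---|---|---|
| 0 | 1.38 | 5 | 4 | 7.6 | 7.7 | 44 | 31 |
| 1 | 2.83 | 12 | 13 | 8.6 | 8.4 | 103 | 109 |
| 2 | 5.72 | 24 | 21 | 9.6 | 9.5 | 234 | 200 |
| 3 | 8.59 | 36 | 36 | 10.2 | 10.1 | 373 | 365 |
| 4 | 11.46 | 48 | 45 | 10.6 | 10.8 | 519 | 486 |
| 5 | 14.33 | 60 | 60 | 11.0 | 11.0 | 668 | 658 |
| 6 | 17.20 | 72 | 72 | 11.2 | 11.0 | 821 | 798 |
| 7 | 20.06 | 85 | 84 | 11.4 | 11.5 | 977 | 965 |
| 8 | 22.93 | 97 | 96 | 11.6 | 11.8 | 1135 | 1134 |
| 9 | 25.80 | 109 | 109 | 11.8 | 11.8 | 1296 | 1295 |

a = 3/4, b = 9/8

| i | $\bar s_i$ | M'(i) Equation | M'(i) QS3 | F'(i) Equation | F'(i) QS3 | C'(i) Equation | C'(i) QS3 |
|---|---|---|---|---|---|---|---|
| 0 | 0.74 | 3 | 2 | 7.8 | 8.5 | 24 | 17 |
| 1 | 1.51 | 6 | 7 | 8.9 | 9.4 | 56 | 66 |
| 2 | 3.06 | 13 | 14 | 9.9 | 10.0 | 127 | 141 |
| 3 | 4.60 | 19 | 23 | 10.5 | 10.4 | 203 | 240 |
| 4 | 6.13 | 26 | 27 | 10.9 | 10.9 | 282 | 293 |
| 5 | 7.66 | 32 | 32 | 11.2 | 11.2 | 363 | 359 |
| 6 | 9.19 | 39 | 40 | 11.5 | 11.5 | 446 | 458 |
| 7 | 10.72 | 45 | 54 | 11.7 | 11.7 | 531 | 633 |
| 8 | 12.26 | 52 | 54 | 11.9 | 11.9 | 617 | 644 |
| 9 | 13.79 | 58 | 61 | 12.0 | 12.0 | 704 | 734 |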
 
Figure 15. M'(i) versus $\bar s_i$.

Figure 16a. F'(i) versus $\bar s_i$.

Figure 16b. F'(i) versus $\bar s_i$.

Figure 17a. C'(i) versus $\bar s_i$.

Figure 17b. C'(i) versus $\bar s_i$.
 
5.3.4 Discrepancies
 
The two cases for which numerical results were presented in Section 5.2.1 differ only in the range of the divisor. We should also consider the effect of varying the precision in the estimates of the operands. The program QS3 was therefore also run for the same parameter values as listed in Table 2 (Section 2.5.1), except that $\Delta rp$, $\gamma$, and $\lambda$ were decreased from 1/16 to 1/32. The minimized results are shown in Table 6. Numbers under the heading "Equation" are from the evaluation of Equation 5.39; numbers under the heading "QS3" are from the QS3 and minimization programs.
 
Table 6. Comparison of Results from Estimating Equation and the QS3 Program for $\Delta rp$ = 1/32.

| i | $\bar s_i$ | M'(i) Equation | M'(i) QS3 | F'(i) Equation | F'(i) QS3 | C'(i) Equation | C'(i) QS3 |
|---|---|---|---|---|---|---|---|
| 0 | 1.16 | 5 | 3 | 7.37 | 7.66 | 36 | 23 |
| 1 | 2.38 | 10 | 10 | 8.41 | 8.20 | 84 | 82 |
| 2 | 4.80 | 20 | 20 | 9.43 | 9.65 | 191 | 193 |
| 3 | 7.21 | 31 | 34 | 10.01 | 10.02 | 306 | 346 |
| 4 | 9.62 | 41 | 44 | 10.43 | 10.8 | 425 | 476 |
| 5 | 12.03 | 51 | 62 | 10.75 | 10.9 | 548 | 679 |
| 6 | 14.44 | 61 | 67 | 11.02 | 11.4 | 674 | 749 |
| 7 | 16.85 | 71 | 84 | 11.24 | 11.5 | 802 | 970 |
| 8 | 19.25 | 82 | 90 | 11.43 | 11.9 | 933 | 1067 |
| 9 | 21.66 | 92 | 110 | 11.60 | 11.8 | 1065 | 1303 |
 
In Figure 18, the data from the C'(i)-QS3 column of Table 6 have been added (denoted by X's) to Figure 17a. Note that these X-points start near the predicted values (solid line) but fall increasingly above them.
 
Figure 18. C'(i) versus $\bar s_i$ for $\Delta rp$ = 1/16 and $\Delta rp$ = 1/32.
 
The source of this discrepancy turns out not to be the predictive equations, as might first be suspected, but rather the QS3 algorithm, specifically the decision to pick divisor transition values as the simplest binary fraction in the allowable interval. This choice was made in the early stages of the research, when other measures of cost were being used, and it was not evaluated critically when the minterm approach was adopted. Fortunately, as will be explained, it was possible to salvage the numerical results produced by QS3. A correct algorithm has also been found and is described in the Appendix.
 
The essence of the problem is the failure to fully appreciate the two-dimensional nature of the minimization problem. For several of the q-regions which produced doubtful results, the areas corresponding to the prime implicants of the reduced function were drawn on a P-D plot, making the upper and lower stairstep boundaries apparent.
 
By close inspection of the boundaries, it could be seen that the decision to force the location of risers to the simplest binary fraction sometimes over-constrained the location of the tread. In other words, in some cases for which a divisor interval could have been spanned with one tread, the algorithm generated two treads. Furthermore, each of these extra treads required an extra prime implicant to define it. Thus, although the output function was minimal for the given definition of the q-region, that definition was unduly complicated, and the result was therefore not truly minimal. By manually revising the boundary to eliminate the superfluous prime implicants, it was found that the cost was reduced to close agreement with the predicted values.
 
But the constants in the equation for estimating cost, $\varphi_1$ and $\varphi_2$, were specified based upon results from the QS3 program. Why should they be trusted? The answer to this question is found in the following argument.

If we think of the transition region between q(i) and q(i-1) as being defined by a grid of vertical spacing $\Delta rp$ and horizontal spacing $\Delta d$, then the set of all boundaries between q(i) and q(i-1) is the set of all stairsteps which can be drawn along this grid and still remain inside the transition region. As $\Delta d$ and $\Delta rp$ are decreased, the number of different boundaries increases exponentially. The problem is to pick boundaries that minimize the number of literals in the Boolean function defining the area enclosed by the boundaries. (Such an algorithm is described in the Appendix.) Fortunately, for the parameter values used to derive the constants $\varphi_1$ and $\varphi_2$, the dimensions of the transition regions left very little choice in selecting the boundaries. It is therefore asserted that the boundaries produced by the QS3 algorithm and by a correct algorithm would be very nearly the same. A graphical spot check of several of the boundaries confirmed this assertion. When, however, $\Delta rp$ was reduced from 1/16 to 1/32, the number of possible boundaries increased, and the discrepancy became apparent.
 
There is one other case for which a discrepancy is apparent. In Table 5, for a = 3/4, b = 9/8, and i = 7, notice that M'(i) from QS3 is 54 while the predicted value is 45. This difference accounts for the high points at $\bar s_i$ = 10.72 in Figures 15 and 17b. The prime implicant covering for this case (q(7)) was drawn, and it was thus discovered that six extra prime implicants had been generated. In this case, although $\Delta rp$ is also 1/16, the shifting of the divisor range to the right increases the width of the transition region to the extent that the QS3 algorithm may fail badly for d values near the upper limit, b. Fortunately, it did so in only this one region.
 
5.4 Analytic Results Concerning Cost of Table 1

5.4.1 Preliminary Remarks
 
The program QS4 produces a cost estimate of Table 1 for a Type 1 structure for which the precision of Ad is such that the rounded, integer portion of Ad is a correct quotient digit. As mentioned in Section 2.4, we are also interested in hybrid structures in which Table 1 and the multiples are used to transform the divisor and remainders before they are applied to Table 2. In the following sections we consider the effect of the transformation on the design parameters for Table 2 and then propose an expression to estimate the cost of implementing Table 1 for a given precision in A and d.
 
5.4.2 Worst-Case Bounds on Transformed Parameters
 
As in Section 2.2, assume that we are given $\hat d$, which is representative of divisor values in the range $\hat d - \alpha \le d \le \hat d + \beta$, and are given $\widehat{rp}$, which is representative of remainders in the range $\widehat{rp} - \lambda \le rp \le \widehat{rp} + \gamma$. Let $A = F(\hat d)$ be generated by Table 1. The range of the transformed divisor, $d^T$, now represented is given by

$$ A\hat d - A\alpha \le d^T \le A\hat d + A\beta \tag{5.42} $$

and the range of the transformed remainder, $rp^T$, is given by

$$ A\widehat{rp} - A\lambda \le rp^T \le A\widehat{rp} + A\gamma . \tag{5.43} $$

The divisor range which must be accommodated by Table 2 is $(a^T, b^T)$, where

$$ a^T = (Ad)_{\min} - A_{\max}\alpha, \tag{5.44} $$

$$ b^T = (Ad)_{\max} + A_{\max}\beta . \tag{5.45} $$
 
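The worst-case transformed values of $\alpha$, $\beta$, $\lambda$, and $\gamma$ are merely $A_{\max}\alpha$, $A_{\max}\beta$, $A_{\max}\lambda$, and $A_{\max}\gamma$. If $2^{-j}$ is the weight of the low-order bit in A, then

$$ \Delta d^T = \Delta d \, 2^{-j}, \tag{5.46} $$

$$ \Delta rp^T = \Delta rp \, 2^{-j} . \tag{5.47} $$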
 
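Assuming that $A_{\max} = 2$, $d'$ (Equation 5.19) becomes

$$ d' = \frac{2\lambda + 2\gamma - 2^{-(\varepsilon+j)} + 2v\alpha + w\big(2\beta - 2^{-(\delta+j)}\big)}{2p - 1} . \tag{5.48} $$

Assume that $\alpha = 0$ and that j is sufficiently large to permit the terms $2^{-(\varepsilon+j)}$ and $2^{-(\delta+j)}$ to be neglected relative to $\lambda$, $\gamma$, and $\beta$; then

$$ d' \approx \frac{2\Delta rp\,(\lambda' + \gamma') + 2w\beta}{2p - 1} . \tag{5.49} $$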
 
This value of $d'$, for given $\lambda'$, $\gamma'$, $\Delta rp$, and $\beta$, is greater than $d'$ as defined in Equation 5.24. Furthermore, $d'$ increases with i because of the $2w\beta$ term. This comparison indicates that although the transformation reduces cost by narrowing the divisor range for Table 2, it increases cost by tightening the restrictions on the q-region boundaries.
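To make the comparison between Equations 5.24 and 5.49 concrete, the Python sketch below (hypothetical helper names) evaluates three quantities for λ' = γ' = 1 and β = Δd. The first is the untransformed shift of Equation 5.24. The second is the transformed shift of Equation 5.48 with α = 0 and $A_{\max} = 2$. The third is its large-j approximation of Equation 5.49.

```python
def shift_plain(d_rp, lam_p, gam_p, p):
    """Equation 5.24 (alpha = 0, beta = Delta_d)."""
    return d_rp * (lam_p + gam_p - 1) / (2 * p - 1)


def shift_transformed(i, d_rp, lam_p, gam_p, beta, eps, delta, j, p):
    """Equation 5.48 with A_max = 2 and alpha = 0."""
    w = i - p
    lam, gam = lam_p * d_rp, gam_p * d_rp
    return (2 * lam + 2 * gam - 2.0 ** -(eps + j)
            + w * (2 * beta - 2.0 ** -(delta + j))) / (2 * p - 1)


def shift_transformed_approx(i, d_rp, lam_p, gam_p, beta, p):
    """Equation 5.49: the large-j limit of Equation 5.48."""
    w = i - p
    return (2 * d_rp * (lam_p + gam_p) + 2 * w * beta) / (2 * p - 1)


eps, delta, p = 5, 4, 2 / 3
d_rp, beta = 2.0 ** -eps, 2.0 ** -delta
for i in (1, 4, 8):
    print(i, shift_plain(d_rp, 1, 1, p),
          shift_transformed(i, d_rp, 1, 1, beta, eps, delta, j=4, p=p),
          shift_transformed_approx(i, d_rp, 1, 1, beta, p))
```

The printed rows show the transformed shift exceeding the untransformed one and growing with i, as stated above.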
 
The most difficult terms to evaluate in this analysis are $(Ad)_{\min}$ in Equation 5.44 and $(Ad)_{\max}$ in Equation 5.45. They are the subject of the remainder of this section.
 
The design problem for Table 1 may be viewed as that of implementing an estimate of the function $f(d) = d^{-1}$. In the following analysis we shall treat divisors in the range $1/2 \le d < 1$. The approach adopted here is to specify the precision in A, the estimate of $d^{-1}$, and then to determine the precision in d required to guarantee that dA is within a certain interval in the vicinity of one. The precision of A is selected as the independent variable since it determines the number of additions required in forming the product dA. The number of additions is the dominant factor in determining the operating time of the T1, M1, M2 part of the quotient selector.
 
Let the set of discrete values of the output of Table 1 be defined by

$$ \mathcal{A} = \{\, m\tau \,\} \tag{5.50} $$

where $\tau = 2^{-j}$ for some positive integer j, and m is an integer ranging from $1/\tau$ through $2/\tau$. The tick marks on the ordinate of Figure 19 designate such a set.

Figure 19. Geometry for Derivation of Estimates of $d^{-1}$.
 
For every element of $\mathcal{A}$ we must define a divisor interval for which $m\tau$ is used as the estimate of the reciprocal of divisor values in the interval. Interpreted graphically, the elements of $\mathcal{A}$ determine the location of the treads of a stairstep approximation to $d^{-1}$. The remaining task is to specify the location of the risers (the dotted lines in Figure 19).
 
Let $d_{l,m}$ and $d_{r,m}$ denote the left and right ends, respectively, of the divisor interval for which $A = m\tau$ is taken as the inverse of divisor values in the range $d_{l,m} \le d \le d_{r,m}$. It may be shown that the optimum values for $d_{l,m}$ and $d_{r,m}$, in the sense of minimizing the maximum value of $|1 - dA|$, are

$$ d_{l,m} = \frac{2}{\tau(2m + 1)}, \tag{5.51} $$

$$ d_{r,m} = \frac{2}{\tau(2m - 1)} . \tag{5.52} $$
 
These equations correspond to the reciprocals of the average of $\tau m$ and $\tau(m+1)$, and of $\tau m$ and $\tau(m-1)$, respectively. For divisor values d in the range $d_{l,m} \le d \le d_{r,m}$, the range of dA is given by

$$ 1 - e^-(m) \le dA \le 1 + e^+(m) \tag{5.53} $$

where

$$ e^+(m) = 1/(2m - 1), \tag{5.54} $$

$$ e^-(m) = 1/(2m + 1) . \tag{5.55} $$

The negative error is maximum for $m = m_{\min} = 1/\tau$, but since $1/2 \le d < 1$, the positive error $e^+(m)$ is maximum at $m = m_{\min} + 1$ (for $m = m_{\min}$, $A = 1$ and $d < 1$, so $dA < 1$).
 
In practice $d_{l,m}$ and $d_{r,m}$ are also discrete values and thus, in general, cannot be placed precisely as specified by Equations 5.51 and 5.52. In this case the determination of the error bounds on the product dA is more complicated.
 
If $d_{l,m}$ and $d_{r,m}$ are represented to $\delta$ places to the right of the radix point, then the actual end points can be within $2^{-\delta}$ of the theoretically optimal points. Let $\Delta = 2^{-\delta}$; for the worst case, replace $d_{l,m}$ by $d_{l,m} - \Delta$ and replace $d_{r,m}$ by $d_{r,m} + \Delta$.
 
Now

$$ 1 - e^-(m) \le dA \le 1 + e^+(m) \tag{5.56} $$

where

$$ e^+(m) = m\Delta\tau + 1/(2m - 1), \tag{5.57} $$

$$ e^-(m) = m\Delta\tau + 1/(2m + 1) . \tag{5.58} $$

Note that, due to the range restriction on d,

$$ e^+(m_{\min}) = m_{\min}\Delta\tau \tag{5.59} $$

and

$$ e^-(m_{\max}) = m_{\max}\Delta\tau . \tag{5.60} $$
 
Since we require

$$ 2^{-\delta} \le \frac{1}{\tau}\left(\frac{1}{m} - \frac{1}{m + 1}\right) \tag{5.61} $$

for all allowable m, the maximum value of $2^{-\delta}$ should be less than or equal to $\tau/4$, and $\delta$ should therefore be greater than or equal to $j + 2$.
 
For given values of $\tau$ and $\Delta$,

$$ (Ad)_{\min} = 1 - e^-(m)_{\max}, \tag{5.62} $$

$$ (Ad)_{\max} = 1 + e^+(m)_{\max}, \tag{5.63} $$

where the maxima are taken over all m in the range $1/\tau$ to $2/\tau$.
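The error bounds of Equations 5.51 through 5.63 can be tabulated for a given table precision. The Python sketch below (function names hypothetical) first checks the ideal riser placement of Equations 5.51 and 5.52. It then computes $(Ad)_{\min}$ and $(Ad)_{\max}$ from Equations 5.57 through 5.63 for $\tau = 2^{-j}$ and $\Delta = 2^{-\delta}$. Exact rational arithmetic from Python's `fractions` module avoids rounding in the comparison of the maxima.

```python
from fractions import Fraction


def ideal_risers(m, tau):
    """Equations 5.51 and 5.52: optimal interval ends for the table entry A = m * tau."""
    d_left = Fraction(2) / (tau * (2 * m + 1))
    d_right = Fraction(2) / (tau * (2 * m - 1))
    return d_left, d_right


def product_bounds(j, delta):
    """Equations 5.57-5.63: worst-case (Ad)_min and (Ad)_max for tau = 2**-j, Delta = 2**-delta."""
    tau, step = Fraction(1, 2 ** j), Fraction(1, 2 ** delta)
    m_min, m_max = 2 ** j, 2 ** (j + 1)      # m runs from 1/tau to 2/tau
    e_plus, e_minus = [], []
    for m in range(m_min, m_max + 1):
        shift = m * step * tau               # extra error from discrete end points
        # Eqs. 5.59 and 5.60: the range 1/2 <= d < 1 removes one error term at each end.
        e_plus.append(shift if m == m_min else shift + Fraction(1, 2 * m - 1))
        e_minus.append(shift if m == m_max else shift + Fraction(1, 2 * m + 1))
    return 1 - max(e_minus), 1 + max(e_plus)


tau = Fraction(1, 4)
d_left, d_right = ideal_risers(5, tau)
print(d_right * 5 * tau, d_left * 5 * tau)   # equal to 1 + 1/9 and 1 - 1/11 (Eqs. 5.54, 5.55)
lo, hi = product_bounds(j=2, delta=4)        # delta = j + 2, the minimum allowed by Eq. 5.61
print(lo, hi)
```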
 
5.4.3 An Estimate of the Cost of Table 1
 
We now derive an expression with which to estimate the minimum cost in literals of Table 1 when structured as specified in Section 3.2.3. Let the outputs of value A be of the form

$$ A = a_{-1}a_0 . a_1 a_2 \cdots a_j $$

and consider the d axis of Figure 19 to be equally divided in units of $2^{-\delta}$. After all values of $d_{l,m}$ and $d_{r,m}$ are specified, each bit of A may be defined by a sum of products of minterms of the form $k\,2^{-\delta}$.
 
We will now derive an estimate of the cost of implementing $a_i = f_i(d)$. In the range $1 \le A \le 2$, each bit $a_i$ is 1 in $2^{i-1}$ intervals, each of length $2^{-i}$. Let $y'_{i,k}$ be the value of the bottom of the kth interval along the A axis for bit $a_i$, and let $y''_{i,k}$ be the top of the interval. Thus

$$ y'_{i,k} = 1 + (2k - 1)\,2^{-i}, \tag{5.64} $$

$$ y''_{i,k} = 1 + 2k\,2^{-i}, \tag{5.65} $$

for $i = 1, 2, \ldots, j$ and $k = 1, 2, \ldots, 2^{i-1}$.
 
Let $X_{i,k}$ be the width of the corresponding interval along the d axis; thus

$$ X_{i,k} = \frac{1}{y'_{i,k}} - \frac{1}{y''_{i,k}} = \frac{2^{-i}}{(4k^2 - 2k)\,2^{-2i} + (4k - 1)\,2^{-i} + 1} . \tag{5.66} $$

Let each interval of width $2^{-\delta}$ along the d axis correspond to a minterm, each with a fan-in of $\delta$. The number of minterms required to define $X_{i,k}$ is
 
$$ M_{i,k} = 2^{\delta} X_{i,k} ; \tag{5.67} $$

the number of literals is

$$ C_{i,k} = \delta\, M_{i,k} . \tag{5.68} $$
 
Using the same approximation to the minimization algorithm as described in Section 5.3.3, the cost in literals, after minimization, of implementing the $X_{i,k}$ interval is

$$ C'_{i,k} = M'_{i,k} F'_{i,k} \tag{5.69} $$

where, with $\mu = I(\log_2 M_{i,k})$,

$$ M'_{i,k} = M_{i,k} / 2^{\mu} $$

is an approximation to the number of prime implicants required, and

$$ F'_{i,k} = \delta - \mu \tag{5.70} $$

is an approximation to the average fan-in.
 
The cost of implementing $a_i = f_i(d)$ is therefore

$$ C'_i = \sum_{k=1}^{2^{i-1}} C'_{i,k} . \tag{5.71} $$

The total number of prime implicants required is

$$ M'_i = \sum_{k=1}^{2^{i-1}} M'_{i,k} . \tag{5.72} $$

The cost for the entire table is therefore

$$ C_{T1} = \sum_{i=1}^{j} \big( C'_i + M'_i \big) . \tag{5.73} $$
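Equations 5.64 through 5.73 translate directly into a loop over output bits and intervals. The following Python sketch (function names hypothetical) computes the estimate $C_{T1}$. As an added assumption not stated in the report, μ is clamped at zero when an interval holds fewer than one minterm.

```python
import math


def interval_width(i, k):
    """Equation 5.66: width along d of the k-th interval in which bit a_i of A is 1."""
    return 2.0 ** -i / ((4 * k * k - 2 * k) * 2.0 ** (-2 * i) + (4 * k - 1) * 2.0 ** -i + 1)


def table1_cost(j, delta):
    """Equations 5.67-5.73: estimated literal count C_T1 of Table 1."""
    total = 0.0
    for i in range(1, j + 1):                      # fraction bits a_1 .. a_j
        cost_i, implicants_i = 0.0, 0.0
        for k in range(1, 2 ** (i - 1) + 1):
            m = 2 ** delta * interval_width(i, k)  # Eq. 5.67: minterms in the interval
            mu = max(0, math.floor(math.log2(m)))  # mu = I(log2 M), clamped at 0 (assumption)
            m_prime = m / 2 ** mu                  # approximate prime implicants
            f_prime = delta - mu                   # Eq. 5.70: approximate average fan-in
            cost_i += m_prime * f_prime            # Eqs. 5.69 and 5.71
            implicants_i += m_prime                # Eq. 5.72
        total += cost_i + implicants_i             # Eq. 5.73
    return total


print(round(table1_cost(j=2, delta=4), 2))
```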
 
6. ESTIMATES OF COST AND PERFORMANCE 
 
 6.1 Preliminary Remarks 
 
 In this section we use the analytic tools developed in Section 5 
 together with the definitions in Section 3 to tabulate samples of expected 
 cost and performance. Results are given for Type 2 structures, Type 1 
 structures, and finally for a family of hybrid structures. Since the radix 
 of the model division is the primary determinant of performance, for each 
 structure we first consider cost versus radix, then performance versus radix, 
 and finally cost versus performance. 
 
Some of the results depend upon the assignment of numerical values to quantities used in the definitions of Section 3. The values selected are based upon experience in arithmetic unit design. A different set of realistic values would only shift the location of the cost-performance curves and would not materially alter their shape; general conclusions inferred from them would not change.
 
 6.2 Type 2 Structures 
 6.2.1 Cost versus Radix 
 
The cost of Table 2, $C_{T2}$, is given by

$$ C_{T2} = \sum_{i=0}^{n-1} \big( C'(i) + M'(i) \big) \tag{6.1} $$

where C'(i) is defined by Equation 5.39 and M'(i) is defined by Equation 5.40.
 
Tables 7a and 7b summarize cost versus radix for several values of $\Delta rp$. Table 7a is for a divisor in the range 1/2 to 1 and Table 7b is for a divisor in the range 3/4 to 9/8. In all cases, p = 2/3, $\gamma' = \lambda' = 1$, $\beta' = 1$, and $\alpha' = 0$. The quantity $\Delta d$ is $2^{-\delta}$, where $\delta$ is given for each entry in the tables.
 
The limiting cases (4 and 8) are based upon the assumption that the precision in $\widehat{rp}$ and $\hat d$ is increased such that $s_i' = s_i$. A near-minimal cost should lie between Cases 1 and 4 for the first divisor range or between Cases 5 and 8 for the second divisor range. The cost entries are given in the following form:

18 (Prime Implicants)
111 (Literals in AND Gates)
129 (Total Cost)
 
 
 
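Table 7a. Cost of Table 2 versus Radix

Each cost entry is written as prime implicants / literals in AND gates / total cost.

| r | δ | Case 1 ($\Delta rp$ = 1/16) | δ | Case 2 ($\Delta rp$ = 1/32) | δ | Case 3 ($\Delta rp$ = 1/64) | δ | Case 4 ($\Delta rp$ = 0) |
|---|---|---|---|---|---|---|---|---|
| 4 | 5 | 18 / 111 / 129 | 5 | 15 / 90 / 105 | 3 | 14 / 81 / 95 | ∞ | 13 / 74 / 87 |
| 16 | 8 | 552 / 6170 / 6722 | 7 | 464 / 5064 / 5528 | 7 | 430 / 4646 / 5076 | ∞ | 400 / 4291 / 4691 |
| 64 | 9 | 10470 / 160526 / 170996 | 9 | 8792 / 132578 / 141370 | 9 | 8148 / 121971 / 130119 | ∞ | 7595 / 112928 / 120523 |
| 256 | 11 | 174597 / 3381283 / 3555880 | 11 | 146610 / 2802307 / 2948987 | 11 | 135871 / 2582126 / 2717997 | ∞ | 126656 / 2394169 / 2520825 |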
 
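Table 7b. Cost of Table 2 versus Radix

Each cost entry is written as prime implicants / literals in AND gates / total cost.

| r | δ | Case 5 ($\Delta rp$ = 1/16) | δ | Case 6 ($\Delta rp$ = 1/32) | δ | Case 7 ($\Delta rp$ = 1/64) | Case 8 ($\Delta rp$ = 0) |
|---|---|---|---|---|---|---|---|
| 4 | 5 | 10 / 61 / 71 | 4 | 8 / 52 / 60 | 3 | 8 / 49 / 57 | 7 / 46 / 53 |
| 16 | 7 | 296 / 3353 / 3649 | 6 | 261 / 2920 / 3181 | 6 | 247 / 2742 / 2989 | 234 / 2583 / 2817 |
| 64 | 8 | 5597 / 86870 / 92467 | 8 | 4953 / 75988 / 80941 | 8 | 4684 / 71481 / 76165 | 4443 / 67470 / 71913 |
| 256 | 10 | 93341 / 1825332 / 1918673 | 10 | 82590 / 1600477 / 1683067 | 10 | 78097 / 1505408 / 1583505 | 74090 / 1424130 / 1498220 |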
 
 6.2.2 Performance versus Radix 
 
The following equations from Section 3 are relevant to the calculations in this section.

Operating Time of Model Division:

$$ T_Q = T_{PREF} + T_{M1} + T_{T2} + T_R \tag{3.7} $$

Performance of Model Division:

$$ P = \frac{\log_2 r}{T_Q} \tag{3.8} $$
 
Operating Time of Full Precision Division:
 
 T„ T^ + T 
 
 T D = M i + -^- (3.U) 
 
Performance of Full Precision Division:

$$ P_D = \frac{2 \log_2 r}{T_A \log_2 r + 2(T_C + T_Q)} \tag{3.12} $$
 
Table 8 is a summary of P and $P_D$ for several radices, with $T_{PREF} = 3$, $T_{M1} = 0$, $T_{T2} = 2$, $T_R = 1$, $T_A = 3$, and $T_C = 4$. For these values $T_Q = 6$. Note that we have actually computed a best case for performance, since we have assumed that Table 2, even for the higher radices, can be implemented in two delays ($T_{T2} = 2$).
 
Table 8. Performance of Type 2 Structure versus Radix

| r | P (bits/delay) | $P_D$ (bits/delay) |
|---|---|---|
| 4 | .33 | .15 |
| 16 | .67 | .25 |
| 64 | 1.00 | .32 |
| 256 | 1.33 | .36 |
 
 6.2.3 Cost versus Performance 
 
Neglecting the cost terms $C_{PREF}$, $C_{DEF}$, and $C_R$, the cost of implementing a Type 2 structure is $C_{T2}$. Table 9 summarizes the bounds on $C_{T2}$ versus performance of the full-precision division. The actual cost should lie between the lower bound (LB), corresponding to Case 4 in Table 7a and Case 8 in Table 7b, and the least upper bound (LUB), corresponding to Case 1 in Table 7a and Case 5 in Table 7b. These results are plotted and discussed further in the summary and conclusions (Section 7).
 
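Table 9. Cost Bounds versus Performance for Type 2 Model Division

| $P_D$ (bits/delay) | Times Increase | $C_{T2}$ LB, a = 1/2, b = 1 (literals) | $C_{T2}$ LUB, a = 1/2, b = 1 (literals) | Times Increase | $C_{T2}$ LB, a = 3/4, b = 9/8 (literals) | $C_{T2}$ LUB, a = 3/4, b = 9/8 (literals) |
|---|---|---|---|---|---|---|
| .15 | 1.00 | 87 | 129 | 1 | 53 | 71 |
| .25 | 1.67 | 4,691 | 6,722 | 54 | 2,817 | 3,649 |
| .32 | 2.13 | 120,523 | 170,996 | 1385 | 71,913 | 92,467 |
| .36 | 2.40 | 2,520,825 | 3,555,880 | 28975 | 1,498,220 | 1,918,673 |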
 
 6.3 Type 1 Structures 
 
 6.3.1 Cost versus Radix 
 
Neglecting the cost terms $C_{PREF}$, $C_{DEF}$, and $C_R$, the cost of implementing a Type 1 model division is the sum of $C_{T1}$ and $C_{M1}$. Values for $C_{T1}$ are taken from the results given in Table 4. The term $C_{M1}$ is computed from Equation 3.6, namely,
 
 C M1 " i C R + \ N B C A + (H A + X) M B C SG + ^ C C" 
 
The following values are assumed: $C_R = 10$, $C_A = 50$, $C_{SG} = 6$, $C_C = 4$, $N_B = 8$.
 
 Table 10 summarizes the results. 
 
Table 10. Cost of Type 1 Structure versus Radix

| r | $C_{T1}$ | j | $N_A$ | $C_{M1}$ | $C_{TOT}$ |
|---|---|---|---|---|---|
| 4 | 28 | 3 | 2 | 1230 | 1258 |
| 16 | 454 | 6 | 3 | 1708 | 2162 |
| 64 | 2472 | 9 | 5 | 2634 | 5106 |
 
 6.3.2 Performance versus Radix 
 
In computing the operating time for a Type 1 structure we assume that $T_{PREF} = 3$, $T_{T2} = 0$, $T_R = 1$, $T_A = 3$, $T_C = 4$, and $T_{M1} = 3N_A$, and therefore, from Equation 3.7,

$$ T_Q = 3N_A + 4 . $$

Table 11 presents P (Equation 3.8) and $P_D$ (Equation 3.12) for the cases which were described in Table 4.
 
Table 11. Performance of Type 1 Structure versus Radix

| r | P (bits/delay) | $P_D$ (bits/delay) |
|---|---|---|
| 4 | .20 | .12 |
| 16 | .31 | .17 |
| 64 | .32 | .19 |
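The timing model of Equations 3.7, 3.8, and 3.12 can be evaluated in a few lines. The Python sketch below (function names hypothetical) uses the delay values assumed above. For Type 2 these are $T_{PREF} = 3$, $T_{M1} = 0$, $T_{T2} = 2$, and $T_R = 1$. For Type 1 they are $T_{T2} = 0$ and $T_{M1} = 3N_A$, with the adder counts $N_A$ of Table 10. Both use $T_A = 3$ and $T_C = 4$. The printed values agree with Tables 8 and 11 to within rounding.

```python
import math

T_A, T_C = 3, 4  # delay values assumed in Sections 6.2.2 and 6.3.2


def t_q(t_pref, t_m1, t_t2, t_r):
    """Equation 3.7: operating time of the model division."""
    return t_pref + t_m1 + t_t2 + t_r


def perf_model(r, tq):
    """Equation 3.8: bits per delay of the model division."""
    return math.log2(r) / tq


def perf_full(r, tq, t_a=T_A, t_c=T_C):
    """Equation 3.12: bits per delay of the full-precision division."""
    lg = math.log2(r)
    return 2 * lg / (t_a * lg + 2 * (t_c + tq))


# Type 2: Table 2 assumed to take two delays at every radix.
for r in (4, 16, 64, 256):
    tq = t_q(3, 0, 2, 1)
    print("Type 2", r, f"{perf_model(r, tq):.3f}", f"{perf_full(r, tq):.3f}")

# Type 1: T_M1 = 3 * N_A, with the adder counts N_A listed in Table 10.
for r, n_a in ((4, 2), (16, 3), (64, 5)):
    tq = t_q(3, 3 * n_a, 0, 1)
    print("Type 1", r, f"{perf_model(r, tq):.3f}", f"{perf_full(r, tq):.3f}")
```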
 
 6.3.3 Cost versus Performance 
 
 Table 12 merges the computations of the previous sections. 
 
Table 12. Cost versus Performance for Type 1 Model Division

| $P_D$ (bits/delay) | Times Increase | C (literals) | Times Increase |
|---|---|---|---|
| .12 | 1.00 | 1258 | 1.00 |
| .17 | 1.42 | 2162 | 1.72 |
| .19 | 1.58 | 5106 | 4.06 |
 
6.4 Hybrid Structures

6.4.1 Cost versus Radix and Number of Adders in Multiplier 1
 
For hybrid structures the cost is computed in several stages. First, $C_{T1}$ and the worst-case bounds on the transformed divisor range $(a^T, b^T)$ are computed for the cases of 1, 2, 3, and 4 adders in Multiplier 1. The number of adders, $N_A$, is the dominant factor in the performance of the model division and furthermore specifies the cost of Table 1 under the assumptions presented in Section 5.4.2. Recall that the maximum uncertainty in A, $\tau$, is $2^{-j}$, where $j = 2N_A$; that the maximum uncertainty in d, $\Delta$, is $2^{-(j+2)}$; and that $\tau$ and $\Delta$ determine $C_{T1}$.
 
Next, the transformed parameters are computed for each of the four designs. The cost equation for Table 2 is evaluated for each set of transformed parameters, each for four different radices, to yield a total of sixteen designs. The total cost for each hybrid structure is taken to be

$$C_{T1} + C_{M1} + C_{M2} + C_{T2}.$$
 
Table 13 summarizes the costs for the sixteen cases. The quantities a^T and b^T are defined by Equations 5.62 and 5.63, respectively, and C_T1 is defined by Equation 5.73. The terms C_M1 and C_M2 are computed from Equation 3.6 with C_A = 50, C_R = 10, C_SG = 6, C_C = 4, and ε = 5. The cost term C_T2 is computed from Equations 5.33, 5.39, and 5.40 with the transformed parameters specified as follows:
 
 m m 
 
 X = 1/16, a = 0, | 
 
 max 
 
 T -6+1 
 1=2 
 
 = 2, Arp T = 2 J 5 , Ad = 2 J ~°' Y T = l/l6, 
 P = 2/3. 
 
 Table 13. Cost Computations for Hybrid Structures 
 
 Table 1 Parameters 
 
 Case 
 No. 
 
 'Tl 
 
 Ml 
 
 'M2 
 
 'T2 
 
 Total 
 
 V 
 
 1, j=2, 6=U, 
 
 1 
 
 1+ 
 
 17 
 
 512 
 
 332 
 
 332 
 
 10 
 
 928 
 
 T 
 a = 
 
 27/32 
 
 2 
 
 16 
 
 17 
 
 61+8 
 
 332 
 
 295 
 
 3233 
 
 1+525 
 
 b T . 
 
 41/32 
 
 3 
 
 61+ 
 
 17 
 
 7Qk 
 
 332 
 
 5597 
 
 81+615 
 
 9131+5 
 
 
 
 4 
 
 256 
 
 17 
 
 920 
 
 332 
 
 9331+1 178713 
 
 ,882,323 
 
 V 
 
 2, j=l+, 6=6, 
 
 5 
 
 1+ 
 
 126 
 
 972 
 
 892 
 
 2 
 
 11+ 
 
 2006 
 
 T 
 a = 
 
 123/128 
 
 6 
 
 16 
 
 126 
 
 1220 
 
 892 
 
 76 
 
 81+3 
 
 3157 
 
 t T = 
 
 137/128 
 
 7 
 
 61+ 
 
 126 
 
 11+68 
 
 892 
 
 li+1+9 
 
 22059 
 
 25991+ 
 
 
 
 8 
 
 256 
 
 126 
 
 1716 
 
 892 
 
 2U169 h6[ 
 
 1+92530 
 
 N A= 
 
 3, j=6, 6=8 
 
 9 
 
 It 
 
 688 
 
 ikkk 
 
 1708 
 
 1 
 
 3 
 
 381+1+ 
 
 T 
 a 
 
 = 507/512 
 
 10 
 
 16 
 
 688 
 
 182I+ 
 
 1708 
 
 19 
 
 212 
 
 1+1+51 
 
 b T 
 
 = 521/512 
 
 11 
 
 61+ 
 
 688 
 
 218U 
 
 1708 
 
 367 
 
 5583 
 
 10530 
 
 
 
 12 
 
 256 
 
 688 
 
 25I+1+ 
 
 1708 
 
 6121 118070 
 
 129131 
 
 N A= 
 
 1+, j=8, 6=10 
 
 13 
 
 1+ 
 
 31+69 
 
 1988 
 
 2780 
 
 
 
 
 
 8237 
 
 T 
 a = 
 
 201+3/201+8 
 
 lU 
 
 16 
 
 31+69 
 
 21+60 
 
 2780 
 
 5 
 
 1+5 
 
 8759 
 
 b T = 
 
 
 15 
 
 61+ 
 
 3I+69 
 
 2932 
 
 2780 
 
 85 
 
 1286 
 
 10552 
 
 2056/20I+8 
 
 16 
 
 256 
 
 3I+69 
 
 3U0I+ 
 
 2780 
 
 11+26 
 
 27I+63 
 
 385I+2 
 
6.4.2 Performance versus Radix and Number of Adders in Multiplier 1

In computing the operating time for the hybrid structures we assume that T_PREF = 3, T_T2 = 2, T_R = 1, T_A = 3, T_Q = 4, and T_M1 = 3N_A, and therefore, from Equation 3.7,

$$T_Q = 3N_A + 6.$$
 
Table 14 presents P (Equation 3.8) and P_D (Equation 3.12) for the cases in Table 13.

Table 14. Performance Calculations for Hybrid Structures

 Case No.    P (bits/delay)    P_D (bits/delay)
    1             .22               .13
    2             .45               .21
    3             .67               .27
    4             .89               .32
    5             .17               .11
    6             .33               .18
    7             .50               .24
    8             .67               .29
    9             .13               .09
   10             .27               .16
   11             .40               .21
   12             .53               .26
   13             .11               .08
   14             .22               .14
   15             .33               .19
   16             .44               .24
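The sixteen rows of Table 14 follow the same pattern with T_T2 = 2, so that T_Q = 3N_A + 6. The Python sketch below assumes, as Table 13 indicates, that cases 1 to 16 run over N_A = 1, ..., 4 and, within each value of N_A, over r = 4, 16, 64, 256. The P_D expression, log2 r / (T_Q + T_PREF + T_R + (log2 r / 2) T_A), is inferred from the tabulated values rather than quoted from Equation 3.12. The helper names are hypothetical.

```python
import math

RADICES = (4, 16, 64, 256)


def hybrid_performance(n_a, r, t_pref=3, t_t2=2, t_r=1, t_a=3):
    """Return (P, P_D) for a hybrid structure with N_A adders in Multiplier 1."""
    bits = math.log2(r)
    t_q = t_a * n_a + 4 + t_t2              # 3*N_A + 6 with the assumed delays
    p = bits / t_q
    p_d = bits / (t_q + t_pref + t_r + (bits / 2) * t_a)  # inferred form of Eq. 3.12
    return p, p_d


def table14():
    """List of (case, N_A, r, P, P_D), numbering cases as in Table 13."""
    rows = []
    for g, n_a in enumerate((1, 2, 3, 4)):
        for k, r in enumerate(RADICES):
            p, p_d = hybrid_performance(n_a, r)
            rows.append((4 * g + k + 1, n_a, r, p, p_d))
    return rows


if __name__ == "__main__":
    for case, n_a, r, p, p_d in table14():
        print(f"case {case:2d}  N_A = {n_a}  r = {r:3d}  P = {p:.3f}  P_D = {p_d:.3f}")
```

Two entries do not match cleanly:

- For case 2 the expression gives P = 4/9 ≈ 0.444, against the .45 listed.
- For case 1, P_D = 0.125 falls exactly on a rounding boundary.

Every other computed value agrees with Table 14 to two decimal places.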
 
6.4.3 Cost versus Performance

Table 15 merges the cost and performance (P_D) data for the hybrid structures. These results are plotted and discussed further in the next section.

Table 15. Cost versus Performance for Hybrid Model Division Structures

 Case   P_D            Times       C             Times
 No.    (bits/delay)   Increase    (literals)    Increase
   1    .13            1.00            928          1
   2    .21            1.62           4525          5
   3    .27            2.08          91345         98
   4    .32            2.46        1882323       2028
   5    .11            1.00           2006        1.0
   6    .18            1.63           3157        1.6
   7    .24            2.18          25994         13
   8    .29            2.63         492530        245
   9    .09            1.00           3844        1.0
  10    .16            1.78           4451        1.2
  11    .21            2.33          10530        2.7
  12    .26            2.88         129131         34
  13    .08            1.00           8237        1.0
  14    .14            1.75           8759        1.1
  15    .19            2.37          10552        1.3
  16    .24            3.00          38542        4.7
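Both "Times Increase" columns of Table 15 are ratios to the first (radix 4) case in each group of four cases, that is, in each N_A group. The short Python sketch below recomputes them from the P_D and C columns of the table. It is a consistency check only: the data are copied from Table 15, and the helper name is hypothetical.

```python
# Recompute the "Times Increase" columns of Table 15 from its P_D and C columns.
P_D = [.13, .21, .27, .32, .11, .18, .24, .29,
       .09, .16, .21, .26, .08, .14, .19, .24]
COST = [928, 4525, 91345, 1882323, 2006, 3157, 25994, 492530,
        3844, 4451, 10530, 129131, 8237, 8759, 10552, 38542]


def times_increase(values, group=4):
    """Ratio of each entry to the first entry of its group of `group` cases."""
    return [v / values[(i // group) * group] for i, v in enumerate(values)]


if __name__ == "__main__":
    for case, (pr, cr) in enumerate(zip(times_increase(P_D), times_increase(COST)), 1):
        print(f"case {case:2d}   P_D x{pr:5.2f}   C x{cr:9.1f}")
```

The cost column is rounded more coarsely than the performance column in Table 15 (for example 4525/928 ≈ 4.9 is listed as 5). The recomputed ratios therefore match the table only to within a few percent.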
 
7. SUMMARY AND CONCLUSIONS 
 
 7.1 General Summary 
 
In the summary and conclusions it is convenient to distinguish between the definitive, synthetic, and analytic aspects of this study. Sections 2 and 3 are definitive. Section 2 defines the class of division techniques to be studied, and Section 3 defines the measures of cost and performance to be applied. It is noted that an advantage of the model division approach is congruity with commonly used multiplication structures, including the capacity to form the partial remainders using non-propagating adders or subtractors. The attendant disadvantages are the necessity to store two bits per quotient digit and the requirement for a terminal step to convert the redundant form to nonredundant form. For division, unlike multiplication, the selection of the jth quotient digit cannot be straightforwardly overlapped with the formation of the jth partial remainder; this fact prompts consideration of high-speed division techniques for the model. Furthermore, the overhead required to "call" and "return" from the model division prompts study of higher radix structures which produce several bits per call. A variable radix block structure of a class of model division schemes is proposed for study.
 
Section 4 describes algorithms with which to synthesize the most complicated sub-blocks of the family of proposed quotient selectors: a combinatorial network to produce an estimate of the reciprocal of the divisor (Table 1), and a combinatorial network to generate a quotient digit when given d and rp (Table 2). Although these synthesis routines generate a logic equation definition of the structure, the intent in this study is merely to determine the cost, essentially the number of literals in the logic equations. After the cost versus performance behavior is sufficiently understood to permit specification of the parameters of a practicable model, the synthesis routines may be applied as a first step in implementation.
 
Section 5 includes the bulk of the analytic work. The section opens with a tabulation of costs for several cases synthesized by the previously defined algorithms. There exist many variants of the model division, and even computer synthesis in this case is expensive. The numerical results and insight are therefore applied to hypothesize formulas, rather than algorithms, with which to estimate cost. The formulas take account of the ten variables of the model division.
 
 Although one of the formulas is normalized with two empirically 
 defined quantities, it is assumed that these quantities are sufficiently 
 constant to permit meaningful prediction of cost for cases other than those 
 used in the normalization. In Section 6, the formulas for both cost and per- 
 formance are applied to tabulate expected values of cost and performance. 
 
The present section has three aims:

- to summarize the work in the previous sections;
- to reach some conclusions about the feasibility of the investigated quotient selection schemes;
- to suggest areas for further investigation.

It considers, in turn, the numerical cost and performance results and the analytic results, and concludes with additional remarks about areas for further research.
 
 7.2 Cost and Performance 
 
Figure 20 is a graphical summary of the cost versus performance estimates tabulated in Section 6. The need for a five-cycle semi-log plot emphasizes the extreme range of costs and the disappointing cost-performance behavior. Many of the results are negative; they indicate what not to attempt to implement. The points on the graph are taken from Tables 9, 12, and 15. Points corresponding to the same type of structure but differing in radix are connected by straight-line segments. Each of these "curves" is labeled with a Roman numeral.

[Figure 20. Cost versus Performance for Samples of Model Division Structures: cost (logarithmic scale) plotted against P_D (bits/delay).]
 
Curves Ia and Ib, with points from Table 7b, are the lower and upper bounds on the cost of a Type 2 structure (direct table look-up) for divisors in the range (3/4, 9/8). Curves IIa and IIb, with points from Table 7a, are the lower and upper bounds for a similar structure with divisors in the range (1/2, 1). To a first approximation, log C varies linearly with performance on all four curves, and thus

$$C \propto 10^{k P_D},$$

where k is about 18. This exponential behavior is not surprising, considering that performance varies as log r (see Equation 3.12) and that cost varies as r^2 log r. This latter statement is derived from Equations 5.39, 5.40, and 5.41.
 
The radix 4 Type 2 structure is quite practicable, requiring about ten 10-input gates to yield a performance of .15 bits per logic delay. Assuming 10 ns logic, the scheme would generate 60 bits of quotient in about 4 µs. A radix 16 Type 2 structure theoretically increases performance by a factor of 5/3, consequently reducing divide time, under the same assumptions, to 2.4 µs. The cost, however, increases more than 50 times.
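The divide times quoted above follow from dividing the number of quotient bits by P_D and multiplying by the gate delay. The Python sketch below assumes, as the text does, 10 ns per logic delay, and it counts only the iterative steps (no terminal conversion). For radix 16 it uses the value P_D = .25 implied by the factor of 5/3. The function name is hypothetical.

```python
def divide_time_us(quotient_bits, p_d, gate_delay_ns=10.0):
    """Time in microseconds to form `quotient_bits` bits at P_D bits per logic delay."""
    logic_delays = quotient_bits / p_d
    return logic_delays * gate_delay_ns / 1000.0


if __name__ == "__main__":
    radix4 = divide_time_us(60, 0.15)           # about 400 logic delays
    radix16 = divide_time_us(60, 0.15 * 5 / 3)  # performance raised by 5/3
    print(f"radix 4: {radix4:.1f} us   radix 16: {radix16:.1f} us")
```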
 
Statements about the radix 16 structure must be qualified by the observation that, due to fan-in and fan-out restrictions, the table cannot actually be implemented in two levels of logic. Since the divisor is constant, the d portion of each prime implicant can be formed in a cascade of many logic levels without degradation of performance. Going to additional levels to form functions of rp, however, may reduce cost but will decrease performance below the ideal value assumed in Figure 20. Justification for a radix 16 Type 2 structure is discussed further in connection with a "quotient lookahead" scheme mentioned in Section 7.4. Type 2 structures beyond radix 16 are too expensive to consider further.
 
 Based upon Figure 20, curve III, it appears that a Type 1 structure 
 is never preferable to a Type 2 structure. Although this is probably true, 
 the Type 1 structures might be studied further with the following points in 
 mind: 
 
 1. The structures studied here employ a rather conventional 
 multiplier requiring one cascaded adder per two bits of 
 multiplier. Perhaps faster multipliers may be found. It 
 is doubtful, however, that they would be less expensive. 
 
2. For all structures studied, the estimates of the partial remainders have been converted to a conventional form. For structures requiring a transformation of rp, the assimilation is performed after the multiplication. The conversion to conventional form has been required as a concession to reducing the cost of Table 2. For Type 1 structures, Table 2 is not required, and thus perhaps the redundantly represented result could be used directly by the shift gates in the full precision arithmetic unit. The elimination of the conversion is roughly equivalent to eliminating one adder from the multiplier structure.

The cost versus performance of the hybrid structures is shown in curves IV-VII, corresponding to 1 through 4 adders in the multipliers M1 and M2. The curves initially rise slowly relative to the Type 2 curves but soon become steep as the cost of Table 2 for the higher radices dominates. The r^2 log r behavior of C_T2 is not easy to suppress. Again, based upon the results shown in Figure 20, it appears that hybrid structures should not be chosen over a Type 2 structure.
 
It is apparent from Equation 3.12 that P_D, as a function of r, has an upper limit of 2/T_A. This limit is the theoretical upper bound on the performance of the iterative steps of multiplication. With T_A = 3, the theoretical ratio of the performance of division to the performance of multiplication ranges from 0.09 to 0.53 for the cases in Figure 20. For practicable cases, the range is 0.225 to 0.375.
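The following Python sketch checks the limit and the practicable range numerically. A multiplier that retires two bits per adder delay T_A performs at 2/T_A bits per delay, and the ratio quoted above is P_D divided by that rate. The P_D expression used for the limit, log2 r / (T_Q + T_PREF + T_R + (log2 r / 2) T_A), is an inferred form of Equation 3.12 and should be read as an assumption. The delays are those assumed in Section 6, with T_Q = 9 for a hybrid structure with N_A = 1.

```python
def multiply_rate(t_a=3):
    """Bits per logic delay of the iterative multiply steps: 2 bits per adder delay."""
    return 2 / t_a


def p_d(bits, t_q, t_pref=3, t_r=1, t_a=3):
    """Inferred form of Equation 3.12 (an assumption, see lead-in)."""
    return bits / (t_q + t_pref + t_r + (bits / 2) * t_a)


if __name__ == "__main__":
    for pd in (0.15, 0.25):                      # practicable Type 2 cases
        print(f"P_D = {pd:.2f}  ratio = {pd / multiply_rate():.3f}")
    for bits in (2, 8, 64, 1024):                # P_D creeps toward 2/T_A as r grows
        print(bits, round(p_d(bits, t_q=9), 3))  # T_Q = 9 for a hybrid with N_A = 1
```

The practicable end points, .15 and .25 bits per delay, give ratios of 0.225 and 0.375. P_D rises toward, but never reaches, 2/3 as the number of bits per call grows.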
 
7.3 Analytic Results
 
 Only a few of the cases studied appear to be feasible. But negative 
 results are valuable, and furthermore it should be kept in mind that the main 
 purpose of this thesis is not to present an exhaustive enumeration of quotient 
 selection schemes, but rather to develop general techniques for analysis. 
 
It is important to appreciate the generality of the extension of Robertson's cost measurement (s_i) to the imprecise cases (s'_i and s''_i). The estimate of cost as a function of s'_i is not rigorous and includes empirically defined constants; the derivation of s'_i itself, however, is rigorous. The analysis developed in Section 5.3.2 leads to a succinct statement of the worst-case precision requirements in rp and d (d'' < a). It also gives insight into the effect of the parameters of the model division on the cost of quotient selection.
 
The s'_i cost measurement is applicable to structures other than those fitting within the structure of the model division shown in Figure 2. For example, as mentioned earlier, the treads of the staircase boundaries between quotient regions may be viewed as comparison constants against which rp is compared to determine the quotient region to which it belongs. The divisor range is partitioned into intervals such that, for each interval, there is a single comparison constant between each pair of adjacent quotient regions. The comparison constants could be stored in a read-only memory. A given divisor value would determine a column of comparison constants, which would be read out to become one input to a set of comparators; the other input to the comparators would be rp. If c_i is the comparison constant between q(i) and q(i-1), then q = k, where k is the greatest index such that rp ≥ c_k. The number of sets of comparison constants has a lower bound of s'_n and an upper bound of s''_n. The number of comparison constants in each set is n (assuming implementation of only the first quadrant of the P-D plot).
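One way to picture the read-only-memory scheme is the following Python sketch. It idealizes the scheme under stated assumptions:

- rp and d are treated as exact, so the estimation errors accounted for by s'_i and s''_i are ignored;
- only the first quadrant (0 ≤ rp ≤ (n + ρ)d) is implemented;
- each constant is placed midway in the band, common to every divisor of its interval, where both neighboring digits are allowed.

The helper names and the midpoint choice are hypothetical.

```python
from bisect import bisect_right
from fractions import Fraction as F


def comparison_constants(d_lo, d_hi, n, rho):
    """Constants c_1..c_n usable for every divisor in [d_lo, d_hi].

    q(i-1) is allowed while rp <= (i - 1 + rho) * d and q(i) once
    rp >= (i - rho) * d, so c_i must satisfy
    (i - rho) * d_hi <= c_i <= (i - 1 + rho) * d_lo.
    The midpoint of that band is an arbitrary choice made here.
    """
    consts = []
    for i in range(1, n + 1):
        lo, hi = (i - rho) * d_hi, (i - 1 + rho) * d_lo
        if lo > hi:
            raise ValueError(f"divisor interval too wide for c_{i}")
        consts.append((lo + hi) / 2)
    return consts


def build_rom(a, b, width, n, rho):
    """One column of n constants per divisor interval of the given width."""
    rom, d = [], a
    while d < b:
        rom.append(comparison_constants(d, min(d + width, b), n, rho))
        d += width
    return rom


def quotient_digit(rp, d, rom, a, width):
    """q = k, the greatest k with rp >= c_k (q = 0 when rp < c_1)."""
    column = rom[(d - a) // width]       # the divisor value selects a column
    return bisect_right(column, rp)      # count of constants not exceeding rp


if __name__ == "__main__":
    rho, a, w = F(2, 3), F(1, 2), F(1, 16)   # radix 4, n = 2, divisors in [1/2, 1)
    rom = build_rom(a, F(1), w, n=2, rho=rho)
    print(len(rom), "columns of", len(rom[0]), "constants")  # 8 columns of 2 constants
    print(quotient_digit(F(3, 4), F(5, 8), rom, a, w))        # 1
```

Because each c_k lies inside the overlap band for every divisor of its interval, the selected digit always satisfies (q - ρ)d ≤ rp ≤ (q + ρ)d. If the interval width is made too large for a given ρ, the band for some c_i becomes empty and the sketch raises an error.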
 
Among others, the analytic results prompt the following observations:

1. There are minimum requirements for the precision in the estimates of rp and d.

2. For given precision above the minimum required, there is a limit, s'_i, to the minimum number of comparison constants required between q(i) and q(i-1).

3. The actual number of steps, s_{i,act}, is greater than s'_i due to discrete effects, i.e. due to the fact that the locations of treads and risers are restricted to discrete values.

4. The upper bound on s_{i,act}, including the discrete effects, is s''_i.
 
5. Increasing precision in d and rp moves s''_i closer to s_{i,act} and s_{i,act} closer to s'_i, but by a decreasing amount.
 
7.4 Suggestions for Further Investigation
 
 The following topics for further investigation have emerged in the 
 course of this study. The order of listing does not imply any priority. 
 
1. Compare the cost and performance of the model division approach to other division algorithms, such as the Wallace algorithm [32] as implemented in the IBM 360/91 [14], and to division schemes in other large machines such as the CDC 7600.
 
2. Consider the use of a radix 4, Type 2 structure in a pipeline arithmetic unit. Assuming that the divisors and quotients may be streamed along with the partial remainders, it appears that a set of the inexpensive radix 4, Type 2 model division structures may be used to pipeline the division operation effectively. Multiplication and division could be intermixed in the same pipeline. Assuming synchronous control, however, the clock frequency is limited by the quotient selection time, and thus the multiply time is degraded.
 
3. Consider a "quotient lookahead" scheme. Assume that each adder in a cascade of adders is capable of performing a multiplication radix 2^k. Then the shift gates for each adder may be controlled by a model division of the same radix. If the radix of the model is greater than 2^k, then more quotient digits are formed than can be used in forming the present partial remainder. It is conceivable, however, that as soon as they are formed they could be used to set the shift gates that form the next partial remainder, thus overlapping control time. For example, if k = 2 but the model division is radix 16, control signals for the shift gates of two successive adders are generated simultaneously. If a radix 16 quotient selector is coupled to the output of every adder in the cascade, then four bits are formed for each addition/subtraction, two of which overlap with the previously formed bits. The formation of the jth partial remainder may therefore be overlapped with the formation of the (j+1)th radix 4 quotient digit. After startup, the effective control time per addition would be the quotient selection time minus the add time. If the two times were equal, division could proceed at multiply speed.
 
4. Study the variation in cost of the entire arithmetic unit as a function of ρ, the redundancy ratio. Recall that ρ is one variable in the equation for s'_i. In all numerical work produced in this study, ρ = n/(r-1) = 2/3. The decision to keep ρ constant excluded the explicit study of radix 8, 32, and 128, for which there is no integer n such that ρ = 2/3.
 
5. Study a model division structure based upon simultaneous comparisons of rp with comparison constants selected by the value of the divisor.
 
6. Consider the engineering details of a radix 16, Type 2 structure.
 
7. Program the correct algorithm (Appendix A) for producing the minimal cost definition of a Table 2 structure. Reference [34] defines the minimization algorithm. Compare the results with those produced by the QS3 algorithm (Section 4).
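Item 4's remark about radices 8, 32, and 128 is easy to confirm. With ρ = n/(r - 1) fixed at 2/3, n = 2(r - 1)/3 must be an integer. For r = 2^m this holds only for even m, since 2^m ≡ 1 (mod 3) exactly when m is even. The minimal Python check below uses a hypothetical helper name.

```python
from fractions import Fraction


def max_digit(r, rho=Fraction(2, 3)):
    """Return n = rho*(r - 1) if it is an integer, else None."""
    n = rho * (r - 1)
    return n.numerator if n.denominator == 1 else None


if __name__ == "__main__":
    for m in range(2, 9):
        r = 2 ** m
        print(f"r = {r:3d}   n = {max_digit(r)}")
```

The radices that admit an integer n are 4, 16, 64, and 256 (n = 2, 10, 42, 170). These are the four radices used for the hybrid structures of Section 6.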
 
 
 APPENDIX A 
 
 Algorithm for Generating Minimum Cost Sum-of-Products 
 Definitions of the q-Regions of Table 2 
 
1. Consider the P-D plot to be covered by a uniform grid with spacing Δd along the d-axis and spacing Δrp along the rp-axis. Each intersection of grid lines is defined by the ordered pair (d, rp), where d is an integer multiple of Δd and rp is an integer multiple of Δrp. Every pair (d, rp) is representative of full precision quantities in the ranges defined by Equations 2.11 and 2.14. A sufficient condition for the choice of λ, γ, α, β, Δrp, Δd, δ, and ε is that d'' (Equation 5.26) be greater than a, the lower bound of the divisor range. If Δd and/or Δrp are smaller than necessary, the excess precision is removed by minimization. However, the smaller Δd and Δrp are, the closer the boundaries between the q-regions may approach the theoretical limit, i.e. the smaller the discrete effects will be.
 
2. Every pair (d, rp) corresponds to a minterm, rp||d. (See page 38 for the definition of the notation.)
 
3. Let R_i be the set of minterms which are required to define q(i), i.e. which must be assigned to the output function f_i. Thus,

R_i = {rp||d | all or any part of the area corresponding to (d, rp) lies within the area defined by the lines rp = (i+1-ρ)d, rp = (i-1+ρ)d, d = a, and d = b}.
 
Let T_i be the set of minterms which lie completely within the overlap region between q(i) and q(i+1). Thus,

T_i = {rp||d | the area corresponding to (d, rp) is completely within the area defined by the lines rp = (i+ρ)d, rp = (i+1-ρ)d, d = a, and d = b}.

Let D be the set of all minterms corresponding to pairs (d, rp) that do not represent area within the boundaries of the P-D plot, i.e. area not within any q-region.
 
Assume a minimization algorithm, such as the one described in Section 4.2.2, which will accept both a set of true minterms, Θ, and a set of don't-care minterms, Δ, of a given function. The result of the minimization process is a minimal set of prime implicants, Π. Let Ω be the set of minterms implied by Π, i.e. all minterms for which the function defined by the OR of the elements of Π is true.
 
The following is the proposed algorithm for defining the output functions f_i, for i = 0, 1, ..., n.

a) Let Θ = R_0 and Δ = T_0 ∪ D.

b) Execute the minimization algorithm to produce P_0 = Π, and construct M_0 = Ω. Output function f_0 is the OR of the elements of P_0.

c) For i = 1, 2, ..., n do the following: let Θ = R_i ∪ (T_{i-1} - (T_{i-1} ∩ M_{i-1})) and Δ = T_i ∪ D. Execute the minimization algorithm to produce P_i = Π, and construct M_i = Ω. Output function f_i is the OR of the elements of P_i.
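The bookkeeping of steps a) to c) can be sketched in Python as follows. The sketch is illustrative only, and its details are assumptions:

- the grid and its parameters;
- the cell-in-region tests (strict inequalities for "any part", closed ones for "completely within");
- all helper names.

`identity_minimizer` is a hypothetical stand-in that returns the true minterms unchanged. A real two-level minimizer, such as the one of Section 4.2.2 or Reference [34], would also absorb don't-care minterms and so change the sets M_i.

```python
from fractions import Fraction as F


def grid_cells(a, b, dd, drp, rp_max):
    """Lower-left corners (d, rp) of grid cells covering a <= d < b, 0 <= rp < rp_max."""
    cells, d = [], a
    while d < b:
        rp = F(0)
        while rp < rp_max:
            cells.append((d, rp))
            rp += drp
        d += dd
    return cells


def inside(cell, m_lo, m_hi, dd, drp):
    """The whole cell lies between the lines rp = m_lo*d and rp = m_hi*d."""
    d, rp = cell
    return (rp >= max(m_lo * d, m_lo * (d + dd))
            and rp + drp <= min(m_hi * d, m_hi * (d + dd)))


def touches(cell, m_lo, m_hi, dd, drp):
    """Some interior part of the cell lies strictly between the lines (m_lo < m_hi, m_hi > 0)."""
    d, rp = cell
    return (rp + drp > min(m_lo * d, m_lo * (d + dd))
            and rp < max(m_hi * d, m_hi * (d + dd)))


def classify(a, b, n, rho, dd, drp):
    """Minterm sets R_i (i = 0..n), T_i (i = 0..n) and D of Appendix A."""
    top = n + rho                                     # upper edge of the first-quadrant plot
    cells = grid_cells(a, b, dd, drp, top * b)
    R = [{c for c in cells if touches(c, i - 1 + rho, i + 1 - rho, dd, drp)}
         for i in range(n + 1)]
    T = [{c for c in cells if inside(c, i + 1 - rho, i + rho, dd, drp)}
         for i in range(n + 1)]
    D = {c for c in cells if c[1] >= top * (c[0] + dd)}  # wholly outside every q-region
    return R, T, D


def identity_minimizer(true_set, dont_care):
    """Hypothetical stand-in: implies exactly the true minterms, ignores don't-cares."""
    return set(true_set)


def define_outputs(R, T, D, minimize=identity_minimizer):
    """Steps a) to c): return the implied minterm sets M_0..M_n."""
    M = [minimize(R[0], T[0] | D)]                    # steps a) and b)
    for i in range(1, len(R)):                        # step c)
        uncovered = T[i - 1] - (T[i - 1] & M[i - 1])
        M.append(minimize(R[i] | uncovered, T[i] | D))
    return M


if __name__ == "__main__":
    R, T, D = classify(F(1, 2), F(1), n=2, rho=F(2, 3), dd=F(1, 16), drp=F(1, 16))
    M = define_outputs(R, T, D)
    print([len(m) for m in M])
```

Step c) guarantees that every overlap minterm of T_{i-1} left uncovered by f_{i-1} is forced into f_i. With the grid used here (r = 4, Δd = Δrp = 1/16), no minterm is required by two adjacent digits.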
 
 
 APPENDIX B 
 
Example of Results of QS4 and Minimization Program

Note:

r = 4, n = 2, a = 1/2, b = 1
rp = p1 p2 . p3 p4 p5 p6
d = .d1 d2 d3 d4

In the following, '1' implies that the variable is present in true form; '0' implies that the variable is present in complement form; 'x' implies that the variable is absent. Variable d is deleted by inspection.

Minimal cost prime implicants for q(0):

p1 p2 p3 p4 p5 p6 d1 d2

000x00xx
0000xxxx
000xx0x1
000x0xx1

Minimal cost prime implicants for q(1):

p1 p2 p3 p4 p5 p6 d1 d2 d3 d4

0001x1x0xx
00011xx0xx
00x111x1xx
0010xxxxxx
001xx0xx1x
001x0xxxx1
01xxxxx1x
001xxxx1xx
010000xx11
0100xxx1xx
010xx0x11x
010x0xx11x
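The cube notation above can be read mechanically. The Python sketch below, with hypothetical helper names, evaluates the q(0) function from its four prime implicants and counts literals (positions other than 'x'). This is one simple reading of the literal-count cost measure used in the report, and it counts only the literals of the AND terms.

```python
# Prime implicants for q(0) from Appendix B; variable order p1 p2 p3 p4 p5 p6 d1 d2.
Q0_IMPLICANTS = ["000x00xx", "0000xxxx", "000xx0x1", "000x0xx1"]


def covers(cube, bits):
    """True if the 0/1 string `bits` satisfies every literal of `cube`."""
    return len(cube) == len(bits) and all(c == "x" or c == b for c, b in zip(cube, bits))


def evaluate(cubes, bits):
    """Sum-of-products: the OR over the product terms."""
    return any(covers(cube, bits) for cube in cubes)


def literal_count(cubes):
    """Number of literals (non-'x' positions) over all product terms."""
    return sum(ch != "x" for cube in cubes for ch in cube)


if __name__ == "__main__":
    print(literal_count(Q0_IMPLICANTS))          # 19
    print(evaluate(Q0_IMPLICANTS, "00000000"))   # True: every variable in complement form
    print(evaluate(Q0_IMPLICANTS, "00100000"))   # False: p3 = 1 is excluded by every term
```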
 
 
 REFERENCES 
 
[1] J. E. Robertson, "A new class of digital division methods," IRE Transactions on Electronic Computers, vol. EC-7, pp. 218-222, September 1958.

[2] T. D. Tocher, "Techniques of multiplication and division for automatic binary computers," Quart. Jour. Mech. Appl. Math., vol. 11, Part 3, pp. 364-384, 1958.

[3] C. V. Freiman, "Statistical analysis of certain binary division algorithms," Proceedings of the IRE, vol. 49, pp. 91-103, January 1961.

[4] M. Nadler, "A high speed electronic arithmetic unit for automatic computing machines," Acta Technica, no. 6, pp. 464-478, 1956.

[5] J. E. Robertson, "Methods of selection of quotient digits during digital division," File No. 663, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1965.

[6] D. E. Atkins, "The theory and implementation of SRT division," Report No. 230, Department of Computer Science, University of Illinois, Urbana, Illinois, June 1967.

[7] D. E. Atkins, "Higher radix division using estimates of the divisor and partial remainders," IEEE Transactions on Computers, vol. C-17, no. 10, pp. 925-934, October 1968.

[8] D. E. Atkins, "Design of the arithmetic units of Illiac III: Use of redundancy and higher radix methods," IEEE Transactions on Computers, (to appear) August 1970.

[9] D. E. Atkins, "Illiac III computer system manual: Arithmetic units, vol. I," Report No. 366, Department of Computer Science, University of Illinois, Urbana, December 1969.

[10] J. E. Robertson, "A deterministic process for the design of carry-save adders and borrow-save subtractors," Report No. 235, Department of Computer Science, University of Illinois, Urbana, July 1967.

[11] R. T. Borovec, "The logical design of a class of limited carry-borrow propagation adders," Report No. 275, Department of Computer Science, University of Illinois, Urbana, Illinois, August 1968.

[12] F. A. Rohatsch, "A study of transformations applicable to the development of limited carry-borrow propagation adders," Report No. 226, Department of Computer Science, University of Illinois, Urbana, June 1967.

[13] J. E. Robertson, "The correspondence between methods of digital division and multiplier recoding procedures," Department of Computer Science Report No. 252, University of Illinois, Urbana, Illinois, December 1967.

[14] S. F. Anderson, J. G. Earle, R. E. Goldschmidt, D. M. Powers, "The IBM System/360 Model 91: Floating-point execution unit," IBM Journal of Research and Development, vol. 11, no. 1, pp. 34-53, January 1967.

[15] A. Avizienis, "Binary-compatible signed-digit arithmetic," AFIPS Fall Joint Computer Conference, vol. 26, pp. 663-672, 1964.

[16] V. S. Burtsev, "Accelerating multiplication and division operations in high-speed digital computers," in report by The Institute of Exact Mechanics and Computing Technique, The Academy of Sciences of the USSR, Moscow, 1958.

[17] M. Combet, H. van Zonneveld, and L. Verbeek, "Computation of the base two logarithm of binary numbers," IEEE Transactions on Electronic Computers, vol. EC-14, no. 6, pp. 863-867, December 1965.

[18] K. J. Dean, "A precision code converter for reciprocals of binary numbers," The Computer Bulletin, vol. 12, no. 2, pp. 55-58, June 1968.

[19] D. Ferrari, "A division method using a parallel multiplier," IEEE Transactions on Electronic Computers, vol. EC-16, no. 2, pp. 224-226, April 1967.

[20] R. E. Gilman, "A mathematical procedure for machine division," Communications of the ACM, vol. 2, no. 4, pp. 10-12, April 1959.

[21] R. E. Goldschmidt, "Applications of division by convergence," M.S. Thesis, MIT, June 1964.

[22] Ernest F. Hall, David D. Lynch, Richard E. Young, "Generation of products and quotients using approximate binary logarithms for digital filtering applications," IEEE Transactions on Computers Repository, no. R-68-164, 1968.

[23] Jiri Klir, "A note on Svoboda's algorithm for division," Stroje Na Zpracovani Informaci (Information Processing Machines), no. 9, pp. 35-39, 1963.

[24] E. V. Krishnamurthy, "On range-transformation techniques for division," IEEE Transactions on Computers, vol. C-19, no. 2, pp. 157-160, February 1970.

[25] John N. Mitchell, Jr., "Computer multiplication and division using binary logarithms," IRE Transactions on Electronic Computers, vol. EC-11, no. 4, pp. 512-518, August 1962.

[26] Ray G. Saltman, "Reducing computing time for synchronous binary division," IRE Transactions on Electronic Computers, vol. EC-10, no. 2, pp. 169-174, June 1961.

[27] A. Soceneantu, "Binary iterative division," (Report in Progress), Department of Computer Science, University of Illinois, Urbana, Illinois, 1970.

[28] R. Stefanelli, "A suggestion for an high-speed parallel binary divider," IEEE Transactions on Computers Repository, no. R-69-3, October 1968.

[29] A. Svoboda, "An algorithm for division," Stroje Na Zpracovani Informaci (Information Processing Magazine), no. 9, pp. 25-32, 1963.

[30] C. Tung, "A division algorithm for signed-digit arithmetic," IEEE Transactions on Computers, vol. C-17, no. 9, pp. 887-889, September 1968.

[31] R. M. Wade, "A carry-independent quarternary division scheme," IEEE Transactions on Computers Repository, no. R-68-52, November 1967.

[32] C. S. Wallace, "A suggestion for a fast multiplier," IEEE Transactions on Electronic Computers, vol. EC-13, pp. 14-17, February 1964.

[33] E. J. McCluskey, Introduction to the Theory of Switching Circuits, McGraw-Hill, New York, 1965, pp. 135-136.

[34] V. G. Tareski, "Minimization of two level switching circuits involving many variables," Ph.D. Thesis in preparation, Department of Computer Science, University of Illinois, Urbana, Illinois.

[35] Chester C. Carroll and George E. Jordan, "A fast algorithm for Boolean function minimization," Auburn University Report No. AD 680 305, December 1968.

[36] Tso-Kai Liu, "A code for zero-one integer linear programming by implicit enumeration (A Programming Manual for ILLIP)," Department of Computer Science, Report No. 302, December 1968.

[37] T. Ibaraki, et al., "An implicit enumeration program for zero-one integer programming," Department of Computer Science, Report No. 305, January 1969.
 
 VITA 
 
Daniel Ewell Atkins, III was born in Jacksonville, Florida, on April 12, 1943. He received the B.S. degree in Electrical Engineering from Bucknell University, Lewisburg, Pa., in 1965; the M.S. degree in Electrical Engineering from the University of Illinois, Urbana, in 1967; and the Ph.D. in Computer Science from the University of Illinois in 1970.

Between 1963 and 1967 he held summer positions with the Freas-Rooke Computing Center, Bucknell University, and the U.S. Naval Ordnance Laboratory, White Oaks, Md. While attending the University of Illinois he was employed as a research assistant in the Department of Computer Science. He designed the floating point arithmetic units for the Illinois Pattern Recognition Computer (Illiac III) under the direction of Professor Bruce H. McCormick, and conducted research in the area of computer arithmetic under the direction of Professor James E. Robertson. Mr. Atkins has published papers evolving from this work in the IEEE Transactions on Computers of October 1968 and August 1970.

Mr. Atkins is a member of Tau Beta Pi, Sigma Xi, Pi Mu Epsilon, Pi Delta Epsilon, the Association for Computing Machinery, the Institute of Electrical and Electronics Engineers, and the American Association of University Professors.
 
AEC-427 (6/68)
ECM 3201

U.S. ATOMIC ENERGY COMMISSION

UNIVERSITY-TYPE CONTRACTOR'S RECOMMENDATION FOR
DISPOSITION OF SCIENTIFIC AND TECHNICAL DOCUMENT

(See Instructions on Reverse Side)

AEC REPORT NO.: Report No. 397, COO-1018-1204

TITLE: A STUDY OF METHODS FOR SELECTION OF QUOTIENT DIGITS DURING DIGITAL DIVISION

TYPE OF DOCUMENT (Check one):
[X] a. Scientific and technical report
[ ] b. Conference paper not to be published in a journal:
    Title of conference
    Date of conference
    Exact location of conference
    Sponsoring organization
[ ] c. Other (Specify)

RECOMMENDED ANNOUNCEMENT AND DISTRIBUTION (Check one):
[X] a. AEC's normal announcement and distribution procedures may be followed.
[ ] b. Make available only within AEC and to AEC contractors and other U.S. Government agencies and their contractors.
[ ] c. Make no announcement or distribution.

REASON FOR RECOMMENDED RESTRICTIONS:

SUBMITTED BY: NAME AND POSITION (Please print or type)
Daniel E. Atkins, Research Assistant
Organization: Department of Computer Science, University of Illinois
Signature:
Date: May 28, 1970

FOR AEC USE ONLY

AEC CONTRACT ADMINISTRATOR'S COMMENTS, IF ANY, ON ABOVE ANNOUNCEMENT AND DISTRIBUTION RECOMMENDATION:

PATENT CLEARANCE:
[ ] a. AEC patent clearance has been granted by responsible AEC patent group.
[ ] b. Report has been sent to responsible AEC patent group for clearance.
[ ] c. Patent clearance not required.