UIUCDCS-R-72-541

RADIX 16 DIVISION, MULTIPLICATION, LOGARITHMIC AND EXPONENTIAL ALGORITHMS BASED ON CONTINUED PRODUCT REPRESENTATIONS

by

Milos Dragutin Ercegovac

August, 1972

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois

This work was supported in part by the National Science Foundation under Grant No. US NSF GJ-813 and was submitted in partial fulfillment for the Master of Science degree in Computer Science, 1972.

ACKNOWLEDGMENT

I wish to express my sincerest gratitude to my advisor, Professor James E. Robertson of the Department of Computer Science of the University of Illinois, for his highly valued guidance, suggestions and support. I also thank the National Science Foundation and the Department of Computer Science of the University of Illinois for their support. Thanks are also due to Mr. Kishor S. Trivedi for many helpful discussions. Finally, I would like to thank Mrs. June Wingler for her fine job of typing and Mr. Mark Gobel for excellent drawings.

TABLE OF CONTENTS

1. INTRODUCTION
2. MULTIPLICATIVE NORMALIZATION
3. DIVISION
4. NATURAL LOGARITHM
5. ADDITIVE NORMALIZATION
6. MULTIPLICATION
7. EXPONENTIAL
8. IMPLEMENTATION
9. CONCLUSIONS
LIST OF REFERENCES
APPENDICES

1. INTRODUCTION

There is no doubt that the available technological possibilities could justify hardware implementation of a much wider class of functions than is presently done.
This is even more true if the corresponding algorithms are similar to one another. One effective way to obtain such a class of algorithms is to use, in a convenient way, continued products (CP) or continued sums (CS) during the function evaluation.

The use of continued products in the calculation of some elementary functions appears as early as 1959, in Volder's CORDIC technique [5]. The main results of this approach have recently been summarized in the form of a unified algorithm by Walther [6]. Specker [7] also derived a class of algorithms using the concept of continued products. Of particular importance and usefulness are the results obtained by DeLugish [1]. He has defined efficient algorithms for a wide class of functions including division, multiplication, square root, logarithm, exponential, trigonometric and inverse trigonometric functions, with operation times from 1 to 3 multiplication cycle times. These algorithms are specified for the radix 2, using a redundant digit set {-1, 0, 1} in continued products (sums).

The main idea is the replacement of a required operation or function evaluation by two simple step by step processes using addition/subtraction, shifting and possibly a set of precomputed constants stored in a read-only memory. One of the processes is normalization, through which the digits of continued products (sums) are generated, and the other is the related result evaluation. These processes can be carried out in parallel, so for fast operation at least two almost identical arithmetic units are required.

The work done here is based upon the results obtained by DeLugish. It is motivated by the fact that a higher radix implementation offers some speed/hardware trade-offs worth investigating. In particular, the radix 16 is considered in four algorithms: division, multiplication, logarithm and exponential.
The central problem is to find the rules for selecting the digits of continued products (sums); these rules are more difficult to find when a higher radix is used. The rules and the complete algorithms are developed for fractional parts of floating point numbers in the conventional range [1/2, 1). The radix 16 merely means "4 bits at a time" and represents, in some sense, the radix of implementation, not of an operand. The exponent arithmetic, being simple, is not considered.

The use of a redundant representation [2, 3] affects the selection rules, but not the number of steps to be performed, the probability of zero ("no addition") being too small for the radix 16 approach. In the binary case [1], the redundancy is also essential in decreasing the average number of full steps. By contrast, in the radix 16 case the number of steps is fixed and corresponds to the number of radix 16 digits used to represent the fractional part.

The digit-by-digit evaluation employed in the described algorithms is not a consequence of inherent properties of the continued products (sums) approach, but reflects a realization strategy which attempts to achieve a reasonably fast implementation for all functions under consideration while retaining simplicity. Some comparisons between the radix 2 and radix 16 approaches are made, and a more efficient solution, requiring essentially one "pipelined" arithmetic unit, is described.

2. MULTIPLICATIVE NORMALIZATION

By normalization we mean a step by step transformation of a given number $X_0 \in [1/2, 1)$ to one (or to any other number N, in general). For uniformity and simplicity of the algorithms described later, linear convergence is imposed on normalization. Namely, an m-digit normalized number is obtained in (m+1) iterative steps.

If the reciprocal of a given number $X_0$ can be represented in a continued product form as

$1/X_0 = \prod_{i=0}^{m} M_i$

then the normalization of $X_0$ to one can be achieved through a multiplicative iteration:

$X_{k+1} = X_k \cdot M_k, \quad k = 0, \ldots, m$    (2.1)

and

$X_{m+1} = X_0 \cdot \prod_{i=0}^{m} M_i = 1$

The factors of a continued product representation are of the form

$M_k = 1 + S_k \cdot 16^{-k}$    (2.2)

where $S_k$ is a one-digit constant, so that an implementation of (2.1) requires only addition and shift operations. The value of $S_k$ is chosen such that the error $\varepsilon_{k+1}$ after step k becomes

$|\varepsilon_{k+1}| = |1 - X_{k+1}| \le \frac{|S_k|_{\max}}{r-1} \cdot 16^{-k}$    (2.3)

where $|S_k|_{\max}$ is the largest constant in the chosen set $\{S_k\}$. In other words, at every step k, by proper choice of the constant $S_k$, the k-th digit of the partially normalized number $X_{k+1}$ will be forced to zero or (radix - 1). Therefore, the final normalized quantity $X_{m+1}$ will differ from unity by at most a quantity of order $16^{-m}$.

To define the normalization procedure, one must find the rules for selection of the proper value of $S_k$, given $X_k$ and the set $\{S_k\}$. The set $\{S_k\}$ is determined as follows. To make the selection process simple, it is essential to have a redundant number representation. Therefore, the set $\{S_k\}$ will contain more than r elements, both positive and negative. The maximal value $|S_k|$, or better, the redundancy ratio, is obtained from practical requirements [3]: one of the simplest ways to form the term $S_k (X_k \cdot 16^{-k})$ in the recursion is to use a multilevel adder structure, with corresponding selection-complementation networks generating the following sets of multiples:

$\{0, \pm 1, \pm 2\} \cdot (X_k \cdot 16^{-k})$ in level 1;
$\{0, \pm 4, \pm 8\} \cdot (X_k \cdot 16^{-k})$ in level 2.

Therefore, the maximal value should be $|S_k|_{\max} = 10$, corresponding to the redundancy ratio $|S_k|_{\max}/(r-1) = 2/3$. The set of constants $S_k$ is then

$\{\overline{10}, \ldots, \overline{1}, 0, 1, \ldots, 10\}$    (2.5)

where the overbar denotes negative values. Now the error (2.3) becomes

$|\varepsilon_{k+1}| = |1 - X_k M_k| \le \frac{2}{3} \cdot 16^{-k}$    (2.6)

It is straightforward to show that for $X_0 \in [1/2, 1)$, the reciprocal $1/X_0$ can always be represented in a continued product form using constants $S_k$ from the set (2.5).
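The claim that one step of (2.1) needs only shifts and additions can be illustrated with a small sketch. It assumes a fixed-point word of `FRAC_BITS` fractional bits and the helper names `decompose`, `multiple` and `apply_factor`, none of which come from the report; it splits each constant $S_k$ into one multiple from {0, ±1, ±2} and one from {0, ±4, ±8}, as in the two adder levels described above.

```python
FRAC_BITS = 48  # assumed word length: an integer n stands for n / 2**FRAC_BITS


def decompose(s):
    """Split s (|s| <= 10) into a + b with a in {0, ±1, ±2} and b in {0, ±4, ±8}."""
    for b in (0, 4, -4, 8, -8):
        if abs(s - b) <= 2:
            return s - b, b
    raise ValueError("s must lie in [-10, 10]")


def multiple(term, c):
    """Form c * term for c in {0, ±1, ±2, ±4, ±8} with a left shift and a sign change."""
    if c == 0:
        return 0
    shifted = term << (abs(c).bit_length() - 1)
    return shifted if c > 0 else -shifted


def apply_factor(x, s, k):
    """Approximate x * (1 + s * 16**-k) using one right shift and two additions."""
    term = x >> (4 * k)          # X_k * 16^-k, truncated to the word length
    a, b = decompose(s)
    return x + multiple(term, a) + multiple(term, b)


if __name__ == "__main__":
    x = int(0.7 * 2 ** FRAC_BITS)
    print(apply_factor(x, 7, 1) / 2 ** FRAC_BITS)   # close to 0.7 * (1 + 7/16)
```

Because the right shift truncates before the multiples are formed, the result can differ from the exact product by a few units in the last place; a hardware design would account for this with guard digits.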
Therefore, the normalization procedure defined by the recursion (2.1) is always possible. The main problem now is to define a practical selection procedure.

The constants $S_k$ are selected in such a way that the error condition (2.6) is satisfied at every step. Since the set of constants $S_k$ has been chosen to be redundant, the range of $X_k$, for all k, can be partitioned into overlapping intervals, each corresponding to a particular constant $S_k$. In the overlaps, at least two constants $S_k$, differing by 1, are a valid choice. $S_k$ can be determined on the basis of $X_k$, but to keep the selection dependent on the same register positions, i.e., to retain the same "weights" for the selection rules, it is convenient to define the scaled remainder as

$R_k = 16^{k-1}(X_k - 1), \quad 0 \le k \le m$    (2.7)

The selection process can now be carried out on the remainders, obtained recursively from

$R_{k+1} = 16 R_k + S_k + 16^{-k+1} S_k R_k, \quad 0 \le k \le m$    (2.8)

This recursion follows from (2.1), (2.2) and (2.7): substituting $X_k = 1 + 16^{-k+1} R_k$ into $X_{k+1} = X_k (1 + S_k 16^{-k})$ and multiplying $X_{k+1} - 1$ by $16^k$ gives (2.8). If the general form of this recursion for radix r is considered, then the following remarks concerning implementation requirements can be made:

i) The number of shifting paths necessary to generate the last term in the recursion (2.8) is inversely proportional to the radix r;

ii) For the higher radices of the form $r = 2^{2p}$, p = 1, 2, 3, ..., such that multiple formation can be done with a cascade of adders, and with the set of $S_k$'s such that $|S_k|_{\max} = \frac{2}{3}(r-1)$, the number of extra levels with respect to the radix 2 is p - 1. Clearly, full carry propagation need be provided only at the last level.

Now, starting from the error condition (2.6), redefined for scaled remainders as

$-2/3 \le R_{k+1} \le 2/3$    (2.9)

the selection rules can be derived. The intervals can be determined by straightforward calculation: for every $S_k$, given the bounds (2.9) on $R_{k+1}$, find the interval boundaries as the minimal and maximal values of $R_k$ such that equation (2.8) holds.
In addition, the continuity of the range should be preserved by retaining only the overlapping intervals. The numbers representing boundaries between intervals should be simple in the binary sense, so that limited precision can be used in implementing the selection rules.

From the definition of the scaled remainder (2.7) it can be observed that the normalization will be more accurate if the initial step (k = 0) is performed on $X_0$ and then continued using $R_k$ for k = 1, ..., m. Since $R_1 = X_1 - 1$, this change in the procedure is almost trivial. The rules of selection for the initial step can be made very simple, due to the fact that $S_0$ may be chosen to be either 0 or 1. The rules are:

$S_0 = 1$ if $1/2 \le X_0 < 5/8$;
$S_0 = 0$ if $5/8 \le X_0 < 1$.

The choice of 5/8 as the boundary value is made so that $X_1$ will be in a convenient range. From the rules, the range of $X_1$ is [5/8, 5/4). It is easy to see that even for m = 1, the continued products $\prod_{i=1}^{m} (1 + S_i 16^{-i})$ can represent values less than 4/5 or greater than 8/5, hence making normalization successful. Therefore, the restriction of $S_0$ to the values 0 or 1 is valid.

The precision necessary to express boundaries between intervals is at most six bits after the radix point. For convenience the bounds of the intervals will also be given to this precision. For the step k = 1, the lower and upper bounds of the intervals containing $R_1$, denoted a and b, are calculated for all possible $S_1$ and displayed in the following table.

Table 2.1 (bounds a and b on $64 R_1$)

  S_1      a      b
   10    -26    -23
    9    -24    -22
    8    -23    -20
    7    -21    -18
    6    -19    -16
    5    -17    -14
    4    -14    -11
    3    -12     -8
    2     -9     -5
    1     -6     -2
    0     -2      3
   -1      2      7
   -2      7     12
   -3     12     18
   -4     18     24
  * * * * * * * * * *
   -5     26     32
   -6     35     42
   -7     46     54
   -8     60     69
   -9     77     88
  -10    100    113

The values below the starred line may not be used since the corresponding intervals are not contiguous. Since $-3/8 \le R_1 < 1/4$, as follows from the initialization rules (k = 0), the constants $S_1$ can be restricted, without problems, to the set $\{\overline{3}, \ldots, 9\}$.

By the same procedure the interval bounds a and b for $R_2$ are calculated. The correspondence between values of $S_2$ and allowed intervals is given in Table 2.2.

Table 2.2 (bounds a and b on $64 R_2$)

  S_2      a      b
   10    -42    -36
    9    -37    -33
    8    -33    -29
    7    -29    -25
    6    -25    -21
    5    -22    -18
    4    -18    -14
    3    -14    -10
    2    -10     -6
    1     -6     -2
    0     -2      3
   -1      2      6
   -2      6     10
   -3     10     14
   -4     14     18
   -5     18     23
   -6     23     27
   -7     27     31
   -8     31     35
   -9     35     39
  -10     39     42

The intervals in which $R_2$ may be found are contiguous and $S_2$ can have all values from the set $\{\overline{10}, \ldots, 10\}$.

For the remaining steps, $k \ge 3$, a simple relationship holds between the value of $S_k$ and the bounds of the corresponding interval:

$(-2 S_k - 1) \le 32 R_k \le (-2 S_k + 1)$    (2.10)

and $S_k \in \{\overline{10}, \ldots, 10\}$.

The result (2.10) indicates, first, that the selection rules for k > 2 are invariant and, second, that the selection can be performed by rounding the scaled remainder to one non-sign digit (in radix 16).

The selection process itself becomes very simple after the first three steps, due to the following fact. The last term in the remainder recursion, $16^{-k+1} S_k R_k$, cannot affect the most significant bits of $R_{k+1}$, used in selecting $S_{k+1}$, for k > 2. At least k - 3 most significant digits of $R_k$ remain unchanged except for possible complementation, due to the change of sign. Therefore, at the k-th step, the constants $S_k, S_{k+1}, \ldots, S_{2k-3}$ are known. For m digit precision, when $k \ge (m+3)/2$ all remaining constants $S_k, \ldots, S_m$ are actually available and the process of normalization can be simplified. Namely, the basic remainder recursion (2.8) can be replaced by a simple form:

$R_{k+1} = 16 R_k + S_k, \quad k \ge (m+3)/2$    (2.11)

The aforementioned features reveal the amenability of the normalization procedure to higher radix implementations. Once the process gets started, the remaining steps are progressively easier.
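The regular steps $k \ge 3$ can be sketched as follows. This is only an illustration under stated assumptions: the remainder is held exactly as a `Fraction` instead of a truncated register, the starting remainder satisfies $|R_3| \le 5/8$ so that the rounded digit stays within ±10, and the names `select_digit`, `dropped_term` and `normalize_tail` are hypothetical. The digit is chosen by rounding according to the interval form (2.10), in which a negative remainder selects a positive constant, and the remainder is advanced with the full recursion (2.8).

```python
import math
from fractions import Fraction


def select_digit(r):
    """Pick S with -2S - 1 <= 32 R < -2S + 1, i.e. S = -round(16 R), cf. (2.10)."""
    return -math.floor(16 * r + Fraction(1, 2))


def dropped_term(r, s, k):
    """The term 16^(-k+1) * S_k * R_k that (2.11) omits from (2.8)."""
    return Fraction(s, 16 ** (k - 1)) * r


def normalize_tail(r3, m):
    """Run steps k = 3..m; return the list of (k, S_k, R_k) and the final R_{m+1}."""
    r, steps = r3, []
    for k in range(3, m + 1):
        s = select_digit(r)
        steps.append((k, s, r))
        r = 16 * r + s + dropped_term(r, s, k)   # full recursion (2.8)
    return steps, r


if __name__ == "__main__":
    steps, r_final = normalize_tail(Fraction(-23, 64), 12)
    print([s for _, s, _ in steps], float(r_final))
```

In this sketch the omitted term of (2.11) can be compared against $16^{k-m}$, the weight of the last retained digit in $R_{k+1}$; for $k \ge (m+3)/2$ it stays below that weight, which is consistent with the simplification described above.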
As mentioned before, for $k \ge 3$ the selection of the $S_k$'s can be performed through rounding, i.e., the most significant non-sign (radix 16) digit of $R_k$ represents the correct value of $S_k$ after rounding. It is then natural to specify the selection rules for the "irregular" steps k = 0, 1 and 2 through a modified rounding procedure rather than using a table look-up or a direct combinational approach.

The following definitions are relevant to the description of the selection rules as well as the algorithms.

Sign and magnitude representation of the constants $S_k$:

$S_k = (1 - 2 s_4) \sum_{i=0}^{3} s_i 2^{i}, \quad s_i \in \{0, 1\}$ for all i;    (2.12)

Two's complement representation of scaled remainders:

$R_k = -r_0 + \sum_{i=1}^{4m} r_i 2^{-i}, \quad r_i \in \{0, 1\}$ for all i (m is the number of radix 16 digits);    (2.13)

Truncated scaled remainder:

$\hat{R}_k = -r_0 + \sum_{i=1}^{6} r_i 2^{-i}$;    (2.14)

Non-sign part of $\hat{R}_k$:

$T_k = \sum_{i=1}^{6} r_i 2^{-i}$ if $r_0 = 0$;
$T_k = \sum_{i=1}^{6} \bar{r}_i 2^{-i}$ if $r_0 = 1$;    (2.15)

Step-dependent rounding constant:

$U_k = \sum_{i=1}^{6} u_i 2^{-i}, \quad u_i \in \{0, 1\}$ and $u_i = f_i(\hat{R}_k)$.    (2.16)

Then

$S_k = \lfloor (T_k + U_k) \cdot 16 \rfloor$ and Sign($S_k$) = Sign($R_k$)    (2.17)

where $\lfloor Y \rfloor$ denotes the largest integer not larger than Y.

Step k = 0

For the initial step, a modified selection procedure is applied. The extension of normalization to negative values of $X_0$ is easily achieved through the initial step. The selection rules and the starting value of the scaled remainder are specified as follows.

For $X_0 \in [1/2, 1)$,

$S_0 = 1$ if $1/2 \le X_0 < 5/8$;
$S_0 = 0$ if $5/8 \le X_0 < 1$;    (2.18)

For negative values of $X_0$, a simple approach is to determine $S_0$ by rules analogous to (2.18) and then generate the negative of $X_1$ while calculating $R_1$ as

$R_1 = -X_1 - 1 = (\bar{X}_0 + \bar{X}_0 S_0) + 2^{-4m} - 1$    (2.19)

and proceed with normalization as in the case of positive $X_0$.

Step k = 1

From Table 2.1, the interval break points are chosen so that simple values of $U_1$ are sufficient to obtain $S_1$.
According to the proposed approach s i = L( T 1 +u 1 )- l6 J and Si e n ( s x ) = Si e n (\) In Table 2.3 the correspondence between intervals and rounding constants (U ) for each allowed value of S is given. Table 2.3 12 \ < ° s l 6Mr 1+ i) 64T X 64U 9 4o in 23 22 14 8 42 45 21 20 7 44 45 19 18 10 6 46 47 17 16 5 48 49 15 14 6 4 50 51 13 12 3 52 53 54 11 10 9 3 2 55 56 8 7 2 57 6 1 58 59 60 61 5 4 3 2 62 63 i 1 1 -1 2 3 4 2 3 4 1 5 5 6 6 -2 7 8 7 8 9 9 10 10 11 11 -3 12 13 14 15 12 13 14 15 13 Now, it is a simple task to find relations "between IL and R. in the ' Ik form of Boolean equations. The derivation of those equations, listed below, is given in Appendix A. II = Z u.2" 1 i=l U x = u 2 = 5 ° 2 (2.20) U 5 = r + *fk U 6 = *?k Stev k = 2 Using the data from Table 2.2, the interval break points are selected in such a way that the corresponding values of T and U will produce correct values of S : S 2 = L(T 2 +U 2 )l6j and Sign (S 2 ) = Sign (Rg) The correspondence between R~ and U is shown in Table 2.4. The Boolean equations are derived in Appendix B and listed below: U. = U = u_ = u. = 12 3 4 u 5 = r + r i (r 2 + V + r 6 (2 * 21) u 6 = r Q (r 1+ r 2 r 3 ) Table 2.1* Ik \<° S 2 6MR 2 +D 6kT 2 6i*u 2 10 23 1*0 3 26 37 9 27 36 50 55 8 31 32 31* 29 7 35 28 58 25 6 39 21* 14-2 21 5 1*3 1*5 20 19 18 2 1* 1*6 17 1*9 11* 3 50 13 53 10 2 5** 9 57 6 1 58 5 - 6l 2 62 63 1 Table 2.k Continued 15 R 2 > ° 6kt 6kT, 6ifU l 1 2 -1 2 2 5 5 -2 6 6 9 9 -3 10 10 r? 13 Jf ik Ik 17 17 -5 18 18 21 21 -6 22 23 2lt 22 23 24 1* 26 26 -7 27 27 30 30 -8 31 31 .... 34 ^ -9 35 55 - 58 38 -10 39 39 lt2 42 * - or 6kU = 2 whenever t/- = r^ = 1. 16 Step k > 3 As mentioned before, the selection procedure for all remaining steps consists of rounding: S k = L( T k + V ,l6j > Sign ( s k ) = s ig n (\) where U k = Z V" 1 = 1 ^ 32f i ' e '> i=l u. = 0, t 5 (2.22) « -1 We now summarize the multiplicative normalization of a given number !lj21 -j_ X = -x + Z x.2 in the form of the following algorithm. 
Algorithm N (Multiplicative Normalization):    (2.23)

Step N1. [Initialize]
  $k \leftarrow 0$;
  $S_0 \leftarrow 1$ if $1/2 \le X_0 < 5/8$;
  $S_0 \leftarrow 0$ if $5/8 \le X_0 < 1$;
  $R_1 \leftarrow X_0 (1 + S_0) - 1$;

Step N2. [Loop] for k < m perform:
  $k \leftarrow k + 1$;
  $S_k \leftarrow \lfloor (T_k + U_k) \cdot 16 \rfloor$; Sign $S_k \leftarrow$ Sign $R_k$;
  if $k < (m+3)/2$ then: $R_{k+1} \leftarrow 16 R_k + S_k + 16^{-k+1} S_k R_k$
  else: $R_{k+1} \leftarrow 16 R_k + S_k$;

where $\lfloor Y \rfloor$ denotes the largest integer not larger than Y; $T_k$ and $U_k$ are defined according to (2.15) and (2.16), with the bits $u_i$ of $U_k$ formed by combining (2.20), (2.21) and (2.22) under the step indicators $K_1$, $K_2$ and $K_3$, which stand for k = 1, k = 2 and $k \ge 3$, respectively. In every step $u_1 = u_2 = 0$, and for $k \ge 3$ only $u_5$ is set, giving $U_k = 1/32$.

This algorithm will normalize a given number $X_0$ to one in (m+1) steps with the error bound

$|X_{m+1} - 1| \le \frac{2}{3} \cdot 16^{-m}$    (2.24)

and simultaneously generate the digits of the continued product representation of $1/X_0$.

The method of multiplicative normalization is convergent by definition of the procedure, and the existence of such a procedure has been shown by construction of the selection rules. In deriving those rules, the aim was to achieve sufficient simplicity, not necessarily optimality. The implementation aspects will be discussed later. Of the algorithms considered here, division and (natural) logarithm are based upon the multiplicative normalization and will be described in that order.

3. DIVISION

As stated before, the proposed algorithms apply to floating point numbers with a binary radix of the exponent. The radix 16 or, in other words, "4 bits at a time" is used to speed up operations on fractional parts. Since, in general, the exponent manipulation presents no problems, we will not be concerned with the exponent arithmetic here.

Let $Y_0, X_0 \in [1/2, 1)$ be the fractional parts of the given floating point dividend and divisor, respectively. Then, by multiplying both the dividend and the divisor by the same sequence of factors $M_k$,

$Q = \frac{Y_0}{X_0} = \frac{Y_0 \prod_{i=0}^{m} M_i}{X_0 \prod_{i=0}^{m} M_i}$    (3.1)

if $X_0 \prod_{i=0}^{m} M_i \to 1$, the fractional part Q of the quotient is obtained as $Y_0 \prod_{i=0}^{m} M_i$. Defining the factors $M_k$ to be of the form

$M_k = 1 + S_k \cdot 16^{-k}$    (3.2)

with $S_k \in \{\overline{10}, \ldots, 10\}$, one can determine the constants $S_k$ through the multiplicative normalization (Algorithm N). To form the desired quotient Q, let $Q_0 = Y_0$ and define recursively the partial result as

$Q_{k+1} = Q_k \cdot M_k = Q_k (1 + S_k \cdot 16^{-k}), \quad 0 \le k \le m$    (3.3)

Then $Q = Q_{m+1}$ has m correct digits, the error bound in the normalized divisor being $|X_{m+1} - 1| \le \frac{2}{3} \cdot 16^{-m}$.

In presenting the algorithm for division, as well as for the other operations, it is assumed that the normalization and the result evaluation are carried out in two similar arithmetic units. Later, when discussing implementation aspects, it will be shown how the proposed algorithms can be realized with essentially one arithmetic unit with a tolerable decrease in performance.

Algorithm D (Division)
(AU1: Normalization) (AU2: Result Evaluation)

Step D1. [Initialize]
  $k \leftarrow 0$;
  Step N1 (Algorithm N);
  $Q_0 \leftarrow Y_0$;

Step D2. [Loop] for $k \le m$ perform:
  Step N2;
  $Q_{k+1} \leftarrow Q_k + S_k Q_k \cdot 16^{-k}$;

An example of the division is given in Figure 3-1. The "predictability" feature, described before (2.11), is apparent at step k = 7: the first five digits of $R_8$, when recoded, are the next five constants $S_8, \ldots, S_{12}$.

In Figure 3-2, the basic hardware configuration, consisting of two arithmetic units, is shown. The control part is not described. The only difference between the two arithmetic units is that AU1 has the additional network TU and the five-bit register S, used in the selection process.

[Figure 3-1: Example of division.]
[Figure 3-2: Basic hardware configuration with two arithmetic units. AU1 evaluates $R_{k+1} = 16 R_k + S_k + 16^{-k+1} S_k R_k$; AU2 evaluates $Q_{k+1} = Q_k + 16^{-k} S_k Q_k$. Each unit consists of a shifting network controlled by k, two select-complement networks controlled by $S_k$ that form the multiples (0; ±1; ±2) and (0; ±4; ±8), and two adders.]

4. NATURAL LOGARITHM

Let $X = X_0 \cdot 2^{E_x}$ be a given floating point number with fractional part $X_0 \in [1/2, 1)$ and exponent $E_x$. In the formulation of the algorithm in radix 16, we follow the same approach as for the radix 2 case, given in [1]. To obtain $\ln X$, the problem is split in two parts. Namely,

$\ln X = \ln X_0 + E_x \ln 2$    (4.1)

The algorithm for the calculation of the first term $\ln X_0$ is derived from the identity:

$X_0 = X_0 \prod_{i=0}^{m} M_i \Big/ \prod_{i=0}^{m} M_i$    (4.2)

where the multipliers are $M_k = 1 + S_k \cdot 16^{-k}$ and the constants $S_k \in \{\overline{10}, \ldots, 0, \ldots, 10\}$, as defined before for the multiplicative normalization algorithm. Now,

$\ln X_0 = \ln \Big( X_0 \prod_{i=0}^{m} M_i \Big) - \ln \Big( \prod_{i=0}^{m} M_i \Big)$    (4.3)

From the normalization algorithm N (2.23) we know that the error in normalization is bounded by

$|X_{m+1} - 1| \le \frac{2}{3} \cdot 16^{-m}$

Therefore, $\ln (X_0 \prod_{i=0}^{m} M_i) = 0$ for m digits precision and $\ln X_0$ is reduced to

$\ln X_0 = -\ln \Big( \prod_{i=0}^{m} M_i \Big) = \sum_{i=0}^{m} \big[ -\ln (1 + S_i 16^{-i}) \big]$    (4.4)

To obtain $\ln X_0$, one needs to sum m + 1 precomputed constants of the form $-\ln (1 + S_k 16^{-k})$, stored in a fast read-only memory (ROM). The stored constants are retrieved from the ROM using the $S_k$'s obtained in normalizing $X_0$. It is clear that this procedure is equally applicable to any logarithm function; only the set of stored constants needs to be precomputed in the corresponding base.

The summation (4.4) is performed recursively:

$L_{k+1} = L_k + \big[ -\ln (1 + S_k \cdot 16^{-k}) \big]$ for k = 0, ..., m, where $L_0 = 0$    (4.5)

Then $L_{m+1} = \ln X_0$. It should be noted that this result may not have an accuracy of m digits, because the stored constants cannot be represented exactly with m digits and errors accumulate during the m + 1 additions. If the result $\ln X_0$ is to be correct to m digits (4m bits), i.e., with error less than $\frac{1}{2} \cdot 2^{-4m}$, then the precision of the second arithmetic unit should be extended by

$\Delta m = \lceil \ln (m+1) / \ln 2 \rceil$ bits

where $\lceil X \rceil$ is the smallest integer not smaller than X. For m = 12, this amounts to an extension of 4 bits. If the algorithm is performed in radix 2, the extension needed to retain the same error bound is 6 bits.

The calculation of the second term $E_x \ln 2$ can be performed using conventional multiplication, since $E_x$ is always of short precision compared to $X_0$. The constant $\ln 2$ can be stored in a ROM. As an alternative, it might be convenient to implement $E_x \ln 2$ using a multiplication algorithm based on continued sums, as described in Chapter 6.
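The decomposition (4.1) together with the summation (4.4) and (4.5) can be sketched in a few lines. This is an illustration under stated assumptions, not the report's hardware procedure: double-precision floats stand in for the registers, `math.log1p` stands in for the ROM of precomputed constants, and each digit is chosen greedily as the nearest admissible integer rather than by the truncated-remainder rules of Chapter 2. The function name `ln_continued_product` is hypothetical.

```python
import math


def ln_continued_product(x, m=8):
    """Approximate ln(x), x > 0, as E*ln 2 minus the sum of ln(1 + S_k * 16**-k)."""
    x0, e = math.frexp(x)                 # x = x0 * 2**e with x0 in [1/2, 1)
    s = 1 if x0 < 0.625 else 0            # initial step: S_0 is 0 or 1
    acc = -math.log1p(s)                  # L_1 = -ln(1 + S_0)
    xk = x0 * (1 + s)                     # X_1 lies in [5/8, 5/4)
    for k in range(1, m + 1):
        w = 16.0 ** -k
        # Greedy digit choice (an assumption of this sketch), clipped to [-10, 10].
        s = max(-10, min(10, round((1.0 / xk - 1.0) / w)))
        acc -= math.log1p(s * w)          # stands in for the ROM constant
        xk *= 1.0 + s * w                 # X_{k+1} = X_k * M_k
    return acc + e * math.log(2)          # add the exponent term E_x * ln 2


if __name__ == "__main__":
    for value in (0.6, 3.7, 12345.678):
        print(value, ln_continued_product(value), math.log(value))
```

With m = 8 the residual $|X_{m+1} - 1|$ is of order $16^{-8}$, so the sketch agrees with `math.log` to roughly nine decimal places; a larger m would be limited by the float word length rather than by the method.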
Assuming that the length of the exponent $E_x$ is 8 bits, this alternative will increase the total time by 3 basic cycles but reduce the hardware requirements by an extra low precision multiplier. Namely, after $\ln X_0$ has been computed in the second arithmetic unit (of Figure 4-2), this result is taken as the first partial product. Then the first arithmetic unit is initialized to $E_x$, the step counter is set to zero and the multiplication $E_x \ln 2$ is performed using recursion (6.2):

$L_{k+1} = L_k + (\ln 2) S_k \cdot 16^{-k}$ for k = 0, 1, 2

where $L_0 = L_{m+1}$ and $S_0, S_1, S_2$ are obtained through the additive normalization algorithm. Therefore, the last partial product, $L_3$, will represent the final result $\ln X$. This approach has assimilated the extra add step indicated in (4.1).

For higher radices the set of constants $S_k$ is enlarged and consequently the capacity of the ROM must be increased. As in the radix 2 case [1], there is no need to store all of the constants $[-\ln (1 + S_k \cdot 16^{-k})]$. The possible reduction can be shown using the power series expansion for the logarithm. The constants to be stored are of the form

$\ln (1 + S_k \cdot 16^{-k}) = \ln (1 + a)$ and $S_k \in \{\overline{10}, \ldots, 0, \ldots, 10\}$

Then

$\ln (1 + a) = a - \frac{1}{2} a^2 + \text{H.O.T.}$ for $-1 < a \le 1$

For $k \ge k_L$ (4.6) and m digits accuracy,

$\ln (1 + S_k \cdot 16^{-k}) = S_k \cdot 16^{-k}$

eliminating the need for storing more constants. Therefore, for radix 16 the necessary capacity of the ROM is at most 10m + 15 words of m digits (4m bits) each, including the constant $\ln 2$, used in the initial step as well as in the evaluation of the term $E_x \ln 2$.

The algorithm for the natural logarithm is given below:

Algorithm L (Logarithm)    (4.7)
(AU1: Normalization) (AU2: Result Evaluation)

Step L1. [Initialize]
  $k \leftarrow 0$;
  Step N1 (Algorithm N);
  $L_0 \leftarrow 0$;

Step L2. [Loop] for $k \le m$ perform:
  Step N2;
  if $k < k_L$ then: $L_{k+1} \leftarrow L_k - \ln (1 + S_k 16^{-k})$
  else: $L_{k+1} \leftarrow L_k - S_k 16^{-k}$;

Step L3. [Form $\ln X_0 + E_x \ln 2$]
  Step A1 (Algorithm A);
  initialize: $R \leftarrow E_x$; ($L_0 = L_{m+1}$);
  for $k \le 2$ perform:
    Step A2;
    $L_{k+1} \leftarrow L_k + (\ln 2) S_k 16^{-k}$;

where $k_L$ is given by (4.6), and algorithm A for additive normalization is given in (5.6).

In the example, Figure 4-1, only the calculation of the $\ln X_0$ part is shown. The additional requirement for implementation of the logarithm algorithm is a read-only memory (ROM), connected to the result evaluation unit (Figure 4-2).

[Figure 4-1: Example of the calculation of $\ln X_0$.]

[Figure 4-2: Hardware configuration for the logarithm. AU1 evaluates $R_{k+1} = 16 R_k + S_k + 16^{-k+1} S_k R_k$; AU2 accumulates $L_{k+1}$ from $L_k$ and a stored constant selected by k and $S_k$.]

5. ADDITIVE NORMALIZATION

... 10 (the maximal allowed value for $S_{k+1}$), then this pair is recoded as $S_k S_{k+1}$, where $S_k = x_k + 1$ and $S_{k+1} = -(16 - x_{k+1})$. Otherwise, $S_k = x_k$.

To define the selection rules, we proceed as before.
First, the scaled remainder is defined as

$R_k = 16^{k-1} X_k$    (5.2)

where $X_k = X_0 - \sum_{i=0}^{k-1} S_i 16^{-i}$.

Then,

$R_{k+1} = 16^{k} X_{k+1} = 16^{k} (X_k - S_k \cdot 16^{-k}) = 16 R_k - S_k, \quad 0 \le k \le m$    (5.3)

represents the basic recursion. This recursion corresponds exactly to the recursion (2.11), except for the sign of $S_k$. The scaled remainders are bounded, similarly as before:

$|R_k| \le 2/3$    (5.4)

and the selection rule is the same for all steps. The rule is simple: $S_k$ equals the scaled remainder $R_k$, rounded to one non-sign digit, the sign being that of $R_k$. More precisely, the selection rule is:

$S_k = \lfloor (T_k + U_k) \cdot 16 \rfloor$ and Sign $S_k$ = Sign $R_k$, $\quad 0 \le k \le m$    (5.5)

where

$T_k = \sum_{i \ge 1} r_i 2^{-i}$ if $R_k \ge 0$;
$T_k = \sum_{i \ge 1} \bar{r}_i 2^{-i}$ if $R_k < 0$;

$R_k = -r_0 + \sum_{i=1}^{4m} r_i 2^{-i}$ and $U_k = 1/32$.

This choice of conventional rounding, i.e., $U_k = 1/32$, will actually restrict the set of constants $S_k$ to $\{\overline{8}, \ldots, 0, \ldots, 8\}$. This choice is preferred because it is simplest and the restricted set of $S_k$ is sufficient for recoding. The choice of $U_k = 6/256$ would require the full set of constants $S_k$.

Although the selection rule (5.5) holds for the initial step as well, it is more convenient to use the fact that for $|X_0| \ge 1/2$, as is the case, $|S_0|$ can always be taken to be 1. From the definition of the scaled remainder (5.2) it follows that it is preferable to start always with $|S_0| = 1$. Otherwise, for k = 0, to prevent loss of accuracy, an extension of one radix 16 digit would be necessary, as well as an additional right shift path needed only in this step.

As indicated before, the result of additive normalization is always exact, i.e.,

$X_0 - \sum_{i=0}^{m} S_i 16^{-i} = 0$

For reference purposes, the additive normalization is summarized in the form of algorithm A, given below.

Algorithm A (Additive Normalization):    (5.6)

Step A1. [Initialize]
  $k \leftarrow 0$;
  $S_0 \leftarrow 1$;
  $R_1 \leftarrow X_0 - S_0$;

Step A2. [Loop] for k < m perform:
  $k \leftarrow k + 1$;
  $S_k \leftarrow \lfloor (T_k + U_k) \cdot 16 \rfloor$; Sign $S_k \leftarrow$ Sign $R_k$;
  $R_{k+1} \leftarrow 16 R_k - S_k$;

where $U_k = 1/32$.

6. MULTIPLICATION

The algorithm for multiplication given here is based on a conventional procedure applied to the radix 16 case. Let the multiplicand and the multiplier be floating point numbers satisfying the usual requirements, i.e.,

$Y = Y_0 \cdot 2^{E_y}, \quad Y_0 \in [1/2, 1)$
$X = X_0 \cdot 2^{E_x}, \quad X_0 \in [1/2, 1)$

Again, only the multiplication of fractional parts is described, omitting the straightforward exponent arithmetic as well as the postnormalization of the result. Consider

$P = Y_0 X_0 = Y_0 \Big[ X_0 - \sum_{i=0}^{m} Z_i + \sum_{i=0}^{m} Z_i \Big]$

where m is the number of radix 16 digits in the fractional parts. If the terms $Z_k = S_k \cdot 16^{-k}$ are properly chosen, then

$X_0 - \sum_{i=0}^{m} S_i \cdot 16^{-i} = 0$

and

$P = Y_0 \sum_{i=0}^{m} S_i \cdot 16^{-i}$    (6.1)

where the constants $S_k$, as before, are from the set $\{\overline{10}, \ldots, 0, \ldots, 10\}$. Applying additive normalization to $X_0$, the constants $S_k$ can be obtained as described by Algorithm A (5.6). The summation of the partial products is performed simultaneously in the second arithmetic unit, using the following recursion:

$P_{k+1} = P_k + Y_0 \cdot S_k \cdot 16^{-k}, \quad 0 \le k \le m$    (6.2)

where $P_0 = 0$.

Since the normalization of $X_0$ is exact, the only error in multiplication comes from the single precision result representation. The algorithm for multiplication, compatible with the other proposed algorithms, is given below.

Algorithm M (Multiplication):    (6.3)
(AU1: Normalization) (AU2: Result Evaluation)

Step M1. [Initialize]
  $k \leftarrow 0$;
  Step A1 (Algorithm A);
  $P_0 \leftarrow 0$;

Step M2. [Loop] for $k \le m$ perform:
  Step A2;
  $P_{k+1} \leftarrow P_k + Y_0 S_k 16^{-k}$;

An example is shown in Figure 6-1. For implementation, shown in Figure 6-2, an extra register to hold the multiplicand is needed.
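Algorithms A and M can be traced with exact rational arithmetic. The sketch below is an illustration under stated assumptions: `Fraction` values replace the registers, the magnitude $T_k$ is taken as the full $|R_k|$ rather than a truncated bit field, $X_0$ is assumed to have exactly m radix 16 digits, and the function names `additive_normalize` and `multiply` are hypothetical.

```python
import math
from fractions import Fraction


def additive_normalize(x0, m):
    """Recode x0 in [1/2, 1) as sum(S_k * 16**-k, k = 0..m); return (digits, R_{m+1})."""
    digits = [1]                          # Step A1: S_0 = 1
    r = x0 - 1                            # R_1 = X_0 - S_0
    for _ in range(1, m + 1):             # Step A2 for k = 1..m
        mag = math.floor((abs(r) + Fraction(1, 32)) * 16)   # rounding with U_k = 1/32
        s = mag if r >= 0 else -mag       # the sign of S_k is that of R_k
        digits.append(s)
        r = 16 * r - s                    # recursion (5.3)
    return digits, r


def multiply(y0, x0, m):
    """Accumulate P_{k+1} = P_k + Y_0 * S_k * 16**-k as in recursion (6.2)."""
    digits, _ = additive_normalize(x0, m)
    p = Fraction(0)
    for k, s in enumerate(digits):
        p += y0 * Fraction(s, 16 ** k)
    return p


if __name__ == "__main__":
    x0 = Fraction(0xB3A, 16 ** 3)
    print(additive_normalize(x0, 3))                  # digits 1, -5, 4, -6 and remainder 0
    print(multiply(Fraction(3, 4), x0, 3) == Fraction(3, 4) * x0)
```

For the sample operand the digits sum back to $X_0$ exactly and no digit exceeds 8 in magnitude, in line with the remark that conventional rounding restricts the set of constants.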
[Figure 6-1. Example of multiplication; the numerical printout is not recoverable from the scan.]

[...] (4.6), the logarithmic constants can be replaced with $S_k \cdot 16^{-k}$, reducing the basic recursion (7.12) to:

$$R_{k+1} = 16 R_k - S_k, \qquad \text{for } k \ge k_1 \tag{7.13}$$

The last expression shows that for $k \ge k_1$ the selection process will be identical to the one defined by the additive normalization (5.6). It would be natural, therefore, to try to find selection rules such that the similarity with additive normalization also holds for $k < k_1$.

We now consider the rules for $k < k_1$ in reverse order. First we recall that the selection rules are determined by choosing appropriate boundaries between the intervals in $R_k$ corresponding to particular $S_k$. Next we assume that five bit precision is sufficient, i.e., the boundaries between intervals can be represented as $L/32$, $L$ being an integer. To find the interval in $R_k$ corresponding to $S_k$, the bounds are determined as

$$R_k \in \left( \underline{R}_{k+1} \cdot 16^{-1} + 16^{k-1} \ln(1 + S_k 16^{-k}),\ \overline{R}_{k+1} \cdot 16^{-1} + 16^{k-1} \ln(1 + S_k 16^{-k}) \right) \tag{7.14}$$

where $\underline{R}_{k+1}$ and $\overline{R}_{k+1}$ are the minimal and maximal allowed values of $R_{k+1}$, respectively. From the power series expansion, we have

$$\ln(1 + S_k 16^{-k}) = S_k 16^{-k} + \varepsilon_k \tag{7.15}$$

where

$$\varepsilon_k = -\tfrac{1}{2} S_k^2 16^{-2k} + \tfrac{1}{3} S_k^3 16^{-3k} - \cdots \tag{7.16}$$

the condition of expansion clearly being satisfied.
Let

$$e_k = 16^{k} \cdot \varepsilon_k \tag{7.17}$$

where $\varepsilon_k$ is the remainder term of the expansion (7.15). Then, to select $S_k$, $R_k$ must be in the interval

$$R_k \in \left( \tfrac{1}{16} (\underline{R}_{k+1} + S_k + e_k),\ \tfrac{1}{16} (\overline{R}_{k+1} + S_k + e_k) \right)$$

or, using the assumption of 5 bit precision,

$$\frac{a_k}{32} \le R_k < \frac{b_k}{32}, \qquad a_k = \lceil 2 (\underline{R}_{k+1} + S_k + e_k) \rceil, \quad b_k = \lfloor 2 (\overline{R}_{k+1} + S_k + e_k) \rfloor$$

where $\lceil x \rceil$ denotes the smallest integer not smaller than $x$, and $\lfloor x \rfloor$ denotes the largest integer not larger than $x$. Since $\underline{R}_{k+1}$ and $\overline{R}_{k+1}$ have the asymptotic limits $-2/3$ and $2/3$, respectively, it follows that if $|e_k| < 1/6$, then $a_k$ and $b_k$ can be determined as

$$a_k = \lceil 2 (\underline{R}_{k+1} + S_k) \rceil \quad \text{and} \quad b_k = \lfloor 2 (\overline{R}_{k+1} + S_k) \rfloor \quad \text{for all } S_k \tag{7.21}$$

Clearly, $|e_k| < 1/6$ is a sufficient but not a necessary condition. If the last expressions for $a_k$ and $b_k$ are valid, then for the selection of $S_k$ the rounding of $R_k$ to the most significant non-sign (radix 16) digit suffices, i.e., the selection process becomes the same as in additive normalization. Of course, once $S_k$ is obtained, the next remainder is calculated using all terms in the basic recursion (7.12).

By calculating $e_k$ it turns out that for $k \ge 3$, $|e_k| < 1/6$, and hence we have simple selection rules as before. For $k = 2$, it can be shown that it is possible to find intervals in $R_2$ for all $S_2$ except $S_2 = -10$, such that if $R_2 \in [(2S_2 - 1)/32, (2S_2 + 1)/32)$ then $S_2$ is the correct constant. Therefore, if the range of $R_2$ is restricted so that $S_2 = -10$ is excluded, rounding can again be used as a selection rule. It has been found that this restriction of the possible range of $R_2$ affects neither the selection process for $k = 1$ and $k = 0$ nor the representation of $e^{X_0}$.

For $k = 1$ the intervals are determined, as before, using (7.14), and the results are given in Table 7.1.

Table 7.1 (intervals $a_1 \le 32 R_1 < b_1$)

  S_1    a_1                      b_1
   10    15                       $32\overline{R}_1$
    9    14                       15
    8    12                       14
    7    11                       12
    6     9                       11
    5     8                       10
    4     6                        8
    3     5                        6
    2     3                        5
    1     1                        3
    0    -1                        1
   -1    -3                       -1
   -2    -5 (-11/2)               -3
   -3    $32\underline{R}_1$      -11/2

For all other values of $S_1$, i.e., for $\{-10, \ldots, -4\}$, the intervals are not contiguous and hence those constants may not be used. In fact, if the possible range of $R_2$ is not restricted, $S_1 = -4$ can be included in the set of allowed constants in step 1.
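The sufficient condition $|e_k| < 1/6$ quoted above can be checked numerically. The sketch below assumes that $e_k = 16^k (\ln(1 + S_k 16^{-k}) - S_k 16^{-k})$, i.e., the series remainder of (7.15) scaled by $16^k$, and that the digit set is $\{-10, \ldots, 10\}$; the function names are hypothetical.

```python
import math


def scaled_log_error(s, k):
    """e_k = 16**k * (ln(1 + S*16**-k) - S*16**-k) for digit S at step k."""
    x = s / 16 ** k
    return 16 ** k * (math.log1p(x) - x)


def digits_violating_bound(k, digit_range=range(-10, 11), bound=1 / 6):
    """Digits S for which the sufficient condition |e_k| < bound fails."""
    return [s for s in digit_range if abs(scaled_log_error(s, k)) >= bound]
```

With these assumptions, no digit violates the bound for k = 3 or k = 4, while for k = 2 the bound fails only at the two extreme digits, S = -10 and S = 10. Because the condition is sufficient but not necessary, a failure of the bound does not by itself show that a digit is unusable; that is decided by the interval analysis in the text.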
The actual selection rules for $k = 1$ can be specified as in multiplicative normalization, i.e., using the conventions described by expressions (2.12 - 18) one can determine an additive constant $U_1$ and obtain $S_1$ through modified rounding. Another choice, which is given here, is to restrict the range of $R_1$ so that conventional rounding applies. This restriction should not affect the possibility of representing the required result. From Table 7.1, if

$$-.171 < R_1 < .212$$

then $S_1 \in \{-2, -1, 0, 1, 2, 3\}$ can be selected by applying rounding to one non-sign digit precision, and no special rules, differing from those for $k > 1$, are necessary.

To obtain $R_1$ in the desired range, the following initialization (step $k = 0$) can be devised. Since $X_0 \in (-\ln 2, 0]$ by assumption and $R_1 = X_1 = X_0 - \ln M_0$, the rules are:

Table 7.2

  X_0                  M_0             ln M_0      R_1
  [-1/8, 0]            1               0           [-1/8, 0]
  [-3/8, -1/8)         $e^{-1/4}$      -1/4        [-1/8, 1/8)
  (-ln 2, -3/8)        $e^{-17/32}$    -17/32      [-.162, .157)

Since $e^{X_0} \in (1/2, 1]$, it can easily be shown that such a choice for the initialization, as well as the restricted sets of constants in steps 1 and 2, i.e., $S_1 \in \{-2, \ldots, 3\}$ and $S_2 \in \{-9, \ldots, 10\}$, can give the correct continued product representation of the result.

A summary of the procedure for the evaluation of $e^X$ follows:

Preparatory Operations P (7.22)

Step P1. $N \leftarrow X \log_2 e$;
Step P2. if $X > 0$ then: $I \leftarrow [N] + 1$; else: $I \leftarrow [N]$;
Step P3. $F \leftarrow N - I$; $X_0 \leftarrow F \ln 2$;

where $[N]$ denotes the integer part of $N$; after the preparatory operations, $X_0$ will be in the range $(-\ln 2, 0]$, with the corrected integer part $I$.

Algorithm E (Exponential): (7.23)

Step E1. [Initialize] $k \leftarrow 0$;
    AU1 (Normalization): $R_1 \leftarrow X_0 - \ln M_0$;
    AU2 (Result Evaluation): $E_1 \leftarrow M_0$;
Step E2. [Loop] for $k < m$ perform: $k \leftarrow k + 1$;
    AU1: if $k < k_1$ then: $S_k \leftarrow \lfloor (T_k + U_k) \cdot 16 \rfloor$; Sign $S_k \leftarrow$ Sign $R_k$; $R_{k+1} \leftarrow 16 R_k - 16^{k} \ln(1 + S_k 16^{-k})$; else: Step A2 (Algorithm A);
    AU2: $E_{k+1} \leftarrow E_k + E_k S_k \cdot 16^{-k}$;

where $k_1$ is defined in (4.6), and $U_k = 1/32$. An example is shown in Figure 7-1.
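The preparatory operations P amount to a range reduction, which can be sketched as follows. This is a floating point illustration only: the function name `prepare_exponent` is hypothetical, `math.trunc` is assumed to correspond to the "integer part" $[N]$, and the final reconstruction $e^X = 2^I e^{X_0}$ is used only to check the reduction, not to stand in for Algorithm E.

```python
import math


def prepare_exponent(x):
    """Sketch of steps P1-P3: returns (I, X_0) with e**x = 2**I * e**X_0."""
    n = x * math.log2(math.e)                      # P1: N = X * log2(e)
    integer_part = math.trunc(n)                   # [N], truncation toward zero
    i = integer_part + 1 if x > 0 else integer_part  # P2
    f = n - i                                      # P3: fractional part, in (-1, 0]
    x0 = f * math.log(2)                           # X_0 = F * ln 2
    return i, x0
```

For example, `prepare_exponent(2.5)` gives I = 4 and X_0 close to -0.273, and `math.ldexp(math.exp(x0), i)` then reproduces `math.exp(2.5)` to floating point accuracy. One observation about the steps as written: for a positive X whose N is an exact integer, F becomes -1 and X_0 lands on the endpoint -ln 2.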
The read-only memory for this algorithm communicates with the normalization unit, as shown in Figure 7-2. The remaining configuration is the same as before.

[Figure 7-1. Example of exponential evaluation; the numerical printout is not recoverable from the scan.]
[Figure 7-2. Normalization unit AU1 for the exponential algorithm. Labels recoverable from the scan: ADDER, SELECT-COMPLEMENT, SHIFTING NETWORK, the signals $R_{k+1}$, $U_k$ and $E_{k+1}$, and the stored constants $C_h = \ln(M_h)$.]

8. IMPLEMENTATION

One of the basic features relevant for an efficient realization of the previously described algorithms is that the original operation is replaced by two much simpler processes of limited dependency. Through one process, the normalization, a sequence of constants $S_k$, the digits of the continued products (sums), is generated. Another process, the result evaluation, produces the final result using the constants $S_k$. Both processes are defined recursively, requiring only simple hardware operations: addition, shift and multiple formation.

A realization providing one separate arithmetic unit for each process clearly offers the fastest solution and the simplest control requirements. Since both units are identical as far as the main configuration is concerned, a replication using an advanced technology should make this cost acceptable. If speed is not of primary importance, one arithmetic unit can be used for both processes, performed in series. For simplicity, the processes should use the arithmetic unit alternately, so that only the current value of $S_k$ need be available. A "pipelining" of the processes through one adder and shifting network, described at the end of this chapter, uses essentially one arithmetic unit and is only 15%-25% slower than the double arithmetic unit realization.

As mentioned in the Introduction, this investigation of the use of radix 16 in the implementation of the described algorithms has been motivated by a possible speed improvement over the radix 2 approach and by some trade-offs in hardware requirements. In the following comparisons of the radix 2 [1] and radix 16 solutions, a two arithmetic unit realization is assumed in both cases.
Furthermore, since no actual design has been carried out for either approach, the comparisons given are approximate in nature and restricted to the size-dominant parts. Control is assumed to be synchronous, each recursive step being performed in one basic cycle. The main parts of the arithmetic units used in the comparisons are as follows.

a) The adder structure with the multiple formation networks, used in the radix 16 case, is estimated to be twice as complex as the corresponding part in the radix 2 case. Namely, in the former case, two adders and two (1 out of 2) select and complement networks are required, while the latter case requires one adder with one select-complement network. The speed of addition in the radix 16 case will be only slightly decreased if both adders are unified into one three-input adder. If the add time of a two-input adder (the radix 2 case) is $t_{a2}$, we estimate that $t_{a16} < 1.2\, t_{a2}$ for sufficiently large $m$.

b) The shifting network, required to shift right/left $k$ digits, for $0 \le k \le m - 1$, is simpler for a higher radix. We assume that the fast shifting network is realized using a "barrel switch" technique [8]. Namely, shifting is performed in two or more levels so that the combination of level shifts corresponds to the required total shift. This technique, besides being fast, ensures low loading requirements, and the shifting can be done using the same paths both ways: the shift count is represented in two's complement, a negative number specifying a left shift. With its regular and simple structure, the barrel switch is easily implemented in integrated circuit technology, and as a standard block it can also be used in some other operations, e.g., shifting, normalization, etc. Because of an additional level, the shifting network in the radix 2 case is estimated to require 30%-50% more hardware than in the radix 16 case for m = 48 to 64 bits.
For example, if m = 48, then level 1 may provide displacements of 0, 16 and 32 positions, level 2 then provides displacements of 0, 4, 8 and 12 positions, and in the radix 2 case an additional level 3 would be necessary with displacements of 0, 1, 2 and 3 positions. Speedwise, then, $t_{s2} > 1.3\, t_{s16}$, where $t_s$ denotes the shifting delay.

c) The selection procedure in the radix 2 case requires implementation of a simple 4-bit comparison. For the radix 16 approach, the required precision for selection is 7 bits, and the five Boolean equations (2.23), costing less than 40 literals, are to be implemented. As described before, the selection is performed using rounding, so additional inputs to the 7 most significant positions of the adder should be provided, as well as a 5 bit register S to store the current value of the constant $S_k$. Even with those requirements, the selection hardware is small compared with the rest of the unit. In the radix 2 case this is even more true, so the selection hardware requirements are neglected in both cases.

d) For m bits of precision, the number of precomputed logarithmic constants stored in the read-only memory (ROM) is about m in the radix 2 case and about 3m in the radix 16 case.

e) The control part, which also includes the step counter (two bits shorter in the radix 16 case), is not considered to be highly dependent on a particular realization technique.

From the above considerations, the hardware requirement ratio for radix 2 and radix 16 is approximately 2:3. The basic cycle can be taken to be the same, since the add time is dominant over the control, selection and shifting times. The ROM capacity requirement ratio is about 1:3 in favor of the binary case.

Let the performance of an implementation in the radix $r$ be $P_r = \log_2 r / T_r$, where $T_r$ is the total delay necessary to evaluate $\log_2 r$ bits of the result, as defined in [4]. $T_r$ is equivalent to the basic cycle.
In the radix 2 case, the probability of $S_k = 0$ ($p = 2/3$) is utilized by providing an adder bypass and reducing the number of full basic cycles to $m/3$ on the average, where $m$ is the number of bits. Then it can be taken that the radix 16 basic cycle is $T_{16} = 3 T_2$, since the number of basic cycles in the radix 16 case is always the same, the probability of $S_k = 0$ being too low. With $P_{16} = 4/T_{16}$ and $P_2 = 1/T_2$, the ratio of performances is $P_{16}/P_2 = 4/3$ on the average. If the efficiency of the implementation is defined as the ratio between performance and cost per bit, then, with all the previous assumptions (in particular the 2:3 hardware ratio, which gives $(4/3)(2/3) = 8/9$), $E_{16}/E_2 \approx 1$ without considering the ROM requirements. If the ROM capacity requirement is taken into account, then the radix 2 approach will offer a more efficient design, but the radix 16 case will maintain better performance with a shorter execution time. The selection procedure for radix 16 has been shown to be sufficiently simple.

Even with the available efficient technological solutions, the use of two arithmetic units may be objectionable. Since both the process of normalization and the process of result evaluation have addition as the basic operation, a "pipelined" use of the same adder is possible, provided proper latching of the operands and the results is arranged. One way to achieve this is shown in Figure 8-1. The adder with the multiple formation networks is split into two equal parts AS'' and AS' by breaking the carry path and inserting a one-bit carry register C. Outputs from the left half SN'' of the shifting network are to be saved in a latch L. The selection is carried out in the block S on the basis of the adder outputs and returns the value of $S_k$. The initial operands are in register B, for normalization, and in register A, for result evaluation. Each register contains two separately controlled halves, B'', B' and A'', A'. The outputs from AS'' and AS' are connected, under separate control, to the inputs of the A'' and A' registers, respectively.
One separate path a from A'' to SN'' must be provided. The operation of this scheme is described for the division algorithm, with the help of Figure 8-2, with the initial control details omitted. The normalization process requires realization of the recursion

$$R_{k+1} = 16 R_k + S_k + 16^{-k+1} S_k R_k$$

while the result is evaluated as

$$Q_{k+1} = Q_k + 16^{-k} S_k Q_k.$$

Since the operand $16 R_k$ in the first equation corresponds to the operand $Q_k$ in the second equation, an additional path b from B to AS must be provided, as well as one 4-bit register, not shown in the scheme, to save the most significant digit of the left half of R. The operation begins with the divisor X in the B register and the dividend in the A register. Corresponding to the scheme, the superscripts ' and '' denote the right and the left half of each result. The basic cycle now contains two periods, each period terminated with the clock pulse. The time of a period corresponds to approximately one half of the full length addition time. The registers are assumed to be of the master-slave type. In the first period, R' is obtained and then, simultaneously, Q' from A' is transferred to B', R' from AS' to A', and the generated carry bit is saved [...]

[Figure 8-1. Pipelined arithmetic unit. Labels recoverable from the scan: SHIFT COUNT, adder halves AS'' and AS', shifting network halves SN'' and SN', registers B'', B' and A'.]

[Figure 8-2. Diagram not recoverable from the scan.]

[...]

$u_1 = u_2 = u_3 = u_4 = 0$
$u_5 = $ [...] (B2)
$u_6 = m_6 + m_7 + m_8 + m_9 + m_{10} = r_1 + r_2 r_3$

but $u_5$ and $u_6$ can be given also as:

$u_5 = r_1 (\bar{r}_2 + \bar{r}_3) + $ [...]
$u_6 = $ [...]

according to the remarks given after Table 2.4. Therefore, for step k = 2

$U_1 = U_2 = U_3 = U_4 = 0$
$U_5 = r_0 + r_1 (\bar{r}_2 + \bar{r}_3) + $ [...] (B3)
$U_6 = r_0 (r_1 + r_2 r_3)$

BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-72-541
Title and Subtitle: Radix 16 Division, Multiplication, Logarithmic and Exponential Algorithms Based on Continued Product Representations
Report Date: August, 1972
Author: Milos Dragutin Ercegovac
Performing Organization Name and Address: Department of Computer Science, University of Illinois, Urbana, Illinois 61801
Contract/Grant No.: NSF GJ-813
Sponsoring Organization Name and Address: National Science Foundation, Washington, D. C.
Type of Report & Period Covered: Research

Abstract: This thesis describes an investigation of the radix 16 approach in defining a class of similar algorithms for automatic evaluation of some elementary functions. The algorithms for division, multiplication, natural logarithm, and exponential evaluation are developed using continued products (sums) and redundant digit sets. A new "pipelined" version of the implementation is proposed, and some basic comparisons between the radix 16 and radix 2 approaches are given.

Key Words and Document Analysis. Descriptors: Computer arithmetic, Continued products, Continued sums, Division, Multiplication, Natural logarithm, Exponential

Identifiers/Open-Ended Terms: Multiplicative normalization, Additive normalization, "Pipelined" implementation, Redundant digit sets, Radix 16

Availability Statement: Release Unlimited
Security Class (This Report): UNCLASSIFIED
Security Class (This Page): UNCLASSIFIED