UIUCDCS-R-72-541

RADIX 16 DIVISION, MULTIPLICATION, LOGARITHMIC AND
EXPONENTIAL ALGORITHMS BASED ON CONTINUED
PRODUCT REPRESENTATIONS

by

Milos Dragutin Ercegovac

August, 1972

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
 
 
 Department of Computer Science 
 University of Illinois at Urbana-Champaign
 Urbana, Illinois 
 
 This work was supported in part by the National Science Foundation under 
 Grant No. US NSF GJ-813 and was submitted in partial fulfillment for the 
 Master of Science degree in Computer Science, 1972. 
 
 
 ACKNOWLEDGMENT 
 
 I wish to express my sincerest gratitude to my advisor, Professor
 James E. Robertson of the Department of Computer Science of the University
 of Illinois, for his highly valued guidance, suggestions and support. I
 also thank the National Science Foundation and the Department of Computer
 Science of the University of Illinois for their support.

 Thanks are also due to Mr. Kishor S. Trivedi for many helpful
 discussions. Finally, I would like to thank Mrs. June Wingler for her
 fine job of typing and Mr. Mark Gobel for excellent drawings.
 
 
 TABLE OF CONTENTS 
 
 1. INTRODUCTION
 2. MULTIPLICATIVE NORMALIZATION
 3. DIVISION
 4. NATURAL LOGARITHM
 5. ADDITIVE NORMALIZATION
 6. MULTIPLICATION
 7. EXPONENTIAL
 8. IMPLEMENTATION
 9. CONCLUSIONS
 LIST OF REFERENCES
 APPENDICES
 
1. INTRODUCTION 
 
 There is no doubt that the available technological possibilities could
 justify hardware implementation of a much wider class of functions than is
 presently done. If the corresponding algorithms are similar, this is
 even more true. One effective way to obtain such a class of algorithms is to
 use, in a convenient way, continued products (CP) or continued sums (CS) during
 the function evaluation. The use of continued products in the calculation of
 some elementary functions appears as early as 1959 in Volder's CORDIC
 technique [5]. The main results of this approach have been recently summarized
 in the form of a unified algorithm by Walther [6]. Specker [7] also derived a
 class of algorithms using the concept of continued products. Of particular
 importance and usefulness are the results obtained by DeLugish [1]. He has
 defined efficient algorithms for a wide class of functions including division,
 multiplication, square root, logarithm, exponential, trigonometric and inverse
 trigonometric functions, with operation times from 1 to 3 multiplication
 cycle times. These algorithms are specified for radix 2, using the redundant
 digit set {-1, 0, 1} in continued products (sums). The main idea is the replacement
 of a required operation or function evaluation by two simple step-by-step
 processes using addition/subtraction, shifting and possibly a set of precomputed
 constants stored in a read-only memory. One of the processes is normalization,
 through which the digits of continued products (sums) are generated; the other
 is the related result evaluation. These processes can be carried out in
 parallel, so for a fast operation at least two nearly identical arithmetic
 units are required.
 
The work done here is based upon the results obtained by DeLugish. It
 is motivated by the fact that a higher radix implementation offers some
 speed/hardware trade-offs worth investigating. In particular, radix 16
 is considered in four algorithms: division, multiplication, logarithm and
 exponential. The central problem is to find the rules for selection of the
 digits of continued products (sums); these rules become more difficult when
 a higher radix is used. The rules and the complete algorithms are developed for
 fractional parts, in the conventional range [1/2, 1), of floating point
 numbers. The radix 16 merely means "4 bits at a time" and represents, in
 some sense, the radix of implementation, not of an operand. The exponent
 arithmetic, being simple, is not considered. The use of a redundant
 representation [2, 3] affects the selection rules, but not the number of
 steps to be performed, because the probability of a zero digit ("no addition")
 is too small in the radix 16 approach. In the binary case [1], the redundancy
 is also essential in decreasing the average number of full steps. In
 contrast, in the radix 16 case the number of steps is fixed and corresponds
 to the number of radix 16 digits used to represent the fractional part. The
 digit-by-digit evaluation employed in the described algorithms is not a
 consequence of inherent properties of the continued products (sums) approach,
 but reflects a realization strategy which attempts to achieve a reasonably
 fast implementation for all functions under consideration while retaining
 simplicity. Some comparisons between the radix 2 and radix 16 approaches
 are made, and a more efficient solution, requiring essentially one "pipelined"
 arithmetic unit, is described.
 
2. MULTIPLICATIVE NORMALIZATION
 
 By normalization we mean a step-by-step transformation of a given
 number $X_0 \in [1/2, 1)$ to one (or, in general, to any other number N). For
 uniformity and simplicity of the algorithms described later, linear convergence
 is imposed on normalization; namely, an m-digit normalized number is obtained in
 (m+1) iterative steps. If the reciprocal of a given number X_0 can be represented
 in a continued product form as

 $$\frac{1}{X_0} = \prod_{i=0}^{m} M_i$$

 then the normalization of X_0 to one can be achieved through a multiplicative
 iteration:

 $$X_{k+1} = X_k \cdot M_k, \qquad k = 0, \ldots, m \qquad (2.1)$$

 and

 $$X_{m+1} = X_0 \cdot \prod_{i=0}^{m} M_i = 1$$
 
 The factors of a continued product representation are of the form

 $$M_k = 1 + S_k \cdot 16^{-k} \qquad (2.2)$$

 where S_k is a one-digit constant, so that an implementation of (2.1) will
 require only addition and shift operations. The value of S_k is chosen such
 that the error $\varepsilon$ after step k becomes

 $$|\varepsilon_{k+1}| = |X_{k+1} - 1| \le \frac{|S_k|_{\max}}{r-1} \cdot 16^{-k} \qquad (2.3)$$

 where $|S_k|_{\max}$ is the largest constant in the chosen set {S_k}. In other words,
 at every step k, by proper choice of the constant S_k, the k-th digit of the partially
 normalized number X_{k+1} will be forced to zero or (radix - 1). Therefore, the
 final normalized quantity X_{m+1} will differ by at most $16^{-m}$ from unity. To
 define the normalization procedure, one must find the rules for selection of
 the proper value of S_k, given X_k and the set {S_k}. The set {S_k} is determined
 as follows. To make the selection process simple, it is essential to have a
 redundant number representation. Therefore, the set {S_k} will contain more
 than r elements, both positive and negative. The maximal value |S_k|, or
 better, the redundancy ratio, is obtained from the practical requirements [3]:
 - one of the simplest ways to form the term $S_k (X_k \cdot 16^{-k})$ in the
 recursion is to use a multilevel adder structure, with corresponding
 selection-complementation networks generating the following sets of multiples:

 $\{0, \pm 1, \pm 2\} \cdot (X_k \cdot 16^{-k})$ in level 1;

 $\{0, \pm 4, \pm 8\} \cdot (X_k \cdot 16^{-k})$ in level 2.

 Therefore, the maximal value should be |S_k| = 10, corresponding to the
 redundancy ratio |S_k|/(r-1) = 2/3. The set of constants S_k is then

 $$\{\bar{10}, \ldots, \bar{1}, 0, 1, \ldots, 10\} \qquad (2.5)$$
 
 where the overbar denotes negative values. Now the error (2.3) becomes

 $$|X_{k+1} - 1| \le \frac{2}{3} \cdot 16^{-k} \qquad (2.6)$$
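As a quick check of the digit set, the following Python sketch adds one multiple from each adder level, assuming exactly the two levels listed above, and confirms that the sums cover every integer from -10 to 10, which gives the redundancy ratio 10/15 = 2/3. The constant and function names are illustrative only.

```python
from itertools import product

# Multiples of the shifted operand available from each adder level.
LEVEL1 = (0, 1, -1, 2, -2)
LEVEL2 = (0, 4, -4, 8, -8)

def reachable_digits():
    """Every S obtainable as one level-1 multiple plus one level-2 multiple."""
    return sorted({a + b for a, b in product(LEVEL1, LEVEL2)})

digits = reachable_digits()
ratio = max(digits) / 15            # |S|max / (r - 1) with r = 16
print(digits)
print(ratio)
```

The cascade never needs a full multiplier: each level only selects and possibly complements a shifted copy of the operand.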
 
It is straightforward to show that for $X_0 \in [1/2, 1)$, the reciprocal
 1/X_0 can always be represented in a continued product form using constants
 S_k from the set (2.5). Therefore, the normalization procedure defined by the
 recursion (2.1) is always possible.
 
 The main problem now is to define a practical selection procedure.
 The constants S_k are selected in such a way that the error condition (2.6)
 is satisfied at every step. Since the set of constants S_k has been chosen
 to be redundant, the range of X_k, for all k, can be partitioned into
 overlapping intervals, each corresponding to a particular constant S_k. In
 the overlaps, at least two constants S_k, differing by 1, are a valid choice.
 S_k can be determined on the basis of X_k, but to keep the selection dependent on
 the same register positions, i.e., to retain the same "weights" for the selection
 rules, it is convenient to define the scaled remainder as

 $$R_k = 16^{k-1}(X_k - 1), \qquad 1 \le k \le m+1 \qquad (2.7)$$
 
 The selection process can now be carried out on the remainders, obtained
 recursively from

 $$R_{k+1} = 16 R_k + S_k + 16^{-k+1} S_k R_k, \qquad 1 \le k \le m \qquad (2.8)$$
 
 This recursion follows from (2.1), (2.2) and (2.7). 
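The recursion (2.8) can be checked mechanically. The sketch below uses exact rational arithmetic from Python's `fractions` module, steps the iteration (2.1) with arbitrarily chosen digits, and confirms that the remainders produced by (2.8) coincide with those computed directly from definition (2.7). The digits are not produced by any selection rule; they only exercise the algebra.

```python
from fractions import Fraction as F

def scaled_remainder(x, k):
    """Definition (2.7): R_k = 16^(k-1) * (X_k - 1)."""
    return F(16) ** (k - 1) * (x - 1)

def next_remainder(r, s, k):
    """Recursion (2.8): R_{k+1} = 16 R_k + S_k + 16^(-k+1) S_k R_k."""
    return 16 * r + s + F(16) ** (1 - k) * s * r

# Arbitrary starting value and digits, chosen only to exercise the algebra.
x = F(5, 8) + F(1, 1000)
for k, s in enumerate([7, -3, 2, 0, -9], start=1):
    x_next = x * (1 + F(s) * F(16) ** (-k))      # iteration (2.1) with (2.2)
    assert next_remainder(scaled_remainder(x, k), s, k) == scaled_remainder(x_next, k + 1)
    x = x_next
print("recursion (2.8) matches definition (2.7)")
```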
 
 If the general form of this recursion for radix r is considered, then
 the following remarks concerning implementation requirements can be made:

 i) The number of shifting paths necessary to generate the last
 term in the recursion (2.8) is inversely proportional to the
 radix r;

 ii) For higher radices of the form $r = 2^{2p}$, p = 1, 2, 3, ...,
 such that multiple formation can be done with a cascade of adders,
 and with the set of S_k's such that $|S_k|_{\max} = \frac{2}{3}(r-1)$, the number of
 extra levels with respect to radix 2 is p - 1. Clearly, full
 carry propagation need be provided only at the last level.
 Now, starting from the condition for error (2.6), redefined for
 scaled remainders as

 $$-\frac{2}{3} \le R_k \le \frac{2}{3} \qquad (2.9)$$

 the selection rules can be derived. The intervals can be determined by
 straightforward calculations:

 - for every S_k, given the bounds (2.9) on R_{k+1}, find the interval
 boundaries as the minimal and maximal values of R_k such that
 equation (2.8) holds. In addition, the continuity of the range
 should be preserved by retaining only the overlapping intervals.
 The numbers representing boundaries between intervals should be
 simple in the binary sense, so that limited precision can be
 used in implementing the selection rules.
 From the definition of the scaled remainder (2.7) it can be observed
 that the normalization will be more accurate if the initial step (k = 0) is
 performed on X_0 and then continued using R_k for k = 1, ..., m. Since
 R_1 = X_1 - 1, this change in the procedure is almost trivial.
 
 The rules of selection for the initial step can be made very
 simple, due to the fact that S_0 may be chosen to be either 0 or 1. The
 rules are:

 $$S_0 = 1 \ \text{if}\ 1/2 \le X_0 < 5/8; \qquad S_0 = 0 \ \text{if}\ 5/8 \le X_0 < 1.$$

 The choice of 5/8 as the boundary value is made so that X_1 will
 be in a convenient range. From the rules, the range of X_1 is [5/8, 5/4).
 It is easy to see that even for m = 1, the continued product
 $\prod_{j=1}^{m}(1 + S_j \cdot 16^{-j})$ can represent values less than 4/5 or
 greater than 8/5, i.e., beyond both ends of the range (4/5, 8/5] of 1/X_1,
 hence making normalization successful. Therefore, the restriction of S_0 to
 the values 0 or 1 is valid.
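A small sketch of the initial step, assuming X_0 is held as an exact fraction: it applies the two selection rules, scans X_0 over a grid of spacing 2^-12, and confirms that X_1 lands in [5/8, 5/4). It also checks that a single further factor 1 + S_1·16^-1 already reaches beyond the reciprocal range (4/5, 8/5]. The function name `initial_step` is hypothetical.

```python
from fractions import Fraction as F

def initial_step(x0):
    """Initial step: choose S_0 in {0, 1} and return (S_0, X_1 = X_0 (1 + S_0))."""
    if not F(1, 2) <= x0 < 1:
        raise ValueError("X_0 must lie in [1/2, 1)")
    s0 = 1 if x0 < F(5, 8) else 0
    return s0, x0 * (1 + s0)

# Scan X_0 over a grid and record the resulting X_1.
grid = [F(1, 2) + F(i, 4096) for i in range(2048)]
x1_values = [initial_step(x0)[1] for x0 in grid]
print(min(x1_values), max(x1_values))

# With one further factor (m = 1), 1 + S_1/16 spans [3/8, 13/8],
# which covers the required reciprocal range (4/5, 8/5] of X_1.
lowest, highest = 1 + F(-10, 16), 1 + F(10, 16)
print(lowest < F(4, 5), highest > F(8, 5))
```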
 
 The precision necessary to express the boundaries between intervals is
 at most six bits after the radix point. For convenience, the bounds of the
 intervals will also be given to this precision.

 For step k = 1, the lower and upper bounds of the intervals containing
 R_1, denoted a and b, are calculated for all possible S_1 and displayed
 in the following table.
 
 Table 2.1

   S_1      a      b        (a ≤ 64R_1 ≤ b)
    10    -26    -23
     9    -24    -22
     8    -23    -20
     7    -21    -18
     6    -19    -16
     5    -17    -14
     4    -14    -11
     3    -12     -8
     2     -9     -5
     1     -6     -2
     0     -2      3
    -1      2      7
    -2      7     12
    -3     12     18
    -4     18     24
   *************************
    -5     26     32
    -6     35     42
    -7     46     54
    -8     60     69
    -9     77     88
   -10    100    113
 
The values below the starred line may not be used, since the corresponding
 intervals are not contiguous. Since $-3/8 \le R_1 < 1/4$, as follows from the
 initialization rules (k = 0), the constants S_1 can, without problems, be
 restricted to the set $\{\bar{3}, \ldots, 9\}$.
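The bounds in Table 2.1 can be approximately reproduced from (2.8) with k = 1, where R_2 = (16 + S_1)R_1 + S_1, by solving |R_2| ≤ 2/3 for R_1. The sketch below assumes the bounds were obtained by rounding inward to whole units of 1/64; under that assumption it gives, for instance, (-26, -23) for S_1 = 10, (-6, -2) for S_1 = 1 and (100, 113) for S_1 = -10, matching those rows. It illustrates the procedure only and is not a reconstruction of how the table was actually prepared.

```python
import math
from fractions import Fraction as F

BOUND = F(2, 3)  # |R_2| must not exceed 2/3, condition (2.9)

def step1_interval(s):
    """Integer range of 64*R_1 for which S_1 = s keeps |R_2| <= 2/3 (rounded inward).

    From (2.8) with k = 1: R_2 = 16 R_1 + S_1 + S_1 R_1 = (16 + S_1) R_1 + S_1.
    """
    lo = (-BOUND - s) / (16 + s)
    hi = (BOUND - s) / (16 + s)
    return math.ceil(64 * lo), math.floor(64 * hi)

for s in range(10, -11, -1):
    a, b = step1_interval(s)
    print(f"{s:>4} {a:>6} {b:>6}")
```

The same approach, with the factor 16^(-k+1) S_k R_k of (2.8), applies to the later steps.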
 
 By the same procedure the interval bounds a and b for R_2 are
 calculated. The correspondence between values of S_2 and the allowed intervals
 is given in Table 2.2.
 
 
 Table 2.2

   S_2      a      b        (a ≤ 64R_2 ≤ b)
    10    -42    -36
     9    -37    -33
     8    -33    -29
     7    -29    -25
     6    -25    -21
     5    -22    -18
     4    -18    -14
     3    -14    -10
     2    -10     -6
     1     -6     -2
     0     -2      3
    -1      2      6
    -2      6     10
    -3     10     14
    -4     14     18
    -5     18     23
    -6     23     27
    -7     27     31
    -8     31     35
    -9     35     39
   -10     39     42
 
 The intervals in which R_2 may be found are contiguous, and S_2 can
 take all values from the set $\{\bar{10}, \ldots, 10\}$.
 
 For the remaining steps, k ≥ 3, a simple relationship holds between
 the value of S_k and the bounds of the corresponding interval:

 $$(-2S_k - 1) \le 32 R_k \le (-2S_k + 1), \qquad S_k \in \{\bar{10}, \ldots, 10\} \qquad (2.10)$$

 The result (2.10) indicates, first, that the selection rules for k ≥ 3 are
 invariant and, second, that the selection can be performed by rounding the
 scaled remainder to one non-sign digit (in radix 16). The selection process
 itself becomes very simple after the first three steps, due to the following
 fact. The last term in the remainder recursion, $16^{-k+1} S_k R_k$, cannot affect the
 most significant bits of R_{k+1} used in selecting S_{k+1}, for k ≥ 3. At least
 the k - 3 most significant digits of R_k remain unchanged, except for possible
 complementation due to a change of sign. Therefore, at the k-th
 step, the constants S_k, S_{k+1}, ..., S_{2k-3} are known. For m-digit precision, when
 k ≥ (m+3)/2, all remaining constants S_k, ..., S_m are actually available, and
 the process of normalization can be simplified. Namely, the basic remainder
 recursion (2.8) can be replaced by the simple form

 $$R_{k+1} = 16 R_k + S_k, \qquad k \ge (m+3)/2 \qquad (2.11)$$
 
 The aforementioned features reveal the amenability of the normalization
 procedure to higher radix implementations. Once the process gets started, the
 remaining steps are progressively easier.

 As mentioned before, for k ≥ 3 the selection of the S_k's can be performed
 through rounding, i.e., the most significant non-sign (radix 16) digit of R_k
 represents the correct value of S_k after rounding. It is then natural to
 specify the selection rules for the "irregular" steps k = 0, 1 and 2 through a
 modified rounding procedure rather than through a table look-up or a direct
 combinational approach.
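For k ≥ 3, condition (2.10) amounts to choosing S_k as minus 16R_k rounded to the nearest integer. The sketch below, assuming remainders exactly representable with six fraction bits and restricted to |64R_k| ≤ 40, selects S_k that way and checks (2.10). For non-negative remainders it also confirms that the magnitude equals ⌊16(R_k + 1/32)⌋, i.e., truncation after adding the rounding constant U_k = 1/32 given later in (2.22). The helper names are illustrative.

```python
import math
from fractions import Fraction as F

def select_by_rounding(r):
    """Selection for k >= 3: S_k = -floor(16 R_k + 1/2), minus 16 R_k rounded."""
    return -math.floor(16 * r + F(1, 2))

def satisfies_2_10(s, r):
    """Condition (2.10): -2 S_k - 1 <= 32 R_k <= -2 S_k + 1."""
    return -2 * s - 1 <= 32 * r <= -2 * s + 1

# Sample remainders on the 6-bit grid used by the selection logic.
samples = [F(j, 64) for j in range(-40, 41)]
selected = [select_by_rounding(r) for r in samples]
print(all(satisfies_2_10(s, r) for s, r in zip(selected, samples)))
print(min(selected), max(selected))
```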
 
 
 The following definitions are relevant to the description of the
 selection rules as well as the algorithms.

 Sign and magnitude representation of the constants S_k:

 $$S_k = (1 - 2s_4) \sum_{i=0}^{3} s_i 2^{i}, \qquad s_i \in \{0, 1\} \ \text{for all } i \qquad (2.12)$$

 Two's complement representation of scaled remainders:

 $$R_k = -r_0 + \sum_{i=1}^{4m} r_i 2^{-i}, \qquad r_i \in \{0, 1\} \ \text{for all } i \qquad (2.13)$$

 (m is the number of radix 16 digits)

 Truncated scaled remainder:

 $$\hat{R}_k = -r_0 + \sum_{i=1}^{6} r_i 2^{-i} \qquad (2.14)$$

 Non-sign part of $\hat{R}_k$:

 $$T_k = \begin{cases} \sum_{i=1}^{6} r_i 2^{-i} & \text{if } r_0 = 0 \\ \sum_{i=1}^{6} \bar{r}_i 2^{-i} & \text{if } r_0 = 1 \end{cases} \qquad (2.15)$$

 Step-dependent rounding constant:

 $$U_k = \sum_{i=1}^{6} u_i 2^{-i}, \qquad u_i \in \{0, 1\} \ \text{and} \ u_i = f_i(\hat{R}_k) \qquad (2.16)$$

 $$S_k = \lfloor (T_k + U_k) \cdot 16 \rfloor \quad \text{and} \quad \mathrm{Sign}(S_k) = \mathrm{Sign}(R_k) \qquad (2.17)$$

 where ⌊Y⌋ denotes the largest integer not larger than Y.
 
 
 Step k = 0

 For the initial step, a modified selection procedure is applied.
 The extension of normalization to negative values of X_0 is easily achieved
 through the initial step. The selection rules and the starting value of the
 scaled remainder are specified as follows. For $X_0 \in [1/2, 1)$,

 $$S_0 = 1 \ \text{if}\ 1/2 \le X_0 < 5/8; \qquad S_0 = 0 \ \text{if}\ 5/8 \le X_0 < 1 \qquad (2.18)$$

 For negative values of X_0, a simple approach is to determine S_0
 by rules analogous to (2.18) and then generate the negative of X_1 while
 calculating R_1 as

 $$R_1 = -X_1 - 1 = \overline{(X_0 + X_0 S_0)} + 2^{-4m} - 1 \qquad (2.19)$$

 and proceed with normalization as in the case of positive X_0.
 
 Step k = 1

 From Table 2.1, the interval break points are chosen so that
 simple values of U_1 are sufficient to obtain S_1. According to the proposed
 approach,

 $$S_1 = \lfloor (T_1 + U_1) \cdot 16 \rfloor \quad \text{and} \quad \mathrm{Sign}(S_1) = \mathrm{Sign}(R_1)$$

 In Table 2.3 the correspondence between intervals and rounding
 constants (U_1) is given for each allowed value of S_1.
 
 Table 2.3

 R_1 < 0:

   S_1    64(R_1+1)         64T_1             64U_1
    9     40, 41            23, 22            14
    8     42, 43            21, 20
    7     44, 45            19, 18            10
    6     46, 47            17, 16
    5     48, 49            15, 14             6
    4     50, 51            13, 12
    3     52, 53, 54        11, 10, 9          3
    2     55, 56, 57        8, 7, 6            2
    1     58, 59, 60, 61    5, 4, 3, 2
    0     62, 63            1, 0

 R_1 ≥ 0:

   S_1    64R_1             64T_1
    0     0, 1              0, 1
   -1     2 to 6            2 to 6
   -2     7 to 11           7 to 11
   -3     12 to 15          12 to 15
 
 
 Now, it is a simple task to find relations between U_1 and R_1 in the
 form of Boolean equations. The derivation of those equations, listed below,
 is given in Appendix A.
 
 II = Z u.2" 1 
 i=l 
 
 U x = u 2 = 
 5 ° 2 (2.20) 
 
 U 5 = r + *fk 
 
 U 6 = *?k 
 
 Step k = 2

 Using the data from Table 2.2, the interval break points are selected
 in such a way that the corresponding values of T_2 and U_2 will produce correct
 values of S_2:

 $$S_2 = \lfloor (T_2 + U_2) \cdot 16 \rfloor \quad \text{and} \quad \mathrm{Sign}(S_2) = \mathrm{Sign}(R_2)$$

 The correspondence between R_2 and U_2 is shown in Table 2.4.
 
 The Boolean equations are derived in Appendix B and listed below:

 $$u_1 = u_2 = u_3 = u_4 = 0; \qquad u_5 = r_0 + r_1(r_2 + r_3) + r_6; \qquad u_6 = r_0(r_1 + r_2 r_3) \qquad (2.21)$$
 
Table 2.4

 R_2 < 0:

   S_2    64(R_2+1)    64T_2        64U_2
   10     23 to 26     40 to 37      3
    9     27 to 30     36 to 33
    8     31 to 34     32 to 29
    7     35 to 38     28 to 25
    6     39 to 42     24 to 21
    5     43 to 45     20 to 18      2
    4     46 to 49     17 to 14
    3     50 to 53     13 to 10
    2     54 to 57      9 to 6
    1     58 to 61      5 to 2
    0     62, 63        1, 0

 Table 2.4 (continued)

 R_2 ≥ 0:

   S_2    64R_2        64T_2
    0     0, 1         0, 1
   -1     2 to 5       2 to 5
   -2     6 to 9       6 to 9
   -3     10 to 13     10 to 13
   -4     14 to 17     14 to 17
   -5     18 to 21     18 to 21
   -6     22 to 26     22 to 26
   -7     27 to 30     27 to 30
   -8     31 to 34     31 to 34
   -9     35 to 38     35 to 38
  -10     39 to 42     39 to 42
 
 
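 Step k ≥ 3

 As mentioned before, the selection procedure for all remaining steps
 consists of rounding:

 $$S_k = \lfloor (T_k + U_k) \cdot 16 \rfloor, \qquad \mathrm{Sign}(S_k) = \mathrm{Sign}(R_k)$$

 where

 $$U_k = \sum_{i=1}^{6} u_i 2^{-i} = \frac{1}{32}, \quad \text{i.e.,} \ u_5 = 1, \ u_i = 0 \ \text{for } i \ne 5 \qquad (2.22)$$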
 
 We now summarize the multiplicative normalization of a given number
 $X_0 = -x_0 + \sum_{i=1}^{4m} x_i 2^{-i}$ in the form of the following algorithm.

 Algorithm N (Multiplicative Normalization): (2.23)

 Step N1. [Initialize] k ← 0;
     S_0 ← 1 if 1/2 ≤ X_0 < 5/8;
     S_0 ← 0 if 5/8 ≤ X_0 < 1;
     R_1 ← X_0(1 + S_0) - 1;

 Step N2. [Loop] for k < m perform:
     k ← k + 1;
     S_k ← ⌊(T_k + U_k)·16⌋; Sign S_k ← Sign R_k;
     if k < (m+3)/2 then:
         R_{k+1} ← 16R_k + S_k + 16^{-k+1} S_k R_k;
     else:
         R_{k+1} ← 16R_k + S_k;

 where ⌊Y⌋ denotes the largest integer not larger than Y; T_k and U_k are
 defined according to (2.15) and (2.16), with the bits u_i of U_k given by (2.20)
 for k = 1, by (2.21) for k = 2 and by (2.22) for k ≥ 3. In combined form,
 u_1 = u_2 = 0 and each of u_3, ..., u_6 is a Boolean sum of the corresponding
 step-dependent terms gated by K_1, K_2 and K_3, which stand for k = 1, k = 2
 and k ≥ 3, respectively.
 
 This algorithm will normalize a given number X_0 to one in (m+1)
 steps with the error bound

 $$|X_{m+1} - 1| \le \frac{2}{3} \cdot 16^{-m} \qquad (2.24)$$

 and simultaneously generate the digits of the continued product representation
 of 1/X_0. The method of multiplicative normalization is convergent by
 definition of the procedure, and the existence of such a procedure has been
 shown by construction of the selection rules. In deriving those rules, the
 aim was to achieve sufficient simplicity, not necessarily optimality. The
 implementation aspects will be discussed later. Of the algorithms considered
 here, division and (natural) logarithm are based upon multiplicative
 normalization and will be described in that order.
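To see Algorithm N converge, the following Python sketch runs it on exact fractions. It is an idealized model, not the hardware procedure: for k = 1 and k = 2 it picks the digit that minimizes |R_{k+1}| (a hypothetical stand-in for the rounding rules of Tables 2.3 and 2.4), for k ≥ 3 it rounds as in (2.10), and it always uses the full recursion (2.8) instead of the shortcut (2.11). It also tracks X_{k+1} to confirm that R_{k+1} stays equal to 16^k(X_{k+1} - 1) and that the final error respects (2.24).

```python
import math
from fractions import Fraction as F

def next_remainder(r, s, k):
    """Recursion (2.8)."""
    return 16 * r + s + F(16) ** (1 - k) * s * r

def normalize(x0, m=12):
    """Idealized Algorithm N; returns (digits S_0..S_m, X_{m+1})."""
    s0 = 1 if x0 < F(5, 8) else 0                # step N1
    x = x0 * (1 + s0)
    r = x - 1                                    # R_1 = X_1 - 1
    digits = [s0]
    for k in range(1, m + 1):                    # step N2
        if k <= 2:
            # Hypothetical stand-in for Tables 2.3 and 2.4.
            s = min(range(-10, 11), key=lambda d: abs(next_remainder(r, d, k)))
        else:
            s = -math.floor(16 * r + F(1, 2))    # rounding, cf. (2.10)
        assert -10 <= s <= 10
        r = next_remainder(r, s, k)
        x *= 1 + F(s, 16 ** k)                   # X_{k+1} = X_k (1 + S_k 16^-k)
        assert r == F(16) ** k * (x - 1)         # R_{k+1} still matches (2.7)
        digits.append(s)
    return digits, x

m = 12
digits, x_final = normalize(F(3, 5), m)
bound = F(2, 3) * F(16) ** (-m)                  # error bound (2.24)
print(digits)
print(abs(x_final - 1) <= bound)
```

In this model the remainders settle well inside 2/3 after step 2, so the rounded digits for k ≥ 3 never leave the set (2.5).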
 
 
 3. DIVISION

 As stated before, the proposed algorithms apply to floating point
 numbers with a binary radix of the exponent. The radix 16, or in other words
 "4 bits at a time", is used to speed up operations on fractional parts.
 Since, in general, exponent manipulation presents no problems, we
 will not be concerned with the exponent arithmetic here.

 Let $Y_0, X_0 \in [1/2, 1)$ be the fractional parts of the given floating
 point dividend and divisor, respectively. Then, by multiplying both the
 dividend and the divisor by the same sequence of factors M_i,

 $$\frac{Y_0}{X_0} = \frac{Y_0 \prod_{i=0}^{m} M_i}{X_0 \prod_{i=0}^{m} M_i} \qquad (3.1)$$

 if $X_0 \prod M_i \to 1$, the fractional part Q of the quotient is obtained as $Y_0 \prod M_i$.
 Defining the factors M_k to be of the form

 $$M_k = 1 + S_k \cdot 16^{-k}, \qquad S_k \in \{\bar{10}, \ldots, 10\} \qquad (3.2)$$

 one can determine the constants S_k through multiplicative normalization
 (Algorithm N). To form the desired quotient Q, let Q_0 = Y_0 and define
 recursively the partial result as

 $$Q_{k+1} = Q_k \cdot M_k = Q_k(1 + S_k \cdot 16^{-k}), \qquad 0 \le k \le m \qquad (3.3)$$
 
 
 Then Q = Q_{m+1} has m correct digits, the error bound in the normalized
 divisor being $|X_{m+1} - 1| \le \frac{2}{3} \cdot 16^{-m}$.

 In presenting the algorithm for division, as well as for the other
 operations, it is assumed that normalization and result evaluation
 are carried out in two similar arithmetic units. Later, when discussing
 implementation aspects, it will be shown how the proposed algorithms can
 be realized with essentially one arithmetic unit, with a tolerable decrease
 in performance.
 
 Algorithm D (Division)

 (AU1: Normalization)            (AU2: Result Evaluation)

 Step D1. [Initialize] k ← 0;
     Step N1 (Algorithm N);        Q_0 ← Y_0;
 Step D2. [Loop] for k < m perform:
     Step N2;                      Q_{k+1} ← Q_k + S_k·16^{-k}·Q_k;
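A minimal sketch of Algorithm D on exact fractions follows. The hypothetical helper `normalization_digits` repeats an idealized normalization (steps 1 and 2 minimize |R_{k+1}| instead of using Tables 2.3 and 2.4), and `divide` applies the result-evaluation recursion (3.3) to the dividend, starting from Q_0 = Y_0 and including the factor for S_0. Because Q_{m+1} = (Y_0/X_0)·X_{m+1}, the relative error of the quotient is bounded by the normalization error (2.24).

```python
import math
from fractions import Fraction as F

def next_remainder(r, s, k):
    """Recursion (2.8)."""
    return 16 * r + s + F(16) ** (1 - k) * s * r

def normalization_digits(x0, m=12):
    """Digits S_0..S_m of an idealized Algorithm N (hypothetical helper)."""
    s = 1 if x0 < F(5, 8) else 0
    digits, r = [s], x0 * (1 + s) - 1
    for k in range(1, m + 1):
        if k <= 2:
            s = min(range(-10, 11), key=lambda d: abs(next_remainder(r, d, k)))
        else:
            s = -math.floor(16 * r + F(1, 2))
        r = next_remainder(r, s, k)
        digits.append(s)
    return digits

def divide(y0, x0, m=12):
    """Result evaluation (3.3): Q_{k+1} = Q_k + S_k 16^-k Q_k, with Q_0 = Y_0."""
    q = y0
    for k, s in enumerate(normalization_digits(x0, m)):
        q += F(s, 16 ** k) * q       # in hardware: select, shift and add
    return q

y0, x0 = F(7, 10), F(9, 16)
q = divide(y0, x0)
print(float(q), float(y0 / x0))
```

In the two-unit organization the digit loop and the quotient loop run in lock step; here they are sequential only for simplicity.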
 
 An example of division is given in Figure 3-1. The "predictability"
 feature described before (2.11) is apparent at step k = 7: the first five
 digits of R_8, when recoded, are the next five constants S_8, ..., S_12.
 o 12 
 
 In Figure 3-2, the basic hardware configuration, consisting of two 
 arithmetic units, is shown. The control part is not described. The only 
 difference between the two arithmetic units is that AU1 has the additional 
 network TU and the five-bit register S, used in the selection process. 
 
 Figure 3-1 (division example)
 
 Figure 3-2

 AU1: $R_{k+1} = 16 R_k + S_k + 16^{-k+1} S_k R_k$;
 AU2: $Q_{k+1} = Q_k + 16^{-k} S_k Q_k$.
 In each unit, a shifting network controlled by k forms the shifted operand
 ($16^{-k+1} R_k$ or $16^{-k} Q_k$), two select-complement networks controlled by S_k
 supply its multiples (0; ±1; ±2) and (0; ±4; ±8), and two cascaded adders form the result.
 
 
 4. NATURAL LOGARITHM

 Let $X = X_0 \cdot 2^{E_x}$ be a given floating point number with fractional
 part $X_0 \in [1/2, 1)$ and exponent E_x. In formulating the algorithm in
 radix 16, we follow the same approach as for the radix 2 case given in
 [1]. To obtain ln X, the problem is split into two parts, namely

 $$\ln X = \ln X_0 + E_x \ln 2 \qquad (4.1)$$

 The algorithm for calculation of the first term, ln X_0, is derived from the
 identity

 $$X_0 = X_0 \prod_{i=0}^{m} M_i \Big/ \prod_{i=0}^{m} M_i \qquad (4.2)$$

 where the multipliers are $M_k = 1 + S_k \cdot 16^{-k}$ and the constants $S_k \in \{\bar{10}, \ldots, 0, \ldots, 10\}$,
 as defined before for the multiplicative normalization algorithm. Now,

 $$\ln X_0 = \ln\Big(X_0 \prod_{i=0}^{m} M_i\Big) - \ln\Big(\prod_{i=0}^{m} M_i\Big) \qquad (4.3)$$

 From the normalization Algorithm N (2.23) we know that the error
 in normalization is bounded by

 $$|X_{m+1} - 1| \le \frac{2}{3} \cdot 16^{-m}$$

 Therefore, $\ln(X_0 \prod M_i) = 0$ to m digits of precision, and ln X_0 reduces to

 $$\ln X_0 = -\ln\Big(\prod_{i=0}^{m} M_i\Big) = \sum_{i=0}^{m} \big[-\ln(1 + S_i \cdot 16^{-i})\big] \qquad (4.4)$$
 
 
 To obtain ln X_0, one needs to perform the summation of m + 1 constants,
 one from each of m + 1 sets of precomputed constants of the form

 $$-\ln(1 + S_k \cdot 16^{-k})$$

 stored in a fast read-only memory (ROM).
 
 The stored constants are retrieved from the ROM using the S_k's obtained
 in normalizing X_0. This procedure is equally applicable to a logarithm of
 any base; only the set of stored constants needs to be precomputed in the
 corresponding base. The summation (4.4) is performed recursively:

 $$L_{k+1} = L_k + \big[-\ln(1 + S_k \cdot 16^{-k})\big], \qquad k = 0, \ldots, m, \qquad \text{where } L_0 = 0 \qquad (4.5)$$
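The recursion (4.5) can be sketched in Python as follows. Here `math.log1p` stands in for the ROM of precomputed constants, the digits come from an idealized normalization (a hypothetical helper that minimizes |R_{k+1}| for k = 1, 2 instead of using the table-driven rules), and the exponent term of (4.1) is added with an ordinary multiplication. Double precision limits the attainable accuracy, so the comparison with `math.log` holds only to roughly 13 decimal digits.

```python
import math
from fractions import Fraction as F

def next_remainder(r, s, k):
    """Recursion (2.8)."""
    return 16 * r + s + F(16) ** (1 - k) * s * r

def normalization_digits(x0, m=12):
    """Digits S_0..S_m of an idealized Algorithm N (hypothetical helper)."""
    s = 1 if x0 < F(5, 8) else 0
    digits, r = [s], x0 * (1 + s) - 1
    for k in range(1, m + 1):
        if k <= 2:
            s = min(range(-10, 11), key=lambda d: abs(next_remainder(r, d, k)))
        else:
            s = -math.floor(16 * r + F(1, 2))
        r = next_remainder(r, s, k)
        digits.append(s)
    return digits

def ln_fraction(x0, m=12):
    """Recursion (4.5): L_{k+1} = L_k - ln(1 + S_k 16^-k), L_0 = 0; returns L_{m+1}."""
    total = 0.0
    for k, s in enumerate(normalization_digits(x0, m)):
        total -= math.log1p(s / 16 ** k)     # stand-in for a ROM lookup
    return total

def ln(x0, e_x, m=12):
    """(4.1): ln X = ln X_0 + E_x ln 2 for X = X_0 * 2**E_x."""
    return ln_fraction(x0, m) + e_x * math.log(2)

print(ln_fraction(F(3, 4)), math.log(0.75))
print(ln(F(3, 4), 5), math.log(24.0))
```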
 
 Then $L_{m+1} = \ln X_0$. It should be noted that this result may not have an
 accuracy of m digits, because the stored constants cannot be exactly
 represented with m digits and errors accumulate during the m + 1 additions.
 If the result ln X_0 is to be correct to m digits (4m bits), i.e., with an error
 less than $\frac{1}{2} \cdot 2^{-4m}$, then the precision of the second arithmetic unit should
 be extended by

 $$\Delta m = \left\lceil \frac{\ln(m+1)}{\ln 2} \right\rceil$$

 bits, where ⌈X⌉ is the smallest integer not smaller than X. For m = 12, this
 amounts to an extension of 4 bits. If the algorithm is performed in radix 2,
 to retain the same error bound, the extension will be 6 bits.
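The guard-bit count follows from a simple argument: if each of n stored constants carries a rounding error of at most half a unit in the last extended place, the accumulated error stays below (1/2)·2^{-4m} once the extension Δ satisfies 2^Δ ≥ n, i.e., Δ = ⌈log₂ n⌉. The sketch evaluates this for n = m + 1 = 13 (radix 16) and, assuming the radix 2 version performs 4m + 1 = 49 additions, for radix 2.

```python
import math

def extra_bits(num_additions):
    """Guard bits so that num_additions half-ulp errors stay below half an ulp of 4m bits."""
    return math.ceil(math.log2(num_additions))

m = 12                            # radix 16 digits, i.e. 4m = 48 bits
radix16 = extra_bits(m + 1)       # m + 1 ROM constants are summed
radix2 = extra_bits(4 * m + 1)    # assumed 4m + 1 additions in radix 2
print(radix16, radix2)
```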
 
 The calculation of the second term, E_x ln 2, can be performed using
 conventional multiplication, since E_x is always of short precision compared
 to X_0. The constant ln 2 can be stored in a ROM. As an alternative, it
 might be convenient to implement E_x ln 2 using a multiplication algorithm
 based on continued sums, as described in Chapter 6. Assuming that the
 length of the exponent E_x is 8 bits, such a solution will increase the total
 time by 3 basic cycles but saves the hardware of an extra low
 precision multiplier. Namely, after ln X_0 has been computed in the second
 arithmetic unit (of Figure 4-2), this result is taken as the first partial
 product. Then the first arithmetic unit is initialized to E_x, the step
 counter is set to zero, and the multiplication E_x ln 2 is performed using
 recursion (6.2):

 $$L_{k+1} = L_k + (\ln 2) S_k \cdot 16^{-k}, \qquad k = 0, 1, 2$$

 where $L_0 = L_{m+1}$ from (4.5), and S_0, S_1, S_2 are obtained through the additive
 normalization algorithm. Therefore, the last partial product, L_3, will represent
 the final result ln X. This approach assimilates the extra add step indicated in (4.1).
 For higher radices the set of constants S_k is enlarged, and consequently the capacity of the ROM must be increased. As in the radix 2 case [1], there is no need to store all of the constants $-\ln(1 + S_k 16^{-k})$. The possible reduction can be shown using the power series expansion of the logarithm.
 
 The constants to be stored are of the form

$$\ln(1 + S_k 16^{-k}) = \ln(1 + a), \qquad S_k \in \{-10, \ldots, 0, \ldots, 10\}$$

 Then

$$\ln(1 + a) = a - \tfrac{1}{2} a^2 + \text{H.O.T.}, \qquad -1 < a < 1$$

 For

$$k \ge k_1 = \left\lceil \frac{2 \log_2 10 - 1 + 4m}{8} \right\rceil \qquad (4.6)$$

 and m-digit accuracy,

$$\ln(1 + S_k 16^{-k}) = S_k 16^{-k}$$

 since the neglected term, essentially $\tfrac{1}{2} a^2 \le \tfrac{1}{2} \cdot 10^2 \cdot 16^{-2k} = 2^{2 \log_2 10 - 1 - 8k}$, is then smaller than $2^{-4m}$. This eliminates the need for storing more constants. Therefore, for radix 16 the necessary capacity of the ROM is at most 10m + 15 words of m digits (4m bits) each, including the constant ln 2, which is used in the initial step as well as in the evaluation of the term E ln 2.
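 The threshold (4.6) can be checked numerically. The sketch below, an illustration rather than part of the original design, computes k_1 and compares the largest replacement error |ln(1 + S_k 16^{-k}) − S_k 16^{-k}| over S_k ∈ {−10, ..., 10} with the target 2^{-4m}, which we assume is the intended meaning of m-digit accuracy. The function names are hypothetical.

```python
import math


def k1_threshold(m: int) -> int:
    """k_1 from (4.6): from this step on, ln(1 + S 16^-k) is replaced by S 16^-k."""
    return math.ceil((2 * math.log2(10) - 1 + 4 * m) / 8)


def max_replacement_error(k: int) -> float:
    """Largest |ln(1 + S 16^-k) - S 16^-k| over the digit set S in {-10, ..., 10}."""
    return max(abs(math.log1p(s * 16.0**-k) - s * 16.0**-k) for s in range(-10, 11))


m = 12
k1 = k1_threshold(m)
print(k1)                                              # 7
print(max_replacement_error(k1) < 2.0**(-4 * m))       # True
print(max_replacement_error(k1 - 1) < 2.0**(-4 * m))   # False
```

 math.log1p keeps the difference accurate even when S·16^{-k} is tiny, so the comparison reflects the series error rather than floating-point cancellation.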
 
 
 The algorithm for the natural logarithm is given below:

 Algorithm L (Logarithm)                                                   (4.7)

 (AU1: Normalization)                 (AU2: Result Evaluation)

 Step L1. [Initialize] $k \leftarrow 0$;
     AU1: Step N1. (Algorithm N);
     AU2: $L_0 \leftarrow 0$;
 Step L2. [Loop] for $k < m$ perform:
     AU1: Step N2.
     AU2: if $k < k_1$ then: $L_{k+1} \leftarrow L_k - \ln(1 + S_k 16^{-k})$;
          else: $L_{k+1} \leftarrow L_k - S_k 16^{-k}$;
 Step L3. [Form $\ln X_0 + E \ln 2$]
     AU1: Step A1. (Algorithm A, initialized with E);
     AU2: $L_0 \leftarrow L_{m+1}$;
     for $k < 2$ perform:
     AU1: Step A2.
     AU2: $L_{k+1} \leftarrow L_k + (\ln 2) S_k 16^{-k}$;

 where k_1 is given by (4.6), and Algorithm A for additive normalization is given in (5.6). In the example, Figure 4-1, only the calculation of the ln X_0 part is shown. The additional requirement for implementation of the logarithm algorithm is a read-only memory (ROM) connected to the result evaluation unit (Figure 4-2).
 
 [Figure 4-1. Example of the evaluation of ln X_0 by Algorithm L.]

 [Figure 4-2. Implementation of the logarithm algorithm. AU1 (normalization): $R_{k+1} = 16R_k + 16S_k + 16^{-k+1} S_k R_k$. AU2 (result evaluation): $L_{k+1} = L_k - \ln(1 + S_k 16^{-k})$ for $0 \le k < k_1$, with the constants read from the ROM, and $L_{k+1} = L_k - S_k 16^{-k}$ for $k_1 \le k \le m$. Each unit comprises adders, select-complement networks forming the multiples (0; ±4; ±8) and (0; ±1; ±2), and a shifting network.]
 
 5. ADDITIVE NORMALIZATION 
 
 Algorithms for multiplication and the exponential, analogous to those defined for division and the logarithm, can be derived on the basis of continued sums. The step-by-step process in which a given number X_0 ∈ [1/2, 1) is normalized to zero by a proper choice of constants S_i such that

$$X_0 - \sum_{i=0}^{m} S_i 16^{-i} = 0, \qquad S_i \in \{-10, \ldots, 0, \ldots, 10\} \qquad (5.1)$$

 is clearly a right-directed recoding of X_0. It is termed normalization as the additive counterpart of the previously described multiplicative normalization. The digits x_k from the non-redundant digit set {0, ..., 15} are replaced, starting from the most significant, with the digits S_k belonging to the redundant digit set {−10, ..., 0, ..., 10}. This recoding is simple and exact. Namely, for every pair of digits x_k x_{k+1}, if x_{k+1} > 10 (the maximal allowed value for S_{k+1}), then this pair is recoded as S_k S_{k+1}, where S_k = x_k + 1 and S_{k+1} = −(16 − x_{k+1}); otherwise, S_k = x_k. For example, the pair 7, 14 is recoded as 8, −2, since 8·16 − 2 = 7·16 + 14.
 
 To define the selection rules, we proceed as before. First, the scaled remainder is defined as

$$R_k = 16^{k-1} X_k \qquad (5.2)$$

 where $X_k = X_0 - \sum_{i=0}^{k-1} S_i 16^{-i}$. Then

$$R_{k+1} = 16^{k} X_{k+1} = 16^{k} (X_k - S_k 16^{-k}) = 16 R_k - S_k, \qquad 1 \le k \le m \qquad (5.3)$$

 represents the basic recursion. This recursion corresponds exactly to recursion (2.11), except for the sign of S_k. The scaled remainders are bounded, similarly as before:

$$|R_k| < 2/3 \qquad (5.4)$$
 
 and the selection rule is the same for all steps. The rule is simple: S_k equals the scaled remainder R_k rounded to one non-sign digit, the sign being that of R_k. More precisely, the selection rule is:

$$|S_k| = \lfloor (T_k + U_k)\, 16 \rfloor, \quad \operatorname{Sign} S_k = \operatorname{Sign} R_k, \qquad 1 \le k \le m \qquad (5.5)$$

 where T_k is |R_k| truncated to five fractional bits, formed from the two's complement representation $R_k = -r_0 + \sum_{i \ge 1} r_i 2^{-i}$ (for $R_k \ge 0$, $T_k = \sum_{i=1}^{5} r_i 2^{-i}$), and $U_k = 1/32$.

 This choice of conventional rounding, i.e., U_k = 1/32, actually restricts the set of constants S_k to {−8, ..., 0, ..., 8}, since then $|R_{k+1}| = |16R_k - S_k| \le 1/2$ and hence $|S_{k+1}| \le 8$. This choice is preferred because it is the simplest and the restricted set of S_k is sufficient for recoding. The choice U_k = 6/256 would require the full set of constants S_k. Although the selection rule (5.5) holds for the initial step as well, it is more convenient to use the fact that for |X_0| ≥ 1/2, as is the case here, |S_0| can always be taken to be 1. From the definition of the scaled remainder (5.2) it follows that it is preferable always to start with |S_0| = 1. Otherwise, for k = 0, to prevent loss of accuracy, an extension of one radix-16 digit would be necessary, as well as an additional right-shift path needed only in this step. As indicated before, the result of additive normalization is always exact, i.e.,

$$X_0 - \sum_{i=0}^{m} S_i 16^{-i} = 0$$
 
 For reference purposes, the additive normalization is summarized in the form of Algorithm A, given below.

 Algorithm A (Additive Normalization)                                      (5.6)

 Step A1. [Initialize] $k \leftarrow 0$; $S_0 \leftarrow 1$; $R_1 \leftarrow X_0 - S_0$;
 Step A2. [Loop] for $k < m$ perform:
     $k \leftarrow k + 1$;
     $|S_k| \leftarrow \lfloor (T_k + U_k)\, 16 \rfloor$; Sign $S_k \leftarrow$ Sign $R_k$;
     $R_{k+1} \leftarrow 16 R_k - S_k$;

 where $U_k = 1/32$.
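 A minimal sketch of Algorithm A in exact rational arithmetic follows. The helper names are ours, and T_k is modeled as |R_k| truncated to five fractional bits, which is our reading of rule (5.5); the hardware works on two's complement registers instead. The sketch exhibits the two properties claimed above: for k ≥ 1 the digits stay within {−8, ..., 8}, and an X_0 of m radix-16 digits is recoded exactly.

```python
from fractions import Fraction


def select_digit(r: Fraction) -> int:
    """Rule (5.5): |S_k| = floor((T_k + U_k) * 16), Sign S_k = Sign R_k, U_k = 1/32."""
    t = Fraction(int(abs(r) * 32), 32)       # T_k: |R_k| truncated to 5 fractional bits
    mag = int((t + Fraction(1, 32)) * 16)    # int() is floor here, the argument is >= 0
    return mag if r >= 0 else -mag


def additive_normalization(x0: Fraction, m: int) -> list[int]:
    """Algorithm A (5.6): recode X_0 in [1/2, 1) into digits S_0, ..., S_m."""
    digits = [1]                  # Step A1: S_0 <- 1
    r = x0 - 1                    #          R_1 <- X_0 - S_0
    for _ in range(m):            # Step A2: k = 1, ..., m
        s = select_digit(r)
        digits.append(s)
        r = 16 * r - s            # R_{k+1} <- 16 R_k - S_k   (5.3)
    return digits


print(additive_normalization(Fraction(0xB7, 256), 2))   # [1, -5, 7]
```

 In the usage line, 1 − 5/16 + 7/256 = 183/256 = 0xB7/256, so the recoding reproduces X_0 exactly.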
 
 
 6. MULTIPLICATION 
 
 The algorithm for multiplication given here is based on a conventional procedure applied to the radix 16 case. Let the multiplicand and the multiplier be floating-point numbers satisfying the usual requirements, i.e.,

$$Y = Y_0 2^{E_Y}, \qquad Y_0 \in [1/2, 1)$$
$$X = X_0 2^{E_X}, \qquad X_0 \in [1/2, 1)$$

 Again, only the multiplication of the fractional parts is described, omitting the straightforward exponent arithmetic as well as the postnormalization of the result.

 Consider

$$P = Y_0 X_0 = Y_0 \Big[ X_0 - \sum_{i=0}^{m} Z_i + \sum_{i=0}^{m} Z_i \Big]$$

 where m is the number of radix 16 digits in the fractional parts. If the terms $Z_k = S_k 16^{-k}$ are properly chosen, then

$$X_0 - \sum_{i=0}^{m} S_i 16^{-i} = 0$$

 and

$$P = Y_0 \sum_{i=0}^{m} S_i 16^{-i} \qquad (6.1)$$

 where the constants S_i, as before, are from the set {−10, ..., 0, ..., 10}. Applying additive normalization to X_0, the constants S_k can be obtained as described by Algorithm A (5.6).
 
 The summation of the partial products is performed simultaneously in the second arithmetic unit, using the following recursion:

$$P_{k+1} = P_k + Y_0 S_k 16^{-k}, \qquad 0 \le k \le m \qquad (6.2)$$

 where $P_0 = 0$.

 Since the normalization of X_0 is exact, the only error in multiplication comes from the single-precision representation of the result.

 The algorithm for multiplication, compatible with the other proposed algorithms, is given below.
 
 Algorithm M (Multiplication)                                              (6.3)

 (AU1: Normalization)                 (AU2: Result Evaluation)

 Step M1. [Initialize] $k \leftarrow 0$;
     AU1: Step A1 (Algorithm A);
     AU2: $P_0 \leftarrow 0$;
 Step M2. [Loop] for $k < m$ perform:
     AU1: Step A2;
     AU2: $P_{k+1} \leftarrow P_k + Y_0 S_k 16^{-k}$;
 
 An example is shown in Figure 6-1. For implementation, shown in 
 Figure 6-2, an extra register to hold the multiplicand is needed. 
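 The following sketch models Algorithm M with exact fractions, running the recursions of the two arithmetic units in lockstep within one loop. The function names are hypothetical, and the digit selection models T_k as |R_k| truncated to five fractional bits (our reading of rule (5.5)); a hardware implementation would overlap the two units rather than interleave them in software.

```python
from fractions import Fraction


def select_digit(r: Fraction) -> int:
    """Rule (5.5) with U_k = 1/32; T_k modeled as |R_k| truncated to 5 bits."""
    t = Fraction(int(abs(r) * 32), 32)
    mag = int((t + Fraction(1, 32)) * 16)
    return mag if r >= 0 else -mag


def multiply(y0: Fraction, x0: Fraction, m: int) -> Fraction:
    """Algorithm M: AU1 recodes X_0 (Algorithm A), AU2 accumulates (6.2)."""
    s = 1                             # AU1, Step A1: S_0 <- 1
    r = x0 - s                        #               R_1 <- X_0 - S_0
    p = Fraction(0)                   # AU2: P_0 <- 0
    for k in range(m + 1):
        p += y0 * s / 16**k           # AU2: P_{k+1} <- P_k + Y_0 S_k 16^-k
        if k < m:
            s = select_digit(r)       # AU1: S_{k+1} from R_{k+1}
            r = 16 * r - s            # AU1: R_{k+2} <- 16 R_{k+1} - S_{k+1}
    return p


print(multiply(Fraction(3, 4), Fraction(183, 256), 2))   # 549/1024
```

 Because X_0 has only m radix-16 digits here, the recoding is exact and the product equals Y_0 X_0 with no error, matching the remark that only the final result representation introduces error.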
 
 [Figure 6-1. Example of multiplication by Algorithm M.]

 [Figure 6-2. Implementation of multiplication. AU1 (normalization): $R_{k+1} = 16R_k - S_k$. AU2 (result evaluation): $P_{k+1} = P_k + Y_0 S_k 16^{-k}$. Each unit comprises adders, select-complement networks forming the multiples (0; ±4; ±8) and (0; ±1; ±2), and a shifting network.]
 
 7. EXPONENTIAL

 The following manipulation, as defined for the radix 2 case in [1], applies without change to radix 16, producing a convenient form of the exponential $e^X$. In the identity

$$e^X = 2^{X \log_2 e} \qquad (7.1)$$

 let

$$X \log_2 e = I + F$$

 where I and F denote the integer and fractional parts, respectively. Now

$$e^X = e^{I \ln 2}\, e^{F \ln 2} = 2^I e^{F \ln 2} \qquad (7.2)$$

 and, defining X_0 as

$$X_0 = F \ln 2 \qquad (7.3)$$

 we obtain the result in the form

$$Y = Y_0 2^{E_Y} = e^X = 2^I e^{X_0} \qquad (7.4)$$

 Therefore the problem of finding $e^X$ is substantially reduced to the problem of finding $e^{X_0}$, the factor $2^I$ being easily incorporated into the exponent part of the result, E_Y. The argument X is any number such that Y is in the machine range. Therefore |F| < 1 and |X_0| < ln 2, yielding $e^{X_0} \in (1/2, 2)$. It is a simple detail of an actual design to obtain F bounded between −1 and 0, giving $e^{X_0} \in (1/2, 1]$, as usual. In the following discussion, we assume that

$$-1 < F \le 0 \quad \text{or} \quad -\ln 2 < X_0 \le 0 \qquad (7.5)$$

 The described approach requires two extra multiplications before the main algorithm can begin: one multiplication is necessary to determine the terms I and F, and another to obtain X_0.
 
 The algorithm to evaluate $e^{X_0}$ described here is similar to the other algorithms, both in derivation and in structure.

 We start with the identity

$$e^{X_0} = e^{X_0 - \ln\left(\prod_{i=0}^{m} M_i\right) + \ln\left(\prod_{i=0}^{m} M_i\right)} \qquad (7.6)$$

 where $M_k = 1 + S_k 16^{-k}$, $0 \le k \le m$, and $S_k \in \{-10, \ldots, 0, \ldots, 10\}$. Once again, if the constants S_k are selected properly, then

$$X_0 - \ln\Big[\prod_{i=0}^{m} (1 + S_i 16^{-i})\Big] = 0 \qquad (7.7)$$

 and the result is obtained as

$$e^{X_0} = e^{\ln\left(\prod_{i=0}^{m} M_i\right)} = \prod_{i=0}^{m} (1 + S_i 16^{-i}) \qquad (7.8)$$

 i.e., in the form of a continued product. To define the selection procedure, we note first that

$$X_k = X_0 - \sum_{i=0}^{k-1} \ln(1 + S_i 16^{-i}), \qquad 0 \le k \le m \qquad (7.9)$$

 Then

$$X_{k+1} = X_k - \ln(1 + S_k 16^{-k}), \qquad 0 \le k \le m \qquad (7.10)$$

 and the scaled remainders, upon which the selection is performed, can be defined as

$$R_k = 16^{k-1} X_k, \qquad 1 \le k \le m \qquad (7.11)$$

 The basic recursion is

$$R_{k+1} = 16 R_k - 16^{k} \ln(1 + S_k 16^{-k}), \qquad 1 \le k \le m \qquad (7.12)$$

 This recursion shows that, again, precomputed constants of the form $\ln(1 + S_k 16^{-k})$ are necessary. Since these constants are the same ones used in the logarithm evaluation, all remarks about storage requirements and simplification apply here: for $k \ge k_1$, with k_1 given by (4.6), the logarithmic constants can be replaced with $S_k 16^{-k}$, reducing the basic recursion (7.12) to

$$R_{k+1} = 16 R_k - S_k, \qquad k \ge k_1 \qquad (7.13)$$

 The last expression shows that for $k \ge k_1$ the selection process is identical to the one defined by additive normalization (5.6). It is therefore natural to look for selection rules that preserve this similarity with additive normalization for $k < k_1$ as well.
 
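 We now consider the rules for $k < k_1$ in reverse order. First, we recall that the selection rules are determined by choosing appropriate boundaries between the intervals of R_k corresponding to particular values of S_k. Next, we assume that five-bit precision is sufficient, i.e., that the boundaries between intervals can be represented as L/32, L being an integer. To find the interval of R_k corresponding to S_k, the bounds are determined as

$$R_k \in \left( \underline{R}_{k+1} 16^{-1} + 16^{k-1} \ln(1 + S_k 16^{-k}),\; \overline{R}_{k+1} 16^{-1} + 16^{k-1} \ln(1 + S_k 16^{-k}) \right) \qquad (7.14)$$

 where $\underline{R}_{k+1}$ and $\overline{R}_{k+1}$ are the minimal and maximal allowed values of $R_{k+1}$, respectively. From the power series expansion, we have

$$\ln(1 + S_k 16^{-k}) = S_k 16^{-k} + \varepsilon_k \qquad (7.15)$$

 where

$$\varepsilon_k = -\tfrac{1}{2} S_k^2 16^{-2k} + \tfrac{1}{3} S_k^3 16^{-3k} - \cdots \qquad (7.16)$$

 the condition of expansion clearly being satisfied. Let

$$\hat{\varepsilon}_k = 16^{k} \varepsilon_k \qquad (7.17)$$

 then, to select S_k, R_k must be in the interval

$$R_k \in \left( \tfrac{1}{16} (\underline{R}_{k+1} + S_k + \hat{\varepsilon}_k),\; \tfrac{1}{16} (\overline{R}_{k+1} + S_k + \hat{\varepsilon}_k) \right) \qquad (7.18)$$

 or, using the assumption of five-bit precision,

$$R_k \in \left[ \tfrac{a_k}{32}, \tfrac{b_k}{32} \right) \qquad (7.19)$$

 where, for all S_k,

$$a_k = \left\lceil 2(\underline{R}_{k+1} + S_k + \hat{\varepsilon}_k) \right\rceil, \qquad b_k = \left\lfloor 2(\overline{R}_{k+1} + S_k + \hat{\varepsilon}_k) \right\rfloor \qquad (7.20)$$

 where ⌈x⌉ denotes the smallest integer not smaller than x, and ⌊x⌋ denotes the largest integer not larger than x. Since $\underline{R}_k$ and $\overline{R}_k$ have asymptotic limits −2/3 and 2/3, respectively, it follows that if $|\hat{\varepsilon}_k| < 1/6$, then a_k and b_k can be determined, for all S_k, as

$$a_k = \left\lceil 2(\underline{R}_{k+1} + S_k) \right\rceil, \qquad b_k = \left\lfloor 2(\overline{R}_{k+1} + S_k) \right\rfloor \qquad (7.21)$$

 With the limiting values −2/3 and 2/3, both (7.20) and (7.21) then give $a_k = 2S_k - 1$ and $b_k = 2S_k + 1$. Clearly, $|\hat{\varepsilon}_k| < 1/6$ is a sufficient but not a necessary condition.

 If the last expressions for a_k and b_k are valid, then for the selection of S_k it suffices to round R_k to the most significant non-sign (radix 16) digit, i.e., the selection process becomes the same as in additive normalization. Of course, once S_k is obtained, the next remainder is calculated using all terms of the basic recursion (7.12). By calculating $\hat{\varepsilon}_k$, it turns out that for $k \ge 3$, $|\hat{\varepsilon}_k| < 1/6$, and hence we have simple selection rules as before.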
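 The claim about $\hat{\varepsilon}_k$ is easy to reproduce. The sketch below evaluates $\hat{\varepsilon}_k = 16^k[\ln(1 + S_k 16^{-k}) - S_k 16^{-k}]$ from (7.15) and (7.17) directly, rather than through the truncated series (7.16); the function names are hypothetical.

```python
import math


def eps_hat(s: int, k: int) -> float:
    """Scaled expansion error (7.17): 16^k * (ln(1 + S 16^-k) - S 16^-k)."""
    a = s * 16.0**-k
    return 16.0**k * (math.log1p(a) - a)


def max_eps_hat(k: int, digits=range(-10, 11)) -> float:
    """Largest |eps_hat| over a digit set (default: the full set -10..10)."""
    return max(abs(eps_hat(s, k)) for s in digits)


for k in range(2, 6):
    print(k, round(max_eps_hat(k), 4))
```

 At k = 2 the sufficient condition holds for |S_2| ≤ 9 but fails for |S_2| = 10, which is consistent with the separate treatment of step 2 discussed next; from k = 3 on it holds for the whole digit set.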
 
 For k = 2, it can be shown that it is possible to find intervals in R_2 for all S_2 except S_2 = −10 such that if $R_2 \in [(2S_2 - 1)/32, (2S_2 + 1)/32)$, then S_2 is the correct constant. Therefore, if the range of R_2 is restricted so that S_2 = −10 is excluded, rounding can again be used as the selection rule. It has been found that this restriction of the possible range of R_2 affects neither the selection process for k = 1 and k = 0 nor the representation of $e^{X_0}$.

 For k = 1, the intervals are determined, as before, using (7.14), and the results are given in Table 7.1.
 
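 Table 7.1. Selection intervals for k = 1: S_1 is a correct choice when $a_1 \le 32R_1 < b_1$.

      S_1    a_1                       b_1
       10    15                        $32\overline{R}_1$
        9    14                        15
        8    12                        14
        7    11                        12
        6     9                        11
        5     8                        10
        4     6                         8
        3     5                         6
        2     3                         5
        1     1                         3
        0    -1                         1
       -1    -3                        -1
       -2    -5 (-11/2)                -3
       -3    $32\underline{R}_1$       -11/2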
 
 For all other values of S_1, i.e., for {−10, ..., −4}, the intervals are not contiguous, and hence those constants may not be used. In fact, if the possible range of R_2 is not restricted, S_1 = −4 can be included in the set of allowed constants in step 1. The actual selection rules for k = 1 can be specified as in multiplicative normalization: using the conventions described by expressions (2.12)-(2.18), one can determine an additive constant U_1 and obtain S_1 through modified rounding. Another choice, which is given here, is to restrict the range of R_1 so that conventional rounding applies. This restriction should not affect the possibility of representing the required result. From Table 7.1, if −.171 < R_1 < .212, then S_1 ∈ {−2, −1, 0, 1, 2, 3} can be selected by rounding to one non-sign digit, and no special rules, differing from those for k > 1, are necessary.
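 Table 7.1 can be cross-checked against (7.14) with k = 1, which in units of 32R_1 reads $32R_1 \in (2\underline{R}_2 + 32\ln(1 + S_1/16),\ 2\overline{R}_2 + 32\ln(1 + S_1/16))$. The sketch below assumes that R_2 ranges over its asymptotic limits (−2/3, 2/3), so it is only a consistency check, not a reproduction of the original computation; the names are hypothetical.

```python
import math

# Table 7.1: S_1 -> (a_1, b_1), meaning a_1 <= 32 R_1 < b_1.
# Rows bounded by 32*max(R_1) or 32*min(R_1) are checked separately.
TABLE_7_1 = {9: (14, 15), 8: (12, 14), 7: (11, 12), 6: (9, 11), 5: (8, 10),
             4: (6, 8), 3: (5, 6), 2: (3, 5), 1: (1, 3), 0: (-1, 1),
             -1: (-3, -1), -2: (-5.5, -3)}

R2_MIN, R2_MAX = -2 / 3, 2 / 3   # assumption: asymptotic bounds on R_2


def admissible_interval(s: int) -> tuple[float, float]:
    """Open interval for 32*R_1 implied by (7.14) with k = 1."""
    c = 32 * math.log1p(s / 16)
    return 2 * R2_MIN + c, 2 * R2_MAX + c


def table_row_ok(s: int) -> bool:
    """True if the tabulated [a_1, b_1) lies inside the admissible interval."""
    a, b = TABLE_7_1[s]
    lo, hi = admissible_interval(s)
    return lo < a and b <= hi


print(all(table_row_ok(s) for s in TABLE_7_1))   # True
```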
 
 To obtain R_1 in the desired range, the following initialization (step k = 0) can be devised. Since $X_0 \in (-\ln 2, 0]$ by assumption and $R_1 = X_1 = X_0 - \ln M_0$, the rules are:

 Table 7.2. Initialization (step k = 0) of the exponential

      X_0               M_0              ln M_0     R_1
      [-1/8, 0]         1                0          [-1/8, 0]
      [-3/8, -1/8)      $e^{-1/4}$       -1/4       [-1/8, 1/8)
      (-ln 2, -3/8)     $e^{-17/32}$     -17/32     [-.162, .157)

 Since $e^{X_0} \in (1/2, 1]$, it can easily be shown that this choice of initialization, together with the restricted sets of constants in steps 1 and 2, i.e., S_1 ∈ {−2, ..., 3} and S_2 ∈ {−9, ..., 10}, gives the correct continued product representation of the result.
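 The initialization of Table 7.2 can be sketched as a small function that picks ln M_0 from the range of X_0 and returns the resulting R_1. The name `initialize_exp` is hypothetical; the sketch checks that R_1 stays within [−.162, .157) and that $e^{X_0} = M_0 e^{R_1}$.

```python
import math


def initialize_exp(x0: float) -> tuple[float, float, float]:
    """Step k = 0 of the exponential (Table 7.2).

    Returns (M_0, ln M_0, R_1) with R_1 = X_0 - ln M_0; requires -ln 2 < X_0 <= 0.
    """
    if not -math.log(2) < x0 <= 0:
        raise ValueError("X_0 must lie in (-ln 2, 0]")
    if x0 >= -1 / 8:
        ln_m0 = 0.0            # row 1: M_0 = 1
    elif x0 >= -3 / 8:
        ln_m0 = -1 / 4         # row 2: M_0 = e^(-1/4)
    else:
        ln_m0 = -17 / 32       # row 3: M_0 = e^(-17/32)
    return math.exp(ln_m0), ln_m0, x0 - ln_m0


m0, ln_m0, r1 = initialize_exp(-0.5)
print(ln_m0, round(r1, 5))   # -0.53125 0.03125
```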
 
 A summary of the procedure for the evaluation of $e^X$ follows:

 Preparatory Operations P                                                  (7.22)

 Step P1. $N \leftarrow X \log_2 e$;
 Step P2. if $X > 0$ then: $I \leftarrow [N] + 1$;
          else: $I \leftarrow [N]$;
 Step P3. $F \leftarrow N - I$;
          $X_0 \leftarrow F \ln 2$;

 where [N] denotes the integer part of N; after the preparatory operations, X_0 will be in the range (−ln 2, 0], with the corrected integer part I.
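 The preparatory operations translate almost line by line into the sketch below, with [N] taken as truncation toward zero. The function name is hypothetical, and floating-point arithmetic stands in for the two extra multiplications mentioned earlier.

```python
import math


def prepare_exp(x: float) -> tuple[int, float]:
    """Preparatory operations P (7.22): return (I, X_0) with e^X = 2^I * e^X_0."""
    n = x * math.log2(math.e)                    # Step P1: N <- X log2 e
    if x > 0:
        i = math.trunc(n) + 1                    # Step P2: I <- [N] + 1
    else:
        i = math.trunc(n)                        #          I <- [N]
    f = n - i                                    # Step P3: F <- N - I
    return i, f * math.log(2)                    #          X_0 <- F ln 2


i, x0 = prepare_exp(1.0)
print(i, math.ldexp(math.exp(x0), i), math.exp(1.0))
```

 If N happens to be a positive integer, this literal reading gives F = −1, i.e., X_0 = −ln 2, which lies on the boundary of (7.5); a design would have to treat that case explicitly.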
 
 Algorithm E (Exponential)                                                 (7.23)

 (AU1: Normalization)                 (AU2: Result Evaluation)

 Step E1. [Initialize] $k \leftarrow 0$;
     AU1: $R_1 \leftarrow X_0 - \ln M_0$;
     AU2: $E_1 \leftarrow M_0$;
 Step E2. [Loop] for $k < m$ perform:
     AU1: $k \leftarrow k + 1$;
          if $k < k_1$ then:
              $|S_k| \leftarrow \lfloor (T_k + U_k)\, 16 \rfloor$; Sign $S_k \leftarrow$ Sign $R_k$;
              $R_{k+1} \leftarrow 16 R_k - 16^k \ln(1 + S_k 16^{-k})$;
          else:
              Step A2. (Algorithm A);
     AU2: $E_{k+1} \leftarrow E_k + E_k S_k 16^{-k}$;

 where k_1 is defined in (4.6), and $U_k = 1/32$. An example is shown in Figure 7-1. The read-only memory for this algorithm communicates with the normalization unit (Figure 7-2). The remaining configuration is the same as before.
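 The sketch below is a floating-point model of Algorithm E, intended only to show how the recursions (7.12), (7.13) and the continued product fit together. Its simplifications are our own assumptions, not the paper's hardware rules: S_k is 16R_k rounded to the nearest integer (ties away from zero) instead of the five-bit estimate T_k + U_k, and the digit-set restrictions for k = 1, 2 are not enforced. The function name is hypothetical.

```python
import math


def exp_continued_product(x0: float, m: int = 12) -> float:
    """Floating-point model of Algorithm E; requires -ln 2 < X_0 <= 0.

    Assumptions (ours): S_k = 16*R_k rounded to nearest (ties away from zero),
    and no range restrictions on S_1, S_2.
    """
    k1 = math.ceil((2 * math.log2(10) - 1 + 4 * m) / 8)      # threshold (4.6)
    # Step E1, using the initialization of Table 7.2.
    if x0 >= -1 / 8:
        ln_m0 = 0.0
    elif x0 >= -3 / 8:
        ln_m0 = -1 / 4
    else:
        ln_m0 = -17 / 32
    r = x0 - ln_m0               # AU1: R_1 <- X_0 - ln M_0
    e = math.exp(ln_m0)          # AU2: E_1 <- M_0
    # Step E2: k = 1, ..., m.
    for k in range(1, m + 1):
        s = int(math.copysign(math.floor(abs(16 * r) + 0.5), r))
        if k < k1:
            r = 16 * r - 16.0**k * math.log1p(s * 16.0**-k)   # (7.12), ROM constant
        else:
            r = 16 * r - s                                    # (7.13)
        e = e + e * s * 16.0**-k                              # E_{k+1}
    return e


print(exp_continued_product(-0.3), math.exp(-0.3))
```

 With m = 12 the two printed values should agree to roughly twelve significant digits; the residual $16^{-m} R_{m+1}$ and the replacement of the logarithmic constants for k ≥ k_1 are both far below that level.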
 
 [Figure 7-1. Example of the exponential evaluation by Algorithm E.]

 [Figure 7-2. Implementation of the exponential. AU1 (normalization): $R_{k+1} = 16R_k - 16^k C_k$ with $C_k = \ln(1 + S_k 16^{-k})$ read from the ROM for $1 \le k < k_1$, and $R_{k+1} = 16R_k - S_k$ for $k_1 \le k \le m$. AU2 (result evaluation): $E_{k+1} = E_k + E_k S_k 16^{-k}$. Each unit comprises adders, select-complement networks forming the multiples (0; ±4; ±8) and (0; ±1; ±2), and a shifting network.]
 
 8. IMPLEMENTATION

 One of the basic features relevant to an efficient realization of the previously described algorithms is that the original operation is replaced by two much simpler processes of limited dependency. Through one process, the normalization, a sequence of constants S_k, the digits of the continued products (sums), is generated. The other process, the result evaluation, produces the final result using the constants S_k. Both processes are defined recursively, requiring only simple hardware operations: addition, shift and multiple formation.
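 The multiple formation can be illustrated with the operand sets labeled in Figures 4-2, 6-2 and 7-2, (0; ±4; ±8) and (0; ±1; ±2). Assuming, as those labels suggest, that each digit S_k is split into one multiple from each set, with the two select-complement networks feeding the adders, the sketch below checks that every S_k in {−10, ..., 10} has such a split. The names are hypothetical.

```python
A_MULTIPLES = (-8, -4, 0, 4, 8)   # outputs of one select-complement network
B_MULTIPLES = (-2, -1, 0, 1, 2)   # outputs of the other network


def split_digit(s: int) -> tuple[int, int]:
    """Return (a, b) with a + b = s, a from A_MULTIPLES and b from B_MULTIPLES.

    Some digits have two splits (e.g. 6 = 4 + 2 = 8 - 2); the first one found
    in the order of A_MULTIPLES is returned.
    """
    for a in A_MULTIPLES:
        if s - a in B_MULTIPLES:
            return a, s - a
    raise ValueError(f"digit {s} is outside {{-10, ..., 10}}")


print([split_digit(s) for s in (-10, 3, 10)])   # [(-8, -2), (4, -1), (8, 2)]
```

 Each multiple needs only a shift and an optional complement, which is why two small select-complement networks and an adder structure suffice for the full digit set.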
 
 A realization providing one separate arithmetic unit for each process clearly offers the fastest solution and the simplest control requirements. Since both units are identical as far as the main configuration is concerned, replication using an advanced technology should make this cost acceptable. If speed is not of primary importance, one arithmetic unit can be used for both processes, performed in series. For simplicity, the processes should use the arithmetic unit alternately, so that only the current value of S_k needs to be available. A "pipelining" of the processes through one adder and shifting network, described at the end of this chapter, can achieve operation only 15-25% slower than the double arithmetic unit realization while using essentially one arithmetic unit.
 
 As mentioned in the Introduction, this investigation of the use of radix 16 in the implementation of the described algorithms has been motivated by a possible speed improvement over the radix 2 approach and by some trade-offs in hardware requirements. In the following comparisons of the radix 2 [1] and radix 16 solutions, a two-arithmetic-unit realization is assumed in both cases. Furthermore, since no actual design has been done for either approach, the given comparisons are approximate in nature and restricted to the size-dominant parts. Control is assumed to be synchronous, each recursive step being performed in one basic cycle. The main parts of the arithmetic units used in the comparisons are as follows.
 
a) The adder structure with the multiple formation networks used in the
radix 16 case is estimated to be twice as complex as the corresponding
part in the radix 2 case. Namely, the former requires two adders and two
(1 out of 2) select-and-complement networks, while the latter requires
one adder with one select-complement network. The speed of addition in
the radix 16 case will be only slightly decreased if both adders are
unified into one three-input adder. If the add time of a two-input adder
(the radix 2 case) is $t_{a2}$, we estimate that $t_{a16} < 1.2\,t_{a2}$
for sufficiently large m.
 
b) The shifting network, required to shift right or left by k digits for
$0 \le k \le m - 1$, is simpler for a higher radix. We assume that the fast
shifting network is realized using a "barrel switch" technique [8].
Namely, shifting is performed in two or more levels so that the
combination of level shifts corresponds to the required total shift.
Besides being fast, this technique ensures low loading requirements, and
shifting in both directions can use the same paths: the shift count is
represented in two's complement, a negative number specifying a left
shift. Easily implemented in integrated circuit technology thanks to its
regular and simple structure, the barrel switch can also serve as a
standard block in other operations, e.g., shifting, normalization, etc.
Because of an additional level, the shifting network in the radix 2 case
is estimated to require 30% to 50% more hardware than in radix 16 for
m = 48 to 64 bits. For example, if m = 48, level 1 may provide
displacements of 0, 16 and 32 positions and level 2 displacements of 0, 4,
8 and 12 positions, while in the radix 2 case an additional level 3 would
be necessary, with displacements of 0, 1, 2 and 3 positions. Speedwise,
then, $t_{s2} > 1.3\,t_{s16}$, where $t_s$ denotes the shifting delay.
 
c) The selection procedure in the radix 2 case requires implementation of
a simple 4-bit comparison. For the radix 16 approach, the required
precision for selection is 7 bits, and the five Boolean equations (2.23),
costing fewer than 40 literals, are to be implemented. As described
before, the selection is performed using rounding, so additional inputs to
the 7 most significant positions of the adder should be provided, as well
as the 5-bit register S to store the current value of the constant $S_k$.
Even with these requirements, the selection hardware is small compared
with the rest of the unit. In the radix 2 case this is even more true, so
the selection hardware requirements are neglected in both cases.
 
d) For m bits of precision, the number of precomputed logarithmic
constants stored in the read-only memory (ROM) is about m in the radix 2
case and about 3m in the radix 16 case.

e) The control part, which also includes the step counter (two bits
shorter in the radix 16 case), is not considered, since it is highly
dependent on a particular realization technique.
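The level decomposition described in item b) can be sketched in Python. The sketch assumes a 48-bit word and the displacement sets from that example. It models only the choice of one displacement per level and the sign convention for the shift count, not the switch-level structure of [8]. The names are hypothetical.

```python
# Level displacements in bit positions for a 48-bit word (example in item b).
RADIX16_LEVELS = [(0, 16, 32), (0, 4, 8, 12)]     # shifts are whole hex digits
RADIX2_LEVELS = RADIX16_LEVELS + [(0, 1, 2, 3)]   # extra level for single bits


def decompose(count, levels):
    """Choose one displacement per level so that they sum to |count|."""
    remaining = abs(count)
    picks = []
    for options in levels:
        d = max(o for o in options if o <= remaining)  # largest step not overshooting
        picks.append(d)
        remaining -= d
    if remaining:
        raise ValueError(f"shift of {count} not reachable with these levels")
    return picks


def barrel_shift(word, count, width=48, levels=RADIX2_LEVELS):
    """Shift an unsigned width-bit word through the levels.

    count > 0 shifts right and count < 0 shifts left, mimicking a
    two's-complement shift count; bits moved out of the word are lost.
    """
    mask = (1 << width) - 1
    for d in decompose(count, levels):
        word = word >> d if count >= 0 else (word << d) & mask
    return word
```

With the radix 16 levels, any shift by a whole number of hex digits up to 11 (44 bits) takes two level passes. The radix 2 network needs the third level to reach arbitrary bit positions, which is the source of the extra hardware and delay estimated in item b).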

From the above considerations, the hardware requirement ratio for the
radix 2 and radix 16 cases is approximately 2:3. The basic cycle can be
taken to be the same, since the add time dominates the control, selection
and shifting times. The ROM capacity requirement ratio is about 1:3 in
favor of the binary case.
 
Let the performance of an implementation in radix r be
$P_r = \log_2 r / T_r$, where $T_r$ is the total delay necessary to evaluate
$\log_2 r$ bits of the result, as defined in [4]. $T_r$ is equivalent to the
basic cycle. In the radix 2 case, the probability of $S_k = 0$ ($p = 2/3$) is
exploited by providing an adder bypass, which reduces the number of full
basic cycles to m/3 on the average, where m is the number of bits. Since the
radix 2 unit thus averages one third of a basic cycle per result bit, the
radix 16 basic cycle can be taken as $T_{16} = 3T_2$; the number of basic
cycles in the radix 16 case is always the same, the probability of
$S_k = 0$ being too low to exploit. The ratio of performances is then
$P_{16}/P_2 = 4/3$ on the average. If the efficiency of an implementation is
defined as the ratio between performance and cost per bit, then, with all
the previous assumptions, $E_{16}/E_2 \approx 1$ without considering ROM
requirements. If the ROM capacity requirement is taken into account, the
radix 2 approach offers a more efficient design, but the radix 16 case
maintains better performance with a shorter execution time. The selection
procedure for radix 16 has been shown to be sufficiently simple.
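The arithmetic behind $P_{16}/P_2$ and $E_{16}/E_2$ can be checked in a few lines of Python. The parameters are the text's own assumptions: zero-digit probability 2/3 with adder bypass in radix 2, equal basic cycles, a hardware ratio of 2:3, and ROM ignored. The function name is hypothetical.

```python
from fractions import Fraction


def radix16_vs_radix2(p_zero_radix2=Fraction(2, 3), cost_ratio=Fraction(3, 2)):
    """Return (P16/P2, E16/E2) under the assumptions stated in the text.

    Time is measured in units of the common basic cycle. With the adder
    bypass, radix 2 spends on average (1 - p) basic cycles per result bit;
    radix 16 always spends one basic cycle per 4-bit digit.
    cost_ratio is radix 16 hardware cost over radix 2 cost (2:3 inverted).
    """
    t2 = 1 - p_zero_radix2   # effective T_2 = 1/3 cycle, hence T_16 = 3 * T_2
    t16 = Fraction(1)
    perf2 = 1 / t2           # P_2 = log2(2) / T_2
    perf16 = 4 / t16         # P_16 = log2(16) / T_16
    perf_ratio = perf16 / perf2
    efficiency_ratio = perf_ratio / cost_ratio  # E = performance / cost, ROM ignored
    return perf_ratio, efficiency_ratio
```

With the defaults, the function returns (4/3, 8/9). This means radix 16 is about a third faster, and 8/9 is the value the text rounds to $E_{16}/E_2 \approx 1$.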

Even with the available efficient technological solutions, the use of two
arithmetic units may be objectionable. Since both the normalization process
and the result evaluation process have addition as the basic operation, a
"pipelined" use of the same adder is possible, provided proper latching of
the operands and the results is made. One way to achieve this is shown in
Figure 8-1. The adder with multiple formation networks is split into two
equal parts, AS" and AS', by breaking the carry path and inserting a one-bit
carry register C. Outputs from the left half SN" of the shifting network are
saved in a latch L. The selection is carried out in block S on the basis of
the adder outputs and returns the value of $S_k$. The initial operands are in
register B for normalization and in register A for result evaluation. Each
register contains two separately controlled halves, B", B' and A", A'. The
outputs from AS" and AS' are connected, under separate control, to the
inputs of the A" and A' registers, respectively. A separate path a from A"
to SN" must be provided. The operation of this scheme is described for the
division algorithm with the help of Figure 8-2; initial control details are
omitted.
 
The normalization process requires realization of the recursion

$R_{k+1} = 16R_k + S_k + 16^{-k+1} S_k R_k$,

while the result is evaluated as

$Q_{k+1} = Q_k + 16^{-k} S_k Q_k$.

Since the operand $16R_k$ in the first equation corresponds to the operand
$Q_k$ in the second equation, an additional path b from B to AS must be
provided, as well as one 4-bit register, not shown in the scheme, to save
the most significant digit of the left half of R.
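The two recursions can be exercised with a minimal Python sketch that uses exact rational arithmetic. Several assumptions here are not stated in this chapter. The first is the scaling $R_k = 16^{k-1}(X_k - 1)$, under which the normalization recursion above follows from $X_{k+1} = X_k(1 + S_k 16^{-k})$. The second is the digit set {-10, ..., 10}. The third is a selection that simply rounds $-16R_k$, used in place of the 7-bit selection rules. The fourth is a starting point at k = 3 with X already close to 1, because the special starting steps are not modeled. Since $X \prod (1 + S_k 16^{-k}) \to 1$, the same product applied to Y gives $Q \to Y/X$. The names `select` and `divide` are hypothetical.

```python
from fractions import Fraction

DIGITS = range(-10, 11)  # assumed radix-16 digit set {-10, ..., 10}


def select(r):
    """Simplified selection: round -16*R_k (stand-in for the 7-bit rules)."""
    s = -round(16 * r)
    if s not in DIGITS:
        raise ValueError("R_k left the region where this simple rule converges")
    return s


def divide(y, x, k0=3, n=12):
    """Approximate y/x by the coupled recursions for steps k = k0..n.

    Assumes the starting steps are already done: |16**(k0-1) * (x - 1)| <= 21/32.
    """
    if k0 < 3:
        raise ValueError("this simple rule is only safe from step 3 on")
    x, q = Fraction(x), Fraction(y)
    r = Fraction(16) ** (k0 - 1) * (x - 1)      # assumed scaling R_k = 16**(k-1) (X_k - 1)
    if abs(r) > Fraction(21, 32):
        raise ValueError("x is not close enough to 1 for the assumed starting point")
    for k in range(k0, n + 1):
        s = select(r)
        # normalization: R_{k+1} = 16 R_k + S_k + 16**(-k+1) S_k R_k (old r on the right)
        r = 16 * r + s + s * r / Fraction(16) ** (k - 1)
        # result evaluation: Q_{k+1} = Q_k + 16**(-k) S_k Q_k
        q = q + s * q / Fraction(16) ** k
    return q
```

Because $X_{n+1} - 1 = 16^{-n} R_{n+1}$ and $|R_{n+1}|$ stays below 21/32 from step 3 on, the result satisfies $|Q - Y/X| \le |Y/X| \cdot 16^{-n} \cdot 21/32$.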
 
The operation begins with the divisor X in register B and the dividend in
register A. Corresponding to the scheme, the superscripts ' and " denote the
right and the left half of each result. The basic cycle now contains two
periods, each terminated by a clock pulse. A period lasts approximately one
half of the full-length addition time. The registers are assumed to be of
the master-slave type. In the first period, R' is obtained; then,
simultaneously, Q' is transferred from A' to B', R' from AS' to A', and the
generated carry bit is saved in C. During this period, operation on the left
half is inhibited. In the second period, the right half evaluates Q' while
the left half finishes the calculation of R by obtaining R". The latch L and
path a provide for correct sharing of the shifting network. Once R" is
obtained, $S_k$ can be determined, the register transfer, now on both halves,
is done, and the process repeats. The shift count is changed when Q" is
obtained, matching the requirements of both recursions. Path b is selected
whenever R is being calculated. After 2m + 3 periods, where m is the number
of radix 16 digits, the result $Q = Q_{m+1}$ is obtained in register A. It is
reasonable to estimate that a period will take 0.55 to 0.6 of the basic
cycle, so that the total time of operation in the pipelined mode will be
increased by 15% to 20%. Since the major blocks, the adder with multiple
formation networks and the shifting network, are reduced from two to one,
and since the new data paths, latches and extra control are still
significantly less complex than the major blocks, this solution may be
optimal.

Figure 8-1 (block diagram of the pipelined scheme; labels: shift count, AS", SN", AS', SN', B", B', A')

Figure 8-2 (operation of the pipelined scheme for division)
 
Without going into a detailed description, we mention that a pipelined
implementation is also possible in radix 2. With the assumption that
$(t_{shift} + t_{select}) \ll t_{add}$, the total number of full cycles will
be about M/3 on the average, where M is the number of bits, if the adder is
bypassed whenever $S_k = 0$. In a pipelined version analogous to the one
described above for radix 16, after $S_k$ is determined and before the next
period starts, the full scaled remainder is in register A. If the next $S_k$
is zero, the transfers between registers are inhibited and only a one-bit
left shift of register A is performed, making selection of the next $S_k$
possible. When $S_k$ becomes non-zero, normal operation is resumed.
Therefore, the average number of full cycles can be preserved in a
pipelined version.
 
 
 9. CONCLUSIONS 
 
A radix 16 approach to implementing the algorithms based on continued
products (sums), as proposed by DeLugish [1] for the binary case, has been
studied. These algorithms, in general, offer simplicity in mechanization and
uniformity in hardware requirements for a wide class of elementary
functions. The use of a higher radix makes the execution of the algorithms
faster, but additional complexity, both in the selection procedures and in
the hardware, must be considered. For the radix 16 case, it has been found
that the selection rules remain relatively simple: the starting difficulties
disappear after the first three steps, after which the rules are very
simple. Furthermore, after the initial steps, an increasing number of
constants $S_k$ is simultaneously available at each successive step. This
property can be used to simplify normalization or to define a variable radix
method, provided fast and cheap multi-input adder arrays are available. Such
a method would restore the fast convergence of the algorithms, which is, in
some sense, lost by specifying the algorithms in a step-by-step mode using a
fixed radix. Hardware requirements for implementing algorithms based on
continued products (sums) are, even for radix 2, greater than those of
conventional arithmetic units, but still not prohibitive. A fast variable
shift network, not commonly found in conventional arithmetic units, is an
essential part of the proposed algorithms, but it can be used
advantageously in many other operations, such as floating point
normalization, conversion between floating point and fixed point number
representations, shifting, etc.
 
 
Only division, multiplication, logarithm and exponential have been
considered here. Whether square root, trigonometric and inverse
trigonometric functions can easily be included in the radix 16 approach
remains to be decided by finding the corresponding selection rules. It is
believed that this is possible.
 
 Since the basic hardware is used for implementation of many algorithms, 
 even a design with two arithmetic units would be acceptable. The outlined 
 "pipeline" solution makes the entire approach more attractive, since the 
 most complex parts, like the adder structure with the multiple formation 
 network and shifting network, are shared by both processes. This solution 
 illustrates also a possible general approach in defining arithmetic algorithms: 
 a difficult operation is decomposed into a set of simple processes with such 
 interdependencies that the overlapping of their execution is feasible. In 
 this particular case, the normalization and the result evaluation are two 
 such processes. 
 
LIST OF REFERENCES

[1] B. G. DeLugish, "A class of algorithms for automatic evaluation of
certain elementary functions in a binary computer," Report No. 399,
Department of Computer Science, University of Illinois, Urbana,
June 1970.

[2] J. E. Robertson, "A new class of digital division methods," IRE
Transactions on Electronic Computers, vol. EC-7, pp. 218-222,
September 1958.

[3] J. E. Robertson, Lecture Notes for Computer Science Courses 394
(Fall 1970) and 482 (Spring 1971), Department of Computer Science,
University of Illinois, Urbana.

[4] D. E. Atkins, "A study of methods for selection of quotient digits
during digital division," Report No. 397, Department of Computer
Science, University of Illinois, Urbana, June 1970.

[5] J. E. Volder, "The CORDIC trigonometric computing technique," IRE
Transactions on Electronic Computers, vol. EC-8, No. 3, pp. 330-334,
September 1959.

[6] J. S. Walther, "A unified algorithm for elementary functions," AFIPS
Conf. Proc., vol. 38, pp. 379-385, Spring Joint Computer Conference,
1971.

[7] W. H. Specker, "A class of algorithms for Ln x, Exp x, Sin x, Cos x,
Tan^{-1} x, and Cot^{-1} x," IEEE Transactions on Electronic Computers,
vol. EC-14, No. 1, pp. 85-86, February 1965.

[8] R. L. Davis, "The ILLIAC IV processing element," IEEE Transactions on
Computers, vol. C-18, No. 9, pp. 800-816, September 1969.
 
 
 APPENDICES 
 
A. Derivation of equations $u_i = f_i(R_1)$

We defined $U = \sum_{i=1}^{6} u_i 2^{-i}$. From Table 2.3 it can be observed that

$u_i = f_i(r_j), \quad i = 1, \ldots, 6; \quad j = 0, \ldots, 4.$
 
 Using minterm notation we obtain: 
 
- for $r_0 = 1$:
 
 "don't care" minterms are m , m , ..., m^ 
 
$u_1 = u_2 = 0$
 
 u 3 = m io + m ll = F 2 
 
 \ = m i0 + m !2 = (F 2 + S^ (A1) 
 
 u 5 = r o 
 
 u 6 = m^ = r fh 
 
- for $r_0 = 0$:
 
 "don't care" minterms are m^ , m , ..., ul . 
 
$u_1 = u_2 = u_3 = u_4 = 0$
 
 U 5 = m = *fk 
 
 IV = m n = r,r 
 
 (A2) 
 
 6 = 111., 5= i,I| 
 1 3 4 
 
 
Combining (A1) and (A2), the equations for step k = 1 are:
 
$u_1 = u_2 = 0$
 
 u = r *r 
 3 2 
 
 u 5 = r + *?k 
 
B. Derivation of equations $u_i = f_i(R_2)$
 
- for $r_0 = 1$:
 
 "don't care" minterm are m , m , ..., m. 
 
$u_1 = u_2 = u_3 = u_4 = 0$
 
 u 5 = r Q (Bl) 
 
 u 6 = m 5 + m 6 + "V + "8 + m 9 + m 10 = \ + 7 2 ( V^ } 
 
- for $r_0 = 0$:
 
 "don't care" minterms are m...., ^o* •••> m ir 
 
$u_1 = u_2 = u_3 = u_4 = 0$
 
 u = m + n^ + m 2 + nw + m^ + m = r^r^-Hr,) (B2) 
 
 u 6 = m 6 + "7 + "8 + m 9 + ^0 = r l + r 2 r 3 
but $u_5$ and $u_6$ can also be given as:
 
 u 5 = r i (? 2 +? 3 ) + r 6 
 u g = 
 
according to the remarks given after Table 2.4.
 
 Therefore, for step k = 2 
 
$u_1 = u_2 = u_3 = u_4 = 0$
 
 u^ = r Q + r 1 (r 2 +r" 5 ) + rg (B3) 
 
 U 6 = r o (r l +r 2 r 3 ) 
 
BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-72-541
Title: Radix 16 Division, Multiplication, Logarithmic and Exponential
Algorithms Based on Continued Product Representations
Report date: August 1972
Author: Milos Dragutin Ercegovac
Performing organization: Department of Computer Science, University of
Illinois, Urbana, Illinois 61801
Contract/Grant No.: NSF GJ-813
Sponsoring organization: National Science Foundation, Washington, D.C.
Type of report: Research

Abstract: This thesis describes an investigation of a radix 16 approach to
defining a class of similar algorithms for the automatic evaluation of some
elementary functions. The algorithms for division, multiplication, natural
logarithm, and exponential evaluation are developed using continued products
(sums) and redundant digit sets. A new "pipelined" implementation is
proposed, and some basic comparisons between the radix 16 and radix 2
approaches are given.

Key words and descriptors: Computer arithmetic, Continued products,
Continued sums, Division, Multiplication, Natural logarithm, Exponential.

Identifiers/open-ended terms: Multiplicative normalization, Additive
normalization, "Pipelined" implementation, Redundant digit sets, Radix 16.

Availability: Release unlimited.
Security class (report and page): UNCLASSIFIED.
 