't • The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN DEC 16 ill L161 — O-1096 Digitized by the Internet Archive in 2013 http://archive.org/details/classofalgorithm399delu Report No. 399 «/// t^-A AUG ,; 1970 Z- A CLASS OF ALGORITHMS FOR AUTOMATIC EVALUATION OF CERTAIN ELEMENTARY FUNCTIONS IN A BINARY COMPUTER by Bruce Gene De Lugish June 1, 1970 IHE LIBRARY OF THE JUL 3 1 1 Report No. 399 A CLASS OF ALGORITHMS FOR AUTOMATIC EVALUATION OF CERTAIN ELEMENTARY FUNCTIONS IN A BINARY COMPUTER* by Bruce Gene De Lugish June 1, 1970 Department of Computer Science University of Illinois Urbana, Illinois 618OI This work was supported in part by the National Science Foundation under Grant No. US NSF GJ812 and Grant No. US NSF GJ813 and was submitted in partial fulfillment for the Doctor of Philosophy degree in Electrical Engineering, 1970. Ill ACKNOWLEDGMENT The author wishes to express his sincerest gratitude to his advisor, Professor James E. Robertson. Professor Robertson's patience, guidance, invaluable aid and encouragement, timely suggestions, and con- tinuing interest in this research are most appreciated. The author also wishes to thank fellow student, Mr. Daniel E. Atkins, III, who often offered his time to discuss various aspects of the research and who proofread the entire text. Further, the author wishes to thank Mrs. Norene McGhiey who typed the final version of the paper and Mr. Mark Goebel for drafting services. Finally, the author wishes to express his gratitude to the National Science Foundation which provided funds for computer computations and to the Departments of Electrical Engineering and Computer Science which provided support in the form of assistantships and fellowships. iv TABLE OF CONTENTS Page 1. INTRODUCTION 1 1.1 Justification of the research effort 1 1.2 Redundancy and a measure of efficiency 1 1.3- Previous accomplishments 2 lA Assumptions 3 1.5 Procedure 3 1.6 Results 5 2. THE NORMALIZATION TECHNIQUE 7 2.1 Normalization schemes for division 7 2.1.1 Basic principle of normalization 7 2.1.2 Other normalization division algorithms 10 2.2 The standard (redundant) multiplication scheme 15 2.3 Normalization schemes for square rooting 17 2.^4- Remarks on scaling 23 2.5 Concluding remark 2k 3. THE ALGORITHM FOR DIVISION 25 3.1 Basic algorithm 25 3.2 Choice of initialization step 27 3.3 Error bound 31 3.^ Experimental estimate of speed 33 3.5 Implementation 3^ 3.6 Concluding remark 37 V Page h. THE ALGORITHM FOR MULTIPLICATION 38 k .1 Basic algorithm. . . , 38 U.2 Choice of initialization step kO h.3 Error bound kk h,k Experimental estimate of speed h6 U.5 Implementation i+7 k.6 Concluding remark U7 5. THE ALGORITHM FOR NATURAL LOGARITHM ' . . U9 5.1 Basic algorithm 1*9 5.2 Choice of initialization step 51 5.3 Error bound ^h 5.U Experimental estimate of speed 5^+ 5.5 Implementation 5^ 5.6 Concluding remark 56 6. FIRST ALGORITHM FOR SQUARE ROOT 57 6.1 Basic algorithm 57 6.2 Choice of initialization step 59 6.3 Error bound 62 6.U Experimental estimate of speed 65 6.5 Implementation 66 6.6 Concluding remark 66 7. SECOND ALGORITHM FOR SQUARE ROOT 70 7.1 Basic algorithm 70 7.2 Choice of initialization step 72 VI Page 7.3 Error bound 75 7 .h Experimental estimate of speed 79 7.5 Implementation 79 7.6 Concluding remark 79 8. THE ALGORITHM FOR EXPONENTIAL 82 8.1 Basic algorithm 82 8.2 Choice of initialization step Qk 8.3 Error bound ' . . 87 Q.k Experimental estimate of speed 92 8.5 Implementation 92 8.6 Concluding remark 93 9. THE ALGORITHM FOR TANGENT (OR COTANGENT) 95 9.1 Basic algorithm 95 9.2 Choice of initialization step 99 9.3 Error bound 102 S,h Experimental estimate of speed 106 9.5 Implementation 107 9.6 Concluding remark 107 10. THE ALGORITHM FOR COSINE AND SINE 109 10.1 Basic algorithm 109 10.2 Example 112 10.3 Error bound 11^ 10. k Experimental estimate of speed 118 10.5 Implementation 118 10.6 Concluding remark 11° Vll Page 11. THE ALGORITHM FOR ARCTANGENT 119 11.1 Basic algorithm . . » 119 11.2 Choice of initialization—error bound: Case 1 121 11.3 Choice of initialization—error bound: Case 2 126 11. k Experimental estimate of speed 129 11.5 Implementation 129 11.6 Concluding remark 129 12. A NOTE ON EVALUATION OF ARCCOSINE/ARCSINE ' . .131 13. ON A HIGHER RADIX IMPLEMENTATION 133 13.1 General considerations 133 13.2 Amenability of normalization algorithms to higher radix . . .13^ 13.3 A change in strategy 136 13.^ Efficiency of radix k versus radix 2 137 Ik. CONCLUSIONS 139 lU.l A set of algorithms 139 14.2 Areas of further investigation 1^0 LU.2.1 Generalization of the technique lUO lU.2.2 Correspondence between normalization recodings and classical multiplier recodings LUO 1^.2.3 Some partial insight? 1^1 LU.2.U A different radix k approach 1^3 lU.2.5 Some practical matters 1^-3 APPENDIX A. MONTE CARLO ESTIMATE OF THE PROBABILITY OF A ZERO ihk VI 11 Page B. DISCUSSION OF THE CHOICE OF MULTIPLIER CONSTANTS IN THE COSINE/SINE ALGORITHM 150 C. LISTING OF PRECOMPUTED CONSTANTS 152 D. LISTING OF PARTIAL SIMULATION PROGRAM CODE 159 BIBLIOGRAPHY 179 VITA 182 IX LIST OF FIGURES Figure Page 1. Block diagram for division 36 2. Block diagram for multiplication.- kd 3. Block diagram for logarithm 55 h. First block diagram for square root 67 5. Second block diagram for square root 68 6. Third block diagram for square root 69 7. Fourth block diagram for square root 80 8. Block diagram for exponential 9k . 9. Block diagram for tangent 108 10. Block diagram for cosine/sine 117 11. Block diagram for arctangent 130 A CLASS OF ALGORITHMS FOR AUTOMATIC EVALUATION OF CERTAIN ELEMENTARY FUNCTIONS IN A BINARY COMPUTER Bruce Gene De Lugish Department of ' Electrical Engineering University of Illinois, 1970 The time required to evaluate elementary transcendental functions in a digital computer can often be significantly reduced by performing the algorithms in hardware rather than software form, providing that efficient r hardware algorithms can be developed. Such time reduction may be important in batch processing on a large machine when the mix of problems includes frequent evaluation of such functions and may be essential in a special purpose real-time machine such as the guidance system of an airborne vehicle where an increase in speed is justified at almost any cost. In situations which warrant the increase in hardware investment, primarily in control complexity, necessary to implement these algorithms, schemes which utilize redundant number recodings are also quite well justi- fied. Ordinarily the introduction of redundancy, that is, allowing more than r digital values in radix r so that the representation of numbers is no longer unique, provides an increase in speed in exchange for some increase in hardware complexity. Usually the increase in hardware investment is moderate relative to the cost of the machine. Algorithms to evaluate common elementary functions to register length precision in from one to three "multiplication cycle times" are developed; a "multiplication cycle time" is defined as the time required to perform, on the average, M low precision comparisons, M shifting operations, and M/3 additions where M is the length of the mantissa of a floating-point operand. The table below summarizes the results. A small register-speed read- only-memory containing less than 100 precomputed constants is required. 2 TABLE Function Division Logarithm Square Root Exponential Tangent Cosine/Sine Arctangent Arccosine/ Arcsine Multiplication Cycle Times 1 1 1 3 2 2 1 3 It may be possible to extend the techniques employed in the development of these algorithms to a much wider class of functions. Algorithms which are readily amenable to implementation in radix r = 2 , n > 1, promise even greater increases in speed than those limited to radix 2 implementation. Of the algorithms listed in the table, only the in- verse trigonometric fail to fall in this category. 1. INTRODUCTION 1.1 Justification of the research effort The continuing reduction of the size and cost and the increasing reliability of integrated circuit logic elements, especially LSI, encourages the utilization of more complex logic networks in the arithmetic unit of a digital computer. Furthermore, since in most large modern machines the cost of the arithmetic unit is only a small portion of the cost of the entire machine, it is possible to substantially increase the percentage cost of the arithmetic unit without appreciably affecting its relative cost. In arith- metic units these cost and size reductions encourage the replacement of very frequently used programmed subroutines by built-in function generators. Division is now almost universally hard-wired; algorithms for hard-wired square root have often been proposed. In future machines, especially those intended to perform substantial numbers of scientific calculations, it may well be feasible to hard-wire algorithms to evaluate logarithm, exponential, cosine/sine, tangent, and arctangent as well as square root. With these thoughts in mind, an investigation leading to the development of such algorithms was undertaken. 1.2 Redundancy and a measure of efficiency The introduction of redundancy, for example allowing digital values 1, 0, 1 (where 1 means -l) in binary, allows, in general, several different representations of any particular number. Thus one can choose a recoding procedure which increases the probability of a zero, p n , which increases the shift average = l/(l-p ). Making the usual assumption that and 1 are equally likely, one can see that for conventionally represented numbers, p = l/2 and = 2. It is known that by introducing minimal redundancy (three values 1, 0, 1 in radix 2) one can increase p to 2/3 and to 3. It is also known that one cannot achieve a higher shift average with minimal redundancy. The feasibil- ity of further increasing redundancy in the algorithms developed in this research has not been studied because it is generally felt that the speed- hardware ratio diminishes rapidly. 1.3 Previous accomplishments Very considerable research has been and is being directed toward formulation of fast division schemes, including, for example, the SRT (and scaled SRT ) division algorithm which employs redundant representation of output digits to achieve greater speed. Several algorithms have also been developed for evaluation of various other elementary functions; included are the works of Meggitt, Specker, Senzig, and Voider. Most of the algorithms devised to date employ non-redundant number representation, although often digital values 1, 1 are used rather than 0, 1. The same factors which encourage hard-wiring of algorithms for evaluation of elementary functions also encourage the use of redundant number recodings which require more hard- ware in exchange for greater speed. The analysis of the SRT division and a study of the correspondence between that division and multiplier recoding procedures carried out by Robertson, ' Penhollow, Frieman, Shively, and others lead to the conjecture by the author that redundancy techniques could be favorably employed in achieving greater efficiency in algorithms beyond division and 12 square rooting. That that conjecture holds some validity, at least under the assumptions discussed below, is verified in this paper. 3 l.k Assumptions It is assumed throughout this paper that the time required to perform addition is significantly greater than the time necessary to perform low precision comparisons, shifting operations, or complementations. It is further assumed that, with present hardware technology, one can economically build small (less than 100 words) read only memories which can be accessed at register speed. It is strongly felt that these assumptions are justified by presently available hardware. 1.5 Procedure The fundamental technique employed throughout this research is the "normalization" of a given operand (or a simple function of that operand) by means of a continued product or continued summation, the terms of which may be easily chosen in a step-by-step process. The procedure for formulating an algorithm consists of the following operations: (l) choice of an appro- priate function of a given operand to be "normalized"; (2) approximation of that function by a continued product or continued sum whose terms are easily chosen; (3) utilization of the individual terms comprising that product or sum, as they are chosen, to form the desired function; (k) choice of an appropriate rule for selecting the individual terms; (5) scaling of the operand upon which the selection rule is based so that the rule is the same for every step in the recursion. The procedure is best illustrated by the following example. One scheme for performing division is to form the reciprocal of the divisor as a continued product of simple "multiplier" constants, using each of the "multipliers", as it is chosen, to form a better approximation to the quotient; viz. 1* q, .. = q_ d , q = dividend k+1 k k' where -■Id. ->0. X i=0 x In this particular scheme, the quantity 1 k --Ed x i=o x k is "normalized" to zero implicitly by explicitly forcing x IT d,- to unity. 1=0 k One must formulate a selection rule, based on the quantity 1 - x H d n - , to i=0 1 ' choose the next multiplier. It is convenient to scale the partially normal- ized quantity, upon which the selection rule is based, in such a way that the selection rule is the same for every step of the recursion. A convenient scaling here is u = 2 (1 - x n d ) * i=0 X with the resulting recursion Vi = 2 \ + \ + 2 s k\ where d k 1 + s 2 k -k ■ ( 1 if u < - c k otherwise 1 if u > + c for some comparison constant c. The efficiency of the algorithm is strongly dependent on the form of the recursion for two reasons: first, the more complicated the recursion the more time and/ or hardware required to implement 5 it; second, the form of the recursion determines the difficulty of formulating a higher radix implementation which could lead to even greater speed. This fundamental technique, termed the "normalization" technique in this paper, is discussed in detail in Section 2. In Sections 3 through 12 the technique is applied to various elementary functions — division, square rooting, logarithm, exponential, the circular functions, and their inverses. For each algorithm developed, an error bound and some methods of implementa- tion are discussed. Finally, in Section 13, some considerations of importance in a higher radix implementation are discussed. 1.6 Results With three adder circuit configurations of the type discussed in this paper, one can perform the algorithms listed below in the approximate average amounts of time indicated. A "multiplication cycle time" is the time required to perform M low precision (3 or k bits) comparisons, M shifting operations, and M/3 additions/subtractions, where M is the length of the mantissa. TABLE 1 Function Range of Operand Allowed Mult Cyc iplication le Times Division Machine Range 1 Logarithm X > 1 Square Root X > 1 Exponential Machine Range 3 Tangent < X < 2n 2 Cosine/Sine < X < 2n 2 Arctangent Machine Range 1 Arccosine/Arcsine < X < 1 3 Worst case error bounds are developed for each algorithm; typically worst case errors are less than one part in 2 REFERENCES 1 J. E. Robertson, "A New Class of Digital Division Methods," IRE Transactions on Electronic Computers , EC-7: 5:218-222, September, 1958, 2 G. A. Metze, "A Class of Binary Divisions Yielding Minimally Repre- sented Quotients," IRE Transactions on Electronic Computers , EC-11:6: 76I-76U, December, 196*2. 3 J. E. Meggitt, "Pseudo Division and Pseudo Multiplication Processes," IBM Journal of Research & Development , 6:2:210-226, April, 1962. h W. H. Specker, "A Class of Algorithms for Ln X, Exp X, Sin X, Cos X, Tan"l X, and Cot~l X," IEEE Transactions on Electronic Computers , EC-lU:l:85-86, February, 1965. 5 D. N. Senzig, "Digit-By-Digit Generation of the Trigonometric and Hyperbolic Functions," IBM Research Report , RC-860, December 17, I962. 6 J. E. Voider, "The CORDIC Trigonometric Computing Technique," IEEE Transactions on Electronic Computers , EC-8:5:330~33*S September, 1959. 7 J. E. Robertson, "Increasing the Efficiency of Digital Computer Arithmetic Through Use of Redundancy," Lecture Notes for EE U97B, University of Illinois, Fall Semester, 196^. 8 J. E. Robertson, "The Correspondence Between Methods of Digital Division and Multiplier Recoding Procedures," University of Illinois DCL Report No. 252, December 21, 1967. 9 J. 0. Penhollow, "A Study of Arithmetic Recoding with Applications in Multiplication and Division," University of Illinois DCL Report No. 128, September 10, 1962. 10 C. V. Frieman, "Statistical Analysis of Certain Binary Division Algorithms," Proceedings of the IRE , ^9:1:91-103, January, 1961. 11 R. R. Shively, "Stationary Distributions of Partial Remainders in SRT Digital Division," University of Illinois DCL Report No. 136, May 15, 1962. 12 G. A. Metze, "Minimal Square Rooting," IEEE Transactions on Electronic Computers , EC-l>+:2:l8l-l85, April, 1965. 2. THE NORMALIZATION TECHNIQUE 2.1 Normalization schemes for division 2.1.1 Basic principle of normalization Most of the division and square root algorithms known to the author appear to be derived on the basis of a hand -computation technique. Although some of the algorithms derived from the normalization technique used in this research have been known for some time, it is believed that the derivation by this technique provides further insight into the problem of automatic function evaluation because the normalization technique can be extended to algorithms beyond division and square root for which hand- computation methods are not known. Considerable attention is devoted in this paper to division and square root, not because the algorithms devised are better than those previously known (indeed, some coincide), but rather because it is believed that such a discussion leads to a fuller understanding of the general technique. With this thought in mind, let us begin with a normalization approach division algorithm. Suppose one wishes to find the quotient of two numbers represented in normalized binary floating-point format, X = x • 2 a x,y e [l/2, l) Y = y • Z a,p integers. Let X be the divisor and Y be the dividend. The exponent of the quotient Q, presents no particular problem, differing by at most one from the quantity (p-a). q = I = J£s£ = z . 2 0-oO = . 2 (^) X JX x x«2 where q = y/x; q e(l/2, 2). One is thus able to concentrate his efforts on the fractional parts of the operands and devise a convenient algorithm for computing q. Suppose one multiplies both the dividend and divisor by a sequence of (non-zero) constants {d.} such that the resulting divisor tends towards unity; one produces the quotient q. M y;\ d i M x M . ~ 1 1=0 x IT d. i=0 : M if x TT d. = 1. i=0 x Clearly the choice of the set of constants {d.} is critical, both for "normalizing" the divisor towards unity and for performing the indicated multiplications . form It is convenient to choose the set of constants {d } to be of the i d. = 1 ± 2" 1 . i One might choose the positive sign if the partially normalized divisor is less than unity and the negative sign otherwise. Then each "multiplication" can be performed by a shift (of i places) and an addition (preceded by a complementation if the sign is negative) . If two complete adder circuits are available, both the partially normalized divisor x^ and the partially developed quotient q are formed simultaneously and independently. k *k+l = x o .* d i = *A> x o = X \ + i = % .% d i - VV *o ■ y ' The time required to perform a division with this algorithm is approximately the time required to do M additions, assuming that comparisons, shifts, and complements are very much faster than additions. The above process is the normalization approach analog of the the non-restoring recoding in the following sense. Let us write d. = 1 + s.2 _1 , s. = [1, 1} . i l ' l ' Then the sequence {s.} represents a non-restoring recoding of the reciprocal of the divisor. Note, however, that the quotient is obtained in conventional form (digital values 0, l) rather than in recoded form (digital values 1, l). Since in the above algorithm, the probability of choosing s. = is zero, the shift average = l/(l-p ) is unity. One can increase the shift average (and the speed) by allowing s. = 0. For example, one might propose the following alternate selection th rules: for the k step, 1 if x^ > 1 - 2 _k d k = 1 + 2 _k if x. < 1 - 2" k . The division would then be completed in M/2 "addition cycle times," on the average, for a shift average of two. An "addition cycle time," in this context, includes the time required for comparison, shift, and complement, 10 as well as the addition itself. This alternate algorithm is the conventional recoding with digital values 0, 1. Clearly, one can go a step further and introduce redundancy, that is, allow more than two digital values. In particular, one may allow digital values 1, 0, 1. If the algorithm is properly scaled (i.e., the proper selection rules are chosen), a shift average approaching three can be achieved. It is known that a scale factor (essentially the ratio of com- parison constant to multiplier) in the range [2/3, 5/6] yields a recoding for which the probability of a zero approaches its limit of 2/3. A convenient scale factor in the allowed range is 3A» Thus one may formulate the following algorithm for division: for the k step, d, = 1 + s, 2 _k , 1 < k < M k ' — - s, = 1 if x. < 1 - 3A ^ otherwise 1 if x^ > 1 + 3A -k (The initial multiplier d may be chosen in a special way.) This division algorithm requires an average of only M/3 addition cycle times (if two adders are available) compared to M/2 addition cycle times for a conventional (0, l) division and M addition cycle times for the common non-restoring (l, l) division. This division algorithm is discussed in detail in Section 3. 2.1.2 Other normalization division algorithms While the division algorithm just proposed has been chosen for detailed study in this paper, three other normalization division schemes are known and are discussed here for completeness. To compare the four algorithms, 11 it is convenient to write each in a recursive form, as is done for the first algorithm here. Let k' a k + i ■ 1 - «<£ d i ) - 1 - (1 - a k )d k -k But d = 1 + s 2 , so k k ' a k + i ■ -% 2 ' y ' + a k (1+s k 2 " k ) a k + i = a k + s k v" k - s k 2 " k (s - x) where GC approaches zero. This algorithm is a multiplicative reciprocal formation, i.e., M TT d. S - i=0 x X which simultaneously uses the digits of the reciprocal to form the quotient. Such a formulation of this algorithm leads quite naturally to an analogous additive reciprocal formation, represented by the following recursion. k a, . = 1 - x( Z d! ) k+1 1-0 x where d| = s.2 . Then, l l ' 0. It is convenient, as a preparatory step, to normalize in a manner such that the exponent is even, X = x • if Xe [X ' k > ^ en' even integer. The exponent of the root is thus a' /2 and is formed by shifting a". One is thus able to concentrate his efforts on the fractional part of the operand and formulate a convenient algorithm for computing r = V x. Let us begin by considering a multiplicative approach that is quite analogous to the first division scheme proposed' in Section 2.1. One multiplies the given operand x by a sequence of (non-zero) constants {r.} such that the resulting operand tends towards unity. M x n r. 1-0 x x = M IT r. i=0 x where r i\2 = (1 + s.2 ) i l 18 . -(i-l) 2 -21 . = 1 + s.2 + s. 2 , s. = {1, 0, 1} Thus, so that, or M 1 - x IT r. = 0. 1=0 x M M • ? n n r. = H (1 + s.2" 1 ) = - 1=0 X 1=0 M . IT (1 + s.2" 1 ) = -i • ^ i "\/x 1=0 v M . y 7 IT (1 + s 2 X ) 3.= 1=0 x v* M . __ x IT (1 + s.2" 1 ) ="\/x. 1=0 x The selection rules for the choice of the set of multiplier constants {r.} are essentially the same as the selection rules for the division scheme represented by recursion formula (2-l). To carry out the algorithm, two re- cursions are performed simultaneously and independently. R = 1 - x( n r ) k+1 i=0 X 1 - Vl = x " Vk 1 - (1 - P k )r k 19 p k + l =1 '\ + Vk -. o-(k-l) 2 ^"2k But r. = 1 + s, 2 v ' + s. 2 . , so k k k ' ^i = Pv + 0v-l)(2s v + s^ 2 - k )2" k k+1 ^k XK k k k (2-7) R, _ = x IT -Vr~ ^ +1 i=0 x x n (l + s.2" 1 ) i=0 X = x k-1 n (i + s.2" 1 ) i=0 X (1 + s k 2" k ) = R. + s R 2 k k k -k (2-8) where (3 approaches zero and R approaches the required root. Note that by initializing R to an arbitrary value y, one can form the more general quantity y/-\[x~ rather than the more common problem of finding yx (when y = x). This algorithm is discussed in complete detail in Section 6. The above algorithm is a multiplicative formation of the reciprocal of the square root, i.e., M . n^7 = -2=-. i=o v x -V7 Such a multiplicative formulation leads one to investigate the feasibility of the analogous additive reciprocal root formation, represented by the following recursions. 20 k P k + l = X " X (Z v^T)' i=0 = 1 - Vi where r! 1 -i 2 (s.2 ) . Then l k-1 -i ~-kx2 1=0 = 1 - x ^f 1 -iv2 _ -k ^ -i 2 -2k ( L s.2 ) + 2s 2 L s.2 + s 2 1=0 X K i=0 x K k-1 1 - x( Z s.2" 1 ) 2 - 2s, x 2~\ + s 2 1=0 k k p k - 2s k x V + \ 2 (2-9) \ +1 ■ £ VT k Z s.2" 1 1=0 x k-1 . . Z s.2" 1 + Sl 2" k 1=0 x k = R. + s 2 K. k -k (2-10) where (3 approaches zero and R approaches the reciprocal of the square root 21 of the given operand. From recursion (2-9) one can see that this algorithm cannot be performed efficiently because the indicated multiplication of x by R is indeed a full-scale multiplication. It is interesting to note also that the given operand must be set aside in a special register throughout the algorithm. Furthermore, in order to achieve a minimal recoding, the comparison constants must be scaled with respect to the root of the operand, (in practice one must convert this scaling to a scaling with respect to the operand itself, since, after all, the root is not known). This algorithm seems to offer no redeeming features and is not studied in detail or proposed for implementation. Both of the square root algorithms discussed to this point are reciprocal root formation schemes, although in the first the root itself can easily be formed by appropriate initialization. The algorithms discussed below form the root directly. An algorithm which is quite similar to that represented by recursion relations (2-7) and (2-8) but which forms the root directly is that represented by the following recursion relations. 1=0 k-1 = x - ( n r )r i=0 x k = x - (x - P k )r k = x - x r k + P k r k 1 o-(k-l) 2 -2k But r = 1 + s 2 ' + s. 2 , so K K k 22 p k+ i - \ + \(\ s ~^~ 1] + 4 £ " 2k) / -(k-l) 2 -2kv x(s k 2 + s k 2 ) ^ P k + (P k - -)(23 k + s* 2" k )2- k (2-11) k r. . = n Vr~ k+1 i=o x k n (l + s.2" 1 ) i i=0 k-1 n (i + s.2" x ) 1=0 (1 + s k 2- k ) R. + s n R 2 k k k -k (2-12) where (3 approaches zero and R approaches the root of the given operand. From relation (2-ll) one can see that the comparison constants must be scaled with respect to the operand. Since this algorithm is less efficient than that first discussed, it is not proposed for implementation. Finally, one may consider an additive algorithm which forms the root directly. 'k+1 k x - ( E i=0 r! ) 2 l' k-1 x -tA-V 9 !^' i=0 k-1 k-1 1=0 X V i=0 V 23 k-1 .-k„ 2 -2k ■ p k " s A 2 " (k_1) - 4 2 " 2k <2-w>' k _ r, . = E -J7T ^ +1 1=0 V 1 k = E s.2 l-o x -1 = R + s2" k (2-14) k k It is shown in Section 7 that this algorithm must be scaled indirectly with respect to the root of the given operand (in practice, directly with respect to the operand itself), but the recursions are so simple and convenient that it is feasible to propose an implementation. This algorithm is discussed fully in Section 7. Hence, as in division, four normalization schemes are known; possibly others exist. Two algorithms from this set are studied in detail and are proposed for possible implementation because there is no clear case to be made for one in preference to the other—one is preferable in the strictly binary case, the other more readily amenable to a higher radix implementation. 2.h Remarks on scaling With regard to the necessity for scaling, the following table sum- marizes the observed requirements. 2k TABLE 2 (Requirements for Scaling) Multiplicative (Continued Product) Additive (Continued Sum) yx I 1 None wrt y None wrt x None None wrt x wrt ~Vx wrt ~\/x The necessity for scaling of the comparison constants in order to achieve a minimal recoding is basically an observed phenomenon which is not fully understood at this time. It is certainly an interesting question for further study. 2.5 Concluding remark Most of the remainder of this paper is concerned with the derivation of those algorithms proposed for implementation, a discussion of convenient initialization, a brief study of implementation and hardware requirements, and a discussion of error bounds for the algorithms. REFERENCE J. E. Robertson, "Increasing the Efficiency of Digital Computer Arithmetic Through Use of Redundancy," Lecture Notes for EE h^JB, University of Illinois, Fall Semester, I96U. 3. THE ALGORITHM FOR DIVISION 3.1 Basic algorithm In Section 2.1.1, a division scheme is proposed which is discussed here in some detail. It is assumed that one is attempting to formulate an algorithm for computing the quotient of two numbers represented in the usual binary format, X = x • 2 a x,y e [l/2, l) Y = y • 2 P a, £ integers where X is the divisor and Y is the dividend. The quotient, as well as the divisor and dividend, is assumed to lie within machine range. As mentioned earlier, the exponent of the quotient presents no problem and one may con- centrate on the ratio of fractional parts, x Let us multiply both divisor and dividend by a sequence of (non-zero) constants {d.} chosen in such a way that the resulting divisor tends towards unity. M x M J . n l x n d. 1=0 i=o x M if x H d. = 1. i=0 x With the hardware configuration of Figure 1 (Section 3.5), both the partially normalized divisor x^ and the partially developed quotient q may be formed simultaneously and independently. 25 26 *k+l = x o A d i = VV x o = x > q k + i = q o . n n d i = q A' q o = y > 1=0 -k where d. = 1 + s. 2 , 1 < k < M, = 1 if x. < 1 - c k otherwise 1 if x. > 1 + c ,-k The initial multiplier d is chosen in a special way. [See Section 3.2.] It is known that a minimal recoding (p = 2/3) will result if the comparison constant lies in the range [2/3, 5/6], the most convenient choice for c in this range being 3/^. However, it is also convenient to bound the variable upon which the actual comparisons are made, to lie in the range (-1, +l), so that each u, can be represented in the machine with only a sign bit to the left of the radix point. This is easily accomplished by scaling down both the comparison constants and the multipliers by a factor of two. Let \- s, = 1 + 1/2 s 2" k = 1 + s, 2"( k+1 ), 1 < k < M, 1 if x, < 1 - 3/8 • 2" k otherwise 1 if x^ > 1 + 3/8 • 2 -k or equivalently, 27 -{ 1 if % < -3/8 otherwise 1 if \ > +3/8 where \ = 2 k (^-D as previously indicated. It is shown that | tjl j < 1 for k = 0, 1, . . .., M, within the desired range of (-1, +l) . 3.2 Choice of initialization step The initial operand x Q = x lies in the range [l/2, l) . The object k of the normalization process is to force x, = x„ H d. to unity. The k+1 i= Q 1 selection rules for the set of multipliers {d.} are essentially symmetric about unity, the only lack of symmetry lying in the choice of the placement of equality signs, so it would be convenient to choose the initial multiplier d so as to force x = x d to lie in a range symmetric about unity. Thus the following choice of initial multiplier would be desirable. / 2 (shift) if 1/2 < x < 2/3 d n = ( if 2/3 < x_ < 1. This selection would leave x, in the range (2/3, V3)> symmetric about unity, and would make the choice of s = 1 and s, = 1 later in the algorithm equally likely. However, the comparison constant of 2/3 specified in this rule re- quires a full scale comparison in order to choose d . A less convenient, but much more practical choice of initialization step is the following. d o = 28 2 (shift) if 1 if 1/2 < x Q < 3A 3A < x Q < 1. Then, x x e [3A, 3/2) and the probabilities, for a finite register length (where the skewness in the probability density of x^ has not yet dissipated), that s = 1 and s = 1 are not equal. However, as verified by experimental means, the probability that s = is still very nearly 2/3, and that is the critical factor. Refer to Section 3.^-. Having chosen an initialization step, the algorithm is now completely specified. An example illustrates the flow of the algorithm. Example : Numerical values of x and q are listed below, rounded to lU decimal places. The divisor x = 0.6 = 0.10011001 (where the overbar indicates a repeated digit pattern) and the divisor y = 0.5 in = 0.10000000. The correct quotient, also rounded to 1^ decimal places, is q= 0.83333333333333. For this divisor, d = 2, so that x = 1.2 and q = 1.0. 29 TABLE 3 Vl l k+l 11 " 1 ■ ' 1 1.20000000000000 1.00000000000000 2 -1 0.90000000000000 O.75OOOOOOOOOOOO 3 1 1.01250000000000 O.8U375OOOOOOOOO ' k 1.01250000000000 O.8U375OOOOOOOOO 5 1.01250000000000 O.8U375OOOOOOOOO 6 -1 0.99667968750000 0.83056640625000 T 0.99667968750000 0.83056640625000 8 1 1.00057296752930 0.83381080627441 9 1.00057296752930 O.8338IO80627U+I 10 1.00057296752930 O.83381080627441 11 -1 1.00008^0651000 O.833U0367209166 12 1.00008^0651000 O.833U0367209166 13 1.00008^0651000 0.833^0367209166 Ik -1 1.00002336620198 O.83335280516832 15 -1 0.99999284791078 O.8333273T325898 16 0.9999928U791078 0.83332737325898 17 1 1.000000U772507U O.83333373IOU228 18 1. 000000 V772507U 0.8333337310^228 19 1.000000^7725074 0.8333337310^228 20 1.000000U772507U 0.8333337310^228 21 -1 1.0000000000U136 0.83333333367780 22 1.00000000004136 0.83333333367780 23 1.0000000000U136 0.83333333367780 2k 1.0000000000U136 0.83333333367780 25 1.00000000004136 (Continued) 0.83333333367780 30 TABLE 3 (Continued) k+1 26 1.00000000004136 27 1.00000000004136 28 1.00000000004136 29 1.00000000004136 30 1.00000000004136 31 -1 0.99999999994769 32 0.99999999994769 33 0.99999999994769 34 1 1.00000000000590 35 1.00000000000590 36 1.00000000000590 37 -1 0.99999999999863 38 0.99999999999863 39 1 1.00000000000044 Ho 1.00000000000044 *k+l 0.83333333367780 0.83333333367780 O.8333333336778O 0.83333333367780 0.83333333367780 0.83333333328975 0.83333333328975 0.83333333328975 0.83333333333825 0.83333333333825 0.83333333333825 0.83333333333219 0.83333333333219 0.83333333333370 0.83333333333370 Thus the quotient produced in 40 steps of this algorithm is Vl = °- 8 3333333333370 -12 which differs from the correct quotient by 0.37 x 10 . The error bound, -12 derived in the next section, is 0.45 x 10 for M = 40 steps. 31 3.3 Error bound Given an initial operand x_= x in the range [l/2, l) and the selection rules listed in the last section for the choice of the set of multipliers, one may now produce a bound on x^ , the resulting divisor, and ultimately an error bound for the division algorithm itself. Since, / d - < then and 2 if 1/2 < x Q < 3/U 1 if 3A < x Q < 1 X l = x o d o € [3/u > 3/2) u x = 2( Xl - 1) e [-1/2, 1). Next let us find the range of x = x d . The selection rule for the first (k = l) step is, d, = 1 + lA s 1 if 3/U < x 1 < 13/16 if 13/l6 2, x^ e [1 - 3/8 ' 2~^ k " 1 \ 1 + 3/8 • 2~^~ 1 '). The hypothesis has been shown explicitly to be valid for k = 2. Proof : For some k > 2, x^ e [1 - 3/8 • 2"' k-1 ', 1 + 3/8 • 2 _ ^ k " 1 ' ) ) or, equivalently, x. e [1 - 3A * 2~ k , 1 + 3A • 2 ~ ) • For all k > 2, d n = 1 + 1/2 s n 2 k k -k 1 if x. < 1 - 3/8 • 2 = otherwise 1 if ^ > 1 + 3/8 • 2 -k The first range, [l - 3 A • 2~ k , 1 - 3/8 • 2" k ) maps onto or (1 + 1/2 • 2" k ) • [1 - 3A ' 2~\ 1 - 3/8 - 2~ k ), [1 - lA • 2" k - 3/8 • 2" 2k , 1 + 1/8 • 2" k - 3/16 ' 2" 2k ), which lies within the desired range of [1 - 3/8 • 2 _k , 1 + 3/8 • 2" k ) The second range, [1 - 3/8 • 2~\ 1 + 3/8 • 2" k ) maps onto itself, and thus lies within the desired range, The third range, [1 + 3/8 • 2"\ 1 + 3A ' 2" k ) 33 maps onto (1 - 1/2 • 2" k ) [1+3/8 • 2~ k , 1 + 3/k • 2' k ) or [1 - 1/8 • 2~ k - 3/16 . 2" 2k , 1 + 1 A • 2" k - 3/8 • 2~ 2k ), which again lies within the desired range. Hence, x k+1 e [1 - 3/8 • 2~\ 1 + 3/8 • 2" k ) ' for all k > 2. Furthermore, |uj < 3A for a11 k > 2 and k | < 1 for all k. Q.E.D. Therefore, the final divisor Vl e [1 - 3/8 • 2" M , 1 + 3/8 - 2" M ) so that the error in the final divisor is bounded by |x M+1 - 1| < 3/8 • 2- M . The algorithm is thus capable of producing M correct quotient bits in M steps -(M+l) beyond the initialization, the error in the algorithm being less than 2 -, neglecting round-off. 3.^ Experimental estimate of speed The theoretical prediction that, with the selection rules chosen, the probability that s be zero is 2/3 is an asymptotic result, that is, it assumes a register of infinite length. To test the value of the algorithm in any practical machine, it becomes necessary to estimate the probability of a zero for a relatively short register. For this reason, this division algorithm was simulated on the available IBM 3^0/75 computer system for a register length M of kO bits, and a Monte Carlo estimate of the probability 3h .18 of a zero was made. For a statistically significant sample of 2 pseudo- random divisor operands, approximately uniform over the range [l/2, l), the mean probability of a zero is O.676, with a corresponding shift average of 3.08. As a practical matter, these figures differ very little from their theoretically predicted asymptotic values of 2/3 and 3.00. The relevant statistical considerations are discussed in Appendix A. 3.5 Imp leme n t at i on The necessary recursion formulas to implement the division process are those given earlier, namely, x. _ = x n IT d. aii, x = x, k+1 . _ 1 kk' i=0 \ + i = q o . n n d i = W> q o = y > 1=0 except that the former recursion must be rewritten in terms of u = 2 (x. - l) so that the comparison constants remain fixed at ± 3/8 (except for the initial- ization) . Vi = 2 \ + \ + w (3-D o"(k+l) q. , = q, + 2 x / s 1 q k+1 k kk (3-2) where S k = if 1 if ^ < -3/8 otherwise \ > +3/8. 35 To implement the first of these recursion relations, one requires a counter (0 to M) to keep track of the step being performed, a comparison circuit to select the value of s (represented by a sign bit and a magnitude bit), a register to hold x, (eventually sc ), a complementing circuit to form the negative of a given operand, a shifting network capable of rapidly shift- ing k places (0 < k < M), a full adder (which probably includes the register mentioned above), and facilities for shifting a single bit position. To implement the second recursion, one requires an additional register to hold q (probably falling within the full adder again), another complementing circuit and shifting network, and another full adder as indicated. Of course, control circuitry is also required, but that may be provided in either hard- ware or microprogramming form. The control requirement is a design criterion which is highly dependent on other machine design parameters and cannot be discussed in this paper in any detail. A hardware structure sufficient to implement all algorithms dis- cussed in this paper has been developed in block diagram form. The flow of information for the division algorithm is indicated in Figure 1. Those boxes, which are not required by the division algorithm are shown in dashed form; the counter and comparison circuitry are not explicitly shown. The read-only memory indicated in Figure 1 is required by several functions in the set; its size is a function of the register length, as well as which functions in the set are to be implemented in the machine. To implement all of the functions discussed, a total of slightly more than Um/3 words need be stored in the memory. The "mini -adder" is an adder capable of adding only a single bit (in any digital position) to a full precision operand. The "one-bit shifter" is capable of shifting left one digital position. 36 I 1 I £ I i i i I 2 I i i r-jr-T >i > ■ §g ■ i 1 I E I I 3 I I 1 I I I I I -I L.I I — i KK !?E! i> i £ i i § i I I ^ I I -J <0 D- 4 + p 1 1 '5. I I 5g I i si i IS | M OJ ^-v ^ + Ai ^ M *-" + OJ M X a< w ^ w + + «r d? ii II H H + + ^ a* 37 3.6 Concluding remark A division algorithm using an implicit redundant recoding and yielding a shift average of approximately three has been proposed. It has been shown that the algorithm can be realized with reasonable cost in hard- ware, especially with the expected advances in hardware technology (LSI). It is shown later in this paper that the structure of Figure 1 is sufficient to implement the other elementary functions in the set as well. While considerable discussion has been devoted to division tech- niques, it should be kept clearly in mind that the avowed purpose of this research was not to develop new division schemes, although that has been done, but rather to extend the techniques used in those schemes to the elementary transcendental functions. k. THE ALGORITHM FOR MULTIPLICATION 4.1 Basic algorithm In Section 2.2, it is shown that an additive normalization scheme leads to a feasible algorithm for multiplication, but that a multiplicative normalization scheme does not. The details of the additive scheme are discussed here. In formulating an algorithm for multiplication, one need only be concerned -with the product of the fractional parts y and x of the multiplicand Y and the multiplier X. M M p = yx = y(x - E m. + E m. ) 1=0 x 1=0 x where ,-k n^ = 1/2 s k 2"^, s k = {1, 0, 1} The selection rules must be chosen in such a way that so that M x - E m. = i=0 x M = y E m. . P = y i=0 The selection rule at the k step for the selection of s, is virtually the same as that for division, except that the operand is being forced to zero rather than unity. 1 if x, < - 3/8 • 2~ k s k =/ otherwise 1 if x^ > +3/8 • 2 -k 38 where 39 *k + l = x o " Z n m i 1=0 k-1 -E = X- - A m. - TIL i=0 x ^ = x k ■ V x = x Pk + 1 - * £ »i k-1 y £ m i + y^k 1=0 = p k + y\, v = o. Letting u = 2 x, , one may write the two recursions which must be implemented to perform multiplication. = 2 Vi = ^k - s k p k + l = p k + ys k 2 -(k+1) (h-1) where the selection rule for s, now reads, k s k =/ 1 if ^ < -3/8 otherwise 1 if \>+ 3/8. Note that a register must be provided to hold the original multiplicand throughout the process. i+0 k .2 Choice of initialization step The initial operand x n = x lies in the range [l/2, l) . The object k of the normalization process is to force x, n = x_ - Z m. to zero. The K+l i= Q 1 selection rules for the set of constants {m.} are essentially symmetric about zero, and it is convenient to choose the initial multiplier m so as to force x = x - m to lie in a range symmetric about zero, the extent of the range being as small as possible. Further, m should contain no more than a single non-zero bit since one would like to form P l = P + ^0 in no more than one addition cycle time. The choice of m Q = J 1/2 if 1/2 < x Q < 3/U 1 if 3/U < x n < 1 satisfies all of these criteria since it contains only a single non-zero bit and leaves x.. in the range [-l/h, +l/U) . It is shown in the next section that such a choice of initialization leads to a convergent algorithm. An example illustrates the flow of the algorithm. Example : With a multiplicand of 0.5 and a multiplier of 0.6, the correct product is 0.3. For this multiplier, m = l/2, so that x = 0.1 and p = 0.25. in TABLE 4 Vl \+i ~^~ 1 1 0.10000000000000 0.25000000000000 2 1 -0.02500000000000 0.31250000000000 3 -0.02500000000000 0.31250000000000 1+ -1 0.00625000000000 0.29687500000000 5 0.00625000000000 O.296875OOOOOOOO 6 1 -0.00156250000000 0.30078125000000 7 -0.00156250000000 0.30078125000000 8 -1 0.00039062500000 0.29980468750000 9 0.00039062500000 0.29980468750000 10 1 -0.00009765625000 0.30004882812500 11 -0.00009765625000 0.30004882812500 12 -1 0.00002441406250 0.29998779296875 13 0.00002441406250 O.29998779296875 11+ 1 -0.00000610351563 0.30000305175781 15 -0.00000610351563 0.30000305175781 16 -1 0.00000152587891 0.29999923706055 17 0.00000152587891 0.29999923706055 18 1 -0.00000038146973 0.30000019073486 19 -0.00000038146973 0.30000019073486 20 -1 0.00000009536743 0.29999995231628 21 0.OOOOOOO95367 43 0.29999995231628 22 1 -0.00000002384188 0.30000001192093 23 -0.00000002384188 0.30000001192093 24 -1 0.00000000596046 0.29999999701977 25 0.00000000596046 (Continued) 0.29999999701977 1+2 TABLE h (Continued) \ 26 1 27 28 -i 29 30 i 31 32 -l 33 3^ 1 35 36 -1 37 38 1 39 1+0 -1 Vi -0. 0000000011+9012 -0.0000000011+9012 0.00000000037253 0.00000000037253 -0.00000000009313 -0.00000000009313 0.00000000002328 0.00000000002328 -0.00000000000582 -0.00000000000582 0.0000000000011+6 0.000000000001^6 -0.00000000000036 -0.00000000000036 0.00000000000009 fk+1 0.30000000071+506 0.30000000071+506 0.2999999998137^ 0.2999999998137^ 0.30000000001+657 0.30000000001+657 0.29999999998836 0.29999999998836 0.30000000000291 0.30000000000291 0.29999999999927 0.29999999999927 0.30000000000018 0.30000000000018 0.29999999999995 Thus the product generated by the algorithm in 1+0 steps is P M+1 = 0.29999999999995 -13 which differs from the correct product by 0,5 x 10 . The error bound, -12 derived in the next section, is 0.3*+ x 10 for M = 1+0. It may be observed that the sequence of values of s takes on a peculiar pattern for this multiplier: 01010101. To dispel any inference that such a pattern exists for all multipliers, let us change the multiplier slightly and repeat the example. Example : With y = 0.5 and x = 0.6l, p = O.305. Also, m = l/2, x.. = 0.11, P x = 0.25. h3 TABLE 5 Vl k+1 ■mm 1 1 0.11000000000000 0.25000000000000 2 1 -0.01500000000000 0.31250000000000 3 -0.01500000000000 0.31250000000000 1+ -0.01500000000000 0.31250000000000 5 -1 0.00062500000000 0.30468750000000 6 0.00062500000000 0.30468750000000 7 0.00062500000000 0.30468750000000 8 0.00062500000000 0.30468750000000 9 0.00062500000000 0.30468750000000 10 1 0.00013671875000 0.30493164062500 11 0.00013671875000 0.30493164062500 12 1 0.0000146481+3750 0.30499267578125 13 0. OOOOII+6I+8I+375 0.30499267578125 11+ 0.00001464843750 0.30499267578125 15 1 -0.00000061035156 0.30500030517578 16 -0.00000061035156 0.30500030517578 IT -0.00000061035156 0.30500030517578 18 -0.00000061035156 0.30500030517578 19 -0.00000061035156 0.30500030517578 20 -1 -0.00000013351440 O.305OOOO667572O 21 -0.00000013351440 O.305OOOO667572O 22 -1 -0.00000001430511 0.30500000715256 23 -0.00000001430511 0.30500000715256 24 -0.00000001430511 0.30500000715256 25 -1 0.00000000059605 (Continued) 0.30499999970198 kk TABLE 5 (Continued) *k+l 26 0.00000000059605 27 0.00000000059605 28 0.00000000059605 29 0.00000000059605 30 1 0.00000000013039 31 1 0.00000000013039 32 1 0.00000000001397 33 0.00000000001397 34 0.00000000001397 35 1 -0.00000000000058 36 -0.00000000000058 37 -0.00000000000058 38 -0.00000000000058 39 -0.00000000000058 4o -1 -0.00000000000013 0.30499999970198 0.30499999970198 0.30499999970198 0.30499999970198 0.30499999993481 0.30499999993481 0.31+099999999302 0.3^099999999302 0.34099999999302 0.30500000000029 0.30500000000029 0.30500000000029 0.30500000000029 0.30500000000029 0.30500000000006 4.3 Error bound Given an initial operand x = x in the range [l/2, l) and the selection rules listed in the first section for the choice of the set of constants, one may produce a bound on x^ , and ultimately an error bound for the multiplication algorithm. It has already been noted that, with the previously indicated choice of m X x € [-1/4, +1/4). Also, u^ = 2x e [-1/2, +1/2) ^5 Following the example of the division algorithm, one may construct an induc- tive error bound proof. Hypothesis : For k > 1, x^ e [-3/8 • 2"^ k_1 % + 3/8 * 2~( k ~ 1 '). The hypothesis has been shown explicitly to be valid for k = 1. Proof : For some k > 1, x^ e [-3/8 ' 2"^ k_1 % +3/8 • 2"^"^) or x k e [-3A ' 2~ k , +3/U • 2" k ). For all k > 1, \ - ^ S k 2 -k 1 If x.< -3/8 • 2 s, = < otherwise k ] 1 if x k > +3/8 • 2 _k . The first range, maps onto or [-3 A • 2" k , -3/8 • 2" k ) [-3A ' 2 _k , -3/8 • 2" k ) + 1/2 • 2" k [-1A • 2" k , +1/8 . 2" k ), which lies within the desired range of [-3/8 • 2 _k , +3/8 • 2 _k ) The second range, [-3/8 • 2"\ +3/8 • 2" k ) maps onto itself, and thus lies within the desired range. h6 The third range, [+3/8 • 2~\ +3A ' 2" k ) maps onto [+3/8 • 2~ k , +3 A * 2" k ) - 1/2 • 2" k or [-1/8 • 2~ k , +l/k • 2" k ) again within the desired range. Hence, x^ e [-3/8 • 2~ k , +3 /8 . 2" k ) for a 11 k > 1. Furthermore, l^j < 3/k for all k > 1 and |uj < 1 f or all k, since u e [l/2, l) Q.E.D. Therefore, ? Vl' ^ 3/8 • 2 " M . The error in the multiplication algorithm, yX M+l is thus less, in magnitude, than 3/8 • 2 and the algorithm is capable of producing M correct product bits in M steps beyond the initialization. h.k Experimental estimate of speed The Monte Carlo estimate of the mean probability of a zero is 0.68^4, with a corresponding shift average of 3. IT. U7 J+.5 Implement at i on The recursion formulas necessary to implement the multiplication algorithm are those given in (^--l) and (U-2), repeated here for reference purposes. \ + l ■ Pk + ^k 2 ^ ^-2) Except that a. special register is required to hold the multiplicand y through- out the process, the hardware required to perform multiplication is virtually the same as that required for division. A block diagram indicating the flow of information required in the implementation of (^-l) and (h-2) is shown in Figure 2. k.6 Concluding remark A multiplication algorithm using an implicit redundant recoding and yielding a shift average of approximately three has been proposed. It has been shown that this algorithm is compatible with the division algorithm in that both require virtually the same hardware configuration. k8 o if) ' + I § I I 1 I I 2 I i i l OM l II j£ tt & 111 *^~ UJ cc • 1 i it i i" 1 1 I I o <0 r\r-n 1 sie ' - 1 2g i IS I l_4_J * (0 II JC >*_/ >~ UJ a a < 1 z 2 ONE-BIT SHIFTER REGISTER a o •H -P co o •H rH ft •H -P o B CO u CO •H <0 X a o H pq OJ M l 2 >l I 58 I i si i I S ! + en 1 CVI CO >> P* CVJ + ft II II rH + p^ rH + ft 5. THE ALGORITHM FOR NATURAL LOGARITHM 5.1 Basic algorithm The algorithm to compute the natural logarithm of an operand X > is very similar to the division algorithm discussed in Section 3) in fact, the normalization process is identical. Rather than forming the quotient in the second adder circuit, one merely forms the sum of a set of precomputed constants drawn from a register- speed read only memory [ROM], the ROM being the only piece of hardware, in addition to that required by division, required to implement this algorithm. Given an operand X = x • 2 , x e [l/2, l), one may write, in X = in x + a in 2. The second term in this expression may be evaluated in one multiplication cycle time. Let N N a = 2 Z m , n^ = s k 2" , s fc = {0, 1} i=0 2 -N 2 +N where N determines the dynamic range of the machine, (2 ,2 ). Typically N < 10; if N = 10 the dynamic range of the machine is (2 ,2 ) or roughly (10 , 10 ), sufficient for most problems. Since a contains relatively few bits (that is, normally N is considerably smaller than M), it is not necessary to recode a in order to speed up this multiplication; there is no advantage to completing this multiplication before the computation of in x is accomplished. A standard (n on -redundant) multiplication process suffices. a, in 2 = 2 [(in 2) Z m. ]. i=0 X h9 50 Letting k p k+1 = (in 2) E m. 1=0 the necessary recursion may be written, *W ■ p k + s k (to 2)2 ~ k > 5 ■ ° which is identical to .recursion relation {h-2) with y = ,0n 2. Thus, one adder circuit configuration suffices to evaluate a. tin 2, providing only that the value of £n 2 is stored in the read only memory. Simultaneously and in- dependently, the logarithm of the fractional part may be evaluated using two additional adder circuit configurations; three such configurations are re- quired in all. A final addition is required, so that it takes one addition cycle time longer to compute the natural logarithm than to perform a division. The computation of a In 2 requires no further comment, so one may concentrate on an algorithm to compute Zn x, x e [l/2, l). As in the division algorithm of Section 3> one multiplies the operand x by a sequence of constants [£.}, conceptually dividing x by the same continued product. M x n z. i=o x x = M IT £. 1=0 x where Z, - d, = 1 + l/2 s 2 , 1 < k < M, Z~ = d._, the same constants as k k k ' — ~ those chosen in the aforementioned division scheme. Then, M M Zn x = Zn(x IT £ . ) - in( II &.) i i i=0 1=0 M M = Zn(x II Z.) + Z {-Sin £.). 1=0 i=0 x 51 A power series expansion for Zn 6 is Zn 6 = (6 - 1) - 1/2 (8 - l) 2 + 1/3 (6 -l) 3 - l/k {& -l) k + . [0 < 6 < 2] It is known from Section 3.3 that , M -M x^ - 1| = |x IE i - 1| < 3/8 • 2 M . +x i=0 x Then to machine accuracy (M bits), M Zn (x IT I. ) = i=0 1 and M Zn x = E (-£n £. ). 1 i=0 Providing that the values { -Zn Z .} are precomputed and stored in the read only memory, one may form Zn x by performing the normalization process of the division algorithm of Section 3 to choose the set {Z.) with one adder configuration while forming the summation of the appropriate stored constants with a second adder configuration. Note that when s = 0, Z, =1 and Zn Z, = 0, so that no addition need be performed in the either adder. 5.2 Choice of initialization step Since the normalization process is identical to that of the pro- posed division, it suffices to choose Z n = d . The value of Zn 2 required by this choice was already to be stored in the memory. An example is provided to illustrate the similarity of this algorithm to that for division. 52 Example : Given x = 0.6, £n x = -O.5IO82562376599. As indicated below, the algorithm produces an approximation to In x given by -O.5IO825623766I+U which -12 is in error by 0.1+5 x 10 matching the error bound derived in the first section. In the table below, L represents the approximation to £,n x and x v is the operand being normalized to unity. TABLE 6 Vl 1 1.20000000000000 2 -1 0.90000000000000 3 1 1.01250000000000 h 1.01250000000000 5 1.01250000000000 6 -1 0.99667968750000 7 0.99667968750000 8 1 1.00057296752930 9 1.00057296752930 10 1.00057296752930 11 -1 1.000081+ 1+0651000 12 1.000081+ 1+0651000 13 1. 00008 1+1+0651000 111 -1 1.00002336620198 15 -1 0.99999281+ 791078 k+1 -O.6931V718055995 -O.U05I+65IO810816 -O.5232U81U376I155 -O.5232I+81I+376I+55 -O.5232U81U376U55 -O.507U99786796I+I -O.507I+99786796I+I -0.51139842721207 -0.511398^2721207 -0.511398U2721207 -O.51091002671396 -O.51091002671396 -O.51091002671396 -O.5108I+898969I+99 -O.510818U7165119 (Continued) 53 TABLE 6 (Continued) k ^__ Ik "k+1 k+1 16 0.99999281+79 1078 -O.5IO818I+7165H9 17 1 1.0000001+772507^ -0.510826l010l662 18 1.0000001+7725071+ -0.51082610101662 19 1.0000001+7725071+ -0.510826l010l662 20 1.0000001+7725071+ -0.51082610101662 21 -1 1.0000000001+1336 -O.5IO82562I+I7935 22 1.0000000001+1336 -0.510825621+17935 23 1.0000000001+1336 -0.510825621+17935 2fc 1.0000000001+1336 -0.510825621+17935 25 1.0000000001+1336 -0.510825621+17935 26 1.0000000001+1336 -0.510825621+ 17935 27 1.0000000001+1336 -0.510825621+17935 28 1.000000000U1336 -O.5IO82562I+ 17935 29 1.0000000001+1336 -O.5IO82562I+I7935 30 1.0000000001+1336 -O.5IO82562I+I7935 31 -1 0.9999999999^769 -0.51082562371368 32 0.9999999999^769 -O.5IO82562371368 33 0.9999999999^769 -O.5IO82562371368 3^ 1 1.00000000000590 -0.51082562377189 35 1.00000000000590 -0.51082562377189 36 1.00000000000590 -0.51082562377189 37 -1 0.99999999999863 -O.5IO82562376U61 38 0.99999999999863 -O.5IO82562376I+61 39 1 1.0000000000001+1+ -O.5IO825623766I+I+ 1+0 1.0000000000001+1+ -O.5IO825623766I+I+ 5h 5.3 Error bound It has been shown that the algorithm provides M correct bits in the value of #n x, neglecting machine round-off. 5 .h Experimental estimate of speed The value of #n X can be computed in one addition cycle time beyond that required for division, or approximately 1 + M/3 addition cycle times, on the average. 5 .5 Implementation The implementation of the evaluation of a |n 2 is discussed in Section 5.1. The recursion relation for the normalization process is identi- cal to (3"l) for division. Vi = 2 \ + s k + s k\ 2 " k (5 - 1} The second recursion relation is simply the continued summation of a set of stored constants. L k + 1 " .^ ( " in 'l> k-1 = Z (-£n H.) + (-in z ) =L k+ [-in (1 + s k 2" (k+l) )] (5-2) A block diagram indicating the flow of information is shown in Figure 3. That only a portion of the set of precomputed constants {-in (l + s 2 ^ )} need be explicitly stored in the ROM is easily seen. Consider again the power series expansion r — i I £ I I ! I I d I I I I i i I 1 log I i ** i i i i — 1 I S I I s I I i I i I i i _i i _-i »-o: !fE! 1 ?!' i — i i « i i is i i § i J I + (0 . o: uj H 0) * o-J UJ K -p •H CO WD O o t»D CO •H O o H PQ H + + X 56 £n (1 + &) = 5 - 1/2 8 2 + 1/3 6 3 - l/U 8 + ... [-1 < 8 < +1] where 6= s^^K If k > [&*],* £n (1 + 6) = 8 = s k 2 _(k+l) to machine accuracy and the constants need not actually be stored. 5.6 Concluding remark An algorithm for computing the natural logarithm has been proposed; the algorithm is virtually identical to the proposed division algorithm, except that a small ROM is required. The constants that must be precomputed and stored are £n 2 -m (i± 2 - (k+1) ) k-i, a, ..., [*p]. A listing of approximate decimal equivalents of the required precomputed constants is given in Appendix B. An algorithm for logarithm to any positive integer base could easily be formulated by the same procedure; new sets of precomputed constants would be required. 'Ml2l * The largest integer not greater than (— p ) . 6. FIRST ALGORITHM FOR SQUARE ROOT 6.1 Basic algorithm In Section 2.3 it was shown that four normalization square root algorithms are known; two of these are studied in this paper. The algorithm studied in this section is a multiplicative (continued product) formation of the quantity y/~\pT. The process is effectively the same as the proposed division algorithm, the only difference being that the multiplier constants {r.} are the squares of the multiplier constants chosen for division. Let M x IT r . 1=0 x X ~ M IT r. 1=0 x where X € [1/2, 1) r. = (1 + 1/2 s, 2 *)% 1 < k < M, b. = {1, 0, 1} k ' > - - ' k and {r.} is chosen such that M x IT r. = 1. 1=0 x Then, M TI r. S - i=0 x x and M y n (1+ X/2 s 2" 1 ) S-t- . i=0 ^ x The recursion relation for the normalization of the operand x is complicated by the fact that the multipliers have three terms rather than two. 57 58 k x. , = x. IT r. x^ = x K+l . _ 1 i=0 = (x n r )(1 + 1/2 s 2 K r U i=0 X k R. , = R^ TI Vr. R^ = y k+1 . _ v l J i=0 k-1 = (R H -V7T)(1 + 1/2 s 2" K ) U i=0 x K =\ (1+ s k 2"( k+1 )). (6-2) The multiplier constants for the division algorithm are d,=l+ 1/2 s. 2" k k k whereas those chosen in this square root algorithm are , -k 2 -2(k+l) r. = 1 + s n 2 + s. 2 v k k k The dominant terms (other than unity) in the multipliers differ by a factor of two; for this reason, the comparison constants must also differ by a factor of two to achieve the same recoding. A simple change in notation or a comparable change in the implementation transformation can overcome this minor discrepancy; the latter is chosen as being conceptually neater; in practice, the two are the same. Thus, in terms of the partially normalized operand x, , the selection rules are chosen as follows. r k = (1 + 1/2 s k 2" k ) 2 s k = / 1 1 ° 59 if x k < 1 - 3/k otherwise if x^ > 1 + 3/U .-k The implementation transformation in the division algorithm is given by \ ■ 2 K - « whereas the implementation transformation in this algorithm is taken as so that \vl\ < 1, as is shown in Section 6.3. The selection rules, in terms of vl, may thus be written in exactly the same form as that for division: ' 1 if u^ < -3/8 ^ otherwise 1 if u^ > +3/8, 6.2 Choice of initialization step The initialization step comparable to that chosen for the division 2 algorithm is r = d , that is, 0' r o = U if l/U < x Q < 3/U 1 if 3/U < x Q < 1 which leaves x = x r in the range [3/U, 3) and u = x-, - 1 € [-l/U, 2), outside the desired range, (-1, +l). Merely changing the initial comparison constant provides a more acceptable range. r = U if l/U < x Q < 1/2 1 if 1/2 < x Q < 1. 6o Thus, x e [l/2, 2), u e [-1/2, l). It is shown in the next section that this choice of initialization leads to a convergent algorithm. Example : An example computing 0.5/ V0.6 = O.6I+5I+9722I+3679O is tabulated below. It may be seen that the algorithm produces an approximation which is -12 -IP in error by 0.11 x 10 , well within the error bound of 0.68 x 10 " " derived in the next section. TABLE 7 1: 1 1 2 3 1+ 1 5 6 7 8 1 9 10 -1 11 12 13 Ik 1 15 0.93750000000000 0.93750000000000 0.93750000000000 0. 9970092773^375 0.9970092773^375 0.9970092773^375 0.9970092773^375 1.00090761+81219^ 1.00090761+812191+ 0.999930U3788180 0.999930^3788180 0.999930U3788180 0.999930U3788180 0.999991^6972357 0.999991^6972357 Vi 0.62500000000000 0.62500000000000 0.62500000000000 0.61+1+53125000000 0.6I+1+5 3125 000000 0.61+ 1+53125000000 0.61+ 1+5 3125 000000 0.61+579010009766 0.61+579010009766 0.6^51+71+7729003'+ 0.61+5I+71+7729003U 0.6^51+71+77290031+ 0.6^51+71+77290031+ 0.61+51+91+1+7122715 0.6I+5I+9UI+7122715 (Continued) 6l TABLE 7 (Continued) k _^_ ^k \+l k+1 16 0.999991^6972357 0.61+51+91+ 1+7122715 IT 1 0.99999909906757 0.61+51+9693359315 18 0.99999909906757 0.61+51+9693359315 19 0.99999909906757 0.61+51+9693359315 20 1 1.00000005271+126 0.61+51+9721+139007 21 1.0000000527^126 0.61+51+9721H39007 22 1. 00000005 271+126 0.61+51+9721+139007 23 1.00000005271+126 0.61+51+9721+139007 2U -1 0.99999999313661 0.61+51+9722215275 25 0.99999999313661 0.61+51+9722215275 26 0.99999999313661 0.6I+5 1+9722215 275 27 1 1.00000000058719 0. 61+51+97221+557^2 28 1.00000000058719 0.61+51+97221+5571+2 29 1.00000000058719 0.61+51+97221+557^2 30 1.00000000058719 0.61+51+97221+5571+2 31 -1 1.00000000012153 0.61+51+97221+1+0713 32 1.00000000012153 0.6U51+97221+1+0713 33 -1 1.00000000000511 0.61+51+97221+36955 3^ 1.00000000000511 O.6I+5 1+9722 1+3695 5 35 1.00000000000511 0.61+51+97221+36955 36 1.00000000000511 0.61+51+97221+36955 37 1.00000000000511 0.61+51+97221+36955 38 -1 1.0000000000011+8 0.61+51+9722U36838 39 -1 0.99999999999657 0.61+51+97221+36779 1+0 0.99999999999657 0.61+51+97221+36779 62 6.3 Error bound It has already been shown that x Q e [lA, 1), u Q e [-3/8, 0) x ± e [1/2, 2), u^ e [-1/2, l). Next let us find the range of x . The selection rule for the first (k = l) step is given by, 8 - / r = 1 + 1/2 s + l/l6 s^ 1 if 1/2 < x ± < 5/8 if 5/8 < x < 11/8 I 1 if 11/8 < x < 2. The first range, [1/2, 5/8) maps onto (1 + 1/2 + 1/16) • [1/2, 5/8) or [25/32, 125/128). The second range, [5/8, 11/8) maps onto itself. The third range, [11/8, 2) maps onto (1 - 1/2 + 1/16) • [11/8, 2) or [99/128, 9/8). 63 Hence, x 2 e [5/8, 11/8), u 2 6 [-3A, +3A) so that x lies in the middle range of the selection rules for s and |u J < 1. It is easy to show that X M+1 " X l ^ 3 / U * 2 " > l\l < 1 for a11 k ' Hypothesis : For k > 2, x, € [1 - 3A ' 2~^~ 1 \ 1 + 3/1+ • 2~^ k ~ 1 0. The hypothesis has been shown to be true for k = 2. The induction proof for the behavior for k > 2 is virtually identical to that in Section 3.3 fo r division, except for the perturbation caused by the second order term in the multiplier. Proof : For some k > 2, x, e [1 - 3/2 1 2~ k , 1 + 3/2 • 2~ k ) . For all k > 2, n -k ., /, 2 -2k r. = 1 + s 2 + 1/k s 2 1 k ' k if x. < 1 - 3A • 2 _k s n = < otherwise k The first range, maps onto 1 if XL > 1 + 3A ' 2" . [1 - 3/2 • 2" k , 1 - 3/k • 2" k ) [1 - 2" k (l/2 + 5 A * 2~ k + 3/8 • 2" 2k ), 1 + 2" k (l/U - 1/2 • 2 _k - 3/16 • 2~ 2k )). But for k > 2, 1/2 + 5A ' 2' k + 3/8 • 2" 2k < 339/512 < 3A 61+ so here x k+1 € [1 - 3A * 2~ k , 1 + 3A • 2" ) The middle range, [1 - 3 A • 2"\ 1 + 3 A • 2" k ) maps onto itself. The last range, [1 + 3A ' 2" k , 1 + 3/2 • 2" k ) maps onto [1 - 2~ k (lA + 1/2 ' 2" k - 3/16 • 2" 2k ), 1 + 2 _k (l/2 - 5A ' 2" k + 3/8 ' 2" 2k )) which clearly lies within the desired range. Thus, -M and Let so that and Then *M + 1 " X l * 3 / U • 2 |uj < 1 for all k. M x = n "V r - i=0 ,2 x X 2 - 1| < 3A • 2" M . :(X + -pr)U --±r I < 3A ' 2" M . 65 Let e be defined by where e represents a relative error in the approximation to 1/ -\Tx; |ej « 1, Then, x (^)|^|<3A- 2 - M or (2+e)|e| <3A ' 2~ M . Since |e| « 1, one may write approximately Id <3/8 -2- M . Thus, X -ft < ^(3/8 • 2 " M ) or |x - 4=| < 3 A • 2" M . Finally, since y < 1, that is, in order to guarantee M correct bits in the value of y/^J~x~, one must perform M 4- 1 steps beyond the initialization. 6.k Experimental estimate of speed The Monte Carlo estimate of the mean probability of a zero is O.669, with a corresponding shift average of 3.0*+. 66 6.5 Implementation Rewriting recursions (6-1) and (6-2) under the transformation xl = 2 (x, -l), one obtains, Vi ■ < 2 \ + V + 2 " k < as k\ + V* $ + 2 " 2k < 1 / 2 S 'V> u Q = l/2(x-l) (6-3) R k + i " E k + 1/2 s k\ 2 " k > R o " y - < 6 - u ' Figures h, 5, and 6 show possible hardware configurations to implement these recursion relations. 6.6 Concluding remark Further comments about this algorithm are included in Section 7.6 where a comparison of the two square root algorithms is made. 67 o + a: cc Ui O* o sS < B _j _) 3 XLU wz U. r 1 I S I I i I I 1 I i + fen ILU W2 Ipl i_^_j Co**?; i a i I § I 1 QJ ' -p o o u I 03 U o § SO CC ■H T3 ^J O o H ■P co Sh •H F>4 I 1 I H + W CM l OJ + co + OJ H OJ J*i CO OJ H XT' OJ i OJ + I OJ ■i w OJ II H + 68 + (X UJ Q Q < I 1 IK I o I -i I I 2 3 U. XUJ I- I 3 CM If If > i I i.'i. s K ZK uj 5 3* Q.O 1 O I § I I B|-| , I ill i i i 0<0 i i i I hoc 00 UJ 1 u. UJ — » r X o CO I 1 I rr I UJ CO I S I i i -p o o u u o u cd ■H CJ O T3 g o ^ I o I il I £ I I I X OJ H r-\ CD •9 H 3 CJ o f>» T3 o £ x to + X OJ + OJ ■* OJ x & X + X OJ + I OJ OJ CC OJ d" H a 69 ^ cr •» uj Jt S o < + _j Jt _j K 3 U. K UJ o Q < _l _l z o & xu (flZ u. o z zv - — 5.u _ 31 + o 2 K ZH 3s Q.O i o hoc 00 UJ 1 u. UJ — z X o (/) -p o o u aJ •H o H Til •H >5>i i f s I i si i i* i l i CO CM H + CM CM to CQ CO ft 0) O OJ + CM OJ H + > CM £ ^ > CM + CM l CM ,* w CM + II 5 K 7. SECOND ALGORITHM FOR SQUARE ROOT 7.1 Basic algorithm Let us consider in this section an additive square root algorithm which can be performed in approximately the same amount of time as the division algorithm and with essentially the same hardware, but which requires a scaling of the comparison constants to achieve a minimal recoding. It is somewhat less general than the algorithm studied in the last section in that the root itself, rather than y/ ~\fx, is evaluated. The normalization process one wishes to carry out is 7 k+l =: 7 k " r k where V^ r. = 1/2 s 2~ k k k j' 1 if 7l , < -c * 2" 1 k s n = ( otherwise k so that 1 if 7 > +c • 2 _k c e [1/3, 5/12) 7 M + 1 = ° M _ Vi = A r i = v^- i=0 One would achieve an asymptotic shift average of three with this recoding, While this is a conceptually neat algorithm, it cannot be carried out in practice in this form since ~\l~x is unknown and the recursion cannot be 70 71 initialized. Rather, one must perform recursions with respect to a known quantity such as \ + l = x. k E r i=0 " X = x where r k = 1/2 s k 2 -k S k = if x n < -c' otherwise if *k^ > +c' ,-k ,-k One must thus find, a suitable relationship between c' and c that yields a minimal recoding. Now, 7 k+1 \ + l ■yfx- R k+l = x - R k+l so \ + i = >W ( V^ + R k+ i) The problem thus becomes one of choosing a convenient value for c' = 2"\/*x"c, c g [1/3, 5/12). As an example, suppose yx = l/2 (x = l/M; then c' = c = 3/8 is the most convenient choice; if yx = 1 (x = l), then c' = 2c = 3A is convenient. Metze solved a similar scaling problem for the SRT division; he showed that at least four regions are needed to cover 3 ^ the range, whose endpoints are in the ratio 2/l, since (5A) < 2, (5A) > 2. 72 To reduce the precision of the comparisons, five regions are recommended, Convenient choices for comparison constants for the various ranges of x are listed in Table 8. TABLE 8 X & Comparison Constant c 1 L V 16 ; [0.500, 0.558) 3 8 L l6» 8 j [0.558, 0.612) 4 16 L 8' l6 ; [0.612, 0.750) 1 2 L l6> 8 j [0.750, 0.935) 5 8 [J,D [0.935, 1) 3 1+ 7.2 Choice of initialization step Let us consider, for a moment, the recursions necessary to imple- ment this algorithm. \+l r k Z r. i=0 x x Q = x X " x o " k-1 Z r. + r. 1=0 x k i k-1 Z rv i=0 " " s k 2 "\ 2 -2(k+l) V -k - w* - #" 2(k+1) (7-1) 73 k R, . = Z r, R n = k+1 . n 1 1=0 =R + s?-( k+1 > (7-2) k k Rewriting (7~l) under the transformation u, = 2 x , chosen to force \\\ < i, V = 2 \-^ s A +s k 2 " M '. u o = 1 ^V (7-3) If it can be guaranteed that R has a zero in bit position (k + 2) then the 2 -fk+2) single bit represented by s 2 can simply be inserted in s R and the value (s,R, + s 2 ) can be shifted, complemented, and added to 2vl. If the initialization constant r is a low precision number, and if radix complement notation is used for negative numbers, s R will indeed have a zero in bit position (k + 2) as desired. This minimizes the delay time caused by the mini-adder. Since the comparison constant for the algorithm is a function of the operand, the initialization constant r is also allowed to be a function of the operand. TABLE 9 r o U' 16 ; 2 _9 16 5 8 3 k tf- I) 8 ; * it> L l6' J) * 1) 7^ It is shown in the next section that these choices for initialization lead to a convergent algorithm. An example is listed below. Example : If x = 0.6, then ~\fx~ = O.77U5966692U1I+8. In kO steps, the algorithm produces an approximation to v x which is in error by 0.6 x 10 , -IP well within the worst case error bound of O.U5 x 10 (derived in the next section) . TABLE 10 1 2 3 k 5 1 6 1 7 8 9 1 10 11 12 1 13 1 Ik 15 16 17 18 19 20 -1 0.03750000000000 0.03750000000000 0.03750000000000 0.03750000000000 0.01381835937500 0.00179^3359375 0.00179^3359375 0.00179^3359375 0.00028285980225 0.00028285980225 0.00028285980225 0.00009377896786 -0.00000077262521 -0.00000077262521 -0.00000077262521 -0.00000077262521 -0.00000077262521 -0.00000077262521 -0.00000077262521 -0.00000003391201 (Continued) k+1 0.75000000000000 0.75000000000000 0.75000000000000 0.75000000000000 O.765625OOOOOOOO 0.773^3750000000 0.773^3750000000 0.773^3750000000 O.77I+I+1U0625OOOO 0.77^lU06250000 O.77I1U1U0625OOOO 0. 77^53613281250 0.77^59716796875 0.77^59716796875 0.77^59716796875 0.77^59716796875 0.77^59716796875 0.77^59716796875 0.77^59716796875 0.77^59669113159 75 TABLE 10 (Continued) JL ~k+l 21 -0.00000003391201 22 -0.00000003391201 23 -0.00000003391201 2k -0.00000003391201 25 -1 -0.00000001082723 26 -1 0.00000000071516 27 0.00000000071516 28 0.00000000071516 29 0.00000000071516 30 1 -0.00000000000624 31 -0.00000000000624 32 -0.00000000000624 33 -0.00000000000624 3k -0.00000000000624 35 -0.00000000000624 36 -0.0000000000062)4 37 -1 -0.00000000000060 38 -0.00000000000060 39 -0.00000000000060 Uo -1 0.00000000000010 k+l 0.77^59669113159 0.77^59669113159 0.77^59669113159 0.77^59669113159 0.77^596676230^3 0.77^59666877985 0.77^59666877985 0.77^59666877985 0.77^59666877985 0.77^5966692^551 0.77U5966692^551 0.77^59666924551 0.77^59666924551 0.77459666924551 0.7745966692U551 0. 77^59666921+551 0.77^5966692i+l87 0.7745966692U187 0.77459666924187 O.7745966692I+IU2 7.3 Error bound In this section it is shown that the error in the approximation to the root is bounded by 2~^ + ', so that M correct bits in the root are produced in M steps beyond the initialization. The first step of the proof consists of producing, by direct com- putation, bounds on the first few approximations to~yx. The proof is then 76 completed by induction. Recall, \ = Vx"" \, 7 = ~\[Te [1/2, 1) Table 9 is then completed to yield Table 11. X /x L U' i6 j [0.500, 0.558) L l6' 8 ; [0.558, 0.612) L 8' 16 ; [0.612, 0.750) L l6' 8 ; [0.750, 0.935) [J. 1) [0.935, 1) TABLE 11 r o y n = vx - r 1 o 2 x n = x - r 1 1 2 [0, +0.058) [°-i*> 9 16 [-0.00U5, +0.0U95) L 256' 256 ; 5 8 [-0.013, +0.125) r x ^ 3 1* [0, +0.185) c°-i|j [-0.065, 0) [■£. 0) Hence, 7;L e [-O.065, +O.I85) By looking at the selection rules for the first (k = l) step, one can see that s, =0, r = 0, independent of x ( a virtue of the initialization chosen) . Then 7 2 e [-O.O65, +O.I85). Hypothesis 7 k l <2 -k for all k. The hypothesis has been shown to be true for k = 0, 1, 2. Assume \y j < 2 for some k and show 77 th Proof : For the k step, r R = 1/2 % 2 -k s k = 1 if x. < -c' • 2 otherwise 1 if x^ > +c* • 2 -k -k where c' = 2^/x c, c e [ 1/3, 5/12). -k — Range 1 : Suppose x n < -c' • 2 so that s, = 1, r n = " k k ' k ,-k 1/2 • 2 . Then, 7 k+l 7 k " r k = 7 k + 1/2 ' 2"\ Clearly the signs of x and 7 must agree, so k k -2~ k < 7n < — k and thus, k+1 1 < 2 "(k+l) -k -k Range 2 : Suppose -c' • 2 < x^ < c' • 2 so that s =0, r = 0. Now, *k k+i " 2yr- 7 2 V^"d-^) c' . 2 -k 7 < 7 k+l' - k 2 V r < l -a^ 78 2-yr . c • 2 _k i < 2 V^d-^) 7 < 5/12 • 2 _k (1 - 2 *) For k > 3, 1 < 8/7 1 - 2" k " so |7 k+1 | < 5/12 ' 8/7 • 2" k <2" (k+l) . -k / -k Range 3 : Suppose x > c' • 2 so that s = 1, r = 1/2 • 2 . Then, 7 k+l = 7 k " r k = 7. - 1/2 ' 2"\ Since the signs of x^ and 7 agree, < 7, < +2~ k k — and thus Hence, for all k, 7 k | - l\ -^1 < s" k . Q.E.D. (k-2) Next it is shown that |u,| < 1 for all k. Recall u. = 2 V 'x,. Since x € [l/k, 1), u e [l/l6, l/k) . Also, 79 M - K - *l < 2" k (- ~ + 2" k + -^) < 2" k (2-yr+ 2" k ) |\| < iA (2Y7+ 2 _k ) < l/k (2 + 1/2) for k > 1. Hence , I -u- J < 1 for all k. 7.^ Experimental estimate of speed The Monte Carlo estimate of the mean probability of a zero is 0.66l, with a corresponding shift average of 2.96. 7.5 Implementation The recursion formulas for implementation are (7 - 2) and (7 _ 3). Figure 7 shows the corresponding block diagram. 7.6 Concluding remark Two square root algorithms have been studied in detail, a multipli- cative scheme in Section 6 and an additive scheme in this section. The multiplicative scheme clearly requires more hardware to achieve a speed com- parable to that of the additive scheme; it cannot achieve equality of speed. It thus appears that the multiplicative algorithm should be discarded in favor of the additive algorithm, and this is probably true in a strictly binary implementation. However, the algorithm of Section 6 offers two re- deeming features : first, the comparison constants need not be scaled; second, the algorithm easily lends itself to a higher radix implementation, which the algorithm of Section 7 does not. 80 I 1 I I. d I I £ I i i ill: I i i o in DC to XUJ wz Us 10 +■ f 1 I fg I i si i I £ I i i H + CJ CM ,W CO + X X CO + AS CVJ CM H AS CO «r + X ii M H + H + X K +-> o o u o c- 81 REFERENCE G. A. Metze, "A Class of Binary Divisions Yielding Minimally Repre- sented Quotients," IRE Transactions on Electronic Computers , EC-11:6: 761-764, December, I962. 8. THE ALGORITHM FOR EXPONENTIAL 8.1 Basic algorithm The exponential e may be evaluated in three multiplication cycle times, two of -which are required to identify a convenient operand which may be normalized to zero. Let X X log 2 e log e 2 e = e (1+f) In 2 = e where X log e = I + f I = integer -1 < f < +1. Then, X I gn 2 f £n 2 e = e e = 2 e where x = f £n 2 -in 2 < x < +£n 2. Thus three basic steps are required: (1) Multiply X by the precomputed and stored constant log^e to identify I, f . (2) Multiply f by the precomputed and stored constant log 2 to generate x. (3) Evaluate e via the normalization scheme discussed below. 82 83 Then, 'j e = e 2 , e e [1/2, 2) The first two of these operations may be performed by the multiplication algorithm of Section k, and require no further discussion. It is the major purpose of this section to formulate an algorithm to evaluate e , |x| < £n 2, The algorithm proposed here is, in some sense, the dual of the algorithm proposed in Section 5 for gn x. Let M M x = x - 0n( II e. ) + gn( H e.) i l i=0 1=0 where e, = 1 + 1/2 s n 2~ k , 1 < k < M. Then, M M x = (x - Z gne.) + gn(n e.) li i=0 i=0 M M (x - Z jne.) 0n( II e. ) i i x 1=0 1=0 e = e e M (x-Z^ne.) M 1=0 • ( H e.). 1=0 The set of multiplier constants is chosen such that M x+ Z (- n e . ) = i i=0 so that, 81+ M X ~ TT e = H e. , i=0 x The required set of precomputed constants, {-£n e.} is exactly the same set required by the algorithm for ?n x. Thus, to evaluate e , two simple recursions are performed. x k+1 = x Q + Z (-in e.) x Q = x i=0 = x k + (- £ n e k ) (8-1) E. 1 = n e. E. = 1 k+1 . _ l i=0 = E k + s k E k 2" (k+l) (3-2) where 1 if x k < -3/8 * 2 -k s n = < otherwise k ^ 1 if x, > +3/8 • 2~ k . V K 8.2 Choice of initialization step The initialization of this algorithm consists of a very small (five value) table-lookup and requires storage of four additional precomputed con- + + n ± 1/2 ± lA stants, namely, e , e 85 TABLE 12 X e An e [|. An2) .1/2 1 2 L U» 2 ; e iA 1 1+ 1 L 2 , u ; e "lA 1 (-£n2, -|) e" 1/2 1 "2 Since E = 1, no multiplication is required to form E = e . Since x = x - in e and #n e contains at most one non-zero bit, x can also be formed quite easily. It is shown in the next section that such an initialization leads to a convergent algorithm. An example of the evaluation of e is given below. Example : If x = 0.6, then e X = 1.82211880039051. The error bound derived in Section 8.3 indicates that, for M = kO, the error should be less than -12 O.69 x 10 . The actual error in the approximation produced by the algorithm -12 is less than 0.62 x 10 86 TABLE 13 1 2 1 3 1+ 5 -1 6 7 8 -1 9 10 11 12 13 -1 Ik 15 -1 16 IT -1 18 19 20 1 21 22 23 2k 25 26 -1 27 28 -1 29 -1 30 k+1 0.10000000000000 -0.01778303565638 -0.01778303565638 -0.01778303565638 -0.002031+67868821+ -0.002031+67868821+ -0.002031+67868821+ -O.OOO0796I+3852UI+ -0.0000796U3852HU -0.0000796U3852U)+ -0.0000796U3852l^ -0.00007961+3852l|l+ -O.OOOOI8606833U7 -0.0000186068331+7 -0.00000331+792799 -O.OOOOO33U792799 0.O0O0OO1+6677655 0.0000001+6677655 0.0000001+6677655 -O.OOOOOOOIOO60U9 -O.OOOOOOOIOO60U9 -0.0000000100601+9 -0.000000010060l+9 -O.OOOOOOOIOO60U9 -0.000000010060l+9 -0.00000000260991 -0.00000000260991 -0.00000000071+727 0.00000000018405 0.000000000181+05 k+1 1.6U872127070013 1.85U811U295376U 1.85U811U295376U 1.85U811U295376U I.82583OOOO95IU I.82583OOOO95IH I.82583OOOO95IH 1.82226392673051 1.82226392673051 1.82226392673051 1.82226392673051 1.82226392673051 1.82215270U 56701 1. 82215270456701 1.82212U90072326 1.822121+90972326 1. 8221179^986838 1. 8221179^986838 I.822II79W6838 1.82211881872192 I.822H881872192 1.82211881872192 1.82211881872192 1.82211881872192 1.82211881872192 1.82211880 51 1+608 1.8221l8805ll+608 1.82211880175212 1.82211880005511+ I.822II88OOO551I+ (Continued) 87 TABLE 13 (Continued) L k+1 31 1 -0.00000000001*878 32 -0.00000000001*878 33 -1 0.0000000000091*3 3>* 0.0000000000091*3 35 0.0000000000091*3 36 1 0.00000000000216 37 0.00000000000216 38 1 0.00000000000031* 39 0.00000000000031* i*o 0.00000000000031* k+1 1.8221188001*7938 1.8221188001*7938 1.82211880037332 1.82211880037332 1.82211880037332 1.82211880038658 1.82211880038658 1.82211880038989 I.822II88OO38989 I.822H880038989 8.3 Error bound The error in the approximation M e = IT e. i=0 x is strongly related to the value of the function itself: l eX -*Wil e XM+1 - llE M+l where M Vi = A e i i=0 Thus, a bound on E^ is required in addition to a bound on x^ . In pro- ducing a bound on r , two preliminary results are helpful. First recall the power series 88 &n(l + &) = 6 - 1/2 5 2 + 1/3 6 3 - l/k 6 + ... [-1 < 5 < 1]. Then, for k = 1, 2, ... |in(l + 2" k )| < |in(l - 2" k )|. (8-3) Next it is shown by induction that |in(l - 2~ k )| < 3/2 • 2~ k , k = 1, 2, .... (8-U) -k Observe that since in(l - 2 ) < 0, statement (8-10 is equivalent to -£n(l - 2" k ) < 3/2 • 2~ k , k = 1, 2, ... or or, exponentiating, in(l - 2 _k ) > -3/2 • 2~ k , k = 1, 2, ... -k 1 - 2~ k > e" 3/2 * 2 , k = 1, 2, ... (8-5) which is in more convenient form for an inductive proof than is (8-U). Proof : Since e ' = 0.^7 < 1/2, (8-5) is true for k = 1. For some k, 1 - 2 " k > e-3/ 2 • 2 " k . Then surely, 1 - 2" k + 1A • 2" 2k > e-3/ 2 • 2 " k or (1 - 2-(*+D)2 > e -3/2 • 2" k 89 or, taking positive roots, 1 . 2 -(^).> e -3/2-2" (k+l) . Q.E.D. Now let us produce a bound on r , For the initialization chosen in the last section, one may easily show by direct computation that x 1 = x Q - in e Q e [-l/k, +l/k) . /•, -, \ Hypothesis : |x_ | < 3/8 " 2 ^ for all k. The hypothesis is true for k = since |x | < £n 2 < $/h and true for k = 1 since |x | < l/k < 3/8. The in- th duction proof is completed by considering the mapping from the k step to st the (k+l) step. -k -k - Range 1 : Suppose -3/U -2 < 3c < -3/8 ' 2 L so that s = 1; then, w = \-«*- 2 " M > or. -3A • 2" k + |in(l-2" (k+l) )| < ^ +1 < -3/8 • 2" k + |in(l-2" (k+l) )|. By (8-10, |in(l-2- (k+l) )| <3A ' 2" k , and by a power series expansion, |in(l - 2- ( * +l) )| =2- (k+l) + l/2 . 2 - 2 ^K ... or |in(l - 2' (k+l) )| > 3/8 • 2" k . Thus, for Range 1, 90 -k K +1 |<3/8-2 Range 2 : Suppose -3/8 • 2~ k < x, < 3/8 * 2~ so that s = 0; then -k K +1 | - 1^1 < 3/8 • z -\ Range 3 : Suppose 3/8 • 2~ k < x^ < 3/U • 2~ k so that s = 1; then X k + 1 = x^ - £n(l + 2~ (k+l) ). From (8-3) and (8-4), Jn(l + 2" (k+l) ) < 3 A • 2" k . But from the power series expansion, m(i + s-< k+1) > 2 " (k+1) - 1/2 • 2" 2(k+1) or ei m(i + 2" (k+l) ) > 1/2 • 2" k - 1/8 ■ 2" k • 2' which for k > 1 yields in(Z.+ 2" (k+l) ) > 3/8 ' 2" k . Thus, 3/8 • 2" k < £n(l + 2 _(k+l) ) < 3A * 2' Therefore, K +1 | < 3/8 • 2" k for this range of x, also. Hence, for k = 0, 1, 2, ..., I\J * 3/8 • 2" k . Q.E.D, 91 Note further that :+i' < IV J 6 so that the error in the approximation to e is less than x M IVll e ( . IT n e i ) 1=0 where l^ + il < 3/8 • 2" M . For any reasonable register length, say M > 10, K+il e < 1.001, so the error may be bounded by or M M M e X - ( E e.)| < 3/8 ' 2" M (1.001) ( H e.) 1=0 X i=0 X e X - V J < 0.376 ■ 2" M E M+1 . By direct computation, M / M IT e. < e 1/2 n (1 + 1/2 s.2" 1 ) i=0 X i=l < e 1 / 2 " (1 + 2-^) 1=1 < 3.31. Thus surely 92 W < l -^ ■ 2""- But since E^ is a close approximation toe , it is substantially less than 3.31: \- *" ;+1 " 1 - 0.376 • 2" M " 1 - 0.376 • 2' M E^ +1 <2(1 + 1/2 • 2~ M ). Therefore, :•: • E^J < 0.376 • 2" M ♦ 2(1 + 1/2 • 2" M ) < 0.753 * 2" M for M > 10. Hence, the performance of M + 1 steps beyond the initialization x suffices to guarantee M correct bits in the approximation to e . Q.k Experimental estimate of speed The Monte Carlo estimate of the mean probability of a zero is O.669, "with a corresponding shift average of 3.0^. J: 8.5 Implementation Making the usual transformation, u^ = 2 x , such that | tjl | < 1 for all k, and rewriting recursion (8-l), one obtains, Vi = 2 ok / „ /-, -(k+l)x\ \+ 2 ( -£n(l + s k 2 0] u Q = x (8-6) with recursion (8-2) remaining unchanged. 93 \ + i - \ + s A 2 " (k+1) > E o = 1 ' E l = V ( 8 "7) A hardware configuration to implement these recursions is shown in Figure 8. 8.6 Concluding remark v An algorithm for evaluating e in three multiplication cycle times has been proposed. The algorithm requires storage of only a few precomputed constants not required by other algorithms; the hardware required for imple- mentation fits within the requirements of other algorithms previously proposed, 9k I 1 I g I i i i i d i i I £ i i i 1 log I i i r-£ — i >E > ■ Bb ■ Co il l — i i — i i s I i I i i i i i i i i I Ik. I I 1 I 5|S| I ■ii 1 -I I 1 I I « I I I % (0 . o z fcv * 5 5. o *- § o I £ I I i I i i i I i 03 •H •P c a; a o & tO -t- (T UJ ♦-' O^ Q O =8 < B _) ZUJ 3 wz U. n I_°_J >- =J>- 4 | Sac ?o Q UJ in 5.2 UJ ^ i £T ■»- jt r 1 I £ I or UJ a o < i p i I I Sg l ill 111 III ^ I I A OJ + H «^_X a y-— >. <** rH i + ■ — i AJ s^^ H 1 + OJ ,* ^ H OJ X a> + + sf II II H H + + ^ P^ 1 u. — UJ UJ QC o 6 CO bD CO •H >d AJ o o H CO E 9. THE ALGORITHM FOR TANGENT (OR COTANGENT) 9.1 Basic algorithm The tangent (or cotangent) of any argument X, < X < 2«, may be formed in two multiplication cycle times beyond an initial range reduction. From the standard identity, tan(X + a) = tan X it is clear that one need only consider arguments in the range < X < it . Further, from the identity, tan(X + jt/2) = -ctn X it is clear that the range may be reduced to < X < tt/2. Hence, the initial range reduction requires storage of the values jt, it/ 2; whether one stores both values explicitly or obtains one as a shifted version of the other is a moot question. The algorithm proposed here to evaluate tan X (or ctn X), < X < it/2, requires the use of the complex exponential, e J = cos X + j sin X where the operator j represents a 90° counterclockwise (positive) rotation in the complex plane. The normalization procedure is conceptually identical to that employed in the division algorithm, except that the multiplier constants are complex numbers. Let M H t. JX jX i=0 x e = e M n t. 1=0 x 95 96 1 where Then, and J

0. k ' k 1 ' ' k 1 M ir t. = ( n It !)e 1=0 i=0 x i=0 1 M M j(X - E cp ) IT t jX i=0 i=0 e = e M TT It. | i=0 X If the multipliers are chosen such that then where M x - E cp. = o i=0 x M e^ X = cos X + j sin X = K n t. i=0 K M n i=0 It. 1 1 i ' Hence, if M III ! S S.I ! + J III! = n t - M+l M+l ° M+l . _ i 1=0 Wlth Vl' Vl rSa1 ' then M+l tan X = R M+1 97 ctn X = % +1 M+l independent of K. For simplicity sake, assume that the tangent is required. The range of the operand may be further limited to < X < jt/U as follows: (1) If < X < it/k, let x = X and , Y , ~ hi+i tan X = tan x = ; R M+1 (2) If it A < X < jt/2, let x = tt/2 - X and R. tan X = ctn x = M+l "Wl Having performed the indicated range reductions, one need only formulate an ix algorithm to compute the real and imaginary parts of e , a final division being required to evaluate the tangent. Four constraints are placed on the set of multipliers: (l) the summation of the angles must approach x; (2) the magnitudes must be non- zero; (3) the continued product of the multipliers must be easy to compute in rectangular coordinates; (h) approximately two-thirds of the multipliers, on the average, should be 1 + jO, if possible. The following choice for the th k multiplier satisfies these constraints: t k = 1 + j 1/2 3^2 -k where if S k = 3^ < -3/8 • 2 otherwise -k if ^ > 3/8 ' 2" 98 k *k+l = x o " A V x o = x * i=0 Observe that one need not compute 1 K = M IT It. I i=0 ' x since it is merely a scale factor which disappears during the final division. With the exceptions of a few additions required for the initial range reductions and the final division, one need only perform the recursions developed below. k x. = x n - Z (p. X_= X K+l . _ Y l 1=0 *k r - n , , -1, -(k+lK = x^ - tan (s k 2 J ) = ^ - s k tan" 1 ( 2 - (k+l) ) (9-D i.e., T k + 1 ■ \ + l + J h.i T o = x + J0 n t. i=o x = (R k+ j I k )(l + j s k 2" (k+l) ) Vi = \-W (k+1) ^ 99 The set of constants, {tan (2 )} , must be precomputed and stored in the ROM. A series expansion tan" 1 5 = 6 - 1/3 & 3 + l/5 6 5 - l/7 6 T + . . . [|B| <1] indicates that only one-third of these constants need be explicitly stored since, for k > [M/3], tan (2 ) = 2 to machine accuracy. 9.2 Choice of initialization step At the cost of storing one additional precomputed constant, one may choose t Q = 1 + j tan(jt/8) so that x = x - jt/8 e [-n/8, +it/8) R 1= l I = tan(it/8), the value tan(jt/8) being stored. An example of the normalization procedure and the step-by-step values of R and I is listed in Table 1^. Presumably the division algorithm is used to compute the ratio L„ n /R„ ., to produce the tangent. ^ M+r M+l Example : If x = 0.6, then tan x = O.68U1368083U169. The algorithm produces R^ = 0.921308710261+29, I = O.6303OI2OO5377U. The ratio is -12 O.68U1368083U183, which differs from the correct value by O.lU x 10 -12 This error is within the error bound of 0.68 x 10 100 + no OO 00 on 00 CM CM CM t- O o o NO ON ON ON ON ON ON ON on ro oo On ON NO NO NO ON LP, LP LP O CM CM CM CM CM o O lp LP LP LTN LTN LTN LP LP CM O o o CM -=f -=t- -=t -=r -sf CO CO -=f -* J" t— D— NO NO NO CM CM CM CM 00 -=f -=f _=T -a- _=f t— [— H H H C— t- -3" J- -=J- NO CO CO CO CM CM CM CM CM CM CM CM On ON ON CM - vo VO V£> LTN LTN LTN LfN LfN LfN LfN LfN LfN LTN H H oo CO OO o CO OO t- b- t~- H H H CO OO CO CO ao OO oo en on CM CM CM On On vo VO VO t- t- t— LfN LfN LfN LfN LfN LfN LfN vo vo _=(■ _=r _=r C— C~- en en on vo vo VO CM CVJ CM CM CM CM CM CM CM vo vo vo -=f VO VO j- -d- -=t ON ON ON O o O O O O O o O CM CM CM H t— t— H H H o o O H H H H H H H H H H H H VO VO t- t- t— t- t— C— t— C— C- t- t— t- t- t— t— C— t— t- W CO CO oo oo OO CO CO CO OO OO OO OO OO CO oo oo OO OO OO OO .-3 O o o O o o o o o O O O O o o o o O O O 1 H on on on on on on on on on on on on on on en on en en on on + H H H H H rH H H H H H H H r-t H H H r-\ H H Eh r^ CM CM CM CM CM CM CM CM CM CM CM CM CM CM CM CM CM CM CM CM K ON ON ON ON On ON On ON ON ON ON ON On ON On ON ON On ON ON CO ao t- t— t- H H H On ON On ON ON ON ON CM CM o o o LfN LTN o o o LfN LfN LfN H H H H H H H On On H H H -J- -3" o o o o o o On ON On ON ON ON ON H H o o o ON ON H H H _=1- -3- _=!- O O O O O o o o O o o o o O LfN LfN LfN ON On ON O O O O O o o o O o o o on on vo vo vo o o o O O O O O o o o o o o o LTN LTN o o o o o o O O O O o o o o o o o o o O o o o o o o O o O O o o o o o o o o o O o o o o o o O o O O o o o o o o o o o O o o o o o o O o O o o o o o o o o o rH o O o o o o o o O o O o o o o o o o o o + o o o o o o o o O o O o o o o o o o o o M o o o o o o o o O o O o o o o o o o o o t o o o o o o o o o o o o o o o o o o o o o 1 o 1 o o o o 1 o 1 o o 1 o 1 o 1 o 1 o 1 o o o 1 o 1 o o o H H I M CM CM en -=r LfN VO >- CO ON O H CM en _=!- LfN vo r— oo ON o CM CM CM CM CM CVJ CM CM on on on en en on on on on en -=r 102 9.3 Error bound -k It is first shown that, for k > 2, |xj < 3/h • 2 , and that if u, = 2 x_ then | u. | < 1 for all k. Recall that the range reductions resulted in |x n | < n/h so that |u n ] < 1. Also, since |x | < fl/8, |u | < 1. Consider the choice of t • t 1 = 1 + l/U s cp, = s, tan (lA) = O.2UU98 s. 1 where 1 if x, < -3/l6 s = < otherwise 1 if x > 3/l6. Range 1 : Suppose - n/Q < x < -3/16 so that s = 1. Then X 2 = X l " ^1 = x ± + tan _1 (l/U) so that |x 2 | < 0.16 < 3/16 ,| < 0.6^ < 1. u. Range 2 : Suppose -3/16 < x < 3/16 so that s 1 = 0; then |x | = |x 1 | < 3/16 and I u 2 I < 1. Range 3 : Suppose 3/16 < x < it/8 so that s 1 = 1. By symmetry, |x 2 | < 3/l6, l u 2 l < 1 - 103 Hypothesis : For k > 2, |xj < 3/U • 2~ k , |uj < 1. Proof : The hypothesis has been shown to be correct for k = 2. The proof is completed by induction. Range 1 : Suppose -3/U • 2 < x < -3/8 • 2~ . Then, "hi ■ \ + *-"V (k+1) ) and -3/U. 2- k + tan- 1 (2- (k+l) ) < x^ < -3/8 • 2~ k + tan'V^"^) . Now, tan'V (k+l) ) = 2" (M) - 1/3 • 2- 3 < k+1 > + l/ 5 • 2" 5(k+l) - ... so 2" (k+l) - 1/3 • 2" 3(k+l) < tan-V (k+l) ) < 1/2 • 2" k . Since k > 2, or Thus or 2" k (l/2 - 1/1536) < tan" 1 (2" (k+l) ) < l/2 • 2 767/1536 ' 2" k < tan _1 (2" (k+l) ) < l/2 • 2~\ 2" k (-3A + 767/1536) < x k+1 < 2 -k (- 3 /8 + 1/2) -385/1536 • 2" k < x k+1 < 1/8 • 2"\ Hence 10k here. Range 2 : Suppose -3/8 • 2~ < x, < +3/8 • 2~ . Then s = and Ix, I < 3A ' 2" (k+l) Range 3 : The proof for this range follows that above by symmetry consider- ations. Hence, |x | < 3/U ■ 2~ ; also |uj < 1. Finally, |x„ 1 | < 3/8 ■ 2" . While the error in the normalization process has been bounded by -M 3/8 • 2 , it is not clear that the values of cos x and sin x are known to such precision. An approximate error bound on the error in tan x is developed below. First observe that jx __ JX M+l/_ . _ n e = K e (R^ + 3 3^) so that, equating real and imaginary parts, K Vl = C ° S(X " W K W = sin(x " W' where K R.. , is the approximation to cos x and K L,, . is the approximation M+l ^ M+l to sin x. The error in the approximation to tan x is thus given by E^ = tan x - tan (x - x. _ ) tan x M+l tan x - tan x^ 1 = tan x - *M+1 1 + tan x tan x^ .. 2 tan x,, n sec x 1 + tan x tan *M+1 105 Since < x < n/k, 1 < sec x < 2 and < tan x < 1. Therefore, ,- 2 tan l^ll 1 tan x 1 1 - tanjx^^^ | But |xj. 1 | < 3/8 • 2~ M , so, to first order, |tan x^ _|< 3/8 • 2~ M and l E tan ,1 2 3A • 2' M . When the value of X exceeds it/U, x = jt/2 - X, and one computes ctn x rather than tan x. The analogous error bound is given by 1 + ctn x ctn x^ E , = ctn x + - ctn x ctn x - ctn 2 CSC X *M+1 ctn x - ctn x M+1 2 tan x^ esc 2x tan x^ - tan x Therefore, to first order, 2 1 x„ T _ I esc 2x ctn x 1 tan x - x.„ n 1 M+l 1 It may be observed that as x approaches zero (X approaches «/2) the error increases without bound; such behavior is to be expected since tan X itself increases without bound. If x » |x M -,\, then the error may be approximated by 2 E ctn x I = Ivil csc x > x>> IVil which is seen to approach 3/h • 2 as x approaches n/h, matching the error bound for the tangent. 106 If one wishes to achieve a small error in the case where X -»jt/2, (x -*0), it may be best to turn to a power series expansion such as ctn x = l/x - 1/3 x - I/U5 x 3 - 2/9^5 x 5 - ... 2n 2n-l Ts^Ti — x [|x| <«] where B is a Bernoulli number. When x is very small, the first few terms in such a series suffice to yield an acceptably small error. For example, if x < ~\/l/3 * 2~^ M+ - l -<', then ctn x = l/x to machine accuracy. Similarly, if the tangent of X is required and X is sufficiently small, it would be better to use a power series expansion here also. tan X = X + 1/3 X 3 + 2/15 X 5 + [|X| < jr/2] If X < ~\/3 • 2 , then tan X = X to machine accuracy. 9.h Experimental estimate of speed The normalization of the operand x and the parallel evaluation of the approximations to cosine and sine require approximately one multiplication cycle time. The Monte Carlo estimate of the mean probability of a zero for this process is O.653? with a corresponding shift average of 2.90. In addition, as many as three range reduction subtractions are required to obtain x. A final division to evaluate either tan x or ctn x is also required. Hence, an average of slightly more than two multiplication cycle times is required to evaluate tan X. 107 9.5 Imp leme nt at i on Figure 9 shows a hardware configuration to implement recursion relations (9~l), (rewritten in terms of u= 2 x ), (9-2), and (9-3). 9.6 Concluding remark An algorithm for computing tan X (or ctn X) in approximately two multiplication cycle times has been developed. It has been shown that, for certain values of X, it may be advisable to consider a power series expansion as an alternative to minimize the error. 108 o % if). -p c 1) (3D § t-tc 0E ■p Ul H 1 u. UJ ae H 3 CM OJ w ^ d* H OJ h bD aJ •H o o H pq 1 + 10. THE ALGORITHM FOR COSINE AND SINE 10.1 Basic algorithm The algorithm for computing the cosine and sine of an argument X, < X < 2jt, differs from the algorithm for tangent in only two respects. First, the constant K must be computed since the cosine and sine are ex- plicitly required, rather than merely their ratio; second, the final division is unnecessary. The chief concern here is thus the evaluation of K. Recall that K = n a/i + s 2 2^^ i=0 AM — -* Clearly, if the algorithm is performed non-redundantly with s e [1, 1} , then K can be precomputed and stored. Such a choice, however, guarantees that at every step an addition must be performed, thus slowing down the algorithm considerably. A compromise between Specker's non-redundant al- gorithm and the fully redundant (s e {1, 0, 1}, k = 1, 2, ..., M) algorithm one would like to perform is developed here. Consider an expansion of l/|t | : 1 1 K\ k 1 A/l + si 2- S ^ . 1 . a 2 -(2k +3 ) 2 2 -(Uk + 6) k k 2 -(Uk+T) . . . - . + s 2 + higher order terms. K * Such an algorithm was proposed by Specker. 109 110 For k < (—)—), at least three terms in the expansion are required to repre- sent l/lt, | to machine accuracy; it is preferable to disallow s = and precompute the constant required. However, for k > (— r~) , it is preferable to allow s, = 0; the computation requires a simple correction factor. For (^) < k < <**), M to machine accuracy. For k > (~^) f 1 = 1 K\ to machine accuracy and no correction factor is necessary. In summary, then, the following selection rules are proposed: * In t = 1 + j tan jt/o t =1+^1/2 s k 2 _k , k = 1, 2, ..., M where (1) for k = 1, 2, ..., fij-] + 1, * The constants K 1 = IT |t | i=o (1 + 2 - 2 ( i+1 ))- 1 / 2 [—1+1 k 1 ■ = n (1 + 2 v ') ' *ol 1=0 are precomputed and stored and serve as initial values R, and 1^, respectively. Ill s k = if if *kS < *k > (One addition cycle time is required per step.) 2 (2) for k = [^] + 2, ..., ^] + l, 1 if x k < -3/8 • 2 (0 otherwise 1 if x^ > 3/8 ' 2 -k -k -i-4 2 - (a+3) k k 1 (Two addition cycle times are required if s / 0, leading to an average of two-thirds addition cycle times per step.) (3) for k = [~2] + 2, ..., M, s k as above, = 1. (A single addition cycle time is required if s / 0, leading to an average of one -third per step.) Since roughly one-quarter of the steps are non-redundant, one- quarter are redundant but require two additive cycle times per non-zero step, and one-half are redundant and require one addition cycle time per non-zero step, approximately lA M + lA ' 2 • M/3 + 1/2 • M/3 = 7/12 M addition cycle times, on the average, beyond initial range reduction, if any, are required to compute the cosine and sine. 112 Other choices for the set of multiplier constants {t.) were studied with the conclusion that the set chosen appears to be near an optimum. A brief discussion of the reasoning leading to this conclusion is presented in Appendix B. It may be recognized that the problem faced here is identical 2 to that faced in attempting to introduce redundancy into Voider 's CORDIC vector rotation scheme. 10.2 Example Having chosen t Q = 1 + j tan(jt/8), so that cp n = ir/8, R = K' , I = K» ' , it is possible to carry out an example . Example ; If x = 0.6, then cos x = 0.82533561^90968 and sin x = O.56J+6U2U733950U. The algorithm produces approximations which are in error _V2 -12 by O.k x 10 and 0.1 x 10 , respectively. The error bound, derived in -12 Section 10.3, for both cosine and sine is O.U5 x 10 TABLE 15 ^k+1 0.20730091830128 1 1 -0.0376777^82559 2 -1 0.0866772^972117 3 1 0.02U258U3972522 k 1 -0.00698139370505 5 -1 0.0086U233 1 +915 1 +2 6 1 0.00082999385532 7 1 -0.0030762362766^ 8 -1 -O.OOII23II376OI6 k+1 0.88706U17837978 0.79520567503^72 0.86885568228162 0. 8382^322299^37 0.8212U000959630 0.8301 5091 WU06 0.82579571H9 1 +79 0.82359277522U76 0.82U7005 1 +353106 k+1 0.367U3U01338025 0.58920005797520 0.1+897993W59586 0.5UU102828738U6 0.570297929^5703 0.557^6605^30709 0.56395160832853 0.56717737282538 0.56556879318627 (Continued) 113 TABLE 15 (Continued) k+1 k+1 'k+1 9 -1 -0.00014655157061 0.82525285680565 0.56476342156173 10 -0.00014655157061 0.82525285680565 O.56U763U2156173 11 -0.00014655157061 0.82525285680565 0.564763^2156173 12 -1 -0.00002448125871 0.82532179150388 O.56U662678U805U 13 -0.00002448125871 0.82532179150388 0.56466267848054 Ik -1 0.00000603631940 0.82523902325696 0.56463749139536 15 0.00000603631940 0.82533902325696 0.56463749139536 16 1 -0.00000159307513 0.82533471539075 0.56464378821596 IT -0.00000159307513 0.82533471539075 0.56464378821596 18 -1 0.00000031427351 0.82533579236181 0.564 64221401389 19 0.00000031427351 0.82533579236181 0.56464221401389 20 0.00000031427351 0.82533579236181 0.56464221401389 21 1 0.00000007585493 0.82533565774061 0.56464241078928 22 0.00000007585493 0.82533565774061 0.56464241078928 23 1 0.00000001625028 0.82533562408530 0.56464245998312 24 0.00000001625028 0.82533562408530 0.56464245998312 25 1 0.00000000134912 0.82533561567147 0.56464247228158 26 0.00000000134912 0.82533561567147 0.56464247228158 27 0.00000000134912 0.82533561567147 0.56464247228158 28 0.00000000134912 0.82533561567147 0.56464247228158 29 1 0.00000000041780 0.82533561514561 0.56464247305023 30 1 -0.00000000004786 0.82 533 561 1*88268 0.56464247343456 31 -0.00000000004786 0.82533561488268 0.56464247343456 32 -0.00000000004786 0.82533561488268 0.56464247343456 33 -1 0.00000000001034 0.82533561491554 0.56464247338652 34 0.00000000001034 0.82533561491554 0.56464247338652 35 0.00000000001034 0.82533561491554 0.56464247338652 36 1 0.00000000000307 0.82 533 56l 491144 0.56464247339252 37 1 -0.00000000000057 0. 82533561 U90938 0.56464247339552 38 -0.00000000000057 0.82533561490938 0.56464247339552 39 -0.00000000000057 0.82533561490938 0.56464247339552 ko -1 -0.00000000000011 0.82 533 561 U9096U 0.56464247339515 uk 10.3 Error bound It is shown that the errors in the approximations to both cosine -(M+l) and sine are less than 2 -1/ From the choice of t Q , |x | < tt/8 < tan" (l/2). For i < k < [Hj£] + i, where During these steps, \= S k tan- 1 (2' (k+l) ) 1 if k" \^ < 1 if x k > 0. x, I < tan _1 (2" k ). k 1 — -1 Proof : For k = 1, |x, | < tan (l/2) as indicated above. The inductive proof that |x^| < tan (2 ) is based on the observation that -1/ -kv -1/ -(k+l)x . . -1, -(k+lK tan (2 ) - tan (2 ) < tan (2 J i.e. , tan _1 (2" k ) < 2 tan' 1 ^ ■ 2 _k ) This observation follows from the power series -1 tan" 8=8-1/36+ l/5 6 - l/7 8 + .. [|5| < 1] Since for some k, jx^l 3/8 • 2' During these steps also, |xJ 3- o<* UJ DC K. cr 2 UJ •— ' O* H- Q O < _l _J U. fen ILU tf)Z AC O o I S I I Q i I S I I 1 I cO ^ I 2 I i-tr 1 b_ — UJ to j/ 55 UJ DC + CM H I a) -P i OJ ^ H rH + + X ^ CVJ « CVJ CVJ ^ K H + K H + H 118 10 . k Experimental estimate of speed Counting a non-zero step in the second quarter of the algorithm as two addition cycle times, the Monte Carlo estimate of the mean probability of a zero is O.U65, with a corresponding shift average of 1.88. 10.5 Implementation Except that the control is complicated somewhat by changing operation modes in the midst of the algorithm, and that during the second quarter of the algorithm a "multiplication" of R and I by the factor K K. (l - 2 ) may be required at the completion of a non-zero step, the implementation discussed in Section 9*5 suffices. 10.6 Concluding remark It may be preferable to perform the entire first half of the al- gorithm in the non-redundant mode. If this were done, it would require approximately 2/3 M as compared to 7/l2 M addition cycle times, beyond initial range reduction, to compute both cosine and sine. REFERENCES 1 W. H. Specker, "A Class of Algorithms for Ln X, Exp X, Sin X, Cos X, Tan-1 X, and Cot _ l x," IEEE Transactions on Electronic Computers , EC-lU: 1:85-86, February, 1965. 2 J. E. Voider, "The CORDIC Trigonometric Computing Technique," IEEE Transactions on Electronic Computers , EC-8: 5:330-33*+, September, 1959. 11. THE ALGORITHM FOR ARCTANGENT 11.1 Basic algorithm Formulating a normalization scheme required to force an arbitrarily- large operand to zero in a few simple steps seems a rather formidable task, but, in this case, it is not difficult. A common trigonometric identity is useful. -1 tan X = Tt/h + tan -llx-i \X+1 where X = x • 2 a X 6 [1/2, 1) OL integer. Let 8. / if a < o 1 if a > and consider the following identity. -sa 2 ! (X+l) + j(X-l) r 2 - sq: (x + i) + j 2 - sa (x-D M n t, i=0 " M IT t. 1=0 where the set of multipliers {t.} is of the same form as that used in the cosine/sine and tangent algorithms. Taking logarithms on both sides yields, in(2 _Sa ) + to[-^(X4-l) 2 + (X-l) 2 ) + j tan" 1 ^) M M = l0g(T M + l ) " Z in|t i' " J E ^i + i=0 i=0 119 120 where T. M+l = *M+1 + J I M+l 2 _SQ; (X + 1) + j 2 _SQ; (X-1) M ir t. J i=0 and log(T k+1 ) = MAj^ + I^ +1 ) + j t-"^)- Equating imaginary parts, one obtains, . -i/x-i\ . -l/Wl 8 tan krT = tan I- - L cp. , \ X+1 / \ R M + l' i=0 X Hence, the desired, relation is obtained. tan X = n/k + tan" -1 M+l \\. M E cp. +r i=o >-k Although t, - 1 + j l/2 s. 2 as before, the selection rules for s, are now k k ' k chosen such that L^ t/R** -1 is very nearly zero, so that one may approximate, -1 M tan X = n/k - Z cp. . i=0 X Three things should be noted carefully: (l) the required set of stored constants, {tan (2 )}, is already required by other algorithms in the set; (2) although a division has been indicated conceptually in the trans- formation outlined above, no actual division need be performed- -one merely sets R 00 - 2- flB (Xrt) I 0Q . 2- Sa (X-l) ) which requires less than one addition cycle time, (this is only the first part 121 of the initialization); (3) the algorithm, when implemented in hardware, is virtually" the same as the algorithm for tangent, the only significant dif- ference lying in that the quantity being normalized (indirectly) is the ratio /x-i\ rr-r- , rather than X itself. Comparisons to choose each s are performed with respect to I, , and it is shown that \l^ -i/^™ 1 I i- s very small, so that tan \ R M+1/ «! and the former term can be neglected. Never is the ratio I M -./R^ -, actually computed; rather, it is shown that |l | < 3/8 * 2 , whereas R^ > 3A. As in the other trigonometric algorithms, three recursion relations are required. Vi - \ - \ ^-^ \ + i - \ - \\ 2 " (k+1) <^ where (l if I < -3/8 • 2 ) s, = { otherwise k ^ 1 if I > +3/8 * 2~ k . Understanding of the details of initialization and error bound proof is facilitated by discussing separately two cases: (l) S = 0, < X < 1; (2) S = 1, 1 < X < 00. 11.2 Choice of initialization—error bound: Case 1 The first part of the initialization consists of the choice of S = and the setting of values 122 R 00 " x + x > R oo e [1 > 2) I oo- 1 - 1 ' ' I oo €[_1 ' 0) - The second part of the initialization, for this case, consists of a simple scaling of magnitudes. !3/U if < X < l/k 5/8 if l/k < X < 1/2 1/2 if 1/2 < X < 1. Note that

0, -3A ' 2" k < I k < 3A • 2~ k 3A + f (k) < \ < 1 + 2f(k) where f(k) = 3A S |s, J2" 21 , s =0. i=0 Proof : It was shown above that the hypothesis is true for k = 0. The in- duction proof is completed by considering the three ranges of I, . Range 1 : Suppose ~3/k ' 2~ k < I < -3/8 * 2 _ so that s. = 1 and 123 Vi ■ R k - v" (k+1) L . = L + R t 2" (k+1) k+1 k k then, 3A + f (t) + 3/16 • 2" 2k < R k+1 < 1 + 2f(k) + 3/8 • 2" 2k -3A -2" k + (3A + f(k)) l/2.2 _k 3/2. V The third part of the initialization consists of a scaling of magnitudes. '02 - < I 3A if 1 < R 01 < 5A 5/8 if 5A < R 01 < 3/2 1/2 if 3/2 < R 01 < 2 where, from the previous operation, R 01 6 [X, 2) I e [-1/2, 1/U). Thus, by direct computation, R € [3A, 1] l e [-3/8, +3/16) e [-3A, + 3A]. -(M+l) Therefore, by the argument of Case 1, the error is again bound by 2 Example : If X = 1.2, then tan" X = O.87605805059819. The algorithm produces -12 an approximation which is in error by less than 0.2 x 10 , within the error bound of 0.1+5 x 10" 12 . 127 K ON ON ON CVJ CO H LTN LA LA t— t— t- t- -st -st CO CO CO CO NO o o O NO NO O o CM CM -st -st -3- -st d H o o o o t— ON On ON o O b- -3- -st -st t~ t- t- t- H CM CM CM CM c— O O o t— C— t~- O oo CO G\ ON ON ON -=t -st LTN LTN LA LA t- ON On On LA LTN H la NO NO t- t— t— c— NO NO CO CO CO CO H O O O t— t— CO -st t- c— -st -st ^st -st On ON o o O O t- ON ON ON CO CO H -st t- t— CO CO CO CO -st -st On ON ON ON oo O o O LTN la O o t— t— t— t- t- c— t— c— -rj- -st -=t -st LA ON ON ON CO CO CO LA CO CO t— t- t- t— NO NO H H H H O O o o -4- -st H O ON On t— t- t— t- H H o O O O o ON ON ON C\J OJ NO CO CO CO o o o o o o o O O O o O o o CO CO CM -st O O o o o o o o o O O O o C\ ON ON C\J CM H o O o o o o o o o o o O O o o o o o o O o o o o o 1 o o o o o o o O O o o o o o o O o o o o 1 o 1 o 1 o 1 o o 1 o 1 O 1 O 1 o 1 o o o o o o o -st -st m en en en On On H H H H NO o o o o o LA CO LA LA CM CM CM CM LA LA NO NO NO NO H o o o o o CM LA t— t- en on en en LA LA 00 00 OO 00 ON o o o o o H On LA LA On ON On On -=f -=t 00 OO OO OO OO o o o o o CO H CM CM H H H H [~- t— NO NO NO NO ON o o o o o CM CO H H m en en 00 -* -St t"- t— t— t- LA o o o o o oo on CM CM t~ t~ t— t- o O 00 00 OO 00 _=t o o o LA LA NO CO H H LA LA LA LA ON ON CM CM CM CM -st o o o t- !^ oo CO NO NO _=r _^" -3- J- oo OO H H H H O o o o oo oo c— CO -St -=1- NO NO NO NO H H O O O O O H o o o -=t -=t -=f ON t— t- O O O o o O O O o O O + LA LA LA oo on o oo O o o o O o o O o O o o O M c— t— t— CM CM H o o o o o o o o O o O o o o -i o o o o o o o o o o o o o o o o O o o o o o o o o o o o o o 1 o 1 o 1 o i o 1 o 1 o O o 1 o 1 o 1 NO H K o o o o o o LA LA LA ON On ON On OO 00 ON On On On CO -d O o o o o LA CO O O o O o O OO oo -st -st -st -st NO o o o o o b- 00 OO OO t— C— [— t- CO CO O O O O H 3 o o o o o 00 ON On On CM CM CM CM NO NO CO CO CO CO CO C o o o o o ON H LA LA CM CM CM CM H H OO OO OO OO OO •H o o o o o o NO c— t- NO NO NO NO O o o o o O o -P • o o o o o H OO H rH -3- -St -St -St LA LA LA LA LA LA LA fl o o o LA LA c— LA H H co CO CO CO CO CO CO CO CO CO CO O o o o L^- L^- 00 LA H H H H H H H H H rH H H H o o o o CO CO LA OO LA LA LA LA LA LA LA LA LA LA LA LA LA v_^ H o o o NO NO O H H H H H H H H H H H H H H + LA LA LA ON ON O O O o O O O O O o O O O o O M CM CM CM CM CM oo OO OO OO 00 oo 00 00 oo 00 OO 00 OO oo oo CO CO CO CO CO CO CO CO CO CO CO CO CO OO CO CO CO CO CO CO o o o LA LA LA H H CO CO LA LA H H H H -Sl" -st t— t— t— t- NO -St -St -st ^t -st 0O ON ON ON LA LA LA LA oo 00 CM CM CM CM ON C— t- t— en oo oo OO LA LA LA LA LA LA On On o O O O LA ON ON On ON On H [— O O ON ON ON ON oo oo LA LA LA LA t- . 00 -OO OO oo OO O o CM CM OO OO 00 OO CM CM -St -st -st -St ir— 00 OO 00 oo OO CM 00 OO OO LA LA LA LA o o H H H H c— NO NO NO t— c— O -st t>- t— OO oo OO OO o o -=t -st -st -sf 0O H H H ON On t- o CM CM CO 0O CO CO oo CO LA LA LA LA LA CO CO CO NO NO O 00 On ON LA LA LA LA -st -st ON ON ON ON CO On On ON H rH -sf LA la LA 00 oo oo 00 t— t- IA LA LA LA LA H OO 00 00 CO CO -sl- CM H H H H H H o o O O O o O + LA LA LA t— t- oo H LA LA NO NO NO NO NO NO NO NO NO NO NO id CO CO CO -st -st NO t- tr- t— ir— t- t- t- t— tr- tr— t— t— C— t— t- tr- t— oo CO CO OO ee 0O oo CO CO 0O CO CO co 0O 0O 0O CO o O o o O O o o o o o O O o o o O o O O *l o o M I H CM OO LA NO t— 0O ON H H CM 00 LA NO H H CO On H O CM 128 K H H H VO vo VO VO VO NO NO o c— t- t- t— t- on On On CO VO VO VO ON On On On ON ON On en -3- -3- -3- en en o o o H O o o LA LA la Lf\ LA la LA o c— t— t— CO CO H H H o -=t- -3- -* oo oo oo oo en en en t- en en en o o o O o o oo oo CO t— t~ [— t- t~ t— c— CM o o o o o o o o o o o o o o o o o o o o o o o o o o o o o vo vo vo o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o 1 o 1 o o 1 o o 1 o 1 o o 1 o 1 o 1 o o 1 o o 1 o 1 o 1 o 1 a •H •p o o VO 3 t- c— t— vo VO vo vo vo VO vo On 3 H H -* -3- o o o LA CO CO CO ON On ON On o\ On ON en H d On o\ On On ON H H H H o O o O o O o .3- H H VO vo o O O O On On ON H H H H H H H OJ en en on o o o o O O O O O VO VO VO VO VO VO VO CM o o o o o o o O o O O O o O o O O O o o o o o o o o o O o LTN LA la o O o O o O o o o o o o o o o O o o O o o O o O o O o o o o o o o o o O o O O o o O o O o O o o o o o o o o o O o O O o o o o O o o o o o o o o o o o o o r-i O O o o o o O o o o o o o o o o o o o o + O O o o o o O o o o o o o o o o o o o o M O O o o o o O o o o o o o o o o o o o o H o o o o o o o o o o o o o o o o o o o o o o 1 o o 1 o 1 o 1 o o o 1 o o 1 o 1 o 1 o o o 1 o o 1 o 1 o 1 K On On On On ON On ON ON (J\ ON ON ON ON On ON 0\ On On ON ON CO CO CO oo CO CO CO CO CO CO 0O oo OO CO 0O CO CO 0O CO OO H H H H H H H H H H H H H H H H H H H H CO oo co CO CO CO CO CO CO OO CO 0O CO CO CO 0O CO CO 0O 0O en en oo OO OO 00 00 00 00 OO 00 oo oo 00 OO oo OO oo OO OO o o o o O o O o O o o o o O o o o o O o la LA LTN LTN LTN LfN LTN LTN LTN LT\ LTN u\ LA LA LA LA LA LA LA LA CO CO OO CO CO CO CO CO CO CO CO CO CO 0O CO CO CO 0O OO CO H H H H H H H H H H H H H H H H H H H H LTN LTN LTN LA LTN ITS LTN LTN LTN LTN LT\ LA LA LA LA LA LA LA LA LA H -H H H H H H H H H H H H H H H H H H H H + O O O O O O o O O O O O O O O O O O O O M en 00 00 OO OO OO oo OO 00 oo OO OO OO 00 OO OO 00 00 oo OO CO CO CO CO CO CO CO CO CO CO 0O CO 0O 0O 0O 0O GO CO 0O CO o o o LA LA LA LA LA LA LA ON VO VO vo VO vo CO CO 0O c— CO 0O OO H r-\ H H H H H _=f vo vo VO LA LA CM CM OJ 00 0O CO 0O -=r -3" -3" J" -=r -3" -=f 0O LA LA LA vo vo On On On 00 00 00 00 00 00 OO OO 00 00 00 VO 00 OO OO o o On On On - c— t— c— t— t- t— t— t— t— t- t— t— c— t— CO co 0O 0O CO 0O CO 0O OO OO CO CO 0O 0O oo 00 oo CO oo CO o o o o o O o o o o o o o o o O o O o o M C\J CVJ 00 _=)- LA VO (^ CO On O H CM 00 -tf LA VO t- CO 0\ o CNJ Cvl CM CM CM CM CM CM OO 00 00 00 00 OO oo 00 oo en j- 129 11. k Experimental estimate of speed Because of the sequence of steps in the initialization, the effi- ciency of the algorithm is somewhat less than desired, although the time required to compute tan X, < X < oo, is still only slightly greater (about three addition cycle times), than the time required to perform a division. For the steps beyond initialization, the mean probability of a zero is O.65O, with a corresponding shift average of 2.87. 11.5 Implement at i on As has been the usual practice, a simple transformation, u, = 2 I , is made in order that the comparison constant may remain ± 3/8> rather than ±3/8 * 2 . Introducing this transformation into the recursion relations obtained earlier yields the following. Vi - \ - <•* t-"V (k+1) ) m->o V ■ 2 \ + S A C 11 " 6 ) ■ It is of interest to note that, since |u| < 1, IL remains fixed during the last half of the algorithm. Figure 11 shows a block diagram to implement these recursions. 11.6 Concluding remark Since the inverse cosine and inverse sine algorithms employ a version of the inverse tangent algorithm, it is important that this latter algorithm be as fast as possible—it is nearly as fast as division. 130 % i0 + < fo#» s ; " I-." 5 *-ll 2 I I *P I I ill ,5*, I I QC UJ k- v V) < o ui cc H> S^*- 1 in 5t a: ■H. XUJ cnz + ^ ko: (£ m w UJ DD k t- 1 U- 4 O *^ UJ Si O CO tc + + CM v ■ H OJ i >- 1 C _J ^_ OJ cO ?° M ^ ■P i 1 5 to co M Q UJ OJ 10 <2 Ul + ! + / ^ ^ + ll II x II iH H H + + ^ -f +3 c CD CO -p cj u o Ch | CO u CO •H O o H pq o M 12. A NOTE ON EVALUATION OF ARCCOSINE/ARCSINE Previously developed algorithms suffice to evaluate arccosine or arcsine for X e [0, 1], although a modification of the initialization is required. From the identities sin' 1 X = tan" 1 -7= (l2-l) r77?. -1 „ . -1 f W l-X 2 1 ,_ c _s cos X = tan 1 - (12-2) sin' 1 X + cos" 1 X = «/2 (12-3) 2 it may be seen that a single multiplication to form X , followed by an appli- -\ 2 cation of a square root algorithm to form V 1 - X , followed by the arc- tangent algorithm with a special initialization procedure to avoid the indicated division may be used to evaluate arc cosine/arc sine. The initiali- zation is simplified by considering two cases. Case >e 1 : If Vi-x 2 > x, (0 < X < 1/^2), then set R 00 = ~^1-X 2 e [1/V2, 1] I m = X e [0, 1/V2] ■00 -1 -1 and use (12-1) to compute sin X and (12-3) to compute cos ' X if desired. The next part of the initialization consists of choosing t _ = 1 - j l/2 so that R Q1 -~Vl-X 2 + 1/2 X I Q1 = X - 1/2 V l-X 2 . 131 132 Finally choose t = 3 A "to scale magnitudes to desirable ranges. R o = 3/1+ R 01 € [3/U > 1] Z = 3/k I 01 € [ " 3/8 > +3/8] e [ ~ 3/U ' +3AL Continuation of the arctangent algorithm then provides the desired result ■with an error less than 2 Case 2 : If V 1-X 2 < X, (l/-^2 < X < l), then set R Q0 = X € (1/V2", 1] I 00 = Vl-X 2 e [0,1/ Y2) and continue as in Case 1. It may thus be concluded that arccosine/arcsine may be evaluated in slightly more than three multiplication cycle times. 13. ON A HIGHER RADIX IMPLEMENTATION 13.1 General considerations One of the chief limitations on the speed with which the algorithms developed in this research may be implemented is the step-by-step transposition in the control: s is chosen, then the appropriate recursions performed, s .. is chosen, the recursions performed, and so on. If the control time is at all significant, there are clear speed advantages in choosing not only s but also s, , ... from comparisons on 11 and then performing several (binary) steps in the recursion relations. The cost of this higher radix implementa- tion is a significant complication of the required comparisons to choose s n , .... In general, both an increase in the number of comparisons required and an increase in the precision of the comparisons would be expected. Although higher radix implementations are not a subject of detailed study in this research, a few apparently important considerations are known and it is desirable to discuss them here. Only radices which are integer powers of two are considered. To illustrate the general strategy, let us compare the recursion for the division scheme of Section 3 necessary to choose s , s , ... with the analogous recursion for most other division schemes. Vi - 2 \ + \ + 2 ~\\ (13 - L) Vi = 2 \" V (13 " 2) The symbols in (13-I) have been previously defined; in (13-2), s is the k quotient digit, vl and il are partial remainders (u_ is the dividend), and x is the divisor. In either (13-I) or (13-2) the range of il is the same 133 1& as the range of vl . Except for the first few steps, the term 2 s xl in (13-1) contains information of only secondary importance, that is, the dominant digits of il are determined only by 2il and s (recall s = (1, 0, 1}). Thus, from the value 2u. alone, one is able to choose not only s , but also s , .... From (13-2) it is seen that, independent of the step index k, significant information is contained in the term s, x, and schemes for higher radix implementation must take this effect into account, that is, the recoding is a function of the divisor. (Scaling procedures have been studied to overcome this difficulty in the division represented by (13-2)). To facilitate the implementation of a higher radix procedure, one would prefer to have a recursion which does not explicitly depend on the original operand — except for the starting difficulty incurred, recursion (l3-l) is considerably less complicated to implement in radix r = 2 , n > 1, than is recursion (13-2). Most, but not all, of the algorithms developed in earlier sections of this paper lend themselves to such a simplified higher radix implementation. 13.2 Amenability of normalization algorithms to higher radix Once the first few steps have been performed, the division scheme proposed in Section 3 lends itself well to a higher radix implementation. The multiplication algorithm of Section h requires the simple recursion and a higher radix implementation is visibly easier for multiplication than for division since no starting problem exists (beyond the actual initialization). Two square root algorithms were discussed in detail in Sections 6 and 7J the recursions of present interest are listed below. 135 vi ■ < 2 \ + \> + 2 " k < 2 VR + ^ \) + s ' s ^ s 4 V (m-*) vi ■ 2 \ - */*-a + 4 2 " (k+2)) (13 - 5) In the strictly binary case, the second algorithm, represented by recursion (13-5), is clearly superior since it is less complex and can be implemented with less time and/or hardware. However, it has the disadvantage of being more difficult to implement in higher radix because the recursion is a strong function of R , the approximation to y x. Thus, this algorithm presents at least the same level of difficulty in a higher radix implementation as the division represented by (13-2). The first square root algorithm, represented by (l3-*0, is clearly comparable to the division in (13-I). It is perhaps less obvious but still easy to show that the algorithm for exponential is readily amenable to higher radix. \ +1 - 2 \ " ^W + 2 ~ (k+ \> - 2^ - 2 k+1 [2-( k+l) s k - 1/2 2- 2 < k+1 >s 2 + higher order terms ] = (2u, -s ) +s 2 + higher order terms (13-6) Again, after some difficulty getting started, a higher radix implementation may be feasible. The same is true of the algorithms for tangent and cosine/ sine. _ _k+l -1, _-(k+l)v u k+l = 2 \ " s k 2 tan (2 ) = 2u^ - s k 2 k+1 [2"^ k+1 ^ - 1/3 2 "3( k+1 ) + h ig h er order terms] = (2u,-s ) + 1/3 s 2 + higher order terms. (l3"T) recursion. 136 For the arctangent algorithm, the issue is less clear. From the Vi = 2 \ + 3 A • ( 13 - 8 > it appears that this algorithm is even more difficult to implement in higher radix than the division of (13-2) since R is not only a function of the given operand, but a function of the step index as well. However, it was shown in Section 11 that R k = K + g(k) where K £ [3/^> l) is a function only of the original operand and g(k) is a very slowly changing function of the step index and is a second-order effect beyond the first few steps. Hence, (13-8), rewritten as, -ul = 2u, + s K + higher order terms (13 _ 9) appears to present the same level of difficulty in higher radix as does (13-2). From these general preliminary considerations, all of the algorithms except the second square root and the arctangent appear to be readily amenable to higher radix implementation. 13.3 A change in strategy One may view a higher radix implementation, say radix k, as a simple alteration of control in that two successive values s , s are chosen on the basis of il alone without forming xl . Even though the probability that s, = may approach 2/3, the probability of a radix h digit, K represented by s s , being zero is quite small. Furthermore, the digit 137 sequences 11 and 11, representing values 3 and 3, are to be avoided since two additions (or subtractions) are required. In a radix h implementation, one may wish to limit redundancy by allowing only digital values 2, 1, 0, 1, 2. Since the probability of a zero in radix h is small (about l/U if values 2, 1, 0, 1, 2 are allowed) and speed is achieved by reducing control time, the shift average is no longer a meaningful measure of efficiency. Rather the speed itself or a speed to hardware ratio must be extablished. While no efficiency studies of radix k implementations of these algorithms have been made, a few thoughts in that direction are appropriate. 13.^ Efficiency of radix k versus radix 2 For as specific a comparison as can be made at this time, let us consider the multiplication algorithm of Section k which presents no starting problems for a higher radix implementation. The time required to perform multiplication radix 2 is given by T 2 S M[t c + 1/3 t A + t g + 1/3 t SH ] where t = control sequencing time t = addition/subtraction time (including complementation) -H. t„ = selection time --time required to select a value for s. S k t = shifting time. orl The time required to perform multiplication radix h is given by T u S M/2[a t c + 3A t A + B t g + h/5 t gH ] where a is a measure of the complication of the control and B is a measure of the complication in the selection rules caused by choosing two "multipliers" 138 at once; probably a ~ 1, f3 ~ 2. It has been assumed throughout this work that complementations, low precision comparisons, and shifting operations are much faster than addition. Thus, / T 2 3 M [t c + 1/3 t A ] T^ ~ M [1/2 t c + 3/8 t A ]. It thus appears that if the control time is at all appreciable relative to the addition time, a radix k implementation would be faster than radix 2: T 2 > 1 k if t c > 1/12 t A . Similar considerations, but not necessarily similar conclusions, appear to apply to any radix r = 2 , n > 2. Ik . CONCLUSIONS lU.l A set of algorithms A class of algorithms for evaluation of certain elementary functions, suitable for hard-wire implementation in a scientific binary digital computer, has been developed and studied. It has been shown that all of the algorithms can be implemented with a reasonably economical structure, and it is strongly believed that these algorithms would be considerably faster than software routines that are now often used. The algorithms employ minimal redundancy to achieve an increase in speed over non-redundant versions of the same al- gorithms; the cost of employing redundancy is a complication of the initial- ization of the algorithms and an increase in the precision of the comparisons required to choose multiplier constants. The extension of the algorithms to higher radix, restricted to the case where the radix is a power of two, has been briefly studied with the observation that the recursion upon which s n , s n n , ... are based should be * k 7 k+1 7 independent of the initial operand. With the exception of the arctangent, algorithms are known for every function in the set that have this favorable property. In contrast, the recursion commonly used for division, Vi = r \ - \ x where q is the k radix r quotient digit, u and vl are partial remain- ders (u is the dividend), and x is the divisor, does not have this favorable property. The application of redundant arithmetic recodings has been extended to a limited set of functions. It is possible that much broader classes of functions may lend themselves to approaches similar to those employed here. 139 lU.2 Areas of further investigation 1^ . 2 . 1 Generalization of the technique The generalization and extension of the "normalization" technique presents, probably, the most interesting topic for further investigation. It is known that either a continued product cr a continued summation repre- sentation of a divisor or its reciprocal yields two possible division algorithms. Similarly it is known that either representation leads to a pair of algorithms for square root. Surprisingly, only a single such algorithm for multiplication appears to offer any practicality. Only a single algorithm is known for each of the other functions studied. The success in devising the algorithms discussed in this report is due basically to three useful properties: (1) log (H p.) = Z log p. (2) exp (E SjL ) = IT(exp s ± ) (3) exp (log x) = log (exp x) = x. It is not known what general class of functions may be evaluated through such a formulation, or whether that class has already been exhausted in this paper. The hyperbolic functions seem likely candidates, but were not studied here because they can quite easily be formed in terms of exponentials. lU.2.2 Correspondence between normalization recodings and classical multiplier recodings Considerable attention has been devoted to the study of multiplier recodings and the correspondence between certain digital division techniques 1 2 and these recodings. ' It is clear that a similar, but perhaps less direct, correspondence exists between the recodings of these algorithms and the multiplier recodings. A complete analysis of this correspondence would be quite valuable. Ik . 2 . 3 Some partial insight? Perhaps some insight into both of the problems discussed above is available even now. Given a number in conventional form, 00 -1 x = Z s.2 , S. € {1, 0, 1} (1^-1) i=0 X X one may recode as <*» x = Z s! 2" 1 , s! e (1, 0, 1} (1^-2) 1=0 x x as in the division scheme. However, more generally, one may recode as 00 x = Z s!«v(i), s!' e {1, 0, 1} (1^-3) i=0 x x where w(i) represents a "weighting function" not necessarily equal to its "nominal" value of 2 . Note that this is no longer a strictly radix 2 recoding, although it is assumed that w(i) is on the order of 2 . In particular, it is known that s.w(i) = s. tan" 1 (2 _1 ) (ik-k) and s.w(i) = £n(l + s i 2 _1 ) (1^-5) have practical application. In attempting to determine what class of functions may be evaluated by a normalization algorithm, or in studying the properties of the resulting recodings, it is natural to seek properties of weight func- tions that produce algebraically correct recodings. (it is not to be inferred that merely because a set of weights produces an algebraically, correct recoding the set has any practical application, but it is assumed to be a necessary prerequisite.) ordered: lk2 It is convenient, first of all, that the weights be positive and be < w(i+l) < w(i). (lh-6) It would appear at first glance that another convenient property would be that the ratio of successive weights w(i+l) v(i) should be constant, but some of those weights found to have practical appli- cation do not satisfy this property. If digital values are limited to the set {1, 0, 1}, then one would certainly like to have the property that the k weight satisfies 00 w(k) < £ w(i). (lh-1) i=k+l The special class of weights w(i)

d) If p is taken as normally distributed, then the standard deviation a is given by I p o^ 1 " p o^ where N is the population size (N = 2 for a register length of Uo) and n 1 ft the sample size (n = 2 ). The formula that connects n with the degree of precision is where t is the abscissa of the normal curve that cuts off an area Q; at the tails; solving for n, a 2 n = [ffo^fol ■ + N [2 ■ Ikk 1*5 For practical use, the estimate p Q must be used. Since n < < N, t 2 p (l-P ) Pq( 1 -Pq) n = — -j = — d 2 2 where V = d /t = variance of the sample. Thus, the sample variance is v , Pq^-Pq) , (2/3K1/3) = q.8,8 x 10 -6 n ^lo and the sample standard deviation is ~\Jv~= 0.920 x 10" 3 . With 99^ confidence, one may say that the actual deviation is within about 2.58 ~\JY~ 2.37 x 10~ 3 ; with 99.9$ confidence, one may say that the actual deviation is within about 3.29 V^~- 3-03 x 10~ 3 . Thus, for a sample of 2 , P is known accurately enough for the purposes of this research. A. 2 Generation of pseudo-random double precision operands The IBM 360/75 computer system software package at the University of Illinois contains a single precision pseudo-random number generator employing a version of Lehmer's method. The subroutine generates pseudo- random numbers using the recurrence relations a. = 2 % J+1 J P J+1 = 5 1T P J (mod 2 k2 ), P Q = 1 with a period of approximately 2 . Since a sample size of 2 was chosen, each of the random operands in the sample is unique. Two single precision operands are required to provide each double precision operand. 11+6 2 As described by Schreider, the chi-square and Kolmogorov tests of uniformity of random numbers are the most commonly accepted criteria for accepting or rejecting a set of pseudo-random numbers. A sample of ten sets 10 of 2 double precision pseudo-random numbers passed both of these tests. A. 2.1 The chi-square test The. chi-square test is based on the statistic 2 N ( v . -np . ) 2 y i i 1 = 1 *1 where V. is the sample number of objects in the i interval, np is the l *± expected value of v., and N is the number of intervals chosen. The values N = 10, n = 2 , p j[ = 1/10, i = 1, 2,..., 10 were used. Thus, 2 10 (V.-102.U) 2 X = i=l 1°2.U • The hypothesis of uniform distribution of the pseudo-random numbers is 2 2 rejected if % exceeds the upper limit X^ -, / \ °^ the confidence interval, where p is the assigned confidence probability and N - 1 is the number of 2 degrees of freedom. In this test, p = 0.95 was chosen, so that values of X should not exceed x Q / Qt -\ = l6.9. 2 Extremely small values of X are considered an indication of failure of randomness. The critical region of acceptance of the hypothesis is taken to be [ %-l(l-p)' %-l( P ) ] = [3 * 33 ' l6 ' 9] - * A confidence level p means a probability close to one such that, if the hypothesis of uniform distribution is correct, the probability is p that the value obtained for X<= will not exceed X^(p). ik-j 2 In the case of p = 0.95, "the probability that X lies outside the above interval is far from negligible, about one trial in ten. Ten samples 2 were tested; the values of x for the test samples are listed below. Since only one trial falls outside the critical region, the hypothesis is accepted, TABLE IT Trial £ 1 6.90 2 7.80 3 5.36 k 11. in 5 k.ld 6 J+.22 7 13.21 8 13.62 9 21.1*3 10 9.09 A. 2. 2 The Kolmogorov Criterion Kolmogorov's criterion is based on the statistic m x D = maxl— - F(x) n ' n v ' where n = 2 is the size of the sample, m is the number of objects in the -A. sample not exceeding the value x, and F(x) is the theoretical cumulative probability function. IhQ Schreider suggests rejecting the hypothesis of "uniformity" if 1/2 n D n falls outside the range (0.5, 1.5). The table below lists the values 1/2 of n ' D for ten test trials. n TABLE 18 Trial n ' D n 1 0.537 2 0.7UU 3 0.319 k O.613 5 O.638 6 0.519 7 0.713 8 O.825 9 1.119 10 0.956 1/2 Since for nine of ten trials the value of n ' D falls in the desired range, n D ' the hypothesis is accepted. It may be noted qualitatively, however, that the data indicate that the numbers generated tend to be somewhat more nearly "uniform than would be expected of purely random numbers. 14-9 A. 2. 3 Listing of raw data for statistical tests TABLE 19 X * m X 1 2 -2_ 4 _i_ 6 7 8 9 10 0.1 ■ 109 105 100 92 108 88 101 128 100 97 0.2 218 199 195 191 211 191 200 227 185 203 0.3 319 301 299 297 313 291 308 330 307 307 0.4 422 401 4l8 4l8 430 393 401 436 422 419 0.5 529 504 516 520 529 499 506 531 508 538 0.6 618 605 617 634 620 607 611 618 593 645 0.7 709 693 708 721 722 711 715 715 681 735 0.8 802 815 809 812 819 821 842 812 802 836 0.9 918 927 915 927 924 915 920 931 908 940 1.0 1024 1024 1024 1024 1024 1024 1024 1024 1024 1024 REFERENCES 1 W. G. Cochran, Sampling Techniques , pp. 49-86. 2 Y. A. Schreider, Method of Statistical Testing Monte Carlo Method , pp. 7-18, 196-220. * Ten trials for m are listed. x APPENDIX B: DISCUSSION OF THE CHOICE OF MULTIPLIER CONSTANTS IN THE COSINE/SINE ALGORITHM The cosine/sine algorithm .employs the identity e = e M j(x - Z q> ) i=0 M n • 1=0 t. i M n l*«l i=0 where {cp.} must be chosen such that M x - Z cp. =0. i=0 X Since the cosine and sine are required, one must evaluate T = where R k + J h K = Re \ 1 k \ k-1 fk-1 nt i=0 x it t. i-o x ' / k-l n t, I k = Im \£T n t. i=0 x Given T , one must compute the real and imaginary parts of T . Let us k+1" consider the following. (l) Write k t T = IT t—^- k+1 . \ ItT 1=0 ' i = IT (cos cp. + j sin cp. ). i=0 X 1 150 151 Then, T k+1 = (\ COS ^k " I k Sin ^ + ^\ COS ^k + R k Sin %J* Since it is not possible, in general, that both cos q> and sin qi have only- one non-zero bit, this is not an efficient recursion to perform. (2) Write T. . = n (t. X.) k+1 i=0 X x where X. = 1 t. ' Then, \ + i ■ ( \ r k - h \K + ith \ + R k \K i- 2 2 where t n = r n + j i, . Since X, ~\/r, + i, = 1. it is not possible, in kkk k V k k ' general, that r n , i, , and X, each have only one non-zero bit. Hence, the 7 k k' k ' recursion cannot be performed efficiently in this formulation. (3) As a variant of the above, let us choose r k = 1 k k so that x k = V - S 2 2- 2 < k+1 > 2 Then the necessary recursions can be performed if s = 1 and IT X. is precomputed . That for k sufficiently large, a simple approximation for X, suffices has been shown in Section 10. k APPENDIX C: LISTING OF FRECOMPUTED CONSTANTS All precomputed constants required by the algorithms presented in this paper are listed below; sufficient accuracy for a mantissa of length i+0 is retained. TABLE 20 Special Constant * Arroroximate Value jt 3.1U1592653589T9 n/2 1.57079632679^90 n/k 0.785398163397^5 Tt/8 O.39269908169872 I/Y2" O.707IO678II8655 K 1 O.88706U 17837978 K' » 0.367U3^01338025 tan it/8 O.U1U21356237310 * Note that the values of K T and K 1 ' are functions of M. 152 153 TABLE 21 -l 3.2 -l 1 0.50000000000000 1.50000000000000 2 0.25000000000000 O.75OOOOOOOOOOOO 3 0.12500000000000 O.375OOOOOOOOOOO k 0.06250000000000 O.I875OOOOOOOOOO 5 0.03125000000000 O.O9375OOOOOOOOO 6 0.01562500000000 0. 01+687500000000 7 0.00781250000000 O.O23I+375OOOOOOO 8 0.00390625000000 0.01171875000000 9 0.00195312500000 O.OO5859375OOOOO 10 0.00097656250000 0.00292968750000 n 0.0001+8828125000 0.0011+61+81+375000 12 0.00021+1+ 11+ 062500 0.0007321+2187500 13 0.00012207031250 0.00036621093750 11+ 0.00006103515625 O.OOOI83IO5 1+6875 15 0.00003051757813 0. 0000915 52731+38 16 0.00001525878906 O.OOOOI+577636719 IT 0.000007629391+53 0.00002288818359 18 0.000003811+69726 0.0000111+1+1+09180 19 0.00000190731+863 0.00000572201+590 20 0.000000953671+32 0.00000286102295 21 0.0000001+7683716 O.OOOOOIU305III+7 22 0.000000238U1858 0.0000007152557^ 23 0.00000011920929 O.OOOOOO35762787 2U 0.000000059601+61+ 0.00000017881393 25 0.00000002980232 O.OOOOOOO89I+O697 (Continued 15k TABLE 21 (Continued) -i 2*2 -i 26 O.OOOOOOOll+901l6 0.0000000UU7031+8 27 O.000000007U5058 0.0000000223 5 lib 28 O.OOOOOOOO372529 0.00000001117587 29 O.OOOOOOOOI86265 0. 000000005 5879U 30 O.OOOOOOOOO93132 0.00000000279397 31 0.OOOOOOOOOU6566 0.00000000139698 32 0.00000000023283 0.000000000698U9 33 0.000000000116^2 0.0000000003U925 3^ 0.00000000005821 0.000000000171+62 35 0.00000000002910 0.00000000008731 36 0.00000000001^55 O.OOOOOOOOOOU366 37 0.00000000000728 0.00000000002183 38 0.0000000000036U 0.00000000001091 39 0.00000000000182 0. 000000000005 U6 Uo 0.00000000000091 0.00000000000273 in 0.000000000000U5 0.00000000000136 42 0.00000000000023 0.00000000000068 ^3 0.00000000000011 0. 0000000000003 k kh 0.00000000000006 0.00000000000017 h5 0.00000000000003 0.00000000000009 155 TABLE 22 i ln(l+2~ 1 ) ln(l-2 _1 ) 1 O.I+O5I+65IO8IO816 -0.6931U718055995 2 0.2231U355131U21 -O.287682072U5178 3 0.11778303565638 -O.I3353139262I+52 k O.O6062U62I816U3 -O.O6U5 3852113757 5 O.O3O77165866675 -O.O317U869831U58 6 0.015501+18653597 -O.OI57I+83569681U 7 O.OO77821U0UU2O5 -O.OO78U3177U6IO3 8 O.OO38986U0I+I566 -O.OO391389932III+ 9 0.00195122013126 -O.OOI95503U8358O 10 O.OOO97608597306 -O.OOO9770396I+783 11 O.OOOl+88l6207950 -O.OOOU88I+OOU98H 12 0.00021+1+11082753 -0.000241+170^3217 13 0.00012206286253 -O.OOOI2207776369 11+ 0.00006103329368 -O.OOOO6IO37OI897 15 0. 0000305 171121+7 -O.OOOO305180I+38O 16 O.OOOOI525867265 -O.OOOOI5258905I+8 IT O.OOOOO7629365I+3 -O.OOOOO7629I+236I+ 18 O.OOOOO381U68999 -0.0000038ll+70l+5l+ 19 O.OOOOOI9O73I+68I - O.0000019073 5 Ok 5 20 O.OOOOOO95367386 -O.OOOOOO95367I+77 21 O.OOOOOOI+768370I+ -0.000000I+7683727 22 O.OOOOOO238U1855 -0.000000238I+1861 23 0.00000011920928 -0.00000011920930 2^ 0.000000059601+64 -0.00000005960I+65 25 0.00000002980232 -0.00000002980232 (Continued ) 156 TABLE 22 (Continued) -l- lu(l+2 ) -l. ln(l-2 x ) 26 0.000000011+90116 -0.000000011+90116 27 O.OOOOOOOO7I+5058 -0.0000000071+5058 28 0.00000000372529 -0.00000000372529 29 O.OOOOOOOOI86265 -0.00000000186265 30 O.OOOOOOOOO93132 -0.00000000093132 31 0.OOOOOOOOOI+65 66 -0.0000000001+6566 32 0.00000000023283 -0.00000000023283 33 0.0000000001161+2 -0.0000000001161+2 3^ 0.00000000005821 -0.00000000005821 35 0.00000000002910 -0.00000000002910 36 O.OOOOOOOOOOII+55 -0.000000000011+55 37 0.00000000000728 -0.00000000000728 38 O.OOOOOOOOOOO36I+ -0.0000000000036U 39 0.00000000000182 -0.00000000000182 1+0 0.00000000000091 -0.00000000000091 kl 0.0000000000001+5 -O.OOOOOOCOOOOOl+5 1+2 0.00000000000023 -0.00000000000023 ^3 0.00000000000011 -0.00000000000011 1+1+ 0.00000000000006 -0.00000000000006 ^5 0.00000000000003 -0.00000000000003 157 TABLE 23 i arctan (2 ) 1 O.U636U7609OOO8I 2 0.24^97866312686 3 O.I2U35I+99U54676 k O.O62U1880999596 5 0.03l239833 1 +3027 6 0.015623728620^8 7 0.0078123^106010 8 0.00390623013197 9 0.001953122516U8 10 O.OOO97656218956 11 0.000^8828121119 12 0.0002U1+1U062015 13 0.00012207031189 Ik 0.00006103515617 15 0.00003051757812 16 O.OOOOI525878906 17 O.OOOOO762939U53 18 0.0000038li+69726 19 0.0000019073^863 20 O.OOOOOO95367U32 21 O.OOOOOOU7683716 22 O.OOOOOO238U1858 23 0.00000011920929 2k 0.00000005960U6^ 25 0.00000002980232 (Continued) 158 TABLE 23 (Continued) i arctan (2 ) 26 O.OOOOOOOIU9OH6 27 O.OOOOOOOO7U5058 28 0.00000000372529 29 O.OOOOOOOOI86265 30 0.00000000093132 31 O.OOOOOOOOOU6566 32 O.OOOOOOOOO23283 33 0.000000000116U2 3^ 0.00000000005821 35 0.00000000002910 36 0.00000000001^55 37 0.00000000000728 38 O.OOOOOOOOOOO36U 39 0.00000000000182 1+0 0.00000000000091 Ul 0.000000000000^5 U2 0.00000000000023 ^3 0.00000000000011 41+ 0.00000000000006 ^5 0.00000000000003 APPENDIX D: LISTING OF PARTIAL SIMULATION PROGRAM CODE // EXEC ASM //ASM.SYSIN 00 * LML BEGIN LM 1,2,0(1) I 5*0(1) L 7, MASK NR 5,7 L 7, EX XR 5,7 * TM 1(1),X'80» BO ONE TM 1(1),X«40« BO TWO TM 1(1),X«20« $ BO THREE A 5, CTWO B STORE THREE A 5, CONETH B STORE TWO A 5, CONETW B STORE ONE A 5, CONEON STORE * ST 5,0(2) SR 8,8 ST 8,4(2) LEAVE CNOP 0,8 MASK DC X'FFOOOOOO* EX DC X'TFOOOOOO 1 CTWO DC X , 02800000' CONETH DC X'02400000« CONETW DC X«02200000« CONEON DC END X«02100000' /* // EXEC ASM //ASM.SYSIN DD * GEN BEGIN LM 1,2,0(1) LR 10,2 SAVE LR 9,1 L 2,0(2) ST 2, IN CALL RN3INZ,( IN) L 6, END L 7,C0MP L 8, SET SR 5,5 BGD CALL IRN3Z NR 0,7 OR 0,8 ST 0,0(9,5) CALL IRN3Z ST 0,0(10) S LOAD ADDRESSES OP X AND Y LOAD X INTO R5 LOAD MASK INTO R7 SET FRACTIONAL PART OP X TO ZERO LOAD EX INTO R7 COMPLIMENT CHARACTERISTIC SETTING SIGN TO ZERO ,IN R5 TEST FRACTIONAL PART GO TO •ONE' IF GREATER THAN 0.5 GO TO •TWO* IP GREATER THAN 0.25 GO TO 'THREE' IF GREATER THAN 0.125 GOES HERE IF LESS THAN 0.125 FORM 2**-E IN R5 PLACE 2**-E IN STORAGE LOCATION ASSIGNED TO Y LOAD ADDRESSES INTO Rl AND R2 LAST RANDOM NUMBER - NEW IX LOAD INITIAL RANDOM NUMBER INTO R2 ENTER INITIAL RANDOM NUMBER INTO RAN3Z LOAD 8192 INTO R6 LOAD 'AND* MASK INTO R7 LOAD 'OR' MARK INTO R8 SET R5 TO ZERO GENERATE FIRST HALF OF RANDOM NUMBER SET FIRST 8 BITS TO ZEROS SET FIRST 8 BITS TO 01000000... STORE FIRST HALF OF RANDOM NUMBER GENERATE SECOND HALF OF RANDOM NUMBER AVE LAST RANDOM NUMBER - NEW IX 159 160 END DEC COMP SET IN /* // EX //FOR C C SLL ST A CLR BNE LEAVE DC DC DC DC OS END 0*1 0,4(9,5) 5, DEC 5,6 BGD F«8192» F»8« X'OOFFFFFF 1 X« 40000000 • IF SHIFT SECOND HALF 1 §IT LEFT STORE SECOND HALF OF RANDOM NUMIIR AOD • TO R5 COMPARE RS TO R* GO TO 86D IF RS NOT EQUAL A* (40,0) EC FORTLKGO,TIME.GO« T.SYSIN DD * FORTRAN PARTIAL SIMULATION CODE OPERAND VALUE IS X IMPLICIT REAL*8( A-H,L,0-Z,$) DIMENSION TW0(100),TTW0(100),LNPLUS(60),LNMINS(60),ARCTAN(60) RAND (1024) ,NMZER0(20),NMMINS(20) ,NDZERO(20),NDMINS(20) ,NEZER0(20),NEMINS(20) ,NSZERO(20),NSMINS(20) ,NTZER0(20),NTMINS(20) ,NAZER0(20),NAMINS(20) ,NCZER0(20),NCMINS(20) ,N0ZERO(20),N0MINS(20) 11 12 14 15 16 18 20 21 24 25 26 27 28 29 30 36 38 40 41 42 43 44 45 46 47 48 49 51 DIMENSION DIMENSION DIMENSION DIMENSION DIMENSION DIMENSION DIMENSION DIMENSION DIMENSION LISTING OF NMPLUS(20) NDPLUS(20) NEPLUS(20) NSPLUS(20) NTPLUS(20) NAPLUS(20) NCPLUS(20) N0PLUS(20) FORMATS F0RMAKZ16) F0RMAT(I5,D28.16) F0RMAT(D28.16) FORMAT! • THE VALUES NPLUS,NZERO,NMI NUS •) FORMAT(3I15) F0RMAK2D28.16) FORMAT (1 15) FORMAT(« THE VALUE OF IX TO USE NEXT •) MULTIPLICATION: PLUS, ZERO, MINUS •) DIVISION/LOGARITHM: PLUS, ZERO, MINUS • EXPONENTIAL: PLUS, ZERO, MINUS •) SQUARE ROOT: PLUS,ZERO,MINUS •) TANGENT/COTANGENT: PLUS , ZERO, MINUS •) FORMAT ( FORMAT( FORMAT ( FORMAT( FORMAT ( FORMAT FORMAT (• ARCTANGENT: (• COSINE/SINE F0RMAT(4I15) F0RMAT(3D28.16) F0RMAT(4D28.16) PLUS, ZERO, MINUS PLUS,ZERO,MINUS • ) • ) FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT (• (• ( • (• ( • ( • ( • (• ( • f • • ) i LN2 PI PI2 PI4 PI8 TWO( I) TTWO(I) LNPLUS(I) LNMINS(I) ARCTAN(I) • ) •) • ) •) • ) • ) • ) • ) 161 69 70 71 72 73 74 75 76 77 60 61 62 63 64 FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT READ C READ(5 READ<5 READ(5 READ(5 READ(5 READ(5 REA0(5 READ(5 READ(5 READ(5 WRITE DO 60 WRITE WRITE DO 61 WRITE WRITE DO 62 WRITE WRITE DO 63 WRITE WRITE DO 64 WRITE WRITE WRITE WRITE WRITE WRITE WRITE WRITE WRITE WRITE WRITE DO 100 NMPLUS NMZERO NMMINS NDPLUS NDZERO NDMINS NEPLUS NEZERO NEMINS NSPLUS NSZERO (• THE PROBABILITY OF A ZERO «• MULTIPLICATION •) DIVISION •) EXPONENTIAL •) FIRST SQUARE ROOT ») TANGENT/COTANGENT • ) ARCTANGENT ') COSINE/SINE •) ( • SECOND SQUARE ROOT •] ONSTANTS tl ♦ 1 tl tl tl tl tl tl tl tl (6 I (6 (6 I (6 (6 I (6 (6 I (6 (6 I (6 (6 (6 (6 (6 (6 (6 (6 (6 (6 (6 J (J (J (J (J (J (J (J (J (J (J (J (TWO( I) ,1=1,60) (TTW0U),I = l,60) (LNPLUS( I), 1=1, 50) (LNMINSt I), 1=1, 50) (ARCTAN( I), 1=1, 50) LN2 PI PI2 P14 PI8 41) 1,60 12) ItTWOU) 42) 1,60 12) I,TTWO(I) 43) 1,50 12) ItLNPLUS(I) 44) 1,50 12) I,LNMINS(I) 45) 1,50 12) ItARCTAN(I) 46) 14) 47) 14) 48) 14) 49) 14) 51) 14) PI8 = ltl6 = = = = = = = = = = = LN2 PI PI2 PI4 100 201 202 203 204 250 255 256 257 258 300 309 310 311 162 NSMINS(J) NTPLUS(J) NTZERO(J) NTMINS(J) NAPLUS(J) NAZEROU) NAMINS(J) NCPLUS(J) NCZERO(J) NCMINS(J) NOPLUS(J) NQZERO(J) NQMINS(J) CONTINUE IX = 7867 DO 99200 00 99999 DO 99000 WRITE<6,2 WRITE<6,2 Z = RAND( IF(Z.GE.O IFiZ.GE.O IF(Z.LT.O IF(Z.LT.O IF(Z.LT.O IF(Z.LT.O J = 1 GO TO J = 2 GO TO J = 3 GO TO J = 4 GO TO IFiZ.LT.O IF(Z.LT.O IF(Z.LT.O IF(Z.LT.O J = 5 GO TO J = 6 GO TO J = 7 GO TO J = 8 GO TO IF(Z.GE.O IF(Z.LT.O IF(Z.LT.O IFIZ.LT.O IF(Z.LT.O J = 9 GO TO 500 J = 10 GO TO 500 J = 11 1353 NNNN ■ It 16 MX = 1,2 MMM = 1,8 1) 0) IX KKK) .500) GO TO 300 .25) GO TO 250 .0625) GO TO 201 .125) GO TO 202 .1875) GO TO 203 .25) GO TO 204 500 500 500 500 .3125) GO TO 255 .375) GO TO 256 .4375) GO TO 257 .500) GO TO 258 500 500 500 500 .750) GO TO 350 .5625) GO TO 309 .625) GO TO 310 .6875) GO TO 311 .75) GO TO 312 163 60 TO 500 312 J - 12 GO TO 500 350 IF1Z.LT. 0.8175) GO TO 363 IF(Z.LT. 0.875) GO TO 364 IFtZ.LT. 0.9375) GO TO 365 IFU.LE. 1.000) GO TO 366 363 J * 13 GO TO 500 364 J « 14 GO TO 500 365 J * 15 GO TO 500 366 J = 16 500 CONTINUE C RANDOM NUMBER GENERATOR CALL GEN(RAND,IX) C MULTIPLICATION ALGORITHM X = 0.5*RAND(KKK) + 0.5 NMINUS = NZERO = NPLUS = C INITIALIZATION IF (X.LT.0.75) GO TO 1150 X = X - 1.0 GO TO 1160 1150 X = X - 0.5 1160 CONTINUE NZERO = NZERO ♦ 1 DO 1400 I = lt40 CCM = - TTW0( 1+2) IF (X.GE.CCM) GO TO 1200 X = X + TWOd ) NPLUS = NPLUS + 1 GO TO 1400 1200 CCP = -»-TTWO( 1+2) IF (X.GE.CCP) GO TO 1300 NZERO = NZERO + 1 GO TO 1400 1300 X = X - TW0( I ) NMINUS = NMINUS + 1 1400 CONTINUE NMPLUS(J) = NMPLUS(J) + NPLUS NMZERO(J) = NMZERO(J) + NZERO NMMINS(J) - NMMINS(J) + NMINUS C DIVISION ALGORITHM X = 0.5 * RAND(KKK) + 0.5 NMINUS = NZERO = NPLUS = C INITIALIZATION IFU.LT.0.75) GO TO 1450 GO TO 1460 1450 X = X * 2.000 1460 NZERO » NZERO ♦ 1 DO 2000 I * 1,40 CCM * 1.0 - TTW0U+2) 16k IF(X.GT.CCM) GO TO 1500 X ■ X * (LO + TWOUJ) NPLUS » NPLUS ♦ 1 GO TO 2000 1500 CCP ■ 1.0 + TTWOU+2) IF(X.GT.CCP) GO TO 1700 NZERO ■ NZERO ♦ 1 GO TO 2000 i7oo x ■ x * d.o - Twonn NMINUS « NMINUS ♦ 1 2000 CONTINUE NDPLUS(J) ■ NDPLUS(J) ♦ NPLUS NOZERO(J) > NDZERO(J) ♦ NZERO NDMINS(J) * NDMINS(J) ♦ NMINUS : EXPONENTIAL ALGORITHM X * (2.0 * RAND(KKK) - 1.0) * LN2 NMINUS = NZERO = NPLUS = : INITIALIZATION IFU.LT.0.5) GO TO 3100 X = X - 0.5 NPLUS = NPLUS ♦ 1 GO TO 3800 3100 IF(X.LT.0.25) GO TO 3200 X = X - 0.25 NPLUS = NPLUS + 1 GO TO 3800 3200 IFU.LT.-0.25) GO TO 3300 NZERO = NZERO ♦ 1 GO TO 3800 3300 IFU.LT.-0.5) GO TO 3400 X = X + 0.25 NMINUS = NMINUS + 1 GO TO 3800 3400 X = X + 0.5 NMINUS = NMINUS ♦ 1 3800 CONTINUE 00 4000 I = 1,40 CCM = - TTW0( 1+2) IF(X.GT.CCM) GO TO 3850 X = X - LNMINS( I ) NMINUS = NMINUS ♦ 1 GO TO 4000 3850 CCP = + TTWO( 1+2) IF(X.GT.CCP) GO TO 3860 NZERO = NZERO ♦ 1 GO TO 4000 3860 X = X - LNPLUS( I ) NPLUS = NPLUS + 1 4000 CONTINUE NEPLUS(J) = NEPLUS(J) ♦ NPLUS NEZEROU) = NEZERO(J) + NZERO NEMINS(J) = NEMINS(J) ♦ NMINUS ; SQUARE ROOT ALGORITHM: RADIX 2»SCALED X = 0.75 * RAND(KKK) ♦ 0.25 XO * X 165 NMINUS « NZERO * NPLUS « : TEST INITIAL RANGE OF X IFU.GT. 0.3125) GO TO 4100 CO - 0*500 R - CO RSQ ■ R * R X « XO - RSQ NPLUS ■ NPLUS + 1 00 4050 I • 1*40 CCM * - TTWOU+3) IFU.GT. CCM) GO TO 4010 R » R - TWOU + l) RSQ = R * R X = XO - RSQ NMINUS = NMINUS + 1 GO TO 4050 4010 CCP = + TTWO( 1+3) IF(X.GE.CCP) GO TO 4020 NZERO = NZERO + 1 GO TO 4050 4020 R = R + TWOU + l) RSQ = R * R X = XO - RSQ NPLUS = NPLUS + 1 4050 CONTINUE GO TO 5000 4100 IFU.GT. 0.375) GO TO 4200 CO = 0.5625 R = CO RSO = R * R X = XO - RSQ NPLUS = NPLUS + 1 DO 4150 I = 1*40 CCM = - TWOU+2) - TTWOU+4) IFU.GT. CCM) GO TO 4110 R = R - TWO( I + l) RSQ = R * R X = XO - RSQ NMINUS = NMINUS + 1 GO TO 4150 4110 CCP = - CCM IF(X.GE.CCP) GO TO 4120 NZERO = NZERO ♦ 1 GO TO 4150 4120 R = R + TWO( I+l) RSQ = R * R X = XO - RSQ NPLUS = NPLUS + 1 4150 CONTINUE GO TO 5000 4200 IFU.GT. 0.5625) GO TO 4300 CO = 0.625 R = CO RSQ = R * R X = XO - RSQ 166 NPLUS ■ NPLUS ♦ 1 DO 4250 I ■ lt40 CCM ■ - TWOU + l) IF(X.GT.CCM) GO TO 4210 R ■ R - TWO( 1*1) RSO ■ p. ♦ R X « XO - RSO NHINUS ■ NMINUS ♦ 1 GO TO 4250 4210 CCP ■ - CCM IF(X.GE.CCP) GO TO 4220 NZERO - NZERO ♦ 1 GO TO 4250 4220 R - R ♦ TWOU + 1) RSO * R * R X ■ XO - RSO NPLUS « NPLUS ♦ 1 4250 CONTINUE GO TO 5000 4300 IFU.GT. 0.875) GO TO 4400 CO = 0,75 R - CO RSQ = R * R X s XO - RSO NPLUS = NPLUS + 1 DO 4350 I = l f 40 CCM = - TWOU + 1) - TWOU+3) IF(X.GT.CCM) GO TO 4310 R = R - TWO( 1+1) RSO = R * R X = XO - RSO NMINUS = NMINUS + 1 GO TO 4350 4310 CCP = - CCM IF(X.GE.CCP) GO TO 4320 NZERO = NZERO ♦ 1 GO TO 4350 4320 R = R + TWO( 1+1) RSO = R * R X = XO - RSO NPLUS = NPLUS ♦ 1 4350 CONTINUE GO TO 5000 4400 CO = 1.0 R = CO RSO = R * R X = XO - RSQ NPLUS = NPLUS + 1 DO 4450 I = l f 40 CCM = - TTWO( 1+2) IF(X.GT.CCM) GO TO 4410 R = R - TWOd + 1) RSQ = R * R X = XO - RSQ NMINUS = NMINUS ♦ 1 GO TO 4450 4410 CCP = - CCM 167 IF(X.GE.CCP) GO TO 4420 NZERO - N2ER0 * 1 GO TO 4450 4420 R ■ R ♦ TWO(H-l) RSO « R * R X ■ XO - RSO NPLUS ■ NPLUS ♦ 1 4450 CONTINUE 5000 CONTINUE NSPLUS(J) - NSPLUS(J) ♦ NPLUS NSZERO(J) ■ NSZERO(J) ♦ NZERO NSMINS(J) - NSMINS(J) ♦ NMINUS : TANGENT ALGORITHM X = RAND(KKK) * PI2 Rl * 0.96887271238293440-04 XH * X ; TEST RANGE OF OPERAND IF(X.LE.Rl) GO TO 5100 IFU.LE.PI4) GO TO 5200 C = PI2 - TW0(9) IFU.LT.C) GO TO 5300 D = PI2 - TW0140) IF(X.LT.D) GO TO 5400 INOIC = 5 TANXH = 1.0/(PI2 - XH) GO TO 8100 5100 INDIC = 1 TANXH = XH GO TO 8100 5200 INOIC = 2 X = XH GO TO 6100 5300 INDIC = 3 X = PI2 - XH GO TO 6100 5400 INDIC = 4 X = PI2 - XH 6100 NPLUS = NZERO = NMINUS = ; INITIALIZATION A = 1.0 B = 0.414213562373095 X = X - PI8 NPLUS = NPLUS ♦ 1 ; BEGIN NORMAL ALGORITHM ITERATION IF( INDIC. EO. 4) GO TO 6110 K = 40 GO TO 6120 6110 K = 50 6120 CONTINUE DO 7100 I = 1,K CCM = - TTWO( 1+2) IF(X.GE.CCM) GO TO 6200 X = X ♦ ARCTAN( I ) SAVEA = A A = A + B * TWO( I) 168 B « B - SAVEA * TWO(I) NMINUS « NHINUS «• 1 GO TO 6600 6200 CCP ■ - CCM IF(X.GT.CCP) GO TO 6300 NZERO « NZERO + 1 GO TO 6600 6300 X • X - ARCTANU) A B * TWO(I) SAVEA * TWOU) NPLUS ♦ 1 6600 7100 X * X SAVEA A » A B * B NPLUS CONTINUE CONTINUE NTPLUSU) NTZERO(J) NTMINS(J) IF( INDIC.E0.2) IFUNDIC.E0.3) 8100 8300 8400 8510 NTPLUS( J) NTZERO(J) NTMINS(J) TANXH TANXH TANXH ♦ NPLUS ♦ NZERO ♦ NMINUS B/A A/B A/B CASES GO TO 8800 TO OMIT 8800 IF( INDIC.EQ.4) CONTINUE ARCTANGENT ALGORITHM TAKE RESULT OF TANGENT X = TANXH NMINUS = NZERO = 0" NPLUS = TEST FOR SPECIAL IF(X.LT.TW0(20) ) T = 1.0/TWOUO) IF(X.GT.T) GO TO D = DABS(X-l.O) IF(D.LT.TW0(19) ) GO TO 8800 IF(X.GE.l.O) GO TO 8601 HERE X IS LESS THAN UNITY A = X + 1.0 B = X - 1.0 IFU.GT.0.25) GO TO 8300 A = 1.0 B = X NPLUS = NPLUS ♦ 1 GO TO 8510 IFU.GT.0.5) GO TO SAVEA = A A = 0.5 * A - B = 0.5 * B + NPLUS = NPLUS 8510 * 0.5 * 0.5 = NZERO ALGORITHM ALGORITHM 8400 0.25 0.25 + 1 B SAVEA GO TO A = A B = B NZERO + 1 CONTINUE BEGIN NORMAL ALGORITHM ITERATION WITH R DO 8600 I = 1»40 CCM = - TTWOU + 2) IF(B.LE.CCM) GO TO 8520 CCP = - CCM 40 169 IF(B.GT.CCP) GO TO 8530 NZERO * NZERO ♦ l GO TO 8540 8520 SAVEA * A A ■ A . - B * TWO(I) B » B + SAVEA * TWOU) NMINUS ■ NMINUS ♦ 1 GO TO 8540 8530 SAVEA « A A * A ♦ B * TWO(I) B * B - SAVEA * TWOU) NPLUS = NPLUS ♦ 1 8540 CONTINUE IFU.EQ.40) GO TO 8700 8600 CONTINUE 8601 CONTINUE C HERE X IS AT LEAST UNITY C FIND EXPONENT Y OF OPERAND X CALL LML(X,Y) C LML RETURNS Y = 2**(-E) P = Y * X A = P + Y B = P - Y IF (X.LT.1.5) GO TO 8605 SAVEA = A A = A + B B = B - SAVEA NMINUS = NMINUS + 1 8605 NZERO = NZERO + 1 IFU.GT.1.25) GO TO 8610 A = A * 0.75 B = B * 0.75 NPLUS = NPLUS + 1 GO TO 8670 8610 IFtA.GT.1.5) GO TO 8620 A = A * 0.62 5 B = B * 0.625 NPLUS = NPLUS + 1 GO TO 8670 8620 A = A * 0.5 B = B * 0.5 NZERO = NZERO + 1 8670 CONTINUE C BEGIN NORMAL ALGORITHM ITERATION WITH R = 40 DO 8700 I = 1,40 CCM = - TTWOU+2) IF(B.LE.CCM) GO TO 8680 CCP = - CCM IF(B.GT.CCP) GO TO 8690 NZERO = NZERO + 1 GO TO 8700 8680 SAVEA = A A = A - B * TWO( I ) B = B + SAVEA * TWO( I) NMINUS = NMINUS + 1 GO TO 8700 8690 SAVEA = A 8700 8800 8900 23000 24000 C 26000 26100 27000 27100 29000 NAPLUS(J) NAZEROU) NAMINS(J) ALGORITHM * PI4 NPLUS NZERO NMINUS 170 A = A + B * TWO(I) B « B - SAVEA * TWOU> NPLUS ■ NPLUS + 1 CONTINUE NAPLUS(J) ■ NAZERO(J) * NAMINS(J) ■ GO TO 8900 CONTINUE CONTINUE COSINE/SINE X ■ RAND(KKK) NMINUS = NZERO = NPLUS * INITIALIZATION X = X - PI8 NPLUS = NPLUS ♦ 1 NZERO = NZERO + 1 FIRST QUARTER NON-REDUNDANT DO 24000 I = 2,9 IF(X.GT.O) GO TO 23000 X = X + ARCTAN( I ) NPLUS = NPLUS ♦ 1 GO TO 24000 X = X - ARCTANU ) NMINUS = NMINUS + 1 CONTINUE REST REDUNDANT DO 29000 I = 10,40 CCP = TTWOd+2) IF(X.GT.CCP) GO TO 26000 CCM = - CCP IF(X.LE.CCM) GO TO 27000 NZERO = NZERO + 1 GO TO 29000 X = X - ARCTAN( I ) IFU.LE.20) NPLUS = NPLUS + 1 NPLUS = NPLUS + 1 GO TO 29000 X = X + ARCTAN( I ) IFd.LE.20) NMINUS = NMINUS + 1 NMINUS = NMINUS ♦ I CONTINUE NCPLUS( J) NCZERO( J) NCMINS( J) ALGORITHM 61tl00 NCPLUS(J) = NCZERO(J) = NCMINS(J) = SQUARE ROOT DO 30000 I = + NPLUS ♦ NZERO + NMINUS HIGHER RADIX, NOT SCALED TWO( I ) = 0.0 30000 TTW0( I ) = 0.0 X = 0.75 * RAND(KKK) Y = X NMINUS = NZERO = NPLUS = ALPHA = X * 0.25 171 C INITIAL STEP IF(X.GE.0.5) GO TO 30100 X = X * 4.0 ALPHA * ALPHA * 2.0 30100 NZERO ■ NZERO ♦ 1 DO 39000 I « 1,40 CCP - 1.0 ♦ TTWOU+2) IF(X.GE.CCP) GO TO 30200 CCM ■ 1.0 - TTWOU+2) IF(X.GT.CCM) GO TO 30300 X = X * (1.0 ♦ TWO(I) ♦ TWO(2*I*2)> ALPHA = ALPHA * (1.0 + TWOU + 1)) NPLUS = NPLUS + 1 GO TO 30400 30200 X = X * (1.0 - TWO(I) + TWO( 2* 1+2 ) ) ALPHA = ALPHA * (1.0 - TWOU + 1)) NMINUS = NMINUS + 1 GO TO 30400 NZERO = NZERO + 1 CONTINUE 30300 30400 39000 NOPLUS(J) NOZERO( J) NOMINS( J) CONTINUE NOPLUS(J) NQZERO( J ) NOMINS(J) 99000 CONTINUE WRITE(6,24) WRITE (6,36) WRITE(6,25) WRITE(6,36) WRITE(6,26) WRITE(6,36) WRITE(6,27) WRITE(6,36) WRITE(6,28 ) WRITE(6,36) WRITE(6,29) WRITE(6,36) WRITE(6,30) WRITE (6,36) WRITE(6,27) WRITE (6,36) 99999 CONTINUE C CALCULATE PROBABILITY MMULT = MMULTZ = MDIV = MDIVZ = MEXP = MEXPZ = MS0T1 = MSQT1Z = MTAN = MTANZ = MATN = MATNZ = MCOS = MCOSZ = NPLUS NZERO NMINUS ( I,NMPLUS( I),NMZERO(I ),NMMINS( I,NDPLUS( I) ,NDZERO(I ) ,NDMINS(I I,NEPLUS( I ) ,NEZERO( I) ,NEMINS( I I,NSPLUS( I ) ,NSZERO(I ) ,NSMINS( I I, NT PLUS ( I ) ,NTZERO(I ) ,NTMINS(I I,NAPLUS( I ) ,NAZERO(I ) ,NAMINS( I I,NCPLUS( I ) ,NCZERO( I ) ,NCMINS( I I,NQPLUS( I ) ,NOZERO( I) ,NOMINS( I ),I=1,16) =1,16) =1,16) 1,16) =1,16) =1,16) =1,16) =1,16) OF A ZERO FROM THESE DATA 172 MSQT2 ■ MSQT2Z ■ 00 99100 I - ltl6 MMULT » MMULT ♦ NMPLUSU) ♦ NMMINS(I) MHULTZ * MMULTZ ♦ NMZERO(I) MDIV * MDIV ♦ NDPLUS(I) ♦ NDMINS(I) MDIVZ ■ MDIVZ + NDZERO(I) HEXP > MEXP ♦ NEPLUSU) ♦ NEMINS(I) MEXPZ - HEXPZ ♦ NEZEROU) MS0T1 « MSQT1 ♦ NSPLUS(I) ♦ NSKINSU) MSQT1Z « MS0T1Z + NSZEROU) MTAN » MTAN ♦ NTPLUSU) ♦ NTMINSU) MTANZ = MTANZ ♦ NTZEROU) MATN * MATN + NAPLUSU) ♦ NAMINS(I) MATNZ = MATNZ ♦ NAZEROU) MCOS = MCOS ♦ NCPLUS(I) ♦ NCMINSU) MCOSZ = MCOSZ ♦ NCZEROU) MS0T2 = MS0T2 ♦ NQPLUSU) ♦ NQMINS(I) MSQT2Z = MS0T2Z ♦ NOZERO(I) 99100 CONTINUE AMULT = MMULT ♦ MMULTZ AMULT = NMULTZ PMULT = AMULTZ/AMULT WRITE (6,78) WRITE (6,14) PMULT DIV = MOIV + MDIVZ DIVZ = MDIVZ PDIV = DIVZ/DIV EXP = MEXP + MEXPZ EXPZ = MEXPZ PEXP = EXPZ/EXP S0T1 = MS0T1 ♦ MS0T1Z SQT1Z = MS0T1Z PS0T1 = S0T1Z/SQT1 TAN = MTAN + MTANZ TANZ = MTANZ PTAN = TANZ/TAN ATN = MATN ♦ MATNZ ATNZ = MATNZ PATN = ATNZ/ATN COS = MCOS + MCOSZ COSZ = MCOSZ PCOS = COSZ/COS SQT2 = MSQT2 «■ MSQT2Z SQT2Z = MS0T2Z PS0T2 = S0T2Z/SQT2 WRITE(6,69) WRITE (6,71) WRITE(6,14) PMULT WRITE (6,14) PDIV WRITE (6,72) WRITE (6,14) PEXP WRITE (6,73) WRITE (6,14) PSQT1 WRITE (6,74) WRITE (6,14) PTAN WRITE (6,75) 173 WRITE (6,14) PATN WRITE (6,76) WRITE (6,14) PCOS WRITE (6,77) WRITE (6,14) PSOT2 99200 CONTINUE WRITE(6,21) WRITE(6,20) IX C CONSTANTS IN IBM 360 HEXADECIMAL FORMAT STOP END //GO.SYSIN OD * SENTRY 4080000000000000 I 4040000000000000 2 4020000000000000 3 4010000000000000 4 3F80000000000000 5 3F40000000000000 6 3F20000000000000 7 3F10000000000000 8 3E80000000000000 9 3E40000000000000 10 3E20000000000000 11 3E10000000000000 12 3D80000000000000 13 3D40000000000000 14 3D20000000000000 15 3D10000000000000 16 3C80000000000000 17 3C40000000000000 18 3C20000000000000 19 3C10000000000000 20 3B80000000000000 21 3B40000000000000 22 3B20000000000000 23 3B10000000000000 24 3A80000000000000 25 3A40000000000000 26 3A20000000000000 27 3A10000000000000 28 3980000000000000 29 3940000000000000 30 3920000000000000 31 3910000000000000 32 3880000000000000 33 3840000000000000 34 3820000000000000 35 3810000000000000 36 3780000000000000 37 3740000000000000 38 3720000000000000 39 3710000000000000 40 3680000000000000 41 3640000000000000 42 3620000000000000 43 3610000000000000 44 3580000000000000 4f 3540000000000000 ♦* 3520000000000000 4? 3510000000000000 *• 3480000000000000 41 3440000000000000 10 3420000000000000 II 3410000000000000 ft 3380000000000000 It 3340000000000000 I* 3320000000000000 II 3310000000000000 I* 3280000000000000 IT 3240000000000000 II 3220000000000000 II 3210000000000000 60 4118000000000000 1 40C0000000000000 2 4060000000000000 I 4030000000000000 4 4018000000000000 I 3FC0000000000000 6 3F60000000000000 7 3F30000000000000 8 3F18000000000000 9 3ECO0O0OOOOO000O 10 3E60000000000000 11 3E30000000000000 12 3E18000000000000 13 3DC0000000000000 1* 3D60000000000000 1* 3030000000000000 1* 3D18000000000000 I 7 3CCOOOOOOO0OO00O II 3C60000000000000 I 9 3C30000000000000 20 3C18000000000000 21 3BC0000000000000 22 3B60000000000000 23 3B30000000000000 24 3B18000000000000 25 3ACO0OOOOOOOOO0O 26 3A60000000000000 27 3A30000000000000 28 3A18000000000000 29 39C0000000000000 30 3960000000000000 31 3930000000000000 32 3918000000000000 33 38C0000000000000 34 3860000000000000 35 3830000000000000 36 3818000000000000 37 37COOOOOOOOOOOOO 38 3760000000000000 39 3730000000000000 40 3718000000000000 4 1 175 36C0000000000000 3660000000000000 3630000000000000 3616000000000000 35C0000000000000 3560000000000000 3530000000000000 3518000000000000 34CO000OO00OOO00 3460000000000000 3430000000000000 3418000000000000 33C0000000000000 3360000000000000 3330000000000000 3318000000000000 32C0000000000000 3260000000000000 3230000000000000 4067CC8FB2FE6126 40391FEF8F35343C -V01E27076E2AF2E4 3FF85186008B152C 3F7E0A6C39E0CBF2 3F3F815161F807C6 3F1FE02A6B106784 3EFF805515885DFE 3E7FE00AA6AC4398 3E3FF80155156150 3E1FFE002AA6AB0A 3DFFF80055515582 3D7FFE000AAA6AAA 3D3FFF8001555154 301FFFE0002AAA6A 3CFFFF8000555514 3C7FFFE00007FFFE 3C3FFFF80000FFFE 3C1FFFFE00001FFF 3BFFFFF800003FFE 3B7FFFFE000007FE 3B3FFFFF800000FE 3B1FFFFFE000001F 3AFFFFFF8O0OO03E 3A7FFFFFE0000006 3A40000000000000 3A20000000000000 3A10000000000000 3980000000000000 3940000000000000 3420000000000000 3910000000000000 3880000000000000 3840000000000000 3820000000000000 3810000000000000 3780000000000000 3740000000000000 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 176 3720000000000000 19 3710000000000000 40 3680000000000000 41 3640000000000000 42 3620000000000000 4} 3610000000000000 44 3580000000000000 49 3540000000000000 4* 3520000000000000 47 3510000000000000 4t 3480000000000000 49 3440000000000000 50 C0B17217F701CF62 1 C049A58844D36E40 2 C0222F10044FC8EE 3 C0108598B59E3A05 4 BF820AEC4F3A2210 5 BF408159624D611C 6 BF20202AEB11BCDC 7 BF10080559588B35 8 BE80200AAEAC44EC 9 BE40080155956156 10 BE20O2OO2AAEAB0A 11 BE1O008005559558 12 BD8002000AAAEAAA 13 BD40008001555954 14 BD2OOO2O002AAAEA 15 8D1O000800055559 16 BC80002000080002 17 BC40000800010000 18 BC20000200002000 19 BC10000080000400 20 BB80000200000800 21 BB40000080000100 22 BB20000020000020 23 BB10000008000004 24 BA80000020000008 25 BA40000000000000 26 BA20000000000000 27 BA10000000000000 28 B980000000000000 29 B940000000000000 30 B920000000000000 31 B910000000000000 32 B880000000000000 33 B840000000000000 34 B82O000O0O0O00OO 35 B810000000000000 36 B780000000000000 37 B740000000000000 38 B720OOOO0O000O00 39 B710000000000000 40 B680000000000000 41 B640000000000000 42 B620000000000000 43 B610000000000000 44 B58OOOOOO0OO00OO 45 177 8540000000000000 46 B52OOOO00O0O00OO 47 B510000000000000 48 8480000000000000 49 8440000000000000 50 4076B19C1586ECF9 1 403EB6EBF25901A1 2 401FD5BA9AAC2F65 3 3FFFAA0DB967EF20 4 3F7FF556EEA5D879 5 3F3FFEAAB776E530 6 3F1FFFD555BBBA94 7 3EFFFFAAAADDDDA2 8 3E7FFFF55556EEEA 9 3E3FFFFEAAAAB774 10 3E1FFFFFD55555B9 11 3DFFFFFFAAAAAAD4 12 3D7FFFFFF5555551 13 3D3FFFFFFEAAAAA7 14 3D1FFFFFFFD55553 15 3CFFFFFFFFAAAA9F 16 3C7FFFFFFFF5554F 17 3C3FFFFFFFFEAAA7 18 3C1FFFFFFFFF0553 19 3BFFFFFFFFFFAA9F 20 3B80000000000000 21 3B40000000000000 22 3820000000000000 23 3B10000000000000 24 3A80000000000000 25 3A40000000000000 26 3A20000000000000 27 3A10000000000000 28 3980000000000000 29 3940000000000000 30 3920000000000000 31 3910000000000000 32 3880000000000000 33 3840000000000000 34 3820000000000000 35 3810000000000000 36 3780000000000000 37 3740000000000000 38 3720000000000000 39 3710000000000000 40 3680000000000000 41 3640000000000000 42 3620000000000000 43 3610000000000000 44 3580000000000000 45 3540000000000000 46 3520000000000000 47 3510000000000000 48 3480000000000000 49 3440000000000000 50 40B17217F7D1CF62 66 413243F6A8885A2F 61 178 4U921FB54442D19 40C90FDAA22168C1 406487ED5110B450 SSTOP /* *2 41 44 BIBLIOGRAPHY Ashenhurst, R. L., "Function Evaluation in Unnormalized Arithmetic," Journal of the ACM, 11:2:168-187, April, 196U. Ashenhurst, R. L., N. Metropolis, "Unnormalized Floating Point Arithmetic," Journal of the ACM, 6:3:415-428, July, 1959 . Avizienis, A., "Arithmetic Microsystems for Synthesis of Function Generators," Proceedings of the IEEE, 5^:12:1910-1919, December, 1966. Bemer, R. W., "A Machine Method for Square-Root Computation," Communications of the ACM , 1:1:6-7, January, 1958. Bemer, R. W., "A Note on Range Transformation for Square Root and Logarithm," Communications of the ACM , 6:6:306-307, June, 1963. Bemer, R. ¥., "A Subroutine Method for Calculating Logs," Communications of the ACM, 1:5:5-7, May, 1958. Cantor, D., G. Estrin, R. Turn, "Logarithmic and Exponential Function Evaluation in a Variable Structure Digital Computer," IRE Transactions on Electronic Computers , EC-ll:2:155-l6H, April, 1962. Cochran, ¥. G., Sampling Techniques , New York, John Wiley & Sons, Inc., 1963. Combet, M., H. Van Zonneveld, L. Verbeek, "Computation of the Base Two Logarithm of Binary Numbers," IEEE Transactions on Electronic Computers ," EC-lU: 6:863-868, December, 1965. Cowgill, D. "Logic Equations for a Built-in Square Root Method, "IEEE Transactions on Electronic Computers , EC-13:2:156-157, April, 1964. Frieman, C. V., "Statistical Analysis of Certain Binary Division Algorithms," Proceedings of the IRE , 49:1:91-103, January, I96I. Garner, L., "Number Systems and Arithmetic," Advances in Computers , Vol. 6, New York, Academic Press Inc., 1965. Gilman, R. E., "A Mathematical Procedure for Machine Division," Communications of the ACM, 2:4:10-12, April, 1959 . Goto, M. , T. Fukumura, "Application of Generalized Minimal Representation of Binary Numbers to Square Rooting," Electronics and Communications in Japan , 50:5:122-129, May, 1967. Hart, J. F., Handbook of Computer Appr ox imat i on s , New York, John Wiley & Sons, Inc., 1968. Hastings, C, Appr ox imat i ons for Digital Computers , Princeton, New Jersey, Princeton University Press, 1955. Kish, L., Survey Sampling , New York, John Wiley & Sons, Inc., I965. 179 180 Kogbetliantz, E. G., "Computation of Arcsine N for N Between and 1 Using an Electronic Computer," IBM Journal of Research and Development , 2:218-222, July, 1958. Kogbetliantz, E. G., "Computation of Arctan N for N Between Plus and Minus Infinity Using an Electronic Computer," IBM Journal of Research and Development , 2:^3-53, January, 1958. Kogbetliantz, E. G., "Computation of E to the N for N Between Plus and Minus Infinity Using an Electronic Computer," IBM Journal of Research and Development , 1:110-115, April, 1957. Kogbetliantz,- E. G., "Computation of Sin N, Cos N and M Root of N Using an Electronic Computer," IBM Journal of Research and Development , 3:1^7-152, April, 1959. Lenaerts, E. H., "Automatic Square Rooting," Electronic Engineering , 27:287-289, July, 1955. MacSorley, 0. L., "High Speed Arithmetic in Binary Computers," Proceedings of the IRE, ^9: 1:67-91, January, I96I. Meggitt, J. E., "Pseudo Division and Pseudo Multiplication Processes," IBM Journal of Research and Development , 6:2:210-226, April, 1962. Metze, G., "A Class of Binary Divisions Yielding Minimally Represented Quotients," IRE Transactions on Electronic Computers , EC-11: 6: 761-76^, December, 1962. Metze, G., "Minimal Square Rooting," IEEE Transactions on Electronic Computers , EC-li+:2: I8I-I85, April, I965. Penhollow, J. 0., "A Study of Arithmetic Recoding with Applications in Multiplication and Division," Ph.D. Thesis, University of Illinois, September, I962. Reitwiesner, G. W., "Binary Arithmetic," Advances in Computers , Vol. 1, New York, Academic Press Inc., i960. Robertson, J. E., "A New Class of Digital Division Methods," IRE Transactions on Electronic Computers, EC-7: 5:218-222, September, 195&7" Robertson, J. E., "Increasing the Efficiency of Digital Computer Arithmetic Through Use of Redundancy," Lecture Notes for EE U97 B, University of Illinois, Fall Semester, 196^. Robertson, J. E., "The Correspondence Between Methods of Digital and Multi- plier Recoding Procedures," University of Illinois DCL Report No. 252, December 21, 1967. Sarafayan, D., "Divisionless Computations of Square Roots Through Continued Squaring," Communications of the ACM, 3:5:319-321, May, i960. 181 Schreider, Y. A., Method of Statistical Testing Monte Carlo Method , New York, Elsevier Publishing Co., 196^. Senzig, D. N., "Digit-by-Digit Generation of the Trigonometric and Hyperbolic Functions," IBM Research Report , RC-860, December 17, 1962. Shively, R. R., "Stationary Distributions of Partial Remainders in SRT Digital Division," Ph.D. Thesis, University of Illinois, June, I963. Specker, W. H. "A Class of Algorithms for Ln X, Exp X, Sin X, Cos X. Tan' 1 X, and Cot" 1 X," IEEE Transactions on Electronic Computers , EC-l4:l:85-86, February, 1965 . Tung, C., "A Combinational Arithmetic Function Generation System," Ph.D. Thesis, UCLA Report No. 68-29, June, 1968. Voider, J. E., "The CORDIC Trigonometric Computing Technique," IEEE Trans - actions on Electronic Computers , EC-8:5 :330 - 33^ September, 1959. VITA Bruce Gene De Lugish was born on March l6, 19^3 in Davenport, Iowa, After graduation from high school in Rock Island, Illinois, he earned the Bachelor of Science, Master of Science, and Doctor of Philosophy degrees, all from the University of Illinois at Urbana, in June, 19&5, August, I96Q, and June, 1970, respectively. As an undergraduate he was elected to three honor societies: Eta Kappa Nu, Tau Beta Pi, and Sigma Tau. He was also chosen Outstanding Senior in the Men's Residence Halls Association in I965. As a graduate he was in- vited to membership in Phi Kappa Phi. During graduate work he was a teaching fellow and teaching assistant in the Department of Electrical Engineering and a research assistant in the Department of Computer Science. Summer employment included work with Bell Telephone Laboratories at Holmdel, New Jersey and with Lawrence Radiation Laboratory at Livermore, California. Mr. De Lugish is a member of IEEE and ACM. 182 tf* I* \SW