LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 5\0. 81 JJdA Cop. Z The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN 1 3 R ■J L161 — O-1096 Digitized by the Internet Archive in 2013 http://archive.org/details/specialpurposepr796garc f / o. f y \JL6a) w.7% uiucdcs-r-76-796 Jw^jIai SPECIAL PURPOSE PROCESSORS FOR RADIO AIDS AND FUNCTION GENERATION IN AIRCRAFT SIMULATORS by Gilles H. Garcia February 1976 . DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN URBANA, ILLINOIS Report No. UIUCDCS-R-76-796 SPECIAL PURPOSE PROCESSORS FOR RADIO AIDS AND FUNCTION GENERATION IN AIRCRAFT SIMULATORS BY Gilles H. Garcia February 1976 Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 This work was supported by the Department of Computer Science and the Institute of Aviation and was submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, 1976. TABLE OF CONTENTS i Section Page 1 . INTRODUCTION 1 1.1 The ILLIMAC Project 1 1.2 Radio Aids Coverage 3 1.3 The Problems 3 2. ANALYSIS OF DIGIT-BY-DIGIT METHODS 9 2.1 Comparison of the Voider; De Lugish; Cantor, Estrin and Turn; Walter and Chen Approaches to the Coordin- ate Rotations Methods 9 2.2 The Principle of Operation 10 2.3 Convergence and Domains of Operation 13 2.4 Precision for Radix 2 23 2.5 Computation of the Constants Stored in the ROM 27 3. PROCESSOR SIMULATION ..,. 34 3.1 Software Simulation 34 3 . 2 Tri gonometri c Mode 36 3.3 Linear Mode 38 3 . 4 Hyperbol i c Mode 38 3.5 Precision of Results without Initial Normalization.... 40 Section Page 4. THE 8-BIT PROTOTYPE OF A CORDIC ARITHMETIC UNIT 42 4.1 Hardware Realization 42 4.2 Control of the Arithmetic Unit 42 - Mode of Operation of the Control 43 - Structure of the Basic Microprogram- mable Unit 45 4.3 The Use of Scalers for 2's Complement Shifting 47 4.4 An 8-Bit Prototype CORDIC 54 5. METHODS BASED ON FUNCTION APPROXIMATION 68 5.1 High Speed Division Using High Speed Multipliers 68 5.2 Division and Function Generation Using Optimum First Order Polynomial Interpolation with Se- lected Abscissas 69 - Optimum Polynomial Interpolation with Selected Abscissas 70 - Range Transf ormati ons 72 - Legendre and Tchebychev Interpolation of First Order . 72 - Precision Attainable 75 5.3 Second Order Interpolation 79 5.4 Higher Order Interpolation and Precision 81 5.5 Application to Arbitrary Function Generation in Aviation 83 Section Page 6. CONCLUSION 87 7. APPENDIX 89 8. REFERENCES 107 LIST OF FIGURES 11 Figure Page 1.1 General Approach to Flight Simulation 4 1.2 Radio Navigation Aids Coverage 5 2.1 Nulling the Quantity A. 14 2.2 The Double Induction Process 16 2.3 Shifting Operation Using Barrel Shifters 24 2.4 Accumulation of Errors in the Shifting Process 26 2.5 Accumulation of Errors with Guard Bits 26 2.6 Realization of Guard Bits with Barrel Shifters 28 3.1 Format of the Simulation Program Printout 35 3 . 2 Steps i n the Tri gonometri c Mode 37 3.3 Geometric Interpretation for Linear Mode 39 3.4 Relative Error as a Function of the Slant Range * 41 4.1 General Microprogramming Unit 44 4.2 General Microprogrammed Control 46 4.3 Mi croprogram Control and Timi ng 48 4.4 Scaler and Truth Table 49 4.5 Sign Propagation with the 8-Bit Scaler 51 Figure Page 4.6 Correction of Sign Bit, First Method 52 4.7 Correction of Sign Bit, Second Method 53 4.8 Block Diagram of the CORDIC Unit 55 4.9 Control Flowchart (Trigonometric Mode) 57 4.10 Microinstruction Address Register 58 4.11 Microinstruction Register 60 4.12 Input Multiplexers 61 4.13 Outputs 62 4.14 CORDIC Arithmetic Unit, X Portion 63 4.15 CORDIC Arithmetic Unit, Y Portion 64 4.16 CORDIC Arithmetic Unit, A Portion 65 5.1 Organization of an ALU for Function Generation 86 A. 5.1 and A. 5. 2 ROM Usage proposed by Steffanelli 106 LIST OF TABLES 111 Page Table 2.1 Values of tan" 1 (1/2) 1 " 1 29 Table 2.2 Values of tanh" 1 (1/2)" 1 31 Table 2.3 Values ofllK- =11 [1 + 2" 2i ] 1/2 33 o o Table 4.1 Microprogram for the Trigonometric Mode 67 Table 5.1 Error for the First Order Approximation for f(x) = l/x and o(=l/2 76 Table 5.2 Error for the First Order Approximation for f(x) = l/x and o/=^2/2 (Tchebycheff Approximation) 77 Table 5.3 Error for the First Order Approximation for f(x) = l/x and tf=V3/3 (Legendre Approximation) 78 Table A.l Output for the Trigonometric Mode 90 Table A. 2 Output for the Linear Mode 93 Table A. 3 Output for the Hyperbolic Mode 96 ACKNOWLEDGMENTS iv The author would like to express his appreciation to Professor William Kubitz for his invaluable guidance, support and continuing help throughout this work. My debt is particularly great to Professor Ralph Flexman, the Director of the Institute of Aviation, for his constant support and to Lynn Staples with whom I have had many valuable conversations. His fruitful and original ideas in the area of flight simulation and his constant friendship and understanding during this project contributed in no small way to its success. My sincere thanks are also extended to Stanislas Zundo for the care he took to do all the drawings and to the Department printing ser- vices for the quality of their work. 1. INTRODUCTION 1.1 The ILLIMAC Project The purpose of this work was to investigate the feasibility of a special purpose hardware processor for flight simulation. The pro- ject was supported by The Institute of Aviation, Singer SPD Corporation and the Department of Computer Science. The Institute of Aviation has as a definite long-range goal updating its curriculum and activities to incorporate advanced simulation and sophisticated avionics so that the student may have a better introduction to the total field of modern aviation. The Institute would like a low cost, maintenance-free, high- fidelity simulator which meets the training requirements from begin- ning student pilot through private, commercial, instrument and multi- engine pilots. Recent advances in simulation engineering have made great contributions to improving the quality of pilot training. Unfortunately these advances in the engineering of flight simulators have not been available to most of general aviation because of the high cost assoc- iated with their acquisition. ILLIMAC I (University of Illinois Mini Aviation Computer) of- fers a reasonable probability for providing significant cost reduction to flight simulation for all levels of application. Present simulator technology available to general aviation in an acceptable cost bracket is either too limited in fidelity, range of performance, or reliability to permit full exploitation of the flight simulator concept. It is believed that the use of digital computation and microprogramming in the ILLIMAC will resolve many of the aforementioned problems. Among the different features of the ILLIMAC Project are: 1. Solving flight and radio aids problems in six degree of freedom. 2. A Central dedicated arithmetic unit which performs the major arithmetic computations in the flight and radio problems. 3. Several satellite special purpose processors which remove the major load from the central processor. 4. Expandability so as to allow ILLIMAC to simulate fuel depletion, control loading, motion problems, flight ac- cessories and other functions by adding other satellite processors. 5. Microprogramability. All programmed instructions for the computations which are common to all flight simulators are preprogrammed in ROMs. Functions which identify par- ticular aircraft characteristics or NAV/COM and locations are programmed using RAMs or PROMs. 6. Three data communication links. The first link is the data transmission system between the ILLIMAC and the in- structor's console. The second link is the data transmis- sion system to the visual display generation equipment. The third link is the data transmission of a common audio system where multiple audio systems can be channeled so that each channel may be independently selected by tuning a simulated aircraft type receiver. 7. Radio interchangability. The system provides for rear- rangement of the various radios without reprogramming the computer. Figure 1.1 shows a diagram of the general approach to digital flight simulation. 1 .2 Radio Aids Coverage ILLIMAC is designed to cover up to 600,000 square miles. Al- though it does not have the storage capacity to store radio facilities for this large area, it does offer the ability to program radio stations over a path 200 miles wide from Lost Angeles to New York (Figure 1.2). This capacity permits the simulation of long cross-country jet aircraft flights at high altitudes. Since the mission is a high altitude flight programmed to land at a specific airport, the PROM memory system contains only the V0R-DME stations necessary for this specific training exercise. When the training mission occurs in a local area, all facili- ties may be programmed. In this case, the coverage will be 60,000 square miles. The user of the equipment may change from one type of program to another by having a spare memory card which has been preprogrammed for either application. 1.3 The Problems The aim of this study is to investigate various types of r < r < < 3 LL) > ID co O a LJJ C_> C_> < I I < o a: o cc to UJ 3 -1 C_> LU _J 5:^> g o cc ♦ ♦ t 6 o cc t £ ii tS £ 8 < 3£ to _3 2: UJ < o F < TT L* I co LU O CO cc cc en c o oo o o s- Q. Q. (T3 CD 5- > ^ £ DESIGNED FOR 256 FACILITIES WITH TOTAL PROGRAMABLE AREA INCLUDING CONTINENTAL UNITED STATES 5 200 Ml TYPICAL AREA FOR JET TRAINING FLIGHT FROM LOS ANGELES TO NEW YORK TYPICAL AREA FOR TRAINING FLIGHT FOR STANDARD AVIATION TRAINER AIRCRAFT Fig. 1.2 Radio Navigation Aids Coverage arithmetic processors for use in a real time environment where high ac- curacy and resolution are required for values of the elementary trans- cendental functions such as the many trigonometric relationships in- volved in the navigation equations of aircraft simulation. This arith- metic unit would be one part of the ILLIMAC described above. Listed below are problem areas which have historically con- sumed a large amount of flight simulator digital computer time and memory: 1. Arbitrary function generation of 1, 2 and 3 variables using fixed and/or variable breakpoints. 2. Algorithms for the computation of standard trigonometric and inverse trigonometric functions. 3. 3x3 matrix coordinate transformations. 4. Solutions of the basic equations of motion to provide velocities and rotation angles. Use of quaternion/direc- tion cosine techniques to resolve the gimbal lock problem. 5. Integration techniques satisfying the short term and long term dynamic problems encountered in aircraft simulation. 6. The RMA (/x 2 + y 2 + z) problem. The arithmetic unit must be able to multiply, divide and solve all or a large number of the aforementioned problems. Two main ap- proaches have been considered: 1. Coordinate Rotation Methods developed by J. Voider, B. G. DeLugish, T. C. Chen and J. S. Walther. 2. Methods based on function approximation employing techni- ques common to numerical analysis and relying on algorithms using multiplication as the basic operator. Recent ad- vances in LSI technology allow the use of fast and rela- tively inexpensive monolithic multipliers. These two methods have advantages in: speed cost, and uniformity in the various algorithms (use of a fast multiplier). We believe that the second approach is more flexible and al- lows for more efficient use of the large scale memories available today. For general purpose processors, where the precision required does not exceed 32 bits, this second method can be much faster than the coordi- nate rotation technique. However the first method lends itself more naturally to low speed, high accuracy processors. A striking feature in both techniques is the similarity of the different function routines. That is, a common microprogram control subroutine and common processor may be used in the different modes of operation. This represents an economy in the total amount of hardware required. Both techniques are useful to the designer of small and rela- tively powerful machines. The main difference is that the numerical analysis approach can be yery fast because it does not operate in a digit by digit manner. The operations are performed using an n x n bit parallel multiplier resulting in essentially a hardwired implementation of the function approximation algorithms. This second method is the preferred one with the recent introduction of 4 x 2 and 4x4 monolithic array multipliers. The basic building block for numerical computa- tions need no longer be the 4 bit adder or ALU today. Rather, fast multipliers {yery expensive only two years ago) may be used for imple- menting complex routines. This technique lends itself naturally to the arbitrary function generation problem and all interpolation problems in general. Moreover, this leads to an interesting question for the designer: How to adapt the well developed numerical analysis methods to the available LSI technology in building a special purpose numerical processor. Aviation, and simulation in general, illustrates well how cost can be drastically reduced by using special purpose hardware, removing most of the computational burden from the software design. The coordinate rotation technique will be explained in detail in Section 2. An 8-bit unit, implementing the trigonometric mode has been built. In addition, an n bit unit implementing trigonometric, hyperbolic and linear modes has been simulated. A theoretical study of the most efficient use of numerical al- gorithms is examined in Section 3. Simulation and realization of these methods is left for further study. 2. ANALYSIS OF DIGIT-BY-DIGIT METHODS 2.1 Comparison of the Voider; De Lugish; Cantor, Estrin and Turn; Walter and Chen Approaches to the Coordinate Rotations Methods D. Cantor, G. Estrin and R. Turn [CAN70] propose a sequential table look-up algorithm for calculating lnx and e x . The main feature of this algorithm is choosing the multipliers having short word length in order to force the operand to the value 1 or to 0. Speed is attained through the proper choice of short multipliers. The number of precom- puted constants is 2 for e x if x has n bits. Thus 2 " + 2 n ~ con- stants are needed because of complementary terms needed in the con- stants. Recoding is not used. De Lugish [LUG70] has generalized the Cordic principle deve- loped by Voider in 1959, by evaluating functions using redundant numbers of radix two. Chen [CHE72] applied the method to nonredundant numbers, using a Taylor series to approximate the lower half of the number being computed. De Lugish' s method, like Walther's, tests a variable whose value determines the precision of the approximation and the sign of the correction to be made to the current value. He showed that by using constants of the form a. = 1 + S. 2~ 1_ where S. can take the values l l l {-1, ,+1} he could obtain a better shifting average. The problems in the implementation for different functions resides in: — The lack of a common hardware topology. Many different paths must be provided for each function if several func- tions are to be implemented. 10 -- The initialization process is neither simple nor uniform for all the algorithms. -- The precision is computed assuming that the adders have infinite lengths. For L bits of accuracy, Walther points out that (L + log^L) bits of storage are required. This remark holds for De Lugish's algorithms as well. The maximum computation time for both De Lugish's and Walther' s methods is on the order of n for n bits accuracy. The number of stored constants for De Lugish's and Walther' s method is the same. However Walther (by generalizing Voider' s equations) uses the same basic method for initializing several variables and one of them is forced to 0. One of the great advantages of this method is that only one relationship is required for all functions, using only 2 sets of stored constants. Another inconvenience that is found in some of De Lugish's algorithms is the necessity of changing recursion relations in the middle of a computation (e.g., cosine and sine) making the control more complex than in Walther' s method. 2.2 The Principle of Operation The basic algorithm was first proposed by Voider (Coordinate Rotation Digital Computer) for use in trigonometric relations found in flight simulation. It was generalized by J. S. Walther to the hyper- bolic mode. The convergence properties will be discussed in detail in the following section. In this section the iteration equations will be briefly reviewed. The generalization introduced by Walther makes use of a parameter m in a coordinate system in which the radius R and angle 11 A of the vector P = (x,y) are defined as: R = (x 2 + my 2 ) 1/2 -1/2 . -1, 1/2 yx A = m tan (m ~) A The curves corresponding to solutions of the equation R = constant = C Q are: a circle of radius Cq , a line parrallel to the y-axis with equation X = C« , and a hyperbola for m = 1 , m = and m = -1 respectively. Let a new vector P.,, = (x.^,, y..,) be obtained from P. = l+l l+l •'i+V i (x-, y.) according to X. +1 = X, + W i 6 i y 1+1 ■ y, - x.«.. The angle and radius of the new vector in terms of the old one are given by: A.., = A. - a. l + l i l R.^, = R. * K. l+l l i where -1/2 . -l r 1/2 j -, a. - m tan [m o.J K. = [1 + m6 2 ] 1/2 l l 12 After n Iterations: n-1 A = A - 7 a. = A -a n o .L Q i o n-1 R = R * n K. = R *K n o i=Q i o Solving for x and y 3 n -'n = n K. {x cos(am 1/2 ) + y m 1/2 sin(am 1/2 )} n i=Q 10 - -o y n = . n K i {y o cos ( aml/2 ) " x m_1/2 sin ( aml/2 )> Z accumulates the angle variations: n Z = z + a. n o If A is forced to zero: y = If z is forced to zero: z = n m = 1 , m = 0, m = 1 correspond to the trigonometric, linear, and hyperbolic mode. In the case where 6. is chosen to be of the form 2 , the multi- l plications by 6- are merely right shifts. By proper choice of the initial values for x , y , and z , solutions for y/x, sinz, cosz, tan" y, sinhz, coshz and tan y may be obtained. The square root can be computed by using *^w = (x -y ) where x = w + 1/4 and y = w - 1/4 In the hyperbolic mode one can obtain the logarithm: 13 lnw = 2 tanh~ (y/x) where x = w + 1 and y = w - 1 . 2.3 Convergence and Domains of Operation By convergence we mean that either the Y register or the A register converges to during the iterative sequence. This is equiva- lent to nulling the contents of the angle register by adding to and sub- tracting from it a series of predetermined constants denoted by a.. To be resolved are: 1. Convergence conditions: What are the restrictions on the constants a.? i 2. Convergence domain: How large may the initial angle be and still guarantee convergence to zero? The primary proof is in [WAL 71]. The A register contains successive values A., A. + ,, . . ., given by: A. +1 = A. ± a. a. > (2.1) where |A..,| = |A.| - a. (A. decreases in magnitude). 1 i+l ' ' l ' ii a ' We wish to guarantee that after n steps, the final angle is sufficiently close to 0. That is its absolute value will be less than a in the worst case, n The worst case occurs when the value of A becomes zero and we add (or subtract) the largest constant during the next step (Figure 2.1) Thus, we wish to guarantee that: 14 OC ) LAST STEP n+1 ► Figure 2.1 Nulling of the Quantity A. 15 °i " * °n (2 - 2) The domain of convergence is |A | < I a. + a (2.3) 1 1=1 n n for n steps (n angles a,, . . ., a ) Theorem : Given any (positive or negative) A, convergence is guaranteed if (2.1), (2.2) and 2.3) hold. To prove this we must establish that n |A. | - I a. < a n (Property P(n,i)) which can be done using double induction (i.e., on two variables) (Figure 2.2) a. The statement is true for (n, 1) because we choose n |A, | - I a. < a 1 j=l J by hypothesis (domain of convergence). Now if P(n,l), we must show P(n+l,l). For the same reason n+1 I A, I - 7 a . < a . , ' ] ' j=l J n+1 by choice of the domain of convergence for the initial angle A,. b. Now consider P(n,i) => P(n,i+1). That is, if |A | < I a. + a (2.4) i j=1 J n Given a property J (i,n) P(i,n 16 (1.1) * I Jl.n). -(i.n+1) if ftl.U 1 P(i.D >=> ^(1+1.1) and ^(i.n) :> 7 J (i,n+l) this guarantee the property to be true for all (i,n)'s Figure 2.2 The Double Induction Process 17 this implies that n |A. .1 I < I a- + a 1+1 ' j= f +1 J n If we subtract a. from (2.4) l n A. ' L I - a. < 7 a. + a - a. 1 i fa n which can be written n l A i' " a i < a n + I a i* 1 n n j=i+l J To be able to use : IM -a. | - |A 1+1 |. we must also show that n J : We can use (2.2) which is the constraint on the angles n 1 1 ■a - I a. < |A. | - a. n j=i+l n ] n or a. - y a. < a 1 j-T+i J n -a. < |A.| - a. and -a - I a. < -a. by (2.2) n j=i+l J n Thus, we have shown that 18 n n -a - 7 a. < -a. < |A. I - a. < a + T ot. n j4 +1 J i ' i 1 i n j= ^ +1 j which implies | A.. J T+i J J- for the 3 modes is that n a. < y a. + a 1 j=i+l J n must hold for a. = tan" 1 2" 1 , a. = tanh 2 l and a.,. = 1/2 1 . Linear Mode We have I = 2 " (i+1) + 2" (i+2) + . . . + 2~ n + 2~ n . 19 This sum is equal to 2 and thus satisfies the convergence criterion. Trigonometric Mode tan" ( 1/2 1 ) is to be compared with T = tan 2 v '+...+ tan 2 + tan 2 (2.5) We have also that tan" 1 2" n + tan" 1 2" n = tan' 1 (2" n+1 /l-2" 2n ). Thus tan 2 + tan 2 > tan 2 By applying this inequality repeatedly to the last two terms of (2.5) one gets: T > tan' 1 2~\ Q.E.D. Hyperbolic Mode For the hyperbolic mode we will see that the inequality does not hold in general but that by repeating some of the a. constants it can be made to hold. The sum to reduce is: H = tanh" 1 2" (i+1) ♦ tanh" 1 2" {i+2) + . . . + tanh" 1 Z n . We also have that tanh" 1 a + tanh" 1 b = tanh" 1 ^/ab (2,6) and thus, 20 tanh ' 2 n u + tanh ' 2 U U = tanh ' - zrrprn 1 + 2 u u -1 2~ 1 -1 -1 = tanh 9M+1 1 < tann 2 1 + 2 l u which implies that tanh-Y 1 ' > tan.fY (1+1 > + tanffY< 1+1 > > ttnh-Y (1+1) + tanh" 1 2" (i+2) (2.7) or a i > a i+l + a i+l > a i+l + a i + 2- We can here ask ourselves the question: What is the element x. that must be added to the second member of the inequality in order to make it an equality? tanh-V 1 5 tanh-¥ (i+1 > ♦ tanh^w" 11 * 2 * ♦ x. x, > tanh" 1 2" 1 - tanh" 1 2" (1+1) - tanlf V* 1 * 1 ' x. > tanh'V 1 - tanh" i " """' ' "*"" 1 + 2 -(2i+2) ,-i 2" 1 -1 " 1 + 2" (2i+2) x. I tanh 0~~ { , I 2 -(2i+2) } 1 + 2 This becomes: 21 .1 2 "(31+2) X i " Unh 1 _ 3x 2 -(2i+2) • We can choose the quantity x. = tanh" 1 (2" (3i+2) x 2) = a 3i+1 to be greater than since i 2 -(3i+2) tanh [ -(2i+2) ] 1 - 3x 2 U1 L) 1 >1. 1 - 3x 2^V We have then: a i < a i+l + Vl + a 3i+l (2 ' 8) after adding x. = ot~« + , to the second member of the inequality (2.7). Writing (2.8) for i+1 , i+2, . . ., 3i-2, Vl < a i+2 + a i+2 + a 3i+4 a i+2 < a i+3 + a i+3 + a 3i+7 a 3i-2 < a 3i-l + a 3i-l + a 3(3i-2)+T and adding, one obtains: 3i-2 a i < a i+l + a i+2 + • • • + a 3i-l + a 3i + .\. a 3j+l 22 By applying inequality (2.7) successively this reduces 3i 3i-4 i < k= | +1 a k + X a 3j+l + a 3(3i-3)+l + a 3(3i-2)+l a to 3i 3i-4 1 k=i+l k j=i 3j ] 3(3i-3)+l 3(3i-3)+l and finally to: 3i 3i-4 a i < k= ^ a k + i. a 3j + l + a 3(3i-3) * We repeat this reduction by writing the last inequality in the form: 3i 31-5 a i < k= l +1 a k + 1 a 3j+l + a 3(3i-4)+l + a 3(3i-3) ■ Using the inequality ot 3(3i_3) < a 3(3i-4)+i» 3i 3i-5 a i < k= \ +] a k + .^ a 3j+l + a 3(3i-4) + l + a 3(3i-4)+l ' The final inequality obtained is: 3i 31+1 a i < k= l +1 a k + a 3i+l + a 3i+l = J 1+1 a k + a 3i+l For the hyperbolic process, the significance of the last in- equality is that, starting from cu we have: a, < ou + ou + a. + a. . Starting from a.: a 4 < a 5 + a 6 + . . . + a ]2 + 0^3 + a ]3 23 Starting from a,-,: a 13 K a 13 + a 14 + • ' ' + a 40 + a 40 • Th us, depending on the precision desired the quantities {a^, a, 3> a.^, a, ? ,» . • • , a., a~. + ,} will have to be added twice (or subtracted twice) to insure convergence. 2.4 Precision for Radix 2 The accuracy of these algorithms is limited by the finite length of the registers and adders. An n-bit arithmetic unit is used to perform all of the operations and the following discussion examines the value of the worst case error after n steps of computation. Let us sup- pose that we have an n-bit adder and an n-bit register for storage of intermediate results. For the shifting operation we have chosen 8 bit barrel shifters 2 and have tried to minimize the cost by using (n )/8 barrel shifters 2 rather than the 2((n )/8) that would be required in order to retain the shifted bits. This truncation will have some effect on the accuracy (see Figure 2.3). In the following, the CORDIC equations below will be applied n times in sequence. Vi ■ x i + m 2_i h Vi ■ Y i - 2_i h The worst case error will occur when all of the shifted bits are o 24 s 2 1 = l - 2" 1 -1 -2 -2 +2 ' + 2 * = 1 - 2 * +2" 1 + . . . + 2" n = 1 ■ - 2" n E I (1 - 2" 1 ) 25 one's and the successive operations performed are the same (n successive additions or subtractions). If we assign a weight of 2 to the least significant bit the successive errors are (Figure 2.4): First step 0=0 Second step Third step n th step The sum of these errors is: n I 1=1 E = n - (1 - 2" n ) E = n - 1 + 2" n < n and this will consequently make the log ^(n) least significant bits of the result wrong. Thus, a full n-bits accuracy will require n + log ? n bits of storage. For n = 16, the maximum error will be n - 1 + 2" n = 15 + 2" 16 = 15 If we use an additional 4 bit ALU and 2 barrel shifters so that the four most significant bits of the n bits lost while shifting are retained, the errors will be (see Figure 2.5) Sixth step 2' 5 = 2" 4 (1-2" 1 ) Seventh step 2" 5 + 2" 6 = 2~ 4 (l-2~ 2 ) n th step 2 -5 + 2' 6 + . . . + 2" n = 2" 4 (l-2~ n+4 ) and their sum is: 'ii«r step siau) sii? ,i IH STEP BITS LOST 2' 2° 2"' T 2 26 ERROR ■* 2-'+2~ 2 ■* 2-' + 2- 2 -2'' 2 J + 2- a +- 2- n Figure 2.4 Accumulation of Errors in the Shifting Process BITS LOST N AWED UURIifi THE ALG0RITH1 Figure 2.5 Accumulation of Errors with Guard Bits ERROR TEK1S 2- 5 + 2"V.. 2-* 27 n-4 E = 2~ 4 I (1 - 2" 1 ) 1 i=l E 1 ■ 2" 4 (n-4) - 2" 4 (1 - 2" n+4 ) = 2" 4 (n-3) + 2' n < 2" 4 (n-2) and thus the error is considerably diminished. In the n = 16 case: -4 E, < 2 (16 - 2) = 14/16 < 1 and thus 16 bits of accuracy can be obtained by using one additional 4 bit ALU and 2 barrel shifters (Figure 2.6). 2.5 Computation of the Constants Stored in the ROM There are three types of constants involved in the CORDIC algorithms: a. L. = 2' 1 l b. T. = tan" 1 2' 1 c. H. = tanh" 1 2' 1 l a. The constants used in the linear mode can be generated by using a serial-in, parallel -out shift register. b. The arctangent constants have been computed in both decimal and binary. The binary number represents angles (Table 2.1). The following conventions are used: it is represented by 10.0 . . . j 01.0 ... J 00.1 . . . and so on. CO 28 cCs VO CO «d- c\j 00 in V) o •r- +-> CD e O) to s- CD 4-> O0 QJ S- S- CQ -a s- ro O c o +-> N m 03 us CM s- CD 29 tan" 1 (1/2) 1-1 Decimal Binary Rounded to 25 places 1 45.0000000000 ' 00, 2 26.5650511771 00. 3 14.0362434679 00. 4 7.1250163489 00. 5 3.5763343750 00. 6 1.7899106082 00. 7 0.8951737102 00. 8 0.4476141709 00. 9 0.2238105004 00. 10 0.1119056771 00. 11 0.0559528919 00. 12 0.0279764526 00. 13 0.0139882271 00. 14 0.0069941137 00. 15 0.0034970569 00. 16 0.0017485284 00. 17 0.0008742642 00. 18 0.0004371321 00. 19 0.0002185661 00. 20 0.0001092830 00. 21 0.0000546415 00. 22 0.0000273208 00. 23 0.0000136604 00. 1 0000000000000000000000 0100101 11 00100000001 010 001001 1 11 1 101 1001 1 10000 00010100010001000100011 00001 01 0001 01 100001 1010 00000101000101 110101 1 11 000000101000101 1 1 101 100 0000000101000101 1 1 11000 00000000101000101 1 1 I 100 OOCOOCCGClOlOCClCil 1110 0000000000101000101 1 1 11 00000000000101000101 1 1 1 000000000000101000101 1 1 0000000000000101000101 1 00000000000000101000101 00000000000000010100010 00000000000000001010001 00000000000000000101000 00000000000000000010100 00000000000000000001010 00000000000000000000101 00000000000000000000010 00000000000000000000001 shifting Table 2.1 Values of tan" 1 (1/2) i-1 30 It can be seen (Table 2.1) that for i > 8 the successive con- stants are obtained by shifting. The Taylor expansion for arctan is , 3 5 7 ■•I y y y tan x = x q + ~ c 7 ' * ' l x l < 1 If tan" x has n bits resolution, we see that when |x| is less than 2 ' , the additional terms of the series (x - x . . .), will not contribute to an n bit precision angle and thus we will have, for |x| < 2- n/3 , tan" x = x or if 1 > § , tan" 1 2" 1 ' = 2" 1 ' c. The tanh" constants have a similar property and have been computed in the same way (Table 2.2). For x > -z tanh" x - x As seen in the CORDIC equations, the radii of the successive vectors are modified by multiplicative constants of the form: T2T K. = A + m 2 for base 2 algorithms. me (-1 , 0, +1) After n steps of the iteration, the magnitude of the rotated vector has grown by a factor K. n-1 K = n /l + m 2 dl n 1-0 tanh" 1 (1/2)" 31 Decimal Binary (rounded 25 places) 1 0.5493061443 00. 2 0.2554128119 00. 3 0.1256572141 00. 4 0.0625815715 00. 5 0.0312601785 00. 6 0.0156262718 00. 7 0.0078126590 00. 8 0.0039062699 00. 9 0.0019531275 00. 10 0.0009765628 00. li 0.0004882813 00. 12 0.0002441406 00. 13 0.0001220703 00. 14 0.0000610352 00. 15 0.0000305176 00. 16 0.0000152588 00. 17 0.0000076294 00. 18 0.0000038147 00. 19 0.0000019073 00. 20 0.0000009537 00. 21 0.0000004768 00. 22 0.0000002384 00. 23 0.0000001192 00. 24 0.0000000596 00. 25 0.0000000298 00. 10001 100100111110101010 01 00000101 100010101 1 1 1 00100000001010110001010 00010000000001010101 101 000010000000000010101 10 0000010000000000000101 1 000000 1 ooooooooooooooi 00000001 OOOOOOOOOOOOOOI 00000000100000000000001 0000000001 0000000000001 00000000001 000000C0C0C1 00000000000100000000001 00000000000010000000001 00000000000001000000001 00000000000000100000001 00000000000000010000001 00000000000000001000001 00000000000000000100000 00000000000000000010000 OO0OOO0OO0O00O0O00Q1 000 00000000000000000000100 00000000000000000000010 00000000000000000000001 00000000000000000000001 00000000000000000000001 Table 2.2 Values of tanh" 1 (1/2)" 1 32 Table 2.3 shows the value in decimal and binary of the scaling factor K for n up to 24. n r The two scaling factors K, and K , can be stored in one ROM location if division by K is necessary during a computation. 1 n k o 33 Decimal Binary 1. 1 1. 2 1. 3 1. 4 1. 5 1. 6 1. 7 1. 8 1. 9 1. 10 1. il 1. 12 1. 13 1. 14 1. 15 1. 16 1. 17 1. 18 1. 19 1. 20 1. 21 1. 22 1. 23 1. 24 1. 4142135623730950 01.011010 5811388300841890 01.100101 6298006013006610 01.101000 6424840657522360 01.101001 6456889 157572530 01.101001 6464922787 124770 01.101001 6466932542736420 01.101001 6467435065968990 01.101001 6467560702048760 01.101001 6467592111398200 01.101001 6467599963756150 01.101001 6467601926846920 01.101001 6467602417619690 01.101001 6467602540312890 01.101001 6467602570986180 01.101001 6467602578654500 01.101001 6467602580571570 01.101001 646760258 1050850 01.101001 6467602581170660 01.101001 6467602581200620 01.101001 646760258 1208110 01.101001 646760258 1209980 01.101001 6467602581210440 01.101001 6467602581210560 01.101001 6467602581210590 01.101001 10000010011 1 10100 001 1000101 1000010 01001 1 10101001 11 1 0001 1 1 1001 1 10101 1 0101001011 11 10000 01100000001C0001 1 01 10001 101 101 1001 011001000011 11110 01 10010001 1 101000 01100100100000010 01100100100001001 01 100100100001010 0110010010000101 1 01 10010010000101 1 01 100100100001011 01 10010010000101 1 0110010010000101 1 01 10010010000101 1 01 10010010000101 1 0110010010000101 1 01 10010010000101 1 01 10010010000101 1 0110010010000101 1 01 10010010000101 1 01 100100100001011 Table 2.3 Values of n K. = n [1 + 2 2l ] 1/2 o n o 34 3. PROCESSOR SIMULATION 3.1 Software Simulation The CORDIC algorithms have been simulated by implementing the digital arithmetic unit in software . To avoid the normalization and rounding inherent in the 360 arithmetic operations , the algorithms have been simulated in binary. The printouts allow the designer to check in a straightforward manner the digital prototype machine (described in Section 4) by com- paring the binary outputs to the strings of O's and Ts of the printout. The program requires the value of m(+l , 0, -1) so as to specify the trigonometric, linear, of hyperbolic mode. The user specifies whether the Z or Y register sign is to be tested during the algorithm. The pro- gram prints out the final values of X, Y, Z, K and K~ . 3 mm The initial values, X and Y , are input by the user. o o r Z must be specified in degrees for the trigonometric and hyper- bolic modes. It is also possible to print out the intermediate values of the registers in decimal, both decimal and binary, or to print the final register values directly. N, the precision of the computation, can also be specified. Figure 3.1 shows the format of the program printout. The final binary result is then converted back to decimal and the relative error is computed by using the Call -OS FORTRAN double preci- sion routines: Error in % = 100 x CORDIC Value - True Value hrror in % iuu x True Va]ue Values in Decimal Values in Binary X„ Zn _1 -1 " K l Z - Z Q + tan ' (— jL) = Z Q - 45° Figure 3.2 shows the first steps of the algorithm. INITIALIZATION 37 FIRST 90* ROTATION ( IP ROTATION) 1-1 1-2 1-07 0000138147 Figure 3.2 Steps in the Trigonometric Mode 38 3.3 Linear Mode The example of division is used with constants equal to 2 . Figure 3.3 is a geometric interpretation of the algorithm with Y = -. 3 and X = .6. The result of the division is -.4999961853. o Tabulation for the linear mode is given in Table A. 2 of the Appendix. 3.4 Hyperbolic Mode The equations, X ■+ K (X cosh l n + Y sinh Z ) 2 Y -* K (Y cosh Z + X sinh Z ) 2 o oo o Z ■+ are tabulated in Table A. 3 in the Appendix. The value of K is 0.82978162013 with X = 1/K 9 and Y = 2 o 2 o and the result is: X = cosh Z Y = sinh 1 o Z = Iterations 4 and 13 have been repeated twice to insure conver- gence as shown in a previous section. One can see that the precision of these algorithms is not uniform. The simulations were done with no 39 SHIFT = 1 G = X Figure 3.3 Geometric Interpretation for Linear Mode 40 initial normalization and thus the relative precision has some variations which are a function of the initial values, Xq, Y q and Z Q . In analyzing the total precision, one must account for the fact that values are represented in binary with finite length registers. The next section gives an indication of the behavior of the error for a typical computation of the slant range at various bearing angles. 3.5 Precision of Results without Initial Normalization Figure 3.4 represents the relative error for the X and Z registers when the following formulas are computed in binary: X + K l V + Y o" Y -> 7 7 4. "I Y Z + Z n + tan y— u A Q Several angles ranging from 90 have been chosen. X n and Y~ have values of A cos A~ and X sin A~; A Q being the chosen bearing angle and A the slant range. The percent error is plotted as a function of the slant range for various bearings. The curves correspond to 20 significant digits. The error is computed in double precision. 41 QJ CTl C TCI o; 4-> C TCI TCI en ro =3 en 42 4. THE 8-BIT PROTOTYPE OF A CORDIC ARITHMETIC UNIT 4.1 Hardware Realization 2 The prototypp, constructed with T L technology is composed of: i) a microprogrammed control unit ii) the X, Y, and Z sections of the arithmetic unit including the 8-bit scalers. The use of these particular 8-bit position scalers presented some problems in the 2's- complement notation. These are explored in Section 4.3. In Section 4.4 the design and operation of the prototype is discussed in detail . 4.2 Control of the Arithmetic Unit The term "microprogramming" was introduced two decades ago by M. V. Wilkes. His intent was to offer a more systematic approach to control design in digital computers. Within the last few years greater understanding of the digital process combined with improved manufacturing technology rendering highspeed changeable control economically feasible has led to a resurgence of interest in computers whose controls may be modified during use. The working nucleus of a conventional digital computer is the central processing unit or CPU and its associated main storage unit. Data flows from several parts of the computer to and from the CPU and/or memory so that operations can be performed on it (addition/subtraction and other possible elementary operations). During the basic time slot or cycle an instruction is performed and some gates are opened, counters are incremented, and so on by control signals . Depending on the present state, the machine will execute another instruction (next state). 43 In the conceptually simplest possible organization (Figure 4.1), the storage part of the control section would contain a storage element corresponding to each control line connected directly to its control terminal. This usually requires too many control lines (100 is a common number) and the usual procedure is to examine the control signals in groups that are logically mutually exclusive to reduce the length of the microword. It is also necessary to establish the microprogram address (con- trol state) that is to succeed the current one (Figure 4.1). A common technique is to construct the next address from the current address by providing in the microword format a field that controls the modification of the current address as a function of the current state. This provides the possibility of conditional jumps and GOTO-like micro-instructions. Because of the decreasing cost of ROMS and the flexibility of microprogramming these techniques are becoming more and more popular for the control design of computers. Studies have shown that above a certain size the cost of microprogramming is less expensive than the cost of control by standard techniques. Mode of Operation of the Control The microprogram will consist of a series of micro-routines, one for each different machine instruction defined by the user, and a master macro-routine to effect branching to the appropriate routine for the current instruction. Each instruction in the machine program can be thought of as a macro-call with an operation code making the call, and the remainder of the instruction supplying the parameters. 44 a 3 2 a sz r :x 1 1 fc5 1 H 1 1 1 w § r "i i i •r— c en c 1- cn o s- Q. O S- o s- 0) c C o o ■a 03 s- a> o S- Q. O s- o s- (L) c QJ CM «3" S-. cn 47 switches for manual addressing (to examine the contents of a given memory location). The stack is used to save the value of the current-address- plus-one whenever the program jumps to a subroutine. The current address in incremented by one by means of an adder. To simulate a stack, a RAM and an UP-DOWN counter are used. The two control lines, "Push" and "Pop", are part of the micro-instruction. This configuration allows asynchronous control of the input/output flow between the machine and the computer. The details of the clock sequences are shown in Figure 4.3 for a positive or negative edge triggered control line. The delay, , will be necessary to allow the control bits A and B to be latched into the micro- instruction register before the occurrence of the clock edge if edge triggered devices are used. The JUMP TO SUBROUTINE will issue a signal on the PUSH control line while the POP is (Figure 4.3). The return signal will POP the preceding address from the stack and cause the data selector switch to get the next address from the stack. 4.3 The Use of Scalers for 2 ' s Complement Shifting The CORDIC algorithms require shifting by positions varying from 1 to n for a typical n-step algorithm. However an eight-bit position scaler which performs only end-off shifts was used. Figure 4.4 shows the diagram and truth table of the scaler. This basic 8-bit building block can be used for any shift function. It is possible to use a simple shift register which shifts any number of places right or left. However, an 8-bit position scaler package takes no more space than an 8-bit parallel-in, parallel-out shift register. In addition, the scaler is faster and has simpler control. JJJ LLL ilSF slllciji; .IU'i .OJ. SdiiCUIt ±_i i_± rUl Jrt ittl TlST I'JSI PUP Ti3T=FALS£ TEST=TlttJE g 1 | A H— H RLTiUJ GET TIC itXT AJJiiESS Film TIC STACK A.U 'TOP" ■ECT AD, 48 IWTL A.4J INCIIEUfi UJUNUj! ~l_fl fO Ma l.j. TO ADDRESS — w> SELECTOR COU.JTDI K> TO PUSITIVL QJGC TUIGGQIQJ I.rJT a> JEUY >£ TO .EWTIVE QJGL IiIIGGUCl) I.I'JT 7'IOJ cuci: ^^J~L « ► T - LUCLST MSTiUCTiai Figure 4.3 Microprogram Control and Timing 49 l 7 l 6 h U l 3 l 2 h l Q Example with a shift of 3 (S Q S,$ 2 =110) Truth table : INHIBIT ENABLE 1*2 *0 s 1 h °0 °1 °2 °3 °4 °6 °6 °7 'o '1 '2 '3 >4 '5 *6 '7 1 'l '2 '3 '4 '5 ! 6 '7 1 '2 T 3 T 4 T 5 T 6 T 7 1 1 '3 T 4 T 5 T 6 T 7 1 '4 T S T 6 h 1 1 1 T 5 T 6 h 1 1 1 T 6 T 7 • 1 1 1 X 1 X 1 X 1 X T 7 1 1 1 1 1 X X X X 1 1 1 Figure 4.4 Scaler and Truth Table 50 Scalers belong to the more general class of uniform shift networks. This includes . Scalers or shift circuits for end-off shifting . Rotate circuits for end-around shifting . Shift and rotate circuits or "barrel switches" which are capable of both end-off and end-around operation. The outputs of the scaler used are open collector to allow for array expansion. However, for 2's complement numbers, there is the problem of sign propagation for right-shifting. A number in 2's complement form with the left most bit being the sign, must have its left most bits filled in with the sign bit when divided by 2 n (Figure 4.5). This scaler replaces the left most bits by l's independently of the input binary number. Some kind of connection is therefore needed in order to insure proper sign propagation. 2 The number of units required for an n-bit scaler is (g) (Figure 4.5) One approach implementable for small scalers, is to use (g) extra 8-bit scalers having the sign bit of the shifted number as inputs. The various cases are shown in Figure 4.5. For all possible cases it is easy to see that if the outputs of the B scaler are complemented when the sign bit is 1 and unchanged when the sign bit is 0, and if these outputs are wire ANDed with those of the A scaler, one will have the required correction (Figure 4.6). However, this solution is not very economical for large n. 2 Another approach makes use of a 32 x 8 T L ROM. This solution avoids the duplication of (g) barrel shifters whose only function is to propagate the sign-bit (Figure 4.7). When the sign bit is the ROM sees 51 Positive 2's complement number: 1 1 A, A, A4 A 3 k I 1 1 \ 1 1 I l ; fc ST!'C SCALLi! A 3 — SOT scall: B A. 'A, A, A3 Az a; Ac — 1 — < 1 — Negative 2's complement number : \ A 5 A, A 3 A, Correction needed in this case 3 ► 1 1 1 A 6 A 5 A 4 A 3 A, S2'G SCALER A 6 A 5 A 4 A 3 A, A, A 3 ► I 1 1 1 S32'l3 SCALE! Figure 4.5 Sign Propagation with the 8-Bit Scaler 52 SIG.J BIT SO\LL( SCAIIK MEi-m (DC UUJPJTS) B- ©- B- ©■ ©- " ID - 1 ■ ID ^ ■ P - 1 ■ =)E>- 1 ©- ©■ ©- Figure 4.6 Correction of Sign Bit, First Method 53 = 58i , _ o .— o — o - 185522 = : -a o X5 C o o 0) CO 4-> CO c CO ■(-> O •4- o e •=t +-> X o> o fO c to 2: I to to CD S- -o < CD -t-> C3 J* 2< to 10 CD S- -o x> o 3 S- to c o S- o Figure 4.10 Microinstruction Address Register 59 Control -Microinstruction Register Figure 4.11 shows the microinstruction register. It is an edge-trigerred latch since some input signals return to their initial value and some remain at a new value. In other words, some control signals are pulses and some are constant. The execute pulse is delayed and used to generate the pulse signals whereas the level signals come directly from the MIR. The single shot generates the "execute" pulse. The inputs to the MIR come from a 32 x 20 bit RAM. Input Multiplexers Refer to Figure 4.12. Four mutliplexersare used to steer the values X , Y , and A or to initialize the registers X, Y and A depending on the mode. Switches are used for initialization. The select inputs are labelled A and B; the strobe, G; the data inputs CO, CI, C2, C3 and the outputs Y. The strobe, G, can be used to clear RIY, RIX and RIA, since, if G is a 1, the output of Y is low no matter what the inputs may be. Outputs Figure 4.1 3 shows the three 8-bit output registers and the strobe signals GOX, GOY, and GOA. CORDIC Figures 4.14, 4.15, and 4.16 show the CORDIC unit itself. One can distinguish 7km >> s- o E O) S- cn o S- I O ALL Ic'S /WE 74175 1-2 11 9 _ucci3 D areiS <0 15. if lp l(J id 20 !£> 35 wD t#Q 10 60 to CD c o S- E o ^> Figure 4.11 Microinstruction Register 61 DIP 8 Switches All IC's are 74153 en c QJ if) =3 id in fO CD Figure 4.12 Input Multiples rs All IC's are 7475 >- CO o GOY X CO < o GOX CO o GOA -L.JFT -iic |a 9 — fp - T 1 Figure 4.13 Outputs to > - Q. to ■5. to 62 si.ndi.no x oj, A J 63 L_J < x snajUiHS aaHHva a wohj CO \ < o Eh AI 2 a Eh fa H K CO Q Q Q Q Q c o i- o € a a -vw-*- -aa/v»- ~Wv-» — wv-» -w*»- -a*v» -am* — vw* 4-> E .c 4-> •r— S- < Q O CD X HOJ NOU.YZITtfl.LINI sinaiifio a 01 64 J -*• sHai„aiHS aaanva x wohj \ o EH s w E-i H a a £ € £ £ ■G Q Q Q ~WAr» -A«A^» «WK"Wr» »\A*r» "AfW» -*Ai\-»- -VW» ' TT TT TT TT TT TT T T 4" +-> s- o a. 4-> ■r- C o I— I a a: o o 0) s- 3 CD A tfOd NOIiVZITVIilNI 65 o < *- 2 O en < ° 0) E O O i- =3 IS 5 Sit u. < i 5 w O IM 66 . the input register stage, . the barrel switch stage, and . the adder/subtracter stage. In Figure 4. 16, the "angle" portion of the arithmetic unit, shows that the inputs to the adder/subs tracter are taken directly from the ROM, the scalers being unnecessary for the A portion of CORDIC. Speed The microprogram is shown in Table 4.1 and requires ten steps. Due to the limitation of the stored constant ROM used, the minimum step duration obtained is -■ p Mh z = 850 ns. The ten steps required by the algorithms thus take 8.5 us. l— UJ X cc UJ o r- O r— O O <— i— O O O O i— O O O O <— O .— O .— O O r— .— O O i— r— i— .— O O O O O O i— r- ,— 1 o r— O o o r— O 1- o 67 >- CD +-> S- o 4- q: o O l-H >- OO X _J o 1— r— O S- cn o J- a. o s- u >- _l o .— O i— •— o 00 00 UJ a: o o o o o o 1— o O f— o o o o o o o o r— O O r— r- O r— O 1— o O 1— o o o o >— eg 00 IT) CO CO CT> r— C c C c C c C c «/J z: O O O O +-> c •i - •r— •r™ •f— •r™ •r™ •^ • ^ r— 1—1 >> +■> 4-> +J -M 4-> +J -t-> -!-> 3 1— -0 •■■• 03 03 03 03 03 03 03 03 »/) ■M +-> •*-> 4-> +-> +J 4-> +J +J CD -O. 03 O O O O O O O S- cc s- >> -M S- i- S- s_ S- S- S- S- h- 03 S- O C 00 CD CD i. +-> T3 -O .c -C -C -C jz 2: 1— > «/> C s- 4-> +-> 4-> +■> 4-> +J t— ) U CD O 1 — x,, ... x we have f (n+l) m f(x) - p n (x) = ir(x) f in+] )y = E(x,0 where tt(x) = (x-x Q )(x-x-| ) ... (x-x ) 71 if x Q < x 1 < ... < x p and £ e [ x Q , x p ] The interval must be reduced to the interval [-1, 1] by an appropriate change of variables. The term f* ' (£) depends on x and on the abscissas x Q , x, ... x . One can chose to minimize |tt(x)| over [-1, +1] rather than minimizing |f(x) - p (x)|. If f^ n ' (?) does not vary too much in the interval, one would obtain a nearly "best approximating" polynomial. This technique can be used to find an approxi mation to all functions f(x) having continuous derivatives in [-1, +1], Thus we will minimize the quantity: +1 r I = ,2 w(x) |>(x)] dx -1 Where w(x) is a prescribed weighing function which is non- negative in [-1, +1]. This will minimize the error due to the factor tt(x) It is known that a) If w(x) = l,the n + 1 roots of tt(x) that minimize I are the zeros of p + -i(x), the (n+l) s Legendre Polynomial. b) If w(x) = , , the n + 1 abscissas are the zeros /lT^2 °f T n+ i( x )> tne (n + l) Chebyshev polynomial and are given by 72 x. = cos (£i±j- . |) i = 0, 1, ... n It follows that tt(x) = 2~ n T n+] (x) (x ,.i^LiMii£p w (2n+2)l n+l c) If w(x) = (l-x) a (1+x) 3 a > -1 > -1 the abscissas are the zeros of the Jacobi Polynomial. Next, we will study in more detail the Legendre and Chebyschev polynomials for division and will describe a hardware implementation for various precision. Range Transformations The formulas used generally assume that the variable x takes values in the interval [-1, l]. The range transformations are derived as follows: We have x = x Q + r (notation as above) with x ranging between x Q and X-, , the mapping into [-1, +1] leads to x = 2x _ V X x l" x x l" x x(x,-x n ) x,+x n That is, x = i— — + -h^- Legendre and Tchebychev Interpolation of First Order 1 2 The values x Q and x-, are the roots of P 2 (x) = 2"(3x -1) for a p Legendre approximation of f(x) and T 2 (x) = 2x -1 for the Tchebychev case. 73 FT Jn The roots, in either case, are x Q = -x, = a = j f° r Legendre and j for Tchebychev. We have for the new abscissas: Xj-Xq x ! +x n x i +x q x = " a * — 2 — ^ — 2 — = ~ a 2 — 2 — - X T X 0^ V x h (X 1 +X } x 1 = a ( — 2 — ) + — 2 — = a 2 + The new first order approximation formula then becomes: f(x'-,)-f(x' ) M x >- f ( x V + x yx' ° < X - X V ^J Comparing it with the previous formula for Legendre interpolation f(x,)-f(x n ) M x > = f ( x 0> + x r x ° (X - X } It appears that we have lost several advantages from a hardware implement- ation standpoint: x - x Q was directly represented by the n - n Q low order bits of x. x-, - x Q had the value 2" n computation of p(x) involved only the multiplication [f(x-,) - f(x Q )] x R However, equation( 5-l)can be further transformed by the following relation x'-j - x' Q = ah x - x' Q = x - (x Q + | - a|) 74 = x - x Q - 2-0-a) = r - |(l-a) = h[R - l(l-a)] Thus, given R, one can perform the subtraction [with (n-n«) bits precision] of the constant j (l-oi)« Complications arise when the multiplier is not a signed number. It is better to express (5-1) as a function of r = x - x~: f(x\) - f(x') Pl (x) = f(x' Q ) + x °» \1t Tchebychev : T 3 (x) = x(4x -3) with roots: - y^-, 0, yj The roots are of the form: -a, 0, +a. The change of variable to map [-1, +1] into x Q , x, leads to the following collocation points: „. . h , x l + x 1 a "o 9 x' - Xl + X ° x 3 - 2 . Let us now derive the equation in a form suitable for hardware implementation. One would like a formula of the type: 80 P 2 (x) = a Q + a,R, + a 2 R where R is the number defined previously. We have (x-x' 2 )(x-x' 3 ) (x-x'-jJtx-x^) p 2 (x) = (x 1 -x , 2 )(x r x , 3 ) *1 + (x' 2 -x' 1 )(x' 2 -x' 3 ) H + (x-x'^Cx-x-'g) (x 1 3-x' 1 )(x' 3 -x' 2 ) y 3' After some elementary transformations, one obtains the polynomial : P|Cx)-(y^fl-^y lt ^y,) 2a a 2a -(2+a) y, + 4y ? - (2-a) y 3 + ( ] - 2 -) R a ?y'i - 4y ? + 2y 3 ? a which can be evaluated in several different ways (e.g., parallel multip- lication or Horner's rule). The precision can be calculated as follows: We have (x-x')(x-x' )(x-x' ) ,,, f(x) - p(x) = J j£ 3 - f (5). For the division case, -- p(x) = (x-x^Mx-x'gMx-x^) x^-, In the Tchebychev case tt(x) is bounded by: 81 3 3 E = -3-iL with a = | or 12./3 E = 3/3h 3 m h 3 8xl2x/3 2 5 The upper bound on the error is, for the first interval [.5, .5 + 2" n 0] E= h 3 2 4 = — h 3 and for the last interval E = -f- and thus 2 5 2 -3n Q -5 < Error < 2 " 3n _1 n Q = number of bits looked up. 5.4 Higher Order Interpolation and Precision For interpolations of order higher than 3 it is possible to obtain speed improvement by using any of the well-known "economical polynomial evaluations". There are two sources of error when a polynomial is evaluated. For example in evaluating: p^x) = a Q + a^, the two errors are: (x-x' )(x-x') 1) f(x) - p(x) = °-2 L. f" ( ? ) = E] and 2) the representation of numbers using finite length registers and multipliers. In fact one performs 82 P-|(x) = (a Q + E Q ) + (a 1 + E 1 )(R + E R ) and the maximum error due to the adder is E Q . The maximum error due to the finite length inputs to the multiplier is: E 1 R + E R a l + E 1 E R^ E 1 +E R +E 1 E R One can reduce E Q relative to (2E-|+E, ) by using more accuracy for the first constant a Q . The other source of error is then due to the multiplier. The task of the designer is to optimize the trade-off between time and precision, Totally parallel evaluation is expensive and requires a large multiplier whereas semi -parallel multiplication can make use of a smaller multiplier but is more time consuming. For example using a 24 x 24 bit multiplier, is it possible to have a division routine requiring one multiplication cycle time for a precision of -?4 2 CH or less? By using a first order optimal interpolation and looking up 11 bits of the number x, the upper bound of the division routine error will be: 2 -2n Q = ^ The formula implemented is p, (x) = a Q + a-iR. The error on a, and R is called E, . If -25 -23 E, = 2 the errors can add giving a total error of 2 . The only solution is to have a) a bigger multiplier, making the error due to the series truncation predominate or 83 b) for the same multiplier, use smaller intervals and thus make the error due to the multiplier the predominate. In the example, if 12 bits of the initial number are looked-up > the error due to truncation of the series is bounded by 2 -2n . .,-26 The predominant error is due to the multiplication, a,xR, -24 and is 2 because of the 24 x 24 bit multiplier. 5.5 Application to Arbitrary Function Generation in Aviation Aircraft and engine performance data is received in the form of plots of performance variables as functions of other system variables. The functional relationships generally are multivariable and nonlinear, and can seldom be approximated by analytical equations without intro- ducing inaccuracies beyond the specification tolerances. Furthermore, the data is subject to change as experience is gained with the aircraft- general ly in the direction of increasing complexity. At present all function generation problems are solved by software and, as a result, the analytical approximations such as poly- nomial expansions must also increase in complexity, resulting in a cor- responding increase in simulation program execution time. Function generation by linear interpolation of stored data points is generally superior in aviation to analytical approximations for two reasons: 84 1. Data can be readily programmed directly from plots or tables without extra design time to develop an approxi- mation; 2. Increased data complexity can be met with no increase in execution time, since the interpolation formula does not change. Notations f(x,) and f(x~) are adjacent function values X-, and x ? are the corresponding argument values x is the current argument value h the difference x~ - x. Since many functions can use the same breakpoints or argument values a saving in time and memory space may be realized if the "inter- polant" X - X. Vi - x i is shared by the functions. (Typically the interpolant may be shared by three or more functions in practical cases.) If fast multiplication and division hardware routines are available, the majority of the time consumed in the interpolation process will be due to the "argument search" routine (which can be implemented in hardware too) . The linear interpolation for one variable, f(x) = Cf(x i+1 ) - f(x.)] * " ^ + f(x.), ii-i l x i+1 - x i 1 85 requires 1. Argument search: given x, where is x.? 2. One division. 2. One multiplication. A schematic of the organization of the AU is given next page, Xo r = 2 R 06 DATA SeLSCTOR e LJ DATA selector MxM ROM MULTIPLIER 3_i L_£ 1 f I ALU ( 74 1 81 s) I ACC. B B' Mgure !> 1 9rcjani ration of an ALU for functioi generation 87 6. CONCLUSION The purpose of this investigation was to explore various methods for generating functions for processors used in aircraft simu- lators. A small 8-bit CORDIC prototype with microprogrammed control was constructed implementing the trigonometric mode. Other methods based on polynomial approximations have also been reviewed. The CORDIC method has the advantage of simplicity in implementation and, being a bit by bit method using only addition and shifting as the basic opera- tions, it can be cheaper to implement than a parallel method. For avia- tion simulation, it is a matter of choice between the following alter- natives: a) A central arithmetic unit performing all the computations and being time-shared between all the flight instruments. In this case, obtaining a frequent updating of the in- struments requires a fast arithmetic unit. b) Several arithmetic units doing specialized tasks (alti- tude computation, speed, fuel, ... . Each AU could be a relatively slow and inexpensive bit-by-bit serial AU. The design of an aircraft simulation system would use a top- down development process. Since the simulator in the decentralized architecture would contain more than one arithmetic unit, the develop- ment would proceed such that modules could be built and tested as soon as they are developed. Operations requiring simple and infrequent computations would be implemented by CORDIC bit serial algorithms. 88 Operations requiring frequent and complex computations (like function generation in an aircraft) would use polynomial approximation algorithms for faster execution. Polynomial approximation methods have been studied which make use of polynomials other than the Taylor-MacLauri n series currently used in hardware implementations for division. It has been shown that better error control is achieved by using Tchebychef polynomials. The advantage of the parallel algorithms is that they employ multiplicative operations which can share a single n x n multiplier. The potential parallelism of these algorithms should be fur- ther investigated. For example, it is known that a polynomial of degree n can be evaluated in O(logn) operations. Is it possible to de- sign (in an array expandable form) an n x n x n multiplier or more generally an n multiplier? By implementing parallelism at the gate level (which could be called microparal lelism) , what improvements can be achieved for special purpose highly parallel and high speed arithmetic processors? Since polynomial evaluation lends itself to parallel proces- sing, what would the time bound on the arithmetic computation of func- tions become? What improvements would be achieved by replacing the processors performing binary operations by a 3- or k- processor per- forming k-nary multiplications and additions? Function evaluation by hardware means is currently being further 3 investigated as well as some possible designs for an n multiplier. 89 7. Appendix 90 NUMBER 0F BITS »720 M « ?1 TEST SIGN «?Y X — > K1*SQRT(X0**2+Y0**2) Y— >0 Z — >ZO ♦ ARCTANCYO/XO) Kl« 1.6467 60258 12 106 1/K 1-0. 60725293500888 X0 ■ ?• 6072529 YO ■ ?-• 6072529 -180 DEG.<« Z0 <» +180 DEQ Z0 ■ 70 INIT. 0K ?Y DECIMAL 0NLY ?N 0.6072502136 0.100110110111010011 SUB 0.000000000000000000 0.000000000000000000 -0.6072502136 1 1.011001001000101101 ADD 0.0 0.000000000000000000 SUB 90.0000000000 1.000000000000000000 INITIALIZATI0N A.l Output for the Trigonometric Mode 0.6078508136 0. 1.001 101 I 01 1 I 01001 1 ADD 91 0.100110110111010011 0.1001101101 1101001 1 0.6072502136 0. 1 001 1 01 1 01 1 1 01 001 1 SUB •90.0000000000 1 1.000000000000000000 ADD 45.0000000000 0.100000000000000000 FIRST 90 DEG. ROTATION 1.2145004272 1.001101101110100110 ADD 1.001101101110100110 0.000000000000000000 0.0 0.000000000000000000 SUB ■45.0000000000 1 1.100000000000000000 ADD 26.5649414063 0.010010111001000000 # SHIFTS ■ 1.2145004272 1.001 101 101 1 101001 10 SUB 0. 100110110111010011 1 1.101100100100010110 -0.6072502136 1 1.011001001000101101 ADD 18.4350585938 1 1 • 1 100101 1 1001 000000 SUB 14.0360641479 0.001001111110110011 # SHIFTS • 1 A.l (cont.) 92 1.4142265320 1.011010100000101011 0.000000000000000010 SUB 1 1.111111111111111111 -0.0000036147 1 1.111111111111111111 ADD >45.0006866455 1 1.011111111111111110 0.0 0.000000000000000000 # SHIFTS ■ 17 SUB 1.4142303467 1.011010100000101100 0.000000000000000001 ADD 0.0 0.000000000000000000 0.000000000000000000 SUB •45.0006866455 1 1.011111111111111110 0.0 0.000000000000000000 # SHIFTS « 18 C0RDIC VALUES TRUE VALUES ADD 1.414230346680 0.0 -45.000686645508 1.414163948760 0.0 -45.000000000000 ERR0R IN X 0.00470 ********** 0.00153 A.l (cont.) 93 NUMBER §F BITS -?20 M ■ ?0 TEST SIQN «7Y X— >X0 Y— >0 Z~ >Z0 ♦ (YO/XO) XO - 7.6 YO m ?- # 3 ZO - 70 INIT. 0K 7Y DECIMAL 0NLY 7N 0.5999984741 0. ! 001 1 001 1 001 1 001 1 0.000000000000000000 SUB 0.000000000000000000 0.2999992371 1 1.101100110011001101 ADD 0.0 0.000000000000000000 0.5000000000 0.100000000000000000 INITIALIZATI0N SUB A. 2 Output for the Linear Mode 94 0.5999984741 0.100110011001100110 SUB 0.010011001100110011 0.000000000000000000 0.0 0.000000000000000000 SUB -0.5000000000 1 1.100000000000000000 ADD 0.2500000000 0.010000000000000000 # SHIFTS m i 0.5999984741 0.100110011001100110 SUB 0.00100110011001 1001 1 1.111101100110011001 -0.1499977112 1 1.110110011001100111 ADD -0.2500000000 1 1.110000000000000000 SUB 0.1250000000 0.001000000000000000 # SHIFTS « 2 A. 2 (cont.) 0.5999984741 0.100110011001100110 0.000000000000010011 95 SUB 1 1.111111111111111111 -0.0000915527 1 1 . 1 1 1 1 1 1 1 1 1 1 1 1 1 01 000 ADD -0.4998779297 1 1.100000000000100000 0.0000610352 0.000000000000010000 # SHIFTS * 13 SUB 0.5999984741 0.100110011001100110 0.000000000000001001 SUB 1 1.111111111111111111 -0.0000572205 1 1.111111111111110001 ADD -0.4999389648 1 1.100000000000010000 0.0000305176 0.000000000000001000 # SHIFTS ■ 14 0.5999984741 0.100110011001100110 0.000000000000000000 SUB SUB 1 1.111111111111111111 -0.0000305176 1 1.111111111111111000 ADD -0.4999961853 1 1.100000000000000001 0.0 0.000000000000000000 # SHIFTS » 19 C0RDIC VALUES SUB 0.599998474121 -0.000030517578 -0.499996185303 TRUE VALUES 0.600000000000 0.0 -0.500000000000 ERR0R IN X 0.00025 ********** 0.00076 A. 2 (cont.) 96 NUMBER 0F BITS -720 M - 7-1 TEST SIGN *?Z X — >K2*K2*(YO*C0SH ZO + XO*SINH ZO) Z-->0 K2 « 0.82978162013890 1/K2 - 1.20513635844646 XO ■ 71.205136 YO - ?0 -180 DE6.<= ZO <* + 180 DEG ZO * 7-45 INIT. 0K ?Y DECIMAL 0NLY ?N 1.2051353455 1.001101001000001111 SUB 0.000000000000000000 0.000000000000000000 0.0 0.000000000000000000 SUB •45.0000000000 1 1.100000000000000000 ADD 49.4374389648 0.100011001001111101 INITIALIZATI0N A. 3 Output for the Hyperbolic Mode 97 1.2051353455 1.001101001000001111 ADD 0. 1001 101001000001 11 1 1.101 100101 101 11 1 100 -0.6025657654 1 1.011001011011111001 ADD 4.4374465942 0.000011001001111101 SUB 22.9868316650 0.010000010110001010 # SHIFTS « 1 1.0544929504 1.000011011111001101 SUB 0.010000110111 110011 1 1.111011001011011111 -0.3012847900 1 1.101100101101111100 SUB 18.5493850708 1 1.110010110011110011 ADD 11.3090515137 0.001000000010101100 # SHIFTS ■ 2 A. 3 (Cont.) 98 1.1254158020 1.001000000001101101 0.000000000000000001 ADD 1 1.111111111111111111 -0.5200729370 1 1.011110101101110010 ADD 0.0003433228 0.000000000000000001 0.0 0.000000000000000000 # SHIFTS - 18 SUB 1.1254119873 1.001000000001101100 0.000000000000000000 ADD 1 1.111111111111111111 -0.5200729370 1 I . 01 1 1 1 01 01 1 01 1 1 00 1 ADD 0.0003433228 0.000000000000000001 0.0 0.000000000000000000 SUB # SHIFTS ■ 19 C0RDIC VALUES 1. 125411987305 -0.520072937012 0.000343322754 TRUE VALUES 1. 127625629814 0.521095150503 0.0 ERR0R IN X 0.19631 0.19617 ********** A. 3 (Cont.) 99 Appendix A. 4 Overview of the Algorithms Proposed By Baker, Ferrari and Ling 100 P.W. Baker [BAK75] presents generalized higher radix algorithms for some elementary functions. Parallelism is achieved by using fast paral- lel m-bit multipliers where radix = 2 . They are a generalization of the CORDIC and deLugish algorithms based on multiplications by prestored constants of the form (1 + 2" 1 ) or ln(l + 2" 1 ). Baker's algorithm for Y/X uses = 1 = ULii y x x n^ where the f are selected to be of the form (1+d. r' k ) d.e {0,1 ,2. . . ,r-l} the algorithm is Y i+1 - X^l^r-") V i+1 = Y.(W.r- k ) The first multiplier d, is chosen by table look-up. Then at each step d. is chosen as the one's complement of a. , a. being the k 2 -ary coefficient of the number x. written base 2 m . The hardware requires variable shift networks as in the CORDIC algorithms and, in addition, two m-bits multipliers. Ferrari usesa Lagrange interpolation method of the function — between x = .5 and x = 1. He uses several colloquation points and looks up the coefficients of the first order interpolating polynomial. 101 Ling proposes a high-speed division algorithm based, like Steffanelli 's on the Taylor expansion of a fraction of the form yj— . The eight leading digits of the denominator are looked up and two multipliers working in parallel are needed to evaluate the approximating polynomial. This algorithm is designed for 32 bit precision and the same criticisms hold for both Stefanelli's and Ling's algorithms. 102 A. 5 ROM Usage As Proposed By Stefanelli 103 Stefanelli proposes a "binary read only memory divider". He shows the realisation of three inverting circuits computing the reciprocal Q = -5- of a binary number B. Their main feature is the use of read-only memories whose function are to provide look-up of prestored polynomial coefficients, and polynomial values. The three circuits differ only in the number of bits of the binary number B. The number B is normalized B = 1. b 1 b 2 ... b r b r+1 ... b m and the reciprocal is Q = 0.1, q^ 2 ... q m . The idea is to split B into two parts B, and B ? : 6 = 6^ 2~ r B 2 and look up the inverse Q-, of B-, 17- Q l =0J «'l «' 2 ••• O'm Then Q=i= 1 - 1 B 1 + 2- r B 2 B l 1 + 2" r ^2 B l -Q ] 1 + 2 _r (B 2 Q 1 ) The fraction is expandable in a Taylor series 1 1 l~X+X _ X •»• 104 Q = Q 1 (l-2" r Q 1 B 2 +2' 2r (Q 1 B 2 ) 2 - Z' 3r (Qfi 2 ) 3 +....) Now Q is to be evaluated using a three-stage procedure: 1 - Look Up 2 - Multiplication 3 - Addition/Subtraction. With current technology r can be chosen to be up to 14 (16K of memory addresses). If, for example, the first term of the series is considered, i.e. Q = Q 1 - 2" r Q^ B 2 The error arises primarily 1) From the truncation error of the series, whose absolute value is known to be less than the maximum value of the first term neglected. Here: -r 2 2) From the computation of 2 Q, Bp if B 2 is supposed to be known with any desirable precision, 2 2 the error remaining is due to Q, . The number of bits of Q-, is chosen such that this error is of the same order as the truncation error. When a higher number of bits are required, the second term of the series is considered; Q ■ Q 1 - 2" r Q^ B 2 + 2" 2r Q-, 3 B 2 2 . 105 However, B ? can be further split into two parts B 2 * B 3 + 2 " S B 4 in order to avoid too many bits of address for the computation (or 2 3 2 3 look-up of B 2 ). The term neglected are Q, B. and Q, B 3 B*. The first and second schemes are shown in Figures A. 5.1 and A. 5, 2. One can criticize the method in that it makes use of the Taylor polynomial for approximation. Other polynomials exist having better error behavior for the same cost in terms of ROM requirements and same speed. Also, a set of small dedicated multipliers is not yery efficient because they cannot be used for any other functions or even for multiplication of two full-size numbers. It is also impossible to use the same topology for sequential evaluation of higher precision division (more than 32 bits ). 106 "l ADDER Ron B l «? — ^^1 MULTIPLIER B ? ^^ Figure A. 5.1 Bn ROM Qi + ADDER + D l n 2 ^^ Q l Qf MULTIPLIER n h Q ROM SQUARE MULTIPLIER B 2 2 Figure A. 5. 2 ROM Usage proposed by Steffanelli 107 8. REFERENCES [BAK75] Baker, P. W. "Parallel Multiplicative Algorithms for Some Elementary Functions". IEEE Transactions on Computers, March 1975, pp. 3. [CAN70] Cantor, D., Estrin, G. E. and Turn, R. "Logarithmic and exponential function generation in a variable structure digital comp- uter, IRE Trans, on Electronic Digital Computer, Vol EC-11, Apr 1962. [CHE72] Chen, T. "Automatic Computation of Exponentials, Logarithms, Ratios and Square Roots". IBM J. Res. Development, Vol. 16, July 1972, pp. 380-8." [FER71J Ferrari, D. "A Division Method using a Parallel Multiplier" IEEE Transactions on Electronic Computers, April 1967, pp. 224-228. [LIN71] Ling, H. "High Speed Division for Binary Computers". Proc, 1971 Spring Joint Computer Conf . , Vol. 38, AFIPS Press, Montvale, N. J., 1971, pp. 373-8. [LUG70] DeLugish, B. G. "A Class of Algorithms for Automatic Eval- uation of Certain Elementary Functions in a Binary Computer", Ph. D Dissertation. Department of Computer Science, University of Illi- nois, Urbana, 111., June 1970. 108 [STE74] Stefanelli, R. "A High Speed Binary Divider". Unpublished paper submitted to the IEEE repository system. [WAL64] Wallace, C. S. "A Suggestion for a Fast Multiplier". IEEE Transactions on Electronic Computers, Vol. EC 13, February 1964, pp. 14-17. [WAL71] Walter, T. S. "A Unified Algorithm for Elementary Functions, 1 AFIPS Conference Proceedings, Spring Joint Computer Conference, pp. 379 to 385, 1971. BLIOGRAPHIC DATA 1EET 1. Report No. UIUCDCS-R-76-796 Tille .mil Subt it le SPECIAL PRUPOSE PROCESSORS FOR RADIO AIDS AND FUNCTION GENERATION IN AIRCRAFT SIMULATORS 3. Recipient's Accession No. 5. Report Date February 1976 'gfo'tfe H. GARCIA 8. Performing Organization Rcpt No.UIUCDCS-R-76-79f5 Performing Ojganizayoa .Name at}d Address University of Illinois at Urbana-Champaign Department of Computer Science and Institute of Aviation Urbana, Illinois 61801 10. Project/Task/Work Unit No. 11. Contract /Grant No. Sponsoring Organization Name and Address Institute of Aviation University of Illinois Urbana, Illinois 61801 13. Type of Report & Period Covered Master's Thesis 14. Supplementary Notes . Abstracts The purpose of this work was to investigate the feasibility of a special purpose hardware processor for flight simulation. The study is centered around the arithmetic part of the processor for use in a real time environment where high accuracy and resolution are required for values of the elementary transcendental functions. Two main approaches have been considered: -Coordinate rotation methods (Voider, De Lugish and Walther) -Methods based on function approximation employing techniques common to Numerical Analysis and relying on algorithms using multiplication as the basic operator. An 8-bit Arithmetic Cordic Unit has been built and the trigonometric mode microprogrammed. In addition, the general n-bit Arithmetic unit has been simulated for base 2 f Errors bounds for some numerical algorithms havP hPPn Hpri\/Prl. I Key Words and Document Analysis. 17a. Descriptors Function Generation, Arithmetic Unit, Cordic Algorithms, Coordinate Rotation technique, De Lugish Algorithms, Microprogramming. 1 • Identifiers/Open-Ended Terms 13 COSATI Field/Group '^Availability Statement RELEASE UNLIMITED ' C ,"l NTIS-35 (10-70) •L 19. Security Class (This Report) UNCLASSIFIED 20. Security Class (This Page UNCLASSIFIED 21. No. of Pages 117 22. Price USCOMM-DC 40329-P7I nEBEMmM