LIBRARY OF THE 
 
 UNIVERSITY OF ILLINOIS 
 
 AT URBANA-CHAMPAIGN 
 
 5\0. 81 
 JJdA 
 
 Cop. Z 
 
The person charging this material is re- 
 sponsible for its return to the library from 
 which it was withdrawn on or before the 
 Latest Date stamped below. 
 
 Theft, mutilation, and underlining of books 
 are reasons for disciplinary action and may 
 result in dismissal from the University. 
 
 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN 
 
 1 3 R 
 
 ■J 
 
 L161 — O-1096 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/specialpurposepr796garc 
 
f / o. f y 
 \JL6a) 
 
 w.7% uiucdcs-r-76-796 
 
 Jw^jIai 
 
 SPECIAL PURPOSE PROCESSORS FOR RADIO AIDS AND 
 FUNCTION GENERATION IN AIRCRAFT SIMULATORS 
 
 by 
 
 Gilles H. Garcia 
 
 February 1976 
 
 . 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
Report No. UIUCDCS-R-76-796 
 
 SPECIAL PURPOSE PROCESSORS FOR RADIO AIDS AND 
 FUNCTION GENERATION IN AIRCRAFT SIMULATORS 
 
 BY 
 Gilles H. Garcia 
 
 February 1976 
 
 Department of Computer Science 
 University of Illinois at Urbana-Champaign 
 Urbana, Illinois 61801 
 
 This work was supported by the Department of Computer Science and the Institute 
 of Aviation and was submitted in partial fulfillment of the requirements for 
 the degree of Master of Science in Computer Science, 1976. 
 
TABLE OF CONTENTS i 
 Section Page 
 
 1 . INTRODUCTION 1 
 
 1.1 The ILLIMAC Project 1 
 
 1.2 Radio Aids Coverage 3 
 
 1.3 The Problems 3 
 
 2. ANALYSIS OF DIGIT-BY-DIGIT METHODS 9 
 
 2.1 Comparison of the Voider; De Lugish; Cantor, Estrin 
 and Turn; Walter and Chen Approaches to the Coordin- 
 ate Rotations Methods 9 
 
 2.2 The Principle of Operation 10 
 
 2.3 Convergence and Domains of Operation 13 
 
 2.4 Precision for Radix 2 23 
 
 2.5 Computation of the Constants Stored in the ROM 27 
 
 3. PROCESSOR SIMULATION ..,. 34 
 
 3.1 Software Simulation 34 
 
 3 . 2 Tri gonometri c Mode 36 
 
 3.3 Linear Mode 38 
 
 3 . 4 Hyperbol i c Mode 38 
 
 3.5 Precision of Results without Initial Normalization.... 40 
 
Section Page 
 
 4. THE 8-BIT PROTOTYPE OF A CORDIC ARITHMETIC UNIT 42 
 
 4.1 Hardware Realization 42 
 
 4.2 Control of the Arithmetic Unit 42 
 
 - Mode of Operation of the Control 43 
 
 - Structure of the Basic Microprogram- 
 
 mable Unit 45 
 
 4.3 The Use of Scalers for 2's Complement Shifting 47 
 
 4.4 An 8-Bit Prototype CORDIC 54 
 
 5. METHODS BASED ON FUNCTION APPROXIMATION 68 
 
 5.1 High Speed Division Using High Speed Multipliers 68 
 
 5.2 Division and Function Generation Using Optimum 
 First Order Polynomial Interpolation with Se- 
 lected Abscissas 69 
 
 - Optimum Polynomial Interpolation with 
 Selected Abscissas 70 
 
 - Range Transf ormati ons 72 
 
 - Legendre and Tchebychev Interpolation 
 
 of First Order . 72 
 
 - Precision Attainable 75 
 
 5.3 Second Order Interpolation 79 
 
 5.4 Higher Order Interpolation and Precision 81 
 
 5.5 Application to Arbitrary Function Generation in 
 Aviation 83 
 
Section Page 
 
 6. CONCLUSION 87 
 
 7. APPENDIX 89 
 
 8. REFERENCES 107 
 
LIST OF FIGURES 11 
 Figure Page 
 
 1.1 General Approach to Flight Simulation 4 
 
 1.2 Radio Navigation Aids Coverage 5 
 
 2.1 Nulling the Quantity A. 14 
 
 2.2 The Double Induction Process 16 
 
 2.3 Shifting Operation Using Barrel Shifters 24 
 
 2.4 Accumulation of Errors in the Shifting Process 26 
 
 2.5 Accumulation of Errors with Guard Bits 26 
 
 2.6 Realization of Guard Bits with Barrel Shifters 28 
 
 3.1 Format of the Simulation Program Printout 35 
 
 3 . 2 Steps i n the Tri gonometri c Mode 37 
 
 3.3 Geometric Interpretation for Linear Mode 39 
 
 3.4 Relative Error as a Function of the Slant Range * 41 
 
 4.1 General Microprogramming Unit 44 
 
 4.2 General Microprogrammed Control 46 
 
 4.3 Mi croprogram Control and Timi ng 48 
 
 4.4 Scaler and Truth Table 49 
 
 4.5 Sign Propagation with the 8-Bit Scaler 51 
 
Figure Page 
 
 4.6 Correction of Sign Bit, First Method 52 
 
 4.7 Correction of Sign Bit, Second Method 53 
 
 4.8 Block Diagram of the CORDIC Unit 55 
 
 4.9 Control Flowchart (Trigonometric Mode) 57 
 
 4.10 Microinstruction Address Register 58 
 
 4.11 Microinstruction Register 60 
 
 4.12 Input Multiplexers 61 
 
 4.13 Outputs 62 
 
 4.14 CORDIC Arithmetic Unit, X Portion 63 
 
 4.15 CORDIC Arithmetic Unit, Y Portion 64 
 
 4.16 CORDIC Arithmetic Unit, A Portion 65 
 
 5.1 Organization of an ALU for Function Generation 86 
 
 A. 5.1 and A. 5. 2 ROM Usage proposed by Steffanelli 106 
 
LIST OF TABLES 111 
 
 Page 
 
 Table 2.1 Values of tan" 1 (1/2) 1 " 1 29 
 
 Table 2.2 Values of tanh" 1 (1/2)" 1 31 
 
 Table 2.3 Values ofllK- =11 [1 + 2" 2i ] 1/2 33 
 
 o o 
 
 Table 4.1 Microprogram for the Trigonometric Mode 67 
 
 Table 5.1 Error for the First Order Approximation for 
 
 f(x) = l/x and o(=l/2 76 
 
 Table 5.2 Error for the First Order Approximation for 
 
 f(x) = l/x and o/=^2/2 (Tchebycheff Approximation) 77 
 
 Table 5.3 Error for the First Order Approximation for 
 
 f(x) = l/x and tf=V3/3 (Legendre Approximation) 78 
 
 Table A.l Output for the Trigonometric Mode 90 
 
 Table A. 2 Output for the Linear Mode 93 
 
 Table A. 3 Output for the Hyperbolic Mode 96 
 
ACKNOWLEDGMENTS iv 
 
 The author would like to express his appreciation to Professor 
 William Kubitz for his invaluable guidance, support and continuing help 
 throughout this work. 
 
 My debt is particularly great to Professor Ralph Flexman, 
 the Director of the Institute of Aviation, for his constant support and 
 to Lynn Staples with whom I have had many valuable conversations. His 
 fruitful and original ideas in the area of flight simulation and his 
 constant friendship and understanding during this project contributed 
 in no small way to its success. 
 
 My sincere thanks are also extended to Stanislas Zundo for the 
 care he took to do all the drawings and to the Department printing ser- 
 vices for the quality of their work. 
 
1. INTRODUCTION 
 1.1 The ILLIMAC Project 
 
 The purpose of this work was to investigate the feasibility 
 of a special purpose hardware processor for flight simulation. The pro- 
 ject was supported by The Institute of Aviation, Singer SPD Corporation 
 and the Department of Computer Science. The Institute of Aviation has 
 as a definite long-range goal updating its curriculum and activities to 
 incorporate advanced simulation and sophisticated avionics so that the 
 student may have a better introduction to the total field of modern 
 aviation. 
 
 The Institute would like a low cost, maintenance-free, high- 
 fidelity simulator which meets the training requirements from begin- 
 ning student pilot through private, commercial, instrument and multi- 
 engine pilots. Recent advances in simulation engineering have made great 
 contributions to improving the quality of pilot training. Unfortunately 
 these advances in the engineering of flight simulators have not been 
 available to most of general aviation because of the high cost assoc- 
 iated with their acquisition. 
 
 ILLIMAC I (University of Illinois Mini Aviation Computer) of- 
 fers a reasonable probability for providing significant cost reduction 
 to flight simulation for all levels of application. Present simulator 
 technology available to general aviation in an acceptable cost bracket 
 is either too limited in fidelity, range of performance, or reliability 
 to permit full exploitation of the flight simulator concept. It is 
 
believed that the use of digital computation and microprogramming in 
 the ILLIMAC will resolve many of the aforementioned problems. 
 
 Among the different features of the ILLIMAC Project are: 
 
 1. Solving flight and radio aids problems in six degree of 
 freedom. 
 
 2. A Central dedicated arithmetic unit which performs the 
 major arithmetic computations in the flight and radio 
 problems. 
 
 3. Several satellite special purpose processors which remove 
 the major load from the central processor. 
 
 4. Expandability so as to allow ILLIMAC to simulate fuel 
 depletion, control loading, motion problems, flight ac- 
 cessories and other functions by adding other satellite 
 processors. 
 
 5. Microprogramability. All programmed instructions for the 
 computations which are common to all flight simulators 
 are preprogrammed in ROMs. Functions which identify par- 
 ticular aircraft characteristics or NAV/COM and locations 
 are programmed using RAMs or PROMs. 
 
 6. Three data communication links. The first link is the 
 data transmission system between the ILLIMAC and the in- 
 structor's console. The second link is the data transmis- 
 sion system to the visual display generation equipment. 
 The third link is the data transmission of a common audio 
 system where multiple audio systems can be channeled so 
 
that each channel may be independently selected by tuning 
 a simulated aircraft type receiver. 
 
 7. Radio interchangability. The system provides for rear- 
 rangement of the various radios without reprogramming the 
 computer. 
 
 Figure 1.1 shows a diagram of the general approach to digital 
 flight simulation. 
 
 1 .2 Radio Aids Coverage 
 
 ILLIMAC is designed to cover up to 600,000 square miles. Al- 
 though it does not have the storage capacity to store radio facilities 
 for this large area, it does offer the ability to program radio stations 
 over a path 200 miles wide from Lost Angeles to New York (Figure 1.2). 
 This capacity permits the simulation of long cross-country jet aircraft 
 flights at high altitudes. Since the mission is a high altitude flight 
 programmed to land at a specific airport, the PROM memory system contains 
 only the V0R-DME stations necessary for this specific training exercise. 
 
 When the training mission occurs in a local area, all facili- 
 ties may be programmed. In this case, the coverage will be 60,000 square 
 miles. The user of the equipment may change from one type of program to 
 another by having a spare memory card which has been preprogrammed for 
 either application. 
 
 1.3 The Problems 
 
 The aim of this study is to investigate various types of 
 
r 
 
 < 
 
 r 
 
 < < 
 
 
 3 
 
 LL) 
 
 > 
 
 ID 
 
 co 
 
 O 
 
 a 
 
 LJJ 
 C_> 
 C_> 
 < 
 
 I I 
 
 < 
 
 o 
 a: 
 
 o 
 cc 
 
 to 
 
 UJ 
 
 3 
 
 -1 C_> LU _J 
 
 5:^> g 
 
 o 
 cc 
 
 
 ♦ ♦ t 
 
 6 
 
 o 
 cc 
 
 t £ 
 
 ii 
 
 tS 
 
 £ 8 
 
 < 
 
 
 3£ 
 
 to _3 
 2: UJ 
 
 < o 
 
 F < 
 
 TT 
 
 L* I 
 
 co 
 
 LU 
 O 
 
 CO 
 
 cc 
 
 cc 
 en 
 
 c 
 o 
 
 oo 
 
 o 
 
 o 
 
 s- 
 
 Q. 
 Q. 
 
 (T3 
 
 CD 
 5- 
 
 > ^ £ 
 
DESIGNED FOR 256 FACILITIES WITH TOTAL PROGRAMABLE 
 AREA INCLUDING CONTINENTAL UNITED STATES 5 
 
 200 Ml 
 
 TYPICAL AREA FOR JET TRAINING FLIGHT FROM LOS ANGELES 
 
 TO NEW YORK 
 
 TYPICAL AREA FOR TRAINING FLIGHT FOR STANDARD 
 
 AVIATION TRAINER AIRCRAFT 
 
 Fig. 1.2 Radio Navigation Aids Coverage 
 
arithmetic processors for use in a real time environment where high ac- 
 curacy and resolution are required for values of the elementary trans- 
 cendental functions such as the many trigonometric relationships in- 
 volved in the navigation equations of aircraft simulation. This arith- 
 metic unit would be one part of the ILLIMAC described above. 
 
 Listed below are problem areas which have historically con- 
 sumed a large amount of flight simulator digital computer time and 
 memory: 
 
 1. Arbitrary function generation of 1, 2 and 3 variables 
 using fixed and/or variable breakpoints. 
 
 2. Algorithms for the computation of standard trigonometric 
 and inverse trigonometric functions. 
 
 3. 3x3 matrix coordinate transformations. 
 
 4. Solutions of the basic equations of motion to provide 
 velocities and rotation angles. Use of quaternion/direc- 
 tion cosine techniques to resolve the gimbal lock problem. 
 
 5. Integration techniques satisfying the short term and long 
 term dynamic problems encountered in aircraft simulation. 
 
 6. The RMA (/x 2 + y 2 + z) problem. 
 
 The arithmetic unit must be able to multiply, divide and solve 
 all or a large number of the aforementioned problems. Two main ap- 
 proaches have been considered: 
 
 1. Coordinate Rotation Methods developed by J. Voider, B. G. 
 DeLugish, T. C. Chen and J. S. Walther. 
 
 2. Methods based on function approximation employing techni- 
 ques common to numerical analysis and relying on algorithms 
 
using multiplication as the basic operator. Recent ad- 
 vances in LSI technology allow the use of fast and rela- 
 tively inexpensive monolithic multipliers. 
 These two methods have advantages in: 
 speed 
 cost, and 
 
 uniformity in the various algorithms (use of a fast 
 multiplier). 
 We believe that the second approach is more flexible and al- 
 lows for more efficient use of the large scale memories available today. 
 For general purpose processors, where the precision required does not 
 exceed 32 bits, this second method can be much faster than the coordi- 
 nate rotation technique. However the first method lends itself more 
 naturally to low speed, high accuracy processors. 
 
 A striking feature in both techniques is the similarity of 
 the different function routines. That is, a common microprogram control 
 subroutine and common processor may be used in the different modes of 
 operation. This represents an economy in the total amount of hardware 
 required. Both techniques are useful to the designer of small and rela- 
 tively powerful machines. The main difference is that the numerical 
 analysis approach can be yery fast because it does not operate in a 
 digit by digit manner. The operations are performed using an n x n bit 
 parallel multiplier resulting in essentially a hardwired implementation 
 of the function approximation algorithms. This second method is the 
 preferred one with the recent introduction of 4 x 2 and 4x4 monolithic 
 
array multipliers. The basic building block for numerical computa- 
 tions need no longer be the 4 bit adder or ALU today. Rather, fast 
 multipliers {yery expensive only two years ago) may be used for imple- 
 menting complex routines. This technique lends itself naturally to 
 the arbitrary function generation problem and all interpolation problems 
 in general. Moreover, this leads to an interesting question for the 
 designer: How to adapt the well developed numerical analysis methods 
 to the available LSI technology in building a special purpose numerical 
 processor. Aviation, and simulation in general, illustrates well how 
 cost can be drastically reduced by using special purpose hardware, 
 removing most of the computational burden from the software design. 
 
 The coordinate rotation technique will be explained in detail 
 in Section 2. An 8-bit unit, implementing the trigonometric mode has 
 been built. In addition, an n bit unit implementing trigonometric, 
 hyperbolic and linear modes has been simulated. 
 
 A theoretical study of the most efficient use of numerical al- 
 gorithms is examined in Section 3. Simulation and realization of these 
 methods is left for further study. 
 
2. ANALYSIS OF DIGIT-BY-DIGIT METHODS 
 
 2.1 Comparison of the Voider; De Lugish; Cantor, Estrin and Turn; 
 Walter and Chen Approaches to the Coordinate Rotations Methods 
 
 D. Cantor, G. Estrin and R. Turn [CAN70] propose a sequential 
 table look-up algorithm for calculating lnx and e x . The main feature 
 of this algorithm is choosing the multipliers having short word length 
 in order to force the operand to the value 1 or to 0. Speed is attained 
 through the proper choice of short multipliers. The number of precom- 
 puted constants is 2 for e x if x has n bits. Thus 2 " + 2 n ~ con- 
 stants are needed because of complementary terms needed in the con- 
 stants. Recoding is not used. 
 
 De Lugish [LUG70] has generalized the Cordic principle deve- 
 loped by Voider in 1959, by evaluating functions using redundant numbers 
 of radix two. Chen [CHE72] applied the method to nonredundant numbers, 
 using a Taylor series to approximate the lower half of the number being 
 computed. 
 
 De Lugish' s method, like Walther's, tests a variable whose 
 
 value determines the precision of the approximation and the sign of the 
 
 correction to be made to the current value. He showed that by using 
 
 constants of the form a. = 1 + S. 2~ 1_ where S. can take the values 
 
 l l l 
 
 {-1, ,+1} he could obtain a better shifting average. The problems in 
 
 the implementation for different functions resides in: 
 
 — The lack of a common hardware topology. Many different 
 paths must be provided for each function if several func- 
 tions are to be implemented. 
 
10 
 
 -- The initialization process is neither simple nor uniform 
 for all the algorithms. 
 
 -- The precision is computed assuming that the adders have 
 infinite lengths. 
 For L bits of accuracy, Walther points out that (L + log^L) bits of 
 storage are required. This remark holds for De Lugish's algorithms as 
 well. The maximum computation time for both De Lugish's and Walther' s 
 methods is on the order of n for n bits accuracy. The number of stored 
 constants for De Lugish's and Walther' s method is the same. However 
 Walther (by generalizing Voider' s equations) uses the same basic method 
 for initializing several variables and one of them is forced to 0. One 
 of the great advantages of this method is that only one relationship 
 is required for all functions, using only 2 sets of stored constants. 
 
 Another inconvenience that is found in some of De Lugish's 
 algorithms is the necessity of changing recursion relations in the 
 middle of a computation (e.g., cosine and sine) making the control more 
 complex than in Walther' s method. 
 
 2.2 The Principle of Operation 
 
 The basic algorithm was first proposed by Voider (Coordinate 
 Rotation Digital Computer) for use in trigonometric relations found in 
 flight simulation. It was generalized by J. S. Walther to the hyper- 
 bolic mode. The convergence properties will be discussed in detail 
 in the following section. In this section the iteration equations will 
 be briefly reviewed. The generalization introduced by Walther makes use 
 of a parameter m in a coordinate system in which the radius R and angle 
 
11 
 
 A of the vector P = (x,y) are defined as: 
 
 R = (x 2 + my 2 ) 1/2 
 
 -1/2 . -1, 1/2 yx 
 A = m tan (m ~) 
 
 A 
 
 The curves corresponding to solutions of the equation 
 
 R = constant = C Q 
 
 are: a circle of radius Cq , a line parrallel to the y-axis with equation 
 
 X = C« , and a hyperbola for m = 1 , m = and m = -1 respectively. 
 
 Let a new vector P.,, = (x.^,, y..,) be obtained from P. = 
 
 l+l l+l •'i+V i 
 
 (x-, y.) according to 
 
 X. +1 = X, + W i 6 i 
 
 y 1+1 ■ y, - x.«.. 
 
 The angle and radius of the new vector in terms of the old one 
 are given by: 
 
 A.., = A. - a. 
 l + l i l 
 
 R.^, = R. * K. 
 l+l l i 
 
 where 
 
 -1/2 . -l r 1/2 j -, 
 a. - m tan [m o.J 
 
 K. = [1 + m6 2 ] 1/2 
 l l 
 
12 
 
 After n Iterations: 
 
 n-1 
 
 A = A - 7 a. = A -a 
 n o .L Q i o 
 
 n-1 
 
 R = R * n K. = R *K 
 n o i=Q i o 
 
 Solving for x and y 
 3 n -'n 
 
 = n K. {x cos(am 1/2 ) + y m 1/2 sin(am 1/2 )} 
 
 n i=Q 10 - -o 
 
 y n = . n K i {y o cos ( aml/2 ) " x m_1/2 sin ( aml/2 )> 
 
 Z accumulates the angle variations: 
 n 
 
 Z = z + a. 
 n o 
 
 If A is forced to zero: y = 
 
 If z is forced to zero: z = 
 
 n 
 
 m = 1 , m = 0, m = 1 correspond to the trigonometric, linear, 
 
 and hyperbolic mode. 
 
 In the case where 6. is chosen to be of the form 2 , the multi- 
 
 l 
 
 plications by 6- are merely right shifts. 
 
 By proper choice of the initial values for x , y , and z , 
 solutions for y/x, sinz, cosz, tan" y, sinhz, coshz and tan y may be 
 obtained. The square root can be computed by using 
 
 *^w = (x -y ) where x = w + 1/4 
 
 and y = w - 1/4 
 In the hyperbolic mode one can obtain the logarithm: 
 
13 
 
 lnw = 2 tanh~ (y/x) where x = w + 1 
 
 and y = w - 1 . 
 
 2.3 Convergence and Domains of Operation 
 
 By convergence we mean that either the Y register or the A 
 register converges to during the iterative sequence. This is equiva- 
 lent to nulling the contents of the angle register by adding to and sub- 
 tracting from it a series of predetermined constants denoted by a.. 
 
 To be resolved are: 
 
 1. Convergence conditions: What are the restrictions on the 
 
 constants a.? 
 i 
 
 2. Convergence domain: How large may the initial angle be 
 and still guarantee convergence to zero? 
 
 The primary proof is in [WAL 71]. The A register contains 
 successive values A., A. + ,, . . ., given by: 
 
 A. +1 = A. ± a. a. > (2.1) 
 
 where |A..,| = |A.| - a. (A. decreases in magnitude). 
 1 i+l ' ' l ' ii a ' 
 
 We wish to guarantee that after n steps, the final angle is 
 
 sufficiently close to 0. That is its absolute value will be less than 
 
 a in the worst case, 
 n 
 
 The worst case occurs when the value of A becomes zero and we 
 add (or subtract) the largest constant during the next step (Figure 2.1) 
 Thus, we wish to guarantee that: 
 
14 
 
 OC ) LAST STEP 
 
 n+1 
 
 ► 
 
 Figure 2.1 Nulling of the Quantity A. 
 
15 
 
 °i " <Vl + • • • + a n> * °n (2 - 2) 
 
 The domain of convergence is 
 
 |A | < I a. + a (2.3) 
 
 1 1=1 n n 
 
 for n steps (n angles a,, . . ., a ) 
 
 Theorem : Given any (positive or negative) A, convergence is 
 guaranteed if (2.1), (2.2) and 2.3) hold. 
 
 To prove this we must establish that 
 
 n 
 |A. | - I a. < a n (Property P(n,i)) 
 
 which can be done using double induction (i.e., on two variables) 
 (Figure 2.2) 
 
 a. The statement is true for (n, 1) because we choose 
 
 n 
 |A, | - I a. < a 
 1 j=l J 
 
 by hypothesis (domain of convergence). Now if P(n,l), we must show 
 P(n+l,l). For the same reason 
 
 n+1 
 
 I A, I - 7 a . < a . , 
 ' ] ' j=l J n+1 
 
 by choice of the domain of convergence for the initial angle A,. 
 
 b. Now consider P(n,i) => P(n,i+1). 
 That is, if 
 
 |A | < I a. + a (2.4) 
 
 i j=1 J n 
 
Given a property J (i,n) 
 
 P(i,n 
 
 16 
 
 (1.1) 
 
 * I 
 
 Jl.n). 
 
 -(i.n+1) 
 
 if 
 
 ftl.U 1 
 P(i.D 
 
 >=> ^(1+1.1) 
 
 and 
 
 ^(i.n) 
 
 :> 7 J (i,n+l) 
 
 this guarantee the property to be true for all (i,n)'s 
 
 Figure 2.2 The Double Induction Process 
 
17 
 
 this implies that 
 
 n 
 
 |A. .1 I < I a- + a 
 1+1 ' j= f +1 J n 
 
 If we subtract a. from (2.4) 
 l 
 
 n 
 
 A. ' 
 
 L I - a. < 7 a. + a - a. 
 1 i fa n 
 
 which can be written 
 
 n 
 
 l A i' " a i < a n + I a i* 
 1 n n j=i+l J 
 
 To be able to use : 
 
 IM -a. | - |A 1+1 |. 
 
 we must also show that 
 
 n 
 
 J : 
 
 We can use (2.2) which is the constraint on the angles 
 n 
 
 1 1 
 ■a - I a. < |A. | - a. 
 n j=i+l n ] n 
 
 or 
 
 a. - y a. < a 
 
 1 j-T+i J n 
 
 -a. < |A.| - a. 
 
 and 
 
 -a - I a. < -a. by (2.2) 
 n j=i+l J n 
 
 Thus, we have shown that 
 
18 
 
 n n 
 
 -a - 7 a. < -a. < |A. I - a. < a + T ot. 
 n j4 +1 J i ' i 1 i n j= ^ +1 j 
 
 which implies 
 
 | A.. J <a + I a j from (2 - ]) 
 
 Therefore the convergence is insured for the entire lattice (i,n) and at 
 the end of the process (i=n) : 
 
 I A ., I < a 
 1 n+1 ' n 
 
 The criterion that 
 
 n 
 
 a. - y a. < a 
 
 i. - y a. 
 1 >T+i J 
 
 J- 
 
 for the 3 modes is that 
 
 n 
 a. < y a. + a 
 1 j=i+l J n 
 
 must hold for 
 
 a. = tan" 1 2" 1 , 
 
 a. = tanh 2 
 l 
 
 and 
 
 a.,. = 1/2 1 . 
 
 Linear Mode 
 
 We have 
 
 I = 2 " (i+1) + 2" (i+2) + . . . + 2~ n + 2~ n . 
 
19 
 
 This sum is equal to 2 and thus satisfies the convergence 
 
 criterion. 
 
 Trigonometric Mode 
 
 tan" ( 1/2 1 ) is to be compared with 
 
 T = tan 2 v '+...+ tan 2 + tan 2 
 
 (2.5) 
 
 We have also that tan" 1 2" n + tan" 1 2" n = tan' 1 (2" n+1 /l-2" 2n ). Thus 
 tan 2 + tan 2 > tan 2 
 
 By applying this inequality repeatedly to the last two terms of (2.5) 
 one gets: 
 
 T > tan' 1 2~\ Q.E.D. 
 
 Hyperbolic Mode 
 
 For the hyperbolic mode we will see that the inequality does 
 not hold in general but that by repeating some of the a. constants it 
 can be made to hold. 
 
 The sum to reduce is: 
 
 H = tanh" 1 2" (i+1) ♦ tanh" 1 2" {i+2) + . . . + tanh" 1 Z n . 
 
 We also have that 
 
 tanh" 1 a + tanh" 1 b = tanh" 1 ^/ab (2,6) 
 
 and thus, 
 
20 
 
 tanh ' 2 n u + tanh ' 2 U U = tanh ' - zrrprn 
 
 1 + 2 u u 
 
 -1 2~ 1 -1 -1 
 
 = tanh 9M+1 1 < tann 2 
 
 1 + 2 l u 
 
 which implies that 
 
 tanh-Y 1 ' > tan.fY (1+1 > + tanffY< 1+1 > > ttnh-Y (1+1) 
 
 + tanh" 1 2" (i+2) (2.7) 
 
 or 
 
 a i > a i+l + a i+l > a i+l + a i + 2- 
 
 We can here ask ourselves the question: What is the element 
 x. that must be added to the second member of the inequality in order 
 to make it an equality? 
 
 tanh-V 1 5 tanh-¥ (i+1 > ♦ tanh^w" 11 * 2 * ♦ x. 
 
 x, > tanh" 1 2" 1 - tanh" 1 2" (1+1) - tanlf V* 1 * 1 ' 
 
 x. > tanh'V 1 - tanh" 
 
 i " """' ' "*"" 1 + 2 -(2i+2) 
 
 ,-i 2" 1 
 
 -1 " 1 + 2" (2i+2) 
 
 x. I tanh 
 
 0~~ 
 
 { , I 2 -(2i+2) } 
 
 1 + 2 
 This becomes: 
 
21 
 
 .1 2 "(31+2) 
 X i " Unh 1 _ 3x 2 -(2i+2) • 
 
 We can choose the quantity 
 
 x. = tanh" 1 (2" (3i+2) x 2) = a 3i+1 
 
 to be greater than 
 
 since 
 
 i 2 -(3i+2) 
 
 tanh [ -(2i+2) ] 
 
 1 - 3x 2 U1 L) 
 
 1 >1. 
 
 1 - 3x 2^V 
 We have then: 
 
 a i < a i+l + Vl + a 3i+l (2 ' 8) 
 
 after adding x. = ot~« + , to the second member of the inequality (2.7). 
 Writing (2.8) for i+1 , i+2, . . ., 3i-2, 
 
 Vl < a i+2 + a i+2 + a 3i+4 
 
 a i+2 < a i+3 + a i+3 + a 3i+7 
 
 a 3i-2 < a 3i-l + a 3i-l + a 3(3i-2)+T 
 
 and adding, one obtains: 
 
 3i-2 
 a i < a i+l + a i+2 + • • • + a 3i-l + a 3i + .\. a 3j+l 
 
22 
 
 By applying inequality (2.7) successively this reduces 
 
 3i 3i-4 
 
 i < k= | +1 a k + X a 3j+l + a 3(3i-3)+l + a 3(3i-2)+l 
 
 a 
 
 to 
 
 3i 3i-4 
 
 1 k=i+l k j=i 3j ] 3(3i-3)+l 3(3i-3)+l 
 
 and finally to: 
 
 3i 3i-4 
 
 a i < k= ^ a k + i. a 3j + l + a 3(3i-3) * 
 
 We repeat this reduction by writing the last inequality in the form: 
 
 3i 31-5 
 
 a i < k= l +1 a k + 1 a 3j+l + a 3(3i-4)+l + a 3(3i-3) ■ 
 
 Using the inequality ot 3(3i_3) < a 3(3i-4)+i» 
 
 3i 3i-5 
 
 a i < k= \ +] a k + .^ a 3j+l + a 3(3i-4) + l + a 3(3i-4)+l ' 
 
 The final inequality obtained is: 
 
 3i 31+1 
 
 a i < k= l +1 a k + a 3i+l + a 3i+l = J 1+1 a k + a 3i+l 
 
 For the hyperbolic process, the significance of the last in- 
 equality is that, starting from cu we have: 
 
 a, < ou + ou + a. + a. . 
 
 Starting from a.: 
 
 a 4 < a 5 + a 6 + . . . + a ]2 + 0^3 + a ]3 
 
23 
 
 Starting from a,-,: 
 
 a 13 K a 13 + a 14 + • ' ' + a 40 + a 40 • 
 
 Th 
 
 us, depending on the precision desired the quantities {a^, a, 3> a.^, 
 
 a, ? ,» . • • , a., a~. + ,} will have to be added twice (or subtracted twice) 
 to insure convergence. 
 
 2.4 Precision for Radix 2 
 
 The accuracy of these algorithms is limited by the finite 
 length of the registers and adders. An n-bit arithmetic unit is used to 
 perform all of the operations and the following discussion examines the 
 value of the worst case error after n steps of computation. Let us sup- 
 pose that we have an n-bit adder and an n-bit register for storage of 
 intermediate results. 
 
 For the shifting operation we have chosen 8 bit barrel shifters 
 
 2 
 and have tried to minimize the cost by using (n )/8 barrel shifters 
 
 2 
 rather than the 2((n )/8) that would be required in order to retain the 
 
 shifted bits. This truncation will have some effect on the accuracy 
 
 (see Figure 2.3). 
 
 In the following, the CORDIC equations below will be applied n 
 
 times in sequence. 
 
 Vi ■ x i + m 2_i h 
 Vi ■ Y i - 2_i h 
 
 The worst case error will occur when all of the shifted bits are 
 
o 
 
 24 
 
 s 
 
2 1 = l - 2" 1 
 
 
 -1 -2 -2 
 
 +2 ' + 2 * = 1 - 2 * 
 
 
 +2" 1 + . . . + 2" n = 1 ■ 
 
 - 2" n 
 
 E I (1 - 2" 1 ) 
 
 25 
 
 one's and the successive operations performed are the same (n successive 
 additions or subtractions). If we assign a weight of 2 to the least 
 significant bit the successive errors are (Figure 2.4): 
 
 First step 0=0 
 
 Second step 
 
 Third step 
 
 n th step 
 
 The sum of these errors is: 
 
 n 
 
 I 
 1=1 
 
 E = n - (1 - 2" n ) 
 
 E = n - 1 + 2" n < n 
 
 and this will consequently make the log ^(n) least significant bits of 
 the result wrong. Thus, a full n-bits accuracy will require n + log ? n 
 bits of storage. For n = 16, the maximum error will be 
 
 n - 1 + 2" n = 15 + 2" 16 = 15 
 
 If we use an additional 4 bit ALU and 2 barrel shifters so 
 that the four most significant bits of the n bits lost while shifting 
 are retained, the errors will be (see Figure 2.5) 
 
 Sixth step 2' 5 = 2" 4 (1-2" 1 ) 
 Seventh step 2" 5 + 2" 6 = 2~ 4 (l-2~ 2 ) 
 n th step 2 -5 + 2' 6 + . . . + 2" n = 2" 4 (l-2~ n+4 ) 
 
 and their sum is: 
 
'ii«r step 
 
 siau) sii? 
 
 ,i IH STEP 
 
 BITS LOST 
 
 2' 2° 
 
 2"' T 2 
 
 26 
 
 ERROR 
 
 ■* 2-'+2~ 2 
 
 ■* 2-' + 2- 2 -2'' 
 
 2 J + 2- a +- 2- n 
 
 Figure 2.4 Accumulation of Errors in the Shifting Process 
 
 BITS LOST 
 
 
 N 
 
 AWED UURIifi THE 
 ALG0RITH1 
 
 Figure 2.5 Accumulation of Errors with Guard Bits 
 
 ERROR TEK1S 
 
 2- 5 + 2"V.. 2-* 
 
27 
 n-4 
 
 E = 2~ 4 I (1 - 2" 1 ) 
 1 i=l 
 
 E 1 ■ 2" 4 (n-4) - 2" 4 (1 - 2" n+4 ) 
 = 2" 4 (n-3) + 2' n < 2" 4 (n-2) 
 
 and thus the error is considerably diminished. 
 
 In the n = 16 case: 
 
 -4 
 E, < 2 (16 - 2) = 14/16 < 1 and thus 16 bits of accuracy can 
 
 be obtained by using one additional 4 bit ALU and 2 barrel shifters 
 
 (Figure 2.6). 
 
 2.5 Computation of the Constants Stored in the ROM 
 
 There are three types of constants involved in the CORDIC 
 algorithms: 
 
 a. L. = 2' 1 
 
 l 
 
 b. T. = tan" 1 2' 1 
 
 c. H. = tanh" 1 2' 1 
 
 l 
 
 a. The constants used in the linear mode can be generated 
 by using a serial-in, parallel -out shift register. 
 
 b. The arctangent constants have been computed in both 
 decimal and binary. The binary number represents angles 
 (Table 2.1). The following conventions are used: 
 
 it is represented by 10.0 . . . 
 j 01.0 ... 
 
 J 00.1 . . . 
 
 and so on. 
 
CO 
 
 28 
 
 cCs 
 
 VO 
 
 CO 
 
 «d- 
 c\j 
 
 00 
 
 in 
 
 V) 
 
 o 
 
 •r- 
 +-> 
 
 CD 
 
 e 
 
 O) 
 
 to 
 
 s- 
 
 CD 
 4-> 
 
 O0 
 
 QJ 
 
 S- 
 S- 
 <T3 
 
 CG 
 
 to 
 
 4-> 
 
 CQ 
 
 -a 
 
 s- 
 
 ro 
 O 
 
 c 
 o 
 
 +-> 
 
 N 
 
 m 
 
 03 
 
 us 
 
 CM 
 
 s- 
 
 CD 
 
29 
 
 tan" 1 (1/2) 
 
 1-1 
 
 Decimal 
 
 Binary 
 
 Rounded to 25 places 
 
 1 45.0000000000 ' 00, 
 
 2 26.5650511771 00. 
 
 3 14.0362434679 00. 
 
 4 7.1250163489 00. 
 
 5 3.5763343750 00. 
 
 6 1.7899106082 00. 
 
 7 0.8951737102 00. 
 
 8 0.4476141709 00. 
 
 9 0.2238105004 00. 
 
 10 0.1119056771 00. 
 
 11 0.0559528919 00. 
 
 12 0.0279764526 00. 
 
 13 0.0139882271 00. 
 
 14 0.0069941137 00. 
 
 15 0.0034970569 00. 
 
 16 0.0017485284 00. 
 
 17 0.0008742642 00. 
 
 18 0.0004371321 00. 
 
 19 0.0002185661 00. 
 
 20 0.0001092830 00. 
 
 21 0.0000546415 00. 
 
 22 0.0000273208 00. 
 
 23 0.0000136604 00. 
 
 1 0000000000000000000000 
 0100101 11 00100000001 010 
 001001 1 11 1 101 1001 1 10000 
 00010100010001000100011 
 00001 01 0001 01 100001 1010 
 00000101000101 110101 1 11 
 000000101000101 1 1 101 100 
 0000000101000101 1 1 11000 
 00000000101000101 1 1 I 100 
 OOCOOCCGClOlOCClCil 1110 
 0000000000101000101 1 1 11 
 00000000000101000101 1 1 1 
 000000000000101000101 1 1 
 0000000000000101000101 1 
 00000000000000101000101 
 00000000000000010100010 
 00000000000000001010001 
 00000000000000000101000 
 00000000000000000010100 
 00000000000000000001010 
 00000000000000000000101 
 00000000000000000000010 
 00000000000000000000001 
 
 shifting 
 
 Table 2.1 
 Values of tan" 1 (1/2) 
 
 i-1 
 
30 
 
 It can be seen (Table 2.1) that for i > 8 the successive con- 
 stants are obtained by shifting. 
 
 The Taylor expansion for arctan is 
 , 3 5 7 
 
 ■•I y y y 
 
 tan x = x q + ~ c 7 ' * ' l x l < 1 
 
 If tan" x has n bits resolution, we see that when |x| is less 
 than 2 ' , the additional terms of the series (x - x . . .), will not 
 contribute to an n bit precision angle and thus we will have, for 
 |x| < 2- n/3 , 
 
 tan" x = x or if 
 
 1 > § , tan" 1 2" 1 ' = 2" 1 ' 
 
 c. The tanh" constants have a similar property and have been 
 computed in the same way (Table 2.2). 
 
 For x > -z tanh" x - x 
 
 As seen in the CORDIC equations, the radii of the successive 
 vectors are modified by multiplicative constants of the form: 
 
 T2T 
 
 K. = A + m 2 for base 2 algorithms. 
 
 me (-1 , 0, +1) 
 
 After n steps of the iteration, the magnitude of the rotated 
 vector has grown by a factor K. 
 
 n-1 
 
 K = n /l + m 2 dl 
 n 1-0 
 
tanh" 1 (1/2)" 
 
 31 
 
 Decimal 
 
 Binary (rounded 25 places) 
 
 1 0.5493061443 00. 
 
 2 0.2554128119 00. 
 
 3 0.1256572141 00. 
 
 4 0.0625815715 00. 
 
 5 0.0312601785 00. 
 
 6 0.0156262718 00. 
 
 7 0.0078126590 00. 
 
 8 0.0039062699 00. 
 
 9 0.0019531275 00. 
 10 0.0009765628 00. 
 li 0.0004882813 00. 
 
 12 0.0002441406 00. 
 
 13 0.0001220703 00. 
 
 14 0.0000610352 00. 
 
 15 0.0000305176 00. 
 
 16 0.0000152588 00. 
 
 17 0.0000076294 00. 
 
 18 0.0000038147 00. 
 
 19 0.0000019073 00. 
 
 20 0.0000009537 00. 
 
 21 0.0000004768 00. 
 
 22 0.0000002384 00. 
 
 23 0.0000001192 00. 
 
 24 0.0000000596 00. 
 
 25 0.0000000298 00. 
 
 10001 100100111110101010 
 01 00000101 100010101 1 1 1 
 00100000001010110001010 
 00010000000001010101 101 
 000010000000000010101 10 
 0000010000000000000101 1 
 000000 1 ooooooooooooooi 
 00000001 OOOOOOOOOOOOOOI 
 00000000100000000000001 
 0000000001 0000000000001 
 00000000001 000000C0C0C1 
 00000000000100000000001 
 00000000000010000000001 
 00000000000001000000001 
 00000000000000100000001 
 00000000000000010000001 
 00000000000000001000001 
 00000000000000000100000 
 00000000000000000010000 
 OO0OOO0OO0O00O0O00Q1 000 
 00000000000000000000100 
 00000000000000000000010 
 00000000000000000000001 
 00000000000000000000001 
 00000000000000000000001 
 
 Table 2.2 
 
 Values of tanh" 1 (1/2)" 1 
 
32 
 
 Table 2.3 shows the value in decimal and binary of the scaling 
 
 factor K for n up to 24. 
 n r 
 
 The two scaling factors K, and K , can be stored in one ROM 
 location if division by K is necessary during a computation. 
 
1 
 
 n k 
 o 
 
 33 
 
 Decimal 
 
 Binary 
 
 
 
 1. 
 
 1 
 
 1. 
 
 2 
 
 1. 
 
 3 
 
 1. 
 
 4 
 
 1. 
 
 5 
 
 1. 
 
 6 
 
 1. 
 
 7 
 
 1. 
 
 8 
 
 1. 
 
 9 
 
 1. 
 
 10 
 
 1. 
 
 il 
 
 1. 
 
 12 
 
 1. 
 
 13 
 
 1. 
 
 14 
 
 1. 
 
 15 
 
 1. 
 
 16 
 
 1. 
 
 17 
 
 1. 
 
 18 
 
 1. 
 
 19 
 
 1. 
 
 20 
 
 1. 
 
 21 
 
 1. 
 
 22 
 
 1. 
 
 23 
 
 1. 
 
 24 
 
 1. 
 
 4142135623730950 01.011010 
 
 5811388300841890 01.100101 
 
 6298006013006610 01.101000 
 
 6424840657522360 01.101001 
 
 6456889 157572530 01.101001 
 
 6464922787 124770 01.101001 
 
 6466932542736420 01.101001 
 
 6467435065968990 01.101001 
 
 6467560702048760 01.101001 
 
 6467592111398200 01.101001 
 
 6467599963756150 01.101001 
 
 6467601926846920 01.101001 
 
 6467602417619690 01.101001 
 
 6467602540312890 01.101001 
 
 6467602570986180 01.101001 
 
 6467602578654500 01.101001 
 
 6467602580571570 01.101001 
 
 646760258 1050850 01.101001 
 
 6467602581170660 01.101001 
 
 6467602581200620 01.101001 
 
 646760258 1208110 01.101001 
 
 646760258 1209980 01.101001 
 
 6467602581210440 01.101001 
 
 6467602581210560 01.101001 
 
 6467602581210590 01.101001 
 
 10000010011 1 10100 
 001 1000101 1000010 
 01001 1 10101001 11 1 
 0001 1 1 1001 1 10101 1 
 0101001011 11 10000 
 01100000001C0001 1 
 01 10001 101 101 1001 
 011001000011 11110 
 01 10010001 1 101000 
 01100100100000010 
 01100100100001001 
 
 01 100100100001010 
 
 0110010010000101 1 
 01 10010010000101 1 
 01 100100100001011 
 01 10010010000101 1 
 0110010010000101 1 
 01 10010010000101 1 
 01 10010010000101 1 
 0110010010000101 1 
 01 10010010000101 1 
 01 10010010000101 1 
 0110010010000101 1 
 01 10010010000101 1 
 01 100100100001011 
 
 Table 2.3 
 
 Values of n K. = n [1 + 2 2l ] 1/2 
 o n o 
 
34 
 
 3. PROCESSOR SIMULATION 
 3.1 Software Simulation 
 
 The CORDIC algorithms have been simulated by implementing the 
 digital arithmetic unit in software . To avoid the normalization and 
 rounding inherent in the 360 arithmetic operations , the algorithms have 
 been simulated in binary. 
 
 The printouts allow the designer to check in a straightforward 
 manner the digital prototype machine (described in Section 4) by com- 
 paring the binary outputs to the strings of O's and Ts of the printout. 
 
 The program requires the value of m(+l , 0, -1) so as to specify 
 the trigonometric, linear, of hyperbolic mode. The user specifies whether 
 the Z or Y register sign is to be tested during the algorithm. The pro- 
 gram prints out the final values of X, Y, Z, K and K~ . 
 3 mm 
 
 The initial values, X and Y , are input by the user. 
 
 o o r 
 
 Z must be specified in degrees for the trigonometric and hyper- 
 bolic modes. It is also possible to print out the intermediate values 
 of the registers in decimal, both decimal and binary, or to print the 
 final register values directly. N, the precision of the computation, can 
 also be specified. Figure 3.1 shows the format of the program printout. 
 
 The final binary result is then converted back to decimal and 
 the relative error is computed by using the Call -OS FORTRAN double preci- 
 sion routines: 
 
 Error in % = 100 x CORDIC Value - True Value 
 hrror in % iuu x True Va]ue 
 
Values in Decimal 
 
 Values in Binary 
 
 X„ 
 
 Zn 
 
 <X r 
 
 STEP n 
 
 X REGISTER : Xn 
 
 Xn SHIFTEJ 
 
 Z REGISTER Zn 
 
 ROM OUTPUT : C< N 
 
 35 
 
 Next 
 Operation 
 
 Each box represents a number (decimal or binary), result of 
 step n. 
 
 add 
 
 \ 
 
 OPERATIOII FOR TK ■£& 
 
 ITERATION. ( RESULT OBTAINED 
 AFTER TESTIS Yn OR Zn ) 
 / / 
 
 
 
 
 
 
 
 Yn SHIFTED 
 
 
 / 
 
 
 i 
 
 i 
 
 
 / 
 
 
 
 Y REGISTER : Yn 
 
 
 ADD 
 
 
 
 
 
 
 
 
 
 SUB 
 
 Figure 3.1 Format of the Simulation Program Printout 
 
36 
 
 3.2 Trigonometric Mode 
 
 Table A.l of the Appendix lists the output for the trigonometric 
 mode. It shows: 
 
 — The initialization 
 
 -- The first 90 degree rotation 
 
 -- First step 
 
 -- Second step 
 
 -- 19th step 
 of an iteration in the trigonometric mode. The 19th iteration will not 
 change the outputs X, Y, Z. In the formulas 
 
 X i+1 = X. + 2- 1 Y. 
 and 
 
 the additive terms 2 _1 Y. and 2" 1 Y. will not contribute to the final 
 20-bit-resolution number for i greater than 18. The first two steps are 
 the 90 degree rotation and shift. 
 
 For the example chosen, the initial vector (X , Y ) has the 
 value (1/K-j, -1/K-,) so that the final result should be: 
 
 * * K i ^ + <- r/ - * 
 
 Y ■> 
 
 _1 
 
 -1 " K l 
 Z - Z Q + tan ' (— jL) = Z Q - 45° 
 
 Figure 3.2 shows the first steps of the algorithm. 
 
INITIALIZATION 
 
 37 
 
 FIRST 90* ROTATION 
 
 ( IP ROTATION) 
 
 1-1 
 
 1-2 
 
 1-07 
 
 0000138147 
 
 Figure 3.2 Steps in the Trigonometric Mode 
 
38 
 
 3.3 Linear Mode 
 
 The example of division is used with constants equal to 2 . 
 
 Figure 3.3 is a geometric interpretation of the algorithm with Y = -. 3 
 
 and X = .6. The result of the division is -.4999961853. 
 o 
 
 Tabulation for the linear mode is given in Table A. 2 of the 
 Appendix. 
 
 3.4 Hyperbolic Mode 
 
 The equations, 
 
 X ■+ K (X cosh l n + Y sinh Z ) 
 
 2 
 
 Y -* K (Y cosh Z + X sinh Z ) 
 2 o oo o 
 
 Z ■+ 
 
 
 
 are tabulated in Table A. 3 in the Appendix. 
 
 The value of K is 0.82978162013 with X = 1/K 9 and Y = 
 2 o 2 o 
 
 and the result is: 
 
 X = cosh Z 
 
 Y = sinh 1 
 o 
 
 Z = 
 
 Iterations 4 and 13 have been repeated twice to insure conver- 
 gence as shown in a previous section. One can see that the precision of 
 these algorithms is not uniform. The simulations were done with no 
 
39 
 
 SHIFT = 1 
 
 G = X 
 
 Figure 3.3 Geometric Interpretation for Linear Mode 
 
40 
 initial normalization and thus the relative precision has some variations 
 which are a function of the initial values, Xq, Y q and Z Q . In analyzing 
 the total precision, one must account for the fact that values are 
 represented in binary with finite length registers. 
 
 The next section gives an indication of the behavior of the 
 error for a typical computation of the slant range at various bearing 
 angles. 
 
 3.5 Precision of Results without Initial Normalization 
 
 Figure 3.4 represents the relative error for the X and Z 
 registers when the following formulas are computed in binary: 
 
 X + K l V + Y o" 
 
 Y -> 
 
 7 7 4. "I Y 
 
 Z + Z n + tan y— 
 u A Q 
 
 Several angles ranging from 90 have been chosen. X n and Y~ 
 have values of A cos A~ and X sin A~; A Q being the chosen bearing angle 
 and A the slant range. The percent error is plotted as a function of the 
 slant range for various bearings. The curves correspond to 20 significant 
 digits. The error is computed in double precision. 
 
41 
 
 QJ 
 CTl 
 
 C 
 TCI 
 
 o; 
 
 4-> 
 C 
 TCI 
 
 <v 
 
 O 
 
 
 TC) 
 
 CO 
 TCi 
 
 S- 
 
 o 
 
 s- 
 s- 
 
 0) 
 
 > 
 
 TCI 
 
 en 
 
 ro 
 
 =3 
 en 
 
42 
 4. THE 8-BIT PROTOTYPE OF A CORDIC ARITHMETIC UNIT 
 
 4.1 Hardware Realization 
 
 2 
 The prototypp, constructed with T L technology is composed of: 
 
 i) a microprogrammed control unit 
 
 ii) the X, Y, and Z sections of the arithmetic unit including 
 
 the 8-bit scalers. The use of these particular 8-bit 
 
 position scalers presented some problems in the 2's- 
 
 complement notation. These are explored in Section 4.3. 
 
 In Section 4.4 the design and operation of the prototype is 
 
 discussed in detail . 
 
 4.2 Control of the Arithmetic Unit 
 
 The term "microprogramming" was introduced two decades ago by 
 M. V. Wilkes. His intent was to offer a more systematic approach to 
 control design in digital computers. Within the last few years greater 
 understanding of the digital process combined with improved manufacturing 
 technology rendering highspeed changeable control economically feasible 
 has led to a resurgence of interest in computers whose controls may be 
 modified during use. 
 
 The working nucleus of a conventional digital computer is the 
 central processing unit or CPU and its associated main storage unit. Data 
 flows from several parts of the computer to and from the CPU and/or memory 
 so that operations can be performed on it (addition/subtraction and 
 other possible elementary operations). During the basic time slot or 
 cycle an instruction is performed and some gates are opened, counters are 
 incremented, and so on by control signals . Depending on the present state, 
 the machine will execute another instruction (next state). 
 
43 
 In the conceptually simplest possible organization (Figure 4.1), 
 the storage part of the control section would contain a storage element 
 corresponding to each control line connected directly to its control 
 terminal. This usually requires too many control lines (100 is a common 
 number) and the usual procedure is to examine the control signals in 
 groups that are logically mutually exclusive to reduce the length of the 
 microword. 
 
 It is also necessary to establish the microprogram address (con- 
 trol state) that is to succeed the current one (Figure 4.1). A common 
 technique is to construct the next address from the current address by 
 providing in the microword format a field that controls the modification 
 of the current address as a function of the current state. This provides 
 the possibility of conditional jumps and GOTO-like micro-instructions. 
 
 Because of the decreasing cost of ROMS and the flexibility of 
 microprogramming these techniques are becoming more and more popular for 
 the control design of computers. Studies have shown that above a certain 
 size the cost of microprogramming is less expensive than the cost of 
 control by standard techniques. 
 
 Mode of Operation of the Control 
 
 The microprogram will consist of a series of micro-routines, 
 one for each different machine instruction defined by the user, and a 
 master macro-routine to effect branching to the appropriate routine for 
 the current instruction. Each instruction in the machine program can be 
 thought of as a macro-call with an operation code making the call, and 
 the remainder of the instruction supplying the parameters. 
 
44 
 
 a 
 
 3 
 
 
 
 2 
 
 
 
 
 a 
 
 
 sz 
 
 
 
 r 
 
 :x 
 
 
 
 1 
 1 
 
 
 
 fc5 
 
 1 
 
 
 H 
 
 
 
 
 1 
 1 
 1 
 
 w § 
 
 r 
 
 "i 
 
 i i 
 
 •r— 
 
 c 
 
 en 
 
 c 
 
 1- 
 cn 
 o 
 
 s- 
 
 Q. 
 O 
 
 S- 
 
 o 
 
 s- 
 
 0) 
 
 c 
 <ll 
 
 CD 
 
 s- 
 en 
 
45 
 There is a need for a ROM decoding table which has as many 
 entries as there are possible values of the operation code for each 
 processor. It is also necessary to have a way of doing a TEST and 
 BRANCH operation as well as a RETURN from a subroutine. 
 
 Structure of the Basic Microprogrammable Unit 
 
 Although a yery general microprogrammable unit has been des- 
 cribed its implementation is simple and requires only a small amount of 
 hardware. BRANCH or CONDITIONAL JUMP instructions are performed to imple- 
 ment any algorithm represented as a flow table. A small microprogram 
 control allowing the equivalent of a GO TO instruction, or a JUMP TO SUB- 
 ROUTINE (conditionally or unconditionally), will be implemented. This 
 likewise uses a small amount of hardware (Figure 4.2). 
 
 The sequence of instructions to be executed is implemented as 
 a flow chart rather than as a detailed microprogram subject to modifica- 
 tions. Thus, the microprogram part can be easily implemented during the 
 microprogramming phase when all the control lines are known and well 
 defined. 
 
 The organization of the control card is shown in Figure 4.2. It 
 is comprised of: a RAM or a ROM (with no more than 16 words) where the 
 microsteps are stored and a micro-instruction address register which has 
 four different fields - the test, true address, false address and the micro- 
 instruction. Many other configurations are possible, of course. 
 
 The test field represents a three bit code yielding one of eight 
 possible tests. The tests are selected by an 8 x 1 multiplexer addressed 
 by the test field. A memory location can be selected from among four 
 possible sources: the true link, the false link, the stack, and a set of 
 
46 
 
 < 
 
 I — ? — ? — J 
 
 tttt. 
 
 o 
 
 s- 
 
 4-> 
 
 C 
 
 o 
 o 
 
 ■a 
 
 03 
 
 s- 
 a> 
 o 
 
 S- 
 Q. 
 O 
 
 s- 
 o 
 
 s- 
 
 (L) 
 
 c 
 
 QJ 
 
 CM 
 «3" 
 
 S-. 
 
 cn 
 
47 
 switches for manual addressing (to examine the contents of a given memory 
 location). The stack is used to save the value of the current-address- 
 plus-one whenever the program jumps to a subroutine. The current address 
 in incremented by one by means of an adder. To simulate a stack, a RAM 
 and an UP-DOWN counter are used. The two control lines, "Push" and "Pop", 
 are part of the micro-instruction. This configuration allows asynchronous 
 control of the input/output flow between the machine and the computer. 
 
 The details of the clock sequences are shown in Figure 4.3 for 
 a positive or negative edge triggered control line. The delay, , will be 
 necessary to allow the control bits A and B to be latched into the micro- 
 instruction register before the occurrence of the clock edge if edge 
 triggered devices are used. The JUMP TO SUBROUTINE will issue a signal 
 on the PUSH control line while the POP is (Figure 4.3). The return 
 signal will POP the preceding address from the stack and cause the data 
 selector switch to get the next address from the stack. 
 
 4.3 The Use of Scalers for 2 ' s Complement Shifting 
 
 The CORDIC algorithms require shifting by positions varying 
 from 1 to n for a typical n-step algorithm. However an eight-bit position 
 scaler which performs only end-off shifts was used. Figure 4.4 shows the 
 diagram and truth table of the scaler. This basic 8-bit building block 
 can be used for any shift function. 
 
 It is possible to use a simple shift register which shifts any 
 number of places right or left. However, an 8-bit position scaler package 
 takes no more space than an 8-bit parallel-in, parallel-out shift register. 
 In addition, the scaler is faster and has simpler control. 
 
JJJ LLL 
 
 ilSF slllciji; 
 
 .IU'i .OJ. SdiiCUIt 
 
 ±_i i_± 
 
 rUl Jrt ittl 
 
 TlST 
 
 I'JSI 
 PUP 
 
 Ti3T=FALS£ 
 
 TEST=TlttJE 
 
 g 
 1 
 
 | 
 
 A 
 H— H 
 
 RLTiUJ 
 
 GET TIC itXT AJJiiESS Film 
 TIC STACK A.U 'TOP" 
 
 ■ECT AD, 
 
 48 
 
 IWTL A.4J INCIIEUfi UJUNUj! 
 
 ~l_fl 
 
 fO 
 
 Ma 
 
 l.j. TO ADDRESS 
 — w> SELECTOR 
 
 COU.JTDI 
 
 K> 
 
 TO PUSITIVL QJGC TUIGGQIQJ I.rJT 
 
 a> 
 
 JEUY >£ 
 
 TO .EWTIVE QJGL IiIIGGUCl) I.I'JT 
 
 7'IOJ 
 
 cuci: 
 
 ^^J~L 
 
 « ► 
 
 T - LUCLST MSTiUCTiai 
 
 Figure 4.3 Microprogram Control and Timing 
 
49 
 
 l 7 l 6 h U l 3 l 2 h l Q 
 
 Example with a shift of 3 (S Q S,$ 2 =110) 
 
 Truth table : 
 
 INHIBIT 
 
 ENABLE 
 1*2 
 
 *0 
 
 s 1 
 
 h 
 
 °0 
 
 °1 
 
 °2 
 
 °3 
 
 °4 
 
 °6 
 
 °6 
 
 °7 
 
 
 
 
 
 
 
 
 
 
 'o 
 
 '1 
 
 '2 
 
 '3 
 
 >4 
 
 '5 
 
 *6 
 
 '7 
 
 
 
 
 1 
 
 
 
 
 
 'l 
 
 '2 
 
 '3 
 
 '4 
 
 '5 
 
 ! 6 
 
 '7 
 
 
 
 
 
 
 
 1 
 
 
 
 '2 
 
 T 3 
 
 T 4 
 
 T 5 
 
 T 6 
 
 T 7 
 
 
 
 
 
 
 1 
 
 1 
 
 
 
 '3 
 
 T 4 
 
 T 5 
 
 T 6 
 
 T 7 
 
 
 
 
 
 
 
 
 
 
 
 1 
 
 '4 
 
 T S 
 
 T 6 
 
 h 
 
 1 
 
 
 
 
 
 
 
 1 
 
 
 
 1 
 
 T 5 
 
 T 6 
 
 h 
 
 
 1 
 
 
 
 
 
 
 
 
 
 1 
 
 1 
 
 T 6 
 
 T 7 
 
 
 • 1 
 
 1 
 
 
 
 
 
 1 
 
 X 
 
 1 
 
 X 
 
 1 
 
 X 
 
 1 
 
 X 
 
 T 7 
 
 1 
 
 1 
 1 
 
 
 
 1 
 1 
 
 
 
 
 X 
 
 
 
 X 
 
 X 
 
 X 
 
 1 
 
 1 
 
 
 
 1 
 
 
 
 
 Figure 4.4 Scaler and Truth Table 
 
50 
 
 Scalers belong to the more general class of uniform shift 
 
 networks. This includes 
 
 . Scalers or shift circuits for end-off shifting 
 
 . Rotate circuits for end-around shifting 
 
 . Shift and rotate circuits or "barrel switches" which are 
 
 capable of both end-off and end-around operation. 
 
 The outputs of the scaler used are open collector to allow for 
 
 array expansion. 
 
 However, for 2's complement numbers, there is the problem of sign propagation 
 
 for right-shifting. A number in 2's complement form with the left most bit 
 
 being the sign, must have its left most bits filled in with the sign bit 
 
 when divided by 2 n (Figure 4.5). 
 
 This scaler replaces the left most bits by l's independently of the input 
 
 binary number. Some kind of connection is therefore needed in order to 
 
 insure proper sign propagation. 
 
 2 
 The number of units required for an n-bit scaler is (g) (Figure 4.5) 
 
 One approach implementable for small scalers, is to use (g) 
 extra 8-bit scalers having the sign bit of the shifted number as inputs. 
 The various cases are shown in Figure 4.5. 
 
 For all possible cases it is easy to see that if the outputs 
 of the B scaler are complemented when the sign bit is 1 and unchanged when 
 the sign bit is 0, and if these outputs are wire ANDed with those of the 
 A scaler, one will have the required correction (Figure 4.6). 
 
 However, this solution is not very economical for large n. 
 
 2 
 Another approach makes use of a 32 x 8 T L ROM. This solution 
 
 avoids the duplication of (g) barrel shifters whose only function is to 
 
 propagate the sign-bit (Figure 4.7). When the sign bit is the ROM sees 
 
51 
 
 Positive 2's complement number: 
 
 
 
 
 1 
 
 1 
 
 A, 
 
 A, 
 
 A4 
 
 A 3 
 
 k 
 
 
 I 
 
 1 
 
 1 
 
 \ 
 
 1 
 
 1 
 
 I 
 
 l 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 ; fc 
 
 ST!'C SCALLi! 
 A 
 
 3 — 
 
 SOT scall: 
 B 
 
 
 
 
 
 A. 
 
 'A, 
 
 A, 
 
 A3 
 
 Az 
 
 a; 
 
 Ac 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 — 1 
 
 
 
 
 — < 
 
 1 — 
 
 
 
 
 
 Negative 2's complement number : 
 
 \ A 5 A, A 3 A, 
 
 Correction needed in this case 
 
 3 ► 
 
 1 1 1 A 6 A 5 A 4 A 3 A, 
 
 
 S2'G SCALER 
 
 A 6 A 5 A 4 A 3 A, A, A 
 
 3 ► 
 
 I 1 1 
 
 1 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 S32'l3 SCALE! 
 
 Figure 4.5 Sign Propagation with the 8-Bit Scaler 
 
52 
 
 SIG.J BIT 
 
 SO\LL( 
 
 SCAIIK 
 
 MEi-m (DC UUJPJTS) 
 
 B- 
 
 ©- 
 
 B- 
 
 ©■ 
 
 ©- 
 
 " ID - 1 
 
 ■ ID ^ 
 
 ■ P - 1 
 
 ■ =)E>- 1 
 
 ©- 
 
 ©■ 
 
 ©- 
 
 Figure 4.6 Correction of Sign Bit, First Method 
 
53 
 
 = 58i 
 
 , _ o .— o — o - 
 
 185522 = : 
 
 -a 
 o 
 
 X5 
 
 C 
 
 o 
 o 
 
 0) 
 
 CO 
 
 4-> 
 
 CO 
 
 c 
 
 CO 
 
 ■(-> 
 O 
 
 <D 
 
 S- 
 
 S- 
 
 o 
 o 
 
 i- 
 en 
 
54 
 the address 000 which is loaded with l's only. When the sign bit is a 1 
 the output of the ROM address is of the form: 00 .. 0111, where the 
 number of zeros is equal to the value of the binary address. 
 For a 32 position scaler one would need four ROMs plus two AND-gates for 
 the address. This is a substantially smaller number of packages than 
 required by the previous solution. The access time of the currently avail- 
 able T L ROMS is one the order of 30 to 60 ns; compatible with the delay 
 of the scalers. 
 
 4.4 An 8-bit Prototype CQRDIC 
 
 The organization is sketched in Figure 4.8. 
 We can distinguish: 
 
 - The input mutliplexers 
 
 - The C0RDIC arithmetic unit 
 
 - The output registers 
 
 The cross connection implementing the "cross addition/substraction" is shown. 
 The X, Y and A registers can be initialized independently by means of 
 switches. The counters are used to address the scalers and the ROM con- 
 taining the constants. 
 
 The microprogram is stored in a RAM: A set of switches allows debugging 
 and changing the micro instructions in order to implement any of several 
 algorithms. The following labels have been chosen: 
 
 - Input multiplexers: MPY, MPX, MPA 
 
 - Counters: CTSH (Count/Shift, for addressing the scalers) 
 
 CTA (for addressing the ROM storing the ATR 
 constants) 
 
 - Input registers: IRY, IRX, IRA 
 
 - ROM storing the constants: ROMA 
 
 
55 
 
 CO 
 
 •i— 
 
 c 
 
 Q 
 OC 
 O 
 O 
 
 CD 
 
 +-> 
 
 •4- 
 
 o 
 
 e 
 <o 
 
 J- 
 
 03 
 
 u 
 
 o 
 
 CO 
 
 cd 
 
 s- 
 
 < 
 
 5 
 
 5 
 
 LU 
 CO 
 
 UJ 
 CO 
 
 UJ 
 CO 
 
- The adder-substracters: A/SY, A/SX, A/SA 
 
 - The output registers: ORY, ORX, ORA 
 
 Microinstruction Format 
 
 56 
 
 I 
 
 
 
 CORDIC 
 
 NEXT ADDRESS 
 
 I: Input gating signals 
 
 0: Output gating signals 
 
 CORDIC: A/S, Counters, and ROM 
 
 NEXT ADDRESS: The binary value of the jump address or next 
 
 address. This format is convenient and flexible for this 
 
 prototype of the AU. 
 
 Figure 4.9 shows the diagram of the control. 
 
 The execute signal opens gates and takes into account the delay in 
 performing an instruction. The fetch signal latches the new address into 
 the instruction address register and issues the next instruction to the 
 input of the microinstruction register. 
 
 Control -Microinstruction Address Register 
 
 Figure 4.10 shows the microinstruction address register. The 
 dual 4-line to 1 line Data Selector/Multiplexers allow the selection of 
 the address for the RAM either from manual switches or from the next- 
 address portion of the microinstruction. The singleshot is used to 
 generate the fetch signal, which can be initiated manually also. 
 
57 
 
 
 
 RIY<- -XI 
 RIX<- Yl 
 RIA <- +90° 
 
 Yes 
 
 RIY <- +X1 
 RIX<- -Yl 
 RIA <- -90° 
 
 1 
 
 WW 21 " 2 
 
 
 V i+1 =Y 1+ X./2 i - 2 
 
 X i+l =X 1+ Y l ./2 1 - 2 
 
 VrVY 2i ~ 2 
 
 RIA<- RIA +OC. 
 
 l 
 
 RIA<- RIA -OC. 
 l 
 
 
 
 
 Fig. 4.9 Control Flowchart (Trigonometric Mode) 
 
58 
 
 to 
 
 lO 
 
 cu 
 
 s- 
 ■o 
 
 X> 
 
 •=t 
 
 +-> 
 X 
 
 o> 
 
 o 
 
 fO 
 
 c 
 
 to 
 
 2: 
 
 I 
 
 to 
 to 
 CD 
 S- 
 
 -o 
 
 < 
 
 CD 
 
 -t-> 
 C3 
 
 J* 
 
 2< 
 
 to 
 10 
 
 CD 
 
 S- 
 
 -o 
 x> 
 
 <c 
 
 c 
 o 
 
 •!■■ 
 
 •4-> 
 
 o 
 
 3 
 S- 
 
 to 
 
 c 
 
 o 
 
 S- 
 
 o 
 
 Figure 4.10 Microinstruction Address Register 
 
59 
 Control -Microinstruction Register 
 
 Figure 4.11 shows the microinstruction register. It is an 
 edge-trigerred latch since some input signals return to their initial 
 value and some remain at a new value. 
 
 In other words, some control signals are pulses and some are 
 constant. The execute pulse is delayed and used to generate the pulse 
 signals whereas the level signals come directly from the MIR. The single 
 shot generates the "execute" pulse. The inputs to the MIR come from a 
 32 x 20 bit RAM. 
 
 Input Multiplexers 
 
 Refer to Figure 4.12. Four mutliplexersare used to steer the 
 values X , Y , and A or to initialize the registers X, Y and A depending 
 on the mode. Switches are used for initialization. The select inputs 
 are labelled A and B; the strobe, G; the data inputs CO, CI, C2, C3 and 
 the outputs Y. The strobe, G, can be used to clear RIY, RIX and RIA, 
 since, if G is a 1, the output of Y is low no matter what the inputs may 
 be. 
 
 Outputs 
 
 Figure 4.1 3 shows the three 8-bit output registers and the 
 strobe signals GOX, GOY, and GOA. 
 
 CORDIC 
 
 Figures 4.14, 4.15, and 4.16 show the CORDIC unit itself. One 
 can distinguish 
 

 7km 
 
 >> 
 
 s- 
 o 
 
 E 
 O) 
 
 S- 
 
 cn 
 o 
 
 S- 
 
 I 
 
 O 
 
 ALL Ic'S 
 
 /WE 74175 
 
 1-2 
 
 11 
 
 9 
 
 _ucci3 
 
 D 
 
 areiS 
 
 <0 
 
 15. 
 
 if 
 
 lp l(J 
 
 id 20 
 
 !£> 35 
 
 wD t#Q 
 
 10 
 
 60 
 
 to 
 
 CD 
 
 c 
 
 o 
 
 S- 
 E 
 
 o 
 
 ^> 
 
 Figure 4.11 Microinstruction Register 
 
61 
 
 DIP 8 Switches 
 
 All IC's are 74153 
 
 en 
 
 c 
 
 QJ 
 if) 
 
 =3 
 
 id 
 
 in 
 
 <u 
 +j 
 to 
 
 •r- 
 CD 
 
 cc 
 
 =5 
 Q. 
 ■M 
 
 3 
 
 o 
 
 o 
 
 cn 
 
 c 
 
 4-> 
 fO 
 CD 
 
 Figure 4.12 Input Multiples rs 
 
All IC's are 
 7475 
 
 >- 
 
 CO 
 
 o 
 
 GOY 
 
 X 
 
 CO 
 
 < 
 
 o 
 
 GOX 
 
 CO 
 
 o 
 
 GOA 
 
 -L.JFT 
 -iic |a 9 
 
 
 — fp - T 
 
 1 
 
 Figure 4.13 Outputs 
 
 to 
 
 > - 
 
 Q. 
 to 
 
 ■5. 
 
 to 
 
 62 
 
si.ndi.no x oj, 
 
 A 
 
 J 
 
 63 
 
 L_J 
 
 < 
 x 
 
 snajUiHS aaHHva a wohj 
 
 CO 
 
 \ 
 
 < 
 
 o 
 
 Eh 
 
 AI 
 
 2 
 
 a 
 
 Eh 
 fa 
 H 
 K 
 CO 
 
 Q 
 
 Q 
 
 Q 
 
 Q 
 
 Q 
 
 c 
 o 
 
 i- 
 o 
 
 € 
 
 a 
 
 a 
 
 -vw-*- -aa/v»- ~Wv-» — wv-» -w*»- -a*v» -am* — vw* 
 
 4-> 
 
 E 
 .c 
 
 4-> 
 
 •r— 
 
 S- 
 < 
 
 Q 
 O 
 
 CD 
 
 X HOJ NOU.YZITtfl.LINI 
 
sinaiifio a 01 
 
 64 
 
 J 
 
 -*• 
 
 
 sHai„aiHS aaanva x wohj 
 
 \ 
 
 o 
 
 EH 
 
 s 
 
 w 
 
 E-i 
 H 
 
 a 
 
 a 
 
 £ 
 
 € 
 
 £ 
 
 £ 
 
 ■G 
 
 Q 
 
 Q 
 
 Q 
 
 ~WAr» -A«A^» «WK"Wr» »\A*r» "AfW» -*Ai\-»- -VW» 
 
 ' TT TT TT TT TT TT T T 
 
 4" 
 
 +-> 
 
 s- 
 o 
 a. 
 
 4-> 
 ■r- 
 
 C 
 
 
 o 
 
 I— I 
 
 a 
 a: 
 o 
 o 
 
 0) 
 
 s- 
 
 3 
 CD 
 
 A tfOd NOIiVZITVIilNI 
 
65 
 
 o < 
 
 *- 2 
 O 
 
 en < 
 
 °<r _ 
 
 t-o O 
 
 q a: 
 
 , lh , 
 
 S- 
 
 o 
 
 Q- 
 
 ■M 
 •r— 
 
 C 
 
 4-> 
 0) 
 
 E 
 
 
 O 
 O 
 
 i- 
 
 =3 
 
 IS 5 
 
 Sit 
 
 u. < i 
 5 w 
 
 O IM 
 
66 
 
 . the input register stage, 
 
 . the barrel switch stage, and 
 
 . the adder/subtracter stage. 
 
 In Figure 4. 16, the "angle" portion of the arithmetic unit, shows that 
 
 the inputs to the adder/subs tracter are taken directly from the ROM, the 
 
 scalers being unnecessary for the A portion of CORDIC. 
 
 Speed 
 
 The microprogram is shown in Table 4.1 and requires ten steps. 
 
 Due to the limitation of the stored constant ROM used, the minimum step 
 
 duration obtained is -■ p Mh z = 850 ns. The ten steps required by the 
 
 algorithms thus take 8.5 us. 
 
</> 
 
 l— UJ 
 X cc 
 
 UJ o 
 
 r- O r— O 
 
 O <— i— O 
 
 O O O i— 
 
 O O O O 
 
 <— O .— O .— O 
 
 O r— .— O O i— 
 
 r— i— .— O O O 
 
 O O O i— r- ,— 
 
 1 
 
 o 
 
 r— O 
 
 o o 
 
 r— O 
 
 1- 
 o 
 
 67 
 
 >- 
 <c 
 
 o 
 
 o 
 
 i— O r— 
 
 O r— 
 
 rvi uj 2: 
 
 Ul I— O 
 
 UJ <=C m 
 QCOOO 
 
 r- 1— O 
 
 
 
 O 1— o 
 
 o 
 
 00 
 
 00000 
 
 0000 
 
 CD 
 
 o 
 
 o 
 
 X 
 
 o 
 
 o 
 
 o: 00 
 
 o o 
 
 CD 
 
 -O 
 
 03 
 
 u 
 
 <1J 
 
 E 
 o 
 c 
 o 
 
 C7> 
 
 CD 
 
 +-> 
 
 S- 
 
 o 
 
 4- 
 
 q: o 
 
 O l-H 
 
 >- OO 
 
 X 
 
 _J 
 o 
 
 1— r— O 
 
 S- 
 cn 
 o 
 J- 
 a. 
 o 
 s- 
 u 
 
 >- 
 _l 
 o 
 
 .— O i— 
 
 •— o 
 
 00 
 00 
 
 UJ 
 
 a: 
 o 
 o 
 
 o 
 o 
 o 
 o 
 
 1— o 
 
 O f— 
 
 o o 
 
 o o 
 
 o 
 o 
 
 o 
 o 
 
 r— O 
 
 O r— 
 
 r- O 
 
 r— O 
 
 1— o 
 
 O 1— 
 
 o 
 o 
 
 o 
 o 
 
 >— eg 
 
 00 
 
 IT) 
 
 CO 
 
 CO 
 
 CT> r— 
 
 
 
 
 C 
 
 c 
 
 C 
 
 c 
 
 C 
 
 c 
 
 C 
 
 c 
 
 «/J 
 
 z: 
 
 
 
 O 
 
 
 
 O 
 
 
 
 O 
 
 
 
 O 
 
 
 
 +-> 
 
 
 
 
 c 
 
 •i - 
 
 •r— 
 
 •r™ 
 
 •f— 
 
 •r™ 
 
 •r™ 
 
 •^ 
 
 • ^ 
 
 r— 
 
 1—1 
 
 >> 
 
 
 
 +■> 
 
 4-> 
 
 +J 
 
 -M 
 
 4-> 
 
 +J 
 
 -t-> 
 
 -!-> 
 
 3 
 
 1— 
 
 -0 
 
 •■■• 
 
 03 
 
 03 
 
 03 
 
 03 
 
 03 
 
 03 
 
 03 
 
 03 
 
 »/) 
 
 
 
 
 
 ■M 
 
 +-> 
 
 •*-> 
 
 4-> 
 
 +-> 
 
 +J 
 
 4-> 
 
 +J 
 
 +J 
 
 CD 
 
 
 
 -O. 
 
 03 
 
 O 
 
 O 
 
 O 
 
 
 
 O 
 
 O 
 
 O 
 
 O 
 
 S- 
 
 cc 
 
 s- >> 
 
 -M 
 
 S- 
 
 i- 
 
 S- 
 
 s_ 
 
 S- 
 
 S- 
 
 S- 
 
 S- 
 
 
 h- 
 
 03 S- 
 
 O 
 
 
 
 
 
 
 
 
 
 C 
 
 00 
 
 CD CD 
 
 i. 
 
 +-> 
 
 T3 
 
 -O 
 
 .c 
 
 -C 
 
 -C 
 
 -C 
 
 jz 
 
 
 
 2: 
 
 1— > 
 
 
 «/> 
 
 C 
 
 s- 
 
 4-> 
 
 +-> 
 
 4-> 
 
 +■> 
 
 4-> 
 
 +J 
 
 t— ) 
 
 U CD 
 
 O 
 
 1 — 
 
 <M 
 
 co 
 
 «3" 
 
 LO 
 
 CD 
 
 1*^ 
 
 00 
 
 03 
 
68 
 
 5. METHODS BASED ON FUNCTION APPROXIMATION 
 
 5.1 High Speed Division Using High Speed Multipliers 
 
 There exist several algorithms in the literature dealing with 
 the computation of the reciprocal, -, of a normalized number, x. Wallace 
 [WAL64] first developed an iterative division scheme by making use of a 
 fast mutltiplier. The IBM/360 floating point execution unit also makes 
 use of a fast multiplier to perform division. It is an iterative process 
 where, on each iteration, a factor, R. , multiplies both numerator and 
 denominator so that the resultant denominator converges quadratically 
 toward one and the resultant numerator converges quadratically toward the 
 desired quotient: 
 
 fj x R^ x R7 X 4 X '•• X R^ NR R 1 ••' R n = Q u ° tient 
 
 Ferrari [FER71], Ling [|_lN71]and Stefanelli [STE74] have also 
 proposed various multiplicative schemes for computing the reciprocal of 
 a number. An overview of these algorithms is given in Appendix A. 4 
 Stefanelli describes an inversion algorithm based on the Taylor series. It 
 makes extensive use of read-only memories and a set of dedicated, non-cas- 
 caded mutlipliers. 
 
 This technique is very fast because it is entirely parallel, but there 
 are some inconveniences: 
 
 . the method uses dedicated multipliers to implement the 
 division algorithm and we believe that such algorithms should 
 be designed around an n x n multiplier that can be used for 
 multiplication, as well as other function generation. 
 
69 
 
 . the Taylor polynomial produces very poor behavior with 
 respect to error and it is possible to use other approximating 
 polynomials which yield better error control. 
 . It is expensive to use algorithms based on series approxi- 
 mations and ROM lookup to implement a totally parallel 
 divider for a precision of more than 32 to 34 bits. The 
 precision can be increased, however, by more efficient use of 
 the ROMs than that proposed by Steffanelli [STE# ] (See 
 Appendix A. 5) 
 As discussed later, other approximating schemes taken from 
 numerical analysis lead to better error behavior than that obtained from 
 a Taylor polynomial expansion. 
 
 5.2 Division and Function Generation Using Optimum First Order Polynomial 
 Interpolation with Selected Abscissas 
 
 We now investigate the precision and the possibility of implement- 
 ing function approximation by applying a technique known as optimum poly- 
 nomial interpolation. The results are general and apply to any continuous 
 function, f(x), that must be approximated between selected points x Q , x-,. 
 The equations are easily derived for any arbitrary function f(x), although 
 they are not optimal in the Tchebychev sense (minimax approximation). 
 Remez' algorithm and Tchebychev polynomial approximations are two other 
 techniques available. The following describes the notation used throughout 
 the remaining paragraphs. 
 
 The given number x has the following format 
 
 x o 
 
 0.1 
 
 3^7: T2~ 13 
 
 n bits 
 o 
 
x = x + r 0<r<2° 
 
 o — 
 
 -n 
 
 r = R. 2 ° 
 
 < R < 1 
 
 70 
 
 n is the number of bits looked up in a table (10 to 14 seems 
 to be practical given the present technology). 
 
 x is the given number with n bits precision 
 x represents the n high order bits 
 r represents the (n-n ) low order bits 
 
 — is stored in a ROM (rounded) 
 o 
 
 R represents the number (not necessarily normalized): 
 
 o n 
 
 
 . 
 
 -n 
 x-, = x Q + 2 , next interval point 
 
 1 x 
 f(x) is the function to be approximated (— , e , log x . . . ) 
 
 A 
 
 Optimum Polynomial Interpolation with Selected Abscissas 
 
 If a function f(x) is approximated by the interpolating polynomial 
 p (x) of degree n which agrees with f(x) at n + 1 points x 0> x,, ... x we 
 have 
 
 f (n+l) m 
 f(x) - p n (x) = ir(x) f in+] )y = E(x,0 
 
 where tt(x) = (x-x Q )(x-x-| ) ... (x-x ) 
 
71 
 
 if x Q < x 1 < ... < x p and £ e [ x Q , x p ] 
 
 The interval must be reduced to the interval [-1, 1] by an 
 appropriate change of variables. The term f* ' (£) depends on x and 
 on the abscissas x Q , x, ... x . One can chose to minimize |tt(x)| 
 
 over [-1, +1] rather than minimizing |f(x) - p (x)|. If f^ n ' (?) 
 
 does not vary too much in the interval, one would obtain a nearly "best 
 approximating" polynomial. This technique can be used to find an approxi 
 mation to all functions f(x) having continuous derivatives in [-1, +1], 
 Thus we will minimize the quantity: 
 
 +1 
 r 
 
 I = 
 
 ,2 
 
 w(x) |>(x)] dx 
 -1 
 
 Where w(x) is a prescribed weighing function which is non- 
 negative in [-1, +1]. This will minimize the error due to the factor tt(x) 
 It is known that 
 
 a) If w(x) = l,the n + 1 roots of tt(x) that minimize I are 
 the zeros of p + -i(x), the (n+l) s Legendre Polynomial. 
 
 b) If w(x) = , , the n + 1 abscissas are the zeros 
 
 /lT^2 
 
 °f T n+ i( x )> tne (n + l) Chebyshev polynomial and are given 
 by 
 
72 
 
 x. = cos (£i±j- . |) i = 0, 1, ... n 
 
 It follows that tt(x) = 2~ n T n+] (x) 
 
 (x ,.i^LiMii£p w 
 
 (2n+2)l n+l 
 
 c) If w(x) = (l-x) a (1+x) 3 a > -1 > -1 the abscissas 
 are the zeros of the Jacobi Polynomial. 
 
 Next, we will study in more detail the Legendre and Chebyschev 
 polynomials for division and will describe a hardware implementation for 
 various precision. 
 
 Range Transformations 
 
 The formulas used generally assume that the variable x takes 
 values in the interval [-1, l]. The range transformations are derived as 
 follows: We have x = x Q + r (notation as above) with x ranging between x Q 
 
 and X-, , the mapping into [-1, +1] leads to 
 
 x = 2x _ V X 
 x l" x x l" x 
 
 x(x,-x n ) x,+x n 
 That is, x = i— — + -h^- 
 
 Legendre and Tchebychev Interpolation of First Order 
 
 1 2 
 The values x Q and x-, are the roots of P 2 (x) = 2"(3x -1) for a 
 
 p 
 Legendre approximation of f(x) and T 2 (x) = 2x -1 for the Tchebychev case. 
 
73 
 
 FT Jn 
 
 The roots, in either case, are x Q = -x, = a = j f° r Legendre and j for 
 Tchebychev. We have for the new abscissas: 
 
 Xj-Xq x ! +x n x i +x q 
 x = " a * — 2 — ^ — 2 — = ~ a 2 — 2 — 
 
 - X T X 0^ V x h (X 1 +X } 
 
 x 1 = a ( — 2 — ) + — 2 — = a 2 + 
 
 The new first order approximation formula then becomes: 
 
 f(x'-,)-f(x' ) 
 
 M x >- f ( x V + x yx' ° < X - X V ^J 
 
 Comparing it with the previous formula for Legendre interpolation 
 
 f(x,)-f(x n ) 
 
 M x > = f ( x 0> + x r x ° (X - X } 
 
 It appears that we have lost several advantages from a hardware implement- 
 ation standpoint: 
 
 x - x Q was directly represented by the n - n Q low order 
 
 bits of x. 
 
 x-, - x Q had the value 2" n 
 
 computation of p(x) involved only the multiplication 
 
 [f(x-,) - f(x Q )] x R 
 
 However, equation( 5-l)can be further transformed by the following relation 
 
 x'-j - x' Q = ah 
 
 x - x' Q = x - (x Q + | - a|) 
 
74 
 
 = x - x Q - 2-0-a) 
 
 = r - |(l-a) = h[R - l(l-a)] 
 
 Thus, given R, one can perform the subtraction [with (n-n«) bits 
 
 precision] of the constant j (l-oi)« 
 
 Complications arise when the multiplier is not a signed number. 
 
 It is better to express (5-1) as a function of r = x - x~: 
 
 f(x\) - f(x') 
 Pl (x) = f(x' Q ) + x<i _ x , q (x-x 0+ x -x' ) 
 
 f(x') - f(x' ) f(x' ) - f(x' ) 
 
 M*) = f(x'n) + J _ v . (X n -X' n ) + J _ w , ° 
 
 x 'l " x 'o ° ° x 'l " x 
 
 (x-x ) 
 
 xf(x') - f(x') h f(x' ) - f(x' ) 
 = f (X'„) + ±-c °~ (a-1 ) §■ + ] -r 8 
 
 0' ah i- ■/ 2 ■ ah 
 
 (x-x ) 
 
 i (ftx'O - f(x' n )) 
 Pt(x) = f(x' Q ) + [fCx 1 ^ - f(x' )] (Sjl) + *— °—R 
 
 , (f(x') - f(x' )) 
 Thus one can store f(x' Q ) + [fU^) - f(x' )] (g 1 ) and — — 
 
 in ROMs (or RAMs) and perform the multiplication by R, the n - n Q low order 
 bits of x. 
 
 The error for the case f(x) = — is — - p-, ( x) and is tabulated 
 
 A A 
 
 for 
 
 x Q = .5 (first interval ) 
 
 -11 i/7 
 
 and x Q = 1 - 2~ (last interval) in the Chebychev case (a= j) 
 
75 
 
 /J 
 and the Legendre case (a= ~-) and for two intermediary cases: a = .5 and 
 
 a = 1 in Tables 5.1, 5.2 and 5.3. The j case simplifies the hardware 
 
 design as to the amount of ROM required and the a = 1 case corresponds to 
 
 the Lagrange polynomial for which the formula becomes: 
 
 P^x) = f(x' Q ) + [f(x'.,) - f(x' )]R. 
 
 Precision Attainable 
 
 1 (x-x' n )(x-x' 1 ) , 
 
 where %, takes values between £ = x Q and E, = x-,. 
 
 The polynomial tt(x) is weighted by a function of x taking values 
 
 between — ^ and — j. The error is then bounded by 
 V x l 
 
 E, = max|TT(x)| x — j in the interval [x Q ,x-,]. 
 x 
 
 For the Tchebycheff case one finds 
 
 ( h 2 
 
 E = !Zii._L k 2 
 L l 4 3 k ' 
 x 
 
 For the first interval: E ] = h 2 = 2~^o. 
 
 k -2n 
 
 For the last interval: E, = -^- = 2 Ml o - 3 and thus 
 
a = 1/2 
 
 h - 2" 12 
 
 h 2 = 0.0000000596046412 
 
 1/x - p^x) 
 
 76 
 
 0. 5000000000000000 
 0.5000076293945312 
 0.500015258 789 0625 
 0. 500022888 1835937 
 0.500030517 578 1250 
 0.500038 14 69 726562 
 0.5000457763671875 
 0.5000534057617187 
 0.500061035 15 62 500 
 0.5 00068 66455078 12 
 0. 50007 62939453 1 25 
 0. 5000839233398437 
 0.50009 15527343750 
 0.5000991821289062 
 0.50010681 15234375 
 0.5001 144409179687 
 0.5001220703125000 
 0.5001296997070312 
 0. 5001373291015625 
 0.50014495849 609 37 
 0.5001525878906250 
 0.5001602172851562 
 0.5001678466796875 
 0.5001754760742187 
 0.500183 105468 7500 
 0.5001907348632812 
 0.5001983642578125 
 0. 5002059936523437 
 0.5002 13 62304 687 50 
 0.5002212524414 062 
 0.5002288818359375 
 0.50023651 12304687 
 
 0. 000000C 393633290 
 0.0000000749337312 
 0.0000000614354 136 
 0. 0000000488683334 
 0. 0000000372324478 
 0.0000000265277142 
 0.0000000167540901 
 0.00000000791 15330 
 0.0 
 
 •0.00000000698 05515 
 •0.00000001303 01 638 
 •0.0000000181488800 
 •0.00000002233 67422 
 0.00000002 5593 79 33 
 0. 0000000279200758 
 0.000000029315632 6 
 0.00000002978 05056 
 0.000000029314 738 
 0.00000002 79 183 723 
 0.0000000255914510 
 0. 0000000223340164 
 0.0000000181461 1 15 
 0. 0000000130277789 
 0.0000000069 790609 
 0.0 
 
 0. 0000000079093607 
 0. 00000001 67489793 
 0.0000000265188127 
 0.000000037218818 7 
 0. 0000000488489544 
 0.000000061409 1775 
 0. 0000000748994453 
 
 Table 5.1 
 
 Error for the First Order Approximation 
 for f(x) = 1/x and a = 1/2 
 
a = /2/2 
 
 h = 2 
 
 -12 
 
 1/x - p^x) 
 
 77 
 
 0. 5000000000000000 
 0. 500007 629 39453 12 
 0.500015258 789 0625 
 0.500022668 1835937 
 0.500030517578 1250 
 0.500038 1469726562 
 0.50004 57 763 67 18 75 
 0.8000534057617 187 
 0.50006103515 62500 
 0.500068 66455078 12 
 0.50007 62939453 125 
 0. 5000839233396437 
 0.50009 15527343 750 
 0.500099 1821289 062 
 0.50010681 15234375 
 0.5001 144409179687 
 0.5001220703125 000 
 0.500129 6997070312 
 0.5001373291015625 
 0.50014495849 6093 7 
 0.5001525878906250 
 0.5001602 1 7285 1562 
 0.50016784 6679 68 75 
 0.5001754760742187 
 0.5001831054 687500 
 0.5001907348632812 
 0. 5001983642578125 
 0. 5002059936523437 
 0. 5002136230468750 
 0. 5002212524414062 
 0. 5002288818359375 
 0.50023651 12304 687 
 
 0. 0000000595755536 
 0. 000000045 1464 1 04 
 0. 00000003 1 648 54 7 1 
 0. 0000000190819212 
 0.00000000744 64899 
 0. 0000000032577894 
 0.000000013 0309 59 2 
 0. 00000002 18 73 062 
 0. 000000029 784 14 07 
 0.00000003 676423 79 
 0. 0000000428 1 339 59 
 0. 00000004 79 3165 78 
 0. 000000052 1 19 065 7 
 
 •0. 0000000553756625 
 0. 00000005770149 7 
 0. 0000000590965932 
 0.00000005956101 19 
 0.000000059 094 79 00 
 
 •0.00000005 769 79 700 
 0. 0000000553705943 
 0. 0000000521 127055 
 0.00000004 792434 63 
 0.0000000428 05559 3 
 0. 00000003 675 638 70 
 0.000000029 7768 719 
 0. 0000000218 6705 66 
 0.00000001302 698 39 
 0. 00000000325669 62 
 0. 0000000074437640 
 O. 000000019 074354 1 
 0. 0000000316350315 
 0. 000000 04 5125 7 53 6 
 
 Table 5.2 
 
 Error for the First Order Approximation for f(x) 
 and a = /2/2 (Tchebycheff approximation) 
 
 = 1/x 
 
1/x-p^x) 
 
 78 
 
 0. 5000000000000000 
 0. 500007 62939453 12 
 0.500015253 789 062 5 
 0.500022388 1835937 
 0.50003051 7578 1250 
 0.5 00038 14 69726562 
 0. 500045776367 1875 
 0. 500053/1057617 187 
 0.500061 U351562500 
 0. 500068 6645507ft 12 
 0. 5000762939453 1 2b 
 0. 5000839233398437 
 0. 5 0009 1552 73437 50 
 0.5000991821289062 
 0.50010681 15234375 
 0.5001 144409 179687 
 0. 5001220703125000 
 0.5001296997070312 
 0. 500137329 1 015625 
 0.5001449 5849 609 37 
 0.50015258789 06250 
 0. 5001602 1 72851562 
 0.5001678466796875 
 0.5001754760742 187 
 0. 5 00183 105^68 7 500 
 0. 50019 07348 6328 12 
 0.5001983642578125 
 0. 5002059936523437 
 0. 5002 13 623 04 68 750 
 0. 50022 125244 1 4 62 
 0. 50022888 18359375 
 0.50023651 12304687 
 
 0. 0000000(794340707 
 
 0. 000000065004 6246 
 
 0.00000005150 64 584 
 
 0. 0000000389395296 
 
 0. 0000000273037954 
 
 0.00 00000 1 65992 133 
 
 0. 0000000068257406 
 
 •0.00000000201 66651 
 
 •0. 00000000^9280466 
 
 ■0. 00000001 69 0844 66 
 
 •0.00000002 295 79 76 
 
 0.000000028 764 723 
 
 0.0000000322 64183 I 
 
 0.000000035 52 1082 7 
 
 0.0000000378472138 
 
 •0. 000000039242 6192 
 
 •0.000000 039 7073407 
 
 0.0000000392414217 
 
 0. 0000000378449045 
 
 ■0.0000000355178318 
 
 •0. 0000000322602458 
 
 0.000000028 072 189 5 
 
 0. 0000000229537054 
 
 0. 00000001 69048360 
 
 •0.00 00000099 25 623 7 
 
 0.00000000201 61115 
 
 0. 00000000682365H5 
 
 0. 00000001 6593 6433 
 
 0. 0000000272938008 
 
 0. 0000000389240880 
 
 0.00000005 1484 4 625 
 
 0. 0000000649 7488 1 7 
 
 a = /3/3 
 h - 2- 12 
 
 Table 5.3 
 
 Error for the First Order Approximation for f(x) = "l/x 
 and a = /3"/3 (Legendre approximation) 
 
79 
 
 2~ 2 V 3 < Error < 2" 2n 
 
 The use of the zeros of the Tchebycheff polynomials leads to 
 the minimum value of |7r(x)|max and yields a value smaller than in the 
 Legendre case. If it is desirable to control the maximum error, the 
 Tchebycheff polynomials are preferable. 
 
 The zeros of the Legendre polynomial will minimize the RMS value of tt(x) 
 over the interval . 
 
 5.3 Second Order Interpolation 
 
 The process is the same as for linear interpolation except that 
 — will be approximated by a second degree polynomial in each interval. 
 The collocation points are found to be, as before, the roots of a Tchebycheff 
 or Legendre polynomial: 
 
 1 2 /3 fT" 
 
 Legendre : P 3 (x) = j x ( 5x ~^) wltn roots: ~ J jr> °» \1t 
 
 Tchebychev : T 3 (x) = x(4x -3) with roots: - y^-, 0, yj 
 
 The roots are of the form: -a, 0, +a. 
 
 The change of variable to map [-1, +1] into x Q , x, leads to the following 
 
 collocation points: 
 
 „. . h , x l + x 
 1 a "o 9 
 
 x' - Xl + X ° 
 x 3 - 2 . 
 
 Let us now derive the equation in a form suitable for hardware implementation. 
 One would like a formula of the type: 
 
80 
 
 P 2 (x) = a Q + a,R, + a 2 R where R is the number defined 
 
 previously. We have 
 
 (x-x' 2 )(x-x' 3 ) (x-x'-jJtx-x^) 
 
 p 2 (x) = (x 1 -x , 2 )(x r x , 3 ) *1 + (x' 2 -x' 1 )(x' 2 -x' 3 ) H + 
 
 (x-x'^Cx-x-'g) 
 
 (x 1 3-x' 1 )(x' 3 -x' 2 ) y 3' 
 
 After some elementary transformations, one obtains the 
 polynomial : 
 
 P|Cx)-(y^fl-^y lt ^y,) 
 
 2a a 2a 
 
 -(2+a) y, + 4y ? - (2-a) y 3 
 + ( ] - 2 -) R 
 
 a 
 
 ?y'i - 4y ? + 2y 3 ? 
 
 a 
 
 which can be evaluated in several different ways (e.g., parallel multip- 
 lication or Horner's rule). 
 
 The precision can be calculated as follows: We have 
 
 (x-x')(x-x' )(x-x' ) ,,, 
 f(x) - p(x) = J j£ 3 - f (5). 
 
 For the division case, 
 
 -- p(x) = (x-x^Mx-x'gMx-x^) x^-, 
 
 In the Tchebychev case tt(x) is bounded by: 
 
81 
 
 3 3 
 E = -3-iL with a = | or 
 12./3 
 
 E = 3/3h 3 m h 3 
 
 8xl2x/3 2 5 
 
 The upper bound on the error is, for the first interval [.5, .5 + 2" n 0] 
 
 E= h 3 2 4 = — 
 
 h 3 
 
 and for the last interval E = -f- and thus 
 
 2 5 
 
 2 -3n Q -5 < Error < 2 " 3n _1 n Q = number of bits looked up. 
 
 5.4 Higher Order Interpolation and Precision 
 
 For interpolations of order higher than 3 it is possible to 
 obtain speed improvement by using any of the well-known "economical 
 polynomial evaluations". There are two sources of error when a polynomial 
 is evaluated. For example in evaluating: 
 
 p^x) = a Q + a^, 
 
 the two errors are: 
 
 (x-x' )(x-x') 
 
 1) f(x) - p(x) = °-2 L. f" ( ? ) = E] and 
 
 2) the representation of numbers using finite length registers 
 and multipliers. 
 
 In fact one performs 
 
82 
 
 P-|(x) = (a Q + E Q ) + (a 1 + E 1 )(R + E R ) 
 
 and the maximum error due to the adder is E Q . 
 
 The maximum error due to the finite length inputs to the multiplier is: 
 
 E 1 R + E R a l + E 1 E R^ E 1 +E R +E 1 E R 
 
 One can reduce E Q relative to (2E-|+E, ) by using more accuracy for the 
 first constant a Q . 
 
 The other source of error is then due to the multiplier. The 
 task of the designer is to optimize the trade-off between time and precision, 
 Totally parallel evaluation is expensive and requires a large multiplier 
 whereas semi -parallel multiplication can make use of a smaller multiplier 
 but is more time consuming. 
 For example using a 24 x 24 bit multiplier, is it possible to have a 
 
 division routine requiring one multiplication cycle time for a precision of 
 
 -?4 
 2 CH or less? 
 
 By using a first order optimal interpolation and looking up 11 
 
 bits of the number x, the upper bound of the division routine error will 
 
 be: 
 
 2 -2n Q = ^ 
 
 The formula implemented is 
 
 p, (x) = a Q + a-iR. The error on a, and R is called E, . If 
 
 -25 -23 
 
 E, = 2 the errors can add giving a total error of 2 . The only 
 
 solution is to have 
 
 a) a bigger multiplier, making the error due to the series 
 
 truncation predominate or 
 
 
83 
 
 b) for the same multiplier, use smaller intervals and thus 
 
 make the error due to the multiplier the predominate. 
 In the example, if 12 bits of the initial number are looked-up > 
 the error due to truncation of the series is bounded by 
 
 2 -2n . .,-26 
 
 The predominant error is due to the multiplication, 
 
 a,xR, 
 
 -24 
 and is 2 because of the 24 x 24 bit multiplier. 
 
 5.5 Application to Arbitrary Function Generation in Aviation 
 
 Aircraft and engine performance data is received in the form 
 of plots of performance variables as functions of other system variables. 
 The functional relationships generally are multivariable and nonlinear, 
 and can seldom be approximated by analytical equations without intro- 
 ducing inaccuracies beyond the specification tolerances. Furthermore, 
 the data is subject to change as experience is gained with the aircraft- 
 general ly in the direction of increasing complexity. 
 
 At present all function generation problems are solved by 
 software and, as a result, the analytical approximations such as poly- 
 nomial expansions must also increase in complexity, resulting in a cor- 
 responding increase in simulation program execution time. 
 
 Function generation by linear interpolation of stored data 
 points is generally superior in aviation to analytical approximations 
 for two reasons: 
 
84 
 
 1. Data can be readily programmed directly from plots or 
 tables without extra design time to develop an approxi- 
 mation; 
 
 2. Increased data complexity can be met with no increase in 
 execution time, since the interpolation formula does not 
 change. 
 
 Notations 
 
 f(x,) and f(x~) are adjacent function values 
 
 X-, and x ? are the corresponding argument values 
 
 x is the current argument value 
 
 h the difference x~ - x. 
 
 Since many functions can use the same breakpoints or argument 
 values a saving in time and memory space may be realized if the "inter- 
 polant" 
 
 X - X. 
 
 Vi - x i 
 
 is shared by the functions. (Typically the interpolant may be shared by 
 three or more functions in practical cases.) 
 
 If fast multiplication and division hardware routines are 
 available, the majority of the time consumed in the interpolation process 
 will be due to the "argument search" routine (which can be implemented 
 in hardware too) . 
 
 The linear interpolation for one variable, 
 
 f(x) = Cf(x i+1 ) - f(x.)] * " ^ + f(x.), 
 ii-i l x i+1 - x i 1 
 
85 
 
 requires 
 
 1. Argument search: given x, where is x.? 
 
 2. One division. 
 
 2. One multiplication. 
 
 A schematic of the organization of the AU is given next page, 
 
Xo 
 
 r = 2 R 
 
 06 
 
 DATA 
 SeLSCTOR 
 
 e 
 
 LJ 
 
 DATA 
 
 selector 
 
 MxM ROM MULTIPLIER 
 
 3_i 
 
 L_£ 
 
 
 
 
 1 
 
 f 
 
 
 I 
 
 ALU ( 74 1 81 s) 
 
 I 
 
 ACC. 
 
 B 
 
 B' 
 
 Mgure !> 1 9rcjani ration of an ALU for functioi 
 
 generation 
 
87 
 
 6. CONCLUSION 
 
 The purpose of this investigation was to explore various 
 methods for generating functions for processors used in aircraft simu- 
 lators. A small 8-bit CORDIC prototype with microprogrammed control 
 was constructed implementing the trigonometric mode. Other methods 
 based on polynomial approximations have also been reviewed. The CORDIC 
 method has the advantage of simplicity in implementation and, being a 
 bit by bit method using only addition and shifting as the basic opera- 
 tions, it can be cheaper to implement than a parallel method. For avia- 
 tion simulation, it is a matter of choice between the following alter- 
 natives: 
 
 a) A central arithmetic unit performing all the computations 
 and being time-shared between all the flight instruments. 
 In this case, obtaining a frequent updating of the in- 
 struments requires a fast arithmetic unit. 
 
 b) Several arithmetic units doing specialized tasks (alti- 
 tude computation, speed, fuel, ... . Each AU could be a 
 relatively slow and inexpensive bit-by-bit serial AU. 
 
 The design of an aircraft simulation system would use a top- 
 down development process. Since the simulator in the decentralized 
 architecture would contain more than one arithmetic unit, the develop- 
 ment would proceed such that modules could be built and tested as soon 
 as they are developed. 
 
 Operations requiring simple and infrequent computations would 
 be implemented by CORDIC bit serial algorithms. 
 
88 
 
 Operations requiring frequent and complex computations (like 
 function generation in an aircraft) would use polynomial approximation 
 algorithms for faster execution. 
 
 Polynomial approximation methods have been studied which make 
 use of polynomials other than the Taylor-MacLauri n series currently 
 used in hardware implementations for division. It has been shown that 
 better error control is achieved by using Tchebychef polynomials. The 
 advantage of the parallel algorithms is that they employ multiplicative 
 operations which can share a single n x n multiplier. 
 
 The potential parallelism of these algorithms should be fur- 
 ther investigated. For example, it is known that a polynomial of 
 degree n can be evaluated in O(logn) operations. Is it possible to de- 
 sign (in an array expandable form) an n x n x n multiplier or more 
 generally an n multiplier? By implementing parallelism at the gate 
 level (which could be called microparal lelism) , what improvements can 
 be achieved for special purpose highly parallel and high speed arithmetic 
 processors? 
 
 Since polynomial evaluation lends itself to parallel proces- 
 sing, what would the time bound on the arithmetic computation of func- 
 tions become? What improvements would be achieved by replacing the 
 processors performing binary operations by a 3- or k- processor per- 
 forming k-nary multiplications and additions? 
 
 Function evaluation by hardware means is currently being further 
 
 3 
 investigated as well as some possible designs for an n multiplier. 
 
89 
 
 7. Appendix 
 
90 
 
 NUMBER 0F BITS »720 
 
 M « ?1 
 
 TEST SIGN «?Y 
 
 X — > K1*SQRT(X0**2+Y0**2) 
 
 Y— >0 
 
 Z — >ZO ♦ ARCTANCYO/XO) 
 
 Kl« 1.6467 60258 12 106 
 1/K 1-0. 60725293500888 
 
 X0 ■ ?• 6072529 
 
 YO ■ ?-• 6072529 
 
 -180 DEG.<« Z0 <» +180 DEQ 
 
 Z0 ■ 70 
 
 INIT. 0K ?Y 
 
 DECIMAL 0NLY ?N 
 
 0.6072502136 0.100110110111010011 SUB 
 
 0.000000000000000000 
 
 0.000000000000000000 
 -0.6072502136 1 1.011001001000101101 ADD 
 
 0.0 0.000000000000000000 SUB 
 
 90.0000000000 1.000000000000000000 
 INITIALIZATI0N 
 
 A.l Output for the Trigonometric Mode 
 
0.6078508136 0. 1.001 101 I 01 1 I 01001 1 ADD 91 
 
 0.100110110111010011 
 
 0.1001101101 1101001 1 
 
 0.6072502136 0. 1 001 1 01 1 01 1 1 01 001 1 SUB 
 
 •90.0000000000 1 1.000000000000000000 ADD 
 
 45.0000000000 0.100000000000000000 
 FIRST 90 DEG. ROTATION 
 
 1.2145004272 1.001101101110100110 ADD 
 1.001101101110100110 
 
 0.000000000000000000 
 0.0 0.000000000000000000 SUB 
 
 ■45.0000000000 1 1.100000000000000000 ADD 
 26.5649414063 0.010010111001000000 
 # SHIFTS ■ 
 
 1.2145004272 1.001 101 101 1 101001 10 SUB 
 
 0. 100110110111010011 
 
 1 1.101100100100010110 
 -0.6072502136 1 1.011001001000101101 ADD 
 
 18.4350585938 1 1 • 1 100101 1 1001 000000 SUB 
 
 14.0360641479 0.001001111110110011 
 # SHIFTS • 1 
 
 A.l (cont.) 
 
92 
 
 1.4142265320 1.011010100000101011 
 0.000000000000000010 
 
 SUB 
 
 1 1.111111111111111111 
 -0.0000036147 1 1.111111111111111111 
 
 ADD 
 
 >45.0006866455 1 1.011111111111111110 
 0.0 0.000000000000000000 
 
 # SHIFTS ■ 17 
 
 SUB 
 
 1.4142303467 1.011010100000101100 
 0.000000000000000001 
 
 ADD 
 
 0.0 
 
 0.000000000000000000 
 0.000000000000000000 
 
 SUB 
 
 •45.0006866455 1 1.011111111111111110 
 0.0 0.000000000000000000 
 
 # SHIFTS « 18 
 
 C0RDIC VALUES TRUE VALUES 
 
 ADD 
 
 1.414230346680 
 0.0 
 -45.000686645508 
 
 1.414163948760 
 0.0 
 -45.000000000000 
 
 ERR0R IN X 
 
 0.00470 
 
 ********** 
 
 0.00153 
 
 A.l (cont.) 
 
93 
 
 NUMBER §F BITS -?20 
 
 M ■ ?0 
 
 TEST SIQN «7Y 
 
 X— >X0 
 
 Y— >0 
 
 Z~ >Z0 ♦ (YO/XO) 
 
 XO - 7.6 
 YO m ?- # 3 
 ZO - 70 
 INIT. 0K 7Y 
 DECIMAL 0NLY 7N 
 
 0.5999984741 0. ! 001 1 001 1 001 1 001 1 
 0.000000000000000000 
 
 SUB 
 
 0.000000000000000000 
 0.2999992371 1 1.101100110011001101 
 
 ADD 
 
 0.0 0.000000000000000000 
 
 0.5000000000 0.100000000000000000 
 INITIALIZATI0N 
 
 SUB 
 
 A. 2 Output for the Linear Mode 
 
94 
 0.5999984741 0.100110011001100110 SUB 
 0.010011001100110011 
 
 0.000000000000000000 
 0.0 0.000000000000000000 SUB 
 
 -0.5000000000 1 1.100000000000000000 ADD 
 
 0.2500000000 0.010000000000000000 
 # SHIFTS m i 
 
 0.5999984741 0.100110011001100110 SUB 
 
 0.00100110011001 1001 
 
 1 1.111101100110011001 
 -0.1499977112 1 1.110110011001100111 ADD 
 
 -0.2500000000 1 1.110000000000000000 SUB 
 
 0.1250000000 0.001000000000000000 
 # SHIFTS « 2 
 
 A. 2 (cont.) 
 
0.5999984741 0.100110011001100110 
 0.000000000000010011 
 
 95 
 
 SUB 
 
 1 1.111111111111111111 
 -0.0000915527 1 1 . 1 1 1 1 1 1 1 1 1 1 1 1 1 01 000 
 
 ADD 
 
 -0.4998779297 1 1.100000000000100000 
 0.0000610352 0.000000000000010000 
 # SHIFTS * 13 
 
 SUB 
 
 0.5999984741 0.100110011001100110 
 0.000000000000001001 
 
 SUB 
 
 1 1.111111111111111111 
 -0.0000572205 1 1.111111111111110001 
 
 ADD 
 
 -0.4999389648 1 1.100000000000010000 
 0.0000305176 0.000000000000001000 
 # SHIFTS ■ 14 
 
 0.5999984741 0.100110011001100110 
 0.000000000000000000 
 
 SUB 
 
 SUB 
 
 1 1.111111111111111111 
 -0.0000305176 1 1.111111111111111000 
 
 ADD 
 
 -0.4999961853 1 1.100000000000000001 
 0.0 0.000000000000000000 
 
 # SHIFTS » 19 
 
 C0RDIC VALUES 
 
 SUB 
 
 0.599998474121 
 -0.000030517578 
 -0.499996185303 
 
 TRUE VALUES 
 
 0.600000000000 
 0.0 
 -0.500000000000 
 
 ERR0R IN X 
 
 0.00025 
 
 ********** 
 
 0.00076 
 
 A. 2 (cont.) 
 
96 
 
 NUMBER 0F BITS -720 
 
 M - 7-1 
 
 TEST SIGN *?Z 
 
 X — >K2*<XO*C0SH ZO + Y0*SINH ZO) 
 Y — >K2*(YO*C0SH ZO + XO*SINH ZO) 
 Z-->0 
 
 K2 « 0.82978162013890 
 1/K2 - 1.20513635844646 
 
 XO ■ 71.205136 
 
 YO - ?0 
 
 -180 DE6.<= ZO <* + 180 DEG 
 
 ZO * 7-45 
 
 INIT. 0K ?Y 
 
 DECIMAL 0NLY ?N 
 
 1.2051353455 1.001101001000001111 SUB 
 
 0.000000000000000000 
 
 0.000000000000000000 
 
 0.0 0.000000000000000000 SUB 
 
 •45.0000000000 1 1.100000000000000000 ADD 
 
 49.4374389648 0.100011001001111101 
 INITIALIZATI0N 
 
 A. 3 Output for the Hyperbolic Mode 
 
97 
 
 1.2051353455 1.001101001000001111 ADD 
 
 0. 1001 101001000001 11 
 
 1 1.101 100101 101 11 1 100 
 -0.6025657654 1 1.011001011011111001 ADD 
 
 4.4374465942 0.000011001001111101 SUB 
 
 22.9868316650 0.010000010110001010 
 # SHIFTS « 1 
 
 1.0544929504 1.000011011111001101 SUB 
 
 0.010000110111 110011 
 
 1 1.111011001011011111 
 -0.3012847900 1 1.101100101101111100 SUB 
 
 18.5493850708 1 1.110010110011110011 ADD 
 
 11.3090515137 0.001000000010101100 
 # SHIFTS ■ 2 
 
 A. 3 (Cont.) 
 
98 
 
 1.1254158020 1.001000000001101101 
 0.000000000000000001 
 
 ADD 
 
 1 1.111111111111111111 
 -0.5200729370 1 1.011110101101110010 
 
 ADD 
 
 0.0003433228 0.000000000000000001 
 0.0 0.000000000000000000 
 
 # SHIFTS - 18 
 
 SUB 
 
 1.1254119873 1.001000000001101100 
 0.000000000000000000 
 
 ADD 
 
 1 1.111111111111111111 
 -0.5200729370 1 I . 01 1 1 1 01 01 1 01 1 1 00 1 
 
 ADD 
 
 0.0003433228 0.000000000000000001 
 0.0 0.000000000000000000 
 
 SUB 
 
 # SHIFTS ■ 19 
 
 C0RDIC VALUES 
 
 1. 125411987305 
 
 -0.520072937012 
 
 0.000343322754 
 
 TRUE VALUES 
 
 1. 127625629814 
 0.521095150503 
 0.0 
 
 ERR0R IN X 
 
 0.19631 
 0.19617 
 
 ********** 
 
 A. 3 (Cont.) 
 
99 
 
 Appendix A. 4 Overview of the Algorithms Proposed By Baker, Ferrari 
 and Ling 
 
100 
 
 P.W. Baker [BAK75] presents generalized higher radix algorithms 
 for some elementary functions. Parallelism is achieved by using fast paral- 
 lel m-bit multipliers where radix = 2 . They are a generalization of the 
 CORDIC and deLugish algorithms based on multiplications by prestored 
 constants of the form (1 + 2" 1 ) or ln(l + 2" 1 ). 
 
 Baker's algorithm for Y/X uses 
 
 = 1 = ULii 
 y x x n^ 
 
 where the f are selected to be of the form 
 
 (1+d. r' k ) d.e {0,1 ,2. . . ,r-l} 
 
 the algorithm is 
 
 Y i+1 - X^l^r-") 
 
 V i+1 = Y.(W.r- k ) 
 
 The first multiplier d, is chosen by table look-up. Then at 
 each step d. is chosen as the one's complement of a. , a. being the k 
 2 -ary coefficient of the number x. written base 2 m . The hardware requires 
 variable shift networks as in the CORDIC algorithms and, in addition, two 
 m-bits multipliers. 
 
 Ferrari usesa Lagrange interpolation method of the function — 
 between x = .5 and x = 1. He uses several colloquation points and looks 
 up the coefficients of the first order interpolating polynomial. 
 
101 
 
 Ling proposes a high-speed division algorithm based, like 
 Steffanelli 's on the Taylor expansion of a fraction of the form yj— . 
 The eight leading digits of the denominator are looked up and two 
 multipliers working in parallel are needed to evaluate the approximating 
 polynomial. This algorithm is designed for 32 bit precision and the 
 same criticisms hold for both Stefanelli's and Ling's algorithms. 
 
102 
 
 A. 5 ROM Usage As Proposed By Stefanelli 
 
103 
 
 Stefanelli proposes a "binary read only memory divider". He 
 shows the realisation of three inverting circuits computing the reciprocal 
 Q = -5- of a binary number B. Their main feature is the use of read-only 
 memories whose function are to provide look-up of prestored polynomial 
 coefficients, and polynomial values. The three circuits differ only in 
 the number of bits of the binary number B. 
 
 The number B is normalized 
 
 B = 1. b 1 b 2 ... b r b r+1 ... b m 
 
 and the reciprocal is 
 
 Q = 0.1, q^ 2 ... q m . 
 The idea is to split B into two parts B, and B ? : 
 6 = 6^ 2~ r B 2 
 
 and look up the inverse Q-, of B-, 
 
 17- Q l =0J «'l «' 2 ••• O'm 
 
 Then 
 
 Q=i= 1 - 1 
 
 B 1 + 2- r B 2 B l 1 + 2" r ^2 
 
 B l 
 
 -Q ] 
 
 1 + 2 _r (B 2 Q 1 ) 
 
 The fraction is expandable in a Taylor series 
 
 1 1 l~X+X _ X •»• 
 
104 
 
 Q = Q 1 (l-2" r Q 1 B 2 +2' 2r (Q 1 B 2 ) 2 - Z' 3r (Qfi 2 ) 3 +....) 
 
 Now Q is to be evaluated using a three-stage procedure: 
 
 1 - Look Up 
 
 2 - Multiplication 
 
 3 - Addition/Subtraction. 
 
 With current technology r can be chosen to be up to 14 (16K 
 of memory addresses). If, for example, the first term of the series is 
 considered, i.e. 
 
 Q = Q 1 - 2" r Q^ B 2 
 
 The error arises primarily 
 
 1) From the truncation error of the series, whose absolute 
 value is known to be less than the maximum value of the first term 
 neglected. Here: 
 
 -r 2 
 
 2) From the computation of 2 Q, Bp 
 
 if B 2 is supposed to be known with any desirable precision, 
 
 2 2 
 
 the error remaining is due to Q, . The number of bits of Q-, is chosen 
 
 such that this error is of the same order as the truncation error. 
 
 When a higher number of bits are required, the second term of 
 
 the series is considered; 
 
 Q ■ Q 1 - 2" r Q^ B 2 + 2" 2r Q-, 3 B 2 2 . 
 
105 
 
 However, B ? can be further split into two parts 
 
 B 2 * B 3 + 2 " S B 4 
 
 in order to avoid too many bits of address for the computation (or 
 
 2 3 2 3 
 
 look-up of B 2 ). The term neglected are Q, B. and Q, B 3 B*. 
 
 The first and second schemes are shown in Figures A. 5.1 and 
 A. 5, 2. 
 
 One can criticize the method in that it makes use of the 
 Taylor polynomial for approximation. Other polynomials exist having 
 better error behavior for the same cost in terms of ROM requirements and 
 same speed. Also, a set of small dedicated multipliers is not yery 
 efficient because they cannot be used for any other functions or even for 
 multiplication of two full-size numbers. It is also impossible to 
 use the same topology for sequential evaluation of higher precision 
 division (more than 32 bits ). 
 
106 
 
 
 
 
 "l 
 
 
 
 
 ADDER 
 
 
 Ron 
 
 B l 
 
 «? 
 
 
 — ^^1 
 
 
 
 
 MULTIPLIER 
 
 
 
 B ? 
 
 
 
 
 
 
 
 ^^ 
 
 Figure A. 5.1 
 
 Bn 
 
 ROM 
 
 Qi 
 
 
 
 
 
 + 
 
 ADDER 
 + 
 
 
 D l 
 
 n 2 
 
 
 ^^ 
 
 
 
 Q l 
 
 
 
 
 Qf 
 
 
 
 
 
 
 MULTIPLIER 
 
 
 n 
 
 
 
 
 h 
 
 
 Q 
 
 
 
 
 
 
 
 
 
 
 
 ROM 
 SQUARE 
 
 
 
 MULTIPLIER 
 
 
 
 
 
 
 
 
 
 
 
 B 2 2 
 
 
 
 
 Figure A. 5. 2 ROM Usage proposed by Steffanelli 
 
107 
 8. REFERENCES 
 
 [BAK75] Baker, P. W. "Parallel Multiplicative Algorithms for Some 
 Elementary Functions". IEEE Transactions on Computers, March 1975, 
 pp. 3. 
 
 [CAN70] Cantor, D., Estrin, G. E. and Turn, R. "Logarithmic and 
 exponential function generation in a variable structure digital comp- 
 uter, IRE Trans, on Electronic Digital Computer, Vol EC-11, Apr 1962. 
 
 [CHE72] Chen, T. "Automatic Computation of Exponentials, Logarithms, 
 Ratios and Square Roots". IBM J. Res. Development, Vol. 16, July 
 1972, pp. 380-8." 
 
 [FER71J Ferrari, D. "A Division Method using a Parallel Multiplier" 
 IEEE Transactions on Electronic Computers, April 1967, pp. 224-228. 
 
 [LIN71] Ling, H. "High Speed Division for Binary Computers". Proc, 
 1971 Spring Joint Computer Conf . , Vol. 38, AFIPS Press, Montvale, 
 N. J., 1971, pp. 373-8. 
 
 [LUG70] DeLugish, B. G. "A Class of Algorithms for Automatic Eval- 
 uation of Certain Elementary Functions in a Binary Computer", Ph. D 
 Dissertation. Department of Computer Science, University of Illi- 
 nois, Urbana, 111., June 1970. 
 
108 
 
 [STE74] Stefanelli, R. "A High Speed Binary Divider". Unpublished 
 paper submitted to the IEEE repository system. 
 
 [WAL64] Wallace, C. S. "A Suggestion for a Fast Multiplier". IEEE 
 Transactions on Electronic Computers, Vol. EC 13, February 1964, 
 pp. 14-17. 
 
 [WAL71] Walter, T. S. "A Unified Algorithm for Elementary Functions, 1 
 AFIPS Conference Proceedings, Spring Joint Computer Conference, pp. 379 
 to 385, 1971. 
 
BLIOGRAPHIC DATA 
 1EET 
 
 1. Report No. 
 
 UIUCDCS-R-76-796 
 
 Tille .mil Subt it le 
 
 SPECIAL PRUPOSE PROCESSORS FOR RADIO AIDS AND 
 FUNCTION GENERATION IN AIRCRAFT SIMULATORS 
 
 3. Recipient's Accession No. 
 
 5. Report Date 
 
 February 1976 
 
 'gfo'tfe H. GARCIA 
 
 8. Performing Organization Rcpt 
 
 No.UIUCDCS-R-76-79f5 
 
 Performing Ojganizayoa .Name at}d Address 
 
 University of Illinois at Urbana-Champaign 
 Department of Computer Science and 
 Institute of Aviation 
 Urbana, Illinois 61801 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract /Grant No. 
 
 Sponsoring Organization Name and Address 
 
 Institute of Aviation 
 University of Illinois 
 Urbana, Illinois 61801 
 
 13. Type of Report & Period 
 Covered 
 
 Master's Thesis 
 
 14. 
 
 Supplementary Notes 
 
 . Abstracts 
 
 The purpose of this work was to investigate the feasibility of a special 
 purpose hardware processor for flight simulation. The study is centered 
 around the arithmetic part of the processor for use in a real time environment 
 where high accuracy and resolution are required for values of the elementary 
 transcendental functions. 
 
 Two main approaches have been considered: 
 -Coordinate rotation methods (Voider, De Lugish and Walther) 
 -Methods based on function approximation employing techniques common 
 to Numerical Analysis and relying on algorithms using multiplication as 
 the basic operator. 
 An 8-bit Arithmetic Cordic Unit has been built and the trigonometric mode 
 microprogrammed. In addition, the general n-bit Arithmetic unit has been simulated 
 for base 2 f Errors bounds for some numerical algorithms havP hPPn Hpri\/Prl. 
 
 I Key Words and Document Analysis. 17a. Descriptors 
 
 Function Generation, Arithmetic Unit, Cordic Algorithms, Coordinate Rotation 
 technique, De Lugish Algorithms, Microprogramming. 
 
 1 • Identifiers/Open-Ended Terms 
 
 13 COSATI Field/Group 
 
 '^Availability Statement 
 
 RELEASE UNLIMITED 
 
 ' C ,"l NTIS-35 (10-70) 
 
 •L 
 
 19. Security Class (This 
 Report) 
 
 UNCLASSIFIED 
 
 20. Security Class (This 
 Page 
 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 117 
 
 22. Price 
 
 USCOMM-DC 40329-P7I 
 

 nEBEMmM