Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/furthernegativer721triv 
 
 UIUCDCS-R-75-721
 
 Further Negative Results Regarding the Use 
 of Continued Fractions for Digital 
 Computer Arithmetic 
 
 by 
 
 Kishor Shridharbhai Trivedi 
 
 May 1975 
 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 
 
 This work was supported in part by the National Science Foundation under 
 Grant No. NSF DCR 73-07998. 
 
 
 Acknowledgment 
 
 The author wishes to thank Professor James E. Robertson for his 
 continued support and encouragement. Thanks are also due to Mrs. June 
 Wingler for typing this paper. 
 
 
 TABLE OF CONTENTS 

 1. Introduction
 2. Riccati Equation
    2.1 Power of the Method
        2.1.1 Constant Coefficients
        2.1.2 Variable Coefficients
    2.2 Implementation Considerations
    2.3 Initial Condition
 3. Selection Procedures
    3.1 Constant Coefficients
        3.1.1 The Case with $\Delta < 0$
        3.1.2 The Case with $\Delta > 0$
    3.2 Variable Coefficients
 4. Conclusion
 References
 
1. Introduction 
 
 Recently, there has been considerable interest in number representations other than the conventional positional notation for digital hardware calculations [1]. The concern here is with continued fractions. To facilitate the hardware implementation, we require that the coefficients of the continued fractions be integral powers of two. For such a representation to be useful, it must be possible to define a broad class of problems that are easily solved with it. It was shown that a limited class of quadratics can be solved using this approach [1,2]. This was later extended to polynomials of degree larger than two [3]. An algorithm for the logarithm was presented in [4]. The class of Riccati differential equations is closed under a bilinear transformation [5]. In this paper we show that a very large number of functions may be evaluated using the Riccati equation approach. 

 Because of the restriction on the coefficients of the continued fractions, the selection of the coefficients during the iterative evaluation of a function becomes a difficult problem. We require that such a selection procedure be computationally "simple." It was shown that a simple selection procedure can be obtained for the algorithm for the quadratic equation [1,2]. This was later extended to polynomials of degree larger than two [3]. Recently, we have shown that no simple selection procedure exists for an algorithm for the logarithm [4]. In this paper, we obtain similar negative results for many functions that can be evaluated using the Riccati differential equation. 
 
An infinite continued fraction is represented by 

 $$\frac{p_1}{q_1 +}\,\frac{p_2}{q_2 +}\cdots$$

 where the $p_i$ are known as the partial numerators and the $q_i$ are known as the partial denominators. The classical theory of continued fractions uses $p_i = 1$ and $q_i \in N$, where $N$ is the set of natural numbers. We differ from this in that we require $p_i \in S_p$ and $q_i \in S_q$, where $S_p$ and $S_q$ are finite sets of positive numbers. Let $p_{\min} = \min S_p$, $p_{\max} = \max S_p$, $q_{\min} = \min S_q$ and $q_{\max} = \max S_q$. Then the smallest number, $m$, representable as an infinite continued fraction is the positive solution of the quadratic 

 $$m = \cfrac{p_{\min}}{q_{\max} + \cfrac{p_{\max}}{q_{\min} + m}}.$$

 Similarly, the largest representable number, $M$, is the positive solution of the quadratic 

 $$M = \cfrac{p_{\max}}{q_{\min} + \cfrac{p_{\min}}{q_{\max} + M}}.$$

 Let 

 $$m_{pq} = \frac{p}{q + M}, \qquad M_{pq} = \frac{p}{q + m}, \qquad I_{pq} = [m_{pq}, M_{pq}],$$

 where $p \in S_p$ and $q \in S_q$. Note that $I_{pq}$ is a closed interval of the real numbers. It can be shown [4] that the set of numbers representable as infinite continued fractions with finite, positive digit sets $S_p$ and $S_q$ is complete iff 

 $$\bigcup_{p \in S_p,\ q \in S_q} I_{pq} = [m, M].$$

 It can also be shown that if $S_p = \{1\}$ and $S_q \subseteq N$, then we have completeness only if $S_q = N$. But this conflicts with the requirement of finiteness. Therefore, we will depart from the classical approach by allowing fractions in $S_q$, by using a larger set of partial numerators, or both. 
 
 Let the finite continued fraction 

 $$\frac{p_1}{q_1 +}\,\frac{p_2}{q_2 +}\cdots\frac{p_n}{q_n}$$

 be denoted by $P_n/Q_n$. Letting $P_0 = 0$, $Q_0 = 1$, $P_{-1} = 1$ and $Q_{-1} = 0$, we can evaluate such a fraction using the following recursions [6]: 

 $$P_{i+1} = p_{i+1}P_{i-1} + q_{i+1}P_i, \qquad Q_{i+1} = p_{i+1}Q_{i-1} + q_{i+1}Q_i, \qquad i = 0, 1, \ldots, n-1.$$

 Each iterative step of such an evaluation requires four multiplications and two additions. If we require that the $p_i$'s and $q_i$'s be powers of two, then these four multiplications reduce to simple shifts in binary arithmetic. We will, therefore, require that this be the case. 
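The recursions above can be sketched in Python as follows. This is an illustrative sketch, not the report's hardware design. Each digit is given by its exponent, so that $p_i = 2^{\text{ep}_i}$ and $q_i = 2^{\text{eq}_i}$, and every "multiplication" becomes a scaling by a power of two, which would be a shift in binary hardware. The helper names `scale_pow2`, `convergents` and `evaluate_backward` are hypothetical. Exact `Fraction` arithmetic is used so that negative exponents (fractional digits, as allowed in $S_q$) are also handled.

```python
from fractions import Fraction

def scale_pow2(v: Fraction, k: int) -> Fraction:
    """Multiply v by 2**k; in binary hardware this is a shift by k places."""
    return v * (Fraction(2) ** k)

def convergents(p_exps, q_exps):
    """Convergents P_n/Q_n of p_1/(q_1+) p_2/(q_2+) ...,
    where p_i = 2**p_exps[i-1] and q_i = 2**q_exps[i-1]."""
    P_prev, P = Fraction(1), Fraction(0)   # P_{-1}, P_0
    Q_prev, Q = Fraction(0), Fraction(1)   # Q_{-1}, Q_0
    out = []
    for ep, eq in zip(p_exps, q_exps):
        # P_{i+1} = p_{i+1} P_{i-1} + q_{i+1} P_i: two shifts and one add
        P_prev, P = P, scale_pow2(P_prev, ep) + scale_pow2(P, eq)
        Q_prev, Q = Q, scale_pow2(Q_prev, ep) + scale_pow2(Q, eq)
        out.append(P / Q)
    return out

def evaluate_backward(p_exps, q_exps):
    """Evaluate the same finite continued fraction from the tail inwards."""
    value = Fraction(0)
    for ep, eq in reversed(list(zip(p_exps, q_exps))):
        value = Fraction(2) ** ep / (Fraction(2) ** eq + value)
    return value

# All digits p_i = q_i = 1 (exponent 0): the convergents are ratios of
# consecutive Fibonacci numbers.
print(convergents([0] * 5, [0] * 5))
```

With all exponents zero, the printed convergents are 1/1, 1/2, 2/3, 3/5 and 5/8. Forward evaluation by the recursions agrees with backward evaluation for any choice of exponents.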
 
 In the classical approach to function evaluation, a finite continued fraction with a few terms is used. Furthermore, the partial numerators and partial denominators are generally positive integral powers of the argument, x [7]. This clearly requires multiplications in each iterative step. Our approach requires that the partial numerators and denominators be simple powers of two. The complexity of function evaluation is therefore transferred to a selection procedure, which yields the value of the pair $(p_i, q_i)$ at the $i$th iterative step. In general, such a selection procedure will be very complex (of the order of complexity of the function to be evaluated), and it is used in every iterative step. We are therefore forced to use some approximation to render it "simple." A "simple" selection procedure may use shift, add, subtract and comparison operations only. This leads us to a discussion of redundancy [2,4]. 
 
 Given $S_p$ and $S_q$, if we have completeness, then the set of numbers representable as infinite continued fractions will be called a number system (NS). A number system is defined to be nonredundant if, for all $p_1, p_2 \in S_p$ and $q_1, q_2 \in S_q$ with $(p_1, q_1) \neq (p_2, q_2)$, the intersection $I_{p_1q_1} \cap I_{p_2q_2}$ is either empty or a singleton. A number system is redundant if it is not nonredundant. It can easily be shown that in a nonredundant number system, all but a countable set of numbers can be represented uniquely. Therefore, the use of any approximation in the selection procedure implies that we use a redundant number system. 
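The definitions of $m$, $M$, $I_{pq}$, completeness and redundancy can be checked numerically with a short sketch. It assumes the digit sets are given as lists of positive numbers; the function names are hypothetical. It solves the two quadratics for $m$ and $M$ in closed form (positive roots), builds every $I_{pq}$, and tests, with a small floating-point tolerance, whether their union covers $[m, M]$ and whether two distinct intervals overlap in more than a point.

```python
import math
from itertools import combinations

def extreme_values(S_p, S_q):
    """Smallest (m) and largest (M) numbers representable as infinite
    continued fractions with digits from S_p and S_q."""
    p_min, p_max, q_min, q_max = min(S_p), max(S_p), min(S_q), max(S_q)
    # m = p_min/(q_max + p_max/(q_min + m))
    #   <=> q_max m^2 + (q_max q_min + p_max - p_min) m - p_min q_min = 0
    b = q_max * q_min + p_max - p_min
    m = (-b + math.sqrt(b * b + 4 * q_max * p_min * q_min)) / (2 * q_max)
    # M = p_max/(q_min + p_min/(q_max + M))
    #   <=> q_min M^2 + (q_min q_max + p_min - p_max) M - p_max q_max = 0
    b = q_min * q_max + p_min - p_max
    M = (-b + math.sqrt(b * b + 4 * q_min * p_max * q_max)) / (2 * q_min)
    return m, M

def selection_intervals(S_p, S_q):
    """I_pq = [p/(q+M), p/(q+m)] for every digit pair (p, q)."""
    m, M = extreme_values(S_p, S_q)
    return {(p, q): (p / (q + M), p / (q + m)) for p in S_p for q in S_q}

def is_complete(S_p, S_q, tol=1e-12):
    """True if the union of all I_pq covers [m, M] without gaps."""
    m, M = extreme_values(S_p, S_q)
    reach = m                       # [m, reach] is covered so far
    for lo, hi in sorted(selection_intervals(S_p, S_q).values()):
        if lo > reach + tol:
            return False            # the gap (reach, lo) is not representable
        reach = max(reach, hi)
    return reach >= M - tol

def is_redundant(S_p, S_q, tol=1e-12):
    """True if two distinct I_pq overlap in more than a single point."""
    ivs = list(selection_intervals(S_p, S_q).values())
    return any(min(h1, h2) - max(l1, l2) > tol
               for (l1, h1), (l2, h2) in combinations(ivs, 2))

print(is_complete([1, 2], [1, 2]), is_redundant([1, 2], [1, 2]))
print(is_complete([1], [1, 2]), is_redundant([1], [1, 2]))
```

With $S_p = S_q = \{1, 2\}$, the intervals overlap and cover $[m, M]$, so the system is complete and redundant. With $S_p = \{1\}$ and $S_q = \{1, 2\}$, a gap appears between $I_{12}$ and $I_{11}$, which illustrates why a classical-style digit set needs to be enlarged.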
 
 Two approaches to function evaluation using continued fractions have been attempted. In the first approach, the function to be evaluated is $f(\mathbf{a}_0)$, where $\mathbf{a}_0$ is a vector of arguments, and we expand $f(\mathbf{a}_i)$ using the following bilinear transformation: 

 $$f(\mathbf{a}_i) = \frac{p_{i+1}}{q_{i+1} + f(\mathbf{a}_{i+1})}.$$

 We require that the vector of coefficients $\mathbf{a}_{i+1}$ can be obtained from $\mathbf{a}_i$, $p_{i+1}$ and $q_{i+1}$ by means of "simple" recursions. A recursion is "simple" if it uses shift, addition and subtraction operations only. The algorithm for the solution of a quadratic equation [1] and the algorithm for the logarithm [4] are members of this class. 
 
 In the second approach, we look for equations (algebraic or 
 differential) which are closed under a bilinear transformation. All the 
 
functions which are solutions to such equations can then be evaluated. 
 The Riccati differential equation is a member of this class. 
 
 In Section 2, we show that a very large number of functions can 
 be evaluated using the Riccati equation approach. In Section 3, we show 
 that no simple selection procedure exists for the functions discussed in 
 Section 2. 
 
 2. Riccati Equation 
 
 The Riccati equation can be written as 

 $$y' + a(x)y^2 + b(x)y + c(x) = 0. \qquad (2.1)$$
 
 Let $L$ be the set of all Riccati equations of this form. Wynn has shown that $L$ is closed under the bilinear transformation $y = p/(q+z)$, where $p$ and $q$ are constants [5]. Starting with $\ell_0 \in L$ and applying the bilinear transformation repeatedly, we can obtain a continued fraction expansion for the solution of the initial Riccati equation $\ell_0$. 

 Let $\ell_0$ be given by $y_0' = a_0y_0^2 + b_0y_0 + c_0$, and let $y_0 = p_1/(q_1 + y_1)$. Call this transformation $T_1 : L \to L$. Then $T_1(\ell_0) = \ell_1$ is given by $y_1' + a_1y_1^2 + b_1y_1 + c_1 = 0$. 
 
 The recursion relations for the coefficients $a_1$, $b_1$, $c_1$ in terms of $a_0$, $b_0$, $c_0$ are 

 $$a_1 = c_0/p_1, \qquad b_1 = b_0 + 2c_0q_1/p_1, \qquad c_1 = a_0p_1 + b_0q_1 + c_0q_1^2/p_1. \qquad (2.2)$$
 
Note that we have changed the form of $\ell_0$ to avoid negative signs in recursions (2.2). In general, let $\ell_{2m} = (y_{2m}' = a_{2m}y_{2m}^2 + b_{2m}y_{2m} + c_{2m})$ and $\ell_{2m+1} = (y_{2m+1}' + a_{2m+1}y_{2m+1}^2 + b_{2m+1}y_{2m+1} + c_{2m+1} = 0)$. Assume that $\ell_n = T_nT_{n-1}\cdots T_1(\ell_0)$ has been obtained. Then the coefficients of $\ell_{n+1} = T_{n+1}(\ell_n)$ are given by the following recursions: 

 $$a_{n+1} = c_n/p_{n+1}, \qquad b_{n+1} = b_n + 2c_nq_{n+1}/p_{n+1}, \qquad c_{n+1} = a_np_{n+1} + b_nq_{n+1} + c_nq_{n+1}^2/p_{n+1}. \qquad (2.3)$$
 
 As a result of these transformations, we have expanded $y_0$ to $n+1$ terms as follows: 

 $$y_0 = \frac{p_1}{q_1 +}\,\frac{p_2}{q_2 +}\cdots\frac{p_{n+1}}{q_{n+1} + y_{n+1}}. \qquad (2.4)$$
 
 Let $P_{n+1}/Q_{n+1}$ denote the finite continued fraction obtained by setting $y_{n+1} = 0$ in equation (2.4). If we assume that $|y_{n+1}| < M$, where $M$ is a fixed constant, then the fraction $P_n/Q_n$ converges to $y_0$. Setting $P_0 = 0$, $Q_0 = 1$, $P_1 = p_1$ and $Q_1 = q_1$, the recursions for $P_{n+1}$ and $Q_{n+1}$ are [4]: 

 $$P_{n+1} = q_{n+1}P_n + p_{n+1}P_{n-1}, \qquad Q_{n+1} = q_{n+1}Q_n + p_{n+1}Q_{n-1}. \qquad (2.5)$$

 Thus, if we have a method to correctly choose $p_n$, $q_n$ for every $n$, then we have an algorithm to solve the Riccati equation. 
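Recursions (2.3) can be sanity-checked numerically. The sketch below is our own check, not taken from the report. It starts from $\ell_0: y_0' = y_0^2 + 1$, whose solution $y_0 = \tan x$ appears in Table 2, applies two arbitrary digit pairs, and checks with a central difference that each transformed $y_n$ satisfies $\ell_n$ with the sign convention above. The names `transform` and `residual` and the digit values are hypothetical.

```python
import math

def transform(a, b, c, p, q):
    """Recursions (2.3): coefficients of l_{n+1} from those of l_n."""
    return c / p, b + 2 * c * q / p, a * p + b * q + c * q * q / p

def residual(y, a, b, c, x, sign, h=1e-5):
    """sign * y'(x) - (a y^2 + b y + c), with y' from a central difference.
    sign = +1 checks y' = a y^2 + b y + c       (even-indexed l_n);
    sign = -1 checks y' + a y^2 + b y + c = 0   (odd-indexed l_n)."""
    dy = (y(x + h) - y(x - h)) / (2 * h)
    v = y(x)
    return sign * dy - (a * v * v + b * v + c)

# l_0: y_0' = y_0^2 + 1, solved by y_0 = tan x.
coef0 = (1.0, 0.0, 1.0)
y0 = math.tan
p1, q1, p2, q2 = 2.0, 1.0, 1.0, 0.5       # arbitrary digit choices
y1 = lambda x: p1 / y0(x) - q1            # from y_0 = p_1 / (q_1 + y_1)
y2 = lambda x: p2 / y1(x) - q2            # from y_1 = p_2 / (q_2 + y_2)
coef1 = transform(*coef0, p1, q1)
coef2 = transform(*coef1, p2, q2)

x = 0.7
print(residual(y0, *coef0, x, +1))        # each residual should be near zero
print(residual(y1, *coef1, x, -1))
print(residual(y2, *coef2, x, +1))
```

The alternating `sign` argument mirrors the alternating forms of $\ell_{2m}$ and $\ell_{2m+1}$, which is exactly what lets (2.3) keep the same form at every step.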
 

 2.1 Power of the Method 
 
 We will now discuss the range of functions that can be obtained by the Riccati equation method. 
 
 2.1.1 Constant Coefficients 
 
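 Let us consider a subset $L_0$ of $L$ such that 

 $$L_0 = \{\, y' + ay^2 + by + c = 0 \mid a, b, c \in \mathbb{R} \,\},$$

 i.e., the set of all Riccati equations with constant coefficients. Consider $\ell_0 \in L_0$ given by $y_0' = a_0y_0^2 + b_0y_0 + c_0$. Depending on the sign of $\Delta = b_0^2 - 4a_0c_0$, the solution $y_0(x)$ of $\ell_0$ can be written as 

 $$y_0(x) = \frac{1}{2a_0}\left(\sqrt{-\Delta}\,\tan\!\left(\frac{\sqrt{-\Delta}}{2}x + A_0\right) - b_0\right) \quad \text{if } \Delta < 0,\ a_0 \neq 0;$$

 $$y_0(x) = -\frac{1}{a_0x + A_0} - \frac{b_0}{2a_0} \quad \text{if } \Delta = 0,\ a_0 \neq 0;$$

 $$y_0(x) = -\frac{1}{2a_0}\left(\sqrt{\Delta}\,\tanh\!\left(\frac{\sqrt{\Delta}}{2}x + A_0\right) + b_0\right) \quad \text{if } \Delta > 0,\ a_0 \neq 0;$$

 $$y_0(x) = A_0e^{b_0x} - \frac{c_0}{b_0} \quad \text{if } a_0 = 0,\ b_0 \neq 0.$$

 Depending on the values of the coefficients $a_0$, $b_0$, $c_0$ and the initial condition $t_0 = y_0(0)$, many different functions may be evaluated, as shown in the following table. 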
 
   a_0   b_0   c_0    Δ    t_0   y_0(x)
    1     0     1    -4     0    tan x
   -1     0    -1    -4     ∞    cot x
   -1     0     0     0     ∞    1/x
   -1     0     1     4     ∞    coth x
   -1     0     1     4     0    tanh x
    0     1     0     1     1    e^x

 Table 2 
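Each row of Table 2 can be checked mechanically. The following sketch is our own check, with hypothetical names. It verifies by central differences that every listed function satisfies $y_0' = a_0y_0^2 + b_0y_0 + c_0$, that the finite initial values match $y_0(0) = t_0$, and it reports the discriminant. An infinite $t_0$ is encoded as `None` (a pole at $x = 0$).

```python
import math

# (a_0, b_0, c_0, t_0, y_0) for each row of Table 2; t_0 = None encodes an
# infinite initial value (the function has a pole at x = 0).
TABLE = {
    "tan x":  (1, 0, 1, 0.0, math.tan),
    "cot x":  (-1, 0, -1, None, lambda x: 1 / math.tan(x)),
    "1/x":    (-1, 0, 0, None, lambda x: 1 / x),
    "coth x": (-1, 0, 1, None, lambda x: 1 / math.tanh(x)),
    "tanh x": (-1, 0, 1, 0.0, math.tanh),
    "e^x":    (0, 1, 0, 1.0, math.exp),
}

def check_row(a, b, c, t0, f, x=0.6, h=1e-5):
    """Return (residual of y' = a y^2 + b y + c at x, discriminant)."""
    dy = (f(x + h) - f(x - h)) / (2 * h)   # central difference
    y = f(x)
    if t0 is not None:
        assert abs(f(0.0) - t0) < 1e-12    # initial condition y_0(0) = t_0
    return dy - (a * y * y + b * y + c), b * b - 4 * a * c

for name, row in TABLE.items():
    res, delta = check_row(*row)
    print(f"{name:7s} Delta={delta:3d} residual={res:.1e}")
```

The printed discriminants reproduce the $\Delta$ column, and every residual is at the level of finite-difference error.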
 
 
 2.1.2 Variable Coefficients 
 
 Consider a subset $L_1$ of $L$ such that 

 $$L_1 = \{\, y' = a(x)y^2 + b(x)y + c(x) \mid a(x) = k(x)a,\ b(x) = k(x)b,\ c(x) = k(x)c,\ \text{and } a, b, c \text{ are constants} \,\}.$$

 Recursions for $a_{i+1}$, $b_{i+1}$ and $c_{i+1}$ can be derived from the recursions (2.3) and are as follows: 

 $$a_{i+1} = c_i/p_{i+1}, \qquad b_{i+1} = b_i + 2c_iq_{i+1}/p_{i+1}, \qquad c_{i+1} = a_ip_{i+1} + b_iq_{i+1} + c_iq_{i+1}^2/p_{i+1}. \qquad (2.6)$$
 
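 Depending on the sign of $\Delta = b_0^2 - 4a_0c_0$, the solution to $\ell_0$, where $y_0' = k(x)(a_0y_0^2 + b_0y_0 + c_0)$, is given by: 

 $$y_0(x) = \frac{1}{2a_0}\left(\sqrt{-\Delta}\,\tan\!\left(\frac{\sqrt{-\Delta}}{2}\int k(x)\,dx + A_0\right) - b_0\right) \quad \text{if } \Delta < 0,\ a_0 \neq 0;$$

 $$y_0(x) = -\frac{1}{a_0\int k(x)\,dx + A_0} - \frac{b_0}{2a_0} \quad \text{if } \Delta = 0,\ a_0 \neq 0;$$

 $$y_0(x) = -\frac{1}{2a_0}\left(\sqrt{\Delta}\,\tanh\!\left(\frac{\sqrt{\Delta}}{2}\int k(x)\,dx + A_0\right) + b_0\right) \quad \text{if } \Delta > 0,\ a_0 \neq 0;$$

 $$y_0(x) = A_0e^{b_0\int k(x)\,dx} - \frac{c_0}{b_0} \quad \text{if } a_0 = 0,\ b_0 \neq 0;$$

 $$y_0(x) = c_0\int k(x)\,dx + A_0 \quad \text{if } a_0 = b_0 = 0.$$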
 
 Clearly, a large class of functions can be evaluated with this method. 
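These closed forms can be read as follows. Every coefficient carries the same factor $k(x)$, so a constant-coefficient solution $Y(s)$ evaluated at $s = \int k(x)\,dx$ solves $\ell_0$: $y' = Y'(s)\,k(x) = k(x)(aY^2 + bY + c)$. A minimal numerical check of this reading is sketched below, with two hypothetical choices of $k(x)$. Both choices are illustrations only.

```python
import math

def residual(y, k, a, b, c, x, h=1e-5):
    """y'(x) - k(x) (a y^2 + b y + c), with y' by central difference."""
    dy = (y(x + h) - y(x - h)) / (2 * h)
    v = y(x)
    return dy - k(x) * (a * v * v + b * v + c)

# k(x) = 2x, so the integral of k is x^2. With a = c = 1, b = 0 (Delta < 0)
# the constant-coefficient solution tan s becomes y_0(x) = tan(x^2).
r1 = residual(lambda x: math.tan(x * x), lambda x: 2 * x, 1, 0, 1, 0.8)

# k(x) = cos x, integral sin x. With a = 0, b = 1, c = 0 the solution
# e^s becomes y_0(x) = exp(sin x).
r2 = residual(lambda x: math.exp(math.sin(x)), math.cos, 0, 1, 0, 0.8)

print(r1, r2)   # both should be close to zero
```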
 
 2.1.2.1 The Case with $\Delta = 0$ 

 In this section, we concentrate on the subset $L_2$ of $L_1$ defined by $L_2 = \{\ell \in L_1 \mid \Delta = 0\}$. Any $\ell \in L_2$ can be rewritten as $y' = k(x)(a^*y + b^*)^2$, where $a^* = \sqrt{a}$ and $b^* = a^*\,b/(2a)$. With this modification, we have reduced the number of coefficients from three to two. The recursions on $a_n^*$, $b_n^*$ can now be written as follows: 

 $$a_{n+1}^* = b_n^*/\sqrt{p_{n+1}}, \qquad b_{n+1}^* = (a_n^*p_{n+1} + b_n^*q_{n+1})/\sqrt{p_{n+1}}. \qquad (2.7)$$

 The solution of $\ell_0 \in L_2$ is given by 

 $$y_0(x) = \frac{1}{(a_0^*)^2\left(A_0 - \int k(x)\,dx\right)} - \frac{b_0^*}{a_0^*} \quad \text{if } a_0^* \neq 0;$$

 $$y_0(x) = (b_0^*)^2\int k(x)\,dx + A_0 \quad \text{if } a_0^* = 0.$$
 
 Note that we can integrate a given function $k(x)$ by this method by setting $a_0^* = 0$ and $b_0^* = 1$. 
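Recursions (2.7) can be checked in the same way as (2.3). The sketch below is our own check. It takes the $a_0^* \neq 0$ closed form with hypothetical values of $a_0^*$, $b_0^*$, $A_0$ and $k(x) = \cos x$. It applies one digit pair and verifies that $y_1 = p_1/y_0 - q_1$ satisfies $y_1' = -k(x)(a_1^*y_1 + b_1^*)^2$. The sign flips from one index to the next, as in Section 3.2.

```python
import math

def step(a_star, b_star, p, q):
    """Recursions (2.7) for the Delta = 0 case."""
    s = math.sqrt(p)
    return b_star / s, (a_star * p + b_star * q) / s

def rhs(k, a_star, b_star, y, x):
    """k(x) (a^* y + b^*)^2."""
    return k(x) * (a_star * y + b_star) ** 2

k = math.cos                       # hypothetical k(x); its integral is sin x
a0, b0, A0 = 1.0, 0.5, 1.0         # hypothetical a_0^*, b_0^*, A_0
y0 = lambda x: 1 / (a0 ** 2 * (A0 - math.sin(x))) - b0 / a0
p1, q1 = 2.0, 1.0                  # arbitrary digit pair
y1 = lambda x: p1 / y0(x) - q1     # from y_0 = p_1 / (q_1 + y_1)
a1, b1 = step(a0, b0, p1, q1)

x, h = 0.3, 1e-5
d0 = (y0(x + h) - y0(x - h)) / (2 * h)
d1 = (y1(x + h) - y1(x - h)) / (2 * h)
print(d0 - rhs(k, a0, b0, y0(x), x))   # ~0: y_0' =  k (a_0^* y_0 + b_0^*)^2
print(d1 + rhs(k, a1, b1, y1(x), x))   # ~0: y_1' = -k (a_1^* y_1 + b_1^*)^2
```

With $a_0^* = 0$ and $b_0^* = 1$ the same equation reduces to $y_0' = k(x)$, which is the integration special case noted above.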
 
 2.2 Implementation Considerations 
 
 Let us assume that simple selection procedures are available for 
 all the functions to be evaluated as detailed in section 2.1. We now give 
 steps of an algorithm T which will evaluate these functions. 
 Algorithm T: 
 Step 1: [Initialize] 
     Set $P_0 \leftarrow 0$, $Q_0 \leftarrow 1$, $P_{-1} \leftarrow 1$, $Q_{-1} \leftarrow 0$; 
     Set initial values of coefficients according to the function to be evaluated; 
     Set $i \leftarrow 0$; 
 Step 2: [Select] 
     $(p_{i+1}, q_{i+1}) \leftarrow$ Select(x, coefficients, function); 
 Step 3: [Recursions] 
     $P_{i+1} \leftarrow q_{i+1}P_i + p_{i+1}P_{i-1}$; 
     $Q_{i+1} \leftarrow q_{i+1}Q_i + p_{i+1}Q_{i-1}$; 
     Recurse using equations (2.3), (2.6) or (2.7), whichever is applicable. 
 Step 4: [Test] 
     After a 'sufficient' number of iterations GO TO Step 5; 
     otherwise set $i \leftarrow i+1$ and GO TO Step 2; 
 Step 5: [Evaluate] 
     $y_0(x) = f(x) \approx P_{i+1}/Q_{i+1}$; 
 END T; 
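The control flow of Algorithm T can be sketched in Python. This is only a sketch under loudly stated assumptions. The digit sets $S_p = S_q = \{1, 2\}$ are assumed. `select_oracle` is a hypothetical stand-in for Select: it cheats by using the exact remainder $y_i(x)$, obtained from the known target value via (2.8). A real design would have to compute the choice from $x$ and the coefficients, and Section 3 argues that this cannot be done simply. Because of the oracle, the coefficient recursions of Step 3 are not needed here and are omitted.

```python
import math
from fractions import Fraction

S_P = (1, 2)   # assumed digit set for p_i (powers of two)
S_Q = (1, 2)   # assumed digit set for q_i (powers of two)

def bounds(S_p, S_q):
    """m and M: positive roots of the quadratics of Section 1."""
    p_lo, p_hi, q_lo, q_hi = min(S_p), max(S_p), min(S_q), max(S_q)
    b = q_hi * q_lo + p_hi - p_lo
    m = (-b + math.sqrt(b * b + 4 * q_hi * p_lo * q_lo)) / (2 * q_hi)
    b = q_lo * q_hi + p_lo - p_hi
    M = (-b + math.sqrt(b * b + 4 * q_lo * p_hi * q_hi)) / (2 * q_lo)
    return m, M

def select_oracle(y, m, M):
    """Hypothetical Select: uses the exact remainder y_i(x) and returns the
    pair whose interval I_pq contains y with the widest margin."""
    def margin(pq):
        p, q = pq
        return min(y - p / (q + M), p / (q + m) - y)
    return max(((p, q) for p in S_P for q in S_Q), key=margin)

def algorithm_T(f_value, iterations=60):
    m, M = bounds(S_P, S_Q)
    P_prev, P, Q_prev, Q = 1, 0, 0, 1        # Step 1: P_{-1}, P_0, Q_{-1}, Q_0
    y = Fraction(f_value)                    # exact remainder y_0 = f(x)
    for _ in range(iterations):              # Step 4: fixed iteration count
        p, q = select_oracle(y, m, M)        # Step 2
        P_prev, P = P, q * P + p * P_prev    # Step 3 (shifts in hardware)
        Q_prev, Q = Q, q * Q + p * Q_prev
        y = p / y - q                        # (2.8): next remainder
    return P / Q                             # Step 5

print(algorithm_T(math.tan(0.5)), math.tan(0.5))
```

Choosing the pair with the widest margin uses the redundancy of these digit sets: any value in $[m, M]$ lies well inside some $I_{pq}$, so the remainder stays in range.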
 
 In any such iterative algorithm, the number of iterations required and the execution time per iteration are two important considerations. In each iteration, Steps 2, 3 and 4 are executed, and Steps 2 and 3 require the most attention. We can assume that if the procedure Select is known, it can be implemented as a combinational network and will therefore require very little time. In Step 3, all the assignments are independent of each other and can therefore be executed in parallel. Thus, given sufficient hardware, Step 3 can be sped up considerably. Each individual recursion requires additions (subtractions), multiplications and sometimes divisions. Since multiplication and division are relatively slow operations, we would like to avoid them where possible. If we restrict the coefficients $p_i$ and $q_i$ to be integral powers of two, these multiplications and divisions reduce to shifts, which are relatively fast. If we use the recursions (2.7), we further require that $p_i = 1$ for all $i$. We will assume that $p_i \in S_p$, where $S_p = \{2^j \mid j \text{ is an integer}\}$, and $q_i \in S_q$, where $S_q = \{2^j \mid j \text{ is an integer}\}$. Since the number of available shifts is finite, we further require that $S_p = \{2^j \mid \underline{J}_p \le j \le \overline{J}_p\}$ and $S_q = \{2^j \mid \underline{J}_q \le j \le \overline{J}_q\}$, where $\underline{J}_p$, $\overline{J}_p$, $\underline{J}_q$ and $\overline{J}_q$ are fixed integers. 
 
 The number of iterations to be carried out can be decided on the 
 basis of allowable error in the result. 
 
 2.3 Initial Condition 
 
 Associated with the solution of any differential equation, there are one or more arbitrary constants, which are evaluated using the imposed boundary conditions. Depending on the function $f(x)$ to be evaluated, we choose a particular $\ell_0 \in L$ (and the corresponding coefficient values) and the associated initial condition $y_0(0) = t_0$ so that $y_0(x) = f(x)$. Clearly, the initial condition on $\ell_i$ ($i \ge 1$) depends on $t_0$. In particular, 

 $$y_{n-1} = \frac{p_n}{q_n + y_n} \;\Rightarrow\; t_{n-1} = \frac{p_n}{q_n + t_n} \;\Rightarrow\; t_n = \frac{p_n}{t_{n-1}} - q_n. \qquad (2.8)$$

 As we will see in Section 3, $t_n$ is needed as an argument in a selection procedure for $p_{n+1}$ and $q_{n+1}$. Therefore, we need to evaluate $t_n$ in every iteration. This, however, requires a division. We can avoid the division by the following technique. 
 
 Let $t_n = d_n/e_n$. Then, from equation (2.8), 

 $$\frac{d_n}{e_n} = \frac{p_ne_{n-1} - q_nd_{n-1}}{d_{n-1}},$$

 from which 

 $$d_n = p_ne_{n-1} - q_nd_{n-1}, \qquad e_n = d_{n-1}, \qquad (2.9)$$

 with $d_0 = t_0$ and $e_0 = 1$. 
 
 If the selection procedure can make its choice using $d_n$ and $e_n$ alone (without explicitly requiring $t_n$), then we have solved our problem. In Step 3 of Algorithm T, we now have to carry out recursions (2.9) as well. 
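The division-free technique can be illustrated with exact rationals. The sketch below computes the remainders $t_n$ once with the division of (2.8) and once as the pair $(d_n, e_n)$ of (2.9), then confirms that $d_n/e_n = t_n$ at every step. The digit sequence and $t_0$ are arbitrary illustrations, and the function names are hypothetical.

```python
from fractions import Fraction

def remainders_with_division(t0, digits):
    """t_n = p_n / t_{n-1} - q_n, equation (2.8): one division per step."""
    t, out = Fraction(t0), []
    for p, q in digits:
        t = p / t - q
        out.append(t)
    return out

def remainders_division_free(t0, digits):
    """Recursions (2.9): d_n = p_n e_{n-1} - q_n d_{n-1}, e_n = d_{n-1}."""
    d, e, out = Fraction(t0), Fraction(1), []
    for p, q in digits:
        d, e = p * e - q * d, d      # multiplications only (shifts in hardware)
        out.append((d, e))
    return out

digits = [(1, 2), (2, 1), (1, 1), (2, 2)]   # arbitrary (p_n, q_n) choices
t0 = Fraction(3, 7)
for t, (d, e) in zip(remainders_with_division(t0, digits),
                     remainders_division_free(t0, digits)):
    assert t == d / e   # same remainder, but (2.9) itself needs no division
```

Only the final check divides. The recursion itself uses multiplications by $p_n$ and $q_n$, which become shifts when the digits are powers of two.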
 
 3. Selection Procedures 
 
 We have seen that the form of the solution to a Riccati equation depends on the sign of the discriminant $\Delta$. It is also clear that the selection procedure will differ for different forms of the solution, i.e., depending on the sign of $\Delta$. Therefore, if $\Delta$ remains invariant under the bilinear transformation, the same selection procedure can hopefully be used throughout the iterative evaluation of a function. It can easily be seen that this is indeed the case, i.e., 

 $$\Delta_n = \Delta_{n-1} = \cdots = \Delta_0.$$
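For completeness, the invariance follows directly from recursions (2.3). The one-step expansion below is our own, not reproduced from the report:

$$
\begin{aligned}
\Delta_{n+1} &= b_{n+1}^2 - 4a_{n+1}c_{n+1} \\
&= \left(b_n + \frac{2c_nq_{n+1}}{p_{n+1}}\right)^2 - \frac{4c_n}{p_{n+1}}\left(a_np_{n+1} + b_nq_{n+1} + \frac{c_nq_{n+1}^2}{p_{n+1}}\right) \\
&= b_n^2 + \frac{4b_nc_nq_{n+1}}{p_{n+1}} + \frac{4c_n^2q_{n+1}^2}{p_{n+1}^2} - 4a_nc_n - \frac{4b_nc_nq_{n+1}}{p_{n+1}} - \frac{4c_n^2q_{n+1}^2}{p_{n+1}^2} \\
&= b_n^2 - 4a_nc_n = \Delta_n .
\end{aligned}
$$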
 
 In Section 3.1, we consider selection procedures for Riccati 
 equations with constant coefficients, and in Section 3.2, we consider 
 the more general case of variable coefficients. 
 
 3.1 Constant Coefficients 
 
 We will consider two subcases separately, depending upon the sign of the discriminant $\Delta$. 

 3.1.1 The Case with $\Delta < 0$ 
 
 Consider $\ell_i$ such that $y_i' = j(a_iy_i^2 + b_iy_i + c_i)$, where $a_i \neq 0$ and $j = 1$ if $i$ is even and $-1$ otherwise. The solution to this equation is given by 

 $$y_i(x) = \frac{1}{2a_i}\left(j\sqrt{-\Delta}\,\tan\!\left(\frac{\sqrt{-\Delta}}{2}x + A_i\right) - b_i\right). \qquad (3.1)$$

 If we let the initial condition be $y_i(0) = d_i/e_i$, then we can evaluate the arbitrary constant $A_i$ by substituting the initial condition into equation (3.1). Thus, 

 $$\frac{d_i}{e_i} = \frac{1}{2a_i}\left(j\sqrt{-\Delta}\,\tan A_i - b_i\right),$$

 from which 

 $$A_i = j\arctan\!\left(\frac{2a_id_i + b_ie_i}{e_i\sqrt{-\Delta}}\right).$$

 Substituting in (3.1) and using the addition formula for the tangent, we get 

 $$y_i(x) = \frac{1}{2a_i}\left(j\sqrt{-\Delta}\;\frac{\tan\!\left(\frac{\sqrt{-\Delta}}{2}x\right) + j\,\frac{2a_id_i + b_ie_i}{e_i\sqrt{-\Delta}}}{1 - j\tan\!\left(\frac{\sqrt{-\Delta}}{2}x\right)\frac{2a_id_i + b_ie_i}{e_i\sqrt{-\Delta}}} - b_i\right)$$

 $$= \frac{j\tan\!\left(\frac{\sqrt{-\Delta}}{2}x\right)\left[-e_i\Delta + b_i(2a_id_i + b_ie_i)\right] + \sqrt{-\Delta}\,(2a_id_i)}{2a_i\left[e_i\sqrt{-\Delta} - j\tan\!\left(\frac{\sqrt{-\Delta}}{2}x\right)(2a_id_i + b_ie_i)\right]}$$

 and, since $-e_i\Delta + b_i(2a_id_i + b_ie_i) = 2a_i(2c_ie_i + b_id_i)$, 

 $$y_i(x) = \frac{jr_iu + \sqrt{-\Delta}\,d_i}{\sqrt{-\Delta}\,e_i - jh_iu} \qquad (3.2)$$
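Formula (3.2) can be checked against (3.1) numerically. In the sketch below (our own check), `y_direct` evaluates (3.1) with $A_i$ fixed by $y_i(0) = d_i/e_i$, and `y_closed` evaluates (3.2), which depends on $a_i, b_i, c_i$ only through $r_i$, $h_i$ and $\Delta$. The coefficient values are arbitrary illustrations with $\Delta < 0$, and the function names are hypothetical.

```python
import math

def y_direct(a, b, c, d, e, j, x):
    """Equation (3.1) with A_i fixed by y_i(0) = d/e (requires Delta < 0)."""
    s = math.sqrt(-(b * b - 4 * a * c))
    A = j * math.atan((2 * a * d + b * e) / (e * s))
    return (j * s * math.tan(s * x / 2 + A) - b) / (2 * a)

def y_closed(a, b, c, d, e, j, x):
    """Equation (3.2), written with r_i, h_i and u = tan(sqrt(-Delta) x / 2)."""
    s = math.sqrt(-(b * b - 4 * a * c))
    r, h = 2 * c * e + b * d, 2 * a * d + b * e
    u = math.tan(s * x / 2)
    return (j * r * u + s * d) / (s * e - j * h * u)

# a = 1.5, b = 1, c = 2 gives Delta = -11; d/e = 0.4/1.3 is the initial value.
for j in (1, -1):
    print(j, y_direct(1.5, 1.0, 2.0, 0.4, 1.3, j, 0.2),
          y_closed(1.5, 1.0, 2.0, 0.4, 1.3, j, 0.2))
```

Both expressions agree for either parity $j$. A finite-difference check confirms that they solve $y_i' = j(a_iy_i^2 + b_iy_i + c_i)$.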
 
 where $r_i = 2c_ie_i + b_id_i$, $h_i = 2a_id_i + b_ie_i$ and $u = \tan(\sqrt{-\Delta}\,x/2)$. The process of selection will therefore involve $r_i$, $h_i$, $d_i$ and $e_i$, but not $a_i$, $b_i$ and $c_i$. If we could obtain recursions for $r_i$ and $h_i$ that are free of $a_i$, $b_i$ and $c_i$, we would avoid computing $a_i$, $b_i$ and $c_i$ altogether. We will now derive such recursions for $h_i$ and $r_i$, using the recursions for $a_i$, $b_i$ and $c_i$ together with a slightly more general form of the recursions for $d_i$ and $e_i$ than that used in (2.9). The recursions for $d_i$ and $e_i$ are as follows: 
 
 $$d_{i+1} = k_{i+1}(p_{i+1}e_i - q_{i+1}d_i) \quad\text{and}\quad e_{i+1} = k_{i+1}d_i,$$

 where the $k_{i+1}$ are nonzero scale factors ($k_{i+1} = 1$ gives back (2.9)). Now, 

 $$h_{i+1} = 2a_{i+1}d_{i+1} + b_{i+1}e_{i+1}$$

 $$= 2\frac{c_i}{p_{i+1}}\,k_{i+1}(p_{i+1}e_i - q_{i+1}d_i) + \left(b_i + \frac{2c_iq_{i+1}}{p_{i+1}}\right)k_{i+1}d_i$$

 $$= 2k_{i+1}c_ie_i + k_{i+1}b_id_i$$

 $$= k_{i+1}r_i.$$
 
 In a similar way, we can obtain the recursion for $r_{i+1}$. As a result, the set of recursions that we will use is as follows: 

 $$
 \begin{aligned}
 h_{i+1} &= k_{i+1}r_i \\
 r_{i+1} &= k_{i+1}(p_{i+1}h_i + q_{i+1}r_i) \\
 d_{i+1} &= k_{i+1}(p_{i+1}e_i - q_{i+1}d_i) \\
 e_{i+1} &= k_{i+1}d_i
 \end{aligned}
 \qquad (3.3)
 $$
 
 The condition for the selection of a $(p, q)$ pair is $y_i(x) \in I_{pq}$. In other words, the selection condition is: if 

 $$m_{pq} \le \frac{jr_iu + \sqrt{-\Delta}\,d_i}{\sqrt{-\Delta}\,e_i - jh_iu} \le M_{pq}$$

 then choose $(p, q)$. We cannot use this condition directly, since $u$ is unknown. We therefore rewrite the selection condition as follows: 

 $$\arctan(\mathrm{ARG}_i(m_{pq})) \le j\frac{\sqrt{-\Delta}}{2}x \le \arctan(\mathrm{ARG}_i(M_{pq})) \qquad (3.4)$$

 where 

 $$\mathrm{ARG}_i(s) = \frac{\sqrt{-\Delta}\,(e_is - d_i)}{r_i + sh_i}.$$
 
 Such a rewriting is valid if both of the following conditions are satisfied: (1) $\mathrm{ARG}_i(s)$ is a monotone-increasing function of $s$, and (2) $\arctan(z)$ is a monotone-increasing function of $z$. Since condition (2) is known to hold, we only have to verify condition (1). To do this, note that 
 
 $$\frac{d\,\mathrm{ARG}_i(s)}{ds} = \frac{\sqrt{-\Delta}\,e_i(r_i + h_is) - h_i\sqrt{-\Delta}\,(e_is - d_i)}{(r_i + h_is)^2} = \frac{\sqrt{-\Delta}\,(r_ie_i + h_id_i)}{(r_i + h_is)^2}.$$

 Now, using (3.3), 

 $$r_{i+1}e_{i+1} + h_{i+1}d_{i+1} = k_{i+1}(p_{i+1}h_i + q_{i+1}r_i)\,k_{i+1}d_i + k_{i+1}r_i\,k_{i+1}(p_{i+1}e_i - q_{i+1}d_i)$$

 $$= k_{i+1}^2(p_{i+1}h_id_i + p_{i+1}r_ie_i) = p_{i+1}k_{i+1}^2(r_ie_i + h_id_i).$$

 Therefore, 

 $$r_ie_i + h_id_i = \left(\prod_{j=1}^{i}p_jk_j^2\right)(r_0e_0 + h_0d_0).$$
 
 Therefore, $\mathrm{ARG}_i(s)$ is a monotone-increasing function of $s$ provided $r_0e_0 + h_0d_0 > 0$. There is no loss of generality in assuming that $r_0e_0 + h_0d_0 > 0$. If $r_0e_0 + h_0d_0 < 0$, then $\mathrm{ARG}_i(s)$ is a monotone-decreasing function of $s$, and we can turn inequality (3.4) around and follow very similar arguments. Also note that the case $r_0e_0 + h_0d_0 = 0$ cannot occur. Since $r_0e_0 + h_0d_0 = 2(a_0d_0^2 + b_0d_0e_0 + c_0e_0^2)$, this would imply that either $t_0$ (the initial condition) is complex, or $d_0 = e_0 = 0$, or $a_0 = 0$. 
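Recursions (3.3) and the invariant for $r_ie_i + h_id_i$ can both be confirmed with exact rational arithmetic. The sketch below is our own check, with hypothetical starting values, digits and scale factors $k_i$. It runs (2.3) for the equation coefficients alongside (3.3), and asserts two things at every step. First, the $h_i$, $r_i$ produced by (3.3) equal $2a_id_i + b_ie_i$ and $2c_ie_i + b_id_i$ computed directly. Second, $r_ie_i + h_id_i = (\prod_j p_jk_j^2)(r_0e_0 + h_0d_0)$.

```python
from fractions import Fraction as F

def step23(a, b, c, p, q):
    """Recursions (2.3) for the equation coefficients."""
    return c / p, b + 2 * c * q / p, a * p + b * q + c * q * q / p

def step33(h, r, d, e, p, q, k):
    """Recursions (3.3): update h, r, d, e without using a, b, c."""
    return k * r, k * (p * h + q * r), k * (p * e - q * d), k * d

# hypothetical starting coefficients and initial value t_0 = d_0 / e_0
a, b, c = F(1), F(1, 2), F(3)
d, e = F(2, 5), F(1)
h, r = 2 * a * d + b * e, 2 * c * e + b * d
inv0 = r * e + h * d                       # equals 2(a d^2 + b d e + c e^2)
assert inv0 == 2 * (a * d * d + b * d * e + c * e * e)

scale = F(1)
for p, q, k in [(F(2), F(1), F(1)), (F(1), F(1, 2), F(3)), (F(4), F(2), F(1, 2))]:
    a, b, c = step23(a, b, c, p, q)
    h, r, d, e = step33(h, r, d, e, p, q, k)
    scale *= p * k * k
    # (3.3) agrees with computing h and r directly from a, b, c
    assert h == 2 * a * d + b * e and r == 2 * c * e + b * d
    # r_i e_i + h_i d_i = (prod p_j k_j^2)(r_0 e_0 + h_0 d_0)
    assert r * e + h * d == scale * inv0
```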
 
 In theory, the selection condition (3.4) can be used to select the $(p,q)$ pair at each iterative step, but the amount of computation involved is clearly excessive. To compute a boundary of a selection region, $\arctan(\mathrm{ARG}_i(s))$ must be computed, and there are as many as $|S_p| \times |S_q|$ selection regions. We would therefore like to use an approximation to $\arctan(\mathrm{ARG}_i(s))$ that is "easy" enough to compute from the available coefficients $h_i$, $r_i$, $d_i$, $e_i$ and the known value of $s$. The use of an approximation in the selection procedure implies the use of redundancy in the digit sets, since otherwise correct selection cannot be guaranteed. 
 
 With redundancy, there will be regions in which more than one $(p, q)$ pair can be chosen. Define $I_{p_1q_1} < I_{p_2q_2}$ if there exists $f \in I_{p_1q_1}$ such that $f < g$ for all $g \in I_{p_2q_2}$. A pair $(p_2, q_2)$ is said to be right-adjacent to a pair $(p_1, q_1)$ if $I_{p_1q_1} < I_{p_2q_2}$ and, for all $(p_3, q_3)$ such that $I_{p_1q_1} < I_{p_3q_3}$, we have $I_{p_2q_2} \le I_{p_3q_3}$. Left-adjacency is defined similarly. Given a pair $(p_1, q_1)$, its left-adjacent pair $(p_3, q_3)$ and its right-adjacent pair $(p_2, q_2)$, the following holds: 

 - if $f \in I_{p_1q_1} \cap I_{p_3q_3}$, then we can choose $(p_1, q_1)$ or $(p_3, q_3)$; 
 - if $f \in I_{p_1q_1} \cap I_{p_2q_2}$, then we can choose $(p_1, q_1)$ or $(p_2, q_2)$; 
 - if $f \in I_{p_1q_1} - (I_{p_1q_1} \cap I_{p_2q_2}) - (I_{p_1q_1} \cap I_{p_3q_3})$, then we must choose the pair $(p_1, q_1)$. 

 The existence of selection overlap regions such as $I_{p_1q_1} \cap I_{p_3q_3}$ allows us to use an approximation in the selection procedure. Let us denote the approximate value of $\arctan(\mathrm{ARG}_i(s))$ by $\mathrm{AT}_i(s)$. The selection rule can then be specified by: 

 $$\text{If } \mathrm{AT}_i(z_1) \le j\frac{\sqrt{-\Delta}}{2}x \le \mathrm{AT}_i(z_2) \text{ then choose } (p_1, q_1) \qquad (3.5)$$

 where $z_1 \in I_{p_1q_1} \cap I_{p_3q_3}$ and $z_2 \in I_{p_1q_1} \cap I_{p_2q_2}$. Now $z_1$ and $z_2$ are boundaries between adjacent selection regions, so the selection of the $(p, q)$ pair is unique. To guarantee correct selection using condition (3.5), we have to show that the region specified by condition (3.5) is a subset of the region specified by condition (3.4). It follows that the maximum error allowable in the computation of $\arctan(\mathrm{ARG}_i(s))$, denoted by $E_i$, is given by: 
 
 $$E_i = \max\left[\arctan(\mathrm{ARG}_i(M_{p_1q_1})) - \mathrm{AT}_i(z_2),\; \mathrm{AT}_i(z_1) - \arctan(\mathrm{ARG}_i(m_{p_1q_1}))\right].$$

 In other words, we can find $s_1$ and $s_2$ ($s_2 > s_1$) such that 

 $$E_i \le \arctan(\mathrm{ARG}_i(s_2)) - \arctan(\mathrm{ARG}_i(s_1)).$$

 Now we note that $\arctan(z)$ satisfies the Lipschitz condition, i.e., 

 $$|\arctan(z_2) - \arctan(z_1)| \le L\,|z_2 - z_1|$$

 for some $L$ with $0 < L \le 1$. Therefore, 

 $$E_i \le L\,(\mathrm{ARG}_i(s_2) - \mathrm{ARG}_i(s_1)). \qquad (3.6)$$

 Now, 

 $$H_i = \mathrm{ARG}_i(s_2) - \mathrm{ARG}_i(s_1) = \frac{\sqrt{-\Delta}\,(e_is_2 - d_i)}{r_i + h_is_2} - \frac{\sqrt{-\Delta}\,(e_is_1 - d_i)}{r_i + h_is_1} = \frac{(r_ie_i + h_id_i)\sqrt{-\Delta}\,(s_2 - s_1)}{(s_1h_i + r_i)(s_2h_i + r_i)}.$$
 
 Using the expression derived earlier for $r_ie_i + h_id_i$, we have 

 $$H_i = \frac{\sqrt{-\Delta}\,(s_2 - s_1)\left(\prod_{j=1}^{i}p_jk_j^2\right)(r_0e_0 + h_0d_0)}{(s_1h_i + r_i)(s_2h_i + r_i)}. \qquad (3.7)$$
 
 We now want to eliminate $h_i$ and $r_i$ from the expression for $H_i$. To this end, we will show that 

 $$r_i = r_0K_iQ_i + h_0K_iP_i, \qquad\text{where}\qquad K_i = \prod_{j=1}^{i}k_j.$$

 We prove this result by induction on $i$. Since $P_0 = 0$, $Q_0 = 1$ and $K_0 = 1$, we have $r_0 = r_0 \cdot 1 \cdot 1 + h_0 \cdot 1 \cdot 0 = r_0$. From recursions (3.3), we have 

 $$r_1 = k_1(p_1h_0 + q_1r_0) = r_0K_1Q_1 + h_0K_1P_1.$$

 Now assume that the required result holds for $r_j$, $j \le i$. Again from recursions (3.3), 

 $$r_{i+1} = k_{i+1}(p_{i+1}h_i + q_{i+1}r_i) = k_{i+1}(p_{i+1}k_ir_{i-1} + q_{i+1}r_i)$$

 $$= k_{i+1}\left(p_{i+1}k_i(r_0K_{i-1}Q_{i-1} + h_0K_{i-1}P_{i-1}) + q_{i+1}(r_0K_iQ_i + h_0K_iP_i)\right)$$

 $$= r_0K_{i+1}(p_{i+1}Q_{i-1} + q_{i+1}Q_i) + h_0K_{i+1}(p_{i+1}P_{i-1} + q_{i+1}P_i)$$

 $$= r_0K_{i+1}Q_{i+1} + h_0K_{i+1}P_{i+1}.$$
 
 Thus, we have the required result. It follows that 

 $$h_i = k_ir_{i-1} = K_i(r_0Q_{i-1} + h_0P_{i-1}).$$

 Substituting these expressions for $h_i$ and $r_i$ into equation (3.7), we have 

 $$H_i = \frac{\left(\prod_{j=1}^{i}p_j\right)(r_0e_0 + h_0d_0)\sqrt{-\Delta}\,(s_2 - s_1)}{\left[s_1(r_0Q_{i-1} + h_0P_{i-1}) + r_0Q_i + h_0P_i\right]\left[s_2(r_0Q_{i-1} + h_0P_{i-1}) + r_0Q_i + h_0P_i\right]}.$$
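The closed forms for $r_i$ and $h_i$ can be confirmed with exact arithmetic. This is a minimal sketch with arbitrary, hypothetical starting values and digits. It iterates the $h$, $r$ part of (3.3) together with the convergent recursions for $P_i$, $Q_i$, and compares against $r_i = K_i(r_0Q_i + h_0P_i)$ and $h_i = K_i(r_0Q_{i-1} + h_0P_{i-1})$ at every step.

```python
from fractions import Fraction as F

def check_closed_form(h0, r0, digits):
    """Iterate (3.3) for h, r alongside the recursions for P, Q and compare
    with r_i = K_i (r_0 Q_i + h_0 P_i), h_i = K_i (r_0 Q_{i-1} + h_0 P_{i-1})."""
    h, r = h0, r0
    P_prev, P, Q_prev, Q = F(1), F(0), F(0), F(1)   # P_{-1}, P_0, Q_{-1}, Q_0
    K = F(1)                                       # K_0
    for p, q, k in digits:
        h, r = k * r, k * (p * h + q * r)
        P_prev, P = P, q * P + p * P_prev
        Q_prev, Q = Q, q * Q + p * Q_prev
        K *= k
        if r != K * (r0 * Q + h0 * P) or h != K * (r0 * Q_prev + h0 * P_prev):
            return False
    return True

# arbitrary (p_i, q_i, k_i) triples
digits = [(F(2), F(1), F(3)), (F(1), F(2), F(1, 2)), (F(4), F(1, 2), F(5))]
print(check_closed_form(F(7, 3), F(-2), digits))
```

Because the arithmetic is exact, a `True` result confirms the identity itself, not just an approximation of it.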
 
 Substituting this into expression (3.6), we have 

 $$E_i \le \frac{L\left(\prod_{j=1}^{i}p_j\right)(r_0e_0 + h_0d_0)\sqrt{-\Delta}\,(s_2 - s_1)}{\left[s_1(r_0Q_{i-1} + h_0P_{i-1}) + r_0Q_i + h_0P_i\right]\left[s_2(r_0Q_{i-1} + h_0P_{i-1}) + r_0Q_i + h_0P_i\right]}.$$

 We now consider two cases, depending upon the value of $r_0$. If $r_0 \neq 0$, then 

 $$E_i \le B_1\,\frac{\prod_{j=1}^{i}p_j}{Q_iQ_{i-1}} \qquad (3.8)$$

 since $P_i$, $Q_i$, $P_{i-1}$, $Q_{i-1}$, $s_1$ and $s_2$ are all $> 0$, and where 

 $$B_1 = L\left(\frac{r_0e_0 + h_0d_0}{r_0^2}\right)\left(\frac{s_2 - s_1}{s_1}\right)\sqrt{-\Delta}.$$
 
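 On the other hand, if $r_0 = 0$, 

 $$E_i \le \frac{\left(\prod_{j=1}^{i}p_j\right)L\,h_0d_0\sqrt{-\Delta}\,(s_2 - s_1)}{h_0^2\,(s_1P_{i-1} + P_i)(s_2P_{i-1} + P_i)} \le B_3\,\frac{\prod_{j=1}^{i}p_j}{P_iP_{i-1}} \qquad (3.9)$$

 where $B_3 = L\,d_0\sqrt{-\Delta}\,(s_2 - s_1)/(h_0s_2)$. 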
 
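 We will now obtain a bound on $P_iP_{i-1}$ in terms of $Q_iQ_{i-1}$. A well-known property of the convergents of an infinite continued fraction $f$ can be written as [6]: 

 $$\frac{P_2}{Q_2} < \frac{P_4}{Q_4} < \cdots < f < \cdots < \frac{P_3}{Q_3} < \frac{P_1}{Q_1}.$$

 Therefore, if $i$ is odd, $P_i/Q_i > f \ge m$. If $i \ge 2$ is even, 

 $$\frac{P_i}{Q_i} \ge \frac{P_2}{Q_2} = \frac{p_1}{q_1 + p_2/q_2} \ge \frac{p_{\min}}{q_{\max} + p_{\max}/q_{\min}}.$$

 Therefore, for $i \ge 2$, 

 $$\frac{P_iP_{i-1}}{Q_iQ_{i-1}} > m\,\frac{p_{\min}}{q_{\max} + p_{\max}/q_{\min}}.$$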
 
 Substituting this into (3.9), we have 

 $$E_i \le B_2\,\frac{\prod_{j=1}^{i}p_j}{Q_iQ_{i-1}}, \qquad\text{where}\qquad B_2 = B_3\,\frac{q_{\max} + p_{\max}/q_{\min}}{m\,p_{\min}}. \qquad (3.10)$$
 
 From (3.8) and (3.10), we have 

 $$E_i \le B\,\frac{\prod_{j=1}^{i}p_j}{Q_iQ_{i-1}},$$

 where $B = B_1$ if $r_0 \neq 0$ and $B = B_2$ otherwise. Note that $B$ is a fixed, finite constant independent of $i$. The factor $\left(\prod_{j=1}^{i}p_j\right)/(Q_iQ_{i-1})$ can be interpreted as the error in the solution, since 
 
 it equals the difference between the values of the successive convergents $P_{i-1}/Q_{i-1}$ and $P_i/Q_i$ [6]. Therefore, if we demand linear convergence, we must have 

 $$\frac{\prod_{j=1}^{i}p_j}{Q_iQ_{i-1}} = \text{constant}\cdot\alpha^{-i}$$

 for a small positive constant and some $\alpha > 1$. As a result, we have 

 $$E_i \le B^*\,\alpha^{-i}.$$

 But this implies that $\arctan(\mathrm{ARG}_i(s))$ must be computed to nearly the same precision as the desired precision of the function being evaluated. We conclude that no computationally simple selection procedure can be obtained for the functions that can be evaluated using the Riccati equation with constant coefficients and $\Delta < 0$. 
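The interpretation of the error factor can be checked with exact arithmetic. The sketch below uses an arbitrary, hypothetical digit sequence. It confirms that $\left(\prod_{j=1}^{i}p_j\right)/(Q_iQ_{i-1})$ equals $|P_i/Q_i - P_{i-1}/Q_{i-1}|$ at every step and that this factor strictly decreases. Together with $E_i \le B(\prod p_j)/(Q_iQ_{i-1})$, this is why the tolerance on the selection arithmetic shrinks as fast as the convergents converge.

```python
from fractions import Fraction as F

def error_factors(digits):
    """For i = 1, 2, ... return the pair
    (prod_{j<=i} p_j / (Q_i Q_{i-1}),  |P_i/Q_i - P_{i-1}/Q_{i-1}|)."""
    P_prev, P, Q_prev, Q = F(1), F(0), F(0), F(1)   # P_{-1}, P_0, Q_{-1}, Q_0
    prod_p, out = F(1), []
    for p, q in digits:
        P_prev, P = P, q * P + p * P_prev
        Q_prev, Q = Q, q * Q + p * Q_prev
        prod_p *= p
        out.append((prod_p / (Q * Q_prev), abs(P / Q - P_prev / Q_prev)))
    return out

# arbitrary (p_i, q_i) digits, repeated to show the geometric decay
digits = [(F(1), F(2)), (F(2), F(1)), (F(1), F(1)), (F(2), F(2))] * 4
factors = error_factors(digits)
for i, (factor, gap) in enumerate(factors, start=1):
    print(i, float(factor), factor == gap)
```

The factor shrinks at every step because consecutive factors have ratio $p_iQ_{i-2}/Q_i < 1$.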
 
 3.1.2 The Case with $\Delta > 0$ 

 Consider the following Riccati equation: 

 $$y_i' = j(a_iy_i^2 + b_iy_i + c_i)$$

 such that $\Delta > 0$ and $j = 1$ if $i$ is even and $-1$ otherwise. The solution to this equation can be written as 

 $$y_i(x) = -\frac{\sqrt{\Delta}}{2a_i}\coth\!\left(j\frac{\sqrt{\Delta}}{2}x + A_i\right) - \frac{b_i}{2a_i} \qquad (3.11)$$

 where $A_i$ is an arbitrary constant of integration. Using the initial condition $y_i(0) = t_i = d_i/e_i$, we obtain 

 $$\tanh A_i = -\frac{\sqrt{\Delta}\,e_i}{2a_id_i + b_ie_i}.$$

 For brevity, let $h_i = 2a_id_i + b_ie_i$ and $w = \tanh(\sqrt{\Delta}\,x/2)$. After substituting for $A_i$ in (3.11), we get 

 $$y_i(x) = \frac{j\Delta e_iw - jb_ih_iw - 2\sqrt{\Delta}\,a_id_i}{2a_i(jh_iw - \sqrt{\Delta}\,e_i)} = \frac{jr_iw + \sqrt{\Delta}\,d_i}{\sqrt{\Delta}\,e_i - jh_iw},$$

 from which we get 

 $$j\tanh\!\left(\frac{\sqrt{\Delta}}{2}x\right) = \frac{\sqrt{\Delta}\,(y_ie_i - d_i)}{y_ih_i + r_i} \qquad (3.12)$$

 where $r_i = b_id_i + 2c_ie_i$. From equation (3.11), we note the following special cases. If $e_0 = 1$, $d_0 = 0$, $h_0 = 0$ and $r_0 = \sqrt{\Delta}$, then $y_0(x) = \tanh(\sqrt{\Delta}\,x/2)$. If $e_0 = 0$, $d_0 = 1$, $h_0 = -\sqrt{\Delta}$ and $r_0 = 0$, then $y_0(x) = \coth(\sqrt{\Delta}\,x/2)$. If $c_0 = 0$ and $a_0 = 0$, then $y_0(x) = A_0e^{b_0x}$. 
 
 From the form of equation (3.12) and the definitions of $r_i$ and $h_i$, we can follow the same arguments as in Section 3.1.1 to prove that no computationally simple selection procedure can be obtained when $\Delta > 0$ or $a_0 = 0$. This establishes the negative results for the Riccati equation with constant coefficients. 
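The $\Delta > 0$ closed form (the simplified expression preceding (3.12)) can be spot-checked numerically. This is a sketch with hypothetical helper names and arbitrary coefficients. It reproduces the $\tanh$ and $\coth$ special cases listed above, and it verifies by central differences that the formula solves $y_i' = j(a_iy_i^2 + b_iy_i + c_i)$ for either parity $j$.

```python
import math

def y_closed_hyp(a, b, c, d, e, j, x):
    """Delta > 0: (j r w + sqrt(Delta) d) / (sqrt(Delta) e - j h w),
    with w = tanh(sqrt(Delta) x / 2)."""
    s = math.sqrt(b * b - 4 * a * c)
    r, h = b * d + 2 * c * e, 2 * a * d + b * e
    w = math.tanh(s * x / 2)
    return (j * r * w + s * d) / (s * e - j * h * w)

def ode_residual(a, b, c, d, e, j, x, step=1e-5):
    """y' - j (a y^2 + b y + c), with y' by central difference."""
    y = lambda t: y_closed_hyp(a, b, c, d, e, j, t)
    dy = (y(x + step) - y(x - step)) / (2 * step)
    v = y(x)
    return dy - j * (a * v * v + b * v + c)

# a = -1, b = 0, c = 1 (Delta = 4): d_0 = 0, e_0 = 1 gives tanh x;
# d_0 = 1, e_0 = 0 gives coth x.
print(y_closed_hyp(-1, 0, 1, 0, 1, 1, 0.5), math.tanh(0.5))
print(y_closed_hyp(-1, 0, 1, 1, 0, 1, 0.5), 1 / math.tanh(0.5))
print(ode_residual(0.5, 3.0, 1.0, 0.3, 1.0, -1, 0.2))   # near zero
```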
 
 3.2 Variable Coefficients 
 
 We will only consider the case $\Delta = 0$, i.e., the subset of $L$ with $\Delta = 0$ studied in Section 2.1.2.1. Consider the equation 

 $$y_i' = j\,k(x)(a_iy_i + b_i)^2 \qquad (3.13)$$

 where $j = 1$ if $i$ is even and $-1$ otherwise. We will use the following set of recursions: 
 
 $$
 \begin{aligned}
 a_{i+1} &= b_i/\sqrt{p_{i+1}} \\
 b_{i+1} &= a_i\sqrt{p_{i+1}} + b_iq_{i+1}/\sqrt{p_{i+1}} \\
 d_{i+1} &= e_i\sqrt{p_{i+1}} - d_iq_{i+1}/\sqrt{p_{i+1}} \\
 e_{i+1} &= d_i/\sqrt{p_{i+1}}
 \end{aligned}
 \qquad (3.14)
 $$
 
 The solution to this equation is given by 

 $$y_i(x) = \frac{d_i + j(g(x) - g(0))\,b_i(a_id_i + b_ie_i)}{e_i - j(g(x) - g(0))\,a_i(a_id_i + b_ie_i)} \qquad (3.15)$$

 where $g(x) = \int k(x)\,dx$. To simplify equation (3.15), we can easily prove by induction on $i$ that 

 $$a_id_i + b_ie_i = a_0d_0 + b_0e_0 = r_0.$$

 Indeed, using recursions (3.14), 

 $$a_{i+1}d_{i+1} + b_{i+1}e_{i+1} = \frac{b_i}{\sqrt{p_{i+1}}}\left(e_i\sqrt{p_{i+1}} - \frac{d_iq_{i+1}}{\sqrt{p_{i+1}}}\right) + \left(a_i\sqrt{p_{i+1}} + \frac{b_iq_{i+1}}{\sqrt{p_{i+1}}}\right)\frac{d_i}{\sqrt{p_{i+1}}} = a_id_i + b_ie_i.$$

 Using this, we get 

 $$y_i(x) = \frac{d_i + j(g(x) - g(0))\,b_ir_0}{e_i - j(g(x) - g(0))\,a_ir_0}.$$
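Both the invariant $a_id_i + b_ie_i = r_0$ and the simplified solution can be checked numerically. The sketch below is our own check, with hypothetical starting values and digits and with $k(x) = \cos x$, so that $g(x) = \sin x$. It applies (3.14) three times, asserting the invariant after each step. It then verifies by central differences that the simplified formula solves (3.13) with $j = -1$, since $i = 3$ is odd.

```python
import math

def step314(a, b, d, e, p, q):
    """Recursions (3.14)."""
    s = math.sqrt(p)
    return b / s, a * s + b * q / s, e * s - d * q / s, d / s

def y_simplified(a, b, d, e, j, r0, G):
    """Solution with a_i d_i + b_i e_i replaced by r_0; G = g(x) - g(0)."""
    return (d + j * G * b * r0) / (e - j * G * a * r0)

a, b, d, e = 0.7, 0.4, 0.25, 1.0            # hypothetical a_0, b_0, d_0, e_0
r0 = a * d + b * e
for p, q in [(2.0, 1.0), (1.0, 0.5), (4.0, 2.0)]:   # arbitrary digits
    a, b, d, e = step314(a, b, d, e, p, q)
    assert math.isclose(a * d + b * e, r0)   # invariant a_i d_i + b_i e_i = r_0

j = -1                                       # i = 3 is odd
k, g = math.cos, math.sin                    # hypothetical k(x), g = integral of k
y = lambda x: y_simplified(a, b, d, e, j, r0, g(x) - g(0.0))
x, h = 0.3, 1e-5
dy = (y(x + h) - y(x - h)) / (2 * h)
print(dy, j * k(x) * (a * y(x) + b) ** 2)    # the two values should agree closely
```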
 
26 
 
 The selection condition can now be written as: If 
 d. + j(g(x)-g(0)) Id. r. 
 
 < T7 — 7 — ? 77TYT < M then choose (p,q), 
 
 - e. + j(g(x)-g(0)) a, r. - pq v ^' Hy 
 
 m < 
 
 i 
 
 Since g(x) is the unknown we want to transform the selection condition to: 
 
 , fj(M e.-d.+jM g(0)a.r_)^ 
 -1 / pq j i pq &v i 0' \ 
 
 L I b.r.+M a.r_ J ~ 
 v l pq l ^ 
 
 , . j(m e -d +om g(0)a.r ) | 
 
 ^ r n b.+m a. ) j U.J-?; 
 
But this transformation is valid provided that $\mathrm{ARG}_i(s)$ is a monotone-increasing function of $s$ and $g^{-1}(z)$ is a monotone-increasing function of $z$. Note that

$$ \mathrm{ARG}_i(s) = \frac{s e_i - d_i + j\, s\, g(0)\, a_i r_0}{r_0\,(b_i + s a_i)}. $$
 
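Therefore,

$$ \frac{d\,\mathrm{ARG}_i(s)}{ds} = \frac{r_0 (b_i + s a_i)(e_i + j\,g(0)\,a_i r_0) - (s e_i - d_i + j\,s\,g(0)\,a_i r_0)\, r_0 a_i}{r_0^2\,(b_i + s a_i)^2} = \frac{1 + j\,g(0)\,a_i b_i}{(b_i + s a_i)^2}, $$

where the last step uses $a_i d_i + b_i e_i = r_0$.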
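The derivative formula can be spot-checked against a central difference. The Python sketch below uses $\mathrm{ARG}_i(s)$ exactly as defined above, with $r_0 = a_i d_i + b_i e_i$, for $j = \pm 1$ and for both a zero and a nonzero $g(0)$; the coefficients are hypothetical.

```python
def arg(s, a, b, d, e, j, g0):
    """ARG_i(s) as defined in the text, with r_0 = a_i d_i + b_i e_i."""
    r0 = a * d + b * e
    return (s * e - d + j * s * g0 * a * r0) / (r0 * (b + s * a))


def darg_formula(s, a, b, j, g0):
    """Closed-form derivative (1 + j g(0) a_i b_i) / (b_i + s a_i)^2."""
    return (1 + j * g0 * a * b) / (b + s * a) ** 2


def darg_numeric(s, a, b, d, e, j, g0, h=1e-6):
    """Central-difference estimate of d ARG_i / ds."""
    return (arg(s + h, a, b, d, e, j, g0) - arg(s - h, a, b, d, e, j, g0)) / (2 * h)


a, b, d, e = 0.25, 1.0, 0.5, 2.0  # hypothetical coefficients
for j in (1, -1):
    for g0 in (0.0, 0.7):
        for s in (0.1, 0.5, 1.5):
            exact = darg_formula(s, a, b, j, g0)
            approx = darg_numeric(s, a, b, d, e, j, g0)
            print(j, g0, s, exact, approx)
```

With $g(0) = 0$ the formula reduces to $1/(b_i + s a_i)^2$, which is positive wherever it is defined.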
 
For simplicity, we assume $g(0) = 0$; then the derivative above reduces to $1/(b_i + s a_i)^2 > 0$, so $\mathrm{ARG}_i(s)$ is a monotone-increasing function of $s$ on any interval where $b_i + s a_i \neq 0$. We also assume that $g^{-1}(z)$ is a monotone-increasing function of $z$. If it is monotone decreasing, then we can reverse the inequalities of the transformed selection condition and carry out similar arguments.
 
The transformed selection condition can be split into two parts depending on the value of $i$. We will only consider the case when $i$ is even, the other case being very similar. Then the selection condition is:
 
$$ g^{-1}(\mathrm{ARG}_i(m_{pq})) < x < g^{-1}(\mathrm{ARG}_i(M_{pq})). $$
 
Now, since $g^{-1}(\mathrm{ARG}_i(s))$ is difficult to compute in general, we would like to use an approximation. The maximum error allowable in such an approximation can be written as
 
$$ E_i = g^{-1}(\mathrm{ARG}_i(s_2)) - g^{-1}(\mathrm{ARG}_i(s_1)) $$
 
where $m_{pq} < s_1 < s_2 < M_{pq}$. Now we assume that $g^{-1}$ satisfies the Lipschitz condition with a "small" Lipschitz constant $L$. Then
 
$$ E_i \le L\,[\mathrm{ARG}_i(s_2) - \mathrm{ARG}_i(s_1)] \qquad (3.16) $$
 
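Now, using $g(0) = 0$ and the definition of $r_0$,

$$
\begin{aligned}
H_i &= \mathrm{ARG}_i(s_2) - \mathrm{ARG}_i(s_1) \\
&= \frac{s_2 e_i - d_i}{r_0\,(b_i + s_2 a_i)} - \frac{s_1 e_i - d_i}{r_0\,(b_i + s_1 a_i)} \\
&= \frac{s_2 - s_1}{(b_i + s_2 a_i)(b_i + s_1 a_i)}.
\end{aligned}
$$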
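The closed form for $H_i$ and the bound (3.16) can be checked together. The Python sketch below again takes the hypothetical $g(x) = x + x^3/3$ with $g(0) = 0$; because $g'(x) = 1 + x^2 \ge 1$, the inverse $g^{-1}$ is Lipschitz with constant $L = 1$. The coefficients and the pair $s_1 < s_2$ are illustrative.

```python
def g(x):
    # Hypothetical g with g(0) = 0 and g'(x) = 1 + x**2 >= 1.
    return x + x ** 3 / 3.0


def g_inv(z, lo=-10.0, hi=10.0, iters=200):
    """Invert the increasing function g on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def arg(s, a, b, d, e):
    """ARG_i(s) with g(0) = 0 and r_0 = a_i d_i + b_i e_i."""
    r0 = a * d + b * e
    return (s * e - d) / (r0 * (b + s * a))


def h_closed(s1, s2, a, b):
    """H_i = (s_2 - s_1) / ((b_i + s_2 a_i)(b_i + s_1 a_i))."""
    return (s2 - s1) / ((b + s2 * a) * (b + s1 * a))


a, b, d, e = 0.25, 1.0, 0.5, 2.0  # hypothetical a_i, b_i, d_i, e_i
s1, s2 = 0.3, 0.6                  # hypothetical m_pq < s_1 < s_2 < M_pq
L = 1.0                            # Lipschitz constant of g_inv for this g

H = arg(s2, a, b, d, e) - arg(s1, a, b, d, e)
E = g_inv(arg(s2, a, b, d, e)) - g_inv(arg(s1, a, b, d, e))
print(H, h_closed(s1, s2, a, b), E, L * H)
```

Note that $H_i$ depends on $i$ only through $a_i$ and $b_i$.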
 
From this point onwards, we can follow a procedure similar to that of Section 3.1 to obtain a similar negative result.
 
4. Conclusion
 
Recently, there has been some interest in the use of continued fractions for digital hardware calculations. We require that the coefficients of the continued fractions be integral powers of two. As a result, the selection of coefficients during the iterative evaluation of a function becomes a difficult problem. We have shown that practical selection procedures do not exist for most functions evaluated using the Riccati equation approach.
 
 
 References 
 
[1] Robertson, J. E. and K. S. Trivedi, "The Status of Investigations into Computer Hardware Design Based on the Use of Continued Fractions," IEEE Transactions on Computers, Vol. C-22, No. 6, June 1973, pp. 555-560.

[2] Trivedi, K. S., "An Algorithm for the Solution of a Quadratic Equation Using Continued Fractions," M.S. Thesis, University of Illinois, Urbana, June 1972; also Department of Computer Science Report #525.

[3] Bracha, A., "A Method for Solving Polynomial Equations by Continued Fractions," IEEE Transactions on Computers, Vol. C-23, No. 10, October 1974, pp. 1093-1097.

[4] Trivedi, K. S., "On a Negative Result Regarding the Use of Continued Fractions for Digital Computer Arithmetic," Department of Computer Science Report #693, University of Illinois, Urbana, January 1975.

[5] Wynn, P., "On Some Recent Developments in the Theory and Application of Continued Fractions," Journal SIAM on Num. Anal., Vol. 1, pp. 177-197, 1964.

[6] Wall, H., "Analytic Theory of Continued Fractions," Van Nostrand, Princeton, New Jersey, 1950.

[7] Khovanskii, A. N., "The Application of Continued Fractions," P. Noordhoff N.V., Groningen, The Netherlands, 1963.
 
BIBLIOGRAPHIC DATA 
 SHEET 
 
 1. Report No. 
 
 UIUCDCS-R-75-721 
 
 3. Recipient's Accession No. 
 
 4. Title and Subtitle 
 
 Further Negative Results Regarding the Use of 
 Continued Fractions for Digital Computer Arithmetic 
 
5. Report Date
 
 May 1975 
 
 7. Author(s) 
 
 Kishor Shridharbhai Trivedi 
 
8. Performing Organization Rept. No.
 
 9. Performing Organization Name and Address 
 
 Department of Computer Science 
 University of Illinois 
Urbana, Illinois 61801
 
 10. Project/Task/Work Unit No. 
 
 11. Contract /Grant No. 
 
 NSF DCR 73-07998 
 
 12. Sponsoring Organization Name and Address 
 
 National Science Foundation 
Washington, D.C.
 
13. Type of Report & Period Covered
 
 14. 
 
15. Supplementary Notes
 
16. Abstract
 
 Recently, there has been some interest in the use of continued fractions 
 for digital hardware calculations. We require that the coefficients of the 
 continued fractions be integral powers of two. As a result, the selection 
 of coefficients during the iterative evaluation of a function becomes a 
 difficult problem. In this paper, we show that no practical selection 
 procedure exists for most functions evaluated using the Riccati equation 
 approach. 
 
 17. Key Words and Document Analysis. 17a. Descriptors 
 
 Bilinear Transformation 
 Completeness 
 Computer Arithmetic 
 Continued Fractions 
 Hardware 
 Redundancy 
 Riccati Equation 
 Selection Procedure 
 
17b. Identifiers/Open-Ended Terms

17c. COSATI Field/Group

18. Availability Statement
 
19. Security Class (This Report)

UNCLASSIFIED

20. Security Class (This Page)

UNCLASSIFIED
 
21. No. of Pages

22. Price
 