Digitized by the Internet Archive in 2013
http://archive.org/details/parallelmethodsb437maru

Report No. 437

PARALLEL METHODS AND BOUNDS OF EVALUATING POLYNOMIALS

by

Kiyoshi Maruyama

March, 1971

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the Department of Computer Science at the University of Illinois at Urbana-Champaign, Urbana, Illinois, and by the National Science Foundation under NSF Grant GJ-328.

CORRECTIONS

1. In Theorem 3 (primal property) on page 15, $\epsilon$ should be $(2/\log_2 n)^{1/2}$.

2. In Theorem 7 on page 32, the statement "The minimum number of" should be "The upper bound of number of".

A better upper bound on the number of operations required to evaluate polynomials of degree n by Muraoka's folding method is given by $2n + \ell - 2$, where $\ell$ is an integer ($\ell \ge 1$) such that the $\ell$-th Fibonacci number is $\le n$ and the $(\ell + 1)$-st Fibonacci number is $> n$. For example:

    l:              1  2  3  4  5  6  7   8   ...
    Fibonacci No.:  1  1  2  3  5  8  13  21  ...

If n = 7 then $\ell = 5$; thus the number of operations required to evaluate a polynomial of degree 7 is $2 \cdot 7 + 5 - 2 = 17$, which is better than Theorem 7.
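The corrected operation bound can be checked numerically. The following is a minimal Python sketch of the formula $2n + \ell - 2$ only; the function name `folding_operation_bound` is illustrative and does not come from the report.

```python
def folding_operation_bound(n: int) -> int:
    """Return 2n + l - 2, where l satisfies F(l) <= n < F(l+1)
    for the Fibonacci numbers F = 1, 1, 2, 3, 5, 8, ... (F(1) = F(2) = 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    fib_l, fib_next, l = 1, 1, 1          # F(1), F(2), index l = 1
    while fib_next <= n:                  # advance while F(l+1) <= n
        fib_l, fib_next = fib_next, fib_l + fib_next
        l += 1
    return 2 * n + l - 2


if __name__ == "__main__":
    # The worked example of the correction: n = 7 gives l = 5 and 17 operations.
    print(folding_operation_bound(7))
```

For n = 7 the loop stops at $\ell = 5$ (F(5) = 5, F(6) = 8), which reproduces the value 17 quoted in the correction.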
ABSTRACT

A lower bound of the maximum evaluable degree of polynomial for each step s is given, and is $N(s) > 2^{s(1-\delta)}$, $\delta \approx (2/s)^{1/2}$. An upper bound is derived for the number of steps required to evaluate polynomials of degree n, and is given by $T(P_n) \le (1 + \epsilon)\log_2 n$, which approaches the theoretical lower bound $\lceil \log_2(2n + 1) \rceil$ as $n \to \infty$, where $\epsilon \approx (2/\log_2 n)^{1/2}$.

An upper bound of the number of operations required to achieve the theoretical lower bound of steps required to evaluate polynomials of degree n is given as Cn, C > 2. Furthermore a systematic construction of computation trees, the multi-folding and modified multi-folding methods, which achieves the bound is given.

Our dual problem approach leads to better results than Brent's [1] primal problem approach, and shows the existence of a simple scheduling algorithm to evaluate polynomials within the bound.

Index terms: evaluation, n-th degree polynomial, primal problem, dual problem, steps, computation tree, binary, balanced, LHT, RHT, LOF, consistency, segment, subpolynomial, x-term, amplification factor.

ACKNOWLEDGMENT

The author wishes to thank Dr. D. Kuck, Professor of Computer Science at University of Illinois, for suggesting the problem and his comments. He also acknowledges the assistance of Dr. Y. Muraoka and his helpful discussion. Thanks are extended to Miss Sue Cook who helped in getting this report finished.

TABLE OF CONTENTS

1. INTRODUCTION
2. COMPUTATION TREES
3. ALGORITHMS
4. COMPUTATION RESULTS AND THE BOUNDS
5. CONCLUSION
LIST OF REFERENCES
APPENDICES
   A. PROOFS OF MAIN PROPERTIES
   B. SOME OTHER PROPERTIES
   C. PL/1 PROGRAMS

1. INTRODUCTION

The computation of polynomials has been studied for many years. It has been shown by Pan [5] that 2n operations are required to evaluate a polynomial of degree n.
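Horner's rule attains this count with n multiplications and n additions performed strictly one after another. A minimal Python sketch, assuming coefficients are given from $a_n$ down to $a_0$ (the function name `horner` and the returned operation counter are illustrative, not part of the report):

```python
def horner(coeffs, x):
    """Evaluate a_n x^n + ... + a_1 x + a_0 by Horner's rule.

    coeffs is [a_n, ..., a_1, a_0]. Returns (value, operations), where
    operations counts one multiplication and one addition per coefficient
    after the leading one, i.e. 2n for a polynomial of degree n.
    """
    value = coeffs[0]
    operations = 0
    for a in coeffs[1:]:
        value = value * x + a   # one multiplication, one addition
        operations += 2
    return value, operations


if __name__ == "__main__":
    # 2x^2 + 3x + 4 at x = 5: degree 2, so 4 operations.
    print(horner([2, 3, 4], 5))
```

Every operation here depends on the previous one, so the 2n operations also take 2n time steps on any machine.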
Thus, for a serial machine, Horner's rule is optimal. However, it is easy to see that it requires only $\lceil \log_2 n \rceil$ steps to evaluate all powers, one step to multiply by the coefficients, and $1 + \lceil \log_2 n \rceil$ steps to sum the terms. Thus by introducing some "redundant" operations one can obtain the result in $2(1 + \lceil \log_2 n \rceil)$ time steps (assuming multiplication and addition times are equal). This is a crude bound for a multi-arithmetic-unit machine because some additions can be performed before the final multiplications are performed. Some of the well-known methods are Estrin's [3] and the k-th order Horner's rule [2]. Two other methods, called a tree method and a folding method, have been developed by Muraoka [4].

Hereafter, we assume that arbitrarily many processors are available to evaluate polynomials. We write an n-th degree polynomial as

    $P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$    (1)

where we assume $a_i \ne 0$ for $0 \le i \le n$. $P_n$ may be used to denote $P_n(x)$ for simplicity.

The lower bound for the number of steps, or time, required to evaluate a polynomial of degree n is $\lceil \log_2(2n + 1) \rceil$, or $\lceil \log_2 n \rceil + 1$, since 2n operations are required to evaluate $P_n(x)$ while a k-level balanced binary tree contains $2^k - 1$ nodes which correspond to binary operators.

The dual problem to the problem of finding the minimum number of steps s required to evaluate $P_n(x)$ may be stated as the problem of finding the maximum N for which the polynomial $P_N(x)$ may be evaluated by a particular given method in s steps. We consider this dual problem hereafter.

Let $P_{n_1}(x)$ and $P_{n_2}(x)$ be two polynomials and let $s_1$ and $s_2$ be the minimum number of steps required to evaluate $P_{n_1}$ and $P_{n_2}$, respectively. Clearly then $n_1 \ge n_2$ implies $s_1 \ge s_2$ (the converse is not true) and $s_1 > s_2$ implies $n_1 > n_2$ (see Appendix B).

$P_n(x)$ may be divided into q segments where each segment consists of a subpolynomial and an x-term, i.e., $Q_{n_i}(x)$ and $x^{m_i}$. Thus we may write
    $P_n(x) = \sum_{i=1}^{q-1} Q_{n_i}(x)\, x^{m_i} + P_m(x)$    (2)

where $n = m + \sum_{i=1}^{q-1} (n_i + 1)$ and $m_i = m + i + \sum_{j=1}^{i-1} n_j$. Such a segmentation of a polynomial is called a q-cut. [...] A segment $Q_{n_i}(x)\, x^{m_i}$ is evaluated in $s'$ steps if $s' - 1 \ge T(Q_{n_i})$ and $s' - 1 \ge \lceil \log_2 m_i \rceil$, where T is a function from polynomials to the non-negative integers I. Such a segment is said to be consistent if both conditions are satisfied. Unless otherwise specified, for each $Q_{n_i}(x)$ we choose $n_i = N$ such that $N = \max\{\, m \mid s' - 1 = T(P_m) \,\}$.

The symbol N(s, q), where s is a non-negative integer and q is an integer greater than or equal to 2, is used to denote the maximum degree of polynomial $P_{N(s,q)}(x)$ which can be evaluated in s steps by q-cut.

It is convenient to consider formula (2) as a computation tree to evaluate $P_n(x)$. The sub computation tree to evaluate the first term of (2) is called the LHT while the sub computation tree for the second term is called the RHT.

As an example, Muraoka's folding method, q = 2, may be stated as follows: let N(s-1) and N(s) be the maximum degrees of polynomials which can be evaluated in (s - 1) and s steps, respectively. Then using his method,

    $N(s+1) = N(s-1) + 1 + N(s)$,

where N(s-1) + 1 and N(s) correspond to the LHT and RHT of the computation tree of $P_{N(s+1)}$, and N(s+1) approaches 1.62 times N(s) for relatively large s since the sequence $\{ N(s) + 1 \mid s \ge 0 \}$ forms a Fibonacci sequence (see Table 1).

2. COMPUTATION TREES

In general, for such a computation tree, if $2^k < q - 1 \le 2^{k+1}$, $k \ge 0$, then the maximum degree of polynomial which can be evaluated in (s + 1) steps, N(s+1, q), is given by:

    $N(s+1, q) = N(s) + A_0 (N(s-k-1+M) + 1) + A_1 (N(s-k-2+M) + 1) + \cdots + A_M (N(s-k-1) + 1) + A_{M+1} (N(s-k-2) + 1) + \cdots + A_{M+L} (N(s-k-1-L) + 1)$

    $\phantom{N(s+1, q)} = N(s) + \sum_{i=0}^{M+L} A_i (N(s-k-1+M-i) + 1)$    (3)

where $q - 1 = \sum_{i=0}^{M+L} A_i$, $A_i$ is the number of segments on the level (k-M+i) and $A_{M+L}$ is a multiple of 2. L is the level difference between k and the highest level of segment in the LHT. M is the level difference between k and the lowest level of segment in the LHT.
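The segmentation (2) and the folding recurrence can be made concrete with a short Python sketch. It is an illustration only: the names `folding_degrees` and `segment_exponents` are hypothetical, and the starting values N(0) = N(1) = 0, N(2) = 1 are the ones used later in Step 1 of the algorithms.

```python
def folding_degrees(max_step):
    """N(s) for s = 0..max_step under the folding method (q = 2):
    N(s+1) = N(s-1) + 1 + N(s), starting from N(0) = N(1) = 0, N(2) = 1."""
    n = [0, 0, 1]
    for s in range(2, max_step):
        n.append(n[s] + n[s - 1] + 1)
    return n[:max_step + 1]


def segment_exponents(m, sub_degrees):
    """Exponents m_i = m + i + sum_{j<i} n_j of the x-terms in formula (2).

    m is the degree of the RHT polynomial P_m; sub_degrees lists
    n_1, ..., n_{q-1}. Returns (exponents, n), where n is the total degree
    n = m + sum (n_i + 1)."""
    exponents = []
    offset = m                      # highest power already covered
    for n_i in sub_degrees:
        exponents.append(offset + 1)
        offset += n_i + 1
    return exponents, offset


if __name__ == "__main__":
    print(folding_degrees(8))               # N(0) .. N(8)
    print(segment_exponents(20, [12]))      # folding tree of degree 33
    print(segment_exponents(20, [7, 7]))    # 3-cut tree of degree 36
```

With `m = 20` and subpolynomial degrees `[7, 7]` the exponents come out as 21 and 29 and the total degree as 36, which is the 3-segment tree of Fig. 3(b); with `[12]` the single exponent is 21 and the degree is 33, the folding tree of Fig. 3(a).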
A computation tree is said to be q-consistent if each segment in the LHT is consistent. Thus formula (3) is valid if its computation tree is q-consistent, and is illustrated in Figure 1. It is easy to see that a computation tree is q-consistent if and only if the x-term of the leftmost segment in $A_{M+L}$ is consistent.

A balanced computation tree with respect to q, $2^k < q - 1 \le 2^{k+1}$, [... the remainder of this definition, Figure 1, and the beginning of Lemma 1 are illegible in the source ...]

Lemma 1

[...] $> 0$, $0 \le j < L$, where $\alpha$ is an amplification factor defined by the ratio of N(s-k-j) + 1 and N(s-k-1-j) + 1, and $1 < \alpha(s) < 2$.

Proof: Let us assume the computation tree of formula (3) is q-consistent and a pair of segments is moved up $\ell$ levels (the subscripts of the two assignments to the $A_i$ that express this move, one adding 2 and one subtracting 1, are illegible in the source). Thus the improvement achieved by this rearrangement is given by

    $2(N(s-k-1-j+\ell) + 1) + (N(s-k-1-j+1) + 1) - 2(N(s-k-1-j) + 1) - (N(s-k-1-j+\ell+1) + 1)$
    $\ge (2\alpha^{\ell} + \alpha - \alpha^{\ell+1} - 2)(N(s-k-1-j) + 1)$
    $= (\alpha^{\ell} - 1)(2 - \alpha)(N(s-k-1-j) + 1) > 0$

where $\alpha(s-k-1-j) = (N(s-k-j) + 1) / (N(s-k-j-1) + 1)$ and in general $\alpha(s+1) \ge \alpha(s)$ for relatively large s.

[Fig. 2. LOF computation tree: the LHT holds $2(q - 1 - 2^k)$ segments on level k + 1 and $(2^{k+1} - q + 1)$ segments on level k; the RHT is evaluated in s steps and the whole tree in s + 1 steps.]

Lemma 1 implies that the maximum degree of polynomial evaluated in (s + 1) steps by a balanced tree is greater than that achieved by a computation tree which is non-balanced with respect to q.

Lemma 2

For given q, $2^k < q - 1 \le 2^{k+1}$, if a balanced computation tree is q-consistent then the LOF computation tree is also q-consistent (the converse is not true).

Proof: Let us assume the LOF computation tree is not q-consistent. Then the x-term, $x^{m_j}$, of the leftmost segment in $A_1$ is not consistent. Since $m_j$ is not greater than the power $m_i$ of the leftmost segment in $A_1$ of any balanced computation tree, such a balanced computation tree is not q-consistent. (Q.E.D.)

Lemma 2 implies that the LOF computation tree can be q-consistent while the balanced computation tree is not q-consistent.
In other words, the LOF computation tree is better than the regular balanced tree to solve our dual problem.

Lemma 3

The q-consistency condition, for $q \ge 3$ such that $2^k < q - 1 \le 2^{k+1}$, of an LOF computation tree at step (s + 1) is given by

    $s - k - 2 \ge \lceil \log_2 ( N(s) + (2(q - 1 - 2^k) - 1)(N(s-k-2) + 1) + 1 ) \rceil$

for $s \ge 7$. It is easy to see that 2-consistency always holds.

Proof: See Appendix A.

Lemma 4

Let us assume that the LHT of a computation tree at step (s + 1) is arranged as a non-LOF computation tree (it can be balanced), and let the maximum degree of polynomial evaluated by such a computation tree be denoted by $N(s+1, q_1)$. Then there exists at least one LOF computation tree of $q_2$-consistency, $q_2 \ge q_1$, with $N(s+1, q_2) \ge N(s+1, q_1)$.

Proof: It is clear by the definition of q-consistency and the proof of Lemma 3. (Q.E.D.)

Theorem 1

For $q \ge 3$ such that $2^k < q - 1 \le 2^{k+1}$, $k \ge 0$, if the q-consistency condition for the LOF computation tree holds at step (s + 1), then the maximum degree of polynomial evaluated in (s + 1) steps is given by

    $N(s+1, q) = N(s) + 2(q - 1 - 2^k)(N(s-k-2) + 1) + (2^{k+1} - q + 1)(N(s-k-1) + 1)$.

For q = 2, i.e., the folding method, the maximum degree is

    $N(s+1, 2) = N(s) + N(s-1) + 1$.

Proof: Lemmas 1 through 4 lead us to the result that the maximum degree of polynomial evaluated at each step, $s \ge 0$, which is achieved by the LOF computation tree is greater than that of any other computation tree in the LHT. By the definition of the LOF computation tree and the proof of Lemma 3, $q - 1 = A_0 + A_1$, $2^k < q - 1 \le 2^{k+1}$ and $2^k = A_0 + A_1/2$; thus we get $A_0 = 2^{k+1} - q + 1$ and $A_1 = 2(q - 1 - 2^k)$. (Q.E.D.)

Corollary 1

If a LOF computation tree is q-consistent then N(s+1, q) > N(s+1, q-i) for $1 \le i \le q - 3$.

Proof: It is easy to see that a q-consistent LOF computation tree is (q - i)-consistent for $1 \le i \le q - 2$. It is enough to show that N(s+1, q) > N(s+1, q-1).

Case 1.
(for $2^k = q - 2 < 2^{k+1}$)

    $N(s+1, q) = N(s) + 2(q - 1 - 2^k)(N(s-k-2) + 1) + (2^{k+1} - q + 1)(N(s-k-1) + 1)$,
    $N(s+1, q-1) = N(s) + 2^k (N(s-k-1) + 1)$.

Thus

    $N(s+1, q) - N(s+1, q-1) = 2(N(s-k-2) + 1) - N(s-k-1) - 1 = (2 - \alpha)(N(s-k-2) + 1) > 0$.

Case 2. (for $2^k < q - 2 < 2^{k+1}$) Here q and q - 1 share the same k, so

    $N(s+1, q) - N(s+1, q-1) = 2(N(s-k-2) + 1) - (N(s-k-1) + 1) = (2 - \alpha)(N(s-k-2) + 1) > 0$.

(Q.E.D.)

3. ALGORITHMS

Corollary 1 implies that to maximize N(s+1) we should choose the maximum q such that the LOF computation tree is q-consistent. From this, we get the following algorithm to generate N(s+1) for $s \ge 2$.

Algorithm 1 (multi-folding method).

Step 1: Set $N(0) \leftarrow N(1) \leftarrow 0$, $N(2) \leftarrow 1$, $q \leftarrow 2$ and $s \leftarrow 2$.

Step 2: If the LOF computation tree is 3-consistent then go to Step 3. Otherwise set $q^* \leftarrow q$, $N(s+1, q^*) \leftarrow N(s) + N(s-1) + 1$, $s \leftarrow s + 1$ and if s > M, where M is an integer used to limit the number of steps, then stop, otherwise go back to Step 2.

Step 3: Let $q^*$ be the maximum q such that the LOF computation tree is q-consistent, where $2^k < q - 1 \le 2^{k+1}$.

Step 4: Set $N(s+1, q^*) \leftarrow N(s) + 2(q^* - 1 - 2^k)(N(s-k-2) + 1) + (2^{k+1} - q^* + 1)(N(s-k-1) + 1)$ and $s \leftarrow s + 1$. If s > M then stop, otherwise go back to Step 3.

So far, $P_{N(s-k-1)}(x)$ and $P_{N(s-k-2)}(x)$, the polynomials of the maximum degrees evaluated in (s-k-1) and (s-k-2) steps, respectively, were assumed for each subpolynomial of the segments in $A_0$ and $A_1$ on the LHT, respectively. The q-consistency condition for a LOF computation tree given in Lemma 3 is $s - k - 2 \ge \lceil \log_2 u \rceil$, where $u = N(s) + (2q^* - 2^{k+1} - 3)(N(s-k-2) + 1) + 1$, so we have information to increase N(s+1). Let $\beta = 2^{s-k-2} - u \ge 0$; then we may expect a $\gamma$ improvement on $N(s+1, q^*)$ given by Algorithm 1 by forcing $\beta$ to be zero and achieving $(q^* + 1)$-consistency. Such a $\gamma$ improvement is defined as

    $\gamma = \Lambda(N(s-k-2) + \beta - N(s-k-1))$ for $2^k < q^* - 1 < 2^{k+1}$,

where $\Lambda$ is a function such that $\Lambda(y) = 0$ for $y \le 0$ and $\Lambda(y) = y$ for $y > 0$. This leads to Algorithm 2.

Algorithm 2 (modified multi-folding method).

Step 1: Set $N(0) \leftarrow N(1) \leftarrow 0$, $N(2) \leftarrow 1$, $q \leftarrow 2$ and $s \leftarrow 2$.
Step 2: If the LOF computation tree is 3-consistent then go to Step 3. Otherwise set $q^* \leftarrow q$, $N(s+1, q^*) \leftarrow N(s) + N(s-1) + 1$, $s \leftarrow s + 1$ and if s > M then stop, otherwise go back to Step 2.

Step 3: Let $q^*$ be the maximum q such that the LOF computation tree is q-consistent, where $2^k < q - 1 \le 2^{k+1}$. [... the rest of Step 3 and the beginning of Step 4 are illegible in the source ...]

Step 4: [...] then set $\Delta \leftarrow 1$, otherwise $\Delta \leftarrow 0$. Go to Step 6.

Step 5: Set $\beta \leftarrow 2^{s-k-2} - N(s) - 2(q^* - 1 - 2^k)(N(s-k-2) + 1) - 1$ and $\gamma \leftarrow N(s-k-2) + \beta - N(s-k-1)$. If $\gamma > 0$ then set $\Delta \leftarrow 1$, otherwise $\Delta \leftarrow 0$.

Step 6: Set $N(s+1, q^* + \Delta) \leftarrow N(s) + 2(q^* - 1 - 2^k)(N(s-k-2) + 1) + (2^{k+1} - q^* + 1)(N(s-k-1) + 1) + \Delta\gamma$ and $s \leftarrow s + 1$. If s > M then stop, otherwise go back to Step 3.

4. COMPUTATION RESULTS AND THE BOUNDS

Table 1 shows the computation results of the maximum degree of polynomials evaluated by Muraoka's folding method, Algorithm 1 and Algorithm 2, together with the theoretical lower bound of the minimum required steps to evaluate polynomials of the degree given by Algorithm 2. The amplification factor $\alpha(s)$, defined by the ratio between N(s) and N(s-1), approaches 2 as s increases.

Our results agree with Brent's result that polynomials of degree $2^{k(k-1)/2}$ are evaluated in k(k+1)/2 steps, e.g., for k = 5 the degree is 1024 and s is 15 steps. Table 1 shows a polynomial of degree 1024 can be evaluated in 15 steps by our LOF computation tree methods (multi-folding and modified multi-folding) because by Algorithm 1 we can evaluate a polynomial of degree 1728 in 15 steps.

The following results from our algorithms may be more interesting than those of Brent. Remember that N(s) denotes the maximum degree of polynomial evaluated in s steps by the LOF computation tree (multi-folding or modified multi-folding methods). The details of the derivations of the following results are in Appendix A.

Any $s \ge 2$ can be denoted by either k(k+1)/2 - 1 for $k \ge 2$, or k(k+1)/2 for $k \ge 2$, or k(k+1)/2 + i for $k \ge i + 1$, $i \ge 1$.

(1) For $k \ge 2$ and k an integer,

    $2^{k(k-1)/2} \ge N(k(k+1)/2 - 1) > 2^{k(k-1)/2 - 1}$,

(2) for $k \ge 2$,

    $2^{k(k-1)/2 + 1} \ge N(k(k+1)/2) > 2^{k(k-1)/2}$,

(3) and for $k \ge i + 1$, $i \ge 1$,

    $2^{k(k-1)/2 + i + 1} \ge N(k(k+1)/2 + i) > 2^{k(k-1)/2 + i}$.

The above three lead to:

(1) for s = k(k+1)/2 - 1, $k \ge 2$,

    $2^{s + 3/2 - (2(s+1))^{1/2}} \ge N(s) > 2^{s + 1/2 - (2(s+1))^{1/2}}$,

(2) for s = k(k+1)/2, $k \ge 2$,

    $2^{s + 3/2 - (2s)^{1/2}} \ge N(s) > 2^{s + 1/2 - (2s)^{1/2}}$,

(3) for s = k(k+1)/2 + i, $i \ge 1$, $k \ge i + 1$,

    $2^{s + 3/2 - (2(s-i))^{1/2}} \ge N(s) > 2^{s + 1/2 - (2(s-i))^{1/2}}$.

And we get the following.

Theorem 2 (dual property)

For any given $\delta > 0$, $N_T(s) \ge N(s) > 2^{s(1-\delta)}$, where $N_T(s)$ denotes the maximum degree of polynomial which can be evaluated in s steps (i.e., the theoretical maximum degree), and $\delta \approx (2/s)^{1/2}$ for all sufficiently large s.

Proof: It is easily derived from the dual properties. (Q.E.D.)

We are now ready to talk about the primal problem, i.e., the minimum number of steps required to evaluate a polynomial of degree n.

(1) For $\log_2 n = k(k-1)/2 - 1$, $k \ge 2$,

    $T(P_n) \le \log_2 n \,(1 + (2/(\log_2 n + 1))^{1/2}) + 1/2$,

(2) for $\log_2 n = k(k-1)/2$, $k \ge 2$,

    $T(P_n) \le \log_2 n \,(1 + (2/\log_2 n)^{1/2}) + 1/2$,

(3) and for $\log_2 n = k(k-1)/2 + i$, $k \ge i + 1$, $i \ge 1$,

    $T(P_n) \le \log_2 n \,(1 + (2/(\log_2 n + i))^{1/2}) + 1/2$.

Thus we get Theorem 3, which is identical to Brent's Theorem 2.

Theorem 3 (primal property)

For any given $\epsilon > 0$, $T(P_n) \le (1 + \epsilon)\log_2 n$ for all sufficiently large n, where $T: P_n \to s$ and $\epsilon \approx (2/\log_2 n)^{1/2}$.

Proof: It is derived from the primal properties. (Q.E.D.)

Theorem 4

The number of operations required to evaluate a polynomial of degree n, where n is large, in less than or equal to $(1 + \epsilon)\log_2 n$ steps is given by $\#(P_n) \le Cn$, where $C \approx 2^{\epsilon \log_2 n} > 2$.

Proof: From Theorem 3, $\#(P_n) \le 2^s - 1 \approx 2^{(1+\epsilon)\log_2 n} = n\, 2^{\epsilon \log_2 n}$. (Q.E.D.)
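Algorithm 1 (the multi-folding method of Section 3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's PL/1 program: the names `multi_folding`, `level` and `consistent` are illustrative; the level k of a q-cut is taken to satisfy $2^k < q - 1 \le 2^{k+1}$; and the consistency test of Lemma 3 is simply applied at every step, where it fails for s < 7 and so falls back to folding.

```python
def multi_folding(max_step):
    """Return (N, Q): N[s] is the maximum degree evaluable in s steps by the
    LOF computation tree, Q[s] the q* used to reach step s (Q[0], Q[1] unused)."""
    N = [0, 0, 1]
    Q = [None, None, 2]

    def level(q):
        # k with 2**k < q - 1 <= 2**(k + 1), valid for q >= 3
        return (q - 2).bit_length() - 1

    def consistent(s, q):
        # Lemma 3: s - k - 2 >= ceil(log2(u)), i.e. 2**(s - k - 2) >= u
        k = level(q)
        if s - k - 2 < 0:
            return False
        u = N[s] + (2 * (q - 1 - 2 ** k) - 1) * (N[s - k - 2] + 1) + 1
        return 2 ** (s - k - 2) >= u

    for s in range(2, max_step):
        q = 2                               # 2-consistency always holds
        while consistent(s, q + 1):         # largest consistent q (Corollary 1)
            q += 1
        if q == 2:                          # folding step
            N.append(N[s] + N[s - 1] + 1)
        else:                               # Theorem 1
            k = level(q)
            N.append(N[s]
                     + 2 * (q - 1 - 2 ** k) * (N[s - k - 2] + 1)
                     + (2 ** (k + 1) - q + 1) * (N[s - k - 1] + 1))
        Q.append(q)
    return N, Q


if __name__ == "__main__":
    N, Q = multi_folding(18)
    for s in range(2, 19):
        print(s, N[s], Q[s])
```

Under these assumptions the values for s up to 18 agree with the Algorithm 1 column of Table 1; for example N(15) = 1728, which exceeds Brent's degree $2^{10} = 1024$ at 15 steps. Algorithm 2 is not sketched because part of its statement is illegible in the source.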
Table 1. Maximum degrees of polynomials at each step.

   s   N(s) by folding   N(s) by       q*   N(s) by       q*+Δ   lower
       method, q = 2     Algorithm 1        Algorithm 2          bound
   1            -             -
   2            1             1         2             1      2      2
   3            2             2         2             2      2      3
   4            4             4         2             4      2      4
   5            7             7         2             7      2      4
   6           12            12         2            12      2      5
   7           20            20         2            20      2      6
   8           33            36         3            36      3      7
   9           54            62         3            62      3      7
  10           88           104         3           104      3      8
  11          143           183         4           183      4      9
  12          232           320         4           320      4     10
  13          376           572         5           572      5     11
  14          609           992         5           992      5     11
  15          986          1728         5          1728      5     12
  16         1596          3059         6          3059      6     13
  17         2583          5489         7          5489      7     14
  18         4180          9767         7          9767      7     15
  19         6764         17454         8         17454      8     16
  20        10945         31286         9         31286      9     16
  21        17710         55766         9         55915     10     17
  22        28656        100946        11        101095     11     18
  23        46367        182726        12        182875     12     19
  24        75024        330690        13        330839     13     20
  25       121392        602724        15        602873     15     21
  26       196417       1096509        16       1096807     16     22
  27       317810       1988781        17       1991463     17     22
  28       514228       3614520        18       3619735     18     23
  29       832039       6595653        20       6603699     20     24
  30      1346268      12060524        22      12071699     22     25
  31      2178308      22114723        24      22129325     24     26
  32      3524577      40639343        26      40738153     27     27
  33      5702886      74910711        29      75027401     29     28
  34      9227464     138188692        32     138391057     32     29
  35     14930351     253853364        33     254222609     33     29
  36     24157816     466181068        35     466812558     35     30
  37     39088168     857771783        38     858785453     38     31
  38     63245985    1583499885        42    1585050551     42     32

(a) The computation tree for Muraoka's folding method at step 8 (which evaluates a polynomial of degree 33):

    $(a_{33} x^{12} + \cdots + a_{22} x + a_{21})\, x^{21} + (a_{20} x^{20} + \cdots + a_1 x + a_0)$

    [Step labels in the figure: 6 steps (subpolynomial of degree 12), 5 steps ($x^{21}$), 7 steps, 8 steps (whole tree).]

(b) The computation tree for the multi-folding method at step 8 (which evaluates a polynomial of degree 36):

    $(a_{36} x^{7} + \cdots + a_{29})\, x^{29} + (a_{28} x^{7} + \cdots + a_{21})\, x^{21} + (a_{20} x^{20} + \cdots + a_1 x + a_0)$

    [Step labels in the figure: 5 steps for each of the two subpolynomials of degree 7 and for $x^{29}$ and $x^{21}$, 7 steps, 8 steps (whole tree).]

Fig. 3. Examples of computation trees at step 8.

[Graph: consistency q vs. steps s.]

5. CONCLUSION

Graph 2 shows the maximum degree of polynomial which can be evaluated in a given number of steps by the LOF computation tree. Brent's result is also plotted. Let us compare Brent's result (his Theorem 1), which says that a polynomial of degree $2^{k(k-1)/2}$ can be evaluated in less than or equal to k(k+1)/2 steps, $k \ge 2$, with our result that a polynomial of degree greater than $2^{k(k-1)/2}$ can be evaluated in k(k+1)/2 steps. These two results are slightly different since the former statement does not say that there exists at least one polynomial of degree higher than $2^{k(k-1)/2}$ whose evaluation can be accomplished in k(k+1)/2 steps. This statement is also clear from Graphs 1 and 2.

Our dual problem approach becomes more reasonable than the primal problem approach because a set of polynomials can be evaluated in s steps and such a set is ordered with respect to polynomial degree. Moreover it is easy to see that the dual property implies the primal property, while the converse is not true.

Some of the consequences of this paper are:

(1) We found the maximum possibly evaluable degree of polynomial for each number of steps, i.e., the dual property holds not only for a particular s but for all $s \ge 1$, while Brent's primal property is applied only for discrete degrees or at discrete steps.

(2) The dual property implies the primal property.

(3) We found simple computation trees called LOF, or multi-folding, to achieve Brent's upper bound and to approach the theoretical lower bound. This means that we have found a simple scheduling algorithm to evaluate a polynomial of degree n whose evaluation requires a number of steps close to the lower bound.

(4) We found $\epsilon$ as a function of $(2/\log_2 n)^{1/2}$ for the upper bound $(1 + \epsilon)\log_2 n$ of the primal problem.

(5) We found [... the remainder of this item is illegible in the source ...]
Furthermore, by considering Brent's statement "Theorem 1 is used to obtain an upper bound for t(n), even if n is not a power of 2", we get the following:

    $N_B(s) \ge 2^{[\ldots]}$, for integer $s \ge 2$ (the exponent is illegible in the source).

Notice that the equality should be included, i.e., $\ge$, while our dual results do not include it, i.e., $>$ (see the dual properties developed earlier). The maximum degree of polynomials derived by the above dualization of Brent's Theorem 2, $N_B(s)$, for each step s is plotted in Graph 2. By the comparison of our results and his, we have

(7) the upper bound of the steps required to evaluate polynomials by our multi-folding method is lower than or equal to Brent's, and the lower bound of the degree of polynomials evaluated at each step s, i.e., the maximum evaluable degree of polynomials, is higher than Brent's by a factor of $\mu$, where $1.5 < \mu < 2$.

LIST OF REFERENCES

1. Brent, R., "On the addition of binary numbers", IEEE Transactions on Computers, August 1970, pp. 758-759.
2. Dorn, W. S., "Generalizations of Horner's Rule for Polynomial Evaluation", IBM Journal of Research and Development, 6, April 1962, pp. 239-245.
3. Estrin, G., "Organization of Computer Systems -- the Fixed plus Variable Structure Computer", Proc. of Western Joint Computer Conference, May 1960, pp. 33-40.
4. Muraoka, Y., "Parallelism Exposure and Exploitation in Programs", Ph.D. Thesis, University of Illinois, 1971, pp. 33-[illegible].
5. Pan, V. Ya., "Methods of Computing Values of Polynomials", Russian Mathematical Surveys, 21, January-February 1966, pp. 105-136.
6. Winograd, S., "On the Time Required to Perform Addition", JACM, 12, April 1965, pp. 277-285.

APPENDIX A

PROOFS OF MAIN PROPERTIES

Proof of Lemma 3.

If $s - 1 \ge \lceil \log_2(N(s) + 1) \rceil$ then $N(s+1, 2) = N(s) + N(s-1) + 1$.

If $s - 2 \ge \lceil \log_2(N(s) + N(s-2) + 2) \rceil$ then $N(s+1, 3) = N(s) + 2(N(s-2) + 1)$.
If $s - 2 \ge \lceil \log_2(N(s) + 2(N(s-3) + 1) + 1) \rceil$ and if $s - 3 \ge \lceil \log_2(N(s) + (N(s-3) + 1) + 1) \rceil$ then $N(s+1, 4) = N(s) + 2(N(s-3) + 1) + (N(s-2) + 1)$.

If $s - 3 \ge \lceil \log_2(N(s) + 3(N(s-3) + 1) + 1) \rceil$ then $N(s+1, 5) = N(s) + 4(N(s-3) + 1)$.

If $s - 3 \ge \lceil \log_2(N(s) + 2(N(s-4) + 1) + 2(N(s-3) + 1) + 1) \rceil$ and if $s - 4 \ge \lceil \log_2(N(s) + (N(s-4) + 1) + 1) \rceil$ then $N(s+1, 6) = N(s) + 2(N(s-4) + 1) + 3(N(s-3) + 1)$.

It is important to see that the "If" condition of the very first case always holds. In general we get the following properties.

Property 1. If $q - 1 = 2^k$ and if

    $s - k - 1 \ge \lceil \log_2(N(s) + (2^k - 1)(N(s-k-1) + 1) + 1) \rceil$

then $N(s+1, q) = N(s) + 2^k (N(s-k-1) + 1)$.

Property 2. If $2^k < q - 1 < 2^{k+1}$,

    $s - k - 1 \ge \lceil \log_2(N(s) + 2(q - 1 - 2^k)(N(s-k-2) + 1) + (2^{k+1} - q)(N(s-k-1) + 1) + 1) \rceil$

and if

    $s - k - 2 \ge \lceil \log_2(N(s) + (2(q - 1 - 2^k) - 1)(N(s-k-2) + 1) + 1) \rceil$

then $N(s+1, q) = N(s) + 2(q - 1 - 2^k)(N(s-k-2) + 1) + (2^{k+1} - q + 1)(N(s-k-1) + 1)$.

Property 3. If the second condition of Property 2 holds then the first condition always holds.

Proof:

    $N(s) + 2(q - 1 - 2^k)(N(s-k-2) + 1) + (2^{k+1} - q)(N(s-k-1) + 1) + 1 < 2N(s) < 2(N(s) + (2(q - 1 - 2^k) - 1)(N(s-k-2) + 1) + 1)$.

Thus from the above arguments, the q-consistency condition ($q \ge 3$) for the LOF computation tree is given by

    $s - k - 2 \ge \lceil \log_2(N(s) + (2(q - 1 - 2^k) - 1)(N(s-k-2) + 1) + 1) \rceil$

where $2^k < q - 1 \le 2^{k+1}$. (Q.E.D.)

Dual Properties.

(1) At steps s = k(k+1)/2 - 1, for $k \ge 2$,

    $\log_2 N(s) + 2 > \lceil \log_2 N(s) \rceil + 1 = k(k-1)/2 + 1 \ge \log_2 N(s) + 1$.

By rearrangement, we get $k(k-1)/2 \ge \log_2 N(s) > k(k-1)/2 - 1$ and this yields:

    $2^{k(k-1)/2} \ge N(s) > 2^{k(k-1)/2 - 1}$.

(2) At steps s = k(k+1)/2, for $k \ge 2$ (i.e., at Brent's points),

    $\log_2 N(s) + 2 > \lceil \log_2 N(s) \rceil + 1 = k(k-1)/2 + 2 \ge \log_2 N(s) + 1$.

By rearrangement, we get $k(k-1)/2 + 1 \ge \log_2 N(s) > k(k-1)/2$ and this yields:

    $2^{k(k-1)/2 + 1} \ge N(s) > 2^{k(k-1)/2}$.
(3) At steps s = k(k+1)/2 + i, for $k \ge i + 1$, $i \ge 1$,

    $\log_2 N(s) + 2 > \lceil \log_2 N(s) \rceil + 1 = k(k-1)/2 + i + 2 \ge \log_2 N(s) + 1$.

By rearrangement, we get $k(k-1)/2 + i + 1 \ge \log_2 N(s) > k(k-1)/2 + i$ and this yields:

    $2^{k(k-1)/2 + i + 1} \ge N(s) > 2^{k(k-1)/2 + i}$.

Primal Properties.

These properties are derived directly from the dual properties discussed above.

APPENDIX B

SOME OTHER PROPERTIES

The following two properties are important and are given without proofs.

Lemma 5

Let $P_{n_1}(x)$ and $P_{n_2}(x)$ be two polynomials, where $n_1 \ge n_2$. If $s_1$ and $s_2$ are the minimum steps required to evaluate $P_{n_1}$ and $P_{n_2}$, respectively, then $s_1 \ge s_2$ (the converse is not true).

Lemma 6

Let $s_1$ and $s_2$ be the minimum steps required to evaluate polynomials $P_{n_1}(x)$ and $P_{n_2}(x)$, respectively. If $s_1 > s_2$ then $n_1 > n_2$ (the converse is not true).

Theorem 5

Muraoka's folding method and the minimum steps required to evaluate polynomials achieved by exhaustive division of $P_n(x)$ into 2 segments, called 2-cut, are equivalent, i.e., both methods give the same maximum degree of polynomials that can be evaluated in a given number of steps.

Proof: Let $s_f$ and $s_b$ denote the minimum number of steps required to evaluate a polynomial of degree n by the folding method and the exhaustive 2-cut method, respectively, and assume T is the function $T: P_n(x) \to s$. First, we have

    $Q_{n-f}(x)\, x^f + P_{f-1}(x) \in \{\, Q_{n-i}(x)\, x^i + P_{i-1}(x) \mid 1 \le i \le n \,\}$,

therefore

    $s_f \ge s_b = \min \{\, T(Q_{n-i}(x)\, x^i + P_{i-1}(x)) \mid 1 \le i \le n \,\}$,

where f is given by the folding method.

Now consider the case $s_f \le s_b$. Let us assume $s_f > s_b$; then there exists k such that $s_f > T(Q_{n-k}(x)\, x^k + P_{k-1}(x))$. It is also obvious that $s_f - 2 \ge T(P_{k-1}(x))$ and $s_f - 3 \ge \lceil \log_2 k \rceil$ for k < f (k > f is not valid from Lemma 4). But for such k, $s_f - 3 \ge T(Q_{n-k}(x))$ must hold, while $s_f - 2 \ge T(Q_{n-f}(x))$, since k < f implies $\deg(Q_{n-k}(x)) > \deg(Q_{n-f}(x))$ and by Lemma 4 we get $T(Q_{n-k}(x)) \ge T(Q_{n-f}(x))$, which leads us to a contradiction.
(Q.E.D.)

Theorem 6

The minimum steps s required to evaluate a polynomial of degree n by Muraoka's folding method is upper bounded by $(1 + C_1)\log_2 n + C_2$ for relatively large n, where $C_1$ ($\approx 0.44$) and $C_2$ ($\approx 0$) are constants.

Proof: Because the sequence $\{ N(s) + 1 \mid s \ge 0 \}$ forms a Fibonacci sequence, the amplification factor $\alpha(s)$ for relatively large s approaches $(1 + 5^{1/2})/2$ ($\approx 1.62$), and then we get the result. (Q.E.D.)

For example, the minimum steps s required to evaluate $P_n(x)$ by the folding method for $2 \le n \le 200$ are bounded by

    $\lceil (1 + 0.38)\log_2 n \rceil + 1 \le s \le \lceil (1 + 0.44)\log_2 n \rceil + 1$.

Theorem 7

The upper bound on the number of operations required to evaluate a polynomial of degree n, $N(s) < n \le N(s) + N(s-1) + 1$, by Muraoka's folding method is given by the following recurrence formula:

    $\#(P_{N(s)+1+i}(x)) = \#(P_i(x)) + 3 + \#(P_{N(s)}(x))$

for i = 0, 1, ..., N(s-1), and for s = 1, 2, ..., where $\#: P_n(x) \to I$, and $\#(P_n) \ge C'n$ where $C' \approx 2.6$.

Proof: It is obvious from the computation tree of Muraoka's folding method, or from our 2-consistent LOF computation tree. (Q.E.D.)

In Table 2, the computation result given by Theorem 7 is listed.

Lemma 7

Let us assume $P_{N(s)}(x)$ denotes the polynomial of the maximum degree whose evaluation requires at least s steps by "any" computation tree; then $T(P_{N(s)}(x)\, x^i) \ge s + 1$ for $i \ge 1$.

Proof: It is enough to show that $T(x P_{N(s)}(x)) \ge s + 1$. Assume $x P_{N(s)}$ can be evaluated in s steps. If $x P_{N(s)}$ can be evaluated by our multi-folding method then $P_{N(s)}$ should be evaluated in (s - 1) steps, which leads to a contradiction.

If the polynomial can be evaluated as

    $\cdots + \cdots + \cdots + a_2 x^2 + a_1 x$

then, since the evaluation of $a_2 x^2$ and $a_1 x$ requires 2 and 1 steps, respectively, it is clear that we can evaluate $P_{N(s)}\, x + c$, where c is a constant, in s steps. Furthermore, $P_{N(s)}$ is, by assumption, the polynomial of the maximum degree whose evaluation requires s steps; thus we have a contradiction.
Assume the polynomial is evaluated as

    $\cdots + \cdots + (\cdots)\, x^3 + (a_2 x + a_1)\, x$;

we need 3 steps to evaluate the last term, but we can also evaluate $a_2 x^2 + a_1 x + c$ in 3 steps, which means we can evaluate $P_{N(s)}\, x + c$ in s steps and leads us to a contradiction of our assumption. (Q.E.D.)

Table 2. The number of operations required to evaluate a polynomial of degree n by the folding method.

     n  #(P_n)      n  #(P_n)      n  #(P_n)      n  #(P_n)
                   35     91      70    183     105    273
     1      2      36     94      71    186     106    275
     2      5      37     97      72    188     107    278
     3      8      38     99      73    191     108    280
     4     10      39    102      74    193     109    283
     5     13      40    104      75    196     110    286
     6     15      41    107      76    199     111    288
     7     18      42    110      77    201     112    291
     8     21      43    112      78    203     113    294
     9     23      44    115      79    205     114    296
    10     26      45    118      80    207     115    299
    11     29      46    120      81    210     116    301
    12     31      47    123      82    212     117    304
    13     34      48    125      83    215     118    307
    14     36      49    128      84    218     119    309
    15     39      50    131      85    220     120    312
    16     42      51    133      86    223     121    315
    17     44      52    136      87    226     122    317
    18     47      53    138      88    228     123    320
    19     49      54    141      89    231     124    322
    20     52      55    144      90    233     125    325
    21     55      56    146      91    236     126    328
    22     57      57    149      92    239     127    330
    23     60      58    152      93    241     128    333
    24     63      59    154      94    244     129    335
    25     65      60    157      95    246     130    338
    26     68      61    159      96    249     131    341
    27     70      62    162      97    252     132    343
    28     73      63    165      98    254     133    346
    29     76      64    167      99    257     134    349
    30     78      65    170     100    260     135    351
    31     81      66    173     101    262     136    354
    32     84      67    175     102    265     137    356
    33     86      68    178     103    267     138    359
    34     89      69    180     104    270     139    362

APPENDIX C

PL/1 PROGRAMS

 DEGREE: PROCEDURE OPTIONS(MAIN);
   DCL N(0:51) BIN FIXED(31),
       S BIN FIXED,
       M BIN FIXED(31),
       DELTA BIN FIXED(31),
       TOWK(0:35) BIN FIXED(31);
   /* STARTING VALUES FROM THE FOLDING METHOD */
   N(0)=0; N(1)=0; N(2)=1; N(3)=2;
   N(4)=4; N(5)=7; N(6)=12; N(7)=20;
   /* TOWK(I) HOLDS 2**I */
   TOWK(0)=1;
   DO I=1 TO 30;
     TOWK(I)=2*TOWK(I-1);
   END;
   NC=3;
   DO S=7 TO 50;
 L:  DO I=0 TO 10;
       IF (TOWK(I)<NC-1)&(NC-1<=TOWK(I+1)) THEN DO;
         J=I; GO TO OUT;
       END;
     END;
     /* CONSISTENCY TEST OF LEMMA 3 FOR Q = NC */
 OUT: M=N(S)+(2*(NC-1-TOWK(J))-1)*(N(S-J-2)+1)+1;
     IF TOWK(S-J-2)>=M THEN DO;
       NCD=NC; NC=NC+1;
K = J; GO TO L: END M( S.+ 1J-N (S )+2*(NCO-l-IQWKlKJ )*< N < S-K-21 + 1 ) ♦ (TOWMK + 1 l-NCO+1 )*(N(S-K-1)+1) ; CATMN( S + l) ,NC.D) ; T SKIP =NC-l; c; GPEE ; 57 PRINT: PLT SKIP DATA(NCS+1).NCD): NC=NC-l; end ; end degree; 58 DEGREE: PROCEDURE OPTI GNS< NAI N) ; DC L N(0;51) BIN FIXEO(31), S BIN FIXED, M BIN FIXED( 31) t OELTA B IN FIXECM31 I, THWK(0:35) PIN FIXED(31); N(0)=0; N ( 1 ) =o; M I 2 ) = 1 ; M( 3_) = ?j_ N ( 4 ) =4 ; M ( 5 ) = 7 ; N ( h ) = 1 2 ; N( 7)=2Q: rnwKto )=i; no i = i tc 30_; TOWM I ) =?*TOWK( 1-1 ) ; FNn; NC = 3; on S=7 TO 50j_ L: DH 1=3 TO io; IF ( TOWK ( I XNC-DS tNC-K-TDH K ( I +1 1 ) THEN HO: J=l; GG TO OUT; FND; END; 0UT:M=N( S) + < 2* (NO 1-T(*V»K( J ) ) -1 )*(N( S-J-2H-1J+1 ; TF TOWK (S-J-2 )>=M THEN DCl: NCD=NC: NC=NC+l; K = J; GO TO L ; END; JF TOWKCK+ U=NCD-1 THEN GO TO O UT1 ; DELTA=T0WK(S-K-2 )-N < S )-2* ( NC D- 1-TOWK { K ) )*(N< S-K-21 + ll-l; IF M( S-K-l )>-N(S-K-?)+DELTA __ T F FN nn ; N I S+ 1 ) =N ( S ) + 2.*_( NC C-l-TCWK(K) )*(N(S -K-2 ) t jj_ +(T0WK(K+1)-NCD+1 )*(N( S-K-l )+l); GO TC PR INT L FND; FLSF DN; N( S+1)=N(S )-M2*( NfC- 1-TOWK (K ) ) + l )*(N(S-K-2)+l) + i TO WK'('k + 1)-NC 1) *= N(S-K-3)+DELTA twfn on; M ( s +_ 1 ) = N I Si + TOWK{K+l)*lNtS-K- 2)+l ); GO TO PR INT; end; FLSF DO; N ( S + 1 ) = NM S)-H TOWK(K»l )-l ) *( N ( S-K-2 ) + 1 > + N( S-K-3 ) + DELTA; NCD=NCn+] : F N D : 39 BISECT: PROCEDURE OPT IONS( MAIN ) ; BISECT: PROCEDURE OPTIONS (MAIN ) ; OCL DPN BIN FIXED; get list(dpn); begin; DCL ( A,B,C,S1,S2,MINISTEP) BIN FIXED, STEP(Q:DPN) BIN FIXED; STEPCO)=0; STEP(1)=2;STEP(2)=3; DO N=3 TO DPN; MINISTEP=100; DO 1=1 TO N; S1=STEP( 1-1); A=CEIL(LQG2(I )); B=STEP(N-I ); S2=MAX(A,B)+1; END; END; C=MAX(Sl,S2)+l; I F C < MINI STEP THEN D O; M INISTEP=C; M l^U END; END; STEP(N)=MINISTEP; LB=CEIL(LOG2(2*N+l) ) ; PUT SK I P D ATA ( N,MIN1STE P, MI, LB) ; END BISECT; ko T R I^S EC ; P R OCEDURE OPT IONS ( MA IN ) ; fRI_SEC: PROCEDURE OPTIONS(MAIN ) ; DCL NN BIN FIXED; GET LIST(NN); BEGIN; DCL (A f B,CtOtSltS2»S3J BIN 
FIXED, STEP(OtNN) BIN FIXED; STEP =100; STEP(0)=0; STEP(1)=2; DO N=2 TO NN; K=N-i; DO 1=1 TO k; S1 = STFP(I-1 ) ; DO J=I+1 TO n; A=STEP(J-I-1) ; B=CEIL ; DCL NK50) BIN FIXED, /* INDEX REGS */ -Hi o: iocu am fixed* STEP(0:50) BIN FIXED, _PCSXU BIN- FIXED, -73fe--CJUT POS1JION */ (N,NN,NC,A,B,SR,COUNT) BIN FIXED; UP_DATE_H: procedure; Ih=o; sr=step(ni (i)-i); H4-SRJ-U — - - IF NC>=3 THEN DO 4=2 TO NC_1; A=STEP(NI(I)-Nl{I-l)-l»; B=CEIL(LOG2(NIU-l) II; SR=MAX(A,8)+l; -H4SR4=HISR4-±X4- -■ END; A=STE P ( N-N4 IHC_ 1 1 1 ; S=CEIL(LGG2(NI(NC_1) )>; SR=MAX(A,B)+1; H(SR)=H(SR)+l; COUNT=0; 1=0; L: IF H(I)>0 THEN DO; IF H(U>1 THEN DO; H CI + L ) = H< 1+1 )+l ; HfI)=H(I)-2; COUNT=COUNT+ 1; IF COUNT0 THEN DO; H ( J+l 1=H( J + l ) + 1 ; H(I)=H(I)-1; H(J)=H(J)-1; COUNT=CGUNT+l; IF COUNT27 THEN HNC=9; ELSE MNC=N/3+l; LO :DO NC=2 TO MNC; 42 nc_ .1=NC- i; LI :D0 NK1 )=1 TO N-NC+2; IF NC=2 THEN DO; CALL UP_DATE_ .h; GO TO OUT1 ; end; L2 :DQ JIU2 )=NI(1 )+l JQ N; IF NC=3 THEN DO; CALL UP_DATE_ .h; GO TO OUT2 ; end; L3 :D0 NK3 )=NI(2 1 + 1 TO n; IF NOV THEN DO; CALL UP_DATE_ .h; GO TO 0UT3 ; end; L4 :DQ lUli ) = NU3_ 1 + 1 TO n; IF NC=5 THEN DO; CALL UP-DATE. .h; GO TO OUT4 ; END; L5 :D0 NU5 )=NI(4 )+l TO n; IF NC=6 THEN DO; CALL UP-DATE. .h; GO TO OUT 5 ; end; L6 :D0 M(6 1-NK5 1+1 TO N; IF NC=7 THEN DO; CALL UP_DATE_ _h; GO TO 0UT6 ; end; L7 :D0 NI(7 )=NI (6 )+l TO N; IF NC = 8 THEN DO; CALL UP-DATE. _h; GO TO 0UT7 ; end; L8 :D0 NH8 )=NI(7 1+1 TO N; IF NC=9 THEN DO; CALL UP-DATE. _h; GO TO CUT8 ; end; 0UT8 : END L8 ; GUT7 : : END L7 ; 0UT6 ' : END L6 ; 0UT5 : END L5 ; 0UT4 ' : END L4 ; 0UT3 : : END L3 ; 0UT2 : END 12 ; 0UT1 . : END LIS END LO; PUT SKIP DATA(STEPIN) ,MC); DO 1=1 TO MC-l; PUT EDIT(P(I))(F( 5) 1 ; END; END E KHAUST • end; %
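The exhaustive 2-cut search of Theorem 5, which the BISECT program above appears to carry out, can be sketched in Python as follows. This is a sketch under stated assumptions rather than a transcription of the PL/1 listing: the function name `two_cut_steps` is illustrative, and the cost model (one step for the product of a subpolynomial with its x-term, one step for the final addition, $\lceil \log_2 i \rceil$ steps for $x^i$) is read off the legible part of that listing.

```python
def two_cut_steps(max_degree):
    """step[n] = minimum number of steps to evaluate P_n over all 2-cuts
    P_n(x) = Q_{n-i}(x) * x**i + P_{i-1}(x), 1 <= i <= n, with step[0] = 0."""
    step = [0] * (max_degree + 1)
    for n in range(1, max_degree + 1):
        best = None
        for i in range(1, n + 1):
            s1 = step[i - 1]                 # right-hand tree P_{i-1}
            a = (i - 1).bit_length()         # ceil(log2(i)) steps for x**i
            b = step[n - i]                  # subpolynomial Q_{n-i}
            c = max(s1, max(a, b) + 1) + 1   # product, then final addition
            if best is None or c < best:
                best = c
        step[n] = best
    return step


if __name__ == "__main__":
    step = two_cut_steps(100)
    for s in range(2, 11):
        # Largest degree that the exhaustive 2-cut evaluates in s steps.
        print(s, max(n for n in range(101) if step[n] <= s))
```

Under this cost model the largest degree reachable in s steps comes out as 1, 2, 4, 7, 12, 20, 33, 54, 88 for s = 2, ..., 10, the same values as the folding column of Table 1, which is what Theorem 5 asserts.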