Report No. UIUCDCS-R-78-916                                UILU-ENG 78 1709

METHODS FOR AUTOMATIC ERROR ANALYSIS OF NUMERICAL ALGORITHMS*

by

John Leonard Larson

April 1978

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the National Science Foundation under Grant No. US NSF-MCS75-21758 and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, April 1978.

METHODS FOR AUTOMATIC ERROR ANALYSIS OF NUMERICAL ALGORITHMS

John Leonard Larson, Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign, 1978

Using Bauer's approach to relative error propagation, methods are presented for performing various error analyses on numerical algorithms. A technique is developed which generates a system of equations employed by the error analyses, and requires an order of magnitude less time and storage than present techniques. Accurate analyses are attempted by working in the ∞-norm, and require the solution of minimax problems. A heuristic approximation method is also described. These methods are compared with the 2-norm approximation methods of Miller. The error analyses provide alternative criteria by which algorithms that solve the same problem may be compared. Additionally, the analysis of a composite algorithm, which is made up of concatenated sub-algorithms, is given in terms of analyses done on its parts. Finally, by using a forward analysis, limited algorithm improvement is obtained by the location of sensitive operations and the substitution of mathematically equivalent expressions.

Acknowledgments

I would like to express my sincere appreciation to my thesis advisor, Professor Ahmed H. Sameh, for his guidance, insight, encouragement, and friendship. I would also like to thank the members of my thesis committee for their interest and comments: Professors David Kuck, Dave Liu, Paul Saylor, and Robert Skeel. Additional thanks go to Professor Webb Miller for providing his software routines and for his interest in my work. I am grateful to the National Science Foundation and the Department of Computer Science for their financial support. Thanks are due to Ms. Cathy Gallion for an excellent job of typing. Finally, I would like to thank my wife, Emmylou, for her love, patience and encouragement, and for her skill in preparing the illustrations.

Table of Contents

1. Introduction
2. Preliminaries
3. Roundoff Errors
4. Efficient Calculation of M̄
5. Error Analyses
   5.1. Forward Analysis
   5.2. Backward Analysis
   5.3. B-analysis
6. Computational Details
   6.1. The Form of g*
   6.2. Simplification of Polytope Representation
   6.3. Algorithms for g*
7. Comparison of Algorithms
8. Composite Algorithms
9. Algorithm Improvement
10. Conclusion
Bibliography
Appendix
Vita

1. Introduction

New algorithms are constantly being proposed for the solution of numerical problems.
In particular, these include algorithms which exploit the capabilities of new and emerging parallel and pipeline machines. Unfortunately, this enthusiasm has not been matched by the development of systematic ways to evaluate the numerical properties of these new algorithms. In fact, until recently [Miller, 8-12] little work has been done in the area of automating error analysis for existing sequential algorithms. Besides J. H. Wilkinson, few people have the time or inclination to make tedious error analyses of the algorithms they use. Error analysis needs to be organized in such a way that it is easily done by machine.

This thesis is concerned with the development and implementation of methods for automatic error analysis of numerical algorithms. This work parallels that of Miller [10], but takes a different approach to the analysis of numerical algorithms. His development is from an absolute error point of view, while this thesis is concerned with relative errors, a natural approach considering that floating point arithmetic is defined in terms of relative error. Perhaps the greatest difference in the approaches lies in their goals. While he strives to find data for which an approximate analysis indicates that the algorithm under investigation performs poorly, this thesis strives to give an accurate analysis of the algorithm for some given data. Each goal has its place depending on the need for the analysis.

Section 2 begins with a summary of previous work, and shows how roundoff errors affect computations. The algorithm is represented as a directed graph: each numerical quantity is a node, and directed arcs lead from operands to results. During the execution of the algorithm, the value at each node is subject to roundoff error, called local relative error, as described in section 3. The local relative error in a node subsequently affects the values of other nodes. In particular, it affects those nodes which use the contaminated node as an operand. The magnitude of the effect on one of these nodes is reflected by a number which weights the arc between it and the contaminated node. Moreover, the local relative error in one node affects the value of another node if there exists a directed path between them, and the effect of the local relative error is the product of the arc weights along this path. The effects of all local relative errors combine to produce a total relative error in each node.

The definition of total relative error, when described in this way, is called the global definition, since all the nodes in the graph are inspected. An alternative formulation, called the local definition, recursively defines the total relative error in a node in terms of the total relative errors in its operands. Each definition gives rise to a system of equations which relates the local and total relative errors. The system corresponding to the global definition is dense, while the local definition, as expected, produces an exceedingly sparse system, with no more than four non-zero elements per row. The equivalence of these two systems is shown.

Either of these systems could be used to perform various error analyses. However, only those rows of the dense system corresponding to output nodes need be used. Current methods generate the entire dense system, and then discard all but the desired rows. Section 4 derives an algorithm which relies on the equivalence relation and computes only those rows which are needed by solving sparse upper triangular systems.
A comparison of this method with an existing alternative shows an order of magnitude savings in both time and storage.

Section 5 poses the problems to be solved in order to perform forward, backward, and B-analysis. A forward analysis determines relative error bounds for the computed solution. This bound may be easily computed from the system relating the local and total relative errors. A backward analysis determines the stability of an algorithm by finding a bound on relative input perturbations such that the computed solution satisfies the perturbed problem. This bound is found by comparing the set of output relative error vectors for all computed solutions with the set produced by local relative errors in only the input nodes. Conditions are specified under which a backward analysis is not possible. A B-analysis is an alternative stability measurement which determines a bound on relative input and output perturbations such that the computed solution is near the exact solution of a perturbed problem. This bound is found by comparing the set of output relative error vectors for all computed solutions with the set generated by local relative errors in only the input and output nodes. An advantage of the B-analysis is that it may always be done.

Section 6 analyzes the problems posed in section 5. The problems for backward and B-analysis are identified as minimax problems, whose solutions are often tedious to obtain. These problems involve the expansion of a polytope (a centrally symmetric convex set) to cover another polytope. The polytopes represent the sets of output relative error vectors described above. Two exhaustive methods are described. The first method searches the vertices of the polytope being covered for the vertex of maximal relative distance. The second method searches the faces of the polytope doing the covering. Each method makes use of the particular form of the solution. Additionally, a close inspection of the problems reveals that their complexity may often be significantly reduced; this is reflected in a reduction of the number of columns in the system which relates local and total relative errors. Nevertheless, the computation time of the exhaustive methods precludes their use for all but small problems. An approximation method is also given which computes a lower bound on the solution and occasionally produces the exact solution.

The remaining sections investigate several topics related to the main theme. Section 7 examines various ways by which algorithms may be compared. The first is in terms of a forward analysis, where one algorithm is preferable to another if it can guarantee a smaller relative error bound for the solution. This method is equivalent to comparing, for each algorithm, the set of all computed solution relative error vectors with the unit cube. A second method, proposed by Miller [8], compares the sets of output relative error vectors for each algorithm against one another. The results of this comparison may indicate that in some sense each algorithm is preferable to the other. This dilemma is overcome by a third method which, like the first method, uses a common standard against which the algorithms are compared. Here, the standard is the problem which the algorithms solve, represented by the set of output relative error vectors resulting from local relative errors in only the input and output nodes. Several examples are given.
A composite algorithm is one which is made up of smaller sub-algorithms, where the outputs of one sub-algorithm serve as the inputs to the next. Section 8 shows how a composite algorithm may be analyzed if analyses have been performed on its parts. The relationship between the matrices of coefficients of the systems which relate local and total relative error in the composite algorithm and in the individual sub-algorithms forms the basis upon which the analysis of the composite algorithm may be done. Derivations are presented which provide bounds for forward, backward, and B-analysis of the composite algorithm in terms of corresponding analyses done on its sub-algorithms.

Finally, section 9 scratches the surface of the problem of algorithm improvement. The goal taken is to reduce the forward analysis error bounds, since by doing so bounds for other analyses should also be reduced. Algorithm improvement consists of two parts: locating where changes to the algorithm should be made, and deciding what changes should be incorporated. It is shown that operations that have large weights on the arcs leading from their nodes are prime candidates for modification. A change in the algorithm involving these nodes may take the form of a mathematical identity for which the best alternative may be easily computed. Several basic identities are investigated. This technique is limited in its effectiveness by the fact that the computational graph does not contain all the information about the algorithm or the problem it solves.

2. Preliminaries

This section presents the basic definitions and concepts provided by Bauer [4], and clarified and extended by Larson, Sameh [6].

Definition 2.1: problem - a set X of n input variables, $\xi_i$, $1 \le i \le n$, and a set Y of m output variables, $\eta_i$, $1 \le i \le m$, with $\eta_i = f_i(\xi_1, \xi_2, \ldots, \xi_n)$.

Definition 2.2: algorithm - any finite sequence of elementary operations of the form $\xi_i = \xi_j * \xi_k$, where $*$ is one of $(+, -, \times, \div)$, which mathematically realize the elements of Y.

Definition 2.3: computational graph - a directed graph, essentially a parse tree of the algorithm, having the $\xi_i$ as nodes and directed arcs from $\xi_j$ and $\xi_k$ to $\xi_i$.

The computational graph contains a total of r nodes, consisting of n input nodes, $\xi_i$, $1 \le i \le n$, and r-n computational nodes, $\xi_{n+i}$, $1 \le i \le r-n$, of which m nodes are designated as output nodes, i.e., $\eta_i = \xi_{\omega_i}$, $1 \le i \le m$. The variable $\omega_i$ is used to index the outputs. The nodes are numbered in such a way that if $\xi_i = \xi_j * \xi_k$, then $j < i$ and $k < i$. The value of each node is assumed to be non-zero.

The following model will be used from time to time to illustrate the concepts being presented.

problem: X = {ξ1, ξ2, ξ3, ξ4}, n = 4; Y = {η1, η2}, m = 2;
  η1 = (ξ1 + ξ2)/(ξ2 - ξ3),  η2 = (ξ2 - ξ3)(ξ3 + ξ4)

algorithm: ξ5 = ξ1 + ξ2,  ξ6 = ξ2 - ξ3,  ξ7 = ξ3 + ξ4,  ξ8 = ξ5/ξ6,  ξ9 = ξ6 · ξ7,
  where η1 = ξ8, η2 = ξ9, ω1 = 8, ω2 = 9, r-n = 5, r = 9.

computational graph: [Figure: the computational graph of the model algorithm.] Subsequent graphs will employ implicit downward arrows.

Definition 2.4: partial relative derivative of $\xi_i$ with respect to $\xi_\ell$ - a quantity, $\rho\xi_i/\rho\xi_\ell$, representing the first order relative effect on the value of $\xi_i$ assuming a relative change in the value of $\xi_\ell$, given by

$$\rho\xi_i/\rho\xi_\ell = (\xi_\ell/\xi_i)(\partial\xi_i/\partial\xi_\ell) \qquad (2.1)$$

Sterbenz [17] calls these quantities magnification factors.

The arcs of the computational graph are weighted. The weight of the arc from $\xi_j$ to $\xi_i$ is $\rho\xi_i/\rho\xi_j$. Table 1 gives the arc weights for the elementary operations.

Table 1. Arc Weights (for ξ_i = ξ_j * ξ_k)

  *     ρξ_i/ρξ_j    ρξ_i/ρξ_k
  +     ξ_j/ξ_i      ξ_k/ξ_i
  -     ξ_j/ξ_i     -ξ_k/ξ_i
  ×     1            1
  ÷     1           -1
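In code, Table 1 is a four-way branch. The sketch below is a minimal illustration (it is not from the thesis); it evaluates both arc weights of a node from the values of its operands and its result, and is reused in the later sketches.

    def apply_op(op, a, b):
        """Evaluate an elementary operation."""
        return {'+': a + b, '-': a - b, '*': a * b, '/': a / b}[op]

    def arc_weights(op, xi_j, xi_k, xi_i):
        """Table 1: the weights (rho xi_i/rho xi_j, rho xi_i/rho xi_k)
        on the arcs into xi_i = xi_j op xi_k."""
        if op == '+':
            return xi_j / xi_i, xi_k / xi_i
        if op == '-':
            return xi_j / xi_i, -xi_k / xi_i
        if op == '*':
            return 1.0, 1.0
        if op == '/':
            return 1.0, -1.0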
The calculation of $\rho\xi_i/\rho\xi_\ell$ using (2.1) when $\xi_\ell$ is not a direct operand of $\xi_i$ is complicated by the evaluation of $\partial\xi_i/\partial\xi_\ell$ by the chain rule. An alternative calculation makes use of the following definition and theorem.

Definition 2.5: path product - the product of the partial relative derivatives belonging to each arc on a directed path between two nodes.

Theorem 2.1: $\rho\xi_i/\rho\xi_\ell$ is equal to the sum of the path products for all paths from $\xi_\ell$ to $\xi_i$.

Proof: The chain rule may be stated as

$$\partial\xi_i/\partial\xi_\ell = \sum_{p=1}^{s} \prod_{q=1}^{u_p} \partial\xi_{f(p,q)}/\partial\xi_{f(p,q+1)} \qquad (2.2)$$

where s is the number of distinct directed paths from $\xi_\ell$ to $\xi_i$; $u_p$ is the number of arcs in path p; $u_p + 1$ is the number of nodes in path p; and $f(p,q)$ is a function which describes the paths from $\xi_\ell$ to $\xi_i$ in the following manner: for each path p, $f(p,1) = i$; $\xi_{f(p,q+1)}$ is an operand of $\xi_{f(p,q)}$; and $f(p,u_p+1) = \ell$. Consider two adjacent nodes, say $\xi_{f(p,q+1)}$ and $\xi_{f(p,q)}$, on one of these paths. From (2.1), the weight of the arc between these nodes is

$$\rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)} = (\xi_{f(p,q+1)}/\xi_{f(p,q)})(\partial\xi_{f(p,q)}/\partial\xi_{f(p,q+1)}).$$

Thus,

$$\partial\xi_{f(p,q)}/\partial\xi_{f(p,q+1)} = (\xi_{f(p,q)}/\xi_{f(p,q+1)})(\rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)}) \qquad (2.3)$$

Substituting (2.3) in (2.2) gives an expression for the partial relative derivative of $\xi_i$ with respect to $\xi_\ell$, and simplification by telescoping cancellation of the factors $\xi_{f(p,q)}/\xi_{f(p,q+1)}$ results in

$$\rho\xi_i/\rho\xi_\ell = \sum_{p=1}^{s} \prod_{q=1}^{u_p} \rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)} \qquad (2.4)$$

Thus, $\rho\xi_i/\rho\xi_\ell$ is the sum of all path products. ∎

Trivially, $\rho\xi_i/\rho\xi_i = 1$, and if there is no directed path from $\xi_\ell$ to $\xi_i$, then $\rho\xi_i/\rho\xi_\ell = 0$. Formula (2.4) may also be expressed recursively. If $\xi_i = \xi_j * \xi_k$, then

$$\rho\xi_i/\rho\xi_\ell = (\rho\xi_i/\rho\xi_j)(\rho\xi_j/\rho\xi_\ell) + (\rho\xi_i/\rho\xi_k)(\rho\xi_k/\rho\xi_\ell) \qquad (2.5)$$

This follows from the fact that any path from $\xi_\ell$ to $\xi_i$ must pass through $\xi_j$ or $\xi_k$.

For the model problem with data ξ1 = ξ2 = 1, ξ3 = ξ4 = 2, the arc weights are shown below. [Figure: the model computational graph with arc weights.] The evaluation of ρξ9/ρξ3 using (2.1) necessitates the calculation

$$\partial\xi_9/\partial\xi_3 = \xi_2 - 2\xi_3 - \xi_4 = -5,$$

from which follows

$$\rho\xi_9/\rho\xi_3 = (\xi_3/\xi_9)(\partial\xi_9/\partial\xi_3) = (2/(-4))(-5) = 5/2.$$

Alternatively, formula (2.4) gives

$$\rho\xi_9/\rho\xi_3 = (\rho\xi_9/\rho\xi_6)(\rho\xi_6/\rho\xi_3) + (\rho\xi_9/\rho\xi_7)(\rho\xi_7/\rho\xi_3) = 1\cdot 2 + 1\cdot(1/2) = 5/2.$$
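The recursion (2.5) translates directly into code. Continuing the sketch above (again an illustration only), the model problem's graph is evaluated at the data ξ1 = ξ2 = 1, ξ3 = ξ4 = 2, and ρξ9/ρξ3 is accumulated by (2.5), reproducing the value 5/2.

    # Model problem: xi1..xi4 are inputs; graph[i] = (op, j, k) for computed nodes.
    val = {1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0}
    graph = {5: ('+', 1, 2), 6: ('-', 2, 3), 7: ('+', 3, 4),
             8: ('/', 5, 6), 9: ('*', 6, 7)}
    for i in sorted(graph):
        op, j, k = graph[i]
        val[i] = apply_op(op, val[j], val[k])

    def prd(i, l):
        """Partial relative derivative rho xi_i / rho xi_l, by recursion (2.5)."""
        if i == l:
            return 1.0                  # rho xi_i / rho xi_i = 1
        if i not in graph:
            return 0.0                  # no directed path from xi_l to xi_i
        op, j, k = graph[i]
        w_j, w_k = arc_weights(op, val[j], val[k], val[i])
        return w_j * prd(j, l) + w_k * prd(k, l)

    print(prd(9, 3))                    # 2.5, agreeing with the worked example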
3. Roundoff Errors

Floating point computation is subject to error due to the finite word length. Wilkinson [19] provides the bounds for representation and arithmetic:

$$fl(\xi_i) = \xi_i(1 + \epsilon_i),\quad 1 \le i \le n,\qquad fl(\xi_j * \xi_k) = (\xi_j * \xi_k)(1 + \epsilon_i),\quad n+1 \le i \le r, \qquad (3.1)$$

where $|\epsilon_i| \le \epsilon$, the relative machine precision. The relative error introduced at node $\xi_i$ is written $\epsilon_i = [\rho\xi_i]_L \cdot \epsilon$ with $|[\rho\xi_i]_L| \le 1$, and $[\rho\xi_i]_L$ is called the local relative error in $\xi_i$. The analysis is carried to first order, terms of order $\epsilon^2$ being neglected, and explicit mention of $\epsilon$ is henceforth dropped. To first order, the local relative errors accumulate into a total relative error in each node,

$$[\rho\xi_i]_T = [\rho\xi_i]_L + \sum_{\ell=1}^{i-1} (\rho\xi_i/\rho\xi_\ell)[\rho\xi_\ell]_L \qquad (3.2)$$

The formulation of total relative error, (3.2), gives rise to an underdetermined homogeneous system of equations, Mz = 0, or

$$[-I_{r-n},\, A,\, B]\,(t^t, d^t, g^t)^t = 0 \qquad (3.3)$$

where $I_{r-n}$ is the identity matrix of order r-n; A is an (r-n)×n matrix with elements $a_{ij} = \rho\xi_{n+i}/\rho\xi_j$; B is an (r-n)×(r-n) unit lower triangular matrix with elements $\beta_{ij} = \rho\xi_{n+i}/\rho\xi_{n+j}$ below the diagonal; t is an (r-n)-vector with elements $\tau_i = [\rho\xi_{n+i}]_T$; d is an n-vector with elements $\delta_j = [\rho\xi_j]_L$; and finally, g is an (r-n)-vector with elements $\gamma_i = [\rho\xi_{n+i}]_L$.

System (3.3) may be used to perform an error analysis, e.g., forward, backward, etc., as in a following chapter. However, these analyses seek a relationship of errors between inputs and outputs, and between computational nodes and outputs. Hence, only those equations of (3.3) corresponding to output nodes need be considered, i.e., for 1 ≤ i ≤ m,

$$[\rho\eta_i]_T = \sum_{\ell=1}^{r} (\rho\eta_i/\rho\xi_\ell)[\rho\xi_\ell]_L \qquad (3.4)$$

where $\eta_i = \xi_{\omega_i}$. This produces the condensed system, M̄z = 0, or

$$[-I_m,\, \bar A,\, \bar B]\,(\bar t^t, d^t, g^t)^t = 0 \qquad (3.5)$$

where $I_m$ is the identity matrix of order m; Ā is an m×n matrix with elements $\bar a_{ij} = \rho\eta_i/\rho\xi_j$; B̄ is an m×(r-n) matrix with elements $\bar b_{ij} = \rho\eta_i/\rho\xi_{n+j}$; t̄ is an m-vector with elements $\bar\tau_i = [\rho\eta_i]_T$; and d and g are as previously defined.

The elements of M may be calculated from the formula for the partial relative derivative in (2.5). An obvious method forms the entries row by row, the partial relative derivatives on the right-hand side of (2.5) having already been computed. The matrix M̄ can then be formed by selecting the rows of A and B in M corresponding to output nodes, and appending $-I_m$. Since m is usually much less than r-n, i.e., few of the computational nodes are actually output nodes, much of the work done in forming M concerns rows which will eventually be discarded. An equivalent system which will allow the direct calculation of M̄ will now be derived.

Substituting (2.5) in (3.2), with $\xi_i = \xi_j * \xi_k$, gives

$$[\rho\xi_i]_T = [\rho\xi_i]_L + (\rho\xi_i/\rho\xi_j)\sum_{\ell=1}^{i-1}(\rho\xi_j/\rho\xi_\ell)[\rho\xi_\ell]_L + (\rho\xi_i/\rho\xi_k)\sum_{\ell=1}^{i-1}(\rho\xi_k/\rho\xi_\ell)[\rho\xi_\ell]_L.$$

From the node numbering scheme, $\rho\xi_j/\rho\xi_\ell = 0$ if $\ell > j$, and $\rho\xi_k/\rho\xi_\ell = 0$ if $\ell > k$; i.e., a node cannot be affected by a computation which is performed after its execution. Hence, changing the upper limits of both sums and substituting (3.2) results in

$$[\rho\xi_i]_T = [\rho\xi_i]_L + (\rho\xi_i/\rho\xi_j)[\rho\xi_j]_T + (\rho\xi_i/\rho\xi_k)[\rho\xi_k]_T \qquad (3.6)$$

In (3.6), the effects of previous local relative errors have been accumulated in the operands of $\xi_i$. Formula (3.6) is a local relationship of errors, since it concerns only a node and its operands. In contrast, one may call (3.2) a global relationship, since all previous nodes are inspected. Thus, formula (3.6) produces the system M̃z = 0, or

$$[\tilde B,\, \tilde A,\, I_{r-n}]\,(t^t, d^t, g^t)^t = 0 \qquad (3.7)$$

where B̃ is an (r-n)×(r-n) lower triangular matrix with elements $\tilde b_{ij} = \rho\xi_{n+i}/\rho\xi_{n+j}$ if $\xi_{n+j}$ is an operand of $\xi_{n+i}$ and zero otherwise, and $\tilde b_{ii} = -1$; Ã is an (r-n)×n matrix with elements $\tilde a_{ij} = \rho\xi_{n+i}/\rho\xi_j$ if $\xi_j$ is an operand of $\xi_{n+i}$, and zero otherwise; and $I_{r-n}$ and z are as previously defined.

The analyses could use either Mz = 0 or M̃z = 0. The reduction of Mz = 0 to M̄z = 0 is done to economize time and storage. No such reduction is possible with M̃z = 0, since this system specifies relationships at a local level, and to remove rows from it would result in the loss of the global relationships which the analyses seek. Both systems specify a relationship between any and all errors and their effects. The relationship specified in each system is the same, since by comparing (3.3) and (3.7) and noting that B̃ is nonsingular, it must be that

$$M = (-\tilde B)^{-1}\tilde M \qquad (3.8)$$

The systems Mz = 0, M̃z = 0, and M̄z = 0 for the model are given below. Row and column headings are provided as an aid in identifying matrix elements. Under each system, the equation relating $[\rho\xi_9]_T$ to the other errors is shown to illustrate the difference in complexity among the systems.
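The identity (3.8) can be checked numerically. A minimal sketch (not part of the thesis), using the model-problem matrices displayed just below:

    import numpy as np

    # Model problem at xi1 = xi2 = 1, xi3 = xi4 = 2 (see the displays below).
    A  = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, -1.0, 2.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5], [0.5, 1.5, -2.0, 0.0],
                   [0.0, -1.0, 2.5, 0.5]])
    B  = np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0],
                   [1, -1, 0, 1, 0], [0, 1, 1, 0, 1]], dtype=float)
    Bt = np.array([[-1, 0, 0, 0, 0], [0, -1, 0, 0, 0], [0, 0, -1, 0, 0],
                   [1, -1, 0, -1, 0], [0, 1, 1, 0, -1]], dtype=float)
    At = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, -1.0, 2.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0]])
    I5 = np.eye(5)
    M  = np.hstack([-I5, A, B])          # global system (3.3)
    Mt = np.hstack([Bt, At, I5])         # local system (3.7)
    # Equivalence (3.8): M = (-B-tilde)^{-1} M-tilde
    print(np.allclose(np.linalg.solve(-Bt, Mt), M))   # True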
Mz = 0:

          ξ5  ξ6  ξ7  ξ8  ξ9 |  ξ1   ξ2   ξ3   ξ4 | ξ5  ξ6  ξ7  ξ8  ξ9
    ξ5 [  -1   0   0   0   0 | 1/2  1/2    0    0 |  1   0   0   0   0 ]
    ξ6 [   0  -1   0   0   0 |   0   -1    2    0 |  0   1   0   0   0 ]
    ξ7 [   0   0  -1   0   0 |   0    0  1/2  1/2 |  0   0   1   0   0 ]
    ξ8 [   0   0   0  -1   0 | 1/2  3/2   -2    0 |  1  -1   0   1   0 ]
    ξ9 [   0   0   0   0  -1 |   0   -1  5/2  1/2 |  0   1   1   0   1 ]

with $z = ([\rho\xi_5]_T, \ldots, [\rho\xi_9]_T,\ [\rho\xi_1]_L, \ldots, [\rho\xi_4]_L,\ [\rho\xi_5]_L, \ldots, [\rho\xi_9]_L)^t$, and

$$[\rho\xi_9]_T = -[\rho\xi_2]_L + \tfrac{5}{2}[\rho\xi_3]_L + \tfrac{1}{2}[\rho\xi_4]_L + [\rho\xi_6]_L + [\rho\xi_7]_L + [\rho\xi_9]_L$$

M̃z = 0:

          ξ5  ξ6  ξ7  ξ8  ξ9 |  ξ1   ξ2   ξ3   ξ4 | ξ5  ξ6  ξ7  ξ8  ξ9
    ξ5 [  -1   0   0   0   0 | 1/2  1/2    0    0 |  1   0   0   0   0 ]
    ξ6 [   0  -1   0   0   0 |   0   -1    2    0 |  0   1   0   0   0 ]
    ξ7 [   0   0  -1   0   0 |   0    0  1/2  1/2 |  0   0   1   0   0 ]
    ξ8 [   1  -1   0  -1   0 |   0    0    0    0 |  0   0   0   1   0 ]
    ξ9 [   0   1   1   0  -1 |   0    0    0    0 |  0   0   0   0   1 ]

$$[\rho\xi_9]_T = [\rho\xi_9]_L + [\rho\xi_6]_T + [\rho\xi_7]_T$$

M̄z = 0:

          η1  η2 |  ξ1   ξ2   ξ3   ξ4 | ξ5  ξ6  ξ7  ξ8  ξ9
    η1 [  -1   0 | 1/2  3/2   -2    0 |  1  -1   0   1   0 ]
    η2 [   0  -1 |   0   -1  5/2  1/2 |  0   1   1   0   1 ]

$[\rho\xi_9]_T$ is as given for Mz = 0.

4. Efficient Calculation of M̄

From (3.8), the relationship between M and M̃, one can devise an efficient scheme for forming Ā and B̄ directly from Ã and B̃ without forming A and B. Let C = [A, B] and C̄ = [Ā, B̄], and let $\bar c_i^t$ be the i-th row of C̄ and $c_j^t$ the j-th row of C, where n+j = ω_i, so that $\bar c_i = c_j$. Since $C = (-\tilde B)^{-1}[\tilde A, I_{r-n}]$, it follows that

$$\bar c_i^t = e_j^t C = e_j^t(-\tilde B)^{-1}[\tilde A,\, I_{r-n}],$$

where $e_j$ is the j-th column of the identity matrix of order r-n. Post-multiplying this equation by the nonsingular matrix

$$\begin{bmatrix} I_n & 0 \\ -\tilde A & -\tilde B \end{bmatrix}$$

and transposing gives

$$\begin{bmatrix} I_n & -\tilde A^t \\ 0 & -\tilde B^t \end{bmatrix}\,\bar c_i = e'_{n+j},$$

where $e'_{n+j}$ is the (n+j)-th column of the identity matrix of order r. This unit upper triangular system may be solved in an efficient manner by drawing on the sparseness of Ã and B̃. Let R be the coefficient matrix, $s = \bar c_i$, and $f = e'_{n+j}$. Then initially,

$$\begin{bmatrix} R_{r-1} & u_r \\ 0 & 1 \end{bmatrix}\begin{bmatrix} s_{r-1} \\ \sigma_r \end{bmatrix} = \begin{bmatrix} f_{r-1} \\ \varphi_r \end{bmatrix}.$$

The value of $\sigma_r$ is immediately known, and only the deflated system

$$R_{r-1}\, s_{r-1} = f_{r-1} - \sigma_r u_r$$

need be considered in order to find the remaining σ's. This system is of the same form as the original, and the same technique may be employed to find $\sigma_{r-1}$, etc. Each update of the right-hand side is particularly simple. At step k, $r \ge k \ge n+1$, the vector $u_k$ is a subvector of the (k-n)-th row of $[-\tilde A, -\tilde B]$, and therefore is zero except for two elements. Let $u_k^t = (\mu_{k1}, \mu_{k2}, \ldots, \mu_{k,k-1})$. If $\xi_k = \xi_p * \xi_q$, then the non-zero elements of $u_k$ are $\mu_{kp} = -(\rho\xi_k/\rho\xi_p)$ and $\mu_{kq} = -(\rho\xi_k/\rho\xi_q)$. Thus, the following algorithm computes the i-th row of C̄:

  1. $f^{(n+j)} = e'_{n+j}$, where $n+j = \omega_i$.
  2. For $k = n+j, n+j-1, \ldots, n+1$: set $f^{(k-1)} = f^{(k)}$, except
       $\varphi_p^{(k-1)} = \varphi_p^{(k)} + \varphi_k^{(k)}(\rho\xi_k/\rho\xi_p)$,
       $\varphi_q^{(k-1)} = \varphi_q^{(k)} + \varphi_k^{(k)}(\rho\xi_k/\rho\xi_q)$,                  (4.1)
     where $\xi_k = \xi_p * \xi_q$ and $f^{(k)} = (\varphi_i^{(k)})$, $1 \le i \le r$.
  3. $\bar c_i = f^{(n)}$.

The algorithm starts with $f^{(n+j)}$ rather than $f^{(r)}$ since $\varphi_i^{(r)} = 0$ for $i > n+j$, and hence $f^{(n+j)} = f^{(r)}$. The algorithm ends with $f^{(n)}$ rather than $f^{(1)}$ since after these j steps the remaining coefficient matrix is $I_n$, and the leading components need no further processing.

An algorithm similar to (4.1) has been proposed by Linnainmaa [7] for the determination of the Taylor coefficients of accumulated absolute rounding error.

The algorithm is related to formula (2.4) on path products. This relationship is based on the observation that any path from, say, $\xi_\ell$ to $\xi_{n+j}$ must pass through some node $\xi_s$, where $\xi_s$ has $\xi_\ell$ as one of its operands. Let S be the set of all such indices s. Then (2.4) may be expressed as

$$\rho\xi_{n+j}/\rho\xi_\ell = \sum_{s\in S} (\rho\xi_{n+j}/\rho\xi_s)(\rho\xi_s/\rho\xi_\ell).$$

At each step k = s, s ∈ S, algorithm (4.1) adds the term $\varphi_s^{(s)}(\rho\xi_s/\rho\xi_\ell)$ to $\varphi_\ell$, where $\varphi_s^{(s)}$ has already accumulated the value $\rho\xi_{n+j}/\rho\xi_s$; the algorithm is thus a systematic accumulation of path products.
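A sketch of (4.1) in Python (an illustration, not the thesis's implementation): the back-substitution runs over the computational nodes in reverse order, touching only the two operand positions at each step. Applied to the model problem with ω1 = 8, it reproduces the η1 row of [Ā, B̄].

    def row_of_Cbar(n, r, graph, weights, out_node):
        """Algorithm (4.1): the row of C-bar = [A-bar, B-bar] for output
        node out_node = n+j. graph[k] = (p, q); weights[k] = the two arc
        weights (rho xi_k/rho xi_p, rho xi_k/rho xi_q)."""
        f = [0.0] * (r + 1)                 # 1-indexed work vector
        f[out_node] = 1.0                   # step 1: f = e'_{n+j}
        for k in range(out_node, n, -1):    # step 2: sparse back-substitution
            p, q = graph[k]
            w_p, w_q = weights[k]
            f[p] += f[k] * w_p
            f[q] += f[k] * w_q
        return f[1:]                        # step 3: A-bar part, then B-bar part

    # Model problem arc weights at xi1 = xi2 = 1, xi3 = xi4 = 2 (Table 1):
    graph   = {5: (1, 2), 6: (2, 3), 7: (3, 4), 8: (5, 6), 9: (6, 7)}
    weights = {5: (0.5, 0.5), 6: (-1.0, 2.0), 7: (0.5, 0.5),
               8: (1.0, -1.0), 9: (1.0, 1.0)}
    print(row_of_Cbar(4, 9, graph, weights, 8))
    # [0.5, 1.5, -2.0, 0.0, 1.0, -1.0, 0.0, 1.0, 0.0]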

This scheme may be compared with the generation of the corresponding system in Miller's absolute-error analysis [10], which is built from the partial derivatives $\partial\xi_{n+i}/\partial\epsilon_\ell$ of the computed values with respect to the individual rounding errors $\epsilon_\ell$. Table 2 lists the evaluation of $\partial\xi_{n+i}/\partial\epsilon_\ell$ for the elementary operations.

Table 2. Evaluation of ∂ξ_{n+i}/∂ε_ℓ  [table entries illegible in the source]

Note that $\partial\xi_{n+i}/\partial\epsilon_{n+i} = \xi_{n+i}$. Computation proceeds row by row, the partial derivatives in the table having already been computed. Assuming each node type is equiprobable, one performs an average of 2.5 operations to evaluate each of the (r+n)(r-n)/2 elements of the absolute-error analogues of A and B. Rows corresponding to output nodes are subsequently used to perform the analyses. The storage required is r(r-n) for the pair of matrices. In comparison to generating M̄ from M, this technique takes 5/6 of the time and the same storage. With respect to generating M̄ by algorithm (4.1), it requires 5(r+n)/(16m) of the time and r(r-n)/(r(m+2)-2n) of the storage.

However, algorithm (4.1) may be modified to compute the elements of the absolute-error matrices. Since $\partial\xi_{n+i}/\partial\epsilon_\ell = \xi_\ell(\partial\xi_{n+i}/\partial\xi_\ell)$, the "path product" property of the partial derivatives in (2.2) may be employed to compute the matrix elements to within a scalar factor. The modifications to algorithm (4.1) entail replacing $\rho\xi_k/\rho\xi_p$ by $\partial\xi_k/\partial\xi_p$ and $\rho\xi_k/\rho\xi_q$ by $\partial\xi_k/\partial\xi_q$, and including a final scalar multiplication step. The modified algorithm would use Table 3 to calculate the arc weights.

Table 3. Arc Weights for Miller (for ξ_i = ξ_j * ξ_k)

  *     ∂ξ_i/∂ξ_j    ∂ξ_i/∂ξ_k
  +     1            1
  -     1           -1
  ×     ξ_k          ξ_j
  ÷     1/ξ_k       -ξ_i/ξ_k

While the computations required by Table 3 are fewer than those of Table 1, the final scalar multiplication step in the modified algorithm outweighs this and results in an overall increase of up to 25% over the time to execute algorithm (4.1). The storage required remains the same.

The generation of the equivalent of M̄ in Miller [10] for several algorithms was timed on a 360/75. The results are summarized in Table 4. Many of the algorithms are described in Miller [12]. Givens/backsub reduces a matrix to upper triangular form using Givens plane rotations, and solves the resulting system by backsubstitution.

Table 4. Timed Comparison (times in seconds)

  Algorithm                               n    r-n   m    Miller [10]   Miller w/(4.1)   Speedup
  Givens(QA=R)/backsub(Rx=Qf) (4x4)       20   172   4    0.98          0.085            11.5
  Least Squares (4x3)                     16   116   3    0.45          0.046            9.8
  Givens(QA=R) (4x4)                      16   120   10   0.49          0.092            5.3
  Cholesky(A=LL^t)(Ly=f)(L^t x=y) (4x4)   14   62    4    0.16          0.030            5.3
  RINVERSE (4x4 UT)                       10   30    10   0.042         0.010            4.2

5. Error Analyses

5.1. Forward Analysis

A forward analysis finds a bound on the relative error in the solution, $|[\rho\eta_i]_T|$, $1 \le i \le m$. Depending on the assumptions made, the relative error in $\eta_i$ may be the result of inexact data, inexact arithmetic, or both. From the system M̄z = 0, (3.5),

$$\bar t = \bar A d + \bar B g \qquad (5.1)$$

or, in terms of the matrix and vector elements, for 1 ≤ i ≤ m,

$$[\rho\eta_i]_T = \sum_{j=1}^{n}(\rho\eta_i/\rho\xi_j)[\rho\xi_j]_L + \sum_{j=n+1}^{r}(\rho\eta_i/\rho\xi_j)[\rho\xi_j]_L \qquad (5.2)$$

Considering for the moment any single output, $\eta_i$, the relative error in $\eta_i$ can be maximized by choosing the local relative errors, $[\rho\xi_j]_L$, $1 \le j \le r$, in the following manner:

$$[\rho\xi_j]_L = \begin{cases} 1 & \text{if } \rho\eta_i/\rho\xi_j > 0 \\ c,\ |c| \le 1 & \text{if } \rho\eta_i/\rho\xi_j = 0 \\ -1 & \text{if } \rho\eta_i/\rho\xi_j < 0 \end{cases} \qquad (5.3)$$

This choice of the local relative errors guarantees that $[\rho\eta_i]_T$ takes on its maximum positive value. The maximum negative value of $[\rho\eta_i]_T$ has the same magnitude, and corresponds to changing the signs in (5.3). Without loss of generality, the positive value will be used. The choice of local relative errors in (5.3) reduces (5.2) to

$$[\rho\eta_i]_T = \sum_{j=1}^{n}|\rho\eta_i/\rho\xi_j| + \sum_{j=n+1}^{r}|\rho\eta_i/\rho\xi_j| \qquad (5.4)$$

where a possibly different set of local relative errors is chosen for each $\eta_i$. Formula (5.4) shows that the relative error in $\eta_i$ is bounded by $\|\bar a_i\|_1 + \|\bar b_i\|_1$, where $\bar a_i^t$ and $\bar b_i^t$ are the i-th row vectors of Ā and B̄.
The vector of relative error bounds, for all outputs, obtained from (5.4) is a pessimistic one in that, generally, the bounds cannot all be attained simultaneously.

Recall the matrices Ā and B̄ for the model problem:

          ξ1   ξ2   ξ3   ξ4        ξ5  ξ6  ξ7  ξ8  ξ9
  Ā = η1 [ 1/2  3/2   -2    0 ]   B̄ = η1 [  1  -1   0   1   0 ]
      η2 [   0   -1  5/2  1/2 ]       η2 [  0   1   1   0   1 ]

Application of (5.4) gives a relative error bound of 7 for η1, corresponding to the choice of 1, 1, -1, c, 1, -1, c, 1, c for the local relative errors $[\rho\xi_j]_L$, $1 \le j \le 9$. Similarly, the choice of c, -1, 1, 1, c, 1, 1, c, 1 gives a relative error bound of 7 for η2. Recall that these values are coefficients of ε.

Equation (5.1) shows the contributions of the data error, d, and the computational error, g, to the output error. By assuming exact computation, i.e., setting g = 0, (5.1) reduces to t̄ = Ād, and (5.4) becomes

$$|[\rho\eta_i]_T| \le \sum_{j=1}^{n}|\rho\eta_i/\rho\xi_j| \qquad (5.5)$$

Definition 5.1: condition vector of the problem - the vector of output relative error bounds as computed by (5.5). The elements of this vector may be interpreted as the sensitivity of the outputs to input error. For a problem with given data, the condition vector of the problem is independent of the algorithm; this fact follows from the assumption of exact computation. For the model problem with data ξ1 = ξ2 = 1, ξ3 = ξ4 = 2, the condition vector of the problem is always (4, 4)ᵗ.

By assuming exact data, i.e., d = 0, then t̄ = B̄g, and

$$|[\rho\eta_i]_T| \le \sum_{j=n+1}^{r}|\rho\eta_i/\rho\xi_j| \qquad (5.6)$$

Definition 5.2: condition vector of the algorithm - the vector of output relative error bounds as computed by (5.6). The elements of this vector may be interpreted as the sensitivity of the outputs to computational error. Naturally, the condition vector of the algorithm varies from algorithm to algorithm. For the model problem and the given data, the algorithm employed has a condition vector of the algorithm of (3, 3)ᵗ.
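The bounds (5.4) through (5.6) are simply row sums of absolute values, so they are immediate to compute once Ā and B̄ are available. A minimal sketch for the model problem (an illustration only):

    import numpy as np

    A_bar = np.array([[0.5, 1.5, -2.0, 0.0],
                      [0.0, -1.0, 2.5, 0.5]])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0],
                      [0.0,  1.0, 1.0, 0.0, 1.0]])

    cond_problem   = np.abs(A_bar).sum(axis=1)    # (5.5): data error only
    cond_algorithm = np.abs(B_bar).sum(axis=1)    # (5.6): computational error only
    forward_bound  = cond_problem + cond_algorithm    # (5.4)

    print(cond_problem, cond_algorithm, forward_bound)
    # [4. 4.] [3. 3.] [7. 7.]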
5.2. Backward Analysis

A backward analysis attempts to find a number, Θ, which measures the numerical stability of an algorithm. An algorithm is stable if it produces a solution which is the exact solution of the problem with data $\xi_i(1 + [\rho\xi_i]_L \cdot \epsilon)$, $1 \le i \le n$, where $|[\rho\xi_i]_L| \le \Theta$. The value of Θ is found by comparing the set of output relative error vectors corresponding to any computed solution with the set of output relative error vectors produced by some perturbation of the data. The analyses in this and the next section assume that the computed solution is obtained from exact data.

Let $C_k = \{v : \|v\|_\infty \le 1\}$ be the unit cube in $R^k$ (the ∞-norm). Any vector $g = (\gamma_i)$, $\gamma_i = [\rho\xi_{n+i}]_L$, $1 \le i \le r-n$, of local relative errors in the computational nodes is an element of $C_{r-n}$, since $|\gamma_i| \le 1$. From the forward analysis, a computed solution corresponding to a given g has an output relative error vector $\bar B g \in R^m$. It follows that any computed solution must have an output relative error vector which is an element of $\bar B C_{r-n} = \{\bar B g : g \in C_{r-n}\}$. The set $\bar B C_{r-n}$ is a centrally symmetric convex polytope. This follows from the similar properties of $C_{r-n}$ and the fact that B̄ is a linear transformation from $R^{r-n}$ to $R^m$, Householder [5]. [Figure: the polytope B̄C_{r-n} for the model problem.]

In a similar fashion, the set of output relative error vectors produced by only perturbing the data may be found. Let $d = (\delta_i)$, $\delta_i = [\rho\xi_i]_L$, $1 \le i \le n$, be a vector of local relative errors in the input nodes with $|\delta_i| \le 1$. Then $d \in C_n$, and the corresponding output relative error vector for the solution is $\bar A d \in R^m$. The set of all such vectors forms $\bar A C_n = \{\bar A d : d \in C_n\}$, which is, again, a centrally symmetric convex polytope. [Figure: the polytope ĀC_n for the model problem; among its vertices are (4, -3) and (4, -4).]

To perform a backward analysis, the output relative error vector for any computed solution must be shown to be the same as one obtained by only perturbing the data. By superimposing $\bar B C_{r-n}$ and $\bar A C_n$ for the model problem, it is clear that there are output relative error vectors for computed solutions which are not output relative error vectors corresponding to input perturbations of only 1·ε. [Figure: ĀC_n and B̄C_{r-n} superimposed.] Thus, the value Θ = 1 is too small. The correct value of Θ is found by expanding (or contracting) $\bar A C_n$ so that it just covers $\bar B C_{r-n}$. This corresponds to allowing d to lie in $\Theta C_n$, where Θ > 0. Ād is then an element of the polytope $\Theta\bar A C_n$. The value sought is

$$\Theta = \mathrm{glb}\{\Theta > 0 : \bar B C_{r-n} \subseteq \Theta\bar A C_n\}.$$

For the model problem, Θ = 2.375. [Figure: ΘĀC_n just covering B̄C_{r-n}.] Thus, any computed solution of the model problem at the given data can be obtained by computing exactly with data $\xi_i(1 + [\rho\xi_i]_L \cdot \epsilon)$, $1 \le i \le 4$, where $|[\rho\xi_i]_L| \le 2.375$.

The amount by which $\bar A C_n$ must be expanded to cover $\bar B C_{r-n}$ is governed by only a few of the elements of $\bar B C_{r-n}$: its vertices. From the convex properties of $\bar A C_n$ and $\bar B C_{r-n}$, if $\Theta\bar A C_n$ covers the vertices of $\bar B C_{r-n}$, then it covers all of $\bar B C_{r-n}$. Hence, attention may be limited to covering only the vertices of $\bar B C_{r-n}$; in fact, attention may be restricted to covering only one vertex (and its symmetric mate). This vertex is the output relative error vector $\bar B g^*$ corresponding to a "worst" set g* of local relative errors in the computational nodes. By covering this worst case, all others are covered. The vertex $\bar B g^*$ is characterized as the "farthest" point on the boundary of $\bar B C_{r-n}$ from the boundary of $\bar A C_n$. Any point $\bar B g$ on the boundary of $\bar B C_{r-n}$ is covered, upon expansion, by a corresponding point $\Theta\bar A d$ on the boundary of $\Theta\bar A C_n$; both lie on the same radial line from the origin. Hence $\bar B g = \Theta\bar A d$, where Θ measures the relative distance of $\bar B g$. For the model problem, $\bar B g^* = (3, 1)^t$ and $\bar A d^* = (1.26, 0.421)^t$. [Figure: the points B̄g* and Ād* on the polytope boundaries.]

In general, there may be more than one symmetric pair of vertices on the boundary of $\bar B C_{r-n}$ whose relative distance from the boundary of $\bar A C_n$ is maximal. If this is the case, any arbitrary vertex of maximal distance may be used, since when any point on the boundary of $\bar B C_{r-n}$ of a given relative distance is covered, all points of that relative distance are also covered.

The chapter on computational details will discuss the finding of g*, and it will be assumed here that it has been computed. The following problem may be solved to obtain Θ.

Backward analysis: find the smallest constant Θ > 0 which satisfies the following conditions:

$$\bar A d = \bar B g^*, \qquad |\delta_i| \le \Theta,\quad 1 \le i \le n, \qquad (5.7)$$

where $d = (\delta_i)$, $\delta_i = [\rho\xi_i]_L$, $1 \le i \le n$, and g* is as described above. The minimal Θ is the ∞-norm of that vector $d \in \Theta C_n$ of input relative errors whose image under Ā covers the "worst" output relative error vector $\bar B g^*$; $\Theta\bar A C_n$ therefore covers $\bar B C_{r-n}$.

Having set up the problem (5.7), it may be found to be unsolvable. From the construction of B̄ it is easy to show that its columns span $R^m$.
The polytope $\bar B C_{r-n}$ therefore extends in all m dimensions. If m > n, i.e., if there are more outputs than inputs, then the n columns of Ā can do no better than to span $R^n$. The polytope $\bar A C_n$ can therefore extend in no more than n dimensions. It is impossible to cover a polytope in $R^m$ with another polytope extending in fewer than m dimensions. This deficiency is reflected in the inconsistency of the equations in (5.7), i.e., rank(Ā) < rank([Ā, B̄g*]). Even if m ≤ n, which is usually the case, there is no guarantee that m of the columns of Ā are linearly independent, and hence, that $\bar A C_n$ extends in m dimensions. Again, the situation could be the impossible task of covering a polytope in $R^m$ with one extending in only k < m dimensions. The following example illustrates this case.

[Figure: computational graph for the algorithm η1 = ξ3 = ξ1 + ξ2, η2 = ξ4 = ξ1 · ξ2.]

For ξ1 = ξ2,

  Ā = [ 1/2  1/2 ]    B̄ = [ 1  0 ]
      [   1    1 ]        [ 0  1 ]

The polytopes $\bar A C_2$ and $\bar B C_2$ are shown below. [Figure: the segment ĀC_2 and the square B̄C_2.] The polytope $\bar A C_2$ consists of the line segment joining the points (-1, -2) and (1, 2). The polytope $\Theta\bar A C_2$, for any Θ, will consist of the line segment joining the points Θ(-1, -2) and Θ(1, 2), and cannot cover the polytope $\bar B C_2$.

The inability to always perform a backward analysis warrants a relaxation of the criterion for stability to allow an analysis which may always be done. One such analysis is presented in the next section.

5.3. B-analysis

A forward and backward analysis can be combined into what Panzer [14] calls a B- (beidseitige, i.e., two-sided) analysis. The B-analysis is along the lines of Stewart's [18] definition of stability, namely, that the computed solution is "near" the exact solution of a perturbed problem.

From the forward analysis, any computed solution, assuming exact data, must have an output relative error vector $t_B = \bar B g$ which is an element of $\bar B C_{r-n}$. Similarly, the output relative error vector produced by perturbing only the data and computing exactly is $t_A = \bar A d$, which is an element of $\bar A C_n$. The backward analysis requires finding the smallest Θ such that, for any choice of g in $C_{r-n}$, $t_B = \Theta' t_A$ for some $t_A$ and some $\Theta' \le \Theta$. However, the B-analysis only requires that $\Theta' t_A$ be "near" $t_B$. The "nearness" of $\Theta' t_A$ to $t_B$ can be measured by the ∞-norm, Θ', of a vector $\Theta'\hat t$ such that

$$\Theta'(t_A + \hat t) = t_B \qquad (5.8)$$

where $\hat t = (\hat t_i)$, $\hat t_i = [\rho\eta_i]$, $|\hat t_i| \le 1$, $1 \le i \le m$, is a vector of relative errors which artificially perturb the output values. A constant, Θ, is now sought such that for any g in $C_{r-n}$, (5.8) holds for some Θ' ≤ Θ. By bounding both the input and output perturbations by Θ, the "nearness" of the perturbed inputs to the true inputs, and the "nearness" of the computed solution to the exact solution of the perturbed problem, are measured in the same way.

The value of Θ is found by comparing sets of output relative error vectors. As before, the set of all vectors $t_B$ is $\bar B C_{r-n}$. Let $A_I = [\bar A, I_m]$ and $\hat d = (d^t, \hat t^t)^t$. Then $t_A + \hat t = A_I\hat d$, where $\hat d \in C_{n+m}$. The set of all vectors $t_A + \hat t$ is $A_I C_{n+m}$, a centrally symmetric convex polytope. [Figure: A_I C_{n+m} and B̄C_{r-n} for the model problem.]

The B-analysis now follows a strategy similar to that of the backward analysis. It must be shown that the output relative error vector for any computed solution is the same as that obtained by perturbing the data and the outputs. The polytope $A_I C_{n+m}$ is expanded so that it just covers $\bar B C_{r-n}$. Thus, $\hat d$ is allowed to lie in $\Theta C_{n+m}$, where Θ > 0; $A_I\hat d$ is then an element of $\Theta A_I C_{n+m}$. The value sought is

$$\Theta = \mathrm{glb}\{\Theta > 0 : \bar B C_{r-n} \subseteq \Theta A_I C_{n+m}\}.$$
The diagram below shows that Θ = 1.1176 is the desired Θ for the model problem. [Figure: ΘA_I C_{n+m} just covering B̄C_{r-n}.]

Thus, any computed solution of the model problem at the given data can be obtained by computing exactly with data $\xi_i(1 + [\rho\xi_i]_L \cdot \epsilon)$, $1 \le i \le 4$, and then perturbing the outputs $\eta_i$ by $(1 + [\rho\eta_i] \cdot \epsilon)$, $1 \le i \le 2$, where $|[\rho\xi_i]_L| \le 1.1176$ and $|[\rho\eta_i]| \le 1.1176$.

As in the backward analysis, Θ is governed by a vertex $\bar B g^*$ corresponding to a "worst" set g* of local relative errors in the computational nodes. The vertex $\bar B g^*$ is one of the points on the boundary of $\bar B C_{r-n}$ of maximal relative distance from $A_I C_{n+m}$, where distance is measured in the same way as in the backward analysis. For the model problem, g* is the same for both the backward and B-analysis, though, in general, this need not be the case.

Assuming that g* has been computed, the following problem may be solved to obtain Θ.

B-analysis: find the smallest constant Θ > 0 which satisfies the following conditions:

$$A_I\hat d = \bar B g^*, \qquad |\hat\delta_i| \le \Theta,\quad 1 \le i \le n+m, \qquad (5.9)$$

where $\hat d = (\hat\delta_i)$, $1 \le i \le n+m$, with $\hat\delta_i = [\rho\xi_i]_L$ for $1 \le i \le n$ and $\hat\delta_i = \hat t_{i-n} = [\rho\eta_{i-n}]$ for $n+1 \le i \le n+m$; $A_I = [\bar A, I_m]$; and g* is as described previously. The minimum Θ is the ∞-norm of that vector $\hat d \in \Theta C_{n+m}$ of input and output relative errors whose image under $A_I$ covers the "worst" output relative error vector $\bar B g^*$; $\Theta A_I C_{n+m}$ therefore covers $\bar B C_{r-n}$.

The system of equations in (5.9) is always underdetermined, since $A_I$ is an m×(m+n) matrix. Furthermore, a vector $\hat d = (d^t, \hat t^t)^t$ which satisfies these equations can always be found, since rank($A_I$) = rank([Ā, I_m]) = m. In particular, the vector $(0, (\bar B g^*)^t)^t$ satisfies these equations, and corresponds to a forward analysis. This vector, however, does not usually give the smallest Θ.
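For a known g*, problem (5.9) is a small linear program. A minimal scipy sketch (an illustration, not the thesis's method; section 6 develops the far more efficient dual formulation actually used, and also shows how g* itself is found), run here on the model problem with g* = (1, -1, 1, 1, 1)ᵗ:

    import numpy as np
    from scipy.optimize import linprog

    A_bar = np.array([[0.5, 1.5, -2.0, 0.0],
                      [0.0, -1.0, 2.5, 0.5]])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0],
                      [0.0,  1.0, 1.0, 0.0, 1.0]])
    A_I = np.hstack([A_bar, np.eye(2)])          # [A-bar, I_m]
    g_star = np.array([1.0, -1.0, 1.0, 1.0, 1.0])
    rhs = B_bar @ g_star                         # the "worst" vertex, (3, 1)

    k = A_I.shape[1]                             # n + m = 6 unknowns, plus theta
    c = np.zeros(k + 1); c[-1] = 1.0             # minimize theta
    A_eq = np.hstack([A_I, np.zeros((2, 1))])
    # |d_i| <= theta  encoded as  d_i - theta <= 0  and  -d_i - theta <= 0
    A_ub = np.vstack([np.hstack([ np.eye(k), -np.ones((k, 1))]),
                      np.hstack([-np.eye(k), -np.ones((k, 1))])])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * k), A_eq=A_eq, b_eq=rhs,
                  bounds=[(None, None)] * k + [(0, None)])
    print(res.x[-1])                             # about 1.1176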
was arbitrary, it must be that any vertex has the form J r-n _ x = 2 7-b., where \y.\ = 1 ■ i=l n 1 n * Thus, g* must have components 7- = + 1 . Intuitively, one would expect a "worst" set of computational errors to contain a maximal error for each step of the algorithm. Thus, the problem of finding g* consists only of finding the signs of its components. For the model problem, g* = (1,-1,1,1,1) • 6.2. Simplification of Polytope Representation For efficiency, it will be of advantage to reduce the number of columns in A and B. The following argument shows how this may be done. Let v be a vector formed by the sum of j vectors, v., 1 <_ i <_ j, which are in the same or opposite direction. Then j v = 2 a. v., with |a.| < 1 (6.1) where for 1 < i <_ j, v. = (-1) i P 1 v ] » i = -sign (v.j v ] ) 'l = Hv 1 ll - /||vjlL>0 41 Then v may be represented by v = a w, with \a\ <_ 1 (6.2) where a = ( I a.(-l) i fl.)/( I 0.) (6.3) i=l ] 1 i=l n J a w = 2 (-1) ] v. (6.4) i=l n Formulas (6.1) through (6.4) provide the basis for condensing B to a matrix B having k, £ r-n columns, b.. In a similar fashion, the matrix A = [A, I ] may be condensed to A = [A,D] having k 2 +m <_ n+m columns, a.. The matrix D replaces I allowing for the case where some of the columns j m of A are multiples of e., the columns of I . i m Following (6.4), the matrix B is condensed to B by post-multiplica- tion by a (r-n)xk, matrix P R , B = B P B (6.5) where P B has elements A {+1 if b. is condensed into b. SL -1 if -b. is condensed into b. j ' otherwise The matrix P B has the following properties: (1) each row has only one non-zero element, (2) the columns are orthogonal, (3) P„ P B = D g , a diagonal matrix whose i diagonal element 42 equals the number of columns of B which are condensed into b . . l For the model problem (6.5) reveals 2-10 1 2 1-10 10 110 1 1 1 1 1 1 J These reductions have not changed the polytopes, and the following proof shows B C. = B C .A similar proof would show that AC, , m = A C ,. K k-j r-n r kp+m n+m Theorem 6.2 : B C k = B C p _ n . Proof: a) Let x = B g, where g = (7.), 1 <_ i £k, ,with \y.\ <_ 1 . Then x g B" C. . From (6.5), '1 x = Bg=BP R g Since P R has only one non-zero element per row and this non-zero element has magnitude one, it follows that II p R 9lL 1 1 Hence, there exists a g e C (namely g = P R g) such that x = Bg. Thus, x e B C , and B C. c B C r-n k, - r-n b) Let x = Bg, where g = (7.), 1 <, i <_ r-n, with |*y - 1 <_ 1. Then x e B C . Let the columns of B be partitioned according to the sets J., 1 < i < k n , where j e J. if b. or -b. is condensed into b.. Then 43 x = 2 7-b. + 2 7.b. + ... + Z T-b. jeJ 1 J J jej 2 J J jeJ k J J In each sum, 2 7-b., all the vectors, b., are in the same or opposite jej. J J J direction, and |t.| £ 1. Each sum has the form given by (6.1), and may be substituted by T^b., where y. and b. are computed via formulas analogous to (6.3) and (6.4). The formula analogous to (6.4) is (6.5). The formula analogous to (6.3) is g = Q B g where CL is a k,x(r-n) matrix with elements ll^jllo/H^ilL if bj is condensed into b. pi _~_ *. k.. = < -lib- II /lib. || if -b. is condensed into b U v otherwise The matrix Q B has the following properties: (1) each column has only one non-zero element, (2) the rows are orthogonal, (3) each absolute row sum equals one. This follows from lib. || = 2 ||b. 
II , (4) Q D (L = D D , a diagonal matrix, D B D (5) Qg has the same structure as P R , i.e., corresponding elements of Q R and P B are either both zero, both positive, or both negative, 44 (6) Q D P D = I,, • This follows immediately from properties DO r i (3) and (5). For the model problem, and g = (1,-1,1,1,1), the corresponding g is given below. 1/2 1/2 10 1/2 1/2 1 - * Thus, performing the above substitutions, x = 2 y. b., with i=l ^ 1 /\ /\ ^ |7- 1 < 1. Hence, x e B C. , and B C c B C, . ■ i ' — k-j r-n — k-. Similar arguments reveal that any point x in A C , of the form ■r.x.r, a n+m n+m . . C Ad = 2 5. a. can be expressed as an element of A C. .„, of the form • ■ill kp+m k 9 +m a ^ Ad = 2 6. a., and vise versa. Hence AC = A C. . . i= 1 li n+m k 2 + m For the model problem 3/2 -2 3/2 -1 5/2 3/2 A = , B = 2-10 1 2 where k +m = 4, and k =3. The following table shows the often dramatic 2 1 s reductions for a battery of problems and algorithms, see Appendix for description . 45 ALGORITHM m r-n k l n+m V - BACK4 3 12 3 13 6 SEMI4 3 12 3 13 6 PARA4 3 14 4 13 6 BACK8 7 56 7 43 15 SEMI8 7 56 7 43 15 PARA8 7 86 11 43 15 GS 12 81 26 24 19 MGS 12 81 28 24 19 LTBAND 7 86 10 37 15 RINVERSE 10 30 10 20 19 GJ 4 76 7 24 8 GE 4 62 6 24 8 CHOL 4 62 13 18 14 MATMUL 16 112 16 48 48 CG 4 122 32 12 9 Table 5. Simplification of Polytope Representation As the table illustrates, the reduction of B to B occasionally results in k =m in which case B is, to within column permutation, a lower triangular matrix. The following theorem gives sufficient conditions for this to occur. 46 Theorem 6.3 : If the computational graph is a binary tree, i.e., each node is an operand of no more than one node, then B is lower triangular. Proof: For each node r\ . , 1 <_ j <_ m, define S. = {£ : pr? /p£ fO and pr? /p£ = for all k < j} J n+i j n +i k n+i A node belongs to S . if 77. is the first output node encountered on the unique path from it to the root of the tree, i? m . Every computational node appears in one and only one set, and each set is non-empty, since 17. belongs to S . • Let the columns of B be partitioned according to the sets S-, i.e., column i is in partition block j if (■ . is in S.. Con- sider the columns corresponding to S, = {£ n+ ,-« PV-,/p£ + i f 0}. For any £ n+1 - in S-, , the corresponding column in B" is {pvJpZ _, prijpt . , ... , pv /Pi ) = V 1 ' {pv J Pi .) 1 n+i 2 n+i m n+i ' 1 n+i where v., = [pv,/pv , pr} /pr\ , • • • , pr\ /prj ) • This follows from the fact 1 112 1 ml that the graph is a tree and hence pv Jp% . = (pr? Jpv ){pv /p£ n+1 -), since \j III J I there is at most one path from £ ,. to 1?.. Thus, all the columns of B" r n+i 'j corresponding to S, are multiples of v-i , and the reduction of B may choose the first column of B to be v, • 2 \pr\ /p£_..-|. Following similar 1 t (=<: '^Y n+1 3 *n+i fc 5 1 arguments, the column of B corresponding to £ + . in S. is v. • {pvJp£ + .) where V. = (0, -.., 0, PV./PV.> PV. JPV., ..., prijpri*)* J y^_ ^ J J J+l J m J 47 t h — Thus, the j column of B is v. • 2 \pr\ Jp% . |. From the form of J £ x .eS. J n ] n+i j v., it follows that B is lower triangular. ■ A similar result for A, the condensed matrix for A, does not follow. For binary trees, n > r-n >_ m, and seemingly the possibility exists for reducing the n columns of A to m columns of A. The matrix, A, then being lower triangular, would be of rank m, and this fact would guarantee that a backward analysis could be done. The argument below shows why this is not always true. 
The following example provides an illustration. 48 Here, A = 1 1 1 1 1 1 1 1 , A = 2 2 2 2 In theorem 6.3, each set S,, 1 <_ j <_ m, is non-empty and hence, there is a condensed column for each set. For this example, S., is empty, character- izing the fact that each path from an input node to t? encounters another output node first. Thus, there is no condensed column corresponding to S^, and A has fewer than m columns. With the additional assumption that all the sets, S., 1 <_ j <_m, are non-empty, theorem 6.3 will hold for A. 6.3. Algorithms for g* The preceding section has shown that the size of the problem for B-analysis (5.9) may be reduced to the following: minimize subject to Ad = Bg* (6.6) £ 0, 1 _< i <_ kp+m, where d has components 5. representing, according to the reduction of A, combined input and injected output relative errors, and g* has components T*» T 1 1 1 k i > representing, according to the reduction of F, combined relative errors for the "worst" computation, with M| = 1. Thus, algorithms for g*, rather than g*, will be presented. The g* which is sought is that g in C. which gives the largest K l 49 solution to (6.6). The problem of finding g* is a minimax problem. Unfortunately, algorithms for finding the global solution to this type of problem are often tedious. Because of the special form of g*, finite, though often lengthy, algorithms can be constructed to find the global solution to (6.6). Two such algorithms will be presented. The first algorithm consists of solving (6.6) for all possible g, and selecting that g which gives the largest 0. Since each component ^1 of g* is +1, and there are k, components, (6.6) must be solved 2 times. The reason for the reduction of B to f in the last section now becomes apparent. For a given g, the solution of (6.6) is a linear program. As it stands, (6.6) is not in standard form, since the variables 6 are not constrained to be non-negative. A standard procedure for handling this case is to replace each inequality |5.| £ by the two inequalities 5. + - 6." < 0, and 5. + - 5." > -0, where 6. + and 6 " are non-negative i i — ' ii— i i - variables representing the positive and negative parts of 5.. This pro- cedure practically doubles the number of variables and constraints, and increases the storage required by a factor of four. An alternative procedure consists of a change of variables. Define the non-negative variables S\ = 8^ + 0, i.e., d' = d + 0e k +m> where e. + is a k 2 +m vector each of whose elements is 1. Then, the ^ e. *• constraint |5.| £0, becomes 8! - 20^0, or d' - 29e k +m <_ 0. Substi- tutinq d = d' - 0e, . , the constraint Ad = Bg* becomes Ad' - 0w = Bg*, 1 k 2 +m . where w is a k.+m vector having the elements ox = 2 a... Thus, (6.6) 2 j=l 'J has become 50 minimize subject to Ad' - 0w = Bg* (6.7) d' " 2©e. +m < However, rather than solving (6.7), tremendous savings in time and storage can be had by considering a dual problem, Ascher [1J. His A • notation will be employed. Given a candidate for g*, let d be a feasible solution to (6.6), i.e., d satisfies Ad = Bg*. Define the following quanti- ties: 7 = 1/ll'dH^ = 1/e, K Q = HAll^/llfg*!^. Choose any K > K Q and define the kp+m+l vector y with components i? . =76., 1 <_ i <_ k^+m, and *? k +m+1 = T/K. Since 7 < K Q , \\y\\ m < 1. Let A = [A, -K fg*] be a m*(k 2 +m+l) matrix, and c be a kp+m+1 vector whose elements are zero except for the last component which is 1. 
Then, (6.6) is equivalent to the linear program: t max c y subject to Ay = (6.8) The dual of (6.8), Rabinowitz [15], may be stated as min || r* || _! subject to A fc a + r = c (6.9) where A and c are as described above. What is claimed to be the most effi- cient algorithm yet devised for solving (6.9) is provided by Barrodale and Roberts [ 2» 3]- The outstanding features of the algorithm include: (1) a condensed tableau, only A and c are explicitly stored; (2) no phase 1 51 i.e., an initial basic feasible solution is available, r = c; (3) the ability of passing through several vertices in a single iteration. The following equalities show how is retrieved from the solution of (6.9). B ] = min || r- 1| ^ = max c l y = v k +m+1 = 7/K = 1/(6K). Hence, = 1/(K£ 1 ). The following table shows the saving in time of (6.9) over (6.7) for a battery of problems and algorithms. The time, in seconds, is for the evaluation of for one candidate for g*. It is assumed that A and Bg* have previously been computed. ALGORITHM PRIMAL (6.7) DUAL (6.9) BACK4 0.23 0.02 SEMI4 0.23 0.02 PARA4 0.19 0.01 BACK8 3.91 0.10 SEMI8 3.77 0.09 PARA8 2.10 0.11 GS 1.12 0.15 MGS 1.00 0.15 LTBAND 1.82 0.20 RINVERSE 0.78 0.12 GJ 0.55 0.03 GE 0.65 0.03 CHOL 0.38 0.02 MATMUL 5.79 0.58 CG 0.15 0.01 Table 6. Primal and Dual Solution Timings 52 While the first method for finding g* searches all possible g in C. for that g which gives the largest solution to (6.9), a second method searches all the faces of A C. . for that face which will cover kp+m Bg*. The representation of A C. + is changed to a more standard represen- tation which is in terms of the hyperplanes which support the polytope. A hyperplane, Householder [5], divides R into two half spaces and is repre- sented by the equation v x = w, where v is an m-vector which is normal to the hyperplane, and wis a constant which is related, by the length of v, to the Euclidean distance of the hyperplane from the origin. A hyper- plane supports the polytope if there is at least one point in common be- tween the two, and all the points of the polytope lie on the hyperplane or in only one of the half spacesi The points x in the polytope satisfy v x <_ 0). Since the polytope is centrally symmetric, there is a corresponding support hyperplane v x = -wand points in the polytope satisfy v x > -cj. Collectively, the points in the polytope satisfy |v x| <_ w. The particular support hyper- planes of interest intersect the (m-l)-dimensional faces of the polytope. The points on a face of the polytope A C. +m have a particular form. :: k+m : : : Each vertex of the face is of the form Ad = 2 6. a. with |5.| = 1. Points i=l ] n n on the face are convex combinations of these vertices. A point on the face supported by a hyperplane with normal v may be represented by Ad = „£ 6. a. + .2 5. a. (6.10) v l a.f0 v a^O 53 where the sum is split according to whether the projection of a. on v is zero. AH of the points on the face have the same projection on v. The following illustration shows a typical two-dimensional face of an A C. + c 3 . ' R , where, in this case, three of the columns of A are orthogonal to v, origin and y is the first sum in (6.10). The points on the face are symmetric around the point y, which may be called the "center" of the face. The coefficients, 5 ., for y are given by f f +1 if v a. > i ■ -1 if v a. < V. 1 If this were not so, then it would be possible to construct a point, y', which would have a projection on v which is greater than that of the points on the face. 
The origin and y' would then lie in different half-spaces, contradicting the fact that the hyperplane supports the polytope on this 54 face. The coefficients, 6., in the second sum of (6.10) satisfy |6.| <_ 1 and are related to the representations of a point on the face in terms of convex combinations of the vertices of that face. Since a face is (m-l)-dimensional , there must be at least m-1 terms in the second sum of (6.10). Each face therefore corresponds to a set of at least m-1 columns of A. These vectors must span an (m-l)-dimensional subspace of R m , and hence, attention is restricted to combinations of m-1 columns of A which are linearly independent. While there may be more than m-1 columns of A orthogonal to v, any subset of m-1 linearly independent vectors will suffice to determine the face and the support hyperplane. Suppose m-1 linearly independent columns of A are given, say t* a,, a 2 , ..., a i . Then the condition v a. = 0, 1 <_ i <^m-l, determines v to within a constant multiplier. Requiring that ||v|L = 1 determines a specific v. Having found v, the corresponding co is determined by finding a point which lies on the face. Such a point, y, has been determined above. The value of co is then |v y|. Thus, for each set of m-1 linearly independent columns of A, the corresponding symmetric pair of support hyperplanes can be determined. t k 2 +m The points of the polytope must satisfy |v x| <_ to. If k 3 <_ ( m _-| ) is the number of support hyperplane pairs, then the polytope may be represented by those points x which satisfy |Vx| <_ w, where V is a k^m matrix whose rows, v- , are the normals for the support hyperplanes, w is a k^-vector of k 2 +m n+m J corresponding co' s. Often ( m _-| ) is much less than ( m _-j) and the reason for the reduction of A to A now becomes apparent. It will be con- venient to work with a normalized version of these inequalities, |Vx| l e, where e = (1,1,..., 1) , in which each row has been divided by the corresponding right hand side. 55 The polytope B C. is covered by expanding A C. + by a factor 0. Corresponding to the expanded polytope A 0C, . is the hyperplane represen- tation |Vx| < 0e (6.11) in which the support hyperplanes have been moved out away from the origin by a factor 0. Given any x in R m , the smallest for which (6.11) holds is the norm of x, where here, instead of a unit ball or cube, there is a unit polytope. Of interest are the norms of the points which lie in B C. . K l ^ K l . A Such points are of the form Bg = I 7-b., with \y.\ <_ 1 . The particular i=l n ] ■* x = §g* is sought which gives the largest 0. It is known that this x is a vertex of B C. , and \y.*\ = 1, 1 ± i £ k,. Thus, the smallest is sought for which any choice of \y . | <_ 1 generates an x = Bg which satisfies (6.11). Let U = V B be a k.xk-, matrix with elements m... Then, it must be that 3 1 ij a A. XL a |Ug| <_®e. Let u. be the i row of U. Then, since \y.\ <_ 1 , 1 <_ i <_ k, , t* l u - 9 1 1 ll u illi» W1tn equality when the components of g are chosen to have magnitude 1 and sign of the corresponding elements of u. . Let J be that t~ t" t* subscript for which u. g. is maximum, i.e., u, g, = max u. g., where g. t " is that g chosen in the above manner for each row u. . Then, g* = g, with ** t A elements y^ = 2-sign (// )-l , 1 <_ i <_ k, , ahd = u, g*. If d e 0C + corresponding to Bg* is desired, additional computations need to be performed. Let \f. be the J row of V, where u, = Vj B and J is as defined above. 
If the d in ΘC_{k_2+m} corresponding to (B-bar)g* is desired, additional computations need to be performed. Let v_J^t be the J-th row of V, where u_J^t = v_J^t (B-bar) and J is as defined above. Thus, v_J is the normal to the hyperplane which supports the face of Θ(A-bar)C_{k_2+m} on which (B-bar)g* lies. Hence, (B-bar)g* may be represented as in (6.10). For the first sum in (6.10), |δ_i| = Θ, where the sign is determined by the sign of the corresponding v_J^t a_i. This follows from the fundamental theorem of linear programming, which states that the inactive variables take on their extreme values. The values of δ_i in the second sum are determined by solving an m x (m-1) consistent system of equations involving those columns of A-bar for which v_J^t a_i = 0 and the right hand side ((B-bar)g* - y). If the number of columns of A-bar for which v_J^t a_i = 0 is greater than m-1, then this problem becomes a linear program.

In an implementation of this algorithm, the matrix U need not be stored. It may be computed one row at a time, with a variable holding the largest value of u_i^t g_i computed so far. Additionally, the normals v_j, 1 <= j <= k_3, may be computed in the following manner. Let A-hat be an m x (m-1) matrix whose columns are some m-1 columns of A-bar. Then A-hat = QR, where the triangular factor R is m x (m-1) with its last row zero. If rank(R) = m-1, then the m-1 columns of A-bar chosen are linearly independent and hence determine an (m-1)-dimensional face. The normal to the hyperplane supporting this face is v = Qe_m, where e_m is the last column of the identity matrix of order m, and ||v||_2 = 1.

When m is greater than 20, it is computationally advantageous to compute the QR decomposition and the normal v in a different manner. Rather than computing a new QR decomposition for each combination of m-1 columns of A-bar, the combinations may be ordered in such a way that the factorization of the next combination is easily computed from that of the previous one. As before, let A-hat = QR, where A-hat consists of some m-1 columns of A-bar. Let A-hat' be the same as A-hat except that column k has been replaced by a column a_j of A-bar so as to form a new combination. The revolving-door algorithm given by Nijenhuis and Wilf [13] specifies a sequence of such replacements which generates all C(k_2+m, m-1) combinations. Then A-hat' = QR', where R' is the same as R except for the k-th column, which is now Q^t a_j. The matrix R' may be reduced to the form of R by premultiplication by an orthogonal matrix P consisting of a product of Givens plane rotations,

    P = G'_{m-1} ... G'_{k+1} G_{k+1} ... G_{m-1},

where G_i introduces a zero in the i-th row of column k, and G'_i introduces a zero in the (i+1)-st row of column i. The QR factorization of A-hat' is then A-hat' = (QP^t)(PR'), or A-hat' = Q'R', and the normal vector corresponding to this new combination is v' = Q'e_m. This method has the disadvantages of requiring extra storage for Q in unfactored form, and of a gradual loss of orthogonality in the columns of Q.
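A minimal sketch of the direct (non-updating) computation of a single face normal follows, assuming numpy; the function name and the rank tolerance 1e-12 are choices of this sketch, not of the text, and the enumeration over all C(k_2+m, m-1) column combinations (e.g., via itertools.combinations) is omitted.

    import numpy as np

    def face_normal(A_hat, tol=1e-12):
        """Normal to the face determined by the m-1 columns in A_hat
        (an m x (m-1) array): v = Q e_m from A_hat = QR.  Returns None
        if the columns fail the rank test rank(R) = m-1."""
        m = A_hat.shape[0]
        Q, R = np.linalg.qr(A_hat, mode="complete")    # Q is m x m
        if np.abs(np.diag(R[: m - 1, :])).min() <= tol:
            return None                                # dependent columns
        return Q[:, -1]                                # unit 2-norm by construction

The corresponding ω is then |v^t y|, with y built from the signs of v^t a_i over the remaining columns, as described above.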
Both of the methods presented for finding g* are exhaustive; hence they are viable only for small k_1 or k_2, due to the computation time involved. For larger problems, methods which obtain an approximation to Θ are required. One algorithm for computing an approximation to g* is based on the following heuristic: if the direction in which (B-bar)g* lies is known, then the choice of γ_i* can be made according to whether b_i, the i-th column of B, or -b_i lies in that direction. The direction used is the eigenvector x_max corresponding to the largest eigenvalue λ_max of the generalized eigenvalue problem

    BB^t x = λAA^t x.                                                         (6.12)

This problem is similar to the problem which Miller [11] solves to obtain an approximation to Θ. He has shown that (λ_max)^{1/2} approximates Θ, and that his approximation satisfies

    (n+m)^{-1/2}(λ_max)^{1/2} <= Θ <= (λ_max)^{1/2}(r-n)^{1/2}.               (6.13)

In Miller's method, illustrated for the model problem in the accompanying figure, the unit sphere S^m = {v in R^m: ||v||_2 = 1} replaces the unit cubes C_{n+m} and C_{r-n}, and the ellipsoids AA^tS^m and BB^tS^m replace the polytopes AC_{n+m} and BC_{r-n}; the ellipsoid AA^tS^m is expanded so as to just cover BB^tS^m. Solving (6.12) for the model problem yields (λ_max)^{1/2} = 1.163, and (6.13) states that Θ lies in the interval

    0.4749 = (6)^{-1/2}(1.163) <= Θ <= (1.163)(5)^{1/2} = 2.6.

The accompanying figure also illustrates the use of x_max as a guide to approximating g*, and hence (B-bar)g*. Each γ_i*, 1 <= i <= r-n, is chosen so that γ_i* b_i has a positive projection on x_max:

    γ_i* = +1 if x_max^t b_i > 0,
    γ_i* = -1 if x_max^t b_i < 0.                                             (6.14)

For the model problem, (6.14) was successful in generating the true g*, and a dual linear program similar to (6.9) led to the calculation of Θ = 1.1176. When applied to other problems, algorithm (6.14) did not always produce the true g*. Nevertheless, the resulting approximate Θ is a lower bound on the true Θ, since the approximate g* gives rise to a (B-bar)g* which is a vertex of BC_{r-n}. Occasionally, the approximation to g* was improved by using (6.14) iteratively, the approximate (B-bar)g* obtained on the previous iteration serving as a guide for the present iteration; frequently, however, the resulting increase in Θ was marginal.

The preceding discussion has not made use of the simplification of polytope representation described in the previous section. Since (B-bar)C_{k_1} = BC_{r-n} and (A-bar)C_{k_2+m} = AC_{n+m}, the same techniques used by Miller [11] in the development of (6.12) and (6.13) may be used to establish similar relationships based on (B-bar)C_{k_1} and (A-bar)C_{k_2+m}. Thus, Θ may be approximated by (μ_max)^{1/2}, where μ_max is the largest eigenvalue of the generalized eigenvalue problem

    (B-bar)(B-bar)^t y = μ(A-bar)(A-bar)^t y.                                 (6.15)

The approximation (μ_max)^{1/2} satisfies

    (k_2+m)^{-1/2}(μ_max)^{1/2} <= Θ <= (μ_max)^{1/2}(k_1)^{1/2}.             (6.16)

For the model problem, whose ellipses for (6.15) are shown in the accompanying figure, solving (6.15) yields (μ_max)^{1/2} = 1.270, and (6.16) states that Θ lies in the interval

    0.6342 = (4)^{-1/2}(1.270) <= Θ <= (1.270)(3)^{1/2} = 2.200.

This interval is smaller than that provided by Miller, and similar results were found for most of the examples tested. In all cases the lower bound was larger than Miller's, though a proof that this is always true is lacking. Additionally, it was observed that (μ_max)^{1/2} >= (λ_max)^{1/2}, and that in most cases (μ_max)^{1/2} was a much better approximation to Θ than (λ_max)^{1/2}. Table 7 provides examples for comparison of (6.12) and (6.15).

An algorithm similar to (6.14), based on the eigenvector guide y_max corresponding to μ_max, performed neither consistently better nor consistently worse. The values of γ_i* are chosen in the following manner:

    γ_i* = +1 if y_max^t (b-bar)_i > 0,
    γ_i* = -1 if y_max^t (b-bar)_i < 0.                                       (6.17)

The resulting approximate (B-bar)g* is used in (6.9). Table 8 provides examples for comparison; the true values of Θ are computed by one of the exhaustive methods.
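A minimal sketch of the reduced approximation (6.15) together with the sign guide (6.17), assuming scipy is available and that (A-bar)(A-bar)^t is positive definite (the note on RINVERSE in Table 7 shows what happens when the analogous condition fails for (6.12)); the function name is hypothetical.

    import numpy as np
    from scipy.linalg import eigh

    def eigen_guide(A_bar, B_bar):
        """(mu_max)^(1/2), its interval (6.16), and the sign guesses (6.17)."""
        mu, Y = eigh(B_bar @ B_bar.T, A_bar @ A_bar.T)  # eigenvalues ascending
        approx = np.sqrt(mu[-1])
        y_max = Y[:, -1]                                # eigenvector for mu_max
        g_star = np.sign(y_max @ B_bar)                 # gamma_i* = sign(y_max^t b_i)
        g_star[g_star == 0] = 1.0
        k1, k2m = B_bar.shape[1], A_bar.shape[1]
        interval = (approx / np.sqrt(k2m), approx * np.sqrt(k1))   # (6.16)
        return approx, interval, g_star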
                        (6.12)          (6.13)             (6.15)          (6.16)
    ALGORITHM  true Θ   (λ_max)^{1/2}   interval for Θ     (μ_max)^{1/2}   interval for Θ
    BACK4      1.5879   1.390           [.3855,  4.815]    1.981           [.8086,  3.430]
    SEMI4      1.2345   1.166           [.3234,  4.039]    1.526           [.6229,  2.643]
    PARA4      1.9547   1.380           [.3827,  5.163]    2.126           [.8678,  4.251]
    BACK8      2.4139   1.750           [.2669, 13.09 ]    2.823           [.7544,  7.468]
    SEMI8      1.7148   1.426           [.2175, 10.67 ]    1.998           [.5340,  5.286]
    PARA8      7.6503   2.179           [.3323, 20.21 ]    5.1468          [1.376, 17.07 ]
    GS         10.723   3.488           [.7120, 30.39 ]    6.841           [1.569, 34.88 ]
    MGS        11.032   3.408           [.6956, 31.67 ]    6.748           [1.548, 35.71 ]
    LTBAND     4.2329   2.098           [.3449, 19.46 ]    3.985           [1.029, 12.60 ]
    RINVERSE   18,329.  (2)                                23,936.         [5,491., 75,691.]
    GJ         2.3719   1.341           [.2737, 11.69 ]    2.438           [.8621,  6.451]
    GE         2.1406   1.273           [.2599, 10.02 ]    2.623           [.9273,  6.424]
    CHOL       2.3450   1.395           [.3288, 10.98 ]    2.747           [.7343,  9.906]
    MATMUL     (1)      2.723           [.3930, 28.82 ]    6.404           [.9243, 25.62 ]
    CG         6.3204   3.150           [.9093, 34.79 ]    5.2403          [1.747, 29.65 ]

    Notes: (1) Computational time was prohibitive for the exhaustive methods.
           (2) Miller's version of BB^t in (6.12) was not positive-definite.

    Table 7. Approximations to Θ

    Table 8. Approximations to Θ from the sign guides (6.14) and (6.17)

7. Comparison of Algorithms

One means of comparing two algorithms which solve the same problem, due to Miller [8], is to measure the polytope of each algorithm directly against the other. Define the numbers

    ω_{1/2} = glb{ω > 0: B_1 C_{r_1-n} is contained in ω B_2 C_{r_2-n}},
    ω_{2/1} = glb{ω > 0: B_2 C_{r_2-n} is contained in ω B_1 C_{r_1-n}}.      (7.2)

These numbers may be interpreted as follows. If algorithm 1 produces a computed result with local relative errors bounded by ε, then the same computed result can be obtained by algorithm 2 with local relative errors bounded by ω_{1/2}ε, and algorithm 2 is the more attractive if ω_{1/2} < 1. However, as Miller shows, ω_{1/2}ω_{2/1} >= 1, and it is possible for both ω_{1/2} and ω_{2/1} to be greater than 1, indicating that in some respect each algorithm is to be preferred to the other. This possibility poses a dilemma in the choice between two algorithms. Consider the following problem:

    η_1 = ξ_1 × (ξ_2 + ξ_3)
    η_2 = ξ_3 × (ξ_1 + ξ_2)

and the two algorithms

    1:  η_1 = ξ_1 × (ξ_2 + ξ_3)          2:  η_1 = ξ_1 × ξ_2 + ξ_1 × ξ_3
        η_2 = ξ_3 × ξ_1 + ξ_3 × ξ_2          η_2 = ξ_3 × (ξ_1 + ξ_2)

For the data ξ_1 = -1, ξ_2 = 2, ξ_3 = -3, the polytopes AC_5, B_1 C_5, and B_2 C_5 are shown in the accompanying figure. Using (7.2) to compare these two algorithms gives values of ω_{1/2} and ω_{2/1} which are both greater than 1; these results supply conflicting information for choosing between the two algorithms.

This dilemma may be overcome by choosing the polytope of the problem as the standard against which the polytopes of the algorithms are compared. Define the following numbers:

    Θ_1 = glb{θ > 0: B_1 C_{r_1-n} is contained in θ A C_{n+m}},
    Θ_2 = glb{θ > 0: B_2 C_{r_2-n} is contained in θ A C_{n+m}},
    Θ_{1/2} = Θ_1/Θ_2,    Θ_{2/1} = Θ_2/Θ_1.                                  (7.3)

The numbers Θ_{1/2} and Θ_{2/1} compare the algorithms by taking ratios of the stability numbers determined by a B-analysis. Clearly, algorithm 1 is more stable if Θ_{1/2} < 1, and algorithm 2 is more stable if Θ_{2/1} < 1. From the definitions it follows immediately that Θ_{1/2}Θ_{2/1} = 1, implying that each algorithm cannot be more stable than the other. Applying (7.3) to the two algorithms gives Θ_1 = 1, Θ_2 = 4/3, Θ_{1/2} = 3/4, and Θ_{2/1} = 4/3, indicating that algorithm 1 is the preferred algorithm.

Bounds may be derived which relate the numbers defined in (7.2) and (7.3). These bounds make use of the definitions

    Θ-hat_i = glb{θ > 0: A C_{n+m} is contained in θ B_i C_{r_i-n}},  i = 1, 2.   (7.4)

From (7.3), B_1 C_{r_1-n} is contained in Θ_1 A C_{n+m}, and from (7.4), A C_{n+m} is contained in Θ-hat_2 B_2 C_{r_2-n}, so that B_1 C_{r_1-n} is contained in Θ_1 Θ-hat_2 B_2 C_{r_2-n}. Hence ω_{1/2} <= Θ_1 Θ-hat_2 = Θ_{1/2} Θ_2 Θ-hat_2, or

    Θ_{1/2} >= ω_{1/2}/(Θ_2 Θ-hat_2).                                         (7.5)

Similar steps would show that

    Θ_{2/1} >= ω_{2/1}/(Θ_1 Θ-hat_1).                                         (7.6)

The quantities Θ_i Θ-hat_i >= 1, i = 1, 2, may be called similarity coefficients. They measure the similarity of the polytopes A C_{n+m} and B_i C_{r_i-n}, and have value 1 when the polytopes are the same.
From (7.2), B_1 C_{r_1-n} is contained in ω_{1/2} B_2 C_{r_2-n}, and (7.3) states that B_2 C_{r_2-n} is contained in Θ_2 A C_{n+m}. Hence

    B_1 C_{r_1-n} is contained in ω_{1/2} Θ_2 A C_{n+m},

from which it follows that Θ_1 <= ω_{1/2} Θ_2, or

    Θ_{1/2} <= ω_{1/2}.                                                       (7.7)

Similar steps would show that

    Θ_{2/1} <= ω_{2/1}.                                                       (7.8)

Collectively, (7.5) and (7.7) give

    ω_{1/2}/(Θ_2 Θ-hat_2) <= Θ_{1/2} <= ω_{1/2},

and (7.6) and (7.8) produce

    ω_{2/1}/(Θ_1 Θ-hat_1) <= Θ_{2/1} <= ω_{2/1}.

Using (7.3), a variety of algorithms for solving dense systems were compared for various data. The results are given in the following tables.

    Θ_{1/2}                                  Algorithm 2
    Algorithm 1        G.E. w/o piv.  G.J.        Givens      House       G.S.        M.G.S.
    G.E. w/o pivoting  1.0            .549        1.14×10³    5.56×10²    5.21×10²    7.46×10²
    G.J.               1.82           1.0         2.07×10³    1.01×10³    9.52×10²    1.36×10³
    Givens             .879×10⁻³      .483×10⁻³   1.0         .490        .459        .654
    House              .180×10⁻²      .988×10⁻³   2.04        1.0         .935        1.34
    G.S.               .192×10⁻²      .105×10⁻²   2.18        1.07        1.0         1.43
    M.G.S.             .134×10⁻²      .737×10⁻³   1.53        .747        .701        1.0

    Data: well-conditioned, Miller [12]

        A = [ 2.908  -2.253   6.775   3.970 ]    b = [ 6.291 ]    x = [ -.7238 ]
            [ 1.212   1.995   2.266   8.008 ]        [ 7.219 ]        [  .1108 ]
            [ 4.552   5.681   8.850   1.302 ]        [ 5.730 ]        [  .8389 ]
            [ 5.809  -5.030   0.099   7.832 ]        [ 9.574 ]        [  .7460 ]

    Θ_{1/2}                                  Algorithm 2
    Algorithm 1        G.E. w/o piv.  G.J.        Givens      House       G.S.        M.G.S.
    G.E. w/o pivoting  1.0            .980        .571        .328        .117×10⁻³   .385×10⁻³
    G.J.               1.02           1.0         .588        .333        .120×10⁻³   .394×10⁻³
    Givens             1.75           1.70        1.0         .588        .204×10⁻³   .671×10⁻³
    House              3.05           3.00        1.70        1.0         .357×10⁻³   .117×10⁻²
    G.S.               8.55×10³       8.35×10³    4.90×10³    2.80×10³    1.0         3.33
    M.G.S.             2.60×10³       2.54×10³    1.49×10³    8.52×10²    .300        1.0

    Data: ill-conditioned, Vandermonde

        A = [ 1     1      1      1      ]    b = [ 10       ]    x = [ 1 ]
            [ 1/2   1/4    1/8    1/16   ]        [ 13/8     ]        [ 2 ]
            [ 1/4   1/16   1/64   1/256  ]        [ 7/16     ]        [ 3 ]
            [ 1/8   1/64   1/512  1/4096 ]        [ 167/1024 ]        [ 4 ]

    Θ_{1/2}                                  Algorithm 2
    Algorithm 1        G.E. w/o piv.  G.J.        Givens      House       G.S.        M.G.S.
    G.E. w/o pivoting  1.0            .441        .521        .174        .613×10⁻¹   .714×10⁻¹
    G.J.               2.27           1.0         1.18        .395        .139        .162
    Givens             1.92           .846        1.0         .333        .118        .137
    House              5.76           2.53        3.00        1.0         .352        .410
    G.S.               1.63×10¹       7.19        8.50        2.84        1.0         1.17
    M.G.S.             1.40×10¹       6.17        7.30        2.44        .858        1.0

    Data: symmetric

        A = [ 6  4  4  1 ]    b = [ 30 ]    x = [ 1 ]
            [ 4  6  1  4 ]        [ 35 ]        [ 2 ]
            [ 4  1  6  4 ]        [ 40 ]        [ 3 ]
            [ 1  4  4  6 ]        [ 45 ]        [ 4 ]

    Θ_{1/2}                                  Algorithm 2
    Algorithm 1        G.E. w/o piv.  G.J.        Givens      House       G.S.        M.G.S.
    G.E. w/o pivoting  1.0            .917        .223        .145        .221        .213
    G.J.               1.09           1.0         .244        .159        .242        .233
    Givens             4.48           4.10        1.0         .654        .990        .952
    House              6.88           6.30        1.53        1.0         1.52        1.47
    G.S.               4.53           4.14        1.01        .658        1.0         .962
    M.G.S.             4.69           4.29        1.05        .682        1.04        1.0

    Data: diagonally dominant

        A = [ 4   1   1   1 ]    b = [  13 ]    x = [ 1 ]
            [ 2  -5   1   1 ]        [  -1 ]        [ 2 ]
            [ 1   2   6   1 ]        [  27 ]        [ 3 ]
            [ 3   1   2  -7 ]        [ -17 ]        [ 4 ]

    Θ_{1/2}                                  Algorithm 2
    Algorithm 1        G.E. w/o piv.  G.J.        Givens      House       G.S.        M.G.S.
    G.E. w/o pivoting  1.0            .935        .210        .104        .220        .198
    G.J.               1.07           1.0         .225        .111        .235        .211
    Givens             4.76           4.45        1.0         .493        1.05        .943
    House              9.65           9.02        2.03        1.0         2.12        1.91
    G.S.               4.55           4.25        .956        .471        1.0         .901
    M.G.S.             5.06           4.73        1.06        .524        1.11        1.0

    Data: positive definite

        A = [ 10   3   2   1 ]    b = [ 26 ]    x = [ 1 ]
            [  3  11   3   2 ]        [ 42 ]        [ 2 ]
            [  2   3  12   3 ]        [ 56 ]        [ 3 ]
            [  1   2   3  13 ]        [ 66 ]        [ 4 ]

8. Composite Algorithms

A composite algorithm is one which is made up of smaller sub-algorithms, where the outputs of one sub-algorithm serve as the inputs to the next. Often an algorithm can be thought of as composite by partitioning it into parts which perform distinct logical functions. For example, an algorithm for solving a system of equations may be considered a concatenation of two algorithms: one which decomposes the matrix of coefficients, followed by one which performs substitutions. The purpose of considering algorithms as composite is to be able to analyze an algorithm in terms of analyses done on its parts.
Additionally, time may be saved in analyzing several algorithms by analyzing their common sub-algorithm only once.

Without loss of generality, a composite Algorithm 12 will be considered as Algorithm 1 followed by Algorithm 2. Let Algorithm 1 have n_1 inputs ξ_i^(1), 1 <= i <= n_1; m_1 outputs η_i^(1), 1 <= i <= m_1; and r_1-n_1 computational nodes ξ_i^(1), n_1 < i <= r_1. Similarly, let Algorithm 2 have n_2 inputs ξ_i^(2), 1 <= i <= n_2; m_2 outputs η_i^(2), 1 <= i <= m_2; and r_2-n_2 computational nodes ξ_i^(2), n_2 < i <= r_2. If m_1 = n_2, then the outputs of Algorithm 1 may be equated with the inputs of Algorithm 2, and the composite Algorithm 12 may be formed. Algorithm 12 has n_1 inputs ξ_i^(12) = ξ_i^(1), 1 <= i <= n_1; m_2 outputs η_i^(12) = η_i^(2), 1 <= i <= m_2; and r_1-n_1 + r_2-n_2 computational nodes,

    ξ_i^(12) = ξ_i^(1),              n_1 < i <= r_1,
    ξ_i^(12) = ξ_{i-r_1+n_2}^(2),    r_1 < i <= r_1 + r_2 - n_2.

Associated with Algorithms 1, 2 and 12 are the matrices of partial relative derivatives of outputs with respect to input and computational nodes. For Algorithm 1, A_1 is an m_1 x n_1 matrix with elements a_ij^(1) = ρη_i^(1)/ρξ_j^(1), and B_1 is an m_1 x (r_1-n_1) matrix with elements b_ij^(1) = ρη_i^(1)/ρξ_{n_1+j}^(1). Similarly, for Algorithm 2, A_2 is an m_2 x n_2 matrix with elements a_ij^(2) = ρη_i^(2)/ρξ_j^(2), and B_2 is an m_2 x (r_2-n_2) matrix with elements b_ij^(2) = ρη_i^(2)/ρξ_{n_2+j}^(2). Finally, for Algorithm 12, A_12 is an m_2 x n_1 matrix with elements a_ij^(12) = ρη_i^(12)/ρξ_j^(12) = ρη_i^(2)/ρξ_j^(1), and B_12 is an m_2 x (r_1-n_1 + r_2-n_2) matrix with elements

    b_ij^(12) = ρη_i^(12)/ρξ_{n_1+j}^(12) = ρη_i^(2)/ρξ_{n_1+j}^(1),            1 <= j <= r_1-n_1,
    b_ij^(12) = ρη_i^(12)/ρξ_{n_1+j}^(12) = ρη_i^(2)/ρξ_{n_2+j-(r_1-n_1)}^(2),  r_1-n_1 < j <= r_1-n_1 + r_2-n_2.

The matrices for Algorithm 12 may be computed from the matrices for Algorithms 1 and 2. Consider an element of A_12, a_ij^(12) = ρη_i^(2)/ρξ_j^(1). Any path between η_i^(2) and ξ_j^(1) must pass through η_k^(1) = ξ_k^(2), 1 <= k <= n_2, the outputs of Algorithm 1 and inputs of Algorithm 2. Hence, by the path product property of partial relative derivatives,

    a_ij^(12) = Σ_{k=1}^{n_2} (ρη_i^(2)/ρξ_k^(2))(ρη_k^(1)/ρξ_j^(1)) = Σ_{k=1}^{n_2} a_ik^(2) a_kj^(1).

Thus,

    A_12 = A_2 A_1.                                                           (8.1)

Let 1 <= j <= r_1-n_1, and consider an element of B_12, b_ij^(12) = ρη_i^(2)/ρξ_{n_1+j}^(1). By employing the path product argument,

    b_ij^(12) = Σ_{k=1}^{n_2} a_ik^(2) b_kj^(1).

For r_1-n_1 < j <= r_1-n_1 + r_2-n_2,

    b_ij^(12) = ρη_i^(2)/ρξ_{n_2+j-(r_1-n_1)}^(2) = b_{i,j-(r_1-n_1)}^(2).

Thus,

    B_12 = (A_2 B_1, B_2).                                                    (8.2)

Using (8.1) and (8.2) to compute A_12 and B_12 from A_1, B_1, A_2, and B_2 may be more efficient than computing them by (4.1) from A-hat_12 and B-hat_12, where A-hat_12 is an (r_1-n_1 + r_2-n_2) x n_1 matrix of the form

    A-hat_12 = ( A-hat_1 )
               (    0    )

and B-hat_12 is an (r_1-n_1 + r_2-n_2) x (r_1-n_1 + r_2-n_2) matrix of the form

    B-hat_12 = ( B-hat_1     0     )
               (    X     B-hat_2  ),

where X is zero except that the column corresponding to the node holding the i-th output of Algorithm 1 contains column i of A-hat_2.

A forward analysis of Algorithm 12 gives

    |[ρη_i^(12)]_T| <= Σ_{j=1}^{n_1} |a_ij^(12)| + Σ_{j=1}^{r_1-n_1+r_2-n_2} |b_ij^(12)|   (8.3)

                    <= ||A_12||_inf + ||B_12||_inf,                           (8.4)

where the same bound is chosen for all i, 1 <= i <= m_2. Using (8.1), the condition number of the problem may be bounded by

    ||A_12||_inf <= ||A_2||_inf ||A_1||_inf,

and using (8.2), the condition number of the algorithm may be bounded by

    ||B_12||_inf <= ||A_2||_inf ||B_1||_inf + ||B_2||_inf.

Naturally, these easily computed bounds will be more pessimistic than those of (8.4) or (8.3), which require the calculation of A_12 and B_12.
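A minimal sketch of (8.1)-(8.2) and of the cheap norm bounds follows, assuming the four part-matrices are available as numpy arrays; the function names are hypothetical.

    import numpy as np

    def compose(A1, B1, A2, B2):
        """A12 and B12 of the composite algorithm, per (8.1)-(8.2).
        Requires m1 = n2, i.e. A1.shape[0] == A2.shape[1]."""
        A12 = A2 @ A1                    # (8.1)
        B12 = np.hstack([A2 @ B1, B2])   # (8.2)
        return A12, B12

    def inf_norm(M):
        """Infinity norm: the largest absolute row sum."""
        return np.abs(M).sum(axis=1).max()

    def cheap_forward_bound(A1, B1, A2, B2):
        """Pessimistic but cheap version of (8.4), built only from the
        parts: ||A2||*||A1|| + ||A2||*||B1|| + ||B2||."""
        return inf_norm(A2) * (inf_norm(A1) + inf_norm(B1)) + inf_norm(B2)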
A backward analysis for Algorithm 12 can be done if a backward analysis is possible for both Algorithms 1 and 2. The rank of A_12 will be m_2 if rank(A_1) = m_1 >= m_2 and rank(A_2) = m_2. Let the stability numbers for backward analysis for Algorithms 1 and 2 be denoted

    Θ_1 = glb{θ > 0: B_1 C_{r_1-n_1} is contained in θ A_1 C_{n_1}}           (8.5)

and

    Θ_2 = glb{θ > 0: B_2 C_{r_2-n_2} is contained in θ A_2 C_{n_2}},          (8.6)

respectively. From (8.5) it follows that

    A_2 B_1 C_{r_1-n_1} is contained in Θ_1 A_2 A_1 C_{n_1}.                  (8.7)

If Θ' is defined by

    Θ' = glb{θ > 0: C_{n_2} is contained in θ A_1 C_{n_1}}                    (8.8)

(such a Θ' exists since rank(A_1) = m_1 = n_2), then from (8.6),

    B_2 C_{r_2-n_2} is contained in Θ_2 Θ' A_2 A_1 C_{n_1}.                   (8.9)

Together, (8.7) and (8.9) imply

    (A_2 B_1, B_2) C_{r_1-n_1+r_2-n_2} is contained in (Θ_1 + Θ_2 Θ') A_2 A_1 C_{n_1}.   (8.10)

But the stability number for backward analysis for Algorithm 12 is

    Θ_12 = glb{θ > 0: B_12 C_{r_1-n_1+r_2-n_2} is contained in θ A_12 C_{n_1}},   (8.11)

and hence,

    Θ_12 <= Θ_1 + Θ_2 Θ'.                                                     (8.12)

While (8.12) shows how the stability number for backward analysis for Algorithm 12 may be related to those of Algorithms 1 and 2, the evaluation of this bound is complicated by the necessity of computing Θ' in (8.8) by one of the exhaustive methods. Nevertheless, the amount of computation will certainly be much less than that required to compute Θ_12 by exhaustive methods.

Finally, the stability number for B-analysis for Algorithm 12 can be bounded in terms of those for Algorithms 1 and 2. Let the stability numbers for B-analysis for Algorithms 1 and 2 be denoted by

    Θ~_1 = glb{θ > 0: B_1 C_{r_1-n_1} is contained in θ(C_{m_1} + A_1 C_{n_1})}   (8.13)

and

    Θ~_2 = glb{θ > 0: B_2 C_{r_2-n_2} is contained in θ(C_{m_2} + A_2 C_{n_2})},  (8.14)

respectively. From (8.13) it follows that

    A_2 B_1 C_{r_1-n_1} is contained in Θ~_1 (A_2 C_{m_1} + A_2 A_1 C_{n_1}).     (8.15)

Combining (8.14) and (8.15),

    A_2 B_1 C_{r_1-n_1} + B_2 C_{r_2-n_2} is contained in
        Θ~_2 C_{m_2} + (Θ~_1 + Θ~_2) A_2 C_{n_2} + Θ~_1 A_2 A_1 C_{n_1}.      (8.16)

Define Θ'' by

    Θ'' = glb{θ > 0: A_2 C_{n_2} is contained in θ C_{m_2}};

clearly, Θ'' = ||A_2||_inf. Then (8.16) becomes

    A_2 B_1 C_{r_1-n_1} + B_2 C_{r_2-n_2} is contained in
        (Θ~_2 + (Θ~_1 + Θ~_2) Θ'') C_{m_2} + Θ~_1 A_2 A_1 C_{n_1}.            (8.17)

But the stability number for B-analysis for Algorithm 12 is

    Θ~_12 = glb{θ > 0: B_12 C_{r_1-n_1+r_2-n_2} is contained in θ(C_{m_2} + A_12 C_{n_1})},

and hence,

    Θ~_12 <= max(Θ~_2 + (Θ~_1 + Θ~_2) Θ'', Θ~_1).
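A minimal sketch of this final inequality, assuming the two part stability numbers Θ~_1 and Θ~_2 have already been computed (by any of the methods of section 6) and that A_2 is available as an array; the function name is hypothetical.

    import numpy as np

    def composite_b_analysis_bound(theta1_tilde, theta2_tilde, A2):
        """Theta~_12 <= max(Theta~_2 + (Theta~_1 + Theta~_2)*||A2||_inf,
        Theta~_1), with Theta'' = ||A2||_inf as in the text."""
        theta_pp = np.abs(A2).sum(axis=1).max()   # Theta'' = ||A2||_inf
        return max(theta2_tilde + (theta1_tilde + theta2_tilde) * theta_pp,
                   theta1_tilde)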
9. Algorithm Improvement

From the analysis of an algorithm it is possible to detect those operations which have contributed the most to the poor performance of the algorithm. Knowing which operations are at fault may indicate where the algorithm might be changed to improve it. This information is data dependent, and enables only a limited improvement for the algorithm evaluated with specific data. Mathematical identities are used to replace the offending operations.

One goal in algorithm improvement might be to reduce the forward error bounds, |[ρη_i]_T|, 1 <= i <= m. Since ||B||_inf = max_i Σ_j |b_ij|, reducing the forward bounds may decrease the norm of B. By decreasing the norm of B, the polytope for the algorithm, B C_{r-n}, also shrinks, so that the backward or B-analysis may determine a smaller stability number, Θ. Recall (5.6),

    |[ρη_i]_T| <= Σ_{j=n+1}^{r} |ρη_i/ρξ_j|.                                  (9.1)

Let I denote a set of indices of nodes in a sub-graph of the algorithm. Then (9.1) may be expressed as

    |[ρη_i]_T| <= Σ_{j in I} |ρη_i/ρξ_j| + Σ_{j not in I} |ρη_i/ρξ_j|.        (9.2)

Let I' denote a set of indices of nodes for a sub-graph which is mathematically equivalent to that specified by I. Then, by substituting the new sub-graph, (9.2) becomes

    |[ρη_i]_T| <= Σ_{j in I'} |ρη_i/ρξ_j| + Σ_{j not in I} |ρη_i/ρξ_j|.       (9.3)

If the substitution leads to a decrease in the value of the first sum, then the bound has also been correspondingly decreased.

Consider a simple case in which I and I' consist of only one node, say ξ_q. The sub-graph consisting of ξ_q and its surrounding nodes portrays the statements ξ_q = ξ_a * ξ_b and ξ_p = ξ_q * ξ_c, where * is any one of (+, -, ×, ÷); call this graph (9.4). If the statements ξ_q' = ξ_b * ξ_c and ξ_p = ξ_a * ξ_q', corresponding to a second graph (9.5), are equivalent to (9.4), i.e., both mathematically produce the same value for ξ_p, then (9.5) may be substituted for (9.4). This change is possible only if ξ_q is an operand of ξ_p and of no other node. Because of this, all the quantities ρη_i/ρξ_j remain unchanged, except that ρη_i/ρξ_q becomes ρη_i/ρξ_q'. Since

    ρη_i/ρξ_q = (ρη_i/ρξ_p)(ρξ_p/ρξ_q)   and   ρη_i/ρξ_q' = (ρη_i/ρξ_p)(ρξ_p/ρξ_q'),   (9.6)

the change in ρη_i/ρξ_q, and hence in the bounds |[ρη_i]_T|, is governed by the change in the arc weight from ρξ_p/ρξ_q to ρξ_p/ρξ_q'. This fact supports the idea of investigating those sub-graphs which contain large arc weights, since it is there that the greatest improvement can be made. However, each time an arc weight is reduced, some nearby arc weight is increased; this phenomenon is a result of the fact that the path product between unchanged nodes must remain the same.

Certainly, sub-graphs corresponding to algebraic identities are mathematically equivalent. The simplest such identities are the associative and distributive laws, which correspond to five-node graphs similar to (9.4) and (9.5). For the associative law of addition, three interchangeable graphs correspond to the identities

    (ξ_1 + ξ_2) + ξ_3,    ξ_1 + (ξ_2 + ξ_3),    (ξ_1 + ξ_3) + ξ_2.

If the first graph were a sub-graph of an algorithm, then successive replacements of it by the other two would replace the term |ρη_i/ρξ_4| in (9.1) by |ρη_i/ρξ_4'| and then |ρη_i/ρξ_4''|. By (9.6), the change in |[ρη_i]_T| is determined by the values of ρξ_5/ρξ_4, ρξ_5/ρξ_4', and ρξ_5/ρξ_4'', one arc weight in each graph. Clearly, the substitution to be made should choose the graph with the smallest absolute arc weight. The reduction in the arc weight between ξ_4 and ξ_5 will be compensated for by an increase in an arc weight between the new ξ_4 and one of its operands. Thus, the overall effect is to move large arc weights toward the inputs; i.e., any operations involving cancellation should be done early in the computation. This suggests that the graph be investigated from the bottom up, so that fewer and fewer paths contain the arcs with large weights. Since the weight of an incoming arc to an additive node is the ratio of the operand to the result, the graph with the smallest absolute arc weight will be the one which computes the smallest ξ_4 in absolute value.

This fact may be generalized to show the best way to compute η = Σ_{i=1}^{n} ξ_i. Any algorithm to compute η may be specified by a sequence of partial sums ξ_{n+i}, 1 <= i <= n-1, where each ξ_{n+i} is the sum of two inputs, of one input and a previous partial sum, or of two previous partial sums. Since

    |[ρη]_T| <= Σ_{i=1}^{n-1} |ρη/ρξ_{n+i}|   and   ρη/ρξ_{n+i} = ξ_{n+i}/η,

the forward error bound can be minimized by choosing the partial sums so as to minimize

    Σ_{i=1}^{n-1} |ξ_{n+i}|.

For example, if ξ_1 = 1, ξ_2 = 1.25, ξ_3 = 1.5, and ξ_4 = 1.75, then the algorithm

    ξ_5 = ξ_1 + ξ_2 = 2.25
    ξ_6 = ξ_3 + ξ_4 = 3.25
    η = ξ_7 = ξ_5 + ξ_6 = 5.50

has a forward error bound of 2, while the algorithm which adds the numbers in ascending order,

    ξ_5 = ξ_1 + ξ_2 = 2.25
    ξ_6 = ξ_5 + ξ_3 = 3.75
    η = ξ_7 = ξ_6 + ξ_4 = 5.50,

has a forward error bound of 46/22 > 2.
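A minimal numerical check of this example follows, assuming numpy and the bound Σ|ξ_{n+i}/η| over all partial sums (including the final one) as derived above; the function name is hypothetical.

    import numpy as np

    def summation_forward_bound(partial_sums, total):
        """Forward error bound of a summation algorithm: the sum over
        its partial sums xi_{n+i} of |xi_{n+i} / eta|."""
        return np.abs(np.asarray(partial_sums)).sum() / abs(total)

    xi = [1.0, 1.25, 1.5, 1.75]
    pairwise = [xi[0] + xi[1], xi[2] + xi[3]]        # 2.25, 3.25
    pairwise.append(pairwise[0] + pairwise[1])       # eta = 5.50
    ascending = [xi[0] + xi[1]]                      # 2.25
    ascending.append(ascending[-1] + xi[2])          # 3.75
    ascending.append(ascending[-1] + xi[3])          # eta = 5.50

    print(summation_forward_bound(pairwise, 5.5))    # 2.0
    print(summation_forward_bound(ascending, 5.5))   # 2.0909... = 46/22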
For the associative law of multiplication, the three interchangeable graphs correspond to the identities

    (ξ_1 × ξ_2) × ξ_3,    ξ_1 × (ξ_2 × ξ_3),    (ξ_1 × ξ_3) × ξ_2.

However, since the weights of incoming arcs to multiplicative nodes are 1, no immediate improvement can be made by interchanging these graphs. Such interchanges may, however, prevent overflow or underflow at the intermediate node ξ_4.

For the distributive law of multiplication over addition, the two interchangeable graphs, (I) and (II), correspond to the identities

    (I)  ξ_1 × (ξ_2 + ξ_3)          (II)  (ξ_1 × ξ_2) + (ξ_1 × ξ_3).

This law is similar to the other distributive laws involving ξ_1 × (ξ_2 - ξ_3), (ξ_2 + ξ_3)/ξ_1, and (ξ_2 - ξ_3)/ξ_1, and what follows is applicable to these variants. The choice between the two graphs involves comparing the values of |ρξ_5/ρξ_4| and |ρξ_5/ρξ_4'| + |ρξ_5/ρξ_4''|. While the first is always equal to 1, the second may be much larger. Since the sum of the weights of incoming arcs to an additive node equals 1, i.e., ρξ_5/ρξ_4' + ρξ_5/ρξ_4'' = 1, it follows that |ρξ_5/ρξ_4'| + |ρξ_5/ρξ_4''| >= 1. Thus, graph (II) can never be better than graph (I). As with the associative law of addition, the trend is to move larger arc weights toward the inputs.

Finally, consider the identity

    ξ_1² - ξ_2² = (ξ_1 + ξ_2)(ξ_1 - ξ_2)

and its associated graphs, (III) and (IV). The choice here depends on the values of |ρξ_5/ρξ_4| + |ρξ_5/ρξ_4'| and |ρξ_5/ρξ_4''| + |ρξ_5/ρξ_4'''|. The latter always has value 2, while the former may take any value greater than 1. Conservatively, the choice should be graph (IV). Again, the trend is to have any operations involving cancellation done early in the computation.
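A small numerical illustration of the difference-of-squares choice, assuming IEEE single precision via numpy.float32; the data values are arbitrary nearly equal numbers chosen to force cancellation, and the printed magnitudes are rough.

    import numpy as np

    x, y = np.float32(1.0002), np.float32(1.0001)
    exact = float(x) * float(x) - float(y) * float(y)  # double precision reference
    form3 = x * x - y * y            # graph (III): cancellation after squaring
    form4 = (x + y) * (x - y)        # graph (IV): cancellation done early
    print(abs(form3 - exact) / abs(exact))   # large: squaring errors magnified
    print(abs(form4 - exact) / abs(exact))   # small: near machine precision

The rounding errors committed in forming the squares are magnified by the factor (ξ_1² + ξ_2²)/|ξ_1² - ξ_2²| in graph (III), while graph (IV) performs the cancelling subtraction on exact data first.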
The previous discussion has shown that offending operations may be detected by the presence of large arc weights leading from their nodes. Through the use of mathematical identities, improvements by graph substitution can be made which reduce the bounds for forward analysis, and hence for the other analyses as well. The underlying goal of these improvements is to force large arc weights onto the arcs leading from input nodes, where they do not contribute to the bound in (9.1). The identities described involve very small graphs and hence are limited in their effectiveness. Additionally, a graph substitution often enables another substitution to be made at nearby nodes. The inability to know when to apply the associative law of multiplication may inhibit some later substitution, and suggests that greater improvement can be made by considering larger equivalent graphs. However, this technique is limited by the fact that the computational graph does not contain all the information about the algorithm or the problem it solves. For example, the solution x of Ax = b satisfies each row equation, and the columns q_i of Q in A = QR satisfy q_i^t q_j = 0, i ≠ j. Such information is not available from the inspection of graph nodes or algebraic identities. Yet it is this type of information which enables one to transform Cramer's rule into Gaussian elimination, and Gram-Schmidt into modified Gram-Schmidt. Algorithm improvement which could utilize this additional information would have great consequences.

10. Conclusion

Methods have been presented for the automatic error analysis of numerical algorithms. These methods have been developed using Bauer's approach to relative error propagation. Basic to this approach are the concepts of local and total relative errors, and the various systems of equations which relate them. An algorithm is presented which makes use of the equivalence of these systems in order to compute efficiently the equations employed by the error analyses. A modification of the algorithm is also shown to compute efficiently the system of equations used by Miller in his approach to error analysis.

Three error analyses are presented. A forward analysis, which finds a relative error bound for the computed solution, is shown to be easily calculated from the coefficients of the system relating local and total relative errors. This computational ease is contrary to a remark on manual analysis by Wilkinson [19, p. 3], which states that, "In practice, we have found that the backward analysis is often much the simpler." Computationally, the backward analysis, and the B-analysis as well, have proven to be much more difficult than the forward analysis. These analyses, which evaluate the stability of the algorithm, are identified as minimax problems, and as such require tremendous work for their solution; they could be applied only to small problems. A simplification of these analyses significantly reduced the work required, but the work nonetheless remained exponential or combinatorial in form. An approximation method, which is linear in comparison with the exact methods, is proposed, and experimental results indicate that the approximation is very good. The analyses presented are all worst-case analyses; this is caused by the use of the ∞-norm in the measurement of relative errors. The results of the analyses may, therefore, be pessimistic.

Various spin-offs of the main theme are investigated. First, different criteria are presented for the comparison of algorithms; the choice between two algorithms depends on the criteria used. Second, error analyses are derived for composite algorithms, which are made up of smaller sub-algorithms. Bounds for forward, backward, and B-analysis of the composite algorithm are given in terms of the corresponding analyses of the sub-algorithms. Finally, by using a forward analysis, limited algorithm improvement is obtained by the location of sensitive operations and the substitution of mathematically equivalent expressions.

Several problems remain unsolved and point the way for future research. The most nagging is the requirement that the value of each node in the computational graph be non-zero. This is a result of the use of relative error in the description of the numerical effects of floating point arithmetic; arc weights for additive nodes with zero value are infinite. Possible solutions include the replacement of zero by a non-zero quantity which computationally acts as zero, or the assignment of huge arc weights for those defined as infinite. Alternatively, the algorithm may be modified, as in Miller [9], to remove some of the offending nodes altogether. The simplification of polytope representation in section 6.2 is performed by taking inner products of the columns of B and A to find those columns which may be condensed; this technique is computationally expensive. An alternative method might perform a condensation of the nodes in the computational graph before the matrices are formed. A condensed node would be made up of nodes to which the outputs have similar sensitivities; the sets S_j defined in theorem 6.3 may provide a clue as to which nodes may be condensed.

Much remains to be done in the area of algorithm improvement. Ideally, methods for algorithm improvement should be data independent.
This, however, may not be possible, since data can usually be constructed so that any algorithm performs poorly. The best way to solve a problem may depend directly on the data, and the best algorithm may be different for various chosen data. An alternative approach might seek to find the set of data for which a given algorithm performs satisfactorily. To be effective, either approach must incorporate properties of the problem and the algorithm not found in the computational graph.

Naturally, an algorithm may be improved in other ways. In particular, parallel processing imposes other desirable criteria. Bounds on processors, execution time, and storage required become equally important considerations; these quantities are reflected in the height and breadth of the computational graph. The error analyses described herein can serve to evaluate the tradeoff between numerical stability and other criteria.

Bibliography

[1] Ascher, U., "Linear Programming Algorithms for the Chebyshev Solution to a System of Consistent Linear Equations," SIAM J. Numer. Anal., Vol. 10, No. 2, April 1968.

[2] Barrodale, I. and F. D. K. Roberts, "An Improved Algorithm for Discrete l1 Linear Approximation," SIAM J. Numer. Anal., Vol. 10, No. 5, October 1973.

[3] Barrodale, I. and F. D. K. Roberts, "Solution of an Overdetermined System of Equations in the l1 Norm," Algorithm 478, Communications of the ACM, Vol. 17, No. 6, June 1974.

[4] Bauer, F. L., "Computational Graphs and Rounding Error," SIAM J. Numer. Anal., Vol. 11, No. 1, March 1974.

[5] Householder, A., The Theory of Matrices in Numerical Analysis, Blaisdell, 1964.

[6] Larson, J. and A. Sameh, "Efficient Calculation of the Effects of Roundoff Errors," to appear in ACM TOMS.

[7] Linnainmaa, S., "Taylor Expansion of the Accumulated Rounding Error," BIT, Vol. 16, No. 2, 1976.

[8] Miller, W., "Roundoff Analysis by Direct Comparison of Two Algorithms," SIAM J. Numer. Anal., Vol. 13, No. 3, June 1976.

[9] Miller, W., "Roundoff Analysis and Sparse Data," Numerische Mathematik, Vol. 29, No. 1, 1977.

[10] Miller, W., "Software for Roundoff Analysis," ACM TOMS, Vol. 1, No. 2, June 1975.

[11] Miller, W., "Computer Search for Numerical Instability," Journal of the ACM, Vol. 22, No. 4, October 1975.

[12] Miller, W. and D. Spooner, "Software for Roundoff Analysis, II," to appear in ACM TOMS.

[13] Nijenhuis, A. and H. S. Wilf, Combinatorial Algorithms, Academic Press, 1975.

[14] Panzer, K., "Gutartigkeit von Rechenprozessen," Technische Universität München, Fachbereich Mathematik, Interner Bericht, September 1975.

[15] Rabinowitz, P., "Applications of Linear Programming to Numerical Analysis," SIAM Review, Vol. 10, No. 2, April 1968.

[16] Sameh, A. and R. Brent, "Solving Triangular Systems on a Parallel Computer," SIAM J. Numer. Anal., Vol. 14, No. 6, December 1977.

[17] Sterbenz, P. H., Floating Point Computation, Prentice Hall, 1974.

[18] Stewart, G. W., Introduction to Matrix Computations, Academic Press, 1973.

[19] Wilkinson, J. H., Rounding Errors in Algebraic Processes, Prentice Hall, 1963.

Appendix

Description of the algorithms tested.

BACK4, BACK8: an algorithm for forward substitution on unit lower triangular matrices of dimension 4 and 8. For the data used, see PARA4.

SEMI4, SEMI8: an algorithm for solving unit lower triangular systems Ax = b of dimension 4 and 8 according to the following scheme. Let A = L_1^{-1} L_2^{-1} ... L_{n-1}^{-1}, where L_k = (λ_ij) is the identity except in the k-th column, where λ_ik = -a_ik, i > k.
Then x = (L_3(L_2(L_1 b))) for dimension 4, and similarly for dimension 8. For the data used, see PARA4.

PARA4, PARA8 [16]: similar to SEMI4 and SEMI8, except that a log product is used, i.e., x = (L_3 L_2)(L_1 b) for dimension 4, and similarly for dimension 8. The data used for dimension 4 was

    A = [  1              ]    b = [ 1 ]
        [ -2   1          ]        [ 1 ]
        [ -3  -5   1      ]        [ 1 ]
        [ -4  -6  -7   1  ]        [ 1 ]

For dimension 8 the data was

    A = [  1                                      ]
        [ -1   1                                  ]
        [ -2  -8   1                              ]
        [ -3  -9  -14   1                         ]
        [ -4 -10  -15 -19    1                    ]
        [ -5 -11  -16 -20  -23   1                ]
        [ -6 -12  -17 -21  -24 -26   1            ]
        [ -7 -13  -18 -22  -25 -27 -28   1        ],    b = (1, 1, ..., 1)^t.

LTBAND: an algorithm by Sameh and Brent [16] for the parallel solution of banded lower triangular systems. The data used was a banded lower triangular matrix of order 8 with diagonal elements 4 and off-diagonal elements -1 within the band.

MATMUL: an algorithm for multiplying two matrices of dimension 4. The data was

    A = [ 1   5   9  13 ]    B = [  1   5   9  13 ]
        [ 2   6  10  14 ]        [ -2  -6 -10 -14 ]
        [ 3   7  11  15 ]        [  3   7  11  15 ]
        [ 4   8  12  16 ]        [ -4  -8 -12 -16 ]

RINVERSE: an algorithm for inverting an upper triangular matrix. The data [12] was

    [ 0.0004  4.5692  4.8228   3.1819 ]
    [         3.9494  4.1681   3.7357 ]
    [                 0.004   -5.0996 ]
    [                          4.4464 ]

CHOL: the Cholesky method applied to the data [12]

    A = [ 3  1  1  1 ]
        [ 1  4  1  1 ]    (symmetric).
        [ 1  1  5  1 ]
        [ 1  1  1  6 ]

GE: Gaussian elimination without pivoting, with data [12] as in CHOL.

CG: the method of conjugate gradients applied to the data [12]

    A = [ 3  1  1 ]                  b = [ 5 ]
        [ 1  4  1 ]    (symmetric),      [ 6 ]
        [ 1  1  5 ]                      [ 7 ]

GJ: Gauss-Jordan elimination, with data as in CHOL.

GS: the Gram-Schmidt orthogonalization algorithm applied to the three 4-vectors

    v_1 = [ 10 ]    v_2 = [ 10 ]    v_3 = [ -5 ]
          [ -6 ]          [ -6 ]          [ 10 ]
          [-13 ]          [ 14 ]          [ 14 ]
          [ 10 ]          [  7 ]          [ 15 ]

MGS: the modified Gram-Schmidt algorithm, with data as in GS.

Vita

The author, John L. Larson, was born February 6, 1949, in Champaign, Illinois. He received B.S., M.S., and Ph.D. degrees from the University of Illinois at Urbana-Champaign in the years 1971, 1974, and 1978, respectively. During his graduate education, he was employed as a teaching and research assistant with the Department of Computer Science. The titles of his publications include "Efficient Calculation of the Effects of Roundoff Errors" (with Ahmed H. Sameh) and "Automatic Error Analysis for Serial and Parallel Algorithms." He is a member of the Association for Computing Machinery and the Society for Industrial and Applied Mathematics.
Additionally, the analysis of a composite algorithm, lich is made up of concatenated sub-algorithms is given in terms of analyses done on ts parts. Finally, by using a forward analysis, limited algorithm improvement is jtained by the location of sensitive operations, and the substitution of Key words and Document Analysis. i7o. Descriptors mathematically equivalent expressions. Jtomatic error analysis jmerical stability inear algebra i. Identifiers/Open-Ended Terms . COSATI Field/Group Availability Statement ELEASE UNLIMITED 19. Security Class (This Report) UNCLASSIFIED 20. Security Class (This Page UNCLASSIFIED 21. No. of Pages 106 22. Price 'M NTIS-35 ( 10-70) USCOMM-DC 40329-P7 1 JUN 2 1978 I