Report No. UIUCDCS-R-78-916                                    UILU-ENG 78 1709

METHODS FOR AUTOMATIC ERROR ANALYSIS OF NUMERICAL ALGORITHMS*

by

John Leonard Larson

April 1978

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the National Science Foundation under Grant No. US NSF-MCS75-21758 and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, April 1978.

METHODS FOR AUTOMATIC ERROR ANALYSIS OF NUMERICAL ALGORITHMS

John Leonard Larson, Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign, 1978

Using Bauer's approach to relative error propagation, methods are presented for performing various error analyses on numerical algorithms. A technique is developed which generates a system of equations employed by the error analyses, and requires an order of magnitude less time and storage than present techniques. Accurate analyses are attempted by working in the $\infty$-norm, and require the solution of minimax problems. A heuristic approximation method is also described. These methods are compared with the 2-norm approximation methods of Miller. The error analyses provide alternative criteria by which algorithms that solve the same problem may be compared. Additionally, the analysis of a composite algorithm, which is made up of concatenated sub-algorithms, is given in terms of analyses done on its parts. Finally, by using a forward analysis, limited algorithm improvement is obtained by the location of sensitive operations and the substitution of mathematically equivalent expressions.

Acknowledgments

I would like to express my sincere appreciation to my thesis advisor, Professor Ahmed H. Sameh, for his guidance, insight, encouragement, and friendship. I would also like to thank the members of my thesis committee for their interest and comments: Professors David Kuck, Dave Liu, Paul Saylor, and Robert Skeel. Additional thanks go to Professor Webb Miller for providing his software routines and for his interest in my work.

I am grateful to the National Science Foundation and the Department of Computer Science for their financial support. Thanks are due to Ms. Cathy Gallion for an excellent job of typing. Finally, I would like to thank my wife, Emmylou, for her love, patience and encouragement, and for her skill in preparing the illustrations.

Table of Contents

1. Introduction .................................................  1
2. Preliminaries ................................................  6
3. Roundoff Errors .............................................. 11
4. Efficient Calculation of M-bar ............................... 19
5. Error Analyses ............................................... 25
   5.1. Forward Analysis ........................................ 25
   5.2. Backward Analysis ....................................... 27
   5.3. B-analysis .............................................. 34
6. Computational Details ........................................ 39
   6.1. The Form of g* .......................................... 39
   6.2. Simplification of Polytope Representation ............... 40
   6.3. Algorithms for g* ....................................... 48
7. Comparison of Algorithms ..................................... 66
8. Composite Algorithms ......................................... 77
9. Algorithm Improvement ........................................ 85
10. Conclusion .................................................. 94
Bibliography .................................................... 97
Appendix ........................................................ 99
Vita ............................................................ 102

1. Introduction

New algorithms are constantly being proposed for the solution of numerical problems.
In particular, these include algorithms which exploit the capabilities of new and emerging parallel and pipeline machines. Unfortunately, this enthusiasm has not been matched by the development of systematic ways to evaluate the numerical properties of these new algorithms. In fact, until recently [Miller, 8-12] little work has been done in the area of automating error analysis for existing sequential algorithms. Besides J. H. Wilkinson, few people have the time or inclination to make tedious error analyses of the algorithms they use. Error analysis needs to be organized in such a way that it is easily done by machine.

This thesis is concerned with the development and implementation of methods for automatic error analysis of numerical algorithms. This work parallels that of Miller [10], but takes a different approach to the analysis of numerical algorithms. His development is from an absolute error point of view, while this thesis is concerned with relative errors, a natural approach considering that floating point arithmetic is defined in terms of relative error. Perhaps the greatest difference in the approaches lies in their goals. While he strives to find data for which an approximate analysis indicates that the algorithm under investigation performs poorly, this thesis strives to give an accurate analysis of the algorithm for some given data. Each goal has its place depending on the need for the analysis.

Section 2 begins with a summary of previous work, and shows how roundoff errors affect computations. An algorithm is represented as a directed graph: each numerical quantity is a node, and directed arcs lead from operands to results. During the execution of the algorithm, the value at each node is subject to roundoff error, called local relative error, as described in section 3. The local relative error in a node subsequently affects the values of other nodes; in particular, it affects those nodes which use the contaminated node as an operand. The magnitude of the effect on one of these nodes is reflected by a number which weights the arc between it and the contaminated node. More generally, the local relative error in one node affects the value of another node if there exists a directed path between them, and the effect along a single path is the product of the arc weights on that path. The effects of all local relative errors combine to produce a total relative error in each node.

The definition of total relative error, when described in this way, is called the global definition, since all the nodes in the graph are inspected. An alternative formulation, called the local definition, recursively defines the total relative error in a node in terms of the total relative errors in its operands. Each definition gives rise to a system of equations which relates the local and total relative errors. The system corresponding to the global definition is dense, while the local definition, as expected, produces an exceedingly sparse system with no more than four non-zero elements per row. The equivalence of these two systems is shown.

Either of these systems could be used to perform various error analyses. However, only those rows of the dense system corresponding to output nodes need be used. Current methods generate the entire dense system, and then discard all but the desired rows. Section 4 derives an algorithm which relies on the equivalence relation and computes only those rows which are needed by solving sparse upper triangular systems.
A comparison of this method with an existing alternative shows an order of magnitude savings in both time and storage.

Section 5 poses the problems to be solved in order to perform forward, backward, and B-analysis. A forward analysis determines relative error bounds for the computed solution. This bound may be easily computed from the system relating the local and total relative errors. A backward analysis determines the stability of an algorithm by finding a bound on relative input perturbations such that the computed solution satisfies the perturbed problem. This bound is found by comparing the set of output relative error vectors for all computed solutions with the set produced by local relative errors in only the input nodes. Conditions are specified under which a backward analysis is not possible. A B-analysis is an alternative stability measurement which determines a bound on relative input and output perturbations such that the computed solution is near the exact solution of a perturbed problem. This bound is found by comparing the set of output relative error vectors for all computed solutions with the set generated by local relative errors in only input and output nodes. An advantage of the B-analysis is that it may always be done.

Section 6 analyzes the problems posed in section 5. The problems for backward and B-analysis are identified as minimax problems, whose solutions are often tedious to obtain. These problems involve the expansion of a polytope (a centrally symmetric convex set) to cover another polytope. The polytopes represent the sets of output relative error vectors described above. Two exhaustive methods are described. The first method searches the vertices of the polytope being covered for the vertex of maximal relative distance. The second method searches the faces of the polytope doing the covering. Each method makes use of the particular form of the solution. Additionally, a close inspection of the problems reveals that their complexity may often be significantly reduced. This is reflected in a reduction of the number of columns in the system which relates local and total relative errors. Nevertheless, the computation time of the exhaustive methods precludes their use for all but small problems. An approximation method is also given which computes a lower bound on the solution and occasionally produces the exact solution.

The remaining sections investigate several topics related to the main theme. Section 7 examines various ways by which algorithms may be compared. The first is in terms of a forward analysis, where one algorithm is preferable to another if it can guarantee a smaller relative error bound for the solution. This method is equivalent to comparing, for each algorithm, the set of all computed solution relative error vectors with the unit cube. A second method, proposed by Miller [8], compares the sets of output relative error vectors for each algorithm against one another. The results of this comparison may indicate that, in some sense, each algorithm is preferable to the other. This dilemma is overcome by a third method which, like the first method, uses a common standard against which the algorithms are compared. Here, the standard is the problem which the algorithms solve, represented by the set of output relative error vectors resulting from local relative errors in only input and output nodes. Several examples are given.
A composite algorithm is one which is made up of smaller sub-algorithms, where the outputs of one sub-algorithm serve as the inputs to the next. Section 8 shows how a composite algorithm may be analyzed if analyses have been performed on its parts. The relationship between the matrices of coefficients of the systems which relate local and total relative error in the composite algorithm and in the individual sub-algorithms forms the basis upon which the analysis of the composite algorithm may be done. Derivations are presented which provide bounds for forward, backward, and B-analysis of the composite algorithm in terms of corresponding analyses done on its sub-algorithms.

Finally, section 9 scratches the surface of the problem of algorithm improvement. The goal taken is to reduce the forward analysis error bounds, since by doing so bounds for other analyses should also be reduced. Algorithm improvement consists of two parts: locating where changes to the algorithm should be made, and deciding what changes should be incorporated. It is shown that operations that have large weights on the arcs leading from their nodes are prime candidates for modification. A change in the algorithm involving these nodes may take the form of a mathematical identity for which the best alternative may be easily computed. Several basic identities are investigated. This technique is limited in its effectiveness by the fact that the computational graph does not contain all the information about the algorithm or the problem it solves.

2. Preliminaries

This section presents the basic definitions and concepts provided by Bauer [4], and clarified and extended by Larson and Sameh [6].

Definition 2.1: problem - a set $X$ of $n$ input variables, $\xi_i$, $1 \le i \le n$, and a set $Y$ of $m$ output variables, $\eta_i$, $1 \le i \le m$, with $\eta_i = f_i(\xi_1, \xi_2, \ldots, \xi_n)$.

Definition 2.2: algorithm - any finite sequence of elementary operations of the form $\xi_i = \xi_j * \xi_k$, where $*$ is one of $\{+, -, \times, \div\}$, which mathematically realize the elements of $Y$.

Definition 2.3: computational graph - a directed graph, essentially a parse tree of the algorithm, having the $\xi_i$ as nodes and directed arcs from $\xi_j$ and $\xi_k$ to $\xi_i$.

The computational graph contains a total of $r$ nodes consisting of $n$ input nodes, $\xi_i$, $1 \le i \le n$, and $r-n$ computational nodes, $\xi_{n+i}$, $1 \le i \le r-n$, of which $m$ nodes are designated as output nodes, i.e., $\eta_i = \xi_{\omega_i}$, $1 \le i \le m$. The variable $\omega_i$ is used to index the outputs. The nodes are numbered in such a way that if $\xi_i = \xi_j * \xi_k$, then $j < i$ and $k < i$. The value of each node is assumed to be non-zero.

The following model will be used from time to time to illustrate the concepts being presented.

problem: $X = \{\xi_1, \xi_2, \xi_3, \xi_4\}$, $n = 4$; $Y = \{\eta_1, \eta_2\}$, $m = 2$;

    $\eta_1 = (\xi_1 + \xi_2)/(\xi_2 - \xi_3)$,    $\eta_2 = (\xi_2 - \xi_3)(\xi_3 + \xi_4)$

algorithm:

    $\xi_5 = \xi_1 + \xi_2$,  $\xi_6 = \xi_2 - \xi_3$,  $\xi_7 = \xi_3 + \xi_4$,  $\xi_8 = \xi_5 \div \xi_6$,  $\xi_9 = \xi_6 \times \xi_7$

where $\eta_1 = \xi_8$, $\eta_2 = \xi_9$, $\omega_1 = 8$, $\omega_2 = 9$, $r-n = 5$, $r = 9$.

computational graph:

(Figure: the computational graph of the model algorithm. Subsequent graphs will employ implicit downward arrows.)

Definition 2.4: partial relative derivative of $\xi_i$ with respect to $\xi_\ell$ - a quantity, $\rho\xi_i/\rho\xi_\ell$, representing the first order relative effect on the value of $\xi_i$ assuming a relative change in the value of $\xi_\ell$, given by

    $\rho\xi_i/\rho\xi_\ell = (\xi_\ell/\xi_i)(\partial\xi_i/\partial\xi_\ell)$    (2.1)

Sterbenz [17] calls these quantities magnification factors.

The arcs of the computational graph are weighted. The weight of the arc from $\xi_j$ to $\xi_i$ is $\rho\xi_i/\rho\xi_j$. Table 1 illustrates the calculation of arc weights for the elementary operations.
For $\xi_i = \xi_j * \xi_k$:

                       * = +         * = -        * = x      * = /
    rho xi_i/rho xi_j  xi_j/xi_i     xi_j/xi_i      1           1
    rho xi_i/rho xi_k  xi_k/xi_i    -xi_k/xi_i      1          -1

    Table 1. Arc Weights

The calculation of $\rho\xi_i/\rho\xi_\ell$ using (2.1) when $\xi_\ell$ is not an operand of $\xi_i$ is complicated by the evaluation of $\partial\xi_i/\partial\xi_\ell$ by the chain rule. An alternative calculation makes use of the following definition and theorem.

Definition 2.5: path product - the product of the partial relative derivatives belonging to the arcs on a directed path between two nodes.

Theorem 2.1: $\rho\xi_i/\rho\xi_\ell$ is equal to the sum of the path products for all paths from $\xi_\ell$ to $\xi_i$.

Proof: The chain rule may be stated as

    $\partial\xi_i/\partial\xi_\ell = \sum_{p=1}^{s} \prod_{q=1}^{u_p} \partial\xi_{f(p,q)}/\partial\xi_{f(p,q+1)}$    (2.2)

where $s$ is the number of distinct directed paths from $\xi_\ell$ to $\xi_i$; $u_p$ is the number of arcs in path $p$; $u_p + 1$ is the number of nodes in path $p$; and $f(p,q)$ is a function which describes the paths from $\xi_\ell$ to $\xi_i$ in the following manner: for each path $p$, $f(p,1) = i$; $\xi_{f(p,q+1)}$ is an operand of $\xi_{f(p,q)}$; and $f(p, u_p+1) = \ell$. Consider two adjacent nodes, say $\xi_{f(p,q+1)}$ and $\xi_{f(p,q)}$, on one of these paths. From (2.1), the weight of the arc between these nodes is

    $\rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)} = (\xi_{f(p,q+1)}/\xi_{f(p,q)})(\partial\xi_{f(p,q)}/\partial\xi_{f(p,q+1)})$.

Thus,

    $\partial\xi_{f(p,q)}/\partial\xi_{f(p,q+1)} = (\xi_{f(p,q)}/\xi_{f(p,q+1)})(\rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)})$    (2.3)

Substituting (2.3) in (2.2) and applying (2.1) gives the following expression for the partial relative derivative of $\xi_i$ with respect to $\xi_\ell$:

    $\rho\xi_i/\rho\xi_\ell = (\xi_\ell/\xi_i) \sum_{p=1}^{s} \prod_{q=1}^{u_p} (\xi_{f(p,q)}/\xi_{f(p,q+1)})(\rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)})$

For each path the product of the factors $\xi_{f(p,q)}/\xi_{f(p,q+1)}$ telescopes to $\xi_i/\xi_\ell$, so simplification by cancellation results in

    $\rho\xi_i/\rho\xi_\ell = \sum_{p=1}^{s} \prod_{q=1}^{u_p} \rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)}$    (2.4)

Thus, $\rho\xi_i/\rho\xi_\ell$ is the sum of all path products. ■

Trivially, $\rho\xi_i/\rho\xi_i = 1$, and if there is no directed path from $\xi_\ell$ to $\xi_i$, then $\rho\xi_i/\rho\xi_\ell = 0$.

Formula (2.4) may also be expressed recursively. If $\xi_i = \xi_j * \xi_k$, then

    $\rho\xi_i/\rho\xi_\ell = (\rho\xi_i/\rho\xi_j)(\rho\xi_j/\rho\xi_\ell) + (\rho\xi_i/\rho\xi_k)(\rho\xi_k/\rho\xi_\ell)$    (2.5)

This follows from the fact that any path from $\xi_\ell$ to $\xi_i$ must pass through $\xi_j$ or $\xi_k$.

For the model problem with data $\xi_1 = \xi_2 = 1$, $\xi_3 = \xi_4 = 2$, the arc weights are shown below.

(Figure: the computational graph of the model problem with each arc labeled by its weight.)

The evaluation of $\rho\xi_9/\rho\xi_3$ using (2.1) necessitates the calculation

    $\partial\xi_9/\partial\xi_3 = \xi_2 - 2\xi_3 - \xi_4$

from which follows

    $\rho\xi_9/\rho\xi_3 = (\xi_3/\xi_9)(\partial\xi_9/\partial\xi_3) = 5/2$.

Alternatively, formula (2.4) gives

    $\rho\xi_9/\rho\xi_3 = (\rho\xi_9/\rho\xi_6)(\rho\xi_6/\rho\xi_3) + (\rho\xi_9/\rho\xi_7)(\rho\xi_7/\rho\xi_3) = 5/2$.
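The arc weights of Table 1 and the recursion (2.5) mechanize directly. The following sketch (Python; the function and variable names are hypothetical, and the fragment illustrates the technique rather than any implementation described in this thesis) builds the model problem's graph and evaluates all partial relative derivatives:

    def relative_derivatives(inputs, steps):
        """inputs: values of xi_1..xi_n; steps: (op, j, k) with 1-based indices."""
        n = len(inputs)
        xi = dict(enumerate(inputs, start=1))
        w = {}                                   # w[i, j] = rho xi_i / rho xi_j
        for i, (op, j, k) in enumerate(steps, start=n + 1):
            a, b = xi[j], xi[k]
            xi[i] = {'+': a + b, '-': a - b, 'x': a * b, '/': a / b}[op]
            if op in '+-':                       # Table 1: weigh by operand/result
                w[i, j], w[i, k] = a / xi[i], (b if op == '+' else -b) / xi[i]
            else:                                # Table 1: x gives 1, 1; / gives 1, -1
                w[i, j], w[i, k] = 1.0, (1.0 if op == 'x' else -1.0)
        rho = {(i, i): 1.0 for i in xi}
        for i, (op, j, k) in enumerate(steps, start=n + 1):
            for l in range(1, i):                # recursion (2.5) over the two operands
                rho[i, l] = (w[i, j] * rho.get((j, l), 0.0)
                             + w[i, k] * rho.get((k, l), 0.0))
        return xi, rho

    xi, rho = relative_derivatives([1, 1, 2, 2],
        [('+', 1, 2), ('-', 2, 3), ('+', 3, 4), ('/', 5, 6), ('x', 6, 7)])
    assert abs(rho[9, 3] - 5/2) < 1e-12          # agrees with the example above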
3. Roundoff Errors

Floating point computation is subject to error due to finite word length. Wilkinson [19] provides the bounds for representation and arithmetic:

    $fl(\xi_i) = \xi_i(1 + e_i)$,    $1 \le i \le n$
    $fl(\xi_j * \xi_k) = (\xi_j * \xi_k)(1 + e_i)$,    $n < i \le r$

where $|e_i| \le \epsilon = \beta^{1-d}/2$ (rounding with base $\beta$ and a mantissa of $d$ digits).

Definition 3.1: local relative error in $\xi_i$ - a quantity, $[\rho\xi_i]_L \cdot \epsilon = e_i$, representing the relative change in the value of $\xi_i$ due to its representation or computation. It follows immediately that $|[\rho\xi_i]_L| \le 1$.

Definition 3.2: total relative error in $\xi_i$ - a quantity, $[\rho\xi_i]_T \cdot \epsilon$, representing the relative change in the value of $\xi_i$ due to the accumulated effects of local relative errors in nodes of which $\xi_i$ is a function.

Trivially, for input nodes, $[\rho\xi_i]_T \cdot \epsilon = [\rho\xi_i]_L \cdot \epsilon$, $1 \le i \le n$. For a given algorithm, let $\xi_i = f_i(\xi_1, \xi_2, \ldots, \xi_{i-1})$, $n < i \le r$. Previous local relative errors, $e_\ell = [\rho\xi_\ell]_L \cdot \epsilon$, $1 \le \ell \le i-1$, give rise to

    $\bar f_i = f_i(\xi_1(1 + e_1), \xi_2(1 + e_2), \ldots, \xi_{i-1}(1 + e_{i-1}))$.

Expanding $f_i$ around $\xi_i$ using Taylor's theorem, and dividing by $\xi_i$, results in

    $(\bar f_i - \xi_i)/\xi_i = \sum_{\ell=1}^{i-1} (\xi_\ell/\xi_i)(\partial\xi_i/\partial\xi_\ell)[\rho\xi_\ell]_L \cdot \epsilon + O(\epsilon^2)$    (3.1)

Adding $[\rho\xi_i]_L \cdot \epsilon$ to both sides of (3.1) and substituting (2.1), obtain

    $[\rho\xi_i]_T \cdot \epsilon = \sum_{\ell=1}^{i} (\rho\xi_i/\rho\xi_\ell)[\rho\xi_\ell]_L \cdot \epsilon + O(\epsilon^2)$

Dividing by $\epsilon$, and taking the limit as $\epsilon$ approaches 0, gives

    $[\rho\xi_i]_T = \sum_{\ell=1}^{i} (\rho\xi_i/\rho\xi_\ell)[\rho\xi_\ell]_L$    (3.2)

By working with (3.2) the discussion will be independent of $\epsilon$. Thus, the concern will only be with the coefficients of any small $\epsilon > 0$, and explicit mention of $\epsilon$ is henceforth dropped.

The formulation of total relative error, (3.2), gives rise to an underdetermined homogeneous system of equations, $Mz = 0$, or

    $[-I_{r-n}, A, B] (t^t, d^t, g^t)^t = 0$    (3.3)

where $I_{r-n}$ is the identity matrix of order $r-n$; $A$ is an $(r-n) \times n$ matrix with elements $a_{ij} = \rho\xi_{n+i}/\rho\xi_j$; $B$ is an $(r-n) \times (r-n)$ unit lower triangular matrix with elements $b_{ij} = \rho\xi_{n+i}/\rho\xi_{n+j}$ below the diagonal; $t$ is an $(r-n)$-vector with elements $\tau_i = [\rho\xi_{n+i}]_T$; $d$ is an $n$-vector with elements $\delta_i = [\rho\xi_i]_L$; and finally, $g$ is an $(r-n)$-vector with elements $\gamma_i = [\rho\xi_{n+i}]_L$.

System (3.3) may be used to perform an error analysis, e.g., forward, backward, etc., as in a following chapter. However, these analyses seek a relationship of errors between inputs and outputs, and between computational nodes and outputs. Hence, only those equations of (3.3) corresponding to output nodes need be considered, i.e., for $1 \le i \le m$,

    $[\rho\eta_i]_T = \sum_{\ell=1}^{\omega_i} (\rho\eta_i/\rho\xi_\ell)[\rho\xi_\ell]_L$    (3.4)

where $\eta_i = \xi_{\omega_i}$. This produces the condensed system, $\bar M z = 0$, or

    $[-I_m, \bar A, \bar B] (\bar t^t, d^t, g^t)^t = 0$    (3.5)

where $I_m$ is the identity matrix of order $m$; $\bar A$ is an $m \times n$ matrix with elements $\bar a_{ij} = \rho\eta_i/\rho\xi_j$; $\bar B$ is an $m \times (r-n)$ matrix with elements $\bar b_{ij} = \rho\eta_i/\rho\xi_{n+j}$; $\bar t$ is an $m$-vector with elements $\bar\tau_i = [\rho\eta_i]_T$; and $d$ and $g$ are as previously defined.

The elements of $M$ may be calculated from the formula for the partial relative derivative in (2.5). An obvious method forms the entries row by row, the partial relative derivatives on the right-hand side of (2.5) having already been computed. The matrix $\bar M$ can then be formed by selecting rows of $A$ and $B$ in $M$ corresponding to output nodes, and appending $-I_m$. Since $m$ is usually much less than $r-n$, i.e., few of the computational nodes are actually output nodes, much of the work done in forming $M$ is concerned with rows which will eventually be discarded. An equivalent system which will allow the direct calculation of $\bar M$ will now be derived.

Formula (3.2) may be expressed as

    $[\rho\xi_i]_T = [\rho\xi_i]_L + \sum_{\ell=1}^{i-1} (\rho\xi_i/\rho\xi_\ell)[\rho\xi_\ell]_L$

Substituting (2.5) gives

    $[\rho\xi_i]_T = [\rho\xi_i]_L + (\rho\xi_i/\rho\xi_j) \sum_{\ell=1}^{i-1} (\rho\xi_j/\rho\xi_\ell)[\rho\xi_\ell]_L + (\rho\xi_i/\rho\xi_k) \sum_{\ell=1}^{i-1} (\rho\xi_k/\rho\xi_\ell)[\rho\xi_\ell]_L$

where $\xi_i = \xi_j * \xi_k$. From the node numbering scheme, $\rho\xi_j/\rho\xi_\ell = 0$ if $\ell > j$, and $\rho\xi_k/\rho\xi_\ell = 0$ if $\ell > k$; i.e., a node cannot be affected by a computation which is performed after its execution. Hence, changing the upper limits of both sums and substituting (3.2) results in

    $[\rho\xi_i]_T = [\rho\xi_i]_L + (\rho\xi_i/\rho\xi_j)[\rho\xi_j]_T + (\rho\xi_i/\rho\xi_k)[\rho\xi_k]_T$    (3.6)

In (3.6), the effects of previous local relative errors have been accumulated in the operands of $\xi_i$. Formula (3.6) is a local relationship of errors, since it concerns only a node and its operands. In contrast, one may call (3.2) a global relationship, since all previous nodes are inspected. Thus, formula (3.6) produces the system $\tilde M z = 0$, or

    $[\tilde B, \tilde A, I_{r-n}] (t^t, d^t, g^t)^t = 0$    (3.7)

where $\tilde B$ is an $(r-n) \times (r-n)$ lower triangular matrix with elements $\tilde b_{ij} = \rho\xi_{n+i}/\rho\xi_{n+j}$ if $\xi_{n+j}$ is an operand of $\xi_{n+i}$, and zero otherwise, and $\tilde b_{ii} = -1$; $\tilde A$ is an $(r-n) \times n$ matrix with elements $\tilde a_{ij} = \rho\xi_{n+i}/\rho\xi_j$ if $\xi_j$ is an operand of $\xi_{n+i}$, and zero otherwise; and $I_{r-n}$ and $z$ are as previously defined. The analyses could use either $Mz = 0$ or $\tilde M z = 0$.
The reduction of $Mz = 0$ to $\bar M z = 0$ is done to economize time and storage. No such reduction is possible with $\tilde M z = 0$, since this system specifies relationships at a local level, and to remove rows from it would result in the loss of the global relationships which the analyses seek. Both systems specify a relationship between any and all errors and their effects. The relationship specified in each system is the same, since by comparing (3.3) and (3.7) and noting that $\tilde B$ is nonsingular, it must be that

    $M = (-\tilde B)^{-1} \tilde M$    (3.8)

The systems $Mz = 0$, $\tilde M z = 0$, and $\bar M z = 0$ for the model are given below. Row and column headings are provided as an aid in identifying matrix elements. The equation relating $[\rho\xi_9]_T$ to other errors is shown to illustrate the difference in complexity among the systems.

$Mz = 0$:

           xi5 xi6 xi7 xi8 xi9 | xi1  xi2  xi3  xi4 | xi5 xi6 xi7 xi8 xi9
    xi5  [ -1   0   0   0   0  | 1/2  1/2   0    0  |  1   0   0   0   0 ]
    xi6  [  0  -1   0   0   0  |  0   -1    2    0  |  0   1   0   0   0 ]
    xi7  [  0   0  -1   0   0  |  0    0   1/2  1/2 |  0   0   1   0   0 ]
    xi8  [  0   0   0  -1   0  | 1/2  3/2  -2    0  |  1  -1   0   1   0 ]
    xi9  [  0   0   0   0  -1  |  0   -1   5/2  1/2 |  0   1   1   0   1 ]

    $[\rho\xi_9]_T = -[\rho\xi_2]_L + 5/2\,[\rho\xi_3]_L + 1/2\,[\rho\xi_4]_L + [\rho\xi_6]_L + [\rho\xi_7]_L + [\rho\xi_9]_L$

$\tilde M z = 0$:

           xi5 xi6 xi7 xi8 xi9 | xi1  xi2  xi3  xi4 | xi5 xi6 xi7 xi8 xi9
    xi5  [ -1   0   0   0   0  | 1/2  1/2   0    0  |  1   0   0   0   0 ]
    xi6  [  0  -1   0   0   0  |  0   -1    2    0  |  0   1   0   0   0 ]
    xi7  [  0   0  -1   0   0  |  0    0   1/2  1/2 |  0   0   1   0   0 ]
    xi8  [  1  -1   0  -1   0  |  0    0    0    0  |  0   0   0   1   0 ]
    xi9  [  0   1   1   0  -1  |  0    0    0    0  |  0   0   0   0   1 ]

    $[\rho\xi_9]_T = [\rho\xi_9]_L + [\rho\xi_6]_T + [\rho\xi_7]_T$

$\bar M z = 0$:

           eta1 eta2 | xi1  xi2  xi3  xi4 | xi5 xi6 xi7 xi8 xi9
    eta1 [  -1   0   | 1/2  3/2  -2    0  |  1  -1   0   1   0 ]
    eta2 [   0  -1   |  0   -1   5/2  1/2 |  0   1   1   0   1 ]

    $[\rho\xi_9]_T$ is as given for $Mz = 0$.

4. Efficient Calculation of $\bar M$

From (3.8), the relationship between $M$ and $\tilde M$, one can devise an efficient scheme for forming $\bar A$ and $\bar B$ directly from $\tilde A$ and $\tilde B$ without forming $A$ and $B$. Let $\bar C = [\bar A, \bar B]$, $C = [A, B]$, $\bar c_i^t$ be the $i$th row of $\bar C$ and $c_j^t$ the $j$th row of $C$, where $n+j = \omega_i$. Since $C = (-\tilde B)^{-1}[\tilde A, I_{r-n}]$, it follows that

    $c_j^t = e_j^t C = e_j^t (-\tilde B)^{-1} [\tilde A, I_{r-n}]$,

where $e_j$ is the $j$th column of the identity matrix of order $r-n$. Post-multiplying this equation by the nonsingular matrix

    [ I_n      0      ]
    [ -A-tilde -B-tilde ]

and transposing gives

    [ I_n  -A-tilde^t ]
    [ 0    -B-tilde^t ]  c_j = e'_{n+j},

where $e'_{n+j}$ is the $(n+j)$th column of the identity matrix of order $r$. This unit upper triangular system may be solved in an efficient manner by drawing on the sparseness of $\tilde A$ and $\tilde B$. Let $R$ be the coefficient matrix, $s = c_j$, and $f = e'_{n+j}$. Then initially,

    R = [ R_{r-1}  u_r ],    s = ( s_{r-1} ),    f = ( f_{r-1} ).
        [   0       1  ]         ( sigma_r )         (  phi_r  )

The value of $\sigma_r$ is immediately known, and only the deflated system

    $R_{r-1} s_{r-1} = f_{r-1} - \sigma_r u_r$

need be considered in order to find the remaining $\sigma$'s. This system is of the same form as the original, and the same technique may be employed to find $\sigma_{r-1}$, etc. Each update of the right-hand side is particularly simple. At step $k$, $r \ge k \ge n+1$, the vector $u_k$ is a subvector of the $(k-n)$th row of $[-\tilde A, -\tilde B]$, and therefore is zero except for two elements. Let $u_k^t = (\mu_{k,1}, \mu_{k,2}, \ldots, \mu_{k,k-1})$. If $\xi_k = \xi_p * \xi_q$, then the non-zero elements of $u_k$ are $\mu_{k,p} = -(\rho\xi_k/\rho\xi_p)$ and $\mu_{k,q} = -(\rho\xi_k/\rho\xi_q)$. Thus, the following algorithm computes the $i$th row of $\bar C$:

    1. $f^{(n+j)} = e'_{n+j}$, where $n+j = \omega_i$.
    2. For $k = n+j, n+j-1, \ldots, n+1$:
       $f^{(k-1)} = f^{(k)}$, except
       (1) $\phi_p^{(k-1)} = \phi_p^{(k)} + \phi_k^{(k)}(\rho\xi_k/\rho\xi_p)$,
       (2) $\phi_q^{(k-1)} = \phi_q^{(k)} + \phi_k^{(k)}(\rho\xi_k/\rho\xi_q)$,
       where $\xi_k = \xi_p * \xi_q$, and $f^{(k)} = (\phi_i^{(k)})$, $1 \le i \le r$.
    3. $\bar c_i = f^{(n)}$.    (4.1)

The algorithm starts with $f^{(n+j)}$ rather than $f^{(r)}$ since $\phi_i^{(r)} = 0$ for $i > n+j$, and hence $f^{(n+j)} = f^{(r)}$. The algorithm ends with $f^{(n)}$ rather than $f^{(1)}$ since after $j$ steps $R = I_n$, and hence $f^{(n)} = f^{(1)}$.
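In code, one pass of (4.1) costs $O(r)$ operations and storage per output row. A minimal sketch (Python; hypothetical names, with the arc weights of the model problem hard-coded from Table 1):

    # Arc weights of the model problem (Table 1, data xi_1 = xi_2 = 1, xi_3 = xi_4 = 2):
    w = {(5, 1): 0.5, (5, 2): 0.5, (6, 2): -1.0, (6, 3): 2.0,
         (7, 3): 0.5, (7, 4): 0.5, (8, 5): 1.0, (8, 6): -1.0,
         (9, 6): 1.0, (9, 7): 1.0}
    steps = [('+', 1, 2), ('-', 2, 3), ('+', 3, 4), ('/', 5, 6), ('x', 6, 7)]

    def row_of_C(n, steps, w, out):
        """Row of C = [A, B] for output node xi_out, by back-substitution (4.1)."""
        f = [0.0] * (n + len(steps) + 1)      # 1-based storage; f = e'_{n+j}
        f[out] = 1.0
        for k in range(out, n, -1):           # k = n+j, n+j-1, ..., n+1
            _, p, q = steps[k - n - 1]
            f[p] += f[k] * w[k, p]            # phi_p += phi_k * (rho xi_k / rho xi_p)
            f[q] += f[k] * w[k, q]
        return f[1:]

    print(row_of_C(4, steps, w, 9))   # [0, -1, 2.5, 0.5, 0, 1, 1, 0, 1]

The printed row agrees with the row of $[\bar A, \bar B]$ for $\eta_2$ displayed above, i.e., with the corresponding row of $(-\tilde B)^{-1}[\tilde A, I_{r-n}]$ in (3.8).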
An algorithm similar to (4.1) has been proposed by Linnainmaa [7] for the determination of Taylor coefficients for accumulated absolute rounding error.

The algorithm is related to formula (2.4) on path products. This relationship is based on the observation that any path from, say, $\xi_\ell$ to $\xi_{n+j}$ must pass through some node $\xi_s$, where $\xi_s$ has $\xi_\ell$ as one of its operands. Let $S$ be the set of all such indices $s$. Then (2.4) may be expressed as

    $\rho\xi_{n+j}/\rho\xi_\ell = \sum_{s \in S} (\rho\xi_{n+j}/\rho\xi_s)(\rho\xi_s/\rho\xi_\ell)$

At each step $k = s$ of (4.1), $\phi_s^{(s)} = \rho\xi_{n+j}/\rho\xi_s$, and the updates add the term $(\rho\xi_{n+j}/\rho\xi_s)(\rho\xi_s/\rho\xi_\ell)$ into the accumulating value of $\rho\xi_{n+j}/\rho\xi_\ell$ for each operand $\xi_\ell$ of $\xi_s$.

An analogous dense computation generates the absolute-error derivatives $\partial\xi_{n+i}/\partial e_\ell$ used by Miller [10].

    Table 2. Evaluation of $\partial\xi_{n+i}/\partial e_\ell$ for the elementary operations.

Note that $\partial\xi_{n+i}/\partial e_{n+i} = \xi_{n+i}$. Computation proceeds row by row, the partial derivatives in the table having already been computed. Assuming each node type is equiprobable, one performs an average of 2.5 operations to evaluate each of the $(r+n)(r-n)/2$ elements of the derivative matrix. Rows corresponding to output nodes are subsequently used to perform the analyses. The storage required is $r(r-n)$ for the full array. In comparison to generating $\bar M$ from $M$, this technique takes 5/6 of the time and the same storage. With respect to generating $\bar M$ from $\tilde M$ by (4.1), it requires $5(r+n)/(16m)$ of the time and $r(r-n)/(r(m+2)-2n)$ of the storage.
However, algorithm (4.1) may be modified to compute the elements $\partial\xi_{n+i}/\partial e_\ell$. Since $\partial\xi_{n+i}/\partial e_\ell = \xi_\ell(\partial\xi_{n+i}/\partial\xi_\ell)$, the "path product" property of partial derivatives in (2.2) may be employed to compute the matrix elements to within a scalar factor. The modifications to algorithm (4.1) entail replacing $\rho\xi_k/\rho\xi_p$ by $\partial\xi_k/\partial\xi_p$ and $\rho\xi_k/\rho\xi_q$ by $\partial\xi_k/\partial\xi_q$, and including a final scalar multiplication step. The modified algorithm would use Table 3 to calculate the arc weights. For $\xi_i = \xi_j * \xi_k$:

                           * = +    * = -    * = x      * = /
    d xi_i / d xi_j          1        1       xi_k       1/xi_k
    d xi_i / d xi_k          1       -1       xi_j      -xi_i/xi_k

    Table 3. Arc weights for Miller

While the computations required in Table 3 are less than those of Table 1, the final scalar multiplication step in the modified algorithm outweighs this and results in an overall increase of up to 25% over the time to execute algorithm (4.1). The storage required remains the same.
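A sketch of the modification (Python; names hypothetical, with the model problem's node values hard-coded) replaces the relative weights by Table 3 and appends the scaling step:

    def table3_weights(op, a, b, result):
        if op == '+': return 1.0, 1.0
        if op == '-': return 1.0, -1.0
        if op == 'x': return b, a             # d(ab)/da = b, d(ab)/db = a
        return 1.0 / b, -result / b           # d(a/b)/da, d(a/b)/db

    def miller_row(n, steps, xi, out):
        """Row of [d xi_out / d e_l]: run (4.1) with Table 3 weights, then
        scale entry l by xi_l, since d xi / d e_l = xi_l * (d xi / d xi_l)."""
        f = [0.0] * (n + len(steps) + 1)      # 1-based storage
        f[out] = 1.0
        for k in range(out, n, -1):
            op, p, q = steps[k - n - 1]
            wp, wq = table3_weights(op, xi[p], xi[q], xi[k])
            f[p] += f[k] * wp
            f[q] += f[k] * wq
        return [f[l] * xi[l] for l in range(1, len(f))]   # final scalar step

    # Model problem: node values for xi_1 = xi_2 = 1, xi_3 = xi_4 = 2.
    xi = {1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 2.0, 6: -1.0, 7: 4.0, 8: -2.0, 9: -4.0}
    steps = [('+', 1, 2), ('-', 2, 3), ('+', 3, 4), ('/', 5, 6), ('x', 6, 7)]
    print(miller_row(4, steps, xi, 9))   # entry 3 is -10 = xi_3 * (xi_2 - 2 xi_3 - xi_4)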
The generation of the equivalent of $\bar M$ in Miller [10] for several algorithms was timed on a 360/75. The results are summarized in Table 4. Many of the algorithms are described in Miller [12]. Givens/backsub reduces a matrix to upper triangular form using Givens plane rotations, and solves the resulting system by backsubstitution.
    Algorithm                              n    r-n   m   Miller [10]   Miller w/(4.1)   Speedup
    Givens(QA=R)/backsub(Rx=Qf) (4x4)     20    172   4   0.98 sec.     0.085 sec.       11.5
    Least Squares (4x3)                   16    116   3   0.45          0.046             9.8
    Givens(QA=R) (4x4)                    16    120  10   0.49          0.092             5.3
    Cholesky(A=LLt)(Ly=f)(Ltx=y) (4x4)    14     62   4   0.16          0.030             5.3
    RINVERSE (4x4 UT)                     10     30  10   0.042         0.010             4.2

    Table 4. Timed Comparison
5. Error Analyses

5.1. Forward Analysis

A forward analysis finds a bound on the relative error in the solution, $|[\rho\eta_i]_T|$, $1 \le i \le m$. Depending on the assumptions made, the relative error in $\eta_i$ may be the result of inexact data, inexact arithmetic, or both. From the system $\bar M z = 0$, (3.5),

    $\bar t = \bar A d + \bar B g$    (5.1)

or, in terms of the matrix and vector elements, for $1 \le i \le m$,

    $[\rho\eta_i]_T = \sum_{j=1}^{n} (\rho\eta_i/\rho\xi_j)[\rho\xi_j]_L + \sum_{j=n+1}^{r} (\rho\eta_i/\rho\xi_j)[\rho\xi_j]_L$    (5.2)

Considering for the moment any single output, $\eta_i$, the relative error in $\eta_i$ can be maximized by choosing the local relative errors, $[\rho\xi_j]_L$, $1 \le j \le r$, in the following manner:

    $[\rho\xi_j]_L = 1$ if $\rho\eta_i/\rho\xi_j > 0$,
    $[\rho\xi_j]_L = c$, $|c| \le 1$, if $\rho\eta_i/\rho\xi_j = 0$,
    $[\rho\xi_j]_L = -1$ if $\rho\eta_i/\rho\xi_j < 0$.    (5.3)

This choice of the local relative errors guarantees that $[\rho\eta_i]_T$ takes on its maximum positive value. The maximum negative value of $[\rho\eta_i]_T$ has the same magnitude, and corresponds to changing the signs of $[\rho\xi_j]_L$ in (5.3). Without loss of generality, the positive value will be used.

The choice of local relative errors in (5.3) reduces (5.2) to

    $[\rho\eta_i]_T = \sum_{j=1}^{n} |\rho\eta_i/\rho\xi_j| + \sum_{j=n+1}^{r} |\rho\eta_i/\rho\xi_j|$    (5.4)

where a possibly different set of local relative errors is chosen for each $\eta_i$. Formula (5.4) shows that the relative error in $\eta_i$ is bounded by $(\|\bar a_i\|_1 + \|\bar b_i\|_1)$, where $\bar a_i^t$ and $\bar b_i^t$ are the $i$th row vectors of $\bar A$ and $\bar B$. The vector of relative error bounds, for all outputs, obtained from (5.4) is a pessimistic one in that, generally, each bound cannot be simultaneously attained.
Recall the matrices $\bar A$ and $\bar B$ for the model problem:

                xi1   xi2   xi3   xi4                 xi5  xi6  xi7  xi8  xi9
    eta1    [   1/2   3/2   -2     0  ]    eta1   [    1   -1    0    1    0 ]
    eta2    [    0    -1    5/2   1/2 ]    eta2   [    0    1    1    0    1 ]

Application of (5.4) gives a relative error bound of 7 for $\eta_1$ and corresponds to the choice of $1, 1, -1, c, 1, -1, c, 1, c$ for the local relative errors $[\rho\xi_j]_L$, $1 \le j \le 9$. Similarly, the choice of $c, -1, 1, 1, c, 1, 1, c, 1$ gives a relative error bound of 7 for $\eta_2$. Recall that these values are coefficients of $\epsilon$.
Equation (5.1) shows the contributions of data error, $d$, and computational error, $g$, to the output error. By assuming exact computation, i.e., setting $g = 0$, (5.1) reduces to $\bar t = \bar A d$, and (5.4) becomes

    $|[\rho\eta_i]_T| \le \sum_{j=1}^{n} |\rho\eta_i/\rho\xi_j|$    (5.5)

Definition 5.1: condition vector of the problem - the vector of output relative error bounds as computed by (5.5). The elements of this vector may be interpreted as the sensitivity of the outputs to input error.

For a problem with given data, the condition vector of the problem is independent of the algorithm. This fact follows from the assumption of exact computation. For the model problem with data $\xi_1 = \xi_2 = 1$, $\xi_3 = \xi_4 = 2$, the condition vector of the problem is always $(4, 4)^t$.

By assuming exact data, i.e., $d = 0$, then $\bar t = \bar B g$, and

    $|[\rho\eta_i]_T| \le \sum_{j=n+1}^{r} |\rho\eta_i/\rho\xi_j|$    (5.6)

Definition 5.2: condition vector of the algorithm - the vector of output relative error bounds as computed by (5.6). The elements of this vector may be interpreted as the sensitivity of the outputs to computational error.

Naturally, the condition vector of the algorithm varies from algorithm to algorithm. For the model problem and the given data, the algorithm employed has a condition vector of the algorithm of $(3, 3)^t$.
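Both bounds and both condition vectors are absolute row sums. A minimal sketch (Python/NumPy; the matrices are the model-problem $\bar A$ and $\bar B$ given above):

    import numpy as np

    A_bar = np.array([[0.5, 1.5, -2.0, 0.0],
                      [0.0, -1.0, 2.5, 0.5]])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0, 0.0, 1.0]])

    forward_bound = np.abs(A_bar).sum(axis=1) + np.abs(B_bar).sum(axis=1)  # (5.4)
    cond_problem = np.abs(A_bar).sum(axis=1)     # (5.5): condition vector, problem
    cond_algorithm = np.abs(B_bar).sum(axis=1)   # (5.6): condition vector, algorithm

    print(forward_bound)   # [7. 7.]
    print(cond_problem)    # [4. 4.]
    print(cond_algorithm)  # [3. 3.]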
5.2. Backward Analysis

A backward analysis attempts to find a number, $\Theta$, which measures the numerical stability of an algorithm. An algorithm is stable if it produces a solution which is the exact solution of the problem with data $\xi_i(1 + [\rho\xi_i]_L \cdot \epsilon)$, $1 \le i \le n$, where $|[\rho\xi_i]_L| \le \Theta$.

The value of $\Theta$ is found by comparing the set of output relative error vectors corresponding to any computed solution with the set of output relative error vectors produced by some perturbation of the data. The analyses in this and the next section assume that the computed solution is obtained from exact data. Let $C_k = \{v: \|v\|_\infty \le 1\}$ be the unit cube in $R^k$ (in the $\infty$-norm). Any vector, $g = (\gamma_i)$, $\gamma_i = [\rho\xi_{n+i}]_L$, $1 \le i \le r-n$, of local relative errors in the computational nodes is an element of $C_{r-n}$, since $|\gamma_i| \le 1$. From the forward analysis, a computed solution corresponding to a given $g$ has an output relative error vector $\bar B g \in R^m$. It follows that any computed solution must have an output relative error vector which is an element of $\bar B C_{r-n} = \{\bar B g: g \in C_{r-n}\}$. The set $\bar B C_{r-n}$ is a centrally symmetric convex polytope. This follows from the similar properties of $C_{r-n}$ and the fact that $\bar B$ is a linear transformation from $R^{r-n}$ to $R^m$, Householder [5]. For the model problem with given data, the polytope $\bar B C_{r-n}$ is shown below.

(Figure: the polytope $\bar B C_{r-n}$, plotted in the $([\rho\eta_1]_T, [\rho\eta_2]_T)$ plane.)

In a similar fashion the set of output relative error vectors produced by only perturbing the data may be found. Let $d = (\delta_i)$, $\delta_i = [\rho\xi_i]_L$, $1 \le i \le n$, be a vector of local relative errors in input nodes with $|\delta_i| \le 1$. Then $d \in C_n$, and the corresponding output relative error vector for the solution is $\bar A d \in R^m$. The set of all such vectors forms $\bar A C_n = \{\bar A d: d \in C_n\}$, which is, again, a centrally symmetric convex polytope. For the model problem, the polytope $\bar A C_n$ is shown below.

(Figure: the polytope $\bar A C_n$; its vertices include $(4, -3)$ and $(4, -4)$.)

To perform a backward analysis, the output relative error vector for any computed solution must be shown to be the same as one obtained by only perturbing the data. By superimposing $\bar B C_{r-n}$ and $\bar A C_n$ for the model problem, it is clear that there are output relative error vectors for computed solutions which are not output relative error vectors corresponding to input perturbations of only $1 \cdot \epsilon$.

(Figure: $\bar B C_{r-n}$ superimposed on $\bar A C_n$; part of $\bar B C_{r-n}$ lies outside $\bar A C_n$.)

Thus, the value $\Theta = 1$ is too small. The correct value of $\Theta$ is found by expanding (or contracting) $\bar A C_n$ so that it just covers $\bar B C_{r-n}$. This corresponds to allowing $d$ to lie in $\Theta C_n$, where $\Theta > 0$. $\bar A d$ is then an element of the polytope $\Theta \bar A C_n$. The value of $\Theta$ sought is $\mathrm{glb}\{\Theta > 0: \bar B C_{r-n} \subseteq \Theta \bar A C_n\}$. For the model problem, $\Theta = 2.375$, as illustrated in the diagram below.

(Figure: $\Theta \bar A C_n$ with $\Theta = 2.375$ just covering $\bar B C_{r-n}$.)

Thus, any computed solution of the model problem at the given data can be obtained by computing exactly with data $\xi_i(1 + [\rho\xi_i]_L \cdot \epsilon)$, $1 \le i \le 4$, where $|[\rho\xi_i]_L| \le 2.375$.
The amount by which $\bar A C_n$ must be expanded to cover $\bar B C_{r-n}$ is governed by only a few of the elements of $\bar B C_{r-n}$. These elements correspond to the vertices of $\bar B C_{r-n}$. From the convex properties of $\bar A C_n$ and $\bar B C_{r-n}$, if $\Theta \bar A C_n$ covers the vertices of $\bar B C_{r-n}$, then it covers all of $\bar B C_{r-n}$. Hence, attention may be limited to covering only the vertices of $\bar B C_{r-n}$. As the previous diagram illustrates, attention may, in fact, be restricted to covering only one vertex (and its symmetric mate). This vertex is the output relative error vector $\bar B g^*$ corresponding to a "worst" set $g^*$ of local relative errors in the computational nodes. By covering this worst case, all others are covered. This vertex $\bar B g^*$ is characterized by being the "farthest" point on the perimeter, or boundary, of $\bar B C_{r-n}$ from the boundary of $\bar A C_n$. Any point $\bar B g$ on the boundary of $\bar B C_{r-n}$ is covered, on expansion, by a corresponding point $\bar A d$ on the boundary of $\bar A C_n$. Both $\bar B g$ and the corresponding $\bar A d$ lie on the same radial line from the origin. Hence, $\bar B g = \Theta \bar A d$, where $\Theta$ measures the relative distance of $\bar B g$. For the model problem, the points $\bar B g^*$ and $\bar A d^*$ are illustrated below.

(Figure: $\bar A d^* = (1.26, 0.421)^t$ and $\bar B g^* = (3, 1)^t$ on the boundaries of $\bar A C_n$ and $\bar B C_{r-n}$.)

In general, there may be more than one symmetric pair of vertices on the boundary of $\bar B C_{r-n}$ whose relative distance from the boundary of $\bar A C_n$ is maximal. If this is the case, any arbitrary vertex of maximal distance may be used, since when any point on the boundary of $\bar B C_{r-n}$ of a given relative distance is covered, all points of that relative distance are also covered.
The chapter on computational details will discuss the finding of $g^*$; it will be assumed here that it has been computed. The following problem may be solved to obtain $\Theta$.

Backward analysis: find the smallest constant $\Theta > 0$ which satisfies the following conditions:

    $\bar A d = \bar B g^*$
    $|\delta_i| \le \Theta$,    $1 \le i \le n$    (5.7)

where $d = (\delta_i)$, $\delta_i = [\rho\xi_i]_L$, $1 \le i \le n$, and $g^*$ is as described above. The minimal $\Theta$ is the $\infty$-norm of that vector $d \in \Theta C_n$ of input relative errors whose image under $\bar A$ covers the "worst" output relative error vector $\bar B g^*$. $\Theta \bar A C_n$ therefore covers $\bar B C_{r-n}$.
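Problem (5.7) is a minimum $\infty$-norm solution of an underdetermined linear system, and can be solved as a small linear program. A sketch (Python/SciPy; min_inf_norm is a hypothetical helper name, and the data are the model-problem quantities):

    import numpy as np
    from scipy.optimize import linprog

    def min_inf_norm(M, rhs):
        """min ||d||_inf subject to M d = rhs: the linear program behind (5.7)."""
        m, p = M.shape
        c = np.r_[np.zeros(p), 1.0]                        # variables (d, theta)
        A_ub = np.r_[np.c_[ np.eye(p), -np.ones((p, 1))],  #  d_i - theta <= 0
                     np.c_[-np.eye(p), -np.ones((p, 1))]]  # -d_i - theta <= 0
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * p),
                      A_eq=np.c_[M, np.zeros(m)], b_eq=rhs,
                      bounds=[(None, None)] * p + [(0, None)])
        # res.status != 0 signals the inconsistent case discussed below.
        return res.x[-1], res.x[:p]

    A_bar = np.array([[0.5, 1.5, -2.0, 0.0], [0.0, -1.0, 2.5, 0.5]])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0]])
    g_star = np.array([1.0, -1.0, 1.0, 1.0, 1.0])
    theta, d = min_inf_norm(A_bar, B_bar @ g_star)
    print(theta)   # 2.375, the Theta of the model problem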
Having set up the problem (5.7), it may be found to be unsolvable. From the construction of $\bar B$ it is easy to show that its columns span $R^m$. The polytope $\bar B C_{r-n}$ therefore extends in all $m$ dimensions. If $m > n$, i.e., if there are more outputs than inputs, then the $n$ columns of $\bar A$ can do no better than to span $R^n$. The polytope $\bar A C_n$ can therefore extend in no more than $n$ dimensions. It is impossible to cover a polytope in $R^m$ with another in $R^n$, where $m > n$. This deficiency is reflected in the inconsistency of the equations in (5.7), i.e., rank$(\bar A)$ < rank$([\bar A, \bar B g^*])$. Even if $m \le n$, which is usually the case, there is no guarantee that $m$ of the columns of $\bar A$ are linearly independent, and hence, that $\bar A C_n$ extends in $m$ dimensions. Again, the situation could be the impossible task of covering a polytope in $R^m$ with another in $R^k$, where $m > k$. The following example illustrates this case.

(Figure: the computational graph for the algorithm $\eta_1 = \xi_3 = \xi_1 + \xi_2$, $\eta_2 = \xi_4 = \xi_3 \times \xi_3$.)

For $\xi_1 = \xi_2$,

    A-bar = [ 1/2  1/2 ],    B-bar = [ 1  0 ]
            [  1    1  ]             [ 2  1 ]

The polytopes $\bar A C_2$ and $\bar B C_2$ are shown below.

(Figure: $\bar A C_2$, a line segment, inside the two-dimensional polytope $\bar B C_2$.)

The polytope $\bar A C_2$ consists of the line segment joining the points $(-1, -2)$ and $(1, 2)$. The polytope $\Theta \bar A C_2$ for any $\Theta$ will consist of the line segment joining the points $\Theta(-1, -2)$ and $\Theta(1, 2)$, and cannot cover the polytope $\bar B C_2$.

The inability to always perform a backward analysis warrants a relaxation of the criterion for stability to allow an analysis which may always be done. One such analysis is presented in the next section.
5.3. B-analysis

A forward and backward analysis can be combined into what Panzer [14] calls a B- (beidseitige, i.e., two-sided) analysis. The B-analysis is along the lines of Stewart's [18] definition of stability, namely, that the computed solution is "near" the exact solution of a perturbed problem. From the forward analysis, any computed solution, assuming exact data, must have an output relative error vector

    $t_B = \bar B g$

which is an element of $\bar B C_{r-n}$. Similarly, the output relative error vector produced by perturbing only the data and computing exactly is

    $t_A = \bar A d$

which is an element of $\bar A C_n$. The backward analysis requires finding the smallest number $\Theta$ such that for any choice of $g$ in $C_{r-n}$, $\Theta' t_A = t_B$, where $\Theta' \le \Theta$. However, the B-analysis only requires that $\Theta' t_A$ be "near" $t_B$. The "nearness" of $\Theta' t_A$ to $t_B$ can be measured by the $\infty$-norm, $\Theta'$, of a vector $\Theta' \hat t$ such that

    $\Theta'(t_A + \hat t) = t_B$    (5.8)

where $\hat t = (\hat\tau_i)$, $\hat\tau_i = [\rho\eta_i]$, $|\hat\tau_i| \le 1$, $1 \le i \le m$, is a vector of relative errors which artificially perturb the output values. A constant, $\Theta$, is now sought such that for any $g$ in $C_{r-n}$, (5.8) holds for some $\Theta' \le \Theta$. By bounding both the input and output perturbations by $\Theta$, the "nearness" of the perturbed inputs to the true inputs, and the "nearness" of the computed solution to the exact solution of the perturbed problem, are measured in the same way.

The value of $\Theta$ is found by comparing sets of output relative error vectors. As before, the set of all vectors, $t_B$, is $\bar B C_{r-n}$. Let $\hat A = [\bar A, I_m]$ and $\hat d = (d^t, \hat t^t)^t$. Then $t_A + \hat t = \hat A \hat d$, where $\hat d \in C_{n+m}$. The set of all vectors, $t_A + \hat t$, is $\hat A C_{n+m}$, a centrally symmetric convex polytope. For the model problem, $\hat A C_{n+m}$ and $\bar B C_{r-n}$ are shown below.

(Figure: $\hat A C_{n+m}$ and $\bar B C_{r-n}$ for the model problem.)

The B-analysis now follows a strategy similar to that of the backward analysis. It must be shown that the output relative error vector for any computed solution is the same as that obtained by perturbing the data and the outputs. The polytope $\hat A C_{n+m}$ is expanded so that it just covers $\bar B C_{r-n}$. Thus, $\hat d$ is allowed to lie in $\Theta C_{n+m}$, where $\Theta > 0$. $\hat A \hat d$ is then an element of $\Theta \hat A C_{n+m}$. The value of $\Theta$ sought is $\mathrm{glb}\{\Theta > 0: \bar B C_{r-n} \subseteq \Theta \hat A C_{n+m}\}$. The diagram below shows that $\Theta = 1.1176$ is the desired $\Theta$ for the model problem.

(Figure: $\Theta \hat A C_{n+m}$ with $\Theta = 1.1176$ just covering $\bar B C_{r-n}$.)

Thus, any computed solution of the model problem at the given data can be obtained by computing exactly with data $\xi_i(1 + [\rho\xi_i]_L \cdot \epsilon)$, $1 \le i \le 4$, and then perturbing the outputs $\eta_i$ by $(1 + [\rho\eta_i] \cdot \epsilon)$, $1 \le i \le 2$, where $|[\rho\xi_i]_L| \le 1.1176$ and $|[\rho\eta_i]| \le 1.1176$.

As in the backward analysis, $\Theta$ is governed by a vertex $\bar B g^*$ corresponding to a "worst" set $g^*$ of local relative errors in the computational nodes. The vertex $\bar B g^*$ is one of the points on the boundary of $\bar B C_{r-n}$ of maximal relative distance from $\hat A C_{n+m}$, where distance is measured in the same way as in the backward analysis. For the model problem, $g^*$ is the same for both the backward and B-analysis, though, in general, this need not be the case.

Assuming that $g^*$ has been computed, the following problem may be solved to obtain $\Theta$.

B-analysis: find the smallest constant $\Theta > 0$ which satisfies the following conditions:

    $\hat A \hat d = \bar B g^*$
    $|\hat\delta_i| \le \Theta$,    $1 \le i \le n+m$    (5.9)

where $\hat d = (\hat\delta_i)$, $1 \le i \le n+m$, with

    $\hat\delta_i = [\rho\xi_i]_L$ for $1 \le i \le n$,
    $\hat\delta_i = \hat\tau_{i-n} = [\rho\eta_{i-n}]$ for $n+1 \le i \le n+m$,

$\hat A = [\bar A, I_m]$, and $g^*$ is as described previously. The minimum $\Theta$ is the $\infty$-norm of that vector $\hat d \in \Theta C_{n+m}$ of input and output relative errors whose image under $\hat A$ covers the "worst" output relative error vector $\bar B g^*$. $\Theta \hat A C_{n+m}$ therefore covers $\bar B C_{r-n}$.

The system of equations in (5.9) is always underdetermined since $\hat A$ is an $m \times (m+n)$ matrix. Furthermore, a vector, $\hat d = (d^t, \hat t^t)^t$, which satisfies these equations can always be found since rank$(\hat A)$ = rank$([\bar A, I_m])$ = $m$. In particular, the vector $(0, (\bar B g^*)^t)^t$ satisfies these equations, and corresponds to a forward analysis. This vector, however, does not usually give the smallest $\Theta$.
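Because $\hat A$ merely appends an identity block to $\bar A$, the B-analysis bound comes from the same linear program as the backward analysis. A minimal sketch (Python/NumPy), reusing the hypothetical min_inf_norm helper from the sketch in section 5.2:

    import numpy as np
    # min_inf_norm as defined in the sketch of section 5.2.

    A_bar = np.array([[0.5, 1.5, -2.0, 0.0], [0.0, -1.0, 2.5, 0.5]])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0]])
    g_star = np.array([1.0, -1.0, 1.0, 1.0, 1.0])

    A_hat = np.hstack([A_bar, np.eye(2)])               # [A-bar, I_m]: always rank m
    theta, d_hat = min_inf_norm(A_hat, B_bar @ g_star)  # problem (5.9)
    print(theta)   # 1.1176... (exactly 19/17) for the model problem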
6. Computational Details

6.1. The Form of g*

Before the problem for backward analysis (5.7) or the problem for B-analysis (5.9) can be solved, the vertex, $\bar B g^*$, of the computational polytope of maximal relative distance from the problem polytope must be found. The following discussion will concern the B-analysis, to which obvious modifications can be made if a backward analysis is to be attempted.

Since $\bar B$ is fixed, the problem is that of finding $g^*$. From the properties of $\bar B$ and $C_{r-n}$, $g^*$ is found to have a particular form, as shown by the following argument. Any point, $x$, in a convex set may be represented by $x = \lambda a + (1 - \lambda)b$ for some $a$ and $b$ in the set and $0 \le \lambda \le 1$. A vertex, $x$, has a unique representation of the form $x = 1 \cdot x$. Let $x$ be an element of $\bar B C_{r-n}$, i.e., $x = \sum_{i=1}^{r-n} \gamma_i \bar b_i$, where $\bar b_i$ is the $i$th column of $\bar B$ and $|\gamma_i| \le 1$.

Theorem 6.1: If $x$ is a vertex, then $|\gamma_i| = 1$, $1 \le i \le r-n$.

Proof: Assume there exists a $j$ such that $|\gamma_j| < 1$. Without loss of generality, assume $0 \le \gamma_j < 1$; an analogous proof holds for $-1 < \gamma_j < 0$. Then

    $x = \sum_{i \ne j} \gamma_i \bar b_i + \gamma_j \bar b_j$

Let $u = \sum_{i \ne j} \gamma_i \bar b_i$, so that $x = u + \gamma_j \bar b_j$. If $\gamma_j = 0$, then $x = u$. However, $(u + \bar b_j)$ and $(u - \bar b_j)$ are elements of $\bar B C_{r-n}$. Hence,

    $x = \lambda(u - \bar b_j) + (1 - \lambda)(u + \bar b_j)$, where $\lambda = 1/2$.

Thus, $x$ is a non-trivial convex combination of two points in $\bar B C_{r-n}$, which contradicts the assumption that $x$ was a vertex.

If $0 < \gamma_j < 1$, then, again, $x = u + \gamma_j \bar b_j$. However, $u$ and $(u + \bar b_j)$ are elements of $\bar B C_{r-n}$, and

    $x = \lambda u + (1 - \lambda)(u + \bar b_j)$, where $\lambda = 1 - \gamma_j$.

Again, the fact that $x$ is a vertex is contradicted. Since the choice of $j$ was arbitrary, it must be that any vertex has the form

    $x = \sum_{i=1}^{r-n} \gamma_i \bar b_i$, where $|\gamma_i| = 1$. ■

Thus, $g^*$ must have components $\gamma_i^* = \pm 1$. Intuitively, one would expect a "worst" set of computational errors to contain a maximal error for each step of the algorithm. Thus, the problem of finding $g^*$ consists only of finding the signs of its components. For the model problem, $g^* = (1, -1, 1, 1, 1)^t$.
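Theorem 6.1 licenses a brute-force search for small problems: try every sign vector and keep the one whose covering problem yields the largest $\Theta$. A sketch (Python/NumPy), again reusing the hypothetical min_inf_norm helper of section 5.2:

    import itertools
    import numpy as np
    # min_inf_norm as defined in the sketch of section 5.2.

    A_hat = np.hstack([np.array([[0.5, 1.5, -2.0, 0.0],
                                 [0.0, -1.0, 2.5, 0.5]]), np.eye(2)])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0]])

    best = max((np.array(s) for s in itertools.product((1.0, -1.0), repeat=5)),
               key=lambda g: min_inf_norm(A_hat, B_bar @ g)[0])
    print(best)   # [1, -1, 1, 1, 1]: the g* of the model problem (up to sign)

The next section shows how to shrink the number of sign patterns that must be tried.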
6.2. Simplification of Polytope Representation

For efficiency, it will be of advantage to reduce the number of columns in $\hat A$ and $\bar B$. The following argument shows how this may be done. Let $v$ be a vector formed by the sum of $j$ vectors, $v_i$, $1 \le i \le j$, which are in the same or opposite direction. Then

    $v = \sum_{i=1}^{j} \alpha_i v_i$, with $|\alpha_i| \le 1$    (6.1)

where, for $1 \le i \le j$,

    $v_i = (-1)^{\pi_i} \beta_i v_1$, with $(-1)^{\pi_i} = \mathrm{sign}(v_i^t v_1)$ and $\beta_i = \|v_i\|_\infty / \|v_1\|_\infty > 0$.

Then $v$ may be represented by

    $v = \alpha w$, with $|\alpha| \le 1$    (6.2)

where

    $\alpha = \Big(\sum_{i=1}^{j} \alpha_i (-1)^{\pi_i} \beta_i\Big) \Big/ \Big(\sum_{i=1}^{j} \beta_i\Big)$    (6.3)

    $w = \sum_{i=1}^{j} (-1)^{\pi_i} v_i$    (6.4)

Formulas (6.1) through (6.4) provide the basis for condensing $\bar B$ to a matrix $\hat B$ having $k_1 \le r-n$ columns, $\hat b_i$. In a similar fashion, the matrix $\hat A = [\bar A, I_m]$ may be condensed to a matrix $\check A$ having $k_2+m \le n+m$ columns, $\check a_j$; its last block, a matrix $D$, replaces $I_m$, allowing for the case where some of the columns of $\bar A$ are multiples of $e_i$, the columns of $I_m$.

Following (6.4), the matrix $\bar B$ is condensed to $\hat B$ by post-multiplication by an $(r-n) \times k_1$ matrix $P_B$,

    $\hat B = \bar B P_B$    (6.5)

where $P_B$ has elements

    $p_{ij} = +1$ if $\bar b_i$ is condensed into $\hat b_j$,
    $p_{ij} = -1$ if $-\bar b_i$ is condensed into $\hat b_j$,
    $p_{ij} = 0$ otherwise.

The matrix $P_B$ has the following properties:

    (1) each row has only one non-zero element,
    (2) the columns are orthogonal,
    (3) $P_B^t P_B = D_B$, a diagonal matrix whose $i$th diagonal element equals the number of columns of $\bar B$ which are condensed into $\hat b_i$.

For the model problem, (6.5) reveals

    [ 2  -1  0 ]  =  [ 1  -1  0  1  0 ]  [ 1  0  0 ]
    [ 0   1  2 ]     [ 0   1  1  0  1 ]  [ 0  1  0 ]
                                         [ 0  0  1 ]
                                         [ 1  0  0 ]
                                         [ 0  0  1 ]
These reductions have not changed the polytopes, and the following proof shows $\hat B C_{k_1} = \bar B C_{r-n}$. A similar proof would show that $\check A C_{k_2+m} = \hat A C_{n+m}$.

Theorem 6.2: $\hat B C_{k_1} = \bar B C_{r-n}$.

Proof: a) Let $x = \hat B \hat g$, where $\hat g = (\hat\gamma_i)$, $1 \le i \le k_1$, with $|\hat\gamma_i| \le 1$. Then $x \in \hat B C_{k_1}$. From (6.5),

    $x = \hat B \hat g = \bar B P_B \hat g$

Since $P_B$ has only one non-zero element per row and this non-zero element has magnitude one, it follows that $\|P_B \hat g\|_\infty \le 1$. Hence, there exists a $g \in C_{r-n}$ (namely $g = P_B \hat g$) such that $x = \bar B g$. Thus, $x \in \bar B C_{r-n}$, and $\hat B C_{k_1} \subseteq \bar B C_{r-n}$.

b) Let $x = \bar B g$, where $g = (\gamma_i)$, $1 \le i \le r-n$, with $|\gamma_i| \le 1$. Then $x \in \bar B C_{r-n}$. Let the columns of $\bar B$ be partitioned according to the sets $J_i$, $1 \le i \le k_1$, where $j \in J_i$ if $\bar b_j$ or $-\bar b_j$ is condensed into $\hat b_i$. Then

    $x = \sum_{j \in J_1} \gamma_j \bar b_j + \sum_{j \in J_2} \gamma_j \bar b_j + \cdots + \sum_{j \in J_{k_1}} \gamma_j \bar b_j$

In each sum, $\sum_{j \in J_i} \gamma_j \bar b_j$, all the vectors, $\bar b_j$, are in the same or opposite direction, and $|\gamma_j| \le 1$. Each sum has the form given by (6.1), and may be substituted by $\hat\gamma_i \hat b_i$, where $\hat\gamma_i$ and $\hat b_i$ are computed via formulas analogous to (6.3) and (6.4). The formula analogous to (6.4) is (6.5). The formula analogous to (6.3) is

    $\hat g = Q_B g$

where $Q_B$ is a $k_1 \times (r-n)$ matrix with elements

    $q_{ij} = \|\bar b_j\|_\infty / \|\hat b_i\|_\infty$ if $\bar b_j$ is condensed into $\hat b_i$,
    $q_{ij} = -\|\bar b_j\|_\infty / \|\hat b_i\|_\infty$ if $-\bar b_j$ is condensed into $\hat b_i$,
    $q_{ij} = 0$ otherwise.

The matrix $Q_B$ has the following properties:

    (1) each column has only one non-zero element,
    (2) the rows are orthogonal,
    (3) each absolute row sum equals one; this follows from $\|\hat b_i\|_\infty = \sum_{j \in J_i} \|\bar b_j\|_\infty$,
    (4) $Q_B Q_B^t = D_B'$, a diagonal matrix,
    (5) $Q_B$ has the same structure as $P_B^t$, i.e., corresponding elements of $Q_B$ and $P_B^t$ are either both zero, both positive, or both negative,
    (6) $Q_B P_B = I_{k_1}$; this follows immediately from properties (3) and (5).

For the model problem, and $g = (1, -1, 1, 1, 1)^t$, the corresponding $\hat g$ is given below:

    g-hat  =  [ 1/2   0    0   1/2   0  ] (  1 )  =  (  1 )
              [  0    1    0    0    0  ] ( -1 )     ( -1 )
              [  0    0   1/2   0   1/2 ] (  1 )     (  1 )
                                          (  1 )
                                          (  1 )

Thus, performing the above substitutions, $x = \sum_{i=1}^{k_1} \hat\gamma_i \hat b_i$, with $|\hat\gamma_i| \le 1$. Hence, $x \in \hat B C_{k_1}$, and $\bar B C_{r-n} \subseteq \hat B C_{k_1}$. ■
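In code, the condensation is a single pass that groups parallel columns and sums them onto a representative direction, per (6.4) and (6.5). A minimal sketch (Python/NumPy; hypothetical names; both condensations are shown for the model problem, the second being justified by the similar argument below):

    import numpy as np

    def condense(M, tol=1e-12):
        """Group mutually parallel columns of M; sum each group onto the
        direction of its first member, flipping opposite columns, as in (6.4)/(6.5)."""
        reps, sums = [], []
        for col in M.T:
            for i, rep in enumerate(reps):
                # parallel test: |rep . col| equals the product of the norms
                if abs(abs(rep @ col) - np.linalg.norm(rep) * np.linalg.norm(col)) < tol:
                    sums[i] += col if rep @ col > 0 else -col
                    break
            else:
                reps.append(col.copy())
                sums.append(col.copy())
        return np.column_stack(sums)

    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0]])
    A_hat = np.array([[0.5, 1.5, -2.0, 0.0, 1.0, 0.0], [0.0, -1.0, 2.5, 0.5, 0.0, 1.0]])
    print(condense(B_bar))   # [[2, -1, 0], [0, 1, 2]]: B-hat with k1 = 3 columns
    print(condense(A_hat))   # the k2+m = 4 columns of the condensed [A-bar, I], up to order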
Similar arguments reveal that any point $x$ in $\hat A C_{n+m}$ of the form $\hat A \hat d = \sum_{i=1}^{n+m} \hat\delta_i \hat a_i$ can be expressed as an element of $\check A C_{k_2+m}$ of the form $\check A \check d = \sum_{i=1}^{k_2+m} \check\delta_i \check a_i$, and vice versa. Hence $\hat A C_{n+m} = \check A C_{k_2+m}$.

For the model problem,

    A-check = [ 3/2  -2   3/2   0  ],    B-hat = [ 2  -1  0 ]
              [ -1   5/2   0   3/2 ]             [ 0   1  2 ]

where $k_2+m = 4$ and $k_1 = 3$. The following table shows the often dramatic reductions for a battery of problems and algorithms; see the Appendix for descriptions.
    ALGORITHM     m    r-n   k1    n+m   k2+m
    BACK4         3     12    3     13     6
    SEMI4         3     12    3     13     6
    PARA4         3     14    4     13     6
    BACK8         7     56    7     43    15
    SEMI8         7     56    7     43    15
    PARA8         7     86   11     43    15
    GS           12     81   26     24    19
    MGS          12     81   28     24    19
    LTBAND        7     86   10     37    15
    RINVERSE     10     30   10     20    19
    GJ            4     76    7     24     8
    GE            4     62    6     24     8
    CHOL          4     62   13     18    14
    MATMUL       16    112   16     48    48
    CG            4    122   32     12     9

    Table 5. Simplification of Polytope Representation
As the table illustrates, the reduction of $\bar B$ to $\hat B$ occasionally results in $k_1 = m$, in which case $\hat B$ is, to within column permutation, a lower triangular matrix. The following theorem gives sufficient conditions for this to occur.

Theorem 6.3: If the computational graph is a binary tree, i.e., each node is an operand of no more than one node, then $\hat B$ is lower triangular.

Proof: For each node $\eta_j$, $1 \le j \le m$, define

    $S_j = \{\xi_{n+i}: \rho\eta_j/\rho\xi_{n+i} \ne 0$ and $\rho\eta_k/\rho\xi_{n+i} = 0$ for all $k < j\}$

A node belongs to $S_j$ if $\eta_j$ is the first output node encountered on the unique path from it to the root of the tree, $\eta_m$. Every computational node appears in one and only one set, and each set is non-empty, since $\eta_j$ belongs to $S_j$. Let the columns of $\bar B$ be partitioned according to the sets $S_j$, i.e., column $i$ is in partition block $j$ if $\xi_{n+i}$ is in $S_j$. Consider the columns corresponding to $S_1 = \{\xi_{n+i}: \rho\eta_1/\rho\xi_{n+i} \ne 0\}$. For any $\xi_{n+i}$ in $S_1$, the corresponding column in $\bar B$ is

    $(\rho\eta_1/\rho\xi_{n+i}, \rho\eta_2/\rho\xi_{n+i}, \ldots, \rho\eta_m/\rho\xi_{n+i})^t = v_1 \cdot (\rho\eta_1/\rho\xi_{n+i})$

where $v_1 = (\rho\eta_1/\rho\eta_1, \rho\eta_2/\rho\eta_1, \ldots, \rho\eta_m/\rho\eta_1)^t$. This follows from the fact that the graph is a tree, and hence $\rho\eta_j/\rho\xi_{n+i} = (\rho\eta_j/\rho\eta_1)(\rho\eta_1/\rho\xi_{n+i})$, since there is at most one path from $\xi_{n+i}$ to $\eta_j$. Thus, all the columns of $\bar B$ corresponding to $S_1$ are multiples of $v_1$, and the reduction of $\bar B$ may choose the first column of $\hat B$ to be

    $v_1 \cdot \sum_{\xi_{n+i} \in S_1} |\rho\eta_1/\rho\xi_{n+i}|$.

Following similar arguments, the column of $\bar B$ corresponding to $\xi_{n+i}$ in $S_j$ is $v_j \cdot (\rho\eta_j/\rho\xi_{n+i})$, where

    $v_j = (0, \ldots, 0, \rho\eta_j/\rho\eta_j, \rho\eta_{j+1}/\rho\eta_j, \ldots, \rho\eta_m/\rho\eta_j)^t$

with $j-1$ leading zeros. Thus, the $j$th column of $\hat B$ is $v_j \cdot \sum_{\xi_{n+i} \in S_j} |\rho\eta_j/\rho\xi_{n+i}|$. From the form of $v_j$, it follows that $\hat B$ is lower triangular. ■

A similar result for $\check A$, the condensed matrix for $\hat A$, does not follow. For binary trees, $n > r-n \ge m$, and seemingly the possibility exists for reducing the $n$ columns of $\bar A$ to $m$ columns. The matrix $\check A$, then being lower triangular, would be of rank $m$, and this fact would guarantee that a backward analysis could be done. The argument below shows why this is not always true. The following example provides an illustration.

(Figure: computational graph for the example, together with its matrices $\bar A$ and $\check A$.)

In theorem 6.3, each set $S_j$, $1 \le j \le m$, is non-empty, and hence there is a condensed column for each set. For this example, $S_1$ is empty, characterizing the fact that each path from an input node to $\eta_1$ encounters another output node first. Thus, there is no condensed column corresponding to $S_1$, and $\check A$ has fewer than $m$ columns. With the additional assumption that all the sets, $S_j$, $1 \le j \le m$, are non-empty, theorem 6.3 will hold for $\check A$.
6.3. Algorithms for g*

The preceding section has shown that the size of the problem for B-analysis (5.9) may be reduced to the following:

    minimize $\Theta$
    subject to $\check A \check d = \hat B \hat g^*$    (6.6)
               $|\check\delta_i| - \Theta \le 0$,    $1 \le i \le k_2+m$

where $\check d$ has components $\check\delta_i$ representing, according to the reduction of $\hat A$, combined input and injected output relative errors, and $\hat g^*$ has components $\hat\gamma_i^*$, $1 \le i \le k_1$, representing, according to the reduction of $\bar B$, combined relative errors for the "worst" computation, with $|\hat\gamma_i^*| = 1$. Thus, algorithms for $\hat g^*$, rather than $g^*$, will be presented.

The $\hat g^*$ which is sought is that $\hat g$ in $C_{k_1}$ which gives the largest solution to (6.6). The problem of finding $\hat g^*$ is a minimax problem. Unfortunately, algorithms for finding the global solution to this type of problem are often tedious. Because of the special form of $\hat g^*$, finite, though often lengthy, algorithms can be constructed to find the global solution to (6.6). Two such algorithms will be presented.

The first algorithm consists of solving (6.6) for all possible $\hat g$, and selecting that $\hat g$ which gives the largest $\Theta$. Since each component of $\hat g^*$ is $\pm 1$, and there are $k_1$ components, (6.6) must be solved $2^{k_1}$ times. The reason for the reduction of $\bar B$ to $\hat B$ in the last section now becomes apparent.

For a given $\hat g$, the solution of (6.6) is a linear program. As it stands, (6.6) is not in standard form, since the variables $\check\delta_i$ are not constrained to be non-negative. A standard procedure for handling this case is to replace each inequality $|\check\delta_i| \le \Theta$ by the two inequalities $\check\delta_i^+ - \check\delta_i^- - \Theta \le 0$ and $\check\delta_i^+ - \check\delta_i^- + \Theta \ge 0$, where $\check\delta_i^+$ and $\check\delta_i^-$ are non-negative variables representing the positive and negative parts of $\check\delta_i$. This procedure practically doubles the number of variables and constraints, and increases the storage required by a factor of four.

An alternative procedure consists of a change of variables. Define the non-negative variables $\check\delta_i' = \check\delta_i + \Theta$, i.e., $\check d' = \check d + \Theta e_{k_2+m}$, where $e_{k_2+m}$ is a $(k_2+m)$-vector each of whose elements is 1. Then the constraint $|\check\delta_i| \le \Theta$ becomes $\check\delta_i' - 2\Theta \le 0$, or $\check d' - 2\Theta e_{k_2+m} \le 0$. Substituting $\check d = \check d' - \Theta e_{k_2+m}$, the constraint $\check A \check d = \hat B \hat g^*$ becomes $\check A \check d' - \Theta w = \hat B \hat g^*$, where $w$ is the $m$-vector of row sums $w_i = \sum_{j=1}^{k_2+m} \check a_{ij}$. Thus, (6.6) has become

    minimize $\Theta$
    subject to $\check A \check d' - \Theta w = \hat B \hat g^*$    (6.7)
               $\check d' - 2\Theta e_{k_2+m} \le 0$

However, rather than solving (6.7), tremendous savings in time and storage can be had by considering a dual problem, Ascher [1], whose notation will be employed. Given a candidate for $\hat g^*$, let $\check d$ be a feasible solution to (6.6), i.e., $\check d$ satisfies $\check A \check d = \hat B \hat g^*$. Define the quantities $\gamma = 1/\|\check d\|_\infty = 1/\Theta$ and $K_0 = \|\check A\|_\infty / \|\hat B \hat g^*\|_\infty$. Choose any $K > K_0$ and define the $(k_2+m+1)$-vector $y$ with components $y_i = \gamma \check\delta_i$, $1 \le i \le k_2+m$, and $y_{k_2+m+1} = \gamma/K$. Since $\gamma \le K_0 < K$, $\|y\|_\infty \le 1$. Let $\check A' = [\check A, -K \hat B \hat g^*]$ be an $m \times (k_2+m+1)$ matrix, and $c$ be a $(k_2+m+1)$-vector whose elements are zero except for the last component, which is 1. Then (6.6) is equivalent to the linear program:

    max $c^t y$
    subject to $\check A' y = 0$    (6.8)
               $\|y\|_\infty \le 1$

The dual of (6.8), Rabinowitz [15], may be stated as

    min $\|r\|_1$
    subject to $\check A'^t a + r = c$    (6.9)

where $\check A'$ and $c$ are as described above. What is claimed to be the most efficient algorithm yet devised for solving (6.9) is provided by Barrodale and Roberts [2, 3]. The outstanding features of the algorithm include: (1) a condensed tableau; only $\check A'$ and $c$ are explicitly stored; (2) no phase 1, i.e., an initial basic feasible solution is available, $r = c$; (3) the ability to pass through several vertices in a single iteration. The following equalities show how $\Theta$ is retrieved from the solution of (6.9):

    $\beta_1 = \min \|r\|_1 = \max c^t y = y_{k_2+m+1} = \gamma/K = 1/(\Theta K)$.

Hence, $\Theta = 1/(K \beta_1)$.
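For reference, the primal (6.7) is itself a small linear program and can be solved directly with a generic LP code. The following sketch (Python/SciPy; a general solver standing in for the Barrodale-Roberts l1 routine, names hypothetical) evaluates $\Theta$ for the model problem's condensed data:

    import numpy as np
    from scipy.optimize import linprog

    def theta_primal(A_chk, rhs):
        """Solve (6.7): min theta s.t. A d' - theta*w = rhs, 0 <= d'_i <= 2*theta,
        where w holds the row sums of A.  Variables are (d', theta)."""
        m, p = A_chk.shape
        w = A_chk.sum(axis=1)
        c = np.r_[np.zeros(p), 1.0]
        A_eq = np.c_[A_chk, -w]
        A_ub = np.c_[np.eye(p), -2.0 * np.ones((p, 1))]   # d'_i - 2*theta <= 0
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(p), A_eq=A_eq, b_eq=rhs,
                      bounds=[(0, None)] * (p + 1))
        return res.x[-1]

    A_chk = np.array([[1.5, -2.0, 1.5, 0.0], [-1.0, 2.5, 0.0, 1.5]])
    B_hat = np.array([[2.0, -1.0, 0.0], [0.0, 1.0, 2.0]])
    print(theta_primal(A_chk, B_hat @ np.array([1.0, -1.0, 1.0])))
    # 1.1176..., agreeing with the B-analysis of section 5.3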
The following table shows the saving in time of (6.9) over (6.7) for a battery of problems and algorithms. The time, in seconds, is for the evaluation of $\Theta$ for one candidate for $\hat g^*$. It is assumed that $\check A$ and $\hat B \hat g^*$ have previously been computed.
    ALGORITHM    PRIMAL (6.7)    DUAL (6.9)
    BACK4            0.23           0.02
    SEMI4            0.23           0.02
    PARA4            0.19           0.01
    BACK8            3.91           0.10
    SEMI8            3.77           0.09
    PARA8            2.10           0.11
    GS               1.12           0.15
    MGS              1.00           0.15
    LTBAND           1.82           0.20
    RINVERSE         0.78           0.12
    GJ               0.55           0.03
    GE               0.65           0.03
    CHOL             0.38           0.02
    MATMUL           5.79           0.58
    CG               0.15           0.01

    Table 6. Primal and Dual Solution Timings
52
While the first method for finding g* searches all possible g
in C. for that g which gives the largest solution to (6.9), a second
method searches all the faces of A C. . for that face which will cover
kp+m
Bg*. The representation of A C. + is changed to a more standard represen-
tation which is in terms of the hyperplanes which support the polytope. A
hyperplane, Householder [5], divides R into two half spaces and is repre-
sented by the equation v x = w, where v is an m-vector which is normal
to the hyperplane, and wis a constant which is related, by the length of
v, to the Euclidean distance of the hyperplane from the origin. A hyper-
plane supports the polytope if there is at least one point in common be-
tween the two, and all the points of the polytope lie on the hyperplane or
in only one of the half spacesi The points x in the polytope satisfy v x <_ 0).
Since the polytope is centrally symmetric, there is a corresponding support
hyperplane v x = -wand points in the polytope satisfy v x > -cj. Collectively,
the points in the polytope satisfy |v x| <_ w. The particular support hyper-
planes of interest intersect the (m-l)-dimensional faces of the polytope.
The points on a face of the polytope A C. +m have a particular form.
:: k+m : : :
Each vertex of the face is of the form Ad = 2 6. a. with |5.| = 1. Points
i=l ] n n
on the face are convex combinations of these vertices. A point on the face
supported by a hyperplane with normal v may be represented by
Ad = „£ 6. a. + .2 5. a. (6.10)
v l a.f0 v a^O
53
where the sum is split according to whether the projection of a. on v is
zero. AH of the points on the face have the same projection on v. The
following illustration shows a typical two-dimensional face of an A C. + c
3 . '
R , where, in this case, three of the columns of A are orthogonal to v,
origin
and y is the first sum in (6.10). The points on the face are symmetric around
the point y, which may be called the "center" of the face. The coefficients,
5 ., for y are given by
f f
+1 if v a. >
i ■
-1 if v a. <
V. 1
If this were not so, then it would be possible to construct a point, y',
which would have a projection on v which is greater than that of the points
on the face. The origin and y' would then lie in different half-spaces,
contradicting the fact that the hyperplane supports the polytope on this
54
face. The coefficients, 6., in the second sum of (6.10) satisfy |6.| <_ 1
and are related to the representations of a point on the face in terms of
convex combinations of the vertices of that face.
Since a face is (m-1)-dimensional, there must be at least m-1 terms in the second sum of (6.10). Each face therefore corresponds to a set of at least m-1 columns of $\bar{A}$. These vectors must span an (m-1)-dimensional subspace of $R^m$, and hence attention is restricted to combinations of m-1 columns of $\bar{A}$ which are linearly independent. While there may be more than m-1 columns of $\bar{A}$ orthogonal to $v$, any subset of m-1 linearly independent vectors will suffice to determine the face and the support hyperplane.

Suppose m-1 linearly independent columns of $\bar{A}$ are given, say $a_1, a_2, \ldots, a_{m-1}$. Then the condition $v^t a_i = 0$, $1 \le i \le m-1$, determines $v$ to within a constant multiplier. Requiring that $\|v\|_2 = 1$ determines a specific $v$. Having found $v$, the corresponding $\omega$ is determined by finding a point which lies on the face. Such a point, $y$, has been determined above. The value of $\omega$ is then $|v^t y|$. Thus, for each set of m-1 linearly independent columns of $\bar{A}$, the corresponding symmetric pair of support hyperplanes can be determined.
The points of the polytope must satisfy $|v^t x| \le \omega$. If $k_3 \le \binom{k_2+m}{m-1}$ is the number of support hyperplane pairs, then the polytope may be represented by those points $x$ which satisfy $|Vx| \le \omega$, where $V$ is a $k_3 \times m$ matrix whose rows, $v_i^t$, are the normals of the support hyperplanes, and $\omega$ is a $k_3$-vector of the corresponding $\omega$'s. Often $\binom{k_2+m}{m-1}$ is much less than $\binom{n+m}{m-1}$, and the reason for the reduction of $A$ to $\bar{A}$ now becomes apparent. It will be convenient to work with a normalized version of these inequalities, $|Vx| \le e$, where $e = (1,1,\ldots,1)^t$, in which each row has been divided by the corresponding right hand side. A sketch of this construction follows.
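The sketch below is a minimal illustration, not the thesis routine: it enumerates the (m-1)-subsets of columns and assembles the normalized matrix $V$ of $|Vx| \le e$, computing each normal from a singular value decomposition rather than the QR factorization used later in this section. The function name and tolerance are assumptions, and duplicate hyperplanes arising from different subsets of the same face are not pruned.

import numpy as np
from itertools import combinations

def support_representation(A_bar, tol=1e-12):
    # A_bar is m-by-(k2+m); each linearly independent (m-1)-subset of
    # its columns spans an (m-1)-face and yields one support normal.
    m = A_bar.shape[0]
    rows = []
    for idx in combinations(range(A_bar.shape[1]), m - 1):
        U, s, _ = np.linalg.svd(A_bar[:, list(idx)])
        if s[-1] <= tol:                  # columns dependent: no (m-1)-face
            continue
        v = U[:, -1]                      # unit normal, v^t a_i = 0 on the subset
        proj = A_bar.T @ v
        signs = np.where(np.abs(proj) > tol, np.sign(proj), 0.0)
        omega = v @ (A_bar @ signs)       # = |v^t y|, with y the face "center"
        if omega > tol:
            rows.append(v / omega)        # scale so the face lies at |v^t x| = 1
    return np.asarray(rows)               # rows of V in |Vx| <= e

The cost of this exhaustive enumeration grows as $\binom{k_2+m}{m-1}$, which is why the reduction from $A$ to $\bar{A}$ matters.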
The polytope $\bar{B}C_{k_1}$ is covered by expanding $\bar{A}C_{k_2+m}$ by a factor $\Theta$. Corresponding to the expanded polytope $\bar{A}\Theta C_{k_2+m}$ is the hyperplane representation

$|Vx| \le \Theta e$    (6.11)

in which the support hyperplanes have been moved out away from the origin by a factor $\Theta$. Given any $x$ in $R^m$, the smallest $\Theta$ for which (6.11) holds is the norm of $x$, where here, instead of a unit ball or cube, there is a unit polytope. Of interest are the norms of the points which lie in $\bar{B}C_{k_1}$. Such points are of the form $\bar{B}g = \sum_{i=1}^{k_1} \gamma_i b_i$, with $|\gamma_i| \le 1$. The particular $x = \bar{B}g^*$ is sought which gives the largest $\Theta$. It is known that this $x$ is a vertex of $\bar{B}C_{k_1}$, and that $|\gamma_i^*| = 1$, $1 \le i \le k_1$. Thus, the smallest $\Theta$ is sought for which any choice of $|\gamma_i| \le 1$ generates an $x = \bar{B}g$ which satisfies (6.11). Let $U = V\bar{B}$ be a $k_3 \times k_1$ matrix with elements $u_{ij}$. Then it must be that $|Ug| \le \Theta e$. Let $u_i^t$ be the i-th row of $U$. Then, since $|\gamma_i| \le 1$, $1 \le i \le k_1$, $|u_i^t g| \le \|u_i\|_1$, with equality when the components of $g$ are chosen to have magnitude 1 and the sign of the corresponding elements of $u_i$. Let $J$ be that subscript for which $u_i^t \hat{g}_i$ is maximum, i.e., $u_J^t \hat{g}_J = \max_i u_i^t \hat{g}_i$, where $\hat{g}_i$ is that $g$ chosen in the above manner for each row $u_i^t$. Then $g^* = \hat{g}_J$, with elements $\gamma_i^* = \mathrm{sign}(u_{Ji})$, $1 \le i \le k_1$, and $\Theta = u_J^t g^*$.
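In outline, the search just described amounts to scanning the rows of $U = V\bar{B}$ for the largest 1-norm. A minimal sketch, assuming the normalized V from above and NumPy (the function name is hypothetical):

import numpy as np

def find_g_star(V, B_bar):
    # For each row u of U = V B_bar, the best g has components of
    # magnitude 1 with the signs of u, so that u^t g = ||u||_1.
    theta, g_star = -np.inf, None
    for u in V @ B_bar:           # U may be formed one row at a time
        g = np.where(u >= 0.0, 1.0, -1.0)
        if u @ g > theta:
            theta, g_star = u @ g, g
    return theta, g_star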
If the $d \in \Theta C_{k_2+m}$ corresponding to $\bar{B}g^*$ is desired, additional computations need to be performed. Let $v_J^t$ be the J-th row of $V$, where $u_J^t = v_J^t \bar{B}$ and $J$ is as defined above. Thus, $v_J$ is the normal to the hyperplane which supports the face of $\bar{A}\Theta C_{k_2+m}$ on which $\bar{B}g^*$ lies. Hence, $\bar{B}g^*$ may be represented as in (6.10). For the first sum in (6.10), $|\delta_i| = \Theta$, where the sign is determined by the sign of the corresponding $v_J^t a_i$. This follows from the fundamental theorem of linear programming, which states that the inactive variables take on their extreme values. The values of $\delta_i$ in the second sum are determined by solving an $m \times (m-1)$ consistent system of equations involving those columns of $\bar{A}$ for which $v_J^t a_i = 0$, with right hand side $(\bar{B}g^* - y)$, where $y$ is now the center of the supported face of the expanded polytope. If the number of columns of $\bar{A}$ for which $v_J^t a_i = 0$ is greater than m-1, then the problem becomes a linear program.
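For the non-degenerate case, in which exactly m-1 columns of $\bar{A}$ are orthogonal to $v_J$, this recovery step can be sketched as follows; the function name and tolerance are assumptions, and the degenerate case, which the text notes becomes a linear program, is not handled.

import numpy as np

def recover_delta(A_bar, v_J, Bg_star, theta, tol=1e-10):
    # Coefficients delta of Bg* in the representation (6.10) on the
    # face of the expanded polytope supported by v_J.
    proj = A_bar.T @ v_J
    on_face = np.abs(proj) < tol                         # columns with v_J^t a_i = 0
    delta = np.empty(A_bar.shape[1])
    delta[~on_face] = theta * np.sign(proj[~on_face])    # first sum: |delta_i| = theta
    y = A_bar[:, ~on_face] @ delta[~on_face]             # center of the supported face
    # Second sum: the system is consistent, so least squares solves it exactly.
    delta[on_face], *_ = np.linalg.lstsq(A_bar[:, on_face], Bg_star - y, rcond=None)
    return delta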
In an implementation of this algorithm, the matrix $U$ need not be stored. It may be computed one row at a time, and a variable may hold the largest value of $u_i^t \hat{g}_i$ computed so far. Additionally, the normals, $v_j$, $1 \le j \le k_3$, may easily be computed in the following manner. Let $\hat{A}$ be an $m \times (m-1)$ matrix whose columns are some m-1 columns of $\bar{A}$. Then $\hat{A} = QR$, where $R$ is $m \times (m-1)$ and upper triangular, so that the last row of $R$ is zero. If rank($R$) = m-1, then the m-1 columns of $\hat{A}$ are linearly independent and hence determine an (m-1)-dimensional face. The normal to the hyperplane supporting this face is $v = Qe_m$, where $e_m$ is the last column of the identity matrix of order m, and $\|v\|_2 = 1$.
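A sketch of this computation in NumPy (the rank test via the diagonal of R and the tolerance are assumptions):

import numpy as np

def face_normal(A_hat, tol=1e-12):
    # Normal of the hyperplane through the face spanned by the m-1
    # columns of A_hat (m-by-(m-1)), via a full QR factorization.
    Q, R = np.linalg.qr(A_hat, mode='complete')   # Q is m-by-m, R is m-by-(m-1)
    if np.min(np.abs(np.diag(R))) <= tol:         # rank(R) < m-1: no face
        return None
    return Q[:, -1]                               # v = Q e_m, with ||v||_2 = 1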
When m is greater than 20, it is computationally advantageous to compute the QR decomposition and the normal $v$ in a different manner. Rather than computing a new QR decomposition for each combination of m-1 columns of $\bar{A}$, the combinations may be ordered in such a way that the factorization of the next combination may be easily computed from that of the previous one. As before, let $\hat{A} = QR$, where $\hat{A}$ consists of some m-1 columns of $\bar{A}$. Let $\hat{A}'$ be the same as $\hat{A}$ except that column k has been replaced by a column, $a_j$, of $\bar{A}$, so as to form a new combination. The revolving-door algorithm given by Nijenhuis and Wilf [13] specifies a sequence of replacements which will generate all $\binom{k_2+m}{m-1}$ combinations. Then $\hat{A}' = QR'$, where $R'$ is the same as $R$ except for the k-th column, which is now the full vector $Q^t a_j$. The matrix $R'$ may be reduced to the form of $R$ by premultiplication by an orthogonal matrix $P$ consisting of a product of Givens plane rotations,

$P = G'_{m-1} \cdots G'_{k+1} G_{k+1} \cdots G_m$

where $G_i$ introduces a zero in the i-th row of column k, and $G'_i$ introduces a zero in the (i+1)-st row of column i. The QR factorization of $\hat{A}'$ is then $\hat{A}' = (QP^t)(PR')$, or $\hat{A}' = Q'R'$, and the normal vector corresponding to this new combination is $v' = Q'e_m$. This method has the disadvantages of requiring extra storage for $Q$ in unfactored form, and of a gradual loss of orthogonality in the columns of $Q$.
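A dense-matrix sketch of one such replacement step, assuming NumPy; the function name and the explicit accumulation of the rotations into Q are illustrative only, and the two loops correspond to the $G_i$ and $G'_i$ sweeps above.

import numpy as np

def replace_column(Q, R, k, a_new):
    # Update A_hat = Q @ R (Q m-by-m orthogonal, R m-by-(m-1) upper
    # triangular) after column k of A_hat is replaced by a_new.
    m = Q.shape[0]
    Q, R = Q.copy(), R.copy()
    R[:, k] = Q.T @ a_new                 # new column expressed in the Q basis

    def zero(i, j, col):
        # Givens rotation on rows i and j of R, chosen to zero R[j, col];
        # the rotation is accumulated into Q so that Q @ R is unchanged.
        r = np.hypot(R[i, col], R[j, col])
        if r == 0.0:
            return
        c, s = R[i, col] / r, R[j, col] / r
        G = np.array([[c, s], [-s, c]])
        R[[i, j], :] = G @ R[[i, j], :]
        Q[:, [i, j]] = Q[:, [i, j]] @ G.T

    for i in range(m - 1, k, -1):         # G_i sweep: zero R[i, k], i = m-1 .. k+1
        zero(i - 1, i, k)                 # (creates subdiagonal fill-in)
    for j in range(k + 1, m - 1):         # G'_i sweep: chase the fill-in away
        zero(j, j + 1, j)
    return Q, R                           # normal for the new combination: Q[:, -1]

Accumulating the rotations into Q explicitly is what requires the extra storage noted above, and repeated updates gradually erode the orthogonality of Q.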
Both of the methods presented for finding $g^*$ are exhaustive. Hence, these methods are viable only for small $k_1$ or $k_2$, due to the computation time involved. For larger problems, methods which obtain an approximation to $\Theta$ are required.
One algorithm to compute an approximation to $g^*$ is based on the following heuristic. If the direction in which $Bg^*$ lies is known, then the choice of $\gamma_i^*$ could be made according to whether $b_i$, the i-th column of $B$, or $-b_i$ lies in that direction. The direction used is the eigenvector, $x_{max}$, corresponding to the largest eigenvalue, $\lambda_{max}$, of the generalized eigenvalue problem

$BB^t x = \lambda AA^t x$    (6.12)

This problem is similar to the one which Miller [11] solves to obtain an approximation to $\Theta$. He has shown that $(\lambda_{max})^{1/2}$ approximates $\Theta$, and that his approximation satisfies

$(n+m)^{-1/2}(\lambda_{max})^{1/2} \le \Theta \le (\lambda_{max})^{1/2}(r-n)^{1/2}$    (6.13)
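A sketch of this approximation using SciPy's generalized symmetric eigensolver, assuming $AA^t$ is positive definite; the function name is hypothetical, and the eigenvector is returned for use in the sign-choice heuristic described below.

import numpy as np
from scipy.linalg import eigh

def miller_approximation(A, B):
    # Largest eigenpair of B B^t x = lambda A A^t x, eq. (6.12);
    # sqrt(lam_max) approximates theta, with the bounds of (6.13).
    lam, X = eigh(B @ B.T, A @ A.T)        # eigenvalues in ascending order
    lam_max, x_max = lam[-1], X[:, -1]
    lower = np.sqrt(lam_max / A.shape[1])  # (n+m)^(-1/2) (lam_max)^(1/2)
    upper = np.sqrt(lam_max * B.shape[1])  # (lam_max)^(1/2) (r-n)^(1/2)
    return np.sqrt(lam_max), (lower, upper), x_max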
The following figure illustrates Miller's method applied to the model problem.

[Figure: Miller's method on the model problem; the ellipsoid $AA^tS^m$ is expanded so as to just cover $BB^tS^m$.]

The unit sphere $S^m = \{v \in R^m : \|v\|_2 = 1\}$ replaces the unit cubes $C_{n+m}$ and $C_{r-n}$, and the ellipsoids $AA^tS^m$ and $BB^tS^m$ replace the polytopes $AC_{n+m}$ and $BC_{r-n}$. Solving (6.12) yields $(\lambda_{max})^{1/2} = 1.163$, and (6.13) states that $\Theta$ lies in the interval

$0.4749 = (6)^{-1/2}(1.163) \le \Theta \le (1.163)(5)^{1/2} = 2.6$
The following figure illustrates the use of $x_{max}$ as a guide to approximating $g^*$, and hence $Bg^*$.

[Figure: the direction $x_{max}$ and the columns $\pm b_i$ of $B$ for the model problem.]

Each $\gamma_i^*$, $1 \le i \le r-n$, is chosen so that $\gamma_i^* b_i$ has a positive projection on $x_{max}$:

$\gamma_i^* = \begin{cases} +1 & \text{if } x_{max}^t b_i \ge 0 \\ -1 & \text{if } x_{max}^t b_i < 0 \end{cases}$    (6.14)
For the model problem, (6.14) succeeded in generating the true $g^*$, and a dual linear program similar to (6.9) led to the calculation of $\Theta = 1.1176$. When applied to other problems, algorithm (6.14) did not always produce the true $g^*$. Nevertheless, the resulting approximate $\Theta$ is a lower bound on the true $\Theta$, since the approximate $g^*$ gives rise to a $Bg^*$ which is a vertex of $BC_{r-n}$. Occasionally, the approximation to $g^*$ was improved by using (6.14) iteratively, with the approximate $Bg^*$ obtained on the previous iteration serving as the guide direction for the present iteration; frequently, however, the resulting increase in $\Theta$ was marginal. A sketch of this iteration follows.
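The sketch below implements the sign choice (6.14) and its iterative use, assuming NumPy; the function names and stopping rule are illustrative (the text does not analyze convergence), and computing the exact $\Theta$ for a candidate $g^*$ would still require the dual linear program (6.9).

import numpy as np

def choose_gamma(direction, B):
    # Equation (6.14): pick gamma_i = +/-1 so that gamma_i * b_i has a
    # non-negative projection on the guide direction.
    return np.where(B.T @ direction >= 0.0, 1.0, -1.0)

def iterate_gamma(x_max, B, steps=5):
    # Re-use the previous Bg* as the guide direction, as described above.
    g = choose_gamma(x_max, B)
    for _ in range(steps):
        g_new = choose_gamma(B @ g, B)
        if np.array_equal(g_new, g):    # fixed point reached
            break
        g = g_new
    return g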
The preceding discussion has not made use of the simplification of polytope representation described in the previous section. Since $BC_{r-n} = \bar{B}C_{k_1}$ and $AC_{n+m} = \bar{A}C_{k_2+m}$, the same techniques used by Miller [11] in the development of (6.12) and (6.13) may be used to establish similar relationships based on $\bar{B}C_{k_1}$ and $\bar{A}C_{k_2+m}$. Thus, $\Theta$ may be approximated by $(\mu_{max})^{1/2}$, where $\mu_{max}$ is the largest eigenvalue of the generalized eigenvalue problem

$\bar{B}\bar{B}^t y = \mu \bar{A}\bar{A}^t y$    (6.15)

Also, the approximation, $(\mu_{max})^{1/2}$, satisfies

$(k_2+m)^{-1/2}(\mu_{max})^{1/2} \le \Theta \le (\mu_{max})^{1/2}(k_1)^{1/2}$