Report No. UIUCDCS-R-78-916                                    UILU-ENG 78 1709

METHODS FOR AUTOMATIC ERROR ANALYSIS OF NUMERICAL ALGORITHMS*

by

John Leonard Larson

April 1978

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the National Science Foundation under Grant No. US NSF-MCS75-21758 and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, April 1978.

METHODS FOR AUTOMATIC ERROR ANALYSIS OF NUMERICAL ALGORITHMS

John Leonard Larson, Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign, 1978

Using Bauer's approach to relative error propagation, methods are presented for performing various error analyses on numerical algorithms. A technique is developed which generates a system of equations employed by the error analyses, and requires an order of magnitude less time and storage than present techniques. Accurate analyses are attempted by working in the $\infty$-norm, and require the solution of minimax problems. A heuristic approximation method is also described. These methods are compared with the 2-norm approximation methods of Miller. The error analyses provide alternative criteria by which algorithms that solve the same problem may be compared. Additionally, the analysis of a composite algorithm, which is made up of concatenated sub-algorithms, is given in terms of analyses done on its parts. Finally, by using a forward analysis, limited algorithm improvement is obtained by the location of sensitive operations and the substitution of mathematically equivalent expressions.

Acknowledgments

I would like to express my sincere appreciation to my thesis advisor, Professor Ahmed H. Sameh, for his guidance, insight, encouragement, and friendship. I would also like to thank the members of my thesis committee for their interest and comments: Professors David Kuck, Dave Liu, Paul Saylor, and Robert Skeel. Additional thanks go to Professor Webb Miller for providing his software routines and for his interest in my work.

I am grateful to the National Science Foundation and the Department of Computer Science for their financial support. Thanks are due to Ms. Cathy Gallion for an excellent job of typing. Finally, I would like to thank my wife, Emmylou, for her love, patience and encouragement, and for her skill in preparing the illustrations.

Table of Contents

1. Introduction .................................................  1
2. Preliminaries ................................................  6
3. Roundoff Errors .............................................. 11
4. Efficient Calculation of M-bar ............................... 19
5. Error Analyses ............................................... 25
   5.1. Forward Analysis ........................................ 25
   5.2. Backward Analysis ....................................... 27
   5.3. B-analysis .............................................. 34
6. Computational Details ........................................ 39
   6.1. The Form of g* .......................................... 39
   6.2. Simplification of Polytope Representation ............... 40
   6.3. Algorithms for g* ....................................... 48
7. Comparison of Algorithms ..................................... 66
8. Composite Algorithms ......................................... 77
9. Algorithm Improvement ........................................ 85
10. Conclusion .................................................. 94
Bibliography .................................................... 97
Appendix ........................................................ 99
Vita ............................................................ 102

1. Introduction

New algorithms are constantly being proposed for the solution of numerical problems.
In particular, these include algorithms which exploit the capabilities of new and emerging parallel and pipeline machines. Unfortunately, this enthusiasm has not been matched by the development of systematic ways to evaluate the numerical properties of these new algorithms. In fact, until recently [Miller, 8-12] little work has been done in the area of automating error analysis for existing sequential algorithms. Besides J. H. Wilkinson, few people have the time or inclination to make tedious error analyses of the algorithms they use. Error analysis needs to be organized in such a way that it is easily done by machine.

This thesis is concerned with the development and implementation of methods for automatic error analysis of numerical algorithms. This work parallels that of Miller [10], but takes a different approach to the analysis of numerical algorithms. His development is from an absolute error point of view, while this thesis is concerned with relative errors, a natural approach considering that floating point arithmetic is defined in terms of relative error. Perhaps the greatest difference in the approaches lies in their goals. While he strives to find data for which an approximate analysis indicates that the algorithm under investigation performs poorly, this thesis strives to give an accurate analysis of the algorithm for some given data. Each goal has its place depending on the need for the analysis.

Section 2 begins with a summary of previous work, and shows how roundoff errors affect computations. An algorithm is represented as a directed graph: each numerical quantity is a node, and directed arcs lead from operands to results. During the execution of the algorithm, the value at each node is subject to roundoff error, called local relative error, as described in section 3. The local relative error in a node subsequently affects the values of other nodes; in particular, it affects those nodes which use the contaminated node as an operand. The magnitude of the effect on one of these nodes is reflected by a number which weights the arc between it and the contaminated node. More generally, the local relative error in one node affects the value of another node if there exists a directed path between them, and the effect along a single path is the product of the arc weights on that path. The effects of all local relative errors combine to produce a total relative error in each node.

The definition of total relative error, when described in this way, is called the global definition, since all the nodes in the graph are inspected. An alternative formulation, called the local definition, recursively defines the total relative error in a node in terms of the total relative errors in its operands. Each definition gives rise to a system of equations which relates the local and total relative errors. The system corresponding to the global definition is dense, while the local definition, as expected, produces an exceedingly sparse system with no more than four non-zero elements per row. The equivalence of these two systems is shown.

Either of these systems could be used to perform various error analyses. However, only those rows of the dense system corresponding to output nodes need be used. Current methods generate the entire dense system, and then discard all but the desired rows. Section 4 derives an algorithm which relies on the equivalence relation and computes only those rows which are needed by solving sparse upper triangular systems.
A comparison of this method with an existing alternative shows an order of magnitude savings in both time and storage.

Section 5 poses the problems to be solved in order to perform forward, backward, and B-analysis. A forward analysis determines relative error bounds for the computed solution. This bound may be easily computed from the system relating the local and total relative errors. A backward analysis determines the stability of an algorithm by finding a bound on relative input perturbations such that the computed solution satisfies the perturbed problem. This bound is found by comparing the set of output relative error vectors for all computed solutions with the set produced by local relative errors in only the input nodes. Conditions are specified under which a backward analysis is not possible. A B-analysis is an alternative stability measurement which determines a bound on relative input and output perturbations such that the computed solution is near the exact solution of a perturbed problem. This bound is found by comparing the set of output relative error vectors for all computed solutions with the set generated by local relative errors in only input and output nodes. An advantage of the B-analysis is that it may always be done.

Section 6 analyzes the problems posed in section 5. The problems for backward and B-analysis are identified as minimax problems, whose solutions are often tedious to obtain. These problems involve the expansion of a polytope (a centrally symmetric convex set) to cover another polytope. The polytopes represent the sets of output relative error vectors described above. Two exhaustive methods are described. The first method searches the vertices of the polytope being covered for the vertex of maximal relative distance. The second method searches the faces of the polytope doing the covering. Each method makes use of the particular form of the solution. Additionally, a close inspection of the problems reveals that their complexity may often be significantly reduced. This is reflected in a reduction of the number of columns in the system which relates local and total relative errors. Nevertheless, the computation time of the exhaustive methods precludes their use for all but small problems. An approximation method is also given which computes a lower bound on the solution and occasionally produces the exact solution.

The remaining sections investigate several topics related to the main theme. Section 7 examines various ways by which algorithms may be compared. The first is in terms of a forward analysis, where one algorithm is preferable to another if it can guarantee a smaller relative error bound for the solution. This method is equivalent to comparing, for each algorithm, the set of all computed solution relative error vectors with the unit cube. A second method, proposed by Miller [8], compares the sets of output relative error vectors for each algorithm against one another. The results of this comparison may indicate that, in some sense, each algorithm is preferable to the other. This dilemma is overcome by a third method which, like the first method, uses a common standard against which the algorithms are compared. Here, the standard is the problem which the algorithms solve, represented by the set of output relative error vectors resulting from local relative errors in only input and output nodes. Several examples are given.
A composite algorithm is one which is made up of smaller sub-algorithms, where the outputs of one sub-algorithm serve as the inputs to the next. Section 8 shows how a composite algorithm may be analyzed if analyses have been performed on its parts. The relationship between the matrices of coefficients of the systems which relate local and total relative error in the composite algorithm and in the individual sub-algorithms forms the basis upon which the analysis of the composite algorithm may be done. Derivations are presented which provide bounds for forward, backward, and B-analysis of the composite algorithm in terms of corresponding analyses done on its sub-algorithms.

Finally, section 9 scratches the surface of the problem of algorithm improvement. The goal taken is to reduce the forward analysis error bounds, since by doing so bounds for other analyses should also be reduced. Algorithm improvement consists of two parts: locating where changes to the algorithm should be made, and deciding what changes should be incorporated. It is shown that operations that have large weights on the arcs leading from their nodes are prime candidates for modification. A change in the algorithm involving these nodes may take the form of a mathematical identity for which the best alternative may be easily computed. Several basic identities are investigated. This technique is limited in its effectiveness by the fact that the computational graph does not contain all the information about the algorithm or the problem it solves.

2. Preliminaries

This section presents the basic definitions and concepts provided by Bauer [4], and clarified and extended by Larson and Sameh [6].

Definition 2.1: problem - a set $X$ of $n$ input variables, $\xi_i$, $1 \le i \le n$, and a set $Y$ of $m$ output variables, $\eta_i$, $1 \le i \le m$, with $\eta_i = f_i(\xi_1, \xi_2, \ldots, \xi_n)$.

Definition 2.2: algorithm - any finite sequence of elementary operations of the form $\xi_i = \xi_j * \xi_k$, where $*$ is one of $\{+, -, \times, \div\}$, which mathematically realize the elements of $Y$.

Definition 2.3: computational graph - a directed graph, essentially a parse tree of the algorithm, having the $\xi_i$ as nodes and directed arcs from $\xi_j$ and $\xi_k$ to $\xi_i$.

The computational graph contains a total of $r$ nodes consisting of $n$ input nodes, $\xi_i$, $1 \le i \le n$, and $r-n$ computational nodes, $\xi_{n+i}$, $1 \le i \le r-n$, of which $m$ nodes are designated as output nodes, i.e., $\eta_i = \xi_{\omega_i}$, $1 \le i \le m$. The variable $\omega_i$ is used to index the outputs. The nodes are numbered in such a way that if $\xi_i = \xi_j * \xi_k$, then $j < i$ and $k < i$. The value of each node is assumed to be non-zero.

The following model will be used from time to time to illustrate the concepts being presented.

problem: $X = \{\xi_1, \xi_2, \xi_3, \xi_4\}$, $n = 4$; $Y = \{\eta_1, \eta_2\}$, $m = 2$;

    $\eta_1 = (\xi_1 + \xi_2)/(\xi_2 - \xi_3)$,    $\eta_2 = (\xi_2 - \xi_3)(\xi_3 + \xi_4)$

algorithm:

    $\xi_5 = \xi_1 + \xi_2$,  $\xi_6 = \xi_2 - \xi_3$,  $\xi_7 = \xi_3 + \xi_4$,  $\xi_8 = \xi_5 \div \xi_6$,  $\xi_9 = \xi_6 \times \xi_7$

where $\eta_1 = \xi_8$, $\eta_2 = \xi_9$, $\omega_1 = 8$, $\omega_2 = 9$, $r-n = 5$, $r = 9$.

computational graph:

(Figure: the computational graph of the model algorithm. Subsequent graphs will employ implicit downward arrows.)

Definition 2.4: partial relative derivative of $\xi_i$ with respect to $\xi_\ell$ - a quantity, $\rho\xi_i/\rho\xi_\ell$, representing the first order relative effect on the value of $\xi_i$ assuming a relative change in the value of $\xi_\ell$, given by

    $\rho\xi_i/\rho\xi_\ell = (\xi_\ell/\xi_i)(\partial\xi_i/\partial\xi_\ell)$    (2.1)

Sterbenz [17] calls these quantities magnification factors.

The arcs of the computational graph are weighted. The weight of the arc from $\xi_j$ to $\xi_i$ is $\rho\xi_i/\rho\xi_j$. Table 1 illustrates the calculation of arc weights for the elementary operations.
For $\xi_i = \xi_j * \xi_k$:

                       * = +         * = -        * = x      * = /
    rho xi_i/rho xi_j  xi_j/xi_i     xi_j/xi_i      1           1
    rho xi_i/rho xi_k  xi_k/xi_i    -xi_k/xi_i      1          -1

    Table 1. Arc Weights

The calculation of $\rho\xi_i/\rho\xi_\ell$ using (2.1) when $\xi_\ell$ is not an operand of $\xi_i$ is complicated by the evaluation of $\partial\xi_i/\partial\xi_\ell$ by the chain rule. An alternative calculation makes use of the following definition and theorem.

Definition 2.5: path product - the product of the partial relative derivatives belonging to the arcs on a directed path between two nodes.

Theorem 2.1: $\rho\xi_i/\rho\xi_\ell$ is equal to the sum of the path products for all paths from $\xi_\ell$ to $\xi_i$.

Proof: The chain rule may be stated as

    $\partial\xi_i/\partial\xi_\ell = \sum_{p=1}^{s} \prod_{q=1}^{u_p} \partial\xi_{f(p,q)}/\partial\xi_{f(p,q+1)}$    (2.2)

where $s$ is the number of distinct directed paths from $\xi_\ell$ to $\xi_i$; $u_p$ is the number of arcs in path $p$; $u_p + 1$ is the number of nodes in path $p$; and $f(p,q)$ is a function which describes the paths from $\xi_\ell$ to $\xi_i$ in the following manner: for each path $p$, $f(p,1) = i$; $\xi_{f(p,q+1)}$ is an operand of $\xi_{f(p,q)}$; and $f(p, u_p+1) = \ell$. Consider two adjacent nodes, say $\xi_{f(p,q+1)}$ and $\xi_{f(p,q)}$, on one of these paths. From (2.1), the weight of the arc between these nodes is

    $\rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)} = (\xi_{f(p,q+1)}/\xi_{f(p,q)})(\partial\xi_{f(p,q)}/\partial\xi_{f(p,q+1)})$.

Thus,

    $\partial\xi_{f(p,q)}/\partial\xi_{f(p,q+1)} = (\xi_{f(p,q)}/\xi_{f(p,q+1)})(\rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)})$    (2.3)

Substituting (2.3) in (2.2) and applying (2.1) gives the following expression for the partial relative derivative of $\xi_i$ with respect to $\xi_\ell$:

    $\rho\xi_i/\rho\xi_\ell = (\xi_\ell/\xi_i) \sum_{p=1}^{s} \prod_{q=1}^{u_p} (\xi_{f(p,q)}/\xi_{f(p,q+1)})(\rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)})$

For each path the product of the factors $\xi_{f(p,q)}/\xi_{f(p,q+1)}$ telescopes to $\xi_i/\xi_\ell$, so simplification by cancellation results in

    $\rho\xi_i/\rho\xi_\ell = \sum_{p=1}^{s} \prod_{q=1}^{u_p} \rho\xi_{f(p,q)}/\rho\xi_{f(p,q+1)}$    (2.4)

Thus, $\rho\xi_i/\rho\xi_\ell$ is the sum of all path products. ■

Trivially, $\rho\xi_i/\rho\xi_i = 1$, and if there is no directed path from $\xi_\ell$ to $\xi_i$, then $\rho\xi_i/\rho\xi_\ell = 0$.

Formula (2.4) may also be expressed recursively. If $\xi_i = \xi_j * \xi_k$, then

    $\rho\xi_i/\rho\xi_\ell = (\rho\xi_i/\rho\xi_j)(\rho\xi_j/\rho\xi_\ell) + (\rho\xi_i/\rho\xi_k)(\rho\xi_k/\rho\xi_\ell)$    (2.5)

This follows from the fact that any path from $\xi_\ell$ to $\xi_i$ must pass through $\xi_j$ or $\xi_k$.

For the model problem with data $\xi_1 = \xi_2 = 1$, $\xi_3 = \xi_4 = 2$, the arc weights are shown below.

(Figure: the computational graph of the model problem with each arc labeled by its weight.)

The evaluation of $\rho\xi_9/\rho\xi_3$ using (2.1) necessitates the calculation

    $\partial\xi_9/\partial\xi_3 = \xi_2 - 2\xi_3 - \xi_4$

from which follows

    $\rho\xi_9/\rho\xi_3 = (\xi_3/\xi_9)(\partial\xi_9/\partial\xi_3) = 5/2$.

Alternatively, formula (2.4) gives

    $\rho\xi_9/\rho\xi_3 = (\rho\xi_9/\rho\xi_6)(\rho\xi_6/\rho\xi_3) + (\rho\xi_9/\rho\xi_7)(\rho\xi_7/\rho\xi_3) = 5/2$.
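The arc weights of Table 1 and the recursion (2.5) mechanize directly. The following sketch (Python; the function and variable names are hypothetical, and the fragment illustrates the technique rather than any implementation described in this thesis) builds the model problem's graph and evaluates all partial relative derivatives:

    def relative_derivatives(inputs, steps):
        """inputs: values of xi_1..xi_n; steps: (op, j, k) with 1-based indices."""
        n = len(inputs)
        xi = dict(enumerate(inputs, start=1))
        w = {}                                   # w[i, j] = rho xi_i / rho xi_j
        for i, (op, j, k) in enumerate(steps, start=n + 1):
            a, b = xi[j], xi[k]
            xi[i] = {'+': a + b, '-': a - b, 'x': a * b, '/': a / b}[op]
            if op in '+-':                       # Table 1: weigh by operand/result
                w[i, j], w[i, k] = a / xi[i], (b if op == '+' else -b) / xi[i]
            else:                                # Table 1: x gives 1, 1; / gives 1, -1
                w[i, j], w[i, k] = 1.0, (1.0 if op == 'x' else -1.0)
        rho = {(i, i): 1.0 for i in xi}
        for i, (op, j, k) in enumerate(steps, start=n + 1):
            for l in range(1, i):                # recursion (2.5) over the two operands
                rho[i, l] = (w[i, j] * rho.get((j, l), 0.0)
                             + w[i, k] * rho.get((k, l), 0.0))
        return xi, rho

    xi, rho = relative_derivatives([1, 1, 2, 2],
        [('+', 1, 2), ('-', 2, 3), ('+', 3, 4), ('/', 5, 6), ('x', 6, 7)])
    assert abs(rho[9, 3] - 5/2) < 1e-12          # agrees with the example above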
3. Roundoff Errors

Floating point computation is subject to error due to finite word length. Wilkinson [19] provides the bounds for representation and arithmetic:

    $fl(\xi_i) = \xi_i(1 + e_i)$,    $1 \le i \le n$
    $fl(\xi_j * \xi_k) = (\xi_j * \xi_k)(1 + e_i)$,    $n < i \le r$

where $|e_i| \le \epsilon = \beta^{1-d}/2$ (rounding with base $\beta$ and a mantissa of $d$ digits).

Definition 3.1: local relative error in $\xi_i$ - a quantity, $[\rho\xi_i]_L \cdot \epsilon = e_i$, representing the relative change in the value of $\xi_i$ due to its representation or computation. It follows immediately that $|[\rho\xi_i]_L| \le 1$.

Definition 3.2: total relative error in $\xi_i$ - a quantity, $[\rho\xi_i]_T \cdot \epsilon$, representing the relative change in the value of $\xi_i$ due to the accumulated effects of local relative errors in nodes of which $\xi_i$ is a function.

Trivially, for input nodes, $[\rho\xi_i]_T \cdot \epsilon = [\rho\xi_i]_L \cdot \epsilon$, $1 \le i \le n$. For a given algorithm, let $\xi_i = f_i(\xi_1, \xi_2, \ldots, \xi_{i-1})$, $n < i \le r$. Previous local relative errors, $e_\ell = [\rho\xi_\ell]_L \cdot \epsilon$, $1 \le \ell \le i-1$, give rise to

    $\bar f_i = f_i(\xi_1(1 + e_1), \xi_2(1 + e_2), \ldots, \xi_{i-1}(1 + e_{i-1}))$.

Expanding $f_i$ around $\xi_i$ using Taylor's theorem, and dividing by $\xi_i$, results in

    $(\bar f_i - \xi_i)/\xi_i = \sum_{\ell=1}^{i-1} (\xi_\ell/\xi_i)(\partial\xi_i/\partial\xi_\ell)[\rho\xi_\ell]_L \cdot \epsilon + O(\epsilon^2)$    (3.1)

Adding $[\rho\xi_i]_L \cdot \epsilon$ to both sides of (3.1) and substituting (2.1), obtain

    $[\rho\xi_i]_T \cdot \epsilon = \sum_{\ell=1}^{i} (\rho\xi_i/\rho\xi_\ell)[\rho\xi_\ell]_L \cdot \epsilon + O(\epsilon^2)$

Dividing by $\epsilon$, and taking the limit as $\epsilon$ approaches 0, gives

    $[\rho\xi_i]_T = \sum_{\ell=1}^{i} (\rho\xi_i/\rho\xi_\ell)[\rho\xi_\ell]_L$    (3.2)

By working with (3.2) the discussion will be independent of $\epsilon$. Thus, the concern will only be with the coefficients of any small $\epsilon > 0$, and explicit mention of $\epsilon$ is henceforth dropped.

The formulation of total relative error, (3.2), gives rise to an underdetermined homogeneous system of equations, $Mz = 0$, or

    $[-I_{r-n}, A, B] (t^t, d^t, g^t)^t = 0$    (3.3)

where $I_{r-n}$ is the identity matrix of order $r-n$; $A$ is an $(r-n) \times n$ matrix with elements $a_{ij} = \rho\xi_{n+i}/\rho\xi_j$; $B$ is an $(r-n) \times (r-n)$ unit lower triangular matrix with elements $b_{ij} = \rho\xi_{n+i}/\rho\xi_{n+j}$ below the diagonal; $t$ is an $(r-n)$-vector with elements $\tau_i = [\rho\xi_{n+i}]_T$; $d$ is an $n$-vector with elements $\delta_i = [\rho\xi_i]_L$; and finally, $g$ is an $(r-n)$-vector with elements $\gamma_i = [\rho\xi_{n+i}]_L$.

System (3.3) may be used to perform an error analysis, e.g., forward, backward, etc., as in a following chapter. However, these analyses seek a relationship of errors between inputs and outputs, and between computational nodes and outputs. Hence, only those equations of (3.3) corresponding to output nodes need be considered, i.e., for $1 \le i \le m$,

    $[\rho\eta_i]_T = \sum_{\ell=1}^{\omega_i} (\rho\eta_i/\rho\xi_\ell)[\rho\xi_\ell]_L$    (3.4)

where $\eta_i = \xi_{\omega_i}$. This produces the condensed system, $\bar M z = 0$, or

    $[-I_m, \bar A, \bar B] (\bar t^t, d^t, g^t)^t = 0$    (3.5)

where $I_m$ is the identity matrix of order $m$; $\bar A$ is an $m \times n$ matrix with elements $\bar a_{ij} = \rho\eta_i/\rho\xi_j$; $\bar B$ is an $m \times (r-n)$ matrix with elements $\bar b_{ij} = \rho\eta_i/\rho\xi_{n+j}$; $\bar t$ is an $m$-vector with elements $\bar\tau_i = [\rho\eta_i]_T$; and $d$ and $g$ are as previously defined.

The elements of $M$ may be calculated from the formula for the partial relative derivative in (2.5). An obvious method forms the entries row by row, the partial relative derivatives on the right-hand side of (2.5) having already been computed. The matrix $\bar M$ can then be formed by selecting rows of $A$ and $B$ in $M$ corresponding to output nodes, and appending $-I_m$. Since $m$ is usually much less than $r-n$, i.e., few of the computational nodes are actually output nodes, much of the work done in forming $M$ is concerned with rows which will eventually be discarded. An equivalent system which will allow the direct calculation of $\bar M$ will now be derived.

Formula (3.2) may be expressed as

    $[\rho\xi_i]_T = [\rho\xi_i]_L + \sum_{\ell=1}^{i-1} (\rho\xi_i/\rho\xi_\ell)[\rho\xi_\ell]_L$

Substituting (2.5) gives

    $[\rho\xi_i]_T = [\rho\xi_i]_L + (\rho\xi_i/\rho\xi_j) \sum_{\ell=1}^{i-1} (\rho\xi_j/\rho\xi_\ell)[\rho\xi_\ell]_L + (\rho\xi_i/\rho\xi_k) \sum_{\ell=1}^{i-1} (\rho\xi_k/\rho\xi_\ell)[\rho\xi_\ell]_L$

where $\xi_i = \xi_j * \xi_k$. From the node numbering scheme, $\rho\xi_j/\rho\xi_\ell = 0$ if $\ell > j$, and $\rho\xi_k/\rho\xi_\ell = 0$ if $\ell > k$; i.e., a node cannot be affected by a computation which is performed after its execution. Hence, changing the upper limits of both sums and substituting (3.2) results in

    $[\rho\xi_i]_T = [\rho\xi_i]_L + (\rho\xi_i/\rho\xi_j)[\rho\xi_j]_T + (\rho\xi_i/\rho\xi_k)[\rho\xi_k]_T$    (3.6)

In (3.6), the effects of previous local relative errors have been accumulated in the operands of $\xi_i$. Formula (3.6) is a local relationship of errors, since it concerns only a node and its operands. In contrast, one may call (3.2) a global relationship, since all previous nodes are inspected. Thus, formula (3.6) produces the system $\tilde M z = 0$, or

    $[\tilde B, \tilde A, I_{r-n}] (t^t, d^t, g^t)^t = 0$    (3.7)

where $\tilde B$ is an $(r-n) \times (r-n)$ lower triangular matrix with elements $\tilde b_{ij} = \rho\xi_{n+i}/\rho\xi_{n+j}$ if $\xi_{n+j}$ is an operand of $\xi_{n+i}$, and zero otherwise, and $\tilde b_{ii} = -1$; $\tilde A$ is an $(r-n) \times n$ matrix with elements $\tilde a_{ij} = \rho\xi_{n+i}/\rho\xi_j$ if $\xi_j$ is an operand of $\xi_{n+i}$, and zero otherwise; and $I_{r-n}$ and $z$ are as previously defined. The analyses could use either $Mz = 0$ or $\tilde M z = 0$.
The reduction of $Mz = 0$ to $\bar M z = 0$ is done to economize time and storage. No such reduction is possible with $\tilde M z = 0$, since this system specifies relationships at a local level, and to remove rows from it would result in the loss of the global relationships which the analyses seek. Both systems specify a relationship between any and all errors and their effects. The relationship specified in each system is the same, since by comparing (3.3) and (3.7) and noting that $\tilde B$ is nonsingular, it must be that

    $M = (-\tilde B)^{-1} \tilde M$    (3.8)

The systems $Mz = 0$, $\tilde M z = 0$, and $\bar M z = 0$ for the model are given below. Row and column headings are provided as an aid in identifying matrix elements. The equation relating $[\rho\xi_9]_T$ to other errors is shown to illustrate the difference in complexity among the systems.

$Mz = 0$:

           xi5 xi6 xi7 xi8 xi9 | xi1  xi2  xi3  xi4 | xi5 xi6 xi7 xi8 xi9
    xi5  [ -1   0   0   0   0  | 1/2  1/2   0    0  |  1   0   0   0   0 ]
    xi6  [  0  -1   0   0   0  |  0   -1    2    0  |  0   1   0   0   0 ]
    xi7  [  0   0  -1   0   0  |  0    0   1/2  1/2 |  0   0   1   0   0 ]
    xi8  [  0   0   0  -1   0  | 1/2  3/2  -2    0  |  1  -1   0   1   0 ]
    xi9  [  0   0   0   0  -1  |  0   -1   5/2  1/2 |  0   1   1   0   1 ]

    $[\rho\xi_9]_T = -[\rho\xi_2]_L + 5/2\,[\rho\xi_3]_L + 1/2\,[\rho\xi_4]_L + [\rho\xi_6]_L + [\rho\xi_7]_L + [\rho\xi_9]_L$

$\tilde M z = 0$:

           xi5 xi6 xi7 xi8 xi9 | xi1  xi2  xi3  xi4 | xi5 xi6 xi7 xi8 xi9
    xi5  [ -1   0   0   0   0  | 1/2  1/2   0    0  |  1   0   0   0   0 ]
    xi6  [  0  -1   0   0   0  |  0   -1    2    0  |  0   1   0   0   0 ]
    xi7  [  0   0  -1   0   0  |  0    0   1/2  1/2 |  0   0   1   0   0 ]
    xi8  [  1  -1   0  -1   0  |  0    0    0    0  |  0   0   0   1   0 ]
    xi9  [  0   1   1   0  -1  |  0    0    0    0  |  0   0   0   0   1 ]

    $[\rho\xi_9]_T = [\rho\xi_9]_L + [\rho\xi_6]_T + [\rho\xi_7]_T$

$\bar M z = 0$:

           eta1 eta2 | xi1  xi2  xi3  xi4 | xi5 xi6 xi7 xi8 xi9
    eta1 [  -1   0   | 1/2  3/2  -2    0  |  1  -1   0   1   0 ]
    eta2 [   0  -1   |  0   -1   5/2  1/2 |  0   1   1   0   1 ]

    $[\rho\xi_9]_T$ is as given for $Mz = 0$.

4. Efficient Calculation of $\bar M$

From (3.8), the relationship between $M$ and $\tilde M$, one can devise an efficient scheme for forming $\bar A$ and $\bar B$ directly from $\tilde A$ and $\tilde B$ without forming $A$ and $B$. Let $\bar C = [\bar A, \bar B]$, $C = [A, B]$, $\bar c_i^t$ be the $i$th row of $\bar C$ and $c_j^t$ the $j$th row of $C$, where $n+j = \omega_i$. Since $C = (-\tilde B)^{-1}[\tilde A, I_{r-n}]$, it follows that

    $c_j^t = e_j^t C = e_j^t (-\tilde B)^{-1} [\tilde A, I_{r-n}]$,

where $e_j$ is the $j$th column of the identity matrix of order $r-n$. Post-multiplying this equation by the nonsingular matrix

    [ I_n      0      ]
    [ -A-tilde -B-tilde ]

and transposing gives

    [ I_n  -A-tilde^t ]
    [ 0    -B-tilde^t ]  c_j = e'_{n+j},

where $e'_{n+j}$ is the $(n+j)$th column of the identity matrix of order $r$. This unit upper triangular system may be solved in an efficient manner by drawing on the sparseness of $\tilde A$ and $\tilde B$. Let $R$ be the coefficient matrix, $s = c_j$, and $f = e'_{n+j}$. Then initially,

    R = [ R_{r-1}  u_r ],    s = ( s_{r-1} ),    f = ( f_{r-1} ).
        [   0       1  ]         ( sigma_r )         (  phi_r  )

The value of $\sigma_r$ is immediately known, and only the deflated system

    $R_{r-1} s_{r-1} = f_{r-1} - \sigma_r u_r$

need be considered in order to find the remaining $\sigma$'s. This system is of the same form as the original, and the same technique may be employed to find $\sigma_{r-1}$, etc. Each update of the right-hand side is particularly simple. At step $k$, $r \ge k \ge n+1$, the vector $u_k$ is a subvector of the $(k-n)$th row of $[-\tilde A, -\tilde B]$, and therefore is zero except for two elements. Let $u_k^t = (\mu_{k,1}, \mu_{k,2}, \ldots, \mu_{k,k-1})$. If $\xi_k = \xi_p * \xi_q$, then the non-zero elements of $u_k$ are $\mu_{k,p} = -(\rho\xi_k/\rho\xi_p)$ and $\mu_{k,q} = -(\rho\xi_k/\rho\xi_q)$. Thus, the following algorithm computes the $i$th row of $\bar C$:

    1. $f^{(n+j)} = e'_{n+j}$, where $n+j = \omega_i$.
    2. For $k = n+j, n+j-1, \ldots, n+1$:
       $f^{(k-1)} = f^{(k)}$, except
       (1) $\phi_p^{(k-1)} = \phi_p^{(k)} + \phi_k^{(k)}(\rho\xi_k/\rho\xi_p)$,
       (2) $\phi_q^{(k-1)} = \phi_q^{(k)} + \phi_k^{(k)}(\rho\xi_k/\rho\xi_q)$,
       where $\xi_k = \xi_p * \xi_q$, and $f^{(k)} = (\phi_i^{(k)})$, $1 \le i \le r$.
    3. $\bar c_i = f^{(n)}$.    (4.1)

The algorithm starts with $f^{(n+j)}$ rather than $f^{(r)}$ since $\phi_i^{(r)} = 0$ for $i > n+j$, and hence $f^{(n+j)} = f^{(r)}$. The algorithm ends with $f^{(n)}$ rather than $f^{(1)}$ since after $j$ steps $R = I_n$, and hence $f^{(n)} = f^{(1)}$.
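In code, one pass of (4.1) costs $O(r)$ operations and storage per output row. A minimal sketch (Python; hypothetical names, with the arc weights of the model problem hard-coded from Table 1):

    # Arc weights of the model problem (Table 1, data xi_1 = xi_2 = 1, xi_3 = xi_4 = 2):
    w = {(5, 1): 0.5, (5, 2): 0.5, (6, 2): -1.0, (6, 3): 2.0,
         (7, 3): 0.5, (7, 4): 0.5, (8, 5): 1.0, (8, 6): -1.0,
         (9, 6): 1.0, (9, 7): 1.0}
    steps = [('+', 1, 2), ('-', 2, 3), ('+', 3, 4), ('/', 5, 6), ('x', 6, 7)]

    def row_of_C(n, steps, w, out):
        """Row of C = [A, B] for output node xi_out, by back-substitution (4.1)."""
        f = [0.0] * (n + len(steps) + 1)      # 1-based storage; f = e'_{n+j}
        f[out] = 1.0
        for k in range(out, n, -1):           # k = n+j, n+j-1, ..., n+1
            _, p, q = steps[k - n - 1]
            f[p] += f[k] * w[k, p]            # phi_p += phi_k * (rho xi_k / rho xi_p)
            f[q] += f[k] * w[k, q]
        return f[1:]

    print(row_of_C(4, steps, w, 9))   # [0, -1, 2.5, 0.5, 0, 1, 1, 0, 1]

The printed row agrees with the row of $[\bar A, \bar B]$ for $\eta_2$ displayed above, i.e., with the corresponding row of $(-\tilde B)^{-1}[\tilde A, I_{r-n}]$ in (3.8).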
An algorithm similar to (4.1) has been proposed by Linnainmaa [7] for the determination of Taylor coefficients for accumulated absolute rounding error.

The algorithm is related to formula (2.4) on path products. This relationship is based on the observation that any path from, say, $\xi_\ell$ to $\xi_{n+j}$ must pass through some node $\xi_s$, where $\xi_s$ has $\xi_\ell$ as one of its operands. Let $S$ be the set of all such indices $s$. Then (2.4) may be expressed as

    $\rho\xi_{n+j}/\rho\xi_\ell = \sum_{s \in S} (\rho\xi_{n+j}/\rho\xi_s)(\rho\xi_s/\rho\xi_\ell)$

At each step $k = s$ of (4.1), $\phi_s^{(s)} = \rho\xi_{n+j}/\rho\xi_s$, and the updates add the term $(\rho\xi_{n+j}/\rho\xi_s)(\rho\xi_s/\rho\xi_\ell)$ into the accumulating value of $\rho\xi_{n+j}/\rho\xi_\ell$ for each operand $\xi_\ell$ of $\xi_s$.

An analogous dense computation generates the absolute-error derivatives $\partial\xi_{n+i}/\partial e_\ell$ used by Miller [10].

    Table 2. Evaluation of $\partial\xi_{n+i}/\partial e_\ell$ for the elementary operations.

Note that $\partial\xi_{n+i}/\partial e_{n+i} = \xi_{n+i}$. Computation proceeds row by row, the partial derivatives in the table having already been computed. Assuming each node type is equiprobable, one performs an average of 2.5 operations to evaluate each of the $(r+n)(r-n)/2$ elements of the derivative matrix. Rows corresponding to output nodes are subsequently used to perform the analyses. The storage required is $r(r-n)$ for the full array. In comparison to generating $\bar M$ from $M$, this technique takes 5/6 of the time and the same storage. With respect to generating $\bar M$ from $\tilde M$ by (4.1), it requires $5(r+n)/(16m)$ of the time and $r(r-n)/(r(m+2)-2n)$ of the storage.
However, algorithm (4.1) may be modified to compute the elements $\partial\xi_{n+i}/\partial e_\ell$. Since $\partial\xi_{n+i}/\partial e_\ell = \xi_\ell(\partial\xi_{n+i}/\partial\xi_\ell)$, the "path product" property of partial derivatives in (2.2) may be employed to compute the matrix elements to within a scalar factor. The modifications to algorithm (4.1) entail replacing $\rho\xi_k/\rho\xi_p$ by $\partial\xi_k/\partial\xi_p$ and $\rho\xi_k/\rho\xi_q$ by $\partial\xi_k/\partial\xi_q$, and including a final scalar multiplication step. The modified algorithm would use Table 3 to calculate the arc weights. For $\xi_i = \xi_j * \xi_k$:

                           * = +    * = -    * = x      * = /
    d xi_i / d xi_j          1        1       xi_k       1/xi_k
    d xi_i / d xi_k          1       -1       xi_j      -xi_i/xi_k

    Table 3. Arc weights for Miller

While the computations required in Table 3 are less than those of Table 1, the final scalar multiplication step in the modified algorithm outweighs this and results in an overall increase of up to 25% over the time to execute algorithm (4.1). The storage required remains the same.
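A sketch of the modification (Python; names hypothetical, with the model problem's node values hard-coded) replaces the relative weights by Table 3 and appends the scaling step:

    def table3_weights(op, a, b, result):
        if op == '+': return 1.0, 1.0
        if op == '-': return 1.0, -1.0
        if op == 'x': return b, a             # d(ab)/da = b, d(ab)/db = a
        return 1.0 / b, -result / b           # d(a/b)/da, d(a/b)/db

    def miller_row(n, steps, xi, out):
        """Row of [d xi_out / d e_l]: run (4.1) with Table 3 weights, then
        scale entry l by xi_l, since d xi / d e_l = xi_l * (d xi / d xi_l)."""
        f = [0.0] * (n + len(steps) + 1)      # 1-based storage
        f[out] = 1.0
        for k in range(out, n, -1):
            op, p, q = steps[k - n - 1]
            wp, wq = table3_weights(op, xi[p], xi[q], xi[k])
            f[p] += f[k] * wp
            f[q] += f[k] * wq
        return [f[l] * xi[l] for l in range(1, len(f))]   # final scalar step

    # Model problem: node values for xi_1 = xi_2 = 1, xi_3 = xi_4 = 2.
    xi = {1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 2.0, 6: -1.0, 7: 4.0, 8: -2.0, 9: -4.0}
    steps = [('+', 1, 2), ('-', 2, 3), ('+', 3, 4), ('/', 5, 6), ('x', 6, 7)]
    print(miller_row(4, steps, xi, 9))   # entry 3 is -10 = xi_3 * (xi_2 - 2 xi_3 - xi_4)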
The generation of the equivalent of $\bar M$ in Miller [10] for several algorithms was timed on a 360/75. The results are summarized in Table 4. Many of the algorithms are described in Miller [12]. Givens/backsub reduces a matrix to upper triangular form using Givens plane rotations, and solves the resulting system by backsubstitution.
    Algorithm                              n    r-n   m   Miller [10]   Miller w/(4.1)   Speedup
    Givens(QA=R)/backsub(Rx=Qf) (4x4)     20    172   4   0.98 sec.     0.085 sec.       11.5
    Least Squares (4x3)                   16    116   3   0.45          0.046             9.8
    Givens(QA=R) (4x4)                    16    120  10   0.49          0.092             5.3
    Cholesky(A=LLt)(Ly=f)(Ltx=y) (4x4)    14     62   4   0.16          0.030             5.3
    RINVERSE (4x4 UT)                     10     30  10   0.042         0.010             4.2

    Table 4. Timed Comparison
5. Error Analyses

5.1. Forward Analysis

A forward analysis finds a bound on the relative error in the solution, $|[\rho\eta_i]_T|$, $1 \le i \le m$. Depending on the assumptions made, the relative error in $\eta_i$ may be the result of inexact data, inexact arithmetic, or both. From the system $\bar M z = 0$, (3.5),

    $\bar t = \bar A d + \bar B g$    (5.1)

or, in terms of the matrix and vector elements, for $1 \le i \le m$,

    $[\rho\eta_i]_T = \sum_{j=1}^{n} (\rho\eta_i/\rho\xi_j)[\rho\xi_j]_L + \sum_{j=n+1}^{r} (\rho\eta_i/\rho\xi_j)[\rho\xi_j]_L$    (5.2)

Considering for the moment any single output, $\eta_i$, the relative error in $\eta_i$ can be maximized by choosing the local relative errors, $[\rho\xi_j]_L$, $1 \le j \le r$, in the following manner:

    $[\rho\xi_j]_L = 1$ if $\rho\eta_i/\rho\xi_j > 0$,
    $[\rho\xi_j]_L = c$, $|c| \le 1$, if $\rho\eta_i/\rho\xi_j = 0$,
    $[\rho\xi_j]_L = -1$ if $\rho\eta_i/\rho\xi_j < 0$.    (5.3)

This choice of the local relative errors guarantees that $[\rho\eta_i]_T$ takes on its maximum positive value. The maximum negative value of $[\rho\eta_i]_T$ has the same magnitude, and corresponds to changing the signs of $[\rho\xi_j]_L$ in (5.3). Without loss of generality, the positive value will be used.

The choice of local relative errors in (5.3) reduces (5.2) to

    $[\rho\eta_i]_T = \sum_{j=1}^{n} |\rho\eta_i/\rho\xi_j| + \sum_{j=n+1}^{r} |\rho\eta_i/\rho\xi_j|$    (5.4)

where a possibly different set of local relative errors is chosen for each $\eta_i$. Formula (5.4) shows that the relative error in $\eta_i$ is bounded by $(\|\bar a_i\|_1 + \|\bar b_i\|_1)$, where $\bar a_i^t$ and $\bar b_i^t$ are the $i$th row vectors of $\bar A$ and $\bar B$. The vector of relative error bounds, for all outputs, obtained from (5.4) is a pessimistic one in that, generally, each bound cannot be simultaneously attained.
Recall the matrices $\bar A$ and $\bar B$ for the model problem:

                xi1   xi2   xi3   xi4                 xi5  xi6  xi7  xi8  xi9
    eta1    [   1/2   3/2   -2     0  ]    eta1   [    1   -1    0    1    0 ]
    eta2    [    0    -1    5/2   1/2 ]    eta2   [    0    1    1    0    1 ]

Application of (5.4) gives a relative error bound of 7 for $\eta_1$ and corresponds to the choice of $1, 1, -1, c, 1, -1, c, 1, c$ for the local relative errors $[\rho\xi_j]_L$, $1 \le j \le 9$. Similarly, the choice of $c, -1, 1, 1, c, 1, 1, c, 1$ gives a relative error bound of 7 for $\eta_2$. Recall that these values are coefficients of $\epsilon$.
Equation (5.1) shows the contributions of data error, $d$, and computational error, $g$, to the output error. By assuming exact computation, i.e., setting $g = 0$, (5.1) reduces to $\bar t = \bar A d$, and (5.4) becomes

    $|[\rho\eta_i]_T| \le \sum_{j=1}^{n} |\rho\eta_i/\rho\xi_j|$    (5.5)

Definition 5.1: condition vector of the problem - the vector of output relative error bounds as computed by (5.5). The elements of this vector may be interpreted as the sensitivity of the outputs to input error.

For a problem with given data, the condition vector of the problem is independent of the algorithm. This fact follows from the assumption of exact computation. For the model problem with data $\xi_1 = \xi_2 = 1$, $\xi_3 = \xi_4 = 2$, the condition vector of the problem is always $(4, 4)^t$.

By assuming exact data, i.e., $d = 0$, then $\bar t = \bar B g$, and

    $|[\rho\eta_i]_T| \le \sum_{j=n+1}^{r} |\rho\eta_i/\rho\xi_j|$    (5.6)

Definition 5.2: condition vector of the algorithm - the vector of output relative error bounds as computed by (5.6). The elements of this vector may be interpreted as the sensitivity of the outputs to computational error.

Naturally, the condition vector of the algorithm varies from algorithm to algorithm. For the model problem and the given data, the algorithm employed has a condition vector of the algorithm of $(3, 3)^t$.
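Both bounds and both condition vectors are absolute row sums. A minimal sketch (Python/NumPy; the matrices are the model-problem $\bar A$ and $\bar B$ given above):

    import numpy as np

    A_bar = np.array([[0.5, 1.5, -2.0, 0.0],
                      [0.0, -1.0, 2.5, 0.5]])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0, 0.0, 1.0]])

    forward_bound = np.abs(A_bar).sum(axis=1) + np.abs(B_bar).sum(axis=1)  # (5.4)
    cond_problem = np.abs(A_bar).sum(axis=1)     # (5.5): condition vector, problem
    cond_algorithm = np.abs(B_bar).sum(axis=1)   # (5.6): condition vector, algorithm

    print(forward_bound)   # [7. 7.]
    print(cond_problem)    # [4. 4.]
    print(cond_algorithm)  # [3. 3.]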
5.2. Backward Analysis

A backward analysis attempts to find a number, $\Theta$, which measures the numerical stability of an algorithm. An algorithm is stable if it produces a solution which is the exact solution of the problem with data $\xi_i(1 + [\rho\xi_i]_L \cdot \epsilon)$, $1 \le i \le n$, where $|[\rho\xi_i]_L| \le \Theta$.

The value of $\Theta$ is found by comparing the set of output relative error vectors corresponding to any computed solution with the set of output relative error vectors produced by some perturbation of the data. The analyses in this and the next section assume that the computed solution is obtained from exact data. Let $C_k = \{v: \|v\|_\infty \le 1\}$ be the unit cube in $R^k$ (in the $\infty$-norm). Any vector, $g = (\gamma_i)$, $\gamma_i = [\rho\xi_{n+i}]_L$, $1 \le i \le r-n$, of local relative errors in the computational nodes is an element of $C_{r-n}$, since $|\gamma_i| \le 1$. From the forward analysis, a computed solution corresponding to a given $g$ has an output relative error vector $\bar B g \in R^m$. It follows that any computed solution must have an output relative error vector which is an element of $\bar B C_{r-n} = \{\bar B g: g \in C_{r-n}\}$. The set $\bar B C_{r-n}$ is a centrally symmetric convex polytope. This follows from the similar properties of $C_{r-n}$ and the fact that $\bar B$ is a linear transformation from $R^{r-n}$ to $R^m$, Householder [5]. For the model problem with given data, the polytope $\bar B C_{r-n}$ is shown below.

(Figure: the polytope $\bar B C_{r-n}$, plotted in the $([\rho\eta_1]_T, [\rho\eta_2]_T)$ plane.)

In a similar fashion the set of output relative error vectors produced by only perturbing the data may be found. Let $d = (\delta_i)$, $\delta_i = [\rho\xi_i]_L$, $1 \le i \le n$, be a vector of local relative errors in input nodes with $|\delta_i| \le 1$. Then $d \in C_n$, and the corresponding output relative error vector for the solution is $\bar A d \in R^m$. The set of all such vectors forms $\bar A C_n = \{\bar A d: d \in C_n\}$, which is, again, a centrally symmetric convex polytope. For the model problem, the polytope $\bar A C_n$ is shown below.

(Figure: the polytope $\bar A C_n$; its vertices include $(4, -3)$ and $(4, -4)$.)

To perform a backward analysis, the output relative error vector for any computed solution must be shown to be the same as one obtained by only perturbing the data. By superimposing $\bar B C_{r-n}$ and $\bar A C_n$ for the model problem, it is clear that there are output relative error vectors for computed solutions which are not output relative error vectors corresponding to input perturbations of only $1 \cdot \epsilon$.

(Figure: $\bar B C_{r-n}$ superimposed on $\bar A C_n$; part of $\bar B C_{r-n}$ lies outside $\bar A C_n$.)

Thus, the value $\Theta = 1$ is too small. The correct value of $\Theta$ is found by expanding (or contracting) $\bar A C_n$ so that it just covers $\bar B C_{r-n}$. This corresponds to allowing $d$ to lie in $\Theta C_n$, where $\Theta > 0$. $\bar A d$ is then an element of the polytope $\Theta \bar A C_n$. The value of $\Theta$ sought is $\mathrm{glb}\{\Theta > 0: \bar B C_{r-n} \subseteq \Theta \bar A C_n\}$. For the model problem, $\Theta = 2.375$, as illustrated in the diagram below.

(Figure: $\Theta \bar A C_n$ with $\Theta = 2.375$ just covering $\bar B C_{r-n}$.)

Thus, any computed solution of the model problem at the given data can be obtained by computing exactly with data $\xi_i(1 + [\rho\xi_i]_L \cdot \epsilon)$, $1 \le i \le 4$, where $|[\rho\xi_i]_L| \le 2.375$.
The amount by which $\bar A C_n$ must be expanded to cover $\bar B C_{r-n}$ is governed by only a few of the elements of $\bar B C_{r-n}$. These elements correspond to the vertices of $\bar B C_{r-n}$. From the convex properties of $\bar A C_n$ and $\bar B C_{r-n}$, if $\Theta \bar A C_n$ covers the vertices of $\bar B C_{r-n}$, then it covers all of $\bar B C_{r-n}$. Hence, attention may be limited to covering only the vertices of $\bar B C_{r-n}$. As the previous diagram illustrates, attention may, in fact, be restricted to covering only one vertex (and its symmetric mate). This vertex is the output relative error vector $\bar B g^*$ corresponding to a "worst" set $g^*$ of local relative errors in the computational nodes. By covering this worst case, all others are covered. This vertex $\bar B g^*$ is characterized by being the "farthest" point on the perimeter, or boundary, of $\bar B C_{r-n}$ from the boundary of $\bar A C_n$. Any point $\bar B g$ on the boundary of $\bar B C_{r-n}$ is covered, on expansion, by a corresponding point $\bar A d$ on the boundary of $\bar A C_n$. Both $\bar B g$ and the corresponding $\bar A d$ lie on the same radial line from the origin. Hence, $\bar B g = \Theta \bar A d$, where $\Theta$ measures the relative distance of $\bar B g$. For the model problem, the points $\bar B g^*$ and $\bar A d^*$ are illustrated below.

(Figure: $\bar A d^* = (1.26, 0.421)^t$ and $\bar B g^* = (3, 1)^t$ on the boundaries of $\bar A C_n$ and $\bar B C_{r-n}$.)

In general, there may be more than one symmetric pair of vertices on the boundary of $\bar B C_{r-n}$ whose relative distance from the boundary of $\bar A C_n$ is maximal. If this is the case, any arbitrary vertex of maximal distance may be used, since when any point on the boundary of $\bar B C_{r-n}$ of a given relative distance is covered, all points of that relative distance are also covered.
The chapter on computational details will discuss the finding of $g^*$; it will be assumed here that it has been computed. The following problem may be solved to obtain $\Theta$.

Backward analysis: find the smallest constant $\Theta > 0$ which satisfies the following conditions:

    $\bar A d = \bar B g^*$
    $|\delta_i| \le \Theta$,    $1 \le i \le n$    (5.7)

where $d = (\delta_i)$, $\delta_i = [\rho\xi_i]_L$, $1 \le i \le n$, and $g^*$ is as described above. The minimal $\Theta$ is the $\infty$-norm of that vector $d \in \Theta C_n$ of input relative errors whose image under $\bar A$ covers the "worst" output relative error vector $\bar B g^*$. $\Theta \bar A C_n$ therefore covers $\bar B C_{r-n}$.
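Problem (5.7) is a minimum $\infty$-norm solution of an underdetermined linear system, and can be solved as a small linear program. A sketch (Python/SciPy; min_inf_norm is a hypothetical helper name, and the data are the model-problem quantities):

    import numpy as np
    from scipy.optimize import linprog

    def min_inf_norm(M, rhs):
        """min ||d||_inf subject to M d = rhs: the linear program behind (5.7)."""
        m, p = M.shape
        c = np.r_[np.zeros(p), 1.0]                        # variables (d, theta)
        A_ub = np.r_[np.c_[ np.eye(p), -np.ones((p, 1))],  #  d_i - theta <= 0
                     np.c_[-np.eye(p), -np.ones((p, 1))]]  # -d_i - theta <= 0
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * p),
                      A_eq=np.c_[M, np.zeros(m)], b_eq=rhs,
                      bounds=[(None, None)] * p + [(0, None)])
        # res.status != 0 signals the inconsistent case discussed below.
        return res.x[-1], res.x[:p]

    A_bar = np.array([[0.5, 1.5, -2.0, 0.0], [0.0, -1.0, 2.5, 0.5]])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0]])
    g_star = np.array([1.0, -1.0, 1.0, 1.0, 1.0])
    theta, d = min_inf_norm(A_bar, B_bar @ g_star)
    print(theta)   # 2.375, the Theta of the model problem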
Having set up the problem (5.7), it may be found to be unsolvable. From the construction of $\bar B$ it is easy to show that its columns span $R^m$. The polytope $\bar B C_{r-n}$ therefore extends in all $m$ dimensions. If $m > n$, i.e., if there are more outputs than inputs, then the $n$ columns of $\bar A$ can do no better than to span $R^n$. The polytope $\bar A C_n$ can therefore extend in no more than $n$ dimensions. It is impossible to cover a polytope in $R^m$ with another in $R^n$, where $m > n$. This deficiency is reflected in the inconsistency of the equations in (5.7), i.e., rank$(\bar A)$ < rank$([\bar A, \bar B g^*])$. Even if $m \le n$, which is usually the case, there is no guarantee that $m$ of the columns of $\bar A$ are linearly independent, and hence, that $\bar A C_n$ extends in $m$ dimensions. Again, the situation could be the impossible task of covering a polytope in $R^m$ with another in $R^k$, where $m > k$. The following example illustrates this case.

(Figure: the computational graph for the algorithm $\eta_1 = \xi_3 = \xi_1 + \xi_2$, $\eta_2 = \xi_4 = \xi_3 \times \xi_3$.)

For $\xi_1 = \xi_2$,

    A-bar = [ 1/2  1/2 ],    B-bar = [ 1  0 ]
            [  1    1  ]             [ 2  1 ]

The polytopes $\bar A C_2$ and $\bar B C_2$ are shown below.

(Figure: $\bar A C_2$, a line segment, inside the two-dimensional polytope $\bar B C_2$.)

The polytope $\bar A C_2$ consists of the line segment joining the points $(-1, -2)$ and $(1, 2)$. The polytope $\Theta \bar A C_2$ for any $\Theta$ will consist of the line segment joining the points $\Theta(-1, -2)$ and $\Theta(1, 2)$, and cannot cover the polytope $\bar B C_2$.

The inability to always perform a backward analysis warrants a relaxation of the criterion for stability to allow an analysis which may always be done. One such analysis is presented in the next section.
5.3. B-analysis

A forward and backward analysis can be combined into what Panzer [14] calls a B- (beidseitige, i.e., two-sided) analysis. The B-analysis is along the lines of Stewart's [18] definition of stability, namely, that the computed solution is "near" the exact solution of a perturbed problem. From the forward analysis, any computed solution, assuming exact data, must have an output relative error vector

    $t_B = \bar B g$

which is an element of $\bar B C_{r-n}$. Similarly, the output relative error vector produced by perturbing only the data and computing exactly is

    $t_A = \bar A d$

which is an element of $\bar A C_n$. The backward analysis requires finding the smallest number $\Theta$ such that for any choice of $g$ in $C_{r-n}$, $\Theta' t_A = t_B$, where $\Theta' \le \Theta$. However, the B-analysis only requires that $\Theta' t_A$ be "near" $t_B$. The "nearness" of $\Theta' t_A$ to $t_B$ can be measured by the $\infty$-norm, $\Theta'$, of a vector $\Theta' \hat t$ such that

    $\Theta'(t_A + \hat t) = t_B$    (5.8)

where $\hat t = (\hat\tau_i)$, $\hat\tau_i = [\rho\eta_i]$, $|\hat\tau_i| \le 1$, $1 \le i \le m$, is a vector of relative errors which artificially perturb the output values. A constant, $\Theta$, is now sought such that for any $g$ in $C_{r-n}$, (5.8) holds for some $\Theta' \le \Theta$. By bounding both the input and output perturbations by $\Theta$, the "nearness" of the perturbed inputs to the true inputs, and the "nearness" of the computed solution to the exact solution of the perturbed problem, are measured in the same way.

The value of $\Theta$ is found by comparing sets of output relative error vectors. As before, the set of all vectors, $t_B$, is $\bar B C_{r-n}$. Let $\hat A = [\bar A, I_m]$ and $\hat d = (d^t, \hat t^t)^t$. Then $t_A + \hat t = \hat A \hat d$, where $\hat d \in C_{n+m}$. The set of all vectors, $t_A + \hat t$, is $\hat A C_{n+m}$, a centrally symmetric convex polytope. For the model problem, $\hat A C_{n+m}$ and $\bar B C_{r-n}$ are shown below.

(Figure: $\hat A C_{n+m}$ and $\bar B C_{r-n}$ for the model problem.)

The B-analysis now follows a strategy similar to that of the backward analysis. It must be shown that the output relative error vector for any computed solution is the same as that obtained by perturbing the data and the outputs. The polytope $\hat A C_{n+m}$ is expanded so that it just covers $\bar B C_{r-n}$. Thus, $\hat d$ is allowed to lie in $\Theta C_{n+m}$, where $\Theta > 0$. $\hat A \hat d$ is then an element of $\Theta \hat A C_{n+m}$. The value of $\Theta$ sought is $\mathrm{glb}\{\Theta > 0: \bar B C_{r-n} \subseteq \Theta \hat A C_{n+m}\}$. The diagram below shows that $\Theta = 1.1176$ is the desired $\Theta$ for the model problem.

(Figure: $\Theta \hat A C_{n+m}$ with $\Theta = 1.1176$ just covering $\bar B C_{r-n}$.)

Thus, any computed solution of the model problem at the given data can be obtained by computing exactly with data $\xi_i(1 + [\rho\xi_i]_L \cdot \epsilon)$, $1 \le i \le 4$, and then perturbing the outputs $\eta_i$ by $(1 + [\rho\eta_i] \cdot \epsilon)$, $1 \le i \le 2$, where $|[\rho\xi_i]_L| \le 1.1176$ and $|[\rho\eta_i]| \le 1.1176$.

As in the backward analysis, $\Theta$ is governed by a vertex $\bar B g^*$ corresponding to a "worst" set $g^*$ of local relative errors in the computational nodes. The vertex $\bar B g^*$ is one of the points on the boundary of $\bar B C_{r-n}$ of maximal relative distance from $\hat A C_{n+m}$, where distance is measured in the same way as in the backward analysis. For the model problem, $g^*$ is the same for both the backward and B-analysis, though, in general, this need not be the case.

Assuming that $g^*$ has been computed, the following problem may be solved to obtain $\Theta$.

B-analysis: find the smallest constant $\Theta > 0$ which satisfies the following conditions:

    $\hat A \hat d = \bar B g^*$
    $|\hat\delta_i| \le \Theta$,    $1 \le i \le n+m$    (5.9)

where $\hat d = (\hat\delta_i)$, $1 \le i \le n+m$, with

    $\hat\delta_i = [\rho\xi_i]_L$ for $1 \le i \le n$,
    $\hat\delta_i = \hat\tau_{i-n} = [\rho\eta_{i-n}]$ for $n+1 \le i \le n+m$,

$\hat A = [\bar A, I_m]$, and $g^*$ is as described previously. The minimum $\Theta$ is the $\infty$-norm of that vector $\hat d \in \Theta C_{n+m}$ of input and output relative errors whose image under $\hat A$ covers the "worst" output relative error vector $\bar B g^*$. $\Theta \hat A C_{n+m}$ therefore covers $\bar B C_{r-n}$.

The system of equations in (5.9) is always underdetermined since $\hat A$ is an $m \times (m+n)$ matrix. Furthermore, a vector, $\hat d = (d^t, \hat t^t)^t$, which satisfies these equations can always be found since rank$(\hat A)$ = rank$([\bar A, I_m])$ = $m$. In particular, the vector $(0, (\bar B g^*)^t)^t$ satisfies these equations, and corresponds to a forward analysis. This vector, however, does not usually give the smallest $\Theta$.
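Because $\hat A$ merely appends an identity block to $\bar A$, the B-analysis bound comes from the same linear program as the backward analysis. A minimal sketch (Python/NumPy), reusing the hypothetical min_inf_norm helper from the sketch in section 5.2:

    import numpy as np
    # min_inf_norm as defined in the sketch of section 5.2.

    A_bar = np.array([[0.5, 1.5, -2.0, 0.0], [0.0, -1.0, 2.5, 0.5]])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0]])
    g_star = np.array([1.0, -1.0, 1.0, 1.0, 1.0])

    A_hat = np.hstack([A_bar, np.eye(2)])               # [A-bar, I_m]: always rank m
    theta, d_hat = min_inf_norm(A_hat, B_bar @ g_star)  # problem (5.9)
    print(theta)   # 1.1176... (exactly 19/17) for the model problem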
6. Computational Details

6.1. The Form of g*

Before the problem for backward analysis (5.7) or the problem for B-analysis (5.9) can be solved, the vertex, $\bar B g^*$, of the computational polytope of maximal relative distance from the problem polytope must be found. The following discussion will concern the B-analysis, to which obvious modifications can be made if a backward analysis is to be attempted.

Since $\bar B$ is fixed, the problem is that of finding $g^*$. From the properties of $\bar B$ and $C_{r-n}$, $g^*$ is found to have a particular form, as shown by the following argument. Any point, $x$, in a convex set may be represented by $x = \lambda a + (1 - \lambda)b$ for some $a$ and $b$ in the set and $0 \le \lambda \le 1$. A vertex, $x$, has a unique representation of the form $x = 1 \cdot x$. Let $x$ be an element of $\bar B C_{r-n}$, i.e., $x = \sum_{i=1}^{r-n} \gamma_i \bar b_i$, where $\bar b_i$ is the $i$th column of $\bar B$ and $|\gamma_i| \le 1$.

Theorem 6.1: If $x$ is a vertex, then $|\gamma_i| = 1$, $1 \le i \le r-n$.

Proof: Assume there exists a $j$ such that $|\gamma_j| < 1$. Without loss of generality, assume $0 \le \gamma_j < 1$; an analogous proof holds for $-1 < \gamma_j < 0$. Then

    $x = \sum_{i \ne j} \gamma_i \bar b_i + \gamma_j \bar b_j$

Let $u = \sum_{i \ne j} \gamma_i \bar b_i$, so that $x = u + \gamma_j \bar b_j$. If $\gamma_j = 0$, then $x = u$. However, $(u + \bar b_j)$ and $(u - \bar b_j)$ are elements of $\bar B C_{r-n}$. Hence,

    $x = \lambda(u - \bar b_j) + (1 - \lambda)(u + \bar b_j)$, where $\lambda = 1/2$.

Thus, $x$ is a non-trivial convex combination of two points in $\bar B C_{r-n}$, which contradicts the assumption that $x$ was a vertex.

If $0 < \gamma_j < 1$, then, again, $x = u + \gamma_j \bar b_j$. However, $u$ and $(u + \bar b_j)$ are elements of $\bar B C_{r-n}$, and

    $x = \lambda u + (1 - \lambda)(u + \bar b_j)$, where $\lambda = 1 - \gamma_j$.

Again, the fact that $x$ is a vertex is contradicted. Since the choice of $j$ was arbitrary, it must be that any vertex has the form

    $x = \sum_{i=1}^{r-n} \gamma_i \bar b_i$, where $|\gamma_i| = 1$. ■

Thus, $g^*$ must have components $\gamma_i^* = \pm 1$. Intuitively, one would expect a "worst" set of computational errors to contain a maximal error for each step of the algorithm. Thus, the problem of finding $g^*$ consists only of finding the signs of its components. For the model problem, $g^* = (1, -1, 1, 1, 1)^t$.
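Theorem 6.1 licenses a brute-force search for small problems: try every sign vector and keep the one whose covering problem yields the largest $\Theta$. A sketch (Python/NumPy), again reusing the hypothetical min_inf_norm helper of section 5.2:

    import itertools
    import numpy as np
    # min_inf_norm as defined in the sketch of section 5.2.

    A_hat = np.hstack([np.array([[0.5, 1.5, -2.0, 0.0],
                                 [0.0, -1.0, 2.5, 0.5]]), np.eye(2)])
    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0]])

    best = max((np.array(s) for s in itertools.product((1.0, -1.0), repeat=5)),
               key=lambda g: min_inf_norm(A_hat, B_bar @ g)[0])
    print(best)   # [1, -1, 1, 1, 1]: the g* of the model problem (up to sign)

The next section shows how to shrink the number of sign patterns that must be tried.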
6.2. Simplification of Polytope Representation

For efficiency, it will be of advantage to reduce the number of columns in $\hat A$ and $\bar B$. The following argument shows how this may be done. Let $v$ be a vector formed by the sum of $j$ vectors, $v_i$, $1 \le i \le j$, which are in the same or opposite direction. Then

    $v = \sum_{i=1}^{j} \alpha_i v_i$, with $|\alpha_i| \le 1$    (6.1)

where, for $1 \le i \le j$,

    $v_i = (-1)^{\pi_i} \beta_i v_1$, with $(-1)^{\pi_i} = \mathrm{sign}(v_i^t v_1)$ and $\beta_i = \|v_i\|_\infty / \|v_1\|_\infty > 0$.

Then $v$ may be represented by

    $v = \alpha w$, with $|\alpha| \le 1$    (6.2)

where

    $\alpha = \Big(\sum_{i=1}^{j} \alpha_i (-1)^{\pi_i} \beta_i\Big) \Big/ \Big(\sum_{i=1}^{j} \beta_i\Big)$    (6.3)

    $w = \sum_{i=1}^{j} (-1)^{\pi_i} v_i$    (6.4)

Formulas (6.1) through (6.4) provide the basis for condensing $\bar B$ to a matrix $\hat B$ having $k_1 \le r-n$ columns, $\hat b_i$. In a similar fashion, the matrix $\hat A = [\bar A, I_m]$ may be condensed to a matrix $\check A$ having $k_2+m \le n+m$ columns, $\check a_j$; its last block, a matrix $D$, replaces $I_m$, allowing for the case where some of the columns of $\bar A$ are multiples of $e_i$, the columns of $I_m$.

Following (6.4), the matrix $\bar B$ is condensed to $\hat B$ by post-multiplication by an $(r-n) \times k_1$ matrix $P_B$,

    $\hat B = \bar B P_B$    (6.5)

where $P_B$ has elements

    $p_{ij} = +1$ if $\bar b_i$ is condensed into $\hat b_j$,
    $p_{ij} = -1$ if $-\bar b_i$ is condensed into $\hat b_j$,
    $p_{ij} = 0$ otherwise.

The matrix $P_B$ has the following properties:

    (1) each row has only one non-zero element,
    (2) the columns are orthogonal,
    (3) $P_B^t P_B = D_B$, a diagonal matrix whose $i$th diagonal element equals the number of columns of $\bar B$ which are condensed into $\hat b_i$.

For the model problem, (6.5) reveals

    [ 2  -1  0 ]  =  [ 1  -1  0  1  0 ]  [ 1  0  0 ]
    [ 0   1  2 ]     [ 0   1  1  0  1 ]  [ 0  1  0 ]
                                         [ 0  0  1 ]
                                         [ 1  0  0 ]
                                         [ 0  0  1 ]
These reductions have not changed the polytopes, and the following proof shows $\hat B C_{k_1} = \bar B C_{r-n}$. A similar proof would show that $\check A C_{k_2+m} = \hat A C_{n+m}$.

Theorem 6.2: $\hat B C_{k_1} = \bar B C_{r-n}$.

Proof: a) Let $x = \hat B \hat g$, where $\hat g = (\hat\gamma_i)$, $1 \le i \le k_1$, with $|\hat\gamma_i| \le 1$. Then $x \in \hat B C_{k_1}$. From (6.5),

    $x = \hat B \hat g = \bar B P_B \hat g$

Since $P_B$ has only one non-zero element per row and this non-zero element has magnitude one, it follows that $\|P_B \hat g\|_\infty \le 1$. Hence, there exists a $g \in C_{r-n}$ (namely $g = P_B \hat g$) such that $x = \bar B g$. Thus, $x \in \bar B C_{r-n}$, and $\hat B C_{k_1} \subseteq \bar B C_{r-n}$.

b) Let $x = \bar B g$, where $g = (\gamma_i)$, $1 \le i \le r-n$, with $|\gamma_i| \le 1$. Then $x \in \bar B C_{r-n}$. Let the columns of $\bar B$ be partitioned according to the sets $J_i$, $1 \le i \le k_1$, where $j \in J_i$ if $\bar b_j$ or $-\bar b_j$ is condensed into $\hat b_i$. Then

    $x = \sum_{j \in J_1} \gamma_j \bar b_j + \sum_{j \in J_2} \gamma_j \bar b_j + \cdots + \sum_{j \in J_{k_1}} \gamma_j \bar b_j$

In each sum, $\sum_{j \in J_i} \gamma_j \bar b_j$, all the vectors, $\bar b_j$, are in the same or opposite direction, and $|\gamma_j| \le 1$. Each sum has the form given by (6.1), and may be substituted by $\hat\gamma_i \hat b_i$, where $\hat\gamma_i$ and $\hat b_i$ are computed via formulas analogous to (6.3) and (6.4). The formula analogous to (6.4) is (6.5). The formula analogous to (6.3) is

    $\hat g = Q_B g$

where $Q_B$ is a $k_1 \times (r-n)$ matrix with elements

    $q_{ij} = \|\bar b_j\|_\infty / \|\hat b_i\|_\infty$ if $\bar b_j$ is condensed into $\hat b_i$,
    $q_{ij} = -\|\bar b_j\|_\infty / \|\hat b_i\|_\infty$ if $-\bar b_j$ is condensed into $\hat b_i$,
    $q_{ij} = 0$ otherwise.

The matrix $Q_B$ has the following properties:

    (1) each column has only one non-zero element,
    (2) the rows are orthogonal,
    (3) each absolute row sum equals one; this follows from $\|\hat b_i\|_\infty = \sum_{j \in J_i} \|\bar b_j\|_\infty$,
    (4) $Q_B Q_B^t = D_B'$, a diagonal matrix,
    (5) $Q_B$ has the same structure as $P_B^t$, i.e., corresponding elements of $Q_B$ and $P_B^t$ are either both zero, both positive, or both negative,
    (6) $Q_B P_B = I_{k_1}$; this follows immediately from properties (3) and (5).

For the model problem, and $g = (1, -1, 1, 1, 1)^t$, the corresponding $\hat g$ is given below:

    g-hat  =  [ 1/2   0    0   1/2   0  ] (  1 )  =  (  1 )
              [  0    1    0    0    0  ] ( -1 )     ( -1 )
              [  0    0   1/2   0   1/2 ] (  1 )     (  1 )
                                          (  1 )
                                          (  1 )

Thus, performing the above substitutions, $x = \sum_{i=1}^{k_1} \hat\gamma_i \hat b_i$, with $|\hat\gamma_i| \le 1$. Hence, $x \in \hat B C_{k_1}$, and $\bar B C_{r-n} \subseteq \hat B C_{k_1}$. ■
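In code, the condensation is a single pass that groups parallel columns and sums them onto a representative direction, per (6.4) and (6.5). A minimal sketch (Python/NumPy; hypothetical names; both condensations are shown for the model problem, the second being justified by the similar argument below):

    import numpy as np

    def condense(M, tol=1e-12):
        """Group mutually parallel columns of M; sum each group onto the
        direction of its first member, flipping opposite columns, as in (6.4)/(6.5)."""
        reps, sums = [], []
        for col in M.T:
            for i, rep in enumerate(reps):
                # parallel test: |rep . col| equals the product of the norms
                if abs(abs(rep @ col) - np.linalg.norm(rep) * np.linalg.norm(col)) < tol:
                    sums[i] += col if rep @ col > 0 else -col
                    break
            else:
                reps.append(col.copy())
                sums.append(col.copy())
        return np.column_stack(sums)

    B_bar = np.array([[1.0, -1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0]])
    A_hat = np.array([[0.5, 1.5, -2.0, 0.0, 1.0, 0.0], [0.0, -1.0, 2.5, 0.5, 0.0, 1.0]])
    print(condense(B_bar))   # [[2, -1, 0], [0, 1, 2]]: B-hat with k1 = 3 columns
    print(condense(A_hat))   # the k2+m = 4 columns of the condensed [A-bar, I], up to order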
Similar arguments reveal that any point $x$ in $\hat A C_{n+m}$ of the form $\hat A \hat d = \sum_{i=1}^{n+m} \hat\delta_i \hat a_i$ can be expressed as an element of $\check A C_{k_2+m}$ of the form $\check A \check d = \sum_{i=1}^{k_2+m} \check\delta_i \check a_i$, and vice versa. Hence $\hat A C_{n+m} = \check A C_{k_2+m}$.

For the model problem,

    A-check = [ 3/2  -2   3/2   0  ],    B-hat = [ 2  -1  0 ]
              [ -1   5/2   0   3/2 ]             [ 0   1  2 ]

where $k_2+m = 4$ and $k_1 = 3$. The following table shows the often dramatic reductions for a battery of problems and algorithms; see the Appendix for descriptions.
    ALGORITHM     m    r-n   k1    n+m   k2+m
    BACK4         3     12    3     13     6
    SEMI4         3     12    3     13     6
    PARA4         3     14    4     13     6
    BACK8         7     56    7     43    15
    SEMI8         7     56    7     43    15
    PARA8         7     86   11     43    15
    GS           12     81   26     24    19
    MGS          12     81   28     24    19
    LTBAND        7     86   10     37    15
    RINVERSE     10     30   10     20    19
    GJ            4     76    7     24     8
    GE            4     62    6     24     8
    CHOL          4     62   13     18    14
    MATMUL       16    112   16     48    48
    CG            4    122   32     12     9

    Table 5. Simplification of Polytope Representation
As the table illustrates, the reduction of $\bar B$ to $\hat B$ occasionally results in $k_1 = m$, in which case $\hat B$ is, to within column permutation, a lower triangular matrix. The following theorem gives sufficient conditions for this to occur.

Theorem 6.3: If the computational graph is a binary tree, i.e., each node is an operand of no more than one node, then $\hat B$ is lower triangular.

Proof: For each node $\eta_j$, $1 \le j \le m$, define

    $S_j = \{\xi_{n+i}: \rho\eta_j/\rho\xi_{n+i} \ne 0$ and $\rho\eta_k/\rho\xi_{n+i} = 0$ for all $k < j\}$

A node belongs to $S_j$ if $\eta_j$ is the first output node encountered on the unique path from it to the root of the tree, $\eta_m$. Every computational node appears in one and only one set, and each set is non-empty, since $\eta_j$ belongs to $S_j$. Let the columns of $\bar B$ be partitioned according to the sets $S_j$, i.e., column $i$ is in partition block $j$ if $\xi_{n+i}$ is in $S_j$. Consider the columns corresponding to $S_1 = \{\xi_{n+i}: \rho\eta_1/\rho\xi_{n+i} \ne 0\}$. For any $\xi_{n+i}$ in $S_1$, the corresponding column in $\bar B$ is

    $(\rho\eta_1/\rho\xi_{n+i}, \rho\eta_2/\rho\xi_{n+i}, \ldots, \rho\eta_m/\rho\xi_{n+i})^t = v_1 \cdot (\rho\eta_1/\rho\xi_{n+i})$

where $v_1 = (\rho\eta_1/\rho\eta_1, \rho\eta_2/\rho\eta_1, \ldots, \rho\eta_m/\rho\eta_1)^t$. This follows from the fact that the graph is a tree, and hence $\rho\eta_j/\rho\xi_{n+i} = (\rho\eta_j/\rho\eta_1)(\rho\eta_1/\rho\xi_{n+i})$, since there is at most one path from $\xi_{n+i}$ to $\eta_j$. Thus, all the columns of $\bar B$ corresponding to $S_1$ are multiples of $v_1$, and the reduction of $\bar B$ may choose the first column of $\hat B$ to be

    $v_1 \cdot \sum_{\xi_{n+i} \in S_1} |\rho\eta_1/\rho\xi_{n+i}|$.

Following similar arguments, the column of $\bar B$ corresponding to $\xi_{n+i}$ in $S_j$ is $v_j \cdot (\rho\eta_j/\rho\xi_{n+i})$, where

    $v_j = (0, \ldots, 0, \rho\eta_j/\rho\eta_j, \rho\eta_{j+1}/\rho\eta_j, \ldots, \rho\eta_m/\rho\eta_j)^t$

with $j-1$ leading zeros. Thus, the $j$th column of $\hat B$ is $v_j \cdot \sum_{\xi_{n+i} \in S_j} |\rho\eta_j/\rho\xi_{n+i}|$. From the form of $v_j$, it follows that $\hat B$ is lower triangular. ■

A similar result for $\check A$, the condensed matrix for $\hat A$, does not follow. For binary trees, $n > r-n \ge m$, and seemingly the possibility exists for reducing the $n$ columns of $\bar A$ to $m$ columns. The matrix $\check A$, then being lower triangular, would be of rank $m$, and this fact would guarantee that a backward analysis could be done. The argument below shows why this is not always true. The following example provides an illustration.

(Figure: computational graph for the example, together with its matrices $\bar A$ and $\check A$.)

In theorem 6.3, each set $S_j$, $1 \le j \le m$, is non-empty, and hence there is a condensed column for each set. For this example, $S_1$ is empty, characterizing the fact that each path from an input node to $\eta_1$ encounters another output node first. Thus, there is no condensed column corresponding to $S_1$, and $\check A$ has fewer than $m$ columns. With the additional assumption that all the sets, $S_j$, $1 \le j \le m$, are non-empty, theorem 6.3 will hold for $\check A$.
6.3. Algorithms for g*

The preceding section has shown that the size of the problem for B-analysis (5.9) may be reduced to the following:

    minimize $\Theta$
    subject to $\check A \check d = \hat B \hat g^*$    (6.6)
               $|\check\delta_i| - \Theta \le 0$,    $1 \le i \le k_2+m$

where $\check d$ has components $\check\delta_i$ representing, according to the reduction of $\hat A$, combined input and injected output relative errors, and $\hat g^*$ has components $\hat\gamma_i^*$, $1 \le i \le k_1$, representing, according to the reduction of $\bar B$, combined relative errors for the "worst" computation, with $|\hat\gamma_i^*| = 1$. Thus, algorithms for $\hat g^*$, rather than $g^*$, will be presented.

The $\hat g^*$ which is sought is that $\hat g$ in $C_{k_1}$ which gives the largest solution to (6.6). The problem of finding $\hat g^*$ is a minimax problem. Unfortunately, algorithms for finding the global solution to this type of problem are often tedious. Because of the special form of $\hat g^*$, finite, though often lengthy, algorithms can be constructed to find the global solution to (6.6). Two such algorithms will be presented.

The first algorithm consists of solving (6.6) for all possible $\hat g$, and selecting that $\hat g$ which gives the largest $\Theta$. Since each component of $\hat g^*$ is $\pm 1$, and there are $k_1$ components, (6.6) must be solved $2^{k_1}$ times. The reason for the reduction of $\bar B$ to $\hat B$ in the last section now becomes apparent.

For a given $\hat g$, the solution of (6.6) is a linear program. As it stands, (6.6) is not in standard form, since the variables $\check\delta_i$ are not constrained to be non-negative. A standard procedure for handling this case is to replace each inequality $|\check\delta_i| \le \Theta$ by the two inequalities $\check\delta_i^+ - \check\delta_i^- - \Theta \le 0$ and $\check\delta_i^+ - \check\delta_i^- + \Theta \ge 0$, where $\check\delta_i^+$ and $\check\delta_i^-$ are non-negative variables representing the positive and negative parts of $\check\delta_i$. This procedure practically doubles the number of variables and constraints, and increases the storage required by a factor of four.

An alternative procedure consists of a change of variables. Define the non-negative variables $\check\delta_i' = \check\delta_i + \Theta$, i.e., $\check d' = \check d + \Theta e_{k_2+m}$, where $e_{k_2+m}$ is a $(k_2+m)$-vector each of whose elements is 1. Then the constraint $|\check\delta_i| \le \Theta$ becomes $\check\delta_i' - 2\Theta \le 0$, or $\check d' - 2\Theta e_{k_2+m} \le 0$. Substituting $\check d = \check d' - \Theta e_{k_2+m}$, the constraint $\check A \check d = \hat B \hat g^*$ becomes $\check A \check d' - \Theta w = \hat B \hat g^*$, where $w$ is the $m$-vector of row sums $w_i = \sum_{j=1}^{k_2+m} \check a_{ij}$. Thus, (6.6) has become

    minimize $\Theta$
    subject to $\check A \check d' - \Theta w = \hat B \hat g^*$    (6.7)
               $\check d' - 2\Theta e_{k_2+m} \le 0$

However, rather than solving (6.7), tremendous savings in time and storage can be had by considering a dual problem, Ascher [1], whose notation will be employed. Given a candidate for $\hat g^*$, let $\check d$ be a feasible solution to (6.6), i.e., $\check d$ satisfies $\check A \check d = \hat B \hat g^*$. Define the quantities $\gamma = 1/\|\check d\|_\infty = 1/\Theta$ and $K_0 = \|\check A\|_\infty / \|\hat B \hat g^*\|_\infty$. Choose any $K > K_0$ and define the $(k_2+m+1)$-vector $y$ with components $y_i = \gamma \check\delta_i$, $1 \le i \le k_2+m$, and $y_{k_2+m+1} = \gamma/K$. Since $\gamma \le K_0 < K$, $\|y\|_\infty \le 1$. Let $\check A' = [\check A, -K \hat B \hat g^*]$ be an $m \times (k_2+m+1)$ matrix, and $c$ be a $(k_2+m+1)$-vector whose elements are zero except for the last component, which is 1. Then (6.6) is equivalent to the linear program:

    max $c^t y$
    subject to $\check A' y = 0$    (6.8)
               $\|y\|_\infty \le 1$

The dual of (6.8), Rabinowitz [15], may be stated as

    min $\|r\|_1$
    subject to $\check A'^t a + r = c$    (6.9)

where $\check A'$ and $c$ are as described above. What is claimed to be the most efficient algorithm yet devised for solving (6.9) is provided by Barrodale and Roberts [2, 3]. The outstanding features of the algorithm include: (1) a condensed tableau; only $\check A'$ and $c$ are explicitly stored; (2) no phase 1, i.e., an initial basic feasible solution is available, $r = c$; (3) the ability to pass through several vertices in a single iteration. The following equalities show how $\Theta$ is retrieved from the solution of (6.9):

    $\beta_1 = \min \|r\|_1 = \max c^t y = y_{k_2+m+1} = \gamma/K = 1/(\Theta K)$.

Hence, $\Theta = 1/(K \beta_1)$.
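For reference, the primal (6.7) is itself a small linear program and can be solved directly with a generic LP code. The following sketch (Python/SciPy; a general solver standing in for the Barrodale-Roberts l1 routine, names hypothetical) evaluates $\Theta$ for the model problem's condensed data:

    import numpy as np
    from scipy.optimize import linprog

    def theta_primal(A_chk, rhs):
        """Solve (6.7): min theta s.t. A d' - theta*w = rhs, 0 <= d'_i <= 2*theta,
        where w holds the row sums of A.  Variables are (d', theta)."""
        m, p = A_chk.shape
        w = A_chk.sum(axis=1)
        c = np.r_[np.zeros(p), 1.0]
        A_eq = np.c_[A_chk, -w]
        A_ub = np.c_[np.eye(p), -2.0 * np.ones((p, 1))]   # d'_i - 2*theta <= 0
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(p), A_eq=A_eq, b_eq=rhs,
                      bounds=[(0, None)] * (p + 1))
        return res.x[-1]

    A_chk = np.array([[1.5, -2.0, 1.5, 0.0], [-1.0, 2.5, 0.0, 1.5]])
    B_hat = np.array([[2.0, -1.0, 0.0], [0.0, 1.0, 2.0]])
    print(theta_primal(A_chk, B_hat @ np.array([1.0, -1.0, 1.0])))
    # 1.1176..., agreeing with the B-analysis of section 5.3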
The following table shows the saving in time of (6.9) over (6.7) for a battery of problems and algorithms. The time, in seconds, is for the evaluation of $\Theta$ for one candidate for $\hat g^*$. It is assumed that $\check A$ and $\hat B \hat g^*$ have previously been computed.
    ALGORITHM    PRIMAL (6.7)    DUAL (6.9)
    BACK4            0.23           0.02
    SEMI4            0.23           0.02
    PARA4            0.19           0.01
    BACK8            3.91           0.10
    SEMI8            3.77           0.09
    PARA8            2.10           0.11
    GS               1.12           0.15
    MGS              1.00           0.15
    LTBAND           1.82           0.20
    RINVERSE         0.78           0.12
    GJ               0.55           0.03
    GE               0.65           0.03
    CHOL             0.38           0.02
    MATMUL           5.79           0.58
    CG               0.15           0.01

    Table 6. Primal and Dual Solution Timings
52
While the first method for finding g* searches all possible g
in C. for that g which gives the largest solution to (6.9), a second
method searches all the faces of A C. . for that face which will cover
kp+m
Bg*. The representation of A C. + is changed to a more standard represen-
tation which is in terms of the hyperplanes which support the polytope. A
hyperplane, Householder [5], divides R into two half spaces and is repre-
sented by the equation v x = w, where v is an m-vector which is normal
to the hyperplane, and wis a constant which is related, by the length of
v, to the Euclidean distance of the hyperplane from the origin. A hyper-
plane supports the polytope if there is at least one point in common be-
tween the two, and all the points of the polytope lie on the hyperplane or
in only one of the half spacesi The points x in the polytope satisfy v x <_ 0).
Since the polytope is centrally symmetric, there is a corresponding support
hyperplane v x = -wand points in the polytope satisfy v x > -cj. Collectively,
the points in the polytope satisfy |v x| <_ w. The particular support hyper-
planes of interest intersect the (m-l)-dimensional faces of the polytope.
The points on a face of the polytope A C. +m have a particular form.
:: k+m : : :
Each vertex of the face is of the form Ad = 2 6. a. with |5.| = 1. Points
i=l ] n n
on the face are convex combinations of these vertices. A point on the face
supported by a hyperplane with normal v may be represented by
Ad = „£ 6. a. + .2 5. a. (6.10)
v l a.f0 v a^O
53
where the sum is split according to whether the projection of a. on v is
zero. AH of the points on the face have the same projection on v. The
following illustration shows a typical two-dimensional face of an A C. + c
3 . '
R , where, in this case, three of the columns of A are orthogonal to v,
origin
and y is the first sum in (6.10). The points on the face are symmetric around
the point y, which may be called the "center" of the face. The coefficients,
5 ., for y are given by
f f
+1 if v a. >
i ■
-1 if v a. <
V. 1
If this were not so, then it would be possible to construct a point, y',
which would have a projection on v which is greater than that of the points
on the face. The origin and y' would then lie in different half-spaces,
contradicting the fact that the hyperplane supports the polytope on this
54
face. The coefficients, 6., in the second sum of (6.10) satisfy |6.| <_ 1
and are related to the representations of a point on the face in terms of
convex combinations of the vertices of that face.
Since a face is (m-1)-dimensional, there must be at least m-1 terms in the second sum of (6.10). Each face therefore corresponds to a set of at least m-1 columns of $\bar{A}$. These vectors must span an (m-1)-dimensional subspace of $R^m$, and hence attention is restricted to combinations of m-1 columns of $\bar{A}$ which are linearly independent. While there may be more than m-1 columns of $\bar{A}$ orthogonal to $v$, any subset of m-1 linearly independent vectors will suffice to determine the face and the support hyperplane.

Suppose m-1 linearly independent columns of $\bar{A}$ are given, say $a_1, a_2, \ldots, a_{m-1}$. Then the condition $v^t a_i = 0$, $1 \le i \le m-1$, determines $v$ to within a constant multiplier. Requiring that $\|v\|_2 = 1$ determines a specific $v$. Having found $v$, the corresponding $\omega$ is determined by finding a point which lies on the face. Such a point, $y$, has been determined above. The value of $\omega$ is then $|v^t y|$. Thus, for each set of m-1 linearly independent columns of $\bar{A}$, the corresponding symmetric pair of support hyperplanes can be determined.
The points of the polytope must satisfy $|v^t x| \le \omega$. If $k_3 \le \binom{k_2+m}{m-1}$ is the number of support hyperplane pairs, then the polytope may be represented by those points $x$ which satisfy $|Vx| \le \omega$, where $V$ is a $k_3 \times m$ matrix whose rows, $v_i^t$, are the normals of the support hyperplanes, and $\omega$ is a $k_3$-vector of the corresponding $\omega$'s. Often $\binom{k_2+m}{m-1}$ is much less than $\binom{n+m}{m-1}$, and the reason for the reduction of $A$ to $\bar{A}$ now becomes apparent. It will be convenient to work with a normalized version of these inequalities, $|Vx| \le e$, where $e = (1,1,\ldots,1)^t$, in which each row has been divided by the corresponding right hand side. A sketch of this construction follows.
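The sketch below is a minimal illustration, not the thesis routine: it enumerates the (m-1)-subsets of columns and assembles the normalized matrix $V$ of $|Vx| \le e$, computing each normal from a singular value decomposition rather than the QR factorization used later in this section. The function name and tolerance are assumptions, and duplicate hyperplanes arising from different subsets of the same face are not pruned.

import numpy as np
from itertools import combinations

def support_representation(A_bar, tol=1e-12):
    # A_bar is m-by-(k2+m); each linearly independent (m-1)-subset of
    # its columns spans an (m-1)-face and yields one support normal.
    m = A_bar.shape[0]
    rows = []
    for idx in combinations(range(A_bar.shape[1]), m - 1):
        U, s, _ = np.linalg.svd(A_bar[:, list(idx)])
        if s[-1] <= tol:                  # columns dependent: no (m-1)-face
            continue
        v = U[:, -1]                      # unit normal, v^t a_i = 0 on the subset
        proj = A_bar.T @ v
        signs = np.where(np.abs(proj) > tol, np.sign(proj), 0.0)
        omega = v @ (A_bar @ signs)       # = |v^t y|, with y the face "center"
        if omega > tol:
            rows.append(v / omega)        # scale so the face lies at |v^t x| = 1
    return np.asarray(rows)               # rows of V in |Vx| <= e

The cost of this exhaustive enumeration grows as $\binom{k_2+m}{m-1}$, which is why the reduction from $A$ to $\bar{A}$ matters.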
The polytope $\bar{B}C_{k_1}$ is covered by expanding $\bar{A}C_{k_2+m}$ by a factor $\Theta$. Corresponding to the expanded polytope $\bar{A}\Theta C_{k_2+m}$ is the hyperplane representation

$|Vx| \le \Theta e$    (6.11)

in which the support hyperplanes have been moved out away from the origin by a factor $\Theta$. Given any $x$ in $R^m$, the smallest $\Theta$ for which (6.11) holds is the norm of $x$, where here, instead of a unit ball or cube, there is a unit polytope. Of interest are the norms of the points which lie in $\bar{B}C_{k_1}$. Such points are of the form $\bar{B}g = \sum_{i=1}^{k_1} \gamma_i b_i$, with $|\gamma_i| \le 1$. The particular $x = \bar{B}g^*$ is sought which gives the largest $\Theta$. It is known that this $x$ is a vertex of $\bar{B}C_{k_1}$, and that $|\gamma_i^*| = 1$, $1 \le i \le k_1$. Thus, the smallest $\Theta$ is sought for which any choice of $|\gamma_i| \le 1$ generates an $x = \bar{B}g$ which satisfies (6.11). Let $U = V\bar{B}$ be a $k_3 \times k_1$ matrix with elements $u_{ij}$. Then it must be that $|Ug| \le \Theta e$. Let $u_i^t$ be the i-th row of $U$. Then, since $|\gamma_i| \le 1$, $1 \le i \le k_1$, $|u_i^t g| \le \|u_i\|_1$, with equality when the components of $g$ are chosen to have magnitude 1 and the sign of the corresponding elements of $u_i$. Let $J$ be that subscript for which $u_i^t \hat{g}_i$ is maximum, i.e., $u_J^t \hat{g}_J = \max_i u_i^t \hat{g}_i$, where $\hat{g}_i$ is that $g$ chosen in the above manner for each row $u_i^t$. Then $g^* = \hat{g}_J$, with elements $\gamma_i^* = \mathrm{sign}(u_{Ji})$, $1 \le i \le k_1$, and $\Theta = u_J^t g^*$.
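In outline, the search just described amounts to scanning the rows of $U = V\bar{B}$ for the largest 1-norm. A minimal sketch, assuming the normalized V from above and NumPy (the function name is hypothetical):

import numpy as np

def find_g_star(V, B_bar):
    # For each row u of U = V B_bar, the best g has components of
    # magnitude 1 with the signs of u, so that u^t g = ||u||_1.
    theta, g_star = -np.inf, None
    for u in V @ B_bar:           # U may be formed one row at a time
        g = np.where(u >= 0.0, 1.0, -1.0)
        if u @ g > theta:
            theta, g_star = u @ g, g
    return theta, g_star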
If the $d \in \Theta C_{k_2+m}$ corresponding to $\bar{B}g^*$ is desired, additional computations need to be performed. Let $v_J^t$ be the J-th row of $V$, where $u_J^t = v_J^t \bar{B}$ and $J$ is as defined above. Thus, $v_J$ is the normal to the hyperplane which supports the face of $\bar{A}\Theta C_{k_2+m}$ on which $\bar{B}g^*$ lies. Hence, $\bar{B}g^*$ may be represented as in (6.10). For the first sum in (6.10), $|\delta_i| = \Theta$, where the sign is determined by the sign of the corresponding $v_J^t a_i$. This follows from the fundamental theorem of linear programming, which states that the inactive variables take on their extreme values. The values of $\delta_i$ in the second sum are determined by solving an $m \times (m-1)$ consistent system of equations involving those columns of $\bar{A}$ for which $v_J^t a_i = 0$, with right hand side $(\bar{B}g^* - y)$, where $y$ is now the center of the supported face of the expanded polytope. If the number of columns of $\bar{A}$ for which $v_J^t a_i = 0$ is greater than m-1, then the problem becomes a linear program.
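For the non-degenerate case, in which exactly m-1 columns of $\bar{A}$ are orthogonal to $v_J$, this recovery step can be sketched as follows; the function name and tolerance are assumptions, and the degenerate case, which the text notes becomes a linear program, is not handled.

import numpy as np

def recover_delta(A_bar, v_J, Bg_star, theta, tol=1e-10):
    # Coefficients delta of Bg* in the representation (6.10) on the
    # face of the expanded polytope supported by v_J.
    proj = A_bar.T @ v_J
    on_face = np.abs(proj) < tol                         # columns with v_J^t a_i = 0
    delta = np.empty(A_bar.shape[1])
    delta[~on_face] = theta * np.sign(proj[~on_face])    # first sum: |delta_i| = theta
    y = A_bar[:, ~on_face] @ delta[~on_face]             # center of the supported face
    # Second sum: the system is consistent, so least squares solves it exactly.
    delta[on_face], *_ = np.linalg.lstsq(A_bar[:, on_face], Bg_star - y, rcond=None)
    return delta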
In an implementation of this algorithm, the matrix $U$ need not be stored. It may be computed one row at a time, and a variable may hold the largest value of $u_i^t \hat{g}_i$ computed so far. Additionally, the normals, $v_j$, $1 \le j \le k_3$, may easily be computed in the following manner. Let $\hat{A}$ be an $m \times (m-1)$ matrix whose columns are some m-1 columns of $\bar{A}$. Then $\hat{A} = QR$, where $R$ is $m \times (m-1)$ and upper triangular, so that the last row of $R$ is zero. If rank($R$) = m-1, then the m-1 columns of $\hat{A}$ are linearly independent and hence determine an (m-1)-dimensional face. The normal to the hyperplane supporting this face is $v = Qe_m$, where $e_m$ is the last column of the identity matrix of order m, and $\|v\|_2 = 1$.
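A sketch of this computation in NumPy (the rank test via the diagonal of R and the tolerance are assumptions):

import numpy as np

def face_normal(A_hat, tol=1e-12):
    # Normal of the hyperplane through the face spanned by the m-1
    # columns of A_hat (m-by-(m-1)), via a full QR factorization.
    Q, R = np.linalg.qr(A_hat, mode='complete')   # Q is m-by-m, R is m-by-(m-1)
    if np.min(np.abs(np.diag(R))) <= tol:         # rank(R) < m-1: no face
        return None
    return Q[:, -1]                               # v = Q e_m, with ||v||_2 = 1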
When m is greater than 20, it is computationally advantageous to compute the QR decomposition and the normal $v$ in a different manner. Rather than computing a new QR decomposition for each combination of m-1 columns of $\bar{A}$, the combinations may be ordered in such a way that the factorization of the next combination may be easily computed from that of the previous one. As before, let $\hat{A} = QR$, where $\hat{A}$ consists of some m-1 columns of $\bar{A}$. Let $\hat{A}'$ be the same as $\hat{A}$ except that column k has been replaced by a column, $a_j$, of $\bar{A}$, so as to form a new combination. The revolving-door algorithm given by Nijenhuis and Wilf [13] specifies a sequence of replacements which will generate all $\binom{k_2+m}{m-1}$ combinations. Then $\hat{A}' = QR'$, where $R'$ is the same as $R$ except for the k-th column, which is now the full vector $Q^t a_j$. The matrix $R'$ may be reduced to the form of $R$ by premultiplication by an orthogonal matrix $P$ consisting of a product of Givens plane rotations,

$P = G'_{m-1} \cdots G'_{k+1} G_{k+1} \cdots G_m$

where $G_i$ introduces a zero in the i-th row of column k, and $G'_i$ introduces a zero in the (i+1)-st row of column i. The QR factorization of $\hat{A}'$ is then $\hat{A}' = (QP^t)(PR')$, or $\hat{A}' = Q'R'$, and the normal vector corresponding to this new combination is $v' = Q'e_m$. This method has the disadvantages of requiring extra storage for $Q$ in unfactored form, and of a gradual loss of orthogonality in the columns of $Q$.
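A dense-matrix sketch of one such replacement step, assuming NumPy; the function name and the explicit accumulation of the rotations into Q are illustrative only, and the two loops correspond to the $G_i$ and $G'_i$ sweeps above.

import numpy as np

def replace_column(Q, R, k, a_new):
    # Update A_hat = Q @ R (Q m-by-m orthogonal, R m-by-(m-1) upper
    # triangular) after column k of A_hat is replaced by a_new.
    m = Q.shape[0]
    Q, R = Q.copy(), R.copy()
    R[:, k] = Q.T @ a_new                 # new column expressed in the Q basis

    def zero(i, j, col):
        # Givens rotation on rows i and j of R, chosen to zero R[j, col];
        # the rotation is accumulated into Q so that Q @ R is unchanged.
        r = np.hypot(R[i, col], R[j, col])
        if r == 0.0:
            return
        c, s = R[i, col] / r, R[j, col] / r
        G = np.array([[c, s], [-s, c]])
        R[[i, j], :] = G @ R[[i, j], :]
        Q[:, [i, j]] = Q[:, [i, j]] @ G.T

    for i in range(m - 1, k, -1):         # G_i sweep: zero R[i, k], i = m-1 .. k+1
        zero(i - 1, i, k)                 # (creates subdiagonal fill-in)
    for j in range(k + 1, m - 1):         # G'_i sweep: chase the fill-in away
        zero(j, j + 1, j)
    return Q, R                           # normal for the new combination: Q[:, -1]

Accumulating the rotations into Q explicitly is what requires the extra storage noted above, and repeated updates gradually erode the orthogonality of Q.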
Both of the methods presented for finding $g^*$ are exhaustive. Hence, these methods are viable only for small $k_1$ or $k_2$, due to the computation time involved. For larger problems, methods which obtain an approximation to $\Theta$ are required.
One algorithm to compute an approximation to $g^*$ is based on the following heuristic. If the direction in which $Bg^*$ lies is known, then the choice of $\gamma_i^*$ could be made according to whether $b_i$, the i-th column of $B$, or $-b_i$ lies in that direction. The direction used is the eigenvector, $x_{max}$, corresponding to the largest eigenvalue, $\lambda_{max}$, of the generalized eigenvalue problem

$BB^t x = \lambda AA^t x$    (6.12)

This problem is similar to the one which Miller [11] solves to obtain an approximation to $\Theta$. He has shown that $(\lambda_{max})^{1/2}$ approximates $\Theta$, and that his approximation satisfies

$(n+m)^{-1/2}(\lambda_{max})^{1/2} \le \Theta \le (\lambda_{max})^{1/2}(r-n)^{1/2}$    (6.13)
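A sketch of this approximation using SciPy's generalized symmetric eigensolver, assuming $AA^t$ is positive definite; the function name is hypothetical, and the eigenvector is returned for use in the sign-choice heuristic described below.

import numpy as np
from scipy.linalg import eigh

def miller_approximation(A, B):
    # Largest eigenpair of B B^t x = lambda A A^t x, eq. (6.12);
    # sqrt(lam_max) approximates theta, with the bounds of (6.13).
    lam, X = eigh(B @ B.T, A @ A.T)        # eigenvalues in ascending order
    lam_max, x_max = lam[-1], X[:, -1]
    lower = np.sqrt(lam_max / A.shape[1])  # (n+m)^(-1/2) (lam_max)^(1/2)
    upper = np.sqrt(lam_max * B.shape[1])  # (lam_max)^(1/2) (r-n)^(1/2)
    return np.sqrt(lam_max), (lower, upper), x_max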
The following figure illustrates Miller's method applied to the model problem.

[Figure: Miller's method on the model problem; the ellipsoid $AA^tS^m$ is expanded so as to just cover $BB^tS^m$.]

The unit sphere $S^m = \{v \in R^m : \|v\|_2 = 1\}$ replaces the unit cubes $C_{n+m}$ and $C_{r-n}$, and the ellipsoids $AA^tS^m$ and $BB^tS^m$ replace the polytopes $AC_{n+m}$ and $BC_{r-n}$. Solving (6.12) yields $(\lambda_{max})^{1/2} = 1.163$, and (6.13) states that $\Theta$ lies in the interval

$0.4749 = (6)^{-1/2}(1.163) \le \Theta \le (1.163)(5)^{1/2} = 2.6$
The following figure illustrates the use of $x_{max}$ as a guide to approximating $g^*$, and hence $Bg^*$.

[Figure: the direction $x_{max}$ and the columns $\pm b_i$ of $B$ for the model problem.]

Each $\gamma_i^*$, $1 \le i \le r-n$, is chosen so that $\gamma_i^* b_i$ has a positive projection on $x_{max}$:

$\gamma_i^* = \begin{cases} +1 & \text{if } x_{max}^t b_i \ge 0 \\ -1 & \text{if } x_{max}^t b_i < 0 \end{cases}$    (6.14)
For the model problem, (6.14) succeeded in generating the true $g^*$, and a dual linear program similar to (6.9) led to the calculation of $\Theta = 1.1176$. When applied to other problems, algorithm (6.14) did not always produce the true $g^*$. Nevertheless, the resulting approximate $\Theta$ is a lower bound on the true $\Theta$, since the approximate $g^*$ gives rise to a $Bg^*$ which is a vertex of $BC_{r-n}$. Occasionally, the approximation to $g^*$ was improved by using (6.14) iteratively, with the approximate $Bg^*$ obtained on the previous iteration serving as the guide direction for the present iteration; frequently, however, the resulting increase in $\Theta$ was marginal. A sketch of this iteration follows.
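The sketch below implements the sign choice (6.14) and its iterative use, assuming NumPy; the function names and stopping rule are illustrative (the text does not analyze convergence), and computing the exact $\Theta$ for a candidate $g^*$ would still require the dual linear program (6.9).

import numpy as np

def choose_gamma(direction, B):
    # Equation (6.14): pick gamma_i = +/-1 so that gamma_i * b_i has a
    # non-negative projection on the guide direction.
    return np.where(B.T @ direction >= 0.0, 1.0, -1.0)

def iterate_gamma(x_max, B, steps=5):
    # Re-use the previous Bg* as the guide direction, as described above.
    g = choose_gamma(x_max, B)
    for _ in range(steps):
        g_new = choose_gamma(B @ g, B)
        if np.array_equal(g_new, g):    # fixed point reached
            break
        g = g_new
    return g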
The preceding discussion has not made use of the simplification of polytope representation described in the previous section. Since $BC_{r-n} = \bar{B}C_{k_1}$ and $AC_{n+m} = \bar{A}C_{k_2+m}$, the same techniques used by Miller [11] in the development of (6.12) and (6.13) may be used to establish similar relationships based on $\bar{B}C_{k_1}$ and $\bar{A}C_{k_2+m}$. Thus, $\Theta$ may be approximated by $(\mu_{max})^{1/2}$, where $\mu_{max}$ is the largest eigenvalue of the generalized eigenvalue problem

$\bar{B}\bar{B}^t y = \mu \bar{A}\bar{A}^t y$    (6.15)

Also, the approximation, $(\mu_{max})^{1/2}$, satisfies

$(k_2+m)^{-1/2}(\mu_{max})^{1/2} \le \Theta \le (\mu_{max})^{1/2}(k_1)^{1/2}$