IflW I HI CENTRAL CIRCULATION BOOKSTACKS The person charging this material is re- sponsible for its renewal or its return to the library from which it was borrowed on or before the Latest Date stamped below. You may be charged a minimum fee of $75.00 for each lost book. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. TO RENEW CALL TELEPHONE CENTER, 333-8400 UNIVERSITY Of ILLINOIS LIBRARY AT URBANACHAMPAIGN JAN 2 8 1996 FEB 1 2 1997 AUG 2 4 2006 When renewing by phone, write new due date below previous due date, L162 b/u- * t UIUCDCS- -R- ■77-862 771 &Jsli UILU- -ENG 77 rjih GAUSSIAN ELIMINATION AND NUMERICAL INSTABILITY by Robert D. Skeel ,ne Library ot «* JIM- 1977 v oi W' 00 '" April 1977 Digitized by the Internet Archive in 2013 http://archive.org/details/gaussianeliminat862skee UIUCDCS-R-77-862 GAUSSIAN ELIMINATION AND NUMERICAL INSTABILITY* by Robert D. Skeel April 1977 Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 -"Supported in part by the US Air Force under grant AFOSR-75-2854, 1 . Introduction The book by Forsythe and Moler [1967] gives an error analysis for Gaussian elimination drawn from the work of Wilkinson and an analysis for iterative improvement due to Moler. In this paper we do a careful backward error analysis using a different idea of what it means for a perturbation to be small, namely that each datum is subject to a small relative change. A system of n equations n I a..x. = b . , 1 < i < n, in n unknowns x. , 1 < i < n, is often written in matrix notation as l — — Ax = b. The solution x computed in floating point arithmetic is not generally exactly equal to x, but it still may be acceptable if it satisfies one of two criteria: (i) It fulfills the accuracy requirements of the problem poser. This usually involves some measure of how far x is from x. For example, in determining the coefficients of an inter- polating polynomial it is some norm of the residual r = A(x - x) which is of concern. More often though it is some norm of the error x - x which is of concern, (ii) It is the exact solution of a problem which differs from the given problem by less than the uncertainty in the data, so that it is conceivable that x exactly solves the original problem. Uncertainty in the data is generally present if only because of the roundoff error introduced when the numbers are put into the computer. It is this second criterion which motivates backward error analysis. In §2 we give mathematical form to the informal discussion of ill conditioning found in Hamming's [1971] book Introduction to Applied Numerical Analysis. The condition of a system is the sensitivity of the solution to uncertainty in the data, and this sensitivity is often measured by a condition number. The usual definition of the condition number is based on the idea that the uncertainties in the data are roughly the same size in an absolute sense. However, if we suppose that these uncertainties have roughly the same relative size, then we obtain A _1 1 I A I I x as the condition number of a linear system where ||°|| is the max norm. There are at least a couple of reasons for believing that this second approach is more realistic. First, most numerical computations are done in floating point rather than fixed point arithmetic, and for floating point computation the conversion of data to machine represented to numbers results in errors of the same relative size. 
Second, measurement errors are usually more nearly the same in relative size than in absolute size. In §3 we pose the question: What is the least amount by which A and b must be perturbed so that x exactly solves the perturbed problem? The answer to this question turns out to be | (b - Ax) . | max ' i ' i (|b| + |A| |x|) . which we call the "backward error." If this is less than the unit roundoff error u, then the second acceptability criterion is satisfied. And if it can be shown that an algorithm produces a backward error that is always bounded by some fixed multiple K(n)u of the unit roundoff error, then by increasing the precision of the intermediate results by a factor K(n) , the second acceptability criterion can be met. An algorithm with this desirable property is said to be stable. If an algorithm is only stable for infinitesimal values of u, it is said to be asymptotically stable. The stability of Gaussian elimination with row pivoting (usually called partial pivoting) is examined in §4. By using an example of Hamming's it is shown that row pivoting is not asymptotically stable even for systems with equilibrated matrices. Then by means of a careful error analysis performed in Appendix A a bound on the backward error is obtained which contains the quantity max , 1 -1. I I * I N (D. A x ) . 1 '1 ' ' ' l min . 1 -1. i i - i x • ( Pi A x ) • where D is the matrix of row scaling factors. This quantity is minimized by choosing D 1 = diag(|A| |*|), which calls for the i-th row to be divided by la... x, + a._ x„ I +. . .+ 1 ll 1 ' ' i2 2 ' la. x |. It is shown that with such a choice for D n row pivoting would be 1 in n' 1 stable. Of course this is impractical, which explains Stewart's [1973, p. 151 observation that "In spite of intensive theoretical investigation, there is no satisfactory algorithm for scaling a general matrix." Nonetheless, the ratio max i (|A||x|) mm , I . I I * I v • C A x ) . J " J is an excellent a posteriori measure of how poorly scaled the system is. Sometimes, programming considerations (Sherman [1976]) call for the use of column pivoting instead of row pivoting, where by column pivoting we mean that columns are interchanged so that each pivot is the largest in its row. In §5 it is shown that column pivoting could be made stable if it were somehow possible to scale the columns with the matrix of scale factors D 2 = diag(|x|). This calls for each column to be multiplied by its corresponding computed solution value. A measure of ill scaling is given by (|A|e). ||x|| max ' ' 1 " " i (|A||i|). where e is the vector of all ones. Row pivoting may be regarded as the generalization of complete pivoting in which the ordering of the columns is arbitrary, and similarly column pivoting as the generalization in which the ordering of the rows is arbitrary. From this observation it follows that the results of both §4 and §5 apply to complete pivoting. However, one suspects that the error of complete pivoting satisfies an error bound which is appreciably better than simply the smaller of the bounds for row and column pivoting. In §6 iterative improvement is examined in the hope that the poor stability properties of Gaussian elimination can be corrected. It is shown that a single iteration of iterative improvement performed in single precision is enough to make Gaussian elimination asymptotically stable. Before proceeding, it might be interesting to demonstrate the instability of complete pivoting with a simple 2x2 system of equations. 
Consider Ax = b where A = 3 3 -1 and b = The coefficient matrix A is equilibrated according to the definition of Forsythe and Moler [1967, p. 45]. Using rounded t digit decimal (floating point) arithmetic the elimination step yields A'x = b' where A' = 3 3 .99- • -9 and so the computed solution and b* = 1 33---3 x = .33---3 x 10 .33-V3 -t- The backward error is determined by considering perturbed problems of the form 3(1 + 6 U ) 3(1 + 6 12 ) - (1 + 6 21 ) 1 + 6 13 and choosing the relative changes 6.. so as the minimize the maximum <5 . . ij ij In this case, 6_ must be chosen to be -1, and so the backward error is 100% regardless of the precision t. 2. Condition of Linear Systems The condition of a problem is the sensitivity of its solution to uncertainties in the problem data. The importance of this concept is that it indicates the amount of accuracy that one should reasonably expect for the solution of a problem with inexact data. And even for problems with exact data, the conversion of the numbers to the computer's floating point number base usually introduces errors. As the measure of the condition of a problem we take the maximum amount by which an infinitesimal perturbation in the problem data can be amplified in the solution. More precisely if C denotes the given problem data and (?) denotes the solution of a problem with data £, then we define the condition nwnbev to be lim relative distance from <{>(£) to c|>(£) /? -, \ £->-£; relative distance from £ to £ In the case where £ and cj)(£) are scalars, the condition number is the absolute value of the relative derivative, namely (£) = A b. (In roundoff analysis the number of equations n is not considered to be part of the problem data; rather we take the point of view that each value of n defines a separate class of problems.) There are two crucial matters that have to be settled: (i) how to define relative distance in the problem space, and (ii) how to define relative distance in the solution space. Any problem which is to be solved in an approximate sense is incomplete unless there is also some "metric" specified for measuring how good the approximation is. It is this metric that should be used in defining the "relative distance from (C) to such that I a . . - a.. I < e|a..| and lb. -b.l < elb.l This seems to be consistent with the ideas expressed in Hamming's [1971] book Introduction to Applied Numerical Analysis. On page 117 it is stated that "The term 'ill-conditioned' is ill defined. The vague idea is that small changes in the initial system can produce large changes in the final result. If we are to take floating point seriously, then we should say 'relatively small changes' and 'relatively large changes'." and on page 122 it is stated that "...the system is indeed ill conditioned because, no matter how we try, we are unable to solve the system so that the answer is not sensitive to small changes in the original coefficients. " Thus it seems that by "relatively small changes in the initial system" Hamming means "relatively small changes in the coefficients of the initial system." A similar thought is expressed by Kahan [1966, p. 795], It is worth mentioning that this approach has the advantage of forcing the perturbed matrix A to have the same sparsity structure as the original matrix A, making it more plausible to regard A as the result of perturbing the original physical problem. Having chosen our metrics, we are in a position to determine the condition number of a system Ax = b . 
We begin by obtaining bounds on the uncertainty in the solution due to the uncertainty in A and b. Bounds of this type also appear in Bauer [1966]. Our notation uses inequalities between arrays to mean inequality of the corresponding components. The absolute value of an array is also to be understood in a componentwise sense. THEOREM 2.1. Let Ax = b and (A + 6A) (x + 6x) = b + 6b where | HlA^llAllxl + lA^Mblll " (l + .lllA^HAllWWI PROOF. Let I be such that (|a _1 ||a||x| + | a -1 1 1 b | ) ^ = || |a _1 | |A| |x| + |A _1 ||b||| Define 6A and 6b by 6a jk = sgn(a £jXk ) £ |a jk | and 6b . = sgn(a. . ) £ |b . 3 ^J J where A = (a . . ) . Then (A 1 6Ax + A 1 6b) - S T. a 6a x,+ E a. . 6b. = £ C | A -1 1 I A I | x | + |A _1 ||b|) £ = £|||A- 1 ||A||x| + lA^llblH , but from (2.2) 10 (A 1 6Ax + A ^b) = (6x - A 1 6A6x) Ji , and so e II I A 1 | |a| |x| + | A 1 \ lb I || < II 6x|| + £ II |A 1 | |a| II ||6x|| . Q.E.D. THEOREM 2.3. The condition number i as defined by (2.1), of a linear algebraic system Ax = b is A x -1 + lA^llb PROOF. The condition number of a linear algebraic system Ax = b is lim 6x M x e(6A,6b) -y e(6A,6b) where e(6A,6b) = min{e _> 0: J 6 A j ;< e | A | , | 6b | <_ e | b | } and 6x satisfies (A + 6A) (x + 6x) = b + 6b . Consider any sequence (6A , 6b ) for which m m e (6 A , 6b ) -*■ as m -*■ °°. By Theorem 2.1 we have mm 6x , (kk xu ^ || |A 1 | |A| |x| + |A 1 | |b| || < e(6A ,6b ) LLJ LJ — u — ' ' — - — LJ — L - n ~ mm,. ,«. „, v iii.-lii.iiK (1 - e(6A ,6b ) A m m for sufficiently large m. Therefore 1^ Hfix m M|x|l < HlA^llAllxl + lA^MbMI m ^ co e (6A ,6b ) - ||x|| mm 1 1 ii which gives an upper bound on the condition number. Let e be a sequence converging to zero. By Theorem 2.2 there exists a sequence (6A , 6b ) such that e(6A , 6b ) = e and mm m ll«* II ii m ii -1 -1 > e m | A] |x| + |a I |b| || (i + OIIa^iiaIIDMI Therefore — — 6x \\/\ x iiIa _ 1|IaM i I A —1 1 i , i ti lim " m" " " || |A |A| |x| + |A |b | m -> co m x which gives a lower bound on the condition number. Q.E.D. 11 In subsequent sections of this paper, we will consider the effects of perturbing only the elements of the coefficient matrix. THEOREM 2.4. Let Ax = b and (A + 6 A) (x + 6x) = b where |<5A| <_ e | A | . Then fix el llA^llA ll xM I u (l-.HlA-HjAllMWI PROOF. Similar to that of Theorem 2.1. Q.E.D. THEOREM 2.5. Let Ax = b. Then there exists 5A such that | 6A | = £ |a| and such that the solution x + 5x of (A + 6A) (x + 5x) = b satisfies 6x , elllA^llAllxlH " (l + eHlA^llAllDHxIl PROOF. Similar to that of Theorem 2.2. Q.E.D. It follows from these last two theorems that when only A is subject to uncertainty the condition number is A Cond(A, x) = [UJ1 ALkLU -ii x Since || |a _1 | |a| |x| || < || |a _1 | |a| |x| + |A _1 | |b| || £ 2|| |a _1 | |a| |x| ||, Cond(A, x) is also adequate for the case where both A and b are subject to uncertainty. A similar quantity ||A- 1 ||,||Ae ( . ) || |x.| K(A, x) = ^r X is used by Van der Sluis [1970a], which he calls [1970b] the "condition number of the solution." Here e,. x denotes the i-th unit vector. (j) The condition number of a matrix A could be defined as the T maximum value of Cond(A, x), which is achieved with x = e = (1, 1,..., 1) . Thus Cond(A) = Cond(A, e) = || |a _1 | | a| ||. This quantity is more satisfying as a measure of ill condition than the usual cond(A) = ||a ||||a|| for a couple of reasons. 
First, the matrix 12 | A | |A| is a mapping of the solution space into itself, which means that the quantity || |A | |a| || can be defined entirely in -1 terms of the solution space norm. whereas cond(A) = ||A ||A|| is defined in terms of a solution space norm and a residual space norm, which seems quite unnecessary. Second, the quantity Cond(A) is invariant under row scaling. Multiplying a system of equations by a diagonal matrix does not change the problem in any fundamental way. For example, all systems Dx = b where D is diagonal are well conditioned. Accordingly, we have that Cond(D) = 1; whereas cond(D) can be arbitrarily large. Example. According to Hamming [1971, p. 120], the system Ax = b is well conditioned where A = 3 2 1 2 2e 2e 1 2e -e b = 3 + 3e 6e 2e The inverse of the coefficient matrix and the solution are given below: -.6e .4 .2 C 1 = - 1-1. 8e Hence lA" 1 ! lAl = X .4 -J.e 1 -.3 .2e 1 -.6 .2 .2e _1 -.6 -.4e -1 -.6 1+1. 8e 2.4e 1.6e -1 x = 1-1. 8e 4e +1.2 1.4-.6e -1 8e 1.6 l-.6e and 13 a 1 | |a| IxI + |a 1 | Ibl = 1-1. 8e 9.6e + 3.6e' 4.8 + 2.4e 6 - 2.4e which shows that the system is well conditioned. However, Cond(A) = ,8e - 1 + 2.6 - .6e 1-1. 8e which indicates that the system would be ill conditioned for some different right hand side b, and in fact, Hamming [1971, p. 122] gives such an example, 14 3. Stability of Algorithms for Linear Systems Let +, -, *, / denote the floating point operations corresponding to +, -, x, /. Every reference to a floating point result x o y carries with it the assumption that x, o, and y are such that the result is well defined. Nothing is assumed about the floating point arithmetic except that the relative roundoff error is bounded by u/(l + u) where the unit roundoff error u is a small positive number; that is, x 6 y = (x o y) (1 + 6) for some 6 depending on x, o, and y which satisfies 1 ' — 1 + u It follows from the above condition that x O y x o y = — 1 + 6' where |6'| <_ u. Note that for rounding u = -r- and for chopping u = 8 where B is the base and t is the number of base 3 digits in the fraction of the floating point numbers. For any computed solution x we define the relative backward error to be the smallest real number n,~>. such that (A + 6A)(x - 6x) = b + 6b for some 6 A, 6b, and 6x with 1 6 A | <_ r\ ( ^.\k\, |6b| <^ n /^\ |b| , and |6x| <_ ri/^\|x - 6x| . The backward error can be interpreted in the following way: The computed solution x is the rounded solution of a problem with rounded data where H/o\ is the maximum relative roundoff error. Thus, if Hz-n is no larger than the unit roundoff error u, then our solution is as good as our data deserve; otherwise, improved accuracy may be justified, perhaps by using iterative improvement. Further motivation for this definition is given in Miller [1975] and Bauer [1974]. If a solution cannot be computed because the matrix is nearly singular, then the backward error is defined to be 15 the smallest real number H/o\ such that A + 6A is singular for some 6A with | 6A| <_ H/o\ |A[ . Stability of the algorithm means that there exists a stability constant K(n) and a stability threshold u(n) > both independent of the problem data (A, b) such that the relative backward error ^(3% 1 K(n)u provided that u £ u(n). A weaker concept asymptotic stability allows the threshold u(n) to be data dependent. These two types of stability are the same as the "backward stability" and "asymptotic backward stability" used by Miller [1972]. The backward error n . . 
is not easy to determine, and for this reason we introduce two variants of the backward error which are easier to compute. Let H^ 9 >. be the smallest real number such that (A + 6A)x = b + 6b for some 6A and 6b with |6A| <_ n, ? v|A| and |6b| _< n,„v |b|. Let n,-,^ be the smallest real number such that (A + 6A)x = b for some 6A with | 6A | Ji n ,-, >J A | . Naturally r\ (r >\ .1 M/n\ i.n^x- THEOREM 3.1. Let I = {i: (|a||x| + Ibl). = 0}. The backward error r | (b - Ax) . | JiTl (lA l| x l + l b ] ). V (b " A ^i - /« i € I. n (2) = \ + oo otherwise . PROOF. First consider the case where (b - Ax). ^ for some i£ I, Suppose that n,^ < + °°. Then there exist 6A and 6b such that (A + 6A) x = b + 6b where 1 6A | <_ T\,~Ak\ and 1 6b | _< n,^ |b We have 16 |b - Ax| = |fiAx - fib | £ |6A| |x| + | 6b | £ n (2) (|a| |x| + |b|), (3.1) which is impossible since (b - Ax). ^ and (|a||x| + |b|). = 0. Therefore H/ 9 \ = + °°. Second consider the case where (b - Ax). = for all i 6 I, To obtain an upper bound on n^y consider the choice (b - Ax). 6a _ f Sgn( V |a iJ ! (|A||x| + |b|). 1 * 1 - lj L0 16 1, 1* I. and _i b (b - Ax). 6b. - f l ( I A 'I-I + Ib|) ± l 1 16 1. We have 6 Ax - 6b = b - Ax or (A + 6A)x = b + 6b, and so n / n x < max (b - Ax) J (2) - ."I j (|A|[x| + |b|). Since n,.^ < + °°, equation (3.1) must hold; and so I (b - Ax) . | n. . > max / | a I u I ■ I u I n • Q.E.D (2) ~ ± £ j (|A| |x| + |b|) i THEOREM 3.2. Let I = {i :(|a||x|). = 0}. The backward error — if (b - Ax) . = for i 6= I , i |(b - Ax). max n iTi TW ^ ■+ oo otherwise. PROOF. Similar to Theorem 3.1. Q.E.D. Remark. Similar types of results apply to other problems; for example, the backward error for a polynomial equation a» x + a x +. . .+ a = is given by n b j l ~n ^n-1 , | a_. x + a. x +. . .+ a n (2) -_L 1. Hence assume H/«\ < 1. There exist 5A, 6b, fix such that (A + 6k) (x - \) which verifies (3.2). The first inequality 3 3 of (3.3) is obvious if H/^n >. "F. Hence assume H/o\ K T* From (3.5) 18 b - Ax J <_ 2n, 3 JA| |x - 6x| + n, 3 x |b - Ax | + n ^v |A| |x| , and using (3.6) gives 2n Therefore (l ■ rw)|b ■- A*| ( T -^- + n,„.)|A||x|. (3) ' — 1 - ri/o-j (3) n (3) O - n (3) ) (1 - n (3) ) wh < 3n (3) (1 ~ 3^ C3> > (1 - 3n (3) )(l - 3n (3) ) ich proves (3.3). The first unequality of (3.4) is obvious if n,~N >, 1. Hence assume H/ 9 \ < 1. We have |b - Ax | <_ n (2) (|b| + |A| |x|) 1 n (2) (|b - Ax | + 2 1 A | |x|), and so lb - a* I < , n(2) IaIIxI. -l-n (2) Therefore 1/-,>. £ 2n/-N./(l - H/^), which implies (3.4). Q.E.D. A good algorithm should (i) return an acceptable answer most of the time (robustness) , and (ii) signal failure whenever it does not return an acceptable answer (reliability) . We formally define an algorithm to be reliable if there exist K(n) and u(n) such that for any (A, b) and any u _< u(n) either the algorithm computes an answer with n <_ K(n)u or the algorithm signals failure. Any algorithm for solving linear systems can be made reliable by computing the backward error with floating point arithmetic and then accepting the answer only if the computed backward error is less than a prescribed multiple of the unit roundoff error. For example, the next theorem shows that if the computed backward error f) < Ku, then we can conclude that n < (K + n)ue 19 The residual is to be computed in single precision r. = b. - (...(a... x x, + a._ x x_) . . . + a. x x ) 11 ll 1 i2 2 in n or in double precision r. = fl(b. - (...(a., x x. + a._ x x )... 4- a. * x )). 
l i 11 i 12 2 in n Here 6 denotes the double precision counterpart of O where it is assumed that x 6 y = (x o y) (1 + 6) i i 2 2 with |6|<_u/(l + u). In practice the double precision unit roundoff error is either this small (rounding in base two) or smaller. By fl(°) we mean the conversion of a double precision value to a single precision value. It is assumed that fl(xoy) = (xoy)(l+<5) with -u/(l+u) < 6 £ u, which is true for rounding and chopping. The computed backward error f) is determined by n = max (|r 1 |7(...(|a ±1 * k ± \ + \a ±2 * ^|)... + |a ±n * X J). i THEOREM 3.4. If n is the computed value of r\ } then -(n+2)u " _ nu (n+2)u - nu e n-nue ^.n£e n+nue where r\x for single precision residual accumulation, U \ 2 V-u for double precision residual accumulation. PROOF. Let q be the computed value of Ax. By the usual type of error analysis |q - Ax | < [(1 + G) n - 1] |a| |x| < nu e nU | A | |x| . (3.7) We have that (i - y^-) 2 ~ . 1 + u r\ > max b - jm 20 and f\ < (1 + u)(l + 1 + u ) max (1 " U _nT1 b - q| Arm 1 + u' where division of two vectors is defined componentwise This reduces to f\ < max (1 + 2u)(l + u) n from which it follows that -(n+2)u Anrr- (1 + 2u) (1 + u) n-2 n < max b - A^ F* (n+2)u Using and (3.7) gives _ (n-2)u b - q| - |q - Ax | 1 |b - Ax | . . |b ■■ q| + |q " Ax| . _ (n-2)u ,b ■■ q| ■ nC e^ -|A| |x| 1 |b - Ax| < |b - q| + nu e Dividing by |a||x| and taking the maximum yields b - ql L - (n-2)u -f- 1 - + nu e A x max b •■ a I - (n-2)u nu e < ri < max aTH #- A xl Q.E.D. Before concluding this section, it should be mentioned that for some classes of problems it may be unreasonable to expect an algorithm to be stable. If the numbert output values is fairly large compared to the number of input values, then it becomes very difficult for an algorithm to be stable because each output value must arise from the scone perturbation of the input values. For example, Miller [1975] shows that the usual algorithm for inverting triangular matrices is unstable. Hence it seems better to use stability as a relative concept rather than an absolute concept. This idea is used by Miller [1976] in a paper entitled "Roundoff analysis by direct comparison of two algorithms." 21 4. Gaussian Elimination with Row Pivoting and Scaling This section applies the ideas of the preceding section to Gaussian elimination with partial pivoting using row interchanges and implicit row scaling. The reciprocals of the scale factors are to be given as inputs d.. , d_,..., d to the algorithm, and so the pivoting is done as if one 1 z n were solving D Ax = D b where D = diag(d l5 d„,..., d ). To keep the L Z n notation simple, it is assumed that the equations are numbered according to their ordering after all row interchanges have been performed. The computations of the algorithm are as follows: (1) a . . = a . . , 1.1 ij for k - l(l)n - 1, af k+1) =a< k) -m„*a<*>, l.j »k+l. (4.2) ij ij ik kj ' J - '0 if i < j, y m. . if i > j, IJ ij < a ij lf i - j ' U if i > j, (4.3) y i = V for i = 2(l)n, y. = b. - (...(l ±1 x ¥± + l^ x y 2 ) . . . + l ± ±mml x y._ 1 ) , x = y /u , n n nn for i = n - 1(-1)1, (4.4) x. = (y. - (...(u. , - x x ., 1 + u. . x x., )... + u. > x ■ ff L (k) kk i > k. It is necessary for the completeness of the theory that the Gaussian elimination algorithm be extended to permit the use of zero as a scaling factor. 
For any diagonal matrix D = diag(d , d„,..., d ) we let D denote the matrix diag({d } , {d„} ,..., {d } ) where {d} = d if d 4 e if d = 0, That is, D is the matrix D with all the zero diagonal elements replaced by e. The condition (4.5) is replaced by the following: lim e -v u.r 1 .« i e lk {d k } e a kk < 1 , r] . (k) , (k) Let e = minljd. a,, ' a -i. d = 0, d. 4 0}. Then it can be easily shown K. X that for any c with < lei < e the scale matrix D has the same effect on the choice of pivots as does D. Unfortunately, Gaussian elimination with pivoting is unstable. This instability arises in the decomposition stage when the quantities 23 (k+1) (k) . * (k) a . . = a . . - m . , x a, . 13 iJ ik kj (k) (k) are being formed. If |m., a, . >> |a |, then the error in a.. is not (k+1) ik "kj (k) lj 13 very small relative to a.. , and so in our backward error analysis we cannot throw this error back into a (k) The extreme case occurs (k) (k+1) when a.. =0 and a.. ^ 0, which is commonly called "fill in." For sparse systems of equations it is quite common to order the rows so as to avoid fill in. This reduces computational cost, and it apparently may also contribute to stability. The instability of Gaussian elimination has been pointed out bv Hamming, who on page 119 of his book [1971] announces the " Theorem Pivoting can take a well-conditioned system into an ill-conditioned system of simultaneous linear equations." and on page 123 states "We have not justified the pivoting method; rather we have shown that it is an 'old wives' tale.' But like most old wives' tales, it is a mixture of truth and mystic faith." To prove his theorem, Hamming uses the example discussed at the end of §2 For this example, one elimination step with partial or complete pivoting yields the system A'x = b' where A' = -4 -2 -j- + 2e ~ + 2e -2 -1 T + 2e T - e , b' = 3 + 3e -2 + 4e -1 + e (4.6) assuming exact arithmetic. This problem is ill conditioned for small e 3 since Cond(A',x) = Se - 1 - 3 + 3e 1 - 1.8e 24 If the elimination were performed in floating point arithmetic, then a slight perturbation of (4.6) could result, which may have a solution which differs from the true solution by an amount proportional to c . This kind of error could not arise from slightly perturbing the original problem because it has a condition number of about 6. For example, suppose that the computed righthand side of (4.6) was 3 + 3e b' = -2 + 4e - -1 + e + and everything else was exact. Then 1 - e - 1 u/8 1 + e X u/4 and by Theorem 3.2 the backward error is u/(8e + u) . A related observation was made by Gear [1975] "It might be possible to say that 6A represents a perturbation to the original physical problem if the sparsity structure of 6k were the same as that of A. Unfortunately, we will show that such a demand on the structure of 6A can lead to very large bounds on I 6A J |, bounds probably dependent on the condition number of A." This was supported by the example 11-1-1 e e JL 1 for which Cond(A) = 4. b = , x - 1 .-1 -1 25 In one of the theorems that follow, it is shown that the situation is not quite as bad as Gear suggests. The bounds on |6A| | depend not on how ill conditioned the problem is but on how badly scaled the equations are. We begin by obtaining a totally a priori error bound for Gaussian elimination. The proof is modeled after that of Forsythe and Moler [1967], which is mostly borrowed from Wilkinson [1963]. 
However, our error bound, like that of Van der Sluis [1970b], is more informative than that of Forsythe and Moler in that it distinguishes among the columns of AA. THEOREM 4.1. Let the vector x be computed by Gaussian elimination with row pivoting and row scaling where D = diag(d 1 , d_,..., d ) is the 12 n matrix of reciprocal scale factors. Then there exists AA such that (A + AA)x = b with || Id" 1 aa|z|| < X (n)u|| Id" 1 a|z|| for arbitrary z >_ and arbitrary z satisfying < |e| < e" where X (n) = [19- 2 n-2 - n - 81e 2nu . PROOF. See Appendix A. Q.E.D. Note 1 . The factor e appears in the Forsythe and Moler book as the constant 1.01. The advantage of e is that it indicates the nature of the higher order effects and it does not require placing some arbitrary restriction on the size of nu. Note 2. It is actually possible to show that | A A [ <_ L [ A | for some lower triangular matrix L , although the best possible L R is somewhat complicated. With z = e in Theorem 4.1 we get the usual type of bound Hd; 1 AA|| < X (n)u||D- 1 A|| . 26 The bound of this theorem is practically always extremely pessimistic. However, there are cases where this bound can be attained in the limit as u -*■ 0. THEOREM 4.2. There exists a problem Ax = b and a floating point arithmetic <+, -, x, /> suoh that the solution x computed by Gaussian elimination with partial or complete pivoting satisfies (A + AA)x = b only for those matrices AA for which AA > ri9-2 n 2 - n - 81u + 0(u 2 ). Therefore s the bound of Theorem 4.1 is the best possible bound in the limit u ■+ 0. PROOF. See Appendix A for the proof, which employs a modification of Wilkinson's [1963] example. Q.E.D. The next theorem uses Theorem 4.1 to get a bound on the backward error. We are especially interested in the effect of scaling on the error bound. THEOREM 4.3. Let I = {i: ( | A| | x| ) . = 0} . If d . = for i G I and d . ^ for i 4 1 > then the backward error , s max |D a| |x| min (Id A I I x I ) . i#t ' * PROOF. Putting z = |x| in Theorem 4.1 gives |D^(b - Ax)||=||D^ AAx|| < || ID" 1 AA||x||| (4.7) < x (n)u |||d £ 1 A| |x|||. Multiplying this by e and letting e -*■ gives max | (b - Ax) . | _5_ X (n)u max ( | A| |x | ) . , i e I x i G I x from which it follows that (b - Ax). = for i£ T. Hence, from Theorem 3.2 we have that 27 -1 (b - Ax). | (|D (b - Ax)]). 1 max , i . I i - i ; = max I ^ I C|A| |Sfc|> ± . £ T ( | D "l A || ft |) i and from (A. 7) we see that (|D _1 (b - Ax) |) . ^ x(n)u max |d -1 a||x|. Q.E.D. By choosing d. = (|a| |x|). (4.8) i ' . i the bound on the backward error is minimized, giving n <^ x( n ) u « This suggests that a linear system should be scaled by dividing each row by its weighted £.. norm where the weights are the components of the computed solution. Unfortunately, (4.8) represents an implicit equation for the scale factors d. because the computed solution x is a function £ (11(D)) of the scaling matrix D; that is, D must solve the equation D = diag(|A| |?(n(D))|), (4.9) for which a solution may not exist. The nature of this equation becomes more apparent by noting that it is equivalent to solving for a permutation P that satisfies P = n(diag(|A||c(P)|)). (4.10) For if D satisfies (4.9), then P = 11(D) satisfies (4.10); and if P satisfies (4.10), then D = diag( | a| | £ (P) | ) satisfies (4.9). In principle we could determine the solution to (4.9), if there is one, by testing to see if any of the n! permutations P satisfy (4.10). In the cases where a solution exists, the backward error is bounded by x(n)u. 
One suspects that (4.10) almost always has a solution, and it is even conceivable that (4.10) always has a solution, at least whenever £(P) is defined for all P. The existence of a solution for (4.10) implies the existence of an ordering for the rows which makes Gaussian elimination stable. 28 If one wishes to solve (4.10), the following iteration would likely converge and converge quickly for almost every system of equations: P (Q) = II(diag(|A|e)), P (m+1) = n(diag(|A||5(P (m) )|)), m = 1, 2,.... This is not suggested as a practical algorithm though because poorly scaled equations can usually be accurately solved by doing iterative improvement. A more useful application of Theorem 4.3 is the diagnosis of ill-scaling, for a R (A, x) = max A X min A X is an easily computable measure of how badly scaled the rows are. Remark. The bound of Theorem 4.1 can be refined: (|D AA|z). <_ x(n)u max (|d A|z).. 1 ~ J 1 i J This suggests that max ( | A | | x | ) 1 . ii2 . 71X17x1). i > j i 1 1 i 'i would be a better measure of the possible effect of ill scaling. The quantity a D (A, x) is not very satisfactory for theoretical purposes because x depends on the arithmetic used in the computation. We would prefer to use a (A, x) for the theory. For Hamming's example R |a| |x| = (3 + 3e, 6e, 4e) and a (A,x) = -f- + j- . Near optimal row R scaling for this problem is given by 4e D*A- 3 2 1 2 2 2 2 -1 , D _1 b = 3 + 3e 6 2 For Gear's example |a||x| = (2/e + 2, 1, 1, 2) and a R (A,x) optimal row scaling is given by = 2/e + 2. Near 29 £ E £ E D" 1 A = e £ , D 1 b = 1 1 1 1 2 One may wonder about the effect of scaling strategies such as row equilibration. Van der Sluis [1970b, p. 80] gives an example showing "that it is quite possible. .. that there exists no bound depending on n only for the ratios of the errors after and before equilibration." He goes on to describe a cautious equilibration scheme that never worsens the situation at the expense of possibly not improving it. An adaptation of this scheme to our theory is to choose a. . ill d. = min max — i , . 'a. . ' ' which has the effect of leaving no row of A strictly dominated by any other row of A. Note that, since d. < 1, we have min|D a| |x| > a| |x|. Furthermore, (|a| |x|). = min Sl-^l i£i k I a k£' k£ l < min max JJ. a, „ x k j kj £ <_ d . max | A | | x | , whence max | D A||x| <_max|A||x|. Therefore min ID A| I x fr^- 1 ,*, \ max[p A 1 j x | a R (D A, x) - -1 | - max A X min A X = a R (A, x), so that the scaling of the problem is not made worse by our choice of D, 30 The theorem that follows gives bounds on the backward error that in the limit u -> depend only on how ill scaled the problem is and not how ill conditioned it is. First we need a lemma. LEMMA 4.4. If D is nonsingular, then D "l AAx || < X(n)u || |D- 1 A||x ||l 1 - x(n)u Cond(A D) PROOF. We have x + A 1 AAx = x, which implies and D -""AAx = (I + D 1 AA "4)) 1 D X AAx iD^AAxlU— llD_] 1 - || D 1 AA A * D|| The term in the numerator ||d _1 aax|| ± || |d _1 aa| | x | II <_ x( n ) u I I D A I |x | ||, and the term in the denominator ||D _1 AA A _1 D|| = HId^AA A _1 D|e|| ± || |d -1 AA| I A _1 D | e |i <_ x(n)u || |d -1 a| |A _1 D|e|| = x(n)u || |d _1 a| |A _1 d| II . Q.E.D. THEOREM 4.5. Let D be nonsingular and let |a| |x| > 0. Gaussian elimination with row pivoting gives X(n)u a (D~ A, x) n < * 1 - 2x(n)u Cond(A 1 D) a D (D 1 A, x) R provided that the denominator is positive. PROOF. Choose AA as in Theorem 4.1. 
From Theorem 3.2 we have that - AAx | T) = max XfPT 31 Since x = x + A Ax, we have |x| <_ |x| + |A~ 1 D|e||D~ 1 AAx|| , and so I d_1a II x I 1 I d_1a I|x| + |d _1 a| |A _1 D|e)|D~ 1 AAx|| Thus e||D -1 AAx|| n < max | D 1 A | | x | e||D _1 AAx < max D 1 A||x| - e Cond(A 1 D)||d AAx Applying Lemma 4.4 yields x(n)u a (D A, x) _ e||D AAx || < — |D A I |x| , 1 - x(n)u Cond(A D) from which the theorem follows. Q.E.D. Although we are unable to prove that there is always some ordering of the rows for which Gaussian elimination is stable, we can show that this is true asymptotically as u ■*■ 0. THEOREM 4.6. For any problem such that |a||x| > there is some ordering of the rows for which Gaussian elimination is asymptotically stable. PROOF. Using Theorem 4.5 with D = diag(|A||x|) gives n < *L(iOu A 1 - 2x(n)u max J — ' 1| |A||x| A| |x for small enough u. Hence for IaI Ixl u < u(n) = min 4x(n) |a| |a I |a| |x we have n < 2x(n)u. Q.E.D. 32 We end this section by examining the effect of scaling on a good bound for the "forward" error. THEOREM 4.7. The error _ x || < x(n)ullA- 1 DH HlD^Allxl 1 ■ x(n)u Cond(A _1 D) PROOF. We have ||x - x || = ||a _1 (- AAx) II < ||a -1 d|| ||d -1 aax|| , and the theorem follows from Lemma 4.4. Q.E.D. If higher order terms in u are ignored, the bound on the error is minimized by choosing D = diag( |A | |x |) . Thus Ha" 1 !! II IaI U N lllA^llAllxlH is a measure of the possible effect on the "forward" error of how poorly the equations happen to be scaled. For Hamming's example this quantity is fi — 1 —l yy e + 0(1) and for Gear's it is e + 0(1). The usual type of bound on the error is of the form ||ft - x|| < x(n)u|K" 1 li||K' 1 A||||x||+ 0(u 2 ). This is minimized by D = diag(|A|e), which is row equilibration with the £ norm. 33 5. Gaussian Elimination with Column Pivoting and Scaling This section is similar to the previous section except that we examine the variant of Gaussian elimination in which the columns are interchanged in order to ensure that the pivot element is the largest in its row. The algorithm is assumed to do column scaling where the scale factors d.. , d„,..., d are given as inputs to the algorithm. Again the 1 / n selection of the pivot is assumed to be done exactly so that a ki d i (k) a kk < d. , i > k + 1 . — ' k 1 — Writing the condition in this form allows for the use of zero scale factors but does not permit the selection of a zero pivot. An a priori error bound is given by the following theorem: THEOREM 5.1. Let the vector x be computed by Gaussian elimination with column pivoting and column scaling where D = diag(d , d ,..., d ) is the matrix of scale factors. Then there exists A A such that (A + AA)x = b with |AAD)e <_ x(n)u|AD|e where x( n ) =127*2 - 5n - 7je PROOF. See Appendix B. Q.E.D. It is likely that the constant x(n) in this bound could be replaced by a smaller constant. The following theorem indicates how the columns should be scaled in order that Gaussian elimination with column pivoting be stable. THEOREM 5.2. Let x be the value of A b computed by Gaussian elimination with column pivoting and column scaling where D = diag(d , d~,..., d ) is the matrix of scale factors. Let I = {i: (|a| |x|). = 0} and let J={j:x=0}. If d. = for j GJ and d. f for j $ J , then the backward error •J J J 34 (|AD|e), n < x(n)u max , i . i i ^ i ? - max (|d x|) .. PROOF. We have x = DD~ x where D denotes D with all diagonal zero entries replaced by ones. Hence |b - Ax | = |AAx| £ |AAD|e Hd" 1 *!! £ X(n)u |AD|e | JD^x -1. = x(n)u |AD|e max (|d x|).. 
& J We have that (IaIIxI). = implies that a.. = for i ^ J, which implies 1111 l ij ' r that (lADle). = 0. Hence the theorem follows from Theorem 3.2. Q.E.D. COROLLARY. The backward error C |d _1 x| ) . n _< x(n)u max - "•- . i,j 0. Gaussian elimination with column pivoting gives X(n)u a (AD, D _1 x) n < 1 - 2x(n)u Cond(AD) a (AD, D """x) provided that the denominator is positive. PROOF. Choose AA as in Theorem 5.1. From Theorem 3.2 we have that - AAxI n = max I I I „ I . |A| |x| Furthermore x = x + A AAx, and so IaMxI > I A I I x I - I A I I A 1 || AAx I Therefore I AAx n < max I A I I x I - I A II A - 1 1 I AAx I Applying Lemma 5.3 yields a D X x) AAv I < I A I I v I AAX| - 1 - v(n)u Cond(AD) l A H x l» X 1 from which the theorem follows. Q.E.D. THEOREM 5.5. For any problem such that |x| > there is some ordering of the columns for which Gaussian elimination is asymptotically stable. PROOF. Using Theorem 5.4 with D = diag(|x|) gives X(n)u n < A 1 - 2x(n)u max J — -1 | A | |x| for small enough u. Hence for 37 u < u(n) = min -1 4x(n)|A x ||A||x| we have n <_ 2x(n)u. Q.E.D, Recall that the stability threshold in the case of Gaussian elimination with optimal row ordering was lA||x| u(n) = min -1 4x(n)|A| |A X | |A| |x| It is easy to show that this is larger than the stability threshold for optimal column ordering. This is a slight indication that row oivotine may be superior to column pivoting. we have THEOREM 5.6. The error ||£ _ x || < xCnHl l A-'llADl lHlD^xll II* l! - 1 - x(n)u Cond(AD) PROOF. Since (A + AA)(x - x) = - AAx, x-x = -(I+A 1 AA) 1 A ^"AAx ■ - A _1 AA(I + A~ 1 AA)" 1 x = - A -1 AAD(I + D~ 1 A~ 1 AAD)~ 1 D 1 x. Therefore, x - x < 1 - i | D 1 A 1 AAD|| 1a" 1 ! [aadU Itsliidl i-| ! !d -1 a _1 I IaadU < X(n)u H |A 1 1 [API II Hd"^ - 1 - x(n)u Cond(AD) Q.E.D. 38 If higher order terms in u are ignored, the bound on the error is minimized by choosing D = diag(|x|). Thus for column pivoting HlA-'llA » || HlA^llAllxlll is a measure of the possible effect of ill scaling on the "forward" error. 39 6. Iterative Improvement It is often thought that iterative improvement is not worthwhile unless either (i) the uncertainty in the values of A is less than the unit roundoff error (e.g., if the elements of A are integers) or (ii) we wish to diagnose ill conditioning. This thinking is based on the fact that Gaussian elimination with pivoting is stable from the absolute error point of view. But according to the relative error point of view, Gaussian elimination may not give acceptable accuracy, and so it is of interest to examine the stability behavior of iterative improvement. Results of a careful error analysis are given for iterative improvement both with and without double precision accumulation of the residuals. The algorithm being considered is described as follows where subscripts denote iterates rather than components of vectors: x.. = value of A b computed by row pivoting, for m = 1, 2, 3, . . . r = computed value of b - Ax , m m d = value of A r computed by row pivoting, m m y x , .. = x + d . m+1 m m The algorithm for computing r appears in §3 and the algorithm for d is in §4, The theorem which follows shows that just one iteration of iterative improvement with just single precision accumulation of the residuals is enough to make Gaussian elimination asymptotically stable. This may seem to contradict the usual advice (Forsythe and Moler [1967, p. 49]) that "It is absolutely essential that the residual r be computed with a higher precision than that of the rest of the computation." 
Actually there is little conflict because we have shown that poorly scaled systems may be solved with an effective precision of much less than single precision. 40 THEOREM 6.1. Assume that |a||x| > 0. Gaussian elimination followed by one iteration of iterative improvement results in a backward error which satisfies nl 1 (n+l)u + {( X (n) 2 + 2n x (n) + n 2 + 2n) Cond(A _1 ) a (A, x) ^ R + X (n) a R (A, x) + | n 2 + | n}u 2 + 0(u 3 ) for single precision residual accumulation and n£ < u + { X (n) 2 Cond(A _1 ) o R (A, x) + X (n) a R (A, x) + n}u 2 + 0(u 3 ) for double precision residual accumulation. That is 3 the algorithm is asymptotically stable in either case. PROOF. The theorem follows from Theorem C.9 in Appendix C. Q.E.D. Note. The asymptotic nature of these bounds conceals the fact that certain assumptions on the smallness of u are necessary in order to get any bound at all. The actual assumptions, found in Appendix C, are too lengthy to reproduce here; roughly speaking it must be assumed that the coefficient of the second order term is less than 1/u. Recall from Theorem 4.3 that for no iterations we have 2 n 5 x(n) ° R ( A > x ) u + °( u )» and thus one iteration does make a big difference in the size of the bound. However, the presence of the product Cond(A ) a (A, x) in the second order term indicates that this may not be the case for problems which are sufficiently ill scaled and ill conditioned. THEOREM 6.2. Assume |a||x| > 0. The backward error n for iterative improvement of Gaussian elimination with row pivoting satisfies Tim n' < (n+l)u + (2n(x(n) + n+1) Cond(A -1 ) a D (A, x) m — k + X (n) a R (A, x) + \ n 2 + | n + l}u 2 + 0(u 3 ) for single precision residual accumulation and lim n" < u + (x(n) a n (A, x) + n+l}u + 0(u ) m — K for double precision residual accumulation. 41 PROOF. The theorem follows from Theorem C.7 in Appendix C. Q.E.D, The main effect of doing more iterations with single precision accumulations is a moderate reduction in the magnitude of the second term. But for the double precision case there is a striking improvement due to 2-1 2 the disappearance of the x ( n ) Cond(A ) a (A, x)u term so that the bound K -1 3 on n" depends on the condition number of A * only through the 0(u ) term. m This may represent a significant improvement for problems which are both poorly scaled and ill conditioned. 42 7. Practical Implications The comments that follow are suggested by the error analysis, but their usefulness remains to be established. By means of examples it has been shown that Gaussian elimination with (partial or complete) pivoting does not generally provide all the accuracy that the data deserve or even a fixed fraction of that accuracy. Hamming [1971, p. 121] states "It is reasonable to ask how typical these examples are and how often in the past the pivoting method has created the ill conditioning that was reported to occur by some library routines. The answers are not known at this time; all that is claimed is that textbooks and library descriptions rarely, if ever, mention this possibility (though it is apparently known in the folklore)." and so it seems that there have been practical instances where the pivoting method has performed poorly. Perhaps Gaussian elimination without iterative improvement should be regarded as a "quick and dirty" way to solve linear equations. The computation of the backward error is one reliable test for deciding whether or not the solution of a linear system is "reasonably accurate." 
The test can be made quite efficient by accumulating r and |a| |x| + |b| at the same time. If the test is failed, then in most cases the use of iterative improvement would result in a solution which passes the test. One could, of course, forgo the backward error computation and just do iterative improvement until "convergence." But such a procedure may not be completely reliable since it has not been rigorously proven that "convergence" implies a reasonably accurate solution. Stewart [1973, p. 205] mentions "the possibility that, with a violently ill-conditioned matrix, the iteration may appear to converge to a false solution." 43 It is also suggested by the theory that if double precision accumulation of the residuals is costly, then iterative improvement with single precision accumulation might still be beneficial. The success of the pivoting method depends upon a reasonable scaling of the equations, which is at best guesswork unless one has some knowledge about the sizes of the solution components. If c ~ |x|, then (i) for row pivoting one should scale the system to get (D~ 1 A)x = (D^b) where D = diag(|A|c). (ii) for complete pivoting one should scale the system to get (D~ AD 2 )(D~ 1 x) = (D^b) where D - diag(|A|c) and T> 2 = diag(c) It may be worthwhile to allow users of a linear equation solver to provide an estimate of the solution, particularly if the solution variable X is being used only as an output parameter. For simple use of the program X could be set to all ones. 44 Appendix A. Error Bounds for Row Pivoting For Lemmas A.l through A. 8 it is assumed that D has nonzero diagonal elements. This assumption does not apply to the theorem that follows these lemmas. For any n x n matrix C = (c . . ) let c. = c . . /d . . Also, iJ iJ let u) = 1 + u. LEMMA A.l. We have and m.. < a) d . /d, , i > k, ik' — ' i k 1 ij i -(k)i k+1 k _ 1 ,„ ,k-l-£,- | k-1,- a. . < a) I {2b)) a. . + a) a ij i,j :• k. I i (k) , (k) m iJ ^K /a kk PROOF. Equation (4.1) implies i > k, and because of row pivoting (see (4.5)) we get the first inequality of the lemma. Equation (4.2) implies |a< k+1 > 1 ij and therefore |a (k+1) 1 iJ <_oo|a (k) ij + (1 + ZiOlm., a^l, i,j > k + 1 1 ik kj ' J — < Jaf k) | + u(l + 2u)|a 1 (k) |, - ij kj i,j > k + 1, The second inequality of the lemma follows from this by induction on k. Q.E.D. LEMMA A. 2. The matrices L = {I..) and U = (u..) satisfy LD = A + E (1) + E (2) + ... + E (n "^ (k) (k) where the matrices E have elements e.. which satisfy ij -u> ^laf^l + (2 + 3u) a) 2 u|m..a. u for i,j > k , ik kj (k) I.«l<< - 1 I to u a (k) ik L regardless of the pivoting strategy. for i > j = k, otherwise. 45 PROOF. Define the elements of E (k) by r (k+1) (k) (k) a . . - a . . + m., a, . ij ij ik kj . * (k) * (k) m ik a kk " a ik ^' for i,j > k, for i > j = k, otherwise. By separately considering the cases i <_ j and i > j it is straightforward (k) (k) to show that the elements e . . of E satisfy n-1 (k) n E e;?' - £ £., u . - a. . , k= l « k= i lk ^ ij which establishes the equality of the lemma. Let k <^ n - 1 be fixed. Write (4.1) as ™ Ik = C-S^Hl + V- i > k .+ 1, and (4.2) as a (k+1) = (a 00 - m.,a< k) (l + 6..))(1 + 6'..), i,j > k + 1, ij ij ik kj ij ij - where the 6's are relative roundoff errors. Then ^a^V - m.. a, (k) (6.. +6'.. +6.. 6'..) for i,j > k, ij ij xk kj xj xj xj xj (k) U; .. / UK J4 - \ a -,-U 6 IJ ik ik Lo for i > j = k, otherwise, and the lemma follows from the bound u/(l + u) on the 6's. Q.E.D. LEMMA A. 3. 
The matrices L and U satisfy LU = A + E with -1 |D E|z|| £ (3-2 n-1 3)o 2n - 1 xi\\v~ 1 *\z for arbitrary z _> 0, 46 PROOF. Let E - E (1) + E (2) +. . .+ E (n "^ where the E (k) are given by Lemma A. 2. Substituting the bound on m., of Lemma A.l into the bound xk (k) on the elements e . . we get ij ,-(k), -1 ,-(k), , -1 ,-(k), £.. < w u a.. + (2 + 3u)oj u a, . , 1 ij ' - ' ij ' kj which, in fact, is valid for all i,j. This implies E |e< k) |z. < 3d||n -1 A (k> |«|| for arbitrary z >^ 0. From Lemma A.l it immediately follows that || |D- 1 A (k) | Z || < 2 k - 1 a» 2k - 1 1||D" 1 A| Z ||; and so we h* HlD^E^IzIl! 3-2 krl a." 2n - 1 «|||D" 1 A|z||, from which the lemma follows. Q.E.D. LEMMA A. 4. We have lave and £. . < to d./d , ij i 3 - I „ 2i-l r .i-£-l ir , u . . < oo E 2 a, . , i > J, i < 3- Jl=l (i) PROOF. Follows from Lemma A.l because £.. = m. . and u.. = a.. . Q.E.D, ij ij ij ij LEMMA A. 5. The vector x computed by (4.4) satisfies (U + 5U)x = y for some upper triangular matrix 6U such that where 6u..| < g(i,i)oj |u..| and lu.. + 6u..| < r n-i+li to u ij g(i,j) = < n - j + 2, j > i + 2, n - j + 1, j = i + 1, 2, j = i £ n - 1, xj-, j - i = n, regardless of the pivoting strategy. 47 PROOF. From (4.4) we have that xi x (1+6 .)(l+6^) r v • . V,U and u nn 1 + X n 6' n y n Xv + u xx ) i,i+l i+l i,i+2 ±+2 J .+ u. x x ) xn n = y. i < n - 1 where 6. and 6! are relative roundoff errors due to subtraction and division, x x respectively. By the usual type of analysis we can obtain the bounds r 6u ij 1 . n-j+2 , I | (u) J - l)|u |, n-j+1 | | (u> J - 1) |u. . | , (u) - 1) U 1J (O) - 1) u. . , V. 1J j 1 i + 2, j = i + 1, j = i < n - 1, j = i = n, from which the lemma follows. Q.E.D. LEMMA A. 6. The matrix 6U of Lemma A. 5 satisfies |||d _1 l6u|z|| < f5.2 n " 2 - 21 m^Hd^aUII for arbitrary z _> 0. PROOF. From Lemma A. 4 it follows that (|d~ 1 L6u|z). < Z E IdT 1 A., d, I |6u, . |z. x — . . x xk k ' kv i J k < (jo E E 6u, , z . . " J k kj J Applying Lemma A. 5 first and then Lemma A. 4 gives (|D _1 L6u|z). < u E E g(k,j)w n ~ k+1 |u,.|z. j k j 1 j k n ~^u I a 9 ' U:\ ) I > i > J. 1 ij 1 - ' ij jj ' - regardless of the pivoting strategy. PROOF. We have from (4.3) that (...U u * y t + l a * y 2 )...+ t * T^ltij-- b., i » 2 where 6 . is a relative roundoff error. By the usual type of analysis we £,et that -1 v i-l 6 Z ±1 \ < ((1 + a) u) - 1)U ±1 , i 1 2, -1 vi-j+1 + 0) u) J and 62-.. < ((1 + a) u) J - l)\i..\ y i > j > 2, ij ' - ' il ' - 6fc. . I < ult. . n 1 — ' li The lemma follows from the inequalities (1 + of u) - 1 <_ u)~ u k(l + u u) . k-2 < koj u and £..| < min {u, (/ j } I a9 ) /a^ * I . Q.E.D, ij 1 - iJ JJ 49 LEMMA A. 8. The matrices 6U and 6L of Lemmas A. 5 and A. 7 satisfy HlD^LCU + 6U)|z|| < (2 n+1 - n - 3)a) 2n ^| | D _1 a| z \\ for arbitrary z _> 0. PROOF. Using the bound of Lemma A. 7 and the row pivoting inequality (4.5), we get |6£..| < min{n - j + 1, n - l}a) _J u I d . /d . I , i > j. Hence •> J-XT /TT _L XTT\ I _\ ^ .. T ^ ,' ^ / „ -I -L 1 „ 1 \, , D 3 / I W^ (|D <5L(U + 6U)|z). <_ u E min{n - j + 1, n - l}co J (|D (U + 6U) | z) . j J It immediately follows from Lemma A. 5 that (|D _1 (U + 6U)|z). < a) n " :i+1 (|D" 1 u|z). and from Lemma A. 4 that (|D _1 (U + 6U)|z). < 03 n+j 2 j_1 ]||D" 1 A|z||. n _1 xt /« ^ x„nI^ - .. 2n .. v _,•_/_ 4 j. i „ inJ-^ll^ -1 Therefore, (|D -J -6L(U + 6U)|z) . £ c/ n u E min{n - j + 1, n - 1}2 J_I )| |d~ X a|z||, 1_ 3=1 from which the lemma follows. Q.E.D. THEOREM 4.1. 
Let the vector & be computed by Gaussian elimination with row pivoting and row scaling where D = diag(d , d ,..., d ) is the matrix of reciprocal scale factors. Then there exists A A such that (A + AA)x = b with II [D~ 1 AA| Z If < x(n)^||D~ 1 A|z|| for arbitrary z _> and arbitrary e satisfying < |e| < e where x(n) = r n-2 , 2nu 1 19* 2 - n - 8 le PROOF. The restriction on e implies that 5(II(r» )) = K (11(D)), and from Lemmas A. 3, A. 6, and A. 8 we have the bounds 50 -1 ,n-l 2n-l .-1, D; A e|z||< (3.2"- 1 - 3)^-^110^1 z||, D^LfiuHl < [5.2 n ~ 2 - 21a, 2n u|||D- 1 A|z||, and || |D X 6L(U + 6U) |z|| < (2 n+1 - n - 3)u) 2r \J| |d "'"A | z||. The theorem follows from the equation (A + E + 6LU + (L + 6L)6U)x = b. Q.E.D. THEOREM 4.2. There exists a problem Ax = b and a floating point arithmetic <+, -, x, J> suoh that the solution x computed by Gaussian elimination with partial or complete pivoting satisfies (A + AA)x = b only for those matrices AA for which AA > [19-2 n 2 - n - 81u + 0(u 2 ) Therefore, the bound of Theorem 4.1 is the best possible bound in the limit u •* 0. PROOF. Obvious for n = 1. Assume n > 2. Let and A = M -M M -M -M o : M 1 -M 1 1 + (7.2 k X - k - 8)u, ( 1 + (2 n+1 - n - 7)u, 1 + (19.2 n_2 - n - 8)u, k <_ n - 2, k = n - 1, k = n. If M is large enough, then there are no interchanges even with complete pivoting. We have m. . = m = M/(-M), i > j , a (1) - 1, in (k) . (k-1) , (k-1) a. = a . - m x a, _ , in in k-l,n, i > k > 2. 51 Suppose that all these floating point operations increase the magnitude of the result by a factor (1 + 2u)/(l + u) . Then and m = -(1 + u) + 0(u ) 00 - tt + u) a^'-H (1 + 3U) a/":" + 0(u 2 ). in in By induction on k it follows that k-l,n and hence We have a (k) =2 kl + 2 k (k _ 1)u + / u 2j i > k, in — u. = 2 k X + 2 k (k-l)u + 0(u 2 ) kn where Then and y l = b l y k = \" S k-l' k ^ 2 ' S k = ( . . . (m * y;L + m * y 2 ) . . . + m * y fc ) . S = m * b S. = S. . + m * (b. - S. ,)» 2 < k < n - 1. k k-1 k k-1 — — Suppose that all these floating point operations reduce the magnitude of the result by a factor 1/(1 + u) . Hence S 1 = -b 1 + 0(u ) and S = (2 - 3u)S, 1 - (1 - 2u)b + 0(u ), 2 <_ k <_ n - 1. By induction on k it follows that k k+1 9 S k = 1 - 2 - 2 (k - 4)u - (k + 9)u + 0(u ), k < n - 2 and S , = 1 - 2 n 1 - 2 n 2 (4n - 19)u - (n + 8)u + 0(u 2 ) n— l 52 Also we have y = 1 - 2u, y k = (b k " S k-l )(1 " U + 0(u2)) ' k 1 2 - From this we get y k = 2 k_1 + 2 k (k - 2)u + 0(u 2 ), k < n - 2, y - = 2 n " 2 + 2 n ~ 2 (2n - 5)u + 0(u 2 ), y = 2 n_1 + 2 n_1 (2n - l)u + 0(u 2 ). We have x = y /u , n n nn Vl = ||AAx|| = ||b - Ax|| = (19-2 n_2 - n - 8)u + 0(u 2 ). Q.E.D. 53 Appendix B. Error Bounds for Column Pivoting For any matrix C = (c..) let c. = c..d.. Also, let to = 1 + u, LEMMA B.l. We have and -u^li-l^l. i>k.i>k. ,-(k), „ k+1 k v 1 , ,k-l-£i- | k-1,- , , 4 v , a.. k . 13 - £=1 '• i*' 1J PROOF. Equation (4.1) implies I 00, i I (k) (k) J , (k) , • m ik a kj d jl 1 "l a ik a kj d j /a kk I' 1 > k, J > k. and because of (A. 5) we get Im^a^d.l < wlafj^d. I, i > k, j > k, 1 lk kj J ' — ' lk k ' J — which proves the first inequality. Equation (4.2) implies l (k+1) | i (k) | ,. ' N I (k) I a.. < wa.. + (1 + 2u) m..a. . , i,j > k + 1, 1 ij ' - » ij ' ' lk kj " and therefore |af k+1) | k + 1. 1 ij '- ' ij ' /] lk " - The second inequality of the theorem follows from this by induction on k. Q.E.D, LEMMA B.2. The matrices L and U satisfy LU = A + E with |ED|e < L7-2 n " 2 - n-2J co 2n U |AD|e. PROOF. Assume n > 2. Let E = E^ 1 ' + E (2 ^ +. . 
.+ E^ n_1 ' where (k) the E are given by Lemma A. 2. Substituting the bound on m., of Lemma B.l (k) into the bound on the elements e . . we get ij r i-(k) | _ i-(k) I , , , ^ i u|a_ | + 2a)u|a. | for i,j > k, for i > j = k, otherwise. 54 This implies Z|ef k) | < (2n - 2k+l)um|if k) | + " & |a (k) |, 1 > k, « ij ' xk ' . , , ' ii ' J J J=k+1 J and from Lemma B.l it follows that l|e< k) | < (2n - 2k + l). 2k u{^ Z^ 1 "*^ + \ij) f .. 2k-i k : 1 _k-i-£ i - i k-i J i- i + (n-k)io u E 2 a.. + oj u E a.. £=1 " j-k+1 ^ r(2n-l)w 2 u E la. . I if k = 1, < < i 1J L(3n - 3k+l)o) u 2 E | a. . | if k > 2. j 1J The lemma follows because n-1 _ (2n-l) + Z (3n - 3k+l)2 = 7-2 n - n-2. Q.E.D. k=2 LEMMA B.3. We have \l . . u..| < ulaH I i 1 ij ij - ij IJ A=l 1J6 and u . . < u. . I > i< i. i iji - i ii" - PROOF. The first two inequalities follow immediately from Lemma B.l, and the third inequality is a consequence of column pivoting. Q.E.D, LEMMA B.4. The matrix SU of Lemma A. 5 satisfies |L 6U D|e < (2 n+1 - n-2)a) 2n u|AD|e. PROOF. Applying the inequality |u..| <_ |u..| of Lemma B.3 and the bounds of Lemma A. 5, we get for i < n 55 n Z 1 6u . . I < u Z g(i,i)oj lu.J Therefore (n-i+2) (n-i+1) n-i .- < ~ 0) u u. . — 2 ' li ( L5UD e) . = Z £. . S 6u._ 3 k n "" j laf^/af^ I (a-j+l)a> n ' j+1 |u J1 A , - j ' 13 33 ' { 33* and so using Lemma B.3 gives 56 Appendix C. Error Bounds for Iterative Improvement LEMMA C.l. The computed residual satisfies r = b - Ax + c m mm with where c o° + ((f^ 1 - u) limjz || + ^tw||| A || x ||| } 1 - vy " m" i - vy ' m-x» and the lemma follows from Lemma C.5. Q.E.D. THEOREM C.7. Assume |a||x| > 0. Then (1 + uy) (uv + w)o w(g - 1) 1-t "l-u lim n < — m — ., (1 + u) (uv + w)ya ni-xx> l - u - z ' — 1-t provided that (1 + u) (uv + w)ya < (1 - u) (1 - t) where a = o R (A, x) PROOF. From Lemmas C.5 and C.6 we have T~r~ ii ii (uv + w)a I . i i I lim z e < • i — z A x " m" — 1-t 'iii m-x» and -TT- I I r (uv + w)a w(a - l) -. I . I I i lim z < {-z—z »=■ -} Ax. 1 m' — 1-T l-u '!'■ m+oo Hence using Lemma C.3, 59 LEMMA C.2. Define z = A(x + d - x) . Then m mm I z e where v = x( n ) u °- n d y = Cond(A ). PROOF. The correction term d m = (A + F )~ r where F is the AA m mm m of Theorem 4.1. We have z = A(x -x)+r - (I + F A ) FA~r mm m m mm = c - (I + F A -1 )" 1 F (x - x - A _1 c ). m m m m m It follows from the bounds of Theorem 4.1 that -1 z < |c I + (1 - vy) v (|| |a| |x - x| II +yI|c ||)e. m — ' m' " ' ' ' m ' " " m" Substituting the bounds of Lemma C.l into this proves the lemma. Q.E.D. LEMMA C.3. We have and A(x , , -x) < z + ul A| | x + uy z e m+1 ' — ' m' i ii i "m" A x ., - x < u A x + (1 + u)y z e. i i i jn-i-i i _ i i i i " m" PROOF. The new iterate x = x + d + g m+1 m m m where |g <_ u|x + d |. Equivalently m" — m x = x + A z +g m+1 m m where |g and tt— i . / n i r (1 + uy) (uv + wr)a w(a - 1) . • ■, i . i i i lim A(x - x) < {- "-* ^ + u} Ax 1 m ' — 1-t 1-u i i i i < {u+ (1 + u)(uv+w)ya } | A | — 1-T ' ' lxm Ax -x| <{u+ ; L — } |A| |x| . 1 ' ' m nv-*» The theorem follows from and thus So |A(x - x)| |A(x - x)| m ' m H = max I . I I 1 < max i-tt-i — i r-rT-i T • Q.E.D. m Ax — Ax - Ax - x i i i m i i i i i i i i m i LEMMA C.8. Vie have l l l a l l l . r wvY , v ( u + vy+wy)-,|||.|| in z. < wAx + {- — + — ^ ' ^r - ^} Axe. 1 l 1 — ' ' ' ' 1 - vy /n N 2 ' (1 - vy) PROOF. The first iterate x = (A + AA)~ Ax, x - x = -A 1 (I + AA A 1 ) "'"AAx. 
LEMMA C.8. We have

    |z_1| ≤ w |A| |x̄_1| + { wvγ/(1 - vγ) + v(u + vγ + wγ)/(1 - vγ)² } || |A| |x| || e.

PROOF. The first iterate x̄_1 = (A + ΔA)^{-1} A x, so that

    x̄_1 - x = -A^{-1} (I + ΔA A^{-1})^{-1} ΔA x,

and

    ||A(x̄_1 - x)|| ≤ (1 - vγ)^{-1} v || |A| |x| ||,
    || |A| |x̄_1 - x| || ≤ (1 - vγ)^{-1} v γ || |A| |x| ||.

From Lemma C.2 we have

    |z_1| ≤ w |A| |x̄_1| + (1 - vγ)^{-1} { (v + w) || |A| |x̄_1 - x| || + u ||A(x̄_1 - x)|| + vwγ || |A| |x| || } e.

The lemma follows by substituting the previous two inequalities into this. Q.E.D.

THEOREM C.9. Assume that |A| |x| > 0. Then

    η_1 ≤ [ w + u + (v + u) w γ α/(1 - vγ) + (u + vγ + wγ)(1 + uγ) v α/(1 - vγ)² ] / [ 1 - u - (w + uv + v²γ)(1 + u) γ α/(1 - vγ)² ],

provided that the denominator is positive.
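The hypotheses of Theorems C.7 and C.9 involve γ = cond(A) and α = σ_R(A, x). Both are easily computed for small examples; in the sketch below the function names are mine, and the reading of σ_R(A, x) as the row ill-scaling ratio max_i (|A| |x|)_i / min_i (|A| |x|)_i is an assumption suggested by its use alongside cond(A).

    import numpy as np

    def skeel_cond(A):
        """gamma = cond(A) = || |A^{-1}| |A| || in the max norm."""
        return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), ord=np.inf)

    def row_scaling_ratio(A, x):
        """Assumed reading of sigma_R(A, x): max_i (|A| |x|)_i / min_i (|A| |x|)_i,
        defined whenever |A| |x| > 0 as in Theorems C.7 and C.9."""
        s = np.abs(A) @ np.abs(x)
        return s.max() / s.min()

    A = np.array([[1.0e0, 2.0e0],
                  [3.0e-8, 4.0e-8]])   # a small, badly row-scaled matrix
    x = np.ones(2)
    print("cond(A) =", skeel_cond(A))
    print("sigma_R =", row_scaling_ratio(A, x))

Unlike the ordinary condition number, cond(A) = || |A^{-1}| |A| || is invariant under row scaling of A, which is why it is the natural measure of conditioning for the componentwise perturbations considered here.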
BIBLIOGRAPHY

BAUER, F. L. Genauigkeitsfragen bei der Lösung linearer Gleichungssysteme. ZAMM 46, 1 (November 1966), 409-421.

BAUER, F. L. Computational graphs and rounding error. SIAM J. Numer. Anal. 11, 1 (March 1974), 87-96.

FORSYTHE, G. E., AND MOLER, C. B. Computer Solution of Linear Algebraic Systems. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1967.

GEAR, C. W. Numerical errors in sparse linear equations. File F-75-885, Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, Illinois, May 1975.

HAMMING, R. W. Introduction to Applied Numerical Analysis. McGraw-Hill, New York, 1971.

KAHAN, W. Numerical linear algebra. Canad. Math. Bull. 9 (1966), 757-801.

MILLER, W. On the stability of finite numerical procedures. Numer. Math. 19 (1972), 425-432.

MILLER, W. Computer search for numerical instability. J. ACM 22, 4 (October 1975), 512-521.

MILLER, W. Roundoff analysis by direct comparison of two algorithms. SIAM J. Numer. Anal. 13, 3 (June 1976), 382-392.

SHERMAN, A. H. Algorithms for sparse Gaussian elimination with partial pivoting. Rpt. R-76-817, Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, Illinois, July 1976.

STEWART, G. W. Introduction to Matrix Computations. Academic Press, New York, 1973.

VAN DER SLUIS, A. Stability of solutions of linear algebraic systems. Numer. Math. 14 (1970), 246-251.

VAN DER SLUIS, A. Condition, equilibration, and pivoting in linear algebraic systems. Numer. Math. 15 (1970), 74-86.

WILKINSON, J. H. Rounding Errors in Algebraic Processes. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1963.

ABSTRACT

Roundoff error in the solution of linear algebraic systems is studied using a more realistic notion of what it means to perturb a problem, namely that each datum is subject to a relatively small change. The condition number is determined for this approach. A good computable error bound is given for the "backward error." The effect of scaling on the stability of Gaussian elimination is studied, and it is discovered that the proper way to scale a system depends on knowing the solution. Finally it is shown that Gaussian elimination can be stabilized by doing iterative improvement.

Key words: numerical stability, Gaussian elimination, iterative improvement, ill conditioning, scaling, equilibration, pivoting.