UIUCDCS-R-76-817                                          C00-2383-0034

ALGORITHMS FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING

by

Andrew H. Sherman

July 1976

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
URBANA, ILLINOIS 61801

Supported in part by the National Science Foundation under grant NSF DCR-73-07998 and by the Energy Research and Development Administration under grant US ERDA E(11-1) 2383. An earlier version of this paper was presented at the 1976 SIAM National Meeting.

Abstract

In this paper we compare several algorithms for sparse Gaussian elimination with column interchanges. The algorithms are all derived from the same basic elimination scheme, and they differ mainly in implementation details. We examine their theoretical behavior and compare their performances on a number of test problems with that of a high quality complete threshold pivoting code. Our conclusion is that partial pivoting codes perform quite well and that they should be considered for sparse problems whenever pivoting for numerical stability is required.

Key Words: Sparse Gaussian Elimination, Sparse Linear Systems, Linear Equations, Analysis of Algorithms, Pivoting Algorithms

CR Categories: 4.6, 5.14, 5.25

1. Introduction

Let A be an NxN nonsingular matrix, and consider the system of linear equations

    A x = b                                                    (1.1)

where x and b are vectors of length N. If N is small or A is dense (i.e. most entries a_ij are nonzero), then the standard algorithm for solving (1.1) is Gaussian elimination with partial pivoting (cf. Forsythe and Moler [6]). This algorithm is quite well understood, from both the theoretical and practical points of view, and it produces an accurate solution x (provided that A is well conditioned and the equations are well scaled, both of which we shall assume throughout).

When N is large and the matrix A is sparse (i.e. most entries a_ij are zero), the situation is quite different. Now there are two problems to contend with: the accurate computation of x, and the exploitation of the large number of zeroes in A to reduce storage and computing time. When A is symmetric and positive definite, a number of methods may be effective, including variants of Gaussian elimination and iterative methods such as conjugate gradient. However, in most other cases, Gaussian elimination with pivoting seems to be the best method currently available. In this paper we examine algorithms for sparse Gaussian elimination with partial pivoting.
In Section 2 we introduce a basic high-level algorithm and point out several problems that arise in practice. In Section 3 we discuss several possible algorithm refinements which solve the practical problems, and we give a theoretical comparison of them based on their behavior in the best, average, and worst cases. Unfortunately, we do not believe that our theoretical analysis reflects what happens in practice, so in Section 4 we give the results of some numerical experiments with our algorithms and compare them with one of the better existing codes for Gaussian elimination with pivoting. Finally, we conclude in Section 5 by suggesting directions for future development based on the results in this paper.

2. A Basic Algorithm

In this section we examine a basic algorithm for sparse Gaussian elimination with partial pivoting. More precisely, we consider an algorithm which performs Gaussian elimination with column interchanges on the matrix A to effectively obtain a factorization of the form

    A Q = L U,                                                 (2.1)

where L is lower triangular, U is unit upper triangular, and Q is a permutation matrix corresponding to the column interchanges. Once this factorization has been obtained, we can compute x by solving the systems L y = b and U Q^T x = y, although we will not discuss this forward and back solution process here. The algorithm we give is a modification of an algorithm without pivoting discussed in [5].

    line
      1   For k = 1 to N do
      2     (I_k = {j: a_kj ≠ 0};
      3      m = min {j: j ∈ I_k ∪ {N+1}};
      4      While k > m do
      5        (I_k = I_k ∪ I_m - {m};
      6         For j ∈ I_m do
      7           (a_kj = a_kj - a_km · a_mj);
      8         m = min {j: j ∈ I_k ∪ {N+1}});
      9      If m ≤ N then do
     10        (ℓ = min {j: |a_kj| = max {|a_ki|: k ≤ i ≤ N}};
     11         If a_kℓ = 0 then exit with failure (zero pivot);
     12         Interchange columns k and ℓ;
     13         If m = k then I_k = I_k - {k}
     14         Else I_k = I_k - {ℓ};
     15         For j ∈ I_k do
     16           (a_kj = a_kj / a_kk))
     17      Else exit with failure (zero pivot));

                             Algorithm 2.1

The basic algorithm, Algorithm 2.1, is a row-oriented version of Gaussian elimination with column interchanges. It consists of N steps, during each of which one row of A is processed. When processing the k-th row at the k-th step, I_k is used to hold the indices of all columns containing nonzeroes in the k-th row. Initially (line 2), these are just the nonzero columns of A in the k-th row. Then, in increasing order, for each m ∈ I_k, m < k, a multiple of row m of U is subtracted from row k to annihilate a_km (lines 3-8). This may cause fill-in, i.e., the introduction of new nonzero elements in the k-th row, so I_k must be updated (line 5). Finally, when all m < k have been processed, I_k contains the indices of columns which contain nonzeroes in the portion of the k-th row lying in the upper triangle of U. If A is nonsingular, I_k will not be empty, so that m ≤ N will hold in line 9. In that case the algorithm selects the remaining nonzero element with maximum modulus and interchanges its column with the k-th column. If A is singular, on the other hand, then either m = N+1 in line 9, or a_kℓ = 0 in line 11. In either case, the algorithm takes a failure exit. (However, this situation could simply be handled by storing a null k-th row, if one wished to deal with singular matrices.)
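To make Algorithm 2.1 concrete, the following is a minimal dense-matrix sketch in Python (our own illustration, not the paper's implementation; the function name and the use of NumPy are assumptions, and the sets I_k are left implicit since every entry is stored). It carries out exactly the row operations, pivot selection, and column interchanges of lines 1-17.

    import numpy as np

    def factor_column_pivoting(A):
        # Dense sketch of Algorithm 2.1: overwrite a copy of A with L
        # (lower triangle, including the diagonal) and the strictly upper
        # part of the unit upper triangular U; q records the column
        # interchanges, so that A[:, q] = L U.
        A = np.array(A, dtype=float)
        N = A.shape[0]
        q = np.arange(N)
        for k in range(N):
            # Lines 3-8: subtract multiples of earlier rows of U from row k.
            for m in range(k):
                if A[k, m] != 0.0:
                    A[k, m+1:] -= A[k, m] * A[m, m+1:]
            # Line 10: choose the remaining entry of maximum modulus in row k.
            piv = k + int(np.argmax(np.abs(A[k, k:])))
            if A[k, piv] == 0.0:
                raise ValueError("zero pivot: A is singular")  # lines 11, 17
            # Line 12: interchange columns k and piv, recording the swap in q.
            A[:, [k, piv]] = A[:, [piv, k]]
            q[[k, piv]] = q[[piv, k]]
            # Lines 15-16: divide the k-th row of U by the pivot a_kk.
            A[k, k+1:] /= A[k, k]
        return A, q

On exit, taking L as the lower triangle of the returned matrix and U as the identity plus its strictly upper triangle reconstructs the permuted matrix A[:, q], which is a convenient way to check the factorization on small examples.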
That Algorithm 2.1 is numerically stable can be shown quite easily by relating it to the application of standard Gaussian elimination with row interchanges to A^T. In fact, assume that such a procedure produces a factorization of A^T as

    P A^T = L' U',                                             (2.2)

where L' is unit lower triangular, U' is upper triangular, and P is a permutation matrix corresponding to the row interchanges. Then we can show that Algorithm 2.1 produces the factorization (2.1) of A with L = U'^T, U = L'^T, and Q = P^T. Since the computation of (2.2) is numerically stable (cf. [6]), that of (2.1) by Algorithm 2.1 is also.

In Algorithm 2.1 there are only two operations whose costs can be greatly affected by the column interchanges due to pivoting: the selection of m in lines 3 and 8, and the updating of I_k in line 5. The costs of these operations may depend critically on the representation of the sets I_k and I_m (e.g. ordered vs. unordered, list vs. array), while the costs of other operations in the algorithm in general do not.

As an illustration, consider two possible representations of I_k: as a linked list in increasing order and as a heap with smaller nodes nearer the root (cf. Knuth [7], p. 145). In the first case, selecting m is easy, since we want to choose the first entry of I_k, and the updating operation in line 5 can be performed as a linear merge of the two ordered lists in O(|I_m| + |I_k|) time. (We use |I_m| and |I_k| to denote the number of entries in I_m and I_k, respectively, when we start to merge I_m into I_k.) In the second case, selecting m is still easy, since we want to choose the root of the heap, but now the updating operation may be more costly, since we must delete m and separately insert each entry of I_m into the heap for I_k. This may require a total of O(|I_m| log_2(|I_m| + |I_k|)) time.

From this example it might seem that there is no question as to the proper course: simply use an ordered linked list representation for I_k (and an ordered array representation for previously computed sets I_m). In fact, if there were no pivoting, this conclusion would probably be correct. However, as soon as column interchanges can occur, the situation is not so clear-cut. The problem is that while I_m was in increasing order when it was computed at the m-th step, subsequent pivoting operations may cause it to be out of order at the k-th step. Thus the set union operation can no longer be performed as a simple ordered list merge. We might try to avoid this problem by keeping the sets I_k in some canonical order (say the initial column order of A), but then the selection of m would become difficult, since it would require a (necessarily linear) search for the minimum remaining column index in I_k (in the current column order) at the k-th step.

In the remainder of this paper, we will use Algorithm 2.1 as the basis for our discussion of various alternative means of implementing the selection and updating operations which seem so problematical when pivoting is permitted. The next section contains descriptions of several possible implementations and a limited theoretical discussion of them. In Section 4, we will present more meaningful comparisons, based on numerical experiments.
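To fix ideas, here is the ordered-list merge from the illustration above, written out in Python (a hypothetical helper of our own; the paper gives no code for it). Both index lists are assumed to be in increasing order, and the pivot index m is dropped as required by line 5 of Algorithm 2.1.

    def union_minus_m(Ik, Im, m):
        # Linear merge computing I_k ∪ I_m - {m} in O(|I_m| + |I_k|) time,
        # assuming both lists are sorted in the current column order.
        out, i, j = [], 0, 0
        while i < len(Ik) or j < len(Im):
            if j == len(Im) or (i < len(Ik) and Ik[i] <= Im[j]):
                v, i = Ik[i], i + 1
                if j < len(Im) and Im[j] == v:
                    j += 1                    # common entry: keep one copy
            else:
                v, j = Im[j], j + 1
            if v != m:                        # drop the pivot index m
                out.append(v)
        return out

With the heap representation the same update must instead insert the entries of I_m one at a time, which is where the O(|I_m| log_2(|I_m| + |I_k|)) estimate above comes from.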
3. Algorithm Implementations

In this section we present six different ways of implementing the selection and updating operations of Algorithm 2.1. For each we will briefly describe the representations used for I_k and I_m (i.e. for I_k during the k-th step and for the previously computed sets I_m). Then we will outline the methods used to select m and to form the union I_k ∪ I_m, giving rough estimates of the costs of these operations in the best case (no pivoting), the average case, and the worst case. By average case here, we mean the average case with respect to the N! possible reorderings of the columns of A, not with respect to all possible matrices A. While the former may be less interesting, the latter seems all but impossible to study except through some (probably unrealistic) testing program using randomly generated matrices A.

1) Run Insertion

Each set I_k is kept as a linked list in increasing order with respect to the current column order at the k-th step, and the sets I_m are kept as arrays in increasing order with respect to the column order at the m-th step, where they were computed. The selection of m requires constant time, since m is always the first remaining entry of I_k. To form I_k ∪ I_m, we effectively split I_m into its component increasing runs (cf. Knuth [7], p. 34) and separately merge each of the runs into I_k. In practice, this is done by merging the entries of I_m into I_k and backing up to the beginning of I_k whenever an out-of-order entry is found in I_m. (A sketch of this merge appears after implementation (4) below.) In the best case, the union operation requires O(|I_m| + |I_k|) time, and in both the average and worst cases it requires O(|I_m| · |I_k|) time (cf. Knuth [7], pp. 34-45, for a discussion of the average number of runs in a sequence).

2) List Merge with Bubble Sort

The sets I_k and I_m are kept as in (1), except that I_m is reordered whenever it is used. As before, choosing m requires constant time. To form I_k ∪ I_m, we first sort I_m with a bubble sort (cf. Knuth [7], pp. 106-110) and then perform an ordered list merge. In the best case, this will require O(|I_m| + |I_k|) time, while in the average and worst cases it will require O(|I_m|^2 + |I_k|) time.

3) List Merge with Insertion Sort

This is identical with (2), except that I_m is sorted with a straight insertion sort (cf. Knuth [7], pp. 80-82) when it is used. As above, m can be chosen in constant time, and the set union operation requires O(|I_m| + |I_k|) time in the best case and O(|I_m|^2 + |I_k|) time in the average and worst cases.

4) List Merge with Shell Sort

This is also identical with (2), except that I_m is sorted with a Shell sort (cf. Knuth [7], pp. 84-95) using increments as suggested in Knuth [7], p. 95. Again m can be chosen in constant time, and the set union operation requires O(|I_m|^2 + |I_k|) time in the worst case. In the average case, however, only O(|I_m|^(3/2) + |I_k|) time is required.
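The following Python sketch (our own illustration with hypothetical names; the paper describes the method only in prose) shows the Run Insertion update of implementation (1). Entries of I_m are merged into the ordered list I_k, and the scan position backs up to the front of I_k whenever an out-of-order entry signals the start of a new run.

    def run_insertion_merge(Ik, Im):
        # Ik: list sorted in the current column order, with the pivot m at
        # its head; Im: index list whose increasing runs may have been
        # broken up by earlier column interchanges.
        m = Ik.pop(0)                 # m is the minimum, i.e. the first entry
        pos, prev = 0, None
        for v in Im:
            if prev is not None and v < prev:
                pos = 0               # out-of-order entry: new run, back up
            prev = v
            while pos < len(Ik) and Ik[pos] < v:
                pos += 1              # advance along I_k
            if pos == len(Ik) or Ik[pos] != v:
                Ik.insert(pos, v)     # splice the new column index into I_k
            pos += 1
        return Ik

(A linked list would make each splice O(1); Python's list.insert is used only for brevity.) When I_m is a single increasing run the scan never backs up, giving the O(|I_m| + |I_k|) best case; each additional run costs another pass over a prefix of I_k, which is the source of the O(|I_m| · |I_k|) average and worst case estimates.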
5) Heap Insertion

The set I_k is split into two parts, I1_k = {m ∈ I_k : m < k} and I2_k = {m ∈ I_k : m > k}. I1_k is kept as a heap with smaller nodes nearer the root (cf. Knuth [7], p. 145), and I2_k is kept as an unordered array. In addition the characteristic vector of I_k (cf. Aho, Hopcroft, and Ullman [1], p. 49) is kept to avoid storing duplicate entries. At the end of the k-th step, I1_k is empty, and I_k = I2_k. Hence we assume that the sets I_m are stored as unordered arrays. (In fact, there is no loss in not having the sets I_m ordered.) Choosing m in this case still requires constant time, since we want to choose the root of the heap for I1_k. To form I_k ∪ I_m, we separately add each entry of I_m either to the heap for I1_k or to the end of the unordered array for I2_k. To insert an entry into the heap requires O(log_2|I1_k|) time, and to add an entry to the unordered array requires constant time. Thus we expect that the best case time for the set union operation will be O(|I_m|) (when all the entries of I_m are larger than k), and that the average and worst case times will be O(|I_m| log_2(|I_m| + |I_k|)). Note that deleting m from I_k requires O(log_2|I1_k|) time, a usually insignificant cost.

6) Pivot Search (cf. Curtis and Reid [3])

The set I_k is kept as a linked list in increasing order with respect to a canonical ordering (e.g., the initial column order of A), and the sets I_m are kept as ordered arrays in the canonical order. Now choosing m requires searching I_k for the minimum remaining column number in the current order, and since this must be done as a linear search, it will require O(|I_k|) time on average. In return for this searching, we can again form I_k ∪ I_m in O(|I_m| + |I_k|) time independent of the amount of pivoting, since both I_m and I_k are in increasing order with respect to the canonical order.

4. Numerical Results

In this section we give numerical results obtained by programming the six implementations of Section 3 and running the programs on several test problems. The results here are not comprehensive, but the problems are realistic ones. The problem GS192 arose in the solution of a parabolic partial differential equation and was supplied by Professor Paul Saylor of the University of Illinois. The others were taken from a collection of test problems supplied by Dr. Iain Duff of the Atomic Energy Research Establishment at Harwell, England. Brief descriptions of the problems are given in Table 4.1. (See Duff and Reid [4] for further discussion of the Harwell test problems.)

    Problem   N    Application                              Remarks
    ARC130    130  Laser Research
    SHL0      663  Linear Programming                       Lower triangular
    SHL200    663  Linear Programming
    SHL400    663  Linear Programming
    STR0      363  Linear Programming                       Lower triangular
    STR200    363  Linear Programming
    STR400    363  Linear Programming
    GS192     192  Parabolic Partial Differential Equation  Banded with half
                                                            bandwidth of 19

                               TABLE 4.1

Experiments have shown that the storage and time costs of the algorithms discussed here may depend heavily on the initial row and column orders of A. Except for GS192, we chose to order the rows in order of increasing number of nonzero entries and to order the columns symmetrically. For GS192, we used the matrix exactly as supplied to us. If the permutation matrix P accomplishes the desired row reordering, then we applied the algorithms to the permuted system

    P A P^T (P x) = P b.

There is little theoretical reason to expect that the choice of P would have a significant effect on the accuracy of the computed solution x, except that reducing the number of arithmetic operations performed should give a corresponding reduction in the roundoff error incurred. In practice, however, we found that particularly inefficient orderings of the rows led to poor accuracy, while orderings which were at least reasonably efficient gave uniformly good results. Unfortunately, at present we can say no more than this, but we hope to investigate further the problem of choosing the initial row and column orderings. (Note, however, that the simple row reorderings described above gave good numerical results for our test problems.)
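As an illustration of this pre-ordering, the following sketch (our own, dense for clarity; NumPy and the helper name are assumptions) constructs the permuted system P A P^T (P x) = P b with the rows sorted by increasing nonzero count and the columns permuted symmetrically.

    import numpy as np

    def preorder_by_row_count(A, b):
        # Sort the rows of A by increasing number of nonzeroes and permute
        # the columns symmetrically, forming P A P^T and P b.
        counts = (A != 0).sum(axis=1)
        p = np.argsort(counts, kind="stable")   # row permutation P
        return A[np.ix_(p, p)], b[p], p

After solving the permuted system for y = P x, the solution of the original system is recovered by x[p] = y.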
To provide a benchmark comparison for our algorithms, we solved the test problems with a code due to A. R. Curtis and J. K. Reid [2] which is part of the Harwell Subroutine Library. This code uses a rather more sophisticated complete pivoting algorithm to factor A and compute x. Where our algorithms pre-order A to exploit sparseness and then pivot entirely for numerical stability, the Harwell code uses a strategy known as threshold pivoting to do both at once. In threshold pivoting the pivot element at the k-th step is chosen to be that entry a_ij, k ≤ i, j ≤ N, which requires the fewest arithmetic operations to eliminate and which satisfies either

    |a_ij| ≥ u · max {|a_iℓ|: k ≤ ℓ ≤ N}   or   |a_ij| ≥ u · max {|a_ℓj|: k ≤ ℓ ≤ N},

where u is a threshold parameter satisfying 0 < u ≤ 1. (A sketch of this acceptability test appears after Table 4.2 below.) For our experiments we used two different settings for u: u = 1.0E-6, to allow pivoting almost entirely to exploit sparseness, and u = 1.0, to force pivoting almost entirely for numerical stability. Note that the form of the threshold test allows for some flexibility in either case. For intermediate values of u, we would expect the code to exhibit a behavior somewhere between the two extremes we used.

Our experiments were run in single precision on an IBM System/360 Model 75 using the Fortran IV Level G compiler. All of the programs gave equally accurate results as measured by the L_2 and L_∞ norms of both the absolute error in the solution and the residual, except that the Harwell code with u = 1.0E-6 was sometimes less accurate than the others. Hence the important question in our tests was efficiency, and we chose to use two measures: the number of nonzeroes in the computed LU factorization (as some measure of storage cost), and the time required to compute the solution x from the original (not yet pre-ordered) system (1.1). We should note that probably all of the codes could be made somewhat faster by more careful coding and that the times may be in error by approximately 10% due to system overhead. However, the results are still qualitatively interesting.

The results shown in Table 4.2 demonstrate several things. Most obviously, the partial pivoting codes are invariably much faster than the Harwell code, even though they produce a denser factorization (and hence perform more arithmetic operations). Next, it appears that the Shell Sort, Heap Insertion, and Pivot Search algorithms performed worse than the other partial pivoting algorithms on most of the problems. In the first case this is probably due to the nature of the data which must be sorted; in the second case, this is probably due to data structure overhead; and in the last case this is probably due to the fact that the costly searches occurred in the filled-in rows at each step. Finally, it seems that the remaining partial pivoting algorithms all performed comparably well, with the Run Insertion algorithm appearing best overall, perhaps because it can be implemented with little programming overhead.

[TABLE 4.2. Numbers of nonzeroes in the computed LU factorizations and solution times for each code on the test problems of Table 4.1; the detailed entries of this table are not recoverable from this copy.]
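The following sketch (our own, dense for clarity; the Harwell code itself is Fortran and keeps much more elaborate data structures) shows the acceptability test described before Table 4.2: an entry of the remaining submatrix is an acceptable pivot if it is within a factor u of the largest modulus in its row or in its column.

    import numpy as np

    def threshold_candidates(A, k, u):
        # Return the (i, j) positions (0-based, i, j >= k) that pass the
        # threshold test with parameter 0 < u <= 1.
        S = np.abs(A[k:, k:])
        ok = (S >= u * S.max(axis=1, keepdims=True)) | \
             (S >= u * S.max(axis=0, keepdims=True))
        ok &= S > 0                   # a zero entry is never an acceptable pivot
        return [(i + k, j + k) for i, j in np.argwhere(ok)]

Among the acceptable entries the Harwell code then picks the one that is cheapest to eliminate. With u = 1.0E-6 nearly every nonzero passes the test, so the choice is driven almost entirely by sparsity; with u = 1.0 the test restricts the choice to entries of maximal modulus in their row or column.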
5. Conclusion

There seem to be several directions for future research in this area. On the theoretical side, there is a particularly troubling problem which may be stated as follows. Let C(N) be the cost (in time) of solving a sparse system (1.1) without pivoting. (Assume for a moment that this is numerically stable.) It would be desirable for a pivoting algorithm to have a cost which was O(N · C(N)) in the worst case and which declined to C(N) as the amount of pivoting decreased. Unfortunately, none of the algorithms discussed here has both properties, and, in fact, it is unclear whether any algorithm could have them.

In a more practical vein, it is desirable to continue the testing and comparison program begun in the work reported in this paper. One problem of importance is the selection of the initial row ordering. Another question which might be investigated is whether there is some threshold technique which might be applied to partial pivoting as discussed here. Since no information is kept about the column structures of A, L, and U, the strategy used in the Harwell code is not suitable, but it might be possible to use a threshold parameter to decide whether or not to pivot at all in a certain row (one possible form of such a test is sketched at the end of this section). Reducing the amount of pivoting in this way might make the algorithms run faster without significantly affecting accuracy.

Finally, we must turn to the eventual goal of work such as ours, the production of useful mathematical software. It is true that the behavior of a partial pivoting code depends quite heavily on the specific problem being solved, in particular on the amount of pivoting required and on the initial ordering of the rows and columns of A. Hence our sample of problems may provide a biased and incorrect view of the true situation. However, we believe that our results do provide enough encouragement to justify the development and extensive testing of high quality subroutines based on one or more of our algorithms.
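One possible form of the per-row threshold test suggested above (entirely our speculation; the paper does not define it) would be applied at line 10 of Algorithm 2.1:

    def natural_pivot_acceptable(row_k_upper, u):
        # Speculative sketch: accept the natural pivot a_kk, and skip the
        # column interchange at step k, whenever a_kk is within a factor u
        # of the largest remaining modulus in row k.  row_k_upper holds
        # the entries a_kj, k <= j <= N, with a_kk first.
        akk = row_k_upper[0]
        return abs(akk) >= u * max(abs(v) for v in row_k_upper)

Steps that pass such a test would proceed without any interchange, so the sets I_m computed at those steps would remain in order.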
References

[1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.

[2] A. R. Curtis and J. K. Reid. FORTRAN Subroutines for the Solution of Sparse Sets of Linear Equations. AERE Report R6844, Harwell, England, 1971.

[3] A. R. Curtis and J. K. Reid. The Solution of Large Sparse Unsymmetric Systems of Linear Equations. Information Processing 71, pp. 1240-1245, 1971.

[4] I. S. Duff and J. K. Reid. A Comparison of Sparsity Orderings for Obtaining a Pivotal Sequence in Gaussian Elimination. AERE Report T.P. 526, Harwell, England, 1973.

[5] S. C. Eisenstat and A. H. Sherman. Efficient Implementation of Sparse Nonsymmetric Gaussian Elimination Without Pivoting. SIGNUM Newsletter 10:26-29, 1975.

[6] G. E. Forsythe and C. B. Moler. Computer Solution of Linear Algebraic Systems. Prentice-Hall, 1967.

[7] D. E. Knuth. The Art of Computer Programming, Volume 3: Searching and Sorting. Addison-Wesley, 1973.