UIUCDCS-R-75-765
COO-2383-0026

Block Envelope Solution of Finite Difference Systems*

by

Andrew H. Sherman

November 1975

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
URBANA, ILLINOIS 61801

* This research was performed while the author was a student at Yale University. Supported in part by the Office of Naval Research under contract N00014-67-A-0097-0016 and the Atomic Energy Commission under contract US AEC AT(11-1) 2383.

ABSTRACT

In this paper we describe an envelope-oriented block Gaussian elimination method for solving linear systems arising from the use of five-point finite difference approximations on n x n square grids. We show that the method requires O(n^{5/2}) storage locations and O(n^{7/2}) arithmetic operations. Asymptotically this is a significant improvement on the costs of ordinary band and envelope methods, but we find that for moderate values of n (say, n ≤ 50), the band and envelope methods are superior.

1. Introduction

Given a positive integer n, let D be a regular n x n square grid with an unknown x_i associated with each of the n^2 grid points. Letting N = n^2, we consider the solution of an N x N system of linear equations

    Ax = b,                                                            (1.1)

where a_{ij} ≠ 0 if and only if the nodes corresponding to x_i and x_j are connected by a line in the grid. Systems of this form arise in the use of five-point finite difference approximations in the numerical solution of second-order elliptic partial differential equations over square regions. In order to simplify our discussion somewhat, we will assume that A is symmetric and positive definite, although similar results hold in certain other cases (e.g., when A is nonsymmetric and diagonally dominant).

A possible method for solving (1.1) is the Choleski decomposition, i.e., first factoring A into the product LL^T, where L is lower triangular, and then solving Ly = b followed by L^T x = y. We consider here a variant of this method in which we do not explicitly obtain L. Either algorithm may be stably applied to the permuted system

    (PAP^T)(Px) = (Pb)                                                 (1.2)

instead of (1.1), for any N x N permutation matrix P [17], and this fact allows us to vary the ordering of the grid points (i.e., the correspondence between the unknowns x_i and the n^2 grid points of D) in any way that seems useful. In particular, we will attempt to minimize the amount of storage or work required to solve the resulting linear system (1.2). These two goals are related, but it is usually not possible to attain both simultaneously.
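The following minimal sketch (illustrative only, not part of the original report; the function names five_point_matrix and solve_permuted are introduced here purely for illustration) assembles the five-point matrix for an n x n grid and solves a permuted system (1.2) by a dense Choleski decomposition using NumPy and SciPy. It is meant only to make (1.1) and (1.2) concrete for small n.

```python
# Illustrative sketch only (not from the report): assemble the five-point
# matrix A for an n x n grid and solve the permuted system (1.2) with a
# dense Choleski factorization.  Suitable only for small n.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def five_point_matrix(n):
    """Standard five-point operator on an n x n grid (N = n^2 unknowns)."""
    N = n * n
    A = np.zeros((N, N))
    for i in range(n):
        for j in range(n):
            k = i * n + j
            A[k, k] = 4.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    A[k, ii * n + jj] = -1.0
    return A

def solve_permuted(A, b, p):
    """Solve (P A P^T)(P x) = P b for x, where p is a permutation of 0..N-1."""
    PAPt = A[np.ix_(p, p)]                 # P A P^T
    c, low = cho_factor(PAPt)              # P A P^T = L L^T
    Px = cho_solve((c, low), b[p])         # forward and back substitution
    x = np.empty_like(Px)
    x[p] = Px                              # undo the permutation
    return x

n = 8
A = five_point_matrix(n)
b = np.ones(n * n)
p = np.random.permutation(n * n)           # any ordering of the grid points
x = solve_permuted(A, b, p)
print(np.linalg.norm(A @ x - b))           # residual is near machine precision
```

Any ordering p leads to the same solution x; what the ordering changes is the amount of fill in L, and hence the storage and work, which is the subject of the rest of the paper.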
If L is to be computed explicitly, then there are several different algorithms which may be used to factor A. Primarily, these differ in the degree to which they take advantage of the known zero structures of A and L. Asymptotically, the most efficient in terms of both storage and work are the general sparse matrix methods, which store and operate only on the nonzero entries of A and L. With an appropriate ordering of the grid points (e.g., the nested dissection ordering [1]), these require only O(n^2 log n) storage locations and O(n^3) arithmetic operations [1,8]. However, the overhead, i.e., the number of extra storage locations and nonnumeric operations required to avoid storing and operating on zeroes in A and L, may be quite high for general sparse methods. Furthermore, they may not perform very well in paged storage environments, since they tend to access matrix entries which are physically stored far apart. General sparse matrix methods have been used primarily for large problems, although it has been found that they may offer savings even when n is small [9].

In contrast, we can use standard band [11] or envelope [7,14] algorithms. These are easy to implement and use and have low overhead requirements, but they require O(n^3) storage locations and O(n^4) arithmetic operations asymptotically [3]. Hence these methods are mainly useful when n is so small that the overhead, rather than the amount of storage or the number of operations, is critical.

Between these two extremes is a whole class of hybrid block methods [5,6,13] which, though fairly complex to implement, offer a means of achieving the relatively low overhead of band and envelope methods without their large storage and work requirements. Generally, these methods divide A into a number of independent blocks and obtain the factorization of A by operating on the blocks and/or their factorizations. Storage references tend to be localized to the entries of the individual blocks, so block methods seem far better suited to paged storage systems than are the other direct methods. Moreover, they may prove useful for solving systems so large that auxiliary storage is required to store A or L.

In [5], George describes a block method for the solution of (1.1) over a nine-point finite element grid. He divides A into blocks corresponding to independent vertical regions of the grid, uses a band solver to factor the blocks, and combines the factored blocks so as to obtain an implicit form of L. This method might be termed a block envelope method with band Choleski decomposition used in each independent block, and George shows that it requires O(n^{5/2}) storage locations and O(n^{7/2}) multiplicative operations (i.e., multiplications and divisions). The use of envelope Choleski, instead of band Choleski, would yield modest reductions in the leading coefficients, but not the orders, of the storage and work requirements of the method.

George's method could be applied directly to our model problem, giving the same storage and work requirements as in [5], but there is an analog of his method which is substantially better. Instead of vertical separators in the grid, diagonal separators are used to divide the grid into independent regions, and this change leads to approximate asymptotic savings of 40% in storage and 80% in work.

In this paper we present the details of the diagonal block method and evaluate its usefulness through a theoretical and experimental analysis. Despite the apparently substantial asymptotic savings over simpler methods, we have not found that the use of either our method or George's is particularly worthwhile. Apart from some practical difficulties in programming the methods, it appears that they suffer from a common malady among hybrid methods in that there is no range of problems for which they are the best methods available. For n ≤ 50, both methods require more work than the more standard band or envelope methods. And for larger values of n, the actual amount of work is critical, so that the higher overhead of general sparse methods does not matter very much. It should be noted that these conclusions are based on the assumption that the work, not the storage, is of primary importance. If this is not the case, then the use of either our method or George's may be quite fruitful, although the recently developed minimal storage Gaussian elimination algorithms may be even better, since they require only O(n^2) storage locations [3,14].

2. Description of the Method

For some positive integer a << n, choose a - 1 lines of grid points, or separators, parallel to the line y = x, in such a way that the semi-perimeter of the grid is divided into roughly equal segments (see Figure 2.1).

Figure 2.1: n x n grid divided by separators for a = 4.

These separators divide the grid into a distinct regions which are independent in the sense that if x_i and x_j lie in different regions, then a_{ij} = 0 in (1.1). Two considerations lead to the choice of diagonal, rather than vertical, separators. First, the sizes of the diagonal separators decrease near the corners of the grid, so that the average separator size is approximately n/2 instead of n. Since a large part of the work is associated with the separators, this reduction in size has a substantial effect on the total work. Second, diagonal separators in a rectangular five-point grid lead to separator blocks which are diagonal matrices.

Once the grid has been divided into independent regions by the diagonal separators, the regions and separators are numbered in the order indicated by Figure 2.1. The nodes within each region are sequentially ordered along diagonals parallel to the line y = -x. We call this method of numbering the nodes the anti-diagonal ordering scheme; it is an analog of the so-called diagonal ordering scheme which is known to be quite efficient for rectangular five-point grids [2,7]. The nodes in each separator are ordered sequentially from the lower left to the upper right in the grid (see Figure 2.2).

Figure 2.2: Grid ordering for n = 9, a = 4.
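To illustrate the partition and the block numbering used below (the particular separator positions in this sketch are an assumption made only for illustration and need not match Figure 2.1), the following labels each node of an n x n grid with its block number, the regions being blocks 1, 2, 4, 6, ... and the separators blocks 3, 5, 7, ...:

```python
# Illustrative sketch only: one plausible way to mark which diagonal band
# each grid node falls in.  The cut positions below are an assumption for
# illustration; the report only requires that the separators be diagonals
# (parallel to y = x) dividing the semi-perimeter into roughly equal parts.
import numpy as np

def block_of_node(i, j, cuts):
    """Block number of node (i, j), given separator diagonals i - j = c."""
    d = i - j
    if d in cuts:                       # on a separator: blocks 3, 5, 7, ...
        return 2 * (cuts.index(d) + 1) + 1
    k = sum(d > c for c in cuts) + 1    # region number k = 1, ..., a
    return 1 if k == 1 else 2 * (k - 1) # regions are blocks 1, 2, 4, 6, ...

n, a = 9, 4
# One plausible choice of a - 1 separator diagonals, roughly equally spaced.
cuts = [round(-(n - 1) + 2 * (n - 1) * k / a) for k in range(1, a)]
labels = np.array([[block_of_node(i, j, cuts) for j in range(n)] for i in range(n)])
print(labels)
```

No two nodes on the same separator diagonal are neighbors in the five-point stencil, which is why the separator blocks of A are diagonal matrices, and any path of stencil moves between two regions must pass through a separator node, which is why the regions are independent.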
For the example in Figures 2.1 and 2.2, the matrix A has the following block form (zero blocks are left blank; only the lower triangle is shown, the upper triangle being determined by symmetry):

        | A_1                                          |
        |        A_2                                   |
        | C_1    B_2    A_3                            |
    A = |                      A_4                     |            (2.1)
        |        C_2           B_4    A_5              |
        |                                    A_6       |
        |                      C_4           B_6   A_7 |

Here blocks 1, 2, 4, and 6 correspond to the four independent regions, and blocks 3, 5, and 7 correspond to the three separators. Factoring A into the product LL^T, we obtain

        | L_1                                                                      |
        |              L_2                                                         |
        | C_1 L_1^{-T} B_2 L_2^{-T} L_3                                            |
    L = |                           L_4                                            |   (2.2)
        |              C_2 L_2^{-T} F_3  B_4 L_4^{-T}  L_5                         |
        |                                              L_6                         |
        |                                C_4 L_4^{-T}  F_5  B_6 L_6^{-T}  L_7      |

where (following George [5])

    A_i = L_i L_i^T,                                          i = 1, 2, 4, 6,     (2.3a)

    Ã_3 = A_3 - B_2 A_2^{-1} B_2^T - C_1 A_1^{-1} C_1^T = L_3 L_3^T,              (2.3b)

    Ã_i = A_i - B_{i-1} A_{i-1}^{-1} B_{i-1}^T - F_{i-2} F_{i-2}^T
              - C_{i-3} A_{i-3}^{-1} C_{i-3}^T = L_i L_i^T,       i = 5, 7,       (2.3c)

    F_i^T = -L_i^{-1} B_{i-1} A_{i-1}^{-1} C_{i-1}^T,             i = 3, 5.       (2.3d)

For a > 4, the pattern of the last two block rows and columns of A and L simply repeats. Again following George [5], L is not obtained explicitly; only the B's, C's, L's, and F's are stored.
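As an illustration of the factorization (2.3) (a sketch only, with dense NumPy blocks and a hypothetical helper name factor_blocks; the method itself stores the region blocks in envelope form and the B's and C's as sparse matrices), the a = 4 case can be written as follows:

```python
# Illustrative sketch only (not the report's implementation): the block
# factorization (2.3) for the a = 4 example, with dense NumPy blocks.
# A, B, C are dictionaries holding the submatrices named in (2.1).
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def factor_blocks(A, B, C):
    """Return the Choleski factors L_i of the blocks and the fill blocks F_i."""
    L, F = {}, {}
    # (2.3a): the independent region blocks are factored directly.
    for i in (1, 2, 4, 6):
        L[i] = cholesky(A[i], lower=True)
    # Helper: A_i^{-1} M via two triangular solves with L_i (cf. (4.2)).
    def ainv(i, M):
        return solve_triangular(L[i].T, solve_triangular(L[i], M, lower=True))
    # (2.3b): the first separator block.
    A3t = A[3] - B[2] @ ainv(2, B[2].T) - C[1] @ ainv(1, C[1].T)
    L[3] = cholesky(A3t, lower=True)
    # (2.3d) with i = 3, 5 and (2.3c) with i = 5, 7.
    for i in (5, 7):
        Ft = -solve_triangular(L[i - 2], B[i - 3] @ ainv(i - 3, C[i - 3].T),
                               lower=True)
        F[i - 2] = Ft.T
        Ait = (A[i]
               - B[i - 1] @ ainv(i - 1, B[i - 1].T)
               - F[i - 2] @ F[i - 2].T
               - C[i - 3] @ ainv(i - 3, C[i - 3].T))
        L[i] = cholesky(Ait, lower=True)
    return L, F
```

The dictionaries A, B, and C are assumed to hold the submatrices of (2.1), extracted from the reordered five-point matrix; for a > 4 the loop over i = 5, 7 simply continues over the remaining odd-numbered separator blocks.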
Given these matrices, the solution of (1.1), with A given by (2.1), may be obtained as follows (it is assumed that b is partitioned compatibly with the blocks of A):

(i) Solve L_i y_i = b_i, for i = 1, 2, 4, 6.

(ii) Set b̃_3 = b_3 - B_2 L_2^{-T} y_2 - C_1 L_1^{-T} y_1, and solve L_3 y_3 = b̃_3.

(iii) Set b̃_i = b_i - B_{i-1} L_{i-1}^{-T} y_{i-1} - F_{i-2} y_{i-2} - C_{i-3} L_{i-3}^{-T} y_{i-3}, and solve L_i y_i = b̃_i, for i = 5, 7.

(iv) Solve L_7^T x_7 = y_7.

(v) (a) Set ỹ_6 = y_6 - L_6^{-1} B_6^T x_7, and solve L_6^T x_6 = ỹ_6.
    (b) Set ỹ_5 = y_5 - F_5^T x_7, and solve L_5^T x_5 = ỹ_5.

(vi) (a) Set ỹ_4 = y_4 - L_4^{-1} C_4^T x_7 - L_4^{-1} B_4^T x_5, and solve L_4^T x_4 = ỹ_4.
     (b) Set ỹ_3 = y_3 - F_3^T x_5, and solve L_3^T x_3 = ỹ_3.

(vii) (a) Set ỹ_2 = y_2 - L_2^{-1} C_2^T x_5 - L_2^{-1} B_2^T x_3, and solve L_2^T x_2 = ỹ_2.
      (b) Set ỹ_1 = y_1 - L_1^{-1} C_1^T x_3, and solve L_1^T x_1 = ỹ_1.

Within each numbered group of computations, the lettered computations may be performed in any order or simultaneously. The solution scheme extends to larger values of a in the obvious way.
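Continuing the earlier factorization illustration (again with dense blocks and hypothetical names; not the report's implementation), steps (i)-(vii) for the a = 4 example might be written as follows, with L and F the dictionaries returned by factor_blocks and b the partitioned right hand side:

```python
# Illustrative sketch only: forward and back substitution through the
# implicit factorization, following steps (i)-(vii) for the a = 4 example.
import numpy as np
from scipy.linalg import solve_triangular

def solve_blocks(L, F, B, C, b):
    """Return x = {i: x_i} solving (1.1) with A partitioned as in (2.1)."""
    def LTinv(i, v):                       # L_i^{-T} v
        return solve_triangular(L[i].T, v)
    def Linv(i, v):                        # L_i^{-1} v
        return solve_triangular(L[i], v, lower=True)
    y = {}
    # (i)  Independent regions.
    for i in (1, 2, 4, 6):
        y[i] = Linv(i, b[i])
    # (ii) First separator.
    y[3] = Linv(3, b[3] - B[2] @ LTinv(2, y[2]) - C[1] @ LTinv(1, y[1]))
    # (iii) Remaining separators.
    for i in (5, 7):
        y[i] = Linv(i, b[i] - B[i-1] @ LTinv(i-1, y[i-1])
                           - F[i-2] @ y[i-2]
                           - C[i-3] @ LTinv(i-3, y[i-3]))
    x = {}
    # (iv)-(vii): back substitution, from the last separator downward.
    x[7] = LTinv(7, y[7])
    x[6] = LTinv(6, y[6] - Linv(6, B[6].T @ x[7]))
    x[5] = LTinv(5, y[5] - F[5].T @ x[7])
    x[4] = LTinv(4, y[4] - Linv(4, C[4].T @ x[7]) - Linv(4, B[4].T @ x[5]))
    x[3] = LTinv(3, y[3] - F[3].T @ x[5])
    x[2] = LTinv(2, y[2] - Linv(2, C[2].T @ x[5]) - Linv(2, B[2].T @ x[3]))
    x[1] = LTinv(1, y[1] - Linv(1, C[1].T @ x[3]))
    return x
```

Note that only triangular solves with the L_i and multiplications by the stored B's, C's, and F's are needed, which is what makes repeated backsolution with new right hand sides inexpensive.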
3. Storage Requirements

In this section we analyze the storage requirements of the method described in Section 2. Throughout the calculations we assume that a << n.

The independent A_i's are quite similar to matrices derived from the use of a five-point operator on long, thin, rectangular domains. For such matrices it is well known [7,10] that envelope methods are more efficient than band methods. Hence the independent A_i's and the corresponding L_i's are stored and operated on as envelope matrices, so that the storage required is given by

    S_B(a) = n^3/a + O(n^3/a^2).                                       (3.1)

The blocks A_i corresponding to the diagonal separators are diagonal matrices, but the corresponding Ã_i's and L_i's tend to be quite dense. Hence their entire lower triangles are stored, and the storage required is given by

    S_S(a) = (a/6) n^2 + O(na).                                        (3.2)

The B_i's and C_i's are quite sparse, so a standard sparse matrix storage scheme is used for them. The storage required is given by

    S_BC(a) = 2an + O(a).                                              (3.3)

Finally, the F_i's are generally full. Therefore, the storage required for them is given by

    S_F(a) = (a/3) n^2 + O(na).                                        (3.4)

Summing (3.1)-(3.4), the total storage required by the method is

    S_T(a) = n^3/a + (a/2) n^2 + O(n^3/a^2 + na).                      (3.5)

This estimate is approximately minimized for a = a_0 = √(2n), and for n sufficiently large,

    S_T(a_0) = √2 n^{5/2} + O(n^2).                                    (3.6)

By way of comparison, George [5] obtains a minimal storage estimate of √6 n^{5/2} + O(n^2), so that there is a significant saving in using diagonal separators for the five-point operator. For the standard band and envelope methods, the storage requirements are at least (1/2) n^3 + O(n^2) [3], so again there is a significant saving. Although (3.6) is an asymptotic result, Table 3.1 shows that there are large savings in storage even for small values of n. In fact, as we shall see, the storage requirement for the method may be its primary virtue.

     n    a_0   S_T(a_0)    n^3/2
     8     2        267       256
    16     4       1530      2048
    24     6       4177      6912
    32     8       8592     16384
    40     8      14824     32000
    48     8      23392     55296

    Table 3.1: Experimentally determined optimal storage requirements.
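As a quick numerical check of the leading terms of (3.5) (an illustration only; the constants below are the leading-order ones given above, not exact storage counts), one can tabulate S_T(a) over integer a and compare the minimizer with a_0 = √(2n) and with the n^3/2 figure quoted for band and envelope storage:

```python
# Illustrative sketch only: evaluate the leading terms of the storage
# estimate (3.5), S_T(a) ~ n^3/a + (a/2) n^2, over integer a and compare
# the numerical minimizer with the asymptotic choice a_0 = sqrt(2n).
import math

def storage_estimate(n, a):
    return n**3 / a + 0.5 * a * n**2

for n in (8, 16, 24, 32, 40, 48):
    best_a = min(range(1, n), key=lambda a: storage_estimate(n, a))
    print(f"n={n:3d}  argmin a={best_a:3d}  sqrt(2n)={math.sqrt(2 * n):5.1f}  "
          f"S_T~{storage_estimate(n, best_a):9.0f}  n^3/2={n**3 // 2:8d}")
```

For n = 48, for example, the minimizer falls near √(2n) ≈ 9.8 and the estimate is roughly 22600, in reasonable agreement with the experimentally determined value 23392 in Table 3.1 and well below the band/envelope figure of 55296.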
4. Work Requirements

We now estimate the number of multiplicative operations required to obtain the implicit block form of L. Again we assume that a << n. The work can be broken into four parts which are treated separately: (i) decomposition of the independent A_i's; (ii) computation of the Ã_i's; (iii) decomposition of the Ã_i's; and (iv) computation of the F_i's.

The decompositions of the A_i's corresponding to the independent grid regions are performed as envelope Choleski decompositions [7,10]. Results of George [7] can be used to compute the total amount of work for the independent matrices. This is given by

    W_B(a) = n^4/(2a^2) + O(n^4/a^3).                                  (4.1)

The computation of the Ã_i's is very sensitive to the order in which we perform the matrix operations. Following George [5], we obtain terms of the form B_{i-1} A_{i-1}^{-1} B_{i-1}^T by first computing

    B̂_{i-1} = L_{i-1}^{-T} L_{i-1}^{-1} B_{i-1}^T                      (4.2)

and then computing the product

    B_{i-1} B̂_{i-1}.                                                   (4.3)

The computation of B̂_{i-1} can be done as two envelope backsolves, and we can take advantage of the sparseness of B_{i-1} to form (4.3) from B̂_{i-1} in a small number of additional multiplications. In this way all of the Ã_i's can be computed in a total of

    W_{S1}(a) = O(n^4/a)                                               (4.4)

multiplications. Once the Ã_i's have been obtained, they can be decomposed in a total of

    W_{S2}(a) = O(an^3)                                                (4.5)

additional multiplications, assuming that the Ã_i's are generally full, so that sparse decomposition methods do not help much.

Using ideas similar to those of (4.2) and (4.3), the F_i's can be obtained efficiently by first computing

    Ĉ_{i-1} = L_{i-1}^{-T} L_{i-1}^{-1} C_{i-1}^T                      (4.6)

and then computing

    F_i^T = -L_i^{-1} B_{i-1} Ĉ_{i-1}.                                 (4.7)

Using this technique, and noting that the terms Ĉ_{i-1} also appear in the computation of the Ã_i's, we find that the number of additional multiplications needed to compute the F_i's is

    W_F(a) = O(an^3).                                                  (4.8)

Summing and simplifying, the total number of multiplicative operations required for the implicit decomposition of A is

    W_T(a) = O(n^4/a + an^3).                                          (4.9)

W_T(a) is therefore approximately minimized for a choice a = a_1 proportional to n^{1/2}, for which W_T(a_1) = O(n^{7/2}); for this a_1, (3.5) yields S_T(a_1) = O(n^{5/2}), and conversely, for the a_0 which minimizes the storage, the work W_T(a_0) is also O(n^{7/2}). So there is little to choose between minimizing the work and minimizing the storage. Both estimates compare favorably with the corresponding work estimate for George's vertical-separator method [5]; asymptotically, the diagonal separators save roughly 80% of the work. However, all these results are valid only in the asymptotic case of large n. For values of n ≤ 50, we have found that the minimal amount of work is always at least n^4/4 (see Table 4.1).

     n    a_0   S_T(a_0)   a_1    W_T(a_1)      n^4/4
     8     2        267     4        3474       1024
    16     4       1530     8       38810      16384
    24     6       4177    10      159496      82944
    32     8       8592    12      429605     262144
    40     8      14824    12      928186     640000
    48     8      23392    14     1739642    1327104

    Table 4.1: Experimentally determined optimal storage and work requirements.

5. Conclusions

In conclusion we make the following remarks:

(i) In terms of asymptotic storage requirements, our method (like George's [5]) has a great advantage over the standard band or envelope storage schemes. In practice, this advantage holds even for very small values of n (see Table 3.1). However, the minimal storage methods [3,14] require only O(n^2) storage locations, and recent research [9,14,16] indicates that general sparse methods may be implemented so as to require less storage than our method, even for small values of n.

(ii) Asymptotically, our method is an improvement over standard band or envelope methods in terms of the number of multiplicative operations, and it might be hoped that it would be competitive with general sparse matrix methods for some range of n. However, for moderate values of n (say n ≤ 50), for which it might not be worthwhile to use a general sparse matrix method, our method appears to be no better than an envelope method, except, perhaps, in terms of the work required for backsolution. In fact, Schultz [15] has proposed a very simple block envelope scheme which requires only about n^4/4 multiplicative operations for a symmetric n x n five-point finite difference grid. Hence it appears that where the number of arithmetic operations is critical, there is little use for our method. (A similar qualitative statement may be made concerning George's block method for nine-point finite element grids [5,9].)

(iii) From (i) and (ii) it may be concluded that our method is not generally useful for the direct solution of (1.1). However, if (1.1) is to be solved for a number of different right-hand sides, the method may prove quite efficient, because it substantially reduces the storage requirements and the number of arithmetic operations needed for backsolution. Minimal storage methods are not efficient in this case, since they do not save a factorization of A. Typical examples of such effective use of the method might arise in the solution of semi-linear or nonlinear partial differential equations using certain iterative techniques for which backsolving is the critical operation (see [4]).

REFERENCES

[1] G. Birkhoff and J. A. George. "Elimination by Nested Dissection." In J. F. Traub, Ed., Complexity of Sequential and Parallel Numerical Algorithms. New York: Academic Press, 1973, pp. 221-260.

[2] E. Cuthill and J. McKee. "Reducing the Bandwidth of Sparse Symmetric Matrices." Proc. ACM National Conference, 1969, pp. 157-172.

[3] S. C. Eisenstat, M. H. Schultz, and A. H. Sherman. "Applications of an Element Model for Gaussian Elimination." In Proceedings of the Symposium on Sparse Matrix Computations, Argonne, IL, 1975.

[4] S. C. Eisenstat, M. H. Schultz, and A. H. Sherman. "The Application of Sparse Matrix Methods to the Numerical Solution of Nonlinear Elliptic Partial Differential Equations." In A. Dold and B. Eckmann, Eds., Proceedings of the Symposium on Constructive and Computational Methods for Differential Equations. Berlin: Springer-Verlag, 1974, pp. 131-153.

[5] J. A. George. "An Efficient Band-Oriented Scheme for Solving n x n Grid Problems." Proc. FJCC, 1972, pp. 1317-1320.

[6] J. A. George. "Block Elimination on Finite Element Systems of Equations." In D. J. Rose and R. A. Willoughby, Eds., Sparse Matrices and Their Applications. New York: Plenum Press, 1972, pp. 101-114.

[7] J. A. George. "Computer Implementation of the Finite Element Method." Stanford University Computer Science Department, Technical Report STAN-CS-71-208, 1971.

[8] J. A. George. "Nested Dissection of a Regular Finite Element Mesh." SIAM J. Num. Anal. 10 (1973), pp. 345-363.

[9] J. A. George. Private communication.

[10] A. Jennings. "A Compact Storage Scheme for the Solution of Symmetric Linear Simultaneous Equations." Computer J. 9 (1966), pp. 281-285.

[11] R. S. Martin and J. H. Wilkinson. "Symmetric Decomposition of Positive Definite Band Matrices." Handbook Series in Linear Algebra, Num. Math. 7 (1965), pp. 355-361.

[12] D. J. Rose. "A Graph-theoretic Study of the Numerical Solution of Sparse Positive Definite Systems of Linear Equations." In R. C. Read, Ed., Graph Theory and Computing. New York: Academic Press, 1972, pp. 183-217.

[13] D. J. Rose and J. R. Bunch. "The Role of Partitioning in the Numerical Solution of Sparse Systems." In D. J. Rose and R. A. Willoughby, Eds., Sparse Matrices and Their Applications. New York: Plenum Press, 1972, pp. 177-187.

[14] A. H. Sherman. "On the Efficient Solution of Sparse Systems of Linear and Nonlinear Equations." Ph.D. dissertation, Yale University Department of Computer Science, 1975.

[15] M. H. Schultz. Private communication.

[16] G. F. Whitten. Private communication.

[17] J. H. Wilkinson. The Algebraic Eigenvalue Problem. London: McGraw-Hill, 1971.