9m 
 
 HfiHBn 
 
 mm B»3ra 
 BkBbHHSHHS 
 
 WmWBaSBK 
 mqmSH 
 
 (HBhbkBe 
 
 SH MIMiMb 
 
 nan 
 
 mm 
 
 k 
 
 M 
 
 se 
 
 ffiS 
 
 HttBHnR 
 
 9BHnfii 
 
 BBWfffflHl 
 mmmtSSSOt 
 
 WMbMM iWffl 
 
 sen 
 
 ■MP 
 
 Hi 
 
 BsSSb 
 
 KrkBS 
 
 ■ 
 
 mm 
 bmsi 
 
 nmfiHn 
 
 BhwNHSw 
 
 BR 
 
 n 
 
 ■ 
 
 HH 
 
 ■raUHi 
 HMB B B 
 
 HBM H-M. 
 
 
LIBRARY OF THE 
 
 UNIVERSITY OF ILLINOIS 
 
 AT URBANA-CHAMPAICN 
 
 3/0.84 
 14 6 r 
 no. S9S-&&0 
 
UIUCDCS-R-73-600 
 
 AN EFFICIENT ITERATIVE PROCEDURE 
 FOR USE WITH THE FINITE ELEMENT METHOJ 
 
 August, 1973 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 #/£ 
 
 ^ 
 
 ^ 
 
 Op 
 
 
 

 The person charging this material is re 
 sponsible for its return to the library from 
 which ii was withdrawn <»n or before the 
 Latest Date stamped below. 
 
 Theft, mutilation, and underlining of books 
 are reasons for disciplinary action and may 
 result in dismissal from the University. 
 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA CHAMPAIGN 
 
 JAN 3 ib/4 
 JAN 2 1 Recd 
 
 L161 — O-1096 
 
UIUCDCS-R-73-600 
 
 AN EFFICIENT ITERATIVE PROCEDURE 
 FOR USE WITH THE FINITE ELEMENT METHOD 
 
 Young Jip Kim 
 
 August, 1973 
 
 Department of Computer Science 
 University of Illinois at Urbana-Champaign 
 Urbana, Illinois 
 
 This work was supported in part by the National Science Foundation under 
 Grant No. US NSF GJ-36393 and the Water Resources Center, OWRR PROJ. 
 #A-058-ILL and was submitted in partial fulfillment for the Doctor of 
 Philosophy in Computer Science, 1973* 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/efficientiterati600kimy 
 
Ill 
 
 ACKNOWLEDGEMENT 
 
 The author wishes to express his appreciation and gratitude to 
 Professor Paul E. Saylor, whose supervision, guidance and encouragement 
 made the present work possible. 
 
 For their valuable comments the author is indebted to Professors 
 D. B. Gillies, J. R. Phillips and D. S. Watanabe. 
 
 In addition, the author would like to acknowledge the Department 
 of Computer Science, the Office of Computing Services and the Water Resources 
 Center (Allotment Grant Program, OWRR PROJ. # A-058-ILL), University of 
 Illinois and the National Science Foundation (GJ-J6393) for their financial 
 support during the period of his graduate study and research. 
 
 Sincere thanks are due to Mrs. J. Wingler for her efficient and 
 careful typing of the manuscript, and to Mrs. M. L. Gidel for her aid in 
 preparation of the manuscript. 
 
 Finally, the author wishes to thank his wife and three children for 
 their patient understanding through his long period of study. 
 
IV 
 
 TABLE OF CONTENTS 
 
 Page 
 
 1. INTRODUCTION 1 
 
 1.1 Iterative and Direct Methods 1 
 
 1.2 Factorization Methods 6 
 
 1.3 Summary of the Contents 10 
 
 l.k Summary 11 
 
 2. FINITE ELEMENT METHOD 12 
 
 2.1 Introduction. 12 
 
 2.2 Ritz • s Method and Galerkin' s Method 13 
 
 2.3 Finite Element Method 16 
 
 3. MATRIX STRUCTURE AND STORAGE SCHEME 26 
 
 3.1 Graph Theory and Positive Definite Matrices 26 
 
 3.1.1 Graph theory 26 
 
 3.1.2 Graph theory applied to positive definite matrices... 27 
 
 3.2 Structure of the Matrix Associated with the Finite 
 
 Element Method 29 
 
 3»3 A Storage Scheme for Sparse, Symmetric Matrices 36 
 
 3.k Matrix-Vector Multiplication Using a Row-Oriented 
 
 Storage Scheme 38 
 
 h. ADAPTIVE ITERATIVE PROCEDURE TO SOLVE FINITE ELEMENT SYSTEM kO 
 
 k-.l Introduction ^-0 
 
 k-,2 The Adaptive -Chebyshev-Factorizati on Method of Diamond kl 
 
 h.~5 Almost Factorization Algorithm k6 
 
V 
 
 Page 
 
 k.h Computational Results 58 
 
 APPENDIX 61 
 
 LIST OF REFERENCES 70 
 
 VITA 73 
 
1. INTRODUCTION 
 
 1.1 Iterative and Direct Methods 
 
 Two standard, general methods for the numerical solution of elliptic 
 partial differential equations are the method of finite differences and the 
 finite element method (FEM). Each one leads ultimately to a large system 
 of linear equations, called either a finite element system or a finite 
 difference system, the solution of which is usually the most expensive step 
 of the computation. The subject of this thesis is the extension to finite 
 element systems of an iterative method developed for the efficient solution 
 of finite difference systems. Other iterative methods, also originally 
 developed for finite difference systems, have been proposed for finite 
 element systems [12" 1 , but have not been widely accepted. The customary, 
 
 popular method for the solution of finite element systems is Gaussian 
 
 T 
 elimination or something equivalent such as the Cholesky GG algorithm. 
 
 It is important for the judgment of the method proposed here to understand 
 
 the reasons why iterative methods should be reconsidered. 
 
 The essential difference between the two methods in the early days 
 
 of computing was that iterative methods take much less storage, a fact 
 
 which made it possible to put reasonable size linear systems into fast 
 
 memory, thereby making feasible on stored program digital machines the 
 
 solution of boundary value problems in partial differential equations. 
 
 The first iterative methods used for automatic computing were primitive 
 
 relaxation methods which converged too slowly unless an effective relaxation 
 
parameter could be found to accelerate convergence. There was no satisfactory 
 method to approximate an effective parameter until Young [29] developed a 
 general theory of Successive-Over-Relaxation (SOR) that reduced the 
 computation of the optimum parameter value to a trivial algebraic evaluation 
 involving eigenvalue bounds of a matrix called the Jacobi matrix associated 
 with matrix of the linear system. 
 
 These eigenvalue bounds are easily computed for a vaguely defined 
 class of matrices called regular matrices. A matrix may be said to be 
 regular if the nonzero elements form diagonals such as occur from the use 
 of the five point difference equation applied to Poisson's equation on a 
 rectangular domain. Finite element matrices are also regular if the domain 
 is a rectangle and the triangulation is made up of congruent triangles. 
 Regularity of a matrix should not be defined too carefully as it also 
 depends on the iterative method. For the Alternating-Direction-Implicit 
 (ADl) method, the domain must not only be square but also the matrix must 
 split into a sum of commutative matrices. Whatever the iterative method, 
 the primary implication of regularity usually has been that a regular 
 matrix is one for which acceleration parameters are easy to compute. A 
 great variety of iterative methods have been proposed for the solution 
 of finite difference systems other than SOR or ADI, but as for ADI or SOR, 
 acceleration parameters are essential to the success of each method. 
 Theories for computing acceleration parameters for finite difference 
 systems are difficult to apply to finite element systems because they 
 exploit as much as possible the regular structure of finite difference 
 systems. Typical finite element systems are not regular as they result 
 from the triangulation of an irregular domain into irregular elements. 
 There are subtleties in the use of direct methods also, such as for example, 
 
finding a way to reorder the nodes of the triangulation to minimize 
 computation. Easy to use, broadly applicable techniques and algorithms 
 for reordering the nodes have been devised, whereas it was an obstacle for 
 the use of iterative methods that no general technique was known for 
 estimating acceleration parameters for finite element systems. One of 
 the features of the iterative method we are proposing is that no estimates 
 of acceleration parameters are required prior to execution of the iteration. 
 
 There are other reasons for the predominant use of direct methods 
 for finite element systems [1^]. One is that storage is not as critical 
 as for finite difference systems because finite element systems of the 
 same accuracy are lower order. Also storage is becoming cheap and 
 plentiful. As we have mentioned, iterative methods were favored for years 
 for solving finite difference systems primarily because they require less 
 storage. Slow convergence was tolerated to gain this advantage. Because 
 of hardware improvements such as bulk core storage and features such 
 as virtual memory, the importance of conserving storage i s fading 
 away. Yet this cannot mean the unlimited use of direct methods. As 
 storage problems become minor, run time for Gaussian elimination remains 
 a formidable barrier to the solution of large systems that cost memory 
 cannot affect. Run time can be reduced only by more efficient methods 
 than Gaussian elimination, such as the direct methods discussed in Buzbee, 
 Golub and Neilson [5" 1 or else rapidly convergent iterative methods. 
 
 In an attempt to explain the popularity of direct methods for 
 finite element systems, A. George [1^] has suggested these advantages 
 that (a) iterative refinement may be used with direct methods to gain 
 estimates of the condition of the system, and (b) it is common for more 
 than one right hand side to be processed. These reasons are important 
 
but, as the magnitude of the problem increases, they fail to justify the 
 blind application of direct methods. Consider for example, the flow 
 equations that arise in certain hydrology studies. These are time dependent 
 problems that may be solved by approximating the space derivatives by the 
 finite element method to obtain a system of ordinary differential equations. 
 The solution of this system is then computed by discretizing the time 
 derivative. If the backward difference method is used, each time step 
 requires the solution of a finite element system. Typical studies require 
 1000 time steps. With time dependent coefficients, no economy is possible 
 from solving the same system repeatedly by a direct method. Currently, 
 only two dimensional studies are considered feasible. Some technique more 
 efficient than conventional direct methods is essential to make three 
 dimensional problems practical. 
 
 Difficulties with the solution of the flow equations specifically 
 led to the development of the iterative method described in this work. 
 The difficulties with the solution by Gaussian elimination of the resulting 
 irregular finite element systems are that the code is limited to 2000 
 unknowns on the University of Illinois 360 service computer and is expensive 
 to run. The iterative method developed here has been tested for up to 
 10660 unknowns. However, these tests required the use of extended core 
 storage. Gaussian elimination code has not been run using this facility, 
 but it is estimated that extended core storage would allow no more than 
 ^000 unknowns. For some problems, ^000 unknowns is not enough to be 
 practical. Execution time for the code is estimated to be roughly ten 
 times faster than elimination routines. It should be emphasized that 
 further tests of the iterative method must be made to discover its 
 
limitations. Any conclusions about the superiority of iterative methods 
 
 should be made cautiously until more experience is acquired. These test 
 
 results are mentioned here as an example of how the magnitude of a problem 
 
 may affect the choice of an iterative method if one exists. 
 
 A final argument in favor of direct methods is that data management 
 
 is simpler for elimination schemes than for iterative methods when the 
 
 triangular mesh is arbitrary. The result of an arbitrary mesh is a sparse 
 
 matrix with irregular structure and it is an impression that techniques for 
 
 efficient storage of sparse matrices adapt to elimination schemes more 
 
 easily than to iterative methods. Alan George has articulated the difficulty 
 
 as follows, where A is used to denote the matrix of a finite element system. 
 
 [For difference systems arising from a regular 
 mesh, the structure of the grid and the coefficient 
 matrix can conveniently be stored in two-dimensional 
 arrays. Data management is trivial.] "...However, 
 for an arbitrary triangular mesh, A will not have 
 such regular structure, and the calculation of a 
 single component of the residual vector may be 
 relatively expensive. In general, A will be 
 symmetric and only its upper or lower triangle 
 will be stored; therefore, in order to compute a 
 single component of the residual, we must be able 
 to access lines of elements in both rows and 
 columns of the upper (or lower) triangle of A. 
 If the storage scheme is 'row oriented,' accessing 
 elements in a specific column may require scanning 
 several rows, and visa versa for column -oriented 
 schemes. By contrast, elimination schemes can be 
 conveniently implemented so that they operate only 
 on rows or only on columns." 
 
 It is possible, however, to avoid these difficulties. The programming 
 
 algorithms developed to implement the iterative method of this thesis do 
 
 not require any searching operations during execution of the steps of the 
 
 iteration even though sparse matrix techniques are used to store the data. 
 
The purpose of this part of the introduction has been to present 
 the reasons why direct methods are used exclusively for finite element 
 systems. The development of new programming techniques voids the 
 programming advantage of direct methods. But the main handicap with 
 direct methods is that for large systems, they are impractical due to 
 excessive run time. In presenting a case for consideration of iterative 
 methods, there is some risk that we may seem to propound a doctrine of the 
 inherent weakness of direct methods. The reasons given here for the wide 
 acceptance of direct methods overwhelmingly support direct methods for 
 the solution of moderate size systems if efficient data management and 
 sparse matrix methods are employed in the code. It is of interest that 
 after years of eclipse by iterative methods, direct methods are now 
 being recognized as of special importance in the solution of finite 
 difference systems of moderate size. Their use in petroleum 
 reservoir simulation, an area of engineering to which the finite element 
 method has not penetrated, is now being advocated [22]. 
 
 1.2 Factorization Methods 
 
 Let Ax = b be the system of equations obtained by a finite element 
 approximation, where A is a positive definite and, therefore, symmetric 
 square matrix. If B is a matrix for which the LU factors of A + B 
 are sparse then the sequence defined by 
 
 (A + B)x. = (A + B)x - t. (Ax. - b), (*) 
 
 for t • a parameter, is efficient to compute. Therefore, if the sequence 
 converges to x and converges rapidly, the result is a practical method 
 to approximate the solution. Methods to solve Ax = b that combine an 
 
7 
 
 iteration such as (*) with an algorithm to construct A + B with sparse LU 
 factors are called factorization methods . For example, successive-over- 
 relaxation may be written in the form of (*) with elements of B explicitly 
 equal to linear combinations of elements of A. Matrix A + B is called an 
 approximate factorization (of A) and (*) is called a factorization 
 procedure . If A +Bis positive definite a sequence of parameters t , . .., 
 may always be chosen to obtain convergence. Moreover, an optimum sequence 
 of parameters, t , . .., t may be computed to minimize the error after n 
 steps. Optimum parameters result from a routine use of Chebyshev minimax 
 theory. Two requirements are essential to apply the theory. One is that 
 A+B be positive definite. This implies that the eigenvalues of (A + B)~ A 
 are positive, since this matrix is similar to a positive definite matrix. 
 Unless these eigenvalues are positive there is no convenient, effective 
 method to apply Chebyshev theory to obtain decent parameters. If they are 
 positive it is trivial to compute optimum parameters using Chebyshev theory 
 from the largest and smallest eigenvalues of (A + B) A. 
 
 The second requirement is the practical one that the largest and 
 smallest eigenvalues of (A + B) A be available to compute with. It is a 
 common difficulty with iterative methods for solving linear systems that a 
 numerical value of some parameter is required in advance of running the 
 iteration. For example, an accurate value of the optimum relaxation 
 parameter is essential in order that SOR run efficiently. Also, precise 
 eigenvalue estimates are critically important in order to obtain optimum 
 or even reasonably effective ADI iteration parameters for solving finite 
 difference equations. In each case, a priori estimates valid for a class 
 of matrices are often satisfactory to use. The difficulty with estimating 
 iteration parameters is more acute for factorization procedures because the 
 
8 
 
 techniques that work for ADI or SOR are no longer effective. The regularity 
 of the matrices that appear in SOR or ADI makes it possible to obtain good 
 a priori estimates in most cases, whereas the complicated properties of the 
 elements of B in factorization procedures, hinder effective estimation of the 
 extreme eigenvalues of (A + B)~ A, so that a priori estimates of Chebyshev 
 parameters to use in factorization procedures are generally unsatisfactory. 
 In cases when a priori computation of SOR or ADI parameters is not possible, 
 methods have been devised for the direct computation of the necessary extreme 
 eigenvalues that, although a nuisance, are much less expensive than direct 
 numerical approximation of the eigenvalues of (A + B) A for factorization 
 procedures, which could cost as much as running the iteration. The amount 
 of computational work required to obtain accurate bounds of the eigenvalues 
 of (A + B) A is so great that factorization methods are not efficient unless 
 the labor of executing the iteration yields these bounds as a by-product. 
 Efficiency requires that the work of executing the iteration be combined 
 with the work of estimating the eigenvalues. The adaptive -Chebyshev- 
 factorization algorithm of Diamond [10] achieves this. It estimates 
 eigenvalue bounds and optimal parameters dynamically by using the fact 
 that without optimal Chebyshev parameters, the iterates may be combined 
 to approximate an extreme eigenvector. 
 
 The idea of estimating parameters dynamically is used in some SOR 
 programs [ 28] to obtain an increasing sequence of estimates of the SOR 
 parameter w. Therefore, the idea is not new although the details are 
 completely different. What does seem to be new is its potential for 
 practical applications. Experimental comparisons of Diamond's algorithm 
 
9 
 
 to other iterative methods have been made [10]. For each test, Diamond's 
 algorithm achieved roughly two orders of magnitude greater accuracy for the 
 same computational work. 
 
 One of the most appealing features of Diamond's algorithm is 
 that no preliminary eigenvalue estimates are necessary. Only trivial 
 input parameters are required, and the user is freed of the task of careful 
 numerical preparations, essential to the effective use of SOR, and ADI. 
 This feature should "be stressed as well as the promise of great practical 
 utility, if applications to finite element systems are to be made. 
 
 Diamond's algorithm is an example of an adaptive or dynamic algorithm. 
 The characteristic property of adaptive numerical algorithms is that the 
 sequence of executed steps is determined dynamically while the algorithm is 
 running. The logical control feature of the machine is exploited in a more 
 fundamental way than by traditional numerical algorithms, which may be 
 called static algorithms. In adaptive algorithms, logical control statements 
 (IF ... statements in the FORTRAN version of the algorithm) direct the 
 choice of one sequence of computation over another, depending on the outcome 
 of certain trials, whereas in static algorithms, the use of these statements 
 is superficial, for example to halt an iteration or to flag errors. In a 
 traditional algorithm, if the path of computation is to be altered by some 
 test, the test can usually be formalized as a mathematical function, that 
 is, the path the computations take may be predicted by a formula. For 
 this reason, static algorithms may be analyzed more easily than adaptive 
 algorithms to determine, for example, the rate of convergence. In adaptive 
 algorithms the quantities necessary for analytic study are not available 
 until certain trials are performed and the outcome of these trials is too 
 complex to formalize mathematically. 
 
10 
 
 Recently developed algorithms to solve initial value problems in 
 ordinary differential equations are an example of adaptive algorithms in 
 common use. These algorithms select at each step the next stepsize to be 
 used and the order of the method, depending on estimates of certain 
 parameters based on numerical results from the preceding step. 
 
 1.3 Summary of the Contents 
 
 The finite element method is reviewed in Chapter 2. 
 
 The structure of finite element systems is described in Chapter 3, 
 with an explanation of the relation between the structure of the matrix 
 and the ordering of the node points. The problem of how to reorder the 
 node points to optimize Gaussian elimination is complex and not discussed. 
 The reader is referred to the work of Rose [23]« For the finite element 
 systems considered here it is shown only that the matrices are not perfect 
 elimination matrices [23]. Reordering of the node points is not normally 
 considered important for iterative methods, but it would be convenient 
 as it would be also for direct methods if reordering would yield a matrix 
 for which the nonzero elements form bands of diagonals as this would 
 facilitate storage. It is shown that no such reordering scheme is generally 
 possible. A compact storage scheme is also presented that requires less 
 storage than other methods [1^1 known to the author. Programming techniques 
 to avoid searching operations are described. 
 
 In Chapter k the iterative procedure of Diamond is presented' in 
 detail. An almost factorization is described for the matrix of the 
 irregular finite element system in Figure k. 
 
 Details of an actual computer implementation are displayed in 
 four flowcharts of the Appendix. 
 
n 
 
 l.k- Summary 
 
 In this thesis, the symmetric almost factorization of Stone is 
 extended to irregular finite element systems. The principal contribution 
 of the thesis is to show that for a specific practical problem an adaptive 
 Chebyshev factorization procedure may be devised for the solution of a 
 finite element system. Some novel programming techniques are introduced 
 to make the code practical. Further work is planned to extend the method 
 to a wide class of finite element systems. 
 
12 
 
 2. FINITE ELEMENT METHOD 
 
 2.1 Introduction 
 
 Certain boundary value problems of partial differential equations 
 are equivalent to problems in the calculus of variations. The Ritz method 
 for solving a boundary value problem is to approximate the solution of the 
 equivalent variational problem by a linear combination of "coordinate 
 functions." The Galerkin method is an extension of Eitz's idea such that 
 it can be applied to equations "with no equivalent variational form. 
 The finite element method is essentially a technique to construct a set of 
 coordinate functions for the Ritz or Galerkin method. The difficulty of 
 constructing coordinate functions was clearly stated by Courant [9]: 
 
 "From a practical point of view almost any success depends 
 on the selection of coordinate functions. ...More annoying 
 is that a suitable selection of the coordinate functions is 
 often very difficult and that laborious computations are 
 sometimes necessary." 
 
 In the FEM the domain of a boundary value problem is partitioned into 
 finite elements (triangles or rectangles), and a piecewise polynomial 
 (a continuous function which is a polynomial on each finite element) is 
 constructed for the approximation to the solution. Accuracy of the 
 approximation is increased by increasing the degree of the polynomials. 
 It is known that piecewise polynomials of degree k - 1 achieve a reduction 
 in the error of order h , where h is the maximum diameter of the finite 
 elements [25]. In practice, high accuracy is sacrified for the simplicity 
 
13 
 
 and convenience of piecewise linear and piece-wise bilinear polynomials in 
 triangular and quadrilateral nets. Polynomials are rarely of degree greater 
 than three. 
 
 Throughout the paper we -will consider the Dirichlet problem in 
 two-dimensional Euclidean space. A review of the main ideas of the Ritz 
 and Galerkin methods is given in Section 2. In Section 3, details of the 
 finite element method are given for piecewise linear and piecewise bilinear 
 polynomials as basis functions. 
 
 2.2 Ritz's Method and Galerkin' s Method 
 
 In this section, the variational form of the Dirichlet problem is 
 presented. The approximate minimization of the functional of the variational 
 problem is reduced by the Ritz method to a system of linear equations Ax = b, 
 where x is a vector consisting of the unknown parameters in the linear 
 combination of coordinate functions. For later use in Section 3 "we show 
 that matrix A is positive definite. The Galerkin method is also described. 
 
 We consider the Dirichlet problem with a homogeneous boundary 
 condition in a bounded region q with smooth boundary c)ti, 
 
 Lu = f in fi, (l, a) 
 
 »l an - 0, (l,b) 
 
 where L is an elliptic operator defined by 
 
 Lu = - z 5 — ( a - • n — ) + hu, a. . = a.. . (2, a) 
 * dx ij dx ' ' 13 31 
 
 For every t n and t. the coefficients a. . satisfy 
 12 13 
 
11+ 
 
 2 2 
 
 Z a. . t. t. > u z t., u > 0. (2,b) 
 
 i,J=l ° d i=l 
 
 The operator L is defined on the dense set M of functions which are twice 
 
 continuously differentiable in Q and vanish on the boundary bti. It can be 
 
 2 2 
 shown that (Lu,u) > y ||u|| , i.e., L is positive definite. Under the 
 
 assumption that a. ., b and f are sufficiently smooth, the Dirichlet problem 
 
 has a unique, regular solution [21]. 
 
 It is well known [19] that if L is positive definite, the solution of 
 
 the Dirichlet problem minimizes the functional F(u) defined by 
 
 F(u) = (Lu,u) - 2(f,u) . (3) 
 
 Conversely the element which minimizes the functional is the solution of 
 
 the Dirichlet problem [19,20]. 
 
 The Ritz method approximates the minimum of F(u) in a finite 
 
 dimensional sub space of M spanned by linearly independent "coordinate" 
 
 functions cp_ , cp . ..., cp . We set 
 1 2 n 
 
 n 
 u n = Z a cp., (k) 
 
 1=1 
 
 where the a. are constants. Substitute u for u in (3) to obtain 
 1 n 
 
 F(u ) = (Lu ,u ) - 2(f,u ) 
 n n' n ' n 
 
 n n n 
 
 = ( z a Lcp ., z a cp ) - 2(f, E a cp ) 
 
 i=i 3=1 J J i=i 
 
 n n 
 
 = z (Kp., cp ) a a - 2 Z (f, cp. ) a.. 
 i,J=l X * x 3 i=1 1 1 
 
15 
 
 The minimum of F is sought over all such u by differentiating with 
 
 respect to a a , which are the parameters that define u , then setting the 
 
 In o 
 
 derivatives equal to zero. Since L is symmetric, i.e., (Lcp., cp.) = (Lcp., cp.), 
 ^P^= 2 s (L?., «PJ a - 2(f, q> ). 
 
 The minimization of F(u ) is now reduced to solving the linear system of 
 equations 
 
 n 
 
 Z (Lcp , cp.) a = (f, cp ), i ■= 1, .2, ..., n. (5) 
 
 3=1 d J 
 
 Denote system (5) "by 
 
 Ax = b. 
 
 Matrix A is the Gram matrix [19] associated with the coordinate functions 
 
 ^1* ^2' '"' V i * e *> A i-j = (^i* Wp> x= ^ a p a 2> •••' a n ^ ' and 
 
 b = ((f, cp ), (f, cp ), . .., (f, cp )) . Symmetry of L implies A is symmetric. 
 
 Moreover, since L is positive definite, A is also positive definite: For, 
 
 if c = (c n , c , .... c ) is a nonzero vector and cp = c_,cp_, + c^cp^ + ... + 
 1' 2' ' n 11 2 2 
 
 c cp , we have 
 n T n' n n 
 
 (Lcp, cp) = ( z c Lcp., E c cp ) 
 
 i=l X 1 j=l J J 
 
 n n 
 
 = Z I (Lcp , cp ) c c 
 
 i=l j=l J J 
 
 = (Ac, c). 
 Since cp cp , ..., cp are linearly independent, cp ^ and 
 
 (Ac, c) = (Lcp, cp) > J 2 \\ cp || 2 . 
 
16 
 
 By choosing the a. in (h) to be the solution of system (5) we 
 obtain the Ritz approximate solution of the variational problem. 
 
 System (5) can be derived without utilizing the variational form. 
 Equation (l, a) is equivalent to 
 
 Lu - f - infi (I 1 , a) 
 
 Substitute u in {k) for u in (l',a). The requirement that the left-hand 
 side be orthogonal to the coordinate functions leads to system (5). This 
 is the idea of the Galerkin method. More technically, the Galerkin method is 
 a method for solving the weak form of the differential equation. 
 
 2.3 Finite Element Method 
 
 For the numerical solution of elliptic boundary value problems in 
 a bounded region ft, the finite element method has become a popular and 
 effective procedure. For simplicity we assume that the region ft is a 
 polygon. Let ft = ft U dft. The technique, first, is to partition ft into a 
 union of "finite elements," of which commonly used finite elements are 
 triangles and rectangles. The partitioning must be such that each pair 
 of finite elements shares a whole edge, a vertex, or nothing; a vertex of 
 one triangle or rectangle is not permitted to lie on the edge of another. 
 Next, a trial function is constructed with the property that it is a 
 polynomial on each finite element. We will restrict our discussion to 
 piecewise linear and piecewise bilinear trial functions in triangular and 
 rectangular elements, respectively. The linear approximation is described 
 in relation to the Ritz method and the bilinear approximation in relation 
 to the Galerkin method. 
 
17 
 
 For convenient reference we include some of the details of the 
 finite element method. Standard matrix notation is used. 
 
 The Ritz method reduces the Dirichlet problem (l, a) with boundary 
 condition 
 
 U loT; = g dS*) 
 
 to the problem of minimizing the functional F(u) = (Lu, u) - 2(f,u) in the 
 set of functions which satisfy the boundary condition (l',b). By Green's 
 formula 
 
 Z a. . ^ ^ + bu 2 
 1J ox. ox 
 
 3u 
 
 Z a. . «r- cos(p, x. )ds (6) 
 
 . . , XJ UA. UA. j . . n 13 ox. ' 1' 
 
 where rj denotes the inward normal to the boundary o"fi and s denotes the arc 
 length of bo. measured in a counter-clockwise direction from a fixed point. 
 
 Suppose the domain fi is partitioned into triangles T , t = 1, 
 2, ..., M, and the internal and the boundary node points are numbered 
 1, 2, ..., R; N+l, N+2, ..., N respectively. A linear function u T defined 
 on a triangle T in Figure 1 can be written as 
 
 u = a + bx + cy. 
 
 The three node values w of u T at the points P (x , y ), n = i, i, k 
 
 n nn 
 
 determine the values of a, b, c: 
 
 a + bx + cy = w , n = i, i, k. 
 n n n' ' ' 
 
Figure 1 
 
 18 
 
 In matrix notation 
 
 'i x. r 
 
 i\ M l\\ 
 
 1 x. y. 
 J 3 
 
 *k w 
 
 lei 
 
 w. 
 
 
 w . 
 
 which may also be written in transpose form 
 
 (a, b, c) 
 
 111 
 
 x. x. x k 
 
 i J 
 
 (■w ± , w , w k ). 
 
 \7i *, tJ 
 
 The transpose form is more convenient and will be used below. Let 
 
 -1 
 
 C T = 
 
 /111 
 
 X. X. x. 
 1 J is 
 
 Vi y j y J 
 
 T 
 
 w = (w.,w., w ) and * = 
 
 l' 
 
 x 
 
19 
 
 Then 
 
 t T r 
 
 u = V C ijf. (7) 
 
 Define the function v on the domain ft by 
 
 v T - 
 
 u T on T T , 
 
 on ft - T T . 
 
 Then, the summation v of the functions v 
 
 M 
 
 T 
 
 v = Z v 
 
 (8) 
 
 T=l 
 
 is a piecewise linear function on the domain ft. Let the boundary node 
 values be the values of g at the points. If g is piecewise linear on Sft, 
 then vk = g, i.e., v is a trial function satisfying the boundary 
 condition. Substitute v for u in (6) to obtain 
 
 M 
 F(v) = Z F(v T ) (9) 
 
 T=l 
 
 where 
 
 2 ^ T n t 2 f 
 
 ( z a. . g- g- + bu T ) dxdy - (2fu T ) dxdy 
 
 T< 
 
 F(v L ) = 
 
 T 
 
 t i,j=l J i j ^t 
 
 r 2 ^u T 
 
 + J g . E . a ij §r: cos (ti > x i } ds - 
 
 o:ft T l '3-l ^ 
 
 In the last term oft, denotes the sides of the triangle T lying on dft. 
 From (7) 
 
20 
 
 T 
 
 T T _T T _T 
 
 U = ¥ C f = |, C w^ 
 
 Su T T n T . T „t T TT 
 
 sr — = w C i|( = iir C w. 
 £x. Xi "x. 
 
 ■N T 
 
 Substitute these formulas for u and « — in the three terms of F(v T ). 
 
 ox. ' 
 
 1 
 
 To express the result, define 
 
 J ( - E i *« \ 
 ■'t i,o=l d i 
 
 B T = ( Z a )|r / + b Mr >|r T ) dxdy, 
 
 p T (2ft) dxdy, 
 
 We have 
 
 q = I g Z a. . \k cos (n. x. ) ds. 
 . . -, lj x. 1 
 
 f 2 ^ ^ A T 2 m T 
 
 ( E a. . ^ — =r — + bu ) dxdy = w C B C w. 
 . . , i t i ox. OX. • : ' 
 
 J T i*J=l i r 
 
 T { T 
 
 I (2fu) dxdy = w C T p T , 
 
 T T 
 
 g . ? , a ±d sir cos Cti * x i 
 
 A j T T T 
 
 ; ds = w C q . 
 
 T 
 Let matrix A = C B C , vector f = C p and vector g = C q . We obtain 
 
 F(v j = w A w - W f + w g, 
 
 To minimize F(v) in (9), differentiate F(v ) with respect to the 
 
 «r 
 
 parameters associated with internal node points. If, in triangle T , 
 
 w., w., and w are the parameters associated with internal node points, then 
 
21 
 
 = (A T + A T ) w - f 
 
 (10) 
 
 There are two different classes of triangles on which at least one node 
 
 point lies on the "boundary. If, for example, i is the only internal node 
 
 i 
 point of the triangle T and one edge of the triangle lies on the boundary 
 
 dft, then 
 
 p-iuVc'V*-^^ 
 
 (id 
 
 where the subscript 1 denotes the first row of a matrix and the first 
 component of a vector. For the other class of triangles the partial 
 derivatives of F(v ) with respect to one or two parameters associated with 
 internal node points can be expressed similarly. If an internal node point 
 i is shared by triangles T , T a , ..., T , then 
 
 0D> 
 
 oT(v) _ oT(v T ) + dF(v g ) + _ + dF(Q < 
 
 dw. c)w. ^w. owl 
 
 111 1 
 
 (12) 
 
 Combining (10), (ll) and similar expressions associated with other 
 classes of triangles, we obtain a linear system of equations 
 
 dF(v) _ n 1-12 N 
 Bw. ' " ' ' J 
 
22 
 The linear system can be written in matrix and vector notation as 
 
 Ax = b (13) 
 
 T 
 where x = (w , w , . .., w N ) . The solution of the system (13) minimizes the 
 
 functional F(v) in (9), and is the finite element approximation of the 
 
 Dirichlet problem (l, a) and (l',b). 
 
 If Q is the union of rectangular elements, not only is the geometry 
 simpler but also the construction of the family of coordinate functions. 
 The simplification is due to the fact that a rectangle is, mathematically, 
 the cross product of two intervals. Techniques for the construction of 
 polynomials on intervals to approximate functions of one variable form a 
 part of classical mathematical theory. There is no corresponding 
 well-developed theory of approximation by polynomials of functions defined 
 over general higher dimensional domains. But if the domain is a rectangle 
 the one -dimensional theory easily extends to higher dimensions by multiplying 
 approximating polynomials in one variable to obtain approximating polynomials 
 in several variables. 
 
 We assume that II is a rectangle with edges parallel to the 
 
 coordinate axes, i.e., Cl = (a, b) X (c, d). Partition [a, b], [ c, d], and 
 
 0. by tt , 7T , and ir respectively: 
 x y 
 
 tt : a = x^ < x., < . . . < x = b, 
 x 1 n ,, ' 
 
 x+1 
 
 % ■'• c = y < y < ... < y - d, 
 y y+1 
 
 TT = 7T X TT . 
 
 x y 
 
23 
 
 A piece-wise Hermite interpolation in H (ir, fi) is defined as a linear 
 
 combination of bilinear functions t. . (x, y) = t. (x) t.(y) for < i < n + 1 
 
 i j i j x 
 
 and < j < n +1 where t. (x) are linear functions defined by 
 
 t Q (x) = 
 
 x l - 
 
 X 
 
 *1 - 
 
 x o 
 
 
 
 
 x n < x < X T 
 
 otherwise, 
 
 t.(x) 
 
 *n + 1 (X) 
 
 X 
 
 ( X - X 
 
 i-1 
 
 1 1-1 
 
 x i + l - x 
 x i + l - x i 
 
 f X - X 
 
 n 
 
 x 
 
 X - - X 
 
 n +1 n 
 x x 
 
 x. n < x < x., 
 
 l-l — — i' 
 
 x. < x < x. ,,, 
 l — — i+l' 
 
 V. 
 
 otherwise, 
 
 x < x < x + 1 , 
 n — — n ' 
 
 x x 
 
 otherwise. 
 
 Define 
 n n n +1 n +1 
 
 where g..=g(x., y.)0<i<n + 1, < j < n +1. Clearly 
 if I i n x y 
 
 t (x., X.) = g. .. 
 g x' 2 X3 
 
 Note that if g is piecewise linear on dfi, then t k = g, and that 
 n n 
 
 x y 
 
 Z E v. . t. . is zero on oT2 for all arbitrary constants v. . Define 
 i=l J=l ^ 1J l 3 
 
 v(x, y) by 
 
 
 n n 
 x y 
 
 v = t + z Z v. . t. .. 
 
 i-1 j=l 
 
 X3 xj 
 
 (1+) 
 
2k 
 
 Then we have vk = g» Therefore, the function v can be used as a trial 
 function. 
 
 We shall seek an approximation, of the form of v in (l^ ) to the 
 solution of the boundary value problem. The approximation is determined 
 by parameters v. .. In the Rayleigh-Ritz method these parameters are 
 determined by the condition that v minimize a certain functional. In the 
 Galerkin method, they are determined by the condition that Lv - f be orthogonal 
 to each coordinate function t. .. We shall employ the Galerkin approach 
 to obtain a system of linear equations whose solution is the set of values 
 of v. .. We have the condition, 
 
 n n 
 x y 
 
 Z (Lt. ., t ) v. . = (f,t ) - (Lt , t ), / m ~ f» %> •••' M. (15) 
 . _ . . v ij' mrr 13 ' mir g' mn" Vn = 1, 2, ..., nJ 
 
 1-1 j-i -y 
 
 Since t are zero on the boundary dfi, in the system (15) 
 mn 
 
 (Lt. ., t ) = 
 1J mn . 
 
 R. . 
 
 ijmn 
 
 r 2 at. . at 
 
 ( Z a v „ ■^ L ^ m +bt.. t ) dxdy (l6) 
 k,i=l k " \ Sx i 1J m 
 
 where R. . is the intersection of the two rectangles [x. n , x. J x [y. ,, y J 
 ljmn & L i-l> 1+1 J LJ j-l' J j+r 
 
 and [x n , x , n ] X [y _, y , ] in which t. . and t are nonzero respectively. 
 L m-1' m+1 L n-1' n+1 ij mn J 
 
 Renumbering v. . in (15) by v. ; i = 1, 2, .... N(=n n ) we obtain a linear 
 ij 1' ' ' ' x y 
 
 system 
 
 Ax = b (IT) 
 
 T 
 where x = (v^ v , ..., v ) . 
 
 In Section 2, we have seen that the Gram matrix associated with 
 
 coordinate functions is positive definite. Matrix A in (17) is the Gram 
 
 matrix associated with the bilinear functions, and so is positive definite. 
 
25 
 
 Matrix A in (13) is also positive definite, and the proof follows: It is 
 simple to show that v in (8) is a linear combination of Synge's pyramid 
 
 N H T 
 
 function, i.e., v = z w. cp. + z w. cp., where cp. is a continuous and 
 
 -nil. txt.t i i 1 
 
 i=l i=N+l 
 piecewise linear function on ft such that the support of cp. (x, y) is the 
 
 union of triangles with the vertex i and cp. (x., y. ) = 1[26]« Therefore, 
 
 matrix A is the Gram matrix of the linearly independent pyramid functions 
 
 cp., i = 1, 2, . . ., 1. 
 
 Henceforth, A will be used as a generic symbol for a matrix arising 
 
 from the finite element method. 
 
26 
 
 3- MATRIX STRUCTURE AND STORAGE SCHEME 
 
 For the numerical solution of the Dirichlet problem, we have 
 obtained a linear system of equations Ax = b, where A is positive definite. 
 The structure of A and a scheme for storing the elements of A are discussed 
 in this chapter. 
 
 To investigate a linear system of equations with a sparse matrix, 
 it is convenient to use graph theoretical methods. The relation between 
 graph theory and positive definite systems is reviewed in Section 1. The 
 treatment is standard [23]« In Section 2, the relation of the structure 
 of A to ordering of the node points is described and some results on 
 possible structures given. A compact storage scheme for sparse positive 
 definite matrices is explained in Section 3* In Section k a technique is 
 presented for matrix-vector multiplication without the need for any 
 searching operations. 
 
 3.1 Graph Theory and Positive Definite Matrices 
 
 3.1.1 Graph theory. A graph is an ordered pair G = (X, E), where X is 
 
 a set of finite elements called vertices and E is a subset of unordered 
 
 pairs { {p, q} |p, qeX, p / q}. The elements of E are called edges . An 
 
 ordering of X is a one-one correspondence rx 
 
 a 
 [1, 2, ..., n} ^-> X. 
 
 If X is ordered by a in the graph G = (X, E), then the ordered triple 
 
 G = (X, E, a) is the ordered graph. In the ordered graph the point 
 n. — — *— — 
 
 assigned the number i will be denoted by a. or simply by i« A 
 
27 
 
 realization of a graph on a plane is trivial: vertices and edges are 
 represented by points and line segments joining the pair of points 
 respectively. 
 
 If IP* 9.3 is an e dg e j then q is a vertex adjacent to p, and the 
 set of vertices adjacent to p is denoted by adj (p) . A cycle of length n 
 is an ordered set of n distinct vertices [p , p , ..., p , p ] such that 
 p e adj(p. ), i = 1, 2., ..., n-1, and p e adj (p ), and a chord is an 
 edge of two non- consecutive vertices of the cycle. A graph is triangulated 
 if every cycle of length greater than three possesses a chord. 
 
 3.1.2 Graph theory applied to positive definite matrices. For a positive 
 definite matrix A there exist unique matrices L and D such that 
 
 T 
 A = L D L , 
 
 where L is a unit lower triangular matrix and D is a diagonal matrix with 
 positive entries [16]. If we solve the system 
 
 r Lz - b 
 Dy = z 
 .L x = y, 
 
 then x is the solution of the original system Ax = b. In general the 
 
 matrix L is not as sparse as the lower triangular part of A. 
 
 T 
 The system Ax = b is equivalent to the system PAP Px = Pb where 
 
 P is a permutation matrix. (A permutation matrix is a matrix for which 
 
 each row and column contains exactly one element equal to unity, the other 
 
 elements being equal to zero.) If there exists a permutation matrix P 
 
 such that 
 
28 
 
 T T 
 B = PAP = L D L , 
 
 where B. . = implies L. . = 0, i > j, then A is called a perfect elimination 
 
 matrix . A graph theoretic characterization of perfect elimination matrices 
 
 will be stated below. 
 
 We associate with a positive definite matrix A of order N an 
 
 ordered graph G (A) = (X, E, a) with N vertices cu, rv p , . .., r/ , such that 
 
 the vertex n. corresponds to row i and [a., a..} e E if and only if A. . 4 
 i i ' o io 
 
 and i < j. It is simple to show that if a matrix B is obtained from 
 matrix A by interchanging rows A. and A. and interchanging corresponding 
 
 ■*" tJ 
 
 columns A, . and A, ., then the ordered graph G R (B) associated with the 
 
 matrix B is obtained from G (A) by simply interchanging the two numbers 
 
 assigned to the points a. and ex. • i.e., ct. = p., ct. - P., a, = P, for 
 
 i o ■ 1 i * * 
 
 k / i, j. This interchange of exactly two elements is called a 
 
 T 
 transposition. Let B = P A P . Then B is obtained from A by permuting 
 
 its rows and columns, and the permutation is defined by P. Every 
 
 permutation can be written as a product of transpositions, from which it 
 
 can be shown that the ordered graph of B is obtained from the graph of A 
 
 by reordering its vertices. To state this precisely, if P.. = 1, define 
 
 J -'- 
 
 P(i) = 0« We have the following proposition: 
 
 T 
 Pro-position 1 : For a given matrix A of order N, let B = P A P 
 
 and the graph of the matrix A be given by G (A) = (X, E, ry). The graph 
 
 of B is obtained by reordering the vertices X indicated by the permutation 
 
 matrix P, i.e., G (b) = (X, E, p) where p is defined by 3 p (j_) = a. ± , 
 
 i = 1, 2, ..., N. 
 
 T 
 Now we can associate with the class of matrices {B|B = PA P } 
 
 an unordered graph G(A) by deleting the ordering a from the ordered graph 
 
29 
 
 G (A). It is known that G(A) is triangulated if and only if A is a perfect 
 
 elimination matrix [23] • 
 
 3.2 Structure of the Matrix Associated with the Finite Element Method 
 
 The locations of nonzero elements in matrix A are determined by 
 the structure of the triangular net and the ordering of the node points 
 of the net. A more detailed description of the relationship is as follows: 
 Assume that an internal node i is connected to internal nodes j_, j_, ..., j. 
 by the edges of triangles 1 , t f ..., t as in Figure 2. The i-th equation 
 of the system Ax = b in (13) is -^ — <- - 0, and, as in (12) 
 
 "d 
 
 w. 
 
 aF(v) _ \ 5F(v n ) 
 
 ow\ _ dw\ 
 1 n=l 1 
 
 (18) 
 
 Figure 2 
 
 By (10) 
 
 1 x 
 
 1 T 
 
 where A is a 3x5 matrix and w = (w., w. , w. ). If we denote the first 
 
 ^1 ^1 T l ^1 ^1 
 
 row of the matrix (A + A ) by (a.., a. . , a.. ), then 
 
 11' 2.3 ± ' ij 2 
 
 Mz!ii_ a T i 
 
 T i t i t i 
 
 ~ = a . . w . + a . . w. + a . . w. + f -, • 
 
30 
 
 Similarly, 
 
 T 
 
 — ^ L = a. . w. + a. . w. + a. . w. . + f_ , n = 2, 3, • • . t k-1. 
 
 ^T X1 X 10 n J n «(a+l) 1J (n + l) X 
 
 aF(v Tk ) T k r T k T k Jk 
 
 «J ' ' = a., w. + a. . w. + a. . w. + f . 
 dw ± 11 i ij k j k 13 x J-l 1 
 
 Thus 
 
 dF(v) 
 
 -•^r — = a. . w. + a. . w. +a.. w. + ... + a. . w -b. 
 
 ow. ii i lj-. j, ij j ij, j. i 
 
 i u l u l d 2 d 2 d k d k 
 
 where 
 
 
 k 
 
 T 
 
 n 
 
 
 a. . = 
 
 E a. 
 
 
 ii 
 
 n=l 
 
 ii 
 
 
 
 
 
 
 T l 
 
 
 T k 
 
 a. 
 
 a. . 
 
 + a. 
 
 U-L 
 
 1J 1 
 
 
 13 X 
 
 a.. — a.. + a. . , n — 2, o, ...,k 
 
 ij 10 id 
 
 n n n 
 
 T l T 2 T k 
 
 b. - -f, - f_ - . . . - f_ . 
 
 ill 1 
 
 Therefore if internal node point i is connected to internal nodes j , j , 
 •••> 3-uj ^■' ie ^-"^ equation of Ax = b is 
 
 a. . w. + a. . w. + a. . w. + ... + a. . w. = b. (19) 
 
 ii l i3 ± J 1 ij 2 j 2 ij k J k l 
 
 It follows from (12) that if node point i is also connected to 
 boundary nodes, then the boundary nodes induce no additional terms in (l9)» 
 To summarize we have: 
 
 Proposition 2 : In a given triangular net of the domain fi, if an internal node i 
 is connected to k internal nodes j_ , j p , . .., j, by an ordering of the node poin 
 then, in the linear system Ax = b, the nonzero elements of i-th row of A are at 
 most A.., A.. , A.. , .... A.. . 
 
31 
 
 Without loss of generality we shall assume in the following pages 
 that A... A. . , A. . , .... A. . are nonzero. 
 
 As an illustration consider the nonuniform triangular net of a 
 rectangular domain in Figure 3« With the node points ordered as shown, 
 matrix A is as in Figure k where nonzero elements are marked by an x. 
 
 Figure 3 
 
32 
 
 XX 
 
 X 
 
 
 XXX 
 
 XXX 
 
 
 XXX 
 
 X 
 
 
 XXX 
 
 XXX 
 
 
 XXX 
 
 X 
 
 
 XXX 
 
 * 
 
 XXX 
 
 • 
 
 
 1 
 
 • 
 
 XXX XXX 
 XXX X 
 XXX XXX 
 XXX X 
 XXX XXX 
 XX X 
 
 
 XX 
 
 XX X 
 
 
 X 
 
 XXX XXX 
 
 
 XXX 
 
 XXX X 
 
 
 X 
 
 XXX XXX 
 
 
 XXX 
 
 XXX X 
 
 
 X 
 
 XXX XXX 
 
 • 
 
 • 
 
 
 • 
 
 • 
 
 • 
 
 
 • • 
 
 • 
 
 
 X XXX 
 
 XXX 
 
 
 XXX XXX 
 
 X 
 
 
 X XXX 
 
 XXX 
 
 
 XXX XXX 
 
 X 
 
 
 X XXX 
 
 XXX 
 
 
 XX XX 
 
 X 
 
 
 XX XX 
 
 
 
 X xxxx 
 
 X 
 
 
 XXX XXX 
 
 
 
 X xxxxx 
 
 XXX 
 
 
 XXX XXX 
 
 
 
 X XXXXX X 
 
 • • 
 
 
 • 
 • 
 
 X 
 
 XXXXX X 
 
 
 XXX 
 
 XXX 
 
 
 X 
 
 XXXXX XXX 
 
 
 XXX 
 
 XXX 
 
 
 X 
 
 XXXX X 
 
 
 XX 
 
 XX 
 
 
 X X 
 
 XX X 
 
 
 X 
 
 XXX XXX 
 
 
 X X 
 
 X XXX X 
 
 • • • 
 
 *X XXX XXX 
 XX XX X 
 XX XX X 
 X XXX XXX 
 XXX XXX X 
 
 • • • 
 
 * X XXX XXX 
 XX XX X 
 XX XX 
 X XXX 
 XXX XXX 
 
 • • 
 
 'x XXX 
 
 \ 
 
 
 XX XX 
 
 Figure k 
 
 A finite element triangulation of a region is a graph, and is the 
 graph of the resulting finite element matrix, if the coordinate functions 
 are piece-wise linear. This fact is well known; we mention it here for 
 use below in a remark. Some adjustment must be made for boundary points; 
 
33 
 
 an exact statement is given below. It is not true that the discretization 
 net is the graph of the corresponding matrix for higher order approximations, 
 for bilinear coordinate functions, or for the method of finite differences. 
 Nevertheless an association between the net and the matrix, which does not 
 fit into graph theory, may be fruitful as Alan George has shown with the 
 method of nested dissection [15]. 
 
 Corollary : In a given triangular net, after removing the boundary node 
 points together with edges incident to them, the remaining triangular net 
 is called the inner triangular net. The inner triangular net of a given 
 triangular net of ft is the graph of the matrix A. 
 
 It would be optimum for the application of direct methods for 
 matrix A to be a perfect elimination matrix [23], i.e., the LU decomposition 
 of A does not fill in any of the zero elements of A. Rose has proved [23] 
 that A is a perfect elimination matrix if and only if G(A) is triangulated. If 
 a triangular net has an internal node point surrounded by a cycle of at least 
 four adjacent internal node points, then the graph of the matrix A is not 
 triangulated, and A is not a perfect elimination matrix. In most realistic 
 problems, every triangular net has many internal node points possessing at 
 least four adjacent internal node points, and so most realistic problems 
 do not yield a perfect elimination matrix. 
 
 In this thesis we shall only study approximate factorizations of 
 matrices with at most nine nonzero elements per row. Some justification 
 of this follows. The error bound of the approximation is proportional to 
 the term l/sin 6 where 6 is the smallest angle formed by a pair of edges 
 with a common node point [33 ] • The error bound is therefore minimized if 
 the triangles are equilateral. In order to propose a restriction in 
 triangulation scheme we define the concept of a mean angle at a node of 
 triangular net. 
 
Definition : If an internal node point is a common vertex of n triangles, 
 then the mean angle at the node is 36o°/n, and if a boundary node point j 
 is a common vertex of m triangles, then the mean angle at the node j is A°/m 
 where A° is the angle covered by the two boundary edges incident to the 
 node j. 
 
 Conclusion : If the mean angle at every node point is not less than ^5"', 
 then the number of nonzero elements in every row of A is not more than nine. 
 This assumption is satisfied in most triangulations used for practical 
 purpose. 
 
 Under the assumption that coordinate functions are piecewise linear 
 defined over a triangular net, we have seen that the triangulation is the 
 graph of A and A is not a perfect elimination matrix. We next show that 
 triangulations exist for which no reordering of the node points yields a 
 banded matrix. This property would be desirable because band matrices 
 are simple to store. We do not present a graph theoretic characterization 
 of banded matrices such as is possible for perfect elimination matrices, 
 but only show that for one simple triangulation a banded matrix is not 
 possible. First we give some definitions. 
 
 A set of three consecutive positive integers is called a basic set . 
 The distance between two disjoint basic sets B and B is defined by 
 
 d = mini a. - b . I - 1, a. e B_, b. e B . 
 1 i 3 ' i 1 J 2 
 
 A family of three disjoint basic sets is called a compound set if the 
 distances of "adjacent" basic sets are equal. The equal distance is 
 called the gap of the compound set. The center of a compound set is 
 the middle element of the increasing sequence consisting of the elements 
 of the set. If for row i of matrix A, the column indices of nonzero 
 
35 
 
 elements are contained in the compound set of gap d with center i, for d 
 independent of i, then the matrix is called a three - banded matrix with 
 gap d. All nonzero elements of a three -banded matrix are contained in at 
 most nine diagonals, therefore it is enough to store only five diagonals 
 and the gap to store the matrix. It is easy to construct an example of a 
 triangular net such that A is not three -banded. 
 
 If the inner triangular net of a region contains the net G depicted 
 in Figure 5> then there is no ordering of the node points such that A is a 
 three-banded matrix with any positive gap. 
 
 Figure 5 
 
 For an inner triangular net containing G, suppose that there exists 
 
 an ordering of the node points such that A is a three-banded matrix with 
 
 positive gap d. Then there are two compound sets C and C with centers, 
 
 i and j, respectively, which order v^ v £ , v , v^, v , o, p, q, r, u^ u , 
 
 u , u, . and u_, such that |C_ D C_ I =1+. Let i < j. It is clear that 
 2 4 5 '12' 
 
 j=i+d+2orj=i+d+4. Suppose j = i + d + 2. Then q should be 
 ordered by j + 1 = i + d + 3 or j - d - 3 = i - 1. If q is ordered by 
 i - 1, then s and t should be ordered by i - d - 5; and if q is ordered by 
 3+1, then s and t should be ordered by j + d + 5. Each is a contradiction. 
 Arguments for j =i+d+4is similar. 
 
36 
 
 3.3 A Storage Scheme for Sparse, Symmetric Matrices 
 
 Several compact storage schemes have been developed for sparse 
 symmetric matrices. We shall mention some of these, and describe the scheme 
 we use. For matrices with a dense band the "diagonal band storage scheme" 
 is very efficient because it does not require any bookkeeping information 
 about indices of stored elements and overhead storage due to zero elements 
 in the stored band is negligible. In the "profile storage scheme" of 
 Jennings [171 all elements from the leading nonzero element to the diagonal 
 element of each row are stored successively in a one-dimensional array. 
 An "address sequence" of another one -dimensional array is used to locate 
 the position of the diagonal elements. George has proposed [Ik] storing 
 only nonzero elements of the lower triangular part (including the diagonal) 
 of the matrix of order N in a one -dimensional array S. Two one -dimensional 
 arrays, one of order equal to the order of S and the other of order N, are 
 used to access A from S. 
 
 In the scheme we propose, the upper triangular part of A is stored 
 as explained below. It is similar to George's method, but takes less 
 storage. Suppose the upper triangular part of A is, 
 
 A ll A 12 A m 
 
 A 22 A 23 A 2n A 2 ri+1 A 2 n+2 
 
 A 33 V A 3 n+2 
 
 A. . A. . _ A. . A. A. A. 
 
 ii i l+l x 1+2 lm i m+1 l m+2 
 
 \-l N-l \-l N 
 
37 
 
 The nonzero elements are stored successively from the first to the last row 
 in a one -dimensional array OS, 
 
 os i - V 0S 2 ■ V os j ■ V 
 
 °\ • V °S - V ° S 6 " V ° S 7 " A 2 n + l> 0S 8 . A g ^, ... 
 
 °VV ° S 10 = V ° S ll = A 3n + 2' "• 
 
 ° 8 3 - \i' °Vl " A i ML' ° S j + 2 " A i i + 2> ° S o + 3 " V 
 
 V = 'i*l' °V 5 = A im + 2' - 
 0S M-2 = ^-1 N-l' ° S M-1 = Vl H 
 
 0S M " V- 
 
 In order to retrieve A from storage, another one-dimensional array L is 
 used to store the number of nonzero elements of each row and column indices 
 of the saved elements not on the diagonal. The element of L corresponding 
 to the diagonal element is the number of nonzero elements in the row with 
 that diagonal element, and the element of L corresponding to a nonzero 
 element is the column index of that non-diagonal element. For the example 
 above, 
 
 L l = 3 ' L 2 = 2 > S = n ' 
 
 \ = 5, L^ = 3, L 6 = n, 1^ = n+1, L g = n+2, 
 
 L 9 = 3 > L io = k > L n = n+2 > ' ' ' 
 
 L. = 6, L. . = i+1, I. _ = i+2, L. , = m, L. , = m+1, L.._ = m+2, 
 j ' ,1+1 ' j+2 ' j+3 ' M » j+5 ' 
 
38 
 
 This method for storing the elements of A will be called a natural compact 
 storage scheme . 
 
 3.^ Matrix-Vector Multiplication Using a Row-Oriented Storage Scheme 
 
 The storage scheme we have described is row oriented: access of 
 elements in a fixed row is easy, but access of elements in a fixed column 
 may require searching through several rows. This means that straightforward 
 computation of Ax, which is necessary in computing the residual, may be 
 relatively expensive for an iterative method. However, it is possible to 
 avoid this difficulty by a simple technique described briefly in this 
 section. 
 
 First, we explain more the difficulty with matrix vector multiplica- 
 tion. Suppose A = (A. . ) 1 < i, j < N, in standard matrix notation and 
 
 -'-J 
 
 x = (x , ..., 2C.). Consider the k-th component of Ax, 
 
 k-1 N 
 
 (Ax) = Z \. x ,.+ Z \, x . (20) 
 
 Computation of the second sum is simple to program using the row oriented 
 storage scheme proposed in Section 3, because the values of a.-,, a^. t-.-i* ••• 
 appears in sequence in array OS. Computation of the first sum is not as 
 simple. Because only the upper half of the matrix is stored, the 
 coefficients in the sum are stored as the elements above the diagonal 
 in the k-th column. Access of each coefficient from storage requires 
 searching through the row to which it belongs. The total number of 
 searches to compute Ax equals about half the number of elements of A, and 
 computation of the residual at each step of the iteration would be too 
 expensive . 
 
39 
 
 To explain how to avoid these search operations, we rewrite (20 ), as 
 
 k-1 N 
 
 (Ax), - E A., x. + z A. . x.. (21) 
 
 Consider any one of the terms, say A x., in the first sum, i.e., j < k. 
 
 Jk J 
 
 When (Ax), was computed, access of A to compute A x_ did not require a 
 d Jk jk k 
 
 search. If in the program segment to compute A x, we also include 
 statements to compute A., x. and store this in the location for (Ax), , we 
 will have achieved the computation without any search operations. It 
 should "be clear that each component of Ax may he computed with no search 
 operation and no extra multiplications . An algorithm for computing Ax. = y 
 with A stored in OS by the natural compact storage scheme is given in 
 Flowchart 1 of the Appendix. 
 
 
1+0 
 
 k. ADAPTIVE ITERATIVE PROCEDURE TO 
 SOLVE FINITE ELEMENT SYSTEM 
 
 k.l Introduction 
 
 The finite element method reduces the Dirichlet problem to a finite 
 element linear system with matrix A such that A is sparse, positive definite, 
 not a perfect elimination matrix, and the structure is irregular. In Chapter 
 1, we discussed the use of iterative and direct methods to solve finite 
 element systems. In this chapter we present an iterative method for solving 
 a typical, irregular finite element system that arises in the numerical 
 solution of the mathematical model of the flow of liquid waste in underground 
 sandstone formations. 
 
 The iterative method is described in Section 2. ■ It is due to 
 Diamond [10] and is referred to here as an adaptive- Chebyshev- factorization 
 (ACF) method. Efficiency of the ACF method depends on the existence of an 
 almost factorization A+B with a sparse LU decomposition for which the ACF 
 method converges rapidly. The technique for constructing A+B is a modification 
 and extension of the technique H. L. Stone introduced in his Strongly Implicit 
 Procedure [2^], which in turn is based on ideas originally due to Buleev. 
 
 The algorithm for the approximate factorization is described in 
 Section 3« It is presented in such a way as to emphasize implementation on 
 a computer; that is, features of the algorithm that generalize to a class 
 
 of finite element systems are an integral part of the flowchart description. 
 
 T 
 The almost factorization matrix A+B is factored in the form LDL . 
 
 T 
 Only the nonzero elements of L (or L) and D are stored. The elements of 
 
T 
 L are stored in a one-dimensional array as they are generated. An algorithm 
 
 T 
 for solving L D L x = y without searching the elements of L is also presented. 
 
 Computational results obtained by the new algorithms are presented in Section ^ 
 The algorithms are tested with three classes of matrices. The order of the 
 matrix is increased up to 10660. These experimental results show the per- 
 formance of the algorithm to be satisfactory. 
 
 k.2 The Adaptive-Chebyshev-Factorization Method of Diamond 
 
 H. L. Stone proposed an iterative technique, which he calls a 
 Strongly Implicit Procedure (SIP) for solving the linear systems that arise 
 from the finite difference discretizations of the flow equations in petroleum 
 reservoir simulation [2^]. Experience has shown SIP to be a reliable 
 efficient technique for solving many difficult systems, even though no 
 rigorous mathematical analysis of its properties has been performed. SIP 
 is widely used in the oil industry as well as other applications areas. In 
 some installations it is used exclusively. The ACF method that we propose 
 for finite element systems originated with work on the analysis of SIP, and 
 so we begin with a brief description of SIP. 
 
 Stone's Strongly Implicit Procedure is, first, an iteration 
 
 (A+B) x. . = (A+B) x. - (Ax.-b) 
 l+l 1 1 
 
 to solve Ax = b and, second, a specific algorithm to construct an almost 
 factorization, A+B. The purpose of the iteration is to reduce the error 
 caused by solving (A+B) y = b instead of Ax - b. The iteration Stone 
 proposed achieves rapid convergence by varying matrix B cyclically according 
 to a complicated, but easy to compute, formula. 
 
k2 
 
 Stone's formula is the result of experience, intuition and skill, 
 with no mathematical theory to explain its success. In an attempt to 
 analyze SIP, certain mathematical simplifications may be made. SIP employs 
 a sequence of matrices B such that A+B is nonsymmetric. To simplify this, 
 Stone's construction may be modified to make A+B symmetric, in fact, positive 
 definite. Next A+B may be fixed instead of varying from step to step. To 
 accelerate convergence another iteration parameter may be introduced to 
 modify the residual. The iteration becomes 
 
 (A + B) x. _ = (A + B) x. - t. (Ax. - b). (22) 
 
 l+l 111 
 
 Dupont, Kendall, and Rachford Cll] studied this iteration under the assumption 
 that A and A+B are positive definite. They showed that the selection of t. 
 may be made scientifically by the use of classical Chebyshev theory. 
 Unfortunately, in order to apply the theory it is necessary to possess 
 accurate approximations of the largest and smallest eigenvalues of (A + B) ' A. 
 Diamond's contribution has been to show how to estimate these eigenvalues 
 dynamically. Some of the details of Diamond's method will now be given. 
 
 Let x be the solution and e. = x - x. be the error. Then, 
 
 1 l ' 
 
 e. _ = (A + B)" 1 (A + B - t. A) e.. 
 l+l 11 
 
 Therefore, for any vector norm and consistent matrix norm, 
 
 n-1 
 
 Hell < || tt (I - t (A + B) x A) || ||e ||. 
 n i=0 x 
 
 We shall use the Euclidean vector norm and the spectral matrix norm. Since 
 
 -1 1 -1 i 
 
 (A + B) A is similar to the Hermitian matrix A 2 (A + B) A 2 , the 
 
 eigenvalues *.. ((A + B) A)' are real and [ll] 
 
k3 
 „ = x r r a ^r 1 a^ - min (Ay, y) 
 
 a - "-min ((A + B) A) - y£> ((A+b) y, y) 
 
 k = x ( ( i\ 4. -r)' 1 i\) - raax (Ay, y) 
 
 b - \a* ((A + B) A) - y fo ((a + b5 y, y) * 
 
 Iteration parameters t. are selected to minimize the spectral radius of the 
 
 n-1 _! 
 
 error propagation matrix tt (I - t. (A + B) A) . The optimum parameters 
 
 i=0 X 
 
 are the values that minimize the maximum absolute values of the polynomial 
 
 n-1 
 P (x) = T (1 - t.x) 
 i=0 
 
 on the positive interval [a, b]. By Chebyshev analysis the sequence t. 
 should be selected such that 
 
 T ((a + b - 2x)/(b - a)) 
 P n (x) = \ ((a + b)/(b - a)) (23) 
 
 where T is the Chebyshev polynomial of degree n. The maximum absolute 
 
 . -1 
 
 value of P (x) on [a, b] is [T (- )] , which is less than 1. The 
 
 n ' n v b - a 7 ' 
 
 recursive property of the Chebyshev polynomials may be used to define the 
 iterates x. of (22) by 
 
 x. = x. + Sx.; i = 0, 1, 2, . . ., n - 1 (2k, a) 
 
 ^T,(y) -, T n(y) 
 
 Sx. = jr t-t= r-r (A + B)" X (b - Ax.) +' / v Sx. _ (2k, b) 
 
 l (b - a) T i+1 (y) i' T i+1 (y) i-l ' 
 
 6x o = rfb (A + B)_1 (b " ^V (2 ^> c) 
 
 u a+b 
 
 where y = = . 
 
 b - a 
 
 Although A. . and A. are not available, matrix (A + B) can be 
 
 mm max ' 
 
 generated such that the interval [A. . A. ] includes 1. If we choose an 
 
 min max 
 
kk 
 
 arbitrary positive interval [a, b] containing 1, then one of these four 
 
 cases will arise: 
 
 (i) a < \ . and b < \ , 
 rain max' 
 
 (ii) a > X. . and b > \ , 
 man max' 
 
 (iii) [a, b] c \\. , X ], 
 
 v ' L ' J L mm' max J ' 
 
 (iv) [a, b] => YK ,. , \ ]. 
 
 L ' J — L min' max 
 
 In case (iv) the iteration (2^-) obviously converges; in other cases it may 
 
 not. For other cases, Diamond developed an algorithm that yields a sequence 
 
 of intervals Fa., b.l such that b. ->• \ . a. -+ \ . . and a. -» \ . and 
 L 1' 1 i max' i mm' i min 
 
 b. -* \ in cases (i), (ii), and (iii) respectively. In order to show 
 x max 
 
 some of the details, let S. ,_ = (A + B)~ (b -Ax.) (see (2k, b)). It can 
 
 i+l l 
 
 be shown that 
 
 Vi ■ n \ (I - *± (A + B)_1 A) 6 r 
 
 1=0 
 
 If I P (X. . ) I 4 P (^ )|, 6 n approaches the eigenvector corresponding 
 1 n min ' ' ' n max ' ' n+1 
 
 to one of the extreme eigenvalues of (A + B) A, and the approximate 
 eigenvalue is given by 
 
 (A5 _ , 8 .. ) 
 v n+1' n+1' 
 
 U * = ((A + B) B , 8 n+1 )' 
 
 (25) 
 
 This quantity is used to determine which of the four cases occurred and to 
 obtain a new interval. In the actual implementation the possibility that 
 
 I P (A. . )|~|P (^ )| is also considered. The iteration of (2k ) is cyclically 
 
 1 n min '= ' n max ' 
 
 performed with adjusted eigenvalue bounds at the end of each cycle if needed. 
 In (2k, c) x is an initial estimate of the solution for the first iteration 
 
^5 
 
 cycle, x is used as the initial guess for the next iteration cycle, and so 
 
 on. In summary, while the interval [a, bl of the initially estimated 
 
 eigenvalue bounds is approaching one of 
 
 (i f ) [a, X 1, 
 L ' max " 
 
 (ii') [X . , b], 
 L man' ' 
 
 (iii') [X . , X ], 
 L man 7 max ' 
 
 (iv') [a, bl, 
 
 in cases (i), (ii), (iii), and (iv) respectively, x. converges to the 
 solution of the system Ax = b. 
 
 If the initial estimate [a, b] is very crude, then at the end of the 
 first iteration cycle 
 
 Ax - b » Ax. - b . 
 ii n ii ii o M 
 
 It is better to choose x. again, instead of x , as the initial guess of the 
 
 . ' n 
 
 solution for the next iteration cycle, and so on. With this modification 
 
 Diamond's procedure is shown in Flowchart 2 of the Appendix. If a « X . 
 
 man 
 
 or A. » b in (i'), (ii'), and (iv'), then the final interval is much 
 
 longer than the optimum interval \X . . X 1. But it is easy to avoid 
 
 L man' max J J 
 
 this by choosing the initial a and b to be close to 1. Within a few cycles 
 of iteration, experience has shown that the interval of the bounds stabilizes, 
 Once the interval is stabilized no further adjustment is required after the 
 next iteration cycle. When this happens, iteration {2k) can be performed 
 with increased n until a satisfactory approximation is achieved. Experience 
 with the algorithm has been that about forty iterations are sufficient to 
 reduce the norm of the residual to less than 10 
 
U6 
 
 k.~5 Almost Factorization Algorithm 
 
 There are an infinite number of matrices A + B for which a parameter 
 sequence exists to make iteration (2k) converge and for which the system 
 (A + B) y = c is trivial to solve. More technically, there are an infinite 
 
 number of almost factorizations such that A + B is positive definite and 
 
 T 
 possesses a sparse LU or LDL decomposition. The simplest is A + B = I for 
 
 which the iteration reduces to 
 
 x. ,, = x. - t. (Ax. - b), 
 l+l l 11 ' 
 
 which is known as Richardson's method. Richardson's method typifies the 
 
 essential problem with selecting an almost factorization: slow convergence. 
 
 The technique Stone employed was to select B in such a way that Bx = 
 
 when the components of x are the values of a first degree polynomial at the 
 
 node points of the finite difference grid system. Stone's technique may be 
 
 modified to yield a symmetric almost factorization, but.Bx vanishes only when 
 
 the components of x are constant. This symmetric version has been developed [3] 
 
 for a finite difference system resulting from a rectangular and therefore 
 
 regular grid network. It is the objective of this thesis to modify the 
 
 Stone technique to yield an almost factorization for an irregular finite 
 
 element system, A, with the following characteristic properties : 
 
 (i) Each row of the upper triangular half of A contains at most 
 
 six nonzero elements, divided into two sets FA and SA. 
 
 (ii) Each set has at most three consecutive elements. 
 
 (iii) Let the smallest column index of the second set of elements 
 
 in row i be SCI.. If i < i, then SCI. < SCI., 
 l ' i-0 
 
 These properties of A are shown in Figure k. 
 
+7 
 
 For a positive definite matrix, there are three common triangular 
 decompositions essentially equivalent to Gaussian elimination, the LU, the 
 Cholesky GG , and the U DU decompositions. Factor U is unit upper 
 
 triangular, that is, the diagonal elements are l's. These decompositions 
 
 T 
 are unique, from which it follows that L = U D. Since only G must be 
 
 saved, the Cholesky decomposition apparently takes less storage, but it 
 
 involves extracting square roots. Also, it is a simple obversation that 
 
 T 
 no more storage is required for the U DU decomposition since the additional 
 
 elements to be stored are the elements of D and these may be saved in place 
 
 of the diagonal of U which consist of l's. Although the LU decomposition 
 
 yields nonsymmetric factors, it does not require more storage than the 
 
 T 
 LDL form because only U and the diagonal elements of L, which are the 
 
 diagonal elements of D, need be saved. The LU decomposition will be used 
 
 below with this strategy, to save U and the diagonal elements of L. Thus, 
 
 the LU decomposition is performed, but the result is saved in the form of 
 
 T 
 the LDL decomposition. The nonzero elements of A are shown below. 
 
 A = 
 
 A ll A 12 
 A 12 A 22 A 23 
 
 23 33 ^ 
 
 A 
 
 A 
 
 In 
 
 A 
 
 2n 
 
 n-1 n-1 
 
 In 2n 
 
 A 
 
 nn 
 
 A 
 
 A 
 
 2 n+1 A 2 n+2 
 
 A 
 
 n n+1 
 
 A 
 
 3 n+2 
 
 2 n+1 
 
 A -. A _ . A . 
 n n+1 n+1 n+1 n+1 n+2 
 
 A 
 
 2 n+2 
 
kQ 
 
 The role of matrix B is to make the factors of A + B sparse. The LU 
 factors of A + B are to have the form shown below. These factors when 
 multiplied do not yield A, but A + B, where B = L U - A. The nonzero 
 elements of B are also shown. 
 
 L = 
 
 h: 
 
 L 21 L 22 
 
 L 32 L 33 
 
 nl 
 
 n+1 2 
 
 n+2 2 
 
 U = 
 
 u. 
 
 12 
 
 tL 
 
 U, 
 
 3+ 
 
 U. 
 
 In 
 
 U 2 n+1 U 2 n+2 
 
 U 
 
 3 n+2 
 
 B = 
 
 B 
 
 -B 
 
 2n 
 
 ;h 3 n+1 
 " B 3 n+1 
 
 2n " B 2n 
 
 B 3 n+1 " B 3 n+1 
 
 " B 2n 
 " B 2n 2B 2n " B 3 n+1 
 ~ B 3 n+1 2B 3 n+1 
 
k9 
 
 The elements of L and U are computed from a set of algebraic relations 
 obtained by setting LU equal to A + B. The details follow. From 
 
 (L 11' L 11 U 12' °>'"' °' L ll U ln' °> ' ' '> 0) = (A 11' \2' °'"-> > A ±n>°> ' ' '> 0) > 
 
 we obtain 
 
 L ll ~ A ll' 
 
 U 12 = V 1 !!' 
 
 U- = A_ /L._. 
 In In' 11 
 
 Also, 
 
 L 21 = L ll ^ ( = V' 
 
 L = L U_ (= A ). 
 nl 11 In In 
 
 The second row of the matrix L U is 
 
 (L 21' L 21 U 12 + V L 22 U 23' ' -'°' L 21 U ln' L 22 U 2 n + l' L 22 U 2 n + 2' °' -»°>' 
 
 L 21 is already equal to A^. From L^ U^ + L 22 = A g2 and L 22 U^ = A , 
 we obtain 
 
 L 22 = A 22 ' L 21 U 12> 
 
 u = A /L . 
 23 23 22 
 
 L^., U. is known, and the element B^ is initialized from L^ n U. = A_^ + B^ , 
 21 In ' 2n 21 In 2n 2n' 
 
 B 2n " L 21 U ln " A 2n 
 
50 
 The second row sum of B is zero, if 
 
 B 2 n+1 = ~ B 2n' 
 
 From L ori U ^ = A + B_ _ and L U = A , we obtain 
 
 22 2 n+1 2 n+1 2 n+1 22 2 n+2 2 n+2' 
 
 U 2 n+1 = (A 2 n+1 + B 2 n+l ) / L 22' 
 
 U 2 n+2 = A 2 n+2 /L 22* 
 If B is symmetric and each row sum is zero, 
 
 B n2 = B 2n> 
 
 B n+1 2 - B 2 n+1 ( ~ " B 2n } 
 
 n n+1 2n' 
 
 3 n+l n " B n n+1 ^ " B 2n^ 
 
 B , _ = 2B^ . 
 n+1 n+1 2n 
 
 Without any further computation, we know 
 
 L 32 = L 22 U 2 5 (= V' 
 L n+1 2 = L 22 U 2 n+1 (= A 2 n+1 + B 2 n+1 = A 2 n+1 + Vl 2 )f 
 L n+2 2 = L 22 U 2 n+2 ^ = A 2 n+2^ 
 
 and also 
 
 L _ IL. = L__ IL U no (= L__ U n = A. + B c ). 
 nl 12 11 In 12 21 In 2n 2n 
 
51 
 
 The third row of L U is 
 
 (0,L 32 , L 32 U 23 + L 33 , L 33 U 3V Q, ..., 0, L 32 U 2 ^ L 32 U 2 n+2 + L^ n+g , , . . ., 0) 
 
 The value of L,_ is A__. From L__ U_., + L,_ = A„ and L U,. = A ,, we obtain 
 32 23 32 23 33 33 33 34 3^ 
 
 L 33 = A 33 " L 32 V 
 
 % = A 3i + //L 33* 
 
 The element B_, _ satisfies 
 3 n+1 
 
 B 3 n+1 = L 32 U 2 n+1" 
 To make the third row sum of B equal to zero, we have 
 
 B 3 n+2 ~ " B 3 n+1" 
 
 From L 32 U 2 n+2 + L 33 Q 3 n+2 = A 5 n+2 + B 5 n+2' 
 
 U 3 n+2 = (A 3 n+2 + B 3 n+2 " L 32 U 2 n+2 ^33 * 
 
 Elements B , B ,. B . B . and B _ are determined 
 n+1 3 n+2 3 n+1 n+2' n+2 n+1 n+2 n+2 
 
 by symmetry and row sums of B. 
 
 In practice, only the elements of L and U are saved. The elements 
 of B need not be explicitly computed. The remaining steps of the algorithm 
 are omitted from this informal description. Later the algorithm will be 
 given in a high-level flowchart. 
 
 The nonzero elements of U and B are shown in Figures 6 and 7> 
 respectively. 
 
52 
 
 ex x 
 
 xx xx 
 
 XX X 
 XX XX 
 
 XX X 
 
 XX XX 
 
 XX XX 
 
 XX X 
 
 XX XX 
 
 XX X 
 
 XX XX 
 
 X X 
 
 XX X 
 
 XX XX 
 
 XX X 
 
 XX XX 
 
 XX X 
 
 XX XX 
 
 XX XX 
 
 XX X 
 
 XX XX 
 
 XX X 
 
 XX XX 
 
 X X 
 
 XX 
 XXX X 
 
 XX X 
 
 XXX XX 
 
 XX X 
 
 XXX X 
 
 XXX X 
 XX X 
 
 XXX XX 
 XX X 
 
 XX X 
 X X 
 XX X 
 XX XX 
 XX X 
 
 XX XX 
 X X 
 XX X 
 XX XX 
 XX X 
 
 XX XX 
 X X 
 XX 
 XX 
 XX 
 
 XX 
 X 
 
 Figure 6 
 
53 
 
 XX 
 XXX 
 XXX 
 XXX 
 XXX 
 XXX 
 
 X 
 
 XXX 
 XX 
 XXX 
 XX 
 XXX 
 
 XXX 
 XXX 
 XXX 
 XXX 
 XXX 
 XX 
 
 XXX 
 XX 
 XXX 
 XX 
 XXX 
 XX 
 
 XX 
 XX 
 XXX 
 XX 
 XXX 
 X 
 
 XX 
 XXX 
 XXX 
 XXX 
 XXX 
 XXX 
 
 X 
 XXX 
 XX 
 XXX 
 XX 
 XXX 
 
 XX 
 XXX 
 XX 
 XXX 
 XX 
 XX 
 
 XXX 
 XXX 
 XXX 
 XXX 
 XXX 
 XX 
 
 XXX 
 XX 
 XXX 
 XX 
 XXX 
 XX 
 
 XX 
 XX 
 XXX 
 XX 
 XXX 
 XX 
 
 XX 
 
 xxxx 
 
 XXX 
 
 xxxxx 
 
 XXX 
 
 xxxxx 
 
 XXX 
 XX 
 XX 
 
 XX 
 XXX 
 XX 
 XXX 
 XX 
 XX 
 
 xxxxx 
 
 XXX 
 
 xxxxx 
 
 XXX 
 
 xxxx 
 
 XX 
 
 XX 
 
 XXX 
 XX 
 XX 
 
 
 X X 
 XXX 
 XXX X 
 
 XX 
 XXX 
 XXX 
 
 X 
 
 XXX 
 XX 
 
 XXX 
 XXX 
 
 XX 
 XX 
 XXX 
 
 XXX XXX 
 XX XX 
 
 XXX XXX 
 XXX XX 
 
 XX 
 XX 
 
 XX 
 XX 
 XXX 
 
 XXX XXX 
 XX XX 
 XX 
 XXX 
 XXX 
 
 XX 
 XX 
 
 XXX 
 XX 
 
 Figure 7 
 
54 
 
 In the informal description given above of the algorithm, note: 
 
 (i) Once an element of B is initialized, six additional elements 
 
 of B are computed in such a way as to make B symmetric and 
 
 satisfy the condition that the sum of elements in each row of 
 
 B is zero. 
 
 (ii) At the point of computing L. . for i > 2, we already know the 
 
 values of L for k < i and U, . and U, . for k < i and i < j; 
 
 therefore, we can compute, except for the last term, the 
 
 product terms in the elements (L U).. and (L U). . for j > i. 
 
 11 ij 
 
 Instead of constructing a new matrix B, we will modify the elements of A such 
 
 that it becomes A + B. It will be convenient not to distinguish between 
 
 elements A. . and (A + B). .: the notation AB. . will be used for either. If 
 ij 1.3 ij 
 
 at least one nonzero product term of an element is computable in (ii), the 
 
 element will be called partially known (PK), and the sum of the nonzero 
 
 product terms is called partial value of the element (FV. .). We assume 
 
 that, at the moment of computing the element L. ., the PK elements of the 
 
 i-th row in the upper triangular part of L U have the following three 
 
 properties: 
 
 (i) There are at most four PK elements divided into two sets, 
 
 FPK and SPK. 
 
 (ii) Each set has at most two consecutive elements. 
 
 (iii) Let the largest column index of FPK (SPK) elements be 
 
 LCI n (LCI , ), the largest column index of FA (SA) 
 fpk spk * 
 
 elements in the corresponding row be LCI (LCI ). 
 
 ia sa 
 
 Then, 
 
 LCI^. < LCT , LCI . < LCI . 
 fpk — fa' spk — sa 
 
55 
 
 Let the smallest column index of SPK elements be SCI , , 
 
 spk 
 
 the smallest column index of SA elements in the corresponding 
 
 row be SCI . Then, 
 sa ' 
 
 SCI , = SCI or SCI , = SCI - 1. 
 spk sa spk sa 
 
 A high-level flowchart of the almost factorization algorithm is 
 given in Figure 8. In the LU factorization, we need only the upper 
 triangular part of A + B; as the computation progresses, the used part of 
 A + B may be destroyed. The elements in L. are required only for computing 
 elements in U. ; except for the diagonal elements of L, the used elements 
 of L also may be destroyed; a small (bandwidth X five) two-dimensional 
 array is enough to save the destructible elements of L in a compact form 
 until they are used. The diagonal elements of U are all unity; only the 
 nonzero and non-diagonal elements required to be saved. 
 
 Matrix A stored in an array S by the natural compact storage scheme. 
 The almost factorization algorithm is such that S is .overwritten by the 
 nonzero and non-diagonal elements of U as they are generated, and diagonal 
 elements of L are stored in one -dimensional array D. The new array of the 
 nonzero and non-diagonal elements in U is called PU, as a control information 
 for the array PU, two one -dimensional arrays IC and HE are also generated: 
 column indices of the elements in PU are stored in the array IC, and the 
 numbers of saved elements in each row of U are stored in the array NE. 
 For example, for the matrix U in Figure 6, we have 
 
 PU 1= U 12> ro 2 = V 
 
 pu 3 . u 23 , w k . u 2 n+1 , pu 5 . u 2 n+2 , 
 
 ro 6 = V PU 7 = U 3n + 2> 
 
56 
 
 FACTLU 
 
 i-*- 
 
 X 
 
 i^-i+1 
 
 Compute PV. . 
 
 
 No ^ 
 
 J> 
 
 
 
 
 
 
 Yes 
 
 
 
 \< 
 
 
 L.-* AB. . - PV. . 
 
 11 li 11 
 
 L\.-—AB. . - PV. . 
 Ji iJ iJ 
 
 U. .+— L../L. . 
 
 H ij ji' ii 
 
 Yes 
 
 SWITCH -*- 1 
 
 Modify AB... , AB.. , AB, , 
 lk jk kk 
 
 No 
 
 L. .«— AB.. - PV.. 
 ki lk lk 
 
 lk ki' ii 
 
 Yes 
 
 B. .-«— PV. .- AB. . 
 Modify AB. k , AB , AB kk 
 
 »U- 
 
 Modify AB , AB Rk 
 
 L .<_ -B. .-PV.. 
 ki ij lk 
 
 U.,-*— L. ./L. . 
 lk ki' ii 
 
 L..<— AB. .- PV. . 
 Ji ij ij 
 
 U. .*—L../L. . 
 ij 01 ii 
 
 u i k 
 
 AB.. - PV.. 
 i lk lk 
 
 L. ./L.. 
 ki ii 
 
 Yes 
 
 SWITCH *- 
 
 RETURN \ 
 
 Figure 8 
 
57 
 
 IC 1 = 2, IC 2 = n; m ± = 2, 
 
 IC 3 = 3, IC^ = n+1, IC = n+2; HE 2 = 3, 
 
 ICg = 4, IC = n+2; WE = 2, 
 
 The details of the implementation is displayed in Flowchart 3 of the 
 Appendix. 
 
 The storage scheme used for the elements of PU is not the natural 
 compact storage described in Chapter 3« Also, note that the elements of 
 the diagonal of L, stored in array D, may be stored in array PU since the 
 elements of U are l's and need not be stored. A separate array has been 
 used simply for programming convenience. 
 
 The adaptive Chebyshev procedure of Section 2 requires that A + B 
 be positive definite. A proof of positive definiteness has been established 
 for a regular finite element system, which it is expected extends to the 
 irregular system discussed here. These results will be presented elsewhere. 
 The emphasis here is on the potential practical benefit and implementation 
 of iterative methods. 
 
 Note that the solution of 
 
 T 
 L D L x = y, 
 
 is required in {2k, b), which is equivalent to the systems 
 
 Lv = y, 
 
 Du = v, 
 
 L x = u. 
 
58 
 
 An algorithm for solving these equivalent systems is given in Flowchart k- 
 of the Appendix. For which no searching operation is used in the algorithm. 
 
 k.k Computational Results 
 
 As a model problem for testing the developed algorithms and storage 
 schemes, the Dirichlet problem (l) and (2) in a rectangular region fi is 
 selected. Operator L in (2, a) is defined by 
 
 LU " -5x" ^lo^ ~^ {a 2p- (2 > a) 
 
 The domain ft is partitioned into graded triangles and the node points are 
 numbered according to the pattern and ordering scheme in Figure 5« The 
 structure of A is shown in Figure km For the problem with a _ = a = 1, 
 the following five systems are solved: 
 
 (1) 570 unknowns; 
 
 (2) 1600 unknowns; 
 
 (3) 5130 unknowns; 
 (k) 8732 unknowns; 
 (5) 10660 unknowns. 
 
 The right hand side vector b is chosen as the null vector; the initial guess 
 x is the vector with components equal to unity. The test results are 
 displayed by a graph of the number of iterations vs. the norm of the residual 
 vectors in Figure 9* 1^- e initial estimates of the eigenvalue bounds used 
 for the test are as follows: 
 
 (1) [0.8, 1.5] for system (l) 
 
 (2) [0.8, 1.51 for system (2) 
 
 (3) [O.k, 7.5] for system (3) 
 
59 
 
 m 
 
 OA 
 
 o 
 
 oo 
 
 o 
 
 CO 
 
 
 o 
 
 
 o 
 
 LT\ 
 
 o 
 
 LT\ 
 
 
 O 
 
 ro 
 
 O 
 
 OO 
 
 LT\ 
 CM 
 
 O 
 
 CM 
 
 UA 
 
 H 
 
 w 
 o 
 
 •H 
 -P 
 05 
 !h 
 <D 
 -P 
 H 
 
 <+h 
 O 
 
 ?H 
 
 0) 
 
 0> 
 CD 
 
 •H 
 
 o 
 
 H 
 
 OJ 
 
 H 
 
 O 
 
 O 
 
 O 
 
 O 
 
 ■+- 
 
 H 
 i 
 
 O 
 H 
 
 -t- 
 
 OJ 
 i 
 
 o 
 
 H 
 
 ro 
 i 
 
 O 
 H 
 
 O 
 H 
 
 i 
 
 I 
 
 o 
 H 
 
 vc 
 
 o 
 
 H 
 
 i 
 
 o 
 
 H 
 
 CO 
 
 i 
 
 o 
 H 
 
 1— 
 
 I 
 
 o 
 H 
 
 o 
 
 H 
 
 O 
 H 
 
 1- 
 
 H 
 H 
 i 
 O 
 
 CM 
 
 O 
 H 
 
 1— 
 
 ro 
 
 rH 
 
 I 
 
 o 
 
 s-p2npTS8H jo uijo^j 
 
6o 
 
 (h) [0.1)-, 9.5] for system (h) . 
 (5) [O.k, 9.5] for system (5). 
 
 Since the structure of A is irregular, other efficient iterative 
 methods such as ADI are not applicable. There are no known computational 
 experiments -with which our results can be compared directly. Of course, 
 this is the primary reason for the development of our method. For the 
 linear system with 900 unknowns obtained from the same Dirichlet problem 
 
 by the five point difference technique, Diamond [10| reduced the residual 
 
 -k- -8 
 
 norm to 10 10 after 35 iterations using his adaptive procedure with 
 
 Stone's symmetric factorization for several choices of initially estimated 
 
 eigenvalue bounds. For the finite element system with 570 unknowns and 
 
 1600 unknowns, the residual norms are reduced approximately to 10 and 
 
 10 , respectively, after j)5 iterations in our tests. Although the matrix 
 
 structure is irregular, the results are comparable with that obtained by 
 
 Diamond. For the system with 10660 unknowns, the residual norm is reduced 
 
 -5.7 
 to 10 after 55 iterations. 
 
61 
 
 APPENDIX 
 
 Details of the computer implementation, some of which are not 
 precisely stated in the main text, are displayed in the following four 
 flowcharts of this Appendix. We use the symbols discussed by Gear [131 • 
 Since most of the symbols are self-explanatory we do not explain the meaning 
 of them except to indicate that N stands for DO LOOP and ^> —A 
 
 for SUBROUTINE CALL. 
 
 Brief description of each flowchart follows: 
 Flowchart 1 : This shows the algorithm for computing Y = A * X, where A is a 
 matrix stored in an array OS by the natural compact storage scheme and N is 
 the order of A. 
 
 Flowchart 2 : Diamond's adaptive -Chebyshev- factorization procedure is 
 presented with minor modification. Among eight subroutines called in the 
 procedure, three (PRODMT, FACTLU and SOLVLU) are in Flowchart 1, 3 and k. 
 Contents of the other five follow: 
 
 For the finite element system A * X = H, GEOSLH (N, NOS, OS, L, H) 
 generates A and H. A is stored in OS; control information for OS is created 
 in L; NOS is the number of elements of A in OS. 
 
 INITIA (N,X) initializes X for the iteration. 
 
 RESIDL (N,DW,H,RN) computes a vector DW = H - RN and also sets 
 RN = DW. 
 
 NEWT (N,X,DW, TAU) computes a vector X = X + TAU * DW. 
 
 INPRO (N,DW,RN,UN) computes an innerproduct UN = (DW, RN). 
 
62 
 
 Flowchart j : Details of the almost factorization algorithm are displayed. 
 
 T 
 S is a copy of OS generated by GEOSLH. In the factorization L * D * L = 
 
 A + B, elements of D are stored in D; nonzero and non-diagonal elements of 
 
 L overwrite S, and the part of the array S containing the elements of L 
 
 is called PU; IC and NE are control information for PU. The variable I DIM 
 
 should be greater than or equal to (the bandwidth of A) +1. The variable 
 
 MDIM should be greater than or equal to the maximum number of partial values 
 
 in every row. IFRB and JFRU are one -dimensional scratch arrays of size 
 
 greater than or equal to N. NECR is a one-dimensional scratch array of 
 
 size greater than or equal to IDIM. ICE and CRT are two-dimensional 
 
 scratch arrays of size greater than or equal to IDIM X 5. ICL and PV are 
 
 one -dimensional arrays of size greater than or equal to MDIM. 
 
 T 
 Flowchart k : The algorithm computes the solution of the system L * D * L * X 
 
 = Y, where L is stored in PU by the storage scheme discussed in Section h.~5> 
 
 NPU is the number of elements of L stored in PU; IC and- KE are the control 
 
 information for PU. 
 
63 
 
 RETURN 
 
 > 
 
 PRODMT 
 
 N,OS,L,Y,X 
 
 !«*— 1 
 
 N 
 
 Y(I)— 
 
 LE*-0 
 
 END> 
 
 I— 1 1 
 
 N 
 
 £ 
 
 LB-e-LE + 1 
 LE-^-LE + L(LB) 
 
 1 
 
 1y(I)^-Y(I) + OS(LB) * X(I) 
 
 LBP<-LB + 1 
 
 J-«— LBP 1 
 
 LE 
 
 i 
 
 y(D — 
 
 -Y(I) 
 
 + 
 
 OS (J) 
 
 * 
 
 X(K) 
 
 A 
 
 Y(K) — 
 
 ■Y(K) 
 
 + 
 
 OS (J) 
 
 -K- 
 
 X(I) 
 
 END> - 
 
 END> 
 
 Folwchart 1 
 
START 
 
 6k 
 
 * . 
 
 GEOSLH \ 
 
 ,NOS,OS,L,H/ 
 
 I*-l 1 
 
 NOS 
 
 s(i)-«-os(i)H 
 
 FACTLU 
 
 S,L,N,D,PU,IC,NE,NPU,IFRB, JFRU,NECR, ICR,CRT, ICL, PV 
 
 / IN,ITMA 
 
 |bnder,ei, 
 
 ITMAX 
 E2 
 
 INP1<— IN+1 
 
 I TER •«- 
 
 UMIN-*-El 
 
 U-*-0 
 
 PRODMT 
 
 N,OS,L,RN,DWM 
 
 SOLVLU 
 
 N,NPU,D,PU,IC,NE,DW,DW 
 
 TAU-6- -TAU 
 
 NEWT 
 
 N,DWM,RN,TAU 
 
 INITIA 
 
 N,X 
 
 Y0*-E1P2/E2M1 
 
 TMM<— 1 
 
 TM<--YO 
 TMP-«-2*Y0**2-l 
 Q1N -*- ^/E2M1 
 
 I*-l 1 
 
 N 
 
 i 
 
 I-*— 
 
 E1P2 <t— E1+E2 
 E2M1<— E2-E1 
 TAU«— 2/E1P2 
 
 NTEST*- 
 
 PRODMT 
 
 N,OS,L,RN,X 
 
 RESIDL 
 
 N,DW,H,RN 
 
 SOLVLU 
 
 N,NPU,D,PU,IC,NE,DW,DW / 
 
 NEWT 
 
 N,X,DW,TAU 
 
 ITER^ITER+1 
 
 DWM(l)-— DW(l)*TAU 
 
 n 
 
 n 
 
 Flowchart 2 
 
65 
 
 
 Q2 <-TMM/TMP 
 
 I 
 
 I*-l 1 
 
 DWM(I)-«-DWM(I)*Q2 
 
 PRODMT 
 
 N,OS,L,RN,X 
 
 RESIDL 
 
 N,DW,H,RN 
 
 INPRO 
 
 N,DW,DW,ERRBND 
 
 X 
 
 ERRBND «— SQRT ( ERRBND ) 
 
 H SAVE -e- ERRBND 
 
 •x(M, •■)=•, 
 x(i) 
 
 NSETl*— 1 
 El-«— E2 
 E2*-U 
 
 N,NPU,D,PU,IC,NE, DW, DW 
 
 TAU««— l/E2 
 NTEST <- 1 
 
 INPRO 
 
 N,DW,RN,UD 
 
 'ITMAX EXCEEDED' / N,OS,L,RN,DW 
 
 El-«— IF.IN 
 
 PRODMT 
 
 INPRO 
 
 N,DW,RN,UN 
 
 I 
 
 qi-— qjn*tm/tmp 
 
 NEWT 
 
 N,DWM,DW,Q1 
 
 NEWT 
 
 N,X,DWM,1 
 
 TMM <r-7M 
 
 TM-«-TMP 
 
 IMP -*- 2 * YO*TM -TMM 
 
 u«-un/ud 
 
 Flowchart 2 (Continued' 
 
66 
 
 FACTLU 
 
 !, L,N,D, HJ, IC,NE, IIC, IFRB, JFRU,NECR, ICR,CRT, ICL, 
 
 ( / idim,mdim| 
 
 IFRB(l) «-K 
 I 
 
 2 1 
 
 N 
 
 11* 
 
 IDIM 
 
 JFRU(l)*— 1 
 IIC*-0 
 
 -1 
 
 N 
 
 j: 
 
 RETURN 
 
 ) 
 
 £ 
 
 KN— NECR(j) 
 NICL— 
 
 MP- 
 
 1 1 
 
 MDIM 
 
 ICL(MP)<— ~0 
 
 P 
 
 II— 1 1 
 
 KN 
 
 >JK— K + L(K) 
 IFRB(I)— K 
 
 NECR(l) — 
 
 1 
 
 A 
 
 K— IFRB(l) 
 
 NE(l) — 
 
 J *- MOD (I, IDIM) 
 
 #~^<T 
 
 o; 
 
 -H J — IDIM 
 
 D(I) — S(K) 
 IP— 
 
 NCL— ICR(I1,J) 
 NEL— NE(NCL) 
 LFB— JFRU(NCL) 
 
 LP 
 
 I 
 
 1 * 
 
 NEL 
 
 LF^-LFB +(LP-l) 
 LC — IC(LF) 
 
 :lc< I 
 
 Flowchart 3 
 
67 
 
 NECR( J) «-Q 
 
 PD — - CRT( II, J)*HJ( LF ) 
 
 MP-«-l 1 
 
 MDIM 
 
 NICL-1 
 
 (l)«-S(K)-PV(l) 
 
 IP<-1 
 
 IS*— 1 
 NER<^L(K) 
 
 ID-*-L(K+IS) 
 IE-*-ICL(lP+l) 
 
 IIP -«-Il+l 
 
 12 —IIP 1 
 
 PV(MP) — PD 
 ICL(MP) — LC 
 NICL — NICL+1 
 
 IT— ICL(ll) 
 TP — PV(ll) 
 ICL(ll)— ICL(l2) 
 PV(ll)— PV(l2) 
 ICL(l2)— IT 
 PV(I2) — TP 
 1 
 
 PV(MP) — PV(MP)+PD 
 
 ® 
 
 J = N 
 
 RETURN 
 
 JFRU(l+l)— JFRU(l)+NE(l) 
 
 <A) 
 
 cic— s(k+TsT 
 
 ip— mi 
 
 NIC— L(K+IS) 
 
 [IS <— IS+l j 
 
 -© 
 
 X — PV(lP) 
 NIC— ICL(IP) 
 LDP-*— IFRB(NIC) 
 S(LDP) — S(LDP)-X 
 
 CIC — S(K+IS)-PV(IP) 
 
 ® 
 
 X — PV(lP)-S(K+IS) 
 
 Flowchart 3 (Continued) 
 
IP««-IP+1 
 
 1 
 
 Y — PV(lP) 
 
 \ 
 
 1 
 
 CIC<--X-Y 
 NIC-— ICL(IP) 
 LDP*-IFRB(NIC) 
 S(LDP)*-S(LDP)+X 
 
 -»1. 
 
 CIC- 
 
 NIC- 
 
 -PV(IP) 
 ICL(IP) 
 
 INI— NECR(JP)+1 
 NECR(JT)-e— INI 
 ICR(UD.,JP)-*— I 
 CRT(lNl,JP)«— CIC 
 IC(lIC)t— NIC 
 HJ(IIC)*-CIC/D(I) 
 NE(I)^— NE(l)+l 
 
 N 
 
 68 
 
 IP— -IB-1 
 
 X— PV(IP) 
 
 NIC-«-ICL(lP) 
 LDP—IFRB(NIC)+1 
 S(LDP)«-S(LDP)-X 
 NIC^-L(LDP) 
 LDP— IFRB(NIC) 
 S(LDP)— S(LDP)+X 
 
 IP*— IP+1 
 
 Y«-PV(IPJ 
 
 CIC— - S(K+IS)-X-Y 
 
 _£ 
 
 NIC — ICL(lP) 
 LDP — IFRB(NIC)+1 
 S(LDP)— S(LDP)-X 
 NIC — L(LDP) 
 LDP— IFRB(NIC) 
 S(LDP)<— S(LDP)+X 
 
 US— IS+l l 
 
 CIC — S(K+IS)-X 
 
 NIC — L(K+IS) 
 LDP<— IFRB(NIC) 
 S(LDP) — S(LDP)+X 
 
 J*. 
 
 H 
 
 4 
 
 Flowchart 3 (Continued) 
 
SOLVLU 
 
 N,NPU,D,PU,IC,NE,Y,X 
 
 69 
 
 RETURN 
 
 > 
 
 J<— 1 1 
 
 K 
 
 WM— N - 1 
 M-*-NPU 
 K— -1 
 
 LE^-0 
 
 I«-l 1* 
 
 x(D— Y(fT 
 
 LB<-LE + 1 
 LE-«— IE + NE(l) 
 
 J^LB 1* 
 
 LE 
 
 JR^-IC(j) 
 
 1 
 
 Y(JR)<— Y(JR) - RJ(J) * X(l) 
 I 
 
 END> 
 
 END> - 
 
 X(N)-«— Y(N) 
 
 I 
 
 1 1 
 
 N 
 
 Y(I)^-X(I)/D(I) 
 
 I 
 |END> 
 
 X(N)«— Y(N) 
 
 i 
 
 1 1 
 
 NM 
 
 *|M<-M - K 
 IX-*- N - I 
 
 x(ix)-e— y(ix) 
 
 K«-NE(IX) 
 
 JX-e— IC(M - J) 
 
 J. 
 
 X(lX)^-X(lX) - X(JX) * RJ(M - J) 
 
 1 
 
 END> 
 
 END> 
 
 Flowchart k 
 
70 
 
 LIST OF REFERENCES 
 
 1. G. Birkhoff, The Numerical Solution of Elliptic Equations, Regional 
 Conference Series in Applied Mathematics, No. 1, SIM, Philadelphia, 
 1971. 
 
 2. G. Birkhoff, M. H. Schultz and R. S. Varga, "Piecewise Hermite 
 Interpolation in One and Two Variables with Applications to Partial 
 Differential Equations," Num. Math ., 11 (1968), pp. 232-256. 
 
 3. A. Bracha-Barak and P. E. Saylor, "A Symmetric Factorization Procedure 
 for the Solution of Elliptic Boundary Value Problems, " SIAM J. Num . Anal . 
 12 (1973), PP. 190-206. 
 
 k. H. H. Bramble and M. Zlamal, "Triangular Elements in the Finite Element 
 Method," Math. Comp ., 2k (1970), pp. 809-820. 
 
 5. B. L. Buzbee, G. H. Golub and C. W. Nielson, "On Direct Methods for 
 Solving Poisson's Equations," SIAM J. Num . Anal ., 7 (1970), pp. 627-656. 
 
 6. R. E. Carlson and C. A. Hall, "Ritz Approximations to Two-Dimensional 
 Boundary Value Problems," Num . Math ., 18 (1971), PP> 171-181. 
 
 7. J. C. Cavendish, H. S. Price and R. S. Varga, "Galerkin Methods for the 
 Numerical Solution of Boundary Value Problems," Soc . Pet . Eng . J-, 9 
 (1969), pp. 20^-220. 
 
 8. L. Collatz, The Numerical Treatment of Differential Equations, 3rd ed. , 
 Grundlehren d. Math. Wiss., 60, Springer -Verlag, Berlin, i960. 
 
 9. R. Courant, "Variational Methods for the Solution of Problems of 
 Equilibrium and Vibrations, " Bull . Amer . Math . Soc , k-9 (19^3), pp. 1-23. 
 
 10. M. A. Diamond, "An Economical Algorithm for the Solution of Finite 
 Difference Equations," Ph. D. Dissertation, Department of Computer 
 Science, University of Illinois, UIUCDCS-R-71-^92, December, 1971. 
 
 11. T. Dupont, R. P. Kendall and H. H. Rachford, Jr., "An Approximate 
 Factorization Procedure for Solving Self -Adjoint Elliptic Difference 
 Equations," SIAM J. Num . Anal ., 5 (1968), pp. 559-573. 
 
 12. C. A. Felippa and R. W. Clough, "The Finite Element Method in Solid 
 Mechanics," SIAM-AMS Proc , 2, G. Birkhoff and R. S. Varga eds., AMS, 
 Providence, 1970, pp. 210-252. 
 
71 
 
 13. C. W. Gear, Introduction to Computer Science, Science Research Associates 
 Inc., 1973. 
 
 1^4-. A. George, "Computer Implementation of the Finite Element Method, " Ph. D. 
 Dissertation, Computer Science Department, Stanford University, 
 STAN-CS-71-208, February, 1971. 
 
 15. , "Nested Dissection of a Regular Finite Element Mesh," SIAM J. 
 
 Num . Anal ., 10 (1973), PP- 3^5-363- 
 
 16. A. S. Householder, The Theory of Matrices in Numerical Analysis, 
 Blaisdell, New York, 196^. 
 
 17. A. Jennings, "A Compact Storage Scheme for the Solution of Symmetric 
 Linear Simultaneous Equations," Computer J., 9 (19&6), pp. 28l-285« 
 
 18. R. Levy, "Resequencing of the Structural Stiffness Matrix to Improve 
 Computational Efficiency," J PL Quarterly Technical Review, 1 (1971), 
 pp. 61-70. 
 
 19. S. G. Mikhlin, The Problem of the Minimum of a Quadratic Functional, 
 English Trans 1., Holden-Day, San Francisco, 1965* 
 
 20. , Variational Methods in Mathematical Physics, English Transl., 
 
 MacMillan, New York, 196^. 
 
 21. C. Miranda, Partial Differential Equations of Elliptic Type, 2nd rev. ed., 
 Erg. d. Math. u. Grenzgebiete, 2, Springer -Verlag, Berlin, 1970* 
 
 22. H. S. Price and K. H. Coats, "Direct Methods in Reservoir Simulation," 3-rd 
 SPE Sym. Num. Sim. Reser. Per. AIME, Houston, Texas, Jan., 1973- 
 
 23. D. J. Rose, "A Graph-theoretic Study of the Numerical Solution of Sparse 
 Positive Definite Systems of Linear Equations, " Graph Theory and 
 Computing, R. C. Read ed., Academic Press, New York, 1972, pp. 183-217. 
 
 2k. H. L. Stone, "Iterative Solution of Implicit Approximations of 
 
 Multi- dimensional Partial Differential Equations," SIAM J. Num . Anal., 
 5 (1968), pp. 530-558. 
 
 25. G. Strang, "Approximation in the Finite Element Method," Num. Math . 19 
 (1972), pp. 81-98. 
 
 26. J. K. Synge, The Hypercircle in Mathematical Physics, University Press, 
 Cambridge, 1957- 
 
 27. W. F. Tinney and J. W. Walker, "Direct Solutions of Sparse Network 
 Equations by Optimally Ordered Triangular Factorization, " Proc . IEEE, 
 55 (1967), PP. 1801-1809. 
 
 28. R. S. Varga, Matrix Iterative Analysis, Prentice -Hall, Englewood Cliffs, 
 N. J., 1962. 
 
72 
 
 29» D. M. Young, Iterative Solution of Large Linear Systems, Academic Press, 
 New York, 1971- 
 
 30. 0. C. Zienkiewicz, The Finite Element Method in Engineering Science, 
 McGraw-Hill, London, 1971- 
 
 31. and Y. K. Cheung, "Finite Elements in the Solution of Field 
 
 Problems," The Engineer, 220 (1965), pp. 507-510. 
 
 32. M. Zlamal, "The Finite Element Method in Domains with Carved Boundaries," 
 Int . J. Num. Math . Eng ., 5 (1973), pp. 367-373- 
 
 33. , "On the Finite Element Method," Num . Math., 12 (1968), 
 
 pp. 39^-^09. 
 
73 
 
 VITA 
 
 Young Jip Kim was born in Korea on November 27, 193 5 • After 
 receiving a M.S. in Mathematics from Yonsei University, Seoul,, Korea, in 
 i960, he taught mathematics at Taejon Presbyterian College, Taejon, Korea, 
 until 1965* He received another M.S. in Mathematics from the University 
 of Illinois in February 1967* Prior to continuing his studies at the 
 Department of Computer Science, University of Illinois, in 1969, he held 
 the position of Head of the Department of Mathematics, Taejon Presbyterian 
 College. He is a past member of AMS, MAA and SIAM; he is currently a 
 student member of ACM and an associate member of Sigma Xi. 
 

 
 
 
 
 
 BIBLIOGRAPHIC DATA 
 SHEET 
 
 1. Report No. 
 
 UIUCDCS-R-73-600 
 
 2. 
 
 3. Recipient's Accession No. 
 
 4. Title and SuDtitTe 
 
 AN EFFICIENT ITERATIVE PROCEDURE 
 FOR USE WITH THE FINITE ELEMENT METHOD 
 
 5- Report Date 
 
 August, 1973 
 
 6. 
 
 7. Author(s) 
 
 Young Jip Kim 
 
 8. Performing Organization Rept. 
 No. 
 
 9. Performing Organization Name and Address 
 
 Department of Computer Science 
 
 University of Illinois at Champaign-Urbana 
 
 Urbana, Illinois 6l801 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract /Grant No. 
 
 OWRR PROJ. #U058-ILL 
 NSF GJ-36393 
 
 12. Sponsoring Organization Name and Address 
 
 Water Resources Center National Science Foundation 
 University of Illinois 1800 G. Street, N.W. 
 at Champaign-Urbana Washington, D. C. 20550 
 Urbana, Illinois 618OI 
 
 13. Type of Report & Period 
 Covered 
 
 Thesis (Ph.D.) 
 
 14. 
 
 15. Supplementary Notes 
 
 16. Abstracts 
 
 The symmetric almost factorization of Stone for finite difference 
 systems is extended to irregular finite element systems. Combined with 
 an adaptive Chebyshev iterative procedure of Diamond, the factorization 
 algorithm is applied to linear systems for the numerical solution of 
 two-dimensional Dirichlet problems by the finite element method. 
 
 For a computer implementation, a compact storage scheme is 
 introduced for sparse symmetric matrices; programming techniques are 
 developed such that no searching operation for elements of the matrices 
 is required in computations of the iterative method. 
 
 17. Key Words and Document Analysis. 17o. Descriptors 
 
 Almost Factorization Algorithm, Compact Storage Scheme, 
 Dirichlet Problem, Finite Element Method, Iterative Method, 
 Positive Definite Matrix, Sparse Matrix. 
 
 17b. Identifiers/Open-Ended Terms 
 17c. COSATI Field/Group 
 
 18. Availability Statement 
 
 19. Security Class (This 
 Report) 
 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 75 
 
 20. Security Class (This 
 Page 
 
 UNCLASSIFIED 
 
 22. Price 
 
 ORM NTIS-35 (10-70) 
 
 
 
 
 
 USCOMM-DC 40329-P7 1 
 
1 4 1973 
 

!-' 
 
 \<& 
 
MBHH hmh 
 
 wiaS