9m HfiHBn mm B»3ra BkBbHHSHHS WmWBaSBK mqmSH (HBhbkBe SH MIMiMb nan mm k M se ffiS HttBHnR 9BHnfii BBWfffflHl mmmtSSSOt WMbMM iWffl sen ■MP Hi BsSSb KrkBS ■ mm bmsi nmfiHn BhwNHSw BR n ■ HH ■raUHi HMB B B HBM H-M. LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAICN 3/0.84 14 6 r no. S9S-&&0 UIUCDCS-R-73-600 AN EFFICIENT ITERATIVE PROCEDURE FOR USE WITH THE FINITE ELEMENT METHOJ August, 1973 DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN URBANA, ILLINOIS #/£ ^ ^ Op The person charging this material is re sponsible for its return to the library from which ii was withdrawn <»n or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. UNIVERSITY OF ILLINOIS LIBRARY AT URBANA CHAMPAIGN JAN 3 ib/4 JAN 2 1 Recd L161 — O-1096 UIUCDCS-R-73-600 AN EFFICIENT ITERATIVE PROCEDURE FOR USE WITH THE FINITE ELEMENT METHOD Young Jip Kim August, 1973 Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois This work was supported in part by the National Science Foundation under Grant No. US NSF GJ-36393 and the Water Resources Center, OWRR PROJ. #A-058-ILL and was submitted in partial fulfillment for the Doctor of Philosophy in Computer Science, 1973* Digitized by the Internet Archive in 2013 http://archive.org/details/efficientiterati600kimy Ill ACKNOWLEDGEMENT The author wishes to express his appreciation and gratitude to Professor Paul E. Saylor, whose supervision, guidance and encouragement made the present work possible. For their valuable comments the author is indebted to Professors D. B. Gillies, J. R. Phillips and D. S. Watanabe. In addition, the author would like to acknowledge the Department of Computer Science, the Office of Computing Services and the Water Resources Center (Allotment Grant Program, OWRR PROJ. # A-058-ILL), University of Illinois and the National Science Foundation (GJ-J6393) for their financial support during the period of his graduate study and research. Sincere thanks are due to Mrs. J. Wingler for her efficient and careful typing of the manuscript, and to Mrs. M. L. Gidel for her aid in preparation of the manuscript. Finally, the author wishes to thank his wife and three children for their patient understanding through his long period of study. IV TABLE OF CONTENTS Page 1. INTRODUCTION 1 1.1 Iterative and Direct Methods 1 1.2 Factorization Methods 6 1.3 Summary of the Contents 10 l.k Summary 11 2. FINITE ELEMENT METHOD 12 2.1 Introduction. 12 2.2 Ritz • s Method and Galerkin' s Method 13 2.3 Finite Element Method 16 3. MATRIX STRUCTURE AND STORAGE SCHEME 26 3.1 Graph Theory and Positive Definite Matrices 26 3.1.1 Graph theory 26 3.1.2 Graph theory applied to positive definite matrices... 27 3.2 Structure of the Matrix Associated with the Finite Element Method 29 3»3 A Storage Scheme for Sparse, Symmetric Matrices 36 3.k Matrix-Vector Multiplication Using a Row-Oriented Storage Scheme 38 h. ADAPTIVE ITERATIVE PROCEDURE TO SOLVE FINITE ELEMENT SYSTEM kO k-.l Introduction ^-0 k-,2 The Adaptive -Chebyshev-Factorizati on Method of Diamond kl h.~5 Almost Factorization Algorithm k6 V Page k.h Computational Results 58 APPENDIX 61 LIST OF REFERENCES 70 VITA 73 1. INTRODUCTION 1.1 Iterative and Direct Methods Two standard, general methods for the numerical solution of elliptic partial differential equations are the method of finite differences and the finite element method (FEM). Each one leads ultimately to a large system of linear equations, called either a finite element system or a finite difference system, the solution of which is usually the most expensive step of the computation. The subject of this thesis is the extension to finite element systems of an iterative method developed for the efficient solution of finite difference systems. Other iterative methods, also originally developed for finite difference systems, have been proposed for finite element systems [12" 1 , but have not been widely accepted. The customary, popular method for the solution of finite element systems is Gaussian T elimination or something equivalent such as the Cholesky GG algorithm. It is important for the judgment of the method proposed here to understand the reasons why iterative methods should be reconsidered. The essential difference between the two methods in the early days of computing was that iterative methods take much less storage, a fact which made it possible to put reasonable size linear systems into fast memory, thereby making feasible on stored program digital machines the solution of boundary value problems in partial differential equations. The first iterative methods used for automatic computing were primitive relaxation methods which converged too slowly unless an effective relaxation parameter could be found to accelerate convergence. There was no satisfactory method to approximate an effective parameter until Young [29] developed a general theory of Successive-Over-Relaxation (SOR) that reduced the computation of the optimum parameter value to a trivial algebraic evaluation involving eigenvalue bounds of a matrix called the Jacobi matrix associated with matrix of the linear system. These eigenvalue bounds are easily computed for a vaguely defined class of matrices called regular matrices. A matrix may be said to be regular if the nonzero elements form diagonals such as occur from the use of the five point difference equation applied to Poisson's equation on a rectangular domain. Finite element matrices are also regular if the domain is a rectangle and the triangulation is made up of congruent triangles. Regularity of a matrix should not be defined too carefully as it also depends on the iterative method. For the Alternating-Direction-Implicit (ADl) method, the domain must not only be square but also the matrix must split into a sum of commutative matrices. Whatever the iterative method, the primary implication of regularity usually has been that a regular matrix is one for which acceleration parameters are easy to compute. A great variety of iterative methods have been proposed for the solution of finite difference systems other than SOR or ADI, but as for ADI or SOR, acceleration parameters are essential to the success of each method. Theories for computing acceleration parameters for finite difference systems are difficult to apply to finite element systems because they exploit as much as possible the regular structure of finite difference systems. Typical finite element systems are not regular as they result from the triangulation of an irregular domain into irregular elements. There are subtleties in the use of direct methods also, such as for example, finding a way to reorder the nodes of the triangulation to minimize computation. Easy to use, broadly applicable techniques and algorithms for reordering the nodes have been devised, whereas it was an obstacle for the use of iterative methods that no general technique was known for estimating acceleration parameters for finite element systems. One of the features of the iterative method we are proposing is that no estimates of acceleration parameters are required prior to execution of the iteration. There are other reasons for the predominant use of direct methods for finite element systems [1^]. One is that storage is not as critical as for finite difference systems because finite element systems of the same accuracy are lower order. Also storage is becoming cheap and plentiful. As we have mentioned, iterative methods were favored for years for solving finite difference systems primarily because they require less storage. Slow convergence was tolerated to gain this advantage. Because of hardware improvements such as bulk core storage and features such as virtual memory, the importance of conserving storage i s fading away. Yet this cannot mean the unlimited use of direct methods. As storage problems become minor, run time for Gaussian elimination remains a formidable barrier to the solution of large systems that cost memory cannot affect. Run time can be reduced only by more efficient methods than Gaussian elimination, such as the direct methods discussed in Buzbee, Golub and Neilson [5" 1 or else rapidly convergent iterative methods. In an attempt to explain the popularity of direct methods for finite element systems, A. George [1^] has suggested these advantages that (a) iterative refinement may be used with direct methods to gain estimates of the condition of the system, and (b) it is common for more than one right hand side to be processed. These reasons are important but, as the magnitude of the problem increases, they fail to justify the blind application of direct methods. Consider for example, the flow equations that arise in certain hydrology studies. These are time dependent problems that may be solved by approximating the space derivatives by the finite element method to obtain a system of ordinary differential equations. The solution of this system is then computed by discretizing the time derivative. If the backward difference method is used, each time step requires the solution of a finite element system. Typical studies require 1000 time steps. With time dependent coefficients, no economy is possible from solving the same system repeatedly by a direct method. Currently, only two dimensional studies are considered feasible. Some technique more efficient than conventional direct methods is essential to make three dimensional problems practical. Difficulties with the solution of the flow equations specifically led to the development of the iterative method described in this work. The difficulties with the solution by Gaussian elimination of the resulting irregular finite element systems are that the code is limited to 2000 unknowns on the University of Illinois 360 service computer and is expensive to run. The iterative method developed here has been tested for up to 10660 unknowns. However, these tests required the use of extended core storage. Gaussian elimination code has not been run using this facility, but it is estimated that extended core storage would allow no more than ^000 unknowns. For some problems, ^000 unknowns is not enough to be practical. Execution time for the code is estimated to be roughly ten times faster than elimination routines. It should be emphasized that further tests of the iterative method must be made to discover its limitations. Any conclusions about the superiority of iterative methods should be made cautiously until more experience is acquired. These test results are mentioned here as an example of how the magnitude of a problem may affect the choice of an iterative method if one exists. A final argument in favor of direct methods is that data management is simpler for elimination schemes than for iterative methods when the triangular mesh is arbitrary. The result of an arbitrary mesh is a sparse matrix with irregular structure and it is an impression that techniques for efficient storage of sparse matrices adapt to elimination schemes more easily than to iterative methods. Alan George has articulated the difficulty as follows, where A is used to denote the matrix of a finite element system. [For difference systems arising from a regular mesh, the structure of the grid and the coefficient matrix can conveniently be stored in two-dimensional arrays. Data management is trivial.] "...However, for an arbitrary triangular mesh, A will not have such regular structure, and the calculation of a single component of the residual vector may be relatively expensive. In general, A will be symmetric and only its upper or lower triangle will be stored; therefore, in order to compute a single component of the residual, we must be able to access lines of elements in both rows and columns of the upper (or lower) triangle of A. If the storage scheme is 'row oriented,' accessing elements in a specific column may require scanning several rows, and visa versa for column -oriented schemes. By contrast, elimination schemes can be conveniently implemented so that they operate only on rows or only on columns." It is possible, however, to avoid these difficulties. The programming algorithms developed to implement the iterative method of this thesis do not require any searching operations during execution of the steps of the iteration even though sparse matrix techniques are used to store the data. The purpose of this part of the introduction has been to present the reasons why direct methods are used exclusively for finite element systems. The development of new programming techniques voids the programming advantage of direct methods. But the main handicap with direct methods is that for large systems, they are impractical due to excessive run time. In presenting a case for consideration of iterative methods, there is some risk that we may seem to propound a doctrine of the inherent weakness of direct methods. The reasons given here for the wide acceptance of direct methods overwhelmingly support direct methods for the solution of moderate size systems if efficient data management and sparse matrix methods are employed in the code. It is of interest that after years of eclipse by iterative methods, direct methods are now being recognized as of special importance in the solution of finite difference systems of moderate size. Their use in petroleum reservoir simulation, an area of engineering to which the finite element method has not penetrated, is now being advocated [22]. 1.2 Factorization Methods Let Ax = b be the system of equations obtained by a finite element approximation, where A is a positive definite and, therefore, symmetric square matrix. If B is a matrix for which the LU factors of A + B are sparse then the sequence defined by (A + B)x. = (A + B)x - t. (Ax. - b), (*) for t • a parameter, is efficient to compute. Therefore, if the sequence converges to x and converges rapidly, the result is a practical method to approximate the solution. Methods to solve Ax = b that combine an 7 iteration such as (*) with an algorithm to construct A + B with sparse LU factors are called factorization methods . For example, successive-over- relaxation may be written in the form of (*) with elements of B explicitly equal to linear combinations of elements of A. Matrix A + B is called an approximate factorization (of A) and (*) is called a factorization procedure . If A +Bis positive definite a sequence of parameters t , . .., may always be chosen to obtain convergence. Moreover, an optimum sequence of parameters, t , . .., t may be computed to minimize the error after n steps. Optimum parameters result from a routine use of Chebyshev minimax theory. Two requirements are essential to apply the theory. One is that A+B be positive definite. This implies that the eigenvalues of (A + B)~ A are positive, since this matrix is similar to a positive definite matrix. Unless these eigenvalues are positive there is no convenient, effective method to apply Chebyshev theory to obtain decent parameters. If they are positive it is trivial to compute optimum parameters using Chebyshev theory from the largest and smallest eigenvalues of (A + B) A. The second requirement is the practical one that the largest and smallest eigenvalues of (A + B) A be available to compute with. It is a common difficulty with iterative methods for solving linear systems that a numerical value of some parameter is required in advance of running the iteration. For example, an accurate value of the optimum relaxation parameter is essential in order that SOR run efficiently. Also, precise eigenvalue estimates are critically important in order to obtain optimum or even reasonably effective ADI iteration parameters for solving finite difference equations. In each case, a priori estimates valid for a class of matrices are often satisfactory to use. The difficulty with estimating iteration parameters is more acute for factorization procedures because the 8 techniques that work for ADI or SOR are no longer effective. The regularity of the matrices that appear in SOR or ADI makes it possible to obtain good a priori estimates in most cases, whereas the complicated properties of the elements of B in factorization procedures, hinder effective estimation of the extreme eigenvalues of (A + B)~ A, so that a priori estimates of Chebyshev parameters to use in factorization procedures are generally unsatisfactory. In cases when a priori computation of SOR or ADI parameters is not possible, methods have been devised for the direct computation of the necessary extreme eigenvalues that, although a nuisance, are much less expensive than direct numerical approximation of the eigenvalues of (A + B) A for factorization procedures, which could cost as much as running the iteration. The amount of computational work required to obtain accurate bounds of the eigenvalues of (A + B) A is so great that factorization methods are not efficient unless the labor of executing the iteration yields these bounds as a by-product. Efficiency requires that the work of executing the iteration be combined with the work of estimating the eigenvalues. The adaptive -Chebyshev- factorization algorithm of Diamond [10] achieves this. It estimates eigenvalue bounds and optimal parameters dynamically by using the fact that without optimal Chebyshev parameters, the iterates may be combined to approximate an extreme eigenvector. The idea of estimating parameters dynamically is used in some SOR programs [ 28] to obtain an increasing sequence of estimates of the SOR parameter w. Therefore, the idea is not new although the details are completely different. What does seem to be new is its potential for practical applications. Experimental comparisons of Diamond's algorithm 9 to other iterative methods have been made [10]. For each test, Diamond's algorithm achieved roughly two orders of magnitude greater accuracy for the same computational work. One of the most appealing features of Diamond's algorithm is that no preliminary eigenvalue estimates are necessary. Only trivial input parameters are required, and the user is freed of the task of careful numerical preparations, essential to the effective use of SOR, and ADI. This feature should "be stressed as well as the promise of great practical utility, if applications to finite element systems are to be made. Diamond's algorithm is an example of an adaptive or dynamic algorithm. The characteristic property of adaptive numerical algorithms is that the sequence of executed steps is determined dynamically while the algorithm is running. The logical control feature of the machine is exploited in a more fundamental way than by traditional numerical algorithms, which may be called static algorithms. In adaptive algorithms, logical control statements (IF ... statements in the FORTRAN version of the algorithm) direct the choice of one sequence of computation over another, depending on the outcome of certain trials, whereas in static algorithms, the use of these statements is superficial, for example to halt an iteration or to flag errors. In a traditional algorithm, if the path of computation is to be altered by some test, the test can usually be formalized as a mathematical function, that is, the path the computations take may be predicted by a formula. For this reason, static algorithms may be analyzed more easily than adaptive algorithms to determine, for example, the rate of convergence. In adaptive algorithms the quantities necessary for analytic study are not available until certain trials are performed and the outcome of these trials is too complex to formalize mathematically. 10 Recently developed algorithms to solve initial value problems in ordinary differential equations are an example of adaptive algorithms in common use. These algorithms select at each step the next stepsize to be used and the order of the method, depending on estimates of certain parameters based on numerical results from the preceding step. 1.3 Summary of the Contents The finite element method is reviewed in Chapter 2. The structure of finite element systems is described in Chapter 3, with an explanation of the relation between the structure of the matrix and the ordering of the node points. The problem of how to reorder the node points to optimize Gaussian elimination is complex and not discussed. The reader is referred to the work of Rose [23]« For the finite element systems considered here it is shown only that the matrices are not perfect elimination matrices [23]. Reordering of the node points is not normally considered important for iterative methods, but it would be convenient as it would be also for direct methods if reordering would yield a matrix for which the nonzero elements form bands of diagonals as this would facilitate storage. It is shown that no such reordering scheme is generally possible. A compact storage scheme is also presented that requires less storage than other methods [1^1 known to the author. Programming techniques to avoid searching operations are described. In Chapter k the iterative procedure of Diamond is presented' in detail. An almost factorization is described for the matrix of the irregular finite element system in Figure k. Details of an actual computer implementation are displayed in four flowcharts of the Appendix. n l.k- Summary In this thesis, the symmetric almost factorization of Stone is extended to irregular finite element systems. The principal contribution of the thesis is to show that for a specific practical problem an adaptive Chebyshev factorization procedure may be devised for the solution of a finite element system. Some novel programming techniques are introduced to make the code practical. Further work is planned to extend the method to a wide class of finite element systems. 12 2. FINITE ELEMENT METHOD 2.1 Introduction Certain boundary value problems of partial differential equations are equivalent to problems in the calculus of variations. The Ritz method for solving a boundary value problem is to approximate the solution of the equivalent variational problem by a linear combination of "coordinate functions." The Galerkin method is an extension of Eitz's idea such that it can be applied to equations "with no equivalent variational form. The finite element method is essentially a technique to construct a set of coordinate functions for the Ritz or Galerkin method. The difficulty of constructing coordinate functions was clearly stated by Courant [9]: "From a practical point of view almost any success depends on the selection of coordinate functions. ...More annoying is that a suitable selection of the coordinate functions is often very difficult and that laborious computations are sometimes necessary." In the FEM the domain of a boundary value problem is partitioned into finite elements (triangles or rectangles), and a piecewise polynomial (a continuous function which is a polynomial on each finite element) is constructed for the approximation to the solution. Accuracy of the approximation is increased by increasing the degree of the polynomials. It is known that piecewise polynomials of degree k - 1 achieve a reduction in the error of order h , where h is the maximum diameter of the finite elements [25]. In practice, high accuracy is sacrified for the simplicity 13 and convenience of piecewise linear and piece-wise bilinear polynomials in triangular and quadrilateral nets. Polynomials are rarely of degree greater than three. Throughout the paper we -will consider the Dirichlet problem in two-dimensional Euclidean space. A review of the main ideas of the Ritz and Galerkin methods is given in Section 2. In Section 3, details of the finite element method are given for piecewise linear and piecewise bilinear polynomials as basis functions. 2.2 Ritz's Method and Galerkin' s Method In this section, the variational form of the Dirichlet problem is presented. The approximate minimization of the functional of the variational problem is reduced by the Ritz method to a system of linear equations Ax = b, where x is a vector consisting of the unknown parameters in the linear combination of coordinate functions. For later use in Section 3 "we show that matrix A is positive definite. The Galerkin method is also described. We consider the Dirichlet problem with a homogeneous boundary condition in a bounded region q with smooth boundary c)ti, Lu = f in fi, (l, a) »l an - 0, (l,b) where L is an elliptic operator defined by Lu = - z 5 — ( a - • n — ) + hu, a. . = a.. . (2, a) * dx ij dx ' ' 13 31 For every t n and t. the coefficients a. . satisfy 12 13 11+ 2 2 Z a. . t. t. > u z t., u > 0. (2,b) i,J=l ° d i=l The operator L is defined on the dense set M of functions which are twice continuously differentiable in Q and vanish on the boundary bti. It can be 2 2 shown that (Lu,u) > y ||u|| , i.e., L is positive definite. Under the assumption that a. ., b and f are sufficiently smooth, the Dirichlet problem has a unique, regular solution [21]. It is well known [19] that if L is positive definite, the solution of the Dirichlet problem minimizes the functional F(u) defined by F(u) = (Lu,u) - 2(f,u) . (3) Conversely the element which minimizes the functional is the solution of the Dirichlet problem [19,20]. The Ritz method approximates the minimum of F(u) in a finite dimensional sub space of M spanned by linearly independent "coordinate" functions cp_ , cp . ..., cp . We set 1 2 n n u n = Z a cp., (k) 1=1 where the a. are constants. Substitute u for u in (3) to obtain 1 n F(u ) = (Lu ,u ) - 2(f,u ) n n' n ' n n n n = ( z a Lcp ., z a cp ) - 2(f, E a cp ) i=i 3=1 J J i=i n n = z (Kp., cp ) a a - 2 Z (f, cp. ) a.. i,J=l X * x 3 i=1 1 1 15 The minimum of F is sought over all such u by differentiating with respect to a a , which are the parameters that define u , then setting the In o derivatives equal to zero. Since L is symmetric, i.e., (Lcp., cp.) = (Lcp., cp.), ^P^= 2 s (L?., «PJ a - 2(f, q> ). The minimization of F(u ) is now reduced to solving the linear system of equations n Z (Lcp , cp.) a = (f, cp ), i ■= 1, .2, ..., n. (5) 3=1 d J Denote system (5) "by Ax = b. Matrix A is the Gram matrix [19] associated with the coordinate functions ^1* ^2' '"' V i * e *> A i-j = (^i* Wp> x= ^ a p a 2> •••' a n ^ ' and b = ((f, cp ), (f, cp ), . .., (f, cp )) . Symmetry of L implies A is symmetric. Moreover, since L is positive definite, A is also positive definite: For, if c = (c n , c , .... c ) is a nonzero vector and cp = c_,cp_, + c^cp^ + ... + 1' 2' ' n 11 2 2 c cp , we have n T n' n n (Lcp, cp) = ( z c Lcp., E c cp ) i=l X 1 j=l J J n n = Z I (Lcp , cp ) c c i=l j=l J J = (Ac, c). Since cp cp , ..., cp are linearly independent, cp ^ and (Ac, c) = (Lcp, cp) > J 2 \\ cp || 2 . 16 By choosing the a. in (h) to be the solution of system (5) we obtain the Ritz approximate solution of the variational problem. System (5) can be derived without utilizing the variational form. Equation (l, a) is equivalent to Lu - f - infi (I 1 , a) Substitute u in {k) for u in (l',a). The requirement that the left-hand side be orthogonal to the coordinate functions leads to system (5). This is the idea of the Galerkin method. More technically, the Galerkin method is a method for solving the weak form of the differential equation. 2.3 Finite Element Method For the numerical solution of elliptic boundary value problems in a bounded region ft, the finite element method has become a popular and effective procedure. For simplicity we assume that the region ft is a polygon. Let ft = ft U dft. The technique, first, is to partition ft into a union of "finite elements," of which commonly used finite elements are triangles and rectangles. The partitioning must be such that each pair of finite elements shares a whole edge, a vertex, or nothing; a vertex of one triangle or rectangle is not permitted to lie on the edge of another. Next, a trial function is constructed with the property that it is a polynomial on each finite element. We will restrict our discussion to piecewise linear and piecewise bilinear trial functions in triangular and rectangular elements, respectively. The linear approximation is described in relation to the Ritz method and the bilinear approximation in relation to the Galerkin method. 17 For convenient reference we include some of the details of the finite element method. Standard matrix notation is used. The Ritz method reduces the Dirichlet problem (l, a) with boundary condition U loT; = g dS*) to the problem of minimizing the functional F(u) = (Lu, u) - 2(f,u) in the set of functions which satisfy the boundary condition (l',b). By Green's formula Z a. . ^ ^ + bu 2 1J ox. ox 3u Z a. . «r- cos(p, x. )ds (6) . . , XJ UA. UA. j . . n 13 ox. ' 1' where rj denotes the inward normal to the boundary o"fi and s denotes the arc length of bo. measured in a counter-clockwise direction from a fixed point. Suppose the domain fi is partitioned into triangles T , t = 1, 2, ..., M, and the internal and the boundary node points are numbered 1, 2, ..., R; N+l, N+2, ..., N respectively. A linear function u T defined on a triangle T in Figure 1 can be written as u = a + bx + cy. The three node values w of u T at the points P (x , y ), n = i, i, k n nn determine the values of a, b, c: a + bx + cy = w , n = i, i, k. n n n' ' ' Figure 1 18 In matrix notation 'i x. r i\ M l\\ 1 x. y. J 3 *k w lei w. w . which may also be written in transpose form (a, b, c) 111 x. x. x k i J (■w ± , w , w k ). \7i *, tJ The transpose form is more convenient and will be used below. Let -1 C T = /111 X. X. x. 1 J is Vi y j y J T w = (w.,w., w ) and * = l' x 19 Then t T r u = V C ijf. (7) Define the function v on the domain ft by v T - u T on T T , on ft - T T . Then, the summation v of the functions v M T v = Z v (8) T=l is a piecewise linear function on the domain ft. Let the boundary node values be the values of g at the points. If g is piecewise linear on Sft, then vk = g, i.e., v is a trial function satisfying the boundary condition. Substitute v for u in (6) to obtain M F(v) = Z F(v T ) (9) T=l where 2 ^ T n t 2 f ( z a. . g- g- + bu T ) dxdy - (2fu T ) dxdy T< F(v L ) = T t i,j=l J i j ^t r 2 ^u T + J g . E . a ij §r: cos (ti > x i } ds - o:ft T l '3-l ^ In the last term oft, denotes the sides of the triangle T lying on dft. From (7) 20 T T T _T T _T U = ¥ C f = |, C w^ Su T T n T . T „t T TT sr — = w C i|( = iir C w. £x. Xi "x. ■N T Substitute these formulas for u and « — in the three terms of F(v T ). ox. ' 1 To express the result, define J ( - E i *« \ ■'t i,o=l d i B T = ( Z a )|r / + b Mr >|r T ) dxdy, p T (2ft) dxdy, We have q = I g Z a. . \k cos (n. x. ) ds. . . -, lj x. 1 f 2 ^ ^ A T 2 m T ( E a. . ^ — =r — + bu ) dxdy = w C B C w. . . , i t i ox. OX. • : ' J T i*J=l i r T { T I (2fu) dxdy = w C T p T , T T g . ? , a ±d sir cos Cti * x i A j T T T ; ds = w C q . T Let matrix A = C B C , vector f = C p and vector g = C q . We obtain F(v j = w A w - W f + w g, To minimize F(v) in (9), differentiate F(v ) with respect to the «r parameters associated with internal node points. If, in triangle T , w., w., and w are the parameters associated with internal node points, then 21 = (A T + A T ) w - f (10) There are two different classes of triangles on which at least one node point lies on the "boundary. If, for example, i is the only internal node i point of the triangle T and one edge of the triangle lies on the boundary dft, then p-iuVc'V*-^^ (id where the subscript 1 denotes the first row of a matrix and the first component of a vector. For the other class of triangles the partial derivatives of F(v ) with respect to one or two parameters associated with internal node points can be expressed similarly. If an internal node point i is shared by triangles T , T a , ..., T , then 0D> oT(v) _ oT(v T ) + dF(v g ) + _ + dF(Q < dw. c)w. ^w. owl 111 1 (12) Combining (10), (ll) and similar expressions associated with other classes of triangles, we obtain a linear system of equations dF(v) _ n 1-12 N Bw. ' " ' ' J 22 The linear system can be written in matrix and vector notation as Ax = b (13) T where x = (w , w , . .., w N ) . The solution of the system (13) minimizes the functional F(v) in (9), and is the finite element approximation of the Dirichlet problem (l, a) and (l',b). If Q is the union of rectangular elements, not only is the geometry simpler but also the construction of the family of coordinate functions. The simplification is due to the fact that a rectangle is, mathematically, the cross product of two intervals. Techniques for the construction of polynomials on intervals to approximate functions of one variable form a part of classical mathematical theory. There is no corresponding well-developed theory of approximation by polynomials of functions defined over general higher dimensional domains. But if the domain is a rectangle the one -dimensional theory easily extends to higher dimensions by multiplying approximating polynomials in one variable to obtain approximating polynomials in several variables. We assume that II is a rectangle with edges parallel to the coordinate axes, i.e., Cl = (a, b) X (c, d). Partition [a, b], [ c, d], and 0. by tt , 7T , and ir respectively: x y tt : a = x^ < x., < . . . < x = b, x 1 n ,, ' x+1 % ■'• c = y < y < ... < y - d, y y+1 TT = 7T X TT . x y 23 A piece-wise Hermite interpolation in H (ir, fi) is defined as a linear combination of bilinear functions t. . (x, y) = t. (x) t.(y) for < i < n + 1 i j i j x and < j < n +1 where t. (x) are linear functions defined by t Q (x) = x l - X *1 - x o x n < x < X T otherwise, t.(x) *n + 1 (X) X ( X - X i-1 1 1-1 x i + l - x x i + l - x i f X - X n x X - - X n +1 n x x x. n < x < x., l-l — — i' x. < x < x. ,,, l — — i+l' V. otherwise, x < x < x + 1 , n — — n ' x x otherwise. Define n n n +1 n +1 where g..=g(x., y.)0 •••' M. (15) . _ . . v ij' mrr 13 ' mir g' mn" Vn = 1, 2, ..., nJ 1-1 j-i -y Since t are zero on the boundary dfi, in the system (15) mn (Lt. ., t ) = 1J mn . R. . ijmn r 2 at. . at ( Z a v „ ■^ L ^ m +bt.. t ) dxdy (l6) k,i=l k " \ Sx i 1J m where R. . is the intersection of the two rectangles [x. n , x. J x [y. ,, y J ljmn & L i-l> 1+1 J LJ j-l' J j+r and [x n , x , n ] X [y _, y , ] in which t. . and t are nonzero respectively. L m-1' m+1 L n-1' n+1 ij mn J Renumbering v. . in (15) by v. ; i = 1, 2, .... N(=n n ) we obtain a linear ij 1' ' ' ' x y system Ax = b (IT) T where x = (v^ v , ..., v ) . In Section 2, we have seen that the Gram matrix associated with coordinate functions is positive definite. Matrix A in (17) is the Gram matrix associated with the bilinear functions, and so is positive definite. 25 Matrix A in (13) is also positive definite, and the proof follows: It is simple to show that v in (8) is a linear combination of Synge's pyramid N H T function, i.e., v = z w. cp. + z w. cp., where cp. is a continuous and -nil. txt.t i i 1 i=l i=N+l piecewise linear function on ft such that the support of cp. (x, y) is the union of triangles with the vertex i and cp. (x., y. ) = 1[26]« Therefore, matrix A is the Gram matrix of the linearly independent pyramid functions cp., i = 1, 2, . . ., 1. Henceforth, A will be used as a generic symbol for a matrix arising from the finite element method. 26 3- MATRIX STRUCTURE AND STORAGE SCHEME For the numerical solution of the Dirichlet problem, we have obtained a linear system of equations Ax = b, where A is positive definite. The structure of A and a scheme for storing the elements of A are discussed in this chapter. To investigate a linear system of equations with a sparse matrix, it is convenient to use graph theoretical methods. The relation between graph theory and positive definite systems is reviewed in Section 1. The treatment is standard [23]« In Section 2, the relation of the structure of A to ordering of the node points is described and some results on possible structures given. A compact storage scheme for sparse positive definite matrices is explained in Section 3* In Section k a technique is presented for matrix-vector multiplication without the need for any searching operations. 3.1 Graph Theory and Positive Definite Matrices 3.1.1 Graph theory. A graph is an ordered pair G = (X, E), where X is a set of finite elements called vertices and E is a subset of unordered pairs { {p, q} |p, qeX, p / q}. The elements of E are called edges . An ordering of X is a one-one correspondence rx a [1, 2, ..., n} ^-> X. If X is ordered by a in the graph G = (X, E), then the ordered triple G = (X, E, a) is the ordered graph. In the ordered graph the point n. — — *— — assigned the number i will be denoted by a. or simply by i« A 27 realization of a graph on a plane is trivial: vertices and edges are represented by points and line segments joining the pair of points respectively. If IP* 9.3 is an e dg e j then q is a vertex adjacent to p, and the set of vertices adjacent to p is denoted by adj (p) . A cycle of length n is an ordered set of n distinct vertices [p , p , ..., p , p ] such that p e adj(p. ), i = 1, 2., ..., n-1, and p e adj (p ), and a chord is an edge of two non- consecutive vertices of the cycle. A graph is triangulated if every cycle of length greater than three possesses a chord. 3.1.2 Graph theory applied to positive definite matrices. For a positive definite matrix A there exist unique matrices L and D such that T A = L D L , where L is a unit lower triangular matrix and D is a diagonal matrix with positive entries [16]. If we solve the system r Lz - b Dy = z .L x = y, then x is the solution of the original system Ax = b. In general the matrix L is not as sparse as the lower triangular part of A. T The system Ax = b is equivalent to the system PAP Px = Pb where P is a permutation matrix. (A permutation matrix is a matrix for which each row and column contains exactly one element equal to unity, the other elements being equal to zero.) If there exists a permutation matrix P such that 28 T T B = PAP = L D L , where B. . = implies L. . = 0, i > j, then A is called a perfect elimination matrix . A graph theoretic characterization of perfect elimination matrices will be stated below. We associate with a positive definite matrix A of order N an ordered graph G (A) = (X, E, a) with N vertices cu, rv p , . .., r/ , such that the vertex n. corresponds to row i and [a., a..} e E if and only if A. . 4 i i ' o io and i < j. It is simple to show that if a matrix B is obtained from matrix A by interchanging rows A. and A. and interchanging corresponding ■*" tJ columns A, . and A, ., then the ordered graph G R (B) associated with the matrix B is obtained from G (A) by simply interchanging the two numbers assigned to the points a. and ex. • i.e., ct. = p., ct. - P., a, = P, for i o ■ 1 i * * k / i, j. This interchange of exactly two elements is called a T transposition. Let B = P A P . Then B is obtained from A by permuting its rows and columns, and the permutation is defined by P. Every permutation can be written as a product of transpositions, from which it can be shown that the ordered graph of B is obtained from the graph of A by reordering its vertices. To state this precisely, if P.. = 1, define J -'- P(i) = 0« We have the following proposition: T Pro-position 1 : For a given matrix A of order N, let B = P A P and the graph of the matrix A be given by G (A) = (X, E, ry). The graph of B is obtained by reordering the vertices X indicated by the permutation matrix P, i.e., G (b) = (X, E, p) where p is defined by 3 p (j_) = a. ± , i = 1, 2, ..., N. T Now we can associate with the class of matrices {B|B = PA P } an unordered graph G(A) by deleting the ordering a from the ordered graph 29 G (A). It is known that G(A) is triangulated if and only if A is a perfect elimination matrix [23] • 3.2 Structure of the Matrix Associated with the Finite Element Method The locations of nonzero elements in matrix A are determined by the structure of the triangular net and the ordering of the node points of the net. A more detailed description of the relationship is as follows: Assume that an internal node i is connected to internal nodes j_, j_, ..., j. by the edges of triangles 1 , t f ..., t as in Figure 2. The i-th equation of the system Ax = b in (13) is -^ — <- - 0, and, as in (12) "d w. aF(v) _ \ 5F(v n ) ow\ _ dw\ 1 n=l 1 (18) Figure 2 By (10) 1 x 1 T where A is a 3x5 matrix and w = (w., w. , w. ). If we denote the first ^1 ^1 T l ^1 ^1 row of the matrix (A + A ) by (a.., a. . , a.. ), then 11' 2.3 ± ' ij 2 Mz!ii_ a T i T i t i t i ~ = a . . w . + a . . w. + a . . w. + f -, • 30 Similarly, T — ^ L = a. . w. + a. . w. + a. . w. . + f_ , n = 2, 3, • • . t k-1. ^T X1 X 10 n J n «(a+l) 1J (n + l) X aF(v Tk ) T k r T k T k Jk «J ' ' = a., w. + a. . w. + a. . w. + f . dw ± 11 i ij k j k 13 x J-l 1 Thus dF(v) -•^r — = a. . w. + a. . w. +a.. w. + ... + a. . w -b. ow. ii i lj-. j, ij j ij, j. i i u l u l d 2 d 2 d k d k where k T n a. . = E a. ii n=l ii T l T k a. a. . + a. U-L 1J 1 13 X a.. — a.. + a. . , n — 2, o, ...,k ij 10 id n n n T l T 2 T k b. - -f, - f_ - . . . - f_ . ill 1 Therefore if internal node point i is connected to internal nodes j , j , •••> 3-uj ^■' ie ^-"^ equation of Ax = b is a. . w. + a. . w. + a. . w. + ... + a. . w. = b. (19) ii l i3 ± J 1 ij 2 j 2 ij k J k l It follows from (12) that if node point i is also connected to boundary nodes, then the boundary nodes induce no additional terms in (l9)» To summarize we have: Proposition 2 : In a given triangular net of the domain fi, if an internal node i is connected to k internal nodes j_ , j p , . .., j, by an ordering of the node poin then, in the linear system Ax = b, the nonzero elements of i-th row of A are at most A.., A.. , A.. , .... A.. . 31 Without loss of generality we shall assume in the following pages that A... A. . , A. . , .... A. . are nonzero. As an illustration consider the nonuniform triangular net of a rectangular domain in Figure 3« With the node points ordered as shown, matrix A is as in Figure k where nonzero elements are marked by an x. Figure 3 32 XX X XXX XXX XXX X XXX XXX XXX X XXX * XXX • 1 • XXX XXX XXX X XXX XXX XXX X XXX XXX XX X XX XX X X XXX XXX XXX XXX X X XXX XXX XXX XXX X X XXX XXX • • • • • • • • X XXX XXX XXX XXX X X XXX XXX XXX XXX X X XXX XXX XX XX X XX XX X xxxx X XXX XXX X xxxxx XXX XXX XXX X XXXXX X • • • • X XXXXX X XXX XXX X XXXXX XXX XXX XXX X XXXX X XX XX X X XX X X XXX XXX X X X XXX X • • • *X XXX XXX XX XX X XX XX X X XXX XXX XXX XXX X • • • * X XXX XXX XX XX X XX XX X XXX XXX XXX • • 'x XXX \ XX XX Figure k A finite element triangulation of a region is a graph, and is the graph of the resulting finite element matrix, if the coordinate functions are piece-wise linear. This fact is well known; we mention it here for use below in a remark. Some adjustment must be made for boundary points; 33 an exact statement is given below. It is not true that the discretization net is the graph of the corresponding matrix for higher order approximations, for bilinear coordinate functions, or for the method of finite differences. Nevertheless an association between the net and the matrix, which does not fit into graph theory, may be fruitful as Alan George has shown with the method of nested dissection [15]. Corollary : In a given triangular net, after removing the boundary node points together with edges incident to them, the remaining triangular net is called the inner triangular net. The inner triangular net of a given triangular net of ft is the graph of the matrix A. It would be optimum for the application of direct methods for matrix A to be a perfect elimination matrix [23], i.e., the LU decomposition of A does not fill in any of the zero elements of A. Rose has proved [23] that A is a perfect elimination matrix if and only if G(A) is triangulated. If a triangular net has an internal node point surrounded by a cycle of at least four adjacent internal node points, then the graph of the matrix A is not triangulated, and A is not a perfect elimination matrix. In most realistic problems, every triangular net has many internal node points possessing at least four adjacent internal node points, and so most realistic problems do not yield a perfect elimination matrix. In this thesis we shall only study approximate factorizations of matrices with at most nine nonzero elements per row. Some justification of this follows. The error bound of the approximation is proportional to the term l/sin 6 where 6 is the smallest angle formed by a pair of edges with a common node point [33 ] • The error bound is therefore minimized if the triangles are equilateral. In order to propose a restriction in triangulation scheme we define the concept of a mean angle at a node of triangular net. Definition : If an internal node point is a common vertex of n triangles, then the mean angle at the node is 36o°/n, and if a boundary node point j is a common vertex of m triangles, then the mean angle at the node j is A°/m where A° is the angle covered by the two boundary edges incident to the node j. Conclusion : If the mean angle at every node point is not less than ^5"', then the number of nonzero elements in every row of A is not more than nine. This assumption is satisfied in most triangulations used for practical purpose. Under the assumption that coordinate functions are piecewise linear defined over a triangular net, we have seen that the triangulation is the graph of A and A is not a perfect elimination matrix. We next show that triangulations exist for which no reordering of the node points yields a banded matrix. This property would be desirable because band matrices are simple to store. We do not present a graph theoretic characterization of banded matrices such as is possible for perfect elimination matrices, but only show that for one simple triangulation a banded matrix is not possible. First we give some definitions. A set of three consecutive positive integers is called a basic set . The distance between two disjoint basic sets B and B is defined by d = mini a. - b . I - 1, a. e B_, b. e B . 1 i 3 ' i 1 J 2 A family of three disjoint basic sets is called a compound set if the distances of "adjacent" basic sets are equal. The equal distance is called the gap of the compound set. The center of a compound set is the middle element of the increasing sequence consisting of the elements of the set. If for row i of matrix A, the column indices of nonzero 35 elements are contained in the compound set of gap d with center i, for d independent of i, then the matrix is called a three - banded matrix with gap d. All nonzero elements of a three -banded matrix are contained in at most nine diagonals, therefore it is enough to store only five diagonals and the gap to store the matrix. It is easy to construct an example of a triangular net such that A is not three -banded. If the inner triangular net of a region contains the net G depicted in Figure 5> then there is no ordering of the node points such that A is a three-banded matrix with any positive gap. Figure 5 For an inner triangular net containing G, suppose that there exists an ordering of the node points such that A is a three-banded matrix with positive gap d. Then there are two compound sets C and C with centers, i and j, respectively, which order v^ v £ , v , v^, v , o, p, q, r, u^ u , u , u, . and u_, such that |C_ D C_ I =1+. Let i < j. It is clear that 2 4 5 '12' j=i+d+2orj=i+d+4. Suppose j = i + d + 2. Then q should be ordered by j + 1 = i + d + 3 or j - d - 3 = i - 1. If q is ordered by i - 1, then s and t should be ordered by i - d - 5; and if q is ordered by 3+1, then s and t should be ordered by j + d + 5. Each is a contradiction. Arguments for j =i+d+4is similar. 36 3.3 A Storage Scheme for Sparse, Symmetric Matrices Several compact storage schemes have been developed for sparse symmetric matrices. We shall mention some of these, and describe the scheme we use. For matrices with a dense band the "diagonal band storage scheme" is very efficient because it does not require any bookkeeping information about indices of stored elements and overhead storage due to zero elements in the stored band is negligible. In the "profile storage scheme" of Jennings [171 all elements from the leading nonzero element to the diagonal element of each row are stored successively in a one-dimensional array. An "address sequence" of another one -dimensional array is used to locate the position of the diagonal elements. George has proposed [Ik] storing only nonzero elements of the lower triangular part (including the diagonal) of the matrix of order N in a one -dimensional array S. Two one -dimensional arrays, one of order equal to the order of S and the other of order N, are used to access A from S. In the scheme we propose, the upper triangular part of A is stored as explained below. It is similar to George's method, but takes less storage. Suppose the upper triangular part of A is, A ll A 12 A m A 22 A 23 A 2n A 2 ri+1 A 2 n+2 A 33 V A 3 n+2 A. . A. . _ A. . A. A. A. ii i l+l x 1+2 lm i m+1 l m+2 \-l N-l \-l N 37 The nonzero elements are stored successively from the first to the last row in a one -dimensional array OS, os i - V 0S 2 ■ V os j ■ V °\ • V °S - V ° S 6 " V ° S 7 " A 2 n + l> 0S 8 . A g ^, ... °VV ° S 10 = V ° S ll = A 3n + 2' "• ° 8 3 - \i' °Vl " A i ML' ° S j + 2 " A i i + 2> ° S o + 3 " V V = 'i*l' °V 5 = A im + 2' - 0S M-2 = ^-1 N-l' ° S M-1 = Vl H 0S M " V- In order to retrieve A from storage, another one-dimensional array L is used to store the number of nonzero elements of each row and column indices of the saved elements not on the diagonal. The element of L corresponding to the diagonal element is the number of nonzero elements in the row with that diagonal element, and the element of L corresponding to a nonzero element is the column index of that non-diagonal element. For the example above, L l = 3 ' L 2 = 2 > S = n ' \ = 5, L^ = 3, L 6 = n, 1^ = n+1, L g = n+2, L 9 = 3 > L io = k > L n = n+2 > ' ' ' L. = 6, L. . = i+1, I. _ = i+2, L. , = m, L. , = m+1, L.._ = m+2, j ' ,1+1 ' j+2 ' j+3 ' M » j+5 ' 38 This method for storing the elements of A will be called a natural compact storage scheme . 3.^ Matrix-Vector Multiplication Using a Row-Oriented Storage Scheme The storage scheme we have described is row oriented: access of elements in a fixed row is easy, but access of elements in a fixed column may require searching through several rows. This means that straightforward computation of Ax, which is necessary in computing the residual, may be relatively expensive for an iterative method. However, it is possible to avoid this difficulty by a simple technique described briefly in this section. First, we explain more the difficulty with matrix vector multiplica- tion. Suppose A = (A. . ) 1 < i, j < N, in standard matrix notation and -'-J x = (x , ..., 2C.). Consider the k-th component of Ax, k-1 N (Ax) = Z \. x ,.+ Z \, x . (20) Computation of the second sum is simple to program using the row oriented storage scheme proposed in Section 3, because the values of a.-,, a^. t-.-i* ••• appears in sequence in array OS. Computation of the first sum is not as simple. Because only the upper half of the matrix is stored, the coefficients in the sum are stored as the elements above the diagonal in the k-th column. Access of each coefficient from storage requires searching through the row to which it belongs. The total number of searches to compute Ax equals about half the number of elements of A, and computation of the residual at each step of the iteration would be too expensive . 39 To explain how to avoid these search operations, we rewrite (20 ), as k-1 N (Ax), - E A., x. + z A. . x.. (21) Consider any one of the terms, say A x., in the first sum, i.e., j < k. Jk J When (Ax), was computed, access of A to compute A x_ did not require a d Jk jk k search. If in the program segment to compute A x, we also include statements to compute A., x. and store this in the location for (Ax), , we will have achieved the computation without any search operations. It should "be clear that each component of Ax may he computed with no search operation and no extra multiplications . An algorithm for computing Ax. = y with A stored in OS by the natural compact storage scheme is given in Flowchart 1 of the Appendix. 1+0 k. ADAPTIVE ITERATIVE PROCEDURE TO SOLVE FINITE ELEMENT SYSTEM k.l Introduction The finite element method reduces the Dirichlet problem to a finite element linear system with matrix A such that A is sparse, positive definite, not a perfect elimination matrix, and the structure is irregular. In Chapter 1, we discussed the use of iterative and direct methods to solve finite element systems. In this chapter we present an iterative method for solving a typical, irregular finite element system that arises in the numerical solution of the mathematical model of the flow of liquid waste in underground sandstone formations. The iterative method is described in Section 2. ■ It is due to Diamond [10] and is referred to here as an adaptive- Chebyshev- factorization (ACF) method. Efficiency of the ACF method depends on the existence of an almost factorization A+B with a sparse LU decomposition for which the ACF method converges rapidly. The technique for constructing A+B is a modification and extension of the technique H. L. Stone introduced in his Strongly Implicit Procedure [2^], which in turn is based on ideas originally due to Buleev. The algorithm for the approximate factorization is described in Section 3« It is presented in such a way as to emphasize implementation on a computer; that is, features of the algorithm that generalize to a class of finite element systems are an integral part of the flowchart description. T The almost factorization matrix A+B is factored in the form LDL . T Only the nonzero elements of L (or L) and D are stored. The elements of T L are stored in a one-dimensional array as they are generated. An algorithm T for solving L D L x = y without searching the elements of L is also presented. Computational results obtained by the new algorithms are presented in Section ^ The algorithms are tested with three classes of matrices. The order of the matrix is increased up to 10660. These experimental results show the per- formance of the algorithm to be satisfactory. k.2 The Adaptive-Chebyshev-Factorization Method of Diamond H. L. Stone proposed an iterative technique, which he calls a Strongly Implicit Procedure (SIP) for solving the linear systems that arise from the finite difference discretizations of the flow equations in petroleum reservoir simulation [2^]. Experience has shown SIP to be a reliable efficient technique for solving many difficult systems, even though no rigorous mathematical analysis of its properties has been performed. SIP is widely used in the oil industry as well as other applications areas. In some installations it is used exclusively. The ACF method that we propose for finite element systems originated with work on the analysis of SIP, and so we begin with a brief description of SIP. Stone's Strongly Implicit Procedure is, first, an iteration (A+B) x. . = (A+B) x. - (Ax.-b) l+l 1 1 to solve Ax = b and, second, a specific algorithm to construct an almost factorization, A+B. The purpose of the iteration is to reduce the error caused by solving (A+B) y = b instead of Ax - b. The iteration Stone proposed achieves rapid convergence by varying matrix B cyclically according to a complicated, but easy to compute, formula. k2 Stone's formula is the result of experience, intuition and skill, with no mathematical theory to explain its success. In an attempt to analyze SIP, certain mathematical simplifications may be made. SIP employs a sequence of matrices B such that A+B is nonsymmetric. To simplify this, Stone's construction may be modified to make A+B symmetric, in fact, positive definite. Next A+B may be fixed instead of varying from step to step. To accelerate convergence another iteration parameter may be introduced to modify the residual. The iteration becomes (A + B) x. _ = (A + B) x. - t. (Ax. - b). (22) l+l 111 Dupont, Kendall, and Rachford Cll] studied this iteration under the assumption that A and A+B are positive definite. They showed that the selection of t. may be made scientifically by the use of classical Chebyshev theory. Unfortunately, in order to apply the theory it is necessary to possess accurate approximations of the largest and smallest eigenvalues of (A + B) ' A. Diamond's contribution has been to show how to estimate these eigenvalues dynamically. Some of the details of Diamond's method will now be given. Let x be the solution and e. = x - x. be the error. Then, 1 l ' e. _ = (A + B)" 1 (A + B - t. A) e.. l+l 11 Therefore, for any vector norm and consistent matrix norm, n-1 Hell < || tt (I - t (A + B) x A) || ||e ||. n i=0 x We shall use the Euclidean vector norm and the spectral matrix norm. Since -1 1 -1 i (A + B) A is similar to the Hermitian matrix A 2 (A + B) A 2 , the eigenvalues *.. ((A + B) A)' are real and [ll] k3 „ = x r r a ^r 1 a^ - min (Ay, y) a - "-min ((A + B) A) - y£> ((A+b) y, y) k = x ( ( i\ 4. -r)' 1 i\) - raax (Ay, y) b - \a* ((A + B) A) - y fo ((a + b5 y, y) * Iteration parameters t. are selected to minimize the spectral radius of the n-1 _! error propagation matrix tt (I - t. (A + B) A) . The optimum parameters i=0 X are the values that minimize the maximum absolute values of the polynomial n-1 P (x) = T (1 - t.x) i=0 on the positive interval [a, b]. By Chebyshev analysis the sequence t. should be selected such that T ((a + b - 2x)/(b - a)) P n (x) = \ ((a + b)/(b - a)) (23) where T is the Chebyshev polynomial of degree n. The maximum absolute . -1 value of P (x) on [a, b] is [T (- )] , which is less than 1. The n ' n v b - a 7 ' recursive property of the Chebyshev polynomials may be used to define the iterates x. of (22) by x. = x. + Sx.; i = 0, 1, 2, . . ., n - 1 (2k, a) ^T,(y) -, T n(y) Sx. = jr t-t= r-r (A + B)" X (b - Ax.) +' / v Sx. _ (2k, b) l (b - a) T i+1 (y) i' T i+1 (y) i-l ' 6x o = rfb (A + B)_1 (b " ^V (2 ^> c) u a+b where y = = . b - a Although A. . and A. are not available, matrix (A + B) can be mm max ' generated such that the interval [A. . A. ] includes 1. If we choose an min max kk arbitrary positive interval [a, b] containing 1, then one of these four cases will arise: (i) a < \ . and b < \ , rain max' (ii) a > X. . and b > \ , man max' (iii) [a, b] c \\. , X ], v ' L ' J L mm' max J ' (iv) [a, b] => YK ,. , \ ]. L ' J — L min' max In case (iv) the iteration (2^-) obviously converges; in other cases it may not. For other cases, Diamond developed an algorithm that yields a sequence of intervals Fa., b.l such that b. ->• \ . a. -+ \ . . and a. -» \ . and L 1' 1 i max' i mm' i min b. -* \ in cases (i), (ii), and (iii) respectively. In order to show x max some of the details, let S. ,_ = (A + B)~ (b -Ax.) (see (2k, b)). It can i+l l be shown that Vi ■ n \ (I - *± (A + B)_1 A) 6 r 1=0 If I P (X. . ) I 4 P (^ )|, 6 n approaches the eigenvector corresponding 1 n min ' ' ' n max ' ' n+1 to one of the extreme eigenvalues of (A + B) A, and the approximate eigenvalue is given by (A5 _ , 8 .. ) v n+1' n+1' U * = ((A + B) B , 8 n+1 )' (25) This quantity is used to determine which of the four cases occurred and to obtain a new interval. In the actual implementation the possibility that I P (A. . )|~|P (^ )| is also considered. The iteration of (2k ) is cyclically 1 n min '= ' n max ' performed with adjusted eigenvalue bounds at the end of each cycle if needed. In (2k, c) x is an initial estimate of the solution for the first iteration ^5 cycle, x is used as the initial guess for the next iteration cycle, and so on. In summary, while the interval [a, bl of the initially estimated eigenvalue bounds is approaching one of (i f ) [a, X 1, L ' max " (ii') [X . , b], L man' ' (iii') [X . , X ], L man 7 max ' (iv') [a, bl, in cases (i), (ii), (iii), and (iv) respectively, x. converges to the solution of the system Ax = b. If the initial estimate [a, b] is very crude, then at the end of the first iteration cycle Ax - b » Ax. - b . ii n ii ii o M It is better to choose x. again, instead of x , as the initial guess of the . ' n solution for the next iteration cycle, and so on. With this modification Diamond's procedure is shown in Flowchart 2 of the Appendix. If a « X . man or A. » b in (i'), (ii'), and (iv'), then the final interval is much longer than the optimum interval \X . . X 1. But it is easy to avoid L man' max J J this by choosing the initial a and b to be close to 1. Within a few cycles of iteration, experience has shown that the interval of the bounds stabilizes, Once the interval is stabilized no further adjustment is required after the next iteration cycle. When this happens, iteration {2k) can be performed with increased n until a satisfactory approximation is achieved. Experience with the algorithm has been that about forty iterations are sufficient to reduce the norm of the residual to less than 10 U6 k.~5 Almost Factorization Algorithm There are an infinite number of matrices A + B for which a parameter sequence exists to make iteration (2k) converge and for which the system (A + B) y = c is trivial to solve. More technically, there are an infinite number of almost factorizations such that A + B is positive definite and T possesses a sparse LU or LDL decomposition. The simplest is A + B = I for which the iteration reduces to x. ,, = x. - t. (Ax. - b), l+l l 11 ' which is known as Richardson's method. Richardson's method typifies the essential problem with selecting an almost factorization: slow convergence. The technique Stone employed was to select B in such a way that Bx = when the components of x are the values of a first degree polynomial at the node points of the finite difference grid system. Stone's technique may be modified to yield a symmetric almost factorization, but.Bx vanishes only when the components of x are constant. This symmetric version has been developed [3] for a finite difference system resulting from a rectangular and therefore regular grid network. It is the objective of this thesis to modify the Stone technique to yield an almost factorization for an irregular finite element system, A, with the following characteristic properties : (i) Each row of the upper triangular half of A contains at most six nonzero elements, divided into two sets FA and SA. (ii) Each set has at most three consecutive elements. (iii) Let the smallest column index of the second set of elements in row i be SCI.. If i < i, then SCI. < SCI., l ' i-0 These properties of A are shown in Figure k. +7 For a positive definite matrix, there are three common triangular decompositions essentially equivalent to Gaussian elimination, the LU, the Cholesky GG , and the U DU decompositions. Factor U is unit upper triangular, that is, the diagonal elements are l's. These decompositions T are unique, from which it follows that L = U D. Since only G must be saved, the Cholesky decomposition apparently takes less storage, but it involves extracting square roots. Also, it is a simple obversation that T no more storage is required for the U DU decomposition since the additional elements to be stored are the elements of D and these may be saved in place of the diagonal of U which consist of l's. Although the LU decomposition yields nonsymmetric factors, it does not require more storage than the T LDL form because only U and the diagonal elements of L, which are the diagonal elements of D, need be saved. The LU decomposition will be used below with this strategy, to save U and the diagonal elements of L. Thus, the LU decomposition is performed, but the result is saved in the form of T the LDL decomposition. The nonzero elements of A are shown below. A = A ll A 12 A 12 A 22 A 23 23 33 ^ A A In A 2n n-1 n-1 In 2n A nn A A 2 n+1 A 2 n+2 A n n+1 A 3 n+2 2 n+1 A -. A _ . A . n n+1 n+1 n+1 n+1 n+2 A 2 n+2 kQ The role of matrix B is to make the factors of A + B sparse. The LU factors of A + B are to have the form shown below. These factors when multiplied do not yield A, but A + B, where B = L U - A. The nonzero elements of B are also shown. L = h: L 21 L 22 L 32 L 33 nl n+1 2 n+2 2 U = u. 12 tL U, 3+ U. In U 2 n+1 U 2 n+2 U 3 n+2 B = B -B 2n ;h 3 n+1 " B 3 n+1 2n " B 2n B 3 n+1 " B 3 n+1 " B 2n " B 2n 2B 2n " B 3 n+1 ~ B 3 n+1 2B 3 n+1 k9 The elements of L and U are computed from a set of algebraic relations obtained by setting LU equal to A + B. The details follow. From (L 11' L 11 U 12' °>'"' °' L ll U ln' °> ' ' '> 0) = (A 11' \2' °'"-> > A ±n>°> ' ' '> 0) > we obtain L ll ~ A ll' U 12 = V 1 !!' U- = A_ /L._. In In' 11 Also, L 21 = L ll ^ ( = V' L = L U_ (= A ). nl 11 In In The second row of the matrix L U is (L 21' L 21 U 12 + V L 22 U 23' ' -'°' L 21 U ln' L 22 U 2 n + l' L 22 U 2 n + 2' °' -»°>' L 21 is already equal to A^. From L^ U^ + L 22 = A g2 and L 22 U^ = A , we obtain L 22 = A 22 ' L 21 U 12> u = A /L . 23 23 22 L^., U. is known, and the element B^ is initialized from L^ n U. = A_^ + B^ , 21 In ' 2n 21 In 2n 2n' B 2n " L 21 U ln " A 2n 50 The second row sum of B is zero, if B 2 n+1 = ~ B 2n' From L ori U ^ = A + B_ _ and L U = A , we obtain 22 2 n+1 2 n+1 2 n+1 22 2 n+2 2 n+2' U 2 n+1 = (A 2 n+1 + B 2 n+l ) / L 22' U 2 n+2 = A 2 n+2 /L 22* If B is symmetric and each row sum is zero, B n2 = B 2n> B n+1 2 - B 2 n+1 ( ~ " B 2n } n n+1 2n' 3 n+l n " B n n+1 ^ " B 2n^ B , _ = 2B^ . n+1 n+1 2n Without any further computation, we know L 32 = L 22 U 2 5 (= V' L n+1 2 = L 22 U 2 n+1 (= A 2 n+1 + B 2 n+1 = A 2 n+1 + Vl 2 )f L n+2 2 = L 22 U 2 n+2 ^ = A 2 n+2^ and also L _ IL. = L__ IL U no (= L__ U n = A. + B c ). nl 12 11 In 12 21 In 2n 2n 51 The third row of L U is (0,L 32 , L 32 U 23 + L 33 , L 33 U 3V Q, ..., 0, L 32 U 2 ^ L 32 U 2 n+2 + L^ n+g , , . . ., 0) The value of L,_ is A__. From L__ U_., + L,_ = A„ and L U,. = A ,, we obtain 32 23 32 23 33 33 33 34 3^ L 33 = A 33 " L 32 V % = A 3i + //L 33* The element B_, _ satisfies 3 n+1 B 3 n+1 = L 32 U 2 n+1" To make the third row sum of B equal to zero, we have B 3 n+2 ~ " B 3 n+1" From L 32 U 2 n+2 + L 33 Q 3 n+2 = A 5 n+2 + B 5 n+2' U 3 n+2 = (A 3 n+2 + B 3 n+2 " L 32 U 2 n+2 ^33 * Elements B , B ,. B . B . and B _ are determined n+1 3 n+2 3 n+1 n+2' n+2 n+1 n+2 n+2 by symmetry and row sums of B. In practice, only the elements of L and U are saved. The elements of B need not be explicitly computed. The remaining steps of the algorithm are omitted from this informal description. Later the algorithm will be given in a high-level flowchart. The nonzero elements of U and B are shown in Figures 6 and 7> respectively. 52 ex x xx xx XX X XX XX XX X XX XX XX XX XX X XX XX XX X XX XX X X XX X XX XX XX X XX XX XX X XX XX XX XX XX X XX XX XX X XX XX X X XX XXX X XX X XXX XX XX X XXX X XXX X XX X XXX XX XX X XX X X X XX X XX XX XX X XX XX X X XX X XX XX XX X XX XX X X XX XX XX XX X Figure 6 53 XX XXX XXX XXX XXX XXX X XXX XX XXX XX XXX XXX XXX XXX XXX XXX XX XXX XX XXX XX XXX XX XX XX XXX XX XXX X XX XXX XXX XXX XXX XXX X XXX XX XXX XX XXX XX XXX XX XXX XX XX XXX XXX XXX XXX XXX XX XXX XX XXX XX XXX XX XX XX XXX XX XXX XX XX xxxx XXX xxxxx XXX xxxxx XXX XX XX XX XXX XX XXX XX XX xxxxx XXX xxxxx XXX xxxx XX XX XXX XX XX X X XXX XXX X XX XXX XXX X XXX XX XXX XXX XX XX XXX XXX XXX XX XX XXX XXX XXX XX XX XX XX XX XXX XXX XXX XX XX XX XXX XXX XX XX XXX XX Figure 7 54 In the informal description given above of the algorithm, note: (i) Once an element of B is initialized, six additional elements of B are computed in such a way as to make B symmetric and satisfy the condition that the sum of elements in each row of B is zero. (ii) At the point of computing L. . for i > 2, we already know the values of L for k < i and U, . and U, . for k < i and i < j; therefore, we can compute, except for the last term, the product terms in the elements (L U).. and (L U). . for j > i. 11 ij Instead of constructing a new matrix B, we will modify the elements of A such that it becomes A + B. It will be convenient not to distinguish between elements A. . and (A + B). .: the notation AB. . will be used for either. If ij 1.3 ij at least one nonzero product term of an element is computable in (ii), the element will be called partially known (PK), and the sum of the nonzero product terms is called partial value of the element (FV. .). We assume that, at the moment of computing the element L. ., the PK elements of the i-th row in the upper triangular part of L U have the following three properties: (i) There are at most four PK elements divided into two sets, FPK and SPK. (ii) Each set has at most two consecutive elements. (iii) Let the largest column index of FPK (SPK) elements be LCI n (LCI , ), the largest column index of FA (SA) fpk spk * elements in the corresponding row be LCI (LCI ). ia sa Then, LCI^. < LCT , LCI . < LCI . fpk — fa' spk — sa 55 Let the smallest column index of SPK elements be SCI , , spk the smallest column index of SA elements in the corresponding row be SCI . Then, sa ' SCI , = SCI or SCI , = SCI - 1. spk sa spk sa A high-level flowchart of the almost factorization algorithm is given in Figure 8. In the LU factorization, we need only the upper triangular part of A + B; as the computation progresses, the used part of A + B may be destroyed. The elements in L. are required only for computing elements in U. ; except for the diagonal elements of L, the used elements of L also may be destroyed; a small (bandwidth X five) two-dimensional array is enough to save the destructible elements of L in a compact form until they are used. The diagonal elements of U are all unity; only the nonzero and non-diagonal elements required to be saved. Matrix A stored in an array S by the natural compact storage scheme. The almost factorization algorithm is such that S is .overwritten by the nonzero and non-diagonal elements of U as they are generated, and diagonal elements of L are stored in one -dimensional array D. The new array of the nonzero and non-diagonal elements in U is called PU, as a control information for the array PU, two one -dimensional arrays IC and HE are also generated: column indices of the elements in PU are stored in the array IC, and the numbers of saved elements in each row of U are stored in the array NE. For example, for the matrix U in Figure 6, we have PU 1= U 12> ro 2 = V pu 3 . u 23 , w k . u 2 n+1 , pu 5 . u 2 n+2 , ro 6 = V PU 7 = U 3n + 2> 56 FACTLU i-*- X i^-i+1 Compute PV. . No ^ J> Yes \< L.-* AB. . - PV. . 11 li 11 L\.-—AB. . - PV. . Ji iJ iJ U. .+— L../L. . H ij ji' ii Yes SWITCH -*- 1 Modify AB... , AB.. , AB, , lk jk kk No L. .«— AB.. - PV.. ki lk lk lk ki' ii Yes B. .-«— PV. .- AB. . Modify AB. k , AB , AB kk »U- Modify AB , AB Rk L .<_ -B. .-PV.. ki ij lk U.,-*— L. ./L. . lk ki' ii L..<— AB. .- PV. . Ji ij ij U. .*—L../L. . ij 01 ii u i k AB.. - PV.. i lk lk L. ./L.. ki ii Yes SWITCH *- RETURN \ Figure 8 57 IC 1 = 2, IC 2 = n; m ± = 2, IC 3 = 3, IC^ = n+1, IC = n+2; HE 2 = 3, ICg = 4, IC = n+2; WE = 2, The details of the implementation is displayed in Flowchart 3 of the Appendix. The storage scheme used for the elements of PU is not the natural compact storage described in Chapter 3« Also, note that the elements of the diagonal of L, stored in array D, may be stored in array PU since the elements of U are l's and need not be stored. A separate array has been used simply for programming convenience. The adaptive Chebyshev procedure of Section 2 requires that A + B be positive definite. A proof of positive definiteness has been established for a regular finite element system, which it is expected extends to the irregular system discussed here. These results will be presented elsewhere. The emphasis here is on the potential practical benefit and implementation of iterative methods. Note that the solution of T L D L x = y, is required in {2k, b), which is equivalent to the systems Lv = y, Du = v, L x = u. 58 An algorithm for solving these equivalent systems is given in Flowchart k- of the Appendix. For which no searching operation is used in the algorithm. k.k Computational Results As a model problem for testing the developed algorithms and storage schemes, the Dirichlet problem (l) and (2) in a rectangular region fi is selected. Operator L in (2, a) is defined by LU " -5x" ^lo^ ~^ {a 2p- (2 > a) The domain ft is partitioned into graded triangles and the node points are numbered according to the pattern and ordering scheme in Figure 5« The structure of A is shown in Figure km For the problem with a _ = a = 1, the following five systems are solved: (1) 570 unknowns; (2) 1600 unknowns; (3) 5130 unknowns; (k) 8732 unknowns; (5) 10660 unknowns. The right hand side vector b is chosen as the null vector; the initial guess x is the vector with components equal to unity. The test results are displayed by a graph of the number of iterations vs. the norm of the residual vectors in Figure 9* 1^- e initial estimates of the eigenvalue bounds used for the test are as follows: (1) [0.8, 1.5] for system (l) (2) [0.8, 1.51 for system (2) (3) [O.k, 7.5] for system (3) 59 m OA o oo o CO o o LT\ o LT\ O ro O OO LT\ CM O CM UA H w o •H -P 05 !h CD •H o H OJ H O O O O ■+- H i O H -t- OJ i o H ro i O H O H i I o H vc o H i o H CO i o H 1— I o H o H O H 1- H H i O CM O H 1— ro rH I o s-p2npTS8H jo uijo^j 6o (h) [0.1)-, 9.5] for system (h) . (5) [O.k, 9.5] for system (5). Since the structure of A is irregular, other efficient iterative methods such as ADI are not applicable. There are no known computational experiments -with which our results can be compared directly. Of course, this is the primary reason for the development of our method. For the linear system with 900 unknowns obtained from the same Dirichlet problem by the five point difference technique, Diamond [10| reduced the residual -k- -8 norm to 10 10 after 35 iterations using his adaptive procedure with Stone's symmetric factorization for several choices of initially estimated eigenvalue bounds. For the finite element system with 570 unknowns and 1600 unknowns, the residual norms are reduced approximately to 10 and 10 , respectively, after j)5 iterations in our tests. Although the matrix structure is irregular, the results are comparable with that obtained by Diamond. For the system with 10660 unknowns, the residual norm is reduced -5.7 to 10 after 55 iterations. 61 APPENDIX Details of the computer implementation, some of which are not precisely stated in the main text, are displayed in the following four flowcharts of this Appendix. We use the symbols discussed by Gear [131 • Since most of the symbols are self-explanatory we do not explain the meaning of them except to indicate that N stands for DO LOOP and ^> —A for SUBROUTINE CALL. Brief description of each flowchart follows: Flowchart 1 : This shows the algorithm for computing Y = A * X, where A is a matrix stored in an array OS by the natural compact storage scheme and N is the order of A. Flowchart 2 : Diamond's adaptive -Chebyshev- factorization procedure is presented with minor modification. Among eight subroutines called in the procedure, three (PRODMT, FACTLU and SOLVLU) are in Flowchart 1, 3 and k. Contents of the other five follow: For the finite element system A * X = H, GEOSLH (N, NOS, OS, L, H) generates A and H. A is stored in OS; control information for OS is created in L; NOS is the number of elements of A in OS. INITIA (N,X) initializes X for the iteration. RESIDL (N,DW,H,RN) computes a vector DW = H - RN and also sets RN = DW. NEWT (N,X,DW, TAU) computes a vector X = X + TAU * DW. INPRO (N,DW,RN,UN) computes an innerproduct UN = (DW, RN). 62 Flowchart j : Details of the almost factorization algorithm are displayed. T S is a copy of OS generated by GEOSLH. In the factorization L * D * L = A + B, elements of D are stored in D; nonzero and non-diagonal elements of L overwrite S, and the part of the array S containing the elements of L is called PU; IC and NE are control information for PU. The variable I DIM should be greater than or equal to (the bandwidth of A) +1. The variable MDIM should be greater than or equal to the maximum number of partial values in every row. IFRB and JFRU are one -dimensional scratch arrays of size greater than or equal to N. NECR is a one-dimensional scratch array of size greater than or equal to IDIM. ICE and CRT are two-dimensional scratch arrays of size greater than or equal to IDIM X 5. ICL and PV are one -dimensional arrays of size greater than or equal to MDIM. T Flowchart k : The algorithm computes the solution of the system L * D * L * X = Y, where L is stored in PU by the storage scheme discussed in Section h.~5> NPU is the number of elements of L stored in PU; IC and- KE are the control information for PU. 63 RETURN > PRODMT N,OS,L,Y,X !«*— 1 N Y(I)— LE*-0 END> I— 1 1 N £ LB-e-LE + 1 LE-^-LE + L(LB) 1 1y(I)^-Y(I) + OS(LB) * X(I) LBP<-LB + 1 J-«— LBP 1 LE i y(D — -Y(I) + OS (J) * X(K) A Y(K) — ■Y(K) + OS (J) -K- X(I) END> - END> Folwchart 1 START 6k * . GEOSLH \ ,NOS,OS,L,H/ I*-l 1 NOS s(i)-«-os(i)H FACTLU S,L,N,D,PU,IC,NE,NPU,IFRB, JFRU,NECR, ICR,CRT, ICL, PV / IN,ITMA |bnder,ei, ITMAX E2 INP1<— IN+1 I TER •«- UMIN-*-El U-*-0 PRODMT N,OS,L,RN,DWM SOLVLU N,NPU,D,PU,IC,NE,DW,DW TAU-6- -TAU NEWT N,DWM,RN,TAU INITIA N,X Y0*-E1P2/E2M1 TMM<— 1 TM<--YO TMP-«-2*Y0**2-l Q1N -*- ^/E2M1 I*-l 1 N i I-*— E1P2 JK— K + L(K) IFRB(I)— K NECR(l) — 1 A K— IFRB(l) NE(l) — J *- MOD (I, IDIM) #~^ J<— 1 1 K WM— N - 1 M-*-NPU K— -1 LE^-0 I«-l 1* x(D— Y(fT LB<-LE + 1 LE-«— IE + NE(l) J^LB 1* LE JR^-IC(j) 1 Y(JR)<— Y(JR) - RJ(J) * X(l) I END> END> - X(N)-«— Y(N) I 1 1 N Y(I)^-X(I)/D(I) I |END> X(N)«— Y(N) i 1 1 NM *|M<-M - K IX-*- N - I x(ix)-e— y(ix) K«-NE(IX) JX-e— IC(M - J) J. X(lX)^-X(lX) - X(JX) * RJ(M - J) 1 END> END> Flowchart k 70 LIST OF REFERENCES 1. G. Birkhoff, The Numerical Solution of Elliptic Equations, Regional Conference Series in Applied Mathematics, No. 1, SIM, Philadelphia, 1971. 2. G. Birkhoff, M. H. Schultz and R. S. Varga, "Piecewise Hermite Interpolation in One and Two Variables with Applications to Partial Differential Equations," Num. Math ., 11 (1968), pp. 232-256. 3. A. Bracha-Barak and P. E. Saylor, "A Symmetric Factorization Procedure for the Solution of Elliptic Boundary Value Problems, " SIAM J. Num . Anal . 12 (1973), PP. 190-206. k. H. H. Bramble and M. Zlamal, "Triangular Elements in the Finite Element Method," Math. Comp ., 2k (1970), pp. 809-820. 5. B. L. Buzbee, G. H. Golub and C. W. Nielson, "On Direct Methods for Solving Poisson's Equations," SIAM J. Num . Anal ., 7 (1970), pp. 627-656. 6. R. E. Carlson and C. A. Hall, "Ritz Approximations to Two-Dimensional Boundary Value Problems," Num . Math ., 18 (1971), PP> 171-181. 7. J. C. Cavendish, H. S. Price and R. S. Varga, "Galerkin Methods for the Numerical Solution of Boundary Value Problems," Soc . Pet . Eng . J-, 9 (1969), pp. 20^-220. 8. L. Collatz, The Numerical Treatment of Differential Equations, 3rd ed. , Grundlehren d. Math. Wiss., 60, Springer -Verlag, Berlin, i960. 9. R. Courant, "Variational Methods for the Solution of Problems of Equilibrium and Vibrations, " Bull . Amer . Math . Soc , k-9 (19^3), pp. 1-23. 10. M. A. Diamond, "An Economical Algorithm for the Solution of Finite Difference Equations," Ph. D. Dissertation, Department of Computer Science, University of Illinois, UIUCDCS-R-71-^92, December, 1971. 11. T. Dupont, R. P. Kendall and H. H. Rachford, Jr., "An Approximate Factorization Procedure for Solving Self -Adjoint Elliptic Difference Equations," SIAM J. Num . Anal ., 5 (1968), pp. 559-573. 12. C. A. Felippa and R. W. Clough, "The Finite Element Method in Solid Mechanics," SIAM-AMS Proc , 2, G. Birkhoff and R. S. Varga eds., AMS, Providence, 1970, pp. 210-252. 71 13. C. W. Gear, Introduction to Computer Science, Science Research Associates Inc., 1973. 1^4-. A. George, "Computer Implementation of the Finite Element Method, " Ph. D. Dissertation, Computer Science Department, Stanford University, STAN-CS-71-208, February, 1971. 15. , "Nested Dissection of a Regular Finite Element Mesh," SIAM J. Num . Anal ., 10 (1973), PP- 3^5-363- 16. A. S. Householder, The Theory of Matrices in Numerical Analysis, Blaisdell, New York, 196^. 17. A. Jennings, "A Compact Storage Scheme for the Solution of Symmetric Linear Simultaneous Equations," Computer J., 9 (19&6), pp. 28l-285« 18. R. Levy, "Resequencing of the Structural Stiffness Matrix to Improve Computational Efficiency," J PL Quarterly Technical Review, 1 (1971), pp. 61-70. 19. S. G. Mikhlin, The Problem of the Minimum of a Quadratic Functional, English Trans 1., Holden-Day, San Francisco, 1965* 20. , Variational Methods in Mathematical Physics, English Transl., MacMillan, New York, 196^. 21. C. Miranda, Partial Differential Equations of Elliptic Type, 2nd rev. ed., Erg. d. Math. u. Grenzgebiete, 2, Springer -Verlag, Berlin, 1970* 22. H. S. Price and K. H. Coats, "Direct Methods in Reservoir Simulation," 3-rd SPE Sym. Num. Sim. Reser. Per. AIME, Houston, Texas, Jan., 1973- 23. D. J. Rose, "A Graph-theoretic Study of the Numerical Solution of Sparse Positive Definite Systems of Linear Equations, " Graph Theory and Computing, R. C. Read ed., Academic Press, New York, 1972, pp. 183-217. 2k. H. L. Stone, "Iterative Solution of Implicit Approximations of Multi- dimensional Partial Differential Equations," SIAM J. Num . Anal., 5 (1968), pp. 530-558. 25. G. Strang, "Approximation in the Finite Element Method," Num. Math . 19 (1972), pp. 81-98. 26. J. K. Synge, The Hypercircle in Mathematical Physics, University Press, Cambridge, 1957- 27. W. F. Tinney and J. W. Walker, "Direct Solutions of Sparse Network Equations by Optimally Ordered Triangular Factorization, " Proc . IEEE, 55 (1967), PP. 1801-1809. 28. R. S. Varga, Matrix Iterative Analysis, Prentice -Hall, Englewood Cliffs, N. J., 1962. 72 29» D. M. Young, Iterative Solution of Large Linear Systems, Academic Press, New York, 1971- 30. 0. C. Zienkiewicz, The Finite Element Method in Engineering Science, McGraw-Hill, London, 1971- 31. and Y. K. Cheung, "Finite Elements in the Solution of Field Problems," The Engineer, 220 (1965), pp. 507-510. 32. M. Zlamal, "The Finite Element Method in Domains with Carved Boundaries," Int . J. Num. Math . Eng ., 5 (1973), pp. 367-373- 33. , "On the Finite Element Method," Num . Math., 12 (1968), pp. 39^-^09. 73 VITA Young Jip Kim was born in Korea on November 27, 193 5 • After receiving a M.S. in Mathematics from Yonsei University, Seoul,, Korea, in i960, he taught mathematics at Taejon Presbyterian College, Taejon, Korea, until 1965* He received another M.S. in Mathematics from the University of Illinois in February 1967* Prior to continuing his studies at the Department of Computer Science, University of Illinois, in 1969, he held the position of Head of the Department of Mathematics, Taejon Presbyterian College. He is a past member of AMS, MAA and SIAM; he is currently a student member of ACM and an associate member of Sigma Xi. BIBLIOGRAPHIC DATA SHEET 1. Report No. UIUCDCS-R-73-600 2. 3. Recipient's Accession No. 4. Title and SuDtitTe AN EFFICIENT ITERATIVE PROCEDURE FOR USE WITH THE FINITE ELEMENT METHOD 5- Report Date August, 1973 6. 7. Author(s) Young Jip Kim 8. Performing Organization Rept. No. 9. Performing Organization Name and Address Department of Computer Science University of Illinois at Champaign-Urbana Urbana, Illinois 6l801 10. Project/Task/Work Unit No. 11. Contract /Grant No. OWRR PROJ. #U058-ILL NSF GJ-36393 12. Sponsoring Organization Name and Address Water Resources Center National Science Foundation University of Illinois 1800 G. Street, N.W. at Champaign-Urbana Washington, D. C. 20550 Urbana, Illinois 618OI 13. Type of Report & Period Covered Thesis (Ph.D.) 14. 15. Supplementary Notes 16. Abstracts The symmetric almost factorization of Stone for finite difference systems is extended to irregular finite element systems. Combined with an adaptive Chebyshev iterative procedure of Diamond, the factorization algorithm is applied to linear systems for the numerical solution of two-dimensional Dirichlet problems by the finite element method. For a computer implementation, a compact storage scheme is introduced for sparse symmetric matrices; programming techniques are developed such that no searching operation for elements of the matrices is required in computations of the iterative method. 17. Key Words and Document Analysis. 17o. Descriptors Almost Factorization Algorithm, Compact Storage Scheme, Dirichlet Problem, Finite Element Method, Iterative Method, Positive Definite Matrix, Sparse Matrix. 17b. Identifiers/Open-Ended Terms 17c. COSATI Field/Group 18. Availability Statement 19. Security Class (This Report) UNCLASSIFIED 21. No. of Pages 75 20. Security Class (This Page UNCLASSIFIED 22. Price ORM NTIS-35 (10-70) USCOMM-DC 40329-P7 1 1 4 1973 !-' \<& MBHH hmh wiaS