UIUCDCS-R-75-772
COO-2383-0027

On Newton-Iterative Methods for the Solution
of Systems of Nonlinear Equations

by

Andrew H. Sherman*

December 1975

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the Office of Naval Research under grant N-0014-67-A-0097-0016 and by the U. S. Energy Research and Development Administration under grant US AEC AT(11-1) 2383.

1. Introduction

Let R be a subset of R^N and let F: R -> R^N be a nonlinear function. We wish to consider computing a solution x* of the system of nonlinear equations

    F(x) = 0,                                                    (1.1)

where the Jacobian matrix F'(x) = (∂F/∂x)(x) is sparse, symmetric, and positive definite. Although our results apply in general, nonlinear systems arising in the numerical solution of partial differential equations are of particular interest.

A very important method for the solution of (1.1) is Newton's method (cf. Ortega and Rheinboldt [11]). For a given initial guess x^(0), we generate a sequence of iterates x^(1), x^(2), x^(3), ... by

    x^(k+1) = x^(k) - δ^(k),                                     (1.2)

where δ^(k) is the solution to the sparse linear system

    F'(x^(k)) δ^(k) = F(x^(k)).                                  (1.3)

Under suitable assumptions on x^(0) and F, the sequence is well defined and converges quadratically to x*, i.e.,
there exists c < 1 such that ||x^(k+1) - x*|| <= c^(2^k) ||x^(k) - x*||.†,††

† Throughout this paper we use ||·|| to denote the Euclidean norm.
†† In the notation of Ortega and Rheinboldt [11], this condition defines R-quadratic convergence. This is slightly weaker than the more standard definition of Q-quadratic convergence, i.e., there exists c' < 1 such that ||x^(k+1) - x*|| <= c' ||x^(k) - x*||^2.

A straightforward implementation of Newton's method could use sparse symmetric Gaussian elimination (cf. Eisenstat, Schultz, and Sherman [8], Sherman [13]) to solve the linear systems which arise. Unfortunately, this would be inefficient computationally because at each step it would require the computation of F(x^(k)) and F'(x^(k)) and the exact solution of (1.3). Also, a large amount of storage would be required for the factorization of F'(x^(k)).

A number of modifications have been proposed to reduce some of the costs of Newton's method, notably the simplified Newton's or chord method and a variety of Newton-iterative methods (cf. Ortega and Rheinboldt [11]). For certain problems, these methods may require less storage or work than Newton's method. However, computational difficulties are typical in the solution of nonlinear systems, and each of these methods has its disadvantages. The choice of method for a specific nonlinear system (1.1) depends quite heavily on the properties of F and on the storage and time constraints imposed by the computational environment.

The simplified Newton's method replaces F'(x^(k)) with F'(x^(0)) in (1.3). This avoids the computation of F'(x^(k)) at each step and reduces the cost of solving (1.3) for k >= 1, since, if F'(x^(0)) is factored into the product U^T D U at the first step, each succeeding step only requires forward- and back-solving three triangular systems. However, the chord method converges only linearly† and requires the same amount of storage as Newton's method.

The Newton-iterative methods use a linear iterative method (such as SOR, cf.
Ortega and Rheinboldt [11], p. 215) to approximate the solution of (1.3) instead of solving it exactly. The sequence of iterates generated by such methods depends upon the particular iterative method chosen and the criteria used to stop the inner iteration. The advantages of Newton-iterative methods are that they require only a small amount of storage and that, when x^(k) is near x*, they can take advantage of the fact that zero is a good initial approximation to δ^(k). The problem with Newton-iterative methods is that their rate of convergence is essentially determined by the rate of convergence of the inner (linear) iterative method applied to the linearized problem (1.3). Thus they encounter all of the usual difficulties with linear iterative methods relating to parameter estimation, starting guesses, and stopping criteria.

† The iterates x^(k) converge linearly to x* if there exists c < 1 such that ||x^(k+1) - x*|| <= c ||x^(k) - x*||.

In this paper we present the Newton-Richardson methods (cf. Eisenstat, Schultz, and Sherman [7], Sherman [13]), a class of Newton-iterative methods which use a Richardson-D'Jakonov iteration (cf. Richardson [12], D'Jakonov [4]) to approximate the solution of (1.3) at each step. The Newton-Richardson methods require the same amount of storage as Newton's method or the chord method, but if 2^k steps of the inner iteration are taken at the k-th step, they will converge quadratically with less work per step (on average) than with Newton's method. And as compared to most other Newton-iterative methods, the Newton-Richardson methods have the advantage that their rate of convergence is essentially independent of N.

In the next section we describe a model semilinear partial differential equation which will be used to motivate the Newton-Richardson methods. In Section 3 we discuss the implementation of Newton's method and show how to derive Newton-Richardson methods for the model problem.
In Section 4, we examine the general theory of Newton-iterative methods and state several local convergence results. In Section 5 we consider the ramifications of the theory for the model problem, and finally, in the appendix we prove the results stated in Section 4.

2. A Model Problem

In this section we describe a model problem arising in the numerical solution of semilinear partial differential equations and show how to apply Newton's method to it. In the next section, we will motivate the Newton-Richardson methods by analyzing the costs of Newton's method and certain of its modifications for this model problem.

Let D be the unit square (0,1) x (0,1) and let ∂D denote the boundary of D. The model problem we wish to solve is the nonlinear Poisson equation

    -Δv = g(v)  in D,                                            (2.1)
     v = 0      on ∂D,

where g satisfies g_v(v) <= λ̄ < Λ (where Λ is the fundamental eigenvalue of -Δ on D), so that v(x,y) exists and is unique (cf. Ortega and Rheinboldt [11], pp. 110, 116).

To obtain an approximate solution to (2.1), the continuous problem must be reduced to a discrete form. Several techniques are available to do this, and we consider the use of five-point finite difference approximations. We superimpose on D a uniform square n x n mesh D_h, with boundary ∂D_h, and replace the differential operator -Δ in (2.1) with the standard five-point difference approximation on D_h. (Here h = 1/(n+1) is the vertical or horizontal distance between adjacent mesh points.) Letting V_ij be the approximation to v(ih, jh), we obtain the system of equations:

    -V_{i-1,j} - V_{i,j-1} - V_{i,j+1} - V_{i+1,j} + 4V_{ij} = h^2 g(V_{ij})   for (ih, jh) ∈ D_h,      (2.2)
     V_{ij} = 0                                                                for (ih, jh) ∈ ∂D_h.
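As a concrete illustration, the discrete system (2.2) can be assembled directly. The sketch below uses plain nested lists on an n x n interior mesh with zero boundary values; the nonlinearity g(v) = sin(v) is a hypothetical choice (it satisfies g'(v) <= 1 but is not taken from the paper).

```python
import math

# A sketch of the five-point discretization (2.2) on an n x n interior
# mesh.  g(v) = sin(v) is a made-up placeholder nonlinearity.

n = 3
h = 1.0 / (n + 1)

def g(v):
    return math.sin(v)

def apply_A(V):
    # Apply the five-point difference operator; off-mesh neighbors are
    # boundary points and contribute 0.
    def at(i, j):
        return V[i][j] if 0 <= i < n and 0 <= j < n else 0.0
    return [[4.0 * V[i][j] - at(i - 1, j) - at(i + 1, j)
             - at(i, j - 1) - at(i, j + 1)
             for j in range(n)] for i in range(n)]

def residual(V):
    # The nonlinear residual A V - G(V), with G(V)_{ij} = h^2 g(V_{ij}).
    AV = apply_A(V)
    return [[AV[i][j] - h * h * g(V[i][j]) for j in range(n)]
            for i in range(n)]

ones = [[1.0] * n for _ in range(n)]
A_ones = apply_A(ones)
```

Applying A to the all-ones grid gives 0 at mesh points whose four neighbors are all interior, and a positive value near the boundary, as the stencil predicts.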
Letting N = n^2 and ordering the mesh points with the natural, or row-by-row, ordering, we may write these equations more concisely as an N x N system of nonlinear equations

    AV = G(V),                                                   (2.3)

in which V = {V_ij: (ih, jh) ∈ D_h}, G(V) = {h^2 g(V_ij): (ih, jh) ∈ D_h}, and A is the block tridiagonal matrix

        [  B  -I          ]
    A = [ -I   B  -I      ]
        [      ·   ·   ·  ]
        [         -I   B  ]

where I is the n x n identity matrix and B is the n x n tridiagonal matrix

        [  4  -1          ]
    B = [ -1   4  -1      ]
        [      ·   ·   ·  ]
        [         -1   4  ]

Under the conditions imposed on (2.1), this system of equations has a unique solution V which is a close approximation to v(ih, jh) (cf. Ortega and Rheinboldt [11], p. 112).

Newton's method may be applied in a straightforward way to solve (2.3) for V. Given an initial guess V^(0), we generate a sequence of successive approximations V^(1), V^(2), V^(3), ... to V by

    V^(k+1) = V^(k) - δ^(k),

where δ^(k) is the solution of the system of linear equations

    J(V^(k)) δ^(k) = AV^(k) - G(V^(k))                           (2.4)

and J(V^(k)) = A - (∂G/∂V)(V^(k)). Under the assumptions made above, J(V^(k)) is an N x N sparse, symmetric, positive definite matrix having the same zero structure as A (i.e., (J(V^(k)))_ij ≠ 0 if and only if A_ij ≠ 0). If V^(0) is sufficiently close to V, then Newton's method converges quadratically to the solution of (2.3) independent of h (cf. Greenspan and Parter [10]), and only O(log log N) Newton iterations are required to reduce the initial error ||V^(0) - V|| by a factor of h^2 ≈ 1/N, the discretization error.†

† Throughout this paper we use log(x) to denote log_2(x).

3. Newton-Richardson Methods for the Model Problem

As we saw in the last section, each step of Newton's method for the model problem requires the solution of a sparse symmetric linear system. Thus Newton's method really amounts to the solution of a sequence of related linear systems, and its costs are directly related to the manner in which these systems are solved. In this section we consider a number of approaches to solving the systems (2.4), and we introduce Newton-Richardson methods for the model problem (cf.
Eisenstat, Schultz, and Sherman [7], Sherman [13]).

In the standard implementation of Newton's method for the model problem, the linear systems which arise can be solved with sparse symmetric Gaussian elimination (cf. Eisenstat, Schultz, and Sherman [7, 8], Sherman [13]). Each solution requires O(N log N) storage locations for the factors of J(V^(k)), O(N^(3/2)) arithmetic operations to factor J(V^(k)), and O(N log N) arithmetic operations to solve the resulting triangular systems (cf. Sherman [13]). For V^(0) sufficiently close to V, Newton's method converges quadratically, and the initial error ||V^(0) - V|| can be reduced by a factor of h^2 ≈ 1/N in O(log log N) steps at a total cost of O(N log N) storage locations and O(N^(3/2) log log N) arithmetic operations.

One of the reasons that Newton's method requires so many arithmetic operations is that it does not exploit the fact that the iterates V^(k) are converging quadratically to V. The sequence of systems (2.4) is solved as if the systems were unrelated, when, in fact, their solutions rapidly approach zero as V^(k) converges to V. To exploit this situation, we could use an iterative method to approximate the solution of (2.4) at each step. Most Newton-iterative methods require only O(N) storage locations for the model problem, and some require fewer arithmetic operations than Newton's method. Unfortunately, however, for the model problem the rates of convergence of most Newton-iterative methods depend on the mesh spacing h (see Section 5) in such a way that they have higher asymptotic arithmetic costs than Newton's method. For example, even though Newton-SOR converges linearly, the rate of convergence depends on h, and the method requires O(N^(1/2) log N) iterations to reduce the error by a factor of h^2 (see Section 5). If the rate of convergence were independent of h, then only O(log N) iterations would be required.
We now present the Newton-Richardson methods for the model problem. These methods require more storage than most other Newton-iterative methods (O(N log N) locations), but they converge at rates which are independent of h. The methods are based on the principles embodied in the Strongly Implicit Procedures developed by Stone [14], Dupont [6], Diamond [3], and others. The basic idea is to use sparse Gaussian elimination to obtain δ^(0) at the first step and to compute δ^(k) at succeeding steps with a Richardson-D'Jakonov iteration (cf. Richardson [12], D'Jakonov [4]) involving the factorization of J(V^(0)).

In particular, suppose that at the (k+1)-st step (k >= 1) we wish to solve the system

    J(V^(k)) δ^(k) = b^(k).                                      (3.1)

Since we have already factored J(V^(0)) at the first step, the system J(V^(0)) w = d can be solved with O(N log N) arithmetic operations. We can then solve (3.1) by using a Richardson-D'Jakonov iteration to generate a sequence of iterates δ_0^(k), δ_1^(k), δ_2^(k), ... which converges to δ^(k). (The idea here is similar to that of iterative improvement, cf. Forsythe and Moler [9], pp. 49-54.)† For δ_0^(k) = 0 and a fixed real constant γ, the Richardson-D'Jakonov iterates satisfy

    δ_{i+1}^(k) = δ_i^(k) - γ e_i^(k),

where e_i^(k) is the solution to the linear system

    J(V^(0)) e_i^(k) = J(V^(k)) δ_i^(k) - b^(k).

Each step of the iteration requires O(N log N) arithmetic operations, and the iterates will converge linearly to δ^(k) if

    ρ = ||I - γ J(V^(0))^{-1} J(V^(k))|| < 1

(cf. Young [15], p. 77). To maximize the rate of convergence, we choose γ to minimize ρ. If all the eigenvalues of J(V^(0))^{-1} J(V^(k)) lie in the interval [α, β], this is accomplished by choosing

    γ = 2/(α + β),

yielding ρ = (β - α)/(α + β).

† It is also possible to use a scaled conjugate gradient iteration here (cf. Douglas and Dupont [5]), but the analysis is more complicated, and, in any case, the results are improved by at most a constant factor.
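A scalar sketch of this inner iteration may help fix the notation: with diagonal matrices, the solve against the factored J(V^(0)) is a division, and the interval [α, β] and the optimal γ = 2/(α + β) can be read off directly. All numerical values below are made up for illustration.

```python
# A toy Richardson-D'Jakonov iteration for (3.1), with diagonal
# matrices so the "factorization" solve is trivial.  All data invented.

B_diag = [2.0, 3.0, 4.0]          # stands in for the factored J(V^(0))
J_diag = [2.2, 2.7, 4.4]          # stands in for J(V^(k))
b = [1.0, -2.0, 0.5]              # right-hand side b^(k)

# Eigenvalues of B^{-1} J, the optimal parameter gamma = 2/(alpha+beta),
# and the contraction factor rho = (beta-alpha)/(alpha+beta) < 1.
eigs = [J_diag[i] / B_diag[i] for i in range(3)]
alpha, beta = min(eigs), max(eigs)
gamma = 2.0 / (alpha + beta)
rho = (beta - alpha) / (alpha + beta)

delta = [0.0, 0.0, 0.0]           # delta_0 = 0
for _ in range(60):
    # e_i solves B e_i = J delta_i - b; then delta_{i+1} = delta_i - gamma e_i
    e = [(J_diag[i] * delta[i] - b[i]) / B_diag[i] for i in range(3)]
    delta = [delta[i] - gamma * e[i] for i in range(3)]

exact = [b[i] / J_diag[i] for i in range(3)]
```

For these values rho is about 0.1, so each sweep reduces the error by roughly a factor of ten and the iterates rapidly approach the exact solution of J delta = b.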
For the model problem, ρ can be made independent of h by choosing V^(0) near enough to V (see Section 5), so the Richardson-D'Jakonov iteration will converge linearly, independent of the mesh size.†

† It has been shown by several authors (cf. Young [15]) that the rate of convergence is increased by a constant factor by using Tchebychev acceleration to choose a sequence of parameters {γ_k} instead of the single parameter γ.

Different Newton-Richardson methods are obtained by varying the number of Richardson-D'Jakonov iterates computed at each step of the outer iteration. Of particular interest are the case in which 2^k iterates are computed at the k-th step and the case in which just one iterate is computed at each step. In Section 5 we will show that if V^(0) is sufficiently close to V, then the first of these methods converges quadratically to V, while the second, which reduces to a relaxed chord method, converges linearly. However, in either case the solution of the model problem requires a total of O(N(log N)^2) arithmetic operations in addition to the cost of the initial factorization of J(V^(0)) (which we treat as preprocessing).

4. Convergence Results

In this section we present an analysis of the local convergence properties of Newton-iterative methods. We will show that it is possible to adjust the rate of convergence of such methods by varying the number of inner iterations taken at each step and that by taking 2^k inner iterations at the k-th step, quadratic convergence is obtained. Since the Newton-Richardson methods described in the last section are a particular instance of Newton-iterative methods, the convergence results for them follow immediately. The proofs of the results described here are contained in the appendix.

We begin by restating our problem in a more precise and formal way. If R is a bounded subset of R^N and F: R -> R^N is a nonlinear function, we consider the solution of

    F(x) = 0,                                                    (4.1)

where we assume that a solution x* exists.
We also assume that there is an r_0 > 0 such that if S_0 = {x: ||x - x*|| < r_0}, then

    (i)   F is differentiable in S_0;
    (ii)  F'(x) = (∂F/∂x)(x) is nonsingular at x*;
    (iii) there is an L < +∞ such that for x ∈ S_0,
          ||F'(x) - F'(x*)|| <= L ||x - x*||.

It can be shown that the discretized model problem of Section 2 satisfies (i) - (iii) (cf. Greenspan and Parter [10]), so that we may directly apply the theory developed here. In the next section we will consider the model problem as a practical illustration of the theory.

From assumptions (i) - (iii), it follows that there is a 0 < r_1 <= r_0 such that F' is continuous and nonsingular in S_1 = {x: ||x - x*|| < r_1} (cf. Ortega and Rheinboldt [11], p. 46). Thus Newton's method can be used to solve (4.1) for x*. For a given x^(0) ∈ S_1, we generate the sequence of iterates x^(1), x^(2), x^(3), ... by

    x^(k+1) = x^(k) - δ^(k),                                     (4.2)

where

    F'(x^(k)) δ^(k) = F(x^(k)).                                  (4.3)

It can be shown that for some 0 < r_2 <= r_1, Newton's method converges quadratically to x* for x^(0) ∈ S_2 = {x: ||x - x*|| < r_2} (cf. Ortega and Rheinboldt [11], p. 312).

Newton-Richardson methods are a specific instance of general Newton-iterative methods in which a linear iterative method is used to approximate the solution of (4.3) at each step. A common way of generating Newton-iterative methods involves splitting F'(x) by writing it as

    F'(x) = B(x) - C(x),                                         (4.4)

where

    (iv) B(x) is continuous at x*, and
    (v)  B(x*) is nonsingular.

Furthermore, if the splitting (4.4) is to lead to a computationally efficient method, it must be possible to quickly solve the system

    B(x) w = d.                                                  (4.5)

It follows from (iv) - (v) that there is a 0 < r_3 <= r_2 such that B is continuous and nonsingular in S_3 = {x: ||x - x*|| < r_3} (cf. Ortega and Rheinboldt [11], p. 46); and if we let

    H(x) = B(x)^{-1} C(x) = I - B(x)^{-1} F'(x),

it then follows that H is continuous in S_3.
The Newton-Richardson methods are obtained with the splitting (4.4) in which B and C are defined by

    B(x) = (1/γ) F'(x^(0)),    C(x) = B(x) - F'(x),

where γ is a real constant satisfying 0 < γ < 2 and x^(0) ∈ S_3. Clearly this splitting satisfies (iv) and (v), and once F'(x^(0)) has been factored at the first step, the system (4.5) can be solved quickly.

At each step of a Newton-iterative method, we use the linear iteration to generate a sequence of inner iterates δ_0^(k), δ_1^(k), δ_2^(k), ... in which δ_0^(k) = 0 and

    δ_{i+1}^(k) = δ_i^(k) - e_i^(k),

where

    B(x^(k)) e_i^(k) = F'(x^(k)) δ_i^(k) - F(x^(k)).

For any positive integer m, let

    A_m(x) = (I + H(x) + H(x)^2 + ... + H(x)^{m-1}) B(x)^{-1}
           = (I - H(x)^m)(I - H(x))^{-1} B(x)^{-1}
           = (I - H(x)^m) F'(x)^{-1}.

Then, letting B_k = B(x^(k)) and H_k = H(x^(k)), we have

    δ_{i+1}^(k) = δ_i^(k) - B_k^{-1} (F'(x^(k)) δ_i^(k) - F(x^(k)))
                = H_k δ_i^(k) + B_k^{-1} F(x^(k))
                = H_k^{i+1} δ_0^(k) + (I + H_k + ... + H_k^i) B_k^{-1} F(x^(k))
                = (I + H_k + ... + H_k^i) B_k^{-1} F(x^(k))
                = A_{i+1}(x^(k)) F(x^(k)).

At the k-th Newton-iterative step, we define

    δ^(k) = δ_{m_k}^(k)

for some positive integer m_k. Letting

    G_m(x) = x - A_m(x) F(x),

we have

    x^(k+1) = G_{m_k}(x^(k)).                                    (4.6)

If ||H(x*)|| < 1, then under assumptions (i) - (iii) it can be shown that for some 0 < r_4 <= r_3, the outer iterates {x^(k)} defined by (4.6) exist and converge to x* for x^(0) ∈ S_4 = {x: ||x - x*|| < r_4} (cf. Ortega and Rheinboldt [11], pp. 350-351).†

† It actually suffices that the spectral radius of H(x*) satisfy ρ(H(x*)) < 1, but in order to avoid problems in relating norms to the spectral radius, we give all our results in terms of the Euclidean norm. (If H(x*) is symmetric, then of course we have ||H(x*)|| = ρ(H(x*)).)

The rate at which the Newton-iterative iterates converge to x* depends mainly on ||H(x*)|| and the sequence {m_k}.
To obtain our main convergence result, we make two further assumptions on the splitting (4.4):

    (vi)  for some constant λ, 0 < λ < 1, ||H(x*)|| <= λ;
    (vii) there are an L_1 < +∞ and a 0 < r_5 <= r_4 such that for
          x ∈ S_5 = {x: ||x - x*|| < r_5}, ||B(x) - B(x*)|| <= L_1 ||x - x*||.

We now state our main convergence result, deferring its proof to the appendix.

Theorem 4.1: Let F satisfy (i) - (iii) and let F'(x) = B(x) - C(x) be a splitting which satisfies (iv) - (vii). Let m_0, m_1, m_2, ... be a sequence of positive integers, and define

    m̄ = max [{1, m_0} ∪ {(m_k - Σ_{i=0}^{k-1} m_i): k = 1, 2, 3, ...}].

Then, if m̄ < +∞, there are a 0 < r <= r_5 (depending on λ, H, and F) and a positive constant c (depending on λ and m̄) with λ < c < 1 such that for x^(0) ∈ S = {x: ||x - x*|| < r}, the sequence of iterates defined by (4.6) satisfies

    ||x^(k+1) - x*|| <= c^{m_k} ||x^(k) - x*||.

While the restriction on m̄ in Theorem 4.1 may seem contrived, it simply reflects the fact that Newton-iterative methods cannot be expected to exhibit super-quadratic convergence. Hence choosing m_k > 2^k is pointless. If we choose m_k = 1 or m_k = 2^k, then m̄ = 1 in the theorem, and we have the following corollaries.

Corollary 4.2: Let F satisfy (i) - (iii) and let F'(x) = B(x) - C(x) be a splitting which satisfies (iv) - (vii). Define the sequence {m_k: k = 0, 1, 2, ...} by m_k = 1. Then there are a 0 < r <= r_5 (depending on λ, H, and F) and a positive constant c (depending on λ) with λ < c < 1 such that for x^(0) ∈ S = {x: ||x - x*|| < r}, the sequence of iterates defined by (4.6) converges linearly, i.e.,

    ||x^(k+1) - x*|| <= c ||x^(k) - x*||.

Corollary 4.3: Let F satisfy (i) - (iii) and let F'(x) = B(x) - C(x) be a splitting which satisfies (iv) - (vii). Define the sequence {m_k: k = 0, 1, 2, ...} by m_k = 2^k.
Then there are a 0 < r <= r_5 (depending on λ, H, and F) and a positive constant c (depending on λ) with λ < c < 1 such that for x^(0) ∈ S = {x: ||x - x*|| < r}, the sequence of iterates defined by (4.6) converges quadratically, i.e.,

    ||x^(k+1) - x*|| <= c^(2^k) ||x^(k) - x*||.

5. Ramifications of the Theory for the Model Problem

Corollary 4.2 expresses the previously known result that a Newton-iterative method converges linearly for m_k = 1 (cf. Ortega and Rheinboldt [11], p. 350). The real import of Theorem 4.1 is the rather surprising result of Corollary 4.3, i.e., that we can obtain R-quadratic convergence with a Newton-iterative method by choosing m_k = 2^k. Since Newton's method itself is only quadratically convergent (cf. Ortega and Rheinboldt [11], p. 312), this is the fastest convergence that can be expected for any Newton-iterative method.

A problem with the result of Theorem 4.1 is that c depends on λ: specifically, λ < c < 1. To reduce the initial error ||x^(0) - x*|| by a factor of K < 1 requires ℓ outer iterations, where

    K ≈ Π_{k=0}^{ℓ-1} c^{m_k}.                                   (5.1)

Taking logarithms, we obtain

    log K ≈ (Σ_{k=0}^{ℓ-1} m_k)(log c),

so that ℓ satisfies

    Σ_{k=0}^{ℓ-1} m_k ≈ log K / log c <= log K / log λ.

As an illustration, we consider the model problem of Section 2. For Newton-SOR methods it can be shown that even with the optimal SOR parameter ω, log λ ≈ -2πh (cf. Young [15], p. 190). Reducing the initial error by a factor of the discretization error h^2 requires ℓ iterations, where ℓ satisfies

    Σ_{k=0}^{ℓ-1} m_k ≈ log h^2 / log λ ≈ (-log N)/(-2πh) = O(N^(1/2) log N).

Thus if m_k = 2^k, we have

    Σ_{k=0}^{ℓ-1} m_k = 2^ℓ - 1 = O(N^(1/2) log N),

so that ℓ = O(log N). And if m_k = 1, then we have

    Σ_{k=0}^{ℓ-1} m_k = ℓ = O(N^(1/2) log N).

In either case, though, since each inner SOR iteration requires O(N) operations (cf. Young [15]), a total of O(N^(3/2) log N) iteration time is required, as compared to the O(N^(3/2) log log N) time required by Newton's method.
For Newton-Richardson methods, however, the situation is quite different. We have

    ||H(x*)|| = ||B(x*)^{-1} C(x*)||
              = ||γ F'(x^(0))^{-1} [(1/γ) F'(x^(0)) - F'(x*)]||
              = ||I - γ F'(x^(0))^{-1} F'(x*)||.

Since F'(x) is continuous in S_0, we may make ||H(x*)|| arbitrarily close to |1 - γ| by choosing ||x^(0) - x*|| small enough. In particular, we can always make

    ||H(x*)|| <= (1 + |1 - γ|)/2 < 1.

For the model problem, then, the constants λ and c of Theorem 4.1 are independent of h for Newton-Richardson methods. Thus to reduce the initial error by a factor of h^2 requires ℓ outer iterations, where

    Σ_{k=0}^{ℓ-1} m_k ≈ log h^2 / log c = -log N / log c

and log c is a constant independent of N. Hence if m_k = 2^k, we have

    Σ_{k=0}^{ℓ-1} m_k = 2^ℓ - 1 ≈ -log N / log c,

so that ℓ = O(log log N). And if m_k = 1, then we have

    Σ_{k=0}^{ℓ-1} m_k = ℓ ≈ -log N / log c = O(log N).

This leads to the following result on the cost of solving the model problem with Newton-Richardson methods (for a proof, see Eisenstat, Schultz, and Sherman [7], Sherman [13]).

Theorem 5.1: Consider the use of Newton-Richardson methods to solve the model semilinear problem presented in Section 2, and assume that V^(0) and γ are chosen so that Theorem 4.1 holds for c independent of h. Then, ignoring the costs of preprocessing and the factorization at the first step, we may reduce the initial error ||V^(0) - V|| by a factor of h^2 in O(N(log N)^2) iteration time by choosing either m_k = 2^k or m_k = 1.

From Theorems 4.1 and 5.1 we conclude that the choice of m_k for Newton-Richardson methods should be based on the relative costs of function and derivative evaluations. For the model problem, choosing m_k = 2^k requires O(log log N) function evaluations and O(log log N) derivative evaluations, while choosing m_k = 1 requires O(log N) function evaluations and only one derivative evaluation (i.e., F'(V^(0))). Since function evaluations and derivative evaluations are equally costly for the model problem ((∂G/∂V)(V^(k)) is a diagonal matrix), it seems best to choose m_k = 2^k.
However, we have observed experimentally that for other problems it may be more efficient computationally to choose m_k = 1 (cf. Sherman [13]). The reason for this is that derivative evaluations are often far more costly than function evaluations for problems more general than the model problem.

6. Conclusion

In this paper we have given a general analysis of the local rates of convergence of Newton-iterative methods for the solution of systems of nonlinear equations. In addition, we have presented the Newton-Richardson methods, a particular class of Newton-iterative methods which appear to have computational advantages over both Newton's method and other Newton-iterative methods for sparse nonlinear systems. In practice these methods have performed quite well (cf. Eisenstat, Schultz and Sherman [7], Sherman [13]), although they are by no means good methods for all problems. It is clear that the choice of a method for a particular nonlinear system depends heavily on the nonlinear operator involved and the context in which it arises.

In closing, it is worth pointing out that the ideas underlying the Newton-Richardson methods have important applications in many other settings. Already, a number of authors (e.g., Bank [1], Concus and Golub [2], Douglas and Dupont [5]) have used SIP-like iterations to solve linear systems arising in the numerical solution of partial differential equations, and in the future similar techniques will no doubt be applied to a wide variety of nonlinear and time-dependent problems. In fact, it would seem that the idea of solving difficult problems by iterating on the solutions of nearby, easily-solved problems is one which should prove fruitful whenever it can be applied in a natural way.

Appendix: Proofs of Convergence Results

In this appendix we shall prove Theorem 4.1.
We will make frequent use of the assumptions and results of Section 4, and the reader is advised to refer to that section, particularly for the statements of assumptions (i) - (vii). Much of our analysis follows that of Ortega and Rheinboldt [11], pp. 350-352.

Lemma A.1: Let F satisfy (i) - (iii) and let F'(x) = B(x) - C(x) be a splitting which satisfies (iv) - (vi). Then there is a 0 < r_6 <= r_4 such that for x ∈ S_6 = {x: ||x - x*|| < r_6},

    ||H(x)|| <= λ,

and for any positive integer m,

    ||H(x)^m - H(x*)^m|| <= m λ^{m-1} ||H(x) - H(x*)||.

Proof: Since H is continuous in S_4, we can choose r_6 <= r_4 so that ||H(x)|| <= λ for x ∈ S_6. The second part of the lemma follows by induction. It holds trivially for m = 1; and if it holds for m = k, then

    ||H(x)^{k+1} - H(x*)^{k+1}||
        = ||H(x)^k (H(x) - H(x*)) + (H(x)^k - H(x*)^k) H(x*)||
        <= λ^k ||H(x) - H(x*)|| + k λ^{k-1} ||H(x) - H(x*)|| ||H(x*)||
        <= (k+1) λ^k ||H(x) - H(x*)||.  □

Lemma A.2: Let F satisfy (i) - (iii) and let F'(x) = B(x) - C(x) be a splitting which satisfies (iv) - (vi). Then there are positive constants c_1 < +∞ and c_2 < +∞ and a 0 < r_7 <= r_6 such that for any positive integer m and x ∈ S_7 = {x: ||x - x*|| < r_7},

    ||G_m(x) - x*|| <= c_1 ||x - x*||^2 + c_2 ||H(x) - H(x*)|| ||x - x*|| + λ^m ||x - x*||.

Proof: From (iv) - (vi) and Lemma A.1, it follows that we may choose r_7
+ A m_1 )| * 1-A » we can bound the terms of (A.l) individually. From the mean value theorem we obtain ll-A (x)[F(x) - F(x*) - F'(x )(x - x )]|| < I A (x) I |x - x | sup |F»(x + t(x - x )) - F' (x ) I m~ " " _ , _ ~ ~~ ~ QStSl $ |A (x) I |x - x I sup L |t(x - x ) m ~ 0$t*l * 6L/(1 - A) i |x - x I | 2 . The second terra of (A.l) is bounded by 25 I |A m (x)[F'(x) - F«(x*)](x - x*)|| $ ll A m (x)| | ||F'(x) - F'(x*)| | ||x - x*| | $ PL/(1 - X) | |x - x | | 2 . Lemma A.l now yields | | [ A (x) F« (x) - A (x*) F' (x*) ] (x - x*) | | = | | (H(x*) m - H(x) m ) (x - x*j | | I ' m ~ ~ m~ ~ *• «• ~ ~ ~ ~ ' * mX* 1 " 1 | |H(x) - H(x*)| | ||x - x*||. Since X < 1 , we can choose c = c (X) < + °° so that the set {mX } is bounded by c . Letting c = c (X, r , 8, L) = 2BL/(l - X) and combining terms, we obtain ii * i i ii ^ I i ? 11/. /^>iiii * i i TTlii * i i G (x) - x S a. x - x r + cj H(x) - H(x x - x + X x - x . □ II m - - " 1 ' ~ ~ 2. - ~ i>i' r - 11 M - ~ ' ' Assumption (vii) is not the only restriction which could be used to prove Theorem U.l, but it is one which the Newton-Richardson methods satisfy trivially since B(x) is a constant function. From assumptions (iii) and (vii), it follows that C(x) also satisfies a Lipschitz condition. Thus, since we have already noted that B is continuous and nonsingular and B is uniformly bounded in S 7 , it follows that there are an L ] < + °° and a < r <: min(r,_, r„) such that for x e S Q = {x: x - x < r D } o 5 7 -.(j-ii~-i" H | |H(x) - H(x ) | | £ L | |x - x | |. Using this result, we now prove Theorem k.l. Proof of Theorem k.l : Let 8, c , c , and r be defined as in Lemma A. 2 and define c(t) = ((c x + c 2 L ] _)t + X 1 ) 1/m . We select r < r ft so that c(r) % ii (°) *n <(c - X + X ) X - X II. m. „ „ , , m But 0 -X* t (o 1 *e 2 L 1 ) ||x (J) -x*|| 2 + » J ||xo Signature c r Date December, 1975 FOR AEC USE ONLY 7. AEC CONTRACT ADMINISTRATOR'S COMMENTS, IF ANY, ON ABOVE ANNOUNCEMENT AND DISTRIBUTION RECOMMENDATION: 8. 
Abstract: In this paper we consider the local rates of convergence of Newton-iterative methods for the solution of systems of nonlinear equations. We show that under certain conditions on the inner, linear iterative method, Newton-iterative methods can be made to converge quadratically by computing a sufficient number of inner iterates at each step. As an example of this phenomenon, we examine the Newton-Richardson methods, which use an inner Richardson-D'Jakonov iteration.

Key words: Nonlinear equations; Newton-iterative methods; Newton-Richardson method; Iterative methods.