X I B RAHY 
 
 OF THE 
 
 UN IVERSITY 
 
 Of ILLINOIS 
 
 510.84- 
 
 Ii6r 
 
 no. 355-360 
 
 cop.Z 
 
The person charging this material is re- 
 sponsible for its return on or before the 
 Latest Date stamped below. 
 
 Theft, mutilation, and underlining of books 
 are reasons for disciplinary action and may 
 result in dismissal from the University. 
 
 UNIVERSITY OF ILLINOIS LIBRARY AT URBANACHAMPAIGN 
 
 NOV 1 19T1 
 
 OCT 13 
 
 Rtco 
 
 L161— O-1096 
 
ft • C I 
 
 w/*aJ Keport No * 357 
 
 
 A NONLINEAR PROGRAMMING ALGORITHM 
 FOR AN ARRAY COMPUTER 
 
 by 
 
 John Michael Mulvey 
 
 m 
 
 October 22, I969 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/nonlinearprogram357mulv 
 
Report No. 357 
 
 A NONLINEAR PROGRAMMING ALGORITHM 
 FOR AN ARRAY COMPUTER 
 
 by 
 
 John Michael Mulvey 
 
 October 22, I969 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 6l801 
 
 *This work was supported in part by the Advanced Research Projects Agency as 
 administered by the Rome Air Development Center under Contract No. US AF 30 
 (602)ljlM and submitted in partial fulfillment of the requirements for the 
 degree of Master of Science in Computer Science, September, 1969. 
 

 ABSTRACT 
 
 This paper describes an algorithm for implementing a nonlinear p 
 gramming system on the array computer ILLIAC IV. A brief review of existing 
 algorithms is made to determine the feasibility of using a particular method. 
 The choice for the ILLIAC IV is a combination of the sequential unconstrained 
 minimization technique and the projected gradient method. The advantages of 
 this combined method are discussed. Problems associated with implementation 
 of the algorithm on the machine are discussed, and an overview of the system 
 is presented. 
 
ACKNOWLEDGMENT 
 
 The author would like to express appreciation for the help ren i 
 by Professors R. S. Northcote and I. W. Marceau in the development of the 
 report. Thanks are also due Miss Sharon Dunlavey for her kind encouragement 
 in organizing the paper. Finally, Mrs. Patricia Douglas must be thanked for 
 her patience in typing the manuscript. 
 
 The results recorded herein were obtained in the course of research 
 studies developed under the auspices of the ILLIAC TV project in the Department 
 of Computer Science, University of Illinois. This paper would not have been 
 possible without this support. 
 

 TABLE OF CONTENTS 
 
 Page 
 
 1. INTRODUCTION 1 
 
 1.1 The Object and Scope of the Study 1 
 
 1.2 Organization of the Report 
 
 1.3 Statement of the Problem 
 
 l.k A Review of the Existing Algorithms k 
 
 1.5 A Description of the Combined Method 7 
 
 2. THE SEQUENTIAL UNCONSTRAINED MINIMIZATION TECHNIQUE 9 
 
 2.1 Formulation of the Method 9 
 
 2.2 Penalty Functions 10 
 
 2.3 Interior Points 12 
 
 2.k Exterior Points lk 
 
 2.5 Unconstrained Algorithms 18 
 
 2.6 Advantages 22 
 
 3. THE PROJECTED GRADIENT METHOD 2k 
 
 3.1 Review of the Problem 2k 
 
 3.2 The Algorithm 2k 
 
 3.3 Proof of the Projected Gradient Formula 29 
 
 3.^ An Example Problem 32 
 
 3-5 Step Length 3k 
 
 3.6 Zigzagging 36 
 
 3.7 Convergence 37 
 
 k. THE COMBINED METHOD ko 
 
 k.l Discussion of the Method ko 
 
Page 
 
 k,2 Phase I of the Algorithm 4l 
 
 4.3 Feasibility Test 44 
 
 4.4 Phase II of the Algorithm 44 
 
 4.5 The Locking Test 46 
 
 4.6 Global-Local Conditions 47 
 
 5. IMPLEMENTATION ON ILLIAC IV 48 
 
 5-1 Parallel Computations 48 
 
 5-2 Storage Methods 49 
 
 5-3 Disk Delays 51 
 
 5.4 Computational Procedure 52 
 
 5.5 Size Limitations 54 
 
 6. CONCLUSION 57 
 
 LIST OF REFERENCES 59 
 
 APPENDIX 
 
 A. CALCULATIONS FOR THE EXAMPLE PROBLEM OF THE PROJECTED GRADIENT 
 METHOD 60 
 
 B. FLOW DIAGRAMS OF THE COMBINED METHOD 65 
 

 LIST OF FIGURES 
 
 Figure Page 
 
 1. An illustrative example of the interior points approach .... 13 
 
 2. An illustrative example of the exterior points approach .... 16 
 
 3. The correction of the projected gradient back into the 
 
 feasible region 27 
 
 h. Dropping a constraint in the Projected Gradient Method .... 28 
 
 5. Representation of the projected gradient formula 30 
 
 6. A sample problem of the Projected Gradient Method 33 
 
 7. The optimum in the Projected Gradient Method 3^ 
 
 8. An example of zigzagging 37 
 
 9. A straight storage scheme 50 
 
 10. Multimodal functions 55 
 

 NOTATION 
 
 A = Approximation to Hessian matrix 
 
 C , C = vector components of the gradient of the objective function 
 
 d = step length 
 
 D = direction vector 
 
 E = symmetric s by s matrix 
 
 E = Euclidean r-dimensional space 
 
 f, = k-th current solution of the objective function f(X. ) 
 
 g. (X) = i-th constraint equation 
 
 G = hypersurface of the intersection of s active constraints 
 
 h(Y) = penalty function 
 
 H(X, p) = revised objective function 
 
 i = index for constraints 
 
 In - H(X, p) = infeasible revised objective function 
 
 j = index for activities 
 
 J = set of equality constraints 
 
 k = current solution index 
 
 K = set of inequalities greater than zero 
 
 I = length of feasible region 
 
 L = Hessian matrix 
 
 L = set of inequalities less than zero 
 
 m = total number of inequalities 
 
 n = total number of activities 
 
 N = unit vector which points in direction of the projected gradient 
 
 p = penalty constant 
 
 P = projected matrix 
 
 P V f = projected gradient 
 
vilj 
 
 q = slack variables in exterior approach 
 
 R = test vector in projected gradient method 
 
 r = total number of constraints 
 
 s = total number of active constraints 
 
 U = Matrix used in Gradient Projection Algorithm 
 
 V = symmetric s by s matrix 
 X = current solution vector 
 X* = local optimum 
 
 V = correction into feasible region 
 
 a = test for projected gradient method 
 [3., = i-th position of P, vector 
 
 7 = test for rate of convergence 
 
 5 = measure of closeness to a hypersurface 
 
 e = step length in conjugate direction method 
 
 V = gradient - a column vector of first partial derivatives 
 v = matrix of second partial derivatives 
 
 A = forward difference operator 
 = Euclidean norm 
 = transpose of a matrix 
 
1 . INTRODUCTION 
 
 1.1 The Object and Scope of the Study 
 
 The purpose of this investigation is to establish a nonlinear pro- 
 gramming system for the parallel array computer ILLIAC IV. The nonlinear 
 system is to supplement the linear system now being developed. Its capabil- 
 ities range from small decidedly nonlinear problems to large quasi-linear 
 problems. The bound of the larger problems is of the order of ten thousand 
 constraints, while the smaller highly nonlinear problems rarely exceed one 
 hundred constraints. This wide variance makes it difficult to apply one of 
 the existing algorithms. 
 
 This report presumes an understanding of the concepts of ILLIAC IV 
 and mathematical programming. The uninformed reader should refer to the list 
 of references for background material; however, the author recommends a prior 
 knowledge of linear programming. 
 
 At the present, nonlinear programming is a hodgepodge of specific 
 examples. A unified theory has not yet been devised and appears unlikely to be 
 in the near future. Particular methods have arisen which solve very special 
 problems. The system designer must account for this diversity and plan accord- 
 ingly. The path is not an easy one and many questions must be answered. What 
 is the range of problems that the system will solve? Is the local optimum 
 satisfactory? How stable is the stationary solution? These types of questions 
 are examined in this report. 
 
 The scope of the system is not limited to convex programming. An 
 attempt is made to find many local optima so that the probability of determining 
 the global optimum is increased. Since no one has discovered a technique for 
 finding global conditions for general nonconvex problems, this deficiency must 
 be accepted. Parallel computing handles the problem nicely, as is demonstrated 
 in a subsequent chapter. 
 
1.2 Organization of the Report 
 
 The remainder of this chapter presents the nonlinear programming 
 problem per se. A review of the existing techniques is made and the procedure 
 chosen is the best with regard to the range of applications and the structure 
 of ILLIAC IV. The advantages and disadvantages are briefly explained. 
 
 Chapter 2 describes the Sequential Unconstrained Minimization Tech- 
 nique (SUMT) in detail. Various penalty functions are examined and the differ- 
 ences weighed. The interior and exterior theories are discussed, as well as 
 I ■ blems with convergence. Unconstrained methods are looked at and an example 
 is presented. 
 
 Chapter 3 involves the projected gradient method. The essential for- 
 mula is proven, and a sample problem is shown. Problems with step length and 
 zigzagging are pointed out and a search for the best step length is covered. 
 The Combined Method algorithm is discussed in Chapter k. 
 
 The implementation of the algorithm on ILLIAC IV is discussed in 
 Chapter 5. The efficiency of machine usage is observed with an eye to the rate 
 of convergence. The details of parallel computations are discussed so that a 
 realistic code can be programmed. The coding requirements, such as storage 
 schemes, are presented and their importance in an array environment is stressed. 
 
 The results of the study are listed in the summary. Finally, the re- 
 port ends with an overview of the system and a recommendation for future imple- 
 mentation. 
 
 1. 3 Statement of the Problem 
 
 The general mathematical programming problem is to minimize the objec- 
 tive function in n variables 
 
 w 
 
 ■ f( V \ = (*ik> • • • > *„*> 
 
subject to the constraints 
 
 g i (x k )>b i 1-1,. 
 
 m 
 
 g i (X k ) = b i i = nH-1, . . . , r 
 
 b. > 
 
 1 — 
 
 which form the feasible region in r-dimensional Euclidean space. It is usually 
 unnecessary to add slack or artificial variables in nonlinear problems, in 
 contrast to the simplex algorithm of linear programming. The equality con- 
 straints can be changed into inequalities by adding the following constraints 
 
 MV - b i >° 
 
 \ -8 l (\)>0 
 
 but often they should be dealt with separately. In this report, the vector 
 (X, ) is the k current solution. 
 
 The objective function and constraints f and g. (i = 1, . . . , r) 
 can be either linear or nonlinear, but they must be continuous and possess first 
 derivatives. The difficulty in determining the solution is directly propor- 
 tional to the amount of curvature of the function. Hence, linear programming is 
 solved in the most straightforward fashion. 
 
 The first step into the nonlinear realm is to allow the objective 
 function to be nonlinear while holding the constraints linear. This problem 
 can be solved by the simplex method and is an extension of that technique. The 
 linear constraints define a convex region, and the method is called convex pro- 
 gramming. 
 
 The next step into the nonlinear is more painful, for many reasons. 
 The most important problem is the global -local dilemma. Once the feasible 
 
region is no longer convex the optimum solution to the entire problem cannot be 
 determined with certainty. A stationary point solution is the best that can be 
 achieved, and many local minima are found. These local minima are compared to 
 find the smallest. Starting the procedure at various points increases the prob- 
 ability of determining the global minimum. 
 
 1.*+ A Review of the Existing Algorithms 
 
 Two methods are combined in the design of the system for ILLIAC IV 
 because of the difficulty in justifying a stand-alone procedure. The combina- 
 tion is helpful in overcoming the global-local difficulty. The advantages of 
 both the methods are utilized so that a more effective process results. Before 
 the "Combined Method" (a name used in the remainder of the report) is analyzed, 
 a review of some of the more popular algorithms is made. 
 
 The first technique to be examined is the cutting plane method [9], 
 the central principle of which is to linearize the constraints at predetermined 
 points and solve the resulting LP problem. The linearization is achieved by 
 applying first order Taylor series approximations 
 
 g .(x) = g .(x k ) + ^.(Xjp'cx - x^ 
 
 at the solution (X, ). If the solution of the LP problem satisfies the original 
 constraints, the method is finished; otherwise the estimates are revised and the 
 process continues. The technique converges for convex problems, but fails to 
 find a local optimum in nonconvex problems. The path leaves the feasible region 
 and never returns. The method is not applicable in this study because of the 
 limitation of the range of applications to convex problems. The author discounts 
 the accelerated cutting plane technique for similar reasons. 
 
 The separable programming method is used when the objective function 
 is separable; that is, of the form 
 
n 
 f(X) = Z f 6c. ) 
 
 and the constraints are linear and also separable. The method divides the 
 objective function into a grid and linearizes it at the intervals. This lin- 
 earization is applied to each variable. The problem is to minimize f(X) subject 
 to 
 
 g,(X) = Z g.,(x.) . 
 
 . - -13* "J* 
 
 Here again the procedure converges if convexity is retained, but fails otherwise, 
 This method is likewise rejected. 
 
 The Lagrangian differential gradient method has a different form than 
 the previous methods. It utilizes the gradient of the objective function which 
 points toward the greatest increase of the function. The negation of the gradi- 
 ent is used here. Knowledge of the constraints alters the direction according 
 to the gradient. Notationally 
 
 dX/dy = Vf( X ) - Z u.Vg.(x) 
 i=l X x 
 
 where u. is a Lagrange multiplier and 
 
 f g, (X) if u > or g.(X) > 
 du/dy = ( x 
 
 otherwise 
 
 The solution to the original problem is obtained from the solution to the 
 differential equations by letting y -* ». Strict convexity must be retained, and 
 the solution cannot be found if the constraints are linear. This method has 
 been a stepping stone for other gradient methods. 
 
The next method to be examined is the Sequential Unconstrained Mini- 
 mization Technique. It is discussed lightly because it is one of the methods 
 in the combined algorithm. The details are presented in Chapter 2. The method 
 has been perfected at the Research Analysis Corporation by McCormick and Fiacco 
 [5]- Convergence is proved for convex regions, and will also closely approach 
 nonconvex optima. An advantage is the partial avoidance of unwanted local minima. 
 The idea is to incorporate the constraints into the objective function and thereby 
 change the constrained problem into a series of unconstrained minimizations. A 
 penalty term is introduced to weigh the importance of the constraints. The se- 
 quence approaches the solution of the original problem as the penalty is mono- 
 tonically decreased. An advantage of the computational technique is the natural 
 addition of an unconstrained subroutine, thereby greatly increasing the range of 
 possible applications. 
 
 The small step gradient method is a well known nonlinear algorithm. 
 It is similar to the cutting plane scheme. The small step method attempts to 
 remain in the feasible region by changing the solutions gradually. The steps of 
 the algorithm are: 
 
 1. Linearize all constraints by a Taylor series approximation at a 
 current feasible solution. 
 
 2. Minimize w = f (X^) = v f (X^ (X) with the current solution = X, 
 subject to V g i (X k )' (\) = Vg i (X k )'(X k ) - f(X k ) 
 
 and || (X) - (X )|| = OL, with a a small number. 
 
 3. Test for an optimum value and return to step 1 if the test fails. 
 
 Any familiar LP method is used to solve step 2 above. Notice that no informa- 
 tion carries from iteration to iteration because of the relinearization of all 
 constraints. The number of computations is increased. This method converges 
 very slowly and is, therefore, rejected. 
 
 
7 
 
 The final method to be e; amined is the gradient projection method of 
 Rosen [12]. It is a subset of the general method of feasible direction, whi 
 attempts to decrease the objective function at every step, and is the second 
 portion of the Combined Method. Therefore, the description here is brief a:: 
 it will be described more fully below. The method begins with an initial fea- 
 sible interior solution and follows the negative gradient to the boundary at 
 which point the projection is determined. The projected direction is not nec- 
 essarily the best, as in Zoutendijk's method [15] which solves an LP problem to 
 determine the direction of greatest change within the region; however, a decrease 
 of the objective function is guaranteed and an LP problem is not formulated at 
 each iteration. The method converges in a small number of iterations, especially 
 with linear constraints. Although the amount of work per step is large, the 
 method is acceptable and is even shown to be desirable for the ILLIAC IV. 
 
 1. 5 A Description of the Combined Method 
 
 As mentioned in the previous section, a combination of two existing 
 methods is used in the nonlinear system design. The sequential unconstrained 
 minimization technique is used for the small highly nonlinear problems and as 
 a first approximation to the projected gradient method. The latter method is 
 used exclusively with large quasi-linear problems and in speeding convergence 
 in the combined mode. 
 
 The stages of the Combined Method are as follows: 
 
 1. Phase one of the procedure involves the determination of an initial 
 feasible solution if one does not already exist. It is often the 
 case that a feasible solution is known a priori because of the 
 nature of the problem. 
 
8 
 
 2. The unconstrained revised I'unction is determined by the original 
 constrained problem. The function depends upon the current solu- 
 tion, and an arbitrary penalty function and constant. 
 
 3. The unconstrained function is minimized for a particular penalty 
 constant. An attempt is made to find all local minima of the 
 converted function. Tests are performed to check feasibility. 
 
 k. The solution of the unconstrained problem becomes the current 
 
 solution, and a test for the rate of convergence is made. If the 
 rate is adequate, the penalty is lowered and the process returns 
 to step three. The test for convergence is part of the locking 
 test. 
 
 5. The projected gradient method is called into the program if the 
 locking test indicates that it would be appropriate. The method 
 determines a feasible direction and then a step length which 
 minimizes the objective function violating the constraints. 
 
 6. The locking test is made again. The test includes a check for 
 local optimum as well as the rate of convergence. The optimum is 
 found within a delta region because of the nature of the algorithm. 
 It is possible to return to the sequential method if conditions so 
 warrant. 
 
 7- This step is included for nonconvex regions. Once a local minimum 
 is found by the above process, the entire procedure is reiterated 
 from another feasible position. Many optima are found in this 
 manner and the probability of finding the global is improved. 
 
 These steps give a rough understanding of the Combined Method. The prime advan- 
 tage of this algorithm is the wide range of applications which attempt to find 
 global optimum of nonconvex regions in a few iterations. The remainder of the 
 report expands this assertion. 
 
 
2. THE SEQUENTIAL UNCONSTRAINED MINIMIZATION TECHNIQUE 
 
 Formulation of the Method 
 
 The essence of the method is concerned with the conversion of the 
 objective function and constraints of the constrained problem into an uncon- 
 strained problem with a revised objective function, which is minimized. A 
 sequence of conversions and minimizations is computed which converges to the 
 solution of the constrained problem as the number of iterations approaches in- 
 finity. The procedure terminates at a delta closeness to the optimum which is 
 defined by 
 
 S 2 2 
 i=l x 
 
 for the s constraints which lie in the hypersurface G. In the unconstrained 
 method, a test of convergence is made to measure performance. The formula 
 
 ' ' (f k " f k + iV f k . 
 
 where f = f (X, ), is checked against a constant which has been predetermined by 
 the nature of the problem. A local optimum test is made when y becomes less than 
 the specified constant. The procedure halts or switches to the projected gradi- 
 ent if the minimum is not found. 
 
 The revised objective function for the sequential method is defined by 
 
 m 
 
 with 
 
 y ik = e±(\) 
 
10 
 
 The penalty constant for the k-th iteration is p , and the penalty function is 
 
 Ihus the revised objective function is also a function of the constraints 
 when the penalty is nonzero. The function depends upon the curvature of the 
 constraints, the number of local minima, and the ease of calculating the con- 
 straints. It has the property that 
 
 m 
 lim L P k h(y lk ) = 
 k -* » i =1 
 
 and the result 
 
 lim H(X k ,p R ) = f # , 
 k-*« 
 
 * •*- 
 where f # = f (X ) and X is the solution to the constrained problem. 
 
 The effect of the conversion is the simulation of the original prob- 
 lem more and more closely as the sequence progresses. The contribution of the 
 constraints to the revised objective function diminishes and vanishes at the 
 limit. 
 
 2.2 Penalty Functions 
 
 The function h(X) has many forms, with inverse and logarithmic forms 
 being the most popular. It is possible to use a weighted penalty function that 
 combines the two approaches of the sequential unconstrained method. This is dis- 
 cussed later in the report. 
 
 A logarithmic penalty function has the form 
 
 m 
 
 H( W - f k - p k .V n g i ( V 
 
 1=1 
 
 in the revised function. Taking first derivatives, find the gradient of the 
 H function 
 
11 
 
 m 
 1=1 
 
 It is apparent that the gradient vanishes provided )C = X and p = 0. This 
 assumes the Kuhn and Tucker condition that the gradient of a function vanishes 
 at the optimum or f^ = 0. If this assumption holds, the first order sufficiency 
 condition for the logarithmic function is proved. The second order condition is 
 proved in a similar fashion. Proceeding to the second derivatives, we have 
 
 v^H(x k ,p k ) = v f - Z [pj/g^)] v g.(x k ) + Z [i/g i (x k ) ] v gi (x k ) vg i (x k ). 
 
 i=l i=l 
 
 2 
 A second order condition for the optimum is that the matrix V f* he positive 
 
 # 2 
 
 definite. As p, ■• 0, X -» X , and the H function becomes equal to V f^j there- 
 
 ? * 
 fore, V H(X ,0) is a positive definite matrix. This completes a second order 
 
 -ondition for the logarithmic penalty function. 
 
 These proofs demonstrate the nearness of the unconstrained function H 
 
 to the minimum of the objective function f in the constrained problem, provided 
 
 a logarithmic penalty function is utilized. 
 
 A second penalty function, called the inverse function, has the form 
 
 m 
 
 H <W ■ f k + p k .V /g i ( V 
 
 i=l 
 
 It can be shown, in an analysis similar to the previous function, that first 
 and second order conditions are satisfied. The first derivative is 
 
 m p 
 
 V H(X k ,p k ) = V f k - p k ^ i Vg i (X k )/g. 2 (X k ) 
 
12 
 
 so the H function has a vanishing gradient at X . The limiting value of the 
 sequence of solutions to the unconstrained problems is the solution of the con- 
 strained problem. 
 
 Note that the penalty function increases without bound as the con- 
 straints approach zero, when p, is greater than zero. Hence, the boundary 
 cannot be reached and the equality constraints cannot be satisfied, so the pro- 
 cedure halts at a delta closeness to the optimal solution. 
 
 The inverse penalty function would appear to be the more profitable 
 of the two functions to use. The calculation of the inverse of a constant is 
 trivial and the summation is easy. Although the computation of the logarithm 
 is simple, it does require more operations and consequently longer computation 
 time for the same number of unconstrained problems in the solution sequence. No 
 information is available as to which penalty function yields the shortest solu- 
 tion sequence in actual problems. 
 
 2.3 Interior Points 
 
 An example is presented so that the reader can gain an understanding 
 of the sequence method. It also helps demonstrate the difference between the 
 interior and exterior points methods. 
 
 The problem is to minimize the objective function 
 
 w = f (X) = x x + 2x 2 
 
 subject to 
 
 X l " X 2 - ° 
 
 x ± > 
 
13 
 
 x x - x 2 > 
 
 iA V 2 3A i 1 
 
 Figure 1. An illustrative example of the interior points approach. 
 
 The revised objective function is 
 
 H(X k ,p k ) = x ± + 2x 2 - p k ln[g 1 (X k )] - pjInCg^)] 
 = x x + 2x 2 - P k ln(x x - x 2 ) - p k ln(x 1 ) . 
 
 Figure 1 demonstrates the problem at hand, the shaded area being the feasible 
 region, and the boundaries the equality constraints. The trajectory of current 
 solutions, which are listed in Table 1 and plotted in the figure, shows how the 
 boundaries are avoided and how the interior is utilized. The revised function 
 is first minimized for p = 1 by taking partial derivatives and setting the sys- 
 tern of equations to zero. The penalty constant is lowered to one half of the 
 previous constant, and the new revised function is minimized. This process 
 
Ill- 
 continues until the path of the current solutions converges upon the origin 
 (0,0) and halts within a 5-closeness. The reduction in the penalty was arbi- 
 trarily chosen. The percent of reduction is large because of the simplicity of 
 the problem and the speed of convergence. Note that the path remains in the 
 interior and the rate of convergence slows as the optimum is approached. This 
 method is called the interior points approach for obvious reasons. 
 
 TABLE 1 
 THE SOLUTION OF THE INTERIOR PROBLEM 
 
 k 
 
 P k 
 
 7 1 
 
 7 2 
 
 f k 
 
 1 
 
 1 
 
 • 833 
 
 .667 
 
 2.16 
 
 2 
 
 • 5 
 
 .416 
 
 • 333 
 
 1.08 
 
 
 
 • 25 
 
 .208 
 
 .167 
 
 0.5^ 
 
 k 
 
 • 125 
 
 .104 
 
 .083 
 
 0.271 
 
 The computations of the minimum of the revised function are usually 
 more involved because the derivatives are harder to find, and the system of 
 equations is less straightforward; therefore another method, such as one of the 
 conjugate direction techniques, is used. 
 
 2.k Exterior Points 
 
 Thus far, the method (SUMT) discussed in this chapter has dealt only 
 with the interior points approach. Another approach is to seek the optimum from 
 an infeasible current solution. This is called the exterior points method. 
 
 The idea is to add positive slack variables to the constraints which 
 do not meet the predefined inequality conditions. This addition takes the form 
 
15 
 
 6 i ( V + \ = ° x - i - m 
 
 in which the variable q decreases as the solution sequence progresses. The 
 
 K 
 
 constraints which affect the results are these relaxed inequalities. The new 
 objective function 
 
 
 H(X k ,p k ) = f + p k L min[0,g i (X k )] , 
 
 i=l 
 
 called the quadratic loss function, accounts for this relaxation. The penalty 
 function, p , is monotonically increased as the sequence of unconstrained 
 problems is solved. Since as p increases, X, is drawn towards the feasible 
 region so as to minimize the value of the penalty term, X -> X as p -> °°. 
 
 Each step consists of solving the set of equations formed by setting 
 the gradient of the new objective function to zero, i.e., 
 
 m 
 V H(X k ,p k ) = V f k + 2p k I V g i (X k ) • min[0, g;L (X k )] = 0. 
 
 i=l 
 
 The second order condition is also fulfilled because the second derivatives 
 form a positive definite matrix as in the logarithmic function discussed above 
 X is a local minimum of the converted function H(X, , p, )• 
 
 The difference between the exterior method and the interior method 
 arises because of the form of the revised objective function. The interior 
 method uses all constraints in each iteration, but the exterior method calcu- 
 lates only those constraints which do not fulfill the proper inequalities. 
 The optimum of the converted H function moves toward the optimum of the 
 constrained problem in both cases. 
 
16 
 
 An example of the exterior method is to minimize 
 
 w = f (X) = x 1 x 2 
 
 subject to 
 
 2 
 C 2 
 
 gl (x) = -xj -x: + k :> o 
 
 g 2 (X) = -xj -x 2 + 1 > 
 g 3 (X) - x ± + x 2 > 
 
 X, + X 2 >0 
 
 Figure 2. An illustrative example of the exterior points approach. 
 
x 7 
 The shaded area is again bhe feasible region with the equality conditions the 
 boundaries. The revised objective function with p equal to .5 '■• 
 
 2 2 2 
 
 H(X ,.5) = r Q + 1 X{[min(0,x 1 + x g )] + [min(0,l -x Q -^ )] 
 
 + [min(0, -x 1 -Xg + h)] } . 
 
 The solution of this function is listed in Table 2 with the other solutions in 
 the sequence. The penalty constant p is decreased in successive iterations to 
 bring the current solution closer to the feasible region. The trajectory of 
 solutions is plotted in Figure 2. 
 
 TABLE 2 
 SOLUTIONS OF THE EXTERIOR PROBLEM 
 
 k 
 
 p k 
 
 x l 
 
 x 2 
 
 f k 
 
 1 
 
 1 
 
 • 67 
 
 .89 
 
 .60 
 
 2 
 
 2 
 
 .62 
 
 • 77 
 
 .k6 
 
 3 
 
 k 
 
 .61 
 
 • 73 
 
 M 
 
 h 
 
 8 
 
 • 58 
 
 .67 
 
 .38 
 
 00 
 
 00 
 
 • 577 
 
 .667 
 
 • 385 
 
 The computations of the minimum of the revised function for the exte- 
 rior method is less involved than the interior method in many cases. Note that 
 in the example the current solutions fulfill the proper inequalities for con- 
 straints 1 and 3- Hence these constraints are set to zero in the revised func- 
 tion while the conditions are satisfied. The order of the polynomial is 
 increased by the quantity squared, which is a disadvantage from a computational 
 standpoint. 
 
18 
 
 The exterior method is quite useful when the minimum of the first 
 iteration X is close to the feasible region and the solution occurs on the 
 boundary of the region. The convergence may be prohibitively slow in other 
 situations. The exterior algorithm handles equality constraints more effec- 
 tively than the interior method, because the slack variables are taken care of 
 in the quadratic loss function. 
 
 The difference between the interior and exterior methods is that the 
 interior method uses the constraints to avoid regions, while the exterior method 
 follows the philosophy that the constraints keep the current solution close to 
 the feasible region. Although the ideas are different, the calculations are 
 similar, except for the number of constraints in the revised function. 
 
 2. 5 Unconstrained Algorithms 
 
 An integral portion of the sequential method is the minimization of 
 the unconstrained problem using the objective function H. Without a good tech- 
 nique the whole idea falters. What constitutes a good technique? It must have 
 good convergence properties and be computationally straightforward. Global- 
 local difficulties and saddle points must be avoided. These problems resemble 
 those of the constrained problem. A multitude of algorithms appear in the liter- 
 ature and have been applied with varying degrees of success. The most popular 
 are the method of steepest descent, the conjugate direction methods, and Newton's 
 second order method. 
 
 The first class of methods to be examined is the conjugate direction 
 approach. The approach guarantees to locate the minimum of a function in n 
 steps, provided the function is quadratic and possesses n arguments; otherwise 
 a test of convergence is required. The idea is to approximate the function by 
 a quadrat ir: equation and search along this function until the minimum is deter- 
 mined. A subset of this method is the Fletcher and Reeves [7] proposal. Their 
 algorithm begins with the vector (X, ), where k is initially zero, and the 
 
19 
 
 ■t ion 
 
 D K * V h 
 
 The next point (X ) is found by determining the minimum value of f(X) along 
 this direction. When this point is found, a new direction is computed by the 
 
 D, i = -V f, , + €. D, 
 
 k+1 k+1 k k 
 
 where 
 
 £ K " V f k + l 2 /^ f k 2 
 
 Tliis process continues for n iterations until X , the solution of the quadratic 
 problem, is found. An assumption is that the calculations involve no roundoff 
 or truncation errors. The generated directions correspond to the quadratic 
 approximations and the procedure should be restarted every n+1 steps for general 
 functions. The principle quantities to be observed are: the rate of conver- 
 gence, the choice of the initial solution, the search for the next current solu- 
 tion, and the test for delta closeness to the optimum. 
 
 Another conjugate direction method is the Davidon variable metric 
 method [2]. The notion is to approximate the second partial derivatives in the 
 Hessian matrix. It begins with the arbitrary vector X and any positive definite 
 matrix A. Successive points are found through the equation 
 
 \ + i - \ - ^ f k 
 
 The step length d is found in a one dimensional search for the local minimum of 
 
20 
 the objective function. The difference from the previous method is in the 
 matrix approximation 
 
 
 with the difference equations 
 
 and 
 
 ^ - - d \ v f k 
 
 ^k ■ *Vl - *V 
 
 The matrix A, is positive definite if A is also positive definite, and the 
 function decreases appropriately. 
 
 The advantage of the conjugate direction methods lies in the quadratic 
 approximation to the function. The method works quite well for functions which 
 resemble quadratic forms, but not so well with general functions. If the initial 
 point is sufficiently close to the optimum, the rate of convergence is good. The 
 calculations of A are somewhat involved. In some cases it is much better than 
 computing second derivatives. 
 
 The method of steepest descent is the second type of method to be dis- 
 cussed. It is an offshoot of the generalized gradient method, which points in 
 the direction of greatest change of the function. The negative of the gradient 
 is again used. Distance along this gradient is maximized to lower the number of 
 iterations. The arbitrary vector I is used once more to begin the procedure. 
 Subsequent points are found by 
 
 
21 
 
 *kn'h- dv f * 
 
 where 
 
 d - the step length 
 k + 1 = the new current solution 
 
 The step length is found by determining the smallest value of d which minimizes 
 the objective function along - Vf fc but remains positive. That point becomes 
 
 the next current solution and a new direction is found that is orthogonal to 
 
 the previous direction. 
 
 The final method to be examined is the Newton second order method. 
 Second partial derivatives of the unconstrained H function are required. The 
 method is based on the second order Taylor series 
 
 W =f k + fi!CVf k + i^'hP* 
 
 where 
 
 \ = 
 
 b f. fix fix.. 
 
 k 7 1 ] 
 
 d f . fix dx, 
 
 k' n 1 
 
 d f, fixfix 
 
 k' 1 n 
 
 d^f, fix dx 
 
 k' n n 
 
 and L^. = L()C ) is the Hessian matrix. The A, matrix of the Davidon variable 
 metric method is an attempt to duplicate this matrix. It is a positive definite 
 symmetric matrix. The series of current solutions is represented by 
 
 -1. 
 
 \ + l ' \ - «* v f : 
 
22 
 where 
 
 L Vf, = direction of step 
 
 d = step length , chosen to minimize f along 
 
 -L V f, starting at x, • 
 Note that the inverse of the Hessian matrix must exist for the method to con- 
 tinue. Also, the matrix must be stored, which requires ^-N(N + l) words. If 
 the derivatives are troublesome to compute, the method slows considerably. 
 Nonetheless, the rate of convergence is as good as, if not better than, the 
 earlier two methods. Actual tests have been run by Fiacco and McCormick [k~\ 
 and the results indicate that the Newton method requires the fewest iterations, 
 
 o 
 
 but the work per step is roughly n /3, while the variable metric method requires 
 
 2 
 n operations per step and Fletcher-Reeves method requires n /2 operations per 
 
 step. These results indicate that the Newton method requires many more compu- 
 tations per step but fewer iterations are required for convergence; therefore, 
 it compares favorably with the other methods. Because of the structure of 
 ILLIAC IV, these characteristics make the Newton method the preferred technique. 
 Chapter k expounds on this issue in more detail. 
 
 2.6 Advantages 
 
 In Chapter 2 the sequential unconstrained minimization algorithm has 
 been examined. To summarize, the advantages of the method are listed below. 
 
 1. The algorithm is computationally concise, yet convergence has been 
 proven for first and second order conditions. 
 
 2. The sequence of current solutions remains in the interior for the 
 interior points method. This avoids unwanted local minima to 
 some degree. 
 
23 
 
 3. Highly nonlinear I attaints can be hand] I because the path of 
 solutions does not attempt to follow the constraints. The path 
 avoids the constraints through the use of the penalty constant 
 and the penalty function. 
 
 k. It includes a useful unconstrained technique within the computa- 
 tional algorithm. 
 
 5. A large variety of problems can be solved. 
 
 sse conditions suggest that the method be used in a combined algorithm, its 
 principle task being to serve as a first approximation to the gradient projection 
 method. 
 
 The principle disadvantages stem from equality constraints. It should 
 be apparent that these constraints can be satisfied only in the limit for both 
 the interior and exterior methods. Something must be done to overcome this 
 difficulty. The second disappointment is the slowness of convergence for general 
 rules which affect the penalty constant. This can be overcome by a priori knowl- 
 edge of the problem; however, a problem solving system should avoid user mani- 
 pulation as much as possible. The formulation of a good extrapolation procedure 
 has not been achieved. 
 
2k 
 3. THE PROJECTED GRADIENT METHOD 
 
 3*1 Review of the Problem 
 
 The second portion of the combined method is taken from the algorithm 
 iosen [12]. The first account of this algorithm appeared in a paper which 
 dealt with linear constraints, and this was followed by the subsequent paper on 
 nonlinear constraints. Hadley [8] includes this method in his book on nonlinear 
 programming. The gradient projection method is one of the techniques of feasible 
 Lirections. It attempts to move the solution in the direction of the negative 
 gradient, similar to the procedure of the unconstrained method of steepest de- 
 scent. Differences occur when the gradient of the objective function points 
 outside the feasible region. Following the gradient outside the region leads to 
 infeasibility, so other directions are searched to find the "next best" direc- 
 tion. 
 
 3-2 The Algorithm 
 
 1. The starting vector can be found in two ways. It can be determined 
 either by the sequential unconstrained minimization method or by the projected 
 gradient method. This is examined in more detail in the next chapter. 
 
 2. If the starting vector is interior, the current solution is found 
 by the formula 
 
 x i ■ x o - d v f > 
 
 where the step length d is determined by a one -dimensional search. Find the 
 step length which 
 
 minimizes f 
 
25 
 subject to 
 
 < d < d , 
 
 — — max 
 
 where d is the longest step length which remains in the feasible region. 
 
 3. If X, lies within a & -neighborhood of the intersection of the s 
 
 K 
 
 constraints, then the solution is moved in the direction of the projection of 
 the gradient Vf, on the intersection G of the tangent hyperplanes at )C. Deter- 
 mine the matrix 
 
 u k = (v gl , vg 2 , . . . , vg s ) . 
 
 The columns of this matrix must be linearly independent and the procedure checks 
 for this condition; otherwise dependent columns are removed until the remaining 
 columns are independent. Next define 
 
 \ ■ [ \'\ yl > 
 
 which exists because of the independence condition above, and 
 
 R 
 
 k k k 
 
 The projection of the gradient on the intersection G of the tangent hyperplanes 
 at I is then 
 
 V^ ■ w k - U A 
 
26 
 
 where 
 
 \ - x - \\\ 
 
 The proof of this vital equation is contained in the following section. The 
 test for the optimum is started by finding 
 
 P.. = r.. /(2 sTv..) 
 lk ik/ v 11 
 
 where r is the i-th element of the vector R and v. . are the diagonal elements 
 
 of the matrix V, . Define 
 k 
 
 Finally, if 
 
 a = max [ II P k Vf k || , max p ±k ] 
 
 i 
 
 a < b/ 2s lii 
 
 where 
 
 6 = specified tolerance 
 
 s = number of constraints 
 
 I = a bound on the diameter of the feasible region 
 
 U = measure of independence of the active constraints, || V-. f | < j/ 
 
 then f is within a 6-distance of the local minimum. 
 
 h. If a < S/2s£u and || P Vf | = max[p ], then find the unit vector 
 
 k k 
 
 ik" 
 
 " k -P k vf k /l|P k vf k 
 
27 
 
 Now compute the series of points from Y, to Y which projects the current solu- 
 
 tion back into the feasible region by 
 
 + dN, 
 
 Y„ = 
 
 Y l - W V 
 
 Y = Y . - u, v n w(y . ) 
 
 n n-1 k k v n-l y 
 
 wh 
 
 ere w ( Y i _ 1 ) is a vector which is a measure of the distance to the intersection 
 
 The n-th solution is found at a 5 -neighborhood of the intersection G, and the 
 iteration terminates. 
 
 X 2 
 
 A 
 
 (0,0) 
 
 Figure 3. The correction of the projected gradient back into the 
 feasible region. 
 
28 
 
 5. If a < 5/2si/Lz and | P.Vf. |l < max 3 then drop the i-th column 
 of the U matrix which corresponds to the element max (3 . The new matrix 
 
 1 
 
 becomes U and the process is continued by finding the new V , R , P , 
 Vf ,, and N matrices and vectors. The notion is to proceed along a new 
 intersection G by dropping the appropriate constraint. 
 
 X 2 
 
 A 
 
 ,— CONTOURS OF THE 
 
 x \ x„ x )0 x 9 OBJECTIVE FUNCTION 
 
 X.3 
 
 — -X| 
 
 Figure h. Dropping a constraint in the Projected Gradient Method. 
 
 Figure k shows how this movement occurs geometrically. At solution X, » the path 
 switches to intersection G, which is constraint 1 in this example, from inter- 
 section G, which is constraint number 2. The constraint is d. _pped from consid- 
 eration because the optimum occurs at X , away from the equality condition of 
 constraint 2. Point X is not a stationary point because the objective func- 
 tion can be decreased by moving along intersection G. 
 
 This completes the explanation of a typical loop in the projected 
 gradient method. 
 
29 
 
 . 3 Proof of the Project id .iradienl Formula 
 The formula in question is 
 
 p k Vf k ■ Wl k - \\ 
 
 where the notation used has been described in the previous section. This 
 equation is crucial to the method because it is the distinguishing factor 
 between this method and other methods of feasible directions, i.e., it is a 
 projection of the gradient onto the intersection of the s constraints. It is 
 also part of the test for a local minimum. A necessary condition for the opti' 
 mum is that 
 
 V* k " ° 
 
 which follows automatically when 
 
 Vf . = . 
 k 
 
 A sufficient condition is that r.. < for I = 1. .... s where r.. is the i-th 
 
 lk ' ' ik 
 
 element of the vector R. Combining these conditions, one knows that a move in 
 any direction fails to decrease the objective function. 
 
 The first step in verifying the equation for P is to let 
 
 E = E + E 
 s u ju 
 
 where 
 
 E = Euclidian s space 
 
 E = Sub space spanned by the columns of U, 
 
 E = Sub space spanned by vectors orthogonal to the columns of U, . 
 
 JU .K 
 
Now let 
 
 where 
 
 30 
 
 Vf , = c . + c , 
 
 k uk )ik 
 
 C = component of the gradient in Q, which is the tangent to the 
 intersection of hyperplanes at X^, 
 
 C , = orthogonal component. 
 
 X 2 
 
 X| 
 
 Figure 5. Representation of the projected gradient formula. 
 

 The illustration in Figure 5 depicts the vectors geometrically. 
 Now 
 
 Vf = C . + C , , C.eE , C.eE 
 
 k uk nk ' uk u ' nk u 
 
 But since E is generated by the columns of U 
 
 C , = U. 
 
 uk ■ \\ ■ 
 
 The 
 
 re CL is a vector of coefficients. Since E is the orthogonal complement 
 
 of E , 
 u' 
 
 u. *c . = 
 
 k uk 
 
 Hence 
 
 \ OT k ■ u k "A 
 
 \ • ("kV" 1 V^k 
 
 since the inverse exists. It follows that 
 
 uk k uk 
 
 -V** 
 
 where 
 
 p k - J - vVV" 1 u k' 
 
32 
 
 3-^ An Example Problem 
 
 The sample problem which demonstrates all of the above principles is 
 to minimize the objective function, -3x, -x , subject to the constraints 
 
 gl (X) = -x^/h + 2x 1 -x 2 > 
 
 g 2 (X) = x^ - k > 
 
 g 3 (X) = x ± > 
 
 g k (X) = x 2 > 
 
 figure 6 represents the problem with X , the initial solution, at [2,3] • It 
 happens to be on a boundary, which eliminates the first step in the algorithm. 
 Note that 
 
 W k = [-3,-1] 
 
 V g;L (X) = [-x.j/2 + 2, -1] 
 
 Vg 2 (X) = [x 2 , Xl ] . 
 
 The calculations of the first move, which are included in Appendix A, indicate 
 that the current solution is not a local optimum because the projected gradient 
 is not zero. The direction of the step length is P-,Vf and the path moves along 
 this direction with the arbitrary step length d = 2. Figure 6 shows the neces- 
 sary vectors. 
 

 g,(X)=o 
 
 2 3 4 5 6 7 8 
 
 X| 
 
 Figure 6. A sample problem of the Projected Gradient Method. 
 
 A constraint must be added with this step length. These calculations are also 
 contained in the Appendix. The solution is iterated back to the feasible region. 
 The second current solution is found by this process to be (1.578,2.53*+) and the 
 constraints are both equal to zero. The computations of the projected gradient 
 at this point are made, and the result is 
 
 P 2 Vf 2 || =0.0 
 
 R = (-0.^9527,-0.91+721+) 
 
 Because the projected gradient is zero and the R vector's elements are negative, 
 the current solution is a local optimum, which is X p . Figure 7 illustrates the 
 local conditions for optimality. 
 
3^ 
 
 X 2 
 
 Figure 7. The optimum in the Projected Gradient Method. 
 
 3. 5 Step Length 
 
 The choice of a step length search procedure depends upon the position 
 the current solution. If it is in the interior, a simple one -dimensional 
 search is performed. The objective function to be minimized is f (X) along the 
 direction 
 
 where 
 
 < d < d 
 — — max 
 
35 
 
 Fhe optimum step length is quite easy to find provided the function is unimodal. 
 
 a is true even if the function is complicated to calculate. Problems arise 
 when the function is multimodal, and these are explained in the following chapt. 
 The curvature of the constraints determines the length of the step if 
 the current solution lies within a S-neighborhood of the boundary. A tradeoff 
 exists between the length and the time required to project back into the feasible 
 
 ; on. The distinct possibility of never returning to the feasible region exists 
 for highly nonlinear constraints, especially if the step length is long. Rosen 
 [12] suggests that a good step length is 
 
 d ■ <*W/« * 
 
 where 
 
 6 = unit of independence of vectors of U 
 
 3 = maximum of 6.. 
 max ^ik 
 
 = measure of curvature of the active constraints. 
 
 One determines the matrix 
 
 \ - E <V ■ 
 
 3 2 g 1 (x k )/5x 1 Sx 1 . . . a 2 g 1 (x k )/a Xl ax i 
 
 ^'W^fri ■ ■ • ^.(V/ 3 *,**, 
 
 and 
 
 \W 
 
36 
 
 where s is the number of constraints in the intersection. This particular step 
 length is one of the most popular, but there are many alternatives. 
 
 An arbitrary constant step length could be used if the second deriva- 
 tives are expensive to compute. The time taken to achieve convergence would be 
 increased. Brown [l] arrives at the constant i/50 in which I is the length of 
 the longest vector in the feasible region. If the optimum happens to be between 
 the intervals, two things are possible. The function can be interpolated and 
 the answer approximated, or the grid size is refined again and again until the 
 proper accuracy is achieved. The latter case appears to be the better because 
 termination occurs at the desired precision. It halts at 5-closeness to the mini- 
 mum. For these reasons it is used in the combined algorithm. 
 
 3.6 Zigzagging 
 
 All gradient methods have major disadvantages when the contours do not 
 form well conditioned hills, i.e., when ridges and valleys occur. Ridges are 
 troublesome to define except in a vague sense because a change of scale affects 
 the conditioning of the hill. One way to categorize ridges is by the difficulty 
 of descent or the rate of convergence. Difficulty of descent is determined by 
 the size of the arc of a circle which encompasses directions which result in a 
 decrease of the function for a given step length. Usually a small step length 
 makes movement easier, and a variable step length helps handle ill-conditioned 
 hills. This is another advantage of the variable step length. Figure 8 shows 
 the trouble with climbing a hill with a gradient technique. Zigzagging is demon- 
 strated. In some problems the procedure runs thousands of iterations without 
 noticeable improvement. A change of scale could be made to round the ridges, but 
 the axis of the hill must be known. This is seldom the situation. A good tech- 
 nique is to shorten the optimum step length by 0.9 and use this length. 
 
37 
 
 Figure 8. An example of zigzagging. 
 
 Overshoot is lessened. Another ridge correction technique is to approximate the 
 ridge axes by a quadratic equation. If the second derivatives are known, this 
 method accelerates movement along the ridge. 
 
 These methods are not cure-alls for zigzagging; they are merely 
 attempts to speed convergence. 
 
 3«7 Convergence 
 
 The gradient projection method is proven to converge to a local mini- 
 mum if certain conditions are fulfilled; however, an inordinate number of itera- 
 tions is sometimes required. This situation is clearly unacceptable. One way 
 this happens is by the accumulation of roundoff and truncation errors. For 
 example, the length of the projected gradient might never reach zero, and a 
 reasonable approximation must be made. The procedure terminates at a 5-close- 
 ness. Another method for the determination of the local optimum is to test the 
 
38 
 
 rate of convergence 
 
 * - (f k - f k+ iV f k 
 
 and halt below a predetermined amount. The projected gradient method uses this 
 idea in the a test. It is the usual scheme when the solution occurs in the 
 interior of the feasible region. 
 
 What happens to zigzagging when a test for the optimum is made by a 
 test of convergence? Perhaps the path of solutions is climbing a ridge or is 
 stuck on a peak. Even worse, the current solution could be alternating around 
 the optimum in some random fashion. 
 
 Hadley [8] utilizes an approximation to speed convergence. A second 
 order Taylor series is used 
 
 f(x) = f k + Vf^(X - X k ) + | (X - X k )' L k (X - X k ) 
 
 where L is the Hessian matrix. The estimate for the stationary point is found 
 by setting the gradient equal to zero in this approximation. The following 
 system of equations is the end result 
 
 -1 
 
 ^k ~k 
 
 x k ,, 
 
 The answer is found by solving this system. This method is used only in the last 
 few iterations because it is computationally involved. 
 
 The rate of convergence is not a clear concept. One never knows if 
 the process will accelerate just around the next bend. An analogy can be drawn 
 with the driver of an automobile who is attempting to pass a truck in the moun- 
 tains. Should he risk obliteration (infeasibility) by passing on a curve? Per- 
 haps it is more profitable to ride the curves out in the hope of nearing a 
 
39 
 
 straightaway. The answers to these questions are similar to those asked in 
 nonlinear programming. 
 
k. THE CM BUSED METHOD 
 
 ^■.1 Discussion of the Method 
 
 The methods described in the two previous chapters are now synthesized 
 into a working system to solve the widest possible range of applications. Both 
 methods are applied in solving most problems because the advantages are better 
 utilized in this fashion. Use of the combined method will increase the probabi- 
 lity of attaining the global optimum. This chapter describes the system as a 
 whole with the aid of the flow diagram in Appendix B. The first diagram is a 
 short summary of the Combined Method, which should be easily understood at this 
 point; the details of the algorithm are described in the subsequent flow diagrams. 
 The individual decisions in the diagrams are fully discussed, as are the particu- 
 lar conditions for termination, such as infeasibility, optimality, etc. The parts 
 of the flow diagram which involve parallel computations are mentioned in this 
 chapter and are described in the following chapter. Otherwise, the description 
 of the system is contained below. 
 
 As in the simplex method of linear programming, the Combined Method is 
 divided into two phases. However Phase I is only concerned with determining an 
 interior feasible solution instead of a basic feasible solution. When an array 
 computer is used in the computation, multiple interior points are determined. 
 Phase :: also predicts the infeasibility of the problem if it is unable to find 
 a solution in the feasible region. 
 
 Phase II begins with the interior solutions and attempts to find the 
 global optimum. The procedure remains within the feasible region unless the pro- 
 jected gradient points outside the region, in which case a correction is quickly 
 rendered. The procedure can therefore be stopped at any step with an improved 
 current solution. Many local optima are usually found which help determine the 
 global minimum by a comparison of them and choice of the smallest. The number of 
 local optima calculated is proportional to the probability of finding the global 
 
1+1 
 
 optimum. Hence more iterations of -he procedure gives better results with a 
 trade-off at some point, at which the increase in accuracy is equal in value to 
 the cost of calculating the increase. The locking test defines the trade-off 
 point and terminates at that position. 
 
 1+.2 Phase I of the Algorithm 
 
 An advantage of the Combined Method is the lack of need for an initial 
 basic feasible solution. A feasible point is needed, however, and an interior 
 point is helpful. Most nonlinear algorithms apply their procedure to an in- 
 feasible point to find the interior initial solution. Both techniques of the 
 Combined Method can employ this approach. 
 
 Phase I is crucial to the entire formulation because the final solu- 
 tion depends a great deal upon the initial solution. Nonlinear programming is 
 essentially a heuristic search which claims little more than the ability to 
 determine a stationary point --usually a minimum when the direction employed is 
 the negative gradient. The inherent uncertainty surrounding the solution can 
 be lessened by reapplying the procedure at different initial points, but the 
 fact remains that the global optimum cannot be found under certainty for non- 
 convex problems. Hence many optima are found in the Combined Method. 
 
 The author contends that the best procedure for Phase I is to pierce 
 the feasible region as much as possible, with the initial solution. ■ The 
 Sequential Method can take over from this interior point, and accuracy is im- 
 proved because the current solution has not been greatly altered. 
 
 One begins the procedure by dividing the constraints into two groups: 
 
 (1) large separable polynomials approximations of the constraints and 
 
 (2) smaller general constraints. The two groups are stored in different ways 
 so that the parallel capability of the computer is realized. The process is 
 defined in detail in the following chapter. In either case, many interior solu- 
 tions are determined in Phase I. 
 
1+2 
 
 The problem in Phase I is tj determine a vector X -which satisfies 
 the constraints 
 
 g.(X) > b. i = 1, . . • , m 
 i l 
 
 or 
 
 g.(X) - b. > . 
 
 The first step is to pick many arbitrary initial feasible vectors X , 
 by any process. One should use any previous information in picking these 
 vectors, otherwise they can be chosen by a random number generator, within 
 certain bounds. These vectors are substituted into the constraints and they 
 are divided into three sets — J, K, L. The next test branch in the flow diagram 
 is needed to handle the different forms of storage. Multiple optima are found 
 in two different fashions; these are listed under number 7 and 8 in the flow 
 diagram. 
 
 Set J = (a|g (Xj - b = 0} 
 
 Set K - {c|g c (X ) - b c > 0} 
 
 Set L = U| gi (X ) - h £ < 0) 
 
 The procedure forces J and L into becoming null sets, with the elements 
 goin g into set K so that the conditions are fulfilled. Select a series of solu- 
 tions which increases the first constraint of set L, without violating the con- 
 straints of set K. Set J is checked as the procedure continues and put into 
 set K or L appropriately. Using the unconstrained method, minimize 
 
U3 
 
 In-HOC^) = - Z lefa) - b/ ] +Pk E h c (y ck ) 
 
 i€L CGK 
 
 which is called the infeasible revised objective function. The negative sign 
 in front of the first summation is necessary because one wants to increase the 
 desired constraint, and this equation is equivalent to maximizing a positive 
 quantity. The minimization is performed in two separate ways depending upon 
 the existence of second partial derivatives. The Newton Method is used if the 
 derivatives exist; otherwise a conjugate direction method is used. The flow 
 diagram accounts a multi -model revised objective function through the global 
 test. This test reiterates the procedure for another minimum if it is not 
 satisfied with the solutions at that point. A sequence of iterations is per- 
 formed with a decreasing p . The sets are continuously checked for the proper 
 inequality during the sequence. A constraint of set L leaves the set when it 
 becomes positive and is put into the K set. The process halts when set L is 
 null. Set J is probably null at this time also. If it is not, the minimization 
 is performed on the elements of set J. The problem is infeasible if either set 
 L or J have elements when the minimum is found, provided the function is convex. 
 Otherwise, the procedure is returned many times to test multiple starting points 
 through the test until it is satisfied with infeasibility. At some point, the 
 answer at the test is negative and the algorithm halts. 
 
 The procedure assumes that the equality constraints have been converted 
 to inequalities by the formula of Chapter 1. If the conversion proves trouble- 
 some, the equalities can be ignored in Phase I but must be reckoned with sepa- 
 rately in the remainder of the algorithm. The interior and exterior methods co- 
 operate to handle equalities. Once multiple interior points have been calcu- 
 lated, the procedure switches to Phase II, which improves the solutions. 
 
kk 
 
 
 4.3 Feasibility Test 
 
 The previous section provides a test of feasibility if the constraints 
 are convex, and it can be applied to nonconvex problems through many iterations. 
 The author's proposal is to work with the equality constraints, which are usually 
 few in number. One limits the range of possible answers by finding a solution of 
 these equations. This method is costly in machine time and should be used only 
 if the method in the previous section fails two or more times. 
 
 The method finds any solution of the equation 
 
 Z [g.(X) - b.] 2 = 
 iej 
 
 for the equality constraints by minimizing this system. At the minimum, the 
 final vector is the solution of the system. The minimization utilizes any un- 
 constrained procedure, such as Steepest Descent or Newton Second Order Method. 
 Another technique is to approximate the function by any interpolation procedure. 
 When a root of the polynomial is found, check it against the equality constraints. 
 If the desired accuracy is not achieved, use a higher order interpolation formula. 
 
 This procedure is not entered on the flow diagram because the first 
 method is usually satisfactory, but it can be added quite easily at entry point 
 number 11 in the flow diagram. 
 
 k.k Phase II of the Algorithm 
 
 The second portion of the algorithm is the most important and the 
 longest with regard to machine usage. It begins with the multiple interior 
 point solutions and attempts to find the global optimum. In most situations 
 the global optimum cannot be found with certainty and one must be satisfied 
 with a good local optimum. The locking test is the crucial portion of Phase II 
 
k5 
 
 ause it determines the technique ij be used during Phase II. It also pre- 
 dicts the final stationary points and termination. A discussion of the test is 
 presented in the following section. 
 
 The first portion of Phase II to be examined is the Sequential Uncon- 
 
 ■ lined Method and it is located at entry point number 6. Many solutions are 
 found during this portion of the algorithm, and a division of the flow diagram 
 is made to show how each type of constraint finds these solutions. A test 
 divides the separable polynomial constraints and the general constraints. The 
 next stage is to find the penalty constant p , which very often is 1 for the 
 
 ri 
 
 first iteration. For the problems with general constraints many solutions can 
 be found simultaneously whereas a sequential loop is required for the problems 
 with separable polynomial constraints. Then the sequential approach is selected 
 and the penalty function is determined. These are fixed for all solutions during 
 application of the Sequential Unconstrained Minimization algorithm. The revised 
 objective function (functions for general constraints) is formed and minimized. 
 Many local minima of the revised objective function are also found in the same 
 fashion as above. Finally, the penalty constant is lowered and the procedure 
 is reiterated until a satisfactory improvement is obtained. The Sequential Un- 
 constrained Method does not usually find the final stationary solution because 
 the Projected Gradient Method has faster convergence near the end of the optimi- 
 zation procedure. 
 
 Many similar tests are performed during the Projected Gradient portion 
 of Phase II. One additional test is for interior points because the procedure 
 uses a different algorithm depending upon whether the solution is on the boundary 
 pr in the interior. Separable polynomial constraint problems again are handled 
 jin a different manner. Many optima are found during the procedure with both 
 portions of Phase II. Control is returned to the locking tests at this point. 
 
k6 
 
 U.5 The Locking Test 
 
 A recurring question in the system is the determination of the proper 
 technique to apply in Phase II. When does it become advantageous to switch to 
 the Projected Gradient Method instead of continuing the Sequential Method? An 
 estimate of the future rate of convergence is made. Obviously, certainty does 
 not enter into the prediction, and no exact answer exists. 
 
 Each method works better for various problems. Two benefits of the 
 Projected Gradient Method are the existence of linear constraints and of few 
 hyperplanes in the intersection at the optimum. On the other hand, the Sequen- 
 tial Method handles nonlinear constraints. To determine the appropriate method 
 to be used, the following scheme is suggested. 
 
 1. Determine an index which categorizes the type of problem according 
 to size and linearity and the user's preference. This index runs from zero, 
 which means a few very nonlinear constraints, to one which represents a large 
 number of quasi-linear constraints. 
 
 2. The locking test is made at various iterations to determine the 
 rate of convergence. A part of the test is made with the equation 
 
 ' - (f k ' f k+ iVf k ! 
 
 which tests for the rate of convergence. 
 
 3- The locking test keeps the current technique in position according 
 to the index. If it is zero, the Sequential Unconstrained Technique is used 
 alone. If the index is 1, the Projected Gradient Method is the only method 
 employed in the problem. 
 
 h. Indices between the extremes are weighed proportionately. The 
 algorithm quickly switches to the Projected Gradient Method when the index is 
 large and the convergence is small. Conversely, the Sequential Unconstrained 
 Method is used often when the index is small. 
 
*7 
 
 5. The locking test also determines the final stationary optimum. 
 It must analyze the previous information and predict the termination point. 
 
 Global -Local Conditions 
 
 The global-local difficulty has been mentioned often in this paper, 
 nonconvex problems there is no certainty that the final solution is the global 
 optimum. For many types of problems, this condition is usually not serious be- 
 cause a close approximation is adequate. Many precautions are taken in the 
 Combined Method to overcome this problem. For example, multiple interior 
 points are found in Phase I which lead to many local optima during the procedure. 
 Even in the minimization of the revised objective function, many local optima 
 are determined to increase the probability of finding the global solution. Few 
 algorithms have the flexibility of the Combined Method built into the system for 
 finding the global optimum. This useful capability is one of the advantages of 
 the Combined Method. 
 
 The power of the simplex method in linear programming stems from the 
 ability to locate the global optimum under certainty. Some types of problems 
 are better solved by linear approximations because of the global conditions and 
 ;he cutting plane method uses this idea; however, the approximation leads to 
 Lnfeasible solutions in many situations. Perhaps nonlinear programming will also 
 tichieve the capability of the simplex method someday. Until that time, methods 
 -ike the Combined Method must be used. 
 
kS 
 
 5. IMPLEMENTATION ON ILLIAC TV 
 
 5-1 Parallel Computations 
 
 The purpose of this chapter is to present some of the difficulties 
 encountered in actual implementation of the Combined Method on an array computer 
 such as ILLIAC IV. Solutions for some of these difficulties are described and 
 alternative suggestions are mentioned. The parallel portions of the computa- 
 tional algorithm are fully explained and storage schemes are developed. All 
 nonlinear programming problems to be solved by the Combined Method are divided 
 into two groups: (a) the problems with a large number of constraints which 
 are linear to a great extent, and (b) the smaller problems with a more general 
 form of constraints. Storage schemes are developed for each group to handle 
 particular characteristics as explained below. 
 
 ILLIAC IV is a large scale array computer which is divided into four 
 quadrants with 6h processing elements (PE) and 6k processing element memories 
 (PEM) per quadrant. The Combined Method is designed to fit into one quadrant 
 but can be expanded at a later date. A quadrant has one control unit (CU) which 
 broadcasts simultaneous instructions to the PEs. Individual PEs operate only if 
 the mode is correctly set. The idea is to keep a maximum number of PEs operating 
 to enhance the efficiency of the machine. All computer installations are con- 
 cerned with efficiency and ILLIAC IV is no exception. This is emphasized by the 
 fact that 6*+ PEs must wait for an input-output request. Parallel calculations 
 improve this efficiency. 
 
 Any algorithm can be coded in parallel to some degree; however, certain 
 techniques are better than others. Much of the time, the Projected Gradient 
 Method builds on previous steps. The Sequential Unconstrained Method also uses 
 a great deal of previous information during the sequence. Portions of the method, 
 such as the locking test, are not so satisfactory for parallel computations. 
 Nonetheless, nost of the algorithm can be successfully implemented. 
 
h 9 
 
 Storage Methods 
 
 The division of the constraints into two groups has been discussed 
 Ler in the paper. This division is made to better utilize the parallel 
 capabilities of the machine. Tt is possible to divide the types of problems 
 according to many criteria other than that used in the Combined Method. Perhaps 
 the user wishes to refrain from approximating the constraints in a large problem. 
 He is faced with the difficulty of storing the code on the disk and waiting for 
 retrieval of the information. Depending upon the particular type of problem, 
 he must devise some method which improves efficiency. These kinds of problems 
 are not examined here because of the enormous variety of the forms of large 
 general constraints. Also these problems are not common because of the effort 
 required in setting them up. 
 
 The author does not discount all large size nonlinear problems; how- 
 ever, these are limited to those which can be approximated by separable poly- 
 nomial equations. These have the form 
 
 r 
 
 g, (X n ) - b n = Z (a .x . + . . . + a, .x . + a . ) 
 
 & l v 1 J 1 . v nj j lj j oj' 
 
 where n is the highest order of the polynomials. The objective function can 
 possess any reasonable form, but the constraints must have the above formulation. 
 Linear programming constraints have this form and are clearly a subset of the 
 separable polynomials. Large size constraints which do not possess this shape 
 are approximated by n-th order polynomials before the procedure is run. The 
 storage of the separable polynomials is a variation of the "strewing" method 
 devised for the ILLIAC IV LP algorithm with the columns and rows interchanged. 
 The method assigns coefficients of the columns of the constraints depending 
 upon the number of elements. The most dense column enters PE n , the next dense 
 into PE , and proceeds to PE^- where the order of assignment is reversed. The 
 densest remaining column enters PE/-~ and the process returns to PE~. The 
 
50 
 
 overlapping of columns is continued 'mull no coefficients remain. Next, the 
 PEs are sorted according to rows. Each coefficient is stored in 32 hit half- 
 words with the power and the column and row tags in the remaining portion of the 
 word. This storage scheme has many advantages: (l) the rows can be calculated 
 in parallel and the columns are accessible if needed; (2) the number of elements 
 per PE is approximately equal; (3) only the nonzero elements are stored, which 
 saves space because of the sparseness of large problems. The general objective 
 function is stored in a manner sijnilar to the method below. 
 
 The general constraints are stored in a different fashion than the 
 separable polynomials. For example, the constraint 
 
 In x r 
 
 -1 """** "2 -1 
 g 1 (X k ) = sin (x ± x^ + esc x^) , 
 
 must be stored in some fashion. It is translated into code which is entered in 
 a straight storage scheme. 
 
 PEM 
 
 
 
 
 PEM 1 
 
 instl' 
 
 inst2 
 
 
 inst3 inst^ 
 
 inst ' 
 129 1 
 
 
 
 
 x n 1 
 
 X 12 
 
 
 X 13 \ X lk 
 
 x 21 1 
 
 1 
 
 X 22 
 
 
 X 23 \ X 2h 
 
 PEM 63 
 
 inst ' inst 
 127 | 128 
 
 X l,127j X l,l28 
 X 2,127j X 2,128 
 
 Figure 9* A straight storage scheme. 
 
51 
 
 opt. To realize the parallel cap- , 'he 
 algorithm calculates many simultaneous vectors. Tli liagram Lustrates 
 this fact. Thr data now is the generated matrix of ti -h 
 
 Ls aiso illustrated in Figure 9. The first subscript represents the position 
 
 Mi in a solution, and the second subscript designates the current solution; 
 for example, X is the first position within the k-th current solution. In 
 
 it cases 6k solutions are computed together, which helps improve the global- 
 local difficulties. 
 
 These two diverse storage schemes represent two trains of thought. 
 The first idea is that large problems are so unruly that any reasonable answer 
 Is usually sufficient; hence the parallel capabilities are used to achieve this 
 solution. The algorithm gives better solutions by reiteration as often as the 
 user wishes. Problems with a smaller number of general constraints are handled 
 
 a different philosophy. The parallel capability is used to predict the global 
 optimum by calculating a multitude of local optima simultaneously and its advan- 
 tage lies in this concept. 
 
 5.3 Disk Delays 
 
 Although the ILLIAC IV is designed with an extremely fast disk storage, 
 it is slow relative to the speed of computations of the machine. The delays for 
 input-output of information must be minimized, of course, but the latency time 
 also must be utilized. In other words, the time when the computer is requesting 
 information must be used for other calculations. The locking test is ideally 
 suited for this delay purpose because it is performed between iterations. The 
 locking test determines the subsequent method to be used while the data is being, 
 read into memory. This procedure improves efficiency because the locking test 
 is a sequential operation for the most part and it would slow the algorithm if 
 used elsewhere. 
 
52 
 
 The Combined Method has relatively few iterations because it attempts 
 to maximize improvement of the objective function at each step. In this respect, 
 it is well suited to an array computer because it minimizes the difficulty men- 
 tioned heretofore. Problems with few iterations, even though the computations 
 per iteration might be complicated, are much more appropriate to the ILLIAC IV. 
 The Combined Method fulfills these requirements. 
 
 5«*+ Computational Procedure 
 
 The algorithm has been explained above, but some of the more elementary 
 calculations have not been discussed. This section attempts to alleviate this 
 omission. 
 
 The heart of the Unconstrained Technique is the incorporation of the 
 constraints into the objective function which results in an unconstrained prob- 
 lem. This conversion is a likely target for parallel calculations, and the in- 
 verse penalty function is nice in this respect. Recall the form of the inverse 
 revised objective function 
 
 m 
 
 H( W - f k + \ .Y /g i ( V 
 
 i=l 
 
 To determine the revised objective function, the algorithm computes the constrain 
 vector simultaneously across processing elements for separable constraints, 
 broadcasts p to all elements, multiples, and finds the sum within each PE and 
 then globally across the PEs. The conversion takes place frequently and is 
 therefore important. For general constraints multiple solutions are determined. 
 
 The computation of the logarithmic penalty function also is an easy 
 task to perform in parallel. The familiar relationship 
 
 in x + £n y = £n xy 
 
53 
 
 used to advantage. Substitution f the vector X Lnto the constraints is 
 
 milarl.v bo the Inverse penalty function. The operands are multiplied, 
 i the natural logarithm is taken. 
 
 A most important portion of the Combined Method is the computation of 
 Lient. It is the central idea in the Projected. Gradient Method and is 
 it useful and necessary in the first portion of the Combined Method in mini- 
 an unconstrained function. The gradient is defined by the equation 
 
 Vf(X) = (df/tfx^ . . . , df/dx ) 
 
 and the partial derivatives are usually found through the approximation 
 
 df/o^ = (f(x^ + D) -f^ - D))/2d 
 
 where D = (d, 0, . . . , 0) 
 which is called the central difference equation. It has better accuracy than 
 the forward or backward difference equations. Of course exact derivatives are 
 used when they are easily calculated, as in polynomials. Here the derivatives 
 are fed as input into the system. The direction of the gradient is normalized 
 by the equation 
 
 = [(df/d Xl ) ,.".., (df/dx n )]/a , 
 
 l2~ 
 
 a = 
 
 n 
 
 2 
 
 The Sequential and Projected Methods have separate means of termination 
 at a local stationary point. Necessary and sufficient conditions show that both 
 of these methods usually arrive at an optimum in the limit. The Sequential 
 Method is especially weak when the final solution occurs at a boundary. Hence 
 the projected gradient test for an optimum is used instead. This test takes the 
 
5U 
 
 form 
 
 6 < II P k W k 
 
 and 
 
 R.. < i = 1. . . . , s 
 
 where 5 is the tolerance presented heretofore. When these conditions are ful- 
 filled, the current solution is no more than a distance 8 from the local opti- 
 mum. The first equation is computed sequentially in the locking test and com- 
 pared against the constant. Time is lost in many of the elements but not a 
 great deal because of the shortness of this computation. The Rs can be computed 
 in parallel, on the other hand. The rows and columns of the matrices can be 
 operated on together for calculation of the R-vector. One test is made to deter- 
 mine if any are negative. 
 
 5. 5 Size Limitations 
 
 The number and magnitude of the constraints determines the size of the 
 nonlinear problem. The system can be quite large for the quasi-linear con- 
 straints. In fact nonlinear programming techniques might be used successfully 
 in linear programming. The storage would be the same as in the previous section 
 for separable constraints. At the present time it is difficult to determine the 
 feasibility of this proposal. The gradient method cuts across the feasible re- 
 gion, in contrast to the simplex method which follows the boundaries at basic 
 feasible solutions. The advantage- of the gradient method will be determined when 
 a comparison of the number of iterations is developed. 
 
us on , lonlinear 
 The :i ms1 raints are ston code, 
 
 t\ uit.-i i i mall nonlinear problems is the con- 
 
 :ept of "multiple search". he process is described i mpl one-dimen- 
 Lrch below. The curve of Figure 10 has many peaks. How does as 
 id the global optimum for this function? For an array computer, a multiple 
 
 jorithm Ls ested. Although the example is elementary, the uses are 
 
 Figure 10. Multimodal functions. 
 
 many; for example, the search for the optimum step length along the negative 
 gradient direction is a one -dimensional search. The procedure first determines 
 the boundaries of the zone of uncertainty. The zone is divided into 6k smaller 
 intervals and the values of these points are found in parallel. A sweep of the 
 
56 
 
 6U functional values decides upon the number of peaks in the zone. In the 
 example three valleys are found. The process now switches to the sequential 
 mode in which each of the three valleys are closed upon together. If the inter- 
 val had been larger, the simultaneous search would have been performed a few 
 more iterations. By this method, many local minima are found. When the zone 
 of uncertainty has been closed sufficiently the optima are compared for the best 
 value. The probability of finding the global optimum is greatly increased. 
 This same idea can be applied to more complicated searches such as the Sequen- 
 tial Unconstrained Minimization Technique. Parallel computations make this task 
 worthwhile and profitable. 
 
57 
 CON LUSION 
 
 This paper has examined various algorithms to d bhe most 
 Le to utilize on the array computer ILLIAC IV. Marry of these methods, 
 s the Cutting Plane Technique, were rejected because of the lack of 
 at ions for nonconvex problems. Other methods are rejected because of 
 the slowness of convergence for general problems. The best methods are the 
 Sequential Unconstrained Minimization Technique and the Projected Gradient 
 Method. These methods work in nonconvex regions, have good convergence pro- 
 perties, and augment each other quite effectively. For these reasons, the two 
 algorithms are combined into a single system for solving nonlinear problems. 
 
 Detailed questions about the Sequential Unconstrained Technique were 
 examined and answered. The Projected Gradient Method was presented in depth, and 
 the difficulties examined. The actual implementation of the Combined Method was 
 examined, and comments injected where applicable. The parallel capabilities of 
 portions of the method were brought out as were questions concerning data mani- 
 pulation. 
 
 Many results have been discovered through the study. Some of the more 
 useful are presented below: 
 
 1. Nonlinear programming theory is a hodgepodge of particular examples; 
 hence, one method is not acceptable in a general nonlinear system. 
 
 2. The Sequential Unconstrained Method is the most straightforward of 
 all existing theories. It also has many computational advantages. 
 
 3. Although the Projected Gradient Method is quite involved, the 
 extra calculations often accelerate convergence. 
 
 k. A combination of these two methods gives wide latitude in the 
 range of nonlinear problems which can be solved. 
 
 5- A test, which the author calls the locking test, must be developed 
 for controlling the method being used. 
 
58 
 
 6. The Combined Method appears to be the best system for implementa- 
 tion on an array computer. 
 
 Any system for solving nonlinear programming problems is very incom- 
 plete at best. Hence many limitations are present when the design is finished. 
 To compensate for the unexpected deficiencies one must allow for an evolutionary 
 process to take place. The system must change as particular difficulties or un- 
 expected types of applications occur. The Combined Method leaves room for 
 future changes in many places. For example, the user can supply an additional 
 technique, such as the Cutting Plane Method, if certain portions of the feasible 
 region display linear characteristics. This kind of detail must be developed 
 after the design has been implemented, that is, if the design of the algorithm 
 makes provision for these additions. 
 
 The author believes that the problems with separable constraints will 
 be the first application of the Combined Method. Linear programming problems, 
 which are a subset of the class of separable polynomials, might be better solved 
 with a nonlinear technique for very large size problems; quasi-linear problems 
 are included in this framework. 
 
 Problems with general constraints are the least defined and will there- 
 fore be the last application of the Combined Method to be fully realized. None- 
 theless much progress can be made with these applications by minimizing the 
 global-local difficulties. Effort should be initiated in this area. 
 
LIST 0] 
 
 [l] Hrown, P.M., and Ang, A.H., "Structural Optimization by Nonlinear 
 Programming", Journal of the :' ■ i-'i ri.ural Pivision , ASCE, %2 
 (1966), pp. 3l9-3 1 +o. 
 
 [2] Pavidon, W.C., "Variance Algorithm for Minimization," The Computer 
 Journal , 10 (1968), pp. U06-410. 
 
 [3] Porn, W.S., "Non-Linear Programming-A Survey," Management Science , £ 
 (1963), pp. 171-208. 
 
 [hi Fiacco, A.W., and McCormick, G.P., "Programming under Nonlinear 
 Constraints by Unconstrained Minimization: A Primal-Pual 
 Method," Technical Paper RAC-TP-96, Research Analysis Corpora- 
 tion, McLean, Va«, 1963* 
 
 [5] Fiacco, A.W., and McCormick, G.P., Nonlinear Programming : Sequential 
 Unconstrained Minimization Technique , John Wiley, New York, 1968. 
 
 [6] Fletcher, R., and Powell, M.J. , "A Rapidly Convergent Pescent Method 
 for Minimization," The Computer Journal , 6 (1963), pp. 163-168. 
 
 [7] Fletcher, R., and Reeves, CM., "Function Minimization by Conjugate 
 Gradients," The Computer Journal , J_ (196U), pp. 1U9-I5U. 
 
 [8] Hadley, G., Nonlinear and Pynamic Programming , Addis on -Wesley, 
 Reading, Mass., 1964. 
 
 [9] Kelly, J.E., Jr. "The Cutting Plane Method for Solving Convex Programs," 
 J.SIAM 8 (I960), pp. 703-71 i +- 
 
 [10] Kuhn, H.W., and Tucker, A.W., "Nonlinear Programming," Proceedings 
 
 of the Second Berkeley Symposium on Mathematical Statistics and 
 Probability , U. of California Press, Berkeley (1951), 
 pp. 481-1+93- 
 
 [ll] Rosen, J.B., "The Gradient Projection Method for Nonlinear Programming, 
 Part I: Linear Constraints," J.SIAM , 8 (i960), pp. 181-217- 
 
 [12] Rosen, J.B., "The Gradient Projection Method for Nonlinear Programming, 
 Part II: Nonlinear Constraints," J.SIAM , £ (1961), pp. 51^-532. 
 
 [13] Wilde, P. J., Optimum Seeking Methods , Prentice-Hall, Englewood Cliffs, 
 N.J., ±9~bh. 
 
 [l4] Wolfe, P., "Methods of Nonlinear Programming," in R.L. Graves and 
 P. Wolfe (Eds.), Recent Advances in Mathematical Programming , 
 McGraw-Hill, New York, 1963, pp. 67-86. 
 
 [15] Zoutendijk, G., Methods of Feasible Pirections , Eisevier, Amsterdam, 
 i960. 
 
 [l6] Zoutendijk, G-, "Nonlinear Programming: A Numerical Survey," SLAM J . 
 Control , h (1966), pp. 194-210. 
 
60 
 
 APPENDIX A 
 
 CALCULATIONS FOR THE EXAMPLE PROBLEM 
 OF THE PROJECTED GRADIENT METHOD 
 

 Minimize 
 
 subject to 
 
 Note that 
 
 Setting 
 
 "3 X 1 " 
 
 §1 (X) = -x 2 ± /k + 2x ± - x 2 > 
 
 g 2 (x) = Xl x 2 - h > o 
 
 g 3 (X) = x x > 
 
 gl+ (X) = x 2 > . 
 
 Vf k = [-3,-1] 
 
 V gl (X) = [- Xl /2 + 2,-1] 
 Vg 2 (X) = [x 2 , Xl ] 
 
 X x = [2,3] 
 
 g 1 (X 1 ) - , g 2 (X 1 ) = 2 > 
 
 u x = [vg 1 (x 1 )] 
 
 = [1,-1] 
 
 U 1 U 1 " 2 
 
62 
 
 v l = ^iv" 1 = - 5 
 
 R l ■ w*! ■ - 1 
 
 U^ = [-1,1] 
 
 ? i^i = ^i - U A = [ " 2 '- 2] 
 
 Move in the direction of projected gradient P-,^-, • Let d = 2.0 . Now 
 
 N 
 
 l = ^l/H ^l II = [--707, -.707] 
 
 Y = X + d-^ = [0.586,1.586] 
 W(Y X ) = g 1 (Y 1 ) = [-0.5] 
 
 correction = YA^n-1^ = ^--25, -25D 
 
 Y = Y ± - correction = [0.836,1.336] 
 
 W(Y 2 ) = [0.161] 
 
 correction- = UVW n (Y . ) = [.756.I.U16] 
 
 2 l v m-1 ' 
 
 W(Y ) = -O.0U7 
 
 and continue until 
 
 gl (x) < .005 . 
 
 The solution thus obtained is not very good due to the arbitrariness with which 
 the value d = 2 was selected. 
 

 Adding a constraint 
 
 if X 1 = [2,3] , then g 2 (X) = 2 > 
 
 if X 2 = correction g (X) = -2.93 
 
 By interpolation 
 
 d = j£j£ X 2.0 = .81 
 
 Y ± = X + d-N x = [1.1+28,2.428] 
 
 W(Y 1 ) = [-0.081] 
 
 correction = UW(Y ) = [-.o4o,.o4o] 
 
 Y 2 = Y l " correcti °n = [1.468,2.388] 
 
 W 1 (Y 2 ) = (.014) 
 
 W 2 (Y 2 ) = (-.49) 
 "by continuing the above process one finds 
 
 x 2 = [1.578,2.534] 
 and 
 
 gl (x 2 ) = g 2 (x 2 ) = 0. 
 
6k 
 
 
 Now check for the optimum at X p . In this case U. is a 2 X 2 nonsingular matrix, 
 in which case 
 
 since 
 
 P - I - V^VS " ° ' 
 
 Hence 
 
 (U 2 U 2 rl 'UjVg) -1 
 
 P 2 ^f 2 II - 
 
 R 21 <0 
 
 R 22 < o , 
 
 therefore X is a local optimum. 
 
6s 
 
 APPENDIX B 
 
 FLOW DIAGRAMS OF THE 
 COMBINED METHOD 
 
66 
 
 CjElJ 
 
 read 
 input 
 
 ^problem 
 
 Phase T 
 
 Summary of the Flow Diagram 
 
 infeasible 
 
 sequential 
 
 unconstrained. 
 
 method 
 
 print 
 interior 
 solutions 
 
 no 
 
 solution 
 
 V - print 
 \Lnfeasible 
 ~*\ problem 
 
 f Halt J 
 
 & 
 
 Phase II 
 
 locking 
 test 
 
 final 
 station> 
 
 ary 
 :olution 
 
 print 
 answer: 
 
 -U Halt J 
 
 yes 
 
 find a local 
 stationary 
 point 
 
 find a 
 
 better 
 
 solution 
 
 print 
 results 
 

 ( lrt J 
 
 l-'h.-i.-.i' I oi' Mi" Al^. )i-i thin 
 
 store 
 general^ /general 
 
 [constraints 
 ^across PEs' 
 
 separable 
 
 find separ- 
 able poly- 
 nomial 
 approximations 
 
 store 
 
 coeffi- 
 cients and 1 
 flag 
 
 €> 
 
 k=l 
 
 
 
 find 
 
 k-th 
 
 solution 
 
 find 
 
 multiple 
 
 k-th solutior 
 
 no 
 
 & 
 
 yes. 
 
 K? 
 
 no 
 
 divide 
 constraints 
 into 3 sets 
 
 yes» 
 
 ■V 
 
Separable Polynomials 
 
 68 
 
 Cterap = k ^ 
 *=1 J 
 
 find k-th 
 f. 17 J penalty p R 
 
 form k-th re 
 vised objec- 
 tive function 
 
 Gh 
 
 yes 
 
 minimize by 
 second order 
 method 
 
 minimize by 
 
 conjugate 
 directions 
 
 yes 
 
 form another 
 
 starting 
 
 vector 
 
 <<? 
 
 yes 
 
 -+f k = k+1 \ 
 
 minimum 
 
 is now the 
 
 k-th solution 
 
 conver- yes 
 
 form another 
 starting 
 vector 
 
 J 
 
 ^ 
 
 -+fk = temp + lj "4 k \ 
 
69 
 
 
 
 Ctemp - k ^ 
 
 * — . find k-th 
 (i,.V* penaltj 
 V_y natant p 
 
 General Constraints 
 
 /second \ 
 
 ves m 
 
 minimize by 
 Newton Method 
 
 
 ^^derivatives' 
 
 
 '6U solutions) 
 
 
 Ino 
 
 
 
 
 
 minimize by 
 conjugate 
 directions 
 
 
 pick best of 
 
 J 
 
 
 
 
 *v 
 
Infeasibility Test 
 
 70 
 
 .Jeparable\ ves i — -— i 
 )olynomials^ , r?B K k J 
 
V 
 
 Phase II of the Algorithm 
 
 Phase II 
 
 find index 
 for locking 
 test 
 
 locking test 
 for 
 convergence 
 
 yes 
 
 PCM 
 
 print 
 answers 
 
 projected 
 gradient 
 
 •C Halt J 
 
 V 
 

 
 72 
 
 projected 
 
 gradient 
 
 method 
 
 \~~® 
 
 yes 
 
 *^p 
 
 yes 
 
 (temp = k "\ 
 * = * y 
 
 /araoj.e\ ><• ■ ■->. 
 
 ajolynom-yi^ — ^( temp = k A 
 
 find all 
 Vf, 
 
 find all 
 
 step 
 
 lengths 
 
 move to 
 
 next 
 solutions 
 
 V 
 
 find 
 Vf, 
 
 find step 
 length and 
 move there 
 
 k = temp + 
 
 
 
 ) 
 
 fk = k+1 J C k = k+1 J 
 

 
 73 
 
 find multiple 
 P k W k = ^ k 
 
 find 
 V*k 
 
 find best 
 
 step 
 
 length 
 
 move m 
 
 that 
 direction 
 
 find all 
 
 step 
 
 lengths 
 
 move 
 toward 
 region 
 
 move to 
 
 next 
 
 points 
 
 project 
 back into 
 region 
 
 compare 
 
 all 
 
 solutions 
 
 pick best 
 solution 
 
 (k = temp + 1 ) 
 
 <77777) <*? 
 
lh 
 
 sequential 
 
 unconstrained 
 
 method 
 
 yes 
 
 find Pl 
 
 determine 
 approach 
 
 28 
 
 32 
 
 28 
 
 find 
 
 penalty 
 
 function 
 
 
 form 
 revised 
 
 
 objective 
 function 
 
 
 
 
 
 find all 
 
 V s 
 
 determine 
 approach 
 
 determine 
 
 penalty 
 
 function 
 
 <temp = k j 
 * ° 1 J 
 
 find multiple 
 minima 
 for k 
 
 pick best 
 minimum 
 
 c 
 
 k = k + 1 
 
 minimize 
 unconstrained 
 function 
 
 yes 
 
 no 
 
 ^ 
 
 es 
 
 ^} 
 
UNC] \ ' 
 
 Sccunly Cluwmfi , ■ a itioi n 
 
 DOCUMENT CONTROL DATA R&D 
 
 (Security cl...,l,ca,lon ol ,1,1a. body ol mnmlrmc , mnd , ndm ,„, .„„„,.„„ muml „. .„,.,.„ „» „, ,„. fJ ( f- _ ,„. 
 
 °' NA ( TINC «"-/MCon»o,. ( ..u,^ U.. REPORT. LC» W .,Y C, ASM,., a,, J 
 
 nent of Computer Science UNCLASSIFIED 
 
 rsity of Illinois at Urbana-Champ 
 . ill Lnois 6lv' 
 
 MPOKT TITLE 
 
 26. CROUP 
 
 'NEAR PROGRAMMING ALGORITHM FOR AN ARRAY COMPUTER 
 
 descriptive notes (Type of report and Inclualre datum) 
 »UIHORISI(F/riln«i., middle Initial, la at name) 
 
 chael Mulvey 
 
 
 
 
 "t^ORT DATE "~~ ~" ~ ~" 
 
 dber 22, 1969 
 
 7J). TOTAL NO. OF PACES 
 
 83 
 
 76. NO. OF REFS 
 
 16 
 
 . CONTRACT OR GRANT NO 
 
 US AF 30(602)1+11+1+ 
 
 . PROJEC T NO. 
 
 - , 6-15-305 
 
 9a. ORIGINATOR'S REPORT NUMBER(S) 
 
 DCS Report No. 357 
 
 
 »b. OTHER REPORT NOISI (Any other numbers that may be amaloned 
 thlm report) 
 
 DISTRIBUTION STATEMENT 
 
 
 
 ualified requesters may obtain copies from DCS. 
 
 SUPPLEMENTARY NOTES 
 
 NONE 
 
 ABSTRACT 
 
 12. SPONSORING MILITARY ACTIVITY 
 
 Rome Air Development Center 
 Griffiss Air Force Base 
 Rome, New York I3I+I+0 
 
 This paper describes an algorithm for implementing a nonlinear pro- 
 gramming system on the array computer ILLIAC IV. A brief review of existing 
 algorithms is made to determine the feasibility of using a particular method. 
 The choice for the ILLIAC IV is a combination of the sequential unconstrained 
 minimization technique and the projected gradient method. The advantages of 
 this combined method are discussed. Problems associated with implementation 
 of the algorithm on the machine are discussed, and an overview of the system 
 is presented. 
 
 D , F r..1473 
 
 UNCLASSIFIED 
 
 Security Classification 
 
UNCLASSIFIED 
 
 Security Classification 
 
 KEY WO R DS 
 
 non-linear programming algorithm 
 array computer 
 
 ILLIAC IV 
 
 sequential unconstrained minimization technique 
 
 projected gradient method 
 
 RO LE WT 
 
 UNCLASSIFIED 
 
 Security Classification