Report No. 357

A NONLINEAR PROGRAMMING ALGORITHM FOR AN ARRAY COMPUTER

by

John Michael Mulvey

October 22, 1969

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

*This work was supported in part by the Advanced Research Projects Agency as administered by the Rome Air Development Center under Contract No. US AF 30(602)4144 and submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, September, 1969.

ABSTRACT

This paper describes an algorithm for implementing a nonlinear programming system on the array computer ILLIAC IV. A brief review of existing algorithms is made to determine the feasibility of using a particular method. The choice for the ILLIAC IV is a combination of the sequential unconstrained minimization technique and the projected gradient method. The advantages of this combined method are discussed. Problems associated with implementation of the algorithm on the machine are discussed, and an overview of the system is presented.

ACKNOWLEDGMENT

The author would like to express appreciation for the help rendered by Professors R. S. Northcote and I. W. Marceau in the development of the report. Thanks are also due Miss Sharon Dunlavey for her kind encouragement in organizing the paper. Finally, Mrs.
Patricia Douglas must be thanked for her patience in typing the manuscript.

The results recorded herein were obtained in the course of research studies developed under the auspices of the ILLIAC IV project in the Department of Computer Science, University of Illinois. This paper would not have been possible without this support.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 The Object and Scope of the Study
   1.2 Organization of the Report
   1.3 Statement of the Problem
   1.4 A Review of the Existing Algorithms
   1.5 A Description of the Combined Method
2. THE SEQUENTIAL UNCONSTRAINED MINIMIZATION TECHNIQUE
   2.1 Formulation of the Method
   2.2 Penalty Functions
   2.3 Interior Points
   2.4 Exterior Points
   2.5 Unconstrained Algorithms
   2.6 Advantages
3. THE PROJECTED GRADIENT METHOD
   3.1 Review of the Problem
   3.2 The Algorithm
   3.3 Proof of the Projected Gradient Formula
   3.4 An Example Problem
   3.5 Step Length
   3.6 Zigzagging
   3.7 Convergence
4. THE COMBINED METHOD
   4.1 Discussion of the Method
   4.2 Phase I of the Algorithm
   4.3 Feasibility Test
   4.4 Phase II of the Algorithm
   4.5 The Locking Test
   4.6 Global-Local Conditions
5. IMPLEMENTATION ON ILLIAC IV
   5.1 Parallel Computations
   5.2 Storage Methods
   5.3 Disk Delays
   5.4 Computational Procedure
   5.5 Size Limitations
6. CONCLUSION
LIST OF REFERENCES
APPENDIX
   A. CALCULATIONS FOR THE EXAMPLE PROBLEM OF THE PROJECTED GRADIENT METHOD
   B. FLOW DIAGRAMS OF THE COMBINED METHOD

LIST OF FIGURES

1. An illustrative example of the interior points approach
2. An illustrative example of the exterior points approach
3. The correction of the projected gradient back into the feasible region
4. Dropping a constraint in the Projected Gradient Method
5. Representation of the projected gradient formula
6. A sample problem of the Projected Gradient Method
7. The optimum in the Projected Gradient Method
8.
An example of zigzagging
9. A straight storage scheme
10. Multimodal functions

NOTATION

A = approximation to the Hessian matrix
C = vector of components of the gradient of the objective function
d = step length
D = direction vector
E = symmetric s by s matrix
E^r = Euclidean r-dimensional space
f_k = current value of the objective function, f(X_k)
g_i(X) = i-th constraint equation
G = hypersurface of the intersection of s active constraints
h(Y) = penalty function
H(X, p) = revised objective function
i = index for constraints
H̄(X, p) = infeasible revised objective function
j = index for activities
J = set of equality constraints
k = current solution index
K = set of inequalities greater than zero
l = length of feasible region
L = Hessian matrix
L̄ = set of inequalities less than zero
m = total number of inequalities
n = total number of activities
N = unit vector which points in the direction of the projected gradient
p = penalty constant
P = projection matrix
P∇f = projected gradient
q = slack variables in the exterior approach
R = test vector in the projected gradient method
r = total number of constraints
s = total number of active constraints
U = matrix used in the Gradient Projection Algorithm
V = symmetric s by s matrix
X = current solution vector
X* = local optimum
Y = correction into the feasible region
α = test for the projected gradient method
β_ik = i-th position of the β_k vector
γ = test for rate of convergence
δ = measure of closeness to a hypersurface
ε = step length in the conjugate direction method
∇ = gradient, a column vector of first partial derivatives
∇² = matrix of second partial derivatives
Δ = forward difference operator
‖·‖ = Euclidean norm
' = transpose of a matrix

1. INTRODUCTION

1.1 The Object and Scope of the Study

The purpose of this investigation is to establish a nonlinear programming system for the parallel array computer ILLIAC IV. The nonlinear system is to supplement the linear system now being developed.
Its capabilities range from small, decidedly nonlinear problems to large quasi-linear problems. The larger problems are bounded at roughly ten thousand constraints, while the smaller, highly nonlinear problems rarely exceed one hundred constraints. This wide variance makes it difficult to apply one of the existing algorithms.

This report presumes an understanding of the concepts of ILLIAC IV and mathematical programming. The uninformed reader should refer to the list of references for background material; however, the author recommends a prior knowledge of linear programming.

At present, nonlinear programming is a hodgepodge of specific examples. A unified theory has not yet been devised and appears unlikely to appear in the near future. Particular methods have arisen which solve very special problems. The system designer must account for this diversity and plan accordingly. The path is not an easy one and many questions must be answered. What is the range of problems that the system will solve? Is the local optimum satisfactory? How stable is the stationary solution? These types of questions are examined in this report.

The scope of the system is not limited to convex programming. An attempt is made to find many local optima so that the probability of determining the global optimum is increased. Since no one has discovered a technique for finding global conditions for general nonconvex problems, this deficiency must be accepted. Parallel computing handles the problem nicely, as is demonstrated in a subsequent chapter.

1.2 Organization of the Report

The remainder of this chapter presents the nonlinear programming problem per se. A review of the existing techniques is made, and the procedure chosen is the best with regard to the range of applications and the structure of ILLIAC IV. The advantages and disadvantages are briefly explained.

Chapter 2 describes the Sequential Unconstrained Minimization Technique (SUMT) in detail. Various penalty functions are examined and the differences weighed. The interior and exterior theories are discussed, as well as problems with convergence. Unconstrained methods are examined and an example is presented.

Chapter 3 involves the projected gradient method. The essential formula is proven, and a sample problem is shown. Problems with step length and zigzagging are pointed out, and a search for the best step length is covered.

The Combined Method algorithm is discussed in Chapter 4. The implementation of the algorithm on ILLIAC IV is discussed in Chapter 5. The efficiency of machine usage is observed with an eye to the rate of convergence. The details of parallel computations are discussed so that a realistic code can be programmed. The coding requirements, such as storage schemes, are presented and their importance in an array environment is stressed.

The results of the study are listed in the summary. Finally, the report ends with an overview of the system and a recommendation for future implementation.

1.3 Statement of the Problem

The general mathematical programming problem is to minimize the objective function in n variables

    w = f(X_k) ,    X_k = (x_1k, ..., x_nk)

subject to the constraints

    g_i(X_k) >= b_i    i = 1, ..., m
    g_i(X_k) = b_i     i = m+1, ..., r
    b_i >= 0

which form the feasible region in r-dimensional Euclidean space. It is usually unnecessary to add slack or artificial variables in nonlinear problems, in contrast to the simplex algorithm of linear programming. The equality constraints can be changed into inequalities by adding the following constraints

    g_i(X_k) - b_i >= 0
    b_i - g_i(X_k) >= 0

but often they should be dealt with separately. In this report, the vector X_k is the k-th current solution. The objective function f and the constraints g_i (i = 1, ..., r) can be either linear or nonlinear, but they must be continuous and possess first derivatives.
The difficulty in determining the solution is directly proportional to the amount of curvature of the functions. Hence, linear programming is solved in the most straightforward fashion.

The first step into the nonlinear realm is to allow the objective function to be nonlinear while holding the constraints linear. This problem can be solved by the simplex method and is an extension of that technique. The linear constraints define a convex region, and the method is called convex programming.

The next step into the nonlinear is more painful, for many reasons. The most important problem is the global-local dilemma. Once the feasible region is no longer convex, the optimum solution to the entire problem cannot be determined with certainty. A stationary point solution is the best that can be achieved, and many local minima are found. These local minima are compared to find the smallest. Starting the procedure at various points increases the probability of determining the global minimum.

1.4 A Review of the Existing Algorithms

Two methods are combined in the design of the system for ILLIAC IV because of the difficulty in justifying a stand-alone procedure. The combination is helpful in overcoming the global-local difficulty. The advantages of both methods are utilized so that a more effective process results. Before the "Combined Method" (a name used in the remainder of the report) is analyzed, a review of some of the more popular algorithms is made.

The first technique to be examined is the cutting plane method [9], the central principle of which is to linearize the constraints at predetermined points and solve the resulting LP problem. The linearization is achieved by applying the first order Taylor series approximation

    g_i(X) ≈ g_i(X_k) + ∇g_i(X_k)'(X - X_k)

at the solution X_k. If the solution of the LP problem satisfies the original constraints, the method is finished; otherwise the estimates are revised and the process continues.
The technique converges for convex problems, but fails to find a local optimum in nonconvex problems: the path leaves the feasible region and never returns. The method is not applicable in this study because it limits the range of applications to convex problems. The author discounts the accelerated cutting plane technique for similar reasons.

The separable programming method is used when the objective function is separable, that is, of the form

    f(X) = Σ_{j=1}^{n} f_j(x_j)

and the constraints are linear and also separable. The method divides the objective function into a grid and linearizes it at the intervals. This linearization is applied to each variable. The problem is to minimize f(X) subject to

    g_i(X) = Σ_{j=1}^{n} g_ij(x_j) .

Here again the procedure converges if convexity is retained, but fails otherwise. This method is likewise rejected.

The Lagrangian differential gradient method has a different form than the previous methods. It utilizes the gradient of the objective function, which points toward the greatest increase of the function; the negation of the gradient is used here. Knowledge of the constraints alters the direction. Notationally,

    dX/dy = -∇f(X) + Σ_{i=1}^{r} u_i ∇g_i(X)

where u_i is a Lagrange multiplier and

    du_i/dy = { g_i(X)   if u_i > 0 or g_i(X) > 0
              { 0        otherwise .

The solution to the original problem is obtained from the solution to the differential equations by letting y → ∞. Strict convexity must be retained, and the solution cannot be found if the constraints are linear. This method has been a stepping stone for other gradient methods.

The next method to be examined is the Sequential Unconstrained Minimization Technique. It is discussed lightly here because it is one of the methods in the combined algorithm; the details are presented in Chapter 2.
The method has been perfected at the Research Analysis Corporation by McCormick and Fiacco [5]. Convergence is proved for convex regions, and the method will also closely approach nonconvex optima. An advantage is the partial avoidance of unwanted local minima. The idea is to incorporate the constraints into the objective function and thereby change the constrained problem into a series of unconstrained minimizations. A penalty term is introduced to weigh the importance of the constraints. The sequence approaches the solution of the original problem as the penalty is monotonically decreased. An advantage of the computational technique is the natural addition of an unconstrained subroutine, thereby greatly increasing the range of possible applications.

The small step gradient method is a well known nonlinear algorithm, similar to the cutting plane scheme. The small step method attempts to remain in the feasible region by changing the solutions gradually. The steps of the algorithm are:

1. Linearize all constraints by a Taylor series approximation at a current feasible solution X_k.

2. Minimize

       w = ∇f(X_k)'X

   subject to

       ∇g_i(X_k)'X >= ∇g_i(X_k)'X_k - g_i(X_k)

   and ‖X - X_k‖ = α, with α a small number.

3. Test for an optimum value and return to step 1 if the test fails.

Any familiar LP method is used to solve step 2 above. Notice that no information carries from iteration to iteration because of the relinearization of all constraints, so the number of computations is increased. This method converges very slowly and is, therefore, rejected.

The final method to be examined is the gradient projection method of Rosen [12]. It is a subset of the general method of feasible directions, which attempts to decrease the objective function at every step, and it is the second portion of the Combined Method. Therefore, the description here is brief, as it will be described more fully below.
The method begins with an initial feasible interior solution and follows the negative gradient to the boundary, at which point the projection is determined. The projected direction is not necessarily the best, as in Zoutendijk's method [15], which solves an LP problem to determine the direction of greatest change within the region; however, a decrease of the objective function is guaranteed and an LP problem is not formulated at each iteration. The method converges in a small number of iterations, especially with linear constraints. Although the amount of work per step is large, the method is acceptable and is even shown to be desirable for the ILLIAC IV.

1.5 A Description of the Combined Method

As mentioned in the previous section, a combination of two existing methods is used in the nonlinear system design. The sequential unconstrained minimization technique is used for the small, highly nonlinear problems and as a first approximation to the projected gradient method. The latter method is used exclusively with large quasi-linear problems and in speeding convergence in the combined mode. The stages of the Combined Method are as follows:

1. Phase one of the procedure involves the determination of an initial feasible solution if one does not already exist. It is often the case that a feasible solution is known a priori because of the nature of the problem.

2. The unconstrained revised function is determined from the original constrained problem. The function depends upon the current solution, and an arbitrary penalty function and constant.

3. The unconstrained function is minimized for a particular penalty constant. An attempt is made to find all local minima of the converted function. Tests are performed to check feasibility.

4. The solution of the unconstrained problem becomes the current solution, and a test for the rate of convergence is made. If the rate is adequate, the penalty is lowered and the process returns to step three.
The test for convergence is part of the locking test.

5. The projected gradient method is called into the program if the locking test indicates that it would be appropriate. The method determines a feasible direction and then a step length which minimizes the objective function without violating the constraints.

6. The locking test is made again. The test includes a check for a local optimum as well as the rate of convergence. The optimum is found within a delta region because of the nature of the algorithm. It is possible to return to the sequential method if conditions so warrant.

7. This step is included for nonconvex regions. Once a local minimum is found by the above process, the entire procedure is reiterated from another feasible position. Many optima are found in this manner and the probability of finding the global optimum is improved.

These steps give a rough understanding of the Combined Method. The prime advantage of this algorithm is the wide range of applications: it attempts to find the global optimum of nonconvex regions in a few iterations. The remainder of the report expands this assertion.

2. THE SEQUENTIAL UNCONSTRAINED MINIMIZATION TECHNIQUE

2.1 Formulation of the Method

The essence of the method is the conversion of the objective function and constraints of the constrained problem into an unconstrained problem with a revised objective function, which is minimized. A sequence of conversions and minimizations is computed which converges to the solution of the constrained problem as the number of iterations approaches infinity. The procedure terminates at a delta closeness to the optimum, which is defined by

    Σ_{i=1}^{s} g_i(X_k)² <= δ²

for the s constraints which lie in the hypersurface G.

In the unconstrained method, a test of convergence is made to measure performance. The formula

    γ = (f_k - f_{k+1}) / f_k ,

where f_k = f(X_k), is checked against a constant which has been predetermined by the nature of the problem.
A local optimum test is made when γ becomes less than the specified constant. The procedure halts or switches to the projected gradient if the minimum is not found.

The revised objective function for the sequential method is defined by

    H(X_k, p_k) = f_k + p_k Σ_{i=1}^{m} h(y_ik)

with

    y_ik = g_i(X_k) .

The penalty constant for the k-th iteration is p_k, and the penalty function is h. Thus the revised objective function is also a function of the constraints when the penalty is nonzero. The choice of penalty function depends upon the curvature of the constraints, the number of local minima, and the ease of calculating the constraints. It has the property that

    lim_{k→∞} Σ_{i=1}^{m} p_k h(y_ik) = 0

and the result

    lim_{k→∞} H(X_k, p_k) = f* ,

where f* = f(X*) and X* is the solution to the constrained problem.

The effect of the conversion is the simulation of the original problem more and more closely as the sequence progresses. The contribution of the constraints to the revised objective function diminishes and vanishes at the limit.

2.2 Penalty Functions

The function h has many forms, with inverse and logarithmic forms being the most popular. It is possible to use a weighted penalty function that combines the two approaches of the sequential unconstrained method; this is discussed later in the report.

A logarithmic penalty function gives the revised function the form

    H(X_k, p_k) = f_k - p_k Σ_{i=1}^{m} ln g_i(X_k) .

Taking first derivatives, we find the gradient of the H function

    ∇H(X_k, p_k) = ∇f_k - Σ_{i=1}^{m} [p_k / g_i(X_k)] ∇g_i(X_k) .

It is apparent that the gradient vanishes provided X_k = X* and p_k = 0. This assumes the Kuhn-Tucker condition that the gradient of a function vanishes at the optimum, or ∇f* = 0. If this assumption holds, the first order sufficiency condition for the logarithmic function is proved. The second order condition is proved in a similar fashion. Proceeding to the second derivatives, we have

    ∇²H(X_k, p_k) = ∇²f_k - Σ_{i=1}^{m} [p_k / g_i(X_k)] ∇²g_i(X_k)
                    + Σ_{i=1}^{m} [p_k / g_i(X_k)²] ∇g_i(X_k) ∇g_i(X_k)' .

A second order condition for the optimum is that the matrix ∇²f* be positive definite. As p_k → 0, X_k → X*, and the Hessian of the H function becomes equal to ∇²f*; therefore, ∇²H(X*, 0) is a positive definite matrix. This completes a second order condition for the logarithmic penalty function. These proofs demonstrate the nearness of the minimum of the unconstrained function H to the minimum of the objective function f in the constrained problem, provided a logarithmic penalty function is utilized.

A second penalty function, called the inverse function, has the form

    H(X_k, p_k) = f_k + p_k Σ_{i=1}^{m} 1 / g_i(X_k) .

2.3 Interior Points

An example of the interior approach is to minimize

    w = f(X) = x_1 + 2 x_2

subject to

    g_1(X) = x_1 - x_2 >= 0
    g_2(X) = x_1 >= 0 .

Figure 1. An illustrative example of the interior points approach.

The revised objective function is

    H(X_k, p_k) = x_1 + 2 x_2 - p_k ln[g_1(X_k)] - p_k ln[g_2(X_k)]
                = x_1 + 2 x_2 - p_k ln(x_1 - x_2) - p_k ln(x_1) .

Figure 1 demonstrates the problem at hand, the shaded area being the feasible region and the boundaries the equality constraints. The trajectory of current solutions, which are listed in Table 1 and plotted in the figure, shows how the boundaries are avoided and how the interior is utilized. The revised function is first minimized for p = 1 by taking partial derivatives and setting the system of equations to zero. The penalty constant is then lowered to one half of the previous constant, and the new revised function is minimized. This process continues until the path of the current solutions converges upon the origin (0,0) and halts within a δ-closeness. The reduction in the penalty was arbitrarily chosen; the percentage reduction is large because of the simplicity of the problem and the speed of convergence. Note that the path remains in the interior and the rate of convergence slows as the optimum is approached. This method is called the interior points approach for obvious reasons.
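The interior loop can be sketched in a few lines. The sketch below uses a simplified problem chosen for illustration (it is not the report's example): minimize f(X) = x_1 + 2 x_2 subject to x_1 >= 0 and x_2 >= 0, whose optimum is the origin. For the logarithmic penalty H = f - p (ln x_1 + ln x_2), setting the partial derivatives to zero gives the minimizer in closed form: 1 - p/x_1 = 0 yields x_1 = p, and 2 - p/x_2 = 0 yields x_2 = p/2.

```python
# Minimal SUMT interior sketch on an illustrative problem (not the
# report's example): minimize x1 + 2*x2 subject to x1 >= 0, x2 >= 0.
# The minimizer of H(., p) is available in closed form here, so each
# "unconstrained minimization" is a single assignment.

def sumt_interior(p=1.0, delta=1e-6):
    f = lambda X: X[0] + 2 * X[1]
    while True:
        X = [p, p / 2]          # minimize H(., p); closed form for this problem
        if f(X) < delta:        # delta-closeness to the optimum f* = 0
            return X
        p *= 0.5                # monotonically lower the penalty

X = sumt_interior()
print(X)  # interior iterates approaching the optimum (0, 0)
```

Each iterate stays strictly inside the feasible region, and, as in Table 1, halving the penalty halves the distance to the optimum.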
TABLE 1. THE SOLUTION OF THE INTERIOR PROBLEM

    k    p_k     x_1     x_2     f_k
    1    1       .833    .667    2.16
    2    .5      .416    .333    1.08
    3    .25     .208    .167    0.54
    4    .125    .104    .083    0.271

The computations of the minimum of the revised function are usually more involved because the derivatives are harder to find, and the system of equations is less straightforward; therefore another method, such as one of the conjugate direction techniques, is used.

2.4 Exterior Points

Thus far, the method (SUMT) discussed in this chapter has dealt only with the interior points approach. Another approach is to seek the optimum from an infeasible current solution. This is called the exterior points method. The idea is to add positive slack variables to the constraints which do not meet the predefined inequality conditions. This addition takes the form

    g_i(X_k) + q_k = 0 ,    1 <= i <= m ,

in which the variable q_k decreases as the solution sequence progresses. The constraints which affect the results are these relaxed inequalities. The new objective function

    H(X_k, p_k) = f_k + p_k Σ_{i=1}^{m} (min[0, g_i(X_k)])² ,

called the quadratic loss function, accounts for this relaxation. The penalty constant p_k is monotonically increased as the sequence of unconstrained problems is solved. As p_k increases, X_k is drawn towards the feasible region so as to minimize the value of the penalty term; hence X_k → X* as p_k → ∞. Each step consists of solving the set of equations formed by setting the gradient of the new objective function to zero, i.e.,

    ∇H(X_k, p_k) = ∇f_k + 2 p_k Σ_{i=1}^{m} ∇g_i(X_k) · min[0, g_i(X_k)] = 0 .

The second order condition is also fulfilled because the second derivatives form a positive definite matrix, as in the logarithmic function discussed above; X_k is a local minimum of the converted function H(X_k, p_k).

The difference between the exterior method and the interior method arises because of the form of the revised objective function. The interior method uses all constraints in each iteration, but the exterior method calculates only those constraints which do not fulfill the proper inequalities. The optimum of the converted H function moves toward the optimum of the constrained problem in both cases.

An example of the exterior method is to minimize

    w = f(X) = x_1 x_2

subject to

    g_1(X) = -x_1² - x_2² + 4 >= 0
    g_2(X) = -x_1² - x_2 + 1 >= 0
    g_3(X) = x_1 + x_2 >= 0 .

Figure 2. An illustrative example of the exterior points approach.
The interior method uses all constraints in each iteration, but the exterior method calcu- lates only those constraints which do not fulfill the proper inequalities. The optimum of the converted H function moves toward the optimum of the constrained problem in both cases. 16 An example of the exterior method is to minimize w = f (X) = x 1 x 2 subject to 2 C 2 gl (x) = -xj -x: + k :> o g 2 (X) = -xj -x 2 + 1 > g 3 (X) - x ± + x 2 > X, + X 2 >0 Figure 2. An illustrative example of the exterior points approach. x 7 The shaded area is again bhe feasible region with the equality conditions the boundaries. The revised objective function with p equal to .5 '■• 2 2 2 H(X ,.5) = r Q + 1 X{[min(0,x 1 + x g )] + [min(0,l -x Q -^ )] + [min(0, -x 1 -Xg + h)] } . The solution of this function is listed in Table 2 with the other solutions in the sequence. The penalty constant p is decreased in successive iterations to bring the current solution closer to the feasible region. The trajectory of solutions is plotted in Figure 2. TABLE 2 SOLUTIONS OF THE EXTERIOR PROBLEM k p k x l x 2 f k 1 1 • 67 .89 .60 2 2 .62 • 77 .k6 3 k .61 • 73 M h 8 • 58 .67 .38 00 00 • 577 .667 • 385 The computations of the minimum of the revised function for the exte- rior method is less involved than the interior method in many cases. Note that in the example the current solutions fulfill the proper inequalities for con- straints 1 and 3- Hence these constraints are set to zero in the revised func- tion while the conditions are satisfied. The order of the polynomial is increased by the quantity squared, which is a disadvantage from a computational standpoint. 18 The exterior method is quite useful when the minimum of the first iteration X is close to the feasible region and the solution occurs on the boundary of the region. The convergence may be prohibitively slow in other situations. 
The exterior algorithm handles equality constraints more effectively than the interior method, because the slack variables are taken care of in the quadratic loss function.

The difference between the interior and exterior methods is that the interior method uses the constraints to avoid regions, while the exterior method follows the philosophy that the constraints keep the current solution close to the feasible region. Although the ideas are different, the calculations are similar, except for the number of constraints in the revised function.

2.5 Unconstrained Algorithms

An integral portion of the sequential method is the minimization of the unconstrained problem using the objective function H. Without a good technique the whole idea falters. What constitutes a good technique? It must have good convergence properties and be computationally straightforward. Global-local difficulties and saddle points must be avoided. These problems resemble those of the constrained problem. A multitude of algorithms appear in the literature and have been applied with varying degrees of success. The most popular are the method of steepest descent, the conjugate direction methods, and Newton's second order method.

The first class of methods to be examined is the conjugate direction approach. The approach is guaranteed to locate the minimum of a function in n steps, provided the function is quadratic and possesses n arguments; otherwise a test of convergence is required. The idea is to approximate the function by a quadratic equation and search along this function until the minimum is determined. A subset of this method is the Fletcher and Reeves [7] proposal. Their algorithm begins with the vector X_k, where k is initially zero, and the direction

    D_0 = -∇f_0 .
D, k+1 k+1 k k where £ K " V f k + l 2 /^ f k 2 Tliis process continues for n iterations until X , the solution of the quadratic problem, is found. An assumption is that the calculations involve no roundoff or truncation errors. The generated directions correspond to the quadratic approximations and the procedure should be restarted every n+1 steps for general functions. The principle quantities to be observed are: the rate of conver- gence, the choice of the initial solution, the search for the next current solu- tion, and the test for delta closeness to the optimum. Another conjugate direction method is the Davidon variable metric method [2]. The notion is to approximate the second partial derivatives in the Hessian matrix. It begins with the arbitrary vector X and any positive definite matrix A. Successive points are found through the equation \ + i - \ - ^ f k The step length d is found in a one dimensional search for the local minimum of 20 the objective function. The difference from the previous method is in the matrix approximation with the difference equations and ^ - - d \ v f k ^k ■ *Vl - *V The matrix A, is positive definite if A is also positive definite, and the function decreases appropriately. The advantage of the conjugate direction methods lies in the quadratic approximation to the function. The method works quite well for functions which resemble quadratic forms, but not so well with general functions. If the initial point is sufficiently close to the optimum, the rate of convergence is good. The calculations of A are somewhat involved. In some cases it is much better than computing second derivatives. The method of steepest descent is the second type of method to be dis- cussed. It is an offshoot of the generalized gradient method, which points in the direction of greatest change of the function. The negative of the gradient is again used. Distance along this gradient is maximized to lower the number of iterations. 
The arbitrary vector X_0 is used once more to begin the procedure. Subsequent points are found by

    X_{k+1} = X_k - d ∇f_k

where d is the step length and k+1 indexes the new current solution. The step length is found by determining the smallest value of d which minimizes the objective function along -∇f_k but remains positive. That point becomes the next current solution, and a new direction is found that is orthogonal to the previous direction.

The final method to be examined is the Newton second order method. Second partial derivatives of the unconstrained H function are required. The method is based on the second order Taylor series

    f(X) ≈ f_k + ΔX' ∇f_k + ½ ΔX' L_k ΔX

where L_k = L(X_k) is the Hessian matrix, the n by n matrix of second partial derivatives ∂²f_k / ∂x_i ∂x_j. The A_k matrix of the Davidon variable metric method is an attempt to duplicate this matrix. It is a positive definite symmetric matrix. The series of current solutions is represented by

    X_{k+1} = X_k - d L_k^{-1} ∇f_k

where -L_k^{-1} ∇f_k is the direction of the step and d is the step length, chosen to minimize f along -L_k^{-1} ∇f_k starting at X_k. Note that the inverse of the Hessian matrix must exist for the method to continue. Also, the matrix must be stored, which requires ½ N(N+1) words. If the derivatives are troublesome to compute, the method slows considerably. Nonetheless, the rate of convergence is as good as, if not better than, the earlier two methods. Actual tests have been run by Fiacco and McCormick [4], and the results indicate that the Newton method requires the fewest iterations, but the work per step is roughly n³/3 operations, while the variable metric method requires n² operations per step and the Fletcher-Reeves method requires n²/2 operations per step. These results indicate that the Newton method requires many more computations per step but fewer iterations for convergence; therefore, it compares favorably with the other methods.
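The Newton iteration above can be sketched on a small illustrative function (the function and fixed step d = 1 are this sketch's assumptions; a full implementation would add the one dimensional search for d). For a quadratic the Hessian is constant and one full step reaches the minimum exactly:

```python
# A minimal sketch of Newton's second order method on an illustrative
# quadratic, f(x1, x2) = x1**2 + 2*x2**2 - 2*x1, with minimum at (1, 0).

def grad(X):
    return [2 * X[0] - 2, 4 * X[1]]

def hessian(X):
    return [[2.0, 0.0], [0.0, 4.0]]

def newton_step(X):
    g, L = grad(X), hessian(X)
    # Solve L * s = g for the 2-by-2 case (Cramer's rule).
    det = L[0][0] * L[1][1] - L[0][1] * L[1][0]
    s = [(g[0] * L[1][1] - g[1] * L[0][1]) / det,
         (L[0][0] * g[1] - L[1][0] * g[0]) / det]
    return [X[0] - s[0], X[1] - s[1]]   # X_{k+1} = X_k - L^{-1} grad f

X = newton_step([5.0, -3.0])
print(X)  # [1.0, 0.0]
```

For a general H function the step is repeated, and the n by n solve at each step accounts for the n³/3 operation count cited above.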
Because of the structure of ILLIAC IV, these convergence characteristics make the Newton method the preferred technique. Chapter 4 expounds on this issue in more detail.

2.6 Advantages

In Chapter 2 the sequential unconstrained minimization algorithm has been examined. To summarize, the advantages of the method are listed below.

1. The algorithm is computationally concise, yet convergence has been proven for first and second order conditions.

2. The sequence of current solutions remains in the interior for the interior points method. This avoids unwanted local minima to some degree.

3. Highly nonlinear constraints can be handled because the path of solutions does not attempt to follow the constraints. The path avoids the constraints through the use of the penalty constant and the penalty function.

4. It includes a useful unconstrained technique within the computational algorithm.

5. A large variety of problems can be solved.

These conditions suggest that the method be used in a combined algorithm, its principal task being to serve as a first approximation to the gradient projection method. The principal disadvantages stem from equality constraints. It should be apparent that these constraints can be satisfied only in the limit for both the interior and exterior methods. Something must be done to overcome this difficulty. The second disappointment is the slowness of convergence for general rules which affect the penalty constant. This can be overcome by a priori knowledge of the problem; however, a problem solving system should avoid user manipulation as much as possible. The formulation of a good extrapolation procedure has not been achieved.

3. THE PROJECTED GRADIENT METHOD

3.1 Review of the Problem

The second portion of the combined method is taken from the algorithm of Rosen [12]. The first account of this algorithm appeared in a paper which dealt with linear constraints, and this was followed by a subsequent paper on nonlinear constraints.
Hadley [8] includes this method in his book on nonlinear programming. The gradient projection method is one of the techniques of feasible directions. It attempts to move the solution in the direction of the negative gradient, similar to the procedure of the unconstrained method of steepest descent. Differences occur when the gradient of the objective function points outside the feasible region. Following the gradient outside the region leads to infeasibility, so other directions are searched to find the "next best" direction.

3.2 The Algorithm

1. The starting vector can be found in two ways. It can be determined either by the sequential unconstrained minimization method or by the projected gradient method. This is examined in more detail in the next chapter.

2. If the starting vector is interior, the current solution is found by the formula

X_1 = X_0 - d \nabla f_0

where the step length d is determined by a one-dimensional search. Find the step length which minimizes f subject to 0 \le d \le d_{max}, where d_{max} is the longest step length which remains in the feasible region.

3. If X_k lies within a \delta-neighborhood of the intersection of the s constraints, then the solution is moved in the direction of the projection of the gradient \nabla f_k on the intersection G of the tangent hyperplanes at X_k. Determine the matrix

U_k = (\nabla g_1, \nabla g_2, ..., \nabla g_s)

The columns of this matrix must be linearly independent, and the procedure checks for this condition; otherwise dependent columns are removed until the remaining columns are independent. Next define

V_k = (U_k' U_k)^{-1}

which exists because of the independence condition above, and

R_k = V_k U_k' \nabla f_k

The projection of the gradient on the intersection G of the tangent hyperplanes at X_k is then

P_k \nabla f_k = \nabla f_k - U_k R_k

where

P_k = I - U_k V_k U_k'

The proof of this vital equation is contained in the following section. The test for the optimum is started by finding

\beta_{ik} = r_{ik} / (2 \sqrt{v_{ii}})

where r_{ik} is the i-th element of the vector R_k and v_{ii}
are the diagonal elements of the matrix V_k. Define

\alpha = max [ \|P_k \nabla f_k\| , max_i \beta_{ik} ]

Finally, if

\alpha < \delta / (2 s \ell \mu)

where \delta = specified tolerance, s = number of constraints, \ell = a bound on the diameter of the feasible region, and \mu = a measure of independence of the active constraints, then f is within a \delta-distance of the local minimum.

4. If \alpha \ge \delta/(2 s \ell \mu) and \|P_k \nabla f_k\| \ge max_i \beta_{ik}, then find the unit vector

N_k = -P_k \nabla f_k / \|P_k \nabla f_k\|

Now compute the series of points from Y_1 to Y_n which projects the current solution back into the feasible region:

Y_1 = X_k + d N_k
Y_2 = Y_1 - U_k V_k w(Y_1)
...
Y_n = Y_{n-1} - U_k V_k w(Y_{n-1})

where w(Y_{i-1}) is a vector which is a measure of the distance to the intersection G. The n-th solution is found at a \delta-neighborhood of the intersection G, and the iteration terminates.

Figure 3. The correction of the projected gradient back into the feasible region.

5. If \alpha \ge \delta/(2 s \ell \mu) and \|P_k \nabla f_k\| < max_i \beta_{ik}, then drop the column of the U_k matrix which corresponds to the element max_i \beta_{ik}. The new matrix becomes U_{k+1}, and the process is continued by finding the new V_k, R_k, P_k \nabla f_k, and N_k matrices and vectors. The notion is to proceed along a new intersection G by dropping the appropriate constraint.

Figure 4. Dropping a constraint in the Projected Gradient Method.

Figure 4 shows how this movement occurs geometrically. At solution X_1, the path switches to intersection G_1, which is constraint 1 in this example, from intersection G_2, which is constraint number 2. The constraint is dropped from consideration because the optimum occurs at X_3, away from the equality condition of constraint 2. Point X_1 is not a stationary point because the objective function can be decreased by moving along intersection G_1. This completes the explanation of a typical loop in the projected gradient method.
3.3 Proof of the Projected Gradient Formula

The formula in question is

P_k \nabla f_k = \nabla f_k - U_k R_k

where the notation used has been described in the previous section. This equation is crucial to the method because it is the distinguishing factor between this method and other methods of feasible directions; i.e., it is a projection of the gradient onto the intersection of the s constraints. It is also part of the test for a local minimum. A necessary condition for the optimum is that

P_k \nabla f_k = 0

which follows automatically when \nabla f_k = 0. A sufficient condition is that r_{ik} \le 0 for i = 1, ..., s, where r_{ik} is the i-th element of the vector R_k. Combining these conditions, one knows that a move in any direction fails to decrease the objective function.

The first step in verifying the equation for P_k is to let

E = E_u + E_{\perp u}

where E = Euclidean s-space, E_u = subspace spanned by the columns of U_k, and E_{\perp u} = subspace spanned by vectors orthogonal to the columns of U_k. Now let

\nabla f_k = C_{uk} + C_{\perp uk}

where C_{uk} = component of the gradient along the columns of U_k, and C_{\perp uk} = orthogonal component, lying in the tangent to the intersection of hyperplanes at X_k.

Figure 5. Representation of the projected gradient formula.

The illustration in Figure 5 depicts the vectors geometrically. Now

\nabla f_k = C_{uk} + C_{\perp uk}, with C_{uk} \in E_u and C_{\perp uk} \in E_{\perp u}

But since E_u is generated by the columns of U_k,

C_{uk} = U_k \alpha_k

where \alpha_k is a vector of coefficients. Since E_{\perp u} is the orthogonal complement of E_u,

U_k' C_{\perp uk} = 0

Hence

U_k' \nabla f_k = U_k' U_k \alpha_k
\alpha_k = (U_k' U_k)^{-1} U_k' \nabla f_k = R_k

since the inverse exists.
It follows that

C_{\perp uk} = \nabla f_k - C_{uk} = \nabla f_k - U_k R_k = P_k \nabla f_k

where

P_k = I - U_k V_k U_k'

3.4 An Example Problem

The sample problem which demonstrates all of the above principles is to minimize the objective function -3x_1 - x_2 subject to the constraints

g_1(X) = -x_1^2/4 + 2x_1 - x_2 \ge 0
g_2(X) = x_1 x_2 - 4 \ge 0
g_3(X) = x_1 \ge 0
g_4(X) = x_2 \ge 0

Figure 6 represents the problem with X_0, the initial solution, at (2,3). It happens to be on a boundary, which eliminates the first step in the algorithm. Note that

\nabla f = [-3, -1]
\nabla g_1(X) = [-x_1/2 + 2, -1]
\nabla g_2(X) = [x_2, x_1]

The calculations of the first move, which are included in Appendix A, indicate that the current solution is not a local optimum because the projected gradient is not zero. The direction of the step is P_1 \nabla f_1, and the path moves along this direction with the arbitrary step length d = 2. Figure 6 shows the necessary vectors.

Figure 6. A sample problem of the Projected Gradient Method.

A constraint must be added with this step length. These calculations are also contained in the Appendix. The solution is iterated back to the feasible region. The second current solution is found by this process to be (1.578, 2.534), and both constraints g_1 and g_2 are equal to zero. The computations of the projected gradient at this point are made, and the result is

\|P_2 \nabla f_2\| = 0.0
R = (-0.49527, -0.94724)

Because the projected gradient is zero and the R vector's elements are negative, the current solution is a local optimum, which is X_2. Figure 7 illustrates the local conditions for optimality.

Figure 7. The optimum in the Projected Gradient Method.

3.5 Step Length

The choice of a step length search procedure depends upon the position of the current solution. If it is in the interior, a simple one-dimensional search is performed. The objective function f(X) is minimized along the direction of search, where 0 \le d \le d_{max}. The optimum step length is quite easy to find provided the function is unimodal.
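The first move of the example can be checked numerically. The sketch below (illustrative NumPy, not part of the original report) forms U_1, V_1, R_1, and P_1 at X_0 = (2, 3), where only g_1 is active:

```python
import numpy as np

# Hedged sketch of the projection step for the sample problem at X0 = (2, 3).
# Only g1(X) = -x1^2/4 + 2*x1 - x2 >= 0 is active there (g1(X0) = 0).
grad_f = np.array([-3.0, -1.0])              # gradient of -3*x1 - x2
x1, x2 = 2.0, 3.0
U = np.array([[-x1 / 2 + 2.0], [-1.0]])      # U_k: active constraint gradients as columns
V = np.linalg.inv(U.T @ U)                   # V_k = (U_k' U_k)^{-1}
R = V @ U.T @ grad_f                         # R_k = V_k U_k' grad f_k
P = np.eye(2) - U @ V @ U.T                  # P_k = I - U_k V_k U_k'
proj = P @ grad_f                            # proj = [-2., -2.]: nonzero, not optimal
```

The projected gradient is orthogonal to \nabla g_1 and nonzero, confirming that X_0 is not a local optimum; the move along -P_1 \nabla f_1 then reduces to the one-dimensional step length search just described.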
Finding this step length remains easy even if the function is complicated to calculate. Problems arise when the function is multimodal, and these are explained in the following chapter. The curvature of the constraints determines the length of the step if the current solution lies within a \delta-neighborhood of the boundary. A tradeoff exists between the length and the time required to project back into the feasible region. The distinct possibility of never returning to the feasible region exists for highly nonlinear constraints, especially if the step length is long. Rosen [12] suggests a good step length of the form

d = \delta \mu / (2 \beta_{max})

where \mu = a measure of independence of the vectors of U_k, \beta_{max} = the maximum of the \beta_{ik}, and \beta_{ik} = a measure of the curvature of the active constraints.

4.2 Phase I of the Algorithm

The problem constraints are g_i(X) - b_i \ge 0, i = 1, ..., m. The first step is to pick many arbitrary initial feasible vectors X_0, by any process. One should use any previous information in picking these vectors; otherwise they can be chosen by a random number generator, within certain bounds. These vectors are substituted into the constraints, and the constraints are divided into three sets: J, K, and L. The next test branch in the flow diagram is needed to handle the different forms of storage. Multiple optima are found in two different fashions; these are listed under numbers 7 and 8 in the flow diagram.

Set J = {j | g_j(X_0) - b_j = 0}
Set K = {c | g_c(X_0) - b_c > 0}
Set L = {l | g_l(X_0) - b_l < 0}

The procedure forces J and L into becoming null sets, with the elements going into set K so that the conditions are fulfilled. Select a series of solutions which increases the first constraint of set L, without violating the constraints of set K. Set J is checked as the procedure continues and put into set K or L appropriately. Using the unconstrained method, minimize

H(X_k) = - \sum_{l \in L} [g_l(X_k) - b_l] + p_k \sum_{c \in K} h_c(y_{ck})

which is called the infeasible revised objective function.
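The classification of constraints into the sets J, K, and L can be sketched as follows (illustrative Python; the constraint functions and tolerance are invented for the example):

```python
# Hedged sketch of the Phase I classification: an initial vector X0 is tested
# against constraints g_i(X) - b_i and split into J (= 0), K (> 0), L (< 0).
def classify(g_funcs, b, x0, tol=1e-9):
    J, K, L = [], [], []
    for i, (g, bi) in enumerate(zip(g_funcs, b)):
        v = g(x0) - bi
        if abs(v) <= tol:
            J.append(i)          # active: on the boundary
        elif v > 0:
            K.append(i)          # satisfied: interior side
        else:
            L.append(i)          # violated: Phase I must drive these into K
    return J, K, L

g = [lambda x: x[0] + x[1], lambda x: x[0] * x[1], lambda x: x[0]]
J, K, L = classify(g, [5.0, 10.0, 2.0], (2.0, 3.0))
```

Phase I then drives the members of L (and any stragglers in J) into K by minimizing the infeasible revised objective function given above.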
The negative sign in front of the first summation of this function is necessary because one wants to increase the desired constraint, and this equation is equivalent to maximizing a positive quantity. The minimization is performed in two separate ways depending upon the existence of second partial derivatives. The Newton method is used if the derivatives exist; otherwise a conjugate direction method is used. The flow diagram accounts for a multimodal revised objective function through the global test. This test reiterates the procedure for another minimum if it is not satisfied with the solutions at that point. A sequence of iterations is performed with a decreasing p_k. The sets are continuously checked for the proper inequality during the sequence. A constraint of set L leaves the set when it becomes positive and is put into the K set. The process halts when set L is null. Set J is probably null at this time also; if it is not, the minimization is performed on the elements of set J. The problem is infeasible if either set L or set J has elements when the minimum is found, provided the function is convex. Otherwise, the procedure is returned many times to test multiple starting points until it is satisfied with infeasibility. At some point, the answer at the test is negative and the algorithm halts.

The procedure assumes that the equality constraints have been converted to inequalities by the formula of Chapter 1. If the conversion proves troublesome, the equalities can be ignored in Phase I but must be reckoned with separately in the remainder of the algorithm. The interior and exterior methods cooperate to handle equalities. Once multiple interior points have been calculated, the procedure switches to Phase II, which improves the solutions.

4.3 Feasibility Test

The previous section provides a test of feasibility if the constraints are convex, and it can be applied to nonconvex problems through many iterations.
The author's proposal is to work with the equality constraints, which are usually few in number. One limits the range of possible answers by finding a solution of these equations. This method is costly in machine time and should be used only if the method in the previous section fails two or more times. The method finds any solution of the equality constraints by minimizing the system

\sum_{j \in J} [g_j(X) - b_j]^2 = 0

At the minimum, the final vector is the solution of the system. The minimization utilizes any unconstrained procedure, such as steepest descent or the Newton second order method. Another technique is to approximate the function by any interpolation procedure. When a root of the polynomial is found, check it against the equality constraints. If the desired accuracy is not achieved, use a higher order interpolation formula. This procedure is not entered on the flow diagram because the first method is usually satisfactory, but it can be added quite easily at entry point number 11 in the flow diagram.

4.4 Phase II of the Algorithm

The second portion of the algorithm is the most important and the longest with regard to machine usage. It begins with the multiple interior point solutions and attempts to find the global optimum. In most situations the global optimum cannot be found with certainty, and one must be satisfied with a good local optimum. The locking test is the crucial portion of Phase II because it determines the technique to be used during Phase II. It also predicts the final stationary points and termination. A discussion of the test is presented in the following section.

The first portion of Phase II to be examined is the Sequential Unconstrained Method, and it is located at entry point number 6. Many solutions are found during this portion of the algorithm, and a division of the flow diagram is made to show how each type of constraint finds these solutions.
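Before turning to Phase II in detail, the feasibility computation of Section 4.3 can be sketched: minimize the sum of squared equality residuals with steepest descent and a numerical gradient (illustrative Python; the equalities, step size, and tolerances are invented for the example):

```python
import numpy as np

# Hedged sketch: find a point satisfying equality constraints by minimizing
# S(X) = sum_j (g_j(X) - b_j)^2 with plain steepest descent and a
# central-difference gradient.
def solve_equalities(g_funcs, b, x0, step=0.1, iters=500, h=1e-6):
    x = np.asarray(x0, dtype=float)
    def S(x):
        return sum((g(x) - bi) ** 2 for g, bi in zip(g_funcs, b))
    for _ in range(iters):
        grad = np.array([(S(x + h * e) - S(x - h * e)) / (2 * h)
                         for e in np.eye(len(x))])
        x = x - step * grad
        if S(x) < 1e-12:        # residuals essentially zero: system solved
            break
    return x

# two equalities: x1 + x2 = 3 and x1 - x2 = 1, with solution (2, 1)
g = [lambda x: x[0] + x[1], lambda x: x[0] - x[1]]
x = solve_equalities(g, [3.0, 1.0], [0.0, 0.0])
```

At the minimum the residual sum is zero, so the final vector satisfies both equalities; any unconstrained procedure, such as the Newton second order method, could replace the steepest descent loop.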
A test divides the separable polynomial constraints and the general constraints. The next stage is to find the penalty constant p_k, which very often is 1 for the first iteration. For the problems with general constraints many solutions can be found simultaneously, whereas a sequential loop is required for the problems with separable polynomial constraints. Then the sequential approach is selected and the penalty function is determined. These are fixed for all solutions during application of the Sequential Unconstrained Minimization algorithm. The revised objective function (functions for general constraints) is formed and minimized. Many local minima of the revised objective function are also found in the same fashion as above. Finally, the penalty constant is lowered and the procedure is reiterated until a satisfactory improvement is obtained. The Sequential Unconstrained Method does not usually find the final stationary solution because the Projected Gradient Method has faster convergence near the end of the optimization procedure.

Many similar tests are performed during the Projected Gradient portion of Phase II. One additional test is for interior points, because the procedure uses a different algorithm depending upon whether the solution is on the boundary or in the interior. Separable polynomial constraint problems again are handled in a different manner. Many optima are found during the procedure with both portions of Phase II. Control is returned to the locking tests at this point.

4.5 The Locking Test

A recurring question in the system is the determination of the proper technique to apply in Phase II. When does it become advantageous to switch to the Projected Gradient Method instead of continuing the Sequential Method? An estimate of the future rate of convergence is made. Obviously, certainty does not enter into the prediction, and no exact answer exists. Each method works better for various problems.
Two benefits of the Projected Gradient Method are the existence of linear constraints and of few hyperplanes in the intersection at the optimum. On the other hand, the Sequential Method handles nonlinear constraints. To determine the appropriate method to be used, the following scheme is suggested.

1. Determine an index which categorizes the type of problem according to size and linearity and the user's preference. This index runs from zero, which means a few very nonlinear constraints, to one, which represents a large number of quasi-linear constraints.

2. The locking test is made at various iterations to determine the rate of convergence. A part of the test is made with the equation

r = (f_k - f_{k+1}) / f_k

which tests for the rate of convergence.

3. The locking test keeps the current technique in position according to the index. If the index is zero, the Sequential Unconstrained Technique is used alone. If the index is 1, the Projected Gradient Method is the only method employed in the problem.

4. Indices between the extremes are weighed proportionately. The algorithm quickly switches to the Projected Gradient Method when the index is large and the convergence is small. Conversely, the Sequential Unconstrained Method is used often when the index is small.

5. The locking test also determines the final stationary optimum. It must analyze the previous information and predict the termination point.

4.6 Global-Local Conditions

The global-local difficulty has been mentioned often in this paper. In nonconvex problems there is no certainty that the final solution is the global optimum. For many types of problems, this condition is usually not serious because a close approximation is adequate. Many precautions are taken in the Combined Method to overcome this problem. For example, multiple interior points are found in Phase I which lead to many local optima during the procedure.
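A minimal sketch of such a locking rule, with an invented switching threshold (the report leaves the exact weighting between the extremes open), might look like:

```python
# Hedged sketch of the locking test of Section 4.5: given the problem index
# (0 = few highly nonlinear constraints, 1 = many quasi-linear constraints)
# and the observed rate r = (f_k - f_{k+1}) / f_k, choose the next method.
def locking_test(index, f_k, f_k1):
    rate = (f_k - f_k1) / f_k          # relative improvement this iteration
    if index == 0.0:
        return "SUMT"                  # Sequential Unconstrained only
    if index == 1.0:
        return "PGM"                   # Projected Gradient only
    # between the extremes: switch to PGM when the index is large
    # and the current convergence is small (threshold is an assumption)
    return "PGM" if index > rate else "SUMT"

method = locking_test(0.8, f_k=10.0, f_k1=9.5)   # rate 0.05 < index 0.8
```

The real test must also accumulate information across iterations to predict the final stationary point and termination.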
Even in the minimization of the revised objective function, many local optima are determined to increase the probability of finding the global solution. Few algorithms have the flexibility of the Combined Method built into the system for finding the global optimum. This useful capability is one of the advantages of the Combined Method. The power of the simplex method in linear programming stems from the ability to locate the global optimum under certainty. Some types of problems are better solved by linear approximations because of the global conditions, and the cutting plane method uses this idea; however, the approximation leads to infeasible solutions in many situations. Perhaps nonlinear programming will also achieve the capability of the simplex method someday. Until that time, methods like the Combined Method must be used.

5. IMPLEMENTATION ON ILLIAC IV

5.1 Parallel Computations

The purpose of this chapter is to present some of the difficulties encountered in actual implementation of the Combined Method on an array computer such as ILLIAC IV. Solutions for some of these difficulties are described and alternative suggestions are mentioned. The parallel portions of the computational algorithm are fully explained and storage schemes are developed.

All nonlinear programming problems to be solved by the Combined Method are divided into two groups: (a) the problems with a large number of constraints which are linear to a great extent, and (b) the smaller problems with a more general form of constraints. Storage schemes are developed for each group to handle particular characteristics as explained below.

ILLIAC IV is a large scale array computer which is divided into four quadrants with 64 processing elements (PE) and 64 processing element memories (PEM) per quadrant. The Combined Method is designed to fit into one quadrant but can be expanded at a later date. A quadrant has one control unit (CU) which broadcasts simultaneous instructions to the PEs.
Individual PEs operate only if the mode is correctly set. The idea is to keep a maximum number of PEs operating to enhance the efficiency of the machine. All computer installations are concerned with efficiency, and ILLIAC IV is no exception. This is emphasized by the fact that 64 PEs must wait for an input-output request. Parallel calculations improve this efficiency.

Any algorithm can be coded in parallel to some degree; however, certain techniques are better than others. Much of the time, the Projected Gradient Method builds on previous steps. The Sequential Unconstrained Method also uses a great deal of previous information during the sequence. Portions of the method, such as the locking test, are not so satisfactory for parallel computations. Nonetheless, most of the algorithm can be successfully implemented.

5.2 Storage Methods

The division of the constraints into two groups has been discussed earlier in the paper. This division is made to better utilize the parallel capabilities of the machine. It is possible to divide the types of problems according to many criteria other than that used in the Combined Method. Perhaps the user wishes to refrain from approximating the constraints in a large problem. He is faced with the difficulty of storing the code on the disk and waiting for retrieval of the information. Depending upon the particular type of problem, he must devise some method which improves efficiency. These kinds of problems are not examined here because of the enormous variety of the forms of large general constraints. Also, these problems are not common because of the effort required in setting them up.

The author does not discount all large size nonlinear problems; however, these are limited to those which can be approximated by separable polynomial equations. These have the form

g_i(X_k) - b_i = \sum_j (a_{nj} x_j^n + ... + a_{1j} x_j + a_{0j})

where n is the highest order of the polynomials.
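A constraint of this separable form can be evaluated as a vectorized sum over the variables, one polynomial term per variable, mirroring the per-PE parallelism. A sketch (illustrative Python/NumPy; the coefficient layout is an assumption, not the report's storage format):

```python
import numpy as np

# Hedged sketch: evaluate one separable polynomial constraint
#   g_i(X) - b_i = sum_j (a_nj x_j^n + ... + a_1j x_j + a_0j)
# as a vectorized sum. coeffs[p, j] is the coefficient of x_j ** p.
def separable_constraint(coeffs, x, b):
    powers = np.arange(coeffs.shape[0])[:, None]      # 0, 1, ..., n as a column
    terms = coeffs * np.asarray(x)[None, :] ** powers
    return terms.sum() - b

# g(X) = (x1^2 + 2*x1) + (3*x2) - 4  at X = (1, 2):  1 + 2 + 6 - 4 = 5
coeffs = np.array([[0.0, 0.0],    # constant terms
                   [2.0, 3.0],    # linear terms
                   [1.0, 0.0]])   # quadratic terms
val = separable_constraint(coeffs, [1.0, 2.0], 4.0)
```

Each column of `coeffs` holds the polynomial coefficients of one variable, so the column sums correspond to the per-variable terms of the summation above.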
The objective function can possess any reasonable form, but the constraints must have the above formulation. Linear programming constraints have this form and are clearly a subset of the separable polynomials. Large size constraints which do not possess this shape are approximated by n-th order polynomials before the procedure is run.

The storage of the separable polynomials is a variation of the "strewing" method devised for the ILLIAC IV LP algorithm, with the columns and rows interchanged. The method assigns coefficients of the columns of the constraints depending upon the number of elements. The most dense column enters PE_0, the next dense PE_1, and so on to PE_63, where the order of assignment is reversed. The densest remaining column enters PE_62, and the process returns to PE_0. The overlapping of columns is continued until no coefficients remain. Next, the PEs are sorted according to rows. Each coefficient is stored in a 32-bit halfword with the power and the column and row tags in the remaining portion of the word. This storage scheme has many advantages: (1) the rows can be calculated in parallel and the columns are accessible if needed; (2) the number of elements per PE is approximately equal; (3) only the nonzero elements are stored, which saves space because of the sparseness of large problems. The general objective function is stored in a manner similar to the method below.

The general constraints are stored in a different fashion than the separable polynomials. For example, a transcendental constraint such as

g_1(X_k) = sin(x_1 x_2^{-1} + csc x_3)

must be stored in some fashion. It is translated into code which is entered in a straight storage scheme: the instruction stream is laid across the PEMs, followed by the components X_{jk} of the current solutions.

Figure 9. A straight storage scheme.

To realize the parallel capability, the algorithm calculates many simultaneous vectors.
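The back-and-forth "strewing" assignment described above can be sketched as follows (illustrative Python; the density measure and problem size are invented):

```python
# Hedged sketch of the "strewing" assignment: constraint columns, sorted by
# density (nonzero count), are dealt to the 64 PE memories in a serpentine
# sweep so that element counts stay balanced.
def strew(column_densities, n_pes=64):
    order = sorted(range(len(column_densities)),
                   key=lambda j: column_densities[j], reverse=True)
    assignment = {}
    pe, step = 0, 1
    for col in order:
        assignment[col] = pe
        nxt = pe + step
        if nxt < 0 or nxt >= n_pes:    # reverse direction at either end
            step = -step
            nxt = pe + step
        pe = nxt
    return assignment

densities = list(range(200, 0, -1))    # 200 columns, densest first
a = strew(densities)
```

The serpentine sweep keeps the number of columns per PE nearly equal (within two of each other in this instance), which is the balance property claimed for the scheme.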
The flow diagram illustrates this parallel calculation of simultaneous vectors. The data flow is the generated matrix of the X_{jk}, which is also illustrated in Figure 9. The first subscript represents the position within a solution, and the second subscript designates the current solution; for example, X_{1k} is the first position within the k-th current solution. In most cases 64 solutions are computed together, which helps improve the global-local difficulties.

These two diverse storage schemes represent two trains of thought. The first idea is that large problems are so unruly that any reasonable answer is usually sufficient; hence the parallel capabilities are used to achieve this solution. The algorithm gives better solutions by reiteration as often as the user wishes. Problems with a smaller number of general constraints are handled with a different philosophy. The parallel capability is used to predict the global optimum by calculating a multitude of local optima simultaneously, and its advantage lies in this concept.

5.3 Disk Delays

Although the ILLIAC IV is designed with an extremely fast disk storage, it is slow relative to the speed of computations of the machine. The delays for input-output of information must be minimized, of course, but the latency time also must be utilized. In other words, the time when the computer is requesting information must be used for other calculations. The locking test is ideally suited for this delay purpose because it is performed between iterations. The locking test determines the subsequent method to be used while the data is being read into memory. This procedure improves efficiency because the locking test is a sequential operation for the most part, and it would slow the algorithm if used elsewhere.

The Combined Method has relatively few iterations because it attempts to maximize improvement of the objective function at each step. In this respect, it is well suited to an array computer because it minimizes the difficulty mentioned heretofore.
Problems with few iterations, even though the computations per iteration might be complicated, are much more appropriate to the ILLIAC IV. The Combined Method fulfills these requirements.

5.4 Computational Procedure

The algorithm has been explained above, but some of the more elementary calculations have not been discussed. This section attempts to alleviate this omission. The heart of the Unconstrained Technique is the incorporation of the constraints into the objective function, which results in an unconstrained problem. This conversion is a likely target for parallel calculations, and the inverse penalty function is nice in this respect. Recall the form of the inverse revised objective function

H(X_k) = f_k + p_k \sum_{i=1}^{m} 1/g_i(X_k)

To determine the revised objective function, the algorithm computes the constraint vector simultaneously across processing elements for separable constraints, broadcasts p_k to all elements, multiplies, and finds the sum within each PE and then globally across the PEs. The conversion takes place frequently and is therefore important. For general constraints multiple solutions are determined.

The computation of the logarithmic penalty function also is an easy task to perform in parallel. The familiar relationship

ln x + ln y = ln xy

is used to advantage. Substitution of the vector X_k into the constraints is performed similarly to the inverse penalty function; the operands are multiplied, and the natural logarithm is taken.

A most important portion of the Combined Method is the computation of the gradient. It is the central idea in the Projected Gradient Method and is just as useful and necessary in the first portion of the Combined Method in minimizing an unconstrained function. The gradient is defined by the equation

\nabla f(X) = (\partial f/\partial x_1, ..., \partial f/\partial x_n)

and the partial derivatives are usually found through the approximation

\partial f/\partial x_1 = (f(X + D) - f(X - D)) / 2d, where D = (d, 0, ..., 0)

which is called the central difference equation.
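The central difference gradient, and the normalization that follows, can be sketched as (illustrative Python; the test function is invented):

```python
# Hedged sketch: central-difference approximation of the gradient, followed
# by normalization of the resulting direction.
def gradient(f, x, d=1e-5):
    x = list(x)
    g = []
    for i in range(len(x)):
        x[i] += d
        f_plus = f(x)
        x[i] -= 2 * d
        f_minus = f(x)
        x[i] += d                         # restore x_i
        g.append((f_plus - f_minus) / (2 * d))
    return g

def normalize(g):
    a = sum(gi * gi for gi in g) ** 0.5   # a = [sum (df/dx_i)^2]^(1/2)
    return [gi / a for gi in g]

f = lambda x: 3 * x[0] ** 2 + x[1] ** 2   # analytic gradient: (6*x1, 2*x2)
g = gradient(f, [1.0, 2.0])               # approximately (6, 4)
```

The error of the central difference is O(d^2), which is the source of its accuracy advantage over the one-sided forms.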
The central difference has better accuracy than the forward or backward difference equations. Of course, exact derivatives are used when they are easily calculated, as in polynomials; here the derivatives are fed as input into the system. The direction of the gradient is normalized by the equation

N = [(\partial f/\partial x_1), ..., (\partial f/\partial x_n)] / a, where a = [\sum_{i=1}^{n} (\partial f/\partial x_i)^2]^{1/2}

The Sequential and Projected Methods have separate means of termination at a local stationary point. Necessary and sufficient conditions show that both of these methods usually arrive at an optimum in the limit. The Sequential Method is especially weak when the final solution occurs at a boundary. Hence the projected gradient test for an optimum is used instead. This test takes the form

\|P_k \nabla f_k\| < \delta and r_{ik} \le 0, i = 1, ..., s

where \delta is the tolerance presented heretofore. When these conditions are fulfilled, the current solution is no more than a distance \delta from the local optimum. The first equation is computed sequentially in the locking test and compared against the constant. Time is lost in many of the elements, but not a great deal because of the shortness of this computation. The r_{ik} can be computed in parallel, on the other hand. The rows and columns of the matrices can be operated on together for calculation of the R-vector. One test is then made to determine whether all the elements are negative.

5.5 Size Limitations

The number and magnitude of the constraints determine the size of the nonlinear problem. The system can be quite large for the quasi-linear constraints. In fact, nonlinear programming techniques might be used successfully in linear programming. The storage would be the same as in the previous section for separable constraints. At the present time it is difficult to determine the feasibility of this proposal. The gradient method cuts across the feasible region, in contrast to the simplex method, which follows the boundaries at basic feasible solutions.
The advantage- of the gradient method will be determined when a comparison of the number of iterations is developed. us on , lonlinear The :i ms1 raints are ston code, t\ uit.-i i i mall nonlinear problems is the con- :ept of "multiple search". he process is described i mpl one-dimen- Lrch below. The curve of Figure 10 has many peaks. How does as id the global optimum for this function? For an array computer, a multiple jorithm Ls ested. Although the example is elementary, the uses are Figure 10. Multimodal functions. many; for example, the search for the optimum step length along the negative gradient direction is a one -dimensional search. The procedure first determines the boundaries of the zone of uncertainty. The zone is divided into 6k smaller intervals and the values of these points are found in parallel. A sweep of the 56 6U functional values decides upon the number of peaks in the zone. In the example three valleys are found. The process now switches to the sequential mode in which each of the three valleys are closed upon together. If the inter- val had been larger, the simultaneous search would have been performed a few more iterations. By this method, many local minima are found. When the zone of uncertainty has been closed sufficiently the optima are compared for the best value. The probability of finding the global optimum is greatly increased. This same idea can be applied to more complicated searches such as the Sequen- tial Unconstrained Minimization Technique. Parallel computations make this task worthwhile and profitable. 57 CON LUSION This paper has examined various algorithms to d bhe most Le to utilize on the array computer ILLIAC IV. Marry of these methods, s the Cutting Plane Technique, were rejected because of the lack of at ions for nonconvex problems. Other methods are rejected because of the slowness of convergence for general problems. 
The best methods are the Sequential Unconstrained Minimization Technique and the Projected Gradient Method. These methods work in nonconvex regions, have good convergence properties, and augment each other quite effectively. For these reasons, the two algorithms are combined into a single system for solving nonlinear problems. Detailed questions about the Sequential Unconstrained Technique were examined and answered. The Projected Gradient Method was presented in depth, and its difficulties examined. The actual implementation of the Combined Method was examined, and comments injected where applicable. The parallel capabilities of portions of the method were brought out, as were questions concerning data manipulation.

Many results have been discovered through the study. Some of the more useful are presented below:

1. Nonlinear programming theory is a hodgepodge of particular examples; hence, one method is not acceptable in a general nonlinear system.

2. The Sequential Unconstrained Method is the most straightforward of all existing theories. It also has many computational advantages.

3. Although the Projected Gradient Method is quite involved, the extra calculations often accelerate convergence.

4. A combination of these two methods gives wide latitude in the range of nonlinear problems which can be solved.

5. A test, which the author calls the locking test, must be developed for controlling the method being used.

6. The Combined Method appears to be the best system for implementation on an array computer.

Any system for solving nonlinear programming problems is very incomplete at best. Hence many limitations are present when the design is finished. To compensate for the unexpected deficiencies, one must allow for an evolutionary process to take place. The system must change as particular difficulties or unexpected types of applications occur. The Combined Method leaves room for future changes in many places.
For example, the user can supply an additional technique, such as the Cutting Plane Method, if certain portions of the feasible region display linear characteristics. This kind of detail must be developed after the design has been implemented, provided the design of the algorithm makes provision for these additions.

The author believes that problems with separable constraints will be the first application of the Combined Method. Linear programming problems, which are a subset of the class of separable polynomials, might be better solved with a nonlinear technique for very large problems; quasi-linear problems are included in this framework.

Problems with general constraints are the least well defined and will therefore be the last application of the Combined Method to be fully realized. Nonetheless, much progress can be made with these applications by minimizing the global-local difficulties. Effort should be initiated in this area.

APPENDIX A

CALCULATIONS FOR THE EXAMPLE PROBLEM OF THE PROJECTED GRADIENT METHOD

Minimize f(X) subject to

    g_1(X) = -x_1^2/4 + 2x_1 - x_2 ≥ 0
    g_2(X) = x_1 x_2 - 4 ≥ 0
    g_3(X) = x_1 ≥ 0
    g_4(X) = x_2 ≥ 0 .

Note that

    ∇f = [-3, -1]
    ∇g_1(X) = [-x_1/2 + 2, -1]
    ∇g_2(X) = [x_2, x_1] .

Setting X_1 = [2, 3], we have g_1(X_1) = 0 and g_2(X_1) = 2 > 0, so only g_1 is active. Then

    U_1 = [∇g_1(X_1)] = [1, -1] ,    U_1 U_1^T = 2 ,
    V_1 = (U_1 U_1^T)^{-1} = 0.5 ,
    R_1 = V_1 U_1 ∇f_1 = -1 ,
    U_1^T R_1 = [-1, 1] ,
    P_1 ∇f_1 = ∇f_1 - U_1^T R_1 = [-2, -2] .

Move in the direction of the projected gradient P_1 ∇f_1. Let d = 2.0. Now

    N_1 = P_1 ∇f_1 / ‖P_1 ∇f_1‖ = [-0.707, -0.707]
    Y_1 = X_1 + d N_1 = [0.586, 1.586]
    W(Y_1) = g_1(Y_1) = [-0.5]
    correction = U_1^T (U_1 U_1^T)^{-1} W(Y_1) = [-0.25, 0.25]
    Y_2 = Y_1 - correction = [0.836, 1.336]
    W(Y_2) = [0.161]
    correction = U_1^T (U_1 U_1^T)^{-1} W(Y_2) = [0.081, -0.081]
    Y_3 = Y_2 - correction = [0.756, 1.416]
    W(Y_3) = -0.047 ,

and continue until |g_1(X)| < 0.005 .
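A short script, using the same data as the example above, rechecks the first projected-gradient step and then runs the boundary correction to the 0.005 tolerance; U_1 is held fixed during the corrections, as in the example:

```python
# Active constraint g1 at X1 = [2, 3]; gradient grad_f = [-3, -1]
# as given in the example.
g1 = lambda x: -x[0] ** 2 / 4.0 + 2.0 * x[0] - x[1]
U = [1.0, -1.0]                    # grad g1 at X1, held fixed
grad_f = [-3.0, -1.0]

uut = U[0] * U[0] + U[1] * U[1]    # U1 U1^T = 2
R = (U[0] * grad_f[0] + U[1] * grad_f[1]) / uut   # R1 = -1
p = [grad_f[0] - U[0] * R, grad_f[1] - U[1] * R]  # P1 grad f1 = [-2, -2]

# step of length d = 2 along the normalized projected gradient
norm = (p[0] ** 2 + p[1] ** 2) ** 0.5
y = [2.0 + 2.0 * p[0] / norm, 3.0 + 2.0 * p[1] / norm]  # [0.586, 1.586]

# correct back toward the surface g1 = 0 until |g1(y)| < 0.005,
# reproducing the sequence Y1, Y2, Y3, ...
for _ in range(50):
    w = g1(y)
    if abs(w) < 0.005:
        break
    y = [y[0] - U[0] * w / uut, y[1] - U[1] * w / uut]
```

The loop passes through [0.836, 1.336] and [0.756, 1.416], matching the hand calculation, before the constraint value drops below the tolerance.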
The solution thus obtained is not very good, owing to the arbitrariness with which the value d = 2.0 was selected.

Adding a constraint: if X_1 = [2, 3], then g_2(X_1) = 2 > 0, while at the corrected point above g_2 = -2.93 < 0, so the step has crossed the constraint g_2. By interpolation,

    d = [2 / (2 + 2.93)] × 2.0 = 0.81
    Y_1 = X_1 + d N_1 = [1.428, 2.428]
    W(Y_1) = [-0.081]
    correction = U_1^T (U_1 U_1^T)^{-1} W(Y_1) = [-0.040, 0.040]
    Y_2 = Y_1 - correction = [1.468, 2.388]
    W_1(Y_2) = 0.014 ,    W_2(Y_2) = -0.49 .

By continuing the above process one finds

    X_2 = [1.578, 2.534]    with    g_1(X_2) = g_2(X_2) = 0 .

Now check for the optimum at X_2. In this case U_2 is a 2 × 2 nonsingular matrix, so

    P_2 = I - U_2^T (U_2 U_2^T)^{-1} U_2 = 0 ,

and hence ‖P_2 ∇f_2‖ = 0. Computing

    R_2 = (U_2 U_2^T)^{-1} U_2 ∇f_2

gives R_21 < 0 and R_22 < 0; therefore X_2 is a local optimum.

APPENDIX B

FLOW DIAGRAMS OF THE COMBINED METHOD

[Flow diagram: Summary of the Flow Diagram. Read the input problem. Phase I finds a feasible point with the sequential unconstrained method, printing the interior solutions; if no solution exists, "infeasible problem" is printed and the system halts. Phase II applies the locking test, finds a local stationary point, looks for a better solution, and prints the results before halting.]

[Flow diagram: Phase I of the Algorithm. The constraints are divided into three sets. General constraints are stored across the PEs; separable constraints are replaced by separable polynomial approximations whose coefficients are stored with a flag. For k = 1, 2, ..., the k-th solution (or multiple k-th solutions) is found.]

[Flow diagram: Separable Polynomials. For each k, the k-th penalty p_k is found, the k-th revised objective function is formed and minimized by a second-order method or by conjugate directions, and further starting vectors are formed as required.]
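The optimality check at X_2 can likewise be verified numerically. With both constraints active, U_2 is square and nonsingular, so P_2 vanishes and the R-vector solves U_2^T R = ∇f; Cramer's rule (a sketch using the example's numbers) gives both components negative, matching the conclusion above:

```python
# Optimality check at X2 = [1.578, 2.534]: both g1 and g2 active.
x = [1.578, 2.534]
grad_f = [-3.0, -1.0]
grad_g1 = [-x[0] / 2.0 + 2.0, -1.0]     # approximately [1.211, -1]
grad_g2 = [x[1], x[0]]                  # [2.534, 1.578]

# Solve U2^T R = grad_f by Cramer's rule; the columns of U2^T are
# the active-constraint gradients grad_g1 and grad_g2.
det = grad_g1[0] * grad_g2[1] - grad_g2[0] * grad_g1[1]
R1 = (grad_f[0] * grad_g2[1] - grad_g2[0] * grad_f[1]) / det
R2 = (grad_g1[0] * grad_f[1] - grad_f[0] * grad_g1[1]) / det
# R1 and R2 both come out negative, so X2 is a local optimum
```

Because U_2 is invertible, P_2 = I - U_2^T (U_2 U_2^T)^{-1} U_2 collapses to the zero matrix identically, so only the signs of R need to be checked.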
[Flow diagram, continued: the unconstrained minimization loop. Either find all ∇f's and all step lengths and move to the next solutions in parallel, or find a single ∇f and step length and move there; then set k = k + 1.]

[Flow diagram: the projected gradient loop. Find P_k W_k (singly or in multiple), find the best step length, move in that direction, project back into the feasible region, compare all solutions, and pick the best one.]

[Flow diagram, continued: the sequential unconstrained method. Determine the approach, determine the penalty function, and form the revised objective function.]
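The penalty boxes of the diagrams above ("find the k-th penalty p_k, form the k-th revised objective function, minimize") can be sketched with an exterior penalty on a toy one-variable problem. The problem, the penalty schedule, and the interval-shrinking inner minimizer are all illustrative stand-ins for the second-order and conjugate-direction routines named in the diagrams:

```python
def argmin_1d(F, a=-5.0, b=5.0, tol=1e-8):
    """Shrink [a, b] around the minimizer of a unimodal F."""
    while b - a > tol:
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if F(m1) < F(m2):
            b = m2
        else:
            a = m1
    return 0.5 * (a + b)

def sumt_exterior(f, g, r0=1.0, growth=10.0, sweeps=6):
    """Sequential unconstrained minimization, exterior-penalty form:
    minimize f(x) + r * min(0, g(x))^2 for an increasing sequence of
    penalty weights r, approaching the constrained optimum from
    outside the feasible region g(x) >= 0."""
    r, x = r0, None
    for _ in range(sweeps):
        F = lambda x, r=r: f(x) + r * min(0.0, g(x)) ** 2
        x = argmin_1d(F)
        r *= growth
    return x

# toy problem: minimize x^2 subject to x - 1 >= 0; optimum at x = 1
f = lambda x: x * x
g = lambda x: x - 1.0
x_star = sumt_exterior(f, g)
```

Each sweep is one revision of the objective function; the iterates approach the constrained optimum from the infeasible side, which is the exterior-points behaviour described in Section 2.4.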