Report No. 340

A MODEL FOR LINEAR PROGRAMMING OPTIMIZATION OF I/O-BOUND PROGRAMS

by

David E. Gold

June, 1969

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the Advanced Research Projects Agency as administered by the Rome Air Development Center under Contract No. USAF 30(602)4144, and submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, June, 1969.

A Model for Linear Programming Optimization of I/O-Bound Programs

ABSTRACT

A model of a class of machines having periodically addressable secondary memory is derived. The problem of (timewise) optimization of programs consisting of many core-loads is considered with respect to the model. For a large class of programs, sufficient constraints are established to make the solution amenable to linear programming methods.

ACKNOWLEDGMENT

I wish to thank Professor D. J. Kuck for his suggestions and criticisms during the evolution of this study. I would also like to extend my thanks to the Department of Computer Science and the ILLIAC IV Project for their financial support. Finally, I am grateful to Mrs. Sharon Hardman for her excellent job in preparing this manuscript.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 History
   1.2 Description of the Algorithm
   1.3 Evaluation
2. PRELIMINARY
   2.1 Code Graph
   2.2 Basic Assumptions
3. SIMPLE COMPUTATION MODEL AND ESTABLISHMENT OF CONSTRAINTS
   3.1 Basic Model
   3.2 Memory Constraint
   3.3 Disk Timing Constraints
   3.4 Additional I/O Constraints
   3.5 Sequencing Constraints
       3.5.1 Standard
       3.5.2 Special Case: Temporary Variables
   3.6 I/O Contiguity Constraints
   3.7 System Constraints
   3.8 Criterion Function
4. BOUNDARY CONDITIONS
   4.1 Constraints Which Apply to All Nodes
   4.2 New Sequencing Constraints
5. OTHER CONSIDERATIONS
   5.1 Real Code
       5.1.1 Looping
       5.1.2 Branching
   5.2 Overlapping Nodes
   5.3 Reduction of Variables
APPENDIX
   A. SUMMARY OF CONSTRAINTS
   B. EXAMPLE

1. INTRODUCTION

1.1 History

As computers become faster and larger, the programs which run on them expand also. For an increasing proportion of programs (those manipulating very large amounts of data), a large amount of the "execution time" may not be spent executing. The speed at which these programs run is a function of the time it takes to move their necessary data between primary and secondary memory. It is to this class of problems (commonly called "I/O bound") that the method described here is directed.

The objective was to develop a model of a program together with a model of both the primary and secondary memories of the processor on which it is to be executed. The model must be suitable for analysis. In particular, the goal was to formulate a linear programming specification of the model such that, for most types of source code, the solution is an altered code which completes its execution in the shortest possible time and is computationally equivalent to the original program. It is obvious, however, that there can exist no algorithm for reading programmers' minds; this immediately precludes consideration of every program which is computationally equivalent to the original. On the other hand, a very large subset of these is obtained by considering the original code and permutations of its statements. Indeed, it is the reordering of statements alone which may allow data to be transferred at optimal or near-optimal times.
1.2 Description of the Algorithm

The algorithm presented here was developed for use on a machine (or, more precisely, a system) having one or more fixed-head disks. The method is, in fact, sufficiently general to handle any system with periodically accessible secondary memory (e.g., drums, delay lines). For obvious historical reasons, the word "disk" as used in this paper is synonymous with these more general secondary memory devices.

The algorithm first models a program--in sections--with a directed graph (called a code graph). This code graph represents the structure of the replacement statements in each section of code in such a way that reorderings of these statements, when beneficial for I/O reasons, are guaranteed not to affect the final calculated results of the code.

The execution of the program being modeled is quantized in the model. That is, the time axis is considered to be a series of small increments of time in which various operations (execution, input, output, etc.) may occur. Since there is no a priori knowledge of the optimal solution, all such operations are initially allowed. This is accomplished by considering a large number of binary integer variables as "flags" or indicators. If a particular variable is assigned the value "1" in a solution, the operation with which it is associated is performed in the time interval to which it corresponds. It is upon this set of variables that the constraints are placed, guaranteeing a solution which is physically realizable but not restrictive with respect to unusual use of the I/O facilities. Such techniques as keeping multiple copies of data on the disk and relocating data on the disk are therefore allowed in the final solution.

1.3 Evaluation

While the model presented in this paper yields too many variables to be adequately tested with present linear programming methods, several observations may nonetheless be made.
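One such observation concerns sheer problem size. The model uses six binary variables per node per time interval (6mn variables in all) and, as tallied in Appendix A, roughly 10mn + 3(m+n) constraints. A small sketch of this arithmetic (the function name is ours, purely illustrative):

```python
# Rough sizing of the 0-1 LP described in this report: 6 binary variables
# (compute, input-begin, input-continue, output-begin, output-continue,
# overlay) per node per time interval, and approximately 10mn + 3(m+n)
# constraints, the count derived in Appendix A.

def problem_size(m, n):
    """Return (number of binary variables, approximate number of constraints)
    for a code graph of n nodes quantized into m time intervals."""
    variables = 6 * m * n
    constraints = 10 * m * n + 3 * (m + n)
    return variables, constraints

# The small section of code treated in Appendix B: 12 nodes, 24 intervals.
print(problem_size(24, 12))   # -> (1728, 2988)
```

Even this toy example yields well over a thousand 0-1 variables, which makes clear why the model outruns the integer-programming codes of the day.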
There will be an obvious trade-off between the time taken to optimize a program (the execution time for an LP program to find a solution for the model derived here) and the time which it actually saves. At one end of the spectrum are those programs which spend a relatively small part of their time waiting for I/O. At the other end are programs in which the majority of the execution time is really I/O "wait" time. It is clearly toward the latter end that the model presented in this paper will be useful. At just what point in the spectrum it begins to become economical, only testing with one or more LP programs can determine. [It should also be mentioned that the ability to optimize code is highly dependent upon the freedom available in reordering statements of the code. In cases where little or no reordering is possible, while the model will yield constraints forcing a rapid solution, little optimization may be accomplished.]

2. PRELIMINARY

2.1 Code Graph

A code graph corresponding to a set of instructions is defined by the following algorithm. The code must have no transfers (i.e., it is assumed to be a sequential list of operations).

1) For each variable* name which has a second occurrence on the left of a replacement statement (that is, any variable which has its value assigned more than once in the code), do the following:

   a) rename the second such occurrence such that the new name is unique with respect to the variables already appearing in the code;

   b) rename all occurrences of the same variable which occur subsequent to the one changed to the same new name (this is done on the left side of replacement statements, if necessary, yielding new "second occurrences").

2) Repeat 1 until no more such occurrences remain.

3) Obtain a graph from this modified code by placing nodes in one-to-one correspondence with the replacement statements in the code. Each node is to be connected (by directed branches) to those nodes corresponding to the computation of the variables which appear on the right side of the statement to which it corresponds. The direction assigned to each branch is from predecessor to successor (intuitively, the order in which the computation must take place). Initial (non-computed) variables have their corresponding nodes placed such that they are not successors of any nodes.

* By "variable" here is meant the standard programming concept of variable as applied to large blocks of data (e.g., arrays, matrices).

Example: Consider the following segment of code (where no assumptions are made about the meaning of the operations +, -, *):

    A  <- B*C
    D  <- B+A-C
    A  <- A-D
    A  <- A*D

Step 1, first iteration, yields:

    A  <- B*C
    D  <- B+A-C
    A' <- A-D
    A' <- A*D

Step 1, second iteration:

    A   <- B*C
    D   <- B+A-C
    A'  <- A-D
    A'' <- A'*D

[Figure: the code graph corresponding to this example.]

With each node of this code graph is associated the execution time for the operation which corresponds to that node. It is assumed that all operations (for given data sizes) have a known execution time; this is then normalized so that it is represented as an integer multiple of the time intervals (to be defined later) chosen for the particular model.* This normalized time is denoted n_{c_i} for the i-th node.

* That is, the time intervals are chosen to be at least as short as the shortest execution time, so that all execution times may be represented as multiples of time intervals.

Note that a node may be considered as both an operation (calculation) on some data (which we assume to take a fixed amount of time) as well as the generation of some data. For ease of explanation, then, the word "node" will be used interchangeably for both calculations and produced data, where such use is unambiguous.

2.2 Basic Assumptions

The following simplifying assumptions are made. All calculated data is to be saved. (That is, no provision is made for recognizing purely
temporary intermediate results, so one copy of everything will eventually be stored on the disk. It will be shown later that the model can be refined to avoid this inefficiency.)

The disk is assumed to be infinite. In other words, the amount of data to be written on the disk is significantly smaller than the disk's capacity; hence the assumption that disk space is always available when and where it is required. The actual transmission time for each block is negligible compared to the rotation (cycle) time of the disk.

Primary memory, while fixed in size, is assumed to be capable of storing (holding) data in a form suitable for direct calculation. It may, however, be the case that for many programs it is not possible to use primary memory with perfect efficiency. This is somewhat analogous to having a number of differently dimensioned rectangles and attempting to place them (non-overlapping) in a larger rectangle, the area of which is equal to the combined areas of the smaller rectangles. The situation occurs here because the model does not consider the formatting of primary memory whereas, for large problems of the type being considered, its contents (data) must be highly structured. A reasonable approximation to real life can be obtained by considering primary memory as being smaller than it actually is. That is, a certain "bias" is introduced to account for the "holes" or unused primary memory due to the rigid data structure. This bias is best considered to be particular to each program and memory size. For the remainder of this paper, Memory_max or Mem_max will be taken to be the biased maximum useable primary memory. In other words, no storage formatting problems are considered. (Thus, it is possible to effect legitimate solutions to optimizing code by making the memory "look" smaller than it really is for the machine under consideration. This has the effect of allowing enough extra room to store arrays which do not naturally "fit together" in the restricted memory.)

3. SIMPLE COMPUTATION MODEL AND ESTABLISHMENT OF CONSTRAINTS

3.1 Basic Model

The model is a two-dimensional array of vectors.

[Figure: an array whose columns are numbered 1, 2, 3, ..., m, with time as the abscissa.]

Each column refers to a unique time interval. The intervals are of equal length; their common length is chosen so that every calculation time may be expressed as an integral number of intervals. m is the total number of such intervals and is fixed by obtaining any upper bound on the computation time for the entire section of code being considered.* Clearly, an upper bound on the running time is easily obtained. The time interval and m determine the amount of time we consider--not the time the program runs. Indeed, the program should execute in somewhat less time than this upper bound, the optimization criterion being that the last block to finish computing do so at the earliest possible time (the number of time intervals used is minimum).

* A perfectly legitimate upper bound on total computation time may be obtained by just considering the original code executed in sequential order with appropriate delays for I/O.

The ordinate of the matrix is node number (the nodes having been ordered arbitrarily). Each entry in the matrix is a vector having six elements. Thus, we have the following set of variables:

    x_{ijk}

where k is the time interval, 0 <= k <= m; j is the node number, 1 <= j <= n; and 1 <= i <= 6, corresponding to the actions

    i = 1  computation
    i = 2  input (begin)
    i = 3  input (continuing)
    i = 4  output (begin)
    i = 5  output (continuing)
    i = 6  overlay

Recalling the equivalence between node, computation and data, the meaning of the variables x_{ijk} is as follows: x_{1jk}
= 1 iff the j-th node is "computing" in the k-th time interval;

    x_{2jk} = 1 iff the j-th node (data) starts being input in the k-th time interval (that is, this data was not being input during the (k-1)-th time interval);

    x_{3jk} = 1 iff the j-th node is being input in the k-th time interval;

    x_{4jk} = 1 iff the j-th node starts being output in the k-th time interval;

    x_{5jk} = 1 iff the j-th node is being output in the k-th time interval;

    x_{6jk} = 1 iff the memory space occupied by the j-th node is overlaid (released, made available) immediately at the beginning of the k-th time interval.

In what follows, one or more of the subscripts may be omitted when our concern is only with the remaining one(s) and the meaning is clear. The various constraints on the variables will now be introduced.

3.2 Memory Constraint

The amount of available primary memory is fixed, and the amount of data contained therein must not exceed its capacity. This yields m constraints:

    Σ_{r=0}^{k} Σ_{j=1}^{n} ( x_{3jr}/p_j + x_{1jr}/n_{c_j} - x_{6jr} ) * (size of the j-th block) <= Memory_max

The above inequality must hold for all values of k, 1 <= k <= m, yielding m constraints. p_j is the number of time intervals required to transmit the j-th block (of data). The summation, then, yields 0 iff the j-th block is not in primary memory at time k (more accurately: at the k-th interval); otherwise, the result is the sum of all memory taken up by blocks in memory plus the fractional portion of any blocks in the process of being input. The model thus assumes that blocks of data are input in such a fashion that the amount of memory taken up by them is simply a ramp function over the time intervals in which they are being input.

Note that it is possible to substitute the simpler constraint

    Σ_{r=0}^{k} Σ_{j=1}^{n} ( x_{2jr} + x_{1jr}/n_{c_j} - x_{6jr} ) * (size of the j-th block) <= Memory_max

for the previous one. The difference is in the resolution of occupied memory: the latter constraint requires that room be available in memory for an entire block when its transmission begins. This is analogous to a square-wave description of memory usage.

3.3 Disk Timing Constraints

Here the fact that we are considering only fixed-head disks is used. We require that the time between when a block is output and subsequently input be a multiple of the disk revolution time.* (The reason is obvious: data cannot wander around on the disk--it must remain where it is put.) Let r be the (integral) number of time intervals corresponding to a single disk revolution. Then for each node j,

    x_{2jk} - x_{4j(k-lr)} <= 0    for some integer l >= 1

(for x_{2jk} = 1, some x_{4j(k-lr)} an integral number of revolutions earlier must be 1; for x_{2jk} = 0, we are not concerned with the values of x_{4jt} for integral numbers of revolutions earlier). Written as a linear constraint:

    for all j, for all k:  x_{2jk} - Σ_{t=s, s+r, ..., f} x_{4jt} <= 0,    s = k modulo r,  f = k - r

(note that r is a parameter which is constant for any particular system, allowing the generation of the above constraints).

* Note that the actual transmission time is assumed to be negligible. In cases where this is not so, the time between output and subsequent input is an integral number of disk rotations plus the time for two transmissions of the data. The constraint is then modified accordingly.

3.4 Additional I/O Constraints

The disk timing constraints are sufficient only for continuous data transfer. That is, they guarantee only that the initiations of input and output be "modulo the disk." Here additional constraints establish continuous transmission (let p_j be as defined previously):

    p_j * x_{2jk} - Σ_{i=k}^{k+p_j-1} x_{3ji} <= 0

(for x_{2jk} = 1, we start to input in the k-th time interval. Since there are p_j terms in the summation, there must be p_j continuous intervals of data transmission commencing with the k-th. For x_{2jk} = 0, nothing is constrained.)
There is also a similar set of constraints for output:

    p_j * x_{4jk} - Σ_{i=k}^{k+p_j-1} x_{5ji} <= 0

The above constraints do not disallow initiation of input or output more than once within any p_j intervals--which could be precluded with another set of constraints of the form

    Σ_{i=k}^{k+p_j-1} x_{2ji} <= 1

    Σ_{i=k}^{k+p_j-1} x_{4ji} <= 1

but, because of the restrictions on the allowable sequences of computation, input, output and overlay, these are redundant.

3.5 Sequencing Constraints

3.5.1 Standard

Let the letters C, I, O, D represent the operations (in primary memory) compute, input, output and overlay (destroy), respectively. For any node, then, the temporal sequence of operations associated with it can be expressed

    C O O* D (I O* D)*

where * is the Kleene star. In other words, any node must first be computed, then output some number (>= 1) of times and overlaid. This may then be followed by any number (possibly zero) of the following sequences: input, output zero or more times, overlay. All non-permitted sequences are disallowed by the constraints given in this section.

    for all k:  Σ_{l=0}^{k} ( x_{6jl} - x_{2jl} ) <= 1

    for all k:  Σ_{l=0}^{k} ( x_{2jl} - x_{6jl} ) <= 0

guarantee the proper relative sequencing of I and D [D(ID)*].

    for all k:  Σ_{l=0}^{k} ( x_{6jl} - x_{2jl} ) + x_{4jk} <= 1

requires that a block be (at least started to be) input before it may start being output. (Essentially, this establishes that O's appear only in the allowed sequence.) It is still necessary to require the computation and output (in that order) before the remainder of the sequence.

Let C_j be the integer formed by considering the (binary) variables x_{1j0} x_{1j1} ... x_{1jm} as an integer in binary notation. That is,

    C_j = Σ_{l=0}^{m} x_{1jl} * 2^{m-l}

Similarly,

    O_j = Σ_{l=0}^{m} x_{4jl} * 2^{m-l}

    D_j = Σ_{l=0}^{m} x_{6jl} * 2^{m-l}

Then we require at least one overlay:

    Σ_{l=0}^{m} x_{6jl} >= 1

at least one output preceding it:

    O_j >= D_j,  or  Σ_{l=0}^{m} ( x_{4jl} - x_{6jl} ) * 2^{m-l} >= 0

and computation preceding this:

    C_j >= O_j,  or  Σ_{l=0}^{m} ( x_{1jl} - x_{4jl} ) * 2^{m-l} >= 0

The last constraint, however, is not sufficient. It is still possible for computation to be unfinished when the first output begins. This is disallowed by forming the next constraints, viz.: let n_{c_j} be the number of time intervals for which the j-th node (computation) must take place. Then

    Σ_{l=0}^{m} x_{1jl} = n_{c_j}

and it is necessary to ensure that the "right-most," or latest, computation variable with value 1 still lies "to the left of," or before, the first output. The constraints are

    for all k, for all j:  x_{1jk} + Σ_{l=0}^{k} x_{4jl} <= 1

(for any x_{1jk} = 1, there can be no x_{4jl} = 1 for l <= k). The constraints corresponding to the inequality C_j >= O_j are implied by this set and may therefore be discarded.

3.5.2 Special Case: Temporary Variables

In the case of nodes which are only temporary variables, there is no reason to require that a copy of same ultimately be placed on the disk. The allowable sequences for such nodes are therefore

    C O* D (I O* D)*

Such an alteration in sequencing (no longer requiring an output) can be accomplished by substituting the inequality C_j >= D_j for the inequality O_j >= D_j. This yields the constraint

    Σ_{l=0}^{m} ( x_{1jl}/(2 n_{c_j}) - x_{6jl} ) * 2^{m-l} >= 0

where the 2 n_{c_j} in the denominator of the compute term guarantees that a node is not overlaid before it is completely computed.

3.6 I/O Contiguity Constraints

The constraints presented in this section perform two functions: i) the structure of the code graph is represented here, and ii) data which is needed for any calculation is forced to be entirely in primary memory before the calculation begins.

Definition: The requirements of a node j (or, more simply, the requirements of j) are the predecessors of j. Thus, in the example of Section 2.1, B and C are the only requirements of A; A is a requirement of D; etc.

We require that for any node j, when x_{1jk} = 1, every one of the requirements of j be entirely in memory. [Clearly the requirement that the temporal order of execution in the solution be consistent with the code graph is met if the previous requirement is (data may not exist before it is calculated).]
    x_{1jk} - Σ_{r=0}^{k} ( x_{3ir}/p_i + x_{1ir}/n_{c_i} - x_{6ir} ) <= 0    for all i which are requirements of j*

[The summation is seen to range between 0 and 1, as the portion of i in memory. The constraint simply guarantees that when j is being calculated (x_{1jk} = 1), the summation term must also be unity.]

* Except for those nodes i which are initial nodes.

3.7 System Constraints

It is here that constraints peculiar to the system being considered appear. Such items as multiprogramming, the number and interrelation of data channels, etc., are "specified" as linear constraints. It is not the purpose of this paper to anticipate the various system configurations one might consider. It is, rather, to indicate that appropriate constraints would be generated, and to consider one specific example (albeit a simple one).

Consider a machine which can transfer data over only one channel at a time. This gives the constraints

    Σ_{j=1}^{n} ( x_{3jk} + x_{5jk} ) <= 1

This machine is furthermore not allowed to perform more than one "calculation" (as that word is used in this paper) at a time:

    Σ_{j=1}^{n} x_{1jk} <= 1

There are assumed to be no further constraints, although it is obvious that many can be considered for various systems. Indeed, many differently configured systems may be described by suitably establishing the proper constraints here.

3.8 Criterion Function

Considering the constraints which require an overlay following a required output following a required computation for each node which is ultimately saved,* it suffices to further require only that the final (timewise) overlay occur at the minimum time. That is,

    minimize  Σ_{l=0}^{m} ( Σ_{j=1}^{n} x_{6jl} ) * 2^{l}

* That is, ignoring those nodes which are "intermediate data" and never output.

4. BOUNDARY CONDITIONS

The constraints which have been given are for nodes which are computed, that is, nodes corresponding to replacement statements in the original code. It behooves us now to provide suitable alterations to these constraints as they apply to "initial nodes." (Initial nodes are those corresponding to input data, which hence never get computed.) Each initial node may be thought of as having the same set of six variables associated with it as do the other nodes. The first, x_{1jk}, is always 0 for 0 <= k <= m, 1 <= j <= n. This is done to allow as many of the previous constraints as possible to hold.

4.1 Constraints Which Apply to All Nodes

The following constraints remain valid for initial nodes.

Memory constraints:

    Σ_{r=0}^{k} Σ_{j=1}^{n} ( x_{3jr}/p_j + x_{1jr}/n_{c_j} - x_{6jr} ) * S_j <= Mem_max

(noting that the second term is always zero for initial nodes and may be omitted);

Disk timing constraints:

    x_{2jk} - Σ_{t=s, s+r, ..., f} x_{4jt} <= 0

I/O contiguity constraints:

    p_j * x_{2jk} - Σ_{l=k}^{k+p_j-1} x_{3jl} <= 0

    p_j * x_{4jk} - Σ_{l=k}^{k+p_j-1} x_{5jl} <= 0

4.2 New Sequencing Constraints

It is necessary to change the sequencing constraints, since initial nodes must have different temporal sequences of basic operations than the other nodes. This is because i) initial nodes are not computed; ii) there is no reason to require a final copy on the disk, as we assume that there is a copy there to begin with; and iii) since the initial copy is assumed to originate on the disk, it is not possible to reference its location there relative to its first (non-existent) output. The allowable sequences, therefore, are

    (I O* D)(I O* D)*

Note that it will not be necessary to establish constraints requiring the initial (I O* D) portion. Requiring (I O* D)* only is sufficient, because the in-memory constraints will force the input of the block at least once. We first guarantee the proper relative sequencing of I and D:

    for all k:  Σ_{l=0}^{k} ( x_{6jl} - x_{2jl} ) <= 0

forces (ID)*;

    for all k:  Σ_{l=0}^{k} ( x_{6jl} - x_{2jl} ) + x_{4jk} <= 0

requires output to follow input (when it occurs). The remainder of the normal sequencing constraints are not applicable.

Note that the constraints in this section allow the initial nodes to be stored initially at any location on the disk, thus allowing full generality.
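All of the constraint families above are linear in the 0-1 variables x_{ijk}, so generating them mechanically is straightforward. A minimal sketch in modern notation (all function names are ours, not the report's) of how the variables might be flattened into LP columns and how two of the families could be emitted as coefficient rows:

```python
# Sketch: lay out the 0-1 variables x_{ijk} of Section 3.1 as LP columns and
# generate two constraint families as (coefficients, sense, rhs) rows.
# Illustrative only; a real generator would emit every family of Appendix A.

def var_index(i, j, k, n, m):
    """Flatten x_{ijk} (action i in 1..6, node j in 1..n, interval k in 0..m-1)
    into a single column index."""
    return ((i - 1) * n + (j - 1)) * m + k

def channel_constraints(n, m):
    """Section 3.7: at most one block transmitting in any interval:
       sum_j (x_{3jk} + x_{5jk}) <= 1  for every k."""
    rows = []
    for k in range(m):
        row = [0] * (6 * n * m)
        for j in range(1, n + 1):
            row[var_index(3, j, k, n, m)] = 1   # input continuing
            row[var_index(5, j, k, n, m)] = 1   # output continuing
        rows.append((row, "<=", 1))
    return rows

def compute_duration_constraints(n_c, n, m):
    """Section 3.5.1: each node j computes for exactly n_c[j-1] intervals:
       sum_k x_{1jk} = n_{c_j}."""
    rows = []
    for j in range(1, n + 1):
        row = [0] * (6 * n * m)
        for k in range(m):
            row[var_index(1, j, k, n, m)] = 1   # compute flag
        rows.append((row, "=", n_c[j - 1]))
    return rows

# Tiny instance: 2 nodes, 4 intervals.
rows = channel_constraints(2, 4) + compute_duration_constraints([1, 2], 2, 4)
print(len(rows))   # 4 channel rows + 2 duration rows = 6
```

The remaining families (memory, disk timing, contiguity, sequencing) would be produced the same way, one row per (j, k) pair as counted in Appendix A.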
5. OTHER CONSIDERATIONS

5.1 Real Code

The method previously described is restricted to linear (sections of) code. In order to show some potential utility, however, it is necessary to show applications to "non-linear" code.

5.1.1 Looping

A loop may be thought of as the concatenation of a section of linear code with itself as many times as the loop is executed. It is clearly impractical, though, to generate such "expanded code" and then proceed to optimize it. Another method will be given for "optimizing" loops which is at worst only slightly inefficient.

Consider the consequences of first optimizing a single iteration of a loop and then allowing the repetition (looping) of this "optimized" piece of code. The first time through, the code would be linear and the execution time would indeed be minimal (in the sense discussed in this paper). Upon finishing execution of this first iteration, execution must recommence at the "beginning" of the linear section of code. In general, the data needed to begin this execution will not be in an optimal location on the disk, thus requiring a wait due to "disk latency" of anywhere up to one revolution time. The remainder of this section of code will find the items it needs from the disk at precisely those locations which are most advantageous. This is because, after the initial wait, the computation of the second iteration of the loop is "in phase" with that of the first iteration. If the linear section (one complete iteration) is of moderate length, it is reasonable to expect that the optimization process has reduced the running time by at least several disk revolutions (there would be little justification for it otherwise). This being the case, the loss of as much as a single revolution per iteration is not outrageous--especially when we consider a latency of only one-half revolution on the average.
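The arithmetic behind this argument can be made explicit. A back-of-the-envelope sketch (the function and all figures are ours, purely illustrative; the q parameter anticipates repeating the loop body q times before optimizing, which amortizes the re-entry latency):

```python
# If optimizing one iteration saves s_revs disk revolutions, but re-entering
# the optimized body costs an average latency of half a revolution, the net
# saving per iteration is s_revs - 0.5.  Repeating the body q times before
# optimizing spreads that latency over q iterations: s_revs - 0.5/q.

def net_saving_per_iteration(s_revs, q=1):
    """Net revolutions saved per original iteration, assuming an average
    re-entry latency of half a revolution per optimized body."""
    return s_revs - 0.5 / q

print(net_saving_per_iteration(3.0))      # body optimized singly:  2.5
print(net_saving_per_iteration(3.0, 4))   # body repeated 4 times:  2.875
```

So long as the optimization saves several revolutions per iteration, the half-revolution re-entry penalty is minor, which is the point of the paragraph above.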
The procedure of concatenating a previously optimized linear section, then, will still yield greatly reduced running times compared with the original (unaltered) code, in spite of the fact that, in general, the execution of the loop is not truly optimal.

Further refinements to the running time of a loop may be obtained by concatenating the loop with itself some number of times (usually much less than the total number of iterations at run time) before optimizing. That a saving of time is effected is clear. As was previously explained, any loss of time due to disk latency (on the average, one-half a disk revolution) occurs when the optimized code is concatenated with itself. If this time loss can be made to occur once for every q iterations (as is the case when the original code is repeated q times and then optimized) instead of once every iteration, the time lost is reduced by a factor of q in the long run. This method would be particularly valuable for loops which take very little time to execute but are repeated often. Such short loops need only be repeated several times before optimization.

5.1.2 Branching

In the absence of detailed information on the frequency and ordering of the various possibilities at each conditional transfer, it seems impossible to perform general inter-section optimization. Tentative investigation in this area has not yielded any algorithms for obtaining an optimal solution.

It is, however, possible to improve on a brute-force section-by-section optimization. For example, in the case of a two-way conditional branch preceded by a section of linear code, the following "optimization" may be performed. The initial linear code and either one of the two branches are considered to be one section of linear code. (It may be necessary to add a dummy instruction at the branch point to account for the timing of the conditional test there.) Upon this "new" code is placed one additional restriction: while statements are allowed to be reordered for the normal optimization algorithm, reorderings over the boundary established at the branch point are prohibited. This may easily be done, for example, through the introduction of constraints requiring that every node in the initial section be computed before every node in the post-branch section. The second branch is considered by itself afterwards, except that its initial conditions have already been determined by the previous optimization. In other words, the placement of data in primary and secondary memory at the branch point (resulting from the first optimization) now forms the boundary conditions for the second branch and its optimization.

Several refinements of the above algorithm have been considered. Unfortunately, however, these may best be described as heuristic approaches, since there is no real experience on which to base their value.

5.2 Overlapping Nodes

In many problems (e.g., PDEs, weather codes) there is a set of basic operations which is carried out on an extremely large amount of data arranged on a grid. These operations require the values of points in local areas of the grid for each calculation at a point. Since it is necessary to partition these large grids into smaller sections, information must somehow be provided across the boundaries so created. This can be done within the confines of the code graph model as follows. The requirements for each node include all over-boundary data (as many as 8 additional nodes for a five-point finite difference method on a rectangular grid). Each calculation produces, besides the updated information for its section of the grid, additional separate (although redundant) pieces of data--each of which is a requirement for at least one other node.
The previous constraints are perturbed such that, while only one calculation is considered to have taken place, it produces a multiplicity of new nodes, each of which is thereafter independent of its siblings.

5.3 Reduction of Variables

The procedure, as thus far described, yields 6mn binary variables.* Besides generating a large number, this "brute force" method ignores information useful for reducing the number of variables. It is possible, though, to eliminate a large number of the variables easily. Consider, for example, the node marked A' in the earlier example. Since the nodes A, B, C and D must all have been calculated before A' begins, it is pointless to consider the computation of A' "early" in the solution. Indeed, it is unnecessary to consider the compute variables (x_{1jk}) for all time intervals before the time it takes to compute all the predecessors of a given node. The required nodes are easily obtained by merely tracing up the graph (against the arrows) from the node under consideration; that is, all predecessors and their predecessors, recursively. It should also be clear that all the other variables (input, output, overlay) are forced to be 0 for these "early" time intervals in which the compute variables are also 0.

* Recalling, m = number of time intervals, n = number of nodes.

In a similar fashion, variables may be removed from "late" time intervals, with two exceptions: i) since no a priori knowledge of the optimal running time exists, the time intervals in question are those counted backwards from the maximum time; and ii) input, output, and overlay variables are not set to 0, as the compute variables are, in these regions.

There appears to be no analytic way of relating the saving (removal) of variables to the graph structure (upon which, indeed, it is highly dependent). Preliminary results indicate, however, savings of the order of magnitude of one-half the total number of variables (6mn).
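The tracing step described above can be sketched directly. The code below (names ours; it uses the four-statement example of Section 2.1, with each operation taking one interval and B, C initial) computes, for a node, the sum of the compute times of all its transitive predecessors, which is a lower bound on its earliest compute interval when only one calculation may proceed at a time:

```python
# Sketch of the variable-elimination rule of Section 5.3: the compute
# variables x_{1jk} for node j may be fixed to zero for every interval k
# earlier than the total compute time of all of j's predecessors, found by
# tracing the code graph upward (all predecessors, recursively).

def earliest_start(j, preds, n_c):
    """Minimum interval at which node j can begin computing.
    preds: dict node -> list of immediate predecessors
    n_c:   dict node -> compute time in intervals (initial nodes: 0)."""
    seen = set()
    def ancestors(v):
        for p in preds.get(v, []):
            if p not in seen:
                seen.add(p)
                ancestors(p)     # recurse: predecessors of predecessors
    ancestors(j)
    return sum(n_c[p] for p in seen)

# The example of Section 2.1: A <- B*C, D <- B+A-C, A' <- A-D, A'' <- A'*D.
preds = {"A": ["B", "C"], "D": ["B", "A", "C"],
         "A'": ["A", "D"], "A''": ["A'", "D"]}
n_c = {"B": 0, "C": 0, "A": 1, "D": 1, "A'": 1, "A''": 1}
print(earliest_start("A''", preds, n_c))   # A', D, A (B, C free) -> 3
```

Every variable x_{ijk} for node A'' with k below this bound can then be dropped from the LP before it is ever generated.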
It should be noted that the programs for which the preceding process yields great reductions of variables are those whose code graphs have many levels. Similarly, programs yielding code graphs with few levels do not result in large reductions of variables. (Intuitively, long code graphs are those in which there are long chains of nodes dependent on their predecessors, and hence many nodes which are constrained to start computing late, or to finish computing early.)

APPENDIX A

SUMMARY OF CONSTRAINTS

Memory:

    SUM[i=0..k] SUM[j=1..n] ( x_{1ji} + x_{4ji}/c_j - x_{3ji} ) (size of jth block)  <=  Mem_max ,    0 <= k <= m

    Number of Constraints: m

Disk Timing:

    x_{2jk} - SUM[t = s, s+r, ..., f] x_{4jt}  <=  0 ,    s = k modulo r ,  f = k - r

    Number of Constraints: mn

Continuous Transmission:

    [two constraint families, each summing over the p_j consecutive intervals i = k, ..., k + p_j - 1, requiring that the transmission of block j occupy consecutive intervals]

    Number of Constraints: mn each

Sequencing:

    All Nodes:

        SUM[i=1..k] ( x_{6ji} - x_{2ji} )  <=  1                          mn constraints
        SUM[i=1..k] ( x_{2ji} - x_{6ji} )  <=  0                          mn constraints
        SUM[i=1..k] ( x_{6ji} - x_{2ji} ) + x_{4j,k+1}  <=  1             mn constraints

    Non-Initial Nodes:

        SUM[i=1..m] x_{6ji}  <=  1                                        (standard)

        [temporary-variable form of the above; see Section 3.5.2]         (temporary)

        SUM[i=0..m] ( x_{6li}/c_l ) (m - i)  <=  0      for every node l which is a requirement of j

        SUM[i=0..m] x_{1ji} - n_{c_j}  =  0

        x_{1jk} + SUM[i=0..k] x_{4ji}  <=  1                              mn constraints

Contiguity:

    x_{1jk} - SUM[i=0..k] ( x_{1ji}/c_j + x_{3li} - x_{6li} )  <=  0      3mn constraints
    (one such constraint for each block l which is a requirement of node j)

System:

    SUM[j=1..n] x_{1jk}  <=  1                                            m constraints

    SUM[j=1..n] ( x_{3jk} + x_{5jk} )  <=  1                              m constraints

Total Number of Constraints: approximately 10mn + 3(m + n)

APPENDIX B

EXAMPLE

As an example of this method, let us consider the following sequence of replacement statements as a section of code which is to be optimized:
    F = f1(B,D)
    J = f2(A,F)
    C = f3(A,B)
    L = f4(C,J)
    E = f5(C,D)
    G = f6(B,E)
    H = f7(F,G)
    K = f8(F,H)
    I = f9(C,H)

where each of the "variables" A-L represents a block of data, their respective sizes being as follows (we will also associate block numbers 1-12 with variables A-L, respectively, as well as normalizing the available primary memory to unity):

    Size of block F = 1/8  of memory;  therefore, size of the 6th block  = 1/8
    Size of block J = 1/16 of memory;  therefore, size of the 10th block = 1/16
    Size of block C = 1/4  of memory;  therefore, size of the 3rd block  = 1/4
    Size of block L = 1/4  of memory;  therefore, size of the 12th block = 1/4
    Size of block E = 1/4  of memory;  therefore, size of the 5th block  = 1/4
    Size of block G = 1/8  of memory;  therefore, size of the 7th block  = 1/8
    Size of block H = 1/8  of memory;  therefore, size of the 8th block  = 1/8
    Size of block K = 1/16 of memory;  therefore, size of the 11th block = 1/16
    Size of block I = 1/16 of memory;  therefore, size of the 9th block  = 1/16

The code graph for this code is:

    [code graph figure]

and we will assign the following time intervals necessary to evaluate each of the (arbitrary) functions f1 - f9:

    Time for f1, f3, f7, f8, f9 = 1 interval;   therefore, n_{c_j} = 1 for j = 1,3,7,8,9
    Time for f2, f5, f6         = 2 intervals;  therefore, n_{c_j} = 2 for j = 2,5,6
    Time for f4                 = 3 intervals;  therefore, n_{c_4} = 3

where one revolution will be 4 intervals. In an effort to keep this example simple, all blocks are assumed to be transmitted between primary memory and secondary memory in a single time interval (p_j = 1, 1 <= j <= 12). We will further assume a maximum running time of 24 intervals (m = 24).

As an example of the flexibility available in specifying constraints for particular problems, we will make two alterations to the constraints as previously described. The first change deals with the constraints for initial conditions: it seems reasonable to allow the initial nodes to be in memory at the start (although it would not be reasonable to require that they be). This can be reflected by removing the constraint on the number of blocks which may be input during the 0th time interval.
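The dependencies in the statement list above fix the code graph. As an illustrative check (Python is used here purely for exposition; it is not part of the report's formulation), the depth of each computed block in the graph can be derived directly from the statements:

```python
# The nine replacement statements, as "result block -> required blocks".
deps = {
    'F': ['B', 'D'], 'J': ['A', 'F'], 'C': ['A', 'B'],
    'L': ['C', 'J'], 'E': ['C', 'D'], 'G': ['B', 'E'],
    'H': ['F', 'G'], 'K': ['F', 'H'], 'I': ['C', 'H'],
}

_levels = {}

def level(block):
    """Depth of a block in the code graph; the pure inputs A, B, D
    sit at level 0."""
    if block not in deps:
        return 0
    if block not in _levels:
        _levels[block] = 1 + max(level(r) for r in deps[block])
    return _levels[block]

# F and C depend only on inputs; K and I sit deepest in the graph.
print({b: level(b) for b in sorted(deps)})
```

These depths are exactly the chain lengths that the variable-reduction argument of Section 5.3 exploits.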
Thus, the solution may indicate that more than one block has been transmitted during the 0th time interval. This may in fact indicate that these blocks were loaded at the same time (just prior to run time), or it may reflect the situation in which these blocks were input (one at a time) earlier (intuitively, during intervals -1, -2, ...).

The second alteration is a minor one dealing with the use of memory. We will arbitrarily declare that memory can never be completely used. That is, the inequality "<=" in the memory constraint is replaced by the strict inequality "<", yielding:

    SUM[i=0..k] SUM[j=1..12] ( x_{1ji} + x_{4ji}/c_j - x_{3ji} ) S_j  <  1.0 ,    0 <= k <= 24

Sequencing (Non-Initial Nodes, j = 3 and 5 <= j <= 12):

    SUM[i=1..k] ( x_{6ji} - x_{2ji} )  <=  1                      1 <= k <= 24
    SUM[i=1..k] ( x_{2ji} - x_{6ji} )  <=  0                      1 <= k <= 24
    SUM[i=1..k] ( x_{6ji} - x_{2ji} ) + x_{4j,k+1}  <=  1         1 <= k <= 24
    SUM[i=1..24] x_{6ji}  <=  1
    SUM[i=0..24] x_{1ji} - n_{c_j}  =  0
    x_{1jk} + SUM[i=0..k] x_{4ji}  <=  1                          1 <= k <= 24

Contiguity (one constraint for each block l which is a requirement of block j):

    x_{1jk} - SUM[i=0..k] ( x_{1ji}/c_j + x_{3li} - x_{6li} )  <=  0      1 <= k <= 24

    j = 3:  l = 1,2        j = 5:  l = 3,4        j = 6:  l = 2,4
    j = 7:  l = 2,5        j = 8:  l = 6,7        j = 9:  l = 3,8
    j = 10: l = 1,6        j = 11: l = 6,8        j = 12: l = 3,10

System:

    SUM[j=1..12] x_{1jk}  <=  1                                   1 <= k <= 24
    SUM[j=1..12] ( x_{3jk} + x_{5jk} )  <=  1                     1 <= k <= 24
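The first alteration can be viewed as generating the per-interval transmission limit for every interval except the 0th. A sketch follows; the constraint representation and variable labels are illustrative assumptions, not the report's notation.

```python
# Sketch: emit the "at most one block input per interval" constraint
# rows, optionally skipping interval 0 so that any number of initial
# blocks may already be resident when the run begins.

def input_limit_constraints(m, n, relax_interval_zero=True):
    rows = []
    start = 1 if relax_interval_zero else 0
    for k in range(start, m + 1):
        # sum over blocks j of the interval-k input variable  <=  1
        rows.append({'vars': [('input', j, k) for j in range(1, n + 1)],
                     'rhs': 1})
    return rows

print(len(input_limit_constraints(24, 12)))          # 24 rows, intervals 1..24
print(len(input_limit_constraints(24, 12, False)))   # 25 rows, interval 0 included
```

Dropping the single interval-0 row is all that distinguishes the altered formulation from the standard one.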