UIUCDCS-R-75-724

AN ANALYSIS OF SCHEDULING ALGORITHMS IN MULTIPROCESSOR COMPUTING SYSTEMS

by

Nai-Fung Chen

May 1975

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the Department of Mathematics, the Department of Computer Science, and the National Science Foundation under grant GJ-41538 and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics, 1975.

ACKNOWLEDGMENT

I wish to express my sincere thanks to Professor C. L. Liu for his inspiring guidance and constant encouragement during the course of preparing this thesis. I would also like to thank Drs. Andrew and Frances Yao for numerous constructive conversations and discussions on the subject of this thesis as well as on other subjects. Last but not least, I wish to thank my wife, Ann, for keeping me happy throughout my studies.

TABLE OF CONTENTS

CHAPTER                                                              Page
1. INTRODUCTION ................................................... 1
   1.1 Motivation ................................................. 1
   1.2 A General Multiprocessing System ........................... 2
   1.3 Optimal Scheduling Problems and NP-Complete Problems ....... 6
   1.4 Heuristic Scheduling Algorithms ............................ 8
   1.5 A Preview .................................................. 9
2. A SURVEY OF PREVIOUS WORKS .................................... 10
   2.1 Introduction .............................................. 10
   2.2 The General Multiprocessing System ........................ 10
       2.2.1 A General Bound ..................................... 10
       2.2.2 Some Special Bounds ................................. 11
             2.2.2.1 The Critical Path Algorithm ................. 11
             2.2.2.2 The Coffman-Graham Algorithm (CGA) .......... 13
   2.3 The Augmented General Multiprocessing Model ............... 15
   2.4 The Heterogeneous Computing System ........................ 17
   2.5 Conclusion ................................................ 21
3. BOUNDS ON THE CRITICAL PATH SCHEDULING ALGORITHM .............. 22
   3.1 Introduction .............................................. 22
   3.2 Notations ................................................. 23
   3.3 Some Preliminary Lemmas ................................... 24
       3.3.1 Part 1 of the Proof of Theorem 1: a_1 <= a_n is assumed .. 25
       3.3.2 Part 2 of the Proof of Theorem 1: a_1 > a_n is assumed ... 27
   3.4 Conclusion of the Proof of Theorem 1 ...................... 41
4. A SUFFICIENT CONDITION FOR THE OPTIMALITY OF THE CRITICAL PATH ALGORITHM ... 46
   4.1 Introduction .............................................. 46
   4.2 A Generalization of Hu's Theorem .......................... 46
5. ON A VARIATION OF THE CRITICAL PATH ALGORITHM ................. 52
   5.1 The Critical Path and Count Algorithm ..................... 52
   5.2 Optimality of the Critical Path and Count Algorithm ....... 53
6. ON THE SUCCESSORS ALGORITHM ................................... 59
   6.1 The Successors Algorithm .................................. 59
   6.2 A Bound on the Successors Algorithm ....................... 59
7. CONCLUSION .................................................... 65
LIST OF REFERENCES ............................................... 66
VITA ............................................................. 68

LIST OF FIGURES

Figure                                                               Page
1.1 A directed graph representing a set of jobs ................... 4
1.2 A timing diagram representing a schedule ...................... 5
3.1 A partition of CPS in Type I and Type II blocks .............. 31
3.2 An illustration of X(I) and X(II) blocks of Lemma 3.8 ........ 34
3.3 An example of ω_CPA/ω_0 for a 4-processor system ............. 44
4.1 An example of < satisfying the hypothesis of Theorem 2 ....... 47
5.1 Example of Algorithm CPCA .................................... 56
5.2 Example for the definition of X .............................. 57
6.1 An example for the bound on the successors algorithm ......... 64

CHAPTER 1

INTRODUCTION

1.1 Motivation

One of the most significant advantages attributed to multiprocessor systems is the potential decrease in computation time for a large class of problems achievable by parallel programming, that is, the concurrent execution of independent portions of a computational job. This is especially important in real-time applications, when results are needed more quickly than they can be provided by single-processor systems.
However, it has been known [8] for some time that certain rather general models of multiprocessing systems frequently exhibit behavior which could be termed "anomalous": increasing the number of processors in the system, reducing the execution times of some of the jobs, or weakening the precedence constraints among the jobs can each cause an increase in the time needed to complete a set of jobs. In order to fully realize the potential benefits afforded by parallel processing, it becomes important to understand the underlying causes of this behavior and the extent to which the resulting system performance may be degraded. Indeed, how best to exploit the multiple-processor organization in order to obtain maximum performance is the primary concern of this thesis.

The performance of an algorithm can be measured in several rather different ways. Two of the most common involve examining the expected behavior and the worst-case behavior of the algorithm under consideration. Theoretical results regarding expected behavior require assumptions concerning the underlying probability distributions of the parameters involved and historically have been extremely resistant to attack. Fortunately, there are many situations for which worst-case behavior is the appropriate measure, in addition to the fact that worst-case behavior does bound expected behavior. It is this latter measure of performance which will be used on the model and algorithms discussed in this thesis. Since it is essential to have on hand the worst examples one can think of before conjecturing and proving bounds on worst-case behavior, numerous such examples will be given throughout the thesis.

1.2 A General Multiprocessing System

We consider the problem of scheduling a set of jobs on a multiprocessor computing system that has a set of identical processors capable of independent operation on independent jobs. Let P_1, P_2, ..., P_n denote the n identical processors in a multiprocessor computing system.
Let Q = {J_1, J_2, ..., J_r} denote a set of jobs to be executed on the computing system. We assume that the execution of a job occupies one and only one processor. Moreover, since the processors are identical, a job can be executed on any one of the processors. Let μ(J_i) denote the execution time of job J_i, that is, the amount of time it takes to execute J_i on a processor. There is also a partial ordering < specified over Q. It is required that if J_i < J_j then the execution of job J_j cannot begin until the execution of job J_i has been completed. (J_i is said to be a predecessor of J_j, and J_j is said to be a successor of J_i. J_j is an immediate successor of J_i if there is no J_k such that J_i < J_k < J_j.) Formally, a set of jobs is specified by an ordered triple (Q, μ, <) where μ is a function from Q to the reals. A set of jobs can be described by a directed graph such as that in Figure 1.1. There is a directed arc from the vertex (job) J_i to the vertex (job) J_j if and only if J_j is an immediate successor of J_i.

By scheduling a set of jobs on a multiprocessor computing system we mean to specify for each job J_i the time interval within which it is to be executed and the processor P_j on which execution will take place. An explicit way to describe a schedule is a timing diagram, also known as the Gantt chart. As an example, the timing diagram of a schedule for the execution of the set of jobs in Figure 1.1 on a two-processor computing system is shown in Figure 1.2. All schedules considered in this thesis are non-preemptive, meaning that once the execution of a job begins on a processor, it will continue until completion. In a given schedule, an idle period of a processor is defined to be a time interval within which the processor is not executing a job (while at least one other processor is executing some job).
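As an illustration of the model just described, a non-preemptive schedule on n identical processors can be simulated directly from μ, the precedence relation, and a priority assignment. The following is a hypothetical sketch, not code from the thesis; the function and parameter names are mine.

```python
def list_schedule(mu, preds, priority, n):
    """Simulate a non-preemptive priority-driven schedule on n identical
    processors.  mu: job -> integer execution time; preds: job -> iterable
    of predecessors; priority: job -> number (higher runs first)."""
    finish = {}                     # job -> completion time
    schedule = {}                   # job -> (start, finish, processor index)
    free_at = [0] * n               # earliest instant each processor is free
    remaining = set(mu)
    t = 0
    while remaining:
        # jobs whose predecessors have all completed by time t
        ready = sorted((j for j in remaining
                        if all(p in finish and finish[p] <= t
                               for p in preds.get(j, ()))),
                       key=priority, reverse=True)
        for proc in range(n):       # lower-indexed processors are filled first
            if free_at[proc] <= t and ready:
                j = ready.pop(0)
                schedule[j] = (t, t + mu[j], proc)
                finish[j] = t + mu[j]
                free_at[proc] = t + mu[j]
                remaining.discard(j)
        t += 1
    return schedule
```

The returned dictionary is exactly the information a timing diagram (Gantt chart) displays: for each job, its execution interval and the processor assigned to it.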
In a schedule, a processor might be left idle for a period of time either because there is no ready job within that time period or because it is an intentional choice. (A job is said to be ready at a certain time instant if the execution of all its predecessors has been completed at that time.) Clearly, it is never necessary nor beneficial in a schedule to leave all processors idle at the same time. For unit-time jobs an optimal schedule can always be found among schedules with no intentional idle periods; that is, a processor is left idle for a certain period of time if and only if no job is ready within that period.

Figure 1.1 A directed graph representing a set of jobs.

Figure 1.2 A timing diagram representing a schedule.

Thus we need only be concerned with such schedules. In this case, a scheduling algorithm can be specified by merely giving the rules on how jobs are to be chosen for execution at any instant when one or more processors are free. (Of course, the choice is only among jobs that are ready at that instant.) A simple way to spell out the rule is to assign priorities to the jobs so that jobs with higher priorities will be executed instead of jobs with lower priorities when they are competing for processors. (If two or more processors are available, pick the one with the smallest index.) Consequently, we call such algorithms priority-driven scheduling algorithms. The usual practice is to list the jobs in decreasing order of their priorities from left to right. Such a list is called a priority list. Whenever a processor is free, the priority list is searched from left to right and the first ready job encountered will be executed.

1.3 Optimal Scheduling Problems and NP-Complete Problems

Given a set of jobs, one of the problems of scheduling is to find an optimal schedule.
For a set of n jobs there are only a finite number of schedules, and one of them will be an optimal one. However, an exhaustive examination of all the schedules (the number of which exceeds n!) requires considerable computation time and will offset the advantage gained by using an optimal schedule. There is general agreement that an algorithm used to solve a certain problem is considered to be efficient if the number of 'elementary' steps of the algorithm is bounded by a polynomial function of the size of the problem, which in our present case is the number of jobs to be scheduled. Such algorithms are usually called polynomial-time-bounded algorithms or, for conciseness, polynomial algorithms. Thus the brute-force method of examining all the schedules is clearly inefficient, since the number of elementary steps involved is exponential in n. There are many classical problems in combinatorics, such as the travelling salesman problem, the Hamiltonian circuit problem, and the integer linear programming problem, for which no polynomial algorithm producing optimal solutions is known. Recently a class of problems known as nondeterministic polynomial-time complete (or NP-complete) problems, which includes all the previously mentioned problems, was studied. S. A. Cook [4] and R. M. Karp [11] showed that all problems in this class are equivalent in the following sense: if one problem of this class has a polynomial solution, then all problems in this class have one. Since many of these problems are practically important and have been studied by mathematicians and computer scientists for decades, and no polynomial solution has been found for even one of them, it is natural to conjecture that no such polynomial algorithms exist. But in spite of the overwhelming empirical evidence, it is still an open question whether NP-complete problems admit of polynomial solutions. J. D.
Ullman [14] showed that for the multiprocessing system described in Section 1.2 the problem of determining an optimal schedule for n-processor systems for all n is NP-complete. More specifically, he showed that the problem of determining an optimal schedule is NP-complete even for the following cases:

(1) All jobs in the given job set have equal execution time. The number of processors in the system is arbitrary.

(2) There are only two processors in the system. The execution time of each job in the job set is either one or two.

Thus the problem of designing an optimal scheduling algorithm is difficult indeed. Several authors have designed efficient algorithms to produce optimal schedules [3], [5], [10]. Unfortunately, these algorithms are only applicable to some special cases. As a matter of fact, polynomial algorithms that produce optimal schedules are known only for the following special cases:

(1) All jobs have equal execution time and the partial ordering over the jobs is such that either every job has at most one successor or every job has at most one predecessor [10].

(2) All jobs have equal execution time and there are only 2 processors in the computing system [3], [5].

1.4 Heuristic Scheduling Algorithms

In view of the difficulties in practice, one has to use approximate heuristic scheduling algorithms which hopefully yield 'good' schedules in a reasonable amount of computing time. Thus instead of seeking an algorithm which produces optimal schedules, one seeks an approximation algorithm from the set of 'sufficiently efficient' algorithms. Unfortunately, it is usually difficult to evaluate and compare the performance of heuristic algorithms other than by running them on large problem sets with known optimal solutions. A more rigorous approach is to analyze mathematically the performance of such algorithms to determine how closely the constructed solutions approximate optimal solutions.
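For very small instances, the optimal completion time needed as the yardstick in such an analysis can also be found by exhaustive search. The sketch below is illustrative only (the names are mine, not the thesis's); it computes the optimal makespan of unit-time jobs on n processors and, as Section 1.3 notes for exhaustive methods, its running time is exponential.

```python
from functools import lru_cache
from itertools import combinations

def optimal_makespan(jobs, preds, n):
    """Exhaustively find the optimal makespan of unit-time jobs on n
    identical processors under the precedence constraints in preds."""
    all_jobs = frozenset(jobs)

    @lru_cache(maxsize=None)
    def best_from(done):
        if done == all_jobs:
            return 0
        # jobs whose predecessors have all been completed
        ready = [j for j in all_jobs - done
                 if all(p in done for p in preds.get(j, ()))]
        best = None
        # try every set of at most n ready jobs for the next time unit
        for k in range(1, min(n, len(ready)) + 1):
            for chosen in combinations(ready, k):
                cost = 1 + best_from(done | frozenset(chosen))
                if best is None or cost < best:
                    best = cost
        return best

    return best_from(frozenset())
```

Dividing a heuristic's completion time by this value gives the performance ratio ω/ω_0 studied throughout the thesis, at least on instances small enough for the search to finish.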
In this thesis a number of heuristic algorithms are considered. Worst-case performance bounds relative to the optimal solution are determined. Sufficient conditions under which these heuristic algorithms are optimal are given. Worst-case results on performance, though not as practical as results on average performance, are quite useful, especially because they enable one to guarantee that a particular algorithm will never exceed the optimal solution by more than a known, hopefully small, percentage. Intuitively, a mechanism which causes a particular algorithm to have a certain worst-case behavior might also be expected to manifest itself to a certain extent in the average case.

1.5 A Preview

A review of past results on the model described in Section 1.2 as well as other related models is given in Chapter 2. In Chapter 3, bounds on the worst-case performance of the critical path algorithm are derived. A generalization of T. C. Hu's [10] theorem is obtained in Chapter 4. Chapter 5 gives an analysis of an algorithm which is a variation of the critical path algorithm. Finally, the successors algorithm proposed by R. L. Graham [8] is analyzed in Chapter 6.

CHAPTER 2

A SURVEY OF PREVIOUS WORKS

2.1 Introduction

In this chapter previous works on the scheduling problem for the general multiprocessing model described in Section 1.2 are reviewed. Furthermore, works on Garey and Graham's [6] augmented model with multiple resources and Liu and Liu's results [13] on the heterogeneous model will be presented.

2.2 The General Multiprocessing System

2.2.1 A General Bound

For the general multiprocessing model described in Section 1.2, R. L. Graham [8] obtained the following remarkable theorem:

Theorem: Let L and L' be two arbitrary priority lists for a set of jobs (Q, μ, <), and let ω_L and ω_L' be the corresponding completion times when the jobs are executed according to these lists.
Then, for an n-processor system,

ω_L / ω_L' ≤ (2n−1)/n.

Moreover, the bound is best possible in the sense that the right-hand side cannot be replaced by any smaller function of n.

The significance of this result is as follows: although it is desirable to have a priority list such that the total completion time is minimal, any arbitrary priority list will not lead to an increase of more than 100%. Since searching for an optimal priority list is quite time-consuming (as a matter of fact, there is no general algorithm short of exhaustion), the result in Graham's theorem is indeed comforting.

Graham's theorem can be extended to a more general form. Let (Q, μ, <) and (Q', μ', <') be two sets of jobs such that

(i) Q = Q',
(ii) J_i <' J_j implies J_i < J_j for any J_i, J_j in Q,
(iii) μ'(J) ≤ μ(J) for any J in Q.

Let ω and ω' be the total execution times when these two sets of jobs are executed according to two lists L and L' on an n-processor and an n'-processor system, respectively. Then

ω'/ω ≤ 1 + (n−1)/n'.

2.2.2 Some Special Bounds

The results in the previous theorems say that even if no effort is spent in searching for a priority list, the performance of any arbitrary list will not be "too far off" from that of an optimal one. Another possible attitude toward the scheduling problem would be to spend some effort in searching for a priority list and hope that the performance of the list so obtained is "closer" to that of an optimal one.

2.2.2.1 The Critical Path Algorithm

Let (Q, μ, <) be a set of jobs. A chain C in (Q, μ, <) is a sequence of jobs J_1, J_2, ..., J_m such that J_i < J_{i+1} for i = 1, ..., m−1. J_1 is said to head the chain C. The length of the chain C equals Σ_{i=1}^m μ(J_i). Let C(J) be a chain headed by J which is of maximal length among all chains headed by J.
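The maximal chain lengths satisfy the recurrence: the length of C(J) is μ(J) plus the longest chain headed by any immediate successor of J. A minimal sketch of this computation, and of the priority list it induces, follows; the function names are mine, not the thesis's.

```python
from functools import lru_cache

def chain_lengths(mu, succs):
    """For each job J, return the length of a maximal chain headed by J:
    mu(J) plus the longest chain headed by an immediate successor of J."""
    @lru_cache(maxsize=None)
    def length(j):
        return mu[j] + max((length(s) for s in succs.get(j, ())), default=0)
    return {j: length(j) for j in mu}

def cpa_priority_list(mu, succs):
    """A priority list in the spirit of the critical path algorithm: jobs in
    non-increasing order of maximal chain length, ties broken arbitrarily."""
    lengths = chain_lengths(mu, succs)
    return sorted(mu, key=lambda j: lengths[j], reverse=True)
```

For unit-time jobs, chain_lengths computes exactly the levels used by the level algorithm discussed below.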
The critical path algorithm (CPA) is a priority-driven scheduling algorithm such that:

(i) If the length of C(J_1) is greater than the length of C(J_2), then CPA assigns a higher priority to J_1 than to J_2, for any two jobs J_1, J_2 of Q.

(ii) If the length of C(J_1) equals that of C(J_2), then CPA makes an arbitrary assignment of priorities between J_1 and J_2.

When μ(J) = 1 for all J in Q, the length of C(J) is usually called the level of J. The CPA is then known as the level algorithm. Hu [10] has shown that the level algorithm is optimal when the partial ordering < is a forest. A generalization of Hu's result appeared in [1] and will be presented in Chapter 4.

Let ω_CPA be the length of a critical path schedule and ω_0 be the length of an optimal schedule, for a given set of jobs Q. In [9] Graham gave examples to show that ω_CPA/ω_0 could be arbitrarily close to 2 − 1/n for an n-processor system. One notes that 2 − 1/n is the bound for ω_L/ω_0 for an arbitrary priority list L in an n-processor system. However, Graham conjectured in [9] that the bound for ω_CPA/ω_0 could be improved when all jobs in Q are of equal execution time. In Chapter 3 of this thesis this is shown to be the case.

The following results are due to R. L. Graham [8] for the special case in which < is empty (i.e., all jobs are independent).

Theorem: Let L* be a priority list obtained by CPA; to be precise, a job has a higher priority if it has a longer execution time. Let ω_CPA be the corresponding completion time. Let ω_0 be the completion time for an optimal priority list. Then

ω_CPA / ω_0 ≤ 4/3 − 1/(3n).

Moreover, this bound is the best possible.

Theorem: Suppose the k jobs with the longest execution times are picked and the k highest priorities are assigned to them in such a way that they will be executed optimally. Let L(k) be a priority list obtained by appending to such an assignment of priorities an arbitrary assignment of priorities to the rest of the jobs.
Let ω(k) be the corresponding completion time. Then

ω(k)/ω_0 ≤ 1 + (1 − 1/n)/(1 + ⌊k/n⌋).

The bound is best possible if k ≡ 0 (mod n).

2.2.2.2 The Coffman-Graham Algorithm (CGA)

The CGA, which appeared in [3], is defined only for job sets (Q, μ, <) with μ(J) = 1 for all J in Q. Coffman and Graham showed that CGA produces optimal lists when μ(J) = 1 for all J in (Q, μ, <) and there are only 2 processors in the system. Another algorithm which produces optimal lists under the same two conditions is due to Fujii, Kasami and Ninomiya [5]. Their algorithm is based on a matching algorithm for bipartite graphs of Edmonds. Let (Q, μ, <) be a set of n jobs such that μ(J) = 1 for all J in Q. Let S(J) be the set of all immediate successors of J. In the Coffman-Graham algorithm, integer labels are assigned to the jobs, and the priority list orders the jobs by decreasing label.

2.3 The Augmented General Multiprocessing Model

In Garey and Graham's [6] augmented model, a job may require, in addition to a processor, the use of one or more of a set of limited resources. For this model there exist examples for which ω_CPA/ω_0 = 17/10.

2.4 The Heterogeneous Computing System

In the previous models, the multiprocessing computing systems contain identical processors. In this section a more general model in which different processors have different computation speeds is described, and some known results will be presented. This model is a realistic one when one considers the possibilities of replacing one or more of the processors in an existing system by faster processors and of interconnecting different computers in an installation. By a heterogeneous computing system, one means a multiprocessor system in which processors have different computing speeds. One measures the speed of a processor against that of a "standard" processor whose speed is taken to be 1. The speed of a processor is said to be b if it is b times as fast as a standard processor. Without loss of generality one may assume that b ≥ 1. A multiprocessor system which contains n_1 processors of speed b_1, n_2 processors of speed b_2, ..., n_k processors of speed b_k is denoted by P = (n_1, n_2, ..., n_k; b_1, b_2, ..., b_k). All the theorems that follow are due to Liu and Liu and appeared in [13].
Theorem: Suppose a multiprocessor system P = (n_1, ..., n_k; b_1, ..., b_k) is given. Let ω and ω' be the completion times of a set of jobs (Q, μ, <) when it is run on P according to a priority-driven schedule specified by an arbitrary priority list L and according to an arbitrary (not necessarily priority-driven) schedule, respectively. Suppose that b_1 ≥ b_2 ≥ ... ≥ b_k. Then

ω/ω' ≤ 1 + b_1/b_k − b_1 / Σ_{i=1}^k n_i b_i.

Moreover, the bound is the best possible.

The above theorem can be generalized as follows. Let P' = (n'_1, ..., n'_{k'}; b'_1, ..., b'_{k'}) be another multiprocessing system. Let ω'' be the completion time when (Q, μ, <) is executed on P' according to an arbitrary (not necessarily priority-driven) schedule. Then

Theorem:

ω/ω'' ≤ Σ_{i=1}^{k'} n'_i b'_i / Σ_{i=1}^k n_i b_i + b'_1/b_k − b'_1 / Σ_{i=1}^k n_i b_i.

Moreover, the bound is the best possible.

Another interesting problem is to compare the execution of a set of jobs on two different multiprocessor systems using a priority-driven schedule specified by the same priority list. This is a realistic situation when some of the processors in an existing system have been replaced by processors of different speed yet the priorities assigned to the jobs remain unchanged. To illustrate the point, the following special case was investigated: a set of jobs (Q, μ) with < empty is to be executed on the two multiprocessor systems P = (n+1; 1) and P' = (1, n; b, 1) according to the same priority list L. Let ω and ω' denote the respective completion times. Then

Theorem:

ω/ω' ≤ b                 if b ≤ 2,
ω/ω' ≤ 2(n+b)/(n+2)      if b ≥ 2.

Between the two extremes of using an optimally chosen priority list and of using a completely arbitrary priority list, there is the possibility of using priority lists obtained by simple heuristic procedures. Such a possibility offers a good compromise between the performance of the resultant scheduling algorithm and the computational cost of determining the priority list. Liu and Liu [13] considered the following problem.
A set of jobs (Q, μ) with < empty is given. Suppose the r longest jobs in Q are chosen. Let ω_r denote the completion time of these jobs when they are executed on P = (n_1, n_2, ..., n_k; b_1, ..., b_k) according to a priority list L_r. Let L denote the priority list obtained by appending to L_r an arbitrary assignment of priorities to the remaining jobs in Q. Let ω denote the corresponding completion time, and let ω' denote the completion time according to an arbitrary (not necessarily priority-driven) schedule. When ω > ω_r and b_k = 1, then

Theorem:

ω/ω' ≤ 1 + Q,

where

Q = (Σ_{i=1}^k n_i b_i − 1) / ((1 + ⌊r / Σ_{i=1}^k n_i b_i⌋) Σ_{i=1}^k n_i b_i).

Moreover, the bound is best possible when the b_i are integers and Σ_{i=1}^k n_i b_i divides r.

Another simple heuristic for assigning priorities to the jobs in (Q, μ) with < empty is to assign higher priorities to longer jobs. Let ω denote the completion time when (Q, μ) is executed on P = (1, n; b, 1) using a priority assignment according to the lengths of the jobs. Let ω' be the completion time of an arbitrary (not necessarily priority-driven) schedule.

Theorem: For b ≥ 2,

ω/ω' ≤ 2(n+b)/(b+2).

2.5 Conclusion

There is still a lot of room for further investigation in each of the models surveyed in this chapter. In view of the growing number of researchers in this field, some significant results are likely to emerge in the next few years.

CHAPTER 3

BOUNDS ON THE CRITICAL PATH SCHEDULING ALGORITHM

3.1 Introduction

The critical path algorithm (CPA) as described in Section 2.2.2.1 is the basis for many of the project planning techniques such as PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). Therefore, an investigation into its worst-case performance is of both practical and theoretical interest. Graham [9] showed that ω_CPA/ω_0 (same notation as in 2.2.2.1) is upper-bounded by 2 − 1/n for an arbitrary job set (Q, μ, <) in an n-processor system, and that this bound cannot be improved.
However, he conjectured that when Q consists only of unit-time jobs, the upper bound for ω_CPA/ω_0 could be improved to 2 − 2/(n+1). The example at the end of this chapter shows that ω_CPA/ω_0 can be greater than 2 − 2/(n+1) and thus defeats Graham's conjecture. The rest of this chapter is devoted to the proof of the following theorem [2].

Theorem 1: For an n-processor system and a job set (Q, μ, <) of unit-time jobs,

ω_CPA/ω_0 ≤ 2 − 1/(n−1)   for n ≥ 3,

and

ω_CPA/ω_0 ≤ 4/3           for n = 2.

Moreover, the above bounds are the best possible.

3.2 Notations

For a given set of jobs of unit execution time, let L be a priority list obtained by the critical path algorithm. The schedule obtained according to L shall be referred to as the critical path schedule (CPS). It may be assumed that execution of jobs starts at t = 0. Let ω denote the time at which the execution of all jobs is completed according to the CPS, and let ω_0 denote the corresponding time according to an optimal schedule. Throughout the paper, we shall refer to an arbitrarily chosen optimal schedule (among all possible optimal schedules) as the optimal schedule (OPS). When a processor is idle from t to t+1 it is said to be executing an empty job during this time interval. An empty job will be denoted φ. Hereafter a job shall mean a non-empty job unless the contrary is expressly stated.

Let α: Q → {0, 1, 2, 3, ...} be a function such that for any two jobs J_1 and J_2, α(J_1) > α(J_2) if and only if J_1 has higher priority than J_2 according to the list L; α(φ) = 0 for an empty job, and α(J) > 0 if J is non-empty. Corresponding to the CPS, let t: Q → {0, 1, 2, ...} be the function such that t(J) equals the time at which the execution of J starts. Let p: Q → {1, 2, ..., n} be the function such that p(J) is the index of the processor which executes J. A column of jobs in CPS is the set of jobs executed in the time unit [t, t+1] for some t, 0 ≤ t ≤ ω − 1.
A job J_1 is said to be in the column of job J_2 if t(J_1) = t(J_2). A job I is said to be executed to the right (left) of J if t(I) > t(J) (t(I) < t(J)). A block in CPS is a set of consecutive columns in CPS. For a job J in Q, l(J) denotes the level of J in Q.

3.3 Some Preliminary Lemmas

Lemma 3.1: If p(J) = 1 and t(J) ≤ t(J'), then α(J) > α(J').

Proof: If t(J) = t(J'), then according to the convention that the higher-priority job is assigned to a processor of lower index, we must have α(J) > α(J'). For t(J) < t(J'), α(J') > α(J) implies that J' is not ready at t(J); that is, a predecessor K of J' is being executed at t(J). Now α(K) > α(J') since L is produced by a critical path algorithm. Thus α(K) > α(J). This implies that p(K) should be assigned the value 1, which is a contradiction. ∎

Lemma 3.2: If α(J) < α(K) but t(J) < t(K), then there is a job I, satisfying t(I) = t(J) and p(I) < p(J), which precedes K.

Proof: K is not ready at t(J); otherwise K would have been executed at t(J) or earlier, since α(J) < α(K). Thus there must be a predecessor I of K which is being executed at t(J). We thus have α(I) > α(K) > α(J). It follows that p(I) < p(J) by our convention. ∎

We show now a construction of a chain in Q:

1. i := 1; let U_1 be any job in the rightmost column of the CPS.
2. i := i + 1.
3. Search the rightmost column that is to the left of the column containing U_{i−1} (in CPS) for a job V_i (empty or non-empty) such that α(V_i) < α(U_{i−1}).
   (a) If there is no such V_i, go to (5).
Consequently, the chain contains a job in each of the columns in CPS that have empty jobs . Let 0. be the set of columns in CPS each of which contains i non-empty jobs, for i=l, ...,n. We shall use a. to denote |0. |. The proof of Theorem 1 is now divided into 2 parts. 5.3.1 Part 1 of the Proof of Theorem 1: a n ^ a is assumed 1 n Lemma 3 » 3 : Let a , a , ...,a be non-negative integer such that a % a and n ^ 2. Let (Q,<0 be a set of unit-time jobs that has a., +2a r ,+3a^ + . . ,+na jobs and contains a chain of length a n +a +...+a _, 1 2 3 n d 12 n-1 Then the length of an optimal schedule w- is lower bounded by (a 1 +a 2 +...+a n )/(2 - —^) for n § 3 and by ^(a-j+a^ for n - 2. Proof : There are 3 cases to be considered. Case (l): na g (n-l)a.. +(n-2)a^+. . .+2a +a , and n ^ 3. Since Q v ' n v 1 v 2 n-2 n-1 f contains a chain of length a +...+a w s a +...+a . The inequality a.+...+a n+a _ _ _. (*) 1 - n-1 n tf 2n-3 _ 2 J_ { ' a +. • .+a n-1 ' n-1 26 is equivalent to (n-l)a ^ (n-2)(a_+...+a _) v ' n 1 n-1 which implies na - (a -a.) % (n-l)a n + (n-2) (a + — +a n ) n n V ' 1 2 n-1' The last inequality is true by the assumption of Case (l) and a, £ a" . Hence the inequality (*) is established. In Case (2): na > (n-l)a n +. . .+2a _+a " and n ^ 3- v ' n 1 n-2 n-1 Let d = na - ( (n-l)a n +. . .+2a ^+a _). Hence d > 0. n vv ' 1 n-2 n-1' There are na +(n-l)a n +...+a_, jobs in Q. and hence n v n-1 1 o w~ ^ (na +(n-l)a +...+a )/n. The inequality a + • • • +a n r^ -z 1 (**) 5L_ L_ * 2n-3 = 2 _ _±_ K ' (na +. ..+2a +a_ )/n " n-1 n-1 v n 2 1 7/ is equivalent to 2n-3 (n-l)(a + ...+a_.) ^ — — (na +...+2a^+a n ) n 1 / n v n 2 l y Subtract from both sides (2n-3)(a +...+&.), we obtain: n 1 ' 2n 3 — — ^ ((n-l)a_+(n-2)a_+...+a . ) < (n-2) (a +a _ + ...+a. 
) n 1 v '2 n-l y v ' v n n-1 1' The righthand side of the inequality is greater than na -2a +na -d-a_ n n n 1 The lefthand side is equal to (na -d) n n 27 Thus the right-hand side minus the left-hand side of the last n-5 inequality is greater than or equal to a -a +(— = ^-)d which is non- negative since a s a and d ^ 0. Thus inequality (**) is proved. Case (3) : n = 2. There are 2a +a jobs in Q. Hence 2a 2 +a x - 2 Thus a^+a„ a + a a + a 21 21.11. (2a. +ai )/2 = a^ * T since a i * a 2 a 2+ ~ - +&1 3 Therefore Lemma 3-3 is proved. Now u)__. = a +a _+...+a, . Hence CPA n n-1 1 L g 2 - JL for n ^ 3 o) n-: and CPA ^ ^ ^ < ■=■ for n = 2 . W Thus Part 1 is done. 3 • 3 • 2 Part 2 of the Proof of Theorem 1: a > a is assumed Part 2 of the proof is divided into steps. Step 1 : Definitions related to the chain 1. A job belonging to the chain in CPS is called a chain job . Otherwise, the job is called a non- chain job. 2. A column in CPS is called a chain column if it contains a chain job. Otherwise, the column is called a non-chain column. 28 3. A job J in CPS is good if (i) it is a non- chain job and (ii) the level of J is higher than or equal to that of the chain job (if there is one) immediately to its right, or there is no chain job to the right of it. k. A non-chain job J in CPS is bad if it is not good. Note that a chain job is neither good nor bad but a non-chain job is either good or bad. 5. A column is bad if it has 2 to n-1 jobs and all the non- chain jobs in the column are bad. Lemma 3 A ; Any job J in a non-chain column in CPS is good. Proof: Let U. be the leftmost chain job that is to the right of J. 1 If J is bad then by definition the level of J is smaller than the level of U. . Hence a(j) < a(U. ) by definition of the critical path algorithm. There must be a job in the same column as J, or to the right of J that precedes U. which contradicts, by construction of the chain, the assumption that J is in a non-chain column and U. 
is the leftmost chain job to the right of J. ∎

Step 2: A definition related to the given optimal schedule OPS

1. A special column (SC) in CPS is a column satisfying the conditions: (i) it is a bad column, and (ii) all the non-chain jobs of that column (which are bad jobs) are executed to the right of the chain job of that column in the OPS.
2. The set of columns S_1 is defined to be the union of φ_1 and the set of SC's in CPS. A column belonging to S_1 is referred to as an S_1-column.

Lemma 3.5: Suppose U_i is the chain job in an S_1-column. If K is a job which is executed to the right of U_i in the CPS, then K is also executed to the right of U_i in the OPS.

Proof: An S_1-column has at least one empty job by definition. Since CPS is a priority-driven schedule, K is either preceded by U_i or by some bad job J in the column containing U_i. But J is executed to the right of U_i in the OPS, by definition of an S_1-column. Hence K must be executed to the right of U_i in the OPS. ∎

Lemma 3.6: Same hypothesis and notations as Lemma 3.5. All jobs executed to the right of U_i in CPS are of lower level than that of U_i.

Proof: We shall use the notations in Lemma 3.5. The level of J is at least 2 lower than that of U_i by definition of a bad job. Since K is a successor of either U_i or some J, it follows that ℓ(K) < ℓ(U_i). ∎

Step 3: A partition of CPS into two types of blocks

Let φ_n = φ'_n ∪ φ''_n, where φ'_n is the subset of φ_n consisting of all the non-chain columns and φ''_n is the subset consisting of all the chain columns of φ_n. We introduce now a partition of the CPS: a segment is defined to be either (1) an S_1-column, or (2) the columns between two successive S_1-columns (excluding the S_1-columns at both ends), or (3) the columns between an S_1-column and the right or left end of CPS. Thus the CPS is partitioned into segments. A type I block is defined to be a segment containing at least one φ'_n-column. A type II block is defined to be the union of all the segments between two successive type I blocks. Note that the CPS is thus partitioned into type I and type II blocks so that they alternate (see Figure 3.1).

Step 4: A partition of a type II block into sub-blocks

Type II blocks are further classified and partitioned.

Case (1): If the leftmost block in the CPS is a type II block, it is a type IIB sub-block by definition.

Case (2): A type II block consisting of only one column which is not the leftmost column of CPS is defined to be a type IIA sub-block. (Note that this column must then be an S_1-column.)

Case (3): The type II block does not belong to Case (1) or Case (2). A column in a type II block belonging to this case is defined to be the dividing column if it satisfies the following three conditions: (1) it is not the leftmost column of the block; (2) the chain job in the column has higher level than that of all other jobs in the same column; (3) among all columns in the block which satisfy (1) and (2), the column is the leftmost one. Clearly, if there is another S_1-column in the type II block besides the leftmost column, then such a column satisfies conditions (1) and (2), and thus the dividing column exists. Therefore, if a type II block is not the rightmost block of the CPS, the dividing column exists for sure. In this case, we define the type IIA sub-block of this type II block to be the block of columns to the left of the dividing column and excluding the dividing column, and the type IIB sub-block of this type II block to be the block of columns to the right of the dividing column and including the dividing column. In the case of a type II block that does not contain a dividing column, it is defined to be a type IIA sub-block.

Figure 3.1 A partition of CPS into Type I and Type II blocks
Lemma 3.7: (a) In a column of a type II block, the level of any job is lower than or equal to that of the chain job in that column. (b) If C is a column in a type II block but is not the rightmost column of that block, then each job in C that is of the same level as that of the chain job U in C precedes some job K in the next column on the right of C, and ℓ(K) = ℓ(V) = ℓ(U) - 1, where V is the chain job in the column of K.

Proof: The rightmost column of a type II block is either an S_1-column or is the rightmost column of CPS. In the first case, the jobs in the column besides the chain job are either bad jobs or empty jobs by definition of an S_1-column. Bad jobs are at least 2 levels lower than the level of the chain job, and empty jobs are of level 0. In the second case, all jobs in that column are of level 1 since they have no successors. Thus Lemma 3.7(a) holds for the rightmost column of the type II block.

Suppose, as induction hypothesis, that Lemma 3.7 is true for all columns in the type II block to the right of a certain column C. Let U be the chain job of C. Now U is of higher level than the levels of all chain jobs to its right, since U precedes them. In particular, ℓ(U) is greater than the level of the chain job of the rightmost column of the type II block. By Lemma 3.6, ℓ(U) is greater than the level of any job executed to the right of this type II block. The levels of jobs in columns on the right of C and within the type II block are, by the induction hypothesis, not higher than those of the corresponding chain jobs in the respective columns. Thus the levels of all jobs in columns on the right of C and within the type II block are lower than the level of U. Let J be any job in the column of U. Successors of J must all be on the right of C in CPS. Thus the above argument implies that the level of any successor of J is lower than that of U. Hence ℓ(U) ≥ ℓ(J). Thus Lemma 3.7(a) is proved. Now if ℓ(U) = ℓ(J), then J must have a successor K of level ℓ(U) - 1.
By Lemma 3.7(a) this successor K could only lie in the column next to and on the right of the column C. The level of this successor K is not higher than that of the chain job V in its column, by Lemma 3.7(a). Thus ℓ(U) - 1 ≥ ℓ(V) ≥ ℓ(K) = ℓ(U) - 1. Hence ℓ(K) = ℓ(V) = ℓ(U) - 1. ∎

Lemma 3.8: Let X(II) be a type II block containing a type IIB sub-block, and let X(I) be a block of (consecutive) columns (not necessarily a type I or type II block) immediately adjacent to the left of X(II) and containing no columns of a type IIB block. Let C be the dividing column of X(II) (hence the leftmost column of the type IIB sub-block of X(II)) and U the chain job in C (see Figure 3.2). Then (a) if J is a job in the type IIA sub-block of X(II) and is of the same level as the chain job in its column, then J precedes U; (b) if J is a good job in X(I), then J precedes U.

Proof: If J is a chain job, (a) follows from that fact. If J is not a chain job, (a) follows by repeated applications of Lemma 3.7(b), since U is the only job in C whose level is as high as ℓ(U). One notes that X(I) might contain type II blocks of Case (2), i.e., single-column type II blocks, and in this case the type II block is a type IIA block.

Figure 3.2 An illustration of the X(I) and X(II) blocks of Lemma 3.8

Let U_i be the chain job in the leftmost column of X(II). This column is an S_1-column. Let U_{i-1} be the chain job in the column on the right of and next to the column of U_i. U_{i-1} may or may not be U. Then ℓ(U_{i-1}) = ℓ(U_i) - 1 by Lemma 3.7(b). Let J be a good job in the rightmost column of X(I). By definition of a good job, ℓ(J) ≥ ℓ(U_i). Successors of J are executed to the right of J in CPS. If J does not precede U_i or any job in the column of U_i that is of level ℓ(U_{i-1}), then by Lemma 3.7 and Lemma 3.6 the maximal level of the successors of J is not higher than ℓ(U_i) - 2. Thus ℓ(J) ≤ ℓ(U_i) - 1, contradicting the previous inequality, by Lemma 3.7 and Lemma 3.6. Hence ℓ(J) ≤ ℓ(U_m) - 2 ≤ ℓ(K) - 2. Thus in any one of these three cases the maximal level of the successors of K is not greater than ℓ(K) - 2, which is a contradiction. Therefore K must precede one or more of the following:

(a) a chain job G where t(K) < t(G) ≤ t(U_i); then K precedes U, since U is part of the chain;
(b) a good job G where t(K) < t(G) < t(U_i); then G precedes U by the induction hypothesis, hence K precedes U;
(c) U_{i-1}, which is a particular case of case (a);
(d) some job J in the column of U_{i-1} with ℓ(J) = ℓ(U_{i-1}); then J precedes U by part (a) of this lemma, and thus K precedes U.

One sees that in each case K precedes U, and if K precedes U then K precedes all chain jobs executed to the right of U, in particular those chain jobs in type IIB blocks to the right of X(I). ∎

The following corollary easily follows.

Corollary: (a) If J is a job in a type IIA block whose level equals that of the chain job in its column, then J precedes any chain job in a type IIB block which is on the right of the type IIA block, adjacent or otherwise. (b) If J is a good job in a type I block, then J precedes any chain job in a type IIB block (if there is one) on the right of the type I block, adjacent or otherwise.

Step 5: A reduction of the CPS by the dropping of some jobs from CPS

To find an upper bound for ω_CPA/ω_0, it is necessary to find a lower bound for ω_0. In establishing this lower bound, it is unnecessary to consider all the jobs. The jobs that will not be included in such a consideration are referred to as jobs which have been dropped.

1. Jobs dropped from columns in type I blocks:
(a) From a column which is not bad, all bad jobs are dropped.
(b) A bad column in a type I block is not a special column, by definition of a type I block. Hence one or more of the bad jobs in a bad column are not executed to the right of the chain job of the bad column in OPS. From a bad column in a type I block, only those bad jobs which are executed to the right of the chain job of the same column in OPS are dropped. Therefore one or more bad jobs and the chain job are not dropped.
From a bad column in a type I block only those bad jobs which are executed to the right of the chain job of the same column in OPS are dropped . Therefore one or more bad jobs and the chain job are not dropped. 2. Jobs dropped from a type II block : (a) From a column in a type IIA sub-block, all jobs whose levels are lower than the level of the chain job of the column are dropped. (b) From a column in a type IIB sub-block, all jobs except the chain jobs are dropped. After jobs are dropped from CPS as described in (l) and (2), the resulting schedule is referred to as the reduced critical path schedule (RCPS) . In the future when one speaks of a column or a block of columns one always first specifies the schedule CPS or RCPS. A 38 column or a block of columns in RCPS is nothing more than the same column or block of columns in CPS minus the jobs dropped with the position of the column or the block of columns along the time- axis remaining the same. The following lemma is trivial. Lemma 3-9 : (a) From a 0' -column, no jobs are dropped. (b) From an S^- column, all jobs except the chain job have been dropped. Proof : (a) ^'-columns are the non-chain columns which only contain good jobs by Lemma J>.k. 0' -columns are in type I blocks and they are not bad columns. (b) Fvery S-, -column is in a type II block. An S, -column is either a -column or a bad column. The level of a bad job is at least 2 levels lower than the chain job in its column. Thus all jobs, if there is any, except the chain jobs are dropped from the S-, -columns. ■ Lemma 3«10 : Let U be a (chain) job in a column of a type IIB block in RCPS and J be a job which is not in a type IIB block in RCPS, then t(j) 4 t(U) in OPS . Proof : t(u) 4 t(J) in RCPS otherwise J belongs to the type IIB block, For any job J in RCPS, t(j) in CPS is equal to t(j) in RCPS. Two cases arise. Case (1): t(u) < t(j) in RCPS (CPS) . Let U. be the chain job in the rightmost column of the type IIB block. 
This type IIB block is not the rightmost block in CPS, for J, which is outside this type IIB block, is executed to its right. Therefore the rightmost column is an S_1-column. Now t(U_i) < t(J), since J does not belong to the type IIB block. By Lemma 3.5, t(U_i) < t(J) in OPS. But U is a chain job, hence U precedes U_i (or U = U_i). Thus t(U) ≤ t(U_i) in any schedule. Therefore t(U) < t(J) in OPS.

Case (2): t(J) < t(U) in RCPS (CPS). If J is a chain job, then J precedes U; hence t(J) ≠ t(U) in OPS. Hence we may assume J is not a chain job. There are three sub-cases.

(a) J is a job in a type IIA block in RCPS: J is a job whose level equals that of the chain job in its column, otherwise J would have been dropped in Step 5. By Corollary (a) to Lemma 3.8, J precedes U. Thus t(J) ≠ t(U) in any schedule, and in particular in OPS.

(b) J is a good job in a type I block: again, Corollary (b) to Lemma 3.8 implies that J precedes U. Thus t(J) ≠ t(U) in OPS.

(c) J is a bad job in a type I block: let U_j be the chain job in the column of J in RCPS (CPS). The fact that J is not dropped in Step 5 implies that J is in a bad column and is not executed to the right of U_j in OPS, so that t(J) ≤ t(U_j) in OPS. Now U_j precedes U, since U_j is on the left of U. Thus t(U_j) < t(U) in OPS. Therefore t(J) < t(U), and hence t(J) ≠ t(U). ∎

Step 6: Derivation of the upper bounds

Let π_i, i = 1, 2, ..., n, be the set of columns in RCPS each of which has exactly i jobs. Let π'_n be the subset of π_n consisting of the non-chain columns in RCPS. Thus π'_n = φ'_n, since no job in a column in φ'_n was dropped in Step 5. Let π''_n be the subset of π_n consisting of the chain columns of π_n in RCPS. So π_n = π'_n ∪ π''_n. A column in π_i shall be referred to as a π_i-column. Thus a type IIB block in RCPS consists completely of π_1-columns.

Lemma 3.11: (1) The number of π_1-columns in type I blocks is less than or equal to |φ''_n| - |π''_n|.
(2) The number of π_1-columns in type IIA sub-blocks is less than or equal to |π'_n| (note that |π'_n| = |φ'_n|). (3) The number of π_1-columns is larger than or equal to |φ_1| plus the number of π_1-columns in type I blocks. (4) The number of π_1-columns in type IIB sub-blocks is larger than or equal to |φ_1| - |φ'_n| ≥ |φ''_n|.

Proof: (1) According to Lemma 3.4, the only columns in the type I blocks in CPS that could possibly become π_1-columns in RCPS are chain columns in which all non-chain jobs are bad. But 1(b) of Step 5 implies that bad columns in a type I block in CPS cannot become π_1-columns in RCPS. Thus only φ''_n-columns can possibly become π_1-columns. Since the subset π''_n of the φ''_n-columns remains unchanged, only |φ''_n| - |π''_n| of them could have changed to π_1-columns.

(2) Recall the dropping in 2(a) of Step 5. By definition of a type IIA sub-block, every column besides the leftmost column of a type IIA sub-block has at least one other job whose level is as high as that of the chain job in the column. Thus none of the columns besides the leftmost column of a type IIA block in CPS can become a π_1-column. Thus, corresponding to each type IIA block in CPS, there is only one π_1-column in RCPS. Now each type I block has at least one φ'_n-column by definition, so the number of type I blocks is not greater than |φ'_n|, which is equal to |π'_n|. With each type IIA block one can associate the type I block adjacent to and on the left of the type IIA block. Thus there are no more than |φ'_n| = |π'_n| type IIA blocks, and so no more than |π'_n| π_1-columns in type IIA blocks.

(3) All φ_1-columns are in type II blocks and automatically become π_1-columns in RCPS. Hence (3) follows.

(4) It follows from (1), (2), and (3), together with the Part 2 assumption |φ_1| = a_1 > a_n = |φ'_n| + |φ''_n|. ∎

3.4 Conclusion of the Proof of Theorem 1

Let π'_1 be the subset of π_1-columns not in any type IIB block (so in type I and type IIA blocks), whereas π''_1 is the subset of π_1-columns in type IIB blocks. Then π_1 = π'_1 ∪ π''_1. Define S(π_i) to be the set of jobs in π_i-columns in RCPS.

Lemma 3.10 shows that the execution of the jobs of S(π''_1) in OPS is completely independent of the execution of the jobs of S(π'_1 ∪ π_2 ∪ ... ∪ π_n), in the sense that no job of S(π''_1) is executed at the same time as any job of S(π'_1 ∪ π_2 ∪ ... ∪ π_n) in OPS. Now there is a chain of length |π'_1| + |π_2| + ... + |π_{n-1}| + |π''_n| in S(π'_1 ∪ π_2 ∪ ... ∪ π_n). The following cases arise.

Case (1): n|π'_n| ≤ (n-1)|π'_1| + (n-2)|π_2| + ... + 2|π_{n-2}| + |π_{n-1}| and n ≥ 3.

By the above remark, ω_0 ≥ (|π'_1| + |π_2| + ... + |π_{n-1}| + |π''_n|) + |π''_1|, and ω_CPA = |π_1| + |π_2| + ... + |π_n| = |π'_1| + |π''_1| + |π_2| + ... + |π_{n-1}| + |π'_n| + |π''_n|, since CPS and RCPS have the same length. Thus

ω_CPA/ω_0 ≤ (|π'_1| + |π_2| + ... + |π_{n-1}| + |π'_n| + |π''_n| + |π''_1|) / (|π'_1| + |π_2| + ... + |π_{n-1}| + |π''_n| + |π''_1|).

The inequality

(***)  (|π'_1| + |π_2| + ... + |π_{n-1}| + |π'_n| + |π''_n| + |π''_1|) / (|π'_1| + |π_2| + ... + |π_{n-1}| + |π''_n| + |π''_1|) ≤ (2n-3)/(n-1) = 2 - 1/(n-1)

is equivalent to (by manipulations similar to those in Lemma 3.3 of Part 1)

n|π'_n| + (|π'_1| - |π'_n|) ≤ (n-1)|π'_1| + (n-2)(|π_2| + ... + |π_{n-1}|) + (n-2)(|π''_1| + |π''_n|).

Now by assumption n|π'_n| ≤ (n-1)|π'_1| + (n-2)(|π_2| + ... + |π_{n-1}|), and |π'_1| - |π'_n| ≤ |φ''_n| - |π''_n| ≤ |π''_1| - |π''_n| ≤ (n-2)(|π''_1| + |π''_n|) by (1), (2), and (4) of Lemma 3.11. Hence (***) is established.

Case (2): n|π'_n| > (n-1)|π'_1| + (n-2)|π_2| + ... + 2|π_{n-2}| + |π_{n-1}| and n ≥ 3.

Let d_1 = n|π'_n| - ((n-1)|π'_1| + (n-2)|π_2| + ... + 2|π_{n-2}| + |π_{n-1}|). Hence d_1 > 0. There are n|π'_n| + (n-1)|π_{n-1}| + ... + 2|π_2| + |π'_1| + n|π''_n| jobs in S(π'_1 ∪ π_2 ∪ ... ∪ π'_n ∪ π''_n), and S(π''_1) needs an additional |π''_1| units of time. So

ω_0 ≥ (n|π'_n| + (n-1)|π_{n-1}| + ... + |π'_1| + n|π''_n|)/n + |π''_1| = (n|π'_n| + (n-1)|π_{n-1}| + ... + |π'_1|)/n + |π''_n| + |π''_1|,

ω_CPA = |π'_1| + |π_2| + ... + |π_{n-1}| + |π'_n| + |π''_n| + |π''_1|.

The inequality

(****)  ω_CPA/ω_0 ≤ (2n-3)/(n-1) = 2 - 1/(n-1) for n ≥ 3

is true if (after manipulations similar to those in Lemma 3.3 of Part 1) the following inequality is true:

0 ≤ (|π'_n| - |π'_1|) + ((n-3)/n)d_1 + (n-2)(|π''_1| + |π''_n|).

The inequality (n-2)(|π''_1| + |π''_n|) ≥ |π'_1| - |π'_n| was proved in Case (1). Since d_1 > 0, the last inequality is true. Hence (****) is established.
Case (3): n = 2.

In S(π'_1 ∪ π'_2 ∪ π''_2) there are |π'_1| + 2|π'_2| + 2|π''_2| jobs. To execute the jobs in S(π'_1 ∪ π'_2 ∪ π''_2), OPS needs at least |π'_1|/2 + |π'_2| + |π''_2| units of time, and OPS needs an additional |π''_1| units of time to execute S(π''_1). Thus

ω_0 ≥ |π'_1|/2 + |π'_2| + |π''_2| + |π''_1|,

ω_CPA = |π'_1| + |π''_1| + |π'_2| + |π''_2|.

So ω_CPA/ω_0 ≤ 4/3 is true if

|π'_1| ≤ |π'_2| + |π''_2| + |π''_1|

is true. But |π'_1| - |π'_2| ≤ |φ''_2| - |π''_2| by (1) and (2) of Lemma 3.11, and |π''_1| ≥ |φ''_2| by (4) of Lemma 3.11. So the last inequality is established. Therefore Part 2 of the proof of Theorem 1 is completed.

The partial ordering in Figure 3.3 is an example which shows that the bounds in Theorem 1 are exact. If CPA assigns a lower priority to the rightmost job of every level than to all other jobs of the same level, then the length of CPS equals 7 + 5(m-1) for a 4-processor system. The length of an OPS can easily be shown to be 5 + 3(m-1). Thus ω_CPA/ω_0 → 5/3 as m → ∞. The generalization of the above example to n-processor systems for any n ≥ 2 is obvious.

Figure 3.3 An example of ω_CPA/ω_0 → 5/3 for a 4-processor system

CHAPTER 4

A SUFFICIENT CONDITION FOR THE OPTIMALITY OF THE CRITICAL PATH ALGORITHM

4.1 Introduction

In [10] T. C. Hu proved that for an n-processor system and a job set (Q, μ, <) with μ(J) = 1 for all J ∈ Q, and < being a forest, the critical path algorithm produces an optimal schedule. Here < is defined to be a forest if every job has at most one successor. Hu's proof is far from simple. In this chapter, a simple proof of a generalization of Hu's theorem will be presented.

4.2 A Generalization of Hu's Theorem

For a given set of jobs (Q, μ, <) where μ(J) = 1 for all J ∈ Q, let ℓ(J) be the level of a job J, let h be max{ℓ(J) | J ∈ Q}, and for each i let N_i be the number of jobs in Q of level i. Suppose Q is the set of jobs to be executed and satisfies the hypothesis of the theorem. We shall explicitly construct a schedule that executes all the jobs in time t, and this schedule is the same as the one given by the CPA. Let L = (J_1 J_2 J_3 J_4 ...) be the priority list given by CPA.
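The construction of this list can be sketched in present-day Python (an illustration only, not part of the original development; the representation of < by a dictionary mapping each job to its immediate successors is an assumption of the sketch):

```python
def levels(succ):
    """Level of a job: l(J) = 1 + length of a longest chain of successors
    starting at J; a job with no successors has level 1."""
    memo = {}
    def level(j):
        if j not in memo:
            memo[j] = 1 + max((level(s) for s in succ[j]), default=0)
        return memo[j]
    return {j: level(j) for j in succ}

def cpa_list(succ):
    """CPA priority list: all jobs of level h first, then level h-1, and
    so on; ties within a level are broken arbitrarily (here, by dict order)."""
    lv = levels(succ)
    return sorted(succ, key=lambda j: -lv[j])
```

For the chain a < b < c together with an isolated job d, the levels are 3, 2, 1 and 1 respectively, so the list begins with the level-3 job a; the first N_h entries of the list are exactly the jobs of level h, as used in the proof below.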
By definition, the first N_h jobs are of level h, followed by N_{h-1} jobs of level h-1, and so on. Suppose we have t columns of boxes indexed from 1 to t, and each column contains n boxes indexed from 1 to n. Suppose we regard the jobs as physical objects and we are going to distribute the jobs among the boxes, assigning one job to one box, according to the following procedure:

(1) Initialize k = 1.
(2) Assign J_k to the column with the smallest index which can accommodate J_k (a column can accommodate a job J if that column at that moment contains fewer than n jobs and all predecessors of J are in columns with smaller indices than that of the column), and to the box with the smallest index which is not yet filled.
(3) Increment k: k = k + 1.
(4) If k ≤ |Q|, GO TO (2).
(5) STOP.

Thus the jobs are assigned in order of their priorities.

(*) We claim that after step (2) is executed on J_K, where J_K is the lowest priority job of level i, all jobs which have been assigned at this point are in columns with indices smaller than or equal to (t-i+1), for i = h, h-1, ..., 1.

(**) For i = h, the claim is true by virtue of the inequalities (4-1). Assume it is true for all i, h ≥ i > j. Suppose, for contradiction, that in the process of assigning jobs of level j we encounter for the first time a certain job J of level j that cannot be assigned to a column with index smaller than or equal to (t-j+1). Two possibilities arise: either each column with index smaller than or equal to (t-j+1) already contains n jobs, or there are some columns in that range which contain fewer than n jobs but cannot accommodate J because of some precedence relations. In view of the inequalities (4-1), only the second possibility could arise. By the induction assumptions, jobs of level j+1 or higher are in columns 1 to (t-(j+1)+1), and so column (t-j+1) is filled with jobs of level j. Let r (< t-j+1) be the largest among the indices of columns preceding column (t-j+1) which are not filled at the time of assigning J.
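Before completing the argument, the distribution procedure of steps (1) through (5) can be sketched as follows (an illustration only; L is the priority list, pred maps each job to its immediate predecessors, and columns are created as needed rather than fixed at t, all of which are assumptions of the sketch):

```python
def assign_columns(L, pred, n):
    """Place jobs, taken in the priority order L, into the leftmost column
    that (i) holds fewer than n jobs and (ii) lies strictly to the right of
    every column holding a predecessor (steps (1)-(5) of the procedure).
    Assumes every predecessor of a job appears earlier in L, as is the case
    for a CPA list, since predecessors have strictly higher level."""
    col = {}    # job -> column index (1-based)
    load = {}   # column index -> number of jobs placed in it so far
    for j in L:
        # the smallest admissible column is just past the rightmost predecessor
        c = max((col[p] for p in pred[j]), default=0) + 1
        while load.get(c, 0) >= n:   # skip columns that are already full
            c += 1
        col[j] = c
        load[c] = load.get(c, 0) + 1
    return col
```

Starting the search one past the rightmost predecessor column and then skipping full columns is equivalent to taking the smallest accommodating index, since every non-full column to the right of all predecessors can accommodate the job.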
That is to say, all columns succeeding column r, up to column (t-j+1), are filled at this time. (***) We assert that each column s, r < s ≤ (t-j+1), is filled with jobs of level j or higher.

CHAPTER 5

ON A VARIATION OF THE CRITICAL PATH ALGORITHM

5.1 The Critical Path and Count Algorithm

A subset S_1 of the jobs of a level is assigned a higher priority than a subset S_2 of the jobs of that level if either (1) |S_1| > |S_2|, or (2) |S_1| = |S_2| = m (say) and, where L_1 and L_2 are the lists of jobs in S_1 and S_2 in descending order of priorities, there exists an n, 0 ≤ n ≤ m-1, such that the first n jobs in both lists coincide and the (n+1)st job in L_1 has higher priority than the (n+1)st job in L_2.

Remark: Two jobs or two subsets of a level have the same priority if they are identical.

The CPCA is an algorithm such that a job of a higher level will be assigned higher priority than a job of a lower level. CPCA will be defined once the priorities of jobs within a level are specified. This shall be done inductively as follows:

(1) Assign priorities to jobs of level 1 arbitrarily.
(2) Assume priorities have been assigned to jobs of level i for all i, 1 ≤ i < n. Let I and J be two jobs of level n. Let I(i) and J(i) be the sets of successors of I and J that are in level i respectively, 1 ≤ i < n. Define

L(I) = (I(n-1), I(n-2), ..., I(1))
L(J) = (J(n-1), J(n-2), ..., J(1)).

A higher priority will be assigned to I than to J if there exists an m, 0 ≤ m ≤ n-2, such that the first m sets of both lists coincide and the (m+1)st set of L(I) has higher priority than the (m+1)st set of L(J). If such an m does not exist, then the two lists coincide and we make an arbitrary choice of assignment of priorities between I and J.

5.2 Optimality of the Critical Path and Count Algorithm

For a 2-processor system, CPCA will be shown to produce optimal schedules.

Theorem 3: For a 2-processor system and a set (Q, μ, <) of jobs where μ(J) = 1 for all J ∈ Q,

ω_CPCA = ω_0,

where ω_CPCA and ω_0 are the completion times using respectively a schedule produced by CPCA and an optimal schedule.
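Before turning to the proof, the inductive priority assignment of the CPCA can be sketched as follows (an illustration only; the sketch compares sets by size and then by the descending priorities of their members, as in the definition above, and it uses immediate successors for the sets I(i), an assumption where the definition could also be read as all successors):

```python
def cpca_priorities(succ):
    """Within-level priorities of the CPCA (a sketch; ties are broken by
    insertion order, which the algorithm leaves arbitrary).  Returns a map
    prio: job -> integer, where a larger number means a higher priority and
    a job of a higher level always outranks a job of a lower level."""
    lv = {}
    def level(j):                    # l(J) = 1 + longest chain of successors
        if j not in lv:
            lv[j] = 1 + max((level(s) for s in succ[j]), default=0)
        return lv[j]
    for j in succ:
        level(j)
    prio = {}
    def set_key(jobs):               # a set ranks by size, then by the
        return (len(jobs),           # descending priorities of its members
                sorted((prio[s] for s in jobs), reverse=True))
    for k in range(1, max(lv.values()) + 1):
        jobs_k = [j for j in succ if lv[j] == k]
        def list_key(j):             # L(J) = (J(k-1), ..., J(1))
            return [set_key([s for s in succ[j] if lv[s] == i])
                    for i in range(k - 1, 0, -1)]
        jobs_k.sort(key=list_key)    # ascending key: last job ranks highest
        for j in jobs_k:
            prio[j] = len(prio) + 1
    return prio
```

Processing levels from 1 upward makes the induction explicit: when a level-k job is compared, every successor already carries a priority, so the set comparison of the definition is well defined.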
The notations in Section 3.2 and Lemma 3.1 of Section 3.3 of Chapter 3 will be used in the proof of Theorem 3. Of course, the function α is defined using CPCA instead of CPA, and the functions t and p are defined using a schedule produced by CPCA.

Proof of Theorem 3: Suppose Q is executed using a schedule produced by CPCA. Jobs V_i and W_i are defined recursively as follows:

(1) V_0 is defined to be the job executed by P_1 satisfying t(V_0) = ω_CPCA - 1 (i.e., V_0 is the last job to be executed by P_1). Similarly, W_0 is defined to be the (possibly empty) job executed by P_2 with t(W_0) = ω_CPCA - 1.

(2) In general, for k ≥ 1, W_k is defined to be the (possibly empty) job J for which α(J) < α(V_{k-1}), t(J) < t(V_{k-1}), and t(J) is maximal. It follows from Lemma 3.1 that W_k must be executed by P_2. V_k is defined to be the job executed by P_1 satisfying t(V_k) = t(W_k).

If W_1 does not exist, then no processor is idle before time ω_CPCA - 1 and ω_CPCA is clearly equal to ω_0. Hence it may be assumed that W_1 (and therefore V_1) exists. Suppose the W_i's exist for 0 ≤ i ≤ m. X_i is defined to be the set of jobs J satisfying t(V_{i+1}) < t(J) ≤ t(V_i) but with J ≠ W_i, for 0 ≤ i < m. Since V_{m+1} does not exist, X_m is the set of jobs J with t(J) ≤ t(V_m) and J ≠ W_m.

The cardinality of each X_k is odd. Let |X_k| = 2n_k - 1 for a positive integer n_k, 0 ≤ k ≤ m. An example illustrating the definition of the X_i's is given in Figure 5.2, which is a reproduction of the timing diagram in Figure 5.1 with boxes drawn to isolate the sets X_i.

The heart of the proof of the theorem is contained in the following assertion:

(*) J ∈ X_k and J' ∈ X_{k+1} imply J' < J, for 0 ≤ k ≤ m-1.

For k = 0 there is nothing to be proved. Assume, for induction, that (*) is true for all k less than i for some i, 1 ≤ i ≤ m-1. Now α(W_{i+1}) < α(J) for all J ∈ X_i by definition of W_{i+1}. But t(W_{i+1}) < t(J) for all J ∈ X_i. Therefore J is not ready at t(W_{i+1}), for all J ∈ X_i. This implies V_{i+1} < J for all J ∈ X_i.
Suppose, for induction, that I' < J for all J ∈ X_i, for each I' satisfying I' ∈ X_{i+1} and t(I') > t(V_{i+1}) - r, for some r, 0 ≤ r ≤ n_{i+1} - 1. Let I be a job of X_{i+1} such that t(I) = t(V_{i+1}) - r. If I < I' for some I' ∈ X_{i+1} with t(I) < t(I'), then I < J for all J ∈ X_i by the induction hypothesis above. Hence it can be assumed that I precedes no jobs in X_{i+1}. Suppose, for contradiction, that I does not precede some job K in X_i. Now α(K) > α(V_i) by definition of X_i, and α(V_i) > α(W_{i+1}) by definition of W_{i+1}. Therefore α(K) > α(W_{i+1}). Similarly, α(I) > α(V_{i+1}). All successors of I, with the possible exception of W_{i+1} and of those in X_i, are of lower level than that of V_i, by the first induction hypothesis and the definition of the W_i's, and hence lower than that of K. The fact that α(K) > α(W_{i+1}) implies that the set of successors of V_{i+1} at the level ℓ(K) has higher priority than the set of successors of I at the same level. Successors of I of level higher than ℓ(K) are contained in X_i, which is contained in the set of successors of V_{i+1}. According to the definition of CPCA, α(V_{i+1}) > α(I), which is a contradiction. Thus I < J for all J ∈ X_i. This completes the induction and proves (*).

Figure 5.1 Example of Algorithm CPCA. The priority list is L = (J_19, J_18, ..., J_1); jobs have been indexed so that the index of a job is equal to the label assigned by Algorithm CPCA, i.e., α(J_j) = j. The figure shows the resulting timing diagram on processors P_1 and P_2.

Figure 5.2 Example for the definition of the X_i's. The figure reproduces the timing diagram of Figure 5.1 with boxes drawn to isolate the sets X_i; job indices again correspond to labels, i.e., α(J_j) = j.
requires at least Z n, units of time in any m k=0 * schedule. Since w . = Z n, the schedule produced by CPCA is optimal. ■ For to,ju,0 with ju( ; J) = 1 for all Jety CPCA and the Coffman-Graham algorithm are two variations of the CPA which assign higher priorities to higher level jobs. It would be interesting to know if there exists an efficient algorithm which assigns higher priorities to higher level jobs and which is the best among the set of all such algorithms. 59 CHAPTER 6 ON THE SUCCESSORS ALGORITHM 6.1 The Successors Algorithm Let (&.,/!,<) "be a set of jobs with u(j) = 1 for all JeQ.. The Successor Algorithm (SA) assigns a higher priority to a job with more successors than a job with less successors. If two jobs have the same number of successors then SA breaks tie between them arbitrarily. 6.2 A Bound on the Successor Algorithm In this chapter the following theorem will be proved. Theorem h : For a 2-processor system and a job set (<2,u,<) with jli(j) = 1 for all Jeg where w and w are the lengths of a schedule produced by the SA and an optimal schedule respectively. Moreover, the bound is the best possible. Proof : The notations of Section 3.2 and Lemma 3.1 of Section 3-3 will be used. The function a is defined by SA and the functions t and p are defined using a schedule produced by SA. SS will be used to represent a schedule produced by the SA. OPS is an arbitrarily chosen optimal schedule (chosen from the set of all possible optimal schedules) . Let 0-, be *^ e set of columns in SS each of which has 6o only one job and p be the set of columns in SS each of which containing two jobs. Thus w = |0 | + |$f |. A column in 0. will be referred to as a 0. -column for i=l,2. The proof of Theorem k is now divided into two parts . Part 1: [jjL [g|0 2 1 is assumed There are |0_ | + 2.|0 | non-empty jobs in Cfr . Thus w. ^ (|^| + S|0 J)/2 . 
Therefore k 3 ' Part 2:_ |0 , | > [gjLJ is assumed The heart of the proof lies in the fact that OPS has at least l^l-l^g | empty jobs. Jobs V. ' s, W. ' s and sets of jobs X. ' s are defined in i ' i i exactly the same way as in the proof of Theorem 3- The following properties about V. 's, W. 's and X. 's are true independent of the algorithm used to produce the schedule. (1) V. < V. for all g j < i g m. (2) V. < J for all JeX. with g j < i ^ m. 61 (3) a(W. O < a(V.) < a(j) for all JeX. with g i ^ m. By definition of the V. 's and W. 's each n -column contains * l i r l a V. and a W. = since a(0) = 0. Let " "be the subset of oL j j r vr/ r 2 ^2 consisting of all -columns each of which contains a V, (and hence a W too) . Let 0' be -0" Thus = 0' U 0". It can he seen that 01- columns are contained in X. 's. An X. consists of some r 2 11 0' -columns and the job V. . Since the V. 's form a chain (property (l)) there is a chain of length |0"|+|0 | in Cl. The following assertion will be proved: For any J in X. where J f Y. f J precedes V where V is the (only) job in the second 1 S S ' ■ ■ '■ -column on the right of V.. Equivalently this assertion says that each job in a 0'-column precedes the job V in the second -column on the right of the 0' -column. The above assertion will be proved by induction. Let JeX. and J / V. . Suppose, as induction hypothesis, that all jobs in X. different from V. and executed to the right of J satisfy the i l — — conclusion of the assertion. If J precedes any one of these jobs then J precedes V by induction hypothesis. Hence it can be assumed s that J precedes no jobs in X. . It can also be assumed that J precedes no V. executed to the left of V for otherwise J < V. < V and the J s OS assertion is proved.. Let V f be the job in the first -column on the right of V. . Then V. precedes the following: (a) All V., for all j < i. This means V. precedes all V. executed to its right (property (l)). (b) All jobs in X., for all j < i (property (2)). 
(c) All jobs executed to the right of V_f. This follows from the fact that V_f precedes all jobs executed to its right and V_i < V_f.

Now α(V_i) < α(J) according to property (3). Hence J has at least as many successors as V_i. If J does not precede V_f then, by a simple counting on the timing diagram, V_i will have more successors than J, which is a contradiction. Thus the assertion is established.

The assertion has the following implication: if J is in a φ'_2-column and V_k is in a φ_1-column, then either J < V_k or V_k < J, unless V_k is in the first φ_1-column on the right of the φ'_2-column containing J. Now there are two jobs in each φ'_2-column. Thus the following important conclusion can be drawn: if J_1 and J_2 are the two jobs in a φ'_2-column, then either J_1 or J_2 is not executed simultaneously with any V_k, where V_k is the job of any φ_1-column, in the arbitrarily chosen OPS. Thus at most half of the jobs in φ'_2-columns can be executed simultaneously with V_k's in φ_1-columns in OPS. The only other jobs of Q that can be simultaneously executed with V_k's in φ_1-columns are the W_i's, and there are only |φ''_2| of them. Now there are |φ_1| V_k's in φ_1-columns, and hence there are at least |φ_1| - |φ'_2| - |φ''_2| V_k's in φ_1-columns which are not executed simultaneously with any other job of Q in OPS. Therefore there are at least |φ_1| - |φ_2| empty jobs in OPS. Hence

ω_0 ≥ (|φ_1| + 2|φ_2| + |φ_1| - |φ_2|)/2 = |φ_1| + |φ_2|/2.

Therefore

ω_SA/ω_0 ≤ (|φ_1| + |φ_2|)/(|φ_1| + |φ_2|/2) ≤ 4/3 since |φ_2| < |φ_1|. ∎

The example in Figure 6.1 shows that the bound 4/3 is the best possible. For that partial ordering, SA might produce a priority list L for which the corresponding SS has length 4, while an OPS has length 3.

The following question is still open: what is the least upper bound for ω_SA/ω_0 when there are 3 or more processors in the system?

Figure 6.1 An example for which ω_SA/ω_0 = 4/3

CHAPTER 7

CONCLUSION

At this point there are certainly more questions than answers available.
But the following questions seem to be particularly interesting.

1. What efficient algorithm A exists such that ω_A/ω_0 is bounded away from 2? It seems natural that if one is willing to use more complicated algorithms, one can be guaranteed that ω_A/ω_0 will get closer to 1.

2. What efficient algorithms producing optimal schedules exist for a job set (Ω, μ, <) with μ(J) = 1 for all J ∈ Ω when there are 3 processors in the system?

LIST OF REFERENCES

[1] Chen, N. F. and C. L. Liu, "On a Class of Scheduling Algorithms for Multiprocessor Computing Systems," 1974 Sagamore Computer Conference on Parallel Processing.

[2] Chen, N. F. and C. L. Liu, "Bounds on the Critical Path Scheduling Algorithm for Multiprocessor Computing Systems," (to appear).

[3] Coffman, Jr., E. G. and R. L. Graham, "Optimal Scheduling for Two Processor Systems," Acta Informatica 1,3 (1972), pp. 200-213.

[4] Cook, S. A., "The Complexity of Theorem Proving Procedures," Proceedings of the 3rd ACM Symposium on Theory of Computing, 1971.

[5] Fujii, M., T. Kasami, and K. Ninomiya, "Optimal Sequencing of Two Equivalent Processors," SIAM J. Appl. Math. 17,4 (July 1969), pp. 784-789; Erratum 20,1 (January 1971), p. 141.

[6] Garey, M. R. and R. L. Graham, "Bounds for Multiprocessing Scheduling with Resource Constraints," SIAM Journal on Computing, (to appear).

[7] Garey, M. R., R. L. Graham, D. S. Johnson, and A. C. Yao, (to appear).

[8] Graham, R. L., "Bounds on Multiprocessing Timing Anomalies," SIAM J. Appl. Math. 17,2 (March 1969), pp. 416-429.

[9] Graham, R. L., "Bounds on Multiprocessing Anomalies and Related Packing Algorithms," Proceedings of the AFIPS Conference, 40 (1972), pp. 205-217.

[10] Hu, T. C., "Parallel Sequencing and Assembly Line Problems," Oper. Res. 9,6 (November 1961), pp. 841-848.

[11] Karp, R. M., "Reducibility Among Combinatorial Problems," in Complexity of Computer Computations, R. E. Miller and J. W.
Thatcher (Eds.), Plenum Press, New York, 1972, pp. 85-104.

[12] Lam, S. and R. Sethi, "Worst Case Analysis of Two Scheduling Algorithms," (to appear).

[13] Liu, J. W. S. and C. L. Liu, "Bounds on Scheduling Algorithms for Heterogeneous Computer Systems," Proceedings of the IFIPS 1974 Congress, North-Holland Publishing Co., August 1974.

[14] Ullman, J. D., "Polynomial Complete Scheduling Problems," 4th Symposium on Operating Systems Principles, Yorktown Heights, New York (October 1973), pp. 96-101; to appear JCSS.

[15] Yao, A. C., "Scheduling Unit-Time Tasks with Limited Resources," Proceedings of the Sagamore Computer Conference, 1974.

VITA

Nai-Fung Chen was born on May 22, 1945, in China. He received a B.A. degree in Mathematics from the University of Hong Kong. In August 1972, he received an A.M. degree in Mathematics from the University of Illinois at Urbana-Champaign. While at the University of Illinois, he held a teaching assistantship for three years, a teaching fellowship for two years and a research assistantship for two years.

BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-75-724
Title and Subtitle: AN ANALYSIS OF SCHEDULING ALGORITHMS IN MULTIPROCESSOR COMPUTING SYSTEMS
Report Date: May 1975
Author: Nai-Fung Chen
Performing Organization: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
Contract/Grant No.: NSF GJ-41538
Sponsoring Organization: National Science Foundation, Washington, D.C.

Abstract: In this thesis the worst-case performance of three scheduling algorithms for unit-time jobs is studied. The least upper bound on the ratio of the length of the critical path schedule to that of an optimal schedule is established for an n-processor system.
A sufficient condition under which the critical path algorithm is optimal is given. A variation of the critical path algorithm is considered; it is shown to produce optimal schedules when there are only two processors in the system. Finally, the successors algorithm is studied and the least upper bound on its worst-case performance on a 2-processor system is obtained.

Security Class: UNCLASSIFIED