UIUCDCS-R-75-716

A STUDY OF CONCRETE COMPUTATIONAL COMPLEXITY*

by

Andrew Chi-Chih Yao

May 1975

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the Department of Computer Science and in part by the National Science Foundation under Grant GJ-41538, and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, 1975.

ACKNOWLEDGMENT

I am very grateful to my thesis advisor, Professor Chung-Laung Liu, for the invaluable guidance, advice and encouragement he gave during my study at the University of Illinois. I would also like to thank Professor Edward Reingold for the genuine interest and kind aid extended to me. Thanks are due to the Computer Science Department of the University of Illinois and the National Science Foundation for supporting this research. Special thanks go to Mrs. Connie Slovak for the excellent work in typing this thesis as well as the earlier manuscripts. I wish to thank my wife, Frances, who introduced me to this fascinating field of study, provided many ideas and suggestions related to this work, and constantly devoted her love. Finally, I am happy to have had a most pleasant time with my officemates, Jim Bitner, Nai-Fung Chen, Brian Hansche, and John Koch.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Concrete Computational Complexity
   1.2 Models of Computation
   1.3 An Overview of the Thesis
2. SELECTION NETWORKS
   2.1 Introduction
   2.2 Known Bounds for U(t,N) and T(t,N)
   2.3 New Results Concerning U(t,N)
       2.3.1 Asymptotic Behavior of U(t,N) for Fixed t
       2.3.2 Other Sufficient Conditions for U(t,N) ≈ ⌈lg(t+1)⌉N
       2.3.3 Lower Bounds for U(t,N)
   2.4 New Results Concerning T(t,N)
       2.4.1 Main Theorem
       2.4.2 Value of T(t,N) for Fixed t and Large N
       2.4.3 More Applications of Theorem 2.8
   2.5 Conclusions
3. COMPUTING THE MINIMA OF QUADRATIC FORMS
   3.1 Introduction
   3.2 Outline of Proof
   3.3 Concluding Remarks
4. FINDING MINIMUM SPANNING TREES
   4.1 Introduction
   4.2 Algorithm
   4.3 Remarks
5. SCHEDULING UNIT-TIME TASKS WITH LIMITED RESOURCES
   5.1 Introduction
   5.2 The Model
   5.3 Algorithms to be Considered
   5.4 Bounds on the Worst-Case Behavior
   5.5 The Special Case in which <· is Empty
   5.6 Proof of Theorems 5.3, 5.4, and 5.5
   5.7 Proof of Theorems 5.6 and 5.7
   5.8 Proofs of Theorems 5.8 and 5.9
   5.9 The Multi-Weight Packing Problem
   5.10 Generalizations to Non-Uniform Task Lengths
   5.11 Conclusions
LIST OF REFERENCES
APPENDIX A: Proof of Theorem 2.4
APPENDIX B: Proofs for Theorems 5.11, 5.12, 5.14
VITA

LIST OF FIGURES

2.1 A (2,5)-selector
2.2 A (t,N)-selector
2.3 "Pruning" a (3,N)-selector
2.4 Dividing a network into levels
2.5 Weight assignment of a network
5.1 Disjoint chains C_1, C_2, ..., C_k
5.2 A "bad" partial order for the arbitrary list heuristics
5.3 A "bad" partial order for the level algorithm
5.4 A "bad" partial order for the resource decreasing algorithm
5.5 The schedule generated by the resource decreasing algorithm for the system in Figure 5.4
5.6 An optimal schedule for the system in Figure 5.4
A.1 Construction of a (t,N)-selector

1. INTRODUCTION

1.1 Concrete Computational Complexity

A great variety of computational problems are encountered in computer science. Some arise in the design of computer systems, e.g., job scheduling and connection networks; others come from different disciplines but are to be solved on a computer, e.g., solving systems of equations and graph problems. In all cases, good algorithms are needed in order to use the available resources efficiently. The study of the complexity of these computations constitutes the realm of concrete computational complexity.

In complexity theory the goal is to find good algorithms and prove that they are close to the best possible. Thus, there are two complementary facets to the complexity problem. One aspect is to find better and better algorithms and analyze their costs, which gives us upper bounds on the "difficulty" of the problem considered. The other aspect is to establish lower bounds on the minimum cost necessary for all algorithms. When the upper and lower bounds come close together, we have a clear measure of the intrinsic complexity of a problem.

1.2 Models of Computation

For any given problem, a computational model has to be set up so that the complexity of the problem can be discussed in a precise way. This means specifying a repertoire of elementary operations, which defines the class of algorithms allowed, and a complexity measure, which quantifies the cost of an algorithm. The complexity of the problem, the main object of interest, is then defined as the minimum cost required over all algorithms. To make the results meaningful, the class of algorithms should be wide enough, and the cost measure should reflect the "true cost", e.g., the running time of programs, in the real world.

A prominent example is the decision tree model for the problem of sorting a set of numbers {x_1, x_2, ..., x_n}. In this model, an algorithm consists of a finite sequence of queries x_{i_1} > x_{i_2}?, x_{i_3} > x_{i_4}?, ..., where the choice of a query may depend on the answers to the previous queries. The cost of an algorithm is defined as the maximum number of queries asked over all possible inputs. It is well known [17] that there are algorithms with cost approximately n lg n, and that any algorithm must have cost at least ⌈lg(n!)⌉ ~ n lg n. Therefore, the complexity S(n) of this problem is approximately n lg n.

It should be emphasized that care must be taken when practical conclusions are drawn from results in complexity theory. There is always the pitfall of having an over-simplified model. Furthermore, many theoretical results are statements about asymptotic behavior, while the problem at hand may not have the size for the theorems to be interesting.

1.3 An Overview of the Thesis

We shall study the complexity of four problems, one belonging to each of the following categories:
(i) Discrete problem: selection networks
(ii) Continuous problem: computing the minima of quadratic forms
(iii) Graph problem: finding minimum spanning trees
(iv) Heuristic algorithms: scheduling unit-time tasks with limited resources
A summary of the results follows. Preliminary versions of these results have appeared in [27], [28], [29], and [30].
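Before the chapter-by-chapter summary, the following small sketch makes the comparison-counting cost measure of Section 1.2 concrete. It uses merge sort purely as an illustration (no particular sorting algorithm is singled out above), and it counts every comparison as one query of the decision-tree model:

    from itertools import permutations
    from math import ceil, log2, factorial

    def merge_sort_count(a):
        # Sort a and return (sorted list, number of comparisons asked).
        # Each comparison plays the role of one query "x_i > x_j ?" in the
        # decision-tree model of Section 1.2.
        if len(a) <= 1:
            return list(a), 0
        mid = len(a) // 2
        left, cl = merge_sort_count(a[:mid])
        right, cr = merge_sort_count(a[mid:])
        merged, i, j, c = [], 0, 0, 0
        while i < len(left) and j < len(right):
            c += 1                                  # one query to the input
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged += left[i:] + right[j:]
        return merged, cl + cr + c

    n = 8
    worst = max(merge_sort_count(list(p))[1] for p in permutations(range(n)))
    print(worst, ceil(log2(factorial(n))))          # worst-case cost vs. ceil(lg n!)

For n = 8 the sketch prints 17 and 16: this particular algorithm asks at most 17 queries in the worst case, while the information-theoretic lower bound is ⌈lg 8!⌉ = 16, illustrating how close the upper and lower bounds of Section 1.2 are.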
Selection Networks (Chapter 2) [27]

We investigate the complexity of network selection by measuring it in terms of U(t,N), the minimum number of comparators needed, and T(t,N), the minimum delay time possible, for networks selecting the smallest t elements from a set of N inputs. New bounds on U(t,N) and T(t,N) are obtained. In particular, U(3,N) is determined to within an additive constant, and asymptotic formulae for U(t,N) and T(t,N) are obtained for fixed t.

Computing the Minima of Quadratic Forms (Chapter 3) [28]

Let F(x_1, x_2, ..., x_n) be a quadratic form in n variables. We wish to compute the point x^(0) at which F achieves its minimum, by a series of adaptive functional evaluations. We prove that O(n²) evaluations are necessary in the worst case for any such algorithm.

Finding Minimum Spanning Trees (Chapter 4) [29]

We obtain an algorithm which finds a minimum spanning tree in O(|E| lg lg |V|) time for a graph G = (V,E). Previously the best algorithms known had running time O(|E| √(lg |V|)) for sparse graphs.

Scheduling Unit-Time Tasks with Limited Resources (Chapter 5) [30]

A set of tasks is to be scheduled on a multiprocessing system with s resources. Each task takes one unit of time to complete and requires certain amounts of the resources. The schedule is to be consistent with a prescribed partial order relation on the tasks, and the total demand for each resource must not exceed a fixed amount at any instant. We analyze the worst-case behavior of several heuristic scheduling algorithms. Let ω_L be the time taken to execute all the tasks according to a priority list L, and ω_0 the time required when they are scheduled in an optimal way. It is shown that, independent of the number of processors, ω_L/ω_0 ≤ s·ω_0/2 + O(s) for any list. When certain heuristic algorithms are used to prepare the list, a significantly improved upper bound can be derived: ω_L/ω_0 ≤ const × s + O(1). Some generalizations are possible to the case in which the "unit-time" restriction is removed. When the partial order relation is empty, the problem becomes a natural generalization of the bin-packing problem; a bound of ω_L/ω_0 ≤ s + 17/20 + ε is given.

2. SELECTION NETWORKS

2.1 Introduction

A (t,N)-selector, where 1 ≤ t ≤ N, is a network with N inputs (x_1, x_2, ..., x_N) and N outputs (x'_1, x'_2, ..., x'_N) such that the set {x'_1, x'_2, ..., x'_t} consists of the smallest t elements of x_1, x_2, ..., x_N. We consider (t,N)-selectors that are built of basic modules called comparators, which are themselves (1,2)-selectors. A (2,5)-selector is shown in Figure 2.1, where a comparator joining lines carrying y_1 and y_2 is drawn as a vertical connection whose outputs are y'_1 = min(y_1, y_2) and y'_2 = max(y_1, y_2). Note that "sorting networks" (for N elements), which have been extensively studied [7], [14], [18], [24], are networks that are (t,N)-selectors for all t (1 ≤ t ≤ N).

Figure 2.1  A (2,5)-selector, with its comparators grouped into levels 1 through 4.

In a network the comparators may be grouped into levels, where within each level are comparisons that can be made simultaneously (see Figure 2.1). The delay time of a network is the minimum number of levels into which its comparators can be grouped. In this chapter, we investigate the complexity of network selection by measuring it in terms of U(t,N), the minimum number of comparators needed in a (t,N)-selector, and T(t,N), the minimum delay time possible for (t,N)-selectors. New bounds on U(t,N) and T(t,N) are presented; a small worked illustration of the model is given below.
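The following sketch is the worked illustration referred to above. It represents a network as a list of levels of comparators and checks the (t,N)-selection property by brute force; the tiny example network is supplied only for illustration and is not the (2,5)-selector of Figure 2.1.

    from itertools import permutations

    def apply_network(levels, x):
        # `levels` is a list of levels; each level is a list of disjoint pairs
        # (i, j) with i < j, one pair per comparator: after the comparator,
        # line i carries min(x_i, x_j) and line j carries max(x_i, x_j).
        y = list(x)
        for level in levels:
            for i, j in level:
                if y[i] > y[j]:
                    y[i], y[j] = y[j], y[i]
        return y

    def is_selector(levels, t, n):
        # Check the (t, n)-selection property by brute force over all inputs
        # that are permutations of 0, 1, ..., n-1.
        return all(set(apply_network(levels, p)[:t]) == set(range(t))
                   for p in permutations(range(n)))

    # A hand-made (1,4)-selector: three comparators grouped into two levels,
    # so it meets U(1,4) = N - 1 = 3 and has delay time 2.
    net = [[(0, 1), (2, 3)], [(0, 2)]]
    print(is_selector(net, 1, 4), sum(len(level) for level in net), len(net))

The sketch prints True 3 2, confirming that this network selects the smallest element with N - 1 comparators and delay 2 = ⌈lg 4⌉.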
In particular, U(3,N) is determined to within an additive constant, and asymptotic formulae for U(t,N) and T(t,N) are given for fixed t. A new lower bound on the delay time for sorting networks is also obtained. Main results are contained in Theorems 2.1, 2.3, 2.7, 2.8, and 2.11.

2.2 Known Bounds for U(t,N) and T(t,N)

In the literature, the following results (A), (B), (C) are known.

(A) [18]   U(1,N) = N - 1,   U(2,N) = 2N - 4.   (2.1)

(B) (Alexeyev; see reference [1])
    (N - t)⌈lg(t+1)⌉ ≤ U(t,N) ≤ (N - t)(1 + S(t)/t),   (2.2)
where S(t) is the minimum number of comparators needed in a sorting network with t inputs.

(C) [26]   T(t,N) ≥ (1/2)(lg t + lg(1 - t/N)).   (2.3)

Remark. Throughout this chapter, logarithms are taken with respect to base 2 and are written lg.

2.3 New Results Concerning U(t,N)

In this section, we shall show that U(t,N) is well approximated by ⌈lg(t+1)⌉N when t is small compared with √N. A new lower bound for U(t,N), which improves on Alexeyev's bound (2.2) in most cases, is also derived. As a consequence, U(3,N) is determined to within an additive constant.

2.3.1 Asymptotic Behavior of U(t,N) for Fixed t

Since the best upper bound known for S(t) is of the order t(lg t)² for general t, the inequality that can be deduced from (2.2) is
    ⌈lg(t+1)⌉(N - t) ≤ U(t,N) ≤ C(lg t)²N,   (2.4)
where C is a constant. Thus the asymptotic behavior of U(t,N) for fixed t is not well determined by (2.2). We shall construct (t,N)-selectors that yield a better upper bound for U(t,N). This enables us to identify the leading term of U(t,N). For example, when t = 11 we can obtain
    4(N - 11) ≤ U(11,N) ≤ 4N + C(lg N)²   (2.5)
for some constant C. In general, we have

Theorem 2.1   U(t,N) = ⌈lg(t+1)⌉N + O((lg N)^(⌈lg t⌉-1)) for fixed t.   (2.6)

To prove Theorem 2.1 we need the following lemma.

Lemma 2.2   U(t,N) ≤ U(⌊t/2⌋, ⌊N/2⌋) + U(t, ⌈N/2⌉ + ⌊t/2⌋) + ⌊N/2⌋.   (2.7)

Proof of Lemma 2.2. We need only show that a (t,N)-selector can be constructed from one (⌊t/2⌋,⌊N/2⌋)-selector, one (t,⌈N/2⌉+⌊t/2⌋)-selector, and ⌊N/2⌋ additional comparators. Figure 2.2 shows such a construction. The reason that it works as a (t,N)-selector is that, after the initial ⌊N/2⌋ comparisons in A, at most ⌊t/2⌋ of the smallest t elements can come out on the ⌊N/2⌋ lines of the lower half. These possible "small" elements are selected by B and fed into C. Therefore all of the smallest t elements are fed into the (t,⌈N/2⌉+⌊t/2⌋)-selector C, which finally outputs them on the top t output lines. This proves Lemma 2.2. □

Proof of Theorem 2.1. According to equation (2.1),
    U(1,N) = N - 1,   U(2,N) = 2N - 4.   (2.8)
Using these equations as the basis of induction, it is not hard to prove (2.6) from (2.7). □

It is clear that (t,N)-selectors which satisfy the bound of Theorem 2.1 can be explicitly constructed by following the inductive scheme illustrated in Figure 2.2, based on (1,N)-selectors and (2,N)-selectors which achieve (2.8). As will be seen in Section 2.3.3, the (3,N)-selectors thus constructed are optimal to within a constant number of comparators.

2.3.2 Other Sufficient Conditions for U(t,N) ≈ ⌈lg(t+1)⌉N

According to Theorem 2.1, ⌈lg(t+1)⌉N is the dominant term of U(t,N) for fixed t as N → ∞. Actually, this is true in more general situations. For example, we have the following theorem.

Theorem 2.3   If f(N) satisfies lim_{N→∞} f(N)/N^(1/2 - ε) = 0 for some ε > 0, then
    lim_{N→∞} U(f(N),N) / (N⌈lg(f(N)+1)⌉) = 1.   (2.9)

Figure 2.2  A (t,N)-selector, built from a column A of ⌊N/2⌋ comparators, a (⌊t/2⌋,⌊N/2⌋)-selector B, and a (t,⌈N/2⌉+⌊t/2⌋)-selector C.
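To see the recurrence of Lemma 2.2 at work numerically, the following sketch iterates (2.7) with the base cases (2.8). The handling of the degenerate cases t ≥ N and N = t + 1 (a chain of N - 1 comparators that pushes the maximum to the bottom line) is an assumption added for this sketch rather than something taken from the construction above:

    from functools import lru_cache
    from math import ceil, log2

    @lru_cache(maxsize=None)
    def u_upper(t, n):
        # Upper bound on U(t, n) obtained by iterating Lemma 2.2.
        if t <= 0 or t >= n:
            return 0                    # select nothing or everything: no comparators
        if n - t == 1:
            return n - 1                # assumed fallback: push the maximum to the bottom
        if t == 1:
            return n - 1                # (2.8)
        if t == 2:
            return 2 * n - 4            # (2.8)
        # Lemma 2.2: a column of floor(n/2) comparators, a (floor(t/2), floor(n/2))-
        # selector on the lower half, and a (t, ceil(n/2) + floor(t/2))-selector.
        return u_upper(t // 2, n // 2) + u_upper(t, (n + 1) // 2 + t // 2) + n // 2

    for t in (3, 5, 11):
        n = 1 << 14
        lead = ceil(log2(t + 1)) * n    # the leading term of Theorem 2.1
        print(t, u_upper(t, n), lead, u_upper(t, n) - lead)

With these conventions the recurrence gives u_upper(3, 5) = 6 and u_upper(3, 6) = 8, matching the values U(3,5) = 6 and U(3,6) = 8 used in the proof of Theorem 2.7 below, and for each fixed t the surplus over ⌈lg(t+1)⌉N grows only polylogarithmically in N, as Theorem 2.1 asserts.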
10 Corollary 2-3 If < a < l/2, then U(N a ,N) ]^*c N%N (2.10) i e Thus, for any t satisfying t = o(N 2 ), U(t,N) is well approximated by [%(t+l)]N. The validity of (2-9) and (2.10) are direct consequences of the following theorem and Alexeyev's lower hound [%(t+l)](N-t) for U(t,N) . Theorem 2.k For t < n/n, U(t,N) ^ [%(t+l)]N+^%(^)j 9 ! t+1 (2.11) \H—]j The proof of Theorem 2.k will he given in Appendix A. Obviously, Theorem 2.1 is implied by (2.11). We have proved Theorem 2.1 separately because it has a more elegant proof of its own, also the construction described there often yields better networks than obtained by the general construction of Theorem 2.k. 2.3.3 Lower Bounds for U(t,N) Since U(l,N) = N-l and U(2,N) = 2(N-2), it is an interesting question whether Alexeyev's lower bound f%(t+l)|(N-t) is in general achievable for U(t, N") . We shall give a new lower bound which shows that, for most values of t, Alexeyev's lower bound cannot be achieved. The new lower bound that improves on Alexeyev's lower bound (2.2) asymptotically whenever t / 2 is given below: Theorem 2 . 3 If t is not a power of 2, we have U(t,N) 2? [%(t+l)]N+(t-2'- t ^ t J)%N +C(t) (2.12) for some function C(t). 11 We shall prove Theorem 2.5 by showing it for the special case t = 3- The arguments are immediately generalizable to any t f 2 by using induction. Thus we are actually going to show Theorem 2.6 U(5,N) i? 2N-6 +ffcjfN/3H (2.13) Proof The basic idea of "pruning" a network used in this proof is similar to that used by Van Voohris [25] in showing lower bounds for sorting networks. Let A be any (3, N)- selector such as the one shown in Figure 2.3a. We will show that, by removing at least ("%[n/3]] comparators from A and reconnecting some of the lines, we shall be left with a (2, N-l) -selector. This leads to inequality (2.13) since U(3,N) i? U(2,N-1) +f%fN/3H = 2N-6 + [ %[ N/3 "| "| • We begin by numbering the lines of the network from the top as in Figure 2.3a. If the smallest element is input to the j-th line, for any 1 ^ j ^ N, this element will always move "upward" across any comparators encountered, and wind up on one of the top three lines at the output end (see Figure 2.3b). As j runs from 1 to N, these N paths can be divided into three groups according to the output line they lead to. One group will contain at least [n/3] paths. We can look at this group of paths as a binary tree, regarding the comparators contained in the paths as branch nodes (internal nodes), and the input terminals as leaves. Since there are at least [n/3] leaves, there is a path with at least |~%[n/3]] comparators connected to it (Figure 2.3c). We can remove this path and all the comparators incident with it, then reconnect the lines (and straighten them if necessary) as shown in Figure 2.3d. The m - ■■■ ■■■ i— ... — - . — ii ■— — ,,_,._ . » m ■ ■ ■.♦-. ■ - i ■ , — . ^.-. „ ^ ■■ 1 1 i 1 1 ,, (a) 12 i 1 1 1 ., I I |_ I J I I ■ I J I i Cb) (c) Figure 2.3 "Pruning" a (3, N) -selector 13 (d) (e) Figure 2.3 (continued) Ik new netowrk of Figure 2.3e is the resulting (2,N-l)-selecotr and this proves our claim. Q We conclude this section with the following theorem. Theorem 2.7 2N-6 +("%[N/3]] £ U(3,N) < 2N-5 +lH(N-3)J (2. lit) Proof We need only prove the upper bound. From Lemma 2.2, we have U(3,N) * U(3, fN/21+1) +U(1,.[W/2J) +[N/2j = U(3Jn/2]+1) +2[n/2J -1 (2.15) Using U(3, 5) = 6 and U(3,6) = 8 as the basis of induction, we obtain the inequality on the right-hand side of (2.14). 
D 2.4 New Results Concerning T(t, N) In this section a new inequality involving T(t, N) is derived. With the help of this inequality, we can determine the asymptotic value of T(t, N) for fixed t and large N to within a term of the order Qoq.foq.faj.'E . For general values of t and N, this inequality also provides lower bounds on T(t, N) that are stronger than those previously known. As an interesting corollary, it is shown that the minimal delay time for a sorting network with N inputs is at least 2.4 %N for large N. 2.4.1 Main Theorem This subsection is devoted to a proof of the following theorem which forms the basis of all later discussions. Throughout this section we adopt the convention that the binomial coefficient (.) is zero if J k < j. Theorem 2.8 T(t, N) satisfies the following inequality: t|"Mt+l)] N (f%(t+l)l-i) \ T(t,N) \ 1 /J (2.16) 15 Corollary 2-9 T(t,N) § fcjN + T(t,N) t[%(t+l)] [ %t J (2.17) Proof of Theorem 2.8 In a network that is divided into s levels, each line can he viewed as being partitioned into s+1 segments as shown in Figure 2.k. >.. ... ]_St segment 2nd 3rd l^th 5 th Figure 2.4 Dividing a network into levels Let us associate with each line segment a weight as follows [l8] (1) The first (i.e. the leftmost) segment of each line is assigned weight 0. (2) If a line is not connected to any comparator at the £ th level, then its weights on the i™ 1 and the £+l s "t segments are the same. (3) Let m. and m. be the weights on the t^h j segments of line i and line j respectively where i < j and £ ^ 1. If there is a comparator at the £ level between line i of and line j, then on their £+1 segments 16 line i has weight min(m. ,m.) and line J has weight max(m.,m.) +1. An example of the weight assignment of a network is shown in Figure 2.5. 1 1 1 2 1 2 2 1 -— — - ■■ 2 2 2 Figure 2.5 Weight assignment of a network. For a (t, N)- selector with s levels, we define two s+1 by [%tj+l matrices X = (x , ) and Y = (y , ) "by letting x Ik the number of lines whose 2 segments have weight equal to k, and y Ik L (k+l-u)x„ , ^ ill u=0 I = 1,2, .. ., s+1 k = 0,1,2, ...,[fcjt] (2.18) Of course the difference y . v~y» v depends very much on the ,th th comparators at the I level. A comparator at the i level will not affect y , -y if at least one of its input segments has weight exceeding k. This is so, because this type of comparator does not change the number of lines with weights u for u ^ k. Only those 17 comparators at the I level where the maximum weight of two input segments does not exceed k can affect y» ,-, v-y« >•• I n fact, it is easily seen that every such comparator contributes exactly -1 to the difference y |+ljk -7 |A - Proof ofLemma A As mentioned above, y , -y . is equal to the — — — — — — — — — — £ jK i ~i-Lj K. number of comparators at the £ level for which both input line segments have weights not exceeding k. Clearly the number of such comparators is bounded by ^-(x +x +. . .+x , ) . D *i + i,k * i (y *,k +y i,k-i 5 k ^ (2 - 19) Proof of Lemma B By definition of y , X 20 = V £0 ^0 +X il + "' +X ik = y £k- y i,k-l kSl * According to Lemma A, we have < 1 y £,0" y £+l,0 - 2 V £,0 aM y £,k- y £ + l,k g l^lk^i^-l 5 k S 1 " This then leads to (2.19). □ N ^ PI Lemma C y £ . * -j-^ Z^ (*. ) (j+l-i) I = 1,2, ...,5+1 2 i=0 j = 0,1,2,. ..,[%t| Proof of Lemma C Use Lemma B and prove by induction. 18 We are now ready to prove Theorem 2.8. 
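As a concrete illustration of the bookkeeping above, the sketch below computes the segment weights defined by rules (1) to (3) level by level. The example network, odd-even transposition sort on four lines, is supplied only for illustration; it is not the network of Figure 2.5.

    def weight_assignment(n_lines, levels):
        # levels[l] lists the comparators (i, j), i < j, at level l + 1.
        # Returns a table w with w[l][i] = weight of line i on its (l+1)-st
        # segment, so w[0] is all zeros and w[-1] holds the output-end weights.
        w = [[0] * n_lines]
        for level in levels:
            nxt = list(w[-1])
            for i, j in level:
                lo, hi = min(w[-1][i], w[-1][j]), max(w[-1][i], w[-1][j])
                nxt[i], nxt[j] = lo, hi + 1     # rule (3)
            w.append(nxt)
        return w

    # Odd-even transposition sort on 4 lines (a sorting network, hence in
    # particular a (t,4)-selector for every t).
    net = [[(0, 1), (2, 3)], [(1, 2)], [(0, 1), (2, 3)], [(1, 2)]]
    for row in weight_assignment(4, net):
        print(row)

For this network the output-end weights are 0, 1, 2, 3; in particular, for every t at most t output lines have weight at most ⌊lg t⌋, which is consistent with the property of selectors invoked at the start of the proof of Theorem 2.8.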
As was shown in reference [15], when weight function is so defined, in a (t, N)- selector there are at most t output lines which have weight less than or equal to [(bg.tj. Therefore, y i n , i = Z (I fait |+l-i)x n . ■"■s+i, L%tJ i=0 s+1 > x ^ ([%tj+t)t = [%(t+l)]t (2.20) On the other hand, according to Lemma C y s + l,lHtj *? £, (LWtJ+l-D^) (2.21) Comparing (2.20) and (2.21), we obtain tr«b»(t+l)l fei Z (r%(t+l)]-i)( n f) (2.22) 2 s i=0 ' x This completes the proof of Theorem 2.8. D 2. it-. 2 Value of T(t, N) for Fixed t and Large N It is easy to transform a (t, N)- selector into a (t-t , N-t„)- selector by deleting the top t Q lines together with all the comparators that are connected to them. This leads to, T(t,N) 1 T(t-t Q ,N-t ) (2.23) In particular, when t ^ N/2, we have T(t,N) £ T(l,N-t+l) ^ |"%(N-t+l)] ^ [%Nj (2.2*0 19 To derive a "better bound than (2.3J+) we use equation (2.17), T(t,N) > %N + Gog. t[%(t+l)] \ T(t,N) V [■** J/ Now, according to (2.2k), T(t,N) > [ %Nj . Thus, (2.25) implies \ T(t,N) ^ %N +% [fcjNj t[%(t+l)] M^tjl For fixed t and large N, L%tJ (2.25) (2.26) [C ^ Nj .. (1M1) . l%tjl (L%tJ)! Therefore, (2.26) becomes T(t,N) > %N + [%tJ%%N +C(t) for some function C(t) The next theorem states that the lower bound in (2.27) is actually a good approximation to T(t, N) in the asymptotic sense. Theorem 2.10 For any fixed t and large N, (2.27) T(2,N) = * z (?) 2 S i=0 1 2 s 1=0 X 2 S 2 i= X 2 21 The left-hand side of (2.28) is equal to tffc»(t+l)] ^ (HN-%P^N) <| Thus (2.28) is not true for s = 2[%tJ. This proves T(t,N) ^ 2[%tJ. D When t > 777-77, we can first use (2.23) to obtain 2 wtjN We now apply Theorem 2.11 to the right-hand side of (2.30) and get Therefore T(t,N) ^ 2(%N-%%N-3) for t 1 pFN ' ( 2 -3l) (2.29) and (2.31) constitute a pair of inequalities that are stronger than F. Yao ' s inequality (2.3)- It should he pointed out that a slightly weaker form of (2.29) can be derived in another way. Noting that each level can contain at most n/2 comparators, we can combine the inequality T(t, N) 1 } vV with Alexeyev's lower bound (2.2) to obtain a result similar to (2.29) • The following theorem is a more interesting application of our technique . Theorem 2.12 Let ql = w p . 7 v « 0.8. For any fixed e > 0, there exists a number f(e) such that T(N° b ,N) 1 (— L^_ e )%N« (2.U1 -€)%N for all N i? f(e) (2.32) Corollary 2.13 There exists a function g(e) such that, T(t,N) > ( g Y -e)%N for all N i? g(e) and n/2 > t > N^. (2-33) 22 Proof of Theorem 2.12 Let e n > "be any fixed number. Without loss of generality we can assume e satisfies e > e > 0. We shall choose e to be small enough such that h(e) > for all e > e > (2.3*0 "0 where h(e) = e[l-%(3 — ) + %(2 - -£-.)■] "0 a + a (5l»»(l -^-) - 2%(l -*£-)) (2.35) °n 1 Now, let t = N u , and s = [( g _, 3 -e)fogN| = [3a -e) %N"| , We shall prove that (2.28) is not satisfied when N is sufficiently large. This then implies the theorem. To prove our assertion, we observe that for large N, Stirling's approximation yields ^v e)HNl l (y)¥)! L j J = (a %N)!((2 ve ) W ! (3 a -e)%w ^2ll(3aQ-€)%N a %n * / \ (2 -€)^N s/2na HN 1 I v2n(2 a -e)%N ' ]_ (3oo-e)%(3Q( D -€)-a %a -(2q D -e)%(2q D -e) constant x - N 23 Therefore, in Formula (2.28) right-hand side ^ — 2 1 [a Q %Nj , . 
After some algebraic manipulation, we have
    right-hand side of (2.28) ≥ constant × N^(α_0 + h(ε)).   (2.35)
However, the left-hand side of (2.28) is equal to
    left-hand side = t⌈lg(t+1)⌉ = N^(α_0) ⌈α_0 lg N⌉.   (2.36)
Since h(ε) > 0, a comparison of (2.35) and (2.36) shows that, for sufficiently large N, the right-hand side of (2.28) is greater than the left-hand side. This is a violation of (2.28). We have proved our assertion. □

(2.33) can be obtained in the following way. First we use (2.23) to get (assuming t ≤ N/2)
    T(t,N) ≥ T(N^(α_0), N + N^(α_0) - t) ≥ T(N^(α_0), N/2).   (2.37)
A lower bound for T(N^(α_0), N/2) can be obtained in exactly the same way as that for T(N^(α_0), N). This leads to (2.33). □

An interesting consequence of Theorem 2.12 is that, for a sorting network with N inputs, the delay time is at least 2.4 lg N for sufficiently large N. This result seems to be new.

2.5 Conclusions

Bounds have been given on the minimal "cost" and "delay time" of selection networks built of comparators. In particular, we have identified the leading asymptotic terms of U(t,N) and T(t,N) for fixed t. Many questions still remain open. For example:
(1) What is the exact value of U(3,N)?
(2) What is the order of magnitude of U(N/2, N) as N → ∞? Does lim_{N→∞} U(N/2,N)/(N lg N) exist?
(3) What is T(N/2, N)?
Solutions to these problems are not only interesting in their own right, but will also give more insight into the sorting network problem.

3. COMPUTING THE MINIMA OF QUADRATIC FORMS

3.1 Introduction

The following problem was recently raised by C. William Gear [11]. Let
    F(x_1, x_2, ..., x_n) = Σ_{i≤j} a'_{ij} x_i x_j + Σ_i b_i x_i + c
be a quadratic form in n variables. We wish to compute the point x^(0) = (x_1^(0), ..., x_n^(0)) at which F achieves its minimum, by a series of adaptive functional evaluations. It is clear that, by evaluating F(x) at (1/2)(n+1)(n+2) points, we can determine the coefficients a'_{ij}, b_i, c and thereby find the point x^(0). Gear's question is, "How many evaluations are necessary?" We shall prove that O(n²) evaluations are necessary in the worst case for any such algorithm.

Assume the coefficients a'_{ij}, b_i, c are such that F assumes its minimum at a unique point. Then (x_1^(0), ..., x_n^(0)) is the unique solution to the following system of equations:
    a_{11}x_1 + a_{12}x_2 + ... + a_{1n}x_n + b_1 = 0
    a_{12}x_1 + a_{22}x_2 + ... + a_{2n}x_n + b_2 = 0
        ...
    a_{1n}x_1 + a_{2n}x_2 + ... + a_{nn}x_n + b_n = 0      (3.1)
where a_{ij} = 2a'_{ij} if i = j, and a_{ij} = a'_{ij} if i < j.

An algorithm consists of a series of queries F(x^(1)) = ?, F(x^(2)) = ?, ..., F(x^(d)) = ?, ..., where the choice of x^(d) depends on the set of values {F(x^(i)) | 1 ≤ i ≤ d-1}. Note that a query F(x) = ? can be written as
    Σ_{i≤j} s_{ij} a_{ij} + Σ_i t_i b_i + c = ?      (3.2)
where the s_{ij}'s and t_i's depend on x. Therefore, a lower bound for our problem can be obtained from a lower bound on the complexity of the following problem.

Problem. Solve (3.1) by means of a series of queries of the form (3.2), where each query (i.e., the choice of the numerical coefficients s_{ij}, t_i) may depend on the results of the previous queries.

In this form, the problem is related to a result of Rabin [20]: it was shown there that, to solve a system of equations Σ_{j=1}^{n} A_{ij}x_j = 0 (i = 1, 2, ..., n-1), we need (1/2)n(n+1) - 1 tests of the form "A_i · V = ?", where A_i = (A_{i1}, A_{i2}, ..., A_{in}). The problem considered there, however, differs from the present problem in two respects: (a) in our problem, A_{ij} and A_{ji} are not independent variables; in fact, system (3.1) corresponds to the case of a symmetric matrix (A_{ij}); (b) queries of the form (3.2) are more general than the tests allowed in [20]. This latter feature makes our problem somewhat more complex than the problem studied in [20], and a different approach is required. The main result is stated below, after a brief illustration of the naive coefficient-determination strategy.
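The sketch below is a concrete rendering of that naive strategy: it recovers the coefficients of F from (1/2)(n+1)(n+2) functional evaluations and then solves the linear system (3.1) for the minimizer. The random choice of evaluation points and the use of numpy are conveniences of this sketch, not part of the argument.

    import numpy as np

    def minimize_quadratic(F, n, rng=np.random.default_rng(0)):
        # One unknown per monomial x_i x_j (i <= j), per x_i, and the constant c.
        monomials = [(i, j) for i in range(n) for j in range(i, n)]
        m = len(monomials) + n + 1              # = (n+1)(n+2)/2 evaluations
        pts = rng.standard_normal((m, n))       # generic points: nonsingular system a.s.
        rows, vals = [], []
        for x in pts:
            rows.append([x[i] * x[j] for (i, j) in monomials] + list(x) + [1.0])
            vals.append(F(x))                   # one functional evaluation per point
        coef = np.linalg.solve(np.array(rows), np.array(vals))
        # Rebuild the matrix (a_ij) and vector (b_i) of system (3.1).
        A = np.zeros((n, n))
        for (i, j), a in zip(monomials, coef[:len(monomials)]):
            if i == j:
                A[i, i] = 2.0 * a               # a_ii = 2 a'_ii
            else:
                A[i, j] = A[j, i] = a           # a_ij = a'_ij for i < j
        b = coef[len(monomials):len(monomials) + n]
        return np.linalg.solve(A, -b)           # the solution of (3.1)

    # Example: F(x) = sum_k (x_k - k)^2 attains its minimum at (0, 1, 2).
    F = lambda x: float(sum((x[k] - k) ** 2 for k in range(3)))
    print(minimize_quadratic(F, 3))             # approximately [0. 1. 2.]

The point of this chapter is that no adaptive scheme can do fundamentally better: evaluations on the order of n² are unavoidable in the worst case.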
and A., are not independent variables. In fact, system (3.1) corresponds to the case of having a symmetric matrix (A. .). (b) Queries of the form (3 '2) are more general than the tests allowed in [20]. This latter feature makes our problem somewhat more complex than the problem studied in [20], and a different approach is required. The main result is the following: 27 Main Theorem To solve the system of equations (3.1) by queries of the form (3-2), O(n^) queries are necessary. It turns out that the present approach can also be used to rederive Rabin's result; some of the technical assumptions made in [20] can actually be removed to result in a stronger theorem. 3.2 Outline of Proof Without loss of generality, we can assume that c = in F. (A) A few concepts . Informally, as more and more queries of the form (3-2) are asked in an algorithm, the possible values of the coefficients a. .,b. are confined to a smaller and smaller region. Correspondingly, X J X the set of vectors (x , ...,x ) satisfying (3*1) is gradually being reduced until finally a unique point remains in the set. At that point the algorithm shall halt with the answer. Therefore, at any point in the computation, we shall denote by A, called the coefficient space , the region of those vectors (a. ,,b.) which satisfy all the queries answered so far. Clearly, AcrP where p = ^-(n+l) (n+2) . Similarly, we denote by X, called the solution space (corresponding to A), the set of those vectors (x.,,...,x ) eR that satisfy (l) for some (a. .,b.) eA. We v 1 ' n ij i shall also write A as A = _JJ A(x) where A(x) is the linear subspace in xeX A that corresponds to the point xeX. Initially, A is taken to be any region in R p such that det(a. . ) 4 for all (a. ,,b.) eA: and X is the solution space corresponding to this A. With each subsequent query of the form (3-2) we shall associate a query vector E = (s. .,t.). Locally, the solution space X is always an algebraic set (the set of common roots to a finite set of polynomial equations in variables 28 x n ,x^, ...,x ) in R n . which makes the concept of dim X well-defined. 1' 2 n This can be proved by induction on the number of queries answered so far (cf. the proof of Lemma 3*1); details will be omitted here. We call x a simple point of X if the tangent plane to X at x has the same dimension as X itself [22]. Throughout our discussion, X will be implicitly restricted to a small neighborhood consisting of simple points only. This frees us from worrying about "bad" behavior of X. Some elementary properties of algebraic sets will be used in the sequel without being stated explicitly. (B) Some notations . We define the functions G. from X to A, assuming a vector a is written as en - (a a . . .a a a -, . . .a . . .a 11 _,_... a., a_ _ 8l~-, ... a_ ...a .. n a a b., b_ • • . b ) 12 In 22 23 2n n-l,n-l n-l,n nn 1 2 n' r G 1 (x) = ( Xl x 2 ...x n ...0.... 1 ...0 ) G 2 (x) = (0 x x ...0 x 2 x 3 ...x n 10 ) G n (x) - (0 x x x £ . .x _ x 0, n-1 n •1 ) (3-3) In this notation, (3-1) can be written as; a -G. (x) = i i=l,2, ...,n (3.1)' and a query is written as "E • a = ?"• 29 (C) Oracle . When a query "E. * a = ?" is asked, the following oracle will specify how to answer it. Later we shall show that 0(n ) queries are needed by all algorithms if we give answer according to this oracle. Let E-,E , ...,E. be the previous query vectors, X the present solution space, and X(c)(c: X) the new solution space if an answer E . , • a = c so that the number max {dimV(E , . . .,E .,G. (x), . . ., G (x),E. 
)} is as large as possible, where V(E, ,.. .,E ., G, (x), .. ., G (x),E. , ) denotes the vector space spanned by the set of vectors inside the parenthesis. Remark We could have used an alternative oracle which simply tries to maximize dimX(c). The present formulation has essentially the same effect, but makes the analysis slightly easier. st The 3+1 query will be called a critical query if E. , is a L.C. (linear combination) of E ,E p , . . .,E .,G, (x), . . ., G (x) for all J- C- J J_ 11 xeX. (D) Analysis of the oracle . Let E ,E , ...,E be the sequence of query vectors produced by some algorithm under the oracle given in (C) . Without loss of generality, assume that they are all linearly independent. Our aim is to show that m ^ O(n^) . at Lemma 3 • 1 Let X be the solution space when the j+1 query is asked and c be the answer provided by the oracle, then dimX(c) = dimX if the j+l s ^ query is not critical, and dimX(c) < dimX if it is critical. Proof For any xeX, A(x) (as defined in Section 3«2 (A)) is the set of vectors aeF satisfying 30 r E * a = d E 2 • a = d 2 / E. • a = d. \ J J G (x) -a = G (x) • a = n ' (3.*0 V where d. is the answer given to the i query "E. • a = ?' The equations for A(x) , the new A(x) after an answer E. • a = c is given, are simply (3-M plus the new equation E. , • a = c (3.5) v The proof of the lemma is divided into two parts: (i) If the j+1 s "t query is not critical, it is not too difficult to show that there exists some xeX, such that (a) xeX(c), (b) E. is not a L.C. of E , .. V E.,G (x),...,G (x), and (c) dimV(E , E.,G, (x), . . ., G (x)) is the largest possible for all X€X. We wish to show that dimX(c) = dimX. Let a = cl eA be a solution for equations (3«*0 and (3 .5) • It is easy to see that, for any x'eX sufficiently close to x, there exists a solution (to (3'M and (3.5))a = cd close to a . Therefore X(c) includes all x"'eX in some neighborhood of x, which implies dlmX(c) = dimX. (ii) If the j+1 query is critical, then E. , is a L.C. of E , . . .,E.,G, (x), . . .,G (x) for all xeX. This implies that the hyperplane in RP F = {a|E. ■ a 3+1 c} contains A(x) if P n A(f) 4 0- Now, if 31 dimX(c) = dimX is true, then there is an open set TcS such that F n A(x) ^ yxeT. Consequently, F contains U A(x) . This means X€T that F contains A locally, which implies that E. is a L.C. of E n , E_, . . ., E., a contradiction. D To derive another useful lemma, we define D = min{dimA(x)} xeX at every stage of the computation. It is not difficult to verify (from equations (3»*0 and (3*5)) that D is decreased by 1 each time a non- critical query is answered, and D does not change when a critical query is answered. As a result, the number dimA -D does not change in the former situation, and is decreased by 1 in the latter case. Since dimA -D is decreased from n to in the computation process, we conclude that there must be exactly n critical queries. This, together with Lemma 3-1, implies the following: Lemma 3 • 2 If the j+1 query is critical and E^ + ^_ • a = c is the answer given under the oracle then dimX(c) = dimX-1. Proof of the Main Theorem ->(0) Let x v be the minimum position found, and j_ < j_ < . . . < j be the query indices at which dimX is forced to be reduced. For any r (l ^ r ^ n), let X be the solution space after the j r query is answered. According to Lemma 3-2 and Lemma 3*1, we have: (a) dimX = n-r (3-7) (b) r E. = linear combination of E„'s 4 linear combination of G.(x)'s E. 
= linear combination of E„ s + r linear combination of G (x)'s v for all xeX (3-8) 32 According to (3.1), there exist integers i,,i„, ...,i 1 2' ' n-r such that (x_. .. ,x n - .., . ..,Xi ) can be used as local coordinates for X J-1 1 2 n-r in some neighborhood of x . Without loss of generality, we may- assume these xi, 's to be (x , ,x _, . . .,x ) . x t v r+V r+2> ' n y From (3*8), it follows that for each X€X, there exists integers ^ (x) , kg (x) , . . .,k r (x) such that G fc (-») ,0^ (f ) , > . . ,G fc (^ are linear combinations of the E 's and the remaining G (x)'s. It is then easy to see that there exists a set of n-r points Z= {x (l \x (2) ,...,x (ri " r) } such that: , N ->(1) ->(2) -*(n-r) _. . . , , . ( a ) y >7 >'••>¥ are linearly independent vectors where y^ 1 ' is the projection of the vector x on its last n-r components. (3-9) (b) There exists integers 1 s,i 1 2 r such that, for all xeZ, Q ± (x) ,G ± (x) , . . .,G ± (x) are linear combinations of E.'s and the remaining G/x)'s. (3-10) Now, let V, be the linear space spanned by {&,-., (x"), . ...G_. (x) IxeZ}, l i if and V p be the linear space spanned by the E 's and the other G (x)'s (i / i t Vt). It is clear by (3-10) that dimV ^ dimV (3-H) and m+(n-r) 2 ^ dimV (3-12) Furthermore, the following proposition is true: Proposition dimV-, l (n-r) (r- (n-r) ) (3 «13) Proof of Proposition Obviously i.,,i 2 , ...,i / n < r+1. By explicitly examining the forms of G. (x)'s in equation (3-3) and making use of (3'9), 33 it is seen that all the vectors in the set Q, = {&,• (x) |l ^t ^r-(n-r), t xeZ} are linearly independent. It follows that dimA^ § \Q,\ = (n-r)(r-(n-r)) which is (3-13)- □ Equations (3-11), (3-12), and (3-13) lead to m+(n-r) ^ (n-r) (r-(n-r)) 5 Taking r = 7- n, we obtain m ^ ^2 n (3- 1*0 Thus, at least O(n^) queries are needed. This completes the proof of our main theorem. □ 3-3 Concluding Remarks We have shown that 0(n ) adaptive functional evaluations are required to find the minimum point of a quadratic form. It seems likely that an exact bound could he obtained by making more effective use of the lemmas. As mentioned earlier, our approach can also be used to prove a stronger version of the result derived in [20]. 3k k . FINDING MINIMUM SPANNING TREES k . 1 Introduction Given a connected, undirected graph G = (V, E) and a function c which assigns a cost c(e) to every edge e eE, it is desired to find a spanning tree T for G such that Z c(e) is minimal. In this note eeT we describe an algorithm which finds a minimum spanning tree (MST) in 0( |E | Bog.fog.|v|) time. Previously the best MST algorithms known have running time 0(|e|%|v|) for sparse graphs [1]; and more recently Tarjan [23] has an algorithm that requires 0(|e| v%|V|) time. Our algorithm is a modification of an algorithm by Sollin [2] . His method works by successively enlarging components of the MST. In the first stage the minimum-cost edge incident upon each node of G is found. These edges are part of the MST sought. The groups of vertices that are connected by these edges are then identified. By shrinking each such group of vertices to a single node, we obtain a new graph with at most |v|/2 nodes. This process is repeated for a number of times, at each stage for a new graph, until finally a single contracted node remains. Clearly each stage of this procedure involves 0(|e|) operations, and %|v| stages are necessary in the worst case. Thus this algorithm requires a total of 0(|e|%|v|) operations. 
In our algorithm, we first partition the set of edges incident with each node v into k levels El , E^ , . .., El ' so that c(e) ^ c(e') if e eE^ 1 ', e 1 eE^' and i < j. This can be done in 0(|E|%k) time by 35 repeatedly applying the linear median -finding algorithm [3] • Having accomplished this, we follow basically Sollin's algorithm as outlined above. Note that the number of operations needed in this phase is now reduced to 0(-L—L%|v| ) since only approximately JE | /k edges have to be examined at each stage to find the minimum-cost edges incident with all the nodes. Therefore, the total number of operations required by our algorithm is 0( |e | %k + -^-H|v| ), which is 0(|e|%%|v|) if we choose k to be %|v|. k.2 Algorithm For the moment, assume |e| 1 |v|fy|v|. If |e| < |v|%|v|, the algorithm needs a slight modification as will be discussed later. The algorithm uses three sets T, VS, and ES. T is used to collect edges of the final spanning tree. The set VS contains the vertex sets corresponding to the connected components of the spanning tree found so far. And ES contains, for each vertex set W in VS, an edge set E(w) . Initially we have VS = { fv} |v eV) and ES = {{all the edges incident upon v} |v eV} . The algorithm also uses an integer parameter k, a level function I : V -> {l, 2, . . .,k, k+1}, > and a function low: V -» real numbers. 36 Procedure MST; begin T *- 0; VS <- 0; ES <- 0; for each vertex y eV do begin add the singleton set (v) to VS; add the set E({v}) = [all the edges incident with v} to ES; A. divide E({v)) into k levels of equal size according to cost, i.e., obtain EA ,E^ , ...,Ey with the property ■ UiE^ = E (M) and c ( e ) ^c(e') if e eE^, e'eE^ 3 '. and i < j ; that -J^Ey = E({v}) and c(e) ic(e') if e eE^ ; ,e'eE and i < j set a (v) *- 1 end while |vs| > 1 de- begin B. take a vertex set ¥ from VS; for each vertex v eW do begin low(v) *-oo; while low(v) = oo and i(v) ^ k do ? each begin begin for each edge e = (v,v') in El do C. if v'eW then delete e from E^ V ^ ; D. else low (v) *- min{low(v), c(e)}; end if low(v) = oo then i(v) + 1; end end F. find the edge e = (v,v') in E(w) whose cost is equal to min{low(v) |v eW} ; H. in VS, replace ¥ and the vertex set ¥' containing v' by ¥U¥'; I. in ES, replace E(w) and E(W) by E(w) UE(W); add e to T; end output T; end MST 37 k.~5 Remarks (1) In the above procedure, the set VS is implemented with a circular queue. Step B corresponds to removing W from the front of the queue; step H corresponds to deleting W and adding the new W to the tail of the queue. A full cycle of the queue, in which every vertex set of VS is merged with some others, corresponds to one "stage" of the algorithm as discussed before. (2) Step A is done by applying the median- finding algorithm [3] repeatedly, and takes 0( |e| %k) time . (3) Step C is executed at most 2 | E | times, since each edge (v,v') of G can be thrown out at most twise--once as (v,v'), once as (v',v) . (k) Steps D and F amount to approximately order %|v| Z veV |E({v}) M(#U|v|) min operations . (5) The set union operation in step I can be implemented in a straightforward manner; for example, as in [21]. The total number of operations incurred is o(|v|%|v|). It follows that the total cost of this algorithm is of the |E|%k + (bj|v| M + |v|h|v| . (h.l) 38 Taking k = %|v| and noting |e| ^ |v|%|v|, the above expression is bounded by const. X (|e|%%|v|). If |e| < |v|to»|v| for a graph G, we will first let k = 1 and execute procedure MST until each vertex set in VS is of size at least %|v|. 
This process takes at most %%|v| "stages" since the size of the smallest vertex set in VS at least doubles after each stage. Hence the amount of work involved in this process is 0( |E I Qoq.bg. |v | ) . The result can be regarded as a new graph G' = (V,E') where |V | ^ |v|/%|v| and |e' | =g |e|. We now apply Procedure MST to G' with k = hj, |V | . The number of operations required is given by (^.l), iE-i%k + %ivi J^J- + ivmivi ^ |E^k + h|v| J|i + |v|, which is again 0(|e|%%|v|) for k = %|v|. 39 5. SCHEDULING UNIT- TIME TASKS WITH LIMITED RESOURCES 5-1 Introduction One great advantage offered by the multiprocessing system is the potential decrease in computation time. Because of the sequential constraints that often exist among the tasks, a good scheduling algorithm is essential for the efficient utilization of parallel processing facilities. Various aspects of an abstract multiprocessor model have been studied in the literature [h], [5], [6], [8], [13], [19]. That model considers a set of tasks {T , T p , . ..,T } to be processed on n identical processors. A partial order <• on [T-,...,T } is given, and a function /i(T. ) which determines the execution time for T. is 1 1 specified. In a schedule a task cannot be executed unless all of its predecessors (in <•) have been completed. Recently Garey and Graham [9] incorporated the idea of "resources" into this model. Each task requires certain amounts of resources for its execution. A schedule now has to satisfy the additional constraint that the total demand for each resource cannot exceed a fixed amount at any instant. It was found [9] that, in general, the efficiency of a schedule derived according to an arbitrary priority list is completely unpredictable. We examine an interesting case of the Garey-Graham model in which each task takes unit time to complete. Since efficient methods for finding optimal schedules seem unlikely to exist even without the 1^0 resource complication [23], it is important to study practical heuristic algorithms. In the present paper, the worst-case behavior of several heuristic algorithms is analyzed. In particular, the total execution time resulting from the use of certain heuristic algorithms is shown to differ from the best possible by no more than a multiplying factor that depends on the number of different resources only . Some results derived here can be generalized to the case when the lengths of execution time for tasks are not the same. This will be briefly discussed in section 10. Even with the "unit-time" restriction, however, this model is of some practical interest in view of its close connection with preemption scheduling [6]. Main results are stated in sections k and 5- The proofs are given in sections 6, 7, 8, and 9- 5-2 The Model A system consists of (n, s,F,<*,R) where n and s are positive integers, F = (T(l),T(2), . . .,T(r) } a set of tasks, <• a partial order on F, and R is a vector function with s components defined on F. We require o ^ R.(T(i)) ^lyij where R(T(i)) s (R (T(i) ),R p (T(i) ), . . ., R s (T(i))). (A) A schedule f is a finite sequence of non-empty subsets of tasks F X ,F 2 , .-.,F U such that: CO (i) U F. = F and F. n F. = yi 4 3 i=1 l i J (ii) if T(i) <• T(j), T(i) eF v , T(j) eF, then k < I (iii) |F k | ^ n Vk in (iv) E T(i)eF. R^(T(i)) ^ 1 Vk,J> k go is said to be the running time of schedule f . (B) Interpretation . 
n is the number of processors, s the number of resources, R.(T(i)) the amount of resource j demanded by the execution J of task i, and F, is the set of tasks executed simultaneously between time k-1 and k. (C) List schedule . Given a list (i.e., a permutation of the r tasks), L = (T(i, ), T(i ), ..., T(i )), a schedule f is generated as follows: Step (a) Set i +- 1. Step (b) Let F. «- 0. Step (c) If list is empty, stop. Else scan list from the beginning; find the first task T(ij) such that if we let F. «- F. U (T(i.)}, all the conditions (ii), (iii) and (iv) of (A) are not violated; set F. «- F. U {T(i.)l and delete T(i.) from the list. Step (d) If |f. I = n or no eligible T(i.) could be found in step (c), -J- J then set i «- i + 1 and go to step (b) . Otherwise goto (c). (D) Some useful definitions . Definition 5.1 m and W are functions defined on F by: m(T(i)) = max{R. (T(i)) |l§ j^ s} J W(T(i)) = Z R,(T(i)) 0=1 J Definition ^.2 Let A be a set of tasks. R.(A), m(A), W(A) are J defined by: R (A) = Z R.(T(i)) J T(i)eA J k2 m(A) = max{R.(A)} 3 s W(A) = E R.(A) 0=1 J 5«3 Algorithms to be Considered The following algorithms are used to generate lists, which in turn produce schedules as was described in section 2. (A) Arbitrary list . Just form any list. (b) Level algorithm . This algorithm and its variants have been considered by several authors [2], [5], [6]. First a level function H is defined by: H(T(i)) = M if the longest chain of tasks that starts with T(i) has length M. A list L is then defined by the following linear order relation a (T(i) appears before T(j) in L if T(i) a T(j)). (i) T(i) a U) if H(t(i)) > H(T(j)) (ii) Let T(i) a T(j) if H(T(i)) = H(T(o)) and i < j. (C) Resource decreasing algorithm . This is a generalization of the first fit decreasing algorithm used in bin packing problem [10]. A linear order a for L is defined by: (i) T(i) a (Tj) if m(T(i)) > l/2 andm(T(o)) ^ l/2. (ii) If (i) is not applicable, then T(i) a T(j) if W(T(i)) > W(T(o)). (iii) In (ii), if W(T(i)) = W(T(j)), then T(i) a T(j) if i < 0- k3 5.4 Bounds on the Worst-Case Behavior We shall discuss the worst-case "behavior of algorithms defined in the last section. As explained in section 2, n denotes the number of processors, r the number of tasks, s the number of different resources, and <• a partial order on the set of tasks. Let (jo be the running time for an optimal schedule (i.e., one which requires minimal running time), and 00 be the running time associated -Li with list L. Theorem 5.3 If n ^ r, then ^ s \ s(u o + T-) for any list L 5-1) The following theorem states that (5-1) can be essentially achieved. Theorem 5 »k There exists systems with arbitrary large w and n l r for which w_ |s( Wo -2s) S ^ for some list L (5-2) to When the condition n ? r is dropped, an upper bound for — can readily be obtained if we regard "processors" also as a resource. However, a more detailed analysis yields the following stronger result: Theorem 5»5 For any n and any list L, L . n-1 7 (n-l) , a) 2n 2n (5-3) 1* Formula (5*2) shows that, unlike in the conventional model where "resources" are not taken into account, the worst-case behavior of a^/w is not bounded by a constant. The following theorems show that this behavior improves drastically if some efforts are made in preparing the list. Theorem 5.6 If a list L is prepared by using level algorithm, then: U) — ^ — ^— (2s+l) +1 for arbitrary n (5-*0 — ^ 2s +1 if rln (5.5) Theorem 5*7 For any given € > 0, there exists systems with arbitrary large co such that 17 . 
__L where L is generated / /-«. 10 " co by level algorithm If the resource decreasing algorithm described in section 3 is used, a still better upper bound exists for w /co J_j u . Theorem 5.8 If a list L is obtained by using resource decreasing algorithm, then: J± < Bzl (? s+l) + 1 for any n (5-7) co n v 4 y W T 7 — g f s+ 1 ifrgn (5-8) An asymptotic lower bound is given by: Theorem 5.9 For any given e > 0, there exists systems with arbitrary large oj„ such that ^ go where L is generated ^ s- e < — by resource (5*9) decreasing algorithm 5.5 The Special Case in which <• is Empty An upper bound for oj /to was derived in [9] under the assumption r ^ n for this special case. That bound to /to ^ s +1 is true even when the lengths of execution time for tasks are not the same. The following theorem gives a slightly better asymptotic upper bound when uniform execution is assumed. Theorem 5. 10 If r ^ n, and <• empty, then for any given e > and an arbitrary list L, W L . 17 if w n is — ^ s + — - +e . . oj 20 large enough (5. 10) The following theorem can be obtained by constructing examples. Theorem 5 .11 For any given e > 0, then exists systems with <■ empty, and arbitrary large to satisfying 7 \ s + -^r- - e < — for some list L (5«ll) 10 a) Theorem 5.12 If <• is empty, then for any e > 0, and any list L, w - IT (s + io ) +1 +€ for large w o (5 - 12) (5-10) and (5-12) may not be the best possible bounds. For example, a stronger statement can be proved for the case s = 2. k6 Theorem 5.13 If s = 2, r g n, and <• is empty, then for any given G > 0, ~ - 15 + e for lar § e w (5-13) It is interesting to note that the scheduling problem with s = 1, r ^ n, and <• empty is equivalent to the bin-packing problem. The resource demanded by a task corresponds to the weight of an object, and the running time w corresponds to the number of boxes used in the bin-packing problem. If we extend the concept of weight to an s-dimensional weight-vector, then our problem considered here can be regarded as a bin- packing problem with multi-weight. We shall return to this point in section 9« 5.6 Proof of Theorems 5-3j 5«^-, a nd 5-5 (A) Proof of Theorem 5*3 . Let f be the schedule generated by list L, and F, ,F „, . ..,F W be the sets of tasks corresponding to f as defined in section 2. Several preliminary results are needed to establish the theorem. Lemma 5 • 1^- For any task T(j ), there exist a sequence of integers a < a _ < . . . < a^ < a n and a chain of tasks T(j ) <• T(j„ , ) <• ... q q-1 2 1 v q y vu q-l y <• T(j ) <• T(j ) such that (i) T(j £ ) e F a£ for 1=1,2,... ,q. (ii) m(T(,j )) > l/2, unless for all |(lg£ l/2. q q x. (iii) If I / a. Vi, and a < I < a_, then m(F.) > l/2. \ / ' . x q 1 1 Proof The following procedure generates a sequence of integers a , a , ...,a and a chain of tasks T(j,), T(jp), ..., T(j ) that satisfy the required conditions. ^7 st ep (a) (initialization) Let a-^ be such that T( j, ) eF Q : t ■*- 1; v *- a , Step ("b) (Termination condition) If m(T(j, )) > l/2 or v = 1 then stop. Step (c) (Try to find a T(d t41 ) eF y _ 1 such that T(j t+1 ) <• T(j t ) . This is always possible if m(F ) ^ l/2, since T(j ) v — J- o could not be executed between time v-2 and v-1.) If 3 a u such that T(u) <• T(jt) and T ( u ) gF ->> then v *- v-1, t «- t+1, j , ■*- u, a, <- v else v *- v-1; Step (d) goto Step (b). Condition (ii) stated in the lemma is satisfied by the termination condition in Step (b) . 
D Lemma 5.15 For some k, there exist k disjoint chains C,,C p , ...,C, of tasks such that the following is true: Let C be given by T(j ) <• T(,j. ,) <• . . . <• T(J ±1 ), 1 iq^ -l^M.^- 1 - - L - L then (i) m(T(j. ))>l/2 for i-1,2, ...,k-l (5. 1*0 (ii) If F„ is such that F„ D C. = for i=l,2, ...,k, thenm(F i ) > 1/2. (5-15) Proof Choose an arbitrary task T(j-,-.) eF . Find a chain C. using the procedure described in the proof of Lemma 5.1^-. Let this chain be T ^i qi ) <' T ^l, qi -l) <' '"<' T (Jn)- Suppose T(j lqi ) eF £ . If m (F(j-, )) ^ l/2 or £ = 1, then stop. Otherwise, choose an arbitrary q l task T(,jp , ) eF . We then construct a new chain C according to the 1+8 procedure in the proof of Lemma J.lk. Let C be T(j_ ) <• T(j_ ,) d d.q_£ d, q.2~-L <• ... <• T(j 21 ). Again we check if m(T(j 2 )) < l/2 or T(j 2q ) eF^ and decide whether to construct C . This process is repeated until m (T(j kQ )) ^ l/2 or T(j k ) eF where T(j, ) is the maximum element % of the last chain obtained. At this point, k chains C.,C , . ..,C with respective lengths q , q_, . ..,q have been obtained. It is straightforward to verify that they satisfy condition (ii) by using Lemma 5.1k. Figure 5.1 illustrates the result of this process. D Lemma 5 ■ l6 Let {j., |.l gi gk, 1 ^h ^q. ] be the same as in Lemma -5.15. There exists an integer d (l ^ d g s) and a set of integers Ve (1,2, ...,k-l) such that (i) R d (TQ iq )) > 1/2 for i eV ■\ (ii) Z q i + q k s ieV k i=l (5.16) (5-17) Proof Define s sets of integers by V = ti|l S i-g k-1, H (T(j i )) > 1/2} K. — J-y £-y • • • y S • Since U V = {1,2, . . .,k-l}, we have k-1 E (_ E q ± ) s ^Z q i £=1 ieV i=l Therefore, there exists d such that k-1 Z ieV; % = I A % i=l k-9 T(j 12 ) T(j i;L ) m(F | ) > 1/2 i m(T)J llf )) > 1/2 Figure 5.1 Disjoint chains C-.CL, ...,C, , 12 ' k 50 Let V = V-,, we have thus d' 1 k_1 1 k leV i=l i=l We are now ready to prove Theorem 5»3« Let D G = U|F n ( u c.) = 0} z i-1 Obviously Z q. + G = u i=l X Since the total amount of resource available at any instant is s, we have k-1 w„ > Z W(T ) Z W(F ) + Z W(T(Ji q .)) i ^ teG i=l 1 - s k-1 Z m(F ) + Z m(T(j. )) > igG i=l ' * By virtue of (lk) and (15), we have W = \ |G| +|(k-l) Thus, 2soj n l |G| + k-1 (5-19) 51 Now, according to Lemma 5-l6, there exists a set V such that (5.16) and (5. IT) are true. Let us renumber the chains C.'s (ieV) as C' , C ',..., C£ where ^ = |v | . Let the length of C. 1 he q! and let its maximum task be T(t.). It follows from (5*17) that k i 1 k . E , «i + \ * s * % < 5 - 2 °) 1=1 1=1 Furthermore, no two tasks in the set (T(t.) |i=l,2, . . .,k , } can be executed simultaneously in any schedule since each of them demands more than l/2 unit of the d resource. In the optimal schedule, we can assume, without loss of generality, that T(t^) is done before T(t. , ) for 1 ^ i £ k, -1. Using the starting time of T(t ) as a reference point in time, the last task in chain C cannot be done in 1 less than ql + (i-l) time units. Therefore, u> > max(q k ,q^,q^+l,q^+2, ...,q^ +(^-1)) ^(q k ni + (^ + i) + ... + (Qi i+ (ki-i))) k-l S i=l k L ■ " i=l ^i z ,v?W^ (5 - 21) where (5-20) is used in the last step. (5»2l) can be written as k 1 s(k 1 +l)a) ^ I qi +isk(k-l) (5.22) i=l 52 From (5.l8), (5.19), and (5.22), it is straightforward to deduce that - ^sk-j^k -1) + s(k +l)u + 2soo ^ co +k - 1 ^ u> (5-23) (5.23) can be written as 2 2 - -s(k 1 -(to () +-)) + 3so3 Q + - s(w Q +-) * Cd (5.2*0 Now, since co , k are integers, |k..-(u) +— )| l l/2. 
Thus (5.2*0 implies ll 2 1 1 2 ^s(^) + 3sw Q +- s(w Q +-) ^w That is, 12 7 2 SW + 2 SW " W (5-25) This completes the proof of Theorem 5-3- Q (B) Proof of Theorem ^.k . We need only to exhibit a system with a W L 1 2 schedule L such that — > -=r soj~ - s . U)q 2 Let F = {T^,T^y |l ^ k ^ s, 1 g j g q, Ul ^j). The partial order <• is represented as a precedence graph in Figure 5-2. — > R is defined by: R(T (k) ) = (1,1,. ..,1) k = 1,2,. ..,s R(T^) = (0,0, ...,0,1- -^-pO, ...0) k = 1, 2, . . ., s; i = 1, 2, . . ., q 53 ,(2) ,(s) P <1> qq *T (s) 'qq Figure 5 .2 A "bad" partial order for the arbitrary list heuristics. 5^ RCT^) = (0,0, ..., 0,-^,0,... ,0) k = 1,2, ...,s; i = 1,2, ...,q; j = 1,2,. ..,1 where 6 is a small positive number. Let L = L^.-.Lg where, for k=l,2, ...,s, L^ = (T^,tJ?', t}^\t}^/ , ...,TV t^, . ..,T^). For this list L, only one task 21 ' 22 ' ' ql ' q2 ' Q.Q. is executed at any time. The total execution time is, therefore, equal to s(l+l+2+3+...+q). Thus, u> L = s(l+ g q(q+l)). On the other hand, the optimal schedule is generated by L. = (T^ 1 ' ) ,T^ 2 ' ) , ...,T^ S ' ) )L'L'...L' where, for each K, ' ' ■ 1 2 s _ , (k) (k) (k) (k) (k) (k) \" U ql 'q2 ' '"' qq ' q-1, 1' q-1,2' '"' i q-l,q-l' * ' v (k) (k) (k) 21 ' 22 ' 11 In this case co = q +s. Thus w L s(l + ^ q(q+l)) x 2 — = > - sw_ - s oo q+s 2 D (C) Proof of Theorem 5«5- The proof goes essentially the same as the proof for Theorem k-tl. The only difference is that, if HeG where k G = it |F, D ( U C. ) = 0}, then either m(F ) > l/2 or |f | = n. Let i=l X 55 Q = (l|ieG, m(F £ ) > l/2) and Q' = G -Q,. Then equation (5-l8) is modified to k 2 q. + |Q| + |Q' I = co (5.26) i=l while equations (5-19) and (5-22) are unchanged, 2soo Q > |Q| + k-1 i? |Q| (5-27) k 1 s(k 1 +l)w 1 Z q. + J sk^k^l) (5.28) i=l A new constraint arises from the fact that there are now only n k processors. Since there are at least Z q. + |q| + n | Q ' | tasks, we i=l ^ have nco > Z q. + |Q| + |Q'| (5.29) i=l k Eliminating |q|, |q' |, and Z q. from (5-26), (5-27), (5-28), and (5-29), i=l ^ L we obtain: W " ^~ \ sk 1 (k 1 -l)+(k 1 +l)sa) + 2sw ) + u) Q (5-30) The following equation can then be obtained from (5-3) in the same manner that (5»25) is obtained from (5.23). W "— ( 2 SW + ^ SW ) +W (5 ' 31) 56 5-7 Proof of Theorems ^.6 and 5-7 (A) Proof of Theorem ^.6 . We shall only prove equation (5*5) • The other inequality, equation (5.*0, can be similarly obtained (cf. the argument used in section 6 (C)). Consider a schedule generated by the level algorithm. Let u be the completion time and F , F , . . .,F be defined as in section 2. We introduce a function u defined as follows: u(!) = max (H(T(i))) for £=1,2,... ,w (5-32) T(i)eF £ where H is the level function defined in section 2. Lemma 5.17 u(l) 1 u(i+l) for £=1,2, . . .,oo-l Proof Suppose otherwise. Then there exists an I for which u(i) < u(l+l). Now, let T(j)eF be a task with H(T(j)) = u(£+l). Then H(T(j)) > H(T(i)) for all T(i)eF. Thus, T(j) must appear before all T(i)eF in the list since it is prepared by the level algorithm. Furthermore, there is no T.eF„ for which T. <• T.. Therefore, T. ' 1 I 1 j 3 should be done no later than any of the task in F . This contradicts the assumption that T.eF„ _, . □ y 3 i+1 Lemma 5.18 If U(i) - u(l+l), then W(F ) + W(F £+1 ) > 1. Proof Similar to the proof for Lemma 5 .17. D Now let A = (l|u(i) = u(£+l)}, then CO 2 Z W(F.) 
1 Z (W(F ) +W(F )) > |A| (5-33) i=l x £eA ' x L Thus, i=l 57 On the other hand, according to Lemma 5.17> we have: u(i) ^ u(l+l) + 1 for l£k, 1 ^ t £ u-1 Therefore, i? u(l) ^ u - |A| (5.35) CO, (5«3 j +) and (5-35) imply (2s+l)co ^ u. This proves the theorem. D (B) Proof of Theorem 5«7 » Consider a system of tasks with partial order <• as is shown in Figure 5 .3- For all U i £ s, 1 ^ k g q, let R(T.) = (1,1,... ,1), S(Tj) = (I(TJ) = (0,0,. ..,0) W =R 2 (T 2k ) = - = VV /o R d (T ik ) -0- if J^i The values of R. (T., ) are unspecified at this point. The list generated by the level algorithm is L = L, L ...L where L. = (T.T:T'.'T. n T. ...T. ) i=l, 2, ...,s. l x l l l ll i2 iq' ' ' ' It is easy to see that T. , and T.' , are done after the set of tasks J l+l l+l {T.,,T._, . . .,T. } is completed, and before the set of tasks il' i2' ' lq ' [T. , ,,T. ,. -, ...,T. ,_ } is started. Therefore, 1+1,1' 1+1,2' ' l+l, q ' co = s + sco (5.36) 58 T i T" 2 mil 2 T' T' _ s-1 mil S-1 m, s m" S m * m Si S2 Figure 5*3 A bad partial order for the level algorithm. 59 where to is the amount of time needed to execute all the tasks in [T nn , T n _, . . .,T n } scheduled according to the list L, . 11' 12' ' lq 1 Now the problem of scheduling [T , T , . ..,T } is identical to the bin-packing problem [10], [15] since R.(T,, ) = for all j ^ 1. In fact, we can regard R-, as the weight function, and the completion time as the number of boxes used in the bin-packing problem (cf . section 9) • It is known [10], [15] that we can choose the weight 17 function so that the number of boxes used is -r-k -e times greater than the optimal number needed. Thus, by properly choosing R , we have "i 2 ij s - e ' for u> » s 00 10 This proves Theorem 5.7. 6o 5.8 Proofs of Theorems 5.8 and 5-9 (A) Proof of Theorem 5»8 . We shall only prove equation (5.8). The other equation in Theorem 5 '8 can be established similarly. First we partition the set (1,2, . . .,.00} into three parts: A = {k|l ^ k ^ w, 3a task T(i)eF, such that K. m(T(i)) > 1/2} B = (k|l s k i to, W(F. ) s? 2/3} - A K. C '= (1,2, ,05) - A - B Next we define s subsets of tasks: E = (T(i)|R ,(T(i)) > 1/2} j=l,2,... ,s . J J s From the definitions of A and the E! s, it follows that E E. has at 3 j =1 3 least |a| elements. Thus, s 1 1 11 Z E. i? A . 3=1 J Let d, where 1 ^ d ^ s, be such that |E, | = max{|E. |}. Then 1 ■ j |E, | ^ — |A I . Since no two tasks in E, can be executed simultaneously in any schedule, we must have, o) Q > |E d | fei|A| (5.39) Two more inequalities follow easily from the definitions of A, B, C ^o =¥ ( |l A ' + f' Bl) (5A0) |a| + |b| + |c| = w (5.^1) 6l We need one more inequality. For the moment, let us assume that the following is true. Claim u 1 |C| (5A2) From the four equations (5*39), (5^0), (j.kl), and (5.14-2), we can eliminate |a|, |b|, |c| to obtain (J S + 1)U) Q i? 03 (5.14-3) which is the formula we want to prove. The only remaining work now is to prove the validity of the claim, equation (5-14-2). Lemma 5.19 Let ieC and T(i)eF , where £' > I. If for every i", i < i" < l\ there is no T(h)eF „ satisfying T(h) <• T(i), then there must exist a task T(j)eF such that T(j) <• T(i) . Proof Suppose otherwise. Then clearly there must be some task in F« that appears before T. in the list. We shall prove that this leads to contradictions . Case (i) : There is only one task T(k) in F. that appears before T(i) in the list. It is clear that m(T(k)) % 1/2 because icC. 
Furthermore, since the list is prepared by resource decreasing algorithm, we have m (T(i)) ^ W(T(i)) < W(T(k)) < l/2 . Therefore, T(i) could be done with T(k) . Since T(i) has priority over all tasks in F other than T(k), T(i) should be done with T(k) . This is a contradiction. 62 Case (ii) : There are at least two tasks T, , T', that appear before T(i) in the list. Again we have W(T(i)) ^ W(T(k)),W(T(i)) £W(T(k')) Thus, W(T(i)) ik |(W(T k ) +W(T k ,)) g| Z W(T(j)) ~ T(j)eF je W(T(i)) + Z W(T(j))*| Z W(T(j)) <| • |= 1 (5- MO where, in the last step, the fact that HeC is used. Thus, T(i) could he done with all the T(j)'s in F . This again contradicts the fact that T(i)eF . This proves the lemma. D Lemma 5.20 There is a chain T(j ) <• T(j j) <• . . . <• T(j,) such that, for any £eC, there is a task T(j, ) in the chain such that T(j, )eF» Proof This chain can easily be constructed "from bottom up" by making use of Lemma 5.18. D An immediate consequence of Lemma 5-19 is that. there is a chain of length greater or equal to |c|. Thus, oo > |c|, which proves the chain , equation (5.^-2). This completes the proof of Theorem 5-8. D (B) Proof of Theorem 5. 9 - We define a system of tasks as follows: (i) F = {T^T^T^T^Il 5 i 3= s, 1 £ j g q) (ii) The partial order is defined by (see Figure ^ .h) : T. <• T.' <• T. <• T! 1 ^ i < j ^ s i i 3 J T. <* T. ., t: <• T.' . Vi, 3 63 rp t ml m t sl s2 sq Figure 5.4 A "bad" partial order for the resource decreasing algorithm. (iii) R(T ± ) = (3e,3e, ...,3e) R(T') = (0, ...,0,|-€+8,0, ...,0) l(T id ) = (0, ...,0,|+ € ,0,...,0) S(^) = (o, ...,o,|-€,o,...,o) with q an even number, and < 5 < e < l/8. 1 -k i ^ s 1 £ i £ S, U j Gk Let L be the list generated "by the resource decreasing algorithm. Then L = L-,L ...L (T'T' . . .T" )L'L' . . .L' (T n T_. . .T ) where 12 s v 1 2 s y 1 2 s 1 2 s' L. = (T...,T. , ...,T.„) and LJ = (T' T' . . ., T.' ). For this list, l il' i2' ' i2 y i il j.2.' ' iq/ ' w = s(^ + 2) as can be seen from Figure 5*5 • r ?q M T l T ll T 12 ■ T iq T i T' 11 T ia T' 13 T i^ • T 2 T X 21 Figure 5 '5 The schedule generated by the resource decreasing algorithm for the system in Figure ^.k. 65 On the other hand, by choosing a proper list, the tasks can all he completed in time co = 2s + q (see Figure 5*6). T l *■ ; T s s T 11 ml 11 T 21 mi 21 si T' si T 12 ml 12 T 22 mi 22 T s2 mi s2 # • • ■ I • lq ml lq T 2q mi 2q T sq T' sq Figure 5*6 An optimal schedule for the system in Figure ^.k. As q -*■ oo, — s(|q+2) 2s+q 3 > ^ s - e for any fixed e > 0. D 66 5-9 The Mult i -Weight Packing Problem The proofs for the theorems in section 5 are most conveniently described in the language of a packing problem. In this section we shall define the packing problem and formulate the theorems in section 5 in this new context. The proofs of these theorems are given in Appendix B. Stronger results are now known [12]. (A) The Problem . Suppose we have an unlimited supply of boxes B-.,B p , ... and we want to pack objects ,0 , ...,0 into these boxes. -> Associated with each object 0. is an s-dimensional weight vector a. J J - J whose components are real numbers between and 1. A box B. can hold objects (0- , 0^ , . . ., 0j_ } if each component of the sum vector -> -> -> a. + b.a + . . . + a . is no greater than 1. The problem is to pack l! i 2 x p a given set of objects into as few boxes as possible. When s = 1, this becomes the bin-packing problem which is well-studied in the literature [71. (B) First-fit Algorithm . 
In the above problem, any list L = (O_{i1}, O_{i2}, ..., O_{ir}), where (i1, i2, ..., ir) is a permutation of (1, 2, ..., r), generates a packing scheme as follows:

Step (1): Take a sequence of boxes B_1, B_2, ... .
Step (2): Set z(B_k) = 0 for all k.
Step (3): For j = 1, 2, ..., r, do the following: Find the least k such that each component of the vector z(B_k) + a_{ij} is less than or equal to 1. Set z(B_k) <- z(B_k) + a_{ij}. We say that we have "packed" O_{ij} into box B_k.

We shall use N_FF(L) to denote the number of non-empty boxes after the above procedure. N_0 is defined to be the minimum of N_FF(L) over all lists L. When s = 1, the first-fit algorithm defined above agrees with the usage of this term in the bin-packing problem.

(C) Connection Between Scheduling and Packing. Consider the scheduling of tasks (T(1), T(2), ..., T(r)) with empty partial order <•, resource demands R(T(j)), and number of processors n >= r. Let us turn it into a multi-weight packing problem via the following correspondence:

    T(j) : O_j
    R(T(j)) : a_j
    completion time : number of boxes used        (5.45)
    Lists: L' = (T(i1), ..., T(ir)) : L = (O_{i1}, ..., O_{ir})

Given any list L' = (T(i1), T(i2), ..., T(ir)) we can generate a list schedule for the scheduling problem. This process can be translated via (5.45) into an algorithm for generating a packing scheme based on the list L = (O_{i1}, O_{i2}, ..., O_{ir}). If we compare this algorithm with the first-fit algorithm, we see that the two algorithms are not the same. However, it is not too difficult to see that the resulting packing schemes are the same. As a result, ω_{L'} = N_FF(L) and ω_0 = N_0. This allows us to state the theorems in section 5 in the following form.

Theorem 5.11  N_FF(L)/N_0 ≤ s + 17/20 + ε for large N_0.

Theorem 5.12  There exist situations where N_FF(L)/N_0 > s + 7/10 - ε with arbitrarily large N_0.

Theorem 5.14  For s = 2, N_FF(L)/N_0 ≤ 41/15 + ε for large N_0.

A slightly modified version of the packing problem is needed for Theorem 5.13.

5.10 Generalizations to Non-Uniform Task Lengths

When task lengths are allowed to be non-uniform, generalizations of our results take the following form. Let u_0 be the length of the shortest task and u_1 the length of the longest task. Then, for any list L, the bound on ω_L/ω_0 has the same form as in the unit-length case, with the ratio u_1/u_0 entering as an additional factor. Furthermore, let us "pretend" that every task is of length u_1 and construct a schedule according to the level algorithm with every task taking u_1 amount of time. This schedule can serve as a schedule for the actual problem, since each task really needs no more than u_1 amount of time. For this schedule, ω/ω_0 ≤ (2s+1) u_1/u_0. In general, this schedule cannot be generated by a list.

5.11 Conclusions

We have considered several heuristic list scheduling algorithms. The bounds obtained on ω_L/ω_0 enable us to draw certain conclusions. For example, we learned that a little extra work in preparing the list guarantees much better efficiency of the multiprocessing system than an arbitrary list does. We have also seen that these algorithms do not work as well as they do in the absence of "resource" constraints. It would be interesting to find other simple algorithms that make more efficient use of the parallel processing facilities. Some of the results presented here have been improved; the improvements will appear in a paper by Garey, Graham, Johnson, and Yao [12].
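To make the list-scheduling model analyzed in this chapter concrete, the following is a minimal sketch, not part of the original text, of how a priority list is turned into a schedule for unit-time tasks under s resource constraints. The names (list_schedule, prec, R, order) are purely illustrative, and the sketch assumes the number of processors is not a binding constraint (n ≥ r).

```python
# Minimal sketch of list scheduling with unit-time tasks and s resources.
# At each time unit, scan the list in order and start every task whose
# predecessors (under <.) have finished and whose demand still fits.
from typing import Dict, List, Sequence, Set, Tuple

def list_schedule(prec: Set[Tuple[str, str]],        # (a, b) means a <. b
                  R: Dict[str, Sequence[float]],     # demand vectors in [0,1]^s
                  order: Sequence[str]) -> List[List[str]]:
    """Return the time slots F_1, F_2, ...; the schedule length is len(result)."""
    s = len(next(iter(R.values())))
    done: Set[str] = set()
    remaining: List[str] = list(order)
    slots: List[List[str]] = []
    while remaining:
        used = [0.0] * s                      # resources committed in this slot
        slot: List[str] = []
        for t in list(remaining):
            ready = all(a in done for (a, b) in prec if b == t)
            fits = all(used[d] + R[t][d] <= 1.0 for d in range(s))
            if ready and fits:
                slot.append(t)
                used = [used[d] + R[t][d] for d in range(s)]
        if not slot:
            raise ValueError("no executable task; the partial order must be acyclic")
        for t in slot:
            remaining.remove(t)
        done.update(slot)
        slots.append(slot)
    return slots

# omega_L = len(list_schedule(prec, R, order)); comparing it with the optimum
# omega_0 over all schedules gives the ratios bounded in this chapter.
```

With an empty partial order, the contents of slot k coincide with the contents of box B_k under the packing correspondence (5.45) of section 5.9, which is why ω_{L'} = N_FF(L) there.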
LIST OF REFERENCES

[1] Aho, A. V., J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Mass., 1974.

[2] Berge, C. and A. Ghouila-Houri, Programming, Games and Transportation Networks, Wiley, 1965, p. 179.

[3] Blum, M., R. W. Floyd, V. R. Pratt, R. L. Rivest, and R. E. Tarjan, "Time Bounds for Selection," Journal of Computer and System Sciences 7:4, 1973, pp. 448-461.

[4] Chen, N. F. and C. L. Liu, "Scheduling Algorithms for Multiprocessing Computing Systems," Proceedings of the 1974 Sagamore Conference on Parallel Processing, August 1974.

[5] Chen, N. F. and C. L. Liu, "Bounds on the Critical Path Scheduling Algorithm on a Multiprocessor Computing System," to appear.

[6] Coffman, E. G. and R. L. Graham, "Optimal Scheduling for Two-Processor Systems," Acta Informatica 1:3, 1972, pp. 200-213.

[7] Drysdale, III, R. L. and F. H. Young, "Improved Divide/Sort/Merge Sorting Networks," Knox College preprint, 1973.

[8] Fujii, M., T. Kasami, and K. Ninomiya, "Optimal Sequencing of Two Equivalent Processors," SIAM Journal of Applied Mathematics 17:3, 1969, pp. 784-789.

[9] Garey, M. R. and R. L. Graham, "Bounds on Scheduling with Limited Resources," 4th Symposium on Operating System Principles, October 15-17, 1973.

[10] Garey, M. R., R. L. Graham, and J. D. Ullman, "Worst-Case Analysis of Memory Allocation Algorithms," Conference Record of the ACM Symposium on Theory of Computing, 1972.

[11] Gear, C. W., private communication.

[12] Garey, M. R., R. L. Graham, D. S. Johnson, and A. C. Yao, to appear.

[13] Graham, R. L., "Bounds on Multiprocessing Anomalies and Related Packing Algorithms," AFIPS Conference Proceedings 40, 1972, pp. 205-217.

[14] Green, M. W., "Some Improvements in Non-Adaptive Sorting Algorithms," Proceedings of the 6th Annual Princeton Conference on Information Sciences and Systems, 1972, pp. 387-391.

[15] Johnson, D. S., A. Demers, J. D. Ullman, M. R. Garey, and R. L. Graham, "Worst-Case Performance Bounds for Simple One-Dimensional Packing Algorithms," SIAM Journal on Computing 3, 1974, pp. 299-326.

[16] Hu, T. C., "Parallel Sequencing and Assembly Line Problems," Operations Research 9:6, 1961, pp. 841-848.

[17] Knuth, D. E., The Art of Computer Programming, Volume 3: Sorting and Searching, Addison-Wesley, 1973.

[18] Knuth, D. E., The Art of Computer Programming, Volume 3: Sorting and Searching, Addison-Wesley, 1973, pp. 234-235.

[19] Liu, Jane W. S. and C. L. Liu, "Bounds on Scheduling Algorithms for Heterogeneous Computing Systems," Proceedings of the 1974 IFIP Congress, August 1974.

[20] Rabin, M. O., "Solving Linear Equations by Means of Scalar Products," in Complexity of Computer Computations, edited by R. E. Miller and J. W. Thatcher, Plenum Press, 1972.

[21] Aho, A. V., J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Mass., 1974, Section 4.6.

[22] Shafarevich, I. R., Basic Algebraic Geometry, Springer-Verlag, 1974.

[23] Tarjan, R. E., unpublished.

[24] Ullman, J. D., "Polynomial Complete Scheduling Problems," 4th Symposium on Operating System Principles, October 15-17, 1973.

[25] Van Voorhis, D. C., "Toward a Lower Bound for Sorting Networks," in Complexity of Computer Computations, edited by R. E. Miller and J. W. Thatcher, Plenum Press, 1972.

[26] Yao, F. F., private communication.

[27] Yao, A. C., "Bounds on Selection Networks," Proceedings of the 13th SWAT Conference, pp. 110-116.

[28] Yao, A. C., "On Computing the Minima of Quadratic Forms," to appear in the Proceedings of the 1975 SIGACT Conference.

[29] Yao, A.
C, "An 0(|e|%%|v|) Algorithm for Finding Minimum Spanning Trees, " submitted to Information Processing Letters . [30] Yao, A. C, "On Scheduling Unit-Time Tasks with Limited Resources," Proceedings of the 197*+ Sagamore Conference on Parallel Processing , August 1974. 71 APPENDIX A Proof of Theorem 2.k Let f(t,N) be a' function (to be defined later) that satisfies f(t,N) St. We shall construct a family of networks E(t, N) called (t, N) -eliminators with the following property: Of the N output lines of E(t, N) there are f(t, N) designated lines among which the smallest t elements are found for any permutation of the inputs. According to Alexeyev's upper bound (2.2), there exists a (t, f(t, N) )- selector F (dependent on t and N) that contains (f(t,N)-t)(l + 2S l t ' ) < 2f% ' t * 1 ) ] f(t,N) comparators. We can append this network F to the (t, N) -eliminator E(t, N) by making the f(t, N) designated output lines of E(t, N) the inputs to the network F. Figure A.l shows such an arrangement. Clearly this gives us a (t, N)- selector . If g(t,N) is the number of comparators contained in E(t, N), then the total number of comparators in the (t, N) -selector of Figure A.l is bounded from above by g (t,N) ^2\y^- L \ f(t,H). We have proved Lemma A.l U(t,N) ^ g(t,N) +2\ by-^p-} f(t,N) (Al) 72 E(t,N) Figure A.l Construction of a (t, N) -selector. 73 We shall now define f(t, N), E(t, N), and derive upper bounds for g(t, N). They can then be substituted into (Al) to prove Theorem 2.k. Network E(t,N) E(t,N) and f(t,N) are derived inductively as follows: (a) E(1,N): f(l,N) = 1 » i « 1 < . ______ < 1 4 1 1 ii E(2,N) f(2,N) = 2 i p f f — _. 1— « I— « — i it , — < (b) E(t,N) for t 1 3 (i) if t s N, E(t, N) contains no comparators and f(t, N) = N. (ii) if t < N, E(t, N) is given by 7^ E(t,fN/2p E(lt/2J,lN/2J) and f(t,N) = f(tjN/2]) +f([t/2j,[N/2j) (A2) From this construction, we have r ;(t,N) ( V = if tin = N-l if t = 1 = 2N-h if t = 2 = g(t, !"N/2]+g([t/2j,[N/2j)+LN/2j if 3 *t ^N (A3) From (A2) and (A3), the following lemma can be proved by induction. Lemma A. 2 g(t,N) < f (t+l) ]N f(t,N) ^ 2 \ ° / [Ml + . . .+ W 75 Substituting (A^) into (Al), we obtain 2 \H^f] / [%N] N U(t,N) < [%(t+l)]N+^[%^-l Z k=0 (A5) T Ml] / [Ml Now, if t this problem is equivalent to a general bin-packing problem which differs from the conventional bin- packing problem only in that the weights associated with objects are now s -dimensional vectors instead of simple numbers. For a box to hold a set of objects, each component of the sum of the weight vectors must be equal to or less than 1. A first-fit algorithm can be naturally defined. It is possible to prove (see remarks in section 5-9 of the paper) that Theorem 5«H can be formulated as: V L) 17 , , — ^ s + ■— + e for N large enough (B2) We shall prove (B2) . Consider the s-dimensional cube {(y-ijy > • • *?y )|0 - y- - !)• We divide it into s+2 regions: J- d s j Definition Bl r A = {( yi ,y 2 , ...,y g )|0 g j ± g l/2 Vi] CjSCCy^yg, ...,y g )|l/2 < y. g 1, =g y± g 1/2 Vi 4 d), i ^ d ^ s s D = s dimensional cube - A - U C . 77 -*■ n-*n s Definition B2 For y = (y^y^ • • .,y ), let ||y|| = E y . i=l Definition B3 Let y = (y^ . ..,y g ), y* = (y^...,yp. y and y' are incompatible if max(y.+y! } > 1. ill Now consider the packing scheme generated under first- fit algorithm by a list L. For each non-empty box B, let z(b) = a,- + st H + ... + it,- with the a% 's being the weight vectors associated with objects Jp 3t, held in B. Lemma B^ z(B. ) and z(B.) are incompatible if B.,B. 
are distinct non- empty boxes. Proof Property of first- fit algorithm. D Lemma B5 There is at most one non-empty box B. with z(B.)eA. Proof If there is another B, with "z(B, )eA, then the components of z(B.) +"z(B, ) are all no greater than 1 by Definition Bl. This contradicts Lemma h. D The following lemma will be proved later. Lemma B6 Let 1 g M s, and y\ = (y^, y , . . . , y ) eC . for i=l,2, . . ., I. If y. and y. are incompatible for all i f j, then I Z ||y || * J - 1/2 (B3) i=l I Lemma B7 If z(B..)eC. for i=l,2, . . ., I, then L ||z(B..)ll ^ * - 1 / 2 - Proof From Lemma k and Lemma B6. □ Lemma B8 If there are M distinct boxes non-empty B.-, ,B._, . . .,B. ^ J jl' j2' jn such that "z(B..)eC n V i> then m/n^ < v5 + e for large N_. Jl 1 v ' ' 10 Proof This is essentially a situation for the conventional bin-packing problem. The lemma is a consequence of the Garey-Graham-Ullman bound. D 78 We are now ready to prove (B2) . Let there be M. boxes {B.} with z(B.)eC, and Q, boxes {B.} with z(B.)eD. According to Lemma B5, there is at most one box B- with z(B.)eA. Thus, Z M. + Q + 1 ^ N FF (L) i=l (B4) Without loss of generality, assume JML ^ M c ^ . . . fe M . We -L £_ S partition the set of boxes (3 = (B.|z(B.)eC. for some i) into s disjoint categories : /" ^ category s: It has M groups of boxes. Each group has s s boxes B?t ,B n --., . . .,Bj with B,-.eC. 3=1,2, . ..,s category s-1: It has M -M groups of boxes. Each group has s-I boxes B± -.,B± , . . .,B-j_ with B ± .eC. j=l,2, ..,, s-1 category k: It has M, -M, groups of boxes. Each group has k boxes B-j-^Bi* , . .... B-?, with B-; .eC. j=l,2, ...,k category 1: It has M, -M groups of boxes. Each group has 1 box B n - . with R? n eC. (B5) According to Lemma B7, for a group of boxes {Bj_ , B^ , . . .,Bj_ ,} in category k, we have, k E ||z(Bi )|| a-l/2 (b6) a=i Moreover, "z(B) > 1 if z(B)eD (B7) 79 Therefore, from (b6), (B7), and (B5), we have: s-1 .: ||z(B d )|| 1 M g -(s -1/2) + Z (M k -M k+1 )(k-l/2) + Q- 1 all boxes B.... . k=l j from boxes with "z(B.)eC. from boxes with s = Q. + E M. + l/2 M_ (B8) k=2 * ^ From (B4, (b8), we have -M-i -1 i E \\z(B^)\\ > (N FF (L)-1 X ) + ± M ] _ = N FF (L) -1 -| ^ all boxes B. Hence, N Q a | Z ^(B^H 1 |(N ?F (L) -1 -| M^ (B9) 1 1 N FF (L) ■ + ^ 1 + i^ iJ t" (B10) Making use of Lemma B8, we have 17 V (L) s + il + e s _££ as N Q ■+ » . (Bll) This is just (B2) . Thus, Theorem 5-1 will be proved if we can show that Lemma B6 if true. Proof of Lemma b6 We shall first show that Lemma b6 holds for the case i. = s, i.e., s £ ||y, || * S - l/2 (B12) i=l X 8o We shall prove (B12) by induction: (i) For s = 1, (B12) is obviously true, since y.eC. . (ii) Suppose (B12) is true for s = k-1, we shall prove it is also true for s = k. Since y\ and. y\ are incompatible, either y lk + y kk > 1 — y ll + y kl > 1 " This is SO because y~ij + y^j - 1 for all j f l,k. Without loss of generality, assume y,, + y,, > 1. (B13) Now consider the projection of vectors y-,,y , . . .,,y in the space formed by the first k-1 components. Let us call these projections y^, ...j^. For i ± j (llTi,j ^ k-1), y[ and y' are obviously incompatible. By the induction hypothesis, we must have: k-1 2 lly.'ll > (k-1) - 1/2 (B1*0 i=l ± Therefore, from (B13), (Blk) , we have: k k-1 Z 'M = y ik + y kk + Z H y i'l " 1 + (k_l) " 1//2 - k " X / 2 i=l '" i=l This completes the induction proof for (B12) . To prove Lemma B6 for general I, we need only observe that, any two vectors in the set {y? = (y. ,y. , . . .,y. ) |i=l,2, . . ., £} are incompatible in the i! 
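The counting that follows leans on the first-fit rule of section 5.9 and on the region classification just defined. As a concrete illustration, not part of the original text and with purely illustrative names, the following sketch packs weight vectors by first fit, classifies a box load z(B) into the regions A, C_d, D of Definition B1, and tests the incompatibility of Definition B3.

```python
# Minimal sketch of the s-dimensional first-fit rule and of the region tests
# used in this appendix; names are illustrative only.
from typing import List, Sequence

def first_fit_pack(weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pack the objects in list order; return the load vectors z(B) of the
    non-empty boxes, so the number of boxes used is N_FF(L) = len(result)."""
    loads: List[List[float]] = []
    for a in weights:
        for z in loads:                       # scan boxes in order of creation
            if all(z[d] + a[d] <= 1.0 for d in range(len(a))):
                for d in range(len(a)):       # first box that still fits
                    z[d] += a[d]
                break
        else:
            loads.append(list(a))             # no box fits: open a new one
    return loads

def region(y: Sequence[float]) -> str:
    """Definition B1: 'A' if every component is <= 1/2, 'C_d' if exactly the
    d-th component exceeds 1/2, and 'D' otherwise."""
    big = [d for d, v in enumerate(y) if v > 0.5]
    if not big:
        return "A"
    return f"C_{big[0] + 1}" if len(big) == 1 else "D"

def incompatible(y: Sequence[float], yp: Sequence[float]) -> bool:
    """Definition B3: y and y' are incompatible if some component of y + y'
    exceeds 1."""
    return any(u + v > 1.0 for u, v in zip(y, yp))
```

Lemma B4 states that the loads of any two distinct non-empty boxes produced this way are incompatible; Lemma B5 and the category partition above then control how many loads can fall in A, in each C_j, and in D.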
-dimensional cube. Thus, according to (B12, we i ^ must have Z ||y'.'|| ^ i - l/2. i=l x , Therefore, Z ||y. || i? Z ||y'.'|| i £ - l/2. 1=1 X i=l X This completes the proof of Lemma B6, and Theorem 5.11. □ 8l Proof of Theorem 5.12 To prove: d systems such that — > s +-7- -e N FF (L) N and N„ is large. (A) For s = 1, this theorem is true by the Garey-Graham-Ullman theorem. Therefore, for any number a, there exist two lists of real numbers L ll = (VV'"'V L 10 = ^±±^±2' '"' X± q ) Wh6re ° ~ X ± ~ lf and ^l'V "•»V is a permutation of (1,2, ...,q) such that: (i) a Q > a (ii) a/a > g - § where a Q = N^L^), a = H^L^) (B15) (B) Consider the function p(x) - C (tJ - |) x - (s-l)]/(x+l) (Bl6) Let a' be such that p(x) > ii - e yx^a' (B17) (C) We shall now construct an example to prove Theorem 5-2. Consider a set of objects (0 ,0.,,0! . |l ^ i £ q, 2 ^ j ^ s, ±1 JK JK 1 ^ k ^ a }, where a and q are defined in (A) . Let the s-dimensional weight vectors be defined as follows: a(0 u ) = {\ ± , 8, 6, 8, 5) UiM a(0 ) = (0,0,... ,0.1-6, 0,0, ...0) 3-1 a(0! ) = (26,26, .. .,26, 6,26,26, .. .,26) 2 I j g s, 1 ^ k < a, where 5 = 4sq 3-1 1 82 Let L = L Q L'L^L'...L L'(O nn O no ...O. ) 2 2 3 3 s s^ 11 12 lq/ L n = . (0. . n . ...0_. )L L, ...L L'...L' li. li„ li ' 2 3 s 2 s 12 q. where L. = (0 .,0... . .0. ) 3 jl j2 ja Q ' L! - (0' 0* ..0'. ) 2 ^ J ^ s J Jl J2 ja^ As before, let EL be the minimum of boxes used, then: N FF (L) = (s-l)a Q + a N = W = a + 1 It follows that V L > (*-^y» , a-(s-D „ , (j-|)y^) _, = . = s - 1 + — x - ' > s - 1 + = N Q a Q+ l a Q+ l a 0+ l (B18) where (B15) is used in the last step. Let a Q l a l a' -> 00, using (B17), V L > -, , 17 7 This proves Theorem 5-12. □ Proof of Theorem ^.lk When s = 2, V L) kl + e as 1\L -> co N Q - 15 Proof We shall continue to use the notations defined earlier. 83 regions r v. First divide the region ((y-,,yp) |0 ^ y ,y g 1) into k A = t(y 1 ,y 2 )-|0 *Y ± * 1/2, 1=1,2} c x = {(y^yg) 1 1/2 < y x s 1, =g y £ g 1/2} c 2 = {(y^yg) |o ^ y x =g 1/2, 1/2 < y 2 g 1} D = {(y lf y;r>)\l/2 < y^y 2 ^ 1} (B19) classes: Let us partition the set of all non-empty boxes into 6 disjoint fG Q = {B. |*(Bj)eA] G. = (B. |"z(B.)€C. , B. contains only one object] i=l,2 G.' = {B. I z(B.)eC. , B. contains at least two objects] 1=1.2 ^G 3 = [B |t(B.)eD] (B20) Definition B9 g n 1 |G n |, g. = |0 ± |, g£ = |Gj| 1=1,2, and §3 = | G^ | - >0 ' Lemma BIO ||z(B.)|| > 1 if B.eG, (B2l) J o J Proof Trivial. D The following three lemmas follow easily from the definition of first- fit algorithm. Lemma Bll If B.eG U G', B^eGg U G', then |z(B )|| + HzCb^H § 3/2 (B22) Proof A special case of Lemma B7- D Lemma B12 For each i (i=l,2), for all but one of the boxes B.€G.', [z(B ,)] > 2/3 (B23) J i 81* Proof Let i = 1. Suppose the lemma is false. Then there are two boxes B.,R eG , j < k, such that [z(B.)] ^ 2/3, [z"^)]-, £ 2/3. Then B, contains an object with weight a such that ("a\) ^ 1/3, (a^) ^ l/2- This means CL should be put into box B.. This is a contradiction. D Lemma B13 Let Bj^B^, . . .,Bj eG£ and \ ,3^, . . .,\£G^ be 21 distinct non-empty boxes. Then, for at least 1-2. values of t, lz(B dt )|| + ||z(B kt )|| * 5/3 (B24) Proof It is obvious that either [z(B-? ) +z(Bv,)] > 1 or J"fc t -1 [z(B-j.) +z(B^ )] > 1. Assume that the former is true. Now, if B^, satisfies (B23), then [z(B 1 ) + z(B k )] > 2/3 -> ||z(B, ) + z(\ ) ||^l+2/3=5/3 D Lemma Bl^t- This is at most one box in G„, i.e., g~ Proof A special case of Lemma B5- < 1. □ Without loss of generality, assume g + g' ^ g + g'. 
There are three cases (i) g-L > g 2 + gg (ii) g 2 + g^ > g 1 and g 1 s g g (iii) g 2 + gg > g 1 and g 2 > g ± Case (i) g_ L ^ g g + g 2 As shown in Figure (a), to each box B.€G p U G', associate a B k(j) eG r Gl B, ,.v D D D D D fk(j) T T t T t b. a a d a a 3 Gg Gg i D Gl G3 G a a d d / eq. (B23) l*fcj)||*2/3 D Dl D l?(B,)lf + P(^ (J) )IIWA! eq. (B22) l|z(B.)||>l eq. (B21) |z(Bj)||>l/2 eq. (B20) 85 + £ l|z(B.) Z ||*(B,)||* Z ||z(B.)||+ir?(B. M .)„ all Bj J B.eG 2 UG 2 J * U; B eG^-G^ J + Z ||1(B) || + Z ||?(B.) B.eGJ J B.eG 3 J J 1 J 3 3 1 2 * ^(g 2 +g 2 ) + 2 (g 1 -g 2 -g 2 ) + J^" 1 ) + gj = 2 g x + J g{ + (g 2 + g 2 + g 5 ) - 2/3 (B25) 86 Thus, N Q ^ \ Z ||z(B )|| ^ \ g x + |(g 2 +g^+g ) + | g{ - l/3 all B (B26) On the other hand, as in the proof for Theorem 5-1* N Q * (g 1+ g{) ^ - € (B27) 1 + g l + g 2 + S l + g 2 + g 3 - N FF^ L ^ (B28) Furthermore, it is not difficult to prove that no two objects in G-, U G p can be put in the same box, thus, N = g l + g 2 (B29) Multiplying both sides of (-B26) by 2, (B27) by ^ , (B28) by 1, (B29) by 1 kl %f( L ) 7- and add them. We obtain, — + e ^ — = as N_ -> 00 . ' 15 W Q Case (ii) g g + g 2 > g ± and g ± > g g B. Gi Gi , -— ^ "] N a a a a Di d di v 1 •I G* G, Gr & D D a «b)||4*(^ (j) )II*3/2 / i|t( V IM/3 eq. (B22) eq. (B23) 2(B 3 ) || + B*(^ (J) )II» 5/3 eq. (B21+) eq. (B21) 8 7 S ||z(B d )||g| g 1 + |(g 2 +g^- gl -2) + ^(gx+gi-gg-gg- 1 ) + g 3 , L 2 . 12 = 2 S l + g 2 + g 2 + 3 g l + g 3 ■ T 1 P Therefore, 2N Q § Z ||z(B )|| ^ ^ &i_ + g 2 + g 2 + 3 g l + S 3 ' k ^ B3 °^ B D Note that (B30) is almost exactly the same as (B26) . Since (B27), (B28), and (B29) are still valid, we can do the same as in case (i). This leads to hi V L) 15 + € - -Bq- as N - °° Case (iii) g g + gg > g ± and g g > g x Gl G i D D D D DJD D □ D D D DID D v_ G 2 ! G ' i/ °3 G o a Id d d , — \ D |t(Bj) 11+1^(^)11 ^3/2 / 1^^)11,2/3 ||z(B.)|| + ||z(B k( . ) )||^5/3 |^(B.||^1 2N Q * | g 2 + |(g^-2) + jCg^g^-gg-g^-l) + g 3 i.e 2N > j(g 1 +g-[) + | g 2 + g^ + g^ - ^ (B3D 88 Wow, from (B27), (B28), (B29) o - 17^1 & 1 ; - e (B32) r + g l + S 2 + g l + g 2 + g 3 - N FF^ L ^ (B33) N = g l + g 2 (B3M Multiply both sides of (B32) by =£ , (B33) by 1, and (B3 1 *-) by i , then add them up. We obtain, ' 2 + 55 + l> H o * I g i - (5 + i e > + V L) ••• l| M * V L) " (5 + 35 £) J4.I %F / \ Hence, — + e ^ -r=— as N Q -*■ 00 . This proves the theorem for case (iii) This completes the proof of Theorem ^.lk. □ 8 9 VITA Andrew Chi-Chih Yao was born in Shanghai, China, on December 2k, 19*4-6. He received his B.S. in Physics from National Taiwan University in 1967^ a *i A.M. from Harvard University in 19&9* and a Ph.D. in Physics from Harvard University in 1972. Since September 1973> he has been studying Computer Science at the University of Illinois. He served in the Chinese Air Force from July 1967 to June 1968. During his graduate study at Harvard, he visited the C.N.R.S. in Marseille, France, for one semester in 1971* From July 1972 to May 1973> he was a research associate in the Physics Department of UC Santa Barbara. He is a member of IEEE and American Mathematical Society. 3IBLI0GRAPHIC DATA iHEET 1. Report No. UIUCDCS-R-75-716 3. Recipient's Accession No. . Title and Subtitle A Study of 'Concrete Computational Complexity 5. Report Date May 1975 . Author(s) Andrew Chi -Chin Yao 8. Performing Organization Rept. No. Performing Organization Name and Address Department of Computer Science University of Illinois at Urbana- Champaign Urbana, Illinois 6l801 10. Project/Task/Work Unit No. 11. 
Contract/Grant No.: NSF GJ-41538. 12. Sponsoring Organization Name and Address: National Science Foundation, Washington, D.C. 16. Abstract: In this thesis, the computational complexities of four problems are studied. In Chapter 2, the problem of selecting the t smallest numbers from a set of n by means of networks is studied; the complexity is measured both in terms of the number of comparators used and the delay time of the network. In Chapter 3 it is shown that O(n^2) functional evaluations are needed to locate the minima of quadratic forms in n variables. Chapter 4 presents an O(|E| log log |V|) algorithm for finding a minimum spanning tree; asymptotically, this improves over the best algorithm previously known for sparse graphs. In the last chapter, the performance of several heuristic algorithms for a multiprocessing model (the limited-resource Garey-Graham model with unit-time-task constraints) is analyzed; in particular, it is shown that the worst-case behavior of the critical path algorithm is much better than that of an arbitrary list schedule.