UIUCDCS-R-74-651

BOUNDS FOR SELECTION

by

Laurent Hyafil

June 1974

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by NSF Grant GJ-41538 and in part by IRIA (IRIA-Laboria, Domaine de Voluceau, 78150 Rocquencourt, France).

ABSTRACT

In this paper we show that the minimum number of comparisons necessary for the computation of the $k$th element of a totally ordered set of size $n$, $V_k(n)$, is lower bounded by $n-k+(k-1)\lceil \log_2 \frac{n}{k-1} \rceil$. For $3 < k <$ [illegible], this bound improves the best lower bound presently known. A new algorithm, which yields an upper bound that is better than the currently known bound for a large range of values of $n$, is also presented.

1. Introduction

The selection problem is to determine the $k$th element of a totally ordered set $P$ of size $n$. Two efficient algorithms for solving this problem are presently known. When $k$ is small with respect to $n$, $3 < k <$ [illegible], A. Hadian and M. Sobel's algorithm ([3]), which needs at most $n-k+(k-1)\lceil \lg(n-k+2) \rceil$ comparisons, is adequate. Another method, using at most $5.73\,n$ comparisons, was discovered by M. Blum et al. ([1]). This method is more efficient than Hadian and Sobel's method for $k > 4n/\lg n$.
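As a rough numerical illustration, the lower bound stated in the abstract and the Hadian and Sobel upper bound quoted above can be evaluated side by side. The following Python sketch does so with exact integer arithmetic. The function names (`ceil_lg`, `lower_bound`, `hadian_sobel_upper`) are illustrative and do not come from the report, and the code assumes $2 \le k \le n$.

```python
def ceil_lg(num, den=1):
    """Return ceil(log2(num / den)) for integers num >= den >= 1."""
    e = 0
    while (den << e) < num:   # smallest e with den * 2**e >= num
        e += 1
    return e


def lower_bound(n, k):
    """Lower bound of the abstract: n - k + (k-1) * ceil(lg(n / (k-1)))."""
    return n - k + (k - 1) * ceil_lg(n, k - 1)


def hadian_sobel_upper(n, k):
    """Hadian and Sobel's bound: n - k + (k-1) * ceil(lg(n - k + 2))."""
    return n - k + (k - 1) * ceil_lg(n - k + 2)


if __name__ == "__main__":
    for n, k in [(1024, 2), (1024, 5), (1000, 10)]:
        lo, hi = lower_bound(n, k), hadian_sobel_upper(n, k)
        print(f"n={n} k={k}: {lo} <= V_k(n) <= {hi} (gap {hi - lo})")
```

For $k = 2$ the two expressions coincide at $n-2+\lceil \lg n \rceil$, and for $n = 1024$, $k = 5$ they differ by 8.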
Let $V_k(n)$ denote the minimum number of comparisons necessary for finding the $k$th element of a set of size $n$. The exact values of $V_k(n)$ are known for $k = 1$ ($V_1(n) = n-1$) and $k = 2$ ($V_2(n) = n-2+\lceil \lg n \rceil$) (Schreier and Kislitsyn). Here $\lg$ stands for $\log_2$. For $k = 3$, F. Yao ([6]) has obtained a lower bound which is equal to the upper bound of Hadian and Sobel for infinitely many values of $n$. V. Pratt and F. Yao ([5]) also gave lower bounds for three ranges of $k$ [formulas illegible], improving the bound due to Blum et al. except for a range of $k$ [illegible].

In this paper we first present a new lower bound for $V_k(n)$, namely:

$$n-k+(k-1)\left\lceil \lg \frac{n}{k-1} \right\rceil \le V_k(n).$$

When $3 < k <$ [illegible], this bound is strictly greater than the best previously known bound. For instance, for $k = 5$, this result together with the best known upper bound enables us to determine the value of $V_5(n)$ within a gap of at most 8, while the previously known bounds leave a gap of at least 80. Furthermore, this result shows that, for a fixed value of $k$, the tree selection algorithm is asymptotically optimal.

We then present a new algorithm for selecting the $k$th largest element of a set of size $n$ which yields an upper bound that strictly improves the previously known bound when $2 \lg n < k < 4n/\lg n$ and when $2^i + k - 2 < n \le 2^i + 2^{e}$ for some integer $i$ (the exponent $e$ is illegible). Specifically, for $k = 3$, the new upper bound has the form $V_3(n) \le n-3+\lceil \lg(n-1) \rceil+\lceil \lg(n-2^{e'}) \rceil$ (the exponent $e'$ is illegible).

Following the original formulation of the selection problem by Lewis Carroll ([2]), we call an element of the set $P$ a "player" and a comparison between two players a "match", which must be won by one of the two players. A procedure for selecting the $k$th largest element will be referred to as a "tournament" for determining the $k$th best player.

2. The Lower Bound

We want to show that, for any algorithm that computes the $k$th best player among $n$ players, there exists a ranking of the players such that this algorithm must perform at least $n-k+(k-1)\lceil \lg \frac{n}{k-1} \rceil$ comparisons. The idea of using an oracle in our proof is due to Knuth ([4]), who gave a new proof of Kislitsyn's lower bound for $k = 2$. Here we extend this idea to all values of $k$.

An oracle is basically a deterministic process which builds up a ranking among the players while the algorithm tries to find the solution. This ranking, which must satisfy transitivity and antisymmetry, will force the algorithm to perform at least the number of comparisons stated above. A correct algorithm cannot stop before the $k$th player is uniquely determined by the oracle (how can the algorithm know the answer if the oracle still has some freedom to choose it?). As a direct consequence, the set of the $k-1$ best players must also be uniquely determined.

We describe the oracle as an automaton whose states are represented by ordered pairs. To be specific, the state vector $S_t$ before the $t$th match is $(\varphi_t, E_t)$, where $\varphi_t$ is a mapping from $P$ to $\mathbb{N}$ and $E_t$ is a totally ordered subset of $P$. The initial state is $S_1 = (I, \emptyset)$, where $I$ is the constant mapping such that $\forall x \in P$, $I(x) = 1$. Roughly speaking, the players in $E_t$ are the top players; specifically, the $i$th player to enter the set $E_t$ is the $i$th best player. Candidates for entering $E_t$ are selected according to the values of $\varphi_t$.

The input to the oracle at time $t$ is an unordered pair of players $\{x, y\}$, who are engaged in the $t$th match according to the selection procedure. The oracle decides the winner of the match and enters state $S_{t+1}$ according to the following rules:

R1 - If $x \in E_t$ and $y \in E_t$, then $x$ wins if and only if $x > y$ ($E_t$ is an ordered set). Moreover, $S_{t+1} := S_t$.

R2 - If $x \in E_t$ and $y \notin E_t$, then $x$ wins and $S_{t+1} := S_t$.
R3 - If $x \notin E_t$ and $y \notin E_t$, then if $\varphi_t(x) > \varphi_t(y)$, $x$ wins; and if $\varphi_t(x) = \varphi_t(y)$, an arbitrary decision compatible with transitivity is made. In both cases, taking $x$ to be the winner: if $\varphi_t(x) + \varphi_t(y) \ge \frac{n}{k-1}$, then $\varphi_{t+1} := \varphi_t$, $E_{t+1} := E_t \cup \{x\}$, and $x$ becomes the smallest element of $E_{t+1}$. If $\varphi_t(x) + \varphi_t(y) < \frac{n}{k-1}$, then $E_{t+1} := E_t$, $\varphi_{t+1}(y) := 0$, $\varphi_{t+1}(x) := \varphi_t(x) + \varphi_t(y)$, and $\forall z \ne x, y$, $\varphi_{t+1}(z) := \varphi_t(z)$.

Given that $x$ dominates only $x$ at time 1, we say that $x$ dominates $y$ at time $t+1$ if $x$ dominates $y$ at time $t$, or if $x$ has beaten $y$ in the $t$th match, or if $x$ dominates $z$ and $z$ dominates $y$. Clearly, if $x$ dominates $y$, $x$ is a better player than $y$.

Theorem: The number $V_k(n)$ satisfies

$$n-k+(k-1)\left\lceil \lg \frac{n}{k-1} \right\rceil \le V_k(n).$$

We first prove the following lemma:

Lemma: Using the oracle, the $k-1$ best players will have played at least $(k-1)\lceil \lg \frac{n}{k-1} \rceil$ matches when the tournament is completed.

Proof: The lemma follows from the facts listed below:

Fact 1: The number of matches won by $x$ by time $t$ is greater than or equal to $\lceil \lg \varphi_t(x) \rceil$ (for $\varphi_t(x) > 0$).

Fact 2: Let $e_i \in E_t$ be the $i$th player ($1 \le i \le |E_t|$) to enter $E_t$. Then $e_i$ can be dominated only by $e_j$ with $j < i$.

Fact 3: $\sum_{x \in P} \varphi_t(x) = n$.

We call $W_t$ the set of players $x$ such that $x \notin E_t$ and $\varphi_t(x) > 0$.

Fact 4: $|E_t| + |W_t| > k-1$. This is a consequence of Fact 3 and of the fact that $\forall x \in P$, $\varphi_t(x) < \frac{n}{k-1}$.

Fact 5: At the end of the tournament $|E_t| \ge k-1$. Since the players in $W_t$ can be dominated only by the players in $E_t$, if $|E_t| < k-1$, then any player in $E_t$ or $W_t$ can be one of the $k-1$ best players. A contradiction results from Fact 4.

Fact 6: At the end of the tournament the $k-1$ best players are the $k-1$ top players in $E_t$. This is a consequence of Facts 2 and 5.

When $x$ enters $E_t$ by defeating $y$, we have $\varphi_t(x) + \varphi_t(y) \ge \frac{n}{k-1}$ and $\varphi_t(x) \ge \varphi_t(y)$. These two inequalities give $\varphi_t(x) \ge \frac{n}{2(k-1)}$, so by Fact 1 $x$ has already won at least $\lceil \lg \frac{n}{k-1} \rceil - 1$ matches, to which the match against $y$ is added. The result is then a direct consequence of Facts 1 and 6. □

Proof of theorem: According to the lemma, the $k-1$ best players have played at least $(k-1)\lceil \lg \frac{n}{k-1} \rceil$ matches.
Clearly, any player who is not among the $k$ best players has lost at least one match against a player who is not among the $k-1$ best. Thus, there are $n-k$ additional matches which were not included in the count of the matches played by the $k-1$ top players. This completes the proof of the theorem.

3. Improving the Upper Bound

Since Hadian and Sobel's algorithm needs at most $n-k+(k-1)\lceil \lg(n-k+2) \rceil$ comparisons, the new lower bound presented above enables us to determine $V_k(n)$ to within a gap of at most $(k-1)\lceil \lg(k-1) \rceil$ comparisons. The new algorithm we present reduces that gap when $\lg n < k/2$ and when $2^i + k - 2 < n \le 2^i + 2^{e}$ for any integer $i$ (the exponent $e$ is illegible).

We describe the algorithm in a pseudo-ALGOL dialect including set operations ($\cup$, $\cap$) and list operations (first, last, "," for concatenation).

We first describe the procedure BEST(i,S), which is a tree selection algorithm used to determine the ordered list of the $i$ best players of the set $S$. The set $S$ is initially divided into two disjoint subsets $S_1$ and $S_2$, such that $S = S_1 \cup S_2$ and $|S_1| = 2^{\lceil \lg |S| \rceil - 1}$. Furthermore, each set is associated with a list TOP(S) which is initially empty.

    list procedure WINNER(list L1, list L2) :=
        if last(L1) > last(L2) then L1 else L2;
    comment: WINNER uses one comparison except if one of the two lists is empty;

    list procedure BEST(integer i, set S);
    begin
        if S ≠ ∅ then
        begin
            for j = |TOP(S)|+1 until i do
            begin
                W := WINNER(BEST(1, S1), BEST(1, S2));
                if TOP(S1) = W then
                    begin TOP(S1) := ∅; S1 := S1 - W; end;
                else
                    begin TOP(S2) := ∅; S2 := S2 - W; end;
                TOP(S) := TOP(S), W;
            end;
        end;
        TOP(S);
    end.

This tree selection algorithm performs at most $|S| - i + (i-1)\lceil \lg |S| \rceil$ comparisons (see for instance [4] for further details).

The new algorithm is an extension of this tree selection algorithm. Let $P$ be the initial set of players, which is divided into two disjoint subsets $P_1$ and $P_2$ such that $P_1 \cup P_2 = P$ and $|P_1| = 2^{\lceil \lg |P| \rceil - 1}$.
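The tree selection procedure BEST can be sketched in Python as a knockout tree whose internal nodes cache the current winner of their subtree. This is a minimal sketch rather than a transcription of the pseudo-ALGOL text: the class and method names are hypothetical, players are assumed to be distinct comparable keys (larger is better), and the comparison counter is added only so that the bound $|S| - i + (i-1)\lceil \lg |S| \rceil$ can be checked empirically.

```python
class Node:
    """Subtree of the knockout tree; `top` caches its best remaining player."""

    def __init__(self, players):
        self.left = self.right = None
        self.top = None
        self.known = False                      # is `top` up to date?
        if len(players) == 1:                   # leaf: a single player
            self.top, self.known = players[0], True
        else:
            # |S1| = 2^(ceil(lg |S|) - 1), the split used in the text
            half = 1 << ((len(players) - 1).bit_length() - 1)
            self.left, self.right = Node(players[:half]), Node(players[half:])


class TreeSelection:
    def __init__(self, players):
        players = list(players)
        assert players, "at least one player is required"
        self.size = len(players)                # players still in the tree
        self.comparisons = 0
        self.root = Node(players)

    def _best(self, node):
        """Best remaining player below `node` (None if the subtree is empty)."""
        if not node.known:
            a, b = self._best(node.left), self._best(node.right)
            if a is None or b is None:
                node.top = b if a is None else a    # walkover, no match played
            else:
                self.comparisons += 1               # one match
                node.top = a if a > b else b
            node.known = True
        return node.top

    def _remove(self, node):
        """Remove the current winner and invalidate the caches on its path."""
        if node.left is None:
            node.top = None                         # emptied leaf
            return
        child = node.left if node.left.top == node.top else node.right
        self._remove(child)
        node.known = False

    def best(self, i):
        """Ordered list of the i best remaining players, best first."""
        assert 1 <= i <= self.size
        result = []
        for _ in range(i):
            result.append(self._best(self.root))
            self._remove(self.root)
            self.size -= 1
        return result


if __name__ == "__main__":
    t = TreeSelection([7, 3, 9, 1, 8, 2, 6])
    print(t.best(3), t.comparisons)                 # the three best, best first
```

The first winner costs $|S| - 1$ matches; each later winner only replays the path of the player just removed, which is where the $(i-1)\lceil \lg |S| \rceil$ term comes from.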
The procedure BEST applied to $P$ selects top players one by one in $P_1$ and $P_2$. The new algorithm uses two sequences of positive integers $\{u_a\}$ and $\{v_a\}$, and a characteristic step is to select either the $u_h$ top players of $P_1$ or the $v_j$ top players of $P_2$, according to the results of previous comparisons.

    list procedure SELECT(integer k, set P);
    begin
        h := 1; j := 1; A := u_1 + v_1;
        while A ≤ k do
    L1: begin
            W := WINNER(BEST(u_h, P1), BEST(v_j, P2));
            if TOP(P1) = W then
                begin TOP(P1) := ∅; P1 := P1 - W; h := h+1; A := A + u_h; end;
            else
                begin TOP(P2) := ∅; P2 := P2 - W; j := j+1; A := A + v_j; end;
        end;
        R := k - A + u_h + v_j;
        TOP(P) := TOP(P), PICK(BEST(R, P1), BEST(R, P2));
        comment: TOP(P) contains the k best players of P, furthermore the
                 k-th element of TOP(P) is the k-th player of P;
    end.

    list procedure PICK(list L1, list L2)
    comment: selects the top R players from the ordered lists L1 and L2
             of length R using R comparisons;

Remark: It is possible (and sometimes more efficient) to use the procedure SELECT recursively instead of the procedure BEST. In that case, since the result of SELECT is not an ordered list, it is also necessary to replace the procedure PICK.

Analysis of the Algorithm

Since an exhaustive analysis of the algorithm, determining the best possible choices of $\{u_a\}$ and $\{v_a\}$ for given values of $n$ and $k$, is quite tedious, we restrict our study to particular values of $\{u_a\}$ and $\{v_a\}$.

A comparison performed when line L1 of the algorithm is executed, or a comparison performed in the procedure PICK, clearly determines at least one new element of the pool of the $k$ best players. Such a comparison will be referred to as an active comparison.

Case 1: $u_a = v_a = \alpha$, $\alpha \in \mathbb{N}$, for every integer $a$.

Assuming that $k = t\alpha$, $t \in \mathbb{N}$, $\alpha + t - 1$ active comparisons are performed and clearly at most $n - 2 + (k+\alpha-2)(\lceil \lg n \rceil - 2)$ inactive ones. Hence the difference between the number of comparisons performed by tree selection and the number of comparisons performed by this algorithm is equal to:

$$k - \left[ (\alpha-1)(\lceil \lg n \rceil - 1) + \frac{k}{\alpha} \right].$$
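The Case 1 counts can be cross-checked mechanically. The Python sketch below encodes the tree selection bound, the active and inactive counts for $u_a = v_a = \alpha$ with $k = t\alpha$, and the closed-form difference, reading the inactive count as $n - 2 + (k+\alpha-2)(\lceil \lg n \rceil - 2)$. The function names are illustrative only, and the quantities are worst-case bounds, not measured running costs.

```python
def ceil_lg(n):
    """ceil(log2(n)) for an integer n >= 1."""
    return (n - 1).bit_length()


def tree_selection_bound(n, k):
    """Tree selection: n - k + (k-1) * ceil(lg n) comparisons at most."""
    return n - k + (k - 1) * ceil_lg(n)


def case1_bound(n, k, alpha):
    """Active plus inactive comparisons when u_a = v_a = alpha divides k."""
    assert k % alpha == 0
    t = k // alpha
    active = alpha + t - 1
    inactive = n - 2 + (k + alpha - 2) * (ceil_lg(n) - 2)
    return active + inactive


def case1_saving(n, k, alpha):
    """Closed form: k - [(alpha-1) * (ceil(lg n) - 1) + k / alpha]."""
    return k - ((alpha - 1) * (ceil_lg(n) - 1) + k // alpha)


if __name__ == "__main__":
    n, k = 4096, 60
    for alpha in (1, 2, 3, 4, 5, 6):
        direct = tree_selection_bound(n, k) - case1_bound(n, k, alpha)
        print(alpha, direct, case1_saving(n, k, alpha))
```

With $\alpha = 1$ the saving is zero, which matches the observation that the algorithm then behaves like plain tree selection.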
The choice $\alpha = 2$ shows that this algorithm strictly improves on tree selection if $k > 2(\lceil \lg n \rceil - 1)$. In fact, there is an optimal manner of choosing $\alpha$, which is the closest integer to $\sqrt{k / (\lceil \lg n \rceil - 1)}$.

For instance, suppose we are to select the 90th player among a set of 2048. The choice $u_a = v_a = 3$, for all $a$, in our algorithm will save 39 comparisons over tree selection.

Case 2: We want to choose $\{u_a\}$ and $\{v_a\}$ such that, in the worst case, the number of inactive comparisons is equal to $n - 2k + (k-1)\lceil \lg n \rceil$. Such a choice guarantees that the algorithm is not worse than tree selection.

Assume that $n = 2^{i_1} + 2^{i_2}$ with $i_1 > i_2$. The values of $\{u_a\}$ must satisfy the relation:

$$n - 2 - \Bigl(1 - \sum_{1 \le a \le h} u_a\Bigr)(i_1 - 1) + \Bigl(k - 1 - \sum_{1 \le a < h} u_a\Bigr)(i_2 - 1) \le n - 2k + (k-1)(i_1 + 1);$$

that is:

$$u_h \le 1 + \frac{i_1 - i_2}{i_1 - 1}\Bigl(k - 1 - \sum_{1 \le a \le h-1} u_a\Bigr).$$

The choice $v_a = 1$ for every integer $a$ appears to be always convenient, and a simple calculation yields that the algorithm improves strictly on tree selection if:

$$i_2 \le \frac{(k-2)\,i_1 + 1}{k-1}.$$

For instance, for $k = 7$, using the sequence $u_1 = 4$, $u_2 = 2$, $u_3 = 1$ saves 3 comparisons on tree selection if $2^{i_1} < n \le 2^{i_1} + 2^{(i_1+1)/2}$. For $k = 3$, using $u_1 = 2$ and $u_2 = 1$ saves one comparison on tree selection for a range of $n$ of the form $2^i < n \le 2^i + 2^{e}$ (the exponent $e$ is illegible), and the new upper bound for $V_3(n)$ has the form

$$V_3(n) \le n - 3 + \lceil \lg(n-1) \rceil + \lceil \lg(n - 2^{e'}) \rceil$$

(the exponent $e'$ is illegible).

4. Acknowledgments

I am very grateful to C. L. Liu and J. A. Koch for their comments and suggestions during the preparation of this paper.

5. References

[1] Blum, M., R. Floyd, V. Pratt, R. Rivest, and R. Tarjan, "Linear time bounds for median computations", Proceedings of the Fourth Annual ACM Symposium on Theory of Computing, May 1972.

[2] Carroll, L., St. James's Gazette, August 1, 1883, pp. 5-6.

[3] Hadian, A., and M. Sobel, "Selecting the t-th largest using binary errorless comparisons", Technical Report 121, Department of Statistics, University of Minnesota, May 1969.

[4] Knuth, D., "The Art of Computer Programming", Vol. 3, pp. 209-220, Addison-Wesley, 1973.

[5] Pratt, V. and F. Yao, "On lower bounds for computing the i-th largest element", Proceedings of the 14th Symposium on Switching and Automata Theory, pp. 70-81, 1973.

[6] Yao, F., "On lower bounds for selection problems", Technical Report MAC TR-121, Massachusetts Institute of Technology, 1974.