 BOUNDS FOR SELECTION 
 
 by 
 Laurent Hyafil 
 
 June 1974
 
 
UIUCDCS-R-74-651

 Department of Computer Science
 University of Illinois at Urbana-Champaign
 Urbana, Illinois 61801

 This work was supported in part by NSF Grant GJ-41538 and in part by IRIA.
 IRIA-Laboria, Domaine de Voluceau, 78150 Rocquencourt, FRANCE.
 
ABSTRACT 
 
 In this paper we show that the minimum number of comparisons 
 necessary for the computation of the k-th element of a totally ordered
 set of size n, $V_k(n)$, is lower bounded by
 $n-k+(k-1)\lceil \lg \frac{n}{k-1} \rceil$. For
 3 < k < r, this bound improves the best lower bound presently known. A 
 new algorithm which yields an upper bound that is better than the currently 
 known bound for a large range of values of n will also be presented. 
 
1. Introduction 
 
 The selection problem is to determine the k-th element of a
 totally ordered set P of size n. Two efficient algorithms for solving 
 this problem are presently known. When k is small with respect to 
 n, 3 < k < j- - 1 , A. Hadian and M. Sobel's algorithm ([3]) which needs 
 at most $n-k+(k-1)\lceil \lg(n-k+2) \rceil$ comparisons is adequate. Another method,
 using at most 5.73 n comparisons, was discovered by M. Blum et al. ([1]).
 This method is more efficient than Hadian and Sobel's method for
 $k > 4n/\lg n$.
 
 Let $V_k(n)$ denote the minimum number of comparisons necessary
 for finding the k-th element of a set of size n. The exact values of $V_k(n)$
 are known for k = 1 ($V_1(n) = n-1$) and k = 2 ($V_2(n) = n-2+\lceil \lg n \rceil$)
 (Schreier and Kislitsyn). For k = 3, F. Yao ([6]) has obtained a lower bound which
 is equal to the upper bound of Hadian and Sobel for infinitely many values
 of n. V. Pratt and F. Yao ([5]) also showed that:
 
 for k<^n/2%n, \^ n ^ - n " k+ (k-l) [^n-(k-l)%*n - 28$ ( (k-l)! ) ], 
 
 for k < |, V k( n ) - n+2k_ ^ n > 
 
 for n/3 < k < [n-3/2j, V (n) > (3n+k)/2-^n-0(l) . 
 
 improving the bound due to Blum et al, except when 2^n/22g2g.n < k < $$n. 
 
 In this paper we first present a new lower bound for $V_k(n)$,
 namely $n-k+(k-1)\lceil \lg \frac{n}{k-1} \rceil \le V_k(n)$. When 3 < k < r-, this bound is
 strictly greater than the best previously known bound. For instance,
 for k = 5, this result together with the best known upper bound enables
 us to determine the value of $V_5(n)$ within a gap of at most 8, while the
 previously known bounds leave a gap of at least 80. Furthermore, this
 result shows that, for a fixed value of k, the tree selection algorithm is
 asymptotically optimal.

 ($\lg$ stands for $\log_2$.)
 
 We then present a new algorithm for selecting the k-th largest
 element of a set of size n which yields an upper bound that strictly
 improves the previously known bound when $2\lg n < k < 4n/\lg n$ and when
 
 fk-1 . 1 n 
 ■r—r 1 -1 
 
 2 i +k-2 < n<2 for all integer i. Specifically, for k = 3, the 
 
 o 
 new upper bound is V,(n) < n-3+[% (n-l)]+[ 8$ (n-2 )]. 
 
 Following the original formulation of the selection problem by
 Lewis Carroll ([2]), we call an element of the set P a "player" and a
 comparison between two players a "match" which must be won by one of the
 two players. A procedure for selecting the k-th largest element will be
 referred to as a "tournament" for determining the k-th best player.
 
 2. The Lower Bound 
 
 We want to show that, for any algorithm that computes the k-th
 best player among n players, there exists a ranking of the players such
 that this algorithm must perform at least $n-k+(k-1)\lceil \lg \frac{n}{k-1} \rceil$ comparisons.
 The idea of using an oracle in our proof is due to Knuth ([4]), who gave a
 new proof of Kislitsyn's lower bound for k = 2. Here we extend this idea
 to all values of k. An oracle is basically a deterministic process
 which builds up a ranking among the players while the algorithm tries
 to find the solution. This ranking, which must satisfy transitivity
 and antisymmetry, will force the algorithm to perform at least $V_k(n)$
 comparisons. A correct algorithm cannot stop before the k-th player is
 uniquely determined by the oracle (how can the algorithm know the answer
 if the oracle still has some freedom to choose it?). As a direct
 consequence, the set of the k-1 best players must also be uniquely
 determined.
 
 We describe the oracle as an automaton whose states are
 represented by ordered pairs. To be specific, the state vector $S_t$ before
 the t-th match is $(\varphi_t, E_t)$, where $\varphi_t$ is a mapping from P to N and $E_t$ is a
 totally ordered subset of P. The initial state is $S_1 = (I, \emptyset)$, where I is
 the constant mapping such that $\forall x \in P$, $I(x) = 1$. Roughly speaking, the
 players in $E_t$ are the top players; specifically, the i-th player to enter
 the set $E_t$ is the i-th best player. Candidates for entering $E_t$ are selected
 according to the values of $\varphi_t$.
 
 The input to the oracle at time t is an unordered pair of players
 {x,y}, who are engaged in the t-th match according to the selection procedure.
 The oracle decides the winner of the match and enters state $S_{t+1}$ according
 to the following rules:
 
 R1 - If $x \in E_t$ and $y \in E_t$, then x wins if and only if x > y
      ($E_t$ is an ordered set). Moreover, $S_{t+1} := S_t$.

 R2 - If $x \in E_t$ and $y \notin E_t$, then x wins and $S_{t+1} := S_t$.

 R3 - If $x \notin E_t$ and $y \notin E_t$, then if $\varphi_t(x) > \varphi_t(y)$, x wins; and if
      $\varphi_t(x) = \varphi_t(y)$, an arbitrary decision compatible with transitivity
      will be made. In both cases, calling the winner x:
      if $\varphi_t(x) + \varphi_t(y) \ge \frac{n}{k-1}$, then $\varphi_{t+1} := \varphi_t$,
      $E_{t+1} := E_t \cup \{x\}$ and x becomes the smallest element of $E_{t+1}$.
      If $\varphi_t(x) + \varphi_t(y) < \frac{n}{k-1}$, then $E_{t+1} := E_t$, $\varphi_{t+1}(y) := 0$,
      $\varphi_{t+1}(x) := \varphi_t(x) + \varphi_t(y)$, and $\forall z \ne x, y$, $\varphi_{t+1}(z) := \varphi_t(z)$.
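
 The three rules can be tried out with a small simulation. The following
 Python sketch is only an illustration under stated assumptions: the class
 name Oracle, its attributes and the method play are hypothetical names not
 found in the report, players are numbered 0..n-1, k >= 2 is assumed, and the
 "arbitrary decision compatible with transitivity" is resolved by letting the
 first player win unless the second already dominates the first.

```python
class Oracle:
    """Adversary following rules R1-R3 (illustrative sketch, k >= 2 assumed)."""

    def __init__(self, n, k):
        self.n, self.k = n, k
        self.phi = [1] * n                       # phi_1 is the constant mapping 1
        self.E = []                              # top players, in order of entry
        self.wins = [0] * n                      # matches won by each player
        self.beaten = [set() for _ in range(n)]  # direct victories of each player
        self.matches = 0

    def _dominates(self, a, b):
        """True if a reaches b through a chain of recorded victories."""
        stack, seen = [a], {a}
        while stack:
            u = stack.pop()
            if u == b:
                return True
            for v in self.beaten[u] - seen:
                seen.add(v)
                stack.append(v)
        return False

    def play(self, x, y):
        """Decide the match {x, y}, update the state, and return the winner."""
        self.matches += 1
        in_x, in_y = x in self.E, y in self.E
        if in_x and in_y:                        # R1: earlier entrant is better
            w, l = (x, y) if self.E.index(x) < self.E.index(y) else (y, x)
        elif in_x or in_y:                       # R2: the member of E wins
            w, l = (x, y) if in_x else (y, x)
        else:                                    # R3: larger phi wins
            if self.phi[x] != self.phi[y]:
                w, l = (x, y) if self.phi[x] > self.phi[y] else (y, x)
            elif self._dominates(y, x):          # tie: stay transitive
                w, l = y, x
            else:
                w, l = x, y
            total = self.phi[w] + self.phi[l]
            if total * (self.k - 1) >= self.n:   # phi(w) + phi(l) >= n/(k-1)
                self.E.append(w)                 # w becomes the smallest of E
            else:
                self.phi[w], self.phi[l] = total, 0
        self.wins[w] += 1
        self.beaten[w].add(l)
        return w
```

 Feeding every pair of players to play and then inspecting phi, E and wins
 gives a quick way to check that the weights always sum to n and that every
 weight stays below n/(k-1).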
 
 Given that x dominates only x at time 1, we say that x
 dominates y at time t+1 if x dominates y at time t, or if x has beaten y
 in the t-th match, or if x dominates z and z dominates y. Clearly, if x
 dominates y, x is a better player than y.

 Theorem : the number $V_k(n)$ satisfies $n-k+(k-1)\lceil \lg \frac{n}{k-1} \rceil \le V_k(n)$.
 
 We first prove the following lemma: 
 
 Lemma : Using the oracle, the k-1 best players will have played at least
 $(k-1)\lceil \lg \frac{n}{k-1} \rceil$ matches when the tournament is completed.
 
 Proof : The lemma follows from the facts listed below:

 Fact 1 : The number of matches won by x by time t is greater than or equal
 to $\lceil \lg \varphi_t(x) \rceil$.

 Fact 2 : Let $e_i \in E_t$ be the i-th player ($1 \le i \le |E_t|$) to enter $E_t$. Then
 $e_i$ can be dominated only by $e_j$ with $j \le i$.

 Fact 3 : $\sum_{x \in P} \varphi_t(x) = n$.

 We call $W_t$ the set of players x such that $x \notin E_t$ and $\varphi_t(x) > 0$.

 Fact 4 : $|E_t| + |W_t| > k-1$.
 This is a consequence of Fact 3 and of the fact that $\forall x \in P$, $\varphi_t(x) < \frac{n}{k-1}$.

 Fact 5 : At the end of the tournament, $|E_t| \ge k-1$.
 Since the players in $W_t$ can be dominated only by the players in $E_t$, if $|E_t| < k-1$,
 then any player in $E_t$ or $W_t$ can be one of the k-1 best players. Contradiction
 results from Fact 4.

 Fact 6 : At the end of the tournament the k-1 best players are the k-1 top
 players in $E_t$.
 This is a consequence of Facts 2 and 5.

 Since, when x enters $E_t$ by defeating y, $\varphi_t(x) + \varphi_t(y) \ge \frac{n}{k-1}$
 and $\varphi_t(x) \ge \varphi_t(y)$, the result is a direct consequence of Facts 1 and 6. □
 

 Proof of theorem : According to the lemma, the k-1 best players have
 played at least $(k-1)\lceil \lg \frac{n}{k-1} \rceil$ matches. Clearly, any player who is not
 among the k best players has lost at least one match against a player
 who is not among the k-1 best. Thus, there are n-k additional matches
 which were not included in the count of the matches played by the k-1
 top players. This completes the proof of the theorem.
 
 3. Improving the Upper Bound 
 
 Since Hadian and Sobel's algorithm needs at most $n-k+(k-1)\lceil \lg(n-k+2) \rceil$
 comparisons, the new lower bound presented above enables us to determine
 $V_k(n)$ to within a gap of at most $(k-1)\lceil \lg(k-1) \rceil$ comparisons. The new
 algorithm we present reduces that gap when $\lg n < k/2$ and when
 
 2 X +k-2 < n g 2 1 + 2 
 
 k-2 . 
 k^l 1 
 
 -1 
 
 , for any integer i 
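
 The size of the gap between the lower bound of Section 2 and Hadian and
 Sobel's upper bound can be checked numerically. The Python sketch below is
 only an illustration; the function names are hypothetical, and it assumes
 2 <= k <= n so that both logarithms are non-negative.

```python
def ceil_lg_ratio(a, b):
    """Smallest m >= 0 with b * 2**m >= a, i.e. ceil(lg(a/b)) when a >= b > 0."""
    m = 0
    while b << m < a:
        m += 1
    return m


def lower_bound(n, k):
    """n - k + (k-1) * ceil(lg(n/(k-1))), the lower bound of Section 2."""
    return n - k + (k - 1) * ceil_lg_ratio(n, k - 1)


def hadian_sobel_upper(n, k):
    """n - k + (k-1) * ceil(lg(n-k+2)), Hadian and Sobel's upper bound."""
    return n - k + (k - 1) * ceil_lg_ratio(n - k + 2, 1)


def gap(n, k):
    """Difference between the two bounds for given n and k."""
    return hadian_sobel_upper(n, k) - lower_bound(n, k)
```

 For example, gap(1000, 5) evaluates both bounds for n = 1000 and k = 5; the
 result never exceeds (k-1) * ceil(lg(k-1)), which is 8 for k = 5.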
 
 We describe the algorithm in a pseudo-ALGOL dialect including set 
 operations ( U, f\~) and list operations (first, last, ", " for concatenation). 
 
 We first describe the procedure BEST(i,S), which is a tree selection
 algorithm used to determine the ordered list of the i best players of the
 set S. The set S is initially divided into two disjoint subsets $S_1$ and
 $S_2$, such that $S = S_1 \cup S_2$ and $|S_1| = 2^{\lceil \lg |S| \rceil - 1}$. Furthermore, each set is
 associated with a list TOP(S) which is initially empty.
 
 list procedure WINNER (list L_1, list L_2) := if last(L_1) > last(L_2)
                                               then L_1 else L_2;
 comment : WINNER uses one comparison except if one of the two
           lists is empty;

 list procedure BEST (integer i, set S);
 begin if S ≠ ∅ then
       begin for j = |TOP(S)|+1 until i do
             begin W := WINNER (BEST(1,S_1), BEST(1,S_2));
                   if TOP(S_1) = W then
                      begin TOP(S_1) := ∅; S_1 := S_1 - W; end
                   else
                      begin TOP(S_2) := ∅; S_2 := S_2 - W; end;
                   TOP(S) := TOP(S), W;
             end;
       end;
       TOP(S);
 end.
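
 The procedure BEST can be sketched in Python as follows. This is an
 illustrative rendering, not the report's own code: the names Node, best and
 stats are hypothetical, players are assumed to be distinct comparable values
 with larger meaning better, and a set containing a single player is given
 that player as its TOP list from the start. The comparison counter makes it
 possible to check a run against the bound $|S| - i + (i-1)\lceil \lg |S| \rceil$
 stated for tree selection.

```python
class Node:
    """A set S of players together with its cached list TOP(S)."""

    def __init__(self, players):
        self.size = len(players)      # |S|, including players listed in TOP(S)
        self.top = []                 # TOP(S), best first
        self.kids = None              # the two subsets S_1 and S_2
        if self.size == 1:
            self.top = [players[0]]   # a lone player is trivially the best of S
        elif self.size > 1:
            half = 1 << ((self.size - 1).bit_length() - 1)  # 2^(ceil(lg|S|)-1)
            self.kids = (Node(players[:half]), Node(players[half:]))


def best(i, node, stats):
    """Return the ordered list of the i best players of the set held by node."""
    while len(node.top) < i and node.size > len(node.top):
        a = best(1, node.kids[0], stats)
        b = best(1, node.kids[1], stats)
        if a and b:                   # WINNER costs one comparison
            stats["comparisons"] += 1
            kid = node.kids[0] if a[-1] > b[-1] else node.kids[1]
        else:                         # one list is empty: no comparison needed
            kid = node.kids[0] if a else node.kids[1]
        node.top.append(kid.top[0])   # TOP(S) := TOP(S), W
        kid.top = []                  # TOP(S_j) := empty list
        kid.size -= 1                 # S_j := S_j - W
    return node.top[:i]
```

 A call such as best(3, Node(players), {"comparisons": 0}) returns the three
 best players in order and leaves the number of comparisons in the dictionary.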
 
 This tree selection algorithm performs at most $|S| - i + (i-1)\lceil \lg |S| \rceil$ comparisons
 (see for instance [4] for further details). The new algorithm is an
 extension of this tree selection algorithm. Let P be the initial set of
 players, which is divided into two disjoint subsets $P_1$ and $P_2$ such that
 $P_1 \cup P_2 = P$ and $|P_1| = 2^{\lceil \lg |P| \rceil - 1}$. The procedure BEST applied to P selects
 top players one by one in $P_1$ and $P_2$. The new algorithm uses two sequences
 of positive integers $\{u_\alpha\}$ and $\{v_\alpha\}$, and a characteristic step is to select
 either the $u_h$ top players of $P_1$ or the $v_j$ top players of $P_2$, according to
 the results of previous comparisons.
 
 list procedure SELECT (integer k, set P);
 begin h := 1; j := 1; A := u_1 + v_1;
       while A ≤ k do
   L1: begin W := WINNER (BEST(u_h,P_1), BEST(v_j,P_2));
             if TOP(P_1) = W then
                begin TOP(P_1) := ∅; P_1 := P_1 - W;
                      h := h+1; A := A + u_h;
                end
             else
                begin TOP(P_2) := ∅; P_2 := P_2 - W;
                      j := j+1; A := A + v_j;
                end;
       end;
       R := k - A + u_h + v_j;
       TOP(P) := TOP(P), PICK (BEST(R,P_1), BEST(R,P_2));
       comment : TOP(P) contains the k best players of P; furthermore
                 the k-th element of TOP(P) is the k-th player of P;
 end.
 
 list procedure PICK (list L_1, list L_2)
 comment : selects the top R players from the ordered lists
           L_1 and L_2 of length R using R comparisons;

 Remark: It is possible (and sometimes more efficient) to
 use the procedure SELECT recursively instead of the procedure
 BEST. In that case, since the result of SELECT is not an ordered
 list, it is also necessary to replace the procedure PICK.
 
 Analysis of the Algorithm 
 
 Since an exhaustive analysis of the algorithm, to determine the best
 possible choices of $\{u_\alpha\}$ and $\{v_\alpha\}$ for given values of n and k, would be quite
 tedious, we restrict our study to particular values of $\{u_\alpha\}$ and $\{v_\alpha\}$.

 A comparison performed when line L1 of the algorithm is executed,
 or a comparison performed in the procedure PICK, clearly determines at
 least one new element of the pool of the k best players. Such a comparison
 will be referred to as an active comparison.
 
 
 Case 1 : $u_\alpha = v_\alpha = a$, $a \in N$, for all integers $\alpha$.

 Assuming that k = ta, $t \in N$, a+t-1 active comparisons are
 performed and clearly at most $n-2+(k+a-2)(\lceil \lg n \rceil - 2)$ inactive ones, so
 that the difference between the number of comparisons performed by tree
 selection and the number of comparisons performed by this algorithm is
 clearly equal to:

 $k - \left[ (a-1)(\lceil \lg n \rceil - 1) + \frac{k}{a} \right]$.

 The choice a=2 shows that this algorithm strictly improves on tree
 selection if $k > 2(\lceil \lg n \rceil - 1)$. In fact, there is an optimal manner of choosing
 a, which is the closest integer to $\sqrt{k / (\lceil \lg n \rceil - 1)}$.
 
 For instance, suppose we are to select the 90th player among a
 set of 2048. The choice $u_\alpha = v_\alpha = 3$, for all $\alpha$, in our algorithm will save 39
 comparisons over tree selection.
 Case 2 
 
 We want to choose $\{u_\alpha\}$ and $\{v_\alpha\}$ such that, in the worst case, the
 number of inactive comparisons is equal to $n-2k+(k-1)\lceil \lg n \rceil$. Such a choice
 guarantees that the algorithm is not worse than tree selection.
 
 Assume that $n = 2^{i_1} + 2^{i_2}$ with $i_1 > i_2$. The values of $\{u_\alpha\}$ must
 satisfy the relation:
 
 n-2-(l - Z u )(i -1) + (k-l- Z u )(i -l)gn-2k + (k-l)(i +l); 
 i^c^h a igc^h a x 
 
 i l" i 2 
 that is: u, g 1+- — — (k-l- Z u ). 
 
 1 l^o^h-1 
 The choice $v_\alpha = 1$ for all integers $\alpha$ appears to be always convenient, and a
 simple calculation shows that the algorithm improves strictly on tree
 selection if:
 
i 2 < 
 
 (l<-2)i 1 + 1 
 
 For instance, for k=7, using the sequence $u_1=4$, $u_2=2$, $u_3=1$ saves 3
 comparisons on tree selection if
 
 IL.+1 
 
 2 < n ^ 2 +2 
 
 For k=3, using $u_1=2$ and $u_2=1$ saves one comparison on tree selection if
 
 i i 
 2 < n g 2 +2 
 
 V 1 
 
 -1 
 
 and the new upper bound for $V_3(n)$ is
 
 &n. 
 
 V (n) ^ n-3 + [%(n-l)l+|"^(n-2 L 2j )]. 
 
 4. Acknowledgments
 
 I am very grateful to C. L. Liu and J. A. Koch for their comments 
 and suggestions during the preparation of this paper. 
 
 5. References 
 
 [1] Blum, M., R. Floyd, V. Pratt, R. Rivest, and R. Tarjan, "Linear time
 bounds for median computations", Proceedings of the Fourth Annual ACM
 Symposium on Theory of Computing, May, 1972.

 [2] Carroll, L., St. James's Gazette, August 1, 1883, pp. 5-6.

 [3] Hadian, A., and M. Sobel, "Selecting the t-th largest using binary
 errorless comparisons", Technical report 121, Department of Statistics,
 University of Minnesota, May, 1969.

 [4] Knuth, D., "The Art of Computer Programming", Vol. 3, pp. 209-220,
 Addison-Wesley, 1973.

 [5] Pratt, V. and F. Yao, "On lower bounds for computing the i-th largest
 element", Proceedings of the 14th Symposium on Switching and Automata
 Theory, pp. 70-81, 1973.

 [6] Yao, F., "On lower bounds for selection problems", Technical report
 MAC TR-121, Massachusetts Institute of Technology, 1974.
 
 