Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/adaptivemergingb696gavr 
 
Report No. UIUCDCS-R-75-696 
 
 Adaptive merging by parallel disjoint comparisons 
 
 by 
 
 Fanica Gavril 
 
 January 1975 
 
 Department of Computer Science 
 University of Illinois at Urbana-Champaign 
 Urbana, Illinois 61801 
 
 * 
 
 This work was supported in part by the National Science Foundation 
 under Grant No. US NSF GJ-41538. 
 
Abstract 
 
 Consider two linearly ordered sets $A$, $B$, $|A| = m$, $|B| = n$ and
 $p$ ($p \le m \le n$) parallel processors. The paper presents an adaptive
 parallel merging algorithm by disjoint comparisons, which requires at
 most $4\log_2 2m + 3m/p + 2\lceil m/p \rceil \lfloor \log_2 n/m \rfloor$ steps.
 
1. Introduction
 
 In recent years, there has been a growing interest in developing 
 efficient algorithms for parallel processors, and the present paper 
 continues this line of research. 
 
 Consider two disjoint linearly ordered sets

 $$A = \{a_1 < a_2 < \dots < a_m\}, \quad B = \{b_1 < b_2 < \dots < b_n\}, \quad m \le n,$$

 which are subsets of a linearly ordered set $S$. The problem of merging $A$
 and $B$ is to find the linear ordering of $A \cup B$. Throughout the paper we will
 assume that $m \le n$, $mn > 1$ and we have $p$ parallel processors, $p \le m \le n$.
 In the efficiency evaluations we will consider only the comparisons between
 elements of $A$ and elements of $B$. A step is a set of comparisons performed
 at the same time by the $p$ parallel processors. As usual, $\lceil x \rceil$ will denote
 the smallest integer not less than $x$, $\lfloor x \rfloor$ will denote the biggest integer
 not greater than $x$, and $|V|$ will denote the number of elements of a set $V$.
 Throughout the paper, the logarithms are in base 2.

 We will assume that the given sets $A$, $B$ are sequential lists in the memory
 of the computer, the elements of $A$ being located in the nodes whose addresses
 are $X+1, \dots, X+m$ and the elements of $B$ in the nodes whose addresses are
 $Y+1, \dots, Y+n$, with $X+m < Y$. For every $w \in A \cup B$ we keep in the node of $w$
 a field for a pointer variable $c(w)$ and a field called ADDRESS($w$) which
 contains the original address of $w$ plus one, i.e., for $a_i \in A$,
 ADDRESS($a_i$) $= X+i+1$ and for $b_j \in B$, ADDRESS($b_j$) $= Y+j+1$. At the
 beginning, $c(w) = \infty$ for every $w \in A \cup B$. In comparison-interchange
 operations when we move $w$, we move its entire node, hence we can know later
 from ADDRESS($w$) what was its original address.

 For an element $a \in A$ let $b_j$, $b_{j+1}$ be the place in $B$ in which it must be
 inserted for obtaining the linear ordering of $\{a\} \cup B$; by inserting a
 pointer from $a$ to its place in $B$ we mean that we make $c(a)$ equal to the
 address of the smallest element of $B$ greater than $a$, i.e., $c(a) = Y+j+1$.
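 The pointer convention can be illustrated with a minimal sequential sketch. It assumes the sets are held in sorted Python lists (0-based), and the helper name `place_pointers` is hypothetical; it only shows what value each pointer should end up holding, not how the parallel algorithms compute it.

```python
from bisect import bisect_left

def place_pointers(A, B, Y):
    """For every a in A, return the address of the smallest element of B
    greater than a, assuming b_j is stored at address Y + j (1-based j).

    If a lies between b_j and b_{j+1}, the result is Y + j + 1.
    An element larger than all of B gets the address Y + n + 1.
    """
    # bisect_left counts the elements of B that are smaller than a,
    # which is exactly the index j in the text.
    return [Y + bisect_left(B, a) + 1 for a in A]

# Illustrative values only: B occupies addresses 101..104.
pointers = place_pointers([3, 8], [1, 5, 7, 9], Y=100)
```

 Here 3 falls between $b_1 = 1$ and $b_2 = 5$, so its pointer is the address of $b_2$.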
 A merging algorithm is called adaptive if the result of some set
 of comparisons may be used to decide what will be the comparisons to be
 performed in further steps. A merging algorithm is called nonadaptive if
 the comparisons to be performed during the algorithm are initially
 prescribed and they remain fixed irrespective of the result of any particular
 step. Generally, two different assumptions can be made on the given
 parallel processor. One assumption is that any memory word can be accessed
 by any number of processors at the same time. The other assumption is
 that a memory word can be accessed by only one processor at a time, i.e.,
 the comparisons performed at the same time are disjoint. Based on the
 above assumptions, we can have four kinds of parallel merging algorithms:
 
 a) nonadaptive by disjoint comparisons; 
 
 b) nonadaptive by comparisons not necessarily disjoint; 
 
 c) adaptive by disjoint comparisons; 
 
 d) adaptive by comparisons not necessarily disjoint. 
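 The disjointness condition on a step can be made concrete with a short sketch. It assumes a step is represented as a list of pairs of element labels; the function name `is_disjoint_step` and the string labels are hypothetical.

```python
def is_disjoint_step(comparisons):
    """Return True if no element takes part in more than one comparison
    of the step, i.e. every memory word is accessed by one processor."""
    seen = set()
    for x, y in comparisons:
        if x in seen or y in seen:
            return False          # some element is used twice in this step
        seen.update((x, y))
    return True

# Two processors comparing (a1, b1) and (a2, b2) form a disjoint step;
# comparing a1 with both b1 and b2 in the same step does not.
ok = is_disjoint_step([("a1", "b1"), ("a2", "b2")])
clash = is_disjoint_step([("a1", "b1"), ("a1", "b2")])
```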
 
 The most efficient known algorithm for nonadaptive merging by 
 disjoint comparisons with one processor is the algorithm of Batcher [1], 
 (see merging networks in [2]). It works inductively as follows. 
 
 Batcher's algorithm. Let $A$, $B$ be the sets to be merged.

 BAT 1. Merge (by comparison-interchanges in place)
 $A_1 = \{a_1, a_3, \dots, a_{2\lceil m/2 \rceil - 1}\}$ with $B_1 = \{b_1, b_3, \dots, b_{2\lceil n/2 \rceil - 1}\}$,
 obtaining the sorted result $C_1 = \{v_1, v_2, \dots, v_{\lceil m/2 \rceil + \lceil n/2 \rceil}\}$.

 BAT 2. Merge $A_2 = \{a_2, a_4, \dots, a_{2\lfloor m/2 \rfloor}\}$ with
 $B_2 = \{b_2, b_4, \dots, b_{2\lfloor n/2 \rfloor}\}$, obtaining the sorted result
 $C_2 = \{w_1, w_2, \dots, w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}\}$.

 BAT 3. On the sequence

 $$C = \{v_1, w_1, v_2, w_2, \dots, v_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}, w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}, v^{*}, v^{**}\},$$

 perform comparison-interchange operations of the following pairs:

 $$w_1 : v_2, \quad w_2 : v_3, \quad \dots, \quad w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor} : v^{*}$$

 ($v^{*} = v_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor + 1}$ if $m$ or $n$ is odd, and it does not exist otherwise;
 $v^{**} = v_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor + 2}$ if $m$ and $n$ are odd, and it does not exist
 otherwise). The result will be sorted. For a complete proof see Knuth [2].

 This algorithm requires at most $((m+n)/2)\log m + n$ steps. As proved by A. and
 F. Yao [3], such an algorithm requires at least $(n/2)\log(m+1)$ steps.
 Therefore, the algorithm of Batcher is asymptotically optimal. This algorithm
 can be easily implemented as a nonadaptive merging algorithm by disjoint
 comparisons with $p$ processors as follows. Perform BAT 1 using $\lceil p/2 \rceil$
 processors and BAT 2 using $\lfloor p/2 \rfloor$ processors, in parallel. After this,
 perform BAT 3 using $p$ processors.
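 The three steps BAT 1 to BAT 3 can be sketched sequentially as follows. This is a minimal illustration under the assumption that the inputs are sorted Python lists; the name `odd_even_merge` is hypothetical, the recursion builds new lists rather than interchanging in place, and no attempt is made to schedule the comparisons on $p$ processors.

```python
def odd_even_merge(a, b):
    """Merge two sorted lists with Batcher's odd-even scheme (sequential sketch)."""
    if not a or not b:
        return list(a) + list(b)          # nothing to compare
    if len(a) == 1 and len(b) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]

    # BAT 1: merge the odd-indexed elements a_1, a_3, ... with b_1, b_3, ...
    v = odd_even_merge(a[0::2], b[0::2])
    # BAT 2: merge the even-indexed elements a_2, a_4, ... with b_2, b_4, ...
    w = odd_even_merge(a[1::2], b[1::2])

    # BAT 3: interleave v_1, w_1, v_2, w_2, ... followed by v*, v** if present.
    c = []
    for i in range(len(w)):
        c.append(v[i])
        c.append(w[i])
    c.extend(v[len(w):])

    # Comparison-interchange of w_i : v_{i+1}; in the 0-based list c these
    # are the positions (1, 2), (3, 4), ...  The pairs are disjoint, so a
    # parallel machine could perform all of them in one step.
    for k in range(1, len(c) - 1, 2):
        if c[k] > c[k + 1]:
            c[k], c[k + 1] = c[k + 1], c[k]
    return c

merged = odd_even_merge([1, 4, 9], [2, 3, 10, 11])
```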
 
 A nonadaptive parallel merging algorithm, by comparisons not 
 necessarily disjoint can be obtained from Batcher's algorithm by performing 
 in parallel the first p required comparisons, then the next p comparisons 
 and so on, without caring whether the comparisons are disjoint or not. 
 
 Adaptive parallel merging by comparisons not necessarily disjoint
 was discussed in [4], which contains an algorithm requiring at most
 $2\lceil \log(2m+1) \rceil + \lfloor 3m/p \rfloor + \lceil m/p \rceil \lfloor \log n/m \rfloor$ steps.
 
 The purpose of this paper is to describe an adaptive parallel
 merging algorithm by disjoint comparisons which requires at most
 $4\log 2m + 3m/p + 2\lceil m/p \rceil \lfloor \log n/m \rfloor$ steps and is asymptotically optimal
 whenever $p$ is a function growing slower than $m/(4\log 2m)$. Algorithms A, B, C
 are auxiliary, and they are used in the Main Algorithm.
 
 2. Algorithm A 
 
 Consider the two ordered sets $A$, $B$ and the $p$ parallel processors.
 Algorithm A is a modification of Batcher's algorithm which inserts for every
 $a \in A$ ($b \in B$) a pointer to its place in $B$ (in $A$ respectively). Algorithm A
 works inductively as follows.

 Perform BAT 1 using $\lceil p/2 \rceil$ processors and for every $a \in A_1$ ($b \in B_1$) make
 $c(a)$ ($c(b)$) equal to the address in $B$ (in $A$) of the smallest element of
 $B \cap C_1$ ($A \cap C_1$) greater than $a$ ($b$ respectively). Let us denote these values
 of $c(a)$, $c(b)$ by $\bar{c}(a)$, $\bar{c}(b)$.

 Perform BAT 2 using $\lfloor p/2 \rfloor$ processors and assign the values
 $\bar{c}(a)$, $\bar{c}(b)$, $a \in A_2$, $b \in B_2$, as above.

 Now, in BAT 3, on the sequence

 $$C = \{v_1, w_1, v_2, w_2, \dots, v_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}, w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}, v^{*}, v^{**}\}$$

 we perform the following comparison-interchange operations using $p$
 processors:

 $$w_1 : v_2, \quad w_2 : v_3, \quad \dots, \quad w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor} : v^{*}.$$

 Consider some comparison $w_i : v_{i+1}$ and assume that the result of this
 comparison-interchange is $w_i < v_{i+1}$ (if $v_{i+1} < w_i$ the treatment will be
 similar). Assume that $v_{i+1} \in A$ (the treatment is similar when $v_{i+1} \in B$).

 If $w_i \in B$, then we put $c(w_i) = \mathrm{ADDRESS}(v_{i+1}) - 1$. If $w_i \in A$, then let $v_j$
 be the element of $B$ whose address is $\bar{c}(v_{i+1})$. Since $v_{i+1} < v_{i+2} < \dots < v_j$,
 it follows that $v_{i+1}, v_{i+2}, \dots, v_{j-1} \in A$. Also, for every
 $v_k$, $k < i+1$, the pair containing $v_k$ is at the left of the pair containing
 $w_i$, and hence $v_k < w_i$. Thus $\bar{c}(v_{i+1})$ is the address of the smallest element
 of $B \cap C_1$ greater than $w_i$. Therefore, $c(w_i) = \min(\bar{c}(w_i), \bar{c}(v_{i+1}))$ is the
 address of the smallest element of $B \cap C$ greater than $w_i$.

 Let us now calculate $c(v_{i+1})$. If $w_{i+1} \in B$ then
 $c(v_{i+1}) = \min(\bar{c}(v_{i+1}), \mathrm{ADDRESS}(w_{i+1}) - 1)$, and if $w_{i+1} \in A$, then
 $c(v_{i+1}) = \min(\bar{c}(v_{i+1}), \bar{c}(w_{i+1}))$. Now, using the ADDRESS of every node
 we return every $w \in C$ to its original place in $A$ or $B$.

 When $m = n = p$, Algorithm A requires $\lceil \log 2m \rceil$ steps.
 
 3. Algorithm B 
 
 Consider the two ordered sets $A$, $B$ and assume that we have $m$ parallel
 processors. Algorithm B is an adaptive algorithm which inserts for every
 $a \in A$ ($b \in B$) a pointer to its place in $B$ (in $A$ respectively). (In fact, it is
 a merging algorithm.) It works as follows. Denote $r = \lceil n/(m+1) \rceil$ and let

 $$\bar{B} = \{b_r, b_{2r}, \dots, b_{mr}\}, \quad |\bar{B}| = m, \quad \bar{B} \subset B.$$

 First, using the $m$ processors, we perform on $A$ and $\bar{B}$ the Algorithm A, in
 $\lceil \log 2m \rceil$ steps. The set $\bar{B}$ divides $B$ in $m+1$ intervals $B_1, B_2, \dots, B_{m+1}$,
 every of length $\lfloor n/(m+1) \rfloor$. For every interval $B_i$ defined by $b_{(i-1)r}$ and
 $b_{ir}$, the pointers $c(b_{(i-1)r})$ and $c(b_{ir})$ define on $A$ an interval $A_i$ starting
 with the successor of the element pointed by $c(b_{(i-1)r})$ and ending with the
 element pointed by $c(b_{ir})$. Now, to every interval $B_i$ we assign $|A_i|$
 processors and for every $1 \le i \le m+1$ we perform Algorithm B on $A_i$ and $B_i$.

 Let $M(n,m,m)$ be the number of steps required by Algorithm B. Hence,

 $$M(n,m,m) = \lceil \log 2m \rceil + \max_{1 \le j \le m} M(\lfloor n/(m+1) \rfloor, j, j).$$

 Let us prove by induction on $m$ and $n$ that $M(n,m,m) \le 2\log n$. Clearly,
 $M(n,1,1) = \lceil \log(n+1) \rceil$ and $M(m,m,m) = \lceil \log 2m \rceil$. Assume that the relation
 is true for less than $m$ processors or less than $n$ elements. Then

 $$M(n,m,m) = \lceil \log 2m \rceil + \max_{1 \le j \le m} M(\lfloor n/(m+1) \rfloor, j, j)
 \le \lceil \log 2m \rceil + 2\log\lfloor n/(m+1) \rfloor \le 2\log n.$$
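 The last inequality of the induction step is stated without detail. One way to verify it, assuming $\lfloor n/(m+1) \rfloor \ge 1$ so that the logarithm is defined, is the following worked chain (a sketch, not taken from the report):

```latex
\begin{aligned}
\lceil \log 2m \rceil + 2\log\lfloor n/(m+1) \rfloor
  &\le (\log 2m + 1) + 2\log\frac{n}{m+1} \\
  &= \log 4m + 2\log n - 2\log(m+1) \\
  &= 2\log n + \log\frac{4m}{(m+1)^2} \\
  &\le 2\log n ,
\end{aligned}
\qquad\text{since } (m+1)^2 - 4m = (m-1)^2 \ge 0 .
```

 The first line uses $\lceil x \rceil \le x + 1$ and $\lfloor y \rfloor \le y$; the last uses $4m \le (m+1)^2$, which makes the remaining logarithm non-positive.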
 
 4. Algorithm C 
 
 Algorithm C (based on an observation of Valiant [5]) uses two
 parallel processors to perform on $A$, $B$ the following algorithm:

 The first processor performs $x$ comparisons starting with $a_1$, $b_1$.
 In some stage it compares $a_i$, $b_j$. Then: if $a_i < b_j$, it makes $c(a_i) = Y + j$
 and continues by comparing $a_{i+1}$, $b_j$; if $a_i > b_j$, it makes $c(b_j) = X + i$ and
 continues by comparing $a_i$, $b_{j+1}$. After the first processor finishes, the
 second processor performs $m+n-1-x$ comparisons starting with $a_m$, $b_n$. In some
 stage it compares $a_k$, $b_r$. Then: if $a_k < b_r$, it makes $c(b_r) = X+k+1$ and
 continues by comparing $a_k$, $b_{r-1}$; if $a_k > b_r$, it makes $c(a_k) = Y+r+1$ and
 continues by comparing $a_{k-1}$, $b_r$.

 It is easy to see that after both processors are finished, there is
 a pointer from every element of $A$ to its place in $B$ and from every element
 of $B$ to its place in $A$.
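 The front and back scans of Algorithm C can be simulated sequentially as below. This is a sketch under several assumptions that are not in the report: the lists are 0-based Python lists, the function name `algorithm_c` is hypothetical, the back scan simply runs until it meets the front scan, and elements left over when one list is exhausted receive their pointers without any further comparison.

```python
def algorithm_c(A, B, X, Y, x):
    """Simulate the two scans; a_i lives at address X + i, b_j at Y + j (1-based).

    Returns (cA, cB): cA[t] is the address of the smallest element of B
    greater than A[t], and cB[t] the address of the smallest element of A
    greater than B[t].
    """
    m, n = len(A), len(B)
    cA, cB = [None] * m, [None] * n
    i, j = 0, 0            # front scan position (0-based)
    k, r = m - 1, n - 1    # back scan position (0-based)

    # First processor: at most x comparisons from the smallest elements.
    for _ in range(x):
        if i > k or j > r:
            break
        if A[i] < B[j]:
            cA[i] = Y + j + 1      # address of B[j]
            i += 1
        else:
            cB[j] = X + i + 1      # address of A[i]
            j += 1

    # Second processor: comparisons from the biggest elements downwards.
    while i <= k and j <= r:
        if A[k] < B[r]:
            cB[r] = X + k + 2      # address of the successor of A[k]
            r -= 1
        else:
            cA[k] = Y + r + 2      # address of the successor of B[r]
            k -= 1

    # Assumption: whatever remains lies between the two scans, so its
    # place is already determined (only one of these loops is non-empty).
    for t in range(i, k + 1):
        cA[t] = Y + j + 1
    for t in range(j, r + 1):
        cB[t] = X + i + 1
    return cA, cB

cA, cB = algorithm_c([3, 8], [1, 5, 7, 9], X=0, Y=100, x=2)
```

 The resulting pointers do not depend on how the comparisons are split between the two scans, which is what allows the work to be shared.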
 
 5. The Main Algorithm 
 
 Let us now describe the algorithm for adaptive parallel merging of 
 A and B by disjoint comparisons using p processors. Denote 
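 $t = \lfloor \log n/m \rfloor$, $u = 2^t$, $v = \lfloor n/u \rfloor$, $s = \lceil m/p \rceil$, and let

 $$\bar{B} = \{b_u, b_{2u}, \dots, b_{vu}\}, \quad |\bar{B}| = v,$$

 $$\bar{B}_1 = \{b_{u\lceil v/p \rceil}, b_{2u\lceil v/p \rceil}, \dots, b_{(p-1)u\lceil v/p \rceil}\}, \quad |\bar{B}_1| = p - 1,$$

 $$\bar{A}_1 = \{a_s, a_{2s}, \dots, a_{(p-1)s}\}, \quad |\bar{A}_1| = p - 1.$$

 It is easy to see that $m \le v \le 2m$, $n/(2m) \le u \le n/m$, $\lceil v/p \rceil \le \lceil 2m/p \rceil$,
 $\bar{B}_1 \subset \bar{B} \subset B$ and $\bar{A}_1 \subset A$.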
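 The parameters of the Main Algorithm can be computed with integer arithmetic only. The following sketch assumes $1 \le p \le m \le n$; the function name `main_parameters` is hypothetical, and the sample sets themselves are not built here.

```python
def main_parameters(n, m, p):
    """Return (t, u, v, s) with t = floor(log2(n/m)), u = 2**t,
    v = floor(n/u) and s = ceil(m/p)."""
    if not (1 <= p <= m <= n):
        raise ValueError("expected 1 <= p <= m <= n")
    # floor(log2(n/m)) equals floor(log2(n // m)) because 2**t is an integer.
    t = (n // m).bit_length() - 1
    u = 1 << t
    v = n // u
    s = -(-m // p)                 # ceiling division without floats
    return t, u, v, s

t, u, v, s = main_parameters(n=100, m=10, p=4)
# The inequalities stated in the text, written without division:
assert 10 <= v <= 20 and 100 <= 2 * 10 * u and u * 10 <= 100
```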
 
 Our first task is to find the place of every element of $A$ in $\bar{B}$.
 In the description, we will refer to $\bar{B}$ independently of $B$, but this part
 of the algorithm can be performed while the elements of $\bar{B}$ remain in their
 places in $B$ (without recopying $\bar{B}$ out of $B$). This part works as follows.

 Using $p-1$ processors, we perform Algorithm B on $\bar{B}_1$ and $A$, and after
 this on $\bar{A}_1$ and $\bar{B}$, in $4\log 2m$ steps. The elements $w$ of $\bar{A}_1 \cup \bar{B}_1$ and their
 pointers $c(w)$ define two families of successive disjoint intervals
 $\{A_1, A_2, \dots, A_{2p-1}\}$ on $A$ and $\{B_1, B_2, \dots, B_{2p-1}\}$ on $\bar{B}$. Clearly, to find
 the place in $\bar{B}$ of an element of $A_i$ it is enough to find its place in $B_i$.
 
 The elements of $\bar{B}_1$ divide $\bar{B}$ in $p$ segments $B^1, \dots, B^p$, every of
 length at most $\lceil v/p \rceil$, and the elements of $\bar{A}_1$ divide $A$ in $p$ segments
 $A^1, \dots, A^p$, every of length at most $\lceil m/p \rceil$. For every $A_i$, $B_i$ there is a
 unique $j$ such that $A_i \subset A^j$ and a unique $k$ such that $B_i \subset B^k$. For every
 $1 \le j \le p$, let us assign the $j$-th processor to $A^j$. The segment $A^j$ is
 defined by $a_{(j-1)s}$ and $a_{js}$. From the pointers $c(a_{(j-1)s})$ and $c(a_{js})$ we
 can find the elements of $\bar{B}_1$ between $c(a_{(j-1)s})$ and $c(a_{js})$. From the
 pointers of these elements of $\bar{B}_1$ and $a_{(j-1)s}$, $a_{js}$ we can find the
 intervals $A_i, A_{i+1}, \dots, A_{i+k}$ contained in $A^j$ and their correspondents
 $B_i, B_{i+1}, \dots, B_{i+k}$. Now, using the $j$-th processor we perform sequentially
 $|A_{i+r}|$ operations as in Algorithm C on every pair $A_{i+r}$, $B_{i+r}$, $0 \le r \le k$,
 starting with the smallest elements. We perform this in parallel on all
 the segments $A^j$, $1 \le j \le p$.

 After this, for every $1 \le j \le p$ we assign the $j$-th processor to $B^j$. We find
 the intervals $B_g, B_{g+1}, \dots, B_{g+h}$ contained in $B^j$ and their correspondents
 $A_g, A_{g+1}, \dots, A_{g+h}$ as above. Now, using the $j$-th processor we perform
 sequentially $|B_{g+r}| - 1$ operations as in Algorithm C on every pair
 $A_{g+r}$, $B_{g+r}$, $0 \le r \le h$, starting with the biggest elements. We perform
 this in parallel on all the segments $B^j$, $1 \le j \le p$.

 In this way, we perform in fact Algorithm C on every pair $A_i$, $B_i$,
 $1 \le i \le 2p - 1$, and hence for every element of $A$ we obtain a pointer to its
 place in $\bar{B}$. In the above process the parallel comparisons are disjoint and
 every processor performs at most $\lceil v/p \rceil + \lceil m/p \rceil - 2 \le 3m/p$ comparisons.
 
 The elements of $\bar{B}$ divide $B - \bar{B}$ in $v + 1$ intervals, every of length
 $u - 1$. Thus, for merging $A$ and $B$ we have to insert every element $a$ of $A$ in
 the interval of $B$ between the addresses $c(a) - u$ and $c(a)$. We do this in
 the following way.

 We merge the first $p$ elements of $A$ with $B$ as follows. We consider
 in parallel the first $p$ intervals of $B$ defined by $\bar{B}$ and, using $c(b_{iu})$,
 $c(b_{(i+1)u})$, $1 \le i \le p$, we find the segment (and its length) among
 $\{a_1, \dots, a_p\}$ which must be inserted in every interval. We do the same
 on the next $p$ intervals of $B$, and so on, until we arrive at the interval
 in which $a_p$ must be inserted. Then, we assign to every interval of the
 above stage a number of processors equal to the length of the segment to be
 inserted in this interval, and we perform Algorithm B on every interval and its
 corresponding segment.

 After this, we merge the next $p$ elements of $A$ and the corresponding
 intervals of $B$, this time starting with the last interval used in the
 previous stage, and so on.

 Since $|A| = m$, this process will require $2\lceil m/p \rceil \log u = 2\lceil m/p \rceil \lfloor \log n/m \rfloor$
 steps.
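 The two-level idea behind this stage (first locate an element among the samples $b_u, b_{2u}, \dots$, then refine inside one window of fewer than $u$ elements) can be sketched sequentially. In this sketch an ordinary binary search stands in for Algorithm B, the list is 0-based, and the name `two_level_place` is hypothetical.

```python
from bisect import bisect_left

def two_level_place(a, B, u):
    """Return the number of elements of the sorted list B smaller than a,
    using a coarse search over every u-th element followed by a search
    restricted to a window of at most u - 1 elements."""
    n = len(B)
    samples = B[u - 1::u]              # b_u, b_2u, ..., b_vu (1-based indices)
    q = bisect_left(samples, a)        # a lies between sample q and sample q + 1
    lo = q * u                         # first position after sample q
    hi = min(lo + u - 1, n)            # position of sample q + 1, or end of B
    return bisect_left(B, a, lo, hi)   # refine inside the window only

B = list(range(0, 40, 2))
position = two_level_place(13, B, u=4)
```

 Because each window holds fewer than $u$ elements, the refinement needs about $\log u$ comparisons per element, which matches the $\log u$ factor in the step count.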
 
 At this stage, for every $a \in A$ we have a pointer $c(a)$ from $a$ to its
 place in $B$. It remains to adjust the pointers ADDRESS for obtaining a
 linked linear ordering of $A \cup B$, as follows. For every $1 \le i \le \lfloor m/2 \rfloor$ we
 check whether $c(a_{2i-1}) = c(a_{2i})$ and remember it. Then, for every
 $1 \le i \le \lfloor m/2 \rfloor$ we check whether $c(a_{2i}) = c(a_{2i+1})$ and remember this too.
 Now, for every $1 \le i \le m$, if $c(a_i) \ne c(a_{i+1})$, we put ADDRESS($a_i$) $= c(a_i)$
 and ADDRESS($b_{j-1}$) $= X + i + 1$, where $b_j$ is the element of $B$ whose address
 is $c(a_{i+1})$. Also, if $c(a_1)$ points to $b_{j+1}$, we put ADDRESS($b_j$) $= X + 1$.

 The number of steps required by the entire Main Algorithm is

 $$4\log 2m + 3m/p + 2\lceil m/p \rceil \lfloor \log n/m \rfloor.$$

 Since any adaptive parallel merging algorithm requires at least $(m/p)\log n/m + m/p$
 steps ([4]), the above algorithm is asymptotically optimal whenever
 $p$ is a function growing slower than $m/(4\log 2m)$.
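 To get a feeling for how the upper bound of the Main Algorithm compares with the lower bound quoted from [4], both expressions can be evaluated numerically. The function names below are hypothetical and the sample values are illustrative only; the sketch assumes $1 \le p \le m \le n$.

```python
import math

def upper_bound_steps(n, m, p):
    """4 log 2m + 3m/p + 2 ceil(m/p) floor(log(n/m)), logarithms in base 2."""
    floor_log = (n // m).bit_length() - 1      # floor(log2(n/m))
    ceil_m_over_p = -(-m // p)                 # ceil(m/p) without floats
    return 4 * math.log2(2 * m) + 3 * m / p + 2 * ceil_m_over_p * floor_log

def lower_bound_steps(n, m, p):
    """(m/p) log(n/m) + m/p, logarithms in base 2."""
    return (m / p) * math.log2(n / m) + m / p

# Illustrative values: m = 2**10, n = 2**20, p = 8.
upper = upper_bound_steps(2 ** 20, 1024, 8)
lower = lower_bound_steps(2 ** 20, 1024, 8)
ratio = upper / lower
```

 For small $p$ the terms proportional to $m/p$ dominate and the additive $4\log 2m$ term is negligible, which is the regime the optimality claim refers to.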
 
 References

 [1] K. E. Batcher, Sorting Networks and Their Applications, Proc. AFIPS Spring Joint Comp. Conf., 32 (1968), pp. 307-314.
 [2] D. E. Knuth, The Art of Computer Programming, Vol. 3, Sorting and Searching, Addison-Wesley, Reading, Mass., 1973.
 [3] A. C. Yao and F. F. Yao, Lower Bounds on Merging Networks, Technical Report UIUCDCS-R-74-680, University of Illinois at Urbana-Champaign, 1974.
 [4] F. Gavril, Merging with Parallel Processors, Technical Report, 1974.
 [5] L. G. Valiant, Parallelism in Comparison Problems, Technical Report, University of Leeds, England, 1974.
 
 