Report No. UIUCDCS-R-75-696

Adaptive merging by parallel disjoint comparisons

by

Fanica Gavril

January 1975

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the National Science Foundation under Grant No. US NSF GJ-41538.

Abstract

Consider two linearly ordered sets $A, B$, $|A| = m$, $|B| = n$ and $p$ ($p \le m \le n$) parallel processors. The paper presents an adaptive parallel merging algorithm by disjoint comparisons, which requires at most $4\log_2 2m + 3m/p + 2\lceil m/p \rceil \lfloor \log_2 (n/m) \rfloor$ steps.

1. Introduction

In recent years, there has been a growing interest in developing efficient algorithms for parallel processors, and the present paper continues this line of research.

Consider two disjoint linearly ordered sets $A = \{a_1 < a_2 < \dots < a_m\}$, $B = \{b_1 < b_2 < \dots < b_n\}$, $m \le n$, which are subsets of a linearly ordered set $S$. The problem of merging $A$ and $B$ is to find the linear ordering of $A \cup B$. Throughout the paper we will assume that $m \le n$, $mn > 1$ and we have $p$ parallel processors, $p \le m \le n$.
In the efficiency evaluations we will consider only the comparisons between elements of $A$ and elements of $B$. A step is a set of comparisons performed at the same time by the $p$ parallel processors. As usual, $\lceil x \rceil$ will denote the smallest integer not smaller than $x$, $\lfloor x \rfloor$ will denote the biggest integer not bigger than $x$, and $|V|$ will denote the number of elements of a set $V$. Throughout the paper, the logarithms are in base 2.

We will assume that the given sets $A, B$ are sequential lists in the memory of the computer, the elements of $A$ being located in the nodes whose addresses are $X+1, \dots, X+m$ and the elements of $B$ in the nodes whose addresses are $Y+1, \dots, Y+n$, with $X+m < Y$. For every $w \in A \cup B$ we keep in the node of $w$ a field for a pointer variable $c(w)$ and a field called $\mathrm{ADDRESS}(w)$ which contains the original address of $w$ plus one, i.e., for $a_i \in A$, $\mathrm{ADDRESS}(a_i) = X+i+1$ and for $b_j \in B$, $\mathrm{ADDRESS}(b_j) = Y+j+1$. At the beginning, $c(w) = \infty$ for every $w \in A \cup B$. In the comparison-interchange operations, when we move $w$ we move its entire node, hence we can know later from $\mathrm{ADDRESS}(w)$ what its original address was. For an element $a \in A$ let $b_j, b_{j+1}$ be the place in $B$ in which it must be inserted for obtaining the linear ordering of $\{a\} \cup B$ (that is, $b_j < a < b_{j+1}$); by inserting a pointer from $a$ to its place in $B$ we mean that we make $c(a)$ equal to the address of the smallest element of $B$ greater than $a$, i.e., $c(a) = Y+j+1$.

A merging algorithm is called adaptive if the result of some set of comparisons may be used to decide which comparisons will be performed in further steps. A merging algorithm is called nonadaptive if the comparisons to be performed during the algorithm are prescribed initially and remain fixed irrespective of the result of any particular step.

Generally, two different assumptions can be made on the given parallel processor. One assumption is that any memory word can be accessed by any number of processors at the same time.
The other assumption is that a memory word can be accessed by only one processor at a time, i.e., the comparisons performed at the same time are disjoint. Based on the above assumptions, we can have four kinds of parallel merging algorithms:

a) nonadaptive by disjoint comparisons;
b) nonadaptive by comparisons not necessarily disjoint;
c) adaptive by disjoint comparisons;
d) adaptive by comparisons not necessarily disjoint.

The most efficient known algorithm for nonadaptive merging by disjoint comparisons with one processor is the algorithm of Batcher [1] (see merging networks in [2]). It works inductively as follows.

Batcher's algorithm. Let $A, B$ be the sets to be merged.

BAT 1. Merge (by comparison-interchanges in place) $A_1 = \{a_1, a_3, \dots, a_{2\lceil m/2 \rceil - 1}\}$ with $B_1 = \{b_1, b_3, \dots, b_{2\lceil n/2 \rceil - 1}\}$, obtaining the sorted result $C_1 = \{v_1, v_2, \dots, v_{\lceil m/2 \rceil + \lceil n/2 \rceil}\}$.

BAT 2. Merge $A_2 = \{a_2, a_4, \dots, a_{2\lfloor m/2 \rfloor}\}$ with $B_2 = \{b_2, b_4, \dots, b_{2\lfloor n/2 \rfloor}\}$, obtaining the sorted result $C_2 = \{w_1, w_2, \dots, w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}\}$.

BAT 3. On the sequence
$$C = \{v_1, w_1, v_2, w_2, \dots, v_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}, w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}, v^{*}, v^{**}\}$$
perform comparison-interchange operations of the following pairs:
$$w_1 : v_2,\quad w_2 : v_3,\quad \dots,\quad w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor} : v^{*}$$
($v^{*} = v_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor + 1}$ if $m$ or $n$ is odd, and it does not exist otherwise; $v^{**} = v_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor + 2}$ if $m$ and $n$ are odd, and it does not exist otherwise). The result will be sorted.

For a complete proof see Knuth [2]. This algorithm requires at most $((m+n)/2)\log m + n$ steps. As proved by A. and F. Yao [3], such an algorithm requires at least $(n/2)\log(m+1)$ steps. Therefore, the algorithm of Batcher is asymptotically optimal.

This algorithm can be easily implemented as a nonadaptive merging algorithm by disjoint comparisons with $p$ processors as follows. Perform BAT 1 using $\lceil p/2 \rceil$ processors and BAT 2 using $\lfloor p/2 \rfloor$ processors, in parallel. After this, perform BAT 3 using $p$ processors.
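The inductive scheme BAT 1, BAT 2, BAT 3 can be sketched sequentially in Python. This is only a minimal illustration under stated assumptions, not the report's implementation: the function name `odd_even_merge`, the use of list slicing to form the odd-indexed and even-indexed subsequences, and the out-of-place construction of the sequence C are choices made here for brevity, whereas the report works by comparison-interchanges in place.

```python
from typing import List, Sequence


def odd_even_merge(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Merge two sorted sequences following the BAT 1 - BAT 3 scheme."""
    m, n = len(a), len(b)
    # Trivial cases: an empty operand needs no comparisons.
    if m == 0:
        return list(b)
    if n == 0:
        return list(a)
    # Base case m = n = 1: a single comparison-interchange.
    if m == 1 and n == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]

    # BAT 1: merge the odd-indexed elements a1, a3, ... with b1, b3, ...
    v = odd_even_merge(a[0::2], b[0::2])
    # BAT 2: merge the even-indexed elements a2, a4, ... with b2, b4, ...
    w = odd_even_merge(a[1::2], b[1::2])

    # BAT 3: build C = v1, w1, v2, w2, ..., followed by the leftover v*, v**.
    k = len(w)  # k = floor(m/2) + floor(n/2)
    c: List[int] = []
    for i in range(k):
        c.append(v[i])
        c.append(w[i])
    c.extend(v[k:])  # zero, one or two trailing elements of v

    # Comparison-interchange w_i : v_{i+1}; the pairs occupy disjoint positions.
    for i in range(k):
        lo, hi = 2 * i + 1, 2 * i + 2
        if hi < len(c) and c[lo] > c[hi]:
            c[lo], c[hi] = c[hi], c[lo]
    return c


print(odd_even_merge([1, 4, 6], [2, 3, 5, 7]))
```

The pairs compared in the last loop never share a position, so in a parallel setting with p processors those k comparison-interchanges could be grouped into batches of at most p disjoint comparisons each.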
A nonadaptive parallel merging algorithm by comparisons not necessarily disjoint can be obtained from Batcher's algorithm by performing in parallel the first $p$ required comparisons, then the next $p$ comparisons and so on, without caring whether the comparisons are disjoint or not.

Adaptive parallel merging by comparisons not necessarily disjoint was discussed in [4], which contains an algorithm requiring at most $2\lceil \log(2m+1) \rceil + \lfloor 3m/p \rfloor + \lceil m/p \rceil \lfloor \log (n/m) \rfloor$ steps.

The purpose of this paper is to describe an adaptive parallel merging algorithm by disjoint comparisons which requires at most $4\log 2m + 3m/p + 2\lceil m/p \rceil \lfloor \log (n/m) \rfloor$ steps and is asymptotically optimal whenever $p$ is a function growing slower than $m/(4\log 2m)$. Algorithms A, B, C are auxiliary, and they are used in the Main Algorithm.

2. Algorithm A

Consider the two ordered sets $A, B$ and the $p$ parallel processors. Algorithm A is a modification of Batcher's algorithm which inserts for every $a \in A$ ($b \in B$) a pointer to its place in $B$ (in $A$ respectively).

Algorithm A works inductively as follows. Perform BAT 1 using $\lceil p/2 \rceil$ processors and for every $a \in A_1$ ($b \in B_1$) make $c(a)$ ($c(b)$) equal to the address in $B$ (in $A$) of the smallest element of $B \cap C_1$ ($A \cap C_1$) greater than $a$ ($b$ respectively). Let us denote these values of $c(a)$, $c(b)$ by $\bar{c}(a)$, $\bar{c}(b)$. Perform BAT 2 using $\lfloor p/2 \rfloor$ processors and assign the values $c(a)$, $c(b)$, $a \in A_2$, $b \in B_2$, as above. Now, in BAT 3, on the sequence
$$C = \{v_1, w_1, v_2, w_2, \dots, v_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}, w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor}, v^{*}, v^{**}\}$$
we perform the following comparison-interchange operations using $p$ processors:
$$w_1 : v_2,\quad w_2 : v_3,\quad \dots,\quad w_{\lfloor m/2 \rfloor + \lfloor n/2 \rfloor} : v^{*}.$$

Consider some comparison $w_i : v_{i+1}$ and assume that the result of this comparison-interchange is $w_i < v_{i+1}$ (if $v_{i+1} < w_i$ the treatment will be similar). Assume that $v_{i+1} \in A$ (the treatment is similar when $v_{i+1} \in B$). If $w_i \in B$, then we put $c(w_i) = \mathrm{ADDRESS}(v_{i+1}) - 1$. If $w_i \in A$, then let $v_j$ be the element of $B$ whose address is $\bar{c}(v_{i+1})$. Since $v_{i+1} < v_{i+2} < \dots$
$< v_j$, it follows that $v_{i+1}, v_{i+2}, \dots, v_{j-1} \in A$. Also, for every $v_k$, $k < i+1$, the pair containing $v_k$ is at the left of the pair containing $w_i$, and hence $v_k < w_i$. Thus $\bar{c}(v_{i+1})$ is the address of the smallest element of $B \cap C_1$ greater than $w_i$. Therefore, $c(w_i) = \min(\bar{c}(w_i), \bar{c}(v_{i+1}))$ is the address of the smallest element of $B \cap C$ greater than $w_i$. Let us now calculate $c(v_{i+1})$. If $w_{i+1} \in B$ then $c(v_{i+1}) = \min(\bar{c}(v_{i+1}), \mathrm{ADDRESS}(w_{i+1}) - 1)$, and if $w_{i+1} \in A$, then $c(v_{i+1}) = \min(\bar{c}(v_{i+1}), \bar{c}(w_{i+1}))$.

Now, using the ADDRESS of every node we return every $w \in C$ to its original place in $A$ or $B$. When $m = n = p$, Algorithm A requires $\lceil \log 2m \rceil$ steps.

3. Algorithm B

Consider the two ordered sets $A, B$ and assume that we have $m$ parallel processors. Algorithm B is an adaptive algorithm which inserts for every $a \in A$ ($b \in B$) a pointer to its place in $B$ (in $A$ respectively). (In fact, it is a merging algorithm.) It works as follows.

Denote $r = \lceil n/(m+1) \rceil$ and let $\bar{B} = \{b_r, b_{2r}, \dots, b_{mr}\}$, $|\bar{B}| = m$, $\bar{B} \subset B$. First, using the $m$ processors, we perform on $A$ and $\bar{B}$ the Algorithm A, in $\lceil \log 2m \rceil$ steps. The set $\bar{B}$ divides $B$ in $m+1$ intervals $B_1, B_2, \dots, B_{m+1}$, each of length $\lfloor n/(m+1) \rfloor$. For every interval $B_i$ defined by $b_{(i-1)r}$ and $b_{ir}$, the pointers $c(b_{(i-1)r})$ and $c(b_{ir})$ define on $A$ an interval $A_i$ starting with the successor of the element pointed to by $c(b_{(i-1)r})$ and ending with the element pointed to by $c(b_{ir})$. Now, to every interval $B_i$ we assign $|A_i|$ processors and for every $1 \le i \le m+1$ we perform Algorithm B on $A_i$ and $B_i$.

Let $M(n,m,m)$ be the number of steps required by Algorithm B. Hence,
$$M(n,m,m) = \lceil \log 2m \rceil + \max_{1 \le j \le m} M(\lfloor n/(m+1) \rfloor, j, j).$$
Let us prove by induction on $m$ and $n$ that $M(n,m,m) \le 2\log n$. Clearly, $M(n,1,1) = \lceil \log(n+1) \rceil$ and $M(m,m,m) = \lceil \log 2m \rceil$. Assume that the relation is true for less than $m$ processors or less than $n$ elements. Then
$$M(n,m,m) = \lceil \log 2m \rceil + \max_{1 \le j \le m} M(\lfloor n/(m+1) \rfloor, j, j) \le \lceil \log 2m \rceil + 2\log \lfloor n/(m+1) \rfloor \le 2\log n.$$

4.
Algorithm C

Algorithm C (based on an observation of Valiant [5]) uses two parallel processors to perform on $A, B$ the following algorithm.

The first processor performs $x$ comparisons starting with $a_1, b_1$. In some stage it compares $a_i, b_j$. Then: if $a_i < b_j$, it makes $c(a_i) = Y + j$ and continues by comparing $a_{i+1}, b_j$; if $a_i > b_j$, it makes $c(b_j) = X + i$ and continues by comparing $a_i, b_{j+1}$.

After the first processor finishes, the second processor performs $m+n-1-x$ comparisons starting with $a_m, b_n$. In some stage it compares $a_k, b_r$. Then: if $a_k < b_r$, it makes $c(b_r) = X+k+1$ and continues by comparing $a_k, b_{r-1}$; if $a_k > b_r$, it makes $c(a_k) = Y+r+1$ and continues by comparing $a_{k-1}, b_r$.

It is easy to see that after both processors are finished, there is a pointer from every element of $A$ to its place in $B$ and from every element of $B$ to its place in $A$.

5. The Main Algorithm

Let us now describe the algorithm for adaptive parallel merging of $A$ and $B$ by disjoint comparisons using $p$ processors. Denote $t = \lfloor \log (n/m) \rfloor$, $u = 2^t$, $v = \lfloor n/u \rfloor$, $s = \lceil m/p \rceil$, and let
$$\bar{B} = \{b_u, b_{2u}, \dots, b_{vu}\},\quad |\bar{B}| = v,$$
$$\bar{B}_1 = \{b_{u\lceil v/p \rceil}, b_{2u\lceil v/p \rceil}, \dots, b_{(p-1)u\lceil v/p \rceil}\},\quad |\bar{B}_1| = p-1,$$
$$\bar{A}_1 = \{a_s, a_{2s}, \dots, a_{(p-1)s}\},\quad |\bar{A}_1| = p-1.$$
It is easy to see that $m \le v \le 2m$, $n/(2m) \le u \le n/m$, $\lceil v/p \rceil \le \lceil 2m/p \rceil$, $\bar{B}_1 \subset \bar{B} \subset B$ and $\bar{A}_1 \subset A$.

Our first task is to find the place of every element of $A$ in $\bar{B}$. In the description, we will refer to $\bar{B}$ independently of $B$, but this part of the algorithm can be performed while the elements of $\bar{B}$ remain in their places in $B$ (without recopying $\bar{B}$ out of $B$). This part works as follows.

Using $p-1$ processors, we perform Algorithm B on $\bar{B}_1$ and $A$, and after this on $\bar{A}_1$ and $\bar{B}$, in $4\log 2m$ steps. The elements $w$ of $\bar{A}_1 \cup \bar{B}_1$ and their pointers $c(w)$ define two families of successive disjoint intervals $\{A_1, A_2, \dots, A_{2p-1}\}$ on $A$ and $\{B_1, B_2, \dots, B_{2p-1}\}$ on $\bar{B}$. Clearly, to find the place in $\bar{B}$ of an element of $A_i$ it is enough to find its place in $B_i$. The elements of $\bar{B}_1$
divide $\bar{B}$ in $p$ segments $B^1, \dots, B^p$, each of length at most $\lceil v/p \rceil$, and the elements of $\bar{A}_1$ divide $A$ in $p$ segments $A^1, \dots, A^p$, each of length at most $\lceil m/p \rceil$. For every $A_i$, $B_i$ there is a unique $j$ such that $A_i \subset A^j$ and a unique $k$ such that $B_i \subset B^k$.

For every $1 \le j \le p$, let us assign the $j$-th processor to $A^j$. The segment $A^j$ is defined by $a_{(j-1)s}$ and $a_{js}$. From the pointers $c(a_{(j-1)s})$ and $c(a_{js})$ we can find the elements of $\bar{B}_1$ between $c(a_{(j-1)s})$ and $c(a_{js})$. From the pointers of these elements of $\bar{B}_1$ and $a_{(j-1)s}$, $a_{js}$ we can find the intervals $A_i, A_{i+1}, \dots, A_{i+k}$ contained in $A^j$ and their correspondents $B_i, B_{i+1}, \dots, B_{i+k}$. Now, using the $j$-th processor we perform sequentially $|A_{i+r}|$ operations as in Algorithm C on every pair $A_{i+r}$, $B_{i+r}$, $1 \le r \le k$, starting with the smallest elements. We perform this in parallel on all the segments $A^j$, $1 \le j \le p$.

After this, for every $1 \le j \le p$ we assign the $j$-th processor to $B^j$. We find the intervals $B_l, B_{l+1}, \dots, B_{l+h}$ contained in $B^j$ and their correspondents $A_l, A_{l+1}, \dots, A_{l+h}$ as above. Now, using the $j$-th processor we perform sequentially $|B_{l+r}| - 1$ operations as in Algorithm C on every pair $A_{l+r}$, $B_{l+r}$, $1 \le r \le h$, starting with the biggest elements. We perform this in parallel on all the segments $B^j$, $1 \le j \le p$.

In this way, we perform in fact Algorithm C on every pair $A_i$, $B_i$, $1 \le i \le 2p-1$, and hence for every element of $A$ we obtain a pointer to its place in $\bar{B}$. In the above process the parallel comparisons are disjoint and every processor performs at most $\lceil v/p \rceil + \lceil m/p \rceil - 2 \le 3m/p$ comparisons.

The elements of $\bar{B}$ divide $B - \bar{B}$ in $v+1$ intervals, each of length $u-1$. Thus, for merging $A$ and $B$ we have to insert every element $a$ of $A$ in the interval of $B$ between the addresses $c(a) - u$ and $c(a)$. We do this in the following way.

We merge the first $p$ elements of $A$ with $B$ as follows. We consider in parallel the first $p$ intervals of $B$ defined by $\bar{B}$, and using $c(b_{iu})$,
$c(b_{(i+1)u})$, $1 \le i \le p$, we find the segment (and its length) among $\{a_1, \dots, a_p\}$ which must be inserted in every interval. We do the same on the next $p$ intervals of $B$, and so on, until we arrive at the interval in which $a_p$ must be inserted. Then, we assign to every interval of the above stage a number of processors equal to the length of the segment to be inserted in this interval, and we perform Algorithm B on every interval and its corresponding segment. After this, we merge the next $p$ elements of $A$ and the corresponding intervals of $B$, this time starting with the last interval used in the previous stage, and so on. Since $|A| = m$, this process will require $2\lceil m/p \rceil \log u = 2\lceil m/p \rceil \lfloor \log (n/m) \rfloor$ steps.

At this stage, for every $a \in A$ we have a pointer $c(a)$ from $a$ to its place in $B$. It remains to adjust the pointers ADDRESS for obtaining a linked linear ordering of $A \cup B$, as follows. For every $1 \le i \le \lfloor m/2 \rfloor$ we check whether $c(a_{2i-1}) = c(a_{2i})$ and remember it. Then, for every $1 \le i \le \lfloor m/2 \rfloor$ we check whether $c(a_{2i}) = c(a_{2i+1})$ and remember this too. Now, for every $1 \le i \le m$, if $c(a_i) \ne c(a_{i+1})$, we put $\mathrm{ADDRESS}(a_i) = c(a_i)$ and $\mathrm{ADDRESS}(b_{j-1}) = X + i + 1$, where $b_j$ is the element of $B$ whose address is $c(a_{i+1})$. Also, if $c(a_1)$ points to $b_{j+1}$, we put $\mathrm{ADDRESS}(b_j) = X + 1$.

The number of steps required by the entire Main Algorithm is $4\log 2m + 3m/p + 2\lceil m/p \rceil \lfloor \log (n/m) \rfloor$. Since any adaptive parallel merging algorithm requires at least $(m/p)\log (n/m) + m/p$ steps ([4]), the above algorithm is asymptotically optimal whenever $p$ is a function growing slower than $m/(4\log 2m)$.

References

[1] K. E. Batcher, Sorting Networks and Their Applications, Proc. AFIPS Spring Joint Comp. Conf., 32 (1968), pp. 307-314.
[2] D. E. Knuth, The Art of Computer Programming, Vol. 3, Sorting and Searching, Addison-Wesley, Reading, Mass., 1973.
[3] A. C. Yao and F. F. Yao, Lower Bounds on Merging Networks, Technical Report UIUCDCS-R-74-680, University of Illinois at Urbana-Champaign, 1974.
[4] F.
Gavril, Merging with Parallel Processors, Technical Report, 1974.
[5] L. G. Valiant, Parallelism in Comparison Problems, Technical Report, University of Leeds, England, 1974.

BIBLIOGRAPHIC DATA SHEET

1. Report No.: UIUCDCS-R-75-696
4. Title and Subtitle: Adaptive merging by parallel disjoint comparisons
5. Report Date: January 1975
7. Author(s): Fanica Gavril
8. Performing Organization Rept. No.: UIUCDCS-R-75-696
9. Performing Organization Name and Address: University of Illinois at Urbana-Champaign, Department of Computer Science, Urbana, Illinois 61801
11. Contract/Grant No.: US NSF-GJ-41538
12. Sponsoring Organization Name and Address: National Science Foundation, Washington, D. C. 20550
18. Availability Statement: Unlimited
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages: 14