UIUCDCS-R-78-891
UILU-ENG 78 1702

PERFORMANCE EVALUATION OF MULTIPROCESSOR SYSTEMS CONTAINING SPECIAL PURPOSE PROCESSORS

by

Haruaki Yamazaki

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

January 1978

* This work was supported in part by the National Science Foundation under grant MCS 73-07980 A03.

TABLE OF CONTENTS

1 INTRODUCTION
2 NOTATIONS
3 THE SYSTEMS WITH INDEPENDENT INPUT PROCESS
3.1 The Deterministic Task Execution Time
3.1.1 The Performance of a Multiprocessor System with General Purpose Processors
3.1.2 The Performance of a Multiprocessor System with Special Purpose Processors
3.2 The Heavy Traffic Approximation with Exponential Execution Time
3.2.1 The Performance of a Multiprocessor System with General Purpose Processors
3.2.2 The Performance of a Multiprocessor System with Special Purpose Processors
4 THE SYSTEM WITH SERIAL INPUT PROCESS
4.1 The Performance of a Multiprocessor System with General Purpose Processors
4.2 The Performance of a Multiprocessor System with Special Purpose Processors
LIST OF REFERENCES
APPENDIX

LIST OF FIGURES

3.1 Decomposition of a Job
3.2 System with General Purpose Processors
3.3 Time Slot
3.4 Curves $W_S$ and $W_G$
3.5 Multiprocessor System with Special Purpose Processors
3.6 Equivalent Single Processor Model
3.7 $W_S$ and $W_G$ under Heavily Loaded Condition
3.8 $W_S$ and $W_G$ under Heavily Loaded Condition
4.1 System with General Purpose Processors
4.2 Multiprocessor System with Special Purpose Processors
4.3 $W_S$ and $W_G$ under Heavily Loaded Condition
1. INTRODUCTION

In recent years, large and complex multiprocessor systems have been designed and implemented. The architectural differences among these systems can be characterized by the way their functional units are interconnected [1]. Examples include common-bus systems, crossbar-switch systems (e.g., C.mmp [2]), and multiport-memory systems (e.g., Prime [3]).

Here we focus instead on the types of functional units (processors) in the system. Some multiprocessor systems consist of identical general purpose processors, which share the input job load under the control of the job scheduler. Other multiprocessor systems consist of special purpose (functionally dedicated) processors, each designed to execute a particular type of job or task, for example file management processors, input-output control processors, and processors for high-speed computation. These special purpose processors may be implemented in hardware, firmware, or software.

In this paper, both types of multiprocessor systems are modeled using queueing theory, and their performance is evaluated. Our purpose is not to extend known results in queueing theory itself, but to determine the architectural merits of the two types of systems. Moreover, for the multiprocessor system with special purpose processors, the optimal architecture is discussed.

The notation needed in our discussion is defined in Section 2. In Section 3, we discuss models in which the arrivals of different types of tasks are statistically independent. In Section 4, queueing networks are used to model systems in which some tasks cannot start executing until certain other tasks have completed.

2. NOTATIONS

Let us assume that jobs arriving for service at the multiprocessor system may be decomposed into a number of different tasks. There are altogether $N+1$ different types of tasks.
(For example, consider a system in which some jobs are decomposed into input tasks followed by compilation, computation, and output tasks, while other jobs are decomposed into input, sorting, merging, and output tasks. In this case, we say that there are 6 different types of tasks.) We refer to the set of type $i$ tasks as task set $J_i$.

It suffices here to consider the relative speeds of the processors. In particular, to compare a system consisting of different special purpose processors with a system consisting of identical general purpose processors, we measure their relative speeds. The relative speed of a special purpose processor with respect to a general purpose processor is called the capacity of the special purpose processor. For example, if a task takes $1/\mu$ units of time on a general purpose processor, then it takes $(1/\mu)(1/C)$ units of time on a special purpose processor with capacity $C$.

We refer to the time required to complete a task in a given system as the execution time (or service time) of the task on that system. In particular, the execution time of the task on a general purpose processor is called the amount of work for that task. Hence, if a special purpose processor with capacity $C$ completes a task in $t$ seconds, then the amount of work for that task is $tC$ units.

We measure the effectiveness of a multiprocessor system by the total amount of work remaining in the system, that is, the total time a general purpose processor would need to complete all tasks being served and waiting for service in the system. This performance measure is chosen because it depends only on the architecture of the system and not on the queueing discipline used to schedule tasks on processors.

Let $\bar{n}_q$ be the mean queue length in a system consisting of one processor with capacity $C$, and let $E(S_r)$ be the mean residual time (that is, the remaining execution time of the task in progress).
The average total amount of work remaining in the system is then

$$\bar{W} = \frac{1}{\mu}\,\bar{n}_q + C\,E(S_r) \tag{2.1}$$

where $1/\mu$ is the average amount of work for a task. The first term is the work of the queued tasks; the second converts the mean residual execution time on the processor of capacity $C$ into work. (Note that $\bar{n}_q$ and $E(S_r)$ also depend on $C$.)

3. THE SYSTEMS WITH INDEPENDENT INPUT PROCESS

In this section, we consider queueing models of multiprocessor systems in which the arrival processes of the different types of tasks are statistically independent. The general behavior of multiprocessor queues is difficult to analyze except in the simple case where the arrival process is Poisson and the service time is exponentially distributed. Since the exponential service-time assumption does not hold in our case, we are forced to use approximation methods. Again, our purpose in this section is to evaluate the relative effectiveness of different multiprocessor system architectures.

3.1 The Deterministic Task Execution Time

In this model, we assume that the amount of work required by every task is constant and identical. Furthermore, the interarrival time of jobs is exponentially distributed. Each job is decomposed into a number of tasks, as shown in Figure 3.1; we say that these tasks are generated by the job. Let $T_i$ denote the number of tasks of type $i$ generated by a job. Then the total number of tasks generated by the job is $T = T_0 + T_1 + \cdots + T_N$. We assume, furthermore, that the $T_i$ are statistically independent random variables. Let $A_i(Z)$ denote the generating function of $T_i$. Then the generating function of $T$ is $\prod_{i=0}^{N} A_i(Z)$.

[Figure 3.1 Decomposition of a Job (axes: task type, number of tasks)]

3.1.1 The Performance of a Multiprocessor System with General Purpose Processors

This system consists of $m$ general purpose processors, as shown in Figure 3.2. Since all processors are identical, a task can be executed by any of the $m$ processors. Therefore, the job scheduler assigns a waiting task to any processor as soon as it becomes idle.
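The product form of the generating function of $T$ can be checked numerically, since multiplying generating functions corresponds to convolving probability mass functions. The sketch below is illustrative only: it assumes each $T_i$ has finite support and is given as a list of probabilities indexed by task count, and both the helper names and the job mix are hypothetical.

```python
from functools import reduce


def convolve(p, q):
    """Coefficients of the product of two generating functions p(Z) and q(Z)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def total_tasks_pmf(task_pmfs):
    """pmf of T = T_0 + ... + T_N for independent T_i, i.e. the
    coefficients of A_0(Z) * A_1(Z) * ... * A_N(Z)."""
    return reduce(convolve, task_pmfs, [1.0])


def mean(pmf):
    """First derivative of the generating function at Z = 1."""
    return sum(k * p for k, p in enumerate(pmf))


# Hypothetical job mix: T_0 is 1 or 2, T_1 is always 1, T_2 is 0 or 3.
pmfs = [[0.0, 0.5, 0.5], [0.0, 1.0], [0.2, 0.0, 0.0, 0.8]]
t_pmf = total_tasks_pmf(pmfs)
print(sum(t_pmf), mean(t_pmf), sum(mean(p) for p in pmfs))
```

Differentiating the product at $Z = 1$ (where every $A_i(1) = 1$) gives $E(T) = \sum_i E(T_i)$, and the two means printed on the last line agree up to rounding.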
When all processors are busy, the task joins the common queue.

Let us take the amount of work for a task as the unit of time, or time slot (see Figure 3.3). If the interarrival time of jobs is long compared with this time slot, we can approximate the exponential distribution of the interarrival time by a geometric distribution. Specifically, the probability that $n$ jobs arrive within the interval $\Delta t$ starting at time $t$ is

$$p_n = \Pr\{n \text{ jobs arrive in } [t, t+\Delta t]\} = \frac{(\lambda\Delta t)^n}{n!}\,e^{-\lambda\Delta t}.$$

If we assume that $\lambda$ is sufficiently small ($\lambda \ll 1$), we can approximate this distribution as

$$p_0 = 1 - \lambda\Delta t, \qquad p_1 = \lambda\Delta t, \qquad p_n = 0 \ \text{ for } n = 2, 3, \ldots$$

Letting $\Delta t = 1$, the expressions above become $p_0 = 1 - \lambda$ and $p_1 = \lambda$.

[Figure 3.2 System with General Purpose Processors]

[Figure 3.3 Time Slot (slots labelled $S_i$, $S_{i+1}$)]

Let $A(Z)$ denote the generating function of the total number of tasks arriving within one time slot. Then

$$A(Z) = (1-\lambda) + \lambda\prod_{i=0}^{N} A_i(Z). \tag{3.1}$$

We assume that the job scheduler assigns tasks to the processors at the beginning of each time slot. If $t_n$ is the $n$th epoch of this time slot, the total number of tasks at $t_n + 0$ forms a Markov chain. Since the execution time of a task is one time slot, the total number of tasks at $t_n + 0$ equals the amount of work in the system at $t_n + 0$. Let $P_j$ be the probability that $j$ tasks arrive during one time slot (i.e., $A(Z) = \sum_{j=0}^{\infty} P_j Z^j$). Assuming that the number of processors $m$ is larger than $\lambda E(T)$, we solve for the mean number of tasks in the system at equilibrium. Let $\Pi_{ij}$ be the probability that the total number of tasks in the system goes from $i$ to $j$ in one slot; then

$$\Pi_{ij} = \begin{cases} 0 & \text{for } j < i - m \\ P_j & \text{for } i < m \\ P_{j-i+m} & \text{for } i \ge m,\ j \ge i - m \end{cases} \tag{3.2}$$

At equilibrium, the probability $\Pi_j$ of having $j$ tasks in the system is $\Pi_j = \sum_{i=0}^{\infty}\Pi_{ij}\,\Pi_i$.
Hence the generating function $P(Z)$ of the equilibrium distribution is

$$P(Z) = \sum_{j=0}^{\infty}\Pi_j Z^j = \sum_{j=0}^{\infty}\sum_{i=0}^{\infty}\Pi_{ij}\Pi_i Z^j = \sum_{i=0}^{\infty}\Pi_i\sum_{j=0}^{\infty}\Pi_{ij}Z^j.$$

From the conditions in (3.2), we get

$$P(Z) = \Pi_0\sum_{j=0}^{\infty}P_j Z^j + \cdots + \Pi_{m-1}\sum_{j=0}^{\infty}P_j Z^j + \sum_{i=m}^{\infty}\Pi_i\sum_{j=i-m}^{\infty}P_{j-i+m}Z^j.$$

Thus,

$$P(Z) = \sum_{i=0}^{m-1}\Pi_i\sum_{j=0}^{\infty}P_j Z^j + \sum_{i=m}^{\infty}\Pi_i\sum_{j=i-m}^{\infty}P_{j-i+m}Z^{j-i+m}\,Z^{i-m}.$$

Since $A(Z) = \sum_{j=0}^{\infty}P_j Z^j$,

$$P(Z) = \sum_{i=0}^{m-1}\Pi_i A(Z) + \sum_{i=m}^{\infty}\Pi_i A(Z) Z^{i-m} = \sum_{i=0}^{m-1}\Pi_i A(Z) + \frac{A(Z)}{Z^m}\Big(P(Z) - \sum_{i=0}^{m-1}\Pi_i Z^i\Big).$$

In other words,

$$P(Z) = \frac{A(Z)\Big(\sum_{i=0}^{m-1}\Pi_i Z^i - Z^m\sum_{i=0}^{m-1}\Pi_i\Big)}{A(Z) - Z^m}.$$

Let $F(Z) = \sum_{i=0}^{m-1}\Pi_i(Z^i - Z^m)$; then

$$P(Z) = \frac{A(Z)F(Z)}{A(Z) - Z^m}. \tag{3.3}$$

Since $\lim_{Z\to 1}P(Z) = 1$ while both the numerator and the denominator of (3.3) vanish at $Z = 1$, L'Hôpital's rule gives

$$1 = \lim_{Z\to 1}\frac{A'(Z)F(Z) + A(Z)F'(Z)}{A'(Z) - mZ^{m-1}} = \frac{A'(1)F(1) + A(1)F'(1)}{A'(1) - m}.$$

In other words, since $A(1) = 1$, $A'(1)F(1) + F'(1) = A'(1) - m$. But $F(1) = 0$; hence

$$F'(1) = A'(1) - m. \tag{3.4}$$

However,

$$F'(Z) = \sum_{i=0}^{m-1}\big(iZ^{i-1} - mZ^{m-1}\big)\Pi_i, \qquad -F'(1) = \sum_{i=0}^{m-1}(m-i)\,\Pi_i.$$

The sum on the right-hand side is the mean number of idle processors, so $m + F'(1)$ is the mean number of busy processors. Therefore, by Eq. (3.4),

$$A'(1) = E\{\text{number of busy processors}\}.$$

The average number of tasks in the system at each epoch is derived in the Appendix. It is given by

$$\bar{n} = \sum_{i=0}^{m-2}\frac{1}{1-Z_i} - \frac{m(m-1)}{2(m - A'(1))} + A'(1) + \frac{A''(1)}{2(m - A'(1))} \tag{3.5}$$

where $Z_0, Z_1, \ldots, Z_{m-2}$ are the zeros of $1 - Z^m/A(Z)$ within the unit circle.

If we assume that every job is decomposed into a number of tasks that is an integer multiple of $m$, that is, that $\prod_{i=0}^{N}A_i(Z)$ is a polynomial in $Z^m$, then we can solve explicitly for $\bar{n}$. (This model can be reduced to the single processor case, but here we derive the explicit expression from Eq. (3.5).) In this case, $\Pi_i = 0$ for $i = 1, 2, \ldots, m-1$, and, by the Appendix, the first term in Eq. (3.5) can be expressed as $F''(1)/(2F'(1))$.
Since

$$F''(1) = \frac{d^2F(Z)}{dZ^2}\bigg|_{Z=1} = \sum_{i=0}^{m-1}\Pi_i\big\{i(i-1)Z^{i-2} - m(m-1)Z^{m-2}\big\}\bigg|_{Z=1},$$

we have, using $\sum_{i=0}^{\infty}\Pi_i = 1$,

$$F''(1) + m(m-1) = \sum_{i=1}^{m-1}i(i-1)\,\Pi_i + \sum_{i=m}^{\infty}m(m-1)\,\Pi_i.$$

Let $p = \sum_{i=m}^{\infty}\Pi_i$. Then

$$F''(1) + m(m-1) = \sum_{i=1}^{m-1}i(i-1)\,\Pi_i + m(m-1)\,p,$$

and therefore, since $\Pi_i = 0$ for $1 \le i \le m-1$, $F''(1) = m(m-1)(p-1)$. Meanwhile,

$$A'(1) = \sum_{i=1}^{m-1}i\,\Pi_i + \sum_{i=m}^{\infty}m\,\Pi_i = mp.$$

Therefore, using Eq. (3.4),

$$\frac{F''(1)}{2F'(1)} = \frac{m(m-1)(p-1)}{2m(p-1)} = \frac{m-1}{2}.$$

Thus we get

$$\bar{n} = \frac{m-1}{2} - \frac{m(m-1)}{2(m - A'(1))} + A'(1) + \frac{A''(1)}{2(m - A'(1))}. \tag{3.6}$$

According to Eq. (3.4), the mean number of busy processors is $A'(1)$; hence $\bar{n} - A'(1)$ tasks are in the queue. For an arbitrary time $t$, the mean residual execution time of a task in service is $1/2$. The average amount of work in the system is therefore

$$W_G = \bar{n} - A'(1) + \frac{1}{2}A'(1)$$

or

$$W_G = \frac{m-1}{2} - \frac{m(m-1)}{2(m - A'(1))} + \frac{A'(1)}{2} + \frac{A''(1)}{2(m - A'(1))} \tag{3.7}$$

where $m > A'(1)$. Figure 3.4 shows the behavior of $W_G$ for $A(Z) = 1 - \lambda + \lambda\prod_{i=0}^{2}A_i(Z)$ of this form and $m = N + 1 = 3$.

3.1.2 The Performance of a Multiprocessor System with Special Purpose Processors

This system consists of $N + 1$ subsystems of special purpose processors, as shown in Figure 3.5, where $m = m_0 + m_1 + \cdots + m_N$. Since a special purpose processor can execute only one type of task, the job scheduler assigns a type $i$ task to the $i$th subsystem or lets it join the $i$th queue. Consider the number of tasks in the $i$th subsystem. If $C_i$ is the capacity of each processor in this subsystem ($C_i \ge 1$), the execution time of a task is $1/C_i$. Therefore, if we define this execution time as a time slot, a job arrives during one time slot with probability $\lambda/C_i$ and brings $T_i$ tasks to the subsystem. Let $B_i(Z)$ be the generating function of the number of tasks arriving during the execution of a task, and $P_i(Z)$ the generating function of the number of tasks in the $i$th subsystem. Then the discussion of the previous section applies. According to Eq. (3.3),

$$P_i(Z) = \frac{B_i(Z)F_i(Z)}{B_i(Z) - Z^{m_i}}, \qquad B_i(Z) = 1 - \frac{\lambda}{C_i} + \frac{\lambda}{C_i}A_i(Z).$$

From Eq. (3.5), the average number of tasks in the $i$th subsystem, $\bar{n}_i$,
is given by

$$P_i'(1) = \bar{n}_i = \sum_{j=0}^{m_i-2}\frac{1}{1-Z_j^{(i)}} - \frac{m_i(m_i-1)}{2(m_i - B_i'(1))} + B_i'(1) + \frac{B_i''(1)}{2(m_i - B_i'(1))},$$

where $Z_0^{(i)}, \ldots, Z_{m_i-2}^{(i)}$ are the zeros of $1 - Z^{m_i}/B_i(Z)$ within the unit circle.

[Figure 3.4 Curves $W_S$ and $W_G$ (curves labelled $\lambda = 0.25$)]

[Figure 3.5 Multiprocessor System with Special Purpose Processors]

Thus, if we assume that $A_i(Z)$ is a polynomial in $Z^{m_i}$, the average amount of work in the $i$th subsystem, $W_i$, is given by Eq. (3.7):

$$W_i = \frac{m_i-1}{2} - \frac{m_i(m_i-1)}{2(m_i - B_i'(1))} + \frac{B_i'(1)}{2} + \frac{B_i''(1)}{2(m_i - B_i'(1))},$$

where $B_i(Z) = 1 - \lambda/C_i + (\lambda/C_i)A_i(Z)$. Moreover, the total mean amount of work in this system, $W_S$, is

$$W_S = \sum_{i=0}^{N}\left\{\frac{m_i-1}{2} - \frac{m_i(m_i-1)}{2(m_i - B_i'(1))} + \frac{B_i'(1)}{2} + \frac{B_i''(1)}{2(m_i - B_i'(1))}\right\} \tag{3.8}$$

where $m = \sum_{i=0}^{N}m_i$. In particular, if $C_i = C$, $m_i = 1$, and $N + 1 = m$, then Eq. (3.8) becomes

$$W_S = \frac{\lambda}{2}\sum_{i=0}^{N}\left\{\frac{A_i'(1)}{C} + \frac{A_i''(1)}{C - \lambda A_i'(1)}\right\}.$$

Figure 3.4 also shows the behavior of $W_S$ in the case $N + 1 = m = 3$, with $m_i = 1$ and the same $A_i(Z)$ (a power of $Z$) for $i = 0, 1, 2$. The curves $W_G$ and $W_S$ then follow from Eqs. (3.7) and (3.8), with $W_S$ finite for $\lambda < C/3$.

3.2 The Heavy Traffic Approximation with Exponential Execution Time

In this model, we assume that the amount of work required by a type $i$ task is exponentially distributed with parameter $\alpha_i$. Furthermore, we assume that each arriving job is decomposed into $N + 1$ tasks of different types. In other words, if $S_i$ is the amount of work required by the type $i$ task, the total amount of work generated by the job is $S = S_0 + S_1 + \cdots + S_N$, where the $S_i$ are statistically independent random variables. Moreover, we assume that the interarrival time of jobs is exponentially distributed with parameter $\lambda$ and that the system is heavily loaded.
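Returning briefly to Section 3.1: Eqs. (3.6)-(3.8) are straightforward to evaluate, and (3.6) can be cross-checked against the Markov chain (3.2) itself. The sketch below is a hedged illustration, not the report's own computation. It assumes a hypothetical job mix with two tasks of each of $N + 1 = 3$ types ($A_i(Z) = Z^2$, so at $\lambda = 0.25$ the function $A(Z) = 0.75 + 0.25Z^6$ is a polynomial in $Z^3$ with $m = 3$). The function names, the truncation level and the iteration count of the chain are illustrative choices.

```python
def mean_tasks_eq36(a1, a2, m):
    """Eq. (3.6): mean number of tasks at a slot epoch when the arrival
    generating function is a polynomial in Z**m; a1 = A'(1), a2 = A''(1)."""
    if not m > a1:
        raise ValueError("stability requires m > A'(1)")
    return (m - 1) / 2 - m * (m - 1) / (2 * (m - a1)) + a1 + a2 / (2 * (m - a1))


def work_eq37(a1, a2, m):
    """Eq. (3.7): queued tasks carry one unit of work each and every busy
    processor has mean residual work 1/2, so W = n - A'(1)/2."""
    return mean_tasks_eq36(a1, a2, m) - a1 / 2


def w_general(lam, a1s, a2s, m):
    """W_G for A(Z) = (1 - lam) + lam * prod_i A_i(Z), eq. (3.1),
    given A_i'(1) (a1s) and A_i''(1) (a2s); uses A_i(1) = 1."""
    s1 = sum(a1s)
    cross = s1 ** 2 - sum(a ** 2 for a in a1s)  # sum over i != j of A_i'(1) A_j'(1)
    return work_eq37(lam * s1, lam * (sum(a2s) + cross), m)


def w_special(lam, a1s, a2s, caps, procs):
    """W_S, eq. (3.8): eq. (3.7) per subsystem with
    B_i(Z) = 1 - lam/C_i + (lam/C_i) A_i(Z); assumes A_i is a polynomial in Z**m_i."""
    return sum(work_eq37(lam / c * a1, lam / c * a2, mi)
               for a1, a2, c, mi in zip(a1s, a2s, caps, procs))


def mean_tasks_chain(lam, k, m, max_tasks=300, iters=1500):
    """Stationary mean of the chain (3.2) for A(Z) = 1 - lam + lam * Z**k,
    by power iteration on a truncated state space."""
    pi = [1.0] + [0.0] * max_tasks
    for _ in range(iters):
        nxt = [0.0] * (max_tasks + 1)
        for i, p in enumerate(pi):
            if p:
                base = max(i - m, 0)              # up to m tasks finish per slot
                nxt[base] += p * (1 - lam)        # no job arrives in the slot
                nxt[min(base + k, max_tasks)] += p * lam
        pi = nxt
    return sum(i * p for i, p in enumerate(pi))


# Hypothetical mix: A_i(Z) = Z**2 for three task types, lambda = 0.25, m = 3.
lam, a1s, a2s = 0.25, [2, 2, 2], [2, 2, 2]
print(w_general(lam, a1s, a2s, 3), mean_tasks_chain(lam, 6, 3) - lam * 6 / 2)
for cap in (1.0, 1.2, 1.5):
    print(cap, w_special(lam, a1s, a2s, [cap] * 3, [1, 1, 1]))
```

For this symmetric mix, the two architectures give the same amount of work at $C = 1$ ($W_G = W_S = 2.25$), and $W_S$ falls below $W_G$ as $C$ grows.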
3.2.1 The Performance of a Multiprocessor System with General Purpose Processors

This system is similar to the one shown in Figure 3.2. Let $t_n$ denote the epoch at which the $n$th job arrives. Let $W_n$ be the amount of work remaining at $t_n$, and let $A_n$ and $B_n$ be the amounts of work completed and arrived, respectively, during $[t_n, t_{n+1})$. Then

$$W_{n+1} = \begin{cases} W_n + B_n - A_n & \text{if } W_n + B_n - A_n > 0 \\ 0 & \text{if } W_n + B_n - A_n \le 0 \end{cases} \tag{3.9}$$

Here we assume the heavy traffic condition: the number of tasks arriving during $[t_n, t_{n+1})$ is large enough that all $m$ processors are busy all the time. Therefore, the distribution of $A_n$ does not depend on $n$: $A_n$ has the same distribution as $mA$, where $A$ is the interarrival time of two jobs. Since $A$ is exponentially distributed with parameter $\lambda$, $mA$ is exponentially distributed with parameter $\lambda/m$. Also, since $B_n$ has the same distribution as $S$, we can write Eq. (3.9) as

$$W_{n+1} = \begin{cases} W_n + (S - mA) & \text{if } W_n + S - mA > 0 \\ 0 & \text{if } W_n + S - mA \le 0 \end{cases} \tag{3.10}$$

The right-hand side of this equation is the waiting time of the $(n+1)$th job in a system consisting of one processor, in which the execution time of a job is $S$ and the interarrival time of jobs is $mA$. Thus the mean waiting time in the single server system shown in Figure 3.6 equals the average amount of work remaining in our original model. Since $\Pr\{A \le t\} = 1 - e^{-\lambda t}$, and $S = S_0 + S_1 + \cdots + S_N$ with $\Pr\{S_i \le x\} = 1 - e^{-\alpha_i x}$,

$$E[mA] = \frac{m}{\lambda}, \qquad E[S] = \sum_{i=0}^{N}\frac{1}{\alpha_i}, \qquad E[S^2] = \Big(\sum_{i=0}^{N}\frac{1}{\alpha_i}\Big)^2 + \sum_{i=0}^{N}\Big(\frac{1}{\alpha_i}\Big)^2.$$
In the case of a single server, the mean queue length and the mean residual time are given by [4]:

$$\bar{n}_q = \frac{(\lambda/m)^2E(S^2)}{2(1-\rho)}, \qquad E(S_r) = \frac{(\lambda/m)E(S^2)}{2},$$

where $\rho = (\lambda/m)E(S) = (\lambda/m)\sum_{i=0}^{N}(1/\alpha_i)$.

[Figure 3.6 Equivalent Single Processor Model]

Therefore, as described in Section 2, the mean waiting time in queue of this single server system is

$$E(S)\,\bar{n}_q + E(S_r) = \frac{(\lambda/m)E(S^2)}{2(1-\rho)}. \tag{3.11}$$

Moreover, the mean amount of work remaining in our original system, $W_G$, is

$$W_G = \frac{\lambda/m}{2}\left\{\Big(\sum_{i=0}^{N}\frac{1}{\alpha_i}\Big)^2 + \sum_{i=0}^{N}\Big(\frac{1}{\alpha_i}\Big)^2\right\}\frac{1}{1-\rho}, \qquad \rho = \frac{\lambda}{m}\sum_{i=0}^{N}\frac{1}{\alpha_i},$$

or

$$W_G = \frac{1}{2}\,\frac{\rho}{1-\rho}\left\{\sum_{i=0}^{N}\frac{1}{\alpha_i} + \frac{\sum_{i=0}^{N}(1/\alpha_i)^2}{\sum_{i=0}^{N}1/\alpha_i}\right\}.$$

3.2.2 The Performance of a Multiprocessor System with Special Purpose Processors

This system is similar to the one shown in Figure 3.5. Let $C_i$ be the capacity of each processor in the $i$th subsystem, and let $m_i$ and $W_n^{(i)}$ be, respectively, the number of processors in the $i$th subsystem and the amount of work remaining there at the $n$th epoch. Applying the same argument as in Section 3.2.1, we express $W_{n+1}^{(i)}$ as

$$W_{n+1}^{(i)} = \begin{cases} W_n^{(i)} + S_i - C_i m_i A & \text{if } W_n^{(i)} + S_i - C_i m_i A > 0 \\ 0 & \text{if } W_n^{(i)} + S_i - C_i m_i A \le 0 \end{cases}$$

Under heavy traffic conditions, this is the waiting time in queue of a single processor system in which the interarrival time of tasks is $C_i m_i A$ and the execution time of a job is $S_i$. Since $S_i$ and $C_i m_i A$ are exponentially distributed,

$$E[C_i m_i A] = \frac{C_i m_i}{\lambda}, \qquad E[S_i] = \frac{1}{\alpha_i}, \qquad E[S_i^2] = \frac{2}{\alpha_i^2}.$$

Consequently, the mean waiting time in queue of this single processor system equals the average amount of work remaining in the $i$th subsystem, $W_S^{(i)}$, which is given by

$$W_S^{(i)} = \frac{\lambda}{C_i m_i}\cdot\frac{E[S_i^2]}{2}\cdot\frac{1}{1-\rho_i} = \frac{\lambda}{C_i m_i\,\alpha_i^2}\,\frac{1}{1-\rho_i},$$
where $\rho_i = \lambda/(C_i m_i\alpha_i)$. Thus the total average amount of work remaining, $W_S$, is

$$W_S = \sum_{i=0}^{N}W_S^{(i)} = \sum_{i=0}^{N}\frac{\lambda}{C_i m_i\alpha_i^2}\,\frac{1}{1 - \dfrac{\lambda}{\alpha_i C_i m_i}} \tag{3.12}$$

or

$$W_S = \lambda\sum_{i=0}^{N}\frac{1}{\alpha_i}\,\frac{1}{\alpha_i C_i m_i - \lambda}. \tag{3.13}$$

We can minimize $W_S$ with respect to the $m_i$ under the condition $m = \sum_{i=0}^{N}m_i$. Since $W_S(m_0, m_1, \ldots, m_N)$ is a convex function, we can use the method of Lagrange multipliers. Let

$$\tilde{W}_S(m_0, m_1, \ldots, m_N, \beta) = W_S + \beta\Big(\sum_{i=0}^{N}m_i - m\Big).$$

Solving $\partial\tilde{W}_S/\partial m_i = 0$, we get

$$\frac{\lambda C_i}{(\alpha_i C_i m_i - \lambda)^2} = \beta.$$

Therefore,

$$m_i = \frac{1}{\alpha_i C_i}\left(\lambda + \sqrt{\frac{\lambda C_i}{\beta}}\right). \tag{3.14}$$

Since $m = \sum_{i=0}^{N}m_i$,

$$m = \lambda\sum_{i=0}^{N}\frac{1}{\alpha_i C_i} + \sqrt{\frac{\lambda}{\beta}}\sum_{i=0}^{N}\frac{1}{\alpha_i\sqrt{C_i}}.$$

Thus

$$\sqrt{\frac{\lambda}{\beta}} = \frac{m - \lambda\sum_{i=0}^{N}\dfrac{1}{\alpha_i C_i}}{\sum_{i=0}^{N}\dfrac{1}{\alpha_i\sqrt{C_i}}}. \tag{3.15}$$

Therefore, from Eqs. (3.14) and (3.15), we get

$$m_i = \frac{1}{\alpha_i C_i}\left(\lambda + \sqrt{C_i}\;\frac{m - \lambda\sum_{j=0}^{N}\dfrac{1}{\alpha_j C_j}}{\sum_{j=0}^{N}\dfrac{1}{\alpha_j\sqrt{C_j}}}\right). \tag{3.16}$$

Furthermore, since $\alpha_i C_i m_i - \lambda = \sqrt{\lambda C_i/\beta}$ at the optimum, the minimum value of $W_S$ is

$$\min(W_S) = \lambda\sum_{i=0}^{N}\frac{1}{\alpha_i\sqrt{\lambda C_i/\beta}} = \frac{\lambda\Big(\sum_{i=0}^{N}\dfrac{1}{\alpha_i\sqrt{C_i}}\Big)^2}{m - \lambda\sum_{j=0}^{N}\dfrac{1}{\alpha_j C_j}},$$

that is,

$$\min(W_S) = \frac{\Big(\sum_{i=0}^{N}\dfrac{1}{\alpha_i\sqrt{C_i}}\Big)^2}{\dfrac{m}{\lambda} - \sum_{j=0}^{N}\dfrac{1}{\alpha_j C_j}}. \tag{3.17}$$

If $C_i = C$ for all $i$, Eq. (3.16) simplifies to

$$m_i = m\,\frac{1/\alpha_i}{\sum_{j=0}^{N}1/\alpha_j}. \tag{3.18}$$

Clearly, when $C_i = C$, the values of $m_i$ that minimize $W_S$ are independent of $C$. In this case, $\min(W_S)$ is given by

$$\min(W_S) = \frac{1}{C}\Big(\sum_{j=0}^{N}\frac{1}{\alpha_j}\Big)^2\Big(\frac{m}{\lambda} - \frac{1}{C}\sum_{j=0}^{N}\frac{1}{\alpha_j}\Big)^{-1}. \tag{3.19}$$

Figure 3.7 shows the curves $W_S = \min(W_S)$ and $W_G$ in the case $N + 1 = 3$, $1/\alpha_i = i + 1$ for $i = 0, 1, 2$, $C_i = C$, and $m = 10$. As expected, under heavily loaded conditions the optimized system with special purpose processors performs better when its processors are faster ($C > 1$). Even in the case $C = 1$ (i.e., no improvement in processor speed), the value of $W_S$ is almost comparable to that of $W_G$.
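The allocation (3.16) and the minimum (3.17) can be evaluated directly and checked against Eq. (3.13). The sketch below treats the $m_i$ as real numbers, as the Lagrange argument does. It uses the Figure 3.7 parameters $1/\alpha_i = i + 1$ and $m = 10$, with $\lambda = 1.5$ assumed here so that $\rho = (\lambda/m)\sum_i 1/\alpha_i = 0.9$. The function names are hypothetical.

```python
import math


def w_special_exp(lam, alphas, caps, procs):
    """Eq. (3.13): W_S = lam * sum_i 1 / (alpha_i * (alpha_i C_i m_i - lam))."""
    total = 0.0
    for a, c, mi in zip(alphas, caps, procs):
        if a * c * mi <= lam:
            raise ValueError("every subsystem needs rho_i < 1")
        total += 1.0 / (a * (a * c * mi - lam))
    return lam * total


def _sums(alphas, caps):
    s1 = sum(1.0 / (a * c) for a, c in zip(alphas, caps))             # sum 1/(alpha C)
    s2 = sum(1.0 / (a * math.sqrt(c)) for a, c in zip(alphas, caps))  # sum 1/(alpha sqrt C)
    return s1, s2


def optimal_allocation(lam, alphas, caps, m):
    """Eq. (3.16): real-valued m_i minimizing (3.13) subject to sum m_i = m."""
    s1, s2 = _sums(alphas, caps)
    root = (m - lam * s1) / s2                  # sqrt(lam / beta), eq. (3.15)
    return [(lam + math.sqrt(c) * root) / (a * c) for a, c in zip(alphas, caps)]


def min_w_special(lam, alphas, caps, m):
    """Eq. (3.17)."""
    s1, s2 = _sums(alphas, caps)
    return s2 ** 2 / (m / lam - s1)


def w_general_exp(lam, alphas, m):
    """W_G of Section 3.2.1, from eq. (3.11)."""
    s1 = sum(1.0 / a for a in alphas)
    s2 = sum(1.0 / a ** 2 for a in alphas)
    rho = lam / m * s1
    return (lam / m) * (s1 ** 2 + s2) / (2 * (1 - rho))


alphas, m, lam = [1.0, 1 / 2, 1 / 3], 10, 1.5
alloc = optimal_allocation(lam, alphas, [1.0] * 3, m)
print(alloc, sum(alloc))
for cap in (1.0, 1.2, 1.5):
    print(cap, min_w_special(lam, alphas, [cap] * 3, m))
print("W_G", w_general_exp(lam, alphas, m))
```

With $C = 1$ the allocation is $m_i \propto 1/\alpha_i$, i.e. $(5/3, 10/3, 5)$ as Eq. (3.18) predicts. The optimized $W_S$ is then 54 at $\rho = 0.9$, against $W_G = 37.5$, and it drops to 18 at $C = 1.2$. This matches the comparison in Figure 3.7.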
Under heavily loaded conditions, the scheduling flexibility of the general purpose architecture makes little difference, which explains why $W_S$ with $C = 1$ remains comparable to $W_G$.

Figure 3.8 shows another case, where $N + 1 = m = 3$, $1/\alpha_i = (0.9)^i$, and $C_i = C$ for $i = 0, 1, 2$, so that $m_i = 1$ for all $i$. Since every subsystem then consists of a single processor (an exact single-server queue), Eq. (3.12) holds for any value of $\rho_i$, not only for $\rho_i$ close to 1. Clearly, $W_S$ cannot be optimized in this case.

[Figure 3.7 $W_S$ and $W_G$ under Heavily Loaded Condition: $W_G$ and $W_S$ for $C = 1, 1.2, 1.5$, plotted against $\rho$ from 0.7 to 0.95]

[Figure 3.8 $W_S$ and $W_G$ under Heavily Loaded Condition]

4. THE SYSTEM WITH SERIAL INPUT PROCESS

4.1 The Performance of a Multiprocessor System with General Purpose Processors

This system consists of $m$ general purpose processors, as shown in Figure 4.1. Since the tasks of a job must be executed in order, the execution time of a job is

$$S = S_0 + S_1 + \cdots + S_N,$$

where $S_i$ is the amount of work for a type $i$ task. We again assume heavily loaded conditions, as in Section 3.2. Since the model is then the same as that of Section 3.2, Eq. (3.11) gives the average amount of work in the system:

$$W_G = \frac{\lambda/m}{2}\left\{\Big(\sum_{i=0}^{N}\frac{1}{\alpha_i}\Big)^2 + \sum_{i=0}^{N}\Big(\frac{1}{\alpha_i}\Big)^2\right\}\frac{1}{1-\rho}, \qquad \rho = \frac{\lambda}{m}\sum_{i=0}^{N}\frac{1}{\alpha_i}.$$

[Figure 4.1 System with General Purpose Processors]

4.2 The Performance of a Multiprocessor System with Special Purpose Processors

This system is shown in Figure 4.2. The $i$th subsystem consists of $m_i$ special purpose processors, where $m = m_0 + m_1 + \cdots + m_N$. Since a special purpose processor can execute only one type of task, each job must pass through the series of $N + 1$ subsystems. Consider the number of tasks in the $i$th subsystem, where $C_i$ is the capacity of a processor in the $i$th subsystem ($C_i \ge 1$).
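Sections 3.2.1 and 4.1 both rest on the single-server reduction (3.10), and the recursion can be iterated directly as a rough sanity check of Eq. (3.11). The sketch below is a Monte Carlo illustration under assumed parameters: three exponential task types with $1/\alpha_i = 1, 2, 3$, $m = 10$, and $\lambda$ chosen so that $\rho = 0.5$. The function names, seed and sample size are arbitrary choices, and the estimate carries a sampling error of a few percent.

```python
import random


def simulate_lindley(lam, alphas, m, n_jobs, seed=1):
    """Average of W_n under the recursion (3.10):
    W_{n+1} = max(W_n + S - m*A, 0), with S = S_0 + ... + S_N,
    S_i ~ Exp(alpha_i) and A ~ Exp(lam)."""
    rng = random.Random(seed)
    w, total = 0.0, 0.0
    for _ in range(n_jobs):
        s = sum(rng.expovariate(a) for a in alphas)
        w = max(w + s - m * rng.expovariate(lam), 0.0)
        total += w
    return total / n_jobs


def waiting_time_eq311(lam, alphas, m):
    """Eq. (3.11): (lam/m) E(S^2) / (2 (1 - rho))."""
    es = sum(1.0 / a for a in alphas)
    es2 = es ** 2 + sum(1.0 / a ** 2 for a in alphas)
    rho = lam / m * es
    if rho >= 1:
        raise ValueError("requires rho < 1")
    return (lam / m) * es2 / (2 * (1 - rho))


alphas, m = [1.0, 1 / 2, 1 / 3], 10
lam = 0.5 * m / sum(1.0 / a for a in alphas)   # gives rho = 0.5
print(waiting_time_eq311(lam, alphas, m), simulate_lindley(lam, alphas, m, 200_000))
```

The closed form gives $25/6 \approx 4.17$ here. The simulated average should land close to it, since the equivalent single-server queue has Poisson arrivals.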
The execution time of a job at a processor in the $i$th subsystem is exponentially distributed with parameter $C_i\alpha_i$. Therefore, by Burke's theorem [5], the interarrival time of jobs to the $i$th subsystem is also exponentially distributed with parameter $\lambda$. If we assume heavily loaded conditions (all processors busy all the time), the average amount of type $i$ work in the $i$th subsystem, $W'^{(i)}_S$, is

$$W'^{(i)}_S = \frac{1}{\alpha_i}\,\frac{\rho_i}{1-\rho_i}, \qquad \rho_i = \frac{\lambda}{C_i m_i\alpha_i}.$$

However, the jobs in the $i$th subsystem carry additional work, namely their tasks of types $i+1, i+2, \ldots, N$. Therefore, the amount of work in the $i$th subsystem, $W_S^{(i)}$, is

$$W_S^{(i)} = \bar{n}_i\sum_{j=i+1}^{N}\frac{1}{\alpha_j} + W'^{(i)}_S,$$

where $\bar{n}_i$ is the mean number of jobs in the $i$th subsystem. Since $\bar{n}_i = \rho_i/(1-\rho_i)$ [3], we get

$$W_S^{(i)} = \frac{\rho_i}{1-\rho_i}\sum_{j=i+1}^{N}\frac{1}{\alpha_j} + \frac{1}{\alpha_i}\,\frac{\rho_i}{1-\rho_i}.$$

[Figure 4.2 Multiprocessor System with Special Purpose Processors]

Therefore, the total average amount of work, $W_S$, is

$$W_S = \sum_{i=0}^{N}W_S^{(i)} = \sum_{i=0}^{N}\frac{\rho_i}{1-\rho_i}\sum_{j=i+1}^{N}\frac{1}{\alpha_j} + \sum_{i=0}^{N}\frac{1}{\alpha_i}\,\frac{\rho_i}{1-\rho_i} = \sum_{i=0}^{N}\sum_{j=i}^{N}\frac{1}{\alpha_j}\,\frac{\rho_i}{1-\rho_i} \tag{4.1}$$

where $\rho_i = \lambda/(C_i m_i\alpha_i)$. If $C_i = C$ for all $i$, then

$$W_S = \sum_{i=0}^{N}\sum_{j=i}^{N}\frac{1}{\alpha_j}\,\frac{\lambda/(m_i\alpha_i)}{C - \lambda/(m_i\alpha_i)}. \tag{4.2}$$

Figure 4.3 shows the behavior of $W_G$ and $W_S$ in the case $N + 1 = m = 3$, $1/\alpha_i = (0.9)^i$, and $C_i = C$.

[Figure 4.3 $W_S$ and $W_G$ under Heavily Loaded Condition: $W_G$ and $W_S$ for $C = 1, 1.2, 1.5, 2$, plotted against $\rho$ from 0.7 to 0.95]

LIST OF REFERENCES

[1] Enslow, P. H., Jr., "Multiprocessor Organization - A Survey," Computing Surveys, Vol. 9, No. 1, March 1977.
[2] Wulf, W. A. and Bell, C. G., "C.mmp - A Multi-Mini-Processor," Proc.
of AFIPS 1972 Fall Joint Computer Conference, Vol. 41, 1972, pp. 765-777.
[3] Baskin, H. B., Borgerson, B. R. and Roberts, R., "Prime - A Modular Architecture for Terminal-Oriented Systems," AFIPS 1972 Spring Joint Computer Conference, 1972, pp. 431-437.
[4] Coffman, E. G. and Denning, P. J., Operating Systems Theory, Prentice-Hall, Englewood Cliffs, New Jersey, 1973, pp. 157-162.
[5] Kleinrock, L., Queueing Systems, Vol. 1, Wiley, 1975, pp. 147-160.

APPENDIX

In this appendix, we derive Eq. (3.5), the average number of tasks in the equilibrium state of the system consisting of $m$ general purpose processors (Figure 3.2). We therefore need $\lim_{Z\to 1}dP(Z)/dZ$, which is given by

$$\lim_{Z\to 1}\frac{dP(Z)}{dZ} = \lim_{Z\to 1}\frac{\big(A(Z)F(Z)\big)'\big(A(Z) - Z^m\big) - A(Z)F(Z)\big(A'(Z) - mZ^{m-1}\big)}{\big(A(Z) - Z^m\big)^2}.$$

Let $N(Z)$ and $D(Z)$ be the numerator and the denominator, respectively, of the fraction above. Since $\lim_{Z\to 1}N(Z) = 0$ and $\lim_{Z\to 1}D(Z) = 0$, we need expressions for both $dN(Z)/dZ$ and $dD(Z)/dZ$. Clearly,

$$\frac{dD(Z)}{dZ} = 2\big(A'(Z) - mZ^{m-1}\big)\big(A(Z) - Z^m\big),$$

and therefore $\lim_{Z\to 1}dD(Z)/dZ = 0$. Also, writing $A$ and $F$ for $A(Z)$ and $F(Z)$,

$$\frac{dN(Z)}{dZ} = (A''F + 2A'F' + AF'')(A - Z^m) - AF\big(A'' - m(m-1)Z^{m-2}\big) = F''(A^2 - Z^mA) + F'(2AA' - 2A'Z^m) + F\big(m(m-1)Z^{m-2}A - Z^mA''\big). \tag{A-1}$$

Since $\lim_{Z\to 1}dN(Z)/dZ = \lim_{Z\to 1}\{F'(2A' - 2A')\} = 0$, we need to find $d^2N(Z)/dZ^2$ and $d^2D(Z)/dZ^2$. Differentiating the expression in Eq.
(A-1), we have

$$\frac{d^2N(Z)}{dZ^2} = F'''(A^2 - Z^mA) + F''\big(2AA' - Z^mA' - mZ^{m-1}A + 2AA' - 2A'Z^m\big) + F'\big(2A'^2 + 2AA'' - 2(Z^mA'' + mZ^{m-1}A') + m(m-1)Z^{m-2}A - Z^mA''\big) + F\big(m(m-1)Z^{m-2}A' + m(m-1)(m-2)Z^{m-3}A - Z^mA''' - mZ^{m-1}A''\big).$$

Since $F(1) = 0$ and $A(1) = 1$,

$$\lim_{Z\to 1}\frac{d^2N(Z)}{dZ^2} = F''(1)\big(4A'(1) - m - A'(1) - 2A'(1)\big) + F'(1)\big(2A'(1)^2 + 2A''(1) - 2A''(1) - 2mA'(1) + m(m-1) - A''(1)\big),$$

$$\lim_{Z\to 1}\frac{d^2D(Z)}{dZ^2} = 2\lim_{Z\to 1}\big(A'(Z) - mZ^{m-1}\big)^2 = 2\big(A'(1) - m\big)^2 = 2F'(1)^2.$$

Hence, using $A'(1) - m = F'(1)$ from Eq. (3.4),

$$N''(1) = F''(1)F'(1) + F'(1)\big(2A'(1)F'(1) + m(m-1) - A''(1)\big), \qquad D''(1) = 2F'(1)^2.$$

Thus,

$$\lim_{Z\to 1}\frac{dP(Z)}{dZ} = \frac{F''(1)}{2F'(1)} + \frac{2A'(1)F'(1) + m(m-1) - A''(1)}{2F'(1)}. \tag{A-2}$$

In other words, the mean number of tasks in the system is

$$\bar{n} = \frac{F''(1)}{2F'(1)} + \frac{m(m-1)}{2F'(1)} - \frac{A''(1)}{2F'(1)} + A'(1) \tag{A-3}$$

where $F(Z) = \sum_{i=0}^{m-1}\Pi_i(Z^i - Z^m)$. In particular, if $m = 1$, then $F'(1) = A'(1) - 1 = -(1-\rho)$, where $\rho$ is the mean number of arrivals during one time slot, and $F''(1) = 0$. Hence

$$\bar{n} = \rho + \frac{A''(1)}{2(1-\rho)}.$$

We now find the value of $F''(1)/F'(1)$. According to Eq. (3.3), $P(Z)$ is expressed as

$$P(Z) = \frac{A(Z)F(Z)}{A(Z) - Z^m}.$$

Since $P(Z)$ is analytic and bounded within the unit circle, every zero of the denominator within the unit circle must also be a zero of the numerator. By Rouché's theorem, $F(Z)$ can therefore be expressed as

$$F(Z) = K(Z-1)(Z-Z_0)\cdots(Z-Z_{m-2}),$$

where the $Z_i$ are the roots of $A(Z) = Z^m$ inside the unit circle other than $Z = 1$. Thus,

$$F'(Z) = K\Big\{\prod_{i=0}^{m-2}(Z-Z_i) + (Z-1)\sum_{i=0}^{m-2}\prod_{j\ne i}(Z-Z_j)\Big\}, \qquad F'(1) = K\prod_{i=0}^{m-2}(1-Z_i).$$

Therefore,

$$F''(Z) = K\Big\{2\sum_{i=0}^{m-2}\prod_{j\ne i}(Z-Z_j) + (Z-1)\Big(\sum_{i=0}^{m-2}\prod_{j\ne i}(Z-Z_j)\Big)'\Big\}, \qquad F''(1) = 2K\sum_{i=0}^{m-2}\prod_{j\ne i}(1-Z_j).$$

Then we get

$$\frac{F''(1)}{F'(1)} = \frac{2K\sum_{i=0}^{m-2}\prod_{j\ne i}(1-Z_j)}{K\prod_{i=0}^{m-2}(1-Z_i)} = 2\sum_{i=0}^{m-2}\frac{1}{1-Z_i}.$$

Therefore, the mean number of tasks is

$$\bar{n} = \sum_{i=0}^{m-2}\frac{1}{1-Z_i} - \frac{m(m-1)}{2(m - A'(1))} + A'(1) + \frac{A''(1)}{2(m - A'(1))},$$

where $Z_0, Z_1, \ldots, Z_{m-2}$ are the zeros of $1 - Z^m/A(Z)$ inside the unit circle.
BIBLIOGRAPHIC DATA SHEET

Sponsoring Organization: National Science Foundation, Washington, D.C.

Abstract: In this paper, multiprocessor systems are characterized by the types of processors in the system. Some multiprocessor systems consist of identical general purpose processors which share the input job load under the control of the job scheduler. Other multiprocessor systems consist of special purpose processors, each of which is designed to execute a particular type of job. Both types of multiprocessor systems are modeled queueing theoretically, and their performance is evaluated to determine their relative architectural merits. In the case of the multiprocessor system with special purpose processors, the optimal architecture is discussed.

Key Words: Multiprocessor systems, functionally dedicated modules.

Security Classification (This Report / This Page): UNCLASSIFIED