Report No. 363

THE USE AND PERFORMANCE OF MEMORY HIERARCHIES: A SURVEY

by

D. J. Kuck
D. H. Lawrie

December 4, 1969

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

TABLE OF CONTENTS

I.   Introduction
II.  Page Fault Rate
     2.1 EFFECT OF PRIMARY MEMORY ALLOTMENT ON PAGE FAULT RATE
     2.2 EFFECT OF PAGE SIZE AND PRIMARY MEMORY ALLOTMENT ON PAGE FAULT RATE
         2.2.1 FRAGMENTATION AND PAGE SIZE
         2.2.2 SUPERFLUITY VS. PAGE SIZE
         2.2.3 PRIMARY MEMORY ALLOTMENT AND PAGE SIZE
     2.3 REPLACEMENT ALGORITHMS
     2.4 PROGRAM ORGANIZATION
     2.5 SUMMARY
III. Multiprogramming
IV.  Average Time Per I/O Request
     4.1 PHYSICAL LATENCY OF SECONDARY MEMORY
     4.2 EFFECTIVE LATENCY OF SECONDARY MEMORY
     4.3 REQUEST QUEUEING
     4.4 MINIMIZATION OF EXPLICIT I/O REQUEST TIME
V.   Summary and Extensions
LIST OF FOOTNOTES
BIBLIOGRAPHY

LIST OF FIGURES

1.  Mean time t_p to reference p pages as a function of p.
2a. E vs. (p,T) surface for q = 2 x 10^…, α = 3.8, β = 2.4.
2b. E vs. (p,T) surface for q = 5 x 10^…, α = 3.8, β = 2.4.
3.  Memory fragmentation with four pages of size b_1 = 4Q, b_2 = 1.5Q, b_3 = 3.2Q and b_4 = 4Q. B = 4Q.
4a. Page fault rate λ as a function of primary allotment m and page size B. Data for a FORTRAN compiler is from Anacker and Wang [4]. Note the B scale is logarithmic.
4b. Page fault rate λ vs. m and B. Data for a SNOBOL compiler from Varian and Coffman [135]. Note the λ scale is different than Figure 4a. Dashed lines indicate a locus of equal λ.
5a. CPU efficiency as a function of the number of jobs J and average I/O completion time T. Average page rate is 1/(3.8 (64/J)^2.4) and explicit I/O interrupts occur every 10K instructions on the average.
5b. CPU efficiency as a function of J and T. Average page rate is 1/(3.8 (64/J)^2.4) and explicit I/O interrupts occur every … instructions on the average.
5c. CPU efficiency as a function of J and T. Average page rate is 1/(3.8 (32/J)^2.4) and explicit I/O interrupts occur every 10K instructions on the average.
6.  Relative gain G in efficiency over monoprogramming for the optimal number of jobs vs. average I/O completion time (normalized). α = 3.8, β = 2.4. Numbers on curves indicate the optimal number of jobs.

LIST OF TABLES

I.  Summary of Results from Varian and Coffman [135].

I. Introduction

The fundamental reason for using memory hierarchies in computer systems is to reduce the system cost. System designers must balance the system cost savings accruing from a memory hierarchy against the system performance degradation sometimes caused by the hierarchy. Since modern computers are being used for a great variety of applications in diverse user environments, the hardware and software systems engineers' task is becoming quite complex. In this paper we shall discuss a number of the hardware and software elements of a memory hierarchy in a computer system. Included are several models and attempts at optimization.

Computer engineers may choose from a number of optimization criteria in designing a computer system. Examples are system response time, system cost, and central processing unit (CPU) utilization.
We shall primarily discuss CPU utilization and then relate this to system cost. Such considerations as interrupt hardware and scheduling algorithms determine response time and are outside the scope of this paper.

In order to discuss CPU utilization, let us list a number of reasons for non-utilization of the CPU. That is, assuming that a user or system program is being executed by the CPU, what may be the causes of subsequent CPU idleness?

1) The computation is completed.

2) A user program generates an internal interrupt due to, e.g., an arithmetic fault.

3) A user program generates an explicit I/O request to secondary storage.

4) The system generates an internal interrupt due to, e.g., a page fault.

5) The system generates a timer interrupt.

6) The system receives an external interrupt from, e.g., a real time device.

We are using "system" here to mean hardware, firmware, or software. Point 1) will be implicitly included in some of our discussions by assuming a distribution of execution times. Point 2) will not be discussed. Point 3) will be discussed in some detail and point 4) will be given a thorough discussion. Points 5) and 6) fall under system response time and will not be explicitly discussed.

If a program (instructions and data) is being executed, let us define a page fault to be the generation by the system of an address outside the machine's primary memory. This leads to the generation by the system of an I/O request to the secondary memory. Now we can describe the CPU idle time for both points 3) and 4) above by

    CPU I/O idle time = number of I/O requests x average time per I/O request.

In this equation, "average time per I/O request" is the interval from when an I/O request occurs until some user program is again started. Notice that we are including both the case of explicit, user initiated I/O requests and the case of implicit, system generated page faults which lead to I/O requests to the secondary memory. Much of our discussion will be centered on the minimization of one or the other of the terms on the right hand side of this equation.

It should be observed that this equation holds for multiprogrammed as well as monoprogrammed systems. In a monoprogrammed system, the "average time per I/O request" is defined as the interval from when an I/O request occurs for some program until that program is again started. We regard the execution of operating system instructions as CPU idle time. In a multiprogramming situation, the average time per I/O request is decreased by allowing several users to interleave their I/O requests, and we shall also deal with this case.

II. Page Fault Rate

In this section we will deal with the first term on the right hand side of the equation of Section I. In particular, we will restrict our attention to the rate of generation of page fault I/O requests, explicit I/O requests being ignored. We consider only demand paging, where one page at a time is obtained from secondary memory.

2.1 EFFECT OF PRIMARY MEMORY ALLOTMENT ON PAGE FAULT RATE

Obviously, the page fault rate will be zero if all of a program's instructions and data are allowed to occupy primary memory. On the other hand, it has been demonstrated that a small memory allotment can lead to disastrous paging rates.
The relationship between primary memory allotment and page faults has been studied by a number of workers [12, 40, 41, 95, 109, 125, 127, 128, 132] and many experiments have been conducted to determine program paging behavior [4, 9, 11, 18, 27, 55, 62, 95, 108, 111, 133, 135]. One of the statistics which is of interest is the length of the average execution burst. We will define an execution burst as the number of instructions executed between successive page faults (footnote 1).

Let λ_i be the rate at which a new page is referenced, given that the i most recently referenced pages are in primary memory. Then the mean time to reference p pages (footnote 5) is

    t_p = Σ_{i=1}^{p-1} 1/λ_i .

Given an empirical curve for t_p = f(p) (Figure 1), we can determine the λ_i, since

    t_{p+1} - t_p = Σ_{i=1}^{p} 1/λ_i - Σ_{i=1}^{p-1} 1/λ_i = 1/λ_p .

[Table I. Summary of Results from Varian and Coffman [135]; the tabulated execution burst data are not recoverable from this copy (see footnote 2).]

[Figure 1. Mean time t_p to reference p pages as a function of p.]

Since t_{p+1} - t_p = f(p+1) - f(p) ≈ df(p)/dp, we have

    1/λ_p ≈ df(p)/dp .   (1)

Thus, we can determine the λ_i probabilities by examining empirical t_p curves. We will model the t_p function of a program with the formula

    f(p) = δ p^γ .   (2)

This formula has been applied to the f(p) data presented by Fine, et al. [55] and it was determined that δ = 1.1 and γ = 3.4 (footnote 6). Using Eqs. (1) and (2), where Δp = 1, we find

    1/λ_p ≈ df/dp = δγ p^(γ-1) ≈ α p^β   (3)

or 1/λ_p = 3.8 p^2.4 (footnote 7).

Given we are in state p (the p most recently referenced pages in primary memory), the probability of referencing a new page (page fault) at time t, assuming a Poisson distribution, is given by p(t|p) = 1 - e^(-λ_p t). Now, if we assume that we force the system to remain in state p by replacing the least recently used page with the new page each time a page fault occurs, then we might expect the system to continue to behave as before; i.e., the system will continue to generate faults according to 1 - e^(-λ_p t). It can then be shown that the mean time between page faults in state p, i.e., the average execution burst, is just

    φ(p) = 1/λ_p = α p^β .   (4)

The mean page fault rate λ(p) over a time quantum q, given a program starts with one page and is allowed a maximum of p pages, should be derived using distributions of q (see Smith [132] and Freibergs [62] for q distributions), but this is beyond the scope of this paper. We shall settle for the approximations

    λ(p) ≈ f^(-1)(q) / q ,   q ≤ t_p   (5)

where f^(-1)(q) is the average number of pages referenced in time q ≤ t_p. In case q > t_p,

    λ(p) ≈ [p + λ_p (q - t_p)] / q ,   q > t_p   (6)

where p + λ_p (q - t_p) is the total number of page faults generated in time q > t_p. If q >> t_p, then λ_p q >> p and we have

    λ(p) ≈ λ_p ,   q >> t_p   (7)

which is Eq. (4) as q → ∞.

Each time a page fault occurs, we have to pay an average time T to make space for and make present a page from secondary memory. Thus, we can define the (monoprogrammed) CPU efficiency factor, the fraction of total time spent in useful computation, as

    E = 1 / (1 + λ(p) T) .   (8)

[Figure 2a. E vs. (p,T) surface for q = 2 x 10^…, α = 3.8, β = 2.4.]

[Figure 2b. E vs. (p,T) surface for q = 5 x 10^…, α = 3.8, β = 2.4.]
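The shape of these surfaces is easy to examine numerically. A minimal sketch of the model of Eqs. (2) through (8) follows, using the constants quoted above from Fine's data (α = 3.8, β = 2.4, δ = 1.1, γ = 3.4); the quantum q and the grid of (p, T) values are arbitrary choices made only for illustration.

```python
# Sketch: monoprogrammed CPU efficiency E(p, T) from the model of Section 2.1.
# The constants are the values fitted to Fine's data; q and the (p, T) grid are
# illustrative assumptions only.

def t_p(p, delta=1.1, gamma=3.4):
    """Mean time (in instructions) to reference p pages, Eq. (2)."""
    return delta * p ** gamma

def fault_rate(p, q, alpha=3.8, beta=2.4, delta=1.1, gamma=3.4):
    """Average page fault rate lambda(p) over a quantum q, Eqs. (5)-(7)."""
    tp = t_p(p, delta, gamma)
    if q <= tp:
        # q <= t_p: faults = pages referenced in time q, i.e. f^{-1}(q) = (q/delta)^(1/gamma)
        return (q / delta) ** (1.0 / gamma) / q
    # q > t_p: p initial faults, then lambda_p faults per instruction thereafter
    lam_p = 1.0 / (alpha * p ** beta)          # Eq. (4): 1/phi(p)
    return (p + lam_p * (q - tp)) / q

def efficiency(p, T, q):
    """Monoprogrammed CPU efficiency, Eq. (8): E = 1 / (1 + lambda(p) * T)."""
    return 1.0 / (1.0 + fault_rate(p, q) * T)

if __name__ == "__main__":
    q = 20_000                                 # assumed quantum, in instruction times
    for p in (8, 16, 32, 64):
        print(p, [round(efficiency(p, T, q), 3) for T in (1_000, 5_000, 25_000)])
```

As expected from Figure 2, E rises with the memory allotment p and falls as the secondary memory time T grows.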

In this section we have presented a very simple model of program paging behavior in terms of the average time required to reference p pages,

    t_p = δ p^γ .

Then, under the assumption that paging is a Poisson process, we derived the average execution burst as a function of the number of pages in primary memory,

    φ(p) = df/dp = α p^β .

Using these relations and the values of α and β derived from Fine's results, we showed the effect on monoprogrammed efficiency of a gross time characteristic T of secondary memory, primary memory allotment, and time quantum q. This was done under the assumption that the page size was 1024 words and that a least-recently-used page replacement algorithm was used. In the following sections, we will examine the effects of different page sizes, replacement algorithms, and the use of multiprogramming to mask I/O time.

2.2 EFFECT OF PAGE SIZE AND PRIMARY MEMORY ALLOTMENT ON PAGE FAULT RATE

In the previous section we assumed that the page size was fixed at 1024 words. As we shall see in this section, the page size, b, will affect the page fault rate λ for two reasons. First, primary memory may be underutilized to some extent due to a) primary memory not being filled with potentially useful words, i.e., fragmentation, and b) the presence of words which are potentially useful but which are not referenced during a period when the page is occupying primary memory, i.e., superfluity. Any underutilization of primary memory tends to increase the page rate, since the effective memory allotment is decreased as analyzed in the last section. Second, more page faults may be generated when the page size is b than when the page size is 2b, simply because we only have to generate one page fault to reference all words in the 2b page, whereas to reference the same words we have to generate two faults if the page size is b.

2.2.1 FRAGMENTATION AND PAGE SIZE

We assume that a program consists of a number of segments of size s, where s varies according to some statistical distribution with mean s̄. These segments may contain instructions or data or both. The words of a segment are logically contiguous, but need not be stored in a physically contiguous way. Each segment is further divided into a number of pages. The pages consist of b words which are stored in a physically contiguous way. To allow for variable page size, we assume the system imposes a size quantum Q ≤ B on all storage requests such that requests are always rounded up to the next multiple of Q. Page size b may be any multiple of Q, but may not exceed B, which is the largest number of necessarily physically contiguous words which the system can handle. The ratio B/Q may be thought of as an index of the variability of the page size. All pages of a segment will be of size b = B except the last, which will be some multiple n of Q, b = nQ ≤ B. The physical base address of a page may be any multiple of Q; that is, it may be loaded beginning at any address which is a multiple of Q. For example, if the maximum segment size s = B = 1024 and Q = 1, then we have the case corresponding to the Burroughs B5500. If Q = B and s >> B, then we have the case of more conventional paging systems. Thus, we might have several pages allocated in primary memory as shown in Figure 3, where Q = B/4.

[Figure 3. Memory fragmentation with four pages of size b_1 = 4Q, b_2 = 1.5Q, b_3 = 3.2Q and b_4 = 4Q. B = 4Q.]

Notice that there are two sources of memory waste evident in Figure 3. First, memory is wasted because every storage request must be rounded up to a multiple of Q, as shown by the wavy lines. We refer to this as internal fragmentation. Second, memory is wasted because there are four blocks of Q words which cannot be used to hold a full page because they are not contiguous. This is the classical situation of checkerboarding, which we will refer to as external fragmentation.
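For a configuration like that of Figure 3, the internal waste can be tabulated directly. A minimal sketch follows, using the four request sizes of Figure 3; external fragmentation is left aside, since it depends on where the pages happen to be placed.

```python
# Sketch: internal fragmentation when storage requests are rounded up to a quantum Q,
# using the four request sizes of Figure 3 (B = 4Q).
import math

def rounded(request, Q):
    """Space actually allocated for a request: the next multiple of Q."""
    return math.ceil(request / Q) * Q

Q = 1.0
requests = [4.0, 1.5, 3.2, 4.0]                 # b1..b4 of Figure 3, in units of Q
waste = [round(rounded(r, Q) - r, 2) for r in requests]

print("allocated:", [rounded(r, Q) for r in requests])         # [4.0, 2.0, 4.0, 4.0]
print("internal fragmentation per page:", waste)                # [0.0, 0.5, 0.8, 0.0]
print("total internal fragmentation:", round(sum(waste), 2), "x Q")
```

As Q shrinks relative to the request sizes, the per-request rounding loss shrinks with it, which is the Q → 1 limit described next.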
Notice that as Q → 1, internal fragmentation diminishes to zero, while as Q → B, external fragmentation disappears. The exact amount of waste will be dependent on Q, B, and the distribution of segment sizes. Randell [113] has studied the effects on memory utilization of variations in these parameters (footnote 10). His results indicate that: 1) loss of utilization due to external fragmentation when Q << B is not as great as loss due to internal fragmentation when Q = B; and 2) utilization does not change significantly with changes in the mean segment size if Q << B, but it does change significantly with s̄ if Q = B (footnote 11). It is also apparent that if s̄ >> B, then Q makes little difference. The conclusion from this is that if a program is to be segmented where s̄ ≈ B, then small Q is definitely desirable. If the page size must be small … afford to spend 1/2 the total cost of primary memory on the increased paging hardware.

Unfortunately, a small B or Q is not the entire answer. While small B or Q increases memory utilization and thus reduces the page rate for a given memory allotment, small B or Q may also result in a corresponding increase in page rate for reasons we will discuss in 2.2.3.

2.2.2 SUPERFLUITY VS. PAGE SIZE

Another factor which leads to an effective underutilization of primary memory arises from instruction or data words which are loaded into primary memory as part of a page but are never referenced during that period of residency. We will refer to these words as superfluous. We can obtain a lower bound on the number of superfluous words by examining the total primary memory requirements M of a program as a function of page size (footnote 12). That is, assume primary memory is unlimited; then M(B) is the total amount of primary memory occupied after a given execution of the program with page size B. Now, given unlimited primary memory, if the program is run with page sizes b = B and b = 1, then at least M(B) - M(1) words must be superfluous. If we force the program to run with primary memory m < M(B), then page faulting will occur and the number of superfluous words may increase over M(B) - M(1), since some words which are eventually referenced are not referenced during some period of their page residency and are thus superfluous during that period. O'Neill [108] (footnote 13) and Belady [11] (footnote 14) present M(B) statistics which are remarkably linear over the ranges 256 ≤ B ≤ 2048 and 128 ≤ B ≤ 1024, respectively. Even for larger page sizes M(B) is reasonably linear, but for small B, M(B) drops off sharply. Thus, we can assume

    M(B) = a_0 + a_1 B ,   256 ≤ B ≤ 1024   (9)

and a_1 B is a lower bound on the number of superfluous words (footnote 15).

Unfortunately, Eq. (9) only establishes a lower bound on the number of superfluous words. It does not tell us anything about the average number of superfluous words present when primary memory is less than that absolutely required by the program. The authors know of no published data which pertain directly to superfluous words in this case, so we shall move on to determine the overall effect of block size on the paging rate λ.

2.2.3 PRIMARY MEMORY ALLOTMENT AND PAGE SIZE

In Section 2.1 we discussed the average execution burst φ(p) as a function of memory allotment in units of p, the number of b = B = 1024 word pages.
In this section we will examine the paging rate λ = 1/φ as a function of primary memory allotment in words, m = pB, for various values of page size b = B. We would expect that for small m, λ will vary considerably with the page size. This is because for small m, the average time each page is in primary memory will be relatively short, and so the extra words in larger pages will tend to go unreferenced and will only take up space which might better be occupied by new, smaller pages. On the other hand, as m increases, we would expect to see page size have less effect, since the probability will be higher that more words in the page will be referenced due to the longer expected page residence time. In addition, we might also expect to see, for a given m, a B_0 such that any B_1 > B_0 will only include superfluous words and any B_2 < B_0 will not include enough words.

Figure 4a is a graph of λ vs. B and m based on experimental data from a FORTRAN compiler [4] (footnote 16). This graph clearly exhibits that when a program is "compressed," i.e., run in a smaller memory, large page sizes lead to excessive paging. When the page size is small, the program tends to be more compressible. As m gets larger, the paging behavior becomes less a function of B, and for large enough m, small B may even increase the page rate. Slight minimum points were observed at the (m,B) points (2K, 64), (4K, 256), and (8K, 256). This illustrates that if minima exist, then they are not necessarily independent of m.

[Figure 4a. Page fault rate λ as a function of primary allotment m and page size B. Data for a FORTRAN compiler from Anacker and Wang [4]. Note the B scale is logarithmic.]

Figure 4b is another graph of λ vs. m and B, with data for a SNOBOL compiler [135]. This program is evidently much less "compressible" than the FORTRAN compiler in Figure 4a. However, it shows the same general tendencies as Figure 4a except for the apparent lack of minima.

[Figure 4b. Page fault rate λ vs. m and B. Data for a SNOBOL compiler from Varian and Coffman [135]. Note the λ scale is different than Figure 4a. Dashed lines indicate a locus of equal λ.]

Another way to view the λ vs. (m,B) relationship can be seen by observing in Figure 4b the dashed lines which pass through points of equal λ. Notice that λ(8K, 256) is only slightly lower than λ(4K, 64). Thus, we can effect an almost equal tradeoff between half as much primary memory and 1/4 the page size; i.e., we double the number of pages but each page is only 1/4 as large. However, we must also consider the increase in paging hardware necessary to handle the larger number of pages (footnote 17).

The main point to be had from these figures is that programs are more compressible when B is small; i.e., they will tolerate a much smaller primary memory allotment if B is small. However, too small a B may lead to a slight increase in paging activity. (See also a study performed on the ATLAS system by Baylis, et al. [9].)

The above results further support arguments for variable page sizes, allowing logically dependent words (e.g., subroutines or array rows) to be grouped in a page without leading to underutilization of memory due to internal fragmentation or superfluity. Logical segmentation of code and data will be taken up more generally in later sections.

2.3 REPLACEMENT ALGORITHMS

Whenever it is necessary to pull a new page, i.e., transfer a new page from secondary to primary memory, it is also necessary to select a replacement page in primary memory to be pushed (transferred to secondary memory) or overlayed. If we assume that all programs are in the form of pure procedures, then we never need to push program pages. Data pages need to be pushed only if we have written into them. The selection of a replacement page is done by a replacement algorithm.
A number of these algorithms have been proposed and evaluated [9, 11, 12, 17, 18, 27, 40, 41, 86, 116, 125, 135], where Belady [11] has produced the most extensive summary and evaluation to date. The various algorithms can be classified according to the type of data which is used by the replacement algorithm in choosing the replacement page.

Type 1) The first type of information pertains to the length of time each page has been in primary memory. The page (or class of pages) which has been in memory the longest is pushed or overlayed first. This information forms the basis of what are usually referred to as FIFO algorithms. This is the simplest type of information to maintain and it usually requires no special hardware to implement.

Type 2) Type 2 information is similar to Type 1 information, but "age" is measured by the time since the last reference to a page rather than by how long the page has been in primary memory. This information is the basis of the so-called least-recently-used replacement algorithms. Many variations exist, e.g., based on the fineness of age measurement. Systems which accumulate this type of information usually employ some type of special hardware to record page use statistics.

Type 3) Information as to whether or not the contents of a page have been changed is frequently used to bias the selection towards pages which have not been changed and thus do not have to be pushed (but simply overlayed), since an exact copy is still available in secondary memory. Special hardware is needed to record the read-only/write status of each page in primary memory.

Type 4) In the ATLAS system [9, 86] the length of the last period of inactivity is recorded for all pages in a program. This information is used to predict how long the current period of inactivity will be, i.e., how soon a page will be referenced again. Replacement is biased towards pages which, on the basis of this information, are expected to be inactive for the longest time. This type of information is particularly useful for detecting program loops, as was intended by the ATLAS designers.

Belady [11] has evaluated the performance, in terms of page fault rate, of a number of algorithms as functions of page size and primary memory allotment, and we will now discuss his results.

The simplest algorithm studied was the RANDOM algorithm. This uses no information about pages, but chooses a replacement page randomly from those in primary memory. The use of Type 1 information (time in primary memory) never significantly improves performance relative to RANDOM, and in some cases performance is worse than RANDOM. The use of Type 2 information (time since last read or write) leads to the most significant and consistent improvement in performance. With these algorithms the accuracy with which "age" is measured does not seem to have much effect on performance, however. That is, performance does not change significantly whether we keep a complete time history of pages in primary memory, or just divide all pages into two classes: recently used and not-so-recently used. The use of Type 3 information (read-only/write status) in addition to Type 2 information does not affect the total number of page faults very much. However, it does increase performance due to the fact that no push is required on 10 to 60% of all page faults.
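The Type 1 versus Type 2 comparison can be made concrete with a small simulation. The sketch below counts page faults under RANDOM, FIFO and least-recently-used replacement; the reference string and the allotment of m page frames are assumptions chosen only for illustration, not data from Belady's study.

```python
# Sketch: page fault counts under RANDOM, FIFO (Type 1) and LRU (Type 2) replacement.
import random
from collections import OrderedDict, deque

def faults_fifo(refs, m):
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == m:
                frames.remove(order.popleft())   # evict the oldest resident page
            frames.add(page)
            order.append(page)
    return faults

def faults_lru(refs, m):
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)             # record the reference ("age" = last use)
        else:
            faults += 1
            if len(frames) == m:
                frames.popitem(last=False)       # evict the least recently used page
            frames[page] = True
    return faults

def faults_random(refs, m, rng):
    frames, faults = [], 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == m:
                frames.pop(rng.randrange(m))     # evict a randomly chosen page
            frames.append(page)
    return faults

if __name__ == "__main__":
    rng = random.Random(1)
    refs, page = [], 0
    for _ in range(2000):                        # a string with some locality
        page = (page + rng.choice([0, 0, 1, 1, -1, 5])) % 32
        refs.append(page)
    m = 8
    print("FIFO:  ", faults_fifo(refs, m))
    print("LRU:   ", faults_lru(refs, m))
    print("RANDOM:", faults_random(refs, m, rng))
```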
The ATLAS algorithm [86], which uses both Type 2 and Type 4 information, is the most complex algorithm studied, and it is interesting to note that it consistently leads to worse results than Type 2 algorithms and is sometimes worse than RANDOM or FIFO. This result has been further substantiated by Baylis, et al. [9]. Apparently, the problem is that most programs do not have a regular or small enough loop structure to warrant the use of the ATLAS algorithm, which is intended to take advantage of program loops.

Thus, algorithms which make replacements on the basis of least recently referenced pages and bias towards read-only pages would seem to be best in terms of cost effectiveness. However, for existing systems which do not have the hardware necessary to automatically maintain Type 2 and/or Type 3 information, RANDOM, FIFO or programmer directed schemes must be used.

2.4 PROGRAM ORGANIZATION

Comeau [30] has shown that, simply by reordering the assembler deck of the Cambridge Monitor System to cause logically dependent routines to be grouped together, paging of the monitor was reduced by as much as 60%. Brawn and Gustavson [18] and McKellar and Coffman [103] have shown that simple changes in computation algorithms, such as referencing matrices by square partition instead of by row or column, can also effect large improvements in paging activity. (See also [36, 37, 51, 73].) These studies indicate that:

1) Programmers need to be aware of the paged and/or segmented environment in which their programs will be executed. Program optimization by reducing page faults is more important than classical optimization techniques (e.g., common subexpression elimination).

2) Programmers should be able to direct or advise the compiler as to which code should be placed in which page/segment.

3) If possible, subroutine or procedure code should be placed in the code segment where it is called. If this code is small and is used in several different segments, then several copies of the subroutine could be generated, one in each segment where it is called.

4) More emphasis should be placed on compiler optimization of code through strategic segmentation. For example, by analyzing the structure of a program (see Martin and Estrin [99]) the compiler could make better segmentation decisions and provide information which the operating system could use to make replacement decisions and to perform prepaging. In addition, compilers might be able to detect certain cases of poor data referencing patterns and issue appropriate warnings to the programmer.

Thus, we can improve paging behavior both by changing the physical parameters of the system and by intelligent program organization. The latter method would appear to have a higher cost effectiveness and should not be overlooked.
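The kind of gain reported by Brawn and Gustavson and by McKellar and Coffman can be seen with a simple count. The sketch below compares column-order and square-partition-order references to a row-major matrix under least-recently-used replacement; the matrix order, page size, partition side and frame count are assumed values chosen only for illustration.

```python
# Sketch: page faults when a row-major N x N matrix is referenced by columns
# versus by square partitions, under LRU replacement with m page frames.
from collections import OrderedDict

def lru_faults(page_refs, m):
    frames, faults = OrderedDict(), 0
    for p in page_refs:
        if p in frames:
            frames.move_to_end(p)
        else:
            faults += 1
            if len(frames) == m:
                frames.popitem(last=False)
            frames[p] = True
    return faults

def page_of(i, j, N, b):
    """Page holding element (i, j) of a row-major N x N array, b words per page."""
    return (i * N + j) // b

N, b, m, s = 256, 1024, 8, 32       # matrix order, page size, frames, partition side

by_column = (page_of(i, j, N, b) for j in range(N) for i in range(N))

def by_partition():
    for bi in range(0, N, s):
        for bj in range(0, N, s):
            for i in range(bi, bi + s):
                for j in range(bj, bj + s):
                    yield page_of(i, j, N, b)

print("column order:   ", lru_faults(by_column, m), "faults")
print("partition order:", lru_faults(by_partition(), m), "faults")
```

With these assumed parameters the partition order revisits each resident page many times before moving on, so it faults far less often than the column sweep, which is exactly the effect exploited by the studies cited above.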
2.5 SUMMARY

As we have noted, CPU efficiency can be related to the page fault rate and the average time T to satisfy these I/O requests. In Section II we have tried to illustrate the relationships between page fault rate and primary memory size, primary memory allotment, page size, replacement algorithm, program organization, and secondary memory characteristics. Our intent has only been to indicate trends and general relationships, and with this in mind our models have not been very elaborate. However, all our models have been based on observed program behavior and are probably accurate, at least for the classes of programs studied.

III. Multiprogramming

Multiprogramming arises for two reasons:

1) In an attempt to overlap I/O time by having one program be executed while other programs are waiting for I/O (implicit or explicit).

2) In order to provide quick response to several real time jobs (time sharing, process control, etc.).

We will concern ourselves only with the first of these functions.

Whenever several concurrent programs share memory in order to "mask" I/O time, each program operates with less primary memory than it would have if it were running alone. As we have seen, this causes the paging rate for each program to increase. On the other hand, by multiprogramming we are able to decrease the average time per I/O request (both paging and explicit). Several questions now arise: First, when does the degradation of efficiency due to increased page traffic become greater than the increase in efficiency due to more I/O masking? Second, how much of an improvement can we expect with multiprogramming over monoprogramming?

Gaver [65] has presented an analysis of multiprogramming based on a probability model which relates CPU efficiency to the number of concurrent jobs J, where each job runs for an average of 1/r instructions (hyperexponentially distributed) before generating an I/O interrupt, and I/O requires an average of T instruction times to complete (exponentially distributed) (footnote 19). Unfortunately, Gaver does not consider the fact that as J increases, each job must be executed with less primary memory and thus paging I/O increases. However, this is fairly easy to add to his model, using the results of Section 2.1 (footnote 20).

Suppose the total available primary memory is M pages and all programs are identical and are allocated equal amounts of this memory (footnote 21). Then the memory allotment for each program is just M/J. The paging rate λ for each program as a function of J is then

    λ(J) = 1/φ(M/J)   (10)

where φ(p) was defined in Section 2.1. We will assume this is exponentially distributed. As in Section 2.1 we will use the function α(M/J)^β to model φ(M/J); combining this paging rate with Gaver's model gives the CPU efficiency E(J) plotted in Figures 5a through 5c.

[Figure 5a. CPU efficiency as a function of the number of jobs J and average I/O completion time T. Average page rate is 1/(3.8 (64/J)^2.4) and explicit I/O interrupts occur every 10K instructions on the average.]

[Figure 5b. CPU efficiency as a function of J and T. Average page rate is 1/(3.8 (64/J)^2.4) and explicit I/O interrupts occur every … instructions on the average.]

[Figure 5c. CPU efficiency as a function of J and T. Average page rate is 1/(3.8 (32/J)^2.4) and explicit I/O interrupts occur every 10K instructions on the average.]

For T greater than about 6000, there is no gain to be had from multiprogramming. This does not mean that multiprogramming with this system configuration is bad. It merely illustrates that for this system it is not wise to multiprogram programs characterized by α = 3.8, β = 2.4 and 1/r = 10,000. (If 1/r = 5000, then running 2 jobs is advantageous; see Figure 6.) This introduces the scheduling problem. That is, which jobs should be run concurrently? A good scheduler whose purpose is to maximize throughput should be able to use information about programs' working sets or α,β characteristics to determine an optimal load. We will not pursue this subject further here (see Denning [40, 41] and Heller [70]).

Figure 6 shows the relative gain in efficiency over monoprogramming due to multiprogramming with an optimal number of jobs J*,

    G = [E(J*) - E(1)] / E(1)   (13)

as a function of T for several combinations of r and M (in all cases, α = 3.8, β = 2.4). This figure illustrates that for multiprogramming to yield a reasonable gain, there must be sufficient primary memory (note the M = 32 curves).

Literature on multiprogramming and time-sharing is extensive and we will not attempt to present a comprehensive bibliography here. (Instead, see Buchholz [20], Calingaert [22], McKinney [104], Trimble [134] and Bell and Pirtle [14].) Some useful studies can be found in [12, 49, 52, 56, 65, 107, 130, 131, 132, 136].
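Since Gaver's efficiency expressions are not reproduced above, the sketch below substitutes a deliberately crude approximation in their place: a job's mean compute burst between I/O requests is taken as c(J) = 1/(λ(J) + r), and efficiency as min(1, J c/(c + T)), i.e., perfect overlap up to saturation. The constants α, β, M and 1/r follow the figures; the efficiency formula itself is only an assumption and understates the queueing effects captured by Gaver's model.

```python
# Sketch: CPU efficiency vs. number of jobs J, in the spirit of Figures 5 and 6.
# The min(1, J*c/(c+T)) formula is a crude stand-in for Gaver's model, not his result.

def phi(p, alpha=3.8, beta=2.4):
    """Mean execution burst between page faults with p pages (Section 2.1)."""
    return alpha * p ** beta

def efficiency(J, M, inv_r, T):
    lam = 1.0 / phi(M / J)             # paging rate per instruction, Eq. (10)
    r = 1.0 / inv_r                    # explicit I/O rate per instruction
    c = 1.0 / (lam + r)                # mean compute burst between I/O requests
    return min(1.0, J * c / (c + T))   # crude overlap approximation (assumption)

def best_J(M, inv_r, T, J_max=16):
    return max(range(1, J_max + 1), key=lambda J: efficiency(J, M, inv_r, T))

if __name__ == "__main__":
    M, inv_r = 64, 10_000              # 64 pages of memory, explicit I/O every 10K instr.
    for T in (1_000, 4_000, 8_000):
        J_star = best_J(M, inv_r, T)
        E1, Eopt = efficiency(1, M, inv_r, T), efficiency(J_star, M, inv_r, T)
        print(f"T={T}: J*={J_star}, E(1)={E1:.2f}, E(J*)={Eopt:.2f}, "
              f"G={(Eopt - E1) / E1:.2f}")     # relative gain, Eq. (13)
```

Even this rough form shows the qualitative behavior of Figure 6: the gain G grows with T as long as there is enough memory to keep the per-job paging rate in check.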
[Figure 6. Relative gain G in efficiency over monoprogramming for the optimal number of jobs vs. average I/O completion time (normalized). α = 3.8, β = 2.4. Numbers on curves indicate the optimal number of jobs.]

IV. Average Time Per I/O Request

In Section II we introduced T as the average interval between the time when a program is forced to stop (due to a lack of instructions or data in primary memory) and the time when the program could resume. In 2.1 and III, we showed that CPU efficiency is highly correlated with the magnitude of T (see Figures 2 and 5). In the following sections we will examine T in more detail. Specifically, we will discuss techniques whereby T can be reduced.

Secondary storage devices range from extended core storage to magnetic tape, but the most common device in use today is the disk file. The time required for these devices to deliver a block of b words can be generally characterized by

    T = t_q + t_a + b/ρ   (14)

where t_q is the queueing time before the disk logic recognizes an I/O request, t_a is the sum of head positioning latency and rotational latency, and ρ is the transmission rate between primary and secondary memory. Four ways in which we can decrease the average T are:

1) Decrease t_a by making the disk spin faster, using more heads per surface, or by using extended core storage.

2) Making the disk spin faster or using higher bit densities increases ρ. We might also increase ρ directly by reading more heads simultaneously.

3) Use parallel queueing techniques so that the average T over n requests is less than T.

4) Change the distribution of t_a by planning the layout of data on the disk in such a way that the data is almost under the read heads when it is needed (this technique is only practical in systems doing large calculations where a dedicated disk is available). Alternately, we can prefetch data blocks (buffering).

We will now discuss some of these techniques.

4.1 PHYSICAL LATENCY OF SECONDARY MEMORY

Consider a disk system with one movable head per surface and with all heads fixed to the same head positioner assembly. Now t_a, the access time for this device, is the sum of two statistically distributed times: t_p, the time to position a head, and t_f, the time required for the desired sector to come under the heads (footnote 22):

    t_a = t_p + t_f .   (15)

One way to make this disk faster is to add more heads to each arm so that the arm does not have to move so far to position a head over the right track. This tends to decrease t_p.

Another way to decrease t_p would be to have independent positioners for each surface. Fife and Smith [54] have presented a good analysis of this technique. Several manufacturers have eliminated t_p altogether by providing one fixed head per track. To provide further speedup we could introduce multiple heads per track (a matter which presents technological difficulties) or use a drum, which typically rotates faster than a disk but does not have as large a capacity. Both of these latter techniques reduce t_f in Eq. (15). (See also [133].)

Any further improvement in the physical response of secondary memory probably must come from the use of extended core storage (ECS).
This is potentially quite expensive (the cost per word being typically more than one-tenth that of primary memory) but is considerably faster, as latency is on the order of ten microseconds as opposed to tens of milliseconds for disks and drums. This could double CPU efficiency (see Figures 2 and 5) but must be evaluated on the basis of cost effectiveness. Several studies of the use of ECS can be found in [7, 63, 68, 79, 83, 101].

4.2 EFFECTIVE LATENCY OF SECONDARY MEMORY

Several techniques can be used to decrease the effective latency of a disk device without changing its physical characteristics. For instance, if several requests for blocks from the disk are waiting for service, then we can decrease the average latency over all requests by servicing requests in the order in which the required blocks come under the heads. Another possibility, which can be used in certain special cases, is to coordinate the layout of blocks on the disk with the timing of the program so that blocks will be almost under the heads when they are needed.

4.3 REQUEST QUEUEING

We will assume that at any given time there are n requests for service from secondary memory, these requests having been generated by the several programs being multiprogrammed (footnote 23). We also assume that the secondary memory is a rotating device divided into M tracks, each track being further divided into N sectors. Each request is for access to a particular track and sector. The rotation time of the device is T_r.

Each request waiting for service will experience a delay T, the sum of t_q (time in queue), t_a (access time), and t_t (transmission time, assumed constant). The simplest way to service these requests is to establish a single queue which is serviced on a first in, first out (FIFO) basis. A better strategy is to service requests according to which request can be serviced next (FSFO), i.e., the request whose required track and sector is due under the heads next is serviced first. Denning [39] shows that for a fixed head per track device the ratio of delay time under FIFO to delay time under FSFO is

    (FIFO)/(FSFO) = n(N + 2) / [N + 2(n + 1)] .   (16)

For N = 64 sectors and n = 10 requests, the relative improvement by Eq. (16) is 7.66. That is, the response of a fixed head device with 64 sectors and 10 waiting requests is 7.66 times better under FSFO than under FIFO (footnote 24). An analysis of movable head devices shows that improvement can also be effected by similar scheduling algorithms, but the improvement is not as dramatic.
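Eq. (16) is easily tabulated. A minimal sketch follows, with n and N values chosen only to bracket the 64-sector, 10-request case quoted above.

```python
# Sketch: relative delay of FIFO versus FSFO scheduling for a fixed-head device, Eq. (16).
def fifo_over_fsfo(n, N):
    """Ratio of mean delay under FIFO to mean delay under FSFO
    for n waiting requests on a device with N sectors per track."""
    return n * (N + 2) / (N + 2 * (n + 1))

for N in (16, 64):
    for n in (2, 10, 30):
        print(f"N={N:3d} sectors, n={n:2d} requests: ratio = {fifo_over_fsfo(n, N):.2f}")
# For N = 64 and n = 10 the ratio is about 7.67, the figure quoted in the text.
```

The ratio grows with the queue length n, which is why FSFO-style scheduling pays off most on a heavily loaded device.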
4.4 MINIMIZATION OF EXPLICIT I/O REQUEST TIME

A number of large scale calculations require space for their data and instructions which exceeds the available primary memory. These calculations involve operations on very large arrays and may require several tens of hours per production run on the fastest computers. In such cases there is no point to interval time slicing of the computation for user interaction, although system throughput can be enhanced by multiprogramming, as discussed in Section III. If we restrict our attention only to these kinds of large jobs, then one limiting case is a large machine with one large job at a time, i.e., batch processing. We will now turn to a discussion of preplanning the layout of a secondary storage device in such a way that explicit I/O request time is minimized. The interleaving of several jobs will not be discussed, except to remark that in such cases the execution time requirements become less stringent for each job, but the sequencing of the interleaved steps presents new difficulties.

Historically, there are many examples of preplanned drum layout. When drums were used as primary memory, optimizing assemblers would locate the sequence of instructions at appropriate intervals around the drum so that (in jump free segments of code) the next instruction would always be available when the previous one was finished [118]. For current machines in monoprogramming mode, it is reasonable to assume that enough code resides in primary memory at any time so that the time required to perform instruction overlays is negligible. However, data overlays may be extensive, and we might be able to decrease the latency involved in obtaining data blocks from secondary memory by planning the layout of these data blocks and prefetching data.

The question of overlaying data must be considered with respect to the average amount of processing which may be performed on each data element. Many matrix calculations (e.g., multiplication, inversion, eigenvalue calculation) require αN^3 operations, where α ≤ 1 and N is the dimension of the matrix. Also, it can be empirically observed that a number of partial differential equation solution techniques on N x N meshes require αN^2 operations per iteration, where α is generally smaller than in the matrix case but usually greater than 0.1. In the partial differential equation case it is sometimes possible to iterate several times on a block in memory, thus increasing α. If we assume αN^3 operations on N^2 data elements, then each element requires αN operations, where an operation may be regarded as, e.g., a multiply, an add, and a memory fetch, or say, one microsecond on a current machine.

Let us assume a machine with 2M^2 words of memory available for each block transmitted from the disk. This allows αM^3 microseconds of computation per block. If α = .5 and M = 64, then we compute for about 125 milliseconds per block. This is more time than is required for the rotation of any current large disk, which is usually in the range of 40 to 60 milliseconds. Thus, if we can always keep one input request ahead in a disk queuer, it should be possible to completely mask the I/O request time.

As the ratio of processor speed to disk rotation speed gets larger, this problem becomes more difficult. Suppose we have a calculation with the same parameters as above, but we wish to use a processor which is ten times faster. Then we have only 12 milliseconds of computation time per block, and this is faster than the rotation time of any large disk. There are several obvious ways to avoid this problem. One is to increase M; this may require a larger primary memory. Another is to supply the disk queuer with several requests, thereby decreasing the expected time until some request is honored [39]. In some cases there are uniform but intricate relationships between the data blocks and their processing sequence. To handle these cases, we can attempt a third solution, namely the preplanning of block layout on the disk.

Consider the problem of matrix multiplication using a head per track disk. Suppose that both operand matrices are partitioned into square blocks, that the premultiplier is stored by rows of partitions, and that the postmultiplier is stored by columns of partitions.
Let us also assume that the angle on the disk between the positions of successive partitions represents a disk motion time equal to the processor time required to multiply two square partitions. Now if it happens that one row (and column) of partitions ends just where the next starts, then it is clear that such a disk storage scheme allows matrix multiplication with no CPU time lost due to waiting for data from the disk. It is also clear that if a sequence of matrix operations is required, then the preplanning of the disk layout becomes more complex. In general, some I/O wait time will be required of the CPU. However, in order to use any matrix as a premultiplier or postmultiplier, it is possible to store all matrices in such a way that they may be fetched by row partitions or column partitions. This is achieved by storing the second partition of the first row, say A_12, in the same relative position on the disk as the first partition in the second row, say A_21. This skewing pattern may be continued in the obvious way, given a sufficient number of disk surfaces. Matrix inversion and eigenvalue calculations require much more intricate disk storage schemes, but the problems are similar [91].

A somewhat more difficult set of constraints is encountered in some problems, e.g., explicit partial differential equation methods. In these cases it is necessary to sweep through an array of data repeatedly. When any partition of the array is being processed, it is necessary also to have some data elements from neighboring partitions. For example, if a five point finite difference operator is being applied to M element partitions of an array, then √M border elements are required from each of the four adjacent partitions. It should be possible to pack these border elements in separate arrays, then write and read them on and off the disk at appropriate times. Assume the calculation on an M element partition requires time T_c. Next assume it is possible to map partitions of the array onto the disk such that the one-way transmission time for a partition is (T_c - ε)/2. Now we can read a new block and write an old block in 2(T_c - ε)/2 = T_c - ε. If the edge values of the neighboring blocks can be transmitted in and out to the disk in ε time units, then the scheme maintains a steady state balance between computation time and I/O transmission time. A somewhat weakened set of conditions is imposed in Bernott [15], where it is assumed that T_c is not less than five times the one-way transmission time for a block. Various depths of finite difference operators and any rectangular mesh are allowed. Also, the number of variables being computed is a parameter. In terms of several latency considerations and the above mentioned parameters, a disk layout is computed which gives a resulting computation scheme that has an overall expected CPU efficiency greater than 80%.
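The arithmetic behind these layout arguments, namely whether one block's computation outlasts the disk, can be packaged as a small check. The figures used below (α = .5, M = 64, about one microsecond per operation, a 40 to 60 millisecond rotation) are those quoted in Section 4.4; queueing and transmission time are ignored, so this is only a feasibility test, not a disk model.

```python
# Sketch: does computation on one block mask the disk I/O for the next block?
def compute_time_ms(alpha, M, op_time_us=1.0):
    """Computation time per M x M block, in milliseconds (alpha * M^3 operations)."""
    return alpha * M ** 3 * op_time_us / 1000.0

def masked(alpha, M, rotation_ms, op_time_us=1.0):
    """True if one block's computation outlasts one disk rotation."""
    return compute_time_ms(alpha, M, op_time_us) >= rotation_ms

alpha, M, rotation_ms = 0.5, 64, 50.0
print(compute_time_ms(alpha, M), "ms of computation per block")          # about 131 ms
print("masked at 1 us per operation:  ", masked(alpha, M, rotation_ms))  # True
print("masked on a 10x faster machine:", masked(alpha, M, rotation_ms, op_time_us=0.1))
```

The second case fails, which is the situation that motivates larger blocks, deeper request queues, or preplanned layouts as described above.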
In this paper we have discussed some interrelations between -tern parameters including: primary memory size, page size, secondary - Ul - memory speed, I/O request queuers, and the number of jobs multiprogrammed* These together with user program parameters including: mean time to access p pages, number of instructions executed per datum and regularity of addres- sing a data structure have a major influence on the CPU efficiency. V/e limited our discussion to two -level memory hierarchies, but the techniques mentioned can be applied to more levels by lumping several levels and reducing the problem to one of two levels. This requires approx- imating the parameters of a lumped level using the parameters of the levels being combined. The use of a two-level primary memory is quite successful in the IBM 360/85 [ 66}* It is also common to use a fast drum between primary memory and a slow disk [3*+ ]• Machines which operate on arrays of data and are organized as arrays of arithmetic processes are now being designed. For example, the pipeline processors [12U] (which might be called serial array processors) and ILLIAC IV [10 ] (which might be called a paral- lel array processor) have many individual memory units, and this fact makes it necessary to carefully plan the layout of data in primary memory for maximum CPU utilization. The kinds of storage planning discussed below might be regarded either as minimizing the number of data faults or the time per data fault because the question is that of supplying data to the processor from the primary memory at a maximum rate. Serial array processors generally require a memory whose effec- tive cycle time is equal to the CPU clock time. This is achieved by inter- leaving many slower memory units in a large bank. Since, in general two vectors are entering the processor and one is emerging, it is convenient if at least three such banks are available. Clearly, serious memory con- flicts can arise in this situation. If two argument vectors are stored in the same bank, the processing speed may be cut in half. - 42 - Since present serial array processors reach a speed limit due to the fact that the pipeline length can be made no longer than the number of elementary steps in an arithmetic operation, parallel array processors see™ to be a logical necessity for more speed improvement. The memory system of IT..LIAC IV consists of one memory unit per processor. Each mem- ory unit is directly accessible by just one processor. A network of rout- ing logic may be used to get data to other processors. If one -dimensional arrays are stored with one element per processor, then the full speedup over a single processor may be achieved. In two-dimensional arrays, row operations are easy to perform with a straightforward mapping of an array into the memory, e.g., rows are stored across the processors and each column is within a processor. Similarly, column operations are easy with a transposed array. However, if both row and column operations are required with such a storage scheme using an n processor machine, then operations in one direction will realize an n-fold speedup but operations in the other direction will realize no speedup at all over a one processor machine. If row and column operations are required, some kind of skewing scheme as out- lined in Section IV will provide the full speedup [90]. It may be expected that in the future, parallel arrays of pipeline processors will require even rr.ore intricate primary storage mapping schemes. 
It should be remembered that we have been discussing just one underlying subject throughout this paper: the ratio of cost to performance for an overall computer system. We have attempted to relate several memory parameters and program characteristics to the system performance as measured by CPU utilization.

LIST OF FOOTNOTES

1. Note we always measure time in instruction executions; i.e., we scale time by the average instruction time.

2. The results of these experiments consisted of 1737 execution bursts from 162 service intervals for five programs: 1) LISP, 2) an interpretive meta compiler, 3) an interpretive, initially interactive, display generation system, 4) an interactive JOVIAL compiler, and 5) a concordance generation and reformatting program. Page size was 1024 words.

3. This corresponds to imposing a variable q on the program. Smith [132] indicates this q had a hyperexponential distribution (a weighted sum of two exponentials, one with mean 40.7 x 10^3).

4. See Denning [40, 41].

5. We assume that the first page is referenced at t = 0 with probability 1 (t_1 = 0), which accounts for the difference between this formula and that of Shemer and Shippey.

6. Determined from a least-squares fit to the function ln t_p = a + γ ln p, where δ = e^a. Average error over 18 points was 16%.

7. It should be remembered that values of α and β are characteristics of a given program or class of programs, and should not be used to describe all programs.

8. A similar study of results [135] from a SNOBOL compiler yielded φ(p) = .54 p^… .

9. Belady and Kuehner [12] suggest a similar function.

10. Segment size was generated from several distributions. B was 1024 and Q was varied from 32 to 1024 in powers of 2. Total memory size was 32K. It was assumed that requests for memory were always waiting to be filled.

11. For Q = B/32, utilization varied from over 95% for s̄ = 4B to about 90% for s̄ = B/2. At Q = B, utilization varied from just under 90% for s̄ = 4B to about 40% at s̄ = B/2.

12. Until stated otherwise, we now assume b = Q = B, i.e., page size is constant over a given experiment.

13. This data comes from two program loads: 1) "10 small FORTRAN compilations and loads" and 2) "FORTRAN compilations, and executions, used to debug the 44x FORTRAN compiler." Apparently, there is negligible internal and external fragmentation in this experiment.

14. This data is from an integer programming calculation.

15. Since apparently M(1) ≤ a_0 + a_1.

16. Again in this and the following experiment, there is apparently negligible fragmentation.

17. See Rosene [119].

18. …

19. We will only consider the case where I > J; i.e., there are no conflicts for secondary memory. The assumption of an exponential distribution of I/O completion time is not particularly realistic, as Gaver admits. Since we are using T to represent the average time required to complete all kinds of I/O requests, paged or explicit, the density of T will probably consist of a collection of exponential, Gaussian, and delta functions. However, even with a simple exponential distribution, the total expectation functions become quite complex, and a more complex distribution would not be warranted here. See Smith [132] for a slightly different model.

20. Pages are here assumed fixed at 1024 words.

21. Actually, this could only be true if M were some multiple of J. However, if M >> J, this is not a bad approximation. We also assume here that programs are not swapped out of primary memory while waiting for I/O.
22. See Frank [61] for an analysis of the statistical properties of disk systems.

23. Our development in this section will follow Denning [39]. See also [26, 132, 139, 140].

24. The particular case of Gaver's model which we used in Section III assumed no conflicts for secondary memory, i.e., the rate of I/O completion was not dependent on the number of jobs (requests). The techniques discussed here are not as good as those assumed in Section III.

BIBLIOGRAPHY

(* Referenced in text.)

[1] Arden, B. W. Time sharing systems: a review. Michigan Summer Conference on Computer and Program Organization, 1967.

[2] Arden, B. W. Time sharing measurement and accounting. Michigan Summer Conference on Advanced System Programming, 1969.

[3] Arden, B. W., Galler, B. A., O'Brien, T. C. and Westervelt, F. H. Program and addressing structure in a time sharing environment. JACM 13,1 (1/66), 1-16.

*[4] Anacker, W. and Wang, C. P. Performance evaluation of computing systems with memory hierarchies. IEEE EC-16,6 (12/67), 765-773.

[5] Aspinall, D., Edwards, D.B.G. and Kinniment, D. J. Associative memories in large computer systems. IFIP (1968), D81-85.

[6] Aspinall, D., Edwards, D.B.G. and Kinniment, D. J. An integrated associative memory matrix. IFIP (1968), D86-90.

[7] Badger, G. F. Jr., Johnson, E. A. and Philips, R. W. The Pitt time sharing system for the IBM System 360: two years experience. AFIPS FJCC 33 (1968).

[8] Bairstow, J. N. Time sharing. Electronic Design 16,9 (1968), C1-C22.

*[9] Baylis, M. H. J., Fletcher, D. G. and Howarth, D. J. Paging studies made on the I.C.T. Atlas computer. IFIP (1968), D113.

*[10] Barnes, G. H., et al. The ILLIAC IV computer. IEEE EC-17,8 (8/68), 746-757.

*[11] Belady, L. A. A study of replacement algorithms for a virtual storage computer. IBM S. J. 5,2 (1966), 78-101.

*[12] Belady, L. A. and Kuehner, C. J. Dynamic space sharing in computer systems. CACM 12,5 (5/69), 282-288.

[13] Belady, L. A., Nelson, R. A. and Shedler, G. S. An anomaly in space-time characteristics of certain programs running in a paging machine. CACM 12,6 (6/69), 349-353.

*[14] Bell, G. and Pirtle, M. W. Time sharing bibliography. IEEE EC-15,12 (12/66), 1764-1765.

*[15] Bernott, B. A. Disk I/O For Non-Core-Contained P.D.E. Meshes and Arrays. DCS Report No. 3…, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois (3/69).

[16] Bobrow, D. G. and Murphy, D. L. Structure of a LISP system using two-level storage. CACM 10,3 (3/67), 155.

*[17] Bovet, D. P. Memory allocation in computer systems. Department of Engineering, UCLA Report 68-17.

*[18] Brawn, B. and Gustavson, F. Program behavior in a paging environment. AFIPS FJCC 33 (1968), Part 2, 1019.

[19] Buchholz, W. File organization and addressing. IBM S.J. 2 (6/63), 86-111.

[20] Buchholz, W. A selected bibliography on computer system performance evaluation. Computer Group News (3/69), 21-22.

[21] Burroughs Corp. A Narrative Description of the Burroughs B5500 Disk File Master Control Program. Burroughs Corp., Detroit, Michigan, 1966.

[22] Calingaert, P. System performance evaluation: survey and appraisal. CACM 10,1 (1967), 12-18.

[23] Campbell, D. J. and Heffner, W. J. Measurement and analysis of large operating systems during system development. AFIPS FJCC 33 (1968), 903-914.

[24] Chu, Y. Direct execution of programs in floating code by address interpretation. IEEE EC-14,3 (6/65), 417-422.

[25] Coffman, E. G. Stochastic Models of Multiple and Time-Shared Computer Operations. Department of Engineering, University of California, Los Angeles, California, Report 66-38, 1966.
*[26] Coffman, E. G. Analysis of a drum input/output queue under scheduled operation in a paged computer system. JACM 16,1 (1/69), 73-90.

*[27] Coffman, E. G. and Varian, L. C. Further experimental data on the behavior of programs in a paging environment. CACM 11,7 (7/68), 471-474.

[28] Cohen, L. J. Stochastic evaluation of static storage allocation. CACM 4,10 (10/61), 460-464.

[29] Collins, G. O. Jr. Experience in automatic storage allocation. CACM 4,10 (10/61), 436-440.

*[30] Comeau, L. W. A study of the effects of user program optimization in a paging system. ACM Symposium on OS (10/67).

[31] Conti, C. J. Concepts for buffer storage. Computer Group News 2,8 (3/69), 9-13.

[32] Conti, C. J., Gibson, D. H. and Pitkowsky, S. H. Structural aspects of the System/360 Model 85: I. General organization. IBM S. J. 7,1 (1968), 2.

[33] Conway, M. E. A multiprocessor system design. AFIPS FJCC 24 (1963), 139-146.

*[34] Corbato, F. J. and Vyssotsky, V. A. Introduction and overview of the Multics system. AFIPS FJCC 27,1 (1965), 185-196.

[35] Daley, R. C. and Dennis, J. B. Virtual memory, processes, and sharing in Multics. ACM Symposium on OS (10/67). Also CACM 11,5 (5/68), 306.

[36] Daley, R. C. and Neumann, P. G. A general purpose file system for secondary storage. AFIPS FJCC 27 (1965), 213.

*[37] Dearnley, F. H. and Newell, G. B. Automatic segmentation of programs for a two-level store computer. TCJ 7,3 (10/64), 185-187.

[38] Denes, J. E. BROOKNET - an extended core storage oriented network of computers at Brookhaven National Laboratory. IFIP (1968), 194.

*[39] Denning, P. J. Effects of scheduling on file memory operations. AFIPS SJCC 30 (1967), 9-21.

*[40] Denning, P. J. The working set model for program behavior. ACM Symposium on OS (10/67). Also CACM 11,5 (5/68), 323.

*[41] Denning, P. J. Thrashing: its causes and prevention. AFIPS FJCC 33 (1968), 915-922.

[42] Denning, P. J. Resource Allocation in Multiprocess Computer Systems. MIT, MAC-TR-50 (1968).

[43] Dennis, J. B. Segmentation and the design of multiprogrammed computer systems. JACM 12,4 (10/65), 589.

[44] Dennis, J. B. and Glaser, E. L. The structure of on-line information processing systems. Proc. Second Congress on Information System Sciences, 1965, 5-14.

[45] Derrick, M., Sumner, F. H. and Wyld, M. T. An appraisal of the Atlas supervisor. Proc. 22 Nat. ACM (1967), 67.

[46] Dreyfus, P. L. System design of the Gamma 60. WJCC (1958), 130.

[47] Elmore, W. B. and Evans, G. J. Jr. Dynamic control of core memory in a real time system. IFIP (1965), 261.

[48] Estrin, G., Coggan, B., Crocker, S. D. and Hopkins, D. Snuper Computer - a computer in instrumentation automation. AFIPS SJCC 30 (1967).

*[49] Estrin, G. and Kleinrock, L. Measures, models and measurements of time shared computer utilities. Proc. 22 Nat. ACM (1967), 85-96.

[50] Evans, D. C. and LeClerc, J. Y. Address mapping and the control of access in an interactive computer. AFIPS SJCC 30 (1967), 23-32.

*[51] Feldman, J. A. and Rovner, P. D. An ALGOL-based associative language. CACM 12,8 (8/69), 439-449.

*[52] Fenichel, R. R. and Grossman, A. J. An analytic model of multiprogrammed computing. AFIPS SJCC 34 (1969), 717.

[53] Fife, D. W. An optimization model for time sharing. AFIPS SJCC 28 (1966), 97-104.

*[54] Fife, D. W. and Smith, J. L. Transmission capacity of disk storage systems with concurrent arm positioning. IEEE EC-14,4 (8/65), 575-582.
*[55] Fine, G. H., Jackson, C. W. and McIsaac, P. V. Dynamic program behavior under paging. Proc. 21 Nat. ACM (1966), 223-228.

*[56] Fine, G. H. and McIsaac, P. V. Simulation of a time-sharing system. Man. Sci. 12 (2/66), B180-194.

[57] Fisher, R. O. and Shepard, C. D. Time sharing on a computer with a small memory. CACM 10,2 (2/67), 77-81.

[58] Flores, I. Derivation of a waiting-time factor for a multiple-bank memory. JACM 11,3 (7/64), 265.

[59] Flores, I. Virtual memory and paging: Part I, Datamation 13,8 (8/67), 31; Part II, Datamation 13,9 (9/67).

[60] Fotheringham, J. Dynamic storage allocation in the Atlas computer including an automatic use of backing store. CACM 4,10 (10/61), 435-436.

*[61] Frank, H. Analysis and optimization of disk storage devices for time sharing. JACM 16,4 (10/69), 602-620.

*[62] Freibergs, I. F. The dynamic behavior of programs. AFIPS FJCC 33 (1968), 1163-1168.

[63]-[102] (Entries not legible in this copy.)

*[103] McKellar, A. C. and Coffman, E. G. Organizing matrices and matrix operations for paged memory systems. CACM 12,3 (3/69), 153-165.

*[104] McKinney, J. M. A survey of analytical time-sharing models. Comp. Surveys 1,2 (6/69), 103-110.

[105] Holland, F. C. and Merikallio, R. A. Simulation design of a multiprocessing system. AFIPS FJCC 33 (1968), 1399.

[106] Naylor, T. H., Wertz, K. and Wonnacott, T. H. Methods for analyzing data from computer simulation experiments. CACM 10,11 (11/67), 703-710.

*[107] Nielsen, N. R. The simulation of time-sharing systems. CACM 10,7 (1967), 397-412.

*[108] O'Neill, R. W. Experience using a time sharing multiprogramming system with dynamic address relocation hardware. AFIPS SJCC 30 (1967), 611-621.

*[109] Oppenheimer, G. and Weizer, N. Resource management for a medium scale time sharing operating system. ACM Symposium on OS (10/67). Also CACM 11,5 (5/68), 313.

[110] Penny, J. P. An analysis, both theoretical and by simulation, of a time-shared computer system. TCJ 9 (5/66), 53-59.

[111] Pinkerton, T. Program behavior and control in virtual storage computer systems. University of Michigan, CONCOMP Report 4 (4/68).

[112] Pirtle, M. Intercommunication of processors and memory. AFIPS FJCC 31 (1967), 621-633.

*[113] Randell, B. A note on storage fragmentation and program segmentation. CACM 12,7 (7/69), 365.

[114] Randell, B. and Kuehner, C. J. Dynamic storage allocation systems. ACM Symposium on OS (10/67). Also CACM 11,5 (5/68), 297.

[115] Rehmann, S. L. and Gangwere, S. G. Jr. A simulation study of resource management in a time-sharing system. AFIPS FJCC 33 (1968), 1411-1430.

*[116] Riskin, B. N. Core allocation based on probability. CACM 4,10 (10/61), 454-460.

[117] Roberts, A. E. Jr. A general formulation of storage allocation. CACM 4,10 (10/61), 419-420.

*[118] Rosen, Saul. Programming Systems and Languages. McGraw-Hill Computer Science Series (1967), p. 6.

*[119] Rosene, A. F. Memory allocation for multiprocessors. IEEE EC-16,5 (10/67), 659-665.

[120] Rosin, R. F. Determining a computing center environment. CACM 8,8 (7/65), 463-468.

[121] Sackman, H. Time sharing vs. batch processing: the experimental evidence. AFIPS SJCC 32 (1968), 1-10.

[122] Scarrott, G. G. The efficient use of multilevel storage. IFIP (1965), 137-142.

[123] Schwartz, J. I., Coffman, E. G. and Weissman, C. A general purpose time sharing system. AFIPS SJCC 25 (1964), 397-411.

*[124] Senzig, D. N. and Smith, R. V. Computer organization for array processing. AFIPS FJCC 27,1 (1965), 117-128.
*[125] Shemer, J. E. and Gupta, S. C. On the design of Bayesian storage allocation algorithms for paging and segmentation. IEEE C-18,7 (7/69).

[126] Shemer, J. E. and Gupta, S. C. A simplified analysis of processor "look-ahead" and simultaneous operation of a multimodule main memory. IEEE C-18,1 (1/69), 64-71.

*[127] Shemer, J. E. and Shippey, G. A. Statistical analysis of paged and segmented computer systems. IEEE EC-15,6 (12/66), 855-863.

*[128] Sisson, S. S. and Flynn, M. Addressing patterns and memory handling algorithms. AFIPS FJCC 33,2 (1968), 957-967.

[129] Scherr, A. L. Time-sharing measurement. Datamation 12,4 (4/66), 22-26.

*[130] Scherr, A. L. An Analysis of Time-Shared Computer Systems. MIT Press, Cambridge, Mass. (1967).

*[131] Smith, J. L. An analysis of time-sharing computer systems using Markov models. AFIPS SJCC 28 (1966), 87-95.

*[132] Smith, J. L. Multiprogramming under a page on demand strategy. CACM 10,10 (10/67), 636-646.

*[133] Stevenson, D. A. and Vermillion, W. H. Core storage as a slave memory for disk storage devices. IFIP (1968), F86-F91.

*[134] Trimble, G. R. Jr. A time sharing bibliography. CR Bibliography, Computing Reviews 9,5 (5/68), 291-301.

*[135] Varian, L. C. and Coffman, E. G. An empirical study of the behavior of programs in a paging environment. ACM Symposium on OS (10/67). Also CACM 11,7 (7/68), 471-474.

*[136] Wald, B. The Throughput and Cost Effectiveness of Monoprogrammed, Multiprogrammed, and Multiprocessing Digital Computers. NRL Report 6549, AD# 654384.

[137] Wallace, V. L. and Mason, D. L. Degree of multiprogramming in page on demand systems. CACM 12,6 (6/69), 305.

[138] Wegner, P. Machine organization for multiprogramming. Proc. 22 Nat. ACM (1967), 135-150.

[139] Weingarten, A. The analytical design of real-time disk systems. IFIP (1968), D131-137.

[140] Weingarten, A. The Eschenbach drum scheme. CACM 9,7 (7/66), 509.

[141] Weizer, N. and Oppenheimer, G. Virtual memory management in a paging environment. AFIPS SJCC 34 (1969), 249.

[142] Wilkes, M. V. Slave memories and dynamic storage allocation. IEEE EC-14,2 (4/65), 270-271.

[143] Wilkes, M. V. A model for core allocation in a time-sharing system. AFIPS SJCC 34 (1969), 265.