Report No. 363

THE USE AND PERFORMANCE OF MEMORY HIERARCHIES: A SURVEY

by

D. J. Kuck
D. H. Lawrie

December 4, 1969

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801
TABLE OF CONTENTS

I.   Introduction
II.  Page Fault Rate
     2.1  EFFECT OF PRIMARY MEMORY ALLOTMENT ON PAGE FAULT RATE
     2.2  EFFECT OF PAGE SIZE AND PRIMARY MEMORY ALLOTMENT ON PAGE FAULT RATE
          2.2.1  FRAGMENTATION AND PAGE SIZE
          2.2.2  SUPERFLUITY VS. PAGE SIZE
          2.2.3  PRIMARY MEMORY ALLOTMENT AND PAGE SIZE
     2.3  REPLACEMENT ALGORITHMS
     2.4  PROGRAM ORGANIZATION
     2.5  SUMMARY
III. Multiprogramming
IV.  Average Time Per I/O Request
     4.1  PHYSICAL LATENCY OF SECONDARY MEMORY
     4.2  EFFECTIVE LATENCY OF SECONDARY MEMORY
     4.3  REQUEST QUEUEING
     4.4  MINIMIZATION OF EXPLICIT I/O REQUEST TIME
V.   Summary and Extensions
LIST OF FOOTNOTES
BIBLIOGRAPHY
LIST OF FIGURES

1.  Mean time to reference p pages as a function of p.

2a. E vs. (p,T) surface for q = 2 x 10^4, α = 3.8, β = 2.4.

2b. E vs. (p,T) surface for q = 5 x 10^4, α = 3.8, β = 2.4.

3.  Memory fragmentation with four pages of size b_1 = 4Q, b_2 = 1.5Q, b_3 = 3.2Q and b_4 = 4Q.  B = 4Q.

4a. Page fault rate λ as a function of primary memory allotment m and page size B. Data for a FORTRAN compiler is from Anacker and Wang [4]. Note B scale is logarithmic.

4b. Page fault rate λ vs. m and B. Data for a SNOBOL compiler from Varian and Coffman [135]. Note λ scale is different than Figure 4a. Dashed lines indicate locus of equal λ.

5a. CPU efficiency as a function of the number of jobs J and average I/O completion time T. Average page rate is 1/(3.8 (64/J)^2.4) and explicit I/O interrupts occur every 10K instructions on the average.

5b. CPU efficiency as a function of J and T. Average page rate is 1/(3.8 (64/J)^2.4) and explicit I/O interrupts occur every 5K instructions on the average.

5c. CPU efficiency as a function of J and T. Average page rate is 1/(3.8 (32/J)^2.4) and explicit I/O interrupts occur every 10K instructions on the average.

6.  Relative gain G in efficiency over monoprogramming for optimal number of jobs vs. average I/O completion time (normalized). α = 3.8, β = 2.4. Numbers on curves indicate optimal number of jobs.
LIST OF TABLES

I.  Summary of Results from Varian and Coffman [135].
I. Introduction
The fundamental reason for using memory hierarchies in computer
systems is to reduce the system cost. System designers must balance the
system cost savings accruing from a memory hierarchy against the system
performance degradation sometimes caused by the hierarchy. Since modern
computers are being used for a great variety of applications in diverse
user environments, the hardware and software systems engineers' task is
becoming quite complex. In this paper we shall discuss a number of the
hardware and software elements of a memory hierarchy in a computer system.
Included are several models and attempts at optimization.
Computer engineers may choose from a number of optimization
criteria in designing a computer system. Examples are system response
time, system cost, and central processing unit (CPU) utilization. We
shall primarily discuss CPU utilization and then relate this to system
cost. Such considerations as interrupt hardware and scheduling algorithms
determine response time and are outside the scope of this paper.
In order to discuss CPU utilization, let us list a number of
reasons for non-utilization of the CPU. That is, assuming that a user
or system program is being executed by the CPU, what may be the causes
of subsequent CPU idleness?
1) The computation is completed.
2) A user program generates an internal interrupt due
to e.g., an arithmetic fault.
3) A user program generates an explicit I/O request to
secondary storage.
4) The system generates an internal interrupt due to
e.g., a page fault.
5) The system generates a timer interrupt.
6) The system receives an external interrupt from e.g.,
a real time device.
We are using "system" here to mean hardware, firmware, or software.
Point 1) will be implicitly included in some of our discussions by
assuming a distribution of execution times. Point 2) will not be
discussed. Point 3) will be discussed in some detail and point 4)
will be given a thorough discussion. Points 5) and 6) fall under
system response time and will not be explicitly discussed.
If a program (instructions and data) is being executed, let
us define a page fault to be the generation by the system of an address
outside the machine's primary memory. This leads to the generation by
the system of an I/O request to the secondary memory. Now we can describe
the CPU idle time for both points 3) and 4) above, by
CPU I/O idle time =
number of I/O requests x average time per I/O request
In this equation, "average time per I/O request" is the interval from
when an I/O request occurs until some user program is again started.
Notice that we are including both the case of explicit, user initiated
I/O requests and the case of implicit system generated page faults which
lead to I/O requests to the secondary memory. Much of our discussion
will be centered on the minimization of one or the other of the terms on
the right hand side of this equation.
It should be observed that this equation holds for multiprogrammed
as well as monoprogrammed systems. In a monoprogrammed system, the "average
time per I/O request" is defined as the interval from when an I/O request
occurs for some program until that program is again started. We regard
the execution of operating system instructions as CPU idle time. In a
multiprogramming situation, the average time per I/O request is decreased
by allowing several users to interleave their I/O requests and we shall
also deal with this case.
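Both terms of this identity come up repeatedly below. As a small illustration only, the following Python sketch evaluates it for an assumed workload; the request count, service time and instruction count are made-up numbers, not measurements from this report.

    # A minimal sketch of the idle-time accounting above.  All numbers below
    # are illustrative assumptions.

    def cpu_io_idle_time(num_io_requests, avg_time_per_request):
        """CPU I/O idle time = number of I/O requests x average time per request."""
        return num_io_requests * avg_time_per_request

    def cpu_utilization(execution_time, idle_time):
        """Fraction of elapsed time the CPU spends executing user programs."""
        return execution_time / (execution_time + idle_time)

    if __name__ == "__main__":
        # Assume 1,000 I/O requests (page faults plus explicit requests), each
        # costing 5,000 instruction times, against 10,000,000 instructions executed.
        idle = cpu_io_idle_time(1_000, 5_000)
        print("idle time:", idle, "instruction times")
        print("CPU utilization: %.2f" % cpu_utilization(10_000_000, idle))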
II. Page Fault Rate
In this section we will deal with the first term on the right
hand side of the equation of Section I. In particular, we will restrict
our attention to the rate of generation of page fault I/O requests, explicit
I requests being ignored. We consider only demand paging where one page
at a tir.e is obtained from secondary memory.
2.1 EFFECT OF PRIMARY MEMORY ALLOTMENT ON PAGE FAULT RATE
Obviously, the page fault rate will be zero if all of a program's
instructions and data are allowed to occupy primary memory. On the other
hand, it has been demonstrated that a small memory allotment can lead to
disastrous paging rates. The relationship between primary memory allotment
and page faults has been studied by a number of workers [12, 40,
41, 95, 109, 125, 127, 128, 132] and many experiments have been conducted
to determine program paging behavior [4, 9, 11, 18, 27, 55, 62,
95, 108, 111, 133, 135].
One of the statistics which is of interest is the length of the
average execution burst, 1/λ_p, the mean time (in instructions) between page
faults when p pages are present in primary memory. The mean time t_p to
reference p distinct pages is then

    t_p = Σ (i=1 to p) 1/λ_i .

Using an empirical curve for t_p = f(p) (Figure 1) we can determine

    t_p - t_(p-1) = Σ (i=1 to p) 1/λ_i - Σ (i=1 to p-1) 1/λ_i = 1/λ_p .
[Table I: Summary of Results from Varian and Coffman [135].]

[Figure 1: Mean time to reference p pages as a function of p.]
Since t_(p+1) - t_p = f(p+1) - f(p) ≈ (df(p)/dp) Δp

we have

    1/λ_p ≈ (df(p)/dp) Δp .   (1)

Thus, we can determine the λ_p values by examining empirical t_p
curves.

We will model the t_p function of a program with the formula

    f(p) = δ p^γ .   (2)

This formula has been applied to the f(p) data presented by Fine, et al.
and it was determined that δ = 1.1 and γ = 3.4. Using Eqs. (1) and (2)
where Δp = 1, we find

    1/λ_p = df(p)/dp = δ γ p^(γ-1)   (3)

or 1/λ_p = 3.8 p^2.4.
Given we are in state p (p most recently referenced pages in primary
memory) the probability of referencing a new page (page fault) at time t,
assuming a Poisson distribution, is given by p(t|p) = 1 - e^(-λ_p t). Now, if
we assume that we force the system to remain in state p by replacing the
least recently used page with the new page each time a page fault occurs,
then we might expect the system to continue to behave as before; i.e.,
the system will continue to generate faults according to 1 - e^(-λ_p t). It
- 8 "
can then be shown that the mean time between page faults in state p
is just

    φ(p) = 1/λ_p .   (4)

A more exact expression for the average execution burst, given a program
starts with one page and is allowed a maximum of p pages, should be
derived using distributions of q (see Smith [132] and Freibergs [62]
for q distributions), but is beyond the scope of this paper. We shall
settle for the approximations

    φ(p) ≈ q / f^(-1)(q) ,   q ≤ t_p   (5)

where f^(-1)(q) is the average number of pages referenced in time q ≤ t_p.
In case q > t_p,

    φ(p) ≈ q / (p + λ_p (q - t_p)) ,   q > t_p   (6)

where p + λ_p (q - t_p) is the total number of page faults generated in
time q > t_p. If q >> t_p, then λ_p q >> p and we have

    φ(p) ≈ 1/λ_p   (7)

which is Eq. (4) as q → ∞.
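The relations above are easy to tabulate. The following Python sketch evaluates Eqs. (2) through (7) using the constants fitted to Fine's data (δ = 1.1, γ = 3.4); the particular values of p and of the quantum q below are illustrative assumptions.

    # Sketch of Eqs. (2)-(7): mean time t_p to reference p pages, the fault
    # rate lambda_p, and the average execution burst phi(p) for a quantum q.
    # delta, gamma are the constants fitted to Fine's data; p and q values
    # below are illustrative choices, not values from the paper.

    DELTA, GAMMA = 1.1, 3.4                  # f(p) = delta * p**gamma, Eq. (2)

    def t_p(p):
        return DELTA * p ** GAMMA            # mean time to reference p pages

    def burst(p):
        # 1/lambda_p = df(p)/dp ~= delta*gamma*p**(gamma-1) ~= 3.8 p**2.4, Eq. (3)
        return DELTA * GAMMA * p ** (GAMMA - 1.0)

    def f_inverse(q):
        # average number of distinct pages referenced in time q (inverse of Eq. 2)
        return (q / DELTA) ** (1.0 / GAMMA)

    def phi(p, q):
        """Average execution burst for a p-page allotment, Eqs. (5)-(7)."""
        if q <= t_p(p):
            return q / f_inverse(q)                      # Eq. (5)
        faults = p + (q - t_p(p)) / burst(p)             # p + lambda_p (q - t_p)
        return q / faults                                # Eq. (6)

    if __name__ == "__main__":
        q = 20_000                           # assumed quantum, in instruction times
        for p in (4, 8, 16, 32, 64):
            print(f"p={p:3d}  t_p={t_p(p):10.0f}  1/lambda_p={burst(p):8.0f}  phi={phi(p, q):8.0f}")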
- 9 "
Each time a page fault occurs, we have to pay an average time
T to make space for and make present a page from secondary memory. Thus,
we can define the (monoprogrammed) CPU efficiency factor as

    E = q / (q + (q/φ(p)) T) = φ(p) / (φ(p) + T) .   (8)

[Figures 2a and 2b: E vs. (p,T) surfaces for q = 2 x 10^4 and q = 5 x 10^4; α = 3.8, β = 2.4.]
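A sketch of the efficiency surface of Eq. (8) is given below for the q >> t_p regime, where φ(p) ≈ αp^β. The p and T grids are arbitrary choices made only to show the shape of the tradeoff, not the exact surfaces of Figures 2a and 2b.

    # Sketch of the monoprogrammed efficiency of Eq. (8), E = phi/(phi + T),
    # using phi(p) ~= alpha * p**beta (the q >> t_p regime).  alpha, beta are
    # the values derived from Fine's data; the p and T grids are illustrative.

    ALPHA, BETA = 3.8, 2.4

    def phi(p):
        return ALPHA * p ** BETA

    def efficiency(p, T):
        b = phi(p)
        return b / (b + T)

    if __name__ == "__main__":
        T_values = (1_000, 10_000, 100_000)   # assumed I/O times, in instruction times
        print("p    " + "".join(f"T={T:>9}" for T in T_values))
        for p in (8, 16, 32, 64):
            row = "".join(f"{efficiency(p, T):11.3f}" for T in T_values)
            print(f"{p:<5d}{row}")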
In this section we have presented a very simple model of program
paging behavior in terms of the average time required to reference p pages
    t_p = δ p^γ .

Then, under the assumption that paging is a Poisson process, we derived
the average execution burst as a function of the number of pages in primary
memory

    φ(p) = dt_p/dp = α p^β .

Using these relations and values of λ, α and β derived from Fine's results,
we showed the effect on monoprogrammed efficiency of a gross time charac-
teristic T of secondary memory, primary memory allotment, and time quantum
q. This was done under the assumption that the page size was 1024 words
and that a least-recently-used page replacement algorithm was used. In
the following sections, we will examine the effects of different page
sizes, replacement algorithms, and the use of multiprogramming to mask
I/O time.
2.2 EFFECT OF PAGE SIZE AND PRIMARY MEMORY ALLOTMENT ON PAGE FAULT RATE
In the previous section we assumed that the page size was fixed
at 1024 words. As we shall see in this section, the page size, b, will
affect the page fault rate λ for two reasons. First, primary memory may
be underutilized to some extent due to a) primary memory not being filled
with potentially useful words, i.e., fragmentation and b) the presence of
words which are potentially useful but which are not referenced during a
period when the page is occupying primary memory, i.e., superfluity. Any
underutilization of primary memory tends to increase the page rate since
the effective memory allotment is decreased as analyzed in the last section
Second, more page faults may be generated when the page size is b than when
page size is 2b, simply because we only have to generate one page fault to
reference all words in the 2b page whereas to reference the same words we
have to generate two faults if the page size is b.
2.2.1 FRAGMENTATION AND PAGE SIZE
We assume that a program consists of a number of segments of
size s where s varies according to some statistical distribution with
mean s. These segments may contain instructions or data or both. The
words of a segment are logically contiguous, but need not be stored in a
physically contiguous way. Each segment is further divided into a number
of pages. The pages consist of b words which are stored in a physically
contiguous way. To allow for variable page size, we assume the system
imposes a size quantum Q < B on all storage requests such that requests
are always rounded up to the next multiple of Q. Page size b may be any
multiple of Q, but may not exceed B which is the largest number of neces-
sarily physically contiguous words which the system can handle. The ratio
B/Q may be thought of as an index of the variability of the page size. All
pages of a segment will be of size b = B except the last which will be some
multiple n of Q, b = nQ < B. The physical base address of a page may be
any multiple of Q; that is, it may be loaded beginning at any address which
is a multiple of Q. For example, if the maximum segment size s = B =
21
1024 and Q = 1, then we have the case corresponding to the Burroughs B5500.
If Q = B and s » B, then we have the case of more conventional paging
systems.
Thus, we might have several pages allocated in primary memory
as shown in Figure 3 where Q = B/4. Notice that there are two sources
of memory waste evident in Figure 3. First, memory is wasted because
every storage request must be rounded up to a multiple of Q as shown by
the wavy lines. We refer to this as internal fragmentation. Second,
memory is wasted because there are four blocks of Q words which cannot
be used to hold a full page because they are not contiguous. This is the
classical situation of checkerboarding, which we will refer to as external
fragmentation. Notice that as Q → 1, internal fragmentation diminishes
to zero, while as Q → B, external fragmentation disappears. The exact
amount of waste will be dependent on Q, B, and the distribution of segment
sizes.
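A rough simulation of the internal-fragmentation component is easy to write. The Python sketch below rounds randomly drawn segment sizes up to multiples of Q and reports the wasted fraction; the exponential size distribution and the parameter values are assumptions for illustration only (Randell [113] examined several distributions and also measured external fragmentation, which this sketch ignores).

    # Sketch of the internal-fragmentation effect: each storage request of
    # size s is rounded up to a multiple of the quantum Q; the rounded-off
    # space is wasted.  Distribution and parameter values are assumptions.
    import math
    import random

    def internal_fragmentation(sizes, Q):
        requested = sum(sizes)
        allocated = sum(math.ceil(s / Q) * Q for s in sizes)
        return 1.0 - requested / allocated        # fraction of allocated space wasted

    if __name__ == "__main__":
        random.seed(1)
        B = 1024
        s_mean = B // 2
        sizes = [max(1, int(random.expovariate(1.0 / s_mean))) for _ in range(10_000)]
        for Q in (B // 32, B // 8, B // 2, B):
            waste = internal_fragmentation(sizes, Q)
            print(f"Q = B/{B // Q:<3d} internal fragmentation = {waste:5.1%}")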
Randell [113] has studied the effects on memory utilization
of variations in these parameters. His results indicate that: 1) loss
of utilization due to external fragmentation when Q « B is not as great
as loss due to internal fragmentation when Q = B; and 2) utilization does
not change significantly with changes in the mean segment size if Q « B,
but it does change significantly with s if Q = B. It is also apparent
that if s » B, then Q makes little difference.
[Figure 3: Memory fragmentation with four pages of size b_1 = 4Q, b_2 = 1.5Q, b_3 = 3.2Q and b_4 = 4Q; B = 4Q.]

The conclusion from this is that if a program is to be segmented
where s ≈ B, then small Q is definitely desirable. If the page size must
be very small, however, considerably more page-mapping hardware is needed,
and the question becomes whether we can
afford to spend 1/2 the total cost of primary memory on the increased
paging hardware.
Unfortunately, a small B or Q is not the entire answer. While
small B or Q increases memory utilization and thus reduces the page rate
for a given memory allotment, small B or Q may also result in a corres-
ponding increase in page rate for reasons we will discuss in 2.2.3.
2.2.2 SUPERFLUITY VS. PAGE SIZE
Another factor which leads to an effective underutilization
of primary memory arises from instruction or data words which are loaded
into primary memory as part of a page but are never referenced during
that period of residency. We will refer to these words as superfluous .
We can obtain a lower bound on the number of superfluous words
by examining the total primary memory requirements M of a program as a
function of page size.^12 That is, assume primary memory is unlimited,
then M(B) is the total amount of primary memory occupied after a given
execution of the program with page size B. Now, given unlimited primary
memory, if the program is run with page sizes b = B and b = 1, then at
least M(B) - M(1) words must be superfluous. If we force the program to
run with primary memory m < M(B), then page faulting will occur and the
number of superfluous words may increase over M(B) - M(l) since some words
which are eventually referenced are not referenced during some period of
their page residency and are thus superfluous during that period.
O'Neill [108]^13 and Belady [11]^14 present M(B) statistics which
are remarkably linear over the ranges 256 < B < 2048 and 128 < B < 1024,
respectively. Even for larger page sizes M(B) is reasonably linear, but
for small B, M(B) drops off sharply. Thus, we can assume
" 17
    M(B) = a_0 + a_1 B ,   256 < B < 1024   (9)

and a_1 B is a lower bound on the number of superfluous words.^15
Unfortunately Eq. (9) only establishes a lower bound on the
number of superfluous words. It does not tell us anything about the aver-
age number of superfluous words present when primary memory is less than
that absolutely required by the program. The authors know of no published
data which pertain directly to superfluous words in this case, so we shall
move on to determine the overall effect of block size on the paging rate λ.
2.2.3 PRIMARY MEMORY ALLOTMENT AND PAGE SIZE
In Section 2.1 we discussed the average execution burst φ(p) as
a function of memory allotment in units of p, the number of b = B = 1024 word
pages. In this section we will examine the paging rate λ = 1/φ as a func-
tion of primary memory allotment in words m = pB, for various values of
page size b = B.
We would expect that for small m, λ will vary considerably with
the page size. This is because for small m, the average time each page
is in primary memory will be relatively short, and so the extra words in
larger pages will tend to go unreferenced and will only take up space
which might better be occupied by new, smaller pages. On the other hand,
as m increases, we would expect to see page size have less effect since
the probability will be higher that more words in the page will be refer-
enced due to the longer expected page residence time. In addition, we
might also expect to see, for a given m, an optimum page size B_0, such that
any B_1 > B_0 will only include superfluous words and any B_2 < B_0 will not
include enough words.
Figure 4a is a graph of λ vs. B and m based on experimental
data from a FORTRAN compiler [4]. This graph clearly exhibits that
when a program is "compressed," i.e., run in a smaller memory, large page
sizes lead to excessive paging. When the page size is small, then the
program tends to be more compressible. As m gets larger, the paging
behavior becomes less a function of B, and for large enough m, small B
may even increase the page rate. Slight minimum points were observed at
the (m,B) points (2K, 64), (4K, 256), and (8K, 256). This illustrates that if
minima exist, then they are not necessarily independent of m.
Figure 4b is another graph of λ vs. m and B data for a SNOBOL
compiler [135]. This program is evidently much less "compressible" than
the FORTRAN compiler in Figure 4a. However, it shows the same general
tendencies as Figure 4a except for the apparent lack of minima.
Another way to view the λ vs. (m,B) relationship can be seen by
observing in Figure 4b the dashed lines which pass through points of
equal λ. Notice that λ(8K, 256) is only slightly lower than λ(4K, 64).
Thus, we can effect an almost equal tradeoff between half as much primary
memory and 1/4 the page size; i.e., we double the number of pages but each
page is only 1/4 as large. However, we must also consider the increased
paging hardware necessary to handle the larger number of pages.^17
The main point to be had from these figures is that programs are
more compressible when B is small; i.e., they will tolerate a much smaller
primary memory allotment if B is small. However, too small a B may lead
to a slight increase in paging activity. (See also a study performed on
the ATLAS system by Baylis, et al. [9].)
[Figure 4a: Page fault rate λ as a function of primary memory allotment m and page size B, for a FORTRAN compiler (data from Anacker and Wang [4]).]

[Figure 4b: Page fault rate λ vs. m and B for a SNOBOL compiler (data from Varian and Coffman [135]). Dashed lines indicate the locus of equal λ.]

The above results further support arguments for variable page
sizes allowing logically dependent words (e.g., subroutines or array rows)
to be grouped in a page without leading to underutilization of memory
due to internal fragmentation or superfluity. Logical segmentation of
code and data will be taken up more generally in later sections.
2.3 REPLACEMENT ALGORITHMS
Whenever it is necessary to pull a new page, i.e., transfer
a new page from secondary to primary memory, it is also necessary to
select a replacement page in primary memory to be pushed (transferred
to secondary memory) or overlayed. If we assume that all programs are
in the form of pure procedures, then we never need to push program pages.
Data pages need to be pushed only if we have written into them. The
selection of a replacement page is done by a replacement algorithm. A
number of these algorithms have been proposed and evaluated [9, 11,
12, 17, 18, 27, 40, 41, 86, 116, 125, 135] where Belady [11] has
produced the most extensive summary and evaluation to date. The various
algorithms can be classified according to the type of data which is used
by the replacement algorithm in choosing the replacement page.
Type 1) The first type of information pertains to the length
of time each page has been in primary memory. The page (or
class of pages) which has been in memory the longest is
pushed or overlayed first. This information forms the basis
of what are usually referred to as FIFO algorithms. This is
the simplest type of information to maintain and it usually
requires no special hardware to implement.
Type 2) Type 2 information is similar to Type 1 information
but "age" is measured by the time since the last reference
to a page rather than how long the page has been in primary
memory. This information is the basis of the so-called
least -recently-used replacement algorithms. Many variations
exist, e.g., based on the fineness of age measurement. Systems
which accumulate this type of information usually employ
some type of special hardware to record page use statistics.
Type 3) Information as to whether or not the contents of
a page have been changed is frequently used to bias the
selection towards pages which have not been changed and
thus do not have to be pushed (but simply overlayed) since
an exact copy is still available in secondary memory.
Special hardware is needed to record the read-only/write
status of each page in primary memory.
Type 4) In the ATLAS system [9, 86] the length of the
last period of inactivity is recorded for all pages in a
program. This information is used to predict how long the
current period of inactivity will be, i.e., how soon a page
will be referenced again. Replacement is biased towards
pages which, on the basis of this information, are expected
to be inactive for the longest time. This type of information
is particularly useful for detecting program loops as was
intended by the ATLAS designers.
Belady [ 11 ] has evaluated the performance in terms of page fault rate of
a nu".ber of algoritnr.s as functions of page size and primary memory allot-
nent, anu we will now discuss his results.
The simplest algorithm studied was the RANDOM algorithm. This
uses no information about pages, but chooses a replacement page randomly
from those in primary memory. The use of Type 1 information (time in
primary memory) never significantly improves performance relative to
RANDOM and in some cases performance is worse than RANDOM.
The use of Type 2 information (time since last read or write)
leads to the most significant and consistent improvement in performance.
With these algorithms the accuracy with which "age" is measured does not
seem to have much effect on performance, however. That is, performance
does not change significantly whether we keep a complete time history of
pages in primary memory, or just divide all pages into two classes—
recently used and not-so-recently used. The use of Type 3 information
(read-only/write status) in addition to Type 2 information does not affect
the total number of page faults very much. However, it does increase per-
formance due to the fact that no push is required on 10 to 60% of all page
faults.
The ATLAS algorithm [86] which used both Type 2 and Type 4 informa-
tion is the most complex algorithm studied, and it is interesting to note
that it consistently leads to worse results than Type 2 algorithms and is
sometimes worse than RANDOM or FIFO. This result has been further sub-
stantiated by Baylis, et al. [9]. Apparently, the problem is that most
programs do not have a regular or small enough loop structure to warrant
the use of the ATLAS algorithm which is intended to take advantage of
program loops.
Thus, algorithms which make replacements on the basis of least
recently referenced pages and which bias towards read-only pages would seem to
be best in terms of cost effectiveness. However, for existing systems
which do not have the hardware necessary to automatically maintain Type 2
and/or Type 3 information, RANDOM, FIFO or programmer directed schemes
must be used.
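The qualitative ranking of these policies can be reproduced with a small simulation. The Python sketch below counts demand-paging faults for RANDOM, FIFO and a least-recently-used (Type 2) policy on a synthetic reference string with some locality; the reference string and frame count are assumptions for illustration, not data from Belady's study.

    # Sketch comparing fault counts of RANDOM, FIFO and least-recently-used
    # replacement under demand paging on a synthetic reference string.
    import random
    from collections import OrderedDict, deque

    def count_faults(refs, frames, policy):
        faults = 0
        if policy == "LRU":
            resident = OrderedDict()
            for page in refs:
                if page in resident:
                    resident.move_to_end(page)           # record the reference
                else:
                    faults += 1
                    if len(resident) >= frames:
                        resident.popitem(last=False)     # evict least recently used
                    resident[page] = True
        else:
            resident, order = set(), deque()
            for page in refs:
                if page not in resident:
                    faults += 1
                    if len(resident) >= frames:
                        if policy == "FIFO":
                            victim = order.popleft()
                        else:                            # RANDOM
                            victim = random.choice(tuple(resident))
                            order.remove(victim)
                        resident.remove(victim)
                    resident.add(page)
                    order.append(page)
        return faults

    if __name__ == "__main__":
        random.seed(2)
        # 90% of references go to a small "loop" of pages, 10% elsewhere.
        refs = [random.randrange(8) if random.random() < 0.9 else random.randrange(64)
                for _ in range(20_000)]
        for policy in ("RANDOM", "FIFO", "LRU"):
            print(policy, count_faults(refs, frames=12, policy=policy))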
2.4 PROGRAM ORGANIZATION
Comeau [30] has shown that simply by reordering the assembler
deck of the Cambridge Monitor System to cause logically dependent routines
to be grouped together, paging of the monitor was reduced by as much as
60%. Brawn and Gustavson [18] and McKellar and Coffman [103] have
shown that simple changes in computation algorithms, such as referencing
matrices by square partition instead of row or column, can also effect
large improvements in paging activity. (See also [36, 37, 51, 73].)
These studies indicate that:
1) Programmers need to be aware of the paged and/or segmented
environment in which their programs will be executed.
Program optimization by reducing page faults is more
important than classical optimization techniques (e.g.,
common subexpression elimination).
2) Programmers should be able to direct or advise the compiler
as to which code should be placed in which page/segment.
3) If possible, subroutine or procedure code should be placed
in the code segment where it is called. If this code is
small and is used in several different segments, then
several copies of the subroutine could be generated, one
in each segment where it is called.
4) More emphasis should be placed on compiler optimization
of code through strategic segmentation. For example, by
analyzing the structure of a program (see Martin and Estrin
[99]) the compiler could make better segmentation decisions
and provide information which the operating system could
use to make replacement decisions, and to perform prepaging.
In addition, compilers might be able to detect certain
cases of poor data referencing patterns and issue appro-
priate warnings to the programmer.
Thus, we can improve paging behavior both by changing the physi-
cal parameters of the system and by intelligent program organization. The
latter method would appear to have a higher cost effectiveness and should
not be overlooked.
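The payoff from referencing matrices by square partition, mentioned above in connection with Brawn and Gustavson [18] and McKellar and Coffman [103], can be seen with a small experiment. The Python sketch below counts page faults under least-recently-used replacement for two traversal orders of a row-major matrix; the matrix size, page size and frame count are illustrative assumptions.

    # Sketch of the program-organization point above: the same matrix touched
    # column by column versus by square partitions generates very different
    # fault counts when only a few page frames are available.
    from collections import OrderedDict

    N, PAGE, FRAMES, BLOCK = 256, 1024, 8, 32     # N x N matrix, 1024-word pages

    def page_of(i, j):
        return (i * N + j) // PAGE                # row-major storage

    def faults(refs, frames=FRAMES):
        resident, n = OrderedDict(), 0
        for p in refs:
            if p in resident:
                resident.move_to_end(p)
            else:
                n += 1
                if len(resident) >= frames:
                    resident.popitem(last=False)  # least-recently-used replacement
                resident[p] = True
        return n

    def column_order():
        return (page_of(i, j) for j in range(N) for i in range(N))

    def blocked():
        return (page_of(bi + i, bj + j)
                for bi in range(0, N, BLOCK) for bj in range(0, N, BLOCK)
                for i in range(BLOCK) for j in range(BLOCK))

    if __name__ == "__main__":
        print("column-by-column:", faults(column_order()))
        print("block-by-block:  ", faults(blocked()))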
2.5 SUMMARY
As we have noted, CPU efficiency can be related to the page
fault rate and the average time T to satisfy these I/O requests. In
Section II we have tried to illustrate the relationships between page
fault rate and primary memory size, primary memory allotment, page size,
replacement algorithm, program organization, and secondary memory
characteristics. Our intent has only been to indicate trends and general
relationships, and with this in mind our models have not been very elaborate.
However, all our models have been based on observed program behavior and
are probably accurate, at least for the classes of programs studied.
III. Multiprogramming
Multiprogramming arises for two reasons:
1) In an attempt to overlap I/O time by having one program
be executed while other programs are waiting for I/O
(implicit or explicit).
2) In order to provide quick response to several real
time jobs (time sharing, process control, etc.).

We will concern ourselves only with the first of these functions.
Whenever several concurrent programs share memory
in order to "mask" I/O time each program operates with less primary
memory than it would have if it were running alone. As we have seen,
this causes the paging rate for each program to increase. On the other
hand, by multiprogramming we are able to decrease the average time per
I/O request (both paging and explicit). Several questions now arise:
First, when does the degradation of efficiency due to increased page
traffic become greater than the increase in efficiency due to more I/O
masking? Second, how much of an improvement can we expect with multi-
programming over monoprogramming?
Gaver [65] has presented an analysis of multiprogramming based
on a probability model which relates CPU efficiency to the number of con-
current jobs J, where each job runs for an average of 1/r instructions
(hyperexponentially distributed) before generating an I/O interrupt, and
I/O requires an average of T instruction times to complete (exponentially
distributed). Unfortunately, Gaver does not consider the fact that as
J increases, each job must be executed with less primary memory and thus
paging I/O increases. However, this is fairly easy to add to his model,
using the results of Section 2.1.
Suppose the total available primary memory is M pages^20 and all
programs are identical and are allocated equal amounts of this memory.
Then the memory allotment for each program is just M/J.^21 The paging rate
λ for each program as a function of J is then

    λ(J) = 1/φ(M/J)   (10)
where φ(p) was defined in Section 2.1. We will assume this is exponent-
ially distributed. As in Section 2.1 we will use the function φ(M/J) = α(M/J)^β
to model the average execution burst of each job, so that λ(J) = 1/(α(M/J)^β);
in addition, each job generates explicit I/O requests at rate r.

[Figures 5a, 5b and 5c: CPU efficiency E as a function of the number of jobs J and the average I/O completion time T.]

Notice in Figure 5a that for T > 6000, there is no gain
to be had from multiprogramming. This does not mean that multiprogramming
with this system configuration is bad. It merely illustrates that for this
system it is not wise to multiprogram programs characterized by α = 3.8,
β = 2.4 and 1/r = 10,000. (If 1/r = 5000, then running 2 jobs is advan-
tageous; see Figure 6.)
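Gaver's closed-form expressions are not reproduced here. Instead, the Python sketch below estimates E(J,T) by directly simulating the assumptions stated above: each of J identical jobs computes for an exponentially distributed burst whose mean combines the page-fault rate λ(J) with the explicit I/O rate r, then waits an exponentially distributed I/O time of mean T, with a single CPU serving ready jobs. The exponential (rather than hyperexponential) bursts and all numerical values are simplifying assumptions, so the output only gives the flavor of Figures 5a-5c.

    # Simulation sketch (not Gaver's closed-form model) of CPU efficiency
    # under multiprogramming with paging added via Eq. (10).
    import heapq
    import random

    ALPHA, BETA = 3.8, 2.4

    def efficiency(J, T, M=64, r=1.0 / 10_000, horizon=5_000_000, seed=3):
        rng = random.Random(seed)
        lam = 1.0 / (ALPHA * (M / J) ** BETA)        # page-fault rate per job, Eq. (10)
        mean_burst = 1.0 / (lam + r)                 # compute time between I/O waits
        ready = list(range(J))                       # jobs ready to use the CPU
        io_done = []                                 # heap of (completion time, job)
        clock = busy = 0.0
        while clock < horizon:
            if ready:
                job = ready.pop()
                burst = rng.expovariate(1.0 / mean_burst)
                clock += burst
                busy += burst
                heapq.heappush(io_done, (clock + rng.expovariate(1.0 / T), job))
            else:
                clock, job = heapq.heappop(io_done)  # CPU idles until an I/O completes
                ready.append(job)
            while io_done and io_done[0][0] <= clock:
                ready.append(heapq.heappop(io_done)[1])
        return busy / clock

    if __name__ == "__main__":
        for T in (1_000, 5_000, 20_000):
            row = "  ".join(f"J={J}: {efficiency(J, T):4.2f}" for J in (1, 2, 4, 8))
            print(f"T={T:6d}  {row}")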
This introduces the scheduling problem. That is, which jobs
should be run concurrently? A good scheduler whose purpose is to maximize
throughput should be able to use information about programs' working sets
or α, β characteristics to determine an optimal load. We will not pursue
this subject further here (see Denning [40, 41] and Heller [70]).
Figure 6 shows the relative gain in efficiency over monoprogram-
ming due to multiprogramming with an optimal number of jobs J*,

    G = (E(J*) - E(1)) / E(1) ,   (13)

as a function of T for several combinations of r and M (in all cases,
α = 3.8, β = 2.4). This figure illustrates that for multiprogramming to
yield a reasonable gain, there must be sufficient primary memory (note
the M = 32 curves).
Literature on multiprogramming and time-sharing is extensive
and we will not attempt to present a comprehensive bibliography here
(instead, see Buchholz [20], Calingaert [22], McKinney [104], Trimble
[134] and Bell and Pirtle [14]). Some useful studies can be found in
[12, 49, 52, 56, 65, 107, 130, 131, 132, 136].
[Figure 6: Relative gain G in efficiency over monoprogramming for the optimal number of jobs vs. average I/O completion time (normalized); α = 3.8, β = 2.4. Numbers on curves indicate the optimal number of jobs.]
IV. Average Time Per I/O Request
In Section II we introduced T as the average interval between
the time when a program is forced to stop (due to a lack of instructions
or data in primary memory) and the time when the program could resume.
In 2.1 and III, we showed that CPU efficiency is highly correlated with
the magnitude of T (see Figures 2 and 5). In the following sections we
will examine T in more detail. Specifically, we will discuss techniques
whereby T can be reduced.
Secondary storage devices range from extended core storage to
magnetic tape, but the most common device in use today is the disk file.
The time required for these devices to deliver a block of b words can be
generally characterized by

    T = t_q + t_a + b/ρ   (14)

where t_q is queueing time before the disk logic recognizes an I/O request;
t_a is the sum of head positioning latency and rotational latency, and ρ
is the transmission rate between primary and secondary memory. Four ways
in which we can decrease the average T are:

1) Decrease t_a by making the disk spin faster, using more
heads per surface, or by using extended core storage.

2) Making the disk spin faster or using higher bit densities
increases ρ. We might also increase ρ directly
by reading more heads simultaneously.

3) Use parallel queueing techniques so that the average
T over n requests is less than T.

4) Change the distribution of t_a by planning the layout
of data on the disk in such a way that the data is
almost under the read heads when it is needed (this
technique is only practical in systems doing large
calculations where a dedicated disk is available).
Alternately, we can prefetch data blocks (buffering).
We will now discuss some of these techniques.
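The sketch below simply evaluates Eq. (14) for two assumed devices; the parameter values are illustrative, not figures from this report.

    # Sketch of Eq. (14): the average time to deliver a block of b words from
    # a rotating device.  The device parameters below are assumptions.

    def request_time(t_q, t_a, b, rho):
        """T = t_q + t_a + b / rho; times in milliseconds, rho in words/ms."""
        return t_q + t_a + b / rho

    if __name__ == "__main__":
        devices = {
            # name: (queueing ms, positioning + rotational latency ms, words/ms)
            "movable-head disk": (2.0, 75.0, 100.0),
            "fixed-head drum":   (2.0, 8.5,  500.0),
        }
        b = 1024                     # block size in words
        for name, (t_q, t_a, rho) in devices.items():
            print(f"{name:18s} T = {request_time(t_q, t_a, b, rho):7.1f} ms for a {b}-word block")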
4.1 PHYSICAL LATENCY OF SECONDARY MEMORY
Consider a disk system with one movable head per surface and
with all heads fixed to the same head positioner assembly. Now t_a, the
access time for this device, is the sum of two statistically distributed
times: t_p, the time to position a head, and t_f, the time required for
the desired sector to come under the heads:^22

    t_a = t_p + t_f .   (15)
One way to make this disk faster is to add more heads to each arm so that
the arm does not have to move so far to position a head over the right
track. This tends to decrease t_p.
Another way to decrease t_p would be to have independent posi-
tioners for each surface. Fife and Smith [54] have presented a good
analysis of this technique. Several manufacturers have eliminated t_p
altogether by providing one fixed head per track. To provide further
speedup we could introduce multiple heads per track (a matter which pre-
sents technological difficulties) or use a drum which typically rotates
faster than a disk but does not have as large a capacity. Both of these
latter techniques reduce t_f in Eq. (15). (See also [133].)
Any further improvement in the physical response of secondary
memory probably must come from the use of extended core storage (ECS).
This is potentially quite expensive (the cost per word being typically
more than one-tenth that of primary memory) but is considerably faster
as latency is on the order of ten microseconds as opposed to tens of
milliseconds for disks and drums. This could double CPU efficiency
(see Figures 2 and 5) but must be evaluated on the basis of cost effec-
tiveness. Several studies of the use of ECS can be found in [7, 63,
68, 79, 83, 101].
4.2 EFFECTIVE LATENCY OF SECONDARY MEMORY
Several techniques can be used to decrease the effective latency
of a disk device without changing its physical characteristics. For
instance, if several requests for blocks from the disk are waiting for
service, then we can decrease the average latency over all requests by
servicing requests in the order in which the required blocks come under
the heads. Another possibility which can be used in certain special cases
is to coordinate the layout of blocks on the disk with the timing of the
program so that blocks will be almost under the heads when they are needed.
4.3 REQUEST QUEUEING^23
We will assume that at any given time, there are n requests for
service from secondary memory (these requests having been generated by
the several programs being multiprogrammed). We also assume that the
secondary memory is a rotating device divided into M tracks, each track
being further divided into N sectors. Each request is for access to a
particular track and sector. The rotation time of the device is T_R.
Each request waiting for service will experience a delay T, the
sum of t_q (time in queue), t_a (access time), and t_r (transmission time,
assumed constant).
The simplest way to service these requests is to establish a
single queue which is serviced on a first in, first out (FIFO) basis.
A better strategy is to service requests according to which request can
be serviced next (FSFO), i.e., the request whose required track and sector
is due under the heads next is serviced first. Denning [39] shows that
for a fixed head per track device the ratio of delay time under FIFO to
delay time under FSFO is

    (FIFO delay) / (FSFO delay) = n(N + 2) / [N + 2(n + 1)] .   (16)
For N = 64 sectors and n = 10 requests the relative improvement given by
Eq. (16) is 7.66. That is, the response of a fixed head device with 64
sectors and 10 waiting requests is 7.66 times better under FSFO than under
FIFO.^24 An analysis of movable head devices shows that improvement can
also be effected by similar scheduling algorithms, but the improvement is
not as dramatic.
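The improvement of Eq. (16) is easy to tabulate; the sketch below evaluates it for the case quoted above, N = 64 sectors and n = 10 requests (the formula gives about 7.7), and shows how the ratio grows with queue length.

    # Sketch of Eq. (16): ratio of expected FIFO delay to FSFO delay for a
    # fixed-head device with N sectors and n waiting requests.

    def fifo_over_fsfo(n, N):
        return n * (N + 2) / (N + 2 * (n + 1))

    if __name__ == "__main__":
        print(f"N=64, n=10: {fifo_over_fsfo(10, 64):.2f}")     # about 7.7
        for n in (1, 5, 10, 20, 40):
            print(f"n={n:2d}  ratio={fifo_over_fsfo(n, 64):6.2f}")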
4.4 MINIMIZATION OF EXPLICIT I/O REQUEST TIME
A number of large scale calculations require space for their
data and instructions which exceeds the available primary memory. These
calculations involve operations on very large arrays and may require
several tens of hours per production run on the fastest computers. In
such cases there is no point to interval time slicing of the computation
for user interaction, although system throughput can be enhanced by multi-
programming, as discussed in Section III. If we restrict our attention
only to these kinds of large jobs, then one limiting case is a large
machine with one large job at a time, i.e., batch processing. We will
now turn to a discussion of preplanning the layout of a secondary storage
device in such a way that explicit I/O request time is minimized. The
interleaving of several jobs will not be discussed except to remark that
in such cases the execution time requirements become less stringent for
each job, but the sequencing of the interleaved steps presents new
difficulties.
Historically, there are many examples of preplanned drum layout.
When drums were used as primary memory, optimizing assemblers would locate
the sequence of instructions at appropriate intervals around the drum so
that (in jump free segments of code) the next instruction would always be
available when the previous one was finished [118]. For current machines
in monoprogramming mode, it is reasonable to assume that enough code resides
in primary memory at any time so that the time required to perform instruc-
tion overlays is negligible. However, data overlays may be extensive and
we might be able to decrease the latency involved in obtaining data blocks
from secondary memory by planning the layout of these data blocks and pre-
fetching data.
The question of overlaying data must be considered with respect
to the average amount of processing which may be performed on each data
element. Many matrix calculations (e.g., multiplication, inversion, eigen-
value calculation) require aN³ operations where a < 1 and N is the dimen-
sion of the matrix. Also, it can be empirically observed that a number of
partial differential equation solution techniques on N x N meshes require
aN² operations per iteration, where a is generally smaller than in the
matrix case but usually greater than 0.1. In the partial differential
equation case it is sometimes possible to iterate several times on a
block in memory, thus increasing a. If we assume aN³ operations on N²
data elements, then each element requires aN operations, where an opera-
tion may be regarded as, e.g., a multiply, an add, and a memory fetch, or
say, one microsecond on a current machine. Let us assume a machine with
N² words of memory available for each block transmitted from the disk.
This allows aN³ microseconds of computation per block. If a = .5 and
N = 64, then we compute for about 125 milliseconds per block. This is
more time than is required for the rotation of any current large disk,
which is usually in the range of 40 to 60 milliseconds. Thus, if we
can always keep one input request ahead in a disk queuer, it should be
possible to completely mask the I/O request time.
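The arithmetic of the preceding paragraph can be checked directly: with aN³ operations per block of N² words at one microsecond per operation, computation per block is aN³ microseconds. The sketch below compares this against the quoted 40-60 millisecond rotation range; for a = 0.5 and N = 64 it gives roughly 130 ms, in line with the estimate of about 125 ms above.

    # Worked check of the compute-time-per-block argument above.

    def compute_ms_per_block(a, N):
        return a * N**3 / 1000.0          # microseconds -> milliseconds

    if __name__ == "__main__":
        a, N = 0.5, 64
        per_block = compute_ms_per_block(a, N)
        print(f"a={a}, N={N}: {per_block:.0f} ms of computation per block")
        for rotation_ms in (40, 60):
            ok = per_block > rotation_ms
            print(f"  disk rotation {rotation_ms} ms -> I/O {'can' if ok else 'cannot'} be fully masked")
        # a processor ten times faster leaves only per_block/10 ms per block
        print(f"10x faster CPU: {per_block / 10:.0f} ms per block")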
As the ratio of processor speed to disk rotation speed gets
larger, this problem becomes more difficult. Suppose we have a calcula-
tion with the same parameters as above, but we wish to use a processor
which is ten times faster. Then we have only 12 milliseconds of computa-
tion time per block and this is faster than the rotation time of any large
disk. There are several obvious ways to avoid this problem. One is to
increase N; this may require a larger primary memory. Another is to sup-
ply the disk queuer with several requests, thereby decreasing the expected
time until some request is honored [ 39]. In some cases there are uniform
but intricate relationships between the data blocks and their processing
sequerce. To handle these cases, we can attempt a third solution, namely
the preplanning of block layout on the disk.
Consider the problem of
matrix multiplication using a head per track disk. Suppose that both
operand matrices are partitioned into square blocks, that the premultiplier
is stored by rows of partitions, and that the postmultiplier is stored
by columns of partitions. Let us also assume that the angle on the disk
between the positions of successive partitions represents the disk motion
time equal to the processor time required to multiply two square partitions.
Now if it happens that one row (and column) of partitions ends just where
the next starts, then it is clear that such a disk storage scheme allows
matrix multiplication with no CPU time lost due to waiting for data from
the disk. It is also clear that if a sequence of matrix operations are
required, then the preplanning of the disk layout becomes more complex.
In general, some I/O wait time will be required of the CPU. However, in
order to use any matrix as a premultiplier or postmultiplier, it is possi-
ble to store all matrices in such a way that they may be fetched by row
partitions or column partitions. This is achieved by storing the second
partition of the first row, say A_12, in the same relative position on the
disk as the first partition in the second row, say A_21. This skewing
pattern may be continued in the obvious way, given a sufficient number of
disk surfaces. Matrix inversion and eigenvalue calculations require much
more intricate disk storage schemes, but the problems are similar [91].
A somewhat more difficult set of constraints is encountered in
some problems, e.g., explicit partial differential equation methods. In
these cases it is necessary to sweep through an array of data repeatedly.
When any partition of the array is being processed, it is necessary also
to have some data elements from neighboring partitions. For example, if
a five point finite difference operator is being applied to M element
partitions of an array, then √M border elements are required from each
of the four adjacent partitions. It should be possible to pack these
border elements in separate arrays, then write and read them on and off
the disk at appropriate times. Assume the calculation on an M element
partition requires time T_c. Next assume it is possible to map partitions
of the array onto the disk such that the one-way transmission time for a
partition is (T_c - ε)/2. Now we can read a new block and write an old block
in 2((T_c - ε)/2) = T_c - ε. If the edge values of the neighboring blocks can be
transmitted in and out to the disk in ε time units, then the scheme main-
tains a steady state balance between computation time and I/O transmission
time.
A somewhat weakened set of conditions is imposed in Bernott [15]
where it is assumed that T_c is not less than five times the one-way trans-
mission time for a block. Various depths of finite difference operators
and any rectangular mesh are allowed. Also, the number of variables being
computed is a parameter. In terms of several latency considerations and
the above mentioned parameters, a disk layout is computed which gives a
resulting computation scheme that has an overall expected CPU efficiency
greater than 80%.
V. Summary and Extensions
As computer systems become more complex and as users' require-
ments become more specialized, the computer system designer must give
more attention to overall system cost performance when he designs each
part of the system. In other words, he must study more and more trade-
offs between various parts of the system.
In this paper we have discussed some interrelations between
system parameters including: primary memory size, page size, secondary
memory speed, I/O request queuers, and the number of jobs multiprogrammed.
These together with user program parameters including: mean time to access
p pages, number of instructions executed per datum and regularity of addres-
sing a data structure have a major influence on the CPU efficiency.
We limited our discussion to two-level memory hierarchies, but
the techniques mentioned can be applied to more levels by lumping several
levels and reducing the problem to one of two levels. This requires approx-
imating the parameters of a lumped level using the parameters of the levels
being combined. The use of a two-level primary memory is quite successful
in the IBM 360/85 [66]. It is also common to use a fast drum between
primary memory and a slow disk [34]. Machines which operate on arrays
of data and are organized as arrays of arithmetic processors are now being
designed. For example, the pipeline processors [124] (which might be called
serial array processors) and ILLIAC IV [10] (which might be called a paral-
lel array processor) have many individual memory units, and this fact makes
it necessary to carefully plan the layout of data in primary memory for
maximum CPU utilization. The kinds of storage planning discussed below
might be regarded either as minimizing the number of data faults or the
time per data fault because the question is that of supplying data to the
processor from the primary memory at a maximum rate.
Serial array processors generally require a memory whose effec-
tive cycle time is equal to the CPU clock time. This is achieved by inter-
leaving many slower memory units in a large bank. Since, in general two
vectors are entering the processor and one is emerging, it is convenient
if at least three such banks are available. Clearly, serious memory con-
flicts can arise in this situation. If two argument vectors are stored
in the same bank, the processing speed may be cut in half.
Since present serial array processors reach a speed limit due
to the fact that the pipeline length can be made no longer than the number
of elementary steps in an arithmetic operation, parallel array processors
seem to be a logical necessity for more speed improvement. The memory
system of ILLIAC IV consists of one memory unit per processor. Each mem-
ory unit is directly accessible by just one processor. A network of rout-
ing logic may be used to get data to other processors. If one-dimensional
arrays are stored with one element per processor, then the full speedup
over a single processor may be achieved. In two-dimensional arrays, row
operations are easy to perform with a straightforward mapping of an array
into the memory, e.g., rows are stored across the processors and each
column is within a processor. Similarly, column operations are easy with
a transposed array. However, if both row and column operations are required
with such a storage scheme using an n processor machine, then operations
in one direction will realize an n-fold speedup but operations in the other
direction will realize no speedup at all over a one processor machine. If
row and column operations are required, some kind of skewing scheme as out-
lined in Section IV will provide the full speedup [90]. It may be expected
that in the future, parallel arrays of pipeline processors will require even
more intricate primary storage mapping schemes.
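One simple skewing scheme of the kind referred to above stores element A[i][j] of an n x n array in memory module (i + j) mod n, so that both a row and a column spread across all n modules. The Python sketch below contrasts this with the straightforward mapping in which each column falls entirely within one module; the modulus-n skew is used here only for illustration.

    # Sketch of skewed storage for a parallel array processor with n memory
    # modules.  skewed() spreads both rows and columns across all modules;
    # unskewed() puts each column in a single module.

    def skewed(i, j, n):
        return (i + j) % n         # skewed storage

    def unskewed(i, j, n):
        return j % n               # rows across modules; each column within one module

    def modules_touched(indices, mapping, n):
        return sorted({mapping(i, j, n) for i, j in indices})

    if __name__ == "__main__":
        n = 8
        row = [(3, j) for j in range(n)]          # row 3
        col = [(i, 5) for i in range(n)]          # column 5
        print("skewed:   row", modules_touched(row, skewed, n),
              " column", modules_touched(col, skewed, n))
        print("unskewed: row", modules_touched(row, unskewed, n),
              " column", modules_touched(col, unskewed, n))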
It should be remembered that we have been discussing just one
underlying subject throughout this paper: the ratio of cost to performance
for an overall computer system. We have attempted to relate several memory
parameters and program characteristics to the system performance as meas-
ured by CPU utilization.
LIST OF FOOTNOTES

1.  Note we always measure time in instruction executions; i.e., we scale time by the average instruction time.

2.  The results of these experiments consisted of 1737 execution bursts from 102 service intervals for five programs: 1) LISP, 2) an interpretive meta compiler, 3) an interpretive, initially interactive, display generation system, 4) an interactive JOVIAL compiler, and 5) a concordance generation and reformatting program. Page size was 1024 words.

3.  This corresponds to imposing a variable q on the program. Smith [132] indicates this q had a hyperexponential distribution, one of the component means being 40.7 x 10^3.

4.  See Denning [40, 41].

5.  We assume that the first page is referenced at t = 0 with probability 1 (t_1 = 0), which accounts for the difference between this formula and that of Shemer and Shippey.

6.  Determined from a least-squares fit to the function ln t_p = a + γ ln p, where δ = e^a. Average error over 18 points was 16%.

7.  It should be remembered that values of α and β are characteristics of a given program or class of programs, and should not be used to describe all programs. A similar study of results [135] from a SNOBOL compiler yielded φ(p) = .54 p^β.

8.  Belady and Kuehner [12] suggest a similar function.

10. Segment size was generated from several distributions. B was 1024 and Q was varied from 32 to 1024 in powers of 2. Total memory size was 32K. It was assumed that requests for memory were always waiting to be filled.

11. For Q = B/32, utilization varied from over 95% for s = 4B to about 90% for s = B/2. At Q = B, utilization varied from just under 90% for s = 4B to about 40% at s = B/2.

12. Until stated otherwise, we now assume b = Q = B, i.e., page size is constant over a given experiment.

13. This data comes from two program loads: 1) "10 small FORTRAN compilations and loads" and 2) "FORTRAN compilations, and executions, used to debug the 44X FORTRAN compiler." Apparently, there is negligible internal and external fragmentation in this experiment.

14. This data is from an integer programming calculation.

15. Since apparently M(1) < a_0 + a_1.

16. Again in this and the following experiment, there is apparently negligible fragmentation.

17. See Rosene [119].

18. We will only consider the case where I > J; i.e., there are no conflicts for secondary memory.

19. The assumption of an exponential distribution of I/O completion time is not particularly realistic, as Gaver admits. Since we are using T to represent the average time required to complete all kinds of I/O requests, paged or explicit, the density of T will probably consist of a collection of exponential, Gaussian, and delta functions. However, even with a simple exponential distribution, the total expectation functions become quite complex, and a more complex distribution would not be warranted here. See Smith [132] for a slightly different model.

20. Pages are here assumed fixed at 1024 words.

21. Actually, this could only be true if M were some multiple of J. However, if M >> J, this is not a bad approximation. We also assume here that programs are not swapped out of primary memory while waiting for I/O.

22. See Frank [61] for an analysis of the statistical properties of disk systems.

23. Our development in this section will follow Denning [39]. See also [26, 132, 139, 140].

24. The particular case of Gaver's model which we used in Section III assumed no conflicts for secondary memory, i.e., rate of I/O completion was not dependent on the number of jobs (requests). The techniques discussed here are not as good as those assumed in Section III.
BIBLIOGRAPHY

* Referenced in text.

[1] Arden, B. W. Time sharing systems: a review. Michigan Summer Conference on Computer and Program Organization, 1967.

[2] Arden, B. W. Time sharing measurement and accounting. Michigan Summer Conference on Advanced System Programming, 1969.

[3] Arden, B. W., Galler, B. A., O'Brien, T. C. and Westervelt, F. H. Program and addressing structure in a time sharing environment. JACM 13,1 (1/66), 1-16.

*[4] Anacker, W. and Wang, C. P. Performance evaluation of computing systems with memory hierarchies. IEEE EC-16,6 (12/67), 765-773.

[5] Aspinall, D., Edwards, D. B. G. and Kinniment, D. J. Associative memories in large computer systems. IFIP (1968), D81-85.

[6] Aspinall, D., Edwards, D. B. G. and Kinniment, D. J. An integrated associative memory matrix. IFIP (1968), D86-90.

[7] Badger, G. F. Jr., Johnson, E. A. and Philips, R. W. The Pitt time sharing system for the IBM system 360: two years experience. AFIPS FJCC 33 (1968).
[8] Bairstow, J. N. Time sharing. Electronic Design 16,9 (1968), C1-C22.

*[9] Baylis, M. H. J., Fletcher, D. G. and Howarth, D. J. Paging studies made on the I.C.T. Atlas computer. IFIP (1968), D113.

*[10] Barnes, G. H., et al. The ILLIAC IV computer. IEEE EC-17,8 (8/68), 746-757.

*[11] Belady, L. A. A study of replacement algorithms for a virtual storage computer. IBM S. J. 5,2 (1966), 78-101.

*[12] Belady, L. A. and Kuehner, C. J. Dynamic space sharing in computer systems. CACM 12,5 (5/69), 282-288.

[13] Belady, L. A., Nelson, R. A. and Shedler, G. S. An anomaly in space-time characteristics of certain programs running in a paging machine. CACM 12,6 (6/69), 349-353.

*[14] Bell, G. and Pirtle, M. W. Time sharing bibliography. IEEE EC-15,12 (12/66), 1764-1765.

*[15] Bernott, B. A. Disk I/O For Non-Core-Contained P.D.E. Meshes and Arrays. DCS Report No. 3H, Department of Computer Science, University of Illinois at Champaign-Urbana, Urbana, Illinois (3/69).
[16] Bobrow, D. G. and Murphy, D. L. Structure of a LISP system using two-level storage. CACM 10,3 (3/67), 155.

*[17] Bovet, D. P. Memory allocation in computer systems. Department of Engineering, UCLA Report 68-17.

*[18] Brawn, B. and Gustavson, F. Program behavior in a paging environment. AFIPS FJCC 33 (1968), Part 2, 1019.

[19] Buchholz, W. File organization and addressing. IBM S.J. 2 (6/63), 86-111.

[20] Buchholz, W. A selected bibliography on computer system performance evaluation. Computer Group News (3/69), 21-22.

[21] Burroughs Corp. A Narrative Description of the Burroughs B5500 Disk File Master Control Program. Burroughs Corp., Detroit, Michigan, 1966.

[22] Calingaert, P. System performance evaluation: survey and appraisal. CACM 10,1 (1967), 12-18.

[23] Campbell, D. J. and Heffner, W. J. Measurement and analysis of large operating systems during system development. AFIPS FJCC 33 (1968), 903-914.
[24] Chu, Y. Direct execution of programs in floating code by address interpretation. IEEE EC-14,3 (6/65), 417-422.

[25] Coffman, E. G. Stochastic Models of Multiple and Time-Shared Computer Operations. Department of Engineering, University of California, Los Angeles, California, Report 66-38, 1966.

*[26] Coffman, E. G. Analysis of a drum input/output queue under scheduled operation in a paged computer system. JACM 16,1 (1/69), 73-90.

*[27] Coffman, E. G. and Varian, L. C. Further experimental data on the behavior of programs in a paging environment. CACM 11,7 (7/68), 471-474.

[28] Cohen, L. J. Stochastic evaluation of static storage allocation. CACM 4,10 (10/61), 460-464.

[29] Collins, G. O. Jr. Experience in automatic storage allocation. CACM 4,10 (10/61), 436-440.

*[30] Comeau, L. W. A study of the effects of user program optimization in a paging system. ACM Symposium on OS (10/67).

[31] Conti, C. J. Concepts for buffer storage. Computer Group News 2,8 (3/69), 9-13.

[32] Conti, C. J., Gibson, D. H. and Pitkowsky, S. H. Structural aspects of the System/360 model 85: I. General organization. IBM S. J. 7,1 (1968), 2.

[33] Conway, M. E. A multiprocessor system design. AFIPS FJCC 24 (1963), 139-146.

*[34] Corbato, F. J. and Vyssotsky, V. A. Introduction and overview of the multics system. AFIPS FJCC 27,1 (1965), 185-196.

[35] Daley, R. C. and Dennis, J. B. Virtual memory, processes, and sharing in multics. ACM Symposium on OS (10/67). Also CACM 11,5 (5/68), 306.

[36] Daley, R. C. and Neumann, P. G. A general purpose file system for secondary storage. AFIPS FJCC 27 (1965), 213.
*[37] Dearnley, F. H. and Newell, G. B. Automatic segmentation of programs for a two level store computer. TCJ 7,3 (10/64), 185-187.

[38] Denes, J. E. BROOKNET - an extended core storage oriented network of computers at Brookhaven National Laboratory. IFIP (1968), 194.

*[39] Denning, P. J. Effects of scheduling a file memory operation. AFIPS SJCC 30 (1967), 9-21.

*[40] Denning, P. J. The working set model for program behavior. ACM Symposium on OS (10/67). Also CACM 11,5 (5/68), 323.

*[41] Denning, P. J. Thrashing and its cause and prevention. AFIPS FJCC 33 (1968), 915-922.

[42] Denning, P. J. Resource Allocation in Multiprocess Computer Systems. MIT, MAC-TR-50 (1968).

[43] Dennis, J. B. Segmentation and the design of multiprogrammed computer systems. JACM 12,4 (10/65), 589.

[44] Dennis, J. B. and Glaser, E. L. The structure of on-line information processing systems. Proc. Second Congress on Information Systems Sciences, 1965, 5-14.

[45] Derrick, M., Sumner, F. H. and Wyld, M. T. An appraisal of the Atlas supervisor. Proc. 22 Nat. ACM (1967), 67.

[46] Dreyfus, P. L. System design of the Gamma 60. WJCC (1958), 130.

[47] Elmore, W. B. and Evans, G. J. Jr. Dynamic control of core memory in a real time system. IFIP (1965), 261.

[48] Estrin, G., Coggan, B., Crocker, S. D. and Hopkins, D. Snuper Computer - a computer in instrumentation automation. AFIPS SJCC 30 (1967), 645.

[49] Estrin, G. and Kleinrock, L. Measures, models and measurements of time shared computer utilities. Proc. 22 Nat. ACM (1967), 85-96.

[50] Evans, D. C. and Leclerc, L. Y. Address mapping and the control of access in an interactive computer. AFIPS SJCC 30 (1967), 23-32.

*[51] Feldman, J. A. and Rovner, P. D. An ALGOL-based associative language. CACM 12,8 (8/69), 439-449.
*[52] Fenichel, R. R. and Grossman, A. J. An analytic model of multiprogrammed computing. AFIPS SJCC 34 (1969), 717.

[53] Fife, D. W. An optimization model for time sharing. AFIPS SJCC 28 (1966), 97-104.

*[54] Fife, D. W. and Smith, J. L. Transmission capacity of disk storage systems with concurrent arm positioning. IEEE EC-14,4 (8/65), 575-582.

*[55] Fine, G. H., Jackson, C. W. and McIsaac, P. V. Dynamic program behavior under paging. Proc. 21 Nat. ACM (1966), 223-228.

*[56] Fine, G. H. and McIsaac, P. V. Simulation of a time-sharing system. Man. Sci. 12 (2/66), B180-194.

[57] Fisher, R. O. and Shepard, C. D. Time sharing on a computer with a small memory. CACM 10,2 (2/67), 77-81.

[58] Flores, I. Derivation of a waiting-time factor for a multiple-bank memory. JACM 11,3 (7/64), 265.

[59] Flores, I. Virtual memory and paging: Part I, Datamation 13,8 (8/67), 31; Part II, Datamation 13,9 (9/67), 41.

[60] Fotheringham, J. Dynamic storage allocation in the Atlas computer including an automatic use of backing store. CACM 4,10 (10/61), 435-436.

*[61] Frank, H. Analysis and optimization of disk storage devices for time sharing. JACM 16,4 (10/69), 602-620.

*[62] Freibergs, I. F. The dynamic behavior of programs. AFIPS FJCC 33 (1968), 1163-1168.

*[103] McKellar, A. C. and Coffman, E. G. Organizing matrices and matrix operations for paged memory systems. CACM 12,3 (3/69), 153-165.

*[104] McKinney, J. M. A survey of analytical time-sharing models. Comp. Surveys 1,2 (6/69), 103-110.

[105] Holland, F. C. and Merikallio, R. A. Simulation design of a multiprocessing system. AFIPS FJCC 33 (1968), 1399.
[106] Naylor, T. H., Wertz, K. and Wonnacott, T. H. Methods for analyzing data from computer simulation experiments. CACM 10,11 (11/67), 703-710.

*[107] Nielsen, N. R. The simulation of time-sharing systems. CACM 10,7 (1967), 397-412.

*[108] O'Neill, R. W. Experience using a time sharing multiprogramming system with dynamic address relocation hardware. AFIPS SJCC 30 (1967), 611-621.

*[109] Oppenheimer, G. and Weizer, N. Resource management for a medium scale time sharing operating system. ACM Symposium on OS (10/67). Also CACM 11,5 (5/68), 313.

[110] Penny, J. P. An analysis, both theoretical and by simulation, of a time-shared computer system. TCJ 9 (5/66), 53-59.

[111] Pinkerton, T. Program behavior and control in virtual storage computer systems. University of Michigan, CONCOMP Report 4 (4/68).

[112] Pirtle, M. Intercommunication of processors and memory. AFIPS FJCC 31 (1967), 621-633.

*[113] Randell, B. A note on storage fragmentation and program segmentation. CACM 12,7 (7/69), 365.

[114] Randell, B. and Kuehner, C. J. Dynamic storage allocation systems. ACM Symposium on OS (10/67). Also CACM 11,5 (5/68), 297.

[115] Rehmann, S. L. and Gangwere, S. G. Jr. A simulation study of resource management in a time-sharing system. AFIPS FJCC 33 (1968), 1411-1430.

*[116] Riskin, B. N. Core allocation based on probability. CACM 4,10 (10/61), 454-460.

[117] Roberts, A. E. Jr. A general formulation of storage allocation. CACM 4,10 (10/61), 419-420.

*[118] Rosen, Saul. Programming Systems and Languages. McGraw-Hill Computer Science Series (1967), p. 6.

*[119] Rosene, A. F. Memory allocation for multiprocessors. IEEE EC-16,5 (10/67), 659-665.

[120] Rosin, R. F. Determining a computing center environment. CACM 8,8 (7/65), 463-468.
[121] Sackman, H. Time sharing vs. batch processing: the experimental evidence. AFIPS SJCC 32 (1968), 1-10.

[122] Scarrott, G. G. The efficient use of multilevel storage. IFIP (1965), 137-142.

[123] Schwartz, J. I., Coffman, E. G. and Weissman, C. A general purpose time sharing system. AFIPS SJCC 25 (1964), 397-411.

[124] Senzig, D. N. and Smith, R. V. Computer organization for array processing. AFIPS FJCC 27,1 (1965), 117-128.

*[125] Shemer, J. E. and Gupta, S. C. On the design of Bayesian storage allocation algorithms for paging and segmentation. IEEE C-18,7 (7/69).

[126] Shemer, J. E. and Gupta, S. C. A simplified analysis of processor "look-ahead" and simultaneous operation of a multimodule main memory. IEEE C-18,1 (1/69), 64-71.

[127] Shemer, J. E. and Shippey, G. A. Statistical analysis of paged and segmented computer systems. IEEE EC-15,6 (12/66), 855-863.

[128] Sisson, S. S. and Flynn, M. Addressing patterns and memory handling algorithms. AFIPS FJCC 33,2 (1968), 957-967.

[129] Scherr, A. L. Time-sharing measurement. Datamation 12,4 (4/66), 22-26.

*[130] Scherr, A. L. An Analysis of Time-Shared Computer Systems. MIT Press, Cambridge, Mass. (1967).

*[131] Smith, J. L. An analysis of time-sharing computer systems using Markov models. AFIPS SJCC 28 (1966), 87-95.

*[132] Smith, J. L. Multiprogramming under a page on demand strategy. CACM 10,10 (10/67), 636-646.

*[133] Stevenson, D. A. and Vermillion, W. H. Core storage as a slave memory for disk storage devices. IFIP (1968), F86-F91.

*[134] Trimble, G. R. Jr. A time sharing bibliography. CR Bibliography 11, Computing Reviews 9,5 (5/68), 291-301.

*[135] Varian, L. C. and Coffman, E. G. An empirical study of the behavior of programs in a paging environment. ACM Symposium on OS (10/67). Also CACM 11,7 (7/68), 471-474.

*[136] Wald, B. The Throughput and Cost Effectiveness of Monoprogrammed, Multiprogrammed, and Multiprocessing Digital Computers. NRL Report 6549, AD# 654384.

[137] Wallace, V. L. and Mason, D. L. Degree of multiprogramming in page on demand systems. CACM 12,6 (6/69), 305.

[138] Wegner, P. Machine organization for multiprogramming. Proc. 22 Nat. ACM (1967), 135-150.

[139] Weingarten, A. The analytical design of real-time disk systems. IFIP (1968), D131-137.

[140] Weingarten, A. The Eschenbach drum scheme. CACM 9,7 (7/66), 509.

[141] Weizer, N. and Oppenheimer, G. Virtual memory management in a paging environment. AFIPS SJCC 34 (1969), 249.

[142] Wilkes, M. V. Slave memories and dynamic storage allocation. IEEE EC-14,2 (4/65), 270-271.

[143] Wilkes, M. V. A model for core allocation in a time-sharing system. AFIPS SJCC 34 (1969), 265.