Report No. UIUCDCS-R-75-722

COMPARATIVE RESPONSE TIMES OF TIME-SHARING SYSTEMS ON THE ARPA NETWORK

by

Sandra Ann Mamrak

May 1975

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the Computing Services Office at the University of Illinois at Urbana-Champaign and in part by the Advanced Research Projects Agency under contract DAHC04-72-C-0001, and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, May 1975.

ACKNOWLEDGMENT

The support, guidance, advice and criticisms of Professor Edward K. Bowdon, Sr. were essential to the success of this research project. His contribution to all substantive aspects of the thesis is greatly appreciated. Ms. Gayanne Carpenter's friendly and timely advice about departmental policies and various deadlines was also an invaluable help toward project completion and is likewise greatly appreciated.

COMPARATIVE RESPONSE TIMES OF TIME-SHARING SYSTEMS ON THE ARPA NETWORK

Sandra Ann Mamrak, Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign, 1975

If, indeed, the ultimate aim of a computing network is resource sharing, then the human component as well as the technical component of networking must be fully investigated to achieve this goal. This research is a first step toward assisting the user in participating in the vast store of resources available on a network.
Analytical, simulation and statistical performance evaluation tools are employed to investigate the feasibility of a dynamic response time monitor that is capable of providing comparative response time information for users wishing to process various computing applications at some network computing node. In particular, the following areas are investigated:

1. The measurement and statistical analysis of response times of individual time-sharing systems on a computing network.
2. The comparison of response times of these same time-sharing systems as they process a set of benchmark jobs.
3. The development of a single analytical and a single simulation model able to explain and predict response times for all time-sharing systems under investigation.
4. The effect of heavy network traffic on the comparative response times of the individual time-sharing systems.

The research clearly reveals that sufficient system data is currently obtainable, at least for the five diverse ARPA network systems studied in detail, to describe and predict response time for network time-sharing systems as it depends on some measure of system busyness or load level.

TABLE OF CONTENTS

                                                              Page
ACKNOWLEDGMENT                                                 iii
1. INTRODUCTION                                                  1
   1.1. Computer Network Evaluation Trends                       1
   1.2. Computer Network Evaluation Deficiencies:
        A Problem Statement                                      4
   1.3. Time-Sharing System Evaluation                           6
2. COMPARATIVE RESPONSE TIMES ON THE ARPA NETWORK                9
   2.1. The ARPA Network                                        10
   2.2. System Variables                                        12
        2.2.1. The Computing Systems                            16
               2.2.1.1. TSS - IBM Time-Sharing System           16
               2.2.1.2. TENEX - PDP-10 Time-Sharing System      18
               2.2.1.3. TSO - IBM Time-Sharing Option           24
               2.2.1.4. MULTICS - MIT Time-Sharing System       27
               2.2.1.5. CANDE - University of California at
                        San Diego (UCSD) Time-Sharing System    30
        2.2.2. Benchmark Jobs                                   34
        2.2.3. Load Level                                       37
        2.2.4. Response Time                                    40
3. MEASURING TIME-SHARING SYSTEMS                               44
   3.1. Analysis of Individual System Data                      48
        3.1.1. AMES-TSS                                         48
        3.1.2. BBN-TENEX                                        54
        3.1.3. CCN-TSO                                          56
        3.1.4. MIT-MULTICS                                      60
        3.1.5. UCSD-CANDE                                       64
   3.2. Comparison of Computing Systems                         66
        3.2.1. Arithmetic Benchmark                             73
        3.2.2. Bit Manipulating Benchmark                       73
        3.2.3. I/O Bound Benchmark                              73
4. MODELING TIME-SHARING SYSTEMS                                79
   4.1. An Analytical Model for Time-Sharing Systems            79
   4.2. A Simulation Model for Time-Sharing Systems             93
   4.3. Analysis of Model Predictions                           99
        4.3.1. Individual System Results                       100
        4.3.2. Success of Model Generalization                 104
   4.4. Consideration of Network Queueing Delays               108
5. A DYNAMIC RESPONSE TIME MONITOR                             113
   5.1. Currently Feasible Monitor Features                    113
   5.2. Additional Desirable Monitor Features                  117
6. CONCLUSIONS                                                 119
   6.1. Implications for Future Network Development            120
   6.2. Suggested Further Research                             122
LIST OF REFERENCES                                             124
APPENDIX A                                                     127
APPENDIX B                                                     129
APPENDIX C                                                     131
VITA                                                           134

LIST OF TABLES

                                                              Page
2.1. Computing Systems Summary                                  13
2.2. Benchmark Jobs Run at Various Computing Centers            36
2.3. Load Level Definitions                                     39
2.4. Command Sequence for Systems' Measurement                  43
3.1. Systems' Saturation Level                                  46
3.2. Average Benchmark Processing Times                         47
3.3. Residual Mean Squares for AMES-TSS Curve Fits              53
4.1. Analytical Model Parameters                               102
4.2. Transmission Times for Illinois to Experimental Sites     109
4.3. Infinite Network Delays from U. of I. Node                112
5.1. Load Levels at AMES-TSS                                   114
C.1. Residual Mean Square (RMS) Statistics                     132
C.2. Individual System Best Curve Fit Data                     133

LIST OF FIGURES

                                                              Page
2.1. ARPA Network Configuration in Early 1974                   11
2.2. Generalized Time-sharing Scheduling                        15
2.3. The TENEX Scheduler                                        20
2.4. BBN-TENEX Scheduling                                       22
2.5. CCN-TSO Scheduling                                         27
2.6. MIT-MULTICS Scheduling                                     31
2.7. UCSD-CANDE Scheduling                                      33
3.1. Statistical Results - AMES-TSS                             49
3.2. Statistical Results - BBN-TENEX                            55
3.3. Statistical Results - CCN-TSO                              57
3.4. Statistical Results - MIT-MULTICS                          61
3.5. Statistical Results - UCSD-CANDE                           65
3.6. Arithmetic Benchmark Comparisons                           67
3.7. Bit String Benchmark Comparisons                           74
3.8. I/O Bound Benchmark Comparisons                            76
4.1. Comparison of Two Models                                   87
4.2. Simulation of MIT-MULTICS Time-Sharing Scheduler           94
4.3. Model Comparison - BBN-TENEX                              101
4.4. Model Comparison - CCN-TSO                                103
4.5. Model Comparison - MIT-MULTICS                            105
4.6. Generalized Simulation Model Results                      107

1. INTRODUCTION

Less than a decade ago, the time-sharing concept on single computer systems was one of the main objects of computer science inquiry. There existed a wide divergence of opinion on such issues as where the technology stood, key application possibilities, feasibility, future directions and economics. Today the resource sharing concept on networks of computer systems has moved into the spotlight and become the object of identical kinds of inquiries.

1.1. Computer Network Evaluation Trends

Although the computer network concept developed in an unrevolutionary manner, proceeding logically and in an orderly way from the development of highly sophisticated single processor systems, the performance evaluation techniques developed for single processor systems differ radically from those developed for geographically distributed multiple processor computer systems. Performance evaluation in single processor systems is characterized by a hodge-podge of performance goals and performance measurements. The most significant convergence of thought among single processor systems' analysts is agreement that what is required is a quantitative methodology on which to base analysis of real system data for model formulation and validation. Performance evaluation in networks, on the other hand, where it has been present, has been characterized by a careful development of analytic and simulation network models, generally supported by data analyzed using optimization and statistical techniques.
These evaluation techniques, as well as their specific applications in existing or proposed networks, are surveyed elsewhere [MAM74]. An examination of existing models and measurements in computer network systems reveals several trends. Analysis based on queueing theory has been anchored in a node-by-node approach, assuming independence of the various network nodes. This approach works very satisfactorily for a limited set of network phenomena. Simulation has been used successfully, but can become prohibitively expensive when detailed representations of the network system are required [KLE70, SAL73, WAR39]. Optimization techniques have been effectively transferred from network flow theory and are working well to yield specific design parameters [WHI72]. Actual system measurements, analyzed using statistical techniques and used to improve queueing and simulation models, have been relatively neglected [COL71]. (This neglect may be due in part to the unavailability of tools for making desired observations of dynamic systems and of statistically significant test environments.) Finally, although sophisticated performance evaluation tools are generally available, they have been applied almost solely to the ARPA (Advanced Research Projects Agency) network. Not the least important among the recent trends in computer network performance evaluation is research aimed at aiding the user in optimizing job routing and scheduling, and minimizing job cost. This trend has been spurred by a relatively stable network technology, coupled with an ever increasing number of general network users. From their embryonic days in the late 1960's until just recently, computer networks have been a subject of interest mainly to universities and research agencies.
As late as January of 1973, the ARPA network [ROB70] statistics were showing that even though the network was reliable and available, communication lines were used 3.5 percent on the average [MCQ73]. Also in 1973, the MERIT network [HER72] found itself in serious financial difficulties due to lack of interest by a sufficient number of users. However, over the last year, very substantial interest has been materializing in the wider university and research communities and in the commercial world as well.

The Distributed Computer System network concept developed by D. Farber [FAR72] at the University of California at Irvine has been a significant exception to the common mode of development of network systems, which provides interconnected computer resources but requires users to do their own unadvised job scheduling. The host sites on Farber's ring-structured network send bids for jobs back to the customers, thus providing them with some criteria by which to choose a particular host for job processing. The majority of operational networks, though, do not provide the user with formal information on comparative job costs or comparative job run times.

Marshall Abrams at the National Bureau of Standards should also be mentioned here as another unique contributor to user-oriented network performance evaluation research. He has developed a "stimulus-acknowledgment-response" model to describe the user-computer interaction and a data acquisition system called the Network Measurement Machine. He is using these tools to analyze network performance as perceived by a network user or the "consumer of computer services" [ABR74].

1.2. Computer Network Evaluation Deficiencies: A Problem Statement

There exists a need, then, for network performance evaluation efforts to be geared toward aiding the network user in the decision-making inherent in network interactions.
Network designers and managers have been the fortunate recipients of analytical, simulation and statistical tools useful in carrying out their network duties. These same tools of the network performance analysts must also be applied to answer questions of importance to the user. While cost-effectiveness is an important performance factor, response time is often the primary performance parameter of interest to users and, in particular, interactive or time-sharing system response time. Given a choice of different interactive computing systems with varying capabilities for handling particular types of computer applications, network users need to be advised of the comparative turnaround or response times of those systems.

More specifically, for a given network facility, let the system environment for a user at a particular time t be described by the set {i, j, k_i(t), T_i(s,j)}, where i is one of a set of n time-sharing computing systems accessible from the facility (presumably n is a constant over reasonably short periods of time), j is one of a set of m computing applications required by the user (presumably m is a constant over reasonably short periods of time), k_i(t) is the load level on the i-th computer system at time t (for convenience, k_i(t) is partitioned into ten equal-length intervals), and T_i(s,j) (called "response time") is the time required at load s, where s = k_i(t) at some time t, to complete the execution of a run command for the j-th application at the i-th facility. Within this system environment, answers to the following questions must be provided:

(1) For some particular system i, is it possible to describe and predict the behavior of T_i(s,j) as s varies with time? (Discussed in section 3.1.)

(2) At some time t, is it possible to meaningfully compare T_i(s,j) for a particular computing application j when run at the different time-sharing computing facilities? (Discussed in section 3.2.)
(3) Is there a single response time model (analytical, simulation or statistical) that will describe and predict T_i(s,j) for each i and each j with an acceptable level of accuracy? (Discussed in sections 4.1. - 4.3.)

(4) What is the effect of network traffic on T_i(s,j)? (Discussed in section 4.4.)

If the first three of these questions can be answered affirmatively, then it will be feasible to develop a dynamic response time monitor that users can query to gain up-to-the-minute, on-line, comparative response time data for a particular computing application to be run on one of a set of network time-sharing facilities.

1.3. Time-Sharing System Evaluation

The research required to answer the response time queries of the network user cuts across two distinct, but related, areas: comparison of independent computing systems and the investigation of response time parameters in time-shared systems. The work done to compare systems is sparse. One significant comparative study of computing machines has been published and one is presently in process. K. E. Knight has compared the performance capabilities of 318 general purpose computer systems in terms of computing power and cost [KNI66, KNI68]. His measurements spanned the evaluation of machines from 1944-1967 and distinguished machine capabilities in performing "scientific" computations from those in performing "commercial" computations. P. A. Alsberg from the University of Illinois at Urbana-Champaign has directed research aimed at producing comparative data for machine cost-effectiveness as it is measured across six interactive computing systems performing four different types of work: (1) numerical, (2) console, (3) input/output, and (4) bit/byte manipulation. All of the six computing systems either are on or will be added to the ARPA network. A third comparative study of computing systems was performed by P. E. Jackson and his associates [FUC70, JAC69]. This work will be discussed below.
Extensive measurements and performance evaluations of response time in time-sharing systems have been reported by several independent researchers. Kleinrock [KLE72] has produced a survey of these performance studies, with an emphasis on analytical results. Studies based mainly on system measurements rather than analytical models have also resulted in important contributions to the field. A. L. Scherr [SCH67], who analyzed a large set of measurements taken on the MIT Project MAC Compatible Time-Shared System (CTSS), concluded from his work that only mean think time, mean processor time and the number of users interacting with the system are of first-order effect in describing system behavior. R. A. Totschek's contribution [TOT65], resulting from his study of the SDC Q-32 system, was characterized by the classification of many of the empirical distributions associated with interactive usage as having density functions with long, slowly decreasing tails and standard deviations exceeding the mean value. Jackson and Stubbs [JAC69] studied a number of time-shared systems and determined average values for a variety of measurements relevant to interactive systems: think time, idle time, response time and so on. Later Jackson, along with Fuchs [FUC70], estimated the distribution of many of these random variables. This study reiterated Totschek's finding in that Jackson and Fuchs found that for all the continuous random variables, the gamma distribution was an excellent fit and that the shape parameter of the gamma distribution ranged between 1.0 and 1.8. At 1.0 the distribution becomes exponential, and even at 1.8 its tail is still definitely exponential.

The essential elements of the research methodologies associated with comparing computer systems and those associated with describing the behavior of time-shared systems can be abstracted from the work reported above.
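As an aside, the gamma-distribution finding of Fuchs and Jackson is easy to illustrate with a short sketch. The sample below is synthetic (drawn with a made-up shape of 1.4 purely for illustration), and the shape is recovered by the simple method-of-moments estimate shape = mean^2/variance:

```python
# Sketch: estimating the gamma shape parameter of a response-time sample.
# The data here are synthetic; a real study would use measured response times.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical interactive response times (seconds) with a long right tail.
samples = rng.gamma(shape=1.4, scale=2.0, size=10_000)

mean, var = samples.mean(), samples.var()
shape_est = mean**2 / var   # method-of-moments estimate of the gamma shape
scale_est = var / mean      # corresponding scale estimate

print(f"shape ~ {shape_est:.2f}, scale ~ {scale_est:.2f}")
# A shape near 1.0 would indicate an essentially exponential distribution;
# Fuchs and Jackson observed shapes between 1.0 and 1.8.
```

A shape estimate well above 1.8 or below 1.0 on real data would argue against the gamma characterization reported above.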
The comparative system studies are characterized by (1) running benchmark jobs with specified properties and (2) measuring well-defined quantities obtainable from all of the machines involved. The time-shared studies also have two essential characteristics: (1) the conception of all interactions with the time-shared system (compiles, edit commands, run commands, etc.) as being of equal significance and the measurement of them as such, and (2) the development of models to describe and predict system behavior.

The research methodology required to compare response times for different job applications on different machines is similar to the methodology already used in comparing systems, but somewhat different from the methodologies used to date in studying response time data. Since comparative results are required, running jobs with identical characteristics on each system and measuring well-defined quantities obtainable from each system is an appropriate and useful procedure. On the other hand, while former studies on time-shared systems considered all interactions to be of equal importance and measured and modeled under this assumption, we are concerned here only with job execution interactions. Furthermore, our concern is with run command response time measurements and models for specific computing applications.

2. COMPARATIVE RESPONSE TIMES ON THE ARPA NETWORK

The task of providing the network user with information to facilitate decisions concerning job routing must be accomplished within the framework of the present network technology. Theoretically, such aids may be as sophisticated as a "black box" environment in which users need merely indicate the type of job and special resources required, and jobs are automatically scheduled to run with minimum response time.
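In such an idealized "black box" scheme, routing reduces to choosing the host with the minimum predicted response time for the job's type. The following is only a minimal sketch of that idea; the host names and timing figures are entirely hypothetical, and a real aid would derive its predictions from live load data as developed later in this thesis:

```python
# Hypothetical predicted response times (seconds) per host and job type.
PREDICTED = {
    "hostA": {"cpu_bound": 6.0, "io_bound": 2.5},
    "hostB": {"cpu_bound": 4.5, "io_bound": 3.0},
}

def route(job_type: str) -> str:
    """Pick the host with the minimum predicted response time for this job type."""
    return min(PREDICTED, key=lambda host: PREDICTED[host][job_type])

print(route("cpu_bound"))  # hostB (4.5 s beats 6.0 s)
print(route("io_bound"))   # hostA (2.5 s beats 3.0 s)
```

The sketch makes plain why the choice of host depends on the job's characterization (I/O bound versus CPU bound) and not on a single system-wide ranking.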
Given the present configuration of even the most advanced networks, however, the scheme best able to be readily implemented would be one in which the network interface processors contained sufficient information to indicate current expected response times for all time-sharing systems on the network. Such a scheme would require that both the user and the computing hosts input relevant information with which the interface processor can make its predictions. Generally, users are able to indicate the expected execution time required for a job and characterize the job as basically I/O bound or CPU bound. Generally, time-sharing systems provide some measure of load, such as number of users. One of the major purposes of this research is to develop a process for the network interface processor (a dynamic response time monitor) that, given the user and system input described, can predict response time within some confidence interval. A combination of statistical, analytical and simulation tools will be used to produce this result.

The Advanced Research Projects Agency (ARPA) network has been chosen as the environment for the research project, and five different time-sharing systems accessible from the network will be investigated.* A brief description of the ARPA network and a definition of the system variables--the computing systems, the benchmark jobs each representing a particular computing application, response time and load level--are given below.

2.1. The ARPA Network

The ARPA network shown in Figure 2.1 is generally recognized as the pioneering effort in computer networking and resource sharing research. The initial objective of the network was to provide a system research environment in which the technical problems of networks could be explored by allowing persons and programs at one computing center to interactively access data and programs at other computer centers attached to the network.
A packet switching store-and-forward network** whose nodes consist of interface message processor computers (IMPs) was set up, interconnected by 50 kilobit/second synchronous communication lines. The host computers ranged from large-scale general purpose systems such as PDP-10s and IBM 360/370s to specialized computers such as the Illiac IV and the Massachusetts Institute of Technology (MIT) Multics System.

*Both experimental design and practical considerations influenced the decision to limit this investigation to just five systems. The systems themselves were diverse enough to represent a wide range of time-sharing scheduling philosophies, and research funds were available for this specific set of computing nodes.

**Definitions for such technical terms as "packet switching" and "store-and-forward network" are given in Appendix A along with a ready-reference list of frequently used abbreviations.

Figure 2.1. ARPA Network Configuration in Early 1974

The network is considered to be a technological and resource sharing success, as is evidenced by its operational accomplishments, which include:

- remote use of computers either from a terminal on a host or a terminal interface processor (TIP)
- file movement and printing
- communication of personal messages by way of "mailboxes"
- machine-to-machine subroutine communication
- access to large common data bases.

2.2. System Variables

Five interactive operating systems currently available on the ARPA network were chosen for comparison. The performance of these operating systems is essentially tied to the computing installation supporting and maintaining them. All references to interactive systems, therefore, will include the computing site as well as the name of the system itself. A summary of the five systems and their basic scheduling characteristics is presented in Table 2.1. A more detailed discussion of each of the systems follows. Throughout the discussion, reference is made to
the "working set" of a process in describing its paging behavior. This concept was first defined by Denning and is explained in an article written by him [DEN68].

Table 2.1. Computing Systems Summary

AMES67 - NASA Ames Research Center, Moffett Field, CA (TSS)
  Hardware configuration*: IBM 360/67; 1000K-2000K core, 5 disks, 3 drums, 10 tape drives, 3 printers, 2 card readers, 64 terminals
  Scheduler characteristics**: Table driven; frequency and duration of processor time-slices determined by paging behavior
  Memory management: Estimations of working set and working set size characteristics are the heart of the scheduler; "balanced core time principle" used to determine time-slicing
  Remarks: Distinguished by a scheduler that is primarily concerned with core demands rather than CPU demands

BBN - Bolt, Beranek and Newman, Inc., Cambridge, MA (TENEX)
  Hardware configuration*: PDP-10; 193K core, 9 disks, 1 drum, 4 tape drives, 1 printer, 64 terminals, 1 display processor, 1 plotter, 1 paper tape punch, 1 paper tape reader, 1 teletype scanner
  Scheduler characteristics**: Five priority queues with SXFS processing among queues, LXFS processing within queues, RR processing in the last queue
  Memory management: Balance set control module in scheduler regulates running processes so as to minimize the probability of an idle CPU due to too frequent page faults
  Remarks: Most sophisticated of the schedulers; embodies all three scheduling disciplines of SXFS, LXFS, and RR

CCN - Campus Computing Network, Los Angeles, CA (TSO)
  Hardware configuration*: IBM 360/91; 4000K core, 5 disks, 1 drum, 8 tape drives, 4 printers, 85 terminals
  Scheduler characteristics**: Series of priority queues, each with lower dispatching priority and effectively a longer time-slice than the former; each queue served FIFO
  Memory management: Fixed (virtual) region size allotted to each virtual machine; the single process currently on a virtual machine has access to the entire region
  Remarks: Distinguished by binding processes to one of a fixed number of virtual machines within which no multiprogramming occurs

MIT - Massachusetts Institute of Technology, Cambridge, MA (MULTICS)
  Hardware configuration*: Honeywell 645; 384K core, 11 disks, 1 drum, 10 tape drives, 2 printers, 1 I/O controller, 1 card reader, 1 card punch, "several hundred" terminals
  Scheduler characteristics**: Series of priority queues, each with lower dispatching priority and a longer fixed time-slice than the former; each queue served FIFO
  Memory management: A list of "eligible" processes is maintained, consisting of those processes which have the highest dispatching priority and can simultaneously exist in core
  Remarks: Concept of the set of "eligibles" ensures efficient resource utilization in a multiprogramming environment

UCSD - University of California at San Diego, CA (CANDE)
  Hardware configuration*: Burroughs 6700; 240K core, 19 disks, 8 tape drives, 3 printers, 1 remote job entry terminal, 1 card punch, 1 card reader, 512 terminals
  Scheduler characteristics**: Basically two priority queues, with a high priority queue of burst-oriented processes and a low priority queue of compute-bound processes; both queues served FIFO
  Memory management: Multiprogramming paged system in which each core-resident process can expand its core holdings up to the maximum size of its currently assigned "subspace"
  Remarks: Simplest of the time-sharing scheduling philosophies; like TSS, time-slices are associated with a process rather than a queue

*Detailed hardware descriptions are available in [ANR73a]. This information is accurate as of August, 1973.
**FIFO - first arrival, first service; RR - round robin; SXFS - shortest execution, first service; LXFS - longest execution, first service.

Four of the five interactive systems (all but AMES-TSS) dispatch jobs to the processor using a scheduling algorithm whose major components are a series of priority queues and associated CPU time-slices. As jobs enter the system, they are assigned to the highest priority queue. This queue has a relatively short time-slice associated with it.
If the job uses its entire time-slice in its first pass through the system, it is relegated to the second priority queue, which has a slightly longer time-slice associated with it, and so on. Queues are served from highest priority to lowest priority. Disciplines within queues vary among FIFO, RR, SXFS, and LXFS as explained in Table 2.1. A generalized version of these scheduling algorithms is presented in Figure 2.2. This representation will be made specific for each system (except AMES-TSS) as it is described in detail.

Figure 2.2. Generalized Time-sharing Scheduling
[Figure: N priority levels, each a queue with its own discipline and time-slice t_1, t_2, ..., t_N. Arrivals enter level 1 (the highest priority); jobs exhausting a time-slice drop to the next level; level N is the lowest priority; departures may occur from any level.]

2.2.1. The Computing Systems

2.2.1.1. TSS - IBM Time-Sharing System [DOH70]

The TSS/360 interactive system has a table-driven scheduler consisting of a set of programs in the resident supervisor used for scheduling, and a table with many rows (levels) of entries. The scheduling philosophy is based on the premise that processes making light demands on the CPU and core resources should receive fast service and those making heavier demands on these resources should receive relatively slower service. The implementation of this philosophy is concentrated almost entirely in the constant monitoring of a process' paging requirements (as opposed to its CPU usage). Programs with small working set sizes are awarded frequent and comparatively long time-slices in the processor. Processes with large working set sizes and poor locality are awarded only short, infrequent time-slices. This strategy tends to minimize the time that any large program can clog memory, thereby providing a potentially significant increase in the level of multiprogramming and faster response time for a larger number of processes. Assignment of core resources is the heart of the TSS scheduler.
The table which drives the scheduler can be thought of as being divided into sets of levels grouped primarily according to the core usage characteristics of a process. The interactive sets of table levels are the Starting Set, the Looping Set, the AWAIT Set, the Holding Interlock Set and the Waiting for Interlock Set.

The Starting Set of table levels is used to handle new inputs from the terminals. This set consists of several successive high priority table levels, each with small execution time limits and increasingly larger core space limits. A process remains under control of the Starting Set of table levels and proceeds through its various queues as long as it continues to exceed its space limits only (up to some maximum). When the process exceeds its time limit at a given level, the space limit of that level is used as the estimate of the current working set size of that process, and the future execution of the process is controlled by the Looping Set of table levels.

The Looping Set table levels perform three significant functions. The first deals with the dynamic estimation of the time and space requirements of a process in accordance with the balanced core time principle. This principle states that the length of the time-slice to be awarded to a process is inversely proportional to its working set size in that time interval. The second function of these table levels is to cause the load generated by long running processes to be distributed so as to allow Starting Set entries to be processed quickly. Finally, the Looping Set optimizes CPU utilization and penalizes badly paging processes by causing processes with minimal paging requirements to be selected for running far more frequently than those with large paging requirements.

Of the three remaining sets, only the Holding Interlock Set of table levels deals with processes that are ready to run.
Processes running from this set are currently holding interlocks on some system resource and have a high priority so that the interlocked resource may be quickly freed. The AWAIT Set and the Waiting for Interlock Set administer processes which are in a wait state for some reason.

As described above, processor time-slices are allocated dependent upon a process' recent core usage behavior. The frequency and duration of the time-slices a process is awarded are determined by values in the table levels of the Starting Set, Looping Set and Holding Interlock Set. These values in turn are determined by the working set size and locality characteristics demonstrated in the process' paging demands.

2.2.1.2. TENEX - PDP-10 Time-Sharing System [BOB72]

The TENEX scheduling philosophy takes a middle ground between two conflicting precepts of process behavior in a time-sharing environment. On the one hand, the more time a process has used, the closer it is to completion. On the other hand, the longer a process has run, the less are the chances that it will complete "soon". Ready jobs are therefore distributed in queues for service such that if two processes are widely separated in accumulated run time (are in different queues), the one with the lesser time will be preferred, and if two processes are closely spaced (are in the same queue), the one with the greater time will be preferred. This type of scheduling can be characterized as shortest-processing-time first among queues and longest-processing-time first within queues.

A second aspect of the TENEX scheduling philosophy is concerned with the complex interplay in the allocation of core and CPU resources. Incorrect handling of the information gathering and decision making procedures involved in determining working sets and core utilization in a multi-process paged system can result in poor efficiency and bad service.
Thus, a "balance set control" module directly responsible for these functions is made an integral part of the scheduler. Figure 2.3 depicts the four distinct scheduler modules. The process controller and balance set control modules will be discussed in detail below. The real-time scheduler is concerned only with those processes which are currently making real-time demands on the system. Its scheduler portion is invoked whenever an external signal or clock indicates that rescheduling may be required. If there are no real-time processes requiring service, then the selection of a process to run falls to one of the other modules. The function of the startup and dismiss routines is fairly common and straightforward. Included in this module are routines to save and restore environments as they go out of and into execution. No important scheduling or other decisions are made by this module. The balance set control module of the TENEX scheduler is responsible for efficient use of core. The logical storage organization includes the core, drum and disks and their associated channels, so that the efficient use of core is closely related to making efficient use of the data channels to the drum and disk.

Figure 2.3. The TENEX Scheduler. [Block diagram of the four scheduler modules: balance set control, real-time scheduler, startup and dismiss interfaces, and process controller.]

Because of this logical memory structure, when a process cannot be run because of a page fault, the process is not considered to be in a wait state. The process is, in fact, still demanding CPU services which cannot be given because core, rather than the CPU, is not available. Three basic functions fall under the jurisdiction of the balance set control module.
These include maintaining the list of processes in the balance set such that the working sets of all these processes can co-exist in core, selecting a process in the balance set for running when the running process must be stopped for a page fault, and, on the occurrence of a rescheduling event, removing and/or adding processes to the balance set in cooperation with the process controller. Dynamically determining how many processes can simultaneously reside in core and what the size of these processes should be is the central function of the balance set control. This involves trying to keep a balance set which maximizes the probability that there will always be at least one process to run. That is, whenever one process experiences a page fault, there should be another process ready to utilize the CPU resource. This suggests that the processes must run an average time, T_av, greater than the average interval over which one page transfer will be completed for one of the page-waiting processes, W_av. The balance set control module iteratively estimates T_av and W_av and attempts to maintain an environment in which T_av > W_av. If the balance set control function described above provides more than one process which is an eligible member of the balance set, then an algorithm is required for selecting one among these processes to run when a page fault occurs. This algorithm is also a part of the balance set control module. Finally, several rescheduling events can occur which require the removal or addition of processes to the balance set. These events include processor time quantum overflow, I/O blocks, or I/O unblocks. Handling these process exchanges in and out of the balance set is a balance set control module task. Processor resources in TENEX are allocated to processes chosen from distinct ready queues, where queue position is determined by previously accumulated processor time. Figure 2.4 is a graphic presentation of this scheduling algorithm.
Figure 2.4. BBN-TENEX Scheduling. [Diagram: five priority-level queues with processing quanta of 64 ms, 256 ms, 1024 ms, 4096 ms and 16384 ms; arrivals enter FIFO at the highest priority level and the last queue is served round-robin.]

The scheduler prefers a process in a smaller numbered queue over one in a higher numbered queue. In this respect, it prefers processes with the smallest amount of accumulated time. But further, within a queue, the scheduler chooses for execution the process with the longest accumulated time in the expectation of completing a process which probably requires only a small additional amount of CPU time. These queues are not extended indefinitely, but terminated with N = 5 distinct queues, for two separate reasons. First, a process that had run a very long time would get no further service if another process began a long computer run, until the second process had run nearly as long as the first. (A long running process could also be completely shut out of service by a set of short running processes which used 100 percent of the CPU.) Second, although the frequency of rescheduling goes down as the queue time becomes large, a point is reached at which the rescheduling overhead is an insignificant fraction of the total time and no gain is achieved by reducing it further. For these reasons, then, a "last queue" is defined. Processes in this queue are scheduled using a round-robin discipline, disregarding all former processing history at this point and cyclically giving each process a certain quantum of processing time in turn. Use of this scheduling algorithm requires the assignment of three parameters:

- the factor by which the processing time allotted on each queue is greater than the last
- the amount of processor time allotted on the first queue
- the number of queues.

The basic principle involved in assigning these parameters is that fewer and longer queues result in less system overhead but produce a poorer approximation to ideal scheduling as represented by a large number of queues.
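Using the parameter values BBN assigns (64 msec on the first queue, a factor of four between queues, five queues), the quantum for each queue follows a simple geometric rule; a sketch:

```python
def queue_quantum_ms(queue_index, first_ms=64, factor=4, n_queues=5):
    """Quantum for queue i (1-based): first_ms * factor**(i-1).
    The last (n_queues-th) queue is the round-robin 'last queue'
    with the largest quantum."""
    if not 1 <= queue_index <= n_queues:
        raise ValueError("no such queue")
    return first_ms * factor ** (queue_index - 1)

assert [queue_quantum_ms(i) for i in range(1, 6)] == [64, 256, 1024, 4096, 16384]
```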
Bolt, Beranek and Newman (BBN) have currently assigned values to these three parameters as follows:

- the i-th queue receives four times the processing allotment of the (i-1)st queue
- queue one allots 64 msec for processing
- there are five distinct queues.

Up to this point, the discussion of the scheduler has been limited to handling jobs on the ready queue. The scheduling algorithm also keeps account of processes waiting for some external condition or event, such as an I/O device to complete or a user to type a character. In this case, the scheduler's goal is to insure that these processes too will receive their fair share of processing time, i.e., about 1/M of the CPU, where M is the number of processes in the system. The scheduler achieves this goal by using the following procedure. During the periods in which a process is in the wait state, the process is "credited" for CPU time not used by reducing the accumulated time values at the rate of 1/M. Reducing this quantity tends to move the process to the higher queues so that it will be preferred over other processes which continue to run. This procedure does not include waits occasioned by disk or drum transfers, as explained in the previous section describing the core allocation algorithm.

2.2.1.3. TSO - IBM Time-Sharing Option

The basic scheduling philosophy of the TSO time-sharing system is to award fast response times to processes requiring only a short amount of CPU service. Processes requiring increasingly longer amounts of processing time experience proportionately longer response times. This philosophy is implemented in a series of queues (usually three or four) through which a process descends during its residence in the system. Each queue has a lower dispatching priority than the former one, and each queue typically allots a longer processing time-slice to its members. Processes are served strictly first-come, first-served within queues.
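Returning to the TENEX wait-state credit for a moment: the reduction of accumulated time at a rate of 1/M of real time can be sketched as follows (the function name, millisecond units and floor at zero are illustrative assumptions):

```python
def credit_waiting(accumulated_ms, wait_ms, n_processes):
    """While a process waits, its accumulated-time figure is reduced
    at a rate of 1/M of real time (M = processes in the system),
    crediting it for CPU time it did not use and nudging it toward
    the preferred queues."""
    credited = accumulated_ms - wait_ms / n_processes
    return max(credited, 0.0)

# After waiting 5 seconds in a 10-process system, 500 ms is credited:
assert credit_waiting(2000.0, 5000.0, 10) == 1500.0
```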
The TSO time-sharing system can be run in a real or virtual memory system, for example OS/MVT* or OS/VS2*, respectively. The basic concept behind core assignment is the same in both types of systems, but the implementation of the assignment is, of course, different. A predetermined number of regions, say four, is set up in memory and these regions form separate virtual processing systems which are assigned to users as they log onto TSO. Users are associated with one of these regions exclusively for the duration of their working session. Each of the virtual systems acts independently of the others and each has an independent, optionally identical scheduling algorithm as described below. Within a region (or virtual processing system) no multiprogramming exists. Each process has use of the entire core and CPU resources assigned to its region until it is swapped out in total and put back on one of the dispatching queues. The UCLA Campus Computing Network (CCN) TSO system is an OS/MVT system with one memory region.

*See Appendix A for definitions.

The TSO scheduler chooses processes for running dependent only on the most recent behavior of the process. That is, only the last cause for removal from execution (I/O request, timer run-out, etc.) is used to determine the next queue position for that process. The dispatching algorithm, illustrated in Figure 2.5, typically defines three queues, Q1, Q2 and Q3, to which a ready process may be assigned. The first queue consists of processes which have just passed from a blocked (or wait) state to a ready state. These processes have the highest dispatching priority and are served first-come, first-served within Q1. The second and third queues consist of processes which experienced a timer run-out during their last time-slice in Q1 or Q2, respectively. In general, an extensive set of parameters exists with which to manipulate the function of dispatching processes for CPU service.
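The last-cause-of-removal rule above can be sketched as a small placement function (the assumption that a timer run-out in Q3 leaves a process in Q3 is mine, not stated in the text):

```python
def next_queue(current_queue, removal_cause):
    """TSO-style placement: a process leaving a blocked/wait state
    re-enters Q1; a timer run-out demotes the process one queue,
    with Q3 assumed to be the lowest."""
    if removal_cause == "wait_ended":
        return 1
    if removal_cause == "timer_runout":
        return min(current_queue + 1, 3)
    raise ValueError("unknown removal cause")

assert next_queue(2, "wait_ended") == 1     # back to the top queue
assert next_queue(1, "timer_runout") == 2   # demoted from Q1 to Q2
assert next_queue(3, "timer_runout") == 3   # assumed to stay in Q3
```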
CCN-TSO in effect controls its queues by setting three of these optional parameters to significant values. The "preempt" option is enabled, and parameters called "min-slice" and "occupancy time" are set for each queue. The occupancy time associated with a queue is the maximum time-slice of execution allowable to a process from that queue. These values are presently set at 2.0 seconds for Q1, 4.0 seconds for Q2, and 16.0 seconds for Q3. The min-slice settings work in conjunction with the preempt option and presently are assigned values of 1.6 seconds for Q1, 2.0 seconds for Q2, and 3.0 seconds for Q3. These latter values override the occupancy time settings in the following way. If a process is queued for service at the same or higher priority level than a process presently holding the CPU, then the process holding the CPU is preempted after its respective min-slice, rather than being allowed to utilize its entire occupancy time quantum of service. Preempted processes return to the queue from which they had just come, until they have been allocated processor time equal to the occupancy time for that queue.

Figure 2.5. CCN-TSO Scheduling. [Diagram: three FIFO priority-level queues with time-slices of 2, 4 and 16 seconds between arrival and departure.]

2.2.1.4. MULTICS - MIT Time-Sharing System [ORG72]

The MULTICS time-sharing scheduler design was based on the philosophy that the higher the load a process places on the system when it is allowed to run, the lower its scheduling priority should be. Thus, processes requiring the smallest amount of processor time share the highest priority queue. Principally because of memory limitations, however, not all equal-priority processes can share the processor simultaneously. The basic time-sharing scheduling philosophy, then, is modified by a multiprogramming scheduling function. This multiprogramming function restricts access to the processor to an appropriate subset of equal-priority processes called the "eligibles".
This subset is chosen small enough so that work that is done for each member is not degraded, for instance, by thrashing. An active process in the MULTICS system cycles through five execution states--running, ready, waiting, blocked and stopped. The execution state not only describes a process' processor contention characteristics, but also suggests how that process is competing for memory resources. Only running and waiting processes are considered eligible to directly compete for pages of core memory at any one time. Eligibility refers to the depth or degree of multiprogramming and is first conferred on a ready process when that process attains highest relative priority among noneligible ready processes and when its core requirements, when added to those of the eligible processes, do not exceed the total available core. Eligibility is withdrawn when a process uses up its time-slice allotment, completes an interaction or otherwise enters a dormant (blocked or stopped) state. A running process may attempt to capture as much core as it needs. It will be restricted in its attempts only by the competing demands of processes that are simultaneously executing on the processor. A waiting process (differentiated from a blocked process by the predictably short period of time it has to wait for a system event, for example, the arrival of a page into core) remains eligible to compete for core and retains its favorable queue position. In general, because a waiting process is not actually executing, attrition can occur in its core holdings due to demands made by executing processes. Since wait periods are expected to be relatively short, however, there are only short periods between the wait and running states of a process and, therefore, minimal, if any, attrition of the waiting process' core holdings occurs. The ready, blocked and stopped processes share the same core competition status in that they are all "losers".
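The eligibility-conferral test described above, highest relative priority first and subject to total available core, can be sketched as follows (the page counts and the strict stop-at-the-first-misfit behavior are illustrative assumptions drawn from the "attains highest relative priority" wording):

```python
def confer_eligibility(ready, total_core_pages):
    """Confer eligibility in priority order (lower number = higher
    priority); stop at the first process whose core demand would
    overflow the available core."""
    eligible, used = [], 0
    for prio, core in sorted(ready):
        if used + core > total_core_pages:
            break
        eligible.append((prio, core))
        used += core
    return eligible

# 100-page core: the 60-page top-priority process fits; the next
# (50 pages) does not, so selection stops there.
out = confer_eligibility([(2, 50), (1, 60), (3, 30)], 100)
assert out == [(1, 60)]
```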
Because these processes are not eligible, they cannot acquire core pages. The executing processes fulfill their core requirements at the expense of these noneligible processes, and thus these latter continue to lose what pages they previously had resident in core. The longer a process is not eligible, the fewer pages it can expect to have in core. As stated earlier, a process receives a dispatching or scheduling priority based on the load it will place on the system. Since in general a command's duration is not known in advance, an adaptive technique is used to dynamically estimate the processor requirements of each process. In the MULTICS scheduler, the assumption is made that every process arriving on the ready list for the first time will execute a short command and, therefore, deserves a high priority position on the ready list. Associated with the position is some fixed time allotment t_1. When a process is picked to compete directly for processor and core resources, i.e., is eligible, the command may run to completion. If the allotted time is exhausted, a timer run-out mechanism will halt execution of the process and it will then be assigned to a lower priority position. Each lower priority position awards the process an increased allotment of time, up to some maximum, until it completes execution. The processing time allotment associated with the r-th priority position is approximately t_r = 2^(r-1) * t_1. Figure 2.6 illustrates a convenient way to conceptualize the MULTICS dispatching of processes. Even though, in fact, only one ready list exists in the MULTICS scheduling scheme, this single list effectively functions as a set of n priority queues. The processing time allotment in queue 1 is one second and approximately doubles in each queue up to queue 4. Processes are served FIFO at each priority level.
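A sketch of the doubling allotment rule, using the one-second starting value and the four-level picture of Figure 2.6 (the cap at 8 seconds is read off the figure, not stated as an exact MULTICS parameter):

```python
def allotment_seconds(r, t1=1.0, max_allotment=8.0):
    """Approximate MULTICS allotment: t_r = 2**(r-1) * t1, i.e. the
    allotment doubles with each lower priority position, capped at
    some maximum (8 s matches the four-queue picture in Figure 2.6)."""
    return min(t1 * 2 ** (r - 1), max_allotment)

assert [allotment_seconds(r) for r in (1, 2, 3, 4)] == [1.0, 2.0, 4.0, 8.0]
```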
Exact implementation of this straightforward algorithm becomes fairly complex in the MULTICS system and the reader is referred to other authors [GRE74, ORG72] for a more detailed discussion. In keeping with the policy of giving good response to interactive users that issue commands of short duration, preemption is permissible in the MULTICS system. A higher priority process can preempt a presently eligible process of lower priority. The preempted process is favorably treated, relatively speaking, in that it is placed at the top of its priority queue with a time allotment equal to whatever time is unused from its last scheduling allotment.

Figure 2.6. MIT-MULTICS Scheduling.* [Diagram: four FIFO priority-level queues with time allotments of 1, 2, 4 and 8 seconds; the lowest level is served round-robin.]

*This illustration is an approximate description of the MULTICS scheduling function. In fact, only one dispatching queue is maintained, and the system has two processors.

2.2.1.5. CANDE - University of California at San Diego (UCSD) Time-Sharing System

The CANDE interactive computing system espouses a straightforward approach which operates basically by distinguishing burst-oriented processes from those that are compute bound. Processes which are estimated to require a "small" amount of CPU time, as determined by the fact that they did not exceed their allotted time-slice during their most recent execution state, are served first-come, first-served from a high priority queue. Processes which incurred a timer run-out during their last run period are served first-come, first-served from a low priority queue. CANDE is a virtual memory system which multiprograms processes into "subspaces" of real core. If a process has the highest dispatching priority and there is adequate memory available for a swap-in, then the process receives its required core storage.
Memory assigned to a process is increased up to the fixed size of the subspace whenever the process exceeds its currently allotted space. There are five events which can cause a process to be swapped out. These include an input wait, an output wait, a process suspension, a time-slice allotment expiration and a core demand in excess of the subspace size allocated to the process during its previous swap into core. The primary goal of the subspace option is to allow a large number of burst-oriented processes to run without freezing memory resources during their dormant periods. Memory is freed by immediately swapping the process to disk when it becomes dormant. Because a large number of tasks are bidding for a limited memory resource, tasks which discontinue their burst-orientation (become compute bound) have an artificial burst rate imposed upon them. This artificial burst rate is called the process' time-slice. CANDE has two priority levels (queues) for selecting ready tasks for execution, or swapping into core, as illustrated in Figure 2.7. The lower priority queue contains processes which exceeded their time-slice during their last swap-in. The higher priority queue contains all other ready processes. These high priority processes are those which are new to the system, which have received input for which they were waiting, which have output at least half of the data excess which originally caused them to be swapped out or which have been awakened from swap-out suspension. Within this high priority or "demand status" queue processes are ordered first-in, first-out, as they are within the lower priority queue. Lower priority queue processes, or "time-sliced" processes, are swapped into available memory only if there are no demand status swap requests which can be satisfied.

Figure 2.7.
UCSD-CANDE Scheduling. [Diagram: two FIFO priority-level queues, each with time-slice f(c,n)*, between arrival and departure.]

*n and c are defined below.

Jobs feed into the first priority queue if their last removal from execution was caused by a wait or blocked state, and they feed into the second priority queue if their last cause of removal from execution was a timer run-out. The time-slice allocated to each process when it is swapped into core is computed on an individual basis and does not depend exclusively on priority level. Before allocating a processor to a swappable process, both its allowable processor time-slice and its allowable elapsed time-slice are checked. If either has been exceeded, a new slice is computed as defined by the formulas given below. The formulas for computing a time-slice are:

    Processor Time-Slice: T = (n * k1 + c + p + 8) * k2 + m * 416667

    Elapsed Time-Slice: E = T * r

where

    n is the slice number. When a process is swapped out due to a demand condition, its slice number is set to zero. Each time a process is swapped because of exceeding its (processor or elapsed) time-slice, its slice number is incremented by one. This number is subject to a maximum value of 7.
    c is the core space used by the process in chunks (1 chunk = approximately 990 words).
    m is the minimum time-slice in seconds (m = 1).
    k1 is 4.
    k2 is 5000.
    p is priority (p = 51).
    r is the ratio of elapsed time to processor time.

Time-slice units are 2.4 msec.

2.2.2. Benchmark Jobs

Three benchmark jobs were distributed on each of the computing systems studied, with some exceptions. The first benchmark job was dominated by arithmetic operations, the second consisted of manipulations of bit strings and the third was input/output bound. Listings of these benchmark jobs as they were stored and used on each computing system are presented in Appendix B.* These jobs were chosen for their distinct claims on the system resources of CPU processing, core use and I/O channel utilization. These particular listings were generated at MIT-MULTICS. Job listings from all other installations are essentially identical.

*These benchmark jobs were generated by members of a research group working under the direction of Dr. P. A. Alsberg, Center for Advanced Computation, University of Illinois, Urbana-Champaign. They were used in this research with Dr. Alsberg's permission.

The "number cruncher" or arithmetic benchmark job was written in standard FORTRAN and generates a 100 x 100 correlation matrix for a 100 x 100 input array called DATA. A main program dimensions all arrays and appropriately initializes arrays and variables. This main program then calls on a subroutine to generate the required correlation matrix. This benchmark places demands on the system resources of core (more than 20 kilobytes of core are required just for array storage) and on CPU processing (the innermost loop in the subroutine is executed .5 * 10^6 times).

The bit string manipulating benchmark job was designed to place its main system resource demand on the CPU alone. This standard PL/I program takes a 100 x 100 input matrix called REALITY whose entries are ones that can be traversed from the top row to the bottom row, traveling only vertically and horizontally between adjacent squares. A second matrix (FOUND) of the same dimensions as REALITY is used as an internal work space. Initially all entries in FOUND are zeroes. When a valid path is discovered from the first row of REALITY to an adjacent square, the corresponding neighboring element in FOUND becomes a one. Thus, the elements in FOUND that are ones represent elements in REALITY which can be reached from the first row. At each iteration, an element in FOUND becomes a one if the corresponding element of REALITY is a one (i.e., it is connected to a valid path from the top row). The process terminates either when no new ones appear in FOUND or when an element in the bottom row of FOUND becomes a one.
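The iterative FOUND-matrix computation can be sketched in a few lines (a small 3 x 3 matrix stands in for the benchmark's 100 x 100 REALITY; the PL/I original stores the matrices as bit strings, which this sketch does not attempt):

```python
def reachable_from_top(reality):
    """Iteratively mark squares connected to the top row through
    horizontally/vertically adjacent ones, the way the benchmark
    builds FOUND; stop when no new ones appear or a bottom-row
    square is marked."""
    rows, cols = len(reality), len(reality[0])
    found = [[0] * cols for _ in range(rows)]
    found[0] = list(reality[0])                  # top row seeds the paths
    changed = True
    while changed:
        changed = False
        for i in range(rows):
            for j in range(cols):
                if reality[i][j] and not found[i][j]:
                    nbrs = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
                    if any(0 <= a < rows and 0 <= b < cols and found[a][b]
                           for a, b in nbrs):
                        found[i][j] = 1
                        changed = True
        if any(found[-1]):                       # bottom row reached
            break
    return found

maze = [[1, 0, 1],
        [1, 0, 0],
        [1, 1, 1]]
# The left column connects top to bottom, so a bottom-row one appears:
assert any(reachable_from_top(maze)[-1])
```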
This program is a bit manipulating benchmark since matrices are stored and referenced as bit strings. The third and simplest of the benchmark jobs was also written in standard PL/I. It was designed to make its main resource demand on the I/O mechanism of the computing system. The program opens a file and writes 1,000 250-word records into it. It proceeds to close the file, reopen it, read the same 1,000 records back and finally close the file once again. Table 2.2 indicates exactly which benchmarks were run at each of the computing centers and explains why certain of the benchmarks were omitted.

Table 2.2. Benchmark Jobs Run at Various Computing Centers

    System        Number Crunching   Bit Manipulating   I/O Bound
                  Benchmark          Benchmark          Benchmark
    AMES-TSS      Yes                Yes                Yes
    BBN-TENEX     Yes                No*                No*
    CCN-TSO       Yes                Yes                Yes
    MIT-MULTICS   Yes                Yes                Yes
    UCSD-CANDE    Yes                No*                No*

    *PL/I is not available on this system.

2.2.3. Load Level

Each of the computing systems under study was arbitrarily said to have ten distinct load levels within which it operated. In general, the load levels are uniformly distributed intervals in which the value (e.g., number of users, load average or utilization fraction) of the two end points and the interval width depend on the load measure for a particular system and its observable load range, respectively. The k-th load level for the i-th system, l_{i,k}, is defined by the interval

    l_{i,k} = [((s_i / 10) * (k - 1)) + 1, (s_i / 10) * k]

where s_i is a measure of load in a saturated system i. For example, UCSD measures load in number of users and its highest observable load level was taken to be 30 users. The fifth load level, therefore, would be defined as

    l_{UCSD,5} = [((30/10) * 4) + 1, (30/10) * 5]

or

    l_{UCSD,5} = [13, 15].

Several exceptions to this load level definition arise owing to the individual characteristics of the systems being studied.
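The interval definition above is mechanical enough to sketch directly (plain tuple arithmetic; the UCSD example from the text serves as a check):

```python
def load_level(s_i, k):
    """Interval for the k-th load level of a system whose saturated
    load measure is s_i: ten uniform-width levels, as defined in the
    text. Returns (low, high) end points."""
    width = s_i / 10
    return (width * (k - 1) + 1, width * k)

# UCSD saturates at 30 users, so level 5 covers 13 through 15 users:
assert load_level(30, 5) == (13, 15)
```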
AMES67 measures increasing load in terms of a decreasing function, in direct contrast to all the other systems under consideration. The AMES67 measure is a utilization fraction ranging from 1.0 for no load to 0.0 for extremely heavy loads. In this case, the load interval is defined as follows:

    l_{AMES67,k} = [.1 * (10 - k), (.1 * (11 - k)) - .001].

Other exceptions to the load level definitions occur in the widths of the most heavily loaded levels (that is, load levels 8, 9 and 10). Since response times on some of the systems studied grow very large with increasing loads (response times rise to approximately one hour on the BBN-TENEX system under heavy loads), it becomes difficult to take a response time measurement within a load level that is too narrowly defined. The load varies more during these longer periods than during the lightly loaded, short response time periods. For this reason, the widths of load levels were sometimes broadened at the high end of the load level spectrum (see levels 6-10 of BBN-TENEX in Table 2.3). Still another adjustment was made in the load level definition for the BBN system running TENEX.
The TENEX load measure is one of "load average", defined as the ratio of the number of runnable jobs (jobs not blocked for I/O or otherwise in a wait state) to running jobs (jobs which are loaded in core and immediate potential candidates for CPU time-slices). The rapidly changing nature of this measure, combined with the relatively long response times for the TENEX system, even under moderate loads, necessitated overlapping load level definitions to obtain any valid response time measurements. For example, l_{BBN,8} = [10.0, 14.0] and l_{BBN,9} = [12.0, 16.0], where the end points of the intervals are load averages. Table 2.3 contains a complete listing of the load level definitions for the five systems under study.

Table 2.3. Load Level Definitions

    LOAD    AMES67-TSS     BBN-TENEX  CCN-TSO    MIT-MULTICS  UCSD-CANDE
    LEVEL   (Utilization   (Load      (Number    (Number      (Number
            Fraction)      Average)   of Users)  of Users)    of Users)
    1       (.900, .999)   ( 0,  2)   ( 1,  1)   ( 1,  7)     ( 1,  3)
    2       (.800, .899)   ( 1,  3)   ( 2,  2)   ( 8, 14)     ( 4,  6)
    3       (.700, .799)   ( 2,  4)   ( 3,  3)   (15, 21)     ( 7,  9)
    4       (.600, .699)   ( 3,  5)   ( 4,  4)   (22, 28)     (10, 12)
    5       (.500, .599)   ( 4,  8)   ( 5,  5)   (29, 35)     (13, 15)
    6       (.400, .499)   ( 6, 10)   ( 6,  6)   (36, 42)     (16, 18)
    7       (.300, .399)   ( 8, 12)   ( 7,  7)   (43, 49)     (19, 21)
    8       (.200, .299)   (10, 14)   ( 8,  8)   (50, 56)     (22, 26)
    9       (.100, .199)   (12, 16)   ( 9,  9)   (57, 63)     (26, 30)
    10      (.000, .099)   (14, >14)  (10, >10)  (64, 70)     (31, >31)

The system load was recorded before and after each response time measurement. A measurement was said to be taken at one of the ten possible load points only if both load recordings fell within the interval defined by that respective load level.

2.2.4. Response Time

The main performance measure to the user of an interactive system is response time. Users are happy if the system reacts within a time span they have learned to expect. If the system does not perform as expected, user discontent rises.
Frustration increases rapidly when expectations of immediate response are thwarted. However, frustration increases much more slowly when the expected turnaround time is such that the user turns attention away from the response time to other activities. This latter expected response time may range from approximately ten minutes to several hours. The response time to a "run" command, given that the required CPU time for the program to be run is less than one minute or so, hovers between two response classes. On the one hand, if the system is lightly loaded, program execution may be completed in a few minutes. In this case, the users would probably devote their attention solely to waiting for the system response. On the other hand, if the system is heavily loaded, full program execution may require as much as an hour, or even more, and users would turn their attention to some other activity while they were waiting. In order to measure and compare response times to run commands on heterogeneous computing systems, a definition of response time is required that will be consistent across all systems, exhibit a meaningful association to time-sharing system performance and also correspond to the users' conception of how long they have waited. J. F. Maranzano [MAR73] has proposed such a definition. Maranzano's definition of interactive response time identifies the interval "from the end of user typing of a command (often called the carriage return) to the first character of output on the terminal" as the critical time span. This response time definition meets the criteria described above in that it is measurable on all systems, the distribution of its values under varying circumstances is a description of system performance and users stop their waiting activity at the first physical sign of output on the terminal.
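Maranzano's interval, from carriage return to first output character, can be approximated with a small sketch (the use of a local subprocess in place of a network terminal session is, of course, an illustrative stand-in):

```python
import os
import subprocess
import time

def response_time_seconds(argv):
    """Rough stand-in for the thesis measurement: seconds from
    command submission (the "carriage return") to the first byte
    of output appearing."""
    start = time.monotonic()
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    first_byte = os.read(proc.stdout.fileno(), 1)  # blocks until output
    elapsed = time.monotonic() - start
    proc.stdout.read()                             # drain remaining output
    proc.wait()
    return elapsed, first_byte

elapsed, first = response_time_seconds(["echo", "done"])
assert first == b"d" and elapsed >= 0.0
```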
This definition will be slightly modified in this study to its following form:

DEFINITION: Interactive response time is the number of seconds which elapse from the end of user typing of a command (carriage return) to the first character output on the terminal indicating the completion of execution of the command.

The first output character is required to be that which signals the completion of command execution because some commands print informative messages at the beginning of their execution. Separate classes of commands are defined by Maranzano to insure that uncontrolled variability of times within each class will be minimized. We are concerned here only with the respective "load and run" command associated with each computing system that directs the system to load the (previously compiled) object version of a particular program and to proceed with its execution. Since our response time comparison is limited to this single command, no further command classifications are required. Two of the systems under study (BBN-TENEX and UCSD-CANDE) trace and record the interactive elapsed time automatically and report it to the user upon completion of a command execution. For the other three systems, the response time was measured by utilizing system clocks in various ways. The exact command sequence used in each system measurement is presented in Table 2.4. The average response time for the execution and printout of the TIME command information was calculated in each case and accounted for in the final determination of the "load and run" response time. ARPA network transmission time, which is presently less than .1 second in either direction, was not isolated in the response time determination (was recorded as part of the individual system response time). All response time measurements were made from a terminal, using commands available to all users of the system. No special hardware or software monitors were used.
Table 2.4. Command Sequence for Systems' Measurement

    SYSTEM        COMMANDS*            COMMENTS
    AMES67-TSS    TIME?                The TIME? command returns the wall
                  CALL PROGRAM         clock time.
                  TIME?
    BBN-TENEX     PROGRAM NAME         Response time to this run command is
                                       returned by the system automatically.
    CCN-TSO       TIME                 The TIME command gives the total
                  GOCOMPILER NAME      connect time.
                  TIME
    MIT-MULTICS   TIME                 TIME is a user written subroutine
                  PROGRAM NAME         that calls and displays the system
                  TIME                 clock time.
    UCSD-CANDE    EXECUTE PROG NAME    The EXECUTE command returns the
                                       response time automatically upon
                                       completion.

    *All the run commands load (if it is not already loaded) and execute the object module of the program.

3. MEASURING TIME-SHARING SYSTEMS

Response times were measured and recorded at the various observable load levels, for each of the appropriate benchmark jobs, on each of the five computing systems. The data was subsequently subjected to curve-fitting analysis in order to formulate statistically significant quadratic, cubic or exponential representations of the response time-load level relationships. Linear and nonlinear least squares regressions were performed. The curve fitting was done using a package program authored by J. A. Middleton titled "Least-Squares Estimation of Non-Linear Parameters--NLIN" [MID68]. User subroutines indicating the function to which the data are to be fit are called by the main program, which then iteratively attempts to determine the required variable coefficients (α and β in the log-normal case). The algorithm used selects an optimized correction vector for the coefficients by interpolating between the vector obtained by the gradient method and that obtained by a Taylor's series expansion truncated after the first derivative. Iteration is applied to this vector according to the least squares method of estimating parameters until one of the several stopping criteria is met.
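NLIN's gradient/Taylor-series iteration is beyond a short sketch, but the kind of curve family being fit can be illustrated with its simplest member: fitting y = a * e^(b*x) by ordinary least squares on log y (a closed-form shortcut for noise-free illustration, not the NLIN algorithm itself):

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by ordinary least squares on log(y).
    (NLIN's iterative scheme handles general nonlinear models; this
    closed form only illustrates the exponential curve family.)"""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mx, ml = sum(xs) / n, sum(logs) / n
    b = (sum((x - mx) * (l - ml) for x, l in zip(xs, logs))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(ml - b * mx)
    return a, b

# Noise-free data generated from y = 2 * exp(0.5 * x) is recovered:
xs = [1, 2, 3, 4, 5]
ys = [2 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
assert abs(a - 2) < 1e-9 and abs(b - 0.5) < 1e-9
```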
The set of criteria used to choose the curve that best fit the data included comparison of the residual mean square of each of the fits (these are presented in Appendix C), consideration of the possible and most probable shape of the curve for the time-sharing system under consideration, and special handling of "outlying" or obviously exceptional data points. A discussion of the results of the analysis for each of the computing systems under study is presented below. The plots of the curve fits presented for each benchmark on each computing system display the best pair of fits in each case and indicate which of the two fits was finally chosen. Also included in the discussion of the individual system results is a determination of whether or not "saturation" occurs within any of the observable load level intervals. Mathematically speaking, a system is said to be saturated when the probability of zero users waiting for service becomes less than some arbitrarily small number. This definition may be related to a quadratic, cubic or exponential response time curve that is relatively flat and then becomes concave upward by determining the point (or load level) at which the slope of the curve becomes greater than some arbitrarily small number. Alternatively, when the curve fit tends to have linear characteristics (a slow, steady rise), or in the interest of relating saturation to the users' experience with the system, saturation may be defined as the point or load level at which the response time exceeds users' expectations of waiting time. For the types of benchmarks and systems involved in this study, except for BBN-TENEX, two minutes was taken as a reasonable time span within which to expect job completion. Not all systems exhibit definite saturation characteristics within the observable range of the data.
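The two operational definitions of saturation just given can be stated algorithmically. The sketch below applies both tests to an assumed fitted exponential curve; the coefficients and the slope threshold are illustrative assumptions, not values taken from any system in the study:

```python
import math

# Assumed fitted curve r(x) = a * exp(b * x); coefficients are illustrative.
a, b = 8.0, 0.3

def response(x):
    return a * math.exp(b * x)

def slope(x):
    # Derivative of the fitted curve: added seconds of delay per load step.
    return a * b * math.exp(b * x)

SLOPE_LIMIT = 20.0    # the "arbitrarily small number" made concrete (assumption)
EXPECTATION = 120.0   # users' expectation of waiting time, per the text

# First load level (1-10) at which each saturation criterion is met.
sat_by_slope = next((x for x in range(1, 11) if slope(x) > SLOPE_LIMIT), None)
sat_by_wait = next((x for x in range(1, 11) if response(x) > EXPECTATION), None)
print(sat_by_slope, sat_by_wait)
```

For this illustrative curve the slope criterion flags an earlier load level than the two-minute criterion, which is the typical relation between the two definitions for a sharply rising fit.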
A summary table of saturation levels in each of the systems is presented in Table 3.1. The information given there is explained more fully in the discussions of individual systems that follow. The processing times required by each benchmark in each system are presented in Table 3.2.

Table 3.1. Systems' Saturation Level

AMES-TSS
  Saturation criteria: Response time rises above 120 seconds.
  Saturation load level: 10th or above.
  Comments: Only a gradual, steady increase in response times occurs.

BBN-TENEX
  Saturation criteria: Sharp rise in response time curve combined with excessively high response times (about 5 minutes).
  Saturation load level: 3rd.
  Comments: System response times tend to be of the magnitude of batch processing times rather than interactive processing times.

CCN-TSO
  Saturation criteria: Fairly sharp rise in response time curve combined with rise of response time above 120 seconds.
  Saturation load level: above 10th.
  Comments: System is generally lightly loaded and saturation did not occur in the observable range of the data.

MIT-MULTICS
  Saturation criteria: Extremely sharp rise in response time curve combined with rise of response time above 120 seconds.
  Saturation load level: 8th.
  Comments: System response times conform most closely to popular response time expectations.

UCSD-CANDE
  Saturation criteria: Response time rises above 120 seconds.
  Saturation load level: 8th or above.
  Comments: Only a gradual, steady increase in response time occurs.

Table 3.2. Average Benchmark Processing Times*

              NUMBER CRUNCHER   BIT MANIPULATOR   FILE FLOGGER
AMES-TSS      21                16                7
BBN-TENEX     63                NA                NA
CCN-TSO       6                 5                 2
MIT-MULTICS   45                2                 3
UCSD-CANDE    57                NA                NA

*All processing times are given in seconds. NA, not applicable, indicates that the benchmark was not run at the computing center in question.

3.1. Analysis of Individual System Data

3.1.1.
AMES-TSS

When a dispatching algorithm for time-sharing systems assigns processes to one of a set of increasingly lower priority queues depending mainly on the processes' former behavior in using their allotted time-slices (as BBN-TENEX does), the response time curve for that system is generally almost constant for a lightly loaded system and begins to rise rapidly when the load increases beyond some critical point. On the other hand, when some other criterion is the main factor in determining queue position, such as the amount a user is willing to pay or the paging behavior of the process, then the response time curve tends to rise slowly as the load increases, in a nearly linear fashion. AMES-TSS is one such system. As described earlier, core usage characteristics of a process are the main factor in determining queue position at AMES. This implies that a process with a small working set size and good locality will stay in the top priority queues, regardless of how much service it is requiring of the CPU. The response time of a process can therefore increase linearly with increased load, and need not exhibit a sharp rise at some critical saturation point. This phenomenon can be observed in the response time curves for all three of the benchmark jobs run at AMES, shown in Figures 3.1(a) through 3.1(c). Although the exponential, quadratic and cubic curves were chosen as best fits for the arithmetic, bit string manipulator and I/O bound benchmarks, respectively, within the observable load span all three curves rise slowly but steadily, in an almost linear fashion. Because of the linear shapes of the curves, saturation in this system

[Figure 3.1(a). Statistical Results - AMES-TSS. (Plot of response time in seconds versus load level.)]

*See Table 2.3.
for the correspondence between Load Levels 1-10 and the system measure of busyness for each of the five systems studied.

[Figure 3.1(b) (continued). Statistical Results - AMES-TSS. (Plot of response time in seconds versus load level.)]

[Figure 3.1(c) (continued). Statistical Results - AMES-TSS. (Plot of response time in seconds versus load level.)]

must be considered as occurring in the load level in which response time rises above 120 seconds. This rise does not occur within the observable range of the data except for the I/O bound benchmark, in which case the curve barely climbs above two minutes between the 9th and 10th load levels. The relatively long response time for this benchmark takes on significance in view of the fact that the I/O bound job required less execution time by a factor of 1:2 as compared with the bit string manipulator and 1:3 as compared with the arithmetic benchmark job. Since "pi", a measure of core contention, was used as the load measure in this system, the question arises as to whether using number of users as the load measure would yield different results. Number of users is an undesirable measure of load in the AMES system because of the tendency for local users to stay logged in for long periods of time, regardless of whether or not they are doing useful work. The response time data were plotted against number of users, however, and linear curves similar to the ones already displayed resulted as best fits. But, as can be observed from Table 3.3, the residual mean squares (RMS) were larger in every case for these plots as compared to the response time versus pi plots.
The data collected on the AMES-TSS system are complete, even though few valid observations were recorded at the 9th and 10th load levels, in the sense that AMES has adjusted its overall scheduling scheme such that the value for pi very seldom goes below 0.2. A new "Resource Allocation Scheme" attempts to guarantee some level of service to authorized priority users at various times of the day, e.g., group 1 receives top priority between 8 a.m. and 10 a.m., group 2 from 10 a.m. to 12 noon, and so on. The data, therefore, represent observations over all the load levels that AMES-TSS will assume in its present configuration.

Table 3.3. Residual Mean Squares for AMES-TSS Curve Fits

                   Residual Mean Square (RMS)
Benchmark          Using "no. of users"         Using "pi"
                   -vs- response time           -vs- response time
Number Cruncher    1.09 (10^3)                  2.76 (10^2)
Bit Manipulator    2.25 (10^2)                  1.45 (10^2)
File Flogger       1.05 (10^3)                  1.03 (10^3)

3.1.2. BBN-TENEX

Even the novice user of the TENEX system at BBN quickly forms the impression that for a fairly good sized job, even under light loads, response times are slow and tend to increase very rapidly. The exponential curve shown in Figure 3.2, chosen as the best fit to the BBN number crunching benchmark data, readily verifies this impression (as explained in Section 2.2.2, the bit manipulating and the file flogging benchmarks were not run on this system). The data range is the largest of all the systems studied, rising to a measured turnaround time of more than one hour at the tenth load level. The slope of the curve rises relatively rapidly, making a saturation point difficult to define. Only measurements at the lowest load level were consistently under 120 seconds. The BBN system response actually hovers between time-sharing and batch expectations.
The exponential curve fit reflects the success with which the philosophy of the TENEX dispatching algorithm (which predicts approximately exponential response times) is implemented in the total BBN-TENEX system. Of special interest in this system is the fact that the load measure is not the number of users, as it was in the majority of systems, but the quantity defined as "load average" in the earlier description of the TENEX system. With this quantity as the independent variable, the BBN data yield the best regression fit of any set of data. The ratio of regression sum of squares to total sum of squares is a satisfyingly high 0.872 (see Appendix C).

[Figure 3.2. Statistical Results - BBN-TENEX. (Plot of response time in seconds versus load level.)]

The BBN data can be accepted as complete in the sense that the observable range of load levels shows the interactive response time for the arithmetic benchmark rising above an intolerably high one hour. A user searching for a time-sharing system on which to run a job would surely reject the BBN-TENEX option (except under some extenuating circumstances such as free computing) when the load average rose above about 14.0, as it does in the 3rd load interval.

3.1.3. CCN-TSO

The CCN-TSO system is not often heavily loaded, with thirteen users being the maximum load observed during this study. Moreover, the processor is a powerful one, and in the context of TSO's particular dispatching algorithm the CCN system required only 6 seconds of execution time to execute the arithmetic benchmark. This was a performance improvement of more than 3:1 over the next fastest system (AMES-TSS) and of more than 10:1 over the slowest system (BBN-TENEX).
Further, since within the entire CCN computing system the TSO system is guaranteed a portion of CPU service, but not a portion of I/O service, the I/O interactions of a process become the dominating factor in determining response time. This becomes evident upon examination of Figures 3.3(a) through 3.3(c), in which the response time for the I/O benchmark is almost double that for either the arithmetic or bit manipulating benchmark, even though the I/O benchmark requires less than half the processing time of either of the latter two. Both the arithmetic and bit manipulating benchmark sets of data suggest that the CCN-TSO system has not reached saturation within the observable range. Both curves are very slowly rising and stay below 120 seconds even in the 10th load interval.

[Figure 3.3(a). Statistical Results - CCN-TSO. (Plot of response time in seconds versus load level.)]

[Figure 3.3(b) (continued). Statistical Results - CCN-TSO.]

[Figure 3.3(c) (continued). Statistical Results - CCN-TSO.]

CCN personnel estimate that their system will saturate with about twenty users, and the data suggest that this intuition may be valid. These two benchmarks require approximately the same amount of processing time (about five seconds) and their response time curves are similar. The I/O benchmark response time curve is effectively linear, rising steadily as the load increases. An exponential-like quick rise is not observed in this case because it is the I/O service, and not the processor service, that is causing the increased waiting time. This benchmark required only two seconds of processing, so it did not descend through the priority dispatching queues.
Rather, it spent time waiting as a result of increased competition with all other TSO and total CCN system jobs for limited I/O resources. This wait time grows linearly as the load increases, and it has a high degree of variability, as is seen by observing the actual data point values in the 7th through 10th load intervals.

3.1.4. MIT-MULTICS

The MIT-MULTICS data shown in Figures 3.4(a) through 3.4(c) conform most closely to the popular conception of expected response time from a time-sharing system. Considering the arithmetic benchmark plot, the exponential curve chosen as the best fit is almost constant (and below 120 seconds) until approximately the 8th load level. Between the 8th and 9th load levels, the curve shoots up extremely sharply, clearly indicating a saturated system.

[Figure 3.4(a). Statistical Results - MIT-MULTICS. (Plot of response time in seconds versus load level.)]

[Figure 3.4(b) (continued). Statistical Results - MIT-MULTICS.]

[Figure 3.4(c) (continued). Statistical Results - MIT-MULTICS.]

The combination of a fairly fast processor and a scheduling algorithm that relies very heavily on previous time-slice usage and a series of priority dispatching queues work together to achieve this expected behavior. The approximately 45 seconds of required execution time allow the benchmark job to remain in the system long enough to keep using up its formerly allotted time-slice and descend through the priority queues. Position on a low priority queue is of no significance until the probability that there are processes waiting for service becomes greater than some arbitrarily small number. This happens at the 8th load level. The other two benchmarks run at MIT required only about 2 seconds of processing each and so were not caught up in the descending queue phenomenon.
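The descending-queue phenomenon can be made concrete with a small sketch. The doubling quantum schedule below is an assumption for illustration only, not the actual MULTICS (or TENEX) schedule; the point is simply that a job of roughly 45 seconds exhausts slice after slice and sinks to the bottom queue, while a 2-second job does not:

```python
def final_queue_level(cpu_seconds, quanta):
    """Queue level (0 = highest priority) at which a job completes,
    given the time-slice allotted at each level. A job that uses its
    full slice is demoted one level; it remains at the lowest level
    once it reaches it."""
    level = 0
    remaining = cpu_seconds
    while level < len(quanta) - 1 and remaining > quanta[level]:
        remaining -= quanta[level]   # exhausted the slice: demoted
        level += 1
    return level

quanta = [1, 2, 4, 8, 16, 32]        # hypothetical doubling time-slices

print(final_queue_level(45, quanta))  # the ~45-second benchmark sinks deep
print(final_queue_level(2, quanta))   # the ~2-second benchmarks stay high
```

Under this assumed schedule the long benchmark reaches the lowest queue, where its waiting time depends on the backlog of all competing work, while the short benchmarks never leave the top of the priority structure.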
They received excellent response times regardless of the load level.

3.1.5. UCSD-CANDE

The response time-load level curve (Figure 3.5) is linear for UCSD-CANDE as it was for AMES-TSS, but for different reasons. (Recall that only one benchmark was run at UCSD.) UCSD has only two priority queues for its interactive programs: a lower priority queue for processes which exceed their previous time-slice, and a higher priority queue for all other ready jobs. All processes are served FIFO from both queues, so that except for the possible interruption by high priority processes, even jobs requiring long processor service times are served approximately round robin (RR) until completion. Response time therefore grows linearly with load, rather than exponentially. For the arithmetic benchmark, which required 57 seconds of processing time on the average, the response time rises to less than three times the execution time within the observable range of the data.

[Figure 3.5. Statistical Results - UCSD-CANDE. (Plot of response time in seconds versus load level.)]

The curve rises above 120 seconds at approximately the 8th load level, but given the average performance ratio of better than 3:1 of total response time to required execution time, a more heavily loaded system needs to be observed in order to more accurately pinpoint a saturation level, if one exists.

3.2. Comparison of Computing Systems

One of the major goals of this study of the response times of various time-sharing systems on the ARPA network was the comparison of system performance. Each of three benchmarks was run on from three to five different systems, with response time measurements being made at varying load levels. The arithmetic benchmark job was run on all five of the systems under study. The bit string manipulating and I/O bound benchmark jobs were run at AMES, CCN and MIT only.
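The linear growth of response time with load noted above for the approximately round-robin CANDE discipline can be illustrated with a minimal sketch; the quantum and service times are hypothetical, not CANDE parameters:

```python
# n identical jobs sharing one processor, served a fixed quantum at a
# time in round-robin order. The completion time of a tagged job grows
# roughly linearly in n, matching the linear CANDE response curves.
def rr_completion_time(n_jobs, service, quantum):
    remaining = [service] * n_jobs
    clock = 0.0
    while remaining[0] > 0:          # track the tagged job (index 0)
        for i in range(n_jobs):
            if remaining[i] > 0:
                slice_ = min(quantum, remaining[i])
                clock += slice_
                remaining[i] -= slice_
                if i == 0 and remaining[0] <= 0:
                    return clock
    return clock

times = [rr_completion_time(n, service=57.0, quantum=1.0) for n in (1, 2, 4, 8)]
print(times)
```

Doubling the number of competitors roughly doubles the tagged job's completion time, which is the linear behavior the CANDE data exhibit.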
The load levels are equivalent (and hence comparable) in the sense that each ith load level represents the ith (approximately) uniformly distributed load interval over the range of the observable data for a particular system. Reference should be made to Section 2.2.3 for the precise load level definitions on each system.

3.2.1. Arithmetic Benchmark

Comparison plots for the arithmetic benchmark job are presented in Figures 3.6(a) through 3.6(e). Curves shown in these figures are those determined to be the best fit in the individual system analyses. The BBN-TENEX response time curve dwarfs all other systems in comparison, as Figure 3.6(a) illustrates. Reducing the dependent variable scale by a factor of more than five, as was done in Figure 3.6(b), brings the comparison of the other systems into better perspective. Figures 3.6(c) through 3.6(e) show the 95 percent nonlinear confidence intervals for the four curves presented in Figure 3.6(b).

[Figure 3.6(a). Arithmetic Benchmark Comparisons. *See Table 2.3. for the correspondence between Load Levels 1-10 and the system measure of busyness for each of the five systems studied.]

[Figure 3.6(b) (continued). Arithmetic Benchmark Comparisons (Without BBN-TENEX).]

[Figure 3.6(c) (continued). Arithmetic Benchmark Comparisons (With 95% Confidence Intervals).]

[Figure 3.6(d) (continued). Arithmetic Benchmark Comparisons (With 95% Confidence Intervals).]

[Figure 3.6(e) (continued). Arithmetic Benchmark Comparisons (With 95% Confidence Intervals).]

The CCN-TSO, AMES-TSS and MIT-MULTICS systems give very nearly equivalent response times in the 1st through 7th load intervals.
In the 8th interval, MIT becomes saturated and response time in that system rises sharply, while CCN and AMES continue to give comparably good response times throughout the entire observable range. An important consideration in these observations is that AMES and MIT are producing favorable response time data over the entire range of usage in those systems, while the CCN data, though favorable, were collected on only a lightly loaded system. The UCSD-CANDE system, while giving quite acceptable response times, is generally outperformed by all systems except BBN-TENEX. The UCSD system reacts to saturation less radically than does the MIT system, however, and performance is better at UCSD than at MIT in the 9th and 10th load intervals. If a strict ranking were required, from fastest to slowest systems in terms of response time curves for the type of processing inherent in the arithmetic benchmark job, it would be given as CCN-TSO, AMES-TSS, MIT-MULTICS, UCSD-CANDE and BBN-TENEX. Such a ranking, though, must be considered in the context of how significant the difference between any two particular systems really is.

3.2.2. Bit Manipulating Benchmark

The bit string manipulating benchmark was run on the three systems that supported the PL/I programming language: AMES, CCN and MIT. Figures 3.7(a) and 3.7(b) present the comparative response time results for this highly CPU bound benchmark. The MIT-MULTICS system required only two seconds of execution time on the average to complete the task and clearly outperforms the AMES and CCN systems in terms of response times. Even the 95 percent confidence interval is very tight and evidences the MULTICS superiority. As Figure 3.7 indicates, the AMES and CCN curves intersect in the 4th load interval, at which point the advantage switches from CCN to AMES. The AMES 95 percent confidence interval is smaller than that of CCN, however, and indicates that of AMES and CCN, AMES generally gives the faster response time.
This is true in spite of the fact that in the AMES system the benchmark requires more than three times (16 seconds) the execution time of the CCN system (5 seconds). For a completely CPU bound job of only moderate length, requiring no significant amount of core and doing no significant amount of I/O, the ranking of systems from fastest to slowest is MIT-MULTICS, AMES-TSS and CCN-TSO.

3.2.3. I/O Bound Benchmark

The file flogging benchmark was run on the same systems as the bit string manipulating benchmark. Figures 3.8(a) and 3.8(b) demonstrate that MIT-MULTICS again gives the best response time performance, with AMES-TSS clearly second and CCN-TSO third.

[Figure 3.7(a). Bit String Benchmark Comparisons.]

[Figure 3.7(b) (continued). Bit String Benchmark Comparisons (With 95% Confidence Intervals).]

[Figure 3.8(a). I/O Bound Benchmark Comparisons.]

[Figure 3.8(b) (continued). I/O Bound Benchmark Comparisons (With 95% Confidence Intervals).]

The CCN system acknowledges that its I/O resources are those most likely to become bottlenecked. The wide variability in the CCN 95 percent confidence interval is evidence of the processes outside of TSO control that also compete for the I/O resources.

4. MODELING TIME-SHARING SYSTEMS

The system comparison data presented thus far are useful in evaluating the performance of various time-sharing systems in reference to a given set of computing applications (benchmark jobs) which require a given amount of actual processing time. In order to compare and predict turnaround time for a wider class of jobs, however, a system model is desired which accepts the processing time of a job as an independent variable rather than as an implied constant.
The approach used in this investigation is to develop an analytical and/or simulation model to describe the behavior of the various time-sharing systems under study as they process the number crunching benchmark job. These more general models are tuned to approximate as closely as possible the behavior of the already developed statistical models describing the respective systems. The tuned models, depending on the success with which they are able to describe system behavior, may then be used in place of the statistical models to predict job response time for similar job applications but for jobs requiring any amount of processing time. In addition, the effect of network delays (which was not a factor in the statistical models) is introduced into the analytical and/or simulation models to more completely predict job response time.

4.1. An Analytical Model for Time-Sharing Systems

During the late 1960's, analytical modeling of time-sharing systems with various scheduling disciplines resulted in a wide range of useful system models. A thorough survey of such models is presented by L. Kleinrock [KLE72]. Basically, the systems are studied by considering priority disciplines operating in a stochastic queueing environment. The essential elements of such systems include the source from which jobs emanate for service, the input process, the service process, the number of servers and the service discipline. Many variations exist within and among these elements, providing a wide choice of model designs. Some design parameters are strongly recommended for ease of model analysis, such as Markov assumptions for the arrival and service processes. Other design options, such as a particular queue discipline, can be more closely matched with the actual system that is being modeled. Below is a list of the set of design options that completely define the analytical model used to represent the time-sharing systems under study.
Except for the AMES-TSS system, all the systems dispatch processes through a set of priority queues, each of which has its own associated time-slice.

Source: The source was assumed to be an infinite one. The load of a system is equated with the number of job arrivals emanating from this source. This assumption is not a completely accurate one, since the load on a time-sharing system is often limited by the number of terminals with system access capability. Scherr [SCH67] has developed a model based on a finite source.

Input Process: The input process is assumed to be the Poisson process and is described by an interarrival time distribution denoted by A(t). A(t) is defined by the exponential distribution

    A(t) = 1 - e^{-λt},   t ≥ 0, λ > 0
    A(t) = 0,             t < 0.

The mean interarrival time is then 1/λ seconds. The interarrival times form a sequence of independent and identically distributed random variables.

Service Process: The service process is also assumed to be exponential and is defined by

    B(t) = 1 - e^{-μt},   t ≥ 0, μ > 0
    B(t) = 0,             t < 0.

The mean service time is 1/μ seconds. The service times are also independent and identically distributed. In a measurement study by Fuchs and Jackson [FUC70], a significant result showed that for all continuous random variables studied, the gamma distribution was an excellent fit. Because of the close relationship of the gamma and exponential distributions, analytical models studied under the assumption of exponential distributions may not be far from the truth.

Number of Servers: The number of servers is 1. The standard notation used to describe the model thus far is M/M/1, where the first and second parameters indicate the exponential distribution for the input and service process, respectively, the third indicates one server, and the lack of a fourth indicates an infinite source.
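The exponential interarrival assumption can be exercised directly by inverse-transform sampling of A(t): if U is uniform on (0,1), then t = -ln(1-U)/λ is distributed according to A(t). The sketch below (with an arbitrary illustrative rate) checks the sample mean, and for reference evaluates the classical FCFS M/M/1 mean time in system, 1/(μ - λ); the quantum-controlled discipline analyzed in this chapter yields different waiting times:

```python
import math
import random

random.seed(7)
lam = 0.5   # arrival rate, jobs per second (illustrative assumption)

# Inverse-transform sampling of A(t) = 1 - e^(-lam * t).
samples = [-math.log(1.0 - random.random()) / lam for _ in range(100_000)]
mean_interarrival = sum(samples) / len(samples)
print(round(mean_interarrival, 2))   # close to 1/lam = 2.0 seconds

# Classical FCFS M/M/1 mean time in system, shown only for reference.
mu = 1.0
print(1.0 / (mu - lam))
```

The sample mean converges to 1/λ, confirming that the sampled interarrival times carry the intended distribution.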
Service Discipline: The service discipline is quantum controlled with a variable quantum size, FB_N, FIFO, preemptive resume in the Nth queue, and zero swap time. Each of these options is discussed separately.

Quantum Controlled: Each process receives a maximum service time from the service facility equal to the quantum q_i associated with its particular queue. Different quantum sizes may be associated with different queues, but the variability is limited to a linear function of some constant quantum.

FB_N: If a job has not completed processing during its quantum, it returns to the system at the end of the next lower priority queue. There are N such queues. Units at the Nth level are served a quantum q_N at a time, in turn, until completion. That is, an Nth level process will be preempted by a higher level process if one exists, or by another Nth level process if one exists, after it has completed the quantum of service in progress.

FIFO: The service is first-in, first-out within queues.

Swap Time: The time required to swap a process in and out of memory is assumed to be absorbed in the process' required service time. The swap time is thus considered to be zero.

Of the many time-shared models presented in the literature, two meet almost all of the specifications listed above. Wolff [WOL68] analyzes a model identical to the one described except that it is FB_∞ rather than FB_N: jobs are permitted to descend through an infinite number of priority queues before completing processing. Coffman and Kleinrock [COF68] present a model identical to the one described except that it does not provide for variable quantum sizes. A modification of the Coffman-Kleinrock model extends its application to include a limited use of variable quantum sizes. In addition to the arrival and service time definitions already given, the following notation will be used:

    δ = a constant fractional amount of time allocated to a job on each pass through the system
    q_i
= the amount of time allocated to a job on its ith pass through the system, i = 1, 2, .... We require that δ ≤ q_i and q_i = m_i δ, i = 1, 2, ..., with m_i an integer.

    Q_j = the total time allocated to a job on its first j passes: Q_j = Σ_{i=1}^{j} q_i.

Since the derivation depends on the integral property of k, and since the q_i were defined to map into an integral multiple of δ, the model can be adjusted to accommodate variable q_i. For a job requiring t seconds of service in the FB_N system with fixed quanta of length δ, the expected waiting time in the system as derived by Coffman and Kleinrock is

    W_k(t) = (λ/2)[E_k(t²) + γ_k E_{k-1}(t²)] / ([1 - ρ(1 - e^{-μkδ})][1 - ρ(1 - e^{-μ(k-1)δ})])
             + [ρ(1 - e^{-μ(k-1)δ}) / (1 - ρ(1 - e^{-μ(k-1)δ}))](k-1)δ + t,    1 ≤ k ≤ N-1    (1)

    W_k(t) = ρ(1/μ) / ((1 - ρ)[1 - ρ(1 - e^{-μ(N-1)δ})])
             + [ρ(1 - e^{-μ(N-1)δ}) / (1 - ρ(1 - e^{-μ(N-1)δ}))](k-1)δ + t,    k ≥ N    (2)

where k is the smallest integer such that kδ > t, and where we define E_k(t²) as the second moment of the distribution

    F_k(τ) = 0,            τ < 0
    F_k(τ) = 1 - e^{-μτ},  0 ≤ τ < kδ
    F_k(τ) = 1,            τ ≥ kδ

with

    E_k(τ) = (1/μ)[1 - e^{-μkδ}],
    E_k(τ²) = 2/μ² - (e^{-μkδ}/μ²)[(μkδ)² + 2μkδ + 2],

and where

    γ_k = e^{-μkδ} / (1 - e^{-μkδ})  and  ρ = λ/μ,

where ρ is a measure of system utilization. Now, since q_i ≥ δ and q_i = m_i δ for all i, with m_i an integer, the number of δs required to service a job can be partitioned into A_i subsets in such a way that there exists a unique mapping between the A_i's and the q_i's. Let the partitions of the δs be defined by sets A_i = k_i δ, with k_i an integer. If a process requires t seconds of service time, with k the smallest integer such that kδ > t and m the smallest integer such that Q_m > t, then partition the k δs into m subsets, each representing a sum of δs, such that

    k_i δ = A_i = q_i,    1 ≤ i ≤ m-1    (3)
Define N' N= E q,/5- i=l 1 Returning to the Coffman-Kleinrock model, let the values that k assumes in equations (l) and (2), instead of being any integer, be only those integers . for which J P . = I k. for 1 < j < m . (5) J i=l X " " Then can be substituted for k in those equations since I is the m m smallest of the I. integers such that H. s > t. J J As an example of this type of mapping, consider t = .93, 5 = .1, q. - 2 5 and N' = k. Clearly, q. < F> and q. = m. 5 for all i and m. an integer. Further, for k = 10, k is the smallest integer such that k ft = 10* .1 > t = .93 and for m = k, m is the smallest integer such that _ = q + q^ + q + q } = .1 + .2 + .k + .8 > t = .93- 86 Now ^ = l*.l = ■1= 4l J k l = 1, Ag = 2*.l = ,2--qg J k 2 = 2, A3 - 1...1 = •u=, 3 J k 3 = >*, A U = 3*.l = •3 < 1 u > k U = 3 and « 1= 1, i 2 =3, jg =T, 4^=10. This mapping changes the way the system is conceptualized in a greater degree than it changes the way the system actually works. Figure 4.1 illustrates this change for k.=4-. Assuming the job requires at least k service quantums before completion when it arrives, a job passing through the Coffman-Kleinrock system receives k short bursts of service, each time taking its place on the next lower priority queue and waiting for jobs of higher priority to be processed first. A job passes through the modified system in one service burst, after having waited for all jobs queued at that priority level to use their required service quantum of up to k. The restriction on the choice of k divides the first type (Coffman-Kleinrock model) of system into several 'black boxes" each of which represents an equivalent service quantum available in the second system. 87 Figure k.l. 
Comparison of Two Models (Coffman-Kleinrock model: jth through (j+4)th priority levels; modified model: ith priority level)

In order to see how the restriction on the choice of k affects the expected waiting time results, we consider a tagged job arriving at the FB_N system in equilibrium, assuming that its service requirement is t seconds, that k is the smallest integer such that kδ > t, and that m is the smallest integer such that Q_m > t. The system must be divided into two disjoint subsystems to derive the modified system equations. We will first examine the progress of the tagged job for its first ℓ_{m-1} passes through the system, and then consider the tagged job's ℓ_m-th pass through the system separately. We have defined A_i subsets, 1 ≤ i ≤ m, to partition the δ-quanta required to service a job. We now consider the waiting time in queue of the tagged job as it passes through an A_i subset of quanta for any i < m. We will define this waiting time as W_i, where

W_i = W_{ℓ_i}(t) - W_{ℓ_{i-1}}(t),   i < m.   (6)

Assuming that the units in all queues of priority higher than i have been processed, in the modified system the waiting time of the tagged job is affected only by those jobs which are ahead of it in the i-th queue. These jobs will receive their q_i quantum of service under a strictly FIFO discipline, and then the tagged job will receive its q_i quantum of service, completely independent of jobs which have arrived during the waiting interval of the tagged job on queue i. This is not the case in the Coffman-Kleinrock system. Still working under the assumption that the units in all higher priority queues have been processed, and also that k_i > 1, in the Coffman-Kleinrock system a tagged job's total waiting time in the j-th δ-quantum queues, ℓ_{i-1}+1 ≤ j ≤ ℓ_i, is dependent upon new arrivals that occur during the tagged unit's waiting time.
This is so because these new arrivals will start to receive δ quanta of processing time before the tagged job has received its total j δ-quantum service slices. If no new arrivals occurred during the tagged job's waiting time, the tagged job would experience identical waiting times in both systems. The waiting time in the j-th δ-quantum queues in the Coffman-Kleinrock system is greater than that in the i-th queue in the modified system by a factor that depends on the average number of new arrivals to that set of queues. We define E(T_i) to be this extra expected waiting time. The average number of arrivals must be based on W_i + (k_i - 1)δ, since new arrivals can seize δ-quanta of service until the tagged job begins its last δ of processing.

The average arrival rate to the i-th queue, λ_i, is determined by the following consideration. A job arrives for service at the i-th queue only if it requires more than ℓ_{i-1}δ seconds of processing. We recall that B(t) = 1 - e^{-μt} is the service time distribution, where B(t) represents the probability that the service time is less than or equal to some number t. The inverse is formed by solving for t: t = -(1/μ) ln[1 - B(t)]. The inverse form can be used to calculate the probability that t is greater than some particular ℓ_{i-1}δ. If we call this probability p_{ℓ_{i-1}}, then λ_i = p_{ℓ_{i-1}} λ. As an example of this process, we consider ℓ_{i-1}δ = .7 and seek the probability that t > .7. For t ≤ .7 and μ = 2/3, B(t) = .37, so that p_{ℓ_{i-1}} = .63 and the arrival rate to queue i is given by λ_i = .63λ. Having determined the arrival rate, λ_i, and the interval in which these arrivals take place, W_i + (k_i - 1)δ, the average number of arrivals is calculated as the product of these two quantities. The time by which the tagged job will be
delayed is the product of this average number of arrivals and their average service time requirement, Ē_{k_i-1}(t). Only service times strictly less than or exactly equal to (k_i - 1)δ are significant here, since (k_i - 1)δ is the maximum service time a job arriving to this queue will receive before the tagged job completes its service requirements. The expression for the average service time, therefore, is given by Ē_{k_i-1}(t). E(T_i) must be subtracted from the Coffman-Kleinrock response time equations; that is, the term -Σ_{i=1}^{m-1} E(T_i) must be added. These terms may be considered independently for each A_i subset because, even though a job may wait longer to complete service in the j-th δ-quantum queues of the Coffman-Kleinrock system than in the corresponding i-th queue in the modified system, the relative ordering of the jobs does not change from one system to the other. That is, when the job arrives at either the (ℓ_i + 1)st queue or the (i+1)st queue, it sees the same queue configuration in either system.

We now consider the ℓ_m-th, or m-th, pass, where the waiting time is not the same for the two models if q_m > δ. In the fixed quantum system, a job continuously receives small bursts of service up to and including its k-th burst, waiting only for the other jobs in the system to receive their corresponding bursts. But in the variable sized quantum system, a job that is queued for service at the m-th priority level must wait until the jobs ahead of it receive their total quantum of service, up to the maximum allotted at that level. Since an arrival requires service at the ℓ_m-th priority queue (which consists of the ℓ_m - ℓ_{m-1}
δ-service queues) or the m-th priority queue only if it requires in excess of ℓ_{m-1}δ seconds of service, the average arrival rate to the ℓ_m-th queue, λ_{ℓ_m}, is given by λ_{ℓ_m} = p_{ℓ_{m-1}} λ. If W_{ℓ_m} is the time our tagged job must wait for service, then the expected average number of arrivals to the ℓ_m-th queue must be based on W_{ℓ_m} + ℓ_{m-1}δ, since the tagged job receives ℓ_{m-1}δ seconds of service before reaching the ℓ_m-th queue. Therefore, the expected average number of arrivals to the ℓ_m-th queue prior to the tagged job would be

λ_{ℓ_m}[W_{ℓ_m} + ℓ_{m-1}δ].

The average service time distribution for the queue arrivals differs depending on whether the job is serviced in system one, the Coffman-Kleinrock system, or in system two, the modified Coffman-Kleinrock system. In system one, ℓ_m - ℓ_{m-1} queues remain through which the tagged job must pass before completion. Each of the arrivals to the (ℓ_{m-1} + 1)st queue must have remaining quanta of service of which Ē_{ℓ_{m-1}}(t) is the average amount. The expected time to process all jobs before the tagged job in the ℓ_m - ℓ_{m-1} interval is therefore

λ_{ℓ_m}[W_{ℓ_m} + ℓ_{m-1}δ] Ē_{ℓ_{m-1}}(t).

This waiting time is already included in the Coffman-Kleinrock equations. In system two, each arrival to the m-th queue will have remaining a quantum of service of which Ē_{ℓ_m}(t) is the average amount. The expected time to process all jobs before the tagged job is therefore

λ_{ℓ_m}[W_{ℓ_m} + ℓ_{m-1}δ] Ē_{ℓ_m}(t).

The term that must be added to the Coffman-Kleinrock equations (1) and (2), therefore, to make the results valid for the variable quantum size model, is

λ_{ℓ_m}[W_{ℓ_m} + ℓ_{m-1}δ][Ē_{ℓ_m}(t) - Ē_{ℓ_{m-1}}(t)].

If q_m = δ, then the term is zero.

Thus, with the two modifications detailed above, the modified Coffman-Kleinrock model becomes directly applicable to time-sharing systems of the type represented by the general time-sharing model of Figure 2.2.

4.2.
A Simulation Model for Time-Sharing Systems

A GPSS simulation model of a time-sharing system with a scheduling discipline identical to that specified for the analytical model was also developed. A flowchart of this model as it simulates the MIT-MULTICS time-sharing system is presented in Figures 4.2(a)

Figure 4.2(a). Simulation of MIT-MULTICS Time-Sharing Scheduler (Generation of Tagged Jobs): generate job; assign required service time; assign service time remaining; assign first service slice; assign scheduling priority; tag this job; transfer to scheduler.

Figure 4.2(b) (continued). Simulation of MIT-MULTICS Time-Sharing Scheduler (Generation of Jobstream): generate job; assign required service time; if service time equals zero, assign one service unit (SCH7); assign service time remaining; assign first service slice; assign scheduling priority.

Figure 4.2(c) (continued). Simulation of MIT-MULTICS Time-Sharing Scheduler (Scheduling Discipline): rescan the current events chain (SCH3); queue the job for processing; seize the processor and reserve it; collect relevant queue statistics; test whether this is the last service slice (SCH1); give the job its required service slice (SCH2); release the processor for the next job.

Figure 4.2(d) (continued). Simulation of MIT-MULTICS Time-Sharing Scheduler (Job Parameter Updating): test whether the terminate flag is set; assign service time remaining; if this is not the largest allowable allotment, assign an increased time allotment; assign reduced priority and send to scheduler (SCH5, SCH3); otherwise assign the last service slice, set the terminate flag, and send to processor (SCH1, SCH4); terminate job.

Figure 4.2(e) (continued).
Simulation of MIT-MULTICS Time-Sharing Scheduler (Run Time Control): generate timer; stop run.

through 4.2(e), with the chart symbols identical to those of Schriber's in his General Purpose Simulation System/360: Introductory Concepts and Case Studies [SCH71]. Figures 4.2(b) through 4.2(d) illustrate the heart of the simulator as it generates jobs with exponentially distributed interarrival times, assigns service times exponentially distributed about some mean, and services the jobs according to the scheduling discipline described for MIT-MULTICS in section 2.2.1.4. The simulator generates tagged jobs for data collection purposes, and this process is diagrammed in Figure 4.2(a). Figure 4.2(e) shows the control module for the desired running time of the simulator.

4.3. Analysis of Model Predictions

The analytic and simulation models were developed to generalize the predictive capability of the statistical response time models. The conceptualization and definition of the analytical and simulation models were derived from the generalized time-sharing scheduling diagram shown in an earlier chapter in Figure 2.2. As a result of the generalized conceptualization of the models, they can be expected to most closely describe those time-sharing systems which are most similar to the generalization. Since the AMES-TSS system scheduler depends on core usage behavior rather than processor usage, the analytical and simulation models do not apply to that system. They also are not applicable to the UCSD-CANDE system, since the models allow a variable, but fixed, service slice at each priority level, whereas time-slices are dynamically awarded in CANDE's priority queues as a function of parameters generated during past processor usage. The models, therefore, have been particularized to the three remaining time-sharing systems: BBN-TENEX, CCN-TSO and MIT-MULTICS.

4.3.1.
Individual System Results

BBN-TENEX Models--Both the analytic and simulation models were developed for the BBN-TENEX system. The results of these model predictions are plotted in Figure 4.3. The analytic model is valid only for values of system utilization, ρ, less than one, so that since the BBN system saturates under relatively light loads, predictions from the analytic model are possible only for load levels 1-6. The technique used to tune the analytic and simulation models to closely represent the TENEX system was, after setting up the appropriate priority queues and assigning their associated time-slices, to adjust the average service time and average interarrival rate parameters so that the analytical model prediction for the number-crunching benchmark job was as similar to the statistical model prediction as seemed feasible. For TENEX, the best results were obtained for an average service time of twenty seconds. The average interarrival rates were associated with previously defined TENEX load levels as indicated in Table 4.1. As can be observed from Figure 4.3, both the analytic and simulation model plots yield satisfactorily close fits to the statistical model plot. They are also well within the 95 percent confidence interval of the statistical model.

Figure 4.3. Model Comparison - BBN-TENEX (response time in seconds vs. load level)

Table 4.1. Analytical Model Parameters

Load Level | Associated Average Interarrival Rate
           | BBN-TENEX | CCN-TSO | MIT-MULTICS
    1      |    29     |   60    |
    2      |    25     |   60    |
    3      |    24     |   30    |    30
    3.5    |    23     |         |
    4      |           |   20    |
    4.5    |    22     |         |
    5      |           |   20    |    15
    6      |    21     |   15    |    12
    7      |           |   12    |    10
    8      |           |         |     8
    8.5    |           |   10    |
    9      |           |   9.5   |     8
    10     |           |         |    19

CCN-TSO Models--Since the run times needed to obtain comparable response time results from the analytical and simulation models are greater by a factor of approximately ten for the simulation model, and since the analytical model results correspond so closely with those of the statistical model for the CCN-TSO system, only the analytical model was developed in this case.
Model comparison results are presented in Figure 4.4. The average service time for jobs in this system was tuned to seven seconds

Figure 4.4. Model Comparison - CCN-TSO (response time in seconds vs. load level)

and the average interarrival rates are associated with load levels as shown in Table 4.1. The 95 percent confidence interval is a relatively wide one for the TSO statistical model, and the analytical model results fall well within this interval for all load levels.

MIT-MULTICS Models--Because the statistical response time curve for the MIT-MULTICS system was most like that usually associated with time-sharing systems, this system was initially used to develop and validate both the analytical and simulation models. The average service time was tuned to seven seconds, and the average interarrival rate/load level association can again be found in Table 4.1. The models plotted in Figure 4.5 verify that the analytical and simulation models indeed yield very nearly identical results for this well-behaved MULTICS system, and that for all but approximately one load level range (6.5-7.5) the analytical and simulation models fall within the 95 percent confidence interval of the statistical model. This confidence interval is relatively tight, and it is only near system saturation that the two models tend to move unacceptably far away from the statistical model results. This discrepancy is easily explained by the fact that the MIT-MULTICS system deviates from the generalized time-sharing scheduler model in that it has two processors rather than one. The analytical and simulation models, therefore, would approach saturation more quickly than the statistical model, which represents actual two-processor system data.

4.3.2.
Success of Model Generalization

Figure 4.5. Model Comparison - MIT-MULTICS (response time in seconds vs. load level)

The striking success with which the analytical and simulation models were able to describe system behavior for the set of time-sharing systems whose scheduling discipline can be conceptualized by the generalized time-sharing model (Figure 2.2) indicates that these models can be used for more extended predictions of system behavior. Having been validated against the statistical models based on actual system measurements, the analytic and simulation models can now be utilized to predict system behavior for jobs with characteristics similar to the number-crunching benchmark, but with variable processing time requirements. One example of a set of such predictions is shown in Figure 4.6. In this case, the MIT-MULTICS simulation model was used to predict response times for jobs requiring various amounts of processing time, t, as the load level increases. The relative ease with which the analytical and simulation models could be tuned to reproduce the statistical model results for the number-crunching benchmark job indicates that this process could be easily repeated for the other benchmark jobs on the appropriate systems (CCN-TSO and MIT-MULTICS, since only the number-crunching benchmark was run on BBN-TENEX). Thus, the goal of finding a single model capable of describing and predicting response times for time-sharing systems has been accomplished in the case where the time-sharing scheduling discipline depends on quanta fixed at each priority level, but variable across priority levels, and on the past processing history of the job to be serviced.
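The scheduling discipline these models share (a fixed quantum at each priority level, variable across levels, with priority dropping as service accumulates) can be mirrored in a compact discrete-event sketch. The following is a minimal single-server Python analogue, not a transcription of the GPSS program; the quanta and the two illustrative jobs are invented for the example, and arrivals are admitted between service slices (non-preemptively):

```python
def simulate_fb(jobs, quanta):
    """Single-server multilevel feedback queue: level i grants a service slice
    of quanta[i]; an unfinished job drops to level i+1 (capped at the last
    level); service within a level is FIFO. `jobs` is a list of
    (arrival_time, service_time); returns per-job completion times."""
    pending = sorted(jobs)                    # (arrival, service), by arrival
    queues = [[] for _ in quanta]             # FIFO list per priority level
    done = {}
    clock, nxt = 0.0, 0
    while len(done) < len(jobs):
        while nxt < len(pending) and pending[nxt][0] <= clock:
            queues[0].append([pending[nxt][1], nxt]); nxt += 1
        level = next((l for l, q in enumerate(queues) if q), None)
        if level is None:                     # processor idle: jump to next arrival
            clock = pending[nxt][0]; continue
        job = queues[level].pop(0)
        slice_ = min(job[0], quanta[level])
        clock += slice_; job[0] -= slice_
        while nxt < len(pending) and pending[nxt][0] <= clock:
            queues[0].append([pending[nxt][1], nxt]); nxt += 1
        if job[0] > 1e-12:                    # not finished: lower the priority
            queues[min(level + 1, len(quanta) - 1)].append(job)
        else:
            done[job[1]] = clock
    return [done[j] for j in range(len(jobs))]

# a long job arrives first; a short job arriving during its first slice
# overtakes it, as the feedback discipline intends
finish = simulate_fb([(0.0, 1.0), (0.05, 0.1)], quanta=[0.1, 0.2, 0.4, 0.8])
print(finish)   # the short job (index 1) finishes well before the long one
```

Replacing the fixed job list with exponentially distributed interarrival and service times, and tagging selected jobs for measurement, gives the shape of the GPSS experiment described above.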
Although both the analytical and simulation models successfully meet this goal, the analytical model produces its results in approximately one-tenth the time of the simulation model; it may therefore be the more practical model for actual use in cases where response times for load levels beyond the saturation point of the system are not required.

Figure 4.6. Generalized Simulation Model Results (response time in seconds vs. load level, for several values of required processing time t)

4.4. Consideration of Network Queueing Delays

The response time measurements taken on individual systems of the ARPA network for this study did not distinguish the delay due to network transmission and queueing from the delay due to individual system busyness. This dichotomy of delays was considered to be insignificant at the time the measurements were taken, since network traffic was generally light and only a short run command, as opposed to the total benchmark program, was transmitted. Network transmission and queueing delays were estimated at their maximum to be on the order of .1 second in either direction, and as such did not contribute measurably to the individual system response time delays. The question now arises as to the effect of network transmission and queueing delays on comparative system response times, given that network traffic increases by a significant amount in the future. G. D. Cole, in his extensive measurement work on the ARPA network [COL71], develops expressions for the serial transmission delays and the queueing component delays of ARPA network messages. The network delay time as calculated using Cole's expressions can be added to the delay times generated by the individual system response time models to form a composite response time model. The delay caused by physically sending a message on the ARPA network from one node to any other node has two components: the service times at each IMP to store and forward the message, and the actual serial transmission delay.
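The two components combine by simple addition: hops times per-IMP store-and-forward service, plus the propagation delay. The sketch below uses the hop counts and service times tabulated for four of this study's experimental sites (the values are taken from the calculations that follow; the function name is only illustrative):

```python
# hop counts, per-IMP store-and-forward service (msec), and propagation
# delay (msec) for run-command messages originating at the Illinois node
sites = {
    "AMES": (4, 4.0, 20.0),
    "BBN":  (3, 3.4, 10.0),
    "MIT":  (1, 3.4, 10.0),
    "UCSD": (8, 4.0, 20.0),
}

def transmission_msec(hops: int, per_imp_msec: float, prop_msec: float) -> float:
    """Total delay = store-and-forward service at each IMP + serial propagation."""
    return hops * per_imp_msec + prop_msec

for name, params in sites.items():
    print(name, transmission_msec(*params))
# AMES 36.0, BBN 20.2, MIT 13.4, UCSD 52.0
```

Even the largest of these totals is a few hundredths of a second, which is why the component is negligible against response times measured in seconds.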
For this experiment, the run commands were either one- or two-word messages, and their expected store-and-forward service times were 3.4 and 4.0 msec, respectively [COL71, p. 131], at each IMP. The propagation delay is about 10 μsec/mile, resulting in a cross-country delay of approximately 30 msec. Assuming that the University of Illinois node is the one from which all messages originate, and assuming that routing occurs in an environment in which all nodes are connected as shown in Figure 2.1, transmission delays for the run command message can be estimated. Table 4.2 summarizes these calculations. Inspection of the table reveals that even the longest transmission delay, .05 seconds to UCSD, is insignificant when response time measurements are recorded in seconds.

Table 4.2. Transmission Times from Illinois to Experimental Sites

Destination | No. of store & forward transmissions | Expected service time at each IMP (msec) | Total expected IMP service time (msec) | Propagation delay (msec) | Total transmission time (msec)
AMES | 4 | 4.0 | 16.0 | 20.0 | 36.0
BBN  | 3 | 3.4 | 10.2 | 10.0 | 20.2
CCN  | 7 | 3.4 | 22.8 | 20.0 | 42.8
MIT  | 1 | 3.4 |  3.4 | 10.0 | 13.4
UCSD | 8 | 4.0 | 32.0 | 20.0 | 52.0

The queueing component of message delay may be a significant addition to individual response time, however, if the ARPA network becomes congested. Cole's expression for the expected message queueing delay [COL71, p. 134] is

W = (λ_m/2)[(x̄_a)² + (x̄_a + x̄_m)²] / {[1 - λ_m x̄_a][1 - λ_m(x̄_a + x̄_m)]}

with variables defined in the following way:

λ_m - arrival rate of messages into the network.

x̄_a - service time for an ACK, or acknowledgment. Each message is answered by a request for next message, RFNM, which must in turn be answered by an ACK. Therefore, a number of ACKs will be in contention for the service facility along with the messages themselves and, in heavy traffic conditions, will effectively increase each service time by the 3.0 msec that is required to transmit an ACK.
x̄_m - average message service time.

Using the average message service times for the various destinations listed in Table 4.2, and allowing λ_m to increase, the effect of network congestion on comparative response times can now be investigated. Cole defines an alternative system descriptor to λ_m, called T_a, where T_a is the transmission attempt interval, or the time between "attempts" at transmission, since no transmission will occur on a link which is waiting for a RFNM (Request for Next Message) return. Further, if N is the number of active nodes, or "generators" of transmissions, then λ_m = N/T_a. Assuming that 34 nodes are active simultaneously on the ARPA network, the value of T_a for which the expected queueing delay approaches infinity for transmission of a message from a particular node becomes a meaningful basis of comparison between nodes. For example, if messages are being transmitted from the University of Illinois node to one of the five systems investigated in this study, then network queueing delays to each of these systems approach infinity for the values of T_a listed in Table 4.3. From the table, it can be observed that while queueing delays to UCSD from the University of Illinois approach infinity when the network transmission attempt interval is slightly higher than 1 second, transmissions to MIT are not adversely affected until T_a is close to .2 seconds. Not evident from the table information is the fact that network transmission and service speeds on the order of milliseconds cause the queueing delays to be sensitive to changes in transmission attempt intervals on the order of milliseconds. Queueing delays to UCSD, for instance, do not rise above one second until T_a = 1.215 seconds. From that point, congestion quickly increases, so that at T_a = 1.195 seconds saturation occurs. Likewise, at MIT queueing delays rise above one second only at T_a = .218 seconds, and saturation occurs at T_a = .217 seconds.
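The saturation points themselves follow from the denominator of Cole's expression: the delay grows without bound as λ_m(x̄_a + x̄_m) approaches 1, and with λ_m = N/T_a this gives a critical interval T_a* = N(x̄_a + x̄_m). A short check, assuming x̄_a = 3.0 msec, N = 34, and the total IMP service times of Table 4.2, reproduces the tabulated values to within a millisecond or two of rounding:

```python
X_ACK = 3.0      # msec to transmit an ACK (from the text)
N_ACTIVE = 34    # simultaneously active nodes assumed in the text

# total expected IMP service time per run-command message (msec), per site
x_msg = {"AMES-TSS": 16.0, "BBN-TENEX": 10.2, "CCN-TSO": 22.8,
         "MIT-MULTICS": 3.4, "UCSD-CANDE": 32.0}

# T_a at which lambda_m * (x_ack + x_msg) reaches 1, i.e. queueing delay -> infinity
critical = {site: N_ACTIVE * (X_ACK + xm) for site, xm in x_msg.items()}
for site, t_a in critical.items():
    print(site, round(t_a))
# AMES-TSS 646, BBN-TENEX 449, CCN-TSO 877, MIT-MULTICS 218, UCSD-CANDE 1190
```

The closer a site's total message service time, the lower the transmission attempt interval at which queueing delays to it explode, which is exactly the ordering the comparison exploits.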
Table 4.3. Infinite Network Delays from U. of I. Node

System | T_a value at which network queueing delays approach infinity (msec)
AMES-TSS | 645
BBN-TENEX | 448
CCN-TSO | 878
MIT-MULTICS | 217
UCSD-CANDE | 1190

In cases where one system responds faster than another, but where network traffic causes larger queueing delays to the faster system for a given (low) value of T_a, the network queueing delay becomes a significant consideration in system comparison during periods of heavy network usage, and must be included as a part of the predictive response time models.

5. A DYNAMIC RESPONSE TIME MONITOR

The major purpose of this research was to investigate methodologies and models which could be utilized to develop a dynamic response time monitor for ARPA network users. The monitor is to supply on-line, real-time information about the level of busyness, or load level, of each computing node of the network, and also to supply comparative response time data for particular computing applications at each of these nodes. Research results indicate currently feasible features of such a monitor and also suggest additional features that should be implemented.

5.1. Currently Feasible Monitor Features

Evidence is available from the investigation of response time at the five computing nodes included in this study to suggest three immediately implementable monitor features. The first of these is a table of load levels at each node by time of day and day of the week. If ten load levels are defined across the observable load range for all computing nodes, as was described in section 2.2.3, then users could gain a snapshot overview of relative busy times at any one node. This type of information might influence a decision about when to do work on a particular system. An example of a section of such a table has been compiled for the AMES-TSS system, and it is presented in Table 5.1.
The data in the table approximates system behavior during May and June of 1974. For user convenience, the time of day on these tables should be translated to the time zone (EDT, EST, PST, etc.) from which an inquiry is made. Times in the AMES-TSS table correspond to the time framework of a user at the University of Illinois node.

Table 5.1. Load Levels at AMES-TSS

Time        | Sunday | Monday-Friday | Saturday
-8 AM       |  1-2   |      1-2      |   1-2
8-9 AM      |        |      1-2      |
9-10 AM     |        |      1-3      |   1-2
10 AM-2 PM  |        |      5-8      |
2-3 PM      |        |      5-6      |
3-7 PM      |        |      7-9      |
7-8 PM      |        |      3-4      |
8-9 PM      |  1-2   |      3-4      |
9 PM-       |        |      1-2      |

A second feature to be included in a dynamic response time monitor is a descriptive text explaining relevant local factors affecting system response times at each node. In some cases, such explanations are buried in "HELP" files associated with a particular time-sharing system. Also, the Network Information Center of the ARPA network provides a brief explanation of local conditions in its NIC publication No. 18666. These sources of local load information are either incomplete (not available for every node) or out of date, and are not necessarily easily accessible to all potential users of a particular computing node. For example, the NIC "Service Schedule" description for the AMES-TSS system, published in August of 1973, reads as follows:

AMES-67 is available 24 hours per day but severe loading generally restricts access from 0800 to 1700 PST. The weekend schedule varies. Typical load is 30-50 users (including batch). The maximum number of users is regulated dynamically by loading. Network users are not regulated separately. [ANR73b]

This description is accurate, but omits information that may be useful, or at least of interest, to a network user. For instance, the AMES system has developed a "Resource Allocation Scheme" which attempts to guarantee a certain level of service to authorized priority users at various times throughout the day (one group has priority from 8-10 AM, another from 10 AM-noon, and so on).
Because of this, the load measure, P1, rarely goes below .250. When the guaranteed level of service for a particular priority group is being threatened by a heavy load, system access is curtailed for all non-priority users, including the non-priority network user. A further point of interest about the AMES system is that the local user group works a fairly regular 8 AM-5 PM schedule, taking the noon hour for lunch. Thus, the machine is lightly loaded from noon to 1 PM PST.

In addition to the load level tables and the load descriptive text, the dynamic response time monitor must include an inquiry feature by which a network user can obtain actual current comparative response time data for a job to be processed. The inquiry feature would be made up of two interactive modules: the user interface and the predictive mechanism. The user interface would require user input consisting of the set of nodes at which response time is to be calculated and the CPU and I/O processing characteristics of the job to be submitted. The output to this user inquiry would consist of a list of expected response times at each of the indicated nodes, including the current load level at each node. This output data would be generated by the predictive module of the inquiry feature. Prediction, of course, is at the heart of the dynamic response time monitor, and the feasibility of the predictive feature has been verified by this research. For each of the five different time-sharing systems investigated in this study, it was possible to develop a statistical model in all cases, and an analytic and simulation model in most cases, to describe and predict the response time behavior of that system as it processed a limited set of benchmark jobs. The initial indication from the analytic and simulation models is that they can be easily extended to predict response times for more general classes of jobs than the three benchmark applications.
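An outline of how the table feature and the two-module inquiry feature might fit together can be sketched in a few lines. Everything here is hypothetical illustration (the function names, the load-table entries, and the linear response-time stub are invented for the example, not taken from the actual monitor or models):

```python
from typing import Dict, List, Tuple

# illustrative weekday load-level table for one node, keyed by local-hour
# ranges in the table's own time zone; values are (low, high) load levels
LOAD_TABLE: Dict[str, List[Tuple[range, Tuple[int, int]]]] = {
    "AMES-TSS": [(range(0, 9), (1, 2)), (range(9, 10), (1, 3)),
                 (range(10, 14), (5, 8)), (range(14, 15), (5, 6)),
                 (range(15, 19), (7, 9)), (range(19, 21), (3, 4)),
                 (range(21, 24), (1, 2))],
}

def current_load(node: str, hour_local: int, user_utc_offset: int,
                 table_utc_offset: int = -5) -> Tuple[int, int]:
    """User-interface half: translate the inquirer's local hour into the
    table's time zone and look up the expected load-level range."""
    table_hour = (hour_local - user_utc_offset + table_utc_offset) % 24
    for hours, level in LOAD_TABLE[node]:
        if table_hour in hours:
            return level
    raise KeyError(node)

def predicted_response(node: str, load_level: float, cpu_secs: float) -> float:
    """Predictive-module stub: a hypothetical linear model in load level,
    standing in for the statistical/analytic models of chapter 4."""
    return cpu_secs * (1.0 + 0.5 * load_level)

# a user on EDT (UTC-4) inquiring at 11 AM local time sees the 10 AM entry
lo, hi = current_load("AMES-TSS", 11, user_utc_offset=-4)
print((lo, hi))                                           # (5, 8)
print(predicted_response("AMES-TSS", hi, cpu_secs=3.0))   # 15.0
```

A real monitor would replace the stub with the per-system models and report one (response time, load level) pair per queried node.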
Moreover, the systems themselves represented a wide range of time-sharing scheduling implementations, including the unique AMES-TSS table-driven, memory-usage-dominated system. Successful description and prediction of behavior for this wide variety of time-sharing schedulers suggests equal success with other time-sharing systems whose scheduling is any variation on the general time-sharing scheduling algorithm as described in section 2.1.

Results have also been obtained for the network transmission and queueing delays, which add significantly to response time when the network itself becomes congested. Should network usage become such that the network approaches a saturated state, these queueing delays would have to be added to the individual system delay. Although calculations made for this research were done only for very short messages, the same Cole expression can be used when the input or output message length is expected to be greater than that able to be transmitted in one message packet. Thus, an analytic model, able to be used from any network node, is available for prediction of this component of the response time.

5.2. Additional Desirable Monitor Features

Besides the features which have already proven to be immediately implementable components of a dynamic response time monitor, there exist other desirable monitor features which would make utilization of network resources easier for the user. Chief among these is comparative cost information. Some preliminary work done by Peter Alsberg at the University of Illinois Center for Advanced Computation illustrates the difficulties encountered in collecting charging-algorithm data for individual systems on the ARPA network. Some systems have free accounts for network users, and some heavily subsidized systems use charging algorithms that do not reflect their actual expenses.
Further, information is needed on network routing expenses, since if charging were done on a node-by-node basis, some systems which offer a cost advantage as individual entities might lose that advantage due to extensive job routing requirements. Given both comparative response time data and comparative cost data, the dynamic monitor could be extended to appear to the user as a dealer in network services. The monitor would be enabled to indicate the fastest response time possible at the highest cost a user is willing to pay. Thus, the monitor can provide complete time-vs-cost data while not usurping the user's power to finally decide where to run a job.

6. CONCLUSIONS

This research has shown that it is feasible to develop a response time monitor for use in a network computing system that is capable of providing comparative response time information for users with various computing applications to process. System response behavior was measured and modeled using statistical techniques as well as analytical and simulation techniques. The effect of network traffic on response times was also considered.

Analysis of measurements on individual time-sharing systems revealed that it is, in fact, possible to describe and predict response time for these systems using linear and/or nonlinear regression techniques. The need for more uniform measures of "response time" and system "busyness" was particularly evident in this phase of the investigation. While response time could be satisfactorily defined in a uniform, consistent, easily measurable way, a uniform measure of load level or busyness of a system was more elusive. A more satisfactory solution to the busyness dilemma would have been possible if all systems could have been observed with busyness ranging from no users to system saturation. Although the lower bound was observable on all systems, some of the nodes under investigation did not approach saturation during the measurement phase of the research.
Having decided on a definition of load level that was uniform and consistent across all systems, though perhaps not intuitively pleasing, it was possible to compare the response times of the time-sharing systems as they processed given benchmark jobs. The systems could be ranked from fastest to slowest response time for a relatively long (approximately 45 seconds of processing time) CPU-bound job, for a short (approximately 3 seconds of processing time) CPU-bound job, and for an I/O-bound job.

This comparative capability was expanded from these three specific benchmark jobs to a more general class of jobs through the development of a single analytical and a single simulation model. The models were developed to describe and predict the response time behavior of the time-sharing systems involved in the study and were found to be valid system representations for three of the five systems investigated.

The effects of increased network traffic were also studied, and an expression was found to predict this component of response time if and when it becomes significant, that is, when it adds delay on the order of seconds to the response time of any individual system. Currently on the ARPA network, traffic is light, and delays due to network congestion were not significant in the response time measurements.

The successful results of the various areas of investigation described above led to the postulation of the feasibility of a dynamic response time monitor that users could query to obtain current, on-line comparative response time data for a particular computing application run on one of a set of network time-sharing facilities. The contents and structure of such a monitor were discussed.

6.1. Implications for Future Network Development

User-oriented network research requires a commitment to the investigation and development of tools that go beyond mere reliability goals.
If, indeed, the ultimate aim of a computing network is resource sharing, then the human component as well as the technical components of networking must be fully investigated to achieve this goal. This research, a first step toward assisting the user in participating in the vast store of resources available on a network, suggests that a firm commitment on the part of node managers must be made (or required) to maintain and improve such assistance.

The most pressing commitment on the part of node managers, needed to make the implementation of the dynamic response time monitor discussed in section 5 more effective, is the investigation of and agreement upon uniform response time and load measures. Two of the five systems studied (BBN-TENEX and UCSD-CANDE) already automatically generate a consistent response time measure, as defined in section 2.2.4, when a job is run. This information is easily obtainable using a system clock and could very likely be provided by other network systems with a minimum of effort. An acceptable load measure may be more difficult, but not impossible, to implement on all network systems. The BBN-TENEX "load average" measure, which is the ratio of jobs on the ready queue to jobs on the run queue, has proved to yield the least variation when statistical analysis of response time data is performed. It is a highly dynamic measure and a meaningful one in terms of system loading and the users' conception of system busyness. The "load average" measure is, therefore, a prime candidate for a uniform measure of system load on all network systems.

A second commitment required of node managers is to the development and maintenance of descriptive and predictive response time models for their respective nodes. This research has illustrated that such models are possible to generate and can be effectively used.
But a considerable amount of work is involved in fine tuning these models so that they are accurate for various classes of input jobs (CPU bound, I/O bound, etc.) and for variations within and among these classes. Even given that the initial system models may be developed by an outside group, cooperation in model updating from those persons most intimately involved with the system, at least at times of system configuration modifications, is essential to accurate response time prediction.

6.2. Suggested Further Research

There are, of course, many other areas of investigation not directly related to dynamic response time monitors, but aimed directly at assisting users of computer networks, that need to be explored. Some of these areas are comparative job cost; "bidding" scheduling disciplines; a basic, uniform subset of time-sharing system commands available on any network system; and the "black box" approach to scheduling, in which the user views the network as a single powerful system. If we agree that "people use computers," then we have to agree to serve the needs of the computing community.

Direct extensions of this research require the cooperation of all HOST facilities to gather the data required to make the monitor universal to the entire network. Even the extensive measurements collected for the particular five systems investigated in detail are incomplete, in that they do not conclusively guarantee response time predictive capability for all classes of computing applications. A consistent, uniform response time measure and load level measure must be adopted by all network HOSTs. Facilities should be provided for forcing the systems into saturation so that system behavior can be observed under all loading conditions and so that comparisons of systems can be made more conceptually satisfying.
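The uniform load measure advocated in section 6.1, the TENEX-style "load average" defined as the ratio of jobs on the ready queue to jobs on the run queue, can be sketched as follows. This is a modern Python illustration, not the TENEX implementation; the treatment of an empty run queue is an assumption of this sketch, since the source defines only the ratio itself.

```python
def load_average(ready_queue_len, run_queue_len):
    """TENEX-style 'load average': ratio of jobs on the ready queue
    to jobs on the run queue.

    Assumption (not from the thesis): when the run queue is empty,
    the ready-queue length itself is reported, so an idle system
    reports 0 rather than dividing by zero."""
    if run_queue_len == 0:
        return float(ready_queue_len)
    return ready_queue_len / run_queue_len

# Examples: six ready jobs competing for three running slots give a
# load average of 2.0; an idle system gives 0.0.
busy = load_average(6, 3)
idle = load_average(0, 0)
```

Any system able to sample its scheduler queue lengths could report such a value, which is why the measure is proposed as a uniform one across network HOSTs.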
Fine tuning of the basic models developed in this research must be done for various kinds of computing applications, and the models, as well as the tables and descriptions of system loading characteristics, must be continually updated so that they credibly correspond to users' actual experience with a system. A further extension of this research is the investigation of comparative system costs, so that users can balance their response time desires against their budget constraints. A final suggestion for future research, which may be of particular significance in determining the viability of the whole computer networking concept, is to determine the degree to which users at various sites are motivated to exploit the resources at other network nodes, given that the advantages of such activities are made readily apparent to them.

LIST OF REFERENCES

[ABR74] Abrams, M. D., "A New Approach to Performance Evaluation of Computer Networks," Proc. 1974 Symposium COMPUTER NETWORKS: Trends and Applications, pp. 15-20.

[ANR73a] ARPA Network Resources Notebook, NIC 6740, Network Information Center, Stanford Research Institute, Menlo Park, California.

[ANR73b] ARPA Network Resources Notebook, NIC 18666, Network Information Center, Stanford Research Institute, Menlo Park, California.

[BOB72] Bobrow, D. G., et al., "TENEX, a Paged Time Sharing System for the PDP-10," Comm. ACM, Vol. 15, No. 3, March 1972, pp. 135-143.

[COF68] Coffman, E. G., L. Kleinrock, "Feedback Queueing Models for Time-Shared Systems," Journal of the ACM, Vol. 15, No. 4, October 1968, pp. 549-576.

[COL71] Cole, G. D., "Computer Network Measurements: Techniques and Experiments," UCLA-ENG-7165, University of California, October 1971.

[DEN68] Denning, P. J., "The Working Set Model for Program Behavior," Comm. ACM, Vol. 11, No. 5, May 1968, pp. 323-333.

[DOH70] Doherty, W. J., "Scheduling TSS/360 for Responsiveness," Proc. 1970 Fall Joint Computer Conf., Vol. 37, pp.
97-111.

[FAR72] Farber, D., "Data Ring Oriented Computer Networks," Computer Networks, ed. R. Rustin, Prentice-Hall, 1972, pp. 79-93.

[FUC70] Fuchs, E., P. E. Jackson, "Estimates of Random Variables for Certain Computer Communications Traffic Models," Comm. ACM, Vol. 13, No. 12, December 1970, pp. 752-757.

[GRE74] Greenberg, B. S., personal notes, to be published as Multics Program Logic Manual, Order No. AN73, Multics Multiprogramming and Scheduling.

[HER72] Herzog, B., "MERIT Computer Network," Computer Networks, ed. R. Rustin, Prentice-Hall, 1972, pp. 45-48.

[JAC69] Jackson, P. E., C. D. Stubbs, "A Study of Multi-Access Computer Communications," Proc. 1969 Spring Joint Computer Conference, Vol. 34, pp. 491-504.

[KLE70] Kleinrock, L., "Analytic and Simulation Methods in Computer Network Design," Proc. 1970 Spring Joint Computer Conference, Vol. 36, pp. 569-579.

[KLE72] Kleinrock, L., "Survey of Analytical Methods in Queueing Networks," Computer Networks, ed. R. Rustin, Prentice-Hall, 1972, pp. 185-205.

[KNI66] Knight, K. E., "Changes in Computer Performance," Datamation, Vol. 12, No. 9, September 1966, pp. 40-44.

[KNI68] Knight, K. E., "Evolving Computer Performance 1963-1967," Datamation, Vol. 14, No. 6, January 1968, pp. 31-35.

[MAM74] Mamrak, S., "Performance Evaluation in Computer Networks: A Survey," January 1974, submitted for publication.

[MAR73] Maranzano, J. G., "Proposal for a Definition of Response Time," Computer Measurement and Evaluation, selected papers from the SHARE Project, Vol. II, December 1973, pp. 484-496.

[MCQ73] McQuillan, J. M., "Throughput in the ARPA Network -- Analysis and Measurement," Report No. 2491, Bolt, Beranek and Newman, Inc., January 1973.

[MID68] Middleton, J. A., "Least Squares Estimation of Non-Linear Parameters - NLIN," 360D-13.2.003, International Business Machines Corporation, 1968.

[ORG72] Organick, E.
I., The Multics System: An Examination of Its Structure, MIT Press, Cambridge, Massachusetts, 1972.

[ROB70] Roberts, L. G., B. D. Wessler, "Computer Network Development to Achieve Resource Sharing," Proc. 1970 Spring Joint Computer Conference, Vol. 36, pp. 543-549.

[SAL73] Salz, F., Simulation Analysis of a Network Computer, Master of Science Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, June 1973.

[SCH67] Scherr, A. L., An Analysis of Time-Shared Computer Systems, MIT Press, Cambridge, Massachusetts, 1967.

[SCH71] Schriber, T. J., General Purpose Simulation System/360: Introductory Concepts and Case Studies, Ulrich's Books, Inc., Ann Arbor, Michigan, c. 1971.

[TOT65] Totschek, R. A., "An Empirical Investigation into the Behavior of the SDC Time-Sharing System," System Development Corporation, Report SP-2191, AD622003, Santa Monica, CA, 1965.

[WAR73] Ware, G. O., et al., "A Simulation Study of an Information Dissemination Center Network," The University of Georgia, Technical Report UGA/OCA 73-1.

[WHI72] Whitney, V. K., "Comparison of Network Topology Optimization Algorithms," Proc. First International Conference on Computer Communications, Washington, D.C., October 1972, pp. 332-337.

[WOL68] Wolff, R. W., "Time Sharing with Priorities," Operations Research Center, University of California at Berkeley, ORC 68-13, June 1968.

APPENDIX A

Definitions and Abbreviations

AMES-TSS: Time Sharing System created by IBM and run on an IBM 360/67 at the NASA Ames Research Center, Moffett Field, California. This interactive system is characterized by a table-driven process scheduler, in which the frequency and duration of processor time slices awarded to processes are determined by the processes' paging behavior.

BBN-TENEX: A time-sharing system run on a PDP-10 machine at Bolt, Beranek and Newman, Incorporated, in Cambridge, Massachusetts.
The scheduler is characterized by five priority queues and a "balance set" control module, which regulates running processes so as to minimize the probability of an idle CPU due to too-frequent page faults.

CANDE: See UCSD-CANDE below.

CCN-TSO: Time Sharing Option created by IBM and run on an IBM 360/91 at the Campus Computing Network on the University of California campus in Los Angeles. The scheduler is distinguished by its binding of processes to one of a fixed number of virtual machines, within which no multiprogramming occurs.

FIFO: A scheduling discipline in which processes are served in first-in, first-out order.

MIT-MULTICS: A time-sharing system run on a Honeywell 645 at the Massachusetts Institute of Technology in Cambridge. This scheduler is characterized by its concept of a set of "eligibles," which consists of those processes having the highest dispatching priority that can simultaneously exist in core.

MULTICS: See MIT-MULTICS above.

OS/MVT: An IBM operating system in which a multiprogramming environment exists.

OS/VS2: An IBM operating system characterized by its Virtual Storage memory allocation scheme.

Packet Switching: A method for sending transmissions through a communications network in which messages are broken down into smaller "packets" of information to be transmitted separately and reassembled by the receiver.

RR: A scheduling discipline in which processes are scheduled Round Robin; that is, each receives a specified amount of service and is then returned to the end of the service queue if it has not completed execution in the specified time.

Store-and-Forward Network: A computer network in which messages to be transmitted are stored at each node along the transmission path until they are safely received by the next node in the path.

TENEX: See BBN-TENEX above.
Thrashing: A state occurring in paged memory systems in which too many different working sets occupy main memory and each displaces the others' pages in an attempt to have its own pages present.

TSO: See CCN-TSO above.

TSS: See AMES-TSS above.

UCLA: University of California at Los Angeles.

UCSD-CANDE: A time-sharing system run on a Burroughs 6700 machine at the University of California at San Diego. The scheduler is characterized by two priority queues, with a high-priority queue serving burst-oriented processes and a low-priority queue serving compute-bound processes.

APPENDIX B

Benchmark Jobs

B.1. MIT-MULTICS Number Cruncher

      REAL CRL(100,100), DATA(100,100), SUM(100), SD(100), OBS, TSUM, TSUMS
      INTEGER I, J, K, L, M
      DATA L/100/, M/100/
      DO 10 I=1,M
      DO 10 J=1,L
   10 DATA(I,J)=1./(3*I-3+J)
      CALL CORREL(CRL, DATA, SUM, SD, L, M)
      STOP
      END
      SUBROUTINE CORREL(CRL, DATA, SUM, SD, L, M)
      INTEGER L, M, I, J, K
      REAL CRL(M,M), DATA(L,M), SUM(M), SD(M), OBS, TSUM, TSUMS
      OBS=M
      DO 100 I=1,L
      TSUM=0.
      TSUMS=0.
      DO 20 J=1,M
      TSUM=TSUM+DATA(J,I)
   20 TSUMS=TSUMS+DATA(J,I)**2
      SUM(I)=TSUM
      SD(I)=SQRT(TSUMS-TSUM*TSUM/OBS)
  100 CRL(I,I)=1.
      LM1=L-1
      DO 150 I=1,LM1
      IP1=I+1
      DO 150 J=IP1,L
      TSUM=0.
      DO 125 K=1,M
  125 TSUM=TSUM+DATA(K,I)*DATA(K,J)
      CRL(I,J)=(TSUM-SUM(I)*SUM(J)/OBS)/(SD(I)*SD(J))
  150 CRL(J,I)=CRL(I,J)
      RETURN
      END

B.2.
MIT-MULTICS Bit String Manipulator

CONN100: PROC;
DCL (SYSIN, SYSPRINT) FILE;
DCL (FOUND, GOAL, REALITY, LAST) BIT(10201) ALIGNED;
DCL (I, J, ITERATIONS) FIXED BIN;
DCL SEED FIXED BIN(17);
DCL MULTIPLIER FIXED BIN;
MULTIPLIER=457;
SEED=99;
DO I=1 TO 10201 BY 17;
   IF SEED=0 THEN SEED=MULTIPLIER;
   SEED=MOD(SEED*MULTIPLIER,131072);
   SUBSTR(REALITY,I,17)=BIT(SEED);
END;
SUBSTR(REALITY,1,101)=(100)"0"B;
DO I=102 TO 10201 BY 101;
   SUBSTR(REALITY,I,1)="0"B;
END;
GOAL, FOUND, LAST="0"B;
SUBSTR(GOAL,10102,100)=(100)"1"B;
SUBSTR(FOUND,103,100)=SUBSTR(REALITY,103,100);
ITERATIONS=1;
DO WHILE ((FOUND ^= LAST) & ((FOUND&GOAL)="0"B));
   LAST=FOUND;
   ITERATIONS=ITERATIONS+1;
   SUBSTR(FOUND,102) = SUBSTR(REALITY,102) &
      (FOUND | SUBSTR(FOUND,101) | SUBSTR(FOUND,102) |
       SUBSTR(FOUND,103) | SUBSTR(FOUND,203));
END;
END CONN100;

B.3. MIT-MULTICS I/O Bound

FILFLG: PROC;
DECLARE I FIXED BIN(31);
DECLARE (NUMBERRECS INIT(1000), RECLENGTH INIT(250)) FIXED BIN(15),
   FILEIN FILE RECORD,
   FILEOT FILE RECORD,
   1 RECORD ALIGNED,
      2 WORTHLESSTEXT CHAR(250) INIT((250)"X");
OPEN FILE(FILEOT) TITLE("VFILE <- TSTJKM") OUTPUT;
DO I=1 TO NUMBERRECS;
   WRITE FILE(FILEOT) FROM(RECORD);
END;
CLOSE FILE(FILEOT);
OPEN FILE(FILEIN) TITLE("VFILE <- TSTJKM") INPUT;
DO I=1 TO NUMBERRECS;
   READ FILE(FILEIN) INTO(RECORD);
END;
CLOSE FILE(FILEIN);
END FILFLG;

APPENDIX C

Relevant Statistical Data

Comparison of residual mean squares (RMS) for the individual system data curve fits was one of the criteria used to determine a "best fit" to the response time data. Table C.1 contains the RMS for the quadratic, cubic and exponential curve fits for each of the benchmark jobs run. The other criteria used were possibility of fit (does the regression curve indicate a negative response time for some range of the data?) and probability of fit (does the regression curve indicate a higher response time for a lower load level than for a higher one?).
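The RMS comparison used in Table C.1 can be sketched as follows. This is a modern Python illustration using NumPy, not the thesis's actual procedure: the thesis used the IBM NLIN program [MID68] for its nonlinear fits, whereas the exponential fit here is approximated by a log-linear fit, and the synthetic data and function names are assumptions of this sketch.

```python
import numpy as np

def rms_of_fit(load, resp, kind):
    """Residual mean square of one candidate response-time curve.
    'load' is the load-level measure, 'resp' the observed response
    times.  The exponential fit (resp ~ a * exp(b * load)) is a
    log-linear approximation, not the NLIN nonlinear fit."""
    load, resp = np.asarray(load, float), np.asarray(resp, float)
    if kind == "quadratic":
        pred = np.polyval(np.polyfit(load, resp, 2), load)
    elif kind == "cubic":
        pred = np.polyval(np.polyfit(load, resp, 3), load)
    elif kind == "exponential":
        b, log_a = np.polyfit(load, np.log(resp), 1)
        pred = np.exp(log_a) * np.exp(b * load)
    else:
        raise ValueError(kind)
    resid = resp - pred
    # Divide by residual degrees of freedom (n minus fitted parameters).
    dof = len(resp) - {"quadratic": 3, "cubic": 4, "exponential": 2}[kind]
    return float(resid @ resid / dof)

# Synthetic exponential-looking data: the exponential candidate
# should yield the smallest residual mean square.
x = np.linspace(0.1, 5.0, 40)
y = 2.0 * np.exp(0.6 * x)
best = min(("quadratic", "cubic", "exponential"),
           key=lambda k: rms_of_fit(x, y, k))
```

As in the thesis, the lowest RMS alone does not pick the winner; a candidate would still be rejected if it predicted negative response times or response times that decrease with increasing load.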
The final choices for the best fit curve are listed in Table C.2. The ratio of regression sum of squares to total sum of squares given in the table is a measure of how well the regression curve explains the total variation in the data; a ratio of 1.0 would indicate a perfectly fit curve.

Table C.1. Residual Mean Square (RMS) Statistics

Location  Benchmark      RMS Quadratic  RMS Cubic    RMS Exponential
AMES      No. Cruncher   923.54         848.92       788.29
          Bit Manipul.   145.66         150.19       145.95
          I/O Bound      985.42         1029.69      952.04
BBN       No. Cruncher   1.49(10^5)     1.47(10^5)   1.35(10^5)
CCN       No. Cruncher   1217.02        1882.58      1079.43
          Bit Manipul.   839.89         1591.57      835.72
          I/O Bound      6811.62        6725.53      541.88
MIT       No. Cruncher   7.49(10^4)     7.49(10^4)   6.62(10^4)
          Bit Manipul.   11.72          12.35        12.19
          I/O Bound      25.68          23.19        26.43
UCSD      No. Cruncher   1461.07        1542.04      1301.9

Table C.2. Individual System Best Curve Fit Data

Location  Benchmark      RSS/TSS*  Type of Curve for Best Fit
AMES      No. Cruncher   .44       Exponential
          Bit Manipul.   .59       Quadratic
          I/O Bound      .67       Cubic
BBN       No. Cruncher   .87       Exponential
CCN       No. Cruncher   .59       Exponential
          Bit Manipul.   .64       Exponential
          I/O Bound      .53       Exponential
MIT       No. Cruncher   .37       Exponential
          Bit Manipul.   .44       Exponential
          I/O Bound      .59       Exponential
UCSD      No. Cruncher   .81       Exponential

*Regression Sum of Squares / Total Sum of Squares

VITA

Sandra Ann Mamrak was born in Cleveland, Ohio. She received the B.S. degree from Notre Dame College of Ohio in 1967 and subsequently taught in the Cleveland secondary school system for three years. From 1971 to 1975, Ms. Mamrak was employed as a research assistant by the Department of Computer Science and later the Computing Services Office at the University of Illinois, where she was a participant in groups investigating problems in performance evaluation in single and network computer systems. She received the M.S. degree in 1973 and her Ph.D. degree in 1975 from the University of Illinois at Urbana-Champaign.

BIBLIOGRAPHIC DATA SHEET

1. Report No.
UIUCDCS-R-75-722
3. Recipient's Accession No.
4. Title and Subtitle: Comparative Response Times of Time-Sharing Systems on the ARPA Network
5. Report Date: May 1975
7. Author(s): Sandra Ann Mamrak
8. Performing Organization Report No.: UIUCDCS-R-75-722
9. Performing Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
10. Project/Task/Work Unit No.
11. Contract/Grant No.
12. Sponsoring Organization Name and Address: Computing Services Office, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
13. Type of Report & Period Covered: Ph.D. Dissertation
14.
15. Supplementary Notes: Also sponsored by the Advanced Research Projects Agency under contract DAHC04-72-C-0001
16. Abstracts: If, indeed, the ultimate aim of a computing network is resource sharing, then the human component as well as the technical component of networking must be fully investigated to achieve this goal. This research is a first step toward assisting the user in participating in the vast store of resources available on a network. Analytical, simulation and statistical performance evaluation tools are employed to investigate the feasibility of a dynamic response time monitor that is capable of providing comparative response time information for users wishing to process various computing applications at some network computing node. The research clearly reveals that sufficient system data is currently obtainable, at least for the five diverse ARPA network systems studied in detail, to describe and predict response time for network time-sharing systems as it depends on some measure of system busyness or load level.
17. Key Words and Document Analysis. 17a. Descriptors: Response time monitor, computer networks, time-sharing systems, analytic modeling, simulation, ARPA network
17b. Identifiers/Open-Ended Terms
17c. COSATI Field/Group
18. Availability Statement: Release Unlimited
19.
Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages
22. Price
FORM NTIS-35 (10-70)  USCOMM-DC 40327