Report No. UIUCDCS-R-76-799

THE STATISTICAL ESTIMATION OF THROUGHPUT AND TURNAROUND FUNCTIONS FOR A UNIVERSITY COMPUTER SYSTEM

by

PAUL LEWELLYN CHOUINARD

May 1976

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign.

Acknowledgment

I would like to thank Professor Richard Montanelli for his helpful encouragement throughout the preparation of this thesis. The efforts of Professor Judge, Professor Friedman, and Mr. J. M. Randal in reading early drafts of this thesis are greatly appreciated. A special thanks to the Computing Services staff members who, at one time or another, each played a part in helping complete this research: George Badger, Jerry Beck, Joe Kolman, the entire systems programming staff, operations staff, and engineering staff.

Preface

This thesis was typed on a Diablo printer under control of the DEC-10 system. The text was formatted by the program Runoff. The reader is informed that the typographical restrictions of this configuration did not permit the use of superscript -1 to denote matrix inversion. The symbol \ was used instead, so that the ordinary least squares estimator, for example, appears as (X'X)\X'Y .

Table of Contents

Acknowledgment
Preface
Table of Contents
1. Computer Performance Evaluation
   1.1 Introductory Comments
   1.2 The Need for Computer Performance Measurement
   1.3 The Basic Measures of Performance
   1.4 Quantitative Methods for Performance Measurement
   1.5 Use of Regression Models in Performance Evaluation
   1.6 A Simulation Model for Turnaround
2. Methodology
   2.1 Objectives
   2.2 Estimation of a Throughput Function
   2.3 Estimation of a Limited Turnaround Function
3. The System and The Collection of Data
   3.1 The System
   3.2 Throughput Data
   3.3 Turnaround Data
4. Results
   4.1 Throughput
   4.2 Turnaround
5. Discussion
   5.1 Throughput
   5.2 Turnaround
   5.3 Concluding Remarks
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Appendix G
Appendix H
Appendix I
Appendix J
Appendix K
Appendix L
Vita

1. Computer Performance Evaluation

1.1 Introductory Comments

Computer performance evaluation, a term currently in fashion in the computer science literature, has been broadly applied to include computer performance measurement and computer systems modeling. An overview of the area can be obtained by reading Calingaert (1967), Drummond (1969), Drummond (1973), Lucas (1971), Goodman (1972), Kimbleton (1972), Williams (1972), Grenander and Tsao (1972), and Boehm and Bell (1975).
Bibliographies have been compiled by Anderson and Sargent (1972) and by Miller (1973).

The difference between computer performance evaluation and computer performance measurement is demonstrated by the fact that measurement necessarily precedes evaluation. Measurements may be from an actual system or from a model of a system. In either event, values are then placed on the measurements by the use of a management defined objective function. The objective function, whether explicitly or implicitly specified, is the manifestation of management policy. It embodies what it is that management wants the computer system to do. Clearly, different objective functions, when applied to a given system, will result in different evaluations. The importance of the difference between measurement and evaluation is stressed by Boehm and Bell (1975).

Objective functions may be quite complicated. Typically, a manager wants to get the largest possible amount of his work done as efficiently as possible, where efficiency may include such notions as accuracy of results, low cost, and minimum time. Additionally, managers are interested in a system which to them seems flexible. Flexibility is desired so that the manager can easily alter the system, on either a permanent or temporary basis, for the purpose of satisfying some demand. Flexibility is one variable in the objective function which depends to a high degree upon the personal preferences of the manager. Political considerations, inherently difficult to quantify, may also play a large part in a system evaluation. For these reasons, the specification of objective functions is not treated further.

Measurement, on the other hand, is an exercise in technique and has the advantage of immediately conveying specific information about the object being measured. Sets of measurements can be collected, and the process of trying to establish relationships between measurements is the motivation for model building. The confusion between evaluation and measurement probably results from a confusion between the roles of policy maker and technician. The goal of performance evaluation is to maximize the management defined objective function, while the goal of performance measurement is to provide measures for certain of the variables in the objective function.

1.2 The Need for Computer Performance Measurement

The need for performance measurement can arise out of different situations. Lucas suggested three general reasons for undertaking performance evaluation at all, namely:

1. selection evaluation purpose
2. performance projection purpose
3. monitoring purpose.

Grenander and Tsao essentially reiterated these three reasons in their discussion of the three levels of the performance evaluation problem. According to Grenander and Tsao, the three levels are:

1. the ability to compare two systems
2. the ability to predict the performance of alternate design decisions
3. the ability to tune a system.

This thesis is concerned with the monitoring purpose applied to operating systems, although certain of the techniques used can be advantageously applied to other performance measurement situations. Performance measurement aids the computer center manager in monitoring his operating system by providing him with the knowledge necessary for informed decision making. As operating system changes are made, a new model can be estimated and performance changes can be measured by a comparison of the new and old models.
The first model establishes a base line against which the effects of subsequent changes are measured.

1.3 The Basic Measures of Performance

Calingaert suggested that the basic measures of a computer system are throughput, turnaround, and availability. He emphasized the fact that overhead is not a basic measure of performance. In most "general purpose" multiprogrammed systems, it is well known that an increase in overhead may be beneficial in that it may also increase throughput. Throughput is the amount of work done per unit time under steady state conditions. Turnaround is the amount of time necessary to do a unit of work. Availability includes the notion of reliability and, according to Calingaert, is a measure of "the likelihood that a system is operating properly at a given moment." While availability as a performance measure is not considered in this thesis, both throughput and turnaround for an operating system are considered, with an emphasis on how these two measures are related.

1.4 Quantitative Methods for Performance Measurement

Grenander and Tsao suggested that the quantitative methods used for performance measurement and modeling fall into three categories, namely:

1. analytical methods
2. simulation methods
3. empirical methods.

Analytical methods involve the use of mathematical models, usually of the queuing theory type. These models are stochastic in the sense that inputs to the model are a priori assumed to arrive according to some probability distribution, and that these inputs are then serviced according to a servicing probability distribution, also a priori assumed. Under the assumptions which may be made for a particular model, the outputs of the model are worked out mathematically. In a survey of analytical models for timesharing systems, McKinney (1969) suggested that analytical models for timesharing systems may be categorized by the following seven criteria:

1. number of input channels
2. number of central processors
3. type of arrival process
4. type of service process
5. assumptions about swapping and overhead time
6. quantum, i.e. time slicing, assumptions
7. service discipline.

Grenander and Tsao, while they believed that analytical models are useful, were of the opinion that the broad assumptions (abstractions) usually made to render these models tractable "are questionable for real systems" and that "the reliance on analytic methods will not be sufficient for the evaluation of computing systems." (emphasis theirs)[1] While analytical methods have the advantage of mathematical rigor, the price paid for this formalism is a corresponding loss in realism.

[1] Grenander and Tsao (1972), pp. 8-11.

The difference between simulation methods and empirical methods is that under simulation, data is generated by a program which attempts to simulate the behavior of the system being studied, whereas empirical methods use data gathered from the running of a real system. The advantage of simulation over empirical methods is the ability to generate data which could not otherwise be gathered or which would be too expensive to gather from a real system. Also, it is easier to make major adjustments in a simulation program, in an attempt to model changes in a real system, than it would be to make those changes in the real system. Again, the cost of these advantages for simulation over empirical methods is a loss of realism.
When the real system is available and one is interested in modeling that system as it exists, then empirical methods are appropriate for the performance measurement.

In any performance evaluation or measurement, it is necessary to consider the workload to be processed, since it is only when the system is processing work that the concepts of throughput and turnaround have any meaning. As pointed out by Calingaert, the evaluation of computer systems performance "as an independent entity does not exist". Regardless of the type of quantitative method used, the more realistic the input workload, the more realistic the performance measurement. In the case of empirical methods, one of the classical objections to the use of kernel programs, benchmarks, or synthetic jobstreams is that they are, each in their turn, atypical of what the system will actually be required to process. This objection is particularly relevant when one is interested in characterizing system behavior on a production jobstream. In such an instance, the system should be measured under production conditions, if at all possible. The exception to this argument is when one is interested in trying to measure a narrowly defined hardware or software configuration, and the unpredictability of a production jobstream would seriously interfere with the high degree of data precision required. Measurements under such laboratory conditions require dedicated machine time, are therefore quite expensive, and although carried out to a high degree of precision may still suffer from a lack of realism.

Whether simulation or empirical techniques are used, typically some use of statistics is also required. Schatzoff and Tillman (1974), for example, used statistics to validate a trace-driven simulator. Empirically collected data usually are at least summarized statistically, say by reporting means and standard deviations. If the data were collected in an experimental situation, then the data will have been collected for the purpose of further statistical analysis, such as ordinary least squares. Grenander and Tsao concluded that it was the use of modern statistical methodology on empirically collected data, techniques largely ignored by computer scientists, which showed the greatest promise among the various quantitative methodologies for helping to solve computer performance problems.

1.5 Use of Regression Models in Performance Evaluation

The use of statistical linear model techniques such as analysis of variance and multiple regression, while not totally absent from the literature, has been the exception rather than the rule. An exception worth noting was a series of papers which described research done using regression techniques on the CP-67 system at the IBM Cambridge Scientific Center. Two papers, one by Bard (1971) and the other by Bard and Suryanarayana (1972), discussed the structure of overhead for the CP-67 system. The CP-67 system was run on an IBM 360/67 in a timesharing environment. Each user interacted with a stand-alone System 360 virtual machine which was set up by the CP-67 system. The overhead model which Bard estimated in the first paper had time spent in the CP-67 system per unit clock time as the dependent variable. He referred to this variable as CP-67 overhead. The independent variables were an intercept term, virtual selector I/O operations, virtual multiplexor I/O operations, page I/O operations, spool I/O operations, and problem state time. Data for this study was collected under varying workload conditions, i.e.
presumably a production situation, at nominally five-minute intervals. Data samples were collected at three different times for about two months each. A stepwise regression program was used to choose the final set of independent variables from a larger set of hypothesized independent variables and to estimate the coefficients of the resulting model. The resulting estimates had an explained variance of between 85% and 90% after outliers, observations with residuals greater than 2.5 standard deviations, were thrown out. For example, the number of outliers thrown out when the model was estimated using the first of the three samples of data was 2.5% of the observations.

The Bard and Suryanarayana paper was a follow-up of the first Bard paper. In this paper, they observed that even though the original model explained around 85% to 90% of the variance, they felt that the model could be improved upon by the inclusion of additional variables which were felt to contribute to system overhead. An additional nine variables were added to the five of the first model. Again a stepwise regression procedure was used to pare this number from a total of fourteen independent variables down to ten. This new model did give a better fit of the data in an explained variance sense, an unsurprising result. A third paper by Schatzoff and Bryant (1973), titled "Regression Methods in Performance Evaluation: Some Comments on the State of the Art," summarized the results of Bard (1971) and Bard and Suryanarayana (1972). Unfortunately, in all three papers the authors seemed to be unaware of the fact that the stepwise multiple regression estimator is a type of preliminary test estimator, and consequently they were unaware of its implied sampling properties.

An example of the use of a linear regression model in a controlled, non-production environment is described in IBM document G320-1373 on Project SAFE (1974). The goal of this research was to measure the degradation caused by a specific software modification, called the Resource Security System, to a standard IBM OS/MVT operating system. A job stream of 63 jobs, specifically programmed for the study, was run on a dedicated system. The model parameters, estimated using ordinary least squares, could be used to predict the impact of RSS on IBM 370 systems running OS/MVT release 21.0. This can be an effective technique when one has large amounts of dedicated machine time available for investigating a narrowly defined problem.

1.6 A Simulation Model for Turnaround

Mamrak (1973), using a simulation model of essentially the same system as the one considered in this thesis, attempted to model the effect of a change in the system's scheduling algorithm. Mamrak's goal was "a priority algorithm ... that allows jobs to vie for queue position depending on how short a turnaround time the user desires for his job."[2] This priority scheduling algorithm was in contrast to the first-in first-out, or FIFO, algorithm used before her study. One of Mamrak's objectives was the prediction of a job's turnaround. The predictor attempted was that of average turnaround by job priority. The primary difficulty encountered was an inability to predict arrival rates of jobs with higher priority after a job had been submitted, thereby severely complicating turnaround prediction for the lower priority job.
Ignoring this problem and assuming a FIFO discipline with discrete job classes rather than continuous priority assignments, the question arises as to whether or not average turnaround by job class is a reasonable predictor of a job's expected turnaround. The difficulty with average turnaround in a FIFO scheduling algorithm is that it is still influenced by job arrival rate. In particular, if the rate at which users are submitting jobs is greater than the rate at which jobs are being processed, then turnaround is increasing, and an estimate of a job's turnaround based upon the average turnaround of like jobs in an immediately preceding time period will underestimate what one would expect actual turnaround to be. Average turnaround ignores the effect of a changing queue length.

[2] Mamrak (1973), p. 1.

An after-the-fact discussion of the difficulties encountered in implementing the simulated priority scheduling algorithm, and the reasons for its subsequent abandonment, is given in Mamrak and Randal (1974). Under the heading "Insufficient Backup Data", the following observation is made.

"Also overlooked from a technical point of view was the need to establish more exactly how the system behaved before major system modifications were made. Once dissatisfaction began to increase under the new system, allegations were freely made by disgruntled users claiming system malfunctions ranging from the operating system caught in a tight loop within itself, to increases of turnaround by a factor of seven, to discrimination in scheduling against users requesting low-speed core. No concrete evidence could be quickly produced to either prove or disprove these allegations."

A model for turnaround and throughput, of the type to which we now turn, would have mitigated precisely this situation.

2. Methodology

2.1 Objectives

The purpose of this thesis is to estimate throughput and limited turnaround functions for a general purpose, multiprogrammed, electronic digital computer system. In doing so, exact and stochastic restricted least squares and the James-Stein positive part estimator (Judge, Bock, and Yancy, 1974) are used, statistical techniques not previously applied to the problem of computer systems modeling. It is felt that these techniques are appropriate for a large class of computer measurement problems and therefore are of general interest. The data used in these estimations were collected while the system was running production work, thereby avoiding the many problems inherent in data collected under artificial testing conditions. For the throughput function, sample observations consisted of accumulated activity in nominally fifteen minute intervals. For the turnaround function, sample observations consisted of the amount of activity of individual jobs and the level of throughput during the fifteen minute time period in which the job executed. The functions estimated provide a characterization, not otherwise attainable, for the particular system studied. Such a characterization facilitates decision making in a performance evaluation context in the sense that it provides estimates of critical system parameters, thereby reducing uncertainty about the nature of the system.

The type of computer system for which we want to estimate these functions has a workload characterized by four components, namely:

1. operating system functions
2. interactive timeshared computing
3. small memory, short execution time, batch jobs
4. medium to huge batch jobs.

All general purpose multiprogrammed systems have an operating system component and at least one of the three remaining workload components. Systems which have only one or two of the last three components are subclass members of the set of systems having all of the last three components. It is assumed that these components execute at different priorities and that the medium to huge jobs execute at the lowest priority. Within this context, it is assumed that during a fifteen minute time period all operating system, interactive computing, and small job workload requirements are serviced. For these components, throughput per fifteen minutes is assumed to be the same as the amount of work submitted, that is, all such work. The same is not true for the medium to huge batch jobs. Since the larger jobs execute only when all pending higher priority work is completed, these larger, lower running priority, batch jobs have a throughput which is dependent upon the amount of work performed at the higher priority levels. It is the throughput and turnaround functions for these larger batch jobs which are estimated in this thesis.

2.2 Estimation of a Throughput Function

The throughput function is an expression of the amount of CPU time that the collection of large batch jobs spend in execution per fifteen minutes. Clearly, time spent performing interactive and small batch work will have a negative effect on the amount of time available for large batch jobs. Likewise, peripheral activity and the number of jobs in the system might reasonably be considered to have a negative effect on throughput. It shall be assumed that all such effects are linear in their influence. The multiprogramming level for the large batch jobs might also be expected to have an effect on throughput, but one would not expect the effect to be necessarily linear. The problem posed by this potentially non-linear effect can be overcome by using dummy variables for the various multiprogramming levels over which data are collected.

The throughput function, Y , in general is given by

(1)  Y = XB + u ,

where Y is an Nx1 vector of observed throughput values, N the number of observations; X is an NxK full rank matrix of K independent variables, K < N ; B is a Kx1 vector of unobservable population parameters to be estimated; and u is an Nx1 vector of identically and independently distributed normal random variables with expected value and variance-covariance matrix given by

(2a)  E(u) = 0

(2b)  E(uu') = qI ,

where q is the variance of each of the u variables, and I is an identity matrix of order N . Under these assumptions, ordinary least squares estimates for the unknown vector B ,

(3)  b = (X'X)\X'Y = CX'Y ,

are linear, unbiased, maximum likelihood, and have minimum variance within the class of all linear unbiased estimators.[1] Furthermore, ordinary least squares estimates are minimax, that is, within the class of linear estimators they minimize the maximum Mean Square Error.[2] An unbiased estimate, s , of the variance q , is

(4)  s = (Y-Xb)'(Y-Xb)/(N-K) = e'e/(N-K) .

The variance-covariance matrix for b is given by (5a) below.

[1] Typographical restrictions being what they are, the symbol \ has been used to denote the inverse of a matrix, replacing the traditional superscript -1 .

[2] The Mean Square Error of an estimate b of an unknown vector B is defined to be MSE = E(b-B)(b-B)' , where E is the expected value operator.
By adding and subtracting the expected value of the estimate to each term, the Mean Square Error may be rewritten as MSE = E(b-E(b)-B+E(b))(b-E(b)-B+E(b))' . Expansion of this yields MSE = E(b-E(b))(b-E(b))' - E(b-E(b))(B-E(b))' - E(B-E(b))(b-E(b))' + E(B-E(b))(B-E(b))' . Notice that the first term is the variance of b and that the two inner terms are zero. The last term, the bias squared of the estimate, is positive (semi-)definite, so that the maximum MSE is minimized when this last term is zero. By a linear estimator of B is meant an estimator of the form AY , where A is some matrix, so that the last term may be written as E(B-E(AY))(B-E(AY))' = E(B-AXB)(B-AXB)' , which is zero when AX = I . The ordinary least squares estimate satisfies this criterion, for indeed AX = (X'X)\X'X = I .

(5a)  qC = q(X'X)\ ,

and an unbiased estimate is given by

(5b)  sC = s(X'X)\ .

Within this context, we want to hypothesize that certain elements of B are equal to zero in order to reduce the number of independent variables necessary to predict throughput adequately, and thereby achieve a more parsimonious model than the initial full variable model. This hypothesis is equivalent to making an exact linear restriction of the form

(6)  r = RB ,

where R is a JxK restriction matrix of rank J , J the number of restrictions to be imposed, and r is a Jx1 vector, which in this case is the zero vector. The exact restricted least squares estimator, b* , is given by

(7a)  b* = b - CR'(RCR')\(Rb-r) ,

and an unbiased estimate of the variance is

(7b)  s* = (Y-Xb*)'(Y-Xb*)/(N-K+J) = e*'e*/(N-K+J) .

In the case when the R matrix makes no linear restrictions other than restricting specific B elements to zero, the b* solution given is computationally equivalent to performing an ordinary least squares solution on the remaining unrestricted independent variables and setting the restricted coefficients, as hypothesized, to zero.[3] Under the assumption that the hypothesized prior information is correct, b* is unbiased. Similarly, the variance-covariance matrix for b* ,

(8)  E(b*-B)(b*-B)' = qC* = q(C - CR'(RCR')\RC) ,

has zero variances and covariances for all restricted coefficients, and the variances and covariances for unrestricted coefficients are identical to the corresponding variances and covariances which result from an ordinary least squares solution on the unrestricted variables. Note that qC* is necessarily singular. Each row of R restricts one of the B elements to have the value zero.

[3] Goldberger (1964), pp. 257-258. The exact restricted least squares model is extensively discussed in Goldberger (1964), pp. 255-265 and in Theil (1971), pp. 42-46, 137-145, and 282-289.

There is a specification problem, however, in that there are no a priori grounds for hypothesizing which parameters should be restricted in this manner. For this reason, the regression strategy taken was to divide the sample in half, using one set of data to build a reasonable model, and then using the second set of data to verify the model. In particular, results from the model estimation on the first set of data are used as stochastic restrictions on the second set of data. In deciding which coefficients are to be exactly restricted to zero, one can use the fact that the ratio of each of the individual elements of b to the square root of its corresponding estimated variance has the Student's t distribution with N-K degrees of freedom. A small numerical sketch of these estimators and the associated ratios is given below.
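The estimators in (3) through (8) and the associated t-ratios are simple matrix computations. The following sketch is given in Python purely for illustration; the thesis's own computations were carried out with locally written programs, and the language, function names, and toy data below are not from the thesis. It covers the case in which each row of R merely restricts one element of B to zero.

import numpy as np

def ols_and_exact_restricted(X, Y, zero_idx=()):
    # Ordinary least squares, (3)-(5b), and exact restricted least squares, (7a),
    # for the special case in which each row of R sets one coefficient to zero.
    # Illustrative sketch only.
    N, K = X.shape
    C = np.linalg.inv(X.T @ X)                 # C = (X'X)\
    b = C @ X.T @ Y                            # (3)  b = (X'X)\X'Y
    e = Y - X @ b
    s = (e @ e) / (N - K)                      # (4)  unbiased estimate of q
    cov_b = s * C                              # (5b) sC = s(X'X)\
    t_ratios = b / np.sqrt(np.diag(cov_b))     # each element of b over its standard error

    J = len(zero_idx)
    if J == 0:
        return b, s, t_ratios, b.copy()
    R = np.zeros((J, K))
    R[np.arange(J), list(zero_idx)] = 1.0      # each row restricts one element of B to zero
    r = np.zeros(J)
    b_star = b - C @ R.T @ np.linalg.solve(R @ C @ R.T, R @ b - r)   # (7a)
    return b, s, t_ratios, b_star

# toy usage with simulated data: two of the four coefficients hypothesized to be zero
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 3))])
Y = X @ np.array([2.0, 1.0, 0.0, 0.0]) + rng.normal(size=40)
b, s, t, b_star = ols_and_exact_restricted(X, Y, zero_idx=(2, 3))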
This ratio can be used to test the hypothesis that the true population parameter, within a certain degree of confidence, is zero.[4] The results of these tests are used as an informal guide in formulating a set of exact restrictions.

[4] Goldberger (1964), pp. 172-173.

Typically, after an exact restricted model is estimated, a test of hypothesis is made as to whether the prior information of the exact restrictions is compatible with the sample information. The test statistic used is

(9)  F = ((e*'e* - e'e)(N-K)) / (J(e'e)) .

This statistic, under the hypothesis that the prior information is true, would have an F distribution with J and N-K degrees of freedom if it were not for the fact that the choice of hypotheses was based on the same sample of data. In this case, the actual distribution of (9) is not known. It should be noted that when the distribution of (9) is known and (9) is used as the critical value in an F test for determining the acceptance or rejection of the prior information, the preliminary test estimator results. Discussion of the preliminary test estimator can be found in Bock, Yancy, and Judge (1973).

The model constructed from the first set of data is determined, in part, by preliminary tests of significance and, in part, by a subjective evaluation of those tests. It is acknowledged that this procedure, similar in vein to the use of stepwise multiple regression, contains the inherent pitfall that the sampling properties of multiple preliminary tests of significance are not known. The practical result of this pitfall is that one can easily fit the model too closely to the particular data sample. It is imperative in such situations that one provides an independent test of hypotheses using a second set of data. This is the justification for the use of stochastic restricted least squares in a hypothesis testing framework on a second sample of data.

The stochastic restricted least squares procedure, for its part, does not know from whence the hypothesized prior information springs. Traditionally, stochastic prior information is, as Theil remarks, "either from previous samples or from introspection."[5] In this case, the stochastic prior information is from both. The stochastic restricted model is (1) and (2) with the restriction

(10)  r = RB + v ,

where v is a vector of random variables with mean equal to the zero vector and with variance-covariance matrix V . In this use of the stochastic restricted model, the R matrix is a KxK identity matrix, and the r vector is the vector of b estimates using the exact restricted model on the first half of the data. V is the variance-covariance matrix of the exact restricted b estimates; however, since this matrix is unknown, an estimate is used. The use of stochastic restrictions was described in Theil and Goldberger (1961), Theil (1963), and more recently by Judge, Yancy, and Bock (1973).

[5] Theil (1963), p. 402.

To solve the stochastic restricted model, we follow the general development given in Judge et al. (1973) by reformulating the stochastic restricted least squares problem as a seemingly unrelated regression model with exact restrictions. This form states the model as

(11)   | Y |     | X  0 |  | B |     | u |
       |   |  =  |      |  |   |  +  |   |
       | r |     | 0  I |  | P |     | v |

with the exact restriction

(12)   | I  -I |  | B |  =  B - P  =  0 ,
                  | P |

where I is a KxK identity matrix, and P is a K-vector of parameters which are restricted to equal the B parameters. The vector [u' v']' has mean zero and variance-covariance matrix

(13)     | I   0  |
       q |        | ,
         | 0   mV |

where the identity matrix in (13) is NxN and V is KxK .
The so-called nuisance parameter m , the ratio of the variance of the second sample to the variance of the first sample, is assumed to be one. The next step is to reparameterize this model so that it is in the form of an exact restricted least squares model. This reparameterization requires that (13) is non-singular. If C* from (8) is used for V , however, this condition does not hold. The approach adopted here uses only the non-zero rows and columns of C* and reintroduces the exact parameter restrictions imposed in the first sample.

First the zero restrictions are introduced by a partition of the elements of B into two vectors B1 and B2 , with B1 corresponding to the parameters which were not restricted in the first sample and B2 corresponding to the parameters which were restricted to zero in the first sample. Partition the independent variables of the second sample of data in a like manner. Now stochastically restrict B1 , and exactly restrict B2 to equal zero. This version of the model is

(14)   | Y |     | X1  X2  0 |  | B1 |     | u |
       |   |  =  |           |  | B2 |  +  |   |
       | r |     |  0   0  I |  | P  |     | v |

subject to the restriction that

(15a)  | I  0  -I |  | B1 |
       |          |  | B2 |  =  0 ,
       | 0  I   0 |  | P  |

or equivalently,

(15b)  B1 = P ,  B2 = 0 .

The identity matrix in (14) is order K-J , as are the identity matrices in the first row of (15a). The identity matrix in the second row of (15a) is order J . In this instance, the stochastic prior information variance-covariance matrix is the same as (13) except that the all-zero rows and columns have been deleted. This model is reparameterized as

(16)   | Y  |     | X1  X2  0 |  | B1 |     | u  |
       |    |  =  |           |  | B2 |  +  |    |
       | Gr |     |  0   0  G |  | P  |     | Gv |

subject to the restriction

(17)   | G  0  -G |  | B1 |
       |          |  | B2 |  =  0 ,
       | 0  I   0 |  | P  |

where G'G = C*\ . The vector [u' (Gv)']' is multivariate normal with mean zero and variance-covariance qI , where I is an order N+K-J identity matrix. For notational simplicity, the model (16) and (17) may be expressed as

(18)  Y = ZD + w

subject to the constraint

(19)  HD = 0 ,

where D' = [B1' B2' P'] and w' = [u' (Gv)'] . If d is the unrestricted estimator for D , then the solution to (18) and (19) is given by

(20)  d~ = d - C~H'(HC~H')\Hd ,

where C~ = (Z'Z)\ . A test statistic for testing the compatibility of the hypotheses (15a) with the unrestricted estimation of (14) is

(21)  F~ = ((e~'e~ - e'e)(N-K)) / ((K-J)(e'e)) .

If the hypotheses are correct, it is conjectured that F~ has an F distribution with K-J and N-K degrees of freedom, K-J being the number of stochastic restrictions. Although Judge suggests that the distribution of F~ is not known for the mixed stochastic and exact restricted model adopted here[6], this test statistic is similar to the test statistic for the purely stochastic preliminary test estimator model described in Judge et al. (1973).

[6] George Judge, personal communication.

Since this estimator is in the form of a linear model with K > 2 exact linear restrictions, it is dominated in a Mean Square Error sense by the positive part James-Stein estimator as described in Judge, Bock, and Yancy (1974). This estimator has the form

(22)  d+ = d~                         if F~ <= a ,
      d+ = (1 - a/F~)(d - d~) + d~    if F~ > a ,

for

(23)  0 < aK/(N-K) <= 2(K-2)/(N-K+2) .

Typically, the maximum value is chosen for a , so that

(24)  a = 2(K-2)(N-K) / ((N-K+2)K) .

2.3 Estimation of a Limited Turnaround Function

Whereas throughput is the amount of work done per unit time, turnaround is the amount of time required per unit of work. These two performance measures are inversely related representations of the same phenomenon taken from two different points of view.
The throughput measure is taken from the point of view of the system, and the turnaround measure is taken from the point of view of the collection of individual jobs. The total turnaround time for a batch job is the elapsed wall clock time from when the job is read onto the system until it completes all functions and leaves, i.e. is purged from, the system. Total turnaround time can be broken into three parts:

1. execution queue time
2. execution residency time
3. peripheral time.

Execution queue time for a job is the elapsed wall clock time between when the job is read into the system and the moment it begins execution. It is assumed that once a job is in execution, it is not returned to the execution queue. Execution residency time is the elapsed wall clock time that a job is in execution. Peripheral time, the elapsed wall time between the end of execution and the moment the job is purged, is a collection of queue and device residency times for plotting, punching, and printing devices. Not all jobs require servicing by each type of peripheral device, and it is assumed that at the moment that each job is read into the system it is known for which devices that job does not need to be scheduled.

Total turnaround time for jobs was not considered here, but rather the view of turnaround was limited in two ways. First, peripheral time was not considered. It was relatively easy to keep the system parameters concerned with the execution of jobs, such as the batch multiprogramming level and the classes of jobs being allowed to execute, in stable states while the data were being collected. By contrast, the diversity of peripheral devices, by both type and physical location, made it impossible to maintain a similarly stable configuration of the peripheral world. Given this fact, it was not possible in any practical sense to collect meaningful data necessary for an appropriate estimation of peripheral throughput or turnaround. Even if one wanted to pursue the estimation of the peripheral component to turnaround, the methodology would merely be a repetition of the one used to estimate turnaround for the execution queue time and the execution residency time. That portion of turnaround with which we are concerned will be called execution turnaround time.

Unfortunately, to the extent that the turnaround of a job is dependent upon throughput, turnaround is dependent upon the level of throughput during all time periods between when the job is submitted and when it ends execution. This requires the ability to predict levels of throughput in future time periods, a difficult task by any measure. One technique for making such predictions is the use of distributed lag models as described in Almon (1965) and Dhrymes (1971). A second approach is the use of autoregressive integrated moving averages, or ARIMA, models as described in Box and Jenkins (1970). Both techniques are stochastic process modeling techniques, are beyond the scope of this thesis, and are the second reason for the limited notion of turnaround adopted here.

In any computer system, turnaround is greatly affected by the type of scheduling algorithm used. The type of scheduling algorithm considered here was one in which each batch job was assigned a job class on the basis of user estimated computer resources which the job required. Within each job class, scheduling for execution was on a first-in first-out basis.
The execution turnaround time of a job was equal to its own execution residency time plus the execution residency time for all jobs ahead of it in its own job class queue. It was noted for systems in which CPU utilization approached 100% that if no one job class tended to be more CPU bound than other job classes, then for some multiprogramming level, say five, the jobs running in each multiprogramming partition would get, on average, one-fifth of the available rate of throughput. For example, if the multiprogramming level were five and the batch throughput rate were 60% of wall clock time, then a randomly chosen job which required 36 seconds of CPU time would be expected, on average, to have an execution residency time of five minutes. This expected value relationship is expressed as

(25a)  execution residency = (multiprogramming level / throughput rate) (CPU time) ,

or equivalently

(25b)  execution residency = B L (CPU time) / (throughput rate) ,

where L is the multiprogramming level and B is a multiplicative constant that is one for job classes which get their "fair share" of the CPU. Job classes which get more than their fair share would have shorter residency times and B would be less than one, whereas for job classes which were dominated, B would be greater than one.

In formulating this job execution residency model, it was implicitly assumed that the average long term effects of job to job variations in any other variables were nil. The argument is that a job which must wait for core because it is large, for example, will later hold up other jobs once it gets core, thereby balancing out the effect of previously having to wait. This working assumption may or may not have been valid, and warranted testing. The approach taken was first to estimate an ordinary least squares model of the form (1), (2a), and (2b) with job execution residency as the dependent variable. The multiprogramming level times job CPU time divided by throughput rate was an independent variable, with additional independent variables measuring the levels of I/O activity, the number of steps in the job, and the amount of core memory used by the job. Next, hypotheses of the form given by (6) were introduced that the coefficients of all independent variables were zero except for the coefficient of the CPU variable weighted by the ratio of the multiprogramming level to the throughput rate. The hypotheses were tested using (9) as a test statistic.

This model was estimated using data from all jobs which executed entirely within the sample periods over which data was collected, i.e. jobs which partially executed outside all observed periods were not used. Of the jobs which were used, some executed entirely within a single sample period, and some executed during more than one period. For jobs which executed during more than one period, a pro rata weighting factor for job CPU was constructed. For example, the weighting factor, w , for some particular job which may have been resident for 40% of its total residency during observation i , and 60% of its total residency during observation i+1 , was calculated as w = .4 w(i) + .6 w(i+1) , where w(i) was the weighting factor for jobs which executed during the i-th period exclusively. This residency model was estimated for each job class for which data was available. Use of the restricted least squares hypothesis testing procedure again resulted in a preliminary test estimator, and again a James-Stein positive part estimator improvement was appropriate.
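The positive part adjustment of (22) through (24) is easy to state in code. The sketch below is again in Python purely for illustration; the names and structure are not the thesis's, and it assumes the unrestricted estimate d, the restricted estimate d~ of (20), and the statistic F~ of (21) have already been computed.

import numpy as np

def positive_part_james_stein(d, d_tilde, F, N, K):
    # Positive part James-Stein combination of the restricted and unrestricted
    # estimates, following (22)-(24).  Illustrative sketch only.
    a = 2.0 * (K - 2) * (N - K) / ((N - K + 2) * K)    # (24) the maximum permissible a
    d = np.asarray(d, dtype=float)
    d_tilde = np.asarray(d_tilde, dtype=float)
    if F <= a:
        return d_tilde                                  # (22), first case
    return (1.0 - a / F) * (d - d_tilde) + d_tilde      # (22), second case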
For the purposes of informal comparison, some alternate expressions for the basic execution residency equation were also estimated from the same data.

These estimations of execution residency give expressions for the expected residency of a job given its class and the amount of computer resources it uses during execution. Whereas a job's class is determined at the time it is read into the system, its use of CPU time and other resources is not known until it actually executes. This clearly presents a problem if one is interested in predicting the execution residency time of jobs which have not yet executed. An estimate for the system resources which a job will use is the sample mean of the use of that resource by jobs in its class. An estimate of a job's expected execution residency time, therefore, would be the predicted execution residency of an "average" job in its class as determined by substituting mean resource usage into the James-Stein estimated residency equation for its class. The expected execution turnaround of a particular job would be the sum of its expected execution residency time and the expected execution residency times of all jobs ahead of it in its own job class divided by the number of multiprogramming levels servicing that class. This estimate for turnaround takes into explicit account both the number of jobs in each class queue and the multiprogramming level servicing each class. Changes in these turnaround parameters are ignored when average turnaround is used as an estimate of a job's expected turnaround. The problem of predicting turnaround in non-FIFO scheduling algorithms remains, but may be attacked with the use of, again, stochastic process models.

3. The System and The Collection of Data

3.1 The System

The particular machine considered was the IBM 360/75 system which resided in the Digital Computer Laboratory, University of Illinois at Urbana-Champaign. Operation of the machine was the responsibility of the Computing Services Office. The system included one million bytes of IBM fast core memory and two million of Ampex slow core memory; four selector channels, one of which was dedicated to a high speed drum, with the remaining three selector channels being connected to 2314-type Ampex disk drives; and one multiplexor channel for magnetic tape drives, local unit record peripheral equipment, and a remote job entry communications controller. Also attached to the multiplexor channel were two minicomputers dedicated to front end processing of timesharing users. This system ran under the multiprogramming system OS/MVT and Hasp. The timesharing system was known as Plorts, and the batch small job monitor was known as Express.

Jobs were read into the system under the control of Hasp Input Service. Jobs with ID card errors were immediately flushed. Hasp determined the appropriate job class for the remaining jobs and spooled the input for later use by the system. During the time between when a job was read in and when it was chosen for execution, a job was said to be in the Hasp Execution Queue. Each job in the Hasp Execution Queue was identified as belonging to one of the classes A, B, C, D, E, F, G, or X. The system was capable of holding a maximum of six hundred jobs. The class X queue was for Express. A job was scheduled as a class X job if SYSTEM=EXPRESS was specified by the user as an ID card parameter. Normally Express jobs went into execution immediately after being read in.
There was always one Express initiator available which was dedicated to running class X jobs only. If, in the opinion of the operations staff, the backlog of Express jobs had become large, additional class X initiators would be started.

One of the job classes A through G was assigned to a non-Express job by Hasp Input Service on the basis of ID card estimates according to the formula

    "magic number" = 3*CPU + .05*IO + .01*K**2 ,

where CPU was the number of CPU seconds requested for the job; IO was the number of Execute Channel Program supervisor calls (i.e. input/output requests) exclusive of calls for cards read, lines printed, cards punched, or plotting requested for the job; and K was the amount of primary memory expressed in kilobytes requested for the job. A job's class, A through G, was chosen on the basis of where its "magic number" fell in relation to the job class boundaries given in Table 1.

Within each job class, jobs were selected by Hasp modules, known as initiators, for execution according to a first-in first-out, or FIFO, discipline. Once a job had been selected for execution it was removed from the Hasp Execution Queue and placed in the OS/MVT Job Queue. It was not returned to the Hasp Execution Queue except by the deliberate intervention of the operator. Each Hasp initiator could select only one job at a time for execution, so that the number of active Hasp initiators was the number of Hasp jobs in the OS/MVT job queue. If an initiator was free to select a job but there were no jobs available for it to select, it was said to be inactive. If an initiator had a job "on it" but had been set by the operator not to select any new jobs, it was draining. If an initiator was both not processing a job and had been set not to select any new jobs, it was drained. During the collection of data, only observations in which initiators were either drained, draining, or active were used.

Table 1
JOB CLASS BOUNDARIES

Class        "magic number" (MN)
  A              0 < MN <=   450
  B            450 < MN <=   850
  C            850 < MN <=  1600
  D           1600 < MN <=  2800
  E           2800 < MN <=  4500
  F           4500 < MN <=  7000
  G           7000 < MN <= 32767

The queues from which initiators selected jobs were specified by sets of ordered pairs of letters. For example, if initiator 1 were set to ACDD, it would first select the oldest job in the collection of all jobs from classes A through C. Only if there were no jobs in classes A through C would the initiator then consider jobs from class D through class D (or more simply just class D jobs). In this example, if there were no A, B, C, or D jobs, then the initiator would become inactive. Once a Hasp initiator had selected a job for the OS/MVT job queue, its previous Hasp job class was essentially irrelevant since OS/MVT had no knowledge of what the Hasp job class designation was for jobs. OS/MVT did have information about the maximum CPU time to allow a job to run so that the job could be abnormally terminated if the job exceeded this limit. This information, although used by Hasp in its scheduling algorithm, was not used by OS/MVT in its scheduling algorithm. The above scheme is typical of the Hasp OS/MVT scheduling technique. The major departures are that each computer installation specifies its own number and types of initiators, and each computer installation has its own algorithm for defining job class boundaries.
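As a concrete restatement of the class assignment rule, the sketch below computes the "magic number" from the ID card estimates and looks up the class letter in the Table 1 boundaries. It is written in Python purely for illustration; Hasp Input Service was of course not implemented this way, and the names used here are invented.

# "magic number" = 3*CPU + .05*IO + .01*K**2, with the class chosen from Table 1.
# Illustrative sketch only.

TABLE_1_BOUNDARIES = [          # (class, upper bound on the "magic number")
    ("A", 450), ("B", 850), ("C", 1600), ("D", 2800),
    ("E", 4500), ("F", 7000), ("G", 32767),
]

def hasp_job_class(cpu_seconds, io_calls, core_kilobytes):
    magic = 3 * cpu_seconds + 0.05 * io_calls + 0.01 * core_kilobytes ** 2
    for letter, upper in TABLE_1_BOUNDARIES:
        if magic <= upper:
            return letter
    return None                 # above the Table 1 maximum; not defined by the table

# e.g. a job requesting 30 CPU seconds, 2000 I/O calls, and 200K of core has
# magic number 3*30 + 0.05*2000 + 0.01*200**2 = 590, and so is a class B job.
print(hasp_job_class(30, 2000, 200))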
Once a Hasp job was in the OS/MVT job queue, OS/MVT went about the business of running the first program, or jobstep, of the job.[1] Preparing a jobstep for execution consisted of acquiring a region of main storage, locating data sets that were to be input to the jobstep, assigning I/O devices required for the jobstep, reserving auxiliary storage for data sets created during the step, and finally attaching a task for the jobstep so that execution could begin. When the first jobstep terminated, subsequent jobsteps, if any, were sequentially processed in a similar manner. Once a job had terminated execution, Hasp scheduled it for peripheral activity.

[1] IBM Corporation, MVT Guide, pp. 169-178.

Job priorities were used by the OS/MVT supervisor to select the multiprogramming task to be resumed upon completion of interrupt handling. Hasp jobs normally ran at priority 6, Hasp itself ran at priority 13, Plorts ran at priority 12, and Express ran at priority 11. Two additional monitors, Civil and UOI, ran at priorities 10 and 13 respectively. The fact that Express and Plorts both ran at priorities higher than Hasp jobs was the reason that Express and Plorts work was performed immediately. The result of this was that the amount of Express and Plorts work done in any fifteen minute observation was nominally the same as the amount of such work demanded during the observation. Among Hasp jobs, the dispatching order was reordered every two seconds so that the job which had been using the least amount of CPU time was dispatched first.

3.2 Throughput Data

Data for the estimation of the throughput and execution residency functions were obtained from four different sources:

1. SMF - System Management Facility
2. Hardware monitor
3. Console log
4. ARDS - the Accounting Record Data Set.

Data on throughput, where throughput was defined to be seconds of billable Hasp job CPU time per observation, was available from SMF step termination records. The SMF is an IBM written data gathering facility described in IBM manual number GC28-6712-7 (1973). The hardware monitor, described in Carter and Pelg (1970), is a locally built monitor designed to sample activity on IBM equipment panel lights. This monitor, affectionately known as the heart-lung machine, has eight counters which display accumulated tenths of seconds of panel light activity. These eight counters were set up to display:

1. instruction counter time within Plorts region
2. elapsed clock time
3. wait light time
4. channel 0 time - multiplexor channel
5. channel 1 time - drum dedicated selector channel
6. channel 2 time - disk selector channel
7. channel 3 time - disk selector channel
8. channel 4 time - disk selector channel.

The console log provided information on the average number of active Plorts terminals and seconds of CPU time accumulated by the Plorts task per observation. The console log also provided information on the jobqueue backlog. Lines printed, cards punched, cards read, and seconds of Calcomp plotter time for all jobs, and seconds of Express job CPU time, came from the ARDS, the data set used by the Computing Services Office for all account billings. The number of non-drained initiators per observation, also investigated as a determinant of throughput, was under experimental control.

A total of 111 observations, of which 89 were finally used, were collected under the aegis of the Computing Services Office over a three week period from September 19, 1974 through October 4, 1974.
Of the 89 observations, six of them were collected while running with four Hasp initiators, forty-one were collected while running with five Hasp initiators, twenty-seven were collected while running with six Hasp initiators, and fifteen were collected while running with seven Hasp initiators. Afternoon hours were used exclusively since the diurnal nature of mankind and a deliberate choice of management scheduling policy caused the character of the workload to shift radically by time of day. Afternoon hours were characterized by lots of small and medium sized jobs being run. Nighttime saw the large and huge jobs being run, while morning hours were relatively lightly loaded so that initiators would sometimes become inactive.

Morning hours could not be used since it was necessary that all non-drained initiators be active. If there had been, say, five non-drained initiators but one or more of them had been sometimes active and sometimes not, a question would have arisen as to whether the data from such an observation was indicative of how the system handled five or fewer initiators. In order to avoid the measurement error which such a situation creates, only those observations taken when there was a backlog for all non-drained initiators were considered. Indeed, the twenty-two unused observations which were collected were not used for precisely this reason, even though they were collected during afternoon hours.

Measurement error of another type was the reason that nighttime hours could not be used. Some throughput measurement error was present in all 89 observations as a result of the fact that knowledge of how much CPU time any jobstep took (i.e. the amount of throughput the jobstep generated) was available only at its step termination time, when an SMF record was written. For each jobstep which started and ended within a single observation, no measurement error was generated since all of its CPU time was properly credited to the throughput for that period. However, each jobstep which was in execution over an observation boundary necessarily generated some measurement error because no decision rule was possible which could have accurately assigned the correct amount of billable CPU time which the jobstep used during the period in question, yet some decision rule could not be avoided. Obviously, the greater the amount of throughput which is assigned to, or left out of, any observation on the basis of some decision rule, the greater the measurement error for the observation. In the case of night jobs, it was common for a jobstep to be in execution for several hours, with the result that the measurement error for night sampling periods clearly would have been greater than for afternoon ones. Night hours could not be used because of the substantial throughput measurement error which would have resulted in observations collected while jobs with long execution residency times were running.

The decision rule used for handling a jobstep which was in execution over an observation boundary was to prorate its billable CPU time across its residency time. For example, if 40% of the wall clock time that a jobstep was in execution residency occurred during a particular observation, then that observation was credited with 40% of that jobstep's total billable CPU time.

The determination of when observations began and ended was under control of the Plorts task. Plorts issued interval timing supervisor calls which caused an exit routine to be taken approximately every fifteen minutes.
The exit routine wrote its message to the operator's console giving the time of day, accumulated Plorts activity, and the average number of active Plorts terminals since the previous message had been written. When this Plorts message appeared on the operator's console, hardware monitor readings were collected by taking a black and white photograph of the monitor's counters. The camera shutter was released manually. Photographing the hardware monitor was necessary since the counter values were not available in machine readable form. Also, whenever the Plorts message came out on the console and data was being collected, the operator interrogated Hasp for the total number of jobs in the job queue and the number of jobs in each job class. Hasp responded by writing this information to the operator's console. The Plorts, backlog, and hardware monitor data were keypunched later from these records.

A list of the 89 observations and the number of active initiators during each one is given in Appendix A. Values of activity measured by the hardware monitor are given in Appendix B. Backlog figures at the beginning and end of each observation were averaged to give an estimate of the average backlog during the observation. Values of backlog and Plorts activity taken from the operator's console are given in Appendix C.

During the three week period over which the data were collected, SMF and ARDS data were being automatically recorded on a continuing basis. A total of three SMF and two ARDS magnetic tapes had data on them covering the 89 observations. PL/1 programs were written to read these tapes and reduce the data on them to a usable format. Time boundaries for the observations, as defined by the appearance of the Plorts message on the operator's console, were used in the processing of data from both the SMF tapes and the ARDS tapes. Hasp job billable CPU time per observation was collected from the SMF step termination records as previously described. In the data collected from the ARDS (i.e. all cards read, cards punched, lines printed, seconds of Calcomp plotter time, and accumulated Express CPU time), no prorating of the data was attempted. For all jobs, the assumption was made that cards read were read in during the observation time period in which the job first went on a reader. All printing, punching, and plotting was assumed to take place during the observation in which the job purged. Similarly, all Express CPU time was assumed to take place during the same period in which the job went into execution.

The ARDS data on cards read, lines printed, and cards punched were available by the job entry site at which the activity took place. There were eighteen reader sites, fourteen punching sites, and eighteen printing sites, for a total of fifty device type by site variables. A problem was that the locally driven devices generally had higher data rates than remote job entry station devices. Also, lines to the various remote job entry sites had different bandwidths. It was clearly necessary to collapse the fifty variables of peripheral data, but it was not entirely clear whether cards punched over a 4800 baud line were more like cards punched over a 9600 baud line or cards read over a 4800 baud line in terms of their influence on Hasp job throughput. The decision was made to treat all record types alike and ignore device rates rather than the other way around.
Observations were not all the same length, and there was random variability in the elapsed time between when a Plorts message would appear on the operator's console and when the corresponding photograph would be taken of the hardware monitor. To correct for these measurement difficulties, all data was normalized to a standard fifteen minute interval. All SMF, ARDS, and console data, except for the average number of active terminals and average backlog, were multiplied by the ratio of nine hundred seconds to the number of seconds in the observation as defined by the difference in the times of day on the beginning and ending Plorts messages. Similarly, hardware monitor activity was divided by the length of the observation interval as recorded by the free running counter number two and multiplied by nine hundred seconds.

Two measures of Plorts CPU time were collected, one from the hardware monitor, and one from the operator's console. Correlation between the two measures was .91. This indicated that measurements between the hardware monitor and the operator's console Plorts message were consistent. The difference in the amount of time spent in Plorts as indicated by each of the measurements is accounted for by the fact that the hardware monitor measured time during which the address portion of the PSW (i.e. the address of the instructions being executed) was between two particular addresses in primary memory, whereas the Plorts CPU time, as measured by the operator's console message, measured time spent by the Plorts task. Apparently, a considerable amount of execution time was being spent by OS/MVT in the name of Plorts outside the Plorts area of memory. In the case of Plorts, this discrepancy is assumed to be due to time spent servicing task needs connected with I/O activity. For the purposes of estimating the effect of Plorts CPU time on Hasp throughput, the Plorts console measurement was chosen since it contained more information about the amount of CPU time used by Plorts.

3.3 Turnaround Data

Data for turnaround consisted primarily of the ARDS records for jobs run during the same 89 time periods over which throughput data was collected. Unfortunately, ARDS did not contain accurate times of when jobs went into or finished execution. The procedure followed was to read the SMF job termination records for jobnames, the time when each job ended execution, and the number of steps in each job. Each jobname was then looked up in a sorted table of available ARDS record jobnames. A correspondence table then pointed to job information in a random access organized copy of the ARDS data base. The amount of time that the job was in execution was available in the ARDS. If the jobname was unique in the ARDS data base, then that job was included in the sample, otherwise it was not. The total number of jobs excluded because of duplicate jobnames was 202. No effort was made to retrieve SMF step termination records for the jobs. Information on a total of 763 class A jobs, 203 class B jobs, 94 class C jobs, and 5 class D jobs was collected.

The ARDS records provided execution residency time, billed CPU time, the amount of fast and slow core, and the number of I/O requests used by the job exclusive of requests for peripheral activity. Also in the ARDS records was the amount of peripheral activity, that is the number of cards read, cards punched, lines printed, and seconds of plotter time generated by the job.
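The construction of the turnaround sample described above is essentially a join of SMF job termination records with ARDS records on jobname, with jobs whose names are not unique in the ARDS data discarded. A minimal Python sketch, using hypothetical field names rather than the actual record layouts:

from collections import Counter

def build_turnaround_sample(smf_job_terminations, ards_records):
    # Keep the ARDS record for each terminated job whose jobname matches
    # exactly one ARDS record; jobs with duplicate jobnames are excluded
    # (202 jobs were excluded this way in the thesis).
    name_counts = Counter(r["jobname"] for r in ards_records)
    unique = {r["jobname"]: r for r in ards_records if name_counts[r["jobname"]] == 1}
    sample, excluded = [], 0
    for term in smf_job_terminations:
        name = term["jobname"]
        if name_counts.get(name, 0) > 1:
            excluded += 1
        elif name in unique:
            sample.append(unique[name])
    return sample, excluded

smf = [{"jobname": "JOBA"}, {"jobname": "JOBB"}]
ards = [{"jobname": "JOBA", "reside": 120.0},
        {"jobname": "JOBB", "reside": 60.0},
        {"jobname": "JOBB", "reside": 75.0}]
print(build_turnaround_sample(smf, ards))    # JOBA is kept, JOBB is excluded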
4. Results

4.1 Throughput

The independent variables used in the estimation of throughput, and the names by which they are subsequently referred to in this thesis, are:

1. CONSTANT - column of ones
2. 5 INITS - dummy variable, five active Hasp initiators
3. 6 INITS - dummy variable, six active Hasp initiators
4. 7 INITS - dummy variable, seven active Hasp initiators
5. PLORTCPU - Plorts task CPU time from console message
6. WAIT TIME - wait light activity
7. CHANNEL 0 - channel 0 activity
8. CHANNEL 1 - channel 1 activity
9. CHANNEL 2 - channel 2 activity
10. CHANNEL 3 - channel 3 activity
11. CHANNEL 4 - channel 4 activity
12. # TERMS - average number of active Plorts terminals
13. BACKLOG - average number of jobs in the system
14. EXPR CPU - Express CPU time
15. PLOT SEC - seconds of plotter time
16. CARDSREAD - cards read
17. CARDSPUNCH - cards punched
18. LINESPRINT - lines printed

The dependent variable is identified as THROUGHPUT. The variables 5 INITS, 6 INITS, and 7 INITS are dummy variables, that is, have value either zero or one. The variables PLORTCPU, WAIT TIME, CHANNEL 0, CHANNEL 1, CHANNEL 2, CHANNEL 3, CHANNEL 4, EXPR CPU, PLOT SEC, and THROUGHPUT are all in seconds of activity per fifteen minutes. BACKLOG is the average of the number of jobs in the Hasp queue at the beginning and at the end of each observation. The variable # TERMS is the average number of Plorts terminals logged on during the observation. CARDSREAD, CARDSPUNCH, and LINESPRINT are in terms of unit records per fifteen minutes.

Table 2 shows the mean and standard deviation, in each of the two samples, for each of the last seventeen independent variables and for the dependent variable. There were forty-five observations in sample one and forty-four observations in sample two.

Table 2
Means and Standard Deviations by Sample

                 First Sample               Second Sample
VARIABLE      MEAN        STAND. DEV.    MEAN        STAND. DEV.
5 INITS       0.467       0.499          0.455       0.498
6 INITS       0.311       0.463          0.295       0.456
7 INITS       0.156       0.362          0.182       0.386
PLORTCPU      234.801     98.498         230.182     90.103
WAIT TIME     13.135      17.275         11.884      19.925
CHANNEL 0     83.830      77.013         79.253      68.398
CHANNEL 1     151.756     65.522         159.808     48.526
CHANNEL 2     204.439     94.187         192.314     50.058
CHANNEL 3     193.156     137.192        202.715     144.334
CHANNEL 4     182.873     58.989         180.389     52.396
# TERMS       27.244      7.370          26.614      7.365
BACKLOG       161.089     56.315         167.307     56.778
EXPR CPU      45.780      15.340         44.296      14.968
PLOT SEC      121.214     172.466        115.390     242.399
CARDSREAD     11135.889   4692.323       11216.538   5369.399
CARDSPUNCH    466.348     775.408        670.796     1134.611
LINESPRINT    19429.671   9071.112       19370.971   7407.885
THROUGHPUT    263.757     88.247         279.462     89.046

Table 3 shows the ordinary least squares estimates for the regression coefficients estimated from the first sample of data along with an additional column of t-ratios. Each t-ratio is the ratio of a regression coefficient to its standard error and is a test statistic for the hypothesis that the corresponding coefficient is zero.
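The least squares quantities reported below are the standard ones: coefficients (X'X)\X'Y, the sum of squared error (Y-XB)'(Y-XB), a variance estimate obtained by dividing by the error degrees of freedom, and t-ratios formed by dividing each coefficient by its standard error. The following is a minimal numpy sketch, not the original estimation program; the small synthetic design matrix stands in for the actual 45 by 18 sample one data.

import numpy as np

def ols_with_t_ratios(X, y):
    # Ordinary least squares: b = (X'X)\X'y, with t-ratios b_i / se(b_i).
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    resid = y - X @ b
    sse = float(resid @ resid)            # sum of squared error (Y-XB)'(Y-XB)
    s2 = sse / (n - k)                    # estimate of variance
    se = np.sqrt(s2 * np.diag(xtx_inv))   # coefficient standard errors
    return b, b / se, sse, s2

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(45), rng.normal(size=(45, 2))])   # constant plus two regressors
y = X @ np.array([550.0, -80.0, -2.0]) + rng.normal(scale=50.0, size=45)
b, t, sse, s2 = ols_with_t_ratios(X, y)
print(np.round(b, 3), np.round(t, 2), round(s2, 1))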
Table 3
First Sample Ordinary Least Squares Estimates

VARIABLE      COEFFICIENT    T-RATIO
CONSTANT      559.640064     4.397768
5 INITS       -85.677230     -1.797919
6 INITS       -93.373450     -2.007523
7 INITS       -39.715200     -0.678047
PLORTCPU      -0.447093      -1.978566
WAIT TIME     -0.154016      -0.172599
CHANNEL 0     -0.120110      -0.744931
CHANNEL 1     -0.088327      -0.342260
CHANNEL 2     0.117751       0.793057
CHANNEL 3     0.025301       0.307520
CHANNEL 4     0.039271       0.226993
# TERMS       -0.612968      -0.287407
BACKLOG       0.059455       0.288432
EXPR CPU      -2.296580      -2.884794
PLOT SEC      -0.019927      -0.362034
CARDSREAD     -0.002579      -0.813457
CARDSPUNCH    -0.014905      -1.047926
LINESPRINT    0.001235       1.143110

The sum of squared error, (Y-XB)'(Y-XB), for the ordinary least squares solution on the first sample of data was 86000. The estimate of variance was 3185.2, and the coefficient of multiple correlation was .869. The sum of squares cross products matrix is given in Appendix F. The estimate of the variance-covariance matrix for the ordinary least squares coefficients using the first sample is given in Appendix G.

The CONSTANT coefficient implied that there were 900 - 559.64 = 340.36 seconds of unaccounted for overhead per fifteen minutes when running with four initiators. The coefficients for 5, 6, and 7 INITS implied that the unaccounted for overhead was 426.04 seconds with five, 433.73 seconds with six, and 380.07 seconds with seven initiators. In the first sample, Hasp throughput decreased by an average of .447 seconds for each second of Plorts activity and 2.3 seconds for each second of Express activity.

The column of t-ratios in Table 3 indicated that the coefficients for CONSTANT, 5 INITS, 6 INITS, PLORTCPU, and EXPR CPU were significant at the .1 level and that the others were not. It was hypothesized, therefore, that the coefficients for the twelve variables WAIT TIME, CHANNEL 0, CHANNEL 1, CHANNEL 2, CHANNEL 3, CHANNEL 4, # TERMS, BACKLOG, PLOT SEC, CARDSREAD, CARDSPUNCH, and LINESPRINT were zero. The coefficient for 7 INITS was not hypothesized to be zero, even though it was not significant in the first sample, since the coefficients for 5 INITS and 6 INITS were significant, and it seemed reasonable to treat the three initiator variables as a class.

With the above twelve variables restricted (hypothesized) to have coefficients of zero, the model was estimated again, still using the first sample of data. The coefficient estimates are given in Table 4, and the estimate of the variance-covariance matrix is given in Appendix H along with the inverse of the sum of squares cross products matrix (of the six unrestricted coefficients) which was later used as a stochastic prior information matrix.

Table 4
First Sample Exact Restricted Least Squares Estimates

VARIABLE      COEFFICIENT    T-RATIO
CONSTANT      551.257652     15.933139
5 INITS       -81.120235     -2.261331
6 INITS       -87.468766     -2.311494
7 INITS       -52.286001     -1.257343
PLORTCPU      -0.492793      -5.634263
EXPR CPU      -2.153568      -3.847391

The sum of squared error for the exact restricted model using the first sample of data was 107940. The estimate of variance was 2767.7, and the coefficient of multiple correlation was .832. The F-ratio was .574 which, if the test of hypotheses had been formed without the benefit of having looked at the t-ratios, would not have rejected the hypotheses that the twelve coefficients were zero, since 85% of the F distribution with 12 and 27 degrees of freedom is greater than .574.

Next, an ordinary least squares solution was calculated using the second sample alone.
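Before turning to the second sample, note that the F-ratio just quoted can be reproduced from the figures reported above: the change in the sum of squared error divided by the number of restrictions, over the unrestricted variance estimate. A quick check in Python, using only numbers given in the text:

# Test of the twelve zero restrictions on the first throughput sample.
sse_unrestricted = 86000.0      # Table 3 model, eighteen coefficients
sse_restricted = 107940.0       # Table 4 model, six coefficients
q, df_error = 12, 45 - 18       # twelve restrictions, 27 error degrees of freedom
f_ratio = ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / df_error)
print(round(f_ratio, 3))        # 0.574, as reported in the text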
A test of compatibility between the second sample estimates and prior information generated from the restricted estimation on the first sample was subsequently done. Table 5 shows the ordinary least squares estimates and corresponding t-ratios estimated from the second sample. A comparison of Table 3 and Table 5 shows a 161 second drop in the CONSTANT coefficient. The coefficients for 5 INITS and 6 INITS were not significant in the second sample, and the coefficients for CHANNEL 2 and CHANNEL 3 were significant and had positive sign in the second sample, but were not significant in the first sample. The coefficients for PLORTCPU and EXPR CPU remained negative and significant.

Table 5
Second Sample Ordinary Least Squares Estimates

VARIABLE      COEFFICIENT    T-RATIO
CONSTANT      398.468561     3.872288
5 INITS       -32.500840     -0.830606
6 INITS       -34.306105     -0.999077
7 INITS       9.160686       0.253968
PLORTCPU      -0.329023      -1.719392
WAIT TIME     0.096762       0.163795
CHANNEL 0     0.242331       1.658375
CHANNEL 1     -0.247365      -0.927424
CHANNEL 2     0.657034       2.660942
CHANNEL 3     0.259085       3.330325
CHANNEL 4     -0.279221      -1.347550
# TERMS       -0.623909      -0.350428
BACKLOG       0.077421       0.533708
EXPR CPU      -2.295179      -3.204583
PLOT SEC      0.015678       0.356936
CARDSREAD     0.001656       0.842072
CARDSPUNCH    -0.003757      -0.502224
LINESPRINT    -0.002164      -1.374564

The sum of squared error was 59312, and the estimated variance was 2281.2. The coefficient of multiple correlation was .911. The sum of squares cross products matrix for sample two is given in Appendix I, and the estimate of the variance-covariance matrix in the ordinary least squares case for sample two is given in Appendix J.

Next, exact and stochastic restrictions were imposed on the second sample, and the model was estimated under these restrictions. A test of hypotheses was performed to determine if the sample and prior information were compatible. Restrictions were imposed on all eighteen independent variables. The twelve variables which had been exactly restricted to zero in the first sample were exactly restricted to zero in the second sample, and the remaining six variables were restricted to a multivariate distribution with mean given by the COEFFICIENT column in Table 4, and variance-covariance structure given by the variance for the model times the prior information matrix given in Appendix H. The estimated regression coefficients, using the second data sample and stochastic restricted least squares, are given in Table 6.

Table 6
Second Sample Stochastic Restricted Least Squares Estimates

VARIABLE      COEFFICIENT    T-RATIO
CONSTANT      546.666948     23.639095
5 INITS       -54.400008     -2.503728
6 INITS       -49.482815     -2.180647
7 INITS       -24.572966     -1.012293
PLORTCPU      -0.559189      -9.396908
EXPR CPU      -2.240166      -6.241305

The sum of squared error was 109814, and the estimate of variance was 2889.8. The coefficient of multiple correlation was .828. An estimate of the variance-covariance matrix for the stochastic restricted regression coefficients is given in Appendix K. The compatibility statistic was 3.68968, which rejected the hypotheses when compared against the F distribution with 6 and 26 degrees of freedom, since the area in the tail of the distribution is .00873.

The preliminary test estimator is dominated in a Mean Square Error sense by the positive part James-Stein estimator. The positive part James-Stein estimates are given in Table 7.
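The Table 7 values are consistent with shrinking the second sample ordinary least squares estimates (Table 5) toward the stochastic restricted estimates (Table 6) by the factor (1 - a/F) reported in the next paragraph, truncated at zero. A minimal Python sketch under that assumption; it reproduces the CONSTANT and PLORTCPU entries of Table 7 to within rounding.

def positive_part_james_stein(b_ols, b_restricted, a, f_stat):
    # Shrink the unrestricted estimates toward the restricted ones by (1 - a/F),
    # with the factor truncated at zero (the "positive part").
    c = max(0.0, 1.0 - a / f_stat)
    return [br + c * (bo - br) for bo, br in zip(b_ols, b_restricted)]

a, f_stat = 1.65079, 3.68968            # critical value and compatibility statistic
b_ols = [398.468561, -0.329023]         # CONSTANT and PLORTCPU from Table 5
b_restricted = [546.666948, -0.559189]  # CONSTANT and PLORTCPU from Table 6
print(positive_part_james_stein(b_ols, b_restricted, a, f_stat))
# approximately [464.77, -0.432], the corresponding Table 7 entries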
The value of (1 - a/F) used in calculating these estimates was .55259, where the critical value, a, was 1.65079. Using these estimates, the coefficient of multiple correlation was .715 in the first sample and .896 in the second.

Table 7
Throughput Model Positive Part James-Stein Estimates

VARIABLE      COEFFICIENT
CONSTANT      464.773758
5 INITS       -42.298711
6 INITS       -41.096292
7 INITS       -5.932032
PLORTCPU      -0.432001
WAIT TIME     0.053470
CHANNEL 0     0.133910
CHANNEL 1     -0.136692
CHANNEL 2     0.363071
CHANNEL 3     0.143168
CHANNEL 4     -0.154295
# TERMS       -0.344767
BACKLOG       0.042782
EXPR CPU      -2.270566
PLOT SEC      0.008663
CARDSREAD     0.000915
CARDSPUNCH    -0.002076
LINESPRINT    -0.001196

4.2 Turnaround

In a FIFO scheduling discipline, the expected execution turnaround for a job is the sum of its expected execution residency plus the expected execution residency of all jobs ahead of it in the same class divided by the number of multiprogramming levels servicing jobs in that class. To model expected execution residency times, job execution residency functions were estimated for each of the job classes A, B, and C. A function for class D jobs was not estimated since the number of independent variables exceeded the sample size. The independent variables, and the names by which they are referred, are:

1. CONSTANT - column of ones
2. L*CPU/TH - Hasp multiprogramming level times job CPU seconds divided by system throughput rate
3. IOREQ - number of job I/O requests exclusive of requests for unit record devices
4. RGN0 - hundred kilobytes of high speed core memory
5. RGN1 - hundred kilobytes of slow speed core memory
6. STEP - number of job steps in the job
7. JOBREAD - number of cards read by job
8. JOBPUNCH - number of cards punched by job
9. JOBPRINT - number of lines printed by job
10. JOBPLOT - seconds of plotter time generated by job

The dependent variable is identified as RESIDE. Table 8a gives the means and Table 8b gives the standard deviations for the residency variables in each of the job classes A, B, C, and D.

Table 8a
Means of Execution Residency Data by Class

VARIABLE     CLASS A     CLASS B     CLASS C      CLASS D
L*CPU/TH     130.8155    253.2274    700.3682     1261.6494
IOREQ        297.1835    304.8916    1257.5851    3636.2000
RGN0         1.1811      1.7825      2.1365       1.8460
RGN1         0.0005      0.0896      0.1483       0.1600
STEP         1.7785      1.6552      2.1170       2.2000
JOBREAD      342.4836    309.8177    463.9681     165.4000
JOBPUNCH     54.4626     3.6502      11.2021      13.8000
JOBPRINT     621.5033    652.4877    1395.6915    1841.8000
JOBPLOT      8.3971      13.4138     20.7766      0.0000
RESIDE       165.5609    315.7192    679.2660     741.8000

Table 8b
Standard Deviations of Execution Residency Data by Class

VARIABLE     CLASS A     CLASS B     CLASS C      CLASS D
L*CPU/TH     188.9304    398.1240    1066.2295    1166.3849
IOREQ        475.8803    399.0410    2580.4859    3962.3346
RGN0         0.3150      0.3905      0.7916       0.8730
RGN1         0.0102      0.2108      0.2803       0.3200
STEP         0.9227      0.8762      1.2704       1.6000
JOBREAD      614.2224    436.7106    599.4597     141.0611
JOBPUNCH     300.6418    41.6804     52.6471      27.6000
JOBPRINT     2531.3724   812.1435    1966.6624    1802.7827
JOBPLOT      38.6619     65.8754     140.0837     0.0000
RESIDE       187.2538    343.6127    698.0862     467.7347

The variables L*CPU/TH and RESIDE are measured in seconds. Sum of squares cross products matrices are given in Appendix L. Ordinary least squares estimates, and their corresponding t-ratios, for the Class A, Class B, and Class C residency models are given in Table 9.

Table 9
Job Execution Residency Model Parameter Estimates and T-Ratios
Ordinary Least Squares
             CLASS A               CLASS B               CLASS C
VARIABLE     COEFF.      T-RATIO   COEFF.      T-RATIO   COEFF.      T-RATIO
CONSTANT     -27.5966    -1.140    -75.4277    -0.744    -28.5122    -0.153
L*CPU/TH     0.5291      15.960    0.6026      12.032    0.4193      7.830
IOREQ        0.0590      4.614     0.0478      0.877     0.0607      2.877
RGN0         86.6472     5.042     106.4253    2.244     147.3806    2.212
RGN1         -221.7544   -0.423    -4.9220     -0.053    -183.4167   -0.973
STEP         6.0071      0.980     37.4996     1.467     21.5536     0.478
JOBREAD      -0.0107     -1.096    -0.0402     -0.846    -0.0440     -0.491
JOBPUNCH     -0.0002     -0.010    -0.2902     -0.669    0.1670      0.173
JOBPRINT     -0.0020     -0.905    -0.0179     -0.731    0.0155      0.546
JOBPLOT      -0.1916     -1.371    -0.1599     -0.572    0.0704      0.204

On the basis of the t-ratios in Table 9, the coefficients for L*CPU/TH and RGN0 were significantly different from zero in all three classes, and the coefficient for IOREQ was significantly different from zero in class A and class C. The coefficients for L*CPU/TH and IOREQ remained fairly stable across job classes while the coefficient for RGN0 increased by 20 seconds per hundred kilobytes from class A to class B and by 41 seconds per hundred kilobytes from class B to class C. The coefficients for CONSTANT, RGN1, STEP, JOBREAD, JOBPUNCH, JOBPRINT, and JOBPLOT were consistently not significantly different from zero. The sum of squared error, estimate of variance, and coefficient of multiple correlation for each of the job class models are given in Table 10.

Table 10
Ordinary Least Squares Job Residency Models by Job Class
Summary Statistics

             Sum of Squared Error   Variance       Multiple Corr.
CLASS A      .164462E 08            .218409E 05    .621
CLASS B      .120226E 08            .622933E 05    .706
CLASS C      .177090E 08            .210821E 06    .783

Next, exact linear restrictions (hypotheses) were imposed on the regression parameters in each of the three models. These linear hypotheses, made prior to having looked at the ordinary least squares results given in Tables 9 and 10, were each of a form restricting a single parameter to zero. There were nine such restrictions in each model. The only coefficient in each model which was not restricted to zero was the coefficient for the variable L*CPU/TH. The results of these restrictions, given in Table 11, may be compared against Table 9. In each job class, the coefficient for L*CPU/TH is larger in Table 11 than in Table 9, but in no instance is the coefficient for L*CPU/TH equal to, or greater than, one.

Table 11
Job Execution Residency Model Parameter Estimates and T-Ratios
Restricted Least Squares

             CLASS A               CLASS B               CLASS C
VARIABLE     COEFF.      T-RATIO   COEFF.      T-RATIO   COEFF.      T-RATIO
L*CPU/TH     0.8019      30.126    0.7779      18.105    0.6286      13.985

For each of the exact restricted job residency models, the sum of squared error, the estimate of variance, and the coefficient of multiple correlation are given in Table 12. These results may be compared with the results in Table 10.

Table 12
Exact Restricted Least Squares Job Residency Models by Job Class
Summary Statistics

             Sum of Squared Error   Variance       Multiple Corr.
CLASS A      .217560E 08            .285512E 05    .432
CLASS B      .168536E 08            .834336E 05    .545
CLASS C      .287409E 08            .309042E 06    .610

F tests were performed to test the hypotheses that the sample and hypothesized prior information were compatible. The results of these F tests are given in Table 13.

Table 13
Job Residency Model Hypothesis Test Results

             F-Ratio    Prob    Degrees of Freedom
CLASS A      27.01      .000    9, 753
CLASS B      8.62       .000    9, 193
CLASS C      5.81       .000    9, 84

The hypotheses were rejected in all three models.
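Each entry in Table 13 follows from the summary statistics in Tables 10 and 12. For class A, the nine restrictions and the 763 - 10 = 753 error degrees of freedom give the reported F-ratio; a quick check in Python:

# Class A test that the nine coefficients other than the one for L*CPU/TH are zero.
sse_unrestricted = 0.164462e8   # Table 10, ten-variable model
sse_restricted = 0.217560e8     # Table 12, L*CPU/TH only
q, df_error = 9, 763 - 10       # nine restrictions, 753 error degrees of freedom
f_ratio = ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / df_error)
print(round(f_ratio, 2))        # 27.01, as reported for class A in Table 13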
Since these models each satisfy the conditions for positive part James-Stein estimates to have a Mean Square Error smaller than the Mean Square Error for the preliminary test estimator, the positive part James-Stein estimates were calculated and are presented in Table 14. These estimates may be compared to the results given in Table 9 and in Table 11. The coefficients of multiple correlation for these models, when rounded to three decimal places, are identical to the coefficients of multiple correlation in Table 10. This is not surprising since the regression coefficients in Table 14 and in Table 9 are nearly identical.

Table 14
Execution Residency Models
Positive Part James-Stein Estimates

VARIABLE     CLASS A      CLASS B      CLASS C
CONSTANT     -27.5776     -74.7992     -27.7139
L*CPU/TH     0.5293       0.6040       0.4252
IOREQ        0.0590       0.0474       0.0590
RGN0         86.5877      105.5386     143.2542
RGN1         -221.6022    -4.8810      -178.2813
STEP         6.0029       37.1871      20.9501
JOBREAD      -0.0107      -0.0399      -0.0427
JOBPUNCH     -0.0002      -0.2878      0.1624
JOBPRINT     -0.0020      -0.0177      0.0150
JOBPLOT      -0.1915      -0.1586      0.0684

The fact that the hypothesized prior information was rejected in each of the three job class models (Table 13) raised the question as to whether or not the variable L*CPU/TH was a "better" predictor of execution residency than, say, unweighted CPU. This question was informally considered by calculating ordinary least squares estimates using the same set of independent variables as were used in the regressions of Tables 9 and 10, except that CPU was substituted for L*CPU/TH. A comparison of the coefficients of multiple correlation, given in Table 15, suggested that L*CPU/TH was "better" than CPU for predicting execution residency. The first column of Table 15 is the same as the Multiple Corr. column of Table 10.

Table 15
Execution Residency Models
Ordinary Least Squares Multiple Correlation Coefficients

             using L*CPU/TH    using CPU
CLASS A      .621              .535
CLASS B      .706              .599
CLASS C      .783              .727

5. Discussion

5.1 Throughput

The coefficients of multiple correlation in each of the two samples (.715 in the first sample and .896 in the second sample), using the positive part James-Stein estimates, were good. These estimates were used as alternatives to the conventional preliminary test estimates since they have a smaller Mean Square Error than preliminary test estimates. A comparison of the results in Tables 3 and 5 showed that in both samples the variables PLORTCPU and EXPR CPU had a clearly measurable effect on THROUGHPUT, and that the coefficients for WAIT TIME, # TERMS, BACKLOG, PLOT SEC, CARDSREAD, CARDSPUNCH, and LINESPRINT were not significantly different from zero. Noticeable differences between first and second sample ordinary least squares results occurred primarily in the CONSTANT, the initiator variables, the PLORTCPU variable, and the channel variables.

Particularly striking were the differences in the channel variables. In sample one, each channel coefficient was not significantly different from zero at the .1 level. In sample two, however, both CHANNEL 2 and CHANNEL 3 coefficients were significantly greater than zero. In addition, these two coefficients were respectively five and six times larger than their first sample estimates. All second sample ordinary least squares coefficients for channel variables were greater in absolute value than their corresponding sample one coefficients. These first and second sample differences were unexpected.
Channel activity variables had been included in the throughput model under the presumption that their coefficients, if significantly different from zero, would be negative. Such a result would have had an obvious cause and effect interpretation, viz. an increase in I/O activity caused a decrease in throughput. Unfortunately, the significantly positive second sample estimates for the CHANNEL 2 and CHANNEL 3 coefficients presented an interpretation problem from such a cause and effect point of view. Was one to believe that an increase in I/O activity resulted in an increase in throughput, or was there an alternate explanation for these results?

An explanation for no measurable negative effect on throughput due to I/O activity is that instructions executed in the service of an I/O request were credited to the currently active task, and therefore were included as part of throughput if that task was part of a Hasp job. Any decrease in throughput due to increased I/O activity was thereby effectively masked. One possible explanation for the measured positive relation between I/O activity and throughput is that as throughput increased, the aggregate of running Hasp programs could perform more I/O, hence a positive relation between throughput and channel activity is reasonable. This explanation, which assumes that the level of channel activity is dependent upon throughput, has a certain intuitive appeal. It should be noted, however, that measured channel activity was for the entire system, not just for Hasp jobs. In any event, it appears that the indirect attempt to measure degradation in throughput due to channel I/O activity was not successful.

The positive valued channel coefficients, and the specification error which they imply, are also suspected to have had a confounding influence upon, and to have been largely responsible for, the sample differences in the CONSTANT, PLORTCPU, and initiator variable coefficients. The CONSTANT and PLORTCPU coefficients may have been unduly low in sample two as compensation for the positive CHANNEL 2 and CHANNEL 3 coefficients. In light of the lack of significance of the initiator variables in the second sample, the effect on throughput of varying the number of initiators, when the system had no inactive non-draining initiators, was left somewhat in doubt. There did not appear to be any explanation as to why the CHANNEL 2 and CHANNEL 3 coefficients were significantly positive in sample two but were not so in sample one. Thus, these differences were assumed to be due to random sampling variability. Likewise, there was no apparent explanation why the CHANNEL 4 coefficient was not similarly affected, since CHANNEL 4 was similar in function to CHANNEL 2 and CHANNEL 3.

Concerning the positive part James-Stein estimates for the PLORTCPU and EXPR CPU coefficients, -.432 and -2.27 respectively, it was observed that these estimates indicated that for each one second decrease in Plorts activity there was a corresponding increase in Hasp throughput of .432 seconds, and for each one second decrease in Express activity there was a 2.27 second increase in Hasp throughput. Why was it not the case that a reduction of one second in either Plorts or Express time resulted in a one second increase in Hasp throughput? In the case of Plorts activity, each second not used by Plorts was available to the whole system, not just to Hasp jobs. Part of that single second, namely 43.2% of it, went to Hasp jobs and the balance went into general overhead or Express.
In the EXPR CPU case, it must be remembered that the activity data gathered was not the entire time used by the Express subsystem, but only that portion which was billable CPU time. Therefore, by eliminating a second of Express billable CPU time one gained not only that second but also a corresponding portion of Express overhead. Apparently, this overhead was 1.27 seconds per second of billable Express CPU time. It is conjectured that if execution time for the entire Express subsystem had been available, rather than just the portion which was billed to users, the coefficient for Express would also have been approximately .4. 60 The major shortcoming of the throughput model presented was the large percentage of overall CPU time, approximately 40% of wall clock time, which the operating system spent performing various functions but for which the model failed to account. This was unfortunate, but largely unavoidable, since the use of a production environment constrained data collection to facilities which were already available. In particular, no new software probes were added to the ones which already existed, and the hardware monitor had only one address compare board so that only one region of primary memory could be selected for hardware monitoring. The region which was selected was the Plorts region. SMF and ARDS data contained no information on how the operating system was spending the balance of its time. As a result, any substantial refinement of the throughput model for this system would have required a remedy for these restrictions. 5.2 Turnaround Residency functions, which can be used as a basis for estimating turnaround by jobclass, were estimated for class A, B, and C jobs. The hypotheses were rejected in all three cases that all coefficients, except the one for L*CPU/TH, were zero. Again, positive part James-Stein estimates were used instead of the preliminary test estimates. Although the hypotheses were rejected, it was felt that the variable L*CPU/TH was a useful construct and that the hypotheses were rejected simply because other variables do have a measurable effect on 61 job residency. If this model were to be estimated again, it is recommended that only the coefficients for RGN1, STEP, JOBREAD, JOBPUNCH, JOBPRINT, and JOBPLOT be hypothesized to be zero. The coefficient for L*CPU/TH was consistently less than one. It had been expected that those job classes which got less than their "fair share" of CPU time, on average, would have L*CPU/TH coefficients greater than one. Since this never happened, apparently no such effect is present along class lines. Rather in all classes, jobs which were CPU prone had shorter residency times, and each 16 to 20 I/O requests a job required caused an increase in its execution residency of approximately one second. Whereas the coefficients for L*CPU/TH and IOREQ were relatively stable across job classes, the coefficient for RGNO, the only other coefficient which was significantly different from zero, increased sharply across job classes. It is felt that the reason for this phenomenon is that the time spent waiting for region allocation is not a linear function of the amount of region requested coupled with the fact that as one proceeds from class A through class B to class C jobs the mean region requested increases. As an example of the estimation of execution turnaround for a job, suppose that a class A job is submitted and becomes the fortieth job in the class A queue. 
Further assume that five Hasp initiators are active, two of which are processing class A jobs, that Hasp throughput is 35% of wall clock time, and that these conditions are expected to prevail at least until our test job finishes execution. Suppose that the computer resources expected to be used by each of the forty jobs are given in Table 16.

Table 16
Turnaround Example
Assumed Class A Job Characteristics

CPU        8.00 seconds of CPU time
IOREQ      300.00 I/O requests
RGN0       1.16 hundred kilobytes of fast core
RGN1       0.00 hundred kilobytes of slow core
STEP       1.77 job steps
JOBREAD    350.00 cards read
JOBPUNCH   50.00 cards punched
JOBPRINT   620.00 lines printed
JOBPLOT    8.40 seconds of plotter time

The expected value of the L*CPU/TH variable would therefore be 5*8/.35 = 114.3. Using the positive part James-Stein estimates in Table 14, the expected execution residency for any one class A job is 155.26 seconds. For the entire forty jobs, the expected residency is 40*155.26 = 6210.4 seconds, and since there are two class A initiators, our test job can be expected to finish execution in approximately half that time, or 3105.2 seconds after it was read in. Another way of looking at this result is that the system is disgorging a class A job approximately every 155.26/2 = 77.63 seconds. For simplicity we have ignored the two class A jobs which were in execution when our test job was read in. If the throughput rate were 55% of wall clock time instead of 35%, and all other quantities remained constant, the system would complete execution of class A jobs on the average of one every 66.63 seconds.

The above calculations were not dependent upon the existence of either Plorts or Express but only upon the rate of Hasp throughput per unit of wall clock time and the expected resource usage characteristics of class A jobs. Also, this model of turnaround is not affected by changing queue length. It can be dynamically used to explicitly account for changes in the total number of initiators, the number of initiators servicing particular job classes, changes in throughput rate, and changes in the resources which jobs may be expected to use. A job residency and turnaround estimating routine, based on the approach used in this thesis, would be an easy to implement and useful operating system enhancement.

5.3 Concluding Remarks

Linear models can be an effective tool for computer performance measurement, particularly when the dependent variable is measured in time units, such as the second, which are typically treated as being linear. Models of this type were used to model throughput and turnaround for the batch workload of a multiprogrammed operating system. The data used was collected in a production environment. The estimation of turnaround, for a first-in first-out disciplined job queue under "busy" conditions, was related to the batch throughput rate, thereby relating these two basic measures of performance. Positive part James-Stein estimates were used as final parameter estimates since they dominate, in a Mean Square Error sense, the exact and stochastic restricted preliminary test estimates.

The estimation of a model reveals the structural parameters of the system at the time that the data were collected, but the continued modeling of any system, computing or otherwise, is an iterative process. Like any piece of software, a system model must be maintained and improved.
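The execution turnaround example in section 5.2 above can be reproduced numerically from the positive part James-Stein class A coefficients in Table 14 and the assumed job characteristics in Table 16. The following Python sketch is illustrative only; the small differences from the figures quoted in the text come from rounding in the printed coefficients.

# Class A positive part James-Stein coefficients (Table 14).
coef = {"CONSTANT": -27.5776, "L*CPU/TH": 0.5293, "IOREQ": 0.0590,
        "RGN0": 86.5877, "RGN1": -221.6022, "STEP": 6.0029,
        "JOBREAD": -0.0107, "JOBPUNCH": -0.0002, "JOBPRINT": -0.0020,
        "JOBPLOT": -0.1915}

level, cpu, throughput_rate = 5, 8.0, 0.35      # five initiators, 8 s of CPU, 35% throughput
job = {"CONSTANT": 1.0, "L*CPU/TH": level * cpu / throughput_rate,   # about 114.3
       "IOREQ": 300.0, "RGN0": 1.16, "RGN1": 0.0, "STEP": 1.77,
       "JOBREAD": 350.0, "JOBPUNCH": 50.0, "JOBPRINT": 620.0, "JOBPLOT": 8.4}

residency = sum(coef[v] * job[v] for v in coef)             # about 155 seconds per class A job
queue_length, class_a_initiators = 40, 2
turnaround = queue_length * residency / class_a_initiators  # about 3100 seconds for job forty
print(round(residency, 1), round(turnaround, 1))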
Unfortunately, the continued modeling for the particular system presented here is unlikely since the decision was made, subject to the approval to the University of Illinois Board of Trustees, to acquire a new machine to assume the workload of the IBM 360/75. In chapter one it was indicated that this thesis was concerned with performance measurement as a monitoring tool. The statistical techniques presented here are appropriate for the estimation of models which perform this function, and it is felt that the iterative process of estimating models of this type should be carried out over the entire life of any system. As a result of this research it is concluded that the estimation of such models, with data collected under production conditions, is not only feasible but is also beneficial and entirely warranted. Furthermore, global models of this type are important since they provide insight into what parts of an operating system may be relatively uninteresting and what parts may benefit from more detailed study. The use of stochastic process models, such as the ARIMA models of Box and Jenkins, is recommended as an appropriate tool for further research. Use of these techniques will continue to provide useful computer systems models which would not otherwise be available. 65 References Almon, Shirley, "The Distributed Lag Between Capital Appropriations and Expenditures," Econometrica , 33 (January, 1965), 178-196. Anderson, Harold A. and Sargent, Robert G., "A Statistical Evaluation of the Scheduler of an Experimental Interactive Computing System," in Walter Freiberger, ed . , Statistical Computer Performance Evaluation , New York: Academic Press, 1972, 73-98. Anderson, Harold A. and Sargent, Robert G., "Bibliography 31. Modeling, Evaluation, and Performance Measurements of Time-Sharing Computer Systems," Computing Reviews , 13 (December, 1972), 603-608. Bard, Yonathan, "Performance Criteria and Measurement for a Time-Sharing System," IBM Systems Journal , 10 (1971), 193-216. Bard, Yonathan and Suryanarayana, K. V., "On the Structure of CP-67 Overhead," in Walter Freiberger, ed . , Statistical Computer Performance Evaluation , New York: Academic Press, 1972, 329-346. Bock, M. E., Yancy, T. A., and Judge, G. 0., "The Statistical Consequences of Preliminary Test Estimators in Regression," Journa l of the American Statistical Association , 68 (March, 1973), 109-116. Boehm, B. W. and Bell, T. E., "Issues in Computer Performance Evaluation: Some Consensus, Some Divergence," Performance Evaluation Review , 4 (July, 1975), 4-39- Box, George E. P. and Jenkins, Gwilym M. , Time Series Analysis: forecasting and control , San Francisco: Holden-Day Inc., 1970. Calingaert, Peter, "System Performance Evaluation: Survey and Appraisal," Communications of the ACM , 10 (January, 1967), 12-13 Carter, C. E. and Pelg, E. , 360 Hardware Monitor , File No. 831 , Department of Computer Science, University of Illinois at Urbana-Champaign, 1970. 66 Drummond, M. E., Jr., "A Perspective on System Performance Evaluation," IBM Systems Journal , 8 (1969), 252-263- Drummond , M. E., Jr., Evaluation and Measurement Techniques for Digital Computer Systems , Englewood Cliffs: Prentice-Hall, 1973. Dhrymes, Phoebus, Distributed Lags: Problems of Estimation and Formulation , San Francisco: Holden-Day Inc., 1971. Goldberger, Arthur S. , Econometric Theory , New York: John Wiley & Sons, Inc., 1964. 
Goodman, Arnold F., "Measurement of Computer Systems - An Introduction," AFIPS Conference Proceedings , 41/11 (Fall Joint Computer Conference), Montvale, New Jersey: AFIPS Press, 1972, 669-680. Grenander, Ulf and Tsao, Rhett F. , "Quantitative Methods for Evaluating Computer System Performance: A Review and Proposals," in Walter Freiberger, ed . , Statistical Computer Performance Evaluation , Mew York: Academic Press, 1972, 3-24. IBM Corporation, "Appendix E: Software Security Performance Study," Data Security and Data Processing, Volume 3, Part 2, Study Results: State of Illinois , Order Number G320-1373, June, 1974, 201-225. IBM Corporation, MVT Guide , Order Number GC28-6720-4, March, 1972 IBM Corporation, MVT Supervisor , Order Number GY28-6659-5, January, 1971 . IBM Corporation, OS SMF , Order Number GC28-6712-7, April, 1973- Johnston, J., Econometric Methods , Second Edition, New York: McGraw Hill, 1972. Judge, G. G., Bock, M. E., and Yancy, T. A., "Post Model Evaluation," The Review of Economics and Statistics , LVI (May, 1974), 245-253- 67 Judge, G. G., Yancy, T. A., and Bock, M. E., "Properties of Estimators after Preliminary Tests of Significance When Stochastic Restrictions Are Used in Regression," Journal of Econometrics , 1 (1973), 29-48. Kimbleton , Stephen R., "The Role of Computer System Models in Performance Evaluation," Communications of the ACM , 15 (July, 1972), 586-590. Kimbleton, Stephen R., "Performance Evaluation - A Structured Approach," AFIPS Conference Proceedings , 40 (Spring Joint Computer Conference), Montvale, New Jersey: AFIPS Press, 1972, 411-416. Lucas, Henry C, Jr., "Performance Evaluation and Monitoring," Computing Surveys , 3 (September, 1971), 79-91. Mamrak, Sandra Ann, Simulation Analysis of a Pay-For-Priority Scheme for the IBM 360/75 . Report No. UIUCDCS-R-73-605 , Department of Computer Science, University of Illinois at Urbana-Champaign, August, 1973. Mamrak, Sandra Ann and Randal, J. M. , An Analysis of a Software Engineering Failure , Unpublished manuscript, Computing Services Office, University of Illinois at Urbana-Champaign, 1974. McKinney, J. M. , "A Survey of Analytical Time-Sharing Models," Comput ing Surveys , 1 (June, 1969), 105-116. Miller, Edward F., Jr., Bibliography and KWIK Index on Computer Performance Measurement , Santa Barbara, California: General Research Corporation, June, 1973- Robinson, Louis, "Computer Systems Performance Evaluation (and Bibliography)," Unpublished manuscript, IBM Corporation, November, 1972 Schatzoff, Martin and Bryant, Peter, "Regression Methods in Performance Evaluation: Some Comments on the State of the Art," Proceedings of Computer Science and Statistics: Seventh Annual Symposium on the Interface , Ames, Iowa: Iowa State University, 1973, 8-57. 68 Schatzoff, Martin and Tillman, C.C., "Statistical Validation of a Trace-driven Simulmator," Performance Evaluation Review , 3 (December, 1974), 82-93- Theil, Henri, "On the Use of Incomplete Prior Information in Regression Analysis," Journal of the American Statistical Association , 58 (June, 1963), 404-414. Theil, Henri, Principles of Econometrics , New York: John Wiley & Sons, Inc. , 1971 . Theil, Henri and Goldberger, Arthur S. , "On Pure and Mixed Statistical Estimation in Economics," International Economic Review , 2 (January, 1961), 65-78. Williams, Thomas, "Computer Systems Measurement and Evaluation," The Computer Bulletin , 16 (February, 1972), 100-104. 
69 Appendix A This appendix gives the beginning and ending times for each of the 89 observations used in the estimation of throughput. Column one is the observation number. Column two indicates whether the observation was included as part of the first or second sample. Column three is the day of the observation. Column four is the time the observation started in hours, minutes, and seconds according to a twenty-four hour clock. Column five is the ending time of the observation. Column six is the number of Hasp initiators which were active during the observation. Obs. Sample Day Start End Inits 1 1 09/19/74 13.01.35 13-17.16 5 2 2 0Q/19/74 13.17.16 13.33.13 5 3 1 09/19/74 13.33.13 13.48.59 5 4 2 09/19/74 13-48.59 14.04.38 5 5 1 0Q/19/74 14.04.38 14.20.14 6 6 2 09/19/74 14.20.14 14.36.01 6 7 1 09/19/74 14.36.01 14.51 .45 6 8 2 09/19/74 14.51.45 15.07.22 6 9 1 09/19/74 15.23.05 15.38.44 5 10 2 09/19/74 15.38.44 15.54.50 5 11 1 09/19/74 15.54.50 16.10.48 5 12 2 09/19/74 16.10.48 16.26.34 5 13 1 09/20/74 13.29.36 13.45.27 6 14 2 09/20/74 13.45.27 14.01 .11 6 15 1 09/20/74 14.01.11 14.16.43 6 16 1 09/20/74 14.32.13 14.47.46 5 17 2 09/20/74 14.47.46 15.03.33 5 18 1 09/20/74 15.03.33 15.19.32 5 19 2 09/20/74 15.19.32 15.35.15 5 20 1 09/20/74 15.35.15 15.51.12 7 21 2 09/20/74 15.51.12 16.06.45 7 22 1 09/20/74 16.06.45 16.22.15 7 23 2 09/20/74 16.22.15 16.37.50 7 24 1 09/23/74 14.08.00 14.23-44 6 25 2 09/23/74 14.23-44 14.39-37 6 26 1 09/23/74 14.39-37 14.55.21 6 27 2 09/23/74 14.55.21 15.11 .08 6 28 1 09/23/74 15.26.49 15.42.33 5 70 Obs. Sample Day Start End Inits. 29 2 09/23/74 15.42.33 15.58.06 5 30 1 09/23/74 15.58.06 16.13.43 5 31 2 09/23/74 16.32.36 16.48.19 5 32 1 09/24/74 12.54.56 13.10.44 5 33 2 09/24/74 13.10.44 13-26.24 5 34 1 09/24/74 13-26.24 13-42.10 5 35 2 09/24/74 13-42.10 13-57.46 5 36 1 09/24/74 13-57.46 14.13-22 6 37 2 09/24/74 14.13-22 14.29.05 6 38 1 09/24/74 14.29.05 14.44.48 6 39 2 09/24/74 14.44.48 15.00.31 6 40 1 09/24/74 15.00.31 15.16.07 7 41 2 09/24/74 15.16.07 15.31.58 7 42 1 09/24/74 15.31.58 15.47.38 7 43 2 09/24/74 15.47.38 16.03.11 7 44 1 09/25/74 14.47.39 15.04.00 6 45 2 OQ/25/74 15.04.00 15.19.43 6 46 1 09/25/74 15.19.43 15.35.13 6 47 2 09/25/74 15.35.13 15.50.46 6 48 1 09/26/74 15.11.19 15.27.03 7 49 2 09/26/74 15.27.03 15.42.40 7 50 1 09/26/74 15.42.40 15.58.12 7 51 2 09/26/74 15.58.12 16. 14. 18 7 52 1 09/26/74 16.30.12 16.46.08 5 53 2 09/26/74 16.46.08 17.01.55 5 54 1 09/26/74 17.01.55 17.17.45 5 55 2 09/26/74 17.17.45 17.33.20 5 56 1 09/27/74 15.54.28 16.10.06 6 57 2 09/27/74 16.39-04 16.54.36 6 58 1 09/27/74 16.54.36 17.10.18 6 59 2 09/27/74 17.10.18 17.25.53 6 60 2 09/30/74 14.39.05 14.54.59 7 61 1 09/30/74 14.54.59 15.10.36 7 62 2 09/30/74 15.10.36 15.26.18 7 63 1 09/30/74 15.41.48 15.57.25 4 64 2 09/30/74 15.57.25 16.13-01 4 65 2 09/30/74 16.28.47 16.44.25 4 66 1 10/01/74 14.30.39 14.46.08 5 67 2 10/01/74 14.46.08 15.01.50 5 63 1 10/01/74 15.01.50 15.17.26 5 69 2 10/01/74 15.17.26 15.33-07 5 70 1 10/01/74 15.33.07 15.48.41 6 71 2 10/01/74 15.48.41 16.04.29 6 72 1 10/01/74 16.04.2Q 16.20.11 6 73 2 10/01/74 16.20.11 16.35.50 6 74 1 10/01/74 16.51.36 17.07.31 5 75 2 10/01/74 17.07.31 17.23.08 5 76 1 10/01/74 17.23-08 17.38.50 5 77 2 10/01/74 17.38.50 17.54.28 5 71 Obs. Sample Day Start End Ini 78 1 10/01/74 18.09.49 18.25.18 4 79 2 10/02/74 16.50.42 17.06.21 4 80 1 10/02/74 17.06.21 17-21.52 4 81 1 10/03/74 13-51.17 14.06.59 . 
5 82 2 10/03/74 14.06.59 14.22.59 5 83 1 10/03/74 14.22.59 14.38.40 5 84 2 10/03/74 14.38.40 14.54.21 5 85 1 10/04/74 15.48.02 16.03-55 5 86 2 10/04/74 16.03-55 16.19.43 5 87 1 10/04/74 16.19.43 16.35.22 5 88 2 10/04/74 16.35.22 16.51.14 5 89 1 10/04/74 16.51.14 17.07.22 5 12 Appendix B This appendix gives the change in the counter values on the hardware monitor during each of the 89 observations. Units are in tenths of seconds of activity. Column one is the observation number, and columns two through nine are the observed changes in counter values. Counter one measured the amount of time the instruction counter was addressing instructions within the Plorts region. Counter two was allowed to run free and measured the elapsed time from the point of view of the hardware monitor. Counter three measured wait light activity. Counter four measured activity on channel 0, the multiplexor channel. Counters five through eight measured activity on each of the selector channels 1 through 4. Column ten is the ratio of counter three to counter two multiplied by one hundred thereby giving the percent of wall time that the machine was in the wait state. Wait Obs. Plorts Time Wait Ch Ch 1 Ch 2 Ch 3 Ch 4 Time 1 1699 9975 84 355 1608 2409 1819 2291 0.8 2 1638 9329 304 281 1566 2426 2068 2819 3-3 3 1301 9647 128 603 1190 2657 1501 2055 1.3 4 1075 9486 428 841 1968 1726 1379 2574 4.5 5 1249 9508 170 807 2466 2155 1796 2971 1.8 6 924 9614 43 2486 2515 2356 1715 2395 0.4 7 2100 9581 215 747 2010 1653 2229 0.0 8 973 9858 5 250 1623 1926 1960 2046 0.1 9 287 9481 327 323 1939 3118 2475 3502 3-4 10 708 9734 199 927 2056 2960 2784 2726 2.0 11 746 9721 383 578 1209 4667 2776 1989 3-9 12 875 9609 196 1460 1753 3348 1 963 3060 2.0 13 2309 9681 34 2062 1039 1232 1228 1529 0.4 14 2194 9570 120 2462 1340 1384 1176 2015 1.3 15 2356 9478 161 1360 1030 1344 1421 1279 1.7 16 2158 9502 35 1290 939 1314 1815 1944 0.4 17 1028 9601 130 1677 1874 1957 1471 1886 1.4 73 Wait Obs. 
Plorts Time Wait Ch Ch 1 Ch 2 Ch 3 Ch 4 Time 18 760 9838 534 1122 2760 3150 1972 2241 5.4 19 460 9691 601 480 2742 3517 157 3 2006 6.2 20 394 9545 286 317 2629 2392 2306 2412 3-0 21 1399 9477 11 337 1885 1711 1499 1706 0.1 22 3221 9441 1 155 1005 1244 1226 1823 0.0 23 2966 9493 1 202 1205 1271 1431 2131 0.0 24 1940 9656 111 2 39 1421 3012 1458 2574 1.1 25 1841 9674 4 244 1499 2464 1345 3151 0.0 26 1462 9992 1 253 1492 1896 1607 3447 0.0 27 1104 9273 8 296 1544 1766 2150 2207 0.1 28 1184 9623 3 261 978 1696 2022 1856 0.0 29 1021 9524 3 168 1449 1979 2121 1612 0.0 30 6 36 9518 87 1049 1988 1871 2012 1828 0.9 31 665 9527 8 350 1639 2277 1586 1120 0.1 32 664 9621 21 1824 2270 2694 1835 1897 0.2 33 881 9590 3 556 1244 1443 1919 1548 0.0 34 1370 9656 153 1705 1433 1723 1867 205 3 1.6 35 1096 9615 4 1844 1087 2085 1432 1660 0.0 36 726 9484 132 2430 1808 1872 1574 2192 1.4 37 1031 9617 1007 1036 1423 1586 2507 0.0 38 918 9639 2 1149 1283 1685 1500 2318 0.0 39 1320 9563 8 1580 1411 1650 1523 2560 0.1 40 1233 9760 2 1211 1096 1652 1671 2897 0.0 41 1202 9517 6 725 987 1648 1442 2500 0.1 42 1494 9538 9 1239 1095 1456 1587 1908 0.1 43 2520 9520 4 864 960 1468 1375 1570 0.0 44 722 9978 712 4192 1594 7450 1837 1700 7.1 45 994 9618 125 1694 1939 2509 1767 1498 1.3 46 1025 9477 1 314 432 1018 1060 1261 0.0 47 2353 9548 1 339 1003 1587 1501 2176 0.0 48 684 9634 27 374 1716 2116 1846 1548 0.3 49 585 9541 196 295 1982 2194 2143 1687 2.1 50 924 9512 1 413 1524 1683 1613 1815 0.0 51 1001 9834 7 214 2063 2079 1860 1856 0.1 52 686 9741 383 320 2436 2841 2610 1570 3-9 53 611 9668 294 298 2904 2249 2305 1634 3-0 54 428 9955 493 2193 3373 2276 2561 2237 5.0 55 565 9231 153 1398 1970 3078 1834 1308 1.7 56 675 9813 22 762 1552 2210 7862 1359 0.2 57 830 10100 24 982 1885 1915 8541 1554 0.2 58 695 9488 27 481 2596 1958 8944 2066 0.3 59 362 9748 232 1311 2108 1880 8935 1332 2.4 60 699 10048 26 360 1266 2700 34 37 2255 0.3 61 1127 9515 24 1199 579 2263 2958 3150 0.3 62 809 9633 210 900 2818 2607 3909 2389 2.2 63 842 9656 127 253 2191 2402 3507 3187 1.3 64 1040 9701 96 386 2173 2278 3568 2555 1.0 65 1019 9701 97 350 1537 157 3 4240 1966 1 .0 66 966 8750 2 188 819 1746 1381 935 0.0 74 Wait ibs. Plorts Time Wait Ch Ch 1 Ch 2 Ch 3 Ch 4 Time 67 1175 8567 108 229 1431 1900 1340 1471 1.3 68 874 8757 2 168 1039 2259 1421 911 0.0 69 846 8615 197 1116 1554 1273 1219 0.0 70 1346 8675 1 204 670 1392 1597 1154 0.0 71 1234 8791 345 903 1189 956 1284 0.0 72 1900 8734 183 325 784 803 907 0.0 73 1730 8689 784 658 1175 1029 1509 0.0 74 621 8877 1257 1985 2688 1322 1166 0.0 75 368 8753 12 699 1711 2311 1777 1002 0.1 76 185 8678 254 161 2400 2151 2126 1117 2.9 77 141 8718 1096 2247 2355 1533 1626 740 12.6 78 135 8681 662 1925 2463 2290 1624 1202 7.6 79 245 9425 83 2975 2038 2077 1256 1575 0.9 80 365 9309 150 2402 1997 1461 1131 1640 1.6 81 986 9442 209 326 1602 2153 1363 1621 2.2 82 856 950 3 101 218 1613 2139 1591 1971 1 .1 83 626 9516 326 854 2552 2134 1400 2081 3.4 84 634 9514 409 694 2220 2484 1945 2714 4.3 85 1000 9472 63 2Q5 2097 1936 1745 1738 0.7 86 1039 9472 27 425 1769 18Q8 1523 1140 0.3 87 606 9341 11 676 1032 1297 1036 1180 0.1 88 50 3 Q548 60 303 1307 1541 1011 1241 0.6 89 482 9578 96 30 3 1635 1522 1054 2445 1.0 75 Appendix C This appendix gives the console log data for the 89 data observations. Column two, Plorts CPU time is given in seconds. Column three is the average number of timesharing terminals active during the observation. 
The last two columns are the backlog, or total number of jobs in the system, at the beginning and at the end of the observation. Plorts Start End Obs. CPU Terminals Backlog Backlog 1 346 33 162 177 2 332 35 177 189 3 292 27 189 211 4 251 28 211 219 5 247 25 219 234 6 183 26 234 226 7 419 35 226 275 8 255 25 275 294 9 77 18 281 276 10 194 36 276 275 11 214 33 275 291 12 227 26 291 299 13 386 34 76 74 14 386 30 74 84 15 368 21 84 88 16 396 37 86 100 17 265 31 100 123 18 207 26 123 109 19 133 16 109 115 20 115 20 115 114 21 248 21 114 109 22 474 19 109 126 23 4 39 27 126 143 24 341 33 106 107 25 370 34 107 109 26 305 31 109 119 27 256 23 119 132 28 332 37 134 155 29 249 28 155 167 30 158 27 167 156 31 159 21 174 158 32 152 25 89 91 33 221 29 91 100 34 273 34 100 108 35 264 28 108 128 36 195 26 128 140 76 Plorts Start End Obs. CPU Terminals Backlog Backlog 37 293 26 140 169 38 226 28 169 187 39 256 22 187 177 40 267 31 177 190 41 300 38 190 197 42 341 35 197 215 43 386 33 215 225 44 181 30 115 118 45 238 28 118 122 46 243 31 122 140 47 425 33 140 158 48 184 24 119 131 49 148 26 131 119 50 217 29 1 19 119 51 2 37 31 119 129 52 224 27 119 134 53 152 22 134 126 54 128 19 126 122 55 118 18 122 113 56 131 21 113 115 57 136 15 148 140 58 141 13 140 126 59 65 11 126 108 60 189 26 125 128 61 396 28 128 148 62 202 29 148 145 63 214 32 157 145 64 245 33 145 136 65 267 21 127 146 66 271 31 161 174 67 305 31 174 186 68 258 34 186 197 69 260 29 197 216 70 339 32 216 224 71 350 39 224 240 72 418 34 240 255 73 391 36 255 256 74 178 17 274 275 75 96 13 275 260 76 63 10 260 243 77 42 7 243 214 78 40 8 209 189 79 82 16 281 254 80 119 15 254 238 81 313 38 100 100 82 303 39 100 123 83 215 34 123 131 84 204 27 131 138 85 279 32 161 166 Plorts Start End Obs. CPU Term inals Backlog Backlog 86 315 32 166 169 87 221 31 169 178 88 179 26 178 179 89 155 21 179 176 77 78 Appendix D This appendix gives the Express CPU time, Plot time, number of cards read, cards punched, and lines printed from the ARDS records. It also gives the throughput per observation as determined from SMF step termination records. Express CPU time and throughput are in centiseconds, and Plot time is in seconds of plotter residency. Throughput values have been rounded for printing purposes. Obs. Expr CPU Plot Read Punch Print Throughput 1 4287 16156 30262 23679-78 2 4808 43 10542 12262 20474.63 3 5815 14229 24886 25717.15 4 3135 11667 5283 11665 25847.53 5 3787 29 17448 50208 29760.43 6 3839 97 7780 1753 23979 35045.50 7 3982 25151 481 18505 21009.07 8 5231 27900 139 12403 29 306.27 9 4840 8679 1010 22777 41351.80 10 5960 9107 18213 36391.33 11 3325 269 12408 367 23269 36179.34 12 6035 Q441 102 24400 31599.08 13 3621 524 8394 111 15217 17281.24 14 3573 9855 100 10672 19224.89 15 4631 215 10006 206 13101 17109.84 16 3121 18663 13905 20600.91 17 5 321 12026 321 16027 22092.91 18 3766 242 17976 884 14444 25867.40 19 5444 12926 5267 28999 32604.1 1 20 3912 97 15066 4755 20505 31368.62 21 2948 294 9273 547 24858 27560.84 22 2884 23 13284 41 6114 23790.99 23 3719 14914 11407 24256.35 24 3733 178 12001 796 19297 23184.84 25 4639 4 14565 1276 12210 22587-13 26 4011 304 9371 1091 15397 23240.39 27 4942 20749 90 22347 23202.74 28 10429 12415 90 14328 10333.30 29 6045 15955 71 21398 20487.13 30 4 328 11152 539 29036 19059.25 31 5857 9164 28 20261 24776.1 1 32 5146 7768 12724 37675.44 33 3808 15394 78 11937 39880.10 34 5721 11591 78 43371 31606.94 35 3780 8815 94 12454 34082.05 79 Obs. 
Expr CPU Plot Read Punch Print Throughput 36 5615 11449 15446 25935.07 37 5626 24061 117 19821 21442.54 38 7470 7735 14294 23326.19 39 6312 781 12068 156 18053 21279.14 40 8220 562 17461 507 16972 21190.25 41 7667 484 17616 1498 21242 18216.73 42 5818 209 14722 244 11 165 19456.75 43 6935 43 7426 1608 19167 18789-14 44 4987 96 8931 148 13447 29771.43 45 5318 205 36 19135 30324.09 46 5685 8700 12311 30275.58 47 6 329 12773 69 134 38 23889.09 48 7920 63 10555 302 16028 28002.83 49 1565 2352 24981 40314.05 50 3194 185 9264 1000 45690 40757.64 51 4719 9624 33 13787 36032.01 52 5665 4 8729 33 24454 27375.87 53 4596 180 12647 2418 26101 27607.90 54 3693 252 14593 1254 19734 34 371.25 55 3835 4039 23756 42044.56 56 6091 8354 23119 19810.15 57 34 35 826 8068 755 21424 48746.87 58 3423 19 7785 601 15910 43152.85 59 1280 39 2575 359 56025 51729.26 60 4920 11025 691 31343 38016.07 61 5818 130 27642 101 27803 20214.71 62 3466 111 8598 193 21358 34676.37 63 2718 66 3853 859 8314 43105.82 64 411 1 250 5944 5 27199 33524.71 65 4729 15756 78 25995 25684.61 66 4298 15367 551 36045 33054.95 67 4181 21476 50 23178 28692.12 68 6142 253 11803 18284 28082.97 69 6185 151 11680 103 16239 29477.35 70 4932 10223 1027 18314 24831-40 71 2999 199 9262 1566 22889 17735.24 72 5148 10943 9032 12531.63 73 2878 11781 1937 11017 20610.69 74 3464 12470 677 22615 28754.78 75 3387 381 6613 571 20624 37445.29 76 2968 604 2175 13281 47739.02 77 1000 208 802 1293 25514 46261.18 78 2450 2249 20519 51659.66 79 2970 1211 6224 1542 25053 47422.83 80 2181 733 7428 1091 22981 42354.60 81 6099 61 10855 33419 18910.37 82 7166 17823 19697 15809.63 83 5330 86 13459 629 23362 22687.15 84 6467 15097 13136 25545.34 85 5087 270 10967 16321 24632.94 Obs. Expr CPU Plot Read Punch Print Throughput 86 7665 14156 126 21056 21026.09 87 5020 4 7652 2555 14300 23356.05 88 5626 7349 590 15956 15837.43 89 5085 243 7802 14739 18728.57 80 81 Appendix E This appendix gives the complete set of transformed variables, both independent and dependent, used in the estimation of throughput. All time variables (i.e. Plorts, Wait, Channels through 4, Express CPU, Plot, and Throughput) are in units of seconds of activity per fifteen minutes. Cards read, cards punched, and lines printed are in units of records per fifteen minutes. The first column is the observation number. Remaining columns are identical to, and in the same order as, the list of variables given in Section 4.1. The second column, labelled C is a constant column of ones. The columns labelled 5, 6, and 7 are dummy variables which have the value one if that observation had the corresponding number of active initiators for processing Hasp batch jobs, and have the value zero otherwise. Data values have been rounded for printing purposes. Obs C 5 6 7 Plorts Wait Ch Ch 1 Ch 2 Ch 3 Ch 4 1 1 1. 0. 0. 330.92 7.58 32.03 145.08 217.35 164.12 206.71 2 1 1. 0. 0. 312.23 29-33 27.11 151.08 234.04 199.51 271 .96 3 1 1. 0. 0. 277.80 11.94 56.26 111 .02 247.88 140.03 191 .72 4 1 1. 0. 0. 240.58 40.61 79.79 186.72 163-76 130.83 244.21 5 1 0. 1 . 0. 237.50 16.09 76.39 233-42 203-99 170.00 281 .23 6 1 0. 1 . 0. 173-92 4.03 232.72 235.44 220.55 160.55 224.20 7 1 0. 1 . 0. 399.47 0.00 20.20 70.17 188.81 155.28 209-38 8 1 0. 1 . 0. 244.93 0.46 22.82 148.17 175.84 178.94 186.79 9 1 1. 0. 0. 73-80 31.04 30.66 184.06 295.98 234.94 332.43 10 1 1 . 0. 0. 180.75 18.40 85.71 190.10 273-68 257.41 252.04 11 1 1 . 0. 0. 201 .04 35.46 53-51 111.93 432.09 257.01 184.15 12 1 1 . 0. 0. 215-96 18.36 136.75 164. 
1Q 313-58 183.86 286.61 13 1 0. 1 . 0. 365.30 3.16 191-70 96.59 114.53 114.16 142.14 14 1 0. 1 . 0. 368.01 11.29 231.54 126.02 130.16 110.60 189-50 15 1 0. 1 . 0. 355.36 15.29 129.14 97.81 127.62 134.93 121.45 16 1 . 1 . 0. 0. 381.99 3-32 1 22 . 1 8 88.94 124.46 171 .91 184.13 17 1 . 1. 0. 0. 251.85 12.19 157-20 175.67 183.45 137.89 176. 79 18 1 1. 0. 0. 194.26 48.85 102.64 252.49 288.17 180.40 205.01 19 1 1. 0. 0. 126.94 55.81 44.58 254.65 326.62 146.08 186.30 20 1 . 0. 0. 1 . 108.15 26.97 29-89 247.89 225.54 217.43 227.43 21 1 . 0. 0. 1 . 239.23 1.04 32.00 179-01 162.49 142.36 162.01 82 Obs C 5 6 7 Plorts Wait Ch Ch 1 Ch 2 Ch 3 Ch 4 22 1 . 0. 0. 1 . 458.71 0.10 14.78 95.81 118.59 116.87 173.78 23 1 . 0. 0. 1 . 422.57 0.09 19.15 114.24 120.50 135.67 202.03 24 1 . 0. 1 . 0. 325.1 1 10.35 22.28 132.45 280.74 135.89 239.91 25 1 . 0. 1 . 0. 349-42 0.37 22.70 139.46 229.23 125.13 293.15 26 1 . 0. 1 . 0. 290.78 0.09 22.79 134.39 170.78 144.75 310.48 27 1 . 0. 1 . 0. 243.29 0.78 28.73 149.85 171 .40 208.67 214.20 28 1 1 . 0. 0. 316.53 0.28 24.41 91 .47 158.62 189.11 173.58 29 1 1 . 0. 0. 240.19 0.28 15.88 136.93 187.01 200.43 152.33 30 1 1 . 0. 0. 151 .76 8.23 99-19 187.98 176.92 190.25 172.85 31 1 1 . 0. 0. 151.75 0.76 33.06 154.83 215.10 149.83 105.80 32 1 1 . 0. 0. 144.30 1.96 170.63 212.35 252.01 171.66 177-46 33 1 1 . 0. 0. 211 .60 0.28 52.18 116.75 135.42 180.09 145.28 34 1 1 . 0. 0. 259-73 14.26 158.92 133-56 160.59 174.02 191.35 35 1 1 . 0. 0. 253-85 0.37 172.61 101.75 195.16 134.04 155.38 36 1 . 0. 1 . 0. 187-50 12.53 230.60 171.57 177-65 149.37 208.01 37 1 . 0. 1 . 0. 279-64 0.00 94.24 96.95 133.17 148.42 234.62 38 1 . 0. 1 . 0. 215.69 0.19 107-28 119.79 157-33 140.06 216.43 39 1 0. 1 . 0. 244.33 0.75 148.70 132.79 155.29 143-33 240.93 40 1 0. 0. 1 . 256.73 0.18 111 .67 101.07 152.34 154.09 267-14 41 1 0. 0. 1 . 283-91 0.57 68.56 93.34 155.85 136.37 236.42 42 1 0. 0. 1 . 326.49 0.85 116.91 103.32 137.39 149.75 180.04 43 1 0. 0. 1 . 372.35 0.38 81.68 90.76 138.78 129.99 148.42 44 1 0. 1 . 0. 166.06 64.22 378.11 143.78 671 .98 165.69 153.34 45 1 0. 1 . 0. 227.15 11 .70 158.52 181 .44 234.78 165.35 140.17 46 1 0. 1 . 0. 235.16 0.09 29.82 41.03 96.68 100.66 119.75 47 1 0. 1 . 0. 409-97 0.09 31.95 94.54 149.59 141.49 205.1 1 48 1 0. 0. 1 . 175.42 2.52 34.94 160.31 197.67 172.45 144.61 49 1 0. 0. 1 . 142.16 18.49 27.83 186.96 206.96 202.15 159.13 50 1 0. 0. 1 . 209-55 0.09 39.08 144.20 159.24 152.62 171.73 51 1 0. 0. 1 . 220.81 0.64 19.59 188.80 190.27 170.23 169.86 52 1. 1. 0. 0. 210.88 35-30 29.57 225.07 262.49 241.15 145.06 53 1. 1. 0. 0. 144.46 27.37 27.74 270.34 209.36 214.57 152.11 54 1 1 . 0. 0. 121 .26 44.57 198.26 304.94 205.77 231.53 202.24 55 1 1. 0. 0. 113.58 14.92 136.30 192.07 300.10 178.81 127.53 56 1 0. 1 . 0. 125.69 2.02 69.89 142.34 202.69 721 .06 124.64 57 1 0. 1 . 0. 131.33 2.14 87.50 167.97 170.64 761.08 138.48 58 1 0. 1 . 0. 134.71 2.56 45.63 246.25 185.73 848.40 195.97 59 1. 0. 1 . 0. 62. S7 21 .42 121.04 194.62 173-57 824.94 122.98 60 1 0. 0. 1 . 178.30 2.33 32.25 113-40 241.84 307.85 201.98 61 1 0. 0. 1 . 380.36 2.27 1 13.41 54.77 214.05 279.79 297.95 62 1. 0. 0. 1 . 192.99 19.62 84.09 263-28 243-57 365.21 223.20 63 1. 0. 0. 0. 205.55 11.84 23.58 204.21 223.88 326.87 297.05 64 1. 0. 0. 0. 235.58 8.91 35.81 201.60 211 .34 331.02 237.04 65 1 0. 0. 0. 256.18 9.00 32.47 142.59 145.93 393-36 182.39 66 1. 1 . 0. 0. 262.54 0.21 19.34 84.24 179.59 142.05 96.17 67 1. 1. 0. 0. 291 .40 11.35 24.06 150.33 199.60 140.77 154.53 68 1. 1 . 0. 0. 
248.08 0.21 17.27 106.78 232.17 146.04 93-63 69 1. 1 . 0. 0. 248.67 0.00 20.58 116.59 162.34 132.99 127-35 70 1 0. 1 . 0. 326.66 0.10 21 .16 69.51 144.41 165.68 119-72 71 1 0. 1 . 0. 332.28 0.00 35.32 92.45 121.73 97.87 131.45 83 Obs C 5 6 7 Plorts Wait Ch Ch 1 Ch 2 Ch 3 Ch 4 72 0. 1.0. 399.36 0.00 18.86 33 .49 80.79 82.75 93-46 73 0. 1.0. 374.76 0.00 31 .21 68 .16 121.71 106.58 156.30 74 1. 0. 0. 167.75 0.00 127.44 201 .25 272.52 134.03 118.22 75 . 1.0.0. 92.21 1.23 71.87 175 .93 237.62 182.71 103-03 76 . 1. 0. 0. 60.19 26.34 16.70 248 .91 223.08 220.49 115.84 77 1. 0. 0. 40.30 ' 13-15 231-97 243 .12 158.26 167.86 76.39 78 0. 0. 0. 38.75 68.63 199.57 255 .35 237.42 168.37 124.62 79 0. 0. 0. 78.59 7.93 284.08 194 .61 198.33 119.94 150.40 80 0. 0. 0. 115.04 14.50 232.23 193 .07 141.25 109-35 158.56 81 1. 0. 0. 299-04 19.92 31-07 152 .70 205.22 129.92 154.51 82 1. 0. 0. 284.06 9.57 20.65 152 .76 202.58 150.68 186.67 83 1. 0. 0. 205.63 30.83 30.77 241 .36 201.83 132.41 196.82 84 1. 0. 0. 195.11 38.69 55.65 210 .01 234.98 183.99 256.74 85 1. 0. 0. 263-48 5.99 28.03 199 .25 183-95 165.80 165.14 86 1. 0. 0. 299.05 2.57 40.38 168 .08 180.34 144.71 108.32 87 1. 0. 0. 211 .82 1.06 55.13 99 .43 124.97 99.82 113-69 88 1. 0. 0. 169.22 5.66 28.56 123 .20 145.26 95.30 116.98 89 1. 0. 0. 144.11 9.02 28.47 153 .63 143.02 99.04 229.75 Obs Ti ;rms Backlog ; Expr CPL I Plot Read Punch Print Throughput 1 3: ]. 169-50 41.00 0.00 15452 .07 0.00 28943.46 226.48 2 v. 5. 183-00 45.22 40.44 9914 .11 0.00 11531.66 192.55 3 r r . 200.00 55.32 0.00 13537 .10 0.00 23675.90 244.67 4 2i I. 215.00 30.05 0.00 11182 .43 5063.58 11180.51 247.74 5 2\ >. 226.50 36.41 27.88 16776 .92 0.00 48276.92 286.16 6 2( ). 230.00 36.48 92.19 7393 .88 1666.00 22788.91 333-06 7 3 C 5. 250.50 37.96 0.00 23978 .71 458.58 17642.48 200.30 8 2\ S. 284.50 50.24 0.00 26798 .29 133.51 11913.23 281.49 9 M ]. 278.50 46.39 0.00 8318 .53 968.05 21830.99 396.34 10 3( ). 275.50 55.53 0.00 8484 .78 0.00 16968.63 339.05 11 3': ]. 283-00 31.24 252.71 11656 .78 344.78 21860.23 339-89 12 2t j. 295.00 57.42 0.00 8981 .92 97.04 23213.53 300.63 13 3' L. 75.00 34.27 495.90 7943 .85 105.05 14400.95 163-54 14 3( ). 79.00 34.06 0.00 9395 .66 95.34 10174.58 183.29 15 21 86.00 44.72 207.62 9662 .45 198.93 12651.18 165.22 16 3 r r. 93-00 30.11 0.00 18002 .89 0.00 13413.18 198.72 17 31 111.50 50.57 0.00 11429 .14 305.07 15231.57 209.96 18 2£ j. 116.00 35.34 227.11 16870 07 829.61 13555.37 242.76 19 M >. 112.00 51.96 0.00 12336 59 5026.83 27676.67 311.17 20 2C ). 114.50 36.79 91 .22 14168 .65 4471.79 19283.70 295.00 21 2' 111.50 28.44 283-60 8945 .02 527.65 23978.78 265.86 22 1< ). 117-50 27.91 22.26 12855 .48 39.68 5916.77 230.24 23 2' 134.50 35.80 0.00 14355 72 0.00 10980.00 233-48 24 3; }. 106.50 35.59 169-70 11441 63 758.90 18397.56 221.04 25 3 l [. 108.00 43.81 3-78 13754 98 1205.04 11530.95 213.31 26 31 114.00 38.24 289.83 8934 .22 1040.15 14679-34 221.57 27 2: ]. 125.50 46.97 0.00 19719 .22 85.53 21237.91 220.51 28 3 r r. 144.50 99.43 0.00 11836 33 85.81 13660.17 98.52 29 2* ]. 161.00 58.31 0.00 15390 .68 68.49 20641.16 197.63 84 Obs Terms Backlog Expr CPU Plot Read Punch Print Throughput 30 27- 161.50 41.57 0.00 10711 .63 517.72 27889.43 183.07 31 21. 166.00 55-90 0.00 8746.13 26.72 19337.12 236.46 32 25. 90.00 48.85 0.00 7374.68 0.00 12079.75 357.68 33 29. 95.50 36.46 0.00 14738.94 74.68 11429.04 381.83 34 34. 104.00 54.43 0.00 11027.38 74.21 41262.05 300.70 35 28. 118.00 36.35 0.00 8475-96 90.38 11975.00 327.71 36 26. 
134.00 53-99 0.00 11008.65 0.00 14851.92 249-38 37 26. 154.50 53.69 0.00 22963-84 111.66 18917.18 204.65 38 28. 178.00 71.29 0.00 7382.29 0.00 13642.21 222.63 39 22. 182.00 60.24 745.39 11517.71 148.89 17229.80 203.09 40 31. 183-50 79-04 540.38 16789.42 487.50 16319.23 203.75 41 38. 193-50 72.56 458.04 16671 -29 1417.67 20102.84 172.40 42 35. 206.00 55.70 200.11 14095.53 233.62 10689.89 186.29 43 33- 220.00 66.90 41.48 7163.34 1551.13 18489.07 181.25 44 30. 116.50 45.75 88.07 8193-58 135.78 12336.70 273.13 45 28. 120.00 50.76 0.00 19599.58 0.00 18262.46 289.41 46 31. 131 -00 55.02 0.00 8419.35 0.00 11913.87 292.99 47 33. 14Q.00 61 .05 0.00 12321.22 66.56 12962.70 230.44 48 24. 125.00 75.51 60.06 10063.03 287.92 15280.93 266.98 49 26. 125.00 15.03 0.00 2259. 12 0.00 23994.56 387.22 50 29. 119-00 30.84 178.65 8945.92 965.67 44121.24 393-58 51 31. 124.00 43-97 0.00 8966.46 30.75 12845.03 335.70 52 27. 126.50 53-33 3-77 8217.68 31.07 23021 .55 257.72 53 22. 130.00 43.68 171 .07 12019.32 2297.99 24805-60 262.38 54 19. 124.00 34. Q9 238.74 13824.95 1188.00 18695.37 325.62 55 18. 117.50 36.91 0.00 3887.81 0.00 22866.74 404.71 56 21 . 1 14.00 58.44 0.00 8015.57 0.00 22132.41 190.08 57 15. 144.00 33.17 797.64 7790.99 729.08 20688.41 470.73 58 13- 133-00 32.70 18.15 7437.90 574.20 15200.64 412.29 59 1 1 . 1 17-00 12.32 37.54 2478.61 345.56 53927.81 497.93 60 26. 126.50 46.42 0.00 10400.94 651 .89 29568.87 358.64 61 28. 138.00 55.88 124.87 26550.48 97-01 26705.12 194.16 62 29. 146.50 33.11 106.05 8214.65 184.39 20405.73 331-30 63 32. 151 .00 26.11 63-39 3700.85 825.08 7985.70 414.04 64 33- 140.50 39.53 240.38 5715.38 4.81 26152.88 322.35 65 21 . 136.50 45-37 0.00 15117.70 74.84 24941 .90 246.44 66 31 • 167-50 41.64 0.00 14887.30 533-80 34919.81 320.23 67 31. 180.00 39.95 0.00 20518.47 47.77 22144.59 274.13 68 34. 191 -50 59.06 243.27 1 1349.04 0.00 17580.77 270.03 69 29. 206.50 59.16 144.42 11171 .09 98.51 15531.46 281.93 70 32. 220.00 47.52 0.00 9850.86 989.61 17647-32 239.27 71 39. 232.00 28.47 188.92 8793-04 1486.71 21730.06 168.37 72 34. 247.50 49.18 0.00 10455.10 0.00 8629-30 119.73 73 36. 255.50 27.58 0.00 11291.69 1856.55 10559-42 197.55 74 17. 274.50 32.65 0.00 11751.83 638.01 21312.57 270.99 75 13. 267.50 32.53 365.96 6351.87 548.45 19809.61 359.67 76 10. 251.50 28.36 577.07 2078.03 0.00 12688.85 456.11 77 7. 228.50 9.59 199.57 769.51 1240.62 24480.38 443.87 78 8. 199.00 23.74 0.00 2178.79 0.00 19878.47 500.47 79 16. 267.50 28.47 1160.70 5965.50 1477. 96 24012.46 45^.53 05 Obs Terms Backlog Expr CPU Plot Read Punch Print Throughput 80 15. 246.00 21 .08 708.59 7180.67 1054.67 22215.79 409.44 81 38. 100.00 58.27 58.28 10371.02 0.00 31928.98 180.67 82 39. 111 .50 67-18 0.00 16709.06 0.00 18465.94 148.22 83 34. 127.00 50.98 82.25 12872.58 601.59 22344.10 216.99 84 27. 134.50 61.85 0.00 14439.21 0.00 12563-66 244.32 85 32. 163-50 48.04 254.98 10357.08 0.00 15413-33 232.63 86 32. 167.50 72.77 0.00 134 39.24 119.62 19989.87 199.61 87 31. 173-50 48.12 3-83 7334.19 2448.88 13706.07 223.86 88 26. 178.50 53.19 0.00 6947.58 557.77 15084.45 149.72 89 21. 177-50 47.28 225.93 7253.93 0.00 13703.62 174.13 86 Appendix F This appendix gives the Sum of Squares Cross Products Matrix for the first sample of throughput data. 
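For readers who wish to recompute the estimates from these listings, the following is a minimal sketch of how a sums of squares cross products matrix and the ordinary least squares coefficient and variance-covariance estimates (as tabulated in Appendix G) follow from a matrix of the transformed Appendix E observations. It is offered only as an illustration, not as the program used for this research; the names X and y and the use of the numpy library are assumptions of the sketch.

    # Sketch only: assumes the Appendix E observations have been keyed into a
    # design matrix X (constant, initiator dummies, Plorts, Wait, the channel
    # times, terminals, backlog, express CPU, plot, cards read, cards punched,
    # lines printed) and a response vector y (throughput).
    import numpy as np

    def ols_from_data(X, y):
        XtX = X.T @ X                       # sums of squares cross products block
        b = np.linalg.solve(XtX, X.T @ y)   # ordinary least squares coefficients
        resid = y - X @ b
        n, k = X.shape
        s2 = resid @ resid / (n - k)        # residual variance estimate
        cov_b = s2 * np.linalg.inv(XtX)     # variance-covariance matrix (cf. Appendix G)
        return b, s2, cov_b

Note that the cross products matrix printed below carries the dependent variable (THROUGHPUT) along with the independent variables; the X'X block used above excludes the THROUGHPUT row and column.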
Sum of Squares Cross Products Matrix CONSTANT 5 I NITS 6 I NITS 7 I NITS PLORTCPU WAIT TIME CHANNEL CHANNEL 1 CHANNEL 2 CHANNEL 3 CHANNEL 4 # TERMS BACKLOG EXPR CPU PLOT SEC CARDSREAD CARDSPUNCH LINESPRINT THROUGHPUT WAIT TIME CHANNEL CHANNEL 1 CHANNEL 2 CHANNEL 3 CHANNEL 4 # TERMS BACKLOG EXPR CPU PLOT SEC CARDSREAD CARDSPUNCH LINESPRINT THROUGHPUT CONSTANT 0.45000E+02 0.21000E+02 0.14000E+02 0.70000E+01 0.10566E+05 0.59109E+03 0-37724E+04 0.68290E+04 0.91998E+04 0.86920E+04 0.82293E+04 0.12260E+04 0.72490E+04 0.20601E+04 0.54546E+04 0.50112E+06 0.20986E+05 0.87434E+06 0.11869E+05 WAIT TIME 0.21194E+05 0.76658E+05 0.11931E+06 16821E+06 1 1342E+06 10983E+06 14122E+05 0.95799E+05 0.23509E+05 0.80231E+05 0.58500E+07 0.33204E+06 0.1 1747E+08 0.18396E+06 5 INITS 0.21000E+02 0.00000E+00 0.00000E+00 0.45269E+04 0.33645E+03 0, 0.14925E+04 0.35365E+04 0.45887E+04 0.36157E+04 0.36505E+04 0.59100E+03 0, 0.35170E+04 0.98638E+03 0, 0.21679E+04 0.23509E+06 0.82615E+04 0.44349E+06 0. 0.55178E+04 6 INITS 7 INITS PLORTCPU 14000E+02 00000E+00 37644E+04 12669E+03 13638E+04 17326E+04 28037E+04 32287E+04 25359E+04 39400E+03 21325E+04 64110E+03 12972E+04 14950E+06 42612E+04 24245E+06 32573E+04 .70000E+01 .19154E+04 •32983E+02 .46067E+03 .90735E+03 . 12048E+04 .12430E+04 .14627E+04 . 18600E+03 . 10035E+04 ■36168E+03 .12175E+04 . 10347E+06 .65832E+04 . 13832E+06 . 17700E+04 29175E+07 10016E+06 80290E+06 13916E+07 20179E+07 18554E+07 19297E+07 30Q19E+06 16565E+07 49460E+06 11869E+07 12896E+09 0.40157E+07 0.20168E+09 0.25251E+07 CHANNEL CHANNEL 1 CHANNEL 2 CHANNEL 3 0.58313E+06 0.61666E+06 0.89681E+06 0.68622E+06 0.67744E+06 0.98208E+05 57235E+06 16647E+06 56500E+06 40783E+08 14681E+07 71861E+08 10339E+07 0.12295E+07 0.14703E+07 0.14297E+07 0.12799E+07 0.17336E+06 0.10935E+07 0.29649E+06 0.88883E+06 0.71872E+08 0.37530E+07 0.13621E+09 0.19445E+07 22800E+07 18408E+07 17083E+07 24859E+06 15043E+07 41337E+06 10813E+07 10149E+09 0.42818E+07 0.18019E+09 0.25254E+07 0.25259E+07 0.16233E+07 0.22197E+06 0.13613E+07 0.39281E+06 0.89918E+06 0.93466E+08 0.40949E+07 0.16921E+09 0.24243E+07 87 CHANNEL 4 IP TERMS BACKLOG EXPR CPU PLOT SEC CARDSREAD CARDSPUNCH LINESPRINT THROUGHPUT CHANNEL n .16615E+07 .22487E+06 .13236E+07 •37580E+06 .10153E+07 .95476E+08 •41792E+07 . 16264E+09 .21812E+07 # TERMS 0.35846E+05 0.19369E+06 0.58136E+05 0. 13912E+06 0. 14247E+08 0.53485E+06 0.24037E+08 0.30575E+06 BACKLOG 0.13104E+07 0.32674E+06 0.92687E+06 0.80756E+08 n.33360E+07 0.14215E+09 0.19674E+07 EXPR CPU PLOT SEC 10190E+06 22640E+06 23401E+08 84047E+06 39575E+08 51099E+06 19997E+07 56153E+08 28847E+07 96614E+08 15396E+07 CARDSREAD CARDSPUNCH LINESPRINT THROUGHPUT CARDSREAD 0.65712E+10 0.23822E+09 0.10197E+11 0.12380E+09 CARDSPUNCH LINESPRINT THROUGHPUT 0.36843E+08 0.40651E+09 0.6016HE+07 0.20691E+11 0.23654E+09 0.34810E+07 88 Appendix G This appendix gives the ordinary least squares estimate of the regression coefficients variance-covariance matrix for the first sample of throughput data. Variance-Covariance Matrix CONSTANT 5 INITS 6 INITS 7 INITS PLORTCPU WAIT TIME CHANNEL CHANNEL CHANNEL CHANNEL CHANNEL # TERMS BACKLOG EXPR CPU ■ PLOT SEC • CARDSREAD CARDSPUNCH. 
LINESPRINT WAIT TIME CHANNEL CHANNEL 1 • CHANNEL 2 ■ CHANNEL 3 CHANNEL 4 # TERMS BACKLOG EXPR CPU • PLOT SEC CARDSREAD ■ CARDSPUNCH- LINESPRINT CONSTANT 0.16194E+05 -0.18714E+04 -0.16883E+04 -0.22947E+04 -0.18189E+02 -0.93725E+01 -0.94439E+01 -0.19297E+02- 0.77694E+00- -0.42765E+01 -0.28493E+01 -0.60585E+02 -0.18283E+02 -0.35872E+02- -0.51993E+00- 0.18745E+00- -0.56624E+00 -0.30243E-01- WAIT TIME 0.79626E+00 -0.74462E-02 -0.781 18E-01 • 0.81939E-01- 0.22865E-01 0.19952E-01 0.32937E+00 0.29914E-01 -0.22626E-01 0.13739E-02- ■0.35086E-03- ■0.30665E-03 0. 13463E-04 5 INITS 22709E+04 19032E+04 23054E+04 58922E+00- 1 1250E+02 27865E+01 14876E+01 0.25819E+01- 0.84411E+00 0.24346E+01 0.89837E+01 0.28434E+01 0.10978E+02- 0.66948E-01- 0.59939E-01- 0.90314E-02- 0.21771E-02- 6 INITS 0.21633E+04 0.22466E+04 ■0.10111E+01- 0.10828E+02 0.13633E+01 0.18607E+00 ■0.23691E+01- 0.14250E+00 0.12759E+01 0.18985E+02 0.25756E+01 0.10745E+02- 0.27899E-01- 0.36070E-01- •0.60807E-01- 0.27315E-02- 7 INITS PLORTCPU 0.34308E+04 0.90975E+00 0.15266E+02-0 0.27404E+01 0.24195E+00 0.32978E+01 0.82417E+00 0.15488E+01 0.41774E+02-0 0.44741E+01 0.18418E+02 0.46029E+00 0.72411E-01-0 0.16178E+00 0.58374E-02 •51062E-01 .20898E-03 11385E-01 ,23075E-01 68107E-02 51479E-02 30735E-02 .19621E+00 15754E-01 84806E-01 75719E-03 40727E-03 15724E-02 83081E-04 CHANNEL CHANNEL 1 CHANNEL 2 CHANNEL 3 0.25997E-01 0.34187E-02 0.66641E-02 0.46959E-02- 0.59000E-02- 0.33367E-01 0.15200E-01 0.18092E-03 0.17398E-02- 0.20476E-03- 0.58742E-03 0.12719E-04- 0.66599E-01 0.67904E-02 0.21623E-02- •0.12362E-01- 0.94229E-01- 0.16299E-01- 0.50198E-01 0.91 173E-03 0.72124E-04 0.43074E-04 0.10121E-04 0.22046E-01 ■0.35298E-02 0.46961E-02 0.1 1704E+00 0.83538E-02 0.27557E-01 0.10519E-02 0.35738E-04- 0.24801E-03 0.17949E-04 0.67691E-02 0.93339E-03 0.34538E-01 0.58907E-02 0.67555E-03 0.44599E-03 0.73860E-04 0.22400E-03 0.6282QE-05 89 CHANNEL 4 # TERMS BACKLOG EXPR CPU PLOT SEC CHANNEL 4 0.2Q931E-01 # TERMS -0.37178E-01 0.45486E+01 BACKLOG 0.23270E-02 0.12l83E+no 0.42490E-01 EXPR CPU -0.85791E-02-0.73040E+00 0.10O76E-01 0.63377E+00 PLOT SEC -0.73690E-03-0.1 1754E-01-0.17512E-02 0.86191E-02 0.30297E-02 CARDSREAD -0 . 19607E-03-0 .28103E-04-0.25980E-03-0 .23559E-03 0.15868E-04 CARDSPUNCH . 1 6586E-04-0 . 3281 3E-02 0.42343E-03 0.44844E-02 0.61358E-04 LINESPRINT 0.38545E-O5-0.60840E-03 0.23688E-05 0.23205E-03 0.970S2E-05 CARDSREAD CARDSPUNCH LINESPRINT CARDSREAD 0.10051E-04 CARDSPUNCH-0 . 1 2062E-04 . 2023 1 E-03 LINESPRINT-0.10626E-05 0.37466E-05 0.11680E-05 90 Appendix H This appendix gives two matrices, the second of which is equal to the first times a scalar, that scalar being the estimate of variance in the exact restricted model. The first matrix is the inverse of the Sum of Squares Cross Products matrix of those variables in the restricted model which were not restricted to zero. The second matrix is the estimate of the variance-covariance matrix for the exact restricted least squares regression coefficients. These matrices were estimated using the first sample of data. The first of these two matrices is used as the stochastic prior information in the stochastic least squares estimation on the second sample of data. 
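A brief sketch may clarify how the two matrices below are related and how the first is carried forward as prior information. The functions follow the textbook forms of exact restricted least squares (ordinary least squares on the variables not restricted to zero) and of mixed, or stochastically restricted, least squares in the Theil-Goldberger sense; the precise weighting used in this research may differ in detail, and the names X1, y1, X2, and y2 are assumptions of the sketch.

    # Sketch only: standard textbook forms, not a transcript of the programs
    # actually used.  X1, y1 hold the first-sample data on the variables kept
    # in the restricted model; X2, y2 hold the second-sample data.
    import numpy as np

    def exact_restricted(X1, y1):
        M = np.linalg.inv(X1.T @ X1)        # first matrix below
        b1 = M @ (X1.T @ y1)                # restricted least squares coefficients
        resid = y1 - X1 @ b1
        s2_1 = resid @ resid / (len(y1) - X1.shape[1])
        cov_b1 = s2_1 * M                   # second matrix below = scalar times first
        return b1, s2_1, M, cov_b1

    def mixed_estimate(X2, y2, b1, s2_1, M):
        # Treat b1 as stochastic prior information with covariance s2_1 * M
        # (Theil-Goldberger mixed estimation with the restriction matrix R = I).
        b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)
        resid2 = y2 - X2 @ b2
        s2_2 = resid2 @ resid2 / (len(y2) - X2.shape[1])
        Psi_inv = np.linalg.inv(s2_1 * M)   # precision of the prior information
        A = X2.T @ X2 / s2_2 + Psi_inv
        b_mixed = np.linalg.solve(A, X2.T @ y2 / s2_2 + Psi_inv @ b1)
        cov_mixed = np.linalg.inv(A)        # corresponds in role to the Appendix K matrix
        return b_mixed, cov_mixed

The matrix returned as cov_mixed corresponds in role, for the second sample, to the stochastically restricted variance-covariance matrix reported in Appendix K.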
Restricted Least Squares Model Inverse of the Unrestricted Independent Variables Sums of Squares Cross Products Matrix CONSTANT 5 INITS 6 INITS 7 INITS PLORTCPU CONSTANT 0.43250E+00 5 INITS -0.24255E+00 0.46495E+00 6 INITS -0.22879E+00 0.42758E+00 0.51736E+00 7 INITS -0.21204E+00 0.44391E+00 0.46197E+00 0.62480E+00 PLORTCPU -0.31535E-03-0.24924E-03-0.39739E-03-0.40660E-03 0.27640E-05 EXPR CPU -0 .25967E-02-0 .25772E-02-0 .24085E-02-0 . 30704E-02-0 .66492E-06 EXPR CPU EXPR CPU 0.11320E-03 91 Variance-Covariance Matrix CONSTANT 5 INITS 6 INITS 7 INITS PLORTCPU CONSTANT 0.11970E+04 5 INITS -0.67131E+03 0.12869E+04 6 INITS -0.63324E+03 0. 1 1834E+04 O.U319E+OM 7 INITS -0.58687E+03 0.12286E+04 0.12786E+0H 0.17293E+04 PLORTCPU -0.87279E+00-0.68984E+00-0.10999E+01-0.1 1254E+01 0.76499E-02 EXPR CPU -0.7187OE+O1-O.71330E+O1-O.66660E+01-0.8498OE+O1-O.18403E-O2 EXPR CPU EXPR CPU 0.31332E+00 Appendix I 92 This appendix gives the Sum of Squares Cross Products Matrix for the second sample of throughput data. Sum of Squares Cross Products Matrix CONSTANT 5 I NITS 6 I NITS 7 INITS PLORTCPU WAIT TIME CHANNEL CHANNEL CHANNEL CHANNEL CHANNEL # TERMS BACKLOG EXPR CPU PLOT SEC CARDSREAD CARDSPUNCH LINESPRINT THROUGHPUT WAIT TIME CHANNEL CHANNEL 1 CHANNEL 2 CHANNEL 3 CHANNEL 4 # TERMS BACKLOG EXPR CPU PLOT SEC CARDSREAD CARDSPUNCH LINESPRINT THROUGHPUT CONSTANT 0.44000E+02 0.20000E+02 0.13000E+02 0.80000E+01 0.10128E+05 52289E+03 34871E+04 70315E+04 84618E+04 0.89195E+04 0.79371E+04 11710E+04 73615E+04 19490E+04 50772E+04 49353E+06 29515E+05 0.85232E+06 0.12296E+05 WAIT TIME 23683E+05 58597E+05 10879E+06 11208E+06 1 1120E+06 Q1210E+05 0.11192E+05 0.90616E+05 18805E+05 46337E+05 43862E+07 73669E+06 11427E+08 16997E+06 5 INITS 0.20000E+02 0.00000E+00 0.00000E+00 0.40637E+04 0.40088E+03 0.14726E+04 0.34351E+04 0.42583E+04 0.33124E+04 0.33863E+04 52200E+03 34545E+04 95456E+03 92145E+03 21593E+06 15664E+05 0.36UQ3E+06 0.55133E+04 6 INITS . 13000E+02 ■00000E+00 •34416E+04 •53018E+02 . 12970E+04 . 18279E+04 .21877E+04 •31729E+04 .24779E+04 , 34800E+03 .21810E+04 .53886E+03 . 18655E+04 , 17382E+06 .79304E+04 .25192E+06 .34938E+04 7 INITS 0.80000E+01 0.20523E+04 0.43163E+02 0.36514E+03 0.12298E+04 0.14603E+04 0.15898E+04 0.15031E+04 0.23100E+03 0.11815E+04 0.34222E+03 0.88918E+03 0.76977E+05 0.43635E+04 0.16036E+06 0.22659E+04 PLORTCPU ,26885E+07 84700E+05 71898E+06 14894E+07 18565E+07 18248E+07 19008E+07 29154E+06 . 
16742E+07 47096E+06 86604E+06 12333E+09 ■62172E+07 17952E+09 25684E+07 CHANNEL CHANNEL 1 CHANNEL 2 CHANNEL 3 0.48221E+06 0.58974E+06 0.67898E+06 0.70008E+06 0.61729E+06 0.85138E+05 0.61603E+06 14043E+06 68593E+06 33402E+08 26124E+07 0.69738E+08 0.10733E+07 12273E+07 14139E+07 15013E+07 12667E+07 17971E+06 1 1683E+07 30226E+06 861615+06 75248E+08 0.53439E+07 0.14212E+09 0.20579E+07 0.17376E+07 0.17347E+07 0.15561E+07 0.22250E+06 0.14210E+07 37845E+06 91990E+06 92917E+08 59086E+07 16761E+09 24291E+07 0.27247E+07 0.15814E+07 0.21853E+06 0.14259E+07 0.36540E+06 13087E+07 91663E+08 50447E+07 20238E+09 28158E+07 93 CHANNEL 1* # TERMS BACKLOG EXPR CPU PLOT SEC CARDSREAD CARDSPUNCH LINESPRINT THROUGHPUT CHANNEL H .15526E+07 .21839E+06 .13311E+07 .36143E+06 .865214 E+06 .92534E+08 0.54508E+07 0.149M5E+09 0.21600E+07 # TERMS 0.33551E+05 0.19594E+06 0.54120E+05 0.10850E+06 0.13722E+08 0.73555E+06 0.21456E+08 0.30825E+06 BACKLOG EXPR CPU PLOT SEC 13735E+07 .32716E+06 10125E+07 81692E+08 54075E+07 14115E+09 0.96191E+05 0.20253E+06 0.31712E+07 0.23660E+08 0.43475E+08 0.11956E+07 0.45997E+07 0.36267E+08 0.10980E+09 0.20653E+07 0.50897E+06 0.17245E+07 CARDSREAD CARDSPUNCH LINESPRINT THROUGHPUT CARDSREAD 0.68042E+10 0.30451E+09 0.89576E+10 0.12756E+09 CARDSPUNCH LINESPRINT THROUGHPUT 0.76442E+08 0.59629E+09 0.82384E+07 0.18925E+11 0.25288E+09 0.37852E+07 94 Appendix J This appendix gives the ordinary least squares estimate of the regression coefficients variance-covariance matrix for the second sample of throughput data. Variance-Covariance Matrix CONSTANT 5 INITS 6 INITS 7 INITS PLORTCPU WAIT TIME CHANNEL CHANNEL CHANNEL CHANNEL CHANNEL # TERMS BACKLOG EXPR CPU PLOT SEC • CARDSREAD ■ CARDSPUNCH LINESPRI NT- WAIT TIME CHANNEL • CHANNEL 1 ■ CHANNEL 2 CHANNEL 3 CHANNEL 4 ■ # TERMS BACKLOG EXPR CPU PLOT SEC CARDSREAD CARDSPUNCH- LINESPRI NT- CONSTANT 0.10589E+05 -0.19760E+04 -0.12217E+04 -0.16028E+04 -0.11050E+02 0.16672E+01- -0.38932E+01 -0.11593E+02 -0.42806E+01- -0.15763E+01- 0.19708E+01 -0.33374E+02- -0.44905E+01 0.55739E+01- -0.15324E+01 -0.53477E-01 0.73548E-01- ■0.71990E-01 WAIT TIME 0.34899E+00 -0.18709E-01 • 0.64960E-01- 0.40809E-01- 0.20425E-02 -0.31254E-01- 0.26405E+00 -0.95383E-02- 0.54060E-01 0.67032E-02- 0.92469E-04 -0.1 1282E-02 ■0.75256E-04 5 INITS 6 INITS 7 INITS PLORTCPU 0.15311E+04 0.99623E+03 0.10936E+04 0.15831E+01 0.57077E+01 0.52818E+00-0 0.18495E+01 0.1977QE+01-0 0.75327E-01-0 0.22545E+01 0.33184E+01 0.21892E-01-0 0.69309E+01-0 0.61673E+00 0.57222E-02-0 0.30153E-01-0 0.21597E-01 . 1 1791E+0M .89959E+03 . 3796 1E-01 .82453E-01-0 .76078E+00 . 12253E+01 .55633E+00-0. .46962E+00 .73352E+00 .31299E+01-0 , 12815E+00 .29085E+00-0, .46179E+00 .86021E-02 0, •39412E-01-0, .10441E-01 0, 13011E+04 53870E+00 14317E+01- 85230E+00 82411E+00 58151E+00 12327E+00 69298E+00- 92967E+00- 39074E+00 24314E+01- 38022E+00 10871E-01- 18571E-01- 89149E-02 36619E-01 50696E-02 19180E-02 16910E-01 13567E-01 20502E-02 85079E-02 U74HE+00 0.46365E-02 0.19312E-01 0.22331E-02 0.27723E-05 0.13022E-03 0.70826E-04 CHANNEL CHANNEL 1 CHANNEL 2 CHANNEL 3 0.21353E-01 0.16023E-03 0.83125E-03- 0.36273E-02- 21215E-02 21022E-01- 1 1347E-03 30439E-02 16451E-02- 89618E-04- 19925E-03- 10314E-05 . 
7 1 1 4 1 E 0.25642E 13680E 51183E 46764E 89966E 16872E 5164QE 50619E 24262E 34214E -01 ■01 0.60968E-01 02 0.30424E-02 ■03-0.26578E-01-0 ■01 0.17807E-01 • 02-0.26357E-02 0, ■01-0.37165E-01 0, ■03 0.26592E-02-0 04 0.89981E-04 •03 0.46938E-04 ■04-0.65878E-04-0, 60522E-02 42200E-02 12348E-01 18649E-02 2J54J49E-02 38712E-03 27853E-04 15067E-03 56766E-04 95 CHANNEL 4 # TERMS BACKLOG EXPR CPU PLOT SEC CHANNEL 4 0.4293^-01 # TERMS -0.92577E-01 0-31699E+01 BACKLOG -0.11781E-02-0.36330E-01 0.21043E-01 EXPR CPU -0.10980E-01-0.19123E+00 0.78493E-03 0.51297E+00 PLOT SEC -0.77584E-03 0. 1 3772E-01-0 . 15151E-02-0 .79361E-02 0.19293E-02 CARDSREAD -0.78517E-04 0.65764E-03 0.81 332E-05-0.58533E-03 0.13206E-04 CARDSPUNCH-0.14070E-03 0.29406E-03-0. 10680E-03 0.521 19E-03-0 .68012E-04 LINESPRINT 0.78034E-04 0. 1 3059E-03-0 . 1 4048E-05-0. 17843E-03 0.14771E-04 CARDSREAD CARDSPUNCH LINESPRINT CARDSREAD 0.38694E-05 CARDSPUNCH 0.36209E-06 0.55955E-04 LINESPRINT 0.26205E-06-0.20360E-05 0.24776E-05 96 Appendix K This appendix gives the stochastically restricted estimate of the regression coefficients variance-covariance matrix for the second sample of data. Variance-Covariance Matrix CONSTANT 5 INITS 6 INITS 7 INITS PLORTCPU EXPR CPU EXPR CPU CONSTANT 5 INITS 6 INITS 7 INITS PLORTCPU 0.53479E+03 ■0.30770E+03 0.47209E+03 •0.29716E+03 0.41562E+03 0.51492E+03 ■0.28804E+03 0.42132E+03 0.43308E+03 0.58925E+03 ■0.38097E+00-0.10250E+00-0.32548E+00-0.29947E+00 ■ 0.311O9E+O1-O.1W6E+O1-O.1O618E+O1-0.1490OE+O1-O EXPR CPU 0.12883E+00 35412E-02 54608E-02 97 Appendix L This appendix gives, by job class, the job execution residency Sums of Squares Cross Products Matrices. SUMS OF SQUARES CROSS PRODUCTS MATRIX - CLASS A JOBS CONSTANT L*CPU/TH IOREQ RGNO RGN1 STEP JOBREAD JOBPUNCH JOBPRINT JOBPLOT RESIDE STEP JOBREAD JOBPUNCH JOBPRINT JOBPLOT RESIDE RESIDE CONSTANT 0.76300E+03 0.99835E+05 0.22675E+06 0.90120E+03 0.40000E+00 0.13570E+04 0.26132E+06 41555E+05 47421E+06 64070E+04 12632E+06 STEP 0.30630E+04 0.49611E+06 0.53571E+05 0.95645E+06 0.10888E+05 0.24841E+06 RESIDE 0.47668E+08 L*CPU/TH 40298E+08 58020E+08 11827E+06 95412E+01 20715E+06 0.60345E+08 0.68012E+07 0.13211E+09 0.13406E+07 0.32314E+08 JOBREAD 0.37735E+09 0.50317E+08 0.47824E+09 0.24547E+07 0.54297E+08 IOREQ 0.24018E+09 0.25810E+06 0.88400E+02 0.49766E+06 0.96396E+08 0.37560E+07 0.22517E+09 0.13533E+07 0.62207E+08 JOBPUNCH 0.71227E+08 0.17799E+08 0.30195E+05 0.61151E+07 RGNO 0.11402E+04 0.60000E+00 0.15913E+04 0.30442E+06 0.42830E+05 0.55769E+06 0.76064E+04 0.15533E+06 JOBPRINT 0.51839E+10 0.27721E+07 0.10810E+09 RGN1 0.80000E-01 0.40000E+00 0.13000E+02 0.00000E+00 0.68800E+02 0.00000E+00 0.35600E+02 JOBPLOT 0.11943E+07 0.10762E+07 98 SUMS OF SQUARES CROSS PRODUCTS MATRIX - CLASS B JOBS CONSTANT L*CPU/TH IOREQ RGNO RGN1 STEP JOBREAD JOBPUNCH JOBPRINT JOBPLOT RESIDE CONSTANT 0.20300E+03 0.51U05E+05 0.61893E+05 0.36184E+03 0.18190E+02 0.33600E+03 0.62893E+05 0.74100E+03 0.13246E+06 0.27230E+04 0.64091E+05 L*CPU/TH 0.1J5193E+08 0.24192E+08 0.86903E+05 0.19969E+04 10271E+06 25620E+08 65953E+06 50712E+08 19711E+07 0.35157E+08 IOREQ 0.51195E+08 0.1087UE+06 0.66822E+04 0.13591E+06 0.31803E+08 0.58560E+06 0.58060E+08 0.45157E+06 0.26432E+08 RGNO 0.67592E+03 0.31782E+02 0.594U8E+03 0.10610E+06 0.15259E+04 0.21948E+06 0.48100E+04 0.1 1494E+06 RGN1 0.10650E+02 0.18590E+02 0.29269E+04 O.OOOOOE+OO 0.63457E+04 0.78000E+02 0.393^0E+04 STEP JOBREAD JOBPUNCH JOBPRINT JOBPLOT RESIDE STEP 0.71200E+03 0.13508E+06 0.14590E+04 0.25068E+06 0.35110E+04 0.12202E+06 JOBREAD 0.58201E+08 
0.89737E+06 0.66034E+08 0.88619E+06 0.24633E+08 JOBPUNCH 0.35537E+06 0.98196E+0& O.OOOOOE+OO 0.42982E+06 JOBPRINT 0.22032E+09 0.14816E+07 0.48950E+08 JOBPLOT 91746E+06 14382E+07 RESIDE RESIDE 0.44203E+0! 99 SUMS OF SQUARES CROSS PRODUCTS MATRIX - CLASS C JOBS CONSTANT L*CPU/TH IOREQ RGNO RGN1 STEP JOBREAD JOBPUNCH JOBPRINT JOBPLOT RESIDE CONSTANT 0.94000E+02 0.65835E+05 11821E+06 20083E+03 13940E+02 19900E+03 0.43613E+05 0.10530E+04 0.13120E+06 0.19530E+04 0.63851E+05 L*CPU/TH 0.15297E+09 0.19313E+09 0.12144E+06 0.34764E+04 0.17709E+06 0.38127E+08 0.32712E+06 0.16241E+09 0.29184E+07 0.96154E+08 IOREQ 0.77460E+09 0.20548E+06 0.69590E+04 0.33059E+06 0.82764E+08 0.30876E+06 0.26971E+09 0.18298E+07 0.16145E+09 RGNO 0.48797E+03 0.29826E+02 0.41669E+03 0.94861E+05 0.3W4E+04 0.28607E+06 0.30910E+04 0.13413E+06 RGN1 94548E+01 16300E+02 .31 189E+04 .00000E+00 91601E+04 77600E+02 45053E+04 STEP JOBREAD JOBPUNCH JOBPRINT JOBPLOT RESIDE STEP 0.57300E+03 0.11799E+06 0.16140E+04 0.36031E+06 0.58390E+04 0.16047E+06 JOBREAD .54014E+08 .71082E+06 ,10124E+09 .85515E+06 .35085E+08 JOBPUNCH JOBPRINT JOBPLOT .27234E+06 93026E+06 OOOOOE+OO 69736E+06 54668E+09 .44939E+07 . 1 3344E+09 18852E+07 20108E+07 RESIDE RESIDE 0.89180E+08 100 SUMS OF SQUARES CROSS PRODUCTS MATRIX - CLASS D JOBS CONSTANT L*CPU/TH IOREQ RGNO RGN1 STEP JOBREAD JOBPUNCH JOBPRINT JOBPLOT RESIDE CONSTANT 0.50000E+01 0.63082E+04 0.18181E+05 0.92300E+01 0.80000E+00 0.1 1000E+02 0.82700E+03 0.69000E+02 0.92090E+04 O.OOOOOE+OO 0.37090E+04 L*CPU/TH 14761E+08 12352E+08 16470E+05 12621E+04 75S95E+04 16178E+07 0.23237E+06 0.19260E+08 O.OOOOOE+OO 0.73956E+07 IOREQ 0.14461E+09 0.29M06E+05 0.46464E+04 0.62933E+05 0.91393E+06 0.17595E+05 0.31419E+08 O.OOOOOE+OO 0.97000E+07 RGNO RGN1 0.20849E+02 0.20000E+01 0, 0.16190E+02 0, 0.17989E+04 0.22425E+03 0.23853E+05 O.OOOOOE+OO 0.88148E+04 64000E+00 80000E+00 60800E+02 OOOOOE+OO 37120E+04 OOOOOE+OO 70000E+03 STEP JOBREAD JOBPUNCH JOBPRINT JOBPLOT RESIDE STEP 0.37000E+02 0.10310E+04 0.69000E+02 0. 10947E+05 O.OOOOOE+OO JOBREAD 0.23628E+06 0.24357E+05 0.17634E+07 O.OOOOOE+OO 0.58310E+04 0.82413E+06 JOBPUNCH 0.n76lOE+04 0.22956E+06 O.OOOOOE+OO 0.11026E+06 JOBPRINT 0.33211E+08 O.OOOOOE+OO 0.Q9274E+07 JOBPLOT OOOOOE+OO OOOOOE+OO RESIDE RESIDE 0.38452E+07 101 VITA A lifelong native of Illinois, Paul Lewellyn Chouinard was born in Elrahurst, Illinois on December 7, 19^5. He earned the A.B. degree, with College Honors in Liberal Arts and Sciences, from the University of Illinois at Urbana-Champaign in June, 1967. He subsequently received an A.M. degree in Economics from the same institution in June, 1971. From September, 1967 until August, 1975 he was employed by the University of Illinois as a research assistant, as a research programmer, and again as a research assistant. His paper "SOUPAC System Development: Past, Present, and Future" was published in the Proceedings of Computer Science and Statistics: Seventh Annual Symposium on the Interface . He also was author of the SOUPAC System Programmer 's Guide . He is a member of the Association for Computing Machinery and the American Statistical Association. BIBLIOGRAPHIC DATA SHEET 1. Report No. uiucdcs-r -76-799 l. Title and Subtitle The Statistical Estimation of Throughput and Turnaround Functions for a University Computer System 3. Recipient's Accession N< 5- Repon Date May 1976 Author(s) Paul Lewellyn Chouinard 8- Performing Organization Kept. 
No. UIUCDCS-R-76-799
9. Performing Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
10. Project/Task/Work Unit No.
11. Contract/Grant No.
12. Sponsoring Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
13. Type of Report & Period Covered: Ph.D. Dissertation
14.
15. Supplementary Notes: Computing Services Office, University of Illinois at Urbana-Champaign, Urbana, Illinois
16. Abstracts: Throughput and turnaround functions for a university batch multiprogrammed system under "busy" conditions are estimated using exact restricted least squares, a mixed exact and stochastic restricted least squares, and positive part James-Stein estimates. The data used for these estimations were gathered under production conditions. The turnaround model is developed for a first-in first-out by job class scheduling algorithm through the use of execution residency functions. These functions explicitly consider the inverse relationship between throughput and the execution residency time of a job.
17. Key Words and Document Analysis. 17a. Descriptors: Performance Measurement, Turnaround, Throughput, Execution Residency, Exact Restricted Least Squares, Stochastic Restricted Least Squares, Positive Part James-Stein Estimator, Hypothesis Testing
17b. Identifiers/Open-Ended Terms
17c. COSATI Field/Group
18. Availability Statement: Release Unlimited
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages: 191
22. Price
FORM NTIS-35 (10-70) USCOMM-DC 40329-P71