5”b / . H 9 Qrzo V D c I_j tin 3 ‘/'■S 6 IV /963 iVXs 7 fEBCOQt Uttiv COMPUTER ROUTINES FOR PROBABILITY DISTRIBUTIONS, RANDOM NUMBERS, AND RELATED FUNCTIONS By W.H. Kirby <-r» oo o C\J o —3 JJWBfisny of ILLINOISi ‘Pa,,' num S^ Water-Resources Investigations Report 83—4257 (Revision of Open-File Report 80—448) 1983 UNITED STATES DEPARTMENT OF THE INTERIOR WILLIAM P. CLARK, Secretary GEOLOGICAL SURVEY Dallas L. Peck, Director For additional information write to: Chief, Surface Water Branch U.S. Geological Survey, WRD 415 National Center Reston, Virginia 22092 Copies of this report can be purchased from: Open-File Services Section Western Distribution Branch Box 25425, Federal Center Denver, Colorado 80225 (Telephone: (303) 234-5888) CONTENTS Page Introduction. 1 Source code availability. 2 Linkage information. 2 Calling instructions. 3 BESFIO - Bessel function Iq. 3 BETAP - Beta probabilities. 3 CHISQP - Chi-square probabilities. 4 CHISQX - Chi-square quantiles. 4 DATME - Date and time for printing.... 4 DGAMMA - Gamma function, double precision. 4 DLGAMA - Log-gamma function, double precision. 4 ERF - Error function. 4 EXPIF - Exponential integral. 4 GAMMA - Gamma function. 4 GAMMAP - Gamma distribution probabilities. 3 GAUSAB - Gaussian (standard normal) deviates.... 3 GAUSCF - Gaussian cumulative probability function. 5 GAUSDY - Gaussian probability density. 5 GAUSEX - Gaussian deviate for stated exceedance probability. 5 GAUSSF, GAUSSV - Gaussian random numbers. 6 GUMBEL - Extreme-value quantiles. 6 GUMBEP - Extreme-value probabilities. 6 HARTIV - Table lookup of Pearson Type III distribution. 6 HARTK - Table lookup of Pearson Type III quantiles. 6 HARTKK - Table lookup of Pearson Type III quantiles. 7 HARTP - Table lookup of Pearson Type III probabilities. 7 HARTRG - Pearson Type III skew coefficient from quantile ratio. 7 HARTTP - Copy of tabular probabilities. 7 MINV - Matrix inversion. 7 OUTKGB - Grubbs-Beck outlier coefficient. 7 PRPLOT - Printer-plotter, arithmetic and probability grids. 8 PL0T2 - Initialize plot. 8 PL0T3 - Plot data points. 8 PL0T4 - Print the plot. 8 PL0T2P , PL0T4P - Normal probability scale. 8 PL0T2G, PL0T4G - Gumbel probability scale. 9 PL0T2L, PL0T4L - Logarithmic probability scale. 9 PRPSCL - Pretty scale determination. 9 RANDUB, RANDUV - Uniform random numbers. 10 SMIRP - Kolmogorov-Smirnov test probabilities. 10 SNCTPA - Noncentral t probability approximation. 10 SNCTXA - Noncentral t quantile approximation. 11 SNEFPB - Snedecor variance-ratio F probabilities. 11 SORT, SORTP - Sorting a vector of real numbers. 11 STAT2, STAT3 - Sample statistics. 11 STUTP, STUTPB - Student t probabilities. 12 STUTX - Student t quantiles. 12 WCKGSM - WRC Bulletin-17 generalized skew map. 12 WEI. BUL - Wei bull quantiles. 12 .0.9999, negative if p<0.0001). Reference—see HARTIV. 6 HARTKK(Z ,RIPS) - Function looks up that quantile of the standardized Pearson Type III distribution tabulated in the vector RIPS that has the same non¬ exceedance probability as the given standard normal deviate z. That is, HARTKK solves for x in the equation P{ R <_ x} = P( Z < zj where Z_ is the standard normal random variable and R is the standard Pearson Type III random variable tabulated in RIPS. The vector RIPS must have been filled by a call to HARTIV. Restriction—The value of z must not exceed 3.719 in absolute value; otherwise, HARTKK will be set to a large value of the same sign as z. Reference—see HARTIV. HARTP(Z,RIPS) - Function looks up the cumulative (non-exceedance) probability of the given value Z in the standardized Pearson Type III distribution tabulated in the vector RIPS. The vector RIPS must have been filled by a prior call to HARTIV. If z is outside the range of the table, HARTP is set to 0 (if z<0) or 1 (if z>0). Reference—see HARTIV. HARTRG(OR) - F unction returns the value of the skew coefficient corresponding to the given value of the 2-10-i00-quantile ratio, QR = (^100 - ^10^/^10~^2^» in which Xq- is the "T-year" quantile of a Pearson Type III distribution (having exceedance probability 1/T). The results are obtained from empirical polynomial and semilog formulas that correlate skew-values and quantile-ratios from Harter’s (1969, 1971) tables. HARTTP(IORDER,PROBV) - Subroutine copies the 31 standard tabular probabilities from Harter's (1969, 1971) tables into the user's 31-word Vector PROBV. If IORDER is negative, the probabilities are stored in decreasing order as ex¬ ceedance probabilities; otherwise, in increasing order. Reference—see HARTIV. MINV (A,N,DET,IWORK,JWORK) - Subroutine inverts the NxN matrix A and returns the value of the determinant in DET. The original contents of A are destroyed and replaced by the inverse. A singular A-matrix yields a DET value of zero. IWORK and JWORK are integer work vectors of dimension N, Source—IBM (1970) SSP routine MINV. OUTKGB(SIG,N) - Function returns the value of the Grubbs-Beck (1972) one-sided single-outlier criterion for normal samples of size N at significance level SIG. That is, OUTKGB is the solution for x of the equation P{ (X max - x)/s > x} = SIG where X max is the maximum of a sample of N normal ra_ndom variates, x and s are the sample mean and standard deviation (s={Z(x-x)^/(N-l)} 1 /2 ), and StG is the given significance level, expressed as either a percentage or a decimal fraction. This is the high-outlier criterion; the low-outlier criterion is just the negative of this: that is, -OUTKGB. Restrictions— The sample size N must be at least 3. The available significance levels are 1., 2.5, 5., and 10. percent (or 0.01, 0.025, 0.05, and 0.10). Other¬ wise, OUTKGB will be set to a large negative number. Note—Piecewise linear approximation is used for N above 100 and OUTKGB is constant for N above 180. Reference-~Grubbs and Beck (1972). 7 PRPLOT - Subroutine produces "printer-plotted" graphs on the line printer. Graphs are produced in three phases by calling three entry points, as f ollows: PL0T2(DUMMY,XMAX,XMIN,YMAX,YMIN) - Blanks out an internal plot- image area and then fills it with a rectangular coordinate grid. The standard grid has a vertical (Y) axis 51 print lines high and a horizontal (X) axis 101 print columns wide. Five horizontal grid lines and 10 vertical ones are drawn at 10-space intervals in each direction. The axis annotations at the grid lines are printed with three decimal places. Nonstandard grids can be defined by PL0T1: see the reference for details. Subroutine PRPSCL, described below, can be used to make the axis annotations come out as "even" numbers. The argument DUMMY must be supplied for compatibility with earlier versions of the PRPLOT package, but it is not used in any way by the subroutine. PL0T3(SYMBOL,X ,Y,N) - Plots the character SYMBOL at the N locations defined by the vectors X and Y. The SYMBOLS are plotted in the internal plot-image area initialized by PL0T2. SYMBOL may be either a literal constant or a CHARACTER*1(Prime) or L0GICAL*1(IBM) variable or array. If SYMBOL contains more than one character, only the leftmost one is plotted. X-Y coordinates outside the limits of the grid are ignored. Several curves can be drawn on one graph by repeated calls to PL0T3. When several points occupy the same grid position, the first-plotted points are obliterated; only the last-plotted point shows. PL0T4(N,LABEL) - Prints the internal plot-image area and scale anno¬ tations for the X and Y axes. The N-character string LABEL is printed as a label for the vertical (Y) axis. The standard grid uses 53 print lines, including the X-axis annotation. The user is responsible for ejecting the page and printing page headings, captions, and the X-axis label. PL0T4 does not destroy the plot area, so additional curves may be plotted on the same graph, if desired, by additional calls to PL0T3 and then printed by PL0T4. Calling PL0T2 reinitializes the plotting grid. Probability plots (normal probability grid) may be made by calling PL0T2P and PL0T4P instead of PL0T2 and PL0T4. The horizontal (X) axis is used for cumulative probabilities running from 0.1 percent at the left to 99.9 percent at the right. Probabilities outside this range are ignored. The user must convert the cumulative probabilities to the corresponding standard normal deviates; X = GAUSAB(PCUM) may be used for this purpose. Then PL0T3 is used in the usual way to plot the transformed cumulative probabilities/deviates on the X scale and the data observations on the Y scale. The necessary subroutine calls are as follows: PL0T2P(DUMMY,YMAX,YMIN) - Initializes the PRPLOT internal plot area with a standard normal probability grid. The argument DUMMY must be supplied for compatibility with previous versions of the program, but is not used in any way by the program. 8 PL0T4P(N,LABEL) - Prints the plot area and scale annotations, character string LABEL is printed on the vertical axis and probability label is printed under the X axis. Fifty-four printed. The user is responsible for page ejects, titles, The N- a cumulative lines are etc. A similar pair of routines, PL0T2G and PL0T4G, is available for plotting on Gumbel extreme-value probability coordinates. Cumulative probabilities must be transformed to standardized Gumbel deviates by the formula X = GUMBEL(PCUM). Then PL0T3 is used to plot the transformed cumulative probabilities on the X (horizontal) scale and the data observations on the Y (vertical) scale. Plotting the logarithms of the data gives a straight-line fit for a Weibull population. The necessary subroutine calls are as follows: PL0T2G(DUMMY ,YMAX,YMIh T ) - Sets up the standard Gumbel probability grid in the internal plot-image area. The argument DUMMY is required for compatibility with an earlier version of the subroutine but is not used in any way by the subroutine. PL0T4G(N,YLABEL) - Prints the plot area and scale annotations. The character string YLABEL, of length N, is printed on the Y (vertical) axis and a cumulative probability label is printed under the X (horizontal) axis. Fifty-four lines are printed. The user is respon¬ sible for printing captions, starting on a fresh page, etc. Finally, routines PL0T2L and PL0T4L provide a logarithmic probability grid for plotting exponentially distributed data or the logarithms of Pareto- distributed data. Before plotting, cumulative probabilities must be trans¬ formed to exponential variates by the formula X = -ALOG(1.-PCUM). Then PL0T3 is used to plot the transformed probabilities on the X (horizontal) scale and the data on the Y (vertical) scale. As above, PL0T2L sets up the grid, PL0T3 plots the data, and PL0T4L prints the graph and scale annotations. The calling sequences are: PL0T2L(DUMMY,YMAX.YMIN) « PL0T4L(N,YLABEL) in which all arguments are as above. PL0T4L prints 54 lines, including a probability label under the horizontal axis. The user is responsible for page ejects, titles, etc. The scale annotations may be following subroutine call: forced to come out as "even" values by the PRPSCL(U1 ,U2 ,NGRID,P1,P2) - Subroutine PI, P2 for a scale with NGRID grid the "ugly" values U1 and U2. computes "pretty” endpoints marks for plotting data between Reference—unpublished documentation for USGS program number B524 (PRPLOT), available from USGS Computer Center Division, User Services. PL0T1 - PL0T4 are part of program B524 (PRPLOT); the probability-plot routines make use of PRPLOT but are not part of program B524; PRPSCL is an independent subroutine. The present version of PRPLOT has been revised to use an internal plot-image 9 area of size 61 x 121 (7,381) characters, the size of a full printed page, rather than an area defined in the calling program. The revised PRPLOT nonetheless is compatible with the original one, so existing programs need not be changed, with the following exceptions: 1. Multi-page plots, with image areas larger than 7,381 characters, are not supported. 2. The calling program cannot inspect or modify the plot image area (except by calling PRPLOT). 3. Plot-image definitions can be removed from the calling program to save space, but do not have to be removed. RANDUB(IRAN) - Function returns a quasi-random number drawn from the uniform distribution on the unit interval. IRAN is the "seed" of the random number generator; it must be initialized by the user but thereafter is updated automatically by the generator. The initial value of IRAN may be between 1 and 2147483646. Note--This generator is the Lewis-Goodman-Miller (1969) generator designed for the IBM/360 computer. RANDUV(IRAN,X,N) - Subroutine places a quasi-random sample of size N from the unit uniform (0,1) distribution into the vector X. IRAN is the "seed" of the random number generator; it must be initialized by the user but there¬ after is updated automatically by the generator. The initial value of IRAN may be between 1 and 2147483646. Note--This generator is the Lewis- Goodman-Miller (1969) generator designed for the IBM/360 computer. SMIRP(X) - Function returns the large-sample limiting probability distribution of the Kolmogorov and Smirnov statistics, as follows: p{/"nD n _< x} (Kolmogorov 1-sample) P{/ (n+m)/nm D n ^ m _< x} (Smirnov 2-sample) where D n = max |F n (x) - x D n,m = lF n < x ) x and F n , F m are empirical (sample) distribution functions of sizes n and m and F is the corresponding hypothetical distribution. Source—IBM (1970) subroutine SMIRN. SNCTPA(X,N,D) - Function returns an approximate value of the Student non¬ central t probability P{ tjj d _< x} with N degrees of freedom and non¬ centrality parameter D. When D=0, the noncentral t becomes the ordinary Student t. The approximation is intended for N greater than say, 20, but retains some usefulness down to N equal 5 or 10. Reference—Zelen and Severo (1964, eqn. 26.7.10). F(x)| - F m (x)| 10 k', .*> •* ■> i SNCTXA(P,N,D) - Function returns an approximate value of the Student noncentral t quantile with N degrees of freedom, noncentrality parameter D, and non¬ exceedance probability P. That is, it is the solution for x of the equation P{t d <_ x} = P. The approximation is intended for large N but retains some value for N as small as 5 or 10. Reference—Zelen and Severo (1964, eqn. 26.7.10). SNEFPB(X,N1,N2) - Function returns the probability that the Snedecor F (or variance-ratio) random variable is less than or equal to the given value of x. That is. SNEFPB = P( F N1 >N 2 ^ x} 2 2 where F^^ N 2 = SjVS is the F or variance-ratio statistic with degrees of freedom in the numerator and N 2 in the denominator. SNEFPB is an entry point of subprogram BETAP (which see). S0RT(X,N) - Subroutine rearranges the N real numbers in the vector X into increasing order. If the value of N is negative (S0RT(X,-N)), the numbers are sorted in decreasing order. This routine uses the N-log-N Quicksort algorithm. It is about 3 times as fast as a simple bubble sort for N=25 and about 15 times as fast for N=250. Source—Stanford University Computation Center (unpublished program C063, 1971). S0RTP(X,IPOINT,N) - Subroutine sorts the N elements of the array X into increasing numerical order while maintaining pointers to the original positions of the sorted data items. If N is specified as a negative number, SORTP(X,IPOINT,-N) , then X is sorted into decreasing order. In 3 either case, IPOINT is an integer vector computed such that its i~th element, IPOINT(i), equals the original (input) index in X of the data element that sorts into the i-th location on output. The IPOINT values thus can be used to refer to the values of variables associated with sorted values of X, as follows: If before sorting X(i) and Y(i) are correlated, then after sorting X(i) and Y(lP0INT(i)) are correlated. The following example further illustrates the operation of SORTP: X(input) 4.2 1.3 8.7 9.2 2.1 X(output) 1.3 2.1 4.2 8.7 9.2 IPOINT 25134 STAT2(X,N,XBAR,STDDEV) - Subroutine computes the sample mean and standard deviation of the N observations stored in the vector X. STDDEV is formed by taking the square root of the sum of squares of deviations from XBAR divided by N-l. (See STAT3 for formulas.) If N is less than 2, STDDEV is set equal to zero; if N is less than 1, XBAR also is set to zero. 1 1 STAT3(X,N,A,S,G) - Subroutine computes the sample mean, standard deviation, and skew coefficient (A, S, and G) of the N real numbers in the vector X. The sample statistics are defined as follows: A = EX/N S = {Z(X~A) 2 /(N—1)} 1/2 G = {NI(X-A) 3 }/( (N-l ) (N-2 )S 3 } The sample size N must be at least 3 to define the skew, 2 to define the standard deviation, and 1 to define the mean. If the sample is too small to define any of these statistics (or if N is negative), the undefined statistics are set equal to zero; no error messages or warnings are issued. STUTP(X,N) - Function returns the probability that Student's t with n degrees of freedom is less than or equal to x, p{ t n _< x} . Note—Tail probabilities for two-sided t tests can be computed as follows: P{ ! t n | > x} = 2. *STUTP(-x,n) for x>0. Reference—Hill (1970a). STUTPB(X,N) - Function returns the probability that Student's t with n degrees of freedom is less than or equal to x. STUTPB performs the same function as STUTP, but is provided automatically as part of the BETAP routine for the beta distribution (which see). STUTX(P,N) - Function returns the p-th quantile of Student's t with n degrees of freedom. That is, it solves for x the equation p{ t n _<_ x} = p for the given value of p. Note—Two-sided t-criteria can be computed as follows: P{|t n | > x; = a has the solution x = STUTX(1.-a/2.,n). Reference—Hill (1970b). WCFGSM(FLAT,FLON) - Function returns the value of the generalized skew coeffi¬ cient shown on Plate 1 of the U.S. Water Resources Council's "Guidelines for Determining Flood Flow Frequency" (Hydrology Committee Bulletin 17, 1976, or 17A, 1977) at latitude FLAT and longitude FLON. FLAT and FLON must be expressed in decimal degrees so that, for example, 45°30' would have to be expressed as 45.5°. For points outside the conterminous United States, Alaska, and Hawaii, the function returns a large negative number. WEIBUL(SKU,P) - Function returns the p-th quantile of a standardized Weibull variate with skew SKU. Thus, it returns the value (W p -EW)/s w , in which EW and sy are the mean and standard deviation of the Weibull variate W and Wp is the solution for x of the equation Pj W _< x} = 1 - exp(-x c ) = p for any probability p. In this equation, c is the Weibull shape factor, which depends on the skew; the program automatically calculates c, EW, and s w each time the skew coefficient is changed. Restrictions—The skew coefficient must be less than 100 in absolute value; otherwise, 100 is used. The skew decreases to zero as the shape factor c increases to 12 about 3.6. Negatively skewed distributions are obtained by reflecting the positively skewed distribution, not by increasing c beyond 3.6. Reference—Benjamin and Cornell (1970). WILFRT(SKU,ZETA) - Function returns an (approximately) Pearson Type III standardized variate with skew SKU and probability corresponding to that of the given standard normal deviate ZETA. That is, the value returned is an approximate solution for x of the equation p { W <_ x} = P{ Z <_ ZETA} where Z and W are respectively the standard normal and Pearson Type III variates. Specifically, WILFRT(SKU,GAUSAB(P)) is an approximation to the p~th quantile (x such that p{ W _< x} = p) of the Pearson Type III distribution with skew SKU. Restriction—The skew must not exceed 9.75 in absolute value; otherwise, +9.75 will be used. Note—The approxi¬ mation is an improved Wilson-Hilferty (cube-root-normal) transformation which matches the mean, standard deviation, skew, and lower bound of the Pearson Type III distribution. The approximation is excellent for skews below 2 in absolute value and is useful throughout the defined range of skews. Reference—Kirby (1972). WILFRV(SKU,ZETAV,N) - Subroutine transforms the N-element vector ZETAV from standard normal deviates to (approximate) Pearson Type III standardized deviates with skew SKU. The vector ZETAV must be filled with standard normal deviates before calling WILFRV. GAUSAB or GAUSSV may be used for this purpose. WILFRV is a vectorial version of WILFRT (which see). i XETIME(0.) - Function returns the total amount of execution time (CPU time) used by the program in seconds. The timer is started by the first call to XETIME (which returns a value of 0.0), and all subsequent times are measured relative to this origin. To ensure proper operation of the function, its argument always should be specified as 0.0. REFERENCES Benjamin, J. R., and Cornell, C. A., 1970, Probability, statistics, and decision for civil engineers: New York; McGraw-Hill, 684 p. Box, G. E. P. , and Muller, M. E. , 1958, A note on the generation of random normal deviates: Annals of Mathematical Statistics, v. 29, no. 2, P . 610-611. Goldstein, R. B., 1973, Algorithm 451, Chi-square quantiles: Communications of the Association for Computing Machinery, v. 16, no. 8, p. 483-485. Grubbs, F. E., and Beck, G. , 19/2, Extension of sample sizes and percentage points of outlying observations*Technometrics, v. 14, no. 4, p. 847-854, Harter, H. L. , 1969, A new table of ^r^entage points of the Pearson Type III distribution: Technoinet ri cs*Fvj 1 I , no. 1, p. 1 77-187. v UNIVERSfTY OF ILLINOIS-URBANA Harter, H. L. , 1971, More percentage points of the Pears 30112098716696 distribution: Technometrics, v. 13, no. 1, p. 203-20s. Hill, G. W. , 1970a, Algorithm 395, Student's t-distribution: Communications of the Association for Computing Machinery, v. 13, no. 10, p. 617-619. _ 1970b, Algorithm 396, Student's t-quantiles: Communications of the Association for Computing Machinery, v. 13, no. 10, p. 019-620. Hill, I. D., and Pike, M. C., 1967, Algorithm 299, Chi-squared integral: Communications of the Association for Computing Machinery, v, 10, no. 4, p. 243-244. International Business Machines, 1970, System/360 scientific subroutine package version III programmer's manual: IBM Order No. GH20-0205. Johnson, N. L., and Kotz, S., 1970, Continuous univariate di cributions, 2 vols New York, John Wiley, 300 p., 306 p. Kirby, W., 1972, Computer-oriented WiLson-Hilferty transformation that preserves the first three moments and the lower bound of the Pearson Type 3 distribution: Water Resources Research, v. 8, no. 5, p. 1251-1254. Lewis, P. A. W., Goodman, A. S., and Miller, J. M., 1969, A pseudorandom number generator for the System/360: IBM Systems Journal, v. 8, no. 2, p. 136 - 146 . Mood, A. M., Graybill, F. A., and Boes, D. C., 1974, Introduction to the theory of statistics, 3d ed: New York, McGraw-Hill, 564 p. U.S. Water Resources Council, Hydrology Subcommittee, 1977, Guidelines for determining flood flow frequency, Bulletin 17A: Washington, D.C. Zelen, N. , and Severo, N. C. , 1964, Probability functions, Chap. 26 i_n M. Abramowitz and I. A. Stegun, eds., Handbook of Mathematical Functions: U.S. National Bureau of Standards, Applied Mathematics Series No. 55, 1045 p.