Report No. 370-3 
 
 SOUPAC PROGRAM DESCRIPTIONS 
 STATISTICALLY ORIENTED USERS PROGRAMMING AND CONSULTING 
 
 March 1, 1971 
 
LIBRARY OF THE 
 
 UNIVERSITY OF ILLINOIS 
 
 AT URBANA-CHAMPAIGN 
 
 519.92 
 I46s 
 
The person charging this material is re- 
 sponsible for its return on or before the 
 Latest Date stamped below. 
 
 Theft, mutilation, and underlining of books 
 are reasons for disciplinary action and may 
 result in dismissal from the University. 
 
 University of Illinois Library 
 
 MAY 1 6 197' 
 JUL 2 6 1979 
 
 fMY2l 
 
 AUG 20 
 
 -2 
 
 o m\ 
 
 981 
 
 1983 
 
 L161— O-1096 
 
J5 Q^<^Y^C^ \JfUO-jP-A~~rX- 
 
 This manual has been prepared by the Statistical Consultants 
 of the Department of Computer Science, University of Illinois (Urbana Campus), 
 as documentation for the system of statistical programs known as SOUPAC. 
 SOUPAC has been written by the statistical consultants of this department in 
 an effort to provide a broad range of standard statistical procedures which 
 are of use to the academic community at large. Inquiries about SOUPAC 
 should be directed to 138 Digital Computer Laboratory. 
 
PROGRAM LIST 
 
 February, 1971 
 
 AUTOCORRELATION AND SPECTRAL ANALYSIS 
 
 BALANOVA 5 
 
 BINORMAMIN 
 
 BISERIAL CORRELATION 
 
 CANONICAL ANALYSIS 
 
 CENTROID FACTOR ANALYSIS 
 
 CLASSIFICATION 
 
 CLIQUE ANALYSIS 
 
 COMMUNALITY ESTIMATION 
 
 CORRELATION 
 
 DISCRIMINANT ANALYSIS 
 
 ECONOMETRIC REDUCED FORM AND RESIDUAL ANALYSIS 
 
 FREQUENCY COUNTING - MEASURES OF ASSOCIATION 
 
 FIT (CHI SQUARE GOODNESS-OF-FIT TEST) 
 
 ITERATIVE FACTOR ANALYSIS 
 
 JACOBI (Eigenvalues and Eigenvectors) 
 
 K - CLASS ESTIMATION 
 
 KOLMOGOROV- SMIRNOV 
 
 LINEAR PROGRAMMING 
 
 MATRIX OPERATIONS 
 
 MISSING DATA CORRELATION 
 
 MULTIPLE CORRELATION 
 
 OBLIMAX 
 
 PAIRED COMPARISONS 
 
 PARTIAL CORRELATION 
 
 PRINCIPAL AXIS FACTOR ANALYSIS 
 
 PROBIT (Maximum Likelihood Regression) 
 
 PROCRUSTES 
 
 QUADRATIC PROGRAMMING 
 
 RANDOM NUMBER GENERATOR 
 
 RANKING 
 
 SCALOGRAM ANALYSIS 
 
 SQUARE ROOT FACTOR ANALYSIS 
 
 STANDARD SCORES 
 
 STEP-WISE MULTIPLE CORRELATION 
 
 THREE MODE FACTOR ANALYSIS 
 
 THREE-STAGE LEAST SQUARES 
 
 TRANSFORMATIONS 
 
 T-TESTS 
 
 UNRESTRICTED MAXIMUM LIKELIHOOD FACTOR ANALYSIS 
 
 VARIMAX 
 
Digitized by the Internet Archive 
 
 in 2011 with funding from 
 
 University of Illinois Urbana-Champaign 
 
 http://www.archive.org/details/soupacprogramdesOOuniv 
 
TABi_,E OF CONTENTS 
 
 A. Introduction 
 
 1. Introduction to SOUPAC 
 
 2. Options to a SOUPAC Job 
 
 3. SOUPAC Input-Output and Temporary Storage 
 h. Using Owner Storage Devices with SOUPAC 
 
 5. Cross-Reference 
 
 6. Temporary Program Data Size Restrictions 
 
 B. Data and Matrix Manipulations Package 
 
 1. Introduction 
 
 2. MATRIX 
 
 3. TRANSFORMATIONS 
 
 C. Basic Population Statistics Package 
 
 1 . FREQUENCY 
 
 2. RANK 
 
 3. STANDARD SCORES 
 
 D. Analysis of Variance Package 
 
 1. BALANOVA 5 
 
 2. CLASSIFICATION 
 
 3. DISCRIMINANT 
 k. T-TEST 
 
 E. Correlations and Regression Package 
 
 1. BISERIAL CORRELATION 
 
 2. CANONICAL ANALYSIS 
 
 3. CORRELATION 
 
 k. MISSING DATA CORRELATIONS 
 
 5. MULTIPLE CORRELATIONS 
 
 6. PARTIAL CORRELATIONS 
 
 7. STEP-WISE MULTIPLE REGRESSION 
 
 F. Distribution Analysis Package 
 
 1. FIT 
 
 2 . KOLMOGOROV-SMIRNOV 
 
 G. Factor Analysis Package 
 
 1. Introduction 
 
 2 . BINORMAMIN 
 
 3. CENTROID FACTOR ANALYSIS 
 h. COMMUNALITY ESTIMATION 
 
 5. ITERATIVE FACTOR ANALYSIS 
 
 6. JACOBI 
 T • OBLIMAX 
 
 8. PRINCIPAL AXIS FACTOR ANALYSIS 
 9 • PROCRUSTES 
 
 10. SQUARE ROOT FACTOR ANALYSIS 
 
 11. THREE MODE FACTOR ANALYSIS 
 
 12. UNRESTRICTED MAXIMUM LIKELIHOOD FACTOR ANALYSIS 
 
 13. VARIMAX FACTOR ROTATION 
 
TABLE OF CONTENTS 
 (continued) 
 
 H. Econometrics Package 
 
 1. ECONOMETRIC REDUCED FORM AND RESIDUAL ANALYSIS 
 
 2. K-CLASS ESTIMATION 
 
 3. LINEAR PROGRAMMING 
 
 k. QUADRATIC PROGRAMMING 
 
 5. THREE STAGE LEAST SQUARES ESTIMATION 
 
 I. Spectral Analysis Section 
 1 . AUTOCORRELATIONS 
 
 J. Scale Analysis Package 
 
 1. CLIQUE ANALYSIS 
 
 2. PAIRED COMPARISONS 
 
 3 . SCALOGRAM 
 
 K. ProMt Analysis Section 
 1 . PROBIT 
 
 L. Random Number Generation Section 
 1. RANDOM NUMBER GENERATOR 
 
 M. UTILITY 
 
INTRODUCTION TO SOUPAC 
 A USER'S GUIDE 
 
 A SOUPAC program consists, at least, of (l) IBM 360 system statements, 
 and (2) SOUPAC statements. The IBM 360 system statements are used to give 
 instructions to the 360 computer, and the SOUPAC statements are used to 
 describe the types of data manipulations and statistical analyses that 
 are to be performed. The set of SOUPAC statements in one job is called a 
 SOUPAC parameter deck, and it will include references to one or more 
 individual SOUPAC programs. Note all cards should be punched on an IBM 029 
 keypunch . 
 
 Every 360 system statement must be on a separate card and must have the 
 characters // or /* in columns 1 and 2. In particular, the system cards 
 for a SOUPAC job are listed here, in the order they must appear: 
 
 CARD 1 /*ID accounting information 
 CARD 2 // EXEC SOUPAC 
 CARD 3 //SYS IN DD * 
 
 SOUPAC parameter deck 
 CARD k /* 
 
 Note that these four cards are an absolute minimum. 
 
 CARD 1 
 
 The /*ID card is punched on a yellow striped card available in the 
 keypunch area. This card has no clipped corners and is used only as an 
 ID card for jobs to be run on the 36O. The first five columns of this card 
 must contain the characters /*ID followed by a blank. This is the only 
 card in your deck which must be punched on a special card; the rest of your 
 deck may be punched on any of the standard corner cut cards found in the 
 keypunch area. 
 
 Sample ID cards : 
 
 /*ID PS=9999,DEPT=AGRON,NAME=SMITH 
 
 /*ID NAME= ' B0B_SMITH ' , PS=8899, DEPT=VOTEC, LINES=5000 
 
 /*ID C0DE=SWITCH, PS=9988,NAME=STEPPENWULF,DEPT=REC 
 
 The accounting information consists of the following keywords and 
 responses : 
 
 KEYWORD RESPONSE 
 
 necessary 
 
 PS= your problem specification number 
 
 DEPT= your department 
 
 NAME= your name 
 
KEYWORD 
 
 RESPONSE 
 
 optional 
 
 (min, sec) 
 TIME=(,sec) 
 min 
 
 IOREQ=n 
 
 CODE=ZZZZZ 
 
 LINES=XXXXX 
 
 CARDS =XXXX 
 
 the optional TIME parameter on the 
 ID card is the estimate of execution 
 time for the job. By default it is 
 1 minute. The time estimate on the 
 ID card is the time estimate for the 
 SOUPAC job. If there is a time 
 estimate on the // EXEC SOUPAC card 
 there must be at least as much time 
 on the ID card. 
 
 where n is the number of INPUT/OUTPUT 
 requests. The default is 1000. 
 Moving from cards to a temporary 
 storage location generates one I/O 
 request per observation. Moving 
 to or from a temporary storage 
 location generates two I/O requests 
 per observation. Cards read in or 
 lines printed out do not generate 
 I/O requests. The interactive nature 
 of SOUPAC programs often necessitates 
 considerable amounts of data transfer. 
 For this reason, I/O requests for 
 SOUPAC jobs are relatively large. 
 
 code is the signal that your problem 
 specification number is code word 
 protected. If you have code word 
 protection, get the code word from 
 your approved problem specification 
 request form. If you do not have 
 code word protection, do not use the 
 keyword CODE. 
 
 you requested XXXXX lines of output. 
 The default is 2000. 
 
 you requested XXXX cards punched. 
 The default is NONE. 
 
 There must be no blank spaces in the accounting information and the order 
 of the accounting information makes no difference. Blanks in a name, such 
 as JOHN USER, are not allowed. The name must be only one word or of the 
 form JOHNUSER. If blanks are desired, then the name must be enclosed in 
 apostrophes and an underscore substituted for the blank, e.g. 'J0HN_USER'. 
 
 Note further that commas separate keywords from responses to keywords 
 and that the equal sign is a part of the keyword response. Furthermore, 
 your ID card parameters may not go beyond column 71 • To continue an ID card 
 put a comma after the last parameter that will fit completely into 71 columns 
 
-3- 
 
 or less and proceed on a second card which has the characters /* punched in 
 columns 1 and 2, followed by at least one blank. This second card may be 
 any standard corner cut card. 
 
 Further information about additional parameters is available from either 
 the Service Programming Area or the SOUPAC office. 
 
 CARD 2 
 
 The // EXEC SOUPAC is punched as in the example on page 1 with two 
 slashes in columns 1 and 2? followed by at least one blank. This is 
 followed by the word EXEC, at least one blank, and then the word SOUPAC. 
 
 There is in fact another way to get a SOUPAC job. One can say: 
 
 // EXEC SOUP 
 //SYSIN DD * 
 
 The essential differences are these: SOUPAC has 15 available temporary 
 storage locations; SOUP has only 5 temporary storage locations. SOUP, 
 however, runs about five seconds faster than SOUPAC and is generally 
 recommended for users who do not require more temporary storage locations 
 than SI through S5. 
 
 The amount of memory to be used is specified on each // EXEC SOUPAC 
 or // EXEC SOUP card rather than on the ID card. SOUPAC jobs are given 
 a default region of 150K which is adequate for nearly all SOUPAC jobs. 
 See sections on options to a SOUPAC job. 
 
 CARD 3 
 
 The next card is copied as in the example with //SYSIN beginning in 
 column 1 and extending through 7 without any blanks. This is followed by at 
 least one blank, the two characters DD, at least one blank, finally the 
 character *. 
 
 SOUPAC Statements 
 
 The SOUPAC parameter deck is divided into two main sections. The 
 first consists of the problem program parameter cards. Individual problem 
 program write-ups are available to users through the SOUPAC office. The 
 last card of the problem program parameter cards must always contain the words 
 END SOUPAC as the first non-blank characters. The END SOUPAC card is 
 immediately followed by any data decks you may have. All data decks are 
 preceded by a DATA format card and followed by an END# card. The format 
 card has the word DATA as the first four non-blank characters. After the word 
 DATA may be any comments the user may want, if any. These comments will 
 be printed on the user's output, but will be otherwise ignored by SOUPAC. 
 Next comes the data card parameters. The user must specify the number of 
 variables per row of data to be input, and also may optionally specify the 
 number of rows of data( observations ) . Following this information is a 
 standard FORTRAN type format enclosed in parentheses. This is the format 
 
-k- 
 
 which describes the structure of the data deck it precedes. The following 
 forms of the data format card are valid: 
 
 DATA(observations , variables ) (format string£_592 non-blank characters) 
 
 DATA (, variables ) (format string£_592 non-blank characters) 
 
 DATA (variables ) (format string£592 non-blank characters) 
 
 Remember that all data decks must be terminated by END# as the first 
 four non-blank characters on a single card. 
 
 Data used for calculations should be read in either E or F format. 
 Data not used for calculations can be read in any format. 
 
 END CAEDS 
 
 Within a SOUPAC parameter deck, three types of END cards are used: 
 
 END#: This card is used to end each data deck which appears. For each 
 DATA format card used, there must be a corresponding END# card 
 at the end of the data deck. 
 
 END S: This card appears once and only once per SOUPAC parameter deck 
 
 and must appear as the last card in the parameter deck. Its function 
 is to separate the SOUPAC parameter deck from the data decks. 
 
 END P: This card is used to indicate the end of any program which requires 
 subparameter cards and must appear for all such programs. (in this 
 context $-control cards are not taken to be subparameter cards). 
 Any program which needs the END P card explicitly states so within 
 its individual program write-up. For all other programs, use of the 
 END P card is an error. If a program which uses the END P card 
 is also the last program to be executed it must be terminated by an 
 END P card then an END S card must appear to indicate that no more 
 cards follow. 
 
 CARD h 
 
 The last physical card in your deck must contain the characters /* 
 in columns 1 and 2. No other punches should appear on this card. 
 
 Sample SOUPAC deck setup: 
 
 //ID NAME=SMITH,DEPT=PEM,PS=6894,LINES=5000,CARDS=500,IOREQ=10000 
 // EXEC SOUPAC 
 //SYSIN DD * 
 
 program 1 ()()()( ) 
 
 program 2 ()()()( ) 
 
 program n ()()()( ) 
 
 END SOUPAC 
 DATA 
 
 END# 
 /* 
 
-5- 
 
 Note that there may he more prog] ams than data decks or more data decks 
 than programs. In any case, for programs that read data decks, the user 
 must insure that the order of data decks corresponds to the order of 
 programs that read data decks since the first program to read a data deck 
 will read the first available deck, and so on. The process continues until 
 either the SOUPAC program list is exhausted or a program reading data cannot 
 find a data deck available. Notice that extra data decks are ignored. 
 
 A FINAL NOTE 
 
 Any changes the user may want to specify in the job control language 
 of SOUPAC must follow the standard OS/360 conventions. 
 
-1A- 
 
 OPTIONS 10 A SOUPAC JOB: 
 PARMS, PROLOG CARDS, AND $-CONTROL CARDS 
 
 PARMS 
 
 Parms are arguments to the keyword 'Parameter,' contracted into 
 the keyword 'PARM' , which give instructions to a processor running under a 
 360-system. In this context, SOUPAC is a processor running under a 360 
 system. PARMS are always coded on an EXEC card and have the following form: 
 
 / / EXEC SOUPAC , PARM= ' 0PT1 , 0PT2 , , OPTm ' 
 
 The permissable options to he used as SOUPAC PARMS are listed below with an 
 explanation of their use and function. Note that the default is underlined, 
 that is, // EXEC SOUPAC is equivalent to // EXEC SOUPAC ,PARM='0PT1,0PT2. . . 
 where the underlined PARM is to be taken as one of the list of options in the 
 PARM string in the example. These PARMS give the SOUPAC system instructions 
 in the same way that parameters give SOUPAC statistical or data management 
 programs instructions. 
 
 1. NODYNAM or DYNAM 
 
 NODYNAM implies that a non-dynamic ally allocatable version of the 
 library of statistical procedures is to be used. This version will 
 run in some 150K of core and will handle a lesser number of 
 variables than the dynamically allocatable version. DYNAM will use 
 the dynamically allocatable version of any program requested which 
 will handle more variables in an arbitrarily specified amount of 
 core above a certain minimum. If using DYNAM, see the SOUPAC 
 consultants for a handout on optimal region sizes for particular 
 numbers of variables. 
 
 2. EXECUTE or NOEXECUTE 
 
 NOEXECUTE implies that the SOUPAC parameter deck, for which the 
 Syntax Interpreter is to scan and build intermediate parameters, 
 should not be executed. NOEXECUTE indicates that only a syntax 
 check is to be performed. If EXECUTE is specified and no errors 
 are found by the Syntax Interpreter, the job step will proceed. 
 If EXECUTE is specified and errors are found by the Syntax Inter- 
 preter, execution of the step may continue depending upon 
 whether LET or NOLET is also specified. 
 
 3. NOLET or LET 
 
 If an error is found by the Syntax Interpreter and EXECUTE has 
 been specified, execution will proceed only if LET was also 
 specified. In this case, execution will proceed only through the 
 last program processed which was completely error free. If NOLET 
 was specified and errors are found by the Syntax Interpreter, 
 execution will not be permitted. 
 
 h. LIST or NOLIST 
 
 LIST indicates that all program cards are to be listed. NOLIST 
 indicates that only the prolog section of the SOUPAC parameter 
 deck is to be listed. 
 
-2k- 
 
 5- PGM or NOPGM 
 
 PGM indicates that a complete SOUPAC parameter deck and data decks 
 follow. NOPGM indicates that only the prolog section and data 
 deck follow, and that the intermediate parameters are being provided 
 by the user by over-riding the cataloged procedure. This implies 
 that the user has previously run a SOUPAC job and has saved the 
 two necessary data sets so that he may run the same program again. 
 To perform this saving of data sets correctly, a user should visit 
 the SOUPAC office first to ensure it is done correctly. 
 
 If any error is found by the Syntax Interpreter in the prolog section, 
 the job step will not continue. 
 
 If the job step which generated the intermediate parameter data 
 sets found syntax errors, execution of the job step in which 
 NOPGM is specified will continue (if EXECUTE is specified) through 
 the last program processed which was completely error free regard- 
 less of whether LET or NOLET was specified in either job step. 
 
 Examples : 
 
 To do just a syntax check: 
 
 // EXEC SOUPAC ,PARM= ' NOEXECUTE ' 
 To execute up to the first program found to have syntax errors : 
 
 / / EXEC SOUPAC ,PARM= ' LET ' 
 
 To execute up to the first program found to have syntax errors and 
 use the dynamically allocatable library: 
 
 / / EXEC SOUPAC , PARM= ' LET , DYNAM ' . 
 
 Note that the PARMS may be listed in any order. 
 
 B. PROLOG OF A SOUPAC JOB 
 
 Described below are several # control cards which may appear in the 
 prolog of a SOUPAC job. Within the prolog these control cards may 
 appear in any order. If prolog control cards are used, they must appear 
 immediately after the SYSIN card. The Syntax Interpreter determines 
 the end of the prolog when it reads a card which is not one of these 
 types. All types have parameters and must be terminated by a period . 
 Prolog cards may not have continuation cards , hence all parameter 
 information must be punched within 80 columns. There is no limit 
 to the number of prolog cards permitted nor is there any restriction on 
 the number of any one type. If conflicting information is entered, the 
 information entered last overrides any previous definitions. 
 
 1. ^REPEAT OPTION 
 
 The #REPEAT OPTION is used to repeat sections of a SOUPAC 
 parameter deck an optional number of times. The #REPEAT card 
 
-3A- 
 
 which appears in the prolog section will be followed by up to 
 22 (twenty-two) integer parameters which will indicate the number 
 of repetitions of up to 22 repeat sequences. The card sequences to 
 be repeated will be preceded and followed by #SREP and #EREP 
 cards respectively. Example: 
 
 /*ID 
 
 // EXEC SOUP 
 
 //SYSIN DD * 
 
 #REPEAT (2). 
 
 <additional program cards> 
 
 #SREP 
 
 CORRELATION (c)( )(Sl). 
 
 SQUARE ROOT FACTOR ANALYSIS (Si) (P(F) ) (20) (C)(P(F) ) . 
 
 #EREP 
 
 END S 
 
 In this example the program sequence of CORRELATION and SQUARE ROOT 
 FACTOR ANALYSIS will be repeated twice. Four card input data 
 sets would be required for the repeated sections. 
 
 Repeat sequences which begin before a main program and end in 
 a subprogram or which begin in a subprogram and do not end in the 
 same subprogram are not allowed. Nested or overlapping repeat sequences 
 are not allowed. Also a #SREP card cannot be immediately followed 
 by a #EREP card and a single appearance in the deck of either card 
 will cause an error. 
 
 2. #V-UNIT OPTION 
 
 #V-UNIT allows the user to change input and output addresses 
 in the execution of one SOUPAC job. The form of a #V is as follows: 
 
 #Vn (m) (A 1 ) (AjJ. 
 
 where n is an integer 1 through 9» thus there can be at most 9 
 variable addresses, namely VI through V9; and m is a counter which 
 determines how many times a variable address may be used before it 
 
 assumes the next value in its list of possible values. Aj_ A^ 
 
 are addresses which Vn assumes. These can be any valid address. 
 
 At the moment, however, forms like (Sl/P) will not work. Note that . 
 
 CARDS and PRINT are permitted. 
 
 Finally, the list of addresses is cyclic; that is, if, after 
 A^ has been used, Vn occurs again in the program, Vn will have the 
 value A]_ , and so on . 
 
 /•ID 
 
 // EXEC SOUP 
 
 //SYSIN DD * 
 
 #V9(D(S1)(S2)(S3)(SU). 
 
 #V5(1)(S1)(S2)(S3)(SU). 
 
 #REPEAT (k). 
 
 MAT. 
 
-UA- 
 
 ( Example, Continued) 
 
 #SREP 
 
 MOV (C)(V9). 
 
 #EREP 
 
 HOR (V5)(V5)(V5)(V5)(S5' 
 
 END P 
 
 END S 
 
 This program segment reads k separate card decks, saving them in 
 temporary storage, and horizontally augments them into one data set. 
 
 The equivalent without the use of #REPEAT and #V would be as follows 
 
 /•ID 
 
 // EXEC SOUP 
 //SYSIN DD * ' 
 MAT. 
 
 M0V(C)(S1) • 
 M0V(C)(S2) • 
 M0V(C)(S3) • 
 MOV(C)(SU) • 
 
 H0H(B1)(S"2)(S3)(SU)(S5). 
 
 END P 
 
 END S 
 
 Note that the M0V(C)(V9). statement is expanded into four move statements 
 and V9 takes the values SI through SU . Similarly, V5 takes on the 
 values of SI through Sk. 
 
 3- # OLD OPTION 
 
 The #0LD option is used to define the number of rows in a 
 sequential data set created by a previously run SOUPAC job. The 
 number of rows is then entered into a table in the monitor. This 
 option should be used whenever the header record on the data set 
 is not known to have a correct value for the number of rows, and the 
 user does not want to execute a MATRIX MOVE to count the rows. To 
 use the option, punch a card with #OLD in the first four columns. 
 Then code the address and the number of rows in the usual SOUPAC 
 fashion. The number of columns may be coded on the card if desired, 
 but will be totally ignored. Include this card in the prolog section 
 of the SOUPAC job. 
 
 For example, to indicate that a data set to be input from 
 SEQUENTIAL 1 has 77 rows you would prepare the following card: 
 
 #OLD (Sl)(77)- 
 
-5A- 
 
 k. #TEST OPTION 
 
 There is also available a #TEST option; however, this facility- 
 is complicated and intended for testing purposes within the SOUPAC 
 office and has no significant advantage for the general user. 
 
 5. #DEFINE OPTION 
 
 Whenever the user wishes to specify the dimensions of a direct access 
 data set (DISK address), punch #DEFINE in the first seven columns of 
 a card followed by the address, number of rows and number of columns 
 coded in the usual SOUPAC fashion. Include this card in the prolog 
 section of your program. For double precision matrices, code the same 
 number of rows, but twice as many columns as otherwise. DISK 1 and DISK 2 
 have default definitions of 1+50 rows by k^O columns single precision. 
 If the user desires any other dimensions on these data sets , #DEFINE must 
 be used. If the user desires to use any DISK address other than DISK 1 
 and DISK 2, #DEFINE must be used besides supplying the necessary DD 
 cards . 
 
 For example, to define a data set for DISK 17 with 20 rows and Ho 
 columns double precision, you would prepare the following card: 
 
 #DEFINE (DISK 17)(20)(8o). 
 
 Notice that all prolog cards start with a # in column one and must occur 
 before any SOUPAC program parameter cards. A #-card in the middle of 
 the SOUPAC program parameter deck is treated as a comment. There is, 
 however, a #-control card, while not strictly a prolog card, which may 
 occur in the SOUPAC program parameter deck and will not be treated as 
 a comment. This is the #-zero card and is the only exception to the 
 statement about # cards being comments if in the middle of the deck. 
 The #-zero card is essentially a debugging tool to facilitate reading 
 of dumps if one is needed. It has no particular use for the user. 
 
 C. $- CONTROL CARDS 
 
 $-C0NTR0L CARDS are used to provide additional information 
 to a SOUPAC program above and beyond what is included in the parameters. 
 There are 3 $-control cards. All must begin in column one with the character 
 $ and then continue accross the card without blank columns. 
 
 1. $C-B 
 
 The $C-B card provides as its arguments the variables to 
 be used as control breaks for a program which accept control 
 breaks. The use of this card with a program which does 
 not accept control breaks is an error. The form of this card 
 is as follows: 
 
 $C-B(V 1 )(V 2 ) (V n ). 
 
 When V-j_ through V n are variable numbers and n must be less than or 
 equal to 2k. 
 
-6a- 
 
 2. $INP 
 
 $INP has as its arguments a string of input addresses. 
 The form is: 
 
 $INP(A 1 ) (A ). 
 
 where A-[_ through A n are input addresses including cards. 
 The number of addresses will be determined by the program 
 accepting the $INP card and will explicitly mentioned in the 
 program write-up. 
 
 3 • $OUT 
 
 $OUT(A n ) (A ). 
 
 I 1 n 
 
 $OUT has as its arguments a string of output addresses. The 
 
 form is the same as that for $INP and the number of addresses 
 
 is also determined by the program accepting the $OUT card. 
 
 Multiple output address will be accepted. See section on 
 
 Input/Output multiple addresses. 
 
-1B- 
 
 SOUPAC INPUT-OUTFJT AND TEMPORARY STORAGE 
 
 I. GENERAL 
 
 A. Input and Output as Data Types 
 
 Consider a set of data which a researcher wants intercorrelated. 
 To do correlations there is in the SOUPAC library of statistical precedures 
 a correlation program. Input to the correlation program is the researcher's 
 raw data; output from the correlation program is a matrix of correlation 
 coefficients. Similarly, every conceivable program has a particular input; 
 in fact, perhaps several inputs, and some output. 
 
 The nature of the input and output of a particular program will 
 depend on the program and its intent. For example, raw data variables 
 are input into a correlation program which outputs a correlation matrix. 
 But a factor analysis program expects as input a correlation matrix, 
 and yields as output a factor matrix. In contrast to the singular 
 relation of the nature of input and output to a particular statistical 
 program, every program finds its input somewhere and must put its output 
 somewhere . 
 
 B. Input and Output as Data Sources 
 
 SOUPAC is designed in such a manner that the researcher can tell 
 any program where his inputs are and where to put his outputs. Punched cards 
 are an obvious input source ; printed pages are an obvious output source . 
 But the nature of a punched card deck input into a correlation program would 
 be that of raw data variables. In the SOUPAC system input and output sources 
 are also called addresses. Thus, a possible input address for a correlation 
 program is cards and a possible output address for correlation coefficients 
 is print. Input and output addresses are parameters to every program in 
 the SOUPAC system. As the researcher reads a particular program write-up 
 he will notice that the order of the parameters determines the nature of 
 his input or output and his supplying an input or output address determines 
 whether or not he uses or gets the particular inputs and outputs. 
 
 II. ELEMENTARY INPUT/OUTPUT ADDRESS AND TEMPORARY STORAGE 
 
 A. Possible elementary input and output addresses in the SOUPAC 
 system are these: 
 
 INPUT: CARDS, SEQUENTIAL 1, SEQUENTIAL 2, . . . 
 . . . SEQUENTIAL 15 
 
 OUTPUT: PRINT, SEQUENTIAL 1, SEQUENTIAL 2 
 . .SEQUENTIAL 15 (See section on i 
 
 punched cards ) 
 
 Again, CARDS and PRINT are obvious sources. SEQUENTIAL 1 through 
 SEQUENTIAL 15, however, are input or output names of 15 temporary storage 
 regions available to the researcher in the SOUPAC system. These 15 
 temporary storage regions are provided for exactly that purpose, temporary 
 storage of data. Notice that with this facility a user can save his 
 
-2B- 
 
 correlation matrix, for example, at SEQUENTIAL 1 and then give SEQUENTIAL 
 1 as an input address to a factor analysis program. Or a researcher can 
 construct a copy of his data on temporary storage and then let any number 
 of programs use the same data as input from the same input address, saving 
 him the effort of making multiple copies of his card deck so that each 
 program would read its own deck. Finally, temporary storage addresses 
 enable the saving of intermediate results for further processing or 
 modification by other programs and thereby enable the researcher to 
 construct his own analysis procedure by providing the appropriate inputs 
 and outputs to the right programs at the right times. 
 
 B. SOUP vs SOUPAC with Respect to Temporary Storage 
 
 There are two ways of invoking the SOUPAC system. One can ask 
 for SOUPAC or SOUP. Note that all 15 temporary storage regions are allocated 
 to SOUPAC, while only SEQUENTIAL 1 through SEQUENTIAL 5 are allocated to 
 SOUP. Asking for SEQUENTIAL 6 through SEQUENTIAL 15 vhen running under 
 SOUP will cause an error and terminate the job. 
 
 All of these input-output addresses may be abbreviated as follows : 
 
 CARDS C 
 
 PRINT P 
 
 SEQUENTIAL 1 SI (or Tl) 
 
 SEQUENTIAL 15 SI 5 (or T15) 
 
 Tl through T15 are alternative abbreviations for SEQUENTIAL 1 through 
 SEQUENTIAL 15- Tl through T15 are, in fact, abbreviations of TAPE 1 
 through TAPE 15- SEQUENTIAL 1 through SEQUENTIAL 15 and their abbreviations 
 are the recommended uses. The Tl through T15 notation reflects a real 
 technical distinction but has been kept to enable programs using that notation 
 to run. 
 
 C. Multiple Output Addresses 
 
 A researcher may want to output to several sources : he may desire 
 to both print and save some results for later use. He cannot, however, 
 input from more than one source for a particular input address. The facility 
 of multiple output addresses has the following construction: 
 
 (output addressl/output address^/output address3). 
 
 This is the completely general form providing for up to three separate 
 outputs. Each output must be a different source, however. Thus, (Sl/P/X) 
 is a valid multiple output address providing for temporary storage at SI, 
 
-3B- 
 
 a print of the same data, and a j unched copy of the data. (See section on 
 punch for explanation of X). (Sl/P) will print and store but not punch. 
 The order of the addresses makes no difference. (Sl/P) is equivalent to 
 (P/Sl). Forms such as (S1/S2), however, are not permitted, nor are (P/P) 
 or (X/X): one can output only to one sequential and only once to P or X. 
 
 The above general form is available only if the output address in 
 the particular program is marked with an Q. 
 
 In all cases, however, the form 
 
 (output address l/output address 2) 
 
 is valid unless the program write-up explicitly has a restriction. 
 
 D. Print is F Form of Output Print Address. 
 
 There is yet another form to output addresses. This form is 
 available only where the researcher finds the symbol Q in the program write- 
 up and has to do with the kind of printed output. For technical reasons, 
 most programs print in a form called E-format which is a form of 
 scientific notation. This form allows the computer to print numbers of 
 any size. Some programs, for which the output numbers are known to be 
 constrained, as in correlation coefficients, however, print in a form 
 called F- format which is ordinary decimal number representation. F-format 
 generally cannot print numbers larger than a pre-determined size. The 
 size of number depends on the nature of a researcher's data, but the program 
 has no way of knowing this, hence, the most general form, E-format is used. 
 
 The researcher however, can on option specify F-format. To print in 
 F-format he would use the following output address: 
 
 (P(F)) or (P(F)/S1) if he wanted a multiple 
 
 output address. Those programs which print in F-format already, as for 
 correlation coefficients, can be made to print in E-format by using the 
 following output address: 
 
 (P(E)) or (P(E)/S15). 
 
 The different forms look like this: 
 
 E-format Scientific Notation F-format Decimal Number 
 
 Representation 
 
 + 0.123 1 +5E 06 +1.23^5 x 105 + 123^56. 123^5 +123U56.123U5 
 
 All four numbers have the same value correct to 5 places. Notice that F-format 
 cannot represent a number greater than 999999-99999 in absolute value 
 whereas E-format can represent the first 5 digits of any number of order of 
 magnitude up to 1099 . The numbers of digits illustrated for E and F 
 formats are the pre-determined limits for the size of numbers. E-format is 
 the more general form but F-format is easier to read. 
 
-2B- 
 
 correlation matrix, for example, at SEQUENTIAL 1 and then give SEQUENTIAL 
 1 as an input address to a factor analysis program. Or a researcher can 
 construct a copy of his data on temporary storage and then let any number 
 of programs use the same data as input from the same input address, saving 
 him the effort of making multiple copies of his card deck so that each 
 program would read its own deck. Finally, temporary storage addresses 
 enable the saving of intermediate results for further processing or 
 modification by other programs and thereby enable the researcher to 
 construct his own analysis procedure by providing the appropriate inputs 
 and outputs to the right programs at the right times. 
 
 B. SOUP vs SOUPAC with Respect to Temporary Storage 
 
 There are two ways of invoking the SOUPAC system. One can ask 
 for SOUPAC or SOUP. Note that all 15 temporary storage regions are allocated 
 to SOUPAC, while only SEQUENTIAL 1 through SEQUENTIAL 5 are allocated to 
 SOUP. Asking for SEQUENTIAL 6 through SEQUENTIAL 15 when running under 
 SOUP will cause an error and terminate the job. 
 
 All of these input-output addresses may be abbreviated as follows: 
 
 CARDS 
 
 C 
 
 PRINT 
 
 P 
 
 SEQUENTIAL 1 
 
 SI (or Tl 
 
 SEQUENTIAL 15 S15 (or T15 ) 
 
 Tl through T15 are alternative abbreviations for SEQUENTIAL 1 through 
 SEQUENTIAL 15- Tl through T15 are, in fact, abbreviations of TAPE 1 
 through TAPE 15. SEQUENTIAL 1 through SEQUENTIAL 15 and their abbreviations 
 are the recommended uses. The Tl through T15 notation reflects a real 
 technical distinction but has been kept to enable programs using that notation 
 to run. 
 
 C. Multiple Output Addresses 
 
 A researcher may want to output to several sources: he may desire 
 to both print and save some results for later use. He cannot, however, 
 input from more than one source for a particular input address. The facility 
 of multiple output addresses has the following construction: 
 
 (output addressl/output address^/output address^). 
 
 This is the completely general form providing for up to three separate 
 outputs. Each output must be a different source, however. Thus, (Sl/P/X) 
 is a valid multiple output address providing for temporary storage at SI, 
 
-3B- 
 
 a print of the same data, and a j unched copy of the data. (See section on 
 punch for explanation of X). (Sl/P) will print and store but not punch. 
 The order of the addresses makes no difference. (Sl/P) is equivalent to 
 (P/Sl). Forms such as (S1/S2), however, are not permitted, nor are (P/P) 
 or (X/X): one can output only to one sequential and only once to P or X. 
 
 The above general form is available only if the output address in 
 the particular program is marked with an tt. 
 
 In all cases, however, the form 
 
 (output address l/output address 2) 
 
 is valid unless the program write-up explicitly has a restriction. 
 
 D. Print is F Form of Output Print Address. 
 
 There is yet another form to output addresses. This form is 
 available only where the researcher finds the symbol fl in the program write- 
 up and has to do with the kind of printed output. For technical reasons, 
 most programs print in a form called E-format which is a form of 
 scientific notation. This form allows the computer to print numbers of 
 any size. Some programs, for which the output numbers are known to be 
 constrained, as in correlation coefficients, however, print in a form 
 called F- format which is ordinary decimal number representation. F-format 
 generally cannot print numbers larger than a pre-determined size. The 
 size of number depends on the nature of a researcher's data, but the program 
 has no way of knowing this, hence, the most general form, E-format is used. 
 
 The researcher however, can on option specify F-format. To print in 
 F-format he would use the following output address: 
 
 (P(F)) or (P(F)/S1) if he wanted a multiple 
 
 output address. Those programs which print in F-format already, as for 
 correlation coefficients, can be made to print in E-format by using the 
 following output address: 
 
 (P(E)) or (P(E)/S15). 
 
 The different forms look like this: 
 
 E-format Scientific Notation F-format Decimal Number 
 
 Representation 
 
 + 0.123 i +5E 06 +1.23^5x105 + 123^56. 123^5 +123^56. 123^5 
 
 All four numbers have the same value correct to 5 places. Notice that F-format 
 cannot represent a number greater than 999999-99999 in absolute value 
 whereas E-format can represent the first 5 digits of any number of order of 
 magnitude up to 1099 . The numbers of digits illustrated for E and F 
 formats are the pre-determined limits for the size of numbers. E-format is 
 the more general form but F-format is easier to read. 
 
4b- 
 
 In this example of E-fomiat the E 02 part is to be understood as 
 10 . E 03 would be 103 and E-OU would be 10"^. Thus, .376 E 03 is 
 .376 x 103 or 376. while . 129^-01 is .129U x 10" 1 or .0129 1 +. The sign 
 following the E determines which way to move the decimal point; left for 
 negative, right for blank or positive. The number following the sign or 
 blank determines how many places to move the decimal point. 
 
 E. Punched Output (Don't forget to specify CARDS= on ID Card!) 
 
 All programs which have output addresses marked with the symbol 
 Q can punch output directly by using the X output address. X is the 
 abbreviation for cards as output. C used as an abbreviation for an output 
 address will be an error. Punched output generated by the use of the 
 X output address will be in E- format. (See section above). X(F) is not 
 a valid form and will be an error. 
 
 If punched output is desired in a form other than E-format or 
 from a program which does not allow the X output address, then the 
 researcher must make a copy of his data on temporary storage and go to 
 the MATRIX program and use the PUNCH instruction provided in that 
 program. 
 
 F. Obtaining Additional Input/Output Sources 
 
 It happens that 15 temporary storage locations may not be 
 enough. Additional temporary storage may be obtained by calling 
 for Sl6 through S40 . Use of Sl6 through S*t0 requires the addition of Job 
 Control Cards to the 360 system cards of the SOUP AC program deck. At 
 least the first time the researcher should check with S0UPAC consultants 
 before doing this; firstly to learn to do it correctly if he doesn't know how 
 already, and secondly, if he knows how, to make sure none of the Job Control 
 Language has been changed or modified, which can happen due to 360 system 
 changes or reconfigurations, or S0UPAC system changes, which may not be 
 announced in contrast to SOUPAC program changes which would have been 
 announced. 
 
 If in special instances even Uo temporary storage regions are not 
 sufficient or a situation arises where so-called DISK temporary storage is 
 required, there can be made available temporary storage regions called 
 DISK1 through DISK^O. Check with the SOUPAC consultants before using 
 these for the proper Job Control Cards and the proper SOUPAC prolog cards. 
 
 G. Using Owner Data Sources or Special Input/Output Requirements 
 in the SOUPAC system 
 
 Users' own tapes or disk packs can be used with the SOUPAC 
 system for input or output. See Appendix A for additional information. 
 
-SB- 
 
 Special input/output requirements can usually be handled in the 
 SOUPAC system provided the requirements can be handled by the 360 system 
 at all. In such cases check with the SOUPAC consultants. 
 
 General problem types of the nature alluded to above would be 
 multiple file volumes, blocked input/output, formatted or unformatted 
 input/output, different kinds of record lengths and different forms of data 
 representation due to machine differences or differences in facilities 
 at other computer installations. 
 
February, 1971 
 
 Using Owner Storage Devices With SOUPAC 
 Introduction 
 
 User data storage mediums for our 360 user will typically be dismountable 
 tape or disk packs or they will be user data sets on some dismountable or 
 permanently mounted disk pack. Whatever the source, it must be described fully 
 so that the machine can access it. This information is needed as arguments 
 to certain JCL keywords. These parameters will be treated in a moment. First 
 we want to consider the dismountable devices, tapes and disk packs. User disk 
 packs are not as frequent as user tapes so we shall ignore them for the moment. 
 There are several "physical" characteristics of tapes which pertain to recording 
 techniques which must be considered before we get inot the "logical" attributes 
 described by the JCL parameters in the next section. 
 
 Physical tape parameters are density, number of tracks written and recording 
 mode. Density can be 200, 556, 800, or 1600 bits per inch, abbreviated BPI. Thus 
 a tape written at 800 BPI can store less information per inch than one written at 
 1600 BPI. Note that a tape to be used on the 360 must be written at 1600 BPI. It 
 is true that we have a dual density tape drive, namely 800/1600 BPI accessible to 
 the 360 and that there is a density parameter in JCL enabling one to specify the 
 density but since the 1600 mode is so much faster, generally, to be used on the 360, 
 the tape should be at 1600 BPI. 
 
 The number of tracks used in recording is 7 or 9- The 360 usually uses a 9 
 track recording mode; the 709^ used the 7 track mode. Recording mode is unformatted 
 or formatted but we usually speak of binary tapes or card image tapes, respectively. 
 If it is a card image tape it is BCD or EBCDIC. 
 
 Thus, the most desirable input tape for the 360 is 1600 BPI, 9 track, 360 binary 
 or EBCDIC. If one or more of these characteristics is different, then it should be 
 changed by doing a tape to tape copy with the appropriate conversion. Asking for 
 
-2- 
 
 special exceptions is not advised; i.e. do not try getting permission to write 
 
 800 BPI except in very special circumstances. 
 
 At this installation we can do the following conversions: 
 
 556 BPI to 800 BPI to 1600 BPI and vice versa 
 7 track to 9 track and vice versa 
 BCD to EBCDIC and vice versa 
 
 Note that in going from EBCDIC to BCD some characters may be lost. These conver- 
 sions are at the moment done by the 360/50 by requesting a tape to tape copy for 
 which the above operations are requested. 
 
 We can also do 3o0 binary to EBCDIC conversion and vice versa but we do not 
 have the facility to do anything about a 709*+ binary tape. Not that it cannot be 
 done, but of the two alternative means we know of, one uses IBM conversion routines 
 which would require hardware changes; the other requires a non-trivial program 
 which we do not have and even when we get it, it will be slow and clumsy. 
 
 We repeat that there are JCL keywords to specify density, number of tracks, 
 and parity but it is too slow and clumsy for the 360/75 • 
 
 Clearly the greater the density, the more information can be stored on a tape 
 and the faster it can be read or written. The higher density does however also 
 require a better tape. One should not write 1600 BPI on a tape tested at 800 BPI. 
 Further only a 1600 BPI tested tape can be reliably written with 9 tracks. This 
 test BPI information is usually on the tape reel in the form of a comment saying 
 tested at 1600 BPI or 800 BPI. Note that 1600 BPI and 3200 FCI mean the same thing. 
 
 There are several things to keep in mind when writing a user tape in binary 
 or card image mode. Card image, line image, and formatted tape all mean essentially 
 the same and mean exactly what their names imply. Clearly all these are read and 
 written with a format. Binary and unformatted tapes read and write faster than 
 formatted tapes but may take up more space than formatted tapes. How much data 
 a user wants always accessible and how important time is will probably determine 
 what mode to use. A card image tape may be read many different ways using formats 
 
-3- 
 
 to pick off variables. An unformatted tape has variables immediately usable but 
 perhaps not everything the user has on his cards. He can put everything he may 
 ever want to use on his unformatted tape but he may also be dragging some super- 
 fluous variables around everytime he does a read. 
 
 Choice, though possible, is typically not available for number of tracks on 
 tapes written by the 360. You get 9 track tapes. 
 
 So, the usual 360 output tape is 1600 BPI, 9 track, formatted or unformatted. 
 SOUPAC writes 1600 BPI, 9 track, unformatted tapes. To get a 1600 BPI, 9 track, 
 formatted tape out of SOUPAC the easiest thing to do at the moment is to over-ride 
 the punch statement in the proc, i.e., FT07F001. To input a formatted tape to 
 SOUPAC, over-ride the procedure in an appropriate way and use the MATRIX INPUT 
 instruction. To input a SOUPAC written unformatted tape just over-ride the pro- 
 cedure . 
 
 Over-riding the SOUPAC procedure for using user data storage devices 
 
 SETUP or not SETUP 
 
 Users data exists on permanently mounted volumes such as UIDCS1 or on dismount- 
 
 able volumes such as UIUSRU or DK0013 or user tapes or disk packs. In this context, 
 
 a volume is that physical disk pack or tape which gets mounted onto a disk drive 
 
 or tape dirve. If a volume is permanently mounted you need do nothing at this setup 
 
 stage. If it is dismountable, you need a /^SETUP card for each such volumne among 
 
 your /* cards, if any, after the /*TD card. Tapes always require a /^SETUP card. 
 
 For data sets on disk packs, determine from the user whether it is dismountable or 
 
 not. The form of a /*SETUP card is as follows: 
 
 /*SETUP_UNIT=type, ID=name 
 
 , ID=( name, label) . 
 
 Type is DISK or TAPE: the choice is obvious. 
 
 For tapes, name under ID will be 0XXXXX for an owner tape where XXXXX is some 
 
 tape sequence number or PXXXXX or LXXXXX where P and L are pool and lease tapes 
 
-k- 
 
 rented from DCS. 
 
 A tape may be internally labeled or not. If it is not labeled the label 
 argument is NL, meaning no label. If it is labelled the argument will be SL, 
 meaning standard label. If it is standard labelled, later in the over-ride of 
 the proc you will need to know the DSNAME. 
 
 If no label argument is used for tapes the default is SL. For dismountable 
 disk packs at DCS, the volume name is all that is needed since they are all 
 internally standard labelled. You must find out from a user if his tape or non- 
 DCS disk pack is internally labelled or not. 
 
 Over-riding the SOUPAC proc 
 
 Assuming you now have the user volume mounted, you must still tell SOUPAC 
 about it. Suppose that the user volume is a sequential data set (or possibly 
 more than one sequential data set). Usually they will be. Direct access data 
 sets will be discussed later. You must assign it a SOUPAC sequential data set 
 reference number. Numbers 11 through k9 are available and correspond to the 
 SOUPAC addresses SI through S39 (or Tl through T39) . Which data set reference 
 number you use does not make any difference. For sake of example let us choose 
 SI and over-ride FT11F001 in the SOUPAC proc. Our JCL should look like this to 
 this point : 
 
 /*ID 
 
 /*SETUP UNIT=type,ID= (name, label) 
 // EXEC SOUPAC 
 
 //FT11F001 DD indication string 
 //SYSIN DD * 
 
 The indication string has some or all of the parameters listed below. Use all 
 except those where an option is mentioned. 
 
 UNIT=type type is the same as that on the SETUP card; 
 
 if no setup then it is DISK 
 
-5- 
 
 VOL=SER=name 
 
 namp is the same as that on the SETUP card; 
 if no setup, it is the name of a permanently 
 mounted volume 
 
 LABEL=NL 
 LABEL=SL 
 
 DISP= 
 
 DSNAME= 
 
 same as label argument if used on SETUP card; 
 if not used on SETUP card, can be omitted 
 
 arguments are NEW, OLD, KEEP, DELETE, CATALG, 
 UNCATALG, PASS singly or in combination de- 
 pending on what you are doing 
 
 some name; use if you have a standard labelled 
 tape; otherwise optional depending on what you 
 are doing 
 
 SPACE (kind, (primary, secondary) ) 
 
 kind=TRK 
 CYL 
 
 primary 
 
 secondary 
 
 DCB=arg l=arg 2 
 DCB=(argl, arg 2 , arg 3 ) 
 
 not necessary for an OLD, 
 PASS, or CATALG data set; 
 necessary for a NEW data 
 set; not necessary for a 
 NEW user tape data set 
 
 a TEK holds 7200 bytes 
 
 a CYL has 20 tracks 
 
 there are other kind parameters, but you probably 
 
 should not be using them 
 
 number of kinds of space units you initially 
 request; you get this space at once; if it is 
 not possible to allocate you get a space un- 
 available message 
 
 if you run off the end of your primary allocation 
 you get 15 extends of the number you specify as 
 secondary of the space units you specify as kind 
 
 if the machine cannot give you 15 extents or 
 if after 15 extents you still need space you 
 get a space unavailable message 
 
 optional depending on what you are doing 
 
 DCB arguments are as follows : 
 RECFM= 
 
 F fixed record format (used for formatted I/O 1 ) 
 V varying record format (usually used for 
 
 unformatted i/O) 
 U undefined record format (used for formatted 
 
 or unformatted I/O) 
 B blocked record format 
 A alphameric record format; usually just on SYSOUT 
 
 these may be used in combination as in RECFM=FB. Note if B is used 
 you must also use LRECL and BLKSIZE. 
 
 LBFCL=n n is the logical record length in bytes 
 BLKSIZE=m m is the blocksize in bytes 
 
-6- 
 
 If no DCB is specified you get the OS default DCB. You may not necessarily 
 want this. Ascertain from the user what DCB he has and use it. For more in- 
 formation on DCB's check your FORTRAN users guide. But, we mention several pit- 
 falls already encountered and several points of information. SOUPAC by default 
 uses : 
 
 dcb=(recfm=v,lrecl=796,blksize=8oo^ 
 
 If you write a formatted data set with DISP=PASS you get RECFM=F whether you 
 specify it or not and SOUPAC, unless you specify DCB, will kick you off with the 
 message that it has an F or U RECFM but expecting V. If you had a tape written 
 with format conversion in another job and did not specify DCB=RECFM=F you will 
 probably get an illegal decimal character message. If you give SOUPAC a data 
 set with DCB=RECFM=F some way or another, SOUPAC will be unable to write on it 
 since it writes unformatted unless you use the MATRIX PUNCH instruction and over- 
 ride FT07F001. If you over-ride FT07F001 remember you get a SOUPAC DATA card 
 image and an END# card image. If you write large formatted strings or many vari- 
 ables, you should check on default LRECL and BLKSIZE. If your record is larger 
 than LRECL, you just spill onto the next logical record and the machine will worry 
 about it. If your record is larger than BLKSIZE, you get at best that piece of 
 the record that fits into BLKSIZE and the rest is lost. There are several tricky 
 things that are going on here, but usually you need not worry about them. If 
 your record is shorter than LRECL, then most of LRECL is unused and you may be 
 wasting space. This becomes important when you start getting space unavailable 
 or end of volume messages. Furthermore it is inefficient, though it is often much 
 more convenient just to take the defaults. Again, consult your FORTRAN user's 
 guide for more details. 
 
EXAMPLES 
 
 Write onto a users tape a SOUPAC data set in binary 
 
 /*ID 
 
 /*SETUP UNIT=TAPE,ID=(0XXXXX,NL) 
 
 // EXEC SOUPAC 
 
 //FT12F001 DD UNIT=TAPE,VOL=SER=0'XXXXX,LABEL=I« J ,DISP= (NEW, KEEP) 
 
 //SYSIN DD * 
 
 TRANSFORMATIONS (I A) (T2) (13) 
 
 Write onto a users tape a SOUPAC data set in EBCDIC 
 
 /*ID 
 
 /*SETUP UNIT=TAPE,ID=(0XXXXX,NL) 
 
 // EXEC SOUPAC 
 
 //FT07F001 DD UNIT=TAFE,VOL=SER=0XXXXX,LABEL=NL,DISP=( NEW, KEEP) 
 
 //SYSIN DD * 
 
 MATRIX. 
 
 PUNCH (IA)" (format )" 
 
 note that you will get the SYSPUNCH default DCB which limits you to card images, 
 i.e., 80 bytes per record. For larger records use your own DCB. 
 
 Read a users formatted tape into SOUPAC 
 
 /*ID 
 
 /*SETUP UNIT=TAPE,ID=0XXXXX,SL^ 
 
 // EXEC SOUPAC 
 
 //FTU9F001 DD UNIT=TAPE,VOL=SER=0XXXXX,LABEL=SL,DISP= (OLD, KEEP), 
 
 DSN=USER . PYXYY . TPDATA, DCB= (RECFM=F, LRECL=80, BLKSIZE=800) 
 //SYSIN DD * 
 MATRIX . 
 INPUT(T39^ (Tl) (NCOL^ " (format ) " 
 
 note that you are reading a card image tape with 10 card images per physical 
 record, i.e., blocking factor of 10 to 1. 
 
Read a users unformatted tape into SOUPAC written by SOUPAC 
 
 /*ID 
 
 /*SETUP UNIT=TAPE, ID=(0XXXXX, NL^ 
 
 // EXEC SOUPAC 
 
 //FT13F001 DD UNIT=TAPE,VOL=SER=0XXXXX,DISP=(OLD,KEEP),LABLE=NL 
 
 //SYSIN DD * 
 
 COR(T3)(P)(P). 
 
 note this presupposes that SOUPAC wrote the tape originally 
 Read a users unformatted tape into SOUPAC not written by SOUPAC 
 
 /*ID 
 
 /*SETUP UNIT=TAPE,ID=(0XXXXX,NL) 
 
 // EXEC SOUPAC 
 
 //FT11F001 DD IMIT=TAPE,VOL=SER=0XXXXX,DISP= (OLD, KEEP), LABEL=NL, 
 
 DCB= (RECFM=V, LRECL=U00, BLKSIZE=i+Oi+) 
 //SYSIN DD * 
 #0LD(T1) (nrow , i (NCOL) . 
 MATRIX . 
 MOVE(Tl) (T2^ . 
 
 note first of all you need the #OLD card to simulate a SOUPAC header record. 
 Further each record must contain three extra integer *h words at the front of 
 the record to correspond to SOUPAC records. Also, if it was written with default 
 DCB you do not need to specify a DCB. Finally, the #OLD card is good to use if 
 you are having header record problems in SOUPAC anyway. 
 
 Read a user data set from a dismountable disk 
 
 /*ID 
 
 /*SETUP UNIT=DISK, ID=DK0291 
 
 // EXEC SOUPAC 
 
 //FT12F001 DD UNIT=DISK,V0L=SER=DK0291,DSN=USER.PXXX.name, 
 
 DISP=(OLD,KEEP\DCB=(RECFM= ,LRECL= ,BLKSIZE= ) 
 //SYSIN DD * 
 
 use input if formatted; otherwise go through the appropriate sequence for un- 
 formatted reads. To write the data set, use the SPACE parameter; change the 
 DTSP parameter appropriately; and consider what kind of write you want and fix 
 the over-ride appropriately. 
 
 If you intend to generate data in SOUPAC and wish to use the data in any context 
 other than SOUPAC, it is highly recommended that you do not simply save the binary 
 image file but generate a formatted record data set. Formatted record data sets 
 interface more completely with other systems. 
 
CORRECTION TO HANDOUT ON USING OWNER STORAGE DEVICES WITH SOUPAC 
 
 On page 5 the statements: 
 
 LABEL=NL 
 LABEL=SL 
 
 should read 
 
 LABEL=(,NL) 
 LABEL=(,SL) 
 
 therefore also in the examples every statement which has in it an occurance of 
 the phrase LABEL=NL or LABEL=SL should have that phrase changed to LABEI>(,NL) 
 or LABEL=(,SL), respectively. 
 
 For your information : 
 
 LABEL=(,NL , l or LABEL=(,SL) is equivalent to LABEL=(l,NL) or LABEL=(l,SL) 
 which in fact means file 1 on the volumne. Thus to access file 2 you would 
 write LABEL=(2,NL) or LABEL=(2, SL"! , or in general, LABEL=(n,XX) where n is 
 the file number and XX is NL or SL. 
 
 10/27/69 
 
CROSS REFERENCE 
 
 Statistical Technique 
 or Estimate 
 
 Alpha factor analysis 
 
 Analysis of Variance 
 
 Analysis of Covariance 
 
 Autocorrelations 
 
 Beta Coefficients 
 
 Biserial Correlations 
 Canonical Correlations 
 Canonical Factor Analysis 
 Centroid Factors 
 Chi-Square 
 Clique analysis 
 Communality estimation 
 Correlations 
 
 Covariance, Analysis of (See 
 Analysis of Covariance) 
 
 Covariance Matrix 
 
 Covariance Matrix of Coefficients 
 
 Covariance Matrix of reduced 
 form residuals 
 
 Cross-products matrix 
 Cross-Tabulation 
 
 SOUP AC Source 
 
 ITERATIVE 
 
 BALANOVA, T-TEST 
 
 T-TEST, UMAVAC* 
 
 AUTOCORRELATIONS 
 
 CANONICAL, CORRELATIONS, MULTIPLE, 
 STEP-WISE 
 
 BISERIAL 
 
 CANONICAL 
 
 ITERATIVE 
 
 CENTROID 
 
 CANONICAL, CHI-SQUARE, FREQUENCY, PROBIT 
 
 CLIQUE 
 
 COMMUNALITIES 
 
 AUTOCORRELATIONS, BISERIAL, CANONICAL 
 CORRELATIONS, DISCRIMINANT, MISSING, 
 MULTIPLE, OBLIMAX, PARTIAL, STEP-WISE 
 
 CORRELATION, K-CLASS , MULTIPLE, STEP-WISE 
 K-CLASS 
 
 ECONOMETRIC 
 
 CORRELATIONS, DISCRIMINANT, K-CLASS, 
 MULTIPLE 
 
 FREQUENCY 
 
 * See SOUPAC Consultants about using UMAVAC. 
 
Statistical Technique 
 or Estimate 
 
 Data Processing 
 
 Determination of a matrix 
 
 Determination, Coefficient of 
 
 Deviation covariance matrix 
 
 Deviation (partial) correlation 
 matrix 
 
 Diagonal Method Factor Analysis 
 
 Dichotomous data 
 
 Discriminant Function 
 
 Discriminant Scores 
 
 Dispersion Matrix 
 
 Distribution Analysis 
 
 Durbin-Watson Coefficients 
 
 Dwyer Extension analysis 
 
 Eigenvalues 
 
 Eigenvectors 
 F-Ratios 
 
 Factor Analysis 
 
 Factor Scores (See Factor Analysis) 
 
 Frequency Distributions 
 
 Gamma Coefficients 
 
 Goodman and Kruskal Coefficients 
 
 Iterative Principal Axis Solution 
 
 K-Class Estimation 
 
 Kolmogorov-Smirnov Statistic 
 
 SOUPAC Source 
 
 MATRIX, STANDARD, TRANSFORMATIONS 
 
 MATRIX 
 
 MULTIPLE, STEPWISE, K-CLASS 
 
 MULTIPLE 
 
 MULTIPLE 
 
 SQUARE 
 
 BISERIAL, SCALOGRAM 
 
 DISCRIMINANT 
 
 DISCRIMINANT 
 
 DISCRIMINANT 
 
 FIT, FREQUENCY, KOLMOGOROV 
 
 ECONOMETRIC, MULTIPLE 
 
 PROCRUSTES 
 
 DISCRIMINANT, JACOBI , K-CLASS 
 PRINCIPAL AXIS, THREE-MODE 
 
 JACOBI, PRINCIPAL AXIS, THREE-MODE 
 
 BALANOVA, DISCRIMINANT, MULTIPLE, 
 STEP-WISE, T-TEST 
 
 COMMUNALITIES , ITERATIVE , OBLIMAX , 
 UNRESTRICTED, VARIMAX, THREE-MODE 
 
 FREQUENCY 
 
 FREQUENCY 
 
 FREQUENCY 
 
 ITERATIVE 
 
 K-CLASS 
 
 KOLMOGOROV 
 
Statistical Technique 
 or Estimate 
 
 Lambda Coefficients 
 
 Latin Squares 
 
 Least Squares Factor Fitting 
 
 Limited Information Estimation 
 
 Linear Programming 
 
 Means 
 
 Missing Data Correlations 
 
 Moving Averages 
 
 Multiple correlation 
 
 Noncard, non SOUPAC generated input 
 
 Oblique Factor Rotation 
 
 Orthogonal Factor Rotation 
 
 Orthogonal Simple Structure 
 
 Paired Comparisons 
 
 Partial Correlation Coefficients 
 
 Phi Coefficients 
 
 Point Biserial Correlations 
 
 Predicted Values 
 
 Primary Factor Pattern 
 
 Principal Components (See Factor 
 Analysis ) 
 
 Probit Analysis 
 
 Punched Card Output 
 
 Quadratic Programming 
 
 Random Number, Generation of 
 
 SOUPAC Source 
 
 CANONICAL, DISCRIMINANT, FREQUENCY 
 
 UMAVAC 
 
 PROCRUSTES 
 
 K-CLASS 
 
 LINEAR 
 
 CORRELATIONS, DISCRIMINANT K-CLASS, 
 MISSING, MULTIPLE, T-TEST, STANDARD 
 
 MISSING 
 
 STANDARD 
 
 CANONICAL, MULTIPLE, STEP-WISE 
 
 MATRIX (INPUT operation) 
 
 BINORMAMIN, OBLIMAX 
 
 VARIMAX 
 
 VARIMAX 
 
 PAIRED 
 
 PARTIAL 
 
 Use CORRELATIONS 
 
 CORRELATIONS 
 
 ECONOMETRIC, MULTIPLE 
 
 BINORMAMIN, OBLIMAX 
 
 PROBIT 
 MATRIX 
 QUADRATIC 
 RANDOM 
 
Statistical Technique 
 or Estimate 
 
 Rank Order Correlations 
 Recoding Variables 
 Reduced Form Estimates 
 Reduced Form Predicted Values 
 Reduced Form Residuals 
 Reference Vector Structure 
 Regression Analysis 
 Regression Covariance 
 Residual Analysis 
 Rotation of Factors 
 Sample Size 
 
 Serial Correlations 
 
 Spearman ' s Rho 
 
 Spearman ' s C 
 
 Spectrum Analysis 
 
 Square Root Factors 
 
 Standard Deviations 
 
 Standard Errors of Regression 
 Coefficients 
 
 Standardized Data 
 
 Standardized Regression Coefficients 
 
 Stepwise Maximum Likelihood Factor 
 
 Analysis 
 
 Stepwise Multiple Regression 
 T-Statistic (Student's) 
 
 Tchuprow's T 
 
 SOUPAC Source 
 
 RANK 
 TRANSFORMATIONS 
 
 ECONOMETRIC 
 
 ECONOMETRIC 
 
 ECONOMETRIC 
 
 OBLIMAX, BINORMAMIN 
 
 CORRELATION, K-CLASS, MULTIPLE, STEP-WISE 
 
 MULTIPLE 
 
 ECONOMETRIC 
 
 BINORMAMIN, OBLIMAX VARIMAX 
 
 CORRELATION, DISCRIMINANT, FREQUENCY, 
 MISSING, MULTIPLE, STANDARD 
 
 AUTOCORRELATIONS 
 
 RANK 
 
 FREQUENCY 
 
 AUTOCORRELATIONS 
 
 SQUARE 
 
 CORRELATION, MULTIPLE, STANDARD, T-TEST 
 
 K-CLASS, MULTIPLE, STEP-WISE 
 
 STANDARD SCORES 
 
 CANONICAL, MULTIPLE, STEP-WISE 
 
 ITERATIVE 
 
 STEP-WISE 
 
 CORRELATION, MULTIPLE, STEP-WISE 
 T-TEST 
 
 FREQUENCY 
 
Statistical Technique 
 or Estimate 
 
 SOUPAC Source 
 
 Tetrachoric Correlations 
 
 Three Mode Factor Analysis 
 
 Three Stage Least Squares 
 
 Two Stage Least Squares 
 
 Unrestricted Maximum Likelihood 
 Factor Analysis 
 
 Variance (See Standard Deviations) 
 
 Variance, Analysis of (See Analysis 
 of Variance) 
 
 PARTIAL 
 THREE MODE 
 THREE STAGE 
 K-CLASS 
 
 UNRESTRICTED 
 
DATA AND MATRIX MANIPULATIONS PACKAGE 
 
The data manipulative programs 
 
 As implied by the name, TRANSFORMATIONS is used for performing 
 data transformations often necessary in "setting up" data to "be input 
 to one of the "cook-book" programs of the SOUPAC system. The user may 
 create new variables as linear combinations or as algebraic functions of 
 old variables, and the user may recode or alter data values on the basis 
 of test conditions by using the TRANSFORMATIONS program. To perform 
 matrix algebra operations with one or more matrices, to augment matrices either 
 row-wise or column-wise, to reorder, save or delete specific rows or 
 columns of a matrix, one would use the MATRIX program. MATRIX also has 
 the capability of printing, punching card decks, and reading and writing 
 tape and disk files in ways not available elsewhere in SOUPAC. 
 
 The TRANSFORMATIONS and MATRIX programs have a unique place in the 
 SOUPAC system. Although the remainder of the SOUPAC library performs 
 a large number of specialized statistical procedures, there are some 
 computations which are not represented by a uniquely written program. 
 However, by an imaginative utilization of the combined powers of 
 TRANSFORMATIONS and MATRIX it is possible to perform a virtually unlimited 
 range of established and experimental statistical techniques. It is a 
 common practice within the SOUPAC office to "check out" newly written 
 programs against results computed by TRANSFORMATIONS and MATRIX. Conversely, 
 these two programs can be used as teaching tools by having students learn 
 the step-by-step computations and then checking the results against results 
 of the "cook-book" programs. Complete multiple regressions and analysis 
 of variance programs, for example, have been written in this manner. Since 
 most SOUPAC jobs require the use of TRANSFORMATIONS and MATRIX, a 
 familiarity with these two programs is basic to an effective use of the 
 SOUPAC system. 
 
MATRIX 
 
 I. General Description 
 
 The MATRIX program is a data manipulating program for inputting 
 and outputting, creating, performing matrix algebraic operations, and 
 generally handling data matrices. All the MATRIX suboperations are 
 restricted to ^50 columns (variables). No absolute limit is set on 
 the number of rows (observations). 
 
 Standard SOUPAC address conventions are used including the use of 
 the character X to denote punched output, and (f) after a print to 
 denote print with F format. Also available and discussed in section III 
 below is the use of I for storing a matrix in memory, and the use of (l) 
 after a print to invoke the MATRIX labeling feature. All other 
 restrictions are noted by the discussion of the individual suboperation 
 explanations. 
 
 II. Parameters 
 
 A. Main Parameters 
 
 To invoke the MATRIX program, code the name MATRIX (or simply 
 the program mnemonic MAT). There are two optional parameters 
 available which may be coded on the MAT card. 
 
 First, if it is desired to print, immediately prior to the 
 execution of each MATRIX subparameter operation, the time in 
 seconds since entry into the MATRIX program, code a (l) after the 
 name MATRIX. This option is not normally needed and is provided 
 merely for giving timing estimates. 
 
 The second optional parameter is coded as a (l) following the 
 timing estimate parameter. This second option causes the number 
 of rows and columns and the precision (either single or double) of 
 the answer matrix (for all suboperations which produce an answer 
 matrix) to be printed out. This is useful in debugging a MATRIX 
 program when it is not necessary to see the entire answer matrix to 
 be printed out, but it would be helpful to check the dimensions of 
 an answer matrix. As in all SOUPAC programs, the main program 
 parameter card must be terminated by a period. Examples: 
 
 MATRIX. 
 
 MATRIX (1). 
 
 MAT. 
 
 MAT (1) . 
 
 MATRIX (l)(l). 
 
 MAT ( )(l). -- recommended usage — 
 
MATRIX 
 Page 2 
 
 B. Subparameters 
 
 Any MATRIX operation may be invoked by coding its mnemonic 
 followed by appropriate subparameters. All operations in MATRIX 
 handle both single and double precision matrices at the control 
 of the user (see operations SINGLE and DOUBLE). For an address 
 not explicitly assigned either single or double precision, MATRIX 
 assumes a default of double precision for output to the address. 
 Terminate all subparameter cards with a period. 
 
 To end a MATRIX program, place a card which has the characters 
 END P after the last MATRIX subparameter card. Since all MATRIX 
 programs must have at least one subparameter operation, an error 
 will be signaled if a MAT card is followed immediately by an 
 END P card. 
 
 Input and output for MATRIX may be from any source, however, the 
 following rules must be observed: 
 
 1) Never use CARDS as input to any operation except 
 MOVE unless both the number of rows and the number 
 of columns have been specified on the DATA format 
 card at the front of the data deck. 
 
 2) You may never output to PRINT only. All MATRIX output 
 must go to some intermediate storage location even when 
 only printout is desired. 
 
 3) Avoid using the same address more than once on the same 
 parameter card unless otherwise noted in the description 
 of an individual operation. However, in those operations 
 which do permit using an address more than once as an input 
 address, CARDS may not be used as an input address more than 
 once. In all operations except INVERT, never specify an 
 output address which is the same as an input address for 
 that operation. 
 
 k) The contents of an input address remains unchanged 
 
 during the execution of an operation unless otherwise 
 noted. 
 
 Following is a description of the subparameter operations currently 
 in the MATRIX program. 
 

 
 
 Summary of MATRIX Oj 
 
 erations 
 
 MATRIX 
 Page 3 
 
 mnemonic 
 
 notes 
 
 operation name 
 
 examples 
 
 ABS 
 
 
 
 Absolute value 
 
 ABS ( 
 
 SI] 
 
 (S2). 
 
 ADD 
 
 1 
 
 2,5 
 
 Add 
 
 ADD ( 
 
 SI' 
 
 (S2) (S3). 
 
 ALL 
 
 
 3,6 
 
 All 
 
 ALL ( 
 ALL ( 
 ALL ( 
 
 SI' 
 SI' 
 SI 
 
 "GT" *0-* (S2). 
 "NE" *-0* (S2) (2). 
 "LE" (S2) (S3) (1,10,3) 
 
 ANY 
 
 
 3,6 
 
 Any 
 
 ANY ( 
 ANY 1 
 ANY 1 
 
 'SI 
 'SI 
 
 'si 
 
 "GE" (S2) (S3). 
 
 "EQ" *1.* (S2) (1,2) (5 
 
 "LT" *99* (S2). 
 
 CHO 
 
 
 h 
 
 Cholesky decomposition 
 
 CHO 
 
 'si 
 
 (S2). 
 
 COL 
 
 
 6 
 
 Column delete 
 
 COL 
 COL 
 
 'si 
 
 ;si 
 
 ) (S2) (2). 
 
 ) (S2) (10) (12,15). 
 
 CON 
 
 
 
 Constant addition 
 
 CON 
 CON 
 
 ;si 
 [si 
 
 ) *2.* (S2). 
 ) (S2) (S3). 
 
 cou 
 
 
 
 Count 
 
 COU 
 COU 
 COU 
 
 [si 
 
 ;si 
 
 [si 
 
 ) (S2). 
 
 ) (S2) (1). 
 
 ) (S2) (2). 
 
 DIA 
 
 
 
 Diagonal to vector 
 
 DIA 
 
 [si 
 
 ) (S2). 
 
 DIM 
 
 
 
 Dimension 
 
 DIM 
 
 [si 
 
 ) (S2). 
 
 DOU 
 
 
 2 
 
 Double precision 
 
 DOU 
 
 [si 
 
 ) • 
 
 EJE 
 
 
 
 Eject 
 
 EJE. 
 
 
 
 E-D 
 
 1, 
 
 2,5 
 
 Elementwise divide 
 
 E-D 
 
 [si 
 
 ) (S2) (S3). 
 
 E-M 
 
 1 
 
 2,5 
 
 Elementwise multiply 
 
 E-M 
 
 [si 
 
 ) (S2) (S3). 
 
 E-R 
 
 
 
 Elementwise square root 
 
 E-R 
 
 [si 
 
 ) (S2). 
 
 EXP 
 
 
 
 Expand 
 
 EXP 
 EXP 
 
 [si 
 [si 
 
 ) *20* (S2). 
 ) (S2) (S3). 
 
 FIL 
 
 
 2 
 
 File 
 
 FIL 
 
 [si 
 
 ) • 
 
 GEN 
 
 
 6 
 
 Generate 
 
 GEN 
 
 [si 
 
 ) *1*. 
 
 HOR 
 
 1. 
 
 2,5 
 
 Horizontal augment 
 
 HOR 
 
 [si 
 
 ) (S2) (S3). 
 
 IDE 
 
 
 
 Identity matrix 
 
 IDE 
 
 [k8 
 
 ) (si). 
 
 INP 
 
 
 
 Input 
 
 INP 
 INP 
 INP 
 
 [si 
 [si 
 [si 
 
 ) (S2) ( ) (9) "(9F6.1)". 
 
 ) (S2) ( ) (12). 
 ) (I) (576) (10). 
 
 INV 
 
 
 h 
 
 Invert 
 
 INV 
 INV 
 INV 
 
 [si 
 
 [I) 
 
 [si 
 
 ) (S2). 
 
 (I). 
 ) (S2) (1) (1) *10.E-6*. 
 
 LAB 
 
 
 6 
 
 Label 
 
 LAB 
 
 [si 
 
 ) "SAMPLE 1" "AGE" "SEX". 
 
 LAG 
 
 
 
 Lag 
 
 LAG 
 LAG 
 
 [si 
 [si 
 
 ) (2) (6) ( ) (S2). 
 ) (3) (10 (1) (S2). 
 
 LOW 
 
 
 
 Lower triangle 
 
 LOW 
 LOW 
 
 [si 
 [si 
 
 ) (S2). 
 
 ) (S2) (1). 
 
MATRIX 
 Page h 
 
 mnemonic notes operation name 
 
 examples 
 
 MAX 
 
 
 Maximum value 
 
 MAX ( 
 MAX ( 
 MAX ( 
 
 'si 
 
 'SI 
 
 SI 
 
 ) (S2). 
 
 ) (S2) (1). 
 
 ) (S2) (2). 
 
 MIN 
 
 
 Minimum value 
 
 MIN ( 
 MIN ( 
 MIN ( 
 
 SI 
 SI 
 SI 
 
 ) (S2). 
 ) (S2) (1). 
 (S2) (2). 
 
 MOV 
 
 2 
 
 Move 
 
 MOV ( 
 MOV ( 
 
 :c) 
 
 SI 
 
 (si). 
 
 (S2). 
 
 MUL 
 
 1A5 
 
 Multiply 
 
 MUL ( 
 
 'SI 
 
 > (S2) (S3). 
 
 OUT 
 
 
 Output 
 
 OUT ( 
 OUT ( 
 
 'si 
 'si 
 
 ) (S2) "(10E16.9)". 
 ) (S2). 
 
 PAR 
 
 
 Partition 
 
 PAR ( 
 
 'SI 
 
 ) (S2) (5) (10) (2) (21). 
 
 PER 
 
 6 
 
 Permutation 
 
 PER ( 
 
 SI 
 
 ) (2) (1). 
 
 PRI 
 
 
 Print 
 
 PRI ( 
 
 'si 
 
 ) "(' ',10F13.4)". 
 
 PUN 
 
 
 Punch 
 
 PUN ( 
 
 'SI 
 
 ) "(8F10.3)". 
 
 REC 
 
 
 Reciprocal 
 
 REC ( 
 
 'si 
 
 ) (S2). 
 
 REM 
 
 
 Remap 
 
 REM 
 
 'si 
 
 ) (S2) (8). 
 
 REW 
 
 2 
 
 Rewind 
 
 REW ( 
 
 'si 
 
 ) . 
 
 ROW 
 
 6 
 
 Row delete 
 
 ROW ( 
 ROW ( 
 
 si 
 'si 
 
 ) (S2) (1) (2) (3). 
 
 ) (S2) (2,20,3) (3,20,3). 
 
 SCA 
 
 
 Scalar multiply- 
 
 SCA ( 
 SCA ( 
 
 'si 
 
 SI 
 
 ) *.l* (S2). 
 ) (S2) (S3). 
 
 SIN 
 
 2 
 
 Single precision 
 
 SIN ( 
 
 SI 
 
 ) . 
 
 SUB, 
 
 1,2,5 
 
 Subtract 
 
 SUB ( 
 
 SI 
 
 ) (S2) (S3). 
 
 SUM 
 
 
 Sum 
 
 SUM ( 
 SUM ( 
 SUM ( 
 
 SI 
 
 'si 
 
 SI 
 
 ) (S2). 
 ) (S2) (1). 
 (S2) (2). 
 
 TRA 
 
 k 
 
 Transpose 
 
 TRA ( 
 
 SI 
 
 (S2). 
 
 UPP 
 
 
 Upper triangle 
 
 UPP ( 
 UPP ( 
 
 SI 
 SI 
 
 (S2). 
 (S2) (1). 
 
 VEC 
 
 
 Vector to diagonal 
 
 VEC ( 
 
 SI 
 
 (S2). 
 
 VER 
 
 1,2,5 
 
 Vertical augment 
 
 VER ( 
 
 SI 
 
 (S2) (S3). 
 
 1. Conformability of input matrices is checked. 
 
 2. Up to twenty-one total addresses may be used. 
 
 3. A warning message is printed if no rows are output. 
 
 k. Any matrix which has been previously stored under the I address will 
 be destroyed. 
 
 5- An input address may be used more than once for input to the same 
 instruction. 
 
 6. As many arguments as are needed of the last argument type may be used. 
 
MATRIX 
 Page 5 
 
 ABSOLUTE VALUE (mnemonic: ABS) 
 
 The ABSOLUTE VALUE operation has two address parameters, an input 
 address and an output address. The absolute value of each element of 
 the input matrix is taken and the result goes to the output address. 
 Example : 
 
 ABSOLUTE VALUE (SEQ3) (SEQJO . 
 
 ADD (mnemonic: ADD) 
 
 The ADD operation has from three to twenty-one address parameters. 
 The last address is the output address; all other addresses are for 
 input. Each input matrix must have the same number of rows and columns 
 as all other input matrices. An address may be used more than once 
 as an input address. 
 
 Corresponding elements of the first matrix through the next to 
 last matrix are added together, and the result goes to the output 
 addre s s . Example s : 
 
 ADD (SEQ1)(SEQ,U)(SEQ5). 
 
 ADD (SEQl) (SEQJ+) (SEQ2) (SEQ3) (SEQ5) . 
 
 ALL (mnemonic: ALL) 
 
 The ALL operation performs a particular test, specified by the 
 second operand as a relational operator, between a set of elements for 
 each input row and a floating point number specified by the third 
 operand. If all elements of the set for a given row pass the test, 
 that row is output to the output address. 
 
 The first parameter is the input address. The relational 
 operator is enclosed in quotation marks. The third operand may be 
 either a floating point number or an address in which case the 
 first element of the matrix is used as the floating point number. 
 The six legal relational operators are "LT", "LE", "EQ", "NE", 
 "GT", and "GE". Remaining (optional) parameters are index 
 sets specifying which variables are to be included in the testing. 
 If no variables are specified, all variables are included in the 
 testing. Note that if only one variable is specified, the results 
 of the ANY and the ALL operation would be the same. Examples: 
 
 ALL (SEQ1) "NE" *0.* (SEQ2) . 
 ALL (SEQ3) "GE" (SEQ» (SEQl) . 
 
 ANY (mnemonic: ANY) 
 
 The ANY operation performs a particular test, specified by the 
 second operand as a relational operator, between a set of elements for 
 each input row and a floating point number specified by the third 
 operand. If any element of the set for a given row passes the test, 
 that row is output to the output address. 
 
MATRIX 
 Page 6 
 
 The first parameter is the input address. The relational 
 operator is enclosed in quotation marks. The third operand may be 
 either a floating point number or an address in which case the 
 first element of the matrix is used as the floating point number. 
 The six legal relational operators are "LT", "IE", "EQ", "HE", "GT", 
 and "GE". Remaining (optional) parameters are index sets specifying 
 which variables are to be included in the testing. If no variables 
 are specified, all variables are included in the testing. Note that 
 if only one variable is specified, the results of the ANY and the ALL 
 operation would be the same. Examples: 
 
 ANY (SEQ2) "GT" *3-* (SEQl). 
 
 ANY (SEQl) "NE" *-0.* (SEQ»(l,3). 
 
 CHOLESKY (mnemonic: CHO) 
 
 The CHOLESKY operation decomposes a square symmetric matrix into 
 the product of an upper triangular matrix and a lower triangular 
 matrix such that the two triangular matrices are the transpose of 
 each other. CHOLESKY has two operands, an input address and an 
 output address. The result which goes to the output address is 
 the lower triangular matrix resulting from the decomposition. 
 
 If the input matrix is not square, the "extra" rows or columns 
 are ignored. Additionally, if the square matrix is not symmetric, 
 the actual upper triangle of the matrix is effectively ignored 
 and is instead assumed to be identical to the lower tirangle. 
 Example : 
 
 CHOLESKY (SEQl)(SEQ2). 
 
 If the input matrix on SEQ 1 is 
 
 h -2 -k 
 
 -2 2 3 
 -h 3 6 
 
 the resulting matrix output to SEQ 2 is 
 
 2 
 -110 
 -2 11 
 
 COLUMN DELETE (mnemonic: COL) 
 
 The COLUMN DELETE operation specifies which columns of an input 
 matrix are to be deleted before sending the result to the output 
 address. The first parameter is the input address. The second 
 parameter is the output address. Columns to be deleted are 
 specified by index sets following the output address. Examples: 
 
 COLUMN DELETE (SEQU) (SEQ5) (k) (5) (6) (8) (13) (15) • 
 COLUMN DELETE (SEQl) (SEQ2) (l,30,3^ • 
 
MATRIX 
 Page 7 
 
 CONSTANT (mnemonic: CON) 
 
 The CONSTANT operation has three parameters A rwir. 
 number specified hv +h& o^™^ ^^^rs. a Iloatmg point 
 
 the matrL specified by th ftT T ^ added t0 eTery el «»«t <* 
 the third opened addresj * """""^ ^ reSiat «°" *° 
 
 CONSTANT (SEQ1) *4. 5 * ( SE Q2). 
 CONSTANT (SEQl)(SEQ 3 )(SEQif). 
 
 COUNT (mnemonic: COU) 
 
 ^i:°z ropts iSiSw operands; an in ^ add — - - -t^ 
 
 visaing data, of^acn l^of^,5£?Lg£^ f ? ,d ? Ul, « 
 option is equivalent to specifying option S! SpeClf > ln e no 
 
 coial^oA L™a d counl o r rSe tlng ^ ""** " a Sl ^ 
 basing data of each r^oflS Sput^! ° f ^^ «*•"•« 
 
 Examples: e±ementS ' exclu ^g missing data over the entire matrix. 
 
 COUNT (SEQ1) (SEQU) . 
 COUNT (SEQ5)(SEQ3)(1). 
 COUNT (SEQ2)(SEQ3)(2). 
 
 DIAGONAL (mnemonic: DIA) 
 
 - out e put IA addrts? er T a h e i0 " t^° ^^ m ^ a ^ess and 
 matrix are used t ; fo™ T d n lag ° nal elements °f the first 
 second operSf aldre'r ^leY™ ^^ WMch " °^put to the 
 
 DIAGONAL (SEQ2)(SEQl + ). 
 DIMENSION (mnemonic: DIM) 
 
 The DIMENSION operation h^ +r m „a* 
 address, and an outpS address ?hf . f V"?"***™' ™ input 
 of columns of the input S' 7 ° f r ° WS and the number 
 
 second elements respJctivelv of a f . used 1 to fo ™ **e first and 
 which is output to t^'ou^tldLe^/^ief^ 6 "" ^ 
 DIM (SEQ2)(SEQ1). 
 
MATRIX 
 Page 8 
 
 DOUBLE (mnemonic: DOU) 
 
 The DOUBLE operation has anywhere from one to twenty-one 
 addresses as parameters. Listing an address as a parameter negates 
 the effect of any previous listing of that address as a parameter 
 in the operation SINGLE. Listing an address as a parameter which 
 has not appeared as a SINGLE subparameter has no effect. Example: 
 
 DOUBLE (SEQ1) (SEQ2) (SEQ3) (SEQU) (SEQ5) (i) . 
 
 EJECT (mnemonic : EJE ) 
 
 The EJECT operation causes the next printout to begin at the 
 top of a new page. EJECT has no parameters. Example: 
 
 EJECT . 
 
 E -DIVIDE — Elementwise Divide — (mnemonic: E-D) 
 
 The E-DIVIDE operation has from three to twenty-one address 
 parameters. The last address is the output address; all other 
 addresses are for input. Each input matrix must have the same 
 number of rows and columns as all other input matrices for the use 
 of the operation. An address may be used more than once as an 
 input addre s s . 
 
 Elements of the second matrix through the next to last matrix are 
 divided into the corresponding elements of the first matrix. Output 
 goes to the last address. Example: 
 
 E-DIVIDE (SEQl) (SEQ2) (SEQ3) . 
 
 E-MULTIPLY -- Elementwise Multiply -- (mnemonic: E-M) 
 
 The E-MULTIPLY operation has from three to twenty-one address 
 parameters. The last address is the output address; all other 
 addresses are for input. Each input matrix must have the same 
 number of rows and columns as all other input matrices for the use 
 of the operation. An address may be used more than once as an 
 input address. 
 
 Corresponding elements of the first matrix through the next to 
 last matrix are multiplied together. Output goes to the last 
 addre s s . Example : 
 
 E-MULTIPLY (SEQ3)(SEQ2) (l)(SEQ^). 
 
 E-ROOT -- Elementwise Square Root — (mnemonic: E-R) 
 
 The E-ROOT operation has two address parameters, an input address 
 and an output address. The (positive) square root of each element 
 of the input matrix is taken and the result goes to the output 
 addre s s . Example : 
 
 E-ROOT (SEQl)(SEQ2). 
 
MATRIX 
 Page 9 
 
 EXPAND (mnemonic : EXP) 
 
 of times specified by the Lo"d parameter ^ "*"'"' the ^^ 
 
 -se^ 8e Ct P ^S e ^ d b t e e JS fi r ? f ™ *** - ber in which 
 
 specified by the integration of S L* 653 the number of times 
 
 second parameter can be aTi™? tL n ° atl »g number; or the 
 
 is copied to the output Sdress unt^X " J** CaSe the in P ut — 
 
 same number of rows as the 7 ^ ° UtpUt matrix has the 
 
 is the output address! E^ple": "** ^^ The tMrd P-ameter 
 
 EXPAND (SEQ1)(SEQ2)(SEQ3). 
 EXPAND (SEQI)*5 5 *(SEQ4). 
 
 FILE (mnemonic: FIL) 
 
 as parie'ters^'FI^i^ 3 Tf^ ^ ** t0 twen ^— e addresses 
 written : t th; end^f " ^NTIaTS 1 " ^T^ 16 ^ ^ be 
 most useful to the" uir & llflfslfs ^JlZ^Z* 1 * *?'*** 
 his own physical tape. Since any meanfngSl use o^he FlS 
 a?I r but t n he reqUi r S the additi ° n ° f a PP-P-ate IBM 3<£ JCL^ards 
 
 FILE (SEQ5). 
 GENERATE (mnemonic: GEN) 
 
 The GENERATE operation generates a single row vector with the 
 floating point numbers the user specifies vhTrJ I 
 
 GENERATE (SEQ2^ *1.* * 2 .* *U.* *8.* *l6.* * 3 2.*. 
 HORIZONTAL AUGMENT (mnemonic: HOR^ 
 
 addr^s™Ss AU T\2r a L i0n ^ ^ th - e t0 ^y-one ■ 
 other addres^el are for input EaT" ^ ^ addreSS ' a11 
 same number of rZt U 1 ? P ^ ? Ch lnput matrix must have the 
 
 address Ex^ple: "** aM the reSult 8 MS to «» 1-t 
 
 HOBIZOMTAL AUGMENT (SEQ1) (SEQM (SEQ21 (i) . 
 
MATRIX 
 Page 10 
 
 IDENTITY (mnemonic: IDE) 
 
 The IDENTITY operation has two parameters. An identity matrix, 
 of order specified by a fixed point number as the first operand, is 
 output to the address specified by the second operand. Example: 
 
 IDENTITY (20)(SEQ3). 
 
 INPUT (mnemonic: INP^ 
 
 The INPUT operation will input formatted or non-formatted 
 records from any available device. This option is primarily for 
 reading card images or other similar data the user may have 
 usually on his own tape, which would be awkward to input in the 
 typical card deck manner. 
 
 Never input to I (see SPECIAL COMMENTS^ using the INPUT operation 
 unless both the number of rows and the number of columns of the 
 input matrix are specified as parameters on the INPUT operation 
 parameter card. 
 
 The parameters for INPUT are the input address, the output 
 address, the number of rows of the input matrix (optional in 
 most cases^, number of columns, and optionally the format enclosed 
 in quotation marks. Examples: 
 
 INPUT (SEQ1)(SEQ2/PRINTH20U5) "(10F8.3)". 
 INPUT (SEQ2)(SEQ3U H8) . 
 
 INVERT (mnemonic: INV) 
 
 The INVERT operation inverts a non-singular real matrix. 
 The INVERT operation has five subparameters, the last three of 
 which are optional. The first parameter is the address of the matrix 
 to be inverted, and the second parameter is the output address of 
 the result. (The incore address option described in section III. A - 
 SPECIAL COMMENTS - may be used for either input, output, or both) . 
 To have the determinant of the original matrix printed out, code a 
 (l) as the third parameter. 
 
 The inversion technique used is the Gauss-Jordan method with 
 pivot elements assumed to be on the main diagonal. If it is 
 desired that the inversion technique perform row and column 
 interchange, for the purpose of picking pivot elements as those with 
 the largest absolute value at each step of the elimination procedure, 
 code a (l) as the fourth parameter. The default case, pivot elements 
 assumed to be on the main diagonal, executes faster than when row 
 and column interchange is performed. For those real symmetric 
 matrices which have the property that the largest elements are 
 necessarily on the main diagonal (e.g. correlation, cross-products, 
 variance-covariance matrices^ numerical accuracy of the results is 
 not significantly different between the two options. For general 
 matrices in which specific properties are not known, using row and 
 column interchange will probably produce more accurate results. 
 
MATRIX 
 Page 11 
 
 value of any pi^t element IsZll^^" 1 ^ 7 - If the abs ° lu *« 
 
 *. nT L p a LL s- rL\^n th : ™ « - -~ ty , 
 m^ss-HorST^s s^sr- address 
 
 AX = Y 
 
 input to the INVERT suboperation a matrix which contains A -v U 
 the constant term appearing as the last column^ variables) Al 
 resulting output of the INVERT suboperation will be 
 
 A -1 :X 
 
 The Y above may be more than one column vector in which case each 
 resulting column vector of X will be the solution fv£ +>, . 
 
 column of Y. Examples: solution for the corresponding 
 
 INVERT (SEQ1)(SEQ2). 
 
 INVERT (SEQif)(SEQ3)(D(l). 
 
 INVERT (SEQ2)(SEQU)(1)( ) *10.E-5*. 
 
 INVERT (SEQ5)(SEQ1)( ) (l) *. 0000001*. 
 
 LABEL (mnemonic: LAB) 
 
 a sohpa^S ° peration is used to store a title and column labels at 
 
 is limited to 6 128 ch ^ ^ ^ * mTRIX Pr °^- The title 
 each characters. Labels are limited to eight characters 
 
 The first parameter is the address where the title and labels 
 are to be stored. This is then followed by the title and labels 
 
 "v one C ti m f ^ qUOtat± ° n ^^ ° nly <** label set S a^ at 
 tX? \ HenCe ' Sach USe of LABEL overrides all previous uses 
 
 progrLf ne (n a ot e e. Wi Th in - a "™ »«*«» -7 «* be passed%Toth r' 
 programs. (note: The mcore address option may be used to store a 
 title and label set if desired.) Examples: 
 
 LABEL (SEQ» "TITLE" "LABEL 1" "LABEL 2" "LABEL V 
 LABEL (SEQ1) "ONLY A TITLE; NO LABELS". ' 
 
 LAG (mnemonic: LAG) 
 
 The LAG operation has five operands; an input address an 
 integer specifying which variable is to be lagged an inteSr 
 specifying the number of lag periods, an inteffr option fS 
 and an output address. integer option flag, 
 
MATRIX 
 Page 12 
 
 If the option flag is zero, intermediate lags are included 
 in the output matrix; if the option flag is non-zero, intermediate 
 lags are not included. 
 
 For example, suppose we are interested in lagging the 
 fourth variable of a five variable matrix and suppose we want 
 three lag periods. First it should be noted that the resulting 
 output matrix will necessarily have three fewer rows (observations) 
 than the input matrix. 
 
 If we use 
 
 LAG(S1)(10(3)(0)(S2). 
 
 th 
 the resulting t row to the output address would be 
 
 X t,l X t,2 X t,3 X t,^ X t,5 X t-3,^ X t-2,l+ X t-1,^ 
 If we use 
 
 LAG(S1)(U)(3)(1)(S2). 
 
 th 
 
 the resulting t row to the output address would be 
 
 X t,l X t,2 X t,3 X t,5 X t-3,^ 
 
 Note that the difference between the option flag being zero 
 and non-zero is that the resulting output matrix has W-l 
 fewer lags where N is the number of lag periods (i.e. the 
 intermediate lags are not included in the output). 
 
 As a concrete example consider the following input on SI. 
 
 1 
 
 2 
 
 3 
 
 k 
 
 5 
 
 6 
 
 7 
 
 8 
 
 9 
 
 10 
 
 11 
 
 12 
 
 13 
 
 Ik 
 
 15 
 
 16 
 
 17 
 
 18 
 
 Use of the statement 
 
 LAG(S1)(2)(2)( )(S2). 
 will result in the following matrix to be output to S2. 
 
 7 
 
 8 
 
 9 
 
 2 
 
 5 
 
 10 
 
 11 
 
 12 
 
 5 
 
 8 
 
 13 
 
 ll+ 
 
 15 
 
 8 
 
 11 
 
 16 
 
 17 
 
 18 
 
 11 
 
 Ik 
 
 Use of the statement 
 
 LAG(S1)(2)(2)(1)(S2) 
 
MATRIX 
 Page 13 
 
 will result in the following matrix to be output to S2. 
 
 7 
 
 8 
 
 9 2 
 
 10 
 
 11 
 
 12 5 
 
 13 
 
 11+ 
 
 15 8 
 
 16 
 
 17 
 
 18 11 
 
 LOWER TRIANGLE (mnemonic: 
 
 LOW) 
 
 The LOWER TRIANGLE instruction copies a matrix from one 
 address to another and sets all elements which are above the 
 main diagonal to zero. It is possible to indicate if it is 
 desired that the main diagonal elements also be set to zero. 
 
 The LOWER TRIANGLE instruction has three operands; an 
 input address, an output address, and an integer option flag. 
 If the option flag is omitted or is zero, the main diagonal 
 elements are included as part of the lower triangle. If the 
 option flag is non-zero, the main diagonal elements are set to 
 zero. Examples: 
 
 LOWER (SI) (S3). 
 LOWER (S2)(Si+)(l). 
 
 MAXIMUM (mnemonic: MAX) 
 
 The MAXIMUM operation has three operands; an input address, an 
 output address, and an option indicator. 
 
 If option is specified, the resulting output matrix is a single 
 row vector containing the maximum element of each column of the 
 input matrix. Specifying no option is equivalent to specifying option 
 0. 
 
 If option 1 is specified, the resulting output matrix is a single 
 column vector containing the maximum element of each row of the 
 input matrix. 
 
 If the option is specified as any number other than or 1, a 
 single element matrix is output which contains the maximum element 
 of the entire matrix. Examples: 
 
 MAX (SEQ1)(SEQ3). 
 MAX (SEQ1)(SEQ2)(2). 
 
 MINIMUM (mnemonic: MEN) 
 
 The MINIMUM operation has three operands; an input address, an 
 output address, and an option indicator. 
 
 If option is specified, the resulting output matrix is a 
 single row vector containing the minimum element of each column 
 of the input matrix. Specifying no option is equivalent to 
 specifying option 0. 
 
 If option 1 is specified, the resulting output matrix is a 
 single column vector containing the minimum element of each row 
 of the input matrix. Examples: 
 
 MEN (SEQ1)(SEQ3). 
 MEN (SEQ1)(SEQ2)(2). 
 
MATRIX 
 Page Ik 
 
 MOVE (mnemonic: MOV) 
 
 The MOVE operation moves (actually copies) a matrix from one 
 SOUPAC standard input source to another. If reading from SEQUENTIAL, 
 the MOVE operation assumes that the data set was created using SOUPAC 
 conventions, i.e. by some SOUPAC program. If the input source is 
 CARDS, the input deck must be preceded by a correct DATA format 
 statement and terminated by an END# card. 
 
 Never MOVE from CARDS to I (see SPECIAL COMMENTS) unless both 
 number of rows and number of columns of the input matrix are 
 specified at the front of the data deck. 
 
 The operation has between two and twenty-one addresses as 
 parameters. The first address is the input address. All remaining 
 addresses are output addresses. Examples: 
 
 MOVE (CARDS) (SEQ1) (SEQ2) . 
 MOVE (CARDS) (SEQl). 
 
 MULTIPLY (mnemonic: MUL) 
 
 The MULTIPLY operation has three addresses for parameters. A 
 matrix multiplication is performed between the matrices on the first 
 two addresses and the result is stored in the third address. The 
 MULTIPLY operation permits use of the same address to be used as an 
 input address for both first and second operands. This usage is 
 equivalent to using the SQUARE suboperation. 
 
 The incore address option may not be used for input of the first 
 operand if the first operand is different from the second. The 
 incore address option may never be used for output of the result. 
 The MULTIPLY operation destroys any matrix which has been stored 
 in core using the incore address option (see section III. A - SPECIAL 
 COMMENTS'). All calculations are done in double precision. Examples: 
 
 MULTIPLY (SEQ1)(SEQ2)(SEQ3). 
 MULTIPLY (SEQl)(SEQl)(SEQ2). 
 MULTIPLY (SEQ2) (l) (SEQ5) . 
 
 OUTPUT (mnemonic: OUT"i 
 
 The OUTPUT operation outputs a matrix, a row at a time with or 
 without format control, to a user specified data set. If format 
 control is used, the output address specified should not be used 
 anywhere else in the current SOUPAC job step except with options 
 which also perform formatted i/O (e.g. the INPUT and OUTPUT operations 
 of MATRIX). The syntax of OUTPUT is two addresses, input and 
 output addresses respectively, optionally followed by the desired 
 format enclosed by quotation marks. Examples: 
 
 OUTPUT (SEQ1)(SEQ5^ "(20EI5.7)". 
 
 OUTPUT (SEQ2)(SEQ3^. 
 
MATRIX 
 Page 15 
 
 PARTITION (mnemonic: PAR) 
 
 The PARTITION operation is used to select a sub-matrix of an 
 original input matrix. The first operand is the input address 
 and the second operand is the output address. The next four 
 parameters specify in order, the beginning column of the partition, 
 the ending column of the partition, the beginning row of the partition, 
 and the ending row of the partition. If either beginning parameter is 
 left out, the partition begins with the first row (or column). If 
 either ending parameter is left, the partition ends with the last 
 row (column). Examples: 
 
 PARTITION (SEQ5) (SEQ2) (5) (6) (2) (50) 
 PARTITION (SEQU)(SEQ2)(3)(^0). 
 
 PERMUTATION (mnemonic: PER) 
 
 The PERMUTATION operation permutes, on option, rows or columns 
 or rows and columns of an input matrix. The resulting matrix is 
 output to the second operand output address. 
 
 The third operand is an option flag. If the option flag is 
 specified as zero, columns of the matrix are permuted. If the option 
 flag is specified as one, rows of the matrix are permuted. If the 
 option flag is specified as other than zero or one, both rows and 
 columns are permuted. 
 
 The order of columns or rows of the output matrix is determined 
 by index sets. Examples: 
 
 If the statement 
 
 PERMUTE (Sl)(S2)(0)(5)(l,^). 
 
 is used and SI has the matrix 
 
 1. 2. 3. k. 5. 
 
 -1. 0. 2. .1 .3 
 
 the resulting output matrix will be 
 
 5. 1. 2. 3. h. 
 .3 -1. 0. 2. 1. 
 
 If the input matrix had been 
 
 1. 2. 3. k. 5- 6. 
 -1. 0. 2. .1 .3 k. 
 
 the resulting output matrix would be the same as the output matrix 
 already listed. Note that this implies that only those columns or 
 rows which are explicitly listed will be output. 
 
 If the statement 
 
 PER (Sl)(S2)(l)(3)(2Hl)- 
 
MATRIX 
 Page l6 
 
 is used and the matrix on SI is 
 
 12 3 
 
 k 5 6 
 7 8 9 
 
 the resulting output matrix will be 
 
 3 2 1 
 
 6 5 h 
 
 9 8 7 
 
 If the statement 
 
 PER (S1)(S2)(2)(3)(2)(1). 
 is used on the matrix 
 
 12 3 
 
 k 5 6 
 
 7 8 9 
 
 the resulting output matrix will be 
 9 8 7 
 
 6 5 h 
 
 321 
 
MATRIX 
 Page I7 
 
 PRINT (mnemonic: PRI 
 
 The PRINT operation prints out a matrix, one row at a time, under 
 the control of a user supplied format. Formats follow FORTRAN IV 
 conventions with the added restriction that formats are limited to 
 592 characters. 
 
 The first parameter is the address of the matrix to be printed. 
 The second parameter is the format enclosed in quotation marks. 
 (Warning: Allow for carriage control as the first character in 
 output lines. A print line has 133 characters.) Example: 
 
 PRINT (SEQ2) "(' ',3F20.10)". 
 
 PUNCH (mnemonic: PUN) 
 
 The PUNCH operation has the same syntax as PRINT and is used to 
 punch out a matrix under the control of a user supplied format. 
 (Warning: When punching cards, remember that there is room for only 
 80 characters per card) . 
 
 The PUNCH operation always punches two cards in addition to the 
 actual data deck. At the front of the data is punched a DATA format 
 card, and at the end of the data is punched an END# card. Example: 
 
 PUNCH (SEQ1)"(8F10.2)". 
 
 RECIPROCAL (mnemonic: REC) 
 
 The RECIPROCAL operation has two operands, an input address and 
 an output address. The reciprocal of the elements from the first 
 matrix are used to form an output matrix which is output to the 
 second operand address. Example: 
 
 RECIPROCAL ( SE01 ) ( SEQ3 ) • 
 REMAP (mnemonic: REM) 
 
 The REMAP instruction is used when a row of the input 
 matrix represents several observations and it is desired to 
 "break up" the input row into several shorter output rows. 
 
 The REMAP instruction has three operands; an input address, 
 an output address, and an integer which is to be used as the 
 column dimension of the output address. 
 
 For example, if we have the row of data on SI 
 1. 2. 3. k. 5- 6. 
 
 and use the instruction 
 REMAP (S1)(S2)(2). 
 
 the resulting output to S2 would be 
 
 1. 2. 
 3. k. 
 5. 6. 
 
 Notice that the column dimension to be specified for the 
 output matrix must exactly divide the column dimension of the 
 input matrix. 
 
MATRIX 
 Page 18 
 
 REWIND (mnemonic: REW) 
 
 The REWIND operation has anywhere from one to twenty-one addresses 
 as parameters. REWIND is used to rewind a sequential file. The 
 REWIND operation needs only to be used with the INPUT operation when 
 it is desired to reread a formatted input file. Example (to input 
 the same formatted file from SEQ 3 onto both SEQ 1 and SEQ, 2 under 
 control of different formats) : 
 
 INPUT (SEQ3) (SEQ1) ( ) (5) " (10X, 5F10.0)" . 
 
 REWIND (SEQ3). 
 
 INPUT (SEQ3)(SEQ2)( ) (8) "(8F10.0)". 
 
 R OW DELETE (mnemonic: ROW) 
 
 The ROW DELETE operation specifies which rows of an input matrix 
 are to be deleted before sending the result to the output address. 
 The first parameter is the input address. The second parameter is 
 the output address. Rows to be deleted are specified by index 
 sets following the output address. Example: 
 
 ROW DELETE (i) (SEQ3) (l) (3) (7) (8) (ll) (k^) . 
 SCALAR (mnemonic: SCA) 
 
 The SCALAR operation has three parameters. A floating point 
 number specified by the second operand is multiplied by every element 
 of the matrix specified by the first operand. The result goes to 
 the third operand address. 
 
 The second operand may be either a floating point number enclosed 
 in asterisks or a standard SOUPAC input address. If an address is 
 specified, the first element of the matrix at the address is used 
 for the floating point number. Examples: 
 
 SCALAR (SEQ1) *2.* (SEQ2) . 
 SCALAR (SEQ>)(SEQ3)(SEQ2). 
 
 SINGLE (mnemonic: SIN) 
 
 The SINGLE operation has anywhere from one to twenty-one addresses 
 as parameters. Listing an address as a parameter causes any matrices 
 written on that address to be written in single precision. MATRIX 
 stores all data matrices in double precision unless the user specifies 
 otherwise with the suboperation SINGLE. The listing of an address in 
 a SINGLE statement in one MATRIX program does not carry over in effect 
 to any other MATRIX program. Example : 
 
 SINGLE (SEQ1)(SEQ2)(SEQ». 
 
MATRIX 
 Page 19 
 
 SUBTRACT (mnemonic: SUB) 
 
 The SUBTRACT operation has from three to twenty-one address 
 
 ITeToTlL^VT . add r S \ is the ™^ a ^ess; all other addresses 
 are for xnput. Each input matrix must have the same number of rows 
 and columns as all other input matrices for the use of the operation 
 Elements of the second matrix through the next to last matrix are 
 subtracted from corresponding elements of the first matrix. Output 
 goes to the last address. An address may be used more than once as 
 an input address. Examples: 
 
 SUBTRACT (SEQl)(SEQ3)(SEQl+). 
 
 SUBTRACT (SEQlO (SEQ2) (SEQ3) (SEQl) (SEQ5) . 
 
 SUM (mnemonic: SUM) 
 
 The SUM operation has three operands; an input address, an output 
 address, and an option indicator. 
 
 If option is specified, the resulting output matrix is a single 
 row vector containing the column sum of each column of the input matrix. 
 Specifying no option is equivalent to specifying option 0. 
 
 If option 1 is specified, the resulting output matrix is a single 
 column vector containing the row sum of each row of the input matrix. 
 
 If the option is specified as any number other than or 1, a 
 single element matrix is output which contains the sum of all elements 
 over the entire matrix. Examples: 
 
 SUM (SEQ1)(SEQ3). 
 SUM (SEQ1)(SEQ2)(2). 
 
 TRANSPOSE (mnemonic: TRA) 
 
 and'colZ!)!^ a ma ^ix (interchanges rows 
 
 address storLe ™f f destr ° ys an ^ Previous usage of the incore 
 imS If g \ tW ° P arame *ers for TRANSPOSE are first the 
 
 xnput address and second the output address of the result Example: 
 
 TRANSPOSE (SEQ1)(SEQ2). 
 
 UPPER TRIANGLE (mnemonic: UPP) 
 
 addSA™ "V-f-"- -Pies a matrix from one 
 
 are included as Zlf tr V Mr °' the min dia S°*al elements 
 
 UPPER (S1)(S3^. 
 UPPER (S2)(S4)(1). 
 
MATRIX 
 Page 20 
 
 VECTOR (mnemonic: VEC) 
 
 The VECTOR operation has two operands, an input address and an 
 output address. A single vector from the first location is used 
 to form a diagonal matrix which is output to the second address. 
 If the input matrix has more rows than columns, the first column 
 vector is used to form the diagonal matrix. If the input matrix 
 has more columns than rows, the first row is used to form the 
 diagonal matrix. Example: 
 
 VECTOR (SEQ1)(SEQ3). 
 
 VERTICAL AUGMENT (mnemonic: VER) 
 
 The VERTICAL AUGMENT operation has from three to twenty-one 
 address parameters. The last address is the output address; all 
 other addresses are for input. Each input matrix must have the 
 same number of columns as all other input matrices for the use of 
 the operation. An address may be used more than once as an input 
 addre s s . 
 
 Input matrices from the first address through the next to the last 
 address are stacked top to bottom and the result goes to the last 
 address. All input matrices must have the same number of columns. 
 Example : 
 
 VERTICAL AUGMENT (SEQl) (SEQ2) (SEQ3! 
 
MATRIX 
 Page 21 
 
 Addendum to the MATRIX Program Description 
 
 KRQNECKER PRODUCT (mnemonic : KRO) 
 
 The KRONECKER PRODUCT operation forms the Kronecker Product 
 of two matrices and outputs the results to an output address. 
 The resulting output matrix is composed of m-j_ x n, submatrices 
 where m-[_ and n-|_ are the dimensions of the first input matrix. 
 Each submatrix has the size mp x n2 where va.2 and n2 are the 
 dimensions of the second input matrix. Note that the output 
 matrix has the dimensions mj_m2 x n]_n2- Each submatrix is the 
 result of the scalar product a. . B. 
 
 For example, if matrix A is oh SI and B is on S2 where 
 
 1. 2 13 
 
 A= -1 1 B= 
 
 .5 2 k 
 
 the result of executing the MATRIX statement 
 
 KRONECKER (SI ) (S2) (S3 ) • 
 
 would be the following matrix on S3. 
 
 1 
 
 3 
 
 2 
 
 6 
 
 2 
 
 k 
 
 It 
 
 8 
 
 -1 
 
 -3 
 
 1 
 
 2 
 
 -2 
 
 -k 
 
 3 
 
 k 
 
 
 
 
 
 .5 
 
 1.5 
 
 
 
 
 
 1 
 
 2 
 
 RECIPROCAL OF SQUARE ROOT OF nTAflONAT, (mnemonic: RSD) 
 
 The RSD operation has two operands, an input address and an 
 output address. The reciprocal of the square root of the main 
 diagonal elements from the first matrix are used to form a single 
 row vector which is output to the second operand address. Example: 
 
 RSD (SEQ1)(SEQ3). 
 
 These programs were designed after the bulk of the MATRIX program 
 description was typed. 
 
MATRIX 
 Page 2.2 
 
 II. Special Comments 
 
 A. Incore Address Option 
 
 Besides the standard SOUPAC addresses, MATRIX also recognizes 
 the additional address I. The I symbol as an address represents 
 internal storage in the machine . 
 
 An obvious use of this feature is to cut down on I/O time for 
 matrices which are to be used in future operations within the 
 current matrix program. The internal storage feature also saves 
 time when the user desires his output from an operation to be 
 printed or punched. The user must keep in mind that data cannot 
 be passed to subsequent programs with the I storage. The user 
 should also be aware of the restrictions on I storage as mentioned 
 above in some of the subparameter operations (see INVERT, MULTIPLY, 
 SQUARE, and TRANSPOSE). In all cases the use of this option is not 
 recommended for matrices which do not fit within the memory available 
 to the MATRIX program while running within any particular region size, 
 
 l) To add the matrix on SEQl to the matrix on SEQ2 leaving 
 the result in core and also printing the result, code as 
 follows : 
 
 ADD (SEQl)(SEQ2)(l/PRINT) 
 
 it the matrices in core, oi 
 ; suit on SEQU, code as fcxL 
 
 VERTICAL AUGMENT (i) (SEQl) (SEQ2) (SEQ» 
 
 To vertically augment the matrices in core, on SEQl and on 
 SEQ2, storing the result on SEQ4, code as follows: 
 
 Labeled Output 
 
 Provided in the MATRIX program is the facility to title and put 
 column labels on any matrix which is printed using normal SOUPAC 
 print conventions. The labeling feature is not allowed with the 
 PRINT matrix operation. 
 
 To use the labeling feature, it is first necessary to put the 
 title and labels in a temporary storage area. This is accomplished 
 with the LABEL operation (see Subparameter s) . 
 
 To use a label which has been placed in a temporary storage area, 
 code (L) after the print portion of the output address. If F 
 format is also desired code either (F,L) or (L,F) after the print. 
 Example s : 
 
 1) To move (copy) the matrix on SEQl onto SEQ5 printing the 
 result in F format with title and column labels, code as 
 follows : 
 
 MOVE ( SEQl) (SEQ5/PRINT (F, L) ) . 
 
MATRIX 
 Page 2 3 
 
 2) To add the matrices on SEQ1 and SEQ2 storing the result on 
 SEQ3 and also printing the result with title and column 
 labels, code as follows: 
 
 ADD (SEQ1)(SEQ2)(SEQ3/PRINT(L)). 
 
 3) To transpose the matrix stored on SEQ1 to SEQ2, printing 
 out the result in F format with title and column labels, 
 and punching out a card deck of the transposed matrix, 
 code as follows : 
 
 TRANSPOSE (SEQ1)(SEQ2/PRINT(L,F)/X). 
 
TRANSFORMATIONS 
 I. General Description 
 
 TRANSFORMATIONS is a data manipulation program. Unlike MATRIX, which performs 
 operations on a complete matrix, TRANSFORMATIONS operates upon matrices one row 
 at a time. This strategy provides for almost unlimited flexibility in trans- 
 forming your data. Some of the general uses include creating new variables as 
 functions of present variables, recoding or collapsing data, and reordering or 
 eliminating variables. More advanced uses are facilitated by an instruction 
 set which allows testing and branching depending on single variable character- 
 istics or relations between variables, indirect addressing (FLAG-NOTATION), and 
 inputting and outputting to and from different sequential units during the program, 
 
 TRANSFORMATIONS serves several purposes in the SOUPAC system. First, it 
 can be used as a stand alone program to perform computations on your input data 
 and yield the final results. Also, it can be used to prepare your data for 
 input into another SOUPAC program or to make modifications from the output of 
 one program for input into another. 
 
 II. Parameters 
 
 Parameter 
 Numhpr Use or Meaning 
 
 1 Input address of the main input matrix. 
 
 CARDS or SEQUENTIAL 1-15- This parameter can 
 be left out if there is no input or if input 
 comes from the input instruction. 
 
 The format of the entire TRANSFORMATIONS program follows : 
 
 TRANSFORMATIONS (C or Sn, n<15). 
 
 '. subparameter cards which describe the transformation to be performed 
 
 END PROGRAM 
 
 III. Programming Considerations 
 
 The TRANSFORMATIONS program, as described in the first section, reads in one 
 row of data at a time and then it executed the program until the end or until 
 a last card instruction, and then it cycles on the identical program for each 
 row of data. 
 
 There are 2000 variables allowed in the TRANSFORMATIONS program. Before 
 each row of data is read into the program variables 1 through 1000 are filled 
 with zeroes. Variables 1001 through 2000 are zeroed out only before the program 
 begins so that during the program they can be used for accumulating sums or 
 related totals as well as being used for temporary storage variables for row 
 manipulations . 
 
 Input to the program can be specified by the parameter on the TRANSFORMATIONS 
 card or by the INPUT instruction. Output can only be achieved by use of the 
 OUTPUT instruction. 
 
 IV. Semi-table of Contents 
 
 Section 5 - Subparameter List Section 9 - DO-notation 
 
 Section 6 - Labels Section 10- FLAG-notation 
 
 Section 7 - Subparameter Descriptions Section 11- A TRANSFORMATIONS Example 
 
 Section 8 - EBCDIC Tables Section 12- Notes and Ideas 
 
V. Subparameter List 
 
 TRANSFORMATIONS 
 PAGE 2 
 
 mnemoni 
 
 c notes 
 
 operation name 
 
 exami 
 
 Dies 
 
 ABO 
 
 Abort 
 
 ABORT. 
 
 ABS 
 
 1 
 
 Absolute Value 
 
 ABS 
 
 ID(21). 
 
 ADD 
 
 1,2 
 
 Add 
 
 ADD 
 ADD 
 
 Cl)(2)(2l). 
 
 (D(3)(5)(T)(22). 
 
 ANG 
 
 1 
 
 Angle to Radians 
 
 ANG 
 
 (D(21). 
 
 A-C 
 
 1 
 
 Arccosine 
 
 A-C 
 
 (D(21). 
 
 A-S 
 
 1 
 
 Arcsine 
 
 A-S 
 
 (D(21). 
 
 A-T 
 
 1 
 
 Arctangent 
 
 A-T 
 
 (D(21). 
 
 C-G 
 
 
 Computed Go To 
 
 C-G 
 
 (l00)"A""B""C""D". 
 
 CON 
 
 3 
 
 Constant 
 
 CON 
 CON 
 
 (101)*!+. 3*. 
 (102)(7). 
 
 COS 
 
 1 
 
 Cosine 
 
 COS 
 
 (D(21). 
 
 DIF 
 
 2 
 
 Difference if 
 
 DIF 
 
 (1)(5)"A""B""C". 
 
 DIV 
 
 1,2 
 
 Divide 
 
 DIV 
 
 DIV 
 DIV 
 
 (D(2)(21). 
 
 (D(2)(21)(3). 
 
 (1)(2)(21)(3)"A". 
 
 EBC 
 
 1 
 
 EBCDIC 
 
 EBC 
 
 (D(21). 
 
 EXC 
 
 1 
 
 Exchange 
 
 EXC 
 
 (1)(2). 
 
 EXI 
 
 
 Exit 
 
 EXIT 
 
 
 EXP 
 
 1 
 
 Exponent Base e 
 
 EXP 
 
 (D(21). 
 
 FAC 
 
 1 
 
 Factorial 
 
 FAC 
 
 (D(21). 
 
 FIX 
 
 1 
 
 Fixed Point Conversion 
 
 FIX 
 
 (D(21). 
 
 FLO 
 
 1 
 
 Floating Point Conversion 
 
 FLO 
 
 (D(21). 
 
 GO 
 
 
 Go to 
 
 GOTO 
 
 "A". 
 
 IF 
 
 
 Arithmetic If 
 
 IF (1)"A""B M "C". 
 
 INP 
 
 
 Input from Unit 
 
 INPUT (S2)(200). 
 
 LAS 
 
 
 Last 
 
 LAST 
 
 
 ELO 
 
 1 
 
 Log base e 
 
 ELOG 
 
 (D(21). 
 
 LOG 
 
 1 
 
 Log base 10 
 
 LOG 
 
 (D(21). 
 
 MAX 
 
 1,2 
 
 Maximum Value 
 
 MAX 
 
 (D(2)(21). 
 
 MIN 
 
 1,2 
 
 Minimum Value 
 
 MAX (1)(3)(5)(7)(9)(22) 
 
 MIN (l)(2)(2l). 
 
 MIN (1)(3)(5)(7)(9)(22) 
 
 MOD 
 
 1,2 
 
 Modular Arithmetic 
 
 MOD (1)(5)(21). 
 
 MOV 
 
 1 
 
 Move 
 
 MOVE (l)(2l). 
 
TRANSFORMATIONS 
 PAGE 3 
 
 mnemonic notes operation name 
 
 examples 
 
 MUL 
 
 NO 
 
 OUT 
 
 PER 
 
 RAD 
 
 RAI 
 
 REC 
 
 SIG 
 SIN 
 SKI 
 
 SQU 
 
 SUB 
 
 1,2 Multiply 
 
 No operation 
 
 3 Output to Unit 
 
 3 Permut e 
 
 1 Radians to Angle 
 
 1,2 Raise to Power 
 
 1,2 Recode 
 
 1 Sign Transfer 
 
 1 Sine 
 
 2 Skip record on unit 
 
 1 Square root 
 
 1,2 Subtract 
 
 MUL (l)(2)(2l). 
 MUL (1)(3)(5)(22). 
 
 NOOP. 
 
 OUT (S3)(l,50). 
 
 PER (100)(5)(1,M(8,9)(7). 
 
 RAD (l)(2l). 
 
 RAI (1)(2)(21). 
 
 REC (1)"GT M (2)(3)*11*. 
 
 REC (7)"EQ"*3*(7)*0*(7)*1*. 
 
 SIGN (l)(22). 
 
 SINE (l)(2l). 
 
 SKIP (S2)(100). 
 SKIP (S3)*l*. 
 
 SQU (l)(2l). 
 
 SUB (1)(2)(21). 
 
 XAD - add 
 
 XDI - divide These fixed point instructions are identical in operations 
 . f performed and format with the corresponding floating point 
 instructions, except that all variables appearing in these 
 XMU - multiply instructions must be in fixed point representation. 
 
 XSM - sum 
 
 XSU - 'subtract 
 
 2. 
 
 For all parameters containing variables or floating point constants, DO 
 notation may be used to indicate repetition of the operation on different 
 sets of variables and/or floating point constants. (Explanation of D0- 
 notation appears in Sec. 9). 
 
 All input parameters of the instruction which contain variables may be 
 replaced by floating point constants. 
 
 3. For all parameters containing variables, DO-notation ranges may be 
 substituted for a single variable occurance. (DO-notation ranges are 
 described in SEC. 9) 
 
TRANSF ORMAT IONS 
 PAGE k 
 
 VI. Transformations Labels 
 
 In TRANSFORMATIONS there are several instructions which perform 
 some kind of test on your data. In most cases, the results of that 
 test cause the program not to execute the next sequential instruction, 
 but to branch to some other statement in the TRANSFORMATIONS program 
 and continue executing with that statement. In order to refer to these 
 statements we wish to transfer to, TRANSFORMATION'S labels are 
 employed . 
 
 The form of these labels can be illustrated by the following 
 example which skips over the divide statement if the divisor is zero. 
 
 IF (3 1 "NEXT" "JUMP" "NEXT", 
 "NEXT"DIV (2H3)(^. 
 "JUMP" next statement 
 
 or 
 
 IF(3) "*+l" "*+2" "*+l*. 
 DIV (2) (3) (4). 
 
 next statement 
 
 The preceding equivalent examples exhibit the two types of TRANSFORMATIONS 
 labels. The syntactical rules which govern the two types of labels 
 follow. 
 
 Type 1 
 
 A. A type 1 label consists of eight or less alphanumeric characters 
 set off by a pair of quotes. 
 
 B. Alphanumeric characters consist of the alphabet from A to Z 
 and the numberic digits from to 9. 
 
 C. Any unique label may appear as an operand or branch address of 
 any number of TRANSFORMATIONS subparameter instructions. 
 
 D. Any label which appears as an operand or branch address of an 
 instruction must appear immediately preceding and be part of 
 
 at least one and only one TRANSFORMATIONS subparameter instruc- 
 tion in the present TRANSFORMATIONS program. 
 
 Type 2 
 
 A. A type 2 label consists of a positional reference of the 
 form *+n set off by a pair of quotes. 
 
 B. The symbol * is pointing to the statement in which it appears. 
 Therefore *+l would point to the next statement, *+2 would skip 
 
TRANSFORMATIONS 
 PAGE 5 
 
 one statement, and *-l wuld point to the preceding statement. 
 
 C. This type of label need only appear as an operand or branch 
 address and not before the TRANSFORMATIONS subparameter 
 instruction which it is referencing. 
 
 D. Since the statement to which you are going to branch is always 
 indicated relative to the statement from which you are branching, 
 it is possible to point to an address which would lie beyond the 
 end of the program or before the beginning of the program. 
 Needless to say, this would result in an error condition. 
 
 E. Branching to "*+0" would create an infinite loop and is also 
 illegal. 
 
TRANSFORMATIONS 
 PAGE 6 
 
 VII. Subparameter Description 
 ABO 
 
 The ABORT instruction causes immediate termination of the TRANSFORMATIONS 
 program and the entire SOUPAC program. This instruction is often 
 transferred to when internal tests reveal incorrect data. Example: 
 
 ABORT. 
 
 ABS 
 
 The ABSOLUTE VALUE instruction takes the absolute value of the first 
 variable and stores it into the second variable. Example: 
 
 ABS (3) (25). 
 
 ADD 
 
 The ADD instruction has from three to one hundred parameters 
 pointing to variables. The first variable through the next to last 
 variable are summed and the result is stored into the last variable. 
 Example s : 
 
 ADD (6) (7) (23). 
 
 ADD (1) (3) (5) (7) (9) (100). 
 
 ANG 
 
 The ANGLE TO RADIANS instruction converts the first variable, which 
 should be a measure of an angle, into radians and stores the result 
 into the second variable. Example: 
 
 ANG (3) (17). 
 
 A-C 
 
 The ARCCOSINE instruction takes the arccosine of the first variable 
 and stores it into the second variable. The first vairable must be 
 between minus one and one inclusive. The result will be stored in radians. 
 Example : 
 
 A-C (7)(3l+). 
 
 A-S 
 
 The ARCSINE instruction takes the arcsine of the first variable and 
 stores it into the second variable. The first variable must be 
 between minus one and one inclusive. The result will be stored in 
 radians . Example : 
 
 A-S (9)(13). 
 
TRANSFORMATIONS 
 PAGE T 
 
 A-T 
 
 The ARCTANGENT instruction takes the arctangent of the first 
 variable and stores it into the second variable. Example: 
 
 A-T (8) (70). 
 
 C-G 
 
 The COMPUTED GO TO has from two to twenty two parameters. The 
 first parameter contains a variable and the following parameters contain 
 labels. The basic form is 
 
 C-Gtv)"^" "L 2 " "L " "L n M . where n < 21 
 
 The variable must be floating point. If it is not of integral value 
 then it is truncated, (all digits to the right of the decimal point are 
 dropped") . The instruction will then branch to the label whose position 
 in the list is equal to the integral value of the variable. Example: 
 
 C-G (7) "A" "B" "C" "D" "E". 
 
 If variable 7 is equal to k.O or k.3 then the instruction will branch 
 to label "D". 
 
 CON 
 
 The CONSTANT instruction contains from two to one hundred parameters. 
 The instruction is used to assign constant values to variables. The 
 first parameter indicates either a variable or range of variables. 
 The subsequent parameters contain the fixed point constant (s) and/or 
 the floating point constant (s) which are assigned to the variable (s). 
 Example s : 
 
 CON(7)(99) assigns fixed point value 99 to variable 7. 
 CON(8)*!+.3* assigns floating point value ^.3 to variable 8. 
 CON(7,8)(99)*-U.3* is equivalent to the previous two together. 
 
 The more complicated structures which contain ranges and increments for 
 the variables and constants, uses DO-notation which is explained in Sec. 9. 
 An example of the CONSTANT instruction with that structure will be included 
 in that section. 
 
 COS 
 
 The COSINE instruction takes the cosine of the first variable and 
 stores it into the second. The first variable must be expressed in 
 radians. Example: 
 
 COS (1)(17). 
 
TRANSFORMATIONS 
 PAGE 8 
 
 DIF 
 
 The DIFFERENCE IF instruction contains five parameters. The first 
 two parameters point to variables and the next three contain labels. 
 The second variable is subtracted from the first. If the difference is 
 negative the instruction branches to the first label, if the difference 
 is zero it branches to the second or middle label, and if the difference 
 is positive it branches to the third or last label. Example: 
 
 DIF (3) (5) "A" "B" "C". 
 
 If variable 3 minus variable 5 is negative the instruction will branch 
 to label "A". 
 
 DIV 
 
 The DIVIDE instruction contains from three to five parameters. In the 
 case of three parameters, the first variable is divided by the second variable 
 and the result is stored in the third variable. A division by zero will 
 terminate the program. -Example : 
 
 DIV (1^(2^(30^. 
 In the case of four parameters, the division will take place as normal except 
 when the second variable is zero. In that event, the fourth variable will 
 be stored into the third variable as a supplied quotient. Example : 
 
 DIV (1) (2) (30) (3D. 
 
 If the fifth parameter is added, it indicates a label to be branched to 
 in the event of a division by zero. The branch will take place after 
 the supplied quotient is stored into the third variable. Example: 
 
 DIV (1) (2) (30) (3D "ZERO". 
 
 EBC 
 
 The EBCDIC instruction converts characters, which are read into the 
 program by means of an Al format field, into floating point numbers. 
 The first variable contains the character. The result after the 
 table look up will be stored into the second variable. Example: 
 
 EBC (6) (89^. 
 
 The table used for the conversions is located following the 
 subparameter descriptions. This instruction is often used to prepare 
 character codes for input to a SOUPAC FREQUENCY program. 
 
 EXC 
 
 The EXCHANGE instruction exchanges the contents of two variables. 
 Example : 
 
 EXC (l)(2). 
 
 EXI 
 
 The EXIT instruction causes immediate termination of the TRANSFORMATIONS 
 program, but will continue to execute the SOUPAC program which follows- 
 Example : 
 
TRANSFORMATIONS 
 PAGE 9 
 
 EXP 
 
 The EXPONENT BASE e instruction raises e to the power of the first 
 variable and stores the result in the second variable. Example: 
 
 EXP (31) (704). 
 
 FAC 
 
 The FACTORIAL instruction calculates the factorial of the first 
 variable and the result is stored in the second variable. Example: 
 
 FAC (87) (120). 
 FIX 
 
 The FIXED POINT CONVERSION instruction converts the floating point 
 variable indicated by the first parameter and stores the result into 
 the second variable. Example: 
 
 FIX (2) (9). 
 
 FLO 
 
 The FLOATING POINT CONVERSION instruction converts the fixed point 
 variable indicated by the first parameter and stores the result into 
 the second variable. Example: 
 
 FLOAT (7) (8). 
 
 GO 
 
 The GO TO instruction unconditionally branches to the label indicated 
 by the only parameter. Example: 
 
 GO TO "LABEL". 
 
 IF 
 
 The IF statement has four parameters. The first parameter contains 
 a variable and the remaining three parameters are labels. If the 
 variable is negative, zero, or positive the instruction will branch to 
 the first, second, or third label respectively. Example: 
 
 IF (33) "L1""L2""L3". 
 If variable 33 is positive the instruction will branch to label "L3". 
 
 INP 
 
 By use of the INPUT instruction you can read in additional rows of 
 data in addition to the rows from the main input matrix. The first parameter 
 indicates which sequential unit the data will be input from. The 
 second parameter indicates the starting variable number where you 
 wish the row to be read in. Example: 
 
 INPUT (S3) (500). 
 
TRANSFORMATIONS 
 PAGE 10 
 
 If S3 contains 6k variables then this input instruction will read the 
 next row from S3 and store it in variables 500 through 563. Be careful 
 to avoid overwriting of existing variables which you need and also of 
 trying to read more or less rows than exist on a particular unit. 
 
 LAS 
 
 The LAST CARD instruction allows instructions to be performed after 
 the last row of data has been read in and processed. The LAST instruction 
 divides a program into regular and last card segments. The regular 
 section, as is a TRANSFORMATIONS program without the LAST option, is 
 executed once for every row of data. After all the main input data 
 is processed the last card segment is executed once. One of the main 
 uses of the LAST instruction is to analyze data accumulated in variables 
 1001-2000 during the regular segment. Example: 
 
 LAST. 
 
 Only one LAST instruction may be used for TRANSFORMATIONS programs and 
 branching between regular and last card segments is prohibited. 
 
 ELO 
 
 The LOG BASE e instruction takes the natural log of the first variable 
 and stores it in the second variable. Example: 
 
 EL0G (3) (17). 
 LOG 
 
 The LOG BASE 10 instruction takes the base 10 log of the first 
 variable and stores it into the second variable. Example: 
 
 LOG (18) (3k). 
 
 MAX 
 
 The MAXIMUM VALUE instruction has from three to one hundred parameters 
 pointing to variables. The variable with the largest value from the 
 first variable to the next to last variable is stored into the last 
 variable . Example s : 
 
 MAX (1)(2)(7). 
 
 MAX (1) (7) (8) (11) (15! 
 
 MIN 
 
 The MINIMUM VALUE instruction has from three to one hundred parameters 
 pointing to variables. The variable with the smallest value from the 
 first variable to the next to last variable is stored into the last 
 variable. Examples: 
 
 MIN (11) (13) (29). 
 
 MIN (1) (3) (k) (7) (8) (10) (12). 
 
TRANSFORMATIONS 
 PAGE 11 
 
 MOD 
 
 The MODULAR ARITHMETIC instruction finds the value of the first 
 variable modulus the second variable and stores the result into the 
 third variable. Example: 
 
 MOD (1)(U)(7). 
 MOV 
 
 The MOVE instruction stores a copy of the first variable into the 
 second variable. If a value already exists in the second variable it 
 will be overwritten. Example : 
 
 MOVE (3H9)- 
 
 MUL 
 
 The MULTIPLY instruction has from three to one hundred parameters 
 pointing to variables. The first variable through the next to last 
 variable are multiplied together and the result is stored in the 
 last variable. Examples: 
 
 MUL (l) (2^(20). 
 
 MUL (3) (k) (5) (6) (10). 
 
 NO 
 
 The N00P instruction does nothing. Its primary use is when it is 
 preceded by a label and used as a placeholder in the TRANSFORMATIONS 
 subprogram to which many different instructions branch. It is 
 commonly used at the end of a TRANSFORMATIONS subprogram where 
 several isolated groups of instructions all 'Wish to branch to the 
 end. Example : 
 
 "END" N00P. 
 
 OUT 
 
 The OUTPUT TO UNIT instruction is the only way the TRANSFORMATIONS 
 program can output a row of data. There are two parameters. The 
 first indicates the unit to which the row of variables should be output, 
 The second parameter can be either a single variable or a range of 
 variables. Examples: 
 
 OUT (S2^ (7). 
 
 This will output onto S2 a row with one variable. 
 
 OUT (sU)(8,lU). 
 
 This will output onto Sk a row with seven variables, variable eight 
 through variable fourteen. All outputs to the same unit must have the 
 same number of variables output. A more detailed description of types 
 of ranges which are allowed will appear in Sec. 9 on DO notation- 
 
TRANSFORMATIONS 
 PAGE 12 
 
 PER 
 
 The PERMUTE instruction permutes the order of all or a subset of 
 your variables. It can have from two to one hundred parameters, all 
 indicating variable numbers. The first parameter indicates a starting 
 point of where a string of variables should be placed. The rest of the 
 parameters compose that string of variables. Each of the parameters 
 in that string represent either a single variable or a range of variables, 
 Examples : 
 
 PER (lOO)(3)(lO,l^)( 1 +,5). 
 
 This example places variable 3 in variable 100, variables 10 through ik 
 in variables 101 through 105, and variables k and 5 in variables 
 106 and 107. A more detailed description of types of ranges which 
 are allowed will appear in Sec. 9 on DO notation ranges. 
 
 PER (2)(1,1999). 
 
 This example will not propagate variable 1 through all the variables. 
 It will perform the intended purpose of raising all variable numbers 
 up one . 
 
 RAD 
 
 The RADIANS TO ANGLE instruction converts the first variable, which 
 should be a measure of radians, into an angle and stores the result into 
 the second variable. Example: 
 
 RAD (9)(13) 
 
 RAI 
 
 The RAISE instruction raises the first variable to the power contained in ti 
 second variable and stores the result into the third variable. 
 Example : 
 
 RAISE (1)(2)(7). 
 
 EEC 
 
 The RECODE instruction recodes variables depending on the satisfaction 
 of a set of conditions. The sequence of parameters depends on the 
 number of conditions that must be met. The RECODE instruction 
 introduces a new set of terminology which follows: 
 
 A. RELATIONAL OPERATORS 
 
 B. 
 
 "LT" 
 
 less than 
 
 "LE" 
 
 less than or equal to 
 
 "EQ" 
 
 equal 
 
 "NE" 
 
 not equal 
 
 "GE" 
 
 greater than or equal 
 
 "GT" 
 
 greater than 
 
 CONNECTIVES 
 
 
 "AND" 
 
 
 "OR" 
 
 
 "E0R" 
 
 
TRANSFORMATIONS 
 PAGE 1 3 
 
 C. CONDITION SET 
 
 A condition set consists of three parameters, the first and 
 third parameters pointing to variables and the second 
 parameter containing a relational operator. Any condition set is 
 either true or false depending upon whether the two variables 
 satisfy the conditions of the relational operator. 
 
 D. RECODE SET 
 
 A recode set consists of two parameters. The first parameter 
 indicates a variable. The second parameter indicates a variable 
 or a floating point constant. If a recode set is executed 
 the value of the second variable or floating point constant 
 is stored into the first variable. 
 
 The RECODE instruction consists of from one to twenty one condition 
 sets joined together in the case of more than one by connectives. This is 
 followed by a recode set to be executed if the logical product of the condi- 
 tion sets is true and optionally a second recode set to be executed 
 if the logical product is false. Examples: 
 
 REC (1|) "EQ" *3* (M *1* (M *0*. 
 
 If variable k equals 3.0 then recode it to 1.0, if not then recode it 
 to 0.0. 
 
 REC (6) "GE" (10) "AND" (7) "LE" (20) (110) (ill) 
 
 If variable 6 is greater than or equal to variable 10 and variable 7 is 
 less than or equal to variable 20 then recode variable 110 to the 
 value of variable 111 . 
 
 SIG 
 
 The SIGN instruction places the sign of the first variable on 
 the second variable. It is often used for saving signs of variables 
 during intermediate calculations. Example: 
 
 SIGN (l)(90l). 
 
 SIN 
 
 The SINE instruction takes the sine of the first variable and stores 
 it into the second variable. Example: 
 
 SIN (l)(ll). 
 SKI 
 
 The SKIP instruction has two parameters. The first parameter is a 
 sequential unit. The second is a variable or floating point number. 
 The number in the variable or the floating point number indicate the 
 number of rows to be skipped o n the specified unit. Example: 
 
 SKIP (S2)(7) 
 SKIP (S5) *1*. 
 
TRANSFORMATIONS 
 PAGE 1^ 
 
 SQU 
 
 The SQUARE ROOT instruction takes the square root of the first 
 variable and stores it into the second variable. Example: 
 
 SQU (13) (lM. 
 
 SUB 
 
 The SUBTRACT instruction subtracts the second variable from the first 
 variable and stores the result into the third variable. Example; 
 
 SUB (7)(9)(10). 
 XAD - fixed point addition 
 XDI - fixed point divides 
 XIF - fixed point if ' 
 XMU - fixed point multiply 
 XSM - fixed point sum 
 XSU - fixed point subtract 
 
 The preceding fixed point instructions have the same parameters as 
 the corresponding floating point instructions. The variables used 
 in the fixed point instructions must be the fixed point representation 
 rather than the normal floating point representation. 
 
TRANSFORMATIONS 
 PAGE l5 
 
 VIII. EBCDIC Conversion Table 
 
 Character 
 
 Floating 
 
 Point Number 
 
 Character 
 
 Floating Point Number 
 
 blank 
 
 
 -0. 
 
 * 
 
 
 Ik. 
 
 
 
 
 0. 
 
 . 
 
 
 75. 
 
 1 
 
 
 1. 
 
 < 
 
 
 76. 
 
 2 
 
 
 2. 
 
 ( 
 
 
 77- 
 
 3 
 
 
 3. 
 
 + 
 
 
 78. 
 
 k 
 
 
 k. 
 
 1 
 
 
 79- 
 
 5 
 
 
 5- 
 
 & 
 
 
 80. 
 
 6 
 
 
 6. 
 
 ! 
 
 
 90. 
 
 7 
 
 
 7- 
 
 $ 
 
 
 91. 
 
 8 
 
 
 8. 
 
 * 
 
 
 92. 
 
 9 
 
 
 9- 
 
 ) 
 
 
 93. 
 
 A 
 
 
 10. 
 
 » 
 
 
 9k. 
 
 B 
 
 
 11. 
 
 -T 
 
 
 95- 
 
 C 
 
 
 12. 
 
 
 
 96. 
 
 D 
 
 
 13. 
 
 / 
 
 
 97- 
 
 E 
 
 
 Ik. 
 
 5 
 
 
 107- 
 
 F 
 
 
 15. 
 
 J 
 
 
 108. 
 
 G 
 
 
 16. 
 
 
 
 109 
 
 H 
 
 
 17- 
 
 > 
 
 
 110. 
 
 I 
 
 
 18. 
 
 ? 
 
 
 ill. 
 
 J 
 
 
 19. 
 
 J 
 
 
 122. 
 
 K 
 
 
 20. 
 
 # 
 
 
 123- 
 
 L 
 
 
 21. 
 
 e 
 
 
 12U. 
 
 M 
 
 
 22. 
 
 i 
 
 
 125- 
 
 N 
 
 
 23. 
 
 = 
 
 
 126. 
 
 
 
 
 2k. 
 
 H 
 
 
 127. 
 
 P 
 
 
 25. 
 
 
 
 
 Q 
 
 
 26. 
 
 
 
 
 R 
 
 
 27. 
 
 
 
 
 S 
 
 
 28. 
 
 
 
 
 T 
 
 
 29- 
 
 
 
 
 U 
 
 
 30. 
 
 
 
 
 V 
 
 
 31. 
 
 
 
 
 w 
 
 
 32. 
 
 
 
 
 X 
 
 
 33. 
 
 
 
 
 Y 
 
 
 3k. 
 
 
 
 
 Z 
 
 
 35. 
 
 
 
 
TRANSFORMATIONS 
 PAGE l6 
 
 IX. DO-Notation 
 
 DO-notation is a facility provided in the TRANSFORMATION program to enable 
 a user to easily and compactly perform an operation on a set of variables 
 instead of performing that operation on each of the variables individually. 
 This concept of DO-notation corresponds to the concept of FORTRAN DO-loops . 
 The following form of DO-notation would be used in a parameter which 
 points to a variable. 
 
 (v ls V 2 , I) 
 
 V-, = the initial variable of the set 
 
 Vo = the criterion variable for termination of the set. 
 
 I = the increment 
 
 Examples : 
 
 (l, 5, l) points to variables 1, 2, 3, h, and 5. 
 
 (l, 5 » 2) points to variables 1, 3, and 5- 
 
 (k, lU,3) points to variables k, 7, 10, and 13. 
 
 If the increment is not specified it is assumed to be 1, Example: 
 
 (2, 10, l) is equivalent to (2, 10). 
 
 A. The major use of DO-notation is to indicate repetition of an instruction 
 on different sets of variables. Examples: 
 
 ADD(l,5,2)(6,8)(l2,lM is equivalent to ADD(l) (6) (12) . 
 
 ADD(3)(7)(13). 
 ADD(5)(8)(lU). 
 
 MUL(1,10(100)(101,107,2) is equivalent to MUL(l) (100) (lOl) . 
 
 MUL(2)(100)(103). 
 MUL(3)(100)(105). 
 MUL(M(100)(107). 
 
 If, in an instruction containing several parameters using DO 
 notation, the sets of variables are unequal in length, the instruction 
 will cycle until the longest set of variables has been satisfied. 
 The variables used in the shorter sets after they have been exhausted 
 will be the last variables of that set. Example: 
 
 MAX(1,7,3)(1 1 +,15)(13,15,2). is equivalent to MAX(l) (lk) (13) . 
 
 MAX(U)(15)(15). 
 MAX(7)(15)(15). 
 
 B. The secondary use of DO-notation, called DO-notation ranges, is used 
 with the constant, output, and permute instructions. Instead of 
 indicating a repetition of the instruction on each variable in the 
 set, the instruction is executed once. The parameter in which the 
 
TRANSFORMATIONS 
 PAGE IT 
 
 DO-notation range occurs now points to a string of variables. 
 Examples : 
 
 0UTPUT(Sl)(2,T). outputs the string of variables 
 
 2, 3,1+, 5, 6 and 7- 
 
 0UTPUT(S2) (3,7,2). outputs the string of variables 
 
 3,5, and 7- 
 
 PERMUTE (l00)(3)(l0,15)(20,2l+,2). 
 
 has two parameters of DO-notation 
 
 ranges in one instruction. The instruction 
 
 places into variables 100 through 109, the 
 
 following variables: 3, 10 ,11 ,12 ,13 ,lh ,15 , 
 
 20,22,2i+ 
 
 Looking at the CONSTANT instruction, we see DO-notation ranges used 
 to indicate strings of fixed and floating point constants as well 
 as strings of variables. 
 
 The first parameter of CONSTANT can point to a variable or a string 
 of variables. The later is the only case which involves DO-notation 
 ranges, so the discussion will be confined to that case. 
 
 C0N(2,*0(l)(2)(3) • places the fixed point constants one, two 
 
 and three, into the string of variables 
 2,3,U 
 
 By using DO-notation ranges with fixed point constants the equivalent 
 instruction would be 
 
 CON(2,l+)(l,3). 
 
 The next two examples are also equivalent. They both assign to 
 variables five through eight the floating point constants 
 2.5,5-0,7.5, and 10.0. 
 
 CON ( 5 , 8 ) *2 . 5**5 • 0**7 • 5**10 . 0* . 
 
 CON(5,8)*2.5,10.0,2.5*. 
 
 The initial values of the string of 
 floating point constants is 2.5- The 
 termination criterion is 10.0. The 
 increment is 2.5 
 
 All three types of DO-notation ranges can be used together in the 
 CONSTANT instruction. The following example combines both of the 
 preceding sets of examples into one instruction. 
 
 CON(2,8)(l,3)*2.5,10.0,2.5*. 
 
 Note : If, when using DO-notation ranges with the CONSTANT instruction, 
 the string of variables is unequal in length to the total number of 
 fixed and/or floating point constants indicated, the string of constants 
 is truncated if it is longer and if it is shorter the last constant 
 is assigned to the remainder of the variables. 
 
TRANSFORMATIONS 
 PAGE 18 
 
 X. Flag Notation 
 
 Flag notation is TRANSFORMATION'S version of indirect addressing. Instead 
 of a parameter pointing directly to a variable, it points indirectly to a 
 variable through another variable. The parameter points to a variable which 
 in turn points to another variable. This feature enables an instruction to 
 point to different sets of variables depending upon the values assigned to 
 the intermediate variables. 
 
 The main type of flag notation is called F-flag notation. F-flag notation 
 is indicated by inserting an F directly after the variable number. Example: 
 
 ADD(7F)(8F)(9F). Restriction : The values in the 
 
 intermediate variables of flag-notation 
 (variables 7,8, and 9 in this example) 
 must be in fixed point representation and 
 must point to a valid variable number. 
 
 Let the notation Vn indicate variable n. Example: V"7 indicates variable "J. 
 
 If V7 = 100 
 V8 = 150 
 V9 = 180 
 
 then the preceding example would generate after the indirect addressing 
 takes place : 
 
 ADD(100)(150)(180). 
 
 F-flag notation can also be used with DO-notation. Example: 
 
 If V100 = 2 V110 = 6 V120 = 10 
 V101 = 13 Vlll = 16 V122 = 19 
 V102 = 5 V112 = 9 V121+ = 13 
 
 then MUL(l00F,102F)(ll0F,112F)(l20F,12l+F,2). would generate after indirect 
 addressing. 
 
 MUL(2)(6)(10). 
 
 MUL(13)(16)(19). 
 
 MUL(5)(9)(13). 
 
 Note that the limits of the DO-notation are extracted form the intermediate 
 variables and not from the final variables. 
 
 The other type of flag notation is called D-Flag notation. Its only 
 use is with DO-notation. The difference between it and F-flag notation is 
 that the limits of the DO-notation with D-flag notation are derived from the 
 final variables after indirect addressing takes place. Example: 
 
 Using the same variable values as above. 
 
 MUL(100D,102D)(110D,112D)(120D,12UD). 
 
 would generate after indirect addressing: 
 
 MUL(2,5)(6,9)(10,13). 
 
 or 
 
 MUL(2)(6)(10). 
 MUL(3)(7)(H). 
 MUL(U)(8)(12). 
 MUL(5)(9)(13). 
 
TRANSFORMATIONS 
 PAGE 19 
 
 XI. A TRANSFORMATIONS Example 
 
 Let us consider a set of data consisting of ten variables. Variable 
 one contains a zero for females and a one for males. Variables two through 
 five and eight through ten all range from zero to ninety-nine and are to 
 be collapsed into the numbers one to four representing quartiles. Variables 
 six and seven are to be recoded into a dichotomous variable depending if 
 the value is five or not five. The output is then split into males and 
 females suitable for input into two separate FREQUENCY programs. Also 
 desired are averages from variables two through ten before recoding takes 
 place. Totals are kept during the regular segment and then in the last 
 card segment these are divided by the sample size and printed. 
 
 TRANSFORMATIONS 
 
 INPUT (C)(1). 
 
 PERMUTE (6) (8,10) (6,7). 
 
 ADD ( 1001 )*1*( 1001). 
 
 ADD (1002 ,1010 ) (2 ,10 ) (1002 ,1010 ) . 
 
 REC0DE (2,8)"LT"*25*(2,8)*1*. 
 
 REC0DE (2,8)"GE"*75*(2,8)*1+*. 
 
 REC0DE (2,8)"GE"*50*(2,8)*3*(2,8)*2*. 
 
 REC0DE (9,10)"EQ"*5*(6,7)*1*(6,7)*0*. 
 
 IF (l) "BAD ""FEMALE" "MALE". 
 
 "BAD "ABORT. 
 
 "FEMALE"0UTPUT ( S2 ) ( 2 ,10 ) . 
 
 GO TO "END". 
 
 "MALE"0UTPUT ( S3 ) ( 2 , 10 ) . 
 
 "END"N00P. 
 
 LAST. 
 
 DIVIDE (1002, 1010) (1001) (1002, 1010). 
 
 OUTPUT (P) (1001, 1010). 
 
 END PROGRAM 
 
 inputs one row from cards 
 
 reorders the variables 
 
 increments row number by one 
 
 adds respective values to row totals 
 
 recodes first quartile values 
 
 recodes fourth quartile values 
 
 recodes third and second quartile values 
 
 creates dichotomous variables 
 
 branches based on male or female 
 
 aborts program due to bad data 
 
 outputs female data 
 
 branches around male output 
 
 outputs male data 
 
 this instruction is only a placeholder 
 
 indicates beginning of last card segment 
 
 calculates averages 
 
 prints sample size and averages 
 
 The regular segment is executed once for every row of card input data. 
 Here the values are recoded and output, while also row totals are kept 
 in variables greater than 1000. The last card segment is executed once 
 to calculate and print the averages. 
 
TRANSFORMATIONS 
 PAGE 20 
 
 XII. NOTES and IDEAS 
 
 1. Missing data of the form -0.0 can be differentiated from 0.0 
 only in the recode statement, so if this distinction must be 
 made, -0.0 should first be recoded to another value before testing. 
 
 2. The valid outputs in the OUTPUT instruction are PRINT and/or 
 Sn, n<15. 
 
 3. When collapsing data be careful not to overlap your recoding 
 and inadvertently recode values twice or more. 
 
 h. The former ZAP and SPRAY instructions have been incorporated 
 into the CONSTANT instruction by using DO notation with 
 floating point constants. 
 
 5- The former RECIPROCAL instruction can now be performed using 
 a floating point constant of *1* in the DIVIDE instruction. 
 
 6. The ONE RECODE, TWO RECODE, OPEN WHEN, and CLOSED WHEN instructions 
 have all been replaced by the RECODE instruction. For those 
 not familiar with the new terminology in the RECODE instruction 
 more examples appear below. 
 
 Given: VI =10 V3 = 13 V5 = 23 
 V2 =23 VU = 89 V6 = -T 
 
 The following condition sets have the respective truth values. 
 
 (1) "GT" (2) is false 
 (3) "LE" (k) is true 
 
 (2) "EQ" (5) is true 
 (6) "GE" *0* is false 
 (2) "NE" (5) is false 
 (6) "LT" (It) is true 
 
 If two or more condition sets are joined by "AND", they must 
 all be true for the logical product to be true. If two or more 
 condition sets are joined by "OR", then the logical product is true 
 if any of the condition sets are true. If the connectives are mixed, 
 then the "AND" connective is of higher precedence than the other 
 connectives in the same way that multiply is of higher precedence 
 than addition in ordinary arithmetic. 
 
BASIC POPULATION STATISTICS PACKAGE 
 
FREQUENCY COUNTING AND 
 MEASURES OF ASSOCIATION 
 
 General Description 
 
 This program computes tables of the frequency of occurrence of 
 values that input variables take, and where appropriate, measures of 
 association may be computed. Input to the program may be in the form of 
 previously computed tables (on which measures of association will be 
 computed) or may be in the form of raw data. Only integer numbers 
 may be counted; decimal point data will be rounded. Negative values 
 are allowed. 
 
 A. FREQUENCY COUNTING 
 
 The following options are available : 
 
 1. Either one-dimensional or two-dimensional tables may be 
 specified. For one-dimensional counts, the frequency of 
 occurrence for each value of the variable is listed. For 
 two-dimensional counts, each value of the second variable 
 is counted separately for each value of the first variable. 
 
 2. Control variables may be used which enable counting to be done in 
 up to 12 dimensions. If control variables are specified, data 
 must be presorted on these variables. When the value of any 
 variable designated a control variable changes from one row 
 
 to the next, counting is stopped and a new table is started. 
 Thus counting proceeds as long as the values of all control 
 variables remain constant. 
 
 3. The minimum and maximum values to be attained may be specified 
 separately for each variable to be counted (or, optionally, 
 not specified at all). If either the maximum or minimum values 
 are not specified, they will be determined from the data using 
 an extra read of the data. Values which fall below the 
 minimum or above the maximum are ignored. This capability adds 
 flexibility to the program and may be an appreciable cost saver. 
 Its misuse by gross estimates of minimum and maximum values 
 can, however, be costly. 
 
 k. For each cell in a one-dimensional table, the percentage, if 
 requested, of the total sample that were counted there will be 
 printed. For two dimensional tables, the percentage of the row 
 and column may also be requested. 
 
 5. A weighting variable may be specified. Without a weighting 
 
 variable, frequency counts are advanced by one for each occurrence 
 of a value. When a weighting variable is used, the frequency 
 counts are advanced by the value of the weighting variable for 
 the row. Thus some rows of data may be given more importance 
 than others. 
 
FREQUENCY 
 Page 2 
 
 6. Labels can be given for variables so that output is more 
 readable. Each label is restricted to eight characters or 
 less. 
 
 7. Input may be from previously computed two-dimensional tables, 
 from which measures of association can be directly computed. 
 
 B. MEASURES OF ASSOCIATION 
 
 The following coefficients are calculated and printed on option for 
 two -way t ab le s : 
 
 1. Chi-square and related coefficients 
 
 Let: n = total population of the table 
 
 n = number of Vertical classification a (column a) 
 and Horizontal classification b (row b) 
 
 n a. = b n ab 
 
 n , = ^ab 
 • b a 
 
 oc = number of rows 
 P = number of columns 
 
 2 
 LL ( ab a. .b) 
 
 Then: chi-square = , n 
 
 ab - 
 
 n a. n .b/n 
 
 adjusted chi-square (Yate's correction for continuity) for 2x2 
 tables only = 
 
 2 
 ZZ /n , - n n - 1/2 . 
 , ( ab a b ' ) 
 
 ab — - 
 
 a d 
 n 
 
 c ( chi-square/n \ ' 
 1 + chi-square/n 
 
 /Chi-square/nv / 
 
 T = ( (a-lHp-lf ) 
 
 C and T are measures of contingency and can be looked up in con- 
 tingency tables. The maximum expected frequency is also printed. 
 
FREQUENCY 
 Page 3 
 
 2. Lambda coefficients 
 
 Let: n = Max n 
 
 am , ab 
 b 
 
 n , = Max n 
 mb ab 
 a 
 
 n = Max n 
 .m , . b 
 b 
 
 n = Max 
 m. 
 
 a 
 
 r 
 
 i 
 a. 
 
 
 
 
 Lambda = 
 
 £ 
 a 
 
 Z 
 
 n am + b n mb " 
 
 n .m - 
 
 • n m. 
 
 
 
 
 
 
 
 
 
 2n - n 
 
 .m 
 
 - n 
 
 m. 
 
 
 
 
 £ 
 
 a 
 
 n am " n .m 
 
 
 
 Lambda H 
 
 
 £ 
 b 
 
 n - n 
 
 m. 
 
 n mb " %. 
 
 
 
 Lambda V 
 
 = 
 
 
 n - n m. 
 
 
 
 Lambda coefficients will be indeterminate if all values lie in 
 one column or row. 
 
 Lambda H can be defined as the decrease in probability of error 
 in predicting the H-variable when knowledge of the value of the 
 V- variable is considered as opposed to random guessing of the 
 H-variable. 
 
 Ninety-five per cent confidence limits are calculated and printed 
 for Lambda H and Lambda V using the methods discussed by Goodman 
 and Kruskal in their second article. (See references). Lambda 
 is always between Lambda H and Lambda V. 
 
 Weighted Lambda Coefficients 
 
 n n , 
 
 v am ab 
 
 £ Max £ 
 
 „ n n 
 
 Weighted Lambda H = -— ^ B — — ^ 
 
 Max £ ab 
 a - , — 
 b an 
 
 a. 
 
 n r, n v, 
 
 v, mb ,, „ ab 
 E Max z 
 
 b .b a b .b 
 
 Weighted Lambda V = u —■ n —r 
 
 B Max E n ab 
 
 R - a b n , 
 
 K .b 
 
 These are Lambda H and Lambda V calculated using weighted quantities 
 
 n n 
 
 l/cc — — and 1/3 — — , respectively^ instead of n . 
 a. .b 
 
FREQUENCY 
 Page k 
 
 No confidence limits are provided for the weighted lambda coef- 
 ficients. 
 
 k. Gamma Coefficient 
 
 Zi 
 
 Z Z 
 
 Let: Ps = o>, n ab [a'>a b>b n n . h ,] 
 
 ab 
 
 ZE 
 
 Z 
 a' 
 
 Z 
 b' 
 
 PD = ab n ab C a '> a b '> b n a'b'^ 
 
 Then Gamma = 
 
 PS - PD 
 PS + PD 
 
 Ninety-five per cent confidence limits for Gamma are calculated 
 using the method outlined and preferred in the second article by 
 Goodman and Kruskal. 
 
 References 
 
 These coefficients are discussed and compared by Leo A. Goodman and 
 William H. Kruskal in their article "Measures of Association for 
 Cross Classification", American Statistical Association Journal, 
 December, 195^- 
 
 The Gamma coefficient is their suggested measure. 
 
 The C coefficient was first suggested by Karl Pearson and the T 
 coefficient is due to Tchuprow. 
 
 The Lambda coefficients apparently were first suggested by Louis 
 Guttman ("The Predication of Personal Adjustment", Bulletin h8, 
 Social Science Research Council, New York, 19^+1 "i ■ 
 
 The development of the approximate sampling theory and of the 
 machinery for calculating the confidence intervals for Lambda and 
 Gamma was done in a sequel article by Goodman and Kruskal: "Measures 
 of Association for Cross Classification III; Approximate Sampling 
 Theory", American Statistical Association Journal , June, 1963- 
 
 The statistics that are requested will be printed immediately 
 following each table. 
 
 II. 
 
 Restrictions 
 
 The program is limited currently to U50 input variables and 1000 
 tables. Tables are restricted to a maximum of 80,000 cells, 
 each of which can hold a maximum count of 32, 767. As many tables as 
 will fit into work storage (80,000) will be computed in each read of 
 the data. If tables will fit into 80,000 cells, card input is allowed. 
 If maxima and minima are not specified for card input, data will be 
 transferred to disk during preread of data. 
 
FREQUENCY 
 Page 5 
 
 III. Parameters 
 
 Main Parameter Card 
 
 Immediately following the program name FREQUENCY (mnemonic: FRE), the 
 following parameters are listed, each enclosed in parentheses with a 
 period after the last parameter used: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address. 
 
 2 - ignore blanks 
 
 1 - count blanks separately 
 
 2 - count blanks as zeroes 
 
 _ - normal spacing 
 
 3 Spacing . , , / 
 
 1 - one table per page 
 
 h Address of labels. 
 
 5 Variable number of weight variable. 
 
 r „ . - raw data 
 
 6 Type of input . 
 
 n - where n is the number of 
 
 previously computed tables. 
 If n > 1, then input must be 
 from cards, and each table is 
 a separate data deck. 
 
 If both parameters 1 and h are cards, the labels must precede data. 
 
 Subparameters 
 
 Subparameters follow the main parameter card and can be in any order. 
 A period must follow each subparameter statement though the statement 
 can be continued on more than one card. If the subparameter statement 
 is left out, the option is not used. In the following explanation 
 I = integer and F = real number . 
 
 Mnemonic Use or Meaning 
 
 PER(l) (i) (i) . Per cents are requested . 
 
 1 - yes 
 
 l s "k integer = total per cent 
 2 n integer = row per cent 
 3 integer = column per cent 
 
 MIN*F**F* Minimum and maximum are given. The 
 
 MAX*F**F* last value is propagated to any 
 
 remaining variables. Data will be 
 reread if either MIN or MAX is missing. 
 
FREQUENCY 
 Page 6 
 
 Mnemonic 
 
 MEA(I)(I)(I)(I). 
 
 Only applicable to 
 2-way tables 
 
 Use or Meaning 
 
 CONTROL (I Hi 11 
 
 ONE(l,I,lHl,I.l) 
 TW0(I,I,I)(I,I,I) 
 
 - no 
 
 1 - yes 
 
 1 S ^ integer = X (with a code of 
 
 Measures are requested 
 r 2 
 
 2 both X^ and a table of expected 
 frequencies will be printed) 
 
 2nd integer = \ (lambda") 
 3 r integer = weighted X 
 h integer = j ( gamma ^ 
 
 Up to 10 control variables are allowed. 
 The I ' s should be the variable numbers 
 of the control variables. 
 
 One and only one of these two must be 
 .... in every program. ONE means one-way 
 tables. TWO means two-way tables. 
 In ONE, (l,I,l) specifies one range of 
 tables. In TWO, (l,I,l) (l,I>l) specifies 
 one range of 2-dimensional tables. 
 
 The notation (1,1,1 s ! has the following meaning: If it is absent com- 
 pletely, i.e., ONE. or TWO. then all possible tables are calculated. 
 The first integer is the initial value, the second is the terminal 
 value and the third is the increment. It means: take all values 
 starting at the first integer and stepping by the third integer until 
 you reach the second integer. If the third integer is missing, the 
 increment is taken to be one. If the second is also missing, then 
 the first is taken as a single table specification. As many as 
 wanted can be specified subject to the following restrictions: In 
 the two-way tables, no more than 5^0 separate ranges, i.e., (I, I, lHl>I> l) 
 can be specified. 
 
 IV. Labels 
 
 Labels can come from cards or temporary storage. Each label should 
 be treated as if it were two variables each h characters long. For 
 example, if there are 6 variables in the input data, then there would be 
 twelve variables for labels and the data card would be DATA(l2) (12AU) . 
 
 All the labels are treated as one row of input n variables long. 
 Labels need not be given for each variable but if a variable is skipped 
 and more labels follow, then it should be replaced with eight blanks. 
 
 Examples 
 
 1. FRE(C). 
 PER(l). 
 
 C0NTR0L(1)(3) • 
 0NE(2,6,2)(9). 
 END P 
 
 Input is from cards; per cent of totals will be printed; control variables 
 are 1 and 3; resulting tables are 2, k, 6, and 9. Blanks will be ignored. 
 
FREQUENCY 
 Page T 
 
 2. FRE(C) (2) (1) (C)(2). 
 PER(l) (1) (1) . 
 
 TW0(3)(1+) (2,6,2) (1,5,2). 
 
 END P 
 
 All per cents will be given; 2 will be the weighting variable; resulting 
 tables will be 3 vs h; 2 vs 1; 2 vs 3 j 2 vs 5 > ^ vs 1 ; U vs 3 ; h vs 5 ; 6 vs 1; 
 6 vs 3; 6 vs 5. Tables will be printed one per page and blanks will be 
 counted as zeroes. 
 
 Since both labels and data are on cards the deck will look like this: 
 FRE(C) 
 
 END S 
 
 DATA(n)(nAU) 
 
 label for first labeled variable. 
 
 ENB# 
 
 MTA(n/2)( ) 
 
 .label for last labeled variable 
 
 END# 
 
 3- FRE(S1)(1). 
 TWO. 
 MEA(2)(1)(1)(1). 
 
 END P 
 
 All possible two-way tables will be calculated; all four measures will be 
 calculated and the table of expected frequencies will be printed. Blanks 
 will be counted separately. 
 
 k. FRE(C)()()()0(2). ' 
 MEA(l)()(l). 
 
 END P 
 
 o 
 Input is m the form of two previously computed tables. X and weighted X 
 
 will be calculated. 
 
 Since there are two tables the deck will look like this: 
 FRE(C) 
 
 END S 
 
 DATA 
 
 END# 
 
 DATA 
 
 END# 
 
FREQUENCY 
 Page 8 
 
 VI. Output Examples 
 
 A. ONE-DIMENSIONAL TABLE 
 
 A one-dimensional frequency table might "be output as follows 
 
 VALUE 
 FREQ 
 
 1 
 3 
 
 k 
 25 
 
 5 
 
 30 
 
 9 
 2 
 
 TOTAL 
 
 6o 
 
 This table indicates that the value 1 occurred 3 times, that the 
 value k occurred 25 times, and so on, for a total of 60. 
 
 B. TWO-DIMENSIONAL TABLE 
 
 A two-dimensional frequency table might look like this: 
 
 VARIABLE 
 
 1 
 
 ACROSS 
 
 
 
 VARIABLE 
 
 2 
 
 DOWN 
 
 
 
 VALUE 
 
 
 1 
 
 3 
 
 5 
 
 2 
 
 
 1 
 
 It 
 
 3 
 
 5 
 
 
 2 
 
 8 
 
 1 
 
 7 
 
 
 1 
 
 2 
 
 1 
 
 SAMPLE SIZE = 23 
 
 This table would indicate that simultaneous observations of 1 for 
 variable 1 and 2 for variable 2 occurred once. A value of 3 for 
 variable 1 at the same time as a value of 5 for variable 2 occurs 
 8 times. The number of observations in the sample was 23. 
 
RANK ORTERING PROGRAM 
 
 I . General Description 
 
 A. Purpose 
 
 The RANK ORDERING program receives as input raw data matrix and 
 produces as output a matrix in which each element has been replaced 
 by a number denoting the rank of the element WITHIN ITS COLUMN. In 
 other words, each column of the input matrix is considered a separate 
 variate and will be converted to a corresponding ranking. 
 
 The smallest variate-value is assigned rank 1.0, the next largest 
 a rank 2.0, etc., until the largest variate-value is assigned the 
 highest rank. In the case of tied values, identical ranks are assigned 
 to equal values, the rank-number being set equal to the average of the 
 rank which would occur if the tied values were distinguishable. This 
 is sometimes known as "mid-rank method". 
 
 B. References 
 
 Kendall, Maurice G., Rank Correlation Methods , Charles Griffin and 
 Co., Ltd. London, I9U8. 
 
 II. Restrictions 
 
 A . Input 
 
 The input data to this program may come from any source. If cards 
 are used as input, the number of rows in the input matrix must be 
 specified on the data format card and the total number of elements 
 in the matrix may not exceed 30,000. The maximum number of rows 
 for any matrix input to this program is 30,000, and the maximum 
 number of columns for any matrix input to this program is U50. 
 
 B . Output 
 
 If an input matrix contains more than 30,000 elements, an automatic 
 partitioning of the input data occurs such that each partition contains 
 the maximum number of complete columns possible within the constraint 
 that no one partition may contain more than 30,000 elements. 
 
 The results of the ranking of each partition are output separately, 
 one partition per output address specified as a parameter on the pro- 
 gram parameter card. A maximum of twenty- one such output address are 
 allowed . 
 
 CAUTION : If partitioning is anticipated, the user should specify one output 
 address for each partition anticipated. This warning applies especially in 
 the case where printed or punched output occurs. Printing and punching will 
 occur only for the partitions for which it is specified. 
 
RANK ORDERING PROGRAM 
 Page 2 
 
 The exception is for partitions over the twenty-first one. For partitions 
 beyond the twenty-first, printing and punching is done if it was specified 
 for the twenty-first partition. However, no partitions beyond the twenty- 
 first one may be stored on a peripheral device (SEQUENTIAL address). 
 
 C. Data 
 
 Since all comparisons in this program are done in single word length 
 operands, in some cases the program may not be able to successfully 
 differentiate between two values which agree through the first five 
 significant digits and differ in subsequent digits. 
 
 III. Parameters 
 
 The parameters for the RANK ORDERING program must follow the program 
 name on the program call card in the order given below: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Addre s s . 
 
 2-23 Output Address. 
 
 IV. Special Comments 
 
 If RANK ORDER correlation coefficient P (Spearman's rho) is desired, 
 the rankings should be input to the CORRELATION program (see individual 
 program description) and the Product Moment Correlation coefficient 
 obtained. 
 
 CAUTION : Input to the CORRELATION program is restricted to 175 columns 
 (variables) . 
 
STANDARD SCORES 
 
 General Description 
 
 This program is used to calculate the following: 
 _ N 
 
 Mean: 
 
 i=l 
 
 1J 
 
 N 
 
 Standard Deviation: 
 
 Variance : 
 
 V. 
 
 Standardized Scores: 
 
 S.= 
 
 - N N 
 N E XT. - (2X. .) 
 i=l ^ i=l ^ 
 
 J 
 
 2 , 
 
 L ' J 
 
 z. . 
 
 ID 
 
 X. . 
 10 
 
 X, 
 
 1/2 
 
 IX 
 
 Moving Averages: X J 
 
 i=l 
 
 1J 
 
 where b=number of periods 
 
 II. Restrictions 
 
 A. The maximum number of variables is ^50. 
 
 B. Means, standard deviations, and W's may be calculated using as many 
 as 30 control variables. Data must be presorted (for instance with 
 SORT-MERGE or on a card sorting machine) on the control variables. 
 
 C. In obtaining moving averages where nvar = number of variables 
 and b = length of period, nvar*b is fixed for any core size. 
 
 If a design will not fit, a message will be printed giving proper 
 increment for Region. At 200K approximately 12000 cells (nvar*b) 
 are available . 
 
 D. Moving averages are exclusive of all other options. 
 
 E. Output is of four categories: 
 
 1. With or Without Control Breaks 
 
 a. Sample size, mean, standard deviation and variance. 
 
 b. Moving averages: sample size, mean, standard 
 deviation and variance. 
 
 Without Control Breaks 
 
 a. Standard scores, printed output includes sample size, 
 mean, standard deviation and variance. Missing data 
 option for mean, etc., is possible, but user is cautioned 
 against its use. Means, etc., may be output to a 
 temporary storage location. 
 
STANDARD SCORES 
 Page 2 
 
 Standard scores about a given mean and standard deviation. 
 Printed output includes sample size, mean, standard 
 deviation and variance. Missing data option for mean, 
 etc., is possible. Output may consist of standard 
 scores and about a given mean and standard deviation. 
 Output may be print and/or two different temporary 
 storage locations. Means, etc., may be output to a 
 temporary storage location. 
 
 III. Parameters 
 
 The parameters appear on the program call card following the 
 program name STANDARD SCORES in this order: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address. SEQUENTIAL 1-15 . Cards if only 
 
 - means, standard deviations, and variance desired, 
 or if precalculated means and standard deviation 
 are supplied (see parameter 10). 
 
 2 Output Address of Standardized Scores. 
 
 3 Output Address for Mean, Sample Size, 
 Standard Deviation, and Variance . Sample 
 Size, Mean, Standard Deviation, and Var- 
 iance can be put out on a temporary unit . 
 Output is in the form of four column 
 
 vectors (N, X, S., V.). 
 J J 
 
 h If 1, use N-l, if 0, use N for denominator 
 
 of standard deviations. N-l gives an unbiased 
 estimate of the population standard deviation 
 and population variance. N gives the sample 
 standard deviation and sample variance . 
 
 5 Output Address for Standard Scores about a 
 specified Mean and Standard Deviation. 
 SEQUENTIAL 1-13 and/or PRINT. 
 
 6 If parameter 5 is being used, place desired 
 Mean between asterisks, for example, *50*. 
 
 7 If parameter 5 is being used, place desired 
 Standard Deviation between asterisks, for 
 example, *5*- 
 
 8 Moving Averages: Put the number of periods 
 
 (observations) over which it is desired that 
 the data be averaged (i.e. b) . If control 
 variables are being used and/or the actual 
 number of observations is less than stated, 
 the data will be averaged using the actual 
 number of observations. 
 
STANDARD SCORES 
 Page 3 
 
 Parameter 
 Number Use or Meaning 
 
 9 If set equal to 1, blanks coded as -0.0 will 
 
 be checked for. s= _ , Tr 
 
 X., S., and V. will reflect 
 
 reduced N. 3 3 3 
 
 10 Input Address of means and standard deviations. 
 
 1st row contains means, 2nd row contains standard 
 deviations. All additional rows input are ignored, 
 Valid only for standard scores. 
 
 If using controls, on a separate card immediately after the STANDARD 
 SCORES card, list variable numbers of those variables used as controls. 
 For example, if controlling on variables 1, 2, and k: 
 
 STA(T2)(P) . 
 $C-B(1)(2)(1+). 
 
 NOTE: If there is only 1 observation and parameter k is set to 1, then the 
 mean, standard deviation and variance for that variable will be set to zero. 
 
 If blanks are checked and standard scores are requested, those observations 
 which have a blank will remain blank after calculation of standard scores. 
 
 Example (Use of Parameter 10 ) 
 
 // EXEC SOUP 
 //SYS IN DD * 
 MAT. 
 
 M0V(C)(S1). 
 END P 
 
 C0R(S1)(S2). 
 MAT. 
 
 TEA(S2)(S3). 
 END P 
 
 STA(S1)(P)()()()()()0()(S3). 
 END S 
 
 DATA(lO) (10F1.0) 
 
 END# 
 /* 
 
ANALYSIS OF VARIANCE PACKAGE 
 
BALANOVA 5 
 A General Analysis of Variance Program for the IBM 709^- 
 Paul Herzberg August 1966 
 Statistical Service Unit Research, University of Illinois, Urbana 
 
 Table of Contents 
 Chapter 1. General Description Page 1 
 
 1.1 Introduction 
 
 1.2 Special features 
 
 1.3 Legal designs in BALANOVA 5 
 
 l.k Calculations for equal and unequal number of replications 
 1.5 Specification of a design 
 
 Chapter 2. Design Examples 
 
 OMITTED TEMPORARILY 
 
 Chapter 3 • Preparation of Input 
 
 3.1 Introduction 
 
 3-2 Data matrix examples 
 
 Chapter k. Program Details 
 
 Page 8 
 
 Page 9 
 
 Page 15 
 
 h.1 Method and program flow 
 
 k.2 Some comments on accuracy of computation 
 
 k.3 Error conditions 
 
 k.h Program checkout 
 
 h.^> Number of levels of the replication factor 
 
 h.6 Acknowledgements 
 
 ^•7 References 
 
 Appendix. Key to designs in Winer and Lindquist Page 21 
 
 OMITTED TEMPORARILY 
 
BALANOVA 5 
 
 This version of BALANOVA 5 is unchanged as far as computations are 
 concerned from the original program written by Paul Herzberg in August, 
 1966. See his write-up for explanation and references for the calculations, 
 
 Each observation (row of data) input to this program must be identi- 
 fied by a number for each factor including the replication factor. These 
 numbers (which cannot be punched in I format) represent the levels of the 
 corresponding factors and must precede the dependent variables. In the 
 output produced by the program, each factor is given a unique letter name 
 beginning with A. Thus the first column of the input data corresponds to' 
 the levels of factor A which is described on the first factor specification 
 card (see below). Each additional factor is given the next letter in til 
 alphabet, and a corresponding factor specification card. The dependent 
 
 onTttoL I ^Vf/^r l6VelS ° n the lnPUt ^ and th ^ are numbered 
 program! *" ° f dependent variables, in the output of the 
 
 name ^AL^NOvTT "^w^*?' *** followln S Parameters follow the program 
 name, BALANOVA 5; with the first four parameters being required. 
 
 Parameter 
 Number Use or Meaning 
 
 1 
 2 
 
 Input Address. SEQUENTIAL 1-15 OR CARDS. 
 
 Number of factors counting replication 
 factor if there is one. Maximum = 10. 
 
 3 Number of dependent variables. 
 
 Maximum = 200. 
 
 k Number of levels of the 1 st factor. 
 
 J 1-3 lMirm'hov <->-P 1 Q,r»i ^ „.p j--u _ ^,__ i -, „th 
 
 11+ 
 
 Number of levels of the 2nd - 10 th factors, 
 
 1 if desire unweighted means analysis even though 
 have proportional cell frequencies. 
 p 1 to suppress printing of cell means, 
 
 fs „„+•"? t he P r °e ram ca rd is a separate subparameter card (factor speci- 
 
 the innTtT^ I T* f&Ct ° T ln the 0rder in Which the Actors appear in 
 the input data. Each card has the following parameters. 
 
 Parameter 
 
 Number Use or Meaning 
 
 if fixed factor 
 
 1 if random factor 
 
 if not the replication factor 
 
 1 if is the replication factor 
 
BALANOVA 5 
 Page 2 
 
 Parameter 
 Number Use or Meaning 
 
 3-11 factors in which this factor is nested 
 
 As in other SOUPAC programs, parameters at the end of the card which are 
 not used may be deleted and the period appear after the last non-zero para- 
 meter. The factor specification cards must be followed by an END PROGRAM 
 card. 
 
BALANOVA 5 
 
 Chapter 1. General Description 
 
 1.1 Introduction 
 
 BALANOVA 5 is a general analysis of variance program applicable to a 
 wide-range of balanced designs. In the case of designs with a replication 
 factor, BALANOVA 5 allows inequality in the number of replications in each 
 cell. If the number of replications is equal or proportional, the analysis 
 is handled by least squares (weighted means). If the number of replications 
 is not proportional then an unweighted means analysis is performed. This 
 is an approximation to the least squares solution. 
 
 BALANOVA 5 accepts some designs that are not completely crossed, namely 
 those nested designs in which all main factors are balanced. Hence hierarchical 
 designs are allowed. As well, repeated measures designs are allowed. In these 
 designs the replication factor is not nested in all the other factors. 
 
 The design model may be fixed-effects, random- effects or mixed. BALANOVA 5 
 automatically determines all the legal sources of variation (main effects and 
 interactions^ and determines the correct denominator mean square for those 
 sources which can be tested by F test. In order to do this, BALANOVA 5 first 
 generates the expected mean square table which is printed in readable form. 
 The method used closely follows Scheffe (1959"), Chapter 8. 
 
 BALANOVA 5 will accept most of the designs described in Winer (1962), 
 Chapters 3, h, 5, 6, and 7 and Lindquist (1953), Chapters 3, 5, 6, 6, 8, 9, 
 10, and 13 (Types I, III, V1 1. Chapter 2 and the Appendix of this manual 
 contain a large number of examples drawn from these two books. 
 
 The author was somewhat reluctant to develop BALANOVA 5 and is hesitant 
 to encourage its wide use for the following reasons: 
 
 1. A general program such as BALANOVA 5 encourages the use 
 of statistics in a "cook-book" manner. Data is generated 
 to fit the input specifications of the program with no 
 consideration given to the theory of analysis of variance. 
 The experimenter who uses a computer program in this way 
 often neglects to consider whether the statistical test 
 
 is appropriate for the work he is interested in and whether 
 the assumptions needed for the test are satisfied in the 
 particular experiment he has used. 
 
 2. In the author's experience, the results printed by such a 
 program as BALANOVA 5 have a certain finality which encourages 
 the user to accept the results as gospel truth. When used 
 
 in this way, the experimenter forgets that there is a real 
 possibility of programming or machine error. 
 
 3- In the particular case of analysis of variance, the idea 
 has become widespread that the summary table of F ratios 
 is the most important part of the analysis. This is not 
 the case . The most important part of analysis of variance 
 
-2- BALANOVA 5 
 
 is the estimation of the main effects and the 
 interactions. Only by looking at their size can 
 the experimenter evaluate what is happening in 
 his experiment. In order to encourage this use 
 of analysis of variance, BALANOVA 5 prints a table 
 of marginal means which allows easy calculation of 
 all the effects in the experiment. The F table is 
 only a set of warning signals. A non- significant 
 F indicates that the corresponding differences 
 between effects can be attributed to sheer chance. 
 
 k. BALANOVA 5 performs an unweighted-means analysis when 
 the replication numbers are non-proportional. The 
 author fears that this option will be used too often and 
 without consideration of its dangers. The unweighted- 
 means solution is often not satisfactory and references 
 on analysis of variance should be consulted. (Scheffe, 
 1959, Winer, 1962, Lindquist, 1953). 
 
 BALANOVA 5 was designed to reduce the great amount of hand computation 
 needed in analysis of variance calculations. It was not intended to eliminate 
 the necessity of the user being familiar with the theory of analysis of vari- 
 ance. It is hoped that the above comments will discourage some indiscriminant 
 use of BALANOVA 5- 
 
 1.2 Special Features 
 
 The output from BALANOVA 5 consists of 
 
 1. A table of the expected mean squares in readable 
 form. 
 
 2. The number of replications in each cell in the 
 case of designs with a replication factor. 
 
 3- The table of marginal means. All means entering 
 
 in the computation of the sum of squares are printed. 
 
 h. The analysis of variance summary table including, 
 for each source of variation, the sum of squares 
 and mean square, and for each source with denominator, 
 the F ratio and the probability of the chance occurance 
 of the F ratio. 
 
 A feature of BALANOVA 5 is its flexible specification of analysis of 
 variance designs, allowing a wide range of designs to be described by a 
 common code . 
 
 A large number of checks are made by BALANOVA 5 to ensure that the design 
 is legal and that the data correspond to the design. Diagnostics are printed 
 to indicate all error conditions. 
 
BALANOVA 5 
 
 1-3 Legal Design of BALANOVA 5 
 
 Consider the following definitions, taken from Scheffe (1959) • Let 
 there be p factors in a design, not counting the replication factor, if 
 there is one. A cell is specified by a set of p levels, one for each 
 factor. The layout of design is complete if there is at least one ob- 
 servation in every cell. The factors in such a design are completely 
 crossed. If the design is complete and there is a replication factor 
 (i.e. all cells have at least one observation and at least one cell has 
 more than one observation^ then the design is considered to be a Class A 
 design in BALANOVA 5- 
 
 There are many analysis of variance designs which are not complete 
 in the above sense. Examples of incomplete designs are Latin-square, 
 incomplete blocks and nested designs. The only incomplete designs 
 which are allowed in BALANOVA 5 are nested designs which are balanced in 
 all factors except for the replication factor (which need not be balanced). 
 These incomplete designs are called Class B and C designs. "Nesting" 
 "balanced" and "replication factor" are defined in the nest three para- 
 graphs. These definitions are illustrated in Chapter 2. 
 
 Nesting may be defined as follows: The levels of a factor C are 
 nested within the levels of a factor A (in short, C is nested within A) 
 if and only if each level of C appears with only a single level of A in 
 the observations. Note that if C is not nested within A, it is crossed 
 with A, but only if every level of C appears with every level of A is 
 C completely crossed with A. Latin-square and incomplete block designs 
 are only partly crossed. 
 
 A nested factor C is balanced if the number of levels of C is the same 
 within each combination of those factors within which C is nested and the 
 factors (if any) which are crossed with C are completely crossed. 
 
 A replication factor , in BALANOVA 5, is a factor which is nested within 
 one or more other factors, but not necessarily within all other factors. 
 Furthermore, no factor may be nested within the replication factor. That is, 
 a factor is a replication factor if and only if for every other factor A in 
 the design, it is either nested within A or crossed with A. A replication 
 factor may be nested within some factors and crossed with others. There can 
 be at most one replication factor in a design. 
 
 The distinction is made between replication factors and other nested 
 factors in BALANOVA 5 since replication factors do not have to be balanced . 
 All other factors must be balanced. 
 
 Using these definitions, the following designs are legal in BALANOVA 5- 
 
 Class A designs (completely crossed with nested replications) 
 
 Class A designs contain (p + 1) factors of which p are the main factors 
 and the other factor is the replication factor. The following two conditions 
 must both be met for the design to be Class A. 
 
-k- 
 
 BALANOVA 5 
 
 (a) All p main factors are completely crossed. 
 
 (b) The replication factor is nested in all main factors. 
 Thus one-way and factorial designs are Class A designs. 
 
 Class E designs (other replication designs) 
 
 Class B designs also contain (p + 1) factors of which p are the main 
 factors and the other factor is the replication factor. However one or 
 both of the two conditions, (a) and (b), are not satisfied in Class B 
 
 designs. 
 
 When (a) is not satisfied, that is, the p main factors are not 
 completely crossed, then the main factors must satisfy the following 
 condition. 
 
 (a') Consider any two main factors, A and B. Either A is completely 
 crossed with B, or A is nested within B or B is nested within 
 A. This must be true for all pairs of main factors. Further- 
 more, at least one pair must have the nested relationship or 
 else (al would be satisfied. 
 
 When (b) is not satisfied, then the following condition must be true. 
 
 (b') The replication factor is nested in at least one but not all 
 main factors. Note that the requirement that the replication 
 factor be nested in at least one factor is part of the basic 
 definition of a replication factor. 
 
 Class B designs then can be of the following two types. 
 
 Hierarchical designs: (a') and (b) are satisfied. The replication 
 factor is nested in all factors but there is some nesting among the main 
 factors. 
 
 Repeated measures designs: (b 1 ) is satisfied. Either (a) or (a') can 
 be satisfied. The necessary feature (b 1 ) of repeated measures designs is 
 that the replication factor is crossed with one or more of the main factors. 
 The factors in which the replication factor is nested may themselves be either 
 crossed (a) or nested (a'). 
 
 Class C designs (no replication factor) 
 
 Class C designs have p factors and there is no replication factor. 
 All factors must be balanced. For each pair of factors, e.g. factors A 
 and B, either A is completely crossed with B, or A is nested within B or 
 B is nested within A. There does not necessarily have to be any nesting 
 at all. 
 
BALANOVA 5 
 
 In summary, then, designs are classed in the following way in BALANOVA 
 5. Class A and B designs have a replication factor, Class C designs do not. 
 Class A designs are distinguished from Class B designs in that a Class A 
 must have l) all main factors completely crossed and 2) the replication 
 factor nested in all main factors. Class B designs violate one or both 
 these requirements. 
 
 In Class A and B designs, the replication factor does not need to be 
 balanced. However all nested factors, except the replication factor (if 
 any^, must be balanced. Recall that in Class A designs, the replication 
 factor is the only nested factor. 
 
 As explained above, the replication factor is distinguished from 
 other nested factors since it does not have to be balanced. There are 
 two other reasons for distinguishing the replication factor from other 
 nested factors. These reasons are important even if the replication 
 factor is balanced. 
 
 1. In Class A designs (completely crossed with replications) only 
 cell means are stored in the computer and thus very large designs 
 can be accommodated. The allowable number of replications in 
 each cell is virtually unlimited. 
 
 2. For all replication designs, whether of Class A or B, the level 
 number for the replication factor in each nest does not need 
 
 to run from one up to the maximum number of levels in each 
 nest as it does for all other factors. Any convenient numbering 
 of the replications may be used (e.g. a unique number for every 
 subject in the experiment, regardless of the nest within which 
 he is"> . This feature of BALANOVA 5 is especially useful when 
 several dependent variables are analyzed and there is missing 
 data for some of the subjects for some of the dependent variables. 
 
 l.U Calculations for equal and unequal number of replications 
 
 The calculations performed by BALANOVA 5 for designs with a replication 
 factor (Class A and B designs) depend on whether the number of replications 
 in each cell are equal or unequal. If the numbers are equal, the standard 
 analysis of variance calculation is made (least-squares or weighted means 
 analysis^. If the numbers are unequal, a check is first made to see if the 
 cell N's are proportional. In a two-way analysis of variance, for example, 
 the cell N's are proportional if the number of replications in the ij cell, 
 N-ji, satisfies 
 
 1J N rr 
 
 where the T's indicate marginal totals. If the cell N's are proportional, 
 BALANOVA 5 makes the least-squares calculations, i.e. weighted means are 
 used. If the cell N's are not proportional, then the method of unweighted 
 means is used (See Scheffe, pp. 262-3 or Winer, pp. 222-U) . 
 
BALANOVA 5 
 
 In general, if 1, j, k, 
 
 X are those factors within which the 
 
 replication factor is nested (not necessarily all the factors in the design'), 
 
 and if N 
 
 then the cell N's are proportional, if, for all combinations ijk...A, 
 
 ink. , .X ^ s ^ e num b er °f replications in a particular nested cell, 
 
 N 
 
 N 
 
 iTT. 
 
 ijk. . . .X 
 
 ■ T X N TjT. 
 
 N 
 
 ,q-i 
 
 TTT. 
 
 x N 
 
 TTT X 
 
 In this formula, the T's indicate marginal totals and q is the number of 
 factors within which the replication factor is nested. In particular, the 
 one-way analysis with unequal N's is a proportional design (i.e. the cell 
 N's are proportional ) by this definition, since 
 
 N. 
 
 N. 
 
 N i 
 
 (V 
 
 N. 
 
 1 
 
 In fact any design in which the replication factor is nested in only one 
 factor is a proportional design. 
 
 1-5 Specification of a Design 
 
 Any design is described by listing the following information about each 
 factor in the design, including the replication factor if there is one. The 
 information for each factor is punched on a separate card (a factor specifi- 
 cation card), and the cards should be in the same order as the factors are 
 in the input data. Each parameter should be enclosed in parentheses, and 
 each card terminated by a period. 
 
 Parameter 1 
 
 Type of factor. The first parameter on each 
 factor specification card should be a zero 
 if the factor is fixed, and a one if it is 
 random. The replication factor is a lways a 
 random factor. At least one factor in every 
 design must be random. 
 
 Parameter 2 
 
 Replication Factor . If the design has a 
 replication factor, this is indicated by 
 punching a one for the second parameter. 
 A design may have only one replication factor. 
 If there is no replication factor, the second 
 parameter should be zero (or blank) on all 
 of the factor specification cards. 
 
 Parameter 3-11 
 
 Nesting . The factors in which the given factor 
 ( the one to which this card refers^ is nested 
 are listed. Factors are numbered from one 
 through the number of factors in the design. 
 If the factor is not nested, parameters 3-H 
 may be completely omitted. 
 
-7- BALANOVA 5 
 
 An example of this way of specifying a design will be now given. 
 Consider a two-way analysis of variance with subjects within cells. The 
 design is considered to have three (not two) factors, namely A and B, 
 the main factors, and C, the replication factor. Suppose there are 3 
 levels of A and k levels of B and that each cell has 10 subjects. The 
 cards used to perform this analysis are listed below. Each line corresponds 
 to one IBM card. 
 
 TRANSFORMATIONS (CARDS) (SEQUENTIAL l) (k) . 
 
 END PROGRAM 
 
 BALAN0VA( SEQUENTIAL D (3^ (l) (3) (k) (10) . 
 
 (0>(0). 
 
 (o)(o). 
 
 (1)(1)(1)(2). 
 
 END PROGRAM 
 
 The first card listed above calls the TRANSFORMATION program and uses 
 it to store raw data from cards onto sequential file number 1 (SEQUENTIAL 1) . 
 The number four is the total number of variables, independent (3) + dependent 
 (l). Since no transformations are performed, the END PROGRAM card immediately 
 follows the main program card. 
 
 The third card listed above calls the BALANOVA program. The first para- 
 meter is the location of the data (SEQUENTIAL V, the second is the number 
 of factors (3" 1 , the third is the number of dependent variables (1), the fourth 
 is the number of levels of the first factor (3S followed by the number of 
 levels of the second factor (k) , and finally the number of levels of the 
 last factor, which is the maximum cell size (10) when considering the repli- 
 cation factor. 
 
 The fourth card is the factor specification card for factor 1. The first 
 2 parameters are zero, labeling this factor as fixed, and as not being the 
 replication factor. The fifth card is the factor specification card for factor 
 2 which is also fixed, and not the replication factor. Note that parameters 
 3-11 are blank, as factors 1 and 2 are not nested in any other factors. 
 
 The sixth card is the factor specification card for factor 3- Its four 
 parameters denote it as a random factor, as the replication factor, and as 
 nested in factors 1 and 2. The seventh card terminates the BALANOVA program. 
 
-8- BALANOVA 5 
 
 Chapter 2. Design Examples 
 OMITTED TEMPORARILY 
 
-9- BALANOVA 5 
 
 Chapter 3- Preparation of Input 
 
 3-1 Introduction 
 
 The following rules apply to the assignment of factor levels in all 
 types of designs. There is usually no need to rekeypunch existing data 
 however, as it is almost always possible to create the factor levels in 
 the TRANSFORMATIONS program. If you need help using this program, see a 
 SOUPAC consultant. 
 
 (al Non-replication Factors in Class A, E and C designs 
 
 The levels for non-replication factors must run from one (l) con- 
 secutively up to the number of levels given on the BALANOVA call card. 
 E.g., if a factor represents four treatment groups, these groups must be 
 numbered 1, 2, 3, and k and each subject's row or rows in the data matrix 
 must have a 1, 2, 3, or k punched to indicate the group he is in. If the 
 factor is nested, the level numbers must run from one (l^ up, in each 
 cell of the nest. See the example given in Section 3-2. 
 
 (b^ Replication Factors in Class A design 
 
 The replication numbers (level numbers"! can be anything , for example, 
 a subject identification number. The subject numbers do not have to be 
 unique either in a group or between groups. In fact, to tell the truth, 
 in Class A designs, the replication level is not used but it must never- 
 theless appear, even if it is a dummy. This statement does not apply to 
 other design classes. 
 
 (c) Replication Factors in Class B designs, of repeated measures type 
 
 Special care must be taken with the replication levels in those designs. 
 Let us divide the non-replication factors into two groups : 
 
 a- set: those factors in which the replication factor is nested. 
 
 P-set: all other factors - i.e. those factors crossed with the 
 replication factor. 
 
 If the p-set is empty, the design is of hierarchical rather than repeated 
 measures type. See paragraph (d) below. 
 
 Let us denote by an a- cell a particular set of levels of the factors in 
 the a-set. The replications in this cell may be any values (not necessarily 
 from one (IV up) but must be distinct. Again, the numberings in two different 
 a- cells do not have to be distinct, but can be. In other words, if the 
 replication factor is subjects, an identification number may be used as the 
 replication level. Now each subject appears in more than one row (card) of 
 the data matrix since each subject appears with every combination of levels 
 of the factors in the p-set. Now it should be obvious that every row that 
 refers to the same subject has to have the same replication number. This is 
 the only way that BALANOVA 5 can tell that two different rows refer to the 
 same subject. 
 
10- BALANOVA 5 
 
 (d'l Replication Factors in Class B designs, of hierarchical type 
 
 The subject numbers must all be different within any one cell (one 
 level set of the non-replication factors) but may be the same over 
 different cells. 
 
 Special note on missing data 
 
 If a dependent variable' field on a card is totally blank, BALANOVA 5 
 does not include the score in the analysis for the given dependent variable. 
 However other non-blank dependent variable fields on the same card will be 
 included in their respective analyses. 
 
 Do not confuse this deletion of missing data with an error comment by 
 BALANOVA 5 to the effect that there is no data cell A = 1, B = 2. This 
 comment means that no data card with A = 1, B = 2 had non-blank data for 
 the given dependent variable'. 
 
 3-2 Data matrix examples 
 
 Class A design 
 
 In all the following examples, it is assumed data is stored on 
 sequential file number 1 (SEQUENTIAL l) . 
 
 Consider a two-way design with three subjects in each cell. For the 
 purpose of BALANOVA, subjects are also considered to be a factor, the 
 replication factor. Suppose that there are two dependent variables, and 
 further that the factor specification cards are listed in the order given 
 below, following the main program card. 
 
 BALANOVA (SEQUENTIAL V (3) (2) (2) (3) (3) • 
 (0)(0). 
 
 (D(iHiH3 N . 
 (oKo). 
 
 END PROGRAM 
 
 Note that, contrary to the usual case, the replication is the second factor. 
 This illustrates one flexible feature of BALANOVA 5. A data matrix could be; 
 
 1 1 1 20 19 
 
 1 
 
 2 
 
 l 
 
 8 
 
 8 
 
 1 
 
 3 
 
 l 
 
 k 
 
 k 
 
 1 
 
 h 
 
 2 
 
 -3 
 
 6 
 
 1 
 
 5 
 
 2 
 
 1+ 
 
 10 
 
 1 
 
 6 
 
 2 
 
 2 
 
 3 
 
 1 
 
 7 
 
 3 
 
 k 
 
 h 
 
 1 
 
 8 
 
 3 
 
 6 
 
 2 
 
 1 
 
 9 
 
 3 
 
 8 
 
 k 
 
 2 
 
 10 
 
 1 
 
 2 
 
 7 
 
 2 
 
 11 
 
 1 
 
 -k 
 
 8 
 
-11- BALANOVA 5 
 
 12 1 25 2 
 
 2 
 
 13 
 
 2 
 
 126 
 
 15 
 
 2 
 
 lU 
 
 2 
 
 2 
 
 20 
 
 2 
 
 15 
 
 2 
 
 3 
 
 3 
 
 2 
 
 16 
 
 3 
 
 1+ 
 
 k 
 
 2 
 
 17 
 
 3 
 
 5 
 
 -1 
 
 2 
 
 18 
 
 3 
 
 3 
 
 2 
 
 Note that the first column is the A level, the third column is the C level 
 and the second column is the replication level, which in Class A designs can 
 be anything. The last two columns are the dependent variables. Each row 
 of the data matrix would be punched on one or more cards. A possible format 
 would be (3F5.0,3X,2F6.0) . 
 
 The order of the rows is immaterial. They could be in any order and 
 have been written in a systematic order only for convenience. 
 
 Class B design - repeated measures 
 
 The data in Winer, Table 7-2-3 could be analyzed with the following pro- 
 gram, using three factors and one dependent variable. 
 
 BALANOVA (SEQUENTIAL l) (3) (l) (2) (k) (3) • 
 (0)(0). 
 
 (oVo). 
 
 END PROGRAM 
 
 Data Card 
 
 s : 
 
 
 
 1 
 
 1 
 
 l 
 
 
 
 1 
 
 2 
 
 l 
 
 
 
 1 
 
 3 
 
 l 
 
 5 
 
 1 
 
 k 
 
 l 
 
 3 
 
 1 
 
 1 
 
 2 
 
 3 
 
 1 
 
 2 
 
 2 
 
 1 
 
 1 
 
 3 
 
 2 
 
 5 
 
 1 
 
 k 
 
 2 
 
 ^ 
 
 etc. 
 
 
 
 
 d 
 
 1 
 
 5 
 
 5 
 
 2 
 
 o 
 
 5 
 
 U 
 
 2 
 
 3 
 
 5 
 
 6 
 
 2 
 
 h 
 
 5 
 
 6 
 
 2 
 
 1 
 
 6 
 
 7 
 
 2 
 
 2 
 
 6 
 
 5 
 
 2 
 
 3 
 
 6 
 
 8 
 
 2 
 
 k 
 
 6 
 
 9 
 
 Again, the rows could be any order. The subjects in the second level of factor 
 A could be assigned level numbers 1, 2, 3 or any other thre distinct numbers. 
 A possible format for this matrix is (2F5 .0,F6 .0,1X,F7.0) . 
 
 Another repeated measure design is Winer, Table J.k-3- Here we have 
 four factors and one dependent variable. 
 
-12- BALANOVA 5 
 
 BALANOVA( SEQUENTIAL l") (If) (l) (2) (2) (3) (k) 
 (0)(0). 
 
 (o)(o). 
 
 (1)(1)(1)(2). 
 (0)(0). 
 
 END PROGRAM 
 
 Data Card 
 
 s : 
 
 
 
 
 1 
 
 1 
 
 1 
 
 1 
 
 18 
 
 1 
 
 1 
 
 1 
 
 2 
 
 llf 
 
 1 
 
 1 
 
 1 
 
 3 
 
 12 
 
 1 
 
 1 
 
 1 
 
 k 
 
 6 
 
 1 
 
 1 
 
 2 
 
 1 
 
 19 
 
 1 
 
 1 
 
 2 
 
 2 
 
 12 
 
 1 
 
 1 
 
 2 
 
 3 
 
 8 
 
 1 
 
 1 
 
 2 
 
 h 
 
 If 
 
 etc. 
 
 
 
 
 
 1 
 
 2 
 
 6 
 
 l 
 
 18 
 
 1 
 
 2 
 
 6 
 
 2 
 
 10 
 
 1 
 
 2 
 
 6 
 
 3 
 
 5 
 
 1 
 
 2 
 
 6 
 
 If 
 
 1 
 
 2 
 
 1 
 
 7 
 
 1 
 
 16 
 
 o 
 
 1 
 
 7 
 
 c 
 
 10 
 
 2 
 
 1 
 
 7 
 
 3 
 
 8 
 
 2 
 
 1 
 
 7 
 
 if 
 
 If 
 
 etc. 
 
 
 
 
 
 2 
 
 2 
 
 12 
 
 1 
 
 16 
 
 2 
 
 2 
 
 12 
 
 2 
 
 12 
 
 2 
 
 2 
 
 12 
 
 3 
 
 8 
 
 2 
 
 2 
 
 12 
 
 If 
 
 8 
 
 The assignment of levels to factors A, B, D (columns 1, 2, h) has to be as 
 shown above, but the subjects numbers can be changed provided that, for each 
 (A,B) cell, no two subjects have the same number. A possible format is 
 (2F5-0, 1X,2F5-0,F12.0) . 
 
 Class B design - hierarchical 
 
 Consider the following design with three factors and one dependent vari- 
 able. 
 
 BALANOVA ( SEQUENTIAL! ) (3^ (D (2) (2) (If) . 
 
 (o)(o). 
 (o)(o)(i). 
 
 (1)(1)(1)(2). 
 
 END PROGRAM 
 
 This is a hospitals (factor 2) within drugs (factor 1) design, illustrated 
 below, with unequal cell size. 
 
.13- BALANOVA 5 
 
 Cards : 
 
 
 
 1 1 
 
 1 
 
 5-0 
 
 1 1 
 
 2 
 
 k.2 
 
 1 2 
 
 1 
 
 5-6 
 
 1 2 
 
 £1 
 
 3-2 
 
 1 2 
 
 3 
 
 1+.6 
 
 2 1 
 
 1 
 
 5-3 
 
 2 1 
 
 2 
 
 8.2 
 
 2 1 
 
 3 
 
 k.3 
 
 2 1 
 
 U 
 
 6.3 
 
 2 2 
 
 1 
 
 5-7 
 
 2 2 
 
 2 
 
 6.8 
 
 Note that the hospitals are numbered 1, 2 in each level of factor 1 even though 
 there are four different hospitals involved. This is necessary since factor 2 
 is a non-replication factor - see the rules about Data Cards in Section 3-1- 
 There are 2 patients in hospital 1 for drug 1, 3 patients in hospital 2 for 
 drug 1, k patients in hospital 1 for drug 2 and 2 patients in hospital 2 for 
 drug 2. The patient numbering is flexible - it could be a different number 
 for every patient, regardless of hospital. A possible format is (3F5 -0,F7.l) . 
 
 Class C design 
 
 The example in Winer, Table h.3-1, could be set up as follows, with two factors 
 and one dependent variable. 
 
 BALANOVA (SEQUENTIAL 1) (?.) (l) (5) (k) . 
 
 (D(0). 
 
 (0)(0). 
 
 END PROGRAM 
 
 Data Cai 
 
 ■ds : 
 
 
 1 
 
 1 
 
 30 
 
 1 
 
 3 
 
 16 
 
 2 
 
 1 
 
 Ik 
 
 2 
 
 o 
 
 18 
 
 2. 
 
 3 
 
 10 
 
 r- : 
 
 k 
 
 22 
 
 1 
 
 2 
 
 28 
 
 3 
 
 1 
 
 2k 
 
 
 2 
 
 20 
 
 3 
 
 _ 
 
 3 
 
 18 
 
 3 
 
 k 
 
 30 
 
 k 
 
 1 
 
 38 
 
 1 
 
 k- 
 
 3^ 
 
 k 
 
 2 
 
 3^ 
 
 k 
 
 3 
 
 20 
 
 k 
 
 k 
 
 kk 
 
 5 
 
 k 
 
 30 
 
 5 
 
 3 
 
 Ik 
 
 5 
 
 2 
 
 28 
 
 5 
 
 1 
 
 26 
 
-Ill- BALANOVA 5 
 
 The rows have been written in a non-systematic order to emphasis that, 
 without exception , in BALANOVA 5 the data rows can be in any order. A 
 possible format is (2F5 -0,F10.0) . 
 
■15- 
 
 BALANOVA 5 
 
 Chapter h. Program Details 
 
 U.l Method and program flow 
 
 The program follows the procedures in Scheffe (1959) , Chapter 8. Scheffe's 
 discussion will not be repeated here, but only a general description of the 
 program flow will be given. The names of the subroutines used are indicated 
 in case reference is made to the program listing. Many of the minor steps 
 and subroutines are not described. 
 
 1. Main Program 
 
 2. 
 
 Calls subroutines and routes your data through the program. 
 Design input and check (iNPUTD) 
 
 The Factor Specification Cards are read and checked for errors. Many 
 of the error conditions mentioned in Section k.3 are checked in INPUTD. 
 The design is transformed into the symbolic notation of live, dead and 
 absent subscripts as in Scheffe. 
 
 3- Derivation of all legal sources (LEGALS,NEWS) 
 
 All possible interactions are generated but only one interaction with 
 a given set of subscripts is retained. The procedure is identical to Scheffe, 
 p. 277- para. 1. The program now has a list of all legal sources (including 
 the original factors'! . 
 
 k. Expected mean squares (AUXIL,EMS) 
 
 The expected mean squares for each source are, of course, not computable 
 numbers, bu!, rather symbolic expressions. (cf. last column, Table 8.2.2, 
 Scheffe). The program generates and prints these expression in a form very 
 close to the normal printed form. The method is from Scheffe, pp. 28U-8. 
 
 5- Denominator for each source (FINDEN) 
 
 By the standard procedures, using the expected mean squares, the program 
 determines the correct denominator (if any) for each source. 
 
 6. Sorting of sources for summary table (SORT) 
 
 The sources are sorted in a convenient order, combining all sources 
 with the same denominator. This order is then used in printing the summary 
 table . 
 
 7 Input of data (INPUTX) 
 
 The input data is read from the input device (the first parameter on the 
 main program card' . The grand means for each dependent variable are computed, 
 ignoring missing ( blank ^ data. 
 
-l6- BALANOVA 5 
 
 8. Storage of data for one dependent variable (READX) 
 
 This routine as well as all the remaining ones are executed in cycle 
 once for each dependent variable . READX stored the data in core and checks 
 that no data is missing in the design. The data is actually stored as 
 deviations from the grand mean. This is done to improve accuracy. See 
 Section k.2. 
 
 9. Check of replication numbers (CELLN) 
 
 In the case of Class A and B designs, BALANOVA 5 checks whether the 
 call frequencies are equal, proportional or non-proportional. 
 
 10. Computation of sum of squares (SSEQU, SSPR0P,XMEAN) 
 
 The marginal means and sums of squares for each legal source are 
 calculated. 
 
 11. Computation and printing of final summary table (FISHER, FPRINT) 
 These calculations are made in the standard way. 
 
 k.2 Some comments on accuracy of computation 
 
 An attempt was made in the design of this program to eliminate the 
 largest sources of computational inaccuracies that can occur in analysis of 
 variance calculations. 
 
 Consider a one-way analysis with the following data: 
 
 Group 1 Group 2 Gro up 3 
 
 8.88 
 8.90 
 8.90 
 8.91 
 8.91 
 
 8.90 9-00 
 
 Sums of squares computations are generally made as the sums and 
 differences of two or more terms. In this example, the exact calculations 
 would be 
 
 SS between = 1188.2500 - 1188.1500 
 = 0.1000 
 
 SS within = ll88.255 l + - 1188.2500 
 = 0.005*+ 
 
 Note that these answers each have a string of zeros following the given digits 
 since they are exact. However on a computer, with about 8 digit accuracy, the 
 differences would only be accurate to about four decimals due to the cancellation 
 of all the higher order digits by subtraction. 
 
 
 
 8 
 
 77 
 
 
 
 8 
 
 79 
 
 
 
 8 
 
 80 
 
 
 
 
 82 
 
 
 ans 
 
 8 
 
 82 
 
 Me 
 
 8 
 
 80 
 
 8.96 
 
 8.99 
 
 9-00 
 
 9-02 
 
 9-03 
 
-17- 
 
 BALANOVA 5 
 
 This is illustrated by a calculation using the previous analysis of 
 variance program in SOUPAC. SOUPAC ' s answers were: 
 
 SS between = 0.10000610 
 SS within = 0.0053863525 
 
 (5 significant digits) 
 (2 significant digits) 
 
 Note the large errors in these SS. Even worse errors can occur in other data. 
 
 The whole problem could be avoided by accumulating a true sum of squares, 
 that is, by adding positive numbers to form each SS rather than taking a 
 difference of two large numbers. However this procedure was rejected because 
 it is extremely slow. 
 
 The following procedure is used in BALANOVA 5 and it is very effective. 
 The data are internally transformed to deviations from the grand mean . This 
 is why the grand means are computed in subroutine INPUTX before the deviations 
 are actually stored in the memory in subroutine READX. When the deviations 
 are used the individual terms which are added and subtracted to give each 
 SS are now numbers of approximately the same size as the SS itself. This 
 means that the number of significant digits in the SS is large even if the 
 grand mean is large. In the example given above, the deviation scores are: 
 
 Group 1 
 
 -•13 
 -.11 
 -.10 
 -.08 
 -.08 
 
 Means 
 
 .10 
 
 Group 2 
 
 -.02 
 
 .00 
 
 .00 
 
 + .01 
 
 + .01 
 
 .00 
 
 Group 3 
 
 + 
 
 .06 
 
 + 
 
 .09 
 
 + 
 
 .10 
 
 + 
 
 .12 
 
 + 
 
 .12 
 
 + .10 
 
 and the SS are computed as 
 
 SS between = 0.10000000 - 0.00000000 
 = 0.10000000 
 
 SS within = 0. 105^0000 - 0.10000000 
 = 0.005^0000 
 
 The actual results produced by BALANOVA 5 were 
 
 SS between = 0.099999998 (8 significant digits) 
 
 SS within = 0.0053999992 (7 significant digits) 
 
 Note the great improvement in accuracy. 
 
 As a final feature of BALANOVA 5, the approximate number of significant 
 digits in each SS is calculated and printed alongside each SS. These numbers 
 should not be interpreted exactly but only as a warning when they are small. 
 The approximate number of significant digits is calculated in the following way: 
 
-18- BALANOVA 5 
 
 (a) Find the largest term, in absolute value, entering into the 
 calculation of the SS. In the example above, the largest 
 term in SS within is the first term (0.105U0000) . 
 
 (b) Take the ratio of this largest term to the SS itself. In the 
 example, this ratio = 0.105^0000/0. 005^0000. 
 
 (c) The approximate number of significant digits is then = 8.0-log,Q 
 (ratio). In the example, this is 8.0-1. 38 = 6.62 which is 
 printed by BALANOVA 5 as 7j a pretty good estimate. 
 
 Note that the number of significant digits printed by BALANOVA 5 reflects 
 the loss of accuracy in the computation of the SS from two or more terms. 
 It does not reflect loss of accuracy due to computation of the terms them- 
 selves. 
 
 U.3 Error conditions 
 
 BALANOVA 5 makes a detailed check to insure that the design is legal, 
 that none of the computer storage arrays are exceeded and that all the data 
 corresponds to cells within the specified design. The following general 
 types of errors are distinguished and corresponding error messages are 
 printed giving detailed instructions about how to correct the error. 
 
 1. One of the restrictions on program size has been exceeded. These 
 restrictions are: 
 
 (a) maximum number of factors = 10 
 
 (b N ; maximum number of legal sources = 100 
 
 (c) maximum size of X- storage array (used for data, means and 
 cell numbers) = 10,000 
 
 (d) maximum number of dependent variables = 200 
 
 (e) maximum number of sigma-squared terms in any one expected 
 mean square = 10 
 
 2. The factor specification cards are incorrect or inconsistent. 
 This is, the design is illegal. The checks made are: 
 
 (a) all nested factors must be listed as a factor. 
 
 (b) no factor may be nested within itself. 
 
 (c) at most one factor can be the replication factor. 
 Furthermore, the replication factor must be nested 
 in at least one other factor and no factor can be 
 nested in the replication factor. 
 
-19- BALANOVA 5 -- 
 
 (d) the factor type must be fixed or random. 
 
 (e) the maximum number of levels for each factor must be 
 more than one . 
 
 (f) there must be at least one denominator term in the 
 analysis of variance summary table. If this is not 
 
 the case it is probably due to no factor being designated 
 as a random factor. 
 
 3. A Data Card has a level set which exceeds the limits stated in 
 the maximum number of levels on the Factor Specification Cards. 
 
 k. Once the data for a dependent variable has been read in, a 
 detailed check is made to insure that all cells in the design are filled. 
 If one cell is not, the calculation for that design is deleted and the 
 program moves on to the next dependent variable after printing sufficient 
 information for the user to locate the missing datum. An additional check 
 for Class B and C designs is made to insure that data for a given subscript 
 set is not read in twice. If two data cards specify the same level set, 
 a comment is made to this effect and the calculations for the dependent 
 variable are deleted. Note that both the checks mentioned in this paragraph 
 are made independently for each dependent variable and are made after the 
 missing data (blank fields) for that dependent variable have been deleted. 
 Errors referred to in this paragraph are not fatal and the program proceeds 
 to the next dependent variable. 
 
 5. About one dozen other checks are made. They should always be 
 passed satisfactorily since the design is first checked as above. These 
 additional checks were inserted to assist in debugging the program and 
 if one of them fails it indicates a remaining error in the program. A 
 printed message is made to this effect in these cases. 
 
 k .k Program Checkout 
 
 BALANOVA 5 has been checked on a large number of designs. Among these, 
 the following calculations were reproduced by BALANOVA 5 : 
 
 1. Lindquist, p. 266, Class A. 
 
 2. Winer, Table 7-8-3 (P- 376^, both least-squares and unweighted 
 means, Class B. 
 
 k.5 Number of levels of the replication factor 
 
 The rules in Section 1.5 and Chapter 3 are strict in the sense that, 
 if they are followed, BALANOVA 5 will execute correctly. However the rules 
 may be relaxed or ignored in the case of the number of levels of the repli- 
 cation factor and it is sometimes convenient to do so. 
 
 In Class A designs, any number of levels of the replication factor may 
 be punched on the Factor Specification Card provided the number is > 2. This 
 
-20- BALANOVA 5 
 
 is so because in Class A designs only cell means are stored and the program 
 does not check the replication number anyway. This rule relaxation is use- 
 ful when it is inconvenient for the user to calculate ahead of time how many 
 subjects are in each cell. 
 
 In Class B designs, any number of levels of the replication factor may 
 be punched on the Factor Specification Card provided 
 
 (al the number is > the maximum number of replications 
 in any one nest, and 
 
 (b) the number is not so large that the restriction on 
 the size of the X matrix (10,000) is exceeded. 
 
 Again this rule relaxation saves the user from having to know the maximum 
 number of replications before using BALAW0VA 5, provided he knows an upper 
 limit. The restriction on the size of the X matrix will not often be ex- 
 ceeded. However, the user has not been informed, in this manual, how to 
 estimate the size of the X matrix needed, since this limit is complicated 
 to specify. 
 
 h.6 Acknowledgements 
 
 The general theory of balanced analysis of variance designs was first 
 introduced to the writer by Professor R. W. Frankmann. The procedures used 
 in this program are very similar to the approach he taught. 
 
 The author is indebted to John Paraskevopoulos who helped greatly in 
 the revision of this manual. 
 
 U.7 References 
 
 Hays, W. L. : Statistics for Psychologists . New York: Holt, 
 Rinehart and Winston, 1963- 
 
 Lindquist, E. F. : Design and Analysis of Experiments in Psychology 
 and Education . Boston: Houghton Mifflin, 1953. 
 
 Scheffe, H. A.: The Analysis of Variance . New York : Wiley, 1959- 
 
 Winer, B. J. : Statistical Principles in Experimental Design . New 
 York: McGraw-Hill, 19b 2. " 
 
~ 21 ~ BALANOVA 5 
 
 Appendix. Key to Designs in Winer and Lindquist 
 OMITTED TEMPORARILY 
 
CLASSIFICATION 
 
 I . General Description 
 
 The CLASSIFICATION program is designed to measure individuals against 
 previously determined groups in order to determine probable group member- 
 ship. The classification is done in a reduced test space derived from 
 discriminant analysis. The method of classification is based on the 
 premise that a. group is totally described by its mean (or centroid) and 
 dispersion; the individual's relation to each group is determined by a 
 X 2 which indicates how many members of the group are farther from the 
 centroid than he, and a Bayesian probability of membership in the group 
 based on this X 2 . For each individual, the X 2 and probability for each 
 group are given; the user then applies a decision rule of his choice 
 for assigning individuals to groups. 
 
 Since analysis is to be performed in a reduced space, the means and 
 dispersion of each group must also be reduced to this space. The CLASSI- 
 FICATION program performs these reductions. 
 
 The calculation of probabilities requires the specification of the 
 number of members in each group against which the individual is being 
 compared. This may be the numbers actually in the groups used for 
 finding experimental means and standard deviations or the number of 
 individuals from the total group being tested who are to be assigned 
 to each group. These numbers are specified as subparamters . 
 
 In every case the input is expected in the form in which it is 
 output by the DISCRIMINANT ANALYSIS program. Input is limited to 16 
 groups and 50 variables. 
 
 II . Parameters 
 
 The program name CLASSIFICATION is followed by these parameters on 
 the program call card: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address of discriminant vectors. 
 The vectors are expected as columns. 
 CARDS or SEQUENTIAL 1-15- 
 
 2 Input Address of means. The means for a 
 given group are expected as a row. CARDS 
 or SEQUENTIAL 1-15- 
 
 3 Input Address of dispersion matrices. The 
 dispersion matrices are expected in a 
 vertically augmented form. CARDS or 
 SEQUENTIAL 1-15- 
 
 k Input Address of individual scores. CARDS 
 
 or SEQUENTIAL 1-15- 
 
CLASSIFICATION 
 Page 2 
 
 Parameter 
 Number Use or Meaning 
 
 5 Output Address for inverse of dispersion 
 
 matrix. PRINT or SEQUENTIAL 1-15. 
 
 2 
 
 6 Output Address for X and probabilities for 
 
 each subject. SEQUENTIAL 1-15 and/or PRINT. 
 
 (Output on SEQUENTIAL is in the form 
 
 N., X._, X. , Y.,, Y. 
 
 1/ il' in' ll in 
 
 where N. = sequential subject number in groups 
 
 X. . = j probabilities for i subject 
 
 th 2 th . 
 Y. . = j X for i subject) 
 
 7 . Number of groups. 
 
 8 Number of variables. 
 
 9 Input Address of group sample sizes. 
 
 CARDS or SEQUENTIAL 1-15- * 
 
 * Must be a row vector (l observation) with the number of discriminant 
 vectors as the first variable. The form output by Discriminant Analysis 
 Program may be used. If provided independent of Discriminant Analysis, 
 be sure to use single precision integer mode ( I -format, fixed mode). 
 If sample sizes and any other input are from cards, sample sizes are 
 given first. 
 
DISCRIMINANT ANALYSIS 
 Page 1 
 
 DISCRIMINANT ANALYSIS 
 
 I. GENERAL DESCRIPTION 
 
 Suppose that we have k populations (groups) and p measures (variables) 
 on each member of each population. We want to test the hypothesis that our 
 groups are significantly different on the entire set of variables. This 
 one-way multivariate analysis of variance hypothesis is tested by this 
 program. The program then locates the dimensions (discriminant functions) 
 along which the group differences are maximum. Thus, we need some 
 function to transform the p variates into a smaller set of independent 
 measures which will indicate the differences between the groups. The 
 DISCRIMINANT ANALYSIS program finds the independent linear functions of 
 the variables which maximally discriminate between the populations (groups 
 input 1. The results from this program, namely the discriminant functions, 
 may be used in the CLASSIFICATION program to determine the probability that 
 any subject belongs in any group. Also, by looking at the coefficients of 
 the functions, we can determine to what extent each of the p variates 
 contributes to each function. In order to do this we need to determine the 
 coefficients of the functions such that the ratio of variances between 
 groups to the variances within groups is maximized, i.e. the differences 
 between groups are to be large relative to the differences within groups. 
 
 In matrix terms, we are trying to maximize the ratio 
 
 f . ' Af. 
 
 f . 'Wf . 
 1 1 
 
 th -1 
 
 where f . is the eigenvector associated with the i eigenvalue \i of W A, 
 
 A = the covariance matrix between means, 
 
 k 
 
 a. . = Z N (X. - X.) (X. - X.) 
 ij g=1 S ig i Dg 3 
 
 and W = the covariance matrix within classes, 
 
 k N g 
 w. . = £ [ E (X. - X. ) (X. - X. )] 
 13 g=l n=l xgn lg 3gn Jg 
 
 where k = number of groups, N = number of subjects in group g, N = total 
 number of subjects, and i and j run from 1 to p, where p = number of 
 variables. 
 
 To find the maximum, we derive from the partial derivatives of that 
 ratio, the matrix equation 
 
 (W _1 A - XI ) F =0 
 
 where F is the matrix of eigenvectors. The eigenvectors are the coefficients 
 of the discriminant functions. The relative sizes of the eigenvalues 
 indicate the extent to which the associated discriminant functions distinguish 
 
DISCRIMINANT ANALYSIS 
 Page 2 
 
 among the groups. The percentage of the total discriminating power of 
 the variables contained in the j"th discriminant function is represented by 
 
 100 ( ^ 
 
 N (N should be the smaller of 
 
 £ X. k-1 and p) 
 
 i=l 
 
 i 
 
 In addition to obtaining the eigenvalues and discriminating coefficients, 
 the program will compute scaled vectors to show the relative contributions 
 of the variables to the discriminant function by 
 
 f ' = (»..) ^ 2 /f.. 
 id 1X ' ^ 
 
 II. INPUT 
 
 Input to the DISCRIMINANT ANALYSIS program consists of two or more 
 data groups. Each data group consists of a set of observations on 
 two or more variables. All of the groups must contain observations on 
 the same set of variables. The groups may be input as separate card decks 
 (each preceded by a DATA format card and followed by an END# card), as 
 data groups located on separate temporary storage areas, or as a 
 mixture of data groups on card decks and data decks on temporary storage 
 areas. For examples see Section VIII. 
 
 III. SIGNIFICANCE TESTS 
 
 The measure of significance calculated in the DISCRIMINANT ANALYSIS 
 program is a Wilks' lambda (likelihood ratio test statistic). This is 
 a test of the discriminating power of the test battery. It tests the 
 hypothesis that the population centroids (mean vectors) are equal for 
 the k groups. The Wilks' lambda is a function of the roots of W _ l A and 
 is of the following form: 
 
 r 1 
 
 A = n 1-X. 
 
 i-1 
 
 where r is the lesser of k - 1 and p. In matrix terms this criterion is 
 defined in the following manner: 
 
 A = 
 
 W 
 
 where W is the pooled within groups deviation score cross-products and T 
 is the total sample covariance matrix. As |t| increases relative to |w| 
 the ratio decreases in value with an accompanying increase in the confidence 
 that the group centroids are not equal. 
 
DISCRIMINANT ANALYSIS 
 Page 3 
 
 An F ratio which yields an apjroximate test of the significance of 
 the Wilks' lambda is calculated and printed. 
 
 F m { i_^ JL) (a^gx.) 
 
 where s = / , 2 2 4 77, 2 2 . q = k - 1 
 
 (p q -** )/(p + q -5 ) 
 
 m=n- (p+q+ l)/2 n = N - 1 
 
 A = -(pq - 2)/U N = total number of subjects 
 
 r = pq/2 k = number of groups 
 
 1/2 
 y = I\ p = number of variables 
 
 The degrees of freedom to be used with the F value printed in the output 
 are printed and are labeled Fl (degrees of freedom for the numerator) 
 and F2 (degrees of freedom for the denominator) and equal 2r and 
 mx + 2A. 
 
 IV. OUTPUT 
 
 The output consists of the following: 
 
 1. Means of input variables for each group and group sample size 
 (Parameter Number 8) 
 
 2. A dispersion matrix for each group. (Parameter Number 9) 
 
 3. The total sample deviation score cross-products matrix 
 
 N _ _ 
 
 t. . = Z (X. - X. )(x. - X.) 
 
 ij n=1 in 1 jn j 
 
 where i and j range over the variables. This matrix is the sum 
 of the A and W matrix described in section I. This is the T matrix 
 referred to in Parameter Number 7 • The diagonal of this matrix 
 contains the sums of squares. (Parameter Number 6) 
 
 k. The pooled within-groups deviation scores cross-products matrix which 
 is labeled W on the output. (Parameter Number 6) 
 
 5. The total number of subjects in all the groups combined. (Parameter 
 Number 6 ) 
 
 6. The means and standard deviations of the variables across all 
 groups. (Parameter Number 6) 
 
 7. The correlation matrix of variables over all groups. (Parameter 
 Number 6 ) 
 
DISCRIMINANT ANALYSIS 
 Page k 
 
 8. The among groups cross-products of deviations of group from grand 
 means weighted by group sizes. This matrix is labeled A matrix on 
 the output. (Parameter Number 6) 
 
 9. The eigenvalues for the W A matrix. (Printed) 
 
 10. The eigenvalues and percentage of variance explained by each 
 additional eigenvalue. (Printed on output, automatically) 
 
 11. The trace of the W A matrix. This is the sum of the eigenvalues. 
 (Printed on output, automatically) 
 
 12. The discriminant functions (f^*). The number of discriminant 
 function will equal r where r is the lesser of the two values 
 k-1 and p , where k = the number of groups and p = number of 
 variables. (Parameter Number l) 
 
 13- The group means on the discriminant functions. This is a 
 
 k x r matrix formed by multiplying the group means on variables 
 and the discriminant functions. The matrix may be used to determine 
 the relative positions of the groups on the derived function. 
 (Parameter Number 11 ) 
 
 lU. The scaled vectors. These vectors are formed by multiplying the 
 discriminant functions by the square roots of the diagonal of 
 the W matrix described above. The scaled vectors show the 
 relative contributions of the input variables to each of the 
 discriminant functions. (Parameter Number 5) 
 
 15. The measures of significance described in Section III. (Parameter 
 Number k ) 
 
 V. RESTRICTIONS 
 
 The DISCRIMINANT ANALYSIS program follow the program name on the main 
 program card. Each parameter must be enclosed in parentheses. The parameters 
 must appear in the order given below. If a parameter is not needed, do not 
 punch anything between its parentheses. All parentheses after the last 
 non-empty pair may be omitted. 
 
 Parameter 
 Number Use or Meaning 
 
 1 fi Output address of discriminant functions 
 
 (Matrix f. .). SEQUENTIAL 1 -15 and/or 
 PRINT. 1J (Needed for CLASSIFICATION). 
 
 2. Number of variables 
 
 3 Number of groups 
 
 h 1 if desire significance measures printed. 
 
DISCRIMINANT ANALYSIS 
 Page 5 
 
 Parameter 
 Number 
 
 6 
 
 7 
 
 Use or Meaning 
 
 1 if desire scaled discriminant vectors 
 printed. 
 
 1 if desire intermediate results printed. 
 
 1 if input is W and T matrices instead 
 of raw data. 
 
 Output Address of group means on original 
 variable and sample size (printed only).* 
 SEQUENTIAL 1-35 and/or PRINT. (See Parameter 
 12) (Needed for CLASSIFICATION). 
 
 Q Output Address of group dispersion matrices 
 of original variables.* SEQUENTIAL 1-15 
 and/or PRINT. (Needed for CLASSIFICATION). 
 
 10 
 
 N - total number of subjects in all groups 
 combined. This parameter is left blank if 
 raw data is input rather than W and T 
 matrices . 
 
 11 
 
 Q Output Address of group means on discriminant 
 functions. SEQUENTIAL 1-15 and/or PRINT. 
 
 12 
 
 Output Address of group sample sizes. Possibly 
 needed for CLASSIFICATION. 
 
 Input addresses of raw data input groups or W and T matrices 
 are listed on a $INPUT card. The number of raw data groups is limited 
 to l6. W precedes T in the sequence when these are input. W and T 
 may be output using a $0UTPUT card. Output order is the same as 
 input . 
 
 * If W and T are input instead of raw data, group means and dispersion 
 matrices are not printed. Means and dispersion matrices on discriminant 
 functions are not computed in this case. 
 
 Q It is possible to print in F format and/or punch the output from 
 
 these parameters. If you need either of these options, see the section in 
 
 the INTRODUCTION on INPUT and OUTPUT. 
 
 VII. SPECIAL COMMENTS 
 
 1. This program does not check for missing data, 
 are read as zeros . 
 
 All blank spaces 
 
 2. The user is cautioned against using the DISCRIMINANT ANALYSIS 
 program without an understanding of the statistical technique 
 used. See the references (Section IX). 
 
DISCRIMINAM 1 ANALYSIS 
 Page 6 
 
 . EXAMPLES 
 
 
 A 
 
 B 
 
 /*ID 
 
 /•ID 
 
 // EXEC SOUP 
 
 // EXEC SOUP 
 
 //SYSIN DD * 
 
 //SYSIN DD * 
 
 MATRIX . 
 
 DIS(S1/P)(33)(M( 
 
 MOV(C)(S5). 
 
 $INPUT(C)(C). 
 
 END P 
 
 END SOUP 
 
 DIS(S1)(U0)(U)(1)()()()(32/P). 
 
 DATA(33)(33F2.0) 
 
 $INP(C)(C)(S5)(C). 
 
 , 
 
 END SOUPAC 
 
 (Data for W) 
 
 DATA ( 1+0 )(1+0F2.0) 
 
 END# 
 
 
 DATA(33)(33F2.0) 
 
 
 (Data for T) 
 
 END# 
 /* 
 
 END# 
 /* 
 
 In example A , four groups of data are being input with the first 
 two groups coming from cards, the third from temporary storage on S5 
 and the fourth from cards. Discriminant functions are stored on 
 SI and printed and group means are stored on S2 and printed. 
 Significance measures are calculated for 33 variables input. In example 
 B, W and T matrices are input and much the same results are obtained as 
 for example A. Group means cannot be calculated. 
 /*ID (accounting information) 
 // EXEC SOUPAC 
 //SYSIN DD * 
 
 DIS(Sl)(l5)(2)()()()()(S2)(S3)()()(Sl4). 
 $INP(C)(C). 
 
 CLA(S1)(S2)(S3)(C)(P)(P)(2)(15)(D(SM. 
 
 END S 
 
 DATA(15) 
 
 END# 
 /* 
 
 This illustrates the use of the DISCRIMINANT and CLASSIFICATION programs. 
 The DISCRIMINANT program will save discriminant functions on SI; it 
 will operate on 15 variables for each of two groups. It will output 
 group means on S2 and groups dispersion matrices and store group sample 
 sizes on Sk . 
 
 CLASSIFICATION in turn will read discriminant functions, group means, 
 and group dispersion matrices from SI, S2, and S3 respectively. It 
 will read the group to be classified from cards and print inverse of 
 dispersion matrices and x and probability. It will expect two 
 sets of group means and dispersion matrices from 15 variables but 
 1 discriminant function (See Section IV). Original group sample 
 sizes are read from S^. 
 
DISCRIMINANT ANALYSIS 
 Page 7 
 
 IX. REFERENCES 
 
 Bryan, J. F. , The Generalized Discriminant Function: Mathematical Foundation 
 and Computational Routine. Harvard Educational Review , (1951 ) 
 21:90-95. 
 
 Cooley, W. W. and Lohnes , P. R. , Multivariate Procedures for the 
 Behavioral Sciences . New York, Wiley, 1962. 
 
 Kendall, M. G. , A Course in Multivariate Analysis . New York, Hafner, 
 1961, pp. 106-108. 
 
 References on output: 
 
 Wilks ' Lambda - first formula, page 119 , Cooley and Lohnes. 
 F ratio - formula k.l, page 62. Cooley and Lohnes. 
 
• ^-TEST 
 
 General Description 
 
 The T-TEST program calculates a T coefficient or F ratio as described 
 tielow: 
 
 Suboperation (l) : Paired T-Test (also called correlated T-Test ) . 
 Variables are in a row, variable on is paired with variable two, three with 
 four, etc., and a paired T coefficient is calculated for each pair as 
 follows : 
 
 t = d/s_ 
 d 
 N 
 d = E [Xa - X3]/N. , where a and 3 are the j pair of variables 
 
 S_ = N. E [Xa - X6] 2 - ( E [Xa - X3]) 2 /N. 2 .f] 
 d J i=l i=l J 
 
 where f = degrees of freedom = N. or N. -1 as desired and N. is the sample 
 
 J J J 
 
 size for the j pair of variables. 
 
 Suboperation (2): Paired T-Test for all possible combinations of 
 variables computed as in Suboperation (l). 
 
 Suboperation (3) : Test of differences from a known population mean. 
 A population mean must be provided for each column of data or the 
 mean will be set to zero. Population means should be provided as a 
 row vector. The following are calculated and printed for each vari- 
 able : 
 
 t value: t = (X -jj )/S_ 
 
 x 
 where p = parameterized value or zero 
 
 _ N - 
 Mean: X = E 1 X./N. 
 
 1=1 X X ? 2 
 
 N.EX7 - (EX. ) 
 
 Standard Deviation: S.D. =( 
 
 l 
 
 N.x(d.f . ). 
 l l 
 
 NOTE: N-l is the usual degrees of freedom, but 1 may be specified. 
 
 S.D. 
 Standard Error of Mean: S_ = -— — — - 
 
 x /~ N 
 
 Suboperation (h) : Test of difference^ from a known population mean for 
 previously analyzed data: the mean (X), standard deviation (S. D.), and 
 sample size (N) as well as the population mean [see suboperation (3)] are 
 
T-TEST 
 Page 2 
 
 read in and the program computes the standard error of the mean and T 
 for each trio of the X, S.D., and N read in that order. If R trios of 
 data are used (R observations on rows occur" 1 then R population means 
 should be given. Calculations are the same as in Suboperation (3). 
 
 NOTE: A column of data of Suboperation (3) is reduced to a three item 
 row here. 
 
 Suboperation (5) : Test of differences between two or more group means 
 taken pairwise: each group is located on a separate storage location 
 or set of cards. The following are calculated: 
 
 t value: t = (X. - X.)/S 
 
 1 J - 
 
 Xi - x. 
 
 Mean: X. £ X j/ N i 
 
 2 N 2 
 
 N. T ± x( " (^ X i) 
 
 Standard Deviation: S.D. 
 
 _i=l__^l/2 
 
 N.x(d.f. ). 
 
 N 2 N N. N, 
 
 b ± z x - (z T~) N d £ J x d - (z 3 yr) 
 
 Pooled Estimate of Variance: &~ = [ — — — — + — ^ — i^](d.f. 
 
 i J 
 
 S 2 (N + N ) / 
 
 Estimate of Standard Error: S_ = [ t-=- <L.pf 
 
 xi - x. I d 
 
 Suboperation (6) : One-way analysis of variance. The data to be compared 
 are located on different storage units or sets of cards. Calculations are 
 made as follows for each variable : 
 
 S = number of storage units = number of subroups 
 
 N- = number of observations in i subgroup, i=l, ■, S, and for 
 
 each i, j=l, , Nj_ 
 
 X. . = element in the j row of the i subgroup 
 
 S 
 N = £ N. = total observations 
 i=l 
 
 S N i S Ni ? 
 
 Total SS = £ (£ X. . ) - (z £ X ±i ) /N 
 i=l j=l J i=l J=l 
 
 Ni 
 
 £ X.j. = sum of each constant i (i.e. over the subgroup) 
 
 3=1 
 
T-TEST 
 Page 3 
 
 S N S K , 
 
 Between SS =[ E E X. . ]/N. - (E E X.J 
 
 i=i 3=1 J x i=i 3=1 1J 
 
 Within SS = Total SS - Between SS 
 
 Within D.F. = Total D. F. - Between D. F. 
 
 F = 
 
 BSS 
 BDF 
 
 WSS 
 WDF 
 
 Suboperation (7) : One way analysis of covariance. The experimental 
 
 (dependent) variable comes first followed "by up to kg covariates . The 
 
 dependent variables is adjusted to the set of covariates, not iteratively 
 
 to one covariate at a time. Subgroups or factor levels are handled as 
 
 in analysis of variance, Suboperation (6), i.e. as separate input decks 
 
 or temporary storage locations. Coefficients obtained are means and standard 
 
 deviations, test of homogenity of regression, F-ratio for covariance, 
 
 and adjustment coefficients. For further discussion see Winer, p. 578 ff. 
 
 References 
 
 Bryant, E. C. , Statistical Analysis . New York: McGraw-Hill, i960. 
 
 Snedecor, G. W. , Statistical Methods . Ames: Iowa State College 
 Press, 1957. 
 
 Winer, B. J., Statistical Principles in Experimental Design . 
 New York: McGraw-Hill, 1962. 
 
 II. Restrictions 
 
 The maximum number of input variables is 150 except for Suboperation 
 (2), all possible paris, and Suboperation (7), covariance, where the 
 maximum is 50. Suboperations (5) and (6), tests of differences and analysis 
 of variance, require the data to be divided into 2 or more subgroups. The 
 maximum number of subgroups is lU. 
 
 III. Parameters 
 
 T-TEST and ANALYSIS OF VARIANCE 
 
 Parameter 
 Number 
 
 1 
 2 
 
 Use or Meaning 
 
 Suboperations 1-7- (See above). 
 
 Number of subgroups (if applicable). 
 
T-TEST 
 Page h 
 
 Parameter 
 Number Use or Meaning 
 
 3 - Count clanks as zeros 
 
 1 - Count blanks as missing data 
 
 k - use N-l as degrees of freedom 
 
 1 - use N as degrees of freedom (See Special 
 Comments ) . 
 
 Input addresses of subgroups for options 5, 6, and 7 are listed on a 
 $-iriP card. (See section on SOUPAC Input /Output ) . The maximum number of 
 subgroups is ik. See example for illustration. If options 1 or 2 are 
 used, provide only 1 input address. For option 3, provide tv6 addresses, 
 the first for the sample being tested, the second for the criterion means. 
 For option h provide two input addresses, the first for the sample being 
 tested, the second for the criterion means, standard deviations and sample 
 size. 
 
 IV. Special Comments 
 
 If it is desired to use the option of comparing the mean with some 
 population mean other than zero as in Suboperations (3) and (k) , a row 
 vector of \i must be included in temporary storage. This row vector must 
 be of length N = number of variables. 
 
 Most work requires N-l degrees of freedom. 
 
 V. Examples 
 
 T-TEST (l). 
 $INP(S1). 
 
 Paired T-Test on data stored on SI. 
 
 T-TEST (2)( )(1)(1). 
 $INP(S15). 
 
 Paired T-Test on data stored on SI 5 doing test on all possible pairs 
 checking for missing data and using N degrees of freedom. 
 
 T-TEST (3) 
 $INP(S1)(S3). 
 
 T-Test of population means on SI against criterion means on S3. 
 
 T-TEST (5) (2). 
 $INP(S1)(S2) 
 
 T-Test of group mean of two groups, one on SI and the other on 
 S2. 
 
T-TEST 
 Page 5 
 
 T-TEST (6) (k) (1). 
 $INP(S1)(C)(C)(S3). 
 
 One-way analysis of variance over four groups, one on SI, two from 
 cards, and one from S3, checking for missing data. 
 
CORRELATIONS AND REGRESSION PACKAGE 
 
BISERIAI CORRELATION 
 
 General Description 
 
 This program calculates the following coefficients for each com- 
 bination of one dichotomous and one continuous variable. 
 
 Case totals : % cases in p = N /N 
 
 % cases in q = N a /W 
 Total cases = N 
 
 Mean: 
 
 X p = IX p /N p 
 
 X q = ^q/ N q 
 X = £X/N 
 
 Standard deviation: 
 
 S p 2 = 2Xp 2 /W p - X p 2 
 
 
 S q 2 = £Xq 2 /N q 
 S 2 = £X 2 /N - X 2 
 
 Biserial r: 
 
 ( x P " x q) pq 
 
 where p = percentage of cases in category 
 q = percentage of cases in 1 category 
 h = height of the normal curve computed from normal tables 
 
 The program checks for missing data, and computes the above measures 
 only for those cases where both dichotomous and continuous variables are 
 present. 
 
 II . Restrictions 
 
 A. Dimension 
 
 The input for the BISERIAL R program is limited to a maximum of 
 20 dichotomous variables and 100 continuous variables. 
 
 B. Type of input 
 
 Data is read row-wise from either tape or cards. All dichotomous 
 variables must be first in each row. They should be coded with 
 and 1 
 
 III . Parameters 
 
 The program call card requires h parameters after the program name, 
 BISERAL R: 
 
BISERIAL CORRELATION 
 Page 2 
 
 Parameter 
 Number ~ Use or Meaning 
 
 1 Input Address. CARDS or SEQUENTIAL 1-15- 
 
 2 Output Address of correlations. SEQUENTIAL 1-15 
 and/or PRINT. 
 
 3 Number of dichotomous variables. 
 
 k Number of continuous variables. 
 
 5 - ignore blanks 
 
 1 - count blanks as zeros 
 
CANONICAL ANALYSIS 
 
 I. GENERAL DESCRIPTION 
 
 A. Mathematics 
 
 The Canonical Correlation program provides a multivariate test of the 
 hypothesis that two sets of normally distributed variables are independent. 
 The larger set of variables (the predictor variables) is considered to 
 have q members, and the smaller set (criterion variables) has p members. 
 This program also linearly transform each set of variables into a new set 
 of independent variables, (or dimensions) such that the first new predictor 
 (a linear combination of the original predictors) has maximum correlation 
 with the first new criterion variable. The second new predictor is 
 maximally correlated with the second new criterion, and so on (with the 
 constraint that each new variable is uncorrelated with the previous new 
 variables derived from the same set of original variables ) . 
 
 Let the criteria set consist of the p_ variables x , ,x and 
 
 the predictor of q_ other variates x + -,,...., x . Assume p<_ q.. We 
 then look for weighting matrices c^g and d alD such that 
 
 K a = 3 °3 X 3 a = 1, 2, ... p 
 
 3 = 1, 2, . . . p 
 
 b ab x b a = 1, 2, 
 
 P 
 
 b = p+1, p+2, . . . p+q. 
 
 The variables 5 and nhave the following properties: 
 
 1. They are standardized variables. 
 
 2. Within each set, the £'s are independent and the x\ ' s are dependent 
 
 3. Within the set E, a (as a runs from 1 through p) and within the set of na 
 (as a_ runs from 1 through p) the correlation is zero. These matrices 
 
 are formed in such a way that E,^ and l-i » 5p and n o ' et °' are maximally 
 correlated giving p correlations l,....,p. 
 
 The purpose of this program is to find the p correlations A ,...,A and 
 
 the weighting matrices c and d 
 
 op ab 
 
 B Procedure 
 
 The correlation matrix R is first partioned into: 
 
 A 
 
 C 
 
 where A = correlation among predictors 
 
 C = correlation between predictors and criteri a 
 B - correlation among criteria 
 
CANONICAL ANALYSIS 
 Page 2 
 
 The matrix equation 
 
 (CA _1 C' - X 2 B) U = 
 
 is solved to find the p correlations X , A , and the criteria weighting 
 
 matrix U whose elements are c a g. The eigenvalues D and eigenvectors H of B 
 are found: CA C is then post-multiplied by HD~l/2 and pre-multiplied by- 
 its transpose to reduce the equation to the standard form (Z - Al)U=0. 
 Then the predictor weighting matrix V whose elements are d is computed by 
 the equation 
 
 V = (A -1 C U) 1/ X 
 
 II. INPUT 
 
 Input to the CANONICAL ANALYSIS program consists of a correlation matrix . 
 These variables include a set of predictor variables and a set of criterion 
 variables. Either set may be first on the input data but there can be no 
 mixing of the two types of variables on the input data. The TRANSFORMATION 
 program may be used to reorder the variables if they are mixed on the 
 card data deck. 
 
 III. SIGNIFICANCE TESTS 
 
 Included in the printed output of the CANONICAL program is a Chi-square 
 value for each of the eigenvalues A 2 computed in the program. The chi-square 
 values printed are determined from the Wilks ' lambda values using the 
 procedure outlined by Bartlett (see Section IX ). The chi-square values 
 provide a test of the null hypothesis that the p_ variates are unrelated to 
 the q variates. If there is at least one way in which a linear combination 
 of the predictor variables is correlated with a linear combination of 
 the criterion variables this Chi-square value will be significant. The 
 second Chi-square may then be examined. This Chi-square is a test of 
 a second relationship after the first relationship has been removed. If 
 this Chi-square is significant a second linear combination of the predictor 
 variables is correlated with a second linear combination of the criterion 
 variables. This process continues until the first non-significant 
 Chi-square is found. All Chi-squares beyond that point will be non- 
 significant. 
 
 IV . OUTPUT 
 
 The output consists of the following: 
 
 1. The matrix of standardized regression coefficients. This is the matrix 
 of coefficients which would be formed if the raw data used to calculate 
 the correlation matrix input had been converted to standard scores. 
 
 The predictor variables are on the rows of the matrix and the criterion 
 variables are on the columns of the matrix. 
 
 2 
 
 2. A multiple correlation squared (R ) for each of the criterion variables. 
 
 The first R^ value is the multiple correlation of the first criterion 
 
CANONICAL ANALYSIS 
 Page 3 
 
 variable with the entire set of predictors variables. The second is 
 for the second criterion variable with the set of predictors, etc. 
 
 2 
 
 3. A set of eigenvalues A, correlations A, Wilks' lambdas, Chi-squares, 
 and degrees of freedom. (See Section III) (Printed) 
 
 k. A matrix of criterion weights. (Parameter Number 3) 
 
 5. A matrix of predictor weights. (Parameter Number k) 
 
 V . RESTRICTIONS 
 
 A. Dimension 
 
 The maximum permissible number of variables in each set is 80. 
 
 B. Special Conditions 
 
 The matrices A and B must both be non-singular 
 
 C. Number of criteria (p) aiust be equal to less than the number of 
 predictors (q). 
 
 VI. PARAMETERS 
 
 The parameters for the CANONICAL ANALYSIS program follow the program 
 name on the main program card. Each parameter must be enclosed in 
 parentheses. The parameters must appear in the order given below. If 
 a parameter is not needed, do not punch anything between its parentheses. 
 All parentheses after the last non-empty pair may be omitted. 
 
 Parameter 
 
 Number Use or Meaning 
 
 1 Input Address (correlation matrix). CARDS 
 
 or SEQUENTIAL 1-5- 
 
 2 Output Address of canonical correlations. 
 
 SEQUENTIAL 1-5 and/or PRINT. 
 
 3 ft Output Address of criterion weighting 
 
 matrix. SEQUENTIAL 1-5 and/or PRINT. 
 
 h Q Output Address of predictor weighting 
 
 matrix. SEQUENTIAL 1-5 and/or PRINT. 
 
 5 Number of predictor variables. 
 
 6 Number of criterion variables. (Must 
 
 be less than or equal to number of 
 predictors ) . 
 
 7 Order of variable sets on input: 
 
 1 if predictors are first 
 
 2 if criteria are first 
 
Parameter 
 Numbers 
 
 9 
 10 
 
 CANONICAL ANALYSIS 
 Page k 
 
 Use or Meaning 
 
 Operation to be performed: 
 
 1 if only canonical correlations 
 
 2 if weights are to be computed 
 
 1 if want regression coefficients printed 
 
 1 if want multiple correlation squared 
 (R ) printed 
 
 It is possible to print in F format and/or punch the output from these 
 parameters. If you need either of these options, see the section in the 
 Introduction on Input and Output. 
 
 VII. SPECIAL COMMENTS 
 
 This program does not check for missing data. All blank spaces are 
 read as zeros. 
 
 VIII. EXAMPLES 
 1A 
 
 /*ID <accounting information> 
 // EXEC SOUP 
 //SOUP.SYSIN DD * 
 
 CANONICAL (CARDS) (SEQUENTIAL l/PRINTi 
 (PRINT) (PRINT) (15) (5) (2) (1). 
 ENDS 
 DATA 
 
 END# 
 
 /* 
 
 IB 
 
 /*ID <accounting information 
 
 // EXEC SOUP 
 
 //SYSIN DD * 
 
 CAN (C) (S1/P)(P)(P)(15)(5)(2)(1). 
 
 ENDS 
 DATA 
 
 END# 
 
 n 
 
 Examples 1A and IB illustrate the use of the program call card for 
 a CANONICAL correlation program. In these examples the input data consists 
 of 15 predictor and 5 criterion variables and will be a card deck. 
 The predictor \rariables are the first 15 variables of each observation 
 in the data deck. The printed output will be the canonical correlations, 
 
CANONICAL ANALYSIS 
 Page 5 
 
 the criterion weighting matrix, and the predictor weighting matrix. The 
 canonical correlation matrix will also he stored on temporary storage 
 SEQUENTIAL 1. 1A & IB perform same calculations. 
 
 IX. REFERENCES 
 
 For a discussion of the uses of canonical analysis, see Kendall, M.G. , 
 A Course in Multivariate Analysis , New York, Hafner, 196l, pp. 68-85 
 or Kendall, M.G. , The Advanced Theory of Statistics , New York, Hafner 
 1951, Vol. II., pp. 3U8-358. 
 
 For the derivation of the method used, see Anderson, T. W. , An 
 
 Introduction to Multivariate Statistics , New York, Wiley, 1958, 
 pp. 288-296. 
 
 For significance test procedure, see Cooley, W. W. and Lohnes , P. R. , 
 Multivariate Procedures for the Behavio ral Sciences, New York, 
 Wiley, 1962, p. 37- 
 
CORRELATION 
 
 I. General Description 
 
 The main purpose of the CORRELATION program is the calculation of 
 Pearson product-moment correlations (hereafter referred to as correlations 
 in this writeup) . A correlation measures the linear dependency between 
 two variables, and this program calculates a correlation for each pair 
 of input variables. The square of a correlation, sometimes called the 
 coefficient of determination, represents the proportional reduction in 
 variance of one variable due to a linear relationship with another. Thus 
 the coefficient of determination measures the strength of a linear relation- 
 ship, or the proportion of variance accounted for by a linear rule. 
 
 The CORRELATION program automatically produces other types of 
 correlation coefficients, because the calculations required are identical. 
 Thus point biserial coefficients of correlation (often preferred to 
 biserial correlation), phi coefficients (alternative to tetrachoric 
 coefficients), and Spearman's rank order correlations can be readily 
 obtained. Thus, if the input consists of dichotomous variables, the 
 output will contain a mixture of phi's, point biserials, and ordinary 
 correlations. (A point biserial correlation is a correlation between 
 a dichotomous variable and a continuous variable). If the input to 
 the correlation program consists of rank ordered data (ordinal), the 
 output will be Spearman's rank order correlations. (See Walker and Lev, 
 Chapter 11 for comparisons and comments on the above mentioned coefficients). 
 
 In the process of calculating the correlations, the means and standard 
 deviations of the individual variables are computed, as are the cross- 
 products and covariances between variables. After the correlations have 
 been calculated, they are used to calculate the linear regression coefficient 
 and corresponding intercept terms needed for predicting each variable from 
 each other variable. 
 
 II. Input 
 
 Input to the CORRELATION program consists of a set of independent 
 observations on two or more variables. The data is considered as a 
 two-dimensional array (or matrix) of numbers with each column containing 
 the observations on one variable, and each row consisting of one obser- 
 vation on each variable. If we use the letter X to represent the matrix 
 of raw data, we let Xj_j represent the i~kh row (where i = 1, 2, ... N) and 
 the jth column (where j = 1, 2, . . . M) . In other words, we have N obser- 
 vations (rows'* and M variables (columns') in our data matrix X. 
 
 III. Formulas and Calculations 
 
 The following formulas define certain statistics and illustrate their 
 methods of calculating within the program. The subscript i refers to 
 observations (or individuals) and runs from 1 to N. The subscripts j and 
 k refer to variables, and they run from 1 to M. 
 
CORRELATION 
 Page 2 
 
 N 
 
 Z X. . 
 
 - i=l 1J 
 
 Mean = X . = — = 
 
 (of variable j) 
 
 W N N N 
 
 Z(X -X )(X -X ) NZX X -( ZX )( zx ; 
 1=1 iJ ik k i=1 ij ik i=1 x3 i=1 iK 
 
 Covariance* = C = ^— 3- = n(n - l) 
 
 (between variables 
 j and k) 
 
 N _ 2 / N N 
 
 Z (X. . -X.) /W ZX7. - ( ZX. .) 
 
 - 1=1 1J J / i=l 13 1=1 1D / 
 
 Standard Deviation* = S. = v / — = =J r-y— — — r = v C . . 
 
 if . , -, .\ 3 s * W-l ^ W(N-1 jj 
 (of variable J ) 
 
 Correlation = R., 
 
 Ok S .S 
 
 W WW 
 
 w zx. .x.. - ( zx. .)( zx.. ; 
 
 C. . , ij lk \ , ij v . , ik' 
 jk i=l i=l i=l 
 
 2 
 
 (between variables J j k /~N ? W p /~~ IT W 
 
 J and k) / W ZX - ( ZX ) / W ZX - ( ZX ) 
 
 \/ i=l 1J i=l la V i=l lk i=l lk 
 
 From the equation X. . = B..X., + A., , the program calculates 
 
 ij jk ik jk 
 
 S. 
 Linear Regression Coefficient = B., = R., (— -) 
 
 (for predicting variable j from variable k) k 
 
 Intercept = A = X. - B. X 
 
 (constant term in equation for '■' 
 
 predicting variable j from variable k) 
 
 *NOTE : the sample covariances and sample standard deviations are unbiased 
 estimates of the corresponding population parameters. The definitions 
 given here follow the practice of many current statisticians. [See Anderson 
 (1958) - Chapter 3 for example.] 
 
 Significance Tests 
 
 If we assume that two variables (indexed by j and k) have a bivariate 
 normal distribution, there is a test statistic for testing the hypothesis 
 
 :'■ the correlation in the population is zero (or equivalently that either 
 regression coefficient is zero). Even for a relatively small sample size 
 (n), this hypothesis can be tested using the t ratio: 
 
 R., -s/w- 2 
 t = J*- 
 
 4 2 
 
 1 - R ., 
 Jk 
 
CORRELATION 
 Page 3 
 
 with N - 2 degrees of freedom. Other types of hypotheses can be tested 
 through use of the Fisher R to Z transformation. [See Hays (1963), pages 
 529 - 533 for example. ] 
 
 V. Output 
 
 Output from the CORRELATION program may consist of any or all of 
 the statistics from section III above, by using parameters 2 through 7. 
 Any output from this program may be printed and/or output to temporary 
 storage (SEQUENTIAL 1-5). The means, standard deviations, and the sample 
 size (N) are output as a matrix with M rows (one for each variable) and 
 three columns (the third column will have a constant value of N for all 
 variables) . Correlations, covariances, and cross-products are printed 
 as lower triangular matrices, while the regression coefficients and 
 intercepts are printed as square matrices. However, all five of these 
 matrices are stored as square matrices. 
 
 VI. Restrictions 
 
 The CORRELATION program will accept an unlimited number of 
 observations, but the number of variables is limited as noted in the 
 section on PROGRAM LIMITS in the INTRODUCTION. 
 
 VII. Parameters 
 
 The parameters for the CORRELATION program follow the program 
 name on the main program card. Each parameter must be enclosed in 
 parentheses. The parameters must appear in the order given below. 
 If a parameter is not needed, do not punch anything between its 
 parentheses. All parentheses after the last non-empty pair may be 
 omitted. 
 
 Parameter 
 Number 
 
 5 
 6 
 
 Use or Meaning 
 
 Input Address of raw data (X matrix). 
 CARDS or SEQUENTIAL 1-15 and/or PRINT. 
 
 Output Address for means, standard 
 deviations, and sample size. 
 SEQUENTIAL 1-15 and/or PRINT. 
 
 ti Output Address for correlation matrix 
 (R). SEQUENTIAL 1-15 and/ or PRINT. 
 
 .0 Output Address for cross-products 
 matrix. SEQUENTIAL 1-15 and/or PRINT. 
 
 n Output Address for covariance matrix (c). 
 SEQUENTIAL 1-15 and/or PRINT. 
 
 Q Output Address for matrix of regression 
 coefficients (B) . SEQUENTIAL 1-15 and/or 
 PRINT. 
 
CORRELATION 
 Page h 
 
 7 
 8 
 
 fi Output Address for intercepts (matrix A) 
 SEQUENTIAL 1-5 and/or PRINT. 
 
 1 if last variable in each row is a 
 weighting factor. 
 
 It is possible to print in F format and/or punch the output from these 
 parameters. If you need either of these options, see the section in 
 the INTRODUCTION on INPUT and OUTPUT. 
 
 VIII. Special Comments 
 
 1. This program does not check for missing data. All blank spaces 
 are read as zeroes. If you have missing data, use the MISSING 
 DATA CORRELATION program. 
 
 2. In the output matrices of regression coefficients and intercepts, 
 the row number refers to the dependent variables, and the column 
 numbers refer to the independent variables. 
 
 3. If a variable is constant, an error message will be printed and 
 all correlations with that variable will be set to zero. 
 
 h. In order to have the program perform its calculations separately 
 for sub samples of the data, see the section on CONTROL VARIABLES 
 in the INTRODUCTION. 
 
 IX. Examples 
 
 1A 
 
 /*TD <accounting information> 
 // EXEC SOUP 
 //SOUP.SYSIN DD * 
 CORRELATIONS (CARDS 1 ( ) (PRINT ^ 
 END SOUPAC 
 DATA (6) (6F2.0) 
 
 IB 
 
 /*ID <accounting information> 
 
 // EXEC SOUP 
 
 //SYSIN DD * 
 
 COR (C)( )(P). 
 
 ENDS 
 
 DATA (6H6F2.0) 
 
 END# 
 /* 
 
 END# 
 
 Example 1A illustrates the usage of the CORRELATION program. Notice 
 that all words are spelled out although this is unnecessary. Notice also 
 that correlations are to be printed out, although the means and standard 
 deviations are not. Example IB will perform exactly the same computations 
 as 1A, except that all instructions have been abbreviated to make keypunching 
 easier. 
 
CORRELATION 
 Page 5 
 
 2 
 
 /*ID <accounting information> 
 
 // EXEC SOUP 
 
 //SYS IN DD * 
 
 cor (cHp)(p/si). 
 
 principal axis from (si) to (s2/p) with (10) factors and 
 (100) percent of the variance to be removed, 
 varimax rotation from (s2) to (print). 
 
 ENDS 
 
 DATA (20)(lOF4.0, 5F6.2/10X, 5F^.l) 
 
 END# 
 
 /* 
 
 In the second example, the CORRELATION program first prints the means 
 and standard deviations. Then it prints the CORRELATION matrix and stores 
 it on SEQUENTIAL 1 (Si) . The PRINCIPAL AXIS program then performs a 
 principal components analysis and outputs 10 components to S2. VARIMAX 
 then rotates these 10 components, using the VARIMAX criterion, and prints 
 the results. 
 
 X. References 
 
 T. W. Anderson, An Introduction to Multivariate Statistical Analysis ; 
 John Wiley and Sons, Inc. , 1958. 
 
 E. C Bryant, Statistical Analysis ; McGraw-Hill, i960, pp. 113-135- 
 
 W. L. Hays, Statistics for Psychologists; Holt, Rinehart and Winston, 
 i960. 
 
 H. M. Walker and J. Lev, Statistical Inference ; Henry Holt and Company 
 New York, i960. 
 
MISSING DATA CORRELATION 
 
 General Description 
 
 The MISSING DATA CORRELATION program calculates the following coeffi- 
 cients for every combination of variables: 
 
 Mean: X ± = — 
 
 n. z(x 2 id ) - (rx^) 2 1/2 
 
 Standard Deviation : s . . = [ — ] 
 
 " Ntj (N ir l) 
 
 . n^eCxy^) - (ffl 13 )(g 13 ) 
 
 Covariance : S. . = 
 
 S. 
 
 N,. (N..-1) 
 
 Correlation : r 
 
 " Sx. .Sy. . 
 
 II. Restrictions 
 
 The maximum number of variables for this program is 100. 
 
 The input data to this program may come from any source conforming to 
 SOUPAC. Output may be printed and the correlation matrix may be placed on 
 any source conforming to SOUPAC. 
 
 III. Parameters 
 
 The parameters for the MISSING DATA CORRELATION program appear on the 
 program card. They must follow the program name in the following order: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address. CARDS or SEQUENTIAL 1-15- 
 Default is CARDS. 
 
 2 - printing as usual 
 
 1 - printing is suppressed 
 
 3 Output Address of correlation matrix. 
 
 h Output Address for sample sizes. 
 
 5 Coding for missing data; if left blank or if 
 
 zero is entered, minus zero is used as check. 
 It is NOT possible for this program to count 
 true zeroes as missing data. This parameter 
 must be enclosed in asterisks. Example: *99*« 
 
 NOTE: All output is in double precision. 
 
MISSING DATA CORRELATION 
 Page 2 
 
 IV. Special Comments 
 
 A. The user is warned against further processing of the correlations 
 output by this program because the correlations do not necessarily 
 come from the same sample. 
 
 B. For control breaks, data must be presorted on the control variables 
 with the last variable changing fastest. The maximum number of control 
 variables is 30 Control variables begin on a new card with $C-B in column 
 1 and are enclosed in parentheses. 
 
 C . The correlation matrices can be stored in parameter 3 is a temporary 
 storage address. However, if control breaks are also being used, 
 only the first matrix corresponding to the first control break can be 
 saved . 
 
PARTIAL CORRELATION 
 
 I . General Description 
 
 This routine, upon option, provides two of the more common types of 
 special purpose correlation coefficients. 
 
 A. Partial Correlations: 
 
 This program produces coefficients of net correlation of any order 
 froin 1 to 19 in matrix form. Coefficients of successively higher 
 order may be obtained by repeated calls to the program, each time 
 using as input the previously generated partial correlation matrix; 
 or several variables may be held constant at the same time by one 
 call to the program. 
 
 The general equation used is: 
 
 
 ij .abc. . . (n-1) in.abd. . . (n-l) i j < 
 
 .abc. . . (n-l) 
 
 ij .abc . . .n 
 
 (1 " r 2 - UA t ,0 1/2 (1 - r. , 
 in.abd. . . (n-l) ;jn.abc 
 
 ...(n-l) } 
 
 eferences : 
 
 
 
 Mills, F.C. 
 
 Statistical Methods, Holt, Rinehart and 
 
 Winston. 
 
 New York, 1955, 3rd edition. 
 
 Tetrachoric Correlations : 
 
 This type of correlation coefficient is used when continuous normally 
 distributed variables are measured dichotomously. 
 
 This program is based on a program by Roald Buhler at Princeton 
 University which in turn is based on a 650 program written at the 
 Educational Testing Service. The approximation used was developed 
 by Professor Ledyard Tucker. 
 
 II. Restrictions 
 
 Partial Correlations : 
 
 Input matrices may be no larger than lUO x lUO and must be compatible 
 with SOUPAC conventions. In most cases the original input to the 
 program will be a matrix of zero order correlations (see CORRELATION 
 program write-up) . 
 
 Tetrachoric Correlations : 
 
 This option is limited to 1^0 variables. All observations should be 
 coded either or 1. The program generates cross-count tables before 
 computing the correlation coefficients. 
 
PARTIAL CORRELATION 
 Page 2 
 
 III. Parameters 
 
 The program name, PARTIAL CORRELATION, should be followed by the 
 following parameters : 
 
 Parameter 
 Number 
 
 2 
 3 
 
 22 
 
 Use or Meaning 
 
 Input address of R if partial correlations or raw 
 data if tetrachoric correlations. (R is a 
 correlation matrix) . 
 
 Output Address of correlations desired. 
 
 if tetrachoric correlations are desired 
 
 1 if partial correlations are desired 
 
 Variables to be held constant in using 
 partial correlations. 
 
 IV. Special Comments 
 
 When there is a zero cell or sufficiently close so that the 
 tetrachoric correlation cannot be computed by this approximation, a 
 value of -1.0 is used if the missing cell is off-diagonal. If a 
 diagonal cell is zeroish (i.e., if a variable is all zero or all one) 
 its correlations are set to 0.0. 
 
 Blanks are counted as zeroes. 
 
 Examples 
 
 A series of observations of 8 variables are used to obtain 3rd 
 order partial correlations with variables 5, 7> and 8 held constant: 
 
 /'■ L ID 
 
 // EXEC SOUPAC 
 SYSIN DD * 
 
 RRELATIONS ( CARDS ) ( ) ( SEQ 1 ) . 
 
 RTIAL CORRELATION (SEQl) (PRINTUl) (5) (7) (S" 1 • 
 ... SOUPAC 
 DATA(8)(8F6.2) 
 
 F.ND -// 
 /* 
 
STEP-WISE MULTIPLE CORRELATION 
 
 I. General Description 
 
 The STEP-WISE MULTIPLE CORRELATION program calculates the following: 
 
 - *i 
 
 Mean : X n - 
 
 1 ~ N 
 
 Covariance : s. 
 
 NE(X.X.) - (2C.)(flC.) 
 
 ij : N(N-l) 
 
 1/2 
 Standard Deviation: s. = (s..) ' 
 l li 
 
 Product Moment Correlation : r . 
 
 s. . 
 
 ij s. s . 
 
 i J 
 
 In the step-wise procedure, intermediate results are used to give 
 valuable statistical information at each step in the calculation. These 
 intermediate answers are also used to control the method of calculation. 
 A number of intermediate regression equations are obtained by adding one 
 variable at a time thus giving the following intermediate equations. 
 
 a. Y = B Q + B 1 X 1 
 
 b. Y = B + B 1 X 1 + B 2 X 2 , etc. 
 
 The coefficients for each of these intermediate equations and the 
 reliability of each coefficient are obtained by the step-wise procedure. 
 The values and reliability may vary with each subsequent equation. The 
 coefficients represent the best values when the equation is fitted by the 
 variables included in the equation. The variable is added that makes the 
 greatest improvement in "goodness of fit" or, stated another way, gives the 
 greatest reduction in variance of the dependent variable. 
 
 A variable may be indicated to be significant at an early stage and 
 enter the regression equation. After several other variables are added to 
 the regression equation, a variable in the equation may be indicated to be 
 insignificant. Under this situation the step-wise regression procedure will 
 remove the insignificant variable before adding an additional variable. 
 Thus, at the various steps in the regression procedure, only those variables 
 which are significant will be included in the regression equation. 
 
 The F level to enter a variable controls when variables enter the 
 equation and the F level to remove a variable likewise controls the removing 
 of variables from the equation. 
 
 The last step in the step-wise procedure predicts the value of the 
 dependent variable for each set of observations based on the final re- 
 gression equation. Deviation between the actual and predicted values are 
 also calculated. 
 
 (See parameter k) . 
 
STEP-WISE MULTIPLE CORRELATION 
 
 Page 2 
 
 For reference to formulas ard methods used see: 
 
 A. Ralston and H. S. Wilf, Mathematical Methods for Digital 
 Computers , New York, Wiley and Sons, i960, pp. 191-195 • 
 
 II . Restrictions 
 
 The maximum number of independent variables in this program is 199- 
 The dependent variable must be the last variable of each row. 
 
 The input data to this program may come from any source conforming to 
 SOUPAC. Output may be PRINT only. 
 
 Ill . Parameters 
 
 The parameters for the STEP-WISE MULTIPLE CORRELATION program appear 
 on the program call card. They must follow the program name in this order: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address. CARDS or SEQUENTIAL 1-15- 
 
 (See parameter h for special conditions). 
 
 2 "F" level to enter an independent variable 
 into the regression equation. An example 
 would be: *i+.0* 
 
 3 "F" level to remove a variable from the 
 regression equation. An example would 
 be: *U.O * 
 
 h This parameter should be set to 1 if the 
 
 predicted dependent variables are to be 
 calculated. (if this option is needed, 
 input data must not be from cards . ) or 
 blank if not wanted. 
 
 5 1 if constant term in equation is assumed 
 to equal zero (0) . 
 
 6 1 if want to use weighting factor. (if a 
 weighting factor is used, it must be the 
 last variable in the input data row. ) 
 
 7 1 if intermediate steps of regression are 
 not to be printed. 
 
 8 1 if do not want cross-product matrix printed 
 
 2 if input data is in the following form: 
 
 1 N N+l 
 
 CORRELATION 
 MATRIX 
 
 N 
 
 STANDARD 
 N+1 DEVIATION 
 
 (where N is the number of variables) 
 
 M 
 E 
 A 
 N 
 S 
 
 sample 
 size 
 
Parameter 
 Number 
 
 9 
 
 10 
 11 
 
 12 
 
 STEP-WISE MULTIPLE CORRELATION 
 
 Page 3 
 
 Use or Meaning 
 
 1 if do not want means and standard deviation 
 printed 
 
 1 if do not want covariance to be printed 
 
 1 if do not want correlations to be printed. 
 
 Tolerance to be used to determine when 
 singularities are assumed to occur. If 
 this parameter is left blank 10~5 is used. 
 If it is desired to change this parameter, 
 the following would be used: *1.E -10* 
 where any number could be substituted for 
 the 10. 
 
 Output (intermediate storage^ of coefficients. 
 
 First (N) variables are placed in regression 
 first. 
 
 13 
 11+ 
 
 IV. Special Comments 
 
 The dependent variable must be the last variable in the input row 
 (unless a weighting factor is used, then the dependent variable will be the 
 next to the last variable in the input row) . 
 
 Negative F-ratios may sometimes result in the computational procedure. 
 They should be considered to be analytically zero. Frequently negative 
 F's arise when input is a missing data correlation matrix. Results up to 
 the negative F are always correct. 
 
 The standard error of Y 
 predicted Y. 
 
 ;iven at each step is the standard error of 
 
 The program will loop if a variable entered at one step is removed 
 at the very next step. This can usually be corrected by changing the 
 F-levels to enter and remove variables. 
 
 V. Example 
 
 STEP ( CARDS )*k.**h.*{ )( )( )( )(1). 
 
DISTRIBUTION ANALYSIS PACKAGE 
 
FIT 
 (CHI-SQUARE GOODNESS-OF-FIT TEST > 
 
 I. General Description 
 
 In statistical applications, it is frequently the case that certain 
 assumptions were made concerning the probability distribution of a random 
 variable. A frequent assumption is that a particular variable is normally 
 distributed. The question that arises is "how valid an assumption was 
 this?" A method of testing this assumption (or hypothesis) is the CHI- 
 SQUARE GOODNESS-OF-FIT TEST. 
 
 This program provides tests of hypothesis that user's data is 
 (l) a random sample from the distribution PO, where is a user specified 
 parameter, or (2) a random sample from a class, PO, of distributions where 
 is not specified. These will be called tests of type 1 and type 2 , 
 respectively. 
 
 Distributions which can currently be tested in the program are the 
 binomial, Poisson, normal, gamma and continuous rectangular distributions. 
 The user may, for example, wish to test the hypothesis that his data was 
 from a normal distribution with variance 1, for some mean. 
 
 The program on the basis of the user provided information decides on 
 
 a set of points X, < X„ < . . . .X in the range of the distribution under 
 
 12 n 
 
 consideration. The test then compares the observed numbers, o^ ,of 
 
 observations in each interval [X. n , X.), with the expected numbers 
 
 l-l l 
 
 e. = Ps '{X. . ■: I. X.} , {Y. ; 1 < i < m } 
 l 1-1—1—1 l — — 
 
 Where is the user specified value of , if it was specified, and if 
 not, is the maximum likelihood estimator for a hypothesis of form (2) 
 above. The comparison of o. with e. is made by the statistic 
 
 n+1 (o - e ) 2 
 
 X = Z 
 
 1-1 S i 
 
 2 
 The distribution of this statistic has an approximate X distribution when 
 
 the hypothesis is true and n is large. (See Billingsley , 1961). 
 
 The program prints the computed value of X and the number of degrees of 
 
 freedom for the test. 
 
 If one sample, with m observations, is to be tested, input to the program 
 from cards or sequential will be one variable with m observations. More 
 than one sample may be tested at a time, if the same distribution, or in some 
 cases a distribution from the same class, is the one being considered. Thus 
 a test of each of three samples, all of which are on the same sequential 
 storage or card deck, may be performed. 
 
FIT ( CHI SQUARE 
 GOODNESS OF FIT ^ 
 Page 2 
 
 II. Distributions Available 
 
 NOTATION: In the description below f(x) denotes the value of the probability- 
 density function of the point x. F(x) is the cumulative distribution function 
 defined as Pr {X < x } 
 
 DISCREET DISTRIBUTIONS 
 
 The Distributions are classified here as being "discreet," (i.e. 
 having positive probability at a countable number of values of the 
 random variable it describes) or as "Continuous" (having positive 
 density over a continuous range of values of the random variable). 
 
 1. BINOMIAL(N,P) N a positive integer; <_ p <_ 1 
 
 f(x) -- ( jp (1-p) for x an integer 
 
 Vx/ such that < x < N 
 
 F(x) = E g) p 1 (1-p 
 
 i=0 
 
 N-i for x an integer 
 
 such that < x < N 
 
 This distribution is appropriate, for example, in situations 
 where a random variable is sampled independently (observed) N times, 
 the observations, and where the values of the random variable can 
 be classified precisely as being in one of two sets often denoted 
 "success," and "failure," with probabilities p, and q = 1 - p 
 respectively. 
 
 2. POISSON (X) X>0 
 
 / \ x —X 
 f(x) = X e x >_ ; and x an integer 
 
 x! 
 
 x i -X 
 F(x) = I A .? x > 
 
 i = 
 
 The Poisson distribution is often used as an approximation 
 to the binomial when the number of observations is large, 
 p = Pr {success} is small, and the product n * p is essentially a 
 constant. The Poisson is a pervasive distribution in its own 
 right, arising in situations where the probability of no occurrences, 
 or of one occurrence of a phenomenon in a unit of time is moderate, 
 where the probability of more than one occurrence is essentially 
 negligible in comparison, and where the frequencies of occurrence in 
 adjacent intervals are independent of each other. 
 
FIT (CHI SQUARE 
 GOODNESS OF FIT 1 
 Page 3 
 
 B. CONTINUOUS DISTRIBUTIONS 
 
 2 2 
 
 1. NORMAL (\i, a ) -« < u < +«> ;a >0 
 
 1 2 
 
 f(x) = I 2 eX P ~2 M _oo < X < +oo 
 
 211 a 2 a 
 
 dt -°° < x < +°° 
 
 Through the Central Limit Theorem, the use of this distribution 
 has been justified to describe a tremendous variety of phenomena in 
 which the random variable under consideration is assumed to be the 
 sum of a large number of independent random variables , each having 
 a small contribution to the total. 
 
 2. GAMMA (a, 6) a>0 , 3 > 
 
 f(x) 
 
 „a a-1 -0x 
 3 x e 
 
 A(a ) 
 
 
 x . 
 
 F(x) 
 
 ( a .a-1 -3t ,. 
 3 t e dt 
 
 / A(a) 
 
 
 
 
 Here 
 
 A(a) = (a-1) A (a-1) 
 
 integer, 
 
 A(a) = (a-1)! 
 
 for x > 
 
 and so if a is a non-negative 
 
 The gamma distribution is the sampling distribution for the 
 sum of a independent identically distributed "negative exponential" 
 random variables, with 3 =A where X is the parameter for the 
 negative exponential distribution. The negative exponential 
 distribution is itself the special case gamma (1,3). The gamma distri- 
 bution is used in dealing with waiting times, where the expected 
 frequency in a given interval has Poisson distribution. The 
 "Chi-Square" distribution is another important special case of the 
 gamma distribution where 
 
 2 
 X (r) e gamma (r/2, 1/2). 
 
 3. RECTANGULAR (0^ © 2 ) Q i " G 2 
 
 1 
 
 (e 2 - Oj) 
 
 f <*> = (* - a.<> 1 ix 1 9 2 
 
 x 
 
 x-0, 
 
 F(x) = / 1 dt = H_ Q l - 
 
 4 Q 2- Q 1 V 1 
 
FIT (CHI SQUAEE 
 GOODNESS OF FI1 ^ 
 Page k 
 
 This distribution is appropriate where any event in an 
 interval {0j , 2 } has equal probability of occurring, but no 
 occurrences will be outside the interval. 
 
 III. Use of the Program 
 
 A. PRELIMINARY 
 
 The program is invoked by the main parameter card (a SOUPAC 
 "Program Card"). Information concerning the distribution and 
 intervals to be used are supplied on subparameter cards for the 
 program. 
 
 Input is in the form of column vectors, and may be from cards or 
 sequential storage. Each variable (vector of data will be tested 
 against the same type of distribution, i.e. information provided 
 on the distribution card applies to all variables. 
 
 The subparameter card provides the means of supplying information 
 to the program concerning the distribution to be used, as well as 
 the number and size of intervals to be used in the tests. In 
 some cases a distribution parameter must be supplied by the user. 
 
 B. MAIN PARAMETER CARD 
 
 Immediately following the program name CHI-SQUARE GOODNESS-OF-FIT 
 (mnemonic: FIT), the following parameters are listed. 
 
 Parameter 
 Number 
 
 Use of Meaning 
 
 Input Address. May be CARDS or 
 SEQUENTIAL (SI, or S2, etc). 
 
 SUBPARAMETER CARDS 
 
 1. GENERAL DESCRIPTION 
 
 One "distribution card',' chosen from the list below, 
 should appear with each call to the program. For tests of 
 type 1 ("simple hypothesis") the parameters of the distribution 
 are specified on this card. For distributions allowing tests 
 of type 2 ("composite hypothesis") one or more of these 
 parameters may be left blank. Other parameters on the card 
 relate to intervals to be used for determining observed and 
 expected frequencies. Because this test is "asymptotically 
 valid," these should be chosen so that expected frequencies of 
 any interval does not fall below 5- 
 
 2. DISTRIBUTION CARDS 
 
 Where * * is used, a parameter requires a decimal point 
 number; when ( ) appears an integer is required. 
 
FIT ( CHI SQUARE 
 GOODNESS OF FIT ^ 
 Page 5 
 
 DISTRIBUTION 
 
 BINOMIAL(M)*P*(N). 
 
 M and P are parameters of the distribution, P the probability 
 of success on a trial, M the number of trials. N is the number 
 of points (integral values starting at 0) to be grouped in each 
 interval. For a test of Type 2, the parameter P should be 
 left blank. 
 
 POISSON*A*(ENDPOINT)(N). 
 
 X is the Poisson density parameter, and may be left blank for 
 a test of type 2. N is the number of adjacent points to be 
 grouped in each interval until ENDPOINT is reached. The 
 Interval [ENDPOINT, +°°) is then the last interval. 
 
 RECTANGULAR*Q1**Q2*(N) . 
 
 01, and 02 are the endpoints for the range of the distribution 
 
 01 < 02. N is the number of equal sized intervals to use for 
 
 the test. Only tests of type 1 are allowed with this distribution. 
 
 2 
 N0RMAL*y**a **STPT**EP*(N) . 
 
 P is the mean of the distribution, a" is the variance. Either 
 or both of the parameters may be left blank for a test of 
 type 2. STPT and ZP are the start point and end point of an 
 interval to be broken up into N equal sized intervals for 
 the test. The additional intervals (-°°, STPT), and [EP, +°° ) 
 will be used. 
 
 GAMMA*a**3**ENDP0INT*(N) . 
 
 *a* is the degree's of freedom parameter, and must be specified. 
 *6* is the density parameter and may either be specified for 
 a type 1 test or left out for a type 2 test. The portion of the 
 real line between and ENDPOINT will be divided into N equal sized 
 intervals for the test. One additional interval, [ENDPT, +°° ) 
 will be included. 
 
 IV. Restrictions 
 
 A maximum of 50 input variables (each variable a sample to be tested) 
 may be input to the program. Only one distribution or distribution type 
 (e.g. NORMAL with variance 1 and any mean) may be tested. 
 
FIT ( CHI SQUARE 
 GOODNESS OF FITO 
 Page 6 
 
 V. Examples 
 
 /*ID 
 
 // EXEC SOUP 
 
 //SYSIN DD * 
 
 FIT(CARDS). 
 
 N0R*2 . 0****-10**lU*(2>! i 
 
 END P 
 
 END S 
 
 DATA( 1000,1 )(21X,Fl+.l) 
 
 #END 
 /* 
 
 This program will test the hypothesis that the sample is from a normal 
 distribution with a mean of 2, for some variance. The interval [-10, lU) 
 will be divided into 2U pieces, each of length 1, for the test. 
 
 /•ID 
 
 // EXEC SOUP 
 
 //SYSIN DD * 
 
 FIT (CARDS). 
 
 BIN(l0)*l/2*(2). 
 
 END P 
 
 END S 
 
 DATA(600,l)(26X,F2.0) 
 
 #END 
 /* 
 
 This program will test the simple (type l) hypothesis that the sample was 
 
 from a B ( 10, 1/2) population. Note that with the given number (600) of 
 
 observations we had to group adjacent pairs of points into the same interval 
 to insure a reasonable expected frequency in every cell. 
 
THE KOLMOGOROV-SMIKNOV STATISTIC 
 
 I. General Description 
 
 The program computes the Kolmogorov-Smirnov (K-S) D statistic, 
 
 D = SUP 
 all x 
 
 F N (x) - F(x) 
 
 where F^ is the sample cumulative distribution for a sample of size N and 
 F(x) is the specified cumulative distribution. 
 
 IT. Theoretical Discussion 
 
 The sample distribution function: 
 
 V x) =| 3 = °> N 
 
 will generally differ from the population distribution function. If 
 the sample distribution differs exceedingly from the specified 
 distribution F(x), the amount of the difference might be of use in 
 determining whether to accept the hypothesized distribution as correct. 
 The Kolmogorov-Smirnov test uses the maximum actual numerical 
 difference If^OO - F(x) I . 
 
 Example : 
 
 Consider testing the hypothesis that a distribution is normal with 
 mean = 32 and variance=3.2U with 10 sample observations 
 
 31.0 33-7 
 
 31. h 3^.U 
 
 33.3 3^.9 
 
 33.^ 36.2 
 
 33.5 37-0 
 
 F (x) and F(x) are sketched below. 
 
KOLMOGOROV-SMIRNOV 
 Page 2 
 
 F N (x) - F(x) 
 
 D = .5b which is the maximum of 
 
 At the .95 confidence level the critical value D' = . U0925 
 
 Since D > D' the distribution being tested is rejected at the 3% level. 
 
 * See Owen [5] 
 
 III. Notes 
 
 Only supply the parameters needed to determine a specified 
 distribution. This program, at this time, calculates Kolmogorov- 
 Smirnov statistic for the following distributions if given the 
 proper parameters. 
 
 A. Normal 
 
 1. mean 
 
 2. variance 
 
 B. Chi-Square 
 
 1. Degrees of freedom 
 
 C. Central F 
 
 1. Degrees of freedom numerator 
 
 2. Degrees of freedom denominator 
 
 D. Noncentral F 
 
 1. Degrees of freedom numerator 
 
 2. Degrees of freedom denominator 
 
 3- noncentrality parameter N 
 
 when noncentrality parameter X = — £ w 
 
 i=l 
 
 (See Graybill [3] for further development of \). 
 
 Additional distributions are planned for the near future. Please 
 see consultants for these additions. 
 
 IV. Sample Programs 
 
 A. Normal - type 1 
 
 2 = 32 
 a = 3. 2k 
 
 K-S (C)(1) '-' : 2^.2k*. 
 
 B. Chi-Square - type 2 
 
 d.f.= 2k 
 
 K-S(C) (2)****(2k). 
 
 C. Central F - type 3 
 
 d.f. numerator = 5 
 d.f. denominator = 16 
 
 K-s(si)(3)****(5)(-l6). 
 
KOLMOGOROV-SMIRNOV 
 Page 3 
 
 D. Noncentral F - type 3 
 d.f. numerator = 10 
 d.f. denominator = 13 
 noncentrality parameter (X) = 2.5 
 
 K-S(S2) (3 )*-*** (10) (13)*2.5*. 
 
 V. Parameters 
 
 The following parameters follow the mnemonic K-S : 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input address, CARDS, SEQUENTIAL 1-15. 
 
 2 Specified distribution (see Section IV) . 
 
 3 Floating point value of the mean. 
 
 k Floating point value of the variance. 
 
 5 Degrees of Freedom (numerator) . 
 
 6 Degrees of Freedom (denominator) . 
 
 7 Floating point value of non-centrality 
 
 parameter. 
 
 VI. References 
 
 [l] Darling, D. A., "The Kolmogorov-Smirnov, Cramer-Von Mises Tests," 
 
 Annals of Mathematical Statistics , Vol. 28, (1957), PP- 823-838. 
 
 [2] Goodnight, James, Department of Experimental Statistics, N. C. 
 State University, Raleigh , North Carolina. Mr. Goodnight 
 developed and programmed the computational procedure used to 
 integrate the central and non- central F distribution. 
 
 [3] Graybill, F. A., An Introduction to Linear Statistical Models , 
 
 Vol. I, Chapter k, "Distribution of Quadratic forms" pp7 7^f— 92. 
 
 [U] Lindgren* 
 
 [5] Cwen, D. B., Handbook of Statistical Tables . This book contains 
 a table of the critical values for the Kolmogorov-Smirnov 
 statistic. It is available for reference in the SOUPAC office. 
 
FACTOR ANALYSIS PACKAGE 
 
Factor Analysis Package 
 
 Users of the PRINCIPAL AXIS FACTOR ANALYSIS program are reminded that 
 if no estimates of communalities are provided with the input matrix, the 
 output is a principal components matrix. If communalities are provided, 
 then the output is principal axis factors. Whether communality estimates 
 are used, and if so, what kind, can be a very important conceptual considera- 
 tion. Mathematically, the principal components approach the principal axis 
 factors for large numbers of variables. The rate of convergence is the ratio 
 of N 2 to N 3 . 
 
 Centroid factor analysis similarly requires that a user consider the 
 question of communality estimates. 
 
 The ITERATIVE FACTOR ANALYSIS program on the other hand, simultaneously 
 estimates communalities and factor loadings. 
 
 The UNRESTRICTED MAXIMUM LIKELIHOOD procedure requires as input a 
 correlation matrix with unities on the diagonal. 
 
 For principal components, component scores are derived by the equation 
 ZR F = X where Z is the standardized raw scores; R the inverse of the 
 correlation matrix; and F the component loadings. 
 
 In general, factor scores from oblique rotations are obtained by the 
 formula X = DY~" M~ F'Z where D is the correlation between components and 
 the reference vectors; Y the reference vector transformation; M the roots; 
 F the principal components; and Z the standardized raw scores. For the 
 orthogonal case, the general formula is X = TM F'Z where T is a transforma- 
 tion matrix, for example, the varimax loadings. Notice that if T is the 
 identity matrix we have the formula for component scores. 
 
 These general formulas are also applicable to principal axis factors, 
 as well. 
 
BINORMAMIN 
 
 I. General Description 
 
 BINORMAMIN ROTATION rotates a matrix, F, of orthogonal factor loadings 
 to oblique simple structure. 
 
 It does this by iterating for T in FT = A (where A is the rotated factor 
 pattern ^ so as to minimize: 
 
 Z (v 2 /h 2 )(v 2 /h 2 ) 
 
 K "P K P"P q=P (^v 2 /h 2 )(^ 2 /h 2 ) J 
 
 3 jp 3 3 jq' 3 
 
 Since solving directly for K is too complex, BINORMAMIN takes one vector at 
 a time, rotating it against all the others, to minimize each Kp. 
 
 Its name comes from the fact that is uses a double (Bl) NORMAlization 
 in seeking a MINimum. 
 
 For further information see : 
 
 1. Kaiser, H. F. and Dickman, K. W., "Analytic Determination of 
 Common Factors". Unpublished manuscript, University of 
 Illinois, 1959- 
 
 2. Harmon, H. H., Modern Factor Analysis . Chicago, University 
 of Chicago Press, i960. pp.326ff. 
 
 II. Restrictions 
 
 Input is limited to matrices of 150 x 30 or less. 
 n. Parameters 
 
 After the program name, BINORMAMIN, are the following parameters: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address of factor matrix, F. 
 
 2 Output Address of factor matrix, F. 
 
 3 Output Address of the transformation 
 matrix, T. 
 
 h Output Address of the reference vector 
 
 structure, V. 
 
 Output Address of the correlations between 
 reference vectors. 
 
 Output Address of the primary factor 
 pattern, P. 
 
BINORMAMIN 
 Page 2 
 
 Parameter 
 Number 
 
 7 
 8 
 
 10 
 
 11 
 Note on Output : 
 
 Note on parameters 8 and 9 
 meets first. 
 
 Use or Meaning 
 
 Output Address of the correlations between 
 factors. 
 
 Maximum number of iterations (see note). If 
 blank, the maximum will be set at 100 itera- 
 tions . 
 
 Convergence criterion (see note). 
 
 A. Defined zero change: iterating will stop 
 when each element in V changes by less 
 than A. (A must be less than .2 and no 
 less than .0000001). 
 
 B. Defined zero rotation: iterating will 
 stop when each vector in T changes by- 
 less than 9, where is the angle whose 
 cosine is B. (B must be less than 1.0 
 and no less than .2). 
 
 If left blank, A will be set to .001. 
 
 If an initial T is to be read in, input 
 address of T. 
 
 Output Address of the initial T. 
 
 A. Any output option left blank will not be output. 
 
 B. The program will always print out the program 
 name and the number of iterations actually done . 
 
 C. The program will print out the largest change in 
 V, unless option 9B is used. 
 
 D. All data printed out is to 7 decimal places. 
 Program will stop at whichever criterion it 
 
 Note on parameter 9: This parameter is a floating point constant and there- 
 fore must be enclosed in asterisks, with a decimal point, as in example: 
 
 Example: BINORMAMIN ( CARDS) ( U ) (SEQ 1 /PRINT K )( )( PRINT) ( )*.0001*. 
 
 Store V on SEQ 1, also prints V and correlations between factors. On the last 
 iteration, no element in V changed by more than .0001, unless the maximum of 
 100 iterations was reached. 
 
CENTROID I ACTOR ANALYSIS 
 
 I . General Description 
 
 CENTROID FACTOR ANALYSIS computes a set of f linearly independent 
 vectors (factors' 1 which are mutually uncorrelated. Normally, a factor 
 analysis decomposes a matrix of correlations, R , into a set of f 
 factors. The factors are arrayed as column vectors in the factor matrix, 
 F, such that 
 
 ^ = FF ' + R (n-f) 
 
 th 
 where R n _f is the matrix of residual effects. The K factor is computed 
 
 by dividing the column sums of R , by the square root of the total sum of 
 
 elements of R n _i. 
 
 < h V Jxz r, , (k: 
 
 f i k = Z r v ' / 
 
 11, k l r / - rr '1,3 
 
 13 
 
 Between each factor extraction, the variables in the residual matrix 
 are successively reflected until all the column sums are positive. 
 
 For more detailed discussion see: 
 
 1. L. L. Thurstone, Multiple Factor Analysis , Chicago, 
 University of Chicago Press, 19^7, pp. 1^9-175- 
 
 2. Harry Harmon, Modern Factor Analysis , Chicago, 
 University of Chicago Press, i960, pp. 192-215- 
 
 II . Restrictions 
 
 The input matrix for the CENTROID program must not exceed the dimen- 
 sions of 190 x 190. The input matrix is further limited to being a square, 
 positive definite or semi-definite, symmetric matrix. Commonly, correlation, 
 covariance, or cross-product matrices are used as input data. Any attempt 
 to introduce communality estimates (change the diagonal elements) must be 
 made before data is passed to the CENTROID program. A set of communalities, 
 which are incorrectly estimated, can make the matrix non-positive and could 
 conceivably cause a hang-up. 
 
 The input data may come from any storage medium which conforms to SOUPAC. 
 Similarly, the output codes follow the established conventions and are at 
 the option of the user. 
 
 The input matrix may be completely factored (i.e., N factors from a 
 N variable matrix") . However, factoring may be stopped by any of three 
 criteria: 
 
 1. The user may specify the number of factors to be extracted. 
 This criterion provides an upper limit beyond which factoring 
 will not be done. Consequently, it is advisable to put the 
 maximum value on this limit in cases where it is not the 
 primary criterion. (Set it equal to the number of variables). 
 
CENTROID 
 Page 2 
 
 2. The per cent of total variance removed from the R matrix is 
 a second limiting criterion. This parameter also specifies 
 an upper limit to the process. Therefore, it should be set 
 at 100 per cent unless it is the criterion for stopping. 
 
 3. The last criterion is to stop when the factor contribution falls 
 below 1. The use of this procedure is dictated by the presence 
 or absence of its associated parameter. 
 
 If all three criteria are used simultaneoulsy, factoring will be stopped by 
 whatever criterion is met first. 
 
 III. Parameters 
 
 Following the program anme on the program call card come the parameters 
 needed by the program. The parameters must appear in the order below: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address. CARDS or SEQUENTIAL 1-15- 
 
 2 Output Address. SEQUENTIAL 1-15 and/or PRINT. 
 
 3 Maximum number of factors to be extracted . 
 This must be less than or equal to the 
 order of the input matrix. 
 
 k Per cent of total variance to be removed 
 
 expressed as an integer between and 100. 
 
 5 The presence of any number greater than 
 in this parameter indicates that factoring 
 should stop when the factor contribution 
 falls below unity. 
 
 6 Output Address of Residual Matrix. 
 
 If parameters 3 and k are left blank then by default option they will be . 
 set to maximum possible values and a message will be printed. 
 
 Residual Matrix must be stored before it can be printed. 
 
 Example: Assume that you have 77 variables and that the correlation matrix 
 is stored on SEQ 1 , then legal forms of CENTROID call statement may be: 
 
 CENTR0ID(SEQ l) (PRINT) (77) (100) (l) . 
 
 CENTROID (SEQ 1)(P(F))(50)(80). 
 
 CENTROID (SEQ l)(SEQ 2/P) (20) (100) (l) (SEQ 3/P) . 
 
 CENTROID (SEQ l)(SEQ 2/P(F) H±5) (90) (SEQ 3/P(F) ) . 
 
 CENTROID (SEQ l) (P) . In this case, number of factors = 77 
 
 and per cent of variance = 100 will be assumed by default. 
 
COMMUNALITiT ESTIMATION 
 
 I. General Description 
 
 Five methods of COMMUNALITY ESTIMATION are offered in this program. 
 In each case the estimates replace the diagonal elements of the matrix. 
 They are as follows : 
 
 Code Number Method 
 
 1 The element of largest absolute magnitude in 
 each row replaces the diagonal element of the 
 row. 
 
 2 The square of the multiple R of each variable 
 with all others replaces the diagonal entry 
 
 for that variable. (See Special Comment Number 2) 
 
 3 Communalities produced from another analysis 
 and are to be input from cards or another 
 storage medium. 
 
 N 2 l/2 
 
 k For each row (N) ((z r ± ^)/N) 7 " 
 
 replace the diagonal entry for that row. 
 This is the square root of the average 
 square across the row. 
 
 5 For each row (Nj (r*.)^ - r*^/^ - r*^ 
 
 replaces the diagonal entry for that row 
 where : 
 
 r* = max ab s ( r . . ) and 
 lk ij 
 
 S. = Z abs (r. .), S = Z abs (r. .) 
 
 J J 
 
 This method of COMMONALITY ESTIMATION is due to 
 Professor L. Tucker. 
 
 II. Restrictions 
 
 Input is restricted to correlation matrices of order 150 or less. 
 
 TTT. Parameters 
 
 The parameters for the COMMUNALITY ESTIMATION program appear on the 
 program call card. They must follow the program name in this order: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address. CARDS or SEQUENTIAL 1-15- 
 
IV. 
 
 Parameter 
 Number 
 
 2 
 
 3 
 
 k 
 
 Special Comments 
 
 COMMONALITY ESTIMATION 
 Page 2 
 
 Use or Meaning 
 
 Output Address. SEQUENTIAL 1-15 and/or PRINT. 
 Section Code Number. (See General Description ' 
 Input Address if Option 3 is used. 
 
 (1) If the correlation matrix and communalities both are input from cards, 
 the correlation matrix precedes the communality estimations. (See Code 
 Number 3 in the General Description '! . 
 
 (2) If the input correlation matrix is singular, or very nearly so, 
 squared multiple correlations computed by standard procedures may be subject 
 to considerable error, and will usually exceed unity for several variables. 
 For this reason, the user should be aware of the characteristics of the 
 matrix. The program will check to see that all R are less than or equal 
 
 to 1.0. If this is not the case, execution will cease and a message will 
 be printed stating that the correlation matrix is virtually singular. 
 There is an alternative procedure devised by Ledyard Tucker for finding 
 R in singular matrices. Information is available in the SOUPAC office. 
 
 V. Reference 
 
 Harmon, Harry: Modern Factor Analysis , Chicago, University of Chicago 
 Press, i960, pp. 83-90. 
 
ITERATIVE FACTOR ANALYSIS 
 
 I. General Description 
 
 A. Procedural 
 
 This routine, upon option, provides one of four iterative factor- 
 ization methods : 
 
 1. Alpha factor analysis (AFA, Kaiser, 1962) 
 
 2. Canonical factor analysis (CFA, Rao, 1955, Harris, 1962) 
 
 3- Stepwise maximum likelihood factor analysis (MLFA, Lawley, 19^0) 
 ^4. Iterative principal axis factor solution (IPRAX, Traditional) 
 
 All four methods have in common that communalities and factor loadings 
 are estimated simultaneously. In three cases (AFA, CFA, IPRAX) the 
 number of factors decision can be made beforehand by the user, or it 
 can be left to the program, in which case appropriate modifications of 
 Guttman's lower bound criterion will be used. 
 
 The four methods differ from each other in theory with respect to the 
 defining criterion of optimization, and consequently they differ tech- 
 nically with respect to the matrix that is diagonalized in each case. 
 
 1. AFA (Kaiser) 
 
 Optimization criterion : maximize the alpha- reliabilities 
 (Cronbach) of the retained factors. If the number of factors 
 decision is left to the program, the Kaiser modification of 
 the Guttman criterion will be used and all factors with 
 positive alpha-reliability will be iterated upon. 
 
 The diagonalization is on the matrix C in 
 
 C = H" 1 (R - U 2 ) H" 1 so that C = Q0Q' 
 
 o 
 where R is an nxn input matrix of covariances, H is a diagonal 
 
 matrix of communalities, U = I - H^ is a diagonal matrix of 
 
 uniquenesses, Q is an nxm matrix of latent vectors corresponding 
 
 to the m largest latent roots in Q which are used to recompute 
 
 new estimates of H 2 through F = H Q 0V 2 . An initial set of H 2 
 
 is provided by I - (diag(R"-'-) )"-'- which is equivalent to the 
 
 squared multiple correlations of R if R itself is a correlation 
 
 matrix. 
 
 Invar iance under scaling : Kaiser has shown that the resulting 
 factors will be invariant under scaling, i.e., if a covariance 
 matrix R gives rise to a factor matrix F then the covariance matrix 
 SRS will give rise to a factor matrix SF (S diagonal). 
 
 Behavior of latent roots, alpah reliabilities : the n-m rejected 
 roots of C add to zero at each state (i.e., C is non-Gramian), 
 the m accepted roots are simple functions of the alpha-reliabilities 
 of the retained factors in F. These reliabilities will be output 
 by this sub -program. 
 
ITERATIVE FACTOR ANALYSIS 
 
 Page 2 
 
 2. CFA (Rao, Harris) 
 
 Optimization criterion : maximize the correlations between m 
 linear combination of the common parts of the variables with m 
 linear factors that are canonically correlated (Hotelling) with 
 the variables in the common factor space. If the number of 
 factors decision is left to the program, the Harris modification 
 of the Guttman criterion will be used, leading to a Gramian 
 R-U of minimum rank . 
 
 The diagonalization is on the matrix 
 
 C = U -1 (R - U 2 ) IT 1 so that C = Q9Q' 
 
 where F - UQ.W-V is used to recompute new estimates for U 2 , 
 retaining the m largest roots of C in 9. The notation is the 
 same as in section 1 (AFA) . An initial set of U 2 is provided by 
 [diag (R- 1 )]- 1 . 
 
 Invariance under scaling : the resulting factors are again invariant 
 under scaling as defined in section 1 (AFA" 1 . 
 
 B ehavior of latent roots : Chi- square criterion: Rao has shown 
 that the n-m rejected roots approach unit at convergence. For 
 exact rank m data they will be "exactly" unity within the tol- 
 erance of the convergence criterion ETA (see section III - B) . 
 For data containing random error their departure, from unity 
 provides a likelihood ratio test for the hypothesis that the 
 population matrix P-V 2 = GG' , where P, V, G are population 
 parameters corresponding to R, U, F in the sample, is rank m 
 or less. A criterion for this test is computed by this sub- 
 program which can be compared with two chi-square approximations 
 which are also output by this sub-program. Note that such a 
 chi-square test is valid only if the iterative process has indeed 
 converged, as indicated by the maximal discrepancy between trial 
 vectors which is printed out for that purpose. 
 
 3- MLFA (Lawley) 
 
 The CFA variant of the program can be used for a step-wise maximum 
 likelihood factorization in the Lawley-Rao sense. 
 
 Optimisation criterion : maximize the likelihood function corre- 
 sponding to the multivariate normal distribution with covariance 
 matrix parameters P=GG' + V 2 (as defined in section 2, CFA), given 
 the sample matrix R, under choice of G and V and observing the 
 side-conditions that P - V^ is Gramian and V diagonal with < Vj_ < 1 . 
 
 The diagonali zation is the same as in CFA, section 2, hence, the 
 resulting factors are again invariant under scaling as defined in 
 section 1 (AFA) . 
 
ITERATIVE FACTOR ANALYSIS 
 
 Page 3 
 
 In contrast to CFA, however, the number of factors decision is 
 made on statistical ground. The user would start with a reason- 
 able guess for m (preferably m<n/2 to ensure positive degrees 
 of freedom for the chi-square test, which otherwise will be by- 
 passed) . After convergence has been obtained, the user would 
 insepct the chi-squared statistic. If the statistic is below the 
 table values of the chosen porbability level (.05 or .01), then 
 the hypothesis can be accepted at this level with corresponding 
 risk and the user has the option to reduce m for a second run, 
 etc. On the other hand, if the adjusted statistic exceeds the 
 table value, then m must be raised until the adjusted statistic 
 warrants acceptance of the hypotheses. 
 
 Within the package the user is free to re-enter the routine 
 repeatedly with sequentially de- or increasing m specified on 
 the call card. Since the test assumes convergence, it is per- 
 tinent that the number of iterations be allowed large enough 
 for convergence to occur within the chosen tolerance bound ETA 
 (see section III - B) . 
 
 k. IPRAX (traditional 1 ' 
 
 Optimization criterion : none 
 
 The diagonalization is on the matirx 
 
 2 
 C = R - U so that C = Q9Q,' 
 
 where F = Q0 1 / 2 is used to recompute H 2 = I - U 2 , retaining 
 the m largest roots in 6. The notation is the same as in 
 section 1 (AFA) . An initial set of H 2 is provided by the 
 identity matrix. If the number of factors decision is left 
 to the program, the unmodified Guttman criterion will be used, 
 i.e., all factors corresponding to roots of the input matrix 
 R which exceed unity will be retained. 
 
 Invariance under scaling : as defined in section 1 (AFA) is 
 
 not obtained by this method. 
 
 The behavior of the latent roots is not known at present. Wo 
 statistical or other significance can be attached to the m 
 largest or n-m smallest root of C. 
 
 5- Both covariance matrices and correlations matrices are accept- 
 able as input. If covariances are used the tenth parameter 
 should be 1. In this case the covariance matrix is scaled into 
 a correlation matrix, and all computations, in particular the 
 number of factors decision, are based on this correlation matrix. 
 At the final stage the factors are scaled back so as to account 
 for the covariance matrix which was input. The matrix of residuals 
 is computed in the metric of the covariances. 
 
ITERATIVE FACTOR ANALYSIS 
 
 Page k 
 
 References : 
 
 Guttman, L. "General Theory and Methods for Matrix Factoring," 
 Psychometrika , 1955, IX, 1-16. 
 
 Harris, C. L. "Some Rao-Guttman Relationships," Psychometrika , 
 
 1962, xxvii, 2U7-263. 
 
 Hotelling, H. "The Relation of The More Multivariate Statistical 
 Methods to Factor Analysis," British Journal of Statistical 
 Psychology , 1952, X, 69-79- 
 
 Kaiser, H. F. and Caffres, T. Psychometric Method of Factor 
 Analysis . Unpublished manuscript, 1963- 
 
 Lawley, D.N. "The Estimation of Factor Loadings by the Method 
 of Maximum Likelihood," Procedures of the Royal Society of Edin ., 
 19U0, LX, 6k-82. 
 
 Rao, C. R. "Estimation and Tests of Significance in Factor 
 Analysis," Psychometrika , 1955, XX, 93,111. 
 
 II. Restrictions 
 
 Input is restricted to matrices of order 100 x 100 or less. Up to 
 50 factors can be handled by this program. If the number of factors 
 decision is left to the program and more than 50 factors are estimated, 
 an appropriate message will be printed out and control will be returned 
 to the system. 
 
 III. P arameters 
 
 The program name is ITERATIVE FACTOR ANALYSIS. After the name on 
 the call card the parameters must appear in the following order: 
 
 Parameter 
 N umber Use or Meaning 
 
 1 Input Address of data matrix. CARDS or 
 
 SEQUENTIAL 1-15. 
 
 2 Output Address of data matrix. SEQUENTIAL 1-15 
 and/or PRINT. 
 
 3 Output Address for principal axis factors. 
 SEQUENTIAL 1-15 and/or PRINT. 
 
 h Output Address of residual matrix. SEQUENTIAL 1-15 
 
 and/or PRINT. 
 
 5 Option code if IPRAX 
 
 1 if ALPHA 
 
 2 if CANONICAL 
 
 3 if STEP-WISE MAXIMUM LIKELIHOOD 
 
ITERATIVE FACTOR ANALYSIS 
 
 Page 5 
 
 Parameter 
 Number Use or Meaning 
 
 6 Maximum number of cycles to be executed. 
 If left blank, 50 cycles will be used as 
 upper limit . 
 
 7 Number of factors to be extracted. If left 
 blank, all factors with roots exceeding unity 
 will be retained. 
 
 8 Exponent of convergence, n, where tolerance 
 ETA = 10" n . If left blank n = 3 or ETA = 10"3. 
 If all goes well, the program will stop as soon 
 as either one of the stopping criteria is met. 
 Error stops, if they occur, are labelled 
 accordingly. 
 
 9 Sample size (for CFA only). If left blank, 
 the chi-square computations are by-passed. 
 
 If specified, chi-square is computed with the 
 sampel size. 
 
 10 1 if input matrix was a covariance matrix. 
 
 A. Output common to all four sub -programs 
 
 1. Matrix output within system conventions: 
 
 a. R (input covariance matrix ) 
 
 b. F (factor matrix) 
 
 c. R-FF' (residual matrix) 
 
 2. Vector output, print only: 
 
 a. communality vector (last iteration) 
 
 b. vector of latent roots of C (last iteration) 
 
 3- Constant, print only: 
 
 a. number of iterations completed 
 
 b. largest discrepancy between trial vectors 
 (H^, U , H"- 1 -, depending on sub-program) 
 
 c. root mean square of off -diagonal residual matrix 
 
 d. per cent of variance removed 
 
 B. Additional output specific to sub-programs 
 
 AFA: The alpha-reliabilities of the m retained factors 
 
 CFA: chi-square statistic, chi-square appriximations (Wilson, 
 Hilferty) for p = .05 and p = .01, for comparison with 
 statistic. Degrees of freedom. 
 
ITERATIVE FACTOR ANALYSIS 
 
 Page 6 
 
 IV. Special Comments 
 
 The accuracy should be approximately 6 digits in computations, 
 possibly somewhat lower for a very large number of iterations. The 
 effective accuracy depends on the chosen tolerance ETA and the actual 
 convergence as indicated by the largest discrepancy between trial 
 vectors. The chi-square approximations are within 2 x 10"^ for more 
 than 8 degrees of freedom. 
 
JACOBI 
 
 General Description 
 
 This program calculates eigenvalues and eigenvectors of a square, 
 symmetric matrix, using the JACOBI rotating technique. This program is 
 limited to matrices of 110 rows and 110 columns. The user should realize 
 that this technique is extremely slow on large matrices, while the program 
 takes no longer than PRINCIPAL AXIS FACTOR ANALYSIS for small matrices 
 (up to 20 x 20). 
 
 II, 
 
 Parameters 
 
 Parameter 
 Number 
 
 Use or Meaning 
 
 Input Address of correlation matrix, 
 or SEQUENTIAL 1-15. 
 
 CARDS 
 
 2 
 
 3 
 k 
 
 Output Address of eigenvectors. 
 
 Output Address of principal axis factor. 
 
 Output Address of eigenvalues, stored as a row 
 vector. PRINT is not valid . The eigenvalues 
 are always printed. 
 
 Number of eigenvectors (or factors) to be out- 
 put. 
 
 Special Comments 
 
 The eigenvalues are stored in descending algebraic order (from largest 
 to smallest), and the eigenvectors and factors are placed in the same order. 
 
 IV. Reference 
 
 Ralston, A. and Wilf , H. S. : Mathematical Methods for Digital Computers , 
 John Wiley and Sons, New York, 196i+. 
 
OBLIMAX rotation 
 
 I. General Description 
 
 The OBLIMAX OBLIQUE ROTATION transforms a set of factors F to a new 
 set V such that the factor kurtosis, 
 
 k 
 
 ZZ V. . • -i o 
 K _ ij i = 1, 2, , n 
 
 , 2,2 j = 1, 2, , k 
 
 (ZE v. . ) J ' ' ' 
 
 ID 
 
 is at a maximum. 
 
 The purpose of the transformation is to attempt to rotate analytically 
 to a position similar to that obtained by applying Thurstone's rules for 
 simple structure. (See Multiple Factor Analysis , L. L. Thurstone, 19^7j 
 pp. 319-^10.) However, Thurstone's rules and the oblimax procedure are 
 not the same, and it is too much to expect that results obtained from 
 both procedures will agree exactly. 
 
 It would be desirable to solve directly for the transformation matrix 
 T, but unfortunately no solution to this problem has been found. Instead 
 oblimax takes two vectors at a time, solves for the rotational angles, 
 transforms the vectors, and then selects another pair until all k(k-l) 
 pairs have been rotated. This process is repeated iteratively until the 
 criterion K no longer increases. Despite the pairwise procedure, K is 
 well behaved, and in general, approaches steadily to a minimum. 
 
 For any pair of factors, a and b, the solution proceeds as follows: 
 
 K 
 
 ZZ(a. cos 0. + b. sin 0.) EZ(a. + b. X.) 
 
 i r 3 i r 3 = i i J 
 
 [zz(a. cos 0. + b. sin 0.) 2 ] 2 [zz(a. + b. X.) 2 ] 2 
 
 The derivative of K a j-, is set equal to zero, resulting in a quartic 
 equation in X which is tan 0. Two solutions for X will be maxima, and, 
 each X is found, the sign of the second derivation is inspected to select 
 maxima. A small transform (2 x 2) is created, but before post-multiplication 
 is performed, the transforms must be adjusted so that when it becomes a part 
 of T, t , and t^, will remain normalized. In this way, both B and T are 
 developed pair by pair. 
 
 For references see: 
 
 Pinzka, C, and Saunders, D. R., "Analytic Rotation to 
 Simple Structure II: Extension to an Oblique Solution." 
 Research Bulletin RB-5U- 31 • Princeton, N. J.: Educational 
 Testing Service, 195^- 
 
 II. Alternate Use 
 
 If the user already has a transformation matrix, he may use it to com- 
 pute V rs et . al by giving the input address of T in parameter 8; in this case, 
 
OBLIMAX ROTATION 
 Page 2 
 
 the oblimax calculation of T will be skipped. If both F and T are to be 
 input from cards, then the data deck of F should precede the deck of T. 
 
 III. Output 
 
 The OBLIMAX program always prints the following (unless parameter 8 
 is used) : 
 
 1. The value of K for each pass 
 
 2. The iteration time 
 
 It outputs the following on demand (See Parameters) : 
 
 1. Transformation matrix T 
 
 2. Reference vector structure, V rs = FT 
 
 3- Reference vector correlations, C rs = T'T 
 
 k. Diagonal of D and of D" 1 
 
 where D is the diagonal matrix of the reciprocal 
 square root of the diagonal elements of C rs 
 
 5. Primary factor pattern, Vf. p = FTD = V rs D 
 
 6. Primary factor correlations, C-^p = DC ~-4) 
 All data is printed out to seven decimal places. 
 
 IV. Restrictions 
 
 The number of variables plus the number of factors must be no more 
 than 300. 
 
 V. Note on Parameter 2 
 
 If row normalization is specified, the normalization constants will be 
 preserved and the rows will be re scaled to proper length after rotation and 
 prior to output. 
 
 VI . Parameters 
 
 The program name, OBLIMAX, appears first on the program call card 
 and is followed by the following parameters. Any output option (except 
 parameter 9) may be SEQUENTIAL 1-15 and/or PRINT; it may be left blank 
 if not desired. 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address of F. CARDS or SEQUENTIAL 1-15- 
 
OBLIMAX ROTATION 
 Page 3 
 
 Parameter 
 Number Use or Meaning 
 
 2 If rows are to be normalized before rotation, 
 punch a 1; otherwise a zero or leave blank. 
 (See Note on Parameter 2 ) . 
 
 3 Output Address of T. 
 h Output Address of V rs 
 
 5 Output Address of C 
 
 6 Output Address of V fp 
 
 7 Output Address of C„ 
 
 8 Input Address of T (See Alternate Use ) . 
 
 9 D- value and Inverse of D. PRINT only. 
 
PRINCIPAL AXIS FACTOR ANALYSIS 
 (Eigenvalues and Vectors) 
 
 General Description 
 
 The purpose of PRINCIPAL AXIS FACTOR ANALYSIS is to determine a 
 factor matrix, F, given a Gramian matrix, R, of order n such taht 
 
 F (n,f) F '(f,n) = R*( n ,n) 
 where R* is an approximation to R. 
 
 The column vectors of F are defined as the factors (measures of 
 dimensionality) of the original matrix, R. The solution for the matrix 
 F is the classical eigen problem. Consequently, the computations are 
 done by an eigenvalue subroutine. Before output the eigenvectors, Ej, 
 are scaled as follows : 
 
 F(I,J) = E(l,j)*LAMBDA(j)**.5 
 
 for I = 1, ....,n. J = 1, ....,n. 
 
 to generate the principal axis factors, F. (See Introduction on Factor Analysis] 
 
 For a more detailed discussion see: 
 
 Harry Harmon, Modern Factor Analysis , Chicago, University of 
 Chicago Press, i960, pp. 15U-191. 
 
 Restrictions 
 
 The input matrix for the PRINCIPAL AXIS program must not exceed the 
 dimensions of 190 x 190 double precision. The input matrix is further 
 limited to being a square, symmetric matrix. Generally correlation, 
 covariance, or cross-product matrices are used as input data. It should 
 be noted that matrices with large numerical entries such as cross-products 
 may generate output values which cannot be printed under the fixed out- 
 put formats. The probability of this happening is very small. Any com- 
 munality estimation (i.e., change in the diagonal entries of R) must be 
 done prior to the input of R, to the PRINCIPAL AXIS program. 
 
 If the communality estimates are used, the user should check the 
 resulting roots for negative numbers. If any exist the associated vector 
 is meaningless. 
 
 The input data may come from any source conforming to SOUPAC. Similarly, 
 the output codes follow the established conventions and are specified at the 
 option of the user. 
 
 The R matrix may be completely factored (i.e., N factors from N vari- 
 able matrix) . However, there are three criteria which may be used to stop 
 the factoring : 
 
PRINCIPAL AXIS FACTOR ANALYSIS 
 
 Page 2 
 
 1. The user may specify the number of factors to be extracted. 
 This criterion provides an upper limit beyond which factoring 
 will not proceed. Therefore, it is necessary to put the 
 maximum value in this limit in cases where it is not the 
 primary criterion. 
 
 2. The percentage of total variance removed from R is the 
 second limiting criterion. This parameter also specifies 
 an upper limit to the process. Therefore, it should be set 
 at 100 per cent unless it is the criterion for stopping. 
 
 3- The last criterion is to stop when the factor contribution 
 (eigenvalue or root) falls below 1. The use of this pro- 
 cedure is dictated by the presence of its parameter. 
 
 If all three criteria are employed simultaneously, factoring is stopped 
 by whichever criterion is first met. 
 
 III. Parameters 
 
 The parameters for the PRINCIPAL AXIS program appear on the program 
 call card. They must follow the program name in this order: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address. CARDS or SEQUENTIAL 1-15- 
 
 2 Output Adddres. SEQUENTIAL 1-15 and/or PRINT. 
 
 3 Maximum number of factors to be extracted. 
 This must be less than or equal to the 
 order of the input matrix. 
 
 h The percentage of total variance to be 
 
 removed expressed as an integer between 
 and 100. 
 
 5 The presence of a number greater than 
 indicates the factoring should stop when 
 the eigenvalues (roots) fall below unity. 
 
 6 Output Address of Eigenvectors 
 
 The address of where eigenvalues are to be 
 placed as a row vector if they must be 
 stored for further use. If values need 
 not be saved, leave parameter blank. PRINT 
 is not valid. 
 
PRINCIPAL AXIS FACTOR ANALYSIS 
 
 Page 3 
 
 Parameter 
 Number 
 
 Use or Meaning 
 
 Mode of sorting eigenvalues and associated 
 vectors. The codes are as follows: 
 
 Code 
 
 Meaning 
 
 
 
 1 
 
 2 
 
 10 
 
 11 
 12 
 
 Descending algebraic order 
 Descending absolute values 
 Order of extraction 
 Ascending algebraic order 
 (the k smallest root) 
 Ascending absolute values 
 Reverse order of extraction 
 
 Leaving any parameter blank is the same as specifying zero. Con- 
 sequently, options which are not needed can be avoided by leaving the 
 associated parameter blank. 
 
PROCRUSTES (Oblique Case) 
 
 General Description 
 
 This program offers 3 options: 
 
 1. (Oblique) Procrustes . Given A, B, the program solves 
 
 AT* = B + E 
 
 for T* in a least square sense (i.e., minimizing tr[E'E]), 
 so that 
 
 T* = (A'Al^A'B, 
 
 and then normalized T* by columns to yield T = T*D so that 
 
 diag (T'T) = I. It then computes AT which, in a loose sense, 
 
 can be regarded as a least squares fit to A to B under the 
 
 restriction that diag (T'T) = I. It also provided 
 
 C f = D n (T'T) _1 where D n is a normalized diagonal matrix 
 
 so that diag (Cf) = I. If D gave the cosines between tests 
 
 C f will give the factor intercorrelations. A has to be a 
 
 full column rank. 
 
 Dwyer Extension Analysis . Given F = R tc , a centroid or 
 equivalent matrix of cosines between tests t and uncorrelated 
 factors c, and L = R cn , a matrix of cosines between uncorre- 
 lated factors c and uncorrelated reference vectors n, this 
 program computes 
 
 Q = T tn = F(F'F)~ 1 L 
 
 which is used as a post-multiplier on some correlations 
 matrix R et between the tests t x in F and some set of 
 extension variables e given R en , the cosines of the extension 
 variables e with reference n, to the extent that the former 
 can be projected into the sub -space spanned by the latter. 
 This multiplication 
 
 R en = R et T tn 
 can be performed by use of the MATRIX program. 
 Left Inverse (transposed). Given A, the program will return 
 
 Q = A(A'A)" 1 
 
 provided A was a full column rank. Q is the transposed left 
 inverse of A which can be used in lease squares application. 
 
 IP 
 
 Restrictions 
 
 Input is restricted to matrices (A, B, or F) of order 190 x 50 or less, 
 
PROCRUSTES 
 Page 2 
 
 III. Parameters 
 
 The parameters for this program appear on the program call card. 
 They must follow the program name in this order: 
 
 Parameter 
 Number 
 
 Use or Meaning 
 
 Procrustes 
 
 LEA 
 
 LINV 
 
 1 
 
 Input Address 
 CARDS or 
 SEQUENTIAL 1-15- 
 
 A 
 
 F 
 
 A 
 
 2 
 
 Input Address 
 CARDS or 
 SEQUENTIAL 1-15- 
 
 B 
 
 L 
 
 
 3 
 
 - Output Address 
 SEQUENTIAL 1-15 
 and/or PRINT. 
 
 A 
 
 F 
 
 A 
 
 Output Address B L 
 SEQUENTIAL 1-15 
 and/or PRINT. 
 
 Output Address T Q A(A'A)' 
 SEQUENTIAL 1-15 
 and/or PRINT. 
 
 Output Address C~ 
 SEQUENTIAL 1-15 
 and/or PRINT. 
 
 Output Address AT 
 SEQUENTIAL 1-15 
 and/or PRINT. 
 
 Output Address E 
 SEQUENTIAL 1-15 
 and/or PRINT. 
 
 Choice Address 12 
 
SQUARE ROOT FACTOR ANALYSIS 
 
 I. General Description 
 
 The SQUARE ROOT method of factor analysis, also called the Diagonal 
 Method, by L. L. Thurstone, decomposes a correlation matrix R (or any 
 other positive semi-definite or definite symmetric matrix) such that 
 
 R = F F' + R(k+1) 
 
 where R(k+l) is the residual matrix after extracting k factors. Of course 
 if all n factors are extracted, the residual matrix becomes a null matrix. 
 
 The factor fj is computed by dividing each element of the j^h column 
 of R by its diagonal square root : 
 
 f^ := r i -/v/"r 
 
 ij 
 
 33 
 
 (i = 1,2, 
 
 The matrix A = fj'f" is then subtracted from R and the operation repeated 
 on the residual matrix. 
 
 Prior to the widespread use of high speed computers, the SQUARE ROOT 
 method was sometimes used as a substitute for the PRINCIPAL AXIS or CENTROID 
 method due to the relative ease of computing a square root factor. When 
 used in this way, one seeks to extract the maximum variance for each factor, 
 in which case Parameter h should be blank. The program then selects 
 the next column on the basis of the largest residual column sum of squares. 
 
 Nowadays, however, the SQUARE ROOT method is more likely to be used 
 for special purposes. By selecting successive pivot variables, the user 
 retains control over the factoring. Factors are passed directly through 
 the test variables and the effect of these variables is removed from the 
 matrix. The communalities or row sums of squares are the squared multiple 
 correlations of the remaining variables with the pivot variables. 
 
 The pivots selected may be any columns in the matrix. Let us assume, 
 however, that these are adjacent to each other in the upper right hand 
 corner of the partitioned matrix below: 
 
 R = 
 
 R PP R ps 
 
 R Poo 
 
 sp ss 
 
 Then the effect of pivoting successively on the variables in the upper 
 right hand corner is shown by the residual matrix as follows: 
 
 R (P+1) = R-F r F ' 
 
 
 
 R ^R R "J-R 
 ss sp pp ps 
 
 II. Restrictions 
 
 ) 
 
 A. 
 
 Dimension 
 
 Maximum size of the R matrix is 190 variables. 
 
SQUARE ROOT FACTOR ANALYSIS 
 
 Page 2 
 
 B. Special Conditions 
 
 1. The researcher may specify the extraction of any number of 
 factors up to dimension of R. 
 
 2. The researcher may specify the diagonal element to be used 
 
 in the extraction of each factor, or he may have the procedure 
 remove the maximum variance each time. 
 
 3. The residual matrix may be saved if the researcher desires. 
 
 III. Parameters 
 
 Following the program name the parameters must appear in the following 
 order on the program call card: 
 
 Parameter 
 Number 
 
 2 
 3 
 h 
 
 Use or Meaning 
 
 Input Address. CARDS or SEQUENTIAL 1-15- 
 (Correlation or positive definite or semi- 
 definite matrix). 
 
 Output Address. SEQUENTIAL 1-15 and/or PRINT. 
 
 Number of factors extracted. 
 
 Input Address for diagonal elements. 
 CARDS or SEQUENTIAL 1-15- 
 
 Output Address for residual matrix. 
 SEQUENTIAL 1-15 and/or PRINT. 
 
 IV. Special Comments 
 
 If the diagonal element for each factor is specified, and if both 
 input addresses are cards, then data precedes diagonal specification. 
 
 V. Example 
 
 Assume you have a 20 x 20 correlation matrix on cards and that you 
 want to extract 15 factors ; also you are reading the pivot column from 
 cards. The program would be set up as follows: 
 
SQUARE ROOT FACTOR ANALYSIS 
 
 Page 3 
 
 /*ID 
 
 // EXEC SOUPAC 
 //SYS IN DD * 
 SQU(C)(P)(15)(C)(P). 
 END SOUPAC 
 DATA(20)(8F9-T) 
 
 data 
 
 END # 
 DATA(15)(15I2) 
 
 END # 
 
 /* 
 
 diagonal specification card(s) 
 
THREE-MODE FACTOR ANALYSIS 
 
 I. General Description 
 
 A. GENERAL COMMENTS 
 
 This program provides a factor analytic solution for a 3-dimensional 
 i by j by k data matrix. The computational procedures employed are 
 those presented in Method III of Tucker's article (reference below). 
 This method provides most efficient analysis when one of the modes, 
 usually individuals is quite large, though this is certainly not a 
 necessary condition. 
 
 B. THE THEORETICAL MODEL 
 
 Here, d, j, and k represent the modes of classification which 
 are directly related to the observation of the data; i, j, and k are 
 thus termed observational modes. An example would be the observation 
 of scores for i individuals on j tests given under k different conditions. 
 
 Through factoring, we wish to reduce the observational modes i,j, and 
 k to corresponding derivational modes m, p, and q. Each of the deriva- 
 tional modes can be thought of as a set of factors in the domain of the 
 corresponding observational mode. The core matrix G then serves to 
 describe the relationships among the derivational modes. 
 
 The fundamental three-mode factor analysis model is represented by 
 the equation : 
 
 Z- ZEE 
 
 x - ., = a. b. a. g , 
 
 ljk m q p m jp kq mpq 
 
 where x. ., is an approximation to the observed score x. ... ; a. , b. , 
 ljk ** ljk 1m' jp' 
 
 and c, are entries in two-mode matrices .A , .B , and , C describing 
 kq l m 3 p 7 k q 
 
 the elements in the observational modes i, j, and k in terms of the 
 
 dimensions in the derivational modes m, p, and q respectively; the 
 
 coefficients g are entries in a three-dimensional matrix G and 
 
 represent the measures of the phenomenon being observed for each 
 
 combination of the dimensions of the derivational modes. 
 
 In matrix form, the model could be represented as: 
 
 .X.. = .A G, A B. % C. ) , 
 1 jk l m (pq) v p j qk' 
 
 where X indicates a Kronecker product. Matrices A, B, and C are 
 factor solutions for modes i, j, and k respectively which serve to 
 transform the core matrix G of the 3 derivational modes to the matrix 
 X representing the 3 observational modes. 
 
THREE MODE 
 FACTOR ANALYSIS 
 Page 2 
 
 C. INPUT DATA 
 
 The input data must be a Gramian matrix, usually correlations, 
 covariances, or cross-products, in the form .,R , where i is assumed to 
 be the largest mode and jk represents the combination mode, with 
 mode k nested within mode j. 
 
 II. Output 
 
 The output consists of the following: 
 
 (l) the .P. and CL matrices which represent the correlations, covariances, 
 or cross products within modes j and k respectively; 
 
 (2^ the eigenvalues and eigenvectors of R ; 
 
 Jk jk 
 
 (3) the eigenvalues and eigenvectors of .P.; 
 (h^ the eigenvalues and eigenvectors of ,0- 
 
 (5 ' the core matrix G , where m, p, and q represent the derivational 
 modes corresponding to observational modes i, j, and k respectively. 
 
 All of this is printed out and may also be stored on sequential storage 
 devices. The user must specify the number of factors to be extracted 
 from each of the three modes. This procedure is employed since the use 
 of other factor- stopping criteria (e.g. per-cent of variance accounted 
 for, or eigenvalues below unity^ could easily lead to the computation of 
 a great many useless factors as well as a very large and unmanageable 
 core matrix. The user is also cautioned against specifying large numbers 
 of factors since this would cause substantial increases in time required 
 to factor the various modes and compute the core matrix. 
 
 It is strongly recommended that the user be familiar with the Tucker 
 article and with factor analysis in general before attempting to use 
 this program. 
 
 III. Parameters 
 
 The program mnemonic (T-Ml appears first on the program card and is 
 followed by the following 16 parameters, the first 8 of which are required. 
 Output addresses are optional. All output will be printed. 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input address of R matrix in the form „ R., 
 
 Dk jk 
 
 (Cards or S1-S15). 
 
 2 Number of subjects or elements in mode i. 
 
 3 Number of variables in mode j. 
 
THREE MODE 
 FACTOR ANALYSIS 
 Page 3 
 
 Parameter 
 Number 
 
 It 
 
 5 
 6 
 
 7 
 8 
 
 9 
 10 
 11 
 12 
 13 
 11+ 
 15 
 16 
 IT 
 
 Use or Meaning 
 
 Number of variables in mode k. 
 
 Number of factors to be removed from matrix R (mode i), 
 
 Number of factors to be removed from matrix P (mode j ) , 
 
 Number of factors to be removed from matrix Q (mode k) 
 
 (1) If input matrix is cross-products 
 
 (2) If input matrix is correlations. 
 
 (3) If input matrix is covariances. 
 
 Output address for row vector of eigenvalues of R. 
 
 Output address for eigenvectors of R. 
 
 Output address for matrix .P • 
 
 Output address for row vector of eigenvalues of P. 
 
 Output address for eigenvectors of P. 
 
 Output address for matrix -.0,' 
 
 Output address for row vector of eigenvalues of Q. 
 
 Output address for eigenvectors of Q,. 
 
 Output address for core matrix G . 
 
 pq m 
 
 IV. Reference 
 
 Tucker, Ledyard R. Some mathematical notes on three-mode factor analysis, 
 Psychometrika, 1966, 31, 279-311. 
 
UNRESTRICTED MAXIMUM LIKELIHOOD FACTOR ANALYSIS 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address for correlation matrix. 
 
 SEQUENTIAL 1-15; CARDS are not permitted 
 
 2 Output Address for final unrotated factor matrix. 
 
 SEQUENTIAL 1-15- See also Parameter 13- 
 
 3 Input Address for row vector of initial 
 
 estimate of uniqueness. CARDS, SEQUENTIAL 
 1-15 (optional). 
 
 k Lower bound for number of factors. 
 
 5 Upper bound for number of factors . 
 
 6 Sample size (number of observations) on 
 
 which correlation matrix is based. 
 
 7 Maximum number of iterations. 
 
 8 Probability of chance occurance, i.e., *1.00*. 
 
 9 1 to print input correlation matrix and partial 
 
 correlation matrices after any variables have 
 been removed. 
 
 10 1 to print technical output. 
 
 11 1 to print intermediate results. 
 
 12 1 to punch unrotated factor matrices. 
 
 13 1 to apply a varimax rotation to all factor 
 
 matrices. If this parameter is used the output 
 of parameter 2 will be a rotated factor matrix. 
 
 This program has been taken directly from JSreskog (1967) with his per- 
 mission. Anyone interested in the methods is referred to the references listed 
 below. The program is temporarily limited to 75 variables and 30 factors. 
 Parameters 1, k, 5» 6, 7> and 8 are required. Parameter 8 must be enclosed 
 within asterisks, **, and must have a punched decimal point. 
 
 References : 
 
 Joreskog, K. G. UMLFA - a computer program for unrestricted maximum likelihood 
 factor analysis. Research Memorandum 66-20. Princeton, New Jersey: 
 Educational Testing Service. Revised Edition, 1967 • 
 
 JSreskog, K. G. Some contributions to maximum likelihood factor analysis. 
 Psychometrika , I967, 32, 3+1+3-U82 . 
 
VARIMAX FACTOR ROTATION 
 
 I. General Description 
 
 VARIMAX ROTATION is used to redistribute a factor matrix (principal 
 axis, centroid, etc. 1 ) variance so that the matrix approches orthogonal 
 simple structure. The varimax scheme maximizes the following criterion 
 function: 
 
 Z (hz(a,. 2 / h (j) 2 ) 2 - (z(a, , 2/h (j) 2 )) 2 ) 
 s 3 U ' S) j l3 ' S) 
 
 where j is the variable index number: 1, ., n 
 
 s is the factor index number: 1, , f 
 
 a/ . \ is the factor loading of the n^h variable on the s factor 
 
 hj is the j* variable communality 
 
 For further discussion see: 
 
 H.F. Kaiser, "Computer Program for Varimax Rotation in Factor 
 Analysis", Educational and Psychological Measurement , Vol. XIX, 
 Nov. 3, 1959, pp. 413-^207" 
 
 Cooley and Lohnes, Multivariate Procedures for the Behavioral 
 Sciences , New York, John Wiley and Sons, Inc., 1962, pp.l6l-3- 
 
 II. Restrictions 
 
 The input matrix for VARIMAX ROTATION must not exceed 190 variables 
 and 190 factors. The number of factors may be anything greater than or 
 equal to 2. Any factor matrix generated by a statistical system factor 
 analysis program is acceptable input. A matrix may also be entered from 
 cards. 
 
 III. Parameters 
 
 The parameters for the VARIMAX ROTATION appear on the program call 
 card. They must follow the program name in this order: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address. CARDS or SEQUENTIAL 1-15- 
 
 2 Output Address. SEQUENTIAL 1-15 and/or PRINT. 
 
 3 The presence of a number greater than in this 
 parameter indicated the communalities should be 
 printed. 
 
 4 or blank for normal VARIMAX. 1 if raw VARIMAX 
 is desired. 
 
 *If CARDS are used the DATA card must contain the number of rows as well 
 
 as the number of columns in the input matrix (see User's Guide for details) 
 SOUPAC (Statistically Oriented Users Programming and Consulting) 
 
ECONOMETRICS PACKAGE 
 
ECONOMETRIC REDUCED FORM AND RESIDUAL ANALYSIS 
 
 I- General Description 
 
 The ECONOMETRIC REDUCED FORM AND RESIDUAL ANALYSIS program calculates 
 the following : 
 
 (1) Residuals 
 
 U = (YX) fp7' 
 
 where : 
 
 (YX) is the raw data matrix of endogenous and exogenous variables 
 andjp/is the matrix of coefficient estimates. 
 
 (2) Durbin-Watson statistic each equation(i) 
 
 N 
 
 2 
 
 Z [Uj(t) _ Uj(t-1) 3 
 t=2 ' 
 
 Z [Ui(t)] 2 
 
 t=l 
 
 (3) Covariance matrix for residuals 
 
 w= :'p] (s)ip 
 
 where : 
 
 S is the raw data covariance matrix. 
 (U) Reduced form estimates 
 
 * = -'(P) -1 (r) 
 
 (51 Reduced form predicted values 
 y" = Yn 
 
 (6) Reduced form residuals 
 V = Y - Y 
 
 (7) Covariance matrix for reduced form residuals 
 
 (P" 1 )'w(p- 1 ) 
 
References 
 
 ECONOMETRIC REDUCED FORM AND 
 RESIDUAL ANALYSIS 
 
 Page 2 
 
 II, 
 
 Johnston, J., Econometric Methods , New York, McGraw-Hill Book 
 Company, Inc., i960. 
 
 Goldberger, Arthur S., Econometric Theory , New York, John Wiley 
 and Sons, Inc., 196U. 
 
 Restrictions 
 
 Only those inputs used in the calculations called for need be given. 
 They must be in the following formats : 
 
 (1) Coefficients: 
 
 The coefficient matrix for K equations with N variables, Nl 
 exogenous and N2 endogenous, must be a K by N+l matrix. Each 
 row corresponds to an equation. The first element in each row 
 is the constant term followed by the coefficients matrix (i.e., exog- 
 enous coefficients first; endogenous coefficients next). In 
 each row, there must be -1 which corresponds to the endogenous 
 variable that was normalized on. 
 
 (2) Raw Data: 
 
 The data must be arranged so that exogenous variables occur 
 first and endogenous variables last. (The TRANSFORMATION program 
 may be used to arrange data in this way, if it is not already 
 like this^ . 
 
 (3) Raw Data Covariance Matrix: 
 
 The covariance matrix must have the following form 
 
 sample 
 size 
 
 (standard deviations) 
 
 X ( 
 
 means , 
 
 Covariance (exogenous first, endogenous last) 
 
 Care should be taken to see that an input address is specified for any data 
 needed in calculating the desired statistics and that any intermediate 
 statistics needed are stored (i.e., an output address besides print is 
 specified ). The following list indicates which previous statistics are 
 needed in the calculation of each statistic. 
 
ECONOMETRIC REDUCED FORM AMD 
 RESIDUAL ANALYSIS 
 
 Page 3 
 
 1. Residuals - coefficients and raw data 
 
 2. Dubrin-Watson statistic - coefficients and raw data 
 
 3. Covariance matrix for residuals - coefficients and raw 
 data covariance matrix 
 
 k. Reduced form coefficients - coefficients 
 
 5. Reduced form predicted values and residuals - reduced form 
 coefficients (no output address) and raw data 
 
 6. Covariance matrix of reduced form residuals - reduced form 
 coefficients and covariance matrix of orginal residuals 
 
 This program is restricted to less than 150 variables. 
 
 III. Parameters 
 
 The parameters appear on the program card following the name ECONO- 
 METRIC in the following order: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address for coefficients. SEQUENTIAL 1-15. 
 
 2 Input Address for raw data covariance matrix. 
 SEQUENTIAL 1-15- 
 
 3 Input Address for raw data. SEQUENTIAL 1-15- 
 
 (See Special Comments) . 
 
 h Number of exogenous variables (total). 
 
 5 Output Address for residuals. SEQUENTIAL 1-15 
 and/or PRINT. 
 
 6 Output Address for covariance matrix of 
 
 residuals. SEQUENTIAL 1-15 and/or PRINT. 
 
 7 If greater than 0, reduced forms are 
 calculated and printed. 
 
 8 ' If greater than 0, reduced form predicted 
 
 values and residuals are printed. 
 
 9 If greater than 0, covariance matrix for 
 reduced form residuals is printed. 
 
ECONOMETRIC REDUCED FORM AMD 
 
 RESIDUAL ANALYSIS 
 
 Page k 
 
 IV. Special Comments 
 
 If Parameter Number 3 is specified, the Durbin-Watson statistic will 
 be calculated and printed. 
 
K-CLASS ESTIMATION 
 
 I. General Description 
 
 KIClass and K2Class programs work together to do K-class estimation. 
 There are three estimators which belong to the K-class. These include: 
 Ordinary Least Squares (multiple regression), Two-stage Least Squares, 
 and Limited Information Maximum Likelihood. 
 
 II. Description of KIClass Output 
 
 The KIClass program calculates the "basic statistics: 
 
 _ EX. 
 
 Mean: X. = „ T where N = Sample Size 
 1 N 
 
 Variance ZX.X. 
 
 Covariance: S. . = — |-"- - X.X. 
 
 ij N 1 J 
 
 Standard Deviation: s. = / S. . 
 1 li 
 
 Correlation: C. . = 
 
 S. . 
 
 10 s.s. 
 
 Cross-products in matrix notation : CP = X'X 
 
 KIClass also calculates the eigenvalue to he used in Limited Information 
 Maximum Likelihood estimation in K2Class. 
 
 Variables must be ordered; exogenous, endogenous, (independent, dependent), 
 
 III. Description .of K2Class Output 
 
 K2Class calculates estimates and associates statistics for Ordinary 
 Least Squares (OLS), Limited Information Maximum Likelihood, and Two- 
 stage Least Squares. 
 
 Formulas 
 
 B' = estimates of the endogenous variables 
 
 Y - estimates of the exogenous variables 
 y = variable normalized on 
 
 Y = endogenous variable in the equation 
 X = exogenous variables in the equation 
 X = exogenous variables in the system 
 
Estimating Formulas 
 
 K-CLASS ESTIMATION 
 Page 2 
 
 Y'Y - kV'V 
 X'Y 
 
 Y'X 
 * 
 
 -1 
 
 X'X 
 * * 
 
 
 
 
 
 Y'y - kVy c 
 
 x'.y n 
 
 or 
 
 = A J 
 
 where V'V = Y'Y - Y'X(X'X) 1 X'Y 
 
 V'y = Y'v - Y'X(X'X) -1 X'y 
 J o • o v J o 
 
 k determines the estimating technique. 
 
 standard Error of Estimate 
 
 ; = yX - (e y) c 
 
 N-J 
 
 where j = Rank of A 
 
 -1 
 
 Standard Error of the Estimated Coefficients . 
 
 s*. 5 The square root of the i diagonal element of the 
 a A matrix 
 
 
 li 
 
 ariance matrix of the coefficients 
 
 C = a A 
 
K-CLASS ESTIMATION 
 Page 3 
 
 IV. Ordinary Least Squares (OLS) 
 
 When K = 0, the K-class formula reduces to Ordinary Least Squares. Y 
 is then assumed to he another exogenous variahle in the equation. When 
 ordinary least squares is specified the following additional statistics 
 are supplied. 
 
 In matrix notation 
 
 2 
 0C - ly / N Explained Sum of Squares 
 
 R = - 
 
 p 
 y 'y - Zy / N Total Sum of Squares 
 
 o " o o 
 Total Sum of Squares 
 
 TSS = y Q ' y Q - I 2 y /N 
 
 Regression (Explained) Sum of Squares : 
 
 RSS = 9C - £y 2 /N 
 
 Error (Unexplained) Sum of Squares : 
 
 ESS = TSS - RSS = y 'y - 0C 
 
 J o o 
 
 Example: Suppose that there were two possihle models that one wanted to 
 estimate. In one model variahle 10 is included and in the other variahle 
 10 was not included. 
 
 K1C(S1)(S2)(0)( )( )(1)(1)( Hi). 
 
 END P 
 
 K2C(S2)(S3)(SM( )(10)(2)*0*. 
 
 (10)(1)(1)(2)(3)(M(5K6)(7)(8)(9)(10)(11). 
 
 (9)(l)(D(2)(3)(U)(5)(6)(7)(8)(9)(ll). 
 
 END P 
 
 V. Limited Information Maximum Likelihood (LIML) 
 
 When K = the smallest eigenvalue as calculated in KIClass, K2Class 
 calculates Limited Information Maximum Likelihood estimates. 
 
 The eigenvalue is calculated in the following manner: 
 
 Let 
 
 W = Y'Y - Y'X (X'X) _1 X' Y 
 * * * 
 
 W = Y'Y - Y'X(X'X)~ 1 X'Y 
 KIClass transfers to K2Class the smallest eigenvalue of the matrix W W^ 
 
K-CLASS ESTIMATION 
 Page k 
 
 Example: Suppose there is a two equation system with four exogenous and 
 two endogenous variables. The following program will calculate LIML 
 estimates for both equations: 
 
 KLC(C)(S1)(0)(2)(U)( )( )(S2). 
 
 (2)(2)(1)(U)(5)(6). 
 
 (2)(2)(2)(3)(5)(6). 
 
 END P 
 
 K2C(S1)(S3)(SU)(S2)(U)(2)*-1*. 
 (2)(2)(1)(U)(5)(6). 
 (2)(2)(2)(3)(5)(6). 
 
 END P 
 
 VI. Two-stage Least Squares (2SLS) 
 
 VJhen K = 1 is specified Two stage Least Squares estimates are computed. 
 
 Example : Suppose one has a three equation model that contains three 
 endogenous and ten exogenous variables. The following program will calculate 
 2SLS estimates. 
 
 K1C(C)(S2). 
 
 END P 
 
 K2C(S2)(S3)(SM( )(10)(3)*1.*. 
 
 (5)(2)(1)(2)(3)(M(5)(H)(13). 
 
 (M(2)(3)(U)(6)(7)(12)(13). 
 
 (5)(2)(1))2)(8)(9)(10)(11)(12). 
 
 END P. 
 
 r TI. 
 
 Parameters 
 
 A. MAIN Parameters 
 
 1. KIClass 
 
 Parameter 
 Number 
 
 1 
 
 2 
 
 1+ 
 5 
 
 Use or Meaning 
 
 Input Address. CARDS, SEQUENTIAL 1-15- 
 
 Output Address for raw data covariance 
 matrix. SEQUENTIAL 1-15- 
 
 Type of Input = raw data 
 
 1 = cross-products 
 
 2 = covariance 
 
 Number of equations of LIE; otherwise 
 
 Total number of exogenous variables in LIE 
 system; 1 <_ n <_ 9^+ • 
 
 1 if want covariance matrix printed. 
 
K-CLASS ESTIMATION 
 Page 5 
 
 Parameter 
 Number 
 
 7 
 8 
 
 Use or Meaning 
 
 1 if want cross-products matrix printed. 
 
 Output Address for eigenvalues. 
 SEQUENTIAL 1-15- 
 
 1 if want correlations matrix printed. 
 
 2. K2Class 
 Parameter 
 
 Number 
 
 5 
 6 
 7 
 
 Use or Meaning 
 
 Input Address same as output address for 
 KIClass. 
 
 Output Address for estimated coefficients. 
 SEQUENTIAL 1-15- 
 
 Scratch Address 
 SEQUENTIAL 1-15- 
 
 Input Address for Eigenvalues same as 
 output address in KIClass. 
 
 Number of exogenous variables in the system. 
 
 Number of equations. 
 
 Floating point value of k. (See special 
 comments.) This value should be enclosed 
 in asterisks. 
 
 B. Sub-parameters (Equation Control Cards) 
 
 Subparameters are needed for all K2Class programs and for KIClass 
 when LIML estimates are desired. There is a subparameter card for 
 each equation. If subparameters appear in both KIClass and K2Class, 
 they must appear in the same order. 
 
 Each equation control card has the following form: 
 
 Parameter 
 
 Number 
 1 
 2 
 3 
 
 Use or Meaning 
 
 Number of exogenous variables in the equation. 
 
 Number of endogenous variables in the equation. 
 
 The variable number of all variables in the 
 equation in the order : 
 
K-CLASS ESTIMATION 
 Page 6 
 
 1 - exogenous in the equation 
 
 2 - endogenous in the equation with 
 
 variable standardized on last. 
 
 VIII. Special Comments 
 
 A. If K = *0* 
 If K = *1* 
 If K = *-l* 
 
 Ordinary Least Squares Estimates are computed. 
 2-Stage Least Squares Estimates are computed. 
 Limited Information Maximum Likelihood Estimation are 
 computed. 
 
 B. When LIML estimates are desired, the output and input addresses 
 for eigenvalues must he specified . 
 
 C. KIClass accepts data from cards or intermediate storage either as 
 raw data, cross-products, or covariance. If cross-products or 
 covariance are used as input to KIClass, the matrix must be in 
 the following order : 
 
 Cross-products 
 
 Ex 
 
 Covariance 
 
 Ex 
 
 cross- 
 products 
 
 covariance 
 
 Input to K2Class must come from KIClass. However, once data has been 
 processed by KIClass, it may be used by K2Class any number of times. 
 
 IX. 
 
 References 
 
 Goldberger, A. S. , Econometric Theory , New York, John Wiley and Sons, 
 Inc. , 196U. 
 
 Johnston, J. , Econometric Methods , New York, McGraw-Hill Book Company, 
 Inc. , I960. 
 
LINEAR PROGRAMMING 
 
 I. General Description 
 
 LINEAR PROGRAMMING maximizes or minimizes a linear function subject to 
 certain linear inequalities called constraints. 
 
 In matrix notation : 
 
 Find the solution to 
 
 AX <, = , >b (a system of linear equations or inequalities) 
 which maximizes (or minimizes) 
 
 Z = CX 
 where X > 
 
 A is the matrix of coefficients of the constraints, X the vector of vari- 
 ables, C the vector of costs or profits associated with e ach variable, and b 
 a vector or matrix of non-negative constants which places a bound on the linear 
 equations. 
 
 The equations, AX <, =, >b in n variables define and bound a space called 
 the feasible space in which all allowable values of the n variables are defined. 
 The SIMPLEX criterion finds those combinations of variables which optimize the 
 objective function within this feasible space. To solve the system of linear 
 equations defined above, the inequalities must be changed to equalities. This 
 is accomplished by addition of surplus variables to "greater than" constraints, 
 and slack variables to "less than" constraints. To create the basis for solving 
 a system of linear equations, an identity matrix must be formed and augmented 
 to the A matrix of structural variables. Creation of the identity matrix is 
 completed by addition of artificial variables to constraints with a "greater 
 than" relational operator. The program adds any needed variables. 
 
 Since there are more variables (structural + surplus + slack + artificial) 
 than rows, some method must select which variables will be in solution. The 
 SIMPLEX Algorithm selects a number of variables (equal to the number of rows) 
 which will be in solution. The final solution is the maximum (or minimum) of 
 the linear function- subject to the constraints. Since slack and surplus vari- 
 ables have "real" meaning, they may appear in the final and intermediate solu- 
 tions. Their presence as a non-zero value indicates that the constraint to 
 which they were added is not binding. Artificial variables have no "real" 
 meaning. Presence of artificial variables in solution indicates that some con- 
 straints are so constructed as to preclude a solution which has "real" meaning. 
 
 The slack and surplus variables are given costs of zero in the objective 
 function. Artificial variables are given large negative costs. SIMPLEX attempts 
 to drive artificial variables from solution. 
 
 Failure to drive artificial variables from solution may indicate a problem 
 in which constraints are mutually exclusive or that the cost assigned to the 
 artificial variable is not large enough. 
 
LINEAR PROGRAMMING 
 Page 2 
 
 In matrix notation the augmented matrix before calculations begin would 
 appear as: , . i 
 
 A I I I S I = b 
 where I is the identity matrix of slack and artificial variables and S is the 
 matrix of surplus variables. Row operations are performed on the augmented 
 matrix according to the SIMPLEX criterion. After any number of row operations, 
 the inverse matrix of the original coefficients of structural variables now 
 in solution is contained in the columns where the original identity matrix was 
 located. At every stage (row operation) an identity matrix will be present. 
 This identity matrix indicates the variables in solution. 
 
 Since the original table is stored by the program, it is possible to com- 
 pare the results of the inverse obtained through LINEAR PROGRAMMING with the 
 inverse obtained by a standard inversion technique. The user may set the absolute 
 value for this comparison in Parameter 3- If the comparison does not meet the 
 accuracy requirement, a new table is formed using the original table and the 
 calculated inverse. After a feasibility check, the program continues calculations 
 until an acceptable solution is obtained. 
 
 References 
 
 Llewellyn, R. LINEAR PROGRAMMING . New York, New York: Holt, Rinehart, 
 and Winston, 1966 . 
 
 Hadley, G. LINEAR PROGRAMMING . Reading Massachusetts: Addison-Wesley, 
 1953. 
 
 LI . Restrictions 
 
 The program is limited to a maximum of 90 rows or constraints, 300 columns 
 or variables, and 5 columns in the requirement matrix. 
 
 These limits are internal limits and the user is warned that large problems 
 may exceed the program capacity during accuracy check and calculations involving 
 multiple column requirement matrices. Program capacity WILL be exceeded if: 
 the number of constraints + number of structural variables + number of "greater 
 than" inequalities > 300. 
 
 Input may come ONLY from CARDS in the form of subparameters. 
 
 LI. Parameters 
 
 All floating point numbers (indicated by FP) must be enclosed by a pair of 
 asterisks. All integer numbers (indicated by IN) must be enclosed in parentheses. 
 The main call to the program and each subparameter must be terminated by a period 
 
 (•)• 
 
 The program is entered by punching the symbols L-P followed by the appropri- 
 ate main parameters and subparameters. All main parameters have default options. 
 
LINEAR PROGRAMMING 
 Page 3 
 
 Main Parameters to follow L-P 
 
 1 Cost of artificial variables *large negative FP numbers*. 
 Default = -1.E50. 
 
 2 Minimum value for calculations *FP*. If any calculation 
 falls below this value, it is set to zero. Default = 
 internal calculations. 
 
 3 Value for accuracy check *FP*. If absolute value for calcu- 
 lated difference (See General Description ) falls below this 
 value, final value is termed inaccurate and calculations are 
 performed to correct rounding errors. Default = .5 . 
 
 k If 1, suppress print of solution matrix (IN). 
 
 5 If 1, suppress print of check matrix (IN). 
 
 6 Print every IN step, i.e. row operation (IN). 
 
 7 If 1, insert small positive, non-zero number for any zero in the b 
 
 vector. Useful aid if b vector contains many zeros. 
 
 Subparameter 
 
 me program now expects to find the word MINimize or MAXimize followed 
 
 by a string of constants which represent, in sequential order , the cost or 
 values associated with each variable. All non-zero constants (with or without 
 decimal) must be enclosed by a pair of asterisks. Zeros may be enclosed by 
 asterisks. A series of sequential zeros may be represented by a pair of paren- 
 theses, i.e. the integer number in the pair of parentheses represents the number 
 of sequential zeros to be inserted. All coefficients must appear and be in 
 sequential order. 
 
 The cost coefficients representing the objective function are terminated by 
 a period. The constraints are entered in a similar manner. All variables must 
 be in sequence. Coefficients of zero must be included. Multiple requirement 
 vectors are entered in the standard form. The constraint is terminated by a per- 
 iod. Comments which do not include period (.), comma (,), asterisks (*) or left 
 parenthesis may be entered at any point outside those characters delimiting con- 
 stants. The requirement vectors are separated from the rest of the constraint 
 by relational operators. All coefficients must appear and be in sequential- order . 
 
 The program recognizes three relational operators: LE (less than or equal), 
 EQ (equal), and GE (greater than or equal). These relational operators are 
 surrounded by quotes ("). See Section V. Examples in this program. 
 
 Output 
 
 The output consists of the objective function, the final solution matrix, 
 the variables in solution, and the optimal functional value. In addition, 
 Shadow Prices or opportunity costs are printed. Shadow Prices provide useful 
 information on the "cost" of having certain constraints, or the increased pro- 
 fit to be obtained by 'relaxing' a particular constraint. 
 
LINEAR PROGRAMMING 
 Page U 
 
 For example : 
 
 Constraint 1: IX(l) + 2X(3) < 5-0 
 To this constraint, slack variable X(l) is added to make it an equality. In 
 the final solution, X(l) is not in solution. The optimal maximum functional 
 value is 20. The 'Shadow Price' on variable X(l) is 2.0. This means that if 
 we relax this constraint to 6.0, the optimal maximum value could be 22.0. For 
 every unit the constraint is relaxed, the functional value will be changed by 
 the Shadow Price. The Shadow Price holds until the constraint is no longer 
 binding. The same logic may be applied to "GE" type constraints with surplus 
 variables. For interpretation of Shadow Price for structural and artificial 
 variables, the user is referred to texts under headings such as "Dual Algorithm", 
 Interpretation of the Dual", and "Opportunity Costs". 
 
 Basis variables refer to those variables which form the original identity 
 matrix. The variable numbers are listed in the order they were added to the 
 constraints. The number of basis variables will always equal the number of 
 constraints. To determine whether a basis variable is a slack or artificial 
 variable, refer to the coefficients of these variables in the objective func- 
 tion. A slack variable will have a coefficient of 0.0. 
 
 MESSAGES 
 
 SLEM TOO LARGE : More than 300 variables or 100 constraints on input or 
 during addition of slack, surplus, and artificial variables. 
 
 NORM FOR CUTOFF : Value of Main Parameter Number 2, either supplied or 
 default. 
 
 . - R IN SIMPLX : Source Program Error. See a consultant. 
 
 "OLU T iPN UNBOUNDED : Constraints do not form a closed space. Optimal 
 functional value is infinite. 
 iER OF ITERATIONS : Number of row operations needed to calculate final 
 solution. For multiple requirement vectors, number is not cumulative. 
 
 ACCURACY ACCEPTABLE or ACCURACY NOT ACCEPTABLE : Comparison with Main Para- 
 meter Number 3- 
 
 VARIABLE ADDED 
 
 VARIABLES ARE : Iterations either inaccurate and new variable added 
 or. during execution of multiple requirement vector, a new variable had 
 to be added to make problem feasible (requirement vector positive). 
 
 NON-RESOLVABLE TIE : Cannot occur mathematically. Only reason for occurance 
 is due to rounding error in machine. Can be corrected by incrementing 
 or decrementing requirement vector by a small amount. Perform this 
 only for constants of same value. (Use Parameter 7). 
 
 Other messages should be self-explanatory. 
 
 . 'r ■■i-il Comments 
 
 Speed and accuracy can be increased by observing the following suggestions: 
 
 1) Never make Parameter 3 (accuracy check) larger than (0.1) X (number of 
 significant digits in table). For example, if numbers in the table are 
 k, 5, .001, 86, 95-32, you have "one" significant digit. Set Parameter 3 
 to *.l*. 
 
LINEAR PROGRAMMING 
 Page 5 
 
 2) Scale numbers in table to get them into same range. For example, if 
 table entries are of the order 10 , and the requirement vectors are of 
 the order 103, scale requirement vectors to KH- and rescale solution by 
 10^. The objective function may also be rescaled in a similar manner. 
 Rescaling essentially reflects the number of significant digits. 
 
 V . Example s 
 
 The problem: 
 
 Minimize --75X(l) + 150X(2) -.02X(3) + 6x(k) 
 Subject to the following constraints: 
 Constraint (l) 
 
 .25X(l) -60X(2) -.04X(3) + 9X(*0 < 0, 1, 2 
 Constraint (2) 
 
 .05X(l) -90X(2) -.02X(3) -3X(U) < 0, 1, 2 
 Constraint (3) 
 
 1X(3) < 1, 2, 3 
 
 Could be set up on cards as follows : 
 
 /*ID 
 
 // EXEC SOUPAC 
 //SOUPAC.SYSIN DD * 
 
 L-P*-1.E20****.1*()0 (!)• 
 
 MIN*- . 75**150**- . 02**6* . 
 
 LABOR * . 25** -60**- . 0k** 9*" LE" *0**1**2* . 
 
 LAND* . 05-**-90**- . 02**-3 • 0*"LE"*0**1**2* . 
 
 CASH(2)*l*(lV'LE"*r*-*2**3'*- 
 
 END PROGRAM 
 
 END SOUPAC 
 
 /* 
 
LINEAR PROGRAMMING 
 Page 6 
 
 This problem will result in an unbounded solution with requirement 
 vector number one. The problem terminates without performing calculations 
 on the other vectors. 
 
 Note insertion of sequential zeros on CASH card. 
 
QUADRATIC PROGRAMMING 
 
 I. General Description 
 
 This program maximizes the quadratic function ex + 1/2 x T Dx subject to 
 the linear constraints Ax <_ b, where c is an n-vector, D is a symmetric 
 negative definite n by n matrix, A is an m by n matrix of coefficients 
 or constraints and b is an m vector. 
 
 The Kuhn-Tucker theory shows that a solution to the constrained maxi- 
 mization problem is obtained if and only if vectors x, L, v, and w can be 
 found such that : 
 
 Dx - A T L + v = -c 
 Ax + w = b 
 
 where the elements of x, L, v, and w are non-negative and the conditions 
 xv = and lw = are satisfied. To find these vectors, artificial vectors 
 z 1 and z^ are added to the first equation and a y-vector is added to the 
 second. Simples techniques are then used to eliminate first the y 
 and then the z-^ amd z2 variables. 
 
 References : 
 
 Carr, C. R. and C. H. Howe, Quantitative Decision Procedures in Management 
 and Economics , McGraw-Hill , 196*+ . 
 
 Hadley, G. , Nonlinear and Dynamic Programming , Addi son-Wesley, ISSh. 
 
 Wolfe, P., "The Simplex Method for Quadratic Programming", Econometrica , 
 27, 1959, PP. 382-398. 
 
 NOTE: Carr and Howe claim that elements of the w-vector may not be en 
 entered in the first stage of the simplex procedure. Since this requires 
 that there exist a solution to Ax = b, it is a severe restriction. It 
 is also unnecessary, and this program does enter w-variables during the 
 first stage. Otherwise, the procedures used closely follow those of 
 Carr and Howe. 
 
 II. Restrictions 
 
 The maximum number of x-variables is kO. The number of x-variables 
 plus the number of constraints must be <_ 80. 
 
 The D-matrix must be negative definite. If this is dubious, use the 
 PRINCIPLE AXIS FACTOR ANALYSIS program to extract the eigenvalues. All 
 must be negative. Semi-definite D-matrices may be perturbed or the user 
 may limit the number of iterations to be performed. If this limit is 
 exhausted, final solution vectors will be printed out (see below). 
 
 The only form of input is a matrix of data. If there are n x-variables 
 and m constraints , the matrix should have n + 1 columns and m + n rows , 
 partitioned as follows: 
 
QUADRATIC PROGRAMMING 
 Page 2 
 
 D (n x n ) 
 
 A (m x n) 
 
 c (n x 1) 
 
 b (m x 1) 
 
 Note that this is the c vector, not the -c vector mentioned in the 
 Kuhn-Tucker formulas. Also note the + sign and the 1/2 coefficient of the 
 xDx term. All constraints in this type of input are assumed to be < type. 
 Multiply > constraints through by -1. The equality constraing: 
 
 n 
 
 Z a. jX = b. 
 
 0=1 
 
 is equivalent to the two constraints Za-^X. <; b- and Z-a. .X. <; b. . 
 
 This matrix can be read- in from cards or from temporary storage. Only 
 one problem can be read from a tape or sequential location. Multiple problems 
 must be read from cards, the matrix for each problem preceded by its own 
 "DATA (N + l) (FORMAT)" card and followed by its own "END#" card. 
 
 The elements of the w-vector are always non-negative and are to be 
 considered "slack" for < constraints and "surplus" for > constraints. 
 
 The user may obtain the basis vector at the end of each iteration 
 showing which variables are in the basis and their quantities (option 2). 
 lie may alternatively have the entire matrix printed out after each iteration 
 (option 3)- The user is cautioned that option 3 can use immense quantities 
 of paper and time unless the problem is very small. 
 
 A method outlined in Hadley, pages 183 - 186, is used to avoid cycling 
 in cases of degeneracy. 
 
 Parameters 
 
 The program call card should have the name QUADRATIC PROGRAMMING 
 followed by these parameters: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Number of problems following 
 
 2 Input Address. SEQUENTIAL 1-15 or CARDS. 
 
 3 Output option: 
 
 if final results only 
 
 1 if iterated basis vectors 
 
 2 for entire iterated matrix 
 
 h Limit on number of iterations if 
 
 desired. Leave blank otherwise. Default 
 is 1000. 
 
QUADRATIC PROGRAMMING 
 Page 3 
 
 Parameter 
 Number 
 
 Use or Meaning 
 
 IV. Example s 
 Example I 
 
 Pertubation quantity. Punch quantity 
 to be subtracted from diagonal of D- 
 matrix between asterisks instead of 
 parenthesis; e.g., *.001*. Leave blank 
 if not desired. 
 
 Suppose we wish to maximize the quadratic function 
 F = IOx-l + 20x 2 + 15x3 - Ix-l 2 - 2x 2 2 + lxix 2 
 
 subject to the constraints 
 
 2xi + 3x 2 + Ix^ < 50 
 lxj + 3x3 < 70 
 3xj_ + 2x 2 < 60 
 
 Since the D-matrix is only negative semi-definite, it should be perturbed 
 to insure convergence to a solution. The following set of cards would solve 
 the problem using data matrix input : 
 
 /*ID 
 
 // EXEC SOUPAC 
 
 //SYSIN DD * 
 
 QUADRATIC PROGRAMMING (l) (CARDS) (0) (0)*.001*. 
 
 END SOUPAC 
 
 DATA(i+)(UF3.0) 
 
 
 
 1 
 
 
 
 10 
 
 
 1 
 
 -if 
 
 
 
 20 
 
 c 
 
 
 
 
 
 
 
 Vi 
 
 
 2 
 
 3 
 
 1 
 
 50 
 
 
 1 
 
 
 
 3 
 
 70 
 
 b 
 
 3 
 
 2 
 // 
 
 
 
 60 
 
 
 /* 
 
QUADRATIC PROGRAMMING 
 Page h 
 
 Example II : 
 
 Maximize 
 
 F = 8x-|_ + 10x 2 - X]_ 2 - xp 2 
 subject to the constraint 
 
 3x x + 2x 2 < 6 
 The D-matrix is negative definite. The problem would be set us as follows: 
 
 /*ID 
 
 // EXEC SOUPAC 
 //SYSIN DD * 
 QUAD (1) (C)(2). 
 END SOUPAC 
 DATA(? (F2.0,2F3-0) 
 -2 
 D -2 10 c 
 
 A END# 
 /* 
 
 b 
 b 
 
 The extreme value of the objective function for this example is .213E02 
 
THREE STAGE LEAST SQUARES ESTIMATION 
 
 I. GENERAL DESCRIPTION 
 
 The Three Stage Lease Squares Estimation program calculates three stage 
 least squares estimates and an asymptotic covariance matrix. A raw data 
 covariance matrix and two stage least squares residual covariance matrix 
 are the necessary input. Calculations are carried out an in "Econometric 
 Theory" by Arthur S. Goldberger, pp. 3^7-352. The coefficients may also be 
 stored for use with the Econometric Reduced Form and Residual Analysis 
 program. 
 
 References : 
 
 Goldberger, Authur, S. , Econometric Theory , New York, John Wiley and Sons, 
 Inc. 196h. 
 
 Johnston, J., Econometric Methods , New York, McGraw-Hill Book Company, Inc., 
 I960. 
 
 II. RESTRICTIONS 
 
 The raw data covariance matrix must be arranged in the K-Class Estimation 
 program write-up. The program has the following size restrictions: Total 
 number of coefficients estimated <_ 1^+0. 
 
 (NEQ x NVAR) + (NEQ x NEQ) + (NVAR x NVAR) <_ 20,000 
 
 NVAR = the total number of variables 
 NEQ = the number of equations estimated 
 
 Note: Endogenous coefficients are printed out first, followed by exogenous 
 coefficients. 
 
 III. PARAMETERS 
 
 The parameters appear on the program card following the name Three Stage 
 in the following order : 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address for raw data covariance matrix. 
 SEQUENTIAL 1-15- 
 
 2 Output Address for coefficients. 
 SEQUENTIAL 1-15- 
 
 3 Input Address for residual covariance matrix. 
 SEQUENTIAL 1-15- 
 
 h Number of equations to be estimated 
 
 5 Number of exogenous variables 
 
THREE STAGE LEAST SQUARES 
 Page 2 
 
 Subparameters 
 
 For each equation a card specifying the variables in the equation must 
 follow the main parameter card with the following parameters : 
 
 Parameter 
 Number Use or Meaning 
 
 1 Number of exogenous variables in the equation 
 
 2 Number of endogenous variables in the equation 
 
 3 to N + 2 Variable number of the N variables included in 
 
 the equations with exogenous variables first; 
 endogenous variables next , with the variable on 
 which the system is normalized last. 
 
 IV. SPECIAL COMMENTS 
 
 The Three State Least Squares Estimation program requires input from 
 several other SOUPAC programs. The following is an example of the steps 
 needed to calculate the necessary input. 
 
 V . EXAMPLE 
 
 K1CLASS (Tl ) ( T2 ) ( ) ( ) ( ) ( 1 ) (1 ) . 
 K2CLASS(T2)(T3)(TM()(8)(2)*1.*. 
 (M(2)(1)(2)(3)(M(D(2). 
 (M(2)(5)(6)(7)(8)(2)(l). 
 
 END P 
 
 ECON (T3)(T2)()(8)()(T5). 
 
 THREE(T2)(T6)(T5)(2)(8). 
 
 (M(2)(1)(2)(3)(M(D(2). 
 
 (M(2)(5)(6)(T)(8)(2)(1). 
 
 END P 
 
 Notice that the equation control cards for both K2CLASS and Three-stage 
 Least Squares must be in the same order. 
 
 Also notice that an ENDP card is required after the equation cards. 
 
SPECTRAL ANALYSIS SECTION 
 
AUTOCORRELATION AND SPECTRAL ANALYSIS 
 
 General Description 
 
 The calculation of autocorrelation coefficients and the determination 
 of power spectra are of interest to economists in the study of time series 
 and others whose interest leads them to suspect some repetition of variation 
 within a set of observations. For a single variable, the autocorrelation, 
 r p , is calculated as follows: 
 
 (N-p) IX. X. - £X.£X. 
 ^ " [(N-p) tt* - (2X.) 2 ] 1/2 [(N-p) ZX i+p 2 - ^ i+p ) 2 ] V2 
 
 N is the total number of observations; p is an arbitrarily chosen time lag. 
 The power spectrum is a Fourier transformation of the autocovariances and 
 is used in the harmonic analysis of Xj_ as a function of time. Raw estimates 
 of the spectral density are given by formula (2) and a smoothed value is 
 given by formula (3) : 
 
 (2) L p = W + 2 ZW q cos SM + Wm cos pfl 
 
 (3) U = 0.23 L + 0.5^ L + 0.23 L 
 
 p+1 
 
 This program, however, not only will calculate values of X^ with X. + , 
 but will calculate values for all possible pairs of variates, Xj_j with 
 
 X i+P,j- 
 
 For a more detailed discussion, see: 
 
 R. W. Southworth, "Autocorrelation and Spectral Analysis" 
 from Mathematical Methods for Digital Computers , by Anthony 
 Ralston and Herbert Wilf; John Wiley and Sons, i960, pp. 213-20. 
 
 II. Restrictions 
 
 This program is designed to accept from 1 to 15 variables. The number 
 of observations is limited to 2500 or less. Since all possible pairs of 
 lagged cross-correlations are printed for values of p from p to p + kd (p, 
 k, and d are set by program parameters), the output under maximum circumstances 
 will be very large. The value of p is limited to 1500 or less. Current 
 literature suggests that for a correlation between Xj_ and Xj_ + p, the value of 
 p should not exceed 10 per cent of the total number of observations. It 
 should be pointed out, however, that a set of observations, Xj_, can be 
 broken up into blocks of equal size, and the several blocks can be treated as 
 additional variables. In this way, the value of p can exceed 1500 and also 
 the value of N can exceed 2500. 
 
 Ill . Parameters 
 
 Data may be read from the program either from cards or from any temp- 
 orary storage medium. In the output, X is the lead variable and Y is the 
 lag variable. 
 
AUTOCORRELATIONS 
 Page 2 
 
 The call card will have the program name first. After this the para- 
 meters are in the following order: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address. CARDS or SEQUENTIAL 1-15- 
 
 2 An integer number denoting the minimum number 
 of lag periods, p. 
 
 3 An integer number denoting the maximum lag, 
 p + kd, at which time the execution of the 
 program is terminated. 
 
 h An integer number denoting the increment, d, 
 
 -to be added to p, so that the lag period can 
 be altered. 
 
 Parameters 3 and k will be useful generally when one wants to study the 
 changes in the power spectra as a function of lag length. 
 
 5 The presence of a number greater than 
 indicates that the means, standard deviations, 
 and correlations are to be printed. 
 
 6 Standard SOUPAC output parameters consisting of 
 
 an output address, and/or / PRINT, and/or /X. 
 
 If the punch or print option is chosen, then 
 the quantities output are the lag period, auto- 
 correlation coefficient, autocovariance function, 
 raw spectrum, and smoothed spectrum in 15, !+El8.8 
 format . 
 
 7 A number greater than indicates that the lead- 
 lag sums and cross-products are to be printed. 
 
 If a parameter is left blank, this is the same as specifying a zero. It 
 should be pointed out that the output for several variables is quite large, 
 and unless there is interest in the output, several of the parameters should 
 be left as blanks. 
 
SCALE ANALYSIS PACKAGE 
 
CLIQUE ANALYSIS 
 
 I- General Description 
 
 This routine is designed to enumerate all third order or higher 
 interrelationships (communication chain) which exist in a sociometric 
 matrix. The algorithm is identical to the method described by Harary 
 and Ross. A communication chain is considered to be any submatrix of 
 order three or more in which all the off diagonal cells are full. 
 
 II- Restrictions 
 
 The maximum dimensions for an input array is 190 x 190. Input may 
 come from cards or any temporary storage area. The array must contain 
 only zeroes and ones in its elements. Any number greater than zero is 
 considered to be one; therefore, care should be used in constructing the 
 array. Symmetry in the input matrix is not necessary since the program 
 automatically forces symmetry through element-wise products. It is 
 suggested that TRANSFORMATIONS be used to modify input arrays when various 
 cut-off points are used to distinguish ones from zeroes. 
 
 III. Parameters 
 
 The name CLIQUE ANALYSIS appears first on the program call card and 
 is followed by the following parameter: 
 
 Parameter 
 Number Use or Meaning 
 
 1 Input Address of data array. 
 
 CARDS or SEQUENTIAL 1-15. 
 
 IV. Special Comments 
 
 The following is an illustration of the clique detection concept. 
 Data matrix: 
 
 
 
 1 
 
 1 
 
 1 ° 
 
 
 
 
 
 
 
 
 
 
 1 
 
 
 
 lie 
 
 
 
 
 
 
 
 c 
 
 
 
 1 
 
 1 
 
 
 
 1 
 
 I 
 
 1 
 
 
 
 
 
 
 
 
 
 
 
 1 
 
 
 
 1 
 
 
 
 1 
 
 
 
 
 
 
 
 
 
 1 
 
 
 
 1 
 
 
 
 1 
 
 1 
 
 1 
 
 
 
 
 
 
 
 1 
 
 1 
 
 1 
 
 
 
 1 
 
 ] 
 
 
 
 
 
 
 
 
 
 
 
 1 
 
 ] 
 
 
 
 1 
 
 
 
 , 
 
 
 
 
 
 
 
 ] 
 
 1 
 
 1 
 
 
 
 Oj 
 
 oj 
 
 
 
 0! 
 
 
 
 ll 
 
 1 
 
 ll 
 
 
 
CLIQUE ANALYSIS 
 Page 2 
 
 Clique (1) 1, 2, 3 
 
 Clique (2) 8, 6, 7, 9 
 
 Clique (3) k, 3, 5 
 
 Clique (k) 3, 5, 6 
 
 Clique (5) k, 5, 7 
 
 Clique (6) 5, 6, 7 
 
 Harary and Ross, "A Procedure for Clique Detection Using the Group Matrix", 
 Sociometry , Vol. 20, No. 3, 1956, pp. 2-5, 215. 
 
 October 10, I969 
 
 S0UPAC( Statistically Oriented Users Programming and Consulting "> 
 
PAIRED COMPARISONS 
 
 General Description 
 
 Paired comparisons is a method of obtaining empirical estimates of 
 the form "stimulus J is judged greater than any other stimulus i." Each 
 stimulus in turn serves as the standard; that is, all possible pairs of 
 stimuli are compared. With n_ stimuli, there are n (n-l)/2 pairs. 
 Comparisons of a stimulus with itself is disregarded; it is assumed 
 that a proportion of 0.05 would result. In the following m = no. of 
 subjects = sample size. 
 
 Each subject's preferences are tabulated, and the total number of 
 times he preferred each stimulus is computed producing, A , an n x n 
 matrix of l's and 0's , k = 1, m. Totals for each stimulus and a grand 
 total are computed for each subject and this m x n + 1 matrix is 
 referred to as individual preference sums. The individual tables, A , 
 are summed over all subjects to form an n x n frequency matrix F, 
 whose elements (fij ) denote the observed number of times stimulus j was 
 judged greater than stimulus i. 
 
 The matrix of proportions, P, is then computed from F, so that p. . 
 is the observed proportion of times stimulus j was judged greater than J 
 stimulus i. The matrix X is derived from P by reference to the normal 
 curve; x. . is the unit normal deviate corresponding to the element p... 
 These are the sample estimates of the values required to determine the 
 scale values of the stimuli. The scale values are computed by summation 
 
 producing s., a least squares estimate of the scale value of stimulus j. 
 
 J 
 
 II. Input 
 
 A. Both an indication of ordering for each pair and an array of 
 subjects' choices are required. The former must be given as a set 
 of pair subparameter cards and the latter as an observation of 
 data for each subject in a data deck. 
 
 B. In the subjects deck one number is used to denote the subject's 
 choice for each pair. This choice may be "is greater than," 
 "is better than," "is brighter than," etc. This number is 1 if 
 the subject chose the left, or first stimulus, 2 if the subject 
 chose the right or second stimulus. No other coding is acceptable. 
 
 C. The pair cards consist of one mention each of every possible pair 
 of stimuli. The order of the pairs is the same as the order of 
 the subjects' choices, i.e. pair 1 corresponds to item 1 of the 
 subject array. The order of the elements in the pairs is reflected 
 in the subjects' choice deck, if (5>T) corresponds to a 1 then 
 
 the subject chose stimulus 5 over stimulus 7» if (7,5) were the 
 pair, a 2 corresponding would mean stimulus 5 preferred. Note that 
 one set of pair specifications serves for all subjects. 
 
PAIRED COMPARISONS 
 Page 2 
 
 III. Formulas and Calculations 
 
 A. INDIVIDUAL PREFERENCE SUMS 
 
 Let A "be an individual preference frequency table, a. . , is an 
 element of A, i=l , n, j=l, n, where n is the number or stimuli 
 
 a. . = 1 if an individual chose stimulus j over 
 
 stimulus i. 
 
 a. . = if the individual chose stimulus i over j 
 
 a. . = a. . = no stimulus is compared with itself, 
 n JJ 
 
 Individual preference sum for stimulus j = I a. . 
 
 i 1J 
 
 Error messages concerning incorrect frequency tables refer to the 
 configurations of Table A. A can be correct only if subject data 
 and pair cards are correct . 
 
 STIMULUS PREFERENCE FREQUENCY TABLE, F 
 
 Given the matrix A for each of m subjects 
 
 m 
 Stimulus preference frequencies = f . . = Z a 
 
 U k=1 ij 
 
 C. TABLE OF PROPORTIONS, P 
 
 If m is the sample size, i.e. number of subjects, then, 
 
 f . . 
 
 P.. =" iL 
 
 ij m 
 
 D. TABLE OF NORMAL DEVIATES, Z 
 
 Let p. . be an element of the table of proportions: 
 Then let 
 
 e - 
 
 e.. = log (1/ Pi . 2 ) 
 
 2.515517 + .802853xe + .010328xe 2 
 
 1. + l.i+32T88xe + .l89269xe 2 + .001308xe 3 
 
 producing a z for each e. .. Critical values of p occur at , 1 
 and .5 so adjustments are made for these values before the formula 
 is applied and sometimes after. 
 
PAIRED COMPARISONS 
 Page 3 
 
 E. SCALE VALUES, S 
 
 s. = X J 
 
 J 
 
 , where n is the number of stimuli 
 
 ii 
 
 Then total scale = L S i 
 
 J J 
 
 A row of scales and a total of length n + 1 is calculated. 
 
 IV. Output 
 
 Matrices for individual preference totals =„* q 
 always printed. Other intermediate re su^F ^L V V S ' *** 
 on option. All matrices mav h« !+ ^sults F, P and Z may be printed 
 
 in F format. F, P and Z are n x n ™? " ° Ptl ° n - Ml rSSUltS are P rinte * 
 
 aremxn + 1 and n + 1 \L ,• T f eS> i ndivi dual totals and S 
 
 The A matrix I.^^cS^rSj'S^S = *» calculations). 
 
 v - Restrictions 
 
 A ' ^^o™ SSS ^E^^st 1 \' '^ The - e " n ° -friction on toe 
 
 sublets sh oSr n :; e ss^rs:.""' be paired " ith ^ ° th - «• 
 
 pairs (te sti^li) ^ be inSerted t0 * "»*™ °f 881 
 
 n(n-l) 
 * = — . 
 
 Any other relationship is invalid. 
 
 D ' the'fir J 5 ' 7 - V ^^ thSn (7 ' 5) is invali *' Also, if this is 
 "LuluT^^^^ 
 
 VI- Parameters 
 
 the p^eters ^ S&£££? *»"""• °" »» ** ™* — 
 
 Parameter 
 ^S^r Use or Meaning 
 
 1 
 2 
 
 Input Address of data. CARDS, SEQUENTIAL 1-5. 
 
 Output Address of individual preference sums. 
 Always printed. SEQUENTIAL 1-5. 
 
 n sS^TTAf 1 S S ° f ,f lmUlUS P reference f-auency table 
 SEQUENTIAL 1-5 and/or PRINT; if not desired, leave 
 parameter Wnr,v ' a,e 
 
 parameter blank. 
 
PAIRED COMPARISONS 
 Page k 
 
 Parameter 
 Number Use of Meaning 
 
 k ft Output Address of proportions. SEQUENTIAL 1-5 and/ 
 
 or PRINT; if not desired, leave parameter blank. 
 
 5 ft Output Address of normal deviates. SEQUENTIAL 1-5 
 
 and/or PRINT; if not desired, leave parameter blank. 
 
 6 Scale Values. SEQUENTIAL 1-5, always printed. 
 
 ft It is possible to punch the output from these parameters while 
 
 executing this program. If you need this option, see the section in 
 the Introduction on Input and Output. Any storable output may be 
 punched using the Matrix program. 
 
 VII. Examples 
 A. 
 
 /*ID <accounting information> 
 
 // EXEC SOUP 
 
 //SYS IN DD * 
 
 PAIRED COMPARISONS (C)( )(P)(P)(P). 
 
 PAIRS (l,2)(3,l)(i+,l)(3,2)(2,U)(it,3) 
 
 END P 
 
 END S 
 
 DATA(6)(6F1.0) 
 
 122211 
 
 222111 
 
 121122 
 
 END # 
 /* 
 
 Print has been indicated for all output except individual preference 
 sums and scale values which are always printed. 
 
 The pairs card indicates that there are k stimuli. All possible 
 pairs of these stimuli are presented to the subjects, and the subject's 
 responses are recorded in the order (l,2), (l,3), (l,M, (2,3), (2,4), 
 (3,^). Some of these pair members have been inverted indicating that 
 no special order is required, left member or right member preference of 
 subjects would, of course, be affected by the inversion. 
 
 The pairs need not be given in the increasing order of the example, 
 but at all times the order of the pairs is the order of the corresponding 
 subject responses . 
 
 The data deck is a set of subject responses for each pair of stimuli. 
 
PAIRED COMPARISONS 
 Page 5 
 
 B. 
 
 /*ID<accounting information 
 
 // EXEC SOUP 
 
 //SYSIN DD * 
 
 TRA (C). 
 
 C0N(900)*1*. 
 
 ADD (l,625)(900)(l,625). 
 
 OUT(Sl)(l,625). 
 
 END P 
 
 PAI(S1)( )(S2/P)( )(P). 
 
 PAI(5,3)(2,8)(1+,16)(7,26)(8,3)( )(U,13). 
 
 PAl(25,2l+)(23,28)( )(U,12). 
 
 END P 
 
 END S 
 DATA(325)(T5F1.0) 
 
 Data Deck — 5 cards per subject 
 END# 
 /* 
 
 This example shows a program for 26 stimuli, 26 x 25/2 = 325 is 
 the number of pairs required and the number of subject preferences. 
 Since no more than 300 pairs may be given per pairs statement at least 
 two pairs statements are needed, two are shown. The unique pairs may- 
 occur in any order, the subject responses are in the same order. 
 
 The Transformations program shown is designed to correct subject 
 responses punched zero/one or blank/one to 1 and 2. 
 
 A selection of possible output has been made. Note that 
 individual preference sums and scale values , as well as normal deviates 
 and stimulus preference frequencies are printed. The latter is also 
 stored. This storage implies some further use is made of the frequencies, 
 perhaps in the missing part of the program. 
 
 VIII. References 
 
 Torgerson, Warren S. Theory and Methods of Scaling . John Wiley and 
 Sons, New York; I960, pp. 166-173. 
 
 Edwards, Allen L. Techniques of Attitude Scale Construction . Appleton 
 Century, Crofts, New York: 1957, pp. 19-52. 
 
SCALOGRAM ANALYSIS 
 
 General Description 
 
 The SCALOGRAM ANALYSIS (mnemonic : SCA) was developed to provide a 
 method of producing Guttman scales automatically without the need of 
 external decisions to determine which items do and which items do not enter 
 into Guttman scales. Items are grouped together in as few as possible 
 submatrices with each subgroup having a maximum homogeneity within each 
 submatrix. Each item from the total group is chosen to fit into only one 
 submatrix. 
 
 The SCALOGRAM program is started by choosing an item from the total 
 group and then it searches the remainder of the items to find an item 
 similar to the item chosen. Similarity is tested by using an error cri- 
 teria and a chi- square test to insure that the items are similar. If the 
 above criteria are met, this item is added to the first item and a scale 
 is formed. This last item is then used to find another similar item and 
 this procedure continues until either of the two criteria is not met. 
 Whenever a criteria fails, the scale is terminated and a new scale is 
 started. 
 
 SCALOGRAM will only work for dichotomous data and it can be used 
 to analyze both subject-wise and item-wise. SCALOGRAM differs from 
 Guttman analysis in three ways: l) It uses an empirical rather than a 
 rational basis for selecting items to enter a scale. 2) It uses a 
 statistical method of deciding on groups and for testing the scale- 
 ability of the item. 3) It yields multiple scales rather than reject 
 the scale hypothesis for the whole item set. 
 
 SCALOGRAM can be considered to be more descriptive than the raw data 
 but less than factor analysis. SCALOGRAM also is unlike factor analysis 
 in that SCALOGRAM is not bound to linear assumptions about the regressions 
 involved. Factor analysis is set up to study quantitative variables and 
 will not show correct relationships between qualitative variables, SCALOGRAM 
 will show what relationships do exist between qualitative variables. (See 
 Guttman 1950 for a complete discussion of the relation between the scalogram 
 technique and other statistical procedures). (See Lingoes 196~3 f° r "the 
 complete algorithm for SCALOGRAM) . 
 
 References : 
 
 Guttman, L. "Relation of Scalogram Analysis to other Techniques." 
 In Stouffer, et al., Measurement and Prediction . Princeton, N.J. : 
 Princeton University Press, 1950 (P. 172-212). 
 
 Lingoes, J. C. "Multiple Scalogram Analysis. A Set-Theoretic Model 
 For Analyzing Dichotomous Items." Educational and Psychological 
 Measurement XXIII (196"3), 501-524. 
 
 Lingoes, J. C. "A Multiple Scalogram Analysis of Selected Issues 
 of the 83rd U.S. Senate." American Psychologist, XVII (1962), 327- 
 
SCALOGRAM ANALYSIS 
 Page 2 
 
 II 
 
 Parameters 
 
 The program mnemonic is SCA. The following parameters appear on 
 the program card : 
 
 Paramete r 
 1 
 2 
 
 3 
 
 Use or Meaning 
 
 Input Address 
 
 Address of Labels 
 
 A 1 indicates that the matrix should be 
 transposed 
 
 III. 
 
 Since the program scales by columns or items, to scale by subjects 
 the matrix must be transposed. 
 
 Labels can be used to describe items; they can be input from cards 
 or tape. A maximum of 28 characters is allowed per label and they should 
 be expressed as follows: DATA(n) (nA.U) where n< 7 • A separate card for 
 each label is most convenient to use with the description in the first 
 28 columns. If both labels and data are being input from cards, labels 
 must precede data. 
 
 Restrictions 
 
 Both the number of items and the number of subjects is restricted 
 to U90. Lables are restricted to 28 characters. 
 
 Data must be coded as O's and l's. If data is not of this form, 
 TRANSFORMATIONS may be used to recode it. 
 
 IV. Example s 
 
 SCA(C) (C)(1). 
 
 ENDS 
 
 DATA(7M7A^) 
 labels 
 
 END// 
 DATA(U0)(i+0F1.0) 
 
 data 
 
 END/' 
 
 Labels and data are on cards, 28 columns are used for labels and scaling 
 will be done by rows. 
 
SCALOGRAM ANALYSIS 
 Page 3 
 
 SCA(Sl) . 
 
 ENDS 
 
 DATA(30)(30F1.0) 
 
 ! data 
 END# 
 Data is on SEQUENTIAL 1 and scaling will be done by columns 
 
PROBIT ANALYSIS SECTION 
 
PRCBIT 
 
 (Mnemonic: PRB} 
 
 General Description 
 
 This program calculates maximum likelihood estimates for the parameters 
 A and B in the probit equation : 
 
 Y = A + BX 
 
 An iterative scheme is used. 
 
 II. Restrictions 
 
 The input vectors must be equal length k and: 
 input vector comes from a separate input address. 
 
 3 < k < 3000. Each 
 
 III. Parameters 
 
 Parameter 
 Number 
 
 Use or Meaning 
 
 Input vector of dosage level. 
 SEQUENTIAL 1-15- 
 
 CARDS or 
 
 Input vector of number of subjects tested 
 
 at each dose level. CARDS or SEQUENTIAL 1-15- 
 
 Input vector containing the number of 
 subjects-at each level responding to the 
 drug. CARDS or SEQUENTIAL 1-15- 
 
 Output vector of length k containing the 
 proportion of subjects responding to the 
 various close levels of the drug. SEQUENTIAL 1-15, 
 and/or PRINT. 
 
 Output vector of length k containing the values 
 of the expected probit for the various levels of 
 the drug. SEQUENTIAL 1-15 and/or PRINT. 
 
 Printed output consists of: 
 
 1 - Estimate of intercept constant A 
 
 2 - Estimate of probit regression coefficient B 
 
 3 - Chi-square value for a test of significance of final 
 
 probit equation 
 
 X - f =1 N^id - Pi) 
 
 where R^ = number of responses (input address 3) 
 
 Ni = number of objects tested (input address 2) 
 Pj_ = cumulative normal distribution values corresponding 
 to Z i where Z± = (A + BXi) - 5 
 where A and B are from final probit equation 
 
h - Degrees of freedom for X 
 d.f. = k - 2 
 
 References : 
 
 D. J. Finney, Probit Analysis , Second Edition, (Cambridge University Press 
 1952). 
 
 The program was adapted from the IBM Scientific Subroutine Package, 
 360A-CM-03X, Version III, page kk. 
 
 IV. Example 
 
 If two or more input addresses are cards , the cards must be stacked 
 in order of their parameter numbers. For example: 
 
 /*ID 
 
 // EXEC SOUPAC 
 
 //SOUPAC.SYSIN DD * 
 
 MAT. 
 
 MOVE ( CARDS )(SEQ2) 
 
 END P 
 
 PRB ( CARDS ) ( SEQ2 ) ( CARDS ) ( PRINT ) . 
 
 END S 
 
 DATA(1)( ) 
 
 : cards for SEQ 2 
 
 END# 
 
 DATA(l)( ) 
 
 Cards for Parameter 1 
 
 END# 
 
 DATA(l)( ) 
 
 Cards for Parameter 3 
 
 END# 
 /* 
 
 NOTE The mnemonic for PROBIT is PRB, nor PRO. 
 
RANDOM NUMBER GENERATION SECTION 
 
RANDOM NUMBER GENERATOR 
 
 (Mnemonic: RND) 
 
 I. General Description 
 
 This program calculates a matrix of normally distributed random 
 numbers. An approximation formula is used to normalize uniformly 
 distributed random numbers 
 
 k 
 
 Y = 
 
 i=l 
 
 X i 2 
 
 s/k/12 
 
 where the Xj_are the uniformly distributed random numbers, and Y is the 
 normally distributed number with mean zero (0) and standard deviation 
 one (l) . K is set to 12 by the program. Y is then transformed to the 
 input scale by multiplying by the standard deviation and adding the mean. 
 
 II. Parameters 
 
 Parameter 
 Number 
 
 k 
 
 5 
 
 Use or Meaning 
 
 Input Address of 9 (nine) digit odd integer 
 used as a starting point for the random 
 number generator. CARDS or SEQUENTIAL 1-15. 
 
 Output Address of random numbers matrix. 
 SEQUENTIAL 1-15. PRINT is not valid . 
 
 Mean of random numbers enclosed in asterisks, 
 i.e., *0.0*. 
 
 Standard deviation, i.e., *1.0*. 
 
 Number of rows in output matrix of random 
 numbers. 
 
 6 
 
 7 
 III. Special Comments 
 
 Number of columns in output matrix of random 
 numbers. 
 
 Output Address of 9 digit integer which is 
 finishing point of the random number generator. 
 Do not specify PRINT since number is automatically 
 printed. 
 
 If this program is used with the same integer starting point, it will 
 generate the same numbers. Thus, use Parameter 7 to output the finishing 
 location, and then pass that address as the starting location for the next 
 use of this program. 
 
 V. 
 
 Reference 
 
 IBM System/360 Scientific Subroutine Package (36OA-CM-O3X) Version III 
 page 77. 
 
UTILITY 
 
UTILITY PROGRAM 
 
 I. General Description 
 
 The UTILITY program has been designed to handle small utility functions 
 which do not necessitate or justify the creation of a unique program within 
 the SOUPAC system. The following statements will invoke the UTILITY program. 
 
 UTILITY. 
 
 (insert subparameter card or cards here 
 END P 
 The following sections describe the functions of the various subparameters , 
 
 II. PRESORT Program 
 
 The PRESORT Program is presently the only program in the UTILITY program. 
 It is used to set up the data cards to be input into the IBM SORT/MERGE 
 package which will be executed following the present SOUPAC program and 
 before another SOUPAC program which will use the sorted data for an input. 
 
 SORT (0 or l)(0 or l)(V 1 ), 
 
 (v n ), 
 
 Parameter 
 Number 
 
 Use or Meaning 
 
 if data is to be sorted in ascending order. 
 
 1 if data is to be sorted in descending order. 
 
 if data to be sorted is in single precision. 
 
 1 if data to be sorted is in double precision. 
 
 3 through n <_ 20 
 
 indicates the variable or variables to be sorted 
 with the later variables, if any, varying most 
 rapidly. 
 
 Following the SOUPAC program in which the UTILITY program appears , 
 the following card must appear : 
 
 // EXEC S0UPS0RT,INPUT=Snn,0UTPUT= Smm. 
 where nn and mm represent the two digit equivalent of the sequential unit 
 numbers to be input to the sort and output from the sort to the next SOUPAC 
 program. The two units must not be the same. 
 
 The next card will start the next SOUPAC program which will operate 
 under the assumption that the sorted data has been supplied on the 
 specified sequential unit in the output of the S0UPS0RT program. 
 
 // EXEC SOUPAC, DISP=0LD 
 //SYSIN DD * 
 
 (Your program which uses the sorted data) 
 
UTILITY 
 Page 2 
 
 The example given below is for sorting cards input data so that it 
 may be input into a FREQUENCY program which uses variable 5 as a control 
 variable. 
 
 /*ID identification card information 
 
 // EXEC S0UPAC 
 
 //SYSIN DD * 
 
 MATRIX. 
 
 M0VE ( CARDS )( Si). 
 
 END PR0GRAM 
 
 UTILITY. 
 
 S0RT(O)(1)(5). 
 
 END PR0GRAM 
 
 END S0UPAC 
 
 DATA(10)(10F5.0). 
 
 (user's data deck) 
 
 END # 
 I* 
 
 II EXEC S0UPS0RT,INPUT=SO1,0UTPUT=SO2 
 
 // EXEC S0UPAC ,DISP=0LD 
 
 //SYSIN DD * 
 
 FREQUENCY (S2). 
 
 TW0. 
 
 PER(1)(1)(1). 
 
 C0NTR0L(5). 
 
 END PR0GRAM 
 
 END S0UPAC 
 
 /* 
 
 The data on SI is sorted in ascending order on variable 5- The data is 
 passed to the SOUPSORT job step on SI in double precision. This data is 
 sorted on variable 5 and then output onto S2 in double precision. It is 
 then input into the FREQUENCY program of the next SOUPAC job step, whereupon 
 analysis continues. 
 
 III. Notes, Restrictions, and Ideas 
 
 1. The default output from MATRIX is in double precision 
 
 2. The output from TRANSFORMATIONS is in single precision 
 
 3. Only one utility program is allowed per SOUPAC program 
 
 h. If other sequential units have been used during the first SOUPAC 
 
 program besides the one passed to the sort job step, they are still 
 intact and usuable in the second SOUPAC program due to the DISP=OLD 
 parameters . 
 
*PB-720Q"3' 
 5-IdT 
 
 C 
 
!> PERMA BOUND*!