Ha rclH&Ba 
 
 BMB 
 
 HBgajH 
 
 ■MSSKlClMMi 
 
 Bt88 SSx w8S 
 ■KSra |H 
 
 MB 
 ■ 
 
 fntHMraRS 
 
 ■H Sh EtS 
 ■Mjgittyv n 
 
 ■^■■■■VBlflfSBBcttn 
 
 ■■■BUHHJK39BBGI 
 
 Bra cthto 
 
 ■H BMOTHMW 
 
 ■■■HI —■»»—« 
 ■ Hag Iga KB 
 
 Ban 
 
 hwKSsf BHsh 
 
 h ■n&sa 
 
 lll^^M^Bf— CT1MM 
 
 HBHHHH| 
 
LIBRARY OF THE 
 
 UNIVERSITY OF ILLINOIS 
 
 AT URBANA-CHAMPAICN 
 
 5IO. 84 
 
 lifer 
 
 ho. 800 - 8o*3 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/compilingseriall805leas 
 
Report No. UIUCDCS-R-76-805 
 
 NSF-0CA-MCS73-07980 A03-000020 
 
 
 COMPILING SERIAL LANGUAGES FOR PARALLEL MACHINES 
 
 by 
 
 Bruce Robert Leasure 
 
 November 1976 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 The Library of thft 
 
 University at Illinois 
 t urbana-Cham en 
 
Report No. UIUCDCS-R-76-805 
 
 COMPILING SERIAL LANGUAGES FOR PARALLEL MACHINES* 
 
 by 
 
 Bruce Robert Leasure 
 
 November 1976 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801 
 
 *This work was supported in part by the National Science Foundation under 
 Grant No. US NSF-MCS73-07980 A03 and was submitted in partial fulfillment 
 of the requirements for the degree of Master of Science in Computer Science, 
 November 1976. 
 
1ii 
 
 ACKNOWLEDGMENT 
 
 I would like to thank Rick Schubert for his 
 assistance in implementing the initial version of the 
 compiler. I wo.uld also like to express my appreciation 
 to the growing number of users of the compiler who are 
 patient during these developing stages. 
 
IV 
 
 Table of Contents 
 
 I. INTRODUCTION 1 
 
 II. AN INTUITIVF LOOK 2 
 
 III. THE NEED FOR THESE TECHNIQUES 29 
 
 IV. FOTORE DEVELOPEMENTS 33 
 
 Appendix A - The Structure of the Compiler 35 
 
 Appendix B - Module Descriptions .39 
 
 Appendix C - Data Structures ............89 
 
 LIST OF REFERENCES 93 
 
I. INTRODUCTION 
 
 Much has been written in the past several years 
 about techniques for translating a program written in a 
 serial language (such as FORTRAN) into an efficient 
 proqram for a parallel machine. This paper discusses the 
 use of these concepts in the implementation of a FORTRAN 
 compiler for an idealised target machine. The paper is 
 organized into three sections. First, an intuitive feel 
 for the various concepts and transformations, then a 
 discussion of applicability, advantages, and 
 disadvantages of this type of compilation, and lastly, an 
 outline of future developments. A description of the 
 compiler, its modules, and data structures is given in 
 the appendices. 
 
II. AN IHTpiglYS LOOK 
 
 When executing in a parallel machine, it is often 
 desirable to use as much of the machine a possible at any 
 given time. When compiling a program written in a serial 
 language for a parallel machine, two options are 
 available for achieving this goal: 
 
 1) Recognition of statements that can be done 
 simultaneously for a large number of operands and 
 results. 
 
 2) Discovering a partial execution ordering between 
 statements. 
 
 The first gives a lot of the same thing to do at any 
 given time, and the second gives a lot of possibly 
 different things to do. The usefulness of each depends 
 largely on the type of target machine. 
 
 Unfortunately, any algorithm for recognizing, from 
 FORTRAN source text, operations that can be executed 
 simultaneously for a large number of operands and results 
 must consider the partial ordering implied by the 
 programmer through statement orderings and subscripts. 
 The task of recognizing these simultaneous aspects of a 
 
program is divided into four distinct subproblems, the 
 first of which is by far the most difficult: 
 
 1) Recognition of the partial ordering between 
 statements of the serial program, including 
 orderings between statements on different iterations 
 
 of DO loops. 
 
 2) Recognition of cycles in the partial ordering. 
 
 3) Dse of the partial ordering and the cycles to 
 recognize array operations. 
 
 4) Transformation of all cycles, which are recurrences, 
 into some form efficiently processable on the target 
 machine . 
 
 The algorithms used in this compiler were described 
 in rather complex formulations in the original works [C1, 
 D1, M, S1, T1 ]. Fortunately, the implementation of the 
 theory is relatively straightforward. For the details of 
 each algorithm, see the references included with each 
 section. 
 
REQUIREMENT 
 
 In any compilation transformation, there is one 
 basic requirement that mast be met. The results of the 
 program before the transformation must be the same as the 
 results of the program after the transformation. Some 
 transformations exist which decrease or increase the 
 roundoff error associated with a particular type of 
 calculation. All of the transformations in this paper, 
 excluding the recurrence transformations, are result 
 preserving [M1]. The recurrence transformations disturb 
 the roundoff error associated with the calculation. For 
 further details of the errors associated with recurrence 
 calculations, see [C1]. 
 
 PAPTIAL ORDERING 
 
 In a program written for a serial machine, the order 
 of execution of the statements (ignoring transfers of 
 control) is in the linear order specified by the 
 programmer. It is quite obvious that most programs would 
 generate the same results if the statements of the 
 program were carefully permuted. 
 
1 
 
 A=1 
 
 2 
 
 B=37 
 
 2 
 
 B=37 
 
 1 
 
 A=1 
 
 3 
 
 C=A*B 
 
 5 
 
 D=A+8 1 
 
 a 
 
 B=11 
 
 3 
 
 C=A+B 
 
 5 
 
 D=A*81 
 
 U 
 
 B=11 
 
 6 
 
 E=A+B 
 
 6 
 
 F=A*B 
 
 Both of the programs above generate the same values for 
 the variables A, B, C, D, and E. Since the ordering 
 specified is not unique, a partial ordering seems to be 
 all that is required. A valid partial ordering is then 
 one that always yields values that are the same as the 
 original ordering. Reading < as "must precede", consider 
 the valid partial orderings given here: 
 
 1<2<3<JK5<6 
 
 2<1<5<3<4<6 
 
 1<3<U<6, 1<5, 2<3<4<6 
 
 A chain in a partial ordering is a group of statements 
 for which a complete linear ordering is specified. The 
 rrinimal valid partial ordering is the valid partial 
 ordering with the shortest longest chain of orderings. 
 The recognition of the minimal valid partial ordering is 
 equivalent to finding the minimum number of steps in 
 which a program could be computed, given an arbitrary 
 
number of computing elements and an ideal data switch. 
 Also, even if only one, or a portion of one, statement 
 can be executed at a time on a given machine, a partial 
 ordering allows the compiler to choose which statement to 
 generate next. 
 
 PAfllAIi ORDERINGS arid ITERATION BOOflDARISS 
 
 Consider the following program: 
 
 DO 20 I = 1, 10 
 10 B(I) =1 
 20 A (I) = B(3>1) 
 
 A simultaneous version of each assignment statement in 
 this program can easily be constructed. 
 
 DO 10 I = 1, 10 
 10 B(I) = I 
 
 DO 20 I = 1, 10 
 20 A(I) = B(I+1) 
 
 The problem comes in determining if there exists a valid 
 partial ordering of the two statements in the 
 simultaneous program. In the serial version, each 
 
element of A preserves an old value of B and the elements 
 of B are assigned consecutive integers. If the serial 
 ordering is preserved in the simultaneous program, B has 
 the same value as in the serial program, but A has 
 consecutive integers also. If the opposite ordering is 
 assumed, then the values generated in the simultaneous 
 program are the same as the values generated in the 
 serial program. If no ordering is assummed, on a •small' 
 machine one would have to be selected, and if the wrong 
 choice were made, the answers would not agree. The 
 reason for the valid partial ordering being the inverse 
 of the serial ordering can be easily seen if the DO loop 
 is written out, rather than being expressed iteratively. 
 
 10 B(1) =1 iteration 1 
 
 20 A(1) = B(2) 
 
 10 B (2) =2 iteration 2 
 
 20 A (2) = B(3) 
 
 Statement 20 in iteration 1 uses a value that is modified 
 by statement 10 in iteration 2. Hence, statement 20 
 (simultaneous version) must be executed prior to 
 statement 1° (simultaneous version) . It should be noted 
 that the existance of partial order ings across iteration 
 boundaries does not reguire simply the inversion of the 
 ordering in the serial program. 
 
DO 10 I r 1, 10 
 30 A (I) = B(I*1) 
 UO B(I) =1 
 
 CYCLES and PARTIAL ORDERING 
 
 A cycle in a partial ordering is a collection of 
 statements which are related by a cycle in the directed 
 graph of the precedes relation. 
 
 4<3<2<5<3<1 (statements 3, 2, and 5) 
 
 2<2 (statement 2) 
 
 A cycle may occur only when some form of iteration (IP 
 loops, DO loops, etc.) encloses the statements involved 
 in the cycle, and each statement within the scope of the 
 iteration is considered to be executed simultaneously for 
 all iterations. 
 
DO 20 I = 2, 10 
 10 B(I) = A (1-1) 
 20 A (I) = B(I-1) 
 
 10 B (2) = A<1) iteration 1 
 
 20 A(2) = B(1) 
 
 10 B(3) - A (2) iteration 2 
 
 20 A (3) = B(2) 
 
 Statement 10 in iteration 1 must precede statement 20 in 
 iteration 2. Similarly, statement 20 in iteration 1 must 
 precede statement 10 in iteration 2. This results in 
 10<2C, and 20<10, which is a cycle. Cycles may contain 
 any number of statements. 
 
 DO 10 I = 2, 10 
 10 D(I) = 0(1-1) 
 
 D(2) = D(1) iteration 1 
 D (3) = D(2) iteration 2 
 
 Iteration one must be executed before iteration two. 
 This statement is a cycle, because it must precede 
 itself. 
 
10 
 
 From these examples, it seems rather obvious that 
 cycles in a partial ordering imply that fetching all of 
 the operands, performing all of the operations, and 
 storing all of the results (in that order) will not 
 generate the same values as the serial program for those 
 statements contained in the cycle if the entire index set 
 is done simultaneously. The statements involved in a 
 cycle in a partial ordering constitute a recurrence* 
 Extensive work has been done on fast, accurate, and 
 efficient methods of calculating the results of 
 recurrences [C1, S1 ]. A few of the simpler methods will 
 be presented in the code generation section of this 
 paper. 
 
 STANDARDIZATION TRANSFORMATIONS 
 
 In order to make the calculation of the partial 
 ordering as simple as possible, all of the index sets 
 discovered by the compiler (DO indices, induction 
 variables, etc.) are normalized to a starting value of 
 zero and an increment of one. This necessitates the 
 substitution of an expression using the new index 
 variable for the old index variable for the entire scope 
 of the index assigned. Also, if the old index variable 
 is used outside of the scope of the index set, then the 
 
11 
 
 correct value of the old index variable must be set upon 
 exiting from the scope of the index set. 
 
 DISCOVEBY TRANSFORMATIONS 
 
 In order to recognize a minimal valid partial 
 ordering, the identification of the ordered set of 
 elements involved in the operations is imperative. In 
 FORTRAN, this involves collection of information 
 concerning subscripts. Since subscripts are combinations 
 of scalar integer variables and integer constants in the 
 large majority of cases, the probleir reduces to 
 discovering as much as possible about scalar integer 
 variables. Several items are easily recognizable: 
 
 1) variables defined in a DATA statement and not 
 changed 
 
 2) variables assigned expressions of known scalar 
 integer variables and constants 
 
 3) variables representing index sets 
 
 Of course, when information concerning the value of 
 variables is collected at compile time, the scope of the 
 
12 
 
 value of the variable must be considered [A1 ], 
 
 ESTABLISHING A NEAR MINIMAL VALID PARTIAL ORDERING 
 
 A pair of statements A and B must be related by 
 A < B if statement A is executed before statement B in 
 the serial program and if at least one of the following 
 is true: 
 
 1) Statement A modifies the value of a variable that 
 statement B uses to generate a result. 
 
 C = D statement A 
 
 E = C statement B 
 
 2) Statement B modifies the value of a variable that 
 statement A uses to generate a result. 
 
 C = D statement A 
 
 D = P statement B 
 
 3) statement A and statement B both modify the value of 
 the same variable. 
 
 C = D statement A 
 
 C « p statement B 
 
 To construct a valid partial ordering, all pairs of 
 
13 
 
 statements must be tested for satisfying these 
 conditions. If a statement is involved in an iteration, 
 then each occurrence of that statement must be tested 
 with each occurrence of every statement within the scope 
 of the iteration (including other occurrences of itself) • 
 For a program that does a large amount of iteration, this 
 exhaustive method is extremely expensive in compilation 
 time. A near minimal solution that is inexpensive is 
 obviously desirable, 
 
 A near minimal solution can be achieved by 
 successive approximation. First, each statement is 
 tested against only those statements that can follow it 
 in execution on a serial machine. If a statement is 
 inside the scope of an iteration, then it is tested 
 against every statement inside the scope of the iteration 
 (including itself for tests 1 and 2). To make the tests 
 guick and simple, only variable names are used (not 
 subscript information). The complexity of the tests and 
 the number of tests is kept small by treating each 
 occurrence of a statement from an iteration as identical 
 references to the entire array. 
 
14 
 
 statement 
 
 1 
 
 
 READ(1,1) B 
 
 
 2 
 
 
 A = B ♦ C 
 
 
 3 
 
 
 M = 21 
 
 
 U 
 
 
 DO 40 I = 2, M 
 
 
 5 
 
 
 D(I) = A 
 
 
 6 
 
 
 E(D = D(I) 
 
 
 7 
 
 
 P(I) = F(I-1) 
 
 
 8 
 
 
 G(I) = H(I-1) 
 
 
 9 
 
 
 H(I) = G(I-1) 
 
 
 10 
 
 40 
 
 CONTINDE 
 
 
 11 
 
 
 WRITE(2,2) H, G, A 
 
 1<2<5<11 3<U<8<9<11 9<8 
 U<5<6 U<7 7<7 6<5 
 
 This approximation to the minimal valid partial 
 ordering is minimal where subscript information does not 
 exist, and is always conservative (ie«, always contains 
 the minimal valid partial ordering as a subset) in those 
 cases with subscripts. In fact, two statements will be 
 ordered by this approximation and not by the minimal 
 valid partial ordering only if one of the statements has 
 a result variable that is present in the other statement, 
 but the subscripts expressions are such that the two 
 occurrences of that single variable form nonintersecting 
 sets. Many people in that past have considered this to 
 
15 
 
 be a sufficiently close approximation to the minimal 
 valid partial ordering [B1]» Because of the work done by 
 [K2], a closer approximation to the minimal valid partial 
 ordering seems desireable. 
 
 The second approximation is to check some limited 
 amount of subscript information on those cases whose 
 variable names compare and subscript information is 
 present. \ given variable generally has as many 
 subscript expressions as it has dimensions. If the 
 subscript expression for dimension I of one occurrence of 
 the variable will never equal the subscript expression 
 for dimension I of the other occurrence of the variable, 
 then the two statements are not related by the < relation 
 (ie. , independent) with respect to the interaction of 
 these two occurrences of this variable. 
 
 A (I,3,K) = 
 
 - MI,0,K) 
 
 Since 3 is not equal to 4, the two statements are 
 independent with respect to these two occurrences of the 
 variable A. The situation can become much more complex. 
 
16 
 
 A (I, J) = 
 
 = A (I,K*J) 
 
 Depending upon the value of K, the two statements may, or 
 may not be independent with respect to the variable A, 
 If a range of values can be determined by one of the 
 discovery transforms, then a determination can be made. 
 Otherwise, the < relation must be assummed between the 
 two statements. By checking the subscripts 
 independently, minimality is approached, but still not 
 achieved. 
 
 DO 10 I =1, 20 
 
 A(7*I,T) = 
 
 = A(I,3*I) 
 
 10 CONTINUE 
 
 A (7,1) = ... 
 
 ... = A (1,3) 
 
 A(21,3) = . .. 
 
 ... = A (7,21) independent 
 
17 
 
 Even checking only one subscript at a time, the general 
 solution is extremely difficult without enumeration. The 
 second approximation must be even more limited. Only 
 specific types of subscript expressions are considered. 
 A subscript that can be expressed as a linear combination 
 of index set variables (Xi) and known values (Ci) is 
 called a linear subscript. 
 
 CC ♦ C1*X1 ♦ C2*X2 ♦ .. . 
 
 The detection of intersecting subscript expressions 
 becomes considerably simpler if only linear subscripts 
 are considered, and a large majority of subscripts appear 
 to fall into this catagory. If an arbitrary number of 
 variables associated with index sets is allowed in each 
 subscript, the problem is still very difficult to solve 
 without enumeration. For this reason, only cases 
 involving one index set per subscript are currently being 
 considered. Plans are made to consider two index sets 
 per subscript at some future time. 
 
 In short, the second approximation is applied only 
 if the first approximation has found an ordering to be 
 implied by a variable that occures in both statements 
 with subscripts. Then each subscript pair (corresponding 
 positions from each occurrence) is checked to insure 
 linearity and that at most one index set is used in the 
 
18 
 
 subscript expression foe some dimension. In those cases 
 where all of these conditions are met, a further test is 
 made. Otherwise, the < relation is assummed (a 
 conservative assumption) . For the details of this test, 
 see the appendix section labeled DEPEND. Plans are being 
 made to experiment with other tests to obtain an 
 understanding of the complexity and effectiveness of 
 calculating a partial ordering. 
 
 REDUCTION OF A PARTIAL ORDERING 
 
 The length of the chains in a partial ordering may 
 be reducable by the technigue of forward substitution. 
 If statement A modifies a variable that statement B uses 
 and statament A is executed before statement B in the 
 serial program, then the partial ordering will show that 
 A < B. This relation can be removed from the partial 
 ordering by substituting the expression for the variable 
 that is modified by statement A into the occurrences of 
 that variable in statement B. 
 
 A = B(I) ♦ C * D before 
 
 F * A * D ♦ H 
 
19 
 
 A = B(I) ♦ C * D after 
 
 P = (B(I) ♦ C * D) * D ♦ H 
 
 With subscripted variables, the problem becomes slightly 
 more complex, as the values of subscripts may have to be 
 ad-Justed. 
 
 A (I) = B ♦ C (J-I) before 
 
 F(I) = A(I) ♦ A (1-3) 
 
 A (I) ■ B ♦ C(J-I) after 
 
 F(I) = B ♦ C(J-I) ♦ B ♦ CfJ-X+3) ?? 
 
 The substitution of subscripted variables inside the 
 scope of an index set raises special problems because of 
 boundary conditions (as seen in the previous example) • 
 Because of the difficulties involved, the benefits must 
 be examined to see if the gains are worth the trouble. 
 Forward substitution reduces the number of < relations in 
 the oartial ordering while increasing the computation 
 time for individual statements. If each individual 
 statement describes a computation that will saturate the 
 target machine, then forward substitution will result in 
 a program that is more parallel, but will execute slower. 
 Because of this, forward substitution should only be 
 considered when the statement into which the substituting 
 
20 
 
 is being done will not saturate the target machine. For 
 the purposes of this compiler, any statement inside the 
 scope of an index set is assummed to be able to saturate 
 the target machine. This assumption is reasonable on 
 currently constructable machines (a very finite number of 
 processors). Because of this, forward substitution is 
 only done outside the scope of an index set. 
 
 IDENTIFICATION OF CYCLES IN T.HJ PARXIAL ORDERING 
 
 Once a valid partial ordering is established as the 
 approximation to the minimal valid partial ordering, all 
 of the cycles in the partial ordering (recurrences) must 
 be found. This implies that all of the transative 
 relations must be filled in. 
 
 A<B B<C ==> k<C 
 
 Since the statements of a program can be viewed as 
 vertices of a directed graph, and the < relation as a 
 directed edge, the cycles can be found by one of the 
 standard methods of finding all of the paths of all 
 lengths on a directed graph [f1]« The cycles are then 
 identified by looking for maximal totally connected 
 subgraphs (PI blocks) of the directed graph [B1]. k new 
 
21 
 
 directed graph can be constructed using the PI blocks as 
 vertices and the edges that are unigue between PI blocks 
 as edges in the new graph. This graph represents the the 
 partial ordering of the PI blocks. Each PI block 
 represents the smallest set of statements that can be 
 executed by itself for an entire index set. There are 
 two types of vertices in this new graph, those with self 
 loops, and those without. A vertex without a self loop 
 represents a statement that may be executed 
 simultaneously for all values of the index set. A vertex 
 with a self loop represents a set of statements that form 
 a recurrence relation. 
 
 CONDITIONAL STATEMENTS 
 
 The methods presented, to this point, work very well 
 as long as no conditional transfer of control takes 
 place. When a conditional statement is encountered, 
 there are three possibilities: 
 
 1) Do nothing and take your lumps by assumming no 
 overlap between successively executed portions 
 
 of the program. 
 
 2) Attempt to compute as few conditional statements 
 
22 
 
 as possible by combining conditional statements. 
 
 3) Attempt to remove conditional statements from 
 the scope of index sets. 
 
 It is advantageous to execute as few conditional 
 statements as possible on almost any machine with some 
 type of overlap. This advantage is magnified by 
 pipelining, and by true parallel processing. The basic 
 problem is that when a conditional statement is 
 encountered in the execution stream, the next section of 
 code to be executed is not known. This implies that all 
 statements that are executed on some, but not all, of the 
 branch paths out of a conditional statement must wait 
 (have a < relation) for the completion of the conditional 
 statement prior to becoming eligible for execution. In 
 some cases, this delay is so large as to allow the 
 computation of all paths of a conditional statement 
 during the delay. With the use of the IF tree 
 transformations [D1], all or a portion of the 
 calculations involved in the various paths from the 
 conditional statement may be overlapped with the 
 computation of the conditional. Thus, when the correct 
 branch is selected, only store or move operations need to 
 be done. 
 
23 
 
 Inside the scope of an index set, the problem is 
 magnified by the size of the index set, and by the 
 partial orderings that exist across iteration boundaries 
 that may (conditionally) be introduced. While it would 
 he highly desirable to remove all conditional statements 
 from the scope of an index set by some automatic means, 
 presently available techniques only allow for the removal 
 of a large percentage [T1]. The types of conditional 
 statements that can be easily removed are separatable 
 into three classes. First, if the conditional statement 
 does not have any < relations with any statements within 
 the scope of the index set, and is not dependent upon the 
 index set. This means that the branch of the conditional 
 statement to be executed will be determined prior 
 entering the scope of the index set. The program can 
 then be rewritten by replicating the common code between 
 the various cases, inserting the special code for each 
 possibility , and placing a slightly revised conditional 
 statement outside the scope of all of these index sets. 
 
 Do ar I = 1, 10 before 
 
 A (I) = B(I) 
 
 IF ( C .GT. ) GO TC 20 
 D(I) = 17 
 GO TO 40 
 20 D(I) = 11 
 
24 
 
 40 CONTINUE 
 
 IF ( C .GT. ) GO TC 25 after 
 
 DO 20 I = 1, 10 
 
 A (I) = B(I) 
 
 D(I) = 17 
 20 CONTINUE 
 
 GO TO 45 
 25 DO U0 I = 1, 10 
 
 A (I) = B(I) 
 
 D(I) = 11 
 40 CONTINUE 
 45 CONTINUE 
 
 Secondly, if the conditional statement has a < 
 relation with the index set creator, then mode patterns 
 must be generated, so that each of the new index sets may 
 be executed for the correct values. Exactly one of the 
 index sets (when modified by the appropriate mode 
 pattern) will contain a given value from the original 
 index set. When the conditional statement has a < 
 relation with statements common to all possible execution 
 paths through the conditional, and/or has a < relation 
 with statements in at most one of the possible execution 
 paths, then this conditional may be removed by the use of 
 mode patterns also. 
 
25 
 
 Thirdly, for all remaning conditional statements, 
 there is a test for the complexity of the 
 inter dependencies represented by the < relations between 
 the various execution paths and the conditional statement 
 [T1]. This test was deemed too complex to implement in 
 the initial version of the compiler. For this type of 
 conditional, the PI block containing the conditional 
 statement is marked for serial execution. 
 
 CODE GENERATION 
 
 At this point, the original program has been 
 transformed into a set of partially ordered statement 
 qroupings. Por each of the statement groupings (PI 
 blocks) , code is compiled independently of the partial 
 orderings. If the PI block has no self loop in the 
 partial ordering, then it represents some type of scalar 
 operation performed on a possibly large number of 
 elements. When a PI block has a self loop, it represents 
 a recurrence. 
 
 Because of the three address machine chosen for the 
 target machine, and the transformation that changes the 
 original FORTRAN program into statements having only one 
 
26 
 
 operation, at most two operands, and one result, code 
 generation for PI block without self loops is extremely 
 trivial. Subscript information, if present, is extracted 
 for the operands and the result. Similar information is 
 collected for the upper bounds of all of the index sets 
 (recall that the lower bound is always zero, and the 
 increment is always one) . The operation being performed 
 is translated to the appropriate machine code. (Builtin 
 functions of the serial language are assummed to be 
 implemented as a single instruction in the target 
 machine.) The entire collection is then packaged with the 
 operand and result addresses, to form one, rather large, 
 machine instruction. Quite obviously, conventional 
 machine code could be generated at this point also. 
 
 Recurrences are a little more complex. h test is 
 performed first to see if the recur rnce is linear. A 
 linear recurrence is a recurrence which can be described 
 by 
 
 n-1 
 Xn = Cn ♦ SUM ( Bi * Xi ) 
 i=0 
 
 At present, if a recurrence fails this test, then 
 explicit code is generated to execute the recurrence in a 
 serial manner. Certain improvements in this approach are 
 
27 
 
 possible with in current knowledge (see future 
 developments) • When the recurrence is linear, the set of 
 assignment statements that form the recurrence are 
 converted to a lower triangular banded matrix for 
 calculation by the fastest known method [C1, S1], This 
 method is intuitive for simple cases. 
 
 DO 10 I = 1, 10 
 Z(I) = 3 * Z(I-1) ♦ C (I) 
 10 CONTINUE 
 
 By writing out all 10 equations, and ignoring little 
 details like array bounds, a set of linear equations 
 suitable for the backwards substitution portion of 
 Gaussian elemination is obtained: 
 
 Z(1) = C(1) ♦ 3*Z (0) 
 
 Z(2) = C(2) +0 ♦ 3*Z(1) 
 
 Z(3) = C(3) +0 ♦ ♦ 3*Z(2) 
 etc. 
 
 This can be written in the matrix form z = A * z *• c. 
 The target machine is assumed to have an instruction to 
 compute recurrences that can be expressed in this form. 
 
28 
 
 The appropriate values of A and c must be collected. 
 If constants are used as coefficents in the recurrence, 
 then these elements can be calculated at compile time. 
 When variables are used, the A and c operands must be 
 seeded at run time with the appropriate values. This 
 necessitates the generation of scalar operations whose 
 result and one operand are A or c. At most one such 
 instruction per statement involved in the recurrence is 
 needed to fill c, while at most one instruction per right 
 hand side variable (which is involved in the recurrence) 
 is needed to initialize A. Once the result of the 
 recurrence is calculated, the results must be dispersed 
 to the appropriate left hand side variables. This 
 reguires one instruction per left hand side variable. 
 (If only one left hand side variable exists, then this 
 step is not necessary.) 
 
29 
 III. THE NEED FOR THESE TECHNIQUES 
 
 As truely parallel machines become more and more 
 practical and available, a large volume of programs that 
 are currently coded in a serial language will have to be 
 converted to a more parallel form. While there are 
 several choices on the conversion question, almost all of 
 them are prohibitively expensive: 
 
 1) have an analyst construct parallel versions of the 
 serial algorithms. 
 
 2) run the program in its serial form on the parallel 
 machine . 
 
 3) automatically convert the serial version to a more 
 parallel form. 
 
 Eecause of the cost of computational resources and 
 personnel, the only viable alternative is automatic 
 conversion. The compilation techniques used in this 
 compiler are not only result preserving, but produce 
 efficient code, in most instances, for parallel machines. 
 
 At first glance, it appears that the translation of 
 a program in a serial language to a program in a parallel 
 form would be useful only on machines that have vector 
 operations like STAR or ILLIAC IV. While the parallel 
 
30 
 
 version of the serial program is guite useful for 
 machines such as these, a rather large number of 
 applications exist for more conventional machines. One 
 way to look for advantages of the parallel version of a 
 program over the serial version is to consider the two 
 ma-jor differences: 
 
 1) At any point in a parallel program, at least one 
 statement is available for execution. At any point 
 in a serial program, only one statement is available 
 for execution. 
 
 2) Index sets in a parallel program contain only one of 
 the smallest set of statements that must be executed 
 together. Index sets in a serial program contain at 
 least one of these smallest units. 
 
 These distinctions seem rather trivial for a serial 
 machine until the individual components of a particular 
 serial machine is examined. If the machine has 
 independent functional units like the CDC6600, the wider 
 the choice of instruction orderings available to the 
 compiler, the more efficient that the generated code can 
 be. If the machine has a small instruction cache, then 
 the smallest possible loop has a better chance of fitting 
 within the cache. If the machine has a memory structure 
 designed to reference contiguous locations efficiently 
 
31 
 
 (interleaving, interlacing, multiword fetches, etc.), 
 then the fewer the number of operand streams existing 
 within a loop, the easier the storage assignment problem 
 is for utilization of these memory structures. 
 Scheduling a program on a multiprocessor system such as 
 C.mmp becomes trivial, as each PI block could be assigned 
 to its own processor as it becomes eligible for 
 execution. Paging behavior of a program is also improved 
 by the localization of the instruction and data 
 references. 
 
 Some standard compilation transformations are also 
 easily obtainable from the parallel version of a program. 
 If a calculation precedes no other calculation, then it 
 is redundant, and may be pruned from the execution graph. 
 If a computation is inside the scope of an index set and 
 does not use the variable associated with the index set, 
 and is not a recurrence, then the index set may be 
 deleted (equivalent to moving a statement outside the 
 scope of an index set) . 
 
 The disadvantages of the parallel form of a program 
 written in a serial langage are derived from the same 
 areas as the advantages. By making the statements within 
 the scope of an index set as few as possible, the number 
 of index sets is (possibly) increased. This means that 
 there are more incrementing and testing operations to be 
 
32 
 
 done to complete the execution of the entire program than 
 in the serial form. By doing forward substitution, more 
 computations must be done to achieve the same result. So 
 the penalty that is paid for the increased parallelism is 
 a little more work that must be done. For some machines, 
 the advantages easily outway the disadvantages, and for 
 others, it the decision is made just as easily in the 
 other direction. At present, a signifigant number of 
 machines appear to be on the borderline. As machines 
 become more and more parallel, the translation technigues 
 used in this compiler will become more and more useful. 
 
33 
 
 IV. POTURE DEVELOPMENTS 
 
 There are several areas in which the algorithms used 
 in this compiler can be improved: 
 
 1) In the area of recurrences, the cutting, 
 folding and splitting theorems [C1] would 
 improve efficiency of the resulting code. 
 
 2) The approximation to the minimal valid partial 
 ordering can be improved by treating each of 
 the three ordering conditions seperately [K1]. 
 
 3) The efficiency of conditional statements inside 
 the scope of index sets may be improved by a 
 more complete implementation of the IF removal 
 theorems [ T1 ]. 
 
 4) The conditional statements outside the scope of 
 all index sets could be handled by the method 
 of IF trees ( D1 ]. 
 
 5) More index sets could be found by recognizing 
 induction variables. 
 
34 
 
 6) Special types of recurrences could be 
 recognized for vhich an efficient method of 
 computation is well known [H1]» 
 
 7) COMHON and EQUIVALENCE statements must be 
 considered* 
 
 8) Expand some subroutines in line. 
 
 Work is presently in progress in the first four areas, 
 
35 
 
 APPENDIX A 
 The Structure of the Compiler 
 
 Because the compiler was designed as a test frame 
 for compilation techniques, modifiability was of prime 
 importance, as was the ability to select which 
 transformations should be applied during a particular 
 compilation. For this reason, an extremely modular 
 compiler was designed with all transformations modifying 
 the same data structure. The algorithms used in each of 
 the modules is outlined in appendix B. The major data 
 structures are described in appendix C. A large set of 
 option switches was implemented for controlling 
 transformations and the volume of output. 
 
 Programs are first processed by a lexical analyzer 
 (routine SCANNER) into a tokenized version of the 
 program. The output of the lexical analyzer is a symbol 
 table, a brief description of each statement of the 
 program, and a detailed token by token representation of 
 those statements in the program that have a variable 
 number of items associated with their. These three 
 structures (SYMTAB, PROGRAM, and DETAIL) are passed to 
 the rest of the compilation process by a disk file. A 
 program need only be lexically analyzed once. 
 
36 
 
 When the compiler proper (routine MASTER) recieves 
 control, the current compiler options (which 
 transformations, outputs desired, etc.) are initialized 
 (routine OPTIONS) . Then the disk file of the tokenized 
 program is read (routine BEADIN) . A POBTBAN version of 
 the tokenized program is printed (routine UNSCAN) • The 
 compiler then gives the user the opportunity to change 
 the value of any scalar integer variables by inserting 
 DATA statements into the program (routine CCABDS) • 
 
 The index sets of the program are located and 
 normalized to run from zero to some upper bound with an 
 increment of one (routine DONIFS) . All known expressions 
 of index sets and constants are substituted, following 
 standard scope rules (routine DONIFS) . A FORTBAN version 
 of the transformed program is printed (routine UNSCAN) . 
 The compiler user has the opportunity to redefine the 
 upper bounds of index sets by inserting DATA statements 
 (routine CCARDS) . (Note that the magnitude of the upper 
 bound of an index set may make considerable difference in 
 the partial ordering.) The subscript expressions and 
 index set upper bounds are converted (if possible) to a 
 parenthesis free representation by the use of 
 distribution and other basic algebraic identities 
 (routine INDEX) . A FORTRAN version of the transformed 
 program is printed (routine ONSCAN) . 
 
37 
 
 For every contiguous block of statements outside of 
 the scope of all index sets, and containing no branch 
 target, statement forward substitution is done to 
 increase the width of the expression evaluation trees and 
 to eleminate partial ordering restrictions within the 
 block. A partial ordering is developed (routines IOLISTS 
 and DEPEND) for the block, and statement level 
 substitution done between dependent statements (routine 
 FORWARD) . A FORTRAN version of the transformed program 
 is printed (routine UNSCAN) • 
 
 When selected by the user, conditional statement 
 removal is done by calling routine IFREMOV. IFREMOV 
 removes the conditional statements from the scope of 
 index sets, if possible. IFEEMOVE uses routines SEGMENT, 
 IOLISTS, and DEPEND. The routine IFTREE is called, at 
 user option, to combine all conditional statements 
 outside the scope of index sets. Routine IFCLUB is 
 called to remove all remaining conditional statements 
 from the program by assuming the worst possible type of 
 overlap — none. IFCLUB does this fcy breaking the 
 program into segments of straight line code. Routine 
 SEGMENT is used to locate all of these segments. 
 
 The program is then converted to three address 
 FORTRAN code (ie., one result and at most two operands 
 per assignment statement) (routine INDEX) . A FORTRAN 
 
38 
 
 version of the transformed program is printed (routine 
 UNSCAN) . Except for some minor differences in DO loop 
 bounds, the tokenized version of the program could have 
 come directly from the lexical analyzer. The original 
 FORTRAN program could have been coded in this form. 
 
 Now, the traditional compilation process begins 
 (routine COMPILE) • A list of variables used by each 
 statement is built along with a list of variables 
 modified by each statement (routine IOLISTS) • Then a 
 irinimal valid partial ordering is approximated (routine 
 DEPEND) • All of the transative orderings are discovered, 
 and maximal totally connected subgraphs (PI blocks) of 
 the program graph are located. A partial ordering 
 between the PI blocks is deduced from the ordering in the 
 program graph (routine PIPART) . Code is then generated 
 for each PI block (routine GENER) . The tokenized version 
 of the FORTRAN program (after all transformations) is 
 then output along with the code that was generated 
 (routine WRITEIT) . 
 
39 
 
 APPENDIX B 
 
 Individual Module Descriptions 
 
40 
 
 CCARDS 
 
 FONCTION.1 
 
 To define scalar integer variables (as if coded in DATA 
 statements) 
 
 METHOD ; 
 
 A CHARACTER (6) VARYING value is read from the file 
 CONTROL. If the value is ***EOF, the CCARDS routine 
 returns control to the calling procedure. Otherwise, 
 this value is looked up in the symbol table to see if it 
 is the name of a scalar integer variable. If so, a FIXED 
 BTNRFY(15) variable is read from the file CONTROL, and 
 the appropriate symbol table entry is updated. 
 Appropriate error messages are output when any unexpected 
 condition is met. The method is then repeated. 
 
 PARAMETERS: 
 
 PRFLAG - the debug print flag - BIT (1) 
 
41 
 CONDITIONS :_ 
 
 ENDFILE (CONTROL) - equivalent to reading ***EOP 
 UNDFFINFDFILE (CONTROL) - equivalent to reading ***EOF 
 CONVERSION - the value causing the error is ignored 
 
 MESSAGES.: 
 
 CCAPDS CALLED 
 
 CCARDS UNABLE TO FIND VARIABLE "XXXXXX" 
 
 CCARDS VARIABLE "XXXXXX" IS NOT A SCALAR INTEGER 
 VARIABLE, AND CANNOT BE INITIALIZED 
 
 CCARDS VARIABLE "XXXXXX" HAS VALUE ### 
 
 CCAPDS CONVERSION ERROR ON INPUT. CHARACTER "X" IS IN 
 ERROR. SKIPPING TO NEXT RECCGNIZABLE INPUT 
 
 CCARDS PFTURNS 
 
42 
 
 COMPILE 
 
 FONCTION.1 
 
 The direction of the code generation process for a 
 program segment. 
 
 METHOD :_ 
 
 The input/output list is built for the program segment 
 using the routine IOLISTS. A particular DO loop 
 parallelization is selected from the file DOPARAL if 
 FLAG.DO_LIST is true. Otherwise, the all parallel case 
 is selected. The data dependency graph is built using 
 the routine DEPEND, and this graph is partitioned using 
 the routine PIPABT. The actual code is then generated by 
 calling the routine GENEB. The compiled code is then 
 written to the various output files by calling the 
 routine tfRITEIT. If FLAG.PERMOTE and PLAG.DO_LIST are 
 both false, then COMPILE returns. If PLAG. PERMUTE is 
 true, and PLAG. DO_LIST false, the next parallelization in 
 the permutation seguence is selected. If FLAG.DO_LIST is 
 true, a DO loop parallelization is selected from the file 
 DOPARAL. The method is repeated until all permutations 
 are exhausted, or until a negative parallelization is 
 encountered on file DOPARAL. 
 
43 
 PARAMETERS : 
 
 None 
 
 CONDITIONS: 
 
 FNDFILE (DOPARAL) - causes FLAG.DO_LIST to be set false 
 
 UNDFFINEDFILE(DOPARAL) - causes FLAG.DO_LIST to be set 
 
 false 
 
 MESS^GESj. 
 
 COMPILE CALLED PROGRAM_PTR = ### 
 
 C0I1PILF LIST SPECIFICATION IMPOSSIBLE. NO FILE DOPARAL . 
 (3NC0DE=###) DEFAULT PARALLELIZATION ASSUMMED. 
 
 COMPILE END OF FILE ON DOPARAL. MISSING NEGATIVE 
 PARALLELIZATION. DEFAULT PARALLELIZATION 
 ASSUMMED. 
 
 COMPILE LIST PARALLELIZATION # # # # I 
 
 COMPILE PERMUTE PARALLELIZATION # # # # # 
 
 COMPILE DEFAULT PARALLELIZATION # # t # #...... 
 
 COMPILE RETURNS 
 
44 
 
 DEPEND 
 
 FUNCTION: 
 
 To identify all of the "must precede" ( < ) relations 
 specified by the statement orderings and subscript 
 information. 
 
 METHOD: 
 
 First, the number of statements and the number of index 
 sets present in the program are counted. The scope of 
 each of the index sets is saved for later use. Every 
 input variable is then compared with every output 
 variable, and every output variable is compared with 
 every output variable to check for data dependence, if 
 the statements containing the variables being compared 
 already are related by < relations in both directions (a 
 cycle exists), then the tests is skipped for that 
 combination of variables. 
 
 When two variables are being compared, the variable names 
 are checked for agreement first. If this test is passed, 
 subscripts (if present) are checked next. Each subscript 
 position (ie., for each dimension) is checked 
 independently. First, the subscript is checked for 
 linearity (see IOLISTS) . If the subscript is not linear. 
 
45 
 
 then the worst case must be assummed (a cycle if inside 
 the scope of an index set, otherwise, the linear ordering 
 portion of the cycle) . If the subscripts are linear, 
 then data is collected to allow the classification of the 
 subscript expression into types for ease in testing for 
 dependence. Two simple tests are done on all types of 
 subscripts to quickly determine if a dependence is 
 possible: 
 
 1) The GCD test from integer number theory [N1]- 
 
 2) The real root in the interval test. 
 
 If only constants are invloved in these subscript 
 expressions, then special case is invoked. This case 
 is exact, and very simple, for all it does is check to 
 see if the constants in both subscripts are the same. If 
 the subscripts contain only one index variable between 
 them, then special case 1 is invoked. This case simply 
 enumerates all possible cases for the coefficents 
 involved and the ordering of the variables. While not a 
 pretty test, this test is exact, and relatively simple to 
 understand. When the subscripts contain two index 
 variables between them (not necessarily distinct), then 
 special case 2 is invoked. This case only works, at 
 present, if the coefficients on both of the index 
 variables are ♦I or -1. Enumeration of cases is again 
 the method of solution. If a subscript does not fit the 
 
46 
 
 qualifications for any special case, then the worst case 
 condition is assummed. 
 
 PARAM ETERS; 
 
 IOHFAD - head of input output list - POINTER 
 
 IOTAIL - tail of input output list - POINTER 
 
 DH - dummy - POINTER 
 
 DT - dummy - POINTER 
 
 N - number of statements - FIXED BIMAHY(15) 
 
 ACTIVE - number of index sets - FIXED BINARY (15) 
 
 SIMUL - simultaneous index set - (*) BIT(1) 
 
 PFFLAG - print flag - BIT(1) 
 
 CONDITIONS; 
 none 
 
 MESSAGES; 
 
 A data dependency graph is printed. 
 
 Statistics for the various data dependency tests 
 are printed. 
 
 Optional information concerning every variable 
 comparison is printed. 
 
47 
 
 DETCMPT 
 
 FUNCTION: 
 
 To reconstruct DETAIL, deleting null elements, and 
 following indirect pointers. This procedure places 
 DETAIL back into the form that the SCANNER produces. 
 
 METHOD^ 
 
 A new copy of DETAIL is created by stepping through each 
 statement record of every program segment and copying the 
 DETAIL entries described to a temporary DETAIL array 
 (named NEW_DITAIL) . If the forward substitution table 
 (named FORTAB) is allocated, the DETAIL entries described 
 in FORTAB are also copied. The old DETAIL is then 
 released, the size of DETAIL is adjusted, and it is 
 reallocated. The NFW_DETAIL array is then copied into 
 DETAIL element by element. The NEW_DETAIL array is then 
 released. The control pointers into DETAIL from PROGRAM, 
 and FOPTAB are updated during the copying process. 
 DETLAST and DETSIZE are also updated. 
 
 PARAMETERS: 
 
 FXTPA - the amount of free space to be left in copied 
 
 D'TAIL - FIXED BINARY(15) 
 
48 
 CONDITIONS; 
 
 ND_OVEPFLOW - causes NEW_DBTAIL to grow 
 BAD_PROGRAM_TYPE - raises the ERROR condition 
 
 MESSAGES,: 
 
 DETCMPT CALLED. EXTRA * ### 
 
 DETCMPT NEH DETAIL OVERFLOW 
 
 DETCMPT BAD FNTRY IN PROGRAM ARRAY. STATEMENT TYPE = ##t 
 
 DETCMPT C0PY_1, ND_CURRENT = ###, ENTRY = "X M 
 
 DETCMPT COPY_IT, PROM = #♦#, SIZE = ### 
 
 DETCMPT COPIFR, PROGRAN_PTF = ###, DETAIL_PTR * ##♦ 
 
 DETCMPT COPY_BAS, PPOGRAM_PTR = ### , DETAIL_PTR = ### 
 
 DETCMPT RETORNS 
 
49 
 DONIPS 
 
 FONCTIONJ. 
 
 To normalize all DO loop index sets to begin at zero, and 
 to have an increment, of one. To discover as much as 
 possible about the integer expressions used in 
 
 subscripts. 
 
 METHOD; 
 
 There are two processes being done simultaneously in this 
 module. For ease in discussion, the processes will be 
 considered seperately. 
 
 DO loop normalization is a very simple process. Whenever 
 a DO loop is encountered, it is normalized after index 
 forward substitution (which will be discussed later) is 
 done on the statement. k DO loop is of the following 
 form prior to normalization: 
 
 DO STMT* INDEX = LOWER, UPPER, INCH 
 
 After normalization: 
 
 DO STMT# NEW_INDEX = C, NEW_0PPER, 1 
 
 The new upper bound is a very simple function of the old 
 lower bound, increment, and upper bound: 
 
50 
 
 NEW_UPPER = (UPPER - LOWER) / INCR 
 
 The old index variable can easily be computed from the 
 new index variable: 
 
 INDEX = NEW_INDEX * INCR ♦ LOWER 
 
 The entries in the statement record for the DO loop are 
 updated to reflect the changes made by DO loop 
 normalization. A zero is filled into the lover bound 
 (rather than a pointer to the symbol table), and a one is 
 filled into the increment. The upper bound is set to an 
 indirect pointer to the DETAIL array vhere the expression 
 for NEW_OPPER shown above is stored (indirect pointers 
 are always negative) . The name of the variable 
 associated with the new upper bound is "U6## ", where the 
 #* comes from the number of DO loops previously 
 encountered in this call to DONIFS. 
 
 Similarly, the new index variable is named "XSt*". 
 The symbol table pointer for the new index variable 
 replaces the old, and an entry is placed in the index 
 forward substitution table (FORTAB) for the old index 
 variable (see the expression given above) . 
 
 The substitution bounds on this entry are egual to 
 the bounds of the original DO loop. 
 
51 
 
 The process of index forward substitution is also very 
 simple. Pirst, the every statement is examined for a 
 variable that is already in the forward substitution 
 table (within the substitution bounds, of course). If 
 such a variable is found, the occurrence of the variable 
 is replaced by an indirect pointer to the statement 
 defining the substitution for that variable. (Note that 
 these indirect pointers are followed and removed by the 
 routine DETCM^T.) A variable can be added to the forward 
 substitution list if it is a scalar integer variable and 
 if every operand on the right hand side of the assignment 
 statement is a known scalar integer variable (DATA 
 statement) , a DO loop index, or a constant. 
 
 PARAMETERS: 
 
 PROGRAM_PTR - pointer to a program segment - PIXED 
 BINARY (15) 
 
 CONDITIONS^ 
 
 FT_0VERPL0W - the forward substitution table is grown 
 SYMBOL_TABLE_OVERFLOH - causes the symbol table to grow 
 
52 
 HFSSAGES: 
 
 DONIPS CALLED. PROGRAM_PTR = ### 
 
 DONIFS DO LOOP AT PROGRAM_PTR = ### IS NORMALIZED. 
 
 NEW STATEMENT RECORD: *# it ## #1 ** ** *# if *t 
 
 DONIFS ADD SYMBOL AT ##, NAME IS "XXXXXX", TYPE = ## 
 
 PONTFS FORTAB ENTRY ADDED AT ##. ENTRY =(## ## - ##) 
 
 DONIFS FORTAB ENTRY REPLACED AT ##. ENTRY = (## #♦ - #•) 
 
 DONIFS SUBSTITUTE AT DETAIL_PTR = # t OF i# FOR ## 
 
 DONIFS SYMBOL TABLE OVERFLOW 
 
 DONIFS FORWARD SUBSTITUTION TABLE OVERFLOW 
 
 DONIFS PFTURNS 
 
53 
 
 EBROH 
 
 FUNCTION: 
 
 To generate error and warning messages from stock 
 phrases. 
 
 METHOD :_ 
 
 There are two entry points to ERROR: ERROR and WARN. 
 Both entry points use the same message generation 
 process. The message number is used as an index into a 
 table of messages. This message, along with the 
 additional character string parameter is printed out in a 
 box made of * characters. If the WARN entry point was 
 used, control is returned to the caller. If the ERROR 
 entry point was used, the ERROR condition is raised. 
 
 PARAMETERS: 
 
 # - the error message number - PIX1?D BINARY (15) 
 
 C - the additional character message - CHARACTER (*) 
 VARYING 
 
 CONDITIONS: 
 
 none 
 
54 
 MESSAGES^ 
 
 An error or warning message enclosed in a box of * 
 
 Possible messages: 
 
 UNDEFINED PILE.... NO OUTPUT WILL BE WRITTEN TO 
 FILE "XXXXXX" 
 
 UNDEFINED FILE.... NO DATA READ FROM FILE "XXXXXX" 
 
 UNEXPECTED END OF FILE ON FILE "XXXXXX" 
 
55 
 
 FORWARD 
 
 FUNCTION: 
 
 To take a portion of a program segment, and to rewrite 
 each statement in this portion much like index forward 
 substitution, but without worrying about known values or 
 variable types. Subscripted arrays are handled in 
 addition to scalars. 
 
 HETHQD ; 
 
 This routine uses the data dependency graph to make the 
 search space smaller for the substitution process. If 
 one statement is dependent on another, then FORWARD will 
 look at the output variables of the statement that must 
 be executed first, and see if an input variable occurs 
 with the same name in the other statement. If so, the 
 subscript lists are checked for linearity (see the 
 definition under the module IOLISTS). If the subscripts 
 are linear, then the compiler could perform the necessary 
 subscript manipulation to substitute. If both subscripts 
 are not linear, an elementary pattern match is done to 
 see if the subscripts contain the same variables, with 
 the same operators relating them. If this is true, then 
 the substitution is also performed. Otherwise, no 
 substitution is performed. Substitution is done by 
 
56 
 
 placing an indirect pointer to the substituting 
 expression in place of the occurrence of the input 
 variable. The subscript list (if any) in DETAIL that was 
 associated with the variable that was replaced is nulled 
 out. 
 
 PARAMETERS:, 
 
 IOHEAD - list head for the input output lists - POINTER 
 IOTAIL - list tail for the input output lists - POINTER 
 EH, DT - dummy parameters - POINTER 
 N - the number of statements - FIXED BINARY (15) 
 PRFLAG - the debug print flag -EIT(1) 
 
 CONDITIONS: 
 
 none 
 
 MESS AGE S: 
 
 FORWARD CALLED (### STATEMENTS) 
 FORWARD SUBSTITUTING ## AT DETAIL (##) 
 
 FORWARD RETURNS 
 
57 
 GENER 
 
 FONCTIONl 
 
 To transform the PI PARTITIONed three address FORTRAN 
 program into vector machine language. To collect certain 
 bounds associated with the complexity of the computation 
 of the program. 
 
 METHOD^ 
 
 The input output lists are used to build the output code. 
 First an index is built that allows a statement number to 
 reference an input output list directly, rather than 
 having to search the list for it. The BEGIN SEGMENT node 
 is generated. Then each PI BLOCK generates one code 
 element. Depending upon the cyclicity of the PI BLOCK, 
 and upon the statement type, an appropriate subroutine is 
 called to generate the code for that PI BLOCK. The code 
 element for the END SEGMENT is then built. At present, 
 there is only one type of code element, that for the 
 three address code element. Obviously, this is not 
 needed for such operations as BEGIN_SEGMENT, END_SEGMENT, 
 NOP, DO, or PO_END. Two more types of code elements are 
 currently planned, both of which are subsets of the 
 current code element. Now that all of the code elements 
 are built, the successor list is built using the partial 
 
58 
 
 ordering information for the PI BLOCKS in PORD. If a PI 
 BLOCK has no predecessors, then this PI BLOCK is made a 
 successor to the BEGIN_SEGMENT node. If a PI BLOCK has 
 no successors, and is not inside a serial no, then this 
 PI BLOCK is made a predecessor to the END_SEGHENT node. 
 
 When building the code element for a simple assignment 
 statement or input/output statement, the output variable 
 is the first variable on the output list of the input 
 output list for this PI BLOCK, Similarly, the first and 
 second operands come from the first and second operands 
 on the input list of the input output list for this PI 
 BLOCK. The operation to be done is discovered by looking 
 at the DETAIL pointer contained in the last input output 
 list atom examined. This pointer («hen adjusted 
 correctly) points to the operator token in DETAIL that is 
 correct for this PI BLOCK. 
 
 When building the code element for a recurrence, the 
 RECURRENCE operation itself is built first. The size of 
 the A matrix is calculated by multiplying the product of 
 the parallel index sets by the number of nontemporaries 
 as the first element in the output list of all of the 
 input output lists for this PI BLOCK. The recurrence 
 head (a NOP) is then allocated, and the setup nodes 
 (ADDITIONS , MULTIPLIES, etc.) for A and C are allocated 
 and linked between the recurrence head and the RECURRENCE 
 
59 
 
 operation itself. The recurrence tail is then allocated 
 (a NOP) . 
 
 Then the output dispersal nodes are allocated for 
 every nontemporary as the first element of an output list 
 of all of the input output lists for this PI BLOCK and 
 linked between the RECURRENCE operation and the 
 recurrence tail. 
 
 P ARAM ET ER.S :_ 
 
 SIMUL - which DO loops are parallel - (*) BIT (1) 
 
 CONDITIONS: 
 
 NO SIMULATE - turns ELAG. SIMULATE off 
 
 MESSAGES: 
 
 GENER iPARTS = ##, ALLOCN(PORD) = # 
 
 GENER ADD_SUC PRIMITIVE PRED.OC=##, PRED. SN= t#. #• 
 
 SUC.OC=t#, SUC.SN=##.## 
 
 A decimal and English listing of each code element. 
 
 GENER NO_SIMULATE RAISED EROM PROCEDURE "XXXXXXXXXXX", 
 WHILE PROCESSING PI BLOCK ## 
 
 GENER RETURNS 
 
60 
 
 IFCLUB, IFREHOV, IFTHES 
 
 FUNCTION: 
 
 These routines direct the conditional statement removal 
 process. 
 
 M^THODj. 
 
 These routines are currently in a state of flux as 
 various implementations of the theory are being tried. 
 
61 
 
 INDEX 
 
 FUNCTION^ 
 
 To remove all unnecessary parentheses from subscript 
 expressions by the application of distribution. To 
 transform the FORTRAN assignment statements so that there 
 is at most one operation on the right hand side. 
 
 METHOD! 
 
 This module can be thought of as two distinct 
 transformations, though a good deal of common code is 
 shared by the two. Each transformation will be explained 
 seperately. In any case, the program segment is 
 processed one statement record at a time. 
 
 When a statement is encountered that could contain a 
 subscripted variable, the statement is parsed, using a 
 simple precedence method, until a beginning of subscript 
 condition (a non unary left parenthesis) is found. At 
 this point, the parse tree building routine is enabled, 
 and a parse of the subscript is made until an end of 
 subscript condition (a non unary right parenthesis or 
 comma) is found. The parse tree is then distributed, and 
 a corresponding FORTRAN like subscript entry is built at 
 the end of DETAIL. An indirect pointer is placed in the 
 
62 
 
 position of the old subscript, and any remaining space is 
 nulled. If a comma caused the end of subscript 
 condition, the beginning subscript condition is then 
 caused, and the process repeats. If a left parenthesis 
 caused the end of subscript condition, then the parse 
 tree builder is disenabled, and the parsing of the 
 current statement continues. 
 
 Every assignment statement is parsed, with the parse tree 
 builder initially enabled. Whenever a subscript list is 
 found, a pointer to it is saved, but it is ignored. When 
 the end of statement is reached, a special routine 
 converts this parse tree to a collection of FORTRAN 
 statements (placed at the end of DETAIL) in such a way so 
 that every statement has at most one operation. this 
 necessitates the creation of temporaries (much like the 
 generation of three address code) . The temporaries 
 created have the same number of dimensions as the current 
 DO loop nest level. Temporaries have a name of "TS##" 
 where the ## comes from the number of temporaries created 
 during this call to INDEX. One other simplification is 
 done at this stage. If all of the operands of one of 
 these generated statements would be constants, then the 
 name of the temporary is changed to "ZS##", and the 
 statement is not generated. If the PORTRAN program had a 
 scalar coded inside a DO loop as the result (left hand 
 side) variable of an assignment statement, the scalar 
 
63 
 
 variable vill be replaced by a compiler substituted array 
 with the same number of dimensions as the DO loop nest 
 level. The name of this substiuted variable is "S6##". 
 Really, a multiple assignment is created, with the M S&#t" 
 variable and the scalar both on the left hand side. If 
 the scalar variable is then used on the eight hand side 
 of an assignment statemtent (or similar. • .CALL, WRITF, 
 etc.) , then the scalar variable will be replaced by the 
 substituted variable. This transforms more things into 
 vector operations, and eliminates some of the common 
 forms of recurrences. Code should fce generated when 
 exiting a DO loop to assign the scalar value the correct 
 element of the substituted array. This is not done, as 
 yet, though a place is reserved for doing it in the INDEX 
 routine. 
 
 An example of both transformations is perhaps best: 
 
 DO 10 I = 1,15,1 
 10 k = 1 + 2*B(U*(I-1) ,1*3) +6*8 
 
 becomes: 
 
 DO 10 I = 1,15,1 
 T&0C(I) = 2 * B(4*I-U*1,I43) 
 T601 (I) = 1 ♦ T&00 (I) 
 10 S6C3(I), I = T501(I) ♦ Z&02 
 
64 
 
 P A R A n EJJR s :_ 
 
 PROGRAM_PTR - program segment pointer - FIXED BINARY (15) 
 SUBSW - standardize subscript switch - BIT(1) 
 PARSESW - generate one operation code - BIT(1) 
 
 CONDITIONS! 
 
 TREF_SPACE - causes parse tree to grow 
 BAD_SOBSCRIPT - raises the ERROR CONDITION 
 BAD_OPERATION - raises the ERROR CONDITION 
 STACK_OVERFLOW - causes the parse stack to grow 
 V_STACK_OVERFLOW - causes the variable stack to grow 
 DETAIL_OVERFLOM - causes DETAIL to grow 
 V_STACK_0NDEPFLOW - raises the ERROR CONDITION 
 STACK_ONDERFLOW - raises the ERROR CONDITION 
 ONMATCHPD_PAREN - raises the ERROR CONDITION 
 UNSATCHFD SUBSCRIPT - raises the ERROR CONDITION 
 
65 
 
 MESSAGFS:_ 
 
 INDEX CALLED POF PROGRAM_PTR = ### 
 
 INDEX STACK POP OF ##• 
 
 INDEX STACK = #t# ### •## ... 
 
 INDEX STACK POSH OF tit 
 
 INDEX STACK OVERFLOW 
 
 INDFX STACK UNDERFLOW 
 
 INDEX V_STACK POP OF ttt 
 
 INDEX V_STACK = tit tit *%$ ... 
 
 INDEX V_STACK PUSH OF *** 
 
 INDEX V_STACK OVERFLOW 
 
 INDEX V_STACK UNDERFLOW 
 
 INDEX PARSE TREE (it) (LEFT RIGHT DETAIL SYMBOL) 
 (f$) [i* ** t9 it) (it) (ti it ** **) . 
 
 INDEX ADD SYMBOL AT $*, NAME IS "XXXXXX", TYPE=ii 
 
 INDEX SYMBOL TABLE OVERFLOW 
 
 INDEX EXPRESSION PARSE FROM tit TO Uf 
 
 INDEX EXPRESSION PARSE RETURNS 
 
 INDFX BEGIN SUBSCRIPT AT DETAIL_PTR = tti 
 
 INDEX END SUBSCRIPT AT DETAIL_PTF = *§* 
 
 INDEX VAR ADD FOR tit AT DETAIL_PTR = tit 
 
 INDEX VAR ADD FOR *** AT DETAIL_PTR - ***, AT TREE INDEX 
 tit 
 
 INDEX OPERATOR ADD OF OPERATOR *% 
 
 INDEX OPERATOR ADD OF OPERATOR it, AT TREE INDEX ♦♦♦ 
 
 INDEX TTD CALLED WITH TP=ii, GlF=ti, G2F=it, OPR = it 
 
 INDEX DETAIL PLACE OF tit AT DETAIL PTR = tti 
 
66 
 
 INDEX DETAIL OVERFLOW 
 
 INDEX TPEE SPACE OVERFLOW 
 
 INDEX UNMATCHED LEFT PAREN 
 
 INDEX DISTRIBUTE CALLED FOR TREE_INCEX = #tt 
 
 INDEX TREE TO TRIAD TEMPORARY "XXXXXX" SUBSTITUTED FOR 
 "XXXXXX" 
 
 INDEX TREE TO TRIAD, NO VARIABLE OF NAME "XXXXXX" IN THE 
 TEMPORARY LIST 
 
 INDEX TREE TO TRIAD CALLED 
 
 INDEX TOO MANY USER TEMPORARIES 
 
 INDEX NULL SUBSCRIPT 
 
 INDEX TOO FEW OPERANDS FOR OPERATOR 
 
 INDEX USER VARIABLE "XXXXXX" CHANGE! TO TEMPORARY 
 "XXXXXX" 
 
 INDEX DOt #t HAS A SUBSCRIPT LIST A I DETAIL_PTR = ###, 
 AND ENDS JUST PRIOR TO PROGRAM ENTRY ### 
 
 INDEX SUBSCRIPT NORMALIZATION OF DETAIL_PTR tt# 
 
 INDEX TRIADIZATION OF DETAIL_PTR ### 
 
 INDEX RETURNS 
 
67 
 IOLISTS 
 
 FUNCTION! 
 
 To build a list of input and output variables for each 
 statement record in a program segment, and to link all of 
 these lists together in one large list for the entire 
 
 segment. 
 
 METHOD^ 
 
 First, the number of DO loops in the program segment is 
 counted. This enables the storing of subscript 
 information with positional identification of which DO 
 loop to which the information applies. The program 
 segment is then processed a statement record at a time. 
 If a statement can change the value of a variable, then 
 that variable is added to the output list for this 
 statement. If the value of a variable can be used by a 
 statement, then that variable is added to the input list 
 for this statement. It should be noted that DO loops 
 thus have an output list of the index variable, and an 
 input list of the variables used in the bounding 
 expressions. SUBROUTINE, READ, and FUNCTION statements 
 have only output lists. WRITE, RETURN, and STOP 
 statements have only input lists. CALL statements have 
 identical input and output lists. There is one case 
 
68 
 
 where an input output list is built when there is not 
 necessarily a corresponding statement. When the end of a 
 00 loop is sensed, an input output list is built with no 
 inDat or output list (ie., just a list head). Since 
 statement types are stored in the list head, it is easy 
 to differentiate between the various types of statements 
 for which input output lists are built (if need be) . 
 ?ach atom in the input or output list is identical in 
 form and construction, so the construction will only be 
 explained once. The symbol table pointer for the 
 variable concerned is obtained from a DETAIL entry 
 associated with the current statement record. This entry 
 is looked up in the symbol table, and the number of 
 dimensions is extracted. An atom is then allocated for 
 the variable. If the variable is dimensioned, then the 
 subscripts are checked for linearity. A linear subscript 
 is one that can be expressed as: 
 
 CO + C1*X1 ♦ C2*X2 ♦ .... 
 
 Where the Ci are integer values known at compile time, 
 and the Xi are DC loop index variables. If the 
 subscripts are linear, then the Ci are copied in to a 
 matrix with the corresponding Xi being known by the 
 position of the Ci in the matrix. If the subscript is 
 not linear, then the expression must be calculated at run 
 time, and hence the variables involved in the subscript 
 
69 
 
 must be added to the input list of the statement. The 
 linearity of the subscript is saved* 
 
 PAR£U1FRS:_ 
 
 PROGPTR - pointer to a program segment - FIXED BINARY (15) 
 IOHEAD - the head of the iolist to be built - POINTER 
 IOTAIL - the tail of the iolist to be built - POINTER 
 DH, DT - dummy - POINTER 
 
 ACTIVE - the number of DO loops - PIXED BINARY (15) 
 PRFLAG - the debug print flag - BIT (1) 
 
 CONDITIONS: 
 
 AREA - causes input output list space to grow 
 
 M ESSAGE S.; 
 
 IOLISTS CALLED, PROGRAM (###) 
 
 A listing of the iolists for the entire program segment. 
 
 IOLISTS RETURNS 
 
70 
 
 MASTER 
 
 FUNCTION; 
 
 To control the transformation process from the tokenized 
 version of the FORTRAN program generated by the SCANNER 
 to the three address code product. To title and number 
 output pages. To grow the correct AREA when more space 
 is needed. 
 
 METHOD:. 
 
 First, the options to be used for this particular run are 
 obtained by calling the routine OPTION. Then the scanned 
 program is obtained by calling the routine READIN. A 
 FORTRAN version of the tokenized program is printed out 
 now, and after every major transformation step by calling 
 the routine UNSCAN. The routine CCARDS is called to 
 define any scalar integer variables that are known at 
 scan time. Then DO loops are normalized, and index 
 forward substitution is done by calling the routine 
 DONIFS. The DETAIL array is compacted by calling the 
 routine DETCMPT. The CCARDS routine is called again, to 
 allow the definition of DO loop upper bounds ("0&## n 
 variables) , if desired. The routine INDEX is then called 
 to standardize the subscripts. The DETAIL array is 
 compacted, and SSDEL is called to eleminate unnecessary 
 
71 
 
 parentheses associated with beginning and ending 
 subscripts. If the forward substitution option vas 
 selected, then these steps are done. For every 
 contiguous group of assignment and input/output 
 statements outside of all DO loops, an input output list 
 is built using the routine IOLXSTS. Then, a data 
 dependency graph is built using the routine DEPEND. The 
 statement level forward substitution is then done by 
 calling the routine FORWARD. The now useless input 
 output list is released by calling the routine IOLFRE. 
 This process is repeated until no more contiguous groups 
 of statements (as described above) exist. Then, the 
 DETAIL array is compacted. This ends the special code 
 done for the forward substitution option. The FORTRAN 
 program is then rewritten as three address code FORTRAN 
 by calling the routine INDEX. Excessive parentheses are 
 again deleted by the routine SSDEL. If IF statements are 
 to be removed (and all must be) , input output lists are 
 built for the entire program by the routine IOLISTS, and 
 the routine IFREMOV is called to remove all of the IF 
 statements that it can by using MODE patterns, reordering 
 computations, etc. The input output lists are released 
 when IFREMOV returns. In case there are any IF 
 statements left, that IFREMOV was unable to handle, the 
 routine IFCLOB is called to segment the program into IF 
 free pieces. The routine COMPILE is then called for each 
 
72 
 
 program segment to control the generation of code for 
 that segment. 
 
 If the AREA condition is raised, the ON-BLOCK in MASTER 
 decides which area to grow by looking at the name of the 
 procedure that caused the AREA condition to be raised. 
 
 Titles and page numbering are handled by an 
 FNDPAGF(SYSPRINT) ON-BLOCK. 
 
 PARAMETERS:^ 
 
 ST - the subtitle - CHARACTER (65) VARYING 
 
 CONDITIONS^ 
 
 AREA - causes the appropriate AREA to grow 
 
 FNDPAGF (SYSPRINT) - causes titles and page numbers 
 
 ERROR - causes the FINISH condition 
 
 FINISH - causes debug printing and termination 
 
 ONDEFINEDFILE (CODEOUT) - causes no code to be generated 
 
 UNDEFINEDFILE(SCAN) - causes no SCANNER like output 
 
73 
 
 MESSAGES; 
 
 MASTER CALLED 
 
 MASTER BEGIN COMPILATION OP SEGMENT ### 
 
 MASTER ARFA CONDITION RAISED 
 
 MASTFR SYMBOL TABLE SIZE = ♦##, CODE_SIZE = ### 
 
 MASTER RETORNS 
 
74 
 
 OPTION 
 
 FUNCTION: 
 
 To allow the user to change the default values of the 
 various option flags, debug flags, and array sizes for 
 the needs of the particular program being analyzed. 
 
 METHODj. 
 
 A GET DATA is done on the file DEFAULT for three 
 strucures: AFEAINF, FLAG, and DEBUG. A GET DATA for the 
 same three structures is then done on the file OPTIONS. 
 This allows differing sets of default options for 
 different users, with a simple method of overriding the 
 default options. 
 
 PARAMETERS 
 
 none 
 
75 
 
 CONDITIONS; 
 
 UNDEPINEDPILE (OPTIONS) - turns DEBUG. OPTION on 
 UNDEFINEDPILE(DEPAULT) - turns DEBDG. OPTION on 
 ENDFILE (OPTIONS) - turns DEBUG. OPTION on 
 ENDFILE (DEFAULT) - turns DEBUG. OPTION on 
 NAME (OPTIONS) - ignores the item 
 NAME (DEFAULT) - ignores the item 
 
 M^SSAGES^ 
 
 OPTION CALLED 
 
 OPTION END OF FILE ON OPTIONS 
 
 OPTION NAME CONDITION RAISED ON M X" . SKIPPING TO NEXT 
 NAME 
 
 OPTION OPTIONS FILE NOT DEFINED. DEFAULTS ASSUHMED 
 
 A listing of all of the options that will be used. 
 
 OPTION RETURNS 
 
76 
 
 PIPART 
 
 FUNCTION: 
 
 To locate all cycles in a data dependency graph, and to 
 qenerate a new dependence graph with each of the cycles 
 corresponding to a single entry. To eliminate transative 
 dependencies. 
 
 ME THOD; 
 
 The transative closure of the data dependency matrix is 
 computed. This new matrix contains entries if there is a 
 path of any length between two nodes. Hence, if the 
 diagonal symmetric entries are true for two statements, 
 then the two statements are in the same cycle. The 
 statementts that are in the same cycle are collected in a 
 straightforward manner. Each of the cycles, or 
 independent statements is called a PI BLOCK. The partial 
 ordering of the PI BLOCKS is obtained from the data 
 dependency matrix by ORing rows together that correspond 
 to the same PI BLOCK. This ordering is not minimal, as 
 many links exist that could be inferred by the transative 
 law. At present, no attempt is made to remove these 
 extra links. The only harm in this outlook is that more 
 space is needed for successor lists during the code 
 generation process. 
 
77 
 PARAMETERS; 
 
 N - the number of statements - PIXEC BINARY (15) 
 
 tPARTS - the number of PI BLOCKS found - PIXED BIHARY(15) 
 
 PPPLAG - the debug print flag - BIT (1) 
 
 CONDITIONS: 
 
 none 
 
 MESSAGES: 
 
 A partition of the statements into PI BLOCKS, 
 A partial ordering of the PI BLOCKS. 
 
78 
 
 PRINTER 
 
 FUNCTION^ 
 
 To provide a debug output of the symbol table (SYMTAB) 
 and the tokenized FORTRAN program (DETAIL and PROGRAH) . 
 
 METHOD:. 
 
 PROGRAM, DETAIL, and SYMTAB are printed out in decimal, 
 using column labels where appropriate. The printing of 
 the symbol table is optional. The next record pointer 
 (offset -M) is used in conjunction with the statement 
 type (offset ♦()) to disect the PROGRAM array into 
 statement records. The semicolon (token 99) is used to 
 break the DETAIL array into records. Each of the three 
 tables is printed until the zalue of PRGLAST, DETLAST, or 
 SYMLAST is reached. 
 
 PARAMETERS^ 
 
 PF.SYM - print the symbol table flag - BIT(1) 
 
 CONDITIONS: 
 
 none 
 
79 
 MESSAGES :, 
 
 A listing of PROGRAM broken at statement records 
 
 A listing of the symbol table 
 
 A listing of DETAIL broken by semicolons 
 
80 
 
 READIN 
 
 FUNCTION! 
 
 To read in the tokenized FORTRAN program. 
 
 METHOD: 
 
 A file with a name of SCANOUT is opened. Two control 
 words are read in that govern the size of PROGRAM, and 
 the amount of space already used in PROGRAM (PRGSIZE and 
 PRGLAST) . PROGRAM is then allocated, and filled with 
 another read. Similar things happen, and DFTAIL and 
 SYMTAB are read in. Two dummy words are read in (the 
 first of which contains the title length, but it is 
 ignored) , and then the title is read in. The initial 
 segment head (SEGMENT) is allocated, and initialized. 
 
 PARAMETERS: 
 
 TITLE - the title of this program - CHARACTER (65) 
 
 CONDITIONS: 
 
 TJNDEFINFDFILE (INFILE) - raises the ERROR condition 
 
 ENDFILE (INFILE) - raises the ERROR condition or a blank 
 
 title 
 
81 
 MESSAGES: 
 
 READIN CALLED 
 
 READIN ERROR - NO INPUT FBOH SCANNER 
 
 PEADIN NOT ENOUGH DATA FROM SCANNER 
 
 READIN RETURNS, PRGSIZE=##, SYHSIZE=##, DETSIZE=## 
 
82 
 
 SCANNER 
 
 FUNCTION-! 
 
 To take a card image FORTRAN program and perform standard 
 lexical analysis functions. To remove all expressions 
 from the conditional portions of IF satements. 
 
 KETHOD^ 
 
 An input card is read from the file SYSIN, and the 
 statement number is extracted from it (if it exists) . 
 Then the first token is extracted from the card. If this 
 token is one of the FORTRAN key words, a special routine 
 is called to handle that particular type of statement. 
 Otherwise, the statement must be an assignment statement. 
 Needless to say, this overlooks the two cases of possible 
 ambiguities in FORTRAN for this type of parse: a 
 variable named DO or a variable named FORMAT. Also, 
 statements such as INTEGER FUNCTION, and REAL FUNCTION 
 are parsed as INTEGER and REAL statements, respectively, 
 because of the use of blank as a delimiter. While these 
 would all be serious flaws in a production FORTRAN 
 compiler, only a small amount of inconvience is caused in 
 the environment in which the SCANNER will be used. The 
 resolution of statement labels is done after the entire 
 program is processed. A list of all of the statement 
 
83 
 
 number definitions and references is kept until the END 
 card is processed. The expression that was coded inside 
 an IF statement is removed by the IP statement processor, 
 and it is assigned to a temporary variable "L&t# n . This 
 temporary variable is then placed inside the IP statement 
 in place of the expression that was removed. A certain 
 number of control cards are accepted by the SCAHHER. 
 Each control card begins with a %, and then one of 
 several key words: DETAIL, PROGRAM, SYHBOL, or DSER. If 
 one of these key words is recognized as the first token 
 on a control card, the next token on the card is assummed 
 to be a number giving the size of the respective table. 
 If the first token is not one of the control card key 
 words, then it is assummed to be the name of a user 
 defined builtin function. The second token on the 
 control card is assummed to be a nutrber giving the 
 relative difficulty of calculating this function (from 
 one to four). The control cards for the sizes of the 
 various tables should be included prior to any other 
 cards in the input file SYSIN, if included at all. 
 Control cards defining user builtin functions should 
 appear before the first occurrence of that function. The 
 tokenized output is written on the file SCANOOT in the 
 format expected by READIN. 
 
84 
 PARAMETFRSj. 
 
 TITLE - the title of the program - CHARACTER (65) VARYING 
 
 CONDITIONS :_ 
 
 FNDFILF(SYSIN) - sets end of data flag 
 
 MESSAGES :_ 
 
 A listing of the program to be scanned. 
 
 A listing of the generated PROGRAH array. 
 
 A listing of the generated symbol table* 
 
 A listing of the generated DETAIL array. 
 
85 
 SEGHENT 
 
 FUNCTION; 
 
 To find all distinct execution paths in a program. 
 
 METHOD:, 
 
 Travel all execution paths, keeping track of what places 
 have been visited before. When first encountering a 
 statement visited before, terminate the current execution 
 path, and break the old one that was encountered at the 
 point of encounter. Collect the beginning pointer for 
 each program segment, and the ending pointer. 
 
 PARAMFT.EFSJ. 
 non© 
 
 CONDITIONS; 
 
 none 
 
 MESSAGES: 
 
 none 
 
86 
 
 SSDEL 
 
 FUNCTION; 
 
 To delete excess parentheses caused by having DETCMPT 
 follow indirect pointers in a general way. To change 
 begin subscript tokens to left parentheses, and end 
 subscript tokens to right parentheses. 
 
 METHOD: 
 
 Every entry in DETAIL is scanned for begin and end 
 subscript tokens, and these are translated to either 
 parentheses or nulls, depending upon the surrounding 
 conditions. Parentheeses are deleted in the neighborhood 
 of a comma inside a subscript. 
 
 PARAMETERS:. 
 none 
 
 CONDITION Sj. 
 none 
 
 MESSAGES :_ 
 none 
 
87 
 ON SCAB 
 
 FUNCTION: 
 
 To take a tokenized version of a FORTRAN program, and 
 generate a readable FORTRAN program similar to that which 
 "humans" use. 
 
 METHOD.: 
 
 The symbol table is printed first as a series of REAL, 
 INTEGER, and COMPLEX statements. Then a statement record 
 is selected from PROGRAM, and a special section of code 
 is executed, depending upon the statement type. The rest 
 is rather straight forward looking up of operators in 
 tables, and the looking up of variables in the symbol 
 table. 
 
 PARAMETERS: 
 
 PRGPTR - pointer to the program segment - FIXED 
 BINARY(15) 
 
 MESSAGE - a comment card to be the first line - 
 CHARACTER (72) 
 
 CONDITIONS: 
 none 
 
 MESSAGES.; 
 
 A FORTRAN like listing is produced. 
 
88 
 
 WRITEIT 
 
 FUNCTION^ 
 
 To generate the output file of generated three address 
 code. To qenerate the output file of SCANNER like ouput 
 plus the partial ordering information necessary to 
 distribute DC loops. 
 
 METHOD^ 
 
 The necessary information is copied into the HEADER, and 
 the HEADER is written to the file CODEOOT. Then, the 
 entire CODSPAC AREA is written to the file CODEOUT. 
 Then, the PROGRAM, DETAIL, TITLE, and SYMTAB are written 
 to the file SCAN just like READIN expects them. The 
 discriptive information for the PI PARTITION is then 
 written to the file SCAN, followed by the partition 
 information (PART and START) . 
 
 PARAMETERS: 
 none 
 
 CONDITIONS I 
 none 
 
 MESSAGES: 
 
 WPIT'IT SEGMENT # ## , DEPEND* ## 
 
89 
 
 appeed;x C 
 
 Data Structures 
 
90 
 
 SCANNEF OUTPUT 
 
 The SCANNKF data structures are explained in [S2], 
 
 Input Output Lists 
 
 The input output lists are actually a list when looked at 
 from a statement level. Each statement has an entry 
 (IOLIST) , and these entries are linked together in card 
 reader linear order. For each statement, there are two 
 possible lists. The first list contains entries for all 
 variables whose values are required for the evaluation of 
 the statement, and the second list contains entries for 
 all variables whose values are set by the statement. The 
 variables in each of these lists are represented by 
 ATOMS. The IOLIST entry contains information about which 
 of the index sets apply to this statement, and offsets 
 into the SCANNER data structures foe this statement. The 
 ATOMS contain information about the subscripts associated 
 with this variable, both in the original tokenized form, 
 and in a compact coefficient form. 
 
91 
 Data Dependency 
 
 A square bit matrix with a dimension equal to the number 
 of statements is the result of the calculation of the 
 near minimal valid partial ordering. 
 
 PI Partitioning 
 
 A square matrix vith a dimension equal to the number of 
 PI blocks represents the partial ordering of PI blocks. 
 Two linear arrays give the correspondence between 
 statements and PI blocks. The first gives the starting 
 location of the statement number list for this PI block 
 in the second. The second is a list of statement 
 numbers, ordered by PI block. 
 
 Compiler Output 
 
 K connected acyclic directed graph with one source and 
 one sink is the output of the compiler. The vertices are 
 
 composed of operations (CODE elements) , and the edges by 
 SUCCESSOR LISTS. 
 
92 
 
 Assorted Structures 
 
 DEBOG - A bit structure vhich controls output volume 
 
 FLAG - Compiler options 
 
 TYPFINF - A mnemonic association for scanned tokens 
 
93 
 LIST OP REFERENCES 
 
 A1 Aho, Alfred V. and Jeffery D. Oilman, The T heory of 
 Parsing, T ranslation, anjl Compili ng Vol ££, 1973, 
 pp8U9 
 
 B1 Bron, C. and J. Kerbosch, "Algorithm 457, Finding 
 All Cliques of an Undirected Graph", COMB. ACH 16 
 (1973) pp575-577 
 
 C1 Chen, Shyh-Ching and David J. Kuclc, "Time and 
 
 Parallel Processor Bounds for Linear Recurrence 
 Systems", IEES TRANSACTIONS ON COM PUTE RS, vol C-24 
 No. 7 July 1975, pp701-717 
 
 D1 Davis, E. W., "A Multiprocessor for Simulation 
 
 Applications", Ph.D. Thesis, Department of Computer 
 Science, University of Illinois at Campaign-Urbana, 
 Report No. 527, July 1972 
 
 K1 Kuclc, David J., Ross A. Toule, and Bruce R. Leasure, 
 internal document, unpublished 
 
94 
 
 K2 Kuck, David J. , et "fleasurement of Parallelism in 
 Ordinary FORTRAN Programs", 1973 Sagamore Computer 
 Conference on Parallel Processing, pp23-36 
 
 M1 Muraoka, Yoichi, "Parallelism Exposure and 
 
 Exploitation in Programs", Department of Computer 
 Science, University of Illinois at Campaign-Orbana, 
 Report No. 424, Pebruary 1971 
 
 N1 Niven, Ivan and Herbert S. Zuckerman, An 
 
 Introduction to the T heo ry of Numbers , 1972, 
 pp100-105 
 
 R1 Russell, E. C, "Automatic Program Analysis", 
 
 Ph.D. Thesis Department of Electrical Engineering, 
 University of California, Los Angeles, Report 
 No. 69-12, March 1969 
 
 S1 Sameh, Ahmed H. and Richard P. Brent, "Solving 
 Triangular Systems on a Parallel Computer", 
 Department of Computer Science, University of 
 Illinois at Campaign-Urbana, Report No. 766, 
 November 1975 
 
95 
 
 S2 Schubert, Richard, "Scanner Output", internal 
 document, unpublished 
 
 T1 Tovle, Ross A., "Control and Data Dependence for 
 
 Program Transformation", Ph.D. Thesis, Department of 
 Computer Science, University of Illinois at 
 Campaign-Orbana, Report No. 788, March 1976 
 
 H1 Harshall, S., "A Theorem on Boolean Matrices", 
 Ji ££!! i* C962), pp 11-12 
 
BIBLIOGRAPHIC DATA 
 SHEET 
 
 1. Report No. 
 
 UIUCDCS-R-76-805 
 
 I. Title and Subtitle 
 
 COMPILING SERIAL LANGUAGES FOR PARALLEL MACHINES 
 
 3. Recipient's Accession No. 
 
 5. Report Date 
 
 November 1976 
 
 '. Author(s) 
 
 Bruce Robert Leasure 
 
 8. Performing Organization Rept. 
 
 No -UIUCDCS-R-76-805 
 
 ). Performing Organization Name and Address 
 
 University of Illinois at Urbana-Champaign 
 Department of Computer Science 
 Urbana, Illinois 61801 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract /Grant No. 
 
 12. Sponsoring Organization Name and Address 
 
 National Science Foundation 
 Washington, D. C. 
 
 13. Type of Report 8c Period 
 Covered 
 
 Master's Thesis 
 
 14. 
 
 15. Supplementary Notes 
 
 16. Abstracts 
 
 Much has been written in the past several years about techniques for translating 
 a program written in a serial language into object code that is reasonably 
 executable on a parallel machine. This paper describes, in an intuitive manner, 
 the transformations that discover and improve the parallel execution 
 characteristics of programs written in a serial language. 
 
 This document also reflects the state of the FORTRAN compiler in the PARAFRASE 
 project as of March 1976. Since that time, many improvements have been made. 
 
 17. Key Words and Document Analysis. 17o. Descriptors 
 
 detection of parallelism 
 improvement of parallelism 
 data dependence 
 partial ordering 
 
 17b. Identifiers/Open-Ended Terms 
 
 17c. COSATI Field/Group 
 
 18. Availability Statement 
 
 Release unlimited 
 
 FORM NTIS-38 (10-70) 
 
 19. Security Class (This 
 Report) 
 
 u 
 
 UNCLASSIFIED 
 curity Class (Thi 
 
 20. Security Class (This 
 
 Page 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 99 
 
 22. Price 
 
 USCOMM-DC 40329-P7! 
 
^4 
 
 W 
 
JUN21REC0