Report No. UIUCDCS-R-76-805
NSF-OCA-MCS73-07980 A03-000020

Digitized by the Internet Archive in 2013
http://archive.org/details/compilingseriall805leas

COMPILING SERIAL LANGUAGES FOR PARALLEL MACHINES*

by

Bruce Robert Leasure

November 1976

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

*This work was supported in part by the National Science Foundation under Grant No. US NSF-MCS73-07980 A03 and was submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, November 1976.

ACKNOWLEDGMENT

I would like to thank Rick Schubert for his assistance in implementing the initial version of the compiler. I would also like to express my appreciation to the growing number of users of the compiler who are patient during these developing stages.

Table of Contents

I. INTRODUCTION
II. AN INTUITIVE LOOK
III. THE NEED FOR THESE TECHNIQUES
IV. FUTURE DEVELOPMENTS
Appendix A - The Structure of the Compiler
Appendix B - Module Descriptions
Appendix C - Data Structures
LIST OF REFERENCES

I. INTRODUCTION

Much has been written in the past several years about techniques for translating a program written in a serial language (such as FORTRAN) into an efficient program for a parallel machine.
This paper discusses the use of these concepts in the implementation of a FORTRAN compiler for an idealized target machine. The paper is organized into three sections: first, an intuitive feel for the various concepts and transformations; then a discussion of the applicability, advantages, and disadvantages of this type of compilation; and lastly, an outline of future developments. A description of the compiler, its modules, and its data structures is given in the appendices.

II. AN INTUITIVE LOOK

When executing in a parallel machine, it is often desirable to use as much of the machine as possible at any given time. When compiling a program written in a serial language for a parallel machine, two options are available for achieving this goal:

1) Recognition of statements that can be done simultaneously for a large number of operands and results.

2) Discovery of a partial execution ordering between statements.

The first gives a lot of the same thing to do at any given time, and the second gives a lot of possibly different things to do. The usefulness of each depends largely on the type of target machine. Unfortunately, any algorithm for recognizing, from FORTRAN source text, operations that can be executed simultaneously for a large number of operands and results must consider the partial ordering implied by the programmer through statement orderings and subscripts. The task of recognizing these simultaneous aspects of a program is divided into four distinct subproblems, the first of which is by far the most difficult:

1) Recognition of the partial ordering between the statements of the serial program, including orderings between statements on different iterations of DO loops.

2) Recognition of cycles in the partial ordering.

3) Use of the partial ordering and the cycles to recognize array operations.

4) Transformation of all cycles, which are recurrences, into some form efficiently processable on the target machine.
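As a rough sketch of subproblems 2 and 3 (not the compiler's actual code), suppose subproblem 1 has already produced the "must precede" edges for a hypothetical three-statement loop body. A statement lies on a cycle exactly when it can reach itself through the relation; such statements form recurrences, and the remainder are candidates for array operations:

```python
def find_recurrences(edges, stmts):
    # A statement is in a recurrence iff it can reach itself through
    # the transitive closure of the "must precede" relation.
    reach = {s: set(edges.get(s, ())) for s in stmts}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for s in stmts:
            new = set()
            for t in reach[s]:
                new |= reach.get(t, set())
            if not new <= reach[s]:
                reach[s] |= new
                changed = True
    return {s for s in stmts if s in reach[s]}

# Hypothetical loop body (statement labels are illustrative):
#   10 B(I) = A(I-1)
#   20 A(I) = B(I-1)
#   30 C(I) = B(I)
# 10 and 20 depend on each other across iterations; 30 only reads B.
edges = {10: {20, 30}, 20: {10}}
print(sorted(find_recurrences(edges, [10, 20, 30])))  # -> [10, 20]
```

Statement 30, being on no cycle, could be executed simultaneously for its whole index set once 10 has run.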
The algorithms used in this compiler were described in rather complex formulations in the original works [C1, D1, M1, S1, T1]. Fortunately, the implementation of the theory is relatively straightforward. For the details of each algorithm, see the references included with each section.

REQUIREMENT

In any compilation transformation, there is one basic requirement that must be met: the results of the program before the transformation must be the same as the results of the program after the transformation. Some transformations exist which decrease or increase the roundoff error associated with a particular type of calculation. All of the transformations in this paper, excluding the recurrence transformations, are result preserving [M1]. The recurrence transformations disturb the roundoff error associated with the calculation. For further details of the errors associated with recurrence calculations, see [C1].

PARTIAL ORDERING

In a program written for a serial machine, the order of execution of the statements (ignoring transfers of control) is the linear order specified by the programmer. It is quite obvious that most programs would generate the same results if the statements of the program were carefully permuted.

1 A=1        2 B=37
2 B=37       1 A=1
3 C=A*B      5 D=A*8
4 B=11       3 C=A*B
5 D=A*8      4 B=11
6 E=A+B      6 E=A+B

Both of the programs above generate the same values for the variables A, B, C, D, and E. Since the ordering specified is not unique, a partial ordering seems to be all that is required. A valid partial ordering is then one that always yields values that are the same as the original ordering. Reading < as "must precede", consider the valid partial orderings given here:

1<3, 2<3<4<6, 1<5, 1<6

DO 20 I = 1, 10
10 B(I) = I
20 A(I) = B(I+1)

A simultaneous version of each assignment statement in this program can easily be constructed.

DO 10 I = 1, 10
10 B(I) = I
DO 20 I = 1, 10
20 A(I) = B(I+1)

The problem comes in determining if there exists a valid partial ordering of the two statements in the simultaneous program.
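Returning briefly to the six-statement example, the claim that both orderings generate the same values can be checked mechanically. The sketch below is illustrative only; the statement numbering and operators follow the listing as reconstructed above:

```python
# Execute the six-statement example under the original and the permuted
# orderings and compare the final values of all variables.
STMTS = {
    1: ("A", lambda v: 1),                  # 1 A=1
    2: ("B", lambda v: 37),                 # 2 B=37
    3: ("C", lambda v: v["A"] * v["B"]),    # 3 C=A*B
    4: ("B", lambda v: 11),                 # 4 B=11
    5: ("D", lambda v: v["A"] * 8),         # 5 D=A*8
    6: ("E", lambda v: v["A"] + v["B"]),    # 6 E=A+B
}

def run(order):
    v = {}
    for s in order:
        var, rhs = STMTS[s]
        v[var] = rhs(v)
    return v

original = run([1, 2, 3, 4, 5, 6])
permuted = run([2, 1, 5, 3, 4, 6])
print(original == permuted)  # -> True
```

Any permutation that violated one of the "must precede" pairs (moving statement 6 before statement 4, say) would make the comparison fail.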
In the serial version, each element of A preserves an old value of B, and the elements of B are assigned consecutive integers. If the serial ordering is preserved in the simultaneous program, B has the same values as in the serial program, but A has consecutive integers also. If the opposite ordering is assumed, then the values generated in the simultaneous program are the same as the values generated in the serial program. If no ordering is assumed, on a 'small' machine one would have to be selected, and if the wrong choice were made, the answers would not agree. The reason for the valid partial ordering being the inverse of the serial ordering can be easily seen if the DO loop is written out, rather than being expressed iteratively.

10 B(1) = 1        iteration 1
20 A(1) = B(2)
10 B(2) = 2        iteration 2
20 A(2) = B(3)

Statement 20 in iteration 1 uses a value that is modified by statement 10 in iteration 2. Hence, statement 20 (simultaneous version) must be executed prior to statement 10 (simultaneous version).

It should be noted that the existence of partial orderings across iteration boundaries does not require simply the inversion of the ordering in the serial program.

DO 40 I = 1, 10
30 A(I) = B(I+1)
40 B(I) = I

CYCLES and PARTIAL ORDERING

A cycle in a partial ordering is a collection of statements which are related by a cycle in the directed graph of the precedes relation.

4<3<2<5<3<1   (statements 3, 2, and 5)
2<2           (statement 2)

A cycle may occur only when some form of iteration (IF loops, DO loops, etc.) encloses the statements involved in the cycle, and each statement within the scope of the iteration is considered to be executed simultaneously for all iterations.

DO 20 I = 2, 10
10 B(I) = A(I-1)
20 A(I) = B(I-1)

10 B(2) = A(1)     iteration 1
20 A(2) = B(1)
10 B(3) = A(2)     iteration 2
20 A(3) = B(2)

Statement 10 in iteration 1 must precede statement 20 in iteration 2. Similarly, statement 20 in iteration 1 must precede statement 10 in iteration 2.
This results in 10<20 and 20<10, which is a cycle. Cycles may contain any number of statements.

DO 10 I = 2, 10
10 D(I) = D(I-1)

D(2) = D(1)        iteration 1
D(3) = D(2)        iteration 2

Iteration one must be executed before iteration two. This statement is a cycle, because it must precede itself.

From these examples, it seems rather obvious that cycles in a partial ordering imply that fetching all of the operands, performing all of the operations, and storing all of the results (in that order) will not generate the same values as the serial program for those statements contained in the cycle if the entire index set is done simultaneously. The statements involved in a cycle in a partial ordering constitute a recurrence. Extensive work has been done on fast, accurate, and efficient methods of calculating the results of recurrences [C1, S1]. A few of the simpler methods will be presented in the code generation section of this paper.

STANDARDIZATION TRANSFORMATIONS

In order to make the calculation of the partial ordering as simple as possible, all of the index sets discovered by the compiler (DO indices, induction variables, etc.) are normalized to a starting value of zero and an increment of one. This necessitates the substitution of an expression using the new index variable for the old index variable throughout the entire scope of the index set; for example, DO 10 I = 2, 10 becomes DO 10 J = 0, 8, with I replaced by J+2 in the body of the loop. Also, if the old index variable is used outside of the scope of the index set, then the correct value of the old index variable must be set upon exiting from the scope of the index set.

DISCOVERY TRANSFORMATIONS

In order to recognize a minimal valid partial ordering, the identification of the ordered set of elements involved in the operations is imperative. In FORTRAN, this involves the collection of information concerning subscripts.
Since subscripts are combinations of scalar integer variables and integer constants in the large majority of cases, the problem reduces to discovering as much as possible about scalar integer variables. Several items are easily recognizable:

1) variables defined in a DATA statement and not changed

2) variables assigned expressions of known scalar integer variables and constants

3) variables representing index sets

Of course, when information concerning the value of variables is collected at compile time, the scope of the value of the variable must be considered [A1].

ESTABLISHING A NEAR MINIMAL VALID PARTIAL ORDERING

A pair of statements A and B must be related by A < B if statement A is executed before statement B in the serial program and if at least one of the following is true:

1) Statement A modifies the value of a variable that statement B uses to generate a result.

C = D     statement A
E = C     statement B

2) Statement B modifies the value of a variable that statement A uses to generate a result.

C = D     statement A
D = F     statement B

3) Statement A and statement B both modify the value of the same variable.

C = D     statement A
C = F     statement B

To construct a valid partial ordering, all pairs of statements must be tested for satisfying these conditions. If a statement is involved in an iteration, then each occurrence of that statement must be tested with each occurrence of every statement within the scope of the iteration (including other occurrences of itself). For a program that does a large amount of iteration, this exhaustive method is extremely expensive in compilation time. A near minimal solution that is inexpensive is obviously desirable. A near minimal solution can be achieved by successive approximation. First, each statement is tested against only those statements that can follow it in execution on a serial machine.
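The three conditions above can be collapsed into one predicate on the sets of variable names a statement defines and uses. The sketch below is illustrative, not the compiler's own code, and applies the predicate to the scalar examples just given:

```python
# Sketch of the three tests, using def/use sets of variable names only.
def must_precede(defs_a, uses_a, defs_b, uses_b):
    flow   = bool(defs_a & uses_b)   # test 1: A writes what B reads
    anti   = bool(uses_a & defs_b)   # test 2: B writes what A reads
    output = bool(defs_a & defs_b)   # test 3: both write the same variable
    return flow or anti or output

# C = D ; E = C   -> related by test 1
print(must_precede({"C"}, {"D"}, {"E"}, {"C"}))   # -> True
# C = D ; D = F   -> related by test 2
print(must_precede({"C"}, {"D"}, {"D"}, {"F"}))   # -> True
# C = D ; C = F   -> related by test 3
print(must_precede({"C"}, {"D"}, {"C"}, {"F"}))   # -> True
# C = D ; E = F   -> unrelated; no ordering is required
print(must_precede({"C"}, {"D"}, {"E"}, {"F"}))   # -> False
```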
If a statement is inside the scope of an iteration, then it is tested against every statement inside the scope of the iteration (including itself for tests 1 and 2). To make the tests quick and simple, only variable names are used (not subscript information). The complexity of the tests and the number of tests are kept small by treating each occurrence of a statement from an iteration as identical references to the entire array.

statement
 1       READ(1,1) B
 2       A = B + C
 3       M = 21
 4       DO 40 I = 2, M
 5       D(I) = A
 6       E(I) = D(I)
 7       F(I) = F(I-1)
 8       G(I) = H(I-1)
 9       H(I) = G(I-1)
10  40   CONTINUE
11       WRITE(2,2) H, G, A

1<2<5<11
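The name-only tests on the example above can be sketched as follows. The def/use sets contain variable names only, subscripts are ignored as described, and statement 10 (the CONTINUE) is omitted since it neither defines nor uses anything:

```python
# Name-only dependence tests applied to the eleven-statement example.
PROG = {                               # stmt: (defines, uses)
    1:  ({"B"}, set()),                # READ(1,1) B
    2:  ({"A"}, {"B", "C"}),           # A = B + C
    3:  ({"M"}, set()),                # M = 21
    4:  ({"I"}, {"M"}),                # DO 40 I = 2, M
    5:  ({"D"}, {"A", "I"}),           # D(I) = A
    6:  ({"E"}, {"D", "I"}),           # E(I) = D(I)
    7:  ({"F"}, {"F", "I"}),           # F(I) = F(I-1)
    8:  ({"G"}, {"H", "I"}),           # G(I) = H(I-1)
    9:  ({"H"}, {"G", "I"}),           # H(I) = G(I-1)
    11: (set(), {"H", "G", "A"}),      # WRITE(2,2) H, G, A
}
LOOP = {5, 6, 7, 8, 9}                 # statements inside the DO loop

def precedes(a, b):
    da, ua = PROG[a]
    db, ub = PROG[b]
    t12 = bool(da & ub) or bool(ua & db)     # tests 1 and 2
    return t12 if a == b else t12 or bool(da & db)  # test 3 unless a == b

order = set()
for a in PROG:
    for b in PROG:
        # b can follow a: later in serial order, or both inside the loop
        if (a < b or (a in LOOP and b in LOOP)) and precedes(a, b):
            order.add((a, b))

assert (1, 2) in order and (2, 5) in order and (5, 6) in order
assert (8, 9) in order and (9, 8) in order   # a two-statement cycle
assert (7, 7) in order                       # a self-recurrence
```

Statement 7, and the pair 8 and 9, are thus flagged as recurrences, while statements such as 5 and 6 need only the ordering inherited from the serial program.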