Digitized by the Internet Archive in 2013: http://archive.org/details/storageallocatio297mura

Report No. 297

STORAGE ALLOCATION ALGORITHMS IN THE TRANQUIL COMPILER*

by Yoichi Muraoka

January 13, 1969

Department of Computer Science
University of Illinois
Urbana, Illinois 61801

* This work was supported in part by the Advanced Research Projects Agency as administered by the Rome Air Development Center under Contract No. US AF 30(602)4144 and submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, February, 1969.

ACKNOWLEDGEMENT

The author would like to express his most sincere gratitude to Professor Robert S. Northcote, Department of Computer Science of the University of Illinois, who graciously offered many suggestions and comments. Furthermore, without his criticism of the presentation, this thesis would probably be totally unreadable.

The author is also indebted to Professor David J. Kuck, who originated TRANQUIL and provided helpful criticism and suggestions throughout this work.

Thanks are also extended to Mrs. Patricia Stippes, who typed the manuscript in its final form.

ABSTRACT

TRANQUIL is a language for describing algorithms in terms of parallel constructs. Its compiler is now being implemented for the parallel array computer ILLIAC IV. This paper discusses a particular part of the implementation; namely, the problem of storage allocation for arrays.

TABLE OF CONTENTS

1. INTRODUCTION
2. DATA STRUCTURES IN TRANQUIL
   2.1 Data Declarations
   2.2 Mapping Functions
3. IMPLEMENTATION
   3.1 Introduction
   3.2 Array Partitioning
   3.3 Address Calculation
   3.4 Storage Allocation
4. FURTHER DISCUSSION
5. CONCLUSION
APPENDIX
   A. SYNTAX AND SEMANTICS SPECIFICATION OF TRANQUIL DECLARATIONS
      1. Declarations
      2. Variable Declaration
      3. Array Declaration
      4. PEM Reserve Declaration
      5. PEM Assignment Declaration
   B. TABLES
   C. ARRAY PARTITIONING AND PACKING FLOWCHARTS
   D. STORAGE ALLOCATION PACKAGES
LIST OF REFERENCES

LIST OF FIGURES

1. The Standard Storage Schemes
2. Partitioning for Array A [1:3, 1:300, 1:300]
3. Block Packing for Array A [1:3, 1:300, 1:300]
4. Use of PE Memory for the Blocks Shown in Figures 2 and 3
5. BASETB Format 1
6. BASETB Format 2
7. Examples of BASETB Entries
8. One Subarray Generated from Array A [#1:10, ##1:10, *1:3, **1:5]
9. SLIST
A1. Subarrays for an Array A [#5, ##4, *32, **64]
B1. Entries and Linkage of Tables for A [1:M_1, 1:M_2, ..., 1:M_n]
C1. Pass 2 Program Block Entry and Block Exit Flowcharts for Array Declarations
C2. Array Partitioning Flowchart
C3. Residual Block Packing Flowchart
D1. Table and List Entry Formats
D2. Example of an Entry for HMEMORY
D3. Example of an Entry for VLIST

1. INTRODUCTION

Familiarity with the structure of the ILLIAC IV computer [1] and the TRANQUIL language [2] will, in general, be assumed throughout this paper.
However, a few characteristics of ILLIAC IV which are important to the development of this paper will now be given.

The most important feature of ILLIAC IV is that many simple identical processing elements (PEs) obey instructions which are decoded by one common control unit (CU). Each PE can receive common data which is broadcast from its CU, but it also has access to data in its own 2K word memory (PEM). Thus, although executing identical decoded control signals from a CU, every PE can operate upon different data. Each PE has the option of playing either an active or a passive role during an instruction cycle on the basis of its own state, which is determined by mode control. Also, each PE is connected to its four nearest neighbors, thus permitting the routing of data from one PE to another.

If ILLIAC IV is to be used efficiently, it is essential that data be stored "evenly" in the PEMs so that, on receiving control signals from a CU, as many PEs as possible can operate on their own data, i.e., the data which are stored in their own PEM. Further, if it should be necessary for some PE to use data in another PEM, the routing distance should be as small as possible.

TRANQUIL, as a language, has been designed to assist in the programming of primarily algebraic numerical computations. Among many existing computer languages, ALGOL, FORTRAN, PL/I, and APL [3] also fall into this category. The prime concern here is to decide what kind of data types (information structures other than simple variables) should be included in the language. According to Knuth [4], information structures can be categorized as follows: (i) linear list, (ii) tree, (iii) multilinked structure. In the languages mentioned above, however, not all of these structures are provided explicitly. Arrays are usually the only information structure which is provided (although multilinked structures are allowed in PL/I).
Other information structures such as the tree structure are implemented by utilizing arrays [3]. TRANQUIL also has arrays as the only structured data type in the language. The other types of data structures were not included because ILLIAC IV has been designed as an array machine and computations on arrays are its primary function.

In the following chapters, the problem of storage allocation in the TRANQUIL compiler is discussed; namely, how mapping functions for arrays are specified and how the TRANQUIL compiler handles them.

2. DATA STRUCTURES IN TRANQUIL

2.1 Data Declarations

The data structures which are recognized in TRANQUIL are simple variables, arrays and sets. (For further discussion on sets, the reader is referred to [8].) All data structures, as well as labels, switches, and procedures, must be declared in some block head as in ALGOL. The data type attributes are INTEGER, REAL, COMPLEX and BOOLEAN. Certain precision attributes also may be specified. Array declarations must include attributes which specify type (as above) and storage scheme (i.e., mapping function), in addition to size specifications which follow the same format as those in PL/I; e.g.,

   REAL SKEWED ARRAY A [1:50, 1:50].

As in ALGOL, the size of an array may be specified dynamically in inner blocks. A complete syntax specification and informal semantics for declarations are given in Appendix A.

2.2 Mapping Functions

As was implicitly mentioned in the introduction, the efficiency of a program is highly dependent on the choice of mapping functions for arrays. Hence, TRANQUIL should provide users with a means of specifying mapping functions which is simple to use and yet capable of specifying quite complicated storage schemes.

Several conventions for the specification of mapping functions are provided in TRANQUIL. These include both predefined standard storage schemes and a mechanism to enable users to specify their own scheme.
Before investigating these schemes we discuss the partitioning of arrays, and the use of the PE memories as a two-dimensional array. ILLIAC IV memory may be regarded as a 2048 X 256 array: (number of words in a PEM) X (number of PEMs). Hence, it is desirable to reduce all n-dimensional arrays (n > 2) to a set of 2-dimensional subarrays. Rows of these subarrays are usually stored in a row of ILLIAC IV memory, i.e., across PEMs. As an example, if the 3-dimensional array A is declared:

   REAL STRAIGHT ARRAY A [1:3, 1:100, 1:200],

then the compiler will treat this as a set of three subarrays each of size 100 X 200. Note that the size of the subarrays to be formed is, in general, determined by the last two dimensions.

In some cases it may be desirable to change this rule; i.e., form subarrays from dimensions other than the last two. To satisfy this requirement the following convention has been adopted. One asterisk placed in front of any dimension size specification in an array declaration forces that dimension to be stored in a column (i.e., one PEM) of ILLIAC IV memory, while two asterisks indicate row storage (i.e., across PEMs). Thus,

   REAL STRAIGHT ARRAY A [**1:100, *1:200, 1:3]

forces the compiler to form three subarrays of size 200 X 100. Furthermore,

   REAL ARRAY A [*1:50]

causes the vector A to be stored in one PEM.

[Figure 1. The Standard Storage Schemes]

The standard, more commonly used, methods of storage are STRAIGHT, SKEWED and CHECKER. These schemes are illustrated by the examples in Figure 1, where an 8 X 8 matrix A is stored in eight PEMs of ILLIAC IV. STRAIGHT is the simplest storage form for a matrix and leads to the simplest program, but the drawback is that a column is contained entirely in one PEM, thus prohibiting simultaneous access to all elements of a column. SKEWED allocation, on the other hand, distributes columns, as well as rows, across PEMs, allowing access to an entire column in parallel. Since rows and columns are equally accessible in this scheme, matrices can be PACKED by appropriate transposition, thus minimizing wastage of memory space, as shown in Figure 1.
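The difference between STRAIGHT and SKEWED storage can be made concrete with a small sketch. The Python below is illustrative only and is not part of the TRANQUIL compiler; the function names are hypothetical, subscripts are 1-based as in the TRANQUIL examples, PEM numbers are 0-based, and the skew rule is the one given for SKEWED arrays in Section 3.3 with 8 PEMs in place of 256.

```python
# Illustrative sketch: which PEM holds A[i, j] for an 8 X 8 matrix in 8 PEMs.
# Assumptions: 1-based subscripts, 0-based PEM numbers, hypothetical names.
N_PEMS = 8


def straight_pem(i, j, n_pems=N_PEMS):
    """STRAIGHT storage: every element of column j lies in the same PEM."""
    return (j - 1) % n_pems


def skewed_pem(i, j, n_pems=N_PEMS):
    """SKEWED storage: row i is rotated by i - 1 PEMs before being stored."""
    return (i + j - 2) % n_pems


def pems_for_column(pem_of, j, n=8):
    """Set of PEMs touched when all n elements of column j are accessed."""
    return {pem_of(i, j) for i in range(1, n + 1)}


def pems_for_row(pem_of, i, n=8):
    """Set of PEMs touched when all n elements of row i are accessed."""
    return {pem_of(i, j) for j in range(1, n + 1)}
```

Under these assumptions a STRAIGHT column touches a single PEM, so its elements cannot be fetched simultaneously, whereas a SKEWED row or column touches all eight PEMs.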
The CHECKER allocation scheme has been developed specially for storing mesh type data for elliptic partial differential equations. This scheme allows each PE fast access to the four nearest neighboring mesh points. Further discussion on storage schemes for matrix type operations is found in [5]. The CHECKER scheme is discussed in [6].

Mapping functions are applied to the aforementioned subarrays. Thus, for example, skewing is done on each of three subarrays of size 200 X 100 belonging to the array A which is declared

   REAL SKEWED ARRAY A [**1:100, *1:200, 1:3].

It is feasible to include new mapping functions in this list, but it is anticipated that most users' needs will be satisfied by this set of standard mapping functions.

Finally, the user who wishes to specify his own mapping function may make use of a PE memory assignment statement. For example:

   PEMEMORY PB [1:10, 1:256];
   PEM FOR (I, J) SIM ([1, 2, ..., 10] X [1, 2, ..., 256]) DO
      PB [I, J] ← B [I, MOD (256, I + J - 1)];
   REAL ARRAY (PB) B [1:10, 1:256];

establishes virtual space of size 10 X 256 in PE memory, and then stores a 10 X 256 array B there in skewed form. Thus, instead of making up the aforementioned subarrays out of an array declaration, space reserved in PE memory may be used. In the program, the programmer refers to an element of memory space via the assigned array name B and its subscripts, as usual.

It should be noted that storage mapping functions cannot be specified dynamically. Should remapping of data be required, an explicit assignment statement may be used; e.g., to change the data in an array A from skewed to straight storage an assignment statement B ← A is used, where B has been declared to be a straight array.

3. IMPLEMENTATION

3.1 Introduction

The TRANQUIL compiler currently has two passes.
In pass 1, on recognizing declarations, the compiler enters the necessary descriptive information into a table (IDTAB); in the case of array declarations, this is the attributes, type of mapping function, and size and dimension information for the array. Segmentation of data, i.e., partitioning of arrays, and storage allocation are taken care of in pass 2. Descriptions and formats of the tables used are given in Appendix B.

Many computational problems lead to programs which require more working storage than is available. This is particularly true for programs which will be run on ILLIAC IV, because the size of each PEM is relatively small compared with the computational speed of a PE. For example, multiplication of two 256 X 256 matrices takes only 70 msec on ILLIAC IV, but almost half the memory is needed to accommodate three of these arrays. Thus a strategy for segmenting arrays, and controlling the overlaying of segments, must be devised for the TRANQUIL compiler. The segmentation of object programs will not be considered here. Our main concern is how to deal with large arrays.

3.2 Array Partitioning

According to Randell and Kuehner [7], two characteristics believed to be most useful for revealing the functional capability and underlying mechanisms of current hardware-assisted dynamic storage allocation systems are related to the concepts of name space and predictive information. A discussion of these characteristics will now be given.

(i) Name Space

On conventional computers, memory is always treated as a linear space. Array elements must be stored as a linear array or vector; e.g., matrices are stored in order by rows, and a row of an array frequently forms a segment [7]. A typical size for a segment is 1024 words (B5500), and this limitation is reflected in the fact that the maximum size vector that can be declared is 1024 words.

ILLIAC IV memory, however, may be regarded as a two-dimensional space, i.e. an array, with 2048 rows (number of words in a PEM) X 256 columns (number of PEMs). This suggests the use of a two-dimensional segment, to be referred to as a block. Explicitly, all arrays will be partitioned into two-dimensional blocks, the maximum size of which is 256 X 256 words. This partitioning is done on the subarrays which were mentioned in Chapter 2.

[Figure 2. Partitioning for Array A [1:3, 1:300, 1:300]: the three subarrays A [1, *, *], A [2, *, *] and A [3, *, *], each divided into four blocks, with element A [2, 200, 100] marked]

Figure 2 illustrates the partitioning of a three-dimensional array into 3 subarrays, each with 4 blocks for a total of 12 blocks.

The reasons for using two-dimensional segments are as follows:

(a) It is necessary to have large segments in order to reduce the number of I/O requests. It is especially necessary that all PEMs be interrupted evenly when I/O requests are being processed [1], i.e., a block should be formed so that it has elements across all PEMs (if possible) when it has been read into memory.

(b) On the other hand, operations on arrays may always be performed in terms of subarrays; e.g., one can write

\[
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
+
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
=
\begin{pmatrix} A_{11} + B_{11} & A_{12} + B_{12} \\ A_{21} + B_{21} & A_{22} + B_{22} \end{pmatrix}
\]

for the addition of two matrices A and B provided that submatrices $A_{ij}$ and $B_{ij}$ have the same order.

These two observations suggest the use of two-dimensional segments, i.e., blocks. Another advantage of having two-dimensional segments is that either a row or a column of a segment can be accessed, which is an advantage over linear (row or column) storage. The partitioning of an array into blocks is independent of the mapping function used; i.e., for a SKEWED array skewing is applied to each block after partitioning.

One possible disadvantage which could be claimed against block segmentation is that of difficulty in indexing across blocks.
This may, however, be remedied by again utilizing the submatrix concept in matrix operations, e.g., instead of writing:

   FOR (I) SIM ([1,2,...,512]) DO
      FOR (J) SIM ([1,2,...,512]) DO
         BEGIN
            C[I,J] ← 0;
            FOR (K) SEQ ([1,2,...,512]) DO
               C[I,J] ← C[I,J] + A[I,K] X B[K,J]
         END                                                  (1)

write:

   FOR (M) SEQ ([1,2]) DO
      FOR (N) SEQ ([1,2]) DO
         BEGIN
            [M,N] C ← 0;
            FOR (L) SEQ ([1,2]) DO
               [M,N] C ← [M,N] C + [M,L] A X [L,N] B
         END                                                  (2)

where [M,N] C is computed using code (1), but with indices now varying from 1 to 256 instead of 1 to 512, i.e.

   FOR (I) SIM ([1,2,...,256]) DO
      FOR (J) SIM ([1,2,...,256]) DO
         BEGIN
            [M,N] C [I,J] ← 0;
            FOR (K) SEQ ([1,2,...,256]) DO
               [M,N] C [I,J] ← [M,N] C [I,J] + [M,L] A [I,K] X [L,N] B [K,J]
         END

It should be emphasized that the generation of code corresponding to (2) above from (1) is the compiler's responsibility, and programs written in TRANQUIL do not have to contain explicit provisions for segmentation. For the detailed algorithm for generating code (2) from code (1), the reader is referred to [8, 9].

The sizes of the blocks obtained by the array partitioning fall into four categories:

   (a) 256 X 256
   (b) n X 256,   n < 256
   (c) 256 X m,   m < 256
   (d) m X n,     m, n < 256

For the sake of discussion we call these SQUARE, HBLOCK, VBLOCK, and SBLOCK, respectively. As was mentioned before, it is important to form a block which has 256 columns, so that when I/O is being processed all PEMs will be interrupted evenly. Thus, small blocks belonging to the same array should, if possible, be packed together to form a larger block. The details will be discussed later.

(ii) Predictive Information

All array operations and data transfers between ILLIAC IV disk and PE memory are done in terms of these blocks. Blocking facilitates the use of arrays which are larger than the total PE memory.
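The rewriting of code (1) into code (2) can be illustrated with a short Python sketch. This is not the compiler's algorithm (for that the text refers to [8, 9]); it only shows, for square matrices whose order is a multiple of the block size, that looping over block indices M, N, L and then over element indices inside a block gives the same product as the unblocked loops. The names `matmul` and `blocked_matmul` and the parameter `bs` are hypothetical, and the loops run sequentially, whereas SIM loops would run in parallel on ILLIAC IV.

```python
# Sketch of code (1) versus code (2); names and sizes are illustrative only.

def matmul(a, b):
    """Unblocked product, corresponding to code (1)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def blocked_matmul(a, b, bs):
    """Blocked product, corresponding to code (2) with bs X bs blocks."""
    n = len(a)
    if n % bs != 0:
        raise ValueError("matrix order must be a multiple of the block size")
    nb = n // bs
    c = [[0] * n for _ in range(n)]          # every block [M,N] C starts at 0
    for m in range(nb):                      # FOR (M) SEQ
        for nn in range(nb):                 # FOR (N) SEQ
            for l in range(nb):              # FOR (L) SEQ
                # [M,N] C <- [M,N] C + [M,L] A X [L,N] B, element by element
                for i in range(bs):
                    for j in range(bs):
                        acc = 0
                        for k in range(bs):
                            acc += a[m * bs + i][l * bs + k] * b[l * bs + k][nn * bs + j]
                        c[m * bs + i][nn * bs + j] += acc
    return c
```

With 256 in place of `bs` and 512 in place of the matrix order, the three outer loops correspond to the block loops of code (2) and the three inner loops to the per-block computation.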
All data is normally stored on the $10^9$ bit (approximately 30 memory loads) ILLIAC IV disk, and blocks are only brought into PE memory (at a transfer rate of $5 \times 10^8$ bits/second or 8 ms/block) as required. The disk rotation time is 40 ms. The TRANQUIL compiler will automatically generate block transfer I/O requests, which are handled by the operating system, to make it possible to write a TRANQUIL program which includes no explicit I/O statements. All data are initially stored on the disk in the same way they are used on ILLIAC IV; e.g., if a skewed array is necessary on ILLIAC IV, then the array is also stored in a skewed fashion on the disk before the program which uses it is initiated by the operating system.

It is generally agreed that both preplanned and dynamic storage allocation have advantages for certain types of problems. Preplanned allocation should work best with more regularly predictable problems, whereas dynamic storage allocation can best cope with problems whose flow and storage requirements are highly data dependent and therefore unpredictable. In the case of the TRANQUIL compiler it has been further recognized that the methods are not mutually exclusive and can work well in combination. A prescheduled sequence of I/O requests, for example, can be generated locally; i.e., in a TRANQUIL statement involving large arrays as in

   A ← B X C

where A, B, and C are arrays of size 1024 X 1024. On the other hand, a dynamic storage allocation scheme is necessary globally; i.e., between TRANQUIL statements, and this should consider transfers of control. The first TRANQUIL compiler will adopt a simple strategy; i.e., a block on demand system utilizing the first-in last-out strategy.

The block number for an element $A[i_1, i_2, \ldots, i_{n-1}, i_n]$ of an array $A[1{:}M_1, 1{:}M_2, \ldots, 1{:}M_n]$ is given by

\[
\Bigl( \bigl( \cdots \bigl( (i_1 - 1) M_2 + (i_2 - 1) \bigr) M_3 + \cdots + (i_{n-2} - 1) \bigr) \left\lfloor \frac{M_{n-1} + 255}{256} \right\rfloor + \left\lfloor \frac{i_{n-1}}{256} \right\rfloor \Bigr) \left\lfloor \frac{M_n + 255}{256} \right\rfloor + \left\lfloor \frac{i_n}{256} \right\rfloor
\]

For example, the block number for the element A[2,200,100] in Figure 2 is

\[
\Bigl( (2 - 1) \left\lfloor \frac{300 + 255}{256} \right\rfloor + \left\lfloor \frac{200}{256} \right\rfloor \Bigr) \left\lfloor \frac{300 + 255}{256} \right\rfloor + \left\lfloor \frac{100}{256} \right\rfloor = 4
\]

For each block thus established an entry is made in BASETB. This, however, does not necessarily imply that a segment is established for each block. The blocks smaller than 256 X 256, i.e., HBLOCKs, VBLOCKs, and SBLOCKs, may be packed together (they are then called subblocks) to form a larger block (called a superblock). It should be noted that the terms HBLOCK, VBLOCK, SBLOCK and SQUARE only refer to the size of a block as previously defined. Thus it is possible, for example, to talk about a superblock which is a HBLOCK.

The entry in BASETB gives an absolute base address (the address of the upper left-most element) of the block, if the block is in memory, or, in the case of a subblock, the address relative to the base address of the corresponding superblock and a pointer to the BASETB entry of that superblock. The above addresses are each specified by a PEM word number (x) together with a PEM number (y). Thus if (x, y) and (X, Y) are the relative address of a subblock and the base address of the corresponding superblock, respectively, then the base address of the subblock is (x + X, y + Y).

As an example, in Figure 2 there are:

   3 SQUAREs (blocks 0, 4 and 8),
   3 HBLOCKs (blocks 2, 6 and 10),
   3 VBLOCKs (blocks 1, 5 and 9), and
   3 SBLOCKs (blocks 3, 7 and 11).

Twelve entries are established in BASETB corresponding to these blocks. The 3 VBLOCKs are packed into a 256 X 144 superblock (Figure 3).

[Figure 3. Block Packing for Array A [1:3, 1:300, 1:300]: panels "Packing of VBLOCKs", "Packing of HBLOCKs" and "Packing of SBLOCKs"]

The relative address, in this superblock, of block 10 is (0, 96).
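The block-number rule and the four size categories can be sketched in Python as follows. The sketch is hedged in two respects. First, all function names are hypothetical. Second, it divides (i - 1) rather than i by 256 for the last two subscripts; this is an assumption made so that subscript 256 still falls in the first block, in line with the relative-address rule (i - 1) mod 256 of Section 3.3. For the element A[2,200,100] both variants give block 4.

```python
# Sketch of block numbering and block classification; names are illustrative.
BLOCK = 256


def blocks_per_dim(m, block=BLOCK):
    """Number of blocks along a dimension of extent m: floor((m + 255) / 256)."""
    return (m + block - 1) // block


def block_number(index, bounds, block=BLOCK):
    """Block number of A[i_1..i_n] in A[1:M_1, ..., 1:M_n], n >= 2, 1-based."""
    *outer_idx, i_row, i_col = index
    *outer_bnd, m_row, m_col = bounds
    sub = 0                                   # subarray number (Horner form)
    for i, m in zip(outer_idx, outer_bnd):
        sub = sub * m + (i - 1)
    rows_b = blocks_per_dim(m_row, block)
    cols_b = blocks_per_dim(m_col, block)
    # Assumption: (i - 1) // block, see the lead-in above.
    return (sub * rows_b + (i_row - 1) // block) * cols_b + (i_col - 1) // block


def block_shape(k, m_row, m_col, block=BLOCK):
    """(rows, columns) of the k-th block, 0-based, inside one subarray."""
    r, c = divmod(k, blocks_per_dim(m_col, block))
    return min(block, m_row - r * block), min(block, m_col - c * block)


def classify(rows, cols, block=BLOCK):
    """Name the block size category used in the text."""
    if rows == block and cols == block:
        return "SQUARE"
    if cols == block:
        return "HBLOCK"      # n X 256
    if rows == block:
        return "VBLOCK"      # 256 X m
    return "SBLOCK"          # m X n
```

For a 300 X 300 subarray the four blocks come out, in block-number order, as one SQUARE, one 256 X 44 VBLOCK, one 44 X 256 HBLOCK and one 44 X 44 SBLOCK, which matches the counts listed above for the three subarrays of Figure 2.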
The three HBLOCKs are packed into a 132 X 256 superblock and the three SBLOCKs are stored in a 44 X 144 superblock as shown. Further, suppose these blocks are stored in PE memory as shown in Figure 4.

[Figure 4. Use of PE Memory for the Blocks Shown in Figures 2 and 3: PE word number (0, 256, 512, 768, 1024, 1156) against PE number]

Entries for BASETB are made after packing.

   | P | X | Y | SIZEX | SIZEY |

   Figure 5. BASETB Format 1

The BASETB entry format for a superblock or SQUARE is shown in Figure 5, where
   P indicates that this is an entry for a superblock or SQUARE;
   (X, Y) is the absolute base address of this block;
   SIZEX is the number of rows of this block;
   SIZEY is the number of columns of this block.

   | C | ORIGIN | COX | COY |

   Figure 6. BASETB Format 2

Figure 6 shows the BASETB entry format for a subblock, where
   C indicates that this is an entry for a subblock;
   ORIGIN points to the BASETB entry of the corresponding superblock;
   (COX, COY) is the relative address of the subblock.

For example, the BASETB entries for blocks 2 and 10 in Figure 4 are:

    2:  | P | 768 | 0 | 256 | 144 |
   10:  | C |  2  | 0 | 96  |

   Figure 7. Examples of BASETB Entries

Thus, the absolute base address for block 10 is given by $(X_{10}, Y_{10})$ where

   X_10 = BASETB [10] . COX + BASETB [BASETB [10] . ORIGIN] . X = 0 + 768 = 768

and

   Y_10 = BASETB [10] . COY + BASETB [BASETB [10] . ORIGIN] . Y = 96 + 0 = 96

The procedure above generates, for example, 100 subarrays and 100 entries in BASETB for an array A[1:10, 1:10, 1:3, 1:5] in spite of the fact that the subarrays might be packed and generate only one superblock. An array A[#1:10, ##1:10, *1:3, **1:5], on the other hand, generates only 1 entry in BASETB. This is because * and ** together with # and ## force the compiler to generate a single subarray of size 30 X 50 (Figure 8). Since partitioning is done in terms of this subarray, only one block is generated, making only one entry in BASETB. Thus if *'s and #'s are used effectively, more efficient code can be generated.

[Figure 8. One Subarray Generated from Array A [#1:10, ##1:10, *1:3, **1:5]]

Should VBLOCKs or SBLOCKs result from partitioning, their sizes will be modified so that they have $8 \lfloor (n + 7)/8 \rfloor$ columns, where n is the number of columns of the original block. This is done in order that each block be stored beginning at the p-th PEM, where p mod 8 = 0, to make efficient use of the 8 word CU fetch capabilities.

Blocks will be packed after the above mentioned modification has been made. Packing is done in two stages. First, packing is done for the same kind of blocks (e.g., VBLOCKs are packed by themselves) belonging to the same array. For example, VBLOCKs are arranged side by side until the number of columns of the resultant superblock reaches 256. A similar procedure is used for packing HBLOCKs, i.e., they are stacked so that the resultant superblock has up to 256 rows. In the case of SBLOCK packing, first they are treated as VBLOCKs (e.g., they are arranged side by side). Further, if the resultant superblock is a HBLOCK, then HBLOCK packing is done.

It should be noted that even after the packing it is possible to have residual blocks which cannot be packed, or a superblock which is either a VBLOCK or an SBLOCK. These are the objects of further packing, which is discussed in the following section. HBLOCKs will not be considered in further packing and will be treated as SQUAREs. A detailed flowchart for partitioning and packing is given in Appendix C.

3.3 Address Calculation

The effective address of any array element is established by computing its block number, which was discussed in the previous section, and a relative address within that block. The computation of the relative address varies from one mapping function to another. Here only standard mapping functions are discussed.

For an element $A[i_1, i_2, \ldots, i_{n-1}, i_n]$ of an array $A[1{:}M_1, \ldots, 1{:}M_n]$, the relative PE number and the relative PE word address of the element in the specified block are given as follows, where x will denote PEM address and y will denote PEM number:

(i) STRAIGHT array

\[
x = (i_{n-1} - 1) \bmod 256, \qquad y = (i_n - 1) \bmod 256
\]

(ii) SKEWED array

\[
x = (i_{n-1} - 1) \bmod 256, \qquad y = (i_{n-1} + i_n - 2) \bmod 256
\]

For example, consider again the array in Figure 2. The block number for the element A[2, 200, 100] was 4. If the array is skewed, the PE number relative to the base address of this block is (200 + 100 - 2) mod 256 = 42 and the relative address in this PEM is 199.

The equation in Section 3.2 can also be written as

\[
\bigl( \cdots \bigl( (i_1 - 1) M_2 + (i_2 - 1) \bigr) M_3 + \cdots + (i_{n-2} - 1) \bigr) M'_{n-1} M'_n + \left\lfloor \frac{i_{n-1}}{256} \right\rfloor M'_n + \left\lfloor \frac{i_n}{256} \right\rfloor
\]

where

\[
M'_i = \left\lfloor \frac{M_i + 255}{256} \right\rfloor
\]

To obtain the block number using this form requires (n-2) subtractions, (n-1) multiplications and (n-1) additions. This value, in turn, is used to access BASETB to locate an absolute base address of the block. $M_1 \cdot M_2 \cdots M_{n-2} \cdot M'_{n-1} \cdot M'_n$ words are required in BASETB for this array, besides n words in DOPETB, which contains the bounds for each subscript position.

Upon finding the absolute base address, the relative address of an element in the block is calculated using one of the two sets of equations given above, if a standard mapping function is specified. Both equations require one or two subtractions or additions and a shift operation. In practice, a PEM word address is calculated in the PE which requires it, and is used as an index value in that PE. Thus, the PE index value together with the CU index value, which is an absolute base address of the block, is used to locate the element of the array. In most cases some or all of the elements in a row or column are used simultaneously. In the case of column operation, for example, each PE can simultaneously compute the relative address (index value) which it will require.
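The two halves of the address calculation, the BASETB lookup of Section 3.2 and the relative address of this section, can be sketched together in Python. The class and function names below are hypothetical and the table is modelled as a dictionary; only the field names X, Y, SIZEX, SIZEY, ORIGIN, COX and COY and the arithmetic come from the text.

```python
# Sketch of base-address resolution and relative addressing; illustrative only.
from dataclasses import dataclass


@dataclass
class SuperEntry:
    """BASETB Format 1 (P): a superblock or SQUARE with an absolute base."""
    x: int
    y: int
    size_x: int
    size_y: int


@dataclass
class SubEntry:
    """BASETB Format 2 (C): a subblock located relative to its superblock."""
    origin: int
    cox: int
    coy: int


def base_address(basetb, k):
    """Absolute base address (PEM word number, PEM number) of block k."""
    entry = basetb[k]
    if isinstance(entry, SubEntry):
        parent = basetb[entry.origin]
        return entry.cox + parent.x, entry.coy + parent.y
    return entry.x, entry.y


def relative_address(i_row, i_col, scheme, block=256):
    """Relative (x, y) of an element inside its block; subscripts are 1-based."""
    x = (i_row - 1) % block
    if scheme == "STRAIGHT":
        y = (i_col - 1) % block
    elif scheme == "SKEWED":
        y = (i_row + i_col - 2) % block
    else:
        raise ValueError("only the standard STRAIGHT and SKEWED schemes are sketched")
    return x, y
```

Using the entries of Figure 7, `base_address({2: SuperEntry(768, 0, 256, 144), 10: SubEntry(2, 0, 96)}, 10)` reproduces the address (768, 96) derived in Section 3.2, and `relative_address(200, 100, "SKEWED")` reproduces the relative address (199, 42) of A[2, 200, 100]. Because 256 is a power of two, the `% block` operations reduce to masking the low-order 8 bits, which is why no reference to the array bounds is needed at this step.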
It should be noted that to compute the PEM number and the PEM word address of an element, no information on the array size is necessary; i.e., no reference to DOPETB is necessary. On the linear memory space, however, all dimensional information is required to locate the memory cell corresponding to an array element [10].

3.4 Storage Allocation

The storage allocation procedure is a separate package which is independent of the other parts of the compiler, such as the block packing procedure. The compiler can request any amount of memory space, i.e., a SQUARE, HBLOCK, VBLOCK or SBLOCK, and free it at any time. The storage allocation procedure keeps track of memory usage, and returns appropriate memory space on request, or frees it.

In allocating memory space for a block a linked space list called HMEMORY, which keeps count only of the number of rows of memory which have been used, is utilized. If a HBLOCK of size m X 256 (or a SQUARE) is to be stored, the list is searched to locate the smallest space which has at least m (or 256) adjacent rows and the block is stored there. In the case of a VBLOCK (256 X m), 256 rows of PE memory may be allocated and a sublist called VLIST, corresponding to a 256 X 256 block of storage, is established. A VLIST records the use of columns (PEMs) in a particular 256 X 256 block. In the case of a SBLOCK, again a 256 X 256 block may be allocated, with an associated list SLIST. This block is divided into 4 X 8 subblocks. SLIST consists of a 64 X 32 bit boolean array (Figure 9) in which each bit represents the use, or otherwise, of each 4 X 8 subblock.

[Figure 9. SLIST: a 64 X 32 bit array in which each bit corresponds to a 4 X 8 word PE memory block]

The above is the overall picture of the storage allocation. However, in transferring data between ILLIAC IV disk and PE memory, the initial PE memory address is restricted to certain PEMs and the block of data transferred can have only 16, 32, 64, 128 or 256 columns.
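The effect of the column restriction on a single block can be sketched in Python as follows. The function names are hypothetical. The rule that a transfer of a given width may start only at PEM numbers which are multiples of that width is an assumption; the text states it only for 64-column transfers.

```python
# Sketch of rounding a block up to a permitted disk-transfer width.
# Names are illustrative; the start-PEM rule is an assumption (see lead-in).
TRANSFER_WIDTHS = (16, 32, 64, 128, 256)


def transfer_width(cols):
    """Smallest permitted number of columns that can hold a block of cols columns."""
    if cols < 1:
        raise ValueError("a block needs at least one column")
    for width in TRANSFER_WIDTHS:
        if cols <= width:
            return width
    raise ValueError("a block cannot be wider than 256 columns")


def wasted_words(rows, cols):
    """Words left unused when a rows X cols block is padded to a transfer width."""
    return rows * (transfer_width(cols) - cols)


def allowed_start_pems(width, n_pems=256):
    """Assumed permitted starting PEM numbers for a transfer of this width."""
    return list(range(0, n_pems, width))
```

A 256 X 72 VBLOCK, for instance, has to be treated as 256 X 128 under this rule, leaving 256 X 56 words unused, which is the kind of waste the further packing step is meant to reduce.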
For example, a 64 column block transfer must start in one of PEM numbers 0, 64, 128 or 192. Thus, the size of VBLOCKs and SBLOCKs must be adjusted so that they have one of the above mentioned numbers of columns. This implies that a certain amount of storage is wasted. For instance, a VBLOCK of size 256 X 72 is made up to a block of size 256 X 128, wasting 256 X 56 words. To avoid this, further packing is introduced, which is applied before PE memory is assigned to the blocks (i.e., before memory space is requested from the storage allocator).

First a bit array which is similar to SLIST is prepared. Upon entering a TRANQUIL program block, arrays are partitioned and packed as discussed in Section 3.2. If any residual VBLOCKs or SBLOCKs resulted, then they are now formed into a superblock and entries are made in the bit array. A superblock of size m X 256 (SQUARE) is made whenever this bit array becomes full, or the program exits from the TRANQUIL block. This method may still introduce uneconomical packing. However, such wasted space cannot be used during that program block anyway.

The strategy that has been chosen might involve considerable bookkeeping, but it minimizes storage fragmentation and reduces the frequency of storage allocation and I/O requests. Details of the algorithm are given in Appendix C.

4. FURTHER DISCUSSION

There are several parts of the TRANQUIL language still to be implemented. One of these is the PEM storage allocation statement. This will be implemented using a macro generator which is now being written. For instance, consider the array B, the storage allocation statement for which appears in Chapter 2. After this statement in a TRANQUIL program, whenever the array B is used its subscript expression is replaced by the expression appearing in the storage allocation statement.
Data management, such as partitioning of arrays and packing of blocks, has been discussed, and it has been shown that effective addresses for elements of partitioned arrays can be computed efficiently. However, no provision has yet been made for a proper way to communicate with the ILLIAC IV operating system. Obviously, for efficient transfer of data between the disk and the PEMs, the format (mapping and partitioning) of the data on the disk must be the form required in the PEMs. Hence the TRANQUIL compiler must communicate the appropriate information to the operating system which, when it obtains the file name, program name and external format information, will cause the data to be partitioned and mapped in the format required by the associated TRANQUIL program.

Also relating to input/output is the strategy of better resource allocation. One direction which is now being studied is to make a model of a program and interchange statements so that the resultant program is computationally equivalent to the original one, yet faster in execution. The idea is to reorganize statements to minimize input/output of data blocks. Such a high degree of optimization will be especially effective for lengthy production-type programs.

Finally, TRANQUIL is by no means a completely satisfactory problem-oriented language, in the sense that users are not free from the burden of choosing data representations, e.g., mapping functions. What is needed is a language which Balzer calls a dataless programming language [11]. He states:

"The independence (of a processing to data representation) will allow the programmer (1) to disregard, while specifying the program, the details of data processing, memory space requirements, and matching of data representation to the processing done on it; and (2) to handle them instead, during the data declaration phase. ....
The problem of data representation can be left until this programming has been completed; thus, a more rational decision can be made concerning an optimal representation. Because of this separation, the programmer should be able to think through his problem better."

5. CONCLUSION

Array mapping functions and array partitioning for the TRANQUIL language have been discussed in this paper. For array-type operations SKEWED storage will often provide the best result in terms of computational speed, because either a row or a column of a SKEWED array can be accessed simultaneously by all PEs. Also, since the use of a TWS system [12] makes the TRANQUIL compiler more readily modifiable, it is feasible to incorporate new mapping functions in TRANQUIL relatively easily.

It should be emphasized that the addressing mechanism for arrays is rather simple in spite of the complicated partitioning. In fact, to locate an element in a block, only a few additions/subtractions and some shift operations are required. Thus, the partitioning mechanism has no adverse effect on the computational speed of compiled code.

Finally, the partitioning and blocking of data, together with automatic (computer generated) input/output of data blocks to/from disk, enables the user to write programs using data arrays (up to 10^9 bits) which are larger than the ILLIAC IV memory.

APPENDIX A

SYNTAX AND SEMANTICS SPECIFICATION OF TRANQUIL DECLARATIONS

The syntax specification is written in Backus Naur Form [13]. The syntax for the following non-terminal symbols is as specified in the ALGOL report [14], or in Appendix B of [8], and they are used here without any further definition:

The semantics specification is given in English. If not explicitly given, the semantics is assumed to be identical to that given in [14] for the equivalent ALGOL construct.

1. Declarations

::= |

2. Variable Declaration

2.1 Syntax

::=
::= BOOLEAN | REAL | REALS | REALD |
INTEGER | INTEGERS | INTEGERL | BYTE8 | BYTE16
::= , |
::=

2.2 Examples

INTEGER I, J
REAL X, Y

2.3 Semantics

Variable declarations serve to declare certain identifiers to represent simple variables of a given type. Each attribute corresponds to a specific word format in ILLIAC IV:

BOOLEAN    64 bit word (only the least significant bit is meaningful)
REAL       64 bit floating point
REALS      32 bit floating point
REALD      128 bit (double precision) floating point
INTEGER    48 bit fixed point
INTEGERS   24 bit fixed point
INTEGERL   64 bit fixed point (no sign)
BYTE8      8 bit fixed point (no sign)
BYTE16     16 bit fixed point (no sign)
COMPLEX    Simple variables further specified by this attribute have complex numbers as values.

3. Array Declaration

3.1 Syntax

ARRAY | ARRAY | ARRAY () | ARRAY ()
<mapping function> ::= STRAIGHT | SKEWED | SKEWED PACKED | CHECKER |
::=
::= | ,
::= [] | ,
::= | ,
::= |
::= * | ** | # | ## |
::=
::=
::=
::=
::=

3.2 Examples

REAL SKEWED ARRAY A[1:5, 1:10], B[5, 10]
ARRAY AR, BR[**1:80, *1:256]
INTEGER ARRAY AI, BI[#4, **64, *128]

3.3 Semantics

An array declaration declares one or several identifiers to represent multidimensional arrays of subscripted variables and gives the dimensions of the arrays, the bounds of the subscripts, the types of the mapping functions, and the type of the variable.

3.3.1 Subscript Bounds

As in PL/I, the subscript bounds can be given in either ALGOL form or FORTRAN form. Simultaneous use of both forms in one bound list is, however, forbidden; e.g., [1:5, 10] is illegal.

3.3.2 Dimensions

Up to eight dimensions are allowed in arrays.

3.3.3 Arrangement (Also see Chapter 1.)

Generally arrays are stored in PE memory by subarrays, which are made up from the last two dimensions; e.g., three 64 X 128 subarrays are formed for an array A[3, 64, 128]. Arrangement declarations serve to change this rule. One asterisk *, together with **, placed in front of a bound indicates that a subarray is formed by those dimensions.
For example, A[5, **256, *128] will generate five subarrays of size 128 X 256. Since subarrays are stored in PE memory as they are, i.e., an m X n subarray occupies m PEM words in n PEMs, the arrangement declaration may be used to obtain better memory usage. Also, * forces data corresponding to that subscript to be stored in one PEM, and ** indicates that the data for that subscript is stored across PEMs. Thus, a vector A[50] can be stored in one PEM by declaring it as A[*50], or across PEMs as A[**50].

One sharp # and two sharps ## similarly indicate how to arrange subarrays in PE memory. One sharp indicates the direction of increasing PEM word address, and ## indicates across PEMs. For example, an array A[#5, ##4, *32, **64] will introduce twenty 32 X 64 subarrays, arranged in five rows of 4 subarrays, thus making up a 160 X 256 block (Figure A1).

[Figure A1. Subarrays for an Array A[#5, ##4, *32, **64]: a 160 X 256 block made of 32 X 64 subarrays, with A[1,1,*,*], A[1,2,*,*], ..., A[1,4,*,*] in the top row and A[5,1,*,*], ..., A[5,4,*,*] in the bottom row.]

The five combinations of arrangement markers given in Table A1 are legal, where the numbers indicate the number of allowable appearances in a single declaration.

Markers:          *   **   #   ##
1st combination:  n- times
2nd combination:  1 1
3rd combination:  1 1 1
4th combination:  1 1 1
5th combination:  1 1 1 1

Table A1. Combinations of Arrangement Declaration Markers

3.3.4 Mapping Function

The default mapping function is SKEWED PACKED. In the case of a user-specified mapping function, a corresponding PEM assignment declaration must be in effect at the time the array declaration is processed.

3.3.5 PEM Area

If a PEM assignment declaration is used to define a special mapping function, then a corresponding PEM area name must appear in the array declaration.

4. PEM Reserve Declaration

4.1 Syntax

[, ]
::=
::=
::=

4.2 Example

PEMEMORY PEMEM [10, 256]

4.3 Semantics

PEM reserve declarations serve to reserve a certain amount of virtual memory, allowing the programmer to store arrays there in any fashion.
The integer used for PEM size should not be greater than 256. Should more space be needed, more than one area may be reserved. The declaration should appear before the area is used in a PEM assignment declaration. The reserved area will be released upon exit from the block in which it was declared, as usual.

4.3.1 Word Size and PEM Size

Both word and PEM are understood to be numbered starting from 1 and increasing in increments of 1.

5. PEM Assignment Declaration

5.1 Syntax

::= PEM
lock> ::= | BEGIN
::= | |
::=
::= FOR () SIM () DO
::= ←
::= [, ]
::= |
::= |
::= []
::= | ,
::=

5.2 Examples

PEM FOR (I, J) SIM ([1,2,...,256] X [1,2,...,256]) DO
    PEMEM[I, J] ← ARAY[J, I]

PEM BEGIN
    PEMEM[1, 1] ← ARAY[1, 1];
    PEMEM[256, 1] ← ARAY[1, 256];
    FOR (I) SIM ([2,3,...,255]) DO PEMEM[I, 1] ← ARAY[1, I]
END

5.3 Semantics

PEM assignment declarations serve to store arrays into a reserved PEM area in the way specified, i.e., to declare a new mapping function.

5.3.1 PEM Area

PEM areas must be reserved before they are actually used by PEM assignment declarations.

5.3.2 Variables

All variables appearing in PEM assignment declarations, other than PEM area names and array names, need not be declared; they are understood to be local to the declaration.

5.3.3 Sets

No set appearing in a PEM assignment declaration may be dynamic; i.e., parameters can never be passed to this declaration from outside.

5.3.4 PEM Assignment Statement

PEM PEMEM[I, J] ← AR[K, L] causes the (K, L) element of an array AR to be stored in the I-th row and the J-th column of PEMEM.

5.4 Further Example

BEGIN
    PEMEMORY PEMEM[256, 256];
    PEM FOR (I, J) SIM ([1,2,...,256] X [1,2,...,256]) DO
        PEMEM[I, J] ← ARAY[J, I];
    REAL ARRAY (PEMEM) ARAY[256, 256];
END

In the above example a 256 X 256 array ARAY is stored in PE memory in such a way that a column of ARAY is across PEMs and a row of ARAY is in a single PEM.
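The effect of the declaration in the further example can be checked with a small sketch. It models the PEM assignment PEMEM[I, J] ← ARAY[J, I] as a dictionary from array subscripts to (word, PEM) positions, following the rule in 5.3.4 that the first PEMEM subscript selects the row (PEM word) and the second the column (PEM). The function name and the reduced size of 4 are illustrative assumptions; the example in the text uses 256.

```python
def transposed_assignment(size):
    """Model PEM FOR (I, J) SIM ([1..size] X [1..size]) DO PEMEM[I, J] <- ARAY[J, I].

    Returns a dict mapping an array subscript (k, l) to the (word, pem)
    position at which that element is stored, using 1-based indices.
    """
    placement = {}
    for i in range(1, size + 1):        # I: PEM word (row of PEMEM)
        for j in range(1, size + 1):    # J: PEM number (column of PEMEM)
            placement[(j, i)] = (i, j)  # ARAY[J, I] goes to PEMEM[I, J]
    return placement


placement = transposed_assignment(4)

# Row 3 of ARAY: every element lands in the same PEM (PEM 3).
row_pems = {placement[(3, l)][1] for l in range(1, 5)}

# Column 2 of ARAY: the elements are spread over all PEMs at the same word.
col_pems = sorted(placement[(k, 2)][1] for k in range(1, 5))
col_words = {placement[(k, 2)][0] for k in range(1, 5)}
```

Here row_pems is {3} and col_pems is [1, 2, 3, 4] with a single word address, which agrees with the statement that a row of ARAY sits in a single PEM while a column lies across PEMs.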
APPENDIX B

TABLES

The following is the list of tables used in the TRANQUIL compiler to take care of declarations and memory allocation.

(i) Tables used in both Pass 1 and Pass 2

IDTAB contains the information on each identifier declared in a program; e.g., type, a pointer to a corresponding DOPETB entry if an identifier is an array.

DOPETB contains the information necessary to reference arrays; e.g., size of each dimension, the number of dimensions.

(ii) Tables in Pass 2

BASETB contains the descriptor for each block resulting from an array partitioning; e.g., size of a block, base address for a block.

These tables are linked as shown in Figure B1. The number of blocks N in BASETB is determined by:

\[ N = M_1 \times M_2 \times \cdots \times \left\lfloor \frac{M_{n-1} + 255}{256} \right\rfloor \times \left\lfloor \frac{M_n + 255}{256} \right\rfloor \]

[Figure B1. Entries and Linkage of Tables for A[1:M_1, 1:M_2, ..., 1:M_n]: the IDTAB entry for A points to a DOPETB entry holding n and M_1, ..., M_n, which points to N BASETB entries, each with SIZE (e.g., 256 X 256) and BASE (e.g., 100); n = number of dimensions, N = number of blocks.]

APPENDIX C

ARRAY PARTITIONING AND PACKING FLOWCHARTS

[Flowcharts]

[Figure D2. Example of an Entry for I4MEMORY: a linked list of USED and FREE row regions of PE memory (rows up to 2047), with region boundaries at rows 256, 296, 376, 842 and 1098.]

[Figure D3. Example of an Entry for VLIST: PE memory usage, the I4MEMORY word, and a VLIST with entries of SIZE = 256, 128, 64 and 32 marked used (U) or free (F), linked to BASETB.]

LIST OF REFERENCES

[1] Barnes, G. H., et al., "The ILLIAC IV Computer", IEEE Transactions on Computers, C-17, 8 (August, 1968), pp. 746-757.

[2] Kuck, D. J., "ILLIAC IV Software and Application Programming", IEEE Transactions on Computers, C-17, 8 (August, 1968), pp. 758-770.

[3] Iverson, K. E., "A Programming Language", John Wiley & Sons, Inc., New York (1962).

[4] Knuth, D. E., "The Art of Computer Programming", Vol. 1, Addison-Wesley (1968).
[5] Knowles, M., et al., "Matrix Operations on ILLIAC IV", Department of Computer Science, University of Illinois, Urbana, Illinois, ILLIAC IV Document No. 118 (March, 1967).

[6] Benokraitis, V., "Alternate Storage Methods for Two-Dimensional Hydrodynamics Calculations", Department of Computer Science, University of Illinois, Urbana, Illinois, ILLIAC IV Document No. 190 (May, 1968).

[7] Randell, B. and Kuehner, J. C., "Dynamic Storage Allocation Systems", Comm. ACM, 11, 5 (May, 1968), pp. 297-306.

[8] Wilhelmson, R. B., "Control Statement Syntax and Semantics of a Language for Parallel Processors", (M.S. Thesis), Department of Computer Science, University of Illinois, Urbana, Illinois, (January, 1969).

[9] Budnik, Paul P., "TRANQUIL Arithmetic", (M.S. Thesis), Department of Computer Science, University of Illinois, Urbana, Illinois, (January, 1969).

[10] Hellerman, H., "Addressing Multidimensional Arrays", Comm. ACM, 5, 4 (April, 1962), pp. 205-207.

[11] Balzer, R. M., "Dataless Programming", Proc. FJCC, (1967), pp. 535-543.

[12] Northcote, R. S., "The Structure and Use of a Compiler-Compiler System", Proc. Third Australian Computer Conference, (May, 1966), pp. 339-3^-

[13] Backus, J. W., "The Syntax and Semantics of the Proposed International Algebraic Language of the Zurich ACM-GAMM Conference", Proc. Int. Conf. Inf. Proc., UNESCO, Paris, France, (June, 1959).

[14] Naur, P., et al., "Revised Report on the Algorithmic Language ALGOL 60", Comm. ACM, 6 (January, 1963), pp. 1-17.

DOCUMENT CONTROL DATA - R & D (Security Classification: UNCLASSIFIED)

1. ORIGINATING ACTIVITY (Corporate author): Department of Computer Science, University of Illinois, Urbana, Illinois 61801
2a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
2b. GROUP:
3. REPORT TITLE: STORAGE ALLOCATION ALGORITHMS IN THE TRANQUIL COMPILER
4.
DESCRIPTIVE NOTES (Type of report and inclusive dates): Research Report
5. AUTHOR(S) (First name, middle initial, last name): Yoichi Muraoka
6. REPORT DATE: January 13, 1969
7a. TOTAL NO. OF PAGES: 61
7b. NO. OF REFS: 14
8a. CONTRACT OR GRANT NO.: USAF 30(602)4144
8b. PROJECT NO.: 46-26-15-305
9a. ORIGINATOR'S REPORT NUMBER(S): DCS Report No. 297
9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report):
10. DISTRIBUTION STATEMENT: Qualified requesters may obtain copies of this report from DCS.
11. SUPPLEMENTARY NOTES: NONE
12. SPONSORING MILITARY ACTIVITY: Rome Air Development Center, Griffiss Air Force Base, Rome, New York 13440
13. ABSTRACT: TRANQUIL is a language for describing algorithms in terms of parallel constructs. Its compiler is now being implemented for the parallel array computer ILLIAC IV. This paper discusses a particular part of the implementation; namely, the problem of storage allocation for arrays.