CLEOPATRA CODE GENERATOR USER'S GUIDE

by John David Halbur

January 1976

Report No. UIUCDCS-R-76-740

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

TABLE OF CONTENTS

1. THE LANGUAGE IMPLEMENTED
   1.1 Introduction to This Report
   1.2 Summary of Major Restrictions
   1.3 Identifiers and Constants
   1.4 Configurations
   1.5 Types
   1.6 Blocks
   1.7 Expressions
   1.8 Statements
   1.9 Implemented Keywords
   1.10 System Supplied Operators
   1.11 Job Control Language
   1.12 Subroutine Linkage
   1.13 Error Messages
2. INPUT Specifications
   2.1 SYMBOL Table Entries
   2.2 Form of the Intermediate Text
   2.3 Intermediate Text Transmission
LIST OF REFERENCES
APPENDIX A - SPECIFICATIONS FOR THE INTERMEDIATE TEXT
APPENDIX B - SUMMARY OF PRODUCTIONS IMPLEMENTED

1. THE LANGUAGE IMPLEMENTED

1.1 Introduction to This Report

This report consists of two parts. The first part details the language accepted by the CLEOPATRA code generator and is intended for use by CLEOPATRA programmers using the compiler. The second part describes what information should be in the symbol table and is intended for use by a compiler writer interfacing with this code generator.

Part 1 is intended as a companion to Report UIUCDCS-R-74-646 CLEOPATRA [2], which gives the specifications for the entire language. The terminology and notation used here follow those of the earlier report. A knowledge of the information in Report 646 is essential to understanding most of what is described here.

1.2 Summary of Major Restrictions

This section describes in general terms some of the major restrictions of this implementation. More details appear in later sections, classified by language feature.
The six major control structures are implemented. These control structures are: local and global data blocks, local and global structure blocks, and procedure and operator routine blocks. The system-supplied types are BIT, INTEGER, LONG_INTEGER, POINTER, and CHARACTER. User-defined types are allowed with some restrictions. REAL and DECIMAL types are not implemented.

There are three major restrictions on user-defined types. The first is that they may not be nested: a user-defined type may not contain another user-defined type. The second is that arrays are not allowed as part of a user-defined type, although arrays of user-defined types are permitted. The last restriction is that identifiers in a user-defined type data block may not have the attributes DEFER, CONSTANT, or SHARE.

Only one storage scheme is implemented for arrays. The keywords RIGHT and VECTOR may be used where appropriate. Cross-sectioning of arrays is not implemented, and all formal parameter arrays must be declared with *:* bounds. Arrays may not have variable bounds and character identifiers may not have variable lengths.

The CONSTANT attribute is implemented for the basic types, but only for scalars. The INIT attribute is implemented for scalar variables only. Deferred storage is implemented, but part of a user-defined type may not be deferred.

Procedures and operators can return only the five basic types, and only as scalars. No implied type conversion is allowed anywhere in this implementation.

All of the statements in the language are implemented with a few restrictions, but the features oriented toward operating systems are not implemented (these include the material in Chapters 8 and 9 of Report 646). Data-groups are not implemented.

1.3 Identifiers and Constants

This implementation is based on the assumption that the source program will be on punched cards. However, the actual source is transparent to the code generator, and some limitations on syntax placed by the analysis phase may not be included here.
All keywords are reserved. A list of keywords appears in Section 1.9. The end-of-source-record token that delimits a comment beginning with an exclamation point is the boundary of a card.

Identifiers must be 31 characters or fewer. For configuration names the maximum length is seven because they are used as external names. If a configuration name is longer than seven characters, the first seven will be used, which could cause ambiguity.

Constants may be integer or long-integer decimal, bit, or literal string. Octal is not implemented, but decimal, hexadecimal, and binary representations of integers are allowed. The minimum and maximum values for integers are given in Table 1.3.1. Literal character strings may be at most 256 characters long.

                 INTEGER                  LONG_INTEGER
            MAXIMUM   MINIMUM      MAXIMUM          MINIMUM
  HEX       X.7FFF    X.8000       F.X.7FFFFFFF     F.X.80000000
  DECIMAL   32767     -32768       F.2147483647     F.-2147483648

Table 1.3.1 Range of Integer Values

The system-supplied values are TRUE and FALSE for bit values, NIL for any basic type, LARGE and SMALL for integer types, and FIRST and LAST for character strings. The values assigned to these names are given in Table 1.3.2. FIRST and LAST are character strings of length one only. They are shown in the table as their hexadecimal EBCDIC representation. The total number of identifiers and literals for one compilation is limited to 1500.

            INTEGER   LONG_INTEGER     BIT    CHARACTER   POINTER
  LARGE     32767     F.2147483647
  SMALL     -32768    F.-2147483648
  NIL                 F.0              S.0    C.
  TRUE                                 S.1
  FALSE                                S.0
  FIRST                                       X.00
  LAST                                        X.FF

Table 1.3.2 System-Supplied Values

1.4 Configurations

Configuration names may be at most seven characters. The same name may not be used as a configuration name twice in the same program. These names are used as external names. They can have aliases, but only the original name is external. This means that alias names may be longer than seven characters and may be redefined, or used in case the original name is redefined in another context.
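The seven-character rule for configuration names can be checked before a program is submitted. The following Python sketch is only an illustration and is not part of the code generator; the function names are hypothetical, and it assumes that truncation simply keeps the first seven characters, as stated above.

```python
# Illustrative check only; not part of the CLEOPATRA code generator.
MAX_EXTERNAL_NAME = 7  # limit stated in Sections 1.3 and 1.4


def external_names(config_names):
    """Group configuration names by the external name they would produce.

    Assumes the external name is the first seven characters of the name.
    """
    groups = {}
    for name in config_names:
        groups.setdefault(name[:MAX_EXTERNAL_NAME], []).append(name)
    return groups


def ambiguous(config_names):
    """Return only the external names shared by two or more distinct names."""
    return {
        ext: names
        for ext, names in external_names(config_names).items()
        if len(set(names)) > 1
    }


if __name__ == "__main__":
    # SCANNER1 and SCANNER2 both truncate to SCANNER.
    print(ambiguous(["SCANNER1", "SCANNER2", "PARSE"]))
```

Giving the longer names as aliases of distinct seven-character originals avoids the clash, since only the original name is external.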
One or two control sections will be set up for each configuration. The first is used for the program code and the second is used to store local static information such as CONSTANTS and subroutine addresses. The second may not be present if there are none of these.

Blocks should be presented for compilation grouped together by configuration. A routine block must immediately follow its corresponding data and structure blocks. Also, a type data block must appear before an instance of that type is defined in a data block. Otherwise, the standard define-before-use rule applies.

A block in this implementation may not be compiled separately from other blocks in the same configuration. A configuration may be compiled separately if

1) it uses no global data, nor does it have a global data block. It therefore communicates with other configurations only through parameters;

2) it calls upon other configurations only if they are defined in this configuration's structure block; and

3) the routine block and data block contain no literals or constants needed at run-time except for the system-supplied constants.

Item 3 can be overcome by defining all literals and constants with the CONSTANT attribute in the local data block.

An environment request may be associated only with the configuration at the root node of the configuration tree for one compilation. It is of the form "COMPILE INTO configuration-reference" and tells the compiler what nesting level the current compilation is to begin with. It need not be present for a main program.

Recursion is implemented for routine blocks. Parameters may not be specified for type definitions. The BUILT_IN attribute is not implemented and neither is SHARE.

1.5 Types

The basic types implemented are INTEGER, LONG_INTEGER, BIT, CHARACTER, and POINTER. Their capacities are the same as those given for their corresponding constants in Section 1.3. Identifiers of type CHARACTER must specify a maximum length.
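Since no implied conversion is performed, it can be useful to confirm by hand that a value fits the capacity of the type it is declared with. The Python sketch below merely restates the limits of Table 1.3.1; the names `CAPACITY` and `fits` are hypothetical and are not part of the compiler.

```python
# Illustrative only: capacities copied from Table 1.3.1.
CAPACITY = {
    "INTEGER": (-32768, 32767),                 # X.8000 .. X.7FFF
    "LONG_INTEGER": (-2147483648, 2147483647),  # F.X.80000000 .. F.X.7FFFFFFF
}


def fits(type_name, value):
    """Return True if value lies within the capacity of the named type."""
    low, high = CAPACITY[type_name]
    return low <= value <= high


if __name__ == "__main__":
    print(fits("INTEGER", 32767))        # True
    print(fits("INTEGER", 32768))        # False
    print(fits("LONG_INTEGER", 32768))   # True
```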
POINTER variables must point to either a basic type or a user-defined type.

Although conversion between types is not automatic, some degree of conversion between INTEGER and LONG_INTEGER is allowed. In expressions they may be mixed without loss of precision for LONG_INTEGER values. Care must be taken, however, so that the correct type is supplied for parameters and operands of user-defined operators. Mixture in this case will yield a LONG_INTEGER result.

Pointer variables can have as a value either NIL or the address of an allocation of a deferred identifier. Storage is allocated through an ALLOCATE statement. An object can be freed only through the execution of a RELEASE statement. Loss of all pointers to an object results in garbage. Upon release of an object, the pointer used to reference it is unchanged. Copies of pointers to the same object are also unchanged. This implementation places all of the responsibility for management of deferred storage on the programmer.

User-defined types are implemented with several rules limiting their capabilities. They may not contain arrays, other user-defined types, or elements with the attributes DEFER, CONSTANT, SHARE, or INIT. Data groups are not implemented, but user-defined types can be used to aggregate the basic types.

1.6 Blocks

Structure blocks describe calling conventions for other configurations. It is possible in CLEOPATRA to specify whether or not a parameter can be changed by its subroutine. This is enforced by the analysis phase of the compiler. All parameters are passed by reference. This means that expressions and constants are passed by value, and identifiers and array references are passed by address. Keyword parameters are not implemented and positional parameters may not be omitted. Implied conversions are not allowed for parameters. Arrays and user-defined types may not be returned by a routine block.

Data blocks define identifiers used by routine blocks.
Some possible attributes are not implemented and others are restricted. Data-groups are not implemented. For arrays, only the storage scheme RIGHT, for row-major storage, is implemented. RIGHT and VECTOR are permissible attributes. BIT arrays are not implemented, but arrays of the four other basic types are implemented. Arrays may not be initialized. This implies that the attribute CONSTANT cannot apply to arrays.

The INIT attribute applies to scalar variables of all the basic types. Bit variables should be initialized only with TRUE and FALSE, not S.1 and S.0. POINTER variables may be initialized only to NIL. Expressions are not allowed by the INIT attribute except for those that can be evaluated at compile time.

The attributes ALIGNED and COMPACT may not be used. All identifiers are ALIGNED. INTEGER identifiers are aligned on half-word boundaries, as are CHARACTER identifiers. LONG_INTEGER and POINTER variables are aligned on full-word boundaries. Bit variables are aligned on byte boundaries unless they are part of a user-defined type. This is to allow the user to define a type bit-string.

Type data blocks define the structure of user-defined types. Identifiers within a type data block cannot have the attributes DEFER, CONSTANT, SHARE, or INIT, and are subject to all other restrictions stated above for regular identifiers.

1.7 Expressions

Most of Sections 6.4 and 6.5 of Report 646 do not apply to this implementation. The COPY attribute is not implemented and neither is cross-sectioning of arrays. An array bound must be an integer constant or a constant expression. The lower bound is assumed to be one if not given. Formal parameter arrays should be defined with bounds of *:*. The compiler-supplied functions LBOUND and HBOUND are implemented. Their operands must be an unqualified array name and an integer constant.

1.8 Statements

The statements of the language are, for the most part, implemented in their entirety.
The exceptions are RETURN, ALLOCATE, and DECISION. The only restriction on the RETURN statement concerns what can be returned. Only scalar, system-supplied types may be returned. The ALLOCATE statement cannot have parameters or an INIT attribute. Deferred identifiers cannot have the INIT attribute in their data blocks either. The restriction on the DECISION statement is that no more than 64 switches may be associated with one expression in the decision part of the statement.

1.9 Implemented Keywords

Table 1.9.1 lists all keywords used in this implementation along with their symbol table row numbers. All keywords are reserved words.

  Row#  Keyword          Row#  Keyword
    1   ACTION             2   ALLOCATE
    3   BEGIN              4   DECISION
    5   DOWNTO             6   ELSE
    7   END                8   EXIT
    9   FOR               10   FROM
   11   IF                12   ITERATE
   13   NIL               14   OPERATOR
   15   PROCEDURE         16   RELEASE
   17   RETURN            18   STEP
   19   THEN              20   WHEN
   21   UPTO              22   WHILE
   23   ADDRESS           24   ALIAS
   25   B                 26   BIT
   27   BUILT             28   BY
   29   C                 30   CHARACTER
   31   COMMENT           32   COMPILE
   33   CONSTANT          34   DATA
   35   DEFER             36   EXTENTS
   37   F                 38   GLOBAL
   39   IN                40   INIT
   41   INTEGER           42   INTO
   43   LONG_INTEGER      44   POINTER
   45   RETURNS           46   RIGHT
   47   S                 48   TO
   49   VECTOR            50   TYPE
   51   X                 52   STRUCTURE
   53   FALSE             54   FIRST
   55   LARGE             56   LAST
   57   SMALL             58   TRUE

Table 1.9.1 Keywords

1.10 System Supplied Operators

Table 1.10.1 lists all system-supplied operators. Most of these are explained in Report 646. They are briefly described here and any differences from Report 646 are noted.

  -           unary minus
  ABS         absolute value
  +           binary addition
  -           subtraction
  *           multiplication
  //          division
  **          exponentiation
  MOD         modular arithmetic
  AND         logical and
  OR          logical or
  ¬           logical not
  ==          equal
  ¬=          not equal
  >           greater than
  <           less than
  >= or ¬<    greater than or equal to
  <= or ¬>    less than or equal to
  :=          assignment
  LENGTH      character string length
  ->  <-      substring
  "->         index
  ?->         verify
  "           concatenation

Table 1.10.1 System Supplied Operators

Unary arithmetic operators:

  -      negation
  ABS    absolute value

Character unary operator:

  LENGTH

LENGTH can have as its operand only a character variable and not an expression.

Assignment:

  :=

This is implemented for all basic types.

Comparison operators:

  <      less than
  >      greater than
  ==     equal to
  ¬=     not equal to
  >=     greater than or equal to
  <=     less than or equal to
  ¬<     not less than
  ¬>     not greater than

These return a bit value and are implemented for INTEGER, LONG_INTEGER and CHARACTER. In addition, for BIT and POINTER types only == and ¬= are implemented.

Integer arithmetic operators:

  +  -  *  **  //  MOD

The // operator is truncated division. A negative exponent for the ** operator results in zero.

String selection:

  ->  <-

Character interrogation:

  "->    INDEX
  ?->    VERIFY

Character concatenation:

  "

Logical operators for BIT and switches:

  AND  OR
  ¬      unary not

Two operators are provided that act as conversions from CHARACTER to LONG_INTEGER and vice versa. They are unary operators. Their names are CHAR and L_INT respectively.

The operators VERIFY and CHAR, and the comparison operators for character strings, are performed at run-time by a subroutine call.

For convenience, two I/O procedures are provided. One, named INPUT, reads one record of eighty characters and returns that string via its single parameter. Its definition should appear in any program that wishes to reference it. That definition is

  PROCEDURE INPUT (CHARACTER BY ADDRESS). RETURNS LONG_INTEGER;

It returns the length of the record. It reads from a file named SYSIN.

The second procedure accepts a character string and prints a variable-length record. It accepts strings up to length 133. The user may give a carriage control at the beginning of the string, but must specify this through the JCL. The string to be printed should be passed as the parameter. The length of the record is returned. The file used is named SYSPRINT.
The definition of the routine must appear in the program if it is used. A sample is:

  PROCEDURE OUTPUT (CHARACTER). RETURNS LONG_INTEGER;

1.11 Job Control Language

There are several parameters that may be passed to the code generator. The LIST/NOLIST option specifies whether or not a listing of the code generated is to be printed. Likewise the MAP/NOMAP option specifies whether or not a memory map of constants and automatically allocated storage is to be printed. The DECK option specifies that an object deck is to be punched and the OBJECT parameter specifies that the object program should be passed to the next step for execution. These two are mutually exclusive; both cannot be performed. The CARDS option specifies that the code generator's input appears on punched cards. This can be used for debugging the code generator or the analysis phase. The corresponding option is DISK, which specifies that the input is in a disk temporary data set. The default options are LIST, MAP, OBJECT, and DISK. If both of a pair of mutually exclusive parameters are present, the second is chosen.

The files needed by the code generator are STEPLIB to specify to the operating system where the program resides, SYSPRINT for printer output of the code and memory map, SYSIN for input, SYSLIN if the OBJECT parameter is in effect, and SYSPUNCH if the DECK parameter is specified. The job control language needed to generate code and execute that code is specified below in Table 1.11.1.

For execution the SYSLIB DD statement must include the data set described in Table 1.11.1. It contains compiler-supplied subroutines. It also contains the control section that forms the entry point of any CLEOPATRA environment of the stack and storage management.

  //CODE     EXEC  PGM=CLEOCDE,PARM='CARDS'
  //STEPLIB  DD    DSN=USER.P6416.CLEOCODE,DISP=SHR
  //SYSLIN   DD    DSN=&&LINKFILE,DISP=(NEW,PASS),
  //               SPACE=(TRK,(10,1)),UNIT=DISK
  //SYSPRINT DD    SYSOUT=A
  //SYSPUNCH DD    SYSOUT=B
  //SYSIN    DD    *
  //GO       EXEC  PGM=LOADER,PARM='EP=CLEOMAIN'
  //SYSLIB   DD    DSN=USER.P6416.CLEOLIB,DISP=SHR
  //SYSPRINT DD    SYSOUT=A
  //SYSUDUMP DD    SYSOUT=A
  //SYSLIN   DD    DSN=&&LINKFILE,DISP=(OLD,DELETE)
  //SYSLOUT  DD    SYSOUT=A
  //SYSIN    DD

Table 1.11.1 Suggested JCL To Generate Code and Execute CLEOPATRA

1.12 Subroutine Linkage

If it is necessary to write routines in assembler language to interface with CLEOPATRA, certain calling conventions should be followed. Parameters are passed through a pointer in register 1 to a list of addresses of parameters. Integer and pointer values are returned in a register, and character and bit values are returned through the last entry in the parameter address vector. All registers should be stored in the save area pointed to by register 13. Register 14 contains the return address.

To comply exactly with the CLEOPATRA linkage convention use the code in Table 1.12.1. Remember that the entry point to all CLEOPATRA programs must be the program named CLEOMAIN. The name of the main program is always changed to CLEOSTRT to begin the user's program, and a control section named CLEOSTRT contains all literals and must be present.

At offset eight from the entry point should be the address of the static area associated with this routine, if any. It is an external address. At offset 12 is a half-word containing the amount of automatically allocated storage needed for this routine. It is always a minimum of eighteen words for the standard save area. The next half-word, at offset 14, contains the length of the save area and data areas. The difference between the total area and this area is the length of the temporary stack pointed to by register 12. If not enough room is available in currently allocated memory, a memory management routine is called. For specialized user assembler subroutines some or most of this may be omitted.
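The three fields that follow the entry point can be pictured with a short Python sketch. It is an illustration only and not part of the code generator: the function name is hypothetical, and the sketch assumes that the address at offset 8 occupies a four-byte big-endian word and that both half-words are unsigned counts in the same unit.

```python
import struct


def parse_routine_header(entry: bytes) -> dict:
    """Decode the fields at offsets 8, 12 and 14 from a routine entry point.

    Assumed layout (big-endian, as on System/360):
      offset  8: 4-byte address of the static area
      offset 12: half-word, total automatically allocated storage
      offset 14: half-word, length of the save area and data areas
    """
    if len(entry) < 16:
        raise ValueError("need at least 16 bytes from the entry point")
    static_address, total_storage, save_and_data = struct.unpack_from(">IHH", entry, 8)
    return {
        "static_address": static_address,
        "total_storage": total_storage,
        "save_and_data": save_and_data,
        # The remainder is the temporary stack addressed through register 12.
        "temporary_stack": total_storage - save_and_data,
    }


if __name__ == "__main__":
    # Eight bytes of entry code (contents irrelevant here), then the fields.
    header = bytes(8) + struct.pack(">IHH", 0x2000, 200, 120)
    print(parse_routine_header(header)["temporary_stack"])  # 80
```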
Subroutine Entry

           USING *,15
           STM   14,12,12(13)
           BC    15,16(0,15)
           DC    V(STATIC)
           DC    X'10001000'
           ST    13,8(0,13)
           L     11,8(0,15)
           LH    12,14(0,15)
           LH    2,12(0,15)
           AL    2,72(0,13)
           C     2,76(0,13)
           BC    11,70(0,15)
           L     4,72(0,13)
           ST    13,4(0,4)
           ST    2,72(0,4)
           L     2,76(0,13)
           ST    2,76(0,4)
           LR BC LR LR LH L          13 15 2 10 15
           BALR 14 LR LR 1 AR 12 BALR 2 USING *
           88( 0,15) 12( 0,15) 0( 0, 3) 15 2 10 13 2

Subroutine Return

           L     13,4(0,13)
           L     14,12(0,13)
           LM    1,12,24(13)
           BCR   15,14

Table 1.12.1 Subroutine Linkage

1.13 Error Messages

The input to the code generator is supposed to be free of syntax errors, since the analysis phase should have corrected or deleted them. The code generator does generate a few error messages. If an illegal token is passed in the intermediate text, a message will be printed and the code generator will stop. It will also stop if the symbol table or constant table overflows.

The run-time temporary stack cannot exceed 4096 bytes, nor can the static control section connected to a routine control section. A message will be printed in this case, but code generation will continue because the code generated may still be correct. To correct an error caused by this overflow, split the subroutine in question and try again.

A message will be printed if a compiler parameter is illegal, but code generation will continue. A statement might be too long for the code generator to handle. In this case code generation will terminate.

2. INPUT Specifications

2.1 SYMBOL Table Entries

Described here are the requirements for all possible entries in the symbol table.

All items have a type. The TYPE field in the table should be set to the binary encoding of the following assignments:

  Long integer   1
  Integer       15
  Pointer        7
  Character      2
  Bit            8
  User           9
  Label         13

The type field should be set to "user" only for identifiers which have a type defined elsewhere by the user. For local data the LOCAL bit should be set, and for identifiers with the DEFER attribute the DEFER bit should be set to true.
All formal parameters should have the bit FORMAL set to true in their row.

For numeric and character literals the LITERAL bit should be set. For INTEGER and BIT values the actual value should appear in ATR2. LONG_INTEGER values should be divided and stored in ATR1 and ATR2. Character literals should have their lengths stored in ATR1 and their value should be stored in the Constant Table. ATR2 should point to the first character of this entry.

CONSTANT storage follows the same pattern except with the CONSTANT bit set. The value to which the constant is initialized should not appear in a row by itself unless it is used elsewhere in the program.

A POINTER identifier should include in the PTR field the bit pattern of the type it points to. If this type is user-defined then ATR1 should point to the row represented by the declaration of the type.

Items that are part of a user-defined type (i.e., they appear in a type data block) must have the IN_TYPE bit set. In addition, the BLKLEVEL field of the first element in the type should contain the pointer to the row represented by the definition of the type. An identifier of this type should have its ATR2 entry also point to the row represented by the type name.

Identifiers that are initialized, except for CONSTANTS, should have the INITIAL bit set. The ATR2 entry should then point to the row of the literal to which the identifier is to be initialized.

For arrays the ARRAY bit should be set and the ATR2 entry should point to the constant table where, in entries of five characters each, should first appear the number of extents, followed by the lower and upper bound for extent one, and so on with the lower and upper bounds for the remaining extents.

For an alias name the ALIAS bit should be set. The ATR2 entry should point to the symbol table row for the original name.

In all of the above cases, for type CHARACTER the ATR1 entry should contain the maximum length, if known.
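The constant-table layout for array extents can be sketched as follows. This Python fragment is an illustration, not the analysis phase's actual routine: the function name is hypothetical, and right-justification with blank padding inside each five-character entry is an assumption, since the text above states only the entry width and the order of the entries.

```python
ENTRY_WIDTH = 5  # "entries of five characters each"


def encode_extents(bounds):
    """Build the constant-table string for an array.

    bounds is a list of (lower, upper) pairs, one per extent. The result
    holds the number of extents first, then each lower and upper bound.
    Right-justified, blank-padded entries are an assumption.
    """
    values = [len(bounds)]
    for lower, upper in bounds:
        values.extend((lower, upper))
    entries = []
    for value in values:
        text = str(value)
        if len(text) > ENTRY_WIDTH:
            raise ValueError(f"{value} does not fit in {ENTRY_WIDTH} characters")
        entries.append(text.rjust(ENTRY_WIDTH))
    return "".join(entries)


if __name__ == "__main__":
    # A two-extent array with bounds 1:10 and 0:4.
    print(repr(encode_extents([(1, 10), (0, 4)])))
```

An array with n extents therefore occupies 5 * (2n + 1) characters of the constant table.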
Configuration names and operator links require two adjacent symbol table rows. The LINK bit should be set for these. For a procedure name the ENTRY bits should be set to '01'B. Unary operators should have an ENTRY value of '10'B and binary operators should have '11'B.

In the first row the BLK# entry should be the block in which the identifier is defined (i.e., the block number of the configuration in whose structure block this name is defined). The BLKLEVEL entry should contain the nesting level of the configuration represented by that name. The ATR1 entry should contain its actual block number and the ATR2 entry should contain a pointer to the position in the constant table where the character string name of the configuration appears.

In the second row the ATR1 entry should contain the number of parameters to the configuration, and ATR2 is used to point to a user-defined type if the configuration returns the type POINTER to a user-defined type.

The BLK# entry always contains the block number of the block in which the identifier is defined. The BLKLEVEL entry should contain the nesting level of the configuration in which it is defined, except in the cases mentioned above. Block numbers and block levels always start at 1. A diagram of the symbol table declaration appears in Table 2.1.1.
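As a rough picture of the attribute bits declared in Table 2.1.1, the Python sketch below packs them into a string of thirty-two 0 and 1 characters, the form in which Section 2.3 says they appear on cards. It is not the analysis phase's code. The function name is hypothetical, the left-to-right field order follows the declaration, and the 11-bit width of UNUSED is an assumption inferred from the thirty-two-bit total.

```python
# Field order and widths follow Table 2.1.1; the UNUSED width is assumed.
FIELDS = [
    ("TYPE", 4), ("CONSTANT", 1), ("ARRAY", 1), ("LITERAL", 1),
    ("INITIAL", 1), ("LOCAL", 1), ("DEFER", 1), ("FORMAL", 1),
    ("ALIAS", 1), ("IN_TYPE", 1), ("LINK", 1), ("ENTRY", 2),
    ("PTR", 4), ("APTR", 1), ("UNUSED", 11),
]


def pack_attributes(**values):
    """Return the attribute bits of one row as a 32-character bit string."""
    known = {name for name, _ in FIELDS}
    unknown = set(values) - known
    if unknown:
        raise KeyError(f"unknown attribute(s): {sorted(unknown)}")
    bits = []
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        bits.append(format(value, f"0{width}b"))
    return "".join(bits)


if __name__ == "__main__":
    # An INTEGER (type code 15) formal parameter.
    print(pack_attributes(TYPE=15, FORMAL=1))
    # The dummy end-of-table record: all 1 bits in UNUSED.
    print(pack_attributes(UNUSED=(1 << 11) - 1))
```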
  DECLARE 1 SYMTAB(95:1594),
            2 ATTRIBUTES,
              3 TYPE      BIT(4),     /* Type of an identifier   */
              3 CONSTANT  BIT(1),     /* CONSTANT attribute      */
              3 ARRAY     BIT(1),     /* Set if an array         */
              3 LITERAL   BIT(1),     /* CHAR or INT constant    */
              3 INITIAL   BIT(1),     /* Set if initialized      */
              3 LOCAL     BIT(1),     /* if local data           */
              3 DEFER     BIT(1),     /* DEFER attribute         */
              3 FORMAL    BIT(1),     /* formal parameter        */
              3 ALIAS     BIT(1),     /* if alias name           */
              3 IN_TYPE   BIT(1),     /* if part of user type    */
              3 LINK      BIT(1),     /* link item               */
              3 ENTRY     BIT(2),     /* three types of entry    */
              3 PTR       BIT(4),     /* type pointed to         */
              3 APTR      BIT(1),     /* if PTR points to ARRAY  */
              3 UNUSED    BIT(11),
            2 BLK#      FIXED BIN,
            2 BLKLEVEL  FIXED BIN,
            2 ATR1      FIXED BIN,
            2 ATR2      FIXED BIN;

Table 2.1.1 The Symbol Table

Global structure blocks appear in configurations that describe user-defined types. The BLK# entry for all configuration names found in a global structure block should correspond to the block number of the configuration in which the type is defined. The BLKLEVEL entry should be one greater than the block level of the one in which the type is defined, and not two greater.

Symbol table rows corresponding to the definition of a user-defined type should have a BLK# entry of zero. User-defined type names do not have a limit of 7 characters and may be up to 31 characters long.

2.2 Form of the Intermediate Text

The intermediate text is described in detail elsewhere [1]. Appendix A includes the specifications for the text. It is mainly an array of symbol table pointers. Expressions are in a hybrid postfix form, with array references and procedure calls in prefix form.

All expressions must be ended with an ending symbol such as a semi-colon, and all statements must end in a semi-colon, even if they do not appear in the source program. Postfix operators must be preceded in the text by first their right operand and secondly by their left operand. This is because CLEOPATRA expressions are evaluated right to left.
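The operand order required for postfix operators can be illustrated with a small Python sketch. It is not the analysis phase's algorithm: the function name and the tuple representation of an expression are hypothetical, strings stand in for symbol table pointers, and only binary operators are handled (array references, procedure calls, and the pointers needed by logical expressions are left out).

```python
def to_postfix(node):
    """Emit tokens with the right operand first, then the left, then the operator.

    node is either a string (an operand) or a tuple (operator, left, right).
    Strings stand in for symbol table pointers in this sketch.
    """
    if isinstance(node, str):
        return [node]
    operator, left, right = node
    return to_postfix(right) + to_postfix(left) + [operator]


if __name__ == "__main__":
    # A + B * C, with B * C as the right operand of +.
    tokens = to_postfix(("+", "A", ("*", "B", "C")))
    print(" ".join(tokens + [";"]))  # C B * A + ;
```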
Logical expressions must include pointers to other entries in the intermediate text. These point to AND and OR operators that precede the corresponding logical sub-expression. The values are actually the negative of the intermediate text position.

2.3 Intermediate Text Transmission

The intermediate text should be in a data set of variable-length records if the DISK compiler option is specified. Records are of three types: symbol table rows, constant table segments, and intermediate text statements. Records should appear in the following order for a single configuration: first the symbol table rows corresponding to that configuration, then constant table entries, and last the intermediate text. This should be repeated for every configuration. Configurations defining a user-defined type do not have a routine block, and so their symbol table entries should be transmitted along with the information for another type of configuration.

Symbol table rows should be written one row as one logical record. Every new row since the last transmission of symbol table rows should be sent in one group. To mark the end of symbol table records a dummy record should be written, containing all 1 bits in the UNUSED entry of the record.

Following this should be parts of the constant table. These should be variable-length character strings of maximum length 100. The code generator will try to read these records until it finds one of length less than 100.

Next it will try to read the intermediate text. This should be transmitted one statement at a time in an array of 50 elements. A statement that requires more than fifty entries will be too long, and a compiler error message should be generated in the analysis phase. The end of a statement is a semi-colon. Compound statements are broken up. The end of an IF statement for this purpose is its corresponding THEN. The end of a DECISION statement is the word ACTION. ELSE is transmitted along with the statement that follows it.
BEGIN is transmitted separately. The end of an ITERATE statement follows the semi-colon after ITERATE or the semi-colon after the WHILE expression if one exists. The code generator will stop reading the text when the logical end of the program is found. An example is diagrammed in Figure 2.2.1.

  Symbol table rows 95 - 120
  CONSTANT TABLE POS 1 - 50
  IT   PROC stmt
       Assign stmt
       Return stmt
       END
  Symbol table rows 121 - 130
  CONSTANT TABLE POS 51 - 150
  CONSTANT TABLE POS 151 - 160
  IT   PROC stmt
       Iterate stmt
       Return stmt
       END
       END

Figure 2.2.1 Text Transmission

If the CARDS parameter is in effect the format of the records is different. Symbol table rows should appear as one per card. The thirty-two bits should appear in the first thirty-two columns and the four other entries in the symbol table should be in the next twenty columns, five columns each. The constant table must be on one card per configuration. Following it should be the intermediate text in GET LIST format. An entry of 99999 marks the end of a statement and stops reading.

LIST OF REFERENCES

[1] Halbur, John D., "A Code Generator for the CLEOPATRA Language", Report UIUCDCS-R-75-739, Department of Computer Science, University of Illinois, June 1975.

[2] Schreiner, Axel T., "CLEOPATRA Comprehensive Language for Elegant Operating System and Translator Design", Report UIUCDCS-R-74-646, Department of Computer Science, University of Illinois, May 1974.

[3] Schreiner, Axel T., "CLEOPATRA A Proposal for Another System Implementation Language", Report UIUCDCS-R-74-654, Department of Computer Science, University of Illinois, June 1974.

APPENDIX A - SPECIFICATIONS FOR THE INTERMEDIATE TEXT

The form of the BNF used here is the same as that used by Schreiner [2]. Upper case symbols and punctuation are terminals and represent their symbol table row pointers. Lower case designates non-terminals. Other notation used is:

  |       indicates a choice
  { }     are used to combine a number of choices
  [ ]     the enclosed entity may appear or be omitted
  [ ]•    the enclosed entity may appear zero or more times
  { }•    the enclosed entity must appear one or more times

  intermediate_text  ::= configuration_name routine_type statements END
  routine_type       ::= proc | op
  proc               ::= PROCEDURE configuration_name [parms] ;
  op                 ::= OPERATOR [left] name [parms] right ;
  name               ::= identifier
  parms              ::= identifier | identifier parms
  left               ::= identifier
  right              ::= identifier
  statements         ::= statement | statement statements
  statement          ::= stmt ; | lstmt
  stmt               ::= expression
                       | RETURN expression
                       | ALLOCATE identifier FOR identifier
                       | RELEASE expression
                       | EXIT [label]
                       | NIL
                       | IF expression THEN statement ELSE statement

  lstmt              ::= label : cstmt label; cstmt ;
  cstmt              ::= begin | decision | iterate
  begin              ::= BEGIN statements END
  decision           ::= DECISION decisions ACTION actions [ELSE statement] END
  decisions          ::= {switches VECTOR constant constant : expression ; identifier : expression ; }•
  actions            ::= {expression : statement}•
  switches           ::= identifier | identifier switches
  iterate            ::= [FOR identifier [FROM expression ;]
                         [{DOWNTO | UPTO} expression ;] [STEP expression ;]]
                         ITERATE { ; | WHILE expression ; }
                         statements [WHEN expression ;] END

  expression         ::= operand [operand] operator NOT expression ) ' and expression ■ expression operand [operand] operator itptr expression
  operand            ::= identifier | array | proc_call
  array              ::= identifier ( expression [ , expression]• )
  proc_call          ::= configuration_name ( expression [ , expression]• )
  operator           ::= system supplied | op_call
  op_call            ::= name ( expression [ , expression]• )
  configuration_name ::= identifier
  itptr              ::= intermediate text pointer
  identifier         ::= symbol table pointer to a user-defined symbol
  constant           ::= symbol table pointer to a literal constant

APPENDIX B - SUMMARY OF PRODUCTIONS IMPLEMENTED

Report 74-646 #'s

  (1.1)*  letter ::= A | B | C | ... | Z
  (1.2)   delimiting-character ::= ; | , | ( | ) | . | : | ' | blank-character
  (1.3)   special-character ::= * | - | + | @ | # | $ | % | & | = | " | / | > | <
  (1.4)   digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  (1.5)   control-character ::= backspace | end-of-source-record
  (1.6)   comment ::= COMMENT any sequence of characters with the exception of a semicolon ;
  (1.7)           ::= ! any sequence of characters with the exception of an end of source record end-of-source-record

  * modified from Report UIUCDCS-R-74-646

  (2.1)   identifier ::= letter [letter | digit]•
  (2.2)   operator ::= {special-character}• | identifier
  (2.3)   constant ::= integer | long-integer | bit | literal-value
  (2.4)   decimal-digit ::= digit
  (2.5)   hexadecimal-digit ::= digit | A | B | C | D | E | F
  (2.7)   binary-digit ::= 0 | 1
  (2.8)   minus-symbol ::= -
  (2.9)   decimal-string ::= [minus-symbol] {decimal-digit}•
  (2.10)  hexadecimal-string ::= X. [minus-symbol] {hexadecimal-digit}•
  (2.12)  binary-string ::= B. [minus-symbol] {binary-digit}•
  (2.13)* basic-constant ::= decimal-string | hexadecimal-string | binary-string
  (2.14)  integer ::= basic-constant
  (2.15)  long-integer ::= F. integer
  (2.17)  bit ::= S. integer
  (2.28)  literal-value ::= C. any sequence of characters up to and not including the first following blank character
  (2.29)  system-supplied-value ::= type {LARGE | NIL | SMALL}
  (2.30)* system-supplied-constant ::= FALSE | FIRST | LAST | TRUE

  (3.1)   configuration ::= type-pack | algorithm
  (3.2)   type-pack ::= global-structure-block [structure-block] [data-block] type-data-block
  (3.3)   algorithm ::= [structure-block] [global-data-block] [data-block] routine-block
  (3.4)   compilation ::= [environment-request] {structure-block | global-structure-block | data-block | global-data-block | type-data-block | routine-block}•
  (3.5)   environment-request ::= COMPILE INTO configuration-reference
  (3.6)   configuration-reference ::= configuration-name [. configuration-name]

  (4.1)*  basic-ref-type ::= INTEGER | LONG_INTEGER | BIT | CHARACTER | POINTER (ref-type)
  (4.2)*  basic-type ::= INTEGER | LONG_INTEGER | BIT | CHARACTER [
(expression)] | POINTER (type)
    (4.4)*  ref-type ::= {basic-ref-type | type-name} [integer EXTENTS] [ALIAS identifier]
    (4.5)*  type ::= {basic-type | type-name} [array] [ALIAS identifier]

    (5.1)   structure-block ::= STRUCTURE configuration-name { ; link-item}• [;] END configuration-name [;]
    (5.2)   configuration-name ::= procedure-name | type-name | operator-link
    (5.3)*  link-item ::= TYPE type-name [ALIAS identifier] | global-link-item
    (5.4)*  global-link-item ::= PROCEDURE procedure-name [ALIAS identifier] [ref-type-list .] RETURNS basic-type
    (5.5)*                   ::= operator-link : OPERATOR [left-ref-types] operator [ALIAS identifier] right-ref-types RETURNS basic-type
    (5.10)  ref-type-list ::= (type-formal [, type-formal]•)
    (5.11)* type-formal ::= ref-type [BY ADDRESS]
    (5.12)  left-ref-types ::= ref-type [BY ADDRESS : [ref-type-list] . ref-type-list]
    (5.13)  right-ref-types ::= ref-type : ref-type BY ADDRESS ref-type-list { . ref-type : ref-type BY ADDRESS}
    (5.14)  global-structure-block ::= GLOBAL STRUCTURE type-name { ; global-link-item}• [;] END type-name [;]
    (5.15)  type-name ::= identifier
    (5.16)  routine-block ::= PROCEDURE procedure-name [name-list .
] { ; statements}• [;] END procedure-name [;]
    (5.17)  procedure-name ::= identifier
    (5.18)  name-list ::= (formal [, formal]•)
    (5.19)* formal ::= identifier
    (5.20)  procedure-call ::= procedure-name [parameters]
    (5.21)* parameters ::= (actual [, actual]•)
    (5.22)* actual ::= expression
    (5.23)  routine-block ::= operator-link : OPERATOR [left-names] operator right-names { ; statement}• [;] END operator-link [;]
    (5.24)  operator-link ::= identifier
    (5.25)  left-names ::= ref-type identifier [ { • • } name-list : ]
    (5.26)  right-names ::= [:] ref-type identifier name-list { : | • } ref-type identifier
    (5.27)  operator-call ::= [expression [{ : | • } [parameters]]] operator [[parameters] { : | • }] expression
    (5.29)  data-block ::= DATA configuration-name { ; [CONSTANT | DEFER] data-group}• [;] END configuration-name [;]
    (5.30)  global-data-block ::= GLOBAL DATA configuration-name { ; [CONSTANT | DEFER] data-group}• [;] END configuration-name [;]
    (5.31)* type-data-block ::= GLOBAL DATA type-name { ; basic-type [array] item [, item]• }• [;] END type-name [;]
    (5.32)* data-group ::= {basic-type | type-name} [array] item [, item]•
    (5.34)* array ::= {RIGHT | VECTOR} (bound [, bound]•)
    (5.35)* bound ::= { *:* | [integer :] integer}
    (5.36)* item ::= identifier [INIT constant]

    (6.1)*  expression ::= constant | system-supplied-value | system-supplied-constant | array-reference | procedure-call | operator-call | (expression)
    (6.2)*  array-reference ::= array-expression
    (6.3)*  array-expression ::= identifier

    (7.1)   stmt ::= expression | NIL
    (7.2)   stmt ::= RETURN expression
    (7.3)   stmt ::= EXIT [label]
    (7.5)*  stmt ::= ALLOCATE identifier FOR pointer
    (7.4)   stmt ::= IF expression THEN statement [[;] ELSE statement]
    (7.6)   stmt ::= RELEASE expression
    (7.7)   lstmt ::= cstmt | label : cstmt | label
    (7.8)   label ::= identifier
    (7.9)   cstmt ::= BEGIN statement [; statement]• [;] END
    (7.10)  cstmt ::= [for-phrase] ITERATE [while-phrase] statement [; statement]• [;] [when-phrase] END
    (7.11)  for-phrase ::= FOR identifier [FROM expression] [;] [{DOWNTO | UPTO} expression] [;] [STEP expression] [;]
    (7.12)  while-phrase ::= WHILE expression ;
    (7.13)  when-phrase ::= WHEN expression [;]
    (7.14)  cstmt ::= DECISION decision [; decision]• [;] ACTION action [; action]• [;] [ELSE statement [;]] END
    (7.15)  decision ::= switch : expression
    (7.16)  decision ::= switch {, switch}• [list-layout] : list-index
    (7.17)  switch ::= identifier
    (7.18)* list-layout ::= VECTOR (integer : integer)
    (7.19)* list-index ::= expression
    (7.20)  action ::= switch-expression : statement
    (7.21)  switch-expression ::= [¬] switch [switch-operator [¬] switch]•
    (7.22)* switch-operator ::= & | == | ¬= | AND | OR
    (7.23)  statement ::= stmt | lstmt

BIBLIOGRAPHIC DATA SHEET

1. Report No.: UIUCDCS-R-76-740
4. Title and Subtitle: CLEOPATRA CODE GENERATOR USER'S GUIDE
5. Report Date: January 1976
7. Author(s): John David Halbur
9. Performing Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
12. Sponsoring Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
13. Type of Report & Period Covered: Technical Report
16. Abstracts: CLEOPATRA is a general-purpose and systems implementation language in the style of ALGOL designed for computers similar to the IBM System/360. Among its concepts are extensions to the ALGOL block structure, user-defined data types and data access mechanisms, and user-defined 'generic' operators. The language is goto-free, and has a generalized decision table as its main control structure. An interrupt mechanism is proposed. This report, a companion to the technical report UIUCDCS-R-75-739, discusses the code generation phase of a first implementation.
The implementation restrictions and the input to the code generator are described in detail. The report is primarily intended for the implementor of an analysis phase.

17. Key Words and Document Analysis.
17a. Descriptors: Code Generation, Compilation, Compilers, Programming Languages, Storage Allocation, Intermediate text
17b. Identifiers/Open-Ended Terms: CLEOPATRA
18. Availability Statement: UNLIMITED
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages: 40
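As an illustration of the CARDS record layout described in the text transmission section (thirty-two bit columns, then four five-column entries, then list-directed intermediate text ended by 99999), the following Python sketch shows one way such records might be decoded. It is not part of the CLEOPATRA system. The function names, the use of the characters 0 and 1 in the bit columns, and the treatment of a blank entry as zero are assumptions made for this sketch only.

```python
END_MARK = 99999  # the entry that marks the end of a statement


def parse_symbol_row(card):
    """Split one symbol table card into 32 bits and four integer entries.

    Assumes columns 1-32 hold the characters 0 or 1 and columns 33-52
    hold four right-justified integers of five columns each.
    """
    card = card.ljust(52)
    bit_columns = card[:32]
    if any(ch not in "01" for ch in bit_columns):
        raise ValueError("columns 1-32 must contain only 0 or 1")
    bits = [ch == "1" for ch in bit_columns]
    entries = []
    for i in range(4):
        field = card[32 + 5 * i : 37 + 5 * i]
        # Assumption: a blank five-column field is read as zero.
        entries.append(int(field) if field.strip() else 0)
    return bits, entries


def list_directed(text):
    """Yield integers separated by blanks or commas (simplified list input)."""
    for token in text.replace(",", " ").split():
        yield int(token)


def read_statement(values):
    """Collect intermediate text entries up to, not including, the 99999 mark."""
    statement = []
    for value in values:
        if value == END_MARK:
            return statement
        statement.append(value)
    raise ValueError("no 99999 end marker found")
```

A short usage example: a card whose first bit is set and whose four entries are 12, 3, 45 and 0 would be passed to parse_symbol_row as a 52-character string, and the intermediate text that follows the constant table would be passed through list_directed and read_statement one statement at a time.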