 
UIUCDCS-R-76-834
 
 IMPLEMENTATION OF THE LANGUAGE CLEOPATRA; 
 THE ANALYSIS PASS 
 
 by 
 Scott Harley Fisher 
 
October 1976
 
 
 IMPLEMENTATION OF THE LANGUAGE CLEOPATRA: 
 THE ANALYSIS PASS 
 
 BY 
SCOTT HARLEY FISHER
B.S., University of Illinois, 1972
 
 THESIS 
 
 Submitted in partial fulfillment of the requirements 
 for the degree of Master of Science in Computer Science 
 in the Graduate College of the 
University of Illinois at Urbana-Champaign, 1976
 
 Urbana, Illinois 
 
Acknowledgement
 
As I write this acknowledgement and think back over the
long months of work that went into this project, it is
difficult to name all the individuals who have influenced this
work. To any deserving but unnamed individuals, I express my
regrets for the omission.
 
 I would like to thank my thesis advisor Dr. H. George 
 Friedman, Jr. for suggesting this project, reading this 
 thesis, and supplying the proper environment for the work 
 contained herein. 
 
 When embarking on a new field, it is always beneficial 
 to study the previous literature on the subject. In my 
 case, the "previous literature" was my Director of Research, 
 but more than that, my "previous literature" was embodied in 
an enthusiastic pedagogue and good friend--Dr. Axel T.
 Schreiner. Axel was tremendously helpful, and always 
willing to discuss difficult points. He also displayed
confidence in my abilities and in the project even when
confidence seemed out of order. It seems inadequate,
but--thanks, Axel.
 
 I also extend a special thanks to my family. Although 
 they do not understand the esoterica involved in the 
 analysis pass of a compiler, they were always confident in 
 and supportive of my work. It is, to a large part, through 
 their efforts that I have attained this goal. For that, I 
sincerely thank them all.
 
 Lastly, I wish to express my thanks to two fellow 
graduate students and good friends--John Hodry and John
 Bowman--for helpful suggestions and technical advice on 
 various aspects of this project. 
 
 It is to these people that I dedicate this thesis. 
 
S. H. F.
 
Table of Contents

1.0 Introduction
2.0 Accessibility and Scope
    2.1 The Configuration
    2.2 Local vs. Global Scope
        2.2.1 Global Scope
        2.2.2 Local Scope
3.0 The Subset Language
    3.1 Structure Blocks
    3.2 Data Blocks
    3.3 Routine Blocks
4.0 Implementation of the Subset
    4.1 Implementation Rules for Structure Blocks
    4.2 Implementation Rules for Data Blocks
    4.3 Implementation Rules for Routine Blocks
    4.4 Statements
5.0 Further Considerations
    5.1 Pointer Variables
    5.2 Presentation of Blocks for Compilation
6.0 Structure of the Compiler
    6.1 Compiler Parameters
    6.2 Symbol Table Complex
        6.2.1 Name Recognition
        6.2.2 Type Analysis
        6.2.3 Level/Configuration Table
        6.2.4 Configuration Table
        6.2.5 Type Table
    6.3 Lexical Analysis
    6.4 Syntactic Analysis
    6.5 Semantic Analysis
7.0 Conclusions
References
 
1.0 Introduction 
 
Programming languages come in two basic flavours: the
easy-to-use, easy-to-write variety, and the more
complicated but more powerful variety. Both have important
applications and, as evidenced by sheer numbers, are quite
viable. The former type, as exemplified by BASIC, is
adequate for the "simple" program where intricate data
structuring is not of primary concern, or for the novice
programmer. The latter, typified by PL/I, allows more
complex data bases and more powerful manipulations. It
becomes the task of the programmer to select the language
suitable for the particular project.
 
 CLEOPATRA (Comprehensive Language for Elegant OPerAting 
 system and TRAnslator design) is a member of the latter 
 class. CLEOPATRA can be characterized by a rather detailed 
 program text, a very modular logical structure, and powerful 
 manipulative abilities. The language specifications as well 
 as a discussion of the implications have been presented by 
Axel T. Schreiner in [1] and [2]. This thesis presents the
 initial implementation of the analysis pass of a CLEOPATRA 
 compiler. (Certain extensions and restrictions have been 
 imposed on the original language.) 
 
 
 The obvious question that arises when a new language is 
 presented is, "Why another language?". This is indeed a 
 valid question, and a language, to be a worthy and viable 
 tool, must be able to address this question. In the present 
 case, the objectives have been to produce a compiler for a 
 language that: 
 
 1) Allows user-defined data types; 
 
 2) Can act as a laboratory for new features such as: 
 
 Decision tables, and 
 
 Powerful control structures that facilitate 
 the design of programs. 
 
Within this framework, the user is allowed a certain amount
 of flexibility to produce a well structured and concise 
 program for compilation. Further, in an effort to produce a 
 "secure" program, full type checking is enforced. For the 
 compiler, a certain level of complexity is introduced since 
 opening and closing of "blocks" or levels of scope may be 
 more frequent. It is also necessary to maintain more 
 complex data bases to provide these special services. 
 
2.0 Accessibility and Scope 
 
 In days of old (relatively speaking!), a program 
 "owned" the computer for the duration of its execution. As 
 a result, the computer and all of its facilities became a 
 servant to the user program. Nowadays, of course, the 
 converse is true: the computer runs the program. This 
 being the case, the program and in particular its contents 
 have a life cycle. In a non-procedural language, the whole 
 program is active for the duration of execution. In the 
 more sophisticated languages (e.g. ALGOL), only portions 
 (often called procedures or blocks) are active at any given 
 time. In CLEOPATRA, this basic unit is the configuration. 
All scope rules revolve around the configuration. However,
not all elements of a configuration have identical scopes.
The scope of an element is a function of its placement in a
local or global block. This relationship between global and
local elements will be brought forth after an examination of
the basic unit in CLEOPATRA: the configuration.
 
 
2.1 The Configuration
 
 The structure of the source program written in any 
 procedure oriented language can be described by a tree in 
 the graph theoretic sense. That is, every procedure has a 
 father; the father of the main procedure might be 
 considered to be the run-time environment. It is from this 
 tree that ALGOL-style languages develop the scope or 
 accessibility rules. CLEOPATRA also uses this tree 
 structure concept. Each node of the resulting program tree 
 is called a configuration. (At this stage, a configuration 
may be thought of as a procedure in the ALGOL or PASCAL
sense. This definition is incomplete but sufficient for the
moment.) To the compiler, the underlying concept of the
configuration is the configuration's position in the
configuration tree. Each node in the tree is given a unique
 number. The crucial point is that each node in the tree 
 defines the configuration's relation to the program as a 
 whole. To the user, a configuration is denoted by a 
 configuration name. 
 
 In most programming languages, the static program tree, 
 corresponding to the configuration tree, is determined and 
constructed from the physical nesting of the source program.
That is, one procedure is a descendant of another if the
former's statements are physically placed within the
latter's statements. In CLEOPATRA, the tree is determined
by logical nesting rather than physical nesting. This is
accomplished via the CLEOPATRA structure block. The
structure block defines the accessibility of elements in the
tree from the present configuration (a node of the
tree) toward its descendants.
 
 To this point, the term used to describe the program's 
 logical structure has been the static program tree. This is 
the tree that is constructed at compile-time and represents
the true logical nesting of the source program. In contrast
 to the static program tree is the dynamic run-time tree. 
 The dynamic tree is composed of the physically growing and 
 shrinking program in memory. At run-time, configurations 
 are allocated space in memory for their dynamic variables 
 and required return addresses. This space, which grows upon 
 entry to a configuration and shrinks upon exit, is a 
 constituent of the dynamic program tree. The run-time 
 environment is determined by the code generator, and is 
 described in [3]. (The reader will find a good discussion 
 of this topic as well as an excellent reference to compiler 
 principles in [5] and [6].) 
 
 
2.2 Local vs. Global Scope
 
 To this point, the static program has been created. In 
a language such as ALGOL, the scope rules would now be
 automatically defined. This is not the case in CLEOPATRA. 
 There are two basic scope rules rather than one. These can 
 be explained within the framework of the static program 
 tree, and the context of the current configuration tree. 
 The current configuration tree is defined as the currently 
 active configuration plus all descendants of that 
 configuration in the static program tree. In the case of 
 the initial configuration (similar to the main procedure in 
 PL/I) the current configuration tree is the whole program. 
 In the case of a configuration at a maximum nesting level, 
 the current configuration tree is the configuration itself. 
 
2.2.1 Global Scope
 
 The scope rule for global elements is the same as that 
 of ALGOL. A global element will potentially be active and 
 thus available throughout the current configuration tree. 
However, a global element can become inactive in two ways,
each producing different results. If a local element of the
 same name as the global element is declared at a deeper 
 nesting level, the global element becomes known throughout 
 the current configuration tree except in the configuration 
 in which the local element is defined. In the second case, 
 if another global element of the same name as the first 
 global element is defined at a deeper nesting level, the 
 first element is known in its current configuration tree 
 until the configuration in which the second element is 
 declared. At this point, the second global element becomes 
 active for its current configuration tree or until its scope 
 is altered by one of these two situations. 
 
An element is made global by being defined in a global
data block or a type data block, or by means of a global
structure block. The global structure block alters the
 "globalness" slightly. This alteration will be discussed in 
 due time. For now, it can be stated that the general global 
 scope implies activation throughout the current 
 configuration tree unless the name is redefined. 
 
 
2.2.2 Local Scope
 
The concept of local scope is largely unique to
CLEOPATRA, and it is easy to define: a local
element is active only in the configuration in which it is
defined. Thus, a local element is not known to any of that
configuration's descendants.

The purpose of local scope is to hold scratch
variables, counters, and the like. This scope concept
contributes to program structure, since scratch variables
should not be made global.
 
 An element is made local by declaration in a local data 
 block. The analysis pass detects local and global elements 
 and indicates the type for the code generator. The code 
 generator then allocates the local element upon invocation 
 of its configuration and deallocates upon exit. 
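The two scope rules can be illustrated with a small sketch. The following Python fragment is a hypothetical model, not the analysis pass's Symbol Table Complex: each configuration (the class `Config`) records which names it declares as local or global, and `lookup` resolves a name as seen from a given configuration. A configuration sees all of its own declarations; walking toward the root, only the global declarations of ancestors are visible, and the first one found (the deepest) wins.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Config:
    """A node of the static program tree (hypothetical model)."""
    name: str
    parent: Optional["Config"] = None
    # identifier -> (scope, element); scope is "local" or "global"
    decls: Dict[str, Tuple[str, str]] = field(default_factory=dict)


def lookup(cfg: Config, ident: str) -> Optional[str]:
    """Return the element that `ident` denotes inside configuration `cfg`."""
    # Local and global elements declared in the configuration itself are active.
    if ident in cfg.decls:
        return cfg.decls[ident][1]
    # Ancestors contribute only global elements; a local element of an ancestor
    # is skipped, and the deepest matching global element wins.
    node = cfg.parent
    while node is not None:
        entry = node.decls.get(ident)
        if entry is not None and entry[0] == "global":
            return entry[1]
        node = node.parent
    return None


# A global x in MAIN, redeclared as local in A and as global in C.
main = Config("MAIN", decls={"x": ("global", "MAIN.x"), "tmp": ("local", "MAIN.tmp")})
a = Config("A", main, {"x": ("local", "A.x")})
b = Config("B", a)
c = Config("C", main, {"x": ("global", "C.x")})
d = Config("D", c)
print(lookup(a, "x"), lookup(b, "x"), lookup(d, "x"), lookup(b, "tmp"))
# prints: A.x MAIN.x C.x None
```

Configuration B, a descendant of A, sees MAIN's global `x` because A's local `x` is confined to A, while D sees the deeper global `x` of C; the local `tmp` of MAIN is invisible to every descendant.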
 
3.0 The Subset Language 
 
 To this point, the term configuration has been defined 
 as a procedure or a node in the static program tree. A 
configuration is indeed those things, but in a more precise
 sense, a configuration is a collection of statements 
 defining an accessibility sequence, the data availability, 
 and the actions to be taken on these data. From this 
 definition, it can be concisely stated from what a 
 configuration is formed. A configuration is composed of a 
 routine block and possibly a combination of a structure 
 and/or a data block. In the context of the previous 
 definition, the following correspondences are formed: 
 
Conceptual Block   Actual Block                      Use
----------------   -------------------------------   ----------------------------
Structural         Structure and global structure    Accessibility sequences
Data               Local data and global data        Definition and scope of data
Routine            Procedure and operator block      Executable statements
 
3.1 Structure Blocks
 
 As alluded to previously, the structure block defines 
 the logical nesting of the program's configurations by 
 constructing the static program tree. The scope of a 
 structure block is the current configuration tree. This 
 implies that procedures are logically nested in the manner 
 of ALGOL procedures. 
 
As indicated, the global structure block modifies
 slightly the global scope rule. A global structure block is 
 associated with a user-defined data type. The global 
 structure block "pulls" the definition of the type to the 
 same level as the structure defining the user-defined type. 
 That is, a user-defined type is declared in a structure 
 block, which is a node in the static program tree. The 
 global structure block pulls the nesting level in the tree 
 to the same level as the defining configuration. 
 
 3.2 Data Blocks 
 
 Data blocks are used to define data items available to 
 a configuration. A data block is not required for a 
 configuration if the corresponding configuration does not 
 need to possess any new variables. A configuration does own 
 data items declared global in a predecessor configuration. 
 However, good coding practice favours elimination of scratch 
 global variables. Local data blocks are used to create the 
 scratch variables needed by a configuration. As previously 
stated, an identifier may be declared in the current
configuration with the same name as a previously declared
global identifier. In this case, the current identifier is
the active element.
 
 3.3 Routine Blocks 
 
 The last type of block is the routine block. To this 
point, the input has established the symbol table and
configuration table with the proper declarations and calling
sequences. That having been completed, the executable
 statements can be parsed. The routine block is the only 
 block which must be present in a configuration. This is the 
 case because a "procedure" may use global data and may not 
 need to possess nested configurations, but must contain 
 executable statements. 
 
Two types of routine blocks exist for this purpose: the
procedure block and the operator block. The procedure block
 is used to describe the user's algorithm in the CLEOPATRA 
 language. The operator block defines the actions to be 
 taken by a user-defined operator. 
 
4.0 Implementation of the Subset
 
 As previously stated, this thesis is the presentation 
 of one aspect of the subset language--the analysis of input 
 through the generation of intermediate text. The purpose of 
 the current implementation is not only to produce a working 
 compiler, but also to determine the types of algorithms 
required to implement the language constructs.
 
The subset has been defined in [4]. The subset
specifications will not be repeated here except in those
areas where changes have been made. Though the code
generator was complete before the analysis pass was begun,
changes are still possible so long as the code generator
receives the required input. In some cases, restrictions
 have been placed on input, but in other cases extensions 
 have been added. 
 
4.1 Implementation Rules for Structure Blocks
 
In [4], all configuration names were required to be
unique. This was necessary for the linkage editor being
used. This restriction has been removed by the analysis
pass. The analysis pass requires uniqueness within any
given level in the configuration tree; however, duplication
is allowed between levels. This extension was made so that
the user can apply the same scope rules to all names
presented for compilation. In fact, the analysis pass forms
the unique names for the user and passes them to the code
generator. This is done by concatenating "CLEO" with a
three-digit configuration number (which, as will be
recalled, is unique). There are two cases where this
transformation does not occur: the procedures
"INPUT" and "OUTPUT". The code generator uses the input and
output routines supplied by the linkage editor, so it is
imperative that the linkage editor actually receives
the names "INPUT" and "OUTPUT". This conversion mechanism
also allows another extension. Formerly, configuration
names were restricted to seven characters; with the above
method, there is no restriction on the length of these names.
The name table constructed by the analysis pass indicates
both the configuration name given by the user and the
configuration name passed to the code generator. Note
also that the linkage editor map shows the
converted names (i.e. the unique names constructed by the
analysis pass), not the user's configuration names. The
correspondence can be found in the name table, which the
analysis pass produces by default.
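The renaming rule can be sketched as follows. The function names `external_name` and `build_name_table` are hypothetical, and the numbering range 0 to 999 is an assumption drawn only from the phrase "three-digit configuration number"; the exceptions for INPUT and OUTPUT follow the text.

```python
from typing import Iterable, List, Tuple

# Names the linkage editor must receive unchanged (from the text).
UNCHANGED = {"INPUT", "OUTPUT"}


def external_name(user_name: str, config_number: int) -> str:
    """Map a user's configuration name to the name passed to the code generator."""
    if user_name in UNCHANGED:
        return user_name
    # Assumption: a "three-digit configuration number" lies in 0..999.
    if not 0 <= config_number <= 999:
        raise ValueError("configuration number does not fit in three digits")
    return f"CLEO{config_number:03d}"


def build_name_table(configs: Iterable[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Pair each user name with its generated name, as in the name table."""
    return [(name, external_name(name, number)) for name, number in configs]


table = build_name_table([("MAIN", 0), ("SORT_THE_INPUT_RECORDS", 7), ("INPUT", 12)])
for user, generated in table:
    print(f"{user:<24} {generated}")
```

Since "CLEO" plus three digits is exactly seven characters, every generated name fits the former seven-character limit no matter how long the user's name is.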
 
 
In constructing the static program tree, the analysis
pass accepts the first configuration name as the root of the
tree (i.e. as the main program in the PL/I sense). This
first configuration presented for compilation therefore need
not be predefined, but after that point the declare-before-use rule
is strictly enforced for structure blocks. The implication
is that no configuration may be presented before it is named
in a structure block. This is not solely to enforce the
declaration rule: the analysis pass has no way of
determining where a configuration fits into the
configuration tree if its nesting has not been declared
along with that of its predecessors. This is no different from
other block-structured languages, in which physical nesting
implies the structure. An important
distinction must be made: a structure block "Y" does not
define configuration "Y". Rather, some structure block "X"
defines "Y", where "X" is the ancestor of "Y"; i.e. "X" owns
"Y".
 
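A minimal sketch of how the static program tree might be built under the declare-before-use rule. The classes `Node` and `ConfigTree` are hypothetical; the sketch assumes that a configuration being presented is looked up among the children declared by its owner's structure block, that names need only be unique among siblings, and that unique numbers are handed out in declaration order.

```python
from typing import Dict, List, Optional


class Node:
    """One configuration in the static program tree."""

    def __init__(self, name: str, number: int, parent: Optional["Node"]):
        self.name = name
        self.number = number          # unique across the whole tree
        self.parent = parent
        self.children: Dict[str, "Node"] = {}


class ConfigTree:
    def __init__(self, root_name: str):
        # The first configuration presented becomes the root; it needs no declaration.
        self.root = Node(root_name, 0, None)
        self._next_number = 1

    def declare_children(self, owner: Node, names: List[str]) -> List[Node]:
        """Record a structure block of `owner` naming its direct descendants."""
        added = []
        for name in names:
            # Names must be unique within one level; other levels may reuse them.
            if name in owner.children:
                raise ValueError(f"{name!r} declared twice under {owner.name!r}")
            node = Node(name, self._next_number, owner)
            self._next_number += 1
            owner.children[name] = node
            added.append(node)
        return added

    def present(self, owner: Node, name: str) -> Node:
        """Locate a configuration being presented; it must already be declared."""
        if name not in owner.children:
            raise LookupError(f"{name!r} presented before any structure block declared it")
        return owner.children[name]


tree = ConfigTree("MAIN")
tree.declare_children(tree.root, ["A", "B"])
a = tree.present(tree.root, "A")
tree.declare_children(a, ["A"])  # same name at a deeper level: allowed
print([(n.name, n.number) for n in (tree.root, a, a.children["A"])])
```

The two configurations named "A" receive different numbers (1 and 3), which is what allows the analysis pass to generate distinct external names for them.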
 4.2 Implementation Rules for Data Blocks 
 
The subset language supports local, global, and
type data. The analysis pass requires that all identifiers
be declared before their use. Further, the analysis
pass provides complete scope and type checking.

In [4], identifiers had been limited to 31 characters in
length. The analysis pass puts no restriction on the length
of identifiers.
 
 The subset supports four basic types of data along with 
 user-defined types. The four basic types are: 1) Bit; 2) 
Character; 3) Integer; and 4) Long_integer. The subset
 does not support the type pointer. A discussion of the 
 problems encountered attempting to implement pointers can be 
 found in Chapter 5. Real, decimal, and octal are not 
 supported by the code generator and thus not supported by 
 the analysis pass. Arrays are permitted with the 
 restrictions as noted in [4]. 
 
 An important attribute of CLEOPATRA is its support of 
 user-defined data types. There is not a single specific 
 block for this definition; however, the construct type-pack 
 exists for this purpose. A type-pack is a group of the 
basic blocks which, when combined, define the structure of
the user-defined type as well as configurations which may
access the data type. The name given the user-defined type
is defined in a structure block to assign to it a position
in the configuration tree. Next, a global data block is
 used to define the underlying representation of the type. 
 It should be noted that the code generator places certain 
 restrictions on the contents of a user-defined type. These 
 restrictions exist in the subset language and are enforced 
 by the analysis pass. The last element of the type-pack is 
 the global structure block, which was mentioned earlier. 
 Local data and local structure blocks may also be applied to 
 a user-defined type. 
 
One other type of data exists: the literal, or
self-defining constant (for example, 12 or the character
string C.abc). These data are exempt from the
define-before-use rule. Several extensions of the language
have been made to allow more flexibility for the user.
Blanks are allowed between the period and the digits in
hexadecimal, binary, and long_integer strings. (For
example, F. 20 is a legal long_integer representation of
20.) Character strings are limited to 256 characters. This
restriction is required by the code generator and is
enforced by the analysis pass (characters after the first
256 are truncated). Furthermore, the analysis pass supports
a full upper- and lower-case alphabetic character set.
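The blank-tolerant literal forms can be illustrated with a small scanner. Only the F prefix for long_integer is taken from the example above; using X for hexadecimal and B for binary is an assumption (those letters appear among the reserved words in Figure 1, but their meaning is not stated here), and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed prefix letters: F (long_integer) is from the "F. 20" example;
# X for hexadecimal and B for binary are assumptions.
PREFIXES = {"F": ("long_integer", 10), "X": ("hexadecimal", 16), "B": ("binary", 2)}
# A prefix letter, a period, optional blanks, then the digits.
NUMERIC_LITERAL = re.compile(r"([FXB])\.\s*([0-9A-Fa-f]+)")
MAX_STRING = 256  # limit imposed by the code generator


def scan_numeric(text: str) -> Tuple[str, int]:
    """Return (type, value) for a prefixed numeric literal such as 'F. 20'."""
    match = NUMERIC_LITERAL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a prefixed numeric literal: {text!r}")
    kind, base = PREFIXES[match.group(1)]
    # int() rejects digits that are invalid for the base, e.g. '2' in a binary string.
    return kind, int(match.group(2), base)


def character_literal(body: str) -> str:
    """Keep at most 256 characters; the rest are truncated."""
    return body[:MAX_STRING]


print(scan_numeric("F. 20"), scan_numeric("X.1F"), scan_numeric("B.  101"))
```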
 
As stated in [1], [2], and [4], CLEOPATRA does not
 allow automatic data type conversion. There are, however, 
 various builtin functions for this purpose, as well as the 
 ability to define operators. 
 
Storage management is handled in three ways, including
constant and automatic storage. The mechanism for this handling is
treated in detail in [3]. Arrays may not have dynamic
bounds. Most of the array partitioning and handling
facilities given in the full language are not available in
this subset (see [4]).
 
4.3 Implementation Rules for Routine Blocks
 
 The routine block contains the actual executable 
 statements for the exposition of the specific algorithm to 
 be implemented. The routine block is composed of any number 
 of statements which operate on data "owned" by the 
 configuration and any global data defined previously in the 
 tree. The format and details of the statements will be 
 presented shortly. 
 

 
There are two types of routine blocks: operator and
procedure. The operator block is used to define the action
of a user-defined operator on its operand(s). In general,
the main algorithm that the user is implementing will be
coded in a procedure block.
 
4.4 Statements

Routine blocks are composed of two basic types of
statements: 1) simple, and 2) compound. Simple
statements include the following:

a) Expression
b) IF ... THEN ... ELSE
c) RETURN
d) EXIT

The compound statements combine the simple statements into
more powerful control statements. These are:

a) BEGIN ... END, for bracketing statements;
b) ITERATE, with the options FOR, WHILE, WHEN, and EXIT;
c) DECISION, which allows selective execution of statements.
 
A key difference between CLEOPATRA and most other
languages is that expressions are evaluated from right to
left. (This is also the method employed in APL.) Further,
there is no operator precedence; parenthesization forces
precedence for the parenthesized quantity. CLEOPATRA
allows the user to define operators, as does ALGOL W.
However, ALGOL W maintains operator precedence by requiring
the user to assign a precedence number to each user-defined
operator.
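As an illustration of right-to-left evaluation without precedence, the following sketch evaluates a flat list of integer operands and binary operators. It is a simplified model (no parentheses, no unary or user-defined operators), and the function name is hypothetical.

```python
import operator
from typing import Callable, Dict, List, Union

# Integer binary operators for the sketch; CLEOPATRA defines many more.
BINARY: Dict[str, Callable[[int, int], int]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def eval_right_to_left(tokens: List[Union[int, str]]) -> int:
    """Evaluate operand, op, operand, ... strictly from right to left, no precedence."""
    if len(tokens) % 2 == 0:
        raise ValueError("expected operand (operator operand)*")
    value = tokens[-1]
    # Consume (operand, operator) pairs moving leftward.
    for i in range(len(tokens) - 2, 0, -2):
        op, left = tokens[i], tokens[i - 1]
        value = BINARY[op](left, value)
    return value


print(eval_right_to_left([2, "*", 3, "+", 4]))   # 2 * (3 + 4) = 14
print(eval_right_to_left([10, "-", 4, "-", 3]))  # 10 - (4 - 3) = 9
```

Conventional precedence and left association would instead give 10 and 3 for these two expressions.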
 
 User-defined operators may also have parameter lists 
 similar to those of a procedure except that the parameter 
 lists may exist on the right and/or the left side of the 
 operator. In fact, the code generator only accepts 
 right-side parameters. Therefore, the analysis pass 
 converts all left parameters into right parameters. The 
 syntax for parameter lists to operators has been changed to 
 remove an ambiguity in the use of operators with parameters. 
Formerly, parameters to operators were denoted by placing
the parameters in parentheses. The problem arose that if a
binary and a unary operator, each with parameters, were placed
next to each other, the analysis pass would not be able to
determine which parameter list belonged to
which operator. This ambiguity is compounded by the
fact that the operators in question may have both unary and
binary forms. Thus parameters to operators must be
denoted by placing an apostrophe (') on the side of the
parameter list closest to the operator.
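The apostrophe convention and the conversion of left parameters into right parameters might be sketched as below. Tokens are modelled as strings, with parameter lists as tuples; the function `attach_params` is hypothetical, and placing the left parameters before the right ones in the combined list is an assumption, since the text does not specify the order.

```python
from typing import Dict, List, Set, Tuple, Union

ParamList = Tuple[str, ...]
Token = Union[str, ParamList]
APOSTROPHE = "'"


def attach_params(tokens: List[Token], operators: Set[str]) -> Dict[int, ParamList]:
    """Assign each parameter list to the operator next to its apostrophe.

    Returns {operator position: combined right-side parameter list}.
    """
    left: Dict[int, ParamList] = {}
    right: Dict[int, ParamList] = {}
    for i, tok in enumerate(tokens):
        if not isinstance(tok, tuple):
            continue
        # OP '(...)  : apostrophe on the left, so the list is OP's right parameter.
        if i >= 2 and tokens[i - 1] == APOSTROPHE and tokens[i - 2] in operators:
            right[i - 2] = tok
        # (...)' OP  : apostrophe on the right, so the list is OP's left parameter.
        elif i + 2 < len(tokens) and tokens[i + 1] == APOSTROPHE and tokens[i + 2] in operators:
            left[i + 2] = tok
        else:
            raise SyntaxError(f"parameter list at position {i} is not marked with an apostrophe")
    # The code generator accepts only right-side parameters: fold left lists in first.
    return {pos: left.get(pos, ()) + right.get(pos, ())
            for pos in sorted(left.keys() | right.keys())}


# x OP1 '(a) (b)' OP2 y : without apostrophes the two lists would be ambiguous.
tokens: List[Token] = ["x", "OP1", APOSTROPHE, ("a",), ("b",), APOSTROPHE, "OP2", "y"]
print(attach_params(tokens, {"OP1", "OP2"}))
```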
 
 The last restriction on expressions is that there may 
 be no more than 50 elements in an expression. This is 
 because the code generator places a limit on the length of 
 an expression in intermediate text form. 
 
One change has been made to the language specification
of the FOR statement. Previously [4], the following would
have been a legal, albeit meaningless, statement:
 
 FOR identifier ; ; ; 
 
 Although most programmers would refrain from such a nebulous 
 statement, the analysis pass would have to recognize the 
 construction and the code generator would have to generate 
 code for it. The meaning is certainly questionable, as 
 would be the resultant code. Therefore, the following FOR 
 statement has been implemented: 
 
    FOR Statement ::= FOR identifier [FROM expression]
                      STEP expression
                      [ [;] UPTO | DOWNTO expression ] ;
 
In the absence of the FROM clause, the initial value of the
identifier used as the index is its present value. If the
UPTO/DOWNTO expression option is not used, the effect is to
increment the index and continue the loop indefinitely;
this should only be done in conjunction with a WHEN, WHILE,
or EXIT statement. This form of the FOR statement allows as
flexible and powerful an implementation as the code generator
will accept via the intermediate text.
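The semantics described above can be modelled as a generator of index values. This is a sketch under stated assumptions: the bounds are taken as inclusive, and DOWNTO is assumed to subtract the STEP value rather than add it; neither detail is stated in the text. The function name and parameters are hypothetical.

```python
from itertools import islice
from typing import Iterator, Optional


def for_indices(present: int, step: int, start: Optional[int] = None,
                upto: Optional[int] = None, downto: Optional[int] = None) -> Iterator[int]:
    """Yield successive index values of the implemented FOR statement (a model)."""
    if upto is not None and downto is not None:
        raise ValueError("UPTO and DOWNTO are alternatives")
    # Without FROM, the index starts from its present value.
    value = present if start is None else start
    while True:
        # Assumption: bounds are inclusive.
        if upto is not None and value > upto:
            return
        if downto is not None and value < downto:
            return
        yield value
        # Assumption: DOWNTO subtracts the STEP value; otherwise it is added.
        value = value - step if downto is not None else value + step


print(list(for_indices(present=0, step=2, start=1, upto=7)))   # [1, 3, 5, 7]
print(list(for_indices(present=5, step=2, downto=1)))          # [5, 3, 1]
# No UPTO/DOWNTO: unbounded, so a WHEN, WHILE, or EXIT must end the loop.
print(list(islice(for_indices(present=10, step=1), 3)))        # [10, 11, 12]
```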
 
The first proposed revision was similar to the version
implemented; however, it would have made the STEP
expression phrase optional when using UPTO or DOWNTO expression:

    FOR Statement ::= FOR identifier [FROM expression]
                      [STEP expression]
                      [ [;] UPTO | DOWNTO expression ] ;

The default when STEP is omitted would be one. This is the
optimal solution, but because of the requirements of the
intermediate text, it was not possible to implement this
form.
 
 All keywords in the subset have been reserved. A list 
 of these reserved words along with their symbol table entry 
 numbers can be found in Figure 1. Figure 2 gives a list of 
 the builtin operators provided by the subset. These may be 
 redefined in an operator block and thus are not reserved. 
The code generator requires that there be no more than 1500
symbol table entries in addition to the list of reserved
words and operators given. Although 94 keywords are given
in the figures, there are actually 122 keywords. The
discrepancy lies in the fact that most of the operators have
more than one entry (e.g. 66 and 69). The operators,
though apparently equivalent, are not, because they take
different operands. The negative sign in entry 66
is a unary negative and operates on integers; the negative
sign in entry 69 is a binary negative and operates on
integers. This same situation applies to most operators.
 The effect is invisible to the user, however, since a 
 semantic analysis routine is called during the parsing of 
 expressions to determine the semantically correct operator. 
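The selection of the semantically correct operator can be pictured as a lookup keyed on the operator symbol and its operand types. Only entries 66 and 69 are taken from the text; the table layout and the function `resolve_operator` are hypothetical. Since CLEOPATRA performs no automatic type conversion, the sketch requires an exact match.

```python
from typing import Dict, Sequence, Tuple

# (operator symbol, operand types) -> symbol table entry number.
# Entries 66 and 69 are from the text; the key layout is an assumption.
OPERATOR_ENTRIES: Dict[Tuple[str, Tuple[str, ...]], int] = {
    ("-", ("integer",)): 66,            # unary negative on integers
    ("-", ("integer", "integer")): 69,  # binary negative on integers
}


def resolve_operator(symbol: str, operand_types: Sequence[str]) -> int:
    """Pick the symbol table entry matching the operator and its operand types."""
    key = (symbol, tuple(operand_types))
    if key not in OPERATOR_ENTRIES:
        # No automatic conversion: an operand type mismatch is an error.
        raise TypeError(f"no {symbol!r} operator for operands {tuple(operand_types)}")
    return OPERATOR_ENTRIES[key]


print(resolve_operator("-", ["integer"]))             # 66
print(resolve_operator("-", ["integer", "integer"]))  # 69
```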
 
 
Entry  Symbol       Entry  Symbol
-----  ---------    -----  ---------
1      ACTION       2      ALLOCATE
3      BEGIN        4      DECISION
5      DOWNTO       6      ELSE
7      END          8      EXIT
9      FOR          10     FROM
11     IF           12     ITERATE
13     NIL          14     OPERATOR
15     PROCEDURE    16     RELEASE
17     RETURN       18     STEP
19     THEN         20     WHEN
21     UPTO         22     WHILE
23     ADDRESS      24     ALIAS
25     B            26     BIT
27     BUILT        28     BY
29     C            30     CHARACTER
31     COMMENT      32     COMPILE

Figure 1a
Reserved Words
 
 
Entry  Symbol       Entry  Symbol
-----  -----------  -----  ---------
33     CONSTANT     34     DATA
35     DEFER        36     EXTENTS
37     F            38     GLOBAL
39     IN           40     INIT
41     INTEGER      42     INTO
43     LONGINTEGER  44     POINTER
45     RETURNS      46     RIGHT
47     S            48     TO
49     VECTOR       50     TYPE
51     X            52     STRUCTURE
53     FALSE        54     FIRST
55     LARGE        56     LAST
57     SMALL        58     TRUE

Figure 1b
Reserved Words (continued)
 
Entry  Symbol       Entry  Symbol
-----  ---------    -----  ---------
66     -            67     ABS
68     +            69     -
70     *            71     //
72     **           73     MOD
74     AND          75     LBOUND
76     OR           77     -»
78     = =          79     =
80     >            81     <
82     >=           83     <=
84     •            85     <UNUSED>
86     LENGTH       87     HBOUND
88     ->           89     <-
90     "->          91     ?->
92     ||           93     LINT
94     CHAR

Figure 2
Built-in Operators
 
 5.0 Further Considerations 
 
 During the implementation of the analysis pass, several 
 problems arose. In most cases, by some modification, they 
 were overcome. However, two major problems have persisted. 
 In both cases, implementation was attempted within the 
 constraints of the language, facilities, and code generator. 
 Unfortunately, the attempts met with limited success. 
 Valuable information was gleaned from this process and is 
 presented below so that future efforts might profit from the 
 results thus far. 
 
5.1 Pointer Variables
 
A major dilemma arose in the implementation of pointer
variables. The BNF (Backus-Naur Form) of the productions
causing the problem is:

    Basic_Ref_Type ::= Pointer (Ref_Type)
    Ref_Type       ::= Basic_Ref_Type | ...
 
The quandary then became the following:

    At data definition time, the compiler does not
    know the name of the variables pointed to, and so
    must store the type pointed to simply as pointer,
    pointer to pointer, and so on. However, the symbol
    table allows no control over the level of pointers
    or the final target of the pointer. Therefore the
    strong typing could easily be circumvented, and
    nothing in the user's region would be safe.
 
 The problem becomes complicated by the fact that the 
 symbol table, which was designed before the analysis pass 
 was begun, does not inherently support indirect pointers. 
One advantage of CLEOPATRA is that all types are checked in
the analysis pass for compatibility of source and target.
It is therefore imperative to have the facilities to deduce
and retain types for identifiers. Type checking is indeed a
two-level problem in CLEOPATRA. At the point of definition
of the variable, all that is known is that it is a pointer
to a pointer to a pointer, and so on. Later, in the usage context,
the variable reached through each level of pointing must be
checked. Thus the number of levels of offset must be known.
 
Several solutions were hypothesized, and each was summarily
discarded save for the last. Among the proposals
entertained were:
 
 1) Restricting pointers to point to basic types only, 
 
 thus eliminating multi-level pointers. This 
 appeared more restrictive than valuable. 
 
 2) Constructing a symbol table for the analysis pass, 
 
 then reformatting before passing to the code 
 generator. This method would be wasteful of space 
 and especially of time. 
 
 3) Removing pointers from the subset. 
 
 Since the current implementation is a subset and an 
 investigation into the methods for implementation of 
 CLEOPATRA, the removal appeared to be the best solution. 
 Indeed, it has been argued that pointers should not be 
 included in programming languages [7].
 
 Though they have been removed, and irrespective of the
 debate, pointers are in the full language and must
 eventually be handled. After considerable effort in an
 attempt to implement them, an adequate solution has been
 formulated for use in the full language. Rather than a
 "type returned/pointed to" field in the type analysis table
 of the Symbol Table Complex, a ref_type field is suggested.
 This would be a pointer to a linked list containing the type
 pointed to and a pointer to the next level of offset if
 necessary. This would allow full implementation, compact
 size (since the nodes could be allocated dynamically), and
 simple yet complete type checking.
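
 The suggested ref_type field can be pictured with a small
 sketch. It is written in Python rather than the PL/I of the
 analysis pass, and the names RefNode, make_ref_type,
 ref_types_match, and depth are hypothetical. The type codes
 reuse the numbering of the Type field in the type analysis
 table (7 = Pointer, 15 = Integer). Each node holds the type
 at one level of offset and links to the next level, so type
 checking becomes a level-by-level walk of two chains.

```python
from dataclasses import dataclass
from typing import Optional

# Type codes borrowed from the Type field of the type analysis table.
POINTER = 7
INTEGER = 15


@dataclass
class RefNode:
    """One level of a hypothetical ref_type chain."""
    type_code: int                       # type at this level of offset
    next: Optional["RefNode"] = None     # next level, present only for pointers


def make_ref_type(levels: int, target: int) -> RefNode:
    """Build the chain for `levels` pointers ending in type `target`."""
    node = RefNode(target)
    for _ in range(levels):
        node = RefNode(POINTER, node)
    return node


def ref_types_match(a: Optional[RefNode], b: Optional[RefNode]) -> bool:
    """Compare two chains level by level: same depth and same type at each level."""
    while a is not None and b is not None:
        if a.type_code != b.type_code:
            return False
        a, b = a.next, b.next
    return a is None and b is None


def depth(node: Optional[RefNode]) -> int:
    """Count the levels of offset (pointer nodes) before the final target."""
    count = 0
    while node is not None and node.type_code == POINTER:
        count += 1
        node = node.next
    return count
```

 For example, a pointer to a pointer to an integer is
 make_ref_type(2, INTEGER); it has depth 2 and does not match
 make_ref_type(1, INTEGER), which is exactly the distinction
 the flat "pointer" type could not express.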
 
 5.2 Presentation of Blocks for Compilation 
 
 In the CLEOPATRA language description presented in [2],
 blocks could be presented in almost any order. In [4], 
 blocks must be grouped by configuration with the routine 
 block following the structure and data blocks. These two 
 orderings imply a very different analysis pass. 
 
 As will be recalled, an important function performed by 
 the analysis pass is the construction of the static program 
 tree. This activity is mostly invisible to the code 
 generator since it presumes proper (or repaired) input. The 
 analysis pass, on the other hand, bases all of its 
 activities around the knowledge of this tree. In the first
 ordering (actually a non-ordering), the tree may not be
 completed to the proper level when a routine block for a
 heretofore undeclared configuration arrives. Several
 alternatives present themselves at this point: 1) declare
 the new configuration to be a descendant of the current
 configuration; 2) flush the block; and 3) take two passes
 over the input. We shall discuss the three in order.
 
 By declaring that the routine block for an unknown
 configuration automatically becomes a direct descendant of
 the current block, the initial question of where to place the
 configuration in the tree is resolved. However, this
 immediately presents several new problems with possibly
 disastrous results. First, the static tree would have to
 be "pulled apart" and the new configuration inserted. The
 physical action of changing the links is not a problem. A
 larger problem is that the scope of variables and procedures
 would change. By inserting this configuration, new
 procedures and variables could be brought into the current
 scope. This could change which variable a name refers to,
 introduce naming conflicts, and alter calling sequences.
 Without elaborating further, it should be clear that this
 method is completely infeasible.
 
 The second alternative, to flush the block, would
 certainly alleviate some of the problems associated with the
 first alternative. No new blocks or variables would be
 pulled into a potentially incorrect place in the
 configuration tree. Obviously, execution would have to be
 suppressed since the routine would not be used. Further,
 any calls on this routine would be illegal. Also, any
 procedures nested within the routine would be flushed since
 they would automatically be out of order. Thus, this
 solution has a very severe snowball effect.
 
 The last alternative is to run a two-pass analysis.
 The first pass would collect all elements and sort out the
 structure of the static tree. Then the second pass would
 resolve scopes based on the static tree. This method would
 solve the problems presented above. The disadvantages are
 that the whole compiler would require three passes (two for
 analysis and one for code generation), and some form of the
 input would have to be retained for the second analysis
 pass. The important point is that the problem of the static
 tree would be solved at the expense of additional time and
 space.
 
 The requirement that blocks must be presented by
 configuration with routine blocks last alleviates the
 problem above and requires only one pass for analysis. The
 problem here is that there is slightly less generality in
 the presentation of blocks for compilation.
 
 The actual implementation is somewhat of a compromise
 between all the methods. The analysis pass was implemented
 as a single pass, implying a certain amount of ordering. We
 require a configuration to be defined in a structure block
 before any other blocks for that configuration are presented
 (except for the initial configuration). Next in order of
 presentation must be any type_data blocks whose names were
 declared in the immediately preceding structure block. This
 is necessary to preserve the list of like-named
 predecessors: when a block is closed, a predecessor for each
 element being closed is activated if it exists, and this
 ordering must be preserved in order to keep type_data open.
 Lastly, the data blocks and routine blocks are presented in
 the following manner. Any data block must precede its
 routine block but need not immediately precede it. These
 blocks must be presented in level order, just as the scope
 rules require. The most important rule is that a structure
 block must precede all other blocks in its configuration.
 However, the structure blocks need not all be grouped
 together.
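
 Two of these ordering rules can be checked mechanically as
 blocks arrive. The Python sketch below is illustrative only:
 blocks are modeled as hypothetical (configuration, kind)
 pairs, and a configuration is assumed to count as defined
 once a structure block tagged with its name has been seen.
 The type_data and level-order rules are not modeled.

```python
from typing import Iterable, List, Tuple

# Block kinds named in the thesis; the (configuration, kind) encoding is hypothetical.
STRUCTURE = "structure"
TYPE_DATA = "type_data"
DATA = "data"
ROUTINE = "routine"


def check_order(blocks: Iterable[Tuple[str, str]], initial: str) -> List[str]:
    """Report blocks that violate two of the subset's ordering rules.

    Rule A: a configuration must be introduced by a structure block before
            any other block for it arrives (the initial configuration excepted).
    Rule B: a data block must precede the routine block of its configuration.
    """
    defined = {initial}      # configurations introduced so far
    routine_seen = set()     # configurations whose routine block has arrived
    errors = []
    for config, kind in blocks:
        if kind == STRUCTURE:
            defined.add(config)
            continue
        if config not in defined:
            errors.append(f"{kind} block for undefined configuration {config}")
        if kind == ROUTINE:
            routine_seen.add(config)
        elif kind == DATA and config in routine_seen:
            errors.append(f"data block for {config} follows its routine block")
    return errors
```

 Because only one pass is made, each check uses nothing but
 the blocks already seen, which is what makes the single-pass
 compromise workable.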
 
 A considerable amount of time was spent attempting to 
 find a better compromise. Indeed, at one point all 
 activities ground to a halt while efforts were turned to the 
 opening and closing algorithm. Finally, the above solution 
 was chosen. Along the way, a very promising algorithm was 
 in part derived. The concept is presented here for further 
 consideration and refinement. 
 
 This method employs a 6 x 3 bit map for each
 configuration. The format is: types_of_blocks x legal_use.
 Precisely, the form is:

                        Allow   May Define   Has Been Defined
     Routine
     Global Data
     Type_Data
     Local Data
     Global Structure
     Local Structure
 
 The "allow" column is a function of the type of
 configuration (algorithm or type pack). The "may define"
 entry is initialized to "yes" and switched to "no" later as
 a function of the "has been defined" entry. The "has been
 defined" entry is initialized to "no" and changes as
 compilation proceeds.
 
 As an example of the use of the bit map, consider the 
 following situations: 
 
 1) When defining a routine block (row 1), we know that
    from this point on we cannot present data blocks
    for this configuration. So, all data block bits
    are set to "may not define".

 2) When defining a global data block for the current
    configuration, we cannot define type data or
    global data for predecessors in the configuration
    tree. Therefore we climb up the configuration
    tree setting the "may not define" bits.

 3) When defining a local structure block for a
    configuration, we cannot define structure blocks
    for any predecessor configurations. So, the "may
    not define" bit is set for all predecessors.
 
 Upon entering a block, the bit map is checked to determine 
 whether the block just entered is in a legal position. 
 
 This method would require little overhead and only the 
 18-bit map. The method has not been proved for all cases 
 but because of the speed and small space requirements, it 
 seems to be the best alternative method. 
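
 A sketch of the bit map follows, in Python and under stated
 assumptions. Each configuration's 6 x 3 map is packed into
 one 18-bit integer. The class Config and the functions bit
 and enter_block are hypothetical. The caller supplies the
 rows for which "allow" is set, since the text says only that
 this depends on the kind of configuration. Because the method
 "has not been proved for all cases", only the three example
 rules above are modeled.

```python
from typing import Iterator, Optional, Set

# Rows (types_of_blocks) and columns (legal_use) of the 6 x 3 map.
ROWS = ["routine", "global_data", "type_data", "local_data",
        "global_structure", "local_structure"]
ALLOW, MAY_DEFINE, HAS_BEEN_DEFINED = 0, 1, 2


def bit(row: str, col: int) -> int:
    """Mask for one cell of the map, packed into an 18-bit integer."""
    return 1 << (ROWS.index(row) * 3 + col)


class Config:
    def __init__(self, name: str, parent: Optional["Config"], allowed: Set[str]):
        self.name = name
        self.parent = parent
        self.map = 0
        for row in ROWS:
            if row in allowed:                 # "allow" depends on configuration kind
                self.map |= bit(row, ALLOW)
            self.map |= bit(row, MAY_DEFINE)   # "may define" starts as yes
        # "has been defined" starts as no (its bits are left clear)

    def forbid(self, row: str) -> None:
        self.map &= ~bit(row, MAY_DEFINE)

    def predecessors(self) -> Iterator["Config"]:
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


def enter_block(config: Config, row: str) -> bool:
    """Check that a block is in a legal position, then apply rules 1-3."""
    if not (config.map & bit(row, ALLOW) and config.map & bit(row, MAY_DEFINE)):
        return False
    config.map |= bit(row, HAS_BEEN_DEFINED)
    if row == "routine":                          # rule 1
        for data_row in ("global_data", "type_data", "local_data"):
            config.forbid(data_row)
    elif row == "global_data":                    # rule 2
        for pred in config.predecessors():
            pred.forbid("type_data")
            pred.forbid("global_data")
    elif row == "local_structure":                # rule 3
        for pred in config.predecessors():
            pred.forbid("global_structure")
            pred.forbid("local_structure")
    return True
```

 The check on entry is two bit tests, and each update touches
 at most one bit per predecessor, which is consistent with the
 claim of little overhead.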
 
 6.0 Structure of the Compiler 
 
 The CLEOPATRA compiler has been implemented in two
 passes. That is, the input or a transformation of it is
 examined in two distinct time frames. The first pass, the
 Analysis Pass, is the interface between the user's source
 program and the code generator. The analysis pass constructs
 the static program tree, examines the input to be sure that
 it is in proper syntactic and semantic form, and produces
 intermediate text (a hybrid transformation of the source),
 which is passed to the code generator. If any errors are
 detected during this analysis, the analysis pass sets the
 user condition code to 12, which suppresses the invocation
 of the code generator. If no errors are found, the code
 generator is called to convert the intermediate text into
 object code. The details of the code generator can be
 found in [3].
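
 The gating between the two passes can be summarized in a few
 lines. In this Python sketch the job-control condition code
 is modeled as a return value, and compile_program,
 AnalysisResult, and the analyze/generate callables are
 hypothetical stand-ins for the two passes. Only the value 12
 comes from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

SUPPRESS_CODE_GENERATION = 12   # user condition code named in the text


@dataclass
class AnalysisResult:
    intermediate_text: List[str]
    errors: List[str] = field(default_factory=list)


def compile_program(source: str,
                    analyze: Callable[[str], AnalysisResult],
                    generate: Callable[[List[str]], bytes]
                    ) -> Tuple[int, Optional[bytes]]:
    """Run the analysis pass; invoke the code generator only if it found no errors."""
    result = analyze(source)
    if result.errors:
        return SUPPRESS_CODE_GENERATION, None   # code generator is never called
    return 0, generate(result.intermediate_text)
```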
 
 The analysis pass may be broken into three basic phases
 (see Figure 3): Lexical Analysis, Syntactic Analysis, and
 Semantic Analysis. Though the pass has been broken into
 phases, it must be realized that only the analysis pass as a
 whole is a discrete entity. The analysis pass can be defined
 as the interaction of the three phases with the symbol table
 complex on the source program to produce an internal
 representation of the user program.
 
 [Figure 3. Analysis Pass. Diagram showing the Source Program,
 the Lexical Analysis Phase, the Syntactic Analysis Phase, the
 Semantic Analysis Phase, and the Symbol Table Complex.
 Solid line: flow of information. Broken line: flow of control.]
 
 In the pages that follow, the phases and the symbol table 
 complex will be dissected and discussed as to their function 
 in the compilation process. 
 
 6.1 Compiler Parameters
 
 During the design of the analysis pass, several
 compile-time parameters were incorporated into the code. In
 all cases, values for the parameters may be passed to the
 compiler in the parameter field of the job control language.
 If no parameters are passed, the default values are used.
 In general, the default parameters will be sufficient for a
 compilation. A listing of all options and their values for
 the current compilation is printed on a page before the
 source listing. The parameters are listed in Figure 4. Each
 is coded as "DEBUG." followed by the parameter name followed
 by "=value_to_be_used,". For the last parameter, the comma
 is replaced by a semicolon.
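
 The encoding rule can be made concrete with a small helper
 that builds the parameter string. The function build_parm is
 hypothetical, and the spelling of values (for example NO for
 a "No" switch) is an assumption, since the text does not
 show how values are written.

```python
from typing import Dict


def build_parm(options: Dict[str, str]) -> str:
    """Encode options as DEBUG.<name>=<value>, ... with ';' after the last one."""
    if not options:
        return ""            # no parameters: the defaults are used
    items = [f"DEBUG.{name}={value}" for name, value in options.items()]
    return ",".join(items) + ";"
```

 For example, build_parm({"LINECT": "50", "SOURCE": "NO"})
 yields DEBUG.LINECT=50,DEBUG.SOURCE=NO; which requests 50
 cards per page and, under the assumed value spelling, no
 source listing.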
 
 Parameter      Action                                          Default

 BLK_DUMP       Output block tables.                            Yes
 CONFIG_DUMP    Output configuration tables.                    Yes
 DEBUG_CRD      Output a trace of the compiler along with       30000
                selected values, beginning at this card
                number. To be used only on compiler error,
                as it generates considerable output.
 DEBUG_PROC     Same as above. It may be enabled either         No
                initially or by the above card number.

 Figure 4a. Compiler Parameters

 Parameter      Action                                          Default

 LST_TXT        Output a listing of the intermediate text;      No
                this requires a "//GO.ITEXT DD SYSOUT=A" card
                if the catalogued procedure is not used.
 LINECT         Number of source cards to be listed on a        58
                page.
 NAM_TAB_DUMP   Output the name table.                          Yes
 OPEN_TST       Output block tables, configuration tables,      No
                and selected values upon entering a new
                block.
 SOURCE         Output a source listing.                        Yes
 SYM_TAB_DUMP   Output the symbol table.                        Yes
 TYP_TAB_DUMP   Output the type tables.                         Yes

 Figure 4b. Compiler Parameters (continued)
 
 Since many large tables are required by the analysis
 pass, compilation began to require large amounts of main
 memory. Therefore, the analysis pass has been overlaid to
 reduce this requirement. The overhead in swapping the
 overlays is minimal compared to the saving in space.
 
 The analysis pass was originally designed to have
 expandable tables (symbol table, block table, etc.) for
 increased generality. Unfortunately, this feature had to be
 removed. As the amount of code increased, the cost of
 recompiling the analysis pass became prohibitive.
 Therefore, other methods (external procedures in PL/I, and
 use of load modules), which do not allow dynamic bounds on
 arrays, were employed. This reduced the cost of
 implementation but forced the use of statically bounded
 tables. In fact, this is not too limiting, since the code
 generator has a fixed bound on the size of the symbol table.
 
 6.2 Symbol Table Complex 
 
 The Symbol Table Complex is the major data base for the
 compiler. This is due not only to its information content,
 but also to its physical size and the time spent in its
 manipulation. Its size is in part due to the power of the
 compiler. The symbol table complex is used by all phases,
 although not all phases may change an entry. The symbol
 table complex is composed of five basic modules:

 1) Name Recognition

 2) Type Analysis

 3) Level/Configuration Table

 4) Configuration Table

 5) Type Table
 
 In the following sections each table will be presented 
 for an overview of its function. 
 
 6.2.1 Name Recognition
 
 The Name Recognition Table contains the actual
 character representation of the input tokens. This table
 consists of a linear string of all tokens concatenated
 together. The table also contains a vector of three
 pointers for each symbol table entry. The first is a
 pointer to the start of the token in the string. The second
 is a length count. The last is a link to the like-names
 (like but not equivalent). As previously mentioned, two
 similarly named symbols may exist provided they have
 different scopes. However, only one entry is kept in the
 name table in order to save space.
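
 The layout just described (one concatenated string plus a
 start, length, and like-name link per entry) can be sketched
 as follows. NameTable and its methods are hypothetical
 names, and the Python dictionary used to find an earlier
 spelling is only a stand-in for the hash lookup described in
 the next paragraph.

```python
from typing import Dict, List, Optional


class NameTable:
    """Concatenated token string plus (start, length, link) per symbol entry."""

    def __init__(self) -> None:
        self.text = ""                          # each distinct spelling stored once
        self.start: List[int] = []              # per entry: offset of token in text
        self.length: List[int] = []             # per entry: token length
        self.link: List[Optional[int]] = []     # per entry: earlier like-named entry
        self._last_entry: Dict[str, int] = {}   # stand-in for the hash lookup

    def add_entry(self, token: str) -> int:
        """Create an entry; reuse the stored spelling if the name already exists."""
        entry = len(self.start)
        prev = self._last_entry.get(token)
        if prev is None:
            self.start.append(len(self.text))
            self.text += token
        else:
            self.start.append(self.start[prev])  # share the one copy of the string
        self.length.append(len(token))
        self.link.append(prev)                   # chain to the earlier like-name
        self._last_entry[token] = entry
        return entry

    def spelling(self, entry: int) -> str:
        s = self.start[entry]
        return self.text[s:s + self.length[entry]]

    def like_names(self, entry: int) -> List[int]:
        """All entries sharing this entry's spelling, newest first."""
        chain = []
        current: Optional[int] = entry
        while current is not None:
            chain.append(current)
            current = self.link[current]
        return chain
```

 Two symbols named X in different scopes become two entries
 that point at the same characters, which is where the space
 saving comes from.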
 
 Name recognition is the first action taken by the 
 symbol table manager in searching for an entry. The search 
 technique employed is a hash table with chaining to resolve
 collisions. After some experimentation, a suitable hash
 function was chosen and mapped into 256 hashing buckets. 
 When two different tokens hash to the same bucket, the first 
 is inserted into the table in the usual fashion, but the 
 second is chained by a link from the first. 
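
 Hashing into 256 buckets with chaining can be sketched as
 below. The thesis does not say which hash function was
 chosen, so toy_hash is a hypothetical placeholder; only the
 bucket count and the chaining discipline (the first token
 occupies the bucket, later ones are linked from it) come from
 the text.

```python
from typing import List, Optional

NUM_BUCKETS = 256  # bucket count given in the text


def toy_hash(token: str) -> int:
    """Hypothetical hash; the function actually chosen is not given."""
    h = 0
    for ch in token:
        h = (h * 31 + ord(ch)) % NUM_BUCKETS
    return h


class ChainedTable:
    def __init__(self) -> None:
        self.buckets: List[Optional[int]] = [None] * NUM_BUCKETS  # head entry per bucket
        self.names: List[str] = []              # entry -> token
        self.chain: List[Optional[int]] = []    # entry -> next entry in the same bucket

    def lookup(self, token: str) -> Optional[int]:
        entry = self.buckets[toy_hash(token)]
        while entry is not None:
            if self.names[entry] == token:
                return entry
            entry = self.chain[entry]
        return None

    def insert(self, token: str) -> int:
        found = self.lookup(token)
        if found is not None:
            return found
        entry = len(self.names)
        self.names.append(token)
        self.chain.append(None)
        bucket = toy_hash(token)
        head = self.buckets[bucket]
        if head is None:
            self.buckets[bucket] = entry          # first token in the bucket
        else:
            while self.chain[head] is not None:   # walk to the end of the chain
                head = self.chain[head]
            self.chain[head] = entry              # later tokens chained from the first
        return entry
```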
 
 6.2.2 Type Analysis 
 
 The Type Analysis table contains the type 
 representation and parts of the scope values for symbols. 
 As discussed above, the lexical analysis phase inserts 
 values into the type analysis table. The fields and their 
 values are given below: 
 
 Field      Function

 Type       Type of symbol or returned type:
              1 .. Long_integer
              2 .. Character
              3 .. Error
              7 .. Pointer
              8 .. Bit
              9 .. User Defined
             13 .. Label
             15 .. Integer
 Constant   Set if constant data.
 Array      Set if array.
 Literal    Set if self-defining constant.
 Initial    Set if symbol is initialized.
 Local      Set if local symbol.
 Defer      Set if deferred storage.
 Formal     Set if formal parameter.
 Alias      Set if alias name.
 In_type    Set if part of user-defined type.
 Link       Set if link item (e.g. configuration name).
 Entry      Set if an entry item, with the following types:
              01 .. if procedure
              10 .. if unary operator
              11 .. if binary operator
 Ptr        Type of symbol pointed to (not implemented).
 Aptr       Set if points to array (not implemented).
 Unused     Analysis pass sets bit number 1 if the identifier
            is a data item, and 2 if it is a read-only data
            item.
 Blk        Configuration number of surrounding configuration.
 Blklevel   Nesting level of the configuration if the item is
            a configuration name, or the nesting level of the
            surrounding configuration if it is not a
            configuration.
 Atr1       Depends on the item.
 Atr2       Depends on the item.
 
 Most of the entries are self-explanatory from the context of 
 use. Those not clear are expanded upon below. 
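
 As a reading aid, one row of the type analysis table can be
 written as a record. This Python sketch is not the thesis's
 PL/I declaration: field widths and packing are not modeled,
 the class names are hypothetical, and treating an Entry
 value of 00 as "not an entry item" is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class TypeCode(IntEnum):
    """Values of the Type field, as listed in the table above."""
    LONG_INTEGER = 1
    CHARACTER = 2
    ERROR = 3
    POINTER = 7
    BIT = 8
    USER_DEFINED = 9
    LABEL = 13
    INTEGER = 15


class EntryKind(IntEnum):
    """Two-bit Entry field; 00 as "not an entry" is assumed."""
    NONE = 0b00
    PROCEDURE = 0b01
    UNARY_OPERATOR = 0b10
    BINARY_OPERATOR = 0b11


@dataclass
class TypeAnalysisRow:
    type_code: TypeCode
    constant: bool = False
    array: bool = False
    literal: bool = False
    initial: bool = False
    local: bool = False
    defer: bool = False
    formal: bool = False
    alias: bool = False
    in_type: bool = False
    link: bool = False
    entry: EntryKind = EntryKind.NONE
    unused: int = 0       # bit 1: data item; bit 2: read-only data item
    blk: int = 0          # configuration number of surrounding configuration
    blklevel: int = 0     # nesting level, as described above
    atr1: int = 0         # meaning depends on the item
    atr2: int = 0         # meaning depends on the item

    def is_operator(self) -> bool:
        return self.entry in (EntryKind.UNARY_OPERATOR, EntryKind.BINARY_OPERATOR)
```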
 
 Entry Type             Special Field Usage

 Configuration name     Reserve two symbol table rows.
                        Set type returned in TYPE of row two.
                        Link bit is set in rows one and two.
                        Entry bit is set in row two.
                        Atr1 in row one contains the
                          configuration number.
                        Atr2 in row one points to the
                          configuration name in the constant
                          table.
                        Atr1 in row two contains the number of
                          parameters to the procedure/operator.
                        Atr2 in row two contains a pointer to
                          the position of the entry pointed to
                          if the procedure returns a pointer
                          (not used).
 Data item              Atr1 is the maximum length if the item
                          is of type character. Otherwise Atr1
                          is the configuration number of the
                          surrounding configuration.
                        Atr2 points to the row of the value to
                          which the data item is initialized if
                          the item has the initial attribute.
 Character              Length in Atr1.
                        If initialized, Atr2 points to the
                          initial value in the constant table.
 Bit, Integer,          The value is in Atr2 unless
 Long_integer             long_integer, in which case the value
 constant                 is overlaid in Atr1 and Atr2.
 Alias                  Set alias bit and link if necessary.
                        Atr2 points to the major name.
 User-defined types     Blk set to zero for elements and
                          in_type bit is set.
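
 The overlay of a long_integer constant across Atr1 and Atr2
 can be illustrated as splitting one value into two fields.
 The field width is not given in the text: this Python sketch
 assumes 16-bit attribute fields, two's-complement storage,
 and the high half in Atr1, and the function names are
 hypothetical.

```python
from typing import Tuple

ATR_BITS = 16                      # assumed width of Atr1/Atr2 (not given in the text)
ATR_MASK = (1 << ATR_BITS) - 1


def store_constant(value: int, is_long: bool) -> Tuple[int, int]:
    """Return (atr1, atr2) for a Bit, Integer, or Long_integer constant."""
    if not is_long:
        return 0, value & ATR_MASK                  # value lives in Atr2 alone
    bits = value & ((1 << (2 * ATR_BITS)) - 1)      # two's-complement image
    return bits >> ATR_BITS, bits & ATR_MASK        # high half in Atr1, low in Atr2


def load_constant(atr1: int, atr2: int, is_long: bool) -> int:
    """Inverse of store_constant, sign-extending the stored image."""
    width = 2 * ATR_BITS if is_long else ATR_BITS
    bits = ((atr1 << ATR_BITS) | atr2) if is_long else atr2
    if bits >= 1 << (width - 1):
        bits -= 1 << width
    return bits
```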
 
 The type analysis table is one of two tables passed to
 the code generator. The other is the constant table. The
 constant table contains the following entries:

 1) Seven-character unique configuration names, as
    described previously.

 2) The number of extents and bounds for arrays.

 3) The values of character literals.
 
 Both tables are passed to the code generator in the
 intermediate text file. On entry to a routine block, the
 type analysis table entries from the last element
 transmitted through the current top of the table, together
 with the whole constant table, are placed in the intermediate
 text file. (The last element of the type analysis table is
 denoted by eleven ones in the unused field.) This is
 followed by the intermediate text for the routine.
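
 The incremental transmission can be sketched as a writer that
 remembers how much of the type analysis table it has already
 sent. IntermediateTextWriter and the record tags are
 hypothetical. The sketch appends a separate marker row with
 eleven ones in the unused field; whether the original instead
 set those bits on the final real row is not specified.

```python
from typing import List, Tuple

END_MARK = 0b11111111111   # "eleven ones in the unused field"


class IntermediateTextWriter:
    def __init__(self) -> None:
        self.sent_upto = 0                          # first type-analysis row not yet sent
        self.records: List[Tuple[str, object]] = []

    def enter_routine(self, type_rows: List[dict], constants: List[object],
                      routine_text: List[str]) -> None:
        """Emit new type rows, then the whole constant table, then the routine."""
        for row in type_rows[self.sent_upto:]:      # only rows added since last time
            self.records.append(("TYPE", row))
        self.records.append(("TYPE", {"unused": END_MARK}))  # end-of-table marker
        self.sent_upto = len(type_rows)
        for const in constants:                     # the constant table is resent whole
            self.records.append(("CONST", const))
        for line in routine_text:
            self.records.append(("TEXT", line))
```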
 
 
 6.2.3 Level/Configuration Table 
 
 The Level/Configuration table contains the information 
 about the activation of symbols. One element of the table 
 is a field that links together all symbols of the same 
 configuration. This field is used for activation of symbols 
 upon entry to a new block. Other fields are used to form a
 chain of entries and their predecessors. 
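
 Activation through the configuration chain can be sketched
 as follows: opening a block walks the chain of its
 configuration's symbols and makes each one visible, saving
 any like-named symbol it hides as its predecessor; closing
 the block reactivates those predecessors. The class and
 method names are hypothetical Python stand-ins for the PL/I
 tables.

```python
from typing import Dict, Iterator, List, Optional


class LevelConfigTable:
    """Symbols chained by configuration, with predecessor links for like names."""

    def __init__(self) -> None:
        self.name: List[str] = []
        self.config_link: List[Optional[int]] = []  # next symbol of the same configuration
        self.predecessor: List[Optional[int]] = []  # like-named symbol hidden by this one
        self.head: Dict[int, Optional[int]] = {}    # configuration -> first symbol
        self.active: Dict[str, int] = {}            # name -> currently active symbol

    def add_symbol(self, config: int, name: str) -> int:
        sym = len(self.name)
        self.name.append(name)
        self.config_link.append(self.head.get(config))  # push onto configuration chain
        self.predecessor.append(None)
        self.head[config] = sym
        return sym

    def _chain(self, config: int) -> Iterator[int]:
        sym = self.head.get(config)
        while sym is not None:
            yield sym
            sym = self.config_link[sym]

    def open_block(self, config: int) -> None:
        for sym in self._chain(config):
            self.predecessor[sym] = self.active.get(self.name[sym])
            self.active[self.name[sym]] = sym

    def close_block(self, config: int) -> None:
        for sym in self._chain(config):
            pred = self.predecessor[sym]
            if pred is None:
                self.active.pop(self.name[sym], None)
            else:
                self.active[self.name[sym]] = pred  # reactivate the predecessor
```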
 
 6.2.4 Configuration Table 
 
 The Configuration Table is a presentation of the static 
 program tree. It consists of a list of configuration 
 numbers along with their immediate predecessors. There is 
 also a pointer to a symbol number in the level/configuration 
 table which thus links all elements of the same 
 configuration. This table is built mainly from the 
 structure blocks. 
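
 Since each configuration records its immediate predecessor,
 the path from any configuration up to the root of the static
 tree is a simple walk. The Python sketch below assumes the
 usual block-structured rule that a name declared in a
 configuration is visible in its descendants; the function
 names and the dictionary representation are hypothetical.

```python
from typing import Dict, List, Optional


def scope_path(parent: Dict[int, Optional[int]], config: int) -> List[int]:
    """Walk immediate-predecessor links from `config` up to the root."""
    path = []
    current: Optional[int] = config
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def is_visible(parent: Dict[int, Optional[int]], declared_in: int, used_in: int) -> bool:
    """Assumed rule: a name is visible in its own configuration and its descendants."""
    return declared_in in scope_path(parent, used_in)
```

 For example, with parent = {1: None, 2: 1, 3: 2, 4: 1}, the
 path from configuration 3 is [3, 2, 1], so names from
 configuration 1 are visible in 3 but names from the sibling
 branch 4 are not.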
 
 6.2.5 Type Table 
 
 The Type Table holds the attributes of parameters to
 operators and procedures. When operators and procedures are
 declared, only the type of each parameter is given. It is
 at the point when the operator or procedure is called that
 the actual identifier is found. Parameters can be of any
 available type, arrays, and left or right operands in the
 case of operators. There is no specific bit in the type
 table to denote an array entry. This is done indirectly by
 recording the number of extents for the entry. The number of
 extents is always zero unless the item is an array, in which
 case it has a positive value. The semantic processor
 requires this information in type checking.
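
 The extent-count convention can be shown in a few lines. In
 this Python sketch ParamEntry, is_array, and same_shape are
 hypothetical; same_shape is one plausible way a type check
 could use the extent count, not the thesis's routine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamEntry:
    """One hypothetical Type Table entry for a parameter."""
    type_code: int        # same codes as the Type field (e.g. 15 = Integer)
    extents: int = 0      # zero for a scalar, positive for an array
    operand: str = ""     # "left" or "right" for operator parameters


def is_array(entry: ParamEntry) -> bool:
    # There is no array bit; a positive extent count is the only indication.
    return entry.extents > 0


def same_shape(expected: ParamEntry, actual: ParamEntry) -> bool:
    """Type and number of extents must both agree."""
    return expected.type_code == actual.type_code and expected.extents == actual.extents
```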
 
 6.3 Lexical Analysis 
 
 As shown in Figure 3, the Lexical Analysis phase is the
 first phase encountered by the input. Because the analysis
 pass is performed as a single pass, the phases interact.
 The lexical analysis phase performs the most rudimentary,
 albeit important, work on the input. First, the input
 stream is tokenized according to the productions in the
 language specification. Comments and blanks are eliminated
 (see Figure 3).
 
 The scanner is able to recognize and resolve certain
 types: bit values, integers, and long_integers. Where the
 scanner is able to determine the type (from the syntax),
 the type may be inserted into the symbol table. When the
 scanner is not able to make this determination, it is
 still able to reduce the possibilities, and it inserts an
 unresolved type.
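
 The division of labor between the scanner and the syntax
 analyzer can be illustrated with a small tokenizer that
 resolves literal types where the syntax allows and marks
 names as unresolved. CLEOPATRA's lexical rules are not given
 here, so everything concrete is an assumption: PL/I-style
 /* */ comments and '1010'B bit strings, decimal integers
 only, and a 16-bit integer range above which a literal is
 taken to be a long_integer.

```python
import re
from typing import List, Tuple

# Hypothetical lexical syntax; the thesis does not give CLEOPATRA's rules.
TOKEN_RE = re.compile(r"""
    (?P<comment>/\*.*?\*/)            # comments are discarded
  | (?P<blank>\s+)                    # blanks are discarded
  | (?P<bits>'[01]*'B)                # bit value, e.g. '1010'B
  | (?P<number>\d+)                   # decimal integer literal
  | (?P<name>[A-Za-z_][A-Za-z_0-9]*)  # identifier or keyword
  | (?P<op>[^\sA-Za-z_0-9])           # any other single character
""", re.VERBOSE | re.DOTALL)

INTEGER_MAX = 2**15 - 1   # assumed integer range; larger values become long_integer


def scan(source: str) -> List[Tuple[str, str]]:
    """Return (token, type) pairs; 'unresolved' where syntax alone cannot decide."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"cannot scan at offset {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind in ("comment", "blank"):
            continue
        if kind == "bits":
            tokens.append((text, "bit"))
        elif kind == "number":
            value = int(text)
            tokens.append((text, "integer" if value <= INTEGER_MAX else "long_integer"))
        elif kind == "name":
            tokens.append((text, "unresolved"))   # left for the syntax analyzer
        else:
            tokens.append((text, "operator"))
    return tokens
```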
 
 The scanner calls upon a routine of the Symbol Table
 Manager to look up the symbol in the symbol table.
 Depending on the context, another routine can be called to
 insert the token into the Symbol Table Complex. Other
 utilities include reading routines and a routine to convert
 radixes, since CLEOPATRA accepts integers in binary, decimal,
 or hexadecimal. There is also a routine to convert
 characters to their integer representation, and integers to
 their character representation. The Lexical Analysis phase
 is driven by the Syntax Analyzer. The Syntax Analyzer
 "knows" what to look for and the context in which it
 resides. It thus knows whether it is in a declarative
 context or merely looking for a defined token.
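
 The conversion utilities amount to the classic digit
 accumulation loop and its inverse. This Python sketch uses
 hypothetical names and does not model how a CLEOPATRA literal
 announces its radix, which is not stated here; the caller
 passes the radix explicitly.

```python
DIGITS = "0123456789ABCDEF"


def digits_to_int(digits: str, radix: int) -> int:
    """Accumulate digits left to right: value = value * radix + digit."""
    if radix not in (2, 10, 16):
        raise ValueError("integers are binary, decimal, or hexadecimal")
    value = 0
    for ch in digits.upper():
        d = DIGITS.find(ch)
        if d < 0 or d >= radix:
            raise ValueError(f"digit {ch!r} not valid in radix {radix}")
        value = value * radix + d
    return value


def int_to_digits(value: int, radix: int = 10) -> str:
    """Integer to character representation (non-negative values only)."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value == 0:
        return "0"
    out = []
    while value:
        value, d = divmod(value, radix)
        out.append(DIGITS[d])
    return "".join(reversed(out))
```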
 
 6.4 Syntactic Analysis
 
 If one phase of the compiler could be considered the 
 driver for the analysis pass, the Syntactic Analysis phase 
 could qualify. It is the syntax of the language upon which 
 the parse is based. 
 
 The syntax analyzer is broken (and overlaid) into five
 major modules. These modules follow the same lines as the
 block structure: Local Data, Global Data, Local Structure,
 Global Structure, and Type Data. The basic flow of the
 syntax analyzer is: while source text exists, the main
 program determines the type of block that is being
 presented and calls the proper block processor. The block
 processors are not completely self-contained; they all
 share certain modules such as the error handler. However,
 once parsing begins on one type of block, the other block
 parsers are unnecessary. The selected processor then
 continues parsing until the block terminates. At that
 point, control returns to the main program for selection of
 the next block.
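
 The main loop can be sketched as a dispatcher. In this
 Python illustration each block arrives as a hypothetical list
 of tokens whose first token names the block type, and the
 processors are plain callables; in the real analysis pass
 only the selected processor's overlay would need to be
 resident.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical block processor: parses one whole block, returns a summary.
BlockProcessor = Callable[[List[str]], str]


def run_syntax_analyzer(blocks: Iterable[List[str]],
                        processors: Dict[str, BlockProcessor]) -> List[str]:
    """While source text exists, pick the processor for each block and run it."""
    results = []
    for block in blocks:
        kind = block[0].upper()          # assumed: the block keyword comes first
        processor = processors.get(kind)
        if processor is None:
            results.append(f"error: unknown block type {block[0]}")
            continue
        results.append(processor(block))  # parses until the block terminates
    return results
```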
 
 
 Several different parsing methods were considered for
 the current implementation. Upon examination of the BNF
 specifications, it was found that in almost all cases the
 parsing machinery could determine "where it was" in the
 parse by looking at the current symbol. In the worst case,
 one symbol of look-ahead was necessary. For this reason, a
 recursive descent parser was chosen for all elements in the
 language. Although this is not as fast as an LR parser, its
 implementation was quicker and did not require the
 calculation of parsing tables.
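
 A toy recursive descent parser shows what "one symbol of
 look-ahead" means in practice. The grammar in this Python
 sketch is hypothetical, not CLEOPATRA's: a statement starting
 with a name may be an assignment or a call, so the parser
 peeks at the following symbol, while every other decision is
 made from the current symbol alone.

```python
from typing import List


class Parser:
    """Recursive descent over a token list with one symbol of look-ahead.

    Hypothetical grammar (not CLEOPATRA's):
        statement ::= NAME ":=" expr | NAME "(" ")"
        expr      ::= term { "+" term }
        term      ::= NUMBER | NAME | "(" expr ")"
    """

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens + ["<eof>"]
        self.pos = 0

    def current(self) -> str:
        return self.tokens[self.pos]

    def peek(self) -> str:                       # the one-symbol look-ahead
        return self.tokens[min(self.pos + 1, len(self.tokens) - 1)]

    def expect(self, tok: str) -> None:
        if self.current() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.current()!r}")
        self.pos += 1

    def statement(self) -> tuple:
        name = self.current()
        if self.peek() == ":=":                  # current symbol alone is not enough
            self.pos += 2
            return ("assign", name, self.expr())
        self.pos += 1
        self.expect("(")
        self.expect(")")
        return ("call", name)

    def expr(self) -> object:
        node = self.term()
        while self.current() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self) -> object:
        tok = self.current()
        if tok == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        if tok == "<eof>" or tok in {"+", ")", ":="}:
            raise SyntaxError(f"unexpected {tok!r}")
        self.pos += 1
        return tok
```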
 
 
 Error recovery is handled on a "need to know" basis.
 The parser continues parsing after an error is encountered
 until it is unable to determine where to continue in the
 parse. When this occurs, the parser flushes the input until
 it is able to determine where to continue. In the worst
 case, this means flushing the current statement, which may be
 a compound statement (e.g. a FOR loop). In other cases,
 only part of a statement is flushed. In any case, one or
 more error messages will be output. If a compiler error
 occurs, at least one error message is printed (which would
 likely be the cause of the severe error), and all tables are
 printed along with selected compiler variable values.
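
 Flushing to a point where parsing can resume is a form of
 panic-mode recovery. In this Python sketch the synchronizing
 symbol (";") and the compound-statement brackets (DO ... END)
 are hypothetical choices, since the text does not list them.
 Nested compound statements are skipped as a whole, which
 matches the worst case of flushing an entire FOR loop.

```python
from typing import List, Set, Tuple

# Hypothetical synchronizing symbols; the thesis does not list them.
SYNC: Set[str] = {";"}


def flush_to_sync(tokens: List[str], pos: int) -> Tuple[int, List[str]]:
    """Skip tokens up to and including a statement boundary at nesting depth 0."""
    skipped = []
    depth = 0
    while pos < len(tokens):
        tok = tokens[pos]
        skipped.append(tok)
        pos += 1
        if tok == "DO":
            depth += 1            # entering a nested compound statement
        elif tok == "END" and depth > 0:
            depth -= 1            # leaving it
        elif tok in SYNC and depth == 0:
            break                 # statement boundary: parsing can resume here
    return pos, skipped
```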
 
 An important function of the syntax analyzer is to
 insert values into the symbol table complex. It is at this
 point that the types of symbols are resolved. Recall that
 the lexical analysis phase inserted a type that was usually
 unresolved. The syntax analyzer is able to determine the
 types that the lexical analyzer cannot distinguish.

 6.5 Semantic Analysis 
 
 It has been stated on several occasions that CLEOPATRA
 is a very type-conscious language. That is, mixed-mode
 operations are not allowed except by user-defined operators
 designed for this purpose. It is the task of the semantic
 processing modules to monitor all types and their usage.
 
 The primary semantic processor is associated with the 
 expression parser. Each operand is pushed onto a type 
 stack. As the associated operator arrives, the type of the 
 top entry (or entries for binary operators) is compared to 
 the type of the operator. If the types match, the parse 
 continues. If the types are unmatched, the semantic 
 processor searches the symbol table via the configuration 
 link to find another entry for the operator which has the 
 proper types. If the proper type is not found, an error 
 message is emitted and the call to the code generator is 
 inhibited by setting the condition code. 
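
 The type-stack discipline can be sketched as follows. Here
 the dictionary of operator signatures is a hypothetical
 stand-in for the chain of symbol table entries reached
 through the configuration link, and TypeChecker is an
 illustrative name. A failed match records an error, pushes
 an "error" type so parsing can continue, and forces condition
 code 12.

```python
from typing import Dict, List, Tuple

# (operand types, result type); a stand-in for symbol-table operator entries.
Signature = Tuple[Tuple[str, ...], str]


class TypeChecker:
    def __init__(self, operators: Dict[str, List[Signature]]) -> None:
        self.operators = operators
        self.stack: List[str] = []             # the type stack
        self.errors: List[str] = []

    def operand(self, type_name: str) -> None:
        self.stack.append(type_name)

    def operator(self, symbol: str, arity: int) -> None:
        if len(self.stack) < arity:
            self.errors.append(f"missing operand for {symbol}")
            self.stack.append("error")
            return
        split = len(self.stack) - arity
        args = tuple(self.stack[split:])
        del self.stack[split:]
        for param_types, result in self.operators.get(symbol, []):
            if param_types == args:            # another entry with the proper types
                self.stack.append(result)
                return
        self.errors.append(f"no {symbol} for operand types {args}")
        self.stack.append("error")             # keep parsing; code generation suppressed

    def condition_code(self) -> int:
        return 12 if self.errors else 0
```

 With only integer+integer and long_integer+long_integer
 entries for "+", an integer added to a long_integer is
 rejected, which is the mixed-mode restriction described
 above.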
 
 Another important aspect of semantics in CLEOPATRA is
 the analysis of parameter lists. The full connection is a
 three-way association. The declaration of a procedure or
 operator specifies the attributes of the parameters. The
 data block for the respective block must declare a variable
 of each of the attributes to serve as a target for the
 parameter. At this point no parameter checking is done,
 however, since the positional association has not yet been
 made. The third component of the association ties the
 package together. In the definition of the routine block, a
 name_list is specified if there are parameters to the
 routine. At this point identifiers must be placed
 positionally so that each identifier's type agrees with the
 type at that position in the declaration of the procedure
 or operator. A semantic processor then compares, for each
 position, the expected type against the type of the
 identifier. If there is a mismatch, including too few or
 too many parameters, an error message is generated.
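
 The three-way association reduces to a positional comparison
 once all three parts are known. In this Python sketch the
 declaration supplies a list of types, the routine block's
 name_list supplies identifiers in order, and the data block
 is a mapping from identifier to type; check_name_list and
 the message wording are hypothetical.

```python
from typing import Dict, List


def check_name_list(declared: List[str], name_list: List[str],
                    data_block: Dict[str, str], routine: str) -> List[str]:
    """Compare a routine's name_list with its declaration, position by position.

    declared:   parameter types from the procedure/operator declaration
    name_list:  identifiers given in the routine block, in order
    data_block: identifier -> type, from the routine's data block
    """
    errors = []
    if len(name_list) != len(declared):
        errors.append(f"{routine}: expected {len(declared)} parameter(s), "
                      f"found {len(name_list)}")
    for position, (expected, ident) in enumerate(zip(declared, name_list), start=1):
        actual = data_block.get(ident)
        if actual is None:
            errors.append(f"{routine}: parameter {position} ({ident}) not declared")
        elif actual != expected:
            errors.append(f"{routine}: parameter {position} ({ident}) is {actual}, "
                          f"expected {expected}")
    return errors
```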
 
 7.0 Conclusions 
 
 The objective of the current research has been to
 implement a parser and intermediate text generator for the
 CLEOPATRA subset language. Further, this implementation was
 to serve as a test of the feasibility of the language
 itself and as an analysis of the algorithms and data bases
 required to provide the facilities of the CLEOPATRA
 language. In general, the research has been successful in
 this regard. Certain limitations were placed on the full
 language, but every attempt was made to keep the language
 intact.
 
 Is the subset implementation the perfect language?
 Certainly we would like to respond affirmatively to that
 question but, in fact, the answer is "not really". All of
 the major control structures and blocks have been
 implemented, but other important features have been
 omitted. These omissions include (recalling from
 Chapter 5):

 1) Pointers;

 2) Complete freedom in the order of presentation of
    blocks for compilation;

 3) Allowing arrays within a user-defined type (see [4]).
 
 Notwithstanding these omissions, the language CLEOPATRA
 and the present subset are viable tools in the programmer's
 repertoire. During the coding of the analysis pass, the
 facilities of CLEOPATRA would have made implementation much
 easier and cleaner. Further, the language leads quite
 naturally to well-structured, clean programs. The data
 bases used in the subset should be sufficient for later
 implementations. Indeed, as a test of feasibility, the
 present implementation has demonstrated that the language is
 feasible. The present research has indicated a course of
 action for further efforts on the implementation of the full
 language.
 
 Initially, a study of the use, need, and desirability of
 the basic type POINTER should be undertaken. As stated in
 Chapter 5, it has been argued that pointers are an artifact
 of compilers gone by. The first step should be a careful
 analysis of whether pointers should be available at all and,
 if they are to be allowed, what should be a legal target for
 a pointer. If the type POINTER is to be retained, the
 method of implementation should be examined. This would
 answer such questions as: 1) "How is type checking to be
 handled?"; and 2) "How can arrays of pointers be
 implemented?".
 
 The next area of study should be symbol table
 design. In Chapter 5, it was stated that the present symbol
 table is insufficient for indirect pointers. This problem,
 along with some general restructuring, should be
 undertaken, possibly in conjunction with the study of
 pointers. (Suggestions have been given in Chapter 5.)
 
 The presentation order of blocks should be examined
 relative to the bit map discussed in Chapter 5. As stated,
 the bit map seems to be the "cheapest" method to allow
 maximum flexibility in presentation order while retaining
 the general two-pass method. This problem is somewhat less
 crucial, since the restriction used in the subset does not
 appear too restrictive. However, the algorithm should not
 be too difficult to implement if the bit map proves correct
 for all cases.
 
 The last major area to be examined before implementing 
 the full language is a study of optimal parsing and code 
 generation methods for the CLEOPATRA language. A table-driven
 method might be considered, although table size might
 make such a method impractical. In all considerations, the
 design of a well-structured compiler with the capability of
 being overlaid should be paramount.
 
 Once the above analyses have been completed, the full
 CLEOPATRA language could be implemented. With the lessons
 of the subset and the suggested analyses, CLEOPATRA would
 indeed be a very effective programming language.
 
 References 
 
 [1] Schreiner, Axel T., "A Proposal for Another System
     Implementation Language", Ph.D. Thesis, Department of
     Computer Science, University of Illinois, Urbana,
     Illinois, 1974.

 [2] Schreiner, Axel T., "Comprehensive Language for Elegant
     Operating System and Translator Design", Technical
     Report UIUCDCS-R-74-646, Department of Computer
     Science, University of Illinois, Urbana, Illinois,
     1974.

 [3] Halbur, John D., "A Code Generator for the CLEOPATRA
     Language", Master's Thesis, Department of Computer
     Science, University of Illinois, Urbana, Illinois,
     1975.

 [4] Halbur, John D., "CLEOPATRA Code Generator User's
     Guide", Technical Report UIUCDCS-R-76-740, Department
     of Computer Science, University of Illinois, Urbana,
     Illinois, 1976.

 [5] Gries, David, Compiler Construction for Digital
     Computers, John Wiley and Sons, Inc., New York, 1971.

 [6] Bauer, F. L., Eickel, J., Compiler Construction: An
     Advanced Course, Springer-Verlag, New York, 1974.

 [7] Hoare, C. A. R., "Notes on Data Structuring", in Dahl,
     Dijkstra, and Hoare, Structured Programming, Academic
     Press, New York, pp. 83-174, 1972.
 
im^ 
 
 BIBLIOGRAPHIC DATA SHEET

 1. Report No.: UIUCDCS-R-76-834

 4. Title and Subtitle: IMPLEMENTATION OF THE LANGUAGE CLEOPATRA:
 THE ANALYSIS PASS

 5. Report Date: October 1976

 7. Author(s): Scott Harley Fisher

 9. Performing Organization Name and Address:
 Department of Computer Science
 University of Illinois at Urbana-Champaign
 Urbana, Illinois 61801

 12. Sponsoring Organization Name and Address:
 Department of Computer Science
 University of Illinois at Urbana-Champaign
 Urbana, Illinois 61801

 13. Type of Report & Period Covered: Master's Thesis
 
 16. Abstract

 CLEOPATRA is a general-purpose language with features suitable for
 systems programming. A compiler for the language CLEOPATRA has been
 implemented in two passes. This report describes the analysis pass, which
 produces an intermediate text suitable for the code generation pass. The
 analysis pass was written in PL/I for the IBM 360 computer. Because of the
 facilities of the language, the analysis pass requires innovative data
 structures and algorithms; these are reported herein.
 
 17. Key Words and Document Analysis

 17a. Descriptors:
 Block Structured Language
 Compilation
 Compilers
 Intermediate Text
 Programming Languages
 Symbol Table Management
 Parsing

 17b. Identifiers/Open-Ended Terms:
 Systems Implementation Languages
 CLEOPATRA

 18. Availability Statement: RELEASE UNLIMITED

 19. Security Class (This Report): UNCLASSIFIED

 20. Security Class (This Page): UNCLASSIFIED
 