 
Report No. UIUCDCS-R-73-596
 
 A GENERALIZED LEXICAL SCANNER FOR A TRANSLATOR WRITING SYSTEM 
 
 by 
 
 Albert Cannon Baker, Jr.
 
 October 1973 
 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS
 
Report No. UIUCDCS-R-73-596 
 
 A GENERALIZED LEXICAL SCANNER FOR A TRANSLATOR WRITING SYSTEM* 
 
 by 
 Albert Cannon Baker, Jr.
 
 October 1973 
 
 Department of Computer Science 
 University of Illinois at Urbana-Champaign 
 Urbana, Illinois 61801 
 
 * This work was supported in part by the National Science Foundation under
 Grant No. US NSF-GJ-328 and was submitted in partial fulfillment of the 
 requirements for the degree of Master of Science in Computer Science, 
 October 1973. 
 
 
A GENERALIZED LEXICAL SCANNER FOR A TRANSLATOR WRITING SYSTEM 
 
 Albert Cannon Baker, Jr., M.S. 
 Department of Computer Science 
 University of Illinois at Urbana-Champaign, 1973 
 
 This is an expository paper that is concerned with a Lexical
 Scanner for a translator writing system that has been in use at the
 University of Illinois. Its significant features include a structured,
 binary-tree symbol table, a parameterized macro expander, and a compile-time
 flexibility for assigning characters that make up the basic terminal
 symbols. A comprehensive example of the scanner's operation is also
 included.
 
 
 ACKNOWLEDGMENT 
 
 I wish to express my deepest thanks to Professor J. R. Phillips 
 for his continuing support as academic advisor and thesis supervisor; to 
 Professor R. S. Northcote, who struck in me the spark of interest in compiler 
 design and defined much of the material for the initial lexical scanner; and 
 to Norma E. Able for her patience through many discussions leading me through 
 the maze of TRANQUIL and the TWS. 
 
 Finally, to my wife Carol, thanks for the continued encouragement 
 and motivation, without which the successful completion of these studies 
 would never have been possible. 
 
 The Air Force Institute of Technology, Wright Patterson Air Force 
 Base, Ohio, sponsored the author's studies at the University of Illinois. 
 
 
 TABLE OF CONTENTS 
 
 1. INTRODUCTION
    1.1 The Translator Writing System
    1.2 The Lexical Scanner

 2. SCANNER DATA STRUCTURES
    2.1 General Considerations
    2.2 BIGTAB - The Symbol Table
        2.2.1 Basic Format
        2.2.2 Storage for Keywords, Identifiers and String Literals
        2.2.3 Storage for Numeric Literals
    2.3 The SCAN Descriptor
    2.4 MACROTAB - The Macro Text Table
        2.4.1 Concept of a Definitional Facility
        2.4.2 The TWS Macro Facility
        2.4.3 MACROTAB Descriptors
    2.5 CHARCLASS - The Character Class Table

 3. MAIN SCANNER PROCEDURES
    3.1 READACARD
    3.2 NEXTCHAR
    3.3 Boolean Procedure TABLESEARCH
        3.3.1 Table Lookup
        3.3.2 Processing of an Entry Already in BIGTAB
        3.3.3 Insertion of a New Entry Into BIGTAB
    3.4 Procedures to Build BIGTAB Entries
    3.5 The Macro Facility
    3.6 Alpha Procedure SCAN
    3.7 Integer Procedure SHAKEOUTBIGTAB

 APPENDIX
    A SYNTAX AND SEMANTICS OF SCANNER-DEFINED ITEMS
    B SAMPLE PROGRAM

 LIST OF REFERENCES
 
LIST OF FIGURES 
 
 1. BIGTAB Basic Format
 2. Format of the SCAN Descriptors
 3. Table Lookup Algorithm
 4. Examples of Balanced Tree Structures
 5. Unbalanced Tree Structure
 6. Balanced Tree Structure
 
1. INTRODUCTION 
 
 1.1 The Translator Writing System 
 
 One of the software efforts that was undertaken for the Illiac IV 
 project at the University of Illinois at Urbana-Champaign was the development 
 of a Translator Writing System which permits the implementation of the various 
 compilers; in particular, the compilers for the Illiac IV problem-oriented 
 languages TRANQUIL [1] and GLYPNIR [2] are the prime examples. The main 
 components of the TWS are: 
 
 a) syntax meta-languages (TWINKLE [7] and TBNF [4]), which
 are extensions to Backus-Naur Form [8, 9], and in which
 the syntax of a programming language L is specified;

 b) a semantics meta-language (Illinois Semantics Language ISL
 [5]) which is an extension of Burroughs Extended Algol that
 includes special constructs to manipulate tables, stacks
 and generate object code; and in which the semantics of a
 programming language L is specified;

 c) a basic core for each translator consisting of the lexical
 scanner, either the skeleton parser for a direct parsing
 algorithm translated into Algol code or a complete table-driven
 parser for an interpretive parsing algorithm [4],
 and miscellaneous auxiliary procedures which are independent
 of the source and object languages of the translator;

 d) a system consisting of syntax preprocessors to generate
 either the tables for the interpretive parser or a set of
 Algol source statements that will parse the language
 specified; the ISL translator to translate the ISL extensions
 into Algol; a program to generate from the outputs
 of the syntax preprocessor, the ISL translator, and
 the basic core a complete Algol program which, when
 compiled by the standard Algol compiler, will be the required
 translator for the language L as specified by the
 syntax and semantics.
 
 1.2 The Lexical Scanner
 
 The function of a lexical scanner in a compiler is to scan characters
 from a source program, combining one or more of them together to form
 single terminal symbols when the syntactic recognizer (parser) makes a request
 for a new symbol. As far as the TWS and the scanner are concerned, the
 following symbols are deemed to be terminal symbols:
 
 a) special single characters ($,:-+=, etc.); 
 
 b) key words in the language, including either reserved 
 identifiers or special identifiers for which a special 
 character (nominally "#") precedes the identifier (BEGIN, 
 #ELSE, etc.); 
 
 c) identifiers (X, CASHIN1STNATIONALBANK, etc.);
 
 d) string and numeric literals ("THIS IS A STRING", 3.14159, 
 @ 234, etc.). 
 
 Internal to the scanner is a powerful parameterized text-type [3]
 macro expander which has the capability to recognize and store declarations of 
 defined identifiers, and to regurgitate the stored text when the identifier is 
 used subsequently. This facility is transparent to the syntactic recognizer 
 and, except for block structure considerations, is transparent to the semantics 
 
 
as well. Section 2.4 contains a discussion of the data structure of the 
 macros, and Appendix A includes a discussion of the syntax and semantics of 
 the macro generator. 
 
 The symbol table used by the scanner is a straightforward binary- 
 tree structure, with disjoint trees for the several terminal symbol classes 
 interleaved within the same table. Binary Coded Decimal (BCD) information 
 is stored in this table packed six characters per 48-bit Burroughs B-5500 
 word. Section 2.2 contains a complete description of the symbol table. 
 
 There is much flexibility built into the scanner to make the resultant
 compilers both more general and easier to use. Through control card
 options, the user of the compiler may specify that non-standard symbols be
 used to define the terminal classes, such as using "8" instead of "@" to mark
 the exponent part of a numeric literal, or that the key words in the language
 would be marked by a special symbol, freeing these identifiers for the
 programmer's use. The inclusion of a macro facility gives the programmer the
 power to extend the basic language, or to make one language resemble another,
 or to make his source code appear more readable. For example, if the compiler
 for an Algol-like language were written using the TWS, one could extend the
 language at compile time by adding appropriate macro definitions to a program
 written in it to make it resemble COBOL:
 
 DEFINE ADDING = MEND , 
 TO = + MEND , 
 COMPUTE = MEND , 
 BY = := MEND ; 
 
 where MEND terminates a macro definition. Then the source language statement: 
 
 COMPUTE XYZ BY ADDING A TO B; 
 
would be compiled as: 
 
 XYZ := A+B 
 
 since ADDING and COMPUTE were both defined to be null. 
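
 The effect of these four definitions can be imitated with a minimal Python
 sketch. It is not the TWS implementation: it works on a list of already
 separated symbols, ignores parameters and nested definitions, and the names
 expand and MACROS are hypothetical.

```python
def expand(tokens, macros):
    """Replace each defined identifier by its stored text (a list of symbols)."""
    out = []
    for tok in tokens:
        if tok in macros:
            out.extend(macros[tok])  # a null definition contributes nothing
        else:
            out.append(tok)
    return out


# Stored text for the DEFINE declaration above; MEND only ends each text.
MACROS = {
    "ADDING": [],
    "TO": ["+"],
    "COMPUTE": [],
    "BY": [":="],
}

source = ["COMPUTE", "XYZ", "BY", "ADDING", "A", "TO", "B", ";"]
expanded = expand(source, MACROS)
print(" ".join(expanded))
```

 The parser would see only the expanded symbols, which is the sense in which
 the facility is transparent to it.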
 
 Thus, to reiterate, the main functions of this lexical scanner are
 to assemble the terminal symbols from the source string, to pass simple
 representations of those symbols to the syntactic recognizer, to maintain the BCD
 symbol table, and to perform macro expansion. The data structures behind these
 functions are the subject of Chapter 2, and a functional description of these
 functions in terms of the Algol procedures that implement them is the subject
 of Chapter 3.
 
2. SCANNER DATA STRUCTURES 
 
 2.1 General Considerations 
 
 The structure of the internal tables and the algorithms to use them
 will have a great effect on the speed and efficiency of any program. The
 lexical scanner is one of the most used procedures in any compiler, and
 attention must be paid to make it as efficient as possible. In the
 implementation described here, one of the main considerations is the structure of
 the language into which the compilers are translated, Burroughs extended
 Algol for the B-5500. A brief introduction to the B-5500 and its constraints
 on the Algol language is appropriate here.
 
 The B-5500 [8] is a multiprogramming, multiprocessor computer 
 system. With a limited (32K 48-bit words) main memory, it relies heavily on 
 segmentation of both programs and data to make most effective use of limited 
 memory to service the various programs in the mix. Specifically, programs 
 and arrays are broken down into segments, each no larger than 1024 words. 
 The program segments are stored on the disk. When a program enters the mix
 to be run, it is assigned a fixed, non-overlayable, contiguous area for a
 run-time stack and program reference table. This latter contains storage for
 single variables and descriptors relating to each program and array segment.
 Then, program and data segments are read off the disk as they are needed, and 
 assigned space in core possibly overlaying previous information from any of 
 the programs in the mix. If the area being overlaid contains only program 
 segments, or array segments that have not been written into, it is simply 
 overwritten; the information is still on the disk. However, an area containing 
 array segments with words that have been changed causes those segments to be 
 written back onto the disk before being overlaid. 
 
 The restriction that array segments be no longer than 1024 words is
 of primary interest here. The segmentation is by array rows, with each row
 occupying a segment. Thus, no row may be longer than 1024 words. A
 one-dimensional linear array is one row, so no linear array may be longer than
 1024 words. Larger linear tables must be simulated as two-dimensional arrays.
 For instance, an 8192 word table could be declared with array bounds [0:15,
 0:511] so there would be sixteen segments each containing 512 words. When
 simulating large linear arrays, it is wise to make the range of each
 subscript a power of two in order to be able to access an
 entry in the table using a single index. In the case above, the row
 subscript requires exactly four bits, whereas the column subscript requires
 exactly nine. Thus, given a single 48-bit index I, I.[35:4] would select the
 proper row, whereas I.[39:9] would select the proper column position within
 that row (in the partial word notation of Burroughs extended Algol).
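
 The index splitting can be checked with a short Python sketch. It assumes
 that a partial word [s:n] denotes n bits starting at bit s, with bit 0 the
 leftmost bit of the 48-bit word, which is consistent with the 13-bit
 pointers in [35:13] used later. The names field and split_index are
 hypothetical.

```python
WORD_BITS = 48


def field(word, start, length):
    """Extract partial word [start:length]; bit 0 is taken as the leftmost bit."""
    shift = WORD_BITS - (start + length)
    return (word >> shift) & ((1 << length) - 1)


def split_index(i):
    """Split a linear index 0..8191 into (row, column) for bounds [0:15, 0:511]."""
    row = field(i, 35, 4)   # four bits select one of sixteen segments
    col = field(i, 39, 9)   # nine bits select one of 512 words in the segment
    return row, col


# Simulate the 8192-word table as sixteen rows of 512 words.
table = [[0] * 512 for _ in range(16)]
row, col = split_index(5000)
table[row][col] = 42
print(row, col)
```

 Because both ranges are powers of two, the split is a pair of bit-field
 extractions and needs no division.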
 
 2.2 BIGTAB - The Symbol Table 
 
 In any compiler, storage for the representations of the terminal 
 symbols must be made. The efficiency of the compiler can be greatly affected 
 by the choice of data structure for the symbol table. The specific functions 
 that must be optimized in the use of the symbol table are, in order of
 importance:
 
 a) lookup 
 
 b) insertion 
 
 c) traversing. 
 
 Furthermore, separate lists must be maintained for the four classes of
 multi-character terminal symbols: <*I>, <*N>, <*S>, and <*R>.
 
 
The basic structure BIGTAB was chosen so as to store data as a 
 forest of binary trees, having four trees interleaved within one 8192-word 
 table. The advantages of this approach are: 
 
 a) being naturally linked lists, interleaving the trees in 
 the same table is possible; thus, an identifier entry could 
 be adjacent in the table to a numeric literal; space has to 
 be reserved for only one table; 
 
 b) lookup is fast compared to a linearly linked list; 
 
 c) insertion can be made in the next sequential location in the 
 table with no need to change links already established. 
 
 Knuth [6] discusses extensively the characteristics of binary tree structures. 
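
 The three advantages can be illustrated with a small Python sketch of
 several binary search trees sharing one sequentially filled table. It is a
 sketch under assumptions, not the BIGTAB code: entries are Python
 dictionaries rather than packed words, address 0 stands for a null link,
 and the class name Forest is hypothetical.

```python
class Forest:
    """Several binary search trees interleaved in one sequentially filled table."""

    def __init__(self, n_heads):
        # Slots 0..n_heads-1 are head nodes holding a root address (0 = empty).
        self.table = [0] * n_heads

    def search_or_insert(self, head, key):
        """Return (address, found); a new entry takes the next free location."""
        link = ("head", head)
        addr = self.table[head]
        while addr != 0:
            entry = self.table[addr]
            if key == entry["key"]:
                return addr, True
            side = "left" if key < entry["key"] else "right"
            link = (side, addr)
            addr = entry[side]
        new_addr = len(self.table)
        self.table.append({"key": key, "left": 0, "right": 0})
        kind, where = link
        if kind == "head":
            self.table[where] = new_addr        # first entry becomes the root
        else:
            self.table[where][kind] = new_addr  # only a null link is filled in
        return new_addr, False


IDENT, NUMERIC = 1, 2
forest = Forest(16)
print(forest.search_or_insert(IDENT, "X"))
print(forest.search_or_insert(NUMERIC, "314"))
print(forest.search_or_insert(IDENT, "ALPHA"))
print(forest.search_or_insert(IDENT, "X"))
```

 The identifier X and the numeric literal end up in adjacent locations, and
 inserting ALPHA changes no link that was already set.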
 
 2.2.1 Basic Format 
 
 Each tree has a head node in a fixed table location: BIGTAB[1] for
 identifiers, BIGTAB[2] for numeric literals, BIGTAB[3] for string literals and
 BIGTAB[15] for key words. This head node is a pointer to the root of its tree.
 Each BIGTAB entry consists of an entry head plus one to eight data words to
 store the BCD characters of the text.
 
 HEAD NODE

 ENTRY HEAD:   Semantic Part | Number of Characters | Left Pointer | Right Pointer
               0:16          | 16:6                 | 22:13        | 35:13

 DATA WORDS:   0:12          | BCD CHARACTERS
 
 Figure 1. BIGTAB Basic Format 
 
 
 2.2.2 Storage for Keywords, Identifiers and String Literals 
 
 The syntax preprocessor will extract from the syntax definition the 
 language specific keywords and non-terminals and place them, linked together 
 in one tree in BIGTAB format, into a disc file called "<LANGNAME>/TABLES", 
 where <LANGNAME> is the name assigned by the compiler designer. At the start 
 of every execution of the TWS-built compiler, this file is read, initializing 
 the run-time BIGTAB. The language non-terminals (such as <BLOCK>, or
 <ARITHMETIC PRIMARY>) will allow the table-driven parser to provide a trace
 of the parsing path. The non-terminals are inserted into the table with a 
 leading blank so they will never be recognized as identifiers in the source 
 string. Appendix B lists the TBNF syntax of a simple language DEMALGOL, and 
 gives an example of the initial BIGTAB produced by this syntax. 
 
 The scanner assumes as the nominal condition that these keywords
 will always be preceded by the special symbol "#" (e.g., #BEGIN), and that
 BIGTAB[15] will point to this initial syntax-preprocessor-built tree. Thus,
 all occurrences of "#" followed by an identifier will cause reference to this
 tree. But, by control card option, the nominal condition can be replaced by
 a reserved word option. In this condition, all occurrences in the source
 string of all syntax-defined keyword identifiers (e.g., BEGIN) will be reserved
 to have only the keyword meaning, and BIGTAB[1], the identifier tree, is set
 to point to the syntax-preprocessor-built tree. Thereafter, all identifiers
 in the source string will be checked against this table, and newly-defined
 identifiers will be linked into it. The scanner will recognize the presence
 of a reserved identifier by the fact that the BIGTAB address is within the
 range of the initial table.
 
 For keywords, identifiers and string literals, the basic format is 
 exactly as specified in section 2.2.1. The only difference among the three 
 
classes is in the use of the semantic part. The BCD characters are stored 
 six per 48-bit word, allowing a maximum of 48 significant characters. 
 
 For identifiers, headword bit [1:1] is reserved by the scanner to 
 indicate this identifier is defined as a macro or macro formal parameter. If 
 set, then bits [4:12] point to the address in MACROTAB of the stored text, 
 and bits [2:1] indicate a formal parameter. If the identifier is not defined 
 as a macro, bits [2:13] may be set by the compiler semantic routines as
 desired. In GLYPNIR [2], as implemented using the TWS, pointers to the semantic
 IDTAB and the parser MSTACK are inserted in the semantic part.
 
 For keywords, the syntax preprocessor places in the semantic part a 
 unique symbol number for each keyword in the syntax. This allows the parser 
 to consider the keyword as it would a single special symbol. 
 
 For string literals, the semantic part is reserved for the semantic 
 routines, typically to point to a literal table. 
 
 2.2.3 Storage for Numeric Literals 
 
 A numeric literal is a string of characters that carries an inherent 
 semantic value - the specific quantity that this string represents. As this 
 semantic value will be variable and machine dependent, the TWS will not convert 
 these literals to an internal machine representation, but rather transform the 
 literal string to a normalized, consistent BIGTAB entry, with enough analysis 
 performed on the source string to make the semantic conversion of the numeric 
 literal to internal representation relatively straightforward for the semantic 
 part of the compiler. 
 
 In BIGTAB, the same basic header word and data word structure applies 
 here as in the identifier, keyword and string literal tables. The semantic 
 part of the header word can be used, as in the TRANQUIL compiler implemented
 using the TWS, to store a literal type, and a pointer to a semantic table.
 But, in the data words, quite a different structure is used. Instead of the 
 BCD characters packed six per 48-bit word, the full eight character capacity 
 of each word is used, with the first two characters in the first data word 
 being used to describe certain semantic attributes of the numeric literal: 
 
 ENTRY HEAD:   Semantic Part | Number of Characters | Left Tree Pointer | Right Tree Pointer
               0:16          | 16:6                 | 22:13             | 35:13

 DATA WORDS    Char 0 | Char 1 | Char 2 | Char 3 | Char 4 | Char 5 | Char 6 | Char 7
 (1-8):        0:6    | 6:6    | 12:6   | 18:6   | 24:6   | 30:6   | 36:6   | 42:6
 
 The first two characters (12 bits) of the first data word have the
 following values:

 0:1 - Unused, always zero
 1:1 - Base indicator
       =0 Decimal numeral, base 10
       =1 Nondecimal numeral, base 2 to 36
 2:1 - Numeric type
       =0 Integer
       =1 Real
 3:1 - Sign of exponent
       =0 Positive
       =1 Negative
 4:2 - Number of exponent digits (I)
       0-3 (i.e., exponent 0-999, base 10)
 6:6 - Number of mantissa digits (N).
 
 There follow "N" characters of the mantissa; followed by "I"
 characters of the exponent (for real type numeric literals only); followed by
 one character containing the base (for nondecimal base numeric literals only),
 range 2-36 (base 10). The last data word is zero filled.
 
 For numeric literals containing a radix point, the mantissa is 
 normalized, that is the exponent is recomputed as though the radix point is 
 to the right of the rightmost mantissa digit. 
 
 For nondecimal bases, there must be provision for up to 36 different 
 digits. The scanner considers 0-9 and A-Z as the 36 digits. The internal
 character code for the decimal digits 0-9 exactly corresponds to the "digit
 value" 0-9. But this is not true for the alphabetic letters. To correct for
 this, the input alphabetic character will be converted to a true digit value 
 in the range 0-35 for storage in the data words. This is accomplished by 
 subtracting a bias from the character code, depending on the letter: 
 
 A-I subtract 7 
 J-R subtract 14 
 S-Z subtract 22. 
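
 The bias table amounts to the following Python sketch. The character code
 ranges (17-25 for A-I, 33-41 for J-R, 50-57 for S-Z) are the ones quoted in
 Section 2.5; the function name digit_value is hypothetical.

```python
def digit_value(code):
    """Convert a six-bit internal character code to a digit value 0-35."""
    if 0 <= code <= 9:       # digits 0-9: the code is already the value
        return code
    if 17 <= code <= 25:     # letters A-I
        return code - 7
    if 33 <= code <= 41:     # letters J-R
        return code - 14
    if 50 <= code <= 57:     # letters S-Z
        return code - 22
    raise ValueError("character code %d is not a digit in any base" % code)


# Hex A has character code 17 and hex E has character code 21.
print(digit_value(17), digit_value(21))
```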
 
 Consider as an example, the hexadecimal numeric literal
 3A42E.569@-354(16). The semantic descriptor would be composed as follows:
 
 0:1 =0 
 
 1:1 =1 , nondecimal base 
 
 2:1 =1, real 
 
 3:1 =1, negative exponent 
 
 4:2 =3, 3 digits of exponent 
 
 6:6 =8, 8 digits of mantissa 
 
 This produces for the first 12 bits 011 111 001 000, or as six bit characters,
 "←8". This will produce the following data words:

 Data word 1: ← 8 3 # 4 2 > 5
 Data word 2: 6 9 3 5 7 + 0 0

 Note that the exponent has been changed from 354 to 357, representing
 the normalization; that the base is represented by "+", or 16 (base 10); that
 the second word is padded with zeros; that hex A (character code 17) is
 converted into "#" (character code 10); and that hex E (character code 21) is
 converted into ">" (character code 14).
 
 The simple decimal integer 1 would be converted for storage to: 
 
 0:1 =0 
 
 1:1 =0, decimal base 
 
 2:1 =0, integer 
 
 3:1 =0, positive exponent 
 
 4:2 =0, no exponent part 
 
 6:6 =1, 1 mantissa digit. 
 
 This produces the six bit characters "01", and the following data word:

 Data word 1: 0 1 1 0 0 0 0 0
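
 Both worked examples can be reproduced with a short Python sketch of the
 layout described in this section. It is an illustration under assumptions:
 the caller supplies digit values with the radix point already removed and
 the exponent already normalized, and characters are represented by their
 six-bit codes. The name encode_numeric is hypothetical.

```python
def encode_numeric(mantissa, exponent=None, base=10):
    """Return the data words (lists of eight six-bit codes) for one literal.

    mantissa: digit values 0-35, radix point already removed
    exponent: normalized exponent of a real literal, None for an integer
    base:     2 to 36
    """
    is_real = exponent is not None
    exp_digits = [int(d) for d in str(abs(exponent))] if is_real else []
    if len(exp_digits) > 3:
        raise ValueError("the exponent is limited to three digits")
    if len(mantissa) > 63:
        raise ValueError("the mantissa count must fit in six bits")
    nondecimal = base != 10
    negative = is_real and exponent < 0
    # First character: bits 0:1 unused, 1:1 base, 2:1 type, 3:1 sign, 4:2 I.
    first = (nondecimal << 4) | (is_real << 3) | (negative << 2) | len(exp_digits)
    codes = [first, len(mantissa)] + list(mantissa) + exp_digits
    if nondecimal:
        codes.append(base)              # one character holding the base
    codes += [0] * (-len(codes) % 8)    # zero fill the last data word
    return [codes[i:i + 8] for i in range(0, len(codes), 8)]


# Hexadecimal example: eight mantissa digits, exponent normalized to -357.
print(encode_numeric([3, 10, 4, 2, 14, 5, 6, 9], exponent=-357, base=16))
# The simple decimal integer 1.
print(encode_numeric([1]))
```

 In the first result the codes 31, 10, 14 and 16 are the characters that the
 text shows as the first descriptor character, "#", ">" and "+".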
 
 2.3 The SCAN Descriptor 
 
 The ALPHA procedure SCAN is called by the parser (and recursively 
 from within the scanner itself) when a new terminal symbol is required. The 
 48-bit value assigned to SCAN as a function is referred to as the SCAN de- 
 scriptor, and has the format as shown in Figure 2. 
 
 For keywords <*R>, the symbol number is assigned by the syntax
 preprocessor, starting with 66 (base 10). Special single characters have a
 symbol number equal to their internal 6-bit character code, thus varying from
 0 (numeral zero) to 63 ("). This allows the keywords and the special single
 characters to be considered the same in the parsing routines.
 
 Fields:                      0:2    | 2:4       | 6:12                 | 18:13             | 31:4      | 35:13

 <*I>, Identifiers:           Unused | Class =1  | BIGTAB Semantic Part | Pointer to BIGTAB | Class =1  | Pointer to BIGTAB

 <*N>, Numeric Literals:      Unused | Class =2  | BIGTAB Semantic Part | Pointer to BIGTAB | Class =2  | Pointer to BIGTAB

 <*S>, String Literals:       Unused | Class =3  | BIGTAB Semantic Part | Pointer to BIGTAB | Class =3  | Pointer to BIGTAB

 <*R>, Keywords:              Unused | Class =15 | Symbol Number        | Pointer to BIGTAB | Class =15 | Symbol Number

 Special Single Characters:   Unused | Class =15 | Symbol Number        | Symbol Number     | Class =15 | Symbol Number
 
 Figure 2. Format of the SCAN Descriptors 
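
 The descriptor layouts of Figure 2 can be exercised with a Python sketch.
 It assumes that a partial word [s:n] is n bits starting at bit s, counted
 from the leftmost bit of the 48-bit word. All function names are
 hypothetical, and parser_view only illustrates that the low-order class and
 13-bit fields look alike for keywords and special single characters.

```python
WORD_BITS = 48


def put(word, start, length, value):
    """Store value in partial word [start:length] of a 48-bit word."""
    shift = WORD_BITS - (start + length)
    mask = (1 << length) - 1
    return (word & ~(mask << shift)) | ((value & mask) << shift)


def get(word, start, length):
    """Read partial word [start:length]."""
    return (word >> (WORD_BITS - (start + length))) & ((1 << length) - 1)


def descriptor(cls, middle, pointer, low):
    """Fill the fields 2:4, 6:12, 18:13, 31:4 and 35:13 of Figure 2."""
    word = 0
    for start, length, value in ((2, 4, cls), (6, 12, middle),
                                 (18, 13, pointer), (31, 4, cls), (35, 13, low)):
        word = put(word, start, length, value)
    return word


def table_symbol(cls, semantic_part, pointer):
    """<*I> (class 1), <*N> (class 2) or <*S> (class 3)."""
    return descriptor(cls, semantic_part, pointer, pointer)


def keyword(symbol_number, pointer):
    return descriptor(15, symbol_number, pointer, symbol_number)


def special_char(code):
    return descriptor(15, code, code, code)


def parser_view(word):
    """Class and low-order field, as a parsing routine might read them."""
    return get(word, 31, 4), get(word, 35, 13)


print(parser_view(keyword(66, 200)))
print(parser_view(special_char(63)))
print(parser_view(table_symbol(1, 5, 300)))
```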
 
 
 2.4 MACROTAB - The Macro Text Table 
 
 2.4.1 Concept of a Definitional Facility 
 
 Many compilers in current use (B-5500 ALGOL, IBM PL/I, JOVIAL, etc.)
 have a definition facility, that is, a capability to define compile-time
 procedure-like constructs. One can compare a text-type definition or macro
 facility with a run-time procedure construct as follows:
 
 A procedure 
 
 a) is considered syntactically as a complete <statement> or 
 <primary>; 
 
 b) produces one set of machine code that may be executed by 
 jumps and parameter linkages from different parts of the 
 main program. 
 
 A macro 
 
 a) may be an incomplete syntactic fragment composed of a 
 sequence of terminal symbols; 
 
 b) produces a separate set of machine code for each invocation; 
 
 c) is strictly a compile-time device that is transparent to the 
 parsing and semantic portions of the compiler. 
 
 2.4.2 The TWS Macro Facility 
 
 As the TWS was developed using Burroughs B-5500 ALGOL, it became 
 apparent that the definition facility implemented on this compiler made the 
 compiler easier to use, and actually allowed local extensions to be implemented 
 in a rather straightforward manner. Therefore, as a practical matter a similar
 parameterized macro expander was included as a part of the core compiler for
 all TWS-written compilers.
 
 The storage scheme selected was to store the macro text as the entire 
 SCAN descriptor, with two header words for each definition. Some elements of 
 the storage scheme are: 
 
 a) one 48-bit word per terminal symbol;

 b) the scope of identifiers (a semantic concept) used in the
 macro text will be defined at the point in the program where
 the macro is declared, since the SCAN descriptor includes the
 BIGTAB semantic part at the time the macro was declared;

 c) accessing the pre-stored macro text by the scanner may be
 faster than scanning the text from the source string, as
 the time-consuming assembling of the characters into the
 numeric strings and identifier strings, and table lookup in
 BIGTAB is performed only once, no matter how many times the
 macro is invoked;

 d) block structure considerations are made to allow an
 identifier to be defined, for example, as a label in one block
 and redefined as a macro in an inner block with the old
 semantic definition being restored upon block exit;

 e) a defined identifier may be redefined within the block in
 which it was declared, in which case the new text will
 replace the old text for subsequent invocation (this implies
 that the macro declaration does not necessarily have to be
 placed in the block head for a block-structured language);

 f) no parsing or syntax checking of the text is made until the
 defined identifier is invoked;

 g) defined identifiers (i.e., calls on other macros) may occur
 anywhere within the macro text, but the value is defined at
 the point of the macro declaration;

 h) defined identifiers may occur anywhere within the actual
 parameter part of any macro call;

 i) no recursion (i.e., one macro directly or indirectly calling
 itself) is permitted.

 Appendix A describes the detail of the syntax and semantics of the
 elements of the macro facility.
 
 2.4.3 MACROTAB Descriptors 
 
 As the macro text is processed by the scanner from the DEFINE dec- 
 laration, two header words are set up in MACROTAB: 
 
 WORD1:   Unused | Address of         | Address of Actual | Number of  | Unused
                 | Return Descriptor  | Parameter Table   | Parameters |
          0:6    | 6:12               | 18:12             | 30:12      | 42:6

 WORD2:   BIGTAB Semantic    | Pointer to BIGTAB   | Link to Previous | Block Nesting
          Part of Defined    | Address of Defined  | MACROTAB entry   | Level
          Identifier         | Identifier          |                  |
          0:16               | 16:13               | 29:12            | 41:7
 
 The text is then scanned (by SCAN) into the table, one word per
 terminal symbol. If a defined identifier is encountered in the source string, a
 special macro call descriptor is inserted into the table. If a formal parameter
 is encountered, a special formal parameter descriptor is inserted. Finally, at
 the end of the text, a return descriptor is inserted:
 
 SCAN DESCRIPTOR:
     Unused | Class | Symbol # | BIGTAB Pointer for <*I> <*N> <*S> | Class | BIGTAB Pointer
     0:2    | 2:4   | 6:12     | 18:13                             | 31:4  | 35:13

 MACRO CALL DESCRIPTOR:
     Unused | Class =8 | Pointer to Called Macro | Address of Where to Continue | Contents of Called Macro's
            |          |                         | after the Call               | Return Word
     0:2    | 2:4      | 6:12                    | 18:12                        | 30:12

 RETURN DESCRIPTOR:
     Unused | Class =9 | Where to Continue Processing upon Return; | Address of Macro
            |          | =0 Means Outermost Macro                  | Call Descriptor
     0:2    | 2:4      | 6:12                                      | 18:12

 FORMAL PARAMETER DESCRIPTOR:
     Unused | Class =10 | Address of Macro Header Word | Parameter Number
     0:2    | 2:4       | 6:12                         | 18:12
 
 When the macro is invoked, the actual parameters must be stored, in a 
 manner similar to the macro itself, as scan descriptors. In addition to the 
 stored text, for each actual parameter, there will be one return descriptor as 
 described above plus one parameter address and length word for each two actual 
 parameters. 
 
 ADDRESSES AND LENGTH DESCRIPTOR:
     First Length | First Address | Second Length | Second Address
     0:12         | 12:12         | 24:12         | 36:12
 
 As the formal parameters used within the macro definitions are strictly
 local, provision has been made to use the high-order end of the macro table
 (from location 4095 down) as temporary storage for the semantic part of the
 parameter identifiers during scanning of the macro text. This semantic part is
 then restored when the MEND terminating the definition is scanned:
 
 FORMAL PARAMETER SAVE WORDS:
     BIGTAB Semantic Part | BIGTAB Pointer
     0:16                 | 16:13
 
 
 2.5 CHARCLASS - The Character Class Table 
 
 The term "terminal head symbol" refers to the first character of a 
 terminal symbol. The scanner needed a way to determine from the terminal head 
 symbol what class of terminal symbol was to follow. For example, a decimal 
 digit, radix point or exponent sign will indicate a numeric literal must be 
 formed from the following characters. Similarly, a string quote indicates a 
 string literal follows. For this and other decision points in the scanner, a 
 table of character classes has been established, assigning to each six-bit BCD 
 character a bit string: 
 
 Character Class                          Class    CHARCLASS Bit Positions
                                          Value    41 42 43 44 45 46 47

 Digits 0-9                                 58      0  1  1  1  0  1  0
 Special Keyword Delimiter (#)               4      0  0  0  0  1  0  0
 Numeric Literal Exponent Delimiter (@)     18      0  0  1  0  0  1  0
 Radix Point (.)                            34      0  1  0  0  0  1  0
 Numeric Literal Base Delimiter "("         64      1  0  0  0  0  0  0
 String Quote (")                            3      0  0  0  0  0  1  1
 All Other Special Symbols                   0      0  0  0  0  0  0  0
 Letters A-I                                89      1  0  1  1  0  0  1
 Letters J-R                               105      1  1  0  1  0  0  1
 Letters S-Z                               121      1  1  1  1  0  0  1
 
 Table 1. CHARCLASS Table 
 
 When certain lexical decisions must be made about a character in the 
 source string, a branch is made, indexed by some subset of the bits in the 
 CHARCLASS entry. 
 
 
 When the scanner is ready to search for a new terminal symbol, a 
 branch is made on bits [45:3] of the CHARCLASS of the terminal head symbol, 
 with the following results: 
 
 [45:3] Value   Characters in This Class

 1              Letters A-Z. Identifier follows.

 2              @, ., Digit 0-9. Numeric literal follows.

 3              ". String literal follows.

 4              #. If the special word option for keywords
                was chosen, a language keyword follows.

 0              All other special characters. Process as a
                special single character.
 
 During the assembly of a numeric literal, when it is known the 
 terminal head symbol has bits [45:3]=2, a further branch is made on bits [42:2], 
 with the following results: 
 
 [42:2] Value   Characters in This Class

 1              @. Exponent delimiter.

 2              . Radix point.

 3              Digits 0-9.
 
 These correspond to special processing depending on the numeric type. 
 
 Later, during assembly of the interior symbols of the numeric literal, 
 a branch is made on bits [41:3] of the incoming source symbol, with the 
 following results: 
 
 [41:3] Value   Characters in This Class

 1              @. Branch to exponent logic.

 2              . Branch to fractional digit logic.

 3              Digits 0-9. Assemble in normal manner.

 4              (. Branch to base logic.

 5              A-I. Branch to logic to convert character
                codes 17-25 to true digit values 10-18.

 6              J-R. Branch to logic to convert character
                codes 33-41 to true digit values 19-27.

 7              S-Z. Branch to logic to convert character
                codes 50-57 to true digit values 28-35.

 0              All other special symbols. Terminate numeric
                literal processing.
 
 During assembly of the alphanumeric internal characters of an iden- 
 tifier, CHARCLASS bit [44:1] is used to indicate an alphanumeric character. 
 
 [44:1] Value   Characters in This Class

 1              Digits 0-9, letters A-Z. Assemble into the
                identifier.

 0              All other characters. Terminate identifier
                processing.
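
 The partial-word tests above can be sketched in Python. This is only an
 illustration: it assumes the B-5500 convention that bit 0 is the high-order
 bit of a 48-bit word and that [s:n] names the n bits starting at bit s. The
 helper name "field" is hypothetical. The CHARCLASS values are the ones quoted
 in Table 1.

```python
WORD_BITS = 48  # assumed word size; bit 0 is taken to be the high-order bit


def field(word: int, start: int, length: int) -> int:
    """Return the partial word [start:length] of a 48-bit word."""
    shift = WORD_BITS - start - length
    return (word >> shift) & ((1 << length) - 1)


# CHARCLASS values quoted in Table 1.
TABLE_1 = {"Letters A-I": 89, "Letters J-R": 105, "Letters S-Z": 121,
           "String Quote": 3}

for name, value in TABLE_1.items():
    # Terminal head class, alphanumeric flag, numeric interior class.
    print(name, field(value, 45, 3), field(value, 44, 1), field(value, 41, 3))
```

 Under these assumptions the three letter groups all have [45:3]=1 and
 [44:1]=1, and differ only in [41:3], which selects the digit-conversion logic
 for each group.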
 
 This procedure of using the internal character code of the source
 characters to index a table of character classes allows the TWS-supplied control
 card options below to be easily implemented:
 
 
 ALPHABETIC a     Let a have CHARCLASS.[44:4]=9. Thus a can
                  be an identifier terminal head symbol
                  ([45:3]=1) or an internal identifier symbol
                  ([44:1]=1).

 ALPHANUMERIC b   Let b have CHARCLASS.[44:1]=1. Thus b can
                  be an internal identifier symbol.

 RSWD             Choose the reserved word option for keywords.
                  Let # have CHARCLASS = zero.

 SPWD c           Retain the nominal special word option for
                  keywords, but designate c to be the delimiter
                  by setting CHARCLASS c to 4, and reset
                  CHARCLASS # to zero.

 EXPONENT d       Change CHARCLASS d to 18; reset CHARCLASS @
                  to zero.
 
 Specifying "ALPHANUMERIC - " on a control card would allow a COBOL- 
 like identifier CASH-IN-FIRST-NATIONAL-BANK. 
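
 A minimal sketch of why a table-driven scanner makes such options cheap: the
 option only changes one table entry, and the identifier loop is untouched.
 The table here is keyed by the character itself rather than by its BCD code,
 and the function names are hypothetical. Bit 47 is assumed to be the
 low-order bit of the word.

```python
ALNUM_BIT = 1 << 3   # CHARCLASS bit [44:1], assuming bit 47 is the low-order bit
HEAD_MASK = 0b111    # CHARCLASS bits [45:3]
HEAD_IDENT = 1       # [45:3] = 1: identifier head symbol
HEAD_NUMERIC = 2     # [45:3] = 2: numeric literal head symbol

# Hypothetical table keyed by character instead of internal character code.
charclass = {c: ALNUM_BIT | HEAD_IDENT for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
charclass.update({c: ALNUM_BIT | HEAD_NUMERIC for c in "0123456789"})


def control_card_alphanumeric(ch: str) -> None:
    """ALPHANUMERIC option: allow ch inside identifiers by setting [44:1]."""
    charclass[ch] = charclass.get(ch, 0) | ALNUM_BIT


def scan_identifier(text: str, pos: int = 0) -> str:
    """Assemble an identifier at pos; stop at the first non-alphanumeric."""
    if charclass.get(text[pos], 0) & HEAD_MASK != HEAD_IDENT:
        raise ValueError("not an identifier head symbol")
    end = pos + 1
    while end < len(text) and charclass.get(text[end], 0) & ALNUM_BIT:
        end += 1
    return text[pos:end]


print(scan_identifier("CASH-IN-FIRST-NATIONAL-BANK"))  # before the option
control_card_alphanumeric("-")
print(scan_identifier("CASH-IN-FIRST-NATIONAL-BANK"))  # after the option
```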
 
 
 3. MAIN SCANNER PROCEDURES 
 
 This chapter will give a functional description of major parts of 
 the lexical scanner in terms of the ALGOL procedures that perform the various 
 functions. 
 
 3.1 READACARD 
 
 This procedure defines the basic format of the source program cards 
 accepted by the TWS scanner. 
 
 a) The source program card images are read from the disc file 
 "SOURCE" as 80-character records, into an array CARDBUF[0] 
 to CARDBUF[9], appropriately recognizing the end-of-file 
 condition. 
 
 b) The text in card image columns 1-72 is then transferred into 
 another array CHARBUF[1] to CHARBUF[72], one BCD character 
 per word. 
 
 c) The card image is analyzed to identify leading and trailing 
 blanks, setting items "FCR" as the column with the first non- 
 blank character, "LCR" as the last non-blank character, and 
 "NCR" as the moving character pointer initially set equal to 
 "FCR". 
 
 d) The card image counter "CARDCOUNT" is incremented by one, 
 with the current value placed in CARDBUF[10] for printing as 
 columns 81-89 of the card image. 
 
 e) The number in columns 72-80 of the card image is translated 
 to internal form, and made available to the rest of the 
 compiler procedures as "CARDNUM". If this field is blank on 
 the first card image in the source stream, "CARDNUM" will be 
 
 
 subsequently set equal to "CARDCOUNT" for each card image. 
 
 f) If column one is "$", the card image is printed on printer 
 backup disc file "LINE", and control is passed to the pro- 
 cedure "CONTROLCARD" for analysis of the control card 
 information. 
 
 g) If the card image was not a control card, and if the "PRINT" 
 or "LIST" control card options had been chosen previously, 
 the contents of array CARDBUF, including the inserted card 
 counter, is printed on file "LINE". 
 
 Thus, externally, the following are the user-oriented TWS features 
 implemented through READACARD: 
 
 a) control cards start with a "$" in column one; 
 
 b) source text occupies columns 1-72; 
 
 c) columns 72-80 may contain a card count, that will be made 
 available, if the semantic routines store it, for program 
 traces. 
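
 The column handling in steps (b), (c) and (f) can be sketched as follows.
 This is an illustration only: the Card record and the function name are
 hypothetical, a Python string stands in for the BCD card image, and the value
 0 for FCR and LCR on an all-blank card is an assumption. The card counter,
 the sequence-number field and the listing are omitted.

```python
from dataclasses import dataclass


@dataclass
class Card:
    text: str         # card image columns 1-72
    fcr: int          # column of the first non-blank character (1-based)
    lcr: int          # column of the last non-blank character (1-based)
    ncr: int          # moving character pointer, initially equal to fcr
    is_control: bool  # True when column one holds "$"


def read_a_card(image: str) -> Card:
    """Split one 80-column card image into the items READACARD sets."""
    image = image.ljust(80)[:80]
    text = image[:72]
    if text.strip(" "):
        fcr = len(text) - len(text.lstrip(" ")) + 1
        lcr = len(text.rstrip(" "))
    else:
        fcr = lcr = 0  # assumption: an all-blank card has no FCR or LCR
    return Card(text, fcr, lcr, fcr, image[0] == "$")


card = read_a_card("   BEGIN")
print(card.fcr, card.lcr, card.is_control)
```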
 
 3.2 NEXTCHAR 
 
 This procedure provides other scanner routines with text one charac- 
 ter at a time. Its output is a variable NXTCHR containing the six-bit BCD code 
 of the character scanned in the source string. In addition, the following func- 
 tions are performed: 
 
 a) If a "%" is detected anywhere on the card image, the rest of 
 the card image is ignored. This is the basic COMMENT capa- 
 bility provided by the TWS. An ALGOL-like facility may be 
 implemented by the semantics, with the caveat that all the 
 text in the comment would be scanned, with all words and 
 
 
 numbers placed into BIGTAB. 
 b) When not internal to a string literal, multiple blanks in 
 the source stream are reduced to a single blank for parsing 
 purposes. 
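
 The two NEXTCHAR functions can be sketched as a Python generator over one
 card image. The sketch assumes that the string quote is the double-quote
 character and, following the description literally, treats "%" as a comment
 mark wherever it appears. The function name is hypothetical.

```python
def next_chars(card_text: str, quote: str = '"'):
    """Yield the characters of one card image the way NEXTCHAR is described."""
    in_string = False
    previous_blank = False
    for ch in card_text:
        if ch == "%":
            return                      # rest of the card image is a comment
        if ch == quote:
            in_string = not in_string   # a doubled quote toggles twice
        if ch == " " and not in_string:
            if previous_blank:
                continue                # drop the second and later blanks
            previous_blank = True
        else:
            previous_blank = False
        yield ch


print("".join(next_chars('X   <-   "A  B"  % SET X')))
```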
 
 3.3 Boolean Procedure TABLESEARCH 
 
 Prior to the execution of TABLESEARCH, other procedures will have 
 been run to assemble from the source string one of the classes <*I>, <*R>, 
 <*S>, or <*N> of multi-character terminal symbols into SYMBUF[0] to SYMBUF[7],
 in the format of the BIGTAB data words described in section 2.2. The value of 
 the procedure will be TRUE if a successful BIGTAB table lookup has been made, 
 meanwhile setting the global variable NEXTSYM to be the SCAN descriptor (see 
 section 3.2). The value is set to FALSE if a macro definition or call is en- 
 countered, indicating the main SCAN procedure must then either assemble further 
 text from the source string, or obtain SCAN descriptors from the macro table. 
 
 The TABLESEARCH procedure has three main portions that will be de- 
 scribed in detail: table lookup in BIGTAB, processing of an entry already in 
 BIGTAB, and insertion of a new entry into BIGTAB. 
 
 3.3.1 Table Lookup 
 
 BIGTAB is a straightforward binary tree structure. The basic algo- 
 rithm below does not reflect the complication in the scanner that is required 
 by the fact that entries to be compared may be of different lengths, extending 
 over one to eight words. 
 
 
 
 Given SYMBUF[0] to SYMBUF[7] containing an entry in BIGTAB format;
 TYP, the terminal symbol class (1, 2, 3, 15) of the entry;
 NWDCH, the length of the SYMBUF entry;
 LEFTPOINTER and RIGHTPOINTER referring to pointers in the
 BIGTAB entry head

 L1 (Find root)            Set ENTRYPTR <- BIGTAB[TYP].

 L2 (Test if link null)    Is ENTRYPTR = 0? Yes, go to EXIT1.

 L3 (Compare SYMBUF        If SYMBUF = BIGTAB entry, and length of SYMBUF =
     with BIGTAB)          length of BIGTAB entry, then go to EXIT2.

 L4 (SYMBUF ≠ BIGTAB)      If SYMBUF < BIGTAB entry,
                           set ENTRYPTR <- BIGTAB[ENTRYPTR].LEFTPOINTER,
                           set K <- 1, go to L2.

 L5 (SYMBUF > BIGTAB)      Set ENTRYPTR <- BIGTAB[ENTRYPTR].RIGHTPOINTER,
                           set K <- 2, go to L2.

 EXIT1                     Entry not in BIGTAB. See section 3.3.3.

 EXIT2                     Entry already in BIGTAB, ENTRYPTR is location
                           of head node. See section 3.3.2.

 Figure 3. Table Lookup Algorithm
 
 3.3.2 Processing of an Entry Already in BIGTAB 
 
 If the symbol scanned in the source string is already in BIGTAB, three 
 cases must be distinguished: 
 
 a) The keyword DEFINE is encountered. A macro definition 
 
 follows. Process the text into MACROTAB format (see section 
 2.4). Exit TABLESEARCH, indicating an unsuccessful table 
 lookup. 
 
 
 b) A defined identifier indicating a macro call is encountered. 
 Process the actual parameters of the call (see section 2.4) 
 if any, change the scan mode to indicate subsequent scan 
 descriptors are to come from the macro table. Exit 
 TABLESEARCH, indicating an unsuccessful table lookup. 
 
 c) A normal entry is encountered. Build the scan descriptor 
 NEXTSYM according to the symbol class. Exit TABLESEARCH, 
 indicating successful table lookup. 
 
 3.3.3 Insertion of a New Entry Into BIGTAB 
 
 When the symbol scanned in the source string is not found in the 
 BIGTAB table lookup, it must be then inserted into the symbol table: 
 
 a) The head word is created, inserting only the number of 
 characters. The semantic routines will set the semantic 
 part, and both right and left tree links will be empty when 
 the entry is created. 
 
 b) The head word and the data word(s) are inserted into the 
 table in the next available sequential location. A check 
 is made to insure that the complete entry will fit into one 
 array row, as to split elements of one symbol across array 
 rows would cause undue overhead due to array row segmenta- 
 tion in the B-5500. 
 
 c) It was found that the entry was not in BIGTAB when either 
 the right or the left of the entry head in location 
 ENTRYPOINTER was null. If in L4, K was set to one, then 
 the left pointer of the entry head at location ENTRYPOINTER 
 must be set to point to this new entry address. Otherwise, 
 in L5, K was set to two so the right pointer must be set to
 point to the new entry.
 d) Build the scan descriptor NEXTSYM according to the symbol 
 class. Exit TABLESEARCH, indicating a successful table 
 lookup. 
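
 The lookup of Figure 3 and the insertion steps above can be sketched together
 in Python. This is an illustration under stated assumptions, not the
 scanner's code: entries are Python dictionaries rather than packed head and
 data words, Python string comparison stands in for the BCD comparison of
 variable-length entries, and the variable "parent" (not named in Figure 3) is
 introduced to remember the last node visited so that the link selected by K
 can be set.

```python
class Bigtab:
    """A binary-tree symbol table with sequential allocation; 0 is the null link."""

    def __init__(self):
        self.nodes = [None]  # address 0 is reserved so that 0 can mean "null"
        self.root = 0

    def search_or_insert(self, symbol: str):
        """Return (address, found); a missing symbol is inserted first."""
        entry, parent, k = self.root, 0, 0               # L1
        while entry != 0:                                # L2
            node = self.nodes[entry]
            if symbol == node["symbol"]:                 # L3
                return entry, True                       # EXIT2
            parent = entry
            if symbol < node["symbol"]:                  # L4
                entry, k = node["left"], 1
            else:                                        # L5
                entry, k = node["right"], 2
        # EXIT1: place the entry in the next sequential location.
        self.nodes.append({"symbol": symbol, "left": 0, "right": 0})
        new = len(self.nodes) - 1
        if parent == 0:
            self.root = new
        elif k == 1:
            self.nodes[parent]["left"] = new
        else:
            self.nodes[parent]["right"] = new
        return new, False


table = Bigtab()
for word in ["TEMP", "A", "X", "A"]:
    print(word, table.search_or_insert(word))
```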
 
 3.4 Procedures to Build BIGTAB Entries 
 
 There exist three major procedures, NUMERICLIT, STRINGET, and 
 ALPHAGET to assemble into SYMBUF[0] to SYMBUF[7] the numeric literal, string
 literal and identifier data types. They are functionally described by their 
 outputs in section 2.2. 
 
 3.5 The Macro Facility 
 
 Section 2.4 describing the data storage for the macro text gives an 
 adequate functional description of the procedures PROCESSMACRODECLARATION, 
 MACROINVOCATION, and PROCESSMACROACTUALPARAMETERPART. This section will dis- 
 cuss the procedure GETDESCRIPTORFROMMACROTAB, to illustrate how it "executes" 
 the descriptors placed in the macro table by the other procedures. 
 
 The major concept in designing this descriptor-based macro system is
 that the descriptors can be considered "instructions" directing the flow of
 data from the table, to be "interpreted" by the Alpha procedure
 GETDESCRIPTORFROMMACROTAB. This design allows nearly arbitrary text in the
 parameters of a macro invocation, and arbitrary (except recursive) macro
 invocations either within the macro text or within an actual parameter. If a
 formal parameter or a call on another macro is detected during scanning of
 the text in the declaration, a special "jump instruction" is placed in the
 sequential macro table to direct the flow of data. At the end of the macro
 text itself, and at the end of an actual parameter, a "return" word is
 inserted to direct the flow back to the point where it was interrupted.
 
 
 The procedure is called from the SCAN procedure when a new symbol 
 is needed in SCANMODE 5. The global variable NEXTMACRO contains the macro 
 table address of the next sequential entry to be chosen. The macro table 
 entry at this address is examined and "executed", depending on the class 
 field, bits [2:4] of the entry: 
 
 Class Value   Action

 1, 2, 3, 15   Normal SCAN descriptor.

               Set GETDESCRIPTORFROMMACROTAB <- MACROTAB[NEXTMACRO].
               Increment NEXTMACRO by one. Exit procedure.

 8             Macro invocation descriptor.

               1) Set up called macro's return descriptor. Return
               location is either NEXTMACRO + 1 or the address following
               the actual parameter table. This location has been
               inserted in bits [18:12] of the macro invocation descriptor
               by the procedure PROCESSMACRODECLARATION.

               2) Set in head word one of the called macro's entry the
               address of the actual parameter table. If parameters are
               present, their location will be NEXTMACRO + 1.

               3) Set NEXTMACRO to the first word of the called macro's
               text, located immediately following the second header word.

               4) Branch to code to examine a new macro table entry.

 9             Return descriptor.

               Either a complete macro call or an actual parameter has
               been "executed". Consider the following two cases:

               1) Return address is zero. This is true only for an
               outermost macro call. Set GETDESCRIPTORFROMMACROTAB <- 0,
               set SCANMODE <- 0. Set PTMACROTAB (the pointer to the next
               available location for insertion of next text) to the
               value it had at the time the defined identifier was
               scanned in the source string. This will "erase" the
               actual parameters stored for this call. Exit from the
               procedure.

               2) Return address is not zero. Set NEXTMACRO <- return
               address, branch to code to examine a new macro table entry.

 10            Formal parameter.

               Extract from descriptor bits [6:12] the address of the
               macro head word one. From bits [18:12] of the head word,
               extract the location of the actual parameter table as set
               when the actual parameters were scanned. From bits [18:12]
               of the formal parameter descriptor, extract the parameter
               number. Determine from the addresses and lengths
               descriptor in the actual parameter table the location of the
               specific actual parameter needed, as well as the address of
               its return descriptor. Set the actual parameter return
               address to NEXTMACRO + 1. Set NEXTMACRO <- actual parameter
               address. Branch to code to examine a new macro table entry.
 
 Note that none of the "special" descriptors cause output from the pro- 
 cedure, but just a redirecting of the flow, followed by "execution" of the de- 
 scriptor in the new location. Note also that this procedure will work on either 
 empty macros or empty parameters. In both cases, the stored text will be simply 
 a return descriptor. 
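
 The "interpretation" of the macro table can be sketched as a small Python
 loop. This is an illustration under stated assumptions: table entries are
 dictionaries with hypothetical field names instead of packed 48-bit words,
 the head record carries its text address, its return-descriptor address and
 a list of (start, return descriptor) pairs for the actual parameters, and the
 resetting of PTMACROTAB is omitted. The macros INCR(X) = X + ONE and ONE = 1
 are made up for the demonstration.

```python
SYM, CALL, RET, PARAM, HEAD = 1, 8, 9, 10, "head"


def get_descriptor(tab, state):
    """Return the next normal descriptor, or 0 when the outermost call ends."""
    while True:
        entry = tab[state["next"]]
        cls = entry["cls"]
        if cls == SYM:                        # normal SCAN descriptor
            state["next"] += 1
            return entry["value"]
        if cls == CALL:                       # macro invocation descriptor
            head = tab[entry["head"]]
            tab[head["ret"]]["addr"] = entry["return_to"]
            state["next"] = head["text"]
        elif cls == RET:                      # return descriptor
            if entry["addr"] == 0:            # outermost macro call is done
                state["mode"] = 0
                return 0
            state["next"] = entry["addr"]
        elif cls == PARAM:                    # formal parameter
            head = tab[entry["head"]]
            start, ret = head["actuals"][entry["n"]]
            tab[ret]["addr"] = state["next"] + 1
            state["next"] = start
        else:
            raise ValueError("not an executable macro table entry")


tab = [
    {"cls": HEAD, "text": 1, "ret": 4, "actuals": [(8, 9)]},  # 0: INCR(X)
    {"cls": PARAM, "head": 0, "n": 0},                        # 1: X
    {"cls": SYM, "value": "+"},                               # 2
    {"cls": CALL, "head": 5, "return_to": 4},                 # 3: call on ONE
    {"cls": RET, "addr": 0},                                  # 4: end of INCR
    {"cls": HEAD, "text": 6, "ret": 7, "actuals": []},        # 5: ONE
    {"cls": SYM, "value": "1"},                               # 6
    {"cls": RET, "addr": 0},                                  # 7: end of ONE
    {"cls": SYM, "value": "W"},                               # 8: actual parameter
    {"cls": RET, "addr": 0},                                  # 9: end of parameter
]
state = {"next": 1, "mode": 5}   # an outermost call INCR(W) has been set up
expanded = []
while state["mode"] == 5:
    descriptor = get_descriptor(tab, state)
    if descriptor != 0:
        expanded.append(descriptor)
print(expanded)
```

 Because the return address is stored in the macro's own return descriptor,
 a macro that called itself would overwrite the address it still needs, which
 is consistent with the restriction against recursive invocations.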
 
 
 3.6 Alpha Procedure SCAN 
 
 SCAN is the procedure that controls the actions of all the other pro- 
 cedures mentioned above. It is the prime interface with the parsing routines, 
 and is called when a new terminal symbol is required. The procedures to build 
 the macro table are declared in the SCAN procedure head, and thus call SCAN re- 
 cursively to obtain the descriptors to store in the macro table. 
 
 The value of SCAN as a function is normally the scan descriptor as 
 discussed at length in section 2.4. In SCANMODE 4, its value will be the 
 contents of SYMBUF[0]. 
 
 There are several modes of operation of SCAN, depending on the way 
 the source characters are to be assembled. Setting of the global variable 
 SCANMODE prior to call of SCAN will cause one of the following actions to be 
 taken: 
 
 SCANMODE   Action Taken by the Scanner

 0          Normal operational mode. Ignore all embedded blanks
            outside of string literals. Return normal SCAN descriptors.

 1          As in SCANMODE 0, but reduce adjacent embedded blanks to
            one blank and report as a single special symbol.

 2          Scan the text between FCR and LCR. Return a descriptor on
            each character in the source string as a single special
            character SCAN descriptor, but ignore blanks.

 3          As SCANMODE 2, but reduce adjacent blanks to one and report.

 4          As SCANMODE 0, but return contents of SYMBUF[0], i.e., the
            first BCD characters of the terminal symbol, as the SCAN
            descriptor. Do not look up or enter the symbol in BIGTAB.

 5          Fetch SCAN descriptor from the macro table.
 
 
 When in SCANMODE 0, 1 or 4, a branch is made on CHARCLASS[45:3] of 
 the terminal head symbol to define whether an identifier, a numeric literal, 
 a string literal, a special word keyword or a single special symbol follows. 
 Based on the specific branch made, the terminal symbol is assembled in the 
 proper format into the array SYMBUF. In SCANMODE 0 or 1, a BIGTAB table
 lookup is performed, obtaining the BIGTAB semantic part and address of the 
 symbol. With this information, the SCAN descriptor is assembled. 
 
 3.7 Integer Procedure SHAKEOUTBIGTAB 
 
 It was noticed when working with the initial BIGTAB produced by the 
 1969 version of TRANQUIL that there was quite a large imbalance in the tree 
 structure. Specifically, the initial BIGTAB contained 198 entries consisting 
 of 109 keywords and 89 language terminals. A reflection on the properties of 
 binary trees shows that in the worst case, all nodes could be "strung out",
 requiring 198 levels, and in the best case, eight levels (⌈log₂ 198⌉). The
 importance of the number of levels is in the speed of lookup: the more levels
 to the tree, the more comparisons that must be made to find an entry in the
 tree.
 
 On the 198-entry tree actually produced by the syntax preprocessor, the
 level number of the nodes varied from one (for the head of the tree) to
 eighteen. The average level of all 198 nodes was nine. For comparison, a fully
 balanced binary tree with eight levels can contain 255 nodes, and has a
 maximum level of eight and an average level of 7.03.
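
 The figures quoted for the best case can be checked with a few lines of
 Python. The function names are hypothetical; the only facts used are that a
 binary tree with m levels holds at most 2^m - 1 nodes and that level j of a
 full tree holds 2^(j-1) nodes.

```python
def best_case_levels(n: int) -> int:
    """Fewest levels of a binary tree that holds n nodes (n >= 1)."""
    # n.bit_length() is the smallest m with n <= 2**m - 1.
    return n.bit_length()


def full_tree_stats(levels: int):
    """Node count and average node level of a full binary tree."""
    nodes = 2 ** levels - 1
    total_level = sum(j * 2 ** (j - 1) for j in range(1, levels + 1))
    return nodes, total_level / nodes


print(best_case_levels(198))
nodes, average = full_tree_stats(8)
print(nodes, round(average, 2))
```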
 
 An algorithm was developed by this author that will balance the tree 
 structure of any input BIGTAB-type tree, modifying the left and right tree 
 pointers, but leaving all nodes in the same locations as previously. The
 maximum level of the balanced tree will be ⌈log₂(N+1)⌉, where N is the total
 number of nodes in the tree to be balanced.
 
 Before the algorithm is discussed in detail, some observations can be 
 made about the structure of the balanced tree. Given the nodes in lexical 
 order in the sequential table TAB[1] to TAB[N], the tree developed by this 
 algorithm will have all odd TAB entries, i.e., TAB[1], TAB[3], TAB[5], etc., 
 as terminal nodes, with both left and right tree pointers null. Conversely,
 all even TAB entries, i.e., TAB[2], TAB[4], etc., will have at least one non- 
 null link. Furthermore, as the tree is "grown", the left sub-tree of any node 
 will always be complete. If the tree is not full, it will be the right sub- 
 trees that will be partially empty. Figure 4 illustrates these points. 
 
 The essence of the algorithm is to order the nodes into a linear list 
 TAB, and then to visit each node on each level sequentially, from left to 
 right, computing and setting the new BIGTAB tree links as each node is visited. 
 To control sequencing of the algorithm, a queue is constructed, being ini- 
 tialized both front and rear with the new head node. When the right and left 
 tree links are computed for a node, the TAB addresses of these sub-nodes are
 inserted into the rear of the queue. This results in the visit of all nodes 
 in a certain level before progressing to the next lower level. 
 
 Use of this procedure has been made a control card option. If "BALANCE" 
 appears on a control card, the initial BIGTAB is balanced. Thereafter, the 
 procedure may be called from procedure TABLESEARCH, whenever a BIGTAB array 
 row fills up. 
 
 On a series of benchmark tests using a 1943-card-image input deck, using 
 the 1969 TRANQUIL BIGTAB, the balancing added about 2.8 seconds to the two 
 minute total scan time. But use of the balanced BIGTAB was able to increase 
 throughput of the scanner between six and seventeen percent over that using the 
 unbalanced BIGTAB. 
 
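 [The Figure 4 diagrams, showing the balanced tree for 1, 2, 3 and 4 nodes
 under the column headings "Number of Nodes" and "Balanced Tree", are not
 recoverable from the scan.]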
 
 Figure 4. Examples of Balanced Tree Structures 
 
 
 ALGORITHM B - Balance the Tree Structure 
 
 Let BIGTABWORD be considered a pointer with bits [22:13] as a left
 pointer field, and bits [35:13] as a right pointer field. Let each TAB entry
 have three fields: bits [15:10] as the queue link, bits [25:10] as the delta
 field, and bits [35:13] as the BIGTAB address.

 B1   Traverse the tree in postorder (see Algorithm T, Knuth [6],
      page 317), using an auxiliary stack, placing the BIGTAB
      addresses of the nodes visited into a sequential array TAB[1]
      to TAB[N].

 B2   Find the size of the smallest fully balanced tree that has
      at least N nodes. Let I be this number, with a
      value 1, 3, 7, 15, 31, ..., 2^m - 1 (m > 0). The exponent m is the
      maximum number of levels in the balanced tree.

 B3   Set SHAKEOUTBIGTAB to be the BIGTAB address of the root of
      the balanced tree. This will be found in TAB[⌈I/2⌉]; for
      example, if I=15, TAB[8] will contain the BIGTAB address of
      the root of the tree.

 B4   Set F <- R <- P <- ⌈I/2⌉, the root of the tree. Set the delta
      field of TAB[P] <- P/2.

 B5   Is F=0? (output queue exhausted?). Yes, go to B13 (exit).

 B6   Set P <- F (front of queue). Compute right and left tree
      pointers for node P. Set DELTA <- delta field of TAB[P].
      Set Q <- BIGTAB address stored in TAB[P].

 B7   If DELTA = 0, then node P is a terminal node with both pointers
      zero. Set BIGTABWORD to zero. Go to B12.

 B8   DELTA ≠ 0. Compute left tree pointer. Set LINK <- P - DELTA.
      Set left part of BIGTABWORD to the BIGTAB address field of
      TAB[LINK]. Set the delta field of TAB[LINK] to DELTA/2 (as it
      is on the next lower level), and insert LINK into the rear of
      the output queue.

 B9   Compute the right tree pointer. Set LINK <- P + DELTA. Since
      the right sub-tree may be incomplete, recompute DELTA. If
      LINK > N, then set DELTA <- DELTA/2, go to B9.

 B10  If DELTA = 0, then the right sub-tree is empty. Set the right
      part of BIGTABWORD to zero. Since there is no right sub-tree,
      nothing needs to be put into the output queue. Go to B12.

 B11  DELTA ≠ 0. Set right part of BIGTABWORD to the BIGTAB address
      field of TAB[LINK]. Set DELTA/2 into the delta field of
      TAB[LINK]. Insert LINK into the rear of the output queue.

 B12  Set BIGTAB[Q].[22:26] <- BIGTABWORD. Set F to the next node
      from the front of the queue. Go to B5.

 B13  Exit procedure. Entire tree is balanced.
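
 Algorithm B can be sketched in Python as follows. Several assumptions are
 made for the sake of a runnable illustration: nodes are dictionaries with
 "left" and "right" links instead of packed BIGTAB words, a separate list and
 a deque replace the delta and queue-link fields of TAB, the traversal in B1
 is taken to be the symmetric-order (in-order) traversal that yields lexical
 order, and I in B2 is taken to be the smallest value of the form 2^m - 1 that
 is not less than N. The function name shake_out is hypothetical.

```python
from collections import deque


def shake_out(nodes, root):
    """Rebalance a binary tree in place and return the new root address.

    nodes maps an address to {"left": address, "right": address}; 0 is null.
    """
    # B1: symmetric-order traversal with an auxiliary stack into TAB[1..N].
    tab, stack, p = [0], [], root
    while stack or p:
        while p:
            stack.append(p)
            p = nodes[p]["left"]
        p = stack.pop()
        tab.append(p)
        p = nodes[p]["right"]
    n = len(tab) - 1
    if n == 0:
        return 0
    # B2: smallest I = 2**m - 1 with I >= N.
    i = 1
    while i < n:
        i = 2 * i + 1
    # B3, B4: the root is TAB[ceil(I/2)]; its delta is half its index.
    p = (i + 1) // 2
    new_root = tab[p]
    delta = [0] * (n + 1)
    delta[p] = p // 2
    queue = deque([p])
    while queue:                              # B5
        p = queue.popleft()                   # B6
        d = delta[p]
        left = right = 0                      # B7 when d == 0
        if d != 0:
            link = p - d                      # B8: left sub-tree is complete
            left = tab[link]
            delta[link] = d // 2
            queue.append(link)
            while d != 0 and p + d > n:       # B9: right sub-tree may be partial
                d //= 2
            if d != 0:                        # B11 (B10 when d == 0)
                link = p + d
                right = tab[link]
                delta[link] = d // 2
                queue.append(link)
        nodes[tab[p]]["left"] = left          # B12
        nodes[tab[p]]["right"] = right
    return new_root


# A fully "strung out" tree of seven nodes, linked only to the right.
chain = {k: {"left": 0, "right": k + 1 if k < 7 else 0} for k in range(1, 8)}
print(shake_out(chain, 1), chain[4])
```

 As in the description, no node changes its address; only the link fields are
 rewritten.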
 
 
 APPENDIX A 
 
 SYNTAX AND SEMANTICS OF SCANNER-DEFINED ITEMS 
 
 <*I>, the Identifier Metaclass

 TBNF SYNTAX

 <LETTER>                 ::= A | B | C | D | ... | Z
 <DECIMAL DIGIT>          ::= 0|1|2|3|4|5|6|7|8|9
 <ALPHABETIC CHARACTER>   ::= <LETTER> | <ALPHABETIC CONTROL CARD OPTION>
 <ALPHANUMERIC CHARACTER> ::= <ALPHABETIC CHARACTER> | <DECIMAL DIGIT> |
                              <ALPHANUMERIC CONTROL CARD OPTION>
 <*I>                     ::= <ALPHABETIC CHARACTER> |
                              <*I> <ALPHANUMERIC CHARACTER>
 
 SEMANTICS 
 
 All identifiers are limited to 48 characters in length. Identifiers
 may not extend over card-image boundaries. The standard syntax of <ALPHABETIC
 CHARACTER> or <ALPHANUMERIC CHARACTER> may be augmented by control card option
 (see section 2.5).
 
 <*R>, the Keyword Metaclass

 TBNF SYNTAX

 <*R> ::= <*I> | "#" <*I>
 
 SEMANTICS 
 
 If the "RSWD" control card option is selected, a simple identifier may 
 be used as a keyword - and cannot be re-declared by the compiler-user. The 
 nominal state is the "SPWD" or special word option, where a "#" must precede 
 the syntax-defined keyword identifier for it to have keyword meaning. The 
 
 
 
 programmer may then choose freely the identifiers he uses with no fear of 
 encountering reserved identifiers he did not know about. 
 
 <*N>, the Numeric Literal Metaclass

 TBNF SYNTAX

 <DECIMAL INTEGER>      ::= list <DECIMAL DIGIT>
 <DIGIT>                ::= <LETTER> | <DECIMAL DIGIT>
 <INTEGER>              ::= <DECIMAL DIGIT><DIGIT>*
 <RADIX POINT>          ::= "."
 <FRACTIONAL PART>      ::= <RADIX POINT><DIGIT>*
 <EXPONENT DELIMITER>   ::= "@"
 <EXPONENT PART>        ::= <EXPONENT DELIMITER>[+|-]?
                            <DECIMAL INTEGER>
 <BASE LEFT DELIMITER>  ::= "("
 <BASE RIGHT DELIMITER> ::= ")"
 <BASE PART>            ::= <BASE LEFT DELIMITER><DECIMAL INTEGER>
                            <BASE RIGHT DELIMITER>
 <REAL>                 ::= [<INTEGER>|<FIXED POINT>]?
                            <EXPONENT PART> | <FIXED POINT>
 <FIXED POINT>          ::= <INTEGER><RADIX POINT><DIGIT>* |
                            <RADIX POINT> list <DIGIT>
 <*N>                   ::= [<INTEGER>|<REAL>]<BASE PART>?
 
 SEMANTICS 
 
 A numeric literal may be split across card images. The length of a 
 numeric literal must not exceed the following formula: 
 
 I + N < 62, if base is decimal 
 I + N < 61, for non-decimal base, 
 
 
 where I represents the total number of integer and fractional digits, not 
 counting the radix point, and N represents the number of digits in the normal- 
 ized exponent, not counting the exponent delimiter. No blanks may be embedded 
 within a numeric literal. The exponent part may not exceed the range ± 999. 
 The base part must be in the range 2-36. 
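
 The digit and base rules can be illustrated for the integer case only. The
 sketch below is a hypothetical helper, not the NUMERICLIT procedure: it
 handles <INTEGER> with an optional base part written in parentheses, as in
 the examples that follow, and it ignores radix points and exponents. The
 rejection of a digit that is not smaller than the base is an assumption; the
 text does not say how the scanner treats that case.

```python
def digit_value(ch: str) -> int:
    """Map 0-9 to 0-9 and A-Z to 10-35."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"{ch!r} is not a digit")


def integer_literal(text: str) -> int:
    """Evaluate an integer literal with an optional base part, e.g. 0A3(16)."""
    digits, base = text, 10
    if text.endswith(")"):
        digits, _, base_text = text[:-1].partition("(")
        base = int(base_text)
    if not 2 <= base <= 36:
        raise ValueError("the base part must be in the range 2-36")
    if not digits or not "0" <= digits[0] <= "9":
        raise ValueError("an integer must start with a decimal digit")
    value = 0
    for ch in digits:
        d = digit_value(ch)
        if d >= base:
            raise ValueError(f"digit {ch!r} is too large for base {base}")  # assumed rule
        value = value * base + d
    return value


print(integer_literal("1011110001110(2)"), integer_literal("0A3456(16)"))
```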
 
 EXAMPLES 
 
 INTEGER 
 
 1 
 
 0A3456 (must start with a decimal digit) 
 FIXED POINT 
 
 1. 
 
 1.ABCDE
 
 .34291 
 REAL 
 
 023 
 
 1.043 
 
 2489732@-728

 77A34Q.9L7@+3
 <*N> 
 
 ABCDE(16) 
 
 3.489023(12) 
 
 1011110001110(2) 
 
 <*S>, the String Literal Metaclass

 TBNF SYNTAX

 <STRING BODY>  ::= [<ANY> but [" not "]]*
 <STRING QUOTE> ::= [definition not legible in the scan]
 <*S>           ::= <STRING QUOTE><STRING BODY><STRING QUOTE>
 
 
 SEMANTICS 
 
 The definition of <STRING BODY> indicates that if the string quote
 appears within the body, it must be doubled. When the string is reduced to a
 BIGTAB entry, the redundant quote is deleted. The string literal must be 
 completely contained in one card-image, and may not exceed 48 characters (not 
 counting redundant double string quotes). 
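
 The doubled-quote rule can be sketched as follows. The function name is
 hypothetical and the string quote is assumed to be the double-quote
 character; the 48-character limit is applied to the stored text, after the
 redundant quotes are deleted.

```python
def string_body(literal: str, quote: str = '"') -> str:
    """Return the text stored for a string literal, or raise ValueError."""
    if len(literal) < 2 or literal[0] != quote or literal[-1] != quote:
        raise ValueError("a string literal must be enclosed in string quotes")
    body = literal[1:-1]
    # After removing every doubled quote, no single quote may remain.
    if quote in body.replace(quote * 2, ""):
        raise ValueError("a string quote inside the body must be doubled")
    stored = body.replace(quote * 2, quote)
    if len(stored) > 48:
        raise ValueError("a string literal may not exceed 48 characters")
    return stored


print(string_body('"""INPUT"""'))
print(string_body('"ILLIAC IV TRANSLATOR WRITING SYSTEM"'))
```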
 
 EXAMPLES 
 
 """INPUT""" (stored as "INPUT")

 "EXPAND", "ILLIAC IV TRANSLATOR WRITING SYSTEM"
 
 THE TWS MACRO FACILITY 
 
 TBNF SYNTAX 
 
 <MACRO DECLARATION>           ::= DEFINE list <MACRO DEFINITION>
                                   separator "," ";"

 <MACRO DEFINITION>            ::= <*I><MACRO FORMAL PARAMETER PART> "="
                                   [<ANY> but <MEND> but DEFINE]* <MEND>

 <MACRO FORMAL PARAMETER PART> ::= "[" list <*I> separator "," "]" |
                                   "(" list <*I> separator "," ")"

 <MEND>                        ::= "?" | MEND | <CONTROL CARD MEND>

 <MACRO INVOCATION>            ::= <*I><MACRO ACTUAL PARAMETER PART>

 <MACRO ACTUAL PARAMETER PART> ::= "[" list <MACRO ACTUAL PARAMETER>
                                   separator "," "]" |
                                   "(" list <MACRO ACTUAL PARAMETER>
                                   separator "," ")"

 <MACRO ACTUAL PARAMETER>      ::= [<ANY> but DEFINE but
                                   <UNBRACKETED COMMA>]*

 <UNBRACKETED COMMA>           ::= ","
 
 SEMANTICS 
 
 The limitation that a macro definition not contain "DEFINE" enforces 
 the rule that macro declarations not be nested. Macro calls may exist either 
 within the defined text, or in an actual parameter. The restriction exists 
 that a macro may not contain a call on itself, either directly or indirectly. 
 
 The unbracketed comma is a recognition that an actual parameter may 
 contain virtually any text. Specifically, if another macro call is in the 
 actual parameter part, or a call on a procedure containing its own actual 
 parameters, delimited by commas, a way must be devised for defining the param- 
 eter delimiter comma. The definition that has been implemented is: 
 
 <UNBRACKETED COMMA> ::= A level zero comma where, when the
                         initial "[" or "(" is recognized, the
                         level is set to zero, and incremented
                         by one for each subsequent "[" or "("
                         and decremented by one for each
                         subsequent "]" or ")".
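
 The level-counting rule can be sketched in Python. The sketch works on a
 character string for readability, whereas the scanner works on the symbols
 it has already assembled; the function name is hypothetical.

```python
def split_actual_parameters(part: str):
    """Split an actual parameter part at its level-zero commas."""
    if not part or part[0] not in "[(":
        raise ValueError('an actual parameter part starts with "[" or "("')
    level = 0                      # set to zero at the initial bracket
    params, current = [], []
    for ch in part[1:]:
        if ch in "[(":
            level += 1
        elif ch in "])":
            if level == 0:         # this bracket closes the parameter part
                params.append("".join(current).strip())
                return params
            level -= 1
        elif ch == "," and level == 0:
            params.append("".join(current).strip())  # unbracketed comma
            current = []
            continue
        current.append(ch)
    raise ValueError("the actual parameter part is not closed")


print(split_actual_parameters("(A[3,4,7,9,12], B[4,3])"))
```

 A comma inside a nested bracket pair is seen at level one or deeper, so it
 stays part of the parameter text.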
 
 This concept could be expanded by modifying the procedure
 PROCESSMACROACTUALPARAMETERPART in a language-specific manner to accommodate such
 additional bracketing pairs that might occur in an actual parameter as
 INTEGER - ";" or REAL - ";".
 
 <CONTROL CARD MEND> is a control card option "MACROEND X", where X may
 be "MEND" or a single special character. If "MEND" appears, then this keyword
 will mark the end of the macro definition. If a single special character
 appears, then that character will end the definition.
 
 EXAMPLES 
 
 <MACRO DECLARATION>
 $ MACROEND #

 DEFINE MACROTABLE(P1) = MACROTAB[(TWSTI <- P1).[36:5],
 TWSTI.[41:7]]#, INCR(X) = X <- X+1 #, ALPHABETIC = [42:6] # ;

 <MACRO INVOCATION>

 . . . MACROTABLE(IF A+B Y THEN ELSE (INCR(W))) . . .
 ADD (A[3,4,7,9,12], B[4,3]);
 
 
 APPENDIX B 
 SAMPLE PROGRAM 
 
 In order to illustrate the concepts described in this paper, we 
 consider a comprehensive example. This begins with the TBNF grammar that 
 defined the initial symbol table and proceeds through the balancing of the 
 symbol table at the beginning of the compilation of a program and through 
 the scanning of the text of a sample program. 
 
 TBNF Grammar for the Language DEMALGOL 
 
 The following language is used as a TWS bench mark. This language, 
 with the addition of the semantic actions, is documented in Trout [4]. 
 
 DEMALGOL

 <PROGRAM>     ::= <BLOCK>;

 <BLOCK>       ::= BEGIN <DECLARATION>* list <STATEMENT>
                   separator ";"
                   END ;

 <DECLARATION> ::= [INTEGER | BOOLEAN | LABEL]
                   [ list <*I> separator "," | <ERROR>];

 <STATEMENT>   ::= [<LABEL> ":"]* [[GO TO | GOTO] <LABEL> |
                   IF <BOOLEAN> THEN <STATEMENT> |
                   IF <BOOLEAN> THEN <STATEMENT>
                   [ELSE <STATEMENT>]? |
                   <VARIABLE> [":=" | "<-"] <VALUE> |
                   <> [ ahead ";" | ahead END | ahead ELSE] |
                   <ERROR>];
 
 
 [Figure 5, the BIGTAB as produced by the syntax preprocessor, is a full-page
 tree diagram that is not recoverable from the scan.]
 
 [A second full-page BIGTAB tree diagram follows here; its caption and
 contents are not recoverable from the scan.]
 
 <LABEL>              ::= <*I>

 <VALUE>              ::= list <ARITHMETIC PRIMARY> separator
                          ["+"|"-"|"x"|"/"];

 <BOOLEAN>            ::= <*I> | "(" <BOOLEAN> ")" |
                          <VALUE> ["=" | "≠" | "<" | ">" | "≤" | "≥"]
                          <VALUE>;

 <ARITHMETIC PRIMARY> ::= "(" <VALUE> ")" | <VARIABLE> | <*N>;

 <VARIABLE>           ::= <*I>;

 <ERROR>              ::= <ANY>[<ANY> but ";" but END]*;

 END
 
 BIGTAB as Initialized by the Syntax Preprocessor 
 
 Figure 5 represents the BIGTAB as produced by the syntax preprocessor for 
 DEMALGOL, with DEFINE and MEND added during the initialization of the run-time 
 compiler. As to format, for the head nodes, four fields are defined: semantic 
 part containing the preprocessor-assigned symbol number for each keyword (in 
 decimal), the length field indicating the number of words (indexed from zero), 
 the number of valid characters in the last word (indexed from one), and the 
 right and left pointer fields (in decimal). For sake of clarity, the non- 
 terminals that would be inserted in the symbol table are omitted from this 
 example. 
 
 SAMPLE DEMALGOL PROGRAM 
 
 $RSWD LIST ALPHANUMERIC - BALANCE 
 
 % SOLUTION OF RIGHT TRIANGLES WITH SIMULATED I/O 
 
 % AFTER AN EXAMPLE IN THE MAD PRIMER [9] 
 
 001 
 002 
 003 
 004 
 
 
 BEGIN                                                   005
 INTEGER A, B, C, X, TEMP                                006
 LABEL GOOD-RIGHT-TRIANGLE, NOT-A-RIGHT-TRIANGLE,        007
 READ-LOOP                                               008
 DEFINE ABS ( ARG, ANS ) =                               009
 TEMP <- ARG;                                            010
 IF TEMP < 0 THEN                                        011
 ANS <- - TEMP                                           012
 ELSE ANS <- TEMP ? ,                                    013
 GO-TO-GOOD =                                            014
 IF X < 0.1 THEN GO TO GOOD-RIGHT-TRIANGLE ? ;           015
 READ-LOOP: % READ A, B, C; WRITE A, B, C                016
 ABS ( ((AxA) + (BxB)) - (CxC), X);                      017
 GO-TO-GOOD;                                             018
 ABS ( ((BxB) + (CxC)) - (AxA), X);                      019
 GO-TO-GOOD;                                             020
 ABS ( ((CxC) + (AxA)) - (BxB), X)                       021
 GO-TO-GOOD;                                             022
                                                         023
 NOT-A-RIGHT-TRIANGLE:                                   024
 % WRITE "NOT A RIGHT TRIANGLE"                          025
 GOTO READ-LOOP;                                         026
                                                         027
 GOOD-RIGHT-TRIANGLE:                                    028
 % WRITE "GOOD RIGHT TRIANGLE"                           029
 GO TO READ-LOOP;                                        030
 END                                                     031
 
 
47 
 
 ACTION BY THE SCANNER ON THIS PROGRAM 
 
 This section will trace the major actions of the scanner on text 
 lines 1-18 of the above program, with the following notes on format: 
 
 a) SCAN descriptors are shown with five fields, representing 
 the class, symbol number or semantic part, BIGTAB pointer 
 or symbol number, class and BIGTAB pointer or symbol number. 
 The values are in decimal. 
 
 b) BIGTAB header words are shown with four fields: semantic
 part, number of words/characters, left tree pointer, right
 tree pointer. For macro definitions and parameters, the
 semantic part is shown as (a/b/c), where a is bit [1:1],
 the macro flag; b is bit [2:1], the formal parameter flag;
 and c is bits [3:12], the macro address or parameter number.
 
 c) BIGTAB data words are shown as they appear in section 2.2, 
 i.e., in character notation. 
 
 d) MACROTAB entries are shown with various fields, as described 
 in section 2.4.3. If a field is designated as unused, it 
 will not be listed below. 
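
 To make the header-word notation of note b) concrete, the following Python
 sketch parses strings such as 77/03/51/0 and (1/1/0)/03/78/0 into named
 fields. The class and function names are hypothetical, and the packing of
 (a/b/c) into one integer assumes only that the three subfields are adjacent
 with a as the most significant bit; it is not a statement of the actual word
 layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HeaderWord:
    semantic: int                          # symbol number, or packed (a/b/c) value
    macro: Optional[Tuple[int, int, int]]  # (a, b, c) when shown in macro form
    words: int                             # index of the last data word (from zero)
    chars: int                             # valid characters in the last word (from one)
    left: int                              # left tree pointer
    right: int                             # right tree pointer


def pack_macro(a: int, b: int, c: int) -> int:
    # Assumption: a, b and the 12-bit field c are adjacent, with a most significant.
    return (a << 13) | (b << 12) | (c & 0xFFF)


def parse_header(text: str) -> HeaderWord:
    """Parse 'sem/wc/left/right' or '(a/b/c)/wc/left/right' as printed in the trace."""
    text = text.strip()
    macro = None
    if text.startswith("("):
        inner, rest = text[1:].split(")", 1)
        a, b, c = (int(part) for part in inner.split("/"))
        macro = (a, b, c)
        semantic = pack_macro(a, b, c)
        rest = rest.lstrip("/")
    else:
        first, rest = text.split("/", 1)
        semantic = int(first)
    length, left, right = rest.split("/")
    return HeaderWord(semantic, macro, int(length[0]), int(length[1]),
                      int(left), int(right))


if __name__ == "__main__":
    print(parse_header("77/03/51/0"))
    print(parse_header("(1/1/0)/03/78/0"))
```

 The two-digit length field is split into its word index and character count,
 so 03 reads as "last word is word 0, holding 3 characters".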
 
$RSWD LIST ALPHANUMERIC - BALANCE                      001

CHARCLASS[#] =
SPSTYP = 0 (i.e., reserved word option rather than special)
CHARCLASS[-]. 44 = 1
(See Figure 7 for the results of balancing the tree)
BIGTAB[1] = 33 (links initial balanced BIGTAB to <*I> tree)
BIGTAB[15] =

 
 Lines 2, 3 and 4 cause no output by the scanner other than the listing. 
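
 The BIGTAB link updates recorded in this trace, such as A being attached to
 AND, behave like ordinary insertions into a binary search tree that starts
 out balanced over the keywords. The Python sketch below illustrates that
 idea only. The keyword subset, the median-split balancing, and the use of
 Python string comparison as the collating order are all assumptions; the
 scanner's own balancing routine and character ordering are not given in this
 section, so the tree shape here will differ from Figure 7.

```python
KEYWORDS = ["AND", "BEGIN", "BOOLEAN", "ELSE", "GO", "GOTO",
            "IF", "INTEGER", "LABEL", "THEN", "TO"]  # hypothetical subset


class Node:
    def __init__(self, name):
        self.name = name
        self.left = None
        self.right = None


def build_balanced(sorted_names):
    """Build a balanced tree by repeatedly taking the middle name as the root."""
    if not sorted_names:
        return None
    mid = len(sorted_names) // 2
    node = Node(sorted_names[mid])
    node.left = build_balanced(sorted_names[:mid])
    node.right = build_balanced(sorted_names[mid + 1:])
    return node


def insert(root, name):
    """Insert name below root; return (parent name, side) or (name, 'found')."""
    node = root
    while True:
        if name == node.name:
            return (node.name, "found")
        side = "left" if name < node.name else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(name))
            return (node.name, side)
        node = child


if __name__ == "__main__":
    tree = build_balanced(sorted(KEYWORDS))
    print(insert(tree, "A"))   # ('AND', 'left')
    print(insert(tree, "B"))   # ('AND', 'right')
```

 With this keyword subset, A and B end up as the left and right descendants
 of AND, which is the same relationship the trace records for card 006.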
 
BEGIN                                                  005

*SCAN descriptor   15/66/24/15/66    BEGIN
 
INTEGER A, B, C, X, TEMP                               006

*SCAN descriptor   15/68/34/15/68    INTEGER
BIGTAB[51]       = 0/01/0/0
BIGTAB[52]       = 00A
BIGTAB[43]       = 77/03/51/0        (link to AND)
*SCAN descriptor   1/0/51/1/51       A
*SCAN descriptor   15/58/58/15/58    ,
BIGTAB[53]       = 0/01/0/0
BIGTAB[54]       = 00B
BIGTAB[43]       = 77/03/51/53       (link to AND)
*SCAN descriptor   1/0/53/1/53       B
*SCAN descriptor   15/58/58/15/58    ,
BIGTAB[55]       = 0/01/0/0
BIGTAB[56]       = 00C
BIGTAB[26]       = 69/11/0/55        (link to BOOLEAN)
*SCAN descriptor   1/0/55/1/55       C
*SCAN descriptor   15/58/58/15/58    ,
BIGTAB[57]       = 0/01/0/0
BIGTAB[58]       = 00X
BIGTAB[33]       = 72/02/31/57       (link to TO)
*SCAN descriptor   1/0/57/1/57       X
*SCAN descriptor   15/58/58/15/58    ,
BIGTAB[59]       = 0/04/0/0
BIGTAB[60]       = 00TEMP
BIGTAB[39]       = 75/04/59/0        (link to THEN)
*SCAN descriptor   1/0/59/1/59       TEMP

LABEL GOOD-RIGHT-TRIANGLE, NOT-A-RIGHT-TRIANGLE,       007
 
*SCAN descriptor   15/70/29/15/70    LABEL
BIGTAB[61]       = 0/31/0/0
BIGTAB[62]       = 00GOOD-R
BIGTAB[63]       = 00IGHT-T
BIGTAB[64]       = 00RIANGL
BIGTAB[65]       = 00E
BIGTAB[35]       = 73/04/61/0        (link to GOTO)
*SCAN descriptor   1/0/61/1/61       GOOD-RIGHT-TRIANGLE
*SCAN descriptor   15/58/58/15/58    ,
BIGTAB[66]       = 0/32/0/0
BIGTAB[67]       = 00NOT-A-
BIGTAB[68]       = 00RIGHT-
BIGTAB[69]       = 00TRIANG
BIGTAB[70]       = 00LE
BIGTAB[59]       = 0/04/66/0         (link to TEMP)
*SCAN descriptor   1/0/66/1/66       NOT-A-RIGHT-TRIANGLE
*SCAN descriptor   15/58/58/15/58    ,

READ-LOOP                                              008
 
BIGTAB[71]       = 0/13/0/0
BIGTAB[72]       = 00READ-L
BIGTAB[73]       = 00OOP
BIGTAB[66]       = 0/32/0/71         (link to NOT-A-RIGHT...)
*SCAN descriptor   1/0/71/1/71       READ-LOOP

DEFINE ABS ( ARG, ANS ) =                              009
 
BIGTAB[74]       = (1/0/0)/03/0/0
BIGTAB[75]       = 00ABS
BIGTAB[51]       = 0/01/0/74         (link to A)
MACROTAB[1]      = 0/74/0/0          (set up head word 2)
BIGTAB[76]       = (1/1/0)/03/0/0
BIGTAB[77]       = 00ARG
BIGTAB[53]       = 0/01/76/0         (link to B)
MACROTAB[4095]   = 0/77              (formal parameter save)
BIGTAB[78]       = (1/1/1)/03/0/0
BIGTAB[79]       = 00ANS
BIGTAB[76]       = (1/1/0)/03/78/0   (link to ARG)
MACROTAB[4094]   = 0/79              (formal parameter save)

TEMP ← ARG;                                            010
 
MACROTAB[2]      = 1/0/59/1/59       (scan of TEMP)
MACROTAB[3]      = 15/31/31/15/31    (scan of ←)
MACROTAB[4]      = 10/0/0            (formal parameter)
MACROTAB[5]      = 15/46/46/15/46    (scan of ;)

IF TEMP < 0 THEN                                       011
 
MACROTAB[6]      = 15/74/37/15/74    (scan of IF)
MACROTAB[7]      = 1/0/59/1/59       (scan of TEMP)
MACROTAB[8]      = 15/30/30/15/30    (scan of <)
BIGTAB[77]       = 0/06/0/0
BIGTAB[78]       = 01000000
BIGTAB[2]        = 0/0/0/77          (start numeric tree)
MACROTAB[9]      = 2/0/77/2/77       (scan of 0)
MACROTAB[10]     = 15/75/39/15/75    (scan of THEN)

ANS ← 0 - TEMP                                         012
 
MACROTAB[11]     = 10/0/1            (formal parameter ANS)
MACROTAB[12]     = 15/31/31/15/31    (scan of ←)
MACROTAB[13]     = 2/0/77/2/77       (scan of 0)
MACROTAB[14]     = 15/44/44/15/44    (scan of -)
MACROTAB[15]     = 1/0/59/1/59       (scan of TEMP)

ELSE ANS ← TEMP ? ,                                    013
 
MACROTAB[16]     = 15/76/41/15/76    (scan of ELSE)
MACROTAB[17]     = 10/0/1            (formal parameter ANS)
MACROTAB[18]     = 15/31/31/15/31    (scan of ←)
MACROTAB[19]     = 1/0/59/1/59       (scan of TEMP)
MACROTAB[20]     = 15/14/14/15/14    (scan of ?)
BIGTAB[78]       = 0/03/0/0          (reset BIGTAB semantic-ARG)
BIGTAB[76]       = 0/03/0/0          (reset BIGTAB semantic-ANS)
MACROTAB[0]      = 20/0/2            (set up head word 1)

GO-TO-GOOD =                                           014
 
BIGTAB[79]       = (1/1/21)/14/0/0
BIGTAB[80]       = 00GO-TO-
BIGTAB[81]       = 00GOOD
BIGTAB[61]       = 0/31/79/0         (link to GOOD-RIGHT...)
MACROTAB[22]     = 0/79/0/0          (set up head word 2)

IF X < 0.1 THEN GO TO GOOD-RIGHT-TRIANGLE ? ;          015
 
MACROTAB[23]     = 15/74/37/15/74    (scan of IF)
MACROTAB[24]     = 1/0/57/1/57       (scan of X)
MACROTAB[25]     = 15/30/30/15/30    (scan of <)
BIGTAB[82]       = 0/06/0/0
BIGTAB[83]       = : 2011 000        (numeric 0.1)
BIGTAB[77]       = 0/06/82/0         (link to 0)
MACROTAB[26]     = 2/0/82/2/82       (scan of 0.1)
MACROTAB[27]     = 15/75/39/15/75    (scan of THEN)
MACROTAB[28]     = 15/71/31/15/31    (scan of GO)
MACROTAB[29]     = 15/72/33/15/33    (scan of TO)
MACROTAB[30]     = 1/0/61/1/61       (scan of GOOD-RIGHT-...)
MACROTAB[31]     = 15/14/14/15/14    (scan of ?)
MACROTAB[21]     = 31/0/0            (set macro head word 1)

READ-LOOP: % READ A, B, C; WRITE A, B, C               016
 
*SCAN descriptor   1/0/71/1/71       READ-LOOP
*SCAN descriptor   15/13/13/15/13

ABS ( ((AxA) + (BxB)) - (CxC), X);                     017
 
MACROTAB[0]      = 20/32/2           (set actual parameter address in head word 1)
MACROTAB[33]     = 15/29/29/15/29    (scan of (, begin actual parameter)
MACROTAB[34]     = 15/29/29/15/29    (scan of ( )
MACROTAB[35]     = 1/0/51/1/51       (scan of A)
MACROTAB[36]     = 15/32/32/15/32    (scan of x)
MACROTAB[37]     = 1/0/51/1/51       (scan of A)
MACROTAB[38]     = 15/45/45/15/45    (scan of ) )
MACROTAB[39]     = 15/16/16/15/16    (scan of +)
MACROTAB[40]     = 15/29/29/15/29    (scan of ( )
MACROTAB[41]     = 1/0/53/1/53       (scan of B)
MACROTAB[42]     = 15/32/32/15/32    (scan of x)
MACROTAB[43]     = 1/0/53/1/53       (scan of B)
MACROTAB[44]     = 15/45/45/15/45    (scan of ) )
MACROTAB[45]     = 15/45/45/15/45    (scan of ) )
MACROTAB[46]     = 15/44/44/15/44    (scan of -)
MACROTAB[47]     = 15/29/29/15/29    (scan of ( )
MACROTAB[48]     = 1/0/55/1/55       (scan of C)
MACROTAB[49]     = 15/32/32/15/32    (scan of x)
MACROTAB[50]     = 1/0/55/1/55       (scan of C)
MACROTAB[51]     = 15/45/45/15/45    (scan of ) )
MACROTAB[52]     = 15/58/58/15/58    (scan of parameter delimiter comma)
MACROTAB[32]     = 19/33/0/0         (actual parameter addresses and lengths)
MACROTAB[53]     = 1/0/57/1/57       (scan of X, parameter 2)
MACROTAB[54]     = 15/45/45/15/45    (scan of ), end of parameter part)
MACROTAB[32]     = 19/33/1/53        (actual parameter addresses and lengths)
MACROTAB[20]     = 9/0/0             (set outermost return descriptor)
*SCAN descriptor   1/0/59/1/59       (TEMP, from MACROTAB[2])
*SCAN descriptor   15/31/31/15/31    (←, from MACROTAB[3])
MACROTAB[52]     = 9/5/0             (set parameter 1 return)
*SCAN descriptor   15/29/29/15/29    ( (, from MACROTAB[33], actual parameter table)

*SCAN descriptor   15/45/45/15/45    ( ), from MACROTAB[51])
*SCAN descriptor   15/46/46/15/46    (;, from MACROTAB[5])

*SCAN descriptor   15/75/39/15/75    (THEN, from MACROTAB[10])
MACROTAB[54]     = 9/12/0            (set parameter 2 return)
*SCAN descriptor   1/0/57/1/57       (X, from MACROTAB[53], actual parameter table)
*SCAN descriptor   15/31/31/15/31    (←, from MACROTAB[12])

*SCAN descriptor   15/76/41/15/76    (ELSE, from MACROTAB[16])
MACROTAB[54]     = 9/18/0            (parameter 2 return word)
*SCAN descriptor   1/0/57/1/57       (X, from MACROTAB[53])
*SCAN descriptor   15/31/31/15/31    (←, from MACROTAB[18])
*SCAN descriptor   1/0/59/1/59       (TEMP, from MACROTAB[19])
*SCAN descriptor   15/46/46/15/46    (;, from card 17)

GO-TO-GOOD;                                            018
 
MACROTAB[31]     = 9/0/0             (set outermost return)
*SCAN descriptor   15/74/37/15/74    (IF, from MACROTAB[23])

*SCAN descriptor   1/0/61/1/61       (GOOD-RIGHT-TRIANGLE, from MACROTAB[30])
*SCAN descriptor   15/46/46/15/46    (;, from card 18)
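
 The net effect of the macro expansion traced for cards 17 and 18 can be
 checked with a small token-level sketch in Python. It is not the scanner's
 MACROTAB mechanism (there are no return descriptors or parameter tables here);
 it only substitutes actual parameters for formal parameters in a stored token
 list. The function names are hypothetical, "<-" stands for the left-arrow
 assignment symbol, and macro calls inside macro bodies are not handled.

```python
MACROS = {
    # name: (formal parameters, body tokens), taken from cards 9-15
    "ABS": (["ARG", "ANS"],
            "TEMP <- ARG ; IF TEMP < 0 THEN ANS <- 0 - TEMP ELSE ANS <- TEMP".split()),
    "GO-TO-GOOD": ([], "IF X < 0.1 THEN GO TO GOOD-RIGHT-TRIANGLE".split()),
}


def read_actuals(tokens, i):
    """Read '( a1 , a2 ... )' starting at tokens[i]; return (actuals, next index)."""
    if tokens[i] != "(":
        raise SyntaxError("expected '(' after macro name")
    i += 1
    actuals, current, depth = [], [], 0
    while True:
        tok = tokens[i]
        i += 1
        if tok == ")" and depth == 0:      # end of the parameter part
            actuals.append(current)
            return actuals, i
        if tok == "," and depth == 0:      # parameter delimiter comma
            actuals.append(current)
            current = []
            continue
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        current.append(tok)


def expand(tokens, macros=MACROS):
    """Replace each macro call in tokens by its body with parameters substituted."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok not in macros:
            out.append(tok)
            continue
        formals, body = macros[tok]
        actuals = []
        if formals:
            actuals, i = read_actuals(tokens, i)
        if len(actuals) != len(formals):
            raise SyntaxError("wrong number of actual parameters for " + tok)
        binding = dict(zip(formals, actuals))
        for body_tok in body:
            out.extend(binding.get(body_tok, [body_tok]))
    return out


if __name__ == "__main__":
    card17 = "ABS ( ( ( A x A ) + ( B x B ) ) - ( C x C ) , X ) ;".split()
    print(" ".join(expand(card17)))
    print(" ".join(expand("GO-TO-GOOD ;".split())))
```

 For card 17 the first actual parameter comes out as 19 tokens and the second
 as one token, which matches the lengths recorded in MACROTAB[32] above.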
 
 
 LIST OF REFERENCES 
 
 [1] Abel, N. A., et al., "TRANQUIL: A Language for an Array Processing
 Computer," Proceedings of the Spring Joint Computer Conference, 1969.

 [2] Lawrie, D., "GLYPNIR. A List Processing Language for Illiac IV,"
 Department of Computer Science, University of Illinois at
 Urbana-Champaign, Report No. 322, April 1969.

 [3] Feldman, J., and D. Gries, "Translator Writing Systems,"
 Communications of the ACM, Vol. 11, No. 2, pp. 101-113, February 1968.

 [4] Trout, H. R. G., "A BNF-like Language for the Description of Syntax
 Directed Compilers," Department of Computer Science, University of
 Illinois at Urbana-Champaign, Report No. 300, January 1969.

 [5] Machado, N. C., "ISL--A Semantics Language for a Translator System,"
 M.S. thesis, Department of Computer Science, University of
 Illinois at Urbana-Champaign, Report No. 367, December 1969.

 [6] Knuth, D. E., The Art of Computer Programming, Volume 1, Fundamental
 Algorithms, Addison-Wesley, Reading, Mass., 1968.

 [7] Mercer, D., "TWINKLE--A Syntax Language for a Translator Writing
 System," M.S. thesis, Department of Computer Science, University of
 Illinois at Urbana-Champaign, Report No. 396, May 1970.

 [8] Burroughs B-5500 Information Processing Systems Reference Manual,
 Burroughs Corporation, Detroit, Mich., September 1968.

 [9] Organick, E. I., A MAD Primer, published by the author, 1964.
 
BIBLIOGRAPHIC DATA SHEET

 1. Report No.: UIUCDCS-R-73-596
 3. Recipient's Accession No.:
 4. Title and Subtitle: A GENERALIZED LEXICAL SCANNER FOR A TRANSLATOR WRITING SYSTEM
 5. Report Date: October 1973
 6.
 7. Author(s): Albert Cannon Baker, Jr.
 8. Performing Organization Rept. No.: UIUCDCS-R-73-596
 9. Performing Organization Name and Address:
    University of Illinois at Urbana-Champaign
    Department of Computer Science
    Urbana, Illinois 61801
 10. Project/Task/Work Unit No.:
 11. Contract/Grant No.: US NSF-GJ-596
 12. Sponsoring Organization Name and Address:
    National Science Foundation
    Washington, D. C. 20550
 13. Type of Report & Period Covered: Master's Thesis
 14.
 15. Supplementary Notes:
 16. Abstracts:

 This is an expository paper that is concerned with a Lexical Scanner for a
 translator writing system that has been in use at the University of Illinois. Its
 significant features include a structured, binary-tree symbol table, a parameterized
 macro expander, and a compile-time flexibility for assigning characters that make up
 the basic terminal symbols. A comprehensive example of the scanner's operation is
 also included.

 17. Key Words and Document Analysis. 17a. Descriptors:
    Lexical scanner, translator writing system, compiler-compiler
 17b. Identifiers/Open-Ended Terms:
 17c. COSATI Field/Group:
 18. Availability Statement: UNLIMITED
 19. Security Class (This Report): UNCLASSIFIED
 20. Security Class (This Page): UNCLASSIFIED
 21. No. of Pages: 61
 22. Price:

 FORM NTIS-35 (10-70)                                   USCOMM-DC 40329-P71
 