 
 TWINKLE--A SYNTAX LANGUAGE FOR A
 TRANSLATOR WRITING SYSTEM

 by
 
 Robert Leroy Mercer 
 
 ILLIAC IV Document No. 218 
 
 
Report No. 396 
 
 TWINKLE--A SYNTAX LANGUAGE FOR A
 TRANSLATOR WRITING SYSTEM* 
 
 by 
 
 Robert Leroy Mercer 
 
 May 15, 1970 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801
 
 *This work was supported in part by the Advanced Research Projects
 Agency as administered by the Rome Air Development Center under 
 Contract No. USAF 30(602)-4144 and submitted in partial fulfillment
 of the requirements for the degree of Master of Science in Computer 
 Science, February 1970. 
 
 
 ABSTRACT 
 
 TWINKLE is a language designed to aid in the syntactic 
 specification of programming languages. In addition to the constructs 
 available in BNF, TWINKLE provides for easy specification of lists and 
 other frequently used linguistic structures. By providing a large number 
 of alternatives for its various constructs, TWINKLE allows the language 
 designer to specify a language in terms that approach natural language. 
 
 The implementation of a compiler for TWINKLE is described. 
 This compiler is the first phase of the ILLIAC IV Translator Writing 
 System. 
 
 ACKNOWLEDGEMENTS
 
 The author wishes to express his appreciation for the advice and 
 efforts of Dr. Robert S. Northcote who has helped immeasurably in the creation 
 of this paper. 
 
 Thanks are also due the author's colleagues Alan Beals, Nelson 
 Machado, and Jacques LaFrance whose contributions to the language described 
 herein and discussions on the translator writing system have been invaluable. 
 
 For financial support, I wish to acknowledge the National 
 Science Foundation for its award of a fellowship. I also acknowledge support 
 by the ILLIAC IV project for willing provision of the necessary computer and
 other physical facilities. 
 
 Finally, deep gratitude is expressed to Mrs. Sandy McCabe and Mrs. 
 Shirley Brown for their time and effort in typing the manuscript. 
 
 TABLE OF CONTENTS

 1. INTRODUCTION
    1.1 Backus Naur Form
    1.2 Translatable Backus Naur Form

 2. THE TWINKLE METALANGUAGE FOR SYNTACTIC SPECIFICATION
    2.1 Syntactic Symbols
        2.1.1 Terminals
              2.1.1.1 Characters
              2.1.1.2 Special Words
              2.1.1.3 Character Mode Terminals
              2.1.1.4 Meta-Terminals
              2.1.1.5 Blanks
        2.1.2 Nonterminals
        2.1.3 Any Symbols
        2.1.4 Square Brackets
        2.1.5 Maybe Symbols
        2.1.6 Enclosures
        2.1.7 Unordered List
        2.1.8 Precedence Structures
        2.1.9 Lists
              2.1.9.1 Head
              2.1.9.2 Type
              2.1.9.3 Base
              2.1.9.4 Separator
              2.1.9.5 Tail
        2.1.10 Seeded Lists
    2.2 Semantic Symbols
        2.2.1 Actions and Tests
        2.2.2 Simple Calls
        2.2.3 Declaration Calls
        2.2.4 Implicit Calls
        2.2.5 Parameterized Semantic Calls
        2.2.6 Bit Actions and Tests
        2.2.7 The Tail
        2.2.8 Placement of Calls
    2.3 Null and Empty Symbols

 3. CONTROL OF THE TWINKLE TRANSLATION
    3.1 The Translator Writing System
    3.2 Control Statements
        3.2.1 Language Name Designation
        3.2.2 Print Options
        3.2.3 The Parser Type Option
        3.2.4 Zip Control
        3.2.5 Program Parameter Control
        3.2.6 Executable Compiler Options
        3.2.7 Miscellaneous Control Options
    3.3 Burroughs B-5500 Control Cards for Executing TWINKLE

 4. IMPLEMENTATION OF THE TWINKLE TRANSLATOR
    4.1 NONTAB, SYMTAB, and OPRTAB
    4.2 PROTAB, PRODS and PDLIST
    4.3 PRODSTACK and LPSTACK
    4.4 Any Patterns
    4.5 Grammar Transformations
        4.5.1 Back Context Absorption
        4.5.2 Empty Removal
        4.5.3 Back Substitution of Singly Defined Nonterminals
        4.5.4 Dummy Insertion
    4.6 TWINKLE Output Files
        4.6.1 TABLESF
        4.6.2 ACTIONS
    4.7 ZIP Files

 5. SUMMARY

 APPENDIX
    A. Reserved Words for TWINKLE
    B. The Syntax of TWINKLE written in TWINKLE and in BNF

 LIST OF REFERENCES
 
 LIST OF FIGURES

 Figure
 1. The syntax of the list with separator
 2. Block diagram of the Translator Writing System
 3. An entry in the NONTAB table
 4. PROTAB word format
 5. The format of a PRODSTACK
 6. The format of a LPSTACK
 7. Left recursive and right recursive lists
 
1. INTRODUCTION 
 
 The recent proliferation of digital computers has spawned an ever 
 increasing number of formal languages for computer programming and related 
 purposes. Creating a compiler for such a formal language is a decidedly non- 
 trivial task, often requiring several man-years of effort. Therefore, from 
 this burgeoning stock of languages and compilers, several widely applicable
 compiler writing techniques have been extracted which at once lead to a deeper 
 understanding of the compiler writing process and to a considerable reduction 
 of the effort involved. 
 
 Because of its importance in obtaining a clear and precise definition 
 of a formal language, the development of syntax metalanguages has been inti- 
 mately related to the development of compiler writing techniques. These meta- 
 languages range from Backus Naur Form (BNF) [1], and its many variants 
 [2, 3, 4, 5], to languages more suitable to syntactic recognition, such as
 the Floyd production language (FPL) [6] and operator precedence tables [7],
 and even to the conventional programming languages, FORTRAN [8] and PL1 [9].
 Each of these languages has certain advantages: relative compactness and clarity 
 of syntactic structure in the case of BNF and its derivatives; a very clear and 
 explicit statement of the recognition algorithm in the case of FPL and operator
 precedence tables; and, finally, virtually immediate implementation in the
 case of FORTRAN and PL1. As is to be expected, ease of producing a linguistic
 description decreases rapidly as the description itself approaches an imple- 
 mented compiler. 
 
 The primary aim of the TWINKLE metalanguage is to provide a major 
 increase in the ease with which a syntactic specification may be created by 
 a language designer and in the ease with which that syntax may be understood
 by a user of the language unfamiliar with metalanguages in general. This has
 been achieved through the introduction of a wide variety of syntactic symbols
 for designating many of the common syntactic structures such as lists, en- 
 closures, etc., and through the provision of numerous English words and 
 phrases, which may be used with commonly understood meanings, as an integral 
 part of the syntactic specification. 
 
 As a secondary aim, TWINKLE has been designed to present a unified 
 front for the University of Illinois Translator Writing System (TWS). TWINKLE 
 is the input language and the TWINKLE translator is the first phase of the TWS. 
 Thus, TWINKLE combines BNF as described by Beals [3] and translatable BNF 
 (TBNF) as described by Trout [4]. Before progressing to a detailed descrip-
 tion of the TWINKLE language and translator, which occupies the remainder of
 this thesis, a brief description is provided of both BNF and TBNF. 
 
 1.1 Backus Naur Form 
 
 The basic unit of the BNF description of a language is the produc-
 tion. A production consists of a nonterminal (the left hand side) followed
 by the symbol triple "::=", followed by a list of terminals and nonterminals
 (the right hand side). Each nonterminal consists of a string of characters
 enclosed in either quotes (" ") or angle brackets (< >). The string may
 not include ", <, or > and may not start with *. Terminals are special words
 (strings of alphanumeric characters preceded by #) or characters (A, B, C,
 etc.). Characters used in the metalanguage (#, ", <, /, etc.) must also be
 preceded by a # when used as terminals. Two productions with the same left
 hand side may be combined into one by including the right hand side of the
 second with the right hand side of the first and separating it from the latter
 with the metacharacter "/". Productions themselves are separated from one
 another by the metacharacter ";".
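
 As a rough illustration of these conventions, the following Python sketch
 splits a description written in this BNF dialect into productions and
 alternatives. It is not part of the TWS: the names TOKEN and parse_bnf are
 hypothetical, and the sketch assumes that a sharp escapes either one
 following character or a following run of letters and digits.

```python
import re

# One token is: a nonterminal in angle brackets or quotes, the "::=" triple,
# a sharp-prefixed special word, a sharp-prefixed character, or any other
# single non-blank character.
TOKEN = re.compile(r'<[^<>"]*>|"[^<>"]*"|::=|#[A-Za-z0-9]+|#\S|\S')

def parse_bnf(text):
    """Return {left hand side: [alternative, ...]}; each alternative is a token list."""
    grammar = {}
    tokens = TOKEN.findall(text)
    i = 0
    while i < len(tokens):
        lhs = tokens[i]
        if i + 1 >= len(tokens) or tokens[i + 1] != "::=":
            raise ValueError("expected '::=' after " + lhs)
        i += 2
        # Productions with the same left hand side are merged.
        alternatives = grammar.setdefault(lhs, [])
        current = []
        while i < len(tokens) and tokens[i] != ";":
            if tokens[i] == "/":      # an unescaped slash separates alternatives
                alternatives.append(current)
                current = []
            else:
                current.append(tokens[i])
            i += 1
        alternatives.append(current)
        i += 1                        # skip the ';' that ends the production
    return grammar
```

 Because "#;" and "#/" are read as single tokens, an escaped semicolon or
 slash is kept as a terminal and does not end a production or an alternative.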
 
1.2 Translatable Backus Naur Form 
 
 TBNF is, in itself, a large step toward simplifying syntactic spec- 
 ification. In addition to the BNF structures described in the last section, 
 TBNF allows: 
 
 (i) Kleene star:
 <a>* = λ | <a> | <a> <a> | ... ;
 (ii) Ampersand for optional presence of some symbol:
 <a>& = <a> | λ ;
 (iii) Square bracket construct to delimit groups of symbols and
 alternatives:
 <a> ::= <b> [<c> | <d>] z
 is equivalent to
 <a> ::= <b> <e> z
 <e> ::= <c> | <d> ;
 (iv) list <a> = <a> <a>* ;

 (v) list <a> separator <b> = <a> [<b> <a>]* ;
 (vi) <ANY> = Any symbol at all ;
 (vii) BUT <a> used in conjunction with <ANY> to reduce its generality.
 
 Thus TBNF is considerably more general than BNF. Note, however, that TBNF 
 does not allow left recursion because the parser generated employs recursive 
 descent. 
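
 The star, ampersand, and list constructs can be pictured as small recursive
 descent recognizers. The Python sketch below is only an illustration under
 simplifying assumptions: it works on a list of tokens, it is greedy and does
 not backtrack, and all of the function names are hypothetical. It is not the
 parser that the TWS generates.

```python
def sym(s):
    """Recognize the single token s; return the next position or None."""
    def parse(tokens, i):
        return i + 1 if i < len(tokens) and tokens[i] == s else None
    return parse

def seq(*parsers):
    """Recognize each parser in turn."""
    def parse(tokens, i):
        for p in parsers:
            i = p(tokens, i)
            if i is None:
                return None
        return i
    return parse

def star(p):
    """Kleene star: zero or more repetitions of p."""
    def parse(tokens, i):
        while True:
            j = p(tokens, i)
            if j is None or j == i:   # stop on failure or on no progress
                return i
            i = j
    return parse

def maybe(p):
    """Ampersand: p is optional."""
    def parse(tokens, i):
        j = p(tokens, i)
        return i if j is None else j
    return parse

def list_of(p, separator=None):
    """list p = p p* ;  list p separator s = p [s p]*"""
    rest = p if separator is None else seq(separator, p)
    return seq(p, star(rest))

def accepts(p, tokens):
    """True when p recognizes the whole token list."""
    return p(tokens, 0) == len(tokens)
```

 A procedure written in this style for a left recursive rule would call
 itself before consuming any input and so would never terminate, which is
 why a recursive descent parser cannot accept left recursion directly.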
 
2. THE TWINKLE METALANGUAGE FOR SYNTACTIC SPECIFICATION 
 
 When work was begun on the TWINKLE language, two metalanguages, 
 BNF and TBNF, were already in use at the University of Illinois as input 
 languages to the TWS. The BNF input yielded a deterministic parsing algor- 
 ithm based on the Floyd Production Language (FPL), as described by Beals 
 [3]; while TBNF input yielded a recursive descent (RD) parsing algorithm,
 as described by Trout [4]. Each parser has certain advantages but,
 once either BNF or TBNF has been chosen as the metalanguage, it requires 
 a major effort to convert the description to the alternate form. Thus, 
 although it would be desirable, because of its relatively rapid generation, 
 to create a RD parser during the debugging phases of the language descrip- 
 tion, it would be equally desirable, because of the rigorous exclusion of 
 ambiguity inherent in the nature of the deterministic FPL parser, to 
 create a FPL parser when the last phase of the compiler creation is reached. 
 Unfortunately this ideal has not been attainable, in the past, primarily 
 due to the difficulty of translating TBNF into BNF by hand. These considera- 
 tions, therefore, dictate that TWINKLE be a superset of both BNF and TBNF, 
 so that existing language specifications may be accepted by the new system 
 with little or no change, and that the TWINKLE translator output be either 
 BNF or TBNF. 
 
 The basic form of TWINKLE, therefore, is cast in the familiar 
 BNF mode. That is, a TWINKLE syntactic specification consists of one or 
 more productions, each of which has a left hand side, which is the non-
 terminal being defined (wholly, or in part) by the production, and a
 right hand side which comprises a set of alternative definitions for the
 nonterminal in the left hand side. Each definition, in turn, comprises
 a string of TWINKLE syntactic and semantic symbols. In BNF the produc-
 tions are separated from one another by semicolons; the definitions by
 vertical bars which are actually rendered in the implementation by a
 slash; and the left and right hand sides of a production by the character
 triple "::=". It is one of the aims of the TWINKLE project to make
 possible the rigorous specification of a language syntax in a way that is 
 at once acceptable to both human beings and computers. To this end, 
 English phrases have been provided for the replacement and/or embellishment 
 of the metalinguistic symbolism of BNF and TBNF. In addition, several 
 features not present in either BNF or TBNF have been introduced. For 
 example, the BNF productions, 
 
 <program> ::= #BEGIN <statement list> #END /
               #BEGIN #END;
 <statement list> ::= <statement> / <statement list> #; <statement>;
 
 may be written in TWINKLE in the much more readable form: 
 
 A <program> CONSISTS OF A POSSIBLY EMPTY LIST OF <statement>S
 SEPARATED BY SEMICOLONS ENCLOSED IN #BEGIN AND #END;                (1)
 
 At first glance this might appear to have two distinct interpretations: 
 particularly, a program may be either 
 
 BEGIN s ; s ; s ; ... ; s END 
 or 
 
 s BEGIN; END s BEGIN; END s ... BEGIN; END s 
 
 where "s" here stands for <statement>. In fact, the former interpretation
 is made. If the language designer wishes to express the latter form of 
 program, he may write : 
 
 A <program> CONSISTS OF A POSSIBLY EMPTY LIST OF <statement>S
 SEPARATED BY [SEMICOLONS ENCLOSED IN #BEGIN AND #END];              (2)
 
 In TWINKLE, the square brackets ([]) serve the function of delineating 
 clause and phrase structure in productions. The enclosure operator always 
 acts on the immediately preceding syntactic symbol which is at the same 
 nesting level as the enclosure operator, itself. Thus, in production (1),
 it is the possibly empty list that is to be enclosed and not the semicolons. 
 In production (2), on the other hand, the square brackets associate the 
 enclosure operator with the semicolons and indicate that this construct, 
 taken as a whole, is to be the separator of the possibly empty list. 
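
 The difference between the two readings can be made concrete with a small
 Python sketch that builds the sentence each reading describes. The function
 names are hypothetical and serve only to show which symbol the enclosure
 operator acts upon.

```python
def enclose(item, left, right):
    """Place item between two delimiters, omitting an empty item."""
    return " ".join(part for part in (left, item, right) if part)

def separated(items, separator):
    """Join the items of a (possibly empty) list with a separator."""
    return f" {separator} ".join(items)

def reading_1(statements):
    # The enclosure acts on the whole possibly empty list.
    return enclose(separated(statements, ";"), "BEGIN", "END")

def reading_2(statements):
    # The enclosure acts on the semicolon; the enclosed semicolon
    # then serves as the separator of the list.
    return separated(statements, enclose(";", "BEGIN", "END"))
```

 With three statements the first function yields the familiar compound
 statement, while the second places "BEGIN ; END" between each pair of
 statements.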
 
 It is to be noted, however, that the TWINKLE language does not 
 enforce strict grammatical usage of English but, rather, allows for such 
 usage by the language designer. Thus, it will be found that, in the TWINKLE 
 syntax, the articles "A" and "AN" are treated equally with the result that 
 a determined corruptor of English might write (1) as

 AN <program> CONSISTS OF AN POSSIBLY EMPTY LIST OF <statement>S
 SEPARATED BY SEMICOLONS ENCLOSED IN #BEGIN AND #END;                (3)
 
 The TWINKLE program, in its most general form, is made up of 
 three distinct portions, of which the language syntactic description is 
 the second. The first portion is a set of control statements which deter-
 mine, among other things, the nature and volume of the TWINKLE output and
 the processing options for the remainder of the TWS. The construction
 and use of this portion is dealt with in Chapter 3. The third portion,
 the semantic tail, conveys the necessary semantic information about the 
 language being described. This is discussed briefly in 2.2.5, and more 
 fully by Machado [10]. The remainder of the current chapter is a 
 discussion of the syntactic and semantic symbols available to the language 
 designer for the syntactic portion of the TWINKLE system. 
 
 2.1 Syntactic Symbols 
 2.1.1 Terminals 
 
 Terminals are those symbols of which the object program is 
 ultimately composed. They may occur only on the right hand side of a 
 production and are represented in the syntax in a number of different 
 ways. Terminals fall into the following classes: characters, special
 words, character mode terminals, meta- or special class terminals, and 
 blanks. 
 2.1.1.1 Characters 
 
 The simplest form of the terminal is the character. Here
 character refers to any of the twenty-seven special characters accepted
 by equipment of the Burroughs B-5500 computer system. These are included,
 together with the other thirty-seven B-5500 characters, in Table I. In
 the syntax, a character may be represented by prefixing it with a sharp
 (#) when the character standing alone would have some special significance
 to TWINKLE (i.e., when the character is a meta-character). For example,
 a comma is represented in the syntax by the symbol pair "#,". However,
 since more than half of the special characters available are meta-characters,
 it is probably safest to include the sharp in all cases. This is a point
 at which TWINKLE diverges from the standard BNF and TBNF. The latter have
 fewer meta-characters and, as such, require fewer sharps. Although it is
 very easy to insert the necessary sharps, it may be desirable to make a
 preliminary run through the TWINKLE translator alone to isolate what trouble
 spots there may be.

 Table I. The Burroughs B-5500 Character Set

 Char = character; Code = internal code; Bases = classes of any base symbols*

 Char  Code Bases    | Char  Code Bases    | Char  Code Bases    | Char         Code Bases
 0     0    0,1,3    | A     17   0,1,2    | K     34   0,1,2    | T            51   0,1,2
 1     1    0,1,3    | B     18   0,1,2    | L     35   0,1,2    | U            52   0,1,2
 2     2    0,1,3    | C     19   0,1,2    | M     36   0,1,2    | V            53   0,1,2
 3     3    0,1,3    | D     20   0,1,2    | N     37   0,1,2    | W            54   0,1,2
 4     4    0,1,3    | E     21   0,1,2    | O     38   0,1,2    | X            55   0,1,2
 5     5    0,1,3    | F     22   0,1,2    | P     39   0,1,2    | Y            56   0,1,2
 6     6    0,1,3    | G     23   0,1,2    | Q     40   0,1,2    | Z            57   0,1,2
 7     7    0,1,3    | H     24   0,1,2    | R     41   0,1,2    | ,            58   0,1,4
 8     8    0,1,3    | I     25   0,1,2    | $     42   0,1,4    | %            59   0,1,4
 9     9    0,1,3    | .     26   0,1,4    | *     43   0,1,4,6  | ≠            60   0,1,4,5
 #     10   0,1,4    | [     27   0,1,4    | -     44   0,1,4,6  | =            61   0,1,4,5
 @     11   0,1,4    | &     28   0,1,4    | )     45   0,1,4    | ]            62   0,1,4
 ?     12   0,1,4    | (     29   0,1,4    | ;     46   0,1,4    | "            63   0,1,4
 :     13   0,1,4    | <     30   0,1,4,5  | ≤     47   0,1,4,5  | special word -    0,7
 >     14   0,1,4,5  | ←     31   0,1,4    | blank 48   0,1      | <*I>         -    0,7
 ≥     15   0,1,4,5  | ×     32   0,1,4,6  | /     49   0,1,4,6  | <*N>         -    0,7
 +     16   0,1,4,6  | J     33   0,1,2    | S     50   0,1,2    | <*S>         -    0,7

 * 0 = Any Terminal
   1 = Any Character
   2 = Any Letter
   3 = Any Digit
   4 = Any Special Character
   5 = Any Relational Operator
   6 = Any Algebraic Operator
   7 = Any Non-Character
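
 The classification in Table I can be expressed compactly. The Python sketch
 below is an illustration only: the string B5500 lists the characters in
 order of internal code as read from the table (a few glyphs are hard to make
 out in the source and are assumptions), and the names B5500, RELATIONAL,
 ALGEBRAIC, and any_bases are hypothetical.

```python
# The 64 B-5500 characters in internal-code order (index = internal code).
# Assumption: glyphs reconstructed from Table I.
B5500 = ('0123456789#@?:>≥+ABCDEFGHI.[&(<←×JKLMNOPQR'
         '$*-);≤ /STUVWXYZ,%≠=]"')

RELATIONAL = set(">≥<≤≠=")   # class 5
ALGEBRAIC = set("+×*-/")     # class 6

def any_bases(ch):
    """Return the set of any-base classes (0 to 6) to which a character belongs."""
    if len(ch) != 1 or ch not in B5500:
        raise ValueError("not a B-5500 character: " + repr(ch))
    bases = {0, 1}                    # every character is a terminal and a character
    if "0" <= ch <= "9":
        bases.add(3)                  # any digit
    elif "A" <= ch <= "Z":
        bases.add(2)                  # any letter
    elif ch != " ":                   # the blank belongs to classes 0 and 1 only
        bases.add(4)                  # any special character
        if ch in RELATIONAL:
            bases.add(5)
        if ch in ALGEBRAIC:
            bases.add(6)
    return bases
```

 Counting the characters that fall in class 4 gives the twenty-seven special
 characters mentioned in the text; the ten digits, twenty-six letters, and
 the blank make up the other thirty-seven.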
 
 An alternative method for indicating a character in the syntax 
 which avoids the details of the sharp, and which provides for a more 
 readable syntax, consists of writing down the English word or phrase which 
 identifies the character in question. Thus, in production (1) above,
 SEMICOLONS is used in place of the equally acceptable "#;". While this 
 form is not available for all of the special characters, it proves quite 
 useful in practice. A complete list of these alternatives is given in 
 the TWINKLE syntax (see Appendix B). 
 2.1.1.2 Special Words 
 
 Many times it is convenient to consider a group of letters and 
 digits as being, conceptually, a single terminal. Thus, in languages of 
 the ALGOL family, the letter strings BEGIN and END are each taken as 
 single terminals. This approach has an advantage in the milieu of the 
 TWS in that these conceptual units, or special words, are compiled rela- 
 tively quickly by the scanner as opposed to the more laborious and time 
 consuming letter by letter compilation through the syntax and semantics. 
 
 Any string of letters and digits which begins with a letter may 
 be used as a special word. It must not have embedded blanks, and the 
 character immediately after it must not be alphanumeric. As with characters, 
 a special word must be prefixed by a sharp in the syntax if it would 
 otherwise be of special significance to the TWINKLE translator. Since 
 there are well over one hundred such words in TWINKLE (see Appendix A),
 it is safest to use sharps liberally. Again, there is a divergence of TWINKLE
 at this point from BNF and TBNF which is easily overcome.
 2.1.1.3 Character Mode Terminals 
 
 While the special word is often the better way of entering 
 alphanumeric information, there are times when character by character 
 input is actually preferable. For example, the parsing of FORTRAN and 
 B-5500 ALGOL FORMAT statements is simplified if done in the character mode. 
 More generally, any time there is an abundance of single character signifi- 
 cance in a syntactic entity, it is better parsed and more compactly 
 described in the character mode. 
 
 If the sequence of characters to be dealt with consists entirely 
 of digits, it may be written into the syntax directly because an unadorned 
 number has no special significance to the TWINKLE translator. If, however, a 
 more general sequence must be handled, the sequence must be preceded by the 
 word ALPHA, which indicates to TWINKLE that it must consider the following 
 sequence of characters specially. Since ALPHA is a bit long, it behooves one to 
 provide a means of keeping its use to a minimum. To this end, the construct, 
 
 [ALPHA A / ALPHA B / ALPHA C / ... / ALPHA Z] ,
 
 is equivalent to the more compact form, 
 
 ALPHA [A / B / C / . . . / Z] 
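
 The equivalence of the two forms amounts to distributing the word ALPHA over
 the bracketed alternatives. A minimal Python sketch of that rewriting
 follows; the function name distribute is hypothetical, and the sketch
 assumes a flat bracket with no nested brackets and no sharp-prefixed
 slashes inside it.

```python
def distribute(prefix, bracketed):
    """Rewrite 'PREFIX [x / y / z]' as '[PREFIX x / PREFIX y / PREFIX z]'."""
    inner = bracketed.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a bracketed list of alternatives")
    # Split the bracket contents on the alternative separator.
    alternatives = [a.strip() for a in inner[1:-1].split("/")]
    return "[" + " / ".join(f"{prefix} {a}" for a in alternatives) + "]"
```

 For example, distribute("ALPHA", "[A / B / C]") gives the long form with one
 ALPHA per letter.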
 
 Another, more obscure, method of specifying alphanumeric 
 characters (or, in fact, any of the Burroughs B-5500 characters) is the 
 code construct which is based on the internal binary representations of 
 the various characters. This form of character representation is a
 carry over from TBNF where it was adopted primarily because the question
 mark is not a valid character on punched cards in the B-5500 system. It 
 consists of the word CODE followed by an integer between zero and sixty- 
 three which is the internal code of the character being indicated. 
 2.1.1.4 Meta-Terminals
 
 Because of the advantages attendant to allowing the scanner to 
 perform a certain amount of simple syntactic analysis immediately on the 
 input string (as, for example, in the recognition of special words), 
 the TWS scanner also recognizes members of three special classes of 
 terminals: identifiers, numbers, and strings. These meta-terminals are 
 represented in the syntax by the symbols <*I>, <*N>, and <*S>, respectively. 
 In an English-like syntax, they may be represented by IDENTIFIER (or 
 IDENTIFIERS), NUMBER (or NUMBERS), and STRING (or STRINGS). An identifier 
 is any sequence of alphanumeric characters beginning with a letter, 
 provided the sequence is not a special word of the language. A string 
 is any sequence of characters (excluding the quote) enclosed in quotes. 
 A discussion of how the scanner handles these items has been given by 
 Machado [10].
 
 In addition to these three meta-terminals, TWINKLE allows for 
 the syntactic specification of up to twenty other meta-terminal symbol 
 classes. In the syntax, these are represented by the symbol <*n> where
 n is an integer between four and twenty-three and identifies the meta-
 terminal. A special scanner is necessary to take advantage of this
 facility. 
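
 The division of labor described above can be sketched as a small scanner in
 Python. This is an illustration, not the TWS scanner discussed by Machado:
 the set SPECIAL_WORDS and the names LEXEME and scan are hypothetical, and a
 number is assumed here to be an unsigned string of digits, which the text
 does not specify.

```python
import re

SPECIAL_WORDS = {"BEGIN", "END"}   # hypothetical special words of some language

# A word (letter followed by letters and digits), a number, or a quoted string.
LEXEME = re.compile(r'(?P<word>[A-Z][A-Z0-9]*)|(?P<number>[0-9]+)|(?P<string>"[^"]*")')

def scan(text):
    """Return (class, lexeme) pairs; blanks between lexemes are skipped."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = LEXEME.match(text, pos)
        if m is None:                      # any other character stands for itself
            tokens.append(("character", text[pos]))
            pos += 1
            continue
        lexeme = m.group()
        if m.lastgroup == "word":
            # A word is an identifier unless it is a special word of the language.
            kind = "special word" if lexeme in SPECIAL_WORDS else "<*I>"
        elif m.lastgroup == "number":
            kind = "<*N>"
        else:
            kind = "<*S>"
        tokens.append((kind, lexeme))
        pos = m.end()
    return tokens
```

 Because a word is matched as a whole before it is looked up, a special word
 is recognized only when the character after it is not alphanumeric, as the
 rule for special words requires.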
 2.1.1.5 Blanks 
 
 Blanks are specified in the syntax by the word BLANK. A blank 
 can only be scanned in the character mode. 
 
 
 2.1.2 Nonterminals 
 
 Nonterminals are specified in TWINKLE as strings of characters,
 called nonterminal names, enclosed in either angle brackets or quotes
 (< > or " "). For obvious reasons the nonterminal name may
 not include either angle brackets or quotes. Furthermore, any blanks
 which appear in the nonterminal name are disregarded. Thus, the nonter-
 minals, <STATEMENT LIST> and <STATEMENTLIST>, are treated identically. In
 the BNF or TBNF output resulting from a TWINKLE translation, the blanks
 in nonterminal names displayed are, in fact, removed. To retain a
 modicum of readability in this compact form it is advisable to hyphenate
 multi-word nonterminal names; for example, use <STATEMENT-LIST>
 in place of <STATEMENT LIST>.

 2.1.3 Any Symbols

 In BNF, to define a nonterminal such as <LETTER>, which is any of the
 twenty-six letters of the alphabet, one must write:

 <LETTER> ::= A / B / C / ... / X / Y / Z
 
 Trout, in TBNF, introduced the pseudo-nonterminal, <ANY>, which stands for
 any terminal symbol. If not all terminals are to be indicated, the
 exceptions, if small in number, may follow the pseudo-nonterminal, each
 preceded by the special word, BUT. For example, any terminals except
 BEGIN and END may be written:

 <ANY> BUT #BEGIN BUT #END
 
 This construct has been used primarily in error recovery in TBNF languages. 
 
 In TWINKLE, the any symbol has been generalized and has become 
 a powerful programming tool. The syntax of <any symbol> is shown below:

        { #TERMINAL             }
        { #CHARACTER            }
        { #LETTER               }   { LIST OF {#BUT <terminal>}               }
        { #DIGIT                }   {                                         }
 #ANY   { #SPECIAL #CHARACTER   }   { λ                                       }
        { #RELATIONAL #OPERATOR }   {                                         }
        { #ALGEBRAIC #OPERATOR  }   { #BUT #[ LIST OF <terminal>S SEP {#/} #] }
        { #NONCHARACTER         }
        { <nonterminal>         }

                 BASE                          EXCEPTION LIST

 Use has been made of a rather simple two-dimensional extension of TWINKLE:
 square brackets have been replaced by vertical braces with the alternatives
 occupying one line each; the Greek letter "λ" is used instead of the special
 word, LAMBDA.
 
 The terminals in each of the bases, except for <nonterminal>, are
 shown in Table I. The <nonterminal> base is unique in that, first,
 the elements that it contains depend upon the actual nonterminal symbol
 used and, second, they are not restricted to terminals but include all
 of the alternative definitions of the nonterminal. Any terminals which
 are in the base, but which are not desired, may be written in the excep-
 tion list following the base. The TBNF form of the exception list
 is still accepted by TWINKLE but, as with the ALPHA list, it is possible 
 to use only one BUT and enclose the terminals of the exception list in 
 square brackets immediately following it. Thus, in place of 
 
 BUT #BEGIN BUT #END BUT #LEFT BUT #RIGHT ,
 one may write 
 
 BUT [#BEGIN / #END / #LEFT / #RIGHT] 
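
 Both spellings of the exception list name the same set of terminals, and the
 any symbol then accepts whatever remains of the base. The Python sketch below
 illustrates this under simplifying assumptions: only sharp-prefixed special
 words are collected, and the names exception_list and any_but are
 hypothetical.

```python
import re

def exception_list(text):
    """Collect the sharp-prefixed special words named in either form of exception list."""
    return set(re.findall(r"#([A-Z][A-Z0-9]*)", text))

def any_but(base, exceptions=()):
    """Return a predicate that accepts the members of base not listed as exceptions."""
    allowed = set(base) - set(exceptions)
    return lambda terminal: terminal in allowed
```

 For instance, with a base of {"BEGIN", "END", "IF"} and the exception list
 "BUT [#BEGIN / #END]", the resulting predicate accepts only IF.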
 
 2.1.4 Square Brackets 
 
 In English, clauses are separated at one level by commas and 
 at a higher level by semicolons. Beyond this either more than one sentence 
 is used or the clause separation must be done by the reader from context. 
 Even this, however, does not prevent ambiguity beyond four levels or so. A 
 language such as TWINKLE, in which it is necessary to indicate clause 
 nesting to an arbitrary level, must have a more powerful mechanism 
 available. 
 
 Since the semicomma and the demisemicolon do not yet exist, it
 was decided that clauses and other such ensembles, which are intended as
 single syntactic entities, be enclosed in square brackets. Two examples of
 this have already been encountered: the terminal list following ALPHA,
 and the exception list following BUT. Beyond these the square brackets find
 several other uses; whenever one or more symbols, or groups of symbols,
 appear at one spot in a production they may be enclosed in square brackets.
 Thus, the productions,
 
 AN <any symbol> CONSISTS OF #ANY #TERMINAL;
 AN <any symbol> CONSISTS OF #ANY #CHARACTER;
 AN <any symbol> CONSISTS OF #ANY #LETTER,

 may be written more compactly as

 AN <any symbol> CONSISTS OF #ANY FOLLOWED BY [#TERMINAL OR
 #CHARACTER OR #LETTER].
 
 For purposes of adding semantic symbols, which will be discussed later, 
 the special word, ANY, and the bracket construct, taken as a whole, are
 considered to be at level zero of the production, while the special
 words, TERMINAL, CHARACTER, and LETTER, are considered to be at level one.
 
 Alternatively, an entire production may be nested in square 
 brackets and one may write:

 AN <any symbol> CONSISTS OF #ANY FOLLOWED BY AN [<base>
 WHICH IS DEFINED TO BE #TERMINAL OR #CHARACTER OR #LETTER].        (4)

 Note that although all previous forms of arrow are still valid in the
 nested production, it is also permissible to include the special word,
 WHICH, so that the construct will look more like an English clause. When
 a nonterminal, such as <base>, is defined in a nested production it
 may then appear anywhere else in the syntax just as if it had been de-
 fined in the usual manner. There are, however, some precautions to be
 taken with nested productions. These stem from the fact that the
 nonterminal so defined may have additional definitions elsewhere in the
 syntax. In this case the nonterminal represents the totality of its 
 alternative definitions, except when it appears on the left hand side 
 of a nested production in which case it represents only those definitions 
 which appear on the right hand side of the same nested production. To 
 illustrate this, suppose that in addition to the production above, in 
 which <base> is defined, one also has the production:

 A <some symbol> CONSISTS OF #SOME FOLLOWED BY AN
 [<base> WHICH IS #DIGIT OR #DIGITS]                                 (5)

 The two productions (4) and (5) are then equivalent to the BNF productions:

 <any symbol> ::= #ANY <base 1> ;
 <some symbol> ::= #SOME <base 2> ;
 <base> ::= <base 1> / <base 2> ;
 <base 1> ::= #TERMINAL / #LETTER / #CHARACTER ;
 <base 2> ::= #DIGIT / #DIGITS
 
 The square bracket construct may also be applied to a single 
 string of symbols to indicate that the string is to be taken as a unit 
 itself. This is useful in constructs such as the maybe symbol, enclosures, 
 and lists described below.
 
 2.1.5 Maybe Symbols 
 
 Frequently a syntactic structure has one or more substructures 
 which may be omitted without syntactic error. Thus, for example, in ALGOL 
 a list of labels preceding a statement is optional. Many other examples may 
 be found in algorithmic languages; several may be found in TWINKLE, 
 itself. To make the specification of such structures as easy as possible, 
 Trout [4] adopted the Brooker and Morris question mark [11], changing it, in
 the process, into an ampersand because the question mark is an illegal char-
 acter on B-5500 cards. In TWINKLE, either an ampersand or a question mark may
 be placed after a symbol (or group of symbols enclosed in square brackets) to 
 indicate that it is optional. The English-like form of this construct con- 
 sists of preceding the optional symbol by the special word phrase, POSSIBLY 
 ONE. This form is more general, in that it may be applied directly to lists 
 and enclosures; whereas, they must be enclosed in brackets when followed by a 
 question mark or ampersand. 
 
 An example of the English form of <maybe symbol> follows. The
 production,

 AN <a> CONSISTS OF AN <b> FOLLOWED BY
 POSSIBLY ONE <c>,

 is equivalent to the two BNF productions:

 <a> ::= <b> ;
 <a> ::= <b> <c>
 
 Under more complex conditions, the maybe symbol can account for a considerable 
 increase in readability of the syntax. 
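
 The gain in readability comes from the fact that each optional symbol
 doubles the number of plain BNF alternatives needed. The following Python
 sketch, in which the function name expand_maybe and the pair representation
 are hypothetical, expands a right hand side containing optional symbols into
 its equivalent alternatives.

```python
from itertools import product

def expand_maybe(symbols):
    """Expand (name, optional) pairs into the list of alternatives free of optional marks."""
    # An optional symbol contributes two choices: present or absent.
    choices = [([name], []) if optional else ([name],) for name, optional in symbols]
    return [[s for part in combo for s in part] for combo in product(*choices)]
```

 A right hand side with one optional symbol yields two alternatives, as in
 the example above; one with three optional symbols yields eight.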
 
 2.1.6 Enclosures
 
 Another very common construct in computer languages is that in which 
 some structure, such as a list of subscripts, is enclosed in delimiters, such
 as parentheses. The delimiters may be different: e.g., the special words,
 BEGIN and END, which bracket compound statements in ALGOL; they may be different 
 but very closely related: e.g., the left and right parentheses which enclose 
 subscript lists in FORTRAN and PL1; they may be identical: e.g., quotes which 
 delimit strings in ALGOL or periods which enclose logical operators in some 
 dialects of FORTRAN. Corresponding to these possibilities there are three 
 forms of enclosure. The general representative of the first form may be 
 symbolized as: 
 
 <symbol> #ENCLOSED #IN <symbol> #AND <symbol>

 where <symbol> represents any basic TWINKLE symbol or group of symbols en-
 closed in brackets. The latter two symbols, if not enclosed in brackets, may
 not be enclosure symbols themselves. Using this construct and the list symbol
 discussed below a compound statement in ALGOL may be defined by the very clear 
 production: 
 
 A  IS DEFINED TO BE A POSSIBLY EMPTY LIST OF 
 S SEPARATED BY SEMICOLONS ENCLOSED IN #BEGIN AND #END;
 
 The second form of enclosure actually applies to only three sets of characters 
 in the Burroughs character set: parentheses, square brackets, and angle brack- 
 ets. It may be symbolized, using the two dimensional bracket construct de- 
 scribed earlier, as: 
 
                 { #PARENTHESES      }
  #ENCLOSED #IN  { #ANGLE #BRACKETS  }
                 { #SQUARE #BRACKETS }
 
 Finally, the third form of enclosure symbol is simply: 
 
  #ENCLOSED #IN  
 
 The special word, S, may be added to make the second symbol plural.
 
 
 2.1.7 Unordered List 
 
 The unordered list provides a means of indicating that a group of 
 items may appear in any order. One very simple example of the use of this is 
 the PL1 iterated DO loop. The leading statement may have an initial value, 
 increment, and final value for the control variable; the increment and 
 final value may appear in either order. Symbolically the unordered list has 
 the form: 
 
 #UNORDERED #LIST #OF [ #AND • • • #AND ] ;
 or 
 
 [ #AND  • • • #AND ] #IN #ANY #ORDER 
 
 2.1.8 Precedence Structures 
 
 The precedence structure was introduced in TBNF to allow for the 
 specification of operator precedence in a BNF environment. Since the prece- 
 dence structure does not lend itself readily to a simple English alternative, 
 its syntax has not been expanded from the TBNF version. A precedence structure
 consists of: the special word, OPERATOR, followed by a list of precedence
 groups enclosed in square brackets; followed by the special word, ON; and, 
 finally, by the operand on which the precedence is based. Each precedence 
 group consists of a list of symbols, the operators, followed by the special 
 word, PRECEDENCE, and a pair of integers separated by a comma enclosed in paren- 
 theses. The integers indicate the precedence of the preceding operators in 
 the stack and in the input stream, respectively. Succeeding precedence groups 
 are separated from one another by slashes. The following is an example of 
 the use of a precedence structure : 
 
 <ARITHMETIC EXPRESSION> ::=
 
 OPERATOR [#+ #- PRECEDENCE (1,1) / #/ #X PRECEDENCE (2,2)] ON
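A minimal sketch of how a pair of stack and input precedences can drive a parse is given below in Python. It is not the TWS algorithm, which this section does not describe. The comparison rule (reduce while the stacked operator's stack precedence is at least the incoming operator's input precedence) is the conventional one and is an assumption here. The table values (1,1) and (2,2) are an assumed reading of the example, and `*` stands in for the multiplication sign.

```python
# (stack precedence, input precedence) for each operator, mirroring the example.
PRECEDENCE = {"+": (1, 1), "-": (1, 1), "/": (2, 2), "*": (2, 2)}


def to_postfix(tokens):
    """Convert an infix token list to postfix by comparing the stack
    precedence of the operator on top of the stack with the input
    precedence of the incoming operator."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Reduce while the stacked operator binds at least as tightly.
            while stack and PRECEDENCE[stack[-1]][0] >= PRECEDENCE[tok][1]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # operand
    while stack:
        output.append(stack.pop())
    return output


print(to_postfix(["a", "+", "b", "*", "c", "-", "d"]))
```

With equal stack and input precedences, operators of the same group associate to the left; making the two integers differ is the usual way to obtain right associativity.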
 
 
 2.1.9 Lists 
 
 The list is one of the most useful of the TWINKLE constructs. Con- 
 sequently it occurs in a wide variety of English-like and symbolic forms. To 
 indicate a simple list without separators, the right square bracket of a nested 
 production or square bracket construct may be followed by an asterisk or by a 
 plus sign. The asterisk denotes a possibly empty list while the plus sign 
 denotes a list that must have at least one element. 
 
 In BNF, lists are usually formed either by left recursive productions,
 such as:
 
  : : =  /   ; 
 
 or by right recursive productions such as: 
 
  : : =  /   
 
 This underlying structure is masked by the simplicity of the TWINKLE constructs 
 but may be made explicit, if desired, by the insertion of the qualifying spe- 
 cial word, OPEN, for left recursion or CLOSE, for right recursion between the 
 right square bracket and the asterisk. If no qualifier is present, left re- 
 cursion is implied. 
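The two underlying forms can be made concrete with a short Python sketch. It illustrates the BNF shapes only and is not the translator's implementation; the function `expand_list` and the nonterminal names are hypothetical, and EMPTY is used for the empty symbol.

```python
def expand_list(name, base, possibly_empty=False, right_recursive=False):
    """Return one BNF production (two alternatives) for a separator-free
    list of `base`. Left recursion is the default, as in TWINKLE."""
    first = "EMPTY" if possibly_empty else base
    step = f"{base} {name}" if right_recursive else f"{name} {base}"
    return f"{name} ::= {first} / {step} ;"


# Hypothetical nonterminal names, used only for the illustration.
print(expand_list("<list>", "<item>"))                        # [<item>]+
print(expand_list("<list>", "<item>", possibly_empty=True))   # [<item>]*
print(expand_list("<list>", "<item>", right_recursive=True))
```

The only difference between the two recursive forms is the side on which the list nonterminal reappears in the second alternative.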
 
 The syntax of list structures allowing for separators is shown in 
 figure 1. As indicated it consists of five portions: head, type, base, sepa- 
 rator, and tail. Only the type and base are, necessarily, nonempty but at least 
 one of the head and tail must be empty. The functions of these various com- 
 ponents are discussed below. 
 2.1.9.1 Head 
 
 In the absence of the list tail, the list head determines
 whether the list is left or right recursive. If it is
 empty (or the special word, REDUCED, or the phrase, LEFT RECURSIVE),
 the list is left recursive. If it is R, EXPANDED, or RIGHT
 RECURSIVE, the list is right recursive.
 
 [Figure 1. Syntax of list structures allowing for separators. The diagram is not recoverable from the scan.]
 
 2.1.9.2 Type

 Lists may be either possibly empty or nonempty, as noted
 above. It is the function of the list type, which 
 may not be empty, to determine this characteristic. A non- 
 empty list is indicated by the list types : L, LIST, STRING, 
 SEQUENCE, NONEMPTY STRING, NONEMPTY LIST, and NONEMPTY 
 SEQUENCE. For a possibly empty list, the available 
 types are : EL, KLEENE, KLEENE STAR, POSSIBLY EMPTY LIST, 
 POSSIBLY EMPTY STRING, and POSSIBLY EMPTY SEQUENCE. To 
 improve readability, the list type may be followed by 
 the special word, OF. If the list type is either 
 STRING or KLEENE, OF is necessary to avoid syntactic 
 ambiguity. 
 
 2.1.9.3 Base
 
 The base may be a single symbol or group of symbols 
 enclosed in square brackets. It must not be a list 
 itself and is considered to be nested one level deeper 
 than the list of which it is the base. An S may follow 
 the base, in some cases, to indicate its plural character 
 in the grammatical structure of the list. In addition, 
 phrases which may be used as character terminals 
 all have plural forms which may be used profitably here. 
 
 2.1.9.4 Separator
 
 The power of the list construct is greatly enhanced 
 by the possibility of specifying a separator. Like 
 the base, the separator may be either a single symbol
 or a group of symbols enclosed in square brackets. The
 separator, like the base, is considered to be at the next
 higher bracket nesting level from that of the list itself.
 The plural forms valid in the base are also valid in the 
 separator. The appearance of the separator between 
 successive base items is either required or optional,
 according as the separator type is definite or 
 questionable. Definite separator types are indicated 
 by SEP, SEPARATOR, and SEPARATED BY; questionable 
 separator types are indicated by Q, and by the special
 word, POSSIBLY, followed by any one of the definite 
 separator types. 
 
 2.1.9.5 Tail
 
 In the absence of a list head, the tail determines
 whether the list takes the left recursive or right
 recursive form. If it is empty, or the special word, 
 CLOSE, the list is left recursive. If it is the 
 special word, OPEN, the list is right recursive. 
 Note once again that either the tail or the head must 
 be empty in any list structure. 
 
 The examples below illustrate the various aspects 
 of the list structure: 
 
 A S SEPARATED 
 BY SEMICOLONS; 
 
 <ARITHMETIC EXPRESSION> ::= LIST [LIST  
 SEP [#* / #/]] SEP [#+ / #-];
 
 
  : := [ #:] *  CONSISTS OF A LEFT RECURSIVE LIST OF 
 S POSSIBLY SEPARATED BY SLASHES 
 ENCLOSED IN SQUARE BRACKETS; 
 
 2.1.10 Seeded Lists 
 
 Any of the list structures, the syntax for which appears in figure 1 
 above, becomes a seeded list when followed by a list seed. The syntax of the 
 list seed is 
 
 { #STARTING  }
 {            } #WITH  ,
 { #BEGINNING }
 
 where  may be any TWINKLE symbol, or group of symbols, enclosed in 
 square brackets, with the exception of enclosures and maybe symbols. 
 
 The seed list may be used to indicate that the first element of a 
 particular list is distinguished in some way from the rest. For example, 
 the syntax of an ALGOL block may be written: 
 
 A  CONSISTS OF A POSSIBLY EMPTY LIST OF S 
 SEPARATED BY SEMICOLONS STARTING WITH A POSSIBLY EMPTY 
 LIST OF S SEPARATED BY SEMICOLONS ENCLOSED 
 IN #BEGIN AND #END 
 
 Note that, since the list seed may not be an enclosure, the enclosure operator
 applies to the entire seeded list and not simply to the list of S . 
 
 
 2.2 Semantic Symbols 
 
 The TWINKLE symbols and constructs described up to this point may 
 be used in the syntactic specification of a language. A compiler, however, 
 must be more than simply a recognizer for a language. It must assign appro- 
 priate meanings (e.g., in the form of equivalent machine code) to the various 
 syntactic entities that it recognizes from the input stream. These meanings, 
 taken as a whole, make up the semantic description of the language or, more 
 simply, the semantics of the language. In the TWS, the semantics is written 
 in the Illinois Semantics Language (ISL), a complete description of which is 
 provided by Machado [10]. For the purposes of this discussion, it suffices
 to consider the ISL semantic description as a number of individual semantic 
 blocks, each of which is associated with a semantic name through which it may 
 be accessed by the parser. Before describing the manner in which the parser 
 is directed to initiate a semantic block, it is necessary to consider the pos- 
 sible results of the execution of a semantic block. 
 
 2.2.1 Actions and Tests 
 
 Based on their effect on the parser, semantic blocks may be divided 
 into two groups: semantic actions and semantic tests. Actions have no direct 
 effect on the parser and are used primarily for such functions as table creation,
 code emission, etc. Tests, on the other hand, are actually more of a
 syntactic character than a strictly semantic character. Thus, a test is 
 called by the parser when the nature of a particular entity is syntactically 
 undeterminable and requires investigation on a higher level. For example, 
 in an ALGOL assignment statement a boolean variable must be assigned the 
 value of a boolean expression while an integer or real variable must be 
 assigned the value of an arithmetic expression. Since the declaration in 
 which the type of the variable in question was determined may precede its 
 
 
 usage by an arbitrary amount, and since an arithmetic expression and a 
 boolean expression may, in general, be identical for arbitrarily many symbols, 
 the parser is unable to determine whether an arithmetic or boolean assignment 
 is being made. The question is resolved by calling a test which compares 
 the variable with tables made when the declarations were recognized. The 
 result of this comparison is communicated to the parser which then is able 
 to proceed correctly with the parse. 
 
 Communication between test and parser is by means of the globally 
 declared boolean variable, SEMANTICTEST. If the value of SEMANTICTEST after 
 the execution of the test is true, the parser continues along the 
 indicated branch of the parsing tree. Otherwise the parser abandons this 
 branch and must decide among the remaining branches, possibly invoking further 
 tests in the process. It is important, therefore, that each block, which may 
 be called as a test, set SEMANTICTEST at some point in its execution. If 
 this is not done, the parser will determine its future course of action
 from the previous value of SEMANTICTEST with the attendant possibility of 
 an erroneous parse. Since any block may be called, both as an action and as 
 a test, at different times during the parse, it is permissible for an action 
 to set SEMANTICTEST although the variable is disregarded under these circum- 
 stances. 
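The protocol between test and parser can be sketched in Python as follows. This is an illustrative model, not ISL or TWS code: the declaration table `DECLARED`, the test names, and the function `choose_branch` are hypothetical, and only the use of a global boolean named SEMANTICTEST comes from the text.

```python
SEMANTICTEST = False                       # global flag shared by tests and parser
DECLARED = {"B": "boolean", "X": "real"}   # hypothetical declaration table


def is_boolean(name):
    """Semantic test: reports its result by setting SEMANTICTEST."""
    global SEMANTICTEST
    SEMANTICTEST = DECLARED.get(name) == "boolean"


def is_arithmetic(name):
    """Semantic test for integer or real variables."""
    global SEMANTICTEST
    SEMANTICTEST = DECLARED.get(name) in ("integer", "real")


def choose_branch(name, branches):
    """Try each (label, test) branch in turn; continue along the first one
    whose test leaves SEMANTICTEST true and abandon the others."""
    for label, test in branches:
        test(name)
        if SEMANTICTEST:
            return label
    return None


BRANCHES = [("boolean assignment", is_boolean),
            ("arithmetic assignment", is_arithmetic)]
print(choose_branch("X", BRANCHES))
```

Every test in the sketch assigns SEMANTICTEST on every path. A test that failed to do so would leave the stale value from the previous call in place, which is exactly the hazard noted above.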
 
 An indication to the parser of the points in the syntactic recognition
 at which a particular action or test must be invoked is given by placing
 semantic call symbols at appropriate points in the syntax. These calls may 
 have any of the three forms described below. 
 
 2.2.2 Simple Calls

 Simple action (test) calls consist of the special symbols, "@S" ("@T"),
 followed by the name of the action (test) being called. Any string of digits
 or alphanumeric characters beginning with a letter may be used as a name.
 
 A particular semantic block may be called from as many places in the syntax 
 as desired and may be called as a test at one point while being called as an
 action at another point. Several forms of the simple call linger from earlier 
 versions of the TWS. Thus, in addition to "@S", an action name may be pre- 
 ceded either by "@Q" or "#", and a test name may be preceded by "#" and 
 enclosed in either quotes or angle brackets. When "#" is used in an action 
 call, the action name must be a string of digits. The reason for this is that 
 the call would appear to be a special word if the action name were to begin 
 with a letter. 
 
 2.2.3 Declaration Calls 
 
 Any of the simple calls (except the "#"  form) may be 
 extended into a declaration call. The action or test name is followed by a 
 colon and then a description of the action or test in ISL code. This descrip- 
 tion, or declaration, must be enclosed either in square brackets, or in the 
 special words, BEGIN and END. Each name may be declared no more than once in 
 this way, although such a declaration is not necessary. Any name so declared 
 may be used in the syntax in simple calls, both before and after the appearance 
 of the declaration. When the semantic block in question is brief, the overall 
 clarity of the linguistic description may be considerably enhanced by using the 
 declaration call. 
 
 2.2.4 Implicit Calls
 
 When a block is of such a nature that a declaration call would be 
 appropriate, and yet is used in only one place, it is clearly not necessary 
 to give a name to that block. Under these circumstances the name and colon 
 in the declaration call may be omitted, thereby creating an implicit call. For 
 each implicit call in the syntax, the TWINKLE translator generates a unique name 
 through which the relevant block of code may be referenced by the ISL translator. 
 
 2.2.5 Parameterized Semantic Calls
 
 Semantic calls, with the exception of implicit calls, may be modified
 by a list of integer constant parameters separated by commas enclosed in paren- 
 theses and placed immediately following the action or test name. These constants 
 are used by the parser to set a group of global variables (the array row PARAM) 
 that may be referenced by the semantic routine when it is called. This is 
 frequently very useful when a number of portions of the syntax, which would 
 otherwise require different semantic actions, may be serviced by a single, 
 appropriately parameterized, action. For example, in recognizing written out 
 characters, TWINKLE employs a single semantic action whose parameter is the 
 internal code number of the symbol recognized. 
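The idea of one parameterized action serving several call sites might look as follows in Python. This is a sketch under stated assumptions: the size of `PARAM`, the action `record_symbol`, the helper `call_action`, and the code numbers 17 and 42 are all hypothetical. Only the existence of a global array row named PARAM that is loaded before the call is taken from the text.

```python
PARAM = [0] * 8    # stand-in for the global array row PARAM; the size is assumed
RECOGNIZED = []    # hypothetical record of the symbols seen so far


def record_symbol():
    """A single semantic action serving many call sites; it reads PARAM[0]."""
    RECOGNIZED.append(PARAM[0])


def call_action(action, *constants):
    """Model of a parameterized call: load the integer constants into
    PARAM, then execute the semantic block."""
    for i, value in enumerate(constants):
        PARAM[i] = value
    action()


call_action(record_symbol, 17)   # e.g. one written-out character
call_action(record_symbol, 42)   # e.g. another, with the same action
print(RECOGNIZED)
```

Because the constants travel through global state rather than an argument list, the semantic block itself needs no per-call-site variants.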
 
 2.2.6 Bit Actions and Tests 
 
 Frequently, a semantic action involves nothing more than the setting of 
 a single bit. Similarly, a semantic test is frequently based on the condition of 
 such a bit. Calling a semantic block to perform these manipulations requires a 
 disproportionate amount of overhead and it was, therefore, considered appropriate 
 to introduce special action and test types specifically for performing bit opera- 
 tions. The syntax of the bit action is: 
 
         { #SET   }
 #@ #S   {        } #BIT  
         { #RESET }
 
 where  is either a number, identifier, or TWINKLE special word. The designated
 bit is correspondingly either set or reset. The syntax for the bit test is:

                  { #ON  }
 #@ #T #BIT       {      }
                  { #OFF }

                  Condition   Action

 The test is true if the designated bit is in the condition specified (i.e., ON or
 OFF). The default condition is ON. If the test is true the indicated action is
 performed on the designated bit. Up to 48 different bit names may be used by the
 language designer. These are assigned by the TWINKLE translator to the 48 bits of
 the variable, ACTIONBITS.
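A small Python model of bit actions and tests is sketched below. It assumes a 48-bit word (the B-5500 word size) holding one bit per name, with bits allotted in order of first appearance. The class `ActionBits`, its method names, and the bit name `INDECL` are hypothetical, and the optional action attached to a bit test is not modelled.

```python
class ActionBits:
    """Model of a single word whose bits are named by the language designer."""

    WIDTH = 48  # assumed word size

    def __init__(self):
        self.word = 0
        self.names = {}  # bit name -> bit position

    def _index(self, name):
        # Allot the next free bit the first time a name is seen.
        if name not in self.names:
            if len(self.names) >= self.WIDTH:
                raise ValueError("no free bits left")
            self.names[name] = len(self.names)
        return self.names[name]

    def set(self, name):
        self.word |= 1 << self._index(name)

    def reset(self, name):
        self.word &= ~(1 << self._index(name))

    def test(self, name, condition="ON"):
        """True if the bit is in the given condition; ON is the default."""
        is_on = bool((self.word >> self._index(name)) & 1)
        return is_on if condition == "ON" else not is_on


bits = ActionBits()
bits.set("INDECL")
print(bits.test("INDECL"), bits.test("INDECL", "OFF"))
```

Each operation is a single mask on one word, which is the saving over calling a full semantic block.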
 
 
 2.2.7 The Tail 
 
 In most languages it will not be desirable to declare each block 
 (implicitly or otherwise) directly in the syntax specification. Also, all 
 but the simplest of semantics will require a number of variables and procedures 
 declared globally to the individual semantic blocks. For the sake of complete- 
 ness, these global declarations and undeclared blocks may be enclosed between 
 the special words, BEGIN and END, (thus forming the semantic tail) and appended 
 to the syntax specification. This tail will then be passed directly to the 
 ISL translator upon completion of the TWINKLE translation. In this way a 
 language may be completely processed by the TWS from a single complete specifi- 
 cation of the language. During the debugging phase of the language development, 
 it is more natural to process the syntax and semantics separately. The 
 details of coordinating the ISL translator with the rest of TWS in an 
 independent run are discussed by Machado [10].
 
 2.2.8 Placement of Calls 
 
 A semantic call may appear anywhere in the syntax that a syntactic 
 symbol may appear, except at the beginning of an alternative. Thus, a semantic 
 call may not appear immediately after the arrow of a production, the left square 
 bracket of a square bracket construct, or the separator ("/" or OR) of a
 list of alternatives. The reason for this is that a semantic call cannot be 
 made by the parser until it has determined exactly what stage the parse has 
 reached. Clearly the parser cannot, in general, determine this at the 
 beginning of an alternative. 
 
 Unfortunately, there is a good deal more to placing semantic calls 
 than simply knowing where they will be legal. The ideal time to place them 
 would be after the FPL form (see chapter 3 and the paper by Beals [3]) had
 been generated. The condition of the stack and the phase of the parse 
 
 
 would then be known explicitly. It is, however, fairly straightforward to
 place them directly into BNF as was done in earlier versions of the TWS. The 
 problems present in TWINKLE, with respect to call placement, arise chiefly from 
 the complex structures (lists, enclosures, etc.) available and from the large 
 amount of grammar transformation that is inherent in TWINKLE translation. 
 Thus, virtually all of the TWINKLE constructs not present in BNF employ some 
 form of TWINKLE generated nonterminals in their implementation. Because of 
 this, the configuration of the stack at the moment of a semantic call may not 
 be easy to determine. The following guidelines will be helpful in creating 
 and placing semantic calls to achieve a given end: 
 
 1. A semantic call is made following the recognition of the symbol 
 or construct at the same bracket nesting level immediately 
 preceding its occurrence in the syntax. For example, if one 
 writes 
 
  : : = LIST  SEP  @S1 , 
 semantic action "1" will be called after the entire list 
 has been recognized and not after each separator, . The 
 latter effect may be achieved by 
 
  ::= LIST  SEP [ @S1]
 
 2. A semantic routine should not reference the stack for symbols 
 at the same nesting level as its call — provided that the call 
 and the symbol are separated by one of the non-BNF TWINKLE 
 constructs. For example, in the TWINKLE production, 
 
  : :=   LIST    @S A , 
 the semantic action, "A", may reliably reference the nonterminals, 
  and , but may not reference the nonterminals,  and , 
 
 from which its call is separated by the list construct. 
 
 
 3. A semantic routine should not reference symbols which occur at
 different nesting levels from its call. The only exception to
 this rule is the case in which a semantic call immediately follows
 the right square bracket of a square bracket construct. The call
 is then, in essence, copied onto the end of each of the alternatives
 within the square brackets.
 A detailed example showing the placement of semantic calls in a TWINKLE
 grammar for a subset of ALGOL is provided by Machado [10].
 
 2.3 Null and Empty Symbols 
 
 Several forms of context analysis which are performed automatically 
 by the TWS must be provided by the user under the TBNF system. The three 
 special words (BACK, AHEAD, and NOT), which are used in TBNF to provide con- 
 textual information, have no meaning to the BNF half of the TWS. Therefore, 
 when translating into BNF, these special words and the constructs that they 
 herald are meaningless to TWINKLE and are referred to as null symbols. In 
 addition to these null symbols, TWINKLE provides two forms of comments which 
 are meaningless and therefore qualify as null symbols. Any string of symbols 
 enclosed in parentheses, the left-most parenthesis of which is not preceded 
 by either a sharp or a semantic call, constitutes a comment and is deleted 
 by the scanner. Any string of symbols preceded by the special words, COMMENT 
 or C, and not including the symbols [,],;,., or the special words, BEGIN and 
 END, also constitutes a comment. 
 
 An empty symbol denotes a string of zero length. It is written as 
 either one of the special words, EMPTY or LAMBDA, or as an adjacent pair of 
 left and right nonterminal parentheses (i.e., < > or " ") . 
 
 
 3. CONTROL OF THE TWINKLE TRANSLATION
 
 The TWINKLE translation is merely the first step in a chain of opera- 
 tions undertaken in generating a compiler for a language. Control options 
 specified in the TWINKLE input may be intended for use in a later phase of 
 the TWS. To make the meaning of these options clear a brief description of 
 the entire TWS is now given. 
 
 3.1 The Translator Writing System
 
 Figure 2 presents a block diagram of the interrelations between the 
 programs which make up the Translator Writing System when creating a compiler 
 for a language, L. As indicated, the TWS can generate either a recursive 
 descent compiler or a deterministic Floyd production compiler, the decision 
 being made by the user through the PARSER control card. Consideration will 
 be given first to the Floyd production section of the TWS which comprises the 
 TWINKLE translator, the ISL translator (ISLTRAN), BNF2FPL, FPL2PAR, PAR2ALG 
 and finally, the ALGOL compiler. 
 
 A unified syntactic and semantic description of L is provided as
 input to the TWINKLE translator. The translator extracts the syntactic information,
 which it transforms into BNF and places, together with several other
 tables, in a disk file labelled L/TABLESF. Similarly, the semantic portion
 of the input is placed in the file, L/ACTIONS, for use by ISLTRAN. The TWINKLE
 translator then initiates execution of both BNF2FPL and ISLTRAN. The BNF
 syntax of L is transformed by BNF2FPL into Floyd productions (FPL) which are
 placed in L/FLOYDP while additional tables are placed in L/TABLESF and the
 control information in the first record of L/TABLESF is updated. BNF2FPL
 then initiates execution of FPL2PAR which transforms the FPL syntax from
 L/FLOYDP into a stream of pseudo-orders which are returned to L/TABLESF.
 The control information in the first record is updated.
 
 [Figure 2. Block diagram of the programs which make up the Translator Writing System. The diagram is not recoverable from the scan.]
 
 ; 
 
 where  is any string of letters and digits beginning with a 
 letter. These characters, or the first seven if there are more, are used as 
 a prefix for all the interlinking and output files generated by the TWS, 
 including the TWINKLE translator. 
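The prefix rule can be illustrated with a few lines of Python. This is a sketch only: the helper names `file_prefix` and `tws_file` and the sample language name `EXPERIMENTAL` are hypothetical, and the `prefix/suffix` shape simply follows file names such as L/TABLESF and L/FLOYDP used in the text.

```python
def file_prefix(language_name):
    """The first seven characters of the language name, or all of them
    if there are fewer."""
    return language_name[:7]


def tws_file(language_name, suffix):
    """Name of an interlinking file, e.g. L/TABLESF for a language L."""
    return f"{file_prefix(language_name)}/{suffix}"


print(tws_file("PL1", "TABLESF"))
print(tws_file("EXPERIMENTAL", "FLOYDP"))
```

Two language names that agree in their first seven characters would share the same files, so long names should differ early.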
 
 3.2.2 Print Options 
 
 The print control statement consists of the special word, PRINT, 
 followed by a colon, followed by a list of print options separated by commas, 
 followed by a semicolon. The options available are defined below. 
 
 1. TABLES SIFESF: causes the printing of a table displaying the sizes and 
 locations of all of the tables in the file, TABLESF.
 
 2. TERMINALS ALPHABETICALLY, or TRMALF: causes the printing of an alphabetic 
 
 
 list of all of the special words used in the language. If BNF is being
 generated then the list includes an index of each occurrence of the 
 special words in PROTAB. 
 
 3. TERMINALS NUMERICALLY, or TRMNUM: causes the printing of a numerically 
 ordered list of all of the special words used in the language. 
 
 4. CHARACTERS, or TRMCHR: causes the printing of an index of the occurrences
 of the 64 characters in PROTAB.

 5. TERMINALS: is equivalent to 2, 3, and 4 taken together.
 
 6. NONTERMINALS ALPHABETICALLY, or NTALF: causes the printing of an alpha- 
 betical list of all of the nonterminals used in the language. If BNF is 
 generated, the list includes an index of each occurrence of the 
 nonterminals in PROTAB.

 7. NONTERMINALS NUMERICALLY, or NTNUM: causes the printing of a numerically
 ordered list of all of the nonterminals used in the language.

 8. NONTERMINALS: is equivalent to 6 and 7 taken together.

 9. SYNTAX, or INPUT: causes the printing of the TWINKLE input as it is read.

 10. INDEX, or XREF, or CROSS REFERENCE: causes the printing of an index of
 
 occurrences of all nonterminals, terminals, and actions in the syntax 
 by card number. 
 
 11. ACTIONS ALPHABETICALLY, or ACTALF: causes the printing of an alphabetical
 list of all of the semantic actions and tests used in the language. If
 BNF is generated, the list includes an index of each occurrence of
 the actions and tests in PROTAB.
 
 
 12. ACTIONS NUMERICALLY, or ACTNUM: causes the printing of a numerically 
 ordered list of all of the semantic actions and tests used in the language. 
 
 13. ACTIONS: is equivalent to 11 and 12 taken together.
 
 14. PROTAB: is the name of the table into which TWINKLE places the BNF
 equivalent of TWINKLE syntax in the input. This option causes the 
 printing of this table. 
 
 15. FLOYD: BNF2FPL transforms PROTAB into a set of Floyd productions in the 
 disk file, L/FLOYDP. This option causes the printing of these Floyd 
 productions. 
 
 16. COMBINED GROUPS: causes the printing of the components of all of the 
 combined groups required by the language. For a discussion on the use 
 of combined groups, see the paper by Beals [3].
 
 17. PARSER: FPL2PAR transforms the Floyd productions in FLOYDP into a stream 
 of pseudo-orders which make up the parser. This option causes the 
 printing of this stream of pseudo-orders. 
 
 18. PATTERNS: causes the printing of a table of the  patterns 
 created in the TWINKLE translator processing of L, as well as any 
 additional patterns created by either BNF2FPL or FPL2PAR. 
 
 19. STANDARD: is the union of options 1, 5, 8, 9, 10, 13, 14, and 18.

 20. DEBUGGING, or DEBUGN: is the union of options 15, 16 and 19, i.e., of
 everything but option 17.
 
 21. EVERYTHING: this is the union of all options. 
 
 22. NOTHING: this option, when used by itself, inhibits all printing. 
 
 
 If no print control statement appears, the print options are set 
 to the default option, STANDARD. 
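The way the composite options expand into basic ones can be checked with a short Python sketch. The option numbers follow the list above, with options 14 and 16 read as members of STANDARD and DEBUGGING respectively. The names `COMPOSITE`, `BASIC`, and `expand` are hypothetical, and EVERYTHING and NOTHING are left out.

```python
# Composite print options and the options they stand for.
COMPOSITE = {
    5: {2, 3, 4},                       # TERMINALS
    8: {6, 7},                          # NONTERMINALS
    13: {11, 12},                       # ACTIONS
    19: {1, 5, 8, 9, 10, 13, 14, 18},   # STANDARD
    20: {15, 16, 19},                   # DEBUGGING
}

# Options 1 to 18 that are not themselves unions of other options.
BASIC = set(range(1, 19)) - {5, 8, 13}


def expand(option):
    """Return the set of basic options implied by an option number."""
    if option not in COMPOSITE:
        return {option}
    result = set()
    for member in COMPOSITE[option]:
        result |= expand(member)
    return result


print(sorted(expand(19)))
print(sorted(expand(20)))
```

Expanding option 20 yields every basic option except 17, which agrees with the description of DEBUGGING as everything but the PARSER listing.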
 
 3.2.3 The Parser Type Option
 
 As mentioned several times above, the TWS is equipped to produce 
 compilers based on either recursive descent or Floyd production language parsers. 
 It is appropriate, therefore, to have a control statement for determining which
 is to be generated. The relevant control statement is : 
 
  PARSER; 
 where  is either RECURSIVE DESCENT or FLOYD PRODUCTION. 
 
 3.2.4 Zip Control
 
 In Burroughs B-5500 ALGOL it is possible for one ALGOL program 
 to initiate execution of another by executing a zip statement (i.e., by 
 zipping to the other program). The component programs of the TWS use the 
 zip statement to initiate their successors. In normal operation zipping 
 continues through final compilation by the ALGOL compiler. Frequently, a 
 user does not desire execution of the entire TWS but may wish, for example, 
 to check just the syntax, or just the semantics, of the input. This possibility 
 is allowed for in the TWS by the zip control statements which are listed below: 
 
 ZIP TO ISL;
 DONT ZIP TO ISL; 
 DONT ZIP; 
 
 ZIP THROUGH ; 
 where

  ::= TWST/BNF2FPL/FPL2PAR/PAR2ALG/ISL/ALGOL.
 The use of these control statements is self-evident.
 
 
 3.2.5 Program Parameter Control
 
 Each of the programs in the TWS has certain program parameters 
 which are normally assigned default values that permit compiler generation 
 for many small languages. It is possible, however, that a particular language 
 may require more execution time, a larger stacksize, or a higher B-5500 core 
 memory estimate to run successfully through some phase of the TWS. Corres- 
 pondingly, it may be desirable when processing some smaller languages to 
 decrease the values of some of these program parameters. This can be done with 
 the three program parameter control statements shown below: 
 
  PRIORITY = <*N>; 
 
  CORE = <*N>;
 
  STACK = <*N>; 
 
 where  was defined in the last section and <*N> is a positive 
 integer. These set the priority (and, implicitly, the time limit), the core 
 estimate, and the stacksize, respectively, of the program designated. These 
 parameters are then used in zipping to the program. If the  
 is COMPILER, the parameters are passed to the ALGOL compiler and become 
 the default parameters for the generated language compiler. 
 
 3.2.6 Executable Compiler Options
 
 It was noted in section 3.1 that a Floyd production parser generated
 by the TWS may be, to a greater or a lesser extent, an executable parser. 
 The default option is a parser which is wholly interpretive, but an executable 
 version of any of the three parser sections may be requested by use of the 
 control statements shown below: 
 
 EXECUTABLE LOOKAHEAD; 
 
 EXECUTABLE FILL TABLES; 
 
 EXECUTABLE FLOYD PRODUCTIONS; . 
 
 
 If the lookahead and fill tables portions of the parser are interpretive, 
 the resultant compiler, L/DISK, may only be executed if L/TABLESF is resident
 on disk. By making these two portions executable, the parser becomes 
 a self-contained unit and compilation in L requires only L/DISK. 
 
 3.2.7 Miscellaneous Control Options
 
 CLOSE, CLOSE LP, CLOSE LINEPRINTER, or CLOSE LINE PRINTER: applies 
 to BNF2FPL; it causes a separate file of output to be created each time an 
 error occurs during execution of BNF2FPL. In this way a user can ascertain 
 the cause of some errors before BNF2FPL runs to completion. 
 
 LONG LOOKAHEAD: applies to BNF2FPL; it specifies a four symbol 
 lookahead to be used in differentiating before deciding that the group cannot 
 be built. If the Floyd productions of a group being generated cannot be 
 differentiated by a three symbol lookahead and if combination is not possible, 
 the group is not normally built and the BNF2FPL translation fails. In 
 practice it has been found that when a lookahead of three symbols fails, no 
 additional amount of lookahead will help. 
 
 COMBINE FIRST: applies to BNF2FPL; it specifies that Floyd production
 combination be attempted after a one symbol lookahead has failed to
 differentiate, but before attempting a two or three symbol lookahead; if combination
 is not possible, two and three symbol lookaheads will be attempted
 before abandoning the group. 
 
 FLOYD PRODUCTIONS PER PROCEDURE: <*N>: applies to PAR2ALG. When
 creating an executable parser, PAR2ALG generates procedures, each containing
 some specified number of the Floyd productions of the language. This number
 is typically 100, but may be set by the language designer to any desired
 value.
 
 
 GROUPS PER PROCEDURE: <*N>: applies to PAR2ALG; determines the 
 number of groups of Floyd productions in each executable parser procedure. 
 
 PROGRAM SYMBOL: : is followed by a nonterminal name, say , 
 which is taken to be the unique objective symbol of the language in question;
 if this option is not used, the first nonterminal to appear in the syntax 
 specification is taken as the unique objective symbol for the language. 
 
 SPECIAL SYMBOLS: : may be used to force a 
 particular ordering of the special words of the language which are otherwise 
 numbered in the order in which they first appear in the syntax. 
 
 3.3 Burroughs B-5500 Control Cards for Executing TWINKLE
 
 The TWINKLE translator is executed like a compiler on the B-5500 
 system. When the syntax to be translated is on cards, the following deck 
 set up may be used: 
 
 ? USER = Language designer's user code
 ? COMPILE A/B WITH TWINKLE LIBRARY 
 ? DATA CARD 
 
 input syntax 
 ? END. 
 Since TWINKLE does not create executable code, the file, A/B, is not used, 
 and the name may be specified arbitrarily by the language designer. Because 
 this file is not used, either of the following forms may be used when the 
 input syntax is a file on disk, say PL1/SYNTAX:
 
 ? USER = Language designer's user code 
 ? COMPILE A/B WITH TWINKLE LIBRARY 
 ? TWINKLE FILE CARD = PL1/SYNTAX SERIAL
 ? END.
 
 or 
 
 ? USER = Language designer's user code 
 ? COMPILE PL1/SYNTAX WITH TWINKLE LIBRARY
 ? END. 
 
 In the latter case, TWINKLE discovers that the input is not on cards and that 
 no file has been equated to file, CARD. It then investigates the code file and, 
 if it exists on disk, takes it as the file, CARD. In the former case a file 
 has been equated to file, CARD, so this is taken as the input syntax. In this
 case, the code file, A/B, is not used and may be named arbitrarily. 
 
 4. IMPLEMENTATION OF THE TWINKLE TRANSLATOR
 
 The TWINKLE translator has been implemented with the TWS in a 
 bootstrapping fashion. The preliminary version of the translator was written 
 in BNF and processed on the portion of the TWS then existing, which was 
 essentially equivalent to the BNF2FPL, FPL2PAR and PAR2ALG stages of the 
 current TWS. Each subsequent revision to the TWINKLE translator was imple- 
 mented with the aid of its predecessor. Thus, although the present syntax 
 is much more sophisticated than the initial syntax, it is also shorter and 
 considerably more readable. The following sections detail some of the 
 salient features of the TWINKLE translator. 
 
 4.1 NONTAB, SYMTAB and OPRTAB
 
 As each nonterminal is read from the input syntax, its name is com- 
 pared against all those presently entered in NONTAB. If a match is found, 
 the corresponding nonterminal number is extracted from the relevant 
 field of the header word for the matching table entry. If no match is dis- 
 covered, a new entry is made. The entries are linked through the header 
 words in a binary tree which is alphabetically ordered by the nonterminal
 names. The format of the entries is shown below in figure 3. Associated
 with each new entry into NONTAB is an entry into NTINDX pointing to the header 
 word of the nonterminal in NONTAB which facilitates printing out the non- 
 terminal names when necessary. Also, if the CROSS REFERENCE print option has 
 been activated by the user, the NTINDX word for a given nonterminal contains 
 a pointer to the base of an inter-linked list of the occurrences of that 
 nonterminal in the input syntax by line number. The actual repository 
 for this list, as well as those for the other nonterminals, terminals, and 
 
 action symbols from the input syntax, is an array called OVERALLINDEX, each 
 of whose entries contains a line number, a bit showing whether the specified 
 occurrence was on the right or left hand side of a production, and a pointer 
 to the entry for the next occurrence of the item in whose list the entry 
 resides. 
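
 The lookup just described can be sketched in Python as follows. This is
 only an illustrative model, not the B-5500 implementation: the names
 NonterminalTable, lookup and record are hypothetical, nonterminal numbers
 are assumed to start at zero, and ordinary Python lists stand in for NTINDX
 and for the linked occurrence lists held in OVERALLINDEX.

```python
class NonterminalTable:
    """Sketch of NONTAB: names kept in an alphabetically ordered binary tree."""

    def __init__(self):
        self.names = []   # entry i holds the name of nonterminal number i
        self.left = []    # index of the left child of entry i, or None
        self.right = []   # index of the right child of entry i, or None
        self.xref = []    # per nonterminal: list of (line_number, side)

    def _new_entry(self, name):
        self.names.append(name)
        self.left.append(None)
        self.right.append(None)
        self.xref.append([])
        return len(self.names) - 1

    def lookup(self, name):
        """Return the number of `name`, entering it if it is new."""
        if not self.names:
            return self._new_entry(name)
        node = 0
        while True:
            if name == self.names[node]:
                return node
            links = self.left if name < self.names[node] else self.right
            if links[node] is None:
                links[node] = self._new_entry(name)
                return links[node]
            node = links[node]

    def record(self, name, line_number, on_left_side):
        """Note one occurrence of `name` for the cross-reference listing."""
        number = self.lookup(name)
        side = "L" if on_left_side else "R"
        self.xref[number].append((line_number, side))
        return number
```

 Each occurrence carries the same three pieces of information as an
 OVERALLINDEX entry: a line number, a left or right indication, and (here
 implicitly, through list order) the link to the next occurrence.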
 
 Figure 3. An entry in the NONTAB table: a HEADER word (bit positions 16, 19
 and 22 are marked in the figure) followed by n WORDS and m CHARS of the
 nonterminal name.
 
 Figure 3 shows the details of a single entry in NONTAB. Consider, 
 first, the header word. Nonterminals (with the exception of the unique 
 objective symbol) used in the syntactic input must appear on the left hand 
 side of at least one production and on the right hand side of at least one 
 (not necessarily different) production. The INLHS bit is set on recognizing 
 the nonterminal as the left hand side of a production and the INRHS bit is 
 set on recognizing it in the right hand side of a production. These bits are 
 checked at the conclusion of syntax input and any discrepancies are reported 
 as errors on the TWINKLE output file, LINE. The SYMBOLVALUE field contains
 the serial number (code) of the nonterminal symbol represented by this
 
 NONTAB entry. Given a nonterminal name of k characters, n of the WORDS
 field and m of the CHARS field are given by n = [k/6] and m = k - 6n.
 These two fields, taken together, determine the extent of useful information in
 the remaining words of the NONTAB entry. Finally, the LEFTPOINTER and RIGHT- 
 POINTER fields contain pointers to subsequent entries in the alphabetic binary 
 tree which NONTAB comprises. The remaining words in the NONTAB entry consist 
 of n words containing 6 characters each of the nonterminal name right justified 
 with 2 unused characters at the left, and, in the last word, 2 unused charac- 
 ters, m characters from the nonterminal name, and blanks filled to the right. 
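
 As a small illustration of the WORDS and CHARS fields, the sketch below
 splits a name into six-character words. It assumes that a name of k
 characters fills n complete words followed by m characters in a final
 blank-filled word, so that k = 6n + m. The function name split_name is
 hypothetical, and the two unused characters at the left of each word are
 not modeled.

```python
def split_name(name, chars_per_word=6):
    """Return (n, m, words) for a nonterminal name.

    n     -- number of completely filled words
    m     -- number of name characters in the final word
    words -- the n full words followed by the blank-filled final word
    """
    n, m = divmod(len(name), chars_per_word)
    words = [name[i * chars_per_word:(i + 1) * chars_per_word]
             for i in range(n)]
    last = name[n * chars_per_word:].ljust(chars_per_word)
    return n, m, words + [last]
```

 For example, a nine-character name such as STATEMENT gives n = 1 and m = 3.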
 
 Corresponding to NONTAB and NTINDX for nonterminal storage 
 are the pairs of tables (SYMTAB, STINDX), and (OPRTAB, OTINDX) for storing 
 special symbols, and semantic symbols, respectively. As mentioned above, the 
 line by line index information for both these types of symbols is stored in 
 OVERALLINDEX along with that for nonterminals. While STINDX and OTINDX are
 identical counterparts to NTINDX, entries in SYMTAB and OPRTAB differ slightly 
 from those in NONTAB and, in fact, from one another. In the case of SYMTAB, 
 there is, clearly, no need for the INLHS and INRHS bits since a special symbol, 
 if it appears at all, must appear on the right-hand side of a production. 
 Consequently, in the header words for SYMTAB, these bits are included as a 
 portion of the SYMBOLVALUE field. In OPRTAB header words it is also clear 
 that the INLHS and INRHS bits are unnecessary, but here the first bit is unused 
 and the second bit becomes the USED bit denoting an action or test symbol
 that has been declared in the syntax. This bit is read by ISL/DISK to
 determine which actions it must get from the file, /ACTIONS. It is
 also used by TWINKLE to catch duplicate declarations of the same semantic name.
 
 4.2 PROTAB, PRODS and PDLIST
 
 The primary table into which TWINKLE collects the BNF productions which 
 it produces from its TWINKLE input is PROTAB. Figure 4 shows the fields of a
 PROTAB word.

 Figure 4. PROTAB word format (fields FLAGS, NEXT, LHS, TYPE, ENTRY and
 SYMBOL; bit positions 6, 12, 18, 30 and 36 are marked in the figure).

 The FLAGS field comprises a set of six one-bit flags carrying
 various pieces of information. These flags are referred to as: IREC, NOBACK,
 TRMDER, LASTNT, SFLAG and REC. The IREC flag is set only in the first word
 of a production and denotes an indirectly left recursive production. That is, 
 a production of the form: 
 
      <A> ::= <B> α
 for which <A> is a head symbol of <B>.
 
 When the NOBACK flag is set, the symbol in the SYMBOL field is not 
 to be back-substituted into this PROTAB location. If the symbol in the SYMBOL 
 field has a terminal derivation, then the TRMDER flag is set. The LASTNT 
 flag is set when the symbol in the SYMBOL field is a nonterminal and, except 
 for possible trailing semantic symbols, is the last symbol in the production. 
 The SFLAG flag heralds a semantic symbol as the next symbol in the production. 
 Finally, the REC flag is set in the first symbol of a production if the symbols 
 in the LHS and SYMBOL fields are the same (i.e., if the production is left 
 recursive). The remaining fields are: the NEXT field which gives the number
 of words to the beginning of the next production, the LHS field which contains 
 the number of the nonterminal being defined, and the TYPE and ENTRY fields of 
 this particular right-hand side symbol. 
 
 TWINKLE performs a number of grammar transformations on a local level. 
 Frequently more than one production is being built simultaneously, as happens 
 when nested definitions are being translated, or when lists are being
 implemented. These difficulties make it very cumbersome for TWINKLE to put
 productions directly into PROTAB. To circumvent these problems TWINKLE uses a
 256-element entry table (PRODS) as a directory and status table for 255
 thirty-two-symbol productions which are stored in the PDLIST array. The words of
 PRODS form a list linked forward through the NEXTPD fields and backwards
 through the LASTPD fields. The first entry of PRODS acts as a base for the
 productions currently in use. The base of available productions is given by
 the integer variable, FIRSTPDAVAIL. To manipulate these two structures, TWINKLE
 uses two procedures: GETPROD and GIVEUP. GETPROD is an integer-typed
 procedure which has no arguments and, when called, returns the address of the next
 available element of PRODS after incorporating it into the link structure and
 making the necessary modifications to the various list pointers. GIVEUP has
 as its sole argument the address of an element of PRODS to be removed from
 the link structure and returned to the available pool. The PDLIST symbols
 corresponding to PRODS(N) are PDLIST(32 x N) through PDLIST(32 x N + 31).
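
 The interplay of PRODS, PDLIST, GETPROD and GIVEUP can be sketched in
 Python as follows. This is a minimal model under stated assumptions, not
 the original code: the class ProductionPool and its method names are
 hypothetical, the in-use list is assumed to be circular through element 0,
 and the free list is assumed to be singly linked with 0 marking its end.

```python
PRODS_SIZE = 256          # element 0 serves as the base of the in-use list
SYMBOLS_PER_PROD = 32     # each production owns 32 consecutive PDLIST words


class ProductionPool:
    def __init__(self):
        self.nextpd = [0] * PRODS_SIZE
        self.lastpd = [0] * PRODS_SIZE
        # Chain elements 1..255 into the free list; 0 terminates it.
        for i in range(1, PRODS_SIZE):
            self.nextpd[i] = i + 1 if i + 1 < PRODS_SIZE else 0
        self.first_pd_avail = 1
        self.pdlist = [None] * (PRODS_SIZE * SYMBOLS_PER_PROD)

    def getprod(self):
        """Take the next free element and link it at the end of the in-use list."""
        n = self.first_pd_avail
        if n == 0:
            raise RuntimeError("no free production slots")
        self.first_pd_avail = self.nextpd[n]
        tail = self.lastpd[0]          # current last in-use element (0 if none)
        self.nextpd[tail] = n
        self.lastpd[n] = tail
        self.nextpd[n] = 0
        self.lastpd[0] = n
        return n

    def giveup(self, n):
        """Unlink element n from the in-use list and return it to the free pool."""
        before, after = self.lastpd[n], self.nextpd[n]
        self.nextpd[before] = after
        self.lastpd[after] = before
        self.nextpd[n] = self.first_pd_avail
        self.first_pd_avail = n

    def in_use(self):
        """List the in-use elements in link order."""
        out, n = [], self.nextpd[0]
        while n != 0:
            out.append(n)
            n = self.nextpd[n]
        return out

    def symbols(self, n):
        """PDLIST indices belonging to production n: 32 x n through 32 x n + 31."""
        base = SYMBOLS_PER_PROD * n
        return range(base, base + SYMBOLS_PER_PROD)
```

 Because element 0 is reserved as the base, 255 of the 256 elements remain
 available to hold productions.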
 
 In addition to serving as the link structure for the productions in 
 PDLIST, each entry of PRODS contains the following information about the 
 production with which it is associated: 
 
 (i) COMPLETE: a flag which indicates that the associated production is no 
 longer being extended; 
 (ii) LEVEL: an eight bit field recording the level of bracket nesting at 
 which the production originated; 
 (iii) LAMDA: a flag which indicates whether the production has participated 
 in empty context absorption; 
 (iv) NEXT SYMBOL: a five bit field containing a count of the symbols currently 
 in the production; 
 (v) DORNO: a three bit field which identifies whether the left-hand side 
 of the production is a simple nonterminal or one of the several 
 TWINKLE generated nonterminal types; 
 (vi) LHS: a twelve bit field containing the nonterminal number of the 
 
 left-hand side of the production. 
 
 Each word of PDLIST contains only the left-hand side of the production
 in the fields, LHSDORNO and LHS, and one symbol from the right-hand side in
 the fields: DORNO, TYPE, and ENTRY. The remaining information necessary in
 PROTAB is not filled in until a production actually enters PROTAB.
 
 Productions which are being created in PDLIST may be extended by 
 calling the procedure, ADDON. ADDON has two arguments: the first is the number 
 of the production to be extended; the second is the symbol by which it is to 
 be extended. If the symbol to be added is a TWINKLE-generated nonterminal
 representing the alternatives of a nested definition set, the procedure 
 for adding it to a production is somewhat complex. Each of the alternatives 
 must be added on individually and, if more than one alternative is present, 
 new productions must be created. Thus, for example, if α represents the
 string of symbols in the production being extended, and β1 through βn represent
 the strings of symbols in the alternatives of the symbol being added on,
 then, writing <A> for the left-hand side of the production being extended and
 <D> for the TWINKLE-generated nonterminal being added on, ADDON will replace
 the production:

      <A> ::= α

 by n productions:

      <A> ::= α β1
      <A> ::= α β2
        ...
      <A> ::= α βn

 At the same time ADDON removes the n productions:

      <D> ::= β1
      <D> ::= β2
        ...
      <D> ::= βn

 By expanding
 nested definitions in this way, TWINKLE ensures that each of the productions
 which it creates retains all of the context that the language designer 
 specified in the original TWINKLE production. 
 
 If the symbol being added on is any other type of symbol, say the
 symbol, a, ADDON replaces the production (its left-hand side written <A>):

      <A> ::= α
 by the production:

      <A> ::= α a
 where α is as above.
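
 The two cases handled by ADDON can be sketched in Python as follows. This
 is an illustration only: the function name addon, the representation of
 productions as (left-hand side, list of symbols) pairs, and the
 alternatives dictionary holding the pending alternatives of each generated
 nonterminal are all assumptions made for the example.

```python
def addon(productions, index, symbol, alternatives=None):
    """Extend productions[index] by `symbol`.

    productions  -- list of (lhs, rhs) pairs, rhs being a list of symbols
    alternatives -- maps a generated nonterminal to the right-hand sides of
                    its alternative productions (hypothetical bookkeeping)
    """
    if alternatives is None:
        alternatives = {}
    lhs, alpha = productions[index]
    if symbol in alternatives:
        # Nested definition: one new production per alternative, and the
        # productions of the generated nonterminal are removed.
        betas = alternatives.pop(symbol)
        productions[index:index + 1] = [(lhs, alpha + beta) for beta in betas]
    else:
        # Any other symbol is simply appended to the right-hand side.
        productions[index] = (lhs, alpha + [symbol])
    return productions
```

 Expanding the alternatives in place keeps the context α in every
 resulting production, which is the property the text emphasizes.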
 
 When it has been ascertained that production, P, is completed and is
 ready to be put into PROTAB, the procedure, PUTINPROTAB, is called with the para- 
 meter, P. PUTINPROTAB fills in the NEXT and FLAGS fields of the production 
 and writes it directly into the next available locations in PROTAB. Once this 
 has been accomplished the production is returned to the free pool. 
 
 4.3 PRODSTACK and LPSTACK
 
 The TWINKLE language offers the user only two recursive constructs. 
 These are nested definitions and list structures, which are implemented with 
 PRODSTACK and LPSTACK, respectively. These are dimensioned to allow nesting 
 of either definitions or lists to a depth of thirty, but this may, of course, 
 be easily altered in the unlikely event that it needs to be. The formats of 
 words in these two stacks are shown below (in figures 5 and 6, respectively). 
 
 Figure 5. The format of a PRODSTACK entry (fields SYMBOL and EXSYMBOL).

 Figure 6. The format of a LPSTACK entry (field names shown: SEP, LISTTYPE,
 SEPDORNO, SEPTYPE, SEPENTRY, SEPSYMBOL, EXSEPSYMBOL, LBDORNO, LBTYPE,
 LBENTRY, LBSYMBOL and EXLBSYMBOL).
 
 PRODSTACK is simply a stack of single symbols, the top symbol being
 the left-hand side of the set of productions currently being built. Whenever
 the left-hand side of a TWINKLE production is encountered, the left-hand side
 nonterminal symbol is pushed into PRODSTACK. Similarly, when the left square
 bracket of a square brackets construct is encountered, a TWINKLE-generated
 temporary dummy (the DORNO field is set to one) is created and pushed into
 PRODSTACK. Right square brackets and semicolons, which end square bracket
 constructs and productions, respectively, cause the top of PRODSTACK to be
 popped. PRODSTACK is used in assigning names to TWINKLE-generated permanent
 dummy nonterminals in the following manner. When such a dummy is required
 (e.g., in the generation of lists, see below), PRODSTACK is searched downward
 from the top for a natural nonterminal (which is identified by a zero in the
 DORNO field). There must always be such an entry because PRODSTACK always
 extends to the beginning of some TWINKLE production which must start with a
 natural nonterminal. The alphanumeric characters which make up the name of
 the nonterminal are obtained from NONTAB. The desired name is then created
 by appending to these characters a blank, the characters "DUMMY", another blank,
 and finally a number unique to this nonterminal. Since blanks cannot appear
 in natural nonterminal names, these serve to ensure that no duplication of
 
 nonterminal names can arise by this procedure. As an example of this, the list 
 structure in the TWINKLE production: 
 
 A <PROGRAM> CONSISTS OF A LIST OF S ;
 
 would be implemented with a permanent dummy nonterminal named PROGRAM DUMMY 1. 
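
 The naming rule for permanent dummies can be sketched in Python as follows.
 The function name permanent_dummy_name and the representation of PRODSTACK
 as a list of (name, DORNO) pairs with the top of the stack last are
 assumptions made for the example.

```python
def permanent_dummy_name(prodstack, serial):
    """Build the name of a permanent dummy nonterminal.

    prodstack -- list of (name, dorno) pairs, top of stack last
    serial    -- number unique to the new dummy
    """
    # Search downward from the top for a natural nonterminal (DORNO == 0).
    for name, dorno in reversed(prodstack):
        if dorno == 0:
            # Blanks cannot occur in natural names, so the result is unique.
            return f"{name} DUMMY {serial}"
    raise ValueError("PRODSTACK holds no natural nonterminal")
```

 With a stack whose only natural nonterminal is PROGRAM, the first dummy
 requested is named PROGRAM DUMMY 1, as in the example above.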
 
 Each entry of LPSTACK eventually contains all the information neces- 
 sary to construct the BNF equivalent productions for the list structure which 
 generated it. Whenever the list type of a list structure is encountered, a 
 new word is pushed into LPSTACK with an appropriate setting of the LISTTYPE 
 bit (i.e., 1 for a nonempty list and 0 for a possibly empty list). When a
 list base is recognized, the sub-fields of the EXLBSYMBOL field are set in the 
 top word of LPSTACK to identify the base. Similarly, recognition of a list 
 separator causes the subfields of the EXSEPSYMBOL to be filled in the top 
 word of LPSTACK; the SEP bit is set to one for a definite separator
 and to zero for a possibly empty separator. If the list does not have a
 separator, the EXSEPSYMBOL field is set to indicate an empty separator 
 and the SEP bit is set to zero. 
 
 The LPSTACK entry does not reflect the type of recursion desired for
 the list, i.e., left recursive or right recursive. However, this is determined
 syntactically and is transmitted to the semantics via the choice of the action
 called. Given that EXSEPSYMBOL contains the symbol, S, and that EXLBSYMBOL
 contains the symbol, B, figure 7 shows the productions which are generated
 to implement the list for various choices of SEP and LISTTYPE, using a
 TWINKLE-generated permanent dummy which implements the list structure
 and a TWINKLE-generated temporary dummy. Recall that a SEP bit of
 one designates a definite separator and a zero bit designates a possibly
 empty separator; whereas, a LISTTYPE bit of one indicates a nonempty list and a
 zero bit indicates a possibly empty list.
 
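 In the tables below, <RD> is written for the TWINKLE-generated permanent
 dummy which implements the list and <D> for the TWINKLE-generated temporary
 dummy; < > denotes the empty symbol.

                          LISTTYPE    0     0     1     1
                          SEP         0     1     0     1

      <D>  ::= <RD>                   yes   yes   no    no
      <D>  ::= < >                    yes   yes   no    no
      <RD> ::= <RD> S B               yes   yes   yes   yes
      <RD> ::= <RD> B                 yes   no    yes   no
      <RD> ::= B                      yes   yes   yes   yes

                              (a) Left

                          LISTTYPE    0     0     1     1
                          SEP         0     1     0     1

      <D>  ::= <RD>                   yes   yes   no    no
      <D>  ::= < >                    yes   yes   no    no
      <RD> ::= B S <RD>               yes   yes   yes   yes
      <RD> ::= B <RD>                 yes   no    yes   no
      <RD> ::= B                      yes   yes   yes   yes

                              (b) Right

 Figure 7. Left recursive and right recursive lists.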
 
 Note that possibly empty lists are characterized by the temporary dummy
 nonterminal; whereas, nonempty lists are characterized directly by the list
 implementing nonterminal.
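
 The generation of list productions from the SEP and LISTTYPE bits can be
 sketched in Python as follows. The sketch assumes the bit conventions
 recalled above (a SEP bit of one for a definite separator, a LISTTYPE bit of
 one for a nonempty list) and one reading of figure 7; the function name
 list_productions and the stand-in names <RD> (permanent dummy) and <D>
 (temporary dummy) are hypothetical.

```python
def list_productions(listtype, sep, left=True, base="B", separator="S",
                     rd="<RD>", d="<D>"):
    """Return the (lhs, rhs) productions implementing one list structure."""
    prods = []
    if listtype == 0:                 # possibly empty list
        prods.append((d, [rd]))       # the temporary dummy derives the list
        prods.append((d, []))         # or the empty string
    if left:
        prods.append((rd, [rd, separator, base]))
        if sep == 0:                  # the separator may be omitted
            prods.append((rd, [rd, base]))
    else:
        prods.append((rd, [base, separator, rd]))
        if sep == 0:
            prods.append((rd, [base, rd]))
    prods.append((rd, [base]))        # every list ends in a single base
    return prods
```

 A nonempty list with a definite separator therefore needs only two
 productions, while a possibly empty list with a possibly empty separator
 needs five.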
 
 4.4 Any Patterns
 
 An any pattern comprises 384 bits (eight words of 48 bits each) for
 which there exists a one-to-one correspondence between the lower numbered bits
 and the terminal symbols of the language. Bits zero through 63 correspond to
 the 64 characters of the Burroughs B5500 system; bits 66 through 85 correspond
 to the possible special terminal classes; bits 86 and above correspond
 to the special words of the language. Bits 64 and 65 are never set in an any
 pattern because they correspond to a terminating character and an illegal
 character, respectively. Any patterns are stored end-to-end in a 512-word
 array, ANYPAT. Each bit that is set in an any pattern indicates that
 the corresponding terminal is represented by the any pattern.
 
 During the preliminary translation, any patterns are actually created 
 in the negative; that is, the bits are set if the corresponding terminal is
 not represented by the particular any pattern. This condition is rectified 
 when the syntax input has been completed. Each pattern is transmitted to the 
 procedure, CLEANUP, which converts it to the required form which may involve 
 recursive calls of CLEANUP if the any base for the pattern was a nonterminal. 
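
 The bit layout and the final inversion of a negative pattern can be
 sketched in Python as follows. This is a simplified model: a Python integer
 stands in for the eight 48-bit words, the helper names bits, cleanup and
 represents are hypothetical, the number of terminals is passed in
 explicitly, and the recursive case in which the any base is a nonterminal
 is not modeled.

```python
PATTERN_BITS = 384                      # eight words of 48 bits each
NEVER_SET = (1 << 64) | (1 << 65)       # terminating and illegal characters


def bits(indices):
    """Return an integer with the given bit positions set."""
    value = 0
    for i in indices:
        value |= 1 << i
    return value


def cleanup(negative, terminal_count):
    """Convert a negative pattern (bits set for terminals NOT represented)
    into the required positive form."""
    valid = bits(range(terminal_count)) & ~NEVER_SET
    return ~negative & valid


def represents(pattern, terminal):
    """True if the terminal with this bit number is represented."""
    return bool(pattern >> terminal & 1)
```

 For a language with 90 terminal bit positions, a negative pattern with
 bits 0 and 5 set becomes a pattern representing every terminal except
 those two.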
 
 4.5 Grammar Transformations
 
 The PROTAB generated by a TWINKLE translation is not, in general, in 
 a form acceptable to BNF2FPL. A number of transformations must therefore be 
 performed on a BNF grammar to increase the probability of its acceptance to 
 the overall TWS. In addition, a few transformations are applied to increase 
 the efficiency of the resultant compiler. These are described below in 
 the order in which they are performed. 
 
 4.5.1 Back Context Absorption
 
 In translating from TWINKLE into BNF, it is important that TWINKLE 
 retain any context in the BNF that was inherent in the original TWINKLE 
 
 production. When a left recursive list is created, it is necessary for the
 non-recursive productions of the list-implementing nonterminal to absorb
 any of the context prior to it that may have arisen from constructs within
 the TWINKLE production that initially gave rise to the list. As may perhaps
 be anticipated, problems begin to crop up when more than one list is included
 in a single TWINKLE production, as in the case of nested lists.
 
 Each time a definition is complete (i.e., whenever a slash, right
 square bracket, semicolon, or special word, OR, is encountered), the following
 context-absorbing algorithm is invoked. Let {P_n | n = 1, ..., N} be the set
 of productions in PDLIST and {D_n | n = 1, ..., N} be the corresponding entries
 in PRODS. Further, let P_n(j) denote the j-th symbol on the right-hand side
 of the n-th production and let R_n denote the corresponding left-hand side.
 The right-hand sides of the P_n are then scanned for the occurrence of a left-
 recursive TWINKLE-generated dummy nonterminal, say P_n(j) (such a nonterminal
 is characterized by a 3 in the DORNO field discussed in section 4.2). If
 j > 1, or if the LAMDA bit of PRODS is set, a new production, P_k (k = N + 1),
 is created as follows:

      P_k(i - j + 1) = P_n(i),   i = j, j + 1, ...,

  ::= a LIST [LIST b SEP c] d LIST e
 
 where a, b, c, and d represent terminal symbols. The productions, before 
 context absorption, are: 
 
 P :  :: = \a d  
 
 P :  : : - *» 
 
 P :  : : =   
 
 P, :  : : = Kb 
 
 P :  : : =  c b 
 
 P^ :  : : = Xe 
 
 P :  : : =  e 
 
 where \ denotes productions, P., for which the LAMDA bit of D. is set to 1. 
 
 Then: 
 
 P :  : : = X  * 
 
 P^ :  : : = X b * 
 
 P c :  ::=  c b 
 5 
 
 P^:  ::= X e 
 
 P :  : : =  e 
 
 Po :  : : =  c  
 
 P :  : 
 P 1Q :  : 
 P :  : 
 
 - \ a 
 
 -  
 
 =  b 
 
 are the productions remaining after P, (l) and P~(2) have participated 
 in context absorption. Note that P^l) does not participate because the 
 asterisk indicates that the LEVEL field of D p is equal to OLDMARKER. Since
 P (l) and P 7 (l) cannot participate because they lack a X : 
 
 10' 
 
 11' 
 
 12' 
 
 13' 
 
  : 
 
 : = 
 
 \  * 
 
  : 
 
 : = 
 
 X b * 
 
  : 
 
 : = 
 
  c b 
 
  : 
 
 : = 
 
 \e * 
 
  : 
 
 : = 
 
  e 
 
  : 
 
 : = 
 
  
 
  : 
 
 : = 
 
  b 
 
  : 
 
 : = 
 
  
 
  : 
 
 : = 
 
  c e 
 
  : 
 
 * s; 
 
 \ a b 
 
 show the productions remaining after context has been absorbed from Po(3) and 
 2). Note that the duplication of P from P is inhibited. Finally: 
 
 P c .:  : 
 5 
 
 P :  : 
 
 :=  c b 
 :=  e 
 \ :  ::=  
 
 P :  
 
 P 12 :  
 
 P :  
 
 P . :  
 
 =  b 
 =  
 =  c e 
 = X a b 
 
 show the results of eliminating all productions marked with an asterisk. Note 
 also that RD1 is no longer explicitly left-recursive but retains implicit 
 left-recursiveness through P and P • It is easy to see that the terminal 
 strings defined by the last set of productions above are exactly those
 defined by P through P originally. 
 
 Context absorption is a local grammar transformation and is, therefore,
 carried out in PDLIST before the productions are entered into PROTAB.
 All of the other grammar transformations are global and are performed in PROTAB
 itself. To discuss these with some facility, the notation of this section will
 be altered and augmented as follows. The i-th production in PROTAB will be
 denoted by P_i, its right-hand side symbols by P_i(j), where j ranges from 1
 through l_i, the number of symbols on the right-hand side. The left-hand side
 will be denoted by R_i. Finally, for each nonterminal n, N(n) and
 T(n) will be the sets of nonterminal and terminal head symbols of n,
 respectively.
 
 4.5.2 Empty Removal
 
 The control of ADDON and PUTINPROTAB is such that empty symbols may
 appear in PROTAB only as productions in themselves. That is, if a production
 contains an empty symbol, that must be its only symbol. Even in this
 relatively mild form, however, empty symbols are unacceptable to BNF2FPL and
 it falls to TWINKLE to remove them by back-substituting and collapsing PROTAB
 around them. PROTAB always contains an initial production whose right-hand
 side is the unique objective symbol of the language enclosed between two
 occurrences of "⊥", a special termination terminal symbol that appears
 nowhere else in PROTAB. Furthermore, there is no nonterminal for which the
 only production is LAMBDA since there is a check that each nonterminal has a
 terminal string derivation.
 
 Each production, P_i, for which l_i = 1 is tested to determine whether
 P_i(1) is the empty symbol LAMBDA. When such a production is found, PROTAB
 is scanned for occurrences of R_i. Such an occurrence, say P_n(j) = R_i,
 generates a new production, P_k, at the end of PROTAB according to the rules:

 (i) if l_n = 1 then
      a) l_k = 1
      b) R_k = R_n
      c) P_k(1) is an empty symbol;

 (ii) if l_n > 1 then
      a) l_k = l_n - 1
      b) R_k = R_n
      c) P_k(m) = P_n(m), m = 1, ..., j - 1; P_k(m) = P_n(m + 1), m = j, ..., l_k.

 In the first case, P_k will be picked up in its turn as an empty production.
 In the second case, P_k will eventually be scanned for occurrences of R_i and
 possibly generate further new productions. During the course of this procedure
 new productions are compared against existing productions in PROTAB
 for identity. If a match is found the new production is not entered into
 PROTAB.
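
 The rules above can be sketched in Python as follows. The sketch is a
 simplified model and not the PROTAB code: productions are held as
 (left-hand side, tuple of symbols) pairs, the function name remove_empties
 is hypothetical, and the final step that drops the empty productions is an
 assumption about how PROTAB is collapsed around them.

```python
LAMBDA = "LAMBDA"


def remove_empties(productions):
    """Back-substitute empty productions, then drop them."""
    prods = list(productions)          # list of (lhs, rhs) with rhs a tuple
    i = 0
    while i < len(prods):
        lhs_i, rhs_i = prods[i]
        if rhs_i == (LAMBDA,):
            n = 0
            while n < len(prods):      # newly appended productions are scanned too
                lhs_n, rhs_n = prods[n]
                for j, symbol in enumerate(rhs_n):
                    if symbol != lhs_i:
                        continue
                    if len(rhs_n) == 1:                    # rule (i)
                        new = (lhs_n, (LAMBDA,))
                    else:                                  # rule (ii)
                        new = (lhs_n, rhs_n[:j] + rhs_n[j + 1:])
                    if new not in prods:                   # identity check
                        prods.append(new)
                n += 1
        i += 1
    return [p for p in prods if p[1] != (LAMBDA,)]
```

 Since every new production is either empty or one symbol shorter than the
 production it came from, the scan terminates.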
 
 4.5.3 Back Substitution of Singly Defined Nonterminals
 
 Unlike LAMBDA removal, without which PROTAB is unacceptable to
 BNF2FPL, back-substitution of singly-defined nonterminals merely serves to
 
 increase the overall efficiency of the resultant compiler by decreasing the 
 number of reductions required to recognize certain nonterminals. 
 
 The algorithm for back-substitution is very straightforward and
 proceeds as follows. The set, A, of singly defined nonterminals is determined
 by one pass through PROTAB. Then, for each nonterminal, n, in A, PROTAB is
 scanned for productions, P_i, in which P_i(j) = n for some j such that
 1 <= j <= l_i. A new production, P_k, is then created such that:

 (i) R_k = R_i;
 (ii) l_k = l_i + l_n' - 1;
 (iii) P_k(m) = P_i(m),              m = 1, ..., j - 1;
       P_k(m) = P_n'(m - j + 1),     m = j, ..., j + l_n' - 1;
       P_k(m) = P_i(m - l_n' + 1),   m = j + l_n', ..., l_k;

 where P_n' is the single production for which R_n' = n. The production, P_i, is
 then deleted and the scan for occurrences of n continues, eventually reaching
 P_k and checking for possible further occurrences of n.
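
 Back-substitution can be sketched in Python as follows. This is an
 illustration under stated assumptions: productions are (left-hand side,
 tuple of symbols) pairs, the function name back_substitute is hypothetical,
 the substituted production replaces the original in place rather than being
 appended at the end of the table, and the NOBACK flag is not modeled.

```python
from collections import Counter


def back_substitute(productions):
    """Substitute the single definition of each singly defined nonterminal."""
    prods = list(productions)
    counts = Counter(lhs for lhs, _ in prods)
    singly = [lhs for lhs, _ in prods if counts[lhs] == 1]
    for n in singly:
        body = next(rhs for lhs, rhs in prods if lhs == n)
        if n in body:
            continue                   # guard: never expand a self-reference
        i = 0
        while i < len(prods):
            lhs, rhs = prods[i]
            if lhs != n and n in rhs:
                j = rhs.index(n)
                # Symbols before j, then the definition of n, then the rest.
                prods[i] = (lhs, rhs[:j] + body + rhs[j + 1:])
                continue               # rescan for further occurrences of n
            i += 1
    return prods
```

 The new right-hand side has length l_i + l_n' - 1, in agreement with rule
 (ii) above.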
 
 4.5.4 Dummy Insertion
 
 It has been mentioned, briefly, that two otherwise indistinguishable
 Floyd productions may be differentiated by a lookahead of at most three symbols.
 If this much lookahead is insufficient, BNF2FPL attempts to combine the
 productions, deciding (in essence) that the differentiation may be postponed. Of
 course differentiation must eventually be accomplished by lookahead, if at all.
 There are two situations in which this combination cannot be performed. First, 
 a terminal symbol may not be combined with a nonterminal. Second, if A and B 
 are two nonterminals for which a combination is proposed, that combination 
 cannot be carried out if A is a head symbol of B (or, of course, if B is a 
 head symbol of A). The reason for this is that the parser, in the course of 
 
 looking for an A or a B, will then be satisfied by finding an A, even though 
 that A may, in fact, be the beginning of a B which a correct parse would 
 discover. 
 
 The approach to this problem in the original TWS was to look for BNF 
 productions of the form: 
 
       ::= α A β
       ::= α <B> γ                                        (1)

 where α is a nonempty string of terminals and/or nonterminals, and A is a
 head symbol of the nonterminal, B (A may be either a terminal or a nonterminal).
 These productions were modified to: