Report No. 396

TWINKLE--A SYNTAX LANGUAGE FOR A TRANSLATOR WRITING SYSTEM*

by

Robert Leroy Mercer

May 15, 1970

ILLIAC IV Document No. 218

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the Advanced Research Projects Agency as administered by the Rome Air Development Center under Contract No. USAF 30(602)-4144 and submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, February 1970.

ABSTRACT

TWINKLE is a language designed to aid in the syntactic specification of programming languages. In addition to the constructs available in BNF, TWINKLE provides for easy specification of lists and other frequently used linguistic structures. By providing a large number of alternatives for its various constructs, TWINKLE allows the language designer to specify a language in terms that approach natural language. The implementation of a compiler for TWINKLE is described. This compiler is the first phase of the ILLIAC IV Translator Writing System.

ACKNOWLEDGEMENTS

The author wishes to express his appreciation for the advice and efforts of Dr. Robert S. Northcote who has helped immeasurably in the creation of this paper.
Thanks are also due the author's colleagues Alan Beals, Nelson Machado, and Jacques LaFrance whose contributions to the language described herein and discussions on the translator writing system have been invaluable. For financial support, I wish to acknowledge the National Science Foundation for its award of a fellowship. I also acknowledge support by the ILLIAC IV project for willing provision of the necessary computer and other physical facilities. Finally, deep gratitude is expressed to Mrs. Sandy McCabe and Mrs. Shirley Brown for their time and effort in typing the manuscript.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Backus Naur Form
   1.2 Translatable Backus Naur Form
2. THE TWINKLE METALANGUAGE FOR SYNTACTIC SPECIFICATION
   2.1 Syntactic Symbols
       2.1.1 Terminals
             2.1.1.1 Characters
             2.1.1.2 Special Words
             2.1.1.3 Character Mode Terminals
             2.1.1.4 Meta-Terminals
             2.1.1.5 Blanks
       2.1.2 Nonterminals
       2.1.3 Any Symbols
       2.1.4 Square Brackets
       2.1.5 Maybe Symbols
       2.1.6 Enclosures
       2.1.7 Unordered List
       2.1.8 Precedence Structures
       2.1.9 Lists
             2.1.9.1 Head
             2.1.9.2 Type
             2.1.9.3 Base
             2.1.9.4 Separator
             2.1.9.5 Tail
       2.1.10 Seeded Lists
   2.2 Semantic Symbols
       2.2.1 Actions and Tests
       2.2.2 Simple Calls
       2.2.3 Declaration Calls
       2.2.4 Implicit Calls
       2.2.5 Parameterized Semantic Calls
       2.2.6 Bit Actions and Tests
       2.2.7 The Tail
       2.2.8 Placement of Calls
   2.3 Null and Empty Symbols
3. CONTROL OF THE TWINKLE TRANSLATION
   3.1 The Translator Writing System
   3.2 Control Statements
       3.2.1 Language Name Designation
       3.2.2 Print Options
       3.2.3 The Parser Type Option
       3.2.4 Zip Control
       3.2.5 Program Parameter Control
       3.2.6 Executable Compiler Options
       3.2.7 Miscellaneous Control Options
   3.3 Burroughs B-5500 Control Cards for Executing TWINKLE
4. IMPLEMENTATION OF THE TWINKLE TRANSLATOR
   4.1 NONTAB, SYMTAB, and OPRTAB
   4.2 PROTAB, PRODS and PDLIST
   4.3 PRODSTACK and LPSTACK
   4.4 Any Patterns
   4.5 Grammar Transformations
       4.5.1 Back Context Absorption
       4.5.2 Empty Removal
       4.5.3 Back Substitution of Singly Defined Nonterminals
       4.5.4 Dummy Insertion
   4.6 TWINKLE Output Files
       4.6.1 TABLESF
       4.6.2 ACTIONS
   4.7 ZIP Files
5. SUMMARY
APPENDIX
   A. Reserved Words for TWINKLE
   B. The Syntax of TWINKLE written in TWINKLE and in BNF
LIST OF REFERENCES

LIST OF FIGURES

1. The syntax of the list with separator
2. Block diagram of the Translator Writing System
3. An entry in the NONTAB table
4. PROTAB word format
5. The format of a PRODSTACK
6. The format of a LPSTACK
7. Left recursive and right recursive lists

1. INTRODUCTION

The recent proliferation of digital computers has spawned an ever increasing number of formal languages for computer programming and related purposes. Creating a compiler for such a formal language is a decidedly nontrivial task, often requiring several man-years of effort. Therefore, from this burgeoning stock of languages and compilers, several widely applicable compiler writing techniques have been extracted which at once lead to a deeper understanding of the compiler writing process and to a considerable reduction of the effort involved.

Because of its importance in obtaining a clear and precise definition of a formal language, the development of syntax metalanguages has been intimately related to the development of compiler writing techniques.
These metalanguages range from Backus Naur Form (BNF) [1], and its many variants [2, 3, 4, 5], to languages more suitable to syntactic recognition, such as the Floyd production language (FPL) [6] and operator precedence tables [7], and even to the conventional programming languages, FORTRAN [8] and PL1 [9]. Each of these languages has certain advantages: relative compactness and clarity of syntactic structure in the case of BNF and its derivatives; a very clear and explicit statement of the recognition algorithm in the case of FPL and operator precedence tables; and, finally, virtually immediate implementation in the case of FORTRAN and PL1. As is to be expected, ease of producing a linguistic description decreases rapidly as the description itself approaches an implemented compiler.

The primary aim of the TWINKLE metalanguage is to provide a major increase in the ease with which a syntactic specification may be created by a language designer and in the ease with which that syntax may be understood by a user of the language unfamiliar with metalanguages in general. This has been achieved through the introduction of a wide variety of syntactic symbols for designating many of the common syntactic structures such as lists, enclosures, etc., and through the provision of numerous English words and phrases, which may be used with commonly understood meanings, as an integral part of the syntactic specification.

As a secondary aim, TWINKLE has been designed to present a unified front for the University of Illinois Translator Writing System (TWS). TWINKLE is the input language and the TWINKLE translator is the first phase of the TWS. Thus, TWINKLE combines BNF as described by Beals [3] and translatable BNF (TBNF) as described by Trout [4]. Before progressing to a detailed description of the TWINKLE language and translator, which occupies the remainder of this thesis, a brief description is provided of both BNF and TBNF.

1.1 Backus Naur Form

The basic unit of the BNF description of a language is the production. A production consists of a nonterminal (the left hand side) followed by the character triple "::=", followed by a list of terminals and nonterminals (the right hand side). Each nonterminal consists of a string of characters enclosed in either quotes (" ") or angle brackets (< >). The string may not include ", <, or > and may not start with *. Terminals are special words (strings of alphanumeric characters preceded by #) or characters (A, B, C, etc.). Characters used in the metalanguage (#, ", <, /, etc.) must also be preceded by a # when used as terminals. Two productions with the same left hand side may be combined into one by including the right hand side of the second with the right hand side of the first and separating it from the latter with the metacharacter "/". Productions themselves are separated from one another by the metacharacter ";".

1.2 Translatable Backus Naur Form

TBNF is, in itself, a large step toward simplifying syntactic specification. In addition to the BNF structures described in the last section, TBNF allows:

   (i)   Kleene star:
            <x>* = λ | <x> | <x><x> | <x><x><x> | ... ;
   (ii)  Ampersand for optional presence of some symbol:
            <x>& = <x> | λ ;
   (iii) Square bracket construct to delimit groups of symbols and alternatives:
            <a> ::= [<x> | <y>] z
         is equivalent to
            <a> ::= <b> z
            <b> ::= <x> | <y> ;
   (iv)  <x> list = <x> <x>* ;
   (v)   <x> list separator <y> = <x> [<y> <x>]* ;
   (vi)  <ANY> = any symbol at all ;
   (vii) BUT, used in conjunction with <ANY> to reduce its generality.

Thus TBNF is considerably more general than BNF. Note, however, that TBNF does not allow left recursion because the parser generated employs recursive descent.

2. THE TWINKLE METALANGUAGE FOR SYNTACTIC SPECIFICATION

When work was begun on the TWINKLE language, two metalanguages, BNF and TBNF, were already in use at the University of Illinois as input languages to the TWS.
The BNF input yielded a deterministic parsing algorithm based on the Floyd Production Language (FPL), as described by Beals [3]; while TBNF input yielded a recursive descent (RD) parsing algorithm, as described by Trout [4]. Each parser has certain advantages but, once either BNF or TBNF has been chosen as the metalanguage, it requires a major effort to convert the description to the alternate form. Thus, although it would be desirable, because of its relatively rapid generation, to create a RD parser during the debugging phases of the language description, it would be equally desirable, because of the rigorous exclusion of ambiguity inherent in the nature of the deterministic FPL parser, to create a FPL parser when the last phase of the compiler creation is reached. Unfortunately this ideal has not been attainable in the past, primarily due to the difficulty of translating TBNF into BNF by hand. These considerations, therefore, dictate that TWINKLE be a superset of both BNF and TBNF, so that existing language specifications may be accepted by the new system with little or no change, and that the TWINKLE translator output be either BNF or TBNF.

The basic form of TWINKLE, therefore, is cast in the familiar BNF mode. That is, a TWINKLE syntactic specification consists of one or more productions, each of which has a left hand side, which is the nonterminal being defined (wholly, or in part) by the production, and a right hand side, which comprises a set of alternative definitions for the nonterminal in the left hand side. Each definition, in turn, comprises a string of TWINKLE syntactic and semantic symbols. In BNF the productions are separated from one another by semicolons; the definitions by vertical bars, which are actually rendered in the implementation by a slash; and the left and right hand sides of a production by the character triple "::=".
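
The surface form just described (productions ended by semicolons, alternatives divided by slashes, the two sides joined by "::=", and a sharp in front of terminals that would otherwise read as metacharacters) can be pictured with a small reader. The following Python sketch is only an illustration and is not part of the TWS: the function names tokenize and parse_productions are hypothetical, only angle-bracket nonterminals are handled (not the quoted form), and no error recovery is attempted.

```python
def tokenize(text):
    """Split BNF text into (kind, value) tokens; '#' marks a terminal."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "<":                      # nonterminal: <name>
            j = text.index(">", i)
            tokens.append(("NT", text[i + 1:j]))
            i = j + 1
        elif ch == "#":                      # sharped terminal
            j = i + 1
            if j < len(text) and text[j].isalnum():
                while j < len(text) and text[j].isalnum():
                    j += 1                   # special word, e.g. #BEGIN
            else:
                j += 1                       # single character, e.g. #;
            tokens.append(("T", text[i + 1:j]))
            i = j
        elif text.startswith("::=", i):
            tokens.append(("DEF", "::="))
            i += 3
        elif ch in "/;":                     # unsharped metacharacters
            tokens.append(("META", ch))
            i += 1
        else:                                # any other bare character
            tokens.append(("T", ch))
            i += 1
    return tokens


def parse_productions(text):
    """Return {left hand side: [alternative, ...]} for ';'-separated productions."""
    grammar = {}
    tokens = tokenize(text)
    pos = 0
    while pos < len(tokens):
        kind, lhs = tokens[pos]
        if kind != "NT" or pos + 1 >= len(tokens) or tokens[pos + 1][0] != "DEF":
            raise ValueError("a production must start with <name> ::=")
        pos += 2
        # Productions sharing a left hand side pool their alternatives.
        alternatives, current = grammar.setdefault(lhs, []), []
        while pos < len(tokens) and tokens[pos] != ("META", ";"):
            if tokens[pos] == ("META", "/"):
                alternatives.append(current)
                current = []
            else:
                current.append(tokens[pos])
            pos += 1
        alternatives.append(current)
        pos += 1                             # step over the ';'
    return grammar


if __name__ == "__main__":
    # <A>, <B>, <C> are illustrative names only.
    sample = "<A> ::= #BEGIN <B> #END / #BEGIN #END; <B> ::= <C> / <B> #; <C>;"
    for name, alts in parse_productions(sample).items():
        print(name, alts)
```

Because the sharp is consumed by the tokenizer, a sharped semicolon or slash reaches the production reader as an ordinary terminal and cannot be mistaken for a separator.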

It is one of the aims of the TWINKLE project to make possible the rigorous specification of a language syntax in a way that is at once acceptable to both human beings and computers. To this end, English phrases have been provided for the replacement and/or embellishment of the metalinguistic symbolism of BNF and TBNF. In addition, several features not present in either BNF or TBNF have been introduced. For example, the BNF productions,

   <PROGRAM> ::= #BEGIN <STATEMENT LIST> #END / #BEGIN #END;
   <STATEMENT LIST> ::= <STATEMENT> / <STATEMENT LIST> #; <STATEMENT>;

may be written in TWINKLE in the much more readable form:

   A <PROGRAM> CONSISTS OF A POSSIBLY EMPTY LIST OF <STATEMENT>S          (1)
   SEPARATED BY SEMICOLONS ENCLOSED IN #BEGIN AND #END;

At first glance this might appear to have two distinct interpretations: specifically, a program may be either

   BEGIN s ; s ; s ; ... ; s END

or

   s BEGIN; END s BEGIN; END s ... BEGIN; END s

where "s" here stands for <STATEMENT>. In fact, the former interpretation is made. If the language designer wishes to express the latter form of program, he may write:

   A <PROGRAM> CONSISTS OF A POSSIBLY EMPTY LIST OF <STATEMENT>S          (2)
   SEPARATED BY [SEMICOLONS ENCLOSED IN #BEGIN AND #END];

In TWINKLE, the square brackets ([ ]) serve the function of delineating clause and phrase structure in productions. The enclosure operator always acts on the immediately preceding syntactic symbol which is at the same nesting level as the enclosure operator itself. Thus, in production (1), it is the possibly empty list that is to be enclosed and not the semicolons. In production (2), on the other hand, the square brackets associate the enclosure operator with the semicolons and indicate that this construct, taken as a whole, is to be the separator of the possibly empty list.

It is to be noted, however, that the TWINKLE language does not enforce strict grammatical usage of English but, rather, allows for such usage by the language designer.
Thus, it will be found that, in the TWINKLE syntax, the articles "A" and "AN" are treated equally, with the result that a determined corruptor of English might write (1) as

   AN <PROGRAM> CONSISTS OF AN POSSIBLY EMPTY LIST OF <STATEMENT>S
   SEPARATED BY SEMICOLONS ENCLOSED IN #BEGIN AND #END;

The TWINKLE program, in its most general form, is made up of three distinct portions, of which the language syntactic description is the second. The first portion is a set of control statements which determine, among other things, the nature and volume of the TWINKLE output and the processing options for the remainder of the TWS. The construction and use of this portion is dealt with in Chapter 3. The third portion, the semantic tail, conveys the necessary semantic information about the language being described. This is discussed briefly in 2.2.5, and more fully by Machado [10]. The remainder of the current chapter is a discussion of the syntactic and semantic symbols available to the language designer for the syntactic portion of the TWINKLE system.

2.1 Syntactic Symbols

2.1.1 Terminals

Terminals are those symbols of which the object program is ultimately composed. They may occur only on the right hand side of a production and are represented in the syntax in a number of different ways. Terminals fall into the following classes: characters, special words, character mode terminals, meta- or special class terminals, and blanks.

2.1.1.1 Characters

The simplest form of the terminal is the character. Here character refers to any of the twenty-seven special characters accepted by equipment of the Burroughs B-5500 computer system. These are included, together with the other thirty-seven B-5500 characters, in Table I. In the syntax, a character may be represented by prefixing it with a sharp (#) when the character standing alone would have some special significance to TWINKLE (i.e., when the character is a metacharacter).
For example, a comma is represented in the syntax by the symbol pair "#,". However, since more than half of the special characters available are metacharacters, it is probably safest to include the sharp in all cases. This is a point at which TWINKLE diverges from the standard BNF and TBNF. The latter have fewer metacharacters and, as such, require fewer sharps. Although it is very easy to insert the necessary sharps, it may be desirable to make a preliminary run through the TWINKLE translator alone to isolate what trouble spots there may be.

Table I. The Burroughs B-5500 Character Set

   Terminal  Code  Classes    Terminal  Code  Classes    Terminal  Code  Classes    Terminal      Code  Classes
   0           0   0,1,3      A          17   0,1,2      K          34   0,1,2      T              51   0,1,2
   1           1   0,1,3      B          18   0,1,2      L          35   0,1,2      U              52   0,1,2
   2           2   0,1,3      C          19   0,1,2      M          36   0,1,2      V              53   0,1,2
   3           3   0,1,3      D          20   0,1,2      N          37   0,1,2      W              54   0,1,2
   4           4   0,1,3      E          21   0,1,2      O          38   0,1,2      X              55   0,1,2
   5           5   0,1,3      F          22   0,1,2      P          39   0,1,2      Y              56   0,1,2
   6           6   0,1,3      G          23   0,1,2      Q          40   0,1,2      Z              57   0,1,2
   7           7   0,1,3      H          24   0,1,2      R          41   0,1,2      ,              58   0,1,4
   8           8   0,1,3      I          25   0,1,2      $          42   0,1,4      %              59   0,1,4
   9           9   0,1,3      .          26   0,1,4      *          43   0,1,4,6    ≠              60   0,1,4,5
   #          10   0,1,4      [          27   0,1,4      -          44   0,1,4,6    =              61   0,1,4,5
   @          11   0,1,4      &          28   0,1,4      )          45   0,1,4      ]              62   0,1,4
   ?          12   0,1,4      (          29   0,1,4      ;          46   0,1,4      "              63   0,1,4
   :          13   0,1,4      <          30   0,1,4,5    ≤          47   0,1,4,5    special word    -   0,7
   >          14   0,1,4,5    ←          31   0,1,4      blank      48   0,1        <*I>            -   0,7
   ≥          15   0,1,4,5    ×          32   0,1,4,6    /          49   0,1,4,6    <*N>            -   0,7
   +          16   0,1,4,6    J          33   0,1,2      S          50   0,1,2      <*S>            -   0,7

   Classes of Any Base Symbols:
   0 = Any Terminal             4 = Any Special Character
   1 = Any Character            5 = Any Relational Operator
   2 = Any Letter               6 = Any Algebraic Operator
   3 = Any Digit                7 = Any Non-Character

An alternative method for indicating a character in the syntax, which avoids the details of the sharp and which provides for a more readable syntax, consists of writing down the English word or phrase which identifies the character in question. Thus, in production (1) above, SEMICOLONS is used in place of the equally acceptable "#;". While this form is not available for all of the special characters, it proves quite useful in practice. A complete list of these alternatives is given in the TWINKLE syntax (see Appendix B).

2.1.1.2 Special Words

Many times it is convenient to consider a group of letters and digits as being, conceptually, a single terminal. Thus, in languages of the ALGOL family, the letter strings BEGIN and END are each taken as single terminals. This approach has an advantage in the milieu of the TWS in that these conceptual units, or special words, are compiled relatively quickly by the scanner as opposed to the more laborious and time consuming letter by letter compilation through the syntax and semantics.

Any string of letters and digits which begins with a letter may be used as a special word. It must not have embedded blanks, and the character immediately after it must not be alphanumeric. As with characters, a special word must be prefixed by a sharp in the syntax if it would otherwise be of special significance to the TWINKLE translator. Since there are well over one hundred such words in TWINKLE (see Appendix A), it is safest to use sharps liberally. Again, there is a divergence of TWINKLE at this point from BNF and TBNF which is easily overcome.

2.1.1.3 Character Mode Terminals

While the special word is often the better way of entering alphanumeric information, there are times when character by character input is actually preferable. For example, the parsing of FORTRAN and B-5500 ALGOL FORMAT statements is simplified if done in the character mode.
More generally, any time there is an abundance of single character significance in a syntactic entity, it is better parsed and more compactly described in the character mode.

If the sequence of characters to be dealt with consists entirely of digits, it may be written into the syntax directly because an unadorned number has no special significance to the TWINKLE translator. If, however, a more general sequence must be handled, the sequence must be preceded by the word ALPHA, which indicates to TWINKLE that it must consider the following sequence of characters specially. Since ALPHA is a bit long, it behooves one to provide a means of keeping its use to a minimum. To this end, the construct,

   [ALPHA A / ALPHA B / ALPHA C / ... / ALPHA Z],

is equivalent to the more compact form,

   ALPHA [A / B / C / ... / Z]

Another, more obscure, method of specifying alphanumeric characters (or, in fact, any of the Burroughs B-5500 characters) is the code construct, which is based on the internal binary representations of the various characters. This form of character representation is a carry over from TBNF, where it was adopted primarily because the question mark is not a valid character on punched cards in the B-5500 system. It consists of the word CODE followed by an integer between zero and sixty-three which is the internal code of the character being indicated.

2.1.1.4 Meta-Terminals

Because of the advantages attendant to allowing the scanner to perform a certain amount of simple syntactic analysis immediately on the input string (as, for example, in the recognition of special words), the TWS scanner also recognizes members of three special classes of terminals: identifiers, numbers, and strings. These meta-terminals are represented in the syntax by the symbols <*I>, <*N>, and <*S>, respectively. In an English-like syntax, they may be represented by IDENTIFIER (or IDENTIFIERS), NUMBER (or NUMBERS), and STRING (or STRINGS).
An identifier is any sequence of alphanumeric characters beginning with a letter, provided the sequence is not a special word of the language. A string is any sequence of characters (excluding the quote) enclosed in quotes. A discussion of how the scanner handles these items has been given by Machado [10].

In addition to these three meta-terminals, TWINKLE allows for the syntactic specification of up to twenty other meta-terminal symbol classes. In the syntax, these are represented by the symbol <*n> where n is an integer between four and twenty-three and identifies the meta-terminal. A special scanner is necessary to take advantage of this facility.

2.1.1.5 Blanks

Blanks are specified in the syntax by the word BLANK. A blank can only be scanned in the character mode.

2.1.2 Nonterminals

Nonterminals are specified in TWINKLE as strings of characters, called nonterminal names, enclosed in either angle brackets or quotes (< > or " "). For obvious reasons the nonterminal name may not include either angle brackets or quotes. Furthermore, any blanks which appear in the nonterminal name are disregarded. Thus, two nonterminal names which differ only in their blanks are treated identically. In the BNF or TBNF output resulting from a TWINKLE translation, the blanks in the nonterminal names displayed are, in fact, removed. To retain a modicum of readability in this compact form it is advisable to hyphenate multi-word nonterminal names rather than separate their words by blanks.

2.1.3 Any Symbols

In BNF, to define a nonterminal which is any of the twenty-six letters of the alphabet, one must write:

   <LETTER> ::= A / B / C / ... / X / Y / Z

Trout, in TBNF, introduced the pseudo-nonterminal, <ANY>, which stands for any terminal symbol. If not all terminals are to be indicated, the exceptions, if small in number, may follow the pseudo-nonterminal, each preceded by the special word, BUT. For example, any terminal except BEGIN and END may be written:

   <ANY> BUT #BEGIN BUT #END

This construct has been used primarily in error recovery in TBNF languages.
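
The effect of an any symbol with exceptions can be pictured as a membership test: a terminal matches unless it appears after a BUT. The Python sketch below is a hedged illustration only. The names any_but and skip_while are hypothetical, and the way the test is used to step over tokens during error recovery is an assumption about one plausible use, not a description of the TBNF or TWS parsers.

```python
def any_but(exceptions=()):
    """Build a matcher for 'any terminal BUT the listed exceptions'."""
    excluded = frozenset(exceptions)

    def matches(terminal):
        # Every terminal matches unless it was named after a BUT.
        return terminal not in excluded

    return matches


def skip_while(tokens, start, matches):
    """Advance from `start` past every token accepted by `matches`."""
    pos = start
    while pos < len(tokens) and matches(tokens[pos]):
        pos += 1
    return pos


if __name__ == "__main__":
    # Models: any terminal BUT #BEGIN BUT #END
    filler = any_but(["BEGIN", "END"])
    stream = ["X", ":=", "1", ";", "END", "Y"]
    resume_at = skip_while(stream, 0, filler)
    print(resume_at, stream[resume_at])
```

In this picture the scan stops at the first excluded terminal, which is the sort of resynchronization point an error recovery production would want.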

In TWINKLE, the any symbol has been generalized and has become a powerful programming tool. The syntax of the any symbol is shown below:

           { #TERMINAL             }
           { #CHARACTER            }
           { #LETTER               }     { λ                                     }
           { #DIGIT                }     { LIST OF [#BUT <TERMINAL>]             }
   #ANY    { #SPECIAL #CHARACTER   }     { #BUT #[ LIST OF <TERMINAL>S SEP #/ #] }
           { #RELATIONAL #OPERATOR }
           { #ALGEBRAIC #OPERATOR  }
           { #NONCHARACTER         }
           { <NONTERMINAL>         }

                    BASE                            EXCEPTION LIST

Use has been made of a rather simple two-dimensional extension of TWINKLE: square brackets have been replaced by vertical braces with the alternatives occupying one line each; the Greek letter "λ" is used instead of the special word, LAMBDA.

The terminals in each of the bases, except for <NONTERMINAL>, are shown in Table I. The <NONTERMINAL> base is unique in that, first, the elements that it contains depend upon the actual nonterminal symbol used and, second, they are not restricted to terminals but include all of the alternative definitions of the nonterminal. Any terminals which are in the base, but which are not desired, may be written in the exception list following the <NONTERMINAL>. The TBNF form of the exception list is still accepted by TWINKLE but, as with the ALPHA list, it is possible to use only one BUT and enclose the terminals of the exception list in square brackets immediately following it. Thus, in place of

   BUT #BEGIN BUT #END BUT #LEFT BUT #RIGHT,

one may write

   BUT [#BEGIN / #END / #LEFT / #RIGHT]

2.1.4 Square Brackets

In English, clauses are separated at one level by commas and at a higher level by semicolons. Beyond this either more than one sentence is used or the clause separation must be done by the reader from context. Even this, however, does not prevent ambiguity beyond four levels or so. A language such as TWINKLE, in which it is necessary to indicate clause nesting to an arbitrary level, must have a more powerful mechanism available.
Since the semicomma and the demisemicolon do not yet exist, it was decided that clauses and other such ensembles, which are intended as single syntactic entities, be enclosed in square brackets. Two examples of this have already been encountered: the terminal list following ALPHA, and the list following BUT. Beyond these the square brackets find several other uses; whenever one or more symbols, or groups of symbols, appear at one spot in a production they may be enclosed in square brackets. Thus, the productions,

   AN <ANY SYMBOL> CONSISTS OF #ANY #TERMINAL;
   AN <ANY SYMBOL> CONSISTS OF #ANY #CHARACTER;
   AN <ANY SYMBOL> CONSISTS OF #ANY #LETTER,

may be written more compactly as

   AN <ANY SYMBOL> CONSISTS OF #ANY FOLLOWED BY
   [#TERMINAL OR #CHARACTER OR #LETTER].

For purposes of adding semantic symbols, which will be discussed later, the special word, ANY, and the bracket construct, taken as a whole, are considered to be at level zero of the production, while the special words, TERMINAL, CHARACTER, and LETTER, are considered to be at level one.

Alternatively, an entire production may be nested in square brackets and one may write:

   AN <ANY SYMBOL> CONSISTS OF #ANY FOLLOWED BY AN                        (4)
   [<ANY BASE> WHICH IS DEFINED TO BE #TERMINAL OR #CHARACTER OR #LETTER].

Note that although all previous forms of arrow are still valid in the nested production, it is also permissible to include the special word, WHICH, so that the construct will look more like an English clause. When a nonterminal, such as <ANY BASE>, is defined in a nested production it may then appear anywhere else in the syntax just as if it had been defined in the usual manner. There are, however, some precautions to be taken with nested productions. These stem from the fact that the nonterminal so defined may have additional definitions elsewhere in the syntax.
In this case the nonterminal represents the totality of its alternative definitions, except when it appears on the left hand side of a nested production, in which case it represents only those definitions which appear on the right hand side of the same nested production. To illustrate this, suppose that in addition to the production above, in which <ANY BASE> is defined, one also has the production:

   A <SOME SYMBOL> CONSISTS OF #SOME FOLLOWED BY AN                       (5)
   [<ANY BASE> WHICH IS #DIGIT OR #DIGITS]

The two productions (4) and (5) are then equivalent to the BNF productions:

   <ANY SYMBOL> ::= #ANY <ANY BASE 1>;
   <SOME SYMBOL> ::= #SOME <ANY BASE 2>;
   <ANY BASE> ::= <ANY BASE 1> / <ANY BASE 2>;
   <ANY BASE 1> ::= #TERMINAL / #LETTER / #CHARACTER;
   <ANY BASE 2> ::= #DIGIT / #DIGITS

The square bracket construct may also be applied to a single string of symbols to indicate that the string is to be taken as a unit itself. This is useful in constructs such as the maybe symbol, enclosures, and lists described below.

2.1.5 Maybe Symbols

Frequently a syntactic structure has one or more substructures which may be omitted without syntactic error. Thus, for example, in ALGOL a list of labels preceding a statement is optional. Many other examples may be found in algorithmic languages; several may be found in TWINKLE itself. To make the specification of such structures as easy as possible, Trout [4] adopted the Brooker and Morris question mark [11], changing it, in the process, into an ampersand because the question mark is an illegal character on B-5500 cards.

In TWINKLE, either an ampersand or a question mark may be placed after a symbol (or group of symbols enclosed in square brackets) to indicate that it is optional. The English-like form of this construct consists of preceding the optional symbol by the special word phrase, POSSIBLY ONE. This form is more general, in that it may be applied directly to lists and enclosures, whereas they must be enclosed in brackets when followed by a question mark or ampersand. An example of the English form follows.
The production,

   AN <A> CONSISTS OF AN <B> FOLLOWED BY POSSIBLY ONE <C>,

is equivalent to the two BNF productions:

   <A> ::= <B> <C>;
   <A> ::= <B>

Under more complex conditions, the maybe symbol can account for a considerable increase in readability of the syntax.

2.1.6 Enclosures

Another very common construct in computer languages is that in which some structure, such as a list of subscripts, is enclosed in delimiters, such as parentheses. The delimiters may be different: e.g., the special words, BEGIN and END, which bracket compound statements in ALGOL; they may be different but very closely related: e.g., the left and right parentheses which enclose subscript lists in FORTRAN and PL1; they may be identical: e.g., quotes which delimit strings in ALGOL or periods which enclose logical operators in some dialects of FORTRAN. Corresponding to these possibilities there are three forms of enclosure. The general representative of the first form may be symbolized as:

   <SYMBOL> #ENCLOSED #IN <SYMBOL> #AND <SYMBOL>

where <SYMBOL> represents any basic TWINKLE symbol or group of symbols enclosed in brackets. The latter two symbols, if not enclosed in brackets, may not be enclosure symbols themselves. Using this construct and the list symbol discussed below, a compound statement in ALGOL may be defined by the very clear production:

   A <COMPOUND STATEMENT> IS DEFINED TO BE A POSSIBLY EMPTY LIST OF
   <STATEMENT>S SEPARATED BY SEMICOLONS ENCLOSED IN #BEGIN AND #END;

The second form of enclosure actually applies to only three sets of characters in the Burroughs character set: parentheses, square brackets, and angle brackets. It may be symbolized, using the two-dimensional bracket construct described earlier, as:

                             { #PARENTHESES      }
   <SYMBOL> #ENCLOSED #IN    { #ANGLE #BRACKETS  }
                             { #SQUARE #BRACKETS }

Finally, the third form of enclosure symbol is simply:

   <SYMBOL> #ENCLOSED #IN <SYMBOL>

The special word, S, may be added to make the second symbol plural.

2.1.7 Unordered List

The unordered list provides a means of indicating that a group of items may appear in any order.
One very simple example of the use of this is the PL1 iterated DO loop. The leading statement may have an initial value, increment, and final value for the control variable; the increment and final value may appear in either order. Symbolically the unordered list has the form:

   #UNORDERED #LIST #OF [<SYMBOL> #AND ... #AND <SYMBOL>];

or

   [<SYMBOL> #AND ... #AND <SYMBOL>] #IN #ANY #ORDER

2.1.8 Precedence Structures

The precedence structure was introduced in TBNF to allow for the specification of operator precedence in a BNF environment. Since the precedence structure does not lend itself readily to a simple English alternative, its syntax has not been expanded from the TBNF version. A precedence structure consists of: the special word, OPERATOR, followed by a list of precedence groups enclosed in square brackets; followed by the special word, ON; and, finally, by the operand on which the precedence is based. Each precedence group consists of a list of symbols, the operators, followed by the special word, PRECEDENCE, and a pair of integers separated by a comma enclosed in parentheses. The integers indicate the precedence of the preceding operators in the stack and in the input stream, respectively. Succeeding precedence groups are separated from one another by slashes. The following is an example of the use of a precedence structure:

   <ARITHMETIC EXPRESSION> ::= OPERATOR [#+ #- PRECEDENCE (1,1) /
                               #/ #× PRECEDENCE (2,2)] ON <OPERAND>

2.1.9 Lists

The list is one of the most useful of the TWINKLE constructs. Consequently it occurs in a wide variety of English-like and symbolic forms. To indicate a simple list without separators, the right square bracket of a nested production or square bracket construct may be followed by an asterisk or by a plus sign. The asterisk denotes a possibly empty list while the plus sign denotes a list that must have at least one element.
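
One way to picture what a list marker stands for is to spell out ordinary BNF productions that generate the same strings. The Python sketch below is an assumption-laden illustration, not the expansion the TWINKLE translator actually performs: the function name expand_list, the generated helper name ending in "-CORE", and the use of a separate LAMBDA alternative for the possibly empty case are all hypothetical choices made for the example.

```python
def expand_list(name, base, separator=None, possibly_empty=False,
                right_recursive=False):
    """Return BNF productions (as strings) generating a list of `base` items.

    `base` and `separator` are written as they would appear in a production,
    e.g. "<STATEMENT>" and "#;".
    """
    # A possibly empty list is built on a nonempty helper list so that an
    # empty list can never be followed by a dangling separator.
    core = name + "-CORE" if possibly_empty else name
    link = [separator] if separator else []
    if right_recursive:
        step = [base] + link + ["<" + core + ">"]     # item first, list after
    else:
        step = ["<" + core + ">"] + link + [base]     # list first, item after
    productions = ["<%s> ::= %s / %s;" % (core, base, " ".join(step))]
    if possibly_empty:
        productions.insert(0, "<%s> ::= LAMBDA / <%s>;" % (name, core))
    return productions


if __name__ == "__main__":
    # The names LABELS, LABEL, BODY and STATEMENT are illustrative only.
    for line in expand_list("LABELS", "<LABEL>"):
        print(line)
    for line in expand_list("BODY", "<STATEMENT>", separator="#;",
                            possibly_empty=True):
        print(line)
```

The plus-sign form corresponds to the call with possibly_empty left False, and the asterisk form to the call with it set True; the recursion direction is a separate choice.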

In BNF, lists are usually formed either by left recursive productions such as:

   <LIST> ::= <ITEM> / <LIST> <ITEM>;

or by right recursive productions such as:

   <LIST> ::= <ITEM> / <ITEM> <LIST>

This underlying structure is masked by the simplicity of the TWINKLE constructs but may be made explicit, if desired, by the insertion of the qualifying special word, OPEN, for left recursion or CLOSE, for right recursion, between the right square bracket and the asterisk. If no qualifier is present, left recursion is implied.

The syntax of list structures allowing for separators is shown in figure 1. As indicated, it consists of five portions: head, type, base, separator, and tail. Only the type and base are, necessarily, nonempty but at least one of the head and tail must be empty. The functions of these various components are discussed below.

[Figure 1. The syntax of the list with separator]

2.1.9.1 Head

In the absence of the list tail, the list head determines whether the list is left or right recursive. If it is empty, L, the special word, REDUCED, or the phrase, LEFT RECURSIVE, the list is left recursive. If it is R, EXPANDED, or RIGHT RECURSIVE, the list is right recursive.

2.1.9.2 Type

Lists may be either possibly empty or nonempty, as noted above. It is the function of the list type, which may not be empty, to determine this characteristic. A nonempty list is indicated by the list types: L, LIST, STRING, SEQUENCE, NONEMPTY STRING, NONEMPTY LIST, and NONEMPTY SEQUENCE. For a possibly empty list, the available types are: EL, KLEENE, KLEENE STAR, POSSIBLY EMPTY LIST, POSSIBLY EMPTY STRING, and POSSIBLY EMPTY SEQUENCE. To improve readability, the list type may be followed by the special word, OF. If the list type is either STRING or KLEENE, OF is necessary to avoid syntactic ambiguity.

2.1.9.3 Base

The base may be a single symbol or group of symbols enclosed in square brackets. It must not be a list itself and is considered to be nested one level deeper than the list of which it is the base.
An S may follow the base, in some cases, to indicate its plural character in the grammatical structure of the list. In addition, phrases which may be used as character terminals all have plural forms which may be used profitably here.

2.1.9.4 Separator

The power of the list construct is greatly enhanced by the possibility of specifying a separator. Like the base, the separator may be either a single symbol or a group of symbols enclosed in square brackets. The separator, like the base, is considered to be at the next higher bracket nesting level from that of the list itself. The plural forms valid in the base are also valid in the separator.

The appearance of the separator between successive base items is either required or optional, according to whether the separator type is definite or questionable. Definite separator types are indicated by SEP, SEPARATOR, and SEPARATED BY; questionable separator types are indicated by Q, and by the special word, POSSIBLY, followed by any one of the definite separator types.

2.1.9.5 Tail

In the absence of a list head, the tail determines whether the list takes the left recursive or right recursive form. If it is empty, or the special word, CLOSE, the list is left recursive. If it is the special word, OPEN, the list is right recursive. Note once again that either the tail or the head must be empty in any list structure.

The examples below illustrate the various aspects of the list structure:

    A S SEPARATED BY SEMICOLONS;

    ARITHMETIC EXPRESSION ::= LIST [LIST SEP [#* / #/]] SEP [#+ / #-];

    ::= [ #:] *

    CONSISTS OF A LEFT RECURSIVE LIST OF S POSSIBLY SEPARATED BY SLASHES ENCLOSED IN SQUARE BRACKETS;

2.1.10 Seeded Lists

Any of the list structures, the syntax for which appears in figure 1 above, becomes a seeded list when followed by a list seed. The syntax of the list seed is

    { #STARTING or #BEGINNING } #WITH

where the symbol following WITH may be any TWINKLE symbol, or group of symbols enclosed in square brackets, with the exception of enclosures and maybe symbols.
The seeded list may be used to indicate that the first element of a particular list is distinguished in some way from the rest. For example, the syntax of an ALGOL block may be written:

    A CONSISTS OF A POSSIBLY EMPTY LIST OF S SEPARATED BY SEMICOLONS STARTING WITH A POSSIBLY EMPTY LIST OF S SEPARATED BY SEMICOLONS ENCLOSED IN #BEGIN AND #END

Note that, since the list seed may not be an enclosure, the enclosure operator applies to the entire seeded list and not simply to the list of S.

2.2 Semantic Symbols

The TWINKLE symbols and constructs described up to this point may be used in the syntactic specification of a language. A compiler, however, must be more than simply a recognizer for a language. It must assign appropriate meanings (e.g., in the form of equivalent machine code) to the various syntactic entities that it recognizes from the input stream. These meanings, taken as a whole, make up the semantic description of the language or, more simply, the semantics of the language.

In the TWS, the semantics is written in the Illinois Semantics Language (ISL), a complete description of which is provided by Machado [10]. For the purposes of this discussion, it suffices to consider the ISL semantic description as a number of individual semantic blocks, each of which is associated with a semantic name through which it may be accessed by the parser. Before describing the manner in which the parser is directed to initiate a semantic block, it is necessary to consider the possible results of the execution of a semantic block.

2.2.1 Actions and Tests

Based on their effect on the parser, semantic blocks may be divided into two groups: semantic actions and semantic tests. Actions have no direct effect on the parser and are used primarily for such functions as table creation, code emission, etc. Tests, on the other hand, are actually more of a syntactic character than a strictly semantic character.
Thus, a test is called by the parser when the nature of a particular entity is syntactically undeterminable and requires investigation on a higher level. For example, in an ALGOL assignment statement a boolean variable must be assigned the value of a boolean expression while an integer or real variable must be assigned the value of an arithmetic expression. Since the declaration in which the type of the variable in question was determined may precede its usage by an arbitrary amount, and since an arithmetic expression and a boolean expression may, in general, be identical for arbitrarily many symbols, the parser is unable to determine whether an arithmetic or boolean assignment is being made. The question is resolved by calling a test which compares the variable with tables made when the declarations were recognized. The result of this comparison is communicated to the parser which then is able to proceed correctly with the parse.

Communication between test and parser is by means of the globally declared boolean variable, SEMANTICTEST. If the value of SEMANTICTEST after the execution of the test is true, the parser continues along the indicated branch of the parsing tree. Otherwise the parser abandons this branch and must decide among the remaining branches, possibly invoking further tests in the process. It is important, therefore, that each block which may be called as a test set SEMANTICTEST at some point in its execution. If this is not done, the parser will determine its future course of action from the previous value of SEMANTICTEST, with the attendant possibility of an erroneous parse. Since any block may be called both as an action and as a test at different times during the parse, it is permissible for an action to set SEMANTICTEST, although the variable is disregarded under these circumstances.
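The protocol between test and parser can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the declaration table `SYMBOL_TYPES`, the two test blocks, and the `choose_branch` driver are hypothetical stand-ins, not the TWS parser or any ISL code. Only the role of the global boolean SEMANTICTEST is taken from the text.

```python
SEMANTICTEST = False  # global flag read by the parser after each test

# Hypothetical table built when declarations were recognized.
SYMBOL_TYPES = {"B": "boolean", "I": "integer", "X": "real"}


def is_boolean_variable(name):
    """Semantic test: sets SEMANTICTEST, as every test block must."""
    global SEMANTICTEST
    SEMANTICTEST = SYMBOL_TYPES.get(name) == "boolean"


def is_arithmetic_variable(name):
    """Semantic test for integer or real variables."""
    global SEMANTICTEST
    SEMANTICTEST = SYMBOL_TYPES.get(name) in ("integer", "real")


BRANCHES = [
    ("boolean assignment", is_boolean_variable),
    ("arithmetic assignment", is_arithmetic_variable),
]


def choose_branch(name, branches=BRANCHES):
    """Try each branch in turn; keep the first whose test leaves SEMANTICTEST true."""
    for label, test in branches:
        test(name)
        if SEMANTICTEST:
            return label       # continue along this branch
    return None                # every branch was abandoned


print(choose_branch("B"))
print(choose_branch("X"))
```

A test block that forgot to assign SEMANTICTEST would leave the value from the previous test in place, which is exactly the hazard the text warns about.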
An indication to the parser of the points in the syntactic recognition at which a particular action or test must be invoked is given by placing semantic call symbols at appropriate points in the syntax. These calls may have any of the three forms described below.

2.2.2 Simple Calls

Simple action (test) calls consist of the special symbols, "@S" ("@T"), followed by the name of the action (test) being called. Any string of digits or alphanumeric characters beginning with a letter may be used as a name. A particular semantic block may be called from as many places in the syntax as desired and may be called as a test at one point while being called as an action at another point.

Several forms of the simple call linger from earlier versions of the TWS. Thus, in addition to "@S", an action name may be preceded either by "@Q" or "#", and a test name may be preceded by "#" and enclosed in either quotes or angle brackets. When "#" is used in an action call, the action name must be a string of digits. The reason for this is that the call would appear to be a special word if the action name were to begin with a letter.

2.2.3 Declaration Calls

Any of the simple calls (except the "#" form) may be extended into a declaration call. The action or test name is followed by a colon and then a description of the action or test in ISL code. This description, or declaration, must be enclosed either in square brackets, or in the special words, BEGIN and END. Each name may be declared no more than once in this way, although such a declaration is not necessary. Any name so declared may be used in the syntax in simple calls, both before and after the appearance of the declaration. When the semantic block in question is brief, the overall clarity of the linguistic description may be considerably enhanced by using the declaration call.
2.2.4 Implicit Calls

When a block is of such a nature that a declaration call would be appropriate, and yet is used in only one place, it is clearly not necessary to give a name to that block. Under these circumstances the name and colon in the declaration call may be omitted, thereby creating an implicit call. For each implicit call in the syntax, the TWINKLE translator generates a unique name through which the relevant block of code may be referenced by the ISL translator.

2.2.5 Parameterized Semantic Calls

Semantic calls, with the exception of implicit calls, may be modified by a list of integer constant parameters separated by commas, enclosed in parentheses, and placed immediately following the action or test name. These constants are used by the parser to set a group of global variables (the array row PARAM) that may be referenced by the semantic routine when it is called. This is frequently very useful when a number of portions of the syntax, which would otherwise require different semantic actions, may be serviced by a single, appropriately parameterized, action. For example, in recognizing written out characters, TWINKLE employs a single semantic action whose parameter is the internal code number of the symbol recognized.

2.2.6 Bit Actions and Tests

Frequently, a semantic action involves nothing more than the setting of a single bit. Similarly, a semantic test is frequently based on the condition of such a bit. Calling a semantic block to perform these manipulations requires a disproportionate amount of overhead and it was, therefore, considered appropriate to introduce special action and test types specifically for performing bit operations. The syntax of the bit action is:

    [Syntax diagram: #SET or #RESET, with #BIT and the bit name]

where the bit name is either a number, identifier, or TWINKLE special word. The designated bit is correspondingly either set or reset.
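The two lightweight mechanisms of sections 2.2.5 and 2.2.6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the TWS implementation: the length of `PARAM`, the helper `call_with_parameters`, and the `ActionBits` class with its first-come assignment of bit positions are all hypothetical. The 48-bit width is assumed from the B-5500 word length.

```python
WORD_BITS = 48        # assumed width of the bit word (B-5500 word length)
PARAM = [0] * 8       # assumed length of the global array row PARAM


def call_with_parameters(routine, *constants):
    """Store the integer constants in PARAM, then call the semantic routine."""
    if len(constants) > len(PARAM):
        raise ValueError("too many parameters")
    for i, value in enumerate(constants):
        PARAM[i] = value
    routine()


class ActionBits:
    """One machine word of named bits, set, reset and tested without a block call."""

    def __init__(self):
        self.word = 0
        self.positions = {}   # bit name -> bit position, assigned on first use

    def _position(self, name):
        if name not in self.positions:
            if len(self.positions) >= WORD_BITS:
                raise ValueError("no bits left in the word")
            self.positions[name] = len(self.positions)
        return self.positions[name]

    def set(self, name):
        self.word |= 1 << self._position(name)

    def reset(self, name):
        self.word &= ~(1 << self._position(name))

    def test(self, name, condition="ON"):
        is_on = bool((self.word >> self._position(name)) & 1)
        return is_on if condition == "ON" else not is_on


# One parameterized action serving several characters.
recognized = []
def note_character():
    recognized.append(PARAM[0])

call_with_parameters(note_character, 17)
call_with_parameters(note_character, 42)

bits = ActionBits()
bits.set("INDECLARATION")
print(recognized, bits.test("INDECLARATION"))
```

Keeping the bits in a single integer mirrors the motivation given in the text: a set, reset or test is one masking operation rather than a full semantic block call.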
The syntax for the bit test is:

    [Syntax diagram: a condition, #ON or #OFF, with #BIT, the bit name, and an action]

The test is true if the designated bit is in the condition specified (i.e., ON or OFF). The default condition is ON. If the test is true the indicated action is performed on the designated bit. Up to 48 different bit names may be used by the language designer. These are assigned by the TWINKLE translator to the 48 bits of the variable, ACTIONBITS.

2.2.7 The Tail

In most languages it will not be desirable to declare each block (implicitly or otherwise) directly in the syntax specification. Also, all but the simplest of semantics will require a number of variables and procedures declared globally to the individual semantic blocks. For the sake of completeness, these global declarations and undeclared blocks may be enclosed between the special words, BEGIN and END (thus forming the semantic tail), and appended to the syntax specification. This tail will then be passed directly to the ISL translator upon completion of the TWINKLE translation. In this way a language may be completely processed by the TWS from a single complete specification of the language. During the debugging phase of the language development, it is more natural to process the syntax and semantics separately. The details of coordinating the ISL translator with the rest of TWS in an independent run are discussed by Machado [10].

2.2.8 Placement of Calls

A semantic call may appear anywhere in the syntax that a syntactic symbol may appear, except at the beginning of an alternative. Thus, a semantic call may not appear immediately after the arrow of a production, the left square bracket of a square bracket construct, or the separator ("/" or OR) of a list of alternatives. The reason for this is that a semantic call cannot be made by the parser until it has determined exactly what stage the parse has reached. Clearly the parser cannot, in general, determine this at the beginning of an alternative.
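The placement restriction just stated can be checked mechanically. The sketch below is a hypothetical illustration in Python: it assumes the production has already been split into tokens, that the arrow is written "::=", and that only the "@S" and "@T" call forms are used. None of these function names come from the TWS.

```python
# Tokens after which a new alternative begins: the production arrow,
# a left square bracket, and the alternative separators "/" and OR.
ALTERNATIVE_STARTERS = {"::=", "[", "/", "OR"}


def is_semantic_call(token):
    """Recognize only the "@S" (action) and "@T" (test) call forms."""
    return token.startswith("@S") or token.startswith("@T")


def misplaced_calls(tokens):
    """Return the positions of semantic calls that begin an alternative."""
    bad = []
    for i, token in enumerate(tokens):
        if not is_semantic_call(token):
            continue
        # A call in first position, or right after a starter, is illegal.
        if i == 0 or tokens[i - 1] in ALTERNATIVE_STARTERS:
            bad.append(i)
    return bad


production = ["<A>", "::=", "<B>", "@S1", "/", "@T2", "<C>"]
print(misplaced_calls(production))
```

Here "@S1" is legal because it follows the symbol <B>, while "@T2" is reported because it immediately follows the separator "/".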
Unfortunately, there is a good deal more to placing semantic calls than simply knowing where they will be legal. The ideal time to place them would be after the FPL form (see chapter 3 and the paper by Beals [3]) had been generated. The condition of the stack and the phase of the parse would then be known explicitly. It is, however, fairly straightforward to place them directly into BNF as was done in earlier versions of the TWS. The problems present in TWINKLE, with respect to call placement, arise chiefly from the complex structures (lists, enclosures, etc.) available and from the large amount of grammar transformation that is inherent in TWINKLE translation. Thus, virtually all of the TWINKLE constructs not present in BNF employ some form of TWINKLE generated nonterminals in their implementation. Because of this, the configuration of the stack at the moment of a semantic call may not be easy to determine. The following guidelines will be helpful in creating and placing semantic calls to achieve a given end:

1. A semantic call is made following the recognition of the symbol or construct at the same bracket nesting level immediately preceding its occurrence in the syntax. For example, if one writes

       ::= LIST SEP @S1 ,

   semantic action "1" will be called after the entire list has been recognized and not after each separator. The latter effect may be achieved by

       ::= LIST SEP [ @S1]

2. A semantic routine should not reference the stack for symbols at the same nesting level as its call, provided that the call and the symbol are separated by one of the non-BNF TWINKLE constructs. For example, in the TWINKLE production,

       ::= LIST @S A ,

   the semantic action, "A", may reliably reference the nonterminals, and , but may not reference the nonterminals, and , from which its call is separated by the list construct.

3. A semantic routine should not reference symbols which occur at different nesting levels from its call.
The only exception to this rule is the case in which a semantic call immediately follows the right square bracket of a square bracket construct. The call is then, in essence, copied onto the end of each of the alternatives within the square brackets.

A detailed example showing the placement of semantic calls in a TWINKLE grammar for a subset of ALGOL is provided by Machado [10].

2.3 Null and Empty Symbols

Several forms of context analysis which are performed automatically by the TWS must be provided by the user under the TBNF system. The three special words (BACK, AHEAD, and NOT), which are used in TBNF to provide contextual information, have no meaning to the BNF half of the TWS. Therefore, when translating into BNF, these special words and the constructs that they herald are meaningless to TWINKLE and are referred to as null symbols.

In addition to these null symbols, TWINKLE provides two forms of comments which are meaningless and therefore qualify as null symbols. Any string of symbols enclosed in parentheses, the left-most parenthesis of which is not preceded by either a sharp or a semantic call, constitutes a comment and is deleted by the scanner. Any string of symbols preceded by the special words, COMMENT or C, and not including the symbols [, ], ;, ., or the special words, BEGIN and END, also constitutes a comment.

An empty symbol denotes a string of zero length. It is written as either one of the special words, EMPTY or LAMBDA, or as an adjacent pair of left and right nonterminal parentheses (i.e., < > or " ").

3. CONTROL OF THE TWINKLE TRANSLATION

The TWINKLE translation is merely the first step in a chain of operations undertaken in generating a compiler for a language. Control options specified in the TWINKLE input may be intended for use in a later phase of the TWS. To make the meaning of these options clear, a brief description of the entire TWS is now given.
3.1 The Translator Writing System

Figure 2 presents a block diagram of the interrelations between the programs which make up the Translator Writing System when creating a compiler for a language, L. As indicated, the TWS can generate either a recursive descent compiler or a deterministic Floyd production compiler, the decision being made by the user through the PARSER control card. Consideration will be given first to the Floyd production section of the TWS, which comprises the TWINKLE translator, the ISL translator (ISLTRAN), BNF2FPL, FPL2PAR, PAR2ALG and, finally, the ALGOL compiler.

A unified syntactic and semantic description of L is provided as input to the TWINKLE translator. The translator extracts the syntactic information, which it transforms into BNF and places, together with several other tables, in a disk file labelled L/TABLESF. Similarly, the semantic portion of the input is placed in file, L/ACTIONS, for use by ISLTRAN. The TWINKLE translator then initiates execution of both BNF2FPL and ISLTRAN.

The BNF syntax of L is transformed by BNF2FPL into Floyd productions (FPL) which are placed in L/FLOYDP while additional tables are placed in L/TABLESF and the information in the first record of L/TABLESF is updated. BNF2FPL initiates execution of FPL2PAR, which transforms the FPL syntax from L/FLOYDP into a stream of pseudo-orders which are returned to L/TABLESF. Control information in the first record is updated.

[Figure 2. Block diagram of the Translator Writing System]

; where is any string of letters and digits beginning with a letter. These characters, or the first seven if there are more, are used as a prefix for all the interlinking and output files generated by the TWS, including the TWINKLE translator.
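The prefix rule can be illustrated with a short Python sketch. The function names are hypothetical and the validation is only an approximation of "letters and digits beginning with a letter"; the four file names are those that appear in this chapter (TABLESF, ACTIONS, FLOYDP and DISK).

```python
def file_prefix(language_name):
    """Return the file prefix: the name itself, or its first seven characters."""
    if (not language_name
            or not language_name[0].isalpha()
            or not language_name.isalnum()):
        raise ValueError("name must be letters and digits beginning with a letter")
    return language_name[:7]


def tws_files(language_name):
    """Map each file suffix mentioned in the text to its prefixed name."""
    prefix = file_prefix(language_name)
    suffixes = ("TABLESF", "ACTIONS", "FLOYDP", "DISK")
    return {suffix: prefix + "/" + suffix for suffix in suffixes}


print(file_prefix("ALGOL60SUBSET"))
print(tws_files("PL1")["TABLESF"])
```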
3.2.2 Print Options

The print control statement consists of the special word, PRINT, followed by a colon, followed by a list of print options separated by commas, followed by a semicolon. The options available are defined below.

1. TABLES SIFESF: causes the printing of a table displaying the sizes and locations of all of the tables in the file, TABLESF.

2. TERMINALS ALPHABETICALLY, or TRMALF: causes the printing of an alphabetical list of all of the special words used in the language. If BNF is being generated then the list includes an index of each occurrence of the special words in PROTAB.

3. TERMINALS NUMERICALLY, or TRMNUM: causes the printing of a numerically ordered list of all of the special words used in the language.

4. CHARACTERS, or TRMCHR: causes the printing of an index of the occurrences of the 64 characters in PROTAB.

5. TERMINALS: is equivalent to 2, 3, and 4 taken together.

6. NONTERMINALS ALPHABETICALLY, or NTALF: causes the printing of an alphabetical list of all of the nonterminals used in the language. If BNF is generated, the list includes an index of each occurrence of the nonterminals in PROTAB.

7. NONTERMINALS NUMERICALLY, or NTNUM: causes the printing of a numerically ordered list of all of the nonterminals used in the language.

8. NONTERMINALS: is equivalent to 6 and 7 taken together.

9. SYNTAX, or INPUT: causes the printing of the TWINKLE input as it is read.

10. INDEX, or XREF, or CROSS REFERENCE: causes the printing of an index of occurrences of all nonterminals, terminals, and actions in the syntax by card number.

11. ACTIONS ALPHABETICALLY, or ACTALF: causes the printing of an alphabetical list of all of the semantic actions and tests used in the language. If BNF is generated, the list includes an index of each occurrence of the actions and tests in PROTAB.

12. ACTIONS NUMERICALLY, or ACTNUM: causes the printing of a numerically ordered list of all of the semantic actions and tests used in the language.

13.
ACTIONS: is equivalent to 11 and 12 taken together.

14. PROTAB: is the name of the table into which TWINKLE places the BNF equivalent of TWINKLE syntax in the input. This option causes the printing of this table.

15. FLOYD: BNF2FPL transforms PROTAB into a set of Floyd productions in the disk file, L/FLOYDP. This option causes the printing of these Floyd productions.

16. COMBINED GROUPS: causes the printing of the components of all of the combined groups required by the language. For a discussion on the use of combined groups, see the paper by Beals [3].

17. PARSER: FPL2PAR transforms the Floyd productions in FLOYDP into a stream of pseudo-orders which make up the parser. This option causes the printing of this stream of pseudo-orders.

18. PATTERNS: causes the printing of a table of the patterns created in the TWINKLE translator processing of L, as well as any additional patterns created by either BNF2FPL or FPL2PAR.

19. STANDARD: is the union of options 1, 5, 8, 9, 10, 13, 14, and 18.

20. DEBUGGING, or DEBUGN: is the union of options 15, 16 and 19, i.e., of everything but option 17.

21. EVERYTHING: this is the union of all options.

22. NOTHING: this option, when used by itself, inhibits all printing.

If no print control statement appears, the print options are set to the default option, STANDARD.

3.2.3 The Parser Type Option

As mentioned several times above, the TWS is equipped to produce compilers based on either recursive descent or Floyd production language parsers. It is appropriate, therefore, to have a control statement for determining which is to be generated. The relevant control statement is:

    PARSER;

where is either RECURSIVE DESCENT or FLOYD PRODUCTION.

3.2.4 Zip Control

In Burroughs B-5500 ALGOL it is possible for one ALGOL program to initiate execution of another by executing a zip statement (i.e., by zipping to the other program). The component programs of the TWS use the zip statement to initiate their successors.
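The composite print options of section 3.2.2 form a small hierarchy of unions, which can be sketched in Python as follows. The table and function names are hypothetical, and the sketch takes option 14 (PROTAB) to be a member of STANDARD. The final check confirms the statement that DEBUGGING amounts to everything but option 17.

```python
# Composite options, keyed by option number, with the options they combine.
COMPOSITE = {
    5: {2, 3, 4},                        # TERMINALS
    8: {6, 7},                           # NONTERMINALS
    13: {11, 12},                        # ACTIONS
    19: {1, 5, 8, 9, 10, 13, 14, 18},    # STANDARD
    20: {15, 16, 19},                    # DEBUGGING
    22: set(),                           # NOTHING
}

# Options that print something themselves (1 to 18 less the composites).
BASIC = set(range(1, 19)) - {5, 8, 13}
COMPOSITE[21] = set(BASIC)               # EVERYTHING


def expand(option):
    """Reduce an option number to the set of basic options it selects."""
    if option not in COMPOSITE:
        return {option}
    selected = set()
    for part in COMPOSITE[option]:
        selected |= expand(part)
    return selected


print(sorted(expand(19)))
print(expand(20) == BASIC - {17})
```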
In normal operation zipping continues through final compilation by the ALGOL compiler. Frequently, a user does not desire execution of the entire TWS but may wish, for example, to check just the syntax, or just the semantics, of the input. This possibility is allowed for in the TWS by the zip control statements which are listed below:

    ZIP TO ISL;
    DONT ZIP TO ISL;
    DONT ZIP;
    ZIP THROUGH ;

where ::= TWST/BNF2FPL/FPL2PAR/PAR2ALG/ISL/ALGOL. The use of these control statements is self-evident.

3.2.5 Program Parameter Control

Each of the programs in the TWS has certain program parameters which are normally assigned default values that permit compiler generation for many small languages. It is possible, however, that a particular language may require more execution time, a larger stacksize, or a higher B-5500 core memory estimate to run successfully through some phase of the TWS. Correspondingly, it may be desirable when processing some smaller languages to decrease the values of some of these program parameters. This can be done with the three program parameter control statements shown below:

    PRIORITY = <*N>;
    CORE = <*N>;
    STACK = <*N>;

where was defined in the last section and <*N> is a positive integer. These set the priority (and, implicitly, the time limit), the core estimate, and the stacksize, respectively, of the program designated. These parameters are then used in zipping to the program. If the is COMPILER, the parameters are passed to the ALGOL compiler and become the default parameters for the generated language compiler.

3.2.6 Executable Compiler Options

It was noted in section 3.1 that a Floyd production parser generated by the TWS may be, to a greater or a lesser extent, an executable parser. The default option is a parser which is wholly interpretive, but an executable version of any of the three parser sections may be requested by use of the control statements shown below:

    EXECUTABLE LOOKAHEAD;
    EXECUTABLE FILL TABLES;
    EXECUTABLE FLOYD PRODUCTIONS;
If the lookahead and fill tables portions of the parser are interpretive, the resultant compiler, L/DISK, may only be executed if L/TABLESF is resident on disk. By making these two portions executable, the parser becomes a self-contained unit and compilation in L requires only L/DISK.

3.2.7 Miscellaneous Control Options

CLOSE, CLOSE LP, CLOSE LINEPRINTER, or CLOSE LINE PRINTER: applies to BNF2FPL; it causes a separate file of output to be created each time an error occurs during execution of BNF2FPL. In this way a user can ascertain the cause of some errors before BNF2FPL runs to completion.

LONG LOOKAHEAD: applies to BNF2FPL; it specifies a four symbol lookahead to be used in differentiating before deciding that the group cannot be built. If the Floyd productions of a group being generated cannot be differentiated by a three symbol lookahead and if combination is not possible, the group is not normally built and the BNF2FPL translation fails. In practice it has been found that when a lookahead of three symbols fails, no additional amount of lookahead will help.

COMBINE FIRST: applies to BNF2FPL; it specifies that Floyd production combination be attempted after a one symbol lookahead has failed to differentiate, but before attempting a two or three symbol lookahead; if combination is not possible, two and three symbol lookaheads will be attempted before abandoning the group.

FLOYD PRODUCTIONS PER PROCEDURE: <*N>: applies to PAR2ALG. When creating an executable parser, PAR2ALG generates procedures, each containing some specified number of the Floyd productions of the language. This number is typically 100, but may be set by the language designer to any desired value.

GROUPS PER PROCEDURE: <*N>: applies to PAR2ALG; determines the number of groups of Floyd productions in each executable parser procedure.
PROGRAM SYMBOL: : is followed by a nonterminal name, say , which is taken to be the unique objective symbol of the language in question; if this option is not used, the first nonterminal to appear in the syntax specification is taken as the unique objective symbol for the language.

SPECIAL SYMBOLS: : may be used to force a particular ordering of the special words of the language which are otherwise numbered in the order in which they first appear in the syntax.

3.3 Burroughs B-5500 Control Cards for Executing TWINKLE

The TWINKLE translator is executed like a compiler on the B-5500 system. When the syntax to be translated is on cards, the following deck set up may be used:

    ? USER = Language designer's user code
    ? COMPILE A/B WITH TWINKLE LIBRARY
    ? DATA CARD
    input syntax
    ? END.

Since TWINKLE does not create executable code, the file, A/B, is not used, and the name may be specified arbitrarily by the language designer. Because this file is not used, either of the following forms may be used when the input syntax is a file on disk, say PL1/SYNTAX:

    ? USER = Language designer's user code
    ? COMPILE A/B WITH TWINKLE LIBRARY
    ? TWINKLE FILE CARD = PL1/SYNTAX SERIAL
    ? END.

or

    ? USER = Language designer's user code
    ? COMPILE PL1/SYNTAX WITH TWINKLE LIBRARY
    ? END.

In the latter case, TWINKLE discovers that the input is not on cards and that no file has been equated to file, CARD. It then investigates the code file and, if it exists on disk, takes it as the file, CARD. In the former case a file has been equated to file, CARD, so this is taken as the input syntax. In this case, the code file, A/B, is not used and may be named arbitrarily.

4. IMPLEMENTATION OF THE TWINKLE TRANSLATOR

The TWINKLE translator has been implemented with the TWS in a bootstrapping fashion.
The preliminary version of the translator was written in BNF and processed on the portion of the TWS then existing, which was essentially equivalent to the BNF2FPL, FPL2PAR and PAR2ALG stages of the current TWS. Each subsequent revision to the TWINKLE translator was implemented with the aid of its predecessor. Thus, although the present syntax is much more sophisticated than the initial syntax, it is also shorter and considerably more readable. The following sections detail some of the salient features of the TWINKLE translator.

4.1 NONTAB, SYMTAB and OPRTAB

As each nonterminal is read from the input syntax, its name is compared against all those presently entered in NONTAB. If a match is found, the corresponding nonterminal number is extracted from the relevant field of the header word for the matching table entry. If no match is discovered, a new entry is made. The entries are linked through the header words in a binary tree which is alphabetically ordered by the nonterminal names. The format of the entries is shown below in figure 3. Associated with each new entry into NONTAB is an entry into NTINDX pointing to the header word of the nonterminal in NONTAB, which facilitates printing out the nonterminal names when necessary. Also, if the CROSS REFERENCE print option has been activated by the user, the NTINDX word for a given nonterminal contains a pointer to the base of an inter-linked list of the occurrences of that nonterminal in the input syntax by line number. The actual repository for this list, as well as those for the other nonterminals, terminals, and action symbols from the input syntax, is an array called OVERALLINDEX, each of whose entries contains a line number, a bit showing whether the specified occurrence was on the right or left hand side of a production, and a pointer to the entry for the next occurrence of the item in whose list the entry resides.
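The lookup-or-enter behaviour of NONTAB and its companion index NTINDX can be sketched in Python as follows. This is a simplified model under stated assumptions: entries are Python dictionaries rather than packed B-5500 words, the class and method names are hypothetical, nonterminal numbers are assumed to be assigned in order of first appearance starting from zero, and Python string comparison stands in for the machine's collating sequence.

```python
class NonterminalTable:
    """Binary tree of names (like NONTAB) plus a number-to-entry index (like NTINDX)."""

    def __init__(self):
        self.entries = []   # each entry: name, number, left and right links
        self.ntindx = []    # ntindx[number] is the position of the entry

    def _new_entry(self, name):
        number = len(self.ntindx)
        self.entries.append({"name": name, "number": number,
                             "left": None, "right": None})
        self.ntindx.append(len(self.entries) - 1)
        return number

    def lookup(self, name):
        """Return the number for name, entering it in the tree if it is new."""
        if not self.entries:
            return self._new_entry(name)
        i = 0
        while True:
            entry = self.entries[i]
            if name == entry["name"]:
                return entry["number"]
            side = "left" if name < entry["name"] else "right"
            if entry[side] is None:
                number = self._new_entry(name)
                entry[side] = self.ntindx[number]
                return number
            i = entry[side]

    def name_of(self, number):
        """Recover a name from its number through the index."""
        return self.entries[self.ntindx[number]]["name"]

    def alphabetical(self, i=0):
        """In-order walk of the tree: the names in alphabetical order."""
        if not self.entries or i is None:
            return []
        entry = self.entries[i]
        return (self.alphabetical(entry["left"]) + [entry["name"]]
                + self.alphabetical(entry["right"]))


table = NonterminalTable()
numbers = [table.lookup(n) for n in ["STATEMENT", "BLOCK", "TERM", "BLOCK"]]
print(numbers, table.alphabetical())
```

The index gives direct access by number for printing, while the tree gives ordered access by name, which is the division of labour the text describes.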
[Figure 3. An entry in the NONTAB table]

Figure 3 shows the details of a single entry in NONTAB. Consider, first, the header word. Nonterminals (with the exception of the unique objective symbol) used in the syntactic input must appear on the left hand side of at least one production and on the right hand side of at least one (not necessarily different) production. The INLHS bit is set on recognizing the nonterminal as the left hand side of a production and the INRHS bit is set on recognizing it in the right hand side of a production. These bits are checked at the conclusion of syntax input and any discrepancies are reported as errors on the TWINKLE output file, LINE. The SYMBOLVALUE field contains the serial number (code) of the nonterminal symbol represented by this NONTAB entry. Given a nonterminal name of k characters, n of the WORDS field and m of the CHARS field are given by n = [k/l6] and m = k - 6* (n+l). These two fields, taken together, determine the extent of useful information in the remaining words of the NONTAB entry. Finally, the LEFTPOINTER and RIGHTPOINTER fields contain pointers to subsequent entries in the alphabetic binary tree which NONTAB comprises. The remaining words in the NONTAB entry consist of n words containing 6 characters each of the nonterminal name right justified with 2 unused characters at the left, and, in the last word, 2 unused characters, m characters from the nonterminal name, and blanks filled to the right.

Corresponding to NONTAB and NTINDX for nonterminal storage are the pairs of tables (SYMTAB, STINDX), and (OPRTAB, OTINDX) for storing special symbols and semantic symbols, respectively. As mentioned above, the line by line index information for both these types of symbols is stored in OVERALLINDEX along with that for nonterminals.
While STINDX and OTINDX are identical counterparts to NTINDX, entries in SYMTAB and OPRTAB differ slightly from those in NONTAB and, in fact, from one another. In the case of SYMTAB, there is, clearly, no need for the INLHS and INRHS bits since a special symbol, if it appears at all, must appear on the right-hand side of a production. Consequently, in the header words for SYMTAB, these bits are included as a portion of the SYMBOLVALUE field. In OPRTAB header words it is also clear that the INLHS and INRHS bits are unnecessary, but here the first bit is unused and the second bit becomes the USED bit denoting an action or test symbol that has been declared in the syntax. This bit is read by ISL/DISK to determine which actions it must get from the file, L/ACTIONS. It is also used by TWINKLE to catch duplicate declarations of the same semantic name.

4.2 PROTAB, PRODS and PDLIST

The primary table into which TWINKLE collects the BNF productions which it produces from its TWINKLE input is PROTAB. Figure 4 shows the fields of a PROTAB word. The FLAGS field comprises a set of six one bit flags carrying various pieces of information.

[Figure 4. PROTAB word format, showing the FLAGS, NEXT, LHS, TYPE and ENTRY fields; TYPE and ENTRY together form the SYMBOL field]

These flags are referred to as: IREC, NOBACK, TRMDER, LASTNT, SFLAG and REC. The IREC flag is set only in the first word of a production and denotes an indirectly left recursive production. That is, a production of the form: ::= a for which is a headsymbol of . When the NOBACK flag is set, the symbol in the SYMBOL field is not to be back-substituted into this PROTAB location. If the symbol in the SYMBOL field has a terminal derivation, then the TRMDER flag is set. The LASTNT flag is set when the symbol in the SYMBOL field is a nonterminal and, except for possible trailing semantic symbols, is the last symbol in the production. The SFLAG flag heralds a semantic symbol as the next symbol in the production.
Finally, the REC flag is set in the first symbol of a production if the symbols in the LHS and SYMBOL fields are the same (i.e., if the production is left recursive). The remaining fields are: the NEXT field, which gives the number of words to the beginning of the next production; the LHS field, which contains the number of the nonterminal being defined; and the TYPE and ENTRY fields of this particular right-hand side symbol.

TWINKLE performs a number of grammar transformations on a local level. Frequently more than one production is being built simultaneously, as happens when nested definitions are being translated or when lists are being implemented. These difficulties make it very cumbersome for TWINKLE to put productions directly into PROTAB. To circumvent these problems TWINKLE uses a 256-entry table (PRODS) as a directory and status table for 255 thirty-two-symbol productions which are stored in the PDLIST array. The words of PRODS form a list linked forward through the NEXTPD fields and backward through the LASTPD fields. The first entry of PRODS acts as a base for the productions currently in use. The base of available productions is given by the integer variable, FIRSTPDAVAIL. To manipulate these two structures, TWINKLE uses two procedures: GETPROD and GIVEUP. GETPROD is an integer-typed procedure which has no arguments and, when called, returns the address of the next available element of PRODS after incorporating it into the link structure and making the necessary modifications to the various list pointers. GIVEUP has as its sole argument the address of an element of PRODS to be removed from the link structure and returned to the available pool. The PDLIST symbols corresponding to PRODS(N) are PDLIST(32 x N) through PDLIST(32 x N + 31).
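The interplay of GETPROD, GIVEUP and FIRSTPDAVAIL can be sketched as follows. This is only an illustration under stated assumptions: the class `ProdPool`, the use of index 0 both as the base of the in-use list and as the end-of-chain marker, and the helper `pdlist_range` are hypothetical, and the original ALGOL procedures may organize the links differently.

```python
class ProdPool:
    """Hypothetical model of the PRODS link structure (not the original code)."""

    def __init__(self, size: int = 256):
        self.next_pd = [0] * size    # forward links (NEXTPD)
        self.last_pd = [0] * size    # backward links (LASTPD)
        # Entry 0 is the base of the in-use list; initially it links to itself.
        # Entries 1..size-1 form the available chain; 0 marks its end.
        for i in range(1, size - 1):
            self.next_pd[i] = i + 1
        self.first_pd_avail = 1      # FIRSTPDAVAIL

    def getprod(self) -> int:
        """Take the next available entry and append it to the in-use list."""
        n = self.first_pd_avail
        if n == 0:
            raise RuntimeError("no productions available")
        self.first_pd_avail = self.next_pd[n]
        tail = self.last_pd[0]       # current last in-use entry (or the base)
        self.next_pd[tail] = n
        self.last_pd[n] = tail
        self.next_pd[n] = 0
        self.last_pd[0] = n
        return n

    def giveup(self, n: int) -> None:
        """Unlink entry n from the in-use list and return it to the pool."""
        before, after = self.last_pd[n], self.next_pd[n]
        self.next_pd[before] = after
        self.last_pd[after] = before
        self.next_pd[n] = self.first_pd_avail
        self.first_pd_avail = n

    def in_use(self):
        """List the in-use entries in link order."""
        out, i = [], self.next_pd[0]
        while i != 0:
            out.append(i)
            i = self.next_pd[i]
        return out


def pdlist_range(n: int) -> range:
    """PDLIST indices holding the symbols of PRODS(n): 32*n through 32*n + 31."""
    return range(32 * n, 32 * n + 32)
```

A doubly linked in-use list makes GIVEUP a constant-time unlink, which suits a translator that abandons partially built productions frequently.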
In addition to serving as the link structure for the productions in PDLIST, each entry of PRODS contains the following information about the production with which it is associated:

(i) COMPLETE: a flag which indicates that the associated production is no longer being extended;
(ii) LEVEL: an eight-bit field recording the level of bracket nesting at which the production originated;
(iii) LAMDA: a flag which indicates whether the production has participated in empty context absorption;
(iv) NEXTSYMBOL: a five-bit field containing a count of the symbols currently in the production;
(v) DORNO: a three-bit field which identifies whether the left-hand side of the production is a simple nonterminal or one of the several TWINKLE-generated nonterminal types;
(vi) LHS: a twelve-bit field containing the nonterminal number of the left-hand side of the production.

Each word of PDLIST contains only the left-hand side of the production, in the fields LHSDORNO and LHS, and one symbol from the right-hand side, in the fields DORNO, TYPE, and ENTRY. The remaining information necessary in PROTAB is not filled in until a production actually enters PROTAB.

Productions which are being created in PDLIST may be extended by calling the procedure, ADDON. ADDON has two arguments: the first is the number of the production to be extended; the second is the symbol by which it is to be extended. If the symbol to be added is a TWINKLE-generated nonterminal representing the alternatives of a nested definition set, the procedure for adding it to a production is somewhat complex. Each of the alternatives must be added on individually and, if more than one alternative is present, new productions must be created.
Thus, for example, if α represents the string of symbols in the production being extended, and β1 through βn represent the strings of symbols in the alternatives of the symbol being added on, ADDON will replace the production:

<...> ::= α

by the n productions:

<...> ::= α β1
<...> ::= α β2
...
<...> ::= α βn

At the same time ADDON removes the n productions:

<...> ::= β1
<...> ::= β2
...
<...> ::= βn

whose common left-hand side is the TWINKLE-generated nonterminal referred to above. By expanding nested definitions in this way, TWINKLE ensures that each of the productions which it creates retains all of the context that the language designer specified in the original TWINKLE production. If the symbol being added on is any other type of symbol, say the symbol a, ADDON replaces the production:

<...> ::= α

by the production:

<...> ::= α a

where α is as above.

When it has been ascertained that a production, P, is completed and is ready to be put into PROTAB, the procedure, PUTINPROTAB, is called with the parameter, P. PUTINPROTAB fills in the NEXT and FLAGS fields of the production and writes it directly into the next available locations in PROTAB. Once this has been accomplished the production is returned to the free pool.

4.3 PRODSTACK and LPSTACK

The TWINKLE language offers the user only two recursive constructs. These are nested definitions and list structures, which are implemented with PRODSTACK and LPSTACK, respectively. These are dimensioned to allow nesting of either definitions or lists to a depth of thirty, but this may, of course, be easily altered in the unlikely event that it needs to be. The formats of words in these two stacks are shown below (in figures 5 and 6, respectively).

Figure 5. The format of a PRODSTACK entry (fields SYMBOL and EXSYMBOL).

Figure 6. The format of a LPSTACK entry (fields SEPDORNO, SEP, LISTTYPE, LBDORNO, SEPTYPE, SEPENTRY, LBTYPE, LBENTRY; composite fields SEPSYMBOL, LBSYMBOL, EXSEPSYMBOL, EXLBSYMBOL).
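The two cases of ADDON described in section 4.2 can be sketched as below. The representation is an assumption made for the illustration: productions are held as (left-hand side, right-hand side list) pairs in a Python list rather than in PRODS and PDLIST, and the names `addon` and `generated` are hypothetical.

```python
def addon(prods, index, symbol, generated=frozenset()):
    """Extend production prods[index] by symbol.

    prods     -- list of (lhs, rhs) pairs, rhs being a list of symbols
    generated -- names of TWINKLE-generated nonterminals that stand for the
                 alternatives of a nested definition set (assumed bookkeeping)
    """
    lhs, alpha = prods[index]
    if symbol not in generated:
        # Ordinary symbol a: replace lhs ::= alpha by lhs ::= alpha a.
        prods[index] = (lhs, alpha + [symbol])
        return
    # Generated nonterminal: collect its alternatives beta_1 .. beta_n ...
    alternatives = [rhs for (left, rhs) in prods if left == symbol]
    # ... remove them together with the production being extended ...
    kept = [p for i, p in enumerate(prods) if i != index and p[0] != symbol]
    # ... and add lhs ::= alpha beta_i for every alternative.
    prods[:] = kept + [(lhs, alpha + beta) for beta in alternatives]


prods = [("A", ["a"]), ("D1", ["x"]), ("D1", ["y", "z"])]
addon(prods, 0, "D1", generated={"D1"})
print(prods)
```

Note that in this sketch the positions of the productions change after an expansion, whereas the original keeps track of them through the PRODS link structure.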
PRODSTACK is simply a stack of single symbols, the top symbol being the left-hand side of the set of productions currently being built. Whenever the left-hand side of a TWINKLE production is encountered, the left-hand side nonterminal symbol is pushed into PRODSTACK. Similarly, when the left square bracket of a square brackets construct is encountered, a TWINKLE-generated temporary dummy (the DORNO field is set to one) is created and pushed into PRODSTACK. Right square brackets and semicolons, which end square bracket constructs and productions, respectively, cause the top of PRODSTACK to be popped.

PRODSTACK is used in assigning names to TWINKLE-generated permanent dummy nonterminals in the following manner. When such a dummy is required (e.g., in the generation of lists, see below), PRODSTACK is searched downward from the top for a natural nonterminal (which is identified by a zero in the DORNO field). There must always be such an entry because PRODSTACK always extends to the beginning of some TWINKLE production, which must start with a natural nonterminal. The alphanumeric characters which make up the name of the nonterminal are obtained from NONTAB. The desired name is then created by appending to these characters a blank, the characters "DUMMY", another blank, and finally a number unique to this nonterminal. Since blanks cannot appear in natural nonterminal names, these serve to ensure that no duplication of nonterminal names can arise by this procedure. As an example of this, the list structure in the TWINKLE production:

<PROGRAM> CONSISTS OF A LIST OF <...>S ;

would be implemented with a permanent dummy nonterminal named PROGRAM DUMMY 1.

Each entry of LPSTACK eventually contains all the information necessary to construct the BNF equivalent productions for the list structure which generated it.
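The naming of permanent dummies from PRODSTACK can be sketched in a few lines. The stack representation as (name, DORNO) pairs and the function name `permanent_dummy_name` are assumptions for the illustration; in the original the name characters are fetched from NONTAB.

```python
def permanent_dummy_name(prodstack, number):
    """Build the name of a permanent dummy from the nearest natural nonterminal.

    prodstack -- list of (name, dorno) pairs, the last element being the top
    number    -- a number unique to this dummy (assumed to be supplied by the caller)
    """
    # Search downward from the top for a natural nonterminal (DORNO == 0).
    for name, dorno in reversed(prodstack):
        if dorno == 0:
            # Blanks cannot occur in natural names, so the result cannot clash.
            return f"{name} DUMMY {number}"
    raise ValueError("PRODSTACK holds no natural nonterminal")


# A production for PROGRAM with one open square bracket (temporary dummy on top).
stack = [("PROGRAM", 0), ("T1", 1)]
print(permanent_dummy_name(stack, 1))
```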
Whenever the list type of a list structure is encountered, a new word is pushed into LPSTACK with an appropriate setting of the LISTTYPE bit (i.e., 1 for a nonempty list and 0 for a possibly empty list). When a list base is recognized, the subfields of the EXLBSYMBOL field are set in the top word of LPSTACK to identify the base. Similarly, recognition of a list separator causes the subfields of the EXSEPSYMBOL field to be filled in the top word of LPSTACK; the SEP bit is set to zero for a definite separator and to one for a possibly empty separator. If the list does not have a separator, the EXSEPSYMBOL field is set to indicate an empty separator and the SEP bit is set to zero. The LPSTACK entry does not reflect the type of recursion desired for the list, i.e., left recursive or right recursive. However, this is determined syntactically and is transmitted to the semantics via the choice of the action called.

Given that EXSEPSYMBOL contains the symbol, S, and that EXLBSYMBOL contains the symbol, B, figure 7 shows the productions which are generated to implement the list for various choices of SEP and LISTTYPE, in terms of the TWINKLE-generated permanent dummy which implements the list structure and a TWINKLE-generated temporary dummy. Recall that a SEP bit of one designates a definite separator and a zero bit designates a possibly empty separator; whereas, a LISTTYPE bit of one indicates a nonempty list and a zero bit indicates a possibly empty list.

Figure 7. Left recursive and right recursive lists: (a) Left, (b) Right. For each combination of LISTTYPE and SEP, the figure marks (yes/no) which productions are generated.

Note that possibly empty lists are characterized by the temporary dummy nonterminal; whereas, nonempty lists are characterized directly by the list-implementing permanent dummy nonterminal.
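Since the table of figure 7 is not reproduced legibly here, the following is only a sketch of one conventional BNF expansion of list structures, not a transcription of the figure. The names `<L>` (permanent dummy) and `<T>` (temporary dummy), the function `list_productions`, and its boolean parameters (used instead of the SEP and LISTTYPE bit encodings) are all assumptions.

```python
def list_productions(base, sep, nonempty, sep_optional, left=True,
                     perm="<L>", temp="<T>"):
    """Return (lhs, rhs) pairs implementing a list of `base` items.

    sep          -- separator symbol, or None when the list has no separator
    nonempty     -- True for a nonempty list, False for a possibly empty one
    sep_optional -- True when the separator may be omitted between items
    left         -- True for left recursion, False for right recursion
    """
    def recursive(middle):
        # Left:  <L> ::= <L> middle B      Right:  <L> ::= B middle <L>
        return [perm] + middle + [base] if left else [base] + middle + [perm]

    prods = []
    if not nonempty:
        # A possibly empty list is reached through the temporary dummy.
        prods.append((temp, [perm]))
        prods.append((temp, []))          # empty right-hand side
    if sep is not None:
        prods.append((perm, recursive([sep])))
    if sep is None or sep_optional:
        prods.append((perm, recursive([])))
    prods.append((perm, [base]))
    return prods


for lhs, rhs in list_productions("B", "S", nonempty=False, sep_optional=True):
    print(lhs, "::=", " ".join(rhs) or "< >")
```

Passing `left=False` mirrors the recursive productions, which corresponds to the right recursive variant.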
4.4 Any Patterns

An any pattern comprises 384 bits (eight words of 48 bits each) for which there exists a one-to-one correspondence between the lower numbered bits and the terminal symbols of the language. Bits zero through 63 correspond to the 64 characters of the Burroughs B5500 system; bits 66 through 85 correspond to the possible special terminal classes; bits 86 and above correspond to the special words of the language. Bits 64 and 65 are never set in an any pattern because they correspond to a terminating character and an illegal character, respectively. Any patterns are stored end-to-end in a 512-word array, ANYPAT. Each bit that is set in an any pattern indicates that the corresponding terminal is represented by the any pattern.

During the preliminary translation, any patterns are actually created in the negative; that is, the bits are set if the corresponding terminal is not represented by the particular any pattern. This condition is rectified when the syntax input has been completed. Each pattern is transmitted to the procedure, CLEANUP, which converts it to the required form, which may involve recursive calls of CLEANUP if the any base for the pattern was a nonterminal.

4.5 Grammar Transformations

The PROTAB generated by a TWINKLE translation is not, in general, in a form acceptable to BNF2FPL. A number of transformations must therefore be performed on a BNF grammar to increase the probability of its acceptance to the overall TWS. In addition, a few transformations are applied to increase the efficiency of the resultant compiler. These are described below in the order in which they are performed.

4.5.1 Back Context Absorption

In translating from TWINKLE into BNF, it is important that TWINKLE retain any context in the BNF that was inherent in the original TWINKLE production.
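The bit layout of an any pattern (section 4.4) and its creation "in the negative" can be illustrated with a short sketch. It models a pattern as a Python integer in which bit i is `1 << i`, which is an assumption about bit numbering; the function names and the parameter `n_terminals` are hypothetical, and the recursive handling of nonterminal any bases by CLEANUP is not modeled.

```python
PATTERN_BITS = 384                      # eight 48-bit words
NEVER_SET = (1 << 64) | (1 << 65)       # terminating and illegal characters


def negative_pattern(excluded_bits):
    """Preliminary form: a bit is set when the terminal is NOT represented."""
    negative = 0
    for bit in excluded_bits:
        negative |= 1 << bit
    return negative


def finalize(negative, n_terminals):
    """Convert a negative pattern to the required (positive) form.

    n_terminals -- number of low-order bits that correspond to terminals
                   (assumed to be known once syntax input is complete)
    """
    in_use = (1 << n_terminals) - 1     # bits that actually denote terminals
    return ~negative & in_use & ~NEVER_SET


def represents(pattern, bit):
    """True when the terminal numbered `bit` is represented by the pattern."""
    return bool((pattern >> bit) & 1)


# "Any terminal except characters 0 and 5" in a language using bits 0..89.
pattern = finalize(negative_pattern([0, 5]), n_terminals=90)
print(represents(pattern, 1), represents(pattern, 5), represents(pattern, 64))
```

Building the pattern in the negative keeps the preliminary pass simple, since exclusions can be recorded before the full set of terminals is known.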
When a left recursive list is created, it is necessary for the non-recursive productions of the list-implementing nonterminal to absorb any of the context prior to it that may have arisen from constructs within the TWINKLE production that initially gave rise to the list. As may perhaps be anticipated, problems begin to crop up when more than one list is included in a single TWINKLE production, as in the case of nested lists. Each time a definition is complete (i.e., whenever a slash, right square bracket, semicolon, or special word, OR, is encountered), the following context-absorbing algorithm is invoked.

Let {P_n | n = 1, ..., N} be the set of productions in PDLIST and {D_n | n = 1, ..., N} be the corresponding entries in PRODS. Further, let P_n(j) denote the j-th symbol on the right-hand side of the n-th production and let R_n denote the corresponding left-hand side. The right-hand sides of the P_n are then scanned for the occurrence of a left-recursive TWINKLE-generated dummy nonterminal, say P_n(j) (such a nonterminal is characterized by a 3 in the DORNO field discussed in section 4.2). If j > 1, or if the LAMDA bit of the corresponding PRODS entry is set, a new production, P_k (k = N + 1), is created as follows:

P_k(i - j + 1) = P_n(i), i = j, j + 1, ...

[...]

<...> ::= a LIST [LIST b SEP c] d LIST e

where a, b, c, and d represent terminal symbols. The productions, before context absorption, are:

P_1: ::= λ a d
P_2: ::= λ
P_3: ::=
P_4: ::= λ b
P_5: ::= c b
P_6: ::= λ e
P_7: ::= e

where λ denotes productions, P_i, for which the LAMDA bit of D_i is set to 1. Then:

P: ::= λ *
P: ::= λ b *
P_5: ::= c b
P: ::= λ e
P: ::= e
P: ::= c
P: ::=
P_10: ::=
P: ::= λ a
::= b

are the productions remaining after P_...(1) and P_...(2) have participated in context absorption. Note that P_...(1) does not participate because the asterisk indicates that the LEVEL field of D_... is equal to OLDMARKER.
Since P_...(1) and P_7(1) cannot participate because they lack a λ [...]:

P: ::= λ *
P: ::= λ b *
P: ::= c b
P: ::= λ e *
P: ::= e
P: ::=
P: ::= b
P_10: ::=
P_11: ::= c e
P_12: ::= *
P_13: ::= λ a b

show the productions remaining after context has been absorbed from P_...(3) and P_...(2). Note that the duplication of P_... from P_... is inhibited. Finally:

P_5: ::= c b
P: ::= e
P: ::=
P: ::= b
P: ::=
P: ::= c e
P: ::= λ a b

show the results of eliminating all productions marked with an asterisk. Note also that RD1 is no longer explicitly left-recursive but retains implicit left-recursiveness through P_... and P_... . It is easy to see that the terminal strings defined by the last set of productions above are exactly those defined by P_1 through P_7 originally.

Context absorption is a local grammar transformation and is, therefore, carried out in PDLIST before the productions are entered into PROTAB. All of the other grammar transformations are global and are performed in PROTAB itself. To discuss these with some facility, the notation of this section will be altered and augmented as follows. The i-th production in PROTAB will be denoted by P_i, its right-hand side symbols by P_i(j), where j ranges from 1 through l_i, the number of symbols on the right-hand side. The left-hand side will be denoted by R_i. Finally, for each nonterminal n, N(n) and T(n) will be the sets of nonterminal and terminal head symbols of n, respectively.

4.5.2 Empty Removal

The control of ADDON and PUTINPROTAB is such that empty symbols may appear in PROTAB only as productions in themselves. That is, if a production contains an empty symbol, that must be its only symbol. Even in this relatively mild form, however, empty symbols are unacceptable to BNF2FPL and it falls to TWINKLE to remove them by back-substituting and collapsing PROTAB around them.
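As a preview of the procedure that the remainder of this section specifies, empty removal can be sketched as a fixed-point computation. The sketch assumes productions are (left-hand side, right-hand side) pairs and that an empty production has the single symbol "LAMBDA"; the function name `remove_empties` is hypothetical, and the original scans PROTAB in place rather than repeating whole passes.

```python
LAMBDA = "LAMBDA"   # stand-in for the empty symbol


def remove_empties(protab):
    """Back-substitute empty productions and drop them from the result."""
    prods = [(lhs, tuple(rhs)) for lhs, rhs in protab]
    changed = True
    while changed:
        changed = False
        # Nonterminals that currently own an empty production.
        empties = {lhs for lhs, rhs in prods if rhs == (LAMBDA,)}
        for lhs, rhs in list(prods):
            for j, symbol in enumerate(rhs):
                if symbol in empties:
                    rest = rhs[:j] + rhs[j + 1:]
                    # A one-symbol production collapses to an empty production.
                    new = (lhs, rest if rest else (LAMBDA,))
                    if new not in prods:      # identical productions are not re-entered
                        prods.append(new)
                        changed = True
    return [p for p in prods if p[1] != (LAMBDA,)]


grammar = [("S", ["a", "A", "b"]), ("A", [LAMBDA]), ("A", ["c"])]
for lhs, rhs in remove_empties(grammar):
    print(lhs, "::=", " ".join(rhs))
```

Repeating the pass until nothing new appears plays the role of the continued scanning of newly appended productions.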
PROTAB always contains the initial production, whose right-hand side consists of the unique objective symbol of the language enclosed between two occurrences of "⊥", a special termination terminal symbol that appears nowhere else in PROTAB. Furthermore, there is no nonterminal for which the only production is LAMBDA since there is a check that each nonterminal has a terminal string derivation.

Each production, P_i, for which l_i = 1 is tested to determine whether P_i(1) is the empty symbol LAMBDA. When such a production is found, PROTAB is scanned for occurrences of R_i. Such an occurrence, say P_n(j) = R_i, generates a new production, P_k, at the end of PROTAB according to the rules:

(i) if l_n = 1 then
    a) l_k = 1;
    b) R_k = R_n;
    c) P_k(1) is an empty symbol;
(ii) if l_n > 1 then
    a) l_k = l_n - 1;
    b) R_k = R_n;
    c) P_k(m) = P_n(m), m = 1, ..., j - 1;
       P_k(m) = P_n(m + 1), m = j, ..., l_k.

In the first case, P_k will be picked up in its turn as an empty production. In the second case, P_k will eventually be scanned for occurrences of R_i and possibly generate further new productions. During the course of this procedure new productions are compared against existing productions in PROTAB for identity. If a match is found the new production is not entered into PROTAB.

4.5.3 Back Substitution of Singly Defined Nonterminals

Unlike LAMBDA removal, without which PROTAB is unacceptable to BNF2FPL, back-substitution of singly defined nonterminals merely serves to increase the overall efficiency of the resultant compiler by decreasing the number of reductions required to recognize certain nonterminals. The algorithm for back-substitution is very straightforward and proceeds as follows. The set, A, of singly defined nonterminals is determined by one pass through PROTAB. Then, for each nonterminal, n, in A, PROTAB is scanned for productions, P_i, in which P_i(j) = n for some j such that 1 ≤ j ≤ l_i. A new production, P_k, is then created such that:

(i) R_k = R_i;
(ii) l_k = l_i + l_n' - 1;
(iii) P_k(m) = P_i(m), m = 1, ..., j - 1;
      P_k(m) = P_n'(m - j + 1), m = j, ..., j + l_n' - 1;
      P_k(m) = P_i(m - l_n' + 1), m = j + l_n', ..., l_k;

where P_n' is the single production for which R_n' = n. The production, P_i, is then deleted and the scan for occurrences of n continues, eventually reaching P_k and checking for possible further occurrences of n.

4.5.4 Dummy Insertion

It has been mentioned, briefly, that two otherwise indistinguishable Floyd productions may be differentiated by a look ahead of at most three symbols. If this much look ahead is insufficient, BNF2FPL attempts to combine the productions, deciding (in essence) that the differentiation may be postponed. Of course, differentiation must eventually be accomplished by look ahead, if at all. There are two situations in which this combination cannot be performed. First, a terminal symbol may not be combined with a nonterminal. Second, if A and B are two nonterminals for which a combination is proposed, that combination cannot be carried out if A is a head symbol of B (or, of course, if B is a head symbol of A). The reason for this is that the parser, in the course of looking for an A or a B, will then be satisfied by finding an A, even though that A may, in fact, be the beginning of a B which a correct parse would discover. The approach to this problem in the original TWS was to look for BNF productions of the form:

<...> ::= α A β
<...> ::= α <B> γ          (1)

where α is a nonempty string of terminals and/or nonterminals, and A is a head symbol of the nonterminal, B (A may be either a terminal or a nonterminal). These productions were modified to:

<...> ::= α <Q> β
<...> ::= α <B> γ
<Q> ::= A                  (2)

Now Q is not a head symbol of B and, if a Q is found, there can be no question of its actually heralding a B. This can create problems, however, when β = β1 β2 and there exists a production:

<N> ::= A β1 γ1            (3)

If β1 is more than three symbols long, the parser is unable to determine, when it finds an A, whether that A should become a Q or an N.
It is not possible to combine since the Q production calls for an immediate reduction. To obviate this difficulty, the present system performs dummy insertion (with <Q> as a dummy) by replacing the productions (1) by:

<...> ::= α <Q>
<...> ::= α <B> γ
<Q> ::= A β1 β2

instead of the productions (2). The production (3) now causes no problem (if γ1 and β2 are differentiable) because the burden of the differentiation can be put off by combination of the strings β2 and γ1. Experience has shown that this modification has a profound effect on the ease of writing acceptable grammars.

The algorithm by which TWINKLE performs dummy insertion is based on two procedures, MAKEDUMMYS and FIX, which call one another recursively. MAKEDUMMYS has three arguments (BASIC, FIRST, and LAST) which are addresses of productions in PROTAB. Each production from FIRST to LAST is compared against the production at BASIC to see if a dummy is needed. If it is determined that a dummy is required at PROTAB[I], FIX is called with I as a parameter and proceeds to make the necessary alterations to PROTAB. The production thus created may, in turn, require dummies when compared to productions prior to BASIC. To make these comparisons, FIX calls MAKEDUMMYS before returning.

4.6 TWINKLE Output Files

4.6.1 TABLESF

The TABLESF file is the basic thread that ties the TWS together. The first record of TABLESF contains control information concerning which options are in effect, a directory of the various tables appearing in later records, and a brief record of the progress through the system. This latter contains the last program to run successfully and the dates and times at which each program was most recently executed. Below is a list of entries in the first record of TABLESF.
LANGNAME - the name of the language truncated to seven characters;
TABLEBEGIN - a pointer to the first word of the table directory;
PRINTOPTIONS - a word of control bits for printing, and a few other options;
LASTPROGRAM - the number of the last program to run successfully (0 = TWINKLE; 1 = BNF2FPL; 2 = FPL2PAR; 3 = PAR2ALG);
ZIPTHROUGHBITS - the number of the program through which the user desires the system to zip automatically;
FIRSTGROUP - used by BNF2FPL;
TWINKLEDATE, TWINKLETIME, BNFDATE, BNFTIME, FPLDATE, FPLTIME, PARDATE, PARTIME - the date and time at which the indicated program was last run successfully;
FPLPRIORITY, FPLCORE, PARPRIORITY, PARCORE, ISLPRIORITY, ISLCORE, ISLSTACK, ALGOLPRIORITY, ALGOLCORE, ALGOLSTACK, COMPPRIORITY, COMPCORE, COMPSTACK, TABPRIORITY, TABCORE, TABSTACK - these are the various user controlled execution parameters of the programs indicated; the prefix, COMP, refers to the resultant compiler and the prefix, TAB, refers to the TWS printing program, PRINTAB/TWS;
NOSYM - the number of special terminal classes required by the language;
NONDUM - the number of nonterminals prior to dummy insertion;
FDDERPROC - the number of Floyd productions per procedure as set by the control card described in section 3.2.6;
STKTOP - the length of the longest production in PROTAB;
NTS - the number of special words used in the language;
TINDEXSTART - the record in which the special word index table, STINDX, begins;
NNS - the number of nonterminals used in the language;
NTINDEXSTART - the record in which the nonterminal index table, NTINDX, begins;
NOS - the number of actions and tests used in the language;
OPRINDEXSTART - the record in which the action index table, OTINDX, begins;
SPSTABPT - the number of entries in SYMTAB;
SPSTABSTART - the record in which SYMTAB begins;
NONTABPT - the number of entries in NONTAB;
NONTABSTART - the record in which NONTAB begins;
OPRTABPT - the number of entries in OPRTAB;
OPRTABSTART - the record in which OPRTAB begins;
PROTABPT - the number of entries in PROTAB;
PROTABSTART - the record in which PROTAB begins;
PATTERNLENGTH - the length of each any pattern;
PATTERNPT - the number of entries in ANYPAT;
PATTERNSTART - the record in which ANYPAT begins;
NOMORETABPT - a dummy pointer;
NOMORETABSTART - the first available record at the end of ANYPAT.

By using TABLEBEGIN and record pointers for referencing resident tables, the format of TABLESF is made very flexible. In fact, several additional tables are placed in TABLESF as a language runs its course through the TWS. The LASTPROGRAM entry ensures that no program in the TWS will operate on a TABLESF file that has not been properly prepared.

4.6.2 ACTIONS

Although semantic information is also included in the TWINKLE input, this information is placed, virtually without any processing, into the file, ACTIONS, which is passed on to the ISL translator for the majority of its processing. ACTIONS is a file of card images of which the first contains the number of different actions and tests used in the syntax, and the card number on which the semantic tail, if any, begins. If there is no semantic tail, this latter number is zero. Each action or test name is written onto the next available card in ACTIONS when it is read for the first time. When a semantic declaration call is encountered, the code which defines it is placed on consecutive cards in ACTIONS. To set these apart from the action and test names, two special cards are emitted before the code and two after it. The first card emitted has dollar signs in the first four columns and the number of the action with which the code is associated following these. The second card has only the special word, BEGIN. The first card following the code has only the special word, END, and, finally, there is a card with dollar signs in the first four columns once again. Implicit actions are treated similarly with the exception that a name is generated for these and placed in ACTIONS along with the code.
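The bracketing of declaration code in ACTIONS can be sketched as follows. The exact column positions beyond the four leading dollar signs are not given in the text, so the card layout here is an assumption, as are the function name `declaration_cards` and the sample code lines.

```python
def declaration_cards(action_number, code_lines):
    """Return the card images that set one semantic declaration apart.

    Two special cards precede the code and two follow it, as described above.
    The spacing of the action number after the dollar signs is assumed.
    """
    return [
        f"$$$${action_number}",   # dollar signs in columns 1-4, then the action number
        "BEGIN",                  # card holding only the special word BEGIN
        *code_lines,              # the defining code, one line per card
        "END",                    # card holding only the special word END
        "$$$$",                   # closing card with dollar signs in columns 1-4
    ]


# Hypothetical code lines for action number 12.
for card in declaration_cards(12, ["X := X + 1;", "EMIT(X);"]):
    print(card)
```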
The names generated for these actions are IMPLICIT1, IMPLICIT2, etc.

4.7 ZIP Files

TWINKLE must have the facilities for initiating the execution of several other programs, depending on the various control parameters obtained from the input. Since each program is initiated in essentially the same way, the following description of the initiation of BNF2FPL should suffice as an illustration. When it has been determined that a zip to BNF2FPL/TWS is required, the translator executes a SEARCH on file BNF2FPL, the multifile- and file-identifiers of which are "BNF2FPL" and "TWS", respectively. This determines whether or not BNF2FPL/TWS is presently in the disk directory. If it is not, the following message is issued to the system SPO:

#TWINKLE SUSPENDED: BNF2FPL/TWS NOT IN DIRECTORY
OK ZIP: mAXOK
NO ZIP: mAXNO

where m is the mix number of the TWINKLE translator. At this point, the operator must either load BNF2FPL/TWS and then restore TWINKLE to execution by entering mAXOK at the SPO, or terminate the TWINKLE execution by entering mAXNO at the SPO. Assuming that BNF2FPL was present or has been loaded, TWINKLE writes the following control cards in BNFTEMPFILE:

? EXECUTE BNF2FPL/TWS; FILE NAME = Xaaaaaa
? STACK = nnnn
? DATA Xaaaaaa
aaaaaaa
? END

5. SUMMARY

The TWINKLE metalanguage unifies the TWS by accepting, virtually without alteration, syntax specifications written in either TBNF or BNF and generating either TBNF or BNF specifications which then may be used to create either a recursive descent or a deterministic Floyd production parser, respectively. The changes that must be made to the input involve such things as additional sharps on special words and characters, which frequently may be accomplished after a single preliminary run through the translator.
TWINKLE goes well beyond these sub-languages, however, by offering a rich syntax for creating readable English descriptions of languages which are directly acceptable for computer manipulation. Thus, TWINKLE achieves the "lean mix of compact syntax metalanguage with natural language" which Perstein [2] has advocated. The sacrifice for this generality has been that the TWINKLE language, itself, is much more difficult to describe than either BNF or TBNF. Possible ambiguities arise in the use of English; the interpretation taken must be stated explicitly. It is felt, however, that the TWINKLE interpretation is most frequently that which would naturally occur to native speakers of the English language and hence, although precise definition requires much more effort, one's familiarity with natural English makes the TWINKLE syntax of a language immediately understandable. Floyd [12] offers the following example of BNF in explaining its metasymbolism: ::=