UNIVERSITY OF 
 ILLINOIS LIBRARY 
 
 At urbana-champaign 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/lingolreadablefo889kamp 
 
UIUCDCS-R-77-889 
 
 
 LINGOL: A Readable Formalism for Programming Language Semantics

 by
 Garry R. Kampen

 September 1977

 UILU-ENG 77 1766
 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 
LINGOL: A Readable Formalism for Programming 
 Language Semantics 
 
 by 
 
 Garry R. Kampen 
 Assistant Professor of Computer Science 
 
 Department of Computer Science 
 University of Illinois 
 Urbana, Illinois 61801 
 
 September, 1977 
 
Abstract 
 
 This paper describes a metanotation for defining the syntax 
 and semantics of a programming language in a rigorously formal manner. 
 Definitions are operational: A semantic definition is a set of string 
 transformation rules that operate on concrete representations of programs 
 and their environments. 
 
 The formalism is simple and easy to learn, and produces relatively 
 readable language descriptions. To illustrate the formalism, and to facilitate 
 comparison with other metalanguages, a formal definition of the simple 
 programming language ASPLE is presented. The method is compared in detail 
 with the W-grammar approach, and some techniques for verifying the consistency 
 of definitions are discussed. 
 
Outline

 1. Introduction
 2. Informal Description of LINGOL
 3. Informal Description of ASPLE
 4. Formal Definition of ASPLE
 5. Evaluation
 6. Verification
 7. Summary

 Figures

 3.1. ASPLE Memory Structure
 4.1. Syntax of ASPLE Programs
 4.2. Syntax of ASPLE States
 4.3. ASPLE Interpreter, Part I
 4.4. ASPLE Interpreter, Part II
 4.5. Expression Evaluation
 4.6. Operators
 4.7. Unary Functions
 4.8. Transition Diagram
 6.1. Derivation Tree
 6.2. Domains of Productions
 6.3. Internal Representation of Strings
 
1. Introduction 
 
 Although BNF and similar syntactic metanotations have found wide 
 acceptance, the same cannot be said about formal means of specifying the 
 semantics of programming languages. A surprising variety of semantic 
 metanotations exist, and some of these have been used to define full-size 
 programming languages, but none have achieved widespread use. This is due 
 in part to the difficulty of learning the notation, and in part to the 
 size, complexity, and sheer unreadability of the definitions themselves. 
 
 In this paper we describe a syntactic and semantic metanotation, 
 LINGOL, which has several desirable properties: 
 
 - It is complete. Any language whose sentences are strings 
 of symbols from a finite alphabet and whose semantics are 
 definable by a Turing machine can be entirely defined in 
 LINGOL. 
 
 - It is simple. A small number of familiar mathematical 
 objects - sets, tuples, functions, and strings of symbols - 
 are related by two kinds of production rule. Standard nota- 
 tional conventions are used wherever possible. 
 
 - It is readily adaptable to mechanical verification and pro- 
 cessing. Definitions are operational, that is, they pro- 
 vide an algorithm for executing programs in the defined 
 language. 
 
 To illustrate LINGOL, we will use it to define a simple programming 
 language, ASPLE. The original definition of ASPLE is due to Cleaveland and
 Uzgalis [1]. Its use here is motivated by the fact that ASPLE has become a
 kind of benchmark for evaluating semantic formalisms. In a paper by Marcotty,
 Ledgard, and Bochmann [3], ASPLE is defined using four very different methods:
 W-grammars, Production Systems, the Vienna Definition Language, and Attribute
 Grammars.
 
 To facilitate comparison, we have followed the style of [3] in our 
 definition where possible. Since LINGOL is most nearly related to W-grammars, 
 the W-grammar definition in particular has been used as a model. 
 
 A longer example of LINGOL is given in [2], where a larger and
 more realistic programming language is defined. The language is interactive,
 block-structured, and self-extensible, and it contains a full complement of
 data and control structures. For a survey of other formal definition methods
 and an extensive bibliography the reader is referred to [2] and [3].
 
 The remainder of this paper is organized as follows: Sections 2 
 and 3 contain informal descriptions of LINGOL and ASPLE respectively. Section 
 4 contains a context-free grammar for ASPLE and its formal semantic definition 
 in LINGOL. In section 5, we compare the LINGOL and W-grammar approaches to 
 semantics by means of a detailed example, and in section 6 we discuss methods 
 for verifying the consistency of formal definitions. 
 
2. Informal Description of LINGOL 
 
 The LINGOL formalism consists of two metalanguages, L1 and L2.
 L1 is a syntactic metalanguage that resembles an extended version of
 BNF; L2 is a semantic metalanguage whose 'programs' resemble Markov
 algorithms or SNOBOL string transformation rules. L2 is used to define
 functions whose domain and range are strings or n-tuples of strings belonging
 to syntactic classes defined in L1.
 
 We will illustrate LINGOL by defining the syntax and semantics
 of a very simple language consisting of arithmetic expressions. Its grammar
 is written in L1 as follows:

           Exp     -> Int Partexp
        x: Partexp -> (Op Int)*
           Op      -> + | =
           Int     -> Digit+
           Digit   -> 0 | 1 | ... | 9
 
 Informally, an expression is an integer followed by a partial expres- 
 sion. A partial expression is an operator-integer pair repeated zero or more 
 times. An operator is either the symbol + or the symbol =, and an integer 
 is a sequence of one or more digits. 
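 Because this small grammar uses only concatenation, alternation, and the
 Kleene operators, its classes can be approximated by regular expressions.
 The sketch below is an illustration only, not part of LINGOL; the names
 DIGIT, INT, OP, PARTEXP, EXP, and is_exp are assumptions introduced here.

```python
import re

# Assumed regular-expression counterparts of the syntactic classes above.
DIGIT = r"[0-9]"                  # Digit   -> 0 | 1 | ... | 9
INT = rf"{DIGIT}+"                # Int     -> Digit+
OP = r"[+=]"                      # Op      -> + | =
PARTEXP = rf"(?:{OP}{INT})*"      # Partexp -> (Op Int)*
EXP = rf"{INT}{PARTEXP}"          # Exp     -> Int Partexp


def is_exp(s: str) -> bool:
    """Return True when the whole string s belongs to the class Exp."""
    return re.fullmatch(EXP, s) is not None
```

 With these definitions, is_exp accepts strings such as 4+4=8 and rejects
 strings that end in an operator.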
 
 Depending on the media used for presentation, strings of terminal 
 symbols may be indicated by using italics, underlining them, or enclosing 
 them in quotes. When quotes are used, the quote terminal symbol is represented 
 by a pair of quotes. 
 
 Nonterminal symbols are the names of syntactic classes. Adopting 
 a standard mathematical convention, we capitalize class names and use the 
 same name in lower case (possibly with an integer suffix) to denote a member 
 
 of the class; thus expressions from the class Exp will be denoted by exp,
 exp1, exp2, and so on. In addition, the grammar above indicates that x
 will denote a terminal string in the class Partexp.

 The semantics of our simple language can be defined in various
 ways. One possibility is to define a function F that maps every expression
 exp into its value. Using L2, we write

        F: int              -> int
           int + int1       -> Sum(int, int1)
           int = int1       -> Compare(int, int1)
           int x op int1    -> F(F(int x) op int1)
 
 F(exp) is evaluated by scanning the left-hand sides of the productions
 above, starting with the topmost production. When a match with the string
 exp is found, the expression on the right is evaluated. Sum and Compare are
 functions that take strings of digits as arguments and return a string of
 digits as a result. The function Compare returns 1 if its arguments denote
 the same integer, and 0 otherwise; for example:

        Compare(008, 8) = 1

 Note that the last production will only be applied when exp contains
 two or more operators, as in the example below.

        F(4+4=8) = F(F(4+4)=8)
                 = F(8=8)
                 = 1
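 The evaluation order described above can be imitated with ordered regular
 expressions. This is a minimal sketch under stated assumptions: sum_ and
 compare stand in for Sum and Compare, Python's int arithmetic replaces
 digit-string arithmetic, and the regex patterns are approximations of the
 four patterns of F.

```python
import re


def sum_(a: str, b: str) -> str:
    # Stand-in for Sum: add two digit strings (assumed behaviour).
    return str(int(a) + int(b))


def compare(a: str, b: str) -> str:
    # Stand-in for Compare: 1 if both denote the same integer, else 0.
    return "1" if int(a) == int(b) else "0"


def F(exp: str) -> str:
    """Try the productions of F top to bottom; the first match wins."""
    rules = [
        (r"(\d+)", lambda m: m.group(1)),
        (r"(\d+)\+(\d+)", lambda m: sum_(m.group(1), m.group(2))),
        (r"(\d+)=(\d+)", lambda m: compare(m.group(1), m.group(2))),
        # int x op int1: x is a (possibly empty) run of operator-integer pairs.
        (r"(\d+(?:[+=]\d+)*)([+=])(\d+)",
         lambda m: F(F(m.group(1)) + m.group(2) + m.group(3))),
    ]
    for pattern, action in rules:
        m = re.fullmatch(pattern, exp)
        if m:
            return action(m)
    raise ValueError(f"F is undefined for {exp!r}")
```

 Calling F("4+4=8") follows the same three steps as the derivation above.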
 
 An alternative is to define the semantics of a language by an
 interpreter that executes programs in that language. The interpreter is
 defined by a function I that maps the current state s_i of the interpreter
 into its successor state s_(i+1). Given an initial state s_0, I determines
 the computation s_0, s_1, ..., s_n; when I(s_n) is undefined, the computation
 is said to terminate and s_n is the terminal state of the computation. For
 our example language, states correspond to expressions and terminal states
 to integers. The function I is defined by

        I: int + int1 x  -> Sum(int, int1) x
           int = int1 x  -> Compare(int, int1) x

 The initial state s_0 = 4+4=8 determines the following computation:
 
        I(s_0) = s_1 = Sum(4,4)=8 = 8=8
        I(s_1) = s_2 = Compare(8,8) = 1
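 The interpreter view can be sketched in the same way: a step function
 plays the role of I, and a computation is the list of states produced until
 the step function is undefined. The names step and computation are
 hypothetical, and Python integer arithmetic again stands in for Sum and
 Compare.

```python
import re
from typing import List, Optional

TAIL = r"((?:[+=]\d+)*)"  # the pattern variable x, a member of Partexp


def step(state: str) -> Optional[str]:
    """One application of I; None means I(state) is undefined."""
    m = re.fullmatch(rf"(\d+)\+(\d+){TAIL}", state)
    if m:
        return str(int(m.group(1)) + int(m.group(2))) + m.group(3)
    m = re.fullmatch(rf"(\d+)=(\d+){TAIL}", state)
    if m:
        same = int(m.group(1)) == int(m.group(2))
        return ("1" if same else "0") + m.group(3)
    return None


def computation(s0: str) -> List[str]:
    """Return s_0, s_1, ..., s_n, stopping at the terminal state."""
    states = [s0]
    while (nxt := step(states[-1])) is not None:
        states.append(nxt)
    return states
```

 For the initial state 4+4=8 this yields the three states shown above.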
 
 The definitions of the functions F and I illustrate the L2 component of
 LINGOL. The syntax and semantics of the L2 metalanguage are described below.
 
 A description in L2 is a function-name followed by a sequence of
 semantic productions of the form p -> e, where p is a pattern or an n-tuple
 (p1, p2, ..., pn) of patterns and e is a string expression or an n-tuple of
 string expressions. A pattern is a sequence of string variables and strings
 of terminal symbols; a string expression is a sequence of string variables,
 terminal strings, and string-valued functions with string expressions as
 arguments.

 To find the value F(x) of a function defined by a set of L2 productions,
 we match x with the left side of one of the productions and evaluate
 the right side. The values of string variables in the expression are determined
 by the pattern match. If no match can be found or if the string expression
 is undefined, F(x) is undefined.

 The semantics and context-sensitive syntax of L2 are specified further
 by the five rules below. Rules (3) through (5) assure that a given set of
 productions determines at most one value F(x).
 
1. Every variable must be associated with a syntactic 
 class. 
 
 2. If the same variable appears more than once on the 
 left side, it must match the same string of symbols 
 each time it occurs. 
 
 3. Every variable that appears on the right side must 
 appear on the left side. 
 
 4. If more than one rule matches a given string, the 
 first rule in sequence is chosen. 
 
 5. If a pattern matches a string x in more than one 
 way, the parse that assigns the longest substring 
 of x to the first (leftmost) pattern variable is 
 chosen. If there are several such parses, the ones 
 that assign the longest string to the second variable 
 are selected, and so on until a unique binding of pat- 
 tern variables to strings is obtained. 
 
 Patterns and tuples of patterns have the same semantics; in particular,
 rules (4) and (5) apply to an entire tuple and not to individual
 patterns within the tuple. For example, suppose we wish to evaluate
 Compare(0012, 012) where the function Compare is defined by

        Compare: (zeros int, zeros2 int) -> 1
                 (int, int2)             -> 0

 and the syntactic class Zeros is defined by

        Zeros -> 0*

 Both productions match, but by rule (4) the first one is selected. The
 variable zeros is bound to 00 by rule (5), int matches both occurrences of
 12 by rule (2), and zeros2 is consequently bound to 0. Evaluating the right
 side, we have Compare(0012, 012) = 1.
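 Rules (2), (4), and (5) have close analogues in backtracking regular
 expressions: a repeated variable behaves like a backreference, rule order is
 the order in which patterns are tried, and greedy quantifiers give the
 leftmost variable the longest possible string first. The sketch below relies
 on that analogy; the comma used to join the tuple and the names FIRST,
 SECOND, and compare are assumptions made for illustration.

```python
import re

# (zeros int, zeros2 int): the second "int" must repeat the first (rule 2).
FIRST = re.compile(r"(?P<zeros>0*)(?P<int>[0-9]+),(?P<zeros2>0*)(?P=int)")
# (int, int2): any pair of integers.
SECOND = re.compile(r"(?P<int>[0-9]+),(?P<int2>[0-9]+)")


def compare(a: str, b: str):
    """Return (value, bindings) using the first production that matches."""
    pair = f"{a},{b}"            # the tuple is encoded as one string
    m = FIRST.fullmatch(pair)    # rule (4): try productions in order
    if m:
        return "1", m.groupdict()
    m = SECOND.fullmatch(pair)
    if m:
        return "0", m.groupdict()
    raise ValueError("Compare is undefined for these arguments")
```

 For the arguments 0012 and 012 the first production matches with zeros
 bound to 00, int to 12, and zeros2 to 0, as in the text.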
 
3. Informal Description of ASPLE 
 
 An ASPLE program consists of a sequence of declarations followed 
 by a sequence of executable statements. Declarations serve to associate a 
 'mode' with each identifier used in the program. There are five types of
 statement: assignment statements, if-then-else conditionals, while-do
 loops, input statements, and output statements. Statements contain
 expressions composed of Boolean and integer constants, identifiers, and the
 operators +, *, =, and ≠. The operators + and * placed between integer
 values represent addition and multiplication respectively; between Boolean
 values they represent the logical 'or' and 'and' operations. The operators
 = and ≠ take integer arguments and return a Boolean value. Every identifier
 used in an expression must appear in exactly one declaration.
 
 The example ASPLE program below is taken from [3]. 
 
        begin
            int X, Y, Z;
            input X;
            Y := 1;
            Z := 1;
            if (X ≠ 0) then
                while (Z ≠ X) do
                    Z := Z + 1;
                    Y := Y * Z
                end
            fi;
            output Y
        end
 
 The program above reads a positive integer value X from an input
 file, computes its factorial, and prints the result Y on an output file.
 Variables X, Y, and Z are declared to reference only integer values; their
 mode is thus reference-to-integer.
 
 Just as integers can be assigned to variables of mode
 reference-to-integer, references to integers can be stored in variables of
 mode reference-to-reference-to-integer, and so on for as many levels of
 indirection as are desired. Consider the program below:
 
        begin
            ref ref int A, B;
            ref int C, D;
            int E;
            E := 50;
            C := E;
            A := C;
            D := A;
            input D;
            output D
        end
 
 In this program the integer 50 is assigned to variable E, a refer- 
 ence to E is assigned to variable C, and a reference to C is assigned to 
 variable A. Since D expects a value of mode reference-to-integer, A is 
 'dereferenced' twice and the resulting reference to E is stored in variable D. 
 The input statement reads an integer value into the variable E, and the out- 
 put statement prints the value of E. Assuming that the value input was 25, 
 the final state of memory is as shown in Figure 3.1. B is undefined. 
 
 
 Figure 3.1. ASPLE Memory Structure
 
 The Boolean or integer constant obtained by repeatedly de- 
 referencing a variable is called the 'primitive value' of that variable, 
 and its mode is called the 'primitive mode' of the variable. In the 
 example above, the primitive mode of variables A, C, D, and E is integer, 
 and their primitive value is 25 at program termination. 
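 The final memory state described above, and the notion of primitive value,
 can be modelled with a small dictionary. This is only a sketch: the
 dictionary encoding (a string names another identifier, None means
 undefined) and the function name primitive_value are assumptions, not part
 of ASPLE.

```python
# Final memory of the example program, assuming the input value was 25.
memory = {
    "A": "C",   # A refers to C
    "B": None,  # B is undefined
    "C": "E",   # C refers to E
    "D": "E",   # D refers to E
    "E": 25,    # E holds the integer read by the input statement
}


def primitive_value(name, memory):
    """Dereference repeatedly until a constant (or undefined) is reached."""
    value = memory[name]
    while isinstance(value, str):   # a string is a reference to an identifier
        value = memory[value]
    return value
```

 Here primitive_value returns 25 for A, C, D, and E, and None for B.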
 
 An assignment statement is legal if

        (a) the right side is defined,

        (b) both sides have the same primitive mode, and

        (c) if n_L and n_R are the number of occurrences of 'reference-to'
            in the modes of the left and right sides respectively, then
            n_L - 1 ≤ n_R.

 A legal assignment statement is executed as follows:

        (1) If the right side is a constant or identifier
            and n_L - 1 = n_R, then the value on the right is
            assigned to the variable on the left.

        (2) If the right side is an identifier and n_L - 1 < n_R,
            the identifier is dereferenced until a value is
            obtained whose mode contains n_L - 1 occurrences of
            'reference-to,' and this value is assigned.

        (3) If the right side is an expression other than a
            constant or identifier, n_R = 0. Identifiers in
            the expression are replaced by their primitive
            values, the expression thus formed is evaluated,
            and the resulting constant is assigned.
 
 
 In the example program below, the first three statements illustrate
 rules (1), (2), and (3) respectively; the last three statements
 are illegal since they violate conditions (a), (b), and (c) respectively.
 
        begin
            ref int C, D;
            int E, F, G;
            bool H;
            E := 10;        [n_L = 1, n_R = 0]
            F := E;         [n_L = 1, n_R = 1]
            G := (E);       [n_L = 1, n_R = 0]
            C := D;         [n_L = 2, n_R = 2]
            H := E;         [n_L = 1, n_R = 1]
            C := (E)        [n_L = 2, n_R = 0]
        end
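 The role of the counts n_L and n_R can be made concrete with a small
 classifier. The sketch below assumes that condition (c) reads
 n_L - 1 ≤ n_R, which is the reading consistent with the six annotated
 statements; it checks only condition (c) and ignores conditions (a) and
 (b). The function name and its parameters are hypothetical.

```python
from typing import Optional


def assignment_case(n_l: int, n_r: int, rhs_is_simple: bool) -> Optional[int]:
    """Return the execution rule (1, 2, or 3), or None if (c) is violated.

    n_l, n_r: occurrences of 'reference-to' in the left and right modes.
    rhs_is_simple: True when the right side is a constant or identifier.
    """
    if n_l - 1 > n_r:        # assumed condition (c): n_L - 1 <= n_R
        return None
    if not rhs_is_simple:    # rule (3): a compound expression, n_R = 0
        return 3
    return 1 if n_l - 1 == n_r else 2
```

 Applied to the annotated statements, E := 10 falls under rule (1), F := E
 under rule (2), G := (E) under rule (3), and C := (E) is rejected.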
 
 
 
 
 
 
 The argument of an input statement must be an identifier whose
 primitive mode is the same as the mode of the next value to be input. The
 identifier is dereferenced to obtain a reference to a constant, and the
 input value is assigned to the referenced location.

 Expressions appearing in other statements are always evaluated
 or dereferenced to a constant as the first step in executing the statement.
 
 
 
 4. Formal Definition of ASPLE 
 
 A formal grammar for ASPLE programs is given in Figure 4.1.
 It is an almost direct translation into L1 of the BNF grammar on page
 195 of [3], except that two-word nonterminals have been renamed, productions
 [B18] and [B19] are slightly changed, and some compression of the grammar
 has been achieved by using the Kleene * and + operators.
 
 Since we intend to define the semantics of ASPLE by means of an
 interpreter, we need to extend the ASPLE grammar to include a definition
 of the class of computational states. The definition consists of the
 productions [B23] through [B35] in Figure 4.2. [B36] through [B43] are
 stand-alone productions that define syntactic classes used by the interpreter
 but not by other syntax rules. In particular, [B38] through [B43] define
 implementation-dependent restrictions on the length of programs, memory,
 integer constants, identifiers, and files.
 
 For convenience in defining classes of fixed-length strings, we 
 let N*k denote a sequence of k instances of the syntactic class N. Thus 
 Digit*10 is the class of all 10-digit integers, and Digit*10 Digit+ is the 
 class of all integers with more than 10 digits. 
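 The N*k notation corresponds to a bounded repetition count in regular
 expressions. As an illustration only (the names LONGINT and is_longint are
 assumptions), the class Digit*10 Digit+ can be tested as follows:

```python
import re

# Digit*10 Digit+ : exactly ten digits followed by one or more digits,
# that is, any digit string with more than ten digits.
LONGINT = re.compile(r"[0-9]{10}[0-9]+")


def is_longint(s: str) -> bool:
    """Return True when s is a digit string of more than ten digits."""
    return LONGINT.fullmatch(s) is not None
```

 A ten-digit string is rejected and an eleven-digit string is accepted.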

 Interpreter States
 
 A state of the ASPLE interpreter is represented by a program or a 
 sequence of declarations and statements, followed by a snapshot that describes 
 the current contents of memory and of the input and output files associated 
 with every program. Memory also serves as a symbol table: Each entry 
 includes the mode of an identifier as well as its contents. An identifier 
 may be undefined, or may contain (refer to) a Boolean constant, an integer 
 constant, or a reference to another identifier as in the example below: 

        memory; A refbool undefined; B refint 12; C refrefint B;
 
 
 [B01] Program     -> begin Decls ; Stmts end

 [Declarations]
 [B02] Decls       -> (Declaration ;)* Declaration
 [B03] Stmts       -> (Statement ;)* Statement
 [B04] Declaration -> Mode Idlist
 [B05] Mode        -> bool | int | ref Mode
 [B06] Idlist      -> (Id ,)* Id

 [Statements]
 [B07] Statement   -> Assignment | Conditional | Loop | Transput
 [B08] Assignment  -> Id := Exp
 [B09] Conditional -> if Exp then Stmts fi
                    | if Exp then Stmts else Stmts fi
 [B10] Loop        -> while Exp do Stmts end
 [B11] Transput    -> input Id
                    | output Exp

 [Expressions]
 [B12] Exp         -> Factor | Exp + Factor
 [B13] Factor      -> Primary | Factor * Primary
 [B14] Primary     -> Id | Constant | ( Exp ) | ( Compare )
 [B15] Compare     -> Exp = Exp | Exp ≠ Exp

 [Constants and Identifiers]
 [B16] Constant    -> Bool | Int
 [B17] Bool        -> true | false
 [B18] Int         -> Digits Digit
 [B19] Digits      -> Digit*
 [B20] Digit       -> 0 | 1 | ... | 9
 [B21] Id          -> Letter+
 [B22] Letter      -> A | B | ... | Z

 Figure 4.1. Syntax of ASPLE Programs
 
 
 [States]
 [B23] State       -> Initial | Declaring | Executing | Final
 [B24] Initial     -> Program Snap
 [B25] Declaring   -> Decls ; Stmts ; Snap
 [B26] Executing   -> Stmts ; Snap
 [B27] Final       -> Snap | Lexemes error Lexemes
 [B28] Snap        -> memory ; Loc* infile Record* outfile Record*
 [B29] Record      -> Constant ;
 [B30] Loc         -> Id Mode Box ;
 [B31] Box         -> Val | undefined
 [B32] Val         -> Id | Constant
 [B33] x,y,z: Lexemes -> (Box | Operator | Keyword | Mode)*
 [B34] Operator    -> ; | := | + | * | = | ≠ | ( | ) | ,
 [B35] Keyword     -> if | then | else | fi | while | do
                    | end | input | output | memory
                    | infile | outfile
 [B36] Zero        -> 0*
 [B37] Con         -> Constant | undefined

 [Limitations]
 [B38] Longprogram -> Lexeme*10000 Lexeme+
 [B39] Longmemory  -> Loc*2000 Loc+
 [B40] Longint     -> Digit*10 Digit+
 [B41] Longid      -> Letter*6 Letter+
 [B42] Maxint      -> 4095
 [B43] Longfile    -> Record*500 Record+

 Figure 4.2. Syntax of ASPLE States
 
 
 In the W-grammar for ASPLE the same state of memory would be 
 represented by 
 
 memory loc A has ref bool refers undefined end 
 loc B has ref int refers 12 end 
 loc C has ref ref int refers B end 
 
 We have chosen an abbreviated representation of memory in the 
 belief that a generally useful formal definition should be tied closely to 
 concrete programs and to concrete representations of memory of the sort that 
 might be generated by a symbolic dump routine. Such a representation ought 
 to be both compact and syntactically similar to the programs it accompanies. 
 
 A compact state description permits example computations that
 are not excessively bulky. For example, the execution of the ASPLE program

        begin int X; X := 0 end

 is represented by the following sequence of states:

 [S1] begin int X; X := 0 end memory; infile outfile
 [S2] int X; X := 0; memory; infile outfile
 [S3] X := 0; memory; X ref int undefined; infile outfile
 [S4] memory; X ref int 0; infile outfile

 The strings [S1], [S2], [S3], and [S4] belong respectively to the
 subsets Initial, Declaring, Executing and Final of the set of states.

 Interpreter Definition
 
 The interpreter for ASPLE is defined by a state transition function
 I and seven auxiliary functions E, Plus, Times, Equal, Unequal, Suc and
 Pred. The last two are the successor and predecessor functions for the
 class of non-negative integers. E is an expression evaluator, and the other
 functions define the ASPLE operators +, *, =, and ≠. The domains and
 ranges of the functions are as follows:
 
 
        I:       State - Final  -> State
        E:       (Exp, State)   -> Con
        Plus:    (Con, Con)     -> Con
        Times:   (Con, Con)     -> Con
        Equal:   (Con, Con)     -> Con               (4.1)
        Unequal: (Con, Con)     -> Con
        Suc:     Int            -> Int
        Pred:    Int - Zero     -> Int
 
 Con is defined by [B37] as the class of integer and Boolean con- 
 stants together with undefined ; when arithmetic overflow occurs, or a binary 
 operator is supplied the wrong arguments, or an undefined identifier is 
 used in an expression, the result undefined is passed through the expres- 
 sion evaluation process and ultimately returned by E. 
 
 The definition of I consists of the semantic productions [101]
 through [129] displayed in Figures 4.3 and 4.4. The definition of function E
 is shown in Figure 4.5; Figure 4.6 contains the definitions of Plus, Times,
 Equal, and Unequal, and Figure 4.7 contains the definitions of Suc and Pred.
 We will consider each of these definitions in turn.
 
 In the definition of I, productions [101] through [105] serve to
 enforce implementation-dependent limitations on ASPLE programs. [101] through
 [104] cause a transition to an error state when a 'compile time' error is
 detected: excessive program length, too many declarations, or an oversize
 constant or identifier. [101], [103], and [104] apply only to the initial
 state, but [102] may be invoked at any time while declarations are being
 processed. [105] causes a transition to an error state when the output file
 overflows during execution.
 
 
 I: [Interpreter]

 [Limitations]
 [101] longprog snap        -> error PROGRAM TOO LONG
 [102] x longmemory y       -> error EXCESSIVE MEMORY REQUIRED
 [103] begin x longint y    -> error OVERSIZE INTEGER
 [104] begin x longid y     -> error IDENTIFIER TOO LONG
 [105] x outfile longfile   -> error OUTPUT FILE OVERFLOW

 [Declarations]
 [106] begin decls ; stmts end x  -> decls ; stmts ; x
 [107] mode id , idlist ; x       -> mode id ; mode idlist ; x
 [108] mode id ; x ; id mode2 y
          -> x ; id mode2 y error id ALREADY DECLARED
 [109] mode id ; x memory; y
          -> x memory; id ref mode undefined; y

 [Assignment]
 [110] id := int ; x ; id refint box y
          -> x ; id refint int y
 [111] id := bool ; x ; id refbool box y
          -> x ; id refbool bool y
 [112] id := id2 ; x ; id ref mode box y ; id2 mode val z
          -> x ; id ref mode id2 y ; id2 mode val z
 [113] id := id2 ; x ; id2 mode val y ; id ref mode box z
          -> x ; id2 mode val y ; id ref mode id2 z
 [114] id := id2 ; x ; id2 mode val y
          -> id := val ; x ; id2 mode val y
 [115] id := box ; x  -> x error ILLEGAL ASSIGNMENT id := box
 [116] id := exp ; x  -> id := E(exp, x) ; x

 Figure 4.3. ASPLE Interpreter, Part I
 
 
 [Conditions]
 [117] if true then stmts fi ; x               -> stmts ; x
 [118] if false then stmts fi ; x              -> x
 [119] if true then stmts else stmts2 fi ; x   -> stmts ; x
 [120] if false then stmts else stmts2 fi ; x  -> stmts2 ; x
 [121] if con then x   -> x error ILLEGAL CONDITIONAL
 [122] if exp then x   -> if E(exp, x) then x

 [Loops]
 [123] while exp do stmts end ; x
          -> if exp then stmts ; while exp do stmts end fi ; x

 [Transput]
 [124] input id ; x ; id mode id2 ; y
          -> input id2 ; x ; id mode id2 ; y
 [125] input id ; x infile constant ; y
          -> id := constant ; x infile constant ; y
 [126] input id ; x          -> x error ATTEMPT TO READ EMPTY FILE
 [127] output constant ; x   -> x constant ;
 [128] output undefined ; x  -> x error OUTPUT UNDEFINED
 [129] output exp ; x        -> output E(exp, x) ; x

 Figure 4.4. ASPLE Interpreter, Part II
 
 E: [Expression Evaluation]
 [E1] (exp + factor, x)           -> Plus(E(exp, x), E(factor, x))
 [E2] (factor * primary, x)       -> Times(E(factor, x), E(primary, x))
 [E3] (id, x ; id mode val ; y)   -> E(val, x ; y)
 [E4] (id, x)                     -> undefined
 [E5] (constant, x)               -> constant
 [E6] ( ( exp ), x)               -> E(exp, x)
 [E7] ( ( exp = exp2 ), x)        -> Equal(E(exp, x), E(exp2, x))
 [E8] ( ( exp ≠ exp2 ), x)        -> Unequal(E(exp, x), E(exp2, x))

 Figure 4.5. Expression Evaluation
 
 
 Plus: [Addition and Boolean 'or']
 [P1] (int, zero)      -> int
 [P2] (maxint, int)    -> undefined
 [P3] (int, int2)      -> Plus(Suc(int), Pred(int2))
 [P4] (false, false)   -> false
 [P5] (bool, bool2)    -> true
 [P6] (con, con2)      -> undefined

 Times: [Multiplication and Boolean 'and']
 [T1] (int, zero)          -> 0
 [T2] (int, digit)         -> Plus(Times(int, Pred(digit)), int)
 [T3] (int, digits digit)  -> Plus(Times(int 0, digits), Times(int, digit))
 [T4] (true, true)         -> true
 [T5] (bool, bool2)        -> false
 [T6] (con, con2)          -> undefined

 Equal: [Compare Integers for Equality]
 [EQ1] (zero int, zero2 int) -> true
 [EQ2] (int, int2)           -> false
 [EQ3] (con, con2)           -> undefined

 Unequal: [Compare Integers for Inequality]
 [U1] (zero int, zero2 int)  -> false
 [U2] (int, int2)            -> true
 [U3] (con, con2)            -> undefined

 Figure 4.6. Operators
 
 
 Suc: [Successor Function]
 [S01] digits 0  -> digits 1
 [S02] digits 1  -> digits 2
  ...
 [S09] digits 8  -> digits 9
 [S10] 9         -> 10
 [S11] int 9     -> Suc(int) 0

 Pred: [Predecessor Function]
 [PR01] digits 1 -> digits 0
 [PR02] digits 2 -> digits 1
  ...
 [PR09] digits 9 -> digits 8
 [PR10] 10       -> 9
 [PR11] int 0    -> Pred(int) 9

 Figure 4.7. Unary Functions
 
 
 The remaining productions are organized to reflect the structure 
 of the grammar. [106] initializes the execution process by reducing the 
 program to a sequence of declarations and statements. Other productions 
 operate in one of two ways: They remove a declaration or statement from the 
 left of the sequence, execute it, and modify the snapshot accordingly, or 
 they replace a declaration or statement with equivalent ASPLE code to which 
 another production applies. Both modes of operation are illustrated by the 
 sample computation below. The productions used are listed in their order of 
 application. 
 
        begin int X,X end memory; infile outfile                 [106]
        int X,X; memory; infile outfile                          [107]
        int X; int X; memory; infile outfile                     [109]
        int X; memory; X ref int undefined; infile outfile       [108]
        memory; X ref int undefined ... error ...
 
 The order of productions in a definition may be significant. For 
 example, the domain of [108] is a subset of the domain of [109]; if these two 
 productions were interchanged the second one would never be applied and redun- 
 dant declarations would not be detected. 
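 The effect of production order can be reproduced with two regular
 expression rewrite rules that approximate [108] and [109]. This is a
 sketch, not the LINGOL definition itself: the regexes, the fixed spacing of
 the state strings, and the names MODE, R108, R109, and run are assumptions.

```python
import re

MODE = r"(?:ref )*(?:int|bool)"

# Approximation of [108]: the declared identifier already occurs in memory.
R108 = (
    re.compile(rf"(?P<mode>{MODE}) (?P<id>[A-Z]+); (?P<x>.*); "
               rf"(?P=id) (?P<mode2>{MODE}) (?P<y>.*)"),
    "{x}; {id} {mode2} {y} error {id} ALREADY DECLARED",
)
# Approximation of [109]: enter a new, undefined location into memory.
R109 = (
    re.compile(rf"(?P<mode>{MODE}) (?P<id>[A-Z]+); (?P<x>.*)memory; (?P<y>.*)"),
    "{x}memory; {id} ref {mode} undefined; {y}",
)


def run(state: str, rules) -> str:
    """Apply the first matching rule repeatedly until none matches."""
    while True:
        for pattern, template in rules:
            m = pattern.fullmatch(state)
            if m:
                state = template.format(**m.groupdict())
                break
        else:
            return state   # no rule applies: terminal state
```

 Running the state int X; int X; memory; infile outfile with the rules in
 the order [R108, R109] ends in an error state, while the reversed order
 silently enters X into memory twice.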
 
 Productions [110] through [129] define the semantics of statement
 execution. Since the definition of the assignment statement is the most
 complex, we will discuss it at some length; the remaining definitions will
 be left to the reader.
 
 The productions that define assignment are arranged in three groups 
 that correspond to the three cases of the informal description of assignment 
 in section 3: 
 
 
        (1) If the right side of the assignment is a constant
            or identifier and n_L - 1 = n_R, then the value on the
            right is stored in the variable on the left.

            a) Integer constants are stored by [110].

            b) Boolean constants are stored by [111].

            c) Identifiers that are defined are stored
               by [112] or [113].

        (2) If the right side is an identifier that has been
            defined but does not satisfy (1), [114] is applied
            to replace the right side with its value. If the
            resulting assignment statement still fails to
            satisfy (1), [114] will dereference the right side
            again, and this will continue until n_L - 1 = n_R or the
            right side is a constant.

        (3) When the right side is an expression other than a
            constant or identifier or undefined, [116] causes
            the expression to be replaced by its value (or
            undefined). Single item expressions that fail to
            satisfy (1) or (2) are intercepted by [115], which
            generates an appropriate error message.
 
 The process of executing an assignment statement is represented by 
 the state transition graph in Figure 4.8. Each state in the diagram corresponds 
 to the set of interpreter states matched by one of the productions [110] through 
 [116]. For example, if we are in state 16, production [116] will be applied 
 and the resulting interpreter state will belong to state 10, 11, or 15 of the 
 diagram. Either [110], [111], or [115] will be applied subsequently. 
 
 From the graph it is easy to verify that an assignment statement
 will eventually be processed. The only circular path passes through state 14,
 and every time [114] is applied n_R decreases. When n_R = 0 another state is
 reached and processing is complete.
 
 We will complete our discussion of assignment by providing a detailed
 example of the operation of [112]. The first step is shown below: a
 string that represents the current state of the interpreter has been matched
 with the left side of [112]. The binding of pattern variables to substrings
 is indicated by vertical alignment.
 
 
 Figure 4.8. Transition Diagram 
 
 
        id := id2 ; x      ; id ref mode    box y ; id2 mode    val z
        A  := B   ; memory ; A  ref ref int G     ; B   ref int 9   ; ...

 The second step is to evaluate the right side of [112] using the
 bindings above. The result is given on the second line:

        x      ; id ref mode    id2 y ; id2 mode    val z
        memory ; A  ref ref int B     ; B   ref int 9   ; ...

 If A and B had appeared in memory in the reverse order, [113] would 
 be used instead. Note that B must be defined for assignment to take place; 
 if B were undefined the match would fail since the pattern variable val cannot 
 take undefined as a value. Note also that the variable y is bound to the empty 
 string because A and B occupy adjacent locations in memory. 
 
 Some care must be taken in writing semantic productions. For example,
 if the second and third instance of ; were omitted from the pattern, the
 following situation could occur:

        id := id2 ; x          id ref mode    box y    id2 mode    ...
        A  := B   ; memory ; P A  ref ref int G   ; M  B   ref int ...

 In this example, the statement A := B assigns B to the variable PA
 if MB is defined, regardless of the mode or status of A and B.

 Auxiliary Functions

 The auxiliary functions defined in Figures 4.5, 4.6, and 4.7 require
 fewer productions than I but make use of recursion. To prove that the recur- 
 sion terminates is not difficult: We simply note that E is applied to fewer 
 symbols at each successive call, and that the second argument of Plus and 
 Times is decremented at each call until it reaches zero and a value is returned. 
 Note that production [E3] is applied repeatedly to obtain the primitive value 
 of an identifier; since the semantics of ASPLE rule out circular chains of 
 pointers, the declaration of id need not be passed on to the next call of E. 
 
 
 5. Evaluation 
 
 A number of criteria for evaluating formal definition techniques 
 are proposed in [3]. In particular, the authors point out that an important
 measure of a formal definition technique is its ability to provide the
 answer to detailed questions about the language it describes. A sample 
 question is posed and each of four definitions is used to answer it. For 
 purposes of comparison, we will show how the same question is answered by 
 the definition in section 4. The remainder of this section is a detailed 
 comparison of the LINGOL and W-grammar approaches to language definition. 
 
 A question that might be posed about ASPLE is: In the example
 program below, is the assignment of an integer constant to the variable X
 valid?

     begin
         ref int X;
         X := 2
     end
 
 To answer the question, we execute the program starting with the
 initial state

     begin ref int X; X := 2 end memory ; ...

 We ignore the input and output files since they are not used.
 Applying the interpreter productions [106] and [107] we obtain successively
 the states

     ref int X; X := 2; memory ; ...

     X := 2; memory ; X ref ref int undefined; ...
 
 Now we examine the productions for assignment. [110] does not
 apply, since it requires a mode of ref int; the next rule that admits an integer
 on the right of the assignment is [115], and applying it we obtain the error
 state below. The assignment is clearly invalid.

     memory ; X ref ref int undefined; ... error ILLEGAL ASSIGNMENT X := 2

 Comparison with W-grammars
 
 As with LINGOL, the W-grammar method is based upon strings of 
 symbols and rewrite rules, and this similarity suggests that a comparison 
 between the two will be especially meaningful. An obvious comparison can 
 be made by counting rewrite rules; if we do so we find that the SIBYL 
 definition requires 43 syntax productions, and a total of 77 semantic
 productions, while the W-grammar definition in [3] requires 38 context-free
 productions (metaproductions) and 100 additional productions (hyperrules) , 
 not including the 22 productions of a standard context-free syntax for 
 ASPLE. 
 
 This comparison is overly simplistic for several reasons. First, 
 the semantic productions differ greatly in complexity; one fairly elaborate 
 hyperrule can be equivalent to several simpler LINGOL productions, and both 
 descriptions contain sequences of trivial productions. Second, there are 
 differences of style as well as notation. The authors of the W-grammar 
 definition have attempted to separate the context-sensitive and semantic 
 aspects of ASPLE; in the LINGOL definition they are intertwined. 
 
 A more fundamental difference is that the LINGOL definition is 
 operational while the W-grammar definition is essentially axiomatic: In 
 effect, a computation must be deduced from a set of relations rather than 
 generated by an algorithm. 
 
 We will compare the two methods by applying them both to a simple 
 class of expressions. First, however, we must lay the notational groundwork 
 for a description of W-grammars and their semantics. 
 
 
 We begin by using L1 to define another syntactic metalanguage W.
 The nonterminal and terminal symbols of grammars in W are defined as follows:
 Any non-empty sequence of lower-case letters followed by a comma is a
 nonterminal; the symbols 0, T, F, +, ( and ) are terminal symbols.

              W           ->  Production+
              Production  ->  N '->' Form
     x,y,z:   Form        ->  (N | T)*                              (5.1)
     n:       N           ->  (a | b | ... | z)+ ','
              T           ->  '0' | 'T' | 'F' | '+' | '(' | ')'
 
 A sentence of W is shown below. Since we choose to regard it as 
 a grammar rather than a character string, it is not underlined and spaces 
 are inserted for readability. It defines a language containing two kinds 
 of expressions, integer and Boolean: the type of an expression is the 
 same as the types of its operands. 
 
     exp,      ->  intexp,
     exp,      ->  boolexp,
     intexp,   ->  ( intexp, + intexp, )
     boolexp,  ->  ( boolexp, + boolexp, )                          (5.2)
     intexp,   ->  0
     boolexp,  ->  T
     boolexp,  ->  F
 
 Two integer expressions and a Boolean expression are shown
 below:

     0     (0+0)     (T+(F+T))
 
 
 Now we introduce a new metanotation consisting of sequences of
 productions of the form p => e, where p and e have the same syntax as L2
 patterns and expressions. A sequence of these productions defines a binary
 relation on strings rather than a function, since a string may have any
 number of successors. Only the first two of the defining rules for L2
 must be satisfied:
 
 (1) Every variable in p or e must be associated with a 
 syntactic class. 
 
 (2) A variable must match the same string of symbols 
 each time it occurs. 
 
 We can use these productions to model the semantics of various
 classes of grammars, including W-grammars and grammars in L1. For example,
 the meaning of grammar (5.2) is defined by the relation R1 below.

     R1:  exp,      =>  intexp,
          exp,      =>  boolexp,
          intexp,   =>  ( intexp, + intexp, )
          boolexp,  =>  ( boolexp, + boolexp, )                     (5.3)
          intexp,   =>  0
          boolexp,  =>  T
          boolexp,  =>  F
 
 The relation R1 determines a larger relation D1 (derives) defined
 by

     x n z  D1  x y z     iff     n R1 y

 where n is a nonterminal from the class N defined in (5.1) and x, y, and z
 are members of Form. The class of expressions defined by (5.3) is the set
 of terminal strings derivable from the nonterminal exp, that is, the set of
 strings y such that y ∈ T* and

     exp, D1 x1 D1 x2 D1 ... D1 xn D1 y     for some xi ∈ Form.

 For example,

     exp, D1 intexp, D1 (intexp,+intexp,) D1 (0+intexp,) D1 (0+0)
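
 The derivation relation is easy to animate. The following Python sketch is
 ours, not part of the definition: it stores the relation of (5.3) as a table,
 treats any run of lower-case letters followed by a comma as a nonterminal, as
 in (5.1), and searches for a derivation of bounded length. The function names
 d1_successors and derives are assumptions made for the example.

```python
import re

# The relation of (5.3): each nonterminal with its alternative right sides.
R1 = {
    "exp,": ["intexp,", "boolexp,"],
    "intexp,": ["(intexp,+intexp,)", "0"],
    "boolexp,": ["(boolexp,+boolexp,)", "T", "F"],
}

# A nonterminal is a maximal run of lower-case letters followed by a comma.
NONTERMINAL = re.compile(r"[a-z]+,")


def d1_successors(form):
    """Return every form x y z such that form = x n z and n R1 y."""
    successors = set()
    for m in NONTERMINAL.finditer(form):
        for y in R1.get(m.group(), []):
            successors.add(form[:m.start()] + y + form[m.end():])
    return successors


def derives(form, target, limit):
    """True if target is reachable from form in at most `limit` steps."""
    if form == target:
        return True
    if limit == 0:
        return False
    return any(derives(s, target, limit - 1) for s in d1_successors(form))


print(sorted(d1_successors("exp,")))
print(derives("exp,", "(0+0)", 4))   # the four-step derivation shown above
print(derives("exp,", "(0+T)", 5))   # operands of mixed type are not derivable
```

 Matching whole nonterminals rather than substrings matters here, since exp,
 is a suffix of both intexp, and boolexp,.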
 
 Because our new metanotation admits string-valued variables as 
 well as string literals, we can give a somewhat more compact definition 
 of the relation Rl, as follows: 
 
     Intbool  ->  int | bool

     R1:  exp,          =>  intbool exp,
          intbool exp,  =>  ( intbool exp, + intbool exp, )         (5.4)
          intexp,       =>  0
          boolexp,      =>  T | F

 This is an example of a two-level grammar or W-grammar. The first-level
 grammar defines a set Intbool of modes, and the second-level grammar
 uses the variable intbool to avoid writing a production for each mode of
 expression. Since the set Intbool could have been defined to contain an
 infinite number of modes, we see that a two-level grammar can be used to
 represent an infinite number of context-free productions. The last
 production uses our standard abbreviation for two productions having the
 same left side.
 
 The same two-level grammar expressed in a more standard notation 
 is shown below. We have followed the lead of [3] in using '+' instead of 
 'plus symbol' to denote the terminal symbol +, and similar abbreviations 
 for the other terminals. 
 
     INTBOOL :: int; bool.

     exp: INTBOOL exp.

     INTBOOL exp: (, INTBOOL exp, +, INTBOOL exp, ).

     int exp: 0.

     bool exp: T; F.
 
 We can use a two-level grammar to define the semantics of expressions
 as well as their syntax, but to do so we must adopt a different
 strategy. The first-level grammar will be used to generate an infinite
 set of nonterminals that includes as a proper subset an encoding of all
 legal expressions. For example, the expression (0+0) is encoded as the
 nonterminal 'int left zero plus zero right,'. As before, we define a
 relation between nonterminals and forms (R2), and extend it to a derivation
 relation (D2); but this time the set of terminal strings derivable from
 the nonterminal exp, is the set of expressions with their values. In the
 example derivation below, the initial choice of nonterminals permits the
 derivation of the terminal string 0 0; hence 0 is an expression and 0 is
 its value.

     exp, D2 int zero, int zero, eval zero giving zero, D2 ... D2 0 0
 
 The first-level grammar and the second-level rules (called hyperrules)
 that generate the set of legal expressions are given below.

     Intbool  ->  int | bool
     Exp      ->  left Exp plus Exp right | Value
     Value    ->  zero | true | false

     R2:  exp,  =>  intbool exp, intbool value, eval exp giving value,

          intbool left exp1 plus exp2 right,                        (5.5)
                =>  ( intbool exp1, + intbool exp2, )

          int zero,    =>  0
          bool true,   =>  T
          bool false,  =>  F
 
 The semantics of expression evaluation is defined by the
 additional hyperrules given below. These rules ensure that a nonterminal
 of the form

     eval exp giving value,

 will derive a terminal string (the empty string) only when intbool exp,
 derives an expression and intbool value, derives the value of that
 expression.

     eval value giving value,  =>

     eval left exp1 plus exp2 right giving value,
          =>  eval exp1 giving value2,
              eval exp2 giving value3,
              where value equals value2 plus value3,                (5.6)

     where zero equals zero plus zero,     =>
     where true equals true plus value,    =>
     where true equals value plus true,    =>
     where false equals false plus false,  =>
 
 Notice that most of the hyperrules above derive the empty
 string. To illustrate their use, we sketch the derivation of the
 expression (0+0) with value 0:

     exp, D2 int left zero plus zero right, int zero,
                 eval left zero plus zero right giving zero,
          D2 ... D2 (0+0) 0 eval zero giving zero, eval zero giving zero,
                 where zero equals zero plus zero,
          D2 ... D2 (0+0) 0
 
 The first two hyperrules (5.6) are equivalent to the two axioms
 below. The statement Eval(value) = value is true because the nonterminal
 eval value giving value, generates the empty string. The left and right sides
 of the second axiom are logically equivalent because the left side of
 the corresponding hyperrule generates the empty string only when the right
 side does.

     Eval(value) = value

     Eval(left exp1 plus exp2 right) = value
          iff  Eval(exp1) = value2  and                             (5.7)
               Eval(exp2) = value3  and
               Plus(value2, value3) = value

 The axioms (5.7) provide a recursive definition of the string-valued
 function Eval. The same definition written in L2 would look like this:

     Eval:  value  ->  value
            left exp1 plus exp2 right  ->  Plus(Eval(exp1), Eval(exp2))

 A syntactic and semantic definition equivalent to the W-grammar
 definition in (5.5) and (5.6) is given below using L1 and L2. In this
 case Eval operates on concrete rather than abstract or encoded expressions,
 so its definition is somewhat shorter.

     Exp    ->  ( Exp + Exp ) | Value
     Value  ->  0 | Bool
     Bool   ->  T | F

     Eval:  value  ->  value
            ( exp1 + exp2 )  ->  Plus(Eval(exp1), Eval(exp2))

     Plus:  (0, 0)     ->  0
            (T, bool)  ->  T
            (bool, T)  ->  T
            (F, F)     ->  F
 
 
 The class of legal expressions is defined to be the subset of 
 Exp whose members are mapped to values by the function Eval. To enable 
 the function Plus to discriminate between legal and illegal expressions, a 
 production for the class Bool of Booleans has been included in the first- 
 level grammar. 
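
 As an informal cross-check, the definition of Eval and Plus on concrete
 expressions can be transcribed into Python. This is a sketch under our own
 assumptions: expressions are written without spaces, None stands for "no
 production applies", and the helper split_sum is our own addition standing in
 for the match against the pattern ( exp1 + exp2 ).

```python
def plus(a, b):
    """Plus, defined only for the four argument patterns given in the text."""
    if a == "0" and b == "0":
        return "0"
    if a == "T" and b in ("T", "F"):
        return "T"
    if b == "T" and a in ("T", "F"):
        return "T"
    if a == "F" and b == "F":
        return "F"
    return None  # no production matches: the operands are of mixed type


def split_sum(s):
    """Split '(e1+e2)' at its top-level '+'; return None if s has no such shape."""
    if not (s.startswith("(") and s.endswith(")")):
        return None
    depth = 0
    for i, ch in enumerate(s[1:-1], start=1):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "+" and depth == 0:
            return s[1:i], s[i + 1:-1]
    return None


def eval_exp(s):
    """Eval: a value maps to itself; a sum maps to Plus of its evaluated operands."""
    if s in ("0", "T", "F"):
        return s
    parts = split_sum(s)
    if parts is None:
        return None
    left, right = eval_exp(parts[0]), eval_exp(parts[1])
    if left is None or right is None:
        return None
    return plus(left, right)


print(eval_exp("(0+0)"))       # 0
print(eval_exp("(T+(F+T))"))   # T
print(eval_exp("(0+T)"))       # None: a member of Exp, but not a legal expression
```

 The last call shows the point made above: (0+T) belongs to the class Exp but
 is not mapped to a value, so it is excluded from the legal expressions.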
 
 
 6. Verification 
 
 Because formal language definitions tend to be large and 
 complex, and because they presently must be checked by hand rather than by a 
 compiler or interpreter, typographic and logical errors have a way of 
 creeping in and remaining undetected. Clearly, it is important to 
 identify those aspects of a definition that can be checked in a routine 
 manner, and to develop mechanical means of verification wherever possible. 
 For LINGOL, several forms of verification are possible. 
 
 An obvious first step is to verify that a definition is well-formed.
 It must satisfy the context-free syntax of L1 and L2 and the
 context-sensitive restrictions given in rules (1) and (3) of section 2:
 Every variable in an L2 production must be defined by an L1 production,
 and every variable on the right of an L2 production must also appear on
 the left.
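
 Both restrictions lend themselves to a mechanical check. The sketch below is
 only illustrative and rests on naming conventions we assume rather than on the
 syntax given in section 2: variables are lower-case names optionally followed
 by digits, and the class of a variable is found by dropping the digits and
 capitalizing (so exp1 belongs to Exp). The function names are ours.

```python
import re


def variables(side):
    """Variables on one side of a production (assumed: lower-case name + digits)."""
    return set(re.findall(r"\b[a-z]+\d*\b", side))


def class_of(var):
    """Assumed naming convention: exp1 -> Exp, value -> Value."""
    return var.rstrip("0123456789").capitalize()


def check(productions, classes):
    """Return a list of violations of the two context-sensitive restrictions."""
    errors = []
    for left, right in productions:
        # Every variable must be associated with a syntactic class.
        for v in sorted(variables(left) | variables(right)):
            if class_of(v) not in classes:
                errors.append(f"{v}: no syntactic class {class_of(v)}")
        # Every variable on the right must also appear on the left.
        for v in sorted(variables(right) - variables(left)):
            errors.append(f"{v}: on the right but not on the left")
    return errors


classes = {"Exp", "Value", "Bool"}
eval_rules = [
    ("value", "value"),
    ("( exp1 + exp2 )", "Plus(Eval(exp1), Eval(exp2))"),
]
print(check(eval_rules, classes))   # []
print(check([("( exp1 + exp2 )", "Plus(Eval(exp1), Eval(exp3))")], classes))
```

 Function names such as Plus and Eval begin with a capital letter and are
 therefore not mistaken for variables under the assumed convention.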
 
 As a second step, we can attempt to verify that functions have the 
 expected domain and range; see (4.1). In checking a function we make use of 
 the properties of other functions. For example, the assertion that the 
 range of E is the set Con rests on the assertion that Con is the range of 
 the functions Plus, Times, Equal, and Unequal. 
 
 The domain of E is actually a superset of (Exp, State), namely
 the set (Exp, Lexemes). To verify this, we note that the domains of the
 productions [E1] through [E8] correspond to the leaf nodes of a tree generated
 from (Exp, Lexemes) by applying the syntax productions [B12] through [B15]
 (see figure 6.1). Since the grammar for expressions is unambiguous, the
 leaves of the tree form a partition of (Exp, Lexemes). Production [E3] is
 omitted since its domain is a subset of the domain of [E4].
 
 (Exp, Lexemes)
   |--> (Exp + Factor, Lexemes)                        [E1]
   |--> (Factor, Lexemes)
          |--> (Factor * Primary, Lexemes)             [E2]
          |--> (Primary, Lexemes)
                 |--> (Id, Lexemes)                    [E4]
                 |--> (Constant, Lexemes)              [E5]
                 |--> ((Exp), Lexemes)                 [E6]
                 |--> ((Compare), Lexemes)
                        |--> ((Exp = Exp), Lexemes)    [E7]
                        |--> ((Exp ≠ Exp), Lexemes)    [E8]

 Figure 6.1. Tree of Alternative Derivations
 
 
 We can compute the domains of a set of productions by taking 
 their left sides and replacing each variable with the name of the syntactic 
 class it denotes. If we express the result as a Venn diagram like the 
 ones shown in figure 6.2, it is easy to determine which productions can 
 be: interchanged without affecting the definition (those with disjoint 
 domains); removed without affecting the definition (those whose domains are 
 contained in the domain of an earlier production); removed without 
 changing the domain of the function (those whose domains are contained in 
 the domain of a later production). 
 
 
 Figure 6.2. Domains of Productions 
 
 Some of the information in figure 6.2 can be mechanically 
 generated (or verified) using a tree of derivations like the one in figure 
 6.1. The fact that productions [112] and [113] have disjoint domains 
 cannot, since it depends on the context-sensitive property of ASPLE 
 that no variable can be declared twice (and thus id cannot both precede
 and follow id2 in memory).
 
 Having computed the domains and ranges of the defining productions
 for the interpreter, we can construct a transition diagram resembling
 the one in figure 4.8. Transition diagrams are a useful abstraction that
 reveals properties of both the definition (for example, the fact that while
 and input are defined in terms of if and assignment) and of the defined
 language (for example, the fact that assignment can never be a
 non-terminating computation).
 
 An important property of a definition is locality: It is easier 
 to trace the execution of a statement through the definition if the
 productions involved are closely grouped, preferably on the same page of the
 defining document. We can use a transition diagram to identify rules that 
 should be rearranged, and a Venn diagram to determine if the rearrangement 
 is possible. 
 
 
 Since LINGOL descriptions are operational definitions, they 
 can be used to guide the execution of an example program for the language 
 being defined. If assignments actually assign and loops really loop, 
 we have some additional assurance that the definition describes the
 language we intended. 
 
 The process of executing test programs and generating sample 
 computations can be mechanized, and in fact this has been done in a 
 limited way. A portion of the definition of SIBYL was transformed into 
 a SNOBOL program which was then applied to some sample computations; as 
 a result several errors were detected in the original definition. 
 
 Not surprisingly, the SNOBOL implementation was extremely
 inefficient. We can do much better by building an interpreter or compiler
 for LINGOL definitions that takes advantage of their structure. For 
 example, the search for a matching production can be greatly speeded up 
 if, for each production, we examine the parse tree(s) of the current 
 state rather than the underlying character string. If productions are 
 implemented as transformations on parse trees, we can minimize the 
 amount of parsing and string manipulation required. 
 
 We can also make use of the fact that productions can be mapped 
 onto a state transition diagram. We need not scan all 29 productions of 
 the ASPLE Interpreter (or the 108 productions that define SIBYL); instead 
 we can limit the matching process at each cycle to just those productions 
 reachable from the current state. 
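
 The idea can be sketched in a few lines of Python. Everything here is
 hypothetical: the three toy productions and the SUCCESSORS table are invented
 for the example and do not correspond to the productions of figures 4.3 and
 4.4 or to the edges of figure 4.8. At each cycle only the productions listed
 as successors of the last one applied are tried.

```python
def strip_begin(state):
    """Toy production: remove a leading 'begin '."""
    return state[len("begin "):] if state.startswith("begin ") else None


def skip_statement(state):
    """Toy production: discard the first statement, up to and including '; '."""
    head, sep, rest = state.partition("; ")
    return rest if sep else None


def halt(state):
    """Toy production: accept the final 'end'."""
    return "" if state == "end" else None


PRODUCTIONS = {
    "strip_begin": strip_begin,
    "skip_statement": skip_statement,
    "halt": halt,
}

# Hypothetical transition diagram: which productions may follow which.
SUCCESSORS = {
    "start": ["strip_begin"],
    "strip_begin": ["skip_statement", "halt"],
    "skip_statement": ["skip_statement", "halt"],
    "halt": [],
}


def run(state):
    """Apply productions until none of the reachable ones matches."""
    last, trace = "start", []
    while True:
        for name in SUCCESSORS[last]:
            result = PRODUCTIONS[name](state)
            if result is not None:
                state, last = result, name
                trace.append(name)
                break
        else:
            return state, trace


print(run("begin a; b; end"))
```

 With a real definition the table would be derived from the computed domains
 and ranges of the productions rather than written by hand.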
 
 Figure 6.3. Internal Representation of Strings
 (panel (a): the string ABCD := 15; panel (b): the same string with
 descriptors, characters, and a binary word)
 
 Finally, efficient hand-coded versions of standard functions 
 like integer addition can be provided in a library. A further step is 
 to encode lexemes of standard types in a way that facilitates processing. 
 In figure 6.3 (b), for example, the string '15' of type Int has been 
 encoded in binary form. If we continued this process of replacing strings 
 and string-functions with storage structures and hand-coded subroutines, 
 our formal definition would gradually evolve into an interpretive
 implementation of the language.
 
 
 7. Summary 
 
 The use of string transformations in semantic definitions 
 appears to have several advantages: Definitions can be written that 
 are reasonably compact and readable, at least by comparison with some 
 existing formal approaches. Semantic productions can be grouped 
 to form a highly modular description. The semantic metalanguage is 
 simple and easy to learn. 
 
 In addition, the notation lends itself to mechanical verification.
 Because definitions are operational rather than axiomatic, they
 can be used to drive an interpreter that generates example computations.

 Basing the definition on concrete programs represented
 by character strings rather than abstract programs represented by,
 say, labelled parse trees offers advantages as well as disadvantages.
 On the one hand, computations can be represented compactly and the
 reader is spared the effort of translating between concrete and abstract
 syntax. On the other hand, questions of syntax may become entangled
 with semantics, and care must be taken to avoid unintended results
 in the string transformation rules. In general, more of the burden
 is placed on the authors of a definition and less on the users.
 Assuming that the latter outnumber the former, this seems like a reasonable
 choice.
 
 
 References 
 
 1. Cleaveland, J. and Uzgalis, R. What every programmer should
 know about grammars, Department of Computer Science, University
 of California, Los Angeles, California, 1973.

 2. Kampen, G. "A Formal Definition of the SIBYL Programming Language,"
 UIUCDCS-R-77-852, Department of Computer Science, University of
 Illinois, Urbana, Illinois, 1977.

 3. Marcotty, M., Ledgard, H. F., and Bochmann, G. V. "A Sampler of
 Formal Definitions," Computing Surveys 8:2, pp. 155-267.
 