UIUCDCS-R-77-852
UILU-ENG 77 1710

A Formal Definition of the SIBYL Programming Languages

by

Garry Kampen

March 1977

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois

Abstract

This report contains a complete formal description of an experimental programming language, SIBYL, which achieves simplicity by generalizing a number of concepts and structures found in other programming languages. The syntax of SIBYL is defined by a context-free grammar, and the semantics by an interpreter whose state transition function consists of a sequence of string transformation rules. This approach to semantics provides an operational definition that is highly modular and well adapted to mechanical verification. Because of its generality, SIBYL can in turn be used as a language definition tool: by mapping constructs in other high-level languages into their SIBYL equivalents, the need for repeated definitions of common control and data structures is avoided.
A Formal Definition of the SIBYL Programming Language

Table of Contents

1.0 Introduction
2.0 Background
3.0 Notation and Terminology
    3.1 Syntactic Metalanguage
    3.2 Semantic Metalanguage
    3.3 String Automata
    3.4 Relations on Expressions
    3.5 Operations on Automata
    3.6 Networks of Automata
4.0 SIBYL Syntax
    4.1 Operators
    4.2 Primitive Values and Comments
    4.3 Data Structures and Variables
    4.4 Character Set
5.0 SIBYL Semantics
    5.1 Expressions
    5.2 Primitive Values
        5.2.1 Null
        5.2.2 Booleans
        5.2.3 Tokens
        5.2.4 Relations on Numbers
        5.2.5 Numbers
        5.2.6 Relations on Strings
        5.2.7 Strings
    5.3 Data Structures
        5.3.1 Lists
        5.3.2 List Subexpressions
        5.3.3 Distribution
        5.3.4 Reduction
        5.3.5 Records
        5.3.6 Record Subexpressions
    5.4 Variables
        5.4.1 Assignment
        5.4.2 Input/Output
        5.4.3 Dereferencing
        5.4.4 Subscripts
        5.4.5 Environment
        5.4.6 Blocks
    5.5 Procedures
        5.5.1 Execution
        5.5.2 Environment
        5.5.3 Coercion
    5.6 Control Structures
        5.6.1 Conditionals
        5.6.2 Loops
        5.6.3 Indexed Loops
        5.6.4 For Loops
        5.6.5 Coercion
    5.7 Self-Extension
6.0 SIBYL Pragmatics
    6.1 Parallel Tasks
    6.2 Semaphores and Coroutines
7.0 Summary

List of Figures

1. SIBYL Example
2. Language Definition System
3. Programming System
4. Network Example
5. Multiprocessor System

1.0 Introduction.

The design and implementation of a software system requires a variety of artificial languages: job control languages, programming languages, notations for system design and analysis, and metanotations to describe the other languages. The diversity of these languages makes life more difficult for the programmer who must learn and use them, but it appears to be inevitable.
Existing languages are retained because they possess powerful compilers, extensive subroutine libraries, and large numbers of experienced users. New languages are developed because they are better suited to specialized problem areas or user groups, or in order to take advantage of new hardware or software concepts. In the face of this linguistic diversity, two trends offer hope to the beleaguered computer user: the gradual replacement of relatively specialized and idiosyncratic languages like Fortran and COBOL by more general and powerful languages like PL/1 and Algol 68, and the development of metanotations for the precise specification of programming languages.

This paper describes a syntactic and semantic metanotation aimed specifically at the users of programming languages. The formalism supports language descriptions that are understandable by mathematically unsophisticated programmers, and sufficiently complete and precise to be transformed mechanically into compilers or interpreters. An extended example of the metanotation is provided in sections 4 and 5, which contain the formal syntactic and semantic specification of a self-extensible, interactive, block-structured, typeless language, SIBYL.

SIBYL is a high-level programming language that incorporates features from languages as diverse as APL, LISP, ALGOL 68, PASCAL, and GEDANKEN. Data structures include integer and decimal numbers, Boolean values, character strings, pointers, lists, and records. Arrays, files, stacks, and queues are provided as special cases of lists. Control structures include recursive procedures, a variety of loops, and extended case and conditional expressions. Scalar operators extend to arrays as in APL. For an example program, see figure 1.

A unique feature of SIBYL is the provision of explicit operators for accessing and modifying the local environment of blocks and procedures. This permits extremely flexible scope rules.
For example, a block or procedure can be given access to any or none of the following: variables in enclosing blocks, selected common or own variables, the environment of definition of the procedure, and the environment of execution of the procedure.

SIBYL is designed to serve in two distinct but complementary roles: the intermediate language of a two-level language definition system, and a high-level problem description language in a programming system. The first role is essentially that of a common base language as described by Dennis [Dennis, 1972]. The two-level approach to language definition is illustrated in figure 2. Programming language semantics are defined by syntax-directed translators which map source language constructs into equivalent SIBYL constructs. The semantics of SIBYL are defined by an interpreter. The translators and interpreter are described by transition rules from the semantic metalanguage L2, which operate on syntactic constructs defined in the syntactic metalanguage L1.

    Job(id:12345, time:20, line:500);
    (in:Data(device:Database, ...), out:Data(device:Sysprint, ...),
     Main:[ C'Part cost summary.'
       (input:ref(in), output:ref(out), part:(name:'xxxxxxx', cost:999), sum:0)$
       (len(input) @ read(input) -> part; part.cost + sum -> sum);
       'OTotal is ' sum -> output ])$
    !Main;

The SIBYL code above reflects in abbreviated form the structure of a COBOL program and its accompanying JCL. The 'execute program' command !Main has exactly the same effect as the procedure call Query(in, out) when Query is defined by

    Query: [(i,o)$('OTotal is ' (i@cost)+) -> o]

Figure 1: SIBYL Example.

Figure 2: Language Definition System (components: JCL, COBOL, Fortran; syntax-directed translators in L1 and L2; SIBYL; interpreter in L1 and L2; mathematical notation).

Figure 3: Programming System (components: SIBYL; syntax-directed translators in L1 and L2; JCL; Fortran).
L1 and L2 are simple enough to be informally defined, or to be given a straightforward mathematical description. An advantage of the two-level approach is that both the translators and the interpreter can be relatively compact: the translators, because SIBYL is 'close' to the source language in the sense that it includes many high-level semantic constructs; the interpreter, because SIBYL is an almost purely expressive language. Repressive features, that is, features which restrict the generality of the language in order to permit efficient compilation or execution, are deliberately omitted.

The use of SIBYL in a programming system is illustrated in figure 3. A description of the problem and its initial solution program are written in SIBYL and checked for correctness. When this has been done, critical procedures are reprogrammed for greater efficiency and translated into languages for which efficient compilers exist. This approach reduces a large problem to a pair of smaller ones: finding a correct description of the solution, and obtaining an efficient program. Moreover, SIBYL's freedom from hardware-oriented features ensures a high degree of portability for the initial program.

The syntactic and semantic metalanguages L1 and L2 are defined informally in sections 3.1 and 3.2 respectively. The elements of L1 are sets of context-free syntax productions; L2 contains lists of state transition rules which operate on strings of characters. Transition rules can be grouped into semantic modules which define separate language components, thus permitting a building-block approach to language design. Notation for describing the behavior of these building blocks is introduced in sections 3.3 and 3.4. Operators for combining semantic modules into networks of interacting automata are introduced in sections 3.5 and 3.6. The resulting metanotation L3 can be used to define the hardware components of a programming system.
The metanotations L1 and L2 are used in sections 4 and 5 respectively to define the syntax and semantics of SIBYL. L3 is used in section 6 to describe a multiprocessor environment in which SIBYL programs can initiate parallel tasks.

2.0 Background.

Before discussing the SIBYL metanotation in more detail, it will be useful to review existing syntactic and semantic formalisms. For syntax, the most widely used notations are variations of (a) the two-dimensional metalanguage used to define COBOL [US, 1965], and (b) the BNF notation introduced by Backus [Backus, 1959]. These notations are essentially equivalent, their meaning is well understood, and implementations exist in the form of parse-table generators which accept grammars as input [Aho, Ullman, 1972]. Other syntactic formalisms, notably two-level grammars [van Wijngaarden, 1975] and the abstract syntax of the Vienna group [Lucas, Lauer, Stigleitner, 1970], can best be viewed as components of the semantic notations they support.

Existing semantic methods exhibit varying degrees of formalism and abstractness, ranging from the specification of a language by its compiler [Garwick, 1966] to a purely axiomatic description [Hoare, 1969] or an algebraic characterization [Goguen, Thatcher, Wagner, Wright, 197?]. The methods lying between these extremes can be loosely grouped into three categories: interpretive, devolutional, and functional.

Interpretive methods define a language by exhibiting an interpreter that transforms the current state of the computation into its successor state [Wegner, 1971]. The state includes a program and a memory component, and possibly separate control and environment components as well [Herriot, 1971]. The program component may be represented by a character string corresponding to the concrete program being executed [Kampen, 1973] or by an abstract object resembling a parse tree, as in the Vienna group method.
State transitions may be described by functions [Lucas, Walk, 1969], by precise but informal descriptions [van Wijngaarden, 1975], or by programs in a simple language which is assumed to be already known [Lukaszewicz, 1976].

Devolutional descriptions provide a translation algorithm which maps programs in the language being defined into programs in a language which is assumed to be already known [Wirth, Weber, 1969]. Devolutional definitions may be wholly or partly extensional: a language is mapped into a subset of itself either by an external transformation rule or by a statement within the language itself [Irons, 1970].

Functional and axiomatic methods tend to be implicit rather than constructive. For example, in the axiomatic system of Hoare [1969] the properties of a successor state are described but no method is given for computing it. Other approaches define a language by mapping programs into functions [Scott, Strachey, 1971; Tennent, 1976] or lambda expressions [Landin, 1965], or by functions which map programs into computations or terminal states [Hoare, Lauer, 1973].

A recent comparison of four major approaches to semantic definition [Marcotty, Ledgard, Bochmann, 1976] illustrates some of the drawbacks of existing formalisms: they are frequently harder to learn and use than the language they define; they are incapable of providing a clear, concise, easily modified description; and, with the exception of a partial implementation of the Vienna Definition Language [Feyock, 1975], they are largely unsupported by mechanical aids for verifying that descriptions are well-formed and logically consistent.

The design of the SIBYL metanotation has been guided by the desire to provide a formalism that is (a) sufficiently simple to be readily learned by programmers, and (b) sufficiently complete so that, given a language description, sample programs and their computations can be mechanically generated.
In order to achieve simplicity, standard syntactic constructs are shared by SIBYL and its metanotation wherever possible. For example, names and character strings have a standard representation. Semantically, L1 is an extended BNF; L2 resembles (but is not identical to) a subset of the well-known string-manipulation language SNOBOL. To permit mechanization, definitions are interpretive rather than axiomatic. Statements equivalent to axioms in the Hoare notation must be proved as theorems. Programs and computational states are represented by concrete character strings. Abstractions arise from and guide the definition but are not part of it.

Considerations of simplicity have guided the use of the metanotation as well as its design. The grammar for SIBYL has been simplified by minimizing the amount of semantic information it conveys: operator precedence, for example, is only suggested by the grouping of operators into classes. The SIBYL interpreter is defined by a list of approximately 100 semantic rules, none involving recursion. A definition in terms of recursive functions might have reflected the structure of the language more clearly, but it would not have permitted the incremental approach of section 5, where the language is developed as a series of successively more complex sublanguages.

3.0 Notation and Terminology.

3.1 Syntactic Metalanguage.

The syntactic metalanguage L1 is a set of grammars, each consisting of one or more productions of the form

    n → e

where n is the name of a syntactic class (non-terminal symbol) and e is an expression whose operands are syntactic class names and strings of characters (terminal symbols) over an alphabet V. Strings are enclosed in quotes or underlined; syntactic class names begin with an upper-case letter. The operators |, +, and * mean 'or', 'one or more', and 'zero or more' respectively.
For example, a class of integer expressions is defined by

    Expression → Operand (Operator Operand)*
        Operator → '+' | '*'
        Operand → Nil | Digit+
            Nil → '0'
            Digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

Operator | has the lowest precedence, + and * the highest. For readability, we adopt the convention of indicating the components of a syntactic class by indentation.

L1 includes expressions which define n-tuples of syntactic classes. For example, the class of states of a programmable pocket calculator with stack, display, and input registers and function keys + (add), * (multiply), ⊢ (clear), and ⊣ (evaluate) is given by the grammar

    State → (Stack, Display, Input)
        Stack → '⊢' | '⊢' Operand Operator
        Display → Operand
        Input → Key*
            Key → Digit | '+' | '*' | '⊢' | '⊣'

3.2 Semantic Metalanguage.

The semantic metalanguage L2 is a set of semantic descriptions, each consisting of one or more transition rules of the form

    (p1, p2, ..., pn) → (e1, ..., em)

where the pi are patterns, i.e. sequences of character strings and string variables, and the ei are string expressions, i.e. sequences of character strings, string variables, and string-valued functions of string expressions. Variables appearing on the right side of a rule must appear on the left side of the same rule. For example, a possible semantic description D for the pocket calculator of section 3.1 is

    D1: (s, w, y ⊢ y1)       → (⊢, nil, y1)
    D2: (s, w, digit y)      → (s, w digit, y)
    D3: (⊢, w, op y)         → (⊢ w op, nil, y)
    D4: (⊢ w +, w1, key y)   → (⊢, Sum(w, w1), key y)
    D5: (⊢ w *, w1, key y)   → (⊢, Prod(w, w1), key y)

where s ∈ Stack, w, w1 ∈ Operand, y, y1 ∈ Input, digit ∈ Digit, op ∈ Operator, key ∈ Key, and Sum and Prod return the sum and product respectively of integers represented as strings of digits. Since semantic descriptions have the power of Turing machines, Sum and Prod can themselves be defined by transition rules.
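The description D is small enough to prototype directly. The Python sketch below is an illustration, not part of the original definition: it hand-codes the pattern of each rule instead of implementing general SNOBOL-style matching, and it assumes that the Nil display is the string '0', as the computations traced in sections 3.3 and 3.4 suggest. Like L2, it always applies the first rule whose left-hand side matches.

```python
# Minimal simulation of the pocket-calculator description D (rules D1-D5).
# Assumptions: Nil is the string "0"; each rule's pattern is hand-coded
# rather than matched by a general SNOBOL-style pattern matcher.
CLEAR, EVAL = "⊢", "⊣"
NIL = "0"
DIGITS = "0123456789"
OPS = "+*"


def step(state):
    """Return the successor of (stack, display, input), or None at a halt state."""
    s, w, y = state
    # D1: (s, w, y ⊢ y1) -> (⊢, nil, y1); the longest y wins, so cut at the last ⊢
    if CLEAR in y:
        return (CLEAR, NIL, y[y.rindex(CLEAR) + 1:])
    # D2: (s, w, digit y) -> (s, w digit, y)
    if y and y[0] in DIGITS:
        return (s, w + y[0], y[1:])
    # D3: (⊢, w, op y) -> (⊢ w op, nil, y)
    if s == CLEAR and y and y[0] in OPS:
        return (CLEAR + w + y[0], NIL, y[1:])
    # D4/D5: (⊢ w op, w1, key y) -> (⊢, Sum or Prod of w and w1, key y)
    if len(s) > 2 and s[0] == CLEAR and s[-1] in OPS and s[1:-1].isdigit() and y:
        left, right = int(s[1:-1]), int(w)
        result = left + right if s[-1] == "+" else left * right
        return (CLEAR, str(result), y)
    return None  # no rule applies


def run(state):
    """Apply D repeatedly; return the whole computation, ending in a halt state."""
    trace = [state]
    while (successor := step(trace[-1])) is not None:
        trace.append(successor)
    return trace


for st in run((CLEAR, NIL, CLEAR + "5+6" + EVAL)):
    print(st)
```

Started on the input ⊢5+6⊣, the sketch passes through the same states as the computation of section 3.3 and halts in ('⊢', '11', '⊣'). On ⊢2+3*4⊣ it halts with '20' in the display, because this calculator evaluates strictly from left to right.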
By convention, string variables are formed from lower-case letters and digits, while the names of functions and descriptions are capitalized. An informal description of the pocket calculator can be written to match the transition rules: the clear key ⊢ clears the stack and zeroes the display (D1). As digits are entered they appear in the display (D2). When the stack is empty, a + or * keyin pushes the displayed operand and the operator onto the stack (D3). When the stack is not empty, a +, *, or ⊣ keyin causes the stack and display operands to be added or multiplied, depending on the stack operator, and the result to be placed in the display (D4, D5).

The semantics of L2 ensure that every description D defines a function. Given an n-tuple (s1, s2, ..., sn) of strings, we compute its value under D as follows:

(1) For every transition rule in D, match the patterns pi against the corresponding strings si. Pattern matching is done by a top-down parse with backup, as in the SNOBOL language. At each stage, the alternative that matches the longest string is tried first, then the next-longest, and so on. Patterns are matched from left to right, starting with p1. When a variable is matched with a substring, subsequent occurrences of that variable are replaced with that substring.

(2) Choose the first (topmost) rule whose left-hand side matches the n-tuple, and compute D(s1, s2, ..., sn) by evaluating the string expressions on the right-hand side.

The way rules (1) and (2) operate to eliminate semantic ambiguity is illustrated by the state below, which can be parsed in three different ways.

    (⊢05+, 06, ⊣⊢⊣⊢10⊣)
        D1: (s, w, y ⊢ y1)        Applied.
        D1: (s, w, y ⊢ y1)        Eliminated by (1).
        D4: (⊢ w +, w1, key y)    Eliminated by (2).

3.3 String Automata.

The function defined by the semantic description D is a member of a class of abstract machines called string automata.
A string automaton of n registers is a mapping T: A → B, where A and B are subsets of a set S of states consisting of n-tuples of character strings over an alphabet V. When T is a function the automaton is said to be deterministic; otherwise it is non-deterministic. A computation is a sequence s0, s1, ... of states such that $s_{i+1} = T(s_i)$. States in S − A are called halt states. If a computation is finite and its last state $s_k$ is a halt state, the computation is said to terminate, and $s_k$ is its final state. Also, $s_0$ is the initial state of the computation, $s_{i+1}$ is the successor state of $s_i$, and $s_j$ is a consequence of $s_i$ whenever j > i. These relationships may be stated more compactly as

$$s_0 \xrightarrow{!} s_k, \qquad s_i \rightarrow s_{i+1}, \qquad s_i \xrightarrow{*} s_j,$$

where $\xrightarrow{!}$, $\rightarrow$, and $\xrightarrow{*}$ are relations on S determined by T. When T is deterministic, $\xrightarrow{!}$ and $\rightarrow$ are functions.

An example of a computation determined by D is given below. Following each state is the transition rule Di which applies to that state.

    (s, w, y⊢5+6⊣)     D1   Clear stack and display.
    (⊢, 0, 5+6⊣)       D2   Enter digit into display.
    (⊢, 05, +6⊣)       D3   Push operator onto stack.
    (⊢05+, 0, 6⊣)      D2   Enter digit into display.
    (⊢05+, 06, ⊣)      D4   Compute sum.

3.4 Relations on Expressions.

String automata can be used to define software or hardware modules (e.g. pocket calculators) or languages (e.g. arithmetic expressions). In the latter case, it is convenient to extend the relations $\xrightarrow{!}$, $\rightarrow$, and $\xrightarrow{*}$ from states to expressions. Assume that T is the defining automaton for a language L, and that s& is the string formed by concatenating the components of a state s, i.e.

$$(s_1, s_2, \ldots, s_n)\& = s_1 s_2 \ldots s_n.$$

Then the relation $\xrightarrow{*}$ is defined for $e_1, e_2 \in L$ by

$$e_1 \xrightarrow{*} e_2 \iff s_1 \xrightarrow{*} s_2,$$

where $s_1$ and $s_2$ are states, $s_1\& = m_1 \vdash e_1 \dashv$ for some string $m_1$, and $s_2\& = m_2 \vdash e_2 \dashv$ for some string $m_2$. Relations $\rightarrow$ and $\xrightarrow{!}$ are defined similarly. An equivalence relation = on expressions is defined by

$$e_1 = e_2 \iff e_1 \xrightarrow{*} e \text{ and } e_2 \xrightarrow{*} e \text{ for some expression } e.$$
Enclosing quotes and underlining may be omitted when the meaning is clear. For example, when L is the class Expression defined in 3.1 and T is defined by D from section 3.2, then $5+6 \xrightarrow{*} 05+06 \rightarrow 11$ and $5+6 = 6+5$.

3.5 Operations on Automata.

String automata may be combined to form new automata by a variety of operators, including the function composition operator o:

$$T = T_1 \circ T_2 \iff T(s) = T_2(T_1(s)) \text{ for all } s.$$

$$T = T_1 \,\&\, T_2 \iff T(s) = \begin{cases} T_1(s) & \text{when } T_1(s) \text{ is defined,} \\ T_2(s) & \text{otherwise.} \end{cases}$$

$$T = T_1^{*} \iff T(s_1) = s_2 \text{ when } s_1 \xrightarrow{!} s_2 \text{ in } T_1.$$

$$T = T_1^{\,n} \iff T = \begin{cases} T_1 & \text{when } n = 1, \\ T_1 \circ T_1^{\,n-1} & \text{otherwise.} \end{cases}$$

For example, the expression below defines a new pocket calculator which behaves like the original except that operands are entered all at once instead of a digit at a time.

$$\mathit{New} = D_1 \,\&\, D_2^{*} \,\&\, D_3 \,\&\, D_4 \,\&\, D_5$$

An abbreviated form of the same expression is

$$\mathit{New} = D_1 \,\&\, D_2^{*} \,\&\, D_{3..5}$$

3.6 Networks of Automata.

If T is a string automaton with states of the form s = (s1, s2, ..., sn), and if R = (R1, R2, ..., Rn) is a sequence of n distinct objects called registers, then T(R) denotes an instance of T. The terminology introduced for string automata also applies to instances of string automata. When T(R) is in state s, string si is called the content of register Ri. A register can have only one content.

A network is a set of instances of string automata whose registers are selected from a list R. The state of the network is the list of contents of R. When a network component changes state, so does the network. If T1(P) and T2(Q) are instances of the string automata T1 and T2, and R is a list of the registers appearing in P and Q, the expression

$$T(R) = T_1(P) + T_2(Q)$$

defines T to be the string automaton whose successor state T(s) is defined as follows: let s determine the contents of R, and apply T1 to the contents of P. Apply T2 to the state induced by the new contents of P. The resulting network state is T(s).
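The operators of sections 3.5 and 3.6 can be modeled by treating an automaton as a partial function from tuples of strings to tuples of strings, with None standing for 'undefined'. The Python sketch below is illustrative only and rests on stated assumptions: it reads T1 o T2 as 'apply T1, then T2', the order in which the informal description of + above applies its components; it treats T1* as undefined wherever T1 itself is undefined (rather than as the identity there), so that an expression such as D1 & D2* & D3 & D4 & D5 can fall through to D3 when no digit is waiting; and the automaton `shift` is a hypothetical toy used only for the demonstration.

```python
from typing import Callable, Optional, Tuple

State = Tuple[str, ...]
Automaton = Callable[[State], Optional[State]]  # None means "undefined on this state"


def compose(t1: Automaton, t2: Automaton) -> Automaton:
    """T1 o T2, read here as: apply T1, then apply T2 to the result."""
    def t(s: State) -> Optional[State]:
        s1 = t1(s)
        return None if s1 is None else t2(s1)
    return t


def alt(t1: Automaton, t2: Automaton) -> Automaton:
    """T1 & T2: T1(s) where T1 is defined, otherwise T2(s)."""
    def t(s: State) -> Optional[State]:
        s1 = t1(s)
        return s1 if s1 is not None else t2(s)
    return t


def star(t1: Automaton, max_steps: int = 10_000) -> Automaton:
    """T1*: run T1 to a halt state (assumed undefined where T1 is undefined)."""
    def t(s: State) -> Optional[State]:
        if t1(s) is None:
            return None
        for _ in range(max_steps):
            nxt = t1(s)
            if nxt is None:
                return s
            s = nxt
        raise RuntimeError("no halt state reached within max_steps")
    return t


def induced(t1: Automaton, regs: Tuple[str, ...], names: Tuple[str, ...]) -> Automaton:
    """T1(R:Q): run T1 on the registers `regs`, leaving other registers unchanged."""
    idx = [names.index(r) for r in regs]

    def t(s: State) -> Optional[State]:
        q1 = t1(tuple(s[i] for i in idx))
        if q1 is None:
            return None
        out = list(s)
        for i, content in zip(idx, q1):
            out[i] = content
        return tuple(out)
    return t


def connect(t1, p, t2, q, names) -> Automaton:
    """T1(P) + T2(Q) = (T1(R:P) o T2(R:Q)) & T2(R:Q) & T1(R:P)."""
    a, b = induced(t1, p, names), induced(t2, q, names)
    return alt(alt(compose(a, b), b), a)


def shift(s: State) -> Optional[State]:
    """Hypothetical toy automaton: move the first character of register 1 to register 2."""
    x, y = s
    return (x[1:], y + x[0]) if x else None


net = connect(shift, ("R1", "R2"), shift, ("R2", "R3"), ("R1", "R2", "R3"))
print(star(net)(("ab", "", "")))
```

In `net`, a single transition moves a character from R1 to R2 and, in the same step, on from R2 to R3; running the network to a halt state moves 'ab' from R1 into R3.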
The module connection operator + can be defined more precisely in terms of the functions T1(R:P) and T2(R:Q) induced on states of T(R) by T1 and T2:

$$T(R{:}Q)(s) = s' \iff T(q) = q', \text{ where } s'_i = q'_j \text{ and } s_i = q_j \text{ when } R_i = Q_j, \text{ and } s'_i = s_i \text{ otherwise.}$$

$$T(R) = T_1(P) + T_2(Q) \iff T = \bigl(T_1(R{:}P) \circ T_2(R{:}Q)\bigr) \,\&\, T_2(R{:}Q) \,\&\, T_1(R{:}P)$$

The operators o, &, and + are all associative, so an expression of the form

$$T = T_1 + T_2 + \cdots + T_n$$

is well-defined. A state transition of T is computed by applying T1 to the current state if possible, then applying T2 if possible, and so on. If none of the Ti modifies the current state, T is undefined for that state.

Figure 4 shows the network defined by

    Net(R1, R2, R3, R4) = D(R1, R2, R3) + Term(R3, R4)

with

    Term1: (y, start e stop)  → (y ⊢ e, stop)
    Term2: (x, stop)          → (x, ⊣)
    Term3: (Λ, ⊣)             → (⊣, Λ)

where e ∈ Expression. When connected to the input buffer of the calculator, the terminal device defined by Term ensures that only legal expressions are transmitted, and permits the use of start and stop as syntactic sugar. A sample computation is given below. The computations of D and Term represent parallel processes that pause while awaiting input. Note that D changes state twice during one transition of Net.

    Net:                   D:                 Term:
    (⊢05+, 06, Λ, stop)    (⊢05+, 06, Λ)      (Λ, stop)
    (⊢05+, 06, Λ, ⊣)                          (Λ, ⊣)
    (⊢05+, 06, ⊣, Λ)                          (⊣, Λ)
    (⊢, 11, ⊣, Λ)          (⊢, 11, ⊣)

Figure 4: Network Example (D attached to registers R1, R2, R3; Term attached to registers R3 and R4).

If A, B, and C are string automata, if R is a list of registers, and if we extend the operators &, o, *, and + by the definitions

$$A(R) \,\&\, B(R) = (A \,\&\, B)(R), \qquad A(R) \circ B(R) = (A \circ B)(R), \qquad A(R)^{*} = A^{*}(R),$$

then &, o, and + are associative, and moreover:

$$\begin{aligned}
A \,\&\, B &= B \,\&\, A && \text{when } \mathrm{dom}(A) \cap \mathrm{dom}(B) = \emptyset \\
A \circ (B \,\&\, C) &= (A \circ B) \,\&\, (A \circ C) && \text{when } \mathrm{dom}(B) \cap \mathrm{dom}(C) = \emptyset \\
(A \,\&\, B) \circ C &= (A \circ C) \,\&\, (B \circ C) \\
(A^{*})^{*} &= A^{*} \\
A + B &= B \,\&\, A && \text{when } \mathrm{dom}(A) \cap \mathrm{rge}(B) = \emptyset \\
A + B &= A \circ B && \text{when } \mathrm{dom}(A) \cap \mathrm{rge}(B) \subseteq \mathrm{dom}(B)
\end{aligned}$$

4.0 SIBYL Syntax.
    State → (Memory, Stack Operand, Input)
    m: Memory → Record
    s: Stack → (Operand Operator)*
    y: Input → (Separator | Operator | Term)*
    sep: Separator → Blank* | Comment
    op: Operator → Evaluator | Delimiter | '(' | '⊢' | ')'
    term: Term → Operand | Subexpression
    w: Operand → Nil | Value
    nil: Nil → Blank
    v: Value → Constant | Variable
    con: Constant → Primitive | Structure | Procedure
    prim: Primitive → Null | Boolean | Token | Number | String
    stru: Structure → List | Record
    proc: Procedure → '[' Form ']'
    Subexpression → '(' Form ')'
    f: Form → (Statement | ';')*
    t: Statement → (Expression | Delimiter)*
    e: Expression → (Separator | Evaluator | Term)*

The remainder of the context-free syntax for SIBYL is given below. The lower-case name preceding each class name will be used to denote a string of that class. Names of distinct strings will be distinguished by an integer suffix. A string of the class (Operator Input) will be denoted by x. An example of a state of the form (m, s w op w1, x) is

    ((A:5,B:6,), ⊢A+B, ⊣)

4.1 Operators.

    ev: Evaluator → …
    pr: Primary → … | Relation | …
    rel: Relation → Op5
    del: Delimiter → …
    Op10 → …
    Op9 → …
    Op8 → '+' | '-'
    Op7 → …
    Op6 → …
    Op5 → '<' | '>' | '=' | '<=' | '>=' | '/='
    Op4 → …
    Op3 → '∨'
    Op2 → …
    Op1 → '&' | …
    Op0 → …

The operators above are grouped by precedence from highest (Op10) to lowest (Op0).

4.2 Primitive Values and Comments.
    prim: Primitive → Null | Boolean | Token | Number | String
    null: Null → '%'
    bool: Boolean → True | False
    true: True → '1'
    false: False → '0'
    token: Token → Function | Ptype | Stype | Deref
    fun: Function → hd | tl | len | dom | rge | read | pop
    ptype: Ptype → null | bool | token | num | int | dec | str
    stype: Stype → list | rec | proc
    deref: Deref → val | ref | con
    num: Number → Index | Integer | Decimal
    i: Index → Digit+
    int: Integer → Sign Digit+
    dec: Decimal → Sign Digit+ '.' Digit+
    Sign → Λ | '-'
    str: String → Quote Characters Quote
    Quote → "'"
    cs: Characters → Character*
    c: Character → Letter | Digit | Blank | Quote Quote | Other
    Comment → 'C' String

Some examples of primitives are given below. Note that the quote character is represented within a string by a pair of quotes.

    %    token    12    -3    12.5    'O''CONNOR'    C'comment.'

4.3 Data Structures and Variables.

    list: List → '(' Values ')'
    vs: Values → (Value ',')*
    rec: Record → '(' Fields ')'
    fs: Fields → (Field ',')*
    Field → Name ':' Value
    name: Name → Letter (Letter | Digit)*
    var: Variable → Name | Name '.' Subscripts | Reference
    ref: Reference → r. Subscripts
    subs: Subscripts → Subscript ('.' Subscript)*
    sub: Subscript → Name | Index

A record r with a list and a record as values is exhibited below, along with some variables that reference its components.

    r: (ARRAY:((1,),(1,2,),(1,2,3,),), EMPLOYEE:(NAME:'JONES',AGE:30,),)

    r.ARRAY    ARRAY    ARRAY.3.2    EMPLOYEE.NAME

4.4 Character Set.

    sym: Symbol → Letter | Digit | Blank | Quote | Other
    Letter → Uppercase | Lowercase
    Other → Sym1 | Sym2 | Sym3 | Sym4 | Nonascii | Control
    Blank → ' '
    Sym1 → '!' | '"' | '#' | '$' | '%' | '&'
    Quote → "'"
    Sym2 → '(' | ')' | '*' | '+' | ',' | '-' | '.' | '/'
    Digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
    Sym3 → ':' | ';' | '<' | '=' | '>' | '?' | '@'
    Uppercase → 'A' | 'B' | ... | 'Z'
    Sym4 → '[' | '\' | ']' | '^' | '_' | '`'
    Lowercase → 'a' | 'b' | ... | 'z'
    Nonascii → '⊢' | '⊣' | '∧' | '∨' | '¬'
    Control → ASCII control characters

All symbols except ⊢, ⊣, ∧, ∨, and ¬ are members of the extended ASCII character set.
The exceptions will be represented externally by the keywords start, stop, and, or, and not. Uppercase characters may be substituted for lowercase when the latter are not available. ASCII symbols other than control characters are listed in the order they appear in the ASCII collating sequence.

5.0 SIBYL Semantics.

The formal semantic definition of SIBYL is presented in this section. The definition is a list of semantic rules which describe an interpreter with states of the form (m, s w, y). Semantic rules are grouped and ordered to parallel the syntax productions of section 4, and to make possible a step-by-step development of the language: at each step, the rules to that point define a subset of the overall language, and examples are drawn from this sublanguage.

5.1 Expressions.

    E1: (m, s w, y ⊢ y1)          → (m, nil ⊢ nil, y1)
    E2: (m, s w, sep y)           → (m, s w, y)
    E3: (m, s nil, v y)           → (m, s v, y)
    E4: (m, s v, v1 y)            → (m, s v, | v1 y)
    E5: (m, s w, ( y)             → (m, s w ( nil, y)
    E6: (m, s w ( v, ) y)         → (m, s w, v nil y)
    E7: (m, s nil, ev y)          → (m, s nil ev nil, y)
    E8: (m, s w opi w1, opj y)    → (m, s w opi w1 opj nil, y)

where 0 ≤ i < j and 1 < j.

The operator ⊢ clears and resets the stack and deletes everything to its left in the input buffer (rule E1). Since the first rule takes precedence over the others, a ⊢ keyin can be used to halt an otherwise nonterminating computation.

Expressions are evaluated from left to right. Separators are ignored (E2), but the operator | is inserted between adjacent values (E4). Values are entered when the top of the stack is empty (E3), and pushed down when the operator to their right has higher precedence than the operator to their left (E8). Prefix operators (i.e. operators whose left argument is nil) take precedence over the operator to their left (E7).

5.2 Primitive Values.

5.2.1 Null.

    R1: (m, s null | v, x) → (m, s null, x)

A null value applied to a value v returns null.

5.2.2 Booleans.
    R2: (m, s nil ¬ bool, x)      → (m, s Not(bool), x)
    R3: (m, s bool ∧ bool1, x)    → (m, s And(bool, bool1), x)
    R4: (m, s bool ∨ bool1, x)    → (m, s Or(bool, bool1), x)

The Boolean operators ¬, ∧, and ∨ are defined by the functions Not, And, and Or, which have the usual meaning for the values 1 (true) and 0 (false):

    Ex: ¬1∨0∧1 →* 0∨0 → 0

5.2.3 Tokens.

    R5: (m, s con = stype, x)     → (m, s Type(con, stype), x)
    R6: (m, s prim = ptype, x)    → (m, s Type(prim, ptype), x)

The operator = (is) tests its left-hand argument for membership in the syntactic class designated by the token on the right. Type(x, y) = 1 (true) when x is in y, and 0 (false) otherwise.

    Ex: 12 = num ∧ 12.0 = dec →* 1

5.2.4 Relations on Numbers.

    R7:  (m, s w rel w1, rel1 y)   → (m, s w rel w1, ∧ w1 rel1 y)
    R8:  (m, s num < num1, x)      → (m, s Less(num, num1), x)
    R9:  (m, s num > num1, x)      → (m, s Less(num1, num), x)
    R10: (m, s num = num1, x)      → (m, s Equal(num, num1), x)
    R11: (m, s num <= num1, x)     → (m, s nil ¬ Less(num1, num), x)
    R12: (m, s num >= num1, x)     → (m, s nil ¬ Less(num, num1), x)
    R13: (m, s num /= num1, x)     → (m, s nil ¬ Equal(num, num1), x)

Expressions of the form e1 < e2 < ... are abbreviations for expressions of the form e1 < e2 ∧ e2 < ... (R7). The relational operators <, >, =, <=, >=, and /= yield a Boolean value when applied to numeric operands.

    Ex: 12 = 12 = 012.0 →* 1∧1 → 1

5.2.5 Numbers.

    R14: (m, s num * num1, x)     → (m, s Prod(num, num1), x)
    R15: (m, s num / num1, x)     → (m, s Quot(num, num1), x)
    R16: (m, s num + num1, x)     → (m, s Sum(num, num1), x)
    R17: (m, s num - num1, x)     → (m, s Diff(num, num1), x)
    R18: (m, s nil op8 num, x)    → (m, s, op8 num x)
    R19: (m, s num | v, x)        → (m, s num, + v x)

The basic arithmetic operators (including unary + and -) are defined by the functions Prod, Quot, Sum, and Diff, which return the product, quotient, sum, and difference of their arguments.
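Rules E1 to E8, together with the abbreviation rule R7, behave like a conventional operator-precedence evaluator. The Python sketch below imitates that behavior for unsigned integers and the operators ∧, <, >, =, +, -, and *, with Booleans represented as 1 and 0 as in section 5.2.2. It is illustrative only: the numeric precedence levels are assumptions, chosen so that * binds tighter than + and - (as the example 12-3*25 in this section implies), arithmetic binds tighter than the relations (Op8 above Op5), and relations bind tighter than ∧, which R7 relies on.

```python
import operator
import re

# Assumed precedence levels; only their relative order matters.
PREC = {"∧": 4, "<": 5, ">": 5, "=": 5, "+": 8, "-": 8, "*": 9}
RELATIONS = {"<", ">", "="}
APPLY = {
    "∧": lambda a, b: int(bool(a) and bool(b)),  # Booleans are 1 and 0
    "<": lambda a, b: int(a < b),
    ">": lambda a, b: int(a > b),
    "=": lambda a, b: int(a == b),
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def evaluate(text: str) -> int:
    """Evaluate left to right with an operand/operator stack, in the spirit of E3 and E8."""
    stack = []  # operand, operator, operand, ..., operand (an operand is on top)

    def reduce_top() -> None:
        right, op, left = stack.pop(), stack.pop(), stack.pop()
        stack.append(APPLY[op](left, right))

    def reduce_while(at_least: int) -> None:
        while len(stack) >= 3 and PREC[stack[-2]] >= at_least:
            reduce_top()

    for tok in re.findall(r"\d+|[∧<>=+\-*]", text):
        if tok.isdigit():
            stack.append(int(tok))
            continue
        reduce_while(PREC[tok] + 1)  # first reduce operators that bind tighter
        if tok in RELATIONS and len(stack) >= 3 and stack[-2] in RELATIONS:
            # R7: "w rel w1 rel1 ..." continues as "w rel w1 ∧ w1 rel1 ..."
            w1 = stack[-1]
            reduce_top()
            reduce_while(PREC["∧"])
            stack.extend(["∧", w1])
        else:
            reduce_while(PREC[tok])  # equal precedence: evaluate left to right
        stack.append(tok)
    reduce_while(0)
    return stack[0]


print(evaluate("12=12=12"), evaluate("5+6*2-1"))
```

For example, evaluate('12=12=12') applies the R7 rewrite once and returns 1, mirroring the integer part of the example in section 5.2.4, while evaluate('5+6*2-1') returns 16.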
When the operator ! is coerced to + it is moved to the input buffer to allow for the possibility that the operator to the left has higher precedence than + (R19).

Ex: 12 -3*25 → 12+-3*25 → 12+-75

5.2.6 Relations on Strings.

R20: (m, s str rel str1, x) → (m, s nil # str, rel # str1 x)
R21: (m, s nil # str, x) → (m, s Encode(str), x)
R22: (m, s nil # i, x) → (m, s Decode(i), x)

The function Encode maps every string into the unique non-negative integer given by its representation in 7-bit ASCII. Decode is the inverse of Encode. Strings of the same length are ordered lexicographically by the ASCII collating sequence. Shorter strings precede (are less than) longer strings.

Ex: 'B' < 'AB' → 1

5.2.7 Strings.

R23: (m, s 'cs' & 'cs1', x) → (m, s 'cs cs1', x)
R24: (m, s str & con, x) → (m, s str & Qt(con), x)
R25: (m, s str ! v, x) → (m, s str, & v x)
R26: (m, s hd ! 'c cs', x) → (m, s 'c', x)
R27: (m, s tl ! 'c cs', x) → (m, s 'cs', x)
R28: (m, s len ! str, x) → (m, s Len(str), x)
R29: (m, s nil ! str, x) → (m, s nil, Unq(str) x)

The basic string operators are & (concatenate) and the functions hd, tl, and len, which return the head (first character), tail (all but the first character), and length (number of characters) of a string. The ! operator coerces to & when preceded by a string (R25). Constants other than strings are concatenated after being coerced by the function Qt, which first replaces every quote in the constant by a pair of quotes, and then encloses the result in single quotes. Unq is the inverse of Qt. Since the result of the unquote operation may be an expression rather than an operand, it is placed in the input buffer (R29).

Ex: hd'ABC' & tl'DEF' → 'A'&'EF' → 'AEF'
Ex: 'N=' 12+3 → 'N='&15 → 'N=15'
Ex: len'MC''COY' * !'12*3' → 6*12*3 → 216

5.3 Data Structures.

5.3.1 Lists.

D1: (m, s (vs) & (vs1), x) → (m, s (vs vs1), x)
D2: (m, s hd ! (v, vs), x) → (m, s v, x)
D3: (m, s tl ! (v, vs), x) → (m, s (vs), x)
D4: (m, s len ! list, x) → (m, s Len(list), x)
D5: (m, s int .. int1, x) → (m, s Seq(int, int1), x)

The basic list operators are & (concatenation), hd (head), tl (tail), and len (length). As in LISP, hd and tl are undefined for the empty list (). The operator .. (sequence) generates a list of consecutive integers in ascending or descending order, starting with int and ending with int1.

Ex: 0..-2 & 0..2 → (0,-1,-2,)&(0,1,2,) → (0,-1,-2,0,1,2,)
Ex: hd(5,6)+len(tl(5,8)) → 5+len((8,)) → 6

5.3.2 List Subexpressions.

D6: (m, s w ( vs nil, , x) → (m, s w ( vs nil, x)
D7: (m, s w ( vs v, , x) → (m, s w ( vs v, nil, x)
D8: (m, s w ( vs nil, ) x) → (m, s w, (vs) x)
D9: (m, s w ( vs v, ) x) → (m, s w, (vs v,) x)

Subexpressions of the form (e1, e2, ..., en,) evaluate to lists of the form (v1, v2, ..., vn,), where vi is the value of expression ei. Nil entries are deleted (D6), and the abbreviation (e1, e2, ..., en) is permitted for lists of more than one element (D9). Since (e1) yields v1 by a previous rule, it cannot evaluate to the singleton list (v1,).

Ex: (5+6,,7+8) → (11,15,)

5.3.3 Distribution.

D10: (m, s list pr list1, x) → (m, s, Dist(list pr list1) x)
D11: (m, s list pr prim, x) → (m, s, Dist(list pr prim) x)
D12: (m, s prim pr list, x) → (m, s, Dist(prim pr list) x)
D13: (m, s nil pr list, x) → (m, s nil, Dist(nil pr list) x)
D14: (m, s con .op con1, x) → (m, s nil, Dist(con op con1) x)
D15: (m, s nil .op con, x) → (m, s nil, Dist(nil op con) x)

Dt: (f) op ( ) → ( )
    ( ) op (f) → ( )
    (t1, f1) op (t2, f2) → (t1 op t2, (f1) op (f2))
    (t1 f1) op (t2 f2) → (t1 op t2)
    w op (t2, f2) → (w op t2, w op (f2))
    w op (t2) → (w op t2)
    (t1, f1) op w → (t1 op w, (f1) op w)
    (t1) op w → (t1 op w)
    w op w1 → (w op w1)

Dt* is defined for all binary expressions whose terms are (a) a pair of subexpressions (t1, t2, ..., tn) and (t1', t2', ..., tm') or (b) a subexpression and an operand w.
In the first case, the result is a subexpression of the form (t1 op t1', ..., tk op tk'), where k is the minimum of n and m. In the second case, the result is a subexpression of the form (t1 op w, ..., tn op w) or (w op t1, ..., w op tn). The first two rules for Dt define an exception: if statement tk, tk', or tn is empty, the last statement of the result will be empty. For example, ()+() evaluates to () and not (+), since the first rule applies with f = λ.

5.3.4 Reduction.

D16: (m, s list pr nil, x) → (m, s nil, Reduce(list pr) x)
D17: (m, s list .op nil, x) → (m, s nil, Reduce(list op) x)

The function Reduce constructs a subexpression consisting of list components separated by the given operator. The next operator must be a delimiter; otherwise it will be taken as a prefix operator and pushed onto the stack before rule D16 or D17 can be applied. For example, ((5,6,7,)+) → ((5+6+7)) → 18. Note that both distribution and reduction apply to matrices of any order. For example, to multiply the corresponding elements of two 3-dimensional matrices A and B and sum the components of the result matrix, we write (((A*B+)+)+).

Reduce = Rd*, where Rd is the function defined below. Reduction is carried out by the repeated application of the second rule for Rd.

Rd: (t1,) op → (t1)
    (t1, t2, vs) op → (t1 op t2, vs) op
    ( ) op → ( )

5.3.5 Records.

D18: (m, s (fs) & (fs1), x) → (m, s (fs fs1), x)
D19: (m, s dom ! rec, x) → (m, s Dom(rec), x)
D20: (m, s rge ! rec, x) → (m, s Rge(rec), x)

Operators on records include & (concatenate) and the built-in functions dom (domain) and rge (range), which return respectively the list of field names in a record and the list of values it contains.

Ex1: (A:5,B:7,) & (C:9,) → (A:5,B:7,C:9,)
Ex2: dom(A:5,B:7,) & rge(C:9,) → (A,B,) & (9,)

5.3.6 Record Subexpressions.

D21: (m, s w ( fs name, : y) → (m, s w ( fs name: nil, y)
D22: (m, s w ( fs field, , y) → (m, s w ( fs field, nil, y)
D23: (m, s w ( fs nil, ) y) → (m, s w, (fs) y)
D24: (m, s w ( fs field, ) y) → (m, s w, (fs field,) y)

An expression of the form (n1: e1, ..., nk: ek,), where the ni are field names and the ei are expressions with value vi, yields a record of the form (n1: v1, ..., nk: vk,) upon evaluation. As with lists, the final comma can be omitted.

Ex1: (A:5+0,B:3*2) → (A:5,B:6,)
Ex2: (A,B).:(5,6) → (A:5,B:6,)

5.4 Variables.

5.4.1 Assignment.

V1: (m, s con ev var, x) → (Find(m,var), s con ev var, x)
V2: (m, s nil ev var, x) → (Find(m,var), s nil ev var, x)
V3: (y ▸con y1, s con1 → var, x) → (y con1 y1, s con1, x)
V4: (y ▸v y1, s val ! var, x) → (y v y1, s v, x)

The operations of assignment (→) and retrieval are carried out in two steps: first, the referenced value is located; then it is either replaced by a constant (V3) or retrieved (V4). The value of Find(m,var) is the memory string m with the cursor ▸ inserted just before the value referenced by the variable var. For example, if m = '(A:5,B:(3,4,),)', then

Find(m,B) = '(A:5,B:▸(3,4,),)'
Find(m,B.2) = '(A:5,B:(3,▸4,),)'
Find(m,C.1) = '(A:5,B:(3,4,),▸)'

Records are searched from left to right until a matching name is found or failure occurs, as in the last example. Lists are indexed starting with 1, so B.2 references the second item in list B. Lists and records may be nested to any depth, and variables may have any number of field names and indices as subscripts. Multiple assignment is possible: with B, C, and D bound as in (B:0,C:5,D:6,), the expression val!B → C → D leaves (B:0,C:0,D:0,).

5.4.2 Input/Output.

V5: (y ▸(v, vs) y1, s read ! var, x) → (y (vs) y1, s v, x)
V6: (y ▸(vs v,) y1, s pop ! var, x) → (y (vs) y1, s v, x)
V7: (y ▸(vs) y1, s con →> var, x) → (y (vs con,) y1, s var.Len((vs con,)), x)

The built-in functions read and pop and the operator →> (write or push) are defined on variables which reference a list. read and pop remove the first and last element of the list respectively and return it as their value; →> appends its left operand to the end of the list and returns a subscripted variable that references the appended element. For example, if file F1 is initially bound to the null list (), we have

((A:5,B:6,) →> F1, val(F1.1), read(F1),) → (F1.1, (A:5,B:6,), (A:5,B:6,),)

5.4.3 Dereferencing.

V8: (y ▸con y1, s ref ! var, x) → (y con y1, s var, x)
V9: (y ▸v y1, s w ev var, x) → (y v y1, s w ev v, x)
V10: (m, s deref ! con, x) → (m, s con, x)

The function ref returns a reference to a constant when applied to a variable var. If var directly references a constant, var itself is returned (V8). If var directly references another variable, that variable is retrieved (V9) and dereferenced in turn. Rule V9 ensures that, if the chain of references is neither dangling nor circular, → and →> will coerce their right operands to a reference to a constant, and other evaluators will obtain a constant. In particular, the function con returns the constant indirectly referenced by its argument (V10).

The last rule permits the functions val, ref, and con in the class Deref to be applied componentwise to parameter lists containing constants as well as variables. For example, if m = '(A:5,B:A,C:B,D:C,)', then by rules D14, V4, V8, and V10

val.!(5,A,B,C,D,) → (5,5,A,B,C,)
ref.!(5,A,B,C,D,) → (5,A,A,A,A,)
con.!(5,A,B,C,D,) → (5,5,5,5,5,)

5.4.4 Subscripts.

V11: (m, s var ev w, x) → (Find(m,var), s var, ev w x)
V12: (y ▸list y1, s var, ! var1 x) → (y list y1, s var, ! con!var1 x)
V13: (y ▸list y1, s var, ! i x) → (y list y1, s var.i, x)
V14: (y ▸rec y1, s var, ! name x) → (y rec y1, s var.name, x)
V15: (y ▸stru y1, s var, ! list x) → (y stru y1, s nil, Dist(var!list) x)
V16: (y ▸v y1, s var, ev w x) → (y v y1, s v, ev w x)

The operator ! (subscript) appends an index to a list-valued variable (V13) and a subfield name to a record-valued variable (V14). Variables used to index a list are coerced to their values (V12). Variables distribute over lists of subscripts (V15). Evaluators other than ! dereference their left-hand operand and then their right-hand operand, if both are variables. Names of built-in functions and types are exceptions: since they are also constants, rules V… through V… will be applied first. If the right-hand variable dereferences to a constant for which the function is not defined, the function name will be dereferenced.

For example, suppose that memory contains an array A and a record Employee as follows:

m = '(A:(1,2,3,), K:2, Employee:(Name:(First:"Joe", Last:"Smith",), Skills:(5,6,8,9,),),)'

Then the examples below are valid:

A(K) → A!K → A!2 → A.2
Employee Name(First) → Employee.Name.First
A(A) → A!(1,2,3,) → (A.1,A.2,A.3,)
Employee(Name, Skills) → (Employee.Name, Employee.Skills,)
len!A → len!(1,2,3,) → 3
A → A → (1,2,3,)

5.4.5 Environment.

V17: (y ▸) y1, s ref $ s1 env, x) → (y) y1, s ref $ s1 ref, x)
V18: (y ▸) y1, s ref $ s1 var, x) → (y) y1, s ref $ s1 ref.var, x)
V19: (y ▸) y1, s var, x) → (y) y1, s null, x)

Variables without a corresponding value default to null (V19). There is an exception: if the variable is not global (i.e. it fails to match a field name of the memory record), if memory contains the global variable r bound to a list of records (local environments), and if the execution stack contains a reference followed by $, then the reference is appended to the variable (V18) so that Find will try to resolve the variable in the local environment.
If Find fails a second time the exception no longer holds, since r is global, and null is returned. The special variable env returns a reference to the current local environment (V17). For example, in the memory m below, sysin, sysout, and r are global variables, while job, rate, and code are local variables.

m = '(sysin:("12 10","30 50",), sysout:("TOTAL=56 92",), r:((job:5,rate:2.50,code:300,), (job:6,rate:3.10,code:r.1.code,), (job:7,rate:4.25,code:r.2.code,),),)'

An attempt to dereference the variable A leads to the following computation, where y and y1 are strings such that m = y) = y1),),):

(m, ⊢r.3$+A, ⊣)
(y▸), ⊢r.3$+A, ⊣)
(y), ⊢r.3$+r.3.A, ⊣)
(y1▸),),), ⊢r.3$+r.3.A, ⊣)
(m, ⊢r.3$+null, ⊣)

5.4.6 Blocks.

V20: (m, sd ref, $ y) → (m, sd ref $ nil, y)
V21: (m, sd var, $ y) → (m, sd ref ! var, $ y)
V22: (m, sd rec, $ y) → (m, sd rec →> r, $ y)
V23: (m, sd w1, ; y) → (m, sd nil, y)
V24: (m, s ref $ w, , y) → (m, s w, , y)
V25: (m, s ref $ w, ) y) → (m, s w, ) y)

where sd ∈ Stack(( | ⊢ | Delimiter).

A block is a statement of the form e $ t, where e is a declaration (i.e. an expression whose value is a record or a record-valued variable) and t is a statement called the scope of the declaration. When the value of e is a reference, it is pushed onto the stack (V20). Variables are resolved first (V21), and records are pushed onto the environment stack (V22). Statement t is evaluated in the new environment. When t is terminated by a comma or ), the environment pointer is deleted and the previous environment restored (V24, V25). The value of a compound statement of the form e1; e2; ...; ek is the value of ek (V23). Note that compound statements presuppose the existence of variables: if the first k-1 expressions had no side effects, there would be no reason to evaluate them.
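The lookup behavior of Find (section 5.4.1), the local-environment fallback of rules V18 and V19, and the push and pop of activation records on r can be sketched in Python. The model is an assumption of ours, not SIBYL's string representation: a record is a hypothetical Rec class holding ordered (name, value) pairs, a list is a Python list indexed from 1 as in SIBYL, a variable is a path of names and indices, and None stands for null. The names find, resolve, enter_block, exit_block, and NOT_FOUND are illustrative only.

```python
from typing import Any, List, Optional, Union

Path = List[Union[str, int]]


class Rec:
    """Hypothetical record model: ordered (name, value) pairs; duplicate names allowed."""

    def __init__(self, *fields):
        self.fields = list(fields)


NOT_FOUND = object()  # plays the role of Find leaving the cursor before ")"


def find(value: Any, path: Path) -> Any:
    """Follow field names and 1-based indices through nested records and lists."""
    for step in path:
        if isinstance(value, Rec) and isinstance(step, str):
            # Records are searched from left to right; the first match wins.
            for name, v in value.fields:
                if name == step:
                    value = v
                    break
            else:
                return NOT_FOUND
        elif isinstance(value, list) and isinstance(step, int) and 1 <= step <= len(value):
            value = value[step - 1]  # lists are indexed starting with 1
        else:
            return NOT_FOUND
    return value


def resolve(memory: Rec, path: Path, env_ref: Optional[int] = None) -> Any:
    """Global lookup, then the local environment r.n (cf. V18), else null (cf. V19)."""
    value = find(memory, path)
    is_global = find(memory, path[:1]) is not NOT_FOUND
    if value is NOT_FOUND and not is_global and env_ref is not None:
        value = find(memory, ["r", env_ref] + path)
    return None if value is NOT_FOUND else value


def enter_block(memory: Rec, local: Rec) -> int:
    """Append an activation record to r (cf. V22) and return n, so r.n refers to it."""
    envs = find(memory, ["r"])
    envs.append(local)
    return len(envs)


def exit_block(memory: Rec) -> Rec:
    """Delete the innermost activation record, as pop(r) does."""
    return find(memory, ["r"]).pop()


m = Rec(
    ("sysin", ["12 10", "30 50"]),
    ("r", [Rec(("job", 5), ("rate", 2.50)),
           Rec(("job", 6), ("rate", 3.10)),
           Rec(("job", 7), ("rate", 4.25))]),
)
print(resolve(m, ["job"], env_ref=3))  # 7: job is local to r.3
print(resolve(m, ["A"], env_ref=3))    # None: A is neither global nor in r.3
```

The sketch resolves a whole path in one step; in the definition, Find only places the cursor ▸, and separate rules (V3, V4, V8, V9) then replace or retrieve the value.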
The block below contains the declaration of a single variable, Pay, whose value is a record with two fields. Note the second use of $ to simplify record processing.

(Pay:(Regular:(Hrs:8, Rate:4.53, Amount:0), Overtime:(Hrs:8, Rate:6.75, Amount:0))) $
  Pay.Regular.Hrs * Pay.Regular.Rate → Pay.Regular.Amount; (Pay.Overtime $ Hrs * Rate → Amount)

The relationship between program and data structure is more obvious if we write

(Pay $ (Regular $ Hrs * Rate → Amount, Overtime $ Hrs * Rate → Amount))

In the example below, the variable A is declared local to three successive inner blocks.

(A:5,B:0) $ ((A:7) $ A+B → A → B); ((A:8) & env $ A+B → A → B); ((A:9) & (dom(env).:ref.!dom(env)) $ A+B → A → B; pop(r));

The first block is closed, i.e. the environment of the enclosing block is inaccessible: A+B is undefined and A will be assigned the value null. In the second block, A+B is evaluated in an environment formed by concatenating a copy of the enclosing environment to the local activation record. The local variable A will be set to 8 and the enclosing environment will be unaffected. This permits backtracking behavior as in the PLANNER language. The third subexpression resembles an Algol 60 block: the local variable B is bound to a reference to the nonlocal B, which will be reset to 9. Since the second occurrence of A in the local activation record is inaccessible, the nonlocal A will be unchanged. As in Algol, the activation record is deleted when the block is exited.

5.5 Procedures.

5.5.1 Execution.

P1: (m, s nil ! [f], x) → (m, s nil, (f) x)
P2: (m, s [name $ f] ! con, x) → (m, s nil, ((name: con) $ f) x)
P3: (m, s [list $ f] ! list1, x) → (m, s nil, ((list .: ref.!list1) $ f) x)

A procedure is a form enclosed in square brackets.
The prefix operator ! (execute) causes a procedure to be executed by replacing the brackets with parentheses and placing the resulting subexpression in the input buffer for evaluation (P1). The infix operator ! (apply) applies a procedure to a constant (P2) or a parameter list (P3). In the second case, the function ref coerces variable parameters to references in the environment of execution; then the operator .: constructs an activation record which binds formal parameters to (references to) actual parameters. This is illustrated in the example below, where each procedure is used to increment A by 5.

P: [A+5 → A]            ... !P
F: [X $ X+5]            ... F(A) → A
G: [(X,Y) $ X+5 → Y]    ... G(A,A)
H: [R $ R $ X+5 → Y]    ... H(X:val(A), Y:ref(A))

5.5.2 Environment.

P4: (m, s rec ! [e $ f], x) → (m, s [e & rec $ f], x)

Rule P4 permits part of the local environment of a procedure to be computed at definition time and then inserted into the procedure. Its use is illustrated by the definition of procedures I and J below:

(A:50,B:60) $ (A:5, B:6,
  F:[X & (A:10) $ X + A → A],
  G:[X & env $ X + A → A],
  H:[X & (A:ref(A)) $ X + A → A],
  I:env [X $ X + A → A],
  J:(A:ref(B)) [X $ X + A → A]) $ F(5); G(5); H(5); I(5); J(5)

F(5) = 15, since A is a local variable initialized to 10. G(5) = 10, but since G contains a copy of its environment of execution, it has no side effects. H(5) returns 10 and modifies its environment of execution, since the local variable A is bound to a reference. I(5) returns 55 and has no side effects, while J(5) resets B in the environment of definition to 65.

The procedure Common defined below accepts records and record-valued references as input, and returns a record of references.

Common: [(R,) $ R $ [dom(env)](dom(env))]

Appending the value of Common(R) to a procedure's environment permits the procedure to reference (access and modify) the components of a record R by name.
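The contrast between F, G, and H above (a fresh local, a copied environment, and a local bound to a reference) can be imitated with mutable Python dictionaries. This is only an analogy under stated assumptions, not SIBYL's mechanism: an activation record is modeled as a dict, Ref is a hypothetical reference cell standing in for a SIBYL reference such as r.n.A, and the helper run plays the role of the shared body X + A → A.

```python
import copy


class Ref:
    """Hypothetical reference cell naming one field of an environment dict."""

    def __init__(self, env: dict, name: str):
        self.env, self.name = env, name

    def get(self):
        return self.env[self.name]

    def set(self, value):
        self.env[self.name] = value


def lookup(local: dict, name: str):
    v = local[name]
    return v.get() if isinstance(v, Ref) else v


def assign(local: dict, name: str, value):
    v = local.get(name)
    if isinstance(v, Ref):
        v.set(value)          # assignment through a reference reaches the outer variable
    else:
        local[name] = value   # plain local: the enclosing environment is untouched
    return value


def run(activation: dict, x: int) -> int:
    """Body shared by F, G, and H: bind X, then evaluate X + A -> A."""
    activation["X"] = x
    return assign(activation, "A", x + lookup(activation, "A"))


outer = {"A": 5, "B": 6}              # environment of execution (A:5, B:6)
f = run({"A": 10}, 5)                 # F: local A initialized to 10
g = run(copy.copy(outer), 5)          # G: works on a copy of the environment
after_g = outer["A"]
h = run({"A": Ref(outer, "A")}, 5)    # H: local A bound to a reference
after_h = outer["A"]
print(f, g, after_g, h, after_h)      # 15 10 5 10 10
```

Only H changes the outer A, matching the statement that H(5) modifies its environment of execution while G(5) has no side effects.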
Common can be used to permit shared COMMON blocks, as in Fortran, or to support own variables and recursion, as in the example below.

Factorial: Common(Uses:0) [N & Common(env,) $ Uses + 1 → Uses; N = 0 => 1, N * Factorial(N-1)]

The first instance of Common pushes the record (Uses:0) onto the environment stack and returns a record of the form (Uses:r.n.Uses), which is appended to the local environment of procedure Factorial. The second instance of Common produces a record of references to the enclosing environment when Factorial is executed. A field of the form Factorial:r.n.Factorial must be included, so that the local environment of Factorial will include references to both the (nonlocal) 'own' variable Uses, which is used to count the number of times Factorial is executed, and Factorial itself.

5.5.3 Coercion.

P5: (m, s [f] pr [f1], x) → (m, s nil, Dist((f) pr (f1)) x)
P6: (m, s [f] pr con, x) → (m, s nil, Dist((f) pr con) x)
P7: (m, s con pr [f], x) → (m, s nil, Dist(con pr (f)) x)
P8: (m, s nil pr [f], x) → (m, s nil, Dist(nil pr (f)) x)

The rules above coerce a procedure to a value by executing it. When the procedure consists of a list of statements, the operator and the other operand are distributed over it, as in the example below.

9 + [6 → A, 7 → B] → (9 + 6 → A, 9 + 7 → B)

Note that if parentheses were used in place of square brackets, the assignment would be carried out before the addition rather than after.

The coercion rules above also permit nonprocedural language features to be modeled. For example, the first three expressions below can be executed in any order without changing the meaning (output) of the procedure.

[[X*X] → A; [Y*Z] → B; [A-B] → C; 0.→(X,Y,Z); !C →> Output; 25.→(X,Y,Z); !C →> Output]

5.6 Control Structures.

The conditional and loop operators => and @ are defined only for states of the form (m, sd w, x), where sd is a stack belonging to the syntactic class Stack(( | ⊢ | Delimiter).

5.6.1 Conditionals.
C1: (m, sd true, => t f) y) → (m, sd nil, t) y)
C2: (m, sd false, => t f) y) → (m, sd nil, f) y)

A subexpression of the form (e1 => t1, e2 => t2, ...) has the expected meaning: successive expressions ei are evaluated until the value true is returned; then the corresponding statement ti is executed and the subexpression is exited. Some examples are given below along with their syntactically sugared equivalents.

(X > Y => 25 → Y)          ≡ if X > Y then 25 → Y fi
(P(x) => A, B)             ≡ if P(x) then A else B fi
(P(x) => Q(x) => 1, 0)     ≡ (P(x) and Q(x))
(P(x) => 1, Q(x) => 1, 0)  ≡ (P(x) or Q(x))

In the last two examples Q(x) is executed only when P(x) fails to determine the value of the Boolean expression.

5.6.2 Loops.

C3: (m, sd nil, @ t) y) → (m, sd nil, t, @ t) y)

A subexpression of the form (@t) causes statement t to be executed repeatedly until the loop is exited (or ⊢ is entered from the terminal). Two possible forms are illustrated below together with their syntactically sugared equivalents.

(@ e1; e2 =>)     ≡ do e1 until e2 od
(@ e1; e2 => e3)  ≡ do e1 until e2 returning e3 od

The program fragment below copies nonzero integers from an input file to an output file, skipping zeroes and exiting when the file is exhausted or when a noninteger value is encountered. Assume exit = 1 and repeat = 0.

(@(len(input) = 0 => exit, read(input) → x; ¬int(x) => exit, x = 0 => repeat, x →> output; repeat) =>)

By enclosing conditionals in square brackets and applying the coercion rules of section 5.5.3, we obtain the forms of labelled case expressions shown below.

i = [1 => P(x), 2 => Q(x), i => Default(x)]
x = [Apples => Case1(), Oranges => Case2(), Fruit => 0]
i = [1 => A, 3 => B, (2,4,9)∨ => C, (5..8)∨ => D]
x < [0 => error, 10 => rate1, 20 => rate2, 0; error]

5.6.3 Indexed Loops.

C4: (m, sd 0, @ t f) y) → (m, sd nil, f) y)
C5: (m, sd i, @ t f) y) → (m, sd nil, t, i-1 @ t f) y)

A subexpression of the form (f1; i@t f2) is evaluated by executing f1, executing statement t i times (or until the loop is exited), and then executing f2. The four examples below perform the following actions: copying 10 records from file A to file B; selecting a subset of a list of names using a bit map; creating a vector; creating a 2 by 3 matrix.

(10@ read(A) →> B;) → ()
(1,0,1).@(FRED,JOE,MIKE) → (FRED,MIKE,)
(1,2,3@0,2@11) → (1,2,0,0,0,11,11,)
(2@(3@0)) → ((0,0,0,),(0,0,0,),)

Note that the value of the first expression is the empty list because the statement is terminated with a semicolon. Note also that dimensions are given in the same order as subscripts: when A = (2@(3@0)), A.2.3 is defined but A.3.2 is not.

5.6.4 For Loops.

C6: (m, sd '', @ t f) y) → (m, sd nil, f) y)
C7: (m, sd str, @ t f) y) → (m, sd hd!str, t, tl!str @ t f) y)
C8: (m, sd (), @ t f) y) → (m, sd nil, f) y)
C9: (m, sd list, @ t f) y) → (m, sd hd!list, t, tl!list @ t f) y)

A subexpression of the form (list@ → x; e) evaluates expression e for every value x in list. Loops can be nested, as in the second example below.

((1,3,5,2)@ → I; A(I) + B(I) → C(I))
(1..5@ → I; 4..20@ → J; e(I,J))

Lists can be constructed from character strings and matrices as follows:

('ABC'@) → ('A','BC'@) → ('A','B','C',)
(((0,0),(1,1))@@) → ((0,0)@,(1,1)@) → (0,0,1,1,)

Using the distributive property of the subscript operator, we can reverse the rows and columns of a 9 x 8 x 7 matrix M by componentwise assignment:

(M(9..1)@(8..1)@(1..7)) → (M(1..9)@(1..8)@(1..7))

5.6.5 Coercion.

C10: (m, sd var, => y) → (m, sd val!var, => y)
C11: (m, sd var, @ y) → (m, sd val!var, @ y)

Operators => and @ are the only delimiters which coerce their operands. Since variables are dereferenced, expressions like the one below are meaningful.
(P => N@0)

The operators .=> and .@ also dereference variable operands. In the example below, the expressions on the left have the meaning suggested by the syntactically sugared equivalent expressions on the right when L is a list.

(L > 0).=> L  ≡ first x in L such that x > 0 end
(L > 0).@ L   ≡ all x in L such that x > 0 end

5.7 Self-Extension.

S1: (m, s w ev w1, x) → (m, s ext!(w,'ev',w1), x)
S2: (m, s ( f, ⊣ y) → (m, s err![f], y)
S3: (m, s w, op y) → (m, s err![w 'op'], y)

Since rules S1, S2, and S3 are the last rules of the definition, they provide a default action for states to which no other rule applies. Let us assume that procedures ext and err always return a value. (This will be the case if they are undefined, by rule R1 in section 5.2.1.) Then we can prove

Theorem: Trivial expressions (those containing only separators) reduce to nil. Nontrivial expressions whose evaluation terminates always reduce to a value.

Proof: Rules S1 and E8 together ensure that the theorem holds for expressions whose subexpressions yield a value. Rules S2, S3, and E6 ensure that every subexpression whose evaluation terminates yields a value. Q.E.D.

Subexpressions or portions of subexpressions which cannot otherwise be evaluated are encapsulated in square brackets and passed to an error procedure for correction or display (S2, S3). An erroneous delimiter or ⊣ is deleted before being pushed onto the stack, so that an interactive user can enter a correction.

The language extension procedure ext is applied to binary expressions which are otherwise undefined. Since ext obeys the same scope rules as other user-defined procedures, extensions can be made local to particular blocks. For example, the operator * will be extended so that 3 * 'A' ≡ 'A' * 3 ≡ 'AAA' for all expressions whose environment includes

ext: [(w,op,w1) $ (w,op,w1) = [(int,'*',str)∧ => (w@w1).&, (str,'*',int)∧ => (w1@w).&]],

6.0 SIBYL Pragmatics.
The meaning assigned to a program by a formal language specification is in general incomplete, in that it fails to describe the interaction of the program with the peripheral devices and software of the particular installation on which the program is run. For example, the expression '+TOTAL=10' →> P acquires additional meaning when output file P is attached to a printer that uses + as a carriage control.

The notation introduced in section 3.6 permits the definition of computer installations as networks of concurrently executing hardware and software modules, and provides a means for defining parallel and nondeterministic language features. To see this, consider the hardware configuration diagrammed in figure 5. It consists of an operator's console, a tape unit, a printer, and three processors, all sharing a common memory. Assuming that the string automata Console, Tape, and Printer have been defined, the configuration is completely specified by

Inst(R) = I1 + I2 + I3 + C + T + P

where
I_i = Processor(Mem, Stack_i, Input_i),
C = Console(Mem, Display, Input_1, Keys),
T = Tape(Mem, Reel),
P = Printer(Mem, Output),
R = (Mem, Display, Keys, Reel, Output) & Stack & Input,

[Figure 5: Multiprocessor System (console C with Display and Keys, tape unit T with Reel, printer P with Output, and processors with Stack and Input registers, all sharing Mem)]

and Processor is defined by

Processor = E & R & D & V & P & C & S,
V = (V3..9 & V12..19) o (V1..2 & V10 & V20..25)

The function composition operator o is used to combine the two steps of a memory access into one indivisible operation. This eliminates intermediate states of the form (y▸y1, stack, input) and permits multiple processors to share the same memory register.

6.1 Parallel Tasks.

If the memory, stack, and input registers are initialized so that

Mem = (r:(), Tasks:(), C:(), T:(), P:(),)
Stack_i = ⊢
Input_i = (@ !read(Tasks);)

programs entered from the console can initiate parallel tasks by placing them on the Tasks queue for subsequent execution by processors I2 and I3. An example is

⊢((Producer:[e1], Consumer:[e2],) $ Producer →> Tasks; Consumer →> Tasks)⊣

Let us suppose that Producer is a procedure that reads records from the tape file T, processes them, and passes them on to Consumer via a queue Q. Consumer processes records from Q and outputs them to the printer via file P. The program that spawns both tasks must supply them with Q and a procedure Read that waits until a file contains an entry before returning to the calling program.

((Read:[X & (Y:0) $ @ ¬((read(X) → Y) = null) => val(Y)], Q:(),) $
 (Producer: env [(R:0) $ @ Read(T) → R; e1; R →> Q; R = () =>],
  Consumer: env [(R:0) $ @ Read(Q) → R; e2; R →> P; R = () =>],) $
 (Producer, Consumer)@ →> Tasks)

In the example above, both tasks continue to execute after the main program terminates, and remain active until the empty record () (used here as an end-of-file marker) is read and passed on.

6.2 Semaphores and Coroutines.

Queues like Q in the previous example may be used as semaphores. In the example below, semaphore S ensures that only one task at a time is in its critical section, while C1 and C2 are used to simulate coroutine linkage.

(Pause: [X $ @ ¬(read(X) = null) =>],
 S:(1,), C1:(), C2:(),
 Producer: [e1; Pause(S); critical section1; 1 →> S; e2],
 Consumer: [e3; Pause(S); critical section2; 1 →> S; e4],
 Coroutine1: [e1; 1 →> C2; Pause(C1); e3; 1 →> C2],
 Coroutine2: [Pause(C2); e2; 1 →> C1; Pause(C2); e4]) $
(Producer, Consumer, Coroutine1, Coroutine2)@ →> Tasks;

7.0 Summary.
The semantics of both the hardware and software components of a computing system can be precisely described by a formal notation based on lists of transition rules over n-tuples of character strings. The formalism used in this paper seems especially suitable for describing artificial languages to their users, for the following reasons:

(1) The notation is simple and requires little mathematical sophistication on the part of its users.
(2) Definitions are operational. They provide a straightforward procedure for evaluating expressions in the defined language.
(3) Every state of a computation has a unique concrete representation as a tuple of character strings.
(4) Definitions are modular. Transition rules can be reordered and combined in various ways to facilitate learning by different classes of users, or to provide alternate views of the same language as an aid to understanding.

These same characteristics make the formalism attractive as a language design tool. Its simplicity makes it amenable to mathematical definition and analysis, its operational nature permits the automatic synthesis of interpreters for example programs, and its modularity supports a building-block approach to language design.

REFERENCES

Aho, A. V. and J. D. Ullman [1972] The Theory of Parsing, Translation and Compiling. Prentice-Hall (Englewood Cliffs, N.J.).

Backus, J. W. [1959] "The Syntax and Semantics of the Proposed International Algebraic Language of the Zurich ACM-GAMM Conference." Proc. International Conf. on Information Processing, UNESCO, pp. 125-132.

Dennis, J. B. [1972] "On the Design and Specification of a Common Base Language." MAC TR-101, MIT Project MAC (Cambridge).

Feyock, S. [1975] "Toward an Implementation of the Vienna Definition Language." Proc. 1975 International Conference on ALGOL 68, Oklahoma State Univ. (Stillwater), pp. 370-384.

Garwick, J. V. [1966] "The Definition of Programming Languages by their Compilers." Formal Language Description Languages for Computer Programming (Proc. IFIP Working Conf. 1964) (Steel, T. B., Ed.). North-Holland Publ. Co. (Amsterdam), pp. 266-294.

Goguen, J. A., J. W. Thatcher, E. G. Wagner and J. B. Wright [1975] Initial Algebra Semantics and Continuous Algebras. Research Rept. RC 5243, Thomas J. Watson Research Center (Yorktown Heights, New York).

Herriot, R. G. [1971] The Definition of the Control and Environment Structure of Programming Languages. (Ph.D. Thesis) Univ. of Wisconsin (Madison).

Hoare, C. A. R. [1969] "An Axiomatic Basis for Computer Programming." Comm. ACM 12:10, pp. 576-580.

Hoare, C. A. R. and P. E. Lauer [1973] "Consistent and Complementary Formal Theories of the Semantics of Programming Languages." Tech. Rept. 44, Univ. of Newcastle-upon-Tyne (Newcastle-upon-Tyne).

Irons, E. T. [1970] "Experience with an Extensible Language." Comm. ACM 13:1, pp. 31-40.

Kampen, G. R. [1973] SIBYL: A Formally Defined Interactive Programming System Containing an Extensible Block-Structured Language. (Ph.D. Thesis) Tech. Rept. #73-06-16, Computer Science Group, University of Washington (Seattle).

Kampen, G. R. and J. L. Baer [1975] "The Formal Definition of Semantics by String Automata." Computer Languages V. 1. Pergamon Press, pp. 121-138.

Landin, P. J. [1965] "A Correspondence between ALGOL 60 and Church's Lambda Notation." Comm. ACM 8:2,3, pp. 89-101, 158-165.

Lucas, P., P. Lauer, and H. Stigleitner [1970] Method and Notation for the Formal Definition of Programming Languages. Tech. Rept. TR 25.087, IBM Laboratory (Vienna).

Lucas, P. and K. Walk [1969] "On the Formal Description of PL/I." Annual Review in Automatic Programming 6, Part 3. Pergamon Press (New York), pp. 105-182.

Lukaszewicz, L. [1976] A Semantics Definition System (A Preliminary Description). UIUCDCS-R-76-773, Univ. of Illinois, Dept. of Comp. Sci. (Urbana).

Marcotty, M., H. F. Ledgard and G. V. Bochmann [1976] "A Sampler of Formal Definitions." Computing Surveys 8:2, pp. 155-276.

Scott, D. and C. Strachey [1971] "Towards a Mathematical Semantics for Computer Languages." Proc. Symp. on Computers and Automata, Polytechnic Institute of Brooklyn; also Tech. Mon. PRG-6, Oxford U. Computing Lab., pp. 19-46.

Tennent, R. D. [1976] "The Denotational Semantics of Programming Languages." CACM 19:8, pp. 437-453.

U.S. [1965] COBOL: Edition 1965. Dept. of Defense, U.S. Gov't Printing Office (Washington, D.C.).

van Wijngaarden, A. et al. (editors) [1975] "The Revised Report on the Algorithmic Language ALGOL 68." Acta Informatica 5:1-3. Springer-Verlag (Berlin).

Wegner, P. [1971] "Data Structure Models in Programming Languages." SIGPLAN Not. 6:2 (Proc. Symp. on Data Structures in Programming Languages), pp. 1-54.

Wirth, N. and H. Weber [1966] "Euler: A Generalization of ALGOL and its Formal Definition." CACM 9:1, pp. 13-27; CACM 9:2, pp. 89-99.