Digitized by the Internet Archive in 2013 http://archive.org/details/translationofrel936daws .j/u,o 7 \ ^J^UIUCDCS-R-78-936 77^4 v 6 UILU-ENG 78 1729 A TRANSLATION OF RELATIONAL DATABASE QUERY LANGUAGES INTO SIBYL by David W. Dawson July 1978 A Translation of Relational Database Query Languages into SIBYL by David W. Dawson Department of Computer Science University of Illinois Urbana, Illinois July 1978 for Mary and Richard CONTENTS INTRODUCTION VL SQUARE The metalanguage 2 SIBYL 5 Relational databases 6 The three languages: comparisons 7 Example database 8 lLPHA 10 Data Basic Forms Major Productions Procedures References 11 12 13 26 30 31 Data Basic Forms Major Productions Procedures References 33 34 36 47 51 .E 52 Data Basic Forms Set Expressions Statements Procedures References 53 54 56 61 67 68 .USIONS 70 The Languages Translation SIBYL Comments 70 72 74 75 BIBLIOGRAPHY 77 INTRODUCTION In the past few years about a dozen query languages have been designed for use with relational data bases . This report formally defines three of them — ALPHA [6] , VL [13] and SQUARE [3] — by giving syntax-directed translations into SIBYL, a language designed in part to define the semantics of programming languages . The definitions of both syntax and semantics are precise and formal, and in the presence of a SIBYL interpreter can be used to implement the languages . To fully understand the translations presented in the next three chapters, the reader should understand relational databases and have a working knowledge of SIBYL. (For the latter, see [9] and [10] . ) Most readers will not wish to be so thorough; by reading the Introduction and Conclusion they can get a good impression of the contents of this report, its purpose, and its findings. By skimming the definitions and examining their complexity, the reader can investigate the relative merits of the query languages and of the utility of this definition scheme . Five parts make up the remainder of this introduction. They describe the metalanguage used to define the syntax- directed translation, the SIBYL language, relational data- bases, a broad comparison of the languages, and the example database used in illustrating the query languages. The Metalanguage . Although the metalanguage is supplemented by examples and commentary, and although a few translations are given informally by English descriptions, still the metalanguage describes the language rigorously. Our metalanguage consists of extended BNF productions, together with translations of the right sides of productions into SIBYL code. Thus syntax and semantics are both distinguished and appropriately connected. For example, suppose we want to define a tiny subset of LISP (the PLUS operator) in terms of FORTRAN. We do this by writing the syntax-directed translation schema * Sum ■*■ ( PLUS num, nunu)^ ** ^num, + nun^ * Num ■+ ( 0|1 ) + Note that the two productions on the left form a context-free grammar for our LISP subset. In particular, (PLUS 10 Oil) is a vald sentence, which is translated to (10 + 022) . This defines a translation function T, which is applied as follows: T ( ( PLUS 10 Oil) ) = ( T (10) + T (011) ) = ( 10 + 011 ) Object language sumbols, both of the query languages and of the SIBYL code, are underlined (e.g., #, GET , if R. protected = T ) . Normally query language letters are upper case and SIBYL is lower case, but this is by no means inflexible, as the last example above shows. Blanks are ignored; it is assumed that they are used when necessary and are otherwise non-significant except inside identifiers, and primitive values. Nonterminals of the grammar, like other metasymbols, :re never underlined, and are always written in lower case, in possibly abbreviated form when they occur in the right side of productions. On the left side of productions, they are written out in full, with the first letter only capitalized, and often preceded by the standard abbreviation. The pro- duction symbol itself is '-*■'. For example, the VL non- terminal 'Relational table condition' is abbreviated 'rtcond 1 On the right side of a production it appears as -*■ ... rtcond while on the left it appears as ... rtcond: Relational table condition -+■ . . . . Several metasymbols (never underlined) make the grammar more concise and readable. -- denotes alternative productions, a -> b c is equivalent to the two productions a -*■ b and a ■> c. (Lower precedence than concatenation.) ( ) -- groups forms together. * -- or more occurrences of the preceding structure. + -- 1 or more occurrences of the preceding structure. ? -- or 1 occurrences of the preceding structure, subscripts -- denote different occurrences of a nonterminal which must be distinguished in the translation. Our grammar is therefore quite similar to one popular variant of extended BNF. Where formal definitions are not given (but could be given if necessary) , English explanations are enclosed in angle brackets. Example: str: String -*• Finally, every production is preceded by '*', often fol- lowed by an index number which helps to classify the produc- tions . For example, the seven basic classes of sentence in a language would be preceded by *1 . , *2. t ..., *1 . , while *2.1, *2.2 and *2.3 would denote nonterminals used mainly in *2 . An example of a simple production is the SQUARE variable list : * 1.1 varlist : Variable list ■*■ var (^ var)* If X, Y and GEORGE are 'var's, then the following are 'varlist 's : X Y, GEORGE, X X , X Translations into SIBYL always follow their syntax productions, and are always preceded by '**' . Nonterminals are to be further translated, according to the appropriate rules . The semantic rule corresponding to the SQUARE example above is ** (_ var (j_ var) * j)_ In many places, as here, the exact correspondence between symbols (the two instances of 'var' here) is made clear to the reader by the context. (This informality is not serious.) How do we translate the three examples above? The rule for 'var' is * var: Variable -* freevar compvar ** freevar (dom compvar @ -*• d; ^compvar x d , ( ) ) I If X is in fact a 'compvar', then the first example (X) translates to (( dom X @ - d; (X, d, ( ) ) ) , ) If the SIBYL translation is identical in form to the right side of the production, then it is omitted. To aid in understanding, many SIBYL constructions have been grouped into procedures and listed at the ends of chapters . A good reference for syntax-directed translation is Aho and Ullman [l] . It should be noted that our translation scheme is not a simple syntax directedttranslation in the sense of [1] . SIBYL. SIBYL is an experimental programming language with the usual control structures (conditionals, loops, case state- ments, procedures) and data structures. These include numbers 5 -35 56.32 character strings 'ABC ' OLD MC " DONALD ' lists (1, 2, 3) (A,B,3) ('A', 'B',) (5,) records (A:5, B:6) (NAME: 'SMITHY', AGE: 15) Since lists and records can contain lists and records as values, complex data structures can be formed. A rich set of operators is provided, including arithmetic and logical operators and operators for concatenating and decomposing strings, lists, ana 1 records. SIBYL is designed to have enough power to permit unusually compact definitions, and to simulate the environment and structures of all other programming languages . One purpose of this report is to see how well SIBYL per- forms its intended primary function of defining other languages . For detailed descriptions of SIBYL, see [9] and [10] . Relational Databases. A relational database is basically a collection of tables; see the example database at the end of this introduc- tion. Each table has a name, as does every column of the tables. Tables are called "relations" or "relational tables"; columns may be "domains", "attributes" (ALPHA), "descriptors" (VL) , "fields", or "columns" (SQUARE); and the entries of the tables are "tuples", or sometimes "rows". Certain domains which uniquely determine the rest of a tuple may be singled out as "key fields". Language syntax usually doesn't specify: the exact form of relations. We have chosen to let a tuple be a set of SIBYL records, so that the value of the relation STAR is ( ( S#:0l, SNAME: 'SUN' , POSITION: 202020 , SCLASS:*G'), • ( S#:57, SNAME: 'SIRIUS B', POSITION: 162722, SCLASS.-'F'), ) For discussions of relational databases, see {5], [8] and [12] . The Three Languages : Comparisons . ALPHA, VL and SQUARE have several features in common: (1) they are fairly new; (2) they are intended to be data manipulation sublanguages of larger computer languages (and in the case of VL, of a particular language, VL1) ; (3) each provides facilities for creating, destroying and modifying relations as well as retrieving data from them, but retrieval would almost certainly be the most widely used. On the other hand, there are many interesting differ- ences as well. Only ALPHA has non- rudimentary protection and access control features . The syntax of ALPHA is deliberately not made precise, while BNF grammars appeared early in the history of both VL and SQUARE. The SQUARE language has an unusual 2-dimensional syntax, and alternative syntaxes have appeared. The importance of key fields varies from great (in ALPHA) to minimal (they are not mentioned in SQUARE) . ALPHA is based on the precise but unwieldly predicate calculus, while VL and SQUARE attempt to make querying as easy as possible for unsophisticated users . Several queries in these languages and others are com- pared in [13] , while [ll] has a very long list of queries in ALPHA, SQUARE and several other languages, but not VL. We have omitted some language features, both deliber- ately and (no doubt) accidentally. Details have been 8 omitted which would add little to the explanation but greatly complicate the implementation. For example, piped mode in ALPHA is not implemented, and there is error checking only in ALPHA. Example Database . We will use the following relational database to illustrate the languages. Key fields are underlined. STAR (S#, SNAME, POSITION, SCLASS) PLANET ( S#, P# , PNAME PTYPE) VEHICLE (V#, VCLASS, VNAME) MISSION ( V#, S#, MDATE , MTYPE) COLONY (C#, CNAME, S#, P#, CDATE) STAR S# SNAME 01 SUN 10 RIGIL 11 12 PROXIMA 23 TAU CETI 56 SIRIUS A 57 SIRIUS B POSITION SCLASS 202020 191816 191816 191816 262918 162722 162722 G G K M G A F PLANET S# P# 01 01 01 12 23 23 56 03 04 06 01 02 05 01 PNAME EARTH MARS TITAN GAGARIN PLATEAU KAMPEN CYON PTYPE E M I M E I R VEHICLE v# VCLASS VNAME P078 PROBE PIONEER P323 PROBE X2 PROBE ZOND C29 CARGO S09 SCOUT S12 STARSHIP ENTERPRISE S15 STARSHIP SCORPIO MISSION v# s# MDATE MTYPE P078 01 1981 F P215 10 2170 P215 11 2170 P215 12 2170 F X2 23 2195 L S09 56 2210 E S09 57 2210 E S12 23 2221 C S15 12 2225 C S15 56 2240 C S15 57 2240 E S12 56 2245 C COLONY c# CNAME s# 01 04 CDATE 02 LOWELL 2004 03 BELLONA 01 04 2010 04 LANG 01 06 2015 53 NEW EARTH 23 02 2221 55 FAROUT 12 01 2225 58 PRIME 56 01 2240 ■ * I I I n DSL ALPHA The best known and most widely cited relational data- base query language is DSL ALPHA (Data Sub-Language ALPHA), which was developed by E. F. Codd of IBM [6] . Among its noteworthy features are: 1. Data can be both retrieved and modified. 2. Access to data is through "workspaces", which are local to users, as opposed to the database proper, with which workspaces provide communication. 3. Queries are based on the predicate calculus familiar to mathematicians and logicians. The query language is therefore very powerful and elegant, but quite different from English. 4. ALPHA syntax has never been rigidly defined, but is supposed to depend on the host language of which it is a part. (Thus ALPHA within FORTRAN would look different from ALPHA within COBOL) . The published versions of ALPHA in fact differ in several ways (see, for example, [6], [7] and [8]). We have based the ALPHA of this chapter on that of a popular early source, [6] . There are a few differences between our ALPHA and that of [6]: 11 1. Codd provided for an option "MAYBE TOO" to handle undefined fields in the database. We have not. 2. [6] distinguishes between "domain names," which may occur twice in a relation, and "attribute names," which are by definition distinct. For example, a relation R might have fields (A,B,A,C). The system would distinguish the two occurrences of domain name A as two attributes, probably by adding suffixes . We require domain names to be distinct, so that for us, domain = attribute. 3. We have no piped mode, which permits a user to move data in or out of a workspace tuple by tuple. 4. ALPHA normally permits use of functions and con- stants in target lists; see [6]. For simplicity's sake we have ignored these. Without piped mode, a workspace can be in any of six states (see Fig. 1 on next page). Data . There are four data types within ALPHA: relations, workspaces, attributes /domains and tuple variables. In this description, each is represented by a kind of record with one value field (except for attributes) and several fields for useful information. To the user, relations and workspaces are both simply relational tables, but the run-time information needed by the system is different . complete HOLD start UPDATE 12 T = ready to transmit P = put R = ready to receive H = hold G = get U = update Figure 1 (after Fig. 2 in [6], p. 13) Our implementation requires two special structures a list 'tvars 1 of declared triple variables, and a list 'free' of free variables in qualifications. Basic Forms * alist: Attribute list * j(aname ( ^aname ) *)_ ** J^aname ( ^aname ) * , ) Examples . 1. (DATE) 2. (S#, DATE, MTYPE) * aterm: Attribute term ■* rname^ aname Examples : 1. PLANET. PNAME 2. VEHICLE. V# rname Relation name -► name 13 ** if name . protected then Error (4) else name fi; A relation cannot be accessed if it is protected; for example, while another user is updating it. * aname: Attribute name ■*• name * wname: Workspace name •> name * tvar: Tuple variable ■* name * name: Name ■* let name (let num # und) The original ALPHA permits # and _ within names. * let: Letter ■* A | B | C | ••*2|Z * num: Numeral -*■ 0|lJ,2| ••• |9_ * #: Pound sign ■*- * und: Underline ■*■ aBi Major Productions * o. ALPHA Statement ■*■ declare get hold alter put update release destroy * 1 declare Declare '■* domst relst ranst * 1.1 domst Vc* Domain statement ■+ DOMAIN aname hltype (aname : (type : ' domain ' , data L n ^- t yP e ^_ relations: () )_ )& env -*■ env; * 1.1.1 hltype Host language type -»- * 1.2 relst : ** 14 Relation statement -*■ RE> 'ON rname alist^ Kii'f (a^ist^ aname) ( (alistn = %)V =t> Error (0) / Subset ( (alist 2 1 (aname,)) .alist-i ) =» Error (1) . (rname : (type attributes key tvar 'relation' , alist (alist^ 1 (aname , ) ) (range: rname, value : %) , protected : F, value j_Q )_ )&env •» env; rname .-»■> ( a list | .(§ relations) ; ); Attributes must be created by DOMAIN statements before relations which use them can be defined. Those attributes which make up a key must of course be included in the attributes of the relation. The "tuple variable" rname^tvar ranges over rname^value in quantified retrievals . Examples . DOMAIN S# INTEGER (2) DOMAIN SNAME CHARACTER (24) DOMAIN POSITION INTEGER (6) DOMAIN SCLASS CHARACTER (2) RELATION STAR (S#, SNAME, POSITION, SCLASS) KEY S# RELATION MISSION (V#,S#,DATE,MTYPE) KEY (V#,S#,DATE) )) ' domain ' 'INTEGER(2)', (STAR, MISSION,) )) 15 The environment now contains the following records, among others : (STAR ,:(type : 'relation 1 , attributes : (S# , SNAME , POSITION , SCLASS , ) , key :(S#,) tvar : (range : STAR, value : % ) , protected : F value (S# : (type data relations The empty relations STAR and MISSION can be filled by PUT statements with records like (V#:'P078*, S#: 01, DATE: '1981', MTYPE : 'F ! ). *1.3 ranst: Range statement -* RANGE (rname |wname) tvar (rsquant)? ** _(tvar : (type : 'tvar', range : (rname Iwname)^ quant : ( ' SOME ' (j_ rsquant) ?)^_ value : % ) )& env ■» env; Remove (t var , tvars) ; tvar -» tvars; *1.3.1 rsquant: Range statement quantifier -> SOME | ALL ** 'SOME' I 'ALL 1 16 The tuple variables declared in RANGE statements are used during retrievals as free and bound variables. Like workspaces, they are local to each user. The user may associate a quantifier with tuple variables at declaration time; if none is explicitly mentioned, the existential quan- tifier ('SOME') is assumed. For details of their use, see rule 2 below. Examples . 1. RANGE STAR X 2. RANGE PLANET P ALL 3. RANCE STAR X SOME Tables are retrieved from the database into workspaces by GET or HOLD statements . The former merely presents the data for inspection; the latter enables the user to alter the data through host language statements. * 2. get: Get ■+ GET (rname |wname) [ GET (_(quota)_)? wname tlist( 2. qexp)? (ord+)? ** Display ( ( rname | wname ) )j_ ( wname = % =» _(wname : (type : 'workspace ' , targetlist : tlist sources : Rectify ( tlist @ 1) , tvar : (range: wname x value : %) , state : 'RT', value : () ) 17 ) & env -*• env h. _( wname . state/='RT' =» Error (5) ± wname . state <- 'busy G'; wname .value <- Retrieve (t list J^Oqexp)?]^ ( Sort 1 (w name . value , (o rd+ ) ) ; ) ? (wname .value ♦- (w name .value) . @(1 . . quota) ; ) ? Display (w name) ; wname . state ♦- 'RT 1 ; ). Productions for quota, targlist, qexp and ord follow the examples . Examples . 1. GET STAR In this simplest form of a GET there is no work- space; the contents of STAR are simply displayed in table form on the output file. 2. GET W STAR The same, except that now the tuples in STAR are copied into a workspace W. 3. GET W (STAR.SNAME, STAR. POSITION) Only two fields of STAR are retrieved. The SIBYL execution environment now contains this record: (w: (type : 'workspace ' , targetlist: ((STAR, SNAME,), (STAR, POSITION,),), sources :(STAR,), 18 tvar : (range: W, value: 7o) , state : 'RT', value : ( (SNAME: 'SUN', POSITION: 202020), (SNAME: 'SIRIUS B 1 .POSITION: 162722) ,) )) 4. GET W (4) STAR With the quota, only the first four tuples are retrieved into W. 5. GET W (STAR. SNAME, STAR. POSITION): STAR.SCLASS = 'G\ Retrieval with qualification. "Find the names and positions of all G class stars." The printed result is W NAME POSITION SUN" 202020 RIGIL 191816 TAU CETI 262918 The qualification ("predicate" in [5] , "quantified expression" (qexp) in these notes) admits the full power of the predicate calculus. 6. RANGE STAR X GET W (PLANET. PNAME) :3 X(X. S# = PLANET. S# AX.SCLASS = 'G') Or, RANGE STAR X SOME RANGE PLANET P GET W (P.PNAME): X.S# = P.S# A X.SCLASS = 'G' "Find the names of all planets around G class stars;" or, "Find the names of all planets such that there is a star X for which the planet and X have the same star number and the class of X is 'G'." The second query 19 uses a default quantifier and a tuple variable within the target list . 7. GET W STAR UP POSITION DOWN S# The contents of W are sorted, primarily by ascending position, secondarily by descending S#. *2. 1 quota: Quota ■* *2.2 tlist: Target list -*■ target |_(target (^target) *)_ ** (target,) 1 (target (.target)*,) *2.2.1 target: Target -* rname^aname ** (rname . tvar , a name) | wname^aname ** (wname . tvar , aname) | tvar ^aname ** j( tvar^aname)_ rname ** (rname . tvar,) .& ( rname . at tributes )@ wname ** (wname . tvar , ) .&( (w name /. target list) @2)@ tvar ** (tvar , ) . & (t var ,. range . attributes )@ A qualified target is converted into a pair, the first entry of which is a tuple variable. If there is no attribute qualifier, all the tuples are. wanted, so a list of all possible pairs is generated. *2.3 ord: Ordering -> (UP | DOWN ) rname ^ aname ** (( 'UP' | 'DOWN' ), aname )^ *2.4 qexp: Quantified expression •> 20 ** (s5 | V tvar (s5l \ 3 tvar £s5^) s5 (Remove (t var , free); tvar . range@ -*• tvarj_ s5 ) A (Remove ( t var , free); tvar . range @ + tvarj_ s5 ) V * s5 ->• s4 s5 _** s4 ** s5 = s4 * s4 -> s3 s4 j± s3 ** s4 <^ s3 * s3 -v s2 s3 V. s2 ** s3 V_ s2 * s2 -*■ si s2 A_ si ** s2 _A_ si * si -> sO ^2 sO ** / sO sO -*■ terr [i pred term * term -*■ rname^aname wname^aname tvar . aname ** (rname . tvar .value) . aname ** (wname . tvar . value) . aname ** £tvar ■»> free ; Various alternatives are in use for the logical connec- tives of s5 through si. Syntactically, the language of qexp down to sO and the predicate calculus are identical. The list 'free' keeps track of free tuple variables. Whenever one is used in a term it is appended to 'free 1 ; when it is quantified it is removed. When all the qexp has been generated and the variables in the target list are taken into account , the remaining free variables are quantified by the procedure Quantfree. 21 *3. hold: Hold ■+ HOLD wname tlist (j_qexp) ** _£ wname = % =» (wname : (type : 'workspace' , targetlist : tlistj_ sources : Rectify( tlist @ 1) , tvar : (range : wname^ value ; %) , state : 'RT', value : ( ) )_ ) & env ■> env ( len (w name . sources)/ =1 =» Error (3) j_ wname . state/ = 'RT' =» " Error (5) j_ wname . state «- 'busy H'; (wname . sources (1)) .protected 4- T; wname . value <- Retrieve (t 1 i s tj_ (CT(;qexp)? l )); wname . state +- 'TU'; _2i The HOLD statement, like the GET, retrieves tuples from the database into a workspace, but the workspace is not dis- played and the HOLD makes it possible to change the permanent data through subsequent alteration and UPDATE. Examples . 1 . HOLD W VEHICLE 2. RANGE VEHICLE V HOLD W (V.V#, V.VNAME): V.VNAME = 'CODD' 22 Only one relation or tuple variable is permitted in the target list, so that the subsequent UPDATE can be defined. The relation is placed in a protected state. In addition, that relation's key must be included in the list of attributes given or implied in order that the UPDATE be possible. *4. alter: Alteration ■* wname . aname ** ( (w name . state = ( "HJ ' , 'busy U'))V =» wname . st ate ♦» 'busy U'; (wname . value @ -» x; x . aname > , Error (5) The host language itself provides alterations to work- spaces, most typically by assignment statements, although procedures could also be used. Example . After the second HOLD above, we want to change the name of the scout ship Codd . W.VNAME = 'DOLPHIN' *5. update: Update •* UPDATE wname ** JL wname . state/= 'TU' =» Error (5) len (wname . sources)/ = 1 =» Error (3) 23 , /Subset ( (w name . sources (1)) .key, wname_. targetlist @ 2) =» Error (1) , wname . state ♦- 'busy U 1 ; R «- w name . sources (1) ; (wname . value @ •*• x; 1 . . len (R. value) @ -» n; if (x . (R . key) = (R , value (n) ) . (R . key) )Athen (R. value (n) . (w name . targetlist@2)) . 4- x. (w name . target list @ 2) fi; ii. R. protected — F; wname . state «- 'RT'; 2i If the workspace is in the proper state, if only one relation is associated with it, and if the relation's primary key is included among the workspace's attributes, then the relation is updated in the following way: If t is a tuple in the relation whose values on key fields agree with values in the key fields of some tuple s in the workspace, then t is changed to agree with s. Example . UPDATE W. Tuple in W: (V# Old tuple in VEHICLE: (V# New tuple in VEHICLE: (V# '509', VNAME: 'DOLPHIN') '509',VCLASS: 'SCOUT' , VNAME: 'CODD') *509',VCLASS: 'SCOUT' , VNAME: 'DOLPHIN') 24 *6. release: Release ■+ RELEASE wname ** if (w name . stat e/= (' TU ', 'RT'))Vthen Error(5) else wname. state •*- 'RT'; • - . . . . i . . ■ i ■ ■ wname . sources .protected «- F; fi; A RELEASE puts the workspace back in a ready state with- out performing an update (i.e., it aborts the update). *7. put: Put ■> PUT wname rname^alist)? (ord+)? ** if wname . state /= 'RT' then Error (5) else wname . state — 'busy P 1 ; (wname .value-*- Sort 1 (w name . value , ord+2^ ) ? (wname . value @ -*■ x; x(^_alist) ? ^> rname . value h. fi; Unlike the UPDATE, which merely changes tuples already in a relation, a PUT adds tuples. However, the original ALPHA report doesn't specify how the workspace itself is to be defined and filled. Examples . 1. PUT W COLONY Insert all the tuples in W into the relation COLONY. 2. PUT W COLONY UP S# Same, except insert tuples by increasing S#. 3. PUT Wl VEHICLE. (V#, VCLASS, VNAME) Insert the V#, VCLASS, VNAME fields, in that order, of the tuples in Wl into VEHICLE. 25 *8. destroy: Destroy ■* delete drop ALPHA provides two types of statement for destroying data. In a DELETE a relation or workspace is emptied of its contents but stays in the database; a DROP removes the data item. *8.1 delete: Delete ■*- DELETE (rname | tvar) | DELETE (rname | tvar) j_ qexp | DELETE wname ** (rname tvar . range ) .value *■ ( ) ; ** x +- ( rname . tvar | tvar ); R •*- x. range. value; (R @ -» x. value; if qexp then (R / = x. value) . @ R fi; 1l ** *8.2 drop: Drop + DROP name >'cVf if name . type = 'tvar' then Remove (n ame , tvars ) fi; if name . type = 'domain' then if name . relations = ( ) then % -> name else Error(2) fi else % ■+ name fi; 26 Domains cannot be dropped if they are still being used by relations. Examples . 1. DELETE Wl The relation associated with Wl is "updated"; except that tuples are removed instead of changed. 2. DELETE X The relation associated with X is emptied. 3. DROP X The tuple variable X is dropped. 4. DELETE PLANET The relation PLANET is emptied. 5. DELETE PLANET : PLANET. S# =57 All PLANET tuples whose S# is 57 are eliminated. 6. DROP PLANET The relation PLANET is removed from the database. 7. DROP S# If no relation uses it, the domain S# is dropped. Procedures 1. Error: [n $ 'Error: ' & (n// = =» 'attribute not declared', = 1 =* 'key not included in attribute list', = 2 => 'domain still used in a relation', = 3* 'attempt to change more or less than 1 relation', 27 = 4 =* 'relation protected 1 , = 5 =* ' WS in wrong state ' ) -» out ] Rectify : [(A:ref A) $ (x:0, temp: ( ) ) $ (A @ + x; x /= % =» / (x = temp V) =» x -» temp) ; A *- temp ] Routine to eliminate redundant and null members of a list . The business part of GET and HOLD statements. There are two arguments: 'tlist', the list of target pairs generated in 2.2.1, and ' lproc ' , the Boolean procedure generated from a quantified expression in 2.4. Retrieve: T (tlist , lproc) & env $ (targvars : ( ) ) $ targvars «- Rectify (tlist @ 1) ; free «- ( ) ; lproc *- Quantfree (f ree, targvars, tvars , lproc) ; (Cprod (targvars @ range)@.+ (targvars @ value); lproc => ((tlist@l)@(tlist@2)).:(tlist@2) ) ] The local variable 'targvars' is a list of all the distinct tuple variables used in the retrieval target . The procedure Quantfree assigns default quantifiers to free variables that do not occur in the target list. 28 4. Quantfree: V (free.targvars ,tvars ,lproc) & env $ free*- Setdiff (free, targvars) ; (tvars @ ■+■ x; In (x,free) =*■ r «*- r & (lproc: if x. quant = 'SOME' then [(x. range @+x. value; lproc)VJ else [(x. range @->-x. value; lproc) A J fi) )] First, the list 'free' of free tuple variables is adjusted by subtracting the target list variables. Then the remaining free variables are assigned quantifiers, in reverse order of their declaration. This is done by running through 'tvars', and replacing 'lproc' by successively more quantified versions . 5. Cprod: [A & env $ (x:0, y:0) $ (len A // - *.<().) , > = 1 => (hd A @ + x; Cprod (tl A) @ + y; (x,) & y ) >] Cprod is a recursive procedure to form the Cartesian product of a list of lists . 29 6. Setdiff: [(A, B) $ (a:0) $ (A @ - a; (a = B) V * nil, a) ] The set difference A\B of two lists. 7. Display: Example STAR S# SNAME POSITION S CLASS 01 SUN 202020 G 10 RIGIL 191816 G 57 SIRIUS B 162722 F 8. Subset: [ (A, B) $ (B @ = A)VA ] Boolean procedure to check if A is a subset of B. 9. Sortl: [(table: ref cable, ords) & env $ (ord: %, order: %, field: %, table : %) $ (len(ords) . . 1 @ -*■ ord; orders ( ord(l)='UP* => '<' / '>' ); field*- ord (2); table <- Sort (table, order, field) ) ] Specialized recursive routine to sort a relational table 'table' according to an ordering list 'ords' of pairs of the form ( 'UP' | 'DOWN' , aname) . 10. Sort: [ (table : ref table, order, field) & env $ (A:Sort(tl table, order, field) , x: hd table) $ 30 (len A // < - 1 => A, > = 2 => (A. field ! order x.field)@A &(x,)& (A. field ! order x. field) /@A )] Recursive routine to sort a table 'table' of records on key 'field' with order 'order' = '<' or '>'. 11. In: [(elt, list) $ (elt = list)v] Boolean procedure to see if 'elt' is in 'list'. 12. Remove: [(elt, list:ref list) & env $ list «- (elt/ = list).@list] Procedure to remove all occurrences of 'elt' from 'list*. References [6] Codd, E.F., "A Data Case Sublanguage Founded on the Relational Calculus," Proc. 1971 ACM-SIGFIDET Workshop on Data Description, Access, and Control , ACM (1972); also IBM Research RJ893 (1971). [7] Codd, E.F., "Seven Steps to Rendezvous with the Casual User," IBM Research RJ1333 (1974). [S] Date, C.J., An Introduction to Database Systems , Addison- Wesley (1975). [12] Martin, James, Computer Database Organization , Prentice- Hall, Inc. (1975). VL The VL language was designed at the University of Illi- nois by R.S. Michalski, Richard Schubert and others [13] . Although inspired by ALPHA, it differs in three important ways . 1. VL is a sublanguage of a particular language, VL1, which is designed to "extend the capabilities of current information systems by including deductive capabilities and introducing inductive capabilities" [13] . The syntax of VL is completely defined in [I3] by a form of extended BNF grammar. 2 . The conditions permitted in queries are simpler than those of ALPHA, and in some ways more like English. 3. In contrast to ALPHA and most other query languages, where links between attributes must be specified, in VL they can be implicit. For example, the ALPHA query RANGE PLANET P GET (STAR.SNAME): 3P(STAR.S# = P.S# A P.PNAME = 'MARS') can be rendered in VL as GET (SNAME) : PNAME = MARS. The implicit linkage facility means that queries are more natural and less messy, especially for users unfamiliar with mathematical logic. On the other hand, a big system 32 could spend too much time searching for links. This is not a great problem because VL is designed to be used with rela- tively small databases, those which can be kept in core. A more serious problem is the great potential for ambiguity. A "link" between two attributes ("descriptors" in VL terminology) is a sequence of joins of one, two or more relations such that both attributes are included, and there can be more than one way of forming the joins. For example, suppose a database contains the relations R(A,B), S(B,C) and T(C,A), with the following contents: R S T A B B C C A 1 2 2 3 3 2 2 2 3 3 3 1 What is the result of the query GET R.A: S.C = 3 ? If we take the obvious join R*S, then we get R.A = 1 or 2. If we take R*T and then T*S, the result is R.A = 2 or 3. If we take R*S*T, we get R.A = 2 only. Finally, if any linkage is permitted, then R.A = 1, 2 or 3. The problem of ambiguity vs. complexity recalls some of the problems involved in using the predicate calculus in philosophy and mathematics . Symbolic logic was designed pre- cisely to avoid the ambiguity and lack of precision of natural language, but symbolic logic is cumbersome and very difficult for non-mathematicians to understand. This description of VL makes no attempt to mimic any actual implementation of the language. (For example, the VL 33 system described in [13] stores a great deal of information with descriptor names.) Our purpose is to describe a user's view of VL in terms of SIBYL without also describing the invisible nuts and bolts . Our version of VL does not completely coincide with [13] . The grammar has been altered in places, either to make the SIBYL description easier or to eliminate certain constructions that seem semantically inappropriate (in declaring a relation, descriptor names shouldn't have relational qualifications). Descriptions are not provided for INDUCE and DEDUCE state- ments, since their use is peculiar to the VL1 system, or for the implicit linkage convention, because of its complexity. There is some difference of terminology between ALPHA and VL. "Relations" and "domains" or "attributes" in ALPHA correspond to "relational tables" and "descriptors" in VL. In contrase to ALPHA, we have provided almost no error- checking facilities. Data. There are no tuple variables in VL, workspaces are not an explicit part of the language, and descriptors are not declared separately from relations; therefore the only data type is the relational table. In structure, our SIBYL rela- tion for VL is almost identical to that for ALPHA. A rela- tional table can be in one of four states, which indicate its degree of permanence: 'tempi' - used within larger expression *temp2' - saved only until end of session 'event' - table contains a single "event" tuple 'perm' - permanent. 34 There are two special data structures used in our SIBYL implementation: 'rtabs 1 , a list of all relational tables, and 'chtabs', a list of tables created during a CHANGE statement . Basic Forms * Desclist -*■ desc (^ desc)* ** ^desc (j_ desc)* _J * desc: Descriptor ■*■ dname | rtname^dname | dname OF rt name | function ** (Uniqrel ( dname t ) , dname) ] Ortname^dname^ | Ortname^dname )_ | (%, function) One type of descriptor found in [l] is not implemented here: the form rtname . . rtname . dname . Each full descriptor (as opposed to a simple name) translates to a relation-descriptor pair. The relation of the desc-pair is retrieved by the procedure Table, and the descriptor name by Dname. Examples . 1. V#, VCLASS, VNAME 2 . PNAME 3. STAR POSITION 4. POSITION OF STAR 5. MAX (STAR. POSITION) * Dname list ■* dname (^dname)* ** _(dname (^dname ) * A ) * dname: Descriptor name -»■ name * rtname: Relational table name ■+ name * Event name ■*■ name * Newname ■> name * Function ■* funcname (desc (.desc)*) 35 ** funcname ( (Table ( desc ) .value) @ Dname( desc) ( , (Table( desc ) .value) @ Dname( desc))*) These functions are similar to those of ALPHA and are properly functions of sets rather than individual tuples. * Funcname ■*■ AVG ** [A $ (A+)/len(A)] [ MIN ** [A $ A A] [ MAX ** [A $ AV] [ SUM ** [A $ A +] [ COUNT ** [A $ len A] 1 DOMAIN ** [A $ A] We assume that SIBYL has been extended so that A and V give the min and max of numbers , and that functions applied to lists give lists, not functions of unions of those lists Examples . 1. COUNT ( COLONY. CNAME) ( = 6 ) 2. MAX (MISSION. DATE, COLONY. S#) ( = (2245, 56,) ) * Valuelist ■* value (^ value)* ** Rvalue (^value) *;_) * Value •* aexpr iset name ? * aexpr: Arithmetic expression -*■ aterm| aexpr + aterm| aexpr -_ aterm * Aterm -*■ af actor |aterm*afactor | aterm /_ af actor * Af actor ■* constant |£aexpr)J z. af actor * Constant ■*• Note: in VL, strings are not surrounded by quotes. The difference between strings and names is given by the con- text . * Name -*■ . * ? : Undefined value + % 36 The character set of VL in [13] is the same as that of ALPHA, and '#' is permitted within names. Major Productions *0. VL Statement -> create | retrieve | change | delete | save | comment | induce | deduce | help *1. Create -*■ define-ins |add-ins *l.la Define-ins -► DEFINE (RT) ? ( rt name_( dname 1 i s 1 1 )_ ( KEY : = dname list 2 )?) "' "END ** ( (rtname : (descs : dname 1 is t]_ »_ key ^(dnamelist2)? ' status : 'perm' , value : ( ) ) ) .&(env,rtabs) . + (env.rtabs); ) + In a relational table the key is optional. The key fields must be an initial sublist of the list of descriptors. In the VL implementation described in [l3], a relational table is a set of tuples, the first tuple being the descriptor list . The structure above agrees with that used in the last chapter, and simplifies the SIBYL description. The second form of definition is used to create "events", or single tuples which do not yet belong to relational tables (actually, events are formally a special kind of relation), while the first creates empty tables. 37 pi. lb Define-ins ■+ DEFINE EVENT (eventname jjl (dname : =value ( , dname : =value)*) )+ END (dname (j_dname) * ,) , ** ( (eventname: (descs status value ' event ' , (( dname rvalue (^dname^, value) *) ,)) ) .&(env,rtabs) . -» (env,rtabs); )+ Examples . 1. DEFINE STAR (S#,SNAME, POSITION, SCLASS) KEY : =S# END 2. DEFINE RT PLANET (S#,P#,PNAME,PTYPE) KEY: =S#,P# END 3. DEFINE MISSION (S#,V#,DATE,MTYPE) VEHICLE (V#,VCLASS,VNAME) KEY:=V# END 4. DEFINE EVENT S:=(S#: =01, SNAME:=SUN, POSITION: =202020, SCLASS :=G) END *1.2 Add-ins ■* ADD TO rtname Orow-cond)? tuplelist END ADD eventname TO rtname O_row-cond)? | ADD (v alue list) TO rtname Orow-cond)? ** (env $ (buffer, insert, v) $ buffer <- ( _( tuplelist @-»v; rtname . descs . : v) eventname .value (rtname . descs . :v aluelist ,) )j_ insert «- len( rt name . value) ; ( insert <- r ow-cond;) ? rtname . value «- rtname . value (1 . . insert) & buffer & rt name . value (insert+1 . . len( rt name . value)) ; ); 38 The variable 'buffer' stores the list of tuples to be inserted, while 'insert' points to the insertion location. Only the first form requires an END. *1.2.1 Tuplelist + (£valuelisO) + *1 .2.2 Row-cond ■> [ROW <_ (value | LAST )] | [ROW > (value [ LAST )] ** (value 1 1 en ( rt name . va lue ) ) ^JL | (value | len( rtname . value) Examples . 1. ADD TO PLANET (01, 02, EARTH, E) (01, 04, MARS, M) (01, 06, TITAN, I) (12, 01, GAGARIN, M) (23, 02, PLATEAU, E) (23, 05, KAMPEN, I) (56, 01, CYON, R) END 2. ADD S TO STAR 3. ADD S TO STAR : [ROW < l] 4. ADD (S 15, STARSHIP, SCORPIO) TO VEHICLE: [ROW > LAST] *2. Retrieve + get | let "2.1 Get ■+ GET (RT) ? (rtname y^) ? rtcond (order) ? ** (env $ tempt ab $ tempt ab «- r t condj_ ( Sort 1 (temptab, order) ; ) ? ( (rtname: temptab ) .&(env,rtabs) ,-»(env,rtabs) ; rtname . status »• 'temp 2'; ) ? Display ( t emp t ab) ; ) 39 *2.2 Let * LET (RT) ? rtnarae 2Z rtcond (order) ? ** (rtname:rtcond ) . &(env,rtabs) . ->-(env,rtabs) ; r t name . status *- t temp2 f ; ( Sort 1 ( rtname, order) ; ) ? A GET statement always prints the retrieved table but need not save it. A LET always saves the retrieval, but doesn't print it. As in ALPHA, an ordering on descriptors can be specified. E xamples . 1. GET RT TEMPI := STAR ~ 2. GET TEMPI := STAR 3. GET TEMPI := STAR ORDER UP ON SNAME 4. GET STAR The first two statements are identical; the permanent table STAR is printed, and stored in the temporary table TEMPI. In #3, the temporary copy of STAR (but not the permanent version) is reordered by increasing star name. In #4, the table is printed but not stored. 5. LET TEMPI := STAR STAR is stored in TEMPI, but nothing is printed. More complicated retrievals will be illustrated as the relational table condition is described. *2.3 rtcond: Relational table condition -*■ rt((: | WHERE ) condition)? |iset ** rt ( ;(env $ t $ L?*- @ "*" t; condition => t) i ) ? |iset The relational table condition actually carries out the retrieval . It can be in one of two forms : one similar to the 40 ALPHA retrieval; and the image set, which is described below under *2.3.3. An rtcond retrieves a temporary relational table, which may itself be used in a nested retrieval. No tuple variables or quantifiers are used or permitted; implicit linkages and image sets can usually be employed instead. In fact, VL is fully as powerful as ALPHA. Semantically, the rtcond returns either a relational table consisting of all tuples in the table rt which satisfy 'condition 1 , or an image set . -''2.3.1 Rt + texpr ( _(dnamelist)_ )? |£desclist) ** (descs: ( (texpr . descs ( ; dnamelist ) ? Dname desclist ) ) , key : % , status: 'tempi ' , value: (t expr (@ dnamelist) ? )_ | (s & env $ (Join (Table desclist ) Q ■» s ; s . Dname desclist )_i i 1 The table expression 'texpr' is a union, difference or intersection of compatible tables. It is projected onto the fields in 'dnamelist'. If a descriptor list is used with no explicit table in front of it, the join of the implied tables is taken. 41 Exa mples . 1. MISSION 2. MISSION (S#,DATE) 3. (STAR. SNAME, PLANET. PNAME) 4. (SNAME, PNAME) *2.3.1.1 texpr: Table expression ■+ tterm| texpr+tterm| texpr^tterm ** (t expr . value «- ( Union | Setdiff ) (t expr . value , tter m. value)) * Tterm ■+ tf actor | tterm * tf actor ** (tter m. value *- Intersection( tter m. value ,t f actor. value)) * Tf actor -»■ rtname | O:tcond)_ ** ( (R : rt name ) $ R. status «- 'tempi' ;R) | rt cond Examples . Suppose CITY and COUNTRY are two relations with the same attributes as COLONY. Then the following are valid relational table expressions for use in an rtcond. 1. CITY + COLONY 2. CITY- COLONY (CNAME,DATE) 3. COLONY - CITY*NATION (S#) 4. COLONY + (CITY: [CITY. DATE > 2100]) *2.3.2 Condition ■* vl-terrn (OR vl-term)* ** vl-term (V vl-term)* * Vl-term ■* selector (selector)* ** selector (A selector)* * Selector -*■ [desc relop valuelist] [rtcond -^ relop rtcond 2] I [valuelist (NOT) ? IN rtcond] ** (L ink (t , desc ) @ (! relop valuelist ! (if relop ='N0T=' then ' A ' else 'V fi))V 42 [ RtopC rtcond-, , relop^ rtcond2)_ | (/_) ? In(valuelist , rtcond )A ) Relop + <_ | < = | ^ | NOT = | _^_= | A condition is a logical expression in disjunctive normal form. The simplest conditions, or "selectors", can be in one of three forms: (1) A test to see if a descriptor bears a certain relationship to each of several values. (2) A test for subset or equality relationship between two relational tables. Here '<' means 'subset'. The tables can themselves be complicated expressions. In practice, they are often image sets. (3) A test to see if each member of a list is or is not in a given table. Examples . 1. [SCLASS = G] 2. MISSION. DATE > 2100] 3. PTYPE = E, M] 4. "(STAR. S#: SCLASS) > = PLANET (S#)] 5. 02 IN PLANET (S#)] 6. "A,B NOT IN (SCLASS)] 7. [SCLASS = G] OR [SCLASS = K] [PTYPE = E,M] . : 2.3.3 iset Image set ■* (desc. ( : | WHERE ) (desc2 1 desc9 = value2) (^_(desc | desc = value))" 2. (Table ( desc- ^ (g ->• x ; J_n( (t .Dname( desc2) [value 9) ,Link(x, desc?) ) (A In( (t .Dname(desc ) [value ) ,Link(x, desc )) )i =» x . Dname ( de scj) ); The image set retrieves all values of desc-^ which satisfy a list of two kinds of condition: (1) a join linkage can be 43 found to a value of desc2 which equals the given value, or (2) a join linkage can be found to a descriptor being retrieved. Thus in the second case image sets can have free variables . Such an image set cannot be a relational table condition; it makes no sense to say GET (MISSION. V#:M1SSI0N. DATE) . Examples . 1. GET (STAR. SNAME: STAR. SCLASS = M) Get all star names whose class is M. 2. GET (V#): [(MISSION. S#:V#) > = (MISSION. S#:VNAME = CODD)] Find the numbers of all vehicles which have visited the same stars as the Codd . 3. LET T:= (PLANET . PNAME : PTYPE = E, SCLASS = K) Find the names of all E type planets around K-class stars, and store in T. This query could also be rendered LET T:= PLANET(PNAME) : [PTYPE = E] [SCLASS = K] . *2.4 Order -> ORDER (UP [ DOWN ) ON desclist ** ( ('UP' | ' DOWN ') ,) & Dname desclist The ordering statement and its SIBYL implementation are the same as in ALPHA, except that only one direction of ordering is permitted. *3. Change ■+ CHANGE (RT) ? (rtname -) ? rtcond-,^ (_^(rtname := ) ? rtcond)* (change- stmt )* end- change ** chtabs ♦- ( rtcond-. &( source : )_ (jjrtcond & (source: )* J ); 44 Ortname, j_ rtcond-.)_ (& (rtname : rtcond) )* .& (env,rtabs). ■» (env, rtabs) ; ( chang e - s tmt ) * end -change The rtconds used must extract tuples from only one existing table. The key fields are not used to implement the change, nor is there any special protection for them. The list 'chtabs' stores the retrieved temporary tables. Altera- tions are made by change- statements to the temporary tables; the permanent tables are updated with an END. Each member of 'chtabs' has an extra field 'source' to record the rela- tion that is to be updated. "3.1 Change-stmt ■* dname y^ value (j_ condition)? | dname : = newname : [ROW = 0] [ GET rtcond (order) ? | DISPLAY rtname (chtabs @ ->• R; if In ( dname , R.descs) then (R. value @ •» u; (condition =» ) ? u . dname «- (u . dname , value )_ ) -> R. value fi; ii (chtabs @ -> R; if In ( dname , R.descs) then Substitute ((dname , newname) , dname , R.descs) ; 45 (R. value @ -> u; u — Subs ti tut e ( newname , dname , dom u) . :ran u ) -*• R. value fi; h. Display ( rtname) ; Changes to be made are stored as pairs of old-new values, which replace the old value . There are two basic types of change: change of value of a descriptor, and change of a descriptor name. In each case, all tables in 'chtabs' are affected. The CHANGE statement cannot be used to add tuples to a relation. -3.2 End-change -> ABORT | END ** chtabs <-( ); | Update ; chtabs *- ( ) ; Examples. 1. CHANGE Tl := PLANET, COLONY DISPLAY Tl P# := 03 : [PNAME = PLATEAU] GET (PLANET. P#, COLONY. P#): [PNAME = PLATEAU] END The number of Plateau is changed to 03. The DISPLAY prints the retrieved table Tl, while the GET prints the P# 02 retrieved from old, unchanged tables. The END updates those tables . 2. CHANGE STAR SCLASS := STYPE [ROW = 0] ABORT The user changes the SCLASS field name to STYPE, but then changes his mind. The required condition [ROW = 0] reflects 46 the fact that descriptor names are stored in the Oth row of a table in the standard VL implementation. This condition syntactically distinguishes the first two kinds of change statement . *4. Delete + DELETE (RT) dcond (^dcond)* ** dcondj_ (dcond j_)* -4.1 Dcond ■♦ rtname (£dnamelist)_ ? ((: | WHERE ) condition)? | rtname ** (rtname . value @ -> t ; if T (A condition) ? then t (. dnamelist^)? ■*- nil else t fi ) ->• rtname .value; | rtname +- %; The DELETE statement erases fields of tuples or whole tuples from a table, or, in its simplest form, destroys the table. Examples . 1. DELETE MISSION The relation MISSION is destroyed. 2. DELETE MISSION (MTYPE) Remove the descriptor MTYPE from MISSION. 3. DELETE STAR: [S# =11], COLONY Delete star #11 from STAR, and destroy the COLONY table. 4. DELETE STAR (S CLASS ) : [S# = 11] Erase the class of star #11. 47 In [13], arbitrary rtconds are permitted in the delete list . *5. Save + SAVE rtname ** rtname .status *- 'perm'; A temporary table is made permanent . Example . SAVE TEMPI. *6. Comment -> COMMENT string END *7 . Deduce *8 . Induce These last two statements are an essential part of the VL1 system, but are highly specialized features of the VL language . Since they have no counterpart in other query languages and are not fully described in [l3],we have chosen to leave them out . *9 . Help -> HELP nonterminal or instruction . The nonterminal is one which occurs in the original VL grammar of [13] . The result is a printout of a production rule or description. Examples . 1. HELP GET 2. HELP Procedures 1. Sortl: . 2. Display : . 48 3. Table: [desc $ desc(l)] 4. Dname: [desc $ desc (2)] The last two procedures extract the associated table and descriptor names from a descriptor pair. 5. Uniqrel: [dname & env $ (rel:0) $ (rtabs ■* R; ( R.descs == dname then ( rel = => rel «- R , rel/= (0,7oM => rel «- % ) ) ) rel] The single argument dname is a unary list. Uniqrel returns % if dname is associated with more than one relation. 6. Join: [tlist & env $ J $ (len tlist // < = 1 =*tlist(l), > = 2 => ((A:hd tlist ,B:Join(tl tlist))& env $(x > y,d,J)$ d *- Intersection (A.descs, B.descs) J-*- (descs: Union (A.descs , B.descs) , key:7o, status: 'tempi', value: ( ) ); (A @ ^x; B (§ -y y; if (x.d = y.d) then J. value <*- x & y fi );J )) ] 49 The procedure Join recursively computes the join of a list of relational tables. [(A,B)& env $ Rectify (A & B)] [(A,B)& env $ x $(A @ + x; In(x,B) => x)] [(A,B)& env $ x $(A @ - x;/In(x,B) => x)] [(x,A) $ y $ (A @ - y; x == y)v] [(x,y,A:ref A) $ a $(A @ ■+ a; a==y «* x,a)] [(A:ref a) $ (x:%, B: ( ) ) $ (A @ - x; x /= % => /In (x,B) =* (B«-x) ); A ■«- B 7. Union 8. Intersection 9. Setdiff 10. In 11. Substitute 12. Rectify 13. Subset ] [(A,B) & env $ x $ (A @ + x; In(x, B))A ] We assume that '==' is defined to mean 'identically equal'. By using this operator, the above set operations will work properly for any kind of list . In ALPHA we needed set operations only on lists of primitives. 14. Rtop: [(A, op, B) & env $ (op // = '<' =» Subset (A,B) A /Subset (B,A), = '<=' =* Subset (A,B) = ' = '=» Subset (A,B) A Subset (B,A) , ='/=' =» /Subset (A,B) V /Subset (B,A), 50 = '>=' =* Subset (B,A) = '>' => Subset (B,A)A/ Subset (A,B) )] The procedure 'Rtop' converts set expressions with rela- tion operators into set relationships. 15. Update: [(chtabs @ ■* R; C 'first check if any descriptors have been changed' (1. .len R.descs @ ■> n; R.descs (n) ■+ d; if d = list then Substitute (d(2),d(l), R. source. descs) ; (R. source, value @ ■*■ x; x *- Substitute (d(2) ,d(l) , dom x) . : ran x ) ■*■ R. source .value; C 'replace d by its 2nd entry' R. descs (n) <- d(2) fi ); C 'update values in R. source' (R. value @ ■+ x; R. source, value @ ■*■ y; R.descs @ ■* d; if x.d = list A x.d(l) = y.d then y.d *- x .d(2) fi ) ] 51 16. Link: [(x,d) & env $ if In (Dname(d), dom x) then (x.Dname(d) , ) else
fi ] If x is a tuple and d a descriptor, then Link(x,d) is the set of all values of d for which join-links can be con- structed from the entire tuple x. Reference [13] Schubert, R. N., "The VL Relational Data Sublanguage for an Inferrential Computer Consultant," Univ. of Illi- nois Dept. of Computer Science UIUC DCS-R-77-846 (1977). SQUARE The SQUARE sublanguage (Specifying Queries As Relational Expressions) reflects a different philosophy on query language design: that queries should resemble as far as possible the way people use tables . It was developed after ALPHA but before VL [2 , 3] - Like VL, it is less cumbersome than the predicate calculus of ALPHA, but unlike VL it makes some use of free variables and none of implicit linkages. Among its noteworthy characteristics may be listed the following: 1. SQUARE heavily uses "relational expressions", which may be the objects of queries or which may be embedded in more complicated expressions . 2. The most common building block in relational expres- sions is the "mapping", which mimics human usage of a table. 3. As with VL, there are no provisions for file protection. 4. The original SQUARE uses an unusual two-dimensional syntax for easy exposition, which is reproduced here. This feature is discussed in [3] . A more conven- tional syntax has been developed into the closely related language SEQUEL [4] . Let us examine a typical mapping. Suppose we want to 53 retrieve the names of all planets around star number 01. This is rendered 1. PLANET (01) PNAME S# If the relation PLANET is imagined in table form, each domain is a column of the table; in fact, "column" is the SQUARE terminology [3] for "attribute" or "domain" in ALPHA or "descriptor" in VL. To perform (1) manually, one looks up all 01 entries in the S# column ("source" column) , then retrieves the corresponding entries under PNAME (the "target" column) . For a slightly more complex retrieval, suppose we want the numbers and names of all colonies around star 56 which were founded after 2300: 2. COLONY (56, > 2300) C#, CNAME S#, CDATE Each source must match its corresponding argument. The comparison operator in the second argument illustrates the tendency in SQUARE to put a lot of information in argument lists. Our syntax for SQUARE is based on that given in [3] . There is no error checking. Data . Relational tables, or sets, and variables are the only two data types directly involved in handling relations . Neither require system information fields, so a permanent relation is just a list of records. Formally, variables fall into two classes: "free vari- ables", analogous to ALPHA tuple variables, and "computed 54 variables", which have values assigned to t) ; however, there is no nee' to make a sharp distinction .^tween them. Although i ianent relations consist of records ("tuples"), a "set" or "relational expression", which is what a mapping or other query retrieves, is a list of lists with column names removed. For example, if R(A,B) presently con- tains the tuples (A:0, B:15) and (A: 3, B:76), then the expression A, B has the value ((0, 15,), (3, 76,),). To keep track of column names we use the special variable 'columns'. When the expression tuples are assigned to a computed variable, the fields of 'columns' are added to make records. Of course, it would be possible to define the SIBYL translation of SQUARE so that tuples are always records . Note that "sets" are lists of tuples, not lists of primitive values. The symbols + and 4- in the grammar indicate whether following terminals are to be written on, respectively, the upper or lower line . Basic Forms *aexpr: Arithmetic expression ■+ aterm aexpr + aterm aexpr aterm * Aterm ■+ afactor aterm *_ afactor aterm /_ afactor ** aterm * afactor * Afactor -*■ const fexpr _(aexpr)_ afactor 55 *fexpr: Function expression ■*■ t fnarae_(sexpr)_ **fname ^sexpr @ 1) * Fname ■+ AVG ! MAX | MIN | SUM | COUNT ** Example. COUNT ( PLANET (56) ) P# S# The number of planets around star 56. *acop: Arithmetic comparison operator "** = I *M £ I £ I i I 5. ** /= <= >= Column list + cname (^ cname)* **j(cname^, cname V*^) Column name ■*■ name Computed variable -* name Variable name -*■ name Relation name -»■ name * Name ■*■ Neither [2] nor [3] have '#' in names, but we permit it in order to make use of the example database. Traditionally, SQUARE variable names are lower case [3,11] . * collist * cname: * compvar * vname: * rname: tlist clist Tuple list ■+ tuple Ctuple)* Tuple •*■ Constant list -* const (, const)* **(tuple (j_tuple) * x ) ** clist ** ( const ( , cons t ) * ) /( const: Constant ■+ num|str * num: Number -> * str: * ul * + String ■+ Underline -*■ Upper line -*■ 56 * + : Lower line ■> The SQUARE character set of [2] and [3] has many more special symbols than ALPHA or VL. Examples: lower case letters, n, u, 0, t, + , <-, +. Set Expressions "0. sexpr: Set expression ■* sterm sexpr U sterm sexpr 2. sterm ** sterm Union( sexpr, sterm)_ Setdiff ( sexpr, sterm)^ * Sterm -*■ sfactor sterm J\_ sfactor ** sfactor Intersection ( sterm, sfactor^ * Sfactor -* rexpr | (sexpr )_ * rexpr: Relational expression ■*■ [tlist] + compvar (i collist) ? proj mapexpr *' V columns •<-(); tlist ( ) columns + collist \_ compvar ( @ collist) ? proj mapexpr As mentioned before, the set expression is the backbone of SQUARE . Exampl es . 1. [< 'P078*, 'PROBE*, 'PI0NEER'>, <'P323\ 'PROBE'," >] A "constant" relational expression, consisting of two tuples, perhaps to be used in an insertion. 2. 3. x SNAME, SCLASS 57 4 . STAR SNAME, SCLASS Ex. (3) and (4) give the same result if the table STAR has been stored in x. (4) is the projection of STAR onto two of its columns . 5. VEHICLE ('STARSHIP') V# VCLASS 6.x e VEHICLE: x = 'STARSHIP' V# VCLASS (5) is similar to mapping (1) discussed in the introduc- tion to this chapter. As a query, (6) is the same as (5), but is expressed with a free variable. In the syntax of [3] (and in ours) , expressions with free variables may be used only as queries, and not as part of more complicated expressions. Thus (6) is not a "relational expression", although the syntax could be extended to make it one. 7. STAR u PLANET- SNAME PNAME A set expression, the union of two relational expres- sions . *0.1 pro j : Projection ■> 4- trglist f rname ** domains -«- trglist j_ rname @ trglist This is the simplest way of retrieving from a table. To retrieve all of STAR we must list all its columns: STAR S#, SNAME, POSITION, SCLASS *0.2 mapexpr: Mapping expression ■+ mapping (o mapping)'' + _(arglist)^ ** mapping (mapping)* £arglist2. 58 "0.2.1 mapping: Mapping ■* + trglist + rname ( J.) + srclist ** [ args & env $ domains *■ trglist j_ ( Rectify | ) _(rtname @ -*■ x ; Agree ( trglist, rname , x, Cnames ( srclist) Type ( srclist ) , arg s ) =» x. trglist I "0.2.1.1 trglist: Target list ■+ collist "0.2.1.2 srclist: Source list ■* srccol (^srccol)* ** ^srccol (, srccol)*, ) *0. 2. 1.2.1 srccol: Source column ■* cname ul cname ** _( ( 'dis ' 'con' ) _^ cname )_ "0.2.2 arglist: Argument list ■»■ arg ( x arg)* -" Urg (j_arg)* mL "0.2.2.1 arg: Argument •+ ( acop) aexpr sexpr ** _^ ( = acop) aexpr _^ sexpr The mapping is the commonest SQUARE retrieval. Examples . 1. VEHICLE ('STARSHIP') V# VCLASS The set of numbers of all vehicles of STARSHIP class. 2. COLONY (56, <2300) C#, CNAME S#, CDATE The numbers and names of all colonies around star 56 founded before 2300; the result is (( 58, 'PRIME',),). Here C#, CNAME is the "target list" of the mapping, the list of 59 columns to be retrieved. Next is the table COLONY. Then come the "source list" S#, CDATE and the "argument list" 56, <2300, which are matched. This last part of the mapping can become quite compli- cated, as is shown by the intricacy of the procedure Agree, which in our implementation performs the actual comparisons. Arguments may be simple values or arithmetic expressions, values with built in comparison operators, or complete set expressions. In the latter two cases, there is defined to be a match if the source columns of a tuple match any value of the set; this is known as a "disjunctive mapping". 3. COLONY (^2300, CNAME DATE, S# STAR ('PROXIMA')) S# SNAME The names of all colonies around Proxima which were founded in or after 2300. If the date requirement is dropped, the new expression can be written as the composite of two mappings : 4. COLONY o STAR ('PROXIMA') CNAME S# S# SNAME Larger tuples may be used in sets in the argument list, in which case source columns are compared simultaneously: 5. COLONY (>2005, CNAME CDATE, S#, P# PLANET ('MARS')) S#, P# PNAME The names of all colonies on Mars founded after 2005 (result = ( (BELLONA , ) , ) ) . Both star and planet numbers must be specified. Note that LANG matches in the S# column but not the P# column, and is therefore not retrieved. 60 In a composition the target and source lists need not have the same names . This is convenient , since MDATE and CDATE are different. 6. COLONY o MISSION ('S12') CNAME S#, CDATE S#, MDATE V# The names of colonies founded by vehicle S12. One can also require that source column entries match all tuples of a set. This is called a "conjunctive" mapping, and is indicated by underlining the column name. 7. MISSION ( STAR ('A')) V# S# S# CLASS The numbers of vehicles which have visited all A-class stars. More precisely, if m is a tuple in MISSION, then (7) retrieves m.V# if and only if the argument set above is included in the set [x.S# : x.V# = m.V#] of S# which corres- pond to m.V#. In other words, if S is a set of tuples, the query R (S) B A is equivalent to x • Freevars (v arlist) ± ( test =» ) ? Vars (varlist) @ Cols (varlist) 62 1 )i Display (t emp, domains ) ; The GET retrieves an expression and displays it without assigning it to anyone. Examples . 1. GET VEHICLE ("STARSHIP') V# VCLASS 2. VEHICLE ('STARSHIP') V# VCLASS 3. v +■ VEHICLE ('STARSHIP'); V# VCLASS GET v 4. GET x 6 VEHICLE: x = 'STARSHIP'. V# VCLASS 5. GET s : s + COUNT ( VEHICLE ('STARSHIP')) V# VCLASS The first four are all equivalent . Example (4) is reminiscent of ALPHA. In the syntax of [3] , such expressions can only be used as the object of a query. Example (10) under *0.2 shows an arithmetic expression in a GET. Free variables are those used in ALPHA-like targetlists. Computed variables have values assigned to them, just as in common procedural languages. In the examples above, v and s are computed, while x is free. *1.1 varlist: Variable list + var (j_var)* ** Cvar (, var)*, ) * var: Variable ■+ freevar compvar ** freevar (dom compvar @ ->- d; jtcompvar^ d , ( ) ) ) 63 * freevar: Free variable ■*■ + vname _£_ rname i vname + collist 4 (_£_) ? rname ** (. ( dom vname (1) | collist) @ ■» d; _Cvname^_ d^ rname^ 1 compvar: Computed variable ■*• t vname *1.2 test Test ■*• lterm test V lterm * Lterm ■*- lfactor | lterm A lfactor * Lfactor -*■ arithcomp modcomp setcomp asgn j(test)_ | 2l lfactor ** | / lfactor *1.2.1 arithcomp: Arithmetic comparison ■+ t aexpr + acop t aexpr *1.2.2 modcomp: Modified comparison -»■ + aexpr t acop + mod t sexpr t sexpr t mod + acop f aexpr + sexpr -^ t modi + acop + mod^ + sexpr« ** _(aexpr acop J^sexpr @ 1) ) mod 1 ((sexpr @ 1) acop aexpr), mod 1 ( (sexpr ^ @ 1) @ acop ^sexpr^ @ 1)) mod, mod 2 Modifier ■* ALL SOME ** *mod A V *1.2.3 setcomp: Set comparison -*- t sexpr-, + scop + scop^ ,Wc Setcomp ( sexpr, , 'scop' , scop«)_ *scop: Set comparison operator -*■ = £ Z* z £ Z ** = | /= | > | >= | < | A test is a boolean expression containing a free variable; it has the same role as the ALPHA "quantified expression" or <= 64 "predicate". Numbers or sets can be compared. Assignments may be embedded in tests, where they return a value of "true". One novelty which saves effort is the use of quantifiers ("modifiers") around arithmetic operators. Since all sets are finite, no ambiguity is possible when two quantifiers are used. Example . [<1>, <3>, <4>] ALL < SOME [<3>, <6>] The value is T. In [2] and [3] modified operators are defined in terms of the MAX and MIN functions. *2. asgn: Assignment ■*• + ( LET ) ? (rname (+ collist +) ? ♦ query compvar ♦ aexpr) ** ( (rname : %) % env -*• env ; rname •«- ( query @ ■» q ; domains (j_ collist) ? . : q 1 I compvar + aexpr ) ; T Assignment can be used either to create temporary rela- tional tables, or to assign arithmetic values to computed variables. The results are not printed. Examples . 1. LET TEMPTA3 * STAR SNAME 2. TEMPTAB * STAR SNAME 3. GET S STAR, x: S = 'G' A SNAME SCLASS x +■ COUNT ( PLANET (s )) P# S# S# 65 Ex. (1) and (2) are equivalent. (? N retrieves the names and number of recorded planets of G-clas:? stars. The result is SUN 3 RIGIL 1 TAU CETI 1 *3. create: Creation -»-++_ rname + cname ( M cnarne)* ** (rname : ( (dn ame : % (j_ dname : %> ) * ) ) ) & env -> env; Example . + PLANET S#, P#, PNAME, PTYPE A permanent relational table is created. A dummy tuple is inserted to save the column names . *4. insert: Insertion ■*• + +_ rname + collist t Jjclist)^ ** rname <+. collist ^j_ clist j_ Example . 4- PLANET (01, 06, 'TITAN', 'I') S#, P#, PNAME, PTYPE *5. delete: Deletion •> f +_ rname I collist + _(clist)_ ** Orname @ •» x; if rname ^ collist ■*• clist then Remove (x , rname)_ fi; I h. Example . 1. t PLANET (01, 06, 'TITAN', 'I') S#, P#, PNAME, PTYPE 2. t PLANET (01) S# Example (1) deletes the tuple that was inserted in the last example, while (2) deletes the records of all planets 66 around star 01. *6. destroy: Destruction ■*■++, rname ** rname *■ %; Example . + PLANET The entire table PLANET is destroyed. *7. update: Update -*■ t + rname 4- collist j_ ulist t J[arglist: x clist)_ ** (1. .len rname @ -> n ; x *■ rname (n) ; (x. collist @ ! arglist ) =» (x & env $ (1. . len ulist @ -*• k; rname (n) ! (u list k) clist (k) ) 1 In an UPDATE the source list is divided into two parts. Those tuples in the relation whose 'collist' entries match the corresponding arguments are updated according to the update list 'ulist' and its arguments. Arithmetic operators may be put in the update list . Examples . 1 . ■* VEHICLE ( ' CODD ' , ' DOLPHIN * ) . VNAME; VNAME Change the name of the Codd to Dolphin . 2. + PLANET (23, 1) S#; P# + Add 1 to the numbers of planets around star 23. The syntax of [3] permits tuples in the update argument. 67 Procedures 1. Cprod 2. Display 3. In 4. Intersection 5. Rectify 6. Remove 7. Setcomp 8. Setdiff 9. Union 10. Agree: [(T, [ x & (A: ref A) & env$A +■ Setdiff (A, (x,))] y; y . T == x . T =* y . S (m . . n) ) ) fi fi 68 )A ] T = target list of mapping R = relation x = tuple being tested S = column names in source list type = types in source list ('dis' or 'con') t args = argument list \ [srclist $ srclist @ 1] [srclist $ srclist @ 2] [varlist $ varlist @ 2] [varlist $ varlist @ 3] [varlist $ varlist @ 1] [varlist $ (varlist @ ■> v; v(3)/= ( ) => v(l))] 11. Type: 12. Cnames : 13. Cols: 14. Rels: 15. Vars : 16. Freevars V References [2] Boyce, R. F., D. D. Chamberlin, W. F. King, and M. M. Hammer, "Specifying Queries as Relational Expres- sions," Proc ACM S1GPLAN/S1GIR Interface Meeting (1973), pp. 31-40. [3] , "Specifying Queries as Relational Expressions: the SQUARE data sublanguage," Proc 1974 ACM SIGF1DET Workshop , Ann Arbor, Mich. (1974), pp. 249-264. 69 [4] Chamberlin, D. D., and R. F. Boyce, "SEQUEL: A struc- tured English query language," Comm ACM , vol. 18 (1975), pp. 621-8. [ll] Lacroix, Michel, and Alain Pirotte, "Example queries in relational languages," Technical Note N107, MBLE Research Laboratory, Brussels, Belgium (1978). CONCLUSIONS This report has used the semantic definition language SIBYL to study three relational database query languages by providing a syntax-directed translation into SIBYL for each one. Writing it has made clear some of the problems, advantages, and disadvantages of the languages, of the translation method of semantic definition, and of SIBYL itself. The Languages . 1 . ALPHA . The entries ("targets") used in target lists are of different types, whereas logically only tuple variables are permissible. For example, we can have (X.S#, Y, STAR, PLANET. P#) as a target list, where X and Y are tuple variables, STAR and PLANET are relations, the outer two targets are qualified by attribute names, and the inner two are not . The advantage of this inelegance is that target lists become more like natural language. To use relation names as tuple variables is not consis- tent, and requires extra run-time structure (each relation has a built-in tuple variable). Again, this makes ALPHA more English-like . A more serious problem is the independent declaration of tuple variables and the use of implicit quantification, 71 as in RANGE STAR X GET W (PLANET. PNAME): X.SNAME = 'SUN' X.S# = PLANET. S# Not only must the implementation keep track of declared tuple variables, but in translating a quantified expression, it must find out which variables are implicitly quantified by examining the predicate and target list for free variables. To the extent that ALPHA resembles the predicate calculus, it is simple, elegant and easy to translate. At the same time it is clumsy and hard for untrained users to understand. 2. VL. As mentioned in the VL chapter, implicit linkage, which is VL's most distinctive feature, is both a great advantage and a potentially serious problem: it makes certain queries very simple, but adds the possibility of semantic ambiguity. A syntactic ambiguity occurs with image sets: (A:B) is the set of all A corresponding to B, while (A:B=5) is the set of all A such that B = 5. Some users may be confused by the absence of quotes around strings . 3 . SQUARE . This language's two-dimensional syntax is not suited for computers, but alternative syntaxes are easy to construct and have already been developed. This pecu- liarity does make SQUARE very easy to read, which was in fact the original intent . 72 The source and arguments lists of mappings can become quite complicated, and thereby account for most of the diffi- cult features of our implementation. Perhaps some way of breaking these structures into smaller and more manageable pieces can be devised. On the other hand, these lists are rather easy to read, and are fairly close to the mathematical meaning if arguments are interpreted as defining sets. The simple mapping B^A^S) is simple both to understand and to implement. It mimics the way people use tables. Mathematically, it can be written Tr^TT* _i (S) , where tt„ denotes the projection of R onto the column (domain) list K. Thus SQUARE seems based on functions rather than the logical predicate calculus. Translation. Our method of syntax-directed translation into SIBYL ran into four kinds of problems. 1. Peculiarities of SIBYL. These are not serious . For example, a unary list in SIBYL must have a trailing comma, so a typical list like * Namelist -» _£ name ( _^ name) * )_ must be translated ** j[ name (j_ name) * , ) 2. Context sensitivity in the object language. Since our grammar is context free, context sensitivity appears in the semantics , where it must be handled by the SIBYL transla- tion (for example, declaration and use of tuple variables in 73 ALPHA, handled by run-time structures). This problem seems fundament ally insoluble. 3 . Complexity of language structures (e.g., comparisons involving sets in VL and SQUARE) or of the SIBYL implementa- tion (the VL implicit linkage) . This is to a certain extent unavoidable, since we often want languages to be complicated. In many cases complexity is due to one of the other three problems. When writing the SIBYL code, one must try to make it as readable as possible. 4. Unsuitable, inconsistent, or inelegant syntax. Some syntax problems can be easily solved by simply rewriting the grammar (for example, using dnamelists instead of desclists in VL relation declarations). The problems are more serious when the syntax doesn't accurately reflect the semantics at an important point (an example: the ALPHA relation name, which can act either as a bona fide relation, or as tuple variable taking values in that relation) . This leads to complicated translations of the offending form, in order that the form can be used in all of its ways . There are several possible solutions: (1) Break the form up; e.g., 'form' becomes ' forml' and 'form2', with separate translations. (2) Translate the form into enough structure to take care of all uses , then perhaps apply procedures in each separate case (e.g., the SQUARE ' srclist 1 ). This was our usual strategy. 74 (3) Permit more than one translation of a form. For example , * Name ■> ** 1 ** 2 ; ; ; ; ; Later we might have * Widget ■* ... name . . . ** . . . name**l . . . and also * Nubbin -* ... name . . ... name /wx z We did not use this method, but it seems promising SIBYL In general, SIBYL worked very well in semantic defini- tion. It is powerful and flexible enough to translate most object language features, and has good facilities for using a brute force approach when all else fails (procedures and ! before strings). Nevertheless, improvements can be made, to which end we suggest the following extensions, some of which are used in this report: 1. Define ~ to be a "super-equality" operator which never distributes over lists; A==B is true if and only if A and B have identical values . 2. Permit names to be a special data type, or otherwise allow coercion of names to their values to be suppressed. This is possible now by putting names in lists or procedure brackets, but perhaps something less messy can be thought up. 3. Allow do . . . @ . . . od for loops of the form 75 4. Add more record processing facilities. For example, if R = (A: 10, B:20), then R(l) might be defined as (A: 10) . 5. Extend _V_ and _A to give the maximum and minimum of two numeric values . 6. If SIBYL is extensively used in language definition, someone should design a package of routines for this purpose. It might also be useful to find a way of building semantic level numbers into strings; e.g. 1' DELETE 1 or 1DELETE for a nonterminal, 0' = ' or 0= for a terminal, and 2 1 -*- 1 or 2+ for a symbol used in the grammar. 7. It would be useful to find a way to simulate alternate modes of indexing in SIBYL. For example, in SIBYL, single elements, rows, and columns of a 2-dimensional array are referenced by expressions of the form A(i)(j), A(i) , and (A@j) respectively. In many languages expressions of the form A(i,j), A(i,%), and A(%,j) are standard. These could be used in SIBYL by making A a procedure that references an array-type data area. Comments . These language descriptions provide constructive defini- tions: if a working SIBYL interpreter is available, these definitions can be used to implement the languages (under the assumption, of course, that errors have been removed). Of great help to any such project would be the availability of a package of routines designed to help in language definitions , as was discussed above. For example, such a package should include standard set operations. 76 SIBYL did a good job of defining languages, and the grammar proved to be both flexible and clear. In both cases some improvements can be made, as detailed in the sections above. The actual translation process took about two weeks for each language. The first step was to obtain a syntax; then the right sides of productions were translated into SIBYL. By this time it became clear that certain changes should be made to the syntax, and the process was repeated. Finally the procedures were designed, and again some modifica- tions of earlier work became desirable . In this way some of the most complicated language features were redesigned as many as three times. Under these conditions, the grammar helped by naturally structuring the problems, and the versatility of SIBYL proved sufficient for all translation tasks . The great bulk of the real problems in translating these three languages were caused by a small percentage of the language forms. It should be possible to decrease the amount of SIBYL code by over a third by eliminating only a few of the language features (for example, by simplifying SQUARE argument lists). This is important, because it has been estimated that the vast majority of queries actually used in a real -world environment are simple [3] . A useful query language can therefore be implemented (perhaps for demonstra- tion or experimental purposes) much more cheaply than, say, all of ALPHA. BIBLIOGRAPHY [lj Aho, Alfred V., and Jeffrey D. Ullman, The Theory of Parsing, Translation, and Compiling, Vol. I: Parsing , Prentice-Hall, Inc. (1973). [2] Boyce, R. F., D. D. Chamberlin, W. F. King, and M. M. Hammer, "Specifying Queries as Relational Expres- sions," Proc ACM SIGPLAN/SIGIR Interface Meeting (1973), pp. 31-40. [3] , "Specifying Queries as Relational Expres- sions: the SQUARE data sublanguage," Proc 1974 ACM-SIGFIDET Workshop , Ann Arbor, Mich. (1974), pp. 249-64. [4] Chamberlin, D. D., and R. F. Boyce, "SEQUEL: A struc- tured English query language," Comm ACM 18 (1975), pp. 621-8. [5] Codd, E.F., "A Relational Model of Data for Large Shared Data Banks," Comm ACM 13 (1970). [6] , "A Data Base Sublanguage Founded on the Relational Calculus," Proc 1971 ACM-SIGFIDET Workshop on Data Description, Access and Control , ACM (1972); also IBM Research RJ893 (1971). [7J , "Seven Steps to Rendezvous with the Casual User," IBM Research RJ 1333 (1974). 78 [8] Date, C. J., An Introduction to Database Systems , Addison-Wesley (1975). [9] Kampen, G. R., "A Formal Definition of the SIBYL Pro- gramming Languages," Univ. of Illinois Dept . of Computer Science, UIUCDCS-R-77-852 (1977). [10] , "A SIBYL Tutorial" (available from the author, Dept. of Computer Science, Univ. of Illinois) [ll] Lacroix, Michel, and Alain Pirotte, "Example queries in relational languages," Technical Note N107, MBLE Research Laboratory, Brussels, Belgium (1978). [12] Martin, James, Computer Database Organization , Prentice- Hall, Inc. (1975). [13] Schubert, R. N. , "The VL Relational Data Sublanguage for an Inferrential Computer Consultant," Univ. of Illinois Dept. of Computer Science, UIUCDCS-R- 77-846 (1977). BIBLIOGRAPHIC DATA SHEET 1. Report No. UIUCDCS-R-78-936 3. Recipient's Accession No. 4. Title and Subtitle A TRANSLATION OF RELATIONAL DATABASE QUERY LANGUAGES INTO SIBYL 5. Report Date July 1978 6. 7. Author(s) 8- Performing Organization Rept. David W. Dawson N 9. Performing Organization Name and Address Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61810 10. Project/Task/Work Unit No. 11. Contract/Grant No. 12. Sponsoring Organization Name and Address 13. Type of Report & Period Covered 14. 15. Supplementary Notes 16. Abstracts This report contains a formal description and comparison of three very different query languages designed for use with relational databases: Codd's ALPHA language, Michalski's VL, and SQUARE. Each language is defined by a syntax-directed translation schema that maps queries into semantically equivalent statements in SIBYL, an experimental programming language. The report includes an informal summary and evaluation of each of the query languages, of the translation approach to language definition, and of SIBYL as a tool for the formal definition of semantics. 17. Key Words and Document Analysis. 17a. Descriptors syntax, formal semantics, query language, relational data base, programming language, language definition 17b. Identifiers/Open-Ended Terms 17c. COSATI Field/Group 18. Availability Statement 19. Security Class (This Report) UNCLASSIFIED 21. No. of Pages 20. Security Class (This Page UNCLASSIFIED 22. Price FORM NTIS-35 ( 10-70) USCOMM-OC 40329-P7 1 wnv 1 J 1978 AUG itft.