Bn 
 
 
MAlHEMATiC 
 
« N TRMC 1R C U lAT 1 ON S DBOOKSTACKS 
 
 The Person borrowng this «* before 
 
 be charged a minim 
 
 each non-returned o ^ ^ , 
 
 ^^l-*r^5«-»- o,-B,, 
 
 f \llinoi. and ore pr 
 
 law and Procedure. _. 333-8400. 
 
 - TO BMW; ejuBiJ u*—*-* 
 
 APR 25 
 
 below previous due date- 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/relationaldataba913kimw 
 
^^ y^rt No. 
 
 fl/aTZ 
 
 :&f 
 
 UIUCDCS-R-78-913 
 
 UILU-ENG 78 1703 
 
 RELATIONAL DATABASE SYSTEM IMPLEMENTATIONS — A SURVEY 
 
 by 
 
 Won Kim 
 
 .5? 
 
 o 
 
 January 1978 
 
 NSF-OCA-MCS73-07980-000031 
 
 m 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 University of Illinois 
 ^na-ChamHUgfl 
 
Report No. UIUCDCS-R-78-913 
 
 RELATIONAL DATABASE SYSTEM IMPLEMENTATIONS — A SURVEY 
 
 by 
 
 Won Kim 
 
 January 1978 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801 
 
 * 
 
 This work was supported in part by the National Science Foundation under 
 
 Grant No. US NSF-MCS73-07980. 
 
ACKNOWLEDGEMENT 
 
 This author is deeply indebted to Professor David J. Kuck for the 
 numerous helpful discussions and suggestions on the subject mat- 
 ters of this paper. 
 
 ABSTRACT 
 
 This paper surveys the overall structure and design of some of 
 the prototype relational database systems frequently referenced 
 in the literature, and then examines in some detail some inter- 
 esting approaches that have been taken in implementing various 
 requisite features for a comprehensive relational database sys- 
 tem. 
 
 TABLE CF CONTENTS 
 
 1 . Introduction 
 
 2. System Architecture and Design 
 
 3. Storage Structures and Access Strategies 
 
 4. Query Processing and Optimization 
 
 5. Support of User Views 
 
 6. Integrity and Authorization Control 
 
 7. Concurrency Control 
 
 8. System Checkpoint and Recovery 
 References 
 
1. INTRODUCTION 
 
 A series of papers by E.F. Codd between 1970 and 1971 (1, 2, 3. 
 k) provided a great impetus to the development of relational 
 database systems and associated research activities. A number 
 of prototype systems were implemented to demonstrate the feasi- 
 bility for supporting high level, non-procedural, set-oriented 
 data sublanguages. At the same time a number of systems were 
 developed to serve as database access and storage subsystems 
 for implementing high level relational data sublanguages. More 
 recently efforts have been directed toward implementing compre- 
 hensive relational database systems that incorporate solutions 
 to various specific problems that have been identified. Table 
 1.1 lists a number of prototype relational systems that are fre- 
 quently referenced in the literature. 
 
 A comprehensive relational database system must not only provide 
 interfaces for high level, non-procedural data sublanguages for 
 both the application programmers and remote terminal users, but 
 also support various users' views of the stored database, enforce 
 data integrity and authorization constraints, control concurrent 
 updates of a shared database, and provide transaction and system 
 recovery. Efficient data access strategies and storage struc- 
 tures and query processing techniques must also be employed to 
 meet the response-time requirements of on-line terminal users. 
 Also a good report generator should be provided to display the 
 results of a query in a highly stylized format. 
 

 Readers wishing for a tutorial on relational database are refer- 
 red to Date (6) and Chamberlin (7). 
 
 Table 1.1 Some Prototype Relational Database Systems 
 
 MADAM (MacAIMS Data Management System) — Project MAC, MIT 
 
 RDM3 (Relational Data Management System) -- MIT 
 
 DAM AS (Data Management System) — MIT 
 
 PDMS (Relational Data Management System) — 
 
 General Motors Research Lab., Warren, Mich. 
 
 IS/l (Information System 1) — 
 
 IBM UK Scientific Centre, Peterlee, England 
 
 PRTV (Peterlee Relational Test Vehicle) -- 
 
 IBM UK Scientific Centre, Peterlee, England 
 
 RM (Relational Memory) — 
 
 IBM Cambridge Scientific Center, Cambridge, Mass. 
 
 XRM (Extended Relational Memory) — 
 
 IBM Cambridge Scientific Center, Cambridge, Mass. 
 
 QUERY -BY -EXAMPLE -- IBM Yorktown Heights Research Lab., N.Y. 
 
 SEQUEL -- IBM San Jose Research Lab., Calif. 
 
 GMI3 (Generalized Management Information System) — 
 
 IBM Cambridge Scientific Center, Cambridge, Mass. 
 MIT Sloan School of Management, Cambridge, Mass. 
 
 3QUIRAL (Smart Query Interface for Relational Algebra) — 
 University of Utah 
 
 RENDEZVOUS -- IBM San Jose Research Lab., Calif. 
 
 ZETA/TORUS (Toronto Understanding System) -- 
 University of Toronto 
 
 INGRES (Interactive Graphics and Retrieval System) — 
 University of California at Berkeley 
 
 SYSTEM R -- IBM San Jose Research Lab., Calif. 
 
^his paper first describes the overall architecture and design 
 of some of the prototype systems, and then examines some of the 
 approaches that have been taken in implementing the various re- 
 quisite features for a comprehensive relational system. 
 The illustrative examples used in this paper are based on the 
 following database of employees and their departments i 
 
 EMP (NAME, DNO, JOB, SAL, AGE, MGR) » primary key is NAME 
 DEPT (DNO, LOC, NEMPS) ; primary key is DNO 
 
 Each EMP tuple contains the employee name, department number, 
 job name, salary, age, and manager's name. Each DEPT tuple 
 contains the department number, location (floor), and the num- 
 ber of employees in the department. 
 
 Some of the relational database terms used in this paper are 
 listed in Table 1.2 along with their equivalents found in the 
 literature. 
 
 Table 1.2 Some Relational Database Terms 
 
 base relation -- schema, data model, table, stored relation 
 
 view — subschema, external data model, derived relation, 
 
 implied relation, virtual relation, defined relation 
 
 column, field -- attribute, domain, component 
 
 tuple -- row, entry 
 
 range variable -- tuple variable, entry variable 
 
 selection — restriction 
 
 projection 
 
 join -- semi join, natural join, composition 
 
 statistical or aggregate function — libray function, built-in 
 
 function 
 
2. SYSTEM ARCHITECTURE AND DESIGN 
 
 This section describes the overall architecture and design of 
 a number of prototype relational systems. Systems treated in de- 
 tail include MADAM, IS/l, PRTV, RM, XRM, ZETA, INGRES, and SYS- 
 TEM R. Other systems that are mentioned in Table 1.1 are only 
 briefly described below. 
 
 General 'Motor's RDMS (8) supports a data sublanguage based on 
 the relational algebra. It has been used in several applica- 
 tions for engineering and commercial data analysis. MIT's RDMS 
 (9) is an upgraded version of MADAM, and is widely in use with- 
 in the administrative departments at MIT. MIT's DAMA3 (10, 11) 
 and Utah's SQUIRAL (12) have made interesting contributions to 
 the area of query optimization (see section k) . IBM's SEQUEL 
 (13, I**-, 15) and QUERY-BY-EXAMPLE (16, 1?) implement two of the 
 best developed high level, non-procedural data sublanguages, 
 SEQUEL (18) and QUERY -BY -EXAMPLE (19), respectively. Both sys- 
 tems use XRM as data access and storage subsystem. GMI3 (20), 
 which is being used at MIT for modeling New England energy re- 
 sources, was developed by extending the SEQUEL prototype to ac- 
 commodate a multi-user environment. RENDEZVOUS (21) is a natu- 
 ral language understanding system that supports a relational 
 view of data. 
 
 2.1 MADAM (MacAIMS Data Management System) (22, 23) 
 
 MADAM, developed as a part of Project MacAIMS at MIT's Project 
 MAC, is generally acknowledged to be the first system that sup- 
 
ported an n-ary relational model of data and a set-oriented data 
 sublanguage based on the relational algebra. MADAM was imple- 
 mented on MULTIC3 in PL/l , the language in which MULTIC3 itself 
 was written. Fig. 2.1.1 shows the overall structure of MADAM. 
 The user-oriented subsystem consists of a query language inter- 
 preter and display handler. The division of the system into a 
 set of procedure and data segments concerned with sets of data 
 elements (e.g., name, salary, job, department number) and another 
 set of procedure and data segments concerned with sets of rela- 
 tions (e.g., EMP, DEPT) is the consequence of the view taken by 
 the designers of MADAM that information one may store in a data- 
 base consists of sets of data elements and sets of relations 
 among the data elements. 
 
 Associated with each type of data elements are a data segment 
 and a data element module. Associated with each possible data 
 structure for storing a relation (e.g., tree, linked list, array) 
 are a relation data segment and a relation strategy module. 
 Fig. 2.1.1 is more fully explained in section 3,1. 
 
 2.2 Peterlee IS/l (24) 
 
 IS/l was the first relational prototype implemented by IBM. 
 This system, developed at the IBM UK Scientific Centre, Peterlee, 
 England, supports a query language based on the relational alge- 
 bra and was implemented in PL/l. Fig. 2.2.1 illustrates the 
 architecture of the system. 
 
In the syntax section, a user query is translated into a string 
 of intermediate codes. This string is a procedure call and asso- 
 ciated arguments. 
 
 The procedure section consists of a set of PL/l procedures that 
 support the set of user commands. A simple interpreter is pro- 
 vided to serve as the interface between the syntax section and 
 the procedure section. The interpreter isolates the arguments 
 from each string of intermediate codes, pushes them onto an arg- 
 ument stack, and passes control to the procedure being called. 
 The database facilities section consists of a set of PL/l pro- 
 cedures that access and store views and base relations. 
 IS/l was designed to allow the users to easily extend the func- 
 tional capabilities of the base system by writing their own PL/l 
 procedures and linking them to the system. The syntax section 
 is extended as a side effect of the procedure section extension. 
 IS/l supports user views on the base relations by either storing 
 their definitions in strings of intermediate codes (implicit 
 views) or by maintaining the materialized sets of tuples for the 
 views as separate base relations (explicit views). This last 
 point is elaborated upon in section 5.1. 
 
 2.3 PRTV (Peterlee Relational Test Vehicle) (25) 
 
 PRTV was developed at the IBM UK Scientific Centre, Peterlee, 
 England, based on their experience with IS/l. 
 This system was implemented in two major sections called the 
 top end and the bottom end. 
 
8 
 
 The top end was implemented primarily in the string processing 
 language MP/3, and partially in PL/l . This section of the sys- 
 tem parses the user language, processes and maintains the sys- 
 tem directories, performs the mapping between relations and 
 storage structures, links user-provided PL/l procedures, and 
 performs query optimization. The designers of I3/l and PRTV 
 feel very strongly that although data sublanguages based on 
 the relational calculus or relational algebra are "relational- 
 ly complete (4)," they are by no means "functionally complete." 
 Thus 13/1 and PRTV allow the users to extend the system to suit 
 their application requirements that system designers may not 
 foresee. The query optimization techniques implemented for 
 PRTV are due to the works of Hall (26), and are described in 
 some detail in section 4.1. 
 
 The bottom end was implemented in a mixture of PL/l and IBM 370 
 assembler language. This section consists of a set of proce- 
 dures that perform all storage and relational operations on the 
 database. 
 
 The user language PRTV supports is based on the relational alge- 
 bra. However, this query language requires a sufficient know- 
 ledge of algebra on the part of the users, and thus restricts 
 the usability of PRTV as a stand-alone system. In fact, PRTV 
 was used as a low level subsystem for the Urban Management Sys- 
 tem which created a database of 60 megabytes with relations of 
 more than 60,000 tuples and degrees as high as 200. This data- 
 base is thought to be the largest relational database to have 
 had regular use (one year). 
 
Fig. 2.1.1 Architecture of MADAM 
 
 Relation Strategy Modules 
 
 names 
 
 jobs 
 
 salaries 
 
 procedure 
 segment 
 
 column 
 names 
 
 
 
 
 
 
 data 
 segment 
 
 relation 
 
 EMP 
 
 relation 
 DEPT 
 
 Fig. 2.2.1 Architecture of IS/l 
 
 algebra-based 
 user language 
 
 syntax 
 section 
 
 interpreter 
 
 procedure 
 section 
 
 database 
 facilities 
 
 syntax 
 extension 
 
 procedure 
 extension 
 
 views 
 
10 
 
 2.4 RM (Relational Memory) 
 
 XRM (Extended Relational Memory) (2?» 28, 15) 
 
 Both RM and XRM were developed at the IBM Cambridge Scientific 
 Center for use as low-level, procedural relational interfaces 
 for host language systems, or as intermediate interfaces in im- 
 plementing high-level, non-procedural relational languages. 
 RM allows variable length byte strings to be stored and accessed 
 by numerical identifiers in a set of pages called the "entity 
 space." Binary relations whose data elements are either inte- 
 gers or numerical identifiers for the byte strings stored in 
 the entity space are stored and operated on in another set of 
 pages called the "relation space." Note that RM supports only 
 binary relations. Binary relations are logically sufficient, 
 for relations of higher degrees can be straightforwardly decom- 
 posed into sets of semantically equivalent binary relations* 
 Binary relations, however, cannot adequately support large, 
 formatted databases (chapt. 11 of 6). 
 
 XRM was developed by extending RM to support n-ary relations, 
 n-ary tuples are stored in the RM entity space, and inversions 
 that provide efficient associative access to these n-ary tuples 
 are stored in the RM relation space. A detailed discussion of 
 XRM is given in section 3.3* 
 
 2.5 ZET A/TORUS (29, 30) 
 
 ZETA is a relational prototype under implementation at the Uni- 
 versity of Toronto. Three principal levels constitute this sys- 
 tem, as shown in Fig. 2.5.1, 
 
11 
 
 The lew-level, called MINIZ, implements elementary commands that 
 directly eperate on single relations. These include commands te 
 create and destroy relations, insert and delete tuples, retrieve 
 and update columns within a tuple, and mark selected tuples of 
 a relation. The most interesting command at this level is the 
 mark command. A mark is a unary relation (i.e., a list) of tuple 
 identifiers of those tuples of a relation satisfying a selection 
 operation. This facility is employed in the intermediate level 
 for the support of views (see section 5*2) and for the implemen- 
 tation of queries involving two or more relations. 
 The intermediate level, consisting primarily of the EXECUTOR 
 module, performs the translation of the intermediate form of 
 the user commands into a sequence of lew level commands. The 
 EXECUTOR consists of an interpreter, a set of "schema" proce- 
 dures, and a set of utilities. The interpreter breaks down the 
 intermediate form of a user command into a sequence of utility 
 operations. For each lew level command, a corresponding utility 
 is provided at this level. The interpreter invokes the schema 
 procedures whenever data pertaining to user views or base rela- 
 tions are to be retrieved or accessed. 
 
 The high level ef ZETA consists of a host-programming language 
 system (HLS) and a self-contained language system (SLS). The 
 HLS supports a SEQUEL-like data sublanguage embedded in Pl/l . 
 The SLS provides a syntax- table-driven compiler-compiler to 
 tailor a SEQUEL-like stand-alone query language for particular 
 applications. 
 
12 
 
 The low level was used as a back end database system for a nat- 
 ural language understanding system called TORUS (Toronto Under- 
 standing System). 
 
 2.6 INGRES (Interactive Graphics and Retrieval System) (31) 
 
 INGRES, developed at the University of California at Berkeley, 
 is implemented on top of the UNIX operating system in C, the 
 language in which UNIX itself is written. INGRES supports QUEL 
 (Query Language), a high level language based on the relational 
 calculus for terminal users; EQUEL (Embedded QUEL), QUEL embed- 
 ded in C for application programmers; and CUPID, a graphics- 
 oriented casual user language. 
 
 INGRES is implemented as a sequence of UNIX processes as shown 
 in Fig. 2.6.1. Communication between a UNIX process and the 
 tree of subprocesses it spawns is accomplished via one-direction 
 communication links called "pipes." Each pipe is written into 
 by one process and read by another. 
 
 P rocess 1 is created when INGRES is invoked from the UNIX. It 
 should be noted that INGRES is treated as a user process under 
 the UNIX operating system environment. Process 1 is the inter- 
 active terminal monitor which maintains a workspace for a user 
 to formulate and edit a collection of INGRES commands. The con- 
 tents of the user workspace are passed down pipe A as a string 
 of ASCII characters for execution by later processes. 
 Process 2 translates the input query passed down pipe A into a 
 parse tree using a lexical analyzer and a parser. The qualifi- 
 
13 
 
 cation portion of the query is converted to an equivalent Boole- 
 an expression in conjunctive normal form. A conjunctive normal 
 form expression is amenable to the query modification techniques 
 that INGRE3 employs for the support of user views and authoriza- 
 tion and integrity control (32, 22). Query modification is sim- 
 ply the merging of the parse tree for a query with the stored 
 parse trees for the view definitions, and authorization and inte- 
 grity constraints. Section 6.2 gives an example of the query 
 modification technique in action. Process 2 also contains rou- 
 tines for a crude concurrency control. It is noted here that 
 views and selective authorization control have not been imple- 
 mented, and that only simple integrity assertions involving 
 single range variables and no aggregate functions (e.g., average, 
 sum, count) are currently enforced. 
 
 Process 3 accepts from process 2 a string of tokens representing 
 a user query, and performs the following two functions i 
 
 1. decomposition of a query involving more than one range 
 variable (multi-variable query) into a sequence of inter- 
 actions involving only one range variable (one-variable 
 query). For details, see (3*0 and section 4.2. 
 
 2. processing of a one-variable query. 
 
 Retrieve commands are completely processed in process 3» Update 
 commands (Replace, Delete, and Append), however, are converted 
 to appropriate Retrieve commands to simply isolate the qualify- 
 ing tuples. These tuples are then spooled onto a temporary file 
 which is passed to process 4 for the actual update operations. 
 
14 
 
 The reasons for this deferred update are explained in section 
 k.2. All utility commands (e.g., create and destroy base rel- 
 ations and secondary indices) are simply passed down to process 
 4. Note in Fig. 2.6.1 that the results of a retrieve command 
 are returned in a stylized format directly to the user termi- 
 nal. 
 
 Process k is a collection of overlays that support deferred up- 
 date and all utility commands. 
 
 Note in Fig. 2.6.1 that all commands are passed to the right 
 via pipes A, B, and C, whereas all error messages are passed 
 to the left via pipes D, E, and F. 
 
 When INGRES is invoked from an EQUEL program, the executable 
 module for the EQUEL program (produced by a pre-compiler and 
 a compiler) replaces the interactive terminal and process 1 
 of Fig. 2,6.1. 
 
 The size of each INGRES process largely explains the particu- 
 lar process structure chosen by the designers of INGRES. 
 
 Table 2.6.1 INGRES Space Requirements (in bytes) 
 
 Access Methods (AM) — UK 
 
 EQUEL (precompiler, compiler, AM) — 4lK 
 
 Process 1 — 10K 
 
 Process 2 and AM — 5^K 
 
 Process 3 and AM — 56K 
 
 Process 4 (AM, 8 overlays) — 171K 
 
15 
 
 The size of a UNIX process is limited to 64K bytes on a PDP 
 llAo, and to 128K bytes on a PDP llA5 or 11/70. 
 The INGRES designers believe that INGRES should appear as a 
 user job to UNIX so that INGRES would operate on a standard 
 UNIX. Consequently, INGRES contains no facilities for mem- 
 ory management and I/O? UNIX performs these functions. This 
 design decision should be contrasted with that for SYSTEM R 
 (see below). Readers wishing for a more detailed account of 
 the UNIX operating system should consult (35) • 
 
 2.7 SYSTEM R (36) 
 
 SYSTEM R, developed at the IBM San Jose Research Laboratory 
 as a vehicle for research in relational database, is by far 
 the most sophisticated and comprehensive prototype system 
 implemented. This system provides extensive data control fa- 
 cilities, including selective authorization, integrity assert- 
 ions, transactions as units of consistency (see section 6.3), 
 This system implements much of the results of the recent in- 
 vestigations in concurrency control (37 • 38, 39) • including 
 such notions as the levels of consistency, granularity of 
 locks, and locking at the logical level as well as at the 
 physical level (see section 7*2). The system also provides 
 facilities for transaction backout, and transaction and sys- 
 tem recovery (see section 8.2). SYSTEM R, rather than rely- 
 ing on the automatic paging of virtual memory in VM/370, pro- 
 vides its own memory management and I/O facilities. This de- 
 
16 
 
 cision was made to satisfy the implementation requirements for 
 a novel recovery scheme (see section 8.2). 
 
 Fig. 2.7.1 shows the overall structure of SYSTEM R. The RDI 
 (Relational Data Interface) is the principal user interface, 
 consisting of a set of operators for data definition and mani- 
 pulation and a set of operators for database locking and trans- 
 action. A special RDI operator called SEQUEL is provided to 
 support terminal users and PL/l host language system users. 
 Users are allowed to write programs on top of the RDI for the 
 support of other relational interfaces (e.g., QUERY BY EXAMPLE) 
 or hierarchical or network interfaces. 
 
 The RD3 (Relational Data System) is the subsystem that implements 
 the RDI operators. The RD3 provides authorization and integrity 
 control facilities, supports various users' views of the data- 
 base, maintains the catalogs of external names, and performs an 
 access path optimization. 
 
 The RSI (Relational Storage Interface) is the internal interface 
 which provides operators for tuple-at-a-time access to base rel- 
 ations, and also operators for data definition, data recovery, 
 and transaction management. 
 
 The RSS (Relational Storage System) is a complete storage subsys- 
 tem that supports the RSI operators. The RSS maintains access 
 paths to the base relations, and secondary indices on selected 
 columns of the base relations. The RSS maintains transaction 
 consistency by performing locking, deadlock detection, and back- 
 out. The RSS also performs memory management and transaction 
 and system recovery. 
 
17 
 
 Fig. 2.5.1 Architecture of ZETA 
 
 HLS 
 
 S 
 
 v / 
 
 3LS 
 7ZZ. 
 
 high level 
 
 EXECUTOR 
 
 interpreter 
 
 schema 
 procedures 
 
 utilities 
 
 intermediate 
 level 
 
 MINIZ 
 
 basic operations 
 on relations 
 
 low level 
 
 Fig. 2.6.1 Process Structure of INGRES 
 
 A, B, C, D, E, and F are pipes. 
 Fig. 2.7.1 Architecture of SYSTEM R 
 
 SEQUEL SEQUEL 
 
 embedded in PL/l stand-alone 
 
 emulators for 
 hierarchical, network, or 
 other relational interfaces 
 
 Relational Data Interface 
 
 Relational Data System 
 
 Relational Storage Interface 
 
 Relational Storage System 
 
18 
 
 3. STORAGE STRUCTURES AND ACCESS STRATEGIES 
 
 This section examines the storage structures and access strate- 
 gies used in MADAM, PRTV, RM, XRM, INGRES, and SYSTEM R. 
 
 3.1 MADAM 
 
 The following discussion refers to Fig. 2.1.1. 
 
 Each type of data elements (e.g., name, salary, job, column name 
 of a relation) is associated with a different DEM (Data Element 
 Module) and a different DES (Data Element Segment). Furthermore, 
 each type of data structure for storing a relation (e.g., tree, 
 list, array) is associated with a different R3M (Relation Strate- 
 gy Module) and a different RDS (Relation Data Segment). 
 A DEM converts appropriate type of data elements from their nat- 
 ural form to their standard form in which all data elements are 
 stored. For example, the standard form for data elements of 
 type "date," "11/25/77" or "Nov. 25, 1977," may be "771125." 
 Since most data elements are either character strings or inte- 
 gers, a D3M (Data Strategy Module) for character strings and a 
 D3M for integers are provided. A DSM generates a unique refer- 
 ence number for each data element. The reference number for an 
 integer is the integer itself, whereas the reference number for 
 a string of characters is determined by an algorithm which pre- 
 serves the lexical ordering between different character strings. 
 The fixed-length (one word) reference number is used in all sub- 
 sequent operations on the data element. Since the reference 
 numbers preserve the lexical ordering between the data elements 
 
19 
 
 they represent, the frequent comparison and ranking operations 
 can be directly performed on them. Moreover, these operations 
 can be performed much more quickly on the single-word long ref- 
 erence numbers than on the variable-length data elements. 
 Within a DES (Data Element Segment) all data elements of one 
 type are stored in a binary search tree. 
 
 The designers of MADAM take the view that for each type of data 
 structure for storing a relation a different set of procedures 
 should be provided to support operations on the relation. Each 
 RSM is designed to perform all database operations on only one 
 type of data structure. A potential difficulty of this approach 
 arises when it becomes necessary to operate on two relations 
 stored in different types of data structure. A canonical form 
 storage structure for relations is therefore defined as the 
 standard two dimensional array representation of a relation. 
 Each RSM is allowed to accept a relation stored in canonical 
 form and produce the output in canonical form. Thus when it 
 becomes necessary to operate on two relations stored in differ- 
 ent storage structures, one of the relations is converted to 
 canonical form by its RSM. The RSM for the other relation then 
 performs the actual operation using the canonical form as one 
 of its inputs. 
 
 All data elements and domain names of a relation are uniformly 
 referred to by their reference numbers. 
 
20 
 
 3.2 PRTV 
 
 PRTV implements various techniques to increase storage space uti- 
 lization and to reduce data retrieval time. These include the 
 use of a standard format, sorting, and suppression and compres- 
 sion of data. 
 
 PPTV stores all relations sorted on the values of some column. 
 Sorting makes the execution of operations such as join more ef- 
 ficient, and allows for the suppression of duplicate leading 
 fields. 
 
 Data suppression is performed as follows i In each record, lead- 
 ing fields are suppressed if they are identical to the corres- 
 ponding fields of the preceding record; and the remaining fields 
 are stored along with an indication of the number of fields act- 
 ually stored. For example, the records shown in Fig. 3.2.1 are 
 suppressed and stored as shown in Fig. 3*2.2. 
 
 Fig. 3.2.1 Records Before Suppression 
 
 12 2 
 
 1 2 3 
 
 1 1 3 
 
 2 1 3 
 
 Fig. 3.2.2 Records of Fig. 3.2.1 After Suppression 
 
 12 2 
 
 1 3 
 
 2 13 
 
 3 2 13 
 
21 
 
 PRTV performs data compression as follows i The first record in 
 each block of records is stored in its entirety; in each of the 
 subsequent records, all fields that have not been suppressed 
 are Exclusive-Ored with the corresponding fields of the first 
 record i only the non-zero bytes that result from this operation 
 are stored, along with a bit map that indicates the positions of 
 these bytes. For example, the records of Figs. 3.2.3a and 3.2.3b 
 are Exclusive-Ored to yield the record shown in Fig. 3.2.3c. 
 
 Fig. 3.2.3 PRTV Data Compression 
 
 a) 01011111 01011100 01100101 01100011 
 
 b) 01011111 01110101 01100101 01000011 
 XOR 00000000 00101001 00000000 00100000 
 
 c) 01010000 00101001 00100000 
 bit map 
 
 3.3 RM, XRM 
 
 RM uses one set of pages to store data elements, and another set 
 of pages to store binary relations. A data element is referenced 
 by a numeric identifier consisting of a page address and an off- 
 set within the fixed header area in the page (see Fig. 3«3«1)« 
 The referenced header area position contains the actual byte ad- 
 dress for the data element. A binary relation identifier also 
 consists of a page address and an offset within the page header 
 area. The header area position contains the byte address of the 
 first tuple of the relation. RM maintains a linked list of all 
 tuples of a relation in increasing order of the left column values 
 
22 
 
 of the tuples. All tuples with the same left column values, in 
 turn, are represented in a linked list (linked to the common 
 left column value) in increasing order of the right column values 
 (see Fig. 3.3.2). 
 
 XRM stores all n-ary tuples of a relation in the RM entity space 
 (set of data pages). Each field of an n-ary tuple is represent- 
 ed as a fixed length integer, which is the actual value for an 
 integer column, and which is the numeric identifier for a string 
 of characters stored in another data page for a non-integer col- 
 umn. 
 
 XRM provides a class relation for all data elements belonging to 
 the same column. This is implemented via a binary relation in 
 RM relation space (set of relation pages). Each tuple of this 
 relation consists of the hashed value of a data element and the 
 numeric identifier for the data element. A class relation is 
 used for quick access from the value of a data element to its 
 numeric identifier. 
 
 XRM also maintains a binary relation for each n-ary relation 
 so that a particular tuple of the relation may be directly ac- 
 cessed given its primary key value. Each tuple of this binary 
 relation contains the hashed value of the key and the identifi- 
 er of the n-ary tuple. 
 
 Rapid access to a tuple with a particular value in a particular 
 field is also supported by an inversion relation . There is also 
 a master relation which contains information about all other rel- 
 ations. The relation identifier of a relation is then the iden- 
 tifier of a tuple in the master relation which describes the rel- 
 ation. 
 
23 
 
 Fig. 3.3.1 RM Data Page Layout 
 
 TID1 TID2 
 
 header area 
 
 TID 1 tuple identifier 
 
 Fig. 3.3.2 RM Relation Page Layout 
 
 header area 
 
 RID 1 relation identifier 
 
 vl< v2 < v3 
 v5< v6 
 
 Fig. 3.^.1 INGRES Page Layout for 
 
 Keyed Storage Structures 
 
 next 
 primary 
 page 
 
 next 
 
 overflow 
 
 page 
 
 next 
 line number 
 
 tuple 1 
 
 tuple 2 
 
 TID1 TID2 Mine table 
 
 TID 1 tuple identifier 
 
zk 
 
 Fig. 3.^.2 Structure of a Keyed INGRES File 
 
 primary pages 
 
 a primary 
 page 
 
 overflow 
 pages 
 
 next 
 
 primary 
 page 
 
 next 
 
 overflow 
 
 page 
 
 next 
 line no 
 
 Fig. 3.5.1 SYSTEM R Storage Structure 
 
 overflow 
 tuple 
 
 TID 
 
 header area 
 
25 
 
 3.4 INGRES 
 
 INGRES supports five storage structures. Four of these are 
 keyed, that is the storage location of any tuple within a file 
 that contains a relation is determined as a function of the 
 value of the tuple's primary key. These are termed hashed, 
 ISAM, compressed hashed, and compressed ISAM. The non-key 
 structure is termed a heap, and stores tuples of a relation 
 independently of their key values. 
 
 The non-key structure involves a low overhead, and is intended 
 for relations of small cardinality, or temporary relations cre- 
 ated as intermediate results during query processing. New 
 tuples are simply appended to the end of a file in the order 
 supplied, and their byte offsets within the file become their 
 tuple identifiers. 
 
 The four keyed storage structures share a common page layout 
 shown in Fig. 3,b,l, A tuple identifier consists of a page ad- 
 dress and a line number in the fixed line table (header area 
 in RM/XRM pages). The referenced entry in the line table con- 
 tains the byte address of the tuple. The "next primary page" 
 pointers chain together all primary pages. A relation is init- 
 ially represented as a linked list of primary pages in a file. 
 The "next overflow page" pointers link together all overflow 
 pages containing tuples that do not fit in the primary pages. 
 The general structure of a keyed file is illustrated in Fig. 
 3.^.2. Each keyed file, consisting of primary and overflow 
 pages, is dedicated to the storage of a single relation. This 
 decision not to cluster tuples from different relations that 
 
26 
 
 may be accessed together was made so that the INGRES access meth- 
 ods and query decomposition procedures (see section 4.2) would 
 be manageable. The decision was also a direct consequence of 
 the fact that INGRES provides no memory management or I/O facili- 
 ties of its own. Logically adjacent pages in a UNIX file are 
 not necessarily physically adjacent, and only a small number of 
 tuples from different relations may be grouped in a page of 512 
 bytes that UNIX supports. 
 
 In a hashed file, tuples are distributed randomly throughout the 
 file according to the values of their primary keys. 
 In an ISAM file, tuples of a relation are initially sorted to 
 produce an ordering on a particular key. The heighest key values 
 in all the primary pages comprising the file are recorded in a 
 multilevel directory in several pages following the primary pages 
 within the file itself. This directory is static, that is, it 
 does not change during subsequent insertion, deletion, and update 
 of tuples. Tuples in a primary page and its overflow pages are 
 not maintained in sort order. An INGRES ISAM file is similar to 
 IBM's ISAM. A good discussion on ISAM can be found in Chapter 
 7-6 of (40). 
 
 An ISAM file is useful in situations where the key value is spe- 
 cified as falling within a range of values, whereas a hashed 
 file is appropriate when access is conditioned on a specific key 
 value. 
 
 In both keyed storage structures mentioned above, fixed length 
 tuples are stored. INGRES implements a compressed hashed file 
 
27 
 
 and a compressed ISAM file. These schemes use a hashed file 
 and an ISAM file in conjunction with some data compression tech- 
 niques (41). Currently blanks and portions of a tuple which 
 are identical to the preceding tuple are suppressed. The com- 
 pressed files are useful when an increase in storage utiliza- 
 tion outweighs the overhead involved in encoding and decoding 
 the variable length field values of tuples. 
 
 3.5 SYSTEM R 
 
 The SYSTEM R storage structure is illustrated in Fig. 3* 5.1. 
 A segment is an address space in VM/370 virtual memory. Each 
 segment consists of a set of pages. A page is the unit of 
 data transfer between the main memory and secondary storage 
 devices. Each segment may contain many relations! however, 
 each relation is entirely contained within a segment. 
 Unlike in INGRES, each page may contain tuples from different 
 relations, so that extra page accesses may be avoided when 
 tuples from different relations are accessed together? however, 
 each tuple is entirely contained within a page. 
 Some pages are reserved for the storage of tuples of user rel- 
 ations (data pages), while other pages are intended for the 
 storage of index structures and system catalogs. Also several 
 types of segment are supported to satisfy different requirements 
 with appropriate overheads. For example, one type of segment 
 is used for the storage of shared data. Facilities must be pro- 
 vided for concurrent access, transaction backout, and segment 
 
28 
 
 recovery for the data of this segment. A different segment type 
 is used for the storage of temporary relations generated as 
 intermediate results of a query processing. The overhead incur- 
 red for the support of a segment for shared data is not necessary 
 for this type of segment. Another type of segment is used for 
 maintaining the access path structures or internal catalogs. 
 Each tuple of a relation is stored as a contiguous sequence of 
 field values along with a prefix. The prefix contains the rela- 
 tion identifier, the number of fields in the tuple, pointer 
 fields that implement the links (see section 4.3), and the num- 
 ber of pointer fields. As in RM/XRM and INGRES, each tuple is 
 associated with a tuple identifier which is a concatenation of 
 a page number and an offset within a header area. Tuples may 
 be moved within a page to allow space compaction, and their new 
 byte addresses entered in their slots in the fixed header area. 
 Thus the tuple identifiers generated before the space compac- 
 tion correctly reference the corresponding tuples after the 
 space compaction. In case a tuple is updated to a longer total 
 value and space is no longer available in the current page, it 
 is tagged with a new tuple identifier in a nearby page (see Fig. 
 3.5.1). If the new tuple overflows again, the original tuple 
 is tagged with a tuple identifier in a page that does not cause 
 the tuple to overflow. A tuple access via its tuple identifier 
 thus involves at most two page accesses. 
 
 Each tuple identifier is generated and used by the system (RSS) 
 to refer to tuples from index structures and to maintain pointer 
 chains. This is further discussed in section 4.3. 
 
29 
 
 k. QUERY PROCESSING AND OPTIMIZATION 
 
 A number of interesting works have been done on optimizing the 
 implementation of high level data sublanguages based on the rela- 
 tional calculus or relational algebra. In (4-) Codd presents an 
 algorithm, called "Codd's reduction algorithm," for translating 
 a relational calculus expression into a sequence of relational 
 algebra operations. Palermo made certain improvements in the 
 efficiency of the reduction algorithm and implemented the alge- 
 bra operators in APL (^2). Smith and Chang (12), Hall (26), and 
 Pecherer (4-3) have explored techniques for translating an alge- 
 braic query expression into an equivalent, but more efficient, 
 form. Astrahan and Chamberlin (15) have proposed techniques for 
 decomposing the block-structured SEQUEL statements. Rothnie (10, 
 11) and Wong and Youssefi (3*0 have implemented iterative decom- 
 position techniques for reducing a calculus-based query involving 
 multiple range variables into a sequence of queries involving 
 only single range variables. Senko, et. al. (44) and Tsichritzis 
 (4-5) have proposed schemes in which a non-procedural expression 
 is compiled into a set of procedures in a hierarchical or network 
 oriented language. In (46) Gotlieb presents and compares a num- 
 ber of algorithms for implementing a join. In (47) Blasgen and 
 Eswaran present an evaluation of several techniques for imple- 
 menting join, projection, and selection operations on the basis 
 of their costs in secondary storage accesses. 
 
 The optimizing techniques implemented for PRTV, INGRES, and SYS- 
 TEM R are examined in this section. 
 
30 
 
 k.l PRTV 
 
 PRTV generates a query tree for a sequence of relational algebra 
 operations specified by the user. The PRTV optimizer then reorg- 
 anizes the query tree according to the following principles i 
 
 1. Selections are moved as far down the tree (toward the leaves) 
 
 as possible for the earliest possible execution. 
 .?. Projections not requiring the results to be sorted are 
 also moved as far down the tree as possible. 
 
 3. Projections requiring sorted results, however, are moved 
 as far up the tree as possible for the latest possible 
 execution. 
 
 4. Projections of a projection on the same relation are 
 merged into a single projection. 
 
 5. Expressions involving several set operators are rearranged 
 according to such standard rules as commutativity and dis- 
 tributivity . 
 
 6. Common subtrees within a query tree may be realized once 
 into an explicit relation so that duplication of the oper- 
 ations may be avoided. Note here that the optimizer esti- 
 mates the cost of creating an explicit relation and the 
 cost of duplicating the operations, and chooses the cheaper 
 alternative. 
 
 7. The optimizer also selects among alternative implementations 
 of the relational operators. 
 
31 
 
 k.2 INGRES 
 
 Query processing in INGRES is done in process 3 via the OVQP 
 (one-variable query processor) and DECOMP (decomposition) rou- 
 tines. The OVQP is invoked to completely process a query which 
 involves a single range variable. The DECOMP is called to re- 
 duce a query involving more than one range variable into a se- 
 quence of one-variable queries. The DECOMP makes iterative 
 calls to the OVQP to process the intermediate one-variable que- 
 ries, and merges the results of these calls. The OVQP and DE- 
 COMP are discussed in detail below. 
 
 OVQP 
 
 This program consists of a module called STRATEGY which selects 
 an optimal access strategy for retrieving tuples from a single 
 relation, and a module called SCAN which evaluates each retrieved 
 tuple against the predicates of the query and creates the output 
 list of values from the qualifying tuples. 
 
 DECOMP 
 
 Three techniques are used to reduce a multi-variable query to 
 
 a query involving one less range variables! tuple substitution, 
 
 one variable detachment, and reformatting. 
 
 1) tuple substitution 
 
 One range variable in the query is selected, and the relation 
 
 over which the variable ranges is scanned one tuple at a time. 
 
 All references to the selected variable in the query are replaced 
 
 by appropriate field values from the retrieved tuple. The origi- 
 
32 
 
 nal query now has one less variable. Note that the variable 
 
 which ranges over a relation with the smallest cardinality is 
 
 selected for substitution. 
 
 As an example, consider how the following 2-variable query is 
 
 reduced by tuple substitution. 
 
 (Find the names of employees who earn more than their 
 managers) 
 
 RANGE OF E, M 13 EMP 
 RETRIEVE (E.NAME) 
 WHERE E.SAL > M. SAL AND 
 E.MGR = M.NAME 
 
 The two range variables, E and M, in this QUEL statement both 
 
 range over the EMP relation. Suppose that E is selected for 
 
 substitution, and that a tuple (Jones, 2, salesman, 18K, 30, 
 
 Smith) of EMP is retrieved. Substitution of Jones, 18K, and 
 
 Smith for E.NAME, E.3AL, and E.MGR, respectively, yields the 
 
 following one variable query « 
 
 RANGE OF M IS EMP 
 RETRIEVE (Jones) 
 WHERE 18K> M.SAL AND 
 Smith = M.NAME 
 
 2) one variable detachment 
 
 Recall that the qualification portion of a QUEL statement is 
 converted to an equivalent conjunctive normal form expression 
 (see section 2.6). Let us define a simple clause to be a con- 
 junct of the form (variable. column comparator constant). 
 For example, E.SAL = 16K is a simple clause, but E.MGR = M.NAME 
 is not. 
 One variable detachment is done in two steps as follows t 
 
33 
 
 First, for each range variable V in a simple clause C, create a 
 temporary relation T by first performing a selection over R, the 
 relation over which V ranges, with predicate C, and then perform- 
 ing a projection over all columns of R that are involved in the 
 target list and all remaining conjuncts of the qualification. 
 For example, consider the following QUEL statement! 
 
 (Find the names of employees over 50 years of age who work 
 on the third floor and earn more than their managers) 
 
 RANGE OF E, M IS EMP 
 RANGE OF D IS DEPT 
 RETRIEVE (E.NAME) 
 WHERE E.SAL > M.SAL AND 
 
 S.MGR = M.NAME AND 
 
 E.DNO = D.DNO AND 
 
 D.LOC = 3 AND 
 
 E.AGE > 50 
 
 Temporary relations Tl and T2 are created for the two simple 
 
 clauses in the query, D.LOC ■ 3, and E.AGE > 50. Tl is defined 
 
 over DNO, while T2 is defined over NAME, SAL, MGR, and DNO. 
 
 RANGE OF D IS DEPT 
 RETRIEVE INTO Tl (D.DNO) 
 WHERE D.LOC = 3 
 
 RANGE OF E IS EMP 
 
 RETRIEVE INTO T2 (E.NAME, E.SAL, E.MGR, E.DNO) 
 
 WHERE E.AGE > 50 
 
 Each temporary relation thus generated is stored in a hashed 
 file (see section 3.*0« The hash key is chosen as follows i 
 Non-simple clauses of the form V.D = VI. Dl in the original que- 
 ry are collected, where V and VI are range variables, D and Dl 
 are column names, and V ranges over temporary relation T. If 
 this collection is empty, T is stored in a heap; otherwise, a 
 D is arbitrarily selected. 
 
34 
 
 Consider our example query again. For variable D, which may now 
 
 range over Tl, our collection contains only one clause, D.DNO = 
 
 E.DNO. Therefore, Tl is stored hashed on DNO. For variable E, 
 
 which may now range over T2, our collection consists of two 
 
 clauses, E.MGR = M.NAME, and E.DN0=D.DN0. T2 may then be hashed 
 
 on either MGR or DNO. T2 may have to be restructured with a 
 
 different hash key, if this arbitrary selection of a key turns 
 
 out to be inappropriate (see reformatting below). 
 
 Second, all simple clauses are removed from the original query, 
 
 and all the relations over which the variables involved in the 
 
 simple clauses are replaced by the newly generated temporary 
 
 relations. Remember these temporary relations are horizontal 
 
 and vertical subsets of the original relations. The result of 
 
 one variable detachment is shown below. 
 
 RANGE CF D IS Tl 
 
 RANGE OF E 13 T2 
 
 RANGE OF M 13 EMP 
 
 RETRIEVE (E.NAME) 
 
 WHERE E.SAL > M.SAL AND 
 
 E.MGR - M.NAME AND 
 
 E.DNO = D.DNO 
 
 3) reformatting 
 
 Consider as an example the 3-variable query above. Suppose that 
 Tl is hashed on DNO, T2 is hashed on MGR, and Tl has the smallest 
 cardinality. Then range variable D is selected for tuple substi- 
 tution. When tuple substitution is made, E.DNO = D.DNO becomes 
 E.DNC = constant, and the resulting simple clause can now be re- 
 moved by one variable detachment. Since one variable detachment 
 must be performed for each tuple of Tl over which D ranges, re- 
 
35 
 
 formatting T2, over which E ranges, to be hashed on DNO may sig- 
 nificantly improve its processing. Therefore, if T2 was hashed 
 on MGR (see one variable detachment above), it is rehashed on 
 DNC here. 
 
 Before concluding this subsection, a few reasons for the deferred 
 update mentioned in section 2.6 are explained. The examples used 
 in (31) are very illuminating, and are largely transplanted here. 
 
 1 ) secondary index anomaly 
 
 Suppose each employee earning under 15K is to be given a 10 per- 
 cent raise. Suppose also that a secondary index on SAL is main- 
 tained as shown in Fig. 4.2.1a. 
 
 Fig. 4.2.1 Secondary Index Anomaly 
 
 a) SAL TIP b) SAL TIP c) SAL TIP 
 
 12K Tl 13K T2 13. 2K Tl 
 
 13K T2 13. 2K Tl 14. 3K T2 
 
 Suppose update is immediate. The OVQP makes use of this index j 
 first it updates the tuple identified by Tl, and then updates 
 the index entry (12K Tl ) to (13.2K Tl ) to reflect this. After 
 the next qualifying tuple, T2, is processed, the OVQP encounters 
 Tl again. In this way a qualifying tuple may be updated an in- 
 definite number of times! See Figs. 4.2.1b and 4.2.1c. 
 
 2) base relation anomaly 
 
 Suppose each employee earning more than his or her manager is to 
 take a 10 percent pay cut. Consider the database fragment shown 
 in Fig. 4.2.2. 
 
36 
 
 Fig. 4.2.2 Base Relation Anomaly 
 
 NAME SAL MGR 
 
 Smith 10K Jones 
 Jones 8K Killer 
 Brown 9.5K Smith 
 
 Suppose update is immediate here. The only employee qualifying 
 for a pay cut is Smith. Smith now earns 9K. Note that Brown 
 now earns more than Smith, and as a consequence of the immediate 
 update of Smith's tuple, Erown has to take a pay cut, tool 
 
 It has been pointed out that deferred update incurs a heavy over- 
 head, and an optional immediate update (in process 3) is now be- 
 ing considered. 
 
 4.3 SYSTEM R 
 
 A brief discussion on the access paths that SYSTEM R supports is 
 in order before the SYSTEM R optimizer is examined. 
 SYSTEM R supports three access paths for tuples of relations; 
 images, links, and relation scans. An image is conceptually a 
 binary relation consisting of values from one or more fields, 
 called sort fields , of a relation in ascending or descending 
 order, and their corresponding tuple identifiers. At most one 
 clustering image for a relation is supported, in which tuples 
 whose sort values are "close" are stored physically "close" to 
 one another. Access to any tuple of a relation with an image 
 support is accomplished by keying on the sort field value. 
 Access to a sequence of tuples within a range of sort field 
 
37 
 
 values can also be done efficiently on a relation with an image 
 support. Each image on a relation resides in one or more pages 
 within the segment in which the relation is stored. The pages 
 in which an image resides form the nodes of a balanced B-tree 
 (for a comprehensive discussion on B-trees, see Chapter k.5 of 
 k8 ). A link, on the other hand, is a doubly linked list 
 that connects tuples in one relation (a unary link), or a tuple 
 of one relation with a sequence of tuples of another relation 
 (a binary link). A unary link may be used to maintain for a rel- 
 ation a tuple ordering that is not ascending or descending on 
 any sort fields. It may also be used to connect all tuples of 
 a relation for fast access without the time penalty of an en- 
 tire relation scan (see below). A binary link may significantly 
 enhance the processing time for a join. For example, if a bin- 
 ary link is maintained between each DEPT tuple and all EMP 
 tuples in our example database based on value matches in DNO 
 columns of both relations, to respond to a query "Find the names 
 and salaries of all employees in department 5." an appropriate 
 DEPT tuple is first found and then its binary link is followed 
 to retrieve all required EMP tuples. Finally, a relation scan 
 is simply a sequential search through all data pages of a seg- 
 ment that contains the relation. Since tuples from different 
 relations may be stored in one page (see section 3»5)i the pre- 
 fix of each tuple retrieved must be examined to determine if 
 the tuple belongs to the required relation. 
 
38 
 
 Now we are ready to understand the SYSTEM R optimizer. 
 The objective of this optimizer is to minimize the expected num- 
 ber of page fetches from secondary storage. The optimizer selects 
 an optimal method of executing a query as follows « first, it 
 classifies the query into one of several types according to the 
 presence of some particular language features (e.g., join, group 
 by); second, it examines the system catalogs to determine the 
 set of access paths available for the relations involved; third, 
 it determines the set of "reasonable" methods for executing the 
 query from the set of access paths found; and fourth, it evalu- 
 ates the expected cost formula for each "reasonable" method, 
 and selects the minimum cost method. The parameters of the cost 
 formula include the relation cardinality R, the number of data 
 pages occupied by the relation D, the average number of tuples 
 per data page T (= R/D), the image cardinality I (the number 
 of distinct sort field values in a given image), and the esti- 
 mated CPU cost for tuple comparisons H. 
 
 Some of the "reasonable" methods for executing a simple query 
 involving a single relation are illustrated below along with 
 their expected cost. To be concrete, the following SEQUEL que- 
 ry for finding the names and salaries of all programmers over 
 25 years of age is considered. 
 
 SELECT MAO, SAL 
 FRCP.'! BMP 
 
 WHERE JOB = programmer AND 
 AGE > 25 
 
 1) use of a clustering image that matches a predicate of the 
 
 form (column = constant). An image "matches" a predicate 
 
39 
 
 if the sort field of the image matches the column name of the 
 predicate. If EMP has the clustering image on JOB, this op- 
 tion is applicable. The expected cost to retrieve all result 
 tuples is R/(T x I) page accesses. 
 
 2) use of a clustering image that matches a predicate of the form 
 (column comparator constant). If EMP has the clustering image 
 on AGE, this option is applicable. Assuming that half the 
 tuples in the relation satisfy the predicate, the expected 
 cost is R/(2 x T) . 
 
 3) use of an image (non-clustering) that matches a predicate of 
 the form (column = constant). Since each tuple requires a 
 page access, the expected cost is R/l. 
 
 *0 use of an image that matches a predicate of the form (column 
 comparator constant). The expected cost is R/2. 
 
 5) use of a clustering image that does not match any predicate. 
 The image is scanned, and each tuple retrieved is evaluated 
 against all predicates. The expected cost is R/T + H x R x N, 
 where N is the number of predicates in the query. 
 
 6) use of an image that does not match any predicate. The ex- 
 pected cost is R + H x R x N. 
 
 For a more complete discussion on the processing of a simple 
 query, see (47) . 
 
 Now let us examine some of the "reasonable" methods for executing 
 a query involving a join of two relations. A detailed discussion 
 on these methods is given in (47). 
 
40 
 
 Again to be concrete, let us consider a SEQUEL query that finds 
 
 the names and jobs of all employees who work on the third floor 
 
 and earn over 20X. 
 
 SELECT NAME, JOB 
 FROM EMP, DEFT 
 WHERE EMP. SAL > 20 K AND 
 
 EMP.DNO = DEPT.DNC AND 
 
 DEPT.LCG = 3 
 
 1) use images on join fields of both relations i 
 
 a. Scan along the image on DEPT.DNC to retrieve a DEPT tuple 
 whose LOC value is 3« 
 
 b. Scan along the image on EMP.DNO to retrieve all EMP tuples 
 whose DNC values match the current DEPT.DNO value and 
 whose SAL values are greater than 20K. 
 
 c. Repeat steps a and b until the image on DEPT.DNC is com- 
 pletely scanned. 
 
 2) sort both relations along the join fields » 
 
 a. Scan EMP using its clustering image, and create a tempora- 
 ry relation Rl with NAME, JOB, and DNO fields from those 
 tuples of EMP whose SAL values are greater than 20K. 
 
 b. Scan DEFT using its clustering image, and create a tempora- 
 ry relation R2 with DNC field from those tuples of DEPT 
 whose LOC is J, 
 
 c. Sort Rl and R2 on DNC fields. 
 
 d. Apply method 1) above. 
 
 3) use multiple passes : 
 
 a. Scan DEPT using its clustering image, and insert into a 
 main memory data structure W the DNC values of all DEPT 
 
41 
 
 tuples whose LOC values are 3. W is a unary relation in 
 our example. If W can hold only N such tuples, the small- 
 est N tuples are placed in W during current pass. The 
 remaining tuples are inserted into W during subsequent 
 passes. 
 
 b. Once W is constructed, scan EMP using its clustering image. 
 For each EMP tuple T whose SAL value > 20K, T. NAIVE and 
 T.JC3 values are placed in the output list, if T.DNO is 
 found in W. 
 
 c. If W was not large enough to hold all qualified DEPT tuples, 
 steps a and b are repeated. 
 
 4) use the TID algorithm : 
 
 a. Construct a sorted list Rl of the tuple identifiers of 
 the EMP tuples whose SAL values > 20K using the image on 
 EMP. SAL. 
 
 b. Construct a sorted list R2 of the TIDs of the DEPT tuples 
 whose LOC values = 3 using the image on DEPT. LOC. 
 
 c. Perform a simultaneous scan over the DEPT.DNO and EMP.DNC 
 images to find the TID pairs of tuples whose DNO values 
 match. If Tl and T2 of each such TID pair (Tl, T2) are 
 in Rl and R2, respectively, then the tuple referenced by 
 Tl is retrieved and its NAME and JOB field values are 
 placed in the output list. 
 
 The use of a binary link in processing a join was explained ear- 
 lier in this subsection. Remember that each of these ' options is 
 applicable only if its requisite access paths are available. 
 
h2 
 
 SYSTEM R currently relies on the users to dynamically create and 
 delete all access paths maintained on their base relations. Tech- 
 niques are being investigated that will shift some of these bur- 
 dens away from the users. 
 
 5. SUPPORT CF USER VIEWS 
 
 An individual user's view of the database may be derived from the 
 underlying relations by permuting their columns, or applying sel- 
 ection, projection, or join operations. A view can be any deriv- 
 able relation as far as supporting retrieval operations is con- 
 cerned. For supporting storage operations (insert, delete, up- 
 date), however, a view must be identical to the underlying base 
 relations, except that individual tuples and non-primary key 
 fields may be omitted in the view. This restriction is essential 
 in propagating the effects of the storage operations down to the 
 underlying base relations. Systems that support user views in- 
 clude I 3/1, PRTV, SEQUEL, ZETA, and SYSTEM R. A good introduc- 
 tion to the notion of view is given in (49) and Chapter 10 of 
 (6). This section examines the view mechanisms implemented for 
 I 3/1, ZETA, and SYSTEM R. 
 
 5.1 13/1 
 
 In is/l a view definition is translated into an equivalent string 
 of intermediate codes (see section 2.2) and stored away (an impli- 
 cit view). When this view is accessed for the first time, its 
 definition is retrieved and executed to yield the desired rela- 
 tion. The relation thus materialized is then stored as a new 
 base relation (an explicit view), so that subsequent accesses to 
 
<o 
 
 the view may be accomplished without the overhead necessary in 
 executing its definition. 
 
 When a base relation is updated, Io/l makes all explicit views 
 defined on the relation implicit again, so that the effect of 
 the update is reflected in the views when they are recreated. 
 If loss of information may result in recreating an explicit 
 view, however, updates are made directly on the explicit view. 
 The effect of storage operations on a view is not propagated 
 to its underlying base relations. It is noted here that IS/l 
 also allows views to be defined on other views to any number 
 of levels. 
 
 When the storage space becomes full, 13/1 automatically deletes 
 some of the explicit views. Explicit views are selected for 
 deletion according to their usage counts, sizes, whether or not 
 they can be recreated without loss of data, and the estimated 
 computing effort required to recreate them. 
 
 5.2 ZSTA 
 
 ZETA supports two types of views, namely a snapshot and an auto- 
 matic derivation. The implementation schemes for these are due 
 to Czarnik (50) . 
 
 A snapshot is a time invariant picture of a portion of the data- 
 base at a particular instant of time. A snapshot does not ref- 
 lect any changes made to its underlying relations, and has to be 
 recreated each time an updated version is required. An automa- 
 tic derivation, on the other hand, is a time-dependent view that 
 is updated whenever its underlying relations are updated. 
 
44 
 
 A snapshot defined on a single relation may be implemented in 
 any of the following ways: 
 
 First, in a retrieval-oriented environment, the marking scheme 
 of the low level system (see section 2.5) can be used. An ac- 
 cess to a snapshot then involves indirect accesses to the ap- 
 propriate tuples via their tuple identifiers in the mark. 
 Second, in an update-oriented environment, the overhead required 
 in updating the marks becomes prohibitively high, and a snapshot 
 is implemented as an explicit relation. This scheme sacrifices 
 storage space in favor of an improved processing time. 
 Third, a compromise scheme can be used to avoid the excessive 
 processing time overhead and the storage space penalty of the 
 above two methods. A snapshot is initially implemented as a 
 mark. Before a change is made to a tuple of the underlying 
 relation, the tuple is copied and logged onto a separate data 
 area. The mark entry is then modified to point to the logged 
 tuple. When the number of updates to the mark reaches a thresh- 
 old, the snapshot is converted to an explicit relation and the 
 mark is destroyed. 
 
 An automatic derivation is implemented by storing its definition 
 and executing it each time an access is made to the view. The 
 marking scheme can also be used to implement an automatic deri- 
 vation defined on a single relation. 
 
 5.3 INGRES 
 
 INGRES does not currently support user views. However, the query 
 modification technique (see section 6.2) developed for the inte- 
 
45 
 
 grity and access control (32, 33) can be used for the support of 
 user views in future versions of the system. 
 
 5.4 SYSTEM R 
 
 SYSTEM R allows definition of views via the query facilities of 
 the SEQUEL data sublanguage. Queries may be made against a view, 
 and other views may be defined on it. When a user destroys a 
 view, the view and all other views defined on it are destroyed. 
 Storage operations against a view are supported only if the view 
 is a selection and/or a projection on a single base relation and 
 it also contains the primary key of the base relation. Effects 
 of storage operations against a view are propagated down to its 
 underlying base relation. Effects of storage operations on the 
 base relations immediately become visible to all views defined 
 on them. 
 
 SYSTEM R stores a view definition as a Pre-Optimized Package 
 (POP) containing its parse tree and an optimal execution scheme 
 to materialize the data (determined by the optimizer as discussed 
 in section 4.3). SYSTEM R performs an operation against a view 
 by merging the parse tree for the qualification portion of the 
 operation into the parse tree for the view definition, then op- 
 timizing the resulting tree, and finally performing the indicated 
 operation on the relevant tuples of the base relation. Note that 
 if base relations or the access paths maintained on them are de- 
 leted, the POPs of all views defined on these base relations are 
 invalidated, and must be recreated from their defining SEQUEL 
 statements. 
 
46 
 
 6. INTEGRITY AND AUTHORIZATION CONTROL 
 
 It is essential that an integrated database system in a multi- 
 user environment prevent users from accessing and modifying the 
 database in unauthorized ways (authorization or access control) 
 and prevent authorized or unauthorized users from violating the 
 semantic accuracy of the database through their carelessness or 
 lack of knowledge (integrity control). However, only a few sys- 
 tems are known to support both the integrity and authorization 
 control facilities. Integrity control is provided in QUERY-BY - 
 EXAMPLE, INGRES, and SYSTEM R; while authorization control is 
 provided in MADAM, SEQUEL, QUERY-BY-EXAMPLE, and SYSTEM R. This 
 section deals with the authorization mechanism of MADAM, and the 
 authorization and integrity control facilities of INGRES and SYS- 
 TEM R. r'or a good introductory discussion on database security 
 and integrity, see Chapters 23 and 24 of (6). 
 
 6.1 MADAM 
 
 An authorization mechanism due to Owens (53) is implemented for 
 ^• f ADAM with the aid of the MULTIC3 protection facilities (54). 
 The creator of a relation is allowed to grant to other users 
 various types of read and write capabilities on any column of 
 the relation. In increasing order of accessibility, the read 
 capabilities that may be granted are null (N), manipulate (M), 
 statistical (3), and print (P). N permits no access, and is the 
 system default. M permits a user to perform relational algebra 
 operations on the protected column. The user may or may not be 
 
^7 
 
 allowed to see the results of the operations. S allows a user 
 to perform statistical functions (e.g., average, sum, count) and 
 see the results. P allows a user to read the protected column. 
 Each type of read capability is considered to be a proper subset 
 of the next more accessible capability. Thus, a user with the 
 3 capability also has the M capability, and a user with the P ca- 
 pability has the 3 and M capabilities as well. 
 
 The write capabilities that may be granted are, in increasing or- 
 der of accessibility, null (N), append (A), write (W), and change 
 access (C). N permits no access, and is the default. A allows 
 a user to append tuples to the logical end of the protected rela- 
 tion. T,r allows a user to update the column values. C permits a 
 user to manipulate the authorization information associated with 
 the column being protected. 
 
 furthermore, before the system allows any of the above-mentioned 
 read and write capabilities, any combination of the following 
 tests that may be specified by the creator of the relation or 
 those users to whom the G capability has been granted must be 
 satisfied : 
 
 1) list of user characteristics such as the user-id, terminal-id, 
 day of the week, and so on. Examples of authorization restric- 
 tions with this type of test may be "if user-id = Smith and term- 
 inal-id = 57 and day = Friday, then access = M" or "if user-id 
 
 t Jones, then access = P." 
 
 2) content of the column being protected. For example, "if SAL 
 = 30X, then access = N." 
 
 3) context in which the column being protected appears. For ex- 
 
48 
 
 ample, "SAL must not be retrieved in conjunction with the corres- 
 pondinr NAME values" and "3AL must not be retrieved in conjunc- 
 tion with NAME = Jones." 
 
 *0 password. The user must provide an appropriate password in 
 response to a prompt from the system. 
 
 6.2 INGRES 
 
 The current version (version 5) of INGRES does not support selec- 
 tive authorization of access, and also enforces only simple inte- 
 grity assertions that involve single range variables and no aggre^ 
 gate functions. So-called query modification techniques have 
 been developed, however, to support user views, selective authori 
 zation, and more general integrity assertions, and they are cur- 
 rently being incorporated into the system. 
 
 Before discussing the query modification techniques in detail, 
 the limited database control features implemented in the current 
 system are examined. First of all, INGRES supports a set of 
 Databases. Each Database has a special user called the Database 
 Administrator (DEA). Each Database consists of a set of rela- 
 tions created and owned by the DBA, sets of relations created 
 and owned by other users authorized to interact with the Data- 
 base, and a set of catalog relations containing information 
 about all other relations in the Database, their columns, and 
 the secondary indices maintained on them. The relations created 
 by the DBA are shared among all users in the Database; however, 
 the relations created by other users are not shared. The DBA 
 
^9 
 
 can destroy any relations created by any other users. 
 
 Let us now see how the proposed query modification techniques 
 
 can be uniformly applied to implement the views, authorization, 
 
 and general integrity assertions. 
 
 Smith defines a view VEMP on EMP as follows : 
 
 RANGE OF E 13 EKP 
 
 DEFINE VEEP (E.NAKE, E.DNO, E.3AL) 
 
 WHERE E.SAL > 20 K 
 
 An integrity constraint is in force on EMP: 
 
 RANGE 0? E IS EMP 
 
 INTEGRITY CONSTRAINT IS E.SAL < 30K 
 
 Smith is given the authority to update salaries of employees 
 
 he manages : 
 
 RANGE OF E IS EMP 
 
 PROTECT EMP FOR ALL (E.3AL; E.NAME, E.DNO) 
 
 WHERE E.MGR = Smith 
 
 ( Smith is authorized only to read the names and department 
 
 numbers of his employees) 
 
 Now suppose that Smith wishes to give Jones a 20 percent pay 
 
 raise by issuing the following QUEL statement: 
 
 RANGE OF S IS VEMP 
 
 REPLACE E (SAL = 1.2 * S.SAL) 
 
 WHERE E.NAME = Jones 
 
 Since this query is against a view, it is first modified as 
 
 follows : 
 
 RANGE OF E IS EMP 
 REPLACE E (SAL = 1.2 * E.SAL) 
 WHERE E.NAME = Jones AND 
 E.SAL > 20K 
 
 Note that the predicates of the query and the view definition 
 
 are merged, and also that the range variable E now ranges over 
 
 the base relation EMP. 
 
50 
 
 ^he integrity constraint is next merged into this modified query 
 to yield the following: 
 
 RANGE CF E IS EMP 
 
 REPLACE E (SAL = 1.2 * E.3AL) 
 
 WHERE E.NAME = Jones AND 
 
 E.3AL > 20 K AND 
 
 1.2 * E.3AL -c 30K 
 
 Finally, the authorization restriction is merged into the query, 
 
 and the resulting query is ready to be passed down to process 3» 
 
 RANGE CF E 13 EM? 
 
 REPLACE E (3AL = 1.2 * E.3AL) 
 
 WHERE E.NAKE = Jones AND 
 
 E.3AL "> 20K AND 
 
 1.2 * E.SAL < 30K AND 
 
 E.MGR = Smith 
 
 6.3 SYSTEM R 
 
 SYSTEM R supports extensive data control facilities, including 
 authorization, integrity assertions, transactions, and triggers. 
 
 1) authorization 
 
 SYSTEM R does not provide for a centralized database administra- 
 tor. Each user creates and deletes his own base relations, and 
 defines and destroys views on them. A user also grants and re- 
 vokes selected privileges on his views and base relations to 
 other users. These privileges may be any combinations of the 
 following: read, insert tuples, delete tuples, update selected 
 fields of tuples; specify access paths, triggers, and integrity 
 assertions; and grant or revoke any of these privileges to still 
 other users. A detailed account of the SYSTEM R authorization 
 mechanism is given in (51). 
 
51 
 
 The nested granting of privileges and their subsequent revocation 
 complicate the implementation of the authorization mechanism. 
 When a user creates a base relation, he is fully authorized to 
 perform any operation on it, and also to grant any or all privi- 
 leges on it to any other user. The update privilege is held on 
 each column of the base relation. The privileges that any user 
 nolds on a base relation are either grantable or not grantable. 
 The set of privileges that the user may grant to other users is 
 the union of all grantable privileges he holds on the relation. 
 The set of privileges that the user holds on the relation, on 
 the other hand, is the union of all grantable and non-grantable 
 privileges he holds on the relation. A user may revoke only 
 those privileges he previously granted. When a user A revokes 
 a grantable privilege P from another user B, P is withdrawn from 
 user B and all other users to whom user B may have granted P. 
 Note that even when user A revokes P from user B, user B may 
 still retain ? if another user C previously granted P to user B. 
 Consider the following situation, assuming that Smith is the cre- 
 ator of the EMP relation: 
 
 1 . Smith grants to Jones the read and insert privileges on 
 EMP with the grant option. 
 
 2. Smith grants to Brown the read privilege on EMP with the 
 grant option. 
 
 3. Jones grants to Brown the insert privilege on EMP. 
 
 Erown is then authorized to insert tuples into the EMP relation, 
 make queries against the relation, and grant the read privilege 
 to other users. Note also that the read privilege permits Erown 
 
52 
 
 to define views on the EMP relation. 
 Now if Smith revokes all privileges from Brown, Brown loses the 
 read privilege and the privilege to grant the read privilege, but 
 retains the insert privilege that Jones granted to him. If, on 
 the other hand, Smith revokes the grant privilege from Jones, 
 brown loses the insert privilege as a result. 
 
 When a user defines a view, his privileges on it are restricted 
 to only those privileges that he holds on the underlying views 
 and base relations. If the view involves more than one relation, 
 the user's privileges are defined as the intersection of the pri- 
 vileges held on all underlying relations. The user may grant a 
 orivilege ? on the view if and only if he holds a grantable privi- 
 lege P on all underlying relations. The update privilege is held 
 on a column of the view only if it is held on that column of all 
 underlying relations in which the column appears. 
 
 2) integrity assertions 
 
 SYSTEM R supports integrity assertions that describe either valid 
 states (state assertions) or valid state transitions (transition 
 assertions). For example, the assertion, "No employee should 
 earn less than 10K, " is a description for a valid state; while 
 the assertion, "No employee should be given a pay cut, " constrains 
 a state transition. 
 
 SYSTEM ?. allows both state and transition assertions to be imposed 
 on individual tuples of a relation or on sets of tuples. For ex- 
 ample, "The average salary of employees of any department must 
 not exceed 15K" is a state assertion applied to a set of tuples, 
 
53 
 
 while ,,rv he average salary of employees of any department must 
 not rise by more than 10 percent a year" is a transition asser- 
 tion on a set of tuples. 
 
 3Y3TE?" P, furthermore, allows the user to specify a sequence of 
 SEQUEL statements as a transaction , and checks for and enforces 
 integrity constraints at the end of each transaction by default, 
 so that consistency constraints may be applied to a sequence of 
 complex storage operations which may pass through states that 
 temporarily violate these constraints. For example, in transfer- 
 rins money from Smith's savings account to checking account, the 
 consistency constraint that Smith maintain a constant total bal- 
 ance is temporarily violated when the savings account is debited 
 but the checking account is not yet credited. State assertions 
 apply to the final state of the database, while transition asser- 
 tions compare the states of the database before and after the 
 transaction. 
 
 When integrity assertions are declared to be "immediate," however, 
 they are enforced immediately, even on intermediate stages of a 
 transaction. An immediate assertion on a tuple is enforced when- 
 ever a tuple is updated, while an immediate assertion on a set 
 of tuples is enforced after each SEQUEL update statement is exe- 
 cuted. SYSTEM 3 allows the user to specify save points within a 
 transaction and back up either to the beginning of the transac- 
 tion or to any save point by undoing all changes made to the data- 
 base by the transaction. The user may also establish integrity 
 points (similar to the transaction save points) within a long 
 transaction in order to have the transaction backed out only to 
 
5^ 
 
 its most recent integrity point. Integrity constraints are en- 
 forced at each integrity point within a transaction. 
 Finally, integrity assertions are dynamically created and deleted 
 by the user. The SYSTEM R approach to integrity control is des- 
 cribed in detail in (52). 
 
 3) Triggers 
 
 A trigger is a sequence of SEQUEL statements that is automatical- 
 ly executed when a pre-specified condition is satisfied. For ex- 
 ample, whenever an EMP tuple is inserted, or deleted, or its DNC 
 field is modified, the NEMP3 field of an appropriate DEFT tuple 
 (EMP.DNO = DEPT.DNO) must be modified to reflect the change in 
 the EM? relation. The following trigger, named EMPIN3ERT, auto- 
 matically increments the NEMPS field of a DEPT tuple when a new 
 EMP tuple is inserted: 
 
 DEFINE TRIGGER EMPIN3ERT 
 CM INSERT OF EMP t 
 (UPDATE DEPT 
 
 SET NEMPS = NEMPS + 1 
 
 WHERE DNC = NEW EMP.DNO) 
 
 Triggers EMFDELETE and SMPUPDATE can also be easily defined for 
 execution when an EMP tuple is deleted, and when the DNO field 
 of an EMP tuple is modified, respectively. 
 
55 
 
 7. CONCURRENCY CONTROL 
 
 Systems that support concurrent updates on a shared database in- 
 clude GMI3, INGRES, and SYSTEM R. This section discusses the 
 concurrency control implemented for INGRES and SYSTEM R. 
 
 7.1 INGRES 
 
 The current version of INGRES provides only a crude concurrency 
 control. The unit of data consistency is one QUEL statement. 
 INGRES uses physical locks on columns of relations, and avoids 
 deadlocks by not allowing an interaction to pass down to process 
 3 until it can lock all required resources. When a lock request 
 is made, the LOCK catalog is first physically locked, and then 
 checked for the presence of the lock. If the lock is not in the 
 catalog, it is placed in the catalog, and the lock request is 
 satisfied. If the lock is already in the catalog (i.e., the lock 
 is held by another concurrent user), the concurrency processor 
 of process 2 waits for a fixed interval, and then tries again. 
 The concurrency processor deletes appropriate locks from the cat- 
 alog only when it receives a termination message from process 3» 
 
 7.2 SYSTEM R 
 
 SYSTEM P. implements locking techniques at the logical level of 
 relations and tuples as well as at the physical level of pages. 
 Locking: at the logical level is implemented to support isolated 
 backout of a transaction and retrieval of "clean" data as well 
 as synchronization of concurrent updates. The isolated backout 
 
56 
 
 of a transaction refers to the data consistency constraint that 
 the backout of a transaction's updates must not undo another 
 concurrent transaction's updates. A data item remains "dirty" 
 as lonr as the transaction within which it has been updated may 
 be backed out; it becomes "clean" only when the transaction goes 
 to completion. The classic "lost update" problem is also resolved 
 by the logical level locking. 
 
 v.'hen the system locks relations, tuples, or index values at the 
 logical level, it must also acquire locks at the physical level 
 of data or index pages to insure accurate results. For example, 
 transaction Tl makes an access to the EMP relation, and transac- 
 tion m 2 to the DEPT relation. Suppose tuples from the two rela- 
 tions are stored in the same page. Tl may obtain the byte address 
 of an SMP tuple from a header area slot; and before Tl retrieves 
 the tuple, T2 may retrieve a DEPT tuple and invoke a data compac- 
 tion routine to reassign tuple addresses in the page. The tuple 
 address that Tl previously retrieved may not point to the correct 
 tuple any longer. The logical locks set on the two relations thus 
 do not guarantee accurate results. 
 
 Physical locks are held for the duration of a single SEQUEL state- 
 ment execution, while logical locks are held until they are expli- 
 citly released or to the end of a transaction. Locks may either 
 be shared or exclusive. If a transaction is only to read a data 
 item, it sets a shared lock on the data item and allows other con- 
 current transactions to read the same data item. If a transaction 
 holds an exclusive lock on a data item, no other concurrent trans- 
 actions are allowed to read or update the data item. 
 
57 
 
 SYSTEM R locks data objects at various granularities so that ap- 
 plications accessing different amounts of data are run efficient- 
 ly. Locks can be set on individual tuples, or an entire relation, 
 or even an entire segment. For a transaction requiring an access 
 to a small amount of data, it would be reasonable to lock individ- 
 ual tuples; while it may be more efficient to lock entire rela- 
 tions, if a transaction accesses a very large amount of data. 
 Unlike INGRES, which avoids deadlocks, SYSTEM R is designed to 
 periodically check for deadlock situations, and recover from 
 deadlocks by backing out one or more transactions on the basis 
 of their relative ages and on the duration of their locks. 
 SYSTEM R implements three levels of consistency in isolating 
 each transaction from the effects of other concurrent transac- 
 tions. At any level of consistency the isolated backout of a 
 transaction is guaranteed. The consistency levels differ in 
 the degree of isolation of each transaction from other concur- 
 rent transactions, and in the processing overhead. 
 A transaction at level 1 consistency may read "dirty" data, 
 and different values for the same data during the same trans- 
 action. Although it offers the least degree of isolation of 
 a transaction from others, this level incurs the least proces- 
 sing overhead, and may be entirely satisfactory when exact 
 results are not required. At this level no logical locks are 
 required for a read operation. 
 
 At level 2 consistency, a transaction reads only "clean" data? 
 however, it may still read different values for the same data 
 
58 
 
 durinr the same transaction, since the data may be updated by 
 another transaction and become "clean" before the first trans- 
 action goes to completion. If a tuple is read, a shared lock 
 must be held on the tuple for the duration of the read opera- 
 tion. 
 
 At level 3 consistency, the user sees the logical equivalent of 
 a sinrle user system. Each transaction reads only "clean" data, 
 and the same values for the same data during the same transac- 
 tion. Although it guarantees complete isolation of a transac- 
 tion from other concurrent transactions, this level incurs the 
 highest processing overhead. If a tuple is read at this level 
 of consistency, a shared lock must be held on the tuple until 
 the end of the transaction. 
 
 It is noted that if a tuple is updated, an exclusive lock must 
 be held on the tuple for the duration of the transaction for 
 all three levels of consistency. 
 
 The SYSTEM P. approach to concurrency control is given in (37 » 
 38). 
 
 8. SYSTEM CHECKPOINT AMD RECOVERY 
 
 ^ost of the early prototype relational database systems do not 
 provide functions to restore the database to a consistent state 
 in the event of a system crash. In this section the system 
 checkpoint and recovery schemes implemented in INGRES and SYSTEM 
 ?. are presented. 
 
 8.1 INGRES 
 
59 
 
 Recall that for an update operation all qualifying tuples are 
 isolated and spooled onto a temporary file in process 3 and 
 the actual update of the base relations and the secondary in- 
 dices maintained on them is performed in process b. INGRES ac- 
 complishes recovery from system crashes that leave the disk 
 intact by either destroying the temporary deferred update file 
 if process 4 has not started processing it, or by allowing the 
 deferred update file to be completely processed if process k 
 has already started processing it. 
 
 To recover from system crashes that destroy the disk files, 
 the INGRES "superuser " logs all database interactions, and 
 checkpoints the database onto tape using the UNIX backup scheme. 
 Recovery is then accomplished by restoring the last checkpoint 
 and running the log of interactions. 
 
 8.2 3Y3TEP4 R 
 
 The SYSTEM R scheme for recovery from system crashes that leave 
 the disk storage intact (i.e., only the contents of main memory 
 are destroyed) and also from those that destroy the disk storage 
 is based on the use of transaction logs and a set of segment re- 
 covery functions. 
 
 The set of segment recovery functions includes the RSI operators 
 OPEN_SEGMENT, 3AVE__3EGMENT , and RE3TCRE_3EGMENT . To efficiently 
 support these functions, two page maps, called current and backup, 
 are associated with each recoverable segment. When a segment is 
 first opened by the 0PEN_3EGMENT operator, these two page maps 
 contain identical entries. V/hen a page of the segment is access- 
 
60 
 
 ed for an update operation for the first time since the segment 
 was opened or was last saved (see below), the page is fetched 
 into a main memory buffer, a new page slot is allocated on the 
 disk, and the appropriate entry in the current page map is mod- 
 ified to reference the new page slot. The page is then updated 
 in its main memory buffer, and written out on the new page slot. 
 The 5AVE__3EGMENT operator causes all pages of a segment that 
 have been updated in their main memory buffers to be written 
 onto their newly allocated page slots. All old page slots for 
 those pages that have been updated since the last segment save 
 point are released, and the backup page map entries are set 
 equal to their corresponding current page map entries. The 
 P.E3TCRS_3SGMENT operator causes the current page map entries 
 to be set equal to their corresponding backup page map entries 
 and all newly allocated page slots to be released, since the 
 backup page map points to a consistent copy of the segment. 
 lecrments dedicated for user relations and segments intended 
 for the logs of transactions are both recovered via these re- 
 covery functions. 
 
 For recovery from disk- intact system crashes, the system fre- 
 quently checkpoints segments containing user relations. The 
 transaction log segments are checkpointed at the end of each 
 transaction, and contains both the old and new values of all 
 updated data objects. 
 
 For recovery from non-disk-intact system crashes, the system 
 not only checkpoints data segments and transaction log seg- 
 ments, but also copies the saved pages from disk to tape. 
 
61 
 REFERENCES 
 
 1. Codd, E.F. A Relational Model for Large Shared Data Banks, 
 CACM vol. 13, no. 6, June 1970, pp. 377-387 
 
 2. Codd, E.F. A Data Base Sublanguage Founded on the Relational 
 Calculus, Proc. 1971 ACM SIGFIDET Workshop on Data Descrip- 
 tion, Access and Control, San Diego, Calif., Nov. 1971 
 
 3. Codd, E.F. Further Normalization of the Data Base Relational 
 Model, Courant Computer Science Symposia, vol. 6, Data Base 
 Systems, Prentice Hall, N.Y., May 1971 
 
 4. Codd, E.F. Relational Completeness of Data Base Sublanguages, 
 Courant Computer Science Symposia, vol. 6, Data Base Systems, 
 Prentice Hall, N.Y., May 1971 
 
 5. Codd, E.F. Recent Investigations in Relational Data Base Sys- 
 tems, Proc. IFIP Congres 74, Stockholm, Sweden, Aug. 1974 
 
 6. Date, C.J. An Introduction to Database Systems, 2nd. edition, 
 Addison-Wesley, 1977 
 
 7. Chamberlin, D.D. Relational Data-Base Management Systems, 
 Computing Surveys, vol. 8, no. 1, March 1976 
 
 8. Whitney, V.K. A Relational Data Management System, Proc. 
 
 1974 ACM SIGMOD Workshop on Data Description, Access and Con- 
 trol, Ann Arbor, Mich., April 1974, pp. 55-66 
 
 9. Steuert, J., and J. Goldman. The Relational Data Management 
 System: A Perspective, Proc. 1974 ACM SIGMOD Workshop on 
 Data Description, Access and Control, Ann Arbor, Mich., 
 April 1974, PP. 295-320 
 
 10. Rothnie, J.B. An Approach to Implementing a Relational Data 
 Base Management System, Proc. 1974 ACM SIGMOD Workshop on 
 Data Description, Access and Control, Ann Arbor, Mich., 
 April 1974 
 
 11. Pothnie, J. 3. Evaluating Inter-Entry Retrieval Expressions 
 in a Relational Data Base Management System, Proc. AFIPS 
 
 1975 National Computer Conference, vol. 44, pp. 417-423 
 
 12. Smith, J.M., and P.Y.T. Chang. Optimizing the Performance 
 of a Relational Algebra Database Interface, CACM vol. 18, 
 no. 10, Oct. 1975, PP. 568-579 
 
 13. Boyce, R.F., and D.D. Chamberlin. A Structured English Que- 
 ry Language, Proc. 1974 ACM SIGMOD Workshop on Data Descrip- 
 tion, Access and Control, Ann Arbor, Mich., May 1974 
 
62 
 
 14. Boyce, R.F., and D.D. Chamberlin. Using a Structured English 
 Query Language as a Data Definition Facility, IBM Research 
 Report: RJ 1318, San Jose, Calif., Dec. 1973 
 
 15. Astrahan, M.M., and D.D. Chamberlin. Implementation of a 
 structured English Query Language, CACM vol. 18, no. 10, Oct. 
 1975, PP. 580-588 
 
 16. Zloof, Wi.M. Query By Example, Proc. AFIP3 1975 National Com- 
 puter Conference, vol. 44, May 1975 
 
 17. Zloof, M.F. Query By Example: The Invocation and Definition 
 of tables and Forms, Proc. International Conference on Very 
 Large Data Bases, Framingham, Mass., Sept. 1975 
 
 18. Reisner, F. , R.F. Boyce, and D.D. Chamberlin. Human Factors 
 Evaluation of Two Data Base Query Languages — SQUARE and 
 SEQUEL, Proc. AFIPS 1975 National Computer Conference, vol. 
 
 44, May 197 5 
 
 19. Thomas, J.C., and J.D. Gould. A Psychological Study of Query 
 By Examole, Proc. AFIPS 1975 National Computer Conference, 
 vol. 44* May 1975 
 
 20. Donovan, J.J., R. Fessel, S.S. Greenberg, and L.M. Gutentag. 
 An Experimental VM/370 Based Information System, Proc. Inter- 
 national Conference on Very Large Data Bases, Framingham, Mass. 
 Sept. 1975, PP. 549-553 
 
 21. Codd, E.F. Seven Steps to Rendezvous with the Casual User, 
 Proc. IFI? TC-2 Working Conference on Data Base Management 
 Systems, April 1974, North-Holland 
 
 22. Goldstein, R.C., and A.J. Strnad. The MacAIMS Data Manage- 
 ment System, Proc. 1974 ACM SIGMGD Workshop on Data Descrip- 
 tion, Access and Control, Ann Arbor, Mich., May 1974 
 
 23. Strnad, A.J. The Relational Approach to the Management of 
 Data Bases, Proc. IFIF Congres 1971, pp. 901-904 
 
 24. Not ley, M.S. The Peterlee 13/1 System, IBM (UK) Scientific 
 Centre Report: UKSC - 0018, March 1972, Peterlee, England 
 
 25. Todd, 3. J. P. The Peterlee Relational Test Vehicle — A Sys- 
 tem Overview, IBM Systems Journal, vol. 15, no. 4, 1976, 
 pp. 285-307 
 
 26. Hall, P.A.V. Optimisation of a Single Relational Expression 
 in a Relational Data Base System, IBM (UK) Scientific Centre 
 Report: UKSC - 0076, June 1975, Feterlee, England 
 
 27. Lorie, R.A. XRM -- an Extended (n-ary) Relational Memory, 
 
 IBM Scientific Center Report: G320-2096, 1974, Cambridge, Mass. 
 
63 
 
 28. Astrahan, M.M., and R.A. Lorie. SEQUEL-XRMi A Relational 
 System, Proc. ACM Pacific Regional Conference, April 1975» 
 pp. 3^-38 
 
 29. Czarnik, B., 3. Schuster, and D. Tsichritzis. ZETA 1 A Rela- 
 tional Data Base Management System, Proc. ACM Pacific Region- 
 al Conference, April 1975, pp. 21-25 
 
 30. Mylopoulos, J., S. Schuster, and D. Tsichritzis. A Multi- 
 Level Relational System, Proc. AFIPS 1975 National Computer 
 Conference, vol. kk t May 1975. PP- 403-408 
 
 31. Stonebraker, M.R., E. Wong, and P. Kreps. The Design and Im- 
 plementation of INGRES, ACM Transactions on Database Systems, 
 vol. 1, no. 3, Sept. 1976, pp. 189-222 
 
 32. Stonebraker, M.R. Implementation of Integrity Constraints 
 and Views by Query Modification, Proc. ACM 1975 SIGMOD Work- 
 shop on Management of Data, San Jose, Calif., May 1975, pp. 
 65-78 
 
 33. Stonebraker, M.R., and E. Wong. Access Control in a Relation- 
 al Data Ease Management System by Query Modification, Proc. 
 ACM 1974 National Conference, San Diego, Calif., Nov. 1974, 
 vol. 1, pp. 180-186 
 
 34. Wong, E. , and K. Youssefi. Decomposition — A Strategy for 
 Query Processing, ACM Transactions on Database Systems, 
 vol. 1, no. 3, Sept. 1976, pp. 223-241 
 
 35. Ritchie, D.M., and K. Thompson. The UNIX Time-Sharing System, 
 CACM vol. 17, no. 7, July 1974, pp. 365-375 
 
 36. Astrahan, M.M. et. al. System R: Relational Approach to Data- 
 base Management , ACM Transactions on Database Systems, vol. 1, 
 no. 2, June 1976, pp. 97-137 
 
 37. Gray, J.N., R.A. Lorie, and G.R. Putzolu. Granularity of 
 Locks in a Shared Data Ease, Proc. International Conference 
 on Very Large Data Bases, Framingham, Mass., Sept. 1975, 
 
 pp. 428-451 
 
 38. Gray, J.N., R.A. Lorie, G.R. Putzolu, and I.L. Traiger. 
 Granularity of Locks and Degrees of Consistency in a Shared 
 Data Base, Proc. IFI? TC-2 Working Conference on Modelling 
 in Data Base Management Systems, Jan. 1976, North-Holland 
 
 39- Eswaran, K.P., J.N. Gray, R.A. Lorie, and I.L. Traiger. 
 
 The Notions of Consistency and Predicate Locks in a Data Base 
 System, CACM vol. 19, no. 11, Nov. 1976, pp. 624-633 
 
 40. Tremblay, J. P., and P.G. Sorenson. An Introduction to Data 
 Structures with Applications, McGraw-Hill, 1976 
 
64 
 
 41 . Gottlieb, D. et. al. A Classification of Compression Methods 
 and Their Usefulness in a Large Data Processing Center, Proc. 
 4?I?3 1975 National Computer Conference, vol. 44, pp. 453-458 
 
 42. Palermo, P.P. A Data Ease Search Problem, IBM Research Re- 
 port: PJ 1072, July 1972, San Jose, Calif. 
 
 43. Pecherer, R.M. Efficient Evaluation of Expression in a Rela- 
 tional Algebra, Proc. ACM Pacific Regional Conference, April 
 1975, pp. 44-49 
 
 44. >enko, M.E., E.B. Altman, M.M. Astrahan, and P.L. Fehder. 
 Data Structures and Accessing in Data-Base Systems, IBM Sys- 
 tems Journal, vol. 12, no. 1, 1973 
 
 4 5. Tsichritzis, D. A Network Framework for Relation Implementa- 
 tion, Proc. IFIP TC-2 Working Conference on Data Definition 
 
 Language, Jan. 1975 
 
 46. Gotlieb, L.R. Computing Joins of Relations, Proc. ACM 1975 
 3IGMCD International Conference on Management of Data, San 
 Jose, Calif., 1975. pp. 55-63 
 
 47. Blasgen, M.W., and K.P. Eswaran. On the Evaluation of Que- 
 ries in a Relational Data Base System, IBM Research Report: 
 PJ 1745, April 1976, San Jose, Calif. 
 
 48. Wirth, N. Algorithms + Data Structures = Programs, Prentice 
 Hall, En^lewood Cliffs, N.J., 1976 
 
 49. Chamberlin, D.D., J.N. Gray, and I.L. Traiger. Views, Author- 
 ization, and Locking in a Relational Data Base System, Proc. 
 AFIPS 1975 National Computer Conference, vol. 44, pp. 425-430 
 
 50. Czarnik, E. A Primitive Relational Data Base Management Sys- 
 tem, M.Sc. Thesis, Dept. of Computer Science, U. of Toronto, 
 Oct. 1974 
 
 51. Griffiths, P., and B.W. Wade. An Authorization Mechanism for 
 a Relational Data Base System, ACM Transactions on Database 
 Systems, vol. 1, no. 3. Sept. 1976, pp. 242-255 
 
 52. Eswaran, K.P., and D.D. Chamberlin. Functional Specification 
 of a Subsystem for Data Base Integrity, Proc. International 
 Conference on Very Large Data Bases, Framingham, Mass., Sept. 
 1975, PP. 48-68 
 
 53. Owens, R.C. Evaluation of Access Authorization Characteris- 
 tics of Derived Data Sets, Proc. 1971 ACM 3IGFIDET Workshop 
 on Data Description, Access and Control, pp. 263-278 
 
 54. Graham, R.M. Protection in an Information Processing Utility, 
 CACM vol. 11, no. 5, May 1968, pp. 365-369 
 
IBLIOGRAPHIC DATA 
 HEET 
 
 1. Report No. 
 
 UIUCDCS-R-78-913 
 
 Title and Subtitle- 
 Relational Database System Implementations — A Survey 
 
 Author(s) 
 
 Won Kim 
 
 Performing Organization Name and Address 
 
 University of Illinois at Urbana-Champaign 
 Department of Computer Science 
 Urbana, Illinois 61801 
 
 . Sponsoring Organization Name and Address 
 
 National Science Foundation 
 Washington, D. C. 
 
 >upp!cmt. nrary Notes 
 
 3. Recipient's Accession No. 
 
 5. Report Date 
 
 January, 1978 
 
 8. Performing Organization Rept. 
 
 No - UIUCDCS-R-78-913 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract/Grant No. 
 
 US-NSF-MCS73-07980 
 
 13. Type of Report & Period 
 Covered 
 
 Technical Report 
 
 14. 
 
 . Abstract s 
 
 This paper surveys the overall structure and design of some of the prototype 
 relational database systems frequently referenced in the literature, and then 
 examines in some detail some interesting approaches that have been taken in 
 implementing various requisite features for a comprehensive relational database 
 system. 
 
 I Ke\ tt'ords and Document Analysis. 17a. Descriptors 
 
 Authorization control 
 Checkpoint and recovery 
 Concurrency control 
 Data consistency 
 Integrity control 
 }uery optimization 
 Relational database 
 Jser view 
 
 ■ Identifiers Open-Ended Terms 
 
 '■ ( OS ATI Fie Id /Group 
 Availability Statement 
 
 Release Unlimited 
 
 C ,M NT1S-35 ( 10-70; 
 
 19. Security Class (This 
 Report) 
 
 UNCLASSIFIED 
 
 20. Security Class (This 
 
 Page 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 66 
 
 22. Price 
 
 USCOMM-DC 40329-P7! 
 

 
 
 MAP 3 1Q7ft 
 
VSBB9B8XBS