The person charging this material is re- 
 sponsible for its return to the library from 
 which it was withdrawn on or before the 
 Latest Date stamped below. 
 
 Theft, mutilation, and underlining of books are reasons 
 for disciplinary action and may result in dismissal from 
 the University. 
 To renew call Telephone Center, 333-8400 
 
 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN 
 
 WoTip 
 
 L161— O-1096 
 
UIUCDCS-R-81-1052 
 
 UILU-ENG 81 1721 
 
 A Characterization of Globally Consistent Databases 
 and their Correct Access Paths 
 
 by 
 Yehoshua Sagiv 
 
 July 1981 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/characterization1052sagi 
 
UIUCDCS-R-81-1052 
 
 A Characterization of Globally Consistent Databases 
 and their Correct Access Paths 
 
 Yehoshua Sagiv 
 Department of Computer Science 
 University of Illinois at Urbana-Champaign 
 Urbana, Illinois 61801 
 
 July 1981 
 
 This work was supported in part by NSF grant MCS-80-03308. 
 
ABSTRACT 
 
 We propose the representative instance as a representation of the 
 data stored in a database whose relations are not the projections of a 
 universal instance. We characterize the database schemes for which 
 local consistency implies global consistency. (Local consistency means 
 that each relation satisfies its own functional dependencies, and global 
 consistency means that the representative instance satisfies all the 
 functional dependencies.) We show how to compute efficiently projec- 
 tions of the representative instance provided that local consistency 
 implies global consistency. Throughout this work we assume that a cover 
 of the functional dependencies is embodied in the database scheme in the 
 form of keys. 
 
KO^ Introduction 
 
 The universal instance assumption is essential to many papers in 
 design theory for relational databases. As pointed out in [FMU], two 
 different concepts are included in this assumption. The most basic con- 
 cept is the universal relation scheme assumption (also known as the 
 "uniqueness assumption" [Ber]). It asserts that each attribute has a 
 unique role, that is, for any subset of attributes X, there is (at most) 
 one relationship among the attributes of X. This assumption is made 
 explicitly or implicitly in many papers in design theory for relational 
 databases. In particular, it is made in papers dealing with the 
 axiomatization of dependencies, and synthesis and decomposition of rela- 
 tion schemes. 
 
 The second and more controversial concept is the universal instance 
 assumption , that is, the assumption that the relations of a database are 
 the projections of a single relation over the set of all the attributes. 
 This assumption is needed in order to define lossless joins [ABU]. 
 There are two versions of this assumption. According to the first, this 
 assumption has to be made only in order to determine whether a join is 
 lossless, i.e., it is only a tool that is used when a database is 
 designed and when queries are evaluated [FMU]. The second version (the 
 pure universal instance assumption) states that the relations of a data- 
 base must always be the projections of a universal instance, and null 
 values have to be used in order to satisfy this requirement 
 [HLY,Ko,Li,Ma,Sc]. 
 
- 2 - 
 
 When a universal instance is assumed, users usually formulate 
 queries having in mind the universal instance rather than the actual 
 relations of the database [KU] . If a given query refers to a set of 
 attributes X, then the first step in evaluating this query is to compute 
 the projection of the universal instance onto X. If the pure universal 
 instance assumption is made, it is sufficient to take any lossless join 
 over a set of attributes that contains X. If the relations of the data- 
 base are not the projections of a universal instance, different lossless 
 joins might give different results. In [FMU] this problem is solved by 
 requiring that the join dependency consisting of all the relation 
 schemes be acyclic. In this paper we propose an alternative solution. 
 We believe that our solution reflects the properties of functional 
 dependencies better than [FMU] . In particular, some of the problems 
 left open in [FMU] are handled in our case without explicitly defining 
 maximal objects [MU] . 
 
 In this paper we assume a universal relation scheme, but not a pure 
 universal instance. Instead we define the representative instance of a 
 database (in Section 3). The representative instance has been used to 
 determine whether the database satisfies a set of functional dependen- 
 cies [Hol,Val]. We believe that the representative instance correctly 
 describes the information stored in the database even when the relations 
 are not the projections of a universal instance. As a first step toward 
 evaluating queries with respect to the representative instance, we 
 address the following problems. 
 (1) Under what conditions the database is globally consistent if each 
 
- 3 - 
 
 relation is locally consistent? That is, under what conditions the 
 representative instance satisfies all the functional dependencies if 
 each relation satisfies its own functional dependencies? 
 (2) How can we compute efficiently projections of the representative 
 instance? 
 
 Throughout this paper we assume that a cover of the functional 
 dependencies is embodied in the database scheme in the form of keys (as 
 in [Ber]). In Section 3 we define the uniqueness condition , and in Sec- 
 tion 4 we prove that local consistency implies global consistency if and 
 only if the database scheme satisfies the uniqueness condition. In Sec- 
 tion 5 we show how to compute efficiently projections of te representa- 
 tive instance provided that the database scheme satisfies the uniqueness 
 condition. 
 
 2_. Preliminaries 
 2_._1 Basic Definitions 
 
 We view a relation (cf. [CI]) as a finite table with columns 
 labeled by attributes , and rows, called tuples , that represent mappings 
 from the attributes to their associated domains . Let |i be a tuple of a 
 relation labeled by a set of attributes R. If A e R, then |i(A) is the 
 value of \x in column A; if S C R, then u[S] denotes the values of \i for 
 the attributes in S . We say that r is a relation over a set of attri- 
 butes R if the columns of r are labeled by the attributes of R. 
 
- 4 - 
 
 We use the letters A,B,C,... to denote attributes, and the letters 
 . ..,X,Y,Z to denote sets of attributes. A string of attributes (e.g., 
 ABCD) denotes the set containing these attributes, and the union of two 
 sets X and Y is written XY. 
 
 A relational database scheme over a set of attributes U is a set of 
 
 ordered pairs <R, ,K n >, . . . ,<R ,K > such that R. C U and K. is a set of 
 
 linn 1 — i 
 
 (explicit) keys of R. . (If X e K. , then XC R ; and no key of R. prop- 
 erly contains another key of R. .) Each R. is a set of attributes label- 
 ing the columns of a relation, and we use it as the name of the relation 
 (i.e., there are no distinct relations over the same set of attributes). 
 We call each R. a relation scheme . The set of FDs (functional dependen- 
 
 cies) of R. is 
 1 
 
 F = {X -» A | X e K. and A e R. - X} 
 
 A database is a set of relations r, , . . . ,r over R. , . . . ,R , respec- 
 In In 
 
 tively, such that each r. satisfies F . That is, a database is the 
 
 n 
 "current value" of the database scheme. We assume that F = \} F is a 
 
 i=l 
 cover of all the FDs imposed on the database by the user. In other 
 
 words, a cover of the FDs is embodied in the database scheme in the form 
 
 of (explicit) keys (as in [Ber]). 
 
 2_.2_ Relations with Null Values 
 
 In many cases there is a need to represent partial information in 
 the database. If we have a relation over the attributes Manager and 
 
- 5 - 
 
 Department, and Jones is a manager without a department, then the tuple 
 (Jones, 6) is inserted into this relation. The value 6 is a special 
 value, called a null value , and it denotes unknown information. Suppose 
 that there are two managers without a department, e.g., Jones and Smith. 
 There is no reason to assume that they manage the same (unknown) depart- 
 ment. In order to distinguish the null value in the tuple (Jones, 6) 
 from the null value in the tuple (Smith, 6), we will mark each null value 
 with a unique subscript and store the tuples (Jones, 6,) and (Smith, 6«). 
 Null values with distinguishing subscripts are called marked null 
 [Ko,Ma], and will be used exclusively in this paper.. Two null values 
 are equal only if they have the same subscript. We say that tuples \x. 
 and [i agree on column A, written n (A) = u.(A), if either both u (A) 
 and n 9 (A) are not null and equal or both are null and equal. 
 
 2.*2 The Chase Process 
 
 Suppose that a relation r (with null values) is required to satisfy 
 an FD X -> A. It is not necessarily correct to argue that r violates 
 
 X -*■ A if it has two tuples that agree on X and disagree on A. Instead, 
 
 (2) 
 we should use X -*■ A to equate symbols of r in the following way. 
 
 Suppose that r has tuples u. and n_ that agree on all the columns for X 
 
 but disagree on the column for A. If u. has 6. in column A and |i_ has 
 
 6 . in column A, then we can replace all occurrences of 6 . in r with 6. . 
 
 If u. has a non-null value c in column A and u_ has a null value 6 . in 
 
 (2) Occasionally, we refer to null and non-null values as symbols. 
 
- 6 - 
 
 that column, then we can replace all occurrences of 6. with the non-null 
 value c . 
 
 Suppose that the relation r is required to satisfy a set F of FDs . 
 
 We can apply the FDs of F to r until no more symbols of r can be 
 
 equated. The relation obtained in this way is called the chase of r 
 
 with respect to F, written chase (r), and it satisfies an FD X •> A of F 
 
 F 
 
 if and only if there is no pair of tuples that agree on X and disagree 
 
 on A. We say that the relation r sa tisfies F if and only if chase (r) 
 
 F 
 
 satisfies F. If r satisfies F, then chase (r) is unique up to renaming 
 
 F 
 
 of null values [MMS]. 
 
 Example 1 : Suppose that F = {A->-C,A->-D, B+C , CD+B } and let r be the 
 following relation. 
 
 A 
 
 B 
 
 c 
 
 D 
 
 1 
 
 1 
 
 6 1 
 
 6 2 
 
 6 3 
 
 2 
 
 1 
 
 6 4 
 
 1 
 
 6 5 
 
 6 6 
 
 2 
 
 2 
 
 1 
 
 6 7 
 
 1 
 
 We apply the FD A ■> C to the first and third tuples to replace 6, with 
 6 . Then all occurrences of 6. are replaced with 6 7 by applying B ■*■ C 
 to the fourth and first tuples. The result is the following relation. 
 
 A 
 
 B 
 
 c 
 
 D 
 
 1 
 
 1 
 
 6 7 
 
 6 2 
 
 6 3 
 
 2 
 
 1 
 
 6 4 
 
 1 
 
 6 5 
 
 6 7 
 
 2 
 
 2 
 
 1 
 
 6 7 
 
 1 
 
- 7 - 
 
 By applying A + D to the first and third tuples, 6~ is replaced with 2. 
 
 Now 6- is replaced with 1 by applying CD ♦ B to the first and third 
 
 tuples. Since no more applications are possible, chase (r) is 
 
 F 
 
 A 
 
 B 
 
 C 
 
 D 
 
 1 
 
 1 
 
 5 7 
 
 2 
 
 6 3 
 
 2 
 
 1 
 
 6 4 
 
 1 
 
 1 
 
 6 7 
 
 2 
 
 2 
 
 1 
 
 6 7 
 
 1 
 
 Clearly, chase (r) (and hence r) satisfies F. If we add the FD C ■*■ D to 
 F 
 
 F, then chase (r) is no longer unique, since 6~ can be replaced wi 
 
 either 1 or 2, and C ■> D is not satisfied by chase (r) (and r) . [] 
 
 r 
 
 th 
 
 If r satisfies F, then r also satisfies additional FDs that can be 
 inferred by Armstrong's axioms [Arm] . The closure of a set of attri- 
 butes X with respect to a set of FDs F, written 3L,, is the set of all 
 attributes A such that X ->■ A can be derived from F by Armstrong's 
 axioms. We can compute X^ in linear time [BB]. If F denotes a cover of 
 
 all the FDs imposed on the database (i.e., the embodied FDs), then we 
 
 + + 
 
 usually write X instead of X^. 
 
 2.4 Relational Expressions and Extension Joins 
 
 In this paper we consider relational expressions over the operators 
 
 project, (natural) join, and union that are denoted by ti, M, and \>, 
 
 respectively. The operands are the relation schemes R. , . . . ,R . Let a 
 
 be a set of relations r, , . . . ,r for the relation schemes R. ,...,R , 
 
 1* ' n 1' ' n' 
 
- 8 - 
 
 respectively, such that each r satisfies F . The value of E for a, 
 
 written v (E), is computed by substituting the relations r, ,...,r for 
 a In 
 
 the relation schemes R. , . . . ,R , and applying the operators according to 
 
 the usual definitions. (When the join is applied, two tuples are joined 
 
 on a given column only if they agree in this column.) Two expressions E 1 
 
 and E„ are equivalent , written E. = E„, if for all sets a, 
 
 v (E n ) = v (E„). The expression E_ is contained in E- , written E_ C E.. 
 a 1 a 2 r 2 1' 2—1 
 
 if for all sets a, v (E_) C v (E.). An expression E has a unique value 
 
 a 2. — a l 
 
 for the current database and, so, by a slight abuse of notation we 
 
 denote this value by E (rather than v (E), where a is the set of the 
 
 current relations) . In particular, if u is a tuple in the value of E 
 
 for the current database, then we write u. e E. 
 
 m 
 The expression t><| R. is an extension join [Ho2] of R. if for all 
 
 j=l X 3 1 
 
 Kj<m, the set R. ...R. contains a key of R. . Extension joins are a 
 
 L l X j X j+1 
 
 special case of lossless joins [ABU]. 
 
 Example 2 : Let <ABC, {A}>,<BD, {B}>,<CDE, {CD}> be a database scheme. 
 ABC M BD is an extension join of ABC. Similarly, ABC M BD M CDE is 
 an extension join of ABC, since ABC contains the key B of BD, and ABCD 
 contains the key CD of CDE. ABC M CDE is not an extension join, since 
 ABC does not contain any key of CDE (i.e., ABC does not contain CD which 
 is the only key of CDE). [] 
 
 (1) This notion of equivalence is called strong equivalence in [ASU] . 
 
- 9 - 
 
 _3 . The Representative Instance 
 
 The ultimate goal of designing a database scheme has always been a 
 collection of relation schemes R. , . . . ,R that are independent, that is, 
 relation schemes that allow the user to update each relation in the 
 database without having to change the contents of the other relations. 
 Of course, there might be some semantically meaningful constraints 
 (e.g., as in [C2]) that do not allow every possible update. But these 
 constraints should be as limited as possible. We feel that enforcing 
 the universal instance assumption is too restrictive. Clearly, this 
 assumption can always be enforced by using marked nulls [KU,Ma]. How- 
 ever, this can be done only at the expense of applying the chase process 
 to the universal instance whenever updates are performed on the data- 
 base. Furthermore, there is a need to store many null values that do 
 not provide any information. These null values are needed only to 
 satisfy the universal instance assumption. 
 
 Efficiency is not the only issue. Most relational database systems 
 are not designed to use the chase process. Therefore, there is a need 
 to develop a theory for determining correct access paths when the 
 universal instance assumption is not satisfied. The simplified univer- 
 sal instance assumption of [FMU] is one possible solution. In [FMU] a 
 certain condition is imposed on the database scheme, and it is claimed 
 that under that condition a minimal (in the number of relation schemes) 
 lossless join is a correct access path (even if the relations of the 
 database are not the projections of a universal instance). We will 
 present a different set of conditions and show that queries should be 
 
- 10 - 
 
 evaluated by performing the union of several lossless joins rather than 
 just one lossless join. 
 
 In this section we will define the representative instance 
 [Hol,Va2] of a database r , . . . ,r . The representative instance is 
 defined for every r , . . . ,r (even if the r 's are not the projections of 
 a single relation). If r. , . . . ,r are the projections of a universal 
 instance r and the database scheme has the lossless join property, then 
 r is also the representative instance. In [Hol,Val] the representative 
 instance is used to determine whether the database satisfies a set of 
 FDs . We believe that the representative instance is also a correct 
 representation of all the information stored in the database, and 
 queries posed about the contents of the database should be answered with 
 respect to the representative instance. 
 
 Let U be the set of all the attributes. A relation r can be 
 viewed as a relation over U by adding columns for the attributes in 
 U - R. that contain distinct null values. Formally, the augmentation of 
 a relation r over R to a relation over U, written a TJ (r ), is 
 {u u agrees with some tuple of r on R , 
 and has distinct null values 
 (that do not appear in any other tuple) 
 for the attributes of U - R.} 
 
 Example 3 : Let r be the relation 
 
- 11 - 
 
 a..„ n (r) is the relation 
 ABCD 
 
 1 
 
 2 
 
 
 
 c 3 
 itloi 
 
 C 4 
 
 
 
 A 
 
 B I 
 
 c 1 
 
 D 
 
 C l 
 
 s ij 
 
 c >! 
 
 6 2 
 
 C 3 
 
 6 3 1 
 
 *l 
 
 6 4 
 
 where 6 I {c. ,c ? ,c_,c, } for every i. 
 
 [] 
 
 Consider a database r. , . . . ,r over relation schemes R.,...,R , and 
 
 In In 
 
 n 
 
 let r" ™ y a (r ) . If there are no dependencies, then r' is the 
 
 i=l 
 
 representative instance of the database r. , . . . ,r . When dependencies 
 
 are present, the chase process should be applied to r'. Thus, if the 
 
 only dependencies are those in the set of FDs F, then the representative 
 
 instance is chase^(r'). The database r. ,...,r satisfies the set of FDs 
 
 F if the representative instance satisfies F [Hol,Val]. 
 
 Example 4 ; Consider the database scheme 
 <ABCD,{A}>,<CGDEF,{CG»,<DEFB,{DEF}>,<BCF,{BC}>. Note that this data- 
 base scheme is in Boyce-Codd normal form. Suppose that the relation for 
 ABCD is {1112}, the relation for CGDEF is {11111}, the relation for DEFB 
 is {1111}, and the relation for BCF is empty. To obtain the representa- 
 tive instance we have to compute the chase of the following relation. 
 
 _A_ 
 1 
 
 1 B 
 
 C 
 
 D 
 
 E 
 
 F 
 
 G 
 
 1 1 
 
 1 
 
 2 
 
 6 i 
 
 6 2 
 
 6 3 
 
 i h 
 
 1 
 
 1 
 
 1 
 
 1 
 
 1 
 
 I 1 
 
 S 7 
 
 1 
 
 1 
 
 1 
 
 6 8 
 
- 12 - 
 
 The null value 6,. can be replaced with 1 by applying the FD DEF •> B to 
 the second and third tuples, and then 6„ is replaced with 1 by applying 
 BC ■*■ F to the first and second tuples. No FD can be applied after that, 
 and so the representative instance is 
 
 1 B 
 
 C 
 
 D 
 
 E 
 
 F 
 
 1 
 
 1 
 
 2 
 
 h 
 
 1 | 
 
 | 1 
 
 1 
 
 1 
 
 1 
 
 1 | 
 
 1 1 
 
 6 7 
 
 1 
 
 1 
 
 1 | 
 
 8 
 This representative instance (and hence also the database) satisfies the 
 
 FDs that are embodied in the relation schemes. However, if the relation 
 
 for BCF were {112} instead of <J>, then the representative instance would 
 
 violate the FD BC ■* F . [] 
 
 In the above example, the projection of the representative instance 
 onto BCF contains the tuple 111. It may be argued that the tuple 111 
 over the attributes BCF does not represent a correct information, since 
 the relation for the relation scheme BCF does not contain this tuple. 
 However, if this argument is accepted, then it follows that two distinct 
 relationships between the attributes B, C, and F are stored in the data- 
 base. One relationship is stored in the relation for the relation 
 scheme BCF, and the other relationship is obtained by the extension join 
 CGDEF M DEFB. But this is contrary to the universal relation scheme 
 assumption that is essential to many works in design theory for rela- 
 tional databases. Without this assumption the axioms for functional and 
 multivalued dependencies, and the various synthesis and decomposition 
 algorithms cannot be used. Therefore, we believe that the representa- 
 tive instance correctly represents the information stored in the 
 
- 13 
 
 database. 
 
 3_. 1_ Computing Total Projections of the Representative Instance 
 
 In the remainder of this paper we consider a database scheme 
 
 <R, ,K->, . . . ,<R ,K > and a corresponding database r.,..,r with a 
 linn In 
 
 representative instance r. For simplicity's sake, we assume that the 
 
 relations r.,..,r do not contain null values. (Thus, null values exist 
 1' n 
 
 only in the representative instance.) We can always decompose a relation 
 scheme into smaller relation schemes, each having the keys of the origi- 
 nal relation scheme and some of its attributes. Therefore, the only 
 actual restriction implied by our assumption is that null values cannot 
 be stored for the attributes of a key. 
 
 The user formulates queries having in mind the representative 
 
 instance r rather then the individual relations r, , . . . ,r . Suppose that 
 
 1 n 
 
 the user is formulating a query that refers to a set of attributes X. 
 The first step in evaluating this query is to compute the projection of 
 r onto X. We assume that if the user is referring to the attributes in 
 X, then he/she is interested only in tuples of r that have non-null 
 values for all the attributes in X. Therefore, the problem addressed in 
 this paper is how to compute the X -total projection of r, i.e., 
 {\i I u is a tuple in it (r) without any null value} 
 
 A 
 
 For example, given the database of Example 4, the ACF-total projection 
 of the representative instance is {111}. 
 
- 14 - 
 
 Example 4 illustrates two surprising facts. First, the representa- 
 tive instance does not necessarily satisfies the functional dependencies 
 even if each relation satisfies the FDs imposed by its keys (this fact 
 was originally pointed out in [Hoi]). Second, an X-total projection of 
 the representative instance cannot always be computed by joining several 
 relations of the database and then projecting onto X. In that example, 
 
 the ABCDF-total projection of the representative instance is {11121}. 
 
 m 
 However, no expression of the form tl-CM R. ) has as its value the rela- 
 
 tion {11121} over ABCDF. (Here, R ,...,R. are some of the relation 
 
 1 X m 
 schemes.) In the following sections we will characterize those cases in 
 
 which the problems indicated by Example 4 do not occur. 
 
 3^. 2_ The Uniqueness Condition 
 
 n 
 Consider a tuple u of ]} a. (r ) that has originated from the rela- 
 
 j=l J 
 tion r . That is, \i is non-null in the columns of R and has distinct 
 
 nulls in all the other columns. Therefore, a null value in column 
 
 A e U - R. of u can be replaced during the chase process (either with 
 
 another null or with a non-null) only if A e R. [BDB]. The problems 
 
 illustrated in Example 4 are caused by the existence of two different 
 
 ways that could potentially replace a specific null value. In order to 
 
 avoid such troublesome situations, we require that for each relation 
 
 scheme R , there is a unique way to derive R . We will show that the 
 
 representative instance of the database r , ...,r satisfies F (i.e., the 
 
 set of all the embodied FDs) if and only if each R is uniquely derived. 
 
- 15 - 
 
 We will define "uniqueness" after the following informal example. 
 
 Example 5 : Consider the algorithm of Fig. 1. This algorithm com- 
 putes X assuming that a cover of F is embodied in the relation schemes. 
 Suppose the database scheme is <ABC, {A}>,<BD, {B}>,<CD, {C}>. The closure 
 of ABC can be computed (i.e., derived) in two different ways. Either we 
 use BD and its key B to obtain ABCD from ABC in line 3 of Fig. 1, or we 
 use CD and its key C to obtain ABCD from ABC. 
 
 If the database scheme is <ABC, {A}>,<BD, {B}>,<CE, {C}>, then there 
 is only one way to compute (ABC) . BD is used to add D, and CE is used 
 to add E. [] 
 
 We say that R satisfies the uniqueness condition if there is no R. 
 (j * i) such that for some set of attributes X and an attribute A 
 
 (1) XA C (R ± )p_ F , and 
 
 j 
 
 (2) X e K and A e R - X. 
 
 Example 6 : For the first database scheme of Example 5, we have 
 
 F. = {A-HB,A»C}, F - {B-»D}, and F - {C-*D}. ABC does not satisfy the 
 
 uniqueness condition, since CD C (ABC) where C is a key of CD and D 
 
 F-F 3 
 
 is another attribute of CD. In the second database scheme of Example 5, 
 
 F = {A->>B,A>C}, F = {B+D}, and F = {C+E}. The relation scheme ABC 
 
 satisfies the uniqueness condition. In proof, (ABC) = ABCD and only 
 
 3 
 C appears also in CE and it is a key of CE; (ABC) = ABCE and only B 
 
 appears also in BD and it is a key of BD. [] 
 
- 16 - 
 
 begin 
 
 (1) Y := X; 
 
 (2) while there is a key V e K. such that V C Y and R J Y do 
 
 J — J — — 
 
 (3) Y := YR. 
 end 
 
 Figure 1 
 
 We now show that the uniqueness condition implies that R. is 
 
 uniquely derived. Suppose that R. satisfies the uniqueness condition, 
 
 and let R. C R. (j * i) . Let X = (R.)t, ^ uR.. We claim that X is a 
 l—i l F-F . j 
 
 J 
 key of R.. In proof, if X does not contain a key of R , then it is 
 
 J J 
 
 impossible to have R C^ R. ; and if it properly contains a key, then the 
 
 uniqueness condition is violated. If B e X, we say that R. uses B in R 
 
 (sometimes we say that R. uses X in R. ). If B e R - X, we say that R 
 
 adds B to R . . We also say that R. adds all its attributes to R. and 
 l l i 
 
 uses none of them. The following propositions show that if R. satisfies 
 the uniqueness condition, then R is uniquely derived in the sense that 
 each A e R is added by a unique R . 
 
 Proposition 1 : If R satisfies the uniqueness condition and 
 R (: R , then every A e R . is either used by R in R or added by R, to 
 R (but not both) . 
 
 Proof: Immediate from the definitions. [] 
 
- 17 - 
 
 Proposition 2 : If R satisfies the uniqueness condition, then each 
 A e R. is added by a unique R. C_ R . . 
 
 Proof : Clearly, each A e R. is added by at least one R.. Suppose 
 
 that for some A e R. there are distinct R and R such that both R and 
 
 i q P q 
 
 R add A to R. . Since R adds A, there is a key X e K such that 
 p i q q 
 
 X C (R. ) T , ^ and A e R - X. Similarly, there is a key Y e K such that 
 — i F-F q p 
 
 + ** + 
 
 Y C_ (R . ) and A e R - Y. Consider a computation of R by the algo- 
 
 P P 
 rithm of Fig. 1. 
 
 Case 1_. q # i and R is never used in line 3 of the algorithm. 
 Thus, XA C_ R C_ (R . ) and, so, R. does not satisfy the uniqueness 
 
 q " q 
 
 condition (a contradiction). 
 
 Case 2. p * i and R is never used in line 3. This is similar to 
 P 
 
 Case 1 . 
 
 Case 3. R is used in line 3 after R (this includes the case 
 
 q P 
 
 where p = i, i.e., R is not used at all in line 3). Let Y" be the 
 
 value of Y just before R is used in line 3. Since Y' C (R_, )„ _ and R 
 J q — i F-F p 
 
 q + 
 
 has already been used (i.e., R C Y'), it follows that R C (R.,)^ „ . 
 
 p — p — i F _ F 
 
 + + ^ 
 
 But A e R and, so, A e (R. )_, „ . Since X C (R. )_, _, , X e K and 
 p i F — r — 1 r ~r Q 
 
 q q 
 
 A e R - X, the uniqueness condition is violated by R (a contradic- 
 
 tion) . 
 
 Case 4. R p is use d in i ine 3 after R This is similar to Case 3. 
 
 Since at least one of q and p is different from i, no other case is 
 possible. [] 
 
- 18 - 
 
 4. Main Theorem 
 
 Let <R. ,K.>, . . . ,<R ,K > be a database scheme, and suppose that each 
 relation r. satisfies F and does not have any null values. In this 
 section we will prove that the representative instance satisfies F if 
 and only if each R. satisfies the uniqueness condition. First, we will 
 prove two lemmas that are needed for the "if" part (which is more diffi- 
 cult to prove). So, we assume that each R. satisfies the uniqueness 
 
 n 
 condition. Let r = ]} ql (r ) and consider the computation of chase (r). 
 
 i-1 U i F 
 
 Suppose that u is a tuple of r that has originated from r . We say that 
 
 R. adds A to ii if R. adds A to R. . Similarly, R. uses A in u if R. uses 
 
 A in R. . By the uniqueness condition and Propositions 1 and 2, for each 
 
 A e R. , there are unique R. and key X e K, such that R. adds A to u 
 i J J J 
 
 using X (clealry, A e R. -X). 
 
 Lemma 1 : Suppose that all the R.'s satisfy the uniqueness condi- 
 tion. Let R C R^ , Y e K , and A e R - Y. If R either does not add A 
 P - i P P P 
 
 to R. or does not use Y in R. , then there is B e Y such that R adds B 
 i i P 
 
 _+ 
 
 to R. . 
 
 l 
 
 Proof : Suppose that R does not add A to R . By Proposition 1, 
 
 there is a key V e K used by R in R J such that A e V. If Y C V, then 
 
 P P i ~ 
 
 YA C V. Since A i Y, it follows that the key V properly contains the 
 
 key Y of R . But this is impossible and, so, Y - V * <j> and every 
 
 B e Y - V is added by R to R+ 
 
 P i 
 
 Now suppose that R does not use Y. If R does not use any key 
 
 P P 
 
 (i.e., p=I), then R adds every B e Y. If R uses another key V e K , 
 x ' r ' p p p 
 
- 19 - 
 
 then Y (2 V (otherwise, one key properly contains another key). There- 
 fore, there is B e Y - V that is added by R . [] 
 
 We assume that chase (r) is computed by repeatedly applying the 
 
 F 
 
 following rule to tuples of r. 
 
 FD-rule . If tuples u 1 and |i_ agree on some key X e K but disagree on 
 some B e R - X, then for all A e R - X: 
 
 (1) If |i. (A) * ^u(A) and \i. (A) is null, then replace all 
 occurrences of u-, (A) in r with n_(A). 
 
 (2) If |j. (A) * H ? (A) an d H« (A) is null, then replace all 
 
 occurrences of u„(A) in r with n_(A). 
 
 The above FD-rule is sufficient for computing chase (r), since a cover 
 
 F 
 
 of F is embodied in the database scheme. 
 
 Lemma 2 ; Suppose that all the R 's satisfy the uniqueness condi- 
 tion. Let r" be an intermediate relation in the computation of 
 chase (r), and let \x^ t \i~ e f . 
 
 (A) If u, (A) = |a«(A) = 6 for some column A, then 
 
 (Al) the same R. adds A to both u- and u_, and the same key X e K 
 
 is used by R. in both u. and \i~, and 
 (A2) ^[Rj] = ^[Rj]. 
 
 (B) If \i. (A) is non-null, then \i [R ] e r , where R is the unique rela- 
 tion scheme that adds A to \i . 
 
 Proof ; Induction on the number of applications of the FD-rule that 
 are used to obtain r" from r. 
 
- 20 - 
 
 Basis . Zero applications. That is, r" = r and, so, all null 
 
 values in r' are distinct. Thus, part A is vacuously true. As for part 
 
 B, let u. be a tuple of r' that has originated from r . If u (A) is 
 
 non-null, then A e R. and R. adds A to li, . But u. [R . 1 e r. and, so, 
 
 1 i H. ^1 l i J i 
 
 part B is true for r" . 
 
 Induction . Suppose that r" is obtained from r by n-1 > applica- 
 tions, and r" is obtained from r" by a single application. In particu- 
 lar, suppose that r" Is obtained from r" by equating tuples v. and v~ 
 
 using key Y of R (i.e., the FD-rule is applied to v. and v that agree 
 p 1 z 
 
 on Y in r"). By the inductive hypothesis, the lemma is true for r", and 
 we have to show that it is true also for r' . 
 
 Part A. Let u. and \x~ be tuples of r" such that u 1 (A) = u«(A) = 6. 
 for some column A. If |a, (A) = u 2 (A) also in r", then by the inductive 
 hypothesis, conditions (A1)-(A2) of the lemma are satisfied in r" and, 
 hence, also in r" (if for some column B, we have u, (B) = u 2 (B) in r", 
 then |i, (B) = |i ? (B) also in r'). Thus, we have to consider only |i., \i~ 
 and an A such that 
 
 (1) Ul (A) = v^A) in r", 
 
 (2) u 2 (A) = v 2 (A) in r", and 
 
 (3) v.(A) * v-(A) in r" and A e R - Y. 
 
 (Otherwise, either \xAA) - ^(A) in r" or [^(A) * u 2 (A) in r' . ) 
 
 Case 1. R adds A using Y to both v, and v . Since v. (A) ■ a, (A) 
 — p 1 z. 11 
 
 in r" (and u, (A) is null in r", since it is null in r' ) , the inductive 
 hypothesis implies that R adds A using Y also to u- , and 
 
- 21 - 
 
 (i) ^[Rp] = V;L [R p ] in r". 
 
 Similarly, R adds A using Y also to \i~ and 
 
 (ii) n 2 [R p ] = v 2 [R p ] in r". 
 
 It remains to be shown that u. [R ] ■ u^[R ] in r'. Suppose that v.. (B) 
 
 is non-null in r" for some B e R - Y. Since R adds B to v,, part B of 
 
 P P 1 
 
 the inductive hypothesis implies that v 1 [R ] e r in r". By (i), 
 u [R ] e r in r" and, so, u 1 (A) is non-null in r* , contrary to assump- 
 tion. Therefore, v, (B) is null in r" for all B e R - Y, and similarly 
 for v~. Since v.. and v„ are equated using key Y of R , it follows that 
 
 V 1 [ V = V 2 [ V ln **" By (1) and (li)> ^1 [ V = ^2 [R p ] in r '* 
 
 Case 2. R does not add A to v, . We will show that this case can- 
 p 1 
 
 not actually occur. Let r, be the relation from which v. has ori- 
 ginated. By a lemma of [BDB], R C_ R, (since v. is equated with v~ 
 using R ). Lemma 1 implies that there is B e Y that is added by R to 
 
 v., because A is not added by R and A e R - Y (by (3)). Since v, and 
 1 y p p N,/x '' 1 
 
 v„ are equated using key Y of R , it follows that v 1 (B) = v«(B) in r". 
 If v. (B) is null in R", then by part A of the inductive hypothesis, 
 v. (A) = v~(A) in r", contrary to (3). If v. (B) is non-null in r", then 
 by part B of the inductive hypothesis, v. [R ] e r in r" and, by (1), 
 (j.. (A) is non-null in r' , contrary to assumption. Thus, v.. (B) is neither 
 null nor non-null in r" and, consequently, this case cannot occur. 
 
 Case 3. R does not add A to v~. This is similar to Case 2. 
 p 2 
 
 Part B. We have to show that if \i~ (A) is non-null in r* , 
 
 then 
 
 u.[R.] e r., where R adds A to \L. . If u. (A) is also non-null in r", 
 
- 22 - 
 
 then by the inductive hypothesis, u. [R ] e r in r" and, so, u.. [R ] is 
 the same in r" and f (since all the columns of u 1 [R ] in r" are non- 
 null and cannot be changed). Thus, we have to consider only the follow- 
 ing case. 
 
 (a) v. (A) is non-null in r", 
 
 (b) v-(A) is null in r", 
 
 (c) u, (A) = v„(A) in r", and 
 
 (d) v. and v are equated using a key Y of R and A e R - Y. 
 
 12 P P 
 
 Claim 1 : v, [R ] e r in r" (and, hence, also in r'). 
 
 Proof : Case 1_. R adds A to v, . Since v. (A) is non-null in r" (by 
 (a)), the inductive hypothesis implies that v. [R ] e r in r". There- 
 fore, v. [R ] is non-null in r" and, hence, it is the same in r" and r" . 
 
 Case 2_. R does not add A to v. . By (d) and Lemma 1, there is 
 B e Y that is added by R to v.. Suppose that v. (B) is null. Since 
 v. (B) = v„(B) in r" (by (d)), part A of the inductive hypothesis implies 
 that v 1 (A) ■ v„(A) in r". This is contrary to (a) and (b) and, there- 
 fore, v 1 (B) is non-null in r". By part B of the inductive hypothesis, 
 
 v [R ] e r in r". [] 
 
 1 p p 
 
 Claim 2: R uses Y in v.. 
 p 2 
 
 Proof: Suppose that R does not use Y in v„. By Lemma 1, there is 
 
 P ^ 
 
 B e Y such that R adds B to v^. By (d), v 2 (B) = v ] (B) in r" and, by 
 Claim 1, v»(B) is non-null in r". By the inductive hypothesis, 
 
 v„ [R e r in r". But this is impossible, since v (A) is null in 
 
 2 p p 2 
 
- 23 - 
 
 [] 
 
 By Claim 2 and (d), R adds A to v« using Y. Therefore, part A of 
 
 the inductive hypothesis, (b) and (c) imply that \x. [R ] ■ v~[R ] in r" 
 
 (and R adds A to la, using Y). If v_(C) is non-null in r" for some 
 pi 2 
 
 C e R - Y, then by the inductive hypothesis, v„(A) is non-null, con- 
 trary to (b) . Hence, v (C) is null in r" for all C e R - Y and, by 
 
 1 P 
 
 (d), v [R ] = v 9 [R ] in r'. Thus, \x [R ] = v ? [R ] = v. [R ] e r in 
 1 p 2 p lpZplpp 
 
 r'. [] 
 
 Corollary 1 : Suppose that during the computation of chase (r), the 
 
 r 
 
 null in column A of \i. is replaced with a non-null as a result of equat- 
 ing v. and v« using key Y of R . Then R adds A to \i. using Y. 
 
 Proof ; Let v., v„, and |i 1 be the same as in part B of the proof of 
 Lemma 3. We have shown that R adds A to \i. using Y. [] 
 
 Theorem 1: Let <R, ,K,>, . . . ,<R ,K > be a database scheme. Suppose 
 
 linn 
 
 that each relation r satisfies F and does not have any nulls. Then 
 the representative Instance satisfies F if and only if each R satisfies 
 the uniqueness condition. 
 
 Proof : If . Suppose that some FD X -»• A e F is violated in the 
 representative instance. That is, there are tuples u. and n_ in the 
 representative instance such that 
 
 (1) n x [X] = u 2 [X], and 
 
 (2) n 1 (A) and n«(A) are distinct non-nulls. 
 
- 24 
 
 Claim 1 ; n,[R.] e r. and n 2 [R.] e r . 
 
 Proof : We will prove that u,[R.] e r (proving that ^[R.] e r is 
 similar). 
 
 Case 1. R. adds A to \x. . By Lemma 2, \i [R ] e r . 
 
 Case 2_. R. does not add A to \i. . By Lemma 1, there is B e X such 
 that B is added by R. to u. . Suppose that u. (B) is null. By Lemma 2, 
 |i, (A) = |i_(A), since u- (B) = |a~(B). This contradiction implies that 
 \x. (B) is non-null and, by Lemma 2, u, [R. ] e r . [] 
 
 By Claim 1 and (1), \i. (A) = |i_(A), since r satisfies F . But this 
 is contrary to (2) and, so, the "if" part is proved. 
 
 Only If . Suppose that some R. does not satisfy the uniqueness con- 
 dition. That is, there is an R. (j * i) , a key X e K , and an attribute 
 
 A e R. - X such that XA C (R J )„ „ . We have to show that there are 
 
 — i F-F . 
 J 
 relations r.,...,r (without nulls) that satisfy F.,...,F , respec- 
 tively, such that the representative instance does not satisfy F. We 
 choose r. , . . . ,r as follows. For k t j, each r, has exactly one tuple 
 that maps all the attributes of R, to 1. The relation r has exactly 
 
 one tuple that maps A to 2 and each attribute in R - A to 1. Let 
 
 n J 
 
 r = y a (r ), and let u, be the tuple of r that has originated from r . 
 
 ..UK.cC k. 
 
 k=l 
 We apply the chase process to r using only the FDs of F - F. and tuples 
 
 u, such that k * j. Let r' be the resulting relation. By a Lemma of 
 
 [BDB], ^.(B) ■ 1 for all B e (R ) in r'. Therefore, tuples |i and 
 i i F- Fj i 
 
 \i of r' violate X -*■ A. Clearly, if we have a violation of X ■> A in r', 
 
- 25 - 
 
 then this violation exists also in the representative instance and, so, 
 the proof is complete. [] 
 
 5_. Computing Total Projections of the Representative Instance 
 
 Suppose that <R, ,K. >, . . . ,<R ,K > is a database scheme such that 
 linn 
 
 each R. satisfies the uniqueness condition, and let X be a set of attri- 
 butes. In this section we will show how to construct in polynomial time 
 an expression E whose value is the X-total projection of the representa- 
 tive instance. The expression E is of the form ]}n (E ), where each E. 
 
 j X j j 
 
 is an extension join. Among all the expressions of this form whose 
 value is the X-total projection of the representative instance, the 
 
 m 
 
 expression E is minimal in both the number of extension joins and the 
 number of join operators. 
 
 Let r = y. a (r ), and u e chase (r) be a tuple that has originated 
 i=1 U I F 
 
 from r . We define 
 
 P = {A I u(A) is non-null} 
 
 Consider the set 
 
 S = {R. I there is an A e P such that R. adds A to u} 
 
 By Lemma 2, P = R ...R , where S = {R, ,...,R. }. Let R. and R be 
 
 J l J m J l J m J p J q 
 
 distinct members of S. By Corollary 1 and Lemma 2, the columns of u for 
 
 the attributes added by R become non-null as a result of a single 
 
 J P 
 application of R. , and similarly for R. . Therefore, the columns of 
 
 J p J q 
 
 both R and R. cannot become non-null simultaneously (because R. and 
 
 J P J q P 
 
 R. add distinct attributes). Consequently, let R. ,R. ,...,R. be an 
 J q J l h J m 
 
- 26 - 
 
 ordering of the elements of S such that if p < q, then u[R ] becomes 
 
 P 
 non-null before u[R. 1. Note that i, = i. 
 
 v l 
 
 m m 
 
 Lemma 3 ; b<J R. is an extension join of R. , and u[P] e M R. . 
 k=l J k x k=l J k 
 
 Proof : Consider u just before u[R ] (for some Kp<m) becomes non- 
 
 P + 
 
 null. Let n(B) be non-null, and suppose that R. adds B to R . By 
 
 J q 
 
 Lemma 2, u[R . ] is non-null and, so, q < p. Therefore, the non-null 
 
 J q 
 
 columns of u are R. ...R. . By Corollary 1, u[R. ] becomes non-null 
 
 J l Vl J P 
 
 as a result of applying R. and, so, Y C_ R. . . .R, for some key Y of 
 
 m J p J l J p-1 
 
 R. . Thus, MR. is an extension join (of R . ) . 
 J p k=l J k 
 
 Now consider |j, at the end of the chase process. By Lemma 2, 
 
 ^[R . ] e r. for every Kk<m, and \x is non-null exactly in the columns 
 
 J k J k m 
 
 of P = R. ...R. . Therefore, u[P] e M R. . [] 
 J l 3 m k=l J k 
 
 Lemma 3 states that the non-null portion of any tuple in the 
 
 representative instance is in some extension join. Conversely, if 
 
 m m 
 
 t><! R. is an extension join and |i e |><| R, , then by a lemma of [BDB], 
 k=l J k k=l J k 
 
 there is a tuple v in the representative instance such that v[P] ■ |i. 
 
 Therefore, we have the following corollary. 
 
 Corollary 2 : If all the R 's satisfy the uniqueness condition, then 
 
 the X-total projection of the representative instance is given by the 
 
 expression yn Y (E.), where each E is an extension join over a set of 
 j X j j 
 
 attributes containing X. 
 
 The expression E of Corollary 2 might have redundant subexpres- 
 sions. We now show how to minimize it. Clearly, there is an extension 
 
- 27 - 
 
 join of R over a set of attributes containing X only if XCR. . 
 
 J l p J l 
 
 Furthermore, if M R. (p < m) is also an extension join of R. over a 
 
 k=l J k m p J l 
 
 set of attributes containing X, then it v (P<l R. ) C n^-CM R. ). There- 
 
 n % (M R ) C i^(M R ) . 
 k=l J k k=l J k 
 
 fore, we need to consider only minimal extension joins of R. with 
 
 m J l 
 
 respect to X, that is, extension joins \><\ R such that 
 
 k=l J k 
 
 (1) XCR . ..R , and 
 
 J l J m 
 
 (2) no proper subsequence of R. , ...,R. is an extension join of R. 
 
 J l J m X l 
 
 that satisfies (1). 
 
 Lemma 4 ; If XCR,, and R. , . . . ,R satisfy the uniqueness condition, 
 then the minimal extension join of R with respect to X is unique and 
 can be found in linear time. 
 
 Proof : The algorithm of Fig. 2 computes a set S of relation schemes 
 that must be Included in any extension join of R. over a set of attri- 
 butes containing X. The idea is that S must contain any R that adds an 
 attribute A e X to R and, recursively, if R, e S, then S includes also 
 every R. that adds an attribute of R, . Now consider an ordering 
 
 R. ,...,R. of the elements of S such that if k < m, then R, is used 
 
 j l J P + 
 
 before R in line 3 of Fig. 1 during the computation of R . . (Note that 
 m i 
 
 3 X = iO 
 
 P 
 Claim 1 : t><! R is an extension join (of R . ) . 
 k=l J k 
 
 Proof : Let V = R. ...R. , and Y be the key used by R. in R. 
 
 J l J m J mH 1 
 
 (Km<p). Consider an attribute B e Y, and suppose that R adds B to R.. 
 
 R must have been added to S in line 4 of Fig. 2, since R. e S. 
 
 Further, R must be used before R. during the computation of R. . 
 
 mfl 
 
- 28 - 
 
 begin 
 
 (1) S := <j>; 
 
 (2) Y := X; 
 
 (3) while there is an R. such that R. t S and R. adds some A e Y to R. do 
 J J J i — 
 
 begin 
 
 (4) S := S \} {R }; 
 
 (5) Y := YR. 
 
 J 
 
 end 
 end 
 
 Figure 2 
 
 P 
 
 Therefore, if B e Y, then B e V and, so, (><j R. is an extension 
 
 k=l J k 
 
 join. [] 
 
 The algorithm of Fig. 2 can be implemented in linear time using a 
 data structure similar to that of [BB]. Similarly, the ordering of the 
 elements of S can be determined in linear time. [] 
 
 We have shown that we need to consider only minimal extension joins 
 
 of those R. such that X C R.. Now suppose that E. and E. are minimal 
 J - J i J 
 
 extension joins of R. and R., respectively, with respect to X. The fol- 
 lowing lemma implies that if E, includes R. , then ti (E . ) C^ -n^CE.). 
 
 Lemma 5 : If E includes R , then E includes every relation scheme 
 that appears in E . 
 
- 29 - 
 
 Proof : Apply the algorithm of Fig. 2 to X and R. , and let 
 
 R. , ...,R be the order in which the elements of S are added in line 4. 
 
 1 P 
 We will prove by induction on k that XR . ..R C^ V and R is in E . , 
 
 1 1 k *k J 
 
 where V is the set of attributes in E . 
 
 Basis , k - 0. Obvious. 
 
 Induction . Suppose that R is not included in E. (i.e., i . t i) . 
 
 Since R. is added after R , ...,R in line 4, R adds some 
 
 X k 1 k-1 k 
 
 B e XR. ...R. to R. using a key Y (i.e., Y C (R. )_ _ ). By the 
 
 i l i k-l ± ~ i F " F i 
 
 inductive hypothesis, B e V. Since E. is an extension join of R that 
 
 does not include R , it follows that V C^ (R.) . Moreover, 
 
 + k + i k 
 
 Y C (R.)_ _ , because Y C (R. )_ _ and R, (but not R. ) is included in 
 - j F-F ^ - i F-F i i fc 
 
 E.. Thus, YB C (R.)_ _ , and so R, violates the uniqueness condi- 
 j - j / F-F 1 j 
 
 tion. [] k 
 
 Theorem 2: Let <R n ,K->, . . . ,<R ,K > be a database scheme such that 
 11' n n 
 
 each R satisfies the uniqueness condition. Suppose that each relation 
 r satisfies F and does not have any nulls. An expression whose value 
 
 is the X-total projection of the representative instance can be found in 
 
 2 
 0(n ) time, where n is the space needed to write down the database 
 
 scheme. 
 
 Proof : The algorithm for constructing the expression is given in 
 Fig. 3. For a given value of T, define 
 E(T) = {E, | E, is the minimal extension join of R, with respect to X, 
 
 and R, e T} 
 Let F(T) be the union of all extension joins in E(T) projected onto X. 
 
- 30 - 
 
 begin 
 
 (1) W := 4>; 
 
 (2) Let T = {R ± | X C R*}; 
 
 (3) for every R. e T do 
 
 begin 
 
 (4) let E be the minimal extension join of R with respect to X; 
 
 (5) if there is no R. e T (j * i) that appears in E. then 
 
 (6) add E ± to W 
 
 (7) else remove R. from T 
 
 i 
 
 end; 
 
 (8) let W = {E.,...,E }; 
 l m 
 
 m 
 
 (9) return the expression \} il.(E.) 
 
 end 
 
 Figure 3 
 
 Using Lemma 5, we can easily prove by induction that for all values of T 
 obtained in line 3, F(T) = F(T Q ), where T is the initial value of T. 
 By Corollary 2 and Lemma 4, the value of F(T n ) is equal to the X-total 
 projection of the representative instance. The expression returned by 
 the algorithm of Fig. 3 is F(T ) , where T is the final value of T. 
 Thus, the algorithm returns an expression whose value is the X-total 
 projection of the representative instance. 
 
 By Lemma 4 and [BB], the algorithm of Fig. 9 can be implemented to 
 
 2 
 run In 0(n ) time. 
 
- 31 - 
 
 Corollary 3 : Among all the expressions of the form tyit (E ), whose 
 
 J X j 
 value Is the X-total projection of the representative instance, the 
 
 expression returned by the algorithm of Fig. 3 is minimal in both the 
 
 number of subexpressions of the form it„(E ) and the total number of join 
 
 operators. 
 
 Proof : The expression returned in line 9 is nonredundant in the 
 
 sense that if any subexpression of the form it (E.) is removed from the 
 
 union, then we can find relations r.,...,r for which the value of the 
 
 In 
 
 resulting expression is not the X-total projection of the representative 
 instance. In proof, for each R that appears in E , the relation r has 
 exactly one tuple with 1 in every column; all the other relations are 
 empty. Clearly, the value of E. is a relation with exactly one tuple 
 that has 1 in every column. All the other extension joins have at least 
 one empty relation and, so, their value is the empty relation. Thus, 
 removing "^(E ) changes the value of the expression from a relation with 
 one tuple to the empty relation. By a theorem of [SY], it follows that 
 the expression returned in line 9 is minimal in the number of subexpres- 
 sions of the form it^(E ). Since each E has no equivalent expression 
 with fewer join operators (by Lemma 4), the expression returned in line 
 9 is also minimal in the total number of join operators (by a result of 
 [SY]). [] 
 
 Example 7 : Suppose that the attributes are P(pro ject) , 
 D( department), M(manager), L(location), and A(assistant), and the data- 
 base scheme is <LDP, {LD»,<DPM, {DP}>,<LMA, {LM}>. 
 
- 32 - 
 
 Intuitively, the database scheme describes an application in which 
 each project belongs to several departments and is carried out in 
 several locations, but a department can have only one project in each 
 location. In each department participating in a project, there is a 
 manager responsible for that project. Each manager has an assistant in 
 each location. These relation schemes satisfy the uniqueness condition. 
 
 Suppose we want to compute the total projection of the representa- 
 tive instance onto LM. After Step (1) of the algorithm, S = {LDP,LMA}. 
 The minimal extension join of LDP with respect to LM is LDP M PDM, and 
 the minimal extension join of LMA with respect to LM is LMA. In Step 
 (2) of the algorithm S is not changed, and so an expression for the LM- 
 total projection of the representative instance is 
 Tt LM (LDP M PDM) \> ti lm (LMA). 
 
 The result of the above expression is all tuples (l,m) such that 
 either manager m has an assistant in location 1 or manager m manages 
 some project in location 1. Considering the fact that there might be 
 partial information (e.g., a manager with a project in a location where 
 he does not have an assistant, or a manager with an assistant in a loca- 
 tion where he does not have a project), the correct answer is indeed 
 given by the above expression. [] 
 
- 33 - 
 
 6. Conclusions 
 
 We have proposed the representative Instance as a measure of the 
 data stored In the database, and we have characterized, in terms of the 
 uniqueness condition, the database schemes for which the representative 
 instance always satisfies the functional dependencies. The uniqueness 
 condition can be viewed as an extension of Boyce-Codd normal form, since 
 it removes inter-relation anomalies in the sense that each relation can 
 be updated independently of the contents of the other relations without 
 violating global consistency. (It is an extension of Boyce-Codd normal 
 form, since a relation scheme that satisfies the uniqueness condition is 
 also in Boyce-Codd normal form.) This condition is much less restrictive 
 than [BG] , and we believe that many practical applications satisfy it. 
 We have also shown how to compute efficiently total projections of the 
 representative instance if the uniqueness condition is satisfied. 
 
 In [Sa] we have dealt with database schemes that do not satisfy the 
 uniqueness condition. We have shown that the representative instance 
 can be guaranteed to satisfy the functional dependencies if the modified 
 foreign-key constraint is imposed on the database. The modified 
 foreign-key constraint is less restrictive and semantically more mean- 
 ingful than imposing the existence of a universal instance. Under this 
 constraint, total projections of the representative instance can be com- 
 puted by performing the union of several extension joins. 
 
- 34 - 
 
 References 
 
 [ABU] Aho, A. V. , C. Beeri, and J. D. Ullman, "The Theory of Joins in 
 Relational Databases," ACM Trans , on Database Systems , Vol. 4, No. 
 3 (Sept. 1979), pp. 297-314. 
 
 [ASU] Aho, A. V. , Y. Sagiv, and J. D. Ullman, "Equivalences Among Rela- 
 tional Expressions," SIAM J. Computing , Vol. 8, No. 2 (May 1979), 
 pp. 218-246. 
 
 [Arm] Armstrong, W. W., "Dependency Structures of Database Relation- 
 ships," Proc . IFIP 74 , North Holland, 1974, pp. 580-583. 
 
 [BB] Beeri, C. and P. A. Bernstein, "Computational Problems Related to 
 the Design of Normal Form Relational Schemas," ACM Trans , on 
 Database Systems , Vol. 4, No. 1 (March 1979), pp. 30-59. 
 
 [Ber] Bernstein, P. A., "Synthesizing Third Normal Form Relations from 
 Functional Dependencies," ACM Trans , on Database Systems , Vol. 1, 
 No. 4 (Dec. 1976), pp. 277-298. 
 
 [BG] Bernstein, P. A., and N. Goodman, "What Does Boyce-Codd Normal 
 Form Do?," Proc . Int . Conf. on Very Large Data Bases , 1980, pp. 
 245-259. 
 
 [BDB] Biskup, J. , U. Dayal, and P. A. Bernstein, "Synthesizing Indepen- 
 dent Database Schemas," Proc . ACM-SIGMOD Int . Conf . on Management 
 of Data , 1979, pp. 143-151. 
 
 [CI] Codd, E. F., "A Relational Model for Large Shared Data Banks," 
 Comm . ACM , Vol. 13, No. 6 (June 1970), pp. 377-387. 
 
 [C2] Codd, E. F., "Extending the Database Relational Model to Capture 
 More Meaning," ACM Trans , on Database Systems , Vol. 4, No. 4 (Dec. 
 1979), pp. 397-434. 
 
 [FMU] Fagin, R., A. 0. Mendelzon, and J. D. Ullman, "A Simplified 
 Universal Relation Assumption and Its Properties," IBM research 
 report, RJ2900, Nov., 1980. 
 
 [Hoi] Honeyman, P., "Testing Satisfaction of Functional Dependencies," 
 Proc . XP1 Conf. , Stonybrook, N. Y., June, 1980. 
 
 [Ho2] Honeyman, P. , "Extension Joins," Proc . Int . Conf . on Very Large 
 Data Bases , 1980, pp. 239-244. 
 
 [HLY] Honeyman, P., R. E. Ladner, and M. Yannakakis, "Testing the 
 Universal Instance Assumption," Information Processing Letters , 
 Vol. 10, No. 1 (Feb. 1980), pp. 14-19. 
 
 [Ko] Korth, H. F., "A Proposal for the SYSTEM/U Query Language," unpub- 
 lished memorandum, Stanford University, Stanford, CA, 1980. 
 
 [KU] Korth, H. F. and J. D. Ullman, "System/U: a Database System Based 
 on the Universal Relation Assumption," Proc . XP1 Conf . , Sto- 
 nybrook, N. Y., June, 1980. 
 
 [Li] Lien, Y. E., "Multivalued Dependencies With Null Values in Rela- 
 tional Databases," Proc . Int . Conf . on Very Large Data Bases , 
 1979. 
 
 [Ma] Maier, D. , "Discarding the Universal Instance Assumption: Prelim- 
 inary Results," Proc . XP1 Conf ., Stonybrook, N. Y., June, 1980. 
 
 [MMS] Maier D. , A. 0. Mendelzon, and Y. Sagiv, "Testing Implications of 
 Data Dependencies," ACM Trans , on Database Systems , Vol. 4, No. 4 
 (Dec. 1979), pp. 455-469. 
 
 [MU] Maier, D. , and J. D. Ullman, "Maximal Objects and the Semantics of 
 
- 35 - 
 
 Universal Relation Databases," Tech. Rept. 80-016, Dept. of Com- 
 puter Science, State University of New York at Stony Brook, Stony 
 Brook, NY, Nov. 1980. 
 
 [Sa] Sagiv, Y. , "Can We Use the Universal Instance Assumption Without 
 Using Nulls?" Proc . ACM-SIGMOD Int . Conf . on Management of Data , 
 Ann Arbor, April 1981, pp. 108-120. 
 
 [SY] Sagiv, Y., and M. Yannakakis, "Equivalences Among Relational 
 Expressions with the Union and Difference Operators," »J. ACM , Vol. 
 27, No. 4 (Oct. 1980), pp. 633-655. 
 
 [Sc] Sciore, E., "The Universal Instance and Database Design," TR 271, 
 Dept. of Elec Eng. and Comp. Sci., Princeton University, Prince- 
 ton, N. J. , June, 1980. 
 
 [Val] Vassiliou, Y. , "Functional Dependencies and Incomplete Informa- 
 tion," Proc . Int . Conf . on Very Large Data Bases , 1980, pp. 
 260-269. 
 
 [Va2] Vassiliou, Y. , "A Formal Treatment of Imperfect Information in 
 Database Management," Technical Report CSRG-123, University of 
 Toronto, Nov. , 1980. 
 
BIBLIOGRAPHIC DATA 
 SHEET 
 
 1. Report No. 
 
 ijtitcdcs-r-si-ios? 
 
 3. Recipient's Accession No. 
 
 4. Title and Subtitle 
 
 A Characterization of Globally Consistent Databases and 
 their Correct Access Paths 
 
 5. Report Date 
 
 July 1981 
 
 7. Author(s) 
 
 Yehoshua Sagiv 
 
 8. Performing Organization Rept. 
 No. 
 
 Performing Organization Name and Address 
 
 Department of Computer Science 
 University of Illinois 
 1304 W. Springfield 
 Urbana, IL 61801 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract /Grant No. 
 
 MCS- 80-03308 
 
 12. Sponsoring Organization Name and Address 
 
 National Science Foundation 
 Washington, D.C. 
 
 13. Type of Report & Period 
 Covered 
 
 14. 
 
 15. Supplementary Notes 
 
 16. Abstracts \j e propose the representative instance as a representation of the data stored 
 in a database whose relations are not the projections of a universal instance. We 
 characterize the database schemes for which local consistency implies global consis- 
 tency. (Local consistency means that each relation satisfies its own functional depen- 
 dencies, and global consistency means that the representative instance satisfies all 
 the functional dependencies.) We show how to compute efficiently projections of the 
 representative instance provided that local consistency implies global consistency. 
 Throughout this work we assume that a cover of the functional dependencies is embodied 
 in the database scheme in the form of keys. 
 
 17. Key Words and Document Analysis. 17a. Descriptors 
 
 17b. Identifiers/Open-Ended Terms 
 
 17c. COSAT1 Field/Group 
 
 18. Availability Statement 
 
 19. Security Class (This 
 Report) 
 
 UNCLASSIFIED 
 
 20. Security Class (This 
 Page 
 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 -22. 
 
 22. Price 
 
 FORM NTIS-35 (10-70) 
 
 USCOMM-OC 40329-P7I