UIUCDCS-R-79-980
UILU-ENG 79 1732
July 1979

Subset Dependencies as an Alternative to Embedded Multivalued Dependencies

by

Yehoshua Sagiv
Scott Walecka

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

(+) The work of this author was supported in part by the National Science Foundation under grant MCS-77-22830.

ABSTRACT

We show that the inference rules for multivalued dependencies cannot be extended to a complete set of inference rules for embedded multivalued dependencies. A new type of dependencies, called subset dependencies, is introduced. Subset dependencies are a generalization of embedded multivalued dependencies. We give a set of inference rules for subset dependencies and investigate their properties.

CR categories: 4.33

Key words and phrases: multivalued dependency, embedded multivalued dependency, subset dependency, inference rule, relational database.

1. Introduction

The relational database model [Cod] uses dependencies as a semantic tool for expressing properties of the data. Functional [Arm, Cod] and multivalued dependencies [BFH, Fa1, Zan] are the most common types of dependencies, and they have been investigated thoroughly (e.g., [Bel, BB, Bi1, Bi2, Fa2, HIT, Mak, Men, Nic, Sag, SaF]). A complete utilization of multivalued dependencies requires that we also deal with embedded multivalued dependencies, i.e., those multivalued dependencies that hold in a projection of a relation but not necessarily in the relation itself.
In contrast to functional and multivalued dependencies, the properties of embedded multivalued dependencies are largely unknown. Attempts have been made to extend the inference rules of [BFH] for multivalued dependencies to a complete set of inference rules for embedded multivalued dependencies [TK1, TK2]. However, in this paper we show that no such extension exists. The proof is carried out by showing that for every positive integer n, there is a set Σ of n embedded multivalued dependencies that implies another embedded multivalued dependency σ, but the only embedded multivalued dependencies implied by any proper subset of Σ are those obtained by augmentation and projection.

We also introduce a new type of dependencies, called subset dependencies, that is a generalization of embedded multivalued dependencies. A set of inference rules for subset dependencies is presented. This set of rules is not known to be complete. However, it is superior to the rules of [BFH] in the following sense. We show the existence of a subset of embedded multivalued dependencies for which one cannot obtain a complete set of inference rules by extending the rules of [BFH], and yet our rules are complete for this subset. Our rules also imply the rules of [BFH] for multivalued dependencies, and the known rules for embedded multivalued dependencies [ABU, Fa1, TK1].

2. Basic Definitions and Results

2.1. The Relational Model for Databases

The relational model for databases assumes that the data is stored in tables called relations. The columns of a table correspond to attributes, and the rows to records or tuples. Each attribute has an associated domain of values. It is convenient to regard a tuple as a mapping from the attributes to their domains, since in this way no canonical ordering of the attributes is needed. A relation scheme is a set of attributes labeling the columns of a table. We often use the relation scheme itself as the name of the table.
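The view of a tuple as a mapping from attributes to values can be made concrete in a few lines of Python. The sketch below is only an illustration under assumed conventions, not part of the report: each tuple is a dictionary keyed by attribute name, a relation is a set of such tuples frozen into hashable form, and the helpers `freeze`, `relation`, and `project`, as well as the EMP/DEPT/MGR sample data, are hypothetical. Because a dictionary has no canonical column order, two listings of the same tuple collapse into one. The `project` helper anticipates the projection r(Y) defined in the next paragraph: it drops columns and identifies common tuples.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical encoding for illustration: a tuple is a dict from attribute
# name to value, so no canonical ordering of the attributes is needed.
Row = Dict[str, str]
FrozenRow = FrozenSet[Tuple[str, str]]


def freeze(mu: Row) -> FrozenRow:
    """Hashable, order-free form of a tuple: its set of (attribute, value) pairs."""
    return frozenset(mu.items())


def relation(rows: Iterable[Row]) -> Set[FrozenRow]:
    """A relation is a set of tuples, so equal mappings collapse into one tuple."""
    return {freeze(mu) for mu in rows}


def project(r: Set[FrozenRow], y: FrozenSet[str]) -> Set[FrozenRow]:
    """r(Y): drop the columns not in Y, then identify common tuples."""
    return {frozenset((a, v) for a, v in mu if a in y) for mu in r}


scheme = frozenset({"EMP", "DEPT", "MGR"})  # a relation scheme is a set of attributes
r = relation([
    {"EMP": "Smith", "DEPT": "Toys", "MGR": "Jones"},
    {"MGR": "Jones", "EMP": "Smith", "DEPT": "Toys"},  # same tuple, attributes listed in another order
    {"EMP": "Brown", "DEPT": "Toys", "MGR": "Jones"},
])
assert all({a for a, _ in mu} == scheme for mu in r)  # every tuple is defined on the scheme

print(len(r))                                    # 2 distinct tuples
print(project(r, frozenset({"DEPT", "MGR"})))    # a single tuple: DEPT=Toys, MGR=Jones
```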
A relation can be viewed as the "current value" of a relation scheme. Suppose that r is a relation defined on a relation scheme X. Let μ be a tuple of r and A an attribute in X. The tuple μ maps the attribute A to μ(A), and μ(A) is called the A-value of μ. If Y is a subset of X, then μ(Y) is a tuple defined only on the attributes of Y; the tuple μ(Y) maps each attribute A of Y to μ(A). We call μ(Y) a Y-value in r and usually denote it by y. If tuples μ and ν agree on all the attributes of the set X, then we write μ(X) = ν(X). The projection of the relation r onto Y is obtained by removing columns of r not corresponding to attributes of Y and identifying common tuples, i.e., r(Y) = {μ(Y) | μ is in r}.

[...]

(Note that v_0 occurs in μ.) Thus, there is a tuple ν in r corresponding to a set W of CON(X) with a V-value v_0 and a Z-value z_1. It follows that V must be a subset of W, because W contains all the attributes (except those in Z) that are mapped to 0 by ν. But in this case there is an edge from [V] to [W] and, hence, there is a path from [V] to [X] (because there is a path from [W] to [X]). Thus, we have proved the following claim.

Claim 2: Let V be disjoint from Z. If Z(v_0) = {z_0, z_1}, then V is in CON(X) (i.e., there is a path from [V] to [X]).

We now show that all the ZSD's of Σ hold in r but Z(X) ⊆ Z(Y) fails in r. In proof, Z(x_0) = {z_0, z_1}, because X is in CON(X). Since there is no path from [Y] to [X], Z(y_0) = {z_0} by Claim 2. Thus Z(X) ⊆ Z(Y) fails in r. Let Z(W) ⊆ Z(V) be any ZSD in Σ. By Claim 1, in order to prove that Z(W) ⊆ Z(V) holds in r, it is sufficient to show that Z(w_0) ⊆ Z(v_0) for the WV-value w_0v_0 occurring in μ. By Claim 1, if Z(w_0) = {z_0} we are done. So suppose that Z(w_0) = {z_0, z_1}. By Claim 2, there is a path from [W] to [X] and, hence, there is a path from [V] to [X] (because Z(W) ⊆ Z(V) implies an edge from [V] to [W]). Hence, Z(v_0) = {z_0, z_1}. This completes the proof.
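The argument above decides which ZSD's hold in one specially constructed relation r. The same semantic check can be sketched by brute force in Python. The sketch assumes the definitions the proof relies on: Z(x) is the set of Z-values that occur in r together with the X-value x, and Z(X) ⊆ Z(Y) holds in r if Z(x) ⊆ Z(y) for every XY-value xy of r. EMVD satisfaction is the standard one: X →→ Y|Z holds if, whenever two tuples μ and ν agree on X, some tuple t in r has t(XY) = μ(XY) and t(XZ) = ν(XZ). All function names are hypothetical. The loops are meant only for tiny relations, and this is not the graph-based test of Theorem 11.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Set, Tuple

Row = Dict[str, int]                  # a tuple: attribute -> value
Attrs = FrozenSet[str]                # a set of attributes
Value = Tuple[Tuple[str, int], ...]   # a hashable X-value such as mu(X)


def restrict(mu: Row, x: Attrs) -> Value:
    """mu(X): the tuple mu restricted to the attributes of X."""
    return tuple(sorted((a, mu[a]) for a in x))


def z_of(r: List[Row], mu: Row, x: Attrs, z: Attrs) -> Set[Value]:
    """Z(x) for x = mu(X): every Z-value that occurs in r with the X-value x."""
    x_value = restrict(mu, x)
    return {restrict(nu, z) for nu in r if restrict(nu, x) == x_value}


def zsd_holds(r: List[Row], z: Attrs, x: Attrs, y: Attrs) -> bool:
    """Z(X) ⊆ Z(Y): Z(x) ⊆ Z(y) for every XY-value xy in r (assumed definition)."""
    return all(z_of(r, mu, x, z) <= z_of(r, mu, y, z) for mu in r)


def emvd_holds(r: List[Row], x: Attrs, y: Attrs, z: Attrs) -> bool:
    """X ->-> Y|Z: if mu, nu agree on X, some t has t(XY) = mu(XY), t(XZ) = nu(XZ)."""
    xy, xz = x | y, x | z
    for mu in r:
        for nu in r:
            if restrict(mu, x) != restrict(nu, x):
                continue
            if not any(restrict(t, xy) == restrict(mu, xy)
                       and restrict(t, xz) == restrict(nu, xz) for t in r):
                return False
    return True


# Two tuples over {0,1} that agree exactly on the attributes of Z and W,
# in the spirit of the relation built in the proof of Lemma 14.
W, V, Y, Z = (frozenset({a}) for a in "WVYZ")
r14 = [{"W": 0, "V": 0, "Y": 0, "Z": 0},
       {"W": 0, "V": 1, "Y": 1, "Z": 0}]
print(zsd_holds(r14, Z, W, W | V))   # True: every Z(x) is the single Z-value
print(emvd_holds(r14, W, V, Y))      # False: neither V nor Y is contained in Z

# For pairwise disjoint X, Y, Z, compare X ->-> Y|Z with Z(X) ⊆ Z(XY)
# on every relation over {A, B, C} (values 0/1) with at most 3 tuples.
A, B, C = (frozenset({a}) for a in "ABC")
rows = [dict(zip("ABC", vals)) for vals in product((0, 1), repeat=3)]
mismatches = sum(
    emvd_holds(list(rs), A, B, C) != zsd_holds(list(rs), C, A, A | B)
    for k in range(1, 4) for rs in combinations(rows, k))
print(mismatches)                    # 0
```

The final loop illustrates that, for pairwise disjoint X, Y, and Z, the EMVD X →→ Y|Z and the ZSD Z(X) ⊆ Z(XY) hold in exactly the same relations. This is consistent with the Z-EMVD cover of Section 5 being defined through paths from [XY] to [X].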
Lemma 9 and Lemma 10 provide a method for deciding whether a set of ZSD's Σ implies another ZSD Z(X) ⊆ Z(Y). In order to do so, construct a Z-graph G_Σ with nodes corresponding to X and Y, and check whether there is a path from [Y] to [X].

Theorem 11: Testing whether a ZSD Z(X) ⊆ Z(Y) is a consequence of a set of ZSD's Σ can be done in O(n^2) time, where n is the size of the input.

Proof: Assuming that the attributes in the input are represented by the numbers 1, ..., k, a Z-graph containing nodes for X and Y can be constructed in O(n^2) time. Testing whether there is a path from [Y] to [X] requires only linear time (in the size of the graph). []

A Z embedded multivalued dependency (abbr. Z-EMVD) is an EMVD of the form X →→ Y|Z, where Z is a fixed set of attributes.

Corollary 12: Testing whether a Z-EMVD X →→ Y|Z is a consequence of a set of Z-EMVD's Σ can be done in O(n^2) time, where n is the size of the input.

5. ZSD's and EMVD's

In this section we investigate the EMVD's implied by a set of ZSD's Σ. Let MG_Σ be the minimal Z-graph for Σ. The Z-EMVD cover of Σ, written Z-EMVD(Σ), is the set

{X →→ Y|Z | there is a path from [XY] to [X] in MG_Σ}.

We will show that an EMVD τ is implied by Σ only if there is a Z-EMVD σ in Z-EMVD(Σ) such that τ is obtained from σ by augmentation and projection.

Lemma 13: If a Z-EMVD X →→ Y|Z is a consequence of a set of ZSD's Σ, then X →→ Y|Z can be obtained from a Z-EMVD in Z-EMVD(Σ) by augmentation and projection.

Proof: If both X and XY are in KER(Σ), then X →→ Y|Z is in Z-EMVD(Σ) and we are done. Assume that neither X nor XY is in KER(Σ) (the other two cases, in which exactly one of X and XY is in KER(Σ), are proved similarly). Let G_Σ be a Z-graph in which all the nodes correspond to members of the set KER(Σ) ∪ {[X], [XY]}. Since X →→ Y|Z is a consequence of Σ, there is a path in G_Σ from [XY] to [X]. Let the first edge in this path be from [XY] to [S], and the last edge be from [T] to [X].
An edge from [XY] to [S] can exist only if XY ⊆ S. Similarly, T ⊆ X and, hence, T ⊆ S. Let S be written as TS', where S' is disjoint from T. Thus, Z-EMVD(Σ) contains the Z-EMVD T →→ S'|Z. It is easy to show that X →→ Y|Z follows from T →→ S'|Z by augmentation and projection. []

Lemma 14: If W →→ V|Y is a nontrivial EMVD implied by a set Σ of ZSD's, then either V ⊆ Z or Y ⊆ Z. (It is assumed that W, V, and Y are pairwise disjoint.)

Proof: Construct a relation r over {0,1} with two tuples that agree exactly on the attributes of Z and W. Let z be the Z-value of the two tuples in r. Obviously, for every X-value x, Z(x) = {z}. Thus all the ZSD's of Σ hold in r and, hence, W →→ V|Y holds in r. By Lemma 3 in [SaF], either V ⊆ Z or Y ⊆ Z. []

Suppose that σ is a nontrivial EMVD implied by a set of ZSD's Σ. By Lemma 14, σ can be written as W' →→ V'|Z', where Z' ⊆ Z. (It is assumed that W', V', and Z' are pairwise disjoint.) We now prove the following lemma.

Lemma 15: There exists a Z-EMVD W →→ V|Z implied by Σ such that W' →→ V'|Z' can be obtained from W →→ V|Z by augmentation and projection.

Proof: We use the same method as in the proof of Lemma 10. In that proof we built a relation r having two Z-values, z_0 and z_1, that disagreed on all the columns of Z. It is sufficient, however, that z_0 and z_1 be distinct. Thus z_1 is replaced with z, where z has 0's exactly in the columns of W' ∩ Z and 1's in all the other columns of Z. Since W' →→ V'|Z' is a nontrivial EMVD, Z' − W' is nonempty (i.e., some columns of z are indeed 1 and z is different from z_0). Let W = W' − Z, and let G_Σ be a Z-graph for Σ containing the nodes [W] and [WV']. Construct a relation r as in the proof of Lemma 10 using the Z-values z_0 and z (instead of z_1), and the set CON(W). Recall that μ is the tuple of r that maps all the attributes to 0.
Since W is in CON(W), the relation r has a tuple ν such that ν maps all the attributes of W to 0, all the attributes of Z to z, and all the other attributes to 1. Note that the tuples μ and ν agree exactly on the columns of W'. All the ZSD's of Σ hold in r and, hence, W' →→ V'|Z' also holds in r. Therefore, there is a tuple t in r such that t(W') = μ(W'), t(V') = μ(V'), and t(Z') = ν(Z'). The Z-value of t must be z, because ν maps some attributes of Z' to 1. Therefore, t and ν agree on all the columns of Z. Since t(V') = μ(V') and μ and ν disagree on every column outside W', the tuples t and ν disagree on all the columns of V'; it follows that V' is disjoint from Z.

By the construction of r, WV' must be in CON(W), because WV' contains all the columns (except those in Z) in which t has 0's. Hence, there is a path in G_Σ from [WV'] to [W]. Therefore, W →→ V'|Z is a consequence of Σ. Obviously, W' →→ V'|Z' can be obtained from W →→ V'|Z by augmentation and projection. []

6. The Nonextendibility of the MVD Inference Rules to EMVD's

In this section we show that for any positive integer n, one can find a set Σ of n EMVD's that implies another EMVD σ, but any n−1 EMVD's of Σ imply only those EMVD's that can be obtained by projection and augmentation. This result indicates that the inference rules of [BFH] for MVD's cannot be extended in any meaningful way to a set of inference rules for EMVD's.

Given a positive integer n, let X_0, X_1, ..., X_{n−1}, Z be pairwise disjoint sets of attributes. Let Σ consist of the following Z-EMVD's:

X_0 →→ X_1|Z
X_1 →→ X_2|Z
...
X_{n−2} →→ X_{n−1}|Z
X_{n−1} →→ X_0|Z

That is, Σ contains the Z-EMVD X_i →→ X_{i+1}|Z for all 0 [...] X_{n−1}|Z is a consequence of Σ.

Lemma 17: Let Σ' be a set of n−1 dependencies from Σ. If σ' is implied by Σ', then there is a Z-EMVD σ in Σ' such that σ' is obtained from σ by augmentation and projection.

Proof: Consider the graph MG_Σ'. Obviously, a path in MG_Σ' that corresponds to a Z-EMVD implied by Σ' must start in a node [XX] (for some 0 [...] X|Z is in Σ.
It is easy to see that there is a path from [XX] to [X] for all i.

[Figure 2]

That is, X →→ X|Z is implied by Σ (0