UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN

[...] and the union of sets of attributes $X$ and $Y$ is written $XY$. In this paper we assume that all the attributes are drawn from a universal set of attributes $U$.

A functional dependency (abbr. FD) [Arm, Cod] is a statement of the form $X \to Y$, where both $X$ and $Y$ are sets of attributes. The FD $X \to Y$ holds in a relation $r$ if, for all tuples $\mu$ and $\nu$ of $r$, $\mu[X] = \nu[X]$ implies $\mu[Y] = \nu[Y]$.

Let $R_1,\dots,R_q$ be relation schemes, and let $r$ be a relation on $\bigcup_{i=1}^{q} R_i$. Suppose that $\mu_1,\dots,\mu_q$ are $q$ tuples of $r$ (not necessarily distinct). The tuples $\mu_1,\dots,\mu_q$ are joinable on $R_1,\dots,R_q$ with a result $\nu$ if there exists a mapping $\nu$ defined on $\bigcup_{i=1}^{q} R_i$ such that for all $i$ ($1 \le i \le q$), $\nu[R_i] = \mu_i[R_i]$. [...]

Let $\Sigma$ be a set of JD's, and let $\sigma$ be a JD or an FD. We assume that all the JD's are defined on $U$. The dependency $\sigma$ is a consequence of $\Sigma$ (or $\sigma$ is implied by $\Sigma$) if and only if, for all relations $r$ on $U$, the dependency $\sigma$ holds in $r$ whenever all the dependencies of $\Sigma$ hold in $r$.

Let $\Sigma$ be a set of dependencies. A convenient way of representing all the MVD's with a fixed left side that are implied by $\Sigma$ is by constructing the dependency basis. The dependency basis of a set of attributes $X$ is a partition of $U$ into pairwise disjoint subsets of attributes $X, Y_1,\dots,Y_n$ such that

(1) $X \twoheadrightarrow Y_i$ is implied by $\Sigma$ ($1 \le i \le n$), and

(2) if $X \twoheadrightarrow Y$ is implied by $\Sigma$, then $Y$ is a union of some of the $Y_i$'s.

The existence of the dependency basis follows from the inference rules for MVD's [Fag]. If $\Sigma$ contains only FD's and MVD's, then the dependency basis can be constructed in polynomial time [Bel, Gal, HIT, Sag].

A tableau [ABU, ASU] is a two-dimensional matrix in which columns correspond to attributes.
The rows of a tableau consist of variables of the following types: (1) distinguished variables, usually denoted by subscripted $a$'s, and (2) nondistinguished variables, usually denoted by subscripted $b$'s. A variable cannot appear in more than one column, and in each column there is exactly one distinguished variable.

A JD $*[R_1,\dots,R_q]$ has a corresponding tableau $T$ as follows. For each $R_i$, tableau $T$ has a row $w_i$ with distinguished variables exactly in the $R_i$-columns, and distinct nondistinguished variables in the rest of the columns. We can also view a tableau as a relation over the domain of distinguished and nondistinguished variables. Note that rows $w_1,\dots,w_q$ of $T$ are joinable on $R_1,\dots,R_q$, and the resulting row consists only of distinguished variables.

Example 1: Consider the JD $*[AB,BCD,AD]$. The tableau corresponding to this JD is

$$\begin{array}{cccc}
A & B & C & D \\ \hline
a_1 & a_2 & b_1 & b_2 \\
b_3 & a_2 & a_3 & a_4 \\
a_1 & b_4 & b_5 & a_4
\end{array}$$

□

Let $\Sigma$ be a set of FD's and JD's. Each dependency in $\Sigma$ has an associated rule that can be applied to any tableau $T$ as follows.

(1) FD-Rules. An FD $X \to Y$ in $\Sigma$ has an associated rule for equating variables of $T$ as follows. Suppose that rows $w_1$ and $w_2$ of $T$ agree in all the $X$-columns, but disagree in an $A$-column, where $A$ is an attribute of $Y$. If one of $w_1$ and $w_2$ has a distinguished variable in its $A$-column, then rename the two rows so that $w_1$ is that row. The FD-rule for $X \to Y$ replaces all occurrences of the variable appearing in the $A$-column of $w_2$ with the variable appearing in the $A$-column of $w_1$.

(2) JD-Rules. A JD $*[S_1,\dots,S_p]$ in $\Sigma$ has an associated rule for adding rows to $T$ as follows. If rows $w_1,\dots,w_p$ of $T$ are joinable on $S_1,\dots,S_p$ with a result $w$, and $w$ is not already in $T$, then $w$ is added to $T$.

Each one of the above rules transforms a tableau $T$ to another tableau $T'$. The rules can be applied repeatedly to a tableau $T$ only a finite number of times, and the result is unique (up to renaming of nondistinguished variables) [MMS].
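The FD-rule and JD-rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of [MMS]: the function names (`fd_rule`, `jd_rule`, `chase`), the encoding of rows as tuples of strings, and the convention that distinguished variables start with `a` are all assumptions made here. The sketch also assumes the fixed attribute order `ABCD` and that every JD covers all four attributes.

```python
from itertools import product

COLS = "ABCD"  # assumed attribute order; a row is a tuple with one variable per column


def is_distinguished(var):
    # Assumed naming convention: "a1", "a2", ... are distinguished, "b1", ... are not.
    return var.startswith("a")


def fd_rule(rows, lhs, rhs):
    """Apply the FD-rule for lhs -> rhs once; return the new tableau, or None if it does not apply."""
    for w1, w2 in product(rows, repeat=2):
        if all(w1[COLS.index(c)] == w2[COLS.index(c)] for c in lhs):
            for c in rhs:
                i = COLS.index(c)
                keep, drop = w1[i], w2[i]
                if keep == drop:
                    continue
                if is_distinguished(drop):
                    # "Rename the two rows" so that the distinguished variable survives.
                    keep, drop = drop, keep
                # Replace every occurrence of `drop`; identical rows collapse in the set.
                return {tuple(keep if v == drop else v for v in row) for row in rows}
    return None


def join(combo, schemes):
    """Return the result of joining rows `combo` on `schemes`, or None if they are not joinable."""
    result = {}
    for row, scheme in zip(combo, schemes):
        for c in scheme:
            v = row[COLS.index(c)]
            if result.setdefault(c, v) != v:
                return None  # two rows disagree on a shared attribute
    return tuple(result[c] for c in COLS)  # assumes the schemes cover all of COLS


def jd_rule(rows, schemes):
    """Apply the JD-rule once: add one join result not yet in the tableau, or return None."""
    for combo in product(rows, repeat=len(schemes)):
        w = join(combo, schemes)
        if w is not None and w not in rows:
            return rows | {w}
    return None


def chase(rows, fds, jds):
    """Apply FD-rules and JD-rules until none applies."""
    rows = set(rows)
    while True:
        candidates = [fd_rule(rows, lhs, rhs) for lhs, rhs in fds]
        candidates += [jd_rule(rows, schemes) for schemes in jds]
        applicable = [t for t in candidates if t is not None]
        if not applicable:
            return rows
        rows = applicable[0]


# Tableau of Example 1 for the JD *[AB,BCD,AD].
T = {("a1", "a2", "b1", "b2"), ("b3", "a2", "a3", "a4"), ("a1", "b4", "b5", "a4")}

# Dependencies of Example 2: A -> B, C -> A, and *[AB,BCD,ABD].
result = chase(T, fds=[("A", "B"), ("C", "A")], jds=[["AB", "BCD", "ABD"]])
print(sorted(result))
```

Storing the tableau as a set of tuples means that a row which becomes identical to another after an FD-rule is dropped automatically. The search over all row combinations in `jd_rule` is exponential in the number of components of the JD, so the sketch is suitable only for small examples.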
The chase of $T$ under $\Sigma$, denoted $\mathrm{chase}_\Sigma(T)$, is the tableau obtained by applying the rules associated with $\Sigma$ to $T$ until no rule can be applied anymore. Let $\sigma$ be a JD with a corresponding tableau $T_\sigma$. The JD $\sigma$ is a consequence of $\Sigma$ if and only if $\mathrm{chase}_\Sigma(T_\sigma)$ contains a row consisting only of distinguished variables [MMS].

Example 2: Let $\Sigma = \{*[AB,BCD,ABD],\ A \to B,\ C \to A\}$, and let $\sigma$ be the JD $*[AB,BCD,AD]$ whose corresponding tableau is given in Example 1. The FD-rule for $A \to B$ can be applied to the first and third rows of the tableau in Example 1, and hence $b_4$ is identified with $a_2$. The resulting tableau is

$$\begin{array}{cccc}
A & B & C & D \\ \hline
a_1 & a_2 & b_1 & b_2 \\
b_3 & a_2 & a_3 & a_4 \\
a_1 & a_2 & b_5 & a_4
\end{array}$$

The first, second, and third rows of the above tableau are joinable on $AB, BCD, ABD$ with a result $(a_1,a_2,a_3,a_4)$. Thus, applying the JD-rule for $*[AB,BCD,ABD]$ produces the tableau

$$\begin{array}{cccc}
A & B & C & D \\ \hline
a_1 & a_2 & b_1 & b_2 \\
b_3 & a_2 & a_3 & a_4 \\
a_1 & a_2 & b_5 & a_4 \\
a_1 & a_2 & a_3 & a_4
\end{array}$$

Applying the FD-rule for $C \to A$ to the second and fourth rows of the above tableau identifies $b_3$ with $a_1$. As a result the second row becomes identical to the fourth row, and hence it can be omitted. The resulting tableau is

$$\begin{array}{cccc}
A & B & C & D \\ \hline
a_1 & a_2 & b_1 & b_2 \\
a_1 & a_2 & b_5 & a_4 \\
a_1 & a_2 & a_3 & a_4
\end{array}$$

No rule for $\Sigma$ can be applied to the above tableau. □

3. NP-Completeness Results Concerning Join Dependencies

3.1 Boolean Expressions and Tableaux

All the results use almost the same reduction from the 3-satisfiability (3-SAT) problem, shown NP-complete in [C]; see also [K, GJ]. Let $Q = F_1 \cdots F_m$ be a Boolean expression in conjunctive normal form, where the $F_j$'s are clauses of three literals each, and $x_1,x_2,\dots,x_n$ are all the variables appearing in this expression. We denote the variables appearing in a clause $F_j$ by $x_{j_1}$, $x_{j_2}$, and $x_{j_3}$. We assume that $n \ge 4$ (and hence $m > 1$), and that each variable appears in at least two clauses.
Note that if $n \le 3$, then the satisfiability of $Q$ can be decided in linear time; and if a variable appears in only one clause, then this clause is always satisfied and, hence, it can be omitted. Thus, this variant of the 3-SAT problem is NP-complete.

We now show how $Q$ is used to construct two tableaux that correspond to join dependencies. These tableaux are similar to those used in the NP-completeness proofs given in [ASU]. Each one of them has $(m+3n+2)$ columns. The first $m$ columns correspond to the clauses $F_1,\dots,F_m$, and they are labeled by the attributes $E_1,\dots,E_m$. The next $3n$ columns are divided into three blocks of $n$ columns each. The $n$ columns in each block correspond to the variables $x_1,\dots,x_n$. The columns of the three blocks are labeled by $A_i$'s, $B_i$'s, and $C_i$'s, respectively. The last two columns are labeled by $D_1$ and $D_2$.

The first tableau, denoted by $S$, represents the $m$ clauses. For each clause $F_j$ containing the variables $x_{j_1}$, $x_{j_2}$, and $x_{j_3}$, tableau $S$ has a row $s_j$ as follows. Row $s_j$ has distinguished variables in the columns for $E_j$, $A_{j_1}$, $A_{j_2}$, $A_{j_3}$, and $D_1$. All the other columns have distinct nondistinguished variables. The tableau $S$ has one more row, denoted by $s_{m+1}$, with distinguished variables in all the $E$, $B$, $C$, and $D_2$ columns (the rest of the columns have distinct nondistinguished variables). Let $S_j$ be the relation scheme corresponding to row $s_j$ of $S$ ($1 \le j \le m+1$). [...]

[Figure 1]

Thus, the truth assignment represented by $w$ is now given in the $A$-columns of $w$. It is easy to show that no further applications of FD-rules are possible. Let $T'$ be the tableau obtained by applying the FD-rules to $T$.

Lemma 1: Suppose that rows $w_1,\dots,w_{m+1}$ of $T'$ are joinable on $S_1,\dots,S_{m+1}$ with a result $w$, and $w$ is not in $T'$.
Then $w_{m+1}$ is $u$, and for all $1 \le j$ [...] Thus, all the $w_i$'s ($i \ne k$) are equal to a row of $T'$ representing a truth assignment for $F_k$, and $w_k$ is also a row representing a truth assignment for $F_k$. But every variable $x_i$ appears in more than one clause and, hence, the pattern of the distinguished variables in the $A$-columns of tableau $S$ implies that $w_k$ represents the same truth assignment as all the other $w_i$'s. That is, all the $w_i$'s are identical.

So far we have shown that if $w_{m+1}$ is not $u$, then all the $w_i$'s are identical. Now suppose that some $w_j$ is not a row representing a truth assignment for $F_j$. If $w_{m+1}$ is $u$, then Claim 2 implies that for all $1 \le i$ [...] for all the variables $x_1,\dots,x_n$ such that for all $1 \le j$ [...] on the variables of $F_j$; and let $w_{m+1}$ be row $u$ of $T'$. It is easy to show that the rows $w_1,\dots,w_{m+1}$ are joinable on $S_1,\dots,S_{m+1}$ with a result $w$ not in $T'$. □

Lemma 3: The JD $\sigma$ (corresponding to $T$) is a consequence of $\Sigma$ if and only if $Q$ is satisfiable.

Proof: Only if. By Corollary 2, if $Q$ is not satisfiable, then the only JD-rule for $\Sigma$ cannot be applied to $T'$. Therefore, $\mathrm{chase}_\Sigma(T)$ is the result of applying the FD-rules to $T$, i.e., $\mathrm{chase}_\Sigma(T) = T'$. This chase does not contain a row with only distinguished variables and, hence, $\sigma$ is not a consequence of $\Sigma$.

If. Suppose that $Q$ is satisfiable. By Lemma 1 and Corollary 2, an application of the JD-rule for $\Sigma$ to $T'$ adds a row $w$ that has distinguished variables in all the $E$, $B$, $C$, and $D$ columns. We can apply the FD-rules for $D_1D_2 \to A_j$ ($1 \le j$ [...]