 
 UIUCDCS-R-79-985 
 
 UILU-ENG 79 1734 
 On The Complexity of Inferring Join Dependencies 
 
 by 
 David Maier 
 Yehoshua Sagiv 
 
 August 1979 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/oncomplexityofin985maie 
 
UIUCDCS-R-79-985 
 
 On the Complexity of Inferring Join Dependencies 
 
 David Maier 
 Department of Computer Science 
 State University of New York 
 
 at Stony Brook 
 Stony Brook, New York 11794 
 
 Yehoshua Sagiv 
 Department of Computer Science 
 University of Illinois at Urbana-Champaign 
 Urbana, Illinois 61801 
 
 August 1979 
 
(+) The work of this author was supported in part by the National
Science Foundation under grant MCS-77-22830.
 
ABSTRACT 
 
 It is shown that deciding whether a set of functional dependencies 
 and one join dependency implies another join dependency is NP-complete. 
 It is also shown that deciding whether a JD-rule can be applied to a 
 tableau T is NP-complete. This problem is NP-complete even if T can be 
 obtained from a tableau corresponding to a join dependency by applying 
 some FD-rules. As a result, it follows that computing the join of 
 several relations is NP-hard. 
 
 CR categories: 4.33, 5.25 
 
 Key words and phrases: functional dependency, multivalued dependency, 
 join dependency, join, membership algorithm, NP-complete, relational 
 database. 
 
 
1. Introduction
 
The relational model for databases [Cod] uses dependencies as a
semantic tool for expressing constraints that the data must satisfy.
Functional dependencies and join dependencies (that include multivalued
dependencies as a special case) are examples of such dependencies. The
utilization of these dependencies in the design of relational databases
depends upon the ability to develop membership algorithms, that is,
algorithms for deciding whether a set of dependencies $\Sigma$ implies
another dependency $\sigma$. Several efficient membership algorithms are
known if all the dependencies are functional or multivalued
[Bel, BeB, Gal, HIT, Sag], and an exponential time and space algorithm
exists for functional and join dependencies [MMS].
 
In this paper we show that if $\sigma$ is a join dependency, and
$\Sigma$ is a set of functional dependencies and one join dependency,
then deciding whether $\Sigma$ implies $\sigma$ is NP-complete. As a
by-product of this result, we show that the problem of deciding whether
a JD-rule can be applied to a tableau T, and the problem of deciding
whether a relation r does not obey a join dependency, are NP-complete.
The first problem is NP-complete even if T can be obtained from a
tableau corresponding to a join dependency by applying some FD-rules.
Another by-product is a proof that deciding whether a relation r is not
the join of relations $r_1,\ldots,r_n$ is NP-hard. It is easy to give
examples in which the join of $r_1,\ldots,r_n$ has an exponential size
(measured as a function of the space needed to write down
$r_1,\ldots,r_n$). Therefore, this result indicates that an algorithm
for computing the join of $r_1,\ldots,r_n$ whose running time is
polynomial in the size of the output (i.e., the space needed to write
down the join of $r_1,\ldots,r_n$) is unlikely to exist. A similar
result is given in [HLY]. However, our result is stronger, since we
assume that $r_1,\ldots,r_n$ are projections of some universal
instance I.
 
A recent result [Yan] shows that if $\sigma$ is a functional or a
multivalued dependency, then deciding whether a set $\Sigma$ of
functional and join dependencies implies $\sigma$ can be done in
polynomial time. Thus, the only remaining open problem is to find a
lower bound on the complexity of deciding whether a set of join
dependencies implies another join dependency. It is interesting to note
that the only known algorithm for the more restricted problem of
deciding whether a set of multivalued dependencies implies a join
dependency is exponential in time and space, and there is no known
lower bound [ABU].
 
 2. Basic Definitions 
 
A relation is a two-dimensional table in which columns correspond
to attributes, and rows correspond to records or tuples. Each attribute
has an associated domain of values, and a tuple is viewed as a mapping
from the attributes to their domains. If r is a relation, $\mu$ is a
tuple of r, and X is a set of attributes, then $\mu[X]$ denotes the
values of $\mu$ in the X-columns. A set of attributes labeling the
columns of a relation is called a relation scheme. If R is a set of
attributes labeling the columns of a relation r, then r is said to be
defined on R. We use the letters A,B,C,... to denote attributes, and the
letters ...,R,S,...,X,Y,Z to denote sets of attributes (i.e., relation
schemes). A set of attributes is written as a string of attributes
(e.g., ABCD is the set {A,B,C,D}), and the union of sets of attributes
X and Y is written XY. In this paper we assume that all the attributes
are drawn from a universal set of attributes U.
 
A functional dependency (abbr. FD) [Arm, Cod] is a statement of the
form $X \to Y$, where both X and Y are sets of attributes. The FD
$X \to Y$ holds in a relation r, if for all tuples $\mu$ and $\nu$ of r,
if $\mu[X] = \nu[X]$, then $\mu[Y] = \nu[Y]$.
 
Let $R_1,\ldots,R_q$ be relation schemes, and let r be a relation on
$\bigcup_{i=1}^{q} R_i$. Suppose that $\mu_1,\ldots,\mu_q$ are q tuples
of r (not necessarily distinct). The tuples $\mu_1,\ldots,\mu_q$ are
joinable on $R_1,\ldots,R_q$ with a result $\nu$, if there exists a
mapping $\nu$ defined on $\bigcup_{i=1}^{q} R_i$ such that for all
$1 \le i \le q$, $\mu_i[R_i] = \nu[R_i]$. A join dependency (abbr. JD)
[Ris] is a statement of the form $*[R_1,\ldots,R_q]$, where each $R_i$
is a relation scheme. The JD $*[R_1,\ldots,R_q]$ holds in a relation r
defined on $\bigcup_{i=1}^{q} R_i$ if whenever tuples
$\mu_1,\ldots,\mu_q$ of r are joinable on $R_1,\ldots,R_q$ with a
result $\nu$, then $\nu$ is also a tuple of r. The JD
$*[R_1,\ldots,R_q]$ is defined on the relation scheme
$\bigcup_{i=1}^{q} R_i$.
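The definitions of joinability and of a JD holding in a relation can be
illustrated by a minimal Python sketch. The representation (tuples as
dictionaries, relation schemes as sets of attribute names) and the
function names `join_result` and `jd_holds` are assumptions made for
this illustration only. The test of the JD is a brute-force enumeration
of all q-tuples of rows, so it takes time exponential in q.

```python
from itertools import product


def join_result(tuples, schemes):
    """Return the result nu if the tuples are joinable on the schemes, else None.

    tuples[i] is a dict (attribute -> value); schemes[i] is a set of attributes.
    """
    nu = {}
    for mu, scheme in zip(tuples, schemes):
        for attr in scheme:
            # nu[R_i] must equal mu_i[R_i]; a clash means that no nu exists.
            if nu.setdefault(attr, mu[attr]) != mu[attr]:
                return None
    return nu


def jd_holds(relation, schemes):
    """Brute-force test of the JD *[R_1,...,R_q] on a relation (list of dicts)."""
    for combo in product(relation, repeat=len(schemes)):
        nu = join_result(combo, schemes)
        if nu is not None and nu not in relation:
            return False  # a joinable combination whose result is missing
    return True


# Illustrative relation on ABC and the JD *[AB, BC].
r = [{"A": 1, "B": 1, "C": 1}, {"A": 2, "B": 1, "C": 2}]
schemes = [{"A", "B"}, {"B", "C"}]
print(jd_holds(r, schemes))
```

In this example the two tuples agree on B, so they are joinable on AB, BC
with the result (A=1, B=1, C=2), which is not a tuple of r; the JD
therefore fails until the missing combinations are added.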
 
A multivalued dependency (abbr. MVD) [BFH,Fag,Zan] is a JD with at
most two relation schemes. An MVD $*[R_1,R_2]$ is also written
$R_1 \cap R_2 \twoheadrightarrow R_1$ (or equivalently
$R_1 \cap R_2 \twoheadrightarrow R_2$). Conversely, the MVD
$X \twoheadrightarrow Y$ defined on U can be written as the JD
$*[XY,XZ]$, where $Z = U - X - Y$. Both FD's and MVD's have a complete
set of inference rules [Arm,BFH], and polynomial time membership
algorithms [Bel,BeB,Gal,HIT,Sag]. An MVD $X \twoheadrightarrow Y$ holds
in a relation r if and only if $X \twoheadrightarrow Y - X$ holds in r
[Fag]. Therefore, we can assume that in an MVD $X \twoheadrightarrow Y$
the left and right sides (i.e., X and Y) are disjoint.
 
Let $r_1,\ldots,r_n$ be relations defined on $R_1,\ldots,R_n$,
respectively. The join of $r_1,\ldots,r_n$, written
$\mathop{*}_{i=1}^{n} r_i$, is

$\{\mu \mid$ there are tuples $\mu_i \in r_i$ ($1 \le i \le n$) such
that $\mu_1,\ldots,\mu_n$ are joinable on $R_1,\ldots,R_n$ with a
result $\mu\}$.
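As an illustration of this definition, the following Python sketch
computes the join by extending partial results one relation at a time.
The representation (each relation a list of dictionaries keyed by its
own attributes) and the name `natural_join` are assumptions of this
sketch, not notation from the paper. The second usage line shows how the
output can be exponentially larger than the input: n relations with two
tuples each, on pairwise disjoint schemes, have a join of $2^n$ tuples.

```python
def natural_join(relations):
    """Join relations given as lists of dicts (attribute -> value)."""
    result = [{}]  # the join of zero relations: one empty tuple
    for rel in relations:
        extended = []
        for partial in result:
            for mu in rel:
                # mu is compatible if it agrees with partial on shared attributes.
                if all(partial.get(a, v) == v for a, v in mu.items()):
                    merged = {**partial, **mu}
                    if merged not in extended:
                        extended.append(merged)
        result = extended
    return result


r1 = [{"A": 1, "B": 1}, {"A": 2, "B": 1}]
r2 = [{"B": 1, "C": 1}, {"B": 1, "C": 2}]
print(len(natural_join([r1, r2])))

# Five two-tuple relations on disjoint schemes A0,...,A4.
disjoint = [[{f"A{i}": 0}, {f"A{i}": 1}] for i in range(5)]
print(len(natural_join(disjoint)))
```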
 
Let $\Sigma$ be a set of JD's, and let $\sigma$ be a JD or an FD. We
assume that all the JD's are defined on U. The dependency $\sigma$ is a
consequence of $\Sigma$ (or $\sigma$ is implied by $\Sigma$) if and only
if for all relations r on U, the dependency $\sigma$ holds in r if all
the dependencies of $\Sigma$ hold in r.
 
Let $\Sigma$ be a set of dependencies. A convenient way of
representing all the MVD's with a fixed left side that are implied by
$\Sigma$ is by constructing the dependency basis. The dependency basis
of a set of attributes X is a partition of U into pairwise disjoint
subsets of attributes $X, Y_1,\ldots,Y_n$ such that

(1) $X \twoheadrightarrow Y_i$ is implied by $\Sigma$ ($1 \le i \le n$), and

(2) if $X \twoheadrightarrow Y$ is implied by $\Sigma$, then Y is a
union of some of the $Y_i$'s.

The existence of the dependency basis follows from the inference rules
for MVD's [Fag]. If $\Sigma$ contains only FD's and MVD's, then the
dependency basis can be constructed in polynomial time
[Bel, Gal, HIT, Sag].
 
A tableau [ABU,ASU] is a two-dimensional matrix in which columns
correspond to attributes. The rows of a tableau consist of variables of
the following types:

(1) distinguished variables, usually denoted by subscripted a's, and

(2) nondistinguished variables, usually denoted by subscripted b's.

A variable cannot appear in more than one column, and in each column
there is exactly one distinguished variable.

A JD $*[R_1,\ldots,R_q]$ has a corresponding tableau T as follows. For
each $R_i$, tableau T has a row $w_i$ with distinguished variables
exactly in the $R_i$-columns, and distinct nondistinguished variables in
the rest of the columns. We can also view a tableau as a relation over
the domain of distinguished and nondistinguished variables. Note that
rows $w_1,\ldots,w_q$ of T are joinable on $R_1,\ldots,R_q$, and the
resulting row consists only of distinguished variables.
 
Example 1: Consider the JD $*[AB,BCD,AD]$. The tableau
corresponding to this JD is

      A     B     C     D
      a_1   a_2   b_1   b_2
      b_3   a_2   a_3   a_4
      a_1   b_4   b_5   a_4

[]
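The correspondence between a JD and its tableau is mechanical, as the
following Python sketch suggests. The function name `tableau_for_jd` and
the encoding of variables as strings ("a1" for the distinguished
variable of the first column, "b1", "b2", ... for fresh nondistinguished
variables) are assumptions chosen for illustration.

```python
def tableau_for_jd(attrs, schemes):
    """Build the tableau of *[R_1,...,R_q] over the attribute string attrs."""
    rows = []
    fresh = 0  # counter for nondistinguished variables
    for scheme in schemes:
        row = []
        for column, attr in enumerate(attrs, start=1):
            if attr in scheme:
                row.append(f"a{column}")  # distinguished variable of this column
            else:
                fresh += 1
                row.append(f"b{fresh}")   # a new nondistinguished variable
        rows.append(tuple(row))
    return rows


# The JD *[AB, BCD, AD] of Example 1.
for row in tableau_for_jd("ABCD", ["AB", "BCD", "AD"]):
    print(row)
```

With the numbering scheme used here, the three rows coincide with the
tableau of Example 1.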
 
Let $\Sigma$ be a set of FD's and JD's. Each dependency in $\Sigma$
has an associated rule that can be applied to any tableau T as follows.

(1) FD-Rules. An FD $X \to Y$ in $\Sigma$ has an associated rule for
equating variables of T as follows. Suppose that rows $w_1$ and $w_2$ of
T agree in all the X-columns, but disagree in an A-column, where A is an
attribute of Y. If one of $w_1$ and $w_2$ has a distinguished variable
in its A-column, then rename the two rows so that $w_1$ is that row. The
FD-rule for $X \to Y$ replaces all occurrences of the variable appearing
in the A-column of $w_2$ with the variable appearing in the A-column of
$w_1$.

(2) JD-Rules. A JD $*[S_1,\ldots,S_p]$ in $\Sigma$ has an associated
rule for adding rows to T as follows. If rows $w_1,\ldots,w_p$ of T are
joinable on $S_1,\ldots,S_p$ with a result w, and w is not already in T,
then w is added to T.
 
Each one of the above rules transforms a tableau T to another
tableau T'. The rules can be applied repeatedly to a tableau T only a
finite number of times, and the result is unique (up to renaming of
nondistinguished variables) [MMS]. The chase of T under $\Sigma$,
denoted $\mathrm{chase}_\Sigma(T)$, is the tableau obtained by applying
the rules associated with $\Sigma$ to T until no rule can be applied
anymore. Let $\sigma$ be a JD with a corresponding tableau $T_\sigma$.
The JD $\sigma$ is a consequence of $\Sigma$ if and only if
$\mathrm{chase}_\Sigma(T_\sigma)$ contains a row consisting only of
distinguished variables [MMS].
 
Example 2: Let $\Sigma = \{*[AB,BCD,ABD],\ A \to B,\ C \to A\}$, and
let $\sigma$ be the JD $*[AB,BCD,AD]$ whose corresponding tableau is
given in Example 1. The FD-rule for $A \to B$ can be applied to the
first and third rows of the tableau in Example 1, and hence $b_4$ is
identified with $a_2$. The resulting tableau is

      A     B     C     D
      a_1   a_2   b_1   b_2
      b_3   a_2   a_3   a_4
      a_1   a_2   b_5   a_4

The first, second, and third rows of the above tableau are joinable on
AB, BCD, ABD with a result $(a_1,a_2,a_3,a_4)$. Thus, applying the
JD-rule for $*[AB,BCD,ABD]$ produces the tableau

      A     B     C     D
      a_1   a_2   b_1   b_2
      b_3   a_2   a_3   a_4
      a_1   a_2   b_5   a_4
      a_1   a_2   a_3   a_4

Applying the FD-rule for $C \to A$ to the second and fourth rows of the
above tableau identifies $b_3$ with $a_1$. As a result the second row
becomes identical to the fourth row, and hence it can be omitted. The
resulting tableau is

      A     B     C     D
      a_1   a_2   b_1   b_2
      a_1   a_2   b_5   a_4
      a_1   a_2   a_3   a_4

No rule for $\Sigma$ can be applied to the above tableau. []
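The chase computation of Example 2 can be replayed with a small Python
sketch. It is a naive illustration under stated assumptions, not the
procedure of [MMS]: variables are strings whose first letter ("a" or
"b") marks them as distinguished or nondistinguished, attributes are
single letters, every JD is assumed to cover all attributes, and the
JD-rule is applied by enumerating all combinations of rows, which is
exponential in the number of relation schemes. The names `fd_step`,
`jd_step`, and `chase` are hypothetical.

```python
from itertools import product


def fd_step(rows, attrs, fds):
    """Return (keep, drop) for one applicable FD-rule, or None if none applies."""
    for lhs, rhs in fds:
        for w1 in rows:
            for w2 in rows:
                if any(w1[attrs.index(x)] != w2[attrs.index(x)] for x in lhs):
                    continue  # the two rows disagree in an X-column
                for a in rhs:
                    i = attrs.index(a)
                    if w1[i] != w2[i]:
                        keep, drop = w1[i], w2[i]
                        if drop.startswith("a"):  # never rename a distinguished variable
                            keep, drop = drop, keep
                        return keep, drop
    return None


def jd_step(rows, attrs, schemes):
    """Rows produced by the JD-rule that are not yet in the tableau."""
    new_rows = set()
    for combo in product(rows, repeat=len(schemes)):
        result, joinable = {}, True
        for w, scheme in zip(combo, schemes):
            for a in scheme:
                value = w[attrs.index(a)]
                if result.setdefault(a, value) != value:
                    joinable = False
        if joinable:
            row = tuple(result[a] for a in attrs)
            if row not in rows:
                new_rows.add(row)
    return new_rows


def chase(rows, attrs, fds, jds):
    """Apply FD-rules and JD-rules until no rule applies."""
    rows = set(rows)
    while True:
        step = fd_step(rows, attrs, fds)
        if step is not None:
            keep, drop = step
            rows = {tuple(keep if v == drop else v for v in r) for r in rows}
            continue
        added = set()
        for schemes in jds:
            added |= jd_step(rows, attrs, schemes)
        if not added:
            return rows
        rows |= added


# Example 2: Sigma = {*[AB,BCD,ABD], A -> B, C -> A}, tableau of *[AB,BCD,AD].
T = [("a1", "a2", "b1", "b2"), ("b3", "a2", "a3", "a4"), ("a1", "b4", "b5", "a4")]
final = chase(T, "ABCD", fds=[("A", "B"), ("C", "A")], jds=[["AB", "BCD", "ABD"]])
print(sorted(final))
```

The sketch may apply the rules in a different order than the example
does, but since the result of the chase is unique up to renaming of
nondistinguished variables, it ends with the same three rows, one of
which consists only of distinguished variables.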
 
 
3. NP-Completeness Results Concerning Join Dependencies

3.1 Boolean Expressions and Tableaux
 
All the results use almost the same reduction from the
3-satisfiability (3-SAT) problem, shown NP-complete in [Coo]; see also
[Kar,GaJ]. Let $Q = F_1 \cdots F_m$ be a Boolean expression in
conjunctive normal form, where the $F_j$'s are clauses of three literals
each, and $x_1,x_2,\ldots,x_n$ are all the variables appearing in this
expression. We denote the variables appearing in a clause $F_j$ by
$x_{j_1}$, $x_{j_2}$, and $x_{j_3}$. We assume that $n \ge 4$ (and hence
$m > 1$), and each variable appears in at least two clauses. Note that
if $n \le 3$, then the satisfiability of Q can be decided in linear
time; and if a variable appears in only one clause, then this clause can
always be satisfied and, hence, it can be omitted. Thus, this variant of
the 3-SAT problem is NP-complete.
 
We now show how Q is used to construct two tableaux that correspond
to join dependencies. These tableaux are similar to those used in the
NP-completeness proofs given in [ASU]. Each one of them has $(m+3n+2)$
columns. The first m columns correspond to the clauses $F_1,\ldots,F_m$,
and they are labeled by the attributes $E_1,\ldots,E_m$. The next 3n
columns are divided into three blocks of n columns each. The n columns
in each block correspond to the variables $x_1,\ldots,x_n$. The columns
of the three blocks are labeled by $A_i$'s, $B_i$'s, and $C_i$'s,
respectively. The last two columns are labeled by $D_1$ and $D_2$. The
first tableau, denoted by S, represents the m clauses. For each clause
$F_j$ containing the variables $x_{j_1}$, $x_{j_2}$, and $x_{j_3}$,
tableau S has a row $s_j$ as follows. Row $s_j$ has distinguished
variables in the columns for $E_j$, $A_{j_1}$, $A_{j_2}$, $A_{j_3}$, and
$D_1$. All the other columns have distinct nondistinguished variables.
The tableau S has one more row, denoted by $s_{m+1}$, with distinguished
variables in all the E, B, C, and $D_2$ columns (the rest of the columns
have distinct nondistinguished variables). Let $S_j$ be the relation
scheme corresponding to row $s_j$ of S ($1 \le j \le m+1$). That is,
$S_j$ contains all the attributes labeling columns in which $s_j$ has
distinguished variables. Thus, the tableau S corresponds to the JD
$*[S_1,\ldots,S_{m+1}]$.
 
The second tableau, denoted by T, represents truth assignments
under which clauses of Q are true. For every $F_j$ ($1 \le j \le m$),
tableau T has seven rows that represent all the truth assignments under
which $F_j$ is true. If $\xi$ is a truth assignment under which $F_j$ is
true, then T contains a row w as follows. For $1 \le i \le 3$, if
$x_{j_i}$ is assigned 1 under $\xi$, row w has a distinguished variable
in the $B_{j_i}$-column; otherwise, w has a distinguished variable in
the $C_{j_i}$-column. Row w has distinguished variables also in the
$E_j$-column and in the $D_1$-column. The tableau T has two additional
rows, denoted by u and v. Row u has distinguished variables in all the
E, B, C, and $D_2$ columns (exactly as row $s_{m+1}$ of S). Row v has
distinguished variables in all the A and D columns. All the other
columns of rows of T contain distinct nondistinguished variables.
 
Example 3: Consider the Boolean expression

$(x_1 + \bar{x}_2 + x_3)(x_1 + \bar{x}_3 + x_4)(\bar{x}_1 + x_2 + \bar{x}_4)$.

By a slight abuse of notation, we denote the distinguished variable in
each column by an a (without a subscript). The dots stand for distinct
nondistinguished variables. The tableau S is
 

       E1 E2 E3   A1 A2 A3 A4   B1 B2 B3 B4   C1 C2 C3 C4   D1 D2
  s1   a  .  .    a  a  a  .    .  .  .  .    .  .  .  .    a  .
  s2   .  a  .    a  .  a  a    .  .  .  .    .  .  .  .    a  .
  s3   .  .  a    a  a  .  a    .  .  .  .    .  .  .  .    a  .
  s4   a  a  a    .  .  .  .    a  a  a  a    a  a  a  a    .  a

The tableau T is given in Figure 1. []
 
Let $\Sigma$ be a set of dependencies that consists of the JD
$*[S_1,\ldots,S_{m+1}]$, and the FD's $B_iD_1 \to A_i$,
$C_iD_1 \to A_i$, and $D_1D_2 \to A_i$ (for $1 \le i \le n$); and let
$\sigma$ be the JD corresponding to the tableau T.
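The construction of S, T, and $\Sigma$ from Q is purely syntactic, which
the following Python sketch tries to make concrete. Its input convention
is an assumption of the sketch: clauses are triples of nonzero integers
in the style of the DIMACS format (i for $x_i$, -i for its negation),
with three distinct variables per clause. Rows are represented only by
the set of attributes in which they hold distinguished variables, and
the function name `build_reduction` is hypothetical.

```python
from itertools import product


def build_reduction(clauses, n):
    """Return (S, T, fds) for a 3-CNF formula over variables x_1..x_n."""
    m = len(clauses)
    E = [f"E{j}" for j in range(1, m + 1)]
    A = [f"A{i}" for i in range(1, n + 1)]
    B = [f"B{i}" for i in range(1, n + 1)]
    C = [f"C{i}" for i in range(1, n + 1)]

    # Relation schemes S_1,...,S_{m+1} of the single JD in Sigma.
    S = [{E[j], "D1"} | {A[abs(lit) - 1] for lit in clause}
         for j, clause in enumerate(clauses)]
    S.append(set(E) | set(B) | set(C) | {"D2"})

    # Rows of T: seven satisfying assignments per clause, then u and v.
    T = []
    for j, clause in enumerate(clauses):
        for values in product([False, True], repeat=3):
            # A literal is true when its sign matches the assigned value.
            if any(val == (lit > 0) for lit, val in zip(clause, values)):
                row = {E[j], "D1"}
                for lit, val in zip(clause, values):
                    i = abs(lit) - 1
                    row.add(B[i] if val else C[i])
                T.append(row)
    T.append(set(E) | set(B) | set(C) | {"D2"})  # row u
    T.append(set(A) | {"D1", "D2"})              # row v

    # FD's B_i D_1 -> A_i, C_i D_1 -> A_i, D_1 D_2 -> A_i.
    fds = []
    for i in range(n):
        fds += [({B[i], "D1"}, A[i]), ({C[i], "D1"}, A[i]), ({"D1", "D2"}, A[i])]
    return S, T, fds


# An illustrative formula with m = 3 clauses over n = 4 variables.
S, T, fds = build_reduction([(1, -2, 3), (1, -3, 4), (-1, 2, -4)], 4)
print(len(S), len(T), len(fds))
```

The sizes are polynomial in the formula: m+1 relation schemes, 7m+2 rows
in T, and 3n FD's over m+3n+2 attributes.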
 
We will show that $\sigma$ is a consequence of $\Sigma$ if and only
if Q is satisfiable. The proof is an analysis of the computation of
$\mathrm{chase}_\Sigma(T)$. Since the rules associated with $\Sigma$ can
be applied to T in any order, we start by applying the FD-rules. The
FD-rules for $D_1D_2 \to A_i$ ($1 \le i \le n$) cannot be applied, since
no two rows of T agree in the columns for $D_1$ and $D_2$. The
application of the other FD-rules modifies only the A-columns of T. Note
that rows u and v of T are not affected by this modification. After all
possible applications of FD-rules to T, each $A_i$-column is going to
have exactly two repeated (1) nondistinguished variables, say $b_i$ and
$\bar{b}_i$ ($1 \le i \le n$). The variable $b_i$ results from the
application of the FD-rule for $B_iD_1 \to A_i$, and can be viewed as
representing the truth value 1. The variable $\bar{b}_i$ results from
the application of the FD-rule for $C_iD_1 \to A_i$, and can be viewed
as representing the truth value 0. A row w of T representing a truth
assignment for a clause $F_j$ (with variables $x_{j_1}$, $x_{j_2}$, and
$x_{j_3}$) is going to have $b_{j_i}$ in the $A_{j_i}$-column, if
$x_{j_i}$ is true; otherwise, it is going to have $\bar{b}_{j_i}$ in
this column ($1 \le i \le 3$).

(1) A variable is repeated if it appears in more than one row.
 
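
       E1 E2 E3   A1 A2 A3 A4   B1 B2 B3 B4   C1 C2 C3 C4   D1 D2
       a  .  .    .  .  .  .    .  .  .  .    a  a  a  .    a  .
       a  .  .    .  .  .  .    .  .  a  .    a  a  .  .    a  .
       a  .  .    .  .  .  .    .  a  a  .    a  .  .  .    a  .
       a  .  .    .  .  .  .    a  .  .  .    .  a  a  .    a  .
       a  .  .    .  .  .  .    a  .  a  .    .  a  .  .    a  .
       a  .  .    .  .  .  .    a  a  .  .    .  .  a  .    a  .
       a  .  .    .  .  .  .    a  a  a  .    .  .  .  .    a  .
       .  a  .    .  .  .  .    .  .  .  .    a  .  a  a    a  .
       .  a  .    .  .  .  .    .  .  .  a    a  .  a  .    a  .
       .  a  .    .  .  .  .    .  .  a  a    a  .  .  .    a  .
       .  a  .    .  .  .  .    a  .  .  .    .  .  a  a    a  .
       .  a  .    .  .  .  .    a  .  .  a    .  .  a  .    a  .
       .  a  .    .  .  .  .    a  .  a  .    .  .  .  a    a  .
       .  a  .    .  .  .  .    a  .  a  a    .  .  .  .    a  .
       .  .  a    .  .  .  .    .  .  .  .    a  a  .  a    a  .
       .  .  a    .  .  .  .    .  .  .  a    a  a  .  .    a  .
       .  .  a    .  .  .  .    .  a  .  .    a  .  .  a    a  .
       .  .  a    .  .  .  .    .  a  .  a    a  .  .  .    a  .
       .  .  a    .  .  .  .    a  .  .  .    .  a  .  a    a  .
       .  .  a    .  .  .  .    a  a  .  .    .  .  .  a    a  .
       .  .  a    .  .  .  .    a  a  .  a    .  .  .  .    a  .
  u    a  a  a    .  .  .  .    a  a  a  a    a  a  a  a    .  a
  v    .  .  .    a  a  a  a    .  .  .  .    .  .  .  .    a  a

Figure 1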
 
Thus, the truth assignment represented by w is now given in the
A-columns of w. It is easy to show that no further applications of
FD-rules are possible. Let T' be the tableau obtained by applying the
FD-rules to T.
 
Lemma 1: Suppose that rows $w_1,\ldots,w_{m+1}$ of T' are joinable on
$S_1,\ldots,S_{m+1}$ with a result w, and w is not in T'. Then $w_{m+1}$
is u, and for all $1 \le j \le m$, row $w_j$ is a row of T' representing
a truth assignment for $F_j$.

Proof: If all the $w_j$'s are identical, then w is the same row as the
$w_j$'s and, hence, it is in T'. Therefore, it suffices to show that if
either $w_{m+1}$ is not u or some $w_j$ ($j \ne m+1$) is not a row
representing a truth assignment for $F_j$, then all the $w_j$'s are
identical.

Claim 1: If row $w_{m+1}$ or row $w_i$ (for some $1 \le i \le m$) has in
the $E_i$-column a nondistinguished variable that appears nowhere else
in T', then $w_i$ and $w_{m+1}$ are identical.

Claim 1 follows from the fact that for all $1 \le i \le m$, rows $w_i$
and $w_{m+1}$ agree in the $E_i$-column, because both $S_i$ and
$S_{m+1}$ contain $E_i$.

Claim 2: If some $w_j$ ($j \ne m+1$) is u, then for all
$1 \le i \le m$, row $w_i$ is u.

Claim 2 follows from the fact that for all $1 \le i \le m$, the relation
scheme $S_i$ contains the attribute $D_1$, and u has in the
$D_1$-column a nondistinguished variable appearing nowhere else in T'.

Suppose that $w_{m+1}$ is v. But v has in each $E_i$-column a distinct
nondistinguished variable appearing nowhere else in T', and so by Claim
1, every $w_i$ is v. So suppose that $w_{m+1}$ is a row representing a
truth assignment for some $F_k$. Therefore, row $w_{m+1}$ has a
distinguished variable in the $E_k$-column, and in all the other
E-columns it has distinct nondistinguished variables appearing nowhere
else in T'. By Claim 1, for all $i \ne k$, rows $w_i$ and $w_{m+1}$ are
identical. Row $w_k$ must have a distinguished variable in the
$E_k$-column, since $w_{m+1}$ has a distinguished variable in this
column and both $S_k$ and $S_{m+1}$ contain $E_k$. By Claim 2, row $w_k$
cannot be u, because there is a row $w_j$ ($j \ne m+1$) that is not u
(since $m > 1$). Thus, all the $w_i$'s ($i \ne k$) are equal to a row of
T' representing a truth assignment for $F_k$, and $w_k$ is also a row
representing a truth assignment for $F_k$. But every variable $x_i$
appears in more than one clause and, hence, the pattern of the
distinguished variables in the A-columns of tableau S implies that $w_k$
represents the same truth assignment as all the other $w_i$'s. That is,
all the $w_i$'s are identical.

So far we have shown that if $w_{m+1}$ is not u, then all the $w_i$'s
are identical. Now suppose that some $w_j$ is not a row representing a
truth assignment for $F_j$. If $w_j$ is u, then Claim 2 implies that for
all $1 \le i \le m$, row $w_i$ is u. But $w_{m+1}$ is also u, and so all
the $w_i$'s are identical. If $w_j$ is either v or a row representing a
truth assignment for some $F_k$ ($j \ne k$), then $w_j$ has in the
$E_j$-column a nondistinguished variable appearing nowhere else in T',
and so by Claim 1, rows $w_j$ and $w_{m+1}$ are identical. But
$w_{m+1}$ is u, and so Claim 2 implies that all the $w_i$'s are
identical. []
 
Corollary 2: There are rows $w_1,\ldots,w_{m+1}$ of T' that are
joinable on $S_1,\ldots,S_{m+1}$ with a result w not in T' if and only
if Q is satisfiable.

Proof: Only if. By Lemma 1, row $w_j$ ($1 \le j \le m$) represents the
following truth assignment for $F_j$. If $x_{j_i}$ is a variable of
$F_j$, and the $A_{j_i}$-column of $w_j$ has the repeated
nondistinguished variable $b_{j_i}$, then $x_{j_i}$ is assigned 1. If
the $A_{j_i}$-column of $w_j$ has the repeated nondistinguished variable
$\bar{b}_{j_i}$, then $x_{j_i}$ is assigned 0. Under this truth
assignment $F_j$ is true. But the pattern of the distinguished variables
in the A-columns of tableau S implies that in this case there is a truth
assignment $\psi$ for all the variables $x_1,\ldots,x_n$ such that for
all $1 \le j \le m$, the truth assignment $\psi$ agrees with the truth
assignment represented by $w_j$ on the variables of $F_j$. Hence, each
$F_j$ is true under $\psi$, and Q is satisfiable.

If. Suppose that $\psi$ is a truth assignment that satisfies Q. For all
$1 \le j \le m$, let $w_j$ be the row of T' representing the truth
assignment for $F_j$ that agrees with $\psi$ on the variables of $F_j$;
and let $w_{m+1}$ be row u of T'. It is easy to show that the rows
$w_1,\ldots,w_{m+1}$ are joinable on $S_1,\ldots,S_{m+1}$ with a result
w not in T'. []
 
Lemma 3: The JD $\sigma$ (corresponding to T) is a consequence of
$\Sigma$ if and only if Q is satisfiable.

Proof: Only if. By Corollary 2, if Q is not satisfiable, then the only
JD-rule for $\Sigma$ cannot be applied to T'. Therefore,
$\mathrm{chase}_\Sigma(T)$ is the result of applying the FD-rules to T,
i.e., $\mathrm{chase}_\Sigma(T) = T'$. This chase does not contain a row
with only distinguished variables and, hence, $\sigma$ is not a
consequence of $\Sigma$.

If. Suppose that Q is satisfiable. By Lemma 1 and Corollary 2, an
application of the JD-rule for $\Sigma$ to T' adds a row w that has
distinguished variables in all the E, B, C, and D columns. We can apply
the FD-rules for $D_1D_2 \to A_i$ ($1 \le i \le n$) to w and v (the last
row of T'), and the result is a row with only distinguished variables.
Thus, $\sigma$ is a consequence of $\Sigma$. []
 
 
3.2 NP-Completeness Results Concerning Applications of JD-Rules and
Testing Whether Relations Obey Join Dependencies
 
 Theorem 4 : The problem of deciding whether a JD-rule can be applied 
 to a tableau U is NP-complete. This problem is NP-complete even if U 
 can be obtained from a tableau corresponding to a JD by applying some 
 FD-rules. 
 
Proof: At first we will show that the problem is in NP. Suppose we
have to decide whether the JD-rule for a JD $*[R_1,\ldots,R_q]$ can be
applied to a tableau U. We nondeterministically choose q rows
$w_1,\ldots,w_q$ of U, and check in polynomial time whether they are
joinable on $R_1,\ldots,R_q$ with a result w not in U.
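The polynomial-time check in this membership argument can be sketched in
Python as follows. The guess of the q rows is modeled as a list of row
indices passed in by the caller; the name `verify_certificate`, the
tuple-based tableau encoding, and the assumption that the relation
schemes together cover all attributes are choices of this sketch.

```python
def verify_certificate(tableau, attrs, schemes, chosen):
    """Check a guessed choice of rows for applicability of the JD-rule.

    tableau: list of rows (tuples of variable names), attrs: attribute string,
    schemes: R_1..R_q as strings, chosen: indices of the guessed rows w_1..w_q.
    Returns True iff the chosen rows are joinable with a result not in tableau.
    """
    result = {}
    for index, scheme in zip(chosen, schemes):
        row = tableau[index]
        for a in scheme:
            value = row[attrs.index(a)]
            if result.setdefault(a, value) != value:
                return False  # the guessed rows are not joinable
    w = tuple(result[a] for a in attrs)
    return w not in tableau   # the rule applies only if w is a new row


# Tableau of Example 2 after the FD-rule for A -> B, with the JD *[AB,BCD,ABD].
U = [("a1", "a2", "b1", "b2"), ("b3", "a2", "a3", "a4"), ("a1", "a2", "b5", "a4")]
print(verify_certificate(U, "ABCD", ["AB", "BCD", "ABD"], [0, 1, 2]))
```

Each check touches every attribute of every chosen row once, so it is
linear in the size of the certificate; the hardness lies entirely in
finding a good choice among the $|U|^q$ candidates.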
 
To show that the problem is complete in NP, the 3-SAT problem can
be reduced to this problem as described in Section 3.1. That is, given
an instance Q of the 3-SAT problem, we construct the JD
$*[S_1,\ldots,S_{m+1}]$ and the tableau T. By applying some FD-rules to
T, we obtain the tableau T'. By Corollary 2, the JD-rule for
$*[S_1,\ldots,S_{m+1}]$ can be applied to T' if and only if Q is
satisfiable. []
 
Corollary 5: It is NP-complete to decide whether a JD
$*[R_1,\ldots,R_q]$ does not hold in a relation r.

Proof: The problem is in NP, since we can nondeterministically find
q tuples of r that are joinable on $R_1,\ldots,R_q$ with a result that
is not a tuple of r.

To show that the problem is complete in NP, we can view the tableau
T' as a relation r (by replacing each variable with a distinct
constant). By Corollary 2, the JD $*[S_1,\ldots,S_{m+1}]$ does not hold
in r if and only if Q is satisfiable. []
 
3.3 An NP-Completeness Result for Inferring Join Dependencies
 
Theorem 6: Let $\Gamma$ be a set of FD's and one JD, and let $\gamma$
be another JD. The problem of deciding whether $\gamma$ is a consequence
of $\Gamma$ is NP-complete.
 
Proof: Let $*[R_1,\ldots,R_q]$ be the only JD in $\Gamma$. At first we
show that the problem is in NP. Let U be a tableau and suppose that
$\mathrm{chase}_\Gamma(U)$ can be obtained from U by using only the
JD-rule for $\Gamma$. The following claim shows that any row of
$\mathrm{chase}_\Gamma(U)$ (that is not in U) can be obtained by a
single application of the JD-rule for $\Gamma$ to some rows of U.

Claim 1: If a tableau U' is obtained by repeatedly applying the
JD-rule for $\Gamma$ to a tableau U, then any row of U' is the result of
joining some rows of U on $R_1,\ldots,R_q$.

In order to prove this claim, suppose that the JD-rule for $\Gamma$ is
applied only to the original rows of U until it cannot be applied
anymore. Let the resulting tableau be $\hat{U}$. It suffices to show
that the JD-rule for $\Gamma$ cannot be applied to $\hat{U}$. So suppose
that the JD-rule can be applied to $\hat{U}$. That is, there are rows
$w_1,\ldots,w_q$ of $\hat{U}$ that are joinable on $R_1,\ldots,R_q$ with
a result w not in $\hat{U}$. If some $w_i$ is not in U, then there are
rows $v_1,\ldots,v_q$ in U that are joinable on $R_1,\ldots,R_q$ with a
result $w_i$. But $w_i$ and $v_i$ agree on $R_i$ and, hence, $w_i$ can
be replaced with $v_i$. That is,
$w_1,\ldots,w_{i-1},v_i,w_{i+1},\ldots,w_q$ are joinable on
$R_1,\ldots,R_q$ with a result w. It follows that every $w_i$ that is
not in U can be replaced with some row in U, and the resulting rows are
joinable on $R_1,\ldots,R_q$ with a result w. Therefore, w is also in
$\hat{U}$, a contradiction.

Now suppose that no FD-rule for $\Gamma$ can be applied to a tableau U,
but some FD-rules for $\Gamma$ can be applied to a tableau U', where U'
is obtained from U by applying the JD-rule for $\Gamma$ several times.
That is, there are rows v and w of U' such that some FD-rule for
$\Gamma$ can be applied to v and w. By Claim 1, rows v and w can be
generated by applying the JD-rule to rows of U (unless they are already
in U). By using a nondeterministic algorithm, rows v and w can be
obtained in polynomial time in no more than two applications of the
JD-rule for $\Gamma$. Therefore, in order to generate any row of
$\mathrm{chase}_\Gamma(U)$, we can always find a sequence of
applications of the rules for $\Gamma$ in which the JD-rule is never
used more than twice in a row. Let n be the number of distinct variables
in U. Each application of an FD-rule reduces the number of distinct
variables by one. Thus, the FD-rules can be applied to U no more than n
times. Since each application of an FD-rule is preceded by no more than
two applications of the JD-rule for $\Gamma$, we can generate any row of
$\mathrm{chase}_\Gamma(U)$ in O(n) applications of the rules for
$\Gamma$. In particular, we can use a nondeterministic algorithm to
generate the row consisting only of distinguished variables (if this row
is indeed in $\mathrm{chase}_\Gamma(U)$) in O(n) applications of the
rules for $\Gamma$.
 

The following is a nondeterministic polynomial time algorithm that
returns "Yes" if $\gamma$ is a consequence of $\Gamma$. The tableau for
$\gamma$ is denoted by V.

(1) Nondeterministically create two rows $v_1$ and $v_2$ such that each
$v_i$ is either a row of V or can be obtained by joining some rows of V
on $R_1,\ldots,R_q$.

(2) If either $v_1$ or $v_2$ consists only of distinguished variables,
then return "Yes".

(3) Add $v_1$ and $v_2$ to V (if they are not already there).

(4) Apply the FD-rules to V until no FD-rule can be applied. If at
least one FD-rule has been applied, then go to (1).

Steps (1)-(3) require nondeterministic linear time. Step (4)
requires (deterministic) polynomial time [ABU]. Each application of an
FD-rule reduces the number of distinct variables in V by one, and Step
(1) is repeated only after an application of some FD-rule. Therefore,
no more than O(n) rows are added to V, and the algorithm has a
nondeterministic polynomial running time.
 
It remains to be shown that the problem is NP-hard. The 3-SAT
problem can be reduced to this problem as described in Section 3.1, and
the NP-completeness follows from Lemma 3. []
 
 
3.4 An NP-Hard Result for Computing the Join of Several Relations
 
In this section we show that computing the join of several relations
is a hard problem (even if the relations come from a universal
instance). We assume familiarity with the definition of the join
operator, and the correspondence between tableaux and relational
expressions (cf. [ASU]). It should be noted that a similar result is
stated in [HLY]. However, our result is stronger, since we assume that
the relations are obtained by projection from a universal instance.
 
Theorem 7: Let E be a relational expression with the join as the
only operator, let I be a universal instance, and let r be a relation.
The problem of deciding whether $E(I) \ne r$ is NP-hard. (E(I) is the
value of the expression E for the instance I.)
 
Proof: We can view the tableau T' of Section 3.1 as a universal
instance, and the tableau S as representing the relational expression
$\mathop{*}_{i=1}^{m+1} S_i$. Thus the 3-SAT problem can be reduced to
this problem in the following way. Given a Boolean expression Q, we
construct the relational expression $\mathop{*}_{i=1}^{m+1} S_i$
corresponding to the tableau S of Section 3.1. The instance I is
obtained from the tableau T' by replacing each variable of T' with a
distinct element from the domain of the corresponding attribute. The
relation r is the same as the instance I. By Corollary 2, the relation r
is not the value of $\mathop{*}_{i=1}^{m+1} S_i$ for I if and only if Q
is satisfiable. []
 
 
 References 
 
 [ABU] Aho, A. V., C. Beeri, and J. D. Ullman, "The Theory of Joins in 
 Relational Databases/' ACM Trans * on Database Systems . Vol. 4, No. 
 3 (Sept., 1979), pp. 297-314. 
 
 [ASU] Aho, A. V., Y. Sagiv, and J. D. Ullman, "Equivalences Among Rela- 
 tional Expressions," SI AM J. Computing . Vol. 8, No. 2 (May, 1979), 
 pp. 218-246. 
 
 [Arm] Armstrong, W. W., "Dependency Structures of Database Relation- 
 ships," Proc. IFIP 74. North Holland, 1974, pp. 580-583. 
 
 [Bel] Beeri, C, "On the Membership Problem for Multivalued Dependencies 
 in Relational Databases," to appear in ACM Trans , on Database 
 Systems . 
 
 [BeB] Beeri, C, and P. A. Bernstein, "Computational Problems Related to 
 the Design of Normal Form Relational Schemas," ACM Trans , on 
 Database Systems . Vol. 4, No. 1 (March, 1979), pp. 30-59. 
 
 [BFH] Beeri, C., R. Fagin, and J. H. Howard, "A Complete Axiomatization
 for Functional and Multivalued Dependencies in Database
 Relations," Proc. ACM-SIGMOD Int. Conf. on Management of Data,
 Toronto, Aug., 1977, pp. 47-61.
 
 [Cod] Codd, E. F., "A Relational Model of Data for Large Shared Data
 Banks," Comm. ACM, Vol. 13, No. 6 (June, 1970), pp. 377-387.
 
 [Coo] Cook, S. A., "The Complexity of Theorem Proving Procedures," Proc.
 3rd Annual ACM Symp. on Theory of Computing, May, 1971, pp.
 151-158.
 
 [Fag] Fagin, R., "Multivalued Dependencies and a New Normal Form for
 Relational Databases," ACM Trans. on Database Systems, Vol. 2, No.
 3 (Sept., 1977), pp. 262-278.
 
 [Gal] Galil, Z., "An Almost Linear Time Algorithm for Computing the 
 Dependency Basis in a Relational Data Base," Res. Rept., Dept. of
 Mathematical Sciences, Computer Science Division, Tel Aviv Univer- 
 sity, Tel Aviv, Israel. 
 
 [GaJ] Garey, M. R., and D. S. Johnson, Computers and Intractability: A
 Guide to the Theory of NP-Completeness, Freeman, San Francisco,
 1979.
 
 [HIT] Hagihara, K., M. Ito, K. Taniguchi, and T. Kasami, "Decision
 Problems for Multivalued Dependencies in Relational Databases," SIAM
 J. Computing, Vol. 8, No. 2 (May, 1979), pp. 247-264.
 
 [HLY] Honeyman, P., R. E. Ladner, and M. Yannakakis, "Testing the
 Universal Instance Assumption," to appear.
 
 [Kar] Karp, R. M., "Reducibility Among Combinatorial Problems," in
 Complexity of Computer Computations (R. E. Miller and J. W.
 Thatcher, eds.), Plenum Press, New York, 1972, pp. 85-104.
 
 [MMS] Maier, D., A. O. Mendelzon, and Y. Sagiv, "Testing Equivalence of
 Data Dependencies," to appear in ACM Trans. on Database Systems.
 
 [Ris] Rissanen, J., "Theory of Relations for Databases - A Tutorial
 Survey," Proc. 7th Symp. on Mathematical Foundations of Computer
 Science, Lecture Notes in Computer Science 64, Springer-Verlag,
 1978, pp. 536-551.
 
 [Sag] Sagiv, Y., "An Algorithm for Inferring Multivalued Dependencies
 With an Application to Propositional Logic," to appear in JACM.
 
 [Yan] Yannakakis, M., private communication.
 
 [Zan] Zaniolo, C., "Analysis and Design of Relational Schemata for
 Database Systems," Tech. Rep. UCLA-ENG-7769, Dept. of Comp. Sci.,
 UCLA, July, 1976.
 