Digitized by the Internet Archive in 2013
http://archive.org/details/quadraticalgorit992sagi

UIUCDCS-R-79-992
UILU-ENG 79 1740

Quadratic Algorithms for Minimizing Joins in Restricted Relational Expressions

by

Yehoshua Sagiv

October 1979

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation under grant MCS-76-15255.

ABSTRACT

An important step in the optimization of queries in relational databases is minimizing the number of join operations used in the evaluation of a query. It is shown that three subclasses of tableaux (including the subclass of simple tableaux) have $O(n^2)$ time equivalence and minimization algorithms. Since tableaux are nonprocedural representations of relational expressions over select, project and join, these minimization algorithms can be used to minimize the number of join operators in expressions whose tableaux belong to one of these subclasses.

CR categories: 4.33, 5.25

Key words and phrases: relational database, relational algebra, query optimization, equivalence of queries, conjunctive query, tableau, NP-complete.

1. Introduction

The relational model for databases features two high-level query languages: the relational algebra and the relational calculus [9,10].
The relational algebra is a procedural language that uses operators defined on relations, and hence a query is usually translated to a relational expression before being evaluated. However, the efficiency with which a query can be answered depends on the relational expression that has been chosen to represent this query. Consequently, a number of papers (e.g., [12,13,14,15,17,18]) have considered transformations that reduce the cost of evaluating a query. However, these transformations do not necessarily produce an equivalent query of least cost. Chandra and Merlin [8] show how to perform global optimization on a large class of queries, but their algorithm is exponential in the size of the query.

The most commonly used operators of the relational algebra are select, project and join, and a polynomial time algorithm for optimizing a subclass of expressions with these operators is given in [4,5]. This optimization technique uses tableaux [3] as a nonprocedural representation of queries. Tableaux are similar to the conjunctive queries of [8], and resemble Zloof's "Query-by-Example" language [20].

Relational expressions over select, project, and join can be represented by tableaux [3]. In [8] it is shown that every tableau has an equivalent minimal tableau. The importance of this result follows from the fact that an expression with a minimum number of joins corresponds to a minimal tableau. A polynomial minimization algorithm for the class of simple tableaux is given in [4] (although the problem is NP-complete in the general case [3]). In [5] it is shown how to obtain in polynomial time an optimal expression from a minimal simple tableau (if such an expression exists).

This optimization technique is machine independent. It is capable of minimizing the number of join operators (note that join is the most expensive operator to implement), eliminating redundant subexpressions, and applying select and project as early as possible.
In a relational database system, this type of optimization should be augmented with machine dependent optimization that takes into consideration the size of the relations involved, sorted columns, etc.

In this paper we describe new minimization algorithms for two subclasses of tableaux. We also show how to improve the running time of the minimization algorithm for the class of simple tableaux. All these algorithms have an $O(n^2)$ running time. It is shown that each one of the three subclasses of tableaux discussed in this paper also has an $O(n^2)$ time equivalence algorithm. Finally, in Sections 7 and 8 we touch upon the problems of minimizing tableaux in the general case, and obtaining optimal expressions from minimal tableaux.

2. Basic Definitions

2.1 The Relational Model

The relational model for databases [9] assumes that the data is stored in tables called relations. The columns of a table correspond to attributes, and the rows to records or tuples. Each attribute has an associated domain of values. A tuple is viewed as a mapping from the attributes to their domains, since in this way no canonical ordering of the attributes is needed. If r is a relation with a column corresponding to the attribute A, and $\mu$ is a tuple in r, then $\mu(A)$ is the value of the A-component of $\mu$. In this paper we usually denote a relation as a set of tuples.

A relation scheme is a set of attributes labeling the columns of a table, and it is usually written as a string of attributes. We often use the relation scheme itself as the name of the table. A relation is just the "current value" of a relation scheme. The relation is said to be defined on the set of attributes of the relation scheme.

2.2 The Relational Algebra and Relational Expressions

The relational algebra [9,10] is a set of operators defined on relations. In this paper we consider the operators select, project, and join.
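The three operators just named can be illustrated with a minimal Python sketch. It assumes the usual textbook semantics of select, project and natural join, and it models a tuple as a mapping from attributes to values, as in Section 2.1. The helper names (`relation`, `select`, `project`, `join`) and the sample data are hypothetical and are not taken from the report.

```python
def relation(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def select(r, attr, const):
    """Selection attr = const: keep the tuples whose attr-component equals const."""
    return {t for t in r if dict(t)[attr] == const}

def project(r, attrs):
    """Projection onto attrs: restrict every tuple to the attributes in attrs."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def join(r1, r2):
    """Natural join: merge every pair of tuples that agree on their shared attributes."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Hypothetical sample data over the relation schemes AB and BC.
ab = relation({"A": 1, "B": 2}, {"A": 2, "B": 2}, {"A": 3, "B": 5})
bc = relation({"B": 2, "C": 7})

joined = join(ab, bc)                                   # tuples over ABC
answer = project(select(joined, "A", 1), {"A", "C"})    # select A=1, then project onto AC
```

Because tuples are stored as sets of attribute-value pairs, no ordering of the attributes is needed, which mirrors the remark in Section 2.1.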
Let r be a relation defined on a set of attributes X, A an attribute in X and c a value from the domain of A. The selection A=c, written $\sigma_{A=c}(r)$, is

$$\sigma_{A=c}(r) = \{\mu \mid \mu \text{ is a tuple in } r \text{ and } \mu(A) = c\}$$

Let Y be a subset of X. The projection of r onto Y, written $\pi_Y(r)$, is

$$\pi_Y(r) = \{\nu \mid \nu \text{ is a mapping on } Y \text{ and some tuple } \mu \text{ in } r \text{ agrees with } \nu \text{ on } Y\}$$

Let $r_1$ and $r_2$ be relations defined on the relation schemes $R_1$ and $R_2$, respectively. The (natural) join of $r_1$ and $r_2$, written $r_1 \bowtie r_2$, is

$$r_1 \bowtie r_2 = \{\mu \mid \mu \text{ is a tuple on } R_1 \cup R_2 \text{ that agrees with some tuple of } r_1 \text{ on } R_1 \text{ and with some tuple of } r_2 \text{ on } R_2\}$$

[...]

Conventionally, we also regard $\emptyset$ as a tableau. This tableau represents the function that maps every instance to the empty relation.

Example 1: Consider the following tableau.

$$T = \begin{array}{ccc} A & B & C \\ \hline a_1 & & a_3 \\ \hline a_1 & b_2 & b_3 \\ b_1 & b_2 & a_3 \\ 1 & b_2 & b_4 \end{array}$$

The summary is shown first, with a line below it, and integers are used as constants.

Tableau T defines a relation on the relation scheme AC. For example, suppose that I is the instance {211, 121, 122}. Consider the valuation $\rho$ which assigns 2 to $b_2$ and 1 to every other variable in T. Under this valuation, each row of T becomes 121, which is a member of I. Therefore, $\rho(a_1 a_3) = 11$ is in T(I). If $\rho$ assigns 1 to $a_1$, $b_1$, $b_3$ and $b_4$, and 2 to $b_2$ and $a_3$, the first and third rows become 121 and the second row becomes 122; both are members of I, and so 12 is in T(I). Since no valuation for T produces a tuple other than 11 or 12, T(I) = {11, 12}. []

3.1 Relational Expressions and Tableaux

Every restricted relational expression E has a corresponding tableau T that defines the same mapping as E (however, the converse is not true) [3]. The tableau T for the expression E can be constructed bottom up by applying the following rules [3].

(A) If there are no operators in E, then E is a single relation scheme R. The tableau T for E has one row and a summary such that:

(i) If A is an attribute in R, then in the column for A, the tableau T has the same distinguished variable in the summary and row.
(ii) If A is not in R, then its column has a blank in the summary and a nondistinguished variable in the row.

(B1) Suppose E is of the form $\sigma_{A=c}(E_1)$, and we have constructed $T_1$, the tableau for $E_1$.

(i) If the summary of $T_1$ has a blank in the column for A, then the expression E has no meaning, and the tableau T for E is undefined.

(ii) If $T_1$ has a constant $c' \ne c$ in the summary column for A, then for any instance I, $v_I(E_1)$ has only tuples with $c'$ in the component for A, and $v_I(E)$ is $\emptyset$. Hence, the tableau for E is $\emptyset$.

(iii) If $T_1$ has c in the summary column for A, then the tableau for E is $T_1$.

(iv) If $T_1$ has a distinguished variable a in the summary column for A, the tableau T for E is constructed by replacing a by c wherever it appears in $T_1$.

(B2) Suppose E is of the form $\pi_X(E_1)$ and $T_1$ is the tableau for $E_1$. Construct T for E by replacing nonblank symbols by blanks in the summary of $T_1$ for those columns whose attributes are not in X. Distinguished variables in those columns become nondistinguished.

(B3) Suppose E is of the form $E_1 \bowtie E_2$, and $T_1$ and $T_2$ are the tableaux for $E_1$ and $E_2$, respectively.

(i) If $T_1$ and $T_2$ have some column in which their summaries have distinct constants, then it is easy to show that for all instances I, $v_I(E) = \emptyset$, so $\emptyset$ is the tableau for E.

(ii) If no corresponding positions in the summaries have distinct constants, let $S_1$ and $S_2$ be the sets of symbols of $T_1$ and $T_2$, respectively. We may take $S_1$ and $S_2$ to have disjoint sets of nondistinguished variables, but identical distinguished variables in corresponding columns. Construct T for E to have the union of all the rows of $T_1$ and $T_2$. The summary of T has in a given column:

(a) The constant c if one or both of $T_1$ and $T_2$ have c in that column's summary. In this case replace any distinguished variable in that column by c.

(b) The distinguished symbol a if (a) does not apply, but one or both of $T_1$ and $T_2$ have a in that column's summary.

(c) Blank, otherwise.
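Rules (A) through (B3) can be read as a small program. The following Python sketch is one possible encoding, under assumptions that are not in the report: a tableau is a pair (summary, rows) of dictionaries keyed by column name, symbols are tagged tuples (`("a", column)` for the distinguished variable of a column, `("b", k)` for a nondistinguished variable, `("c", value)` for a constant), a blank is `None`, and `EMPTY` stands for the tableau $\emptyset$. All function names are hypothetical.

```python
import itertools

_counter = itertools.count(1)
EMPTY = None  # stands for the tableau that maps every instance to the empty relation

def fresh():
    """Return a new nondistinguished variable; the global counter keeps them disjoint."""
    return ("b", next(_counter))

def scheme(attrs, universe):
    """Rule (A): the one-row tableau of a single relation scheme."""
    summary = {A: ("a", A) if A in attrs else None for A in universe}
    row = {A: ("a", A) if A in attrs else fresh() for A in universe}
    return summary, [row]

def substitute(tableau, old, new):
    """Replace the symbol old by new in the summary and in every row."""
    summary, rows = tableau
    swap = lambda s: new if s == old else s
    return ({A: swap(s) for A, s in summary.items()},
            [{A: swap(s) for A, s in row.items()} for row in rows])

def select(tableau, A, c):
    """Rule (B1): selection A = c."""
    if tableau is EMPTY:
        return EMPTY
    s = tableau[0][A]
    if s is None:
        raise ValueError("(B1)(i): the expression has no meaning")
    if s[0] == "c":
        return tableau if s[1] == c else EMPTY      # (iii) and (ii)
    return substitute(tableau, s, ("c", c))         # (iv)

def project(tableau, X):
    """Rule (B2): projection onto the attributes in X."""
    if tableau is EMPTY:
        return EMPTY
    for A, s in tableau[0].items():
        if A not in X and s is not None and s[0] == "a":
            tableau = substitute(tableau, s, fresh())   # becomes nondistinguished
    summary, rows = tableau
    return {A: (s if A in X else None) for A, s in summary.items()}, rows

def join(t1, t2):
    """Rule (B3). Assumption of this sketch: joining with the empty tableau gives EMPTY."""
    if t1 is EMPTY or t2 is EMPTY:
        return EMPTY
    summary = {}
    for A in t1[0]:
        pair = (t1[0][A], t2[0][A])
        consts = {s for s in pair if s is not None and s[0] == "c"}
        if len(consts) > 1:
            return EMPTY                                # (i) distinct constants
        if consts:
            summary[A] = consts.pop()                   # (ii)(a)
        elif any(s is not None for s in pair):
            summary[A] = ("a", A)                       # (ii)(b)
        else:
            summary[A] = None                           # (ii)(c)
    result = (summary, t1[1] + t2[1])
    for A, s in summary.items():
        if s is not None and s[0] == "c":
            result = substitute(result, ("a", A), s)
    return result

# Hypothetical expression: project onto AC the join of AB with the selection B=0 on BC.
U = ["A", "B", "C"]
t = project(join(scheme({"A", "B"}, U), select(scheme({"B", "C"}, U), "B", 0)), {"A", "C"})
```

The sketch expects both operands of `join` to be built over the same universe of columns, and it keeps rows in a list rather than a set, so a row that occurs in both operands would appear twice.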
These rules can also be used to define the operations select, project and join on tableaux. The result of applying any one of these operators to tableaux (not necessarily tableaux derived from restricted relational expressions) is defined to be the tableau described in the rule for that operator.

Example 2: Consider the expression

$$\pi_{[\ldots]}(\pi_{[\ldots]}(AB \bowtie BC) \bowtie \sigma_{[\ldots]}(AB))$$

If the above rules are applied to this expression, then the result is the tableau of Example 1. []

3.2 Equivalence of Tableaux

Two tableaux $T_1$ and $T_2$ are equivalent, written $T_1 \equiv T_2$, if for all instances I, $T_1(I) = T_2(I)$. We say that $T_1$ is contained in $T_2$, written $T_1 \subseteq T_2$, if for all I, $T_1(I) \subseteq T_2(I)$.

Let $T_1$ and $T_2$ be tableaux with the same target relation scheme, and let $S_1$ and $S_2$ be the sets of symbols of $T_1$ and $T_2$, respectively. A homomorphism is a mapping $\psi: S_1 \to S_2$ such that:

(a) If c is a constant, then $\psi(c) = c$.

(b) If s is the summary of $T_1$, then $\psi(s)$ is the summary of $T_2$.

(c) If w is any row of $T_1$, then $\psi(w)$ is a row of $T_2$.

The following theorem is proved in [3,8].

Theorem 1: $T_2 \subseteq T_1$ if and only if $T_1$ and $T_2$ have the same target relation scheme, and there is a homomorphism $\psi: S_1 \to S_2$.

By condition (c), a homomorphism $\psi$ corresponds to a mapping $\theta$ from the rows of $T_1$ to the rows of $T_2$. The mapping $\theta$ is called a containment mapping (see footnote 1), and it satisfies the following conditions.

(1) If row w of $T_1$ has a constant in some column A, then $\theta(w)$ has the same constant in column A.

(2) If row w of $T_1$ has a distinguished variable in column A, then $\theta(w)$ has a distinguished variable in column A.

(3) If rows w and v have the same nondistinguished variable in column A, then rows $\theta(w)$ and $\theta(v)$ have the same symbol in column A.

Corollary 2 [3]: Tableaux $T_1$ and $T_2$ are equivalent if and only if they have identical summaries up to renaming of distinguished variables, and there are containment mappings in both directions.

A tableau T is minimal if T is not equivalent to any tableau with fewer rows.
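As an illustration of Theorem 1, the following Python sketch searches for a homomorphism by backtracking over the rows. It is a brute-force search that can take exponential time and is not one of the polynomial algorithms developed in this paper. The encoding is an assumption of the sketch: a tableau is a pair (summary, rows) of tuples, variables are strings, constants are integers, and a blank in the summary is `None`.

```python
def is_constant(sym):
    """Assumption of this sketch: integers are constants, strings are variables."""
    return isinstance(sym, int)

def homomorphism(t1, t2):
    """Return a symbol mapping from t1 to t2 satisfying (a)-(c), or None if none exists."""
    (summary1, rows1), (summary2, rows2) = t1, t2

    def extend(h, src, dst):
        """Map the symbols of src position-wise onto dst, or return None on a conflict."""
        h = dict(h)
        for s, d in zip(src, dst):
            if s is None or d is None:
                if s is not d:          # blanks must line up (same target scheme)
                    return None
                continue
            if is_constant(s) and s != d:
                return None             # condition (a): constants are fixed
            if h.setdefault(s, d) != d:
                return None             # a symbol may have only one image
        return h

    def search(h, i):
        if i == len(rows1):
            return h
        for target in rows2:            # condition (c): every row maps to some row
            h2 = extend(h, rows1[i], target)
            if h2 is not None:
                found = search(h2, i + 1)
                if found is not None:
                    return found
        return None

    start = extend({}, summary1, summary2)   # condition (b): summary maps to summary
    return None if start is None else search(start, 0)

def equivalent(t1, t2):
    """Equivalence as containment in both directions (Theorem 1 applied twice)."""
    return homomorphism(t1, t2) is not None and homomorphism(t2, t1) is not None

# Hypothetical two-column tableaux with target scheme A.
big = (("a1", None), [("a1", "b1"), ("a1", "b2")])
small = (("a1", None), [("a1", "b1")])
const = (("a1", None), [("a1", 5)])
```

Here `big` and `small` are equivalent because the second row of `big` can be mapped onto the first, while `const` is strictly contained in `small`: a variable may be mapped to the constant 5, but the constant cannot be mapped to a variable.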
Note that if T comes from an expression E, then the number of joins in E is one less than the number of rows in T. Thus, if T is minimal, it corresponds to an expression with a minimum number of joins.

For every tableau T, there is a unique (up to renaming of variables) tableau T', such that $T \equiv T'$ and T' is minimal [8]. Furthermore, the minimal tableau equivalent to T can be obtained by deleting some of the rows of T. The core of T (see footnote 2) is the set of all the rows that belong to the minimal tableau obtained by deleting redundant rows of T. A folding is a containment mapping from the rows of a tableau T to the rows of the core of T, such that every row in the core of T is mapped to itself. It can be shown that every tableau T has a folding [8]. Note that a homomorphism that corresponds to a folding maps every variable in the core of T to itself. The following corollary is an immediate consequence of the results stated so far.

Footnote (1): In this paper we consider only equivalence (and not proper containment) of tableaux, and therefore the definition of a containment mapping is more restricted than the original definition given in [3].

Footnote (2): A tableau T might have two different cores. However, they are the same up to renaming of variables. Therefore, we usually speak about the core of T.

Corollary 3: Let $T_1$ and $T_2$ be tableaux with the same summary. If the rows of $T_1$ are a subset of the rows of $T_2$, then

(1) $T_2 \subseteq T_1$, and

(2) $T_2 \equiv T_1$ if and only if the core of $T_2$ is contained in $T_1$.

Let w and x be rows of tableaux over the same set of attributes. Row w covers row x, written $x \le w$, if for all columns A,

(1) if x has a constant in column A, then w has the same constant in column A, and

(2) if x has a distinguished variable in column A, then w has a distinguished variable in column A.

Note that if x is mapped to w, and $x \le w$, then the first two conditions of a containment mapping are satisfied. Let R and S be sets of rows over the same set of attributes.
The set S covers the set R, written $R \le S$, if every row of R is covered by some row of S.

A symbol (i.e., a variable or a constant) of a tableau T is repeated in some column A, if it appears in that column in more than one row. A tableau T is simple if whenever T has a repeated nondistinguished variable b in some column A, then b is the only repeated symbol in that column. The class of simple tableaux has an $O(n^3)$ equivalence algorithm [3], and an $O(n^4)$ minimization algorithm [4]. Other equivalence algorithms can be obtained from the containment algorithms of [16]. That is, testing whether $T_2 \equiv T_1$ can be done in $O(n^2)$ time in the following two cases:

(1) Both $T_1$ and $T_2$ have at most one repeated nondistinguished variable in each row.

(2) Every row of $T_1$ is covered by at most two rows of $T_2$, and every row of $T_2$ is covered by at most two rows of $T_1$.

4. Polynomial Equivalence Algorithms

In this section we consider the following classes of tableaux.

(1) The class of all tableaux T, such that T has at most one repeated nondistinguished variable in each row.

(2) The class of all tableaux T, such that every row of T is covered by at most one row besides itself.

Note that deciding whether a tableau belongs to Class 1 (or whether a tableau is simple) requires $O(n)$ time. However, deciding whether a tableau belongs to Class 2 requires $O(n^2)$ time.

Theorem 4: Each one of the above classes has an $O(n^2)$ time equivalence algorithm.

Proof: It follows immediately from the results of [16] that the theorem is true for Class 1. Suppose that both $T_1$ and $T_2$ are tableaux that belong to Class 2. We say that two rows w and x are equivalent, if w covers x and x covers w. Consider the algorithm of Figure 1. This algorithm tests whether $T_1 \equiv T_2$. Obviously, if $T_1 \equiv T_1^r$ and $T_2 \equiv T_2^r$

    begin
    (1) let $T_1^r$ be the result of removing from $T_1$ every row that is not equivalent to some row of $T_2$;
    (2) let $T_2^r$ be the result of removing from $T_2$ every row that is not equivalent to some row of $T_1$
;
    (3) if $T_1^r \not\equiv T_1$ then return false;
    (4) if $T_2^r \not\equiv T_2$ then return false;
    (5) if $T_1^r \equiv T_2^r$ then return true else return false;
    end

    Figure 1

then $T_1 \equiv T_2$ if and only if $T_1^r \equiv T_2^r$. Thus, we have to show that $T_1$ and $T_2$ cannot be equivalent if either $T_1^r \not\equiv T_1$ or $T_2^r \not\equiv T_2$.

Suppose that $T_1^r \not\equiv T_1$. Since the rows of $T_1^r$ are a subset of the rows of $T_1$, it follows from Corollary 3 that the core of $T_1$ contains a row w which is not in $T_1^r$. By the construction of $T_1^r$, no row of $T_2$ is equivalent to w. But equivalent tableaux have cores which are the same (up to renaming of variables). That is, every row in the core of one tableau has an equivalent row in the core of the other tableau. Therefore, $T_1$ and $T_2$ cannot be equivalent. Similarly, if $T_2^r \not\equiv T_2$, then $T_1 \not\equiv T_2$.

Obviously, for $i \in \{1,2\}$, each row of $T_i^r$ is covered by at most two rows of $T_i$, and each row of $T_i$ is covered by at most two rows of $T_i^r$. Suppose that a row w of $T_1^r$ is covered by more than two rows of $T_2^r$. Let x be a row of $T_2$ that is equivalent to w. Row x is covered by every row of $T_2$ that covers w. Therefore, x is covered by at least two rows of $T_2$ besides itself. This contradiction implies that each row of $T_1^r$ is covered by at most two rows of $T_2^r$. Similarly, each row of $T_2^r$ is covered by at most two rows of $T_1^r$.

It follows that testing equivalence in lines (3)-(5) can be done using the algorithm of [16]. Each application of this algorithm requires $O(n^2)$ time. Lines (1) and (2) can be executed in $O(n^2)$ time and, therefore, the algorithm of Figure 1 has a time complexity of $O(n^2)$. []

5. Obtaining Minimization Algorithms from Equivalence Algorithms

Let S be a class of tableaux. We say that S is closed under row deletion if whenever a tableau T is in S, and T' is obtained by deleting some of the rows of T, then T' is also in S.

Theorem 5: Let S be a class of tableaux closed under row deletion.
If there is an equivalence algorithm for S that runs in F(n) time ($F(n) \ge cn$ for some constant c), then there is a minimization algorithm for S that runs in nF(n) time.

Proof: Figure 2 describes a minimization algorithm for S. The input for this algorithm is a tableau T of S. We assume that the numbers 1,2,...,r correspond to the rows of T. The algorithm is based on the ability to test equivalence of tableaux from S. Note that the equivalence test in line (3) can be replaced with the containment test $T' \subseteq T$, because the rows of T' are a subset of the rows of T and, hence, $T \subseteq T'$.

Let $T_i$ be the value of T after i iterations through the loop of lines (1)-(3). Since T is assigned a new value in line (3) only if the new value is equivalent to the old value, $T_r$ (i.e., the tableau returned by the algorithm) is equivalent to $T_0$. Suppose that $T_r$ is not minimal. Therefore, there is a row j in $T_r$ that does not belong to the core of $T_r$. Let $\bar{T}_r$ be the result of deleting row j from $T_r$. Since the core of $T_r$ is not changed as a result of this deletion, $\bar{T}_r \equiv T_r$ and, hence, $\bar{T}_r \equiv T_0$.

Since row j has not been deleted by the algorithm, it must be in $T_{j-1}$. Let $\bar{T}_j$ be the result of removing row j from $T_{j-1}$. It follows that $\bar{T}_j$ is not equivalent to $T_0$ (otherwise row j cannot appear in $T_r$). But the rows of $\bar{T}_r$ are a subset of the rows of $\bar{T}_j$, and the rows of $\bar{T}_j$ are a subset of the rows of $T_0$ and, therefore, $T_0 \subseteq \bar{T}_j \subseteq \bar{T}_r$. Since $\bar{T}_r \equiv T_0$, it follows that $\bar{T}_j \equiv T_0$. This contradiction implies that $T_r$ is minimal.

In each pass through the main loop, line (2) requires $O(n)$ time and line (3) requires F(n) time. The loop is repeated r times to give a time complexity of nF(n) (since $r \le n$). []

6. Polynomial Minimization Algorithms

The classes of tableaux described in Section 4, and the class of simple tableaux have polynomial time equivalence algorithms.
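The row-deletion loop used in the proof of Theorem 5 can be sketched in Python as follows. The equivalence test is passed in as a parameter, since Theorem 5 only assumes that some equivalence algorithm for the class is available. For the demonstration, the sketch plugs in a brute-force containment test (exponential in the worst case) in place of the polynomial algorithms of Section 4. The encoding (tuples of symbols, strings for variables, integers for constants, `None` for a blank) and all names are assumptions of this sketch.

```python
def maps_into(t_from, t_to):
    """Brute-force test for a homomorphism from t_from to t_to."""
    (s1, rows1), (s2, rows2) = t_from, t_to

    def extend(h, src, dst):
        h = dict(h)
        for s, d in zip(src, dst):
            if (s is None) != (d is None):
                return None                 # blanks must line up
            if s is None:
                continue
            if isinstance(s, int) and s != d:
                return None                 # constants are fixed
            if h.setdefault(s, d) != d:
                return None                 # conflicting image for a symbol
        return h

    def search(h, i):
        if h is None:
            return False
        if i == len(rows1):
            return True
        return any(search(extend(h, rows1[i], row), i + 1) for row in rows2)

    return search(extend({}, s1, s2), 0)

def minimize(tableau, equivalent):
    """Consider each row once; drop it when the smaller tableau is still equivalent."""
    summary, rows = tableau
    rows = list(rows)
    i = 0
    while i < len(rows):
        candidate = rows[:i] + rows[i + 1:]
        if equivalent((summary, rows), (summary, candidate)):
            rows = candidate                # the next row slides into position i
        else:
            i += 1
    return summary, rows

# As noted in the proof, it suffices to test containment of the smaller tableau
# in the larger one, i.e. to find a homomorphism from T to T'.
containment_test = lambda t, t_prime: maps_into(t, t_prime)

# Hypothetical tableau with target scheme A and two redundant rows.
T = (("a1", None), [("a1", "b1"), ("a1", "b2"), ("b3", "b1")])
minimal = minimize(T, containment_test)
```

Which of several interchangeable rows survives depends on the order in which rows are tried; the result is unique only up to renaming of variables.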
Each one of these classes is closed under row deletion and, therefore, Theorem 5 can be applied to obtain polynomial minimization algorithms for these classes. However, the algorithms obtained by applying Theorem 5 are not the most efficient minimization algorithms for these classes. For each one of these classes there is a minimization algorithm with a time complexity of $O(n^2)$. These minimization algorithms are given in the following sections.

    begin
    (1) for i := 1 to r do
        begin
    (2)     let T' be the result of deleting row i from T;
    (3)     if $T \equiv T'$ then T := T';
        end;
    (4) return T;
    end

    Figure 2

6.1 Regular Tableaux

In this section we will show that a tableau, in which a folding does not eliminate any repeated nondistinguished variable, can be minimized in $O(n^2)$ time. This fact is used in developing the minimization algorithms of the following sections.

Let b be a repeated nondistinguished variable of a tableau T. The variable b is essential if it appears in every core of T. A tableau T is regular if all repeated nondistinguished variables of T are essential.

    begin
    /* The input is a tableau T. */
    /* Initially all the rows of T are marked "unconsidered". */
    (1) while there is a row w marked "unconsidered" do
        begin
    (2)     mark w "considered";
    (3)     for every row x other than w do
    (4)         if in whatever column x and w disagree, x has a nondistinguished
                variable that appears nowhere else in T
                then delete x from T;
        end;
    (5) return T;
    end

    Figure 3

Lemma 6: The algorithm of Figure 3 returns a tableau $\bar{T}$ equivalent to T in $O(n^2)$ time. Furthermore, if T is regular, then $\bar{T}$ is minimal.

Proof: Consider line (4) of the algorithm. Row x is deleted if there is a row w in T, such that for all columns A, if x and w disagree in column A, then x has a nondistinguished variable that appears nowhere else in T. By Corollary 2 in [3], the tableau obtained by deleting row x from T is equivalent to T. Thus, $\bar{T}$ (the tableau returned by the algorithm) is equivalent to T.
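The deletion test of line (4) in Figure 3 can be made concrete with a short Python sketch. It assumes that rows are tuples of symbols and that a caller-supplied predicate recognizes nondistinguished variables; the function name and the sample tableau are hypothetical. Occurrence counts are updated after each deletion so that "appears nowhere else in T" always refers to the current tableau. The sketch favors clarity over the bookkeeping needed for the running time analyzed in the proof.

```python
from collections import Counter

def delete_redundant_rows(rows, is_nondistinguished):
    """Return the rows that survive the algorithm of Figure 3."""
    alive = list(range(len(rows)))                  # indices of rows still in T
    counts = Counter(sym for row in rows for sym in row)
    considered = set()
    while True:
        pending = [i for i in alive if i not in considered]
        if not pending:
            break
        w = pending[0]
        considered.add(w)                           # line (2)
        for x in list(alive):                       # line (3)
            if x == w:
                continue
            # line (4): wherever x and w disagree, x must hold a nondistinguished
            # variable that occurs exactly once in the current tableau
            if all(sx == sw or (is_nondistinguished(sx) and counts[sx] == 1)
                   for sx, sw in zip(rows[x], rows[w])):
                alive.remove(x)
                counts.subtract(rows[x])            # x no longer contributes symbols
    return [rows[i] for i in alive]

# Hypothetical tableau over ABC; names starting with "b" are nondistinguished.
rows = [("a1", "b1", "b2"),
        ("a1", "b1", "b3"),
        ("b4", "b1", "a3")]
kept = delete_redundant_rows(rows, lambda s: isinstance(s, str) and s.startswith("b"))
```

In this example the second row is deleted, because it differs from the first only in column C, where it holds the unrepeated variable b3. The third row is kept, since it holds a distinguished variable in column C where the first row does not.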
As for the running time of this algorithm, let T have r rows and t columns. The cost of executing line (4) once is $O(t)$. In each iteration of the outer loop, the inner loop is executed at most r times. The outer loop is executed no more than r times. Thus, the total cost of line (4) is $O(r^2 t)$. Every other line has a constant cost, and is executed no more than $r^2$ times. Since both rt and r are no larger than n, the algorithm has an $O(n^2)$ running time.

Suppose that T is regular. Every repeated nondistinguished variable of T is essential and, therefore, must appear in the core of T. Consider a folding from T onto its core. This folding maps every row in the core of T to itself and, therefore, the corresponding homomorphism maps every repeated nondistinguished variable of T to itself. Suppose that row x of T is mapped to some other row w in the core of T. Since each repeated nondistinguished variable is mapped to itself, it follows that in whatever column x and w disagree, x has a nondistinguished variable that appears nowhere else in T. Therefore, x is deleted during the execution of the above algorithm. Since this is true for every x which is not in the core of T, $\bar{T}$ is minimal. []

Each one of the following algorithms for minimizing a tableau T has two steps. In the first step some rows of T are deleted in order to obtain an equivalent regular tableau $\bar{T}$. In the second step the algorithm for minimizing regular tableaux is applied to $\bar{T}$.

6.2 Minimizing Tableaux of Class 1

Let T be a tableau that has at most one repeated nondistinguished variable in each row. For each repeated nondistinguished variable b, let W(b) be the set of all the rows that contain b. Suppose that some repeated nondistinguished variable $b_0$ is not essential, and let A be the column in which $b_0$ appears. Let $\theta$ be a folding from the rows of T to its core. Obviously, all the rows of $\theta(W(b_0))$ have the same symbol d ($d \ne b_0$) in column A, and $W(b_0)$ is covered by $\theta(W(b_0))$.
The following lemma shows that these conditions are also sufficient for the elimination of $b_0$.

    begin
    /* The input is a tableau T that belongs to Class 1. */
    /* Initially all the variables of T are marked "unconsidered". */
    (1) while there is a repeated nondistinguished variable b marked "unconsidered" do
        begin
    (2)     mark b "considered";
    (3)     let A be the column in which b appears;
    (4)     for each symbol s ($s \ne b$) in column A do
            begin
    (5)         let S be the set of all the rows of T that have the symbol s in column A;
    (6)         if $W(b) \le S$ then delete W(b) from T;
            end;
        end;
    (7) return T;
    end

    Figure 4

Lemma 7: Suppose that $\varphi$ is a mapping from T to itself such that for all $x \notin W(b_0)$, $\varphi(x) = x$. Then $\varphi$ is a containment mapping if and only if (1) all the rows of

[...]

    [...];
    (11)    for each column A of w that has a repeated nondistinguished variable e do
            begin
    (12)        let s be the symbol of c(w) in column A;
    (13)        if h(e) = [...] then h(e) := s;
    (14)        if $h(e) \ne s$ then return [...]
            [...]
            else return M;
    end FOLD;

    begin /* main procedure */
    /* Initially all the variables are marked "unconsidered". */
    (18) while there is a repeated nondistinguished variable b marked "unconsidered" do
         begin
    (19)     mark b "considered";
    (20)     M := FOLD(b,T);
    (21)     delete all the rows of M from T;
         end;
    (22) return T;
    end

    Figure 5

[...] outer loop (lines (4)-(16)) is $O(n)$. Finally, note that lines (1)-(3) and line (17) have a total cost of no more than $O(n)$. []

Theorem 11: A tableau T, in which each row is covered by at most one row besides itself, can be minimized in time $O(n^2)$.

Proof: Suppose that T' is the result of applying the algorithm of Figure 5 to T. If a variable b is not essential in T', then it is not essential in T or in any tableau obtained from T by deleting some redundant rows (because T and T' are equivalent). Thus, all the rows containing b are deleted in line (21), and hence T' is regular. By Lemma 10, the running time of this algorithm is $O(n^2)$. []

6.4 Simple Tableaux

Suppose that T is a simple tableau.
Let S be a set of rows, and let w be a row of T. The closure of S with respect to w, denoted $CL_w(S)$, is the minimal set of rows such that

(1) $S \subseteq CL_w(S)$, and

(2) if x is a row in $CL_w(S)$ such that x has a repeated nondistinguished variable b in some column A, and w has some other symbol in this column, then all the rows containing b are in $CL_w(S)$.

In [3] it is shown that if w covers $CL_w(S)$, then the tableau obtained by deleting all the rows of $CL_w(S) - \{w\}$ is equivalent to T (this is true even if T is not simple). Furthermore, if some repeated nondistinguished variable b of T is not essential, then there is a row w (that does not contain b) such that w covers $CL_w(W(b))$. (Note that w is not in $CL_w(W(b))$, since w is not in W(b).)

These results can be used to obtain a regular tableau equivalent to T as follows. Compute $CL_w(W(b))$ for each repeated nondistinguished variable b and each row w that does not contain b. If w covers $CL_w(W(b))$, then delete all the rows of $CL_w(W(b))$ from T. The resulting tableau is equivalent to T, and it is regular because all repeated nondistinguished variables that are not essential have been eliminated [3]. In this section we describe an implementation of this algorithm that runs in $O(n^2)$ time.

Lemma 12: Let w be a row of a simple tableau T. Suppose that $b_1$ and $b_2$ are two repeated nondistinguished variables of T that do not occur in w. Then $CL_w(W(b_1))$ and $CL_w(W(b_2))$ are either equal or disjoint.

Proof: Let $S_1 = CL_w(W(b_1))$ and $S_2 = CL_w(W(b_2))$. Suppose that some row x belongs to both $S_1$ and $S_2$. Row x must have some repeated nondistinguished variable b that does not occur in w and, hence, W(b) is contained in both $S_1$ and $S_2$. We claim that both $S_1$ and $S_2$ are equal to $CL_w(W(b))$. By the definition of a closure, if R is a subset of $CL_w(S)$ (for any row w and set of rows S), then $CL_w(R) \subseteq CL_w(S)$. Thus, both $S_1$ and $S_2$ contain $CL_w(W(b))$.

Let $S = CL_w(W(b))$. Suppose that $S_1$
is not equal to S. If some row of $W(b_1)$ is in S, then all the rows of $W(b_1)$ are in S, and S is equal to $S_1$ (because it satisfies the definition of $CL_w(W(b_1))$). Thus $W(b_1)$ must be contained in $S_1 - S$. We will derive a contradiction by showing that $S_1 - S$ also satisfies the second condition of $CL_w(W(b_1))$. Let u be any row of $S_1 - S$. Since u is in $S_1$, u has a repeated nondistinguished variable $b'$ in some column A, and w has some other symbol in that column. If some row of $W(b')$ is in S, then all the rows of $W(b')$ must be in S, and u cannot be in $S_1 - S$. Thus, $W(b')$ is disjoint from S and, hence, all the rows containing $b'$ are in $S_1 - S$ (since they are in $S_1$). Therefore, $S_1 - S$ also satisfies the second condition of $CL_w(W(b_1))$. Since this is impossible, it follows that $S_1$ is equal to S. Similarly, $S_2$ is equal to S. []

Lemma 12 implies that if $CL_w(W(b))$ has been computed, and $b'$ is a repeated nondistinguished variable that appears in some row of $CL_w(W(b))$, then there is no need to compute $CL_w(W(b'))$ (it is assumed that neither b nor $b'$ occurs in w). Thus, for each row w we do the following. At first all repeated nondistinguished variables that do not appear in w are marked "unconsidered". The next step is to compute $CL_w(W(b))$ for some repeated nondistinguished variable b that is marked "unconsidered". During this step all repeated nondistinguished variables that occur in some row of $CL_w(W(b))$ are marked "considered". If $CL_w(W(b))$ is covered by w, then all the rows of $CL_w(W(b))$ are deleted. This step is repeated for some other variable that is marked "unconsidered", until all the variables are marked "considered". The complete algorithm is described in Figure 6.

Theorem 13: A simple tableau T can be minimized in $O(n^2)$ time.

Proof: By Lemma 12 and the results of [3], the algorithm of Figure 6 returns a regular tableau equivalent to T. Consider the time complexity of this algorithm.
We assume that T has r rows and t columns, and n is the size of T (i.e., $n = O(rt)$).

    procedure CLOSURE(b, w):
    begin
    (1)  S := $\emptyset$;
    (2)  make QUEUE empty;
    (3)  mark b "considered";
    (4)  add all the rows containing b to QUEUE;
    (5)  while QUEUE is not empty do
         begin
    (6)      let v be the first row on QUEUE;
    (7)      move v from QUEUE to S;
    (8)      for every column A do
    (9)          if v and w disagree in column A, v has a repeated nondistinguished
                 variable d in this column, and d is marked "unconsidered" then
                 begin
    (10)             mark d "considered";
    (11)             for every row x containing d do
    (12)                 if x is neither in S nor on QUEUE
    (13)                 then add x to QUEUE;
                 end;
         end;
    (14) return S;
    end

    begin /* main procedure */
    (15) for every row w do
         begin
    (16)     mark all repeated nondistinguished variables "unconsidered";
    (17)     for every repeated nondistinguished variable d that occurs in w do
    (18)         mark d "considered";
    (19)     while there is a repeated nondistinguished variable b marked "unconsidered" do
             begin
    (20)         R := CLOSURE(b,w);
    (21)         if $R \le w$ then delete all the rows of R from T;
             end;
         end;
    (22) return T;
    end

    Figure 6

At first we will show that a call CLOSURE(b,w) requires $O(st)$ time, where s is the number of rows in $CL_w(W(b))$. Consider the cost of executing the loop of lines (11)-(13). We assume that for every variable in T there is a linked list that points to all the rows containing this variable. These lists can be created in $O(n)$ time prior to the execution of the algorithm. By using these lists, the cost of finding all the rows containing d in the loop of lines (11)-(13) is proportional to the number of these rows. The test of deciding whether a row is in S or on QUEUE can be implemented in constant time. Thus, the cost of lines (11)-(13) is accounted for by assigning a constant cost to each row x whenever x is examined in this loop. A row is examined in this loop no more than t times, since it has at most t variables.
Also note that if a row is examined in this loop, then it belongs to $CL_w(W(b))$. Therefore, the total cost of lines (11)-(13) is $O(st)$.

Consider now the cost of executing the loop of lines (5)-(13) once (excluding the cost of lines (11)-(13)). Lines (6) and (7) have a constant cost. The cost of lines (8)-(10) is $O(t)$. Since the loop of lines (5)-(13) is repeated s times, the total cost of this loop is $O(st)$. Lines (1)-(4) have a cost of $O(s)$. Thus, a call CLOSURE(b,w) requires $O(st)$ time.

We now compute the cost of executing the loop of lines (15)-(21) once. The cost of line (16) is $O(t)$ (since each column of a simple tableau has at most one repeated nondistinguished variable). The loop of lines (17)-(18) requires $O(t)$ time. The cost of line (19) is no more than the number of repeated nondistinguished variables, i.e., $O(t)$. The cost of executing lines (20)-(21) once is $O(st)$, where s is the number of rows in R. For each row w, all the sets obtained as a value of R in line (20) are pairwise disjoint, so their sizes sum to at most r. Thus, the cost of executing the loop of lines (15)-(21) once is $O(rt)$. This loop is repeated r times and, hence, the total cost is $O(r^2 t)$, i.e., no more than $O(n^2)$. By Lemma 6, the resulting regular tableau can then be minimized in $O(n^2)$ time. []

Theorem 14: If $T_1$ and $T_2$ are simple tableaux, then testing whether $T_1$ is equivalent to $T_2$ can be done in $O(n^2)$ time.

Proof: By using the algorithm of Figure 6, we compute regular tableaux $\bar{T}_1$ and $\bar{T}_2$ equivalent to $T_1$ and $T_2$, respectively. It follows from the results of [3] that testing whether $\bar{T}_1$ is equivalent to $\bar{T}_2$ can be done in $O(n^2)$ time, where n is the size of $T_1$ and $T_2$. []

7. Decomposition of Tableaux

Let T be a tableau that does not necessarily belong to one of the classes we have discussed so far. None of the three minimization algorithms is guaranteed to minimize such a T, but they can be used as heuristics. The minimization algorithm for simple tableaux can be applied to any tableau, and the result is an equivalent tableau possibly with fewer rows.
The idea behind the algorithm of Section 6.2 can be used to minimize tableaux as follows. Let b be a repeated nondistinguished variable of a tableau T, and suppose that the set S of all the rows containing b does not have any other repeated nondistinguished variable. Then all the rows of S can be deleted if they are covered by a set of rows that have the same symbol in the column of b.

The algorithm of Section 6.3 is not only good as a heuristic, but can also be used as an exponential time minimization algorithm for tableaux. Let T be a tableau. For every row i of T, we define C(i) to be the set of all the rows that cover row i, i.e.,

C(i) = [...]

Suppose we construct a function c such that for all i, $c(i) \in C(i)$ (it is understood that c(i) = i if C(i) *