Report No. UIUCDCS-R-78-898 (UILU-ENG 78 1708)

DESIGNING EXTENDED ENTRY DECISION TABLES AND OPTIMAL DECISION TREES USING DECISION DIAGRAMS

by Ryszard S. Michalski

March 1978

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation under grant NSF MCS 76-22940.

ABSTRACT

The paper introduces the concept of a decision diagram and shows its application to designing extended entry decision tables and converting them to space or time optimal decision trees. A decision diagram is a geometrical representation of a decision table by means of a planar model of a multidimensional discrete space, as described in [12]. Two algorithms for optimal (or suboptimal) space or time conversion are described using decision diagrams. These algorithms are basically decomposition algorithms, but by varying their degree (def. 5), one can obtain a spectrum of algorithms differing in the trade-off between computational efficiency and the degree of guarantee that the solution is optimal. When the algorithms do not guarantee optimality, they give a measure of the maximum possible distance between the obtained and the optimal trees.

Key words and phrases: Limited Entry Decision Tables, Extended Entry Decision Tables, Decision Trees, Conversion Algorithms, Decision Diagram, Logic Diagram.

CR Categories: 8.3

I. INTRODUCTION

There are many practical problems where certain actions or decisions depend on the outcomes of a set of tests. A convenient way of specifying the correspondence between test outcomes and actions is by means of a decision table. Decision tables have found widespread application in computer programming [7,5], data documentation [3], and various other areas of data processing. Recently, in a modified form, they have also found application to certain problems in artificial intelligence [13].

Fig. 1 gives an example of a limited entry decision table, where tests can have only three possible outcomes: YES, NO or IRRELEVANT, denoted in Fig. 1 by 1, 0, -, respectively. Fig. 2 gives an example of an extended entry decision table, where tests can have an arbitrary number of outcomes. Techniques described in this paper are applicable to both limited and extended entry decision tables.

Each column of a decision table specifies a decision rule, which consists of a condition part (a combination of test outcomes) and an action part (an action or sequence of actions which should be taken when the condition part is satisfied). If the order of actions is important, the entries in the action part are integers indicating the order.

In any decision table, test outcomes can take only a finite number of distinct values. Let x_1, x_2, ..., x_n denote tests and D_1, D_2, ..., D_n the corresponding sets of possible outcomes of these tests.
The event space

E = D_1 × D_2 × ... × D_n      (1)

(where × denotes cartesian product) is the set of all possible sequences of test outcomes (events). As was described in [12], the event space E can be represented geometrically on a plane in the form of a diagram. For lack of space, the description of the diagram, and of the rule for recognizing cartesian complexes (see below) in it, also given in [12], is omitted here.

Figure 1. A limited entry decision table.

Figure 2. An extended entry decision table. (* Error: an impossible combination of test outcomes.)

It is therefore recommended that the reader have a prior acquaintance with paper [12]. A basic concept used here is that of an elementary cartesian complex (a special case of a 'cartesian complex' [12]), defined as a set of events, or cells* of a diagram, which can be expressed as a single logical product (a term) of conditions which check whether a test x_i has outcome a_i. Such conditions are written as [x_i = a_i], and terms as products Λ_{i∈I} [x_i = a_i]. If an outcome of a test is irrelevant ('-'), then the condition involving this test is omitted from the term. Thus, each condition part of a rule in a decision table can be expressed as a term, and be represented in a diagram as an elementary cartesian complex (from now on, simply, a complex).
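As a concrete illustration of these definitions, a term can be held as a partial assignment of outcomes to tests, and mechanically expanded into the set of event-space cells (the elementary cartesian complex) it denotes. The sketch below is an assumed toy representation, not from the paper:

```python
from itertools import product

# Outcome sets D_i for each test; three binary tests are an assumed toy example.
domains = {"x1": (0, 1), "x2": (0, 1), "x3": (0, 1)}

def cells(term, domains):
    """Expand a term -- a partial test->outcome assignment, with irrelevant
    ('-') tests simply absent -- into the elementary cartesian complex it
    denotes, i.e. the set of event-space cells satisfying every condition."""
    axes = [[term[t]] if t in term else list(d) for t, d in domains.items()]
    return set(product(*axes))

# The term [x1=1][x3=0] leaves x2 irrelevant, so it denotes a 2-cell complex.
print(sorted(cells({"x1": 1, "x3": 0}, domains)))   # [(1, 0, 0), (1, 1, 0)]
```

With binary tests, a term omitting k of the n tests expands to 2^k cells, which is the k-cube correspondence mentioned in the footnote below.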
A decision diagram, for a given decision table, is constructed by locating in the diagram (representing the space E of test outcomes) the complexes which correspond to the condition part of every rule, and marking them by the actions specified in the action part. A complex (or a cell) marked by action A is called in the sequel a complex (or cell) of class A. Fig. 3 and 4 present the decision diagrams representing the decision tables in Fig. 1 and 2, respectively. It may be a useful exercise for the reader to check the correspondence between the rules in the decision tables and the corresponding complexes in the decision diagrams.

In this paper, decision diagrams are used both as a conceptual geometrical model for describing algorithms, and as a visual aid for solving problems. A significant advantage of decision diagrams lies in the fact that it is much easier (for humans) to see differences and similarities between geometrical configurations than between strings of numbers or symbols.

* In the case of binary tests (i.e., when a_i ∈ {0,1}), a complex of 2^k cells corresponds to a k-cube (a subset of 2^k vertices of an n-dimensional hypercube).

Figure 3. Decision diagram representing the decision table in Fig. 1.

Figure 4. Decision diagram representing the decision table in Fig. 2 (empty cells correspond to ELSE conditions).

A description of algorithms in terms of geometrical constructs (which can be visualized) has therefore a great appeal, both for scientific communication and education. In the past, many authors used the concept of an n-dimensional hypercube and its subsets, k-cubes, for representing an event space and logical products, respectively. A hypercube, however, can be directly visualized only when there are no more than 3 variables; when there are more than 3 variables, it rapidly loses its value as a geometrical model. When the variables can take more than 2 values (as in our case), the concept of a hypercube is even less adequate.
Although a form of diagrams with binary tests (Karnaugh maps) has been used in the past for solving problems related to limited entry decision tables, this is the first paper, to the author's knowledge, which demonstrates the usefulness of diagrams for extended entry decision tables, and uses them systematically as a conceptual model for presenting and analyzing algorithms. The paper also demonstrates that decision diagrams are a useful practical tool (when the use of a computer is not necessary) for directly solving various problems related to decision tables, such as testing decision tables for redundancy, consistency and completeness, optimizing decision tables, and quickly converting them to optimal (or near-optimal) decision trees.

Chapter 2 describes the use of decision diagrams in designing and optimizing decision tables. Chapter 3 gives a theoretical analysis of the problem of converting decision tables to (space or time) optimal decision trees, and describes two first degree conversion algorithms. Chapter 3 also demonstrates a need, in some cases, for conversion algorithms of degree higher than first, and shows that such algorithms can be easily obtained from the first degree algorithms.

II. USE OF DECISION DIAGRAMS IN DESIGNING DECISION TABLES

2.1 Testing decision tables for redundancy, consistency and completeness

A well designed decision table should be non-redundant, consistent and complete.* These properties can be easily tested once a decision diagram has been constructed for the given decision table. Redundancy occurs if the decision diagram contains complexes of the same action class which have a non-empty intersection. For example, in Fig. 3, the complexes representing rules 1 and 2 of action class A_1 intersect, and therefore the decision table in Fig. 1 is redundant. If intersecting complexes are of different action classes, then the decision table is inconsistent.
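These intersection tests are mechanical once condition parts are held as terms. The brute-force sketch below (the representation and the toy rules are assumptions, not from the paper) flags a cell as redundant when two rules of the same action class cover it, and as inconsistent when the classes differ:

```python
from itertools import product

tests = ["x1", "x2"]                          # assumed toy example: two binary tests
domains = {"x1": (0, 1), "x2": (0, 1)}

def expand(term):
    """Cells of the event space E covered by a term ('-' entries simply omitted)."""
    return {c for c in product(*(domains[t] for t in tests))
            if all(c[tests.index(t)] == v for t, v in term.items())}

# Assumed rules: (condition term, action class). Rules 1 and 2 overlap on (0, 1).
rules = [({"x1": 0}, "A1"), ({"x1": 0, "x2": 1}, "A1"), ({"x1": 1}, "A2")]

seen = {}
for term, action in rules:
    for cell in expand(term):
        if cell in seen:
            # Same class on an overlap: redundant; different class: inconsistent.
            print("redundant" if seen[cell] == action else "inconsistent", cell)
        seen[cell] = action
```

On a diagram this is read off visually; the enumeration above is only the cell-by-cell equivalent of looking for overlapping complexes.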
The decision table is complete if every cell of the decision diagram belongs to (is covered by) some complex. We can see in Fig. 3 that the decision table from Fig. 1 is redundant, consistent and complete; and in Fig. 4, that the decision table from Fig. 2 is irredundant, consistent and complete (the table would be incomplete if there were no ELSE rule).

2.2 Optimization of a decision table

It is usually desirable that a decision table contain the minimum number of rules sufficient for specifying the given decision problem while preserving the requirements of non-redundancy**, consistency and completeness. In a decision diagram, a reduction in the number of rules occurs when two or more complexes of the same action class are merged (or rearranged) into a smaller number of complexes. The theoretical basis for merging complexes is given by the simplification rule:

L[x_i=0] ∨ L[x_i=1] ∨ ... ∨ L[x_i=d_i−1] = L      (2)

where L is a term, L[x_i=a] is the logical product of L with the condition [x_i=a], and {0, 1, ..., d_i−1} is the set of all possible outcomes of test x_i, i.e., D_i.

* A decision table is: redundant, if there is a combination of test outcomes which satisfies the condition part of more than one rule with the same action part; inconsistent, if there is a situation as above but the action parts are different; complete, if it contains a rule for every sequence of test outcomes.

** If one permits redundancy, the number of rules can sometimes be further reduced.

The rule (2), applied to a decision diagram, says that if complexes of the same action class differ in the outcome of only one test, and the test takes on all possible values in these complexes, then the complexes can be merged into one complex not involving this test at all. If certain combinations of test outcomes can never occur (are 'DON'T CAREs'), then the cells corresponding to them (empty cells in Fig. 4) can be included in any complex if this can help to merge complexes.
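Rule (2) can be applied mechanically: group terms of one action class by everything except the chosen test, and drop that test wherever its full outcome set D_i is present in a group. A minimal sketch, with the term representation assumed (not the paper's):

```python
def try_merge(terms, test, domain):
    """Apply simplification rule (2): terms of one action class that differ
    only in `test`, and jointly take every outcome in `domain` there, merge
    into a single term not involving `test` at all."""
    def rest(t):  # the term with `test` removed, as a hashable grouping key
        return tuple(sorted((k, v) for k, v in t.items() if k != test))
    groups = {}
    for t in terms:
        groups.setdefault(rest(t), set()).add(t.get(test))
    merged = []
    for key, outcomes in groups.items():
        if outcomes >= set(domain):          # all d_i outcomes present: merge
            merged.append(dict(key))         # the test drops out of the term
        else:                                # otherwise keep the terms as they are
            merged.extend(t for t in terms if rest(t) == key)
    return merged

# [x1=0][x2=0] v [x1=0][x2=1]  =>  [x1=0]
print(try_merge([{"x1": 0, "x2": 0}, {"x1": 0, "x2": 1}], "x2", (0, 1)))
```

DON'T CARE cells could be accommodated by letting them contribute any missing outcome to a group before the `outcomes >= set(domain)` check.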
Let {A_i}, i = 1, 2, ..., m, denote the set of all action classes, and E_i the set of all cells of action class A_i in a given decision diagram. A cover C(A_j) of action class A_j is a set {C_k} of complexes whose union includes (covers) set E_j and does not cover any cells of other action classes:

E_j ⊆ ⋃_k C_k ⊆ E \ ⋃_{i≠j} E_i

The difference between c_0 + c_1 and c is computed:

Δ(x_i, A) = c_0 + c_1 − c      (7)

where c is the cardinality of the optimal cover of the cells of class A in the given diagram, and c_0, c_1 are the cardinalities of the optimal covers of the cells of class A in the two subdiagrams produced by selecting test x_i. Let K denote all action classes whose cells are partitioned by selecting test x_i. Δ(x_i, A) is determined for each A ∈ K, and then their sum is computed:

Δ_1(x_i) = Σ_{A∈K} Δ(x_i, A)      (8)

Definition 7. Δ_1(x_i) is called Δ_1, or the dynamic cost estimate, for x_i.

To see that Δ_1 is a form of the first degree cost estimate, assume that the reference set R is the union of covers

R = C(E) ∪ ⋃_i (C(E_0^i) ∪ C(E_1^i))      (9)

where i scans the partitions of set E by all tests being considered for an assignment to the given node. Δ_1 can then be viewed as a property of the relation between R and x_i. In using Δ_1 for test selection one can ignore the value c in (7), since it remains the same for each test. Only when a test is selected can one compute the 'complete' value of Δ_1, which is needed for computing the total cost estimate Σ_1, as defined later.

* A cover is irredundant if removing or decreasing any complex in it makes the resulting set not a cover.

The Δ_1 estimate does not have the previously mentioned disadvantages of Δ_0, and is clearly more precise in estimating the effect of a test selection on the final decision tree. Its computation, however, is more complex, because at each step of test selection the optimal cover of (ever decreasing) subdiagrams has to be computed (note, however, that after selecting a test, the cardinalities of C(E_0) and C(E_1) can be used in computing Δ_1 for test selection at the next level of the tree). A 'shortcut' in computing Δ_1 is also possible.
Namely, if Δ_0 = 0, then obviously Δ_1 = 0, and Δ_1 does not have to be computed in such a case (for details, see Example 3 and the remarks after algorithm D).

A question arises of which test to select when Δ_1 is the same for more than one test. In computing Δ_1, the sizes of the covers C(E_0) and C(E_1) were not taken into consideration. If the cardinality of C^i = C(E_0^i) ∪ C(E_1^i) is the same for different values of i (see (9)), then Δ_1 for the corresponding tests is also the same. The covers C^i may, however, consist of complexes of different sizes (because of the DON'T CAREs). The larger the complex, the fewer tests are involved in its expression, and the corresponding complex can potentially be represented by a leaf at a higher level of the tree. This may reduce the number of nodes (see Example 3). Consequently, a reasonable tie-breaking rule is to select the test for which C^i consists of larger complexes (i.e., the total number of cells in the complexes of C^i is larger). When there is still a tie, any test can be selected.

Definition 8. The above defined criterion for test selection is called the criterion of dynamically minimizing added leaves (DMAL).

The DMAL criterion is a form of first order cost estimate (as is MAL). Assuming that Σ_1 denotes the sum of the estimate Δ_1 over all nodes in a tree, theorem 2 also holds for Σ_1.

Other first degree cost estimates

Pollack [14] describes a first order cost estimate in which complexes broken by a test are assigned weights (called 'column-counts') equal to the number of cells they consist of. The cost estimate ('dash-count') for a test is the sum of the weights of the complexes broken by selecting the test. (An assumption is made in [14] that each action class is represented by only one complex. Thus, the issue of alternative covers is not considered there, which strongly limits the applicability of the method.) According to the above estimate, breaking, e.g., 4 two-cell complexes is equivalent to breaking 1 eight-cell complex.
Breaking 4 two-cell complexes adds 4 more nodes, however, while breaking 1 eight-cell complex adds only one node (although, if this complex is broken again in subsequent test selections, 7 more nodes could potentially be added to the tree). It is assumed here, of course, that no subsequent merging of complexes is done. In view of what was said before about the fast decrease of the probability that a complex is broken several times, such an estimate does not seem sufficiently justified (in fact, it is easy to find an example for which such an estimate will select a wrong test, while the simpler MAL criterion will select a right one). (The tie-breaking criterion used in [14], called DELTA, favors an imbalance in the number and sizes of complexes on the two sides of the axes of a test, i.e., in the parts of the diagram defined by values 0 and 1 of the test. It is unclear how to justify such a criterion, and there is a simple counter-example to it.)

Alster [1] describes a first degree cost estimate where the weight given to a broken complex is 2^k, where k is the total number of reduced variables ('dashes') in all (essential) complexes in a cover of an action class (i.e., if there are, for example, 3 two-cell complexes in an action class and any one of them is broken by the test, then it will be given weight 2^3 = 8). Thus, if there is only one complex in an action class, the estimate is equivalent to Pollack's dash-count estimate; but if there are a few complexes in a class, and each complex has more than one cell (i.e., has at least 1 dash), then any broken complex from this group will be given a very large weight. Here again, for the reasons discussed before, such a criterion seems insufficiently justified, and it is easy to find a counter-example to it (for which both the MAL and the DMAL criterion select a correct test). When there are alternative (non-essential) complexes, paper [1] advocates the creation of OR-groups.
The weight of broken complexes in such a group is divided by the cardinality of the group. The aim of this measure is to take into account the fact that the larger the OR-groups are, the more likely it is that a cover exists whose complexes will not be broken by the test under consideration. Note that in computing Δ_1, rather than attempting to estimate (by the above or any other measure) the probability that such a cover exists, one simply searches for the cover directly. This is computationally acceptable, and at the same time the estimate is more precise.

CASE 2: Conversion of extended entry decision tables.

In an extended entry decision table, tests may have an arbitrary number of outcomes. Consequently, the equivalent trees may have an arbitrary number of branches. Let us assume first that all tests have the same number of outcomes, equal to d, so that the corresponding decision tree will be d-ary. In a d-ary tree, the number of leaves is

l = (d−1)v + 1      (10)

where v is the number of nodes. Thus, as in the case of binary trees, a d-ary tree with the minimum number of leaves also has the minimum number of nodes. Both criteria of test selection, MAL and DMAL, can be adopted here without modification, as is shown below.

Selecting a test now corresponds to partitioning a diagram into d parts. Consequently, if a test breaks a complex, it breaks it into d smaller complexes. This adds d−1 leaves and 1 node to the tree. Therefore, decreasing the number of complexes which are broken by a test decreases the number of nodes. The value of d makes no difference with regard to which test should be selected. Both principles, MAL and DMAL, can be directly applied.

If, however, tests can have different numbers of outcomes, then trees with the same number of leaves can have different numbers of nodes. For example, Figure 9 shows a decision diagram which can be converted to 2 trees with the same number of leaves but different numbers of nodes, as shown in Fig. 10.
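Relation (10) holds for any d-ary tree in which every internal node has exactly d children; the small check below verifies it on complete d-ary trees (a convenient special case chosen here, not taken from the paper):

```python
def leaves_and_nodes(d, depth):
    """Internal nodes v and leaves l of a complete d-ary tree of given depth."""
    v = sum(d**k for k in range(depth))   # internal nodes on levels 0..depth-1
    l = d**depth                          # leaves, all on the last level
    return l, v

for d in (2, 3, 4):
    for depth in (1, 2, 3):
        l, v = leaves_and_nodes(d, depth)
        assert l == (d - 1) * v + 1       # relation (10)
print("relation (10) holds on all cases checked")
```

Intuitively, each internal node converts one edge slot into d slots, adding d−1 leaves, which is exactly why minimizing leaves and minimizing nodes coincide for fixed d.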
If two trees have the same number of leaves, then the tree in which tests with a larger number of outcomes are assigned to nodes closer to the root will have a smaller number of nodes. Therefore, a reasonable generalization of the MAL and DMAL criteria is to accept as the primary tie-breaking rule (for tests with the same value of Δ_0 and Δ_1, respectively) the preference for tests with a larger number of outcomes, and then, as the secondary tie-breaking rule, the one referring to the size of complexes. Thus, we have:

Definition 9. The (modified) criterion MAL {DMAL} for test selection is defined:
1. Choose the test for which Δ_0 {Δ_1} is smaller.
2. In case of a tie, choose the one with the larger number of outcomes.
3. If there is still a tie, choose the one which partitions the diagram into parts with smaller complexes {into parts in which covers of the same class have larger complexes}.
4. If there is still a tie, choose any test.

Note that in the case when all tests have the same number of outcomes, the above defined MAL and DMAL criteria are equivalent to their previously defined forms (def. 6 and 8).

Although using the MAL or DMAL criterion for test selection will often lead to the optimal tree, in some cases the obtained tree will be sub-optimal. In such cases, a higher degree cost estimate may be needed for the "right" test selection. An example of such a case is given in section 3.3 (Example 4).

Figure 9. A decision diagram involving tests with different numbers of outcomes (empty cells denote DON'T CAREs).

Figure 10. Two decision trees equivalent to the decision diagram in Fig. 9.

3.3 Algorithms and examples

The previous section described two criteria, MAL and DMAL, for test selection, but left unspecified the details of using them for constructing decision trees. This section describes two conversion algorithms, S ('static cover') and D ('dynamic cover'), which employ the criteria MAL and DMAL, respectively.
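The lexicographic preference of Definition 9 translates naturally into a composite sort key. The sketch below assumes each candidate test has been summarized by a record (its Δ estimate, its number of outcomes, and the total cells of the complexes used for tie-breaking); the records are illustrative numbers, not from the paper:

```python
# Assumed per-test summary: (delta, outcomes, complex_cells), where delta is
# the Delta_0 or Delta_1 estimate and complex_cells is the tie-break size.
candidates = {
    "x1": (2, 2, 4),   # larger Delta: eliminated by criterion 1
    "x2": (0, 2, 4),   # ties on Delta with x3, x4; fewer outcomes (criterion 2)
    "x3": (0, 3, 4),   # ties with x4 on Delta and outcomes
    "x4": (0, 3, 6),   # largest complexes: wins by criterion 3
}

def dmal_key(test):
    delta, outcomes, cells = candidates[test]
    # Def. 9: minimize Delta; then maximize outcomes; then maximize complex size.
    return (delta, -outcomes, -cells)

print(min(candidates, key=dmal_key))   # x4
```

Criterion 4 (break any remaining tie arbitrarily) is what `min` does implicitly by keeping the first of several equal-key candidates.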
Although the algorithms are described in the context of using decision diagrams, they can be directly adapted for computer implementation. The algorithms permit someone with practice in recognizing complexes in a diagram to quickly and directly convert a decision diagram into an optimal or near-optimal decision tree. In the latter case, the algorithms give an estimate, Σ_0 or Σ_1, respectively, of the maximum difference (in the number of nodes) between the obtained and the optimal tree. The algorithms assume as given a procedure for constructing the optimal cover of a decision diagram.

Algorithm S

The algorithm uses the MAL criterion for test selection and assumes that the initial reference set R is the optimal cover of the decision diagram (or one of the alternative optimal covers, if such exist). Since a different R can produce different decision trees, in order to obtain the 'best' tree the algorithm may have to be repeated for each alternative cover (unless for some tree the total cost estimate Σ_0 = 0). The algorithm is recommended when there exist only one or very few optimal (or irredundant*) covers.

Step 1: Determine the optimal cover of the decision diagram, and accept it as the reference set R. Assign the set of all tests to T. Set a pointer P to indicate the root of the tree.

* An interesting and, to this author's knowledge, unsolved problem is whether the optimal tree can always be derived from the optimal cover (assuming that splitting complexes or joining previously split parts are the only permissible operations on the cover).

Step 2: For each test from T compute the estimate Δ_0 (def. 3). If for some test x_i, Δ_0(x_i) = 0, go to step 3. If for every test Δ_0 ≠ 0, select a test according to the MAL criterion (def. 9). Let x_i denote the selected test, and 0, 1, ..., d_i−1 be its outcomes.

Step 3: Assign x_i to node P, and the outcomes of x_i, values 0, 1, ..., d_i−1, to the branches of node P (in order from left to right). Split the diagram into d_i
(sub)diagrams D(x_i=0), D(x_i=1), ..., D(x_i=d_i−1). Check if any of these diagrams contains only a complex (or complexes) of the same action class. For each such diagram, assign the name of the action class to the end (leaf) of the branch. Put the remaining diagrams on the list L. If L is empty, then STOP.

Step 4: Apply algorithm S, starting from step 2, to each of the diagrams on the list L. Assume the following initialization for each diagram:
• P points to the node at the end of the branch corresponding to the given diagram;
• T := T \ {x_i}, where \ denotes set subtraction;
• Merge, if possible, any complexes or parts of broken complexes (which lie within the scope of the diagram) of the same action class into larger complexes. If k complexes are merged into 1, subtract the value k−1 from Δ_0(x_i). Accept the final set of complexes as the reference set R. (The above merging is not a necessary operation; if used, it can sometimes improve the final tree.)

After completing the tree, compute the total cost estimate Σ_0 (see theorem 2).

Example 1

Convert the decision table in Fig. 1 to a decision tree using algorithm S (Fig. 3 shows the corresponding decision diagram).

Step 1: The optimal cover of the decision diagram is determined (Fig. 5). (Since only one complex is associated with each decision class, complexes are identified by the symbols denoting the classes.)

R := (A1, A2, (A1,A2), A3, A4, A5), T := (x1, x2, ..., x6)

Step 2: Compute Δ_0 for each x_i ∈ T:

Δ_0(x1) = 0
Δ_0(x2) = 4 (the axes of x2 cut A2, A3, A4, A5)
Δ_0(x3) = 6 (the axes of x3 cut all complexes)
Δ_0(x4) = 0
Δ_0(x5) = 3 (the axes of x5 cut A3, A4, A5)
Δ_0(x6) = 6 (the axes of x6 cut all complexes)

Since Δ_0(x1) = 0, the remaining values of Δ_0 do not have to be computed (unless one wants to derive alternative trees; they were computed here for illustration). Test x1 is selected.
Step 3: x1 is assigned to the root of the tree; the left and right branches of the root are assigned values 0 and 1, respectively. Split the diagram into 2 diagrams, D(x1=0) and D(x1=1). Since both diagrams contain complexes of different classes, L := {D(x1=0), D(x1=1)}.

Step 4: Consider diagram D(x1=0) first. P points to the node which ends the 0 branch from the root. T := (x2, x3, x4, x5, x6). The reference set R := (A1, A2, (A1,A2), A4).

Step 2': Compute Δ_0 for each x_i ∈ T:

Δ_0(x2) = 2 (the axes of x2 cut A2 and A4)
Δ_0(x3) = 4 (the axes of x3 cut A1, (A1,A2), A2, A4)
Δ_0(x4) = 0

Select x4.

Step 3': x4 is assigned to P; the left and right branches are assigned 0 and 1, respectively. The diagram D(x1=0) is split into 2 new diagrams, D(x1=0, x4=0) and D(x1=0, x4=1). Since Δ_0(x4) = 0, no merging is possible. Diagram D(x1=0, x4=1) consists of one complex, A4. Therefore, the end of branch 1 is marked by A4. L := {D(x1=0, x4=0)}.

Step 4': Apply algorithm S, starting from step 2, to diagram D(x1=0, x4=0), assuming the initialization: P points to the node ending branch 0 (of node x4); T := (x2, x3, x5, x6); R := (A1, (A1,A2), A2).

If one continues the algorithm, the end result will be the tree presented in Fig. 11. It is easy to check that Δ_0 for each selected test was 0; therefore the total cost estimate Σ_0 = 0, and the tree is optimal.

Example 2

Convert the extended entry decision table from Fig. 2 to a decision tree using algorithm S (the corresponding decision diagram is shown in Fig. 7).

Step 1: The optimal cover of the diagram is determined (Fig. 7). R := (L1, L2, L3, L4, L5), T := (x1, x2, x3, x4). P points to the root of the tree.

Step 2: Compute the Δ_0 estimate for each x_i ∈ T:

Δ_0(x1) = 2 (the axes of x1 cut complexes L1 and L2)
Δ_0(x2) = 0
Δ_0(x3) = 5 (the axes of x3 cut all complexes)
Δ_0(x4) = 5

Test x2 is selected.

Step 3: x2 is assigned to P (the root of the tree); the branches from P are assigned values 0, 1, 2.
The decision diagram is split into 3 new diagrams, D(x2=0), D(x2=1) and D(x2=2). Diagram D(x2=0) consists of complex L1 of decision class A1, and diagram D(x2=1) of complex L2 of class A2. The ends of branches 0 and 1 are marked A1 and A2, respectively. The list L := {D(x2=2)}.

Figure 11. A space optimal decision tree corresponding to the decision table in Fig. 1 (Example 1).

Step 4: Apply the algorithm, starting from step 2, to diagram D(x2=2). Continuing this process produces the tree presented in Fig. 12. Since Δ_0 for each test was 0, the total cost estimate Σ_0 = 0 and the tree is optimal.

Algorithm D

This algorithm uses the DMAL criterion for test selection. It is particularly recommended when there are quite a few choices of complexes for a cover, and therefore there can be a large number of irredundant (and perhaps also optimal) covers. The algorithm starts with a decision diagram in which the cells of all action classes are treated as separate* (i.e., not included in any complexes).

Step 1: Assign the set of all tests to T. Set P to indicate the root of the tree.

Step 2: For each x_i ∈ T, compute the cost estimate Δ_1 (def. 7). Select the best test by applying the DMAL criterion (def. 9).

Step 3: Assign x_i to node P, and the outcomes of x_i, values 0, 1, ..., d_i−1, to the branches of P. Split the diagram into d_i diagrams D(x_i=0), D(x_i=1), ..., D(x_i=d_i−1). Check if any of the diagrams contains cells of only one action class. For each such diagram, assign the name of the action class to the end of the branch corresponding to the diagram. Put the remaining diagrams on the list L. If L is empty, then STOP.

* This condition is not necessary if the adopted covering algorithm can find optimal covers starting with complexes rather than individual cells.

Figure 12. A space optimal decision tree corresponding to the decision table in Fig. 2 (Example 2).

Step 4: Apply algorithm D, starting from step 2, to each of the diagrams on L.
Assume the following initialization for each diagram:
• P points to the node at the end of the branch corresponding to the diagram;
• T := T \ {x_i}.

After completing the tree, compute Σ_1, i.e., the sum of the values of Δ_1 for the tests assigned to each node of the tree. Σ_1 is the maximum possible difference (in the number of nodes) between the obtained tree and the optimal tree.

A 'shortcut' in executing algorithm D is to mark (record) in the diagram considered at a given step the optimal covers C(E_0), C(E_1), ..., C(E_{d_i−1}) which are generated at this step for computing Δ_1 (def. 7). When Δ_1 is computed in the framework of the subdiagrams (of the above diagram), Δ_0 is determined with C(E_j) as the reference set. If Δ_0 = 0, then Δ_1 = 0. (See Example 3, step 2, for an illustration.)

Example 3

Convert the decision diagram in Fig. 13a to a decision tree using algorithm D.

Step 1: T := (x1, x2, x3, x4). P points to the root of the tree.

Step 2: Compute Δ_1 (def. 7) for each x_i ∈ T:

Δ_1(x1) = 0. The axis of x1 divides cells of action classes 1 and 3. (Cells of classes 2, 4 and 5 are on one side of the axis.) The optimal cover of the cells of class 1, in the original diagram, consists of 3 complexes. The optimal cover of the cells of class 1 in diagram D(x1=0) consists of 1 complex, and in D(x1=1) of 2 complexes. Thus, Δ(x1, class 1) = (1 + 2) − 3 = 0. The optimal cover of class 3 in the original diagram consists of 2 complexes (Fig. 13b). The optimal cover of class 3 in D(x1=0) has 1 complex, and in D(x1=1) also 1 complex; see Fig. 13c. Thus, Δ(x1, class 3) = (1 + 1) − 2 = 0, and, finally, Δ_1(x1) = Δ(x1, class 1) + Δ(x1, class 3) = 0.

Figure 13. Decision diagrams for Example 3.

Figure 14. Optimal decision tree for the decision diagram in Fig. 13a.

Note that the cover of class 3 in Fig.
13b has larger complexes than the cover of this class in Fig. 13c. This is the reason for later selecting test x3, rather than x1, although for both Δ_1 = 0 (see criterion 3 for DMAL in def. 9).

Δ_1(x2) = 2 (because of action classes 4 and 5)
Δ_1(x3) = Δ_1(x4) = 0

We have a tie for x1, x3, x4. Condition 3 of DMAL eliminates test x1. From the remaining x3 and x4, x3 is chosen. To apply the 'shortcut', the optimal cover of class 1 in diagrams D(x3=0) and D(x3=1) is marked (Fig. 13d). If the above cover is taken now as the reference set R, then Δ_0(x4) = 0 in the framework of D(x3=0). This implies that Δ(x4, class 1) = 0, and therefore constructing optimal covers for class 1 in D(x3=0, x4=0) and D(x3=0, x4=1) is avoided.

Step 3: x3 is assigned to P. The diagram is split into 2 diagrams, D(x3=0) and D(x3=1). T := (x1, x2, x4).

Step 4: Algorithm D is now applied separately to diagrams D(x3=0) and D(x3=1). For the left offspring of node x3, test x4 is chosen. Fig. 13e shows the optimal covers of classes 1 and 3 in the subdiagrams D(x3=0, x4=0) and D(x3=0, x4=1). Continuing the algorithm leads to the decision tree in Fig. 14. The tree corresponds to the (optimal) cover shown in Fig. 13f. The total cost estimate Σ_1 = 0; therefore the tree is optimal.

3.4 A comparison of algorithms S and D. Need for higher degree algorithms in some cases.

Algorithm S starts with constructing the optimal cover of the given decision diagram. The need for applying a covering algorithm arises again when some complexes are broken and there is a possibility that their parts, and possibly some other complexes of the same action class, could be merged into larger complexes. In algorithm D, the covering algorithm is applied at every step of test selection, although for ever decreasing (sub)diagrams, and only for classes which are divided by a test (and except when the 'shortcut' is possible).
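Both algorithms share the same recursive skeleton: select a test by its cost estimate, split the (sub)diagram on its outcomes, and emit a leaf when only one action class remains. A schematic sketch of that skeleton, with a cell-level diagram representation and a placeholder estimate assumed (MAL or DMAL would be plugged in where the placeholder stands):

```python
def build_tree(cells, tests, domains, estimate):
    """cells: dict mapping event tuples to their action class.
    Returns a leaf (action name) or a node (test, {outcome: subtree})."""
    actions = set(cells.values())
    if len(actions) == 1:                       # one class left: emit a leaf
        return actions.pop()
    # Step 2: select the test minimizing the cost estimate (MAL or DMAL).
    test = min(tests, key=lambda t: estimate(cells, t))
    i = list(domains).index(test)
    branches = {}
    for outcome in domains[test]:               # Step 3: split into d_i subdiagrams
        sub = {c: a for c, a in cells.items() if c[i] == outcome}
        if sub:                                 # Step 4: recurse on non-empty parts
            branches[outcome] = build_tree(sub, [t for t in tests if t != test],
                                           domains, estimate)
    return (test, branches)

domains = {"x1": (0, 1), "x2": (0, 1)}          # assumed toy diagram
cells = {(0, 0): "A1", (0, 1): "A1", (1, 0): "A2", (1, 1): "A3"}
naive = lambda cells, t: 0                      # placeholder estimate (assumption)
print(build_tree(cells, list(domains), domains, naive))
```

The difference between S and D lives entirely in `estimate` and in what reference set it consults, which is why the two algorithms can share one driver.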
In total, algorithm D requires more applications of the covering procedure and, as a result, takes more computation time. On the other hand, the estimate Δ₁ is more precise than Δ₀, and, therefore, algorithm D may produce a 'better' tree than algorithm S when there are many irredundant covers possible for a given decision diagram. Both algorithms are first degree algorithms and, as the following example shows, may fail to construct the optimal decision tree.

Example 4

Fig. 15 gives an example of a decision diagram for which algorithms S and D (or any other first degree algorithm) fail to produce the optimal decision tree. (The example was constructed by Yasui [22].) Let us compute the estimates Δ₀ and Δ₁ for all tests:

Δ₀(x₁) = 2    Δ₁(x₁) = 2
Δ₀(x₂) = 2    Δ₁(x₂) = 2
Δ₀(x₃) = 3    Δ₁(x₃) = 2
Δ₀(x₄) = 1    Δ₁(x₄) = 1

(The optimal cover shown in Fig. 16 was taken as the reference set for computing Δ₀.) Both algorithms select test x₄ for the root, while this is the only test which does not produce an optimal tree (Figs. 17 and 18). It is easy to see that it is impossible to reject test x₄ by evaluating the effect of only one test on the decision diagram. In this case, one has to take into consideration the effect of a pair of tests, i.e., to apply a second degree algorithm. Thus, to make algorithm S (or D) able to construct the optimal decision tree in this case, one should compute, instead of Δ₀ (Δ₁), the second degree cost estimate Δ₀² (or Δ₁²).

[Figure 15: A counterexample to any first degree algorithm.]

[Figure 16: Optimal cover for the decision diagram in Figure 15.]

[Figure 17: Sub-optimal decision tree for the decision diagram in Fig. 15 (11 nodes).]

[Figure 18: Optimal decision tree for the decision diagram in Fig. 15 (10 nodes).]

In Fig. 19, rectangular boxes include the tests which can be selected for a given node of the tree, together with the value of Δ₀.
From this figure we see that:

Δ₀²(x₁) = Δ₀(x₁) + min{Δ₀(x₂/x₁), Δ₀(x₃/x₁), Δ₀(x₄/x₁)}
                 + min{Δ₀(x₂/x̄₁), Δ₀(x₃/x̄₁), Δ₀(x₄/x̄₁)}
        = 2 + 0 + 0 = 2

Δ₀²(x₄) = 1 + 1 + 1 = 3

where Δ₀(xⱼ/xᵢ) and Δ₀(xⱼ/x̄ᵢ) denote the estimate Δ₀(xⱼ) in the framework of the subdiagrams D(xᵢ=0) and D(xᵢ=1), respectively.

Thus, Δ₀²(x₁) < Δ₀²(x₄), and test x₄ will be rejected.

The author conjectures that for any algorithm of finite degree there exists a decision diagram for which the algorithm will fail to produce the optimal tree. The above example shows that by extending the degree of the algorithms, the class of conversion problems for which the algorithms produce the optimal tree is also extended.

Let kₘ be the maximum degree of the estimate Δ₁ needed so that its computation for a test, a candidate for the root, reaches the leaves of the tree under construction. Obviously, kₘ ≤ n.

Theorem 3: If algorithm D employs the cost estimate Δ₁ of degree kₘ, then the resulting tree is guaranteed to be optimal.

Proof: The estimate Δ₁ (unlike Δ₀) does not assume that any specific optimal (or irredundant) cover has to be computed first and, therefore, is not affected by the existence of more than one optimal (or irredundant) cover. The value of Δ₁ of degree kₘ for a test xᵢ is the minimum number of nodes, above the lower bound given by theorem 1, which can be in any tree whose root is xᵢ. Algorithm D selects the test for which this estimate is minimum. Therefore, the tree with the so assigned root and, recursively, the other nodes, will have the minimum number of nodes, i.e., will be optimal.
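The second degree computation of Δ₀² is just a one-level look-ahead over the subdiagrams. A minimal sketch, assuming the first degree estimates within each subdiagram are already available as dictionaries (the individual per-test values below are illustrative; only the minima, 0 and 0 for x₁ versus 1 and 1 for x₄, come from the text):

```python
def second_degree(delta0_x, sub_estimates):
    """Delta_0^2(x) = Delta_0(x) + sum over the subdiagrams D(x=v) of
    the minimum first degree estimate among the remaining tests."""
    return delta0_x + sum(min(d.values()) for d in sub_estimates)

d2_x1 = second_degree(2, [{'x2': 0, 'x3': 1, 'x4': 1},
                          {'x2': 1, 'x3': 0, 'x4': 1}])   # 2 + 0 + 0 = 2
d2_x4 = second_degree(1, [{'x1': 1, 'x2': 2, 'x3': 1},
                          {'x1': 2, 'x2': 1, 'x3': 1}])   # 1 + 1 + 1 = 3
# d2_x1 < d2_x4, so x4 is rejected as the root.
```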
[Figure 19: Tests which can be selected for each node of the tree, together with the corresponding values of Δ₀ (second degree cost estimates for Example 4).]

It is clear now that, by varying the degree k between 1 and kₘ of the estimate Δ₁, one obtains a spectrum of methods which differ in the trade-off between the computational efficiency and the degree of guarantee that the obtained tree is optimal.

3.5 Time optimal decision trees

In the case of converting a decision diagram to a tree corresponding to the time optimal program (called, for short, a time optimal tree), one assumes that the tests xᵢ are assigned costs indicating the time needed for test evaluation, and that the actions are assigned probabilities of their occurrence. The optimal tree is the tree which has the minimum weighted path length, i.e., the minimum value of

    Σⱼ pⱼ · path-costⱼ ,  j = 1, ..., l        (11)

where

    l - the number of leaves,
    pⱼ - the probability of path j in the tree (suppose action A is assigned to path j, and the probability of action A is p_A; if the number of cells of action A in the decision diagram is c_A, and the number of cells in the complex corresponding to path j is c, then it is assumed that pⱼ = p_A · c / c_A),
    path-costⱼ - the sum of the costs of the tests assigned to the nodes on path j.

Let L be a complex of decision class A. The complex L can be assigned the cost:

    cost(L) = p_L · test-cost(L)        (12)

where

    p_L = p_A · c_L / c_A,
    c_L - the number of cells in L,
    test-cost(L) - the sum of the costs of the tests in complex L.

Selecting test xᵢ for a node of the tree corresponds to partitioning a diagram (subdiagram) D into dᵢ subdiagrams, D(xᵢ=0), D(xᵢ=1), ..., D(xᵢ=dᵢ-1). Let us assume initially that R is an optimal cover of the diagram D, and Rⱼ, j = 0, 1, ..., dᵢ-1, are the parts of the cover lying within the diagrams D(xᵢ=j), j = 0, 1, ..., dᵢ-1, respectively.

Definition 10.
The cost, cost(C), of a cover C is defined as the sum of the costs of its complexes.

The 'incremental' cost of selecting test xᵢ can be estimated as:

    T₀(xᵢ) = Σⱼ cost(Rⱼ) - cost(R)        (13)

Observe now that, since the costs of tests and the probabilities p_A can have arbitrary values, the costs of different optimal covers can also be different (the situation is thus different from the case of space optimal trees). Consequently, in order to use T₀(xᵢ) as a proper analogue of the estimate Δ₀(xᵢ), R in (13) should not be an optimal cover (def. 1), but a cost optimal cover, defined as a cover of the diagram of minimum cost.

In using the T₀ estimate for test selection, it is computationally advantageous to ignore the component cost(R) in (13) until a test is selected, and then to compute the 'complete' value of T₀ (similarly as in computing Δ₁). The 'complete' value of T₀ is needed because the sum of the T₀ estimates over all the nodes of the obtained tree, denoted ΣT₀, specifies (analogously to ΣΔ₀) the maximum possible difference between the cost of the obtained tree and the cost of the optimal one.

In order to obtain an analogue T₁ of the estimate Δ₁, both R and Rⱼ in (13) should be the minimum cost covers of diagram D and diagrams D(xᵢ=j), respectively. The sum of the T₁ estimates over the nodes of the obtained tree, denoted ΣT₁, again plays the same role as ΣΔ₁. Theorem 3 also holds for T₁.

It is interesting to observe that the dynamic programming algorithm by Schumacher and Sevcik [18] is equivalent, at the conceptual level, to computing T₁ⁿ (i.e., the nth degree estimate T₁). One difference is that, instead of the cost of a complex, they use the inversely related notion of the gain, defined as the difference between the sum of the costs of the events in the complex and the cost of the complex. (The gain can equivalently be computed as the probability of the complex multiplied by the sum of the costs of the tests which do not occur in the term expressing the complex.)
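Formulas (11)-(13) translate directly into code. The sketch below is illustrative only; covers are assumed to be given as lists of per-complex (or per-part) costs, a representation chosen here for brevity rather than taken from the paper:

```python
def weighted_path_length(paths):
    """Formula (11): sum over leaves j of p_j * path-cost_j, where each
    path is (probability, list of the costs of the tests on the path)."""
    return sum(p * sum(test_costs) for p, test_costs in paths)

def complex_cost(p_A, c_L, c_A, test_costs):
    """Formula (12): cost(L) = p_L * test-cost(L),
    with p_L = p_A * c_L / c_A."""
    return p_A * c_L / c_A * sum(test_costs)

def t0_estimate(part_costs, cost_R):
    """Formula (13): T0(x_i) = sum_j cost(R_j) - cost(R)."""
    return sum(part_costs) - cost_R
```

For instance, with the cover-part costs that arise for test x₂ in Example 5 below, `t0_estimate([6.0, 15.0, 10.5], 28.5)` reproduces the value 3.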
Also, the order of computing T₁ⁿ in [18] is specified from the leaves of the tree up, while the definition of T₁ⁿ suggests (but does not require) computing it from the root down. The way T₁ⁿ is computed is, of course, a matter of implementation. The way it is done in [21] seems to be efficient, because it constructs the cost optimal cover (corresponding to the final tree) from the single cells up, step by step, building upon the intermediate results. This avoids a repetition of certain operations which would occur if one independently constructed the covers of the subsequent subdiagrams, going from the whole diagram to the individual cells.

It is easy to see, however, that the Schumacher and Sevcik algorithm can be very inefficient in certain cases. This is because it always computes the most costly estimate, T₁ⁿ, even when a lower degree estimate (much less costly) could produce the optimal decision tree (or a 'sufficiently optimal' one, as measured by ΣT₀ or ΣT₁). The following example (taken from [18]) illustrates this observation.

Example 5

Fig. 20 presents a decision diagram and its cost optimal cover. The large size numbers in the cells indicate actions, and the small size numbers their probabilities. The action -1 indicates the logically excludable events (DON'T CAREs), and the action 4 indicates ELSE events (assumed here as having probability 0). The numbers in parentheses (at the axes) indicate the costs of tests. We briefly illustrate here an application of algorithms S and D in which the first degree estimates T₀ and T₁ are used instead of Δ₀ and Δ₁.

1. Compute the cost, cost(R), of the cost optimal cover (see Fig. 20):

    cost(R) = 0.2·30 + 0.5·30 + 0.3·25 = 28.5

2. Compute T₀ (and T₁) for each test:

    T₀(x₁) = T₁(x₁) = Σⱼ cost(Rⱼ) - cost(R) = 0
    T₀(x₂) = T₁(x₂) = 0.2·30 + 0.5·30 + 0.3·35 - cost(R) = 3
    T₀(x₃) = T₁(x₃) = 0.2·35 + 0.25·35 + 0.25·25 + 0.3·25 - cost(R) = 1

Test x₁ is selected for the root.
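The arithmetic of steps 1 and 2 can be rechecked directly (probabilities and test costs as read from Fig. 20):

```python
# Step 1: cost of the cost optimal cover R.
cost_R = 0.2 * 30 + 0.5 * 30 + 0.3 * 25                       # 28.5

# Step 2: first degree estimates for x2 and x3 (x1 gives 0).
T_x2 = 0.2 * 30 + 0.5 * 30 + 0.3 * 35 - cost_R                # 3.0
T_x3 = 0.2 * 35 + 0.25 * 35 + 0.25 * 25 + 0.3 * 25 - cost_R   # 1.0
```

Since x₁ has the smallest (zero) incremental cost, it is selected for the root, as stated above.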
[Figure 20: Decision diagram and the cost optimal cover for Example 5.]

3. Compute T₀ (and T₁) for the test-candidates for the left descendant of the root:

    T₀(x₂) = T₁(x₂) = 0
    T₀(x₃) = T₁(x₃) = 1

Test x₂ is selected.

4. Compute T₀ (and T₁) for the test-candidates for the right descendant of the root:

    T₀(x₃) = T₁(x₃) = 0

The value ΣT₀ = ΣT₁ = 0; thus, the tree is optimal. In fact, the tree is identical to the one obtained in [21], though its derivation required much less computation.

IV. SUMMARY

We have shown that the decision diagram introduced in the paper can be useful both as a conceptual model for describing algorithms and as a practical tool for decision table design and for conversion to space or time optimal decision trees. The advantage of the decision diagram is that the rules in a decision table (or the leaves of a tree) are represented as certain geometrical configurations, and the relationships between the rules are represented as spatial relations between these configurations. For this reason, the decision diagram can also be used as an educational aid, for visually illustrating concepts and algorithms related to decision tables and decision trees.

It may be of interest to the reader to mention here the results of an experiment done by the author comparing the time spent in solving the same problem using a conventional method and using the decision diagram. The problem was to verify (check consistency, completeness and non-redundancy), reduce, and convert to a space optimal decision tree the decision table shown in Fig. 1. The time spent on the various phases of the problem by A (the person who used a conventional method; a faculty member who teaches decision tables) and B (the author, using the decision diagram) is given in Fig. 21. It should be mentioned that the decision tree obtained by person A had 1 more node than the optimal tree obtained using the decision diagram.
Note also that most of the time (10 minutes) in the decision diagram method was spent just on determining the decision diagram (which is a rather mechanical process, not requiring knowledge of decision table algorithms).

    Time (min)                       Conventional method   Decision diagram
    Draw diagram                     -                     1
    Draw complexes in the diagram    -                     9
    Reduce table (determine cover)   13                    1
    Verify                           2                     0'5"
    Convert to tree                  2'30"                 2
    TOTAL                            17'30"                13'5"

Figure 21. Time spent on the various phases of the problem using a conventional method and the decision diagram.

The concept of a kth degree conversion algorithm, also introduced in the paper, permits one to generate a spectrum of conversion algorithms differing in the trade-off between the computational efficiency and the degree of guarantee of decision tree optimality. The algorithms S and D were shown to be applicable to both space and time optimal conversion, and they can use cost estimates of different degrees. When the algorithms do not produce the optimal tree, they give a measure of the maximum possible difference between the obtained and the optimal trees.

ACKNOWLEDGMENTS

The research reported in this paper was supported by a grant from the National Science Foundation, NSF MCS 76-22940. The author thanks Professor Gary Kampen for stimulating discussions and comments, and Tom Dietterich for proofreading the paper.

REFERENCES

1. Alster, T. M. Heuristic algorithms for constructing near-optimal decision trees. Report No. UIUCDCS-R-71-474, Department of Computer Science, University of Illinois, Urbana, IL, Aug. 1971.
2. Bayes, A. T. A dynamic programming algorithm to optimise decision table code. Australian Computer J. 4 (May 1973), 77-79.
3. Fisher, D. L. Data documentation and decision tables. Comm. ACM 18 (Jan. 1965), 26-31.
4. Ganapathy, S., Rajaraman, V. Information theory applied to the conversion of decision tables to computer programs. Comm. ACM 16, 9 (Sept. 1973), 532-39.
5. Jarvis, J. M.
An analysis of programming via decision table compilers. SIGPLAN Notices (ACM Newsletter) 6, 8 (Sept. 1971), 30-32.
6. King, P. J. H. Conversion of decision tables to computer programs by the rule mask technique. Comm. ACM 9, 11 (Nov. 1966), 796-801.
7. Kirk, G. W. Use of decision tables in computer programming. Comm. ACM 8, 1 (Jan. 1965), 41-43.
8. Larson, J., Michalski, R. S. AQVAL/1 (AQ7) user's guide and program description. Report No. UIUCDCS-R-75-731, Department of Computer Science, University of Illinois, Urbana, IL, June 1971.
9. Michalski, R. S. On the quasi-minimal solution of the general covering problem. Proceedings of the V International Symposium on Information Processing (FCIP 69), Vol. A3 (Switching Circuits), Bled, Yugoslavia, Oct. 8-11, 1969, 125-127.
10. Michalski, R. S. A geometrical model for the synthesis of interval covers. Report No. UIUCDCS-R-71-731, Department of Computer Science, University of Illinois, Urbana, IL, June 1975.
11. Michalski, R. S. Synthesis of optimal and quasi-optimal variable-valued logic formulas. Proceedings of the 5th International Symposium on Multiple-Valued Logic, Bloomington, Indiana, May 13-16, 1975, 76-87.
12. Michalski, R. S. A planar geometrical model for representing multidimensional discrete spaces and multiple-valued logic functions. Report No. UIUCDCS-R-78-897, Department of Computer Science, University of Illinois, Urbana, IL, January 1978.
13. Michie, D. AL1 - a package for generating strategies from tables. SIGART Newsletter, No. 59, 1976.
14. Pollack, S. Conversion of limited-entry decision tables to computer programs. Comm. ACM 8, 11 (Nov. 1965), 677-82.
15. Pooch, U. W. Translation of decision tables. Computing Surveys 6 (June 1974), 125-51.
16. Rabin, T. Conversion of limited-entry decision tables into optimal decision trees: fundamental concepts. SIGPLAN Notices (ACM Newsletter) 6 (Sept. 1971), 68-71.
17. Reinwald, L. T., and Soland, R. M.
Conversion of limited-entry decision tables to optimal computer programs - I: Minimum average processing time. J. ACM 13, 3 (July 1966), 339-58; II: Minimum storage requirement. J. ACM 14, 4 (Oct. 1967), 742-55.
18. Schumacher, H., Sevcik, K. C. The synthetic approach to decision table conversion. Comm. ACM 19, 6 (June 1976), 343-51.
19. Shwayder, K. Extending the information theory approach to converting limited-entry decision tables to computer programs. Comm. ACM 17, 9 (Sept. 1974), 532-37.
20. Shwayder, K. Conversion of limited-entry decision tables to computer programs - a proposed modification to Pollack's algorithm. Comm. ACM 14, 2 (Feb. 1971), 69-73.
21. Verhelst, M. The conversion of limited-entry decision tables to optimal and near-optimal flowcharts: two new algorithms. Comm. ACM 15, 11 (Nov. 1972), 974-80.