Report No. UIUCDCS-R-78-898 (UILU-ENG 78 1708)

DESIGNING EXTENDED ENTRY DECISION TABLES AND OPTIMAL DECISION TREES USING DECISION DIAGRAMS

by Ryszard S. Michalski

March 1978

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation under grant NSF MCS 76-22940.

ABSTRACT

The paper introduces the concept of a decision diagram and shows its application to designing extended entry decision tables and converting them to space or time optimal decision trees. A decision diagram is a geometrical representation of a decision table by means of a planar model of a multidimensional discrete space, as described in [12]. Two algorithms for optimal (or suboptimal) space or time conversion are described using decision diagrams. These algorithms are basically decomposition algorithms, but by varying their degree (def. 5), one can obtain a spectrum of algorithms differing in the trade-off between computational efficiency and the degree of guarantee that the solution is optimal. When the algorithms do not guarantee optimality, they give a measure of the maximum possible distance between the obtained and the optimal trees.

Key words and phrases: Limited Entry Decision Tables, Extended Entry Decision Tables, Decision Trees, Conversion Algorithms, Decision Diagram, Logic Diagram.

CR Categories: 8.3

I. INTRODUCTION

There are many practical problems where certain actions or decisions depend on the outcomes of a set of tests. A convenient way of specifying the correspondence between test outcomes and actions is by means of a decision table. Decision tables have found widespread application in computer programming [7,5], data documentation [3], and various other areas of data processing. Recently, in a modified form, they have also found application to certain problems in artificial intelligence [13].

Fig. 1 gives an example of a limited entry decision table, where tests can have only three possible outcomes: YES, NO or IRRELEVANT, denoted in Fig. 1 by 1, 0, -, respectively. Fig. 2 gives an example of an extended entry decision table, where tests can have an arbitrary number of outcomes. Techniques described in this paper are applicable to both limited and extended entry decision tables.

Each column of a decision table specifies a decision rule, which consists of a condition part (a combination of test outcomes) and an action part (an action or sequence of actions which should be taken when the condition part is satisfied). If the order of actions is important, the entries in the action part are integers indicating the order.

In any decision table, test outcomes can take only a finite number of distinct values. Let x_1, x_2, ..., x_n denote tests and D_1, D_2, ..., D_n the corresponding sets of possible outcomes of these tests.
The event space

E = D_1 × D_2 × ... × D_n      (1)

(where × denotes cartesian product) is the set of all possible sequences of test outcomes (events). As was described in [12], the event space E can be represented geometrically on a plane in the form of a diagram. For lack of space, the description of the diagram, and of the rule for recognizing cartesian complexes (see below) in it, also given in [12], is omitted here.

Figure 1. A limited entry decision table.

Figure 2. An extended entry decision table. (* Error: an impossible combination of test outcomes.)

It is therefore recommended that the reader have a prior acquaintance with paper [12]. A basic concept used here is that of an elementary cartesian complex (a special case of a 'cartesian complex' [12]), defined as a set of events, or cells* of a diagram, which can be expressed as a single logical product (a term) of conditions which check whether a test x_i has outcome a_i. Such conditions are written as [x_i = a_i], and terms as products Λ_{i∈I} [x_i = a_i]. If an outcome of a test is irrelevant ('-'), then the condition involving this test is omitted from the term. Thus, each condition part of a rule in a decision table can be expressed as a term, and be represented in a diagram as an elementary cartesian complex (from now on, simply, a complex).
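As a concrete illustration of these definitions, a term can be held as a partial assignment of outcomes to tests, and mechanically expanded into the set of event-space cells (the elementary cartesian complex) it denotes. The sketch below is an assumed toy representation, not from the paper:

```python
from itertools import product

# Outcome sets D_i for each test; three binary tests are an assumed toy example.
domains = {"x1": (0, 1), "x2": (0, 1), "x3": (0, 1)}

def cells(term, domains):
    """Expand a term -- a partial test->outcome assignment, with irrelevant
    ('-') tests simply absent -- into the elementary cartesian complex it
    denotes, i.e. the set of event-space cells satisfying every condition."""
    axes = [[term[t]] if t in term else list(d) for t, d in domains.items()]
    return set(product(*axes))

# The term [x1=1][x3=0] leaves x2 irrelevant, so it denotes a 2-cell complex.
print(sorted(cells({"x1": 1, "x3": 0}, domains)))   # [(1, 0, 0), (1, 1, 0)]
```

With binary tests, a term omitting k of the n tests expands to 2^k cells, which is the k-cube correspondence mentioned in the footnote below.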
A decision diagram, for a given decision table, is constructed by locating in the diagram (representing the space E of test outcomes) the complexes which correspond to the condition part of every rule, and marking them by the actions specified in the action part. A complex (or a cell) marked by action A is called in the sequel a complex (or cell) of class A. Fig. 3 and 4 present the decision diagrams representing the decision tables in Fig. 1 and 2, respectively. It may be a useful exercise for the reader to check the correspondence between the rules in the decision tables and the corresponding complexes in the decision diagrams.

In this paper, decision diagrams are used both as a conceptual geometrical model for describing algorithms, and as a visual aid for solving problems. A significant advantage of decision diagrams lies in the fact that it is much easier (for humans) to see differences and similarities between geometrical configurations than between strings of numbers or symbols.

* In the case of binary tests (i.e., when a_i ∈ {0,1}), a complex of 2^k cells corresponds to a k-cube (a subset of 2^k vertices of an n-dimensional hypercube).

Figure 3. Decision diagram representing the decision table in Fig. 1.

Figure 4. Decision diagram representing the decision table in Fig. 2 (empty cells correspond to ELSE conditions).

A description of algorithms in terms of geometrical constructs (which can be visualized) has therefore a great appeal, both for scientific communication and education. In the past, many authors used the concept of an n-dimensional hypercube and its subsets, k-cubes, for representing an event space and logical products, respectively. A hypercube, however, can be directly visualized only when there are no more than 3 variables; when there are more than 3 variables, it rapidly loses its value as a geometrical model. When the variables can take more than 2 values (as in our case), the concept of a hypercube is even less adequate.
Although a form of diagrams with binary tests (Karnaugh maps) has been used in the past for solving problems related to limited entry decision tables, this is the first paper, to the author's knowledge, which demonstrates the usefulness of diagrams for extended entry decision tables, and uses them systematically as a conceptual model for presenting and analyzing algorithms. The paper also demonstrates that decision diagrams are a useful practical tool (when the use of a computer is not necessary) for directly solving various problems related to decision tables, such as testing decision tables for redundancy, consistency and completeness, optimizing decision tables, and quickly converting them to optimal (or near-optimal) decision trees.

Chapter 2 describes the use of decision diagrams in designing and optimizing decision tables. Chapter 3 gives a theoretical analysis of the problem of converting decision tables to (space or time) optimal decision trees, and describes two first degree conversion algorithms. Chapter 3 also demonstrates a need, in some cases, for conversion algorithms of degree higher than first, and shows that such algorithms can be easily obtained from the first degree algorithms.

II. USE OF DECISION DIAGRAMS IN DESIGNING DECISION TABLES

2.1 Testing decision tables for redundancy, consistency and completeness

A well designed decision table should be non-redundant, consistent and complete.* These properties can be easily tested once a decision diagram has been constructed for the given decision table. Redundancy occurs if the decision diagram contains complexes of the same action class which have a non-empty intersection. For example, in Fig. 3, the complexes representing rules 1 and 2 of action class A_1 intersect, and therefore the decision table in Fig. 1 is redundant. If intersecting complexes are of different action classes, then the decision table is inconsistent.
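These intersection tests are mechanical once condition parts are held as terms. The brute-force sketch below (the representation and the toy rules are assumptions, not from the paper) flags a cell as redundant when two rules of the same action class cover it, and as inconsistent when the classes differ:

```python
from itertools import product

tests = ["x1", "x2"]                          # assumed toy example: two binary tests
domains = {"x1": (0, 1), "x2": (0, 1)}

def expand(term):
    """Cells of the event space E covered by a term ('-' entries simply omitted)."""
    return {c for c in product(*(domains[t] for t in tests))
            if all(c[tests.index(t)] == v for t, v in term.items())}

# Assumed rules: (condition term, action class). Rules 1 and 2 overlap on (0, 1).
rules = [({"x1": 0}, "A1"), ({"x1": 0, "x2": 1}, "A1"), ({"x1": 1}, "A2")]

seen = {}
for term, action in rules:
    for cell in expand(term):
        if cell in seen:
            # Same class on an overlap: redundant; different class: inconsistent.
            print("redundant" if seen[cell] == action else "inconsistent", cell)
        seen[cell] = action
```

On a diagram this is read off visually; the enumeration above is only the cell-by-cell equivalent of looking for overlapping complexes.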
The decision table is complete if every cell of the decision diagram belongs to (is covered by) some complex. We can see in Fig. 3 that the decision table from Fig. 1 is redundant, consistent and complete; and in Fig. 4, that the decision table from Fig. 2 is irredundant, consistent and complete (the table would be incomplete if there were no ELSE rule).

2.2 Optimization of a decision table

It is usually desirable that a decision table contain the minimum number of rules sufficient for specifying the given decision problem while preserving the requirements of non-redundancy**, consistency and completeness. In a decision diagram, a reduction in the number of rules occurs when two or more complexes of the same action class are merged (or rearranged) into a smaller number of complexes. The theoretical basis for merging complexes is given by the simplification rule:

L[x_i=0] ∨ L[x_i=1] ∨ ... ∨ L[x_i=d_i−1] = L      (2)

where L is a term, L[x_i=a] is the logical product of L with the condition [x_i=a], and {0, 1, ..., d_i−1} is the set of all possible outcomes of test x_i, i.e., D_i.

* A decision table is: redundant, if there is a combination of test outcomes which satisfies the condition part of more than one rule with the same action part; inconsistent, if there is a situation as above but the action parts are different; complete, if it contains a rule for every sequence of test outcomes.

** If one permits redundancy, the number of rules can sometimes be further reduced.

The rule (2), applied to a decision diagram, says that if complexes of the same action class differ in the outcome of only one test, and the test takes on all possible values in these complexes, then the complexes can be merged into one complex not involving this test at all. If certain combinations of test outcomes can never occur (are 'DON'T CAREs'), then the cells corresponding to them (empty cells in Fig. 4) can be included in any complex if this can help to merge complexes.
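Rule (2) can be applied mechanically: group terms of one action class by everything except the chosen test, and drop that test wherever its full outcome set D_i is present in a group. A minimal sketch, with the term representation assumed (not the paper's):

```python
def try_merge(terms, test, domain):
    """Apply simplification rule (2): terms of one action class that differ
    only in `test`, and jointly take every outcome in `domain` there, merge
    into a single term not involving `test` at all."""
    def rest(t):  # the term with `test` removed, as a hashable grouping key
        return tuple(sorted((k, v) for k, v in t.items() if k != test))
    groups = {}
    for t in terms:
        groups.setdefault(rest(t), set()).add(t.get(test))
    merged = []
    for key, outcomes in groups.items():
        if outcomes >= set(domain):          # all d_i outcomes present: merge
            merged.append(dict(key))         # the test drops out of the term
        else:                                # otherwise keep the terms as they are
            merged.extend(t for t in terms if rest(t) == key)
    return merged

# [x1=0][x2=0] v [x1=0][x2=1]  =>  [x1=0]
print(try_merge([{"x1": 0, "x2": 0}, {"x1": 0, "x2": 1}], "x2", (0, 1)))
```

DON'T CARE cells could be accommodated by letting them contribute any missing outcome to a group before the `outcomes >= set(domain)` check.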
Let {A_i}, i = 1, 2, ..., m, denote the set of all action classes, and E_i the set of all cells of action class A_i in a given decision diagram. A cover C(A_j) of action class A_j is a set {C_k} of complexes whose union includes (covers) set E_j and does not cover any cells of other action classes:

E_j ⊆ ⋃_k C_k ⊆ E \ ⋃_{i≠j} E_i

The difference between c_0 + c_1 and c is computed:

Δ(x_i, A) = c_0 + c_1 − c      (7)

where c is the cardinality of the optimal cover of the cells of class A in the given diagram, and c_0, c_1 are the cardinalities of the optimal covers of the cells of class A in the two subdiagrams produced by selecting test x_i. Let K denote all action classes whose cells are partitioned by selecting test x_i. Δ(x_i, A) is determined for each A ∈ K, and then their sum is computed:

Δ_1(x_i) = Σ_{A∈K} Δ(x_i, A)      (8)

Definition 7. Δ_1(x_i) is called Δ_1, or the dynamic cost estimate, for x_i.

To see that Δ_1 is a form of the first degree cost estimate, assume that the reference set R is the union of covers

R = C(E) ∪ ⋃_i (C(E_0^i) ∪ C(E_1^i))      (9)

where i scans the partitions of set E by all tests being considered for an assignment to the given node. Δ_1 can then be viewed as a property of the relation between R and x_i. In using Δ_1 for test selection one can ignore the value c in (7), since it remains the same for each test. Only when a test is selected can one compute the 'complete' value of Δ_1, which is needed for computing the total cost estimate Σ_1, as defined later.

* A cover is irredundant if removing or decreasing any complex in it makes the resulting set not a cover.

The Δ_1 estimate does not have the previously mentioned disadvantages of Δ_0, and is clearly more precise in estimating the effect of a test selection on the final decision tree. Its computation, however, is more complex, because at each step of test selection the optimal cover of (ever decreasing) subdiagrams has to be computed (note, however, that after selecting a test, the cardinalities of C(E_0) and C(E_1) can be used in computing Δ_1 for test selection at the next level of the tree). A 'shortcut' in computing Δ_1 is also possible.
Namely, if Δ_0 = 0, then obviously Δ_1 = 0, and Δ_1 does not have to be computed in such a case (for details, see Example 3 and the remarks after algorithm D).

A question arises of which test to select when Δ_1 is the same for more than one test. In computing Δ_1, the sizes of the covers C(E_0) and C(E_1) were not taken into consideration. If the cardinality of C^i = C(E_0^i) ∪ C(E_1^i) is the same for different values of i (see (9)), then Δ_1 for the corresponding tests is also the same. The covers C^i may, however, consist of complexes of different sizes (because of the DON'T CAREs). The larger the complex, the fewer tests are involved in its expression, and the corresponding complex can potentially be represented by a leaf at a higher level of the tree. This may reduce the number of nodes (see Example 3). Consequently, a reasonable tie-breaking rule is to select the test for which C^i consists of larger complexes (i.e., the total number of cells in the complexes of C^i is larger). When there is still a tie, any test can be selected.

Definition 8. The above defined criterion for test selection is called the criterion of dynamically minimizing added leaves (DMAL).

The DMAL criterion is a form of first order cost estimate (as is MAL). Assuming that Σ_1 denotes the sum of the estimate Δ_1 over all nodes in a tree, theorem 2 also holds for Σ_1.

Other first degree cost estimates

Pollack [14] describes a first order cost estimate in which complexes broken by a test are assigned weights (called 'column-counts') equal to the number of cells they consist of. The cost estimate ('dash-count') for a test is the sum of the weights of the complexes broken by selecting the test. (An assumption is made in [14] that each action class is represented by only one complex. Thus, the issue of alternative covers is not considered there, which strongly limits the applicability of the method.) According to the above estimate, breaking, e.g., 4 two-cell complexes is equivalent to breaking 1 eight-cell complex.
Breaking 4 two-cell complexes adds 4 more nodes, however, while breaking 1 eight-cell complex adds only one node (although, if this complex is broken again in subsequent test selections, 7 more nodes could potentially be added to the tree). It is assumed here, of course, that no subsequent merging of complexes is done. In view of what was said before about the fast decrease of the probability that a complex is broken several times, such an estimate does not seem sufficiently justified (in fact, it is easy to find an example for which such an estimate will select a wrong test, while the simpler MAL criterion will select a right one). (The tie-breaking criterion used in [14], called DELTA, favors an imbalance in the number and sizes of complexes on the two sides of the axes of a test, i.e., in the parts of the diagram defined by values 0 and 1 of the test. It is unclear how to justify such a criterion, and there is a simple counter-example to it.)

Alster [1] describes a first degree cost estimate where the weight given to a broken complex is 2^k, where k is the total number of reduced variables ('dashes') in all (essential) complexes in a cover of an action class (i.e., if there are, for example, 3 two-cell complexes in an action class and any one of them is broken by the test, then it will be given weight 2^3 = 8). Thus, if there is only one complex in an action class, the estimate is equivalent to Pollack's dash-count estimate; but if there are a few complexes in a class, and each complex has more than one cell (i.e., has at least 1 dash), then any broken complex from this group will be given a very large weight. Here again, for the reasons discussed before, such a criterion seems insufficiently justified, and it is easy to find a counter-example to it (for which both the MAL and the DMAL criterion select a correct test). When there are alternative (non-essential) complexes, paper [1] advocates the creation of OR-groups.
The weight of broken complexes in such a group is divided by the cardinality of the group. The aim of this measure is to take into account the fact that the larger the OR-groups are, the more likely it is that a cover exists whose complexes will not be broken by the test under consideration. Note that in computing Δ_1, rather than attempting to estimate (by the above or any other measure) the probability that such a cover exists, one simply searches for the cover directly. This is computationally acceptable, and at the same time the estimate is more precise.

CASE 2: Conversion of extended entry decision tables.

In an extended entry decision table, tests may have an arbitrary number of outcomes. Consequently, the equivalent trees may have an arbitrary number of branches. Let us assume first that all tests have the same number of outcomes, equal to d, so that the corresponding decision tree will be d-ary. In a d-ary tree, the number of leaves is

l = (d−1)v + 1      (10)

where v is the number of nodes. Thus, as in the case of binary trees, a d-ary tree with the minimum number of leaves also has the minimum number of nodes. Both criteria of test selection, MAL and DMAL, can be adopted here without modification, as is shown below.

Selecting a test now corresponds to partitioning a diagram into d parts. Consequently, if a test breaks a complex, it breaks it into d smaller complexes. This adds d−1 leaves and 1 node to the tree. Therefore, decreasing the number of complexes which are broken by a test decreases the number of nodes. The value of d makes no difference with regard to which test should be selected. Both principles, MAL and DMAL, can be directly applied.

If, however, tests can have different numbers of outcomes, then trees with the same number of leaves can have different numbers of nodes. For example, Figure 9 shows a decision diagram which can be converted to 2 trees with the same number of leaves but different numbers of nodes, as shown in Fig. 10.
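Relation (10) holds for any d-ary tree in which every internal node has exactly d children; the small check below verifies it on complete d-ary trees (a convenient special case chosen here, not taken from the paper):

```python
def leaves_and_nodes(d, depth):
    """Internal nodes v and leaves l of a complete d-ary tree of given depth."""
    v = sum(d**k for k in range(depth))   # internal nodes on levels 0..depth-1
    l = d**depth                          # leaves, all on the last level
    return l, v

for d in (2, 3, 4):
    for depth in (1, 2, 3):
        l, v = leaves_and_nodes(d, depth)
        assert l == (d - 1) * v + 1       # relation (10)
print("relation (10) holds on all cases checked")
```

Intuitively, each internal node converts one edge slot into d slots, adding d−1 leaves, which is exactly why minimizing leaves and minimizing nodes coincide for fixed d.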
If two trees have the same number of leaves, then the tree in which tests with a larger number of outcomes are assigned to nodes closer to the root will have a smaller number of nodes. Therefore, a reasonable generalization of the MAL and DMAL criteria is to accept as the primary tie-breaking rule (for tests with the same value of Δ_0 and Δ_1, respectively) the preference for tests with a larger number of outcomes, and then, as the secondary tie-breaking rule, the one referring to the size of complexes. Thus, we have:

Definition 9. The (modified) criterion MAL {DMAL} for test selection is defined:
1. Choose the test for which Δ_0 {Δ_1} is smaller.
2. In case of a tie, choose the one with the larger number of outcomes.
3. If there is still a tie, choose the one which partitions the diagram into parts with smaller complexes {into parts in which covers of the same class have larger complexes}.
4. If there is still a tie, choose any test.

Note that in the case when all tests have the same number of outcomes, the above defined MAL and DMAL criteria are equivalent to their previously defined forms (def. 6 and 8).

Although using the MAL or DMAL criterion for test selection will often lead to the optimal tree, in some cases the obtained tree will be sub-optimal. In such cases, a higher degree cost estimate may be needed for the "right" test selection. An example of such a case is given in section 3.3 (Example 4).

Figure 9. A decision diagram involving tests with different numbers of outcomes (empty cells denote DON'T CAREs).

Figure 10. Two decision trees equivalent to the decision diagram in Fig. 9.

3.3 Algorithms and examples

The previous section described two criteria, MAL and DMAL, for test selection, but left unspecified the details of using them for constructing decision trees. This section describes two conversion algorithms, S ('static cover') and D ('dynamic cover'), which employ the criteria MAL and DMAL, respectively.
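The lexicographic preference of Definition 9 translates naturally into a composite sort key. The sketch below assumes each candidate test has been summarized by a record (its Δ estimate, its number of outcomes, and the total cells of the complexes used for tie-breaking); the records are illustrative numbers, not from the paper:

```python
# Assumed per-test summary: (delta, outcomes, complex_cells), where delta is
# the Delta_0 or Delta_1 estimate and complex_cells is the tie-break size.
candidates = {
    "x1": (2, 2, 4),   # larger Delta: eliminated by criterion 1
    "x2": (0, 2, 4),   # ties on Delta with x3, x4; fewer outcomes (criterion 2)
    "x3": (0, 3, 4),   # ties with x4 on Delta and outcomes
    "x4": (0, 3, 6),   # largest complexes: wins by criterion 3
}

def dmal_key(test):
    delta, outcomes, cells = candidates[test]
    # Def. 9: minimize Delta; then maximize outcomes; then maximize complex size.
    return (delta, -outcomes, -cells)

print(min(candidates, key=dmal_key))   # x4
```

Criterion 4 (break any remaining tie arbitrarily) is what `min` does implicitly by keeping the first of several equal-key candidates.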
Although the algorithms are described in the context of using decision diagrams, they can be directly adapted for computer implementation. The algorithms permit someone with practice in recognizing complexes in a diagram to quickly and directly convert a decision diagram into an optimal or near-optimal decision tree. In the latter case, the algorithms give an estimate, Σ_0 or Σ_1, respectively, of the maximum difference (in the number of nodes) between the obtained and the optimal tree. The algorithms assume as given a procedure for constructing the optimal cover of a decision diagram.

Algorithm S

The algorithm uses the MAL criterion for test selection and assumes that the initial reference set R is the optimal cover of the decision diagram (or one of the alternative optimal covers, if such exist). Since a different R can produce different decision trees, in order to obtain the 'best' tree the algorithm may have to be repeated for each alternative cover (unless for some tree the total cost estimate Σ_0 = 0). The algorithm is recommended when there exist only one or very few optimal (or irredundant*) covers.

Step 1: Determine the optimal cover of the decision diagram, and accept it as the reference set R. Assign the set of all tests to T. Set a pointer P to indicate the root of the tree.

* An interesting and, to this author's knowledge, unsolved problem is whether the optimal tree can always be derived from the optimal cover (assuming that splitting complexes or joining previously split parts are the only permissible operations on the cover).

Step 2: For each test from T compute the estimate Δ_0 (def. 3). If for some test x_i, Δ_0(x_i) = 0, go to step 3. If for every test Δ_0 ≠ 0, select a test according to the MAL criterion (def. 9). Let x_i denote the selected test, and 0, 1, ..., d_i−1 be its outcomes.

Step 3: Assign x_i to node P, and the outcomes of x_i, values 0, 1, ..., d_i−1, to the branches of node P (in order from left to right). Split the diagram into d_i
(sub)diagrams D(x_i=0), D(x_i=1), ..., D(x_i=d_i−1). Check if any of these diagrams contains only a complex (or complexes) of the same action class. For each such diagram, assign the name of the action class to the end (leaf) of the branch. Put the remaining diagrams on the list L. If L is empty, then STOP.

Step 4: Apply algorithm S, starting from step 2, to each of the diagrams on the list L. Assume the following initialization for each diagram:
• P points to the node at the end of the branch corresponding to the given diagram;
• T := T \ {x_i}, where \ denotes set subtraction;
• Merge, if possible, any complexes or parts of broken complexes (which lie within the scope of the diagram) of the same action class into larger complexes. If k complexes are merged into 1, subtract the value k−1 from Δ_0(x_i). Accept the final set of complexes as the reference set R. (The above merging is not a necessary operation; if used, it can sometimes improve the final tree.)

After completing the tree, compute the total cost estimate Σ_0 (see theorem 2).

Example 1

Convert the decision table in Fig. 1 to a decision tree using algorithm S (Fig. 3 shows the corresponding decision diagram).

Step 1: The optimal cover of the decision diagram is determined (Fig. 5). (Since only one complex is associated with each decision class, complexes are identified by the symbols denoting the classes.)

R := (A1, A2, (A1,A2), A3, A4, A5), T := (x1, x2, ..., x6)

Step 2: Compute Δ_0 for each x_i ∈ T:

Δ_0(x1) = 0
Δ_0(x2) = 4 (the axes of x2 cut A2, A3, A4, A5)
Δ_0(x3) = 6 (the axes of x3 cut all complexes)
Δ_0(x4) = 0
Δ_0(x5) = 3 (the axes of x5 cut A3, A4, A5)
Δ_0(x6) = 6 (the axes of x6 cut all complexes)

Since Δ_0(x1) = 0, the remaining values of Δ_0 do not have to be computed (unless one wants to derive alternative trees; they were computed here for illustration). Test x1 is selected.
Step 3: x1 is assigned to the root of the tree; the left and right branches of the root are assigned values 0 and 1, respectively. Split the diagram into 2 diagrams, D(x1=0) and D(x1=1). Since both diagrams contain complexes of different classes, L := {D(x1=0), D(x1=1)}.

Step 4: Consider diagram D(x1=0) first. P points to the node which ends the 0 branch from the root. T := (x2, x3, x4, x5, x6). The reference set R := (A1, A2, (A1,A2), A4).

Step 2': Compute Δ_0 for each x_i ∈ T:

Δ_0(x2) = 2 (the axes of x2 cut A2 and A4)
Δ_0(x3) = 4 (the axes of x3 cut A1, (A1,A2), A2, A4)
Δ_0(x4) = 0

Select x4.

Step 3': x4 is assigned to P; the left and right branches are assigned 0 and 1, respectively. The diagram D(x1=0) is split into 2 new diagrams, D(x1=0, x4=0) and D(x1=0, x4=1). Since Δ_0(x4) = 0, no merging is possible. Diagram D(x1=0, x4=1) consists of one complex, A4. Therefore, the end of branch 1 is marked by A4. L := {D(x1=0, x4=0)}.

Step 4': Apply algorithm S, starting from step 2, to diagram D(x1=0, x4=0), assuming the initialization: P points to the node ending branch 0 (of node x4); T := (x2, x3, x5, x6); R := (A1, (A1,A2), A2).

If one continues the algorithm, the end result will be the tree presented in Fig. 11. It is easy to check that Δ_0 for each selected test was 0; therefore the total cost estimate Σ_0 = 0, and the tree is optimal.

Example 2

Convert the extended entry decision table from Fig. 2 to a decision tree using algorithm S (the corresponding decision diagram is shown in Fig. 7).

Step 1: The optimal cover of the diagram is determined (Fig. 7). R := (L1, L2, L3, L4, L5), T := (x1, x2, x3, x4). P points to the root of the tree.

Step 2: Compute the Δ_0 estimate for each x_i ∈ T:

Δ_0(x1) = 2 (the axes of x1 cut complexes L1 and L2)
Δ_0(x2) = 0
Δ_0(x3) = 5 (the axes of x3 cut all complexes)
Δ_0(x4) = 5

Test x2 is selected.

Step 3: x2 is assigned to P (the root of the tree); the branches from P are assigned values 0, 1, 2.
The decision diagram is split into 3 new diagrams, D(x2=0), D(x2=1) and D(x2=2). Diagram D(x2=0) consists of complex L1 of decision class A1, and diagram D(x2=1) of complex L2 of class A2. The ends of branches 0 and 1 are marked A1 and A2, respectively. The list L := {D(x2=2)}.

Figure 11. A space optimal decision tree corresponding to the decision table in Fig. 1 (Example 1).

Step 4: Apply the algorithm, starting from step 2, to diagram D(x2=2). Continuing this process produces the tree presented in Fig. 12. Since Δ_0 for each test was 0, the total cost estimate Σ_0 = 0 and the tree is optimal.

Algorithm D

This algorithm uses the DMAL criterion for test selection. It is particularly recommended when there are quite a few choices of complexes for a cover, and therefore there can be a large number of irredundant (and perhaps also optimal) covers. The algorithm starts with a decision diagram in which the cells of all action classes are treated as separate* (i.e., not included in any complexes).

Step 1: Assign the set of all tests to T. Set P to indicate the root of the tree.

Step 2: For each x_i ∈ T, compute the cost estimate Δ_1 (def. 7). Select the best test by applying the DMAL criterion (def. 9).

Step 3: Assign x_i to node P, and the outcomes of x_i, values 0, 1, ..., d_i−1, to the branches of P. Split the diagram into d_i diagrams D(x_i=0), D(x_i=1), ..., D(x_i=d_i−1). Check if any of the diagrams contains cells of only one action class. For each such diagram, assign the name of the action class to the end of the branch corresponding to the diagram. Put the remaining diagrams on the list L. If L is empty, then STOP.

* This condition is not necessary if the adopted covering algorithm can find optimal covers starting with complexes rather than individual cells.

Figure 12. A space optimal decision tree corresponding to the decision table in Fig. 2 (Example 2).

Step 4: Apply algorithm D, starting from step 2, to each of the diagrams on L.
Assume the following initialization for each diagram:
• P points to the node at the end of the branch corresponding to the diagram;
• T := T \ {x_i}.

After completing the tree, compute Σ_1, i.e., the sum of the values of Δ_1 for the tests assigned to each node of the tree. Σ_1 is the maximum possible difference (in the number of nodes) between the obtained tree and the optimal tree.

A 'shortcut' in executing algorithm D is to mark (record) in the diagram considered at a given step the optimal covers C(E_0), C(E_1), ..., C(E_{d_i−1}) which are generated at this step for computing Δ_1 (def. 7). When Δ_1 is computed in the framework of the subdiagrams (of the above diagram), Δ_0 is determined with C(E_j) as the reference set. If Δ_0 = 0, then Δ_1 = 0. (See Example 3, step 2, for an illustration.)

Example 3

Convert the decision diagram in Fig. 13a to a decision tree using algorithm D.

Step 1: T := (x1, x2, x3, x4). P points to the root of the tree.

Step 2: Compute Δ_1 (def. 7) for each x_i ∈ T:

Δ_1(x1) = 0. The axis of x1 divides cells of action classes 1 and 3. (Cells of classes 2, 4 and 5 are on one side of the axis.) The optimal cover of the cells of class 1, in the original diagram, consists of 3 complexes. The optimal cover of the cells of class 1 in diagram D(x1=0) consists of 1 complex, and in D(x1=1) of 2 complexes. Thus, Δ(x1, class 1) = (1 + 2) − 3 = 0. The optimal cover of class 3 in the original diagram consists of 2 complexes (Fig. 13b). The optimal cover of class 3 in D(x1=0) has 1 complex, and in D(x1=1) also 1 complex; see Fig. 13c. Thus, Δ(x1, class 3) = (1 + 1) − 2 = 0, and, finally, Δ_1(x1) = Δ(x1, class 1) + Δ(x1, class 3) = 0.

Figure 13. Decision diagrams for Example 3.

Figure 14. Optimal decision tree for the decision diagram in Fig. 13a.

Note that the cover of class 3 in Fig.
13b has larger complexes than the cover of this class in Fig. 13c. This is the reason for later selecting test x3, rather than x1, although for both Δ_1 = 0 (see criterion 3 for DMAL in def. 9).

Δ_1(x2) = 2 (because of action classes 4 and 5)
Δ_1(x3) = Δ_1(x4) = 0

We have a tie for x1, x3, x4. Condition 3 of DMAL eliminates test x1. From the remaining x3 and x4, x3 is chosen. To apply the 'shortcut', the optimal cover of class 1 in diagrams D(x3=0) and D(x3=1) is marked (Fig. 13d). If the above cover is taken now as the reference set R, then Δ_0(x4) = 0 in the framework of D(x3=0). This implies that Δ(x4, class 1) = 0, and therefore constructing optimal covers for class 1 in D(x3=0, x4=0) and D(x3=0, x4=1) is avoided.

Step 3: x3 is assigned to P. The diagram is split into 2 diagrams, D(x3=0) and D(x3=1). T := (x1, x2, x4).

Step 4: Algorithm D is now applied separately to diagrams D(x3=0) and D(x3=1). For the left offspring of node x3, test x4 is chosen. Fig. 13e shows the optimal covers of classes 1 and 3 in the subdiagrams D(x3=0, x4=0) and D(x3=0, x4=1). Continuing the algorithm leads to the decision tree in Fig. 14. The tree corresponds to the (optimal) cover shown in Fig. 13f. The total cost estimate Σ_1 = 0; therefore the tree is optimal.

3.4 A comparison of algorithms S and D. Need for higher degree algorithms in some cases.

Algorithm S starts with constructing the optimal cover of the given decision diagram. The need for applying a covering algorithm arises again when some complexes are broken and there is a possibility that their parts, and possibly some other complexes of the same action class, could be merged into larger complexes. In algorithm D, the covering algorithm is applied at every step of test selection, although for ever decreasing (sub)diagrams, and only for classes which are divided by a test (and except when the 'shortcut' is possible).
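Both algorithms share the same recursive skeleton: select a test by its cost estimate, split the (sub)diagram on its outcomes, and emit a leaf when only one action class remains. A schematic sketch of that skeleton, with a cell-level diagram representation and a placeholder estimate assumed (MAL or DMAL would be plugged in where the placeholder stands):

```python
def build_tree(cells, tests, domains, estimate):
    """cells: dict mapping event tuples to their action class.
    Returns a leaf (action name) or a node (test, {outcome: subtree})."""
    actions = set(cells.values())
    if len(actions) == 1:                       # one class left: emit a leaf
        return actions.pop()
    # Step 2: select the test minimizing the cost estimate (MAL or DMAL).
    test = min(tests, key=lambda t: estimate(cells, t))
    i = list(domains).index(test)
    branches = {}
    for outcome in domains[test]:               # Step 3: split into d_i subdiagrams
        sub = {c: a for c, a in cells.items() if c[i] == outcome}
        if sub:                                 # Step 4: recurse on non-empty parts
            branches[outcome] = build_tree(sub, [t for t in tests if t != test],
                                           domains, estimate)
    return (test, branches)

domains = {"x1": (0, 1), "x2": (0, 1)}          # assumed toy diagram
cells = {(0, 0): "A1", (0, 1): "A1", (1, 0): "A2", (1, 1): "A3"}
naive = lambda cells, t: 0                      # placeholder estimate (assumption)
print(build_tree(cells, list(domains), domains, naive))
```

The difference between S and D lives entirely in `estimate` and in what reference set it consults, which is why the two algorithms can share one driver.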
In total, algorithm D requires more applications of the covering procedure and, as a result, takes more computation time. On the other hand, the estimate Δ₁ is more precise than Δ₀, and, therefore, algorithm D may produce a 'better' tree than algorithm S when there are many irredundant covers possible for a given decision diagram. Both algorithms are first degree algorithms and, as the following example shows, may fail to construct the optimal decision tree.

Example 4

Fig. 15 gives an example of a decision diagram for which algorithms S and D (or any other first degree algorithm) fail to produce the optimal decision tree. (The example was constructed by Yasui [22].) Let us compute the estimates Δ₀ and Δ₁ for all tests:

Δ₀(x₁) = 2    Δ₁(x₁) = 2
Δ₀(x₂) = 2    Δ₁(x₂) = 2
Δ₀(x₃) = 3    Δ₁(x₃) = 2
Δ₀(x₄) = 1    Δ₁(x₄) = 1

(The optimal cover shown in Fig. 16 was taken as the reference set for computing Δ₀.) Both algorithms select test x₄ for the root, while this is the only test which does not produce an optimal tree (Figs. 17 and 18). It is easy to see that it is impossible to reject test x₄ by evaluating the effect of only one test on the decision diagram. In this case, one has to take into consideration the effect of a pair of tests, i.e., to apply a second degree algorithm. Thus, to make algorithm S (or D) able to construct the optimal decision tree in this case, one should compute, instead of Δ₀ (Δ₁), the second degree cost estimate Δ₀² (or Δ₁²).

[Figure 15: A counterexample to any first degree algorithm.]

[Figure 16: Optimal cover for the decision diagram in Figure 15.]

[Figure 17: Sub-optimal decision tree for the decision diagram in Fig. 15 (11 nodes).]

[Figure 18: Optimal decision tree for the decision diagram in Fig. 15 (10 nodes).]

In Fig. 19, rectangular boxes include the tests which can be selected for a given node of the tree, together with the value of Δ₀.
From this figure we see that:

Δ₀²(x₁) = Δ₀(x₁) + min{Δ₀(x₂/x₁), Δ₀(x₃/x₁), Δ₀(x₄/x₁)}
                 + min{Δ₀(x₂/x̄₁), Δ₀(x₃/x̄₁), Δ₀(x₄/x̄₁)}
        = 2 + 0 + 0 = 2

Δ₀²(x₄) = 1 + 1 + 1 = 3

where Δ₀(xⱼ/xᵢ) and Δ₀(xⱼ/x̄ᵢ) denote the estimate Δ₀(xⱼ) in the framework of the subdiagrams D(xᵢ=0) and D(xᵢ=1), respectively.

Thus, Δ₀²(x₁) < Δ₀²(x₄), and test x₄ will be rejected.

The author conjectures that for any algorithm of finite degree there exists a decision diagram for which the algorithm will fail to produce the optimal tree. The above example shows that by extending the degree of the algorithms, the class of conversion problems for which the algorithms produce the optimal tree is also extended.

Let kₘ be the maximum degree of the estimate Δ₁ needed so that its computation for a test, a candidate for the root, reaches the leaves of the tree under construction. Obviously, kₘ ≤ n.

Theorem 3: If algorithm D employs the cost estimate Δ₁ of degree kₘ, then the resulting tree is guaranteed to be optimal.

Proof: The estimate Δ₁ (unlike Δ₀) does not assume that any specific optimal (or irredundant) cover has to be computed first and, therefore, is not affected by the existence of more than one optimal (or irredundant) cover. The value of Δ₁ of degree kₘ for a test xᵢ is the minimum number of nodes, above the lower bound given by theorem 1, which can be in any tree whose root is xᵢ. Algorithm D selects the test for which this estimate is minimum. Therefore, the tree with the so assigned root and, recursively, the other nodes, will have the minimum number of nodes, i.e., will be optimal.
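The second degree computation of Δ₀² is just a one-level look-ahead over the subdiagrams. A minimal sketch, assuming the first degree estimates within each subdiagram are already available as dictionaries (the individual per-test values below are illustrative; only the minima, 0 and 0 for x₁ versus 1 and 1 for x₄, come from the text):

```python
def second_degree(delta0_x, sub_estimates):
    """Delta_0^2(x) = Delta_0(x) + sum over the subdiagrams D(x=v) of
    the minimum first degree estimate among the remaining tests."""
    return delta0_x + sum(min(d.values()) for d in sub_estimates)

d2_x1 = second_degree(2, [{'x2': 0, 'x3': 1, 'x4': 1},
                          {'x2': 1, 'x3': 0, 'x4': 1}])   # 2 + 0 + 0 = 2
d2_x4 = second_degree(1, [{'x1': 1, 'x2': 2, 'x3': 1},
                          {'x1': 2, 'x2': 1, 'x3': 1}])   # 1 + 1 + 1 = 3
# d2_x1 < d2_x4, so x4 is rejected as the root.
```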
[Figure 19: Tests which can be selected for each node of the tree, together with the corresponding values of Δ₀ (second degree cost estimates for Example 4).]

It is clear now that, by varying the degree k between 1 and kₘ of the estimate Δ₁, one obtains a spectrum of methods which differ in the trade-off between the computational efficiency and the degree of guarantee that the obtained tree is optimal.

3.5 Time optimal decision trees

In the case of converting a decision diagram to a tree corresponding to the time optimal program (called, for short, a time optimal tree), one assumes that the tests xᵢ are assigned costs indicating the time needed for test evaluation, and that the actions are assigned probabilities of their occurrence. The optimal tree is the tree which has the minimum weighted path length, i.e., the minimum value of

    Σⱼ pⱼ · path-costⱼ ,  j = 1, ..., l        (11)

where

    l - the number of leaves,
    pⱼ - the probability of path j in the tree (suppose action A is assigned to path j, and the probability of action A is p_A; if the number of cells of action A in the decision diagram is c_A, and the number of cells in the complex corresponding to path j is c, then it is assumed that pⱼ = p_A · c / c_A),
    path-costⱼ - the sum of the costs of the tests assigned to the nodes on path j.

Let L be a complex of decision class A. The complex L can be assigned the cost:

    cost(L) = p_L · test-cost(L)        (12)

where

    p_L = p_A · c_L / c_A,
    c_L - the number of cells in L,
    test-cost(L) - the sum of the costs of the tests in complex L.

Selecting test xᵢ for a node of the tree corresponds to partitioning a diagram (subdiagram) D into dᵢ subdiagrams, D(xᵢ=0), D(xᵢ=1), ..., D(xᵢ=dᵢ-1). Let us assume initially that R is an optimal cover of the diagram D, and Rⱼ, j = 0, 1, ..., dᵢ-1, are the parts of the cover lying within the diagrams D(xᵢ=j), j = 0, 1, ..., dᵢ-1, respectively.

Definition 10.
The cost, cost(C), of a cover C is defined as the sum of the costs of its complexes.

The 'incremental' cost of selecting test xᵢ can be estimated as:

    T₀(xᵢ) = Σⱼ cost(Rⱼ) - cost(R)        (13)

Observe now that, since the costs of tests and the probabilities p_A can have arbitrary values, the costs of different optimal covers can also be different (the situation is thus different from the case of space optimal trees). Consequently, in order to use T₀(xᵢ) as a proper analogue of the estimate Δ₀(xᵢ), R in (13) should not be an optimal cover (def. 1), but a cost optimal cover, defined as a cover of the diagram of minimum cost.

In using the T₀ estimate for test selection, it is computationally advantageous to ignore the component cost(R) in (13) until a test is selected, and then to compute the 'complete' value of T₀ (similarly as in computing Δ₁). The 'complete' value of T₀ is needed because the sum of the T₀ estimates over all the nodes of the obtained tree, denoted ΣT₀, specifies (analogously to ΣΔ₀) the maximum possible difference between the cost of the obtained tree and the cost of the optimal one.

In order to obtain an analogue T₁ of the estimate Δ₁, both R and Rⱼ in (13) should be the minimum cost covers of diagram D and diagrams D(xᵢ=j), respectively. The sum of the T₁ estimates over the nodes of the obtained tree, denoted ΣT₁, again plays the same role as ΣΔ₁. Theorem 3 also holds for T₁.

It is interesting to observe that the dynamic programming algorithm by Schumacher and Sevcik [18] is equivalent, at the conceptual level, to computing T₁ⁿ (i.e., the nth degree estimate T₁). One difference is that, instead of the cost of a complex, they use the inversely related notion of the gain, defined as the difference between the sum of the costs of the events in the complex and the cost of the complex. (The gain can equivalently be computed as the probability of the complex multiplied by the sum of the costs of the tests which do not occur in the term expressing the complex.)
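Formulas (11)-(13) translate directly into code. The sketch below is illustrative only; covers are assumed to be given as lists of per-complex (or per-part) costs, a representation chosen here for brevity rather than taken from the paper:

```python
def weighted_path_length(paths):
    """Formula (11): sum over leaves j of p_j * path-cost_j, where each
    path is (probability, list of the costs of the tests on the path)."""
    return sum(p * sum(test_costs) for p, test_costs in paths)

def complex_cost(p_A, c_L, c_A, test_costs):
    """Formula (12): cost(L) = p_L * test-cost(L),
    with p_L = p_A * c_L / c_A."""
    return p_A * c_L / c_A * sum(test_costs)

def t0_estimate(part_costs, cost_R):
    """Formula (13): T0(x_i) = sum_j cost(R_j) - cost(R)."""
    return sum(part_costs) - cost_R
```

For instance, with the cover-part costs that arise for test x₂ in Example 5 below, `t0_estimate([6.0, 15.0, 10.5], 28.5)` reproduces the value 3.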
Also, the order of computing T₁ⁿ in [18] is specified from the leaves of the tree up, while the definition of T₁ⁿ suggests (but does not require) computing it from the root down. The way T₁ⁿ is computed is, of course, a matter of implementation. The way it is done in [21] seems to be efficient, because it constructs the cost optimal cover (corresponding to the final tree) from the single cells up, step by step, building upon the intermediate results. This avoids a repetition of certain operations which would occur if one independently constructed the covers of the subsequent subdiagrams, going from the whole diagram to the individual cells.

It is easy to see, however, that the Schumacher and Sevcik algorithm can be very inefficient in certain cases. This is because it always computes the most costly estimate, T₁ⁿ, even when a lower degree estimate (much less costly) could produce the optimal decision tree (or a 'sufficiently optimal' one, as measured by ΣT₀ or ΣT₁). The following example (taken from [18]) illustrates this observation.

Example 5

Fig. 20 presents a decision diagram and its cost optimal cover. The large size numbers in the cells indicate actions, and the small size numbers their probabilities. The action -1 indicates the logically excludable events (DON'T CAREs), and the action 4 indicates ELSE events (assumed here as having probability 0). The numbers in parentheses (at the axes) indicate the costs of tests. We briefly illustrate here an application of algorithms S and D in which the first degree estimates T₀ and T₁ are used instead of Δ₀ and Δ₁.

1. Compute the cost, cost(R), of the cost optimal cover (see Fig. 20):

    cost(R) = 0.2·30 + 0.5·30 + 0.3·25 = 28.5

2. Compute T₀ (and T₁) for each test:

    T₀(x₁) = T₁(x₁) = Σⱼ cost(Rⱼ) - cost(R) = 0
    T₀(x₂) = T₁(x₂) = 0.2·30 + 0.5·30 + 0.3·35 - cost(R) = 3
    T₀(x₃) = T₁(x₃) = 0.2·35 + 0.25·35 + 0.25·25 + 0.3·25 - cost(R) = 1

Test x₁ is selected for the root.
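The arithmetic of steps 1 and 2 can be rechecked directly (probabilities and test costs as read from Fig. 20):

```python
# Step 1: cost of the cost optimal cover R.
cost_R = 0.2 * 30 + 0.5 * 30 + 0.3 * 25                       # 28.5

# Step 2: first degree estimates for x2 and x3 (x1 gives 0).
T_x2 = 0.2 * 30 + 0.5 * 30 + 0.3 * 35 - cost_R                # 3.0
T_x3 = 0.2 * 35 + 0.25 * 35 + 0.25 * 25 + 0.3 * 25 - cost_R   # 1.0
```

Since x₁ has the smallest (zero) incremental cost, it is selected for the root, as stated above.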
[Figure 20: Decision diagram and the cost optimal cover for Example 5.]

3. Compute T₀ (and T₁) for the test-candidates for the left descendant of the root:

    T₀(x₂) = T₁(x₂) = 0
    T₀(x₃) = T₁(x₃) = 1

Test x₂ is selected.

4. Compute T₀ (and T₁) for the test-candidates for the right descendant of the root:

    T₀(x₃) = T₁(x₃) = 0

The value ΣT₀ = ΣT₁ = 0; thus, the tree is optimal. In fact, the tree is identical to the one obtained in [21], though its derivation required much less computation.

IV. SUMMARY

We have shown that the decision diagram introduced in the paper can be useful both as a conceptual model for describing algorithms and as a practical tool for decision table design and for conversion to space or time optimal decision trees. The advantage of the decision diagram is that the rules in a decision table (or the leaves of a tree) are represented as certain geometrical configurations, and the relationships between the rules are represented as spatial relations between these configurations. For this reason, the decision diagram can also be used as an educational aid, for visually illustrating concepts and algorithms related to decision tables and decision trees.

It may be of interest to the reader to mention here the results of an experiment done by the author comparing the time spent in solving the same problem using a conventional method and using the decision diagram. The problem was to verify (check consistency, completeness and non-redundancy), reduce, and convert to a space optimal decision tree the decision table shown in Fig. 1. The time spent on the various phases of the problem by A (the person who used a conventional method; a faculty member who teaches decision tables) and B (the author, using the decision diagram) is given in Fig. 21. It should be mentioned that the decision tree obtained by person A had 1 more node than the optimal tree obtained using the decision diagram.
Note also that most of the time (10 minutes) in the decision diagram method was spent just on determining the decision diagram (which is a rather mechanical process, not requiring knowledge of decision table algorithms).

    Time (min)                       Conventional method   Decision diagram
    Draw diagram                     -                     1
    Draw complexes in the diagram    -                     9
    Reduce table (determine cover)   13                    1
    Verify                           2                     0'5"
    Convert to tree                  2'30"                 2
    TOTAL                            17'30"                13'5"

Figure 21. Time spent on the various phases of the problem using a conventional method and the decision diagram.

The concept of a kth degree conversion algorithm, also introduced in the paper, permits one to generate a spectrum of conversion algorithms differing in the trade-off between the computational efficiency and the degree of guarantee of decision tree optimality. The algorithms S and D were shown to be applicable to both space and time optimal conversion, and they can use cost estimates of different degrees. When the algorithms do not produce the optimal tree, they give a measure of the maximum possible difference between the obtained and the optimal trees.

ACKNOWLEDGMENTS

The research reported in this paper was supported by a grant from the National Science Foundation, NSF MCS 76-22940. The author thanks Professor Gary Kampen for stimulating discussions and comments, and Tom Dietterich for proofreading the paper.

REFERENCES

1. Alster, T. M. Heuristic algorithms for constructing near-optimal decision trees. Report No. UIUCDCS-R-71-474, Department of Computer Science, University of Illinois, Urbana, IL, Aug. 1971.
2. Bayes, A. T. A dynamic programming algorithm to optimise decision table code. Australian Computer J. 4 (May 1973), 77-79.
3. Fisher, D. L. Data documentation and decision tables. Comm. ACM 18 (Jan. 1965), 26-31.
4. Ganapathy, S., Rajaraman, V. Information theory applied to the conversion of decision tables to computer programs. Comm. ACM 16, 9 (Sept. 1973), 532-39.
5. Jarvis, J. M.
An analysis of programming via decision table compilers. SIGPLAN Notices (ACM Newsletter) 6, 8 (Sept. 1971), 30-32.
6. King, P. J. H. Conversion of decision tables to computer programs by the rule mask technique. Comm. ACM 9, 11 (Nov. 1966), 796-801.
7. Kirk, G. W. Use of decision tables in computer programming. Comm. ACM 8, 1 (Jan. 1965), 41-43.
8. Larson, J., Michalski, R. S. AQVAL/1 (AQ7) user's guide and program description. Report No. UIUCDCS-R-75-731, Department of Computer Science, University of Illinois, Urbana, IL, June 1971.
9. Michalski, R. S. On the quasi-minimal solution of the general covering problem. Proceedings of the V International Symposium on Information Processing (FCIP 69), Vol. A3 (Switching Circuits), Bled, Yugoslavia, Oct. 8-11, 1969, 125-127.
10. Michalski, R. S. A geometrical model for the synthesis of interval covers. Report No. UIUCDCS-R-71-731, Department of Computer Science, University of Illinois, Urbana, IL, June 1975.
11. Michalski, R. S. Synthesis of optimal and quasi-optimal variable-valued logic formulas. Proceedings of the 5th International Symposium on Multiple-Valued Logic, Bloomington, Indiana, May 13-16, 1975, 76-87.
12. Michalski, R. S. A planar geometrical model for representing multidimensional discrete spaces and multiple-valued logic functions. Report No. UIUCDCS-R-78-897, Department of Computer Science, University of Illinois, Urbana, IL, January 1978.
13. Michie, D. AL1 - a package for generating strategies from tables. SIGART Newsletter, No. 59, 1976.
14. Pollack, S. Conversion of limited-entry decision tables to computer programs. Comm. ACM 8, 11 (Nov. 1965), 677-82.
15. Pooch, U. W. Translation of decision tables. Computing Surveys 6 (June 1974), 125-51.
16. Rabin, T. Conversion of limited-entry decision tables into optimal decision trees: fundamental concepts. SIGPLAN Notices (ACM Newsletter) 6 (Sept. 1971), 68-71.
17. Reinwald, L. T., and Soland, R. M.
Conversion of limited-entry decision tables to optimal computer programs - I: Minimum average processing time. J. ACM 13, 3 (July 1966), 339-58; II: Minimum storage requirement. J. ACM 14, 4 (Oct. 1967), 742-55.
18. Schumacher, H., Sevcik, K. C. The synthetic approach to decision table conversion. Comm. ACM 19, 6 (June 1976), 343-51.
19. Shwayder, K. Extending the information theory approach to converting limited-entry decision tables to computer programs. Comm. ACM 17, 9 (Sept. 1974), 532-37.
20. Shwayder, K. Conversion of limited-entry decision tables to computer programs - a proposed modification to Pollack's algorithm. Comm. ACM 14, 2 (Feb. 1971), 69-73.
21. Verhelst, M. The conversion of limited-entry decision tables to optimal and near-optimal flowcharts: two new algorithms. Comm. ACM 15, 11 (Nov. 1972), 974-80.