UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
URBANA, ILLINOIS

Report No. 474

HEURISTIC ALGORITHMS FOR CONSTRUCTING NEAR-OPTIMAL DECISION TREES*

by

Joan Manning Alster

August 17, 1971

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

*This work was submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, August 1971, and was supported in part by the National Science Foundation and the Department of Computer Science.

ACKNOWLEDGMENT

I would like to express my appreciation to Professor Jurg Nievergelt for the guidance he gave me with this thesis. Also, I would like to thank Professor J. N. Snyder, the University of Illinois Department of Computer Science, and the National Science Foundation for providing the computer time necessary for my research.

TABLE OF CONTENTS

ACKNOWLEDGMENT ....................................... iii
1. INTRODUCTION ...................................... 1
2. HEURISTIC ALGORITHMS .............................. 10
   2.1. Algorithm 1 (Constant Cases) ................. 10
   2.2. Algorithm 2 (Weight Function) ................ 12
   2.3. Algorithm 3 (Weight Function) ................ 13
3. AN IMPROVED HEURISTIC ALGORITHM ................... 17
   3.1. Algorithm 4 (Dash-Count) ..................... 24
LIST OF REFERENCES ................................... 29

1. INTRODUCTION

The logical structures of certain types of problems may be represented by decision trees. A decision tree is a binary tree whose internal nodes represent points in time at which decisions must be made to take either the left or the right branch of those nodes. The root of a decision tree represents the status of a problem before any decisions have been made, and the leaves of the tree represent all possible outcomes of the problem which could result from the combinations of decisions made at the internal nodes.

There has been a great deal of study done on related tree problems. Two major areas of study have been optimal search trees and Huffman's tree constructions for minimum-redundancy codes. Decision trees are, in fact, a generalization of these other types of trees, and many optimality problems which arise in the area of decision trees cannot be solved using the algorithms derived for handling these other, related tree problems.

The problem of decision trees arises in the study of decision tables and in the conversion of limited-entry decision tables to decision trees for the purpose of computer programming. Algorithms have been found for converting a decision table to a computer program which uses a minimum amount of storage. The storage requirement for the program is minimized by using the given decision table to construct a corresponding decision tree which has a minimum number of nodes and constructing the program from this optimal decision tree. There are also algorithms for converting a decision table to a computer program which executes in a minimum amount of time.
The execution time is minimized by assigning a probability, or frequency of occurrence, to each possible outcome (column) in the table and constructing the corresponding decision tree so that the most likely outcomes are resolved at relatively low-level nodes of the tree and less likely outcomes are resolved at the higher-level nodes. The decision tree for minimizing execution time will probably contain more than the minimum number of nodes, because frequently occurring outcomes will be resolved in as few decision steps (nodes) as possible even if this necessitates additional decision steps for resolving outcomes which seldom occur. [For discussion of the two algorithms mentioned here, see Pollack, "Conversion of Limited Entry Decision Tables to Computer Programs," Communications of the ACM, Vol. 8, No. 11, November 1965, pp. 677-682.]

The decision table application of decision trees is not entirely general. Below is an example of a decision table:

            C_1   C_2   ...   C_n
     p_1     N     Y    ...    Y
     p_2     Y     -    ...    N
     ...    ...   ...   ...   ...
     p_m     -     N    ...    Y

                 Figure 1

A particular case, C_k, is determined by whether each of certain predicates (conditions) p_1, p_2, ..., p_m holds (the entry in the table is Y), does not hold (the entry is N), or does not apply (the entry is -). Each -, or "don't-care," entry under a particular case C_k in the table is a substitute for listing two configurations (assignments of Y or N to each p_i) of the p_1, p_2, ..., p_m in that case: one with a Y where the dash occurs, the other with an N where the dash occurs. Thus, cases containing one or more don't-care entries actually include several configurations of p_1, ..., p_m. However, the set of combinations of configurations of p_1, ..., p_m which can be represented by a single column containing don't-care entries is only a subset of all possible combinations of configurations which could be included in a case. There are certain decision tree applications for which it is necessary to be able to assign any combination of configurations of the p_1, ..., p_m to each case (each configuration will belong to only one C_k). Construction of optimal decision trees for such applications cannot be accomplished by using the previously mentioned algorithms devised for converting decision tables to decision trees.

One such general application of decision trees arises in the problem of trying to optimize the efficiency of branching in a computer program. Consider the following simple example, which is illustrated in Figure 2: We are working with the x and y coordinates of points in the Euclidean plane. Assume x and y are non-zero. We wish to branch to different parts of the program depending upon whether we have case 1 (the point is in the first quadrant), case 2 (the point is in the second quadrant), or case 3 (the point is in the third or fourth quadrant).

[Figure 2. The x-y plane: Case 1 is the first quadrant, Case 2 the second quadrant, and Case 3 the lower half-plane (third and fourth quadrants).]

The two possible ways to program this branching process are shown in the two decision trees below.

[Figure 3a. Decision tree that tests x > 0 first and then tests y > 0 on each branch (three test nodes). Figure 3b. Decision tree that tests y > 0 first, resolving case 3 immediately, and tests x > 0 only when y > 0 (two test nodes).]

Since the number of tests required to resolve any case in Figure 3b is always less than or equal to the number of tests required to resolve a case in Figure 3a, Figure 3b represents the preferable programming logic. For this example, there were only two possible decision trees, so it was convenient to examine both trees and choose the better of the two. However, the relatively complex logic of most problems programmed for the computer makes such a trial-and-error analysis highly impractical.
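In program form, the two trees correspond to two orderings of the same tests. The sketch below is a minimal Python illustration (the function and case names are ours, not the report's); following the discussion above, Figure 3a is taken to test x > 0 first and Figure 3b to test y > 0 first.

```python
def classify_figure_3a(x, y):
    """Test ordering of Figure 3a: x > 0 is tested first, then y > 0."""
    if x > 0:
        if y > 0:
            return "case 1"   # first quadrant
        return "case 3"       # fourth quadrant (y < 0)
    if y > 0:
        return "case 2"       # second quadrant
    return "case 3"           # third quadrant (y < 0)

def classify_figure_3b(x, y):
    """Test ordering of Figure 3b: y > 0 is tested first, then x > 0."""
    if y > 0:
        if x > 0:
            return "case 1"   # first quadrant
        return "case 2"       # second quadrant
    return "case 3"           # third or fourth quadrant; resolved by one test
```

Under the first ordering every point requires two tests, while the second resolves case 3 after a single test; for problems with more predicates, enumerating and comparing all such orderings by hand quickly becomes infeasible.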
Therefore, we would like to devise a systematic method for analyzing a programming problem and creating a decision tree which will optimize its branching processes. The remainder of this paper will be concerned with this problem.

A programming branching problem can be represented by a table from which a decision tree can be constructed. For example, the problem represented by Figure 2 can be represented by the following table:

                   C_1    C_2      C_3
     p_1 (x > 0)    1      0      0    1
     p_2 (y > 0)    1      1      0    0

              0 = false, 1 = true

                 Figure 4

There are two predicates (conditions), p_1 and p_2, each of which may be either true or false. Therefore, there are 2^2 = 4 possible combinations for these two predicates. If p_1 and p_2 are both true, we have case C_1; if p_1 is false and p_2 is true, we have case C_2; and if p_2 is false (and p_1 either true or false), we have case C_3.

If we make use of the "don't-care" symbol in constructing our table, then the table in Figure 4 could be represented as the decision table in Figure 5. (More complicated problems generally will not be representable in decision table form.) However, for the present, we shall choose not to use the "don't-care" representation; therefore, if we have n predicates, our table will always contain 2^n columns. This means that if certain combinations of ones and zeroes do not concern us in a particular problem, they still must be entered in the table. However, all these combinations may be grouped into a single case C_k, the "else" case.

            C_1   C_2   C_3
     p_1     1     0     -
     p_2     1     1     0

                 Figure 5

Our present study is based on two assumptions: (1) the cost of making a decision remains constant over all nodes of the decision tree, i.e., the truth value of a particular p_i may be determined with equal ease for all p_i, and (2) each case occurs with equal probability. When these assumptions hold, the best branching logic for a computer program will be that for which the corresponding decision tree has a minimal number of nodes. We shall refer to the tree with the minimal number of nodes as the optimal tree. There may be more than one optimal tree for a given problem.

The search for a systematic method for finding the optimal tree has not yet yielded an algorithm which is guaranteed to produce the optimal tree on the first try. However, I have found several heuristic algorithms which, when applied, greatly reduce the effort that is required to produce the optimal tree by trial-and-error methods. These heuristic algorithms will be discussed in Chapters 2 and 3.

Let us first consider the obvious algorithm of finding an optimal decision tree by exhaustive search. Consider the problem illustrated by the following table:

[Figure 6. A table with n = 4 predicates p_1, ..., p_4 and 2^4 = 16 columns of ones and zeroes, partitioned into three cases C_1, C_2, and C_3.]

Here n = 4, so we have 2^4 = 16 columns in the table. From this table, we shall construct a tree which will have the following structure:

[Figure 7. A binary tree skeleton whose internal nodes are to be filled with some p_i and whose leaves are resolved cases.]

The maximum number of nodes the tree can contain is 15 (= 2^n - 1, where n is the number of predicates), but it may contain fewer than 15 nodes if parts of some cases (some columns from the table) can be resolved without testing all of the p_i. If we conduct an exhaustive trial-and-error search for the optimal tree, how many trees must we check? Realizing that each root-to-leaf path can contain a particular p_i at most once, we see that we have 4 choices for the level-zero node, 3 choices for each of the 2 level-one nodes, 2 choices for each of the 4 level-two nodes, and 1 choice for each of the 8 level-three nodes.
Thus, the maximum number of trees that would have to be inspected is 4·3^2·2^4·1^8 = 576 trees. Of course, if some of the trees have fewer than 15 nodes, there will be fewer trees to investigate, but the number will remain quite large--too large, in fact, to make trial-and-error investigation feasible even when n is as small as 4. When the exhaustive trial-and-error search was programmed and run on the computer, it was found that for this example, 484 trees had to be inspected to discover that there exist two optimal trees with six nodes each.

In general, when there are n predicates, the upper bound for the number of trees which must be inspected to ensure obtaining the optimal tree is given by:

   Maximum number of trees to inspect = n^(2^0) · (n-1)^(2^1) · (n-2)^(2^2) · ... · (n-k)^(2^k) · ... · 1^(2^(n-1))

Obviously, an exhaustive trial-and-error search for the optimal decision tree requires too much work to be feasible.

2. HEURISTIC ALGORITHMS

2.1. Algorithm 1 (Constant Cases)

Step 1: To determine which p_i to select when constructing the decision tree, look at the table and choose the p_i for which the most cases are constant (either all zeroes or all ones), if such a p_i exists. For example, in Figure 4, for p_1, cases C_1 and C_2 are constant, but for p_2, all three cases are constant. Therefore, p_2 should be chosen first. Indeed, as shown by Figures 3a and 3b, p_2 is the preferable choice.

Step 2: If no such p_i exists, proceed as though by the exhaustive search method and apply this "constant case" algorithm to the resulting subtables whenever possible. Whenever this algorithm yields a "best" choice for a particular node, no other p_i's need to be tried for that node (unless, of course, in proceeding through the exhaustive search we change a p_i which lies on the path between the root and our particular node). Therefore, we eliminate looking at some of the trees we might otherwise have considered when searching exhaustively for the optimal tree. Note that when two or more p_i's have the same number of constant cases, each of these p_i's should be tried; they might not all prove to be equally good choices.

To illustrate Step 2, refer to Figure 6. There are no constant cases for any of the p_i's. Therefore, proceed as by the exhaustive search method. Suppose p_1 is chosen to be the root of the tree. Construct two subtables:

[Figure 8. The two subtables obtained by splitting the table of Figure 6 on p_1: one containing the columns with p_1 = 0, the other the columns with p_1 = 1, each listing the entries of p_2, p_3, and p_4 under cases C_1, C_2, and C_3.]

To obtain the left descendant of the root (p_1 = 0), note that both p_2 and p_3 have two constant cases, so both of these predicates must be tried, but p_4 need not be tried. To obtain the right descendant of the root (p_1 = 1), note that p_2 has three constant cases, more than either p_3 or p_4, so choose p_2; p_3 and p_4 never need to be considered. The tree now looks like this:

[Figure 9. A partial tree with p_1 at the root, one of the tied predicates p_2 and p_3 at its left descendant (with a note that the other must also be tried there), and p_2 at its right descendant.]

Now construct two more subtables for each bottom node and continue constructing the tree until all the leaves of the tree are resolved cases. After constructing all possible trees with p_1 at the root (eliminating the consideration of some, of course, by using the algorithm), go back and do the same for roots of p_2, p_3, and p_4. (Since the algorithm did not, in this example, yield any information about which p_i would be a best choice for the root of the decision tree, all roots must be tried.) The optimal decision tree is the best tree found by the above-described search.
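The constant-case count that drives Step 1 is simple to mechanize. The sketch below is a minimal illustration in Python, not the program used in the report; the table is assumed to be a mapping from case names to lists of columns, with each column a tuple of 0/1 entries indexed by predicate.

```python
def constant_case_count(table, i):
    """Number of cases whose entries for predicate i are all 0 or all 1."""
    count = 0
    for columns in table.values():
        entries = {col[i] for col in columns}
        if len(entries) == 1:          # constant within this case
            count += 1
    return count

def best_predicates(table, n):
    """Predicate indices with the most constant cases; ties are all kept,
    since Step 1 requires trying every tied predicate."""
    counts = [constant_case_count(table, i) for i in range(n)]
    best = max(counts)
    return [i for i, c in enumerate(counts) if c == best], best

# Figure 4: p_1 is (x > 0), p_2 is (y > 0); case C_3 holds whenever p_2 is false.
figure_4 = {
    "C1": [(1, 1)],
    "C2": [(0, 1)],
    "C3": [(0, 0), (1, 0)],
}
print(best_predicates(figure_4, 2))    # ([1], 3): p_2 is constant in all three cases
```

Applied to Figure 4 this selects p_2, and applied to Figure 6 it reports zero constant cases for every predicate, which is what forces the fall-back to Step 2.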
When Algorithm 1 was programmed and run on the computer, it was found that for the above example, 36 trees were inspected to find the two optimal trees with six nodes each. This is in contrast to the 484 trees examined in the exhaustive search.

2.2. Algorithm 2 (Weight Function)

Step 1: For each predicate, p_i, calculate a weight given by

   WT_i = Σ (over all cases) Σ (k = 1 to n-1) k·P_k

where (a) P_k is the number of groups of 2^k zeroes or ones there are within a case, (b) no zeroes or ones are counted more than once, and (c) longer strings are counted over shorter strings (i.e., four zeroes would be counted as one group of 2^2 zeroes, not two groups of 2^1 zeroes). For example, if, for a given p_i, one case contains six zeroes and three ones, the weight for that case would be given by:

   wt = 2·1 + 1·2 = 4

(the six zeroes form one group of 2^2 zeroes plus one group of 2^1 zeroes, and the three ones form one group of 2^1 ones with one 1 left over, giving one group of 2^2 and two groups of 2^1). To find WT_i for p_i, sum the weights over all cases.

Step 2: Choose the predicate for which the weight is a maximum. If more than one predicate has the maximum weight, all those with the maximum weight must be tried to ensure that the next node of the decision tree will be filled with the "best" predicate.

We shall evaluate the weights for all the predicates in the example in Figure 6.

   p_1:  WT_1 = (2·1 + 1·1) + (1·1) + (2·1 + 1·1) = 7
   p_2:  WT_2 = (2·1 + 1·1) + (1·1) + (2·1 + 1·1) = 7
   p_3:  WT_3 = (2·1 + 1·1) + (1·1) + (2·1 + 1·1) = 7
   p_4:  WT_4 = (2·1 + 1·1) + (1·1) + (1·2) = 6

The algorithm happens not to be decisive for choosing the root of the decision tree, though it does show that p_4 would be the worst choice for the root (so p_4 need not be considered as a possible root). When Algorithm 2 was run on the computer, 22 trees were inspected to find the two optimal trees with six nodes each. This is somewhat better than the 36 trees inspected when using Algorithm 1.

2.3. Algorithm 3 (Weight Function)

Step 1: For each predicate, p_i, calculate a weight given by

   WT_i = Σ (over all cases) Σ (k = 1 to n-1) k^2·P_k

where the notation is the same as that described for Algorithm 2.

Step 2: Same as Step 2 for Algorithm 2.

Again, we shall evaluate the weights for the predicates in Figure 6.

   p_1:  (2^2·1 + 1^2·1) + (1^2·1) + (2^2·1 + 1^2·1) = 11
   p_2:  (2^2·1 + 1^2·1) + (1^2·1) + (2^2·1 + 1^2·1) = 11
   p_3:  (2^2·1 + 1^2·1) + (1^2·1) + (2^2·1 + 1^2·1) = 11
   p_4:  (2^2·1 + 1^2·1) + (1^2·1) + (1^2·2) = 8

Algorithm 3 has the effect of weighing the larger groups of ones and zeroes more heavily than does Algorithm 2. As with the earlier algorithms, Algorithm 3 is indecisive for selecting a root in the example shown here. However, when this algorithm was run on a computer, it was found that only eight trees needed to be investigated to find the two optimal trees with six nodes each. This is a considerable improvement over the 22 trees which had to be investigated when Algorithm 2 was used.

The three algorithms plus the exhaustive search were programmed for the computer and run for eight different problem situations, each with n = 4 (hand testing is fairly easy for n ≤ 3). The eight trials were not random problem situations, but rather were carefully selected to represent as wide a range of different types of situations as possible. The number of cases varied from two to five. The results are summarized below:

[Figure 10. Summary, for the eight trial problems, of the number of trees inspected by the exhaustive search and by Algorithms 1, 2, and 3.]
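A minimal sketch of the weight computation shared by Algorithms 2 and 3 is given below, using the same illustrative table representation as the earlier sketch. It interprets the grouping rule as a greedy binary decomposition of the number of zeroes and the number of ones within each case, which is how the six-zeroes/three-ones example above decomposes; the function names are assumptions, not code from the report.

```python
def group_sizes(count):
    """Greedy decomposition of `count` into powers of two (its binary expansion),
    returning the exponents k >= 1; single leftover entries (k = 0) are ignored."""
    return [k for k in range(count.bit_length()) if k >= 1 and (count >> k) & 1]

def weight(table, i, exponent=1):
    """WT_i of Algorithm 2 (exponent=1) or Algorithm 3 (exponent=2).
    `table` maps each case to a list of 0/1 columns, as in the earlier sketch."""
    total = 0
    for columns in table.values():
        zeros = sum(1 for col in columns if col[i] == 0)
        ones = len(columns) - zeros
        for k in group_sizes(zeros) + group_sizes(ones):
            total += k ** exponent       # each group of 2^k contributes k (or k^2)
    return total
```

With exponent = 1 the six-zeroes/three-ones case contributes 2·1 + 1·2 = 4, and with exponent = 2 it would contribute 2^2·1 + 1^2·2 = 6, which reflects the way Algorithm 3 weighs larger groups more heavily.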
3. AN IMPROVED HEURISTIC ALGORITHM

After extensive experimentation with Algorithms 1, 2, and 3, I became convinced that the single most important consideration when selecting a "best" predicate is the number of occurrences of zeroes or ones in groups of powers of two within cases. This suggests considering decision tables which contain "don't-care" (-) entries, since the dashes are helpful in indicating where powers of two occur. For example, the case C_k in Figure 11a could be represented as in Figure 11b, with the dash (-) signifying that p_1 is an undesirable choice as far as case C_k is concerned.

          C_k                   C_k
   p_1    0  1             p_1   -
   p_2    1  1             p_2   1
   p_3    1  1             p_3   1
   p_4    0  0             p_4   0

   Figure 11a               Figure 11b

The Quine-McCluskey minimization procedure is used to construct the table with "don't-care" entries. For n ≤ 4, a Karnaugh map is also useful. Let us use the Quine-McCluskey procedure to convert the table in Figure 6 to a table containing "don't-care" entries. Each case must be handled separately, so we begin with C_1. First, we find the prime implicants as shown below:

   Decimal representation    Complete sequences of     Derived "don't-care"
   of binary sequence        p_i's in case C_1         sequences

   (2)                       0010 (v)                  (2,3):       001-  (v)
   (4)                       0100 (v)                  (2,6):       0-10
   (3)                       0011 (v)                  (2,10):      -010  (v)
   (5)                       0101 (v)                  (4,5):       010-
   (6)                       0110 (v)                  (4,6):       01-0
   (10)                      1010 (v)                  (3,11):      -011  (v)
   (11)                      1011 (v)                  (10,11):     101-  (v)

                                                       (2,3,10,11): -01-

   Prime implicants (non-checked sequences):  -01-   0-10   010-   01-0

                              Figure 12

The following comments are made in regard to Figure 12:

(1) The columns in case C_1 in Figure 6 are represented horizontally as complete sequences of p_i's.

(2) To facilitate finding derived sequences, the complete sequences (and thus the derived sequences also) are listed in order of the number of 1's they contain.

(3) In order for two sequences to be combined, they must be identical in all but the one position in which one sequence contains a 1 and the other a 0.

(4) When two sequences are combined, they are checked (v) in the table. The prime implicants are all those sequences which remain unchecked after no more sequences can be combined to form further derived sequences.

(5) Every sequence must be compared with all sequences below it which contain the same number of 1's or one more 1 than the given sequence (even if some of these sequences are already checked), and all possible derived sequences must be written.

After the prime implicants have been obtained, a McCluskey chart is constructed. For our example, the following chart is obtained:

   Derived don't-care   Derived from which     Sequences belonging to case C_1
   sequences            complete sequences     2    3    4    5    6    10   11

   -01-                 (2,3,10,11)            X    X                   X    X
   0-10                 (2,6)                  X                   X
   010-                 (4,5)                            X    X
   01-0                 (4,6)                            X         X

   Essential prime implicants:      -01-   010-
   Non-essential prime implicants:  0-10   01-0   (must have one of these to cover the "6" column)

                              Figure 13

In the new table containing "don't-care" entries, C_1 is represented as follows:

           C_1
   p_1   -   0   [0]       [0]
   p_2   0   1   [-]  or   [1]
   p_3   1   0   [1]       [-]
   p_4   -   -   [0]       [0]

                 Figure 14

The table entries for C_2 and C_3 can also be derived using the Quine-McCluskey procedure to yield the following table:

[Figure 15. The complete "don't-care" table for the problem of Figure 6: C_1 is represented by the columns of Figure 14, and C_2 and C_3 by the don't-care columns obtained from the same procedure.]

Figure 15 could also have been derived, in whole or in part, from the following Karnaugh map, which can be obtained directly from Figure 6. Of course, use of a Karnaugh map is only practical for n ≤ 4. Entries in the map indicate to which case the corresponding sequence of 1's and 0's belongs.

[Karnaugh map for the table of Figure 6, with each cell labeled by the case to which the corresponding sequence of 1's and 0's belongs.]
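The combining step that produces the derived sequences of Figure 12 is entirely mechanical. The following is a minimal Python sketch of a generic Quine-McCluskey prime-implicant pass, written for illustration only and not taken from the report; sequences are strings over '0', '1', and '-', read as p_1 p_2 p_3 p_4.

```python
def combine(seq_a, seq_b):
    """Merge two sequences that differ in exactly one non-dash position;
    return the merged sequence with a dash there, or None if they do not combine."""
    if len(seq_a) != len(seq_b):
        return None
    diff = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    if len(diff) != 1 or '-' in (seq_a[diff[0]], seq_b[diff[0]]):
        return None
    i = diff[0]
    return seq_a[:i] + '-' + seq_a[i + 1:]

def prime_implicants(sequences):
    """Repeatedly combine sequences; those never combined are the prime implicants."""
    current, primes = set(sequences), set()
    while current:
        combined, used = set(), set()
        for a in current:
            for b in current:
                merged = combine(a, b)
                if merged is not None:
                    combined.add(merged)
                    used.update((a, b))
        primes |= (current - used)     # unchecked sequences are prime implicants
        current = combined
    return primes

# The seven complete sequences of case C_1 in Figure 6:
c1 = ["0010", "0011", "0100", "0101", "0110", "1010", "1011"]
print(sorted(prime_implicants(c1)))
# ['-01-', '0-10', '01-0', '010-'], the prime implicants listed in Figure 12
```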
(2) When there is a choice of which non-essential prime implicant (or combination of non-essential prime implicants) to select, list all choices in the table, each in brackets, and join the brackets by the word "or". Call such a group of bracketed entries joined by the word "or" an or-group (each bracketed entry will be called a component of the or-group), and define the or-number of an or-group to be the number of components contained in the or-group. There may be more than one or-group in a case. For example, a case could have the following structure: [ ] or [ ]   [ ] or [ ] or [ ]. The first or-group has an or-number of 2, the second an or-number of 3.

(3) Define the case-count for a case to be 2^r, where

   r = (number of dashes occurring in non-bracketed entries in the case)
       + Σ (over all or-groups in the case) (maximum number of dashes in any component of the or-group).

(4) Define the dash-count for a particular p_i to be the sum of the case-counts corresponding to each non-bracketed dash in the row, plus the sum of (1/2)·(corresponding case-count) for each bracketed dash in the row.

The preceding comments and definitions lead to the statement of Algorithm 4 for constructing an optimal decision tree. The notions of case-count and dash-count resemble those described by Pollack in his article, "Conversion of Limited Entry Decision Tables to Computer Programs" (cited earlier in this paper).

3.1. Algorithm 4 (Dash-Count)

Step 1: Using a Karnaugh map or the Quine-McCluskey procedure plus the bracketing rules described above, construct a table containing "don't-care" entries.

Step 2: Compute the case-count for each case.

Step 3: Compute the dash-count for each p_i.

Step 4: Select the p_i for which the dash-count is a minimum. If more than one p_i has the minimum dash-count, select any of the p_i's with a minimum dash-count.

The case-counts and dash-counts for Figure 15 are shown below:

   Case-counts:   C_1: 2^(3+1) = 2^4 = 16     C_2: 2^1 = 2     C_3: 2^3 = 8

   Dash-counts:   p_1: 16
                  p_2: (1/2)(16) = 8
                  p_3: (1/2)(16) + 8 = 16
                  p_4: 2(16) + 2 + 2(8) = 50

                              Figure 19

Since p_2 has the minimum dash-count, choose p_2 to be the root (p_2 is indeed the root of both optimal trees). Use the original table (not the one with the "don't-care" entries) to create two subtables and apply Algorithm 4 to each of the subtables.

One very significant advantage of Algorithm 4 over the other algorithms is that cases which are of no concern in a particular problem can be designated as such (as a "d" in a Karnaugh map or Quine-McCluskey chart) rather than having to be combined into a single "else" case. For example, suppose we have a problem in which n = 4 and in which we assign only 13 predicate sequences to cases C_1 through C_4. We have no concern for what happens to the three sequences 0010, 0011, and 0111. In order to apply our earlier algorithms, we would first have to combine these three "else" sequences into a single case, C_5. However, for Algorithm 4, this "else" case is not necessary. Suppose the original table is:

[Figure 20. A table with n = 4 in which 13 of the 16 predicate sequences are assigned to cases C_1 through C_4; the sequences 0010, 0011, and 0111 are assigned to no case.]

The following Karnaugh map can be constructed from this table:

[Figure 21. Karnaugh map of the table in Figure 20, with each cell labeled by its case and the three unassigned sequences entered as "d" (don't-care) cells.]

The "d" entries provide a far more accurate representation of the problem than would a fifth, "else" case; and as the Karnaugh map indicates, a far better optimal tree will result when the "d" entries are used.
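Before looking at Figure 22, the case-count and dash-count computations of Figure 19 can be mechanized directly from definitions (2) through (4). The sketch below is an illustrative Python rendering (the data layout and names are assumptions, not from the report): a plain column is a string over '0', '1', '-' indexed by predicate, and an or-group is a list of such strings.

```python
def case_count(case):
    """Case-count (definition (3)): 2**r, where r counts the dashes in the
    non-bracketed columns plus, for each or-group, the maximum number of
    dashes in any one of its components."""
    r = 0
    for col in case:
        if isinstance(col, list):            # or-group (bracketed alternatives)
            r += max(c.count('-') for c in col)
        else:                                # plain (non-bracketed) column
            r += col.count('-')
    return 2 ** r

def dash_count(table, i):
    """Dash-count of predicate i (definition (4)): the full case-count for each
    non-bracketed dash in row i, plus half the case-count for each bracketed dash."""
    total = 0.0
    for case in table.values():
        cc = case_count(case)
        for col in case:
            if isinstance(col, list):
                total += sum(0.5 * cc for c in col if c[i] == '-')
            elif col[i] == '-':
                total += cc
    return total

# Case C_1 of Figure 15, written as in Figure 14 (each column read as p_1 p_2 p_3 p_4):
c1 = ["-01-", "010-", ["0-10", "01-0"]]
print(case_count(c1))                 # 16, i.e. 2**(3+1), as in Figure 19
print(dash_count({"C_1": c1}, 0))     # 16.0: C_1's contribution to p_1's dash-count
print(dash_count({"C_1": c1}, 1))     # 8.0: (1/2)(16), C_1's contribution to p_2's
```

Summing these contributions over all three cases of Figure 15 reproduces the dash-counts of Figure 19.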
Figure 22 shows an optimal decision tree obtained using an "else" case and one obtained using "d" entries.

[Figure 22. Optimal decision tree using C_5 as the "else" case (8 nodes); optimal decision tree using the "d" entries (5 nodes).]

Algorithm 4 has yielded an optimal decision tree in many examples on which it was tested. However, Toshio Yasui (Ph.D. student, University of Illinois) has found the following counterexample to Algorithm 4. Given the decision table in Figure 23, Algorithm 4 indicates that the optimal tree would have p_4 for its root. In fact, however, all the optimal trees have either p_1, p_2, or p_3 for their roots, with a root of p_4 yielding no optimal trees.

[Figure 23. A decision table with four predicates p_1, ..., p_4 and nine cases C_1 through C_9, containing "don't-care" entries.]

Therefore, although Algorithm 4 provides an efficient, systematic method for finding a very "good" decision tree, it will, in some situations, fail to yield the optimal decision tree. Algorithm 4 is judged, however, to be generally more reliable than the algorithms presented earlier in this paper.

LIST OF REFERENCES

Pollack, Solomon L., "Conversion of Limited Entry Decision Tables to Computer Programs," Communications of the ACM, Vol. 8, No. 11, November 1965, pp. 677-682.

Yasui, Toshio, "Some Combinatorial Aspects of Decision Table Optimization Problems," Ph.D. Dissertation (in progress), University of Illinois at Urbana-Champaign, Urbana, Illinois.