UIUCDCS-R-72-501

CONVERSION OF DECISION TABLES INTO DECISION TREES

by

Toshio Yasui

University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

February, 1972

This work was supported in part by the Advanced Research Projects Agency under Contract No. USAF 30(602)4144 and Contract No. DAHC04-72-C-0001 and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois, 1972.

ACKNOWLEDGEMENT

The author wishes to express his deepest gratitude to his supervisor, Professor Jurg Nievergelt of the Department of Computer Science of the University of Illinois, for his guidance, stimulating discussions and suggestions, encouragement, and patience during the past two years. He is also indebted to Professor Daniel L. Slotnick, the director of the Center for Advanced Computation of the University of Illinois, and to the former Illiac IV project for their continued support of this thesis research. He is thankful to Mr. Jay N. Culliney for his proofreading, and special thanks are also due to Mrs. Patresa A. Grennan for typing the original draft and most of the final manuscript. His greatest appreciation goes to Mrs. Gayanne Carpenter for making the arrangements which allowed this thesis to be completed on time and also for typing Chapters 1 and 2, and to Mrs. Glenna D. Ganow for her typing of Chapter 4 of this thesis. Finally, he would like to thank the personnel of the Department of Computer Science Print Shop under the supervision of Mr. Dennis L. Reed.

TABLE OF CONTENTS

1. INTRODUCTION
2. DECISION TABLES AND THEIR CONVERSION PROBLEMS
   2.1. Decision Tables
   2.2. Conversion Problem
   2.3. Cubic Representation of Decision Tables
3. MATHEMATICAL PRELIMINARIES
   3.1. Introduction
   3.2. Algebraic Foundations
   3.3. Partitions of an n-cube and Decision Trees
   3.4. Lattices of n-cube Partitions
4. SOME ASPECTS OF DECISION TREE CONSTRUCTION
   4.1. Introduction
   4.2. How to Construct Decision Trees
   4.3. Comparative Study of Algorithms
   4.4. Bounds of Minimum Total Loss
   4.5. Optimality Discussions for Different Objective Functions
5. A DECOMPOSITION THEORY OF DECISION TABLES AND DECISION TREES
   5.1. Introduction
   5.2. Decomposition Problem and Objective Functions
   5.3. Analysis of Orthogonal Decompositions
   5.4. Synthesis of Orthogonal Decompositions
   5.5. Discussion of Optimal Decompositions
APPENDIX: LITERATURE SURVEY
LIST OF REFERENCES
VITA

1. INTRODUCTION

Decision tables have been widely accepted as a convenient technique for specifying complex logical relationships in such diverse computer application areas as data processing in census studies, process control in manufacturing, management information processing systems, and so on (see, e.g., McDaniel [17], [18], [19] or Pollack et al. [24]).
A key problem for the successful use of decision tables is how to process them efficiently on a computer. One possible way is to convert (preprocess, translate, or compile) a decision table into a special kind of flowchart, known as a decision tree. A simple but reasonable measure of the complexity of such a decision tree is the number of its internal nodes, i.e., the number of decision boxes which appear in it. This is a special case of more elaborate measures of the memory requirement and the average processing time of decision trees. Several methods have been proposed in the literature for converting decision tables into decision trees with the objective of minimizing these two measures of complexity. Some of these, intended primarily as manual methods (Egler [2], Press [25], and Pollack [23]), are based on plausible arguments, but with little theoretical background. Others, such as the use of the branch-and-bound method (Reinwald and Soland [26], [27]), are very general in nature, and might be improved with more knowledge of the specific structure of decision trees. A recent thesis by Garey [3] investigates such problems which are concerned with the structure of optimal decision trees. However, this structure is still not sufficiently well understood to be able to design efficient algorithms for the construction of optimal decision trees. Hence, further investigations of this topic are appropriate.

This thesis is devoted to theoretical investigations of decision table conversion problems. For this purpose we present a simplified model of the optimization problem. A special kind of partition of the set of 2^n vertices of an n-cube is considered as a model of a decision table with n condition rows. The problem of converting such an n-cube partition into a binary decision tree is discussed, based mainly on the simplified objective function mentioned above.

The structure of this thesis is as follows. In Chapter 2, decision tables and their conversion problems are briefly described using conventional terminology. Then, in Chapter 3, we present mathematical concepts and notations which appear throughout the remainder of this thesis. The next two chapters are the main body of this thesis. In Chapter 4 we describe a procedure, called Procedure R, to construct decision trees for a given n-cube partition, and based on this procedure, we propose an algorithm, "iterated local minimization". It does not always yield optimal solutions, but generates suboptimal trees. The resultant decision trees are compared quantitatively with the trees constructed by Pollack's algorithm (Pollack [23]) and with optimal decision trees. This chapter also contains an analysis of "rule-splits", in particular, lower and upper bounds for the minimum number of required rule-splits over all n-cube rule partitions. Since a decision tree which is optimal with respect to one objective function is not necessarily optimal for other objective functions, relationships existing among optimal decision trees under different objective functions are also studied in the chapter.

Chapter 5 is another major division. The entire chapter is devoted to a presentation of a new topic, a decomposition theory of decision tables and decision trees. The recent development of multiprocessor systems and parallel computers provides the motivation behind the study. We consider decomposing decision tables or decision trees into smaller ones so that they can be processed effectively in parallel.
After a theoretical analysis of decomposition, we propose a procedure, called Procedure D, to construct a pair of decision trees from a given n-cube partition. Based on this procedure, a heuristic algorithm is shown.

The Appendix contains a short summary of some of the main contributions to our topic.

2. DECISION TABLES AND THEIR CONVERSION PROBLEMS

2.1. Decision Tables

Although flowcharts are a widely accepted means of describing the logic of computer programs, they have several significant disadvantages which should encourage analysts to seek alternate methods for stating the pertinent aspects of a program. Decision tables provide such an alternative. First some of the disadvantages of the flowcharting technique are listed:

1. Although flowcharts are often very appropriate for describing scientific programs where each box can represent a certain amount of computation, they are often not appropriate for system programs, business data processing, or information retrieval, where a long sequence of logical decisions must be made.

2. Flowcharts for complex problems tend to become lengthy and difficult to follow and modify.

Decision tables tend to overcome these disadvantages while providing some advantages as well.

DOES HE HAVE A GOOD DRIVING RECORD?   Y   Y   Y   N   N   N
IS HE OVER 25 YEARS OF AGE?           Y   N   N   Y   Y   N
IS HE MARRIED?                        -   Y   N   Y   N   -
INSURE                                X   X   X   X   -   -
CHARGE RISK RATE                      -   -   X   X   -   -
REJECT APPLICATION                    -   -   -   -   X   X

TABLE 2.1.

Table 2.1 is an example of a decision table of driver insurability. It has four major sections. The condition stub is the upper left quadrant and contains descriptions of conditions on which decisions are to be based. Conditions are usually represented as questions. The action stub occupies the lower left quadrant and supplies all possible actions for the conditions listed above. The condition entry section is found in the upper right quadrant and answers the questions found in the condition stub. All possible combinations of answers to the questions are formed here, where the responses are restricted to "Y" to indicate "Yes" and "N" to indicate "No". If no response is indicated, then the response need not be checked for that particular question and "-" (dash) is written there. The action entry is the remaining quadrant of the table and indicates the appropriate actions resulting from the conditions above. The only permissible entry here is the "X", to indicate "take this action". One or more actions may be designated for each combination of responses. Each of the various combinations of responses, along with the indicated actions for that combination, is called a rule. The various rules are usually numbered or lettered for identification purposes.

Below we list some advantages of using decision tables.

1. Logic is stated precisely and compactly.

2. Complex logic is easier to understand, and the relationship among variables is readily understood.

3. Decision tables lend themselves to update and change.

4. The tables are appropriate for independent review and documentation.

Here we refer to the case of a large-scale decision table implementation for the Census of Agriculture (1964) (McDaniel [18]). A questionnaire with 335 questions was sent to each farm over 3,100 counties. Then, after 600 to 800 items were tabulated, a decision table of 600 pages was made. It generated 20,000 to 25,000 lines of code and needed 3,000 hours of UNIVAC 1107 computer time. In total, 53 man-years of programming, with 14 man-years on the edit program, were required.
Problems such as these stimulated the development of programming languages which are based on decision tables. Some of these are listed in chronological order of their development (see, e.g., McDaniel [19]): TABSOL, an experimental tabular language for GE machines; LOGTAB for the IBM 704; FORTAB, an extension of FORTRAN with a quite extensive and sophisticated decision table facility; DETAB-X, a COBOL-oriented decision table language; and DETAB-65, a further development of DETAB-X. Associated with the development of these languages, there arose the major problem of finding algorithms which compile decision tables into efficient programs. This problem is the motivation and practical background for our theoretical study.

2.2. Conversion Problem

One important way to process a decision table by a computer is to transform it into a special kind of flowchart known as a decision tree. Since there are usually many decision trees which correspond to a given decision table, the problem arises of finding the ones which are most efficient according to some criteria of optimality. Consider as an example Table 2.2. It is a simplified decision table in that the action stub and action entry are not presented.

C_1     Y    Y    Y    N
C_2     Y    -    N    -
C_3     N    Y    N    -
GO TO   R_1  R_2  R_3  R_4

TABLE 2.2.

This decision table can be realized by each of the decision trees shown in Figures 2.1.(a), 2.1.(b), and 2.1.(c), among others.

FIGURE 2.1.(a). FIGURE 2.1.(b). FIGURE 2.1.(c).

They suggest some significant points about optimizing the conversion of the table into a decision tree. These three trees embody the same logical consequences but differ in the procedures they specify for arriving at the rules. They are not equally good from the viewpoints of memory requirement and processing time. For example, tree (c) always requires that all three conditions be evaluated, while for trees (a) and (b) the number of conditions to be evaluated is sometimes fewer. To characterize this problem the following quantities are introduced:

s_i: memory space required for condition C_i;
t_i: time required for processing condition C_i; and
p_j: probability with which the j-th rule, R_j, occurs.

Then the total memory requirement, S, and the average processing time, T, for each of the decision trees can be calculated as follows:

(a) S_a = s_1 + s_2 + s_3
    T_a = (p_1 + p_3)·(t_1 + t_2 + t_3) + p_2·(t_1 + t_3) + p_4·t_1

(b) S_b = 2s_1 + s_2 + s_3
    T_b = (p_1 + p_3)·(t_1 + t_2 + t_3) + (p_2 + p_4)·(t_1 + t_3)

(c) S_c = 4s_1 + s_2 + 2s_3
    T_c = t_1 + t_2 + t_3

The expression for T is a generalization of a quantity known as the weighted external path length of a binary tree (see, e.g., Knuth [14]). It becomes identical to this quantity if t_1 = t_2 = t_3 = 1. Since S_a < S_b < S_c and T_a < T_b < T_c hold for all positive values s_i, t_i and p_j with Σ_j p_j = 1, (a) is the best of the three decision trees with respect to both memory requirement and processing time. However, in general, a tree which is optimal in one respect need not be optimal in another.
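These comparisons can be checked with a short computation. The sketch below (in Python; the numeric values assigned to s_i, t_i and p_j are illustrative assumptions, not values from the text) evaluates the six formulas and confirms both chains of inequalities:

```python
# A minimal numeric check of the cost formulas above; the concrete values
# of s_i, t_i and p_j are arbitrary illustrative choices.
s = {1: 1.0, 2: 1.0, 3: 1.0}          # memory space for conditions C1..C3
t = {1: 1.0, 2: 1.0, 3: 1.0}          # processing time for conditions C1..C3
p = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}  # probabilities of rules R1..R4

S_a = s[1] + s[2] + s[3]
T_a = (p[1] + p[3]) * (t[1] + t[2] + t[3]) + p[2] * (t[1] + t[3]) + p[4] * t[1]

S_b = 2 * s[1] + s[2] + s[3]
T_b = (p[1] + p[3]) * (t[1] + t[2] + t[3]) + (p[2] + p[4]) * (t[1] + t[3])

S_c = 4 * s[1] + s[2] + 2 * s[3]
T_c = t[1] + t[2] + t[3]              # tree (c) tests all three conditions on every path

assert S_a < S_b < S_c and T_a < T_b < T_c
```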
2.3. Cubic Representation of Decision Tables

The purpose of this section is to introduce a representation of a decision table by an n-dimensional cube. By using this cubic representation we can give a more intuitive interpretation of several procedures for generating decision trees, as well as a mathematically more precise formulation. The mathematical terminology and notation used in this thesis are introduced in the next chapter. Here we discuss the cubic representation of a decision table at an intuitive level.

Let us consider Table 2.2 again. If we replace Y and N by 1 and 0, respectively, then a rule, say R_2, becomes the triple (1,-,1). Similarly, R_1, R_3 and R_4 are (1,1,0), (1,0,0) and (0,-,-), respectively. Such triples can be identified with vertices or with certain sets of vertices (namely, subcubes) of the 3-dimensional cube, as shown in Figure 2.2.

FIGURE 2.2.

Note that the conditions C_i correspond to the coordinate axes of the cube. R_1 and R_3 each correspond to a vertex, or 0-cube, R_2 to an edge, or 1-cube, and R_4 to a face, or 2-cube. Thus a decision table with n conditions can be represented by a partition (of a special kind) of the set of vertices of an n-cube.

Now we explain how a decision tree can be obtained from the cubic representation. The decision tree of Figure 2.1.(a) is taken as an example. Correspondingly, the procedure is illustrated in Figure 2.3.(a).

FIGURE 2.3.(a).

The process starts by separating the 3-cube into two 2-cubes by removing the coordinate C_1. Then each 2-cube can again be separated into 1-cubes by removing a coordinate. This process continues until each separated cube consists of exactly one rule, that is, until every rule is identified.

Let us consider another example, Figure 2.1.(b). Its corresponding process can be illustrated as in Figure 2.3.(b).

FIGURE 2.3.(b).

This case creates a somewhat different situation. That is, when C_3 is taken first, the 2-cube R_4 is split into two 1-cubes, R_4' and R_4''. They are the same rule but are separately identified at the terminals of the decision tree. Such a "rule-split" (i.e., the separation of a cube consisting of a single rule) increases the number of nodes of the resulting decision tree, i.e., the number of condition boxes which appear in the tree, as well as the number of terminal nodes, which represent cases distinguished by the tree. Such rule-splits occurred in both of the trees of Figures 2.1.(b) and 2.1.(c).

Before we develop this cubic representation in the succeeding chapters, we give some remarks about such conventional terms as Else-rule, redundancy, and contradiction of a decision table (see, e.g., Pollack [24]).

(1) In general, the set of all rules of a decision table does not cover its associated cube completely. This means that some vertices of the n-cube are left unspecified and no action is taken for these unspecified vertices. We call the set of these vertices the Else-rule. The Else-rule is not necessarily a single subcube as is an ordinary rule.

(2) It may also occur that, for a given decision table, two or more different rules are assigned to the same vertex. In other words, they overlap at that vertex. If the series of actions taken for these rules are the same, redundancy exists. If this is not the case, however, then there are different series of actions for that vertex. This is called contradiction.

In this thesis, we consider neither the case of the Else-rule nor the case of redundancy or contradiction in a decision table, but only the case where the set of all vertices is partitioned into a set of disjoint subcubes. These restrictions are natural, since a decision tree whose internal nodes correspond to single conditions C_i (i.e., not logical combinations of conditions such as C_i ∨ C_j · C_k) can realize only such a special type of partition of all vertices.
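The translation from a limited-entry decision table to subcubes, together with the checks for the Else-rule and for overlap, is mechanical. A minimal sketch (Python; the dictionary encoding of the rules is an assumption of this sketch, and the rules are those of Table 2.2):

```python
from itertools import product

# Sketch: the cubic representation of the rules of Table 2.2, with checks
# for the Else-rule and for overlap (redundancy or contradiction).
rules = {'R1': ('1','1','0'), 'R2': ('1','-','1'),
         'R3': ('1','0','0'), 'R4': ('0','-','-')}

def vertices(cube):
    """All vertices (0-cubes) contained in a subcube such as ('-','0','1')."""
    axes = [('0','1') if x == '-' else (x,) for x in cube]
    return set(product(*axes))

n = 3
covered = {}
for name, cube in rules.items():
    for v in vertices(cube):
        covered.setdefault(v, []).append(name)

else_rule = [v for v in product('01', repeat=n) if v not in covered]
overlaps  = {v: names for v, names in covered.items() if len(names) > 1}
assert not else_rule and not overlaps    # Table 2.2 partitions the 3-cube
```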
3. MATHEMATICAL PRELIMINARIES

3.1. Introduction

In this chapter we first review some basic algebraic concepts concerning the lattice of partitions of a set, and then some additional terminology and notation are introduced. By using these, we put decision table problems into a more abstract and simplified form for their theoretical development in succeeding chapters. In particular, n-cube partitions and decision trees will be used instead of decision tables and flowcharts, respectively.

Among the notions introduced in this chapter, two concepts are especially basic and important for understanding this thesis. One of these is the inequality between two cube partitions, and the other is the multiplication of cube partitions. The former concept will be used often for describing the conversion procedure from cube partitions into decision trees in Chapter 4, while the latter will play an essential role in the decomposition theory of decision tables presented in Chapter 5.

In the last section of this chapter, we present a preliminary study of some sets of n-cube partitions and show that each set forms a lattice with the introduction of two operations. A more advanced study of this lattice in relation to the optimality discussion is presented in Chapter 4, however.

3.2. Algebraic Foundations

This entire section is excerpted from the book by Hartmanis and Stearns [5].

A relation between a set S and a set T is a subset R of S × T, and for (s,t) in R we write s R t. Thus, R = { (s,t) | s R t }. A relation R on S × S is: reflexive if, for all s, s R s; symmetric if s R t implies t R s; transitive if s R t and t R u implies s R u. A relation R on S is an equivalence relation on S if R is reflexive, symmetric and transitive. If R is an equivalence relation on S, then for every s in S, the set

B_R(s) = { t | s R t }

is an equivalence class (i.e., the equivalence class defined by s).

A partition π on S is a collection of disjoint subsets of S whose set union is S, i.e.,

π = {B_α} such that B_α ∩ B_β = ∅ for α ≠ β and ∪_α B_α = S.

We refer to the sets of π as blocks of π and designate the block which contains s by B_π(s). We write

s ≡ t (π)

if and only if s and t are contained in the same block of π. Note that s ≡ t (π) if and only if B_π(s) = B_π(t).

A binary relation R on S is a partial ordering of S if and only if R is

(i) reflexive: s R s for all s in S;
(ii) antisymmetric: s R t and t R s implies s = t;
(iii) transitive: s R t, t R u implies s R u.

We refer to a set S with a given partial ordering R as a partially ordered set. When a relation R is a partial ordering, we use the more suggestive symbol " ≤ " instead of R, and the partially ordered set is represented by the pair (S, ≤).

Let (S, ≤) be a partially ordered set and T be a subset of S. Then s (in S) is the least upper bound (l.u.b.) of T if and only if (i) s ≥ t for all t in T; (ii) s' ≥ t for all t in T implies that s' ≥ s. Dually, s is the greatest lower bound (g.l.b.) of T if and only if (i) s ≤ t for all t in T; (ii) s' ≤ t for all t in T implies that s' ≤ s.

A lattice is a partially ordered set, L = (S, ≤), which has a l.u.b. and a g.l.b. for every pair of elements.
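The partition notions above can be made concrete in a few lines. The following sketch (Python; the set S and the partition π in it are arbitrary examples, not taken from the text) checks the partition axioms and computes B_π(s):

```python
# Sketch of the partition notions above; S and pi are arbitrary examples.
S  = {'a', 'b', 'c', 'd', 'e'}
pi = [{'a', 'b'}, {'c'}, {'d', 'e'}]

def is_partition(blocks, universe):
    """Blocks must be disjoint and their union must be the whole set."""
    union = set().union(*blocks)
    return union == universe and sum(len(B) for B in blocks) == len(union)

def block_of(blocks, s):
    """B_pi(s): the unique block containing s."""
    return next(B for B in blocks if s in B)

def equivalent(blocks, s, t):
    """s = t (pi) holds if and only if B_pi(s) = B_pi(t)."""
    return block_of(blocks, s) == block_of(blocks, t)

assert is_partition(pi, S)
assert equivalent(pi, 'a', 'b') and not equivalent(pi, 'a', 'c')
```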
(i) x • x = x and x + x = x (ii) x • y = y • x and x + y = y + x (iii) x • (y • z) = (x . y) • z and x + (y + z) = (x + y) + z (iv) x • (x + y) = x and x + (x • y) = x. Let L = (S, ' , +) satisfy the conditions of the above definition of a lattice and define x < y if and only if x • y = x. Then it can be verified that (S, <) is a lattice and g.l.b. (x, y) - x • y and l.u.b. (x, y) = x + y. If L is a finite lattice, then L itself has a l.u.b. and g.l.b., denoted by I and 0, respectively. Element I is called the identity because I • x = x for all x in L. Element is called the zero because x + = x for all x in L. Let L = (S, •, +) be a lattice and T a nonvoid subset of S. Then L' = (T, •, +) is a sub lattice of L if and only if x and y in T implies that x • y and x + y are in T. 20 ,n 3.3. Partitions of an n-cube and Decision Trees The n- dimensional cube, or n-cube for short, has 2" vertices with n lines emanating from each vertex. The vertices of the n-cube are labeled with n-tuples of zeros and ones such that two vertices are con- nected by a line if and only if these labels differ in exactly one position. As examples, the cubes n = 1, 2 and 3 are shown in Figure 3.1. o -o 111 Oil 010 00 10 101 100 000 1-cube 2 -cube 3 -cube FIGUBE 3.1. Let us agree to call a single vertex a 0-cube. Then a pair of adjacent 0-cubes will determine an edge or 1-cube. If the two vertices are, say, (1,0,1) and (0,0,1), then we shall denote this 1-cube as (-,0,1). As " - " ranges over {0,1}, this represents two vertices of the 1-cube. In a similar way, a 2-cube is made up of four 0-cubes. For instance, (0,0,1), (0,1,1), (1,0,1) and (1,1,1) make up the 2-cube (-,-,1), where each ranges over {0,1} independently. 21 Now we define two different n-cube partitions as follows DEFINITION 3.1. J3 A partition s of a set of 2 vertices of an n-cube is called an n-cube partition . The number of blocks of a partition it is denoted by #(*) . If each block, B., of this partition n is a single k-cube partition (k < n), then it is called an n-cube rule partition. EXAMPLE 3.1. We show a 3-cube rule partition n, below as an example, * 1 = {B^ Bg, B 3 , B^}, where B ] _ = {(0,0,0)}, Bg = {(1,0,0)}, B 3 = {(-,0,l)}, and B. = {(-,1,-)}. We note that each B. (i= 1,2,3**0 of rt is a single cube. The following partition jt is a 3-cube partition but not a rule partition, however, because B is not a single cube, nor is B . it = {B , Bg, B 3 , B^}, where B ± = {(0,0,0), (l,0,l)}, Bg = {(0,0,1)}, B 3 = {(0,1,0), (0,1,-), (1,1,1)} and B^ = {(l,-,0)}, respectively. 22 Both rule partition and (general) partition of the n-cube can be considered as simple models of a decision table with n condition rows. If we are concerned only with the condition stub and condition entry por- tions of a decision table (which has no Else-rule nor redundancy or ambi- guity), a rule partition corresponds uniquely to a decision table with n conditions. An n-cube partition, more generally, can be considered as another model of a decision table. A block of a partition corresponds to an action of a decision table. As long as conversion techniques are concerned, these two parti- tions serve as simple theoretical models of a decision table for the discussion of its optimality. Next we define binary trees, and then binary decision trees. DEFINITION 3.2. A binary tree T. 
Next we define binary trees, and then binary decision trees.

DEFINITION 3.2.

A binary tree T_i with i internal nodes and i+1 terminal nodes (i ≥ 0) is defined recursively as follows: If i = 0, then it consists of a single terminal node; otherwise it is a triple (T_l, v, T_r), where v is a distinguished internal node called the root of T_i, and T_l and T_r are binary trees with l and r internal nodes, respectively, and with l+1 and r+1 terminal nodes, respectively, where l ≥ 0, r ≥ 0, l + r = i - 1.

DEFINITION 3.3.

A binary decision tree involving n conditions C_1, C_2, ..., C_n is a binary tree each of whose internal nodes is labeled with a condition C_j such that, for any path from the root of the tree to a terminal node, no condition C_j appears more than once along this path.

Then we associate a binary decision tree with an n-cube partition in the following manner.

DEFINITION 3.4.

The n-cube rule partition associated with a binary decision tree T is defined as follows: For any terminal node t of T and the path C_{t_1}, C_{t_2}, ..., C_{t_p} from the root to t, where t_i ∈ {1,2,...,n}, associate the k-cube (k = n-p), B_t = (x_1, x_2, ..., x_n), in such a way that

1) x_i is "-" if C_i is not in the path C_{t_1}, C_{t_2}, ..., C_{t_p}; and

2) x_i is 0 (1) if C_i is in the path C_{t_1}, C_{t_2}, ..., C_{t_p} and the terminal node t exists in the left (right) sub-decision tree of the internal node C_i.

It is obvious that all these blocks B_t form an n-cube rule partition and that this partition is unique.

EXAMPLE 3.2.

Assume n = 3. Associated with each terminal node i is a block B_i (i = 1,2,3,4) as follows:

B_1 = {(0,-,0)}, B_2 = {(0,0,1)}, B_3 = {(0,1,1)}, B_4 = {(1,-,-)}.

They form a 3-cube partition π.

DEFINITION 3.5.

It is said that a decision tree realizes a rule partition π if and only if each terminal node of the tree represents a block of the partition π and vice versa. A partition is realizable if and only if there exists a decision tree which realizes the partition. We note that a realizable partition is always a rule partition.

EXAMPLE 3.3.

We show an example of a non-realizable 3-cube partition π. Its blocks B_i are

B_1 = {(1,0,-)}, B_2 = {(0,-,1)}, B_3 = {(-,1,0)}, B_4 = {(0,0,0)}, and B_5 = {(1,1,1)}.

Whichever condition C_i is tested first, one of the blocks B_1, B_2, B_3 has "-" in position i and is therefore split, so no decision tree can realize π.

REMARK

As we have shown, a decision tree determines a unique n-cube rule partition. However, the converse is not true. In other words, more than one decision tree may realize the same partition, as we show in the following.

Two different decision trees T_1 and T_2 realize the same partition π (with blocks B_1, ..., B_6).
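Definition 3.4 amounts to a simple recursive walk of the tree. In the sketch below (Python), the encoding of a decision tree as nested tuples (j, left, right), with the left branch taken on answer 0, is an assumption of the sketch; the example tree reproduces the partition of Example 3.2:

```python
# Sketch of Definition 3.4: the rule partition determined by a decision
# tree.  A tree is 'leaf' or a triple (j, left, right): test C_j, take
# `left` on 0 and `right` on 1.
def blocks_of_tree(tree, n, prefix=None):
    prefix = prefix or ['-'] * n
    if tree == 'leaf':
        yield tuple(prefix)                 # x_i stays '-' off the path
        return
    j, left, right = tree
    for value, sub in (('0', left), ('1', right)):
        branch = list(prefix)
        branch[j - 1] = value               # coordinate x_j is fixed on this path
        yield from blocks_of_tree(sub, n, branch)

tree = (1, (3, 'leaf', (2, 'leaf', 'leaf')), 'leaf')
print(list(blocks_of_tree(tree, 3)))
# [('0','-','0'), ('0','0','1'), ('0','1','1'), ('1','-','-')]
```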
The relation " < " and a multiplication " • ", both defined between two cube partitions, are very important concepts and are used more than often in the succeeding two chapters. We introduce the binary rela- tion " < " first. DEFINITION 3-6. and write For n and n , we say that rt is larger than or equal to re «!<* 2 if and only if, for any two vertices v and v_ of the n-cube, V l ~ V 2 (*l) implies v ■ v g (k 2 ) , that is, it < it holds if and only if every block n is contained in a block of if . Since this relation satisfies the three properties of a partial ordering, we can present the following proposition. PROPOSITION 3.1. The binary relation " < " in the above is a partial ordering of S, S , and S rt . ' r' Next we introduce the binary operations " • ", " + ", and " @ " among n-cube partitions. 28 DEFINITION 3-7- If it and it p are partitions, then (i) it • it p is a partition such that v l S V 2 ^1 ' ^ if and 0nly lf V l ^ V 2 ^1^ and V l S V 2 • ^2^' (ii) it + it is a partition such that v_ - v p (it + it ) if and only if there exists a sequence in v such that u. = u. , (n,) or u = u. , (it_) for < i < 1-1. i i + 1 v 1 i i + 1 ' 2 y __ The procedure to form it • it is very simple since the blocks of it • it are obtained by intersecting blocks of it and it , The process to obtain it + it is longer, but still straight- forward. To compute B ' (v) we proceed inductively. Let B (v) = B (v) U B (v) 1 2 and for i > 1 let B i + (v) = B (v) U {B | B is a block of it or it , and B R B. (v) ^ }. Then B (v) = B (v) for any i such that B. , (v) = B.(v) it + it x * i + 1 v l EXAMPLE 3.4. We show it • it and it + it by the following examples. 29 *1 • *2 *!+ « 2 NOTATION 3.1. and Repeated multiplication and addition are represented by n jt • it . . , jt = n it. 1 2 n l i=l n JT + JT + . . . + TT = Z IT. 1 2 n . , i 1=1 According to the first definition of a lattice in Section 3.2. , it is easy to show the following. PROPOSITION 3.2. The partially ordered set, (S, <), of all n-cube partition is a lattice and , 30 g.l.b. (x lt « 2 ) = k ± ■ * 2 l.u.b. (jt^ st g ) = ^ + n g . Since we can define the binary relation " < " by: «»■ < jt if g^ only if jt • jt = jt if and only if it + jt = jt , we can prove the following statement by using the second definition of a lattice. PROPOSITION 3-3- The set (S, •, +) of all n-cube partitions with the two binary operations " • " and " + " is a lattice. We know that the two other sets S and S of all rule and realiz- r o able partitions are subsets of S and also partially ordered sets with respect to the relation " < As we now show however, S and S with .f _ ' r o the two operations •" • " and " + " are not sublattices of (S, ' } +) . Next we investigate these two sets S and S more carefully. r o J PROPOSITION 3A. 1) With respect to rule partitions: If jt_ , jt e S then Jt • jr. e S . However, there exists a pair 1 2 r 1 2 r of rule partitions jt and jt_ ( jt_ , jt 6 S ) such that jt_ + tl_ & S . 1 2 v 1' 2 r 12 r r 2) With respect to realizable partitions: If jt . jt € S . then jt • jt_ 6 S . However, there exists a 1 2 o 1 2 o 7 pair of realizable partitions Jt, and Jt_ (it n , it e S ) such that jt_ + jt_ 1 2 1 2 o 12 ' o 3) There exists a pair consisting of a rule partition, it. , and a realizable partition, jt , such that jt < jt and jt is not realizable. PROOF: The proof of the first two statements of l) and 2) are omitted. 31 It is sufficient to show the following two partitions it and it in order to prove the two second statemtnts of l) and 2) . 
We know that the two other sets, S_r and S_o, of all rule and realizable partitions are subsets of S and are also partially ordered sets with respect to the relation " ≤ ". As we now show, however, S_r and S_o with the two operations " · " and " + " are not sublattices of (S, ·, +). Next we investigate these two sets S_r and S_o more carefully.

PROPOSITION 3.4.

1) With respect to rule partitions: If π_1, π_2 ∈ S_r, then π_1 · π_2 ∈ S_r. However, there exists a pair of rule partitions π_1 and π_2 (π_1, π_2 ∈ S_r) such that π_1 + π_2 ∉ S_r.

2) With respect to realizable partitions: If π_1, π_2 ∈ S_o, then π_1 · π_2 ∈ S_o. However, there exists a pair of realizable partitions π_1 and π_2 (π_1, π_2 ∈ S_o) such that π_1 + π_2 ∉ S_o.

3) There exists a pair consisting of a rule partition π_1 and a realizable partition π_2 such that π_1 ≤ π_2 and π_1 · π_2 is not realizable.

PROOF: The proofs of the first statements of 1) and 2) are omitted. It is sufficient to exhibit two partitions π_1 and π_2 in order to prove the second statements of 1) and 2): both π_1 and π_2 are realizable (i.e., rule) partitions, but π_1 + π_2 is neither a rule partition nor a realizable partition.

To prove 3), we exhibit two partitions π_1 and π_2 as follows: π_1 is a rule partition but not a realizable partition; π_2 is a realizable partition consisting of only one block (the 3-cube itself), and π_1 ≤ π_2. However, π_1 · π_2 = π_1, and so π_1 · π_2 is not realizable. Q.E.D.

By the above proposition, we learned that the sets S_r and S_o are not closed under the operation " + ", while they are closed under the operation " · ". Therefore, (S_r, ·, +) and (S_o, ·, +) are not sublattices of the lattice (S, ·, +). Instead of the operation " + ", we next define an operation " ⊕ " between two partitions as follows.

DEFINITION 3.8.

π_1 ⊕ π_2 is the rule partition which satisfies 1) π_1 ⊕ π_2 ≥ π_1 + π_2, and 2) for any π ∈ S_r such that π ≥ π_1 + π_2, π_1 ⊕ π_2 ≤ π holds.

EXAMPLE 3.5.

We show two examples of the operation " ⊕ ". The process to obtain π_1 ⊕ π_2 from π_1 and π_2 is omitted.

Since it is easily shown that S_r and S_o are closed under the operation " ⊕ " and we can define l.u.b.(π_1, π_2) = π_1 ⊕ π_2, we obtain the following proposition.

PROPOSITION 3.5.

The sets (S_r, ·, ⊕) and (S_o, ·, ⊕) are lattices, and the latter is a sublattice of the former.

REMARK

The set (S, ·, ⊕) is not a lattice, since π_1 ⊕ π_2 does not have the l.u.b.(π_1, π_2) property in S.

It is known that every finite lattice has 0 and I elements. The three lattices (S, ·, +), (S_r, ·, ⊕) and (S_o, ·, ⊕) have the following common 0 and I elements:

1) The 0 element is the partition consisting of 2^n blocks, where each block contains one and only one vertex.

2) The I element is the partition consisting of one block containing all 2^n vertices, i.e., the n-cube itself.

4. SOME ASPECTS OF DECISION TREE CONSTRUCTION

4.1. Introduction

Known manual methods (Egler [2], Press [25], and Pollack [23]) of converting decision tables into decision trees are based mainly on plausible arguments with little theoretical background. On the other hand, Reinwald and Soland ([26], [27]) formulated this conversion problem as a problem in mathematical programming, and described a branch-and-bound algorithm for the construction of decision trees which minimizes either the average processing time or the storage requirement.

In this chapter we derive some basic theoretical results concerning optimal decision trees. The argument is developed based on n-cube partitions, which were introduced as a simplified mathematical model of decision tables. After the cost of a decision tree is defined, a procedure, called Procedure R, to construct decision trees from a given partition is shown. Based on this procedure an algorithm, called "iterated local minimization", is proposed. It does not always generate an optimal decision tree, but yields suboptimal trees which approximately minimize costs. The trees generated by this algorithm are compared quantitatively with optimal trees and with those constructed by Pollack's first algorithm. This chapter also contains an analysis of "rule-splits", particularly lower and upper bounds for the minimum number of required rule-splits over all n-cube rule partitions. Since a decision tree which is optimal for one objective function is not necessarily optimal for other objective functions, relationships existing among optimal decision trees under different objective functions are also discussed in this chapter. Such arguments are based on the partially ordered sets S_o and S_r of n-cube realizable and rule partitions.
4.2. How to Construct Decision Trees

In this section, we describe a procedure, called Procedure R, to construct decision trees which realize partitions π', π'', ... which are refinements of a given partition π (i.e., π' ≤ π, π'' ≤ π, ...).

FIGURE 4.1.

The cost of a decision tree is the number of its internal nodes, so the cost of a decision tree realizing π' is #(π') - 1, and

#(π') - 1 = #(π) - 1 + {#(π') - #(π)}.

The quantity #(π') - #(π) is called the loss(π',π) due to the replacement of the (nonrealizable) partition π by the (realizable) partition π', where π' ≤ π.

DEFINITION 4.2.

The minimum cost for a partition π is defined to be

Min_{π'} {#(π') - 1} = #(π) - 1 + Min_{π'} loss(π',π),

where Min_{π'} is taken over all realizable partitions π' which satisfy π' ≤ π.

Now we see that our optimization problem is to find a realizable partition π' (≤ π) for a given partition π such that the loss(π',π) = #(π') - #(π) is minimized.

Next we describe a procedure to construct a decision tree realizing π' for a given partition π (π' ≤ π) and show how to calculate the loss #(π') - #(π). The entire procedure, called Procedure R, is based on the following Operation A, which generates two (k-1)-cube partitions from a k-cube partition.

OPERATION A

Given a k-cube rule partition σ which consists of more than one block, and given a condition C_s, where 1 ≤ s ≤ k, define two (k-1)-cube rule partitions, σ_0 and σ_1, as follows. For each block B_i = (x_1, x_2, ..., x_s, ..., x_k) of σ:

1) if x_s of B_i is 0, then the (k-1)-tuple (x_1, x_2, ..., x_{s-1}, x_{s+1}, ..., x_k) is a block of σ_0;

2) if x_s of B_i is 1, then the (k-1)-tuple (x_1, x_2, ..., x_{s-1}, x_{s+1}, ..., x_k) is a block of σ_1;

3) if x_s is -, then the (k-1)-tuple (x_1, x_2, ..., x_{s-1}, x_{s+1}, ..., x_k) is a block of both σ_0 and σ_1;

and σ_0 and σ_1 have no blocks other than those obtained from 1), 2) and 3) above.

EXAMPLE 4.1.

Consider the following 3-cube partition σ on the conditions C_1, C_2, C_3, where

σ = {B_1 = (0,0,0), B_2 = (0,1,0), B_3 = (0,-,1), B_4 = (1,-,-)}.

If we choose C_1 as the root of the decision tree, then the corresponding σ_0 and σ_1 (on the coordinates C_2, C_3) are σ_0 = {(0,0), (1,0), (-,1)} and σ_1 = {(-,-)}, respectively.

Given a partition π, Procedure R constructs a decision tree which realizes a partition π' (≤ π) by applying Operation A repeatedly. For the moment we leave open how Procedure R chooses the condition C_s to be used in Operation A. Various specific choices will be discussed later.

PROCEDURE R

Assume a rule partition π is given. If π consists of a single block, construct a decision tree which consists of a single terminal node. Otherwise, choose a condition C_s, derive the two partitions π_0 and π_1 by applying Operation A to π and C_s, and construct a decision tree as follows: Its root is labeled with condition C_s, its left subtree is obtained by applying Procedure R to π_0, and its right subtree by applying Procedure R to π_1.

Then we obtain the following proposition. (The proof is omitted.)

PROPOSITION 4.1.

Procedure R, applied to a partition π, constructs a decision tree realizing a partition π' which satisfies π' ≤ π.
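Operation A and Procedure R translate directly into code. In the sketch below (Python), blocks are tuples over {'0','1','-'}, and the policy for selecting C_s is deliberately left to a caller-supplied `choose` function, since the text leaves this choice open at this point; the final lines reproduce Example 4.1:

```python
# Sketch of Operation A and Procedure R.
def operation_a(sigma, s):
    """Split a k-cube rule partition by condition C_s (s is 1-indexed)."""
    sigma0, sigma1 = [], []
    for block in sigma:
        rest = block[:s - 1] + block[s:]     # drop coordinate x_s
        if block[s - 1] in ('0', '-'):
            sigma0.append(rest)
        if block[s - 1] in ('1', '-'):
            sigma1.append(rest)              # a '-' block goes to both sides
    return sigma0, sigma1

def procedure_r(sigma, conditions, choose):
    """Build a tree ('leaf' or (C_s, left, right)) realizing some pi' <= pi."""
    if len(sigma) == 1:
        return 'leaf'
    s = choose(sigma, conditions)            # the label of the chosen condition
    i = conditions.index(s)
    sigma0, sigma1 = operation_a(sigma, i + 1)
    rest = conditions[:i] + conditions[i + 1:]
    return (s, procedure_r(sigma0, rest, choose),
               procedure_r(sigma1, rest, choose))

sigma = [('0','0','0'), ('0','1','0'), ('0','-','1'), ('1','-','-')]
assert operation_a(sigma, 1) == ([('0','0'), ('1','0'), ('-','1')], [('-','-')])
# A naive first-condition policy; its second step splits ('-','1'), so pi' < pi:
print(procedure_r(sigma, [1, 2, 3], lambda sg, cs: cs[0]))
```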
EXAMPLE 4.2.

Readers are invited to check the above Procedure R on the two examples of Figures 2.3.(a) and (b) in Chapter 2. For the case (a), the realized partition is the given partition π. For the case (b), however, Procedure R yields the tree realizing a partition π' which is smaller than the original partition π.

In Proposition 4.1 it is shown that the tree constructed by Procedure R applied to a partition π realizes a partition π' which satisfies π' ≤ π. Now we show how the loss(π',π) = #(π') - #(π) can be calculated. For this purpose we introduce the following definition.

DEFINITION 4.3.

The loss ℓ(C_s, σ) due to using C_s when Procedure R is applied to a partition σ is defined by

ℓ(C_s, σ) = the number of blocks B_i = (x_1, x_2, ..., x_k) of σ whose x_s is "-".

We call C_s a lossfree condition with respect to σ if and only if ℓ(C_s, σ) = 0 holds. On the other hand, if all blocks B_i = (x_1, x_2, ..., x_k) of σ have x_s = "-", then the condition C_s is called inessential to σ.

PROPOSITION 4.2.

The cost of the decision tree realizing π' constructed by the above Procedure R is given by

#(π') - 1 = #(π) - 1 + {#(π') - #(π)} = #(π) - 1 + Σ_i ℓ(C_{s_i}, σ_i),

where the sum is taken over all steps i of the procedure, C_{s_i} being the condition chosen and σ_i the partition treated at step i. Therefore, the loss(π',π) is expressed by

loss(π',π) = Σ_i ℓ(C_{s_i}, σ_i).

EXAMPLE 4.3.

Consider the following partition π. Both the iterated local minimization and Pollack's first algorithm generate the same decision tree, T_1 of Figure 4.2.(a); they choose the same condition first as the root of the decision tree. The cost of T_1 is 12.

FIGURE 4.2.(a). FIGURE 4.2.(b).

The decision tree T_2 of Figure 4.2.(b) is the optimal tree for this nonrealizable partition π. Its cost is 11. We show in Figure 4.3 the two partitions π' and π'' realized by T_1 and T_2, respectively.

FIGURE 4.3.(a). FIGURE 4.3.(b).

We note that the loss(π',π) and the loss(π'',π) are 3 and 2, respectively.

PROPOSITION 4.4.

The iterated local minimization, as well as Pollack's first algorithm, does not always yield an optimal decision tree.

4.3. Comparative Study of Algorithms

We learned that the iterated local minimization and Pollack's first algorithm do not always yield an optimal decision tree. In this section, we compare trees generated by the algorithm of iterated local minimization, by Pollack's algorithm, and optimal trees; and we provide estimates of how far off-optimal the trees generated by these two algorithms are. To make the comparative arguments more concise, we prepare the following definition and give a very basic theorem.

DEFINITION 4.4.

For a k-cube partition π_k, its n-cube extension π_n (k ≤ n) is defined as follows. For each block B_i = (x_1, x_2, ..., x_k) of π_k, we form the 2^(n-k) blocks of π_n obtained by adding (n-k) new coordinates C_{k+1}, C_{k+2}, ..., C_n to the tuple and letting each of them range over {0,1}, i.e., the blocks (x_1, x_2, ..., x_k, c_{k+1}, c_{k+2}, ..., c_n) with each c_j ∈ {0,1}. Then #(π_n) is equal to 2^(n-k) · #(π_k).

EXAMPLE 4.4.

As an example, the construction from π_k to its n-cube extension π_n is shown below, where k = 3 and n = 4.

THEOREM 4.2.

Assume an algorithm A is applied to a k-cube partition π_k, and it constructs a decision tree realizing π_k'. Then, as we defined previously, the loss(π_k', π_k) is #(π_k') - #(π_k). When the same algorithm A is applied to the n-cube extension π_n of π_k, the loss(π_n', π_n) is given by

loss(π_n', π_n) = #(π_n') - #(π_n) ≥ 2^(n-k) · loss(π_k', π_k) = 2^(n-k) · (#(π_k') - #(π_k)),

where π_n' is the partition realized by the resultant decision tree.

PROOF: The algorithm A may or may not have the LCF (lossfree-condition-first) policy. If it has the LCF policy, then it can choose the newly added coordinates C_{k+1}, C_{k+2}, ..., C_n at the first 2^(n-k) - 1 steps of Procedure R applied to the n-cube extension π_n, because these conditions are lossfree at each of the first 2^(n-k) - 1 steps. In practice, this means that the n-cube partition π_n is divided into 2^(n-k) k-cube partitions π_k. For each k-cube partition, the algorithm generates the loss(π_k', π_k) = #(π_k') - #(π_k); therefore, in total, this algorithm generates the loss(π_n', π_n) = 2^(n-k) · loss(π_k', π_k) for the n-cube extension π_n. If the algorithm A does not have the LCF policy, then it generates even more loss than the above loss(π_n', π_n), according to Theorem 4.1. Q.E.D.

This theorem says that the loss generated by an algorithm for a k-cube partition is magnified by the factor 2^(n-k) for its n-cube extension.
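Definition 4.4 and the lossfree observation used in the proof can be checked directly. A minimal sketch (Python), using the block encoding of the earlier sketches:

```python
from itertools import product

def extension(pi_k, n):
    """Definition 4.4: append every 0/1 combination on the new coordinates."""
    k = len(pi_k[0])
    return [block + tail for block in pi_k
            for tail in product('01', repeat=n - k)]

def loss(sigma, s):
    """Definition 4.3: the number of blocks split by C_s."""
    return sum(1 for b in sigma if b[s - 1] == '-')

pi_3 = [('0','0','0'), ('1','0','0'), ('-','0','1'), ('-','1','-')]
pi_4 = extension(pi_3, 4)
assert len(pi_4) == 2 ** (4 - 3) * len(pi_3)   # #(pi_n) = 2^(n-k) * #(pi_k)
assert loss(pi_4, 4) == 0                      # the new coordinate C_4 is lossfree
```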
C at the first 2 -1 steps of Procedure R applied to the n- k+2' ' n cube extension, it , because these conditions are lossfree at each step of the first 2 -1 steps. In practice, this means that the n-cube n-k partition jt is divided into 2 k-cube partitions, jt . For each k-cube partition, the algorithm generates the loss(rt ' ,Tt ) = #(jt ' ) - #(rt ); therefore, totally, this algorithm generates the 1oss(jt ',jt ) - n-k 2 . lossfjr. '.it, ) for the n-cube extension, it . If this algorithm A v k ' k 7 ' n does not have the LCF policy, then it generates much more loss than the above loss (it ',it ) according to Theorem k.l. Q.E.D. n ' n This theorem says that the loss generated by an algorithm for a n-k k-cube partition is magnified by the factor 2 for its n-cube extension. COROLLARY k.3, Assume that two algorithms, A and B, have the LCF policy. If A and B generate the loss (it ' ,it ) and the loss(it ",it ) for a k-cube partition, respectively, then for any n (> k), there exists an n-cube 51 partition for which A and B generate 2 « loss(n',ir) and 2 . 1oss(tt ",jt, ), respectively. In other words, the difference of the n-k losses generated by A and B is 2 •. {loss(it ' ,it ) - loss(« ",jc. )}. K. k k K PROOF: If we consider the n-cube extension it of the k-cube it, , then n k' the above statement can be directly derived from Theorem k.2. Q. E.D. The corollary says that, if there is a difference, d, in the losses generated by two algorithms with LCF policy, for a k-cube partition, then, its n-cube extension causes the difference in the losses n-k by A and B to be 2 . d. We now apply this corollary to compare the losses of various trees. THEOREM k.k. For any n > k, there exists an n-cube partition for which the n-k cost of an optimal decision tree is 2 less than the cost of a decision tree constructed by the iterated local minimization or by Pollack's first algorithm. PROOF : Consider the U-cube partition, n, in Example k.k. Since the decision tree by the iterated local minimization or by Pollack's first algorithm differs from the optimal, in terms of the difference of loss, by one, then for its n-cube extension, the difference of loss becomes 2 n_k , according to Corollary k.3- Q.E.D. Secondly, we compare Pollack's first algorithm with the iterated local minimization. 52 THEOREM k.5. For any n > 5, there exists an n-cube partition for which the cost of the decision tree constructed by Pollack's first algorithm is 2 • 2 n times larger than the cost of a decision tree by the iterated local minimization. PROOF: Pollack's first algorithm generates the loss for the 5-cube partition of Figure k.k. (it chooses C first as the root of the decision tree. ) FIGURE k.k. The iterated local minimization, however, generates the loss 1. It chooses C first as the root of the decision tree. After the LCF policy can be applied at each step of the procedure, we obtain, for the n-cube extension, the difference of loss in the statement, i.e., ^-5 (3-D =2-2 n-5 Q.E.D. 53 Finally it is shown that the iterated local minimization is not always better than Pollack's algorithm. THEOREM h.6. For any n > 5> there exists an n-cube partition for which the cost of the decision tree constructed by the iterated local minimization is n-5 2*2 times larger than the cost of the decision tree obtained by Pollack's first algorithm. PROOF : The iterated local minimization generates a loss of 5 for the 5- cube partition of Figure 4. 
That is, it first chooses a root condition while generating a loss of 2 at this first step. After that it generates losses of 1 and 2 for the two resultant 4-cube partitions, respectively. The total loss, therefore, is equal to 5.

FIGURE 4.5.

On the other hand, Pollack's algorithm chooses first a condition which generates a loss of 3 at this step. After that, it generates no loss. Therefore, the difference of loss by the two algorithms for this partition is (5 - 3) = 2. Using Corollary 4.3, the 2 · 2^(n-5) of the statement is obtained. Q.E.D.

4.4. Bounds of Minimum Total Loss

Associated with the concept of total loss for nonrealizable partitions, one interesting question occurs: how much total loss must be generated by an optimal algorithm? Notationally, this quantity is characterized by

L(n) = Max_π Min_{π'} loss(π',π) = Max_π Min_{π'} {#(π') - #(π) | π' ≤ π and π' is realizable},

where Max is taken over all n-cube partitions π.

THEOREM 4.7.

The quantity L(n) is bounded by

2^(n-3) ≤ L(n) ≤ 2^(n-1) · log(n/2).

PROOF:

a) Lower bound. Consider the 3-cube partition in Example 3.3 in Chapter 3. The total loss associated with an optimal decision tree for this partition is equal to one. According to Corollary 4.3, an optimal algorithm generates at least the loss 2^(n-3) for its n-cube extension. Therefore, we obtain the lower bound (which is existential) 2^(n-3) for L(n).

b) Upper bound. For the proof of this bound, we use the following lemma.

LEMMA 4.1.

At any step of Procedure R at which the treated partition τ is a k-cube partition, at most a loss of ⌊2^(k-1)/k⌋ is generated (k ≥ 3), i.e.,

ℓ(C_s, τ) ≤ ⌊2^(k-1)/k⌋,

where ⌊x⌋ is the greatest integer that is less than or equal to x.

The proof of this lemma is omitted. Instead, for k = 3, 4 and 5, we show in Example 4.5 a k-cube partition which achieves ⌊2^(k-1)/k⌋.

EXAMPLE 4.5.

The following 3-, 4- and 5-cube partitions are examples which generate a loss of ⌊2^(k-1)/k⌋, for k = 3, 4, and 5, at one step.

Now we prove the upper bound based on the above lemma. Assume that we have an n-cube partition for which a loss of ⌊2^(k-1)/k⌋ is generated at each step of Procedure R. Then, for this partition, the total loss is given by

⌊2^(n-1)/n⌋ + 2 · ⌊2^(n-2)/(n-1)⌋ + ... + 2^(n-3) · ⌊2^2/3⌋

and bounded by

≤ 2^(n-1)/n + 2 · 2^(n-2)/(n-1) + 2^2 · 2^(n-3)/(n-2) + ... + 2^(n-3) · 2^2/3
= 2^(n-1) · {1/n + 1/(n-1) + 1/(n-2) + ... + 1/3}
≤ 2^(n-1) · ∫_2^n (1/t) dt = 2^(n-1) · (log n - log 2) = 2^(n-1) · log(n/2). Q.E.D.

In practice, for the case n = 10, the lower bound is 2^7. This means, in conventional terminology, that there exists a decision table with 10 condition rows for which an optimal algorithm splits 2^7 rules in total. The total loss is, however, bounded by 2^9 · log 5 ≈ 820.
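The bounds of Theorem 4.7 can be checked numerically for n = 10. The sketch below (Python) evaluates the lower bound, the sum of the per-step bounds of Lemma 4.1 appearing in the proof, and the closed-form upper bound (log denotes the natural logarithm):

```python
from math import log

n = 10
lower = 2 ** (n - 3)                                # existential lower bound
per_step_sum = sum((2 ** i) * (2 ** (n - 1 - i) // (n - i))
                   for i in range(n - 2))           # terms down to the 3-cubes
upper = 2 ** (n - 1) * log(n / 2)
print(lower, per_step_sum, upper)                   # 128, 675, ~824 (the text rounds to 820)
assert lower <= per_step_sum <= upper
```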
4.5. Optimality Discussions for Different Objective Functions

So far we have worked only with the objective function #(π). There are, however, two other criteria to be considered: the total memory space requirement, M, and the average processing time, P, which were briefly described in Chapter 2. Obviously, #(π) is a simplified and special case of M and P. However, one question arises: what kind of relationship exists among those optimal decision trees which minimize different objective functions? More simply, does an optimal decision tree for one objective function minimize the other two objective functions?

In this section we develop such arguments on optimality for different objective functions, based on the two partially ordered sets, S_o and S_r, of all realizable and rule partitions, respectively. Moreover, we show how Procedure R and related algorithms work and/or should be modified for those two other objective functions. For simplicity, the three different objective functions are called #-cost, M-cost, and P-cost, respectively.

STATEMENT

For a given partition π ∈ S_r, where do optimal solutions exist in S_o for the different #-cost, M-cost and P-cost, respectively? How can we relate one optimal solution to the rest? How well do the procedures or algorithms proposed so far for minimizing #-cost work for M-cost and/or P-cost?

This section answers these questions to some extent. We first give a brief review of the properties of these three different costs. Recall that a decision tree is defined only for a realizable partition π, and we denote it by T(π); T(π) is read "a decision tree realizing a partition π".

a) #-cost

The #-cost, C(T(π)), of a decision tree T(π) realizing π is the number of internal nodes of the tree, and it is equal to the number of blocks of π minus one, i.e.,

C(T(π)) = #(π) - 1.

Therefore, the #-cost of all decision trees realizing π is the same. In other words, this cost is independent of the decision trees realizing π and depends only on the partition π. So we define the #-cost, C(π), for a realizable partition π by

C(π) = #(π) - 1.

(Do not confuse C(π) with #(π). The former is the #-cost, and the latter is the number of blocks of π.) It is straightforward to extend the above definition to the case of a nonrealizable partition π. Thus, more generally, we define the #-cost, C(π), of a rule partition π (∈ S_r) by C(π) = #(π) - 1.

b) M-cost

M-cost, on the other hand, can be defined only for a realizable partition π (∈ S_o) and a decision tree T(π). The M-cost, C_M(T(π)), of a decision tree T(π) is defined by

C_M(T(π)) = Σ_i s_i,

where s_i is the storage space required for a condition C_i and the sum Σ_i s_i is over all internal nodes i of T(π). Then it is easy to see that the M-costs of different decision trees realizing the same partition π (∈ S_o) may differ, i.e., C_M(T_1(π)) ≠ C_M(T_2(π)). Consider π, T_1 and T_2 in the Remark after Example 3.3. T_1 and T_2 realize the same partition π, but the M-costs of those trees are C_M(T_1(π)) = s_1 + 3s_2 + s_3 and C_M(T_2(π)) = 2s_1 + s_2 + 2s_3, respectively. It is concluded that M-cost cannot be defined for a partition itself.

c) P-cost

First we assume a partition π (∈ S_r). A probability Pr(v) of occurrence for every vertex v of π is given and fixed. The probability of occurrence of a block B of π is then simply calculated by Pr(B) = Σ_{v∈B} Pr(v). The P-cost, C_P(T(π)), of a decision tree T(π) realizing π is defined by

C_P(T(π)) = Σ_i Pr(B_i) · {Σ_k t_{i_k}},

where B_i is the block of π corresponding to a terminal node i of T; t_{i_k} is the time required to process a condition C_{i_k}; and the sum Σ_k t_{i_k} is taken over all internal nodes C_{i_k} along the path from the root to the terminal node i of the tree. Examples were seen in Section 2.2.

Now we show a different way to calculate the P-cost without constructing a decision tree.

THEOREM 4.8.

Let a partition π have blocks B_i = (x_1, x_2, ..., x_n), where each x_j is 1, 0 or "-". For any decision tree T(π) realizing π (∈ S_o), its cost C_P(T(π)) is equal to

C_P(T(π)) = Σ_i Pr(B_i) · {Σ_{k∈I_i} t_k},

where I_i is the set of indices k such that x_k is 1 or 0 (but not "-") in the n-tuple B_i = (x_1, x_2, ..., x_n). The proof is omitted.

EXAMPLE 4.6.

To verify the theorem, consider the following partition π and a tree T(π) realizing it (the tree tests C_3 at the root, then C_2, then C_1):

B_1 = {(0,0,0)}, B_2 = {(1,0,0)}, B_3 = {(-,1,0)}, B_4 = {(-,-,1)}.

C_P(T(π)) can be calculated by its definition as follows:

C_P(T(π)) = Pr(B_4) · t_3 + Pr(B_3) · (t_3 + t_2) + Pr(B_1) · (t_3 + t_2 + t_1) + Pr(B_2) · (t_3 + t_2 + t_1).

On the other hand, the left-hand side of the equality in the above theorem is calculated as follows. The index sets I_i of B_i for i = 1, 2, 3 and 4 are {1,2,3}, {1,2,3}, {2,3} and {3}, respectively. Then the terms of the left-hand side are Pr(B_1) · (t_1 + t_2 + t_3), Pr(B_2) · (t_1 + t_2 + t_3), Pr(B_3) · (t_2 + t_3) and Pr(B_4) · t_3, and it is easy to see that this sum is equal to C_P(T(π)).

We learned that the P-cost of all decision trees realizing π is the same, and that it depends only on the partition π. We therefore define the P-cost, C_P(π), for a partition π by

C_P(π) = Σ_i Pr(B_i) · {Σ_{k∈I_i} t_k}.

In a way similar to the case of #-cost, the above definition can be extended to a nonrealizable partition π. To clarify the differing aspects of these three costs, they are summarized in the following proposition.

PROPOSITION 4.5.

1. For a partition π (∈ S_r), its #-cost C(π) is defined by C(π) = #(π) - 1. If π is realizable (∈ S_o), then all decision trees realizing π have the same #-cost, C(T(π)) = C(π) = #(π) - 1.

2. M-cost, C_M(T(π)), is defined for a realizable partition π (∈ S_o) and a decision tree T(π) realizing π, by C_M(T(π)) = Σ_i s_i, and it may vary for each tree.

3. For a partition π (∈ S_r), P-cost, C_P(π), is defined by C_P(π) = Σ_i Pr(B_i) · {Σ_{k∈I_i} t_k}. If π is realizable (∈ S_o), then all decision trees realizing π have the same cost C_P(T(π)) = C_P(π).
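Theorem 4.8 makes C_P computable directly from the blocks. The sketch below (Python) evaluates C_P(π) for the partition of Example 4.6; the probability and time values are illustrative assumptions:

```python
# Sketch of Theorem 4.8: P-cost from the blocks alone, without a tree.
t  = {1: 1.0, 2: 1.0, 3: 1.0}                  # illustrative times t_1..t_3
Pr = {('0','0','0'): 0.1, ('1','0','0'): 0.2,  # illustrative block probabilities
      ('-','1','0'): 0.3, ('-','-','1'): 0.4}

def p_cost(blocks):
    """Sum of Pr(B_i) times the t_k over the fixed coordinates I_i of B_i."""
    total = 0.0
    for block in blocks:
        I = [k + 1 for k, x in enumerate(block) if x != '-']
        total += Pr[block] * sum(t[k] for k in I)
    return total

print(p_cost(Pr))   # 0.1*3 + 0.2*3 + 0.3*2 + 0.4*1 = 1.9
```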
is 1 or (but not "-") in the n- tuple k i k B. = (x. , x. , .... x. ). l l ' l ' ' l 12 n The proof is emitted. 6o EXAMPLE k.6. To verify the theorem, consider the following partition, it, and the tree, T(rt), realizing rt. B 2 = ((-,1,0)} ((0,0,0)} {(1,0,0)} T(«) C (T(jt))can be calculated by its definition as follows, C p (T(n)) = Pr(B 1 ) • t 3 + Pr^ (t 3 + t 2 ) + Pr(B 3 ) (t^ + t 2 + t x ) + Pr(B^) ■ (t^ + tg + tj. On the other hand, the left hand side of the equality in the above theorem is calculated as follows. Indices, L , of B. for i = 1, 2, 3 and U are (1,2,3}, {1,2,3}, {2,3} and (3}, respectively. Then, the terms of the left hand side are Pr(B ) ■ (t + t + t ), Pr(B ) • (t- L + tg + t ), Pr(B ) • (t + t ) and Pr(B ]+ ) • t , and it is easy to see that this sum is equal to C (t(jt)) 61 We learned that the P-cost of all decision trees realizing tc is the same, and it depends only on the partition it. We define P-cost, C (jt), for a partition jt by C p (it) =2 Pr(B.)-(2 t }. i x i, k k In a way similar to the case of #-cost, the above definition can be extended to a nonrealizable partition, it. To clarify differing aspects of these three costs, they are summarized in the following proposition. PROPOSITION k.5- 1. For a partition, Tt( e S ), its #-cost, C(it) , is defined by C(jt) - #(«) - 1. If it is realizable (e S Q ), then all decision trees realizing it have the same #-cost, C(T(tt)) = C(n) = #(jt) - 1. 2. M-cost, C (T(rt)), is defined for a realizable partition, it(e Sq), and a decision tree, T(rt), realizing jt, by C.,(T(jt)) =2s., M i x and it may vary for each tree. 3. For a partition, jr(e S r ), P-cost, C (it), is defined by C p (*) = 2 Pr(B.) • (S t. }. i i k k If it is realizable, (e S ) , then all decision trees realizing 7t have the same cost C_,(T(jt)) = C (it). r r Now we can present the following proposition concerning the relationship between "cost inequality" and "partition inequality". 62 PROPOSITION k.6. 1. Assume jt , it e s and it < it . Then C(it 1 ) > C(it 2 ) and C p (it 1 ) > Cp(rtg). 2. Assume it , it G S^ and it < it . Let T and T be decision trees realizing it and it , respectively. Then, for some T ± and. T 2 , C M (T l (:r)) ^ C M (T 2 (jt)) holds, and. for other T and T , C M (T 1 (it)) < C M (T 2 (it)) holds. PROOF: 1. It is obvious from the definitions of #-cost and P-cost 2. We show the following example, it and it are both realizable and it < it holds. T (*) T (n ' 2 2- Since C M (T 1 (it 1 )) = 2 S;L + 2s 2 + s 3 and ^(T^)) = S;L + s 2 + 2^, the sign of C^^C^)) - C M (T 2 (it 2 )) = ( S;L + s 2 - b 3 ) may take positive or negative values (or zero). Q.E.D. Next we associate the above properties with procedures 63 constructing decision trees, i.e., Procedure R and its related algorithms. We recall that Procedure R generally constructs a decision tree realizing rt if it is realizable (esj. First we consider the set S of all realizable partitions. Propositions k.% and k.6. lead to the following theorem. THEOREM h.9. Assume jt is realizable (e S ) • Then, 1. Procedure R with the LCF policy always constructs an optimal decision tree for both #-cost and P-Cost. 2. However, Procedure R with the LCF policy may not construct an optimal decision tree for M-cost. PROOF: 1. If tt is realizable, then Procedure R with the LCF policy constructs a decision tree realizing rt and, according to Proposition h.5t its cost, C(t(jt)), (or C (t(tt)) is the same for any other decision trees. Proposition k.6. says that, for any jt'( C(tt), (or C -□(*') > C (tc)) holds. 
Therefore, the constructed tree is optimal. 2. Procedure R with the LCF policy always constructs a decision tree realizing a partition it if it is realizable. However, Proposition k.6, says that there exists a partition tt'(< it) and tree T'(Tt') such that C(T' (it')) < C (T(jt)). Therefore, T(jt) may not be optimal. Q.E.D. The above theorem says that, if a given partition, it, is realizable, Procedure R with the LCF policy works best for #-cost and p-cost but not for M-cost. For M-cost, there may exist an optimal decision tree realizing it' (jt'< jt) for a given partition it. 6k Although the following modification of the LCF policy does not guarantee optimality, it suggests, however, a simple and reasonable way of selecting a condition, C , at each step of Procedure R for M-cost. Modified Lossf ree-Condltion-First (MLCF) Policy At each step of Procedure R, 1) if there exists only one lossfree condition, C, then choose it or 2) if there are several such conditions, then choose C. whose ' 1 s . is a maximum. 1 If this is applied to a realizable partition, it, a decision tree, T(it), realizing it (not it' such that it' < it) is constructed. Of course, it is an optimal tree for #-cost and P-cost. Since M-cost, in general, decreases if a condition, C, with a larger s. is chosen at a higher level of the tree, the MLCF policy constructs a near optimal tree and it is the best among all decision trees realizing the realizable partition it. So far we have shown how Procedure R and its related LCF policy work for a realizable partition. Next we consider the set S of all rule partitions. Our object is to find a realizable partition, it', for a given nonrealizable partition, it(eS r )^ such that it' < it and C(it') (or C (it')) is minimized. The following theorem defines where such an optimal realizable partition it' exists in S for a given nonrealizable partition, it, in S . We need the following terminology. 65 DEFINITION k.$. A subset B of a partially ordered set C is said to be maximal in C if and only if, for all x e B and all y e C, either x > y holds or else x and y are incomparable. THEOREM 4.10. For a nonrealizable partition, it, assume an optimal decision tree realizes a partition, it', (it' < it f it'€ S and it e S ) for #-cost or P-cost. Then such it' must be in a maximal set of S. D S , where S it 7 it is the set of all partitions a such that a < it, i.e., S = [a € S | a < it}. PROOF : it' should be in S . If there exists a realizable partition it" such that it" > it', then Proposition k.6. says that C(it") < C(it') and C (it") < C (it'). This contradicts the fact that it' is a partition P - P realized by optimal decision tree. Q.E. D. According to the above theorem, an optimal solution is a max- imal element of S D S . Therefore, we can find an optimal solution for P-cost in this way: First we neglect all probabilities, Pr(v), and time^ t., required to process C. Enumerate all -elements in the maximal set of all realizable partitions which are less than or equal to a given parti- tion, it. Then, calculate P-cost of all these elements. An optimal solution is one of these elements whose P-cost is a minimum. REMARK A partition realized by an optimal decision tree for M-cost may not be in this maximal set. 66 This procedure which is based on Theorem 4.10 ., however, is impractical because the enumeration of all elements of S (IS is rather exhaustive. As an alternative, in the following section, we modify the iterated local minimization for #-cost (which was proposed in Section 4.2.) 
THEOREM 4.11. Assume Procedure R is applied to a partition $\pi_2$, and the resultant decision tree realizes a partition $\pi_1$ ($\pi_1 \le \pi_2$). Then

$$C_P(\pi_1) - C_P(\pi_2) = \sum_i \ell_P(C_{s_i}, \sigma_i),$$

where $C_{s_i}$ is the condition chosen at the $i$-th step of Procedure R and $\sigma_i$ is the partition being worked with at that step.

To verify this theorem, consider an example. [Figure: a partition $\pi_2$, the tree constructed from it by Procedure R, and the realized partition $\pi_1$; the blocks of $\pi_1$ arising from splits of blocks of $\pi_2$ are primed.] According to Theorem 4.8, $C_P(\pi_1)$ can be calculated term by term from the tree, and $C_P(\pi_2)$ directly from the definition of P-cost. Subtracting the two sums, all terms cancel except those contributed by the blocks which were split, and the difference $C_P(\pi_1) - C_P(\pi_2)$ reduces to a sum of products of the form $t_s \cdot \Pr(\cdot)$, one for each splitting. On the other hand, this is exactly the total loss $\sum_i \ell_P(C_{s_i}, \sigma_i)$ incurred by the conditions chosen in the course of Procedure R, which verifies the theorem.

5. A DECOMPOSITION THEORY OF DECISION TABLES AND DECISION TREES

5.1. Introduction

Each of the $k$ processors evaluates only the conditions of its own, smaller decision tree, and the intersection of the blocks identified by the individual processors determines a rule of the original partition. This scheme can be visualized by the sketch of Figure 5.1; the intersection operation can be realized by a simple "logical AND" function.

[FIGURE 5.1. The INPUT is fed to $k$ processors in parallel; their outputs are combined by the intersection (logical AND) into the OUTPUT.]

We note that each processor deals with a smaller decision tree, and hence the average processing time of this scheme might be shorter than that for the single processor case. In the succeeding sections of this chapter, a decomposition problem of a partition, which can be applied to the parallel processing of a decision table, will be formulated, and some objective functions for efficient decompositions will be introduced. After a theoretical analysis of this problem, discussions of the construction of a pair of decision trees for a given partition are developed. Based on a procedure, called Procedure D, a heuristic algorithm for efficient decompositions is also presented in the last section.
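As an illustration of the scheme of Figure 5.1, the following toy sketch lets two "processors" classify an input vertex independently, each using only its own conditions, and intersects the rule sets they return; that intersection is the logical AND of the figure. The partitions, the rule names, and the division of conditions here are invented for the illustration.

```python
# A toy sketch of the parallel scheme of Figure 5.1 with k = 2.  Each
# processor returns the set of rules compatible with its own smaller
# decision tree; intersecting the two answers (the "logical AND")
# identifies the rule of the original partition pi.

def processor_1(x):                  # examines condition C1 only
    return {"R1", "R2"} if x[0] == 0 else {"R3", "R4"}

def processor_2(x):                  # examines conditions C2 and C3 only
    if x[1] == 0:
        return {"R1", "R3"} if x[2] == 0 else {"R2", "R3"}
    return {"R2", "R4"}              # hypothetical rule sets

def identify(x):
    # In a real dual-processor scheme the two calls run concurrently.
    return processor_1(x) & processor_2(x)

for v in [(0, 0, 0), (1, 0, 1), (1, 1, 0)]:
    print(v, identify(v))            # each intersection is a single rule
```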
5.2. Decomposition Problem and Objective Functions

First we define the notion of a decomposition of a partition.

DEFINITION 5.1. A set of $n$-cube partitions $\pi_i$ ($i = 1, 2, \ldots, k$) is called a k-decomposition (or simply decomposition) of an $n$-cube partition $\pi$ if and only if

$$\prod_{i=1}^{k} \pi_i \le \pi.$$

This is a necessary and sufficient condition for our parallel processing scheme to work. Our decomposition problem, then, is how to find a decomposition $\{\pi_i\}$ for a given partition $\pi$ so that we may do parallel processing effectively. As a measure of efficiency for decompositions, the following two objective functions are introduced. They can be understood as extensions of the conventional cost of a decision tree discussed in the previous chapter.

OBJECTIVE FUNCTION 1. For a decomposition $\{\pi_i\}$ of $\pi$, objective function 1, $C_1(\pi_1, \pi_2, \ldots, \pi_k)$, is defined by

$$C_1(\pi_1, \pi_2, \ldots, \pi_k) = \#\Bigl(\prod_{i=1}^{k} \pi_i\Bigr),$$

where $\#(\pi)$ is the number of blocks of $\pi$.

OBJECTIVE FUNCTION 2. For a decomposition $\{\pi_i\}$ of $\pi$, objective function 2, $C_2(\pi_1, \pi_2, \ldots, \pi_k)$, is defined by

$$C_2(\pi_1, \pi_2, \ldots, \pi_k) = \sum_{i=1}^{k} \{\#(\pi_i) - 1\}.$$

In Example 5.3 both cost functions are illustrated. The first objective function $C_1$ is the number of blocks of the partition $\prod_{i=1}^{k} \pi_i$. This number is related to the distance from $\pi$ to $\prod_{i=1}^{k} \pi_i$ on the lattice of all $n$-cube partitions, since $\#(\prod_i \pi_i) - \#(\pi)$ is the number of blocks which are split. The second one is based on more practical considerations: it corresponds to the sum of the numbers of internal nodes of all the decision trees, or equivalently to the total memory space requirement, since $\#(\pi_i) - 1$ is proportional to the storage space required for the $i$-th processor. (Each internal node of a decision tree is supposed to require one unit of storage space.) There is another objective function, corresponding to the average processing time; its introduction would require the probabilities of occurrence of the rules, and it will not be considered in this thesis.

So far, we have given the definition of a decomposition and have introduced two objective functions. The method of decomposition will be restricted, as shown later. In order to explain the motivation for such a restriction, recall Example 5.1. The idea shown there immediately leads to the following statement.

PROPOSITION 5.1. Assume that a decision tree $T$ realizes an $n$-cube partition $\pi$. Then the set of subtrees $T_i$ ($i = 1, 2, \ldots, k$) of $T$ obtained by removing any set of $(k-1)$ edges of $T$ induces a k-decomposition, i.e.,

$$\prod_{i=1}^{k} \pi_i \le \pi,$$

where each $\pi_i$ ($i = 1, 2, \ldots, k$) is the $n$-cube partition realized by $T_i$.

To clarify this proposition, we use Example 5.1. To identify a particular rule $R$, it is sufficient to be given the answers (i.e., "Yes" or "No") of the internal nodes along the path from the root to $R$; in that example, the trees $T_1$ and $T_2$ together give such information. It is easy to see that the resultant set of trees $T_i$ obtained by the method generally provides more than such information. This means that if $k$ processors are available, then we may decompose the original tree $T$ into $k$ subtrees $T_1, T_2, \ldots, T_k$ so that our scheme works. In other words, by collecting the intermediate results from these processors and multiplying them, we can identify a rule of the original $\pi$. Using the above result of Proposition 5.1, one more property can be given as follows.

PROPOSITION 5.2. For a given realizable partition $\pi$, we can always achieve the inequality

$$\min_{\prod_i \pi_i \le \pi} C_2(\pi_1, \pi_2, \ldots, \pi_k) \le \#(\pi) - 1$$

for arbitrary $k$ ($k = 1, 2, \ldots$).

PROOF: Assume a decision tree $T$ realizes $\pi$. Applying the method of Proposition 5.1 to $T$ yields $k$ trees $T_i$ ($i = 1, 2, \ldots, k$) which realize partitions $\pi_i$ such that $\prod_{i=1}^{k} \pi_i \le \pi$. Then we can derive the following formulae:

$$C_2(\pi_1, \pi_2, \ldots, \pi_k) = \sum_{i=1}^{k} \{\#(\pi_i) - 1\} = \sum_{i=1}^{k} \{\text{the number of internal nodes of } T_i\} = \text{the number of internal nodes of } T = \#(\pi) - 1.$$

In other words, if $\pi$ is realizable, then there exists at least one k-decomposition of $\pi$ satisfying the expression in the statement. Q.E.D.

This proposition shows that the total memory requirement for our parallel processing scheme is at most the memory required for the case of a single processor.
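The two objective functions are straightforward to evaluate once partitions are represented explicitly. The sketch below encodes a partition as a list of blocks (frozensets of vertices) and forms the product $\prod_i \pi_i$ as the common refinement; the 2-cube decomposition used at the end is a made-up example, not one from the thesis.

```python
# A sketch of Definition 5.1 and the objective functions C1 and C2.
from itertools import product

def refine(p1, p2):
    """The product p1 . p2: all nonempty intersections of blocks."""
    return [b1 & b2 for b1, b2 in product(p1, p2) if b1 & b2]

def C1(parts):
    prod = parts[0]
    for p in parts[1:]:
        prod = refine(prod, p)
    return len(prod)                       # #(product partition)

def C2(parts):
    return sum(len(p) - 1 for p in parts)  # total internal nodes

# A 2-decomposition of a 2-cube partition (hypothetical):
pi1 = [frozenset({(0, 0), (0, 1)}), frozenset({(1, 0), (1, 1)})]  # by C1
pi2 = [frozenset({(0, 0), (1, 0)}), frozenset({(0, 1), (1, 1)})]  # by C2
print(C1([pi1, pi2]), C2([pi1, pi2]))      # -> 4 2
```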
A similar result can be obtained for the average processing time, but it will not be presented here. These facts show the advantages of parallel processing, i.e., a smaller storage requirement and a shorter processing time. The method of decomposition stated in Proposition 5.1 is a useful tool for theoretical analysis: we could, for example, give the very basic result in Proposition 5.2. It has, however, a practical deficiency, as the following example shows.

EXAMPLE 5.2. [Figure: a decision tree $T$ and the subtrees $T_1$, $T_2$, $T_3$ obtained from it.] We assume three processors are available. Removing the edges of $T$ between $C_1$ and both $C_2$'s results in three isolated subtrees $T_1$, $T_2$ and $T_3$. Since $T_2$ and $T_3$ are the same, however, we actually need only $T_1$ and $T_2$ (or $T_3$); in other words, two processors are enough for this decomposition. It is concluded that decomposing a decision tree by removing its edges is impractical.

In the above example, $T_2$ is completely the same as $T_3$, and the set of $T_1$, $T_2$ and $T_3$ can be considered a redundant decomposition. Similar redundancy can be found, to some extent, in Example 5.1 also: identical conditions appear in more than one of the subtrees. Generally, we may need, for a better decomposition, this kind of small redundancy, where the same conditions appear in different subtrees. However, for the establishment of our simple decomposition theory, we exclude such redundancy, i.e., we put the following restriction on our decomposition problems.

RESTRICTION. A condition $C_i$ is processed by one and only one processor, i.e., a condition $C_i$ appears in only one of the subtrees.

For simplicity of the theoretical analysis of decomposition problems, we consider only the case $k = 2$, that is, the dual processor case. However, the results in the remainder of this thesis can be easily extended to the general case. We introduce the following terminology.

DEFINITION 5.2. If a pair of decision trees satisfies the Restriction, they are called an orthogonal pair of decision trees.

DEFINITION 5.3. Let the $C_i$'s be conditions corresponding to coordinates of an $n$-cube partition $\pi$. Then an orthogonal decomposition of $\pi$ is a pair of $n$-cube partitions $\pi_1$ and $\pi_2$ such that 1) $\pi_1 \cdot \pi_2 \le \pi$ and 2) $S_1 \cap S_2 = \emptyset$, where $S_1$ and $S_2$ are the sets of essential conditions of $\pi_1$ and $\pi_2$, respectively.

For an orthogonal pair of decision trees $T_1$ and $T_2$, $T_1 * T_2$ denotes the tree obtained by replacing every terminal node of $T_1$ by a copy of $T_2$; correspondingly, for a block $\alpha_i$ of the partition $\alpha$ realized by $T_1$, $\alpha_i * \beta$ denotes the refinement of $\alpha_i$ by the partition $\beta$ realized by $T_2$. Theorem 5.1 states that $T_1 * T_2$ then realizes the multiplication $\alpha \cdot \beta$. Because the coordinate sets involved may differ, the refinement $\alpha_i * \beta$ is defined through the degenerated and extended forms of $\beta$, as the following illustration shows. Suppose $\alpha$ is a 3-cube partition whose coordinate set is $U_\alpha = \{C_1, C_2, C_3\}$, and $\beta$ is a 4-cube partition whose essential coordinate set $S_\beta$ consists of two members of $U_\alpha$. Then $\beta$ is essentially a 2-cube partition, and its degenerated form $\beta^*$ is shown in Figure 5.5. Since $S_{\beta^*} \subseteq U_\alpha$ holds, $\beta^*$ can be extended to the 3-cube partition $\tau$ of Figure 5.6 by adding the coordinate of $U_\alpha - S_{\beta^*}$. Finally, $\alpha * \beta = \alpha \cdot \tau$ is shown in Figure 5.7.

[Figures 5.5, 5.6 and 5.7 show $\beta^*$, its extension $\tau$, and the product $\alpha \cdot \tau$, respectively.]

NOTE. In most cases in this thesis, $\alpha_i$ is simply a cube itself (as seen in this example) rather than a general cube partition; in other words, it is a partition consisting of only one block.

Now we give the proof of Theorem 5.1.

PROOF: By replacing the $i$-th terminal node of $T_1$ by a subtree $T_2$, its corresponding block $\alpha_i$ of $\alpha$ is refined by $\beta$. ($\alpha_i * \beta$ is well-defined because of the orthogonality of $T_1$ and $T_2$.) Performing this replacement for all $i$, every block $\alpha_i$ of $\alpha$ is refined into $\alpha_i * \beta$ by $\beta$. This implies that the terminal nodes of $T_1 * T_2$ correspond to the blocks of the $\alpha_i * \beta$ in a one-to-one manner. Since a block of $\alpha_i * \beta$ is a block of $\alpha \cdot \beta$, $T_1 * T_2$ realizes $\alpha \cdot \beta$. Q.E.D.
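The composition $T_1 * T_2$ of the proof can be sketched directly in the tree encoding used earlier: grafting a copy of $T_2$ onto every terminal node of $T_1$ produces one terminal node per block of $\alpha \cdot \beta$. The trees below are assumed examples, not taken from the thesis.

```python
# A minimal sketch of T1 * T2: every terminal node of T1 is replaced by
# a copy of T2.  Trees are ("leaf", label) or (c, low, high), where c
# names the condition C_c tested at the node.

def graft(t1, t2):
    """Return T1 * T2."""
    if t1[0] == "leaf":
        return t2                      # trees are immutable tuples here,
                                       # so sharing t2 acts as a copy
    c, low, high = t1
    return (c, graft(low, t2), graft(high, t2))

# An orthogonal pair: T1 tests C1 only, T2 tests C2 only.
T1 = (1, ("leaf", "a1"), ("leaf", "a2"))
T2 = (2, ("leaf", "b1"), ("leaf", "b2"))
print(graft(T1, T2))   # four terminal nodes, one per block of alpha.beta
```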
Since there is no essential difference between the roles of $\alpha$ and $\beta$ (or of $T_1$ and $T_2$), $T_1 * T_2$ and $T_2 * T_1$ realize the same partition $\alpha \cdot \beta$.

COROLLARY 5.2. Let a pair of decision trees $T_1$ and $T_2$ be orthogonal and realize the partitions $\alpha$ and $\beta$, respectively. Then $T_1 * T_2$ and $T_2 * T_1$ realize the multiplication $\alpha \cdot \beta$.

Next we derive a necessary and sufficient condition that an orthogonal pair of decision trees realize a decomposition of a given partition $\pi$. Its fundamental result will be seen in Lemma 5.1; Theorem 5.3 is the final and complete statement of the necessary and sufficient condition. Based on Theorem 5.3, the synthesis problem, i.e., constructing an orthogonal decomposition for a given partition $\pi$, will be discussed in the succeeding sections.

Assume the following orthogonal pair of decision trees $T_1$ and $T_2$, and let $\alpha$ and $\beta$ be the partitions which are realized by $T_1$ and $T_2$, respectively. [Figure.]

In order to analyze the relationship between $\alpha \cdot \beta$ and $\pi$, recall Procedure R of Chapter 4. Suppose that we are constructing a decision tree for a given $\pi$; $T_1$ is considered as a growing (intermediate) decision tree under construction by Procedure R (i.e., $T_1$ is partially constructed by some steps of Procedure R), and then we associate a partition $\pi_i$ with the $i$-th terminal node of $T_1$. $\pi_i$ is the partition which is to be realized by a forthcoming possible subtree rooted at that terminal node of $T_1$. (We note that the $\pi_i$'s are different from the $\alpha_i$'s; $\alpha_i$ is simply a subcube, a block of $\alpha$.) Then the following Lemma 5.1 is obtained.

LEMMA 5.1. Assume that $T_1$ and $T_2$ are orthogonal and realize $\alpha$ and $\beta$, respectively. Then a necessary and sufficient condition for the pair $(\alpha, \beta)$ to be an orthogonal decomposition of $\pi$ is

$$\alpha_i * \beta \le \pi_i \quad \text{for every } i.$$

PROOF: First we prove the necessity by contradiction. Assume that $\alpha_i * \beta \le \pi_i$ does not hold for a particular $i$. Then there exists a pair of 0-cubes $a$ and $b$ such that 1) $a$ and $b$ are in one block of $\alpha_i * \beta$, and 2) $a$ is in a block of $\pi_i$ but $b$ is in another block of $\pi_i$. Point 2) means that $a$ and $b$ are in different blocks of $\pi$, since two different blocks of $\pi_i$ are in different blocks of $\pi$. Then we are led to the fact that $a$ and $b$ are in one block of $\alpha_i * \beta$, i.e., in one block of $\alpha \cdot \beta$, but not in one block of $\pi$. This shows that $\alpha \cdot \beta \le \pi$ does not hold. Next the sufficiency is proved. If $\alpha_i * \beta \le \pi_i$ for every $i$, then a block of $\alpha_i * \beta$, which is a block of $\alpha \cdot \beta$ for some $i$, is in a block of $\pi_i$; a block of $\pi_i$, however, is in a block of $\pi$. Therefore $\alpha \cdot \beta \le \pi$ holds, that is, $\alpha$ and $\beta$ are an orthogonal decomposition of $\pi$. Q.E.D.

EXAMPLE 5.6. To verify the above statement, we use the example from case (b) of Example 5.3, and also refer to Example 5.4. If we assume that $T_1$ is a decision tree partially constructed by Procedure R, we can associate the partitions $\pi_1$, $\pi_2$ and $\pi_3$ with the respective terminal nodes of $T_1$, as shown in Figure 5.8.

[FIGURE 5.8. The partitions $\pi_1$, $\pi_2$, $\pi_3$ associated with the terminal nodes of $T_1$.]

As we have shown in Example 5.4, each $\alpha_i * \beta$ can be computed; it is obvious that the inequality $\alpha_i * \beta \le \pi_i$ holds for every $i$ ($\alpha_1 * \beta \le \pi_1$, $\alpha_2 * \beta \le \pi_2$, $\alpha_3 * \beta \le \pi_3$), and we can conclude that this $(\alpha, \beta)$ is an orthogonal decomposition of $\pi$. If we modify the tree $T_2$ into $T_2'$ as shown in Figure 5.9, then the $\alpha_i * \beta'$ are obtained in a similar way.

[FIGURE 5.9. $T_2'(\beta')$ and the refinements $\alpha_1 * \beta' \le \pi_1$, $\alpha_2 * \beta' \le \pi_2$, $\alpha_3 * \beta' \le \pi_3$.]

All $\alpha_i * \beta'$ are less than or equal to their corresponding $\pi_i$'s. Therefore, $(\alpha, \beta')$ is another decomposition of $\pi$. $\alpha \cdot \beta'$ is shown in Figure 5.10.

[FIGURE 5.10. The product $\alpha \cdot \beta'$.]

Next we present some properties which allow Theorem 5.3 to be stated more neatly.
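The inequality of Lemma 5.1 (in the simplified form $\beta \le \pi_i$ that Theorem 5.3 will justify) is a refinement test, sketched below with blocks again represented as frozensets of vertices; the small partitions at the end are assumed examples.

```python
# A sketch of the test behind Lemma 5.1 and Theorem 5.3: "p <= q" means
# that every block of p lies inside some block of q.

def refines(p, q):
    """True iff partition p <= partition q."""
    return all(any(b <= c for c in q) for b in p)

def is_orthogonal_decomposition(beta, pis):
    """Theorem 5.3, part 2: beta <= pi_i for every terminal node i."""
    return all(refines(beta, pi) for pi in pis)

beta = [frozenset({0, 1}), frozenset({2, 3})]
pi   = [frozenset({0, 1}), frozenset({2}), frozenset({3})]
print(refines(beta, pi))   # False: the block {2, 3} splits blocks of pi
```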
PROPERTY 5.1.
1) $\alpha_i$ and $\pi_i$ always have the same coordinate sets, i.e., $U_{\alpha_i} = U_{\pi_i}$ for every $i$.
2) $\alpha_i * \beta$ is always well-defined.

PROOF: 1) is obvious, and its proof is omitted. In order to prove 2), it is sufficient to show that $S_\beta \subseteq U_{\alpha_i}$ holds, where $S_\beta$ is the set of essential conditions of $\beta$. Let $S_\alpha$ be the set of essential conditions of $\alpha$, and let $V_i$ be the set of those conditions which appear along the path from the root to the $i$-th terminal node of $T_1$. Then $V_i \subseteq S_\alpha$ holds. Therefore the coordinate set $U_{\alpha_i}$ of $\alpha_i$ satisfies

$$U_{\alpha_i} = \{\text{all conditions}\} - V_i = (S_\alpha \cup S_\beta) - V_i \supseteq S_\beta,$$

since $V_i \subseteq S_\alpha$. In other words, a refinement of $\alpha_i$ by $\beta$ can be defined. Q.E.D.

PROPERTY 5.2. Suppose that $T_1$ and $T_2$ realize an orthogonal decomposition $(\alpha, \beta)$ of $\pi$. If the $i$-th terminal node of $T_1$ is located at the $\ell$-th level of $T_1$ (i.e., $\ell$ internal nodes exist from the root to this node) and we let $V_i$ be the set of those conditions which correspond to these $\ell$ internal nodes, then the conditions in $S_\alpha - V_i$ are inessential to $\pi_i$. In other words, if the conditions $C_p, C_q, \ldots, C_r$ appear along the path from the root to the $i$-th terminal node of $T_1$, then the conditions of $S_\alpha - \{C_p, C_q, \ldots, C_r\}$ are inessential to $\pi_i$.

LEMMA 5.2. Make the same assumption as in Property 5.2, and let $S_{\pi_i}$ be the set of essential conditions of $\pi_i$. Then all the $S_{\pi_i}$ are the same set for every $i$, and are equal to $S_\beta$.

PROOF: As we have seen in Property 5.1, the coordinate set $U_{\pi_i}$ of $\pi_i$ is $U_{\pi_i} = S_\beta \cup S_\alpha - V_i$. Property 5.2, however, says that the conditions in $S_\alpha - V_i$ are inessential to $\pi_i$. Therefore the essential coordinate set $S_{\pi_i}$ of $\pi_i$ is

$$S_\beta \cup S_\alpha - V_i - (S_\alpha - V_i) = S_\beta. \qquad \text{Q.E.D.}$$

This Lemma 5.2 indicates that if $T_1$ and $T_2$ realize an orthogonal decomposition $(\alpha, \beta)$ of $\pi$, then all the partitions $\pi_i$, together with $\beta$, are essentially $\#(S_\beta)$-cube partitions and have the same set of essential conditions. If we define multiplication and inequality among partitions of different sizes which have the same set of essential conditions (as a natural extension of conventional multiplication and inequality between two partitions of the same size), the condition $\alpha_i * \beta \le \pi_i$ in Lemma 5.1 can be rewritten simply as $\beta \le \pi_i$. This is obvious, since $\alpha_i$ is a subcube and acts only to adjust the difference between the coordinate sets of $\beta$ and $\pi_i$; there is no difference between $S_\beta$ and $S_{\pi_i}$, so we can compare $\beta$ and $\pi_i$ in terms of this newly defined inequality.

THEOREM 5.3. Assume that $T_1$ and $T_2$ are orthogonal and realize $\alpha$ and $\beta$. Then,
1) for this pair $(\alpha, \beta)$ to be an orthogonal decomposition of $\pi$, $S_{\pi_i} = S_\beta$ must hold for all $i$, and
2) a necessary and sufficient condition for this $(\alpha, \beta)$ to be an orthogonal decomposition of $\pi$ is that $\beta \le \pi_i$ holds for all $i$.

PROOF: 1) is the same as Lemma 5.2. 2) is derived from Lemma 5.1 and Lemma 5.2. Q.E.D.

This theorem is the main result of this section. Based on this analysis, in the following section we develop the synthesis problem of decompositions.
5.4. Synthesis of Orthogonal Decompositions

In this section a procedure, called Procedure D, to construct orthogonal decompositions of $\pi$ is shown. Theorem 5.3 plays a key role in that procedure: once $T_1$ is given and the $\pi_i$ are determined, we check whether the essential condition sets $S_{\pi_i}$ are all the same or not. If they are identical sets, then $\beta$, the counterpart of $\alpha$, can be determined by

$$\beta = \prod_{\text{all } i} \pi_i.$$

We always use the equality $\beta = \prod_{\text{all } i} \pi_i$ instead of the inequality $\beta \le \prod_{\text{all } i} \pi_i$, since the maximum element over all $\beta'$ such that $\beta' \le \prod_i \pi_i$ is $\prod_i \pi_i$ itself, and this $\beta$ is always at least as good as any $\beta'$ ($\beta' < \beta$) for the objective functions previously defined. That is:

PROPOSITION 5.5. If $(\alpha, \beta)$ is an orthogonal decomposition of $\pi$, then so is $(\alpha, \beta')$ for any $\beta'$ such that $\beta' \le \beta$. Furthermore, the following two inequalities concerning the objective functions hold: $C_1(\alpha, \beta) \le C_1(\alpha, \beta')$ and $C_2(\alpha, \beta) \le C_2(\alpha, \beta')$.

The proof is omitted since it is obvious. Now consider an example to illustrate Procedure D; a rigorous description of the procedure will be given later.

EXAMPLE 5.7. Assume the 4-cube partition $\pi$ shown in Figure 5.11. If $T_1$ is the decision tree with the root $C_3$ only, the counterpart $\beta$ is determined by $\beta = \pi_1 \cdot \pi_2$, and it is also shown. $\alpha \cdot \beta$ is also shown in Figure 5.11, and it is verified that $\alpha \cdot \beta \le \pi$ holds.

[FIGURE 5.11. The partition $\pi$, the trees $T_1(\alpha)$ and $T_2(\beta)$, and the product $\alpha \cdot \beta$.]

Now try to expand the tree $T_1$ into the $T_1'$ of Figure 5.12(a) by replacing the two terminal nodes of $T_1$ by two $C_4$'s. All four $\pi_j'$'s are also shown, and their essential condition sets $S_{\pi_j'}$ are all the same, i.e., $S_{\pi_j'} = \{C_1, C_2\}$. The counterpart $\beta'$ of $\alpha'$ is determined by $\beta' = \prod_{j=1}^{4} \pi_j'$ and is also given in Figure 5.12(a); we can construct the decision tree $T_2'$ realizing $\beta'$.

[FIGURE 5.12(a). $T_1'(\alpha')$, the partitions $\pi_1', \ldots, \pi_4'$, and $T_2'(\beta')$ with $\beta' = \prod_{j=1}^{4} \pi_j'$.]

Then we note that, instead of using $\beta' = \prod_{j=1}^{4} \pi_j'$ for determining $\beta'$, it can alternatively be derived by $\beta' = \beta_1 \cdot \beta_2$, where $\beta_1$ and $\beta_2$ are the two partitions obtained by removing the coordinate $C_4$ from $\beta$. This process is shown in Figure 5.12(b).

[FIGURE 5.12(b). $\beta' = \beta_1 \cdot \beta_2$.]

So far we have obtained two orthogonal decompositions, $(\alpha, \beta)$ and $(\alpha', \beta')$, of $\pi$. $T_1'$ is obtained by expanding the tree $T_1$ in such a way that the two terminal nodes of $T_1$ are replaced by a condition $C_4$; this fact implies $\alpha' < \alpha$. Moreover, the following two points should be noted regarding this step: 1) the two terminal nodes of $T_1$ are replaced by the same condition $C_4$, not by different conditions (say, $C_1$ for one node and $C_4$ for the other), and 2) $\beta'$ can be determined by $\beta' = \beta_1 \cdot \beta_2$ as an alternative to $\beta' = \prod_j \pi_j'$.

We know that the pair $(I, \pi)$ is a trivial orthogonal decomposition of $\pi$. So this means we have obtained the following sequence of orthogonal decompositions of $\pi$: $(I, \pi) \to (\alpha, \beta) \to (\alpha', \beta')$. If we consider the process from $(I, \pi)$ to $(\alpha, \beta)$ as choosing the coordinate $C_3$ from $\pi$ and transplanting it into the null tree (realizing the $I$-partition), it forms $T_1$; and the corresponding $\beta$ can be determined by $\beta = \pi_1 \cdot \pi_2$, where $\pi_1$ and $\pi_2$ are the partitions obtained by removing the $C_3$ coordinate from $\pi$. In a similar way, the second step from $(\alpha, \beta)$ to $(\alpha', \beta')$ can be considered as selecting the condition $C_4$ from $\beta$ and transplanting it at the two terminal nodes of $T_1$; its counterpart $\beta'$ is then determined by $\beta' = \beta_1 \cdot \beta_2$, where $\beta_1$ and $\beta_2$ are the two partitions obtained by removing the $C_4$ coordinate from $\beta$. Continuing this process $n$ times, we obtain the following sequence of decompositions of $\pi$:

$$(I, \pi) \to (\alpha, \beta) \to (\alpha', \beta') \to (\alpha'', \beta'') \to \cdots \to (\alpha^{(n)}, I),$$

the steps being taken by $C^{(1)}$, $C^{(2)}$, and so on, where $I \ge \alpha \ge \alpha' \ge \alpha'' \ge \cdots$.

Then a question arises concerning point 1): why should we choose the same condition $C_4$ to be transplanted at the two terminal nodes of $T_1$?
To answer this question, choose $C_1$ for the left and $C_4$ for the right terminal node of $T_1$, and let this expanded tree be denoted by the $T''$ of Figure 5.13.

[FIGURE 5.13. $T_1''(\alpha'')$ and the partitions $\pi_1'', \ldots, \pi_4''$ at its terminal nodes.]

Immediately we see the contradiction to the statement of the necessary condition in Theorem 5.3. In practice, $\pi_1''$ and $\pi_2''$ have the essential conditions $C_2$ and $C_4$; on the other hand, $\pi_3''$ and $\pi_4''$ have $C_1$ and $C_2$ as their essential conditions. Since $S_{\alpha''} \cup S_{\beta''} = \{$all conditions $C_1$ through $C_4\}$ and $S_{\alpha''} = \{C_1, C_3, C_4\}$, $S_{\beta''} = \{C_2\}$ would have to hold. Therefore we can conclude that this $\alpha''$ realized by $T''$ cannot have a counterpart $\beta''$ such that the pair is an orthogonal decomposition of $\pi$.

As we have shown in the above example, generally only one condition can be chosen and transplanted at all the terminal nodes of $T_1$ in order to form the next tree $T_1'$. (This is also generally true for any step from $(\alpha^{(i)}, \beta^{(i)})$ to $(\alpha^{(i+1)}, \beta^{(i+1)})$.)

There is another aspect of this situation, however. It seems to be just a special case, but it is actually an important one, and it is explained in the following. Assume the step from $(\alpha, \beta)$, and suppose we choose the condition $C_1$ to be transplanted at the two terminal nodes of $T_1$; its corresponding tree $T_1^*$ is as in Figure 5.14(a).

[FIGURE 5.14(a). $T_1^*(\alpha^*)$ with the partitions $\pi_1^*, \ldots, \pi_4^*$ and $\beta^* = \prod_{j=1}^{4} \pi_j^*$.]

Then the counterpart $\beta^*$ can be determined by $\prod_{j=1}^{4} \pi_j^*$. We note, however, that $\pi_1$ was essentially a 2-cube partition with $S_{\pi_1} = \{C_2, C_4\}$, so that $C_1$ is not essential to it. Also, $\pi_1^* = \pi_2^*$ holds, where $\pi_1^*$ and $\pi_2^*$ are obtained by removing the coordinate $C_1$ from $\pi_1$. These facts suggest taking the left descendant node $C_1$ out of $T_1^*$ and associating $\pi_1^*$ with the original left terminal node of $T_1$, as in Figure 5.14(b).

[FIGURE 5.14(b). The modified $T_1^*$.]

We have now learned that the existence of a condition $C_1$ inessential to $\pi_1$ causes some modification to our procedure: if this $C_1$ is chosen, $\pi_1$ degenerates to the "one-size smaller" partition $\pi_1^*$ ($= \pi_2^*$) without this $C_1$ being transplanted at the corresponding terminal node of $T_1$.
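Whether a chosen condition is essential to a given $\pi_j$ can be tested mechanically. The sketch below (using the same frozenset encoding and a hypothetical example) states one such test, namely that $C_c$ is inessential exactly when every block is closed under complementing the $c$-th coordinate; this is offered as an interpretation of essentiality, not as the thesis's own formulation.

```python
# A sketch of the essentiality test used in steps b-1) / b-2) below:
# C_c is inessential to pi when no block of pi separates x_c = 0 from
# x_c = 1, i.e., every block is closed under flipping coordinate c.

def is_essential(c, pi):
    def flip(v):                      # complement the c-th coordinate
        return v[:c] + (1 - v[c],) + v[c + 1:]
    return not all(flip(v) in block for block in pi for v in block)

# pi separates x_1 = 0 from x_1 = 1, so C_1 is essential and C_2 is not
# (coordinates are 0-based here: index 0 stands for C_1).
pi = [frozenset({(0, 0), (0, 1)}), frozenset({(1, 0), (1, 1)})]
print(is_essential(0, pi), is_essential(1, pi))   # True False
```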
by b-2) form the new set of t^ 1+ for all terminal nodes of the ex- panded tree T (i+D 1 d) Split the (n-i)-cube partition P into two (n-i-l)-cube partitions p and £>^ by removing the C - coordinate from ^ X \ 105 Let P (i+1) =f4 l} • $f>. 3) Repeat the above process 2) for i = 0,l,2,,,n-l. EXAMPLE 5.8. Readers are encouraged to verify the above by Example 5-7. There we had chosen C as C at i = and a (T , T ) pair was con- structed which realized an orthogonal decomposition, {a, P), of n. At the next step, i = 1, if we chose C = C = CV , then, the pair (T' , T') realizing (a', P') would be obtained. On the other hand, if we (2) chose C = C alternatively, Procedure D would yield the pair (T*, T*) realizing (c#, ft*) . In the next theorem, we will show that Procedure D guarantees the generation of a series of n orthogonal decompositions of n. THEOREM 5.k. The Procedure D, described above generates a sequence of n orthogonal decompositions of n, (a^\ pM) .(a (2) , p (2) )^(a (5) , p^)) .. . . (a^\ l) ( i+1 ) „ (i) where a < a . PROOF : The proof is by induction. For i = 0, it is true that a ~ I and p = 7t are an orthogonal decomposition of it. We assume, as the induction hypothesis, that (a , p ) is an orthogonal decomposition of it. In other words, P (i) < n «(*) s{^ n sj^ = and S n (1+l) U S^ i+l) -Jjl 2 r 1 2 {all C.) hold for the i-th step. Then we show that p' 1 ^ < 1J xc/ 1+1 ', i = J J io6 S (i+1) n S^ i+1) = and s| i+l) U S^ i+l) = {all C.} hold for the (i+l)-th step. Since p < rt. holds for every j, the selection C causes J p(i) < Tr ( 1+1 ) and p( 1+1 ^ < n ( : 1+1 ' > for j such that c' 1 ^ is essential s to / 1 ' ) (by the processes 2)- b-2) and 2)-d)). For j such that (T ' J ^ . -, ^ (i) o(i) (i) j. j. n j.1 j. o(i) ^ (i+l) is not essential to jt . , (3 < jt. unmediately means that P < it . and p^ < iri i+1 ' are true by 2)- b-2) and 2)-d). Therefore, for all j, (i + i) . (i) . p (i) < (i+i) holds . 1 2 = j By the way of construction, P . is the maximum element over all partitions P' such that P' < H jr.. Therefore, we obtain P = H J. x+1 \ It is obvious that S^ n sj^ = and S^ U S^ = {all C.) lj 1 2 r 1 2 l hold for all i = 0,1,,, n-1. Then, any decomposition (a , P " ' ) of jt is orthogonal. One more property, a^ < a , is also easily shown. T realizing a is a subtree of that T.j which realizes a . Therefore, cr 1+ ' < cr 1 . Q.E.D. Before ending this section, another theorem is presented which states the relationship among those decompositions generated by Procedure D from the viewpoint of objective functions. 107 THEOREM 5-5' In the step from (cr 1 ^, ^^) to (cr 1+1 ', ^ 1+1 in Procedure D (i = 0,1*, * n-l), 1) if a selected condition C is essential to all n . , J then (i+1) a (i+l) ^ (i) Q (i) holds. However, 2) if C is not essential to some Jt. , then it may j.1 j. (i+l) D (i+l) -, (i) (i) possibly occur that a, • (3 and a • P are j j., -l. « e (i+l) ^(i+l)\ j// (i+l) not comparable and that C [a , p ) = #{ct p (i+l) ) < G^ 1 *, p (i) ) =#(a (i) • p (i) ) is true. That is, {(or • P )} for 1*0,1,2,,, n-1 is not always monotonic in terms of inequality " < ", Therefore, {C^c/ 1 ', p' 1 ')} is not a monotonic function of i, either, although {a } is monotonic. PROOF : We prove l) first. If the assumption in l) holds, every terminal node of T is replaced by C as follows. 108 T, (1 W«) T (i + D (a (i + D ) (i+l) According to Theorem 5.1. in Section 50«* Q: P is realized by ,(1) T (D '1 * L 2 i+l) ,. 
EXAMPLE 5.8. Readers are encouraged to verify the above by Example 5.7. There we had chosen $C_3$ as $C^{(1)}$ at $i = 0$, and a pair $(T_1, T_2)$ was constructed which realized an orthogonal decomposition $(\alpha, \beta)$ of $\pi$. At the next step, $i = 1$, if we choose $C^{(2)} = C_4$, then the pair $(T_1', T_2')$ realizing $(\alpha', \beta')$ is obtained. On the other hand, if we choose $C^{(2)} = C_1$ alternatively, Procedure D yields the pair $(T_1^*, T_2^*)$ realizing $(\alpha^*, \beta^*)$.

In the next theorem we show that Procedure D guarantees the generation of a series of $n$ orthogonal decompositions of $\pi$.

THEOREM 5.4. Procedure D, described above, generates a sequence of $n$ orthogonal decompositions of $\pi$,

$$(\alpha^{(1)}, \beta^{(1)}) \to (\alpha^{(2)}, \beta^{(2)}) \to (\alpha^{(3)}, \beta^{(3)}) \to \cdots \to (\alpha^{(n)}, I),$$

where $\alpha^{(i+1)} \le \alpha^{(i)}$.

PROOF: The proof is by induction. For $i = 0$ it is true that $\alpha^{(0)} = I$ and $\beta^{(0)} = \pi$ are an orthogonal decomposition of $\pi$. We assume, as the induction hypothesis, that $(\alpha^{(i)}, \beta^{(i)})$ is an orthogonal decomposition of $\pi$; in other words, that $\beta^{(i)} \le \pi_j^{(i)}$ for every $j$, $S_1^{(i)} \cap S_2^{(i)} = \emptyset$ and $S_1^{(i)} \cup S_2^{(i)} = \{$all $C_i\}$ hold at the $i$-th step. Then we show that the corresponding statements hold at the $(i+1)$-th step. Since $\beta^{(i)} \le \pi_j^{(i)}$ holds for every $j$, the selection of $C^{(i+1)}$ gives $\beta_1^{(i)} \le \pi_{j'}^{(i+1)}$ and $\beta_2^{(i)} \le \pi_{j''}^{(i+1)}$ for each $j$ such that $C^{(i+1)}$ is essential to $\pi_j^{(i)}$ (by the processes 2)-b-1) and 2)-d)). For $j$ such that $C^{(i+1)}$ is not essential to $\pi_j^{(i)}$, $\beta^{(i)} \le \pi_j^{(i)}$ immediately means that $\beta_1^{(i)} \le \pi_j^{(i+1)}$ and $\beta_2^{(i)} \le \pi_j^{(i+1)}$ hold, by 2)-b-2) and 2)-d). Therefore, for all $j$,

$$\beta^{(i+1)} = \beta_1^{(i)} \cdot \beta_2^{(i)} \le \pi_j^{(i+1)}$$

holds. By the construction, $\beta^{(i+1)}$ is the maximum element over all partitions $\beta'$ such that $\beta' \le \prod_j \pi_j^{(i+1)}$; therefore we obtain $\beta^{(i+1)} = \prod_j \pi_j^{(i+1)}$. It is obvious that $S_1^{(i)} \cap S_2^{(i)} = \emptyset$ and $S_1^{(i)} \cup S_2^{(i)} = \{$all $C_i\}$ hold for all $i = 0, 1, \ldots, n-1$. Then every decomposition $(\alpha^{(i)}, \beta^{(i)})$ of $\pi$ is orthogonal. One more property, $\alpha^{(i+1)} \le \alpha^{(i)}$, is also easily shown: $T_1^{(i)}$, realizing $\alpha^{(i)}$, is a subtree of the $T_1^{(i+1)}$ which realizes $\alpha^{(i+1)}$; therefore $\alpha^{(i+1)} \le \alpha^{(i)}$. Q.E.D.

Before ending this section, another theorem is presented, which states, from the viewpoint of the objective functions, the relationship among the decompositions generated by Procedure D.

THEOREM 5.5. In the step from $(\alpha^{(i)}, \beta^{(i)})$ to $(\alpha^{(i+1)}, \beta^{(i+1)})$ in Procedure D ($i = 0, 1, \ldots, n-1$),
1) if the selected condition $C^{(i+1)}$ is essential to all $\pi_j^{(i)}$, then

$$\alpha^{(i+1)} \cdot \beta^{(i+1)} \le \alpha^{(i)} \cdot \beta^{(i)}$$

holds. However,
2) if $C^{(i+1)}$ is not essential to some $\pi_j^{(i)}$, then it may possibly occur that $\alpha^{(i+1)} \cdot \beta^{(i+1)}$ and $\alpha^{(i)} \cdot \beta^{(i)}$ are not comparable and that $C_1(\alpha^{(i+1)}, \beta^{(i+1)}) = \#(\alpha^{(i+1)} \cdot \beta^{(i+1)}) < C_1(\alpha^{(i)}, \beta^{(i)}) = \#(\alpha^{(i)} \cdot \beta^{(i)})$ is true.

That is, $\{\alpha^{(i)} \cdot \beta^{(i)}\}$ for $i = 0, 1, 2, \ldots, n-1$ is not always monotonic in terms of the inequality $\le$; therefore $\{C_1(\alpha^{(i)}, \beta^{(i)})\}$ is not a monotonic function of $i$ either, although $\{\alpha^{(i)}\}$ is monotonic.

PROOF: We prove 1) first. If the assumption in 1) holds, every terminal node of $T_1^{(i)}$ is replaced by $C^{(i+1)}$. According to Theorem 5.1 in Section 5.3, $\alpha^{(i)} \cdot \beta^{(i)}$ is realized by $T_1^{(i)} * T_2^{(i)}$, and likewise $\alpha^{(i+1)} \cdot \beta^{(i+1)}$ is realized by $T_1^{(i+1)} * T_2^{(i+1)}$. Then we compare these two trees. Since $T_1^{(i)}$ is a subtree of $T_1^{(i+1)}$, attention is focussed on the difference between the subtree $T_2^{(i)}$ of $T_1^{(i)} * T_2^{(i)}$ and the subtree of $T_1^{(i+1)} * T_2^{(i+1)}$ shown in Figure 5.16, namely $C^{(i+1)}$ with a copy of $T_2^{(i+1)}$ on each of its two branches.

[FIGURE 5.16.]

It is then easy to see that the partition $\beta^{(i)}$ realized by $T_2^{(i)}$ is larger than or equal to the partition realized by this subtree, because $T_2^{(i+1)}$ realizes $\beta^{(i+1)}$, which is determined by $\beta^{(i+1)} = \beta_1^{(i)} \cdot \beta_2^{(i)}$, where $\beta_1^{(i)}$ and $\beta_2^{(i)}$ are the partitions obtained by removing the $C^{(i+1)}$ coordinate from $\beta^{(i)}$. The sketch of Figure 5.17 is helpful in understanding this fact.

[FIGURE 5.17.]

That is, in Figure 5.17 the partition $\beta^{(i)}$ is larger than or equal to the partition realized by the tree in the middle, and this latter partition is larger than or equal to the partition realized by the third tree, since $\beta^{(i+1)} = \beta_1^{(i)} \cdot \beta_2^{(i)} \le \beta_1^{(i)}$ (or $\beta_2^{(i)}$). Therefore we can conclude that $T_1^{(i+1)} * T_2^{(i+1)}$ realizes a partition which is less than or equal to the partition realized by $T_1^{(i)} * T_2^{(i)}$; in other words, $\alpha^{(i+1)} \cdot \beta^{(i+1)} \le \alpha^{(i)} \cdot \beta^{(i)}$ holds.

In order to show the truth of statement 2), it is sufficient to give the following example.

EXAMPLE 5.9. Assume that we apply Procedure D to the following 5-cube partition $\pi$. [Figure.] If we select the conditions $C_3$ and $C_4$ at $i = 0$ and $i = 1$, respectively, then we obtain a pair $T_1^{(2)}$ and $T_2^{(2)}$ realizing $\alpha^{(2)}$ and $\beta^{(2)}$ with $\#(\alpha^{(2)}) = 4$ and $\#(\beta^{(2)}) = 5$. [Figure.] Then we choose $C_1$ as $C^{(3)}$ for $i = 2$. This condition $C_1$ is not essential to some of the partitions $\pi_j^{(2)}$, and Procedure D generates the pair $T_1^{(3)}$ and $T_2^{(3)}$ with $\#(\alpha^{(3)}) = 6$ and $\#(\beta^{(3)}) = 3$. [Figure.] The two orthogonal decompositions $\alpha^{(2)} \cdot \beta^{(2)}$ and $\alpha^{(3)} \cdot \beta^{(3)}$, shown below, are not comparable. [Figure: $\alpha^{(2)} \cdot \beta^{(2)}$ and $\alpha^{(3)} \cdot \beta^{(3)}$.] Furthermore, $C_1(\alpha^{(3)}, \beta^{(3)}) < C_1(\alpha^{(2)}, \beta^{(2)})$ is true, because

$$C_1(\alpha^{(3)}, \beta^{(3)}) = \#(\alpha^{(3)}) \cdot \#(\beta^{(3)}) = 18 < C_1(\alpha^{(2)}, \beta^{(2)}) = \#(\alpha^{(2)}) \cdot \#(\beta^{(2)}) = 20. \qquad \text{Q.E.D.}$$

5.5. Discussion of Optimal Decompositions

In the previous section we presented Procedure D, which constructs a series of orthogonal decompositions $\{(\alpha^{(i)}, \beta^{(i)})\}$ for a given partition $\pi$. It was not shown, however, which condition $C^{(i+1)}$ should be selected at each step of the procedure. The role of this procedure is therefore quite similar to that of Procedure R in Chapter 4 for constructing a decision tree for a given partition $\pi$: that procedure did not show which condition $C_s$ should be selected at the $i$-th step either; it could show only the way to construct one of the decision trees realizing some $\pi'$ ($\pi' \le \pi$) for a given $\pi$. For both procedures it is quite obvious that the way of choosing the condition $C^{(i+1)}$ in Procedure D (or $C_s$ in Procedure R) greatly influences the cost of the constructed decision trees. For decomposition problems we have proposed two objective functions in Section 5.2. It is hoped that, based on Procedure D, algorithms to construct optimal decompositions for these objective functions may be developed; intuitive (heuristic) algorithms, or formulations by mathematical programming tools, are to be expected. In what follows, we first discuss the exhaustive search for optimal solutions, and thereafter a heuristic algorithm is proposed.
In Procedure D there are $n!$ possible ways to select a sequence of $C^{(i)}$'s. For each sequence, $n$ orthogonal decompositions are generated, excluding the trivial decomposition $(I, \pi)$. Then, totally, $n \cdot n!$ orthogonal decompositions can be generated if we exhaust all possibilities. This number, however, can be reduced to $n \cdot n!/2$, since there is no essential difference between $\alpha^{(i)}$ and $\beta^{(i)}$: each $\alpha^{(i)}$ occurs as one of the $\beta$'s of another sequence, and each $\beta^{(i)}$ as one of the $\alpha$'s.

The structure of the exhaustive algorithm can best be explained with the help of the following tree.

[Figure: the generation tree, with START at the root.]

The node $(pq \cdots r)$ at the $i$-th level of the tree stands for the $T_1^{(i)}$ constructed by $C^{(1)} = C_p$, $C^{(2)} = C_q$, $\ldots$, and $C^{(i)} = C_r$ at the successive steps of Procedure D. From a node at the $i$-th level, $(n-i)$ edges, showing the possible $(n-i)$ choices of $C^{(i+1)}$, diverge; each of them connects to a node at the $(i+1)$-th level, say, from the node $(12)$ of the second level to the node $(123)$ of the third level by the edge $C_3$. This transition is shown next more explicitly. (We note that a degeneration occurs in $T_1^{(3)}$.)

[Figure: the transition from the node $(12)$ to the node $(123)$.]

We learned that there are $n \cdot n!$ possible nodes in the above tree. Not all of them, however, are distinct. We show an example. Corresponding to the nodes $(21)$ and $(12)$, we have the two decision trees $T_1^{(2)}$ and $T_1^{(2)'}$ shown below, respectively.

[Figure: $T_1^{(2)}$ and $T_1^{(2)'}$ with their associated partitions.]

We note that the two partitions $\alpha^{(2)}$ and $\alpha^{(2)'}$ realized by these $T_1^{(2)}$ and $T_1^{(2)'}$ are identical and, moreover, that $\pi_1^{(2)} = \pi_1^{(2)'}$, $\pi_2^{(2)} = \pi_2^{(2)'}$, $\pi_3^{(2)} = \pi_3^{(2)'}$ and $\pi_4^{(2)} = \pi_4^{(2)'}$ hold. Then their counterparts $\beta^{(2)}$ and $\beta^{(2)'}$ are also the identical partition; in other words, the two orthogonal decompositions $(\alpha^{(2)}, \beta^{(2)})$ and $(\alpha^{(2)'}, \beta^{(2)'})$ are identical. Furthermore, any selection of the same $C^{(k)}$ ($k = 3, 4, \ldots, n-1$) for developing both $T_1^{(2)}$ and $T_1^{(2)'}$ by Procedure D yields the same decomposition $(\alpha^{(k)}, \beta^{(k)})$. In summary, we need not distinguish the node $(21)$ from the node $(12)$; therefore we let them merge together in the tree as follows.

[Figure: the merged nodes.]

However, if a degeneration occurs for $T_1^{(2)'}$, for example, we have a different situation. That is, the following two decision trees $T_1^{(2)}$ and $T_1^{(2)'}$ should be distinguished from each other, because the application of Procedure D to these two trees results in different decompositions; their corresponding nodes $(21)$ and $(12)$ should then not merge.

[Figure: $T_1^{(2)}$ and $T_1^{(2)'}$ in the degenerate case.]

How many mergings of nodes occur in the generation tree depends on the given partition $\pi$. If no degeneration occurs at any step of Procedure D for any selected sequence of $C^{(i)}$'s, as an extremal case, then the generation tree degenerates to the following small tree, due to the mergings of nodes. There, at the $i$-th level we have only the nodes $(\ell_1 \ell_2 \cdots \ell_i)$ where $\ell_1 < \ell_2 < \cdots < \ell_i$, for $i = 1, 2, \ldots, n$. This means that there are exactly $\binom{n}{i}$ nodes at the $i$-th level, and we can conclude that the total number of nodes amounts to $\sum_i \binom{n}{i} = 2^n$.

[Figure: the fully merged generation tree.]

Instead of enumerating all $n \cdot n!$ candidates (this is the other extremal case), we usually need to consider a number of decompositions between $2^n$ and $n \cdot n!$ for the exhaustive method. If we calculate the two objective functions $C_1(\alpha^{(i)}, \beta^{(i)})$ and $C_2(\alpha^{(i)}, \beta^{(i)})$ for all decompositions corresponding to nodes of this tree, we can find the best solution. We need, however, the effort of generating all $n \cdot n!$ candidates.
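In outline, the exhaustive method walks the generation tree just described; equivalently, one may simply run Procedure D along every permutation of the conditions and keep the cheapest decomposition seen. The driver below assumes a helper run_procedure_D (not shown) that returns the $n$ decompositions produced along one sequence, with $\alpha$ and $\beta$ as lists of blocks; ranking candidates lexicographically by $(C_1, C_2)$ is only one of several reasonable choices.

```python
# A sketch of the exhaustive search: n! sequences, n decompositions
# each, i.e., at most n * n! candidates (as few as 2^n after mergings).
from itertools import permutations

def exhaustive(pi, conditions, run_procedure_D):
    best = None
    for seq in permutations(conditions):
        for alpha, beta in run_procedure_D(pi, seq):
            c1 = len(alpha) * len(beta)        # C1 = #(alpha) * #(beta)
            c2 = len(alpha) + len(beta) - 2    # C2
            if best is None or (c1, c2) < best[0]:
                best = ((c1, c2), alpha, beta)
    return best
```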
Therefore, we sacrifice the guarantee of the best solution, but we save the cost of the work which would have been required, and look instead for reasonable, suboptimal solutions for both objective functions. Based on Procedure D, Theorem 5.4 and Theorem 5.5, we discuss a heuristic algorithm for finding suboptimal orthogonal decompositions. We have already shown the following facts. For any selected sequence of $C^{(i+1)}$ ($i = 0, 1, \ldots, n-1$):

1) $\alpha^{(i)}$ decreases as $i$ increases, that is, $\alpha^{(0)} = I \ge \alpha^{(1)} \ge \alpha^{(2)} \ge \cdots$, and $\#(\alpha^{(i)})$ is a monotonically increasing function of $i$: $\#(\alpha^{(0)}) = 1 \le \#(\alpha^{(1)}) \le \#(\alpha^{(2)}) \le \cdots$ (Theorem 5.4);

2) $\alpha^{(i)} \cdot \beta^{(i)}$ is not monotonic (Theorem 5.5); in general, $\#(\alpha^{(0)} \cdot \beta^{(0)}) = \#(I \cdot \pi) = \#(\pi) \le \#(\alpha^{(i)} \cdot \beta^{(i)})$, and $\#(\alpha^{(i)} \cdot \beta^{(i)})$ grows globally with $i$, although not monotonically;

3) $\#(\beta^{(i)})$ is generally decreasing: $\#(\beta^{(0)}) = \#(\pi) \ge \#(\beta^{(1)}) \ge \#(\beta^{(2)}) \ge \cdots$.

(Since any orthogonal decomposition $(\alpha, \beta)$ of $\pi$ must satisfy $\alpha \cdot \beta \le \pi$, $\#(\alpha \cdot \beta) \ge \#(\pi)$ holds. For the first objective function, therefore, the trivial decomposition $(\alpha^{(0)}, \beta^{(0)}) = (I, \pi)$ is optimal, because $C_1(I, \pi) = \#(\pi)$ achieves the minimal value $\#(\pi)$. However, this decomposition is not reasonable, since its second objective function $C_2(I, \pi) = \#(I) + \#(\pi) - 2$ is unreasonably large. Therefore we interpret "suboptimal" as "reasonably good" for both objective functions $C_1(\alpha, \beta)$ and $C_2(\alpha, \beta)$.)

Now we have two criteria, i.e., $C_1(\alpha, \beta) = \#(\alpha) \cdot \#(\beta)$ and $C_2(\alpha, \beta) = \#(\alpha) + \#(\beta) - 2$. In order to find a reasonable solution by applying Procedure D, an algorithm has to specify 1) which condition $C^{(i+1)}$ should be chosen at each step, and 2) at which $i$ we should stop Procedure D. The first requirement corresponds to selecting an edge of the generation tree defined before, and the second corresponds to knowing at which level of that tree we should stop. This is shown in the sketch below.

[Figure: a path in the generation tree from START to STOP.]

In order to explain these points more simply, we assume that $C_1(\alpha^{(i)}, \beta^{(i)}) = \#(\alpha^{(i)}) \cdot \#(\beta^{(i)})$ is constant for every $i$ and for any selected sequence of $C^{(i)}$. Then an optimal decomposition for the second objective function $C_2(\alpha^{(i)}, \beta^{(i)}) = \#(\alpha^{(i)}) + \#(\beta^{(i)}) - 2$ can be found around the $i$-th step at which $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ is attained. (Recall that $x + y$ is minimized at $x = y$ if $x \cdot y$ is constant, for real numbers $x$ and $y$.) This strong assumption of $\#(\alpha^{(i)}) \cdot \#(\beta^{(i)})$ being constant is not true in general, but if $\#(\alpha^{(i)}) \cdot \#(\beta^{(i)})$ is gradually increasing and does not deviate much from the lower bound $\#(\pi)$, the criterion $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ seems reasonable. In summary, we choose a condition $C^{(i+1)}$ at each step so that the resulting $C_1(\alpha^{(i+1)}, \beta^{(i+1)}) = \#(\alpha^{(i+1)}) \cdot \#(\beta^{(i+1)})$ is minimized over all possible choices of conditions, and we stop the algorithm around the $i$ at which $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ is achieved. Recalling that $\#(\beta^{(i)})$ is generally decreasing while $\#(\alpha^{(i)})$ is increasing, we see that we reach this $i$ in a straightforward way without much loss; this method of searching for suboptimal solutions is therefore not impractical. The condition $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ is attained around the $i$ at which the derivative (gradient) of $C_2(\alpha^{(i)}, \beta^{(i)})$ changes from negative to positive. (Recall also that the derivative of $x + y$ changes from negative to positive at $x = y$ as $x$ increases, assuming that $x \cdot y$ is constant.) Alternatively, therefore, we can state that we stop Procedure D around the $i$ at which $C_2(\alpha^{(i+1)}, \beta^{(i+1)}) \ge C_2(\alpha^{(i)}, \beta^{(i)})$ first holds.
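The resulting heuristic is summarized by the following sketch. It assumes a helper one_step (not shown) that performs one step of Procedure D for a chosen condition and returns the new pair $(\alpha, \beta)$ as lists of blocks, starting from the initial state $(I, \pi)$; the greedy choice minimizes $C_1 = \#(\alpha) \cdot \#(\beta)$, and the loop stops as soon as $C_2$ stops decreasing.

```python
# A sketch of the suboptimal-decomposition heuristic of this section.

def heuristic(pi, conditions, one_step):
    state = (None, pi)                 # (alpha, beta) = (I, pi)
    prev_c2 = len(pi) - 1              # C2(I, pi) = 1 + #(pi) - 2
    remaining = list(conditions)
    while remaining:
        # Greedy step: try every condition, keep the smallest C1.
        step = {c: one_step(state, c) for c in remaining}
        c = min(step, key=lambda c: len(step[c][0]) * len(step[c][1]))
        alpha, beta = step[c]
        c2 = len(alpha) + len(beta) - 2
        if c2 >= prev_c2:              # the gradient of C2 turned up:
            return state               # we are near #(alpha) = #(beta)
        state, prev_c2 = (alpha, beta), c2
        remaining.remove(c)
    return state
```

With the choices of the example that follows ($C_3$, $C_4$, $C_1$, $C_2$), the $C_2$ values after the successive steps are 9, 7, 8, so the stopping rule returns the pair $(\alpha^{(2)}, \beta^{(2)})$, in agreement with the discussion above.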
EXAMPLE 5.10. Consider the 4-cube partition $\pi$ of Figure 5.18, with $\#(\pi) = 16$. Our algorithm selects $C^{(1)} = C_3$, $C^{(2)} = C_4$ and $C^{(3)} = C_1$. At $i = 1$, $\#(\alpha^{(2)}) \approx \#(\beta^{(2)})$ is attained, and $C_2(\alpha^{(3)}, \beta^{(3)}) \ge C_2(\alpha^{(2)}, \beta^{(2)})$ holds at $i = 2$. Therefore we stop the procedure at $i = 1$ or $2$; both give reasonably good decompositions. In Figure 5.18, $(\alpha^{(i)}, \beta^{(i)})$ as well as $(T_1^{(i)}, T_2^{(i)})$ for $i = 1, 2, 3$ are shown:

1) $i = 0$, $C^{(i+1)} = C^{(1)} = C_3$: $\#(\alpha^{(1)}) = 2$, $\#(\beta^{(1)}) = 9$, $C_1(\alpha^{(1)}, \beta^{(1)}) = 18$, $C_2(\alpha^{(1)}, \beta^{(1)}) = 9$;
2) $i = 1$, $C^{(i+1)} = C^{(2)} = C_4$: $\#(\alpha^{(2)}) = 4$, $\#(\beta^{(2)}) = 5$, $C_1(\alpha^{(2)}, \beta^{(2)}) = 20$, $C_2(\alpha^{(2)}, \beta^{(2)}) = 7$;
3) $i = 2$, $C^{(i+1)} = C^{(3)} = C_1$: $\#(\alpha^{(3)}) = 7$, $\#(\beta^{(3)}) = 3$, $C_1(\alpha^{(3)}, \beta^{(3)}) = 21$, $C_2(\alpha^{(3)}, \beta^{(3)}) = 8$.

[FIGURE 5.18.]

To see what follows if we continue this procedure, we show also $T_1^{(4)}$ and $\beta^{(4)}$, which are generated at $i = 3$ by choosing $C^{(4)} = C_2$: $\#(\alpha^{(4)}) = 14$, $\#(\beta^{(4)}) = 2$, and $C_1(\alpha^{(4)}, \beta^{(4)}) = 28$.

APPENDIX

LITERATURE SURVEY

Much has been published concerning decision tables, decision table languages and their applications in various areas. The following books can serve as an introduction to this subject: Hughes et al. [6], Katzan [9], McDaniel [17], [18], [19] and Pollack et al. [24]. Most of these texts cover introductory material through some specific applications and/or decision table languages; some of them also include topics on decision table conversion problems. A good summary and concise survey of research topics in this field can be found in Katzan [9] and King [12].

Many useful discussions concerning decision table conversion problems were first given by Montalbano [20]. Egler [2] attempted to give a very simple manual method for converting decision tables into decision trees; he thought that it minimizes both the average processing time and the storage requirement. Montalbano [21], however, refuted Egler's algorithm by showing a counterexample. Pollack [23] proposed two plausible procedures, one for minimizing the storage requirement and the other for the average processing time, and asked readers to prove his algorithms or to offer counterexamples showing where they fail. There is a counterexample in Sprague [31] which shows that neither algorithm guarantees optimality for its respective objective. By introducing the concept of entropy from information theory, Schwayder [28] modified Pollack's algorithms, but it is known that this algorithm does not always generate an optimal tree either. Earlier, before Pollack's [23] appeared, Press [25] gave another simple manual procedure as well as an interesting discussion of decision table languages.

None of the methods discussed above generates optimal trees in all cases. On the other hand, Reinwald and Soland attack the conversion problems by a branch and bound method in their papers [26], [27]. Their algorithms are guaranteed to produce optimal trees, but they are fairly complex and time consuming. The recent work of Alster [1] attempts to extend decision table conversion problems into a more generalized decision tree construction problem; it deals with constructions of optimal decision trees not only for rule partitions but also for general partitions, and describes several heuristic algorithms to minimize the number of internal nodes of the trees, together with results of their computer implementation. Another decision tree construction problem, called a binary identification problem by the author, can be found in Garey [3].
The decision tables considered there do not have "$-$" (dash) entries, which means that the corresponding cube partitions are of the following special type: for a decision table with $m$ rule columns and $n$ condition rows, the corresponding partition consists of $m$ 0-cubes (each corresponding to a rule) and one block of $(2^n - m)$ 0-cubes (the Else-rule). Garey's approach is similar to the well known optimal binary search tree constructions (Knuth [15] and Hu and Tucker [7]). His main discussions concern how the exhaustive algorithm which he describes can be improved; he finds some specific relationships among the probabilities of occurrence of rules and/or the costs of conditions which, when met, reduce the amount of work. Also to be mentioned here is an earlier book by Picard [22], which contains a number of results about general decision trees, usually of the type that under certain conditions a tree of a certain structure is optimal.

There are also similarities between the construction of decision trees and relay network realizations of Boolean functions (see, e.g., Harrison [4]). Since the role of a transfer relay in a network and of a decision box in a decision tree is the same, transfer relay realizations of Boolean functions are apparently a special case of decision tree constructions. In Marcus [16], an algorithm to realize a Boolean function with a small number of transfer relays, using Karnaugh map techniques, is proposed; the iterated local minimization in this thesis is a generalization of his method. A correspondence between decision tree constructions and transfer relay realizations of Boolean functions is also described by Seshagiri [29]. The objective of these authors is to reduce the number of relays used in realizations of Boolean functions; there is no such concept as an average processing time in switching theory, since all relays require the same period for their operation. Slagle's work [30] is closer to our subject: he discusses an effective binary decision tree construction for a given Boolean expression.

In this thesis we discussed only methods for converting decision tables into decision trees. There are, however, other fundamentally different approaches to processing tables by computers. Kirk [13] and King [10] proposed and developed, respectively, the use of rule mask techniques, and Veinott [32] also shows a programming technique to interpret tables into computer programs. These methods, however, need the evaluation of all conditions for each input datum, which is obviously wasteful of execution time. Finally we mention two papers, King [11] and Press [25], for discussions of ambiguity and redundancy problems of decision tables.

LIST OF REFERENCES

[1] Alster, J. M., "Heuristic Algorithms for Constructing Near-Optimal Decision Trees," M.S. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, Report No. 474 (August 1971).

[2] Egler, J. F., "A Procedure for Converting Logic Table Conditions into an Efficient Sequence of Test Instructions," Comm. ACM, Vol. 6, No. 9, pp. 510-514 (September 1963).

[3] Garey, M. R., "Optimal Binary Decision Trees for Diagnostic Identification Problems," Ph.D. Thesis, Department of Computer Science, University of Wisconsin (1970).

[4] Harrison, M. A., Introduction to Switching and Automata Theory, McGraw-Hill Book Company, New York, Chapter 7 (1965).
[5] Hartmanis, J. and Stearns, R. E., Algebraic Structure Theory of Sequential Machines, Prentice-Hall, Inc., Englewood Cliffs, New Jersey (1966).

[6] Hughes, M. L., Shank, R. M., and Stein, E. S., Decision Tables, MDI Publications (Management Development Institute, Division of Information Industries, Inc.), Wayne, Pennsylvania (1968).

[7] Hu, T. C. and Tucker, A. C., "Optimal Computer Search Trees and Variable-Length Alphabetical Codes," SIAM J. on Applied Math., Vol. 21, No. 4, pp. 514-532 (December 1971).

[8] Huffman, D. A., "A Method for the Construction of Minimum Redundancy Codes," Proc. IRE, Vol. 40, No. 9, pp. 1098-1101 (September 1952).

[9] Katzan, H., Jr., Advanced Programming, Van Nostrand Reinhold, New York, Chapter 9 (1970).

[10] King, P. J. H., "Conversion of Decision Tables to Computer Programs by Rule Mask Technique," Comm. ACM, Vol. 9, No. 11, pp. 796-801 (November 1966).

[11] King, P. J. H., "Ambiguity in Limited Entry Decision Tables," Comm. ACM, Vol. 11, No. 10, pp. 680-684 (October 1968).

[12] King, P. J. H., "Decision Tables," Computer J., Vol. 10, No. 2, pp. 135-142 (August 1967).

[13] Kirk, H. W., "Use of Decision Tables in Computer Programming," Comm. ACM, Vol. 8, No. 1, pp. 41-44 (January 1965).

[14] Knuth, D. E., The Art of Computer Programming, Addison-Wesley Publishing Company, Reading, Massachusetts, Vol. 1, Chapter 2 (1968).

[15] Knuth, D. E., "Optimal Binary Search Trees," Acta Informatica, Vol. 1, Fasc. 1, pp. 14-25 (1971).

[16] Marcus, M. P., "Minimization of the Partially-Developed Transfer Tree," IRE Trans. on Electronic Computers, EC-6, pp. 92-95 (June 1957).

[17] McDaniel, H., An Introduction to Decision Logic Tables, John Wiley and Sons, Inc., New York (1968).

[18] McDaniel, H., Applications of Decision Tables, Brandon/Systems Press, Inc., New York (1970).

[19] McDaniel, H., Decision Table Software, Brandon/Systems Press, Inc., New York (1970).

[20] Montalbano, M., "Tables, Flowcharts and Program Logic," IBM Systems J., Vol. 1, pp. 51-63 (September 1962).

[21] Montalbano, M., "Egler's Procedure Refuted," Comm. ACM, Vol. 7, No. 1, p. 1 (January 1964).

[22] Picard, C., Théorie des Questionnaires, Les Grands Problèmes des Sciences 20, Gauthier-Villars, Paris (in French) (1965).

[23] Pollack, S. L., "Conversion of Limited Entry Decision Tables to Computer Programs," Comm. ACM, Vol. 8, No. 11, pp. 677-682 (November 1965).

[24] Pollack, S. L., Hicks, H. T., Jr., and Harrison, W. J., Decision Tables: Theory and Practice, John Wiley and Sons, Inc., New York (1971).

[25] Press, L. I., "Conversion of Decision Tables to Computer Programs," Comm. ACM, Vol. 8, No. 6, pp. 385-390 (June 1965).

[26] Reinwald, L. T. and Soland, R. M., "Conversion of Limited Entry Decision Tables to Optimum Computer Programs I: Minimum Average Processing Time," J. ACM, Vol. 13, No. 3, pp. 339-358 (July 1966).

[27] Reinwald, L. T. and Soland, R. M., "Conversion of Limited Entry Decision Tables to Optimum Computer Programs II: Minimum Storage Requirements," J. ACM, Vol. 14, No. 4, pp. 742-755 (October 1967).

[28] Schwayder, K., "Conversion of Limited Entry Decision Tables to Computer Programs: A Proposed Modification to Pollack's Algorithm," Comm. ACM, Vol. 14, No. 2, pp. 69-73 (February 1971).

[29] Seshagiri, N., "Relay Tree Network Decomposition of Decision Tables," Proc. IEEE, Vol. 55, No. 9, pp. 1648-1649 (September 1967).
[30] Slagle, J. R., "An Efficient Algorithm for Finding Certain Minimum-Cost Procedures for Making Binary Decisions," J. ACM, Vol. 11, No. 3, pp. 253-264 (July 1964).

[31] Sprague, V. G., "On Storage Space of Decision Tables," Comm. ACM, Vol. 9, No. 5, pp. 319-320 (May 1966).

[32] Veinott, C. G., "Programming Decision Tables in FORTRAN, COBOL or ALGOL," Comm. ACM, Vol. 9, No. 1, pp. 31-35 (January 1966).

VITA

The author, Toshio Yasui, was born in Kyoto, Japan, on May 14, 1943. He received the Bachelor of Science and the Master of Science degrees, both in electronic engineering, from Kyoto University in 1966 and 1968, respectively. From September 1968 to June 1971, he worked as a research assistant with the Illiac IV Project of the Department of Computer Science of the University of Illinois at Urbana-Champaign on the development of the large scale parallel computer Illiac IV. He has been a research assistant with the Center for Advanced Computation of the University of Illinois at Urbana-Champaign since June 1971. He is a member of the Institute of Electrical and Electronics Engineers and the Association for Computing Machinery.