UIUCDCS-R-72-501

CONVERSION OF DECISION TABLES INTO DECISION TREES

by

Toshio Yasui

University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

February, 1972

This work was supported in part by the Advanced Research Projects Agency under Contract No. USAF 30(602)4144 and Contract No. DAHC04-72-C-0001 and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois, 1972.

ACKNOWLEDGEMENT

The author wishes to express his deepest gratitude to his supervisor, Professor Jurg Nievergelt of the Department of Computer Science of the University of Illinois, for his guidance, stimulating discussions and suggestions, encouragement, and patience during the past two years. He is also indebted to Professor Daniel L. Slotnick, the director of the Center for Advanced Computation of the University of Illinois, and to the former Illiac IV project for their continued support of this thesis research. He is thankful to Mr. Jay N. Culliney for his proofreading, and special thanks are also due to Mrs. Patresa A. Grennan for typing the original draft and most of the final manuscript. His greatest appreciation goes to Mrs. Gayanne Carpenter for making the arrangements which allowed this thesis to be completed on time and also for typing Chapters 1 and 2, and to Mrs. Glenna D. Ganow for her typing of Chapter 4 of this thesis. Finally, he would like to thank the personnel of the Department of Computer Science Print Shop under the supervision of Mr. Dennis L. Reed.

TABLE OF CONTENTS

1. INTRODUCTION
2. DECISION TABLES AND THEIR CONVERSION PROBLEMS
   2.1. Decision Tables
   2.2. Conversion Problem
   2.3. Cubic Representation of Decision Tables
3. MATHEMATICAL PRELIMINARIES
   3.1. Introduction
   3.2. Algebraic Foundations
   3.3. Partitions of an n-cube and Decision Trees
   3.4. Lattices of n-cube Partitions
4. SOME ASPECTS OF DECISION TREE CONSTRUCTION
   4.1. Introduction
   4.2. How to Construct Decision Trees
   4.3. Comparative Study of Algorithms
   4.4. Bounds of Minimum Total Loss
   4.5. Optimality Discussions for Different Objective Functions
5. A DECOMPOSITION THEORY OF DECISION TABLES AND DECISION TREES
   5.1. Introduction
   5.2. Decomposition Problem and Objective Functions
   5.3. Analysis of Orthogonal Decompositions
   5.4. Synthesis of Orthogonal Decompositions
   5.5. Discussion of Optimal Decompositions
APPENDIX: LITERATURE SURVEY
LIST OF REFERENCES
VITA

1. INTRODUCTION

Decision tables have been widely accepted as a convenient technique for specifying complex logical relationships in such diverse computer application areas as data processing in census studies, process control in manufacturing, management information processing systems, and so on (see, e.g., McDaniel [17], [18], [19] or Pollack et al. [24]).
A key problem for the successful use of decision tables is how to process them efficiently on a computer. One possible way is to convert (preprocess, translate, or compile) a decision table into a special kind of flowchart, known as a decision tree. A simple but reasonable measure of the complexity of such a decision tree is the number of its internal nodes, i.e., the number of decision boxes which appear in it. This is a special case of more elaborate measures of the memory requirement and the average processing time of decision trees. Several methods have been proposed in the literature for converting decision tables into decision trees with the objective of minimizing these two measures of complexity. Some of these, intended primarily as manual methods (Egler [2], Press [25], and Pollack [23]), are based on plausible arguments, but with little theoretical background. Others, such as the use of the branch-and-bound method (Reinwald and Soland [26], [27]), are very general in nature, and might be improved with more knowledge of the specific structure of decision trees. A recent thesis by Garey [3] investigates such problems which are concerned with the structure of optimal decision trees. However, this structure is still not sufficiently well understood to be able to design efficient algorithms for the construction of optimal decision trees. Hence, further investigations of this topic are appropriate.

This thesis is devoted to theoretical investigations of decision table conversion problems. For this purpose we present a simplified model of the optimization problem. A special kind of partition of the set of 2^n vertices of an n-cube is considered as a model of a decision table with n condition rows. The problem of converting such an n-cube partition into a binary decision tree is discussed, based mainly on the simplified objective function mentioned above.

The structure of this thesis is as follows. In Chapter 2, decision tables and their conversion problems are briefly described using conventional terminology. Then, in Chapter 3, we present mathematical concepts and notations which appear throughout the remainder of this thesis. The next two chapters are the main body of this thesis. In Chapter 4 we describe a procedure, called Procedure R, to construct decision trees for a given n-cube partition, and based on this procedure, we propose an algorithm, "iterated local minimization". It does not always yield optimal solutions, but generates suboptimal trees. The resultant decision trees are compared quantitatively with the trees constructed by Pollack's algorithm (Pollack [23]) and with optimal decision trees. This chapter also contains an analysis of "rule-splits", in particular, lower and upper bounds for the minimum number of required rule-splits over all n-cube rule partitions. Since a decision tree which is optimal with respect to one objective function is not necessarily optimal for other objective functions, relationships existing among optimal decision trees under different objective functions are also studied in the chapter.

Chapter 5 is another major division. The entire chapter is devoted to a presentation of a new topic, a decomposition theory of decision tables and decision trees. The recent development of multiprocessor systems and parallel computers provides the motivation behind the study. We consider decomposing decision tables or decision trees into smaller ones so that they can be processed effectively in parallel.
After a theoretical analysis of decomposition, we propose a procedure, called Procedure D, to construct a pair of decision trees from a given n-cube partition. Based on this procedure, a heuristic algorithm is shown.

The Appendix contains a short summary of some of the main contributions to our topic.

2. DECISION TABLES AND THEIR CONVERSION PROBLEMS

2.1. Decision Tables

Although flowcharts are a widely accepted means of describing the logic of computer programs, they have several significant disadvantages which should encourage analysts to seek alternate methods for stating the pertinent aspects of a program. Decision tables provide such an alternative. First some of the disadvantages of the flowcharting technique are listed:

1. Although flowcharts are often very appropriate for describing scientific programs where each box can represent a certain amount of computation, they are often not appropriate for system programs, business data processing, or information retrieval, where a long sequence of logical decisions must be made.

2. Flowcharts for complex problems tend to become lengthy and difficult to follow and modify.

Decision tables tend to overcome these disadvantages while providing some advantages as well.

DOES HE HAVE A GOOD DRIVING RECORD?   Y   Y   Y   N   N   N
IS HE OVER 25 YEARS OF AGE?           Y   N   N   Y   Y   N
IS HE MARRIED?                        -   Y   N   Y   N   -
INSURE                                X   X   X   X   -   -
CHARGE RISK RATE                      -   -   X   X   -   -
REJECT APPLICATION                    -   -   -   -   X   X

TABLE 2.1.

Table 2.1 is an example of a decision table of driver insurability. It has four major sections. The condition stub is the upper left quadrant and contains descriptions of conditions on which decisions are to be based. Conditions are usually represented as questions. The action stub occupies the lower left quadrant and supplies all possible actions for the conditions listed above. The condition entry section is found in the upper right quadrant and answers the questions found in the condition stub. All possible combinations of answers to the questions are formed here, where the responses are restricted to "Y" to indicate "Yes" and "N" to indicate "No". If no response is indicated, then the response need not be checked for that particular question and "-" (dash) is written there. The action entry is the remaining quadrant of the table and indicates the appropriate actions resulting from the conditions above. The only permissible entry here is the "X", to indicate "take this action". One or more actions may be designated for each combination of responses. Each of the various combinations of responses, along with the indicated actions for that combination, is called a rule. The various rules are usually numbered or lettered for identification purposes.

Below we list some advantages of using decision tables.

1. Logic is stated precisely and compactly.

2. Complex logic is easier to understand, and the relationship among variables is readily understood.

3. Decision tables lend themselves to update and change.

4. The tables are appropriate for independent review and documentation.

Here we refer to the case of a large-scale decision table implementation for the Census of Agriculture (1964) (McDaniel [18]). A questionnaire with 335 questions was sent to each farm over 3,100 counties. Then, after 600 to 800 items were tabulated, a decision table of 600 pages was made. It generated 20,000 to 25,000 lines of code and needed 3,000 hours of UNIVAC 1107 computer time. In total, 53 man-years of programming, with 14 man-years on the edit program, were required.
Problems such as these stimulated the development of programming languages which are based on decision tables. Some of these are listed in chronological order of their development (see, e.g., McDaniel [19]): TABSOL, an experimental tabular language for GE machines; LOGTAB for the IBM 704; FORTAB, an extension of FORTRAN with a quite extensive and sophisticated decision table facility; DETAB-X, a COBOL-oriented decision table language; and DETAB-65, a further development of DETAB-X. Associated with the development of these languages, there arose the major problem of finding algorithms which compile decision tables into efficient programs. This problem is the motivation and practical background for our theoretical study.

2.2. Conversion Problem

One important way to process a decision table by a computer is to transform it into a special kind of flowchart known as a decision tree. Since there are usually many decision trees which correspond to a given decision table, the problem arises of finding the ones which are most efficient according to some criteria of optimality. Consider as an example Table 2.2. It is a simplified decision table in that the action stub and action entry are not presented.

C_1     Y    Y    Y    N
C_2     Y    -    N    -
C_3     N    Y    N    -
GO TO   R_1  R_2  R_3  R_4

TABLE 2.2.

This decision table can be realized by each of the decision trees shown in Figures 2.1.(a), 2.1.(b), and 2.1.(c), among others.

FIGURE 2.1.(a). FIGURE 2.1.(b). FIGURE 2.1.(c).

They suggest some significant points about optimizing the conversion of the table into a decision tree. These three trees embody the same logical consequences but differ in the procedures they specify for arriving at the rules. They are not equally good from the viewpoints of memory requirement and processing time. For example, tree (c) always requires that all three conditions be evaluated, while for trees (a) and (b) the number of conditions to be evaluated is sometimes fewer. To characterize this problem the following quantities are introduced:

s_i: memory space required for condition C_i;
t_i: time required for processing condition C_i; and
p_j: probability with which the j-th rule, R_j, occurs.

Then the total memory requirement, S, and the average processing time, T, for each of the decision trees can be calculated as follows:

(a) S_a = s_1 + s_2 + s_3
    T_a = (p_1 + p_3)·(t_1 + t_2 + t_3) + p_2·(t_1 + t_3) + p_4·t_1

(b) S_b = 2s_1 + s_2 + s_3
    T_b = (p_1 + p_3)·(t_1 + t_2 + t_3) + (p_2 + p_4)·(t_1 + t_3)

(c) S_c = 4s_1 + s_2 + 2s_3
    T_c = t_1 + t_2 + t_3

The expression for T is a generalization of a quantity known as the weighted external path length of a binary tree (see, e.g., Knuth [14]). It becomes identical to this quantity if t_1 = t_2 = t_3 = 1. Since S_a < S_b < S_c and T_a < T_b < T_c hold for all positive values s_i, t_i and p_j with Σ_j p_j = 1, (a) is the best of the three decision trees with respect to both memory requirement and processing time. However, in general, a tree which is optimal in one respect need not be optimal in another.
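These comparisons can be checked with a short computation. The sketch below (in Python; the numeric values assigned to s_i, t_i and p_j are illustrative assumptions, not values from the text) evaluates the six formulas and confirms both chains of inequalities:

```python
# A minimal numeric check of the cost formulas above; the concrete values
# of s_i, t_i and p_j are arbitrary illustrative choices.
s = {1: 1.0, 2: 1.0, 3: 1.0}          # memory space for conditions C1..C3
t = {1: 1.0, 2: 1.0, 3: 1.0}          # processing time for conditions C1..C3
p = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}  # probabilities of rules R1..R4

S_a = s[1] + s[2] + s[3]
T_a = (p[1] + p[3]) * (t[1] + t[2] + t[3]) + p[2] * (t[1] + t[3]) + p[4] * t[1]

S_b = 2 * s[1] + s[2] + s[3]
T_b = (p[1] + p[3]) * (t[1] + t[2] + t[3]) + (p[2] + p[4]) * (t[1] + t[3])

S_c = 4 * s[1] + s[2] + 2 * s[3]
T_c = t[1] + t[2] + t[3]              # tree (c) tests all three conditions on every path

assert S_a < S_b < S_c and T_a < T_b < T_c
```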
2.3. Cubic Representation of Decision Tables

The purpose of this section is to introduce a representation of a decision table by an n-dimensional cube. By using this cubic representation we can give a more intuitive interpretation of several procedures for generating decision trees, as well as a mathematically more precise formulation. The mathematical terminology and notation used in this thesis are introduced in the next chapter. Here we discuss the cubic representation of a decision table at an intuitive level.

Let us consider Table 2.2 again. If we replace Y and N by 1 and 0, respectively, then a rule, say R_2, becomes the triple (1,-,1). Similarly, R_1, R_3 and R_4 are (1,1,0), (1,0,0) and (0,-,-), respectively. Such triples can be identified with vertices or with certain sets of vertices (namely, subcubes) of the 3-dimensional cube, as shown in Figure 2.2.

FIGURE 2.2.

Note that the conditions C_i correspond to the coordinate axes of the cube. R_1 and R_3 each correspond to a vertex, or 0-cube, R_2 to an edge, or 1-cube, and R_4 to a face, or 2-cube. Thus a decision table with n conditions can be represented by a partition (of a special kind) of the set of vertices of an n-cube.

Now we explain how a decision tree can be obtained from the cubic representation. The decision tree of Figure 2.1.(a) is taken as an example. Correspondingly, the procedure is illustrated in Figure 2.3.(a).

FIGURE 2.3.(a).

The process starts by separating the 3-cube into two 2-cubes by removing the coordinate C_1. Then each 2-cube can again be separated into 1-cubes by removing a coordinate. This process continues until each separated cube consists of exactly one rule, that is, until every rule is identified.

Let us consider another example, Figure 2.1.(b). Its corresponding process can be illustrated as in Figure 2.3.(b).

FIGURE 2.3.(b).

This case creates a somewhat different situation. That is, when C_3 is taken first, the 2-cube R_4 is split into two 1-cubes, R_4' and R_4''. They are the same rule but are separately identified at the terminals of the decision tree. Such a "rule-split" (i.e., the separation of a cube consisting of a single rule) increases the number of nodes of the resulting decision tree, i.e., the number of condition boxes which appear in the tree, as well as the number of terminal nodes, which represent cases distinguished by the tree. Such rule-splits occurred in both of the trees of Figures 2.1.(b) and 2.1.(c).

Before we develop this cubic representation in the succeeding chapters, we give some remarks about such conventional terms as Else-rule, redundancy, and contradiction of a decision table (see, e.g., Pollack [24]).

(1) In general, the set of all rules of a decision table does not cover its associated cube completely. This means that some vertices of the n-cube are left unspecified and no action is taken for these unspecified vertices. We call the set of these vertices the Else-rule. The Else-rule is not necessarily a single subcube as is an ordinary rule.

(2) It may also occur that, for a given decision table, two or more different rules are assigned to the same vertex. In other words, they overlap at that vertex. If the series of actions taken for these rules are the same, redundancy exists. If this is not the case, however, then there are different series of actions for that vertex. This is called contradiction.

In this thesis, we consider neither the case of the Else-rule nor the case of redundancy or contradiction in a decision table, but only the case where the set of all vertices is partitioned into a set of disjoint subcubes. These restrictions are natural, since a decision tree whose internal nodes correspond to single conditions C_i (i.e., not logical combinations of conditions such as C_i ∨ C_j · C_k) can realize only such a special type of partition of all vertices.
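The translation from a limited-entry decision table to subcubes, together with the checks for the Else-rule and for overlap, is mechanical. A minimal sketch (Python; the dictionary encoding of the rules is an assumption of this sketch, and the rules are those of Table 2.2):

```python
from itertools import product

# Sketch: the cubic representation of the rules of Table 2.2, with checks
# for the Else-rule and for overlap (redundancy or contradiction).
rules = {'R1': ('1','1','0'), 'R2': ('1','-','1'),
         'R3': ('1','0','0'), 'R4': ('0','-','-')}

def vertices(cube):
    """All vertices (0-cubes) contained in a subcube such as ('-','0','1')."""
    axes = [('0','1') if x == '-' else (x,) for x in cube]
    return set(product(*axes))

n = 3
covered = {}
for name, cube in rules.items():
    for v in vertices(cube):
        covered.setdefault(v, []).append(name)

else_rule = [v for v in product('01', repeat=n) if v not in covered]
overlaps  = {v: names for v, names in covered.items() if len(names) > 1}
assert not else_rule and not overlaps    # Table 2.2 partitions the 3-cube
```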
3. MATHEMATICAL PRELIMINARIES

3.1. Introduction

In this chapter we first review some basic algebraic concepts concerning the lattice of partitions of a set, and then some additional terminology and notation are introduced. By using these, we put decision table problems into a more abstract and simplified form for their theoretical development in succeeding chapters. In particular, n-cube partitions and decision trees will be used instead of decision tables and flowcharts, respectively.

Among the notions introduced in this chapter, two concepts are especially basic and important for understanding this thesis. One of these is the inequality between two cube partitions, and the other is the multiplication of cube partitions. The former concept will be used often for describing the conversion procedure from cube partitions into decision trees in Chapter 4, while the latter will play an essential role in the decomposition theory of decision tables presented in Chapter 5.

In the last section of this chapter, we present a preliminary study of some sets of n-cube partitions and show that each set forms a lattice with the introduction of two operations. A more advanced study of this lattice in relation to the optimality discussion is presented in Chapter 4, however.

3.2. Algebraic Foundations

This entire section is excerpted from the book by Hartmanis and Stearns [5].

A relation between a set S and a set T is a subset R of S × T, and for (s,t) in R we write s R t. Thus, R = { (s,t) | s R t }. A relation R on S × S is: reflexive if, for all s, s R s; symmetric if s R t implies t R s; transitive if s R t and t R u implies s R u. A relation R on S is an equivalence relation on S if R is reflexive, symmetric and transitive. If R is an equivalence relation on S, then for every s in S, the set

B_R(s) = { t | s R t }

is an equivalence class (i.e., the equivalence class defined by s).

A partition π on S is a collection of disjoint subsets of S whose set union is S, i.e.,

π = {B_α} such that B_α ∩ B_β = ∅ for α ≠ β and ∪_α B_α = S.

We refer to the sets of π as blocks of π and designate the block which contains s by B_π(s). We write

s ≡ t (π)

if and only if s and t are contained in the same block of π. Note that s ≡ t (π) if and only if B_π(s) = B_π(t).

A binary relation R on S is a partial ordering of S if and only if R is

(i) reflexive: s R s for all s in S;
(ii) antisymmetric: s R t and t R s implies s = t;
(iii) transitive: s R t, t R u implies s R u.

We refer to a set S with a given partial ordering R as a partially ordered set. When a relation R is a partial ordering, we use the more suggestive symbol " ≤ " instead of R, and the partially ordered set is represented by the pair (S, ≤).

Let (S, ≤) be a partially ordered set and T be a subset of S. Then s (in S) is the least upper bound (l.u.b.) of T if and only if (i) s ≥ t for all t in T; (ii) s' ≥ t for all t in T implies that s' ≥ s. Dually, s is the greatest lower bound (g.l.b.) of T if and only if (i) s ≤ t for all t in T; (ii) s' ≤ t for all t in T implies that s' ≤ s.

A lattice is a partially ordered set, L = (S, ≤), which has a l.u.b. and a g.l.b. for every pair of elements.
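The partition notions above can be made concrete in a few lines. The following sketch (Python; the set S and the partition π in it are arbitrary examples, not taken from the text) checks the partition axioms and computes B_π(s):

```python
# Sketch of the partition notions above; S and pi are arbitrary examples.
S  = {'a', 'b', 'c', 'd', 'e'}
pi = [{'a', 'b'}, {'c'}, {'d', 'e'}]

def is_partition(blocks, universe):
    """Blocks must be disjoint and their union must be the whole set."""
    union = set().union(*blocks)
    return union == universe and sum(len(B) for B in blocks) == len(union)

def block_of(blocks, s):
    """B_pi(s): the unique block containing s."""
    return next(B for B in blocks if s in B)

def equivalent(blocks, s, t):
    """s = t (pi) holds if and only if B_pi(s) = B_pi(t)."""
    return block_of(blocks, s) == block_of(blocks, t)

assert is_partition(pi, S)
assert equivalent(pi, 'a', 'b') and not equivalent(pi, 'a', 'c')
```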
(i) x • x = x and x + x = x (ii) x • y = y • x and x + y = y + x (iii) x • (y • z) = (x . y) • z and x + (y + z) = (x + y) + z (iv) x • (x + y) = x and x + (x • y) = x. Let L = (S, ' , +) satisfy the conditions of the above definition of a lattice and define x < y if and only if x • y = x. Then it can be verified that (S, <) is a lattice and g.l.b. (x, y) - x • y and l.u.b. (x, y) = x + y. If L is a finite lattice, then L itself has a l.u.b. and g.l.b., denoted by I and 0, respectively. Element I is called the identity because I • x = x for all x in L. Element is called the zero because x + = x for all x in L. Let L = (S, •, +) be a lattice and T a nonvoid subset of S. Then L' = (T, •, +) is a sub lattice of L if and only if x and y in T implies that x • y and x + y are in T. 20 ,n 3.3. Partitions of an n-cube and Decision Trees The n- dimensional cube, or n-cube for short, has 2" vertices with n lines emanating from each vertex. The vertices of the n-cube are labeled with n-tuples of zeros and ones such that two vertices are con- nected by a line if and only if these labels differ in exactly one position. As examples, the cubes n = 1, 2 and 3 are shown in Figure 3.1. o -o 111 Oil 010 00 10 101 100 000 1-cube 2 -cube 3 -cube FIGUBE 3.1. Let us agree to call a single vertex a 0-cube. Then a pair of adjacent 0-cubes will determine an edge or 1-cube. If the two vertices are, say, (1,0,1) and (0,0,1), then we shall denote this 1-cube as (-,0,1). As " - " ranges over {0,1}, this represents two vertices of the 1-cube. In a similar way, a 2-cube is made up of four 0-cubes. For instance, (0,0,1), (0,1,1), (1,0,1) and (1,1,1) make up the 2-cube (-,-,1), where each ranges over {0,1} independently. 21 Now we define two different n-cube partitions as follows DEFINITION 3.1. J3 A partition s of a set of 2 vertices of an n-cube is called an n-cube partition . The number of blocks of a partition it is denoted by #(*) . If each block, B., of this partition n is a single k-cube partition (k < n), then it is called an n-cube rule partition. EXAMPLE 3.1. We show a 3-cube rule partition n, below as an example, * 1 = {B^ Bg, B 3 , B^}, where B ] _ = {(0,0,0)}, Bg = {(1,0,0)}, B 3 = {(-,0,l)}, and B. = {(-,1,-)}. We note that each B. (i= 1,2,3**0 of rt is a single cube. The following partition jt is a 3-cube partition but not a rule partition, however, because B is not a single cube, nor is B . it = {B , Bg, B 3 , B^}, where B ± = {(0,0,0), (l,0,l)}, Bg = {(0,0,1)}, B 3 = {(0,1,0), (0,1,-), (1,1,1)} and B^ = {(l,-,0)}, respectively. 22 Both rule partition and (general) partition of the n-cube can be considered as simple models of a decision table with n condition rows. If we are concerned only with the condition stub and condition entry por- tions of a decision table (which has no Else-rule nor redundancy or ambi- guity), a rule partition corresponds uniquely to a decision table with n conditions. An n-cube partition, more generally, can be considered as another model of a decision table. A block of a partition corresponds to an action of a decision table. As long as conversion techniques are concerned, these two parti- tions serve as simple theoretical models of a decision table for the discussion of its optimality. Next we define binary trees, and then binary decision trees. DEFINITION 3.2. A binary tree T. 
Next we define binary trees, and then binary decision trees.

DEFINITION 3.2.

A binary tree T_i with i internal nodes and i+1 terminal nodes (i ≥ 0) is defined recursively as follows: If i = 0, then it consists of a single terminal node; otherwise it is a triple (T_l, v, T_r), where v is a distinguished internal node called the root of T_i, and T_l and T_r are binary trees with l and r internal nodes, respectively, and with l+1 and r+1 terminal nodes, respectively, where l ≥ 0, r ≥ 0, l + r = i - 1.

DEFINITION 3.3.

A binary decision tree involving n conditions C_1, C_2, ..., C_n is a binary tree each of whose internal nodes is labeled with a condition C_j such that, for any path from the root of the tree to a terminal node, no condition C_j appears more than once along this path.

Then we associate a binary decision tree with an n-cube partition in the following manner.

DEFINITION 3.4.

The n-cube rule partition associated with a binary decision tree T is defined as follows: For any terminal node t of T and the path C_{t_1}, C_{t_2}, ..., C_{t_p} from the root to t, where t_i ∈ {1,2,...,n}, associate the k-cube (k = n-p), B_t = (x_1, x_2, ..., x_n), in such a way that

1) x_i is "-" if C_i is not in the path C_{t_1}, C_{t_2}, ..., C_{t_p}; and

2) x_i is 0 (1) if C_i is in the path C_{t_1}, C_{t_2}, ..., C_{t_p} and the terminal node t exists in the left (right) sub-decision tree of the internal node C_i.

It is obvious that all these blocks B_t form an n-cube rule partition and that this partition is unique.

EXAMPLE 3.2.

Assume n = 3. Associated with each terminal node i is a block B_i (i = 1,2,3,4) as follows:

B_1 = {(0,-,0)}, B_2 = {(0,0,1)}, B_3 = {(0,1,1)}, B_4 = {(1,-,-)}.

They form a 3-cube partition π.

DEFINITION 3.5.

It is said that a decision tree realizes a rule partition π if and only if each terminal node of the tree represents a block of the partition π and vice versa. A partition is realizable if and only if there exists a decision tree which realizes the partition. We note that a realizable partition is always a rule partition.

EXAMPLE 3.3.

We show an example of a non-realizable 3-cube partition π. Its blocks B_i are

B_1 = {(1,0,-)}, B_2 = {(0,-,1)}, B_3 = {(-,1,0)}, B_4 = {(0,0,0)}, and B_5 = {(1,1,1)}.

Whichever condition C_i is tested first, one of the blocks B_1, B_2, B_3 has "-" in position i and is therefore split, so no decision tree can realize π.

REMARK

As we have shown, a decision tree determines a unique n-cube rule partition. However, the converse is not true. In other words, more than one decision tree may realize the same partition, as we show in the following.

Two different decision trees T_1 and T_2 realize the same partition π (with blocks B_1, ..., B_6).
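Definition 3.4 amounts to a simple recursive walk of the tree. In the sketch below (Python), the encoding of a decision tree as nested tuples (j, left, right), with the left branch taken on answer 0, is an assumption of the sketch; the example tree reproduces the partition of Example 3.2:

```python
# Sketch of Definition 3.4: the rule partition determined by a decision
# tree.  A tree is 'leaf' or a triple (j, left, right): test C_j, take
# `left` on 0 and `right` on 1.
def blocks_of_tree(tree, n, prefix=None):
    prefix = prefix or ['-'] * n
    if tree == 'leaf':
        yield tuple(prefix)                 # x_i stays '-' off the path
        return
    j, left, right = tree
    for value, sub in (('0', left), ('1', right)):
        branch = list(prefix)
        branch[j - 1] = value               # coordinate x_j is fixed on this path
        yield from blocks_of_tree(sub, n, branch)

tree = (1, (3, 'leaf', (2, 'leaf', 'leaf')), 'leaf')
print(list(blocks_of_tree(tree, 3)))
# [('0','-','0'), ('0','0','1'), ('0','1','1'), ('1','-','-')]
```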
The relation " < " and a multiplication " • ", both defined between two cube partitions, are very important concepts and are used more than often in the succeeding two chapters. We introduce the binary rela- tion " < " first. DEFINITION 3-6. and write For n and n , we say that rt is larger than or equal to re «!<* 2 if and only if, for any two vertices v and v_ of the n-cube, V l ~ V 2 (*l) implies v ■ v g (k 2 ) , that is, it < it holds if and only if every block n is contained in a block of if . Since this relation satisfies the three properties of a partial ordering, we can present the following proposition. PROPOSITION 3.1. The binary relation " < " in the above is a partial ordering of S, S , and S rt . ' r' Next we introduce the binary operations " • ", " + ", and " @ " among n-cube partitions. 28 DEFINITION 3-7- If it and it p are partitions, then (i) it • it p is a partition such that v l S V 2 ^1 ' ^ if and 0nly lf V l ^ V 2 ^1^ and V l S V 2 • ^2^' (ii) it + it is a partition such that v_ - v p (it + it ) if and only if there exists a sequence in v such that u. = u. , (n,) or u = u. , (it_) for < i < 1-1. i i + 1 v 1 i i + 1 ' 2 y __ The procedure to form it • it is very simple since the blocks of it • it are obtained by intersecting blocks of it and it , The process to obtain it + it is longer, but still straight- forward. To compute B ' (v) we proceed inductively. Let B (v) = B (v) U B (v) 1 2 and for i > 1 let B i + (v) = B (v) U {B | B is a block of it or it , and B R B. (v) ^ }. Then B (v) = B (v) for any i such that B. , (v) = B.(v) it + it x * i + 1 v l EXAMPLE 3.4. We show it • it and it + it by the following examples. 29 *1 • *2 *!+ « 2 NOTATION 3.1. and Repeated multiplication and addition are represented by n jt • it . . , jt = n it. 1 2 n l i=l n JT + JT + . . . + TT = Z IT. 1 2 n . , i 1=1 According to the first definition of a lattice in Section 3.2. , it is easy to show the following. PROPOSITION 3.2. The partially ordered set, (S, <), of all n-cube partition is a lattice and , 30 g.l.b. (x lt « 2 ) = k ± ■ * 2 l.u.b. (jt^ st g ) = ^ + n g . Since we can define the binary relation " < " by: «»■ < jt if g^ only if jt • jt = jt if and only if it + jt = jt , we can prove the following statement by using the second definition of a lattice. PROPOSITION 3-3- The set (S, •, +) of all n-cube partitions with the two binary operations " • " and " + " is a lattice. We know that the two other sets S and S of all rule and realiz- r o able partitions are subsets of S and also partially ordered sets with respect to the relation " < As we now show however, S and S with .f _ ' r o the two operations •" • " and " + " are not sublattices of (S, ' } +) . Next we investigate these two sets S and S more carefully. r o J PROPOSITION 3A. 1) With respect to rule partitions: If jt_ , jt e S then Jt • jr. e S . However, there exists a pair 1 2 r 1 2 r of rule partitions jt and jt_ ( jt_ , jt 6 S ) such that jt_ + tl_ & S . 1 2 v 1' 2 r 12 r r 2) With respect to realizable partitions: If jt . jt € S . then jt • jt_ 6 S . However, there exists a 1 2 o 1 2 o 7 pair of realizable partitions Jt, and Jt_ (it n , it e S ) such that jt_ + jt_ 1 2 1 2 o 12 ' o 3) There exists a pair consisting of a rule partition, it. , and a realizable partition, jt , such that jt < jt and jt is not realizable. PROOF: The proof of the first two statements of l) and 2) are omitted. 31 It is sufficient to show the following two partitions it and it in order to prove the two second statemtnts of l) and 2) . 
We know that the two other sets, S_r and S_o, of all rule and realizable partitions are subsets of S and are also partially ordered sets with respect to the relation " ≤ ". As we now show, however, S_r and S_o with the two operations " · " and " + " are not sublattices of (S, ·, +). Next we investigate these two sets S_r and S_o more carefully.

PROPOSITION 3.4.

1) With respect to rule partitions: If π_1, π_2 ∈ S_r, then π_1 · π_2 ∈ S_r. However, there exists a pair of rule partitions π_1 and π_2 (π_1, π_2 ∈ S_r) such that π_1 + π_2 ∉ S_r.

2) With respect to realizable partitions: If π_1, π_2 ∈ S_o, then π_1 · π_2 ∈ S_o. However, there exists a pair of realizable partitions π_1 and π_2 (π_1, π_2 ∈ S_o) such that π_1 + π_2 ∉ S_o.

3) There exists a pair consisting of a rule partition π_1 and a realizable partition π_2 such that π_1 ≤ π_2 and π_1 · π_2 is not realizable.

PROOF: The proofs of the first statements of 1) and 2) are omitted. It is sufficient to exhibit two partitions π_1 and π_2 in order to prove the second statements of 1) and 2): both π_1 and π_2 are realizable (i.e., rule) partitions, but π_1 + π_2 is neither a rule partition nor a realizable partition.

To prove 3), we exhibit two partitions π_1 and π_2 as follows: π_1 is a rule partition but not a realizable partition; π_2 is a realizable partition consisting of only one block (the 3-cube itself), and π_1 ≤ π_2. However, π_1 · π_2 = π_1, and so π_1 · π_2 is not realizable. Q.E.D.

By the above proposition, we learned that the sets S_r and S_o are not closed under the operation " + ", while they are closed under the operation " · ". Therefore, (S_r, ·, +) and (S_o, ·, +) are not sublattices of the lattice (S, ·, +). Instead of the operation " + ", we next define an operation " ⊕ " between two partitions as follows.

DEFINITION 3.8.

π_1 ⊕ π_2 is the rule partition which satisfies 1) π_1 ⊕ π_2 ≥ π_1 + π_2, and 2) for any π ∈ S_r such that π ≥ π_1 + π_2, π_1 ⊕ π_2 ≤ π holds.

EXAMPLE 3.5.

We show two examples of the operation " ⊕ ". The process to obtain π_1 ⊕ π_2 from π_1 and π_2 is omitted.

Since it is easily shown that S_r and S_o are closed under the operation " ⊕ " and we can define l.u.b.(π_1, π_2) = π_1 ⊕ π_2, we obtain the following proposition.

PROPOSITION 3.5.

The sets (S_r, ·, ⊕) and (S_o, ·, ⊕) are lattices, and the latter is a sublattice of the former.

REMARK

The set (S, ·, ⊕) is not a lattice, since π_1 ⊕ π_2 does not have the l.u.b.(π_1, π_2) property in S.

It is known that every finite lattice has 0 and I elements. The three lattices (S, ·, +), (S_r, ·, ⊕) and (S_o, ·, ⊕) have the following common 0 and I elements:

1) The 0 element is the partition consisting of 2^n blocks, where each block contains one and only one vertex.

2) The I element is the partition consisting of one block containing all 2^n vertices, i.e., the n-cube itself.

4. SOME ASPECTS OF DECISION TREE CONSTRUCTION

4.1. Introduction

Known manual methods (Egler [2], Press [25], and Pollack [23]) of converting decision tables into decision trees are based mainly on plausible arguments with little theoretical background. On the other hand, Reinwald and Soland ([26], [27]) formulated this conversion problem as a problem in mathematical programming, and described a branch-and-bound algorithm for the construction of decision trees which minimizes either the average processing time or the storage requirement.

In this chapter we derive some basic theoretical results concerning optimal decision trees. The argument is developed based on n-cube partitions, which were introduced as a simplified mathematical model of decision tables. After the cost of a decision tree is defined, a procedure, called Procedure R, to construct decision trees from a given partition is shown. Based on this procedure an algorithm, called "iterated local minimization", is proposed. It does not always generate an optimal decision tree, but yields suboptimal trees which approximately minimize costs. The trees generated by this algorithm are compared quantitatively with optimal trees and with those constructed by Pollack's first algorithm. This chapter also contains an analysis of "rule-splits", particularly lower and upper bounds for the minimum number of required rule-splits over all n-cube rule partitions. Since a decision tree which is optimal for one objective function is not necessarily optimal for other objective functions, relationships existing among optimal decision trees under different objective functions are also discussed in this chapter. Such arguments are based on the partially ordered sets S_o and S_r of n-cube realizable and rule partitions.
4.2. How to Construct Decision Trees

In this section, we describe a procedure, called Procedure R, to construct decision trees which realize partitions π', π'', ... which are refinements of a given partition π (i.e., π' ≤ π, π'' ≤ π, ...).

FIGURE 4.1.

The cost of a decision tree is the number of its internal nodes, so the cost of a decision tree realizing π' is #(π') - 1, and

#(π') - 1 = #(π) - 1 + {#(π') - #(π)}.

The quantity #(π') - #(π) is called the loss(π',π) due to the replacement of the (nonrealizable) partition π by the (realizable) partition π', where π' ≤ π.

DEFINITION 4.2.

The minimum cost for a partition π is defined to be

Min_{π'} {#(π') - 1} = #(π) - 1 + Min_{π'} loss(π',π),

where Min_{π'} is taken over all realizable partitions π' which satisfy π' ≤ π.

Now we see that our optimization problem is to find a realizable partition π' (≤ π) for a given partition π such that the loss(π',π) = #(π') - #(π) is minimized.

Next we describe a procedure to construct a decision tree realizing π' for a given partition π (π' ≤ π) and show how to calculate the loss #(π') - #(π). The entire procedure, called Procedure R, is based on the following Operation A, which generates two (k-1)-cube partitions from a k-cube partition.

OPERATION A

Given a k-cube rule partition σ which consists of more than one block, and given a condition C_s, where 1 ≤ s ≤ k, define two (k-1)-cube rule partitions, σ_0 and σ_1, as follows. For each block B_i = (x_1, x_2, ..., x_s, ..., x_k) of σ:

1) if x_s of B_i is 0, then the (k-1)-tuple (x_1, x_2, ..., x_{s-1}, x_{s+1}, ..., x_k) is a block of σ_0;

2) if x_s of B_i is 1, then the (k-1)-tuple (x_1, x_2, ..., x_{s-1}, x_{s+1}, ..., x_k) is a block of σ_1;

3) if x_s is -, then the (k-1)-tuple (x_1, x_2, ..., x_{s-1}, x_{s+1}, ..., x_k) is a block of both σ_0 and σ_1;

and σ_0 and σ_1 have no blocks other than those obtained from 1), 2) and 3) above.

EXAMPLE 4.1.

Consider the following 3-cube partition σ on the conditions C_1, C_2, C_3, where

σ = {B_1 = (0,0,0), B_2 = (0,1,0), B_3 = (0,-,1), B_4 = (1,-,-)}.

If we choose C_1 as the root of the decision tree, then the corresponding σ_0 and σ_1 (on the coordinates C_2, C_3) are σ_0 = {(0,0), (1,0), (-,1)} and σ_1 = {(-,-)}, respectively.

Given a partition π, Procedure R constructs a decision tree which realizes a partition π' (≤ π) by applying Operation A repeatedly. For the moment we leave open how Procedure R chooses the condition C_s to be used in Operation A. Various specific choices will be discussed later.

PROCEDURE R

Assume a rule partition π is given. If π consists of a single block, construct a decision tree which consists of a single terminal node. Otherwise, choose a condition C_s, derive the two partitions π_0 and π_1 by applying Operation A to π and C_s, and construct a decision tree as follows: Its root is labeled with condition C_s, its left subtree is obtained by applying Procedure R to π_0, and its right subtree by applying Procedure R to π_1.

Then we obtain the following proposition. (The proof is omitted.)

PROPOSITION 4.1.

Procedure R, applied to a partition π, constructs a decision tree realizing a partition π' which satisfies π' ≤ π.
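Operation A and Procedure R translate directly into code. In the sketch below (Python), blocks are tuples over {'0','1','-'}, and the policy for selecting C_s is deliberately left to a caller-supplied `choose` function, since the text leaves this choice open at this point; the final lines reproduce Example 4.1:

```python
# Sketch of Operation A and Procedure R.
def operation_a(sigma, s):
    """Split a k-cube rule partition by condition C_s (s is 1-indexed)."""
    sigma0, sigma1 = [], []
    for block in sigma:
        rest = block[:s - 1] + block[s:]     # drop coordinate x_s
        if block[s - 1] in ('0', '-'):
            sigma0.append(rest)
        if block[s - 1] in ('1', '-'):
            sigma1.append(rest)              # a '-' block goes to both sides
    return sigma0, sigma1

def procedure_r(sigma, conditions, choose):
    """Build a tree ('leaf' or (C_s, left, right)) realizing some pi' <= pi."""
    if len(sigma) == 1:
        return 'leaf'
    s = choose(sigma, conditions)            # the label of the chosen condition
    i = conditions.index(s)
    sigma0, sigma1 = operation_a(sigma, i + 1)
    rest = conditions[:i] + conditions[i + 1:]
    return (s, procedure_r(sigma0, rest, choose),
               procedure_r(sigma1, rest, choose))

sigma = [('0','0','0'), ('0','1','0'), ('0','-','1'), ('1','-','-')]
assert operation_a(sigma, 1) == ([('0','0'), ('1','0'), ('-','1')], [('-','-')])
# A naive first-condition policy; its second step splits ('-','1'), so pi' < pi:
print(procedure_r(sigma, [1, 2, 3], lambda sg, cs: cs[0]))
```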
EXAMPLE 4.2.

Readers are invited to check the above Procedure R on the two examples of Figures 2.3.(a) and (b) in Chapter 2. For the case (a), the realized partition is the given partition π. For the case (b), however, Procedure R yields the tree realizing a partition π' which is smaller than the original partition π.

In Proposition 4.1 it is shown that the tree constructed by Procedure R applied to a partition π realizes a partition π' which satisfies π' ≤ π. Now we show how the loss(π',π) = #(π') - #(π) can be calculated. For this purpose we introduce the following definition.

DEFINITION 4.3.

The loss ℓ(C_s, σ) due to using C_s when Procedure R is applied to a partition σ is defined by

ℓ(C_s, σ) = the number of blocks B_i = (x_1, x_2, ..., x_k) of σ whose x_s is "-".

We call C_s a lossfree condition with respect to σ if and only if ℓ(C_s, σ) = 0 holds. On the other hand, if all blocks B_i = (x_1, x_2, ..., x_k) of σ have x_s = "-", then the condition C_s is called inessential to σ.

PROPOSITION 4.2.

The cost of the decision tree realizing π' constructed by the above Procedure R is given by

#(π') - 1 = #(π) - 1 + {#(π') - #(π)} = #(π) - 1 + Σ_i ℓ(C_{s_i}, σ_i),

where the sum is taken over all steps i of the procedure, C_{s_i} being the condition chosen and σ_i the partition treated at step i. Therefore, the loss(π',π) is expressed by

loss(π',π) = Σ_i ℓ(C_{s_i}, σ_i).

EXAMPLE 4.3.

Consider the following partition π. Both the iterated local minimization and Pollack's first algorithm generate the same decision tree, T_1 of Figure 4.2.(a); they choose the same condition first as the root of the decision tree. The cost of T_1 is 12.

FIGURE 4.2.(a). FIGURE 4.2.(b).

The decision tree T_2 of Figure 4.2.(b) is the optimal tree for this nonrealizable partition π. Its cost is 11. We show in Figure 4.3 the two partitions π' and π'' realized by T_1 and T_2, respectively.

FIGURE 4.3.(a). FIGURE 4.3.(b).

We note that the loss(π',π) and the loss(π'',π) are 3 and 2, respectively.

PROPOSITION 4.4.

The iterated local minimization, as well as Pollack's first algorithm, does not always yield an optimal decision tree.

4.3. Comparative Study of Algorithms

We learned that the iterated local minimization and Pollack's first algorithm do not always yield an optimal decision tree. In this section, we compare trees generated by the algorithm of iterated local minimization, by Pollack's algorithm, and optimal trees; and we provide estimates of how far off-optimal the trees generated by these two algorithms are. To make the comparative arguments more concise, we prepare the following definition and give a very basic theorem.

DEFINITION 4.4.

For a k-cube partition π_k, its n-cube extension π_n (k ≤ n) is defined as follows. For each block B_i = (x_1, x_2, ..., x_k) of π_k, we form the 2^(n-k) blocks of π_n obtained by adding (n-k) new coordinates C_{k+1}, C_{k+2}, ..., C_n to the tuple and letting each of them range over {0,1}, i.e., the blocks (x_1, x_2, ..., x_k, c_{k+1}, c_{k+2}, ..., c_n) with each c_j ∈ {0,1}. Then #(π_n) is equal to 2^(n-k) · #(π_k).

EXAMPLE 4.4.

As an example, the construction from π_k to its n-cube extension π_n is shown below, where k = 3 and n = 4.

THEOREM 4.2.

Assume an algorithm A is applied to a k-cube partition π_k, and it constructs a decision tree realizing π_k'. Then, as we defined previously, the loss(π_k', π_k) is #(π_k') - #(π_k). When the same algorithm A is applied to the n-cube extension π_n of π_k, the loss(π_n', π_n) is given by

loss(π_n', π_n) = #(π_n') - #(π_n) ≥ 2^(n-k) · loss(π_k', π_k) = 2^(n-k) · (#(π_k') - #(π_k)),

where π_n' is the partition realized by the resultant decision tree.

PROOF: The algorithm A may or may not have the LCF (lossfree-condition-first) policy. If it has the LCF policy, then it can choose the newly added coordinates C_{k+1}, C_{k+2}, ..., C_n at the first 2^(n-k) - 1 steps of Procedure R applied to the n-cube extension π_n, because these conditions are lossfree at each of the first 2^(n-k) - 1 steps. In practice, this means that the n-cube partition π_n is divided into 2^(n-k) k-cube partitions π_k. For each k-cube partition, the algorithm generates the loss(π_k', π_k) = #(π_k') - #(π_k); therefore, in total, this algorithm generates the loss(π_n', π_n) = 2^(n-k) · loss(π_k', π_k) for the n-cube extension π_n. If the algorithm A does not have the LCF policy, then it generates even more loss than the above loss(π_n', π_n), according to Theorem 4.1. Q.E.D.

This theorem says that the loss generated by an algorithm for a k-cube partition is magnified by the factor 2^(n-k) for its n-cube extension.
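Definition 4.4 and the lossfree observation used in the proof can be checked directly. A minimal sketch (Python), using the block encoding of the earlier sketches:

```python
from itertools import product

def extension(pi_k, n):
    """Definition 4.4: append every 0/1 combination on the new coordinates."""
    k = len(pi_k[0])
    return [block + tail for block in pi_k
            for tail in product('01', repeat=n - k)]

def loss(sigma, s):
    """Definition 4.3: the number of blocks split by C_s."""
    return sum(1 for b in sigma if b[s - 1] == '-')

pi_3 = [('0','0','0'), ('1','0','0'), ('-','0','1'), ('-','1','-')]
pi_4 = extension(pi_3, 4)
assert len(pi_4) == 2 ** (4 - 3) * len(pi_3)   # #(pi_n) = 2^(n-k) * #(pi_k)
assert loss(pi_4, 4) == 0                      # the new coordinate C_4 is lossfree
```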
C at the first 2 -1 steps of Procedure R applied to the n- k+2' ' n cube extension, it , because these conditions are lossfree at each step of the first 2 -1 steps. In practice, this means that the n-cube n-k partition jt is divided into 2 k-cube partitions, jt . For each k-cube partition, the algorithm generates the loss(rt ' ,Tt ) = #(jt ' ) - #(rt ); therefore, totally, this algorithm generates the 1oss(jt ',jt ) - n-k 2 . lossfjr. '.it, ) for the n-cube extension, it . If this algorithm A v k ' k 7 ' n does not have the LCF policy, then it generates much more loss than the above loss (it ',it ) according to Theorem k.l. Q.E.D. n ' n This theorem says that the loss generated by an algorithm for a n-k k-cube partition is magnified by the factor 2 for its n-cube extension. COROLLARY k.3, Assume that two algorithms, A and B, have the LCF policy. If A and B generate the loss (it ' ,it ) and the loss(it ",it ) for a k-cube partition, respectively, then for any n (> k), there exists an n-cube 51 partition for which A and B generate 2 « loss(n',ir) and 2 . 1oss(tt ",jt, ), respectively. In other words, the difference of the n-k losses generated by A and B is 2 •. {loss(it ' ,it ) - loss(« ",jc. )}. K. k k K PROOF: If we consider the n-cube extension it of the k-cube it, , then n k' the above statement can be directly derived from Theorem k.2. Q. E.D. The corollary says that, if there is a difference, d, in the losses generated by two algorithms with LCF policy, for a k-cube partition, then, its n-cube extension causes the difference in the losses n-k by A and B to be 2 . d. We now apply this corollary to compare the losses of various trees. THEOREM k.k. For any n > k, there exists an n-cube partition for which the n-k cost of an optimal decision tree is 2 less than the cost of a decision tree constructed by the iterated local minimization or by Pollack's first algorithm. PROOF : Consider the U-cube partition, n, in Example k.k. Since the decision tree by the iterated local minimization or by Pollack's first algorithm differs from the optimal, in terms of the difference of loss, by one, then for its n-cube extension, the difference of loss becomes 2 n_k , according to Corollary k.3- Q.E.D. Secondly, we compare Pollack's first algorithm with the iterated local minimization. 52 THEOREM k.5. For any n > 5, there exists an n-cube partition for which the cost of the decision tree constructed by Pollack's first algorithm is 2 • 2 n times larger than the cost of a decision tree by the iterated local minimization. PROOF: Pollack's first algorithm generates the loss for the 5-cube partition of Figure k.k. (it chooses C first as the root of the decision tree. ) FIGURE k.k. The iterated local minimization, however, generates the loss 1. It chooses C first as the root of the decision tree. After the LCF policy can be applied at each step of the procedure, we obtain, for the n-cube extension, the difference of loss in the statement, i.e., ^-5 (3-D =2-2 n-5 Q.E.D. 53 Finally it is shown that the iterated local minimization is not always better than Pollack's algorithm. THEOREM h.6. For any n > 5> there exists an n-cube partition for which the cost of the decision tree constructed by the iterated local minimization is n-5 2*2 times larger than the cost of the decision tree obtained by Pollack's first algorithm. PROOF : The iterated local minimization generates a loss of 5 for the 5- cube partition of Figure 4. 
That is, it first chooses a root condition while generating a loss of 2 at this first step. After that it generates losses of 1 and 2 for the two resultant 4-cube partitions, respectively. The total loss, therefore, is equal to 5.

FIGURE 4.5.

On the other hand, Pollack's algorithm chooses first a condition which generates a loss of 3 at this step. After that, it generates no loss. Therefore, the difference of loss by the two algorithms for this partition is (5 - 3) = 2. Using Corollary 4.3, the 2 · 2^(n-5) of the statement is obtained. Q.E.D.

4.4. Bounds of Minimum Total Loss

Associated with the concept of total loss for nonrealizable partitions, one interesting question occurs: how much total loss must be generated by an optimal algorithm? Notationally, this quantity is characterized by

L(n) = Max_π Min_{π'} loss(π',π) = Max_π Min_{π'} {#(π') - #(π) | π' ≤ π and π' is realizable},

where Max is taken over all n-cube partitions π.

THEOREM 4.7.

The quantity L(n) is bounded by

2^(n-3) ≤ L(n) ≤ 2^(n-1) · log(n/2).

PROOF:

a) Lower bound. Consider the 3-cube partition in Example 3.3 in Chapter 3. The total loss associated with an optimal decision tree for this partition is equal to one. According to Corollary 4.3, an optimal algorithm generates at least the loss 2^(n-3) for its n-cube extension. Therefore, we obtain the lower bound (which is existential) 2^(n-3) for L(n).

b) Upper bound. For the proof of this bound, we use the following lemma.

LEMMA 4.1.

At any step of Procedure R at which the treated partition τ is a k-cube partition, at most a loss of ⌊2^(k-1)/k⌋ is generated (k ≥ 3), i.e.,

ℓ(C_s, τ) ≤ ⌊2^(k-1)/k⌋,

where ⌊x⌋ is the greatest integer that is less than or equal to x.

The proof of this lemma is omitted. Instead, for k = 3, 4 and 5, we show in Example 4.5 a k-cube partition which achieves ⌊2^(k-1)/k⌋.

EXAMPLE 4.5.

The following 3-, 4- and 5-cube partitions are examples which generate a loss of ⌊2^(k-1)/k⌋, for k = 3, 4, and 5, at one step.

Now we prove the upper bound based on the above lemma. Assume that we have an n-cube partition for which a loss of ⌊2^(k-1)/k⌋ is generated at each step of Procedure R. Then, for this partition, the total loss is given by

⌊2^(n-1)/n⌋ + 2 · ⌊2^(n-2)/(n-1)⌋ + ... + 2^(n-3) · ⌊2^2/3⌋

and bounded by

≤ 2^(n-1)/n + 2 · 2^(n-2)/(n-1) + 2^2 · 2^(n-3)/(n-2) + ... + 2^(n-3) · 2^2/3
= 2^(n-1) · {1/n + 1/(n-1) + 1/(n-2) + ... + 1/3}
≤ 2^(n-1) · ∫_2^n (1/t) dt = 2^(n-1) · (log n - log 2) = 2^(n-1) · log(n/2). Q.E.D.

In practice, for the case n = 10, the lower bound is 2^7. This means, in conventional terminology, that there exists a decision table with 10 condition rows for which an optimal algorithm splits 2^7 rules in total. The total loss is, however, bounded by 2^9 · log 5 ≈ 820.
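The bounds of Theorem 4.7 can be checked numerically for n = 10. The sketch below (Python) evaluates the lower bound, the sum of the per-step bounds of Lemma 4.1 appearing in the proof, and the closed-form upper bound (log denotes the natural logarithm):

```python
from math import log

n = 10
lower = 2 ** (n - 3)                                # existential lower bound
per_step_sum = sum((2 ** i) * (2 ** (n - 1 - i) // (n - i))
                   for i in range(n - 2))           # terms down to the 3-cubes
upper = 2 ** (n - 1) * log(n / 2)
print(lower, per_step_sum, upper)                   # 128, 675, ~824 (the text rounds to 820)
assert lower <= per_step_sum <= upper
```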
4.5. Optimality Discussions for Different Objective Functions

So far we have worked only with the objective function #(π). There are, however, two other criteria to be considered: the total memory space requirement, M, and the average processing time, P, which were briefly described in Chapter 2. Obviously, #(π) is a simplified and special case of M and P. However, one question arises: what kind of relationship exists among those optimal decision trees which minimize different objective functions? More simply, does an optimal decision tree for one objective function minimize the other two objective functions?

In this section we develop such arguments on optimality for different objective functions, based on the two partially ordered sets, S_o and S_r, of all realizable and rule partitions, respectively. Moreover, we show how Procedure R and related algorithms work and/or should be modified for those two other objective functions. For simplicity, the three different objective functions are called #-cost, M-cost, and P-cost, respectively.

STATEMENT

For a given partition π ∈ S_r, where do optimal solutions exist in S_o for the different #-cost, M-cost and P-cost, respectively? How can we relate one optimal solution to the rest? How well do the procedures or algorithms proposed so far for minimizing #-cost work for M-cost and/or P-cost?

This section answers these questions to some extent. We first give a brief review of the properties of these three different costs. Recall that a decision tree is defined only for a realizable partition π, and we denote it by T(π); T(π) is read "a decision tree realizing a partition π".

a) #-cost

The #-cost, C(T(π)), of a decision tree T(π) realizing π is the number of internal nodes of the tree, and it is equal to the number of blocks of π minus one, i.e.,

C(T(π)) = #(π) - 1.

Therefore, the #-cost of all decision trees realizing π is the same. In other words, this cost is independent of the decision trees realizing π and depends only on the partition π. So we define the #-cost, C(π), for a realizable partition π by

C(π) = #(π) - 1.

(Do not confuse C(π) with #(π). The former is the #-cost, and the latter is the number of blocks of π.) It is straightforward to extend the above definition to the case of a nonrealizable partition π. Thus, more generally, we define the #-cost, C(π), of a rule partition π (∈ S_r) by C(π) = #(π) - 1.

b) M-cost

M-cost, on the other hand, can be defined only for a realizable partition π (∈ S_o) and a decision tree T(π). The M-cost, C_M(T(π)), of a decision tree T(π) is defined by

C_M(T(π)) = Σ_i s_i,

where s_i is the storage space required for a condition C_i and the sum Σ_i s_i is over all internal nodes i of T(π). Then it is easy to see that the M-costs of different decision trees realizing the same partition π (∈ S_o) may differ, i.e., C_M(T_1(π)) ≠ C_M(T_2(π)). Consider π, T_1 and T_2 in the Remark after Example 3.3. T_1 and T_2 realize the same partition π, but the M-costs of those trees are C_M(T_1(π)) = s_1 + 3s_2 + s_3 and C_M(T_2(π)) = 2s_1 + s_2 + 2s_3, respectively. It is concluded that M-cost cannot be defined for a partition itself.

c) P-cost

First we assume a partition π (∈ S_r). A probability Pr(v) of occurrence for every vertex v of π is given and fixed. The probability of occurrence of a block B of π is then simply calculated by Pr(B) = Σ_{v∈B} Pr(v). The P-cost, C_P(T(π)), of a decision tree T(π) realizing π is defined by

C_P(T(π)) = Σ_i Pr(B_i) · {Σ_k t_{i_k}},

where B_i is the block of π corresponding to a terminal node i of T; t_{i_k} is the time required to process a condition C_{i_k}; and the sum Σ_k t_{i_k} is taken over all internal nodes C_{i_k} along the path from the root to the terminal node i of the tree. Examples were seen in Section 2.2.

Now we show a different way to calculate the P-cost without constructing a decision tree.

THEOREM 4.8.

Let a partition π have blocks B_i = (x_1, x_2, ..., x_n), where each x_j is 1, 0 or "-". For any decision tree T(π) realizing π (∈ S_o), its cost C_P(T(π)) is equal to

C_P(T(π)) = Σ_i Pr(B_i) · {Σ_{k∈I_i} t_k},

where I_i is the set of indices k such that x_k is 1 or 0 (but not "-") in the n-tuple B_i = (x_1, x_2, ..., x_n). The proof is omitted.

EXAMPLE 4.6.

To verify the theorem, consider the following partition π and a tree T(π) realizing it (the tree tests C_3 at the root, then C_2, then C_1):

B_1 = {(0,0,0)}, B_2 = {(1,0,0)}, B_3 = {(-,1,0)}, B_4 = {(-,-,1)}.

C_P(T(π)) can be calculated by its definition as follows:

C_P(T(π)) = Pr(B_4) · t_3 + Pr(B_3) · (t_3 + t_2) + Pr(B_1) · (t_3 + t_2 + t_1) + Pr(B_2) · (t_3 + t_2 + t_1).

On the other hand, the left-hand side of the equality in the above theorem is calculated as follows. The index sets I_i of B_i for i = 1, 2, 3 and 4 are {1,2,3}, {1,2,3}, {2,3} and {3}, respectively. Then the terms of the left-hand side are Pr(B_1) · (t_1 + t_2 + t_3), Pr(B_2) · (t_1 + t_2 + t_3), Pr(B_3) · (t_2 + t_3) and Pr(B_4) · t_3, and it is easy to see that this sum is equal to C_P(T(π)).

We learned that the P-cost of all decision trees realizing π is the same, and that it depends only on the partition π. We therefore define the P-cost, C_P(π), for a partition π by

C_P(π) = Σ_i Pr(B_i) · {Σ_{k∈I_i} t_k}.

In a way similar to the case of #-cost, the above definition can be extended to a nonrealizable partition π. To clarify the differing aspects of these three costs, they are summarized in the following proposition.

PROPOSITION 4.5.

1. For a partition π (∈ S_r), its #-cost C(π) is defined by C(π) = #(π) - 1. If π is realizable (∈ S_o), then all decision trees realizing π have the same #-cost, C(T(π)) = C(π) = #(π) - 1.

2. M-cost, C_M(T(π)), is defined for a realizable partition π (∈ S_o) and a decision tree T(π) realizing π, by C_M(T(π)) = Σ_i s_i, and it may vary for each tree.

3. For a partition π (∈ S_r), P-cost, C_P(π), is defined by C_P(π) = Σ_i Pr(B_i) · {Σ_{k∈I_i} t_k}. If π is realizable (∈ S_o), then all decision trees realizing π have the same cost C_P(T(π)) = C_P(π).
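Theorem 4.8 makes C_P computable directly from the blocks. The sketch below (Python) evaluates C_P(π) for the partition of Example 4.6; the probability and time values are illustrative assumptions:

```python
# Sketch of Theorem 4.8: P-cost from the blocks alone, without a tree.
t  = {1: 1.0, 2: 1.0, 3: 1.0}                  # illustrative times t_1..t_3
Pr = {('0','0','0'): 0.1, ('1','0','0'): 0.2,  # illustrative block probabilities
      ('-','1','0'): 0.3, ('-','-','1'): 0.4}

def p_cost(blocks):
    """Sum of Pr(B_i) times the t_k over the fixed coordinates I_i of B_i."""
    total = 0.0
    for block in blocks:
        I = [k + 1 for k, x in enumerate(block) if x != '-']
        total += Pr[block] * sum(t[k] for k in I)
    return total

print(p_cost(Pr))   # 0.1*3 + 0.2*3 + 0.3*2 + 0.4*1 = 1.9
```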
is 1 or (but not "-") in the n- tuple k i k B. = (x. , x. , .... x. ). l l ' l ' ' l 12 n The proof is emitted. 6o EXAMPLE k.6. To verify the theorem, consider the following partition, it, and the tree, T(rt), realizing rt. B 2 = ((-,1,0)} ((0,0,0)} {(1,0,0)} T(«) C (T(jt))can be calculated by its definition as follows, C p (T(n)) = Pr(B 1 ) • t 3 + Pr^ (t 3 + t 2 ) + Pr(B 3 ) (t^ + t 2 + t x ) + Pr(B^) ■ (t^ + tg + tj. On the other hand, the left hand side of the equality in the above theorem is calculated as follows. Indices, L , of B. for i = 1, 2, 3 and U are (1,2,3}, {1,2,3}, {2,3} and (3}, respectively. Then, the terms of the left hand side are Pr(B ) ■ (t + t + t ), Pr(B ) • (t- L + tg + t ), Pr(B ) • (t + t ) and Pr(B ]+ ) • t , and it is easy to see that this sum is equal to C (t(jt)) 61 We learned that the P-cost of all decision trees realizing tc is the same, and it depends only on the partition it. We define P-cost, C (jt), for a partition jt by C p (it) =2 Pr(B.)-(2 t }. i x i, k k In a way similar to the case of #-cost, the above definition can be extended to a nonrealizable partition, it. To clarify differing aspects of these three costs, they are summarized in the following proposition. PROPOSITION k.5- 1. For a partition, Tt( e S ), its #-cost, C(it) , is defined by C(jt) - #(«) - 1. If it is realizable (e S Q ), then all decision trees realizing it have the same #-cost, C(T(tt)) = C(n) = #(jt) - 1. 2. M-cost, C (T(rt)), is defined for a realizable partition, it(e Sq), and a decision tree, T(rt), realizing jt, by C.,(T(jt)) =2s., M i x and it may vary for each tree. 3. For a partition, jr(e S r ), P-cost, C (it), is defined by C p (*) = 2 Pr(B.) • (S t. }. i i k k If it is realizable, (e S ) , then all decision trees realizing 7t have the same cost C_,(T(jt)) = C (it). r r Now we can present the following proposition concerning the relationship between "cost inequality" and "partition inequality". 62 PROPOSITION k.6. 1. Assume jt , it e s and it < it . Then C(it 1 ) > C(it 2 ) and C p (it 1 ) > Cp(rtg). 2. Assume it , it G S^ and it < it . Let T and T be decision trees realizing it and it , respectively. Then, for some T ± and. T 2 , C M (T l (:r)) ^ C M (T 2 (jt)) holds, and. for other T and T , C M (T 1 (it)) < C M (T 2 (it)) holds. PROOF: 1. It is obvious from the definitions of #-cost and P-cost 2. We show the following example, it and it are both realizable and it < it holds. T (*) T (n ' 2 2- Since C M (T 1 (it 1 )) = 2 S;L + 2s 2 + s 3 and ^(T^)) = S;L + s 2 + 2^, the sign of C^^C^)) - C M (T 2 (it 2 )) = ( S;L + s 2 - b 3 ) may take positive or negative values (or zero). Q.E.D. Next we associate the above properties with procedures 63 constructing decision trees, i.e., Procedure R and its related algorithms. We recall that Procedure R generally constructs a decision tree realizing rt if it is realizable (esj. First we consider the set S of all realizable partitions. Propositions k.% and k.6. lead to the following theorem. THEOREM h.9. Assume jt is realizable (e S ) • Then, 1. Procedure R with the LCF policy always constructs an optimal decision tree for both #-cost and P-Cost. 2. However, Procedure R with the LCF policy may not construct an optimal decision tree for M-cost. PROOF: 1. If tt is realizable, then Procedure R with the LCF policy constructs a decision tree realizing rt and, according to Proposition h.5t its cost, C(t(jt)), (or C (t(tt)) is the same for any other decision trees. Proposition k.6. says that, for any jt'( C(tt), (or C -□(*') > C (tc)) holds. 
Therefore, the constructed tree is optimal. 2. Procedure R with the LCF policy always constructs a decision tree realizing a partition it if it is realizable. However, Proposition k.6, says that there exists a partition tt'(< it) and tree T'(Tt') such that C(T' (it')) < C (T(jt)). Therefore, T(jt) may not be optimal. Q.E.D. The above theorem says that, if a given partition, it, is realizable, Procedure R with the LCF policy works best for #-cost and p-cost but not for M-cost. For M-cost, there may exist an optimal decision tree realizing it' (jt'< jt) for a given partition it. 6k Although the following modification of the LCF policy does not guarantee optimality, it suggests, however, a simple and reasonable way of selecting a condition, C , at each step of Procedure R for M-cost. Modified Lossf ree-Condltion-First (MLCF) Policy At each step of Procedure R, 1) if there exists only one lossfree condition, C, then choose it or 2) if there are several such conditions, then choose C. whose ' 1 s . is a maximum. 1 If this is applied to a realizable partition, it, a decision tree, T(it), realizing it (not it' such that it' < it) is constructed. Of course, it is an optimal tree for #-cost and P-cost. Since M-cost, in general, decreases if a condition, C, with a larger s. is chosen at a higher level of the tree, the MLCF policy constructs a near optimal tree and it is the best among all decision trees realizing the realizable partition it. So far we have shown how Procedure R and its related LCF policy work for a realizable partition. Next we consider the set S of all rule partitions. Our object is to find a realizable partition, it', for a given nonrealizable partition, it(eS r )^ such that it' < it and C(it') (or C (it')) is minimized. The following theorem defines where such an optimal realizable partition it' exists in S for a given nonrealizable partition, it, in S . We need the following terminology. 65 DEFINITION k.$. A subset B of a partially ordered set C is said to be maximal in C if and only if, for all x e B and all y e C, either x > y holds or else x and y are incomparable. THEOREM 4.10. For a nonrealizable partition, it, assume an optimal decision tree realizes a partition, it', (it' < it f it'€ S and it e S ) for #-cost or P-cost. Then such it' must be in a maximal set of S. D S , where S it 7 it is the set of all partitions a such that a < it, i.e., S = [a € S | a < it}. PROOF : it' should be in S . If there exists a realizable partition it" such that it" > it', then Proposition k.6. says that C(it") < C(it') and C (it") < C (it'). This contradicts the fact that it' is a partition P - P realized by optimal decision tree. Q.E. D. According to the above theorem, an optimal solution is a max- imal element of S D S . Therefore, we can find an optimal solution for P-cost in this way: First we neglect all probabilities, Pr(v), and time^ t., required to process C. Enumerate all -elements in the maximal set of all realizable partitions which are less than or equal to a given parti- tion, it. Then, calculate P-cost of all these elements. An optimal solution is one of these elements whose P-cost is a minimum. REMARK A partition realized by an optimal decision tree for M-cost may not be in this maximal set. 66 This procedure which is based on Theorem 4.10 ., however, is impractical because the enumeration of all elements of S (IS is rather exhaustive. As an alternative, in the following section, we modify the iterated local minimization for #-cost (which was proposed in Section 4.2.) 
THEOREM 4.11. Assume Procedure R is applied to a partition $\pi_2$, and the resultant decision tree realizes a partition $\pi_1$ ($\pi_1 \le \pi_2$). Then

$$C_P(\pi_1) - C_P(\pi_2) = \sum_i \ell_P(C_{s_i}, \sigma_i),$$

where $C_{s_i}$ is the condition chosen at the $i$-th step of Procedure R and $\sigma_i$ is the partition being worked with at that step.

To verify this theorem, consider an example. [Figure: a partition $\pi_2$, the tree constructed from it by Procedure R, and the realized partition $\pi_1$; the blocks of $\pi_1$ arising from splits of blocks of $\pi_2$ are primed.] According to Theorem 4.8, $C_P(\pi_1)$ can be calculated term by term from the tree, and $C_P(\pi_2)$ directly from the definition of P-cost. Subtracting the two sums, all terms cancel except those contributed by the blocks which were split, and the difference $C_P(\pi_1) - C_P(\pi_2)$ reduces to a sum of products of the form $t_s \cdot \Pr(\cdot)$, one for each splitting. On the other hand, this is exactly the total loss $\sum_i \ell_P(C_{s_i}, \sigma_i)$ incurred by the conditions chosen in the course of Procedure R, which verifies the theorem.

5. A DECOMPOSITION THEORY OF DECISION TABLES AND DECISION TREES

5.1. Introduction

Each of the $k$ processors evaluates only the conditions of its own, smaller decision tree, and the intersection of the blocks identified by the individual processors determines a rule of the original partition. This scheme can be visualized by the sketch of Figure 5.1; the intersection operation can be realized by a simple "logical AND" function.

[FIGURE 5.1. The INPUT is fed to $k$ processors in parallel; their outputs are combined by the intersection (logical AND) into the OUTPUT.]

We note that each processor deals with a smaller decision tree, and hence the average processing time of this scheme might be shorter than that for the single processor case. In the succeeding sections of this chapter, a decomposition problem of a partition, which can be applied to the parallel processing of a decision table, will be formulated, and some objective functions for efficient decompositions will be introduced. After a theoretical analysis of this problem, discussions of the construction of a pair of decision trees for a given partition are developed. Based on a procedure, called Procedure D, a heuristic algorithm for efficient decompositions is also presented in the last section.
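As an illustration of the scheme of Figure 5.1, the following toy sketch lets two "processors" classify an input vertex independently, each using only its own conditions, and intersects the rule sets they return; that intersection is the logical AND of the figure. The partitions, the rule names, and the division of conditions here are invented for the illustration.

```python
# A toy sketch of the parallel scheme of Figure 5.1 with k = 2.  Each
# processor returns the set of rules compatible with its own smaller
# decision tree; intersecting the two answers (the "logical AND")
# identifies the rule of the original partition pi.

def processor_1(x):                  # examines condition C1 only
    return {"R1", "R2"} if x[0] == 0 else {"R3", "R4"}

def processor_2(x):                  # examines conditions C2 and C3 only
    if x[1] == 0:
        return {"R1", "R3"} if x[2] == 0 else {"R2", "R3"}
    return {"R2", "R4"}              # hypothetical rule sets

def identify(x):
    # In a real dual-processor scheme the two calls run concurrently.
    return processor_1(x) & processor_2(x)

for v in [(0, 0, 0), (1, 0, 1), (1, 1, 0)]:
    print(v, identify(v))            # each intersection is a single rule
```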
5.2. Decomposition Problem and Objective Functions

First we define the notion of a decomposition of a partition.

DEFINITION 5.1. A set of $n$-cube partitions $\pi_i$ ($i = 1, 2, \ldots, k$) is called a k-decomposition (or simply decomposition) of an $n$-cube partition $\pi$ if and only if

$$\prod_{i=1}^{k} \pi_i \le \pi.$$

This is a necessary and sufficient condition for our parallel processing scheme to work. Our decomposition problem, then, is how to find a decomposition $\{\pi_i\}$ for a given partition $\pi$ so that we may do parallel processing effectively. As a measure of efficiency for decompositions, the following two objective functions are introduced. They can be understood as extensions of the conventional cost of a decision tree discussed in the previous chapter.

OBJECTIVE FUNCTION 1. For a decomposition $\{\pi_i\}$ of $\pi$, objective function 1, $C_1(\pi_1, \pi_2, \ldots, \pi_k)$, is defined by

$$C_1(\pi_1, \pi_2, \ldots, \pi_k) = \#\Bigl(\prod_{i=1}^{k} \pi_i\Bigr),$$

where $\#(\pi)$ is the number of blocks of $\pi$.

OBJECTIVE FUNCTION 2. For a decomposition $\{\pi_i\}$ of $\pi$, objective function 2, $C_2(\pi_1, \pi_2, \ldots, \pi_k)$, is defined by

$$C_2(\pi_1, \pi_2, \ldots, \pi_k) = \sum_{i=1}^{k} \{\#(\pi_i) - 1\}.$$

In Example 5.3 both cost functions are illustrated. The first objective function $C_1$ is the number of blocks of the partition $\prod_{i=1}^{k} \pi_i$. This number is related to the distance from $\pi$ to $\prod_{i=1}^{k} \pi_i$ on the lattice of all $n$-cube partitions, since $\#(\prod_i \pi_i) - \#(\pi)$ is the number of blocks which are split. The second one is based on more practical considerations: it corresponds to the sum of the numbers of internal nodes of all the decision trees, or equivalently to the total memory space requirement, since $\#(\pi_i) - 1$ is proportional to the storage space required for the $i$-th processor. (Each internal node of a decision tree is supposed to require one unit of storage space.) There is another objective function, corresponding to the average processing time; its introduction would require the probabilities of occurrence of the rules, and it will not be considered in this thesis.

So far, we have given the definition of a decomposition and have introduced two objective functions. The method of decomposition will be restricted, as shown later. In order to explain the motivation for such a restriction, recall Example 5.1. The idea shown there immediately leads to the following statement.

PROPOSITION 5.1. Assume that a decision tree $T$ realizes an $n$-cube partition $\pi$. Then the set of subtrees $T_i$ ($i = 1, 2, \ldots, k$) of $T$ obtained by removing any set of $(k-1)$ edges of $T$ induces a k-decomposition, i.e.,

$$\prod_{i=1}^{k} \pi_i \le \pi,$$

where each $\pi_i$ ($i = 1, 2, \ldots, k$) is the $n$-cube partition realized by $T_i$.

To clarify this proposition, we use Example 5.1. To identify a particular rule $R$, it is sufficient to be given the answers (i.e., "Yes" or "No") of the internal nodes along the path from the root to $R$; in that example, the trees $T_1$ and $T_2$ together give such information. It is easy to see that the resultant set of trees $T_i$ obtained by the method generally provides more than such information. This means that if $k$ processors are available, then we may decompose the original tree $T$ into $k$ subtrees $T_1, T_2, \ldots, T_k$ so that our scheme works. In other words, by collecting the intermediate results from these processors and multiplying them, we can identify a rule of the original $\pi$. Using the above result of Proposition 5.1, one more property can be given as follows.

PROPOSITION 5.2. For a given realizable partition $\pi$, we can always achieve the inequality

$$\min_{\prod_i \pi_i \le \pi} C_2(\pi_1, \pi_2, \ldots, \pi_k) \le \#(\pi) - 1$$

for arbitrary $k$ ($k = 1, 2, \ldots$).

PROOF: Assume a decision tree $T$ realizes $\pi$. Applying the method of Proposition 5.1 to $T$ yields $k$ trees $T_i$ ($i = 1, 2, \ldots, k$) which realize partitions $\pi_i$ such that $\prod_{i=1}^{k} \pi_i \le \pi$. Then we can derive the following formulae:

$$C_2(\pi_1, \pi_2, \ldots, \pi_k) = \sum_{i=1}^{k} \{\#(\pi_i) - 1\} = \sum_{i=1}^{k} \{\text{the number of internal nodes of } T_i\} = \text{the number of internal nodes of } T = \#(\pi) - 1.$$

In other words, if $\pi$ is realizable, then there exists at least one k-decomposition of $\pi$ satisfying the expression in the statement. Q.E.D.

This proposition shows that the total memory requirement for our parallel processing scheme is at most the memory required for the case of a single processor.
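The two objective functions are straightforward to evaluate once partitions are represented explicitly. The sketch below encodes a partition as a list of blocks (frozensets of vertices) and forms the product $\prod_i \pi_i$ as the common refinement; the 2-cube decomposition used at the end is a made-up example, not one from the thesis.

```python
# A sketch of Definition 5.1 and the objective functions C1 and C2.
from itertools import product

def refine(p1, p2):
    """The product p1 . p2: all nonempty intersections of blocks."""
    return [b1 & b2 for b1, b2 in product(p1, p2) if b1 & b2]

def C1(parts):
    prod = parts[0]
    for p in parts[1:]:
        prod = refine(prod, p)
    return len(prod)                       # #(product partition)

def C2(parts):
    return sum(len(p) - 1 for p in parts)  # total internal nodes

# A 2-decomposition of a 2-cube partition (hypothetical):
pi1 = [frozenset({(0, 0), (0, 1)}), frozenset({(1, 0), (1, 1)})]  # by C1
pi2 = [frozenset({(0, 0), (1, 0)}), frozenset({(0, 1), (1, 1)})]  # by C2
print(C1([pi1, pi2]), C2([pi1, pi2]))      # -> 4 2
```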
A similar result can be obtained for the average processing time, but it will not be presented here. These facts show the advantages of parallel processing, i.e., a smaller storage requirement and a shorter processing time. The method of decomposition stated in Proposition 5.1 is a useful tool for theoretical analysis: we could, for example, give the very basic result in Proposition 5.2. It has, however, a practical deficiency, as the following example shows.

EXAMPLE 5.2. [Figure: a decision tree $T$ and the subtrees $T_1$, $T_2$, $T_3$ obtained from it.] We assume three processors are available. Removing the edges of $T$ between $C_1$ and both $C_2$'s results in three isolated subtrees $T_1$, $T_2$ and $T_3$. Since $T_2$ and $T_3$ are the same, however, we actually need only $T_1$ and $T_2$ (or $T_3$); in other words, two processors are enough for this decomposition. It is concluded that decomposing a decision tree by removing its edges is impractical.

In the above example, $T_2$ is completely the same as $T_3$, and the set of $T_1$, $T_2$ and $T_3$ can be considered a redundant decomposition. Similar redundancy can be found, to some extent, in Example 5.1 also: identical conditions appear in more than one of the subtrees. Generally, we may need, for a better decomposition, this kind of small redundancy, where the same conditions appear in different subtrees. However, for the establishment of our simple decomposition theory, we exclude such redundancy, i.e., we put the following restriction on our decomposition problems.

RESTRICTION. A condition $C_i$ is processed by one and only one processor, i.e., a condition $C_i$ appears in only one of the subtrees.

For simplicity of the theoretical analysis of decomposition problems, we consider only the case $k = 2$, that is, the dual processor case. However, the results in the remainder of this thesis can be easily extended to the general case. We introduce the following terminology.

DEFINITION 5.2. If a pair of decision trees satisfies the Restriction, they are called an orthogonal pair of decision trees.

DEFINITION 5.3. Let the $C_i$'s be conditions corresponding to coordinates of an $n$-cube partition $\pi$. Then an orthogonal decomposition of $\pi$ is a pair of $n$-cube partitions $\pi_1$ and $\pi_2$ such that 1) $\pi_1 \cdot \pi_2 \le \pi$ and 2) $S_1 \cap S_2 = \emptyset$, where $S_1$ and $S_2$ are the sets of essential conditions of $\pi_1$ and $\pi_2$, respectively.

For an orthogonal pair of decision trees $T_1$ and $T_2$, $T_1 * T_2$ denotes the tree obtained by replacing every terminal node of $T_1$ by a copy of $T_2$; correspondingly, for a block $\alpha_i$ of the partition $\alpha$ realized by $T_1$, $\alpha_i * \beta$ denotes the refinement of $\alpha_i$ by the partition $\beta$ realized by $T_2$. Theorem 5.1 states that $T_1 * T_2$ then realizes the multiplication $\alpha \cdot \beta$. Because the coordinate sets involved may differ, the refinement $\alpha_i * \beta$ is defined through the degenerated and extended forms of $\beta$, as the following illustration shows. Suppose $\alpha$ is a 3-cube partition whose coordinate set is $U_\alpha = \{C_1, C_2, C_3\}$, and $\beta$ is a 4-cube partition whose essential coordinate set $S_\beta$ consists of two members of $U_\alpha$. Then $\beta$ is essentially a 2-cube partition, and its degenerated form $\beta^*$ is shown in Figure 5.5. Since $S_{\beta^*} \subseteq U_\alpha$ holds, $\beta^*$ can be extended to the 3-cube partition $\tau$ of Figure 5.6 by adding the coordinate of $U_\alpha - S_{\beta^*}$. Finally, $\alpha * \beta = \alpha \cdot \tau$ is shown in Figure 5.7.

[Figures 5.5, 5.6 and 5.7 show $\beta^*$, its extension $\tau$, and the product $\alpha \cdot \tau$, respectively.]

NOTE. In most cases in this thesis, $\alpha_i$ is simply a cube itself (as seen in this example) rather than a general cube partition; in other words, it is a partition consisting of only one block.

Now we give the proof of Theorem 5.1.

PROOF: By replacing the $i$-th terminal node of $T_1$ by a subtree $T_2$, its corresponding block $\alpha_i$ of $\alpha$ is refined by $\beta$. ($\alpha_i * \beta$ is well-defined because of the orthogonality of $T_1$ and $T_2$.) Performing this replacement for all $i$, every block $\alpha_i$ of $\alpha$ is refined into $\alpha_i * \beta$ by $\beta$. This implies that the terminal nodes of $T_1 * T_2$ correspond to the blocks of the $\alpha_i * \beta$ in a one-to-one manner. Since a block of $\alpha_i * \beta$ is a block of $\alpha \cdot \beta$, $T_1 * T_2$ realizes $\alpha \cdot \beta$. Q.E.D.
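The composition $T_1 * T_2$ of the proof can be sketched directly in the tree encoding used earlier: grafting a copy of $T_2$ onto every terminal node of $T_1$ produces one terminal node per block of $\alpha \cdot \beta$. The trees below are assumed examples, not taken from the thesis.

```python
# A minimal sketch of T1 * T2: every terminal node of T1 is replaced by
# a copy of T2.  Trees are ("leaf", label) or (c, low, high), where c
# names the condition C_c tested at the node.

def graft(t1, t2):
    """Return T1 * T2."""
    if t1[0] == "leaf":
        return t2                      # trees are immutable tuples here,
                                       # so sharing t2 acts as a copy
    c, low, high = t1
    return (c, graft(low, t2), graft(high, t2))

# An orthogonal pair: T1 tests C1 only, T2 tests C2 only.
T1 = (1, ("leaf", "a1"), ("leaf", "a2"))
T2 = (2, ("leaf", "b1"), ("leaf", "b2"))
print(graft(T1, T2))   # four terminal nodes, one per block of alpha.beta
```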
Since there is no essential difference between the roles of $\alpha$ and $\beta$ (or of $T_1$ and $T_2$), $T_1 * T_2$ and $T_2 * T_1$ realize the same partition $\alpha \cdot \beta$.

COROLLARY 5.2. Let a pair of decision trees $T_1$ and $T_2$ be orthogonal and realize the partitions $\alpha$ and $\beta$, respectively. Then $T_1 * T_2$ and $T_2 * T_1$ realize the multiplication $\alpha \cdot \beta$.

Next we derive a necessary and sufficient condition that an orthogonal pair of decision trees realize a decomposition of a given partition $\pi$. Its fundamental result will be seen in Lemma 5.1; Theorem 5.3 is the final and complete statement of the necessary and sufficient condition. Based on Theorem 5.3, the synthesis problem, i.e., constructing an orthogonal decomposition for a given partition $\pi$, will be discussed in the succeeding sections.

Assume the following orthogonal pair of decision trees $T_1$ and $T_2$, and let $\alpha$ and $\beta$ be the partitions which are realized by $T_1$ and $T_2$, respectively. [Figure.]

In order to analyze the relationship between $\alpha \cdot \beta$ and $\pi$, recall Procedure R of Chapter 4. Suppose that we are constructing a decision tree for a given $\pi$; $T_1$ is considered as a growing (intermediate) decision tree under construction by Procedure R (i.e., $T_1$ is partially constructed by some steps of Procedure R), and then we associate a partition $\pi_i$ with the $i$-th terminal node of $T_1$. $\pi_i$ is the partition which is to be realized by a forthcoming possible subtree rooted at that terminal node of $T_1$. (We note that the $\pi_i$'s are different from the $\alpha_i$'s; $\alpha_i$ is simply a subcube, a block of $\alpha$.) Then the following Lemma 5.1 is obtained.

LEMMA 5.1. Assume that $T_1$ and $T_2$ are orthogonal and realize $\alpha$ and $\beta$, respectively. Then a necessary and sufficient condition for the pair $(\alpha, \beta)$ to be an orthogonal decomposition of $\pi$ is

$$\alpha_i * \beta \le \pi_i \quad \text{for every } i.$$

PROOF: First we prove the necessity by contradiction. Assume that $\alpha_i * \beta \le \pi_i$ does not hold for a particular $i$. Then there exists a pair of 0-cubes $a$ and $b$ such that 1) $a$ and $b$ are in one block of $\alpha_i * \beta$, and 2) $a$ is in a block of $\pi_i$ but $b$ is in another block of $\pi_i$. Point 2) means that $a$ and $b$ are in different blocks of $\pi$, since two different blocks of $\pi_i$ are in different blocks of $\pi$. Then we are led to the fact that $a$ and $b$ are in one block of $\alpha_i * \beta$, i.e., in one block of $\alpha \cdot \beta$, but not in one block of $\pi$. This shows that $\alpha \cdot \beta \le \pi$ does not hold. Next the sufficiency is proved. If $\alpha_i * \beta \le \pi_i$ for every $i$, then a block of $\alpha_i * \beta$, which is a block of $\alpha \cdot \beta$ for some $i$, is in a block of $\pi_i$; a block of $\pi_i$, however, is in a block of $\pi$. Therefore $\alpha \cdot \beta \le \pi$ holds, that is, $\alpha$ and $\beta$ are an orthogonal decomposition of $\pi$. Q.E.D.

EXAMPLE 5.6. To verify the above statement, we use the example from case (b) of Example 5.3, and also refer to Example 5.4. If we assume that $T_1$ is a decision tree partially constructed by Procedure R, we can associate the partitions $\pi_1$, $\pi_2$ and $\pi_3$ with the respective terminal nodes of $T_1$, as shown in Figure 5.8.

[FIGURE 5.8. The partitions $\pi_1$, $\pi_2$, $\pi_3$ associated with the terminal nodes of $T_1$.]

As we have shown in Example 5.4, each $\alpha_i * \beta$ can be computed; it is obvious that the inequality $\alpha_i * \beta \le \pi_i$ holds for every $i$ ($\alpha_1 * \beta \le \pi_1$, $\alpha_2 * \beta \le \pi_2$, $\alpha_3 * \beta \le \pi_3$), and we can conclude that this $(\alpha, \beta)$ is an orthogonal decomposition of $\pi$. If we modify the tree $T_2$ into $T_2'$ as shown in Figure 5.9, then the $\alpha_i * \beta'$ are obtained in a similar way.

[FIGURE 5.9. $T_2'(\beta')$ and the refinements $\alpha_1 * \beta' \le \pi_1$, $\alpha_2 * \beta' \le \pi_2$, $\alpha_3 * \beta' \le \pi_3$.]

All $\alpha_i * \beta'$ are less than or equal to their corresponding $\pi_i$'s. Therefore, $(\alpha, \beta')$ is another decomposition of $\pi$. $\alpha \cdot \beta'$ is shown in Figure 5.10.

[FIGURE 5.10. The product $\alpha \cdot \beta'$.]

Next we present some properties which allow Theorem 5.3 to be stated more neatly.
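The inequality of Lemma 5.1 (in the simplified form $\beta \le \pi_i$ that Theorem 5.3 will justify) is a refinement test, sketched below with blocks again represented as frozensets of vertices; the small partitions at the end are assumed examples.

```python
# A sketch of the test behind Lemma 5.1 and Theorem 5.3: "p <= q" means
# that every block of p lies inside some block of q.

def refines(p, q):
    """True iff partition p <= partition q."""
    return all(any(b <= c for c in q) for b in p)

def is_orthogonal_decomposition(beta, pis):
    """Theorem 5.3, part 2: beta <= pi_i for every terminal node i."""
    return all(refines(beta, pi) for pi in pis)

beta = [frozenset({0, 1}), frozenset({2, 3})]
pi   = [frozenset({0, 1}), frozenset({2}), frozenset({3})]
print(refines(beta, pi))   # False: the block {2, 3} splits blocks of pi
```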
PROPERTY 5.1.
1) $\alpha_i$ and $\pi_i$ always have the same coordinate sets, i.e., $U_{\alpha_i} = U_{\pi_i}$ for every $i$.
2) $\alpha_i * \beta$ is always well-defined.

PROOF: 1) is obvious, and its proof is omitted. In order to prove 2), it is sufficient to show that $S_\beta \subseteq U_{\alpha_i}$ holds, where $S_\beta$ is the set of essential conditions of $\beta$. Let $S_\alpha$ be the set of essential conditions of $\alpha$, and let $V_i$ be the set of those conditions which appear along the path from the root to the $i$-th terminal node of $T_1$. Then $V_i \subseteq S_\alpha$ holds. Therefore the coordinate set $U_{\alpha_i}$ of $\alpha_i$ satisfies

$$U_{\alpha_i} = \{\text{all conditions}\} - V_i = (S_\alpha \cup S_\beta) - V_i \supseteq S_\beta,$$

since $V_i \subseteq S_\alpha$. In other words, a refinement of $\alpha_i$ by $\beta$ can be defined. Q.E.D.

PROPERTY 5.2. Suppose that $T_1$ and $T_2$ realize an orthogonal decomposition $(\alpha, \beta)$ of $\pi$. If the $i$-th terminal node of $T_1$ is located at the $\ell$-th level of $T_1$ (i.e., $\ell$ internal nodes exist from the root to this node) and we let $V_i$ be the set of those conditions which correspond to these $\ell$ internal nodes, then the conditions in $S_\alpha - V_i$ are inessential to $\pi_i$. In other words, if the conditions $C_p, C_q, \ldots, C_r$ appear along the path from the root to the $i$-th terminal node of $T_1$, then the conditions of $S_\alpha - \{C_p, C_q, \ldots, C_r\}$ are inessential to $\pi_i$.

LEMMA 5.2. Make the same assumption as in Property 5.2, and let $S_{\pi_i}$ be the set of essential conditions of $\pi_i$. Then all the $S_{\pi_i}$ are the same set for every $i$, and are equal to $S_\beta$.

PROOF: As we have seen in Property 5.1, the coordinate set $U_{\pi_i}$ of $\pi_i$ is $U_{\pi_i} = S_\beta \cup S_\alpha - V_i$. Property 5.2, however, says that the conditions in $S_\alpha - V_i$ are inessential to $\pi_i$. Therefore the essential coordinate set $S_{\pi_i}$ of $\pi_i$ is

$$S_\beta \cup S_\alpha - V_i - (S_\alpha - V_i) = S_\beta. \qquad \text{Q.E.D.}$$

This Lemma 5.2 indicates that if $T_1$ and $T_2$ realize an orthogonal decomposition $(\alpha, \beta)$ of $\pi$, then all the partitions $\pi_i$, together with $\beta$, are essentially $\#(S_\beta)$-cube partitions and have the same set of essential conditions. If we define multiplication and inequality among partitions of different sizes which have the same set of essential conditions (as a natural extension of conventional multiplication and inequality between two partitions of the same size), the condition $\alpha_i * \beta \le \pi_i$ in Lemma 5.1 can be rewritten simply as $\beta \le \pi_i$. This is obvious, since $\alpha_i$ is a subcube and acts only to adjust the difference between the coordinate sets of $\beta$ and $\pi_i$; there is no difference between $S_\beta$ and $S_{\pi_i}$, so we can compare $\beta$ and $\pi_i$ in terms of this newly defined inequality.

THEOREM 5.3. Assume that $T_1$ and $T_2$ are orthogonal and realize $\alpha$ and $\beta$. Then,
1) for this pair $(\alpha, \beta)$ to be an orthogonal decomposition of $\pi$, $S_{\pi_i} = S_\beta$ must hold for all $i$, and
2) a necessary and sufficient condition for this $(\alpha, \beta)$ to be an orthogonal decomposition of $\pi$ is that $\beta \le \pi_i$ holds for all $i$.

PROOF: 1) is the same as Lemma 5.2. 2) is derived from Lemma 5.1 and Lemma 5.2. Q.E.D.

This theorem is the main result of this section. Based on this analysis, in the following section we develop the synthesis problem of decompositions.
5.4. Synthesis of Orthogonal Decompositions

In this section a procedure, called Procedure D, to construct orthogonal decompositions of $\pi$ is shown. Theorem 5.3 plays a key role in that procedure: once $T_1$ is given and the $\pi_i$ are determined, we check whether the essential condition sets $S_{\pi_i}$ are all the same or not. If they are identical sets, then $\beta$, the counterpart of $\alpha$, can be determined by

$$\beta = \prod_{\text{all } i} \pi_i.$$

We always use the equality $\beta = \prod_{\text{all } i} \pi_i$ instead of the inequality $\beta \le \prod_{\text{all } i} \pi_i$, since the maximum element over all $\beta'$ such that $\beta' \le \prod_i \pi_i$ is $\prod_i \pi_i$ itself, and this $\beta$ is always at least as good as any $\beta'$ ($\beta' < \beta$) for the objective functions previously defined. That is:

PROPOSITION 5.5. If $(\alpha, \beta)$ is an orthogonal decomposition of $\pi$, then so is $(\alpha, \beta')$ for any $\beta'$ such that $\beta' \le \beta$. Furthermore, the following two inequalities concerning the objective functions hold: $C_1(\alpha, \beta) \le C_1(\alpha, \beta')$ and $C_2(\alpha, \beta) \le C_2(\alpha, \beta')$.

The proof is omitted since it is obvious. Now consider an example to illustrate Procedure D; a rigorous description of the procedure will be given later.

EXAMPLE 5.7. Assume the 4-cube partition $\pi$ shown in Figure 5.11. If $T_1$ is the decision tree with the root $C_3$ only, the counterpart $\beta$ is determined by $\beta = \pi_1 \cdot \pi_2$, and it is also shown. $\alpha \cdot \beta$ is also shown in Figure 5.11, and it is verified that $\alpha \cdot \beta \le \pi$ holds.

[FIGURE 5.11. The partition $\pi$, the trees $T_1(\alpha)$ and $T_2(\beta)$, and the product $\alpha \cdot \beta$.]

Now try to expand the tree $T_1$ into the $T_1'$ of Figure 5.12(a) by replacing the two terminal nodes of $T_1$ by two $C_4$'s. All four $\pi_j'$'s are also shown, and their essential condition sets $S_{\pi_j'}$ are all the same, i.e., $S_{\pi_j'} = \{C_1, C_2\}$. The counterpart $\beta'$ of $\alpha'$ is determined by $\beta' = \prod_{j=1}^{4} \pi_j'$ and is also given in Figure 5.12(a); we can construct the decision tree $T_2'$ realizing $\beta'$.

[FIGURE 5.12(a). $T_1'(\alpha')$, the partitions $\pi_1', \ldots, \pi_4'$, and $T_2'(\beta')$ with $\beta' = \prod_{j=1}^{4} \pi_j'$.]

Then we note that, instead of using $\beta' = \prod_{j=1}^{4} \pi_j'$ for determining $\beta'$, it can alternatively be derived by $\beta' = \beta_1 \cdot \beta_2$, where $\beta_1$ and $\beta_2$ are the two partitions obtained by removing the coordinate $C_4$ from $\beta$. This process is shown in Figure 5.12(b).

[FIGURE 5.12(b). $\beta' = \beta_1 \cdot \beta_2$.]

So far we have obtained two orthogonal decompositions, $(\alpha, \beta)$ and $(\alpha', \beta')$, of $\pi$. $T_1'$ is obtained by expanding the tree $T_1$ in such a way that the two terminal nodes of $T_1$ are replaced by a condition $C_4$; this fact implies $\alpha' < \alpha$. Moreover, the following two points should be noted regarding this step: 1) the two terminal nodes of $T_1$ are replaced by the same condition $C_4$, not by different conditions (say, $C_1$ for one node and $C_4$ for the other), and 2) $\beta'$ can be determined by $\beta' = \beta_1 \cdot \beta_2$ as an alternative to $\beta' = \prod_j \pi_j'$.

We know that the pair $(I, \pi)$ is a trivial orthogonal decomposition of $\pi$. So this means we have obtained the following sequence of orthogonal decompositions of $\pi$: $(I, \pi) \to (\alpha, \beta) \to (\alpha', \beta')$. If we consider the process from $(I, \pi)$ to $(\alpha, \beta)$ as choosing the coordinate $C_3$ from $\pi$ and transplanting it into the null tree (realizing the $I$-partition), it forms $T_1$; and the corresponding $\beta$ can be determined by $\beta = \pi_1 \cdot \pi_2$, where $\pi_1$ and $\pi_2$ are the partitions obtained by removing the $C_3$ coordinate from $\pi$. In a similar way, the second step from $(\alpha, \beta)$ to $(\alpha', \beta')$ can be considered as selecting the condition $C_4$ from $\beta$ and transplanting it at the two terminal nodes of $T_1$; its counterpart $\beta'$ is then determined by $\beta' = \beta_1 \cdot \beta_2$, where $\beta_1$ and $\beta_2$ are the two partitions obtained by removing the $C_4$ coordinate from $\beta$. Continuing this process $n$ times, we obtain the following sequence of decompositions of $\pi$:

$$(I, \pi) \to (\alpha, \beta) \to (\alpha', \beta') \to (\alpha'', \beta'') \to \cdots \to (\alpha^{(n)}, I),$$

the steps being taken by $C^{(1)}$, $C^{(2)}$, and so on, where $I \ge \alpha \ge \alpha' \ge \alpha'' \ge \cdots$.

Then a question arises concerning point 1): why should we choose the same condition $C_4$ to be transplanted at the two terminal nodes of $T_1$?
To answer this question, choose $C_1$ for the left and $C_4$ for the right terminal node of $T_1$, and let this expanded tree be denoted by the $T''$ of Figure 5.13.

[FIGURE 5.13. $T_1''(\alpha'')$ and the partitions $\pi_1'', \ldots, \pi_4''$ at its terminal nodes.]

Immediately we see the contradiction to the statement of the necessary condition in Theorem 5.3. In practice, $\pi_1''$ and $\pi_2''$ have the essential conditions $C_2$ and $C_4$; on the other hand, $\pi_3''$ and $\pi_4''$ have $C_1$ and $C_2$ as their essential conditions. Since $S_{\alpha''} \cup S_{\beta''} = \{$all conditions $C_1$ through $C_4\}$ and $S_{\alpha''} = \{C_1, C_3, C_4\}$, $S_{\beta''} = \{C_2\}$ would have to hold. Therefore we can conclude that this $\alpha''$ realized by $T''$ cannot have a counterpart $\beta''$ such that the pair is an orthogonal decomposition of $\pi$.

As we have shown in the above example, generally only one condition can be chosen and transplanted at all the terminal nodes of $T_1$ in order to form the next tree $T_1'$. (This is also generally true for any step from $(\alpha^{(i)}, \beta^{(i)})$ to $(\alpha^{(i+1)}, \beta^{(i+1)})$.)

There is another aspect of this situation, however. It seems to be just a special case, but it is actually an important one, and it is explained in the following. Assume the step from $(\alpha, \beta)$, and suppose we choose the condition $C_1$ to be transplanted at the two terminal nodes of $T_1$; its corresponding tree $T_1^*$ is as in Figure 5.14(a).

[FIGURE 5.14(a). $T_1^*(\alpha^*)$ with the partitions $\pi_1^*, \ldots, \pi_4^*$ and $\beta^* = \prod_{j=1}^{4} \pi_j^*$.]

Then the counterpart $\beta^*$ can be determined by $\prod_{j=1}^{4} \pi_j^*$. We note, however, that $\pi_1$ was essentially a 2-cube partition with $S_{\pi_1} = \{C_2, C_4\}$, so that $C_1$ is not essential to it. Also, $\pi_1^* = \pi_2^*$ holds, where $\pi_1^*$ and $\pi_2^*$ are obtained by removing the coordinate $C_1$ from $\pi_1$. These facts suggest taking the left descendant node $C_1$ out of $T_1^*$ and associating $\pi_1^*$ with the original left terminal node of $T_1$, as in Figure 5.14(b).

[FIGURE 5.14(b). The modified $T_1^*$.]

We have now learned that the existence of a condition $C_1$ inessential to $\pi_1$ causes some modification to our procedure: if this $C_1$ is chosen, $\pi_1$ degenerates to the "one-size smaller" partition $\pi_1^*$ ($= \pi_2^*$) without this $C_1$ being transplanted at the corresponding terminal node of $T_1$.
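Whether a chosen condition is essential to a given $\pi_j$ can be tested mechanically. The sketch below (using the same frozenset encoding and a hypothetical example) states one such test, namely that $C_c$ is inessential exactly when every block is closed under complementing the $c$-th coordinate; this is offered as an interpretation of essentiality, not as the thesis's own formulation.

```python
# A sketch of the essentiality test used in steps b-1) / b-2) below:
# C_c is inessential to pi when no block of pi separates x_c = 0 from
# x_c = 1, i.e., every block is closed under flipping coordinate c.

def is_essential(c, pi):
    def flip(v):                      # complement the c-th coordinate
        return v[:c] + (1 - v[c],) + v[c + 1:]
    return not all(flip(v) in block for block in pi for v in block)

# pi separates x_1 = 0 from x_1 = 1, so C_1 is essential and C_2 is not
# (coordinates are 0-based here: index 0 stands for C_1).
pi = [frozenset({(0, 0), (0, 1)}), frozenset({(1, 0), (1, 1)})]
print(is_essential(0, pi), is_essential(1, pi))   # True False
```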
by b-2) form the new set of t^ 1+ for all terminal nodes of the ex- panded tree T (i+D 1 d) Split the (n-i)-cube partition P into two (n-i-l)-cube partitions p and £>^ by removing the C - coordinate from ^ X \ 105 Let P (i+1) =f4 l} • $f>. 3) Repeat the above process 2) for i = 0,l,2,,,n-l. EXAMPLE 5.8. Readers are encouraged to verify the above by Example 5-7. There we had chosen C as C at i = and a (T , T ) pair was con- structed which realized an orthogonal decomposition, {a, P), of n. At the next step, i = 1, if we chose C = C = CV , then, the pair (T' , T') realizing (a', P') would be obtained. On the other hand, if we (2) chose C = C alternatively, Procedure D would yield the pair (T*, T*) realizing (c#, ft*) . In the next theorem, we will show that Procedure D guarantees the generation of a series of n orthogonal decompositions of n. THEOREM 5.k. The Procedure D, described above generates a sequence of n orthogonal decompositions of n, (a^\ pM) .(a (2) , p (2) )^(a (5) , p^)) .. . . (a^\ l) ( i+1 ) „ (i) where a < a . PROOF : The proof is by induction. For i = 0, it is true that a ~ I and p = 7t are an orthogonal decomposition of it. We assume, as the induction hypothesis, that (a , p ) is an orthogonal decomposition of it. In other words, P (i) < n «(*) s{^ n sj^ = and S n (1+l) U S^ i+l) -Jjl 2 r 1 2 {all C.) hold for the i-th step. Then we show that p' 1 ^ < 1J xc/ 1+1 ', i = J J io6 S (i+1) n S^ i+1) = and s| i+l) U S^ i+l) = {all C.} hold for the (i+l)-th step. Since p < rt. holds for every j, the selection C causes J p(i) < Tr ( 1+1 ) and p( 1+1 ^ < n ( : 1+1 ' > for j such that c' 1 ^ is essential s to / 1 ' ) (by the processes 2)- b-2) and 2)-d)). For j such that (T ' J ^ . -, ^ (i) o(i) (i) j. j. n j.1 j. o(i) ^ (i+l) is not essential to jt . , (3 < jt. unmediately means that P < it . and p^ < iri i+1 ' are true by 2)- b-2) and 2)-d). Therefore, for all j, (i + i) . (i) . p (i) < (i+i) holds . 1 2 = j By the way of construction, P . is the maximum element over all partitions P' such that P' < H jr.. Therefore, we obtain P = H J. x+1 \ It is obvious that S^ n sj^ = and S^ U S^ = {all C.) lj 1 2 r 1 2 l hold for all i = 0,1,,, n-1. Then, any decomposition (a , P " ' ) of jt is orthogonal. One more property, a^ < a , is also easily shown. T realizing a is a subtree of that T.j which realizes a . Therefore, cr 1+ ' < cr 1 . Q.E.D. Before ending this section, another theorem is presented which states the relationship among those decompositions generated by Procedure D from the viewpoint of objective functions. 107 THEOREM 5-5' In the step from (cr 1 ^, ^^) to (cr 1+1 ', ^ 1+1 in Procedure D (i = 0,1*, * n-l), 1) if a selected condition C is essential to all n . , J then (i+1) a (i+l) ^ (i) Q (i) holds. However, 2) if C is not essential to some Jt. , then it may j.1 j. (i+l) D (i+l) -, (i) (i) possibly occur that a, • (3 and a • P are j j., -l. « e (i+l) ^(i+l)\ j// (i+l) not comparable and that C [a , p ) = #{ct p (i+l) ) < G^ 1 *, p (i) ) =#(a (i) • p (i) ) is true. That is, {(or • P )} for 1*0,1,2,,, n-1 is not always monotonic in terms of inequality " < ", Therefore, {C^c/ 1 ', p' 1 ')} is not a monotonic function of i, either, although {a } is monotonic. PROOF : We prove l) first. If the assumption in l) holds, every terminal node of T is replaced by C as follows. 108 T, (1 W«) T (i + D (a (i + D ) (i+l) According to Theorem 5.1. in Section 50«* Q: P is realized by ,(1) T (D '1 * L 2 i+l) ,. 
EXAMPLE 5.8. Readers are encouraged to verify the above by Example 5.7. There we had chosen $C_3$ as $C^{(1)}$ at $i = 0$, and a pair $(T_1, T_2)$ was constructed which realized an orthogonal decomposition $(\alpha, \beta)$ of $\pi$. At the next step, $i = 1$, if we choose $C^{(2)} = C_4$, then the pair $(T_1', T_2')$ realizing $(\alpha', \beta')$ is obtained. On the other hand, if we choose $C^{(2)} = C_1$ alternatively, Procedure D yields the pair $(T_1^*, T_2^*)$ realizing $(\alpha^*, \beta^*)$.

In the next theorem we show that Procedure D guarantees the generation of a series of $n$ orthogonal decompositions of $\pi$.

THEOREM 5.4. Procedure D, described above, generates a sequence of $n$ orthogonal decompositions of $\pi$,

$$(\alpha^{(1)}, \beta^{(1)}) \to (\alpha^{(2)}, \beta^{(2)}) \to (\alpha^{(3)}, \beta^{(3)}) \to \cdots \to (\alpha^{(n)}, I),$$

where $\alpha^{(i+1)} \le \alpha^{(i)}$.

PROOF: The proof is by induction. For $i = 0$ it is true that $\alpha^{(0)} = I$ and $\beta^{(0)} = \pi$ are an orthogonal decomposition of $\pi$. We assume, as the induction hypothesis, that $(\alpha^{(i)}, \beta^{(i)})$ is an orthogonal decomposition of $\pi$; in other words, that $\beta^{(i)} \le \pi_j^{(i)}$ for every $j$, $S_1^{(i)} \cap S_2^{(i)} = \emptyset$ and $S_1^{(i)} \cup S_2^{(i)} = \{$all $C_i\}$ hold at the $i$-th step. Then we show that the corresponding statements hold at the $(i+1)$-th step. Since $\beta^{(i)} \le \pi_j^{(i)}$ holds for every $j$, the selection of $C^{(i+1)}$ gives $\beta_1^{(i)} \le \pi_{j'}^{(i+1)}$ and $\beta_2^{(i)} \le \pi_{j''}^{(i+1)}$ for each $j$ such that $C^{(i+1)}$ is essential to $\pi_j^{(i)}$ (by the processes 2)-b-1) and 2)-d)). For $j$ such that $C^{(i+1)}$ is not essential to $\pi_j^{(i)}$, $\beta^{(i)} \le \pi_j^{(i)}$ immediately means that $\beta_1^{(i)} \le \pi_j^{(i+1)}$ and $\beta_2^{(i)} \le \pi_j^{(i+1)}$ hold, by 2)-b-2) and 2)-d). Therefore, for all $j$,

$$\beta^{(i+1)} = \beta_1^{(i)} \cdot \beta_2^{(i)} \le \pi_j^{(i+1)}$$

holds. By the construction, $\beta^{(i+1)}$ is the maximum element over all partitions $\beta'$ such that $\beta' \le \prod_j \pi_j^{(i+1)}$; therefore we obtain $\beta^{(i+1)} = \prod_j \pi_j^{(i+1)}$. It is obvious that $S_1^{(i)} \cap S_2^{(i)} = \emptyset$ and $S_1^{(i)} \cup S_2^{(i)} = \{$all $C_i\}$ hold for all $i = 0, 1, \ldots, n-1$. Then every decomposition $(\alpha^{(i)}, \beta^{(i)})$ of $\pi$ is orthogonal. One more property, $\alpha^{(i+1)} \le \alpha^{(i)}$, is also easily shown: $T_1^{(i)}$, realizing $\alpha^{(i)}$, is a subtree of the $T_1^{(i+1)}$ which realizes $\alpha^{(i+1)}$; therefore $\alpha^{(i+1)} \le \alpha^{(i)}$. Q.E.D.

Before ending this section, another theorem is presented, which states, from the viewpoint of the objective functions, the relationship among the decompositions generated by Procedure D.

THEOREM 5.5. In the step from $(\alpha^{(i)}, \beta^{(i)})$ to $(\alpha^{(i+1)}, \beta^{(i+1)})$ in Procedure D ($i = 0, 1, \ldots, n-1$),
1) if the selected condition $C^{(i+1)}$ is essential to all $\pi_j^{(i)}$, then

$$\alpha^{(i+1)} \cdot \beta^{(i+1)} \le \alpha^{(i)} \cdot \beta^{(i)}$$

holds. However,
2) if $C^{(i+1)}$ is not essential to some $\pi_j^{(i)}$, then it may possibly occur that $\alpha^{(i+1)} \cdot \beta^{(i+1)}$ and $\alpha^{(i)} \cdot \beta^{(i)}$ are not comparable and that $C_1(\alpha^{(i+1)}, \beta^{(i+1)}) = \#(\alpha^{(i+1)} \cdot \beta^{(i+1)}) < C_1(\alpha^{(i)}, \beta^{(i)}) = \#(\alpha^{(i)} \cdot \beta^{(i)})$ is true.

That is, $\{\alpha^{(i)} \cdot \beta^{(i)}\}$ for $i = 0, 1, 2, \ldots, n-1$ is not always monotonic in terms of the inequality $\le$; therefore $\{C_1(\alpha^{(i)}, \beta^{(i)})\}$ is not a monotonic function of $i$ either, although $\{\alpha^{(i)}\}$ is monotonic.

PROOF: We prove 1) first. If the assumption in 1) holds, every terminal node of $T_1^{(i)}$ is replaced by $C^{(i+1)}$. According to Theorem 5.1 in Section 5.3, $\alpha^{(i)} \cdot \beta^{(i)}$ is realized by $T_1^{(i)} * T_2^{(i)}$, and likewise $\alpha^{(i+1)} \cdot \beta^{(i+1)}$ is realized by $T_1^{(i+1)} * T_2^{(i+1)}$. Then we compare these two trees. Since $T_1^{(i)}$ is a subtree of $T_1^{(i+1)}$, attention is focussed on the difference between the subtree $T_2^{(i)}$ of $T_1^{(i)} * T_2^{(i)}$ and the subtree of $T_1^{(i+1)} * T_2^{(i+1)}$ shown in Figure 5.16, namely $C^{(i+1)}$ with a copy of $T_2^{(i+1)}$ on each of its two branches.

[FIGURE 5.16.]

It is then easy to see that the partition $\beta^{(i)}$ realized by $T_2^{(i)}$ is larger than or equal to the partition realized by this subtree, because $T_2^{(i+1)}$ realizes $\beta^{(i+1)}$, which is determined by $\beta^{(i+1)} = \beta_1^{(i)} \cdot \beta_2^{(i)}$, where $\beta_1^{(i)}$ and $\beta_2^{(i)}$ are the partitions obtained by removing the $C^{(i+1)}$ coordinate from $\beta^{(i)}$. The sketch of Figure 5.17 is helpful in understanding this fact.

[FIGURE 5.17.]

That is, in Figure 5.17 the partition $\beta^{(i)}$ is larger than or equal to the partition realized by the tree in the middle, and this latter partition is larger than or equal to the partition realized by the third tree, since $\beta^{(i+1)} = \beta_1^{(i)} \cdot \beta_2^{(i)} \le \beta_1^{(i)}$ (or $\beta_2^{(i)}$). Therefore we can conclude that $T_1^{(i+1)} * T_2^{(i+1)}$ realizes a partition which is less than or equal to the partition realized by $T_1^{(i)} * T_2^{(i)}$; in other words, $\alpha^{(i+1)} \cdot \beta^{(i+1)} \le \alpha^{(i)} \cdot \beta^{(i)}$ holds.

In order to show the truth of statement 2), it is sufficient to give the following example.

EXAMPLE 5.9. Assume that we apply Procedure D to the following 5-cube partition $\pi$. [Figure.] If we select the conditions $C_3$ and $C_4$ at $i = 0$ and $i = 1$, respectively, then we obtain a pair $T_1^{(2)}$ and $T_2^{(2)}$ realizing $\alpha^{(2)}$ and $\beta^{(2)}$ with $\#(\alpha^{(2)}) = 4$ and $\#(\beta^{(2)}) = 5$. [Figure.] Then we choose $C_1$ as $C^{(3)}$ for $i = 2$. This condition $C_1$ is not essential to some of the partitions $\pi_j^{(2)}$, and Procedure D generates the pair $T_1^{(3)}$ and $T_2^{(3)}$ with $\#(\alpha^{(3)}) = 6$ and $\#(\beta^{(3)}) = 3$. [Figure.] The two orthogonal decompositions $\alpha^{(2)} \cdot \beta^{(2)}$ and $\alpha^{(3)} \cdot \beta^{(3)}$, shown below, are not comparable. [Figure: $\alpha^{(2)} \cdot \beta^{(2)}$ and $\alpha^{(3)} \cdot \beta^{(3)}$.] Furthermore, $C_1(\alpha^{(3)}, \beta^{(3)}) < C_1(\alpha^{(2)}, \beta^{(2)})$ is true, because

$$C_1(\alpha^{(3)}, \beta^{(3)}) = \#(\alpha^{(3)}) \cdot \#(\beta^{(3)}) = 18 < C_1(\alpha^{(2)}, \beta^{(2)}) = \#(\alpha^{(2)}) \cdot \#(\beta^{(2)}) = 20. \qquad \text{Q.E.D.}$$

5.5. Discussion of Optimal Decompositions

In the previous section we presented Procedure D, which constructs a series of orthogonal decompositions $\{(\alpha^{(i)}, \beta^{(i)})\}$ for a given partition $\pi$. It was not shown, however, which condition $C^{(i+1)}$ should be selected at each step of the procedure. The role of this procedure is therefore quite similar to that of Procedure R in Chapter 4 for constructing a decision tree for a given partition $\pi$: that procedure did not show which condition $C_s$ should be selected at the $i$-th step either; it could show only the way to construct one of the decision trees realizing some $\pi'$ ($\pi' \le \pi$) for a given $\pi$. For both procedures it is quite obvious that the way of choosing the condition $C^{(i+1)}$ in Procedure D (or $C_s$ in Procedure R) greatly influences the cost of the constructed decision trees. For decomposition problems we have proposed two objective functions in Section 5.2. It is hoped that, based on Procedure D, algorithms to construct optimal decompositions for these objective functions may be developed; intuitive (heuristic) algorithms, or formulations by mathematical programming tools, are to be expected. In what follows, we first discuss the exhaustive search for optimal solutions, and thereafter a heuristic algorithm is proposed.
In Procedure D there are $n!$ possible ways to select a sequence of $C^{(i)}$'s. For each sequence, $n$ orthogonal decompositions are generated, excluding the trivial decomposition $(I, \pi)$. Then, totally, $n \cdot n!$ orthogonal decompositions can be generated if we exhaust all possibilities. This number, however, can be reduced to $n \cdot n!/2$, since there is no essential difference between $\alpha^{(i)}$ and $\beta^{(i)}$: each $\alpha^{(i)}$ occurs as one of the $\beta$'s of another sequence, and each $\beta^{(i)}$ as one of the $\alpha$'s.

The structure of the exhaustive algorithm can best be explained with the help of the following tree.

[Figure: the generation tree, with START at the root.]

The node $(pq \cdots r)$ at the $i$-th level of the tree stands for the $T_1^{(i)}$ constructed by $C^{(1)} = C_p$, $C^{(2)} = C_q$, $\ldots$, and $C^{(i)} = C_r$ at the successive steps of Procedure D. From a node at the $i$-th level, $(n-i)$ edges, showing the possible $(n-i)$ choices of $C^{(i+1)}$, diverge; each of them connects to a node at the $(i+1)$-th level, say, from the node $(12)$ of the second level to the node $(123)$ of the third level by the edge $C_3$. This transition is shown next more explicitly. (We note that a degeneration occurs in $T_1^{(3)}$.)

[Figure: the transition from the node $(12)$ to the node $(123)$.]

We learned that there are $n \cdot n!$ possible nodes in the above tree. Not all of them, however, are distinct. We show an example. Corresponding to the nodes $(21)$ and $(12)$, we have the two decision trees $T_1^{(2)}$ and $T_1^{(2)'}$ shown below, respectively.

[Figure: $T_1^{(2)}$ and $T_1^{(2)'}$ with their associated partitions.]

We note that the two partitions $\alpha^{(2)}$ and $\alpha^{(2)'}$ realized by these $T_1^{(2)}$ and $T_1^{(2)'}$ are identical and, moreover, that $\pi_1^{(2)} = \pi_1^{(2)'}$, $\pi_2^{(2)} = \pi_2^{(2)'}$, $\pi_3^{(2)} = \pi_3^{(2)'}$ and $\pi_4^{(2)} = \pi_4^{(2)'}$ hold. Then their counterparts $\beta^{(2)}$ and $\beta^{(2)'}$ are also the identical partition; in other words, the two orthogonal decompositions $(\alpha^{(2)}, \beta^{(2)})$ and $(\alpha^{(2)'}, \beta^{(2)'})$ are identical. Furthermore, any selection of the same $C^{(k)}$ ($k = 3, 4, \ldots, n-1$) for developing both $T_1^{(2)}$ and $T_1^{(2)'}$ by Procedure D yields the same decomposition $(\alpha^{(k)}, \beta^{(k)})$. In summary, we need not distinguish the node $(21)$ from the node $(12)$; therefore we let them merge together in the tree as follows.

[Figure: the merged nodes.]

However, if a degeneration occurs for $T_1^{(2)'}$, for example, we have a different situation. That is, the following two decision trees $T_1^{(2)}$ and $T_1^{(2)'}$ should be distinguished from each other, because the application of Procedure D to these two trees results in different decompositions; their corresponding nodes $(21)$ and $(12)$ should then not merge.

[Figure: $T_1^{(2)}$ and $T_1^{(2)'}$ in the degenerate case.]

How many mergings of nodes occur in the generation tree depends on the given partition $\pi$. If no degeneration occurs at any step of Procedure D for any selected sequence of $C^{(i)}$'s, as an extremal case, then the generation tree degenerates to the following small tree, due to the mergings of nodes. There, at the $i$-th level we have only the nodes $(\ell_1 \ell_2 \cdots \ell_i)$ where $\ell_1 < \ell_2 < \cdots < \ell_i$, for $i = 1, 2, \ldots, n$. This means that there are exactly $\binom{n}{i}$ nodes at the $i$-th level, and we can conclude that the total number of nodes amounts to $\sum_i \binom{n}{i} = 2^n$.

[Figure: the fully merged generation tree.]

Instead of enumerating all $n \cdot n!$ candidates (this is the other extremal case), we usually need to consider a number of decompositions between $2^n$ and $n \cdot n!$ for the exhaustive method. If we calculate the two objective functions $C_1(\alpha^{(i)}, \beta^{(i)})$ and $C_2(\alpha^{(i)}, \beta^{(i)})$ for all decompositions corresponding to nodes of this tree, we can find the best solution. We need, however, the effort of generating all $n \cdot n!$ candidates.
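In outline, the exhaustive method walks the generation tree just described; equivalently, one may simply run Procedure D along every permutation of the conditions and keep the cheapest decomposition seen. The driver below assumes a helper run_procedure_D (not shown) that returns the $n$ decompositions produced along one sequence, with $\alpha$ and $\beta$ as lists of blocks; ranking candidates lexicographically by $(C_1, C_2)$ is only one of several reasonable choices.

```python
# A sketch of the exhaustive search: n! sequences, n decompositions
# each, i.e., at most n * n! candidates (as few as 2^n after mergings).
from itertools import permutations

def exhaustive(pi, conditions, run_procedure_D):
    best = None
    for seq in permutations(conditions):
        for alpha, beta in run_procedure_D(pi, seq):
            c1 = len(alpha) * len(beta)        # C1 = #(alpha) * #(beta)
            c2 = len(alpha) + len(beta) - 2    # C2
            if best is None or (c1, c2) < best[0]:
                best = ((c1, c2), alpha, beta)
    return best
```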
Therefore, we sacrifice the guarantee of the best solution, but we save the cost of the work which would have been required, and look instead for reasonable, suboptimal solutions for both objective functions. Based on Procedure D, Theorem 5.4 and Theorem 5.5, we discuss a heuristic algorithm for finding suboptimal orthogonal decompositions. We have already shown the following facts. For any selected sequence of $C^{(i+1)}$ ($i = 0, 1, \ldots, n-1$):

1) $\alpha^{(i)}$ decreases as $i$ increases, that is, $\alpha^{(0)} = I \ge \alpha^{(1)} \ge \alpha^{(2)} \ge \cdots$, and $\#(\alpha^{(i)})$ is a monotonically increasing function of $i$: $\#(\alpha^{(0)}) = 1 \le \#(\alpha^{(1)}) \le \#(\alpha^{(2)}) \le \cdots$ (Theorem 5.4);

2) $\alpha^{(i)} \cdot \beta^{(i)}$ is not monotonic (Theorem 5.5); in general, $\#(\alpha^{(0)} \cdot \beta^{(0)}) = \#(I \cdot \pi) = \#(\pi) \le \#(\alpha^{(i)} \cdot \beta^{(i)})$, and $\#(\alpha^{(i)} \cdot \beta^{(i)})$ grows globally with $i$, although not monotonically;

3) $\#(\beta^{(i)})$ is generally decreasing: $\#(\beta^{(0)}) = \#(\pi) \ge \#(\beta^{(1)}) \ge \#(\beta^{(2)}) \ge \cdots$.

(Since any orthogonal decomposition $(\alpha, \beta)$ of $\pi$ must satisfy $\alpha \cdot \beta \le \pi$, $\#(\alpha \cdot \beta) \ge \#(\pi)$ holds. For the first objective function, therefore, the trivial decomposition $(\alpha^{(0)}, \beta^{(0)}) = (I, \pi)$ is optimal, because $C_1(I, \pi) = \#(\pi)$ achieves the minimal value $\#(\pi)$. However, this decomposition is not reasonable, since its second objective function $C_2(I, \pi) = \#(I) + \#(\pi) - 2$ is unreasonably large. Therefore we interpret "suboptimal" as "reasonably good" for both objective functions $C_1(\alpha, \beta)$ and $C_2(\alpha, \beta)$.)

Now we have two criteria, i.e., $C_1(\alpha, \beta) = \#(\alpha) \cdot \#(\beta)$ and $C_2(\alpha, \beta) = \#(\alpha) + \#(\beta) - 2$. In order to find a reasonable solution by applying Procedure D, an algorithm has to specify 1) which condition $C^{(i+1)}$ should be chosen at each step, and 2) at which $i$ we should stop Procedure D. The first requirement corresponds to selecting an edge of the generation tree defined before, and the second corresponds to knowing at which level of that tree we should stop. This is shown in the sketch below.

[Figure: a path in the generation tree from START to STOP.]

In order to explain these points more simply, we assume that $C_1(\alpha^{(i)}, \beta^{(i)}) = \#(\alpha^{(i)}) \cdot \#(\beta^{(i)})$ is constant for every $i$ and for any selected sequence of $C^{(i)}$. Then an optimal decomposition for the second objective function $C_2(\alpha^{(i)}, \beta^{(i)}) = \#(\alpha^{(i)}) + \#(\beta^{(i)}) - 2$ can be found around the $i$-th step at which $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ is attained. (Recall that $x + y$ is minimized at $x = y$ if $x \cdot y$ is constant, for real numbers $x$ and $y$.) This strong assumption of $\#(\alpha^{(i)}) \cdot \#(\beta^{(i)})$ being constant is not true in general, but if $\#(\alpha^{(i)}) \cdot \#(\beta^{(i)})$ is gradually increasing and does not deviate much from the lower bound $\#(\pi)$, the criterion $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ seems reasonable. In summary, we choose a condition $C^{(i+1)}$ at each step so that the resulting $C_1(\alpha^{(i+1)}, \beta^{(i+1)}) = \#(\alpha^{(i+1)}) \cdot \#(\beta^{(i+1)})$ is minimized over all possible choices of conditions, and we stop the algorithm around the $i$ at which $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ is achieved. Recalling that $\#(\beta^{(i)})$ is generally decreasing while $\#(\alpha^{(i)})$ is increasing, we see that we reach this $i$ in a straightforward way without much loss; this method of searching for suboptimal solutions is therefore not impractical. The condition $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ is attained around the $i$ at which the derivative (gradient) of $C_2(\alpha^{(i)}, \beta^{(i)})$ changes from negative to positive. (Recall also that the derivative of $x + y$ changes from negative to positive at $x = y$ as $x$ increases, assuming that $x \cdot y$ is constant.) Alternatively, therefore, we can state that we stop Procedure D around the $i$ at which $C_2(\alpha^{(i+1)}, \beta^{(i+1)}) \ge C_2(\alpha^{(i)}, \beta^{(i)})$ first holds.
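The resulting heuristic is summarized by the following sketch. It assumes a helper one_step (not shown) that performs one step of Procedure D for a chosen condition and returns the new pair $(\alpha, \beta)$ as lists of blocks, starting from the initial state $(I, \pi)$; the greedy choice minimizes $C_1 = \#(\alpha) \cdot \#(\beta)$, and the loop stops as soon as $C_2$ stops decreasing.

```python
# A sketch of the suboptimal-decomposition heuristic of this section.

def heuristic(pi, conditions, one_step):
    state = (None, pi)                 # (alpha, beta) = (I, pi)
    prev_c2 = len(pi) - 1              # C2(I, pi) = 1 + #(pi) - 2
    remaining = list(conditions)
    while remaining:
        # Greedy step: try every condition, keep the smallest C1.
        step = {c: one_step(state, c) for c in remaining}
        c = min(step, key=lambda c: len(step[c][0]) * len(step[c][1]))
        alpha, beta = step[c]
        c2 = len(alpha) + len(beta) - 2
        if c2 >= prev_c2:              # the gradient of C2 turned up:
            return state               # we are near #(alpha) = #(beta)
        state, prev_c2 = (alpha, beta), c2
        remaining.remove(c)
    return state
```

With the choices of the example that follows ($C_3$, $C_4$, $C_1$, $C_2$), the $C_2$ values after the successive steps are 9, 7, 8, so the stopping rule returns the pair $(\alpha^{(2)}, \beta^{(2)})$, in agreement with the discussion above.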
EXAMPLE 5.10. Consider the 4-cube partition $\pi$ of Figure 5.18, with $\#(\pi) = 16$. Our algorithm selects $C^{(1)} = C_3$, $C^{(2)} = C_4$ and $C^{(3)} = C_1$. At $i = 1$, $\#(\alpha^{(2)}) \approx \#(\beta^{(2)})$ is attained, and $C_2(\alpha^{(3)}, \beta^{(3)}) \ge C_2(\alpha^{(2)}, \beta^{(2)})$ holds at $i = 2$. Therefore we stop the procedure at $i = 1$ or $2$; both give reasonably good decompositions. In Figure 5.18, $(\alpha^{(i)}, \beta^{(i)})$ as well as $(T_1^{(i)}, T_2^{(i)})$ for $i = 1, 2, 3$ are shown:

1) $i = 0$, $C^{(i+1)} = C^{(1)} = C_3$: $\#(\alpha^{(1)}) = 2$, $\#(\beta^{(1)}) = 9$, $C_1(\alpha^{(1)}, \beta^{(1)}) = 18$, $C_2(\alpha^{(1)}, \beta^{(1)}) = 9$;
2) $i = 1$, $C^{(i+1)} = C^{(2)} = C_4$: $\#(\alpha^{(2)}) = 4$, $\#(\beta^{(2)}) = 5$, $C_1(\alpha^{(2)}, \beta^{(2)}) = 20$, $C_2(\alpha^{(2)}, \beta^{(2)}) = 7$;
3) $i = 2$, $C^{(i+1)} = C^{(3)} = C_1$: $\#(\alpha^{(3)}) = 7$, $\#(\beta^{(3)}) = 3$, $C_1(\alpha^{(3)}, \beta^{(3)}) = 21$, $C_2(\alpha^{(3)}, \beta^{(3)}) = 8$.

[FIGURE 5.18.]

To see what follows if we continue this procedure, we show also $T_1^{(4)}$ and $\beta^{(4)}$, which are generated at $i = 3$ by choosing $C^{(4)} = C_2$: $\#(\alpha^{(4)}) = 14$, $\#(\beta^{(4)}) = 2$, and $C_1(\alpha^{(4)}, \beta^{(4)}) = 28$.

APPENDIX

LITERATURE SURVEY

Much has been published concerning decision tables, decision table languages and their applications in various areas. The following books can serve as an introduction to this subject: Hughes et al. [6], Katzan [9], McDaniel [17], [18], [19] and Pollack et al. [24]. Most of these texts cover introductory material through some specific applications and/or decision table languages; some of them also include topics on decision table conversion problems. A good summary and concise survey of research topics in this field can be found in Katzan [9] and King [12].

Many useful discussions concerning decision table conversion problems were first given by Montalbano [20]. Egler [2] attempted to give a very simple manual method for converting decision tables into decision trees; he thought that it minimizes both the average processing time and the storage requirement. Montalbano [21], however, refuted Egler's algorithm by showing a counterexample. Pollack [23] proposed two plausible procedures, one for minimizing the storage requirement and the other for the average processing time, and asked readers to prove his algorithms or to offer counterexamples showing where they fail. There is a counterexample in Sprague [31] which shows that neither algorithm guarantees optimality for its respective objective. By introducing the concept of entropy from information theory, Schwayder [28] modified Pollack's algorithms, but it is known that this algorithm does not always generate an optimal tree either. Earlier, before Pollack's [23] appeared, Press [25] gave another simple manual procedure as well as an interesting discussion of decision table languages.

None of the methods discussed above generates optimal trees in all cases. On the other hand, Reinwald and Soland attack the conversion problems by a branch and bound method in their papers [26], [27]. Their algorithms are guaranteed to produce optimal trees, but they are fairly complex and time consuming. The recent work of Alster [1] attempts to extend decision table conversion problems into a more generalized decision tree construction problem; it deals with constructions of optimal decision trees not only for rule partitions but also for general partitions, and describes several heuristic algorithms to minimize the number of internal nodes of the trees, together with results of their computer implementation. Another decision tree construction problem, called a binary identification problem by the author, can be found in Garey [3].
The decision tables considered there do not have "$-$" (dash) entries, which means that the corresponding cube partitions are of the following special type: for a decision table with $m$ rule columns and $n$ condition rows, the corresponding partition consists of $m$ 0-cubes (each corresponding to a rule) and one block of $(2^n - m)$ 0-cubes (the Else-rule). Garey's approach is similar to the well known optimal binary search tree constructions (Knuth [15] and Hu and Tucker [7]). His main discussions concern how the exhaustive algorithm which he describes can be improved; he finds some specific relationships among the probabilities of occurrence of rules and/or the costs of conditions which, when met, reduce the amount of work. Also to be mentioned here is an earlier book by Picard [22], which contains a number of results about general decision trees, usually of the type that under certain conditions a tree of a certain structure is optimal.

There are also similarities between the construction of decision trees and relay network realizations of Boolean functions (see, e.g., Harrison [4]). Since the role of a transfer relay in a network and of a decision box in a decision tree is the same, transfer relay realizations of Boolean functions are apparently a special case of decision tree constructions. In Marcus [16], an algorithm to realize a Boolean function with a small number of transfer relays, using Karnaugh map techniques, is proposed; the iterated local minimization in this thesis is a generalization of his method. A correspondence between decision tree constructions and transfer relay realizations of Boolean functions is also described by Seshagiri [29]. The objective of these authors is to reduce the number of relays used in realizations of Boolean functions; there is no such concept as an average processing time in switching theory, since all relays require the same period for their operation. Slagle's work [30] is closer to our subject: he discusses an effective binary decision tree construction for a given Boolean expression.

In this thesis we discussed only methods for converting decision tables into decision trees. There are, however, other fundamentally different approaches to processing tables by computers. Kirk [13] and King [10] proposed and developed, respectively, the use of rule mask techniques, and Veinott [32] also shows a programming technique to interpret tables into computer programs. These methods, however, need the evaluation of all conditions for each input datum, which is obviously wasteful of execution time. Finally we mention two papers, King [11] and Press [25], for discussions of ambiguity and redundancy problems of decision tables.

LIST OF REFERENCES

[1] Alster, J. M., "Heuristic Algorithms for Constructing Near-Optimal Decision Trees," M.S. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, Report No. 474 (August 1971).

[2] Egler, J. F., "A Procedure for Converting Logic Table Conditions into an Efficient Sequence of Test Instructions," Comm. ACM, Vol. 6, No. 9, pp. 510-514 (September 1963).

[3] Garey, M. R., "Optimal Binary Decision Trees for Diagnostic Identification Problems," Ph.D. Thesis, Department of Computer Science, University of Wisconsin (1970).

[4] Harrison, M. A., Introduction to Switching and Automata Theory, McGraw-Hill Book Company, New York, Chapter 7 (1965).
[5] Hartmanis, J. and Stearns, R. E., Algebraic Structure Theory of Sequential Machines, Prentice-Hall, Inc., Englewood Cliffs, New Jersey (1966).

[6] Hughes, M. L., Shank, R. M., and Stein, E. S., Decision Tables, MDI Publications (Management Development Institute, Division of Information Industries, Inc.), Wayne, Pennsylvania (1968).

[7] Hu, T. C. and Tucker, A. C., "Optimal Computer Search Trees and Variable-Length Alphabetical Codes," SIAM J. on Applied Math., Vol. 21, No. 4, pp. 514-532 (December 1971).

[8] Huffman, D. A., "A Method for the Construction of Minimum Redundancy Codes," Proc. IRE, Vol. 40, No. 9, pp. 1098-1101 (September 1952).

[9] Katzan, H., Jr., Advanced Programming, Van Nostrand Reinhold, New York, Chapter 9 (1970).

[10] King, P. J. H., "Conversion of Decision Tables to Computer Programs by Rule Mask Technique," Comm. ACM, Vol. 9, No. 11, pp. 796-801 (November 1966).

[11] King, P. J. H., "Ambiguity in Limited Entry Decision Tables," Comm. ACM, Vol. 11, No. 10, pp. 680-684 (October 1968).

[12] King, P. J. H., "Decision Tables," Computer J., Vol. 10, No. 2, pp. 135-142 (August 1967).

[13] Kirk, H. W., "Use of Decision Tables in Computer Programming," Comm. ACM, Vol. 8, No. 1, pp. 41-44 (January 1965).

[14] Knuth, D. E., The Art of Computer Programming, Addison-Wesley Publishing Company, Reading, Massachusetts, Vol. 1, Chapter 2 (1968).

[15] Knuth, D. E., "Optimal Binary Search Trees," Acta Informatica, Vol. 1, Fasc. 1, pp. 14-25 (1971).

[16] Marcus, M. P., "Minimization of the Partially-Developed Transfer Tree," IRE Trans. on Electronic Computers, EC-6, pp. 92-95 (June 1957).

[17] McDaniel, H., An Introduction to Decision Logic Tables, John Wiley and Sons, Inc., New York (1968).

[18] McDaniel, H., Applications of Decision Tables, Brandon/Systems Press, Inc., New York (1970).

[19] McDaniel, H., Decision Table Software, Brandon/Systems Press, Inc., New York (1970).

[20] Montalbano, M., "Tables, Flowcharts and Program Logic," IBM Systems J., Vol. 1, pp. 51-63 (September 1962).

[21] Montalbano, M., "Egler's Procedure Refuted," Comm. ACM, Vol. 7, No. 1, p. 1 (January 1964).

[22] Picard, C., Théorie des Questionnaires, Les Grands Problèmes des Sciences 20, Gauthier-Villars, Paris (in French) (1965).

[23] Pollack, S. L., "Conversion of Limited Entry Decision Tables to Computer Programs," Comm. ACM, Vol. 8, No. 11, pp. 677-682 (November 1965).

[24] Pollack, S. L., Hicks, H. T., Jr., and Harrison, W. J., Decision Tables: Theory and Practice, John Wiley and Sons, Inc., New York (1971).

[25] Press, L. I., "Conversion of Decision Tables to Computer Programs," Comm. ACM, Vol. 8, No. 6, pp. 385-390 (June 1965).

[26] Reinwald, L. T. and Soland, R. M., "Conversion of Limited Entry Decision Tables to Optimum Computer Programs I: Minimum Average Processing Time," J. ACM, Vol. 13, No. 3, pp. 339-358 (July 1966).

[27] Reinwald, L. T. and Soland, R. M., "Conversion of Limited Entry Decision Tables to Optimum Computer Programs II: Minimum Storage Requirements," J. ACM, Vol. 14, No. 4, pp. 742-755 (October 1967).

[28] Schwayder, K., "Conversion of Limited Entry Decision Tables to Computer Programs: A Proposed Modification to Pollack's Algorithm," Comm. ACM, Vol. 14, No. 2, pp. 69-73 (February 1971).

[29] Seshagiri, N., "Relay Tree Network Decomposition of Decision Tables," Proc. IEEE, Vol. 55, No. 9, pp. 1648-1649 (September 1967).
[30] Slagle, J. R., "An Efficient Algorithm for Finding Certain Minimum-Cost Procedures for Making Binary Decisions," J. ACM, Vol. 11, No. 3, pp. 253-264 (July 1964).

[31] Sprague, V. G., "On Storage Space of Decision Tables," Comm. ACM, Vol. 9, No. 5, pp. 319-320 (May 1966).

[32] Veinott, C. G., "Programming Decision Tables in FORTRAN, COBOL or ALGOL," Comm. ACM, Vol. 9, No. 1, pp. 31-35 (January 1966).

VITA

The author, Toshio Yasui, was born in Kyoto, Japan, on May 14, 1943. He received the Bachelor of Science and the Master of Science degrees, both in electronic engineering, from Kyoto University in 1966 and 1968, respectively. From September 1968 to June 1971, he worked as a research assistant with the Illiac IV Project of the Department of Computer Science of the University of Illinois at Urbana-Champaign on the development of the large scale parallel computer Illiac IV. He has been a research assistant with the Center for Advanced Computation of the University of Illinois at Urbana-Champaign since June 1971. He is a member of the Institute of Electrical and Electronics Engineers and the Association for Computing Machinery.