UN/VCRSITY OF, 
 
 ILLINOIS LIBRARY 
 
 At URBANA-CHAMPAIGN 
 
The person charging this material is re- 
 sponsible for its return to the library from 
 which it was withdrawn on or before the 
 Latest Date stamped below. 
 
 Theft, mutilation, and underlining of books are reasons 
 for disciplinary action and may result in dismissal from 
 the University. 
 To renew call Telephone Center, 333-8400 
 
 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN 
 
 I 
 
 <t#. ' ° n 
 
 
UIUCDCS-R-78-867 
 
 SiO. H 
 IUa; 
 
 2^ 
 
 ~frUu£% UILU-ENG 78 1720 
 
 SELECTION OF MOST REPRESENTATIVE 
 
 TRAINING EXAMPLES AND INCREMENTAL 
 
 GENERATION OF VL HYPOTHESES: 
 
 the underlying methodology 
 
 and the description of programs 
 
 ESEL and AQ11 
 
 by 
 
 R. S. Michalski and J. B. Larson 
 
 May 1978 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
 University ol Illinois 
 r.rbana-Chamr 
 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/selectionofmostr867mich 
 
SELECTION OF MOST REPRESENTATIVE 
 
 TRAINING EXAMPLES AND INCREMENTAL 
 
 GENERATION OF VL HYPOTHESES: 
 
 the underlying methodology 
 
 and the description of programs 
 
 ESEL and AQ11 
 
 by 
 
 R. S. Michalski and J. B. Larson 
 
 May 1978 
 
 Department of Computer Science 
 
 University of Illinois at Urbana-Champaign 
 
 Urbana, Illinois 61801 
 
 This work was supported in part by National Science Foundation Grant 
 NSF MCS 76-22940 and in part by a Senior Visiting Fellowship from 
 British Science Research Council. 
 
ABSTRACT 
 
 The paper describes the underlying theoretical framework 
 and operational details of two programs, ESEL and AQ11, for computer 
 induction within the framework of the variable-valued logic system 
 VL^ (i.e., a statement calculus which involves variables with an 
 arbitrary number of discrete values [Michalski 1974])! 
 
 ESEL - A supporting program for selecting 'most represen- 
 tative' learning and/or testing VL^ events from 
 a large data base of events. The program provides 
 the input to the program AQllt 
 
 AQ11 - A program for incremental generation of VL^ hypo- 
 theses, which are generalized and optimized des- 
 criptions of input event sets. The program also 
 provides a facility for evaluating the performance 
 of these inferred hypotheses on testing events. 
 
 Given a large set of examples describing certain objects 
 or situations, program ESEL selects from them a small subset of 
 the most representative ones. The examples have to be in the 
 form of VLi events, i.e., in the form sequences of values of cer- 
 tain discrete variables (or descriptors). In selecting the events, 
 the program distinguishes among three types of descriptors: nominal 
 descriptors, whose value set is an unordered set, linear descrip- 
 tors, whose value set is a linearly ordered set, and structured 
 descriptors, whose value set is a tree-ordered set. 
 
 Events selected by ESEL are Input to program AQ11, 
 which generates VL^ hypotheses describing the events. The program 
 can work incrementally, i.e., given a working hypothesis (a set of 
 rules) obtained at some stage, and a set of events, the program can 
 modify the hypothesis to make it consistent with the events. 
 
 Program AQ11 also has the facility to test the performance 
 of a given hypothesis on a set of testing events, and to compute 
 an extended confusion matrix. 
 
1. SELECTION OF THE MOST REPRESENTATIVE TRAINING EXAMPLES: Program ESEL 
 
 1.1 Basic Concepts and Notation 
 
 The purpose of program ESEL is to select a subset of 
 most representative events from a large number of VL events 
 (see definition below) . The need for using this program arises 
 when a given training set of events is very large (say, a few hundred 
 or more events) and AQVAL/1 inductive programs (program AQ11, des- 
 cribed here, also AQ7 [Larson, Michalski 75], Uniclass-RS [Stepp 76], 
 AQ9 [Cuneo 75], AQPLUS [Forsburg 75], SYM-4 [Jensen 75], YAL [Yalow 77]) 
 could not accept such a large number of events Or would run very 
 inefficiently. 
 
 The theoretical background for ESEL is given in [Michalski 
 
 75]. Here, for completeness, we will summarize it, and then describe 
 
 the program itself. 
 
 Let £(d, , d OJ ..., d ) or, briefly, &, denote a set of 
 1 2 n 
 
 all n-tuples (x ' , x' ..., x'), x! £ D . , 1*1.2,..., n, where D, are 
 12 nil i 
 
 certain finite sets and d is the cardinality of D . Thus: 
 
 £(d. , ..., d ) = D. x D x ... x D (1) 
 
 1 n 1 2 n 
 
 & is called the universe of events, and its elements are called events, x! 
 -l 
 
 and D , i= 1,2,3..., denote a value and the domain ( value set ) of the 
 descriptor x , respectively. Descriptors* are certain direct (or derived) 
 measurements or characteristics of objects or situations. 
 
 * 
 A descriptor, as described here, is equivalent to a variable. (In a more 
 general sense, not considered here, a descriptor can also be an n^-ary 
 relation or n-argument function.) 
 
Depending on the nature of a descriptor, its domain 
 may have a different structure, e.g., it can be a linearly ordered set, a 
 partially ordered set, or an unordered set. 
 
 Three categories of descriptors are distinguished here: 
 
 I. Nominal or cartesian descriptors whose domains are sets that 
 have no order. 
 
 II. Linear (interval) descriptors whose domains are any linearly 
 ordered sets. Thus, this category includes ordinal, interval, ratio and absolute 
 variables, as defined in mathematical psychology. 
 
 III. Structured descriptors whose domains are partially ordered 
 sets < S, >_ > that are neither linearly ordered nor totally unordered. In 
 this paper we will restrict ourselves to the case of partially ordered sets 
 having the property that for any two elements a, b E S, there exists at least 
 one element o such that 
 
 a ^_ a and b £_ Q 
 Sets with such structures will be called generalization structures or g-struc- 
 tures . Figure 1 presents a Hasse diagram of a g-structure. 
 
 An example of g-structure 
 Figure 1 
 
 * The term used in our previous papers on this subject, 
 
In the diagram, a relation a _> b is represented by placing node a above 
 
 node b and linking the nodes by an arc. 
 
 Examples of descriptors : the blood type of a person is a 
 nominal descriptor, the height or weight of a person is a 
 linear descriptor, and the position of the person in an 
 hierarchy of an institution is a structured descriptor. 
 
 Suppose, without loss of generality, that we are given two 
 event sets, El and EO, where El, EO C g y each associated with a certain 
 decision or action k (k = 1 and 0, respectively). These sets define a set of 
 functions 
 
 {f: £ + D} (2) 
 
 such that 
 
 {e | f(e) = k} = Ek, k = 1,0 (3) 
 
 where e £ & and D = {0,1,*}; '*' in D means 'no decision'. 
 
 A problem of inductive inference is to determine an expression V 
 of a function f, which is most desirable, with respect to some criterion, among 
 all the expressions of all functions (2). Such an expression will usually 
 also assign values 1 or to events not included in Ek; i.e., the expression 
 will be a certain generalization of the sets Ek. Namely, the initial set Ek 
 will be transferred into sets Ek(V) 3 Ek, where 
 
 Ek(V) = {e | V(e) = k} , k = 1,0 
 V(e) - the value of the expression V for the event e. 
 
 AQVAL/1 programs (Michalski 77) can be used to solve the problem 
 if the expression V is restricted to the class of DVL 1 expressions and the sizes 
 of the sets Ek do not exceed certain limits. When sets Ek are very 
 large (say, a few hundred elements or more) , then the computational time of 
 the programs may be too long. The problem arises as to whether sets Ek could 
 not be reduced to more manageable sizes and still provide sufficient information 
 about decision classes from the viewpoint of inductive inference. 
 
 If a precise measure of a "degree of representativeness" of each 
 event e £ Ek were available, then an event reduction process could be performed 
 
simply by selecting events whose 'degree of representativeness' is above a 
 certain threshold. For example, the frequency of occurrence of an object with the 
 description e in the class k could serve as an estimate of such a measure. 
 This estimate, however, in many practical problems is either not available or 
 is not adequate. Consequently, some other means must be developed for selecting 
 the 'most representative' events. 
 
 There can be a number of different methods of solving this problem 
 (see, e.g., [Michalski 75]). Program ESEL implements a method called 'outstanding 
 representatives' (OR). 
 1.2 An Outline of the OR Method 
 
 In this method, the original event set is reduced to a set consisting 
 of events which are most 'distant' from each other. An important feature of 
 this method is that the resulting set will include events which delineate the 
 'outside' of the events in the original set. For example, if the 'true' but 
 unknown decision class is a circle and its interior and the original event 
 set consists of a number of randomly selected points from this class, then 
 the reduced set will be a set of points lying on or close to the perimeter of 
 the circle and spanning a polygon of approximately equal sides. 
 
 This method is, however, very sensitive to events which differ 
 significantly from the rest of the events in the original set. If such events 
 happened to be errors, then these errors would have a strong effect on the 
 result. To circumvent this problem, an additional test could be done, which 
 selects an event only if it has a certain number of 'close' neighbors*. Figure 
 1 illustrates this method. 
 
 *This feature is not implemented, 
 
Let e and e« denote two given events: 
 
 e l (x l' x 2'*--> x nl' X nl+1' X nl+2'•• ,, X n2' X n2+1' X n2+2'"*' X n* 
 
 p = fx" x" x" x" x" x" x" x" x"} 
 
 e 2 ^ V x 2'-'-' nl' nl+1' X nl+2'"*' n2' n2+l» n2+2' ' V 
 
 V _ J V ^ V 
 
 linear structured nominal 
 
 variables variables variables 
 
 where x! and x'.' denote values of variable x. in e, and e„, respectively. Assume, 
 11 i 1 2 
 
 without loss of generality, that the first nl variables in the events above are 
 
 interval variables, the following n2 variables are structured variables and 
 
 those remaining are nominal variablest. 
 
 First, we will define a measure of the distance d (x ' , xV) between 
 
 the values of a variable depending on the type of the variable : 
 
 « For linear variables: 
 
 |x' - x'.'l 
 d(x'., xV) = * 1 , 1 < i < nl (4) 
 
 1 1 a.x — — 
 
 assuming that the domain of each linear variable is represented 
 
 the 
 
 by the set {0, 1, 2,..., X.},\ ■ \-l ( d i ~ the cardinality of D , i.e. of 
 
 domain of x . ) 
 « For structured variables: 
 
 d(x . X ") = _NB_ 
 
 Q ^ X i' V mnb (5) 
 
 nl < i <_ n2 (see Figure 2) 
 
 where NB is the number of branches on the shortest path linking x' with xV 
 
 in the Hasse diagram representing the domain of x., and MNB is the maximum number 
 
 of branches on the shortest path linking any two nodes of the diagram. 
 
 • For nominal variables: 
 
 1, if x' is not identical to x'' 
 d(x\ x") = 
 
 ri, if x^ 
 
 1^0, other* 
 
 lerwise 
 (n2 < i £ n) 
 Two types of distance measures between events are considered: 
 
 tit is assumed here that if the domain of a structured variable is not a g- 
 structure, then the variable is treated as a cartesian variable. 
 
NB(a,b) = 3 
 NB(b,c) = 6 
 NB(a,c) = 7 
 
 MNB = 9 
 
 d(a,b) = 3/9 
 d(b,c) = 6/9 
 d(a,c) = 7/9 
 
 Illustration of a distance between values of a structured variable 
 
 Figure 2 
 
(1) Quantized measure : 
 
 d q (e l' e 2 } = i-1 iCdCxJ.xJ).!^ + i= ^ 2+1 wd(x;,xp (6) 
 
 where T, = (t._, t ,..., t. ) is a sequence of thresholds t.. associated with 
 1 11 12 ip ^ ij 
 
 variable x,, i = 1, 2, ..., n2 
 
 q is a quantization function q:{d(x' x"XT}-»-{0, 1, 2,..., p} 
 defined as 
 
 -0, if d(xj, xj) £ t n 
 
 1, if t. 1 <d(x', xp < t i2 
 
 q(d(x^, xp,T) = / ' 
 
 
 p, if t. < d(x , x ) 
 ^ ip i i 
 
 w - a 'weight' assigned to nominal variables in relation to non- 
 nominal variables. 
 
 (2) Continuous measure 
 
 n 
 d c ( V e 2 ) = 1-1 V (x i' X l ) (7) 
 
 where w. is a weight associated with the variable x,. 
 l i 
 
 The threshold sequence T. in the quantized measure and weight 
 w in the continuous measure represent two different means to control the 
 effect of a single linear or structured variable on the distance between events, 
 
 As we can see, control by a threshold sequence avoids a aultipli- 
 cation operation in computing the distance, unlike in control by weight, and 
 thus is computationally simpler than the latter. It requires, however, that 
 the user specifies the value of p and p thresholds for each variable, as 
 opposed to the single number (weight) required in control by weight. 
 
1. 3 Algorithm 
 
 We will now describe a specific algorithm implementing the OR 
 method. The algorithm is applied in the same way to every set E , 
 k = 1, 2,.... Let us then assume that E stands for any one of these sets. 
 Either of the distance measures introduced in section 1.2 can be used in 
 the algorithm. 
 
 1. For each e e E determine the distance d(e, e„), where 
 e Q = (0, 0, 0,..., 0). 
 
 2. Find events e . and e such that 
 
 mm max 
 
 d ( e m -.- T ,' e r>) = mln d ( e > e n) 
 mm U ^ „ \) 
 
 etE 
 
 d ( e ™o V > O = max d ( e > e r) 
 max o eeE 
 
 3. Determine the distance d(e . , e ) and divide it into 
 
 mm max 
 
 r intervals*, where r is between 0.01 and 0.1 of the 
 size c(E) of the original set E (e.g., if c(E) = 3000 
 then r is between 30 and 300) . 
 
 4. Partition E into r subsets, E n , E_,..., E , such that 
 
 ' 1' 2 r 
 
 E. consists of events whose distance d(e, e„) lies in 
 the ith interval, i = 1, 2,..., r: 
 
 a i-l < d ^ e ' e 0^ - a i 
 
 where a. , and a. are the endpoints of the ith interval 
 l-l l 
 
 (a_. = d(e . , e„) and a = d(e , e_.)). 
 mm r max 
 
 * The intervals do not have to be equal. The desired situation here is to 
 have intervals which will lead to the subsets E. (determined in step 4) 
 of approximately the same size. 
 
5. From each subset E., 1=1, 2, ..., 4, select a subset 
 
 E consisting of s events (where s is such that r*s gives 
 
 the desired size of the reduced event set) . The 
 
 selection is made in the following way: 
 
 1.) Find e, and e„ in E. such that 
 1 2 l 
 
 d(e_, ej = max d(e , e )* 
 
 12 r „ a b 
 
 e , e, e E . 
 a b x 
 
 2.) Find e~ such that 
 
 d(e , e ) • d(e_, e^) = max (dCe.e.^) • d(e,e 2 )) 
 
 e £E. 
 
 l 
 
 s-1.) Find e such that 
 s 
 
 s-1 s-1 
 
 n d(e , e.) = max II d(e, e^) 
 j-1 S J e6E 1 j-1 J 
 
 where fl denotes the arithmetic multiplication. 
 
 6. The union of the sets E, : 
 
 is 
 
 r 
 E = U E. 
 S i-1 1S 
 
 gives the reduced event set. 
 
 *A more computationally efficient process, though one which might lead to a 
 less desirable result, is to replace step 1 by two steps: 
 la) find e^ such that 
 
 d(e-L, e Q ) = min d(e, e Q ) 
 e eEi 
 
 lb) find e 2 such that 
 
 d(e^, e 2 ) = max d(e, e^) . 
 ee E 
 
 tlhe reason for using multiplication in steps 2, ..., s-1, is to select 
 events which are at similar distances from each other. 
 
10 
 
 The number of operations required by the algorithm is 
 
 approximately : 
 
 s-1 
 N = c(E) + r (cCEj) + 2 2 j(t-j) 
 
 where c(E), c(E ) is the cardinality of E and E., respectively. (E . are 
 assumed to be all of equal size.) An 'operation' may involve computing the 
 distance between two events, the comparison of two distances, the comparison 
 of the distance with a threshold, etc. In the modified form of the algorithm 
 we have: 
 
 s-1 
 
 N 
 
 ' = c(E) + r .2 jc(E ) 
 
 For example, if c(E) = 3000, c(E ± ) = 100, r * 30, s = 10, then N = 273000 
 (N' = 268000), and the cardinality of the reduced set would be c(E ) = 300. 
 
 1.4 User's Guide for ESEL 
 
 INPUT FILES 
 
 PARMSX - A file with information about variables. 
 
 INST - A file with information about the sizes of event sets 
 
 which are in the data base and the sizes of representa- 
 tive sets of events. 
 
 EVNT - The file containing the data base. 
 PARMSX FILE 
 
 This file contains the number of variables in the event descriptions 
 in the data base, the range of values for each variable, the domain structure 
 for each variable, and the weight which should be given to each variable. The 
 first number in this file must be the number of variables in each event descrip- 
 tion. The next three specifications are all in the same form: a number, optionally 
 followed by a set of numbers. The first number may be used to set all values of 
 
11 
 
 the range, structure, or weight to a single value. If all values in a specifi- 
 cation are to be set to one value, then this first number should be this value 
 and there is no following set of values. If a value is to be specified for each 
 variable, the first number should be (0). For example, suppose there are 3 
 variables with the following situation: 
 
 x l x 2 X 3 
 
 max. value 3 2 4 
 structure interval nominal interval 
 weight 5 6 3 
 
 The file PARMSX would look like this: 
 
 3 324 303 563 
 
 max value structure weights 
 
 The first 3 indicates 3 variables, the first indicates that the 
 range will be specified for each variable (another value than would indicate 
 that all variable ranges will be of that value) . The second indicates that 
 the structure of all variables will be individually specified. An interval 
 structure is specified by the number 3, any other number gives a nominal struc- 
 ture. The final indicates that weights will be specified independently for 
 each variable. 
 
 Here is another example: suppose there are 4 variables in the 
 data base with the following characteristics: 
 
 X l X 2 X 3 X 4 
 
 max. value 3 3 3 3 
 
 structure interval interval interval interval 
 weight 6 5 11 
 
 Then the PARMSX file would look like this: 
 
 6 5 11 
 
12 
 
 The 4 indicates that there are 4 variables describing objects in 
 the data base. The next two 3's indicate that all ranges are from to 3 and 
 that all variables are of interval structure. The indicates that weights will 
 be specified independently. 
 
 INST FILE 
 
 This file contains information about the number of events in each 
 class in the data base and the number of events from each class the program is 
 to select. Each class is specified by two lines in this file. The first line 
 specifies the number of events in the data base which correspond to the class, 
 the second line specifies the number of partitions and the total number of 
 events which are to be selected from the class. A class with events, par- 
 titions, and selected events terminates the file. 
 
 For example, a data base with 100 events, the first 50 of which are 
 to be in the first class, the next 20 in the second class and the last 30 in 
 the third may be specified as follows: 
 
 INST: 
 
 blank - the first line must be blank 
 
 50 - the first 50 events are in the first class 
 
 15 - using 1 partition, select 5 events 
 20 - the second class has 20 events 
 
 16 - using 1 partition, select 6 events 
 30 - the last class has 30 events 
 
 3 20 - using 3 partitions, select 20 events 
 
 
 
 - the last class 
 
 EVNT FILE 
 
 This file contains the actual data base. Events are stored as lists 
 of integers. Irrelevant values are stored as -1. The first line of this file 
 must be blank. 
 
 For example, a situation with 5 events and 3 variables: 
 
13 
 
 EVNT FILE 
 
 blank 
 0-13 
 
 2 2 3 
 
 3 3 1 
 3 11 
 10 
 
 OUTPUT FILES 
 
 OFILEX - A file with the selected events. 
 
 TOPT - A file with the remaining events which were not selected. 
 
 Each file is in the same form as the input file EVNT except that a class 
 number is appended to the beginning of each event. This output format is compatible 
 with the VL- mode of the INDUCE-1 (Larson, Michalski 77, Larson 77 a,b) program 
 and the program AQ11. 
 
14 
 
 2. INCREMENTAL GENERATION AND TESTING OF VL HYPOTHESES: Program AQ11 
 
 2. 1 Introduction 
 
 There are many situations when one starts with certain initial 
 hypotheses about given data and then, in the process of experimenting with 
 these hypotheses, has to modify them in order to preserve consistency with new 
 acquired facts. Such situations arise, e.g., in rule-based expert systems, where 
 in the course of a system's performance some rules are discovered to be incorrect 
 or incomplete and have to be modified. 
 
 A process of generating hypotheses (or descriptions) in steps, where 
 each step starts with certain working hypotheses and a set of (new) data and 
 ends with appropriately modified hypotheses, is called an incremental (or mul t i- 
 step ) generation of hypotheses . 
 
 The purpose of program AQ11 is to implement such an incremental 
 generation of hypotheses in the framework of the variable-valued logic system 
 VL- (Michalski 74) . Although from the viewpoint of the complexity of real 
 scientific research this framework is extremely restricted; nevertheless, 
 it is still sufficiently rich to provide an interesting research subject and, 
 also, to obtain solutions which may have practical applications. 
 
 Hypotheses are expressed here as (constant-free) disjunctive 
 normal VL expressions (DVL expressions*). A DVL expression is a disjunction 
 of terms, where a term isalogical product of selectors. A selector is a 
 
 statement in the form: 
 
 [x # R] 
 
 where x is a unary descriptor (variable) 
 
 # denotes any of the relational operators ■ £ ^ ^ 
 
 R is a list of constants which are elements of the domain of x (R is called 
 
 the reference of the selector) 
 
 *In the general case, DVL^ expressions involve constants and are multiple-valued logi 
 expressions [Michalski 74]. Here, for simplicity, we will assume initially, that 
 they are just binary (i.e., either satisfied or not satisfied), and have no constant 
 
15 
 
 When a DVL expression is evaluated for a given event, selectors are 
 
 interpreted as conditions (or questions) . A selector is satisfied if the value of the 
 
 variable in the event satisfies the condition, otherwise, it is not satisfied . 
 
 Some examples of selectors and their interpretation as conditions follows: 
 
 [x. = 1] is x. equal to 1? 
 
 l l 
 
 [x. = 1,3] is x. equal to 1 or 3? 
 
 [x. = 1..3] is x. between 1 and 3, inclusively? 
 l l 
 
 An example of a term: 
 
 [ Xl = 3][x 3 = 2,4,5][x 5 = 0] 
 
 The above term is satisfied if x 1 equals 2, x has value 2, 4, or 5 and 
 
 x_ has value 0. 
 
 An example of DVL- formula: 
 
 T V T V T 
 12 3 
 
 where T , T , T are terms. The formula is satisfied if term T or T~ 
 or T„ is satisfied. 
 
 A DVL formula is interpreted as a description of a set of 
 events, namely events which satisfy it. 
 
 2.2. Description of Methodology 
 
 Suppose there is given a set of hypotheses (DVL descriptions), 
 V = {V.}, i=l,...,m, and a family of event sets ('facts'), F={F }, which these 
 hypotheses are supposed to describe. Suppose that for any i, V. describes cor- 
 rectly only a part of the events from F . 
 
 The problem is to produce a new set of hypotheses, V = {V }, where 
 
 each V. describes all events from set F f , and does not describe events from 
 i i 
 
 any other event set F., j ± i. 
 
 The following solution to this problem is based on the multiple 
 application of a computer program implementing an efficient algorithm [Michalski 71] 
 for determining a cover , C(E /E ), of an event set E ] against the event set E . 
 
16 
 
 Such a cover can be interpreted as a DVL expression, which is satisfied by every 
 
 event in E and not satisfied by any event in E (or in E \ E- , if E and E, intersect) 
 x o O 1 o 1 
 
 The solution consists of 3 major steps: 
 
 Ste P 1 - The first step isolates those facts which are not consistent 
 
 with the given hypotheses. For each hypothesis, two sets 
 
 are created: 
 
 F - a set of events which should be covered by the hypothesis, 
 
 but are not 
 F - a set of events which are covered by the hypothesis, but 
 
 should not be covered. 
 (An event is said to be covered by a hypothesis if the event 
 satisfies the VL. formula which represents the hypothesis.) 
 Specifically, this step determines, for each i, 1=1,2, ... ,m, the sets*: 
 
 F + = F. \ v\ (8) 
 
 ill 
 
 F" = V. n F , j=l,2,...,m; tfi (9) 
 
 (see Figure 3) . 
 
 Thus, F. denotes events which should be covered by V but are not, and 
 
 F.. denotes 'exception' events, i.e., events in F., j^i, which are 
 ij F ' ' i 
 
 covered by V., but should not be covered, 
 l 
 
 Step 2 . This step determines, for each i, a generalized formula V. 
 describing all exception events (the union of sets F.., j=l,2,...,m, 
 j^i). This is done by generating, for given i and each j, a cover of 
 
 F against the events in the sets V, U F., i=l, 2, . . . ,m: 
 
 3 m + 
 
 V.. = C(F, . / U V.u F.) (10) 
 
 ij ij i=l i i 
 
 and then taking the logical union of V 
 
 ■ v vT (n) 
 
 j=i 
 
 3H 
 
 1 • i ij 
 
 *V denotes the set of events covered by formula V.. 
 
-17- 
 
 Illustration of sets ¥_, and F_, . 
 
 Figure 3 
 
18 
 
 The reason for this step is that it is computationally more efficient 
 
 to use formulas V than the union of E.., j=l,2,...,m; j^i. 
 
 Step 3 . New 'correct' hypotheses could be obtained now by 'subtracting' 
 
 from each V. the formula V. and 'adding' to it the set F.. To do this 
 11 l 
 
 directly, however, is difficult. Again, an advantage is taken of the 
 
 available computer program for generating covers C(E. /E ). 
 
 1 o 
 
 Namely, the new hypotheses, V., i=l,2,...,m, are determined as covers: 
 
 v] = C(F./U [(V^v") u Ffc ]) (12) 
 
 k=l 
 k^i 
 
 (The point is that directly simplifying a union of terms is difficult; 
 
 but ' substracting' a term from a term or generating a cover of an 
 
 event set against a DVL.. formula is easier). 
 
 Step 4 . This step determines the final representation of hypotheses 
 
 V.. The V. are DVL.. expressions which are unions of terms. Some terms 
 i l 1 
 
 in a V. may represent (cover) only a few events in F . . Such 'low 
 weight' terms are replaced by the events (facts) themselves (since an 
 event takes less memory than a term). In program AQ11, parameter 
 PUNY specifies the minimum percent of events which a term has to cover 
 to be a 'high weight' term. 
 
 For example, if PUNY = 0.02, and a set F. has 100 events, then all 
 terms which cover 3 or more events (3 > 0.02 x 100) are 'high weight' 
 terms. Terms which cover 1 or 2 events are replaced with those events. 
 
 2. 3 An alternative way of handling exception events 
 
 In the procedure above, the exception events were represented by terms 
 in V . . If the number of exception events is small, it can be easier to handle the 
 events without turning them into expressions V.. The 'substraction' (denoted by \) 
 of an event e from a term T (in a given formula) is done by logically multiplying 
 the term by the negation of the event: 
 
19 
 
 T\i e = T a e (13) 
 
 In order to use this way of handling exception events, in program 
 AQ11 the parameter STGY should be set to value 2 (STGY=2) . 
 
 The result of operation (13) can produce several terms. Anyone of them 
 is sufficient to be used in the new hypothesis. In program AQ11, there is a para- 
 meter //EX which specifies how many such terms a user wants to store for representing 
 a hypothesis. If the number of generated terms is larger than //EX, the program 
 Selects //EX 'best' terms according to the criteria list. 
 
 2.4 Additional Features 
 
 There may exist certain restrictions on the event space which must 
 hold in the resulting formulas. A restriction may be of the form 
 
 [x =2] -> [x =NA] (NA = not applicable) 
 
 which is read "if x has the value 2 then the variable x is not applicable." 
 
 The implementation of these restrictions can be viewed as an extra set of 
 
 hypotheses V ,, which is included in the set E of all covers: 
 n+1 
 
 C(F./E°UV L ,) ' 
 l n+1 
 
 Due to the techniques used in the covering algorithm (namely, the use of para- 
 meter 'maxstar', see p. 27), this may not be the best approach since only a 
 few terms in each intermediate quantity are retained. Therefore, the program 
 imposes these restrictions on all facts in the set F={F.}. Using the above 
 restrictions, an event 
 
 e = (1 3 2) 
 is replaced with 
 
 e = (NA 3 2) 
 
20 
 
 2. 5 Testing Procedure 
 
 By applying the above described part of AQll program one can determine 
 DVL descriptions (hypotheses) of classes of objects from examples of objects 
 representing individual classes. An obvious problem arises of testing the validity 
 of the derived descriptions. This is done by applying the descriptions to new 
 examples of objects with known class membership. The results of such testing are 
 usually represented in a form of a confusion matrix . This matrix specifies for 
 each class ( a row in the matrix), the numbers of testing objects of this class, 
 which were assigned by the descriptions to individual classes (corresponding to 
 columns of the matrix) . 
 
 Below is an example of a confusion matrix involving 2 classes: a 
 class of cancer cells, and a class of non-cancer cells: 
 
 Class 
 (Correct Assignment) 
 
 Assigned Decision 
 
 Cancer cells 
 Non-cancer cells 
 
 Cancer cells 
 
 Non-cancer cells 
 
 28 
 7 
 
 2 
 23 
 
 Entries on the diagonal indicate the correct decisions, entries outside of the 
 diagonal - incorrect decisions. For example the number 7 in the second row 
 indicates that 7 (testing) non-cancer cells were classified incorrectly as cancer eel] 
 
 This form of confusion matrix is adequate if an event (object) either 
 satisfies or does not satisfy a formula. In general, however, it is desirable to 
 consider the degree to which a given event e satisfies or matches a formula. Such 
 a degree, called degree of consonance (or degree of match ) and denoted DC(e,V), 
 is computed according to an evaluation scheme . An evaluation scheme consists 
 of definitions for computing: 
 
 (1) DC(S,e) - a degree of consonance between a selector and an event (briefly, 
 degree of consonance of a_ selector ) , 
 
 (2) DC(T,e) - a degree of consonance of a term (a product of selectors), 
 
 (3) DC(V,e) - a degree of consonance of a DVL formula (union of terms), 
 
21 
 
 DC({V }, e) - a degree of consonance of a set of formulas (describing the 
 
 same class) . 
 
 Many different evaluations schemes can be applied for evaluating 
 DVL-, formulas. Methods developed in many-valued logic (e.g., Recher 69) and 
 fuzzy reasoning (e.g., Zadeh 74, Gaines 76) are applicable here. We will 
 describe the evaluation scheme currently implemented in program AQ11, 
 and give suggestions for other evaluation schemes. 
 
 (1) Definition of degree of consonance of a selector. 
 
 The basic definition of the degree of consonance, DC(S,e), of a 
 
 selector comes from the evaluation rules in VL [Michalski 74]. Assuming 
 
 that the output domain of the formulas D = {0,1} we have: 
 
 1, if the value of appropriate variable in e 
 satisfies the selector S 
 
 DC(S,e) = 0, if it does not satisfy S 
 
 * the value is unspecified 
 
 For example, suppose event e = (x ,x„,x ) = (3, 1, 1), and selector S is [x =1,3]. 
 We have D(e,S) = 1, because value of x_ in e is a member of the reference of the 
 selector (i.e., 1 is member of {1,3}). (Fig. 4). 
 
 Alternative evaluation schemes can take into consideration the 
 structure of the domainof the variable in the selector. If a variable is linear, 
 it seems that the above definition of DC(e,S) is too rigid. For example, if a 
 ,linear variable x =13 and S: [x =14.. 18], the selector is evaluated to 0, while 
 it seems desirable to evaluate it to some value greater than (since 13 is so 
 'close' to 14). This means that one could accept a 'bell-shaped' function for 
 evaluating interval selectors (Fig. 5). 
 
 The concept of 'trimming' a term can be also useful here. In an untrimmed 
 ( extended ) term, references of selectors (sets of values) are as large as possible 
 without leading to a contradiction, i.e., intersectionwith formulas of different classes 
 In the trimmed term, references are as small as possible, providing that the term 
 jJtill covers the same learning events as the extended term and preserves the type 
 
-22- 
 
 DCn 
 
 
 
 Selector S:[x=l,3] 
 
 ~T~ ~T~ ~J~ ~4~ *"x 
 
 A graphical illustration for evaluation selector [x =1,3] 
 
 Figure 4 
 
 DCA 
 
 
 
 -if L 
 
 T---— I 
 
 12 345 6 78 
 
 A bell-shaped (A) versus step-shaped (B) function for 
 evaluating a linear selector [x.=3..5] 
 
 Figure 5 
 
23 
 
 of selectors, e.g., if the reference of a linear selector lsa..b, then in 
 the trimmed selector it will be an interval a . .b , a<a b <b. 
 
 An evaluation function can assign DC = 1 when a variable has value 
 within the 'trimmed* reference, assign DC = 0, when the variable has value 
 outside of the extended reference, and assign DC = B, 0<B<1, otherwise. 
 (Fig. 6 a and b). 
 
 (2) Definition of the degree of consonance of a term. 
 
 In the definition of VL , the degree of consonance of a term was defined as the 
 minimum of values of selectors in the term. In AQ11, the degree of con- 
 sonance of a term is computed as the ratio of the number of selectors 
 satisfied in the term to the total number of selectors in the term. 
 
 If all selectors in a term are satisfied, then both definitions 
 give the same value. If this is not the case, the latter definition differ- 
 entiates between the terms with different numbers of selectors 
 satisfied, while the former does not, (which is a desirable feature). 
 
 As an alternative, one could use here also a probabilistic logic 
 evaluation, which evaluates a term into the arithmetic product of DC-s of 
 selectors . 
 
 (3) Definition of the degree of consonance of a formula. 
 
 The degree of consonance of a formula V and an event e, DC(V,e), 
 is defined as the maximum of degrees of consonance DC(T ,e), of terms T , 
 in the formula (i.e., as defined in VL ), i.e.: 
 
 DC(V,e) = MAX {DC(T ,e)} 
 
 T.GV i 
 
 l 
 
 (4) Definition of the degree of consonance of a set of formulas of the same class. 
 
 It is useful sometimes to generate more than one formula describing 
 a given class . The reason is that having more than one formula per class may 
 
-24- 
 
 Generalized selector S: [x =1:3] 
 
 Trimmed selector 
 
 S': [x.=2] 
 
 a. Evaluation function for an interval 
 
 selector using the concept of 'trimming' 
 
 DC* 
 1 
 B 
 
 
 
 
 
 Generalized selector 
 S: [x.=l,2,4,5] 
 
 Trimmed selector 
 S: [x ± =2,5] 
 
 b. Evaluation function for a nominal 
 
 selector using the concept of 'trimming' 
 
 Figure 6 
 
25 
 
 improve the reliability of decision making. We have accepted in AQ11 the 
 definition that the degree of consonance DC({V },e), of a set of formulas, 
 {V }, as the average of the DC-s of the formulas in the set: 
 
 DC({V.},e) = AVG (DC(v.,e)} 
 
 V. 1 
 1 
 
 Given a set of formulas of different classes and an event, the DC is 
 computed between the formula (or a set of formulas) of each class and the event. 
 The classification decisions are ordered according to the value of DC. All de- 
 cisions with value DC within the distance TAU (see parameter TAU in section 2.5) 
 from the maximum value of DC, are rank 1 decisions (i.e., each of 
 these decisions are treated as equally justified ). Then the next 'best' 
 decision which is not of rank 1 is selected, and all decisions with DC within 
 TAU distance from DC of the selected decision are assigned rank 2. The 
 process repeats IRK times (see input parameter IRK in section 2.5). 
 
 For each testing event, values of DC of ranked decisions are printed 
 by the program AQ11 as rows in an 'extended' confusion matrix (see Fig. 7 
 for an example) . In the matrix, a decision of rank 1 which is correct is 
 underlined, and the number of rank 1 decisions for the given event is printed 
 in the 'TIES' column. If an event has some unspecified values, it may still 
 be possible to compute DC for certain classes, and for certain classes DC would 
 depend on the value of unspecified variables. If a decision of rank 1 
 is correct and would remain so no matter what values unspecified variables 
 take, then the decision is treated as a correct decision. In any other case, the 
 decision is excluded from computing total correctness statistics, and the column 
 'UNSP' corresponding to the given event has entry * . 
 
 The performance statistics for each group of events of one class are 
 printed in the last 2 rows of the group of rows associated with these events. 
 The rows contain the number and percentage, respectively, of events classified 
 
26 
 
 to each class. The percentage D f correct decisions is braced by 
 
 vertical bars. The matrix also contains a column #DEC/event specifying an 
 
 'indecision ratio', which is a ratio of all decisions of rank 1 to the total 
 
 number of events in the group (excluding rows with USP=*) . Figure 7 gives 
 
 an example of an extended confusion matrix. The matrix was computed for 3 classes 
 
 Dl, D2 and D3 described by formulas: 
 
 Dl: [xl=2][x3=l][x4=l] 
 
 Dl: [xl=2][x2=0] 
 
 D3: [x2=l][x4=l] 
 and for 3 testing events of class Dl: 
 
 el: (1011) 
 
 e2: (2111) 
 
 e3: (*011) (* denotes unspecified value) 
 
 The parameters were: TAU=0.1 and IRK=2: 
 
 The percentages 
 braced by 
 vertical bars 
 indicate the 
 correct 
 decisions. 
 
 
 
 
 
 Assig 
 
 ned Dec 
 
 ision 
 
 Correct Assign 
 
 #DEC/Event 
 
 TIE 
 
 UNSP 
 
 Dl 
 
 D2 
 
 D3 
 
 Dl 
 
 1-33 
 
 
 
 .68 
 
 .50 
 
 .50 
 
 2 
 
 
 1.00 
 
 .50 
 
 1.00 
 
 
 * 
 
 
 
 0.50 
 
 
 
 2 
 
 
 
 1 
 
 
 
 |67%| 
 
 0% 
 
 33% 
 
 An example of an extended confusion matrix 
 Figure 7 
 
 The value of #DEC/Event is 1.33 because there were 4 decisions of rank 1 
 (including new with UNSP=*) and 3 events (4/3=1.33). 
 
 Concluding, the program AQ11 permits a user to determine decision 
 formulas in an incremental way and then automatically test them on the testing 
 data. Thus, it is a 'complete' tool for making experiments in inductive learning 
 
 of DVL descriptions of data. 
 
27 
 
 2. 6 Program User's guide 
 
 The program requires two sets of input parameters: control 
 parameters (read in PL/l data format) and data parameters (read in PL/l 
 list format). All but two control parameters have default values, there- 
 fore only two (NV, NCL) must be specified. All data parameters must be 
 specified (although some may be omitted if certain control parameters 
 are set). The following is a list of all parameters which the program 
 currently accepts. Default values (if any) are given in the examples. 
 
 2.6.1 Control parameters 
 
 • NV (no default value) 
 Example: NV = 50 
 
 Possible values: integer in the range 1:50 
 
 This parameter specifies the number of variables which are 
 available to describe each event. 
 
 • NCL (no default value) 
 Example: NCL = 19 
 
 Possible values: any positive integer 
 
 NCL specifies the number of classes of events (or event sets) 
 which are to be input to the program. The program will then 
 generate a cover of each event set. 
 
 • MAXSTAR 
 
 Example: MAXSTAR = 10 (default value) 
 
 Possible values: any positive integer 
 
 MAXSTAR is the maximum number of complexes (terms) which are kept 
 in any intermediate star (see AQ7 documentation for a further 
 description [Larson, Michalski 75]). The procedure for selecting 
 'best terms' is somewhat different in this program than in 
 AQ7. (When each intermediate star is trimmed, the user-specified 
 cost function is used rather than only the first two criteria). 
 
28 
 
 • NPASS 
 
 Example: NPASS = 1 (default value) 
 
 Possible values: any non-negative integer 
 
 If NPASS = 0, the program will only execute the confusion 
 matrix phase and then terminate (Set TEST = 'l'B). 
 
 If NPASS = 1, the program will form a cover of the facts 
 using input formulas (if any), then evaluate the formulas 
 if TEST = »1'B. 
 
 If NPASS > 1, the program will partition the set of facts into 
 sets whose size depends on the PCT parameter (see below) and form 
 hypotheses based on old hypotheses and the partitioned facts. 
 Then, new sets of events will be taken in turn and hypotheses 
 formed based on the entire set of events taken up to the point 
 and the hypotheses from the last pass. 
 
 • TEST 
 
 Example: TEST = 'O'B (default value) 
 
 Possible values: 'l'B or 'O'B 
 
 If TEST is 'l'B, then a confusion matrix will be computed 
 
 after each pass. The testing events must be given to the 
 program in the file TESTF. 
 
 • RESTRICT 
 
 Example: RESTRICT = 'O'B (default value) 
 Possible values: 'l'B or 'O'B 
 
 If RESTRICT is 'l'B, then a set of restrictions is accepted 
 
 by the program (see parameter REST) and applied to all events. 
 
 • RTEST 
 
 Example: RTEST = 'O'B (default value) 
 Possible values: 'l'B or 'O'B 
 
 If RTEST is 'l'B, the restriction will also be applied 
 
 to testing events. 
 
29 
 
 TRANS 
 
 Example: TRANS = 'O'B (default value) 
 
 Possible values: 'l'B or 'O'B 
 
 If TRANS is set to 'l'B, then the variable names and values 
 are translated into descriptive names in the output. In this 
 case, a file TRAN must be given to the program (see TRAN below) 
 
 • PUNY 
 
 Example: PUNY =0.02 (default value) 
 
 Possible values: real value in interval [0:1] 
 
 All terms which cover less than a percent ' (PUNY*ioo) of the 
 events of the corresponding set will be discarded in the next 
 pass (i.e., if a term covers 2 events, PUNY = 1, and there are 
 23 events in the training set, this term will not be used in the 
 next pass). 
 
 • TAU 
 
 Example: TAU = .019 (default value) 
 
 Possible values: real values in interval T0:ll 
 
 This parameter relates to the computation of the confusion 
 matrix. Any two values (degrees of consonance) within TAU 
 of each other are considered to be of the same rank. For 
 example, if .98 is the highest decision value for a testing 
 event, any decision with a value between .96 and .98 would 
 be a rank 1 decision (assuming default value of TAU) . 
 
 • IRK 
 
 Example: IRK = 2 (default value) 
 
 Possible values: positive integer less than NCL 
 
 This parameter also relates to the computation of the confusion 
 matrix and controls the number of decisions printed out. All 
 degrees of consonance of rank not greater than IRK are printed, 
 others are not printed. If IRK = 1, only rank 1 degrees are 
 printed. One exception to this is that the degree associated 
 with the correct decision is always printed. 
 
30 
 
 NCRIT 
 
 Example: NCRIT = 2 (default value) 
 
 Possible values: integer in range [1:4] 
 
 NCRIT specifies the number of cost criteria which should be applied 
 when computing the cost of a formula (see CRIT) . CRIT(l) through 
 CRIT (NCRIT) will be used, all others will be ignored 
 
 CRIT (I) 
 
 Example: CEIT(l) = 1 CRIT(2) = 2 (default value) 
 
 Possible values: each CRIT(l) may have values 1, 2, 3, 5 and 9 
 
 CRIT (I) = J specifies that the I-th criterion in order will be the 
 
 cost function J. There should be NCRIT specifications indi- 
 cating the cost function which will be used (j) and the order 
 in which they will be applied (i). Available cost functions 
 are the following: 
 
 1. Maximize the number of events covered by the given term, 
 and not covered by previous terms 
 
 2. Minimize number of selectors 
 
 3. Minimize cost of all variables in this term. If 
 this criterion is specified, costs of variables 
 must also be specified (see Z parameter) 
 
 5. Minimize the number of events of £0 covered 
 
 9 . Maximize total number of events covered by a term 
 
 #EX 
 
 Example: #EX = 1 (default value) 
 
 Possible values : positive integer 
 
 During some phase of the program, exception terms are 
 formed (description of events which are covered by hypotheses 
 but should not have been). #EX gives the number of redundant 
 exception terms (i.e., the terms which cover the same 
 event ) . 
 
31 
 
 • TR 
 
 Example: TR = 'O'B (default value) 
 
 Possible values: '1' or '0' B 
 
 TR gives a trace of the multi-step process giving the exception 
 terms and the size of the sets F" 1 " and F~ described in Section 2.1. 
 
 • STGY 
 
 Example: STGY = 1 (default value) 
 
 Possible values: 1 or 2 
 
 If STGY has the value 1, then exception terms are formed 
 for events in the sets F~. If STGY has the value 2, then the 
 previous hypotheses are multiplied by the complement of the 
 exception events of the set F~. 
 
 • INDEP 
 
 Example: INDEP = 'O'B (default value) 
 
 Possible values: *0'B or 'l'B 
 
 If INDEP is 'l'B, then the number of independently covered 
 events are printed for each complex. Otherwise, only the num- 
 ber of new events and the total number of events covered are 
 printed. 
 
 • TITLE 
 
 Example: TITLE = (default value) 
 
 Possible values: non-negative integer 
 
 TITLE specifies the number of cards which are in the title. 
 The title cards must follow the semi-colon which terminates 
 the set of control parameters. 
 
 • OPT 
 
 Example: OPT = 'l'B (default value) 
 
 Possible values: 'l'B or 'O'B 
 
 If OPT is 'l'B, then after each pass a table is printed indicating the 
 numbers of times each cost criterion is evaluated (number of 
 terms for which the cost function is evaluated) . 
 
32 
 
 • MODE 
 
 Example: MODE = 'IC (default value) 
 
 Possible values: 'IC, 'DC*, »VL' 
 
 If MODE = 'IC, then covers are allowed to intersect over 'DON'T 
 CARE'areas of the event space. If MODE = 'IC, the covers are 
 constrained to be disjoint. MODE = *VL' gives order dependent 
 covers. 
 
 • CPXEV 
 
 Example: CPXEV = 'l'B (default value) 
 
 Possible values: 'l'B or 'O'B 
 
 If this parameter is 'l'B, then during the testing phase a 
 table is printed which gives the number of times each term 
 was needed to give a correct decision. 
 
 • GEN 
 
 Example: GEN = 'l'B (default value) 
 
 Possible values: 'l'B or 'O'B 
 
 If this parameter is 'l'B, then only the necessary parts of 
 the reference of each output complex are printed (i.e., a new 
 term is created from the generated term which has the following 
 properties) : 
 
 a. The new term covers the same events. 
 
 b. The new term contains the same variables. 
 
 c. The references in the new term are as small as possible. 
 
 • ECHO 
 
 Example: ECHO = 'ERZ' (default value) 
 
 Possible values: A string contains any of the characters ZERF 
 
 If the letter appears, the corresponding input data is echoed. 
 
 E = Events 
 R = Restrictions 
 Z = Variable costs 
 F = Input formulas 
 
 The default echos events, restrictions and variable costs if 
 they are in the input. 
 
33 
 
 • TOLERANCE (I) 
 
 Example: TOLERANCE(2) =0.0 (default value) 
 
 Possible values: integer or real in [0:1] 
 
 TOLERANCE (J) is the tolerance for the J-th criterion specified. 
 If it is an integer, then it is assumed to be an absolute tolerance, 
 Otherwise, it is a relative tolerance calculated by finding 
 TOLERANCE * (MAX-MIN) when MAX or MIN are the maximum and minimum 
 elements in the list of costs to be sorted. 
 
 • ORD 
 
 Example: ORD = 'l'B (default value) 
 
 Possible values: 'l'B, 'O'B 
 
 If ORD is 'l'B, then the program will reorder events in EO, 
 in decreasing order, with regard to the distance from e.. . 
 
 • N-TAU 
 
 Example: N-TAU = (default value) 
 
 Possible values: integer in [0:8] 
 
 This parameter, if not zero, generates a TAU estimation table 
 giving summary information for each class in the evaluation 
 procedure using N-TAU values of TAU beginning at with increments 
 of TAU- INC. 
 
 • TAU- INC 
 
 Example: TAU- INC = .02 (default value) 
 Possible values: Real in [0:1] 
 
 This is the increment used in the TAU estimation table. 
 
34 
 
 Semi colon (;): This must be entered to terminate the control 
 
 parameters. 
 
 2.6.2 Data parameters 
 
 These parameters have the names as used in the program. In the 
 input to the program only their values are specified, in the order 
 given here (See fig. B-l (a) for an example.) 
 
 § TITLEC 
 
 Possible values: The number of lines specified by the TITLE 
 
 parameter 
 
 These lines are printed at the top of the output. 
 
 • NSPEC 
 
 Possible values: An integer in the range [0:NV] 
 
 Number of variables for which a structure is to be specified. 
 
 • VTYPE 
 
 Possible values: 'F', 'I' 
 
 The NSPEC variables will be of this type ('F 1 - nominal 
 variable, 'I' -linear variable). 
 
 • TYPE 
 
 Possible values: A list of NSPEC integers in the range [1:NV] 
 The list indicates variables of VTYPE. 
 
 Example of NSPEC, VTYPE, TYPE: 3'F' 1 3 5 
 
 There are 3 variables of type 'F' (nominal) namely, 
 variables 1, 3, and 5. The rest will have type 'I'. 
 
 • NL 
 
 Possible values: A list of NV positive integers in the range [1:8] 
 
 This parameter gives the number of values which each variable 
 can assume. 
 
 Example: 12 4 
 
35 
 
 • NE 
 
 Example : 3 1^+ 
 
 Possible values: A list of NCL integers in the range [0:NEVE] 
 
 The parameter specifies the number of events in each event 
 set. The sum should add up to NEVE. 
 
 • NF 
 
 Example : 3^1 
 
 Possible values: A list of NCL non-negative integers 
 
 This parameter specifies the number of terms of the hypothesis 
 for each event set. 
 
 • PCT 
 
 Example: .2 .k 1 
 
 Possible values: A list of NPASS real values in range [0:1] 
 
 (except if NPASS = 1, PCT is assumed to be l) 
 
 In this example, 20$> of the events will be described first, 
 then an extra 20$, of the events will be added and a description 
 formed using previous hypotheses. Finally, the complete set of 
 events is used (see NPASS above). 
 
 • REST 
 
 Example: (xl2 = l)-> (xlk = *); 
 
 (xl3 = 2)-> (xl = *) (xh = 1). 
 
 Possible values: A list of decision rules separated with semi- 
 colons and terminated with a period 
 
 This restriction will be applied to all events (i.e., added to 
 current specifications). RESTRICT must be set to specify restric- 
 tions. An * in the reference indicates that this variable is not 
 applicable. Restrictions are separated by semi-colons and the 
 list of all restrictions is terminated by a period. 
 
36 
 
 • EVENT 
 
 Possible values: NEVE lists of events, NEVE = SUM(NE) 
 
 There are two ways in which events can be specified, and the 
 two types of specifications can be mixed. 
 
 1. An event can be specified as a list of values, one value for 
 each variable. The values can be: 
 
 a) non-negative integer — indicating value of the variable 
 
 b) -1 — variable does not apply 
 
 c) -2 — do not know the value 
 
 Example: 3 2 0-1-20 
 k 1 2 
 
 2. An event can also be specified by a VLl formula which is 
 
 preceeded by a line which says FORMULA. Each formula 
 must be terminated by a semi-colon. 
 
 Example : FORMULA. 
 
 (xl = 2) (x3 = 0); 
 
 FORMULA. 
 
 (x3 = 1) (x21 =2); 
 
 • FORMULA 
 
 Possible values: NCL lists of formulas, each having NF complexes 
 There are two ways to specify a formula: 
 
 1. as a FORMULA as in the event specification, 
 
 2. as a binary positional bit string in PL/1 List Format. 
 
 • Z 
 
 Example: Z(l,2) = 9 Z(3,*0 = 57; 
 
 Possible values: Integer values terminated by semi-colon in 
 PL/1 Data Format 
 
 These are costs of the variables which are accepted if CRIT(I) = 3 
 has been specified for the event set I. If Z value is not specified 
 for some variables, it is assumed to be 1. Z(I,J) = Y means that 
 variable x has cost Y for event set I. 
 
37 
 
 2.6.3 Files 
 
 • TEST 
 
 This file must be included if the parameter TEST is set to 'l'B, 
 The first line of this file contains a list of NCL values 
 indicating the number of test events for each event set. 
 The list of testing events follows. Each event is specified 
 as a list of variable values with coding of -1 and -2 as 
 above. 
 
 • TRAN 
 
 This file must be included if TRANS is 'l'B. It contains the 
 names of all variables and variable values. Each name will be 
 truncated: variable names to 20 characters, value names to 10 
 characters. The format is the following: For each variable 
 one specifies: 
 
 variable name, variable value names 
 
 Each name must be in single quotes. 
 
 Example : 
 
 //TRAN DD * 
 
 •TEMPERATURE' 
 
 'COLD 1 
 
 'MODERATE' 
 •WARM' 
 
 •HUMIDITY' 
 
 •DAMP' 'DRY 1 
 
 • TESTF 
 
 This is a temporary file which the program uses to store test 
 events from one pass to the next. See JCL set up for specifi- 
 cation of this file. 
 
 This completes a description of the input specification to the pro- 
 gram AQ11. For a user's convenience, appendix A gives a summary of the input 
 specification. Appendix B gives an example of input and output from the 
 program. 
 
38 
 
 2.6.4 Program Output 
 
 Most of the output is self-explanatory (see appendix B) . The input 
 data is echoed when specified. Then, the formulas for each pass are printed. 
 To the right of each term is a pair of numbers which specify the number 
 of new events covered and the total number of events covered by that term. 
 
 After all the formulas for one pass are printed, a confusion matrix 
 is printed for these formulas and given testing data. Information about each 
 pass is printed in turn until all passes are complete. 
 
 If two events of different classes are identical, then a message 
 is printed indicating a non-disjoint representation of classes. In such a 
 situation, if a cover C(E1/E0) is being created, then the event of EO is 
 ignored . 
 
 The output from the evaluation part of the program consists of an 
 extended confusion matrix, as described in section 2.5. 
 
 Two other tables are printed at the user's option. If CPXEU is 
 set, then a table listing the number of correct decisions for each complex 
 is given. If N-TAU is not zero, then TAU estimation table is printed, giving the 
 indecision ratio and number of correct decisions for each class for N-TAU 
 values of TAU , beginning with in increments of TAU-INC. 
 
 3 . SUMMARY 
 
 We have described here the underlying methodology and computer programs 
 for selecting 'best' learning VL, events (program ESEL) , and incrementally 
 generating VL.. hypotheses for given event sets (e.g., selected by program ESEL), 
 and then automatically testing them on the supplied testing events (program AQ11) 
 These two programs constitute a package which can be used for making 
 experiments in induction of descriptions from examples in various applied fields. 
 
39 
 
 ACKNOWLEDGMENT 
 
 This work was supported in part by the National Science Foundation 
 under Grant NSF MCS 76-22940, and in part by a Senior Visiting Fellowship 
 from the Science Research Council in U. K. 
 
 The paper was written while one of the authors, R. S. Michalski, 
 was spending his sabbatical leave at the University of Essex in England. 
 This author would like to express here his deepest gratitude to Professor 
 Brian Gaines, the head of Electrical Engineering Department of University 
 of Essex University, for unusual hospitality and help to organize 
 life in the new environment, as well as for the numerous and inspiring 
 discussions. 
 
 Thanks are also due to Tom Dietterich for the strenuous job of 
 proofreading of this paper. 
 
40 
 
 REFERENCES 
 
 1. Cuneo, R. P. Selected Problems of Minimization of Variable-Valued Logic 
 Formulas. Report No. 726, Department of Computer Science, University 
 
 of Illinois, Urbana, Illinois, 1975. 
 
 2. Forsburg, A. S. A user's guide for AQPLUS , on internal report, Department 
 of Computer Science, University of Illinois, Urbana 1975. 
 
 3. Gaines, B. R. Foundations of Fuzzy Reasoning, International Journal 
 of Man-Machine Studies, No. 8, 1976. 
 
 4. Jensen, G. M. Determination of Symmetric VL ± Formulas: Algorithm 
 and Program SYM4. Report No. 774, Department of Computer Science, 
 University of Illinois, Urbana, Illinois, 1975. 
 
 5. Larson, J. Inductive Inference in the Variable Valued Predicated Logic 
 System VL : Methodology and Computer Implementation, Report No. 869, 
 DepartmenFof Computer Science, University of Illinois, Urbana, Illinois, 
 1977. 
 
 6. Larson, J. Induce 1 : An Interactive Inductive Inference Program in 
 VL Logic System. Report No. 876, Department of Computer Science, 
 University of Illinois, Urbana, Illinois, 1977. 
 
 7. Larson, J., Michalski, R. S. Inductive Inference of VL ± Rules. 
 Workshop on Pattern Directed Inference Systems, Honolulu, Hawaii, 
 May 1977. 
 
 8. Larson, J., Michalski, R. S. AQVAL/1 (AQ7) User's Guide and Program 
 Description. Report No. 731, Department of Computer Science, University 
 of Illinois, Urbana, Illinois, 1975. 
 
 9. Michalski, R. S. TOWARD COMPUTER-AIDED INDUCTION: A Brief Review of 
 Currently Implemented AQVAL Programs, Report No. 874, Department of 
 Computer Science, University of Illinois, Urbana, Illinois, May 1977. 
 
 10. Michalski, R. S. On the Selection of Representative Samples from Large 
 Relational Tables for Inductive Inference, Department of Information 
 Engineering, University of Illinois at Chicago Circle, July 1975. 
 
 11 Michalski, R. S. VL : Variable-Valued Logic System. 1974 International 
 Symposium on Multiple-Valued Logic, West Virginia University, Morgantown, 
 West Virginia, May 1974. 
 
 12. Michalski, R. S. A Geometrical Model for the Synthesis of Interval 
 Covers. Report No. 461, Department of Computer Science, University of 
 Illinois, Urbana, Illinois, 1971 
 
 13. Rescher, M. Many-Valued Logic McGraw-Hill, New York, 1969. 
 
 14. Stepp, R. Uniclass Cover Synthesis: Methodology and a Computer Program 
 Description, Report No. , Department of Computer Science, University 
 of Illinois, Urbana, Illinois, 
 
 15. Zadeh, L. A. Fuzzy Logic and its Application to Approximate Reasoning, 
 Proceedings IFIP Congress 1974, Vol. 3, North-Holland, 1974. 
 
41 
 
 APPENDIX A 
 
 AQ11 Input Specifications 
 
 1. ID parameters 
 
 Allow 150 to 180 K Bytes of storage for large problems. A very small 
 problem may be run in 120 K. Very few IOREQ's are used by the program; 
 500K ismore than enough. Time is the main variable which must be ad- 
 justed. Using the following parameters, an estimate of the time re- 
 quired for a large job can be given. 
 
 MAXSTAR = 1 NPASS =3 NV = 35 
 
 NCL = 19 NEVE (training) = 307 No evaluation 
 
 Time: 1 min. Region: 174K 
 
 Changing MAXSTAR to 7 and requesting evaluation using 388 events, the 
 time increased to 3 minutes for the training phase and 1 minute and 30 
 seconds for evaluation. 
 
 JCL 
 
 The following JCL is recommended: 
 
 // EXEC PGM=ITCN3,REGION=180K,PARM=' ISA (N), REPORT' 
 
 //STEPLIB DD DSN=USER.P2123.ITCN3,DISP=SHR 
 
 //SYSPRINT DD SYS0UT=A 
 
 //PLIDUMP DD SYS0UT=A 
 
 //SAVEF DD DSN=&&TEMP,UNIT=DISK,SPACE=(TRK,(10,1)) 
 
 //FT06F001 DD SYS0UT=A 
 
 //SYSIN DD * 
 
 input parameters and data 
 
 //TEST DD * 
 
 test data (if evaluation requested) 
 
 //TRAN DD * 
 
 translation data (if TRANS is set) 
 
 ISA(N): N should be the region requested minus 125, 
 e.g. 51K 
 
42 
 
 Control parameters 
 
 Parameter 
 
 NV 
 
 NEVE 
 
 NCL 
 
 MODE 
 
 MAXSTAR 
 
 ECHO 
 
 NCRIT 
 
 CRIT(l) 
 
 CRIT(2) 
 
 CRIT(3) 
 CRIT(A) 
 
 TITLE 
 
 RESTRICT 
 
 SAVE 
 
 GEN 
 
 PUNY 
 
 TR 
 
 NPASS 
 STGY 
 
 #EX 
 
 OPT 
 
 TRANS 
 TEST 
 RTEST 
 TAU 
 
 IRK 
 
 CPXEV 
 
 NGE 
 
 INDEP 
 
 TOLERANCE (I) 
 
 N-TAU 
 
 TAU- INC 
 ORD 
 
 Semi-colon (;) 
 Data parameters 
 Parameter 
 
 TITLE 
 NSPEC 
 
 TYPE 
 
 Default 
 
 r IC« 
 10 
 
 'ERZ' 
 2 
 1 
 2 
 
 5 
 9 
 
 
 
 'O'B 
 
 •O'B 
 
 •l'B 
 
 .02 
 
 'O'B 
 1 
 1 
 
 1 
 
 'l'B 
 
 'O'B 
 'O'B 
 'O'B 
 .019 
 
 2 
 
 'l'B 
 
 200 
 'O'B 
 
 
 .02 
 'l'B 
 
 Description 
 
 Number of variables 
 
 Total number of training events 
 
 Number of classes 
 
 Mode of operation 
 
 Maximum star size 
 
 Echo input 
 
 Number of criteria 
 
 Criterion 1 
 
 Criterion 2 
 
 Criterion 3 
 
 Criterion 4 
 
 Number of lines in title 
 
 Accept restrictions 
 
 Save formulas in a file 
 
 SAVEF 
 Trim eomplexes for output and 
 
 evaluation 
 The minimum percent of events which h 
 to be. covered by a term 
 Trace multi-step procedure 
 Number of steps 
 Way in which events of F sets 
 
 are handled 
 Numbers of redundant exception 
 
 complexes 
 Print statistics about number of times 
 
 each cost function is evaluated 
 Translate output using TRAN file 
 Evaluate formulas 
 Apply restrictions to test events 
 Equivalent threshold for rank 1 
 
 decisions 
 Number of ranked decisions which 
 
 are printed 
 Print statistics about satisfied 
 
 complexes during evaluation 
 Initial storage for complexes 
 Prints independent events if set 
 Tolerance for I tn specified test 
 function 
 
 Number of columns in 'tau' estimation 
 table 
 
 Increment in tau estimation table 
 Reorder the events in EO, in 
 decreasing order, with regard 
 to the distance from e . 
 Terminate control parameters 
 
 Description 
 
 Lines of title (if any) 
 
 Number of variables for which type 
 
 TYPE is specified 
 Type of these variables 
 
43 
 
 VSPEC 
 PCT 
 
 NL 
 NE 
 NF 
 RESTRICTIONS 
 
 EVENTS 
 
 FORMULAS 
 
 Z 
 
 5. Files 
 File 
 
 Indicies of variables of type TYPE 
 If NPASS > 1, the percent of events 
 
 to use in learning phase for each 
 
 pass 
 Number of values for each variable 
 Number of events in each set 
 Number of formulas in each set 
 If RESTRICT is set. Each pair of 
 
 rules must be separated with a 
 
 semi- colon; the entire list is 
 
 terminated with a period. 
 Lists of events in either of two 
 
 forms 
 Lists of formulas as in either of 
 
 two forms 
 If any CRIT(l) = 3, costs of variables 
 
 terminated with semi-colon 
 
 Description 
 
 TRAN 
 
 TEST 
 SAVEF 
 
 TRANS is 'l'B, the file names of 
 
 classes, variables and variable 
 
 values. Each name must be in 
 
 quotes 
 TEST is 'l'B, the file of test 
 
 events 
 SAVE is 'l'B, the output file of 
 
 formulas in bit positioned form 
 
 (list format) 
 
44 
 
 APPENDIX B 
 
 An Example of an Input to 
 and an Output from AQ11 
 
 This appendix contains an example of the program input and output 
 which involves most of the features of the program. Figure B-l gives the 
 input specification for this example. Figure B-2 gives the output which was 
 obtained. The first page of output repeats the input in a slightly extended 
 form. The next pages show formulas which were generated [in which variables 
 x 1 ,x 2 ,x 3 ,x^ are substituted by their names, and defined in the input (item P in 
 Fig. B-l)] and the results of the evaluation of formulas on testing events. 
 
 Explanation of Figure B-l . 
 
 The example involves four variables (NV=4; see item B in Figure B-l(a)), 
 
 which can take 2, 3, 4 and 2 values, respectively (item E) . All variables are 
 
 nominal, except variable x which is interval (item D) . There are 2 classes 
 
 (NCL=2; item B) , each represented by 6 learning events (items F, J). The 
 
 last event of set (class) 1 is specified as a DVL.. formula (in the middle of 
 
 item J) . Item H defines the percentage of learning events to be used in each 
 
 iteration (pass). The restriction on event space is given by a VL, decision 
 
 rule (item I) . There are initial hypotheses for class 1 and 2 hypotheses for 
 
 class 2 (item G) . Item K (fig. B-l(b)) lists the hypotheses for class 2, The cost of 
 
 variable 1 for set 1 is specified as 2 (item L) • the cost criteria for the selection of 
 complexes (terms) in the synthesis of covers are in the order 1, 2, 3, 9 
 
 (1 and 2 by default; 3 and 9 defined by CRIT(3)=3, CRIT(4)=9 in item B) . (For 
 
 the definition of cost of variables and cost criteria see [Larson, Michalski 75]). 
 
 Evaluation of the formulas to be generated is requested (//TEST DD*) 
 
 and sets of test events supplied, 4 events per class (items M, N) . A file 
 
 containing names of each class (set), each variable and each value of the variable 
 
 is also supplied (items 0, P) . 
 
45 
 
 Input for Example 
 
 A 
 
 // EXEC PGM=ITCN3,REGION=160K,PARM«'ISA(23K), REPORT' 
 
 //STEPLIB DD DSN=USER.P0012.ITCN3,UNIT=DISK,DISP=SHR 
 
 //SYSPRINT DD SYSOUT=A 
 
 //PLIDUMP DD SYSOUT=A 
 
 //SAVEF DD SYSOUT=B 
 
 //FT06F001 DD SYSOUT=A 
 
 //TEST DD DSN=JIM, UNIT=DISK,DISP= (NEW, DELETE), SPACE=(TRK, (10,10)) 
 
 //SYSIN DD * 
 NPASS=2 NV=4 MAXSTAR=30 CRIT(3)=3 CRIT(4)=9 
 TITLE=3 NEVE=12 ECHO='ERFZ' TEST='1'B TRANS- 1 l'B 
 
 RESTRICT='1'B NCL=2 INDEP= , 1 , B NGE=100 NCRIT=4 
 NTAU=4; 
 
 ************************************************************ 
 
 TEST RUN 
 ************************************************************************ 
 
 i »r 
 
 3 
 
 
 2 3 4 
 
 2 
 
 
 6 6 
 
 
 
 2 
 
 
 
 .5 1 
 
 
 
 (X1=0) 
 
 (X2=0) 
 
 (X3=0) -> 
 
 r 
 
 
 
 
 2 
 
 
 
 
 2 
 
 1 
 
 
 Oil 
 
 
 
 
 2 2 
 
 1 
 
 
 FORMULA 
 
 
 
 (X4=0) 
 
 (X2=l 
 
 2) (X1=0) 
 
 2 1 
 
 1 
 
 
 3 
 
 
 
 
 12 
 
 
 
 
 111 
 
 
 
 
 10 2 
 
 1 
 
 
 12 3 
 
 
 
 
 (X4=0) . 
 
 A JCL 
 
 B Control parameters 
 
 C Title 
 
 D NSPEC, UTYPE 
 
 E Number of levels/variable 
 
 F Number of events/pass 
 
 G Number of formulas/set 
 
 H Fraction of events/pass 
 
 I Restriction 
 
 J Event list (6 events/set) 
 
 v 
 
 Figure B-l (a) 
 
46 
 
 M 
 
 ( 
 
 r 
 
 < 
 
 ( 
 
 r 
 
 FORMULA 
 
 (Xl=l) (X2=l 2) 
 
 FORMULA 
 
 (Xl=l) (X3=2 3) 
 
 z(l,l)=2; 
 
 //TEST DD * 
 4 4 
 
 10 
 110 
 13 
 11 
 110 1 
 12 3 1 
 13 
 13 1 
 //TRAN DD * 
 'ACCEPT' 
 'REJECT' 
 'NEW' 
 
 'COLOR 1 
 
 'SIZE' 
 
 'WEIGTH' 
 
 (X4=0) (X3=0 1); 
 
 'YES' 
 'NO' 
 
 'RED' 
 'BLUE 1 
 ' ORANGE ' 
 
 'SMALL' 
 'MEDIUM' 
 ' LARGE ' 
 'X LARGE' 
 
 'HEAVY' 
 'LIGHT' 
 
 variable xl 
 ) values 
 
 variable x2 
 
 ) values 
 
 variable x3 
 
 values 
 
 variable x4 
 j values 
 
 K Formulas (2 guesses for set 2) 
 L Cost of variables (variable 1 
 
 has cost 2 for set 1) 
 M Number of test events/set 
 N List of test events 
 Name of each set 
 P Variable names and variable 
 
 value names 
 
 Figure B-l (b) 
 
4? 
 
 Explanation of Figure B-2 . 
 
 The first part contains an echo of the input (item A) . Next (item B) 
 
 prints the formulas obtained after the first iteration (pass) , which used 
 
 50% of the input events (first 3 events in both classes; see item H in 
 
 Fig. B-l) . The classes, variables and values of variables are specified by 
 
 names. Together with each complex (term) a triple of numbers is printed 
 
 (NEW, IND, COV) (item C) , where 
 
 NEW - denotes the number of events covered by the given complex and not 
 
 covered by the previous complexes on the list of complexes generated for this class 
 
 IND - denotes the number of events covered only by the given complex 
 
 COV - the total number of events covered by the given complex. 
 
 The program also lists the number of times each cost criterion has 
 been evaluated (item D) . Item E gives a symbolic specification of the obtained 
 formulas. Next, an extended confusion matrix is printed (item F) as the result 
 of evaluating the obtained formulas for the testing events (item N in Fig. B-l). 
 We can see from the matrix that all testing events of the first class ('ACCEPT') 
 have been misclassif ied, and all the events of the second class ('REJECT') have 
 been correctly classified. 
 
 Item G specifies the number of times each complex in the cover of 
 each class has been satisfied by testing events in the case of correct decisions 
 (second complex, C2, of class D2 correctly classified 3 testing events, and the 
 third one, C3, correctly classified 1 testing event). 
 
 Item H specifies the percentage of correct decisions and the indecision 
 ratio for various values of parameter TAU (generally, the higher TAU, the 
 greater is the number of correct decisions, but also the greater is the indecision 
 ratio) . 
 
 Item I lists the formulas obtained in the second iteration (which 
 used all the learning events), and item J - the corresponding confusion matrix. 
 We can see that this time 50% of testing events of class 1, and 100% of 
 class 2 were correctly classified. Items K and L give the same information 
 as items G and H, respectively, but for the formulas obtained in the second iteration. 
 
48 
 
 *************************************************************************** 
 
 TEST RUN 
 *************************************************************************** 
 
 TR= ' ' B 
 
 NPASS=2 
 
 
 NV=4 
 
 MAXSTAR=30 
 
 TITLE=3 
 
 NCL=2 
 
 
 TEST='1 , B 
 
 MODE= ' IC * 
 
 STGY=1 
 
 INDEP='1'B 
 
 
 GEN='1'B 
 
 ECHO='ERFZ' 
 
 NGE=100 
 
 TAU=1.89999E-02 
 
 
 IRK=2 
 
 CPXEV='1'B 
 
 TRANS='1'B 
 
 NTAU=4 
 
 
 TAU INC=1.99999E-02 
 
 ORD-'l'B'; 
 
 CRIT LIST 
 
 1 0.00 2 0.00 3 0.00 
 
 9 
 
 0.00 
 
 
 113 
 
 
 
 
 
 2 PASSES 0.50 1.00 
 
 
 
 
 NUMBER OF 
 
 LEVELS / VARIABLE 2 3 4 
 
 2 
 
 #EX=1 
 
 
 NUMBER OF 
 
 EVENTS / CLASS 6 6 
 
 
 SAVE='0*B 
 
 
 NUMBER OF 
 
 FORMUAS / CLASS 2 
 
 
 PUNY=1.99999E-02 
 RESTRICT= ' 1 ' B 
 RTEST='0'B 
 
 
 RESTRICTIONS ON EVENT SPACE 
 
 (X1=0) (X2=0) (X3=0) -> (X4=0) 
 
 LIST OF INPUT EVENTS 
 
 1 
 
 
 
 
 
 
 
 
 
 
 
 2 
 
 
 
 2 
 
 
 
 
 
 
 
 3 
 
 2 
 
 
 
 1 
 
 
 
 
 
 4 
 
 1 
 
 1 
 
 
 
 
 
 
 
 5 
 
 2 
 
 2 
 
 1 
 
 
 
 
 
 
 (X4= 
 
 =0] 
 
 (X2=l 
 
 ,2) 
 
 (X1=0) 
 
 (X3= 
 
 =D; 
 
 7 
 
 2 
 
 1 
 
 1 
 
 
 
 
 
 8 
 
 
 
 3 
 
 
 
 
 
 
 
 9 
 
 1 2 
 
 
 
 
 
 
 
 
 
 10 
 
 1 1 
 
 1 
 
 
 
 
 
 
 
 11 
 
 1 
 
 2 
 
 1 
 
 
 
 
 
 12 
 
 1 2 
 
 3 
 
 
 
 
 
 
 
 INPUT FORMULAS 
 
 (Xl=l) (X2=l,2) (X4=0) (X3=0,l); 
 (Xl=l) (X3=2,3); 
 
 COSTS OF VARIABLES WHICH ARE NOT 1: Z(l, 1)= 2; 
 
 TIME FOR INPUT OF DATA 36 CENTISECONDS 
 
 C NEW IND COV 
 
 C *****C0VER OF ACCEPT***** 
 
 CPX 1: (NEW= YES) (SIZE= SMALL) 
 CPX 2: (NEW= YES) (SIZE= LARGE) 
 
 *****COVER OF REJECT***** 
 
 CPX 1 
 CPX 2 
 CPX 3 
 
 (SIZE= MEDIUM) 
 (SIZE= X LARGE) 
 (NEW= NO) 
 
 # TIMES EV. 
 
 12 
 
 11 
 
 4 
 
 4 
 
 19 CENTISECONDS 
 
 Figure B-2 (a) 
 
 2) 
 1) 
 
 ( 
 
 1 
 
 1 
 
 1) 
 
 ( 
 
 1 
 
 1 
 
 1) 
 
 ( 
 
 1 
 
 1 
 
 1) 
 
49 
 
 FORMULAS FOR CLASS 1 
 CPX :(X1 = 0) (X3 = 0) 
 CPX :(X1 = 0) (X3 = 2) 
 
 FORMULAS FOR CLASS 2 
 
 CPX 
 CPX 
 CPX 
 
 (X3 = 1) 
 (X3 = 3) 
 (XI = 1) 
 
 NUMBER OF EVENTS IN EACH CLASS 
 
 ASSIGNED DECISION 
 
 CORRECT // EVENTS/ TIE UNSP 1 1 D 1 D 2 
 ASSIGN # RK1 DEC | | 
 
 
 D 1 ACCEPT .50 1.00 
 
 .50 1.00 
 .50 1.00 
 .50 1.00 
 
 4/4 | | 4 
 1.00 0% 100% 
 
 
 D 2 REJECT .50 1.00 
 
 1.00 
 .50 1.00 
 .50 1^00 
 
 4/4 | 4 | 
 1.00 0% 100% 
 
 NUMBER OF CORRECT DECISIONS /COMPLEX 
 
 COMPLEXES 
 EVENT SETS C1C2C3C4C5C6C7C8C9C10 Cll C12 C13 
 
 1 3 1 
 
 2 
 
 Figure B-2 (b) 
 
H 
 
 CLASS 
 
 1 
 2 
 
 TOTALS 
 
 TAU ESTIMATION TABLE (% CORRECT / INDECISION RATIO) 
 
 VALUE OF TAU 
 0.00 0.02 0.04 0.06 
 0.00/1.00 0.00/1.00 0.00/1.00 0.00/1.00 
 1.00/1.00 1.00/1.00 1.00/1.00 1.00/1.00 
 
 0.50/1.00 0.50/1.00 0.50/1.00 0.50/1.00 
 
 FINAL STATISTICS 
 INDECISION RATIO: 1.00 
 PERCENT CORRECT: 50.00 
 
 TIME TO EVALUATE FORMULAS 35 CENTISECONDS 
 
 
 CPX 1:(NEW= YES) (SIZE= SMALL MEDIUM) (WEIGHT= HEAVY) 
 
 CPX 2:(NEW= YES) (SIZE= LARGE) 
 
 CPX 3:(NEW= YES) (SIZE= SMALL) 
 
 *****COVER OF REJECT***** 
 
 CPX 1:(SIZE= MEDIUM) (WE IGHT= LIGHT) 
 
 CPX 2:(SIZE= X LARGE) 
 
 CPX 3:(NEW= NO) 
 
 CRIT # 
 1 
 2 
 3 
 9 
 
 = HEAVY) 
 
 ( 
 
 3 
 
 2 
 
 3) 
 
 
 ( 
 
 2 
 
 2 
 
 2) 
 
 
 ( 
 
 1 
 
 1 
 
 2) 
 
 
 ( 
 
 1 
 
 1 
 
 1) 
 
 
 ( 
 
 2 
 
 1 
 
 2) 
 
 
 ( 
 
 3 
 
 3 
 
 4) 
 
 // TIMES EV. 
 8 
 7 
 4 
 3 
 
 
 
 
 
 TIME FOR THIS PASS 
 
 20 CENTISECONDS 
 
 r 
 
 ASSIGNED DECISION 
 
 CORRECT // EVENTS/ TIE UNSP | I D 1 D 2 
 ASSIGN // RK1 DEC I I 
 
 D 1 ACCEPT 
 
 4/ 4 
 1.00 
 
 D 2 REJECT 
 
 4/ 4 
 1.00 
 
 I I 
 
 1^00 .50 
 
 1.00 .50 
 
 767 1.00 
 
 .67 1.00 
 
 2 
 50% 
 
 2 
 50% 
 
 II 
 
 ,50 1^00 
 
 1^00 
 
 ,67 1.00 
 
 so lTgg 
 
 I 4 | 
 0% 100% 
 
 Figure B-2 (c) 
 
51 
 
 NUMBER OF CORRECT DECISIONS /COMPLEX 
 
 EVENT SETS 
 
 COMPLEXES 
 C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 CIO Cll C12 C13 
 
 r 
 
 CLASS 
 
 1 
 2 
 
 TAU ESTIMATION TABLE 
 
 CORRECT / INDECISION RATIO) 
 
 VALUE OF TAU 
 0.00 0.02 0.04 0.06 
 0.50/1.00 0.50/1.00 0.50/1.00 0.50/1.00 
 1.00/1.00 1.00/1.00 1.00/1.00 1.00/1.00 
 
 TOTALS 
 
 0.75/1.00 0.75/1.00 0.75/1.00 0.75/1.00 
 
 
 FINAL STATISTICS 
 INDECISION RATIO: 1.00 
 PERCENT CORRECT: 75.00 
 
 TIME TO EVALUATE FORMULAS 
 
 28 CENTISECONDS 
 
 Figure B-2 (d) 
 

 51 
 
 NUMBER OF CORRECT DECISIONS/COMPLEX 
 
 EVENT SETS 
 
 COMPLEXES 
 C 1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 CIO Cll C12 C13 
 
 3 1 
 
 TAU ESTIMATION TABLE 
 
 CORRECT / INDECISION RATIO) 
 
 CLASS 
 
 1 
 2 
 
 VALUE OF TAU 
 0.00 0.02 0.04 0.06 
 0.50/1.00 0.50/1.00 0.50/1.00 0.50/1.00 
 1.00/1.00 1.00/1.00 1.00/1.00 1.00/1.00 
 
 TOTALS 
 
 0.75/1.00 0.75/1.00 0.75/1.00 0.75/1.00 
 
 FINAL STATISTICS 
 INDECISION RATIO: 1.00 
 PERCENT CORRECT: 75.00 
 
 TIME TO EVALUATE FORMULAS 
 
 28 CENTISECONDS 
 
 Figure B-2 (d) 
 
IBLIOGRAPHIC DATA 
 HEET 
 
 Title and Subtitle 
 
 1. Report No. 
 
 UIUCDCS-R-78-867 
 
 SELECTION OF MOST REPRESENTATIVE TRAINING EXAMPLE! 
 
 AND INCREMENTAL GENERATION OF VL HYPOTHESES: the underlying 
 methodology and the description of programs ESEL and AQ11 
 
 Author(s) 
 
 R. S. Michalski and J. B. Larson 
 
 Performing Organization Name and Address 
 
 Department of Computer Science 
 
 University of Illinois at Urb ana-Champaign 
 
 Urbana, IL 61801 
 
 2, Sponsoring Organization Name and Address 
 
 National Science Foundation 
 Washington, D.C. 
 
 5, Supplementary Notes 
 
 3. Recipient's Accession No. 
 
 5. Report Date 
 
 May 1978 
 
 8* Performing Organization Rept. 
 No. 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract /Grant No. 
 
 NSF MCS 76-22940 
 
 13. Type of Report & Period 
 Covered 
 
 14. 
 
 6. Abstracts ^ e p a p er describes the underlying theoretical framework and operational de- 
 rails of two programs, ESEL and AQll, for computer induction within the framework of the 
 /ariable-valued logic system VL^ (i.e., a statement calculus which involves variables 
 fith an arbitrary number of discrete values [Michalski 1974]): 
 
 ESEL - A supporting program for selecting 'most representative' learning 
 and/or testing VL^ events from a large data base of events. The 
 programs provides the input to the program AQll. 
 AQll - A program for incremental generation of VL-^ hypotheses, which are 
 generalized and optimized descriptions of input event sets. The 
 program also provides a facility for evaluating the performance 
 of these inferred hypotheses on testing events. 
 Given a large set of examples describing certain objects or situations, program 
 ESEL selects from them a small subset of the most representative ones. The examples 
 have to be in the form of VL 1 events, i.e., in the form sequences of values of certain 
 discrete variables (or descriptors). In selecting the events, the program distinguishes 
 among three types of descriptors: nominal descriptors, whose value set is an unordered 
 set, linear descriptors, whose value set is a linearly ordered set, and structured des- 
 criptors, whose value set is a tree-ordered set. 
 
 Events selected by ESEL are input to program AQll, which generates VL-, hypo- 
 theses describing the events. The program can work incrementally, i.e., given a work- 
 ing hypothesis (a set of rules) obtained at some stage, and a set of events, the pro- 
 gram can modify the hypothesis to make it consistent with the events. 
 
 Program AQll also has the facility to test the performance of a given hypothe- 
 sis on a set of testing events, and to compute an extended confusion matrix. 
 
 i7. Key Words and Document Analysis. 17o. Descriptors 
 
 VL., Variable-valued logic, inductive inference, incremental hypothesis generation, 
 incremental induction, hypothesis modification, rulebased deduction, deductive 
 inference 
 
 18. Availability Statement 
 
 f?. Security Claaa (This 
 Report) 
 
 IINC.I.ASSIF 
 
 m. 
 
 20. Security Class ( 1 his 
 Page 
 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 22. Price 
 
 °«M NTIS-35 (10-701 
 
 USCOMM-DC 40329-P7 1 
 
AUG 1 7 1978 
 
■