UIUCDCS-R-80-1007                                          UILU-ENG 80-1710

LEARNING AND GENERALIZATION OF STRUCTURAL DESCRIPTIONS:
Evaluation Criteria and Comparative Review of Selected Methods

by

Thomas G. Dietterich and Ryszard S. Michalski

February 1980

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

    Thomas G. Dietterich                  Ryszard S. Michalski
    Computer Science Department           Department of Computer Science
    Stanford University                   University of Illinois
    Stanford, California 94305            Urbana, Illinois 61801

ABSTRACT

Some recent work in the area of learning structural descriptions from examples is reviewed in light of the need in many diverse disciplines for programs which can perform conceptual data analysis. Such programs describe complex data in terms of logical, functional, and causal relationships which cannot be discovered using traditional data analysis techniques. Various important aspects of the problem of learning structural descriptions are examined, and criteria for evaluating current work are presented. Methods published by Buchanan et al. [1-3,19], Hayes-Roth [5-8], and Vere [21-24] are analyzed according to these criteria and compared to a method developed by the authors. Finally, some goals are suggested for future research.

Key words: machine learning, inductive inference, knowledge acquisition, structural learning, computer inference

This research was supported in part by the National Science Foundation under grants MCS-76-22940 and MCS-79-06614. This report was submitted for publication in the Artificial Intelligence Journal.

1. INTRODUCTION

1.1 Motivation and Basic Concepts

There are many problem areas where large volumes of data are generated about a class of objects, the behavior of a system, a process, etc. Scientists in fields as diverse as agriculture, chemistry, and psychology are faced with the need to analyze such data in order to detect regularities and common patterns. Traditional tools for data analysis include various statistical techniques, curve-fitting techniques, numerical taxonomy, etc. These methods, however, are often not satisfactory because they impose an overly restrictive mathematical framework on the scope of possible solutions. For example, statistical methods describe the data in terms of probability distribution functions placed on random variables. As a result, the types of patterns which they can discover are limited to those which can be expressed by placing constraints upon the parameters of various probability distribution functions.
Because of the mathematical frameworks upon which they are based, traditional methods cannot detect conceptual patterns such as the logical, causal, or functional relationships that are typical of descriptions produced by humans. This is a well-known problem in AI: a system, in order to learn something, must first be able to express it. The solution requires introducing more powerful representations for hypotheses and developing corresponding techniques of data analysis and pattern discovery. Work done in AI and related areas on computer induction and learning structural descriptions from examples has laid the groundwork for research in this area. This is not accidental, because, as Michie [17] has pointed out, the development of systems which deal with problems in human conceptual terms is a fundamental characteristic of AI research.

In this paper, we examine some of the recent work in AI on the subject of learning and generalization of structural descriptions. In particular, we will review four recent methods of inductive generalization: Buchanan et al., Hayes-Roth, Vere, and our own work (earlier well-known work by Winston was recently reviewed by Knapman [10]). We also outline some goals for research in this area. Attention is given primarily to the simplest form of generalization, namely the maximally specific conjunctive statements which characterize a single set of input events (called, for short, conjunctive generalizations). The reason for this choice is that most work done in this area addresses this quite restricted subject. Many of the researchers whose work we review in this paper have done work on other aspects of machine learning, including generalization using negative examples (Vere, Michalski) and developing discriminant descriptions of several classes of objects (Michalski). Due to space limitations, we have been unable to include these topics in this paper. Instead, these contributions are mentioned in the sections concerning extensions.

We begin the analysis by first discussing several important aspects of the problem of learning conceptual descriptions:

- types of descriptions: characteristic versus discriminant
- forms of descriptions
- types of generalization processes involved in generalizing descriptions (rules of generalization)
- constructive versus non-constructive induction
- general versus problem-oriented methods of induction.

1.2 Types of Descriptions

We distinguish between characteristic and discriminant descriptions [15]. A characteristic description is a description of a single set of objects (examples, events) which is intended to discriminate that set of objects from all other possible objects. For example, a characteristic description of the set of all tables would discriminate any table from all things which are non-tables. Psychologists consider this problem under the name of concept formation (e.g., Hunt [9]). Since it is impossible to examine all other possible objects, a characteristic description is usually developed by specifying all characteristics which are true for all known objects of the class (positive examples). Alternatively, in some problems there are available so-called "near misses" which can be used to more precisely circumscribe the given class.

A discriminant description is a description of a single class of objects in the context of a fixed set of other classes of objects.
It states only those properties of objects in the class under consideration which are necessary to distinguish them from the objects in the other classes. A characteristic description can be viewed as a discriminant description in which the given class is discriminated against infinitely many alternative classes.

In this paper we restrict ourselves to the problem of determining characteristic descriptions. The problem of determining discriminant descriptions has been studied by Michalski and his collaborators [12-16].

1.3 Forms of Descriptions

Descriptions, either characteristic or discriminant, may take several forms. In this paper we concentrate on generalizations in conjunctive form. Other forms include disjunctions, exceptions, production rules of various types, hierarchical and multilevel descriptions, semantic nets, and frames.

1.4 Generalization Rules

The process of inducing a general description from examples can be viewed as a process of applying certain generalization rules to the initial descriptions to transform them into more general output descriptions. This viewpoint permits one to characterize various methods of induction by specifying the rules of generalization which they use. Below is a brief review of various generalization rules based on the paper [16].

i) Dropping Condition Rule. If a description is viewed as a conjunction of conditions which must be satisfied, then one way to generalize it is to drop one or more of these conditions. For example:

    red(x) & big(x)  |<  red(x)

(this reads: "the description 'xs which are red and big' can be generalized to the description 'xs which are red'"; |< denotes the generalization operator).

ii) Turning Constants to Variables Rule. If we have two or more descriptions, each of which refers to a specific object (in a set to be characterized), we can generalize these by creating one description which contains a variable in place of the specific object:

    tall(Fred) man(Fred)  |
                          |<  tall(x) man(x)
    tall(Jim)  man(Jim)   |

assuming that the value set of x is {Fred, Jim, ...}. 'x' can be interpreted as representing 'a person from the group under consideration.'

These first two rules of generalization are the rules most commonly used in the literature on computer induction. Both rules can, however, be viewed as special cases of the following rule.

iii) Generalizing by Internal Disjunction Rule. A description can be generalized by extending the set of values that a descriptor (i.e., variable, function, or predicate) is permitted to take on in order that the description is satisfied. This process involves an operation called the internal disjunction. For example:

    shape(x, square)    |
                        |<  shape(x, (square or triangle or rectangle))
    shape(x, triangle)  |

where statements on the left of |< describe some single objects in a class, and the statement on the right is a plausible generalization. Using the notation of the variable-valued logic system VL21 [16], this rule can be expressed somewhat more compactly:

    [shape(x)=square]    |
                         |<  [shape(x)=square, triangle, rectangle]
    [shape(x)=triangle]  |

The ',' in the expression on the right of the |< denotes the internal disjunction. Although it may seem at first glance that the internal disjunction is just a notational abbreviation, this operation appears to be one of the fundamental operations people use in generalizing descriptions.
In general, this rule can be expressed:

    W [L = R1]  |<  W [L = R2]

where W is some condition, and R1 and R2 are sets of values linked by internal disjunction, with R1 a subset of R2. There are two other important special cases of this rule: first, when the descriptor involved takes on values which are linearly ordered (a linear descriptor), and second, when the descriptor takes on values which represent concepts at various levels of generality (a structured descriptor). In the case of a linear descriptor we have:

iv) Closing Interval Rule. For example, suppose two objects of the same class have all the same characteristics except that they have different sizes, a and b. Then it is plausible to hypothesize that all objects which share these characteristics but which have sizes between a and b are also in this class.

    W [size(x1) = a]  |
                      |<  W [size(x) = a..b]
    W [size(x2) = b]  |

In the case of structured descriptors we have:

v) Climbing Generalization Tree Rule. Suppose the value set of the shape descriptor is the tree of concepts:

    plane geometric figure
        polygon
            triangle
            rectangle
        oval figure
            ellipse
            circle

With this tree structure, values such as triangle and rectangle can be generalized by climbing the generalization tree:

    [shape(x) = rectangle]  |
                            |<  [shape(x) = polygon]
    [shape(x) = triangle]   |

1.5 Constructive Induction

Most methods of induction produce descriptions which involve the same descriptors which were present in the initial data. These methods operate by selecting descriptors from the input data and putting them into a form which is an appropriate generalization. Such methods perform non-constructive induction. A method performs constructive induction if it includes mechanisms which can generate new descriptors not present in the input data. These new descriptors are generated by applying rules of constructive induction. Such rules may be written as procedures or as production rules and may be based on general knowledge or on problem-oriented knowledge (for examples of constructive generalization rules see [16]).

Constructive induction rules can interpret the input data in terms of knowledge about the problem domain. Frequently, the solution to a problem depends upon finding the proper description for the problem, as in the mutilated checkerboard problem. An inductive program should contain facilities for constructive induction, including a library of general constructive induction rules. The user should be able to suggest new rules for the program to examine. In order to activate those rules which would be most useful, the program must be able to efficiently search the space of possible constructive induction rules.

Programs which perform constructive induction are more likely to find useful and interesting patterns in complex data, since they have the ability to examine the data using many different representations.
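To make the generalization rules of section 1.4 concrete, here is a minimal sketch in Python. It is not from the original report: the dictionary encoding of events and the function signatures are our own illustrative assumptions. An event is encoded as a mapping from descriptors to value sets, so that a set with several values directly represents an internal disjunction.

    # A minimal sketch of the generalization rules of section 1.4.
    # An event is a dict mapping a descriptor name to a set of values;
    # a set with several values represents an internal disjunction.

    def drop_condition(description, descriptor):
        """Rule (i): generalize by dropping one condition."""
        return {d: v for d, v in description.items() if d != descriptor}

    def internal_disjunction(desc1, desc2):
        """Rule (iii): generalize two descriptions over their shared
        descriptors by taking the union of their value sets."""
        return {d: desc1[d] | desc2[d] for d in desc1 if d in desc2}

    def close_interval(desc1, desc2, linear):
        """Rule (iv): for linear descriptors, extend two values to the
        whole interval between them; others fall back to rule (iii)."""
        out = {}
        for d in desc1.keys() & desc2.keys():
            values = desc1[d] | desc2[d]
            if d in linear:
                lo, hi = min(values), max(values)
                values = set(range(lo, hi + 1))
            out[d] = values
        return out

    def climb_tree(desc, descriptor, parent):
        """Rule (v): replace each value by its parent concept in a
        generalization tree, given as a child -> parent dict."""
        return {**desc, descriptor: {parent[v] for v in desc[descriptor]}}

    e1 = {"shape": {"square"},   "size": {2}}
    e2 = {"shape": {"triangle"}, "size": {5}}
    print(drop_condition(e1, "size"))            # shape = square
    print(internal_disjunction(e1, e2))          # shape in {square, triangle}
    print(close_interval(e1, e2, linear={"size"}))   # size in 2..5
    print(climb_tree(e1, "shape", {"square": "polygon"}))  # shape = polygon

The turning-constants-to-variables rule is omitted above because, in this encoding, it reduces to renaming; section 2.3.2 below makes the same observation about INDUCE, where it is obtained as a special case of internal disjunction.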
1.6 General versus Problem-oriented Methods

It is a common view that general methods of induction, although mathematically elegant and theoretically applicable to many problems, are in practice very inefficient and rarely lead to any interesting solutions. This opinion seems to have led certain workers to abandon (at least temporarily) work on general methods and to concentrate on some specific problem (e.g., Buchanan et al. [1,2,3] or Lenat [11]). This approach often leads to interesting and practical solutions. On the other hand, it is often difficult to extract general principles of induction from such problem-specific work. It is also difficult to apply such special-purpose programs to new areas.

An attractive possibility for resolving this dilemma is to develop methods which incorporate various general principles of induction (including constructive induction) together with mechanisms for using exchangeable packages of problem-specific knowledge. In this way a general method of induction, provided with an appropriate package of knowledge, could be both easily applicable to different problems and also efficient and practically useful. This idea underlies the development of the INDUCE programs [13,15,16].

2. COMPARATIVE REVIEW OF SELECTED METHODS

2.1 Evaluation Criteria

We evaluate the selected methods of induction in terms of several criteria considered especially important in view of the remarks in section 1.

i) Adequacy of the representation language. The language used to represent input data and output generalizations determines to a large extent the quality and usefulness of the output descriptions. Although it is difficult to assess the adequacy of a representation language out of the context of some specific problem, recent work in AI has shown that languages which treat all phenomena uniformly must sacrifice descriptive precision. For example, researchers who are attempting to build natural-language systems prefer richer knowledge representations such as frames and semantic nets (with their tremendous variety of syntactic forms) to more uniform and less structured representations such as attribute-value lists and PLANNER-style databases. In our own work on inductive learning, we have chosen to use the representation language VL21 (see below), which has a wider variety of syntactic forms than our earlier language VL1. Although languages with many syntactic forms do provide greater descriptive precision, they also make the induction process more complex. In order to control this complexity, a compromise must be sought between uniformity and richness of forms. In the evaluation of each method, a review of the operators and syntactic forms of each description language is provided.

ii) Rules of generalization implemented. The generalization rules implemented in each algorithm are listed.

iii) Computational efficiency. The exact analysis of the computational efficiency of these algorithms is very difficult, due both to the inherent complexity of the algorithms and to the lack of precise formulations of the algorithms in available publications. However, it seems useful to have some data comparing the efficiency of these algorithms, even if that data is approximate and based on hand-simulations. To get some indication of the efficiency, we measure the total number of description generations or comparisons required by each method to perform a test example (see Fig. 1). We also measure the ratio of the number of output conjunctive generalizations to the total number of generalizations examined on this example. Since these numbers are derived from only one example, it is not appropriate to draw strong conclusions from them concerning the general performance of the algorithms. Our conclusions are based primarily on the general behavior of the algorithms.

iv) Flexibility and extensibility.
Mere conjunctive characteristic generalizations are not particularly useful for conceptual data analysis because of their limited format and their lack of formal mechanisms for handling errors in the input data. It is important in evaluating these algorithms to consider the ease with which each method could be extended to:

a) discover descriptions with forms other than conjunctive generalizations (see section 1.3),
b) include mechanisms which facilitate the detection of errors in the input data,
c) provide a general facility for incorporating domain-specific knowledge into the induction process as an exchangeable package (ideally, the domain-specific knowledge should be isolated from the general-purpose inductive process), and
d) perform constructive induction.

It is difficult to assess the flexibility and extensibility of the algorithms presented here. We base our evaluation on the general approaches of the methods and on extensions which have already been made to them.

In the following sections, we describe each method by presenting the description language used, sketching the underlying algorithm, and evaluating the method in terms of the above criteria. Each method will be illustrated using the test example shown in Fig. 1.

[Figure 1: the test example, three input events, each consisting of several objects of varying shape, size, and texture; the drawing is not reproduced here.]

2.2 Data-driven Methods: Hayes-Roth and Vere

Methods can be divided into bottom-up (data-driven), top-down (model-driven), and mixed methods. Bottom-up methods generalize the input events pairwise until the final conjunctive generalization is computed:

    E1 = G1;  G1 + E2 -> G2;  G2 + E3 -> G3;  G3 + E4 -> G4;  ...

G2 is the set of conjunctive generalizations of E1 and E2. Gi is the set of conjunctive generalizations obtained by taking each element of Gi-1 and generalizing it with Ei. We consider here only the methods described by Hayes-Roth and Vere. Other bottom-up methods include the candidate elimination approach described by Mitchell [18] and the Uniclass method described by Stepp [20].

2.2.1 Hayes-Roth: Program SPROUTER [5-8]

Hayes-Roth uses the term maximal abstraction or interference match for the maximally specific conjunctive generalization. He uses parameterized structural representations (PSRs) to represent both the input events and their generalizations. For example, consider the two events described in Fig. 2:

[Figure 2: two events. E1: a small circle on top of a small square. E2: a small circle on top of a large square, with another small circle inside the square.]

The PSRs for these could be:

    E1: {{circle:a}{square:b}{small:a}{small:b}{ontop:a, under:b}}

    E2: {{circle:c}{square:d}{circle:e}
        {small:c}{large:d}{small:e}
        {ontop:c, under:d}{inside:e, outside:d}}

The expressions such as {small:a} are case frames made up of case labels (small, circle, etc.) and parameters (a, b, c, d). The PSR can be interpreted as a conjunction of predicates of the form small(a), where the parameters are existentially quantified variables which are assumed to be distinct.

The interference match attempts to find the longest one-to-one match of parameters and case frames (i.e., the longest common subexpression). This is accomplished in two steps. First, the case relations in E1 and E2 are matched in all possible ways to obtain the set M. Two case relations match if all of their case labels match.
Each element of M is a case relation and a list of parameter correspondences which permit that case relation to match in both events:

    M = {{circle:((a/c)(a/e))}{square:((b/d))}
        {small:((a/c)(b/c)(a/e)(b/e))}
        {ontop,under:((a/c b/d))}}

The second step involves selecting a subset of the parameter correspondences in M such that all parameters can be bound consistently. This is conducted by a breadth-first search of the space of possible bindings, with pruning of unpromising nodes. The search can be visualized as a node-building process.

[The original report shows one such (pruned) search as a tree whose nodes are drawn from the correspondences in M, e.g., {ontop,under}: a/c b/d.]

The nodes are numbered in order of generation. One at a time, a node is examined and joined with all other consistent nodes which have already been examined. The nodes 5, 8, and 9 are conjunctive generalizations. Node 9 binds a to c (to give 1) and b to d (to give 2) to produce the conjunction:

    {{circle:1}{square:2}{small:1}{ontop:1, under:2}}

The node-building process is guided by computing a utility value for each candidate node to be built. The nodes are pruned by setting an upper limit on the total number of possible nodes and pruning nodes of low utility when that limit is reached.

Evaluation:

i) Representational adequacy. The algorithm discovers the following conjunctive generalizations of the example in Fig. 1:

1. {{ontop:1, under:2}{medium:1}{clear:1}}
   There is a medium, clear object on top of something.

2. {{ontop:1, under:2}{medium:1}{large:2}{clear:2}}
   There is a medium object on top of a large, clear object.

3. {{medium:1}{clear:1}{large:3}{clear:3}{shaded:2}}
   There is a medium-sized clear object, a large-sized clear object, and a shaded object.

PSRs provide two symbolic forms: parameters and case labels. The case labels can express ordinary predicates and relations easily. Symmetric relations may be expressed by using the same label twice, as in {same!size:a, same!size:b}. The only operator is the conjunction. The language has no disjunction or internal disjunction. As a result, the fact that the top element in Fig. 1 is always either a square or a diamond cannot be discovered.

ii) Rules of generalization. The method uses the dropping condition and turning constants to variables rules.

iii) Computational efficiency. On our test example, the algorithm requires 22 comparisons and generates 20 candidate conjunctive generalizations, of which 6 are retained. This gives a figure of 6/20 or 30% for computational efficiency. Four separate interference matches are required, since the first match of E1 and E2 produces three possible conjunctive generalizations.

iv) Flexibility and extensibility. Hayes-Roth has indicated (personal communication) that this method has been extended to produce disjunctive generalizations and to detect errors in data. Hayes-Roth has applied this method to various problems in the design of the speech understanding system Hearsay II. However, no facility has been developed for incorporating domain-specific knowledge into the generalization process. Also, no facility for constructive induction has been incorporated, although Hayes-Roth has developed a technique for converting a PSR to a lower-level, finer-grained, uniform PSR. This transformation permits the program to develop descriptions which involve a many-to-one binding of parameters.
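To make the interference match concrete, the following Python fragment is a rough reconstruction of the two steps just described. It is our own illustration, not SPROUTER itself: in particular, it substitutes an exhaustive search over subsets of matched case frames for SPROUTER's pruned, utility-guided node-building.

    # A rough sketch of the interference match (our reconstruction, not
    # SPROUTER's code). A PSR is a list of case frames; a case frame is
    # a (labels, params) pair, e.g. (("ontop", "under"), ("a", "b")).

    from itertools import combinations, product

    def case_matches(e1, e2):
        """Step 1: pair case frames whose label tuples are equal,
        recording the parameter correspondences each pairing induces."""
        return [(l1, tuple(zip(p1, p2)))
                for (l1, p1), (l2, p2) in product(e1, e2) if l1 == l2]

    def consistent(correspondences):
        """Step 2 test: bindings must be one-to-one in both directions."""
        fwd, bwd = {}, {}
        for x, y in correspondences:
            if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
                return False
        return True

    def interference_match(e1, e2):
        """Return a longest consistent subset of matched case frames,
        i.e., a maximally specific conjunctive generalization."""
        m = case_matches(e1, e2)
        for k in range(len(m), 0, -1):
            for subset in combinations(m, k):
                pairs = [c for _, corr in subset for c in corr]
                if consistent(pairs):
                    return subset
        return ()

    E1 = [(("circle",), ("a",)), (("square",), ("b",)),
          (("small",), ("a",)), (("small",), ("b",)),
          (("ontop", "under"), ("a", "b"))]
    E2 = [(("circle",), ("c",)), (("square",), ("d",)),
          (("circle",), ("e",)), (("small",), ("c",)),
          (("large",), ("d",)), (("small",), ("e",)),
          (("ontop", "under"), ("c", "d")),
          (("inside", "outside"), ("e", "d"))]
    print(interference_match(E1, E2))
    # circle, square, small, and ontop/under all match under a->c, b->d,
    # giving {{circle:1}{square:2}{small:1}{ontop:1, under:2}}.

On the two events of Fig. 2 this returns the four matched frames that bind a to c and b to d, corresponding to node 9 above.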
2.2.2 Vere: Program Thoth [21-24]

Vere uses the term maximal conjunctive generalization or maximal unifying generalization to denote the maximally specific conjunctive generalization. Each event is represented as a conjunction of literals. A literal is a parenthesized list of constants called terms. For example, the objects in Fig. 2 would be described:

    E1: (circle a)(square b)(small a)(small b)(ontop a b)

    E2: (circle c)(square d)(circle e)(small c)(large d)(small e)
        (ontop c d)(inside e d)

Although these resemble Hayes-Roth's PSRs, they are quite different. There are no distinguished symbols. All terms are treated uniformly.

The algorithm operates in four steps. First, the literals in each of the two events to be generalized are matched in all possible ways to generate the set of matching pairs MP. Two literals match if they contain the same number of constants and they share a common term in the same position. For the example of Fig. 2,

    MP = { ((circle a),(circle c)), ((circle a),(circle e)),
           ((square b),(square d)), ((small a),(small c)),
           ((small a),(small e)),   ((small b),(small c)),
           ((small b),(small e)),   ((ontop a b),(ontop c d)) }

The second step involves selecting all possible subsets of MP such that no single literal of one event is paired with more than one literal in the other event. Each of these subsets eventually forms a new generalization of the original events.

In the third step, each subset of matching pairs selected in step 2 is extended by adding to the subset additional pairs of literals which did not previously match. A new pair p is added to a subset S of MP if each literal in p is related to some other pair q in S by a common constant in a common position. For example, if S contained the pair ((square b),(square d)), then we could add to S the pair ((ontop a b),(inside e d)), because the third element of (ontop a b) is the second element of (square b), and the third element of (inside e d) is the second element of (square d) (Vere calls this a 3-2 relationship). We continue adding new pairs until no more can be added.

In step 4, the resulting set of pairs is converted into a new conjunction of literals by merging each pair to form a single literal. Constants which do not match are turned into new constants which may be viewed as variables. For example, ((circle a),(circle c)) would be converted to (circle 1).
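Steps 1 and 4 are straightforward to state in code. The sketch below is our own illustrative reconstruction, not Thoth itself; as noted in the evaluation that follows, Vere's publications do not fully specify the search strategy for step 2, so only the matching and merging steps are shown.

    # Illustrative sketch of steps 1 and 4 of Vere's method (our own
    # reconstruction, not Thoth). A literal is a tuple of terms, e.g.
    # ("ontop", "a", "b"); an event is a list of literals.

    def matching_pairs(e1, e2):
        """Step 1: pair literals of equal length which share a common
        term in a common position."""
        return [(l1, l2) for l1 in e1 for l2 in e2
                if len(l1) == len(l2)
                and any(t1 == t2 for t1, t2 in zip(l1, l2))]

    def merge(pairs):
        """Step 4: merge each pair into one literal, replacing each
        distinct pair of unequal terms by a new constant (variable)."""
        new_names, merged = {}, []
        for l1, l2 in pairs:
            literal = []
            for t1, t2 in zip(l1, l2):
                if t1 == t2:
                    literal.append(t1)
                else:
                    key = (t1, t2)
                    new_names.setdefault(key, str(len(new_names) + 1))
                    literal.append(new_names[key])
            merged.append(tuple(literal))
        return merged

    E1 = [("circle", "a"), ("square", "b"), ("small", "a"),
          ("small", "b"), ("ontop", "a", "b")]
    E2 = [("circle", "c"), ("square", "d"), ("circle", "e"),
          ("small", "c"), ("large", "d"), ("small", "e"),
          ("ontop", "c", "d"), ("inside", "e", "d")]

    MP = matching_pairs(E1, E2)   # yields exactly the eight pairs above
    print(merge([(("circle", "a"), ("circle", "c")),
                 (("square", "b"), ("square", "d")),
                 (("ontop", "a", "b"), ("ontop", "c", "d"))]))
    # -> [('circle', '1'), ('square', '2'), ('ontop', '1', '2')]

Note that ((ontop a b),(inside e d)) is correctly absent from MP, since those literals share no term in a common position; it can only enter a generalization through the step 3 extension described above.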
Evaluation:

i) Representational adequacy. When applied to the test example (Fig. 1), this algorithm produces many generalizations. A few of the significant ones are listed here:

1. (ontop 1 2)(medium 1)(large 2)(clear 2)(clear 3)(shaded 4)(5 4)
   There is a medium object on top of a large clear object. Another object is clear. There is a shaded object. (Note also the vacuous relationship 5, derived from unifying circle and triangle.)

2. (ontop 1 2)(clear 1)(medium 1)(9 1)(5 3 4)(shaded 3)(7 3)(6 3)(clear 4)(large 4)(8 4)
   There is a medium, clear object on top of some other object, and there are two objects related in some way (5) such that one is shaded and the other is large and clear. (Note the vacuous relationships 6, 7, 8, and 9.)

3. (ontop 1 2)(medium 1)(clear 2)(large 2)(5 2)(shaded 3)(7 3)(clear 4)(6 4)
   There is a medium object on top of a large clear object. There is a shaded object and there is a clear object. (Note the vacuous relationships 5, 6, and 7.)

The representation is very general. By convention, the first symbol of a literal can be interpreted as a predicate symbol. The algorithm, however, treats all constants uniformly. This creates difficulties. For instance, the algorithm generates vacuous literals in certain situations. Literals can be formed by pairing (red x) with (big y) to produce meaningless generalizations. One advantage of this relaxation of semantic constraints is that the program can discover conjunctive generalizations involving a many-to-one binding of variables. The language contains only a conjunction operator. No disjunction or internal disjunction is included.

ii) Rules of generalization. The algorithm implements the dropping condition rule and the turning constants to variables rule.

iii) Computational efficiency. From the published articles [21-24] it is not clear how to perform step 2. The space of possibilities is very large, and an exhaustive search could not possibly give the computation times which Vere has published. It would be interesting to find out what heuristics are being used to guide the search.

iv) Flexibility and extensibility. Vere has published algorithms which discover descriptions with disjunctions [23] and exceptions [24]. He has also developed techniques to generalize relational production rules [22,23]. The method has been demonstrated using the traditional AI toy problems of IQ analogy tests and blocks-world sequences. A facility for using background information to assist the induction process has also been developed. It uses a spreading activation technique to extract relevant relations from a knowledge base and add them to the input examples prior to generalizing them. Since the method has been extended to discover disjunctions and exceptions, it would be expected that the method could also operate in noisy environments.

2.3 Model-driven Methods: Buchanan et al. and Michalski

Model-driven methods search a set of possible generalizations in an attempt to find a few "best" hypotheses which satisfy certain requirements. The two methods discussed here search for a small number of conjunctions which together cover all of the input events. The search proceeds by choosing as the initial working hypothesis some starting point in the partially ordered set of all possible descriptions. If the working hypotheses satisfy certain termination criteria, then the search halts. Otherwise, the current hypotheses are modified by slightly generalizing or specializing them. These new hypotheses are then checked to see if they satisfy the termination criteria. The process of modifying and checking continues until the criteria are met. Top-down techniques typically have better noise immunity and can easily be extended to discover disjunctions. The principal disadvantage of these techniques is that the working hypotheses must repeatedly be checked to determine whether they subsume all of the input events.

2.3.1 Buchanan et al.: Program Meta-DENDRAL [1-3,19]

The algorithm which we describe here is taken from the RULEGEN program (part of the Meta-DENDRAL system). Meta-DENDRAL was designed to discover cleavage rules to explain mass spectrometry data. The descriptive language is based on the ball-and-stick model of chemical molecules. Each input event is a bond environment which describes some portion of a molecule. The environment is represented by a graph of the atoms in the molecule, with four descriptors attached to each atom, and forms the left-hand side of a cleavage rule.
The right-hand side of the rule predicts a cleavage based on the existence in a molecule of the left-hand side of the rule (breakbond(**) indicates that the ** bond is predicted to be broken). A typical cleavage rule (with atoms w, x, y, and z) is:

    LEFT-HAND SIDE (BOND ENVIRONMENT):

      Molecule graph:   w ** x -- y -- z

      Atom descriptors:

        atom   type       nhs   nbrs   dots
        w      carbon     3     1      0
        x      carbon     2     2      0
        y      nitrogen   1     2      0
        z      carbon     2     2      0

    RIGHT-HAND SIDE (CLEAVAGE PREDICTION):

      => breakbond(**)

The algorithm chooses as its starting point the most general bond environment (x ** y), with no properties specified for either atom. During the search, this description is grown by successively specializing a property of one of the atoms in the graph or by adding a new atom to the graph. After each specialization, the new graph is checked to see if it is "better" than the parent graph from which it was derived. A daughter graph is better than its parent if it still covers at least half of the input events (it is general enough) and still focuses on only one cleavage process (it is specific enough). The cleavage rules built by this algorithm are further improved by the program RULEMOD.

Evaluation:

i) Representational adequacy. The representation was adequate for the specific task of developing cleavage rules. It was not intended to be a general representation for objects outside of the chemical world. The descriptions can be viewed as conjunctions. Individual rules developed by the program can be considered to be linked by disjunction.

ii) Rules of generalization. The dropping condition and turning constants to variables rules are used "in reverse" during the specialization process. RULEGEN does not seem to have the ability to handle an internal disjunction, but RULEMOD apparently does. For example, it can indicate that the type of an atom is "anything except hydrogen". In similar work on nuclear magnetic resonance (NMR), Mitchell presents an example in which the value of nhs is listed as "greater than or equal to one" (which indicates an internal disjunction).

iii) Computational efficiency. Because this is a problem-specific algorithm, we cannot supply comparison figures here for how this algorithm would work on our test example. The current program is considered to be relatively inefficient [2].

iv) Flexibility and extensibility. Meta-DENDRAL has been extended to handle NMR spectra. The program works well in an errorful environment. It uses domain-specific knowledge extensively. However, there is no strict separation between a general-purpose induction component and a special-purpose knowledge component. It is not clear whether the methods developed for Meta-DENDRAL could be easily applied to any non-chemical domain. The program does not perform constructive induction in any general way. However, the INTSUM program does perform sophisticated transformations on the input spectra in order to develop the bond-environment descriptions.
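The general-to-specific search performed by RULEGEN can be sketched in the abstract. The following Python fragment is our own paraphrase for illustration only: it replaces molecule graphs with flat property dictionaries, invents the names refine and covers, and omits the "focuses on one cleavage process" half of the acceptance test.

    # Abstract sketch of a RULEGEN-style general-to-specific search (our
    # own paraphrase, not Meta-DENDRAL code). A "bond environment" is
    # reduced to a dict of atom properties; a daughter specializes one
    # still-unspecified property of its parent.

    PROPERTIES = {"type.x": ["carbon", "nitrogen"], "nhs.x": [1, 2, 3]}

    def refine(graph):
        """Yield daughters, each fixing one still-unspecified property."""
        for prop, values in PROPERTIES.items():
            if prop not in graph:
                for v in values:
                    yield {**graph, prop: v}

    def covers(graph, event):
        return all(event.get(p) == v for p, v in graph.items())

    def rulegen(events):
        """Grow the most general environment; keep a daughter only if it
        still covers at least half of the input events (general enough).
        The 'one cleavage process' test is omitted in this sketch."""
        half = len(events) / 2
        rules, frontier = [], [{}]
        while frontier:
            graph = frontier.pop()
            for daughter in refine(graph):
                if sum(covers(daughter, e) for e in events) >= half:
                    rules.append(daughter)
                    frontier.append(daughter)
        return rules

    events = [{"type.x": "carbon", "nhs.x": 2},
              {"type.x": "carbon", "nhs.x": 3},
              {"type.x": "nitrogen", "nhs.x": 2}]
    print(rulegen(events))
    # -> [{'type.x': 'carbon'}, {'nhs.x': 2}]

The "covers at least half" threshold is what gives the method its noise immunity: a specialization driven by a single erroneous event will usually fail the coverage test and be discarded.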
2.3.2 Michalski and Dietterich: Program INDUCE 1.2

The algorithm described here is one of three algorithms designed by Michalski and his collaborators. The others are a data-driven method described by Stepp [20] and a mixed method described by Larson and Michalski [12,13]. The language used to describe the input events is VL21, an extension of first-order predicate logic (FOPL) [16]. Each event is represented as a conjunction of selectors. A selector typically contains a function or predicate descriptor (with variables as arguments) and a list of values that the descriptor may assume. The selector [size(x1)=small,medium] asserts that the size of x1 may take the values small or medium. The events in Fig. 2 are represented as:

    E1: [size(x1)=small][size(x2)=small]
        [shape(x1)=circle][shape(x2)=square]
        [ontop(x1,x2)]

    E2: [size(x1)=small][size(x2)=large][size(x3)=small]
        [shape(x1)=circle][shape(x2)=square][shape(x3)=circle]
        [ontop(x1,x2)][inside(x3,x2)]

In this method, descriptors are divided into two classes: attribute descriptors and structure-specifying descriptors. Attribute descriptors describe attributes such as size or shape or distance which are applicable to all variables (representing, e.g., object parts). Structure-specifying descriptors include all other descriptors. They typically represent relationships among variables, such as ontop or inside. Each input conjunction is broken into two conjuncts: one built of selectors containing only attribute descriptors (the attribute conjunct) and one built of selectors containing only structure-specifying descriptors (the structure conjunct).

The algorithm is based on the observation that the structure-specifying descriptors are responsible for the computational complexity of generalizing structural descriptions. If we could determine conjunctions of structure-specifying selectors which were relevant for describing a particular class of objects, then the generalization of the attribute conjuncts could be handled quickly by an appropriate covering algorithm. The algorithm seeks to determine such a set of structure conjuncts which appear likely to be part of a maximally specific conjunctive generalization of all of the input events. It does this by finding conjunctions which are maximally specific generalizations of the input structure conjuncts considered alone. Such conjunctive generalizations of the structure conjuncts must be contained in some maximally specific generalizations of the entire set of input events. However, there may be maximally specific conjunctive generalizations of the input events which contain few if any structure-specifying selectors. The algorithm also finds these generalizations by considering structure conjuncts which are less than maximally specific.

The algorithm operates in two phases. The first phase is the structure-determining phase. A random sample of the input structure conjuncts is taken. This sample becomes the initial set of generalizations G. In each step, G is first pruned to a fixed size by removing unpromising generalizations. Then G is checked to see if any of its generalizations covers all of the structure conjuncts. If any do, they are removed from G and placed in the set C of candidate conjunctive generalizations. Lastly, a new G is formed by taking each element of G and generalizing it in all possible ways by dropping single selectors. When the set of candidates C reaches a prespecified size, the search stops.

The second phase is the attribute-determining phase. In this phase, the problem is converted to a multiple-valued logic covering problem using the VL1 propositional calculus [14,15]. Each candidate cover A in C is matched against all input events and the relevant variables are identified. For each match, the appropriate attribute conjuncts are extracted and used to form a VL1 event.
For example, if A = [ontop(p1,p2)] and

    E1 = [ontop(p1,p2)][ontop(p2,p3)]
         [size(p1)=1][size(p2)=3][size(p3)=5]
         [color(p1)=red][color(p2)=green][color(p3)=blue]

then we get two VL1 events: v1 = (1, 3, red, green) and v2 = (3, 5, green, blue). These are vectors of attributes which correspond here to the descriptors (size(p1), size(p2), color(p1), color(p2)) for p1 and p2 in A. All input events are converted into VL1 events in this manner. In general, more than one VL1 event is created from each input event. The set of VL1 events can be covered using a covering algorithm. A cover could be obtained by forming the union of the values taken on by each VL1 attribute. Such an approach usually leads to overgeneralization, since only one VL1 event derived from each input event need be covered. We use a beam-search technique to select a subset of the VL1 events to be covered.

This two-phase algorithm provides two computational advantages. First, the time required to compare expressions in the structure-determining phase is reduced, because the structure conjuncts are usually much smaller than the full input conjuncts. Second, the manipulation of VL1 formulas is very easy, since they may be represented as bit strings and manipulated using fast bit-parallel operations. The chief disadvantage of this algorithm is that it is difficult to decide when to terminate the structure-determining phase.
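The conversion from structural events to VL1 events is easy to state in code. The sketch below is our own illustration: the data layout and function names are assumptions, and INDUCE itself represents the resulting VL1 events as bit strings for fast covering.

    # Sketch of the attribute-determining conversion (our illustration,
    # not INDUCE source). A structural event holds a set of structure
    # literals, e.g. ("ontop", "p1", "p2"), plus attribute values per
    # variable, e.g. attrs["p1"]["size"] == 1.

    from itertools import permutations

    def vl1_events(cover, event_structs, event_attrs, attributes):
        """For each way the structure conjunct `cover` matches the event,
        emit one VL1 vector of the matched variables' attribute values."""
        variables = sorted({v for lit in cover for v in lit[1:]})
        event_vars = sorted(event_attrs)
        vectors = []
        for assignment in permutations(event_vars, len(variables)):
            binding = dict(zip(variables, assignment))
            bound = {(lit[0], *(binding[v] for v in lit[1:]))
                     for lit in cover}
            if bound <= event_structs:          # the cover matches here
                vectors.append(tuple(event_attrs[binding[v]][a]
                                     for v in variables
                                     for a in attributes))
        return vectors

    structs = {("ontop", "q1", "q2"), ("ontop", "q2", "q3")}
    attrs = {"q1": {"size": 1, "color": "red"},
             "q2": {"size": 3, "color": "green"},
             "q3": {"size": 5, "color": "blue"}}
    cover = [("ontop", "p1", "p2")]
    print(vl1_events(cover, structs, attrs, ["size", "color"]))
    # -> [(1, 'red', 3, 'green'), (3, 'green', 5, 'blue')]

This reproduces the two VL1 events of the example above (with the vector components grouped per variable rather than per descriptor, an arbitrary layout choice in this sketch).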
Evaluation:

i) Representational adequacy. The algorithm discovers, among others, the following generalizations of the events in Fig. 1:

1. [ontop(p1,p2)][size(p1)=medium][shape(p1)=circle,square,rectangle]
   [size(p2)=large][shape(p2)=box,rectangle,ellipse][texture(p2)=clear]
   There is a medium-sized circle, square, or rectangle on top of a large, clear box, rectangle, or ellipse.

2. [ontop(p1,p2)][size(p1)=medium][shape(p1)=polygon][texture(p1)=clear]
   [size(p2)=medium,large][shape(p2)=rectangle,circle]
   There is a clear, medium-sized polygon on top of a medium or large circle or rectangle.

3. [ontop(p1,p2)][size(p1)=medium][shape(p1)=polygon]
   [size(p2)=medium,large][shape(p2)=rectangle,ellipse,circle]
   There is a medium-sized polygon on top of a large or medium rectangle, ellipse, or circle.

4. [size(p1)=small,medium][shape(p1)=circle,rectangle][texture(p1)=shaded]
   There is a shaded object which is either medium or small in size and has a circular or rectangular shape.

This algorithm implements the conjunction, disjunction, and internal disjunction operators. It provides a fairly non-uniform set of representational facilities. Descriptors, variables, and values are all distinguished. Descriptors are further analyzed into structure-specifying descriptors and attribute descriptors. The current method provides for descriptors which have unordered, linearly ordered, and tree-ordered value sets. This variety of possible representations permits a better "fit" between the description language and any specific problem.

ii) Rules of generalization. The algorithm uses all rules mentioned in section 1.4 and also a few constructive induction rules (see below). All constants are coded as variables. The effect of the turning constants to variables rule is achieved as a special case of the generalization by internal disjunction rule.

iii) Computational efficiency. The algorithm requires 28 comparisons and builds 13 rules during the search to develop the descriptions listed above. Four rules are retained, so this gives an efficiency ratio of 4/13 or 30%.

iv) Flexibility and extensibility. The algorithm can easily discover disjunctions by altering the termination criteria for the structure-determining phase to accept structure conjuncts which do not necessarily cover all of the input events. The same general two-phase approach can also be applied to problems of determining discriminant generalizations. Larson and Michalski have done work on determining discriminant classification rules [12,13,14].

The algorithm has good noise immunity. Noise events can be discovered because the algorithm tends to place them in separate terms of a disjunction. Domain-specific knowledge can be incorporated into the program by defining the domains of descriptors, specifying the structures of these domains, specifying certain simple production rules, and by providing constructive induction rules. These forms of knowledge representation are not always convenient, however. Further work should provide other facilities for knowledge representation.

A few simple constructive induction rules have been incorporated into the current implementation as a preprocessor. Other constructive induction rules can be specified by the user. Using the built-in constructive induction rules, the program produces the following conjunctive generalization of the input events in Fig. 1:

    [# p's with texture clear = 2][top-most(p1)][ontop(p1,p2)]
    [size(p1)=medium][shape(p1)=polygon][texture(p1)=clear]
    [size(p2)=medium,large][shape(p2)=circle,rectangle]

There are exactly two clear objects in each event. The top-most object is a medium-sized, clear polygon, and it is on top of a large or medium-sized circle or rectangle. We hope to expand this constructive induction facility in the future.

2.4 Summary

The comparison of the various methods is summarized in Fig. 3. The table shows the distinct advantages and disadvantages of top-down methods as opposed to bottom-up methods. Bottom-up methods tend to be faster, but noise immunity and flexibility suffer as a consequence. Top-down methods have good noise immunity and are easily modified to discover disjunctive and other forms of generalization. They do tend to be computationally more expensive. By separating the structure-determining phase from the attribute-determining phase in our method, a considerable speed-up has been achieved.

    Criterion                 Hayes-Roth       Vere              Buchanan et al.     Michalski
    ------------------------------------------------------------------------------------------------
    Intended application      general          general           discovering mass    general
                                                                 spectrometry rules
    Language                  Parameterized    quantifier-free   chemical model      variable-valued
                              Structural       FOPL                                  logic system VL21
                              Representation
    Syntactic concepts        case frames,     literals,         molecule graph,     selectors, descriptors,
                              parameters,      constants         attributes          dummy variables,
                              case labels                                            constants in value sets
    Operators                 and              and               and, or             and, or, internal or

    Generalization rules:
      dropping condition      yes              yes               yes                 yes
      constants to variables  yes              yes               yes                 yes
      internal disjunction    no               no                yes                 yes
      climbing tree           no               no                no                  yes
      closing intervals       no               no                no                  yes

    Efficiency (test example):
      comparisons             22               complete          not applicable      28
                                               algorithm
                                               not known
      conjunctions generated  20               not known         not applicable      13
      ratio output/total      6/20 = 30%       not known         not applicable      4/13 = 30%

    Extensibility:
      applications            speech analysis  blocks world,     mass spectrometry,  soybean disease
                                               analogy problems  NMR                 diagnosis
      disjunctive forms?      no               yes               yes                 yes
      noise immunity          low              probably good     excellent           very good
      domain knowledge?       no               yes               yes, built into     yes
                                                                 the program
      constructive induction? none             no                no                  limited facility

    Figure 3. Summary comparison of the four methods.

3. CONCLUSION

One of the problems of current research on induction is that each research group is using a different formal language and terminology. This makes the exchange of information difficult. This paper was intended to help readers to get a better understanding of the state of the art in this area. Some important problems to be addressed in future research include:

i) the development of adequate formal languages and knowledge representations for hypothesis formulation and modification;

ii) extension of the scope of operators and forms which an inductive program can efficiently use during hypothesis formulation;

iii) the development of general mechanisms of induction which can be guided by problem-specific packets of knowledge; and

iv) incorporation in the program of extensive facilities for constructive induction and multi-level schemes of description. In particular, an inductive program should be able to assign names to various subdescriptions and use these names in the formulation of hypotheses (i.e., generate hierarchical forms).

Finally, an important principle which should guide future research is what we call the principle of comprehensibility. This principle states that the descriptions which an AI program uses and the concepts which it generates should be easily comprehensible by people. In the context of work on induction, the comprehensibility principle requires that the descriptions be short and use operators which can be easily interpreted in natural language. Furthermore, systems should be designed to provide flexible interactive facilities. This approach has been adopted in our work because we expect that the most significant applications of AI inductive programs will be as interactive tools for conceptual data analysis and computer-aided acquisition of rules for knowledge-based expert systems.

4. ACKNOWLEDGMENTS

The authors gratefully acknowledge the partial support of NSF under grants MCS-76-22940 and MCS-79-06614.

5. REFERENCES

[1] Buchanan, B. G., E. A. Feigenbaum, J. Lederberg, "A Heuristic Programming Study of Theory Formation in Science," in Proc. IJCAI-2, 1971, pp. 40-48.

[2] Buchanan, B. G., D. H. Smith, W. C. White, R. J. Gritter, E. A. Feigenbaum, J. Lederberg, C. Djerassi, J. Am. Chem. Soc. 98 (1976), p. 6168.

[3] Buchanan, B. G., E. A. Feigenbaum, "Dendral and Meta-Dendral: Their Applications Dimension," Artif. Intell. 11 (1978), pp. 5-24.

[4] Dietterich, T., "User's Guide for INDUCE 1.1," internal report, Dept. of Comp. Sci., Univ. of Illinois, Urbana, 1978.

[5] Hayes-Roth, F., "Collected Papers on the Learning and Recognition of Structured Patterns," Dept. of Comp. Sci., Carnegie-Mellon Univ., January 1975.

[6] Hayes-Roth, F., "Patterns of Induction and Associated Knowledge Acquisition Algorithms," Dept. of Comp. Sci., Carnegie-Mellon Univ., May 1976.

[7] Hayes-Roth, F., J. McDermott, "Knowledge Acquisition from Structural Descriptions," in Proc. IJCAI-5, 1977, pp. 356-362.

[8] Hayes-Roth, F., J. McDermott, "An Interference Matching Technique for Inducing Abstractions," CACM 21:5, 1978, pp. 401-410.

[9] Hunt, E. B., Experiments in Induction, Academic Press, 1966.
[10] Knapman, J., "A Critical Review of Winston's Learning Structural Descriptions from Examples," AISB Quarterly, Issue 31, September 1978, pp. 319-320.

[11] Lenat, D., "AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search," Rept. STAN-CS-76-570, Comp. Sci. Dept., Stanford Univ., July 1976.

[12] Larson, J., R. S. Michalski, "Inductive Inference of VL Decision Rules," SIGART Newsletter, June 1977, pp. 38-44.

[13] Larson, J., "Inductive Inference in the Variable Valued Predicate Logic System VL21: Methodology and Computer Implementation," Rept. No. 869, Dept. of Comp. Sci., Univ. of Illinois, Urbana, May 1977.

[14] Michalski, R. S., "Variable-valued Logic and Its Application to Pattern Recognition and Machine Learning," in Computer Science and Multiple-Valued Logic, D. C. Rine (ed.), North-Holland, 1977, pp. 506-534.

[15] Michalski, R. S., "Toward Computer-aided Induction: A Brief Review of Currently Implemented AQVAL Programs," in Proc. IJCAI-5, 1977.

[16] Michalski, R. S., "Pattern Recognition as Knowledge-Guided Induction," Rept. 927, Dept. of Comp. Sci., Univ. of Illinois, Urbana, 1978 (an updated version to appear in IEEE Trans. on Pattern Analysis and Machine Intelligence, 1980).

[17] Michie, D., "New Face of AI," Experimental Programming Reports No. 33, Machine Intelligence Research Unit, Univ. of Edinburgh, 1977.

[18] Mitchell, T. M., "Version Spaces: A Candidate Elimination Approach to Rule Learning," in Proc. IJCAI-5, MIT, 1977.

[19] Schwenzer, G. M., T. M. Mitchell, "Computer-assisted Structure Elucidation Using Automatically Acquired Carbon-13 NMR Rules," in ACS Symposium Series No. 54, Computer-assisted Structure Elucidation, D. H. Smith (ed.), 1977.

[20] Stepp, R., "Learning without Negative Examples via Variable-Valued Logic Characterizations: The Uniclass Inductive Program AQ7UNI," Rept. No. 982, Dept. of Comp. Sci., Univ. of Illinois, Urbana, 1979.

[21] Vere, S. A., "Induction of Concepts in the Predicate Calculus," in Proc. IJCAI-4, 1975.

[22] Vere, S. A., "Induction of Relational Productions in the Presence of Background Information," in Proc. IJCAI-5, 1977.

[23] Vere, S. A., "Inductive Learning of Relational Productions," in Pattern-Directed Inference Systems, D. A. Waterman and F. Hayes-Roth (eds.), Academic Press, 1978.

[24] Vere, S. A., "Multilevel Counterfactuals for Generalizations of Relational Concepts and Productions," Dept. of Information Engineering, Univ. of Illinois at Chicago Circle, 1978.