 
UIUCDCS-R-74-671

LEARNING BY INDUCTIVE INFERENCE

by

R. S. Michalski

August, 1974

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

Invited paper for the NATO Advanced Study Institute Seminar on
Computer Oriented Learning Processes, Aug. 26 - Sept. 7, 1974,
Bonas, France.

(Preprint for limited distribution.)
 
 
LEARNING BY INDUCTIVE INFERENCE 
 
 R. S. Michalski 
 
 University of Illinois 
 Urbana, Illinois 61801
 
 SUMMARY. This paper addresses learning processes which employ
 inductive inference. A system of variable-valued logic, called
 VL2, is briefly described, and its application to implementing
 inductive learning processes is discussed. VL2 can be
 characterized as a 'multi-valued first order predicate logic'.
 An example is given in which a computer program learns the
 difference between two classes of objects.
 
 INTRODUCTION 
 
 Learning processes can be generally viewed as processes of
 determining and representing relationships which exist among
 objects. These relationships are determined and represented
 within the system which learns ('STUDENT') using a source of
 information about the objects ('TEACHER'). It has been observed
 (e.g., Bongard [1]) that the smaller the degree of STUDENT-oriented
 organization of the information which the TEACHER provides, the
 greater must be the complexity of the STUDENT. Consequently,
 learning processes can be classified according to the degree of
 organization of the information provided by the TEACHER. Thus, we
 can distinguish, e.g., learning 'by being born' (innate
 capabilities) or by being designed (the greatest organization on
 the part of the TEACHER), learning by being programmed, learning
 from examples, learning from observation ('without teacher'), and
 learning by 'inspiration'. In this paper we are concerned with
 problems which belong to the area of 'learning from examples'.
 
 Like physical processes, which are governed by a law of
 minimum energy, it seems (still only intuitively) that information
 processes, and thus also learning processes, may be governed by a
 corresponding law of 'minimum-complexity' (or 'maximum-simplicity').
 In other words, information processes seem to have an overall
 tendency to achieve given information processing goals by the
 simplest means (which, in special cases, just means the minimum
 number of operations). Evidence of such a tendency in the area of
 human literary expression is Zipf's law [2]. It seems that all
 human information processing activities, in particular scientific
 activities, are oriented toward determining adequate and, at the
 same time, simple descriptions or explanations of the surrounding
 environment and phenomena. The ability to create the simplest
 descriptions, which use only the 'most significant' concepts and
 disregard the 'irrelevant details', is highly regarded and
 considered evidence of intelligence. But how can we formally
 define such concepts as the 'simplest description'? How can we
 create machines which have the ability to determine such
 descriptions?
 
 As Banerji [3] pertinently observed, a concept that is simple
 for one person may not be simple for another. His explanation is
 that 'there is something in the human mind which, given constant
 exposure to a concept, however complicated, makes it simple'.
 This explanation can be deepened by saying that a seemingly
 complex concept becomes simple if it is well understood, which,
 in turn, means that its relationship to well-known concepts
 has been clearly established. Therefore, in order to define a
 measure of simplicity of descriptions, two requirements have to
 be satisfied first:
 
 (1) A language in which descriptions are expressed has to be 
 assumed. 
 
 (2) A measure of 'semantic equivalence' of descriptions has to
 be established. This condition is necessary because, in
 determining the 'simplest description' of whatever we
 describe, we want to compare only descriptions which convey
 the same information (i.e., which are semantically equivalent).
 
 Once (1) and (2) are satisfied, a measure of simplicity of
 descriptions can be easily formalized. It can be, e.g., a
 monotonically decreasing function of the length of a description
 (measured, e.g., by the number of certain assumed constructs of
 the language which occur in the description). If a 'simplicity
 function' over the individual constructs is given, then one can
 consider a weighted sum of constructs. If only a preference
 order of constructs is assumed, then one could use the
 lexicographic functional defined by Michalski.
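
 The three kinds of simplicity measure mentioned above can be
 sketched as follows. This is only an illustration under stated
 assumptions: the encoding of a description as a list of construct
 names, the function names, and the example weights are all
 hypothetical, and a plain lexicographic comparison of construct
 counts stands in for the lexicographic functional, whose exact
 definition is not given in this paper.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a description is the list of language constructs it uses.
Description = List[str]


def length_simplicity(desc: Description) -> float:
    """A monotonically decreasing function of the description length."""
    return 1.0 / (1 + len(desc))


def weighted_cost(desc: Description, cost: Dict[str, float]) -> float:
    """Weighted sum of constructs; a lower cost means a simpler description."""
    return sum(cost[c] for c in desc)


def lexicographic_key(desc: Description, preference: List[str]) -> Tuple[int, ...]:
    """Counts of each construct, most important construct first.

    Python compares tuples lexicographically, so the smaller key is simpler.
    """
    return tuple(desc.count(c) for c in preference)


a = ["term", "selector", "selector"]   # 1 term, 2 selectors
b = ["term", "term", "selector"]       # 2 terms, 1 selector
preference = ["term", "selector"]      # assumed order: terms matter more

simpler = min([a, b], key=lambda d: lexicographic_key(d, preference))
```

 Both descriptions have the same length, so the length-based
 measure cannot separate them; the preference order does, selecting
 the description with fewer terms.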
 
 In this paper we present some recent results from our work
 on the theory and computer implementations of systems which can
 learn the 'simplest descriptions' by executing an inductive
 inference process ('inductive learning').
 
 LANGUAGE FOR EXPRESSING DESCRIPTIONS: SYSTEM VL2 
 
 The formal system which we are currently developing as a
 tool for expressing descriptions and implementing inductive
 learning is a variable-valued logic system VL2. This system is
 an extension of the system VL1 described by Michalski [4,5,6].
 The VL2 system gives a sound formal basis for developing an
 'algebra of descriptions' which would enable one, for example,
 to build descriptions, to simplify them, to generalize them to
 various degrees, to compare descriptions of individual objects or
 classes of objects, to infer a description of a class of objects
 from examples of objects of this class, etc.

 The full definition of the system VL2 is not yet available.
 For the purpose of this paper we will briefly and informally
 describe some* of the concepts of the system which are most
 relevant to our subject.

 For simplicity, we will relate our description of the
 system to the widely used first order predicate logic (FOPL):
 
 1. In FOPL, the atomic formulas (k-ary predicate symbols
 followed by k occurrences of variables, function forms and/or
 constants) are assumed to be binary valued (true or false).
 In VL2, these formulas (called atomic forms) are treated
 as functions which, as well as their arguments, range over
 independent domains. These domains are determined as most
 appropriate for the interpretation of the atomic forms and
 their arguments, or for the problem at hand.

 2. The atomic forms occur in a wff of VL2 (a VL2 formula) as
 parts of a broader concept, the selector, and are not,
 generally, VL2 formulas when standing alone (except
 for the case when a VL2 formula reduces to a FOPL formula).

 3. VL2 formulas range over an output domain, denoted D, which
 is a linearly ordered set having a smallest and a largest
 element.
 
 * In the full definition of VL2 there are more operations than
 those described here, and the concept of selector has a broader
 meaning.
 
 4. The selector is defined as a selector statement, SS,
 enclosed in brackets:

     $[SS]$    (1)

 The selector statement is either a conditional statement:

     $L \,\#\, R$    (2)

 or a quantified statement:

     $Q(L \,\#\, R)$    (3)

 where

 L - called the left part of the conditional statement, or
     the referee, is either a VL2 formula (see point 5) or
     a form which can be described as a quantifier-free
     FOPL formula over atomic forms. It will be assumed
     for the purpose of this paper that this FOPL formula
     is in disjunctive normal form, and that 'or' is denoted
     by ',', 'and' by '.', and negation by a bar over the
     predicate symbol. For example, a FOPL formula

         $P_1(x, f(y)) \land \neg P_2(y) \lor P_3(x, y, c)$    (4)

     where

         $P_1(x, f(y))$, $P_2(y)$, $P_3(x, y, c)$ - atomic forms
         $x$, $y$ - variables, $f(y)$ - a function of $y$
         $c$ - a constant

     is written as

         $P_1(x, f(y)) \,.\, \bar{P}_2(y) \,,\, P_3(x, y, c)$    (5)

 # - denotes '$=$' or '$\neq$'.

 R - called the right part of the conditional statement, or
     the reference, is a subset of the union of the domains of
     the atomic forms in L, or a VL2 formula.

 Q - a sequence of existential, $\exists x_i$, and/or universal,
     $\forall x_i$, quantifier forms, where $x_i$ are variables in
     atomic forms of L.

 Examples of a selector:

     $[p(x, y) = 3]$    (6)

     $[p_1(x, a) \,.\, p_2(y, z) = 2, 4]$    (7)

     $[\exists x, \forall y \, (p_1(x, y, b) \lor p_2(y, c) = 0, 2)]$    (8)
 
 The selector in which SS is a conditional statement is
 called a conditional selector (e.g., (6) and (7)); otherwise it
 is called a quantified selector (e.g., (8)). A conditional
 selector $[L \,\#\, R]$ in which the referee L is a single atomic
 form $P_i$ and the reference R is a subset of its domain is
 called a simple selector.

 A simple selector $[P_i = R]$ ($[P_i \neq R]$) is said to be
 satisfied iff the value of the atomic form $P_i$ is (is not)
 an element of R. If $P$, $P_1$ and $P_2$ are atomic forms then:

 $[\bar{P} = R]$ is satisfied iff $[P \neq R]$ is satisfied

 $[\bar{P} \neq R]$ is satisfied iff $[P = R]$ is satisfied

 $[P_1 . P_2 \,\#\, R]$ is satisfied iff $[P_1 \,\#\, R]$ and
     $[P_2 \,\#\, R]$ are satisfied

 $[P_1 , P_2 \,\#\, R]$ is satisfied iff $[P_1 \,\#\, R]$ or
     $[P_2 \,\#\, R]$ is satisfied

 $[(\exists x)(P \,\#\, R)]$ is satisfied iff, for given values of
     all free variables in P (i.e., variables other than x),
     there exists a value of x which satisfies the selector
     $[P \,\#\, R]$

 $[(\forall x)(P \,\#\, R)]$ is satisfied iff, for given values of
     the free variables, the selector $[P \,\#\, R]$ is satisfied
     for all values of x.
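
 The satisfaction rules for conditional selectors can be sketched
 in a few lines. This is a minimal illustration, not the
 author's implementation: the referee is assumed to be given in
 disjunctive normal form as a list of conjunctions, each literal is
 a hypothetical pair (atomic form name, barred?), the string "!="
 stands for '$\neq$', and the values of the atomic forms are assumed
 to be already computed.

```python
from typing import Dict, FrozenSet, List, Tuple

Literal = Tuple[str, bool]       # (atomic form name, True if written with a bar)
Referee = List[List[Literal]]    # disjunction (',') of conjunctions ('.')


def simple_satisfied(value, op: str, ref: FrozenSet) -> bool:
    """[P = R] holds iff the value is in R; [P != R] iff it is not."""
    inside = value in ref
    return inside if op == "=" else not inside


def selector_satisfied(referee: Referee, op: str, ref: FrozenSet,
                       values: Dict[str, object]) -> bool:
    """Evaluate a conditional selector [L # R] for given atomic-form values."""
    def literal_ok(lit: Literal) -> bool:
        name, barred = lit
        # A bar over the atomic form swaps '=' and '!='.
        effective = op if not barred else ("!=" if op == "=" else "=")
        return simple_satisfied(values[name], effective, ref)

    # ',' behaves as 'or' over conjunctions; '.' as 'and' inside a conjunction.
    return any(all(literal_ok(lit) for lit in conj) for conj in referee)


# Selector (7): [p1 . p2 = 2,4]
ref_7 = frozenset({2, 4})
conj_7 = [[("p1", False), ("p2", False)]]
ok = selector_satisfied(conj_7, "=", ref_7, {"p1": 2, "p2": 4})
not_ok = selector_satisfied(conj_7, "=", ref_7, {"p1": 2, "p2": 3})
```

 Quantified selectors are left out of the sketch; they would
 wrap the same check in a loop over the values of the quantified
 variable.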
 
 5. A VL2 formula is defined by the following rules:

 (i) an element of the output domain D or a selector
     standing alone is a VL2 formula,

 (ii) if $V$, $V_1$ and $V_2$ are VL2 formulas then so are:

     $\neg(V)$, called the inverse of $V$,

     $V_1 \land V_2$ (written also $V_1 V_2$), called the
     conjunction of $V_1$ and $V_2$,

     $V_1 \lor V_2$, called the disjunction of $V_1$ and $V_2$.

 A VL2 formula in the form of a disjunction of terms,
 where a term is a conjunction of selectors and an element of
 D, is called a disjunctive simple VL2 formula and denoted
 DVL2.

 A VL2 formula which includes only conditional selectors
 is called a conditional or quantifier-free formula. In
 what follows we will discuss only conditional VL2 formulas.
 
 6. Each VL2 formula V is assigned a value $v(V) \in D$ depending on
 the values of the atomic forms in it:

 (i) The value of an element of D standing alone is
     this element itself.

 (ii) The value of a selector is the largest element of
     D if the selector is satisfied, otherwise the
     smallest element of D.

 (iii) If the value of $V$ is the k-th smallest element of
     D, then the value of the inverse $\neg(V)$ is the
     k-th largest element of D.
     $V_1 V_2$ is assigned the smaller of the values of
     $V_1$ and $V_2$.
     $V_1 \lor V_2$ is assigned the larger of the values of
     $V_1$ and $V_2$.
 
 For illustration, below is an example of a VL2 formula
 and its interpretation:

     $4[p_1(x_1, x_2) \,.\, p_2(x_2, x_3) \neq \text{medium}][p_3 = \text{true}] \lor 3[p_3 = \text{unknown}] \lor 1[p_4(x_2, x_4) = \text{yellow}, \text{red}]$    (9)

 Suppose that the domains of the atomic forms $p_1(x_1, x_2)$,
 $p_2(x_2, x_3)$, $p_3$ and $p_4(x_2, x_4)$ are, respectively,
 $D_1 = D_2 = \{\text{small}, \text{medium}, \text{large}\}$,
 $D_3 = \{\text{unknown}, \text{false}, \text{true}\}$ and
 $D_4 = \{\text{white}, \text{yellow}, \text{blue}, \text{red}, \text{black}\}$,
 and that the output domain of the formula (9) is
 $D = \{0, 1, 2, 3, 4\}$, ordered as indicated by the numbers.

 The formula (9) is assigned the value (has the value) 4 iff the
 atomic forms $p_1(x_1, x_2)$ and $p_2(x_2, x_3)$, for given values
 of $x_1$, $x_2$ and $x_3$, take values not equal to 'medium', and
 $p_3$ takes the value 'true'. If the above condition is not
 satisfied, and $p_3$ takes the value 'unknown', then (9) has the
 value 3. If neither of the above conditions holds and
 $p_4(x_2, x_4)$, for given values of $x_2$ and $x_4$, takes the
 value 'yellow' or 'red', then (9) has the value 1. If none of the
 above conditions holds, (9) has the value 0.
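
 The evaluation rules of point 6 can be checked on formula (9)
 with a short sketch. It assumes that the values of the atomic
 forms are passed in directly as strings (the function and
 parameter names are hypothetical) and it follows the verbal
 interpretation given above: conjunction takes the minimum,
 disjunction the maximum, and a satisfied selector takes the
 largest element of D.

```python
D = [0, 1, 2, 3, 4]            # output domain, linearly ordered
TOP, BOTTOM = D[-1], D[0]


def sel(satisfied: bool) -> int:
    """Value of a selector: largest element of D if satisfied, else smallest."""
    return TOP if satisfied else BOTTOM


def inverse(v: int) -> int:
    """The k-th smallest element of D maps to the k-th largest element."""
    return D[len(D) - 1 - D.index(v)]


def formula_9(p1: str, p2: str, p3: str, p4: str) -> int:
    """Value of formula (9) for given values of the atomic forms p1..p4."""
    # Term 1: 4 [p1 . p2 != medium] [p3 = true]
    t1 = min(4, sel(p1 != "medium" and p2 != "medium"), sel(p3 == "true"))
    # Term 2: 3 [p3 = unknown]
    t2 = min(3, sel(p3 == "unknown"))
    # Term 3: 1 [p4 = yellow, red]
    t3 = min(1, sel(p4 in {"yellow", "red"}))
    # Disjunction of terms: the larger value wins.
    return max(t1, t2, t3)


value = formula_9("small", "large", "true", "white")
```

 Because the disjunction takes the maximum, the terms act as a
 priority list: a term with a larger leading constant overrides
 the ones with smaller constants whenever its selectors are all
 satisfied.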
 
 BASIC CONCEPTS UNDERLYING INDUCTIVE INFERENCE BY MEANS OF VL2

 The subject of inductive inference by means of the VL2 system
 is very broad. Owing to limited space, we will only
 delineate some of its major concepts.
 
 Suppose that the domains of all atomic forms in a VL2
 formula are $D_1, D_2, \ldots, D_n$. The set of all possible
 sequences of values of atomic forms, that is, the set
 $D_1 \times D_2 \times \cdots \times D_n$, is called the event
 space of the formula, and its elements are called events.
 The event space of a formula V is denoted by E(V). If the output
 domain of V is a set D, then V expresses a function

     $f: E(V) \to D$    (10)

 The atomic forms in a VL2 formula denote functions of a
 similar type; namely, an atomic form $p_i(x_1, x_2)$ denotes a
 function

     $p_i: D_{i_1} \times D_{i_2} \to D_i$    (11)

 where $D_i$ is the domain of $p_i(x_1, x_2)$, and $D_{i_1}$ and
 $D_{i_2}$ are the domains of $x_1$ and $x_2$, respectively.

 The atomic forms, however, do not express the functions (11);
 they only denote their names and arguments. For further
 considerations we will make the simplifying assumption that these
 functions are fixed and can be computed for any given values of
 their input variables.
 
 Let $V_1$ and $V_2$ be two VL2 formulas having comparable sets of
 atomic forms* (i.e., one set includes or is equal to the other).
 Let $E'$ be a subset of the event space $E$ specified by the
 domains of the larger of the two sets of atomic forms. Formulas
 $V_1$ and $V_2$ are called semantically $E'$-equivalent, which we
 write

     $V_1 \overset{E'}{\equiv} V_2$    (12)

 iff for every $e \in E'$

     $v(V_1) = v(V_2)$    (13)

 If $E' = E$, then $V_1$ and $V_2$ are called semantically
 equivalent and we write $V_1 \equiv V_2$. A rule which transforms
 one formula into another, semantically equivalent formula is
 called an equivalence-preserving transformation rule. Below are
 examples of such rules (read '$\equiv$' as: 'the formula on the
 left side may be replaced by the formula on the right side').
 Assume that $V$ is an arbitrary VL2 formula; $P_i$, $P_1$, $P_2$
 are atomic forms; $R_1, R_2 \subseteq D_i$ (the domain of $P_i$),
 and $R \subset D_1 = D_2$ (the domains of $P_1$ and $P_2$).

 * Atomic forms are here considered equal if they represent
 functions which differ only in that some of their arguments are
 substituted by a value from the domains of the arguments.

     $V[P_i = R_1] \lor V[P_i = R_2] \equiv V[P_i = R_1 \cup R_2]$    (14)

     $V[P_i \neq R_1] \lor V[P_i \neq R_2] \equiv V[P_i \neq R_1 \cap R_2]$    (15)

 If $R_1 \cup R_2 = D_i$ and $R_1 \cap R_2 = \emptyset$ (the empty
 set), then (14) and (15) reduce to (16) and (17):

     $V[P_i = R_1] \lor V[P_i = R_2] \equiv V$    (16)

     $V[P_i \neq R_1] \lor V[P_i \neq R_2] \equiv V$    (17)

     $V[P_1 = R] \lor V[P_2 = R] \equiv V[P_1 , P_2 = R]$    (18)

     $V[P_1 = R][P_2 = R] \equiv V[P_1 . P_2 = R]$    (19)

     $V[P_1 = R][P_2 \neq R] \equiv V[P_1 . \bar{P}_2 = R]$    (20)
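
 Semantic equivalence over a finite event space can be checked
 by exhaustive enumeration. The sketch below does this for the
 rules that merge two selectors on the same atomic form (union of
 references for '=', intersection for '$\neq$') and for the rules
 that merge selectors on two atomic forms into one referee. The
 domains, the reference sets and all names are illustrative
 assumptions; the arbitrary formula $V$ is represented only by its
 value, which is allowed to range over the whole output domain.

```python
from itertools import product

OUT = [0, 1, 2, 3, 4]                  # output domain D, linearly ordered
TOP, BOTTOM = OUT[-1], OUT[0]
DOM = ["small", "medium", "large"]     # assumed common domain of P1 and P2


def sel(satisfied: bool) -> int:
    """Selector value: largest element of D if satisfied, else smallest."""
    return TOP if satisfied else BOTTOM


def equivalent(lhs, rhs) -> bool:
    """True iff both sides take equal values for every value of V and every event."""
    return all(lhs(v, p1, p2) == rhs(v, p1, p2)
               for v, p1, p2 in product(OUT, DOM, DOM))


R1, R2, R = {"small"}, {"medium"}, {"small", "large"}

# Same atomic form, '=': references are united.
rule_14 = equivalent(
    lambda v, p1, p2: max(min(v, sel(p1 in R1)), min(v, sel(p1 in R2))),
    lambda v, p1, p2: min(v, sel(p1 in (R1 | R2))))

# Same atomic form, '!=': references are intersected.
rule_15 = equivalent(
    lambda v, p1, p2: max(min(v, sel(p1 not in R1)), min(v, sel(p1 not in R2))),
    lambda v, p1, p2: min(v, sel(p1 not in (R1 & R2))))

# Two atomic forms joined by ',' (or) inside one referee.
rule_18 = equivalent(
    lambda v, p1, p2: max(min(v, sel(p1 in R)), min(v, sel(p2 in R))),
    lambda v, p1, p2: min(v, sel(p1 in R or p2 in R)))

# Two atomic forms joined by '.' (and) inside one referee.
rule_19 = equivalent(
    lambda v, p1, p2: min(v, sel(p1 in R), sel(p2 in R)),
    lambda v, p1, p2: min(v, sel(p1 in R and p2 in R)))

# A pair of formulas that is not equivalent, for contrast.
not_equivalent = equivalent(
    lambda v, p1, p2: min(v, sel(p1 in R1)),
    lambda v, p1, p2: min(v, sel(p1 in R2)))
```

 Enumeration is feasible here only because the event space is
 tiny; it illustrates the definition rather than a practical
 procedure.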
 
 Suppose now that the output domain of a DVL2 formula V is a
 set D whose smallest element is *. Suppose further that all
 elements of D, except *, denote certain 'specified decisions'
 about events, and element * denotes an 'unspecified decision'.
 Let $E^+$ and $E^*$ denote the subsets of E(V) for which V takes
 specified and unspecified decisions, respectively.

 Events of $E^+$ are those which satisfy at least one term in V
 (i.e., satisfy all selectors in the term), while $E^*$ contains
 the remaining events in E(V), i.e., $E^* = E(V) \setminus E^+$.
 We will call the set $E^+$ the set of recognizable events of V
 and $E^*$ the set of not-recognizable events of V. Elements of
 $E^*$ will be called *-events.
 
 Let $V_1$ be a VL2 formula and $E_1^+$ its set of recognizable
 events.

 A rule which transforms the formula $V_1$ into a new formula
 $V_2$ (whose set of recognizable events is $E_2^+$) is called a
 deductive inference rule (DR) if

     $V_1 \overset{E_2^+}{\equiv} V_2$ and $E_2^+ \subset E_1^+$    (21)

 and is called an inductive inference rule (IR) if

     $V_1 \overset{E_1^+}{\equiv} V_2$ and $E_2^+ \supset E_1^+$    (22)

 According to (22), a rule is an IR iff $V_2$ makes the same
 specified decisions as $V_1$ for events of $E_1^+$ but also makes
 specified decisions for some events outside $E_1^+$. The question
 arises of how these 'other' events should be selected and what
 decisions should be made about them. To answer this question, a
 criterion governing an inductive rule is needed. We accept a
 criterion which can be characterized as a 'criterion of
 simplicity'. That is, we design a 'simplicity functional' for
 VL2 formulas (which can be modified according to the application)
 and employ inductive rules which maximize the assumed functional.
 
 An important inductive rule of this type is the one which
 assigns to *-events such decisions as permit one to apply
 rules (14)-(20) to a given formula whenever this could lead to the
 simplification of the formula according to the accepted measure
 of simplicity (which, at the same time, means a generalization of
 the formula).
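
 The distinction between deductive and inductive rules in (21)
 and (22) can be illustrated on a small event space. In the
 sketch below a formula is modeled simply as a function from
 events to decisions, with "*" as the unspecified decision; the
 two descriptors (length and color), the decision "A" and all
 function names are hypothetical, and the subset in (21) is taken
 as non-strict.

```python
from itertools import product

STAR = "*"                          # the 'unspecified decision'

LENGTHS = ["short", "long"]
COLORS = ["red", "blue"]            # hypothetical second descriptor
EVENTS = list(product(LENGTHS, COLORS))


def recognizable(formula, events):
    """E+: the events for which the formula makes a specified decision."""
    return {e for e in events if formula(e) != STAR}


def agree_on(f1, f2, subset):
    """True iff both formulas take the same value on every event of the subset."""
    return all(f1(e) == f2(e) for e in subset)


def is_inductive(f1, f2, events):
    """(22): same decisions on E1+, and E2+ is a proper superset of E1+."""
    e1, e2 = recognizable(f1, events), recognizable(f2, events)
    return agree_on(f1, f2, e1) and e1 < e2


def is_deductive(f1, f2, events):
    """(21): same decisions on E2+, and E2+ is contained in E1+."""
    e1, e2 = recognizable(f1, events), recognizable(f2, events)
    return agree_on(f1, f2, e2) and e2 <= e1


def v1(e):
    """A[length=short][color=red]"""
    return "A" if e == ("short", "red") else STAR


def v2(e):
    """A[length=short]: the *-event (short, blue) now receives decision A."""
    return "A" if e[0] == "short" else STAR


generalizes = is_inductive(v1, v2, EVENTS)
```

 Dropping the color selector makes the formula shorter and, at
 the same time, assigns a decision to a former *-event, which is
 the sense in which simplification and generalization coincide.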
 
 An inductive program called AQVAL/1, which operates on such
 principles, has been developed at the University of Illinois and
 has already been applied experimentally to selected learning and
 recognition problems from the areas of medicine [5] and plant
 pathology (the current version of AQVAL/1 implements a subset of
 VL2 called VL1). It should be mentioned that problems of inductive
 learning by means of variable-valued logic have a strong
 relationship to the problems of grammatical inference [7].
 
 DESCRIBING OBJECTS IN TERMS OF VL2 
 
 In the application of VL2 to describing objects, atomic forms 
 are used to represent certain functions called descriptors . 
 Descriptors are functions which a learning system uses to describe 
 objects. 
 
 Let $p_i$ denote a descriptor:

     $p_i: \underset{j \in J}{\times} D_{i_j} \to D_i$    (23)

 where

     $\times$ denotes the cartesian product
     $J = \{1, 2, \ldots, k\}$
     $D_{i_j}$ - input domains of the descriptor
     $D_i$ - the output domain of the descriptor

 Special cases of a descriptor:

 1. $J = \{1\}$, i.e., $p_i$ is a unary function. If $D_{i_1}$
 denotes a set of objects, and $D_i$ a set of the values of a
 specific characteristic of the objects, then $p_i$ is called a
 feature.

 2. $J = \{1, 2, \ldots, k\}$, $k = 2, 3, \ldots$,
 $D_{i_1} = D_{i_2} = \cdots = D_{i_k}$,
 $D_i = \{\text{true}, \text{false}\}$.
 If $D_{i_1}$ denotes a set of objects, then $p_i$ can be
 interpreted as a k-ary relation among these objects. If
 $p_i(O_{i_1}, O_{i_2}, \ldots, O_{i_k}) = \text{true}$, then we say
 that the relation among $O_{i_1}, O_{i_2}, \ldots, O_{i_k}$ holds;
 otherwise it does not hold. If $D_i$ is not a binary-valued
 set, but has a finite number of values, then we will say that
 $p_i$ is a multi-valued k-ary relation.

 As we can see, a descriptor has a very broad meaning.
 
 Example

 Suppose $D_{i_1} = D_{i_2}$ denote a set of parts of a certain
 physical object. To express the fact that, e.g., a relation
 'above' holds between certain parts of the object, we can use a
 function:

     $\text{ABOVE}: D_{i_1} \times D_{i_2} \to \{\text{true}, \text{false}\}$    (24)

 If the relation 'above' holds between $O_1$ and $O_2$ we write
 $[\text{ABOVE}(O_1, O_2) = \text{true}]$, or, since the output
 domain is just binary, simply $\text{ABOVE}(O_1, O_2)$. Suppose,
 however, that we want to distinguish between 3 possibilities: not
 above, little above, much above. In this case we assume that

     $D_i = \{\text{not}, \text{little}, \text{much}\}$    (25)

 To express the fact that $O_1$ is much above $O_2$, we use the
 selector

     $[\text{ABOVE}(O_1, O_2) = \text{much}]$    (26)

 If in describing a class of objects we observe that the part
 $O_1$ is either much above or not above the part $O_2$, we would
 write:

     $[\text{ABOVE}(O_1, O_2) = \text{not}, \text{much}]$    (27)
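
 A multi-valued descriptor such as ABOVE can be pictured as an
 ordinary function whose result is then tested against the
 reference of a selector. The sketch below is purely
 illustrative: the part heights, the threshold separating
 'little' from 'much', and the function names are invented
 assumptions and are not taken from the paper.

```python
from typing import Dict, Set

# Hypothetical vertical positions of three parts of an object.
HEIGHT: Dict[str, float] = {"O1": 9.0, "O2": 2.0, "O3": 3.0}


def above(a: str, b: str) -> str:
    """Descriptor ABOVE with output domain {not, little, much}."""
    diff = HEIGHT[a] - HEIGHT[b]
    if diff <= 0:
        return "not"
    # The threshold 5.0 is arbitrary and serves only the illustration.
    return "little" if diff < 5.0 else "much"


def selector(value: str, reference: Set[str]) -> bool:
    """A simple selector [P = R] is satisfied iff the value lies in R."""
    return value in reference


# Selector [ABOVE(O1, O2) = much]
is_much = selector(above("O1", "O2"), {"much"})
# Selector [ABOVE(O1, O2) = not, much]
not_or_much = selector(above("O1", "O2"), {"not", "much"})
```

 A reference with several values, as in the second selector,
 expresses an internal disjunction over the values of one
 descriptor without adding a second selector to the description.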
 
 In describing individual objects we can distinguish the
 following classes of descriptors:

 1. Global, 0-level, descriptors. These are features which
 characterize objects as a whole (e.g., color, size, texture,
 length, etc.).

 2. Local 1-level descriptors, which characterize basic
 (1-level) parts and k-ary, k = 2, 3, ..., relationships among
 them.

 3. Local k-level, k = 2, 3, ..., descriptors, which
 characterize k-level parts and relationships among parts of
 the (k-1)-level parts.
 
AN EXAMPLE OF LEARNING THE SIMPLEST DESCRIPTION OF THE
 DIFFERENCE BETWEEN TWO CLASSES OF OBJECTS
 
 Suppose we want to develop a machine which, given examples
 of objects from certain classes, could learn the simplest
 (according to some defined criteria) description of the object
 classes or of the differences between classes. Let us assume that
 the machine already has certain elementary abilities built in,
 such as the ability to recognize a triangle or rectangle, to
 measure their size and orientation, and to determine various
 relationships between the recognized objects, e.g., the relations
 'on top of', 'in between', etc. The problem of implementing
 abilities of this type is quite difficult by itself. Computer
 programs have already been developed, though, which can, to a
 limited degree, measure descriptors of the kind described above
 (see, e.g., Winston [8]). It is important to observe, however,
 that the number of such elementary descriptors which potentially
 may be needed is not very large, and therefore each of them could
 be implemented by a specially designed software or hardware
 device. On the other hand, the number of potential combinations
 of these descriptors which may occur in descriptions of real
 objects is prohibitively large. Therefore, an important problem,
 which we address here, is how to implement very efficient
 inference and learning processes which create goal-oriented
 descriptions of objects or object classes, assuming that these
 elementary descriptors are available. This type of problem is
 illustrated by the following example.
 
 Fig. 1 presents two classes of 'TABLES'. The objective is 
 to implement a learning process which would produce the simplest 
 description, with regard to an assumed simplicity functional, of 
 the difference between these two classes of TABLES. 
 
 Suppose that the following descriptors and their domains are 
 used to describe the TABLES: 
 
 1. global descriptors: length, {short, long}
                        #parts, {3, 4}

 2. a) features of individual parts $P_i$, i = 1, 2, 3, 4 (top
       rectangle, left triangle, right triangle, bar):

       part-type($P_i$), {0, rectangle, left triangle, right triangle, bar}

       part-length($P_i$), {0, short, long}

       part-texture($P_i$), {0, and four texture patterns}

       (0 means 'not relevant' - when a part does not exist)

    b) binary relations among parts: on-top($P_i$, $P_j$),
       {above-middle, above-left, above-right}

    c) ternary relations among parts: in-between($P_i$, $P_j$, $P_k$),
       {low, high} (part $P_i$ is between $P_j$ and $P_k$).
 
 Using these descriptors, the machine describes each object
 in terms of the VL2 system, as a conjunction of selectors. For
 example, object 1 in class 1 would be described as:

     [length = short][#parts = 4][part-type($P_1$) = rectangle]
     [part-type($P_2$) = left triangle][part-type($P_3$) = right triangle]
     [part-type($P_4$) = bar][part-length($P_1$) = short]
     [on-top($P_1$, $P_2$) = above-right][in-between($P_4$, $P_2$, $P_3$) = high]    (28)
 
 Suppose that $T_{i1}$, $T_{i2}$, $T_{i3}$ and $T_{i4}$ denote the
 descriptions of objects 1, 2, 3, 4 in class i, i = 1, 2,
 respectively. A description of class 1 (which is the 'least
 general') could then be:

     CLASS1$(T_{11} \lor T_{12} \lor T_{13} \lor T_{14})$    (29)

 and of class 2:

     CLASS2$(T_{21} \lor T_{22} \lor T_{23} \lor T_{24})$    (30)

 where {*, CLASS1, CLASS2} is the output domain of the formulas.
 
 Events which do not satisfy any of these formulas are *-events.
 Suppose now that as a simplicity criterion we accept a criterion
 demanding that a formula have the minimum number of terms and,
 with secondary priority, the minimum number of selectors.

 A way to attain the simplest (in the above sense) description
 of the difference between the two classes is to maximally simplify
 and generalize the formulas (29) and (30) under the restriction
 that the resulting formulas have an empty intersection.
 (The 'empty intersection' means that there will be no events which
 satisfy both formulas.) This is done by assigning to *-events such
 decisions as lead to the maximal simplification and generalization
 of formulas (29) and (30) by using rules (14)-(20) (without
 violating the above-mentioned restriction). Such an inductive
 process can be executed very efficiently by the previously
 mentioned computer program AQVAL/1. The simplest formulas,
 according to our criterion, obtained for both classes from
 AQVAL/1 were:

     CLASS1[length = short][part-texture($P_4$) = <texture>, <texture>]    (31)

     CLASS2[length = long] $\lor$ [part-texture($P_4$) = 0, <texture>]    (32)

 (the execution time was less than 3 sec. on the IBM 360/75;
 AQVAL/1 is written in PL/1).
 
 These formulas state that TABLES of class 1 are 'short' and
 the texture of the bar is one of the two textures listed in (31),
 and that TABLES of class 2 are either long, or the texture of the
 bar is the one listed in (32), or there is no bar.

 This description of the classes seems to agree well with what
 a human might accept as a 'most simple' difference between the two
 classes.
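
 The way formulas (31) and (32) are used for recognition, and the
 'empty intersection' restriction, can be sketched as follows.
 The texture pictograms of the original are not recoverable, so
 the labels "t1" to "t4" are hypothetical stand-ins, as are the
 dictionary keys and function names; "0" denotes a missing bar,
 as in the descriptor list.

```python
from itertools import product

STAR = "*"                                  # unspecified decision

LENGTHS = ["short", "long"]
TEXTURES = ["0", "t1", "t2", "t3", "t4"]    # "0": the bar does not exist

# Hypothetical stand-ins for the textures named in (31) and (32).
CLASS1_TEXTURES = {"t1", "t2"}
CLASS2_TEXTURES = {"0", "t3"}


def class1(table) -> bool:
    """(31): [length = short][part-texture(P4) = t1, t2]"""
    return table["length"] == "short" and table["bar_texture"] in CLASS1_TEXTURES


def class2(table) -> bool:
    """(32): [length = long] or [part-texture(P4) = 0, t3]"""
    return table["length"] == "long" or table["bar_texture"] in CLASS2_TEXTURES


def decide(table) -> str:
    """Decision for one event; events satisfying neither formula stay *-events."""
    if class1(table):
        return "CLASS1"
    if class2(table):
        return "CLASS2"
    return STAR


events = [{"length": l, "bar_texture": t} for l, t in product(LENGTHS, TEXTURES)]

# Empty intersection: no event may satisfy both formulas.
disjoint = not any(class1(e) and class2(e) for e in events)
star_events = [e for e in events if decide(e) == STAR]
```

 Under these stand-in labels a single event remains undecided (a
 short table whose bar has the fourth texture), which shows that
 the two generalized formulas need not cover the whole event space
 in order to separate the classes.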
 
 ACKNOWLEDGMENT 
 
 The author gratefully acknowledges the financial support he 
 obtained from the Department of Computer Science of the University 
 of Illinois at Urbana-Champaign for conducting the research 
 reported in this paper. It is also his pleasant duty to express
 thanks to Mr. A. B. Baskin for fruitful discussions, criticism,
 and proofreading of the paper.
 
 REFERENCES 
 
 1. Bongard, M. M., Problema uznavania, izd. Nauka, Moscow, 1967.
 (English trans.: Pattern Recognition, New York, Spartan
 Books, 1970).

 2. Cherry, C., On Human Communication, the M.I.T. Press,
 Cambridge, Mass., 1966.

 3. Banerji, R. B., Simplicity of concepts, training and the
 real world, Artificial and Human Thinking, edit. A. Elithorn,
 D. Jones, Jossey-Bass Inc., Publishers, 1973.

 4. Michalski, R. S., A Variable-Valued Logic System as Applied
 to Picture Description and Recognition, GRAPHIC LANGUAGES,
 edit. F. Nake, A. Rosenfeld, North-Holland Publishing
 Company (Proceedings of the IFIP Working Conference on
 Graphic Languages, Vancouver, Canada, May 1972).

 5. Michalski, R. S., AQVAL/1 - Computer Implementation of a
 Variable-Valued Logic System and the Application to Pattern
 Recognition, Proceedings of the First International Joint
 Conference on Pattern Recognition, Washington, D.C.,
 October 30-November 1, 1973.

 6. Michalski, R. S., VARIABLE-VALUED LOGIC: System VL1,
 Proceedings of the International Symposium on Multiple-Valued
 Logic, West Virginia University, Morgantown, West Virginia,
 May 29-31, 1974.

 7. Baskin, A. B., A comparative discussion of variable-valued
 logic and grammatical inference, Report No. (illegible) of the
 Department of Computer Science, University of Illinois,
 Urbana, July 1974.

 8. Winston, P. H., Learning structural descriptions from
 examples, Ph.D. thesis, MAC-TR-76, Artif. Intell. Lab.,
 M.I.T., 1970.
 
 
 Fig. 1. Two classes of 'TABLES'.
 