The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. To renew call Telephone Center, 333-8400 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN Obv 6N« L161— O-1096 Digitized by the Internet Archive in 2013 http://archive.org/details/patternrecogniti927mich 5Jo. * and rules representing knowledge of the recognition problem at hand. The paper formulates the underlying theory for generalization and optimization of des- criptions of object classes, expressed in the form of decision rules. The language for formulating descriptions is an extension of the first order predicate calculus, called the variable-valued logic calculus VL21. ^21 contains several new syntactic forms, specially oriented for expressing inductive processes. The presented approach uniformly combines descriptors (variables, predicates, functions) of three different types: nominal, linear and structured, and has an ability to generate new descriptors not used in the initial data rules. Index Terms pattern recognition, decision rules, generalization techniques, computer induction, knowledge-acquisition, learning from examples, many-valued logic, computer inference, computer consulting systems, theory formation , inductive inference This work was supported in part by the National Science Foundation under Grant NSF MCS 76-22940. The author is with the Department of Computer Science, University of Illinois, Urbana, Illinois 61801 -1- -2- 1 . INTRODUCTION A pattern recognition rule can be viewed as a rule DESCRIPTION ::> RECOGNITION CLASS (1) which assigns a situation (an object, a process, etc.) to the RECOGNITION CLASS, when the situation satisfies the DESCRIPTION. In the decision space approach the DESCRIPTION is in the form of an analytical expression involving a set of numerical variables selected a priori. Variables spanning the decision space are treated uniformly, are usually assumed to be measured on at least an interval scale, and are desired to be relevant and independent characterics of the objects. When the variables are strongly interconnected and the relevant object characteristics are various relations among the variables, or among parts or subparts of objects, then the decision space approach becomes inadequate. In such situations the structural approach can be useful. In the structural approach, the DESCRIPTION is a formal grammar (usually a phrase-structure grammar) in which terminals are certain elementary parts of objects, called 'primitives'. The types of relationships which can be expressed "naturally" in terms of a formal grammar are, however, quite limited. If the relevant characteristics include, for example, some numerical measurements in addition to relations and symbolic concepts, then grammars involving them are very cumbersome or inadequate. This is a strong limitation, because in many problems an adequate class description requires both numerical characterizations of objects and a specification of various relationships among properties of objects, of object parts, logical conditions on properties, etc.; i.e., involve descriptors of mixed arity and measured on different scales. Thus, there is a need for a method which could handle simultanously all such desciptors. -3- Both the decision space approach and the syntactic approach tend to produce descriptions which are not easily comprehensible by humans. This is so because these descriptions do not directly correspond to the 'natural language type' descriptions which human experts would develop observing the same data and which they would normally like to use. Although in some applications 'human comprehensibility ' may not be important, in other ap- plications (e.g., in expert computer consulting systems) it is a crucial requirement. This paper presents results, still early and limited, of an attempt to develop a uniform conceptual framework and an implementation method which would satisfy both of the above requirements. In addition, an important aspect of this method is that the final descriptions which it produces may involve new descriptors (variables or relations) which were not included in the initial characterization of objects. This is achieved through the application of 'metarules' which represent the under- lying knowledge of the problem at hand and of the properties of descriptors used in formulating the descriptions of exemplary data. Therefore, the approach taken is in the spirit of research in artificial intelligence. The method uses logic as the basic formal framework (specifically, a certain syntactic extension of the first order predicate calculus, called variable-valued logic system VL 91 ), and is most closely related to the, body of work termed 'computer induction'. The ability to develop new descriptors, in addition to those given a priori, places this work in the category of what we call 'constructive induction'* as opposed to 'non-constructive induction', in which the final descriptions relate only descriptors initially provided. * The author thanks Larry Travis of the University of Wisconsin for suggesting this name. -4- 2. RELATED RESEARCH It would be a very difficult task, requiring more space than provided, to characterize adequately various important con- tributions to computer induction. We will make here only a very limited and certainly not adequate review of some more recent works. Many results consider inductive tasks within a specific problem domain. For example, programs collectively called METADENDRAL [1] use a model-directed heuristic search to determine rules that describe themolecular structure of an unknown chemical compound from mass spectrometry data. In [2] Winston describes a method for deter- mining a graph description of simple block structures from examples. A program developed by Lenat [3] generates concepts (represented as collections of a priori defined properties) of elementary mathematics, under the guidance of a large body of heuristic rules. Soloway and Riseman [4] describe a method for creating multi-level descriptions of a part of a baseball game, starting with 'snapshots' of the game, and using rules representing general knowledge of the game. The programs such as those mentioned above usually incorporate a large body of task-specific knowledge and tend to perform quite well on tasks they were designed for. They represent an important achieve- ment and demonstrate again that high performance requires specialized solutions. An important problem which they raise, however, is how to untangle and systematize the ideas which they contribute* in order to extend understanding of inductive processes at large, and to apply in other problem areas. -5- A significant part of research has been concerned with determining patterns in sequences of symbols (e.g., Simon [5], Waterman [6]). Simon [5] found that descriptions of such patterns consistently incorporate onlyafew basic relations: 'some' and 'next' between symbols, iterations between subpatterns, and hierarchic phrase structure. Gaines [7] developed a method for generating finite-state automata, which approximate a given symbol string, and represent different trade-offs between the complexity and poorness- of-fit. Shaw, Swartout and Green [8] developed a program for inferring Lisp code from a set of examples of Lisp statements. The above works are related to the general subject of grammatical inference (i.e., inference of a grammar which may have produced a given set of strings) . Early work in this area was con- cerned with the inference of a phrase structure grammar (e.g., Feldman et al [9]). More recent work moves into inferring 'multi-dimensional' grammars (e.g., work by Brayer and Fu [10]). In the recent years there has been a new trend toward the development of general methods of induction. Michalski and his collaborators (e.g., [11, 12, 13]) have developed a methodology (using a sentencial calculus with discrete variables, called variable-value logic system VL, 5 as a formal basis) and computer programs for determining generalized and optimal in some sense discriminant descriptions of classes of objects from examples. The examples are presented as sequences of values of discrete variables with an associated recognition class. Work in a similar spirit, al- though more limited in scope, was reported by Stoffel [14] (the elementary statements used there are restricted to the 'variable-value' forms, i.e., to 'elementary selectors', as described in Section 4). -6- Many authors use a restricted form (usually quantifier- free) of the first-order predicate calculus (FOPC) or some equiva- lent notation as the formal framework for formulating hypotheses. Morgan [15] describes a formal method of hypothesis generation, called f- resolution, which stems from deductive resolution principles. Various theoretical issues of induction in FOPC were considered by Plotkin [16]. Fike, Hart and Nilsson [17] describe an algorithm for generalizing robot plans. Hayes-Roth and McDermott (e.g., [18]), also Vere [19], describe methods and computer programs for generating conjunctive descriptions of least generality (which they call 'maximal abstractions'), of a set of objects represented by products of n-ary predicates. The rules of generalization which they use can be characterized as 'dropping a condition' and 'turning constants into variables' (see section 5.3). Related, but different in spiri^ is work by Zagoruiko [20] on a general method for 'strengthening hypotheses' by narrowing the uncertainty intervals of values of output variables, and work by Hedrick [21] on determining production systems using a semantic net of predefined concepts. This paper presents a theoretical framework for generalizing and optimizing descriptions of object classes in the form of decision rules. The decision rules can involve descriptors of three different types (nominal, linear and structured), employ some new syntactic forms, and use problem knowledge for guiding induction and generating new descriptors. The formal notation is a modification and extension for FOPC, called variable-valued logic system VL . This formalism is claimed to be more adequate than the traditional FOPC as a conceptual framework for describing the inductive processes under consideration. The paper is an extension and modification of the report [22], and stresses the conceptual principles -7- of induction method rather than specific algorithms and implementation details. Most of the latter are described in [23, 24, 25]. 3. PROBLEM STATEMENT A VL transformation rule is defined as a rule DESCRIPTION.- dj> DESCRIPTION 2 (2) where DESCRIPTION, and DESCRIPTION are expressions in VL ?1 system (section 4), f ^> stands for various transformation operators which define the meaning of the rule. A DESCRIPTION may look like: ax. ,x [on-top(x ,x ) ] [size(x )=3. .5] [color (x)=blue, yellow, red] A [length(x.) . length(x )=small] (For explanation of notation see section 4) . We will consider here the following transformation operators: (i) ::> the operator defines a decision rule. DESCRIPTION specifies a decision (or a sequence of decisions) which is assigned to a situation which satisfies DESCRIPTION. . (In the application to pattern recognition, DESCRIPTION- defines the recognition class.) If a situation does not satisfy the DESCRIPTION , the rule assigns to it a NULL decision. (ii) =* the operator defines an inference rule. If a situation satisfies DESCRIPTION , the rule assigns the truth-status 'TRUE' to DESCRIPTION-, otherwise the truth-status of DESCRIPTION is '?'. (In an inference rule DESCRIPTION is called the condition and DESCRIPTION is called the consequence. -8- A decision rule can be viewed as a special case of an inference rule, namely, when DESCRIPTION is a constant, an elementary selector, or a product of elementary selectors involving decision variables (see def. 2), also, when its truth-status is TRUE (in general, it may be not TRUE). (iii) |^ the operator defines a generalization rule, which states that the DESCRIPTION^ is more general than DESCRIPTION , i.e., the set of situations which satisfy DESCRIPTION- is a superset of the set of situations satisfying DESCRIPTION . (iv) |= the operator specifies an equivalence preserving trans- formation rule (when the above mentioned sets are equal) . The rule is a special case of a generalization rule. The problem considered in this paper is defined as follows: • Given is (a) a set of VL decision rules, called data rules , which specify initial knowledge, {C . .} , about some situations (objects, processes, ...) and the recognition class, K., associated with them: C ±1 ::>K ± , C 12 ::>K ± C^ :: > K ± C n ::> K 2 , C 22 ::> K 2 C U2 ::> ^ (3) C - ::> K , C ::> K C , ::> K ml m mz m mtm m -9- (b) a set of VL inference rules which define a problem environment, i.e., represent knowledge about the recognition problem under consideration. This includes value sets of descriptors used in the data rules, the properties of descriptors and their interrelationships characteristic to the problem at hand. (c) a preference (or optimality) criterion, which for any two 'comparable' sets of decision rules specifies which one is more preferable, or states that they are equally preferable. • The problem is to determine, through an application of generalization rules ( sec. 5.3), a n ^w set of decision rules (called output rules or hypotheses) : c k ::> V c ii ■■■■ >K v ■■■ C U ::> % G 21 * ml > K r C< 21 ::> K r ... C ^ ::> ^ > K , C\ ::> K , ... C ::> K m m2 m mrm m (A) which are most preferable among all sets of rules that do not contradict the problem environment rules, and with regard to the input rules are consistent and complete. The output rules are consistent with regard to input rules, if for any situation to which the input rules assign a non-NULL decision, the output rules assign to it the same decision, or the NULL decision. -10- Th e output rules are complete with regard to input rules, if for any situation to which the input rules assign a non-NULL decision } the output rules also assign to it a non-NULL decision. It is easy to see that if the output rules are consistent and complete with regard to the input rules then they are semantically equivalent (i.e., assign the same decision to the same situation) or more general than the input rules (i.e., they may assign a non-NULL decision to situations to which the input rules assign a NULL decision). From a given set of data rules it is usually possible to derive many different sets of rules which are consistent and complete and which satisfy the problem environment rules. The role of the preference criterion is to select one (or a few alternative sets of rules) which is most desirable in the given application. The preference criterion may refer to the simplicity of the rules (defined in some way) , their generality, the cost of measuring the information needed for rule evaluation, degree of approximation to the given facts, etc. (section 5.4). In this paper we accept the restriction that the DESCRIPTIONS, C • ■ and C. are disjunctive simple VL 91 expressions (section 4). Such expressions have a very simple interpretation, and seem to be sufficient for many applications. -11- 4. VL EXPRESSIONS AS DESCRIPTIONS 4.1 Definition of VL Data rules, hypotheses, problem environment descriptions, and generalization rules are all expressed using the same formalism, that of variable-valued logic calculus VL ?1 . * VL 91 is an extension of predicate calculus designed to facilitate a compact and uniform expression of descriptions of different degrees and different types of generalization. The formalism also provides a simple linguistic inter- pretation of descriptions without losing the precision of the con- ventional predicate calculus. To make the paper self-contained, we will provide here a brief description of VL 91 . There are three major differences between VL», and the first order predicate calculus: 1. In place of predicates, it uses selectors (or relational statements) as basic operands. A selector, in the most general form, specifies a relationship between one or more atomic functions and other atomic functions or constants. A common form of a selector is a test to ascertain whether the value of an atomic function is a specific constant or is a member of a set of constants. The selectors represent compactly certain types of logical relationships which can not be directly represented in FOPC but which are common in human descriptions. They are particularly useful for representing changes in the degree of generality of descriptions and for syntactically uniform treatment of descriptors of different types. *VL 91 is a subset of a more complete system VL» under development -12- 2. Each atomic function (a variable, a predicate, a function) is assigned a value set (domain), from which it draws values, together with a characterization of tne structure of the value set. This feature facilitates a representation of the semantics of the problem and the application of generalization rules appropriate to the type of descriptors. 3. An expression in VL^-. can have a truth status: TRUE, FALSE or ? (UNKNOWN) . The truth-status '?' provides an interpretation of a VL 91 description in the situation, when, e.g., outcomes of some measurements are not known. Definition 1 : An atomic function is a variable, or a function symbol followed by a pair of parentheses which enclose a sequence of atomic functions and/or constants. Atomic functions which have a defined interpretation in the problem under consideration are called descriptors. A constant differs from a variable or a function symbol in that its value set is empty. If a confusion is possible, a constant is typed in quotes. Examples Constants 2 * red Atomic forms : x color(red) on-top(pl,p2) ( (x , g(x )) Exemplary value sets: D(x ) = {0, 1,..., 10} D(color) = {red, blue,...} D (on-top} = {true, false} D(f) = {0,1,..., 20} Definition 2 : A selector is a form [L # R] (5) where L - called referee^ is an atomic function, or a sequence of atomic functions separated by ' . ' . (The operator ' . ' is called the internal conjunction. ) -13- // - is one of the following relational operators: = =j: > < > < R - called reference, is a constant or atomic function, or a sequence of constants or atomic functions separated by operator 1 , ' or ' . . ' . (The operators ' , ' and ' . . ' are called the internal disjunction, and the range operator 3 respectively) . A selector in which the referee L is a simple atomic function and the reference R is a single constant is called an elementary selector. The selector has truth-status TRUE {or FALSE} with regard to a situation if the situation satisfies {does not satisfy] the selector, i.e., if the referee L is {is not} related by # to the reference R. The selector has the truth- status '?' (and is interpreted as being a question), if there is not sufficient information about the values of descriptors in L for the given situation. To simplify the exposition, instead of giving a definition of what It means that 'L is related by // to R' , we will simply explain this by examples. (See section section 5.1 for more details). (i) [color(boxl) = white] (ii) [length(boxl) > 2] (iii) [weight (boxl) = 2.. 5] (iv) [blood-type (PI) = O.A.B] (v) [on-top (boxl, box2) - T] or simply [on-top (boxl, box2)] (vi) [above (boxl, box2) = 3 ,r ] (viii) [weight (boxl) > weight (box3)] color of boxl is white length of boxl is greater than or equal to 2 weight of boxl is between 2 and 5, blood-type of PI is or A or B boxl is on top of box2 box 1 is 3" above box2 the weight of boxl is greater than the weight of box3 (ix) (x) [length(boxl) . length (box2) = 3] the length of boxl and box2 is 3 [type( P;L ) . type (P 2 ) = A,B] the type of P and the type of Y is either A or B. Note the direct correspondence of the syntactic forms to linguistic descriptions. Note also that some selectors can not be expressed in FOPG in a (pragmatically) equivalent form (e.g., (iv) , (ix) , (x) ) . -14- A VL ?1 expression (or, here, simply VL expression) is defined by the following rules: (i) A constant TRUE, FALSE or '?' is a VL expression (ii) A selector is a VL expression (iii) If V, V and V are VL expressions then so are: (V) formula in parentheses — I V inverse V A V or V V conjunction V V V- disjunction V.. V V_ exclusive disjunction 1—2 J V \| V exception V t=£>V„ metaimplication wherecO-e {+,*», ::>,=>, |g , (= } (implication, equivalence, decision assignment, inference, generalization, semantical equivalence) 3x 1 ,x_, . . . ,x (V) existentially quantified expression Vx , x , . . . , x, (V) universally quantified expression A VL formula can have truth-status TRUE (T) , FALSE (F) or UNKNOWN(?). The interpretation given to connectives ~|» A, V, -*-, is defined in Fig. 1. (This interpretation is consistent with Kleen-Korner 3-valued logic) . An expression with the operator =*, |< or f= is assumed to always have the truth-status TRUE and with operator ::>, TRUE or ?. Operators\ , V_, and <* are interpreted: V \ V is equivalent to V (IVJ V V_ V is equivalent (V V V )\ V V V « V is equivalent to (V ->V ) (V -^V ) The truth-status of r TRUE {FALSE} if, in a given situation, there exists {does not exist} a value of x which makes 3x(V) is / the truth-status of V equal TRUE ? if it is not known whether there exists . . . -15- ? r? F _T A • F T ? ? F ? • F F F F T ? • F T V • F T • ? ? T F • F T T T T T -» ? F T F T" T T t~~?~1F~t DEFINITION OF CONNECTIVES T.A.V AND — -** IN VL. 21 Figure 1 Vx(V) is TRUE {FALSE} if for every value of x in a given situation, the truth-status of V is {is not} TRUE ? if it is not known whether for every . . . A constant * ('irrelevant') is introduced to substitute for R, in a selector [L = R], when R is the sequence of all p.ossible values the L can take . A VL expression in the form QF 1 ,QF 2 ,... (V 1 v P 2 v,..vP 1 ) (7) where QF . is a quantifier f orm 3x.. ,x_, . . . or Vx.,x 9 ,... and P. is a con- junction of selectors (a term), is called a disjunctive simple VL expression (a DVL expression) . -16- 5. INFERENCE AND GENERALIZATION RULES 5. 1 Interpretation of Inference Rules An inference rule DESCRIPTION ■* DESCRIPTION (8) is used by applying it to situations. A situation is, in general, a source of information about values of variables and atomic functions in DESCRIPTION., (the condition part of the rule). A situation can, e.g., be a data base storing values of variables and procedures for evaluating atomic functions, or it can be an object on which various tests are performed to obtain these values. A decision rule is viewed as a special case of an inference rule, when DESCRIPTION (the consequence or decision part of the rule) is a con- stant, an elementary selector, or a product of elementary selectors involving decision variables (i.e., the DESCRIPTION uniquely defines a decision or a sequence of decisions) . The truth status of the condition and decision part of a rule, before applying it to a situation, is assumed to be UNKNOWN. Let Q denote the set of all possible situations under consideration. To characterize situations in Q, one determines a set S, called the descriptor set 3 which consists of variables, predicates and atomic functions (called, generally, descriptors) whose specific values can adequately characterize (for the problem at hand) any specific situation. We will assume here that theaiguments of atomic functions are single variables, rather than other atomic functions. A situation is characterized by an event which is a sequence of assignments (L:=v), where L is a variable or an atomic function with specific values of arguments, and v is a value of the variable or atomic function which characterizes the situation. It is assumed that each descriptor has defined a value set (domain) which contains all possible values the -17- descriptors can take for any situation in Q. Certain descriptors may not be applicable to some situations and therefore it is assumed that a descriptor in such cases takes value NA, which stands for not applicable. Thus, the domains of all descriptors always include by default the value NA. The set of all possible events for the given descriptor set S is called the event space, and denoted &(S). It should be noted that within a single event certain variables (variables which are quantified in formulas) may be assigned a number of different values, i.e., there may be more than one pair (L:=v.), where L is a variable and v., i = 1, 2, ... represent different values. An event e £ £(S) is said to satisfy a selector [f (x. , . . . ,x^) # R] iff the value of function f for values of x. , i = 1, 2, ..., k, as specified in the event e, is related to R by //. For example, the event e: (. . .x 5 :=a l5 x 6 :=a , f 2Q (a 1 , a^ := 5, ...) satisfies the selector: [f 20 ( x 5' x 6^ = 1 » 3 > 5 ^ A satisfied selector is assigned truth-status TRUE. If an event does not satisfy a selector then the selector is assigned truth-status FALSE. If an event does not have enough information in order to establish whether a selector is satisfied or not then the selector has UNKNOWN truth-status with regard to this event. Let us assume first that the condition part of an inference rule is a quantifier-free formula. Interpreting the connectives -J , A, V , as described in figure 1, one can determine from the truth status of selectors the truth-status of the whole formula. An event is said to satisfy a rule, iff an application of the condition part of the rule to the event gives the formula truth-status TRUE. Otherwise, the event is said to not satisfy the rule. -18- Suppose now that the condition formula is in the form 3x(V) An application of this formula to an event assigns status TRUE to the formula iff there exists in e a value assigned to x such that V achieves status TRUE (x may have a number of different values assigned to it). For example, the formula apart [color (part) = red] is satisfied by the event: e = (... part:=Pl, color (Pl):=blue, part:=P2, color (P2) :=yellow, part:=P3, color (P3) :=red. . .) If the condition part is a form V X (V) then it is assigned status TRUE if every value of x in the event applied to it satisfies V. If the condition part assumes truth-status TRUE then the decision part is assigned status TRUE. When the decision part reaches status TRUE then variables and functions which occur in it are assumed to have values which make this formula TRUE. These values may not, in general, be unique. For example, suppose that V is a decision part with status TRUE: V: [p(x r x 2 ) = 2][x 3 = 2:5][x 5 =7] V is interpreted as a description of a situation in which p has value 2 (if a specification of p(x..,x ) is known, then from it we can infer what values of x.. and x_ might be), x„ has a value between 2 and 5, inclusively, and x, has value 7. (Note that the formula does not give precise information about the value of x„.) After applying a formula to an event, the truth status of the condition and decision part returns to UNKNOWN. The role of an inference rule can then be described as follows: the rule is applied to an event, and if the event satisfies the condition part, then an assignment of values to variables -19- and functions is made as defined by the decision part. This assignment defines a new event (or a set of events which satisfy the decision part) . Another inference rule now can be applied to this event (or set of events), and if satisfied by it (or by all of them) , a new assignment of values to some variables and functions can be made. Examples of VL inference rules: [p(x r x 2 ) = 3][q(x 2 ) = 2,5] [x ? + 0] "♦ [d(y x ) = 7][p( yi ,y 2 ) = 2] Hx 3 ([p(x 1 ,x 3 ) = 2..3][q(x 7 ,x 3 ) > 2]) V [t(x ± ) =» 1] «• [dCy^ = 7] TRUE ^ [p(x 2 ,x 7 ) = 2][x ? = 2,3,5] 5. 2 Specification of the problem environment in the form of inference rules Types of descriptors The process of generalizing a description depends on the type of descriptors used in the description. The type of a descriptor depends on the structure of the value set of the descriptor. We distinguish here among three different structures of a value set: 1 . Unordered Elements of the domain are considered to be independent entities, no structure is assumed to relate them. A variable or function symbol with this domain is called nominal (e.g., blood-type). 2. Linearly Ordered The domain is a linearly ordered set. A variable or function symbol with this domain is called linear (e.g., military rank, temperature, weight). 3. Tree Ordered Elements of the domain are ordered into a tree structure. A predecessor node in the tree represents a concept which is more general than the concepts represented by the dependent nodes (e.g., the predecessor of nodes 'triangle, rectangle, pentagon, etc.' may be a 'polygon'). A variable or function symbol with such a domain is called structured. -20- Each descriptor ( a variable or fuction symbol ) is assigned its type in the specification of the problem. In the case of structured descriptors, the structure of the value set is defined by inference rules (e.g., see eqs. (13) , (14) , (15)) . In addition to assigning to each variable and function symbol a domain, one defines properties of variables and atomic functions characteristic for the given problem. They are represented in the form of inference rules. Here are a few examples of such properties. 1. Restrictions on Variables Suppose that we want to represent a restriction on the event space saying that if a value of variable x- is ('a person does not smoke'), then the variable x„ is 'not applicable' (x„ - kind of cigarettes the person smokes) . This is repre- sented by a rule: [x ± = 0] => [x 3 = NA] NA = not applicable 2. Relationships Between Atomic Functions For example, suppose that for any situation in a given problem, the atomic function f(x.., x ? ) is always greater than the atomic function g(x , x ? ) . We represent this: T => Vx p x 2 [f(x r x 2 ) > g(x r x 2 )] 3. Properties of Predicate Functions For example, suppose that a predicate function is transitive, We represent this: \/x 1 ,x 2 ,x 3 ([left(x 1 ,x 2 )] [left(x 2 ,x 3 )] =* [leftU^x^ ]) Other types of relationships characteristic for the problem environment can be represented similarly. ■21- 5.3. Generalization rules In order to transform data rules (3) into hypotheses (4) , generalization rules are applied to data rules. A generalization rule transforms one or more decision rules associated with the same general- ization class (which, in our case, is the same as recognition class) , into a new decision rule, which is equivalent to or more general than the initial rules. A decision rule V ::> K (9) is equivalent to a set of decision rules {V ± ::> K}, i = 1, 2, ... (10) if any event which satisfies at least one of the 7., 1=1, 2, ..., satisfies also V, and conversely. If the converse is not required, the rule (9) is said to be more general than (10) . The generalization rules are applied to data rules under the condition of preserving consistency and completeness, and achieving optimality according to the preference criterion. A basic property of a generalization transformation is that the resulting rule may have UKNOWN truth-status (is a hypothesis) ; its truth-status has to be tested on new data. Below is a list of a few basic generalization rules (K denotes a generalization class) . Non-constructive rules : (i) the extending reference rule V[L = R ] ::> K \< V[L = R 2 ] ::> K where L - is an atomic function R^l R- , and R ,R„ are subsets of the value set, D(L), of descriptor L. -22- V - an arbitrary description (here a VL expression) . This is a generally applicable rule; the type of descriptor L does not matter, (ii) The dropping selector (or dropping condition ) rule V[L = R] ::> K |< V ::> K This rule is also generally applicable. It is one of the most commonly used rules for generalizing information. It can be derived from rule (i) , by assuming that R~ in (i) is equal the value set D(L). In this case the selector [L = R~] has always truth-status TRUE, and as such can be removed, (iii) The closing interval rule V[L = a] ::> K V[L = b] ::> K This rule is applicable only when L is a linear descriptor. ( V[L = a..b] ::> K To illustrate the rule, consider as objects two states of a machine, and as a recognition class, a characterization of the states as normal. The rule says that if the states differ only in that the machine has two different temperatures, say, a and b, then the hypothesis is made that all states in which the temperature is in the interval [a,b] are. also normal. (iv) The climbing generalization tree rule r V[L = a] ::> K one or more rules V[L = b] ::> K / [L = s ] ::> « ^[L = i] ::> K V -23- where L is a structured descriptor s - represents the predecessor node (a concept at the next 'level of generality') of nodes a,b,...and i, the tree domain of L. The rule is applicable only to selectors involving structured descriptors. This rule has been used, e.g., in [2], [3], [21]. Example: V[ shape (p)=triangle ] ::> K 7[shape(p)=rectangle] ::> K (v) The extension against rule (. K[shape(p)=polygon] ::> K V ± [L = R x ] ::> K V 2 [L = R 2 ] ::>-»£ C [L + R 2 ] ::> K where R rs R = V 1 and V - arbitrary descriptions. This rule is of general applicability. It is used to take into consideration 'negative examples', or, in general, to maintain consistency. It is a basic rule for determining discriminant class descriptions. one or more rules (vi) The 'turning constants into variables ' rule r 7[p(a,Y)] ::> K 7[p(b,Y)] ::> K 7[p(i,Y)] ::> K C 7[p(x,Y)]::> K V where Y stands for one or more arguments of atomic function p. -24- x is a variable whose value set includes a,b,...,i. This is a rule of general applicability. It is the basic rule used in works on induction employing predicate calculus. Constructive Rules ; Constructive rules generate descriptions of the data rules in terms of certain new descriptors, and, therefore, are a form of generali- zation rules. They also can be viewed simply as rules which generate new descriptors ( 'metadescriptors ' ) . There can be very many such rules. We will restrict ourselves here to two examples. Some constructive rules are encoded as specialized procedures. (vi)the counting rule 7J attribute (P _)=A]. . . [attribute (P )=A] [attribute (P)4A] . . . . .[attribute^P^f a] |< V[#P-attribute -A=k] ::> K where P ,P_, . . . ,P , . . . ,P - are constants denoting, e.g., parts of an object attribute - stands for a certain attribute of Pj-s, e.g., color, size, tecture, etc. #P-attribute..-A - denotes a new descriptor inter- preted as the 'number of P-^-s (e.g., parts) with attribute equal A'. Example: V [color (P1)=RED] [color (P2)=RED] [color (P3)=BLUE] : : >K j< [//P -color- red=2] : :> K (The above is a generalization rule, because a set of oMects with any two red parts is a superset of a set of objects with two parts which are red and one part which is blue) -25- (viii) the generating chain properties rule If the arguments of different occurrences of the same relation (e.g., relation 'above', 'left-of, 'next 1 , etc.) form a chain, i.e., are linearly ordered by the relation, the rule generates descriptors relating to specific objects in the chain and computes their properties as potentially relevant characteristics. For example: LST-object - the 'least object', i.e., the object at the beginning of the chain (e.g., the bottom object in the case of relation 'above') MST-object - the object at the end of the chain (e.g., the top object) ith-object - the ith object of the chain. 5. 4 The preference criterion The preference criterion defines what is the desired solution to the problem, i.e., what kind of hypotheses are being sought. The question of what should be the preference criterion is a broad subject beyond the scope of the paper. We will, therefore, discuss here only the underlying ideas behind the presented approach. First, we disagree with many authors who seem to be searching for one universal criterion which should guide induction. Our position is that there are many di- mensions, independent and interdependent, on which the hypotheses can be evaluated. The weight given to each dimension depends on the ultimate use of the hypotheses. Among these dimensions are various forms of simplicity of the hypothesis (e.g., the number of operators in it, the quantity of information required to encode the hypothesis using operators from an a priori defined set [26], etc.), the scope of the hypo thesis, which relates the events predicted by the hypothesis to the events actually -26- observed (e.g., the 'degree of generalization' [12], the 'precision' [26]), the cost of measuring the descriptors in the hypothesis, etc. Therefore, instead of defining a specific criterion, we specify only a general form of the criterion. The form permits a user to define various specific criteria to the inductive program, which are appropriate to the application. The form, called a 'lexicographic functional' consists of an ordered list of criteria (of dimensions of hypothesis quality) and a list of 'tolerances' for these criteria [12, 23]. An important and somewhat surprising property of such an approach is that by properly defining the preference criterion, the same computer program can produce either the characteristic or dis- criminant descriptions of object classes. The characteristic description specifies the common properties shared by the objects of the same class (most work on induction considers only this type of descriptions, e.g., [2], [5], [18]), while the discriminant description specifies only the properties necessary for distinguishing the given class from all the other classes (Michalski [12, 27], Larcon [23]). 5.5 Arithmetic descriptors In addition to initial linear descriptors used in the data rules, new linear descriptors can be formulated as arithmetic functions of the original ones. These descriptors are formulated by a human expert as suggestions to the program. 6. OUTLINE OF ALGORITHM AND OF COMPUTER IMPLEMENTATION' In this section we outline the top level algorithm for rule induction and its implementation in the computer program INDUCE-1.1 ([23] [24] [25]). The algorithm is illustrated by an example. -27- INDUCE-1.1 is considered to be only an aid to rule induction. Its successful application to practical problems requires a cooperation between the program and an expert, whose role is to formulate data rules and the problem environment rules, define the preference criterion and other parameters, evaluate the obtained rules, repeat the process if desired, etc. 6. 1 Computer representation of VL decision rules Decision rules are represented as graphs with labeled nodes and labeled directed arcs. A label on a node can be: a) a selector with a descriptor without the argument list, b) a logical operation, c) a quantifier form 3 x or \/x) . Arcs link arguments with selectors or descriptors, and are labeled by 0,1,2,.., to specify the position of an argument in the descriptor indicated at the head of the arc (0 indicates that the order of arguments is not important) . Several different types of relations may be represented by an arc. The type of relation is determined by the label on the node at each end of the arc. The types of relations are: 1) functional dependence, 2) logical dependence, 3) implicit variable dependence, 4) scope of variables. Figure 2 gives a graph representing a VL ?1 expression. The two arcs connected to the logical operation (A) represent the logical dependence of the value of the formula on the values of the two selectors. The other arcs in the figure represent the functional dependence of f on x and x„, and g on x_ . [f « 1] 3x, [g - 2] VL Graph Structure: 3x x x^ff^.x^ - l][g(x 2 > - 2]) Figure 2 -28- 6.2. Outline of the Top Level Algorithm The implementation of the inductive process in the program INDUCE-1 was based on ideas and algorithms adopted from the earlier research on the generalization of VL expressions (Michalski [12,27] , and some new ideas and algorithms developed by Larson [23,24]. The top level algorithm (in somewhat simplified form) can be described as follows: 1. At the first step, the data rules (whose condition parts are in the disjunctive simple forms) are transformed to a new set of rules, in which condition parts are in the form of a -expressions . A c-expression (a conjunctive expression) is a product of selectors accompanied by one or more quantifier forms, i.e., forms QFx.. x_ ..., where QF denotes a quantifier. (Note, that due to the use of the internal disjunction and quantifiers, a c-expression represents a more general concept than a conjunction of predicates (used, e.g., in [18] [19]). 2. A decision class is selected, say K t and all c-expressions associated with this class are put into a set Fl, and all remaining c-expressions are nut into a set FO ( the set Fl represents events to be covered , and set FO represents constraints, i.e., events not to be covered ). 3. By application of inference rules (describing the problem environment), constructive generalization rules, and rules generating arithmetic descriptors (sec. 5. 5), new selectors are generated. The 'most promising' selectors (according to a certain criterion) are added to the c-expressions in Fland FO. 4. A c-expression is selected from Fl, and a set of consistent generalizations (a restricted star) of this expression is obtained. This is done by starting with single selectors (called 'seeds'), selected from this c-expression as the 'most promising' ones (according to the preference criterion). In each -29- subsequent next step, a new selector is added to the c-expression obtained in the previous step (initially the seeds) , until a specified number (parameter NCONSIST) of consistent generalizations is determined. Consistency is achieved when a c-expression has NULL intersection with the set FO. This 'rule growing 1 process is illustrated in fig. 3. 5. The obtained c-expressions, and c-expressions in FO, are transformed to two sets El and EO, respectively, of VL events (i.e., sequences of values of certain discrete variables) . A procedure for generalizing VL- descriptions is then applied to obtain the 'best cover' (according to a user defined criterion) of set El against EO (the procedure is a version of AQVAL/1 program [12]). During this process, the extension against 3 the closing the interval and the climbing generalization tree rules are applied. The result is transformed to a new set of c-expressions (a restricted star) in which selectors have now appropriately generalized references. 6. The 'best' c-expression is selected from the restricted star. 7. If the c-expression completely coveisFl, then the process repeats for another decision class. Otherwise, the set Fl is reduced to contain only the uncovered c-expressions, and steps 4 to 7 are repeated. The implementation of the inductive process in INDUCE- 1.1 consists of a large collection of specialized algorithms, each accomplishing certain task . Among the most important tasks are: 1. the implementation of the 'rule growing process' 2. testing whether one c-expression is a generalization of ('covers') another c-expression. This is done by testing for subgraph isomorphism. -30- o - a disgarded c-rule - an active c-rule - a terminal node denoting a consistent c-rule Each arc represents an operation of adding a new selector to a c-rule The branching factor is determined by parameter ALTER. The number of active rules (which are maintained for the next step of the rule growing process) is specified by parameter MAXSTAR. The number of terminal nodes (consistent generalizations) which program attempts to generate is specified by parameter NCONSIST. Illustration of the rule growing process (an application of "the dropping selector rule in the reverse order) Figure 3 -31- 3. generalization of a c-expression by extending the selector references and forming irredundant c-expressions (includes application of AQVAL/1 procedure) . 4. Generation of new descriptors and new selectors. Program INDUCE 1.1 has been implemented in PASCAL (for Cyber 175 and DEC 10); its complete description is given in [25]. 6.3. Example We will present now an example illustrating some of the features of INDUCE-1. 1. Suppose given are two sets of trains, Eastbound and Westbound, as shown in fig. 4. The problem is to determine a concise (logically sufficient) description of each set of trains, which distinguishes one set from the other (i.e., a discriminant description which contains only necessary conditions for distinguishing between the two sets) . As the first step, an initial set of descriptors is determined for describing the trains. Eleven descriptors are selected in total. Among them: • infront(car . ,car .) - oar. is in front of oar. J (a nominal descriptor) • length (car.) - the length of oar. (a. linear descriptor) • car- shape (car .) - the shape of oar. (a structured descriptor with 12 nodes in the generalization tree; see eqs. (13) and (14)) cont- load (car ., load .) - oar. contains load i J (a nominal descriptor) load -shape (load.) - the shape of load. (a structured descriptor) The value set: • circle • hexagon — • t ~ZZ^=5** polygon • triangle—— . rectangle -32- 1. EASTBOUND TRAINS o o o o o^ o^ o o v^ 7YX A>-i u o u un° * LOJ^SMLAJ^A^ D JL TT o 1 73 — cr O - O - T3 Jl 2. WESTBOUND TRAINS i. lA UoooU D n r^AULQjun n 3. lOJ- TST -IDg u u u u u ^v_ Jl o \aa^M^0aaaJ3^ n s- i □ □ u □ j-iEO Figure 4 -33- • nrpts-load(car ) - the number of parts in the load of oar (a linear descriptor) • nrwheels(car ) - number of wheels in oar (a linear descriptor) The data rules consist of descriptions of the individual trains in terms of the selected descriptors, together with the specification of the train set they belong to. For example, the data rule describing the second eastbound train is: Scar ,car ,car ,car , load ,load 9 , . . . [infront(car 1 ,car 2 ) ] [infront(car 2> car^]. . . [ leng th ( car )= long] A [car-shape(car i )=eneine][car-shaDe(car 2 )=TT-shaned]r C ont-load(car 2 ,load 1 )]A (12) [load-shape (load )=triangle] . . . [nrwheels(car ) ] . . : :>[class= Eastbound] Rules describing the problem environment in this case are only rules defining structures of structured descriptors (arguments of descriptors are omitted) : [car-shape=open rctngl,open trapezoid, U-shaped, dbl open rctngl]=* (13) [car-shape=open top] [car-shape=ellipse, closed rctngl, jagged top, sloping top]=*[car-shape=closed top] (14) [load-shape=hexagon, triangle, rec tangle ] => [load-shape=polygon] (15) The criterion of preference was to minimize the number of rules (c-expressions) in describing each class, and, with secondary priority, to minimize the number of selectors in each rule. Rules of constructive generalization included in the program are able to construct, among other descriptors, such descriptors as the length of a chain, properties of elements of a chain, number of objects satisfying a certain relation, etc. For example, from the data rule (12) , the constructive generalization rules can produce new selectors such as: * At this moment, before proceeding further, the reader is advised to look at the pictures and to try to solve this problem on his/her own. -34- [nrcars=4] - the number of cars in the train is 4 (the length of chain defined by relation infront) [nrcars-length-long=l] - the number of long cars is 1 (the engine) [nr-pts-load (last-car) =2]- the number of parts in the load of the last car is 2 [position(car )=i] - the position of oar, is i Suppose that eastbound trains are considered first. The set Fl contains then all c-expressions describing eastbound trains, and FC,all c-expressions describing westbound trains. The description e is selected from Fl (suppose it is the above description of the second eastbound train), and supplemented by 'most promising' metadescriptors generated by problem environment rules and constructive generalization rules. In this case, the metaselector [shape (last-car)=rectangle] is added to e. Next, a set G (a restricted star) of certain number (NCONSIST) of consistent generalizations of e is determined. This is done by forming a sequence of partial stars (a partial star may include inconsistent generalizations of e) . If an element of a partial star is consistent, it is placed into the set G. The initial partial star (P ) contains the set of all selectors of e . This partial star and each subsequent partial star is reduced according to a user specified preference criterion to the 'best' subset, before a new partial star is formed. The size of the subset is controlled by a parameter called MAXSTAR. A new partial star P. - is formed from an existing partial star P. in the following way: for each c-expression in P., a set of c-expressions is placed into P..,» each new c-expression containing the selectors of the original c-expression plus one new selector from e s which is not in the original c-expression. Once a sufficient number of consistent generalizations have been formed, a version of the AQVAL/1- program (Michalski [12]) is -35- applied to extend the references of all selectors in each consistent generalization. As the result, some selectors may be removed and some may have more general references. In the example, the best subset of selectors of e (i.e., the reduced partial star (P ) ) was: Bear., [car-shape (car )=U-shaped] (16) 3car[ car-shape (car )=open trapezoid] (17) 3car [ car-shape (car )= rectangle] (18) [car-shape (last-car) =rec tangle] (19) The last c-expression is consistent (has empty intersection with c-expressions in FO) and, therefore, is placed in G. From the remaining, a new partial star is determined. This new partial star contains a consistent generalization: 3car [car-shape(car )=rectangle] [length(car )=short] (20) which is added to G. Suppose G is restricted to have only two elements (NC0NSIST=2) . Now, the program AQVAL/1 is applied to generalize references of the selectors in c-expressions of G, if it leads to an improvement (according to the preference criterion) . In this case, a generalization of (20) produces a consistent and complete generalization: Scar., [car-shape (car- )=closed top] [length(car )=short] (21) (the generalization of (19), [car-shape(last-car)-polygon] , is not complete; it does not cover all Fl) . In this example, only 2 partial stars were formed, and two consistent generalizations were created. In general, a set of consistent generalizations is created through the formation of several partial stars. The size of each partial star and the number of alternative generalizations are controlled by user supplied parameters. -36- Assuming a larger value of NCONSIST, and applying the above procedure to both decision classes, the program INDUCE- 1.1 produced the following alternative descriptions of each set of trains: (The selectors or references underlined by a dotted line were generated by application of constructive generalization rules or problem environment rules) . Eastboud trains: Scar [length(car )=short] [car-shape(car 1 )=closed top] : :> [class=Eastbound] (the same as (21)). It can be interpreted: If a train contains a car which is short and has a closed top, then it is an eastbound train. 3car 1 , car 2 , load 1 , load 2 [ inf ront (car^ car 2 ) ] [cont-load (car , load ) J ,\ [ coat-load (car 2 , load^ ] [ load -shape (load )=triangle] A [load-shape (load 2 )=£olv£onJ : :> [class=Eastbound] (23) It can be interpreted: If« a train contains a car whose load is a triangle, and the load of th e car behind is polygon, then the train is eastbo und. Westbound trains: [nrcars=3] V acar 1 [car-shape [class=Westbound] ( 2 ^) ^car. [ nr cars- lengthy long=2] [Eosition^car-2 = 3j [shape(car 1 )=op_en-top,. jagged-top] ::> [class= Westbound] (25) It is interesting to note that the example was constructed with rules (23) and (24) in mind. The rule (22) found by the program as an alternative was rather surprising because it seems to be conceptually simpler than rule (23) . This shows that the combinatorial part of an induction process can be successfully handled by a computer program, and, therefore, programs like the above have a potential to serve as an aid to induction processes in various applied sciences. -37- 7. SUMMARY We have presented an approach to pattern recognition which views it as knowledge-guided computer induction. Let us briefly re- view the main advantages and limitations of this approach. Among the advantages are the generality of the method and the simplicity of interpretation of the pattern recognition rules. More specifically, the approach: takes into consideration three types of descriptors (nominal, linear and structured) and can use descriptors of different arity (variables, n-ary relations and functions) takes into consideration the properties of the inter- relationships of descriptors, characteristic to the recognition problem at hand gives thepossibility of defining (within limits) a pre- ference criterion, measuring the quality of the rules, that is most suited to the application has an ability to generate new descriptors ('metadescriptors' ) and blend them smoothly with the initial ones to provide a basis from which the final description chooses its most appropriate descriptors provides uniformity of the representation of initial and final descriptions (i.e., in terms of VL rules) and of inference and generalization rules permits the person stating the problem to suggest various arithmetic transformations of the original (linear) vari- ables which look promising as relevant characterization of obj ect classes • Among major limitations of the presented work is a quite limited form of expressing initial and final descriptions (i.e., in the form of a disjunctive simple VL_. expressions), and a restricted number of operators the program (implementing the approach) understands and uses -38- in inducing descriptions. Another limitation is that the program does not dif ferenciate among possible types of linear descriptors (e.g., ordinal, interval, ratio and absolute). Also, it does not take into consideration any probabilistic information, nor it is able to auto- matically search for appropriate algebraic transformations. These limitations do not seem, however, to be inherent to the approach. Also, the questions pertinent to the computational efficiency of algorithms used have not been investigated. ACKNOWLEDGEMENT The research presented here has been supported in part by the National Science Foundation Grant NSF MCS 76-22940. The author acknowledges the collaboration with James Larson of Rockwell International, Inc., in developing several ideas presented here and, in particular, his outstanding implementation of the first version of the program, INDUCE-1. Among many people who helped through discussions and through their interest in the work, the author would like to specially mention K. S. Fu, Donald Michie, Brian Gaines, Raj Reddy, Len Uhr, Larry Travis, A. B. Baskin and Tom Dietterich. •39- REFERENCES [ 1] Buchanan, B. G. , Mitchell, T. , Model-directed learning of production rules, Computer Science Depart ., Report No. STAN-CS-77-597 , Stanford University, March 1977 . [ 2] Winston, P. H., Learning structural descriptions from examples, Tech. Rep. AI TR-231, MIT AI Lab, Cambridge 1970. [ 3] Lenat, D. B. » AM: An artificial intelligence approach to discovery in mathematics as heuristic search, Computer Science Department, Report No. STAN-CS-76-570, Stanford University, July 1976. [ 4] Soloway, E. M. and Riseman, E. M.» Levels of pattern description in learning, Proceedings of the 5th International Joint Conference on Artificial Intelligence, August 22-25, MIT, 1977* [ 5] Simon, H. A., Complexity and the representation of patterned sequences of symbols, Psychological Review, Vol. 79, pp. 369-382, 1972 ." [ 6] Waterman, D. A. } Adaptive production systems, Working paper #285, Department of Psychology, Carnegie-Melon University, Pittsburgh, 1974 . [ 7] Gaines, B. R., Behavior/ structure transformations under uncertainty, Int. Journal on Man-Machine Studies, Vol. 8, pp. 337-365, 1976. [ 8] Shaw, D. E., Swartout, W. R. and Green, C. C, Inferring Lisp programs from examples, Proceedings of the 4th International Joint Conference on Artificial Intelligence, Vol. I, pp. 351-356, Tibilisi, September 1975 . [ 9] Feldman, J. A., Gips, J., Horning, J. J., and Reder, S. Grammatical complexity and inference, CS report No. 125, Computer Science Department, Stanford University 1969. [10] Brayer, J. M. , Fu, K. S.> Web grammars and their application to pattern recognition, TR-EE 75-1, School of EE, Purdue University, December 1975 • [11] Michalski, R. S.» A variable- valued logic system as applied to picture description and recognition, GRAPHIC LANGUAGES , edts. F. Nake and A. Rozenfeld, North-Holland 1972. [12] Michalski, R. S., AQVAL/1 — computer implementation of a variable- valued logic system and the application to pattern recognition, Proceedings of the First International Joint Conference on Pattern Recognition, Washington, D.E., October 30-November 1, 1973. -40- [13] Larson, James , A multi-step formation of variable-valued logic hypotheses , Proceedings of the Sixth Annual International Symposium on Multiple-Valued Logic at Utah State University, May 25-28, 1976. [14] Stoffel, J. C. , The theory of prime events :data analysis for sample vectors with inherently discrete variables, Information Processing 74, North-Holland Publishing Company, pp. 702-706, 1974 . [15] Morgan, C. G. , Automated hypothesis generation using extended inductive resolution, Advance Papers of the 4th I. J. Conf. on Artificial Intelligence, Vol. I, pp. 351-356, Tbilisi, Georgia, September 1975* [16] Plotkin, G. D. , A further note on inductive generalization. In Machine Intelligence 6, B. Meltzer and D. Michie, Eds., American Elsevier, New York, 1971, [17] Fikes, R. E., Hart, R. E. and Nilsson, N. J. Learning and executing generalized robot plans , Artificial Intelligence 3, 1972. [18] Hayes-Roth and McDermott, J. An interference matching technique for inducing abstractions, Communications of the ACM, No. 5, Vol. 21, pp. 401-411, May 1978* [19] Vere, S., Induction of concepts in the predicate calculus , Advance Papers of the 4th I. J. Conf. on Artificial Intelligence, Vol. I, pp. 351-356, Tbilisi, Georgia, September 1975. [20] Zagoruiko, N. G. , Xskustviennui intellekt jL empiricheskoie predskazanie , Novosibirsk! j Gosudarstviennyi Unversitiet, 1975. [21] Hedrick, C. L., A computer program to learn production systems using a semantic net, Ph.D. thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, July 1974 » [22] Larson, J., Michalski, R. S. t Inductive inference of VL decision rules, Proceedings of the Workshop on Pattern-Directed Inference Systems, Honolulu, Hawaii, May 23-27, 1977, SIGART Newsletter, No. 63, June 1977* [23] Larson, James, Inductive inference in the variable-valued predicate logic system VL 2 : methodology and computer implementation, Ph.D. Thesis, Report No. UIUCDCS-R-77-869, Department of Computer Science, University of Illinois, Urbana, May, 1977. -41- [24] Larson, James, INDUCE-1: an interactive inductive inference program in VL 21 logic system, Report No. UIUCDCS-R- 77-876, Department of Computer Science, University of Illinois, Urbana, May, 1977 . [25] Dietterich, T., INDUCE 1.1 - the program description and a user's guide, Internal Report, Department of Computer Science, University of Illinois, Urbana, July, 1978. [26] Coulon, D., Kayser, D., Learning criterion and inductive behaviour, Pattern Recognition, Vol. 10, No. 1, pp. 19-25, 1978. [27] Michalski, R. S., A system of programs for computer-aided induction: a summary , 5th International Joint Conference on Artificial Intelligence , MIT, Boston, Massachusetts, August, 1977. BLIOGRAPHIC DATA IEET 1. Report No. UIUCDCS-R-78-927 2- 3. Recipient's Accession No. "i Title and Subtitle PATTERN RECOGNITION AS KNOWLEDGE-GUIDED COMPUTER INDUCTION 5- Report "Bate June 15. 1978 6. ' Author(s) Ryszard S. Michalski 8. Performing Organization Rept. No. || Performing Organization Name and Address Department of Computer Science University of Illinois at Urbana -Champaign Urbana, IL 61801 10. Project/Task/Work Unit No. 11. Contract /Grant No. NSF MCS 76-22940 J. Sponsoring Organization Name and Address National Science Foundation Washington, D.C. 13. Type of Report & Period Covered 14. 5. Supplementary Notes stracts The determination of pattern recognition rules is viewed as a problem of computer induction, under the guidance of generalization rules and rules representing knowledge of the recognition problem at hand. The paper formulates the underlying theory for generalization and optimization of descriptions of object classes, express- ed in the form of decision rules. The language for formulating descriptions is an extension of the first order predicate calculus, called the variable-valued logic calculus VL ?1 . VL91 contains several new syntactic forms, specially oriented for expressing Inductive processes. The presented approach uniformly combines descriptors (variables, predicates, functions) of three different types: nominal, linear and structured, and has an ability to generate new descriptors not used in the initial data rules. 7. Key Words and Document Analysis. 17a. Descriptors pattern recognition, decision rules, generalization techniques, computer induction, knowledge-acquisition, learning from examples, many-valued logic, computer inference, computer consulting systems, theory formation, inductive inference 7b. Identifiers /Open-Ended Terms 7c. COSATI Field/Group 8. Availability Statement 19. Security Class (This Report) UNCLASSIFIED 21. No. of Pages 20. Security Class (This Page UNCLASSIFIED 22. Price ORM NTIS-35 ( 10-70) USCOMM-DC 40329-P7! .1111 so \m