Digitized by the Internet Archive in 2013
http://archive.org/details/inductiveinferen869lars

UIUCDCS-R-77-869
UILU-ENG 77 1725

Inductive Inference in the Variable Valued Predicate Logic System VL_21: Methodology and Computer Implementation

by James B. Larson

May 1977

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois

This work was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois and was supported in part by the National Science Foundation, Grant No. NSF MCS 74-03514.

INDUCTIVE INFERENCE IN THE VARIABLE VALUED PREDICATE LOGIC SYSTEM VL_21: METHODOLOGY AND COMPUTER IMPLEMENTATION

James Burton Larson, Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign, 1977

A formal methodology and computer program are presented for the transformation of a set of user supplied logical decision rules into a new, generalized set of decision rules which is near optimal according to a user supplied criterion. The VL_2 logic system (a multi-valued version of a first order predicate calculus) is used as the framework for defining and expressing decision rules and transformations on decision rules. The program INDUCE_1, which implements certain inductive inference rules using a graphical representation of VL_2 expressions, is described, and some examples of inductive problems solved by the program are given.

ACKNOWLEDGEMENTS

I would like to express my gratitude to the following people for their help on this thesis: First, Professor R. S.
Michalski for his inspiration, challenging problems, and many significant suggestions, especially regarding meta-functions; my committee, Professor D. Plaisted and Dr. Don Friesen, for encouragement and many helpful discussions. I especially wish to recognize the valuable support of Richard Chilausky during the early stages of this research, Barr Segal for proofreading of the manuscript, and the many predecessors in the area of inductive inference and variable valued logic whose efforts have been much appreciated in the development of this thesis. I am grateful for the financial support of the National Science Foundation, the Department of Computer Science, the Research Board of the University of Illinois, and the Computing Services Offices, the latter for supplying the computer time to implement the text formatter used for this paper. Finally, without the moral encouragement of my wife, Rhonda, this work surely would have ended years ago from utter frustration and discouragement.

PREFACE

This paper was prepared on a CYBER 175 computer at the University of Illinois using a text formatter written by the author. Since several special characters were not available on the print train of the printer, the following character combinations have special meaning:

Character   Meaning
v           logical disjunction
&           logical conjunction
&_          set intersection
E           the existential quantifier
A           the universal quantifier

TABLE OF CONTENTS

1. Introduction
   1.1 The Problem
   1.2 Previous Specific Applications
   1.3 Formal Systems for Inductive Inference
   1.4 Overview of the Following Chapters
2. Representing Decisions in the VL_2 System
   2.1 VL System Structure
   2.2 Selector Formation and Interpretation Rules
   2.3 VL Formation Rule
   2.4 Interpretation Rules
   2.5 VL Decision Rules
3. VL Transformation Rules
   3.1 Equivalence Transformation
   3.2 Generalizing Transformations
   3.3 Specializing Transformations
   3.4 Transformation Rules Involving the Decision Part
   3.5 Example of Application of Transformation Rules
   3.6 Efficient Application of Generalization Rules
4. Computer Representation of VL Decision Rules
5. Algorithms and Computer Implementation
   5.1 Input to the Program
   5.2 Program Output
   5.3 Formation of a Complete Generalization
   5.4 Determine Cover and Intersection of 2 Formulas
   5.5 Trimming a Set of c-formulas
   5.6 Formation of a Set of Consistent Generalizations
   5.7 Extending the References of a Consistent c2-formula
   5.8 Adding New Functions and Predicates to c-formulas
6. Examples of Decision Rule Generation using INDUCE_1
   6.1 Figures Example (EX 1)
   6.2 Arch Example (EX 2)
   6.3 Trains (EX 3)
   6.4 Textures (EX 4)
7. Current Limitations and Possible Extensions
LIST OF REFERENCES
APPENDIX A
APPENDIX B
VITA

1. Introduction

An important problem which is often presented to computer systems is that of extracting relevant information from complex data in order to gain a better understanding of the meaning behind such data.
Most current methods are incapable of adequately describing highly structured situations and produce results which are difficult to interpret. A selection of those systems which overcome these difficulties is given in sections 1.2 and 1.3. The following chapters deal with finding useful (as defined by the user with an optimality criterion), generalized information about sets of situations represented as logical VL_21 decision rules. A decision rule is a form

    CONDITION => DECISION

where CONDITION describes some set of situations and DECISION describes some new situation or action which is indicated if an existing situation satisfies the description in CONDITION. If no situation satisfies the CONDITION in a decision rule, the rule makes a NULL decision. The descriptions in CONDITION and DECISION are represented in the VL_21 logic system. This system is a variable valued first order predicate calculus with a rich set of operators and the facility for allowing user defined domain sizes and structures for variables and functions appropriate for the problem at hand.

The approach taken here is to apply inductive inference rules to logical decision rules which express some decisions made with sets of situations, in order to form new, near-optimal decision rules which retain the decision making capabilities of the original rules.

1.1 The Problem

The specific induction problem being investigated is as follows: Given a set of decision rules:

    C_{1,1} => D_1 ;  C_{1,2} => D_1 ;  ...  C_{1,t1} => D_1
    C_{2,1} => D_2 ;  C_{2,2} => D_2 ;  ...  C_{2,t2} => D_2        (1.1)
    ...
    C_{n,1} => D_n ;  C_{n,2} => D_n ;  ...  C_{n,tn} => D_n

where C_{i,j} and D_i are expressions in the VL_21 system which represent the CONDITION and DECISION parts of decision rules respectively, then find, through an application of generalization rules, a set of VL_21 decision rules:

    C_1 => D_1
    C_2 => D_2        (1.2)
    ...
    C_n => D_n

which are, with regard to the rules (1.1):

1) consistent
2) complete
3) optimal with regard to a user supplied optimality criterion.
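The decision-rule form introduced above, together with the consistency and completeness requirements spelled out in the paragraph that follows, can be sketched over a small finite set of situations. This is only an illustrative sketch and not the thesis's INDUCE_1 program: situations are modelled as Python dicts, conditions as plain predicates, and the names `decide`, `is_consistent`, `is_complete`, and the toy colour/size rules are all assumptions made for the example.

```python
from itertools import product

NULL = None  # stands for the NULL decision (no CONDITION is satisfied)

def decide(rules, situation):
    """Return the DECISION of the first rule whose CONDITION holds, else NULL."""
    for condition, decision in rules:
        if condition(situation):
            return decision
    return NULL

def is_consistent(new_rules, initial_rules, situations):
    """Wherever the new rules decide, the initial rules agree or decide NULL."""
    for s in situations:
        new, old = decide(new_rules, s), decide(initial_rules, s)
        if new is not NULL and old is not NULL and new != old:
            return False
    return True

def is_complete(new_rules, initial_rules, situations):
    """Wherever the initial rules decide, the new rules also decide."""
    return all(decide(new_rules, s) is not NULL
               for s in situations
               if decide(initial_rules, s) is not NULL)

# Hypothetical toy universe: every combination of colour and size.
situations = [{"color": c, "size": n}
              for c, n in product(["red", "blue"], [1, 2, 3])]

# Initial rules in the shape of (1.1): several conditions per decision.
initial_rules = [
    (lambda s: s["color"] == "red" and s["size"] == 1, "D1"),
    (lambda s: s["color"] == "red" and s["size"] == 2, "D1"),
    (lambda s: s["color"] == "blue" and s["size"] == 3, "D2"),
]

# Candidate generalized rules in the shape of (1.2): one condition per decision.
generalized_rules = [
    (lambda s: s["color"] == "red", "D1"),
    (lambda s: s["color"] == "blue", "D2"),
]

print(is_consistent(generalized_rules, initial_rules, situations))  # True
print(is_complete(generalized_rules, initial_rules, situations))    # True
```

Enumerating situations is only feasible for toy domains; it is used here to make the two definitions concrete, not as a suggestion of how the checks are carried out in practice.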
The new rules are consistent if, for any situation for which the new rules assign a decision (a non-NULL decision), the initial rules assign the same decision or a NULL decision. They are complete if, for any situation for which the initial rules assign a decision, the new rules assign a decision.

From the initial rules, it is usually possible to derive many sets of rules which are consistent and complete. Therefore, a criterion of optimality (defined by a user according to his problem) is used to select a few alternatives which are most desirable according to the specific induction problem. Attention is restricted to sets of rules which make only one decision for a given situation.

1.2 Previous Specific Applications

Inductive inference is used here to describe a formal method for rewriting or generalizing available data in order to give new information about a problem and make new decisions which could not be obtained before.

Statistical methods are probably the most widely used forms of inductive inference. These methods require a great deal of a-priori knowledge including the availability of a large set of data, knowledge about the interdependence of variables on each other, and an understanding of the type of underlying distribution of the data [Croft 71]. In addition, statistical results may be difficult to read and interpret [Larson 76] (e.g. a conditional probability matrix).

The first approach to automated inductive inference using logic was most likely developed by Hunt [Hunt 66]. He described a number of different schemes for generating decision trees which can be used to distinguish between sets of letter sequences. Although a decision tree produces an elegant procedure which can be easily executed on a computer, it lacks the flexibility necessary to represent more general concepts.

The HEURISTIC DENDRAL program [Buchanan et al. 69] provides a model appropriate for representing the structure of chemical compounds and some transformations representing possible chemical reactions which can be applied to the compound representations under certain known physical constraints. The program finds a set of possible structures of a compound, knowing its empirical formula and mass spectrometer data, by suggesting various structures for the compound and applying transformations to the structures under the guidance of a set of heuristics based on the mass spectrometer data. The meta-DENDRAL program [Buchanan et al. 72] finds a general mechanism or theory which explains the transformations which take place, relying on the knowledge of those transformations which are plausible and those which are forbidden.

Computer aided medical diagnosis is another area in which logical inductive inference methods have been suggested. Pople et al. [Pople et al. 72] have suggested a graph structure representation of biomedical facts and an approach to forming theories by finding common subgraphs using user supplied suggestions.

Of particular note is work in the area of computer aided medical diagnosis and plant pathology [Michalski 73, 74, Chilausky et al. 76, Larson 76], which uses the VL_1 system (a variable valued logic system which is the precursor to the VL_2 system used here) and the program AQVAL/1-AQ7 to infer descriptions of classes of liver diseases and soybean diseases. The latter work is a specific application of the VL_1 logic system to several problems.

Winston [Winston 70] demonstrates specific procedures which discover descriptions from examples in the toy blocks world. A description which is discriminant (i.e. can be used to distinguish one object or set of objects from other sets) is formed by matching the similar parts of the object under consideration with another object (near-miss) and then isolating the structures which are different between the two objects.
This differs from the approach taken in the following chapters in that matching is only done here in order to adequately describe some specific feature which distinguishes between two objects. (For example, to specify the second part from the top in an object, one may have to include some predicates which define the second from the top in terms of other descriptors. If the distinguishing feature involves this part, then the definition of the part used in the description must be common to both objects.) Winston also uses modifiers such as 'must', 'may', 'must not' in descriptions. A form of these modifiers is inherent in the VL_2 approach (e.g. discriminant descriptions involve the 'must' modifier; descriptive descriptions involving only one set of objects yield descriptions involving a type of 'may' modifier).

The program ARITHMETIC [Bongard 70] finds an algebraic rule which explains sample relationships. The program is given sets of tables, each table containing 3-tuples of a ternary relation. A set of 33 predicates is used in the program (although not explicitly given in the reference); for example, one predicate may be true if the quotient of the first two elements of the 3-tuple is positive. For each row of a table, a set of features is generated by finding Boolean combinations of the predicates applied to the row. A feature describing the table is generated for each Boolean combination by finding the product of Boolean combinations for all rows. The set of features which appear to be most useful in distinguishing one table from the others is selected as the description of each table.

1.3 Formal Systems for Inductive Inference

A criticism of some of the above endeavors is that they are problem specific.
More general systems which have many possible application areas have emphasis in two types of approaches: 1) generation of descriptions of sets of objects represented in a logic system of some kind, and 2) creation of new concepts in a sequential manner by generating and modifying hypotheses. A summary of several types of learning systems can be found in [Banerji 75].

Of particular interest here is the work of Morgan [Morgan 72] in which a formal system based on the first order predicate calculus with falsehood preserving transformations is presented. Briefly, the idea that inductive inference can be described as backwards reasoning is apparently not sufficient for a practical system. For example, if E_2 was derived from the assertions ¬E_1 and E_1 v E_2, then backward reasoning would somehow have to generate the two assertions above given only E_2. There are far too many expressions E_1 which can be applied to a situation such as this to yield a practical system. Instead, Morgan defines a falsehood preserving transformation of a deductive inference rule into an inductive inference rule (in Morgan's notation):

    E_1 |-_F E_2

where E_2 is false for every interpretation in which E_1 is false. A set of transformations (F-rules) can be created to convert a deductive logic system into an inductive logic system (e.g. if E_1 and E_2 are atomic expressions, then

    E_1 v E_2 |-_F E_1 E_2

i.e., from the disjunction E_1 v E_2, one may infer the conjunction E_1 E_2 while preserving falsehood). With this system, theorem proving techniques using falsehood preserving rules may be applied to inductive problems. (In later chapters, the symbol |< is used to denote such a 'generalizing' transformation.)

A number of authors have presented systems which use a graph structure representation of expressions in a type of logic system.
The 'parameterized structure representation' of Hayes-Roth [Hayes-Roth 76] is used in inductive tasks which learn, from examples, descriptions of sets of objects and transformations from one set of objects to another set of objects. The problems addressed are closely akin to the work in the following chapters of this paper, one of the objectives being to find the common properties of all examples of one class which are available to the system using a graphical representation. The number of possible alternatives in Hayes-Roth's method is limited by a fixed utility function which evaluates intermediate results and discards hypotheses of low utility. The work differs from that presented in the following chapters in that only independent descriptions are sought using a fixed utility criterion (these are called descriptive descriptions in chapter 6). The structure used for representation does not take into account specific domain structures which are inherent in the VL_2 system, and there is no facility for generating new descriptors within the system.

A formal approach using the predicate logic system [Vere 75] produces the largest common set of descriptors of a set of examples by representing examples using a graph structure and finding the largest common subgraph of these structures. Such methods suffer from the NP-complete nature of subgraph isomorphism problems (this problem is addressed in the following chapters by finding the smallest useful subgraphs of graphs of examples instead of the largest subgraph). Neither Vere nor Hayes-Roth uses negative examples heavily in their implementations.

Hedrick [Hedrick 74] uses a semantic net to represent examples and to build and modify hypotheses as new examples are given to the program. The semantic net supports only binary relations but otherwise looks similar to the graph structure of Vere and Hayes-Roth.
Kochen [Kochen 74] presents a different type of system with a set of initial events containing state variables, actions and relations between the actions, and a learning program which applies certain transformations to events at various time steps. At each time step, weighted hypotheses are formed which reduce the set of states stored in memory (i.e. those states which are explained by the hypotheses).

Production systems provide a rich tool for the introduction of many inference techniques (recall that VL decision rules are similar to production rules). Briefly, a production system architecture contains a working memory, a set of productions which modify working memory, and a recognize-act cycle with conflict resolution to dictate the order in which productions are applied to memory and to add new productions when necessary. Waterman [Waterman 70, 74, 75] uses these to solve several problems. A program which plays poker has been developed [Waterman 70] which designs betting strategies in terms of production rules. More recently [Waterman 75], the approach has been applied to recognizing letter sequences with success. Rychener [Rychener 76] has applied production systems to chess end games and natural language input of a toy blocks world. These two authors use distinct production system architectures which differ in the ordering of working memory, the ordering of productions, and the way in which new productions are added to an existing set of productions to correct errors made by the system.

As a final note with regard to production systems, the MYCIN system [Shortliffe 74] has been shown to be a very powerful tool in aiding physicians with regard to antimicrobial therapy selection. In the MYCIN system, deductive inference using a multi-valued truth model is applied to expert supplied productions and a data base consisting of a patient record.

1.4 Overview of the Following Chapters

The VL_21 system (a subset of the VL_2 system) is described in chapter 2. The subset used here uses only the truth values TRUE, FALSE, and UNKNOWN instead of the multi-valued truth domain of the VL_2 system. Also, only a very small subset of the operators available in VL_2 is used.

The inductive rules used to transform initial decision rules (1.1) into new decision rules (1.2) are given in chapter 3. The rules involve selecting the most significant features of the initial condition (C_{i,j} in 1.1), extending the value set which each feature may assume under the domain structure constraints, and adding new global functions which describe certain characteristics of the condition (C_{i,j}).

Chapter 4 contains a graphical representation of VL rules. A subset of the graph structure is used in the computer program INDUCE_1 described in chapter 5. The program accepts as input:

1) a set of decision rules representing certain examples of sets of decisions,
2) a problem environment description in the form of VL decision rules which describes certain characteristics of functions and domains which arise from the particular application, and
3) a set of control parameters which supply the optimality criterion and certain parameters which limit the number of alternatives generated at various points in the program.

The output from the program contains a set of complete, consistent decision rules. Chapter 6 gives results of the program as applied to some specific situations. Chapter 7 describes some limitations and possible extensions of the program.
Two appendices are given: appendix A, providing a listing of the detailed output of the program applied to one example of chapter 6, and appendix B, giving a brief review of the precursor to this work (the program AQVAL/1-AQ7) which is used as a procedure in INDUCE_1.

2. Representing Decisions in the VL_2 System

Much of the information in this chapter is found in [Larson et al. 77, Michalski 74b]. It is included here to give the reader a familiarity with the VL_2 system. The complete VL_2 system contains a very rich set of operators and domains. A subset of VL_2 called VL_21, which contains a basic set of operators and domains, is used here. Only the VL_21 system is described, with notes indicating the extensions which are possible in the full VL_2 system. In later chapters, the notation VL_2 is used to refer to the system VL_21.

The logic system VL_21 is a language for describing situations (e.g. objects, classes of objects) and expressing decision and inference rules. The language provides for a compact expression of descriptions which is both easily readable and sufficiently precise to facilitate formal manipulation (possibly by a computer).

There are two major differences between VL_21 and the first order predicate calculus:

1. Instead of predicates, selectors are used, which can be viewed as tests for membership of values of predicates and functions in a certain set.

2. Each variable, predicate and function symbol is assigned a domain (or value set) together with a characterization of the structure of the domain. (This feature facilitates the process of rule generalization and allows for the application of different generalization transformations according to the structure of the domain.)

There are three types of domains currently distinguished:

1. Unordered or Nominal
   Elements of the domain are considered to be independent entities; no structure is assumed to relate them. A variable or function symbol with this domain is called nominal or cartesian (e.g. blood type, names of objects, etc.).

2. Linearly Ordered or Interval
   The domain is a linearly ordered set. A variable or function symbol with this domain is called interval (e.g. military rank, temperature, size).

3. Tree Ordered
   Elements of the domain are ordered into a tree structure. A predecessor node in the tree represents a concept which is more general than the concepts represented by the descendent nodes (e.g., the predecessor of the nodes 'triangle', 'rectangle', 'pentagon' may be a 'polygon'). A variable or function symbol with such a domain is called structured.

2.1 VL System Structure

The VL_21 system used is a 5-tuple (V, F, S, R, I), where:

V - is a set of variable symbols. Each variable symbol is associated with a domain D(x_i). A group of variables which have the same domain are labelled with the same variable symbol but a different subscript (e.g. x_1, x_2, ..., x_k, y_1, y_2, ..., y_m are specifications of variables in two variable groups which assume values from two domains denoted D(x) and D(y) or, alternatively, D(x_i) and D(y_i)).

F - is a set of n-ary function and predicate symbols. Each n-ary function symbol represents a mapping from an argument space into a domain. For a function f(x_1, x_2, ..., x_k), this is a mapping:

    D(x_1) x D(x_2) x ... x D(x_k) -> D(f)

    where D(x_1), D(x_2), ..., D(x_k), D(f) represent the domains of the variables x_1, x_2, ..., x_k and the domain of the function f, respectively. A predicate is a function whose domain is the set [TRUE, FALSE]. Included in the domains of all function and variable symbols is the value NA (not applicable).

S - is a set of symbols including: ( ) [ ] = < > -> <-> v => & v_ E E. A A. , .

R - is a set of formation rules described in section 2.3.

I - is a set of interpretation rules described in section 2.4.

2.2 Selector Formation and Interpretation Rules

A well formed VL_21 formula (wff) is composed of quantifier forms, selectors, and logical connective symbols.
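As background for the selectors defined next, the three domain structures described earlier in this chapter (nominal, interval, tree ordered) can be pictured as a small data structure. The sketch below is an illustration under stated assumptions, not the representation used in the thesis's program: the class name `Domain`, its fields, the helper methods, and the concrete value lists are all hypothetical; only the blood type, size, and polygon examples come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Domain:
    name: str
    structure: str                 # "nominal", "interval", or "tree"
    values: list                   # for an interval domain: in ascending order
    parent: dict = field(default_factory=dict)  # tree only: node -> predecessor

    def between(self, a, b):
        """Interval domains: all values from a through b (inclusive)."""
        i, j = sorted((self.values.index(a), self.values.index(b)))
        return self.values[i:j + 1]

    def ancestors(self, value):
        """Tree domains: the value followed by its chain of predecessors."""
        chain = [value]
        while chain[-1] in self.parent:
            chain.append(self.parent[chain[-1]])
        return chain

    def common_predecessor(self, a, b):
        """Tree domains: the nearest node that is at or above both a and b."""
        seen = set(self.ancestors(a))
        for node in self.ancestors(b):
            if node in seen:
                return node
        return None

# Nominal: no structure relates the values.
blood_type = Domain("blood_type", "nominal", ["A", "B", "AB", "O"])

# Interval: the order of the list is the linear order (values are assumed).
size = Domain("size", "interval", ["small", "medium", "large"])

# Tree ordered: 'polygon' is the predecessor of the three specific shapes.
shape = Domain(
    "shape", "tree",
    ["triangle", "rectangle", "pentagon", "polygon"],
    parent={"triangle": "polygon", "rectangle": "polygon", "pentagon": "polygon"},
)

print(size.between("small", "large"))                     # ['small', 'medium', 'large']
print(shape.common_predecessor("triangle", "pentagon"))   # polygon
```

Keeping the structure tag next to the value set is one way to let later processing choose a different treatment for each kind of domain, which is the role the text assigns to domain structure.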
A selector is a form: fL # R] or [L» ] where 18 L, L* - each called the referee ace atomic forms. An atomic form is a variable symbol or a function or predicate symbol followed optionally by a list of atomic forms enclosed in parentheses. In the above forms, the atomic form L' must be a predicate symbol with arguments following in parentheses. If L contains a function symbol, then the related function is called an atomic fanctioQ. R - the reference is a set of values in the domain of the atomic function of L. B may be in several forms: Reference Example Description a a constant in the domain of the atomic function of L a,b a list of values in the domain of L separated by commas a..b a pair of values in the domain of L separated by * the symbol (*) representing 19 all values in the domain of L (except Nl) NA the value NA (not applicable) # - is one of the symbol combinations If R is a set of values, then L is related to R by # if when # is = or -»= L has a value (does not have a value) in the set R when # is <= >= < > L has a value related to every value of R by #• The selector is interpreted as a unit of information about a situation with value or truth-status TRUE if the relation R # L holds or FALSE if the relation does not hold, or UNKNOWN in which case the selector is interpreted as a question about the situation which must be answered in order to determine if the selector is satisfied. If some 20 variables in the atomic form of the selector are quantified, these quantifiers must he considered when determininq the truth-status of a selector* If R is *, then I is related to B for any value of L except HA (in this case, # is always =) . Below are some examples of a selector : Selector Interpretation: truth-status TRUE [ color (wall ) = white] The color of the wall represented by wall is white. 1 [lenqth(box ) >1 ] The lenqth of the box represented by box is qreater than or equal to 1. [box - 2 ..5 1 The variable box may have a value 1 1 between 2 and 5 inclusive. 
The values of box may represent various boxes in a situation. The selector restricts the ranqe of values of the variable box to the 1 values 2 throuqh 5. [ ontop (x ,x ) ] The part represented by x is on top of the part represented by X i 2 21 2. 3 VL Formation Rule Formulas in the ?L logic system are used to describe 2 situations, and also to express decision rules and inference rules. The VL formulas are defined by the following formation rules: 1. A selector is a ?L formula (wf f ) • 2* If V, V and V are wff- then so are: 1 2 (V) a formula in parentheses -•V inverse V & V or V V con-junction (the symbol & is 12 12 used to represent conjunction) V v V disjunction 1 2 V v V exclusive disjunction 1 _ 2 V — -^/V exception 12 V -> V V implies V 12 1 2 v <-> V V is eguivalent to V 12 1 2 22 Ex ,x ,...,x (V) Existent ially quantified formula 12 k (E is used to represent the existential quantifier)* E. x ,x ,...,x (v) Distinctly existential ly quantified formula Ax ,x ,***,x (v) Universally quantified formula ~ 1 2 k (A is used to represent the universal quantifier). Not all of these forms are considered in the following chapters. In chapter 3, (VL inference rules) only conjunction, disjunction, and quantifiers are considered* Chapter 4 (Graph Representation) presents a graph structure representation vhich includes all of these forms but the types of formulas actually included in the algorithm and the implementation involve only conjunction and distinct existential quantification* 2.4 Interpretation Rules A VL formula may have truth-status TRUE, FALSE, or unknown. In the full VL system, a truth- status domain with 2 interval structure may be defined, but here only the values 23 above are considered* The connectives (-• v 6) are interpreted in the normal manner: VL formula Interpretation -V PALSE if ? is TEUE, TRDE if V is FALSE, UNKNOWN if V is UNKNOWN. V v V TRUE IF EITHER V OR V IS TRUE, 12 12 UNKNOWN if both ? 
and V are 1 2 UNKNOWN or one is UNKNOWN and the other FALSE, TRUE otherwise. V & V UNKNOWN if both V and V are 12 12 UNKNOWN or one is true and the other is UNKNOfcN, TRUE if both V and V are TRUE, FALSE 1 2 otherwise. The remaining connectives may te rewritten in equivalent forms: 24 VL Formula Equivalent form V -> V -V v ? 12 12 V <-> V (V -> ? ) S (V -> V ) 12 12 2 1 V v^ V ? ^.V 12 12 V v V V v ?^ , ? V 1 " 2 1 2^^ 1 2 A VL system is used to describe a set of situations* In order to effectively apply a formula to a set of situations, the VL system should contain variables, 2 functions, and predicates vhich adequately characterize the situations. To determine the truth-status of a formula with reqard to a specific situation, an event is created (an event may be viewed as an interpretation of a situation in the VL system) • An event is a sequence of assignments to variables, functions and predicates in the system which characterize a specific situation. Quantified variables nay be assigned a set of values. One function assiqnment vay be made to a given set of values of arguments if the value of the function is known. If a function does not have an assignment for a given set of values, then the value NA (not applicable) is assumed. 25 A selector [L # R] (or [ L • ]) is satisfied by an event if there is a set of assignments to variables and functions (or predicates) in L (or L*) such that L is related to R by # (or L 1 has the value TRUE) • A VL formula is satisfied by an event if it has 2 truth status TRUE when applied to the event. The quantified formulas are interpreted: The truth status of Ex ,x ,«..,x (V) is TRUE (or FALSE) in a given "12 n situation if there exists (or does not exist) values for x ,x ,...,x in the event 12 n assignments which makes the truth-status of the formula V egual to TRUE ? 
- if it is not known whether there exist values ••• E.x ,x ,...,x (V) is TRUE (or FALSE) in a given ""12 n situation if there exists (or does not exist) distinct (different) values for x ,x , ...,x in the event 12 n 26 assignments which makes the troth-status of the formula equal to TRUE. This obviates the need for extra predicates in an expression like x -»=x , 1 2 x -»=x , x -»=x # etc. 2 3 13 ? - if it is not known whether there exist values ••• Ax ,x ,...,x (V) is TROE (or FALSE) in a given 12 n situation if for all assignments to the variales x ,x ,...,x , 1 2 n the formula V has truth-status equal to TRUE. A VL formula is a description of a situation if every event which can be derived from the situation satisfies the VL formula and every event which satisfies the formula 2 is also an interpretation of the situation. 2*5 VL Decision Rules If V and V are VL formulas, a general form of a VL 12 2 decision rule is V => V (2.5.1) 1 2 27 The formula V is called the cond it ion part and V is the decision part. A restricted form of the VL decision rule will be used in the following chapters. (In the computer implementation, the formula V is assumed to be a product of selectors which contain 0-ary functions in the referee. The terminology is relaxed in this case to allow the function symbol which appears in the decision part to be called a decision variabl e. ) A decision rule in the form 2.5.1 may be applied to a set of situations as follows: If the condition part of the decision rule (V ) is given truth-status TRUE, then the decision part of the decision rule (V ) also assumes the 2 truth-status TRUE. For each event (assignment e:=L) which satisfies V , a new set of assignments are made to the event 1 using the decision part of the rule to form the set of all events which satifsfy the conjunction V 6 V • For example, 12 given the decision rule: E. 
Xi ,X 2 [p(x ,x )] => [D=1] (2.5.2) and the event e: v =0,1; ,,-0.1 p,0.1),.«0« (2.5.3, with variables functions and predicates: x, D, p, g with 28 domains D(x)=[0,1], D(D)*[0,1], D (p) ,D (g) =[ IHIJE, FALSE ], anew assignment is made to e to give: e: x :=0,1; x :=0,1; p (0,1) :=TRUE; D:»1. (2,5. 4) 1 2 A decision rale: E.x i# x 2 CPU #x ) ] => I»" 1 »* 2 [9(^»» 2 ) ] (2.5.5) applied to the event 2.5.3 gives one of the two new events: e : x :-0,1; x :=0,1; p (0,1) :=TR0E; g (0, 1) :=TRUE; e 2 : ,,-0,1, x^O.1: p ,0.1, :=TRDE; , ,1.0, :=TE0E; Note that q(1,1) and g(0,0) are not given status TBUE since the quantifier E. insists that the tvo variables x and x 12 have different values. Given a set of decision rules each in the form 2.5.1, the set may be applied to a set of events. Initially, the condition parts of all decision rules have value UNKNOWN. If an event satisfies the condition part of a decision rule, nev assignments are made to the event according to the decision part of the satisfied rule and the condition part of the rule returns to truth status UNKNOWN. In the remaining chapters, events are only used as a formal basis for defining certain concepts. Since the number 29 of events necessary to completely describe a situation is quite large, only the VL formulas themselves are manipulated by the algorithms. 30 3« VL Transformation Bales From one set of decision roles (1. 1) , a new set of decision roles (1*2) is obtained by applying certain transformation roles (t-rules) • For now, we will restrict oar attention to roles which transform the condition part of a role. These t-rules may be grouped into three types of roles based on the events which satisfy the condition part* Given two roles: H : V => D 1 1 H : V => D 2 2 V is more general than V if every event satisfying ¥ also 1 2 2 satisfies V • If the converse is also trae, then V is 1 1 egoivalent to V • Below, B and B are the input and ootpot 2 12 of a t-role. 
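The definitions of 'more general than' and 'equivalent to' just given amount to set inclusion between the events that satisfy each condition part. The following is a minimal sketch under stated assumptions: events are modelled as Python dicts over a small enumerable domain, condition parts as plain predicates, and the names `satisfying`, `more_general`, and `equivalent` are hypothetical. As noted at the end of chapter 2, the algorithms in this work manipulate formulas rather than enumerating events, so this is an aid to reading the definitions only.

```python
def satisfying(condition, events):
    """Indices of the events that satisfy a condition part."""
    return {i for i, event in enumerate(events) if condition(event)}

def more_general(v1, v2, events):
    """V_1 is more general than V_2: every event satisfying V_2 satisfies V_1."""
    return satisfying(v2, events) <= satisfying(v1, events)

def equivalent(v1, v2, events):
    """V_1 is equivalent to V_2: each is more general than the other."""
    return more_general(v1, v2, events) and more_general(v2, v1, events)

# Hypothetical universe: a single variable L taking the values 0 through 5.
events = [{"L": value} for value in range(6)]

v2 = lambda e: e["L"] == 1          # condition part [L = 1]
v1 = lambda e: e["L"] in (1, 2)     # condition part [L = 1,2]

print(more_general(v1, v2, events))  # True
print(more_general(v2, v1, events))  # False
print(equivalent(v1, v2, events))    # False
```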
The three types of t-rules can be expressed as follows: A transformation T: R_1 |# R_2 (# being one of =, <, >) is

1. An equivalence transformation (denoted R_1 |= R_2) if the condition parts of both rules are equivalent.

2. A generalizing transformation (denoted R_1 |< R_2) if the condition part V_2 of R_2 is more general than the condition part V_1 of R_1.

Rules 1. and 2. are called inductive inference rules.

3. A specializing transformation (deductive inference rule, denoted R_1 |> R_2) if the condition part V_1 is more general than the condition part V_2.

Here, we are most interested in the first two types of t-rules (i.e., inductive inference rules).

3.1 Equivalence Transformation

An equivalence transformation rewrites a VL_2 formula into a different form either using equivalent VL_2 operators or introducing new functions which represent some information already in the rule in a different manner. Below are some examples of equivalence transformations. The symbols L_1 and L_2 represent atomic forms, V and V' represent VL_2 formulas, D represents a VL_2 formula which has no variables in common with V, V', L_1 or L_2, and |= is used to represent an equivalence transformation.

E1. Equivalent VL_2 forms.

    V([L = 1] v [L = 2]) => D |= V[L = 1,2] => D

    V[L ¬= 3,4] => D |= V[L = 0,1,2,5] => D

(assuming that the domain of L is the interval [0..5] and has nominal structure).

The dot operator (.) between two atomic forms is called 'internal conjunction'; thus the expression on the right-hand side of the transformation below is read: 'If L_1 and L_2 both have the value 1 and V is satisfied, then make decision D.'

    V([L_1 = 1][L_2 = 1]) => D |= V([L_1.L_2 = 1]) => D

(assuming that L_1 and L_2 have the same domain size and structure).

E2. Internal Conjunction of Arguments

    VV' => D |= VV'[f'(x_1.x_2)=i] => D

    V' = [f(x_1) = i][f(x_2) = i]

This rule introduces a new predicate f' which has the domain [TRUE, FALSE] and two arguments.
The (.) operator instead of (,) indicates that the order of arguments to f' is irrelevant. The function f' assumes the value TRUE if f has the value i for both arguments x_1 and x_2.

E3. Introducing New Predicates.

    VV' => D |= VV'[rel_f(x_1,x_2)] => D

    V' = [f(x_1) = i][f(x_2) = j]

where i is related to j by rel. For example, i < j, i > j, i >= j would result in new predicates LT_f, GT_f, GE_f.

E4. Splitting the Condition Part.

    V v V' => D |= V => D, V' => D

This rule is used to form a set of decision rules with product condition parts from decision rules in disjunctive normal form.

3.2 Generalizing Transformations

This type of transformation usually produces not only a more general decision rule from a set of decision rules but also a 'simpler' one than the original. Some rules are applied in context. That is, one rule is actually transformed but the context consisting of rules with a related decision is used to obtain a more optimal result. (These transformations may also be interpreted as transformations from a set of rules into a new, more general rule. The approach of focussing on one rule in the 'context' of others is taken here to more closely reflect the approach taken in the implementation.) In the following rules, the symbol |< is used to indicate an inductive inference rule.

G1. Dropping a Selector.

    V[L = R] => D |< V => D

Although this rule is interesting in a formal sense, it should be applied with care since the number of generalizations possible with successive applications of this rule is very large. More is said about this problem in section 3.4.

G2. Extending the Reference

Nominal domain structure

    V[L = a] => D |< V[L = a,b] => D
    in context [L = b] => D

Interval domain structure

    V[L = a] => D |< V[L = a..b] => D
    in context [L = b] => D

Tree structured domain

    V[L = a] => D |< V[L = c] => D
    in context [L = b] => D

c is a predecessor of both a and b in the generalization structure of the domain of L.

G3.
Extension Against

Nominal domain

    V_1[L = R_1] => D |< V_1[L ¬= R_2] => D
    in context: V_2[L = R_2] => ¬D
    assuming R_1 & R_2 = null. (The symbol & between two references denotes set intersection.)

Interval domain structure

    V_1[L = a..b] => D |< V_1[L = e..f] => D
    in context: V_2[L = c..d] => ¬D
    assuming [a..b] & [c..d] = null
    if b < c then e = 0, f = c-1
    if a > d then e = d+1, f = h
    (0 and h are the minimum and maximum elements in the domain of L)

Tree structured domain

    V_1[L = a] => D |< V_1[L = c] => D
    in context: V_2[L = b] => ¬D
    assuming a & b = null

the constant (c) is the most distant ancestor of (a) which is not an ancestor of (b) (c may be equal to a).

G4. Replace A with a constant

    (Ax_i V(x_i)) => D |< V(x_i)[x_i = a] => D

a - an element in the domain of x_i. (The symbol A represents the universal quantifier.) For example:

    Ax_1 [f(x_1) = 1] => D |< [f(x_1) = 1][x_1 = 2] => D

The left side of the expression requires that f have the value 1 for all values of x_1 before a decision be made. The generalized expression (right side) only requires examination of one value in the domain of x_1, namely the value 2. All other values in the domain of x_1 are irrelevant to the decision rule.

G5. Replace a constant with E

    V(x_i)[x_i = a] => D |< Ex_i V(x_i) => D

a - an element in the domain of x_i. (The symbol E represents the existential quantifier.) For example:

    [f(x_1) = 1][x_1 = 3] => D |< Ex_1 [f(x_1) = 1] => D

The left side of the expression makes a decision only if f has the value 1 for x_1 with value 3. The generalized expression makes a decision if f has the value 1 for any value of x_1.

G6. Move E to the Right

    Ex_i Ax_j V(x_i,x_j) => D |< Ax_j Ex_i V(x_i,x_j) => D

3.3 Specializing Transformations

A specializing transformation may be used to apply a decision rule to a new situation or to add certain restrictions to decision rules. Below, the symbol |> represents a deductive inference rule.

R1.
Adding Restrictions

    VV' => D |> VV'V'' => D
    in context V' => V''

This rule is used to add restrictions to descriptions and generalizations where the restrictions represent some structure which is imposed on a function or some relationship between functions which always holds. For example, the transitivity or symmetry of a function may be introduced using a restriction.

    [f(x_1,x_2)][f(x_2,x_3)] => [f(x_1,x_3)]    (transitivity)
    [f(x_1,x_2)] => [f(x_2,x_1)]    (symmetry)

R2. Dropping Product Rule

    V_1 v V_2 => D |> V_1 => D

This rule may be used to obtain a set of decision rules with a product in the condition part from a rule in disjunctive normal form.

3.4 Transformation Rules Involving the Decision Part

It is clear that reversing the roles of the input and output of an inductive t-rule gives a deductive t-rule and conversely. Therefore, each of the rules G1-G6 could be inverted to get a deductive rule. The rules in sections 3.1-3.3 were based on the events satisfying the condition part of a decision rule. Similar rules may be applied to the decision part of a rule to obtain equivalent, more general or more restricted rules. Given two rules:

    R_1: V => D_1
    R_2: V => D_2

R_2 is more general than R_1 and R_1 is more restricted than R_2 if every assignment of event values made by R_1 is also made by R_2. If the converse is also true, then R_1 is equivalent to R_2. To transform the decision part of a decision rule, apply G1-G6 to the decision part of a rule to obtain a deductive rule and R1-R2 to obtain an inductive rule, interchanging the roles of the condition and decision parts in the transformation rules. A few examples are given below.

DP1. Dropping Selector Rule Applied to the Decision Part

    V => [L_1 = R_1][L_2 = R_2] |> V => [L_1 = R_1]

Though this is a generalization rule when applied to the condition part, it is a deductive rule when applied to the decision part of a decision rule.

DP2. Splitting the Decision Part

    V => [L_1 = R_1]D |= V => [L_1 = R_1], V => D

This equivalence preserving rule is used to produce a set of decision rules which involve only one decision variable.

DP3. Replace a Constant with A in the Decision Part

    V => [L(x_1) = R_1][x_1 = a] |> V => Ax_1 [L(x_1) = R_1]

3.5 Example of Application of Transformation Rules

In this section, selected transformations will be examined in more detail with respect to a specific example. Consider the situation in Figure 3.1. There are four objects classified according to two decision variables DA and DB. Each object is described in terms of the following VL_2 functions and predicates:

p_1,p_2: variables each representing a part in an object, with domain P = [0,1] with nominal structure

ontop: a predicate mapping P x P into [TRUE, FALSE]. The predicate is TRUE if the part represented by the first argument is on top of the part represented by the second argument.

shape: a function mapping P into [Triangle (T), Circle (C), Rectangle (R), Ellipse (E), Polygon (P), Curved Figure (CF)] with generalizations: [shape=T,R]=>[shape=P] and [shape=C,E]=>[shape=CF]. The function specifies the shape of the part represented by the argument. The domain is a tree structured domain where Polygon is an ancestor (a generalization) of Triangle and Rectangle and Curved Figure is an ancestor of Circle and Ellipse.

DA,DB: decision variables with domain [1,2]

                DA = 1              DA = 2
    DB = 1      1: Circle           2: Circle
                   Triangle            Circle
    DB = 2      3: Ellipse          4: Rectangle
                   Rectangle           Rectangle

    (each cell lists the upper part of the object, then the lower part)

    Figure 3.1 Set of Objects

Each object in Figure 3.1 may be described in the VL_2 system. The descriptions of objects 1 and 4 are:

    [ontop(p_1,p_2)][shape(p_1)=C][shape(p_2)=T] => [DA=1][DB=1]    (3.5.1)
    [ontop(p_1,p_2)][shape(p_1)=R][shape(p_2)=R] => [DA=2][DB=2]    (3.5.2)

The variables p_1 and p_2 are assumed to be quantified with the operator E.
(distinct existential quantifier). Using the rule DP1, the decision rule 3.5.1 can be transformed into two new rules:

    [ontop(p_1,p_2)][shape(p_1)=C][shape(p_2)=T] => [DA=1]    (3.5.3)
    [ontop(p_1,p_2)][shape(p_1)=C][shape(p_2)=T] => [DB=1]    (3.5.4)

Concentrating now on 3.5.4, the dropping selector rule may be applied twice in succession to obtain

    [shape(p_1)=C] => [DB=1].    (3.5.5)

This rule describes all objects in Figure 3.1 with the decision DB=1. In addition, it does not describe any object with DB=2; so it is a complete, consistent generalization of the objects with DB=1. (Note that completeness and consistency are in no way guaranteed by the application of the dropping selector rule. These conditions can however be checked after each application of the rule.)

Applying the extension against rule to 3.5.3 in the context

    [ontop(p_1,p_2)][shape(p_1)=R][shape(p_2)=R] => [DA=2]    (3.5.6)

focusing on the selectors [shape(p_1)=C] in 3.5.3 and [shape(p_1)=R] in 3.5.6, one may obtain

    [shape(p_1)=CF][ontop(p_1,p_2)][shape(p_2)=T] => [DA=1].    (3.5.7)

Applying extension against now to 3.5.7 in the context

    [ontop(p_1,p_2)][shape(p_1)=C][shape(p_2)=C] => [DA=2]    (3.5.8)

focusing on the selectors [shape(p_2)=T] in 3.5.7 and [shape(p_2)=C] in 3.5.8, one obtains

    [shape(p_1)=CF][ontop(p_1,p_2)][shape(p_2)=P] => [DA=1].    (3.5.9)

This rule is a consistent and complete generalization of all rules in Figure 3.1 with DA=1 in the decision part.

Looking now at the description of object 4 (3.5.2), two rules are obtained by applying t-rule DP1:

    [ontop(p_1,p_2)][shape(p_1)=R][shape(p_2)=R] => [DA=2]    (3.5.10)
    [ontop(p_1,p_2)][shape(p_1)=R][shape(p_2)=R] => [DB=2]    (3.5.11)

An application of the dropping selector rule to 3.5.11 twice in succession gives a complete, consistent rule:

    [shape(p_2)=R] => [DB=2]    (3.5.12)

An application of t-rule E3 to 3.5.10 with the relation 'equals' produces:

    [ontop(p_1,p_2)][shape(p_1)=R][shape(p_2)=R][EQ-shape(p_1.p_2)] => [DA=2]    (3.5.13)

The predicate EQ-shape specifies that the value of the function shape is the same for arguments p_1 and p_2. The '.' separating the arguments p_1 and p_2 of EQ-shape signifies that the order of the arguments is irrelevant. (In the implementation, a selector with this type of predicate is written [shape(p_1.p_2)=same].) An application of the dropping selector rule three times in succession to 3.5.13 gives:

    [EQ-shape(p_1.p_2)] => [DA=2]    (3.5.14)

which is a complete, consistent generalization of 3.5.10. In summary, the new rules which were obtained are:

    [ontop(p_1,p_2)][shape(p_1)=CF][shape(p_2)=P] => [DA=1]
    [EQ-shape(p_1.p_2)] => [DA=2]    (3.5.15)
    [shape(p_1)=C] => [DB=1]
    [shape(p_2)=R] => [DB=2]

In the above discussion, the resulting simple generalizations depended on knowing the proper rules to apply and the proper portions of the decision rules to modify. For larger problems, this approach is infeasible because of the large number of possible generalizations which could be made. Therefore, a more efficient approach is required. Such an approach is given in the next section and described in detail in the chapter on basic procedures (chapter 5).

3.6 Efficient Application of Generalization Rules

There are two significant problems with using the procedure described in section 3.5 to apply t-rules to a set of decision rules. The first is the large number of new rules which can be generated from one rule. This problem could be circumvented by trimming the intermediate lists of new rules, selecting only the most promising set of rules according to a user specified criterion before further application of t-rules (see section 5.5 for a description of a trimming procedure). A second problem which is not surmounted as easily is the complexity involved in determining whether a VL_2 formula is consistent and in evaluating the optimality cost functions during trimming.
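For an example as small as Figure 3.1, completeness and consistency of the four summary rules (3.5.15) can be checked by direct enumeration. The following is a minimal sketch and not the INDUCE_1 procedure: the object encoding, the names `generalizes`, `satisfies` and `check`, and the reading of the part subscripts in (3.5.15) (p_1 in the circle rule, p_2 in the rectangle rule) are assumptions made for illustration only.

```python
from itertools import permutations

# Tree-structured domain of `shape`: child -> parent (its generalization).
PARENT = {"T": "P", "R": "P", "C": "CF", "E": "CF"}

def generalizes(ref, value):
    """True if `ref` equals `value` or is an ancestor of it in the shape tree."""
    while value is not None:
        if value == ref:
            return True
        value = PARENT.get(value)
    return False

# Figure 3.1: (shape of upper part, shape of lower part) plus the two decisions.
OBJECTS = {
    1: {"parts": ("C", "T"), "DA": 1, "DB": 1},
    2: {"parts": ("C", "C"), "DA": 2, "DB": 1},
    3: {"parts": ("E", "R"), "DA": 1, "DB": 2},
    4: {"parts": ("R", "R"), "DA": 2, "DB": 2},
}

def ontop(a, b):
    # Part index 0 is the upper part, index 1 the lower part.
    return (a, b) == (0, 1)

# Condition parts of (3.5.15) as predicates over shapes s and a binding (p1, p2).
RULES = [
    (("DA", 1), lambda s, p1, p2: generalizes("CF", s[p1]) and ontop(p1, p2)
                                  and generalizes("P", s[p2])),
    (("DA", 2), lambda s, p1, p2: s[p1] == s[p2]),   # EQ-shape(p1.p2)
    (("DB", 1), lambda s, p1, p2: s[p1] == "C"),
    (("DB", 2), lambda s, p1, p2: s[p2] == "R"),
]

def satisfies(cond, shapes):
    """E. quantifier: some binding of p1, p2 to DISTINCT parts satisfies cond."""
    return any(cond(shapes, p1, p2) for p1, p2 in permutations(range(2), 2))

def check(rules=RULES, objects=OBJECTS):
    """Report, per rule, whether it is complete and consistent on the objects."""
    report = {}
    for (var, val), cond in rules:
        covered = {k for k, o in objects.items() if satisfies(cond, o["parts"])}
        wanted = {k for k, o in objects.items() if o[var] == val}
        report[(var, val)] = {
            "complete": wanted <= covered,     # every intended object is covered
            "consistent": covered <= wanted,   # no other object is covered
        }
    return report
```

Enumeration of this kind is feasible only because there are four objects with two parts each; the number of bindings grows quickly with the number of variables, which is the difficulty taken up next.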
To determine whether one VL_2 formula is a generalization of another or is consistent with respect to another formula, one must determine whether all or any of the events which satisfy one VL_2 formula also satisfy another VL_2 formula. Using the current implementation, this involves determining whether one graph structure representation is a subgraph of another (see section 5.4 for a description of a subgraph isomorphism algorithm). This problem is exponential in nature (i.e., the time to determine whether one graph is a subgraph of another using a depth-first search is proportional to m raised to the power of n, where m is the number of nodes in the larger graph and n is the number of nodes in the smaller graph). Actually, it is not quite so time consuming since the graphs have relatively few edges and the edges and nodes are labelled. It is, however, important to form simple generalizations which correspond to small graph structures. Since the cost criterion normally includes a cost function which minimizes the number of selectors in a product, the consistency and optimality of optimal rules should be easily calculated.

In the example of section 3.5, the original rules were made smaller by dropping selectors. An alternate approach is to grow the generalized rule beginning with single selectors and adding new selectors until a consistent rule or set of alternative rules is created. A very general algorithm follows; chapter 5 gives a complete description of the algorithm in the current implementation.

1. Given a set of decision rules in disjunctive normal form, create a new set of rules with a product in the condition part and one selector in the decision part of each rule.

2. Select a rule which involves one value of a decision variable. Add to the rule new selectors which represent functional relations as specified by a user (i.e., multiply the condition part of the rule by selectors containing new functional relations).

3.
Find a consistent generalization of the rule from 2 by locating the most promising selectors of the rule from 2 and adding new selectors to each of these selectors until a set of consistent generalizations of the rule from 2 is obtained.

4. Apply the extension against rule using an AQVAL/1 procedure to all consistent rules.

5. Select the best generalization from this set and remove rules from the set produced in step 1 for which this is a generalization. Repeat steps 2-5 until no more rules remain which involve the decision variable and value of step 2.

6. Continue by selecting another value of the current decision variable or selecting another decision variable until all decisions have been considered.

4. Computer Representation of VL_2 Decision Rules

Some of the information in this section appears in [Larson et al. 77] and is given here as a background for the description of the computer implementation.

A VL_2 decision rule can be represented as a graph with labelled nodes and directed labelled edges. The labels on the nodes can be:

a) a selector containing k-ary descriptors without argument lists,
b) a k-ary descriptor without arguments,
c) a quantified variable with an optional subrange of values,
d) a logical operator.

(From here on, a node is referred to by its label, e.g., a selector node means a node with a selector label.) The edges are labelled with integers from 0,1,... . Edges not labelled 0 refer to the position of an argument in the label at the head of the edge. (Edges have non-zero labels only if the position in the argument list of the head node is important. Labels of 0 may be dropped for convenience.)

Several different types of relations may be represented by edges. The type of relation is determined by the label on the node at each end of the edge. The types of relations are:

1.
Functional Dependence - The label of the head node of the edge has a k-ary descriptor. The value represented by the edge is the value of the atomic form in the tail if the tail is a selector node, a descriptor value if the tail is a descriptor node, or one or all of a set of descriptor values if the tail is a quantified variable. The edge label specifies which argument of the head node assumes this value (Figure 4.1).

    [Figure 4.1 Functional Dependence: Ex_1,x_2 ([g(x_1) = 1..2][f(g(x_1),x_2) = 1])]

2. Logical Dependence - The head node is a logical operator (e.g. v, &, =>) and the tail node is a selector node or a logical operator node. If the tail node is a selector, then the value represented by the edge is the truth value of the selector at the tail (Figure 4.2).

    [Figure 4.2 Logical Dependence: Ex_1 ([f(x_1) = 1] v [g(x_1) = 2])]

3. Implicit Variable Dependence - The labels of the head and tail nodes are quantified variables. This type of dependence represents the implicit function (which can be represented by a Skolem function) (Figure 4.3).

    [Figure 4.3 Implicit Variable Dependence: Ax_1 Ex_2 [p(x_1,x_2) = 1]]

4. Scope of Variables - The head node is a logical operator and the tail is a quantified variable. This type of dependence may be necessary for certain binary logical operators such as (=>, <=>). For the functions v and & this type of dependence is implicit in the functional dependence of the arguments (Figure 4.4).

    [Figure 4.4 Scope of Variables: Ex_1,x_2 ([p(x_1) = 1] => [g(x_2) = 1])]

The graph of a more complex decision rule is given in Figure 4.5. The value of x_3 is dependent in an unspecified way on the value of x_2 (the edge labelled 1). The disjunction (v) depends on the values of x_2 and x_3, but this is clearly specified by the functional dependence of f and g on x_2 and x_3.
Finally, observe that the decision operator (=>) does not explicitly depend on the specific values of x_1, x_2, or x_3, but instead depends on the truth value of the entire premise using some set of value assignments for x_1, x_2, and x_3.

    [Figure 4.5 Graph Structure Example: Ex_1 Ax_2 Ex_3 (([f(x_2,x_3) = 1] v [g(x_2,x_3) = 2,3]) [f(x ,x ) = 1] => [d = 1])]

5. Algorithms and Computer Implementation

A computer program INDUCE_1 has been written to find a generalization of a set of decision rules. The algorithms are described in the remaining sections of this chapter; examples of generalizations produced by the program are given in chapter 6 and a sample session with the program is given in appendix A. The program does not perform all of the transformations given in chapter 3 and does not accept the full VL_2 language given in chapter 2. The restrictions on the form of the input and a description of the output are given in the next two sections.

5.1 Input to the Program

The program accepts as input: 1) a set of decision rules, 2) a problem environment description including a set of restrictions, domain definitions, variable costs etc., and 3) a set of parameters which control certain aspects of the program operation. Decision rules, restrictions, and domain structures are entered as VL_2 type formulas in the following format:

Decision Rules

Decision rules must satisfy the following grammar:

    I1.   ___ ::= CONDITION => ___ .
    I2.   CONDITION ::= ___ | ___
    I3.   SELECTOR ::= [ FN-SYM ( ___ ) = REF ] | [ FN-SYM ( ___ ) ] | [ ___ = REF ] | [ FN-SYM ( ___ ) ]
    I4.   ___ ::= ___ , ___ | ___
    I5.   ___ ::= ___ . ___ | ___
    I6.   REF ::= ___ , ___ | ___ .. ___ | ___ | *
    I7.   ___ ::= [ ___ = ___ ]
    I8.   ___ ::= string of letters
    I9.   ___ ::= ___
    I10.  ___ ::= string of digits.

Examples of decision rules:

    [f(x_1) = 2,3][g( ... ] => [D = 1].

In this expression, the structure of the domain of the function g is set to interval by the program (see note 4 below). The domain of f is unchanged. All variables (e.g.
x_1 and x_2) are quantified by the operator E. (distinct existential quantifier - note 3).

    [p(x_1,x_2)][x_1 = 2,4] => [D = 2].

In this example, the function p is assumed to have two values (i.e., it is a predicate - note 6). The selector containing p is satisfied if it has the value 1. The second selector restricts the possible values of x_1 to the set of values [2,4] (note 3).

Several observations can be made about this grammar:

1. The condition part is a single product and the decision part involves only one variable. It is assumed that one decision variable has been selected to be studied by the user. Also, the equivalence rule has been applied which splits a condition part of a formula in disjunctive normal form into a set of decision rules with condition parts which are single products (t-rules E4 and DP2).

2. Each atomic form is a function symbol with a list of single variable arguments. It is assumed that the user has converted forms such as

    [g(f(x_i))]

into a form

    [g(y_j)][p(y_j,x_i)]

by introducing a new predicate p(y_j,x_i) for each function symbol f which is in an argument list and a variable y_j for each occurrence of the function f(x_i) in an argument list. The predicate p is assumed to have the value TRUE if y_j = f(x_i) and FALSE otherwise. For example:

    [shape(part(x_1)) = 1]

is assumed to be transformed into e.g. an expression

    [shape(p_1) = 1][contains(p_1,x_1)]

3. All variables (arguments) are assumed to be existentially quantified. Variables with the same function symbol part are assumed to have the same domain. Furthermore, variables with values from the same domain are assumed to take on distinct values from the domain. Variables may be restricted to a subrange of values by using the third option in the selector definition. Using this method, a constant may be specified as an argument to a function.

4. If a reference of the second form is specified at least once in the input (production I6) (e.g.
[f(x_1)=2..2] or [f(x_1)=2..5]), a domain of type interval is assumed; otherwise, the domain of a variable or function is assumed to be of nominal type, or tree-structured if such a structure is specified.

5. The last form of the reference uses a value (*). This specifies the entire domain of the associated function symbol. (E.g. [p=*] means that the selector is satisfied for any value of p other than the value NA.) If a selector is omitted from the decision rule entirely, the program assumes that the function has the value NA (not applicable). Such a domain value has no generalization (other than NA itself). Although NA is not specified in an input rule, it can be a valid value of a variable.

6. If the second form of the selector is used (production I3, e.g. [f(x_1,x_2)]), the program assumes that the function symbol has the value 1. This may be used to specify a TRUE value for a predicate (predicates are treated as functions with domain [0,1]). In general, to simplify the expressions, only positive values of predicates are specified. The program then uses only positive instances of the predicate in the generalizations. If negative values of a predicate are desired in the generalizations, these relations should be included in the initial decision rule specification (see the arch example in chapter 6). (E.g. [ontop(p_1,p_2)] specifies that p_1 is on top of p_2; [ontop(p_1,p_2)=0] specifies that p_1 is not on top of p_2.)

7. If the fourth form of the selector is used (e.g. [p(x_1.x_2)]), then the order of the arguments is assumed to be irrelevant.

Restrictions:

Restrictions must satisfy the following production:

    ___ ::= CONDITION => SELECTOR

where the arguments of the selector part must all appear in the condition part. CONDITION and SELECTOR are the same as in the decision rule grammar. For example:

    [left(x_1,x_2)][left(x_2,x_3)] => [left(x_1,x_3)].

Restrictions extend or modify decision rule specifications.
For every occurrence of the CONDITION of the restriction in the condition part of the decision rule, the selector is added to the condition part of the decision rule if it is not already there. If the SELECTOR is already in the condition part of a decision rule, then the values of the SELECTOR replace the values in the occurrence of SELECTOR in the decision rule. Using this feature, the transitive closure of a transitive function may be calculated. A restriction which adds equivalence type predicates is included by default. Each restriction is applied to each decision rule as it is entered into the program.

Domain Generalization Structure Specification

This specification must satisfy the production:

    T-STRUC ::= [ FN-SYM = REF ] => [ FN-SYM = ___ ]

where the two function symbols are the same and FN-SYM and REF are as in the decision rule grammar. For example:

    [s = 1,3,4] => [s = 6].
    [s = 0,2] => [s = 7].
    [s = 4] => [s = 8].
    [s = 6,8] => [s = 9].

Entering a T-STRUC automatically sets the type of the domain for the function symbol to the structured type. Each element in the REF of the condition part of this rule must either be a leaf of the corresponding tree or have been previously defined with another T-STRUC specification.

Parameters:

Several parameters may be entered into the program. In addition to the parameters mentioned in the following sections (i.e., the MAXSTAR parameter and optimality criterion parameters), certain trace requests may be entered to follow the execution of the program.

5.2 Program Output

Several trace features are available to obtain a list of intermediate results in the program. An example of program output is given in appendix A. Although input decision rules may be any formula which satisfies the grammar given in the previous section, the program only searches for generalizations which can be represented by a connected graph structure with functional dependence edges.
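The connectedness requirement can be illustrated on a product of selectors without building the graph explicitly. The sketch below assumes, as the grammar notes require, that every argument is a single variable, so that two selectors are linked exactly when they share a variable. The name `is_connected` and the (function symbol, argument tuple) encoding are hypothetical and are not taken from INDUCE_1.

```python
def is_connected(selectors):
    """Return True if the selectors form one group linked by shared variables.

    selectors: iterable of (function_symbol, argument_tuple) pairs,
    e.g. ("ontop", ("p1", "p2")).
    """
    selectors = list(selectors)
    if not selectors:
        return True
    # Grow the set of reached variables from the first selector, absorbing
    # every pending selector that shares at least one variable with it.
    reached = set(selectors[0][1])
    pending = selectors[1:]
    changed = True
    while pending and changed:
        changed = False
        still_pending = []
        for fn, args in pending:
            if reached & set(args):
                reached |= set(args)
                changed = True
            else:
                still_pending.append((fn, args))
        pending = still_pending
    # Anything left over shares no variable with the reached group.
    return not pending


# [shape(p1)=C][shape(p2)=T] alone is not connected ...
unlinked = [("shape", ("p1",)), ("shape", ("p2",))]
# ... but becomes connected once a linking predicate is included.
linked = unlinked + [("ontop", ("p1", "p2"))]
```

Under these assumptions `is_connected(unlinked)` is false and `is_connected(linked)` is true, which mirrors the advice to add predicates that tie otherwise unrelated products together.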
The user may construct decision rules to satisfy this constraint by including new predicates which link products which have no arguments in common. (In general, whether a product of selectors in a decision rule has a connected graph structure can be tested by determining whether there is a partitioning of the selectors into two products which have no argument in common. If there is such a partitioning, then the graph structure is not connected.)

Different generalizations may be obtained by varying the program parameters and reordering the decision rules. In general, increasing any of the MAXSTAR, ALTER, or NCONSIST parameters (described below) will cause the program to require more memory and time but the resulting generalizations may be more optimal. Rearranging the optimality criteria may also produce a different result. A higher tolerance associated with a particular cost function will reduce the selection based on that function. The default optimality criteria (or optimality cost functions) in the order of application are the following:

1. Minimize the inconsistencies of a rule with tolerance 0.30, i.e., the number of events covered by the rule which are not supposed to be covered by the rule (this is cost function number 3 in section 5.5). This allows the program to produce consistent generalizations quickly. The high tolerance removes highly inconsistent rules while leaving selection of nearly consistent rules to the remaining cost functions.

2. Minimize the number of products in the complete generalization with tolerance 0.00 (this is cost function number 1 in section 5.5).

3. Minimize the number of selectors in each product produced by the program (function 2 in section 5.5).

4. Minimize the cost of functions in each product (function 4 in section 5.5). If costs are specified, this criterion may be moved forward.
If the user wishes certain functions to appear in the resulting products, the costs for these functions may be specified (given negative cost). Similarly, functions which are very difficult or costly to measure may be given appropriate positive costs.

5. Maximize the intersection of resulting rules (cost function 5 in section 5.5). In real situations, the separate products produced by the program may represent a large number of common input decision rules along with some peculiarities of specific situations which arise. Use of this cost function will favor the selection of a more representative result as opposed to one which describes only a particular set of situations. Appendix B contains a more detailed discussion of a similar cost function in the VL_1 system.

The program contains about 40 procedures. Five major tasks which are performed by some groups of procedures are described below.

5.3 Formation of a Complete Generalization

A generalization is found of a set of decision rules containing a specified value I in the decision part. Two sets of products are generated: a set F1 which contains all products in the CONDITION parts of rules with a decision value of I, and a set F0 which contains all other products. Each product is called a c-formula (conjunctive formula). One c-formula E1 of F1 is selected at random and a connected conjunctive formula (c2-formula) is generated which is a generalization of E1, consistent with respect to the set F0, and near optimal with respect to a user defined criterion. A c-formula is connected if its graph structure representation is weakly connected by functional dependence relations. A c-formula is consistent with respect to a set of c-formulas F0 if it does not intersect with any element of the set F0 (i.e., there is no event which satisfies both c-formulas). Once a generalization of E1 is found, it is saved in a set CQ and all elements of F1 which are covered by this generalization are removed from F1.
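The select, generalize, save and remove cycle just described can be sketched on a deliberately simplified representation. Everything below is an assumption made for illustration: c-formulas are reduced to attribute-value dictionaries with no quantified variables (so no graph matching is needed), every element of F1 and F0 assigns every attribute, the seed is the first element rather than a random one, and `generalize` is a naive stand-in for the near-optimal search performed by the program.

```python
def covers(general, specific):
    """`general` covers `specific` if each of its selectors is satisfied there."""
    return all(attr in specific and specific[attr] <= vals
               for attr, vals in general.items())

def intersects(a, b):
    """Some event satisfies both: shared attributes have overlapping references."""
    return all(a[attr] & b[attr] for attr in a.keys() & b.keys())

def generalize(seed, f0):
    """Add selectors of the seed one at a time until consistent with f0."""
    formula = {}
    for attr, vals in seed.items():
        if not any(intersects(formula, neg) for neg in f0):
            break                      # already consistent; stop growing
        formula[attr] = vals
    return formula

def complete_generalization(f1, f0, generalize=generalize):
    """Return CQ, a list of formulas whose disjunction covers all of f1."""
    remaining = list(f1)
    cq = []
    while remaining:
        seed = remaining[0]            # the text selects at random
        g = generalize(seed, f0)
        cq.append(g)
        # g is built from selectors of the seed, so the seed is always removed.
        remaining = [e for e in remaining if not covers(g, e)]
    return cq


def obj(top, bottom):
    return {"top": frozenset({top}), "bottom": frozenset({bottom})}

# Objects of Figure 3.1 flattened to two attributes (upper and lower shape).
DB1, DB2 = [obj("C", "T"), obj("C", "C")], [obj("E", "R"), obj("R", "R")]
DA1, DA2 = [obj("C", "T"), obj("E", "R")], [obj("C", "C"), obj("R", "R")]
```

With this data, `complete_generalization(DB1, DB2)` yields the single formula with upper shape C, while `complete_generalization(DA1, DA2)` needs two formulas because this sketch has no tree-structured domain to merge Circle and Ellipse.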
One c-formula E1 covers another c-formula E0 if E1 is a generalization of E0. Another element of the new set F1 is selected and the procedure repeated. When there are no more elements in F1, the complete, consistent generalization of the set of c-formulas F1 is the disjunction of all c2-formulas in CQ.

5.4 Determine Cover and Intersection of 2 Formulas

Two similar procedures are described here. The test to determine whether a c2-formula E covers a c-formula E' is used when E' is an element of the set F1. The test to determine whether E intersects with E' is used when E' is an element of F0 (i.e., to determine if E and E' are consistent). The procedure uses the graph structure representations of E and E' (G and G' with nodes and edges V, E, V', E' respectively). The graph G is assumed to be weakly connected.

E covers E' if there is a specializing isomorphism (s-isomorphism) from G to a subgraph of G'. The reverse mapping (from a subgraph of G' to G) is called a generalizing isomorphism (g-isomorphism). E intersects with E' if there is an intersecting isomorphism (i-isomorphism) between G and a subgraph of G'. Each isomorphism from G to a subgraph of G' is a 1-to-1 correspondence between nodes and edges of G and a subset of nodes and edges of G' where the correspondence (or matching) of nodes and edges is defined as follows:

A node n of G matches a node n' of G' if:

1. They are both selector nodes or both quantified variable nodes.

and

2. If they are selector nodes, then the function symbols in both nodes are the same. If they are variable nodes, they are of the same group of variables.

and

3. With an s-isomorphism or g-isomorphism, the set of values associated with n is a generalization of the set of values associated with n'. (The sets of values may be equal.) With an i-isomorphism, the sets of values intersect. In the case of selector nodes, these values are the elements in the reference of the selector.
In the case of quantified variable nodes, these values are the subranges of the variables.

An edge of G matches an edge of G' if:

1. They have the same label, and

2. The respective head nodes match and the tail nodes match.

To speed rejection, a quick scan through the nodes is made to see if there is a correspondence between nodes of G and a subset of nodes of G' (ignoring links between nodes). If there is a possible correspondence, a procedure is invoked which locates a subgraph of G' which is isomorphic to G and assigns each node of G to a corresponding node of G'. The procedure is as follows:

1. Select a starting node (n) of G which contains the most labelled incoming edges. (This is the selector node with the largest number of arguments.) Selecting a node of this type ensures that there is a minimum of backtracking through the starting node.

2. A rooted directed acyclic graph G* with nodes and edges V* and E* is constructed from G by copying all nodes and edges of G to G* and assigning a direction to each edge of G* so that G* has no cycles and, for each node x in {V* - n*}, there is a path from n* to x. (n* is the node of G* which corresponds to n in G.) A traversal of the graph G* is the list of edges and nodes visited in a preorder traversal of G* with root n*. A preorder traversal of a subgraph with root x visits the node x, visits each outgoing edge of x, and traverses the subgraph which has as its root the head node of the traversed edge.

3. The graph G is traversed in the order of the traversal of corresponding nodes and edges of G*. At each step of the traversal of G, a new node and edge of G' are found which match the node and edge of G. If two nodes match, they are assigned to each other and a record of the matching nodes and edges is kept for each assignment in a backtrack list. To establish a 1-to-1 correspondence, nodes of one graph which are previously assigned can only match the corresponding assigned nodes of the other graph.

4. If there is no node and edge of G' which matches a node and edge of G, the procedure backtracks to the previous nodes and edges on the backtrack list, erasing the last nodes and edges on the backtrack list and the assignments associated with the nodes. Another node and edge of G' are selected which match the last node and edge of G on the backtrack list and the traversal of G continues. If no node and edge of G' can be found at this point, then the procedure again backtracks until a new match is found or the backtrack list is exhausted.

5. If the traversal of G is complete, then G covers (intersects) G'. If the backtrack list is exhausted, then G does not cover (intersect) G'.

A feature is included which finds all subgraphs of G' which are isomorphic to G. This feature is used in section 5.7 and in adding restrictions to c-formulas.

6. If the traversal of G is complete, then the current set of assignments is the desired mapping. To find the next isomorphism, the procedure returns to step 4 assuming that the last nodes and edges on the backtrack list did not match.

5.5 Trimming a Set of c-formulas

Trimming is the process of selecting the MAXSTAR best elements of a set of c-formulas with regard to a user defined criterion. The user specifies the cost functions which are to be used, the order in which they should be applied, and the tolerance associated with each cost function. Implemented cost functions are:

1. The number of events of the current set F1 which are covered by a c2-formula. (The negative of this quantity is used to obtain a cost.) This function minimizes the number of c-formulas in CQ.

2. The number of selectors in a c2-formula. This function minimizes the number of selectors in the c-formula.

3. The number of events of F0 which intersect with a c2-formula. This function leads more rapidly to consistent c-formulas.

4. The total cost of all functions contained in a c2-formula.

5. The number of events of the original set F1 which are covered by a c2-formula. (The negative of this quantity is used to obtain a cost.) This function finds the most representative c-formulas.

A set of c-formulas is trimmed using n cost functions (cf_1, cf_2, ..., cf_n) and a relative tolerance for each cost function (tol_1, tol_2, ..., tol_n). The costs are applied in the order specified by the user (cf_1 first, cf_2 second, etc). For each cost function cf_i, the MAXSTAR best c2-formulas along with all c2-formulas equivalent in cost to the MAXSTAR best c-formulas are passed to the evaluation using the next cost function cf_(i+1). Other c2-formulas in the set of c-formulas are discarded. With the last specified cost function (cf_n), only the MAXSTAR best c-formulas are retained.

For each cost function cf_i, i=1,2,...,n, equivalence of two c-formulas in cost is defined using an absolute tolerance (AT_i). Suppose the set of c-formulas P is composed of a list p_1, p_2, ..., p_m. After the value cf_i(p_j) of cost function cf_i has been evaluated for each c-formula p_j, the maximum and minimum cost function values cf_i(p_max) and cf_i(p_min) are determined. An absolute tolerance (AT_i) is calculated using the user specified tolerance tol_i as follows:

AT_i = tol_i * (cf_i(p_max) - cf_i(p_min))

The MAXSTAR c-formulas of least cost are determined and the list reordered (p_1, p_2, ..., p_MAXSTAR, ..., p_m). For i = 1, ..., n-1, the c-formulas whose cost exceeds that of p_MAXSTAR by more than the absolute tolerance are discarded:

P = P - [p_j : cf_i(p_j) - cf_i(p_MAXSTAR) > AT_i]

If i=n (the last cost function) then only the MAXSTAR best c2-formulas are retained:

P = P - [p_j : j > MAXSTAR]

The set of c2-formulas which remains is the desired trimmed set of c2-formulas.

5.6 Formation of a Set of Consistent Generalizations

A star (denoted by MQ) is formed which covers E1. (A star which covers E1 is a set of consistent c2-formulas which cover E1.)
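Star formation applies the trimming step of section 5.5 repeatedly, so a concrete version of that step may be helpful at this point. The following Python sketch is only an illustration under stated assumptions: the function name `trim`, the tuple representation of a formula, and the two example cost functions are hypothetical and are not taken from INDUCE_1. It applies the cost functions in order, keeps every formula within the absolute tolerance AT_i of the MAXSTAR-th best at each stage but the last, and keeps exactly the MAXSTAR best at the last stage.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def trim(formulas: Sequence[T],
         cost_functions: Sequence[Callable[[T], float]],
         tolerances: Sequence[float],
         maxstar: int) -> List[T]:
    """Trim a set of formulas with ordered cost functions and tolerances."""
    remaining = list(formulas)
    last = len(cost_functions) - 1
    for i, (cf, tol) in enumerate(zip(cost_functions, tolerances)):
        if len(remaining) <= maxstar:
            break  # nothing can be discarded any more
        # sorted() is stable, so formulas of equal cost keep their input order.
        scored = sorted(((cf(p), p) for p in remaining), key=lambda pair: pair[0])
        costs = [c for c, _ in scored]
        abs_tol = tol * (costs[-1] - costs[0])  # AT_i = tol_i * (max - min)
        cutoff = costs[maxstar - 1]             # cost of the MAXSTAR-th best
        if i == last:
            remaining = [p for _, p in scored[:maxstar]]
        else:
            remaining = [p for c, p in scored if c - cutoff <= abs_tol]
    return remaining


# Hypothetical formulas: (name, events of F1 covered, number of selectors).
A = ("A", 10, 3)
B = ("B", 9, 1)
C = ("C", 5, 2)
D = ("D", 10, 4)


def cover_cost(f):
    return -f[1]  # counterpart of cost function 1 (negated coverage)


def selector_cost(f):
    return f[2]   # counterpart of cost function 2


loose = trim([A, B, C, D], [cover_cost, selector_cost], [0.3, 0.0], maxstar=1)
strict = trim([A, B, C, D], [cover_cost, selector_cost], [0.0, 0.0], maxstar=1)
```

With a tolerance of 0.3 on the first criterion, formula B survives the first stage although it covers one event fewer, and then wins on the second criterion; with a tolerance of 0 only A and D reach the second stage.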
The procedure begins by forming a partial star (P) which contains a set of c-formulas each consisting of one selector of E1. (The partial star may contain c2-formulas which are not consistent with respect to F0.) This partial star is trimmed according to the user supplied optimality criterion. The conjunction in each c-formula which remains after trimming is multiplied by each selector of E1 which is directly connected to it to form a new partial star. Consistent c2-formulas are placed in MQ. The partial star is again trimmed and new selectors added to each product until the desired set MQ of c2-formulas is obtained. Several parameters control the sizes of sets in this procedure:

MAXSTAR - the number of c-formulas in a partial star after trimming

NCONSIST - the minimum number of consistent c2-formulas which must be in MQ

ALTER - the maximum number of new alternatives which may be formed by adding selectors to an element of a partial star.

In the following discussion, equivalence type selectors (i.e., selectors of the form [f(x1,x2)=same]) are treated differently from selectors involving a function symbol and a set of values in the reference.

1. A partial star P is formed which contains all selectors of E1 with unary functions.

2. P is trimmed to contain only the best MAXSTAR c2-formulas. Consistent c-formulas are placed into MQ. If fewer than NCONSIST elements are in MQ, then step 3 is executed. Otherwise, the AQ procedure is applied to the elements of MQ (as described in section 5.7).

3. A new partial star P' is formed from the old one (P). For each element p_i in P, a list of all variables (i.e., arguments of selectors of p_i) is formed. All arguments of equivalence type selectors which occur in the corresponding selector of E1 are also included in the list.

4. A list of all selectors of E1 which are not already in p_i and which have at least one argument in the variable list (found in step 3) is created. If there are more than ALTER elements in this list, the best ALTER selectors are retained (using as a criterion the cost of functions in the selectors).

5. For each of these selectors, a new c2-formula is formed which contains the original selectors in p_i and the new selector. If the new c-formula contains an equivalence selector with only one argument, then the new c-formula is discarded; otherwise, it is placed in P'.

Steps 2 through 5 are repeated, setting P = P' in step 2, until NCONSIST elements are in MQ or until no new elements are in the new partial star P'.

5.7 Extending the References of a Consistent c2-formula

Each consistent c2-formula of MQ (obtained in section 5.6 step 2) contains an alternative, near optimal conjunction of selectors of E1 which distinguishes E1 from any c-formula of F0. Using some methods developed for the program AQ7, the reference of each of these selectors may be generalized to obtain a consistent c2-formula which will possibly cover more c-formulas of F1.

Given a graph G of a consistent c2-formula mq in MQ, a c-structure (G*) is created by replacing all references of nodes of G with * (the complete set of values for the function in the selector). The nodes of G* are enumerated n*_i (i=1,2,...,m) and a VL1 system is created with each VL1 variable x_i related to a node n*_i of G*. The domain (denoted D_i) of variable x_i is the same as the domain of the function or variable in the node n*_i. The VL1 event space may be defined:

E = D_1 x D_2 x ... x D_m

Two sets of VL1 complexes L and L' are formed from the events of the current set F1 and the set F0 respectively. Individual complexes in these sets are denoted l_i and l'_i. Each complex covers a set of points in the space E.
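To make the notion of a complex covering points of E concrete, here is a small Python sketch. It assumes, purely for illustration, that a complex is stored as one set of allowed values per variable x_i; the names `make_complex`, `events`, `intersects` and `covers` are hypothetical helpers and not part of INDUCE_1 or AQ7.

```python
from itertools import product
from typing import FrozenSet, Iterable, Iterator, Tuple

# A complex holds one value set per variable, in variable order (an assumption).
Complex = Tuple[FrozenSet[int], ...]


def make_complex(*value_sets: Iterable[int]) -> Complex:
    """Build a complex from one iterable of allowed values per variable."""
    return tuple(frozenset(s) for s in value_sets)


def events(cx: Complex) -> Iterator[Tuple[int, ...]]:
    """Enumerate the points of the event space covered by the complex."""
    return product(*(sorted(r) for r in cx))


def intersects(a: Complex, b: Complex) -> bool:
    """Two complexes share an event iff every pair of value sets overlaps."""
    return all(ra & rb for ra, rb in zip(a, b))


def covers(a: Complex, b: Complex) -> bool:
    """a covers b iff each value set of b is contained in that of a."""
    return all(rb <= ra for ra, rb in zip(a, b))


l1 = make_complex({0, 1}, {2})
l2 = make_complex({1}, {2, 3})
l3 = make_complex({0}, {3})
wide = make_complex({0, 1}, {2, 3})
```

In this representation the consistency requirement of this section amounts to `intersects` being false between the complex derived from mq and every complex obtained from F0.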
For each element of F1 and F0, all isomorphisms from the c-structure G* to the graph representation G of a c-formula in F1 or F0 are determined. For each isomorphism obtained from F1 and F0, a VL1 complex is created (l_i or l'_i). Denoting the value sets of the nodes in a subgraph of G which is isomorphic to G* as R_1, R_2, ..., R_m, the corresponding VL1 complex may be written:

[x_1=R_1][x_2=R_2] ... [x_m=R_m]

This complex covers the VL1 events

R_1 x R_2 x ... x R_m

in the event space E.

First, the complex l_1 which results from extracting the values from the nodes of the graph of mq is generated. Then all other isomorphisms from G* to c-formulas of F1 are determined and a complex is added to L for each isomorphism which results in a new complex that is not already in L. The set L' is created in a similar manner by generating all distinct complexes resulting from isomorphisms from G* to c-formulas of F0. Since the c-formula mq is consistent with regard to F0, the complex l_1 is disjoint from all complexes in L'. (That is, there is no point in the VL1 event space E which is in both l_1 and a complex of L'.) A near optimal extension of l_1 against L' in E may be calculated using a version of the AQ7 program. The best complex in this extension (l_q) is calculated according to a user defined criterion. The complex l_q is converted to a c-formula by replacing the value set of each node n*_i of G* by the reference of the selector with variable x_i in l_q. This c-formula is consistent with respect to the set F0. (This is evident since, if there were an event which satisfied both the c-formula from l_q and a c-formula of F0, then one could find a VL1 complex, using an isomorphism between G* and the c-formula of F0, which belonged to L' and intersected with l_q. But since l_q and L' are disjoint, this cannot happen.)

The cost function for the AQ7 procedure computes the cost of a complex. Cost functions may be selected from the following:

1. The number of elements in L which are covered by a complex but not covered by any previous l_q. This is the AQ7 counterpart to cost function 1 in section 5.5. (Use the negative of this value to get a cost.)

2. The number of selectors in a complex (the AQ7 counterpart to function 2 in section 5.5).

3. The number of elements of L covered by a complex which are associated with different events of F1.

4. The total cost of variables which appear in a complex (i.e., the cost of functions or variables in associated nodes of G*). This is the counterpart of function 4 in section 5.5.

5. The total number of events in L covered by a complex.

6. The number of events in L' covered by a complex (the AQ7 counterpart to function 3 in section 5.5).

Trimming is done in the manner described above for c-formulas (section 5.5). A MAXSTAR parameter is specified for this procedure and the complex l_q is selected from the extension using a MAXSTAR value of 1. Appendix B gives a more complete description of the AQ7 procedure.

5.8 Adding New Functions and Predicates to c-formulas

The program currently allows three types of new functions to be added to existing c-formulas: 1) global descriptors (meta functions) which count the frequency of occurrence of selectors with unary functions; 2) equivalence type predicates (of the form [f(x1,x2)=same], i.e., the value of f is the same for x1 and x2); 3) extremity type predicates ([LST-f(x1)] or [MST-f(x1)]) which indicate that the argument x1 is at one end of a sequence of binary predicates. Functions are added at the user's discretion.

Meta Selectors - There are two types of meta functions currently calculated and added to a c-formula as a meta selector. The first type (#PT(f=a), where f is an atomic function and a is a value in D(f)) counts the number of times a particular selector ([f(..)=a]) appears in a c-formula.
The second type (FORALL(f=a)) is a predicate which is true if a function assumes only one value in a c-formula and false otherwise. For each atomic function-reference pair which appears in any c-formula, a meta selector is added to the c-formula which has a meta function in the referee. For example, a c-formula

[tx(x1)=1][tx(x2)=1][sh(x1)=1][sh(x2)=0]

generates the four meta selectors:

[#PT(tx=1)=2] - the number of parts with tx=1 is 2
[#PT(sh=0)=1] - the number of parts with sh=0 is 1
[#PT(sh=1)=1] - the number of parts with sh=1 is 1
[FORALL(tx=1)] - all parts have tx=1.

Since the number of such selectors may be quite large, the list of meta functions is trimmed to a small set. The size of the set is determined by a parameter METATRIM supplied by the user, and the criterion is the degree to which a value of the new function will separate the sets F1 and F0. For each meta function-value combination (meta selector) which is generated, the number of c-formulas of F1 and F0 which satisfy the selector is calculated. Associated with the meta function (mf) are two numbers: F1COV and F0COV. F1COV is the maximum number of c-formulas of F1 covered by a meta selector arising from the meta function mf. F0COV is the number of c-formulas of F0 which are covered by the meta selector which gave the highest F1COV value. The list of possible meta functions is trimmed to METATRIM remaining meta functions by sorting the list in descending order according to the primary field F1COV and the secondary field F0COV and selecting the first METATRIM functions from this list. The meta selectors which result from applying each of the selected meta functions to each c-formula are automatically appended to the c-formulas and carried with each c-formula during the generalization process. Setting METATRIM to 0 bypasses the entire meta selector generation process.
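The generation of meta selectors can be illustrated with a short Python sketch that reproduces the example above. It is a minimal illustration, assuming each unary selector is given as a hypothetical (function, argument, value) triple with a single value in its reference; the function name `meta_selectors` is not taken from the program.

```python
from collections import Counter, defaultdict


def meta_selectors(selectors):
    """Return the #PT counts and the FORALL predicates of a c-formula.

    selectors: iterable of (function, argument, value) triples, one per
    unary selector with a single-valued reference (an assumption made
    for this sketch).
    """
    # #PT(f=a): how many selectors [f(..)=a] occur in the c-formula.
    pt_counts = Counter((f, value) for f, _arg, value in selectors)

    # FORALL(f=a): function f assumes only the value a in the c-formula.
    values_seen = defaultdict(set)
    for f, _arg, value in selectors:
        values_seen[f].add(value)
    forall = {(f, next(iter(vals)))
              for f, vals in values_seen.items() if len(vals) == 1}

    return dict(pt_counts), forall


# The c-formula [tx(x1)=1][tx(x2)=1][sh(x1)=1][sh(x2)=0] from the text.
c_formula = [("tx", "x1", 1), ("tx", "x2", 1), ("sh", "x1", 1), ("sh", "x2", 0)]
pt, forall = meta_selectors(c_formula)
```

Here `pt` holds the three #PT meta selectors listed above and `forall` holds the single FORALL(tx=1) predicate; the METATRIM ranking by F1COV and F0COV would then be applied across all c-formulas of F1 and F0.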
Equivalence Predicates - These predicates may have arbitrarily many arguments whose order is irrelevant. They are calculated by scanning all selectors in each c-formula for sets of 2 or more selectors with unary functions which have the same atomic function and reference. Such a set of selectors is said to be equivalent and a new predicate is created which contains an argument from each of the atomic forms of all equivalent selectors. For example, a c-formula of the form:

[s(x1)=1][s(x2)=1][s(x3)=1][s(x4)=2][s(x5)=2]

leads to the creation of predicates:

[s(x1.x2.x3)=same][s(x4.x5)=same]

Extremities Predicates - These are unary predicates which represent the ends of a sequence of selectors with the same binary predicate:

[p(x1,x2)][p(x2,x3)] ... [p(x_(i-1),x_i)]

where, in the sequence, if i>2 then the first argument of the j-th selector (1<=j ...

... 8        2-rectangle, 3-diamond
0,2,3,7=>9   4-open-top, 5-ellipse
             6-open bottom, 7-square

The input decision rule for the first object in O1 has the following form:

[ontop(p ,p )] [ontop(p ,p )] [size(p1)=1] [size(p2)=1] [size(p3)=2] [tx(p1)=0] [tx(p2)=1] [tx(p3)=0] [shape(p1)= ] [shape(p2)=1] [shape(p3)=4] [inside(p2,p3)] => [d=1].

with domain structure:

[shape=1,5] => [shape=8], [shape=0,2,3,7] => [shape=9].
The resulting domain table is:

NAME    NARG  TYPE  COST  MIN  MAX  STRUCTURE
FORALL  10 1
#PT     2
shape   1 3 9       1 5 => 8; 2 3 7 => 9;
ontop   2 1
p       4
p1      4
p2      4
p3      4
p4      4
size    1 2
tx      1 1
inside  2 1 1
d       1 3
next    2 1 1

Generalized decision rules using no new functions (8 seconds)

The program was run with the above decision rules several times with different parameters for each run. The first run through the program used modest values for the parameters AQMAXSTAR, ALTER, VLMAXSTAR and NCONSIST and default values and ordering of criteria, with no new functions added. The parameter values and resulting generalizations are given below:

Parameters:
AQPARMS                 VLPARMS
AQMAXSTAR = 4           LQST NCONSIST = 4  ALTER = 4  VLMAXSTAR = 4
AQCRIT  AQTOLERANCE     VLCRIT  VLTOLERANCE
-1 3 0.30 2 -1 4 2
NBR OF CRIT: AQNF = 2   VLNF = 3
NEW FNCTNS: METATRIM =

For the set O1, the program discovered one product which covers all elements of O1 but no element of O2 or O3. The product represented as a decision rule is:

[ontop(p1,p2)][size(p1)=1][size ... [d=1].

(If there is a clear, medium size part on top of a medium size part, then make decision d=1.)

Using a different set of parameters (NCONSIST=2), the following description (which is less optimal according to the criteria specified) is obtained:

[ontop(p ,p )][shape(p )=9][shape(p )=9] v [shape(p1)=3] => [d=1]

The description of the set O2 produced several alternative generalizations which describe one or two parts of O2 but only one generalization which covered all 3 parts in O2:

[ontop(p1,p2)][size(p )=1][shape(p )=8] => [d=2]
(there is an ellipsoid on top of a part with medium size)

There was no single product which covered all objects in O3. The program found one generalization which covered two examples of O3, namely:

[shape(p1)=5][tx(p1)=1] => [d=3]
(there is a shaded ellipse)

In covering the second object in O3, the program found a number of combinations of shape, texture and size of parts of the example of O3. The rule which was selected is:

[ontop(p1,p2)][shape(p )=2][tx(p )=1] => [d=3]
(there is a rectangle on top of a shaded part)

giving a final description of the set O3:

[shape(p1)=5][tx(p1)=1] v [ontop(p ,p )][shape(p )=2][tx(p2)=1] => [d=3]

Generalization with EXTMTY type predicates (8 seconds)

In the second application of the program to the example in figure 6.1, the EXTMTY type predicates were added to each input decision rule. The only difference in parameters is that these new predicates are added to the input decision rules.
The results are the following:

For the set O1, there were two alternative generalizations each of which covers all three parts in O1:

[shape(p )=9][tx(p )=0][MST-ontop(p )] => [d=1]
(the top part is a clear polygon)

[size(p1)=1][tx(p1)=0][MST-ontop(p1)] => [d=1]
(the top part has medium size)

Only one product was found to cover the set O2:

[shape(p )=8][MST-ontop(p )] => [d=2]
(the top part is an ellipsoid)

There is still no single product which covers all objects in O3; however, the description of the second object in O3 was somewhat simplified:

[shape(p )=5][tx(p )=1] v [shape(p1)=2][MST-ontop(p )] => [d=3]

Generalizations with meta selectors (9 seconds)

The next pass of the program included the three meta selectors which were determined by the program to be the best for describing each set of objects (using as criteria the values of F1COV and F0COV). The results along with the interpretation of the selected meta selectors are:

[ms1=2] => [d=1]
(there are 2 clear parts)

The selected meta-selectors are:

MS  TYPE  FUNCTION    F1COV  F0COV
1   #PT   tx = 0      3,
2   #PT   size = 1    3,     3
3   #PT   shape = 6   3,     5

The three meta selectors count the number of parts with each texture and the number of square parts.

The set O2 was covered by the disjunction of two selectors:

[ms3=1] v [ms3=3] => [d=2]
(there are either one or 3 clear parts)

The selected meta-selectors are:

MS  TYPE  FUNCTION    F1COV  F0COV
1   #PT   shape = 7   3,     4
2   #PT   shape = 6   3,     5
3   #PT   tx =        2,

Recall that #PT has an interval domain structure. Giving #PT a nominal domain structure, the rule is:

[ms3=1,3] => [d=2]

The most interesting simplification is with the set O3. Two alternatives were discovered:

[ms1=0] => [d=3]
[ms2=1] => [d=3]
(all parts have shaded texture)

The selected meta-selectors are:

MS  TYPE    FUNCTION    F1COV  F0COV
1   #PT     tx =        3,
2   FORALL  tx = 1      3,
3   #PT     shape = 5   2,     2

In summary, alternative descriptions for each set of objects are:

[#PT(tx=0)=2] => [d=1] or
[shape(p )=9][size(p )=1][MST-ontop(p )] => [d=1] or
[shape(p1)=9][tx(p )=0][MST-ontop(p )] => [d=1]
(there are 2 clear parts or the top part is a clear polygon or a polygon of medium size)

[shape(p )=8][MST-ontop(p )] => [d=2] or
[#PT(tx=0)=1,3] => [d=2]
(the top part is an ellipsoid or there are one or three clear parts)

[FORALL(tx=1)] => [d=3] or
[#PT(tx=0)=0] => [d=3]
(all parts are shaded)

Descriptive generalizations (10 seconds each set)

If the program is given the description of only one set of objects at a time, then a descriptive generalization of sorts may be found by adjusting some parameters. The significant modification of parameters is that cost function 2 (number of selectors) is replaced by -2 (i.e., the more selectors, the lower the cost). In addition, a restriction was added to the predicate 'ontop' to implement the transitive closure for this predicate:

[ontop(p1,p2)][ontop(p2,p3)] => [ontop(p1,p3)].

The resulting descriptions contain a product which covers all objects in a set and contains as many selectors as possible. Of course, since the descriptions of the other two sets of objects are not included, the descriptions are not discriminant, but they are complete.

AQPARMS                 VLPARMS
AQMAXSTAR = 4           LQST NCONSIST = 70  ALTER = 3  VLMAXSTAR = 3
AQCRIT  AQTOLERANCE     VLCRIT  VLTOLERANCE
-1 3 0.30 -2 C -1 4 -2
NBR OF CRIT: AQNF = 2   VLNF = 3
NEW FNCTNS: METATRIM = 3 EXTMTY

For each set, the products with the maximum number of selectors which the program found to be common to all objects of the set are in the following decision rules:

[ontop(p1,p2)][size(p1)=1] tftxtp^l ][shape(p )=9][tx(p )=0][MST-ontop(p1)][size(p2)=1,2][shape(p2)=2,4,5][tx(p2)=0][LST-ontop(p )][ms1=0][ms2=2][ms3=2] => [d=1]

The selected meta-selectors are:

MS  TYPE  FUNCTION    F1COV  F0COV
1   #PT   shape = 6   3,
2   #PT   size = 1    3,
3   #PT   tx =        3,

(The bottom part is clear and has size medium or large. The top part is a clear polygon, has medium size and there are no parts with an open bottom, 2 parts of medium size, and 2 clear parts.)

[ontop(p ,p )][ontop(p ,p )][size(p )=1,2][size(p )=1,2][shape(p1)=8][shape(p2)=0,1,2,4][LST-ontop(p2)][ms1=0][ms2=0][ms3=0,1] => [d=2]

The selected meta-selectors are:

MS  TYPE  FUNCTION    F1COV  F0COV
1   #PT   shape = 6   3,
2   #PT   shape = 7   3,
3   #PT   shape = 5   2,     C

(The top part is an ellipsoid which is not small and is on top of two parts. The bottom part is of medium or large size. Also, there are no square or open bottom parts and no more than 1 ellipse.)

[ontop(p1,p2)][size(p )=0,1][size(p )=1,2][tx(p )=1][tx(p )=1][MST-ontop(p )][shape(p )=9][LST-ontop(p )][ms1=0][ms2=1][ms3=0,1] => [d=3]

The selected meta-selectors are:

MS  TYPE    FUNCTION    F1COV  F0COV
1   #PT     tx = 0      3,
2   FORALL  tx = 1      3,     C
3   #PT     shape = 6   2,     C

(There are at least two parts, both shaded, with the top part being a polygon of small or medium size and the bottom part being of medium or large size. All parts are shaded and there is not more than 1 open bottom part.)

Figures for Example EX 1
Figure 6.1

6.2 Arch Example (EX 2)

The objects in figure 6.2 are examples which describe an arch much like the arch which Winston describes [Winston 70]. Some additional shapes and relations have been added to give a more interesting example. The set A1 includes examples of arches and the set A2 contains examples of objects which are not arches.
The descriptors used in this example are given below:

Descriptor        Structure  Domain
rel               nominal    0-on top, 1-under, 2-left, 3-right, 4-other
touch             nominal    0-false, 1-true
or (orientation)  nominal    0-horizontal, 1-vertical
shape             nominal    0-rectangle, 1-triangle, 2-ellipse

The decision rule for the first arch in A1 as given to the program is:

[rel(p ,p )=0] [rel(p ,p )=0] [rel(p2,p3)=2] [touch(p ,p )=1] [touch(p ,p )=1] [touch(p ,p )=0] [or(p1)=0] [or(p )=1] [or(p )=1] [shape(p )=0] [shape(p )=0] [shape(p )=0] => [d=1].

Restrictions describing the character of the functions rel and touch are added to each rule:

[rel(p1,p2)=4] => [rel(p2,p1)=4].
[rel(p1,p2)=0] => [rel(p2,p1)=1].
[rel(p1,p2)=1] => [rel(p2,p1)=0].
[rel(p1,p2)=2] => [rel(p2,p1)=3].
[rel(p1,p2)=3] => [rel(p2,p1)=2].
[touch(p1,p2)=1] => [touch(p2,p1)=1].
[touch(p1,p2)=0] => [touch(p2,p1)=0].

The program was run twice with these examples: once to find an optimal discriminant generalization, and again to find a descriptive generalization. The parameters and results are as follows:

AQPARMS                 VLPARMS
AQMAXSTAR = 2           LQST NCONSIST = 12  ALTER = 1  VLMAXSTAR = 1
AQCRIT  AQTOLERANCE     VLCRIT  VLTOLERANCE
-1 0.30 -2 4 -1 -2
NBR OF CRIT: AQNF = 2   VLNF = 3
NEW FNCTNS: METATRIM

Discriminant description of set A1 (12 seconds)

[rel(p ,p )=0] [touch(p ,p )=1] [touch(p ,p )=1] [touch(p ,p )=0] [or(p )=1] [or(p .p )=same] [shape(p .p )=same] => [d=1]

(There is a part which is touching two others and on top of one. The other two parts are not touching but they both have the same shape and a vertical orientation.)

Descriptive decision rule for A1 (2 seconds)

[rel(p ,p )=0] [rel(p ,p )=0] [rel(p ,p )=2,3] [touch(p ,p )=1] [touch(p ,p )=1] [touch(p ,p )=0] [or(p )=0] [or(p )=1] [or(p )=1] [shape(p )=0,1] [or(p .p )=same] [shape(p .p )=same] => [d=1]

It so happens that the descriptive decision rules are also discriminant in this instance, but this is normally not the case. The results correspond very closely to Winston's description of an arch, namely that an arch has two supporting members of a top piece which are not touching.

Arches for Example EX 2
Figure 6.2

6.3 Trains (EX 3)

The trains in figure 6.3 (from [Larson Michalski 1977]) represent two sets of trains, one set going west and the other set going east. There are 12 descriptors which seem relevant to the situation in figure 6.3:

Descriptor        Structure   Domain
ncars             interval    [3:5]
nwhl (#whl/car)   interval    [2,3]
ln (length)       interval    0-short, 1-long
cshape            tree-struc  0-open rctngl, 1-open trap., 2-0-shaped, 3-hexagon,
                              4-ellipse, 5-dbl open rctngl, 6-closed rctngl,
                              7-jagged top, 8-sloping top, 9-locomotive,
                              10-open top, 11-closed top
                              0,1,2,5=>10; 3,4,6,7,8,9=>11
npl (#pts/load)   interval    [0:3]
lshape            nominal     0-circle, 1-hexagon, 2-triangle, 3-rectangle
if (in front)     nominal     0-false, 1-true
ccont             nominal     0-false, 1-true
lcont             nominal     0-false, 1-true
t (train)         nominal     [1]
car               nominal     [1:5]
lod               nominal     [1:5]

The variable (t) and the predicate ccont (train contains car) are introduced to form weakly connected graph structure representations of input decision rules.
The description of each train contains about 60 selectors, so only the description of the first train is given below:

[ncar(t1)=5]
[ccont(t1,car1)=1] [ccont(t1,car2)=1] [ccont(t1,car3)=1] [ccont(t1,car4)=1] [ccont(t1,car5)=1]
[if(car ,car )=1] [if(car ,car )=1] [if(car ,car )=1] [if(car ,car )=1]
[loc(car1)=1] [loc(car2)=2] [loc(car3)=3] [loc(car4)=4] [loc(car5)=5]
[nwhl(car1)=2] [nwhl(car2)=2] [nwhl(car3)=2] [nwhl(car4)=3] [nwhl(car5)=2]
[ln(car1)=1] [ln(car2)=1] [ln(car3)=0] [ln(car4)=1] [ln(car5)=0]
[cshape(car1)=9] [cshape(car2)=0] [cshape(car3)=8] [cshape(car4)=0] [cshape(car5)=0]
[npl(car1)=0] [npl(car2)=3] [npl(car3)=1] [npl(car4)=1] [npl(car5)=1]
[lcont(car2,lod1)=1] [lcont(car2,lod2)=1] [lcont(car2,lod3)=1] [lcont(car3,lod4)=1] [lcont(car4,lod5)=1] [lcont(car5,lod6)=1]
[lshape(lod1)=3] [lshape(lod2)=3] [lshape(lod3)=3] [lshape(lod4)=2] [lshape(lod5)=1] [lshape(lod6)=0]
=> [d=1].

The parameters and discriminant generalizations from this example are now presented.

AQPARMS                 VLPARMS
AQMAXSTAR = 4           LQST NCONSIST = 8  ALTER = 8  VLMAXSTAR = 1
AQCRIT  AQTOLERANCE     VLCRIT  VLTOLERANCE
-1 2 4                  3 -1 2  0.30
NBR OF CRIT: AQNF = 2   VLNF = 3
NEW FNCTNS: METATRIM =

The discriminant generalizations obtained were the following (10 seconds):

[cshape(car )=11][ln(car )=0] => [d=1]
(there is a short, closed top car)

[ncar(t )=3] v [cshape(car )=7] => [d=2]
(there are three cars or a car with a jagged top)

Some other interesting generalizations which were obtained by using different parameters are:

[ncars(t )=4,5][lshape(lod )=2] => [d=1]
(If there are at least 4 cars and one car has a load which is a triangle, then the train is going east.)

[loc(car )=4][cshape(car )=6] v [loc(car )=5][cshape(car )=0] => [d=1]
(If the fourth car is a closed rectangle or the fifth is an open rectangle, then the train is going east.)

[#PT(ln=1)=2][FORALL(nwhl=2)] v [#PT(ln=1)=2][loc(car )=3][cshape(car )=1] => [d=2].
(If there are two long cars and either all cars have 2 wheels or the third car has an open top, then the train is going west.)

[#PT(ln=1)=2][cshape(car )=1,2] v [loc(car )=2][cshape(car )=6] => [d=2]
(If there are exactly 2 long cars, one of which has shape open trapezoid or D-shape, or car number 2 is a closed rectangle, then the train is going west.)

1. TRAINS GOING EAST
2. TRAINS GOING WEST

Trains for Example EX 3
Figure 6.3

6.4 Textures (EX 4)

This example is included to give some idea of the limitations of the program. Figure 6.4 (from [Michalski 1972]) contains examples of figures with 9 boxes, numbered from 1 to 9 from the center to the right and around the outside in a clockwise direction, each with one of 4 textures. There are several ways to represent the figures. The first representation takes advantage of the ordering of arguments in a function to specify the location of each box (i.e., the third box corresponds to the third argument of the predicate p). Since there is no apparent structure to each object, the program relies heavily on the AQ7 procedure to find a simple description of each set of objects. A description of the first object in the set T1 and the results follow:

[p(x1,x2,x3,x4,x5,x6,x7,x8,x9)] [tx(x1)=0] [tx(x2)=3] [tx(x3)=1] [tx(x4)=3] [tx(x5)=1] [tx(x6)=3] [tx(x7)=0] [tx(x8)=2] [tx(x9)=0] => [d=1]

Resulting decision rules (15 seconds):

[p(x1,x2,x3,x4,x5,x6,x7,x8,x9)] [tx(x3)=0,2,3] [tx(x )=1,2] [tx(x )=0,1] v
[p(x1,x2,x3,x4,x5,x6,x7,x8,x9)] [tx(x2)=0,3] [tx(x )=1,2] => [d=1]

[p(x1,x2,x3,x4,x5,x6,x7,x8,x9)] [tx(x )=1] [tx(x )=2] v
[p(x1,x2,x3,x4,x5,x6,x7,x8,x9)] [tx(x2)=1,2] [tx(x3)=1,2] [tx(x )=0,1,2] v
[p(x1,x2,x3,x4,x5,x6,x7,x8,x9)] [tx(x )=0,3] [tx(x )=0,2,3] [tx(x5)=0,3] => [d=2]

If the program is now allowed to generalize over location as well as texture, then the results are much more difficult to attain. The above representation may be modified by specifying that the order of arguments is irrelevant to the predicate p (using a (.) between arguments instead of a (,)) and specifying the location of a box in the object by assigning a value to the variable representing the box (assigning a subrange x_i=2, for example).
The description of an object then appears:

[p(x_1.x_2.x_3.x_4.x_5.x_6.x_7.x_8.x_9)][x_1=1][x_2=2][x_3=3][x_4=4][x_5=5][x_6=6][x_7=7][x_8=8][x_9=9][tx(x_1)=0][tx(x_2)=3][tx(x_3)=1][tx(x_4)=3][tx(x_5)=1][tx(x_6)=3][tx(x_7)=0][tx(x_8)=2][tx(x_9)=0] => [d=1].

The program is allowed to generalize over values of both tx(x_i) and x_i. The features of the graph matching algorithm which previously led to efficient comparisons between objects are no longer useful (i.e., the fact that all nodes are labelled and that many edges are labelled). Since the c-structure of a formula of the form

[x_1=3][tx(x_1)=2][p(x_1.x_2)][x_2=1][tx(x_2)=1]

is

[x_1=*][tx(x_1)=*][p(x_1.x_2)][x_2=*][tx(x_2)=*]

there are very many occurrences of such a c-structure in each object, resulting in a large number of VL events supplied to the AQ7 procedure (200 events in each set with from 6 to 8 variables). The program was run with the value 1 for each parameter, but unfortunately it ran for over 1 minute with no results. The input set of decision rules was then cut to only the first 5 objects in set T_1 and the last five objects in set T_2, and the program was run again with this reduced set of objects.
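The growth in the number of c-structure occurrences can be illustrated with a small count. The sketch below assumes, purely for illustration, that every choice of k distinct boxes in a 9-box figure yields one occurrence of a k-variable c-structure when argument order is irrelevant, and one occurrence per ordered choice otherwise. The function name `occurrences` is hypothetical, and the count is not claimed to reproduce the 200 events reported above.

```python
from math import comb, perm

N_BOXES = 9  # boxes per texture figure, as stated in the text


def occurrences(k: int, ordered: bool = False) -> int:
    """Ways to bind k distinct variables to distinct boxes of one object.

    ordered=False treats the arguments of p as unordered (the "." form);
    ordered=True counts every ordered assignment separately.
    """
    return perm(N_BOXES, k) if ordered else comb(N_BOXES, k)


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, occurrences(k), occurrences(k, ordered=True))
```

Even under these simplifying assumptions the number of candidate bindings per object rises quickly with the number of variables, which is consistent with the long run times reported for this representation.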
The results of the program execution using this formulation are: (1 minute)

[p(x_1.x_2)][x_1=1][x_2=9][tx(x_1)=2][tx(x_2)=1] v
[p(x_1.x_2)][x_1=1][x_2=5,9][tx(x_1)=0,3][tx(x_2)=1,2,3] v
[x_1=3][tx(x_1)=3] => [d=1]

[p(x_1.x_2)][x_1=1][x_2=4][tx(x_1)=3][tx(x_2)=3] v
[p(x_1.x_2)][x_1=1][x_2=9][tx(x_1)=2][tx(x_2)=0] v
[x_1=1][tx(x_1)=1] v
[x_1=9][tx(x_1)=2] v
[p(x_1.x_2)][x_1=1][x_2=5][tx(x_1)=0][tx(x_2)=0] => [d=2]

(Presumably, if the parameters were increased, the results would be much more interesting, but this appears to be the limit of the program's applicability because of the exponential nature of the graph isomorphism algorithm.) The time required to compute these results is significantly greater than the time required when the location of each box is known (one minute compared to 15 seconds).

[Figure] Texture Figures for Example EX 4
Figure 6.4

7. Current Limitations and Possible Extensions

In the preceding chapters, a formal methodology has been presented for solving a wide variety of inductive tasks. It differs from previous work in its formal basis, flexible optimality criteria and domain definitions, ability to add new descriptors to existing descriptions, and apparent extensibility. Presented below are a few limitations of the current implementation. Some of the limitations have plausible solutions in light of applications by other authors in specific problem areas. Some extensions could be added to the program with the current state of the art and are omitted from the current implementation because of the higher priority placed on evaluating the existing features. Some features require algorithms which are not yet well understood (especially those which involve the modification of hypotheses in light of new information and the use of multi-valued truth domains). Extensions are listed in the author's estimate of increasing difficulty.

1.
The restriction which requires generalizations to have weakly connected graph structure representations has the advantage that the search space is limited and resulting descriptions contain selectors which are related in some meaningful way. This constraint should be applied at the option of the user.

2. The graph structure used in the current implementation has not been fully analyzed. It appears that unary functions represent a special type of description of parts of objects and are well suited to generalizations using a VL system. Functions and predicates of higher order represent relationships between parts and are more amenable to generalization using a VL approach (i.e., 'growing' consistent generalizations). These two roles of functions and predicates have not been investigated.

3. The program has a potential for application to more abstract objects (e.g., patient records used by the MYCIN system). Up until now, only simple physical objects have been used as examples.

4. There should be a facility to add new user defined functions to a rule of the type:

V |= V[SUM_cost=3]

where V is [cost(x_1)=2][cost(x_2)=1]

5. The tree structure domain may be extended to a generalization structure as presented by Michalski [Michalski 75].

6. Only the operators (v and &) are currently understood by the program. The VL system contains a very rich set of operators, one of the most important being the exception operator. In the current implementation, there is no generalization of decision D=1 in the set of rules:

[p(x_1,x_2)] => [D=1]
[p(x_1,x_2)][s(x_1)=1] => [D=2]

7. Decision rules which are used heavily in production systems contain condition and decision parts which share arguments. These should be handled by the program in some way, e.g.,

[p(x_1,x_2)] => [q(x ,x )]

8. Descriptive generalizations only contain one conjunctive expression.
In some circumstances, it is desirable to produce a disjunctive, descriptive description (see Yuen for a program UNICLASS which generates descriptive descriptions for VL1 expressions).

9. The order in which objects are selected from the set P1 seems to be significant. The program ESEL, whose algorithm is described in [Michalski 75], gives a method for selecting VL type events.

10. Universal quantifiers have been considered only in a global sense, since a simple universally quantified expression is less general than an existentially quantified expression in the condition part of a rule. A universal or distinct universal quantifier could be added to the program.

11. Only the distinct existential quantifier is included in the program. This decision was made in order to limit the search space as well as to simplify the user input and the resulting expressions. Hayes-Roth [Hayes-Roth 76] has pointed out the utility of a 'many to one' mapping instead of the current one to one map.

12. No facility is included in the current implementation to accommodate initial hypotheses in order to form a new consistent, complete generalization in light of new information. Larson [Larson 76] and Hedrick [Hedrick 74] give some approaches to this problem using other systems. For example, given a hypothesis:

[p(x_1,x_2)][s(x )=1] => [D=1]

and the new information that

[p(x_1,x_2)][s(x_2)=2] => [D=2],

the program should generate a new rule which is consistent, e.g.,

[p(x_1,x_2)][s(x_2)-=2] => [D=1]

13. Generalizations of subsets of parts of an object should be usable by the program to form new, higher level generalizations in terms of these generalized concepts (e.g., Winston [Winston 70] generates the description of an arch and then uses this description to describe a sequence of arches).

14. Include a multi-valued truth domain to learn from data of differing degrees of certainty.
Very little has been done in this area; one of the problems is that a deductive system using multi-valued truth values is not well understood.

LIST OF REFERENCES

Banerji, R. 1975, "Learning with Structural Descriptions." Working paper, Temple University, Philadelphia, Penn.

Bongard, M. 1970, Pattern Recognition. Translated by Theodore Cheron, Spartan Books, New York, N.Y.

Buchanan, B.G., Sutherland, G.L., and Feigenbaum, E.A. 1969, "Heuristic DENDRAL: a Program for Generating Explanatory Hypotheses in Organic Chemistry", in Machine Intelligence 4 (B. Meltzer and D. Michie eds), Edinburgh University Press.

Buchanan, B.G., Sutherland, G.L., and Feigenbaum, E.A. 1972, "Heuristic Theory Formation, Data Interpretation, and Rule Formation." Machine Intelligence 7 (B. Meltzer and D. Michie eds), John Wiley & Sons.

Chilausky R., Jacobsen B., Michalski R.S. 1976, "An Application of Variable Valued Logic to Inductive Learning of Plant Disease Diagnostic Rules", Proceedings of the Sixth Annual Symposium on Multiple Valued Logic, Logan, Utah.

Croft, J.A. 1971, "A Comparative Study of Mathematical Methods for Diagnosing Disease." Ph.D. dissertation, Northwestern University, Evanston, Illinois.

Hayes-Roth, F. and McDermott, J. 1976, "Knowledge Acquisition from Structural Descriptions." Department of Computer Science, Carnegie Mellon University.

Hedrick C.L. 1974, "A Computer Program to Learn Production Systems Using a Semantic Net." Department of Computer Science, Carnegie Mellon University, Pittsburgh, Penn.

Hunt, E.B. 1966, Experiments in Induction. Academic Press.

Kochen, M. 1974, "Representation and Algorithms for Cognitive Learning." Artificial Intelligence 5.

Larson J., Michalski R.S., "AQVAL/1-AQ7 User's Guide and Program Description." Department of Computer Science report number 731, University of Illinois.

Larson J. 1976, "A Multi-step Formation of Variable Valued Logic Hypotheses."
Proceedings of the Sixth International Symposium on Multiple Valued Logic, Logan, Utah.

Larson J., Michalski R.S. 1977, "Inductive Inference of VL Decision Rules." Workshop on Pattern Directed Inference Systems, Hawaii.

Michalski R.S. 1972, "A Variable-Valued Logic System as Applied to Picture Description and Recognition," Graphic Languages, Proceedings of the IFIP Working Conference on Graphic Languages, Vancouver, Canada.

Michalski R.S. 1973, "AQVAL/1 AQ7 Computer Implementation of a Variable Valued Logic System and the Application to Pattern Recognition." Proceedings of the First International Joint Conference on Pattern Recognition, Washington, D.C.

Michalski R.S. 1974a, "Variable-Valued Logic: System VL1." Proceedings of the Fourth International Symposium on Multiple-Valued Logic, West Virginia University, Morgantown, West Virginia.

Michalski R.S. 1974b, "Learning by Inductive Inference." NATO Advanced Study Institute on Computer Oriented Learning Processes, France.

Michalski R.S. 1974c, "Problems of Designing an Inferential Medical Consulting System." First Illinois Conference on Medical Information Systems, University of Illinois, Urbana, Illinois.

Michalski R.S. 1975, "On the Selection of Representative Samples From Large Relational Tables for Inductive Inference." Department of Information Engineering, University of Illinois at Chicago Circle, Chicago, Illinois.

Michalski R.S. 1977, "Toward Computer-aided Induction: a Brief Review of Currently Implemented AQVAL Programs." 5th International Conference on Artificial Intelligence, Cambridge, Mass.

Morgan C.G. 1972, "Inductive Resolution." Master's thesis, Department of Computer Science, Edmonton, Alberta.

Pople, H., Werner G. 1972, "An Information Processing Approach to Theory Formation in Biomedical Research." AFIPS Conference Proceedings, Vol 40.

Rychener, M.D. 1976, "Production Systems as a Programming Language for Artificial Intelligence Applications." Ph.D. dissertation.
Department of Computer Science, Carnegie Mellon University, Pittsburgh, Penn.

Schubert K.L. 1976, "Extending the Expressive Power of Semantic Networks." Artificial Intelligence 7.

Shortliffe, E.H. 1974, "A Rule Based Computer Program for Advising Physicians Regarding Antimicrobial Therapy Selection." Ph.D. dissertation, Stanford Artificial Intelligence Laboratory, Memo AIM-251.

Tarjan R. 1972, "Depth-First Search and Linear Graph Algorithms." SIAM Journal on Computing, Vol 1, No 2.

Vere S.A. 1975, "Induction of Concepts in the Predicate Calculus," Proceedings of the Fourth International Joint Conference on Artificial Intelligence.

Waterman D.A. 1970, "Generalization of Learning Techniques for Automating the Learning of Heuristics," Artificial Intelligence 1.

Waterman D.A. 1974, "Adaptive Production Systems." Working paper #385, Department of Psychology, Carnegie Mellon University, Pittsburgh, Penn.

Waterman D.A. 1975, "Serial Pattern Acquisition: A Production System Approach." Working paper #286, Department of Psychology, Carnegie Mellon University, Pittsburgh, Penn.

Winston, P.H. 1970, "Learning Structural Descriptions from Examples," AI-TR-76, Cambridge: MIT, Artificial Intelligence Laboratory.

Yuen H., "Uniclass Synthesis: User's Guide." Internal report, Department of Computer Science, University of Illinois, Urbana, Illinois.

APPENDIX A

This appendix contains a trace of the program execution through the solution of EX1 (section 6.1), covering the set with c-formulas with the EXTMTY type predicates added to each input rule. Most of the output is self explanatory (with reference to chapter 6 for the definition of some terms), with the possible exception of the output of rules. Rules are printed with a rule number (a unique number assigned by the program to each new graph data structure which is needed).
The event set which the rule is intended to cover (i.e., the decision value i in the decision D=i) is given after the rule number. Next, the cost function indices used in trimming are given (in parentheses after the word COSTS), followed by a list of the actual costs of the c-formula. Comments are interspersed with the output to clarify specific points. (Computer output is given in capital letters.)

Parameters:

AQPARMS                    VLPARMS
AQMAXSTAR = 7              NCONSIST = 10
LQST                       ALTER = 4
                           VLMAXSTAR = 4
AQCRIT  AQTOLERANCE        VLCRIT  VLTOLERANCE
  -1                          3      0.30
   2                         -1
   4                          2
NBR OF CRIT: AQNF = 2      VLNF = 3
NEW FNCTNS: METATRIM =

NOW COVERING EVENT
RULE 5 EVENT SETS: 1 COSTS ( 3 -1 2)
  [ONTOP(P1,P2)][ONTOP(P2,P3)][SIZE(P1)=1][SIZE(P2)=1][SIZE(P3)=2][SHAPE(P1)=0][SHAPE(P2)=2][SHAPE(P3)=5][TX(P1)=0][TX(P2)=1][TX(P3)=0][MST-ONTOP(P1)][LST-ONTOP(P3)]

A c-formula has been selected from set 1; it is represented by rule number 5.

THE FOLLOWING FORMULAS ARE IN THE UNTRIMMED STAR
RULE 22 EVENT SETS: 1 COSTS ( 3 -1 2)     -3
  [LST-ONTOP(P1)]
RULE 21 EVENT SETS: 1 COSTS ( 3 -1 2)     -3
  [MST-ONTOP(P1)]
RULE 20 EVENT SETS: 1 COSTS ( 3 -1 2)   3 -3
  [TX(P1)=0]
RULE 19 EVENT SETS: 1 COSTS ( 3 -1 2)   6 -3
  [TX(P1)=1]
RULE 18 EVENT SETS: 1 COSTS ( 3 -1 2)   3 -3
  [TX(P1)=0]
RULE 17 EVENT SETS: 1 COSTS ( 3 -1 2)   3 -1
  [SHAPE(P1)=5]
RULE 16 EVENT SETS: 1 COSTS ( 3 -1 2)   2 -2
  [SHAPE(P1)=2]
RULE 15 EVENT SETS: 1 COSTS ( 3 -1 2)   2 -1
  [SHAPE(P1)=0]
RULE 14 EVENT SETS: 1 COSTS ( 3 -1 2)   4 -2
  [SIZE(P1)=2]
RULE 13 EVENT SETS: 1 COSTS ( 3 -1 2)   5 -3
  [SIZE(P1)=1]
RULE 12 EVENT SETS: 1 COSTS ( 3 -1 2)   5 -3
  [SIZE(P1)=1]

All selectors of the event (the condition of rule 5) are placed in the initial partial star. Since VLMAXSTAR has the value 4, the above list is trimmed to a smaller list of the 4 best c-formulas in the partial star.
THE FOLLOWING FORMULAS REMAIN AFTER TRIMMING
RULE 16 EVENT SETS: 1 COSTS ( 3 -1 2)   2 -2 1
  [SHAPE(P1)=2]
RULE 14 EVENT SETS: 1 COSTS ( 3 -1 2)   4 -2 1
  [SIZE(P1)=2]
RULE 12 EVENT SETS: 1 COSTS ( 3 -1 2)   5 -3 1
  [SIZE(P1)=1]
RULE 18 EVENT SETS: 1 COSTS ( 3 -1 2)   3 -3 1
  [TX(P1)=0]

The next partial star is formed by adding one selector to all c-formulas in the previous partial star, forming a list of c-formulas which have two selectors.

THE FOLLOWING FORMULAS ARE IN THE UNTRIMMED STAR
RULE 38 EVENT SETS: 1 COSTS ( 3 -1 2)     -3
  [TX(P1)=0][MST-ONTOP(P1)]
RULE 37 EVENT SETS: 1 COSTS ( 3 -1 2)     -1
  [SHAPE(P1)=0][TX(P1)=0]
RULE 36 EVENT SETS: 1 COSTS ( 3 -1 2)     -3
  [SIZE(P1)=1][TX(P1)=0]
RULE 35 EVENT SETS: 1 COSTS ( 3 -1 2)     -3
  [ONTOP(P1,P2)][TX(P1)=0]
RULE 34 EVENT SETS: 1 COSTS ( 3 -1 2)     -3
  [SIZE(P1)=1][MST-ONTOP(P1)]
RULE 33 EVENT SETS: 1 COSTS ( 3 -1 2)     -3
  [SIZE(P1)=1][TX(P1)=0]
RULE 32 EVENT SETS: 1 COSTS ( 3 -1 2)   2 -1 2
  [SIZE(P1)=1][SHAPE(P1)=0]
RULE 31 EVENT SETS: 1 COSTS ( 3 -1 2)   4 -3 2
  [ONTOP(P1,P2)][SIZE(P1)=1]
RULE 30 EVENT SETS: 1 COSTS ( 3 -1 2)   3 -2 2
  [SIZE(P1)=2][LST-ONTOP(P1)]
RULE 29 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -2 2
  [SIZE(P1)=2][TX(P1)=0]
RULE 28 EVENT SETS: 1 COSTS ( 3 -1 2)   3 -1 2
  [SIZE(P1)=2][SHAPE(P1)=5]
RULE 27 EVENT SETS: 1 COSTS ( 3 -1 2)   3 -2 2
  [ONTOP(P1,P2)][SIZE(P2)=2]
RULE 26 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -1 2
  [SHAPE(P1)=2][TX(P1)=1]
RULE 25 EVENT SETS: 1 COSTS ( 3 -1 2)   2 -2 2
  [SIZE(P1)=1][SHAPE(P1)=2]
RULE 24 EVENT SETS: 1 COSTS ( 3 -1 2)   2 -1 2
  [ONTOP(P1,P2)][SHAPE(P1)=2]
RULE 23 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -2 2
  [ONTOP(P1,P2)][SHAPE(P2)=2]

At most 4 new selectors were added to each c-formula of the previous partial star (ALTER=4). This partial star is now trimmed to the 4 best elements according to the optimality criterion.
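The growth and trimming steps shown in the trace so far can be sketched as follows. This is only an illustration under simplifying assumptions: a c-formula is modeled as a frozenset of selector strings, the first `alter` unused selectors of the event are added in the order given, the connectivity requirement on the graph structure is ignored, and the optimality criterion is replaced by a caller-supplied sort key without tolerances. The names `grow` and `trim` are hypothetical and are not taken from INDUCE_1.

```python
def grow(partial_star, event, alter):
    """Add one selector of the event to each c-formula.

    At most `alter` new selectors are tried per c-formula, and
    duplicate results are dropped.
    """
    grown = []
    for formula in partial_star:
        unused = [s for s in event if s not in formula][:alter]
        for selector in unused:
            candidate = formula | {selector}
            if candidate not in grown:
                grown.append(candidate)
    return grown


def trim(star, cost, maxstar):
    """Keep the `maxstar` best c-formulas under the key `cost` (lower is better)."""
    return sorted(star, key=cost)[:maxstar]


if __name__ == "__main__":
    event = ["[SIZE(P1)=1]", "[SHAPE(P1)=2]", "[TX(P1)=0]"]
    star = [frozenset({"[SIZE(P1)=1]"}), frozenset({"[TX(P1)=0]"})]
    for formula in trim(grow(star, event, alter=4), cost=len, maxstar=2):
        print(sorted(formula))
```

Because the real program orders candidates by several cost functions with tolerances, this sketch is not expected to reproduce the particular rules retained in the trace.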
THE FOLLOWING FORMULAS REMAIN AFTER TRIMMING
RULE 25 EVENT SETS: 1 COSTS ( 3 -1 2)     -2 2
  [SIZE(P1)=1][SHAPE(P1)=2]
RULE 23 EVENT SETS: 1 COSTS ( 3 -1 2)     -2 2
  [ONTOP(P1,P2)][SHAPE(P2)=2]
RULE 29 EVENT SETS: 1 COSTS ( 3 -1 2)     -2 2
  [SIZE(P1)=2][TX(P1)=0]
RULE 38 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -3
  [TX(P1)=0][MST-ONTOP(P1)]

THE FOLLOWING FORMULAS ARE IN THE UNTRIMMED STAR
RULE 51 EVENT SETS: 1 COSTS ( 3 -1 2)     -1
  [SHAPE(P1)=0][TX(P1)=0][MST-ONTOP(P1)]
RULE 50 EVENT SETS: 1 COSTS ( 3 -1 2)     -3
  [SIZE(P1)=1][TX(P1)=0][MST-ONTOP(P1)]
RULE 49 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -3
  [ONTOP(P1,P2)][TX(P1)=0][MST-ONTOP(P1)]
RULE 48 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [SIZE(P1)=2][TX(P1)=0][LST-ONTOP(P1)]
RULE 47 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -1
  [SIZE(P1)=2][SHAPE(P1)=5][TX(P1)=0]
RULE 46 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SIZE(P2)=2][TX(P2)=0]
RULE 45 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SHAPE(P2)=2][TX(P1)=0]
RULE 44 EVENT SETS: 1 COSTS ( 3 -1 2)     -1
  [ONTOP(P1,P2)][SHAPE(P1)=0][SHAPE(P2)=2]
RULE 43 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -2
  [ONTOP(P1,P2)][SIZE(P2)=1][SHAPE(P2)=2]
RULE 42 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -2
  [ONTOP(P1,P2)][SIZE(P1)=1][SHAPE(P2)=2]
RULE 41 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -1
  [SIZE(P1)=1][SHAPE(P1)=2][TX(P1)=1]
RULE 40 EVENT SETS: 1 COSTS ( 3 -1 2)   2 -1
  [ONTOP(P1,P2)][SIZE(P1)=1][SHAPE(P1)=2]
RULE 39 EVENT SETS: 1 COSTS ( 3 -1 2)   1 -2
  [ONTOP(P1,P2)][SIZE(P2)=1][SHAPE(P2)=2]

THE FOLLOWING FORMULAS REMAIN AFTER TRIMMING
RULE 56 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SHAPE(P2)=2][TX(P1)=0]
RULE 55 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SIZE(P2)=2][TX(P2)=0]
RULE 54 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [SIZE(P1)=2][TX(P1)=0][LST-ONTOP(P1)]
RULE 53 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -3 3
  [SIZE(P1)=1][TX(P1)=0][MST-ONTOP(P1)]

Since each selected c-formula is consistent, all c-formulas are placed in the list MQ.
The program now looks for more descriptive generalizations to supply a larger set of variables to the AQ7 procedure.

THE FOLLOWING FORMULAS ARE IN THE UNTRIMMED STAR
RULE 69 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -1 4
  [SIZE(P1)=1][SHAPE(P1)=0][TX(P1)=0][MST-ONTOP(P1)]
RULE 68 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -3 4
  [ONTOP(P1,P2)][SIZE(P1)=1][TX(P1)=0][MST-ONTOP(P1)]
RULE 67 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -1 4
  [SIZE(P1)=2][SHAPE(P1)=5][TX(P1)=0][LST-ONTOP(P1)]
RULE 66 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -2 4
  [ONTOP(P1,P2)][SIZE(P2)=2][TX(P2)=0][LST-ONTOP(P2)]
RULE 65 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -2 4
  [ONTOP(P1,P2)][SIZE(P2)=2][TX(P1)=1][TX(P2)=0]
RULE 64 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -1 4
  [ONTOP(P1,P2)][SIZE(P2)=2][SHAPE(P2)=5][TX(P2)=0]
RULE 63 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -1 4
  [ONTOP(P1,P2)][SIZE(P2)=2][SHAPE(P1)=2][TX(P2)=0]
RULE 62 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -2 4
  [ONTOP(P1,P2)][SIZE(P1)=1][SIZE(P2)=2][TX(P2)=0]
RULE 61 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -1 4
  [ONTOP(P1,P2)][SHAPE(P2)=2][TX(P1)=0][TX(P2)=1]
RULE 60 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -1 4
  [ONTOP(P1,P2)][SHAPE(P1)=0][SHAPE(P2)=2][TX(P1)=0]
RULE 59 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -2 4
  [ONTOP(P1,P2)][SIZE(P2)=1][SHAPE(P2)=2][TX(P1)=0]
RULE 58 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -2 4
  [ONTOP(P1,P2)][SIZE(P1)=1][SHAPE(P2)=2][TX(P1)=0]

At least 10 consistent formulas have been generated and placed in MQ (NCONSIST=10). The AQ procedure is then applied to each one of the consistent formulas. Only the trace of one such application is given below.
THE CONSISTENT FORMULAS:
BEFORE AQ:
RULE 58 EVENT SETS: 1 COSTS ( 3 -1 2)     -2 4
  [ONTOP(P1,P2)][SIZE(P1)=1][SHAPE(P2)=2][TX(P1)=0]
THE C-FORMULA STRUCTURE IS:
RULE 58 EVENT SETS: 1 COSTS ( 3 -1 2)     -2 4
  [ONTOP(P1,P2)][SIZE(P1)=*][SHAPE(P2)=*][TX(P1)=*]
THERE ARE 6 VL1 TYPE VARIABLES X1,X2,...,X6
VARIABLES ARE ASSOCIATED WITH NODES IN THE C-FORMULA AS FOLLOWS:
  NODE    VARIABLE
  ONTOP   X1
  P1      X2
  P2      X3
  SIZE    X4
  SHAPE   X5
  TX      X6
AQ IS APPLIED TO THE FOLLOWING INPUT CPXS/EVENTS
SET 1
  [X1=1][X4=1][X5=2][X6=0]
  [X1=1][X4=1][X5=4][X6=1]
  [X1=1][X4=1][X5=1][X6=0]
  [X1=1][X4=1][X5=2][X6=0]
SET 2
  [X1=1][X4=2][X5=1][X6=0]
  [X1=1][X4=1][X5=4][X6=0]
  [X1=1][X4=2][X5=0][X6=1]
  [X1=1][X4=1][X5=1][X6=0]
  [X1=1][X4=1][X5=2][X6=1]
  [X1=1][X4=1][X5=5][X6=1]
  [X1=1][X4=1][X5=1][X6=1]
  [X1=1][X4=2][X5=6][X6=1]
  [X1=1][X4=0][X5=5][X6=1]
THE RESULTING COMPLEX FROM THIS PASS IS:
  [X4=1][X5=2 3 7 9][X6=0]
AFTER AQ:
RULE 58 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SIZE(P1)=1][SHAPE(P2)=9][TX(P1)=0]

The results from the application of the AQ7 procedure to each consistent formula are given below.
These are the consistent alternative solutions (note that only those with a cost of -3 for cost function -1 are complete).

THE FOLLOWING ARE ALTERNATIVE CONSISTENT GENERALIZATIONS
RULE 58 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SIZE(P1)=1][SHAPE(P2)=9][TX(P1)=0]
RULE 59 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SIZE(P2)=*][SHAPE(P2)=2][TX(P1)=0]
RULE 60 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SHAPE(P1)=9][SHAPE(P2)=9][TX(P1)=*]
RULE 61 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SHAPE(P2)=2][TX(P1)=0][TX(P2)=*]
RULE 62 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SIZE(P1)=1][SIZE(P2)=2][TX(P2)=0]
RULE 63 EVENT SETS: 1 COSTS ( 3 -1 2)     -1
  [ONTOP(P1,P2)][SIZE(P2)=2][SHAPE(P1)=2][TX(P2)=*]
RULE 64 EVENT SETS: 1 COSTS ( 3 -1 2)     -1
  [ONTOP(P1,P2)][SIZE(P2)=2][SHAPE(P2)=8][TX(P2)=0]
RULE 65 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SIZE(P2)=2][TX(P1)=1][TX(P2)=0]
RULE 66 EVENT SETS: 1 COSTS ( 3 -1 2)     -2
  [ONTOP(P1,P2)][SIZE(P2)=2][TX(P2)=0][LST-ONTOP(P2)]
RULE 67 EVENT SETS: 1 COSTS ( 3 -1 2)     -1
  [SIZE(P1)=2][SHAPE(P1)=8][TX(P1)=0][LST-ONTOP(P1)]
RULE 68 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -3 3
  [ONTOP(P1,P2)][SIZE(P1)=1][TX(P1)=0][MST-ONTOP(P1)]
RULE 69 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -3 3
  [SIZE(P1)=
1][SHAPE(P1)=9][TX(P1)=0][MST-ONTOP(P1)]
RULE 44 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -2 3
  [ONTOP(P1,P2)][SHAPE(P1)=9][SHAPE(P2)=9]
RULE 45 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -2 3
  [ONTOP(P1,P2)][SHAPE(P2)=2][TX(P1)=0]
RULE 46 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -2 3
  [ONTOP(P1,P2)][SIZE(P2)=2][TX(P2)=0]
RULE 48 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -2 2
  [SIZE(P1)=2][TX(P1)=0][LST-ONTOP(P1)]
RULE 50 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -3 2
  [SIZE(P1)=1][TX(P1)=0][MST-ONTOP(P1)]
RULE 51 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -3 2
  [SHAPE(P1)=9][TX(P1)=0][MST-ONTOP(P1)]
THE SELECTED MQ FORMULA IS:
RULE 50 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -3 2
  [SIZE(P1)=1][TX(P1)=0][MST-ONTOP(P1)]

The solution to the problem is the following rule. Note, however, that there were four distinct solutions generated (rules 68, 69, 50, and 51).

THE FOLLOWING FORMULAS COVER SET 1
THIS RULE COVERS 3 NEW RULES
RULE 50 EVENT SETS: 1 COSTS ( 3 -1 2)   0 -3 2
  [SIZE(P1)=1][TX(P1)=0][MST-ONTOP(P1)]

APPENDIX B

This appendix contains a very brief description of the AQ7 procedure implemented in the program INDUCE_1. The reader is referred to [Larson et al. 75] for further details.

A set of variables (x_1, x_2, ..., x_n) and a domain definition including size and structure (nominal, interval or tree-structured) is given. The space (event space) defined by

E = D(x_1) x D(x_2) x ... x D(x_n)

contains a set of points (events), where each point may be specified by supplying a list of values for the variables x_1, x_2, ..., x_n. A complex is the set of points in the event space which satisfy a particular VL expression of the form

[x_1 = R_1][x_2 = R_2] ... [x_n = R_n]

where R_i is a representation for a subset of the domain D(x_i) or the symbol (*) (if R_i = *, then the selector may be removed from the expression).
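The definitions above can be made concrete with a small sketch. It assumes, for illustration only, that an event is a tuple of values and that a complex is a mapping from a variable index to its reference set, with a missing index standing for R_i = *. The names `covers`, `simplify` and `size` are hypothetical and do not come from the AQ7 program.

```python
def covers(cpx, event):
    """True if the event satisfies every selector [x_i = R_i] of the complex."""
    return all(event[i] in ref for i, ref in cpx.items())


def simplify(cpx, domains):
    """Drop selectors whose reference is the whole domain (R_i = *)."""
    return {i: ref for i, ref in cpx.items() if ref != frozenset(domains[i])}


def size(cpx, domains):
    """Number of points of the event space E that lie in the complex."""
    n = 1
    for i, dom in enumerate(domains):
        n *= len(cpx.get(i, frozenset(dom)))
    return n


if __name__ == "__main__":
    domains = [range(3), range(4), range(5)]          # D(x_1), D(x_2), D(x_3)
    cpx = {0: frozenset({0, 2}), 1: frozenset(range(4))}
    print(covers(cpx, (0, 3, 4)), covers(cpx, (1, 0, 0)))
    print(simplify(cpx, domains), size(cpx, domains))
```

Keeping only the non-trivial selectors mirrors the remark that a selector with R_i = * may be removed from the expression.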
The procedure accepts two sets of VL expressions (L and L') representing disjoint complexes and produces a near optimal generalization of the set of expressions in L with regard to a set of optimality criteria. Certain other parameters control the generalization process.

Let L = [l_1, l_2, ..., l_m] and L' = [l'_1, l'_2, ..., l'_m'] (these two sets are denoted P1 and P0 respectively in other references; the above notation was selected to avoid confusion with sets of c-formulas). The program selects one element (e) from L and forms a star about this element (a star about e is a set of complexes each of which contains e but does not intersect with any element of L' and is nearly maximal under inclusion). One element (l_q) is selected from the star using the optimality criterion and placed in a set of output complexes; all other elements of L which are covered by this l_q are removed from L, and the process is repeated with a new e until the set L is exhausted.

A star is generated by forming a sequence of elementary stars and partial stars, one for each element of L'. An elementary star (ES(e, l'_i) or ES_i) is a set of complexes which covers e, does not intersect with l'_i, and is maximal under inclusion and under domain structure constraints.
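The outer covering loop just described can be sketched as below. The sketch is not the AQ7 implementation: elements of L and L' are taken to be single events (tuples), the star generator is passed in as a function, and the demonstration star `single_selector_star` is a deliberately crude stand-in that only proposes one-selector complexes over nominal variables. All names are hypothetical.

```python
def cover(L, L_prime, star, covers, cost):
    """Pick a seed e, build a star about it, keep the best complex, drop what it covers."""
    remaining = list(L)
    output = []
    while remaining:
        e = remaining[0]                      # seed element
        candidates = star(e, L_prime)         # complexes containing e, disjoint from L'
        if not candidates:
            raise ValueError("empty star: e cannot be separated from L'")
        best = min(candidates, key=lambda c: cost(c, remaining))
        if not covers(best, e):
            raise ValueError("star returned a complex that misses its seed")
        output.append(best)
        remaining = [l for l in remaining if not covers(best, l)]
    return output


# Demonstration over two nominal variables with values 0..2 (assumed domains).
DOMAINS = [frozenset(range(3)), frozenset(range(3))]


def covers(cpx, event):
    return all(event[i] in ref for i, ref in cpx.items())


def single_selector_star(e, L_prime):
    """One-selector complexes that contain e and exclude every element of L'."""
    star = []
    for i, dom in enumerate(DOMAINS):
        ref = dom - {neg[i] for neg in L_prime}
        if e[i] in ref:
            star.append({i: ref})
    return star


def coverage_cost(cpx, remaining):
    """Lower is better: prefer complexes covering more of the remaining events."""
    return -sum(covers(cpx, l) for l in remaining)


if __name__ == "__main__":
    print(cover([(0, 2), (2, 0)], [(2, 2)], single_selector_star, covers, coverage_cost))
```

Passing the star generator as an argument keeps the loop independent of how stars are built, so a fuller generator based on elementary and partial stars could be substituted without changing `cover`.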
A partial star (P_i(e) or P_i) is a set of complexes which contain e but do not intersect with any l'_j, j<=i (note that P_m' is in fact a star). To generate an elementary star ES_i, the extension against rule (see chapter 3) is applied to each selector of e in the context of l'_i. The result is a set of selectors, each one corresponding to one selector of e. To form a partial star P_{i+1} from a partial star P_i, each element of P_i is multiplied by each element of ES_{i+1} (i.e., the set of complexes in P_i and ES_{i+1} may be viewed as a logical sum of products of selectors; the multiplication is then the normal result of expanding a product of sums in the VL system).

The partial star P_0 is initialized to be the entire event space E. As each partial star P_i is generated, absorption laws are applied to discard any complex of P_i which is contained in another complex of P_i. The partial star is then trimmed to AQMAXSTAR elements using a procedure identical to that in section 5.4. The final partial star (P_m') is trimmed with an AQMAXSTAR value of 1 to produce l_q.

If the parameter LQST is set (has the value TRUE), then l_q is stripped down to an expression with the following properties:

1) the stripped l_q contains the same variables as the original expression,

2) the stripped l_q covers the same elements of L as the original,

3) the reference of each selector of l_q contains the fewest elements of all complexes satisfying 1 and 2 above under domain structure constraints (i.e., interval variables must have a range of values in the reference).

If the set L' is null, then (1) above is replaced with:

1) the stripped l_q contains all variables x_1, x_2, ..., x_n.

The latter condition occurs when generating descriptive descriptions covering only one set of complexes (see sections 6.1, 6.2).
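The generation of elementary and partial stars described above can be sketched as follows. Several simplifications are assumed: e and each l' are single events rather than general complexes, interval domains are contiguous integer ranges, a tree-structured domain is given by a child-to-parent map whose leaves are the event values, references are stored as sets of leaf values, and trimming to AQMAXSTAR is omitted. The function names and the dictionary layout of a domain are hypothetical.

```python
from itertools import product


def leaves_under(node, parent):
    """Leaf values below a tree node (the node itself if it is a leaf)."""
    kids = [c for c, p in parent.items() if p == node]
    if not kids:
        return {node}
    return set().union(*(leaves_under(k, parent) for k in kids))


def extend_against(dom, e_val, neg_val):
    """Largest reference allowed by the domain structure holding e_val but not neg_val."""
    if e_val == neg_val:
        return None                       # this variable cannot separate e from l'
    if dom["kind"] == "nominal":
        return frozenset(dom["values"]) - {neg_val}
    if dom["kind"] == "interval":
        lo, hi = min(dom["values"]), max(dom["values"])
        if neg_val < e_val:
            lo = neg_val + 1
        else:
            hi = neg_val - 1
        return frozenset(range(lo, hi + 1))
    # tree: climb from e_val while the subtree still excludes neg_val
    best, node = e_val, e_val
    while node in dom["parent"]:
        node = dom["parent"][node]
        if neg_val in leaves_under(node, dom["parent"]):
            break
        best = node
    return frozenset(leaves_under(best, dom["parent"]))


def elementary_star(e, neg, domains):
    """One single-selector complex per variable that can separate e from neg."""
    star = []
    for i, dom in enumerate(domains):
        ref = extend_against(dom, e[i], neg[i])
        if ref is not None:
            star.append({i: ref})
    return star


def multiply(P, ES):
    """Expand the product of two sums of complexes, intersecting shared references."""
    out = []
    for p, s in product(P, ES):
        c = dict(p)
        for i, ref in s.items():
            c[i] = c[i] & ref if i in c else ref
        if c not in out:
            out.append(c)
    return out


def contained(c1, c2):
    """True if complex c1 is a subset of complex c2 (missing selector means *)."""
    return all(i in c1 and c1[i] <= c2[i] for i in c2)


def absorb(P):
    """Discard every complex contained in another complex of P."""
    return [c for c in P if not any(d != c and contained(c, d) for d in P)]


def partial_star(e, negs, domains):
    P = [{}]                              # P_0: the entire event space
    for neg in negs:
        P = absorb(multiply(P, elementary_star(e, neg, domains)))
    return P


if __name__ == "__main__":
    domains = [
        {"kind": "nominal", "values": range(3)},
        {"kind": "interval", "values": range(4)},
        {"kind": "tree", "parent": {0: 5, 1: 5, 2: 5, 3: 6, 4: 6}},
    ]
    print(elementary_star((0, 1, 2), (1, 2, 3), domains))
    print(partial_star((0, 1, 2), [(1, 2, 3), (0, 3, 1)], domains))
```

In this representation a tree reference such as the leaf set {0, 1, 2} corresponds to the single higher-level value that generalizes those leaves, which keeps the intersection in `multiply` a plain set operation.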
Stripping is done by finding the disjunction of all complexes in L covered by l_q and then adjusting the reference of each selector in the sum to conform to the domain structure (i.e., for interval domains, this involves filling the gaps to form an interval, while for tree structured domains, this involves finding the lowest level generalization of all elements in the reference).

A simple example may clarify the procedure. Given 3 variables with domains:

Variable   Structure     Values
x_1        nominal       [0:2]
x_2        interval      [0:3]
x_3        tree-struc    [0:6]   0,1,2=>5   3,4=>6

and input complexes and parameters:

L:   l_1    [x_1=0][x_2=1][x_3=2]
     l_2    [x_1=1][x_2=1][x_3=0]
     l_3    [x_1=1][x_2=2][x_3=0]

L':  l'_1   [x_1=1][x_2=2][x_3=3]
     l'_2   [x_1=0][x_2=3][x_3=1]
     l'_3   [x_1=3][x_2=2][x_3=4]

AQMAXSTAR = 2
LQST = TRUE
cost functions: -1 (maximize number of complexes covered)
                 2 (minimize number of selectors)
tolerance = 0 for both functions.

Let e = l_1. The elementary star ES_1 and the partial star P_1 contain:

ES_1 = P_1:  [x_1=0,2], [x_2=0..1], [x_3=5]

since multiplication by the entire space E (P_0) leaves each element unchanged. The trimmed P_1 is

BIBLIOGRAPHIC DATA SHEET
1. Report No.: UIUCDCS-R-77-869
4. Title and Subtitle: Inductive Inference in the Variable Valued Predicate Logic System VL21: Methodology and Computer Implementation
5. Report Date: May 1977
7. Author(s): James B. Larson
9. Performing Organization Name and Address: Department of Computer Science, University of Illinois, Urbana, Illinois 61801
11. Contract/Grant No.: NSF MCS 74-03514
12. Sponsoring Organization Name and Address: National Science Foundation, Washington, DC
17. Key Words and Document Analysis. 17a. Descriptors: inductive learning, variable valued logic, inductive inference, decision rules, production systems
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED