Report No. UIUCDCS-R-79-982
UILU-ENG 79 1730

Learning Without Negative Examples via Variable-Valued Logic Characterizations: The Uniclass Inductive Program AQ7UNI

by Robert Stepp

July 1979

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation under Grant NSF MCS 79-06614.

ABSTRACT

This paper describes the underlying theory, internal logic, and evaluation of an inductive program, AQ7UNI, which accepts a set of symbolic descriptions (events) of arbitrary objects and produces a general description (characterization) of the set. The events are attribute-value lists and the resulting characterizations are expressed in a simple yet powerful formal language, VL1 (Variable-valued Logic system 1 [Michalski 74, 75]), which is a form of monadic predicate calculus. AQ7UNI belongs to a family of inductive programs developed at the University of Illinois, Department of Computer Science (see [Michalski 77a, 77b] for a summary), which employ quasi-extremal optimality techniques. The degree of generalization (as defined in [Michalski 79]) and the search space used in this method can be controlled by the user through a variety of operational parameters entered with the problem data. Several artificially constructed problems from recent papers on variable-valued logic ([Michalski 75, 78] and [Larson 77]) are used to illustrate the program's capabilities.

The author acknowledges the support and encouragement provided by R. S.
Michalski, Associate Professor of Computer Science, University of Illinois, who was the source of the fundamental algorithms on which this work is based.

CONTENTS

Introduction
The Uniclass Algorithm
Sample Characterizations
    "TRAINS"
    "BOTTLES"
    "FACES"
    "ANIMALS"
Comparison with Other Methods
Summary

Introduction

This report describes a technique for constructing characteristic descriptions of a set of instances of a class of objects (events). It is assumed that negative examples, i.e. examples of objects not belonging to the class, are not available. Thus, the concept formation problems being considered are somewhat more difficult than those in which negative examples are present, particularly if they are carefully selected "near misses" (e.g. [Winston 70]). The problem of learning from only positive examples is called a uniclass generalization problem.

The characteristic descriptions, or characterizations, will be in the form of logical rules defined in the variable-valued logic system VL1 [Michalski 74]. The characterizations can be very explicit, retaining all features of all events, or they can be very general, referring to just a few key features. This difference is captured mathematically by an attribute of a characterization called the degree of generalization, which will be precisely defined shortly. At the lowest degree of generalization, a characterization retains and separates the features of every event in the group, and is just a list of the attributes of every event. At a medium degree of generalization, the characterization makes a statement of general attributes about several subgroups of similar events. Finally, at the highest degree of generalization, the characterization gives just one statement of general attributes for all events lumped together.
There are many inductive problems in which specific instances of a class of objects are given and the task is to determine the common properties of the objects in the class. Sometimes a taxonomy of the feature space is desired, based on key features, such as those features present in characterizations of medium degrees of generalization. The subgroupings of the events used in the characterization form clusters of similar events, each cluster differing from the others. Such ad hoc clustering may reveal a fundamental property of the system supplying the events, or at least aid feature selection by pointing out variables of special interest.

The work described in this paper utilizes an event description language called VL1, which was developed by R. S. Michalski. Actually only a subset of VL1 is necessary for this work. Using this subset, we assume that all features are representable by non-negative integers and that for each feature there is a positive integer d which gives the number of levels in its domain, i.e. the feature value will be in the closed interval [0, d-1]. Unless specially named, features are referenced by the variables X1, X2, X3, and so on.

In system VL1, events are described by a logical expression called a term, which is the product (logical conjunction) of selectors. Each selector is a logical unit of the form [Xi = set of values] and is true when the value of the variable Xi is an element of the set of values. The set of values is the reference set of variable Xi. In VL1, a selector may utilize other relational operators (≠, <, >) as well as =; however, the equality relationship is the only one included in the subset of VL1 which is necessary for the task at hand.

An event with n features represented by the variables X1, X2, ..., Xn is represented in system VL1 as the product of n selectors, one selector for each variable.
Each reference set contains a single value, so that the product of selectors is true only for the feature values of the event it represents. To illustrate this, suppose there are three features and a specific event is given by the triple (3,4,2). The VL1 expression [X1=3][X2=4][X3=2] is satisfied only by the event (3,4,2) and uniquely represents this event in the VL1 language. The VL1 expression [X1=3][X3=2] is also satisfied by event (3,4,2), and perhaps by many other events as well, because no selector for X2 occurs. When a selector is omitted, the missing variable is free to assume any value in its domain. If the number of levels in the domain of X2 is 6, then the preceding VL1 expression is equivalent to [X1=3][X2=0,1,2,3,4,5][X3=2].

Algorithms and their computer implementations exist for synthesizing variable-valued logic (VL1) expressions to discriminate between two or more classes of events [Michalski 73, 74] [Larson & Michalski 75]. A different problem exists when a VL1 characterization of a single class of events is sought, since no negative examples are present. Such a characterization may also be called the uniclass cover of the class of events.

When producing VL1 expressions which will discriminate between two or more classes, the expression for any given class must describe a subset of the event space which includes, or covers, the events in the learning sample for that class while excluding alien events belonging to any other class. Better covers are those which are simpler or more meaningful, as reflected in the number of terms in the VL1 expression, the number of selectors they contain, and other measures of desirability which the programmer may select. An important point is that even the worst cover must still include all events in the learning sample while excluding all events in any other class.
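The selector and term notation used in the (3,4,2) example can be made concrete with a small sketch. It is illustrative only and is not taken from AQ7UNI: the dictionary representation, the function names `satisfies` and `expand`, and the domain sizes for X1 and X3 are assumptions (the text fixes only the 6 levels of X2). Variables are indexed from 0, so X1 is index 0.

```python
def satisfies(event, term):
    """Return True if the event satisfies every selector in the term.

    A term is modeled (assumption) as a dict mapping a variable index to
    its reference set. A variable with no selector is unconstrained.
    """
    return all(event[i] in reference for i, reference in term.items())


def expand(term, domain_sizes):
    """Make omitted selectors explicit by giving them the full domain."""
    return {i: set(term.get(i, range(d))) for i, d in enumerate(domain_sizes)}


# Domain sizes are hypothetical except for X2, which has 6 levels in the text.
DOMAIN_SIZES = (4, 6, 3)

event = (3, 4, 2)
exact_term = {0: {3}, 1: {4}, 2: {2}}   # [X1=3][X2=4][X3=2]
general_term = {0: {3}, 2: {2}}         # [X1=3][X3=2], no selector for X2

print(satisfies(event, exact_term))        # the event satisfies its own term
print(satisfies((3, 0, 2), general_term))  # another event covered by the general term
print(expand(general_term, DOMAIN_SIZES))  # X2 becomes {0, 1, 2, 3, 4, 5}
```

Representing an omitted selector by a missing key keeps the general term short, which mirrors the way the VL1 expression itself simply leaves the selector out.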
If only a single class of events is present, the test for the inclusion of all learning events with simultaneous exclusion of alien events cannot be applied. A uniclass problem may be converted into a two-class problem by inventing a second class of events, defined to be all those events in the event space not belonging to the first class. Such an approach, which produces an exact uniclass cover, is of the lowest degree of generalization and does not generally yield a description of the class of events any simpler or more meaningful than was given in the original event data. "Simpler" and "more meaningful" are terms which do not have precise meanings at present, but one can develop precise criteria of optimality of solutions which approximate simplicity or meaningfulness.

In the multi-class approach, one wants to generate the simplest discrimination rules possible, which implies that VL1 rules should cover as large a subspace as possible while still not covering any alien events. That approach to the uniclass problem yields two dilemmas. First, in the "exact cover" case, all points in the event space which are not sample event points are alien events, and the initial covers (which form the characterization of the lowest degree of generalization) cannot be expanded and generalized, since doing so would include an alien point. As a result, no generalization can be made and only a simplification of the original data is possible, i.e. the number of terms in the characterization may not be as great as the number of events, but the complexity of the characterization expression remains maximal. Second, in the "approximate cover" case, no alien points in the event space are created and the expansion of the VL1 rules to include adjacent points in the event space is allowed. The attempt to make the terms as general as possible in order to simplify them leads to the opposite extreme, in which the entire event space is covered.
The result is too much generalization, and the VL1 characterization no longer shows any individual features of the learning events. If an advantageous characterization of a class of events exists, its degree of generalization must lie somewhere between the two extremes mentioned above. To resolve this dilemma, the degree of generalization is controlled by introducing the concepts of density threshold and selector threshold, which determine where the middle ground on term generalization is to be taken.

The Uniclass Algorithm

The basic uniclass algorithm was developed by R. S. Michalski and was partially implemented by a student at the University of Illinois, H. Yuen. The following discussion explains the algorithm as it exists in a modified and extended form, which is the basis for the implementation of the inductive program AQ7UNI, version 2.

In the VL1 system [Michalski 74], a cover for a class of events is a logical formula which is the disjunction of logical expressions called terms. It has already been shown, by examples, how a term is the product of selectors and how a term may be satisfied by one or more events. A complex is a subset of the set of all points in the event space. For every term there is an associated complex composed of all those points in the event space at which the term is satisfied. Some complexes cannot be represented by a single term. Such complexes will be purposely avoided by requiring that any complex be exactly described by some term. With this constraint terms and complexes become equivalent, one making a logical statement and the other a set-theoretical statement about the same situation. Throughout the remainder of this paper the words "term" and "complex" will be used interchangeably, each connoting the hidden properties of the other. A cover is a set of complexes (a list of terms) such that every event is in the union of the complexes.
If the intersection of any two distinct complexes is non-empty, the complexes are intersecting; otherwise they are disjoint.

The variables in system VL1 may be of nominal or interval scale. Nominal scale variables may be simple, called FACTOR type variables, or generalization-tree structured, called STRUCTURED type variables. Interval scale variables are called INTERVAL type variables. A syntactic limitation is built into the VL1 selector reference set notation. FACTOR variable reference sets may be any subset of the domain of the variable. STRUCTURED variable reference sets must be a single leaf or node in the generalization tree. INTERVAL variable reference sets must be a single interval subset of the domain of the variable. Because of term/complex equivalency, these syntactic restrictions also further restrict the subsets of events which are legal complexes.

Some definitions are needed to assist with a formal presentation of the uniclass algorithm.

variable Xi
    Xi, 1 ≤ i ≤ n, is a set of si values in the domain of variable Xi.

selector [Si]
    [Si] = [Xi = ri] is a logical expression in the VL1 system which is true only when the value of variable Xi is an element of the set ri.

complex C(E)
    C(E) = [Sk1][Sk2]...[Skj], 1 ≤ j ≤ n, is a conjunction of selectors which is true for all events in set E and false for the maximum number of events not in set E. C(E) is both a term in a VL1 logical expression and the corresponding subspace of the event space.

density of complex D(C(E))
    D(C(E)) = #E/#C is the ratio of the number of events in set E (#E) to the number of points in the event subspace C(E) (#C).

degree of generalization AG(E)
    AG(E) = -log2 D(C(E)) (introduced in [Michalski 79]) is the average number of bits of information needed to locate a particular event from E within a unit event subspace of size #C/#E, with #E and #C as defined in the previous definition of density.
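The definitions of complex, density, and degree of generalization, together with the disjoint/intersecting distinction, can be illustrated with a short sketch. It is not the AQ7UNI implementation. It assumes FACTOR type variables only, for which the smallest single complex covering a set of events takes, for each variable, exactly the set of values observed among those events. The function names and the sample events are hypothetical.

```python
import math


def covering_complex(events):
    """Smallest single complex covering the events (FACTOR variables assumed).

    The complex is returned as a list of reference sets, one per variable.
    """
    n = len(events[0])
    return [{e[i] for e in events} for i in range(n)]


def complex_size(reference_sets):
    """#C: number of event-space points in the complex."""
    return math.prod(len(r) for r in reference_sets)


def density(events):
    """D(C(E)) = #E / #C for the smallest complex covering E."""
    distinct = list(set(events))
    return len(distinct) / complex_size(covering_complex(distinct))


def degree_of_generalization(events):
    """AG(E) = -log2 D(C(E)), in bits."""
    return -math.log2(density(events))


def are_disjoint(c1, c2):
    """Two complexes are disjoint iff some variable's reference sets do not meet."""
    return any(not (a & b) for a, b in zip(c1, c2))


# Hypothetical events over two variables.
events = [(0, 0), (0, 1), (1, 0)]
print(covering_complex(events))          # both variables take values {0, 1}
print(density(events))                   # 3 events in a 4-point complex
print(degree_of_generalization(events))  # -log2(3/4) bits
```

A single event is covered by a complex of one point, so its density is 1 and its degree of generalization is 0 bits; each doubling of the number of points per event adds one bit.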
When event set E is described by complex C, each event in E is being described by an enclosing subspace containing an average of #C/#E points. This gives "generality" or "uncertainty" to the location of the event (it is one of the #C/#E points, but which one is it?). The degree of generalization is the average amount of information disregarded in determining the location of the event when it is described by C(E) rather than E.

rank R(e,e')
    R is a measure of the dissimilarity of the events e and e'. Recall that both e and e' are sequences of n values for the n variables. Let d(x,x') be 0 if x = x' and 1 otherwise. Then R(e,e') = Σ(i=1..n) d(xi, xi'), where xi, 1 ≤ i ≤ n, is the sequence of values for e and xi', 1 ≤ i ≤ n, is the sequence of values for e'.

[...] ("rank" denotes the rank limit given by the user).

The neighborhood construction algorithm attempts to form E_N as the union of the E_Ri sets for i from 1 to rank. E_N begins as the set containing only the seed event and has one subset E_Ri added to it at a time, starting with E_R1. If the selector threshold is satisfied (i.e. the number of selectors in C(E_N) at this point is not greater than the limit specified), then the density of C(E_N) is evaluated. If D(C(E_N)) is not less than the density threshold, the process continues by going on to add events from the subset E_Ri of next higher rank. If the density is too small, then the subset-by-subset construction process stops and an optional event-by-event construction process begins.

When the event-by-event construction process begins, E_N already contains the union of some E_Ri subsets such that E_N satisfies the neighborhood constraints but the union of E_N with the subset of next higher rank does not. Let that subset of next higher rank be called E_Rj. During event-by-event construction, each event in E_Rj is individually added to the set E_N temporarily and the desirability of E_N is evaluated.
If E_N satisfies the neighborhood constraints, then the individual event becomes a permanent part of E_N. Otherwise it is removed from E_N and placed instead into E_R(j+1), where it is in a position to be considered again later. After all events in E_Rj have been considered, neighborhood construction halts if none of the events in E_Rj were retained in E_N. If some events were retained, then the neighborhood construction continues to the next higher rank in the usual way.

[Figure 2. Neighborhood Construction (flowchart)]

Legend for Figure 2:
    SEED     an event which was selected from E_R
    E_R      the set of events remaining to be covered
    L_i      a list of events whose distance to SEED (measured by R) is i
    rank     a rank limit imposed by the user
    E_N      the set of events belonging to the neighborhood
    C(E_N)   the smallest single complex covering the events E_N
    ST       the maximum-number-of-selectors threshold
    DT       the minimum-density threshold
    D(C)     the density of complex C

Examples with Sample Characterizations

AQ7UNI will be illustrated using four examples from various articles on the VL1 and VL2 systems. The raw input specifications for all four examples are given in [Stepp 79] along with the actual output of the program.

TRAINS

The first example is called TRAINS and it has been the subject of previous work using the VL2 system [Larson 77]. The trains are presented pictorially in figure 3. In figure 3, two classes of trains are shown; however, in the following characterization the trains are treated as one single class.
There are six domains in the trains example: the number of cars in the train, the number of wheels on car i (1 ≤ i ≤ 5), the length of car i, the shape of car i, the shape of the cargo in car i, and the number of items carried in car i. When a train has fewer cars than the maximum number of 5, the parameters wheels, length, shape, cargo shape, and number of items for nonexistent cars are given the value "not applicable."

For this example it was decided that a characterization consisting of 2 complexes with approximately the same number of events in each was desirable. After several experimental choices of uniclass parameters, pleasing results were obtained using the degree of generalization given by a density threshold of 10^[exponent illegible] with eight neighborhoods.

[Figure 3. TRAINS, from [Larson 77]]

An English translation of the complexes produced (see figure 4) is:

"A train is a sequence of five or fewer cars (this was presumed by the way in which the input data was set up) in which the first car is a locomotive (which is a long car having 2 wheels, with no cargo). The second car (the one attached to the locomotive) carries circles, triangles, or rectangles, and the fifth car (if it exists) is short, has 2 wheels, and carries 1 item. Additionally, there are two distinct types of trains. One type has circles or rectangles as cargo in car 3 while the other type has one triangle carried in car 3."

The complete output of the AQ7UNI program for the TRAINS problem can be found in [Stepp 79]. The program generates two complexes covering five trains each.
The first part of the English translation above comes from a report in the output listing which gives the selectors which have identical values in both complexes, the "common characteristics." The statement that two types of trains exist reflects the two complexes, each describing five trains. The VL1 statement of these complexes (with common characteristics removed) is given in figure 4.

[Figure 4. The two TRAINS complexes: "car 3 does not contain triangles" and "car 3 contains triangles"]

[...]

Figure 6b (program output; rank values are illegible in the source):
    COMPLEX 1 COVERS 4 EVENTS (4 NEW) WITH DENSITY OF 3.333E-01
        (#SQUARES=1) (#ASTERISKS=1)
    COMPLEX 2 COVERS 3 EVENTS (3 NEW) WITH DENSITY OF 3.750E-01
        (#TRIANGLES=1..2) (#CIRCLES=0..1) (#ASTERISKS=2)
    COMPLEX 3 COVERS 1 EVENT (1 NEW) WITH DENSITY OF 1.000E+00
        (#SQUARES=0) (#TRIANGLES=1) (#CIRCLES=2) (#ASTERISKS=1)

[...] describes precisely those bottles produced by company B.

Another characterization was made in which the degree of generalization was controlled by the density threshold (set to twice the overall event density) rather than the selector threshold. Figure 6b illustrates the three complexes in this second characterization. Surprisingly, the bottles produced by company B again form one of the complexes.

Figures 7a, 7b, and 7c show generalized logical diagrams (GLDs) for the three characterizations made. A generalized logical diagram contains a cell for each point in the event space. A complex is represented by the set of cells which represent the events the complex covers. Thus complexes are areas on the GLD. The characterizations of figures 6a and 6b correspond to the GLDs 7a and 7c respectively.
These characterizations utilize disjoint complexes and this is clearly displayed by the GLDs, since no areas overlap. The program AQ7UNI can generate characterizations utilizing either disjoint or intersecting complexes at the user's option, and the characterization illustrated by GLD 7b was made with the same control parameters as in GLD 7a, except that intersecting complexes were used. GLDs are fully described in [Michalski 78].

Generalized Logical Diagrams for BOTTLES

Figure 7a (parameters used: DT=1 ST=2 TYPE=DC):
    complex 1: [#squ=1][#ast=1]
    complex 2: [#squ=1][#tri=1..2]
    complex 3: [#squ=1][#tri=1][#cir=0][#ast=2]

Figure 7b (parameters used: DT=1 ST=2 TYPE=IC):
    complex 1: [#squ=1][#ast=1]
    complex 2: [#tri=1..2][#ast=1..2]

Figure 7c (parameters used: DT=.22 ST=4 TYPE=DC):
    complex 1: [#squ=1][#ast=1]
    complex 2: [#tri=1..2][#cir=0..1][#ast=2]
    complex 3: [#squ=0][#tri=1][#cir=2][#ast=1]

FACES

The next problem characterizes the "faces" presented in figure 8. Each face is described by four features: number of circles, number of ovals, number of triangles, and number of squares. This example, like the others, started out as a discrimination problem and is treated as such in [Michalski 75]. For our use here all class boundaries are removed and all eight events are considered as a single class.

Two characterizations were produced. The first one is constrained via the selector threshold to use a maximum of three selectors in any complex, which tends to force some generalization, since four selectors are required to specify any single event. Additionally, the mode is specified as EXACT, which limits the generalization to just that produced by the elimination of at least one selector. The first characterization is illustrated in figure 9a, which shows the four complexes, which cover 3, 2, 2, and 1 events respectively. Because these complexes are disjoint, the events they cover form clusters, and in this case the clusters are hierarchical.
In figure 9a the horizontal line divides the eight faces according to the number of squares they have. The four faces above the line have 1 square while those below it have 0 or 2 squares. The top group is further divided according to the number of circles.

[Figure 8. FACES, classes F(0) and F(1), from [Michalski 75]]

Characterizations of FACES

Figure 9a (group labels: #circles=1, #circles=2, #squares=1, #squares=0, #squares=2; program output):
    COMPLEX 1 OF RANK 1 COVERS 3 EVENTS (3 NEW) WITH DENSITY OF 7.500E-01
        (#CIRCLES=2) (#OVALS=2) (#SQUARES=1)
    COMPLEX 2 OF RANK ? COVERS 2 EVENTS (2 NEW) WITH DENSITY OF 3.333E-01
        (#CIRCLES=2) (#TRIANGLES=2..3) (#SQUARES=0)
    COMPLEX 3 OF RANK 2 COVERS 2 EVENTS (2 NEW) WITH DENSITY OF 2.500E-01
        (#CIRCLES=1..2) (#OVALS=2) (#SQUARES=2)
    COMPLEX 4 OF RANK ? COVERS 1 EVENT (1 NEW) WITH DENSITY OF 1.000E+00
        (#CIRCLES=1) (#OVALS=2) (#TRIANGLES=1) (#SQUARES=1)

Figure 9b (group labels: #circles=2, #circles=1; program output):
    COMPLEX 1 OF RANK 2 COVERS 6 EVENTS (6 NEW) WITH DENSITY OF 1.667E-01
        (#CIRCLES=2)
    COMPLEX 2 OF RANK 2 COVERS 2 EVENTS (2 NEW) WITH DENSITY OF 3.333E-01
        (#CIRCLES=1) (#OVALS=2) (#TRIANGLES=1..3) (#SQUARES=1..2)

Characterization two, in contrast to characterization one, is made with MODE set to FREE, which permits the greatest generalization; however, the RANK of the complex has been limited to 2. The RANK value, when lower than the number of variables, saves time in the program by refusing to consider subgroupings of highly dissimilar events, i.e. the lower the RANK, the more similarity between events in the subgroups which a single complex will cover. The second characterization is illustrated in figure 9b. Now there are only two complexes, which divide the faces into two groups: those with 2 circles and those with 1 circle.
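The effect of the RANK limit can be sketched using the rank measure R defined earlier, which counts the variables on which two events differ. The sketch below is an illustration under stated assumptions, not the AQ7UNI code: the function names are hypothetical, and the four-feature events (circles, ovals, triangles, squares) are invented for the example rather than taken from figure 8.

```python
from collections import defaultdict


def rank(e1, e2):
    """R(e, e'): the number of variables on which the two events differ."""
    return sum(1 for a, b in zip(e1, e2) if a != b)


def candidates_by_rank(seed, events, rank_limit):
    """Group events by their rank relative to the seed.

    Events farther from the seed than rank_limit are left out, so they are
    never considered for the seed's neighborhood in this sketch.
    """
    groups = defaultdict(list)
    for e in events:
        r = rank(seed, e)
        if e != seed and r <= rank_limit:
            groups[r].append(e)
    return dict(groups)


# Hypothetical events: (circles, ovals, triangles, squares).
seed = (2, 2, 1, 1)
events = [(2, 2, 1, 1), (2, 2, 3, 1), (2, 1, 2, 0), (1, 2, 1, 1), (1, 0, 3, 2)]

print(candidates_by_rank(seed, events, rank_limit=2))
print(candidates_by_rank(seed, events, rank_limit=4))
```

With the lower limit only the two events that differ from the seed in a single feature remain as candidates, which is the sense in which a low RANK both saves work and keeps each subgroup homogeneous.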
The selection of the characterization of most utility rests with the user and depends on the user's current level of understanding of the environment in which the problem exists and the framework in which the characterization is to be used. By adjusting the parameters of the characterization, the cost, optimality, and degree of generalization may be varied widely. In the two characterizations presented, the length of references criterion was used to judge neighborhood optimality.

ANIMALS

The animals example represents a family of problems in the biological sciences. A large number of animals, presumably of microscopic size, are shown in figure 10. This problem was taken from [Michalski 75], where it was used to illustrate the usefulness of Variable-Valued Logic to the classification problem. The features used to describe the animals are the following.

• number of black circles on the body
• number of tails
• number of crossmarks on the tails
• number of easily distinguished extremities
• type of body texture
• number of empty circles on the body
• number of empty squares on the body
• number of empty triangles on the body
• type of tail
• shape of body
• number of sharp or straight angles
• number of eyes
• number of black squares on the body

In generating characterizations of the animals, we will proceed in two ways:

(1) A characterization of each class or phylum will be made separately. Since we assume animals within the same class are similar, we will seek a characterization of a high degree of generalization to establish common characteristics within each individual class.

(2) A characterization of all animals will be made in which all classes shown in figure 10 are combined to produce just one class. Since we are now looking for similarities and differences among the animals, we will seek a characterization with a medium degree of generalization.

Output data for the ANIMALS example appears in the listings that follow.
There were 18 separate characterizations produced, which are listed starting on page 37. Of these, the first 14 are individual characterizations of separate phyla, while the last 4 are characterizations of all animals.

[Figure 10. Species of "animals" (classes 0 through 13, among them 0. Jexems, 2. Gruffles, 4. Snorps, 5. Spurons, and 6. Mellinarks), from [Michalski 75]]

Looking at characterization number 1, which characterizes class 0, the Jexems, we see that these animals have no black circles, crossmarks, empty squares, empty triangles, eyes, black squares, or single tail, but they do have two empty circles, are blank in texture, and have irregular or circular shape. The individual phyla characterizations provide a good description of each class which, because it is mechanically generated, is always accurate.

Comparison of the characterizations of the individual phyla is also enlightening. When we compare characterization 3 (Gruffles) with characterization 1 (Jexems) we see that Gruffles have no tails; Jexems may have tails. Gruffles have 2 or more empty circles; Jexems have only 2 empty circles. Gruffles always have empty triangles; Jexems never have any. Gruffles may be ellipse shaped; Jexems may not be ellipse shaped. Gruffles have no sharp angles; Jexems may have sharp angles. These differences provide new information about the two classes which otherwise might go unnoticed. In the case presented here, Gruffles can be differentiated from Jexems by checking the number of empty triangles feature. In general, though, the characterizations of separate groups are not mutually disjoint, so discrimination rules cannot be produced merely by comparing the characterizations.

Now we consider an even more interesting characterization.
Consider the animals in figure 10 as they were before the phyla existed, and suppose that we believe it reasonable to make the phyla classifications according to the 13 features which have been defined. By requesting a characterization with a medium degree of generalization, we should expect to see all animals described via subgroups, in which the animals within each subgroup are similar. This was done, and the results for two different characterizations are labeled characterizations 15 and 17 respectively in the program-generated output.

In characterization 15, the 8 complexes given divide the animals into 8 subgroups, which presumably could form the basis for new phyla classifications. In figure 11 the animals have been rearranged into these 8 subgroups. The largest subgroup contains the more "ordinary" animals. Whether the relatively large population of the first subgroup is good or bad depends on the user's viewpoint; however, the uniclass algorithm always tends to cover the most events with the first complex when disjoint complexes are being produced, as in this case. The other 7 subgroups [...]

[Figure 11. The animals of figure 10 rearranged into the 8 subgroups of characterization 15]
[VL1 Characterizations of each Phyla: program output listing, not legible in the source]
W H rN *— > 1 — ■< m4 ,-H -U ^~3C — .UI c 05 a XQj an 3 IS men 3> « JO 31 CO — 10 o 3 u~i cu c CO — COM -o.o CO — CO * woo u o H XCO z 25>-l T> a z>— zca ii UOi M •H no » •» — II * «^ H -p O 3 M r» — IN*! CM JJ N« mo (N-~ r-»'S) 10 c« H . H u ^ r-X ^Oi X ii CO — 1 M too •< M < ■< •< i/l 31 M ca n a, — 1C «-%J — ' T3 —O •~^c — o — — *J CO •H iM l/l 35 ca 11 33 ca 3C roca H Jh O '/)-< 10 ii CO — lO II oii: CO 10 ceo U Dm H H-4 HCO H HCO HL0 H«_ HO — •< 0) II 31 z a ZO) 3Z .— . zca ZlO ZOO ZX3 ca +J o co :u ro ii C03 WO U3 3 COO— . >fNO( 30 — ■^ CJ cd OH u » > >H t» i| >H > J30 >cao 33 ♦ 10 >4 aj ■vlOJ jJX uia UJM "^ aj l| to WO io u C.JM CO 3 ^ in CO «n i C-l CJ J a r»H ^»H rr 1 a?H r~ — co r>coi<: Oi-C0 39 cd Ot-l Oi H .»x m— • vOQi in — ^ >4 j -j »- II X o o >c to E — Hca coal ■ jC • o 01 H ,M« "J «■« CO* x •• -J» 35 o o u H *■* o ■• O ca — aJ— ' M~-* o i- in tn ii V) — (0 II COO COw 10< u II z ro a — . m tn m caio CEX3_ cc— ~. CCH— . T" 4 i» LAJ'J >u« »j ». 4«jy. kO Ui 1*3 a CO*iJ HI > H M W > II s»co >io >co >co,o =— . — n »*-yi ca a w oio OKI OJJ O* OO 1 ooio O I < w 'J u*= Ui/I U2J IJIO U J U M 30 U— i3 H P3 en l/l l-l Ifl »-x COx BJZ VI »— en —4 mc no m j r»10 |"1 i| ■< n VllO HB< CD < z 10 eg ca I/1M CL«« OM o U ca M L> H u -3ca JC — CCS (■' t *0 it » bees ktCW 3«:>-iH 3«:io •4 H u o M X. CO 33 ~» JEW z — z< » ZIO — zca » M CO P» vJ •*'J ■< •< » -»: rtHfn ■xCtt • cn_ CD • cr-~ C0»»'< CCQ3< ca ■< CO a Ci. *-^ f— ^: r- —^j U-l (N-l 10 tu Q CVi Cl, • B.73 CV 1 a- 3 cu»n Cu 13 II 'J o-» o • o<< c • O^-O O— J O «CJ rj II a x O o J -3 OM 90 •-co z M i—i l- «"■ II rv| ii n-ICO tf l| in M in \C ~-iX r« ii ca M ■on JflM U U II CJ ejea oca uca 3 Q^ O»-l0 CM ra 30 en C0 4-I l| M CUM O — OJ J 3! —1 M r-3 i-i >-( II r/1 il -4 '1 -1 IJCJ -J -0 ML) M'J XZ3 XL) XCJCO X-JM XUIO -i:0 CNQ CO 1 CO I UH iO 1 u] i a< UJ-hO, CO | Cu Oz Bu«- -J*i _1:<2 -IX J!«i Ji!< J<< J^'* Ouo .-, 0m J i^_l CUM X_l a. 
— i — la-Hd — .43 z •o SU eca C33 CH cca cca to C •CO jcato 30 ^ 3*" o — o— O w o — 0~~* O "-^^^ o— — a — •■* H*~ U u u u U U u H CO 0) <1) iH s O o til' C •H +» O CD CO u 0) c to ^. o M — , ■■ »— 03 o r- o II ^^ ■B ■3 m O ■" IT> O 03 << M & M ■* M o at to a jm a ^^ H 1 O r- a •« *— «— «L a. — » a «~ M h 13 w* f— M M ■3 M K ** r- H ^^ O (O 04 «k 01— . ^^ O w m »— r« 1 H ^« • •» ft* oo •— — ft. ■ o o A. ^* II *" 03 ■ O It n ■ ^i^ oo at II O || M H u o M «-» ?• II to O o vO «• ■1 09 10 •* O U 1 to O H ** H H 1 ft^ & 03 ec w O 1 II to H U Bj O Mm H en U M ,— 10 1 c 0^ to Uco U — ■3 m J ifl © % 0i eo *- L o Ai mo M M D II M ■ « II ~ M II •• u a a B| ft* 03 13 UM K UO ^^ i to o •ft O M mm BJ — i to 0. * i a ■*^ a^ at MO ••o Cu, 1 «^ m 04 II •■ o ft* U«J ' —II CM »-» M ■ hi ,^ II 1 to MM *• U hi H to 01 u ._» CViOt — « •• CO a ft* Q ro IN o O •ft o — •4 M O H O a o° O M O*" O M O aM o "^S o° O -. tinued) gure 12) II IN 1 CO o " O o • m O w O « OO . » o y * 0* Og Ti °-3 a O ~ O o . ll O w Oi o J- II. .-* to J" O C\i H O oa O n Oo O -«iO O n - O < ii O at C\J M »• o II w n 2 11 a II M II M II N II " II «' || MM II Jt/1 II M II J H •— in uV O* oa t V, a« *-? MM r u. m a O = o r V tow O i| t~. oV S5^ <: m < u < H < » <;o- < to — ' c H to a MU M II M« M II MtCO mho M — M II 10 MMtO ad— TO mm: i OJ-l 1 ■H M to C»M OM O^* OM Oa ii o N3 (N , U1D — O-JIO OO03 OftiO ODM OCJ o«-o o-Jca inMco rH 13 M ■< 33 • H • H •o • HO .0 1 • HO • an • H» • a • a • a«» • CC"ft ^.^^ Q3 -4 H in io vCM *& (NM II •- II il »-M r"M~* •-Q3 *: •-«< •~ ll *■* (NO— • U % H MtO MU «•- wJIO M o«» o«» M H W >- tu«l k.H ButO ftuHt>4 Bu.cn co B-H H sum | 0u,H-* Bu. — Bu. •— Bu,m— . Bu,M— . •h nj (U m H O=o O — O 1 o— J- OC«» O"- O O M kc 0~Ti4 0 X O >-< O Oi ■1 H>* to Mm 03 II — . II — . Hll H II •u to Ha H — HS H. 
-■ — 1 HCdcn M— O »em Mi:a B-MtO >-— to a) NO 31 H II H»- H=a l-r- — HM_ Hi" Jd H = «» H— a HM-< HM^ HMM H M to rH H CO MJ M H«ft M mHM M -1 MH — M UWCHuJ uaj mH>* M_>- t-i 3 ft M Q IOM tO0» to w tOM^. l0"-^>3 to at co IOM inmco tOHO tOHO to— ^J IOMM l| 3» -H ■< joa a ZOO a o xo* IJl« aoo a mo aML3 a «• am* I— 1 r-\ 8 3> JJH M U«ai M -I co_jn M — • MHO M MMlftl -J CO j4 M— — wo — < r-4 •H o a O Q — OO QM Q0 4 QM QO Q — 1| QO*- a«» ca Qmoa QUO ata o O 1 II a II O mn B M 33 M aoo □a a a V< O 10 r- HO HCM H-J HCO-O Hta HM II H*-M HMM H — II H — II H»ll HO-« o w •p o •4 rv Si HC4 M(9 Mdl M •- M03IO M M WOIJ M^M M"^M M 10 M ~i •H c 1 ►J r~ »H (■ II ana 31 M l| 3HM » OCa* ■ H'J a ii o. a II Oi a jsiu atN=> o U • 10 M an MM OtO MH o — m a 10 ■< to*t Om II O M •H i-i (N — 1 M mM — >*j ii — Bui — *ll*l 0* — o -• , — Msa —id 33 — O — Buvl C C o o O m z aoi "»«■ »o^ 3NN 31 •« ao-. r» «•» amto amto aa a 3 OH »- >a >«3 B»0S DO 9> i| a) >ca ii > ii -J B» US II B-O — («0 — > i| a 1 to M CO od at M i. MO II «UH «4tO BJtOM MUM wtoo uyu M«3 Mao MtOM UCO u a a MM H 3 to x: ll *: a «^ ^^ *C II to a OM J 50 (NH VflW eo«-aq -< to< — in to — —JO O a to< a m ■3 *^ ^: • o in H M *4«» M< toca M-» to M — — B3 »-•« tO 33 xz aU o o in u H ft* 09 H — en H oto a o~ a tot,') a II M to it a OtO O ■ 10 tOO to-«t lOO— » tooa~» too— . tO03O» too~ tOM — to— ~— U H a CO CH — an o:h— . can «u anu mu«o «EtJ OSMO aoH mu a t-iw •< Cft »JO an W»tJ ■M — . jj <• _ mi a M» I co a •MM aJM 1 0u|»— . Lvl ^^— * Wl HM-H H M BS OtO » 03 B»— 10 t>coo B»— 04 B»03M t»— »J »mM l»^'- Bft^Oi p.— DO »»oo a a lO OO O 1 OO II O a ooa O a ooa OH II OHfl O a uto a *t > JC Bui O U* (J — . u~u U a O— O U a* U-~*t U Qi U~a U«M u— o -p H JO at •-o Maa •-•^ oa «- M 1/ OM •-to — < ■ft oz M-J to •— m M o jo ro i| ii nm< HUH •- II ll o ii a O II II O M O — O II l| nMM m ^ft- ^ -: at to to to OM toaj WJ to J ton 10 J — m ^^ toj •am o U cs to J CO cos ►JH OiH UM ^ ll MM MH M— . 
MM t->i- H M •< wsO •4MH M H MM 1 ktM< klMMl MmM MM.< tea i ft; en co Kh< ad«» I U oo H SO) jr. c «C -J«T1 Uft3>Q M- 04 -4 O U u w o t M H to H r> 04 G 1 Cu II II J « H ■ vO to S> 04 ^^ U L H H N •» m O 04 m 10 H *^* O II M ■3 *^ H M ■ a u kl tel in «• ,— ^^ M MJ 1 4b p^ D W o II H a M H M 04 Ou ■ W M to— , «^ to (9 a< H M — COM a» H w M H 1 4a to ■* DI a Ix M ^* Bi «tf M OO t3 M 09 B a U«J U M «fc H *m* M - - H % M n (N -« *^ to O •i MO •J MM II O 1 o o 5*i ON H 00- VO 04 vO a O "oj Ou O M ^» W tH -» cmih 1 " o w <\J * C\J*< OO ~ O Jto VfN,< O -"" T3 CVi (N O O •? •s •m •2 •2 •3 04 * 10 OQ 1 * II M M • a • no 0) »i o -3" to J- i4 » J" 3 ^- II .3 H O^M O M-J t-I » c^\mio 3 H r» • in ii 2 o5 11 2 A 3 C_J oa I'd I'd Ob As II men II 2 ii m ' ,1 OM O B3J C •H to <- <• < J c O ■* II ^^ a h«S b a — n a cn — M COM e«o fllO 00 '0 co^ ocaM rico OMto *^~'"* V. in M v*><< r»M ■O II O — O 1 o~i »rd OO* /nil imoM CO CO a 01 « nol O09 9 a en (NO. (Nta PD OB"* ("IN K >no!i • H I •a .HM « M H mi/i U-110 inn l/VO \f\tki \OH »-0l — no »-M<« u % % CD M «t oa M OO H M — E CO h MM MM fcH •u.0 0u»- B-O OuH fiu II 0-»3 t-cu M»- MM »- — xrj m— m < E o a HII H 11 HM H»= H HH H»- HM'J H to pr 1 O cd M mm MM i-j«* 1-104 MC4 M* - M Mcaa M— 1 M 'J M O. 10M tOM to — LOO too to tOB WH-4 tOMM tO«a i-H O ll «"» z«a 58 M a a~" a a— ao aM« acSM ZOM rH O «9> MH MH co— . U MO w M MM — MOQ3 M M Qocd < be no u O ta- Q — oo Q — O II (3 OO o«« QZa U 1 no to (J 04 II — •— . w II H c ■o to M rn— 33 — 33 Ol acq ECO 33*3 33 m men 33KJ * ~* H HN — H — * HO — HMM O -P o 4 H-1 O o u • to 10 C 0u ■3 Ml/) II to MO CO CU w M fN — » 1 — 1 M — H -O0 . 
— M — J»»M MM — JC-AJ —MO c n a aOj 3fOi 9* 3109 >M — 31 CO — vwM scaca SMx S«kM to w~ a 10 NX3 m a W-* MM MOO MHO a)-W MX3M N03M M— 04 u M o H ZM aM a at» ■*"i aw ii a * ato M «H«i a 04 M *r-\ CO p HO M m Ml «^ « MO *-*-* to II M — — *-i H ■P o 33 M fN — — «4 04 33 H •H ^— ^ o to-« i/im to II to II 10 ~1 — t/>ca«a tOKM to — tooo to»to U »H M H H-l H-i HW HN H "- HO w H10.J H — hhu H10 — M 0) -p II 30 ao acq acu a« XH ax: a'Ji(j aM--. atOM ato 04 (N a M N ii M II B>M wo MO MC4 — M — Moa MOM MM04 MO — M OH > >M E»H r»H >Mo >ao t»ca< >Oa >asa) aa 1 l/l M M MB) Mia MM MM M» l| MO^ W ^i MJOO MCM Mt^a u cd MM H S O M M MO a to II OM ■J 30 CNH e*H (NH ^H OMM 1/1 »-M tn — ^« H eot/iM vo— na a IH OH CU H 3"M nM ;!""* n— maox 1- II M «^ OM octi Ui O a O B M H ■HM IOM — M M m< —10 S3 x: t O t/1 H H -— % ^^ _>«a wJ«» UJ ^-^ U33 M » ■3 1/1 U H ^^ m^ O o O- M — 03 U II H • to can O a t/» 10 10 i| to II «0 II w-< tooca io — to- tooa U II M 04 cn — 03 — oato onto COM— . MH — 03X1M COM rn ME(J T-t < t« MO NO MM MM MGOCJi tw«co •J u MM-« *>~^ M M M H M B4 ood >a »»oto >— V) t»a » >40 OOM »04<4 04 h! a 1/1 0«5 OB OH I S-i OOM OH || U*a O il -a OOM •4 E> N o um UM UIO 0"> UNU u to UIO U H H m 00 m 4 PW —VI a* »- .,.» Wl*^,, to 4 O Dm Pu Bu • *u • M O M IO M D OuO — M to *u O II IS o — O — o • O • o«o o »o o— o O O • II — II (J H M >- o o o o OM •-M OU) IN — oa oa a H M H «•• II h> k! «u -1 M MU MU MU MU MJU M'J^ MUM MU 1 MU 1 MU 1 UM O^Q M 1 M 1 M 1 M 1 MMOi M 1 Cu M 1 a. 
Comparison with Other Methods

The uniclass algorithm used by AQ7UNI differs from other characterization techniques in that (1) the uniclass algorithm can produce a disjunctive description of the events with varying degrees of generality, and (2) the uniclass algorithm does not permit the use of structured events (i.e. event descriptions involving dummy variables). AQ7UNI is a data-driven, bottom-up method (as opposed to a top-down or model-driven method). The disjunctive units, the complexes, are built up from individual events until a threshold limit causes this process to halt. The characterization methods of SPROUTER [Hayes-Roth 76] and THOTH [Vere 78] have been classified as bottom-up methods in [Dietterich & Michalski 79], but AQ7UNI differs widely from both according to the two points above.

Even with the differences that exist, some generalization techniques do appear in common. In AQ7UNI, generalizations come about by the application of several procedures:

1. internal disjunction
Recall that a selector is of the form [Xi=value] or [Xi=set of values]. The latter form represents an internal disjunction, i.e. X2=3,5 represents (X2=3) v (X2=5).

2. dropping a selector
When the list of values in a selector contains all values in the domain of the variable, or when a large portion of the domain is present and the action of the selector threshold forces the elimination of a selector, the selector is dropped from the characterization.

3. closing an interval
When variable Xi is of interval type and a selector such as Xi=2,5,8 is present, it is generalized to Xi=2..8, which denotes that Xi may take any value in the interval [2,8].
4. climbing a generalization tree
When variable Xi is of structure type and a selector such as Xi=square,triangle is present, it is generalized by replacing the values by the term of which they are both refinements. In this example, if square and triangle are both refinements of polygon, then the selector would become Xi=polygon. When no node of common refinement exists, the selector is dropped.

In a comparative study of several characterization methods [Dietterich & Michalski 79] these same generalization processes were found. Process number 2 is used in both bottom-up methods mentioned previously. Processes 1 and 2 are incorporated in Meta-DENDRAL [Buchanan 78] and all processes are present in INDUCE [Larson 77]. These latter two programs utilize a model-driven technique.

It is important to remember that the characterization methods mentioned above utilize structured event environments. This capability is not supported by the VL1 system, and hence is outside the realm of AQ7UNI. The importance of structured events can be illustrated by trying to solve the characterization problem used in [Dietterich & Michalski 79]. That problem is to characterize the events shown in figure 13.

[Figure 13: event 1, event 2, event 3]

An event is structured when it consists of subevents with similar features (e.g. size, shape, texture) and relations between subevents (e.g. larger than, ontop, within). To compare events we must first find and compare corresponding subevents, according to their relationships. Consider events 1 and 3 in figure 13. Each of the several objects in each event is a subevent and has observable features of size, shape, and texture. When we compare events 1 and 3 we could compare object a to j, b to h, and c to i, but our natural approach to this task would be to first decide which objects are comparable, and then make the comparisons.
Most people would compare objects a and h, b and i, and c and j because a and h are on the top, b and i are in the middle, and c and j are on the bottom. But other lines of reasoning are also valid: compare b and i because they are both shaded, compare a and h because they are both small, compare c and j because they are both large, etc. It is just coincidence that these latter notions of comparability lead to the same mapping of comparable objects. Unfortunately this is rarely the case, and the reader is directed to the task of comparing events 1 and 2 to appreciate the difficulty.

If we ignore the relations between objects in events 1 and 2, the task of comparing them becomes easier. Consider events 1' and 2 in figure 14. We may compare object a to any of the objects d, e, f, or g, but it seems most reasonable to compare object a with the object in event 2 which is most similar, i.e. object d. Similarly, object b is best compared to either object f or g, and object c is best compared to object e.

[Figure 14: event 1', event 1'', event 1''']

If event 1 is given by 1'' in figure 14, the changed texture of object a now makes the comparison of events 1'' and 2 more difficult. Object a is just as much like object d as either f or g, and the three choices of a comparable object may have to be carried throughout the induction process.

Finally there is the possibility that certain objects may be so dissimilar that it is best to declare them non-comparable. Referring now to the task of comparing events 1''' and 2 of figure 14, if we decided to form pairs of comparable objects (a,d), (b,f), and (c,e) we are left with objects x and g. It would seem foolish to compare x to g just because they are left over after other pairings have been made. Perhaps object x or g represents a unique situation which distinguishes its respective event and which is truly not comparable to any other object.
By introducing a similarity threshold limit into the comparable-object finding process, objects which are not sufficiently similar can be declared non-comparable. One implementation of non-comparability is the substitution of a special null object for a non-comparable object in the comparable object pairings. The algorithm below finds comparable objects when presented with n structured events.

Algorithm S:

1. Select an event at random to be the fundamental event. The algorithm will generate sets of comparable objects, one from each event, for each object in the fundamental event.

2. Let m be the number of objects in the fundamental event and let t be the similarity threshold value. The following steps 3 to 6 are to be repeated m times to form the m sets of comparable objects, one set for each object in the fundamental event.

3. Compare the fundamental object (i.e. the object of current interest in the fundamental event) with each unclassified object in each event.

4. Find the set of objects of maximum similarity for each event. If the maximum similarity is less than t, then substitute the special null object for the object of maximum similarity.

5. If a single object or the null object was selected in step 4, it is the comparable object and enters the set of comparable objects being constructed.

6. If several objects were selected in step 4 (i.e. a tie in similarity value), then save the set of alternatives and continue on to process the next fundamental object. As further object classification proceeds, make the selection firm when only one of the alternatives remains, the others having been assigned to other subsequent object comparability sets.

7. At the end, if any alternative sets remain, make a firm selection of one object randomly. The algorithm ends with m sets of n objects each.

Algorithm S will be illustrated by identifying the comparable objects in the three events shown in figure 13.
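Steps 1 to 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the function names, the dictionary representation of events, the default similarity (a plain count of matching values), and the toy events are all hypothetical. The fundamental event is passed in rather than drawn at random so that runs are repeatable, and a saved tie whose alternatives are all assigned elsewhere is resolved to the null object, a case the algorithm statement leaves open.

```python
import random

NULL = None  # plays the role of the special null object


def count_matches(x, y):
    """Illustrative similarity: number of variables on which x and y agree."""
    return sum(1 for var in x if x[var] == y.get(var))


def algorithm_s(events, fundamental, threshold, similarity=count_matches, seed=0):
    """Return one comparability set per object of the fundamental event.

    events: list of dicts, each mapping an object name to {variable: value}.
    Each returned set is a dict {event index: object name or NULL}.
    """
    rng = random.Random(seed)
    taken = [set() for _ in events]  # firmly classified objects, per event
    pending = []                     # saved ties: [set index, event index, alternatives]
    sets = []

    def settle():
        # Step 6, second half: a saved tie becomes firm once at most one
        # of its alternatives is still unclassified.
        changed = True
        while changed:
            changed = False
            for item in list(pending):
                k, e, alts = item
                alts -= taken[e]
                if len(alts) <= 1:
                    choice = next(iter(alts), NULL)
                    sets[k][e] = choice
                    if choice is not NULL:
                        taken[e].add(choice)
                    pending.remove(item)
                    changed = True

    for f_name, f_desc in events[fundamental].items():  # step 2: once per object
        current = {fundamental: f_name}
        sets.append(current)
        for e, objects in enumerate(events):
            if e == fundamental:
                continue
            # Step 3: compare with every object not yet firmly classified.
            scores = {o: similarity(f_desc, d)
                      for o, d in objects.items() if o not in taken[e]}
            best = max(scores.values(), default=None)
            if best is None or best < threshold:         # step 4: null object
                current[e] = NULL
                continue
            winners = {o for o, s in scores.items() if s == best}
            if len(winners) == 1:                         # step 5: firm choice
                (choice,) = winners
                current[e] = choice
                taken[e].add(choice)
            else:                                         # step 6: save the tie
                pending.append([len(sets) - 1, e, winners])
        settle()

    while pending:                                        # step 7: random choice
        k, e, alts = pending.pop(0)
        alts -= taken[e]
        choice = rng.choice(sorted(alts)) if alts else NULL
        sets[k][e] = choice
        if choice is not NULL:
            taken[e].add(choice)
        settle()
    return sets


# Hypothetical toy events, invented for this sketch only.
events = [
    {"p": {"size": "m", "shape": "square"}, "q": {"size": "l", "shape": "circle"}},
    {"r": {"size": "l", "shape": "circle"}, "s": {"size": "m", "shape": "square"}},
    {"u": {"size": "m", "shape": "circle"}, "v": {"size": "m", "shape": "circle"}},
]
for comparable in algorithm_s(events, fundamental=0, threshold=1):
    print(comparable)
```

In the toy data, objects u and v tie for both fundamental objects, so the ties are carried to step 7; one is picked at random and the other is then the only alternative left for the remaining set. Raising the threshold above the best attainable score replaces every comparable object with the null object.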
An English description of the events in figure 13 is:

event 1: "Three objects a, b, and c are arranged with a ontop of b ontop of c. Object a is a medium, clear square. Object b is a medium, shaded circle. Object c is a large, clear Ushape."

event 2: "Four objects d, e, f, and g are arranged with d ontop of e, and f and g within e. Object d is a medium, clear square. Object e is a medium, clear rectangle. Objects f and g are small shaded circles."

event 3: "Three objects h, i, and j are arranged with h ontop of i ontop of j. Object h is a medium, clear triangle. Object i is a medium, shaded rectangle. Object j is a large, clear ellipse."

The events will be described formally by the variables size (s-small, m-medium, l-large), texture (c-clear, s-shaded), shape (s-square, c-circle, u-Ushape, r-rectangle, t-triangle, e-ellipse) and on (the value of on is the identity of the object on which it rests). The relation within, which applies only to event 2, will not be used. Table 1 gives the formal description of the three events.

event:       1  1  1  2  2  2  2  3  3  3
object no.:  1  2  3  1  2  3  4  1  2  3
object:      a  b  c  d  e  f  g  h  i  j
size:        m  m  l  m  m  s  s  m  m  l
texture:     c  s  c  c  c  s  s  c  s  c
shape:       s  c  u  s  r  c  c  t  r  e
on:          2  3  -  2  -  -  -  2  3  -

Table 1

In this application of Algorithm S, the similarity measure will be the count of matching variable values. The value of the variable on is matched in a special way and can count twice. The match score is 1 if the on values are both null (-) or both non-null. An additional point is scored if the on values in the previous object columns match exactly.

We begin to apply algorithm S by selecting the fundamental event. Let it be event 1. Then we proceed to find the set of comparable objects for object a. We compare the values of size, texture, shape, and on of object a to those of objects d, e, f, and g (we select d) and to those of objects h, i, and j (we select h).
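The similarity measure can be sketched in Python on the data of Table 1. This is only a sketch: the names are hypothetical, and the reading of the "additional point" is an assumption, namely that it is awarded when the on values of the columns immediately preceding the two objects (within their own events) are equal, and never when either object is the first of its event. Under that reading the scores agree with the selections reported in the text.

```python
# Table 1, one tuple per column: (event, object, size, texture, shape, on).
# None stands for the null value "-".
TABLE_1 = [
    (1, "a", "m", "c", "s", 2), (1, "b", "m", "s", "c", 3), (1, "c", "l", "c", "u", None),
    (2, "d", "m", "c", "s", 2), (2, "e", "m", "c", "r", None),
    (2, "f", "s", "s", "c", None), (2, "g", "s", "s", "c", None),
    (3, "h", "m", "c", "t", 2), (3, "i", "m", "s", "r", 3), (3, "j", "l", "c", "e", None),
]

FIRST = object()  # marks an object with no preceding column in its own event


def build(table):
    """Attach to each object the on value of the column just before it."""
    objects, last_event, last_on = {}, None, FIRST
    for event, name, size, texture, shape, on in table:
        if event != last_event:
            last_on = FIRST
        objects[name] = {"size": size, "texture": texture, "shape": shape,
                         "on": on, "prev_on": last_on}
        last_event, last_on = event, on
    return objects


OBJ = build(TABLE_1)


def similarity(x, y):
    # One point for each matching value of size, texture, and shape.
    score = sum(x[v] == y[v] for v in ("size", "texture", "shape"))
    # One point if the on values are both null or both non-null.
    score += (x["on"] is None) == (y["on"] is None)
    # Assumed reading of the additional point: the on values of the
    # preceding columns are equal (never awarded to a first object).
    if (x["prev_on"] is not FIRST and y["prev_on"] is not FIRST
            and x["prev_on"] == y["prev_on"]):
        score += 1
    return score


def scores(name, candidates):
    """Similarity of one fundamental object to each candidate object."""
    return {c: similarity(OBJ[name], OBJ[c]) for c in candidates}


print(scores("a", "defg"))  # d scores highest
print(scores("a", "hij"))   # h scores highest
print(scores("b", "efg"))   # e, f, and g tie
```

With event 1 as the fundamental event, object a is closest to d and to h, and object b ties with e, f, and g at a similarity of 2, which is the tie that step 6 of Algorithm S has to carry forward.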
One set of comparable objects is thus {a,d,h}. Next we compare object b to e, f, and g (we select e, f, and g, all with a similarity of 2) and we compare object b to objects i and j (we select i). Now object c is compared to objects e, f, and g (we select e) and to object j (we select j). The third set of comparable objects is {c,e,j} and we are left with one alternatives set still containing f and g, from which we randomly choose f, and the second set of comparable objects becomes {b,f,i}.

Each event can now be represented by the values of the variables for an object from each of the three sets of comparable objects, using logic system VL1. When this data is given to the program AQ7UNI and the results paraphrased in English, the description of the events in figure 13 is: "There is a medium sized clear square or triangle ontop of either (a) a small or medium shaded circle or rectangle or (b) a medium or large clear Ushape, rectangle or ellipse." Setting the selector threshold low eliminates selectors with multi-valued reference sets and produces the simpler description: "There is a medium sized clear object ontop of a shaded object or a clear object." By interpreting selectors which may not be applicable to all events as possible situations, it may also be said that "The shaded object may be ontop of the clear object."

When algorithm S is applied with event 2 as the fundamental event, a different generalization is formed: "There is a medium sized clear object ontop of another object," or with more detail: "There is a medium sized clear square or triangle ontop of either (a) a medium or large rectangle or Ushape or (b) a circle or ellipse. Object (a) or (b) might be ontop of the other."
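For the first case above (event 1 as the fundamental event), the step from the comparable sets to the paraphrased descriptions can be imitated in a few lines of Python. This is an illustration only, not AQ7UNI itself: it forms a single internal disjunction per variable for each comparable set, and it models a low selector threshold simply as "drop every selector with more than one value", which is an assumption. The on variable is left out, and object g is omitted because f was the object chosen for the second set.

```python
VARIABLES = ("size", "texture", "shape")

# size, texture, and shape of each object, from Table 1
ATTRS = {
    "a": ("m", "c", "s"), "b": ("m", "s", "c"), "c": ("l", "c", "u"),
    "d": ("m", "c", "s"), "e": ("m", "c", "r"), "f": ("s", "s", "c"),
    "h": ("m", "c", "t"), "i": ("m", "s", "r"), "j": ("l", "c", "e"),
}

# Comparable sets found with event 1 as the fundamental event
SETS = [("a", "d", "h"), ("b", "f", "i"), ("c", "e", "j")]


def internal_disjunction(members):
    """For each variable, collect every value observed among the members."""
    return {var: sorted({ATTRS[o][k] for o in members})
            for k, var in enumerate(VARIABLES)}


def drop_multivalued(selectors):
    """Keep only selectors whose reference set has a single value."""
    return {var: vals for var, vals in selectors.items() if len(vals) == 1}


for members in SETS:
    full = internal_disjunction(members)
    print(members, full, drop_multivalued(full))
```

The full selectors read as "medium, clear, square or triangle", "small or medium, shaded, circle or rectangle", and "medium or large, clear, Ushape, rectangle or ellipse". After the multi-valued selectors are dropped, what remains is "medium clear", "shaded", and "clear", in line with the simpler description quoted above.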
When event 3 is the fundamental event, the generalization is: "There is a medium sized clear object ontop of a medium object which may be ontop of another object" and with more detail: "There is a medium sized clear square or triangle ontop of a medium sized circle or rectangle which may be ontop of a small or large ellipse, circle or Ushape." Setting the similarity threshold to 2, the last generalization becomes "There is a medium sized clear square or triangle ontop of a medium sized circle or rectangle which may be ontop of an object which might be a large clear ellipse or Ushape."

The characterizations of the events in figure 13 produced by algorithm S with AQ7UNI are similar to those of the other methods cited which were studied by Dietterich and Michalski. Some characterizations from their study are:

"There is a medium object ontop of a large, clear object." (Hayes-Roth's method)

"There is a medium object ontop of a large clear object. There is a shaded object and there is a clear object." (Vere's method)

"There is a medium-size circle, rectangle, or square ontop of a large, clear Ushape, rectangle, or ellipse." (Michalski's method)

"There are exactly two clear objects in each event. The top-most object is a medium sized, clear polygon and it is ontop of a large or medium sized circle or rectangle." (Michalski's method with constructive induction)

Many other characterizations are given in Dietterich and Michalski's paper; however, the samples given above show the general flavor of the characterizations which can be generated. The last sample above uniquely shows the added power of constructive induction, which is a technique not available in the AQ7UNI method.

A summary of the differences among the characterization techniques which have been mentioned is given in figure 15, which appears in [Dietterich & Michalski 79], except for the last column pertaining to the AQ7UNI-with-algorithm-S technique, which appears here for the first time.
[Figure 15: summary of the differences among the characterization techniques]
Summary

Inductive program AQ7UNI can characterize any class of events which can be described in the Variable-Valued Logic system VL1. The degree of generalization can be controlled by adjustments to the selector threshold and density threshold, and the optimality of the solution can be altered by parameters controlling neighborhood construction and neighborhood judging criteria.

Characterizations of medium degrees of generality usually cause several complexes to be formed, each covering a portion of the events. When disjoint complexes are requested, the complexes describe clusters or subgroups of events in the class which have similar characteristics. Unique events tend to fall into the smallest subgroups because they are the most difficult to describe generally.

The great flexibility of the program, with its several control parameters, and the wide range of characterization problems and solution requirements make experimentation the only technique for exploring the range of possible characterizations in order to find those which are useful.

AQ7UNI has no facilities for constructive induction, nor can it handle problems involving structured events. Sometimes these two limitations can be overcome by manually introducing new variables (in lieu of constructive induction) or by transforming a structured-event problem into a VL1-expressible one (e.g. via algorithm S).

REFERENCES

Buchanan, B. G., Feigenbaum, E. A., "Dendral and Meta-Dendral, Their Applications Dimension," Artificial Intelligence, Vol. 11, pp. 5-24, 1978.

Dietterich, T. G., Michalski, R.
S., "Learning and Generalization of Characteristic Descriptions: Evaluation Criteria and Comparative Review of Selected Methods," submitted for publication to the Sixth International Joint Conference on Artificial Intelligence, August 1979.

Hayes-Roth, Frederick, "Patterns of Induction and Associated Knowledge Acquisition Algorithms," Department of Computer Science, Carnegie-Mellon University, May 1976.

Larson, J., Michalski, R. S., "AQVAL/1 (AQ7) User's Guide and Program Description," Department of Computer Science report number 731, University of Illinois, Urbana, Illinois, June 1975.

Larson, James B., "Inductive Inference in the Variable Valued Predicate Logic System VL21: Methodology and Computer Implementation," Department of Computer Science report number 869, University of Illinois, Urbana, Illinois, May 1977.

Michalski, R. S., "AQVAL/1 - Computer Implementation of a Variable-Valued Logic System and the Application to Pattern Recognition," Proceedings of the First International Joint Conference on Pattern Recognition, Washington, D. C., October 30-November 1, 1973.

Michalski, R. S., "VARIABLE-VALUED LOGIC: System VL1," 1974 International Symposium on Multiple-Valued Logic, West Virginia University, Morgantown, West Virginia, May 29-31, 1974.

Michalski, R. S., "Variable-Valued Logic and its Application to Pattern Recognition and Machine Learning," chapter in the monograph: Multiple-Valued Logic and Computer Science, ed. David Rine, North-Holland publishers, 1975.

Michalski, R. S., "Toward Computer-Aided Induction: A Brief Review of Currently Implemented AQVAL Programs," Department of Computer Science report number 874, University of Illinois, Urbana, Illinois, May 1977(a).

Michalski, R. S., "A SYSTEM OF PROGRAMS FOR COMPUTER-AIDED INDUCTION: A SUMMARY," 5th International Joint Conference on Artificial Intelligence, MIT, Boston, Massachusetts, August 1977(b).

Michalski, R.
S., "A Planar Geometrical Model for Representing Multidimensional Discrete Spaces and Multiple-valued Logic Functions," Department of Computer Science report number 897, University of Illinois, Urbana, Illinois, January 1978.

Michalski, R. S., and Larson, J. B., "SELECTION OF MOST REPRESENTATIVE TRAINING EXAMPLES AND INCREMENTAL GENERATION OF VL1 HYPOTHESES: the underlying methodology and the description of programs ESEL and AQ11," Department of Computer Science report number 867, University of Illinois, Urbana, Illinois, May 1978.

Michalski, R. S., "STUDIES IN COMPUTER INDUCTION AND PLAUSIBLE INFERENCE," a research proposal submitted to the National Science Foundation, Intelligent Systems Program, Computer Science Section, Division of Mathematical and Computer Sciences (1979).

Stepp, Robert, "The Uniclass Inductive Program AQ7UNI: Program Implementation and User's Guide," Department of Computer Science report number 949, University of Illinois, Urbana, Illinois, April 1979.

Vere, S. A., "Multilevel Counterfactuals for Generalizations of Relational Concepts and Productions," Department of Information Engineering, University of Illinois, Chicago Circle, 1978.

Winston, P. H., "Learning Structural Descriptions from Examples," Technical Report AI TR-231, MIT AI Lab, Cambridge, Massachusetts, 1970.

BIBLIOGRAPHIC DATA SHEET

1. Report No.: UIUCDCS-R-79-982
3. Recipient's Accession No.:
4. Title and Subtitle: Learning Without Negative Examples via Variable-Valued Logic Characterizations: The Uniclass Inductive Program
5. Report Date: July 1979
7. Author(s): Robert Stepp
8. Performing Organization Rept. No.:
9. Performing Organization Name and Address: Department of Computer Science, University of Illinois, Urbana, IL 61801
10. Project/Task/Work Unit No.:
11. Contract/Grant No.: NSF MCS 79-06614
12. Sponsoring Organization Name and Address: National Science Foundation
13. Type of Report & Period Covered:
14.
15. Supplementary Notes:
16.
Abstracts: This paper describes the underlying theory, internal logic, and evaluation of an inductive program AQ7UNI which accepts a set of symbolic descriptions (events) of arbitrary objects and produces a general description (characterization) of the set. The events are attribute-value lists and the resulting characterizations are expressed in a simple yet powerful formal language VL1 (Variable-valued Logic system 1 [Michalski 74, 75]), which is a form of monadic predicate calculus.
17. Key Words and Document Analysis. 17a. Descriptors: Computer Induction; Machine Learning without Teacher; Variable-valued Logic; Characteristic descriptions
17b. Identifiers/Open-Ended Terms:
17c. COSATI Field/Group:
18. Availability Statement: Unlimited
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages: 31
22. Price: