UIUCDCS-R-80-1001
UILU-ENG 80 1704

Knowledge-based Systems

by Donald Michie

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

and

Machine Intelligence Research Unit
University of Edinburgh
United Kingdom

January 1980

Introduction

A new programming technology has been growing up around the problem of how to transfer human expertise in given domains into effective machine form, so as to enable computing systems to perform convincingly as advisory consultants. Expert systems development, confined during the past decade to academic laboratories, is becoming cost-effective. Reasons are partly advance of semi-conductor technology and partly development of well-understood methodologies for "knowledge-based" programming.

A few examples may be in order to illustrate the kinds of consultant skills under discussion (Figure 1). MOLGEN [23] interactively aids molecular geneticists in the planning of DNA-manipulation experiments. VM ("Ventilator Management") [15] gives real-time advice for the management of patients undergoing mechanical ventilation in the intensive care unit of the Pacific Medical Center. PUFF [21] interprets results of pulmonary function tests in use in the same center. SACON [3] guides engineers in the use of a large program which integrates structural analysis procedures.
PROSPECTOR [18] advises when and where to drill for ore. DENDRAL [8] takes the pattern generated by subjecting an unknown organic chemical to a mass spectrometer, and infers the molecular structure. SECS [34] uses a "knowledge-base" of chemical transforms to propose schemes for synthesizing stated compounds. End-game expert systems deploy and discuss chess-master knowledge and generate improved teaching texts. MYCIN [32] and INTERNIST [29] out-perform clinical consultants within the limited domains of expertise of these programs.

Medicine
  • MYCIN, for identification of bacteria in blood and urine samples, and prescription of antibiotic regime. Shortliffe, at Stanford Medical School, USA.
  • INTERNIST, for diagnosis in internal medicine. Myers and Pople at Pittsburgh University, USA.
  • Intensive care ("iron-lung"): VM (Fagan and others)
  • Interpretation of lung tests: PUFF (Kunz)
Chemistry
  • DENDRAL, for identification of organic compounds. Feigenbaum, Lederberg, Djerassi, Buchanan, Carhart and others. Stanford University, USA.
  • SECS system for designing organic syntheses. Wipke, University of California at Santa Cruz, USA.
Molecular genetics
  • MOLGEN (Lederberg, Martin, Friedland, King, Stefik)
Other
  • Consultancy for structural engineers: SACON (Bennett)
  • Consultancy for mineral prospecting: PROSPECTOR (Hart, Duda, Einaudi)

Figure 1. Examples of expert systems.

Six Facts of Today's World

1. The market for consultancy demands specialists, not generalists: this applies to automated consultancy too.
2. Real-time operation is in some applications not just desirable but essential (see the reference to VM earlier).
3. A consultant's skill consists to an important degree of asking the client the right follow-up questions, as the outlines of the case take shape.
4. Unless the program can do this, and can also explain its steps on demand, client confidence collapses.
5. An expert system acts as a systematizing repository over time of the knowledge accumulated by many specialists of diverse experience. Hence it can and does ultimately attain a level of consultant expertise exceeding that of any single one of its "tutors."
6. Program text in the ordinary sense is an unsuitable and unpopular medium for the description and communication by human experts of their expertise. "Advice languages" are needed.

Nature of Knowledge-based Expert Systems

Expert systems are not, and owing to the complexity of their tasks cannot be, either procedure-driven in the ordinary sense or data-driven, although they can all be fairly described as database-driven. The great bulk of the database, however, is typically made up of rules which are invoked by pattern-match with features of the task-environment and which can be added to, modified or deleted by the user. A database of this special type is ordinarily called a "knowledge-base," and its existence determines that there are three different user-modes for an expert system in contrast to the single mode (getting answers to problems) characteristic of the more familiar type of computing (see Figure 2):

(1) getting answers to problems (user as client);
(2) improving or increasing the system's knowledge (user as tutor);
(3) harvesting the knowledge-base for human use (user as pupil).

[Figure 2. Diagram of the three user-modes of an expert system; not legible in this copy.]

Users of an expert system in mode (2) are known as "domain specialists." It is not possible to build an expert system without one.
In the early days it was thought that computer-based consultation would be confined to the conventional performance goal of earlier computing technologies, namely to deliver good answers to the client's input questions. This turned out not to be the whole picture. First: the client demands explanations as well as answers. Second: the system is typically required to have facilities for improving and refining its knowledge from tutorial interaction with the client's own domain specialists. Third: an additional feature has come to the fore, whereby the system can be made to generate improved codifications of domain-specific knowledge for human use.

A typical explanation facility allows the user to challenge and examine the reasoning process underlying the system's answers. The dialogue shown in Figure 3 is excerpted from a session with MYCIN, a program knowledgeable about diagnosis and antibiotic therapy in certain classes of infectious diseases. This record contains two examples of the system's ability to handle questions of the form "how did you work that out?" In the first, the user types "RULE" and receives an English-language version of the last rule to be executed. In the second example the user-command "WHY" triggers a backward trace of the inference process which fired the system's last question. The number of the last rule is also given in case the user wishes to retrieve and examine it. Further backward tracing could be activated by repeated use of "WHY" and "RULE" commands before proceeding with the main dialogue.

22) Was any chemotherapy recently begun specifically to combat ORGANISM-1?
••NO
23) Is Jeremiah Sample (PATIENT-1) a burn patient?
••RULE
RULE047
IF:   1) THE SITE OF THE CULTURE IS BLOOD, AND
      2) THE IDENTITY OF THE ORGANISM IS NOT KNOWN WITH CERTAINTY, AND
      3) THE STAIN OF THE ORGANISM IS GRAMNEG, AND
      4) THE MORPHOLOGY OF THE ORGANISM IS ROD, AND
      5) THE PATIENT HAS BEEN SERIOUSLY BURNED
THEN: THERE IS WEAKLY SUGGESTIVE EVIDENCE (.4) THAT THE IDENTITY OF THE ORGANISM IS PSEUDOMONAS
••NO
24) Is the patient's illness with ORGANISM-1 a hospital-acquired infection?
••YES
25) Do you consider Jeremiah Sample (PATIENT-1) to be a compromised host? (alcoholic, immunosuppressed, etc.):
••NO
26) Has Jeremiah Sample (PATIENT-1) had any gu surgery or manipulation of the urinary tract? (e.g., indwelling catheter, cystoscopy, etc.)
••WHY
I AM TRYING TO FIND OUT ABOUT THE IDENTITY OF THE ORGANISM.
I ALREADY KNOW THAT:
      THE PORTAL OF ENTRY OF THE ORGANISM IS UPPER-GU
      THE MORPHOLOGY OF THE ORGANISM IS ROD
      THE STAIN OF THE ORGANISM IS GRAMNEG
      THE SITE OF THE CULTURE IS BLOOD
THEREFORE, IF:
      1) THE PATIENT HAS NOT HAD A GENITO-URINARY MANIPULATIVE PROCEDURE, AND
      2) URINARY-TRACT-INFECTION IS NOT A PROBLEM FOR WHICH THE PATIENT HAS BEEN TREATED
THEN: THERE IS SUGGESTIVE EVIDENCE (.6) THAT THE IDENTITY OF THE ORGANISM IS E.COLI
(RULE156)

Figure 3. Excerpt from a consultation with the MYCIN program [32].

The supporting software framework is of a type normally called a "production system" [13]: a modular collection of rules, together with a control structure. Each rule has a condition part consisting of a conjunction of patterns C1, C2, etc., paired with an action part (A1, A2, etc.) according to the general scheme shown in Figure 4. The list of rules is searched for the subset whose condition parts are satisfied ("matched") by the current state of the database. The retrieved candidate set is processed to detect any conflicts and to resolve them by elimination of rules from the candidate set.
The first rule of the reduced set is executed. An action part can be an action, e.g. "print disease-name" or a logical, numerical or other value, or it can be an action-sequence or an action-scheme, goal-list or other advice structure used to guide an action-generating module. Typically execution of the action part of a rule modifies the state of the database.

Rule-based Inference

The deductive inferences performed by MYCIN in the process of answering the user's questions follow a control scheme known as "backward chaining." To get an idea of how this works, consider a simple set of rules (Figure 5) in which letters from the alphabet have been substituted for "facts."

1) A & B -> F
2) C & D -> G
3) E -> H
4) B & G -> J
5) F & H -> X
6) G & E -> K
7) J & K -> X

The arrow "->" implies "THEN," thus the first rule reads: If A is true AND B is true THEN F is true.

RECOGNIZE
    DATABASE:            C5   C1   C3
    PRODUCTION RULES:    (C1 & C2) -> A1
                         C3 -> A2
                         (C1 & C3) -> A3
                         C4 -> A4
                         C5 -> A5
    Match
    CONFLICT SET:        C3 -> A2
                         (C1 & C3) -> A3
                         C5 -> A5
    Conflict resolution
    SELECTED RULE:       C3 -> A2
ACT
    Execution:           C3 -> A2, so A2 is executed

At the given instant the database contains the system's model, in the form of an implicit conjunction of conditions, of the state of the task environment. "(C1 & C2) -> A1" means "if the condition (C1 & C2) matches the database, then execute A1." Conflict resolution is the task of a tie-breaking algorithm, not specified here.

Figure 4. Production system "recognize-act" cycle.

In the simple set of rules below, letters of the alphabet have been substituted for "facts."

1) A & B -> F
2) C & D -> G
3) E -> H
4) B & G -> J
5) F & H -> X
6) G & E -> K
7) J & K -> X

"P & Q -> R" means "If P is true and Q is true then R is true."
We discover that B, C, D and E are true: is X therefore true?

Figure 5. Production system used as a deduction engine; in this "backward chaining" mode matching is done on the right-hand rather than the left-hand parts of rules.
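The two figures can be exercised mechanically. What follows is a minimal Python sketch, not MYCIN's implementation: the function names (`recognize`, `resolve`, `prove`) are invented for illustration, the "take the first matching rule" tie-break is an assumption read off Figure 4, and the `pending` guard against circular rule sets is an addition that the Figure 5 rules do not actually need.

```python
# Figure 4 rules as (conditions, action) pairs, in listing order.
FIG4_RULES = [
    (("C1", "C2"), "A1"),
    (("C3",), "A2"),
    (("C1", "C3"), "A3"),
    (("C4",), "A4"),
    (("C5",), "A5"),
]

def recognize(database, rules):
    """Match step: the conflict set is every rule whose conditions all hold."""
    return [rule for rule in rules if all(c in database for c in rule[0])]

def resolve(conflict_set):
    """Tie-break (assumed): select the first rule of the conflict set."""
    return conflict_set[0] if conflict_set else None

# Figure 5 rules as (premises, conclusion) pairs.
FIG5_RULES = [
    (("A", "B"), "F"),
    (("C", "D"), "G"),
    (("E",), "H"),
    (("B", "G"), "J"),
    (("F", "H"), "X"),
    (("G", "E"), "K"),
    (("J", "K"), "X"),
]

def prove(goal, known, rules, pending=frozenset()):
    """Backward chaining: match the goal on right-hand sides, recurse on premises."""
    if goal in known:
        return True
    if goal in pending:  # already being attempted higher up: avoid looping
        return False
    for premises, conclusion in rules:
        if conclusion == goal and all(
            prove(p, known, rules, pending | {goal}) for p in premises
        ):
            known.add(goal)  # remember derived facts so they are not re-derived
            return True
    return False

# One recognize-act cycle on the Figure 4 database.
conflict_set = recognize({"C5", "C1", "C3"}, FIG4_RULES)
selected = resolve(conflict_set)

# The question posed in Figure 5.
facts = {"B", "C", "D", "E"}
x_is_true = prove("X", facts, FIG5_RULES)
print(selected, x_is_true, sorted(facts))
```

Storing premises as tuples keeps the order of sub-goals fixed, so the attempt through Rule 5 is abandoned as soon as F cannot be established and H is never examined.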
Suppose that in a particular case we discover by observation that "facts" B, C, D and E are "true," and we wish to discover if X is therefore true. The program will consider those rules which could be used to infer the truth of X, i.e. those rules (5, 7) which have an X on the right-hand side of the arrow. Each such rule is tested to see if each of the facts on the left-hand side is known to be true, any unknown fact being treated in the same way as the original fact X; i.e. we proceed by recursion. Thus:

X may be deduced from Rule 5 or Rule 7
1) Starting with Rule 5, are F & H true?
2) F can be shown to be true if A & B are both true (Rule 1)
3) A is not known to be true, and no rule concludes A, so this attempt fails
4) Continuing with Rule 7, are J & K true?
5) J can be shown to be true if B & G are both true (Rule 4)
6) B is known to be true "a priori"
7) G can be shown to be true if C & D are both true (Rule 2)
8) C is known to be true "a priori"
9) D is known to be true "a priori"
10) therefore G is true
11) therefore J is true
12) K can be shown to be true if G & E are both true (Rule 6)
13) G is already known to be true (step 10)
14) E is known to be true "a priori"
15) therefore K is true
16) therefore X is true.

The above simple deductive technique is the basis of MYCIN's reasoning. The technique is powerful and efficient while at the same time very general and easily comprehended.

"Learning" Expert Systems

The rule-based structure of expert systems facilitates acquisition by the system of new rules and modification of existing rules, not only by tutorial interaction with a domain specialist but also by autonomous "learning." An early example was a self-taught pole-balancer developed by Michie and Chambers [26] on the basis of 225 condition-action rules. De Dombal's diagnosis program [14] acquires its medical expertise by statistical induction over patient records with confirmed diagnoses.
Michalski's AQ11 program [22, 24] acquires diagnostic expertise by logical induction, which, under different formalisms, is also the basis of the successes scored for machine learning in chemistry by Meta-DENDRAL [9] and in robot vision by the Edinburgh FREDDY system [2]. Finally Quinlan's latest version, ID3 [30], of Hunt's CLS algorithm [19] recently synthesized inductively, in a few seconds of machine time, solutions to classification problems which had proved intractable as tasks of hand-synthesis and coding. A connection thus appears between machine learning and automatic programming. This connection gains interest from the fact that recent runs of ID3 [36] have synthesized programs (in the form of decision trees) which perform classification tasks more than five times faster than the best hand-coded program. Various "learning" expert systems are listed in Figure 6.

The system for soybean diagnosis [12] shown in the figure starts with primitive descriptors from the expert pathologist and from these, and from a training set of values for diseased plants with confirmed diagnoses, synthesizes a set of diagnostic rules. The unexpected discovery was made that a machine-synthesized set of rules greatly out-performed those developed by the plant pathologist, Dr. Jacobsen, who acted as domain-specialist and was the source of the original set of primitive descriptors. Jacobsen then attempted to improve his rules, and partially succeeded as shown in the bottom line of Figure 7. Feeling that further improvement would be hard, he discontinued the attempt and adopted instead the machine-synthesized set as the basis of his subsequent professional work.

[Figure 6. Listing of "learning" expert systems; not legible in this copy.]

One way of summarizing the relation between inductive "concept learning" and automatic program synthesis is diagrammed in Figure 8.
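To make the idea of synthesizing a classification program from confirmed cases concrete, here is a toy decision-tree inducer in Python. It is only a sketch in the spirit of the entropy-guided attribute selection associated with ID3, not Quinlan's program, and it bears no relation to the actual soybean descriptors: the attribute names, class labels and training rows below are all invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, attributes, target):
    """Choose the attribute whose split leaves the least expected entropy."""
    def remainder(attr):
        groups = {}
        for row in rows:
            groups.setdefault(row[attr], []).append(row[target])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return min(attributes, key=remainder)

def induce_tree(rows, attributes, target):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attributes, target)
    rest = [a for a in attributes if a != attr]
    branches = {}
    for value in sorted({row[attr] for row in rows}):
        subset = [row for row in rows if row[attr] == value]
        branches[value] = induce_tree(subset, rest, target)
    return (attr, branches)

def classify(tree, case):
    """Run the induced tree as a classification program on one case."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[case[attr]]
    return tree

# Hypothetical training set: descriptor values with confirmed diagnoses.
TRAINING = [
    {"leaf_spots": "yes", "stem_canker": "no",  "season": "early", "diagnosis": "disease_a"},
    {"leaf_spots": "yes", "stem_canker": "no",  "season": "late",  "diagnosis": "disease_a"},
    {"leaf_spots": "no",  "stem_canker": "yes", "season": "early", "diagnosis": "disease_b"},
    {"leaf_spots": "no",  "stem_canker": "yes", "season": "late",  "diagnosis": "disease_b"},
    {"leaf_spots": "no",  "stem_canker": "no",  "season": "early", "diagnosis": "healthy"},
    {"leaf_spots": "no",  "stem_canker": "no",  "season": "late",  "diagnosis": "healthy"},
]

tree = induce_tree(TRAINING, ["season", "leaf_spots", "stem_canker"], "diagnosis")
print(tree)
```

In this invented data the `season` descriptor carries no information about the diagnosis, so the induced tree never tests it. The tree itself is the synthesized "program": classification is a short walk from root to leaf.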
An unexpected side-light on future uses for inductive learning, additional to the obvious ones, is cast by the following consideration. As memory continues to get cheaper faster than processing power, the possibility of encoding industrially useful information in the form of giant look-up tables will begin to be realized in commercial practice. In many cases the time-complexity of the function to be represented in the table makes it infeasible to initialize such tables in the obvious way. When, however, the function has a low-complexity inverse (as for example the prime factor function or the function mapping from mass spectra to molecular structures) it is possible to initialize such tables "backwards," i.e. by enumerating f's y-domain and using the inverse computation to enter the elements of the x-domain. Look-up then proceeds "forwards." Drawbacks are: (1) cluttering up memory with uninteresting and unwanted x-values; (2) conceptual opacity of the resulting table to the human domain-specialist. Inductive inference techniques will be required for combatting both (1) and (2). This theme is illustrated in Figure 9.

[Figure 7. Table of results for the soybean diagnosis rule sets; not legible in this copy.]

(1) CLIENT'S MENTAL PICTURE OF WHAT HE WANTS
    (A) Systems analyst's task
(2) INTENSIONAL DEFINITION OF RELATION TO BE COMPUTED
    (B) Programmer's task
(3) PRACTICAL DEFINITION OF RELATION (PROGRAM)
    (C) Learner's task
(4) TUTORIAL DEFINITION OF RELATION (GUIDED SELECTION OF EXAMPLES AND COUNTER-EXAMPLES)
    (D) Teacher's task
(5) EXTENSIONAL DEFINITION OF RELATION (ACTUAL OR HYPOTHETICAL DATABASE OF INSTANCES)

Figure 8. A way of looking at the synthesis of a concept (description) in the form of a program. The fundamental ideas underlying this diagram are (1) that both the teacher and the programmer have the task of conveying concepts to target devices which are then called upon to apply the acquired concepts to new data; and (2) that a symbolic definition is not the only kind of definition which could be used as a formal specification from which to build a program. A sufficient set of tutorial instances could do the same job. In the case of the teacher's task, the "target devices" are of course his or her human pupils, fortunately equipped with rather good inductive capabilities. Current research aims to equip the programmer's target devices in something like the same way.

Critical Role of Patterns

Knowledge-based computing systems seek to implement the consulting skills of human experts. They answer questions in problem domains too complex for "standard" hardware/software designs, but not so complex as to be totally intractable. Study of the cognitive strategies of experts has shown that performance in such domains, at least for human practitioners, is not based on elaborate calculations but on the mental storage and use of large incremental catalogues of pattern-based rules.
Thus chess mastership is gained through the acquisition and organization in memory of diagnostic patterns, not through increases in calculating power. In Figure 10 the upper two patterns illustrate the thematic categories of the sort found in the early pages of a chess primer [35] ("fork" and "back-rank mate" respectively). The lower two exhibit a single pattern differing by a minor perturbation which happens to be critical. In the left-hand case a familiar type of sacrificial attack on the King can be launched with impunity. In the right-hand position it can be spiked at the last minute by the move B-Q6 by Black, guarding against White's intended Q x RP. The role of remembered patterns is thus to propose a tactical idea. Detailed check-out by concrete analysis is still required.

In Figure 11 some representative pattern-based skills are listed, for four of which "expert systems" have been implemented. The approximate number of patterns required for successful machine implementation is thus in these four cases known. The last line contains estimates for a highly sophisticated domain of human expertise where no comparable machine expertise yet exists. Figure 12 shows the approximate numbers of patterns required for a few of the fragments of the total chess domain for which machine mastery has been achieved. These small sub-domains could of course be (and have been) solved by brute-force enumeration. But this approach yields representations which cannot support the intelligent query and explanation facilities demanded by the user of knowledge-based systems.

    Processing/dollar grows by >10 per decade
    Memory/dollar grows by >10 per decade
    Efficiency of given problem-representation depends on RELATIVE COST: processing / memory
    So what are we planning to do about it?
    ONE WAY - choose functions with easy inverses and build very large databases (say, trillion-bit). Possible e.g. for mass spec. in organic chemistry.
    RESULT - requirement generated for knowledge-engineering skills
        (1) for filtering entries to the table
        (2) for inductively compacting.

Figure 9. A new situation precipitated by continuing hardware trends.

[Figure 10. Some "patterns" in chess. Board diagrams not reproduced in this copy.]

SKILL: Seeing a scene
    NATURE OF IMPLEMENTATION: Incremental catalogue of visual patterns. Simple scenes of shadowed polyhedra, Waltz, early 1970s.
    No. of pattern-based rules in implemented system: 10
SKILL: Balancing a pole
    NATURE OF IMPLEMENTATION: Incremental catalogue of pattern-based rules. Michie & Chambers, mid-1960s.
    No. of pattern-based rules in implemented system: 225
SKILL: Identifying organic compounds from mass spectra
    NATURE OF IMPLEMENTATION: Incremental catalogue of pattern-based rules. 'Dendral' program of Lederberg, Feigenbaum and Buchanan.
    No. of pattern-based rules in implemented system: c. 400
SKILL: Identifying bacteria from lab tests on blood and urine
    NATURE OF IMPLEMENTATION: Incremental catalogue of pattern-based rules. 'Mycin' program of Buchanan & Shortliffe.
    No. of pattern-based rules in implemented system: c. 400
SKILL: Calculating-prodigy arithmetic
    NATURE OF IMPLEMENTATION: Alexander Aitken, studied by Hunter, 1962 [20], used pattern-based rules.
    No. of pattern-based rules in implemented system: ?
SKILL: Grandmaster chess
    NATURE OF IMPLEMENTATION: Chess-masters, studied by Binet, de Groot, Chase & Simon, Nievergelt [11, 28], use pattern-based rules.
    No. of pattern-based rules in implemented system: 30,000-50,000

Figure 11. Some pattern-based skills (condensed from Michie, 1976 [27]).

    Ending                                       Approximate size of     Number of patterns required
                                                 the problem space       for an expert system
    King and Rook against King [6]               40,000                  10
    King and Pawn against King [5]               100,000                 20
    King and Knight against King and Rook [7]    2,000,000               30

Figure 12. Pattern-requirement of three small sub-domains of chess grows slowly relative to increase of domain complexity.

Patterns in Computer Vision

In chess and other deterministic combinatorial domains (such as industrial routeing and scheduling in various OR contexts) the power of patterns is revealed in the extraction of sense from otherwise intractable explosions of combinatorial complexity.
Figure 12 gives a hint of how well-chosen pattern-sets can serve this function, and shows that a relatively slow growth of the pattern-catalogue can maintain control over a wildly growing problem space. In some other perceptual domains, notably vision, combinatorial complexity is compounded by the presence of sensory noise, thus putting an even higher premium on the stored pattern-base. Even without low-level noise, perturbations can be severe. If Figure 13 were viewed upside-down or on its side when first encountered, there is little chance that the human eye and brain would "see" the Dalmatian dog drinking from a puddle in a stone-strewn landscape [16]. The feat whereby sense is extracted from noise rests on the fact, first emphasized by Helmholtz in the 19th century, that visual perception is an act of reconstruction of the percept from a large repertoire of stored internal models. The rate of input of visual information to the higher centers of the brain is not great enough to do more than give hints and prompts for the reconstructive process. We catch the mechanism in the act whenever we "see" in randomly blotched surfaces pictures which are not "really" there: "similitudes of all sorts of landscapes and figures in all sorts of actions," as Leonardo da Vinci remarked.

Figure 13. A "noisy" visual scene, interpretable by the human eye and brain with the aid of a large stored set of patterns (from Gregory, 1970).

Removal of noise is executed in two main phases: a pre-processing phase in which knowledge of an essentially statistical nature is applied for smoothing and cleaning up the raw picture, and a second phase, which the figure has been selected to illustrate, in which semantic knowledge comes into play. For handling knowledge of the second kind a commonly used representational form is the semantic net, of which Figure 14 shows an early example.
In the context of machine expertise in vision, Helmholtz's "internal models" popularized by Gregory thus receive specific and concrete realization.

There is a practical bearing of computer vision on expert systems work, owing to the need from time to time to resort to diagrammatic explanations and other "picture talk" in the course of man-machine consultations. If a medical program is advising on a case of acute abdominal pain it would be advantageous to be able to input the standard diagram of the abdomen from the patient's notes, filled in graphically to indicate regions of tenderness, rigidity, etc., rather than to have to construct symbolic circumlocutions. Beyond a certain level of complexity, e.g. in computerized fault-diagnosis in the production machinery of an oil platform, the task of circumlocution can become intractable.

Past work towards supplying such needs has until recently been retarded by lack of highly parallel hardware designs. Computer vision can certainly use these. Working with a FORTRAN emulator of Duff's CLIP-3 parallel array processor, Armstrong and Jelinek (1977) developed a command language for vision in which they were able to specify solutions to the normal range of low-level vision tasks: removing noise, finding and measuring blobs, following lines, detecting vertices and so on. Although emulator-overhead slowed their algorithms down by factors ranging from a thousandfold to ten-thousandfold, they were still beating standard sequential algorithms in real time. The reason, it turned out, was that without a language in which to express parallelism it is not easy to acquire the mental set for seeing simple fast ways of doing these things. In the next generation of knowledge-based systems, incorporation of versatile and adaptive array processors for vision and other perceptual tasks will be a necessity, a point of confluence with the closely related field of robotics.

    ["region," area, compactness]
    ["holeset," number of holes]
    ["hole," area, compactness]
    ["outline," no. of segments, no. of internal corners, no. of external corners, no. of straight lines]
    ["segment," angle, curvature, length]

Figure 14. The diagram depicts the descriptive structures used in the Edinburgh robot project to encode diagnostic information about solid objects viewed by the computer-controlled robot through a TV camera. The slots in these structures could only be filled after a variety of pre-processing routines had acted to eliminate noise in the picture, identify optically homogeneous regions, to find, trace and segment boundaries, and to perform various measurements on the primitive features thus isolated.

Mass Production of Inscrutable Patterns

The knowledge engineer's building blocks are thus patterns (descriptive of key concepts underlying the given consultant skill). A state-of-the-art system requires many hundreds of such descriptive patterns to be programmed, and the current cycle of development envisages thousands or even tens of thousands for complex task environments. The work of coding even one pattern can consume many programmer weeks, so that the total task appears prohibitive. Accordingly, the knowledge engineer of the 1980s will not construct his own building blocks, but will have recourse to automated systems of pattern synthesis. Such systems already exist. They must be equipped with stocks of primitive descriptors appropriate to given domains. Pattern-synthesis is then induced by supplying tutorial specifications in the form of examples and counter-examples.

Methods have recently been developed which can inductively synthesize patterns from examples for a small fraction of the cost of programming them by hand. When run on the machine in the form of classification programs, machine-made patterns typically out-perform man-made ones both in accuracy and execution cost.
But these machine-efficient patterns turn out to be conspicuously different from those developed by experts [25], and in general to be somewhat inscrutable to humans [36]. A methodology is therefore needed for humanizing the man-machine channel.

A small number of salient phenomena can now be combined to yield some rather strange conclusions about the systems towards which we are moving.

Six Facts of Tomorrow's World

1. Knowledge-based systems are memory-intensive rather than processor-intensive. They will soon comprise thousands of stored rules per system.
2. Costs of memory relative to processor costs will continue to decrease.
3. The way to use a large memory as the basis of knowledgeable behavior is to fill it with patterns descriptive of the key concepts of the given knowledge-domain. From these the rule-bases are built.
4. It is becoming possible to mass-produce such patterns by machine more cheaply than by programming.
5. The resulting patterns are highly efficient at run-time but their form tends to inscrutability for the domain specialist.
6. A preliminary look has indicated that in some cases there may be transformations capable of rendering machine-optimized patterns into more humanly transparent forms.

Architecture for Knowledge Engineering in the 1980s

Putting all of the above together we can identify major components of future systems. The resulting automated mining-and-refining plant for human knowledge presents some bizarre, even awesome, features. To maintain a secure grip on credulity I will expound it within the narrow framework of a selected practical application, chosen for its attractive mix of combinatorial complexity, susceptibility to knowledge-based approaches, and commercial potential. I am speaking of the identification of organic compounds by industrial chemists.

The skilled chemist performs the knowledge-based computation shown in Figure 15. This computation is extremely hard to simulate by program.
A decade of work at Stanford University by the DENDRAL project has resulted in a system proficient at identifying straight-chain aliphatic compounds and members of certain classes of oestrogenic steroids. Such expertise is too narrow to be of serious interest to chemists. Some other approach is needed.

Figure 15. The experienced chemical consultant is able to compute the molecular structure of an unknown compound by applying his physico-chemical knowledge, and heuristic rules of thumb, to an assemblage of measurements performed on the unknown compound. [Diagram labels: empirical formula; mass spec. data; NMR data; infra-red data; other data; knowledge of the stability of chemical bonds as a function of their local intra-molecular environments; inferred molecular structure.]

As a starting point take one giant lookup memory, shall we say (conservatively) 10^12 bits, directly addressable. We wish to use it as a dictionary of mass spectrogram patterns, a likely molecular structure being entered against each pattern. Such dictionaries exist in industrial use but are constructed by hand, and do not exceed 100,000 entries.

If the computation is so hard, how can we compute the entries for the dictionary in the first place? We ask: "What about the inverse computation?" Programs certainly do exist for predicting, in reasonable time, the mass spec. pattern from the molecular structure (structure → pattern). If we could generate exhaustively and irredundantly the complete set of molecular structures in the given class, then by computing for each structure its proper predicted pattern we could construct (somewhat back-handedly) the desired dictionary. A suitable structure-generating program exists in the form of Raymond Carhart's "CONGEN" [37].

Such an immensely powerful question-answering resource would unfortunately be limited to answering what I have termed elsewhere "questions of the first kind" (What is the value of f(x)?) without being able to tell the chemist anything of the "why."
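The back-handed construction just described (enumerate every structure once, run the forward prediction, and file the structure under its predicted pattern) can be sketched in a few lines of Python. Everything in the sketch is a stand-in invented for illustration: the three "units" and their masses, the linear-chain structure class, and the fragmentation model are toy assumptions, and `generate_structures` and `predict_pattern` are hypothetical names, not the interfaces of CONGEN or of any real spectrum predictor.

```python
from itertools import product

# Toy building units with nominal masses (an assumption, not real chemistry).
UNIT_MASS = {"C": 12, "N": 14, "O": 16}


def generate_structures(length):
    """Enumerate every linear chain exactly once.

    A chain and its mirror image are the same structure, so each chain is
    reduced to a canonical form before being emitted (irredundancy).
    """
    seen = set()
    for units in product(sorted(UNIT_MASS), repeat=length):
        chain = "".join(units)
        canonical = min(chain, chain[::-1])
        if canonical not in seen:
            seen.add(canonical)
            yield canonical


def predict_pattern(chain):
    """Forward computation: structure -> predicted pattern.

    Toy model: the chain breaks at each bond in turn and the masses of
    both fragments are recorded as peaks.
    """
    masses = [UNIT_MASS[u] for u in chain]
    peaks = []
    for cut in range(1, len(masses)):
        peaks.append(sum(masses[:cut]))
        peaks.append(sum(masses[cut:]))
    return tuple(sorted(peaks))


def build_dictionary(length):
    """Inverse table: predicted pattern -> structures that produce it."""
    table = {}
    for chain in generate_structures(length):
        table.setdefault(predict_pattern(chain), []).append(chain)
    return table


DICTIONARY = build_dictionary(3)

# A "question of the first kind": which structure gives this pattern?
ANSWER = DICTIONARY[(12, 16, 26, 30)]
```

Each dictionary value is a list because, in general, nothing guarantees that distinct structures yield distinct patterns. The lookup returns an answer but no account of why that structure fits the pattern.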
This is the point at which the AI specialist has to come back into the picture to deploy his inductive inference machinery (software tools such as Michalski's INDUCE and Quinlan's ID3) to compress parts of the dictionary into pattern-rule form. Where possible he must also humanize for intelligibility. We end up, then, with a scenario like that of Figure 16.

The weakest link in the diagram, because the problem has only recently been identified, let alone solved, is the "humanization loop." Initial study suggests methods applicable in some cases, e.g. for converting large homogeneous decision trees into hierarchically structured collections of human-type rules. In other cases conversion of representation may be thwarted by complexity considerations. Choice may then have to be exercised between sacrificing the superior efficiency of the machine-made algorithm, and equipping it with a simplified "cover story" for human use.

The possibility of enabling knowledge-based systems to handle "stories" in this sense, for purposes both of input and of output, deserves study and overlaps work such as that of Schank [31] and Charniak [10] on story-understanding programs. We must not forget that before the introduction of writing, and to some extent even to the present day, the chief means of encoding useful knowledge has been through stories, proverbs and other mnemonic paraphernalia of folk science. Machines too may have to be taught to handle these time-honored summarizing structures.

[Figure 16. Diagram of the scenario described in the text, including the "humanization loop."]
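As a small illustration of the kind of representation change discussed above under "humanization," the following Python sketch flattens a decision tree into one if-then rule per root-to-leaf path. This is only the simplest conceivable transformation, offered as an assumption about what a first step might look like; it is not the hierarchical restructuring method alluded to in the text, whose details are not given here. The tree encoding, the toy tree and the function names are all hypothetical.

```python
def tree_to_rules(tree, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path.

    A tree is either a leaf label or a tuple (descriptor, {value: subtree}).
    """
    if not isinstance(tree, tuple):
        yield list(conditions), tree
        return
    descriptor, branches = tree
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + ((descriptor, value),))


def render(conditions, label):
    """Write one path as a rule a domain specialist could read."""
    if not conditions:
        return f"ALWAYS conclude {label}"
    tests = " AND ".join(f"{d} = {v}" for d, v in conditions)
    return f"IF {tests} THEN conclude {label}"


# Hypothetical machine-made tree over toy descriptors.
TOY_TREE = ("shape", {
    "round": ("size", {"small": "accept", "large": "reject"}),
    "square": "reject",
})

RULES = [render(conds, label) for conds, label in tree_to_rules(TOY_TREE)]
for rule in RULES:
    print(rule)
```

For a large tree this produces as many rules as there are leaves, which is why flattening alone does not deliver intelligibility. Some further grouping of rules under named intermediate concepts would still be needed.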
REFERENCES

1. Ambler, A. P., Barrow, H. G., Brown, C. M., Burstall, R. M., and Popplestone, R. J. (1975). A versatile system for computer-controlled assembly. Artificial Intelligence, 6, 129-156.
2. Armstrong, J. L. and Jelinek, J. (1977). CHARD User's manual (a FORTRAN CLIP emulator). Research Memorandum MIP-R-115. Edinburgh: Machine Intelligence Research Unit, University of Edinburgh.
3. Bennett, J., Creary, L., Engelmore, R. S. and Melosh, R. (1978). SACON: a knowledge-based consultant for structural analysis. Memo HPP-78-28, also Report No. STAN-CS-78-699. Stanford: Computer Science Department, Stanford University.
4. Binet, A. (1894). Psychologie des Grands Calculateurs et Joueurs d'Echecs. Paris: Hachette.
5. Bramer, M. (1980). An optimal pattern-based algorithm for King and Pawn against King. In Advances in Computer Chess 2 (ed. M. R. B. Clarke). Edinburgh: Edinburgh University Press.
6. Bratko, I. (1978). Proving correctness of strategies in the AL1 assertional language. Information Processing Letters, 7, 223-230.
7. Bratko, I. and Michie, D. (1980). A representation for pattern-knowledge in chess end-games. In Advances in Computer Chess 2 (ed. M. R. B. Clarke). Edinburgh: Edinburgh University Press (in press).
8. Buchanan, B. G. (1979). Issues of representation in conveying the scope and limitations of intelligent assistant programs. In Machine Intelligence 9 (eds. J. E. Hayes, D. Michie and L. I. Mikulich). Chichester: Ellis Horwood, and New York: Halsted Press (John Wiley).
9. Buchanan, B. G., Smith, D. H., White, W. C., Gritter, R., Feigenbaum, E. A., Lederberg, J., and Djerassi, C. (1976). Applications of Artificial Intelligence for chemical inference. XXII. Automatic rule formation in mass spectrometry by means of the Meta-DENDRAL program. J. Amer. Chem. Soc., 98, 6168-6178.
10. Charniak, E. (1977). Inference and knowledge in language comprehension. In Machine Intelligence 8, pp. 541-574 (eds. E. W. Elcock and D. Michie). Chichester: Ellis Horwood; and New York: Halsted Press (John Wiley).
11. Chase, W. G. and Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55-81.
12. Chilausky, R., Jacobsen, B. and Michalski, R. S. (1976). An application of variable-valued logic to inductive learning of plant disease diagnostic rules. Proc. 6th Ann. Internat. Symp. on Multiple-Valued Logic, Utah.
13. Davis, R. and King, J. (1977). An overview of production systems. In Machine Intelligence 8, pp. 300-332 (eds. E. W. Elcock and D. Michie). Chichester: Ellis Horwood; and New York: John Wiley.
14. de Dombal, F. T., Leaper, D. J., Staniland, J. R. et al. (1972). Computer-aided diagnosis of acute abdominal pain. Brit. Med. J., 2, 9-13.
15. Fagan, L. M. (1978). Ventilator management: a program to provide on-line consultative advice in the intensive care unit. Memo HPP-78-16. Stanford: Computer Science Department, Stanford University.
16. Gregory, R. L. (1970). The Intelligent Eye. London: Duckworth.
17. Groot, A. de (1965). Thought and Choice in Chess (ed. G. Baylor) (translation, with additions, of Dutch version of 1946). The Hague and Paris: Mouton.
18. Hart, P. E., Duda, R. O. and Einaudi, M. T. (1978). A computer-based consultation system for mineral exploration. Unpublished report, obtainable from authors at Menlo Park, Ca: SRI International.
19. Hunt, E. B., Marin, J. and Stone, P. (1966). Experiments in Induction. New York: Academic Press.
20. Hunter, I. M. L. (1962). An exceptional talent for calculative thinking. Brit. J. Psychol., 53, 243-258.
21. Kunz, J. (1978). A physiological rule-based system for interpreting pulmonary function test results. Memo HPP-78-19. Stanford: Department of Computer Science, Stanford University.
22. Larson, J. and Michalski, R. S. (1977). Inductive inference of VL decision rules. SIGART Newsletter, 63, 38-44.
23. Martin, N., Friedland, P., King, J. and Stefik, M. J. (1977). Knowledge-base management for experiment planning in molecular genetics. Memo HPP-77-19. Stanford: Computer Science Department, Stanford University. Also in Proc. 5th Inter. Joint Conf. on Artif. Intell., IJCAI-77, Pittsburgh: Computer Science Department, Carnegie Mellon University.
24. Michalski, R. S. (1978). Pattern Recognition as knowledge-guided induction. Report 927. Urbana: Department of Computer Science, University of Illinois.
25. Michalski, R. S. and Chilausky, R. (1979). Knowledge acquisition by encoding expert rules versus computer induction from examples: a case study involving soybean pathology. Accepted for publication in International Journal for Man-Machine Studies.
26. Michie, D. and Chambers, R. A. (1968). BOXES: an experiment in adaptive control. In Machine Intelligence 2, pp. 137-152 (eds. E. Dale and D. Michie). Edinburgh: Edinburgh University Press.
27. Michie, D. (1979). Machine models of perceptual and intellectual skills. In Scientific Models and Man: The Herbert Spencer Lectures 1976, pp. 56-79 (ed. H. Harris). Oxford: Oxford University Press.
28. Nievergelt, J. (1977). The information context of a chess position, and its implications for the chess-specific knowledge of chess players. SIGART Newsletter, 62, 13-15.
29. Pople, H. E., Myers, J. D. and Miller, R. A. (1977). DIALOG: a model of diagnostic logic for internal medicine. Proc. 5th Inter. Joint Conf. on Artif. Intell., IJCAI-77, Pittsburgh: Computer Science Department, Carnegie Mellon University. (This program is now called INTERNIST.)
30. Quinlan, J. R. (1979). Discovering rules by induction from large collections of examples. In Expert Systems in the Microelectronic Age (ed. D. Michie). Edinburgh: Edinburgh University Press.
31. Schank, R. C. (1977). Representation and understanding of text. In Machine Intelligence 8, pp. 575-619 (eds. E. W. Elcock and D. Michie). Chichester: Ellis Horwood; and New York: Halsted Press (John Wiley).
32. Shortliffe, E. H. (1976). Computer-Based Medical Consultations: MYCIN. New York: Elsevier/North Holland.
33. Waltz, D. L. (1972). Generating semantic descriptions from drawings of scenes with shadows. MAC AI-TR-271, MIT, Cambridge, Mass.
34. Wipke, W. T. (1974). Computer-assisted 3-dimensional synthetic analysis. In Computer Representation and Manipulation of Chemical Information, pp. 147-174 (eds. W. T. Wipke, S. R. Heller, R. J. Feldmann and E. Hyde). London and New York: Wiley Interscience.
35. Zuidema, C. (1974). Chess, how to program the exceptions? Afdeling informatica IW21/74. Amsterdam: Mathematisch Centrum.
36. Unpublished work by J. R. Quinlan (personal communication) has attained greater than factor-of-five superiority in execution time of machine-synthesized patterns in the form of decision-trees, as compared with the best hand-coded programs.
37. Carhart, R. E. (1977). Re-programming DENDRAL. AISB Quarterly, 28, 20-22. This paper gives a brief overview of DENDRAL from a utility standpoint, and also discusses solutions to the structure-generating problem.

BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-80-1001
Title and Subtitle: Knowledge-based Systems
Report Date: January 1980
Author(s): Donald Michie
Performing Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801
Sponsoring Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801

Abstract: The "expert system" concept in chemistry, molecular genetics, geology, medicine, plant pathology, chess and other applications. Design principles: store-time trade-off, problem complexity; man-machine communication of concepts ("patterns"). Languages for knowledge-based programming: goals, patterns, production rules. Structure of MYCIN. Modularity and facilities for querying the knowledge-base. Rule-acquisition and computer induction: Michalski's INDUCE program. Patterns as building blocks. Amount of programming per pattern. Total number needed for given task domains. Costs of man-made and machine-made patterns. Reliability. Intelligibility. Need to "humanize." Novel future designs.

Key Words and Document Analysis. Descriptors: computer induction; expert systems; knowledge acquisition; knowledge representation; machine learning; pattern-directed inference; pattern synthesis; machine perception; production rules.

Security Class (This Report): UNCLASSIFIED
Security Class (This Page): UNCLASSIFIED