The Organization of Expert Systems: A Tutorial*

Mark Stefik, Jan Aikins, Robert Balzer, John Benoit, Lawrence Birnbaum, Frederick Hayes-Roth, Earl Sacerdoti**

Xerox Palo Alto Research Center, Palo Alto, CA 94304, U.S.A.

* There is currently much interest and activity in expert systems both for research and applications. A forthcoming book edited by Hayes-Roth, Waterman, and Lenat [21] provides a broad introduction to the creation and validation of expert systems for a general computer science audience. An extended version of this tutorial, which introduces concepts and vocabulary for an audience without an AI background, will appear as a chapter in the book.

** Additional affiliations: J. Aikins, Hewlett-Packard, Palo Alto, CA; R. Balzer, USC/Information Sciences Institute, Marina del Rey, CA; J. Benoit, The MITRE Corporation, McLean, VA; L. Birnbaum, Yale University, New Haven, CT; F. Hayes-Roth, Teknowledge, Palo Alto, CA; E. Sacerdoti, Machine Intelligence Corp., Palo Alto, CA.

Artificial Intelligence 18 (1982) 135-173. © 1982 North-Holland.

Recommended by Daniel G. Bobrow

ABSTRACT

This is a tutorial about the organization of expert problem-solving programs. We begin with a restricted class of problems that admits a very simple organization. To make this organization feasible it is required that the input data be static and reliable and that the solution space be small enough to search exhaustively. These assumptions are then relaxed, one at a time, in a case study of ten more sophisticated organizational prescriptions. The first cases give techniques for dealing with unreliable data and time-varying data. Other cases show techniques for creating and reasoning with abstract solution spaces and using multiple lines of reasoning. The prescriptions are compared for their coverage and illustrated by examples from recent expert systems.

1. Introduction

Twenty years ago, Newell [29] surveyed several organizational alternatives for problem solvers. He was concerned with how one should go about designing problem-solving systems. Many techniques have been developed in artificial intelligence (henceforth AI) research since then and many examples of expert systems have been built. Expert systems are problem-solving programs that solve substantial problems generally conceded as being difficult and requiring expertise. They are called knowledge-based because their performance depends critically on the use of facts and heuristics used by experts. Expert systems have been used as a vehicle for AI research under the rationale that they provide a forcing function for research in problem solving and a reality test.

Recently some textbooks have appeared that organize principles of AI (e.g., [30]) and give examples of advanced programming techniques (e.g., [6]). However, there is no guidebook for an expert systems designer to the issues and choices in designing a system. Furthermore, an unguided sampling of expert systems from the literature can be quite confusing. Examples are scattered in various journals, conference proceedings, and technical reports. Systems with seemingly similar tasks sometimes have radically different organizations, and seemingly different tasks are sometimes performed with only minor variations on a single organization. The variations reflect the immaturity of the field, in which most of the systems are experimental.
From the diversity of experiments one would like to extract alternatives and principles to guide a designer.

This tutorial is organized as follows: Section 2 is a catalog of some generic expert tasks. For each task there is a checklist of requirements and key problems associated with expert performance of the task. The purpose of this section is to extract a set of architecturally relevant issues that cover a variety of problem-solving tasks. Section 3 presents the substance of the organizational ideas. It addresses the issues from Section 2 and makes prescriptions.

2. A Characterization of Expert Tasks

In this section we will consider several generic tasks that experts perform. Examining these tasks will help us to understand what makes expert reasoning difficult. The difficulties provide a guide to architectural relevance; they enable us to focus on issues that relate to critical steps in reasoning.

Interpretation

Interpretation is the analysis of data to determine their meaning.

Example. Interpretation of mass spectrometer data [2]. In this case, the data are measurements of the masses of molecular fragments and interpretation means the determination of one or more chemical structures.

Requirements. Find consistent and correct interpretations of the data. It is often important that analysis systems be rigorously complete, that is, that they consider the possible interpretations systematically and discard candidates only when there is enough evidence to rule them out.

Key problems. Data are often noisy and errorful, that is, data values may be missing, erroneous, or extraneous.
(1) This means that interpreters must cope with partial information.
(2) For any given problem, the data may seem contradictory. The interpreter must be able to hypothesize which data are believable.
(3) When the data are unreliable, the interpretation will also be unreliable. For credibility it is important to identify where information is uncertain or incomplete and where assumptions have been made.
(4) Reasoning chains can be long and complicated. It is helpful to be able to explain how the interpretation is supported by the evidence.

Diagnosis

Diagnosis is the process of fault-finding in a system (or determination of a disease state in a living system) based on interpretation of potentially noisy data.

Example. Diagnosis of infectious diseases [34].

Requirements. Requirements include those of interpretation. A diagnostician must understand the system organization (i.e., its anatomy) and the relationships and interactions between subsystems.

Key problems.
(1) Faults can sometimes be masked by the symptoms of other faults. Some diagnostic systems ignore this problem by making a single-fault assumption.
(2) Faults can be intermittent. A diagnostician sometimes has to stress a system in order to reveal faults.
(3) Diagnostic equipment can itself fail. A diagnostician has to do his best with faulty sensors.
(4) Some data about a system are inaccessible, expensive, or dangerous to retrieve. A diagnostician must decide which measurements to take.
(5) The anatomy of natural systems such as the human body is not fully understood. A diagnostician may need to combine several (somewhat inconsistent) partial models.

Monitoring

Monitoring means to continuously interpret signals and to set off alarms when intervention is required.

Example. Monitoring a patient using a mechanical breathing device after surgery [17].

Requirements.
A monitoring system is a partial diagnostic system with the requirement that the recognition of alarm conditions be carried out in real time. For credibility, it must avoid false alarms.

Key problems. What constitutes an alarm condition is often context-dependent. To account for this, monitoring systems have to vary signal expectations with time and situation.

Prediction

Prediction means to forecast the course of the future from a model of the past and present.

Example. Predicting the effects of a change in economic policy. (Some planning programs have a predictive component. There is currently an opportunity to develop expert prediction programs in a variety of areas.)

Requirements. Prediction requires reasoning about time. Predictors must be able to refer to things that change over time and to events that are ordered in time. They must have adequate models of the ways that various actions change the state of the modeled environment over time.

Key problems.
(1) Prediction requires the integration of incomplete information. When information is complete, prediction is not an AI problem (e.g., where will Jupiter be two years from next Thursday).
(2) Predictions should account for multiple possible futures (hypothetical reasoning), and should indicate sensitivity to variations in the input data.
(3) Predictors must be able to make use of diverse data, since indicators of the future can be found in many places.
(4) The predictive theory may need to be contingent; the likelihood of distant futures may depend on nearer but unpredictable events.

Planning

A plan is a program of actions that can be carried out to achieve goals. Planning means to create plans.

Example. Experiment planning in molecular genetics [36].

Requirements. A planner must construct a plan that achieves goals without consuming excessive resources or violating constraints. If goals conflict, a planner establishes priorities. If planning requirements or decision data are not fully known or change with time, then a planner must be flexible and opportunistic. Since planning always involves a certain amount of prediction, it has the requirements of that task as well.

Key problems.
(1) Planning problems are sufficiently large and complicated that a planner does not immediately understand all of the consequences of his actions. This means that the planner must be able to act tentatively, so as to explore possible plans.
(2) If the details are overwhelming, he must be able to focus on the most important considerations.
(3) In large complex problems, there often are interactions between plans for different subgoals. A planner must attend to these relationships and cope with goal interactions.
(4) Often the planning context is only approximately known, so that a planner must operate in the face of uncertainty. This requires preparing for contingencies.
(5) If the plan is to be carried out by multiple actors, coordination (e.g., choreography) is required.

Design

Design is the making of specifications to create objects that satisfy particular requirements.

Example. Designing a digital circuit. (This is an area of increased interest and activity in expert systems.)

Requirements. Design has many of the same requirements as planning.

Key problems.
(1) In large problems, a designer cannot immediately assess the consequences of design decisions. He must be able to explore design possibilities tentatively.
(2) Constraints on a design come from many sources. Usually there is no comprehensive theory that integrates constraints with design choices.
(3) In very large systems, a designer must cope with the system complexity by factoring the design into subproblems. He must also cope with interactions between the subproblems, since they are seldom independent.
(4) When a design is large, it is easy to forget the reasons for some design decisions and hard to assess the impact of a change to part of a design. This suggests that a design system should record justifications for design decisions and be able to use these justifications to explain decisions later. This is especially apparent when subsystems are designed by different designers.
(5) When designs are being modified, it is important to be able to reconsider the design possibilities. During redesign, designers need to be able to see the 'big picture' in order to escape from points in the design space that are only locally optimal.
(6) Many design problems require reasoning about spatial relationships. Reasoning about distance, shapes, and contours demands considerable computational resources. We do not yet have good ways to reason approximately or qualitatively about shape and spatial relationships.

Several issues appear repeatedly across this catalog of expert tasks:

Large solution spaces. In interpretation problems like the mass spectrometry example [2], some problems require millions of possible chemical structures to be considered. In planning and design tasks, the number of reasonable solutions is usually a very small fraction of a very large number of possible solutions. In each of these tasks, the size and characterization of the solution space is an important organizational parameter.

Tentative reasoning. Many diagnostic procedures profitably employ assumptions about the number of faults or about the reliability of sensors. Part way through diagnosis, it may be discovered that these assumptions are unwarranted. This places a premium on the ability to undo the effects of the assumptions. Similarly, in design and planning tasks it is often appropriate because of scale to employ simplifying assumptions (e.g., abstractions). In any given design, some of the assumptions will fail, so there is an incentive to employ methods that facilitate the reworking of assumptions and trade-offs during iterations of the design process.

Time-varying data. Patient monitoring and diagnosis tasks are concerned with situations that evolve over time—as diseases follow their natural course or as treatments are administered.

Noisy data. Sensors often yield noisy data. This is a factor for any task involving reasoning from measurements, such as interpretation, diagnostic, and monitoring tasks.

The next section considers organizational prescriptions that deal with each of these issues.

3. Knowledge Engineering Prescriptions

Feigenbaum [18] defines the activity of knowledge engineering as follows. "The knowledge engineer practices the art of bringing the principles and tools of AI research to bear on difficult applications problems requiring experts' knowledge for their solution. The technical issues of acquiring this knowledge, representing it, and using it appropriately to construct and explain lines-of-reasoning, are important problems in the design of knowledge-based systems. The art of constructing intelligent agents is both part of and an extension of the programming art.
It is the art of building complex computer programs that represent and reason with knowledge of the world." [18, pp. 1014-1016]

This section is intended as a prescriptive guide to building expert systems. To illustrate the strengths and limitations of organizational alternatives we will cite a number of contemporary systems. In presenting these examples we adopt a level of detail that is adequate for making the ideas clear yet avoids the particulars of the task and the programming implementation. The reader seeking a more detailed discussion of implementation is encouraged to consult a textbook on AI programming (e.g., [6]).

One of the most variable characteristics of expert systems is the way that they search for solutions. The choice of search method is affected by many characteristics of a domain, such as the size of the solution space, errors in the data, and the availability of abstractions. Inference is at the heart of a reasoning system, and failure to organize it properly can result in problem-solvers that are hopelessly inefficient, naive, or unreliable. As a consequence of this, search is one of the most studied topics in artificial intelligence.

Our pedagogical style is to start with a very restricted class of problems that admits a very simple search process. We will articulate the domain restrictions under which this organization is applicable, and thereby expose its limitations. Then we will relax the requirements on the task domain and introduce ameliorating techniques as architectural prescriptions. Fig. 1 shows a chart of the cases that we will consider.

[Fig. 1 is a tree chart pairing requirements with prescriptions: small solution space, reliable and fixed data, and reliable knowledge with exhaustive search, monotonic reasoning, and a single line of reasoning; unreliable data or knowledge with combining evidence from multiple sources (probability models, fuzzy models, exact models); time-varying data with state-triggered expectations; a big, factorable solution space with hierarchical generate and test; no evaluator for partial solutions with a fixed order of abstracted steps; no fixed sequence of subproblems with an abstract search space; interacting subproblems with constraint propagation and least commitment; efficient guessing with belief revision for plausible reasoning; a single line of reasoning too weak with multiple lines of reasoning; a single knowledge source too weak with heterogeneous models, opportunistic scheduling, and variable-width search; and a representation method too inefficient with tuned data structures, knowledge compilation, and cognitive economy.]

FIG. 1. Case 1 begins with a restricted class of problems that admits a very simple organization. These assumptions are relaxed one at a time in the other cases.

Each box in the figure corresponds to one of the cases and the numbering indicates the order in which the cases are discussed. The lines connecting the boxes organize the cases into a tree structure such that a sequence of cases along a branch corresponds to increasingly elaborate considerations of a basic idea. The first three branches consider the complications of unreliable data or knowledge, time-varying data, and a large search space. Any given problem may require combining ideas from any of these topics. The problem of a large search space is then considered along three major branches. The first branch (cases 5 through 8) considers organizations for abstracting a search space. The second branch focuses on methods for incomplete search.
The third branch considers only ways to make the knowledge base itself more efficient. This breakdown is mainly pedagogical. Real systems may combine these ideas.

3.1. Case 1—Small search space with reliable knowledge and data

Systems for complex tasks are generally more complicated than systems for simple tasks. In this section we will consider a very simple architecture which has been used for relatively simple applications. We begin by listing the requirements for task simplicity:
(1) The data and knowledge should be reliable.
(2) The data and knowledge should be static.
(3) The space of possible solutions should be small.

On the surface these requirements may seem quite mild. Indeed, there is a widely held belief among people who have not looked closely into problem solving that most problems satisfy these requirements. After all, for many problems the facts seem straightforward and there are not that many solutions to consider. Under closer examination, however, most real tasks fail to meet these requirements, including the examples of expert tasks listed in Section 2.

The first requirement is that the data and knowledge be reliable. Reliable data are not noisy or errorful. There can be no extraneous signals and no missed signals (e.g., due to sampling). In real applications few sources of data meet these requirements. In addition to data reliability, the knowledge must be reliable. Reliable knowledge is applicable without concern about consistency or correctness. Systematic application of reliable knowledge should not lead to false, approximate, or tentative conclusions.

The main advantage of reliability for both data and knowledge is the monotonicity of the system. In the simplest architecture, the memory is a monotonic data base to which conclusions are added by the reasoning system as they are inferred. No provisions need be made for retraction of facts pending new information. It is enough to develop a single line of reasoning; that is, there is no need to develop multiple arguments to support potential conclusions. If more than one inference rule is applicable at a given time, the order in which they are applied is unimportant.

The second requirement is intended to avoid the problem of reasoning with time-dependent data. This means that the system need not be concerned with invalidating facts as time passes.

The requirement that the search space should be small implies that no provisions need be made to cope with the limitations of computational resources. There need be no concern about computationally efficient data structures or for avoiding combinatorial explosions. It doesn't matter whether the search is for one solution or all possible solutions as long as the space is small. If the search is exhaustive, the maximum size of the search space depends on the time it takes to consider a single solution. A useful number to keep in mind for this maximum is ten factorial (10!). If 25 milliseconds are required to consider a solution, then 10! solutions can be considered sequentially during a full twenty-four hour day. (10! = 3,628,800 solutions; at 25 milliseconds each, that is about 91,000 seconds, or roughly 25 hours.) This is often a practical upper limit for exhaustive search. The surprise is that the ceiling is so low.

An organization for solving these problems has two main parts: a memory and an inference method. The simplest organization of the memory would be a list of inferred facts (i.e., beliefs). For many problems, the beliefs can be represented in the predicate calculus¹ such as (On Block1 Block2), (NOT (On Block2 Table-1)).

¹ In our examples we use the prefix form of the notation because of its obvious similarity to list notations as in LISP.
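To make the Case 1 organization concrete, here is a minimal Python sketch of such a memory and inference method. It is our illustration rather than code from any particular system; the block-world facts and the single inference rule are invented.

```python
# A minimal Case 1 organization: a monotonic memory of beliefs plus an
# exhaustive, order-independent inference loop. Facts are prefix-form tuples.
facts = {("On", "Block1", "Block2"), ("On", "Block2", "Table1")}

def above_rule(known):
    """If x is on (or above) y and y is on (or above) z, infer x is above z."""
    new = set()
    for (p1, x, y) in known:
        for (p2, y2, z) in known:
            if p1 in ("On", "Above") and p2 in ("On", "Above") and y == y2:
                new.add(("Above", x, z))
    return new

# Exhaustive search: because data and knowledge are assumed reliable, nothing
# is ever retracted, and the order of rule application does not matter.
while True:
    inferred = above_rule(facts)
    if inferred <= facts:        # no new conclusions: a fixed point
        break
    facts |= inferred

print(sorted(facts))
```

Note how the assumptions of Case 1 show up directly in the code: the memory only grows, there is no bookkeeping for retraction, and a single pass to a fixed point suffices.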
Some systems attempt to optimize the storage format of the data. For example, in frame systems (see [3]) the indexing of facts is organized to make the most common access paths more efficient. Data which are used together are stored together in frames.

In the following sections we explore some more sophisticated organizations that will enable us to relax these restrictions on the problems.

3.2. Case 2—Unreliable data or knowledge

Experts sometimes make judgments under pressure of time. All the data may not be available; some of the data may be suspect; some of the knowledge for interpreting the data may be unreliable. These difficulties are part of the normal state of affairs in many interpretation and diagnostic tasks. The general problem of drawing inferences from uncertain or incomplete data has invited a variety of technical approaches.

One of the earliest and simplest approaches to reasoning with uncertainty was incorporated in the MYCIN expert system [7, 34] for selecting antibiotic therapy for bacteremia. One of the requirements for MYCIN was to represent judgmental reasoning such as "A suggests B" or "C and D tend to rule out E". To this end, MYCIN introduced a model of approximate implication using numbers called certainty factors to indicate the strength of a heuristic rule. The following is an example of a rule from MYCIN's knowledge base.

If (1) the infection is primary-bacteremia and
(2) the site of the culture is one of the sterile sites and
(3) the suspected portal of entry of the organism is the gastro-intestinal tract,
then there is suggestive evidence (.7) that the identity of the organism is bacteroides.

The number '.7' in this rule indicates that the evidence is strongly indicative (0.7 out of 1) but not absolutely certain. Evidence confirming a hypothesis is collected separately from that which disconfirms it, and the 'truth' of the hypothesis at any time is the algebraic sum of the evidence. This admits the combination of evidence in favor and against the same hypothesis.

The introduction of these numbers is a departure from the exactness of predicate calculus. In MYCIN, things are not just true or false; reasoning is inexact and that inexactness is numerically characterized in the rules by an expert physician. Facts about the world are represented as 4-tuples corresponding to an atomic formula with a numeric truth value. For example, (IDENTITY ORGANISM-2 KLEBSIELLA .25) is interpreted as 'The identity of organism-2 is Klebsiella with certainty 0.25'.

In predicate calculus, the rules of inference tell us how to combine wffs and truth values. MYCIN has its own way to combine formulas. When the premise of a rule is evaluated, each predicate returns a number between 1 and -1 (-1 means 'definitely false'). MYCIN's version of AND performs a minimization of its arguments; OR performs a maximization of its arguments. This results in a numerical value between -1 and 1 for the premise of a rule. For rules whose premise values surpass an empirical threshold of 0.2, the rule's conclusion is made with a certainty that is the product of the premise value and the certainty factor of the rule. These rules of combination can be shown to have certain properties—such as insensitivity to the order in which the rules are applied. MYCIN's certainty factors are derived from probabilities but have some distinct differences (see [34]).
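The following Python sketch illustrates this style of combination. It is our simplification, not MYCIN's code: the clause certainties are invented, the 0.7 certainty factor echoes the rule above, and the bookkeeping for confirming and disconfirming evidence is reduced to a bare algebraic sum.

```python
def premise_and(*certainties):
    # MYCIN-style AND: the premise is only as certain as its weakest conjunct.
    return min(certainties)

def premise_or(*certainties):
    # MYCIN-style OR: take the strongest disjunct.
    return max(certainties)

def apply_rule(premise_value, certainty_factor, threshold=0.2):
    """Fire the rule only if its premise surpasses the empirical threshold;
    the conclusion's certainty is the premise value times the rule's CF."""
    if premise_value > threshold:
        return premise_value * certainty_factor
    return None

def current_truth(confirming, disconfirming):
    # Evidence for and against is tallied separately; the hypothesis's
    # 'truth' at any time is the algebraic sum (a simplification).
    return sum(confirming) - sum(disconfirming)

# Three premise clauses with invented certainties, feeding a CF-0.7 rule:
premise = premise_and(0.9, 0.8, 1.0)        # 0.8
print(apply_rule(premise, 0.7))             # 0.8 * 0.7 = 0.56
print(current_truth([0.56], [0.2]))         # 0.36
```

The order-insensitivity mentioned above falls out of this arithmetic: min, max, and multiplication give the same result no matter which rule fires first.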
A reasonable question about such approaches is whether they are unnecessarily ad hoc. A commonly voiced criticism is that MYCIN introduces its own formalism for reasoning with uncertainty when there are thoroughly studied probabilistic approaches available. For example, Bayes' Rule could be used to calculate the probability (e.g., of a disease) in light of specified evidence, from the a priori probability of the disease and the conditional probabilities relating the observations to the diseases. The main difficulty with Bayes' Rule is the large amount of data that are required to determine the conditional probabilities needed in the formula. This amount of data is so unwieldy that the conditional independence of observations is often assumed. It can be argued that such independence assumptions undermine the rigorous statistical model. A middle ground which replaces observations with subjective estimates of prior probabilities has been proposed by Duda, Hart, and Nilsson [14] and analyzed for its limitations by Pednault, Zucker, and Muresan [31].

Another approach to inexact reasoning that diverges from classical logic is fuzzy logic as discussed by Zadeh [40] and others. In fuzzy logic, a statement like 'X is a large number' is interpreted as having an imprecise denotation characterized by a fuzzy set. A fuzzy set is a set of values with corresponding possibility values as follows.

Fuzzy proposition: X is a large number.
Corresponding fuzzy set: (X ∈ [0,10], .1), (X ∈ [10,1000], .2), ({X > 1000}, .7).

The interpretation of the proposition 'X is large' is that 'X might be less than 10' with possibility 0.1, or between 10 and 1000 with possibility 0.2, and so on. The fuzzy values are intended to characterize an imprecise denotation of the proposition. Fuzzy logic deals with the laws of inference for fuzzy sets. Its utility in reasoning about unreliable data depends on the appropriateness of interpreting soft data (see Zadeh [41]) as fuzzy propositions. There is little agreement among AI researchers on the utility of these modified logics for intelligent systems, or even on their advantages for reasoning with incomplete data.

The pseudo-probability and fuzzy approaches for reasoning with partial and unreliable data depart from the predicate calculus by introducing a notion of inexactness. Other approaches are possible. For example, the use of exact inference methods on unreliable data in an expert system is illustrated by the GA1 program reported by Stefik [37]. GA1 is a data interpretation system which copes with errorful data. It exploits the redundancy of experimental data in order to correct errors that it may contain.

GA1 infers DNA structures from segmentation data. GA1's task is to assemble models of complete DNA structures given data about pieces (called segments) of the structures. The segment data are produced by chemical processes which break DNA apart in predictable ways. In a typical problem, several independent breaking processes called digestions are performed and the resulting segments are measured. These digestions give independent measurements of the DNA molecule. For example, independent estimates of the molecular weight can be computed by summing the weights of the segments in any of the 'complete' digestions. (A digest is called complete if all of the molecules have been cleaved in all possible places by the enzymes.)
An example of a rule for correcting missing data is:

If a segment appears in a complete digestion for an enzyme that fails to appear in the incomplete digestion for that enzyme, it may be added to the list of segments for the incomplete digestion.

This rule is based on the observation that segments are easier to overlook in incomplete digestions than in complete digestions. This rule places more confidence in data from complete digestions than from incomplete ones. Other data correction rules incorporate more elaborate reasoning by modeling predictable instrument error such as failure to resolve measurements which are too close together. Such rules enable GA1 to look to the data for evidence of instrument failure.

In summary, several methods for reasoning with unreliable data and knowledge have been proposed. The probability-related and possibility methods use modified logics to handle approximations. They use numerical measures for combining evidence. In contrast, data correction rules can reason with partial information without compromising the exactness of predicate calculus. All of these methods depend on the formalization of extra meta-knowledge in order to correct the data, take back assumptions, or combine evidence. The availability of this meta-knowledge is a critical factor in the viability of these approaches to particular applications. A special method for contending with both fallible knowledge and limited computational resources will be considered in Section 3.10.

3.3. Case 3—Time-varying data

Some expert tasks involve reasoning about situations that change over time. One of the earliest approaches in AI to take this into account was the situational calculus introduced by McCarthy and Hayes for representing sequences of actions and their effects [25]. The central idea is to include 'situations' along with the other objects modeled in the domain. For example, the formula (On Block-1 Table-2 Situation-2) could represent the fact that in Situation-2, Block-1 is on Table-2. A key feature of this formulation is that situations are discrete. This discreteness reflects the intended use of this calculus in robot planning problems. A robot starts in an initial situation and performs a sequence of actions. After each action, the state of the robot's world is modeled by another situation. In this representation, a situation variable can take situations as values. In some implementations, the actual situation variable is usually left implicit by indexing the formulas according to situations.
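A minimal sketch of this representation follows. It is our Python illustration, not the logical formalism itself: situations are modeled as sets of facts, the situation variable is left implicit in the indexing as just described, and the Move action and block names are invented.

```python
# Situations modeled as sets of facts; the situation variable is implicit in
# the indexing of the list. Names are invented for illustration.
situations = [{("On", "Block-1", "Table-2")}]      # the initial situation

def move(situation, obj, destination):
    """An action maps a situation to a new situation. Facts that do not
    mention the moved object are carried forward unchanged (the 'frame'
    assumption elaborated below)."""
    kept = {f for f in situation if not (f[0] == "On" and f[1] == obj)}
    return kept | {("On", obj, destination)}

situations.append(move(situations[-1], "Block-1", "Table-1"))
print(situations[1])    # {('On', 'Block-1', 'Table-1')}
```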
Actions in the situational calculus are represented by functions whose domains and ranges are situations. For each action, a set of frame axioms characterizes the set of assertions (i.e., 'the frame') that remain fixed while an action takes place within it. In a robot planning task, an example of an action would be the Move action for moving an object to a new location. A frame axiom for Move would be that all objects not explicitly moved are left in their original location. While many issues can be raised about this approach to representing changing situations, many AI systems have used it for a variety of tasks with only minor variations.

Sometimes the changes of situation are signalled by time-varying data, rather than by the autonomous actions of a robot. An example of this is shown by the VM system reported by Fagan [16, 17]. VM (for Ventilator Manager) is a program that interprets the clinical significance of patient data from a physiological monitoring system. VM monitors the post-surgical progress of a patient requiring mechanical breathing assistance. A device called a mechanical ventilator provides breathing assistance for seriously ill patients. The type and settings of the ventilator are adjusted to match the patient's needs. As the patient's status improves, various adjustments and changes are made, such as replacing the mechanical ventilator with a 'T-piece' to supply oxygen to the patient. In VM's application, a patient's situation is affected by the progression of disease and the response to therapeutic interventions. For such applications, the model of clinical reasoning must account for information received from tests and observations over time.

VM illustrates knowledge suitable for coping with time-varying data. This knowledge in VM is organized in terms of several kinds of rules: transition rules, initialization rules, status rules, and therapy rules. Periodically VM receives a new set of instrument measurements and then it reruns all of its rules. Transition rules are used to detect when the patient's state has changed, e.g., when the patient starts to breathe on the T-piece. The following is an example of a transition rule.

If (1) the current context is 'Assist' and
(2) respiration rate has been stable for 20 minutes and
(3) I/E ratio has been stable for 20 minutes,
then the patient is on 'CMV' (controlled mandatory ventilation).

This rule governs the transition between an 'Assist' context and a 'CMV' context. When the premise of a transition rule is satisfied in VM, a new 'context' is entered. These contexts correspond to specific states or situations. When a context is changed, VM uses initialization rules to update its information for the new context (e.g., expectations and unacceptable limits for the measurements). These rules refer to the recent history of the patient to establish new expectations and information for the new context. Part of one such rule follows:

If (1) the patient transitioned from 'Assist' to 'T-piece' or
(2) the patient transitioned from 'CMV' to 'T-piece'
then expect the following:

                Very low   Low   Min   Max   High   Very high
  SYS                            110   150
  DIA                             60    95
  MAP                       60    75   110   120
  Pulse rate                      60   120
  ECO2              22      28    30    40    45       50

(Min through Max bounds the 'ideal' range; Low through High bounds the 'acceptable' range.)

VM's reasoning about time is limited to adjacent time intervals. It is concerned only with the previous state and the next state. Its mechanisms for dealing with this are (1) state-triggered expectations and (2) rules for dynamic belief revision. The transition rules in VM govern changes of context. Data arrive periodically, but context is changed only when it is adequately supported by the evidence. The initialization rules are essentially like the frame axioms—establishing what changes and what stays the same in the new context. Once a context is set, the expectations are used to govern VM's behavior until the context is changed again.

Programs which need to reason about more distant events require more elaborate representations of events and time. For example, planning and prediction tasks require reasoning about possible futures. For these applications, the situational calculus must be extended to allow for multiple possible futures with undetermined operations, unordered sets of possible future events, and the possible actions of uncontrolled multiple actors. While AI systems capable of such sorts of reasoning seem within reach, their construction is still a research enterprise.
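To summarize VM's mechanisms concretely, here is a sketch of a transition rule and an initialization rule in Python. It is our illustration, not VM's actual rule language; the stability test, the sample data, and the limits are invented.

```python
import statistics

def stable(history, minutes=20, tolerance=2.0):
    """True if the last `minutes` one-per-minute samples vary within
    `tolerance` (an invented stability criterion)."""
    window = history[-minutes:]
    return len(window) == minutes and statistics.pstdev(window) <= tolerance

def transition_rule(context, resp_rate, ie_ratio):
    # Transition rule: enter a new context only when the measurements
    # adequately support the change (a state-triggered expectation).
    if context == "Assist" and stable(resp_rate) and stable(ie_ratio):
        return "CMV"
    return context

def initialization_rule(new_context):
    # Initialization rule: like a frame axiom, it establishes the
    # expectations that hold in the new context (invented limits).
    if new_context == "CMV":
        return {"pulse": (60, 120), "SYS": (110, 150)}
    return {}

context = transition_rule("Assist", [12.0] * 20, [0.5] * 20)
print(context, initialization_rule(context))
```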
3.4. Case 4—Large but factorable solution space

In the restricted class of problems that we started with in Section 3.1, it was stipulated that:
(1) The data and knowledge must be reliable.
(2) The data must be static.
(3) The search space must be small.
We have already discussed some techniques for relaxing the first two requirements. In this section we will begin the consideration of techniques for coping with very large search spaces.

In many data analysis tasks, it is not enough to find just one interpretation of the data. It is often desirable to find every interpretation that is consistent with the data. This conservative attitude is standard in high risk applications such as the analysis of poisonous substances or medical diagnosis. A systematic approach would be to consider all possible cases, and to rule out those that are inconsistent with the data. This approach is called reasoning by elimination and has been familiar to philosophers for years, but it has often been regarded as impractical. The difficulty is that there is often no practical way to consider all of the possible solutions.

The DENDRAL program [2] is probably the best known AI program that reasons by elimination (using generate and test). The key to making it work is to incorporate early pruning into the generate and test cycle. This section illustrates some of the characteristics of problems on which this approach will work, and gives examples of the kinds of knowledge that are needed. Since the problem area for the DENDRAL program is rather complicated for tutorial purposes, we will consider a simpler expert system, GA1 [37], that was already mentioned in Section 3.2.

Like DENDRAL, GA1 is a data interpretation program that infers a complete molecular structure from measurements of molecular pieces. Fig. 2 shows a simple example of the kind of data that GA1 would have about a molecule.

[Fig. 2 shows a circular molecule with cleavage sites, and the following table of data:
  Complete A Digest: 5 5
  Complete B Digest: 3 7
  Complete A&B Digest: 1 2 3 4]

FIG. 2. Enzyme A cleaves the circular molecule at the points labeled A. Enzyme B cleaves it at the points labeled B. The table lists the fragments that would be observed under ideal digestion experiments.

The top part of the figure shows that a molecule is made up of segments of measurable length. The lines labeled A and B that cut across the circle indicate the sites where the molecule is cleaved by enzymes (named A and B, respectively). All of the molecules that GA1 is concerned with are linear or circular. This means that all of the molecular segments are linear pieces that can be arranged end to end. When a sample of molecules is completely digested by an enzyme, pieces are released whose sizes can be measured. The goal is to infer the structure of the original molecule from the digest data. Sometimes more than one molecular structure is consistent with the available data.

A primary task in problems like this is to create a workable generator of all of the possible solutions (i.e., all of the possible molecules). In GA1, the first step is to apply data correction rules as shown in Section 3.2. Then GA1 determines an initial set of generator constraints—a set of segments and enzyme sites for building candidate molecules. The rules for deriving the list will not be elaborated here, but they make conservative use of molecular weight estimates and redundant data from several digests.
The generation process then begins by combining these segments and sites, and testing whether the combinations fit the evidence. For example, the following lists (among others) correspond to the complete molecule in Fig. 2: (1 A 2 B 3 A 4 B), (2 B 3 A 4 B 1 A), (1 B 4 A 3 B 2 A). These equivalent representations can be generated by starting with any of the four segments in the picture of the molecule in Fig. 2, and reading off the sites and segments around the circle either clockwise or counterclockwise. This provides us with eight equivalent representations for the same molecule. A generator is said to be nonredundant if it produces exactly one of the equivalent representations of a solution (the canonical form) during the generation process. GA1 does this by incorporating rules for pruning noncanonical structures during the generation process. An example of such a rule follows:

If circular structures are being generated, only the smallest segment in the list of initial segments should be used for the first segment.

The key to effective use of generate and test is to prune classes of inconsistent candidates as early as possible. For example, consider the following structure from the generation process for the sample problem: (1 B 2 B). GA1 treats this as a description of all the molecules that match the pattern (1 B 2 B (Segment) (Site) (Segment) (Site)) where any of the remaining segments and sites may be filled in the template. It is easy to see that no molecule matching this template is consistent with the data in Fig. 2—because all such molecules would yield a segment of length 2 in the complete digest for B. Other pruning considerations are more global. When such pruning rules exist, the solution space is said to be factorable.

GA1 has been run on problems where the number of possible candidates is several billion. However, the pruning rules are so effective at eliminating classes of solutions that most problems require only a few seconds of computer time. After the generator is finished, usually twenty or thirty candidates make it through all of the pruning rules. Only one or two of these will be consistent with all of the data. In principle, it is possible to add more pruning rules to GA1 so that only the consistent solutions remain. However, the rules needed to do this become increasingly complex and specialized. It becomes difficult to prove the correctness of such rules (so that solutions will not be missed) and to ensure that the rules are faithfully represented in the program. There is also a point of diminishing returns when each new specialized rule covers a smaller number of cases. In GA1 this problem is addressed by applying a digestion process model to the final candidates and comparing its predictions with the observed data. A simple scoring function then penalizes candidates that predict extra or missing segments. Because the number of disagreements between the idealized digests of two different molecules diverges rapidly for small molecular differences, it was not necessary to tune the scoring function to recognize wrong solutions.

In summary, generate-and-test is an appropriate method to consider when it is important to find all of the solutions to a problem. For the method to be workable, the generator must partition the solution space in ways that allow for early pruning. These criteria are often associated with data interpretation and diagnostic problems.
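As a concrete illustration, the following Python sketch runs generate-and-test with early pruning on the Fig. 2 data. It is our toy reconstruction, not GA1's code: candidates are alternating lists of segments and sites, generation is made (partly) nonredundant by fixing the smallest segment first, and whole classes of candidates are discarded as soon as a fragment enclosed by two like sites contradicts the observed digests.

```python
# Toy generate-and-test over the Fig. 2 circular molecule (our sketch).
DATA = {"A": [5, 5], "B": [3, 7]}
SEGMENTS = [1, 2, 3, 4]           # from the complete A&B digest
SITES = ["A", "A", "B", "B"]

def digest(arr, enzyme):
    """Sorted fragment lengths after cutting the circular arrangement
    at every site of `enzyme`. Segments sit at even indices."""
    cuts = [i for i in range(1, len(arr), 2) if arr[i] == enzyme]
    frags = []
    for j, c in enumerate(cuts):
        nxt = cuts[(j + 1) % len(cuts)]
        ks = range(c + 1, nxt) if nxt > c else list(range(c + 1, len(arr))) + list(range(0, nxt))
        frags.append(sum(arr[k] for k in ks if k % 2 == 0))
    return sorted(frags)

def prune(prefix):
    """Early pruning: a fragment fully enclosed between two like sites in the
    (linear) prefix must appear in that enzyme's observed digest. E.g. the
    prefix [1, 'B', 2, 'B'] is pruned because it implies a B-fragment of 2."""
    for enzyme in ("A", "B"):
        cuts = [i for i in range(1, len(prefix), 2) if prefix[i] == enzyme]
        for c, nxt in zip(cuts, cuts[1:]):
            if sum(prefix[k] for k in range(c + 1, nxt) if k % 2 == 0) not in DATA[enzyme]:
                return True
    return False

def extend(prefix, segs_left, sites_left):
    if prune(prefix):
        return                            # discard this whole class at once
    if not segs_left and not sites_left:
        yield prefix
        return
    if len(prefix) % 2 == 1:              # last element was a segment: add a site
        for s in set(sites_left):
            rest = list(sites_left); rest.remove(s)
            yield from extend(prefix + [s], segs_left, rest)
    else:                                 # add a segment
        for g in set(segs_left):
            rest = list(segs_left); rest.remove(g)
            yield from extend(prefix + [g], rest, sites_left)

# Final test: keep candidates whose complete digests match all the data.
# (Mirror-image duplicates are not removed in this simplification.)
for cand in extend([SEGMENTS[0]], SEGMENTS[1:], SITES):
    if all(digest(cand, e) == DATA[e] for e in ("A", "B")):
        print(cand)
```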
3.5. Case 5—No evaluator for partial solutions

There are many problems involving large search spaces for which generate and test is a method of last resort. The most common difficulty is that no generator of solutions can be found for which early pruning is viable. Design and planning problems are of this nature. One usually cannot tell from a fragment of a plan or design whether that fragment is part of a complete solution; there is no reliable evaluator of partial solutions expressed as solution fragments.

In this section we will consider the first of several approaches to problem solving without early pruning. These approaches have in common the idea of abstracting the search space but differ in their assumptions about the nature of that space. Abstraction emphasizes the important considerations of a problem and enables its partitioning into subproblems. In the simplest case there is a fixed partitioning in the abstract space which is appropriate for all of the problems in an application. This case is illustrated by the R1 program reported by McDermott [27].

R1's area of expertise is the configuring of Digital Equipment Corporation's VAX computer systems. Its input is a customer's order and its output is a set of diagrams displaying the spatial relationships among the components on the order. This task includes a substantial element of design. In order to determine whether a customer's order is satisfactory, R1 must determine a spatial configuration for the components and add any necessary components that are missing.

The configuration task can be viewed as a hierarchy of subtasks with strong temporal interdependencies. R1 partitions the configuration task into six ordered subtasks as follows.
(1) Determine whether there is anything grossly wrong with the customer's purchase order (e.g., mismatched items, major prerequisites missing).
(2) Put the appropriate components in the cpu and cpu expansion cabinets.
(3) Put boxes in the unibus expansion cabinet and put the appropriate components in those boxes.
(4) Put panels in the unibus expansion cabinets.
(5) Lay out the system on the floor.
(6) Do the cabling.

The actions within each subtask are highly variable; they depend on the particular combination of components in an order and on the way these components have been configured so far. Associated with each subtask in R1 is a set of rules for carrying out the subtask. An example of a rule for the third subtask follows:

If the most current active context is assigning a power supply and
a unibus adaptor has been put in a cabinet and
the position it occupies in the cabinet (its nexus) is known and
there is space available in the cabinet for a power supply for that nexus and
there is an available power supply and
there is no H7101 regulator available,
then add an H7101 regulator to the order.

R1 has about 800 rules about configuring VAX systems. Most of the rules are like the example above. They define situations in which some partial configuration should be extended in particular ways. These rules enable R1 to combine partial configurations to form an acceptable configuration.
(These rules are analogous to the transition rules described for VM in Section 3.3 except that the rules monitor the state of RI’S problem solving instead of data from external sensors.) The approach that RI uses is called Match. It is one of Newell’s weak methods for search [28]. Match enables Ri to explore the space of possible configurations with the basic operations of creating the extending partial configurations. Match explores this space by starting in an initial state, going through intermediate states, and stopping in a final state without any back- tracking. Each state in the space is a partially instantiated configuration. Rl proceeds through its six major tasks in the same order for each problem; it never varies the order and it never backs up in any problem. The benefit of the abstraction space is that Ri needs to do very little search. The conditions that make Match viable are both its source of power and its weakness. The key requirement is that there can be no backtracking. This means that at any intermediate state, RI must be able to determine whether the state is on a solution path. This requires that there must exist a partial ordering on decisions for the task such that the consequences of applying an operator bear only on ‘later’ parts of the solution. It is interesting that Match is in fact insufficient for the complete task in Rl. The subtask of placing modules on the unibus is formulated essentially as a bin-packing problem—namely how to find an optimal sequence that fits within spatial and power constraints. No way of solving this problem without search is known. Consequently Rl uses a different method for this part of the problem. In summary, the use of abstractions should be considered for applications where there is a large search space but no method for early pruning. Ri is an example of a system which uses a fixed abstract solution. Within this frame- work, it uses the Match weak method to search for a solution. Whether Match is practical for an application depends on how difficult it is to order the intermediate states. 3.6. Case 6—No fixed partitioning of subproblems When every example problem in an application can be usefully partitioned into the same subproblems, then the organization described in the previous section should be considered. In applications with more variety to the problems, no fixed set of subproblems can provide a useful abstraction. For example, planning domains such as errand running (see [22]) require plans rich with structure. To be useful, abstractions must embody the variable structure of the plans. 154 M.STEFIKETAL. In this section we will consider an approach called top-down refinement that tailors an abstraction to fit each problem. The following aspects of the approach are important. (1) Abstractions for each problem are composed from terms (selected from a space of terms) to fit the structure of the problem. (2) During the problem-solving process, these concepts represent partial solutions that are combined and evaluated. (3) The concepts are assigned fixed and predetermined abstraction levels. (4) The problem solution proceeds top down, that is, from the most abstract to the most specific. (5) Solutions to the problem are completed at one level before moving down to the next more specific level. (6) Within each level, subproblems are solved in a problem-independent order. (This creates a partial ordering on the intermediate abstract states.) 
The best known example of a program using this approach is the ABSTRIPS program reported by Sacerdoti [33]. ABSTRIPS was an early robot planning program. It made plans for a robot to move objects (e.g., boxes) between rooms. A design goal for ABSTRIPS was to provide abstractions sufficiently different from the detailed 'ground' space to achieve a significant improvement in problem-solving efficiency, but sufficiently similar so that the mapping down from abstractions would not be time-consuming. This led to an interesting and simple approach for representing abstractions.

Abstractions in ABSTRIPS are plans. They differ from ground level plans only in the level of detail used to specify the preconditions of operators. This level of detail is indicated by associating a number (termed a criticality value) with all of the literals used in preconditions. For example, Sacerdoti suggested the following criticality assignments in a robot planning domain:

  Type and Color            4
  InRoom                    3
  PluggedIn and Unplugged   2
  NextTo                    1

In this example, the predicates for Type and Color of objects are given high criticalities, since the robot has no operator for changing them. These predicates together with the set of robot actions are combined to form plans for solving particular problems; the space of possible plans is the set of all of the plans that can be built up from these pieces. The most abstract plans are the ones that include only the higher criticality concepts.

Planning in ABSTRIPS starts by setting criticality to a maximum. Planning within each level proceeds backwards from goals. Preconditions whose criticality is below the current level are invisible to the planner, since it is presumed that they will be accounted for during a later pass. After a plan is completed at one level, the criticality level is decremented and planning is started on the lower level. The previous abstract version of the plan is used to guide the creation of the next level. For example, an early version of a plan may determine the route that the robot takes through the rooms. In more detailed versions, steps for opening and closing doors are included. In this way, the abstract plans converge to the specific plan. The sequence of abstract plans is created differently for each problem.

ABSTRIPS was a great advance over its predecessor STRIPS, which lacked the hierarchical planning ability. Generally, when hierarchical and non-hierarchical approaches have been systematically compared, the former have dominated. ABSTRIPS was substantially more efficient than STRIPS, and the effect increased dramatically as longer plans were tried. Since then, many other hierarchical planning programs have been created. In most of the later programs, the abstraction concepts have simply been arranged in a hierarchy, without actually assigning them criticality numbers.

In summary, the interesting feature of top-down refinement is the flexibility of the abstractions. Abstraction states are individually constructed to fit each problem in the domain. In contrast to Match, top-down refinement places only a partial ordering on the intermediate states of the problem-solver. Still, there are some important conditions about problem solving in the domain of application that are inherent in the method. The basic assumption is that a small fixed amount of problem-solving knowledge about criticality levels and top-down generation is sufficient.
Furthermore, it must be possible to assign a partial criticality ordering to the domain concepts. What is important for one problem must be important for all of the problems. The next section suggests some ways to relax these requirements.

3.7. Case 7—Interacting subproblems

One basic difficulty with top-down refinement is the lack of feedback from the problem-solving process. It is presumed that the same kinds of decisions should be made at the same point (i.e., criticality level) for each problem in the domain. In this section, we will explore an approach that is based on a different principle for guiding the reasoning process called the least-commitment principle. The basic idea is that decisions should not be made arbitrarily or prematurely. They should be postponed until there is enough information. Reasoning based on the least-commitment principle requires the following abilities.
(1) The ability to know when there is enough information to make a decision.
(2) The ability to suspend problem-solving activity on a subproblem when the information is not available.
(3) The ability to move between subproblems, restarting work as information becomes available.
(4) The ability to combine information from different subproblems.

[Fig. 3 shows a plan net for 'Paint the ceiling and paint the ladder' expanded over three levels, with Level 3 shown both before and after conflict resolution.]

FIG. 3. Example of planning in NOAH. NOAH analyzes the interactions between the steps in order to assign them an ordering in time. In this example, 'painting the ladder' is seen to be in conflict with using it. To complete both goals, NOAH decides to paint the ceiling first. Later processing will factor out common subplans like 'get paint'.

Fig. 3 shows an example of this style of reasoning from the NOAH system reported by Sacerdoti [32]. NOAH was a robot planning system that used a least-commitment approach to assign a time-ordering to operators in a plan. Earlier planning programs inserted operators into a plan as they worked backwards from goals. In contrast, NOAH assigned the operators only a partial ordering and added specifications for a complete ordering of the operators only as required. In Fig. 3, NOAH starts with two subgoals: paint the ceiling and paint the ladder. Plans for the two subgoals are expanded in parallel and a conflict is found. If the ladder is painted first, it will be wet and we won't be able to paint the ceiling. In other words, the step to paint the ladder violates a precondition (that the ladder be usable) for the step to paint the ceiling. This interference between the subgoals provides NOAH with enough information to order the tasks. If it had arbitrarily ordered the steps and painted the ladder first, it would have had to plan around the wet ladder, perhaps waiting for it to dry. The resulting plan would not have been optimal.

Another example of this idea was given in MOLGEN reported by Stefik [36]. MOLGEN is an expert system that used this style of reasoning for designing molecular genetics experiments. MOLGEN's organization involved the following features:
(1) Interactions between nearly independent subproblems are represented as constraints.
(2) Interactions between subproblems are discovered via constraint propagation.
(3) MOLGEN uses explicit problem-solving operators (as opposed to domain-specific operations) to reason with constraints.
(4) MOLGEN alternates between least-commitment and heuristic strategies in problem-solving.

In the least-commitment strategy, MOLGEN makes a choice only when its available constraints sufficiently narrow its alternatives. Its problem-solving operators are capable of being suspended so that a decision can be postponed. Constraint propagation is the mechanism for moving information between subproblems. It enables MOLGEN to exploit the synergy between decisions in different subproblems. In contrast with ABSTRIPS's strict backward expansion of plans within levels, MOLGEN expands plans opportunistically in response to the propagation of constraints.

The fourth feature illustrates an interesting limitation of the least-commitment principle. Every problem-solver has only partial knowledge about solving problems in a domain. With only the least-commitment principle, the solution process must come to a halt whenever there are choices to be made, but no compelling reason for deciding any of them. We call this situation a least-commitment deadlock. When MOLGEN recognizes this situation, it switches to its heuristic approach and makes a guess. In many cases, a guess will be workable, and the solution process can continue to completion. In other cases, a bad guess can lead to conflicts. The number of conflicts caused by (inaccurate) guessing is a measure of the incompleteness of the problem-solving knowledge. Conflicts can also arise from the least-commitment process in cases where the goals are fundamentally unattainable.

In summary, the least-commitment principle coordinates decision-making with the availability of information and moves the focus of problem-solving activity among the available subproblems. The least-commitment principle is of no help when there are many options and no compelling reasons for choices. In these cases, some form of plausible reasoning is necessary. In general, this approach uses more information to control the problem-solving process than the top-down refinement approach.
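The following sketch makes the alternation between least commitment and guessing concrete. It is our illustration, not MOLGEN's planner; the variables, domains, and the vector-enzyme compatibility table are invented.

```python
# A minimal least-commitment loop with one-pass constraint propagation.
DOMAINS = {"host": {"E.coli"},
           "vector": {"pBR322", "pSC101"},
           "enzyme": {"EcoRI", "HindIII"}}
COMPAT = {("pBR322", "EcoRI"), ("pSC101", "EcoRI"), ("pSC101", "HindIII")}

def propagate(d):
    """Prune values that have no compatible partner left; propagation is how
    information moves between the two subproblems."""
    d["vector"] = {v for v in d["vector"] if any((v, e) in COMPAT for e in d["enzyme"])}
    d["enzyme"] = {e for e in d["enzyme"] if any((v, e) in COMPAT for v in d["vector"])}

def solve(d):
    decisions = {}
    while len(decisions) < len(d):
        propagate(d)
        # Least commitment: commit only where the constraints leave one option.
        forced = [v for v in d if v not in decisions and len(d[v]) == 1]
        for v in forced:
            decisions[v] = next(iter(d[v]))
        if not forced:
            # Least-commitment deadlock: nothing is forced, so switch to a
            # heuristic guess (which may later have to be revised).
            v = min((x for x in d if x not in decisions), key=lambda x: len(d[x]))
            d[v] = {sorted(d[v])[0]}
    return decisions

print(solve(DOMAINS))
# host is forced immediately; vector is guessed at the deadlock;
# propagation then forces the enzyme.
```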
The difficulty with guessing is in identifying wrong guesses and recovering from them efficiently. This section considers how plausible reasoning can benefit from particular architectural features.

One of the best known systems with architectural provisions for guessing is the EL system for circuit analysis reported by Stallman and Sussman [35]. EL analyzes analog electrical circuits. It has two main methods that are described below—forward reasoning and the method of 'assumed states'. The assumed states provide the examples of guessing.

Forward reasoning with electrical laws is used to compute electrical parameters (e.g., voltage or current) at one node of a circuit from parameters at other nodes. EL uses only a few laws, such as Ohm's law, which defines a linear relationship between voltage and current for a resistor, and Kirchhoff's current law, which states that the current flowing out of a node equals the current flowing into it. Much of EL's power derives from two things: (1) the exhaustive application of these laws and (2) the ability to reason with these laws symbolically as shown in Fig. 4. This figure illustrates a circuit in which resistors are connected in a ladder arrangement. The analysis task is to determine the voltages and currents at all of the nodes of the circuit.

FIG. 4. Symbolic propagation of electrical parameters. Analysis begins by assigning the symbol e to the unknown voltage at the upper right corner of the ladder. Other values are derived by stepwise application of Ohm's and Kirchhoff's laws. (The original figure shows the ladder of 5 ohm resistors, driven by a 10-volt source, before and after analysis.)

The interesting aspect of this is that symbolic reasoning about the circuit is much simpler than writing and solving equations for the series and parallel resistor network. Analysis begins with the introduction of a variable e to represent the unknown node voltage at the end of the ladder. This yields a current e/5 through resistor R6. Then by Kirchhoff's law we have the same current through R5, which gives us a voltage 2e on the left of the resistor. This voltage across R4 allows us to compute the current through it using Ohm's law. The application of electrical laws in terms of symbolic unknowns continues until all of the voltages are defined in terms of e. At that time we have 8e = 10 volts, so e = 5/4 volts.

Sometimes circuit analysis requires the introduction of more than one variable to represent unknown circuit parameters. In general, the analysis involves two main processes: 1-step deductions and coincidences. The 1-step deductions are direct applications (sometimes symbolic) of the electrical laws. A coincidence occurs when a 1-step deduction assigns a value to a circuit parameter that already has a value (symbolic or numeric). At the time of the coincidence, it is often possible to solve the resulting equation for one variable in terms of the others. This allows EL to eliminate unknowns.
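The ladder analysis can be reproduced mechanically. The sketch below is only an illustration of 1-step deductions and a coincidence, not EL's implementation. It assumes the configuration implied by the text (a 10-volt source, series resistors R1, R3, R5 and shunt resistors R4, R6, each 5 ohms) and represents each quantity as a linear expression a*e + b in the unknown e.

    from fractions import Fraction as F

    R = F(5)                           # every resistor is 5 ohms
    e = (F(1), F(0))                   # the unknown e, coded as (a, b) = a*e + b

    def add(u, v):   return (u[0] + v[0], u[1] + v[1])
    def scale(k, u): return (k * u[0], k * u[1])

    i_R6 = scale(1 / R, e)             # Ohm's law: current e/5 through R6
    i_R5 = i_R6                        # Kirchhoff: same current through R5
    v_n2 = add(e, scale(R, i_R5))      # voltage 2e on the left of R5
    i_R4 = scale(1 / R, v_n2)          # current 2e/5 through shunt R4
    i_R3 = add(i_R5, i_R4)             # current law: 3e/5 through R3
    v_n1 = add(v_n2, scale(R, i_R3))   # voltage 5e
    i_R1 = i_R3
    v_src = add(v_n1, scale(R, i_R1))  # voltage 8e at the source terminal

    # Coincidence: 8e must equal the known 10-volt source, giving
    # one equation in one unknown.
    a, b = v_src
    print((F(10) - b) / a)             # 5/4 volts, as in the text

The coincidence at the source terminal is what lets the unknown be eliminated.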
The propagation method can be extended to any devices where the electrical laws are invertible and where the algebra required for the symbolic reasoning is tractable. Unfortunately, many simple and important electrical devices, such as diodes and transistors, are too complicated for this approach. For example, a diode is approximately represented by exponential equations. Electrical engineering has an approach for these devices called the method of assumed states. This is where guessing enters into EL's problem solving.

The method of assumed states uses a piecewise-linear approximation for complicated devices. The method requires making an assumption about which linear region a device is operating in. EL has two possible states for diodes (on or off) and three states for transistors (active, cutoff, and saturated). Once a state is assumed, EL can use tractable linear expressions for the propagation analysis as before.

After making an assumption, EL must check whether the assumed states are consistent with the voltages and currents predicted for the devices. Incorrect assumptions are detected by means of a contradiction, which is the event in which chosen assumptions are seen to be inconsistent. When this happens, some of the assumptions need to be changed. Intelligent processing of contradictions involves determining which assumptions to revise. Implementing this idea in the problem-solver led to the following important architectural features: (1) queue-based control; (2) dependency-directed backtracking.

The operators in EL that perform the propagation analysis are called demons. They are placed in queues and run sequentially by a scheduler. When demons run, they make assertions in the data base and then return to the scheduler. This data base activity causes other demons whose triggers match the assertions to be added to the queue. EL has three queues for DC analysis with different priorities. The lowest priority queue is used for device-state assumptions. These demons are given a low priority so that the immediate consequences of an assumption will be explored before more assumptions are made. The middle queue is used for most of the electrical laws. The high priority queue is used for demons that detect contradictions. These demons are given a high priority so that invalid assumptions will be detected before too much computational work is done. These demons trigger the dependency-directed backtracking.

EL keeps dependency records of all of its deductions and assumptions. In EL, an assertion is believed (or in) if it has well-founded support from atomic assumptions. An assertion without such support is said to be out. If an out assertion returns to favor, it is said to be unouted. Fig. 5 shows an example of this process in a data base. A1, B1, and C1 are atomic data that are currently in. Suppose that A1 and A2 are mutually exclusive device-state assumptions. The top of Fig. 5 shows which facts are in when A1 is in. Arrows are used to indicate support. (Assertions following from A2 are shown in dotted lines to show that they are in the data base, but that they are out.) The bottom figure shows what happens if A1 is outed and A2 is unouted.

FIG. 5. Example of belief revision in EL. The dark boxes are in and the lighter ones are out. In (a), A1 is in and so all of its consequences are also in. In (b), A1 is out but A2 is in.

An important aspect of EL's problem-solving is its ability to recover from tentative assumptions. The details of the implementation and knowledge will not be covered here, since there are many ways to approach this problem. The main points are:
(1) In the event of a contradiction, EL needs to decide what to withdraw. It is not effective to simply withdraw all of the assumptions that are antecedents of the contradictory assertion. EL must decide which of the assumptions are most unlikely to change, and this requires domain-specific knowledge.
(2) EL must redo some of the propagation analysis. Sometimes it is possible to salvage some of the symbolic manipulation (e.g., variable elimination) that has been done. EL has special demons that decide carefully what to forget.
(3) Contradictions are remembered so that choice combinations that are found to be inconsistent are not tried again.
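The in and out bookkeeping can be suggested in a few lines. This is only a schematic of dependency records, with hypothetical assertion names; note that a real system must insist on well-founded support, since the naive recursive check below would loop forever on a cyclical support structure.

    # A minimal sketch of dependency records: each assertion remembers
    # its support, so outing an assumption takes its consequences out too.
    class Node:
        def __init__(self, name, supports=()):
            self.name, self.supports = name, tuple(supports)
            self.asserted = True       # withdrawn assumptions become False

        def is_in(self):
            # Believed only if asserted and all of its support is in.
            return self.asserted and all(s.is_in() for s in self.supports)

    a1 = Node("A1")                    # a device-state assumption
    c1 = Node("C1", supports=[a1])     # deduced from A1
    c2 = Node("C2", supports=[c1])     # deduced from C1

    a1.asserted = False                # a contradiction withdraws A1
    print(c1.is_in(), c2.is_in())      # False False: consequences go out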
These ideas were the intellectual precursors to work on belief revision systems.² Belief revision can be used for reasoning with assumptions or defaults. For a problem solver to revise its beliefs in response to new knowledge, it must reason about dependencies among its current set of beliefs. New beliefs can be the consequences of new information received or derived. A critical issue in this style of reasoning is well-founded support, and there are some pitfalls for the unwary involving cyclical support structures. An important question is "what mechanisms should be used to resolve ambiguities when there are several possible revisions?" It is clear this choice needs to be controlled, but the details for making the decision remain to be worked out. In the examples above, we used knowledge about justifications to reason about choices. Doyle [13] has proposed a style of dialectical argumentation where the primary step is to argue about the kinds of support for beliefs. In such a system the complexity of knowledge about belief revision would itself require a substantial knowledge base. Every approach depends critically on the kinds of dependency records that are created and saved. This work is at the frontier of current AI research.

² A bibliography of recent papers on these ideas was published by Doyle and London [12]. Basic algorithms for revising beliefs have been reported by Doyle [11] and McAllester [26].

In summary, EL is an example of a program with organizational provisions for plausible reasoning. It uses symbolic forward reasoning for analyzing circuits. To analyze complicated devices EL has to assume linear operating regions. It uses dependency-directed backtracking so that it can recover from incorrect assumptions. It uses a priority-oriented queue to schedule tasks so that contradictions will be found quickly and so that the immediate consequences of assumptions will be considered before further assumptions are made.

3.9. Case 9—Single line of reasoning is too weak

When we explain to someone how we solved a problem, we often invoke 20-20 hindsight and leave out the mistakes that we made along the way. Our explanation makes it appear that we followed a very direct and reasonable route from beginning to end. For developing intuitions about problem-solving behavior, this gives a misleading impression that problem-solving is the pursuit of a single line of reasoning. Actually, there are important and somewhat subtle reasons for being able to use multiple lines of reasoning in problem solving, and several of the systems described above gain power from this ability. These systems use multiple lines of reasoning for two major purposes as explained below:
(1) to broaden the coverage of an incomplete search, or
(2) to combine the strengths of separate models.

The HEARSAY-II system [15] provides the best example of the first purpose. (It is described in the next section.) In coping with the conflicting demands of searching a large space with limited computational resources, HEARSAY-II performs a heuristic and incomplete search. In general, programs that have fallible evaluators can decrease the chances of discarding a good solution from weak evidence by carrying a limited number of solutions in parallel.
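A familiar embodiment of this idea is beam search. The sketch below is generic rather than drawn from any of the systems discussed; the expansion and scoring functions are hypothetical stand-ins for a fallible evaluator.

    # A minimal beam-search sketch: keep the k best partial solutions
    # in parallel so one noisy score cannot discard the best line.
    def beam_search(start, expand, score, k=3, steps=5):
        beam = [start]
        for _ in range(steps):
            candidates = [c for p in beam for c in expand(p)]
            if not candidates:
                break
            beam = sorted(candidates, key=score, reverse=True)[:k]
        return beam

    # Toy usage: grow digit strings, scoring each by its digit sum.
    print(beam_search("", lambda s: [s + d for d in "012"],
                      lambda s: sum(map(int, s or "0"))))

With k = 1 this degenerates to a single line of reasoning; larger k trades computation for robustness against the evaluator's mistakes.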
A good example of combining the strengths of multiple models is given by the SYN program reported by Sussman, Steele, and de Kleer [10, 38]. SYN is a program for circuit synthesis, that is, for determining values for components in electrical circuits. The EL program described in the previous section determined circuit parameters such as voltage given fully specified components in a circuit. SYN determines values for the components (e.g., the resistance of resistors) given the form of the circuit and some constraints on its behavior. SYN uses many of the propagation analysis ideas developed for EL.

The interesting new organizational idea in SYN is the idea of slices, or multiple views of a circuit. This corresponds to the idea of equivalent circuits in electrical engineering practice. A simple example of a slice is the idea that a voltage divider made from two resistors in series can be viewed as a single resistor; one slice of the circuit describes it as two resistors and another slice describes it as one. To analyze the voltage divider, SYN uses the second slice to compute the current through the divider. Then, by reverting back to the first slice, SYN can compute the voltage at the midpoint. In general, the idea is to switch to equivalent representations of circuits to overcome blockages in the propagation of constraints. The power of slices is that they provide redundant paths for information to travel in propagation analysis. By exploiting electrically equivalent forms of circuits involving resistors, capacitors, and inductors, SYN is capable of analyzing rather complex circuits without extensive algebraic manipulation.

The idea of slices is not limited to electrical circuits. For example, algebraic transformations of equations can be viewed as means for shifting perspectives. Sussman and Steele also give an example of understanding a mechanical watch by using structural and functional decompositions. In summary, slices are used to combine the strengths of different models. When they are combined with forward reasoning, they provide redundant paths for information to propagate. A problem-solver based on this idea must know how to create and combine multiple views.
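The voltage divider example can be worked through directly. The arithmetic below, with hypothetical component values, is only an illustration of shifting between the two views; SYN's slices are a general representational device, not two lines of arithmetic.

    # Two views (slices) of a voltage divider.
    V_in, R1, R2 = 10.0, 4000.0, 1000.0

    # Slice 2: the divider viewed as a single resistor gives the current.
    i = V_in / (R1 + R2)          # Ohm's law on the equivalent circuit

    # Revert to slice 1: two resistors give the midpoint voltage.
    v_mid = V_in - i * R1         # drop across R1 in the detailed view
    print(i, v_mid)               # 0.002 amperes, 2.0 volts

In a pure propagation regime the detailed view alone deadlocks: the current through R1 needs the midpoint voltage and vice versa. The equivalent view breaks the deadlock, which is exactly the redundancy that slices provide.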
3.10. Case 10—Single source of knowledge is too weak

An important adjunct to the use of multiple lines of reasoning in problem solving is the use of multiple sources of knowledge. In this section we will consider the HEARSAY-II system, which coordinates diverse sources of knowledge using an opportunistic scheduler. HEARSAY-II is a system for speech understanding reported by Erman et al. [15]. It recognizes spoken requests for information from a data base. Production of speech involves a series of transformations starting with the speaker's intentions, through choice of semantic and syntactic structures, and ending with sound generation. To understand speech, HEARSAY-II must reverse this process.

In HEARSAY-II the knowledge for understanding speech is organized as a set of interacting modules (called knowledge sources or KSs) as shown by the arrows in Fig. 6. The KSs cooperate in searching a multi-level space of partial solutions. They extract acoustic parameters, classify acoustic segments into phonetic classes, recognize words, parse phrases, and generate and evaluate hypotheses about undetected words and syllables.

FIG. 6. Levels and knowledge sources in HEARSAY-II. (The original figure shows seven levels: parameter, segment, syllable, word, word sequence, phrase, and data-base interface.) The knowledge sources are as follows.
Semantics: generates the interpretation for the information retrieval system.
SEG: digitizes the signal, measures parameters, produces a labeled segmentation.
POM: creates syllable-class hypotheses from segments.
MOW: creates word hypotheses from syllable classes.
Word-Ctl: controls the number of hypotheses that MOW makes.
Word-Seq: creates word-sequence hypotheses for potential phrases.
Word-Seq-Ctl: controls the number of hypotheses that Word-Seq makes.
Predict: predicts words that follow phrases.
Verify: rates consistency between segment hypotheses and contiguous word-phrase pairs.
Concat: creates a phrase hypothesis from a verified contiguous word-phrase pair.
RPOL: rates the credibility of hypotheses.

The KSs communicate through a global data base called a blackboard with seven information levels as shown in the figure. These levels are HEARSAY-II's heterogeneous abstraction spaces. The primary relationship between levels is compositional: word sequences are composed of words, words are composed of syllables, and so on. Entities on the blackboard are hypotheses. When KSs are activated, they create and modify hypotheses on the blackboard, record the evidential support between levels, and assign credibility ratings. For example, a sequence of acoustic segments can be evidence for identifying a syllable in a specific interval of the utterance; the identification of a word as an adjective can be evidence that the following word will be an adjective or noun.

HEARSAY-II's use of abstraction differs from the systems considered in cases 5 through 8. Those systems all use uniform abstraction spaces. The abstractions are uniform in that they use the same vocabulary as the final solutions and differ only in the amount of detail. For example, in the planning systems, abstract plans have the same structure and vocabulary as final plans. In HEARSAY-II, the diversity of knowledge needed to solve problems justifies the use of heterogeneous abstraction spaces.

A computational system for understanding speech is caught between three conflicting requirements: a large space of possible messages to understand, variability in the signal, and the need to finish in a limited amount of time. The number of possible ideal messages is a function of the vocabulary, language constraints, and the semantics of the application. The number of actual messages that a system encounters is much larger than this because speech is affected by many sources of variability and noise. At the semantic level, errors correspond to peculiarities of conceptualization. At the syntactic level, errors correspond to peculiarities of grammar. At the lexical and phonemic levels, the variance is in word choice and articulation. Errors at the lower levels compound difficulties at the higher levels. For example, the inability to distinguish between the four phrases till Bob rings, tell Bob rings, till Bob brings, tell Bob brings may derive from ambiguities at the acoustic levels. HEARSAY-II copes with this by getting the KSs at different levels to cooperate in the solution process.
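The blackboard discipline can be suggested in miniature. The sketch below is hypothetical and far simpler than HEARSAY-II: two levels, a single toy KS, and a scheduler that always runs the most credible pending activation first.

    import heapq

    # A toy blackboard with two levels and rated hypotheses.
    blackboard = {"syllable": [("bob", 0.9), ("rob", 0.6)], "word": []}
    agenda = []                            # (-rating, tiebreak, KS, hypothesis)

    def mow(hyp):                          # a toy word hypothesizer
        text, rating = hyp
        blackboard["word"].append((text.upper(), rating * 0.95))

    for n, hyp in enumerate(blackboard["syllable"]):
        heapq.heappush(agenda, (-hyp[1], n, mow, hyp))

    while agenda:                          # opportunistic scheduling:
        _, _, ks, hyp = heapq.heappop(agenda)   # most credible work first
        ks(hyp)

    print(blackboard["word"])              # 'BOB' is pursued before 'ROB'

In HEARSAY-II the agenda holds activations of many KSs at many levels, and the ratings themselves are revised as evidence accumulates.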
This has led to the following interesting architectural features:
(1) HEARSAY-II combines both top-down and bottom-up processing.
(2) HEARSAY-II reasons about resource allocation with a process called opportunistic scheduling.

An example of top-down processing is the reduction of a general sentential concept into alternate sentence forms, each sentence form into specific alternative word sequences, specific words into alternative phonic sequences, and so on until a best interpretation is identified. Bottom-up processing tries to synthesize interpretations from the data. For example, one might combine temporally adjacent word hypotheses into syntactic or conceptual units. In HEARSAY-II, some KSs use top-down processing and other KSs use bottom-up processing.

All KSs compete to be scheduled, and HEARSAY-II tries to choose the most promising KSs at any given moment using opportunistic scheduling. Opportunistic scheduling combines the idea of least commitment with strategies for managing limited computational resources. The opportunistic scheduler adapts automatically to changing conditions of uncertainty by changing the breadth of search. The basic mechanism for this is the interaction between KS-assigned credibility ratings on hypotheses and scheduler-assigned priorities of pending KS activations. When hypotheses have been rated equally, KS activations for their extension are scheduled together. In this way, ambiguity between competing hypotheses causes HEARSAY-II to search with more breadth, and to delay the choice among competing hypotheses until more information is brought to bear.

HEARSAY-II's approach to data interpretation differs from that of GA1 discussed in Section 3.4. Both programs contend with very large search spaces. Both programs need to have effective ways to rule out large classes of solutions. GA1 does this with early pruning. In the absence of constraints, it would expand every solution in the space. HEARSAY-II constructs a complete solution by extending and combining partial candidates. Because of its opportunistic scheduler, it heuristically selects a limited number of partial candidates to pursue. To avoid missing solutions, HEARSAY-II must not focus the search too narrowly on the most 'promising' subspaces.

In summary, HEARSAY-II provides an example of an architecture created to meet several conflicting requirements. Multiple levels provide the necessary abstractions for searching a large space. The levels are heterogeneous to match the diversity of the interpretation knowledge. Opportunistic scheduling combines the least-commitment idea with the ability to manage computational resources by varying the breadth of search and by combining top-down and bottom-up processing.

3.11. Case 11—General representation methods are too inefficient

Research on expert systems has benefited from the simplicity of using uniform representation systems. However, as knowledge bases get larger, the efficiency penalty incurred by using declarative and uniform representations can become significant. Attention to these matters will become increasingly important in ambitiously conceived future expert systems with increasingly large knowledge bases. This section considers architectural approaches for tuning the performance of expert systems by making changes to the representation of knowledge. Three main ideas will be considered:
(1) use of specialized data structures;
(2) knowledge compilation;
(3) knowledge transformations for cognitive economy.

The organization of data structures has consequences for the efficiency of retrieving information.
The selection and creation of efficient data structures is a principal part of most computer science curricula. Consequently, several of the programs discussed in the previous cases (e.g., DENDRAL, GA1, and HEARSAY-II) use specialized data structures. In general, these data structures are designed so that facts that are used together are stored together and facts that are used frequently can be retrieved efficiently. Choice of data structure depends on assumptions about how the data will be used.

A common assumption about special data structures is that they are complete with regard to specific relationships. For example, in GA1's data structure for molecular hypotheses, molecular segments are connected if and only if they are linked in the data structure. This sort of assumption is commonplace in representations like maps, which are assumed to show all of the streets and street intersections. Representations whose structure in the medium is analogous to the structure of the world being modeled are sometimes called analogical representations.

A less understood issue is the use and selection of data structures in systems where many kinds of information are used. There is not much experience with systems that mix a variety of different representations. One step in this direction is to tag relations with information describing the chosen representation so that specialized information can be accessed and manipulated using uniform mechanisms. Some ideas along this line appeared in Davis' thesis [8], where schemata were used to describe some formatting and computational choices, but the work has not been extended to describe general dimensions of representation (see [4]) for use by a problem solver.

A second important idea for knowledge bases is the idea of compiling knowledge. By compilation, we mean any process which transforms one representation of knowledge into another representation which can be used more efficiently. This transformation process can include optimization as well as the tailoring of representations for particular instruction sets. Space does not permit a detailed discussion of techniques here, but some examples are listed below to suggest the breadth of the idea.

Example 1. Burton reported a system for taking ATN grammars and compiling them into executable code [5]. The compiled grammars could be executed to parse sentences much more rapidly than previous interpreter-based approaches.

Example 2. Production system languages have been studied and experimented with for several years (e.g., see [9]). A basic difficulty with many production systems is that large production system programs execute more slowly than small ones. The extra productions do not need to execute to slow down the system; their mere presence interferes with the matching process that selects productions to run. An example of one such language is OPS2, reported by Forgy and McDermott [19]. Forgy conducted a study of ways to make such production systems more efficient by compiling them into a network of 'node programs' [20]. The compiler exploits two properties of production systems: structural similarity and temporal redundancy. Structural similarity refers to the fact that many productions contain similar conditions; temporal redundancy refers to the fact that individual productions change only a few facts in the memory, so that most of the data is unchanged from cycle to cycle. Forgy's RETE matching process exploits this by looking only at the changes in the memory.
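The effect of temporal redundancy can be suggested with a toy matcher. The sketch below is not the RETE algorithm; with hypothetical rules and facts, it merely shows how matching only against changes avoids rescanning the whole working memory on every cycle.

    # Match incrementally: each new fact updates stored partial matches.
    rules = {"goal-ready": {"has-goal", "has-plan"}}
    memory, partial = set(), {r: set() for r in rules}

    def assert_fact(fact):
        """Process one change; unchanged facts are never re-examined."""
        memory.add(fact)
        for rule, conds in rules.items():
            if fact in conds:
                partial[rule].add(fact)        # remember the partial match
                if partial[rule] == conds:
                    print("fire:", rule)

    assert_fact("has-goal")      # no firing yet; partial match is stored
    assert_fact("irrelevant")    # ignored by the matcher entirely
    assert_fact("has-plan")      # fire: goal-ready

RETE additionally shares tests among structurally similar productions, which is where the second property, structural similarity, is exploited.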
Forgy's analysis shows how several orders of magnitude of speedup can be achieved by compiling the productions and by making some simple changes to computing hardware.

Example 3. Another system that compiles a knowledge base of production rules, EMYCIN, was reported by van Melle [39]. EMYCIN is not considered a 'pure' production system since it is not strictly data-driven; the order in which the rules are tried in EMYCIN is controlled by the indexing of parameters. This means that EMYCIN's interpreter does not repeatedly check elements in the working memory, so some of the optimizations used by Forgy would provide much less of an improvement for EMYCIN. EMYCIN's rule compiler concentrates on eliminating redundancy in the testing of similar patterns in rules and compiles them into decision trees represented as LISP code.

Example 4. The HARPY system for speech recognition reported by Lowerre [24] illustrates several issues about compilation. HARPY represents the knowledge for recognizing speech in a unified data structure (context-free production rules) which represents the set of all possible utterances in HARPY's domain. This data structure represents essentially the same information that was used in HEARSAY-II, except for the parameterization and segmentation information. HARPY's knowledge compiler combines the syntax, lexical, and juncture knowledge into a single large transition network. First, it compiles the grammar into a word network; then it replaces each word with a copy of its pronunciation graph and inserts word-juncture rules at the word boundaries. In the final network, each path from a start node to an end node represents a sequence of segments for some sentence. With the knowledge in its compiled form, HARPY is capable of performing a rapid search process that attempts to find the best match between the utterance and the set of interpretations. The major concerns about the extensibility of this idea are (1) that the highly stylized form of the input that HARPY can accept makes it difficult to add new knowledge and (2) that the compilation is expensive for a large knowledge base (13 hours of PDP-10 time for a simple thousand-word grammar).

The promise of knowledge compilation ideas is to make it possible to use very general means for representing knowledge while an expert system is being built and debugged. Then a compiler can be applied to make the knowledge base efficient enough to compete with hand coding. Given a compiler, there is no need to sacrifice flexibility for efficiency: the knowledge base can be changed at any time and recompiled as needed. In addition, the compiler can be modified to re-represent the knowledge efficiently as hardware is changed, or as the trade-offs of representation become better understood. The techniques for doing this are just beginning to be explored and will probably become increasingly important in the next few years.

So far in our discussion of efficiency we have assumed that it is necessary for the designers of a knowledge base to anticipate how knowledge will be used and to arrange for it to be represented efficiently. Lenat, Hayes-Roth, and Klahr [23] coined the term cognitive economy to refer to systems which automatically improve their performance by changing representations, changing access (e.g., caching), and compiling knowledge bases. Systems like this need to be able to predict how representations should be changed, perhaps by measurements on representative problems.
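One of the tactics mentioned, changing access by caching, is easy to illustrate. The sketch below is a toy, not drawn from [23]: a derived result is stored the first time it is computed and reused thereafter.

    from functools import lru_cache

    calls = 0

    @lru_cache(maxsize=None)
    def derived_fact(x):
        global calls
        calls += 1                 # stands in for an expensive deduction
        return x * x

    derived_fact(12); derived_fact(12); derived_fact(12)
    print(calls)                   # 1: the re-derivation was avoided

The open problem the text identifies is not the mechanism but the policy: predicting which derivations are worth remembering.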
The ideas of cognitive economy and knowledge compilation are more speculative than the other ideas we have considered, and there are many theoretical and pragmatic issues to be resolved before they can be widely used. They are included here at the end in the hope that they will receive more attention in artificial intelligence research.

4. Summary

Our pedagogical tour of cases began with the consideration of a very simple architecture. It required that problems in an application have a small search space and that data and knowledge be reliable and constant.

In the second case we considered ways to cope with unreliable data or knowledge. Probabilistic, fuzzy, and exact methods were discussed. All of these methods are based on the idea of increasing reliability by combining evidence. Each method requires the use of meta-knowledge about how to combine evidence. The probabilistic (and pseudo-probabilistic) approaches use various a priori and conditional probability estimates; the fuzzy approaches use fuzzy set descriptions; the exact approaches use non-monotonic data correction rules. Errorful data and knowledge seem to be ubiquitous in real applications. In the HEARSAY-II example of case 10, we saw how an opportunistic scheduler was used to cope with the conflicting requirements of errorful data, limited computational resources, and a large search space by varying the breadth of a heuristic search.

In the third case we considered ways to work with time-varying data. We started with the situational calculus and then considered a program that used transition rules to trigger expectations in a monitoring task. More sophisticated ways to reason with time seem to require more research.

The remaining cases dealt with ways to cope with large search spaces. We started with the hierarchical generate-and-test approach in the fourth case, which requires a search space to be factorable in order to allow early pruning. This approach explores the space of solutions systematically and can be quite effective for returning all consistent solutions. We considered the issues of canonical forms and completeness in using this approach. Generate and test is a weak method, applicable only when early pruning is feasible. It requires the ability to generate candidates in a way that allows large classes to be pruned from very sparse partial solutions. In many applications, solutions must be instantiated in substantial detail before they can be ruled out. Fortunately, it is not necessary in all applications to consider all possible solutions and to choose the best. The next few cases describe reasoning with abstractions to reduce the combinatorics without early pruning. The methods presented are usually applicable in satisficing tasks.

In case five, we considered a form of reasoning based on fixed abstractions. This approach requires that all of the information needed for testing partial solutions be available (or be able to be generated) before a subproblem is generated. This requirement exposes the weakness of this approach: some problems cannot be solved from the available information without backtracking. While the abstractions make the problem solver efficient, their use is too rigid for some applications.

The next approach to abstraction is more flexible. Abstractions are composed from a set of concepts in a hierarchical space. This method is called top-down refinement.
The simplest version of top-down refinement (case six) uses a fixed criticality ordering of the concepts and a fixed partial order for solving subproblems. Top-down refinement does not allow for variability in the readiness to make decisions. In the seventh case, we introduced the least-commitment principle, which says that decisions should be postponed until there is enough information. This approach tends to exploit the synergy of interactions between subproblems. It requires the ability to suspend activity in subproblems, move information between them, and then restart them as information becomes available. In this case, the problem-solving knowledge is much richer than in the previous methods. It was suggested that principles like least commitment should be incorporated as part of meta-level problem solving.

An inherent difficulty with pure least-commitment approaches is the phenomenon of a least-commitment deadlock. When a problem solver runs out of decisions that it knows it can make correctly, it must guess to continue. We suggested that the amount of guessing is a measure of its missing knowledge, but that knowledge bases will always be incomplete. This theme was continued in case eight, where dependency-directed backtracking was discussed as an architecture to support efficient retraction of beliefs in plausible reasoning.

Cases nine and ten illustrated: (1) the use of multiple lines of reasoning to enhance the power of a problem solver, (2) the use of heterogeneous abstraction models to capture the variety of knowledge in some applications, and (3) the use of an opportunistic scheduler to use knowledge sources as soon as they become applicable (either top-down or bottom-up) and to control the breadth of search.

Finally, we considered some methods for speeding up processing and information retrieval: specialized data structures and knowledge compilers. These techniques do not attack the basic combinatorics of search, but they do reduce the cycle time of problem solvers and will become increasingly important in future expert systems with large knowledge bases.

In articulating these ideas about expert systems, we were forced to decide what was essential and important about expert systems. In some cases, other choices of expert systems could have been made to make the same points. Our view is that the recent critical work in expert systems has focused on the mechanisms of problem-solving, and research in this area has been the most fruitful when it has been directed towards substantial applications. The applications and the research reported here are both empirical: knowledge bases are tried, tested, and revised; the architectural research follows a similar, but slower cycle.

An expert system is always a product of its time. System builders operate against a background of competing ideas and controversies and must also confront the limitations of their resources. To summarize experiments and create a simplified theory, we must necessarily step outside of this rich historical context. Inescapably, we are bound to our current vantage point. As this paper is written, there is a ferment of activity in expert system research. The theory of building intelligent systems is far from complete, and the ideas expressed here are by no means universally accepted in the AI community. To wait for the ideas to settle and survive the test of history would preclude creating a timely guide to current thinking.
ACKNOWLEDGMENT

In August 1980, the National Science Foundation and the Defense Advanced Research Projects Agency co-sponsored a workshop on expert systems in San Diego. The conference organizers were Rick Hayes-Roth, Don Waterman, and Douglas Lenat. The purpose of this workshop was to bring together researchers from many different institutions to write a definitive book [21] on expert systems. The participants were organized in groups corresponding to book chapters. The ideas in this paper are the product of the 'architecture' group, of which Mark Stefik was the appointed chairman at the conference. The other members of the group were the other co-authors of this paper. During the conference we struggled intensively for three days to organize these ideas. The task of writing this tutorial fell to the chairman and was carried out over the next several months. Credit, however, belongs to all of the members of the group for actively and generously pulling together to define the scope of this tutorial, sharpen distinctions, organize the ideas, and later critique versions of the text.

Thanks also to Daniel Bobrow, John Seely Brown, Johan de Kleer, Richard Fikes, Adele Goldberg, Richard Lyon, John McDermott, Allen Newell, and Chris Tong for reading early versions of this text and providing many helpful suggestions. Thanks to Lynn Conway for encouraging this work, and to the Xerox Corporation for providing the intellectual and computing environments in which it could be done.

REFERENCES

1. Barr, A. and Feigenbaum, E.A., The Handbook of Artificial Intelligence, Computer Science Department, Stanford University, Stanford, CA, 1980.
2. Buchanan, B.G. and Feigenbaum, E.A., DENDRAL and Meta-DENDRAL: Their applications dimension, Artificial Intelligence 11 (1978) 5-24.
3. Bobrow, D.G. and Winograd, T., An overview of KRL, a knowledge representation language, Cognitive Sci. 1(1) (1977) 3-46.
4. Bobrow, D.G., Dimensions of representation, in: D.G. Bobrow and A. Collins, Eds., Representation and Understanding (Academic Press, New York, 1975).
5. Burton, R.R., Semantic grammar: An engineering technique for constructing natural language understanding systems, Bolt Beranek and Newman Rept. No. 3453 (December 1976).
6. Charniak, E., Riesbeck, C.K. and McDermott, D.V., Artificial Intelligence Programming (Erlbaum, Hillsdale, NJ, 1980).
7. Davis, R., Buchanan, B.G. and Shortliffe, E., Production rules as a representation for a knowledge-based consultation program, Artificial Intelligence 8 (1977) 15-45.
8. Davis, R., Applications of meta-level knowledge to the construction, maintenance and use of large knowledge bases, Ph.D. Thesis, Stanford University, AI Lab Memo AIM-283 (July 1976).
9. Davis, R. and King, J., An overview of production systems, in: E.W. Elcock and D. Michie, Eds., Machine Intelligence (Wiley, New York, 1976) 300-332.
10. de Kleer, J. and Sussman, G.J., Propagation of constraints applied to circuit synthesis, Circuit Theory Appl. 8 (1980) 127-144.
11. Doyle, J., A truth maintenance system, Artificial Intelligence 12 (1979) 231-272.
12. Doyle, J. and London, P., A selected descriptor-indexed bibliography to the literature on belief revision, SIGART Newsletter 71 (1980) 7-23.
13. Doyle, J., A model for deliberation, action, and introspection, AI TR 581, Artificial Intelligence Laboratory, MIT, Cambridge, MA (May 1980).
14. Duda, R.O., Hart, P.E.
and Nilsson, N.J., Subjective Bayesian methods for rule-based inference systems, SRI International, Artificial Intelligence Center Technical Note 124 (January 1976).
15. Erman, L.D., Hayes-Roth, F., Lesser, V.R. and Reddy, D.R., The HEARSAY-II speech-understanding system: Integrating knowledge to resolve uncertainty, ACM Comput. Surveys 12(2) (1980) 213-253.
16. Fagan, L.M., Kunz, J.C., Feigenbaum, E.A. and Osborn, J.J., Representation of dynamic clinical knowledge: Measurement interpretation in the intensive care unit, Proc. Sixth Internat. Joint Conf. Artificial Intelligence (August 1979).
17. Fagan, L.M., VM: Representing time-dependent relations in a medical setting, Doctoral Dissertation, Computer Science Department, Stanford University, Stanford, CA (June 1980).
18. Feigenbaum, E.A., The art of artificial intelligence: I. Themes and case studies of knowledge engineering, Proc. Fifth Internat. Joint Conf. Artificial Intelligence (1977) 1014-1029.
19. Forgy, C. and McDermott, J., OPS: A domain-independent production system, Proc. Fifth Internat. Joint Conf. Artificial Intelligence (1977) 933-939.
20. Forgy, C.L., On the efficient implementation of production systems, Doctoral Dissertation, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA (February 1979).
21. Hayes-Roth, F., Waterman, D. and Lenat, D., Eds., Building Expert Systems (in preparation).
22. Hayes-Roth, B. and Hayes-Roth, F., A cognitive model of planning, Cognitive Sci. 3 (1979) 275-310.
23. Lenat, D.B., Hayes-Roth, F. and Klahr, P., Cognitive economy, Computer Science Department, Stanford University, Heuristic Programming Project Rept. HPP-79-15 (June 1979).
24. Lowerre, B.T., The HARPY speech recognition system, Ph.D. Thesis, Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA, 1976.
25. McCarthy, J. and Hayes, P.J., Some philosophical problems from the standpoint of artificial intelligence, in: B. Meltzer and D. Michie, Eds., Machine Intelligence 4 (Edinburgh University Press, Edinburgh, 1969).
26. McAllester, D.A., An outlook on truth maintenance, MIT AI Memo No. 551 (April 1980).
27. McDermott, J., R1: A rule-based configurer of computer systems, Department of Computer Science, Carnegie-Mellon University, Rept. CMU-CS-80-119 (April 1980).
28. Newell, A., Artificial intelligence and the concept of mind, in: R.C. Schank and K.M. Colby, Eds., Computer Models of Thought and Language (Freeman, San Francisco, 1973).
29. Newell, A., Some problems of basic organization in problem-solving programs, in: M.C. Yovits, G.T. Jacobi and G.D. Goldstein, Eds., Proc. Second Conf. on Self-Organizing Systems (Spartan Books, Washington, DC, 1962).
30. Nilsson, N.J., Principles of Artificial Intelligence (Tioga, Palo Alto, 1980).
31. Pednault, E.P.D., Zucker, S.W. and Muresan, L.V., On the independence assumption underlying subjective Bayesian updating, Artificial Intelligence 16 (1981) 213-222.
32. Sacerdoti, E.D., A Structure for Plans and Behavior (Elsevier, New York, 1977).
33. Sacerdoti, E.D., Planning in a hierarchy of abstraction spaces, Artificial Intelligence 5(2) (1974) 115-135.
34. Shortliffe, E.H., Computer-Based Medical Consultations: MYCIN (Elsevier, New York, 1976).
35. Stallman, R.M. and Sussman, G.J., Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis, Artificial Intelligence 9 (1977) 135-196.
36. Stefik, M.J., Planning with constraints, Artificial Intelligence 16(2) (1981) 111-140.
37. Stefik, M., Inferring DNA structures from segmentation data, Artificial Intelligence 11 (1978) 85-114.
38. Sussman, G.J. and Steele, G.L., CONSTRAINTS—A language for expressing almost-hierarchical descriptions, Artificial Intelligence 14 (1980) 1-39.
39. van Melle, W., A domain-independent system that aids in constructing knowledge-based consultation programs, Doctoral Dissertation, Computer Science Department, Stanford University, Rept. No. STAN-CS-80-820 (June 1980).
40. Zadeh, L.A., A theory of approximate reasoning, in: J.E. Hayes, D. Michie and L.I. Mikulich, Eds., Machine Intelligence 9 (Wiley, New York, 1979).
41. Zadeh, L.A., Possibility theory and soft data analysis, University of California at Berkeley, Electronics Research Laboratory, Memo No. UCB/ERL M79/66 (August 1979).

Received August 1981