PII: 0898-1221(90)90117-3


Computers Math. Applic. Vol. 20, No. 9/10, pp. 125-140, 1990    0097-4943/90 $3.00+0.00
Printed in Great Britain. All rights reserved    Copyright © 1990 Pergamon Press plc

DIAGNOSING JAUNDICE EXPERT SYSTEM

L. Ü. YALÇINALP and L. STERLING
Department of Computer Engineering and Science, Case Western Reserve University,
Cleveland, OH 44106, U.S.A.

Abstract--DIJEST (Diagnosing Jaundice Expert SysTem) is a medical expert system which produces a
differential diagnosis of a patient presenting with jaundice. DIJEST is written in Prolog, and illustrates
the use of the language for clearly expressing knowledge. Specifically, the expert system contains explicit
declarative knowledge of anatomy and physiology which is used by clinicians when diagnosing obstructive
jaundice. The inference engine matches patient records against expected manifestations of symptoms in
diseases. Novel in DIJEST is the uncertainty reasoning scheme, using contribution and absence factors,
which places equal importance on symptoms present, absent and unknown in the patient's medical record.
Domain specific reasoning and domain specific knowledge are clearly separated from general inference
capabilities and knowledge representation schemes. DIJEST has performed well in preliminary tests, being
particularly impressive for patients with multiple diseases.

1. INTRODUCTION

Research in medical expert systems, a major application area of AI, has led to the development
of, and experimentation with, new schemes for representing knowledge. No universal tool or
technique has emerged. Each research group has its own style affected by the problem domain and
the medical expertise being used.

This paper describes a new medical expert system, DIJEST (Diagnosing Jaundice Expert
SysTem), which is concerned with the differential diagnosis of patients with obstructive jaundice.
DIJEST evolved with its major objective to explore present knowledge representation techniques
and to introduce a declarative style for modelling clinical problem solving. Subsequently another
issue became critical, namely the modelling of uncertainty reasoning during the many stages of
consultation and diagnosis of a disease.

DIJEST has yet another frame-based scheme for knowledge representation and reasoning with
uncertainty. It is developed using Prolog. We are able to combine different knowledge representation
techniques in a single framework due to the flexibility of Prolog in the design of different
data structures for the system. Specifically, features of frame-based and rule-based representations
were integrated with a new calculus for uncertainty reasoning. General medical knowledge about
the domain was easily represented in Prolog declaratively. It also enabled a clear representation
of inference as a specialized interpreter handling the data structures.

Our scheme for uncertainty reasoning is novel due to a strong dependence on the interpretation
of present and absent data: information that is known to exist, information that is known not to
exist, and information that is unknown at the time of consultation with the program. This was a
constraint imposed by our medical experts. It allows context-dependent evaluation of the patient data.
The scheme uses contribution and absence factors which are attached to particular manifestations of
a disease. These factors constitute a numerical representation which complements the qualitative
descriptions in DIJEST. These qualitative descriptions mirror the medical experts' definition of the
characteristics of manifestations.

Our work has been influenced by the famous medical expert systems, MYCIN, Internist and PIP.
The representation of knowledge using frames is similar to PIP's [1]. The concept of contribution
and absence factors evolved from investigation of the confidence factors of MYCIN [2] and the
evoking strengths and frequencies in Internist [3]. In addition, representing common-sense
knowledge in DIJEST is affected by the representation of properties in Internist [4] and the use of
logical decision criteria in PIP [5].

The basic system was designed to be expanded and enhanced to incorporate the stages of clinical
reasoning during the course of a patient's treatment. DIJEST has been tested on sample patient
cases taken from medical textbooks and patient records. It performs at an acceptable level
according to our experts. An interesting feature is its handling of multiple diseases contributing
to the jaundice.

The paper is organized as follows. After a brief overview of DIJEST's scope, we present DIJEST's
architecture and the multi-layered knowledge representation in the system. The next section
describes the uncertainty reasoning mechanism which underlies the modelling of diagnosis,
followed by our conclusions. We emphasize in this account how Prolog can be used to develop
an expert system.

2. SCOPE OF DIJEST

2.1. The problem of obstructive jaundice

Jaundice is the yellow pigmentation of the skin or sclerae by bilirubin. This in turn is a result
of elevated levels of bilirubin in the blood stream [6]. There are several reasons for this elevation.

Most of the bilirubin is derived from the catabolism of hemoglobin present in the red blood cells.
The bilirubin is transformed into bile and the liver plays a central role in this metabolism of the
bile pigments. The derangements of this metabolism cause several diseases which have jaundice as
a common symptom.

The elevation of the bilirubin might be related to pathogenetic mechanisms or disease processes.
We are concerned with a subset of these diseases which cause obstructive jaundice. This is jaundice
due to the mechanical obstruction of the biliary radicles or functional factors that cause impaired
hepatic excretion of bilirubin into bile.

Figure 1 is a simplified diagram of the organs that are related to the flow of bile to the intestine
after its excretion from the liver. The enlargement of any organ near the bile ducts can block the
flow of bile, thereby causing obstructive jaundice. The principal examples are inflammation of the
gallbladder, liver or pancreas, and a tumor or a cystlike mass in the head of the pancreas. Obstructive
jaundice can also be caused by gallstones leaving the gallbladder, lodging in the bile ducts and
blocking the flow. Diagnosing the most common causes of obstructive jaundice as mentioned above
is the primary focus of DIJEST. Specifically, DIJEST considers viral hepatitis, alcoholic hepatitis,
cirrhosis, cholecystitis, choledocholithiasis, pancreatitis, pancreatic cancer and pancreatic
pseudocyst. Hepatitis is the inflammation of the liver. We are concerned with two types of hepatitis,
one caused by excessive consumption of alcohol and the other by a virus. Cirrhosis is the chronic
irreversible injury of the liver. Cholecystitis is the inflammation of the gallbladder, whereas
choledocholithiasis refers to the obstruction of the bile duct by gallstone(s). Pancreatitis is
inflammation of the pancreas. Pancreatic cancer refers to a cancerous growth, while pancreatic
pseudocyst refers to the cystlike masses at the head of the pancreas.

It is critical to differentiate the mechanism that causes the obstruction and the site of the
obstruction in clinical practice. Therefore, DIJEST is designed to produce possible diagnoses by

[Figure: labelled anatomical diagram showing the intrahepatic ducts, cystic duct, pancreas, duodenum and ampulla of Vater.]

Fig. 1. The anatomy of organs participating in the bile flow.



[Figure: block diagram. Boxes above the MATCHER: patient profile (history, clinical exam, lab tests), disease descriptors, general medical knowledge and candidate diseases. Boxes below the MATCHER: evaluation of patient (dynamic patient data) and likelihood estimates for candidate diseases. Further components: SCREENING and user interaction.]

Fig. 2. The architecture of DIJEST.

indicating the likelihood of each of these diseases and the differentiating factors that lead to the
diagnosis.

2.2. Architecture of DIJEST

DIJEST's system structure as initially planned is shown in Fig. 2. The system is constructed
around its most important component, the specialized interpreter that we call the MATCHER. In 
this section we describe the function of the system components. 

The boxes above the MATCHER indicate the knowledge used by DIJEST. Diseases are 
represented by disease descriptors. The candidate diseases are the list of diseases that are to be 
considered for differential diagnosis. The patient profile consists of all the knowledge related to 
a particular patient. 

The MATCHER analyzes the patient, producing an evaluation of the patient and likelihood
estimates for candidate diseases. The MATCHER evaluates the current mixture of known,
uncertain, partially satisfied and unknown findings of a patient with respect to the candidate
diseases. The evaluation of each patient includes any contradictory evidence and suggestions for
additional tests that should be performed. The results will provide feedback for the next stage of
diagnosis. More details of the MATCHER are given in Section 4.

A full evaluation of the output of the MATCHER was intended to be considered by the screening 
process. Currently the patient profile is examined for the manifestations expected by the disease 
descriptors. The screening process would also evaluate significant patient data that is not explained 
by the differential diagnosis. 

3. KNOWLEDGE REPRESENTATION IN DIJEST

The knowledge base of DIJEST consists of medical knowledge about jaundice and information 
about the patient. Knowledge of jaundice is divided into descriptions of the diseases which cause 



disease_desc(choledocholithiasis,
    history(not_appl,                          % previous illness
            not_appl,                          % previous tests
            not_appl,                          % exposed_to
            not_appl,                          % family_background
            [                                  % symptoms
             (pain, [site(abdomen,5),
                     severity(none_to_severe,1),
                     continuity(intermittent,5),
                     duration(short,3),
                     coupled_by(nausea,1),
                     coupled_by(vomiting,1),
                     threshold(13)],
              contribution_absence_factors(0.9,-0.1)),
             normalization_factor(0.9)],
            not_appl,                          % observations
            not_appl,                          % drug_use
            not_appl                           % surgery
           ),
    clinical(
        [(jaundice, [pace(fast,2),
                     pace(medium_slow,1.3),
                     threshold(1.3)],
          contribution_absence_factors(0.6,-2.0)),
         (gallbladder, present, contribution_absence_factors(0.9,-2.0)),
         (tender_abdomen, [site(upper_quadrant,5),
                           condition(attacks,5),
                           threshold(10)],
          contribution_absence_factors(0.7,-2.0)),
         normalization_factor(2.2)]
        ),
    labtests(
        [(obstructive_tests, disease_related, contribution_absence_factors(0.7,-2.0)),
         (gallbladder, gallstones(present), contribution_absence_factors(0.8,0.2)),
         (common_bile_duct, obstruction(present), contribution_absence_factors(0.9,-2.0)),
         (common_bile_duct, dilatation(abnormal), contribution_absence_factors(0.3,-2.0)),
         normalization_factor(2.7)]
        )).

Fig. 3. Disease descriptor of choledocholithiasis.

jaundice, and general medical knowledge. This section describes the significant points in our
representation.

3.1. Disease descriptors

The diseases that have jaundice as a common symptom are represented individually in DIJEST
by disease descriptors or DDs. A DD describes the prototypical characteristics of a patient who
has the disease and it is represented by a framelike structure in Prolog. Figure 3 shows the disease
descriptor for choledocholithiasis.

Each disease descriptor is a quadruple indexed by the disease name. The expected characteristics
related to the history, clinical examination and the laboratory tests are the components of this
structure. We will refer to those three components as contexts. Each context consists of a number
of slots that show the logical subdivision within that context. For example, the history context has
eight slots showing previous diseases, previous tests, environmental and clinical factors which
suggest the disease if the patient has been exposed to them, facts that are related to family
background, expected previous symptoms, expected physical observations, usage of particular
drugs and previous abdominal surgery, respectively. The contexts for both clinical examination and
laboratory tests have only one slot. Slots that are not applicable for a specific disease are shown
by not_appl as illustrated by Fig. 3.

Each slot consists of specific characteristics related to the slot. They are called elements. They
are indicated by slanted uppercase letters in Fig. 3. An element is either a single characteristic or
a disjunction of characteristics.

A single characteristic is called a key tuple and is indexed by a key. A key is the smallest
component of this layered structure and it can be either a direct key or an extractable key.
A key tuple consists of key attributes, and a pair containing a contribution factor and an absence
factor. The key attributes of a key tuple show the characteristics of a key which are related
to the disease. They are defined with respect to the key type. The contribution and absence
factors are related to the uncertainty reasoning handled by the MATCHER and will be discussed
in the next section. They are represented as contribution_absence_factors(CF, AF) for clarity
in Fig. 3.

The type of a key determines how the diagnosis is handled by DIJEST. The different
key types are known by the MATCHER and handled differently. They are described
below.

I. Direct keys. A simple concept or a finding is represented by a direct key. The key attributes
of a direct key are given by a list of qualitative characteristics defining the key. For
each attribute, a number showing the importance of this qualitative description with respect
to the key is given as an integer between 1 and 5. It is called the significance value of
the attribute. The qualitative descriptions along with significance values describe the concept
or finding fully. For example, pain is a direct key, but its severity, duration or location
differs from one disease to another, as does their relative significance for defining the pain.
In Fig. 3, the respective attribute values of pain for choledocholithiasis are shown. For
example, a patient is expected to have pain with intermittent continuity. Since intermittent
pain is an important indicator for choledocholithiasis, it is given a significance value of 5.
Every direct key is defined with a threshold that is used by the MATCHER for uncertainty
reasoning.

II. Extractable keys. An extractable key represents a medical concept that cannot be described
as a simple finding. The concept needs to be extracted from the patient data. In order to simplify
their representation, they are shown as a single key in the disease descriptors. They are divided
into four categories to simplify the diagnosis:

    (i) Some of the keys represent anatomical or physiological states or concepts. The
        patient is expected to be in or have such states if he is likely to have the disease.
        For example, gallbladder is a key in the laboratory tests context, and the
        presence of gallstones is its defining attribute as shown in Fig. 3. The value of
        this attribute is used as an aid for determining the site, or the exact condition
        for such a key. Since there may be many tests that would determine whether
        the patient has the specified state, this representation allows the MATCHER to
        determine the necessary diagnostic tests that would indicate the condition given
        in this key.

   (ii) The names of blood tests are used as keys in DIJEST to show the tests
        required in the diagnosis, for example bilirubin, amylase and sgot. The
        analysis of the results of a particular blood test is disease dependent. The
        same blood test may indicate different likelihoods of the presence of a
        disease for different diseases. Possibility distribution curves are used to
        represent those ranges of results of blood tests. They are separate from the
        disease descriptors and were provided by our medical experts. How they
        are used with respect to the disease descriptors will be covered in the next
        section.

  (iii) Some keys refer to a collection of simple or complex findings. The individual
        findings in the collection might not be very significant on their own or affect the
        diagnostic process. However, their combination constitutes a medical concept
        and should be considered as a composite finding. We call such keys compound
        keys. The collections of individual findings are represented in separate tables for
        each compound key. Compound keys are represented as a combination of direct
        or extractable keys, but the contribution and absence factors are defined for the
        composite meaning. Prodrome is an example of a compound key which is used
        for diagnosing hepatitis. Figure 3 does not contain an example of a compound
        key.

   (iv) Some keys represent rules that DIJEST has to activate in order to check the
        presence of a disease. They are used by the MATCHER to evaluate the patient
        data that are related to different contexts or to compare the test results. For
        example, inflammation and obstructive tests are two rule names. The latter is
        shown in Fig. 3. The key attributes for this kind of key are not used since the
        concept to which they refer is embedded in the rules.

3.2. Patient data 

The information about a patient is given as input to DIJEST. All the information related to a
patient is indexed by a unique patient number. It is given in four different frames, analogous to
the contexts in the disease descriptors. We refer to the information about the patient as the patient
profile.

1. Patient ID record. Consists of identification information.
2. Medical history. This frame is similar to the history context of a DD. An example
   frame for a patient is shown in Fig. 4. It consists of six different slots
   corresponding to the first six slots in a DD. Drug and surgery information are
   represented as separate slots in the knowledge base if the patient has relevant data.

medical_history(10001,
    % previous diseases
    [(jaundice,[occurrence(negative)]),
     (alcoholism,[occurrence(negative)])],
    % previous_tests
    [(wbc,[date(2,2,1987),site(blood),result(10200)])],
    % exposed to
    [(hepatotoxins,[exp_to(negative)]),
     (jaundiced_people,[exp_to(negative)])],
    % family background
    [(jaundice,[occurrence(negative)])],
    % symptoms
    [(pain,[date(15,1,1987),
            site(abdomen),
            severity(severe),
            continuity(intermittent),
            duration(6,days),
            coupled_by(nausea),
            coupled_by(vomiting)]),
     (nausea,[date(15,1,1987)]),
     (vomiting,[date(15,1,1987),cause(large_dinner)]),
     (intolerance_fatty_foods,[reaction(negative)])],
    % observations
    [(skin,[color(yellow)]),
     (urine,[color(dark)]),
     (stool,[color(light_brown)])]).

Fig. 4. Medical history for patient 10001.


   Only simple findings with their related attribute values are represented in this
   frame. The attributes have both quantitative and qualitative descriptions. For
   example, the attribute duration for the key pain shows how long the patient has
   been in pain, and might have the value "6 days".
3. Clinical examination. This frame consists of two slots: the knowledge related to
   the actual physical examination of the patient and the results of the cardiovascular
   tests routinely taken. As in the history frame of the patient, the simple findings
   are represented as binary key tuples.
4. Tests. Each test frame for a patient is indexed by the test name and the patient
   ID number. It consists of information about the date of the test and its result with
   respect to the site where it is taken. For tests such as ultrasound, the results are
   given as a collection of findings related to a site, its state and its condition, because
   those tests are used to determine the condition of different parts of the body. For
   blood or urine tests, their specific site is indicated along with a single result.
   Examples for patient 10001 are shown below.

test(10001, date(2,3,1987), sgot, [(blood,80)]).

test(10001, date(2,3,1987), alk_phosp, [(blood,120)]).
test(10001, date(2,3,1987), ultrasound, [(gallbladder,edema,present),
                                         (common_bile_duct,dilations,s),
                                         (gallbladder,gallstones,present),
                                         (pancreas,swelling,normal),
                                         (pancreas_head,dilatation,normal),
                                         (pancreas,state,normal)]).
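
Because the test frames are ordinary Prolog facts, individual findings can be retrieved by unification alone. The following is a minimal sketch of such a lookup, not code taken from DIJEST: the predicates patient_finding/4 and blood_result/3 are hypothetical names introduced here, and only a subset of the facts for patient 10001 is repeated.

```prolog
:- use_module(library(lists)).   % member/2

% Test frames for patient 10001, a subset of the examples in the text.
test(10001, date(2,3,1987), sgot,      [(blood,80)]).
test(10001, date(2,3,1987), alk_phosp, [(blood,120)]).
test(10001, date(2,3,1987), ultrasound,
     [(gallbladder,edema,present),
      (gallbladder,gallstones,present),
      (pancreas,swelling,normal)]).

% patient_finding(+Patient, ?Site, ?Property, ?Value)   (hypothetical helper)
% Enumerates site/property/value triples from multi-finding tests.
% Pairs such as (blood,80) do not unify with a triple, so they are skipped.
patient_finding(Patient, Site, Property, Value) :-
    test(Patient, _Date, _Test, Findings),
    member((Site, Property, Value), Findings).

% blood_result(+Patient, ?Test, ?Value)   (hypothetical helper)
% Retrieves the single result recorded for a blood test.
blood_result(Patient, Test, Value) :-
    test(Patient, _Date, Test, [(blood, Value)]).
```

With these facts, the query patient_finding(10001, gallbladder, gallstones, V) binds V to present, and blood_result(10001, sgot, V) binds V to 80.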

3.3. General medical knowledge 

Medical knowledge in DIJEST is represented independently from any particular patient.
Examples of such knowledge are the general characteristics of jaundice, what the available tests
measure along with their possible sites, the restricted anatomy of the human body that concerns
the domain diseases, the domain specific qualitative representation of quantitative terms and the
possibility distribution curves of blood test results. This knowledge is used by the MATCHER
when creating the differential diagnosis. For example, ultrasound is used to determine the presence
of gallstones in the gallbladder or the size of the bile ducts, and sgot is a blood test whose primary
function is to detect liver injury. This information is represented declaratively in DIJEST and
illustrated below with a few examples.

lab_test(ultrasound, [(gallbladder,gallstones,[present,absent]),
                      (gallbladder,edema,[present,absent]),
                      (pancreas_head,dilation,[normal,s,inc]),
                      (extrahepatic_ducts,dilatation,[normal,s,inc]),
                      (intrahepatic_ducts,dilatation,[normal,s,inc]),
                      (liver,hepatic_texture,[homogen,not_homogen]),
                      (pancreas,swelling,[head,diffuse,normal]),
                      (pancreas,state,[atrophic,indurated,cyst,normal])]).

lab_test(uribirilogen, [(urine,excretion_bile,[present,absent,decreased,increased])]).
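
One use of this declarative knowledge, mentioned in Section 3.1, is to let the MATCHER determine which tests could establish a given condition. The sketch below shows how such a lookup might be written; test_for/3 and legal_value/4 are hypothetical predicate names, and only a subset of the lab_test/2 facts is repeated so that the fragment is self-contained.

```prolog
:- use_module(library(lists)).   % member/2

% A subset of the lab_test/2 facts given in the text.
lab_test(ultrasound, [(gallbladder,gallstones,[present,absent]),
                      (gallbladder,edema,[present,absent]),
                      (pancreas,swelling,[head,diffuse,normal])]).
lab_test(uribirilogen, [(urine,excretion_bile,[present,absent,decreased,increased])]).

% test_for(?Site, ?Property, ?Test)   (hypothetical helper)
% Test is a laboratory test able to report Property at Site.
test_for(Site, Property, Test) :-
    lab_test(Test, Measures),
    member((Site, Property, _Values), Measures).

% legal_value(+Test, +Site, +Property, ?Value)   (hypothetical helper)
% Value is one of the results the test may report for Property at Site.
legal_value(Test, Site, Property, Value) :-
    lab_test(Test, Measures),
    member((Site, Property, Values), Measures),
    member(Value, Values).
```

For instance, test_for(gallbladder, gallstones, T) binds T to ultrasound, which is the kind of suggestion for additional tests that the evaluation of a patient is said to include.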

4. PATIENT EVALUATION IN DIJEST

4.1. A Prolog-based MATCHER

As its name suggests, the MATCHER compares a patient profile with the disease descriptors present
in the system. It is a special interpreter written in Prolog which compares the frame structures, takes
into account present, absent and unknown factors and establishes likelihood scores for the presence
of a disease. The findings of a disease, namely its DD, are matched against a patient's profile in the
three different contexts of history, clinical exam and laboratory test data. A likelihood score is
calculated for each context, and the overall likelihood score for the disease is computed as the
average of the scores for the three contexts.

% Match one disease descriptor against the three contexts of a patient profile.
diagnose(Patient, History, Clinical, Tests, disease(Disease, DiseaseProb)) :-
    disease_desc(Disease, DH, DC, DT),
    eval_history(Disease, Patient, History, Clinical, Tests, DH, HistProb),
    eval_clinical(Disease, Patient, History, Clinical, Tests, DC, ClinicalProb),
    eval_tests(Disease, Patient, History, Clinical, Tests, DT, TestsProb),
    combine_prob(HistProb, ClinicalProb, TestsProb, DiseaseProb).

% The overall score is the average of the three context scores.
combine_prob(HP, CP, TP, FinalProb) :-
    FinalProb is (HP + CP + TP) / 3.0.

Looking at the sample disease descriptor in Fig. 3, and a patient's medical history record from
Fig. 4, it should be clear that the matching is not a direct unification of expected values of attributes
for keys for a slot in a particular context. Nevertheless, the interpreter uses unification to determine
key types to handle different types of keys. Details of the MATCHER will be covered in the
following sections.
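
The text does not list the clauses that discriminate key types, so the following is only a guess at how clause-head unification could separate the tuple shapes visible in Fig. 3. The predicate key_type/2 and the atoms direct, rule and state are hypothetical names; treating the atom disease_related as the marker of a rule key is an assumption based on the obstructive_tests entry, and blood test keys and compound keys are not covered.

```prolog
% key_type(+KeyTuple, -Type)   (hypothetical predicate)

% Direct key: the attribute part is a non-empty list of qualitative attributes.
key_type((_Key, [_|_], contribution_absence_factors(_, _)), direct).

% Rule key: assumed to be marked by the atom disease_related, as for
% obstructive_tests in Fig. 3.
key_type((_Key, disease_related, contribution_absence_factors(_, _)), rule).

% Any other attribute is treated here as an anatomical or physiological state.
key_type((_Key, Attr, contribution_absence_factors(_, _)), state) :-
    Attr \= [_|_],
    Attr \== disease_related.
```

Under these assumptions the tuple for pain is classified as direct, the tuple (gallbladder, gallstones(present), ...) as state, and the obstructive_tests tuple as rule, so an interpreter could dispatch to a different evaluation procedure for each.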

The findings of a patient are evaluated with respect to a list of domain diseases, called the
candidate disease list. The candidate disease list in the current version of DIJEST is all the known
diseases that are present in the knowledge base. Heuristic rules could be added as a front-end to
generate a shorter list. For example, some sets of symptoms suggest very strongly viral hepatitis
and nothing else. At the moment, all the candidate diseases are processed in a straightforward
manner and for each DD on the candidate disease list, a likelihood score is calculated which
represents the possibility that a patient has the disease.
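
Processing every candidate disease and ordering the results can be sketched with findall/3 and keysort/2. This is an illustration under stated assumptions rather than the DIJEST code: context_scores/4 holds made-up per-context scores that stand in for the output of the three evaluation predicates, and diagnose_stub/2 and differential/1 are hypothetical names.

```prolog
:- use_module(library(lists)).   % reverse/2

% Hypothetical per-context scores (history, clinical, tests); illustrative only.
context_scores(choledocholithiasis, 0.9, 0.6, 0.3).
context_scores(viral_hepatitis,     0.2, 0.4, 0.0).

% Average of the three context scores, as in the text.
combine_prob(HP, CP, TP, FinalProb) :-
    FinalProb is (HP + CP + TP) / 3.0.

% diagnose_stub(?Disease, -Prob): stand-in for diagnose/5, which would
% compute the context scores by matching a patient profile.
diagnose_stub(Disease, Prob) :-
    context_scores(Disease, H, C, T),
    combine_prob(H, C, T, Prob).

% differential(-Ranked): Score-Disease pairs, highest score first.
differential(Ranked) :-
    findall(Prob-Disease, diagnose_stub(Disease, Prob), Pairs),
    keysort(Pairs, Ascending),
    reverse(Ascending, Ranked).
```

For these made-up scores, differential(R) lists choledocholithiasis ahead of viral_hepatitis. A heuristic front-end of the kind suggested above would only need to restrict which diseases diagnose_stub/2 enumerates.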

4.2. Calculation of likelihood scores

A confidence measure (CM) is calculated separately for each slot in a context. The likelihood
score for the context is a weighted sum of the CM for all the slots in the context. The weighting
is affected by the number of relevant slots in a context. The relevance of a slot is disease-dependent.
For example, the family background of the patient is not relevant for choledocholithiasis as shown
in Fig. 3 and it is indicated by a not_appl value of the slot.
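
Since irrelevant slots are marked by the atom not_appl, the relevant slots of a context can be found by inspecting the arguments of the context term. The sketch below shows one way to do this; relevant_slots/2 and relevant_slot_count/2 are hypothetical names, and how the count enters the weighting is not specified in the text, so no weighting formula is attempted here.

```prolog
:- use_module(library(apply)).   % exclude/3

% relevant_slots(+Context, -Relevant)   (hypothetical helper)
% Context is a term such as history(S1,...,S8) or clinical(S1);
% Relevant is the list of its slots that are not marked not_appl.
relevant_slots(Context, Relevant) :-
    Context =.. [_Name | Slots],
    exclude(==(not_appl), Slots, Relevant).

% relevant_slot_count(+Context, -N)   (hypothetical helper)
relevant_slot_count(Context, N) :-
    relevant_slots(Context, Relevant),
    length(Relevant, N).
```

Applied to the history context of Fig. 3, where only the symptoms slot is filled, relevant_slot_count/2 yields 1.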

The CM for a slot is calculated from the CMs of all the elements in the slot. The confidence
measure for an individual slot element represents how much the patient profile satisfies the
requirements of that element of the disease descriptor. The calculation of individual CMs is tied
to our use of contribution and absence factors to be described below.

Recall that an element is either a single key tuple or a disjunction of them. The CM calculation
of a key tuple is determined by the key type, and the requirements satisfied by the patient profile
which is related to the contribution and absence factors. The CM of a disjunction of key tuples
is the largest CM of one of the disjuncts.
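
The rule for disjunctions can be written down directly. In this minimal sketch the term representation is an assumption made for illustration: cm(V) stands for a key tuple whose confidence measure V has already been computed, or(Elements) stands for a disjunction, and element_cm/2 is a hypothetical predicate name.

```prolog
:- use_module(library(lists)).   % max_list/2
:- use_module(library(apply)).   % maplist/3

% element_cm(+Element, -CM)   (hypothetical predicate)
% A single key tuple contributes its own confidence measure.
element_cm(cm(CM), CM).
% A disjunction contributes the largest confidence measure of its disjuncts.
element_cm(or(Disjuncts), CM) :-
    maplist(element_cm, Disjuncts, CMs),
    max_list(CMs, CM).
```

For example, element_cm(or([cm(0.2), cm(0.7), cm(-0.1)]), CM) gives CM = 0.7, so the best-matching alternative determines the element's contribution.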

If the MATCHER can find the manifestations defined for a key tuple that are expected to be
present in a patient with a particular disease, then this key tuple is validated. If only some of the
findings are existent, then this key is partially validated. When the patient profile is known not to
have those findings or there is evidence against the presence of the findings, then the key is
invalidated. The MATCHER considers the keys to be unknown if it cannot find the related
attributes from the patient profile in the case of a direct key, or extract it in the case of extractable
keys. Further details about the validation process are presented after the discussion of the use of
contribution and absence factors.

4.3. The role of contribution and absence factors

Contribution and absence factors are the essence of the mechanism for reasoning under
uncertainty in DIJEST. A contribution factor (CF) and an absence factor (AF) are defined for each
key in every key tuple of a slot in the DD. The CF determines the degree of importance of the
presence of the specific concept represented by the key name to the slot in which it occurs. It
indicates the expectation that a patient has the specific disease when the information in his/her
profile validates the requirements of this key. The contribution factor is defined as a real number
between 0 and 1, inclusive. For example, the contribution factor of the direct key pain is 0.9 for
choledocholithiasis as shown in Fig. 3. It shows that the presence of pain as defined by its respective
values is very important for choledocholithiasis.

The AF determines the importance of the absence of the concept in the patient profile. It
effectively measures the likelihood of a patient to have or not to have a disease given the absence
of the key. It is represented on a scale of (-∞, 1). The wide scale of absence factors is used to
influence the importance of a specific key to the entire slot within which it is defined. For example,
the absence factor of pain is -0.1 as shown in Fig. 3. CF values for a key are always greater than
the AF values.

The analysis to determine whether the patient has the disease depends purely on CFs and
AFs. Our scheme is similar to the scoring mechanism in PIP where the scores are given in the
frames [1]. The CFs and AFs are actually the quantitative representation of the qualitative terms,
such as "usually present", "confirming", "critical", "more likely", "less likely" and
"contradicting", that were used by our medical experts. The terms have been distributed on two different
scales by using CFs and AFs.

Each application has a base value, BV, which partitions the contribution factors into two sets,
those above the BV and those below it. The BV is used as a point of reference for the distribution
of contribution and absence factors of the keys. For DIJEST, a BV of 0.5 was used.

Two principles underlie our choice of values for contribution and absence factors from their
respective scales for specific keys:

    • CF ≥ BV indicates that the key is important to establish that the patient has the
      disease under consideration.

    • AF < 0 indicates that the absence of the key is important to contradict that the
      patient has the disease under consideration.

Confidence measure values are classified into four categories based on these two principles: 

1. CF ≥ BV, AF ≥ 0. These keys are confirming. A confirming key in the patient
profile contributes significantly to the likelihood score. Its validation will lead to
a high score. However, even if the key is not validated, the disease may still figure
prominently in the final differential diagnosis.

2. CF ≥ BV, AF < 0. These keys are critical. Critical keys have the most impact on
determining the likelihood score. The validation of a critical key contributes to
a high score. The invalidation of a critical key contributes negatively to the score
by using the AF. If a critical key is unknown, a neutral position is taken.

3. CF < BV, AF < 0. These keys are contradicting. The validation of a contradicting
key does not strongly confirm the existence of the disease. The invalidation of a
contradicting key can lead to a very low likelihood score.

4. CF < BV, AF ≥ 0. These keys are minor. Minor keys are used for fine tuning the
differential diagnosis and will play a greater role in the future screening process.
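The four categories can be written as a small Prolog predicate. This is a minimal sketch rather than DIJEST's own code: the predicate name key_category/4 is hypothetical, and the comparisons simply restate the category definitions, with the base value passed in as an argument.

```prolog
% key_category(+CF, +AF, +BV, -Category)
% Hypothetical helper: classifies a key from its contribution factor (CF),
% absence factor (AF) and the application's base value (BV).
key_category(CF, AF, BV, confirming)    :- CF >= BV, AF >= 0.
key_category(CF, AF, BV, critical)      :- CF >= BV, AF <  0.
key_category(CF, AF, BV, contradicting) :- CF <  BV, AF <  0.
key_category(CF, AF, BV, minor)         :- CF <  BV, AF >= 0.
```

With the values quoted in this paper for the key pain (CF 0.9, AF −0.1) and DIJEST's BV of 0.5, the query key_category(0.9, -0.1, 0.5, C) classifies the key as critical.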

This classification scheme approximately corresponds to the following use of evoking strength
and frequency values in Internist's scoring mechanism.

• Critical keys: evoking strength 4, frequency 4.
• Contradicting keys: evoking strength 1, frequency 4.
• Confirming keys: evoking strength 4, frequency 2.
• Minor keys: evoking strength 2, frequency 1.

Each element in a slot list is evaluated according to the above classification. The MATCHER
determines how well the patient profile fits the structure that is determined for this element. Using
the state of the patient profile with respect to the attributes of each element in this slot list and
using the CF and AF factors, the MATCHER determines the CM of this element. Repeating this
iterative process, all CM values of the elements in a slot list are accumulated and normalized by
the unique normalization_factor for the slot. The overall sum of the slots determines the score of
a particular context and then the likelihood score of the disease.

The matching process for the slot values is illustrated below. In the code, Context refers to the
current name of the context, Hypothesis refers to the name of the disease currently investigated
and PatientSlot has all the values that are currently known for the Patient for a particular slot,
such as symptoms. The first clause illustrates that the slots which are not applicable are not skipped
over; they are assumed to be completely satisfied for probability calculations.

    satisfy_slots(Context, Hypothesis, Patient, PatientSlot, not_appl, 1.0).
    satisfy_slots(Context, Hypothesis, Patient, PatientSlot, Slot, SlotProb) :-
        Slot \== not_appl,
        satisfy_slot(Context, Hypothesis, Patient, PatientSlot, Slot, SlotProb, 0).

    satisfy_slot(Context, Hypothesis, Patient, PatientSlot, [normalization_factor(NF)],
                 SlotProb, AccProb) :-
        SlotProb is AccProb/NF.                % normalize for a slot
    satisfy_slot(Context, Hypothesis, Patient, PatientSlot, [Key|KeyList], SlotProb, AccProb) :-
        Key \= normalization_factor(_),        % the last element ends the recursion
        satisfy_key(Context, Hypothesis, Patient, PatientSlot, Key, CM),
        accumulate(AccProb, CM, AccProbNext),
        satisfy_slot(Context, Hypothesis, Patient, PatientSlot, KeyList, SlotProb, AccProbNext).

satisfy_key determines whether a key is a single key or a disjunction of keys.

The code for processing single keys is given below. The find predicate extracts the values for a
particular key from the patient profile.

    handle_single_key(Context, Hypothesis, Patient, PatientSlot, Key, KeyValues, CF, AF, CM) :-
        is_direct_key(Context, Key, KeyValues),
        find(Key, PatientVals, PatientSlot),
        direct_key(Context, Hypothesis, Patient, PatientVals, Key, KeyValues, CF, AF, CM).

    handle_single_key(Context, Hypothesis, Patient, PatientSlot, Key, KeyValues, CF, AF, CM) :-
        \+ is_direct_key(Context, Key, KeyValues),
        extract_from(Context, Hypothesis, Patient, PatientSlot, Key, KeyValues, CF, AF, CM).

    % default behavior when neither clause above succeeds
    handle_single_key(Context, Hypothesis, Patient, PatientSlot, Key, KeyValues, CF, AF, CM) :-
        base_value(BV),
        not_known(Context, Hypothesis, Patient, Key, KeyValues, BV, CF, AF, CM).

    direct_key(Context, Hypothesis, Patient, PatientVals, Key, KeyValues, CF, AF, CM) :-
        check_whether_absent(PatientVals),
        absent_key(Hypothesis, Patient, PatientVals, Key, KeyValues, CF, AF, CM).

    direct_key(Context, Hypothesis, Patient, PatientVals, Key, KeyValues, CF, AF, CM) :-
        check_whether_present(PatientVals),
        match_compare(Context, Hypothesis, Patient, PatientVals, Key, KeyValues, CF, AF, CM).

4.4. Calculating confidence measures for keys

This subsection describes how the individual CMs are calculated for individual key tuples. Both
direct keys and extractable keys are treated in detail. Our description here is qualitative in nature.
The exact formulae used can be found in Ref. [7].

The confidence measure of a direct key is calculated through an extended comparison of the
values in the key attribute list of the DD with the patient values, as shown below. The first stage
is to calculate the patient sum, that is, a score indicating how well the patient values match the
attribute values. Patient sums are only calculated for keys which actually appear in the patient
profile.

    match_compare(Context, Hypothesis, Patient, PatientVals, Key, KeyVals, CF, AF, CM) :-
        compute_patient_sum(Context, Hypothesis, Patient, PatientVals, CF, AF,
                            Key, KeyVals, 0, PatientSum, ContradictionFlag),
        member(threshold(Threshold), KeyVals),
        find_normalization(KeyVals, NormFactor),
        compute_key_prob(ContradictionFlag, PatientSum, NormFactor, Threshold, CF, AF, CM).

The MATCHER calculates patient sums as follows. First the terms used in the patient profile,
which may be a mixture of qualitative and quantitative terms such as 6 days, are converted to the
domain dependent qualitative terms which are used in the DDs, for example short or medium. The
terms are then compared with the actual terms in the DD, and exact matches and contradictions
are noted. The terms which exactly match are summed using weights which are given in the DD
with respect to each attribute. This is illustrated with the code presented below.

    compute_patient_sum(Context, Hypothesis, Patient, PatientVals, CF, AF,
                        Key, [threshold(T)], TotalSum, TotalSum, no).

    compute_patient_sum(Context, Hypothesis, Patient, PatientVals, CF, AF,
                        Key, [Element|KeyVals], InterSum, NextSum, ContradictionFlag) :-
        Element \= threshold(_),               % the threshold entry ends the list
        match(PrevContradictionFlag, Context, Hypothesis, Patient, Key, PatientVals, Element,
              InterSum, TotalSum),
        check_contradiction(PrevContradictionFlag, Context, ContradictionFlag, Hypothesis,
                            CF, AF, Key, TotalSum, NextSum).

    % no contradiction: add the weight of the matching attribute to the sum
    match(no, Context, Hypothesis, Patient, Key, PatientVals, Element, InterSum, AccSum) :-
        match_single_val(Hypothesis, Element, PatientVals, AttrContr),
        AccSum is InterSum + AttrContr.

    % contradiction: record it for the evaluation report
    match(yes, Context, Hypothesis, Patient, Key, PatientVals, Element, InterSum, AccSum) :-
        is_a_contradiction(Hypothesis, Patient, Key, PatientVals),
        record_contradiction(Context, Hypothesis, Patient, Key).

    match_single_val(Hypothesis, site(Site, Contr), AllValues, Contr) :-
        member(site(Patsite), AllValues),
        appropriate_site(Hypothesis, Patsite).

    match_single_val(AnyConcept, Parameter, AllValues, Contr) :-
        % generalized matching
        Parameter =.. [Name, ParVal, Contr],
        FindVal =.. [Name, SomeVal],
        member(FindVal, AllValues),
        match_from_tables(AnyConcept, Name, ParVal, SomeVal).

    % Sample facts
    appropriate_site(choledocholithiasis, right_upper_quadrant).
    appropriate_site(choledocholithiasis, epigastrium).
    match_from_tables(_, duration, DAYS, short) :-
        number(DAYS), DAYS >= 1, DAYS < 11.
    match_from_tables(_, duration, DAYS, moderate) :-
        number(DAYS), DAYS > 10, DAYS < 36.
    match_from_tables(_, duration, DAYS, long) :-
        number(DAYS), DAYS > 35.

Every direct key has a threshold, which is the minimum value of the patient sum considered to
adequately match the key. The second stage of the MATCHER is to compare the patient sum with
the threshold set for this key. On the basis of this comparison, the MATCHER concludes whether
the patient profile satisfies the attribute values completely, partially, or contradicts them, and
calculates the CM accordingly.

If the patient sum reaches or exceeds the threshold value, then we say that the direct key has
been validated. The CM value in this case is the CF value. For example, the attribute values of the
patient in Fig. 4 indicate a sum of 13 points. This is equal to the threshold value for this key,
therefore the direct key pain is validated for this patient. The CM is then set to 0.9.

If the patient sum is less than the threshold, and no contradiction has been noted, the key has
been partially validated. The confidence measure is a normalized fraction of the CF value. This is
handled by compute_key_prob. More details are in Ref. [7].

If a contradiction has been noted, the value of the CM differs depending on whether the absence
factor of the key is positive or negative. If the AF is negative, it is returned as the CM. Otherwise
the CM is the negative of the CF value. This is handled by check_contradiction.
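The three outcomes for a direct key (validated, partially validated, contradicted) can be sketched as one Prolog predicate. The sketch is an illustration under stated assumptions, not the authors' compute_key_prob or check_contradiction: key_cm/6 is a hypothetical name, the threshold is assumed to be positive, and the partial-validation rule (CF scaled linearly by PatientSum/Threshold) is an assumed stand-in for the exact formula, which is given only in Ref. [7].

```prolog
% key_cm(+Contradiction, +PatientSum, +Threshold, +CF, +AF, -CM)
% Hypothetical sketch of the confidence measure of a direct key.

% Contradiction noted: a negative AF is returned as the CM ...
key_cm(yes, _, _, _, AF, AF) :-
    AF < 0.
% ... otherwise the CM is the negative of the CF.
key_cm(yes, _, _, CF, AF, CM) :-
    AF >= 0,
    CM is -CF.

% Validated: the patient sum reaches the threshold, so the CM is the CF.
key_cm(no, Sum, Threshold, CF, _, CF) :-
    Sum >= Threshold.

% Partially validated. ASSUMPTION: the "normalized fraction" of the CF is
% taken to be linear in Sum/Threshold; the paper defers the formula to Ref. [7].
key_cm(no, Sum, Threshold, CF, _, CM) :-
    Sum < Threshold,
    CM is CF * Sum / Threshold.
```

For the pain key of the example patient (patient sum 13, threshold 13, CF 0.9, AF −0.1), the query key_cm(no, 13, 13, 0.9, -0.1, CM) takes the validated branch and binds CM to the CF.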


We describe each of the four categories of extractable keys in turn, where the key appears in
the patient profile:

(i) Special-purpose knowledge is used to handle the anatomical or physiological states that
are indexed as a key, such as common bile duct obstruction as in Fig. 3 or swelling of the pancreas.
Some sample facts are given below.

    extract_from(Context, Hypothesis, Patient, PatientSlot, Key, KeyValues, CF, AF, CM) :-
        anatomy(Key),
        anatomy_test(Context, Hypothesis, Patient, PatientSlot, Key, KeyValues, CF, AF, CM).

    % a key is anatomical if it is an organ, a system, or part of a system
    anatomy(Key) :- organ(Key).
    anatomy(Key) :- system(Key, SystemComponents).
    anatomy(Key) :- system(Sys, SystemComponents),
                    part_of(Key, SystemComponents).

    system(intrahepatic_ducts, [left_intrahepatic_duct, right_intrahepatic_duct]).
    system(extrahepatic_ducts, [common_bile_duct, cystic_duct, pancreatic_duct]).

    % an organ is present unless it has been removed by surgery
    anatomy_test(Context, Hypothesis, Patient, PatientSlot, Organ, present, CF, AF, CF) :-
        organ(Organ),
        surgery(Patient, SurgeryList),
        \+ taken(SurgeryList, Organ).

    anatomy_test(Context, Hypothesis, Patient, PatientSlot, TestContext,
                 (Specification, FacttoDetermine), CF, AF, CM) :-
        test_illustrates(TestContext, Specification, ListofTests),
        prioritize(ListofTests, FinalTests),
        patient_satisfies(Hypothesis, Patient, PatientSlot, TestContext,
                          Specification, FacttoDetermine, FinalTests, CF, AF, CM).

For each state, the set of relevant tests is determined along with their order of preference. The
representation of anatomical knowledge in DIJEST has been designed to allow the MATCHER
to find the necessary tests that would indicate the presence of the specified state. For example, the
MATCHER finds that ultrasound and CT tests are indicative for understanding the condition of
the common bile duct when checking choledocholithiasis [7].

After the necessary tests are found, it is determined whether the patient has taken the test. If
not, the CM for this key is calculated using the CF and AF values, and varies depending
on which of the four categories the CF and AF values lie in. If the patient has taken the test, domain
specific knowledge is used to determine whether the patient's test results satisfy the specified state.
If so, the CM is set to the CF. Otherwise, the CM is equated to the AF because a conflict exists between
the expected condition of the patient and the patient profile. There is no possibility of partially
validating these keys. For example, the results of the ultrasound for the patient in Fig. 4 are compared
with the expected outcomes for the key common bile duct. From the test results [7], it is found that
the common bile duct of the patient is very dilated. Therefore, the CM is equated to 0.9. The ultrasound
also shows there are gallstones in the gallbladder. The CM for this key is set to 0.8.

Planning the optimal order of tests, prioritize, is a complicated issue, and could be the domain of
another expert system that would perform in parallel to DIJEST. Currently, the tests are checked
in sequential order. Studies in decision analysis for developing clinical strategies, similar to the one
for the diagnosis of extrahepatic obstructive jaundice, can be useful for the development of this
module. In particular, the sensitivity, specificity, complications and cost of the individual tests
have been investigated to devise different adaptive strategies for test taking, represented as decision
trees in Ref. [8]. We have used availability as our criterion for ordering.
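Ordering tests by availability can be sketched with a rank table and a key sort. This is only an illustration: prioritize_by_availability/2 and availability/2 are hypothetical names, and the ranks shown are invented for the example, not values from DIJEST.

```prolog
:- use_module(library(lists)).   % member/2

% availability(Test, Rank): HYPOTHETICAL ranks, lower means more readily
% available. The numbers are invented for illustration only.
availability(ultrasound, 1).
availability(ct, 2).

% prioritize_by_availability(+Tests, -Ordered)
% Orders the tests by increasing availability rank. Tests without a rank
% are silently dropped in this sketch.
prioritize_by_availability(Tests, Ordered) :-
    findall(Rank-Test,
            ( member(Test, Tests), availability(Test, Rank) ),
            Pairs),
    keysort(Pairs, Sorted),
    findall(Test, member(_-Test, Sorted), Ordered).
```

A richer ordering, for instance one that weighs sensitivity, specificity and cost as in Ref. [8], would replace the single rank with a computed score while keeping the same sort.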

(ii) Keys referring to blood tests, such as amylase and bilirubin, are evaluated using possibility
distribution curves, which are graphs provided to us by our experts. First the MATCHER checks
whether this is a key that requires curve fitting analysis by seeing whether a patient has taken the
particular test. If not, the calculation of the CM is carried out by considering the four classes of
CF and AF values as for the anatomical states. If the patient has taken the test, the patient value is
checked against a disease specific possibility distribution curve, where each curve estimates the
likelihood that a patient with the particular test value has the disease being considered. The
resulting possibility value is used along with the CF and AF to determine the CM of this key. For
example, if the patient's test result shows a particular positive possibility, this value is used to
normalize the CF specified for this key. Normalization is needed since the importance of this test
result is specified with the CF, and how well the patient's result fits the expected value for the disease
is determined by the curve. If the patient's test result contradicts the presence of the disease,
invalidating the key, then the full AF value is used as the CM value. Curve fitting is actually not
very suitable for Prolog if speed and accuracy are required. It should be implemented as an external
procedure.

    extract_from(Context, Hypothesis, Patient, PatientSlot, Key, KeyVals, CF, AF, CM) :-
        possibility_curve(Key),
        curve_fitting(Hypothesis, Patient, PatientSlot, Key, KeyVals, CF, AF, CM).

    possibility_curve(Key) :- blood_test(Key).

    curve_fitting(Hypothesis, Patient, PatientSlot, Key, KeyVals, CF, AF, CM) :-
        blood_test_analysis(Hypothesis, Patient, PatientSlot, Key, CF, AF, CM).

    blood_test_analysis(Hypothesis, Patient, PatientSlot, BloodTest, CF, AF, CM) :-
        (   get_patient_val(BloodTest, serum, PatientSlot, Result)
        ;   get_patient_val(BloodTest, blood, PatientSlot, Result)
        ),
        blood_test(BloodTest, Hypothesis, Result, Prob),
        calculate_CM(Prob, Hypothesis, Patient, BloodTest, CF, AF, CM).

(iii) Recall that compound keys refer to a collection of findings, for example prodrome. Their 
analysis requires the MATCHER to consider each finding in the collection similar to the 
consideration of each attribute of a direct key. Each finding for compound keys, though, has to 
be analyzed separately similar to an element of a slot. The collected result of all the findings 
determines the overall CM for this key. 

    extract_from(Context, Hypothesis, Patient, PatientSlot, Key, KeyValues, CF, AF, CM) :-
        concept_table(Context, Key, KeyConcepts),
        satisfy_concept(Context, Hypothesis, Patient, PatientSlot, KeyConcepts, Prob),
        concept_prob(Prob, CF, AF, CM).

The sum of all the confidence measures of the findings that are related to this key is denoted 
CMs. CMs is tested with respect to an interval [0,Threshold) where the value of the threshold for 
compound keys is application-dependent. If CMs lies within this interval, the presence of a finding 
can be neither validated nor invalidated, and is considered to be unknown. If the value is to the 
left of this region, the finding is invalidated and the overall CM is set to the AF. Otherwise, it is 
considered to be fully validated and the overall CM is set to the CF. This is handled by
concept_prob.
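The interval test for compound keys can be sketched as follows. The predicate names compound_status/3 and compound_cm/4 are hypothetical, and the sketch deliberately assigns no CM in the unknown case, since the description only says that such a finding is treated as unknown.

```prolog
% compound_status(+CMs, +Threshold, -Status)
% Hypothetical sketch: CMs is the sum of the confidence measures of the
% findings of a compound key; [0, Threshold) is the "unknown" interval.
compound_status(CMs, _, invalidated) :-
    CMs < 0.
compound_status(CMs, Threshold, unknown) :-
    CMs >= 0,
    CMs < Threshold.
compound_status(CMs, Threshold, validated) :-
    CMs >= Threshold.

% compound_cm(+Status, +CF, +AF, -CM)
% Only the two decided cases yield a CM directly: AF when invalidated,
% CF when validated. The unknown case is left to the default handling.
compound_cm(invalidated, _, AF, AF).
compound_cm(validated, CF, _, CF).
```

Because the interval is half-open, a sum exactly equal to the threshold counts as validated, while a sum of exactly zero is still unknown.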

(iv) The rule names that are used within key tuples are evaluated by activating each rule, for
example for liver tests and obstructive tests. These rules, which represent for example a group of
tests, need to be evaluated considering domain specific dependencies of the tests. Each rule is
interpreted separately and the CM calculation varies for each. Default behavior if the patient has
not taken the test is similar to the default behavior for anatomical states and blood tests. For
example, the rule obstructive tests in Fig. 3 is activated for the patient in Fig. 4. The values of the
tests of this patient are found to be sufficient for this rule. Therefore, the CM for this key is set to
the CF value, which is 0.7.

    extract_from(Context, Hypothesis, Patient, PatientSlot, Key, KeyValues, CF, AF, CM) :-
        call_proc([Key, Context, Hypothesis, Patient, PatientSlot, KeyValues, CF, AF, CM]).

    /* CALL ANY PROCEDURE PASSED AS PARAMETER */
    call_proc([ProcName|List]) :- Proc =.. [ProcName|List], call(Proc).

The MATCHER has a default behavior for evaluating keys which are not covered by the above
discussion, for example a direct key in the DD which does not appear in the patient profile, or
a compound key for which no information is known. The CMs of these keys are determined with
respect to the four categories of CF and AF values. The crucial categories of critical keys and
contradicting keys are chosen so as not to contribute to the overall sum. The confidence measure
CM is calculated as follows:

    not_known(Context, Hypothesis, Patient, Key, KeyVals, BV, CF, AF, CM) :-
        % minor keys
        CF < BV,
        AF >= 0,
        CM is (CF + AF)/2.

    not_known(Context, Hypothesis, Patient, Key, KeyVals, BV, CF, AF, 0) :-
        % critical keys
        CF >= BV,
        AF < 0,
        record_question([Context, Hypothesis, Patient, Key, KeyVals]).

    not_known(Context, Hypothesis, Patient, Key, KeyVals, BV, CF, AF, 0) :-
        % contradicting keys
        AF < 0,
        CF < BV,
        record_possible_contra(Context, Hypothesis, Patient, Key).

    not_known(Context, Hypothesis, Patient, Key, KeyVals, BV, CF, AF, AF) :-
        % confirming keys
        AF >= 0,
        CF >= BV,
        record_unknown(Context, Hypothesis, Patient, Key).

For example, the exact location of the obstruction cannot be determined by ultrasound for the
patient in Fig. 4 [7]. This key is a critical key. Therefore, the CM value is set to 0 by the default
values as described above.

4.5. Overall likelihood score

The overall likelihood score of a slot, L_slot, as mentioned earlier, is the sum of the CMs of its
keys, normalized by the specific normalization factor of the slot. The normalization factor, NF,
is defined as follows, where n is the number of elements in a slot. We assume that not all absence
factors are zero.

\[
NF = \sum_{k=1}^{n}
\begin{cases}
AF_k, & AF_k \ge 0, \\
CF_k, & AF_k < 0.
\end{cases}
\]

The weighted sum, WS_i, can be defined as the best case where all the elements of slot i are
validated. Thus, WS_i = \sum_{k=1}^{n} CF_k. Therefore, NF_i ≤ WS_i. With this relation, the
normalization helps to increase the contribution of slot i to the likelihood of the overall context.
The score might be greater than 1 with data that confirm all the expected values of a slot in a
disease descriptor. The overall likelihood of a context is thus defined as

\[
L_c = \frac{\sum_{j=1}^{NV} L_{slot_j}}{NV}
\]

where NV equals the number of valid slots in a context.
Let us illustrate this calculation by using the example disease in Fig. 3 and the patient in Fig. 4.

If the lab_tests slot is considered, it is seen that the normalization_factor, 2.7, is calculated as
described above. The calculation of CMs for each of the keys in this slot is illustrated in Section 4.3.
Respectively, they are 0.7, 0.8, 0 and 0.9. The sum of these CMs is 2.4. Using these values, L_slot
is set to 0.89. Since there is only one slot in this context, L_lab_tests is equal to 0.89.
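The slot and context calculations can be sketched in Prolog as follows. This is an illustrative sketch, not the DIJEST source: sum_scores/2, slot_score/3 and context_score/2 are hypothetical names, and the normalization factor is taken as given rather than computed.

```prolog
% sum_scores(+Scores, -Sum): plain recursive sum, kept local so that the
% sketch does not depend on any particular list library.
sum_scores([], 0).
sum_scores([X|Xs], Sum) :-
    sum_scores(Xs, Rest),
    Sum is X + Rest.

% slot_score(+CMs, +NF, -Score)
% Likelihood score of a slot: the sum of the CMs of its keys divided by
% the slot's normalization factor NF (assumed to be non-zero).
slot_score(CMs, NF, Score) :-
    NF =\= 0,
    sum_scores(CMs, Total),
    Score is Total / NF.

% context_score(+SlotScores, -Lc)
% Likelihood of a context: the mean of the scores of its valid slots.
context_score(SlotScores, Lc) :-
    length(SlotScores, NV),
    NV > 0,
    sum_scores(SlotScores, Total),
    Lc is Total / NV.
```

With the lab_tests values quoted above, the query slot_score([0.7, 0.8, 0, 0.9], 2.7, S) gives 2.4/2.7, which rounds to 0.89, and context_score([S], Lc) returns the same value because the context has a single slot.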

4.6. The patient analysis

When the MATCHER calculates the likelihood scores of a disease, special information related
to the patient with respect to each disease is recorded along with the likelihood scores. This
information is used to produce an evaluation report about the status of a patient. It consists of
the list of findings which are expected but not present in the patient data, those which are
contradictory to the evaluated disease, and the important concepts which have not been validated
during the analysis of the MATCHER. The findings of the evaluation are divided into four
categories: questions, contradictions, possible contradictions and unknowns. To record this
information, again the four categories of CF and AF values are used. The code in the previous
section is suitably adapted.

The evaluation report can be used to guide the subsequent stages of clinical diagnosis in the
screening process shown in Fig. 2. For example, the tests that are necessary to check a specific
condition but have not been performed are suggested by questions for a disease. Contradictions
are the set of facts in the patient profile that contradict the existence of the disease. Possible
contradictions are the unknown classes of information which might be critical. They can contradict
the disease if their definite absence is proven. Unknowns is the category of data that can be used
for confirmation but are unknown at the time of evaluation.

5. PERFORMANCE OF DIJEST

The development time for DIJEST was about nine months, including our learning about aspects
of jaundice, the diseases and the related anatomy and physiology. The knowledge representation
scheme and the uncertainty reasoning mechanism reflect our perception of medical concepts
and clinical reasoning provided by our experts.

DIJEST has been tested with cases taken from medical textbooks and real patient records. For
example, Table 1 shows a differential diagnosis produced by DIJEST for a patient with
choledocholithiasis. The medical history of this patient is shown in Fig. 4. During testing, the
evaluation of all the domain diseases was included. In clinical use, a threshold may be used to
inhibit unlikely diseases.

The analysis shows that choledocholithiasis is given the highest likelihood score by DIJEST, even
though it does not get the highest score in each context. The score for acute cholecystitis shows
the way large absence factors can prevent a disease from being considered seriously as explaining
the jaundice. The scores from the contexts of clinical examination and lab tests strongly suggest
that cholecystitis could explain the jaundice, more so than choledocholithiasis, but the patient's
history strongly contradicts the disease.

The evaluation report of this patient points out, for example, the lack of information about critical
findings of hepatitis, such as the presence of a prodrome, or the exposure to the use of needles in
the past. The evaluation report is not shown here.

Later on in the course of the disease, the same patient contracted pancreatitis, directly caused
by the choledocholithiasis. We added new test results to the patient profile and re-ran DIJEST.
The result of the second differential diagnosis is given in Table 2. The only changed scores are
those of diseases related to the pancreas. Note especially that the likelihood score of pancreatitis
has significantly increased.

Table 2 demonstrates the ability of DIJEST to cope with multiple diseases. Knowledge is still
necessary, for example, to realize that hepatitis and choledocholithiasis do not in general co-exist,
whereas choledocholithiasis may cause pancreatitis. Such reasoning, which would form part of the
screening process, allows us to place more significance on the score for pancreatitis than for
hepatitis even though it is actually marginally lower.

Table 1. Likelihood scores for patient 10001

Disease                History   Clinical   Tests   Total score
choledocholithiasis      1.00      0.84      0.89      0.91
viral hepatitis         -0.05      0.99      0.81      0.58
hepatitis               -0.75      0.99      1.00      0.41
acute cholecystitis     -1.50      1.17      1.08      0.25
pancreatitis             0.28      0.20      0.21      0.23
pancr. pseudo cyst       0.16      0.00      0.13      0.10
cirrhosis                0.69      0.90     -1.50      0.03
pancreatic cancer       -0.50      0.17     -0.87     -0.40

Table 2. Likelihood scores for patient 10001

Disease                History   Clinical   Tests   Total score
choledocholithiasis      1.00      0.84      0.89      0.91
viral hepatitis         -0.05      0.99      0.81      0.58
pancreatitis             0.28      0.20      1.06      0.51
hepatitis               -0.75      0.99      1.00      0.41
acute cholecystitis     -1.50      1.17      1.08      0.25
cirrhosis                0.69      0.90     -1.50      0.03
pancr. pseudo cyst       0.16      0.00     -0.30     -0.05
pancreatic cancer       -0.50      0.17     -1.23     -0.52
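As a consistency check on Tables 1 and 2 (an observation from the printed values, not a formula stated explicitly in the text), each Total Score agrees with the arithmetic mean of the three context scores. For choledocholithiasis, and for pancreatitis after the new test results were added:

```latex
\[
\frac{1.00 + 0.84 + 0.89}{3} = \frac{2.73}{3} = 0.91,
\qquad
\frac{0.28 + 0.20 + 1.06}{3} = \frac{1.54}{3} \approx 0.51 .
\]
```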


DIJEST was implemented by using Prolog constructs which are standard in almost all Prologs.
It currently runs under SICStus and Quintus Prologs. In terms of speed, producing a table such as
those above together with the evaluation report takes only a few seconds on average.

6. CONCLUSIONS

The features of DIJEST in its current state can be summarized as follows.

Medical knowledge is represented declaratively. The domain specific knowledge and domain
specific reasoning are clearly distinguished from domain independent knowledge by the MATCHER
by using different types of keys. The complex medical knowledge related to the diseases, the
characteristics of different testing procedures and the basic anatomical and physiological structure
of the body are all represented independently of patient information and illustrate characteristics
of jaundice. Using Prolog enabled us to reach our objective, to have this separation and to write a
specialized interpreter very easily. The interpreter has also been generalized to handle domains
other than DIJEST by customizing the general matching capabilities of the interpreter.

Representing the likelihood estimates by using two separate factors, contribution and absence
factors, can distinguish between valid, invalid, unknown and absent data.

DIJEST presents very realistic likelihood estimates of the presence of the candidate diseases by
evaluating the patient profiles, which may be incomplete. Of special importance is the calculation
of likelihood scores of the individual contexts and their effect on the final diagnosis. DIJEST also
emphasises significant factors in the evaluation of each disease. Contradictory findings and
important data which may be required for further evaluation of the patient are noted.

DIJEST is very promising in the early detection of co-existing diseases in a patient and provides
good likelihood estimates in the cases with multiple diseases.

The most difficult task in DIJEST is to obtain the contribution and absence factors for different
keys. In particular, representing the experts' qualitative view of the subject by using those factors
needs successive experiments and adjustment.

A weak point of DIJEST is its neglect of unexplained factors that are contained in the patient
profile. The presence of a screening process for presenting the results of the MATCHER in a
user-oriented manner and for removing redundant information would enhance the performance of
DIJEST. The consistency checking is also only partially complete.

At this stage, however, DIJEST is encouraging in its expressive power for medical knowledge
and in providing useful likelihood estimates to indicate the presence of domain diseases. It has
potential for detecting the co-existence of multiple diseases. It is unique in both its knowledge
representation scheme and reasoning with uncertainty.

Acknowledgements: We would like to thank our medical expert, Professor David Ransohoff, for providing the medical
knowledge embodied in DIJEST. We are grateful for his valuable time in attending the knowledge engineering sessions
and acknowledge his influence in shaping DIJEST. Drs Arnold Shmerling and Lawrence Widman also supplied valuable
medical insights. Dr Len Samuels commented on an earlier draft and provided the correspondence between contribution
and absence factors of DIJEST and the scoring mechanism of Internist. We also thank Yuval Lirov for inviting us to submit
this paper to the special issue of Computers & Mathematics with Applications.

REFERENCES

1. S. Pauker, A. Gorry, J. Kassier and W. Schwartz, Towards the simulation of clinical cognition: taking a present illness
by computer. Am. J. Med. 60, 981-996 (1976).

2. B. G. Buchanan and E. Shortliffe, Rule Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic
Programming Project. Addison-Wesley, Reading, Mass. (1984).

3. R. Miller, H. Pople and J. D. Meyers, Internist-I, an experimental computer based diagnostic consultant for general
internal medicine. New Engl. J. Med. 307, 468-476 (1982).

4. F. E. Mesarie, R. Miller and J. D. Meyers, Internist-I properties: representing common sense and good medical practice
in a computerized medical knowledge base. Computers Biomed. Res. 18, 458-479 (1985).

5. P. Szolovitz and S. Pauker, Categorical and probabilistic reasoning in medical diagnosis. Artif. Intell. 11, 115-144
(1978).

6. Harrison's Principles of Internal Medicine, 11th edn. McGraw-Hill, New York (1987).

7. L. Ü. Yalçinalp, Uncertainty reasoning in a medical expert system: DIJEST. M.S. Thesis, Department of Computer
Engineering and Science, Case Western Reserve University, Cleveland, Ohio (1987).

8. J. Richter, M. Silverstein and R. Shapiro, Suspected obstructive jaundice: a decision analysis of diagnostic strategies.
Ann. Internal Med. 99, 46-51 (1983).