PII: 0898-1221(90)90117-3
Computers Math. Applic. Vol. 20, No. 9/10, pp. 125-140, 1990
0097-4943/90 $3.00+0.00
Printed in Great Britain. All rights reserved
Copyright © 1990 Pergamon Press plc

DIAGNOSING JAUNDICE EXPERT SYSTEM

L. Ü. YALÇINALP and L. STERLING
Department of Computer Engineering and Science, Case Western Reserve University, Cleveland, OH 44106, U.S.A.

Abstract: DIJEST (Diagnosing Jaundice Expert SysTem) is a medical expert system which produces a differential diagnosis of a patient presenting with jaundice. DIJEST is written in Prolog, and illustrates the use of the language for clearly expressing knowledge. Specifically, the expert system contains explicit declarative knowledge of anatomy and physiology which is used by clinicians when diagnosing obstructive jaundice. The inference engine matches patient records against expected manifestations of symptoms in diseases. Novel in DIJEST is the uncertainty reasoning scheme, using contribution and absence factors, which places equal importance on symptoms present, absent and unknown in the patient's medical record. Domain specific reasoning and domain specific knowledge are clearly separated from general inference capabilities and knowledge representation schemes. DIJEST has performed well in preliminary tests, being particularly impressive for patients with multiple diseases.

1. INTRODUCTION

Research in medical expert systems, a major application area of AI, has led to the development of, and experimentation with, new schemes for representing knowledge. No universal tool or technique has emerged. Each research group has its own style affected by the problem domain and the medical expertise being used.

This paper describes a new medical expert system, DIJEST (Diagnosing Jaundice Expert SysTem), which is concerned with the differential diagnosis of patients with obstructive jaundice. DIJEST evolved with its major objective to explore present knowledge representation techniques and to introduce a declarative style for modelling clinical problem solving. Subsequently another issue became critical, namely the modelling of uncertainty reasoning during the many stages of consultation and diagnosis of a disease.

DIJEST has yet another frame-based scheme for knowledge representation and reasoning with uncertainty. It is developed using Prolog. We are able to combine different knowledge representation techniques in a single framework due to the flexibility of Prolog in the design of different data structures for the system. Specifically, features of frame-based and rule-based representations were integrated with a new calculus for uncertainty reasoning. General medical knowledge about the domain was easily represented in Prolog declaratively. It also enabled a clear representation of inference as a specialized interpreter handling the data structures.

Our scheme for uncertainty reasoning is novel due to a strong dependence on the interpretation of present and absent data: information that is known to exist, information that is known not to exist, and information that is unknown at the time of consultation with the program. This was a constraint imposed by our medical experts. It allows context-dependent evaluation of the patient data.
The scheme uses contribution and absence factors which are attached to particular manifestations of a disease. These factors constitute a numerical representation which complements the qualitative descriptions in DIJEST. These qualitative descriptions mirror the medical experts' definition of the characteristics of manifestations.

Our work has been influenced by the famous medical expert systems MYCIN, Internist and PIP. The representation of knowledge using frames is similar to PIP's [1]. The concept of contribution and absence factors evolved from investigation of the confidence factors of MYCIN [2] and the evoking strengths and frequencies in Internist [3]. In addition, representing common-sense knowledge in DIJEST is affected by the representation of properties in Internist [4] and the use of logical decision criteria in PIP [5]. The basic system was designed to be expanded and enhanced to incorporate the stages of clinical reasoning during the course of a patient's treatment.

DIJEST has been tested on sample patient cases taken from medical textbooks and patient records. It performs at an acceptable level according to our experts. An interesting feature is its handling of multiple diseases contributing to the jaundice.

The paper is organized as follows. After a brief overview of DIJEST's scope, we present DIJEST's architecture and the multi-layered knowledge representation in the system. The next section describes the uncertainty reasoning mechanism which underlies the modelling of diagnosis, followed by our conclusions.
We emphasize in this account how Prolog can be used to develop an expert system.

2. SCOPE OF DIJEST

2.1. The problem of obstructive jaundice

Jaundice is the yellow pigmentation of the skin or sclerae by bilirubin. This in turn is a result of elevated levels of bilirubin in the blood stream [6]. There are several reasons for this elevation. Most of the bilirubin is derived from the catabolism of hemoglobin present in the red blood cells. The bilirubin is transformed into bile, and the liver plays a central role in this metabolism of the bile pigments. The derangements of this metabolism cause several diseases which have jaundice as a common symptom. The elevation of the bilirubin might be related to pathogenetic mechanisms or disease processes.

We are concerned with a subset of these diseases which cause obstructive jaundice. This is jaundice due to the mechanical obstruction of the biliary radicles or functional factors that cause impaired hepatic excretion of bilirubin into bile. Figure 1 is a simplified diagram of the organs that are related to the flow of bile to the intestine after its excretion from the liver. The enlargement of any organ near the bile ducts can block the flow of bile, thereby causing obstructive jaundice. The principal examples are inflammation of the gallbladder, liver or pancreas, and a tumor or a cystlike mass in the head of the pancreas. Obstructive jaundice can also be caused by gallstones leaving the gallbladder, lodging in the bile ducts and blocking the flow.

Diagnosing the most common causes of obstructive jaundice as mentioned above is the primary focus of DIJEST. Specifically, DIJEST considers viral hepatitis, alcoholic hepatitis, cirrhosis, cholecystitis, choledocholithiasis, pancreatitis, pancreatic cancer and pancreatic pseudocyst. Hepatitis is the inflammation of the liver.
We are concerned with two types of hepatitis: one is caused by excessive consumption of alcohol and the other by a virus. Cirrhosis is the chronic irreversible injury of the liver. Cholecystitis is the inflammation of the gallbladder, whereas choledocholithiasis refers to the obstruction of the bile duct by gallstone(s). Pancreatitis is inflammation of the pancreas. Pancreatic cancer refers to a cancerous growth, while pancreatic pseudocyst refers to the cystlike masses at the head of the pancreas.

It is critical to differentiate the mechanism that causes the obstruction and the site of the obstruction in clinical practice. Therefore, DIJEST is designed to produce possible diagnoses by indicating the likelihood of each of these diseases and the differentiating factors that lead to the diagnosis.

[Fig. 1. The anatomy of organs participating in the bile flow. Labels: intrahepatic ducts, cystic duct, pancreas, duodenum, ampulla of Vater.]

[Fig. 2. The architecture of DIJEST. Boxes: patient profile (history, clinical exam, lab tests); disease descriptors; general medical knowledge; candidate diseases; MATCHER; evaluation of patient (dynamic patient data); likelihood estimates for candidate diseases; SCREENING; user interaction.]

2.2. Architecture of DIJEST

DIJEST's system structure as initially planned is shown in Fig. 2. The system is constructed around its most important component, the specialized interpreter that we call the MATCHER. In this section we describe the function of the system components.

The boxes above the MATCHER indicate the knowledge used by DIJEST. Diseases are represented by disease descriptors.
The candidate diseases are the list of diseases that are to be considered for differential diagnosis. The patient profile consists of all the knowledge related to a particular patient.

The MATCHER analyzes the patient, producing an evaluation of the patient and likelihood estimates for candidate diseases. The MATCHER evaluates the current mixture of known, uncertain, partially satisfied and unknown findings of a patient with respect to the candidate diseases. The evaluation of each patient includes any contradictory evidence and suggestions for additional tests that should be performed. The results will provide feedback for the next stage of diagnosis. More details of the MATCHER are given in Section 4.

A full evaluation of the output of the MATCHER was intended to be considered by the screening process. Currently the patient profile is examined for the manifestations expected by the disease descriptors. The screening process would also evaluate significant patient data that is not explained by the differential diagnosis.

3. KNOWLEDGE REPRESENTATION IN DIJEST

The knowledge base of DIJEST consists of medical knowledge about jaundice and information about the patient. Knowledge of jaundice is divided into descriptions of the diseases which cause jaundice, and general medical knowledge. This section describes the significant points in our representation.

    disease_desc(choledocholithiasis,
        history(not_appl,               % previous illness
                not_appl,               % previous tests
                not_appl,               % exposed_to
                not_appl,               % family_background
                [                       % symptoms
                 (pain, [site(abdomen,5),
                         severity(none_to_severe,1),
                         continuity(intermittent,5),
                         duration(short,3),
                         coupled_by(nausea,1),
                         coupled_by(vomiting,1),
                         threshold(13)],
                  contribution_absence_factors(0.9,-0.1)),
                 normalization_factor(0.9)],
                not_appl,               % observations
                not_appl,               % drug_use
                not_appl                % surgery
               ),
        clinical(
            [(jaundice, [pace(fast,2), pace(medium_slow,1.3), threshold(1.3)],
              contribution_absence_factors(0.6,-2.0)),
             (gallbladder, present, contribution_absence_factors(0.9,-2.0)),
             (tender_abdomen, [site(upper_quadrant,5), condition(attacks,5), threshold(10)],
              contribution_absence_factors(0.7,-2.0)),
             normalization_factor(2.2)]
            ),
        labtests(
            [(obstructive_tests, disease_related, contribution_absence_factors(0.7,-2.0)),
             (gallbladder, gallstones(present), contribution_absence_factors(0.8,0.2)),
             (common_bile_duct, obstruction(present), contribution_absence_factors(0.9,-2.0)),
             (common_bile_duct, dilatation(abnormal), contribution_absence_factors(0.3,-2.0)),
             normalization_factor(2.7)]
            )).

Fig. 3. Disease descriptor of choledocholithiasis.

3.1. Disease descriptors

The diseases that have jaundice as a common symptom are represented individually in DIJEST by disease descriptors or DDs. A DD describes the prototypical characteristics of a patient who has the disease and it is represented by a framelike structure in Prolog. Figure 3 shows the disease descriptor for choledocholithiasis.

Each disease descriptor is a quadruple indexed by the disease name. The expected characteristics related to the history, clinical examination and the laboratory tests are the components of this structure. We will refer to those three components as contexts. Each context consists of a number of slots that show the logical subdivision within that context.
For example, the history context has eight slots showing previous diseases, previous tests, environmental and clinical factors which suggest the disease if the patient has been exposed to them, facts that are related to family background, expected previous symptoms, expected physical observations, usage of particular drugs and previous abdominal surgery, respectively. The contexts for both clinical examination and laboratory tests have only one slot. Slots that are not applicable for a specific disease are shown by not_appl as illustrated by Fig. 3.

Each slot consists of specific characteristics related to the slot. They are called elements. They are indicated by slanted uppercase letters in Fig. 3. An element is either a single characteristic or a disjunction of characteristics. A single characteristic is called a key tuple and is indexed by a key. A key is the smallest component of this layered structure and it can be either a direct key or an extractable key.

A key tuple consists of key attributes, and a pair containing a contribution factor and an absence factor. The key attributes of a key tuple show the characteristics of a key which are related to the disease. They are defined with respect to the key type. The contribution and absence factors are related to the uncertainty reasoning handled by the MATCHER and will be discussed in the next section. They are represented as contribution_absence_factors(CF, AF) for clarity in Fig. 3.

The type of a key determines how the diagnosis is handled by DIJEST. The different key types are known by the MATCHER and handled differently. They are described below.

I. Direct keys. A simple concept or a finding is represented by a direct key. The key attributes of a direct key are given by a list of qualitative characteristics defining the key.
For each attribute, a number showing the importance of this qualitative description with respect to the key is given as an integer between 1 and 5. It is called the significance value of the attribute. The qualitative descriptions along with significance values describe the concept or finding fully. For example, pain is a direct key, but its severity, duration or location differs from one disease to another, as does their relative significance for defining the pain. In Fig. 3, the respective attribute values of pain for choledocholithiasis are shown. For example, a patient is expected to have pain with intermittent continuity. Since intermittent pain is an important indicator for choledocholithiasis, it is given a significance value of 5. Every direct key is defined with a threshold that is used by the MATCHER for uncertainty reasoning.

II. Extractable keys. An extractable key represents a medical concept that cannot be described as a simple finding. The concept needs to be extracted from the patient data. In order to simplify their representation, they are shown as a single key in the disease descriptors. They are divided into four categories to simplify the diagnosis:

(i) Some of the keys represent anatomical or physiological states or concepts. The patient is expected to be in or have such states if he is likely to have the disease. For example, gallbladder is a key in the laboratory tests context, and the presence of gallstones is its defining attribute as shown in Fig. 3. The value of this attribute is used as an aid for determining the site, or the exact condition for such a key. Since there may be many tests that would determine whether the patient has the specified state, this representation allows the MATCHER to determine the necessary diagnostic tests that would indicate the condition given in this key.
(ii) The names of blood tests are used as keys in DIJEST to show the tests required in the diagnosis, for example bilirubin, amylase and sgot. The analysis of the results of a particular blood test is disease dependent. The same blood test may indicate different likelihoods of the presence of a disease for different diseases. Possibility distribution curves are used to represent those ranges of results of blood tests. They are separate from the disease descriptors and were provided by our medical experts. How they are used with respect to the disease descriptors will be covered in the next section.

(iii) Some keys refer to a collection of simple or complex findings. The individual findings in the collection might not be very significant on their own or affect the diagnostic process. However, their combination constitutes a medical concept and should be considered as a composite finding. We call such keys compound keys. The collection of individual findings are represented in separate tables for each compound key. Compound keys are represented as a combination of direct or extractable keys, but the contribution and absence factors are defined for the composite meaning. Prodrome is an example of a compound key which is used for diagnosing hepatitis. Figure 3 does not contain an example of a compound key.

(iv) Some keys represent rules that DIJEST has to activate in order to check the presence of a disease. They are used by the MATCHER to evaluate the patient data that are related to different contexts or to compare the test results. For example, inflammation and obstructive tests are two rule names. The latter is shown in Fig. 3. The key attributes for this kind of key are not used since the concept to which they refer is embedded in the rules.

3.2. Patient data

The information about a patient is given as input to DIJEST. All the information related to a patient is indexed by a unique patient number. It is given in four different frames, analogous to the contexts in the disease descriptors. We refer to the information about the patient as the patient profile.

1. Patient ID record. Consists of identification information.

2. Medical history. This frame is similar to the history context of a DD. An example frame for a patient is shown in Fig. 4. It consists of six different slots corresponding to the first six slots in a DD. Drug and surgery information are represented as separate slots in the knowledge base if the patient has relevant data.

    medical_history(10001,
        % previous diseases
        [(jaundice,[occurrence(negative)]),
         (alcoholism,[occurrence(negative)])],
        % previous_tests
        [(wbc,[date(2,2,1987),site(blood),result(10200)])],
        % exposed to
        [(hepatotoxins,[exp_to(negative)]),
         (jaundiced_people,[exp_to(negative)])],
        % family background
        [(jaundice,[occurrence(negative)])],
        % symptoms
        [(pain,[date(15,1,1987), site(abdomen), severity(severe),
                continuity(intermittent), duration(6,days),
                coupled_by(nausea), coupled_by(vomiting)]),
         (nausea,[date(15,1,1987)]),
         (vomiting,[date(15,1,1987),cause(large_dinner)]),
         (intolerance_fatty_foods,[reaction(negative)])],
        % observations
        [(skin,[color(yellow)]),
         (urine,[color(dark)]),
         (stool,[color(light_brown)])]).

Fig. 4. Medical history for patient 10001.

Only simple findings with their related attribute values are represented in this frame. The attributes have both quantitative and qualitative descriptions. For example, the attribute duration for the key pain shows how long the patient has been in pain, and might have the value "6 days".

3. Clinical examination.
This frame consists of two slots: the knowledge related to the actual physical examination of the patient and the results of the cardiovascular tests routinely taken. As in the history frame of the patient, the simple findings are represented as binary key tuples.

4. Tests. Each test frame for a patient is indexed by the test name and the patient ID number. It consists of information about the date of the test and its result with respect to the site where it is taken. For tests such as ultrasound, the results are given as a collection of findings related to a site, its state and its condition, because those tests are used to determine the condition of different parts of the body. For blood or urine tests, their specific site is indicated along with a single result. Examples for patient 10001 are shown below.

    test(10001, date(2,3,1987), sgot, [(blood,80)]).
    test(10001, date(2,3,1987), alk_phosp, [(blood,120)]).
    test(10001, date(2,3,1987), ultrasound,
         [(gallbladder,edema,present),
          (common_bile_duct,dilations,s),
          (gallbladder,gallstones,present),
          (pancreas,swelling,normal),
          (pancreas_head,dilatation,normal),
          (pancreas,state,normal)]).

3.3. General medical knowledge

Medical knowledge in DIJEST is represented independently from any particular patient. Examples of such knowledge are the general characteristics of jaundice, what the available tests measure along with their possible sites, the restricted anatomy of the human body that concerns the domain diseases, the domain specific qualitative representation of quantitative terms and the possibility distribution curves of blood test results. This knowledge is used by the MATCHER when creating the differential diagnosis. For example, ultrasound is used to determine the presence of gallstones in the gallbladder or the size of the bile ducts, and sgot is a blood test whose primary function is to detect liver injury.
This information is represented declaratively in DIJEST and illustrated below with a few examples.

    lab_test(ultrasound,
             [(gallbladder,gallstones,[present,absent]),
              (gallbladder,edema,[present,absent]),
              (pancreas_head,dilation,[normal,s,inc]),
              (extrahepatic_ducts,dilatation,[normal,s,inc]),
              (intrahepatic_ducts,dilatation,[normal,s,inc]),
              (liver,hepatic_texture,[homogen,not_homogen]),
              (pancreas,swelling,[head,diffuse,normal]),
              (pancreas,state,[atrophic,indurated,cyst,normal])]).
    lab_test(uribirilogen,
             [(urine,excretion_bile,[present,absent,decreased,increased])]).

4. PATIENT EVALUATION IN DIJEST

4.1. A Prolog-based MATCHER

As its name suggests, the MATCHER compares a patient profile with the disease descriptors present in the system. It is a special interpreter written in Prolog which compares the frame structures, takes into account present, absent and unknown factors and establishes likelihood scores for the presence of a disease.

The findings of a disease, namely its DD, are matched against a patient's profile in the three different contexts of history, clinical exam and laboratory test data. A likelihood score is calculated for each context, and the overall likelihood score for the disease is computed as the average of the scores for the three contexts.

    diagnose(Patient, History, Clinical, Tests, disease(Disease, DiseaseProb)) :-
        disease_desc(Disease, DH, DC, DT),
        eval_history(Disease, Patient, History, Clinical, Tests, DH, HistProb),
        eval_clinical(Disease, Patient, History, Clinical, Tests, DC, ClinicalProb),
        eval_tests(Disease, Patient, History, Clinical, Tests, DT, TestsProb),
        combine_prob(HistProb, ClinicalProb, TestsProb, DiseaseProb).

    combine_prob(HP, CP, TP, FinalProb) :-
        FinalProb is (HP + CP + TP) / 3.0.

Looking at the sample disease descriptor in Fig. 3, and a patient's medical history record from Fig. 4, it should be clear that the matching is not a direct unification of expected values of attributes for keys for a slot in a particular context. Nevertheless, the interpreter uses unification to determine key types in order to handle the different types of keys. Details of the MATCHER will be covered in the following sections.

The findings of a patient are evaluated with respect to a list of domain diseases, called the candidate disease list. The candidate disease list in the current version of DIJEST is all the known diseases that are present in the knowledge base. Heuristic rules could be added as a front-end to generate a shorter list. For example, some sets of symptoms suggest very strongly viral hepatitis and nothing else. At the moment, all the candidate diseases are processed in a straightforward manner, and for each DD on the candidate disease list a likelihood score is calculated which represents the possibility that a patient has the disease.

4.2. Calculation of likelihood scores

A confidence measure (CM) is calculated separately for each slot in a context. The likelihood score for the context is a weighted sum of the CMs for all the slots in the context. The weighting is affected by the number of relevant slots in a context. The relevance of a slot is disease-dependent. For example, the family background of the patient is not relevant for choledocholithiasis as shown in Fig. 3, and this is indicated by a not_appl value of the slot.

The CM for a slot is calculated from the CMs of all the elements in the slot. The confidence measure for an individual slot element represents how much the patient profile satisfies the requirements of that element of the disease descriptor. The calculation of individual CMs is tied to our use of contribution and absence factors to be described below.
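
As a rough illustration of how the per-context scores of every disease on the candidate list might be turned into a ranked differential diagnosis, consider the following minimal sketch. It is written under stated assumptions and is not DIJEST's own code: combine_prob/4 is the predicate given in Section 4.1, whereas context_scores/4, likelihood/2 and differential/1 are hypothetical names introduced here, the ranking step itself is an assumption, and the numeric scores are invented purely for illustration.

```prolog
:- use_module(library(lists)).   % reverse/2 (SWI-Prolog assumed)

% combine_prob/4 as given in Section 4.1: plain average of the three contexts.
combine_prob(HP, CP, TP, FinalProb) :-
    FinalProb is (HP + CP + TP) / 3.0.

% HYPOTHETICAL stand-in for the MATCHER's output:
% context_scores(Disease, HistoryScore, ClinicalScore, TestsScore).
% These numbers are invented for illustration; they are not DIJEST results.
context_scores(choledocholithiasis, 0.9, 0.6, 0.9).
context_scores(viral_hepatitis,     0.3, 0.6, 0.0).
context_scores(pancreatitis,        0.6, 0.3, 0.3).

% likelihood(?Disease, -Score): overall score for one candidate disease.
likelihood(Disease, Score) :-
    context_scores(Disease, HP, CP, TP),
    combine_prob(HP, CP, TP, Score).

% differential(-Ranked): Score-Disease pairs for every candidate, best first.
differential(Ranked) :-
    findall(Score-Disease, likelihood(Disease, Score), Pairs),
    keysort(Pairs, Ascending),       % ascending by score
    reverse(Ascending, Ranked).      % highest score first
```

With these made-up scores, the query `differential(R)` lists choledocholithiasis first, followed by pancreatitis and viral hepatitis. Keeping the score as the key of a `Score-Disease` pair lets the standard keysort/2 do the ordering without a custom comparison predicate.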

Recall that an element is either a single key tuple or a disjunction of them. The CM calculation of a key tuple is determined by the key type and by the requirements satisfied by the patient profile, which is related to the contribution and absence factors. The CM of a disjunction of key tuples is the largest CM of one of the disjuncts.

If the MATCHER can find the manifestations defined for a key tuple that are expected to be present in a patient with a particular disease, then this key tuple is validated. If only some of the findings are present, then this key is partially validated. When the patient profile is known not to have those findings or there is evidence against the presence of the findings, then the key is invalidated. The MATCHER considers the keys to be unknown if it cannot find the related attributes from the patient profile in the case of a direct key, or extract it in the case of extractable keys. Further details about the validation process are presented after the discussion of the use of contribution and absence factors.

4.3. The role of contribution and absence factors

Contribution and absence factors are the essence of the mechanism for reasoning under uncertainty in DIJEST. A contribution factor (CF) and an absence factor (AF) are defined for each key in every key tuple of a slot in the DD.

The CF determines the degree of importance of the presence of the specific concept represented by the key name to the slot in which it occurs. It indicates the expectation that a patient has the specific disease when the information in his/her profile validates the requirements of this key. The contribution factor is defined as a real number between 0 and 1, inclusive. For example, the contribution factor of the direct key pain is 0.9 for choledocholithiasis as shown in Fig. 3.
It shows that the presence of pain as defined by its respective values is very important for choledocholithiasis.

The AF determines the importance of the absence of the concept in the patient profile. It effectively measures the likelihood of a patient to have or not to have a disease given the absence of the key. It is represented on a scale of (−∞, 1). The wide scale of absence factors is used to influence the importance of a specific key to the entire slot within which it is defined. For example, the absence factor of pain is −0.1 as shown in Fig. 3. CF values for a key are always greater than the AF values.

The analysis to determine whether the patient has the disease depends purely on CFs and AFs. Our scheme is similar to the scoring mechanism in PIP where the scores are given in the frames [1]. The CFs and AFs are actually the quantitative representation of the qualitative terms, such as "usually present", "confirming", "critical", "more likely", "less likely" and "contradicting", that were used by our medical experts. The terms have been distributed on two different scales by using CFs and AFs.

Each application has a base value, BV, which partitions the contribution factors into two sets, those above the BV and those below it. The BV is used as a point of reference for the distribution of contribution and absence factors of the keys. For DIJEST, a BV of 0.5 was used. Two principles underlie our choice of values for contribution and absence factors from their respective scales for specific keys:

• CF ≥ BV indicates that the key is important to establish that the patient has the disease under consideration.
• AF < 0 indicates that the absence of the key is important to contradict that the patient has the disease under consideration.

Keys are classified into four categories based on these two principles:

1. CF ≥ BV, AF ≥ 0. These keys are confirming. A confirming key in the patient profile contributes significantly to the likelihood score. Its validation will lead to a high score. However, even if the key is not validated, the disease may still figure prominently in the final differential diagnosis.
2. CF ≥ BV, AF < 0. These keys are critical. Critical keys have the most impact on determining the likelihood score. The validation of a critical key contributes to a high score. The invalidation of a critical key contributes negatively to the score by using the AF. If a critical key is unknown, a neutral position is taken.
3. CF < BV, AF < 0. These keys are contradicting. The validation of a contradicting key does not strongly confirm the existence of the disease. The invalidation of a contradicting key can lead to a very low likelihood score.
4. CF < BV, AF ≥ 0. These keys are minor. Minor keys are used for fine tuning the differential diagnosis and will play a greater role in the future screening process.

This classification scheme approximately corresponds to the following use of evoking strength and frequency values in Internist's scoring mechanism.

• Critical keys: evoking strength 4, frequency 4.
• Contradicting keys: evoking strength 1, frequency 4.
• Confirming keys: evoking strength 4, frequency 2.
• Minor keys: evoking strength 2, frequency 1.

Each element in a slot list is evaluated according to the above classification. The MATCHER determines how well the patient profile fits the structure that is determined for this element. Using the state of the patient profile with respect to the attributes of each element in this slot list and using the CF and AF factors, the MATCHER determines the CM of this element.
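
The four-way classification of keys above can be written as a small Prolog predicate. The following is a minimal sketch, assuming the BV of 0.5 reported for DIJEST and treating the boundary case CF = BV as "important", in line with the first principle. base_value/1 appears in the MATCHER code, but key_class/3 is a hypothetical helper introduced here for illustration and is not taken from DIJEST's published code.

```prolog
% base_value(-BV): the base value reported for DIJEST.
base_value(0.5).

% key_class(+CF, +AF, -Class): HYPOTHETICAL helper that names the category
% of a key from its contribution factor CF and absence factor AF.
key_class(CF, AF, confirming)    :- base_value(BV), CF >= BV, AF >= 0.
key_class(CF, AF, critical)      :- base_value(BV), CF >= BV, AF <  0.
key_class(CF, AF, contradicting) :- base_value(BV), CF <  BV, AF <  0.
key_class(CF, AF, minor)         :- base_value(BV), CF <  BV, AF >= 0.
```

Applied to two key tuples of Fig. 3, the pain key with factors (0.9, -0.1) is classified as critical by `key_class(0.9, -0.1, Class)`, and the gallstones key with factors (0.8, 0.2) as confirming. The four clauses are mutually exclusive, so each pair of factors falls into exactly one category.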
Repeating this iterative process, all CM values of the elements in a slot list are accumulated and normalized by the unique normalization_factor for the slot. The overall sum of the slots determines the score of a particular context and then the likelihood score of the disease.

The matching process for the slot values is illustrated below. In the code, Context refers to the current name of the context, Hypothesis refers to the name of the disease currently investigated and PatientSlot has all the values that are currently known for the Patient for a particular slot, such as symptoms. The first clause illustrates that the slots which are not applicable are not skipped over; they are assumed to be completely satisfied for the probability calculations.

    satisfy_slots(Context, Hypothesis, Patient, PatientSlot, not_appl, 1.0).
    satisfy_slots(Context, Hypothesis, Patient, PatientSlot, Slot, SlotProb) :-
        Slot \== not_appl,
        satisfy_slot(Context, Hypothesis, Patient, PatientSlot, Slot, SlotProb, 0).

    satisfy_slot(Context, Hypothesis, Patient, PatientSlot,
                 [normalization_factor(NF)], SlotProb, AccProb) :-
        SlotProb is AccProb/NF.                 % normalize for a slot
    satisfy_slot(Context, Hypothesis, Patient, PatientSlot,
                 [Key|KeyList], SlotProb, AccProb) :-
        Key \= normalization_factor(_),         % any other element is a key
        satisfy_key(Context, Hypothesis, Patient, PatientSlot, Key, CM),
        accumulate(AccProb, CM, AccProbNext),
        satisfy_slot(Context, Hypothesis, Patient, PatientSlot,
                     KeyList, SlotProb, AccProbNext).

satisfy_key determines whether a key is a single key or a disjunction of keys. The code for processing single keys is given below. The find predicate extracts the values for a particular key from the patient profile.
    handle_single_key(Context, Hypothesis, Patient, PatientSlot,
                      Key, KeyValues, CF, AF, CM) :-
        is_direct_key(Context, Key, KeyValues),
        find(Key, PatientVals, PatientSlot),
        direct_key(Context, Hypothesis, Patient, PatientVals,
                   Key, KeyValues, CF, AF, CM).
    handle_single_key(Context, Hypothesis, Patient, PatientSlot,
                      Key, KeyValues, CF, AF, CM) :-
        \+ is_direct_key(Context, Key, KeyValues),
        extract_from(Context, Hypothesis, Patient, PatientSlot,
                     Key, KeyValues, CF, AF, CM).
    handle_single_key(Context, Hypothesis, Patient, PatientSlot,
                      Key, KeyValues, CF, AF, CM) :-
        base_value(BV),
        not_known(Context, Hypothesis, Patient, Key, KeyValues, BV, CF, AF, CM).

    direct_key(Context, Hypothesis, Patient, PatientVals,
               Key, KeyValues, CF, AF, CM) :-
        check_whether_absent(PatientVals),
        absent_key(Hypothesis, Patient, PatientVals, Key, KeyValues, CF, AF, CM).
    direct_key(Context, Hypothesis, Patient, PatientVals,
               Key, KeyValues, CF, AF, CM) :-
        check_whether_present(PatientVals),
        match_compare(Context, Hypothesis, Patient, PatientVals,
                      Key, KeyValues, CF, AF, CM).

4.4. Calculating confidence measures for keys

This subsection describes how the individual CMs are calculated for individual key tuples. Both direct keys and extractable keys are treated in detail. Our description here is qualitative in nature. The exact formulae used can be found in Ref. [7].

The confidence measure of a direct key is calculated through an extended comparison of the values in the key attribute list of the DD with the patient values, as shown below. The first stage is to calculate the patient sum, that is, a score indicating how well the patient values match the attribute values. Patient sums are only calculated for keys which actually appear in the patient profile.
    match_compare(Context, Hypothesis, Patient, PatientVals,
                  Key, KeyVals, CF, AF, CM) :-
        compute_patient_sum(Context, Hypothesis, Patient, PatientVals, CF, AF,
                            Key, KeyVals, 0, PatientSum, ContradictionFlag),
        member(threshold(Threshold), KeyVals),
        find_normalization(KeyVals, NormFactor),
        compute_key_prob(ContradictionFlag, PatientSum, NormFactor,
                         Threshold, CF, AF, CM).

The MATCHER calculates patient sums as follows. First the terms used in the patient profile, which may be a mixture of qualitative and quantitative terms such as 6 days, are converted to the domain dependent qualitative terms which are used in the DDs, for example short or medium. The terms are then compared with the actual terms in the DD, and exact matches and contradictions are noted. The terms which exactly match are summed using weights which are given in the DD with respect to each attribute. This is illustrated with the code presented below.

    compute_patient_sum(Context, Hypothesis, Patient, PatientVals, CF, AF,
                        Key, [threshold(T)], TotalSum, TotalSum, no).
    compute_patient_sum(Context, Hypothesis, Patient, PatientVals, CF, AF,
                        Key, [Element|KeyVals], InterSum, NextSum,
                        ContradictionFlag) :-
        Element \= threshold(_),                % any other element is an attribute
        match(PrevContradictionFlag, Context, Hypothesis, Patient, Key,
              PatientVals, Element, InterSum, TotalSum),
        check_contradiction(PrevContradictionFlag, Context, ContradictionFlag,
                            Hypothesis, CF, AF, Key, TotalSum, NextSum).

    match(no, Context, Hypothesis, Patient, Key, PatientVals, Element,
          InterSum, AccSum) :-
        match_single_val(Hypothesis, Element, PatientVals, AttrContr),
        AccSum is InterSum + AttrContr.
    match(yes, Context, Hypothesis, Patient, Key, PatientVals, Element,
          InterSum, AccSum) :-
        is_a_contradiction(Hypothesis, Patient, Key, PatientVals),
        record_contradiction(Context, Hypothesis, Patient, Key).
    match_single_val(Hypothesis, site(Site, Contr), AllValues, Contr) :-
        member(site(Patsite), AllValues),
        appropriate_site(Hypothesis, Patsite).
    match_single_val(AnyConcept, Parameter, AllValues, Contr) :-   % generalized matching
        Parameter =.. [Name, ParVal, Contr],
        FindVal =.. [Name, SomeVal],
        member(FindVal, AllValues),
        % patient value first, DD term second, to agree with the
        % argument order of the match_from_tables clauses below
        match_from_tables(AnyConcept, Name, SomeVal, ParVal).

    % Sample facts
    appropriate_site(choledocholithiasis, right_upper_quadrant).
    appropriate_site(choledocholithiasis, epigastrium).

    match_from_tables(_, duration, DAYS, short) :-
        number(DAYS), DAYS >= 1, DAYS < 11.
    match_from_tables(_, duration, DAYS, moderate) :-
        number(DAYS), DAYS > 10, DAYS < 36.
    match_from_tables(_, duration, DAYS, long) :-
        number(DAYS), DAYS > 35.

Every direct key has a threshold, which is the minimum value of the patient sum considered to adequately match the key. The second stage of the MATCHER is to compare the patient sum with the threshold set for this key. On the basis of this comparison, the MATCHER concludes whether the patient profile satisfies the attribute values completely, partially, or contradicts them, and calculates the CM accordingly.

If the patient sum reaches or exceeds the threshold value, then we say that the direct key has been validated. The CM value in this case is the CF value. For example, the attribute values of the patient in Fig. 4 give a sum of 13 points. This is equal to the threshold value for this key; therefore, the direct key pain is validated for this patient. The CM is then set to 0.9.

If the patient sum is less than the threshold, and no contradiction has been noted, the key has been partially validated. The confidence measure is a normalized fraction of the CF value. This is handled by compute_key_prob. More details are in Ref. [7].
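The threshold comparison for the two non-contradicting cases can be sketched as follows. This is a minimal illustration under a stated assumption: the exact "normalized fraction" is given only in Ref. [7], so the linear scaling CF * PatientSum / Threshold used here, and the predicate name direct_key_cm/4, are hypothetical and should not be read as DIJEST's actual formula.

```prolog
% Hypothetical sketch of the second MATCHER stage for a direct key
% when no contradiction has been noted. Not DIJEST's actual formula.
% direct_key_cm(+PatientSum, +Threshold, +CF, -CM), with Threshold > 0.
direct_key_cm(PatientSum, Threshold, CF, CF) :-
    PatientSum >= Threshold.             % validated: CM is the full CF
direct_key_cm(PatientSum, Threshold, CF, CM) :-
    PatientSum < Threshold,              % partially validated
    CM is CF * PatientSum / Threshold.   % ASSUMED linear scaling of the CF
```

With the values of the pain example, the query direct_key_cm(13, 13, 0.9, CM) takes the first clause and binds CM to the CF, 0.9. A smaller patient sum takes the second clause and yields a proportionally smaller CM.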
If a contradiction has been noted, the value of the CM differs depending on whether the absence factor of the key is positive or negative. If the AF is negative, it is returned as the CM. Otherwise the CM is the negative of the CF value. This is handled by check_contradiction.

We describe each of the four categories of extractable keys in turn, where the key appears in the patient profile:

(i) Special-purpose knowledge is used to handle the anatomical or physiological states that are indexed as a key, such as common bile duct obstruction as in Fig. 3 or swelling of the pancreas. Some sample facts are given below.

    extract_from(Context, Hypothesis, Patient, PatientSlot,
                 Key, KeyValues, CF, AF, CM) :-
        anatomy(Key),
        anatomy_test(Context, Hypothesis, Patient, PatientSlot,
                     Key, KeyValues, CF, AF, CM).

    anatomy(Key) :- organ(Key).
    anatomy(Key) :- system(Key, SystemComponents).
    anatomy(Key) :- system(Sys, SystemComponents), part_of(Key, SystemComponents).

    system(intra_hepatic_ducts, [left_intra_hepatic_duct, right_intra_hepatic_duct]).
    system(extra_hepatic_ducts, [common_bile_duct, cystic_duct, pancreatic_duct]).

    anatomy_test(Context, Hypothesis, Patient, PatientSlot,
                 Organ, present, CF, AF, CF) :-
        organ(Organ),
        surgery(Patient, SurgeryList),
        \+ taken(SurgeryList, Organ).
    anatomy_test(Context, Hypothesis, Patient, PatientSlot, TestContext,
                 (Specification, FacttoDetermine), CF, AF, CM) :-
        test_illustrates(TestContext, Specification, ListofTests),
        prioritize(ListofTests, FinalTests),
        patient_satisfies(Hypothesis, Patient, PatientSlot, TestContext,
                          Specification, FacttoDetermine, FinalTests, CF, AF, CM).

For each state, the set of relevant tests is determined along with their order of preference. The representation of anatomical knowledge in DIJEST has been designed to allow the MATCHER to find the necessary tests that would indicate the presence of the specified state.
For example, the MATCHER finds that ultrasound and CT tests are indicative for understanding the condition of the common bile duct when checking choledocholithiasis [7].

After the necessary tests are found, it is determined whether the patient has taken the test. If not, the CM for this key is calculated using the CF and AF values, and varies depending on which of the four categories the CF and AF values lie in. If the patient has taken the test, domain specific knowledge is used to determine whether the patient's test results satisfy the specified state. If so, the CM is set to the CF. Otherwise, the CM is equated to the AF because a conflict exists between the expected condition of the patient and the patient profile. There is no possibility of partially validating these keys.

For example, the results of the ultrasound for the patient in Fig. 4 are compared with the expected outcomes for the key common bile duct. From the test results [7], it is found that the common bile duct of the patient is very dilated. Therefore, the CM is equated to 0.9. The ultrasound also shows that there are gallstones in the gallbladder. The CM for this key is set to 0.8.

Planning the optimal order of tests, prioritize, is a complicated issue, and could be the domain of another expert system that would perform in parallel to DIJEST. Currently, the tests are checked in sequential order. Studies in decision analysis for developing clinical strategies, similar to the one for the diagnosis of extrahepatic obstructive jaundice, can be useful for the development of this module. In particular, the sensitivity, specificity, complications and cost of the individual tests have been investigated to devise different adaptive strategies for test taking, represented as decision trees in Ref. [8]. We have used availability as our criterion for ordering.
(ii) Keys referring to blood tests, such as amylase and bilirubin, are evaluated using possibility distribution curves, which are graphs provided to us by our experts. First the MATCHER checks whether this is a key that requires curve fitting analysis by seeing whether the patient has taken the particular test. If not, the calculation of the CM is carried out by considering the four classes of CF and AF values, as for the anatomical states. If the patient has taken the test, the patient value is checked against a disease specific possibility distribution curve, where each curve estimates the likelihood that a patient with the particular test value has the disease being considered. The resulting possibility value is used along with the CF and AF to determine the CM of this key. For example, if the patient's test result shows a particular positive possibility, this value is used to normalize the CF specified for this key. Normalization is needed since the importance of this test result is specified with the CF, and how well the patient's result fits the expected value for the disease is determined by the curve. If the patient's test result contradicts the presence of the disease, invalidating the key, then the full AF value is used as the CM value.

Curve fitting is actually not very suitable for Prolog if speed and accuracy are required. It should be implemented as an external procedure.

    extract_from(Context, Hypothesis, Patient, PatientSlot,
                 Key, KeyVals, CF, AF, CM) :-
        possibility_curve(Key),
        curve_fitting(Hypothesis, Patient, PatientSlot, Key, KeyVals, CF, AF, CM).

    possibility_curve(Key) :- blood_test(Key).

    curve_fitting(Hypothesis, Patient, PatientSlot, Key, KeyVals, CF, AF, CM) :-
        blood_test_analysis(Hypothesis, Patient, PatientSlot, Key, CF, AF, CM).
    blood_test_analysis(Hypothesis, Patient, PatientSlot, BloodTest, CF, AF, CM) :-
        (   get_patient_val(BloodTest, serum, PatientSlot, Result)
        ;   get_patient_val(BloodTest, blood, PatientSlot, Result)
        ),
        blood_test(BloodTest, Hypothesis, Result, Prob),
        calculate_CM(Prob, Hypothesis, Patient, BloodTest, CF, AF, CM).

(iii) Recall that compound keys refer to a collection of findings, for example prodrome. Their analysis requires the MATCHER to consider each finding in the collection, similar to the consideration of each attribute of a direct key. Each finding for compound keys, though, has to be analyzed separately, similar to an element of a slot. The collected result of all the findings determines the overall CM for this key.

    extract_from(Context, Hypothesis, Patient, PatientSlot,
                 Key, KeyValues, CF, AF, CM) :-
        concept_table(Context, Key, KeyConcepts),
        satisfy_concept(Context, Hypothesis, Patient, PatientSlot,
                        KeyConcepts, Prob),
        concept_prob(Prob, CF, AF, CM).

The sum of all the confidence measures of the findings that are related to this key is denoted CMs. CMs is tested with respect to an interval [0, Threshold), where the value of the threshold for compound keys is application-dependent. If CMs lies within this interval, the presence of a finding can be neither validated nor invalidated, and is considered to be unknown. If the value is to the left of this region, the finding is invalidated and the overall CM is set to the AF. Otherwise, it is considered to be fully validated and the overall CM is set to the CF. This is handled by concept_prob.

(iv) The rule names that are used within key tuples are evaluated by activating each rule, for example for liver tests and obstructive tests. These rules, which represent for example a group of tests, need to be evaluated considering domain specific dependencies of the tests. Each rule is interpreted separately and the CM calculation varies for each.
Default behavior if the patient has not taken the test is similar to the default behavior for anatomical states and blood tests. For example, the rule obstructive tests in Fig. 3 is activated for the patient in Fig. 4. The values of the tests of this patient are found to be sufficient for this rule. Therefore, the CM for this key is set to the CF value, which is 0.7.

    extract_from(Context, Hypothesis, Patient, PatientSlot,
                 Key, KeyValues, CF, AF, CM) :-
        call_proc([Key, Context, Hypothesis, Patient, PatientSlot,
                   KeyValues, CF, AF, CM]).

    /* CALL ANY PROCEDURE PASSED AS PARAMETER */
    call_proc([ProcName|List]) :- Proc =.. [ProcName|List], Proc.

The MATCHER has a default behavior for evaluating keys which are not covered by the above discussion, for example a direct key in the DD which does not appear in the patient profile, or a compound key for which no information is known. The CMs of these keys are determined with respect to the four categories of CF and AF values. The crucial categories of critical keys and contradicting keys are chosen so as not to contribute to the overall sum. The confidence measure CM is calculated as follows:

    not_known(Context, Hypothesis, Patient, Key, KeyVals, BV, CF, AF, CM) :-
        % minor keys
        CF < BV, AF >= 0,
        CM is (CF + AF)/2.
    not_known(Context, Hypothesis, Patient, Key, KeyVals, BV, CF, AF, 0) :-
        % critical keys
        CF >= BV, AF < 0,
        record_question([Context, Hypothesis, Patient, Key, KeyVals]).
    not_known(Context, Hypothesis, Patient, Key, KeyVals, BV, CF, AF, 0) :-
        % contradicting keys
        AF < 0, CF < BV,
        record_possible_contra(Context, Hypothesis, Patient, Key).
    not_known(Context, Hypothesis, Patient, Key, KeyVals, BV, CF, AF, AF) :-
        % confirming keys
        AF >= 0, CF >= BV,
        record_unknown(Context, Hypothesis, Patient, Key).
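Because the not_known clauses call recording predicates that are not listed here, they cannot be run on their own. The sketch below restates only the numerical part of the default rule so that it can be tried in isolation. The predicate name default_cm/4 is hypothetical, and dropping the recording side effects is our simplification, not part of DIJEST.

```prolog
% Hypothetical, side-effect-free restatement of the default rule.
% The bookkeeping calls (record_question etc.) are deliberately omitted.
% default_cm(+BV, +CF, +AF, -CM)
default_cm(BV, CF, AF, CM) :- CF <  BV, AF >= 0, CM is (CF + AF)/2.  % minor
default_cm(BV, CF, AF, 0)  :- CF >= BV, AF <  0.                     % critical
default_cm(BV, CF, AF, 0)  :- CF <  BV, AF <  0.                     % contradicting
default_cm(BV, CF, AF, AF) :- CF >= BV, AF >= 0.                     % confirming
```

For a critical key such as one with CF 0.9 and AF -0.1, the query default_cm(0.5, 0.9, -0.1, CM) binds CM to 0, which is the neutral position described above.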
For example, the exact location of the obstruction cannot be determined by ultrasound for the patient in Fig. 4 [7]. This key is a critical key. Therefore, the CM value is set to 0 by the default values as described above.

4.5. Overall likelihood score

The overall likelihood score of a slot, $L_{slot}$, as mentioned earlier, is the sum of the CMs of its keys, normalized by the specific normalization factor of the slot. The normalization factor, NF, is defined as follows, where n is the number of elements in a slot. We assume that not all absence factors are zero.

$$NF = \sum_{k=1}^{n} \begin{cases} AF_k, & AF_k \geq 0, \\ CF_k, & AF_k < 0. \end{cases}$$

The weighted sum, $WS_i$, can be defined as the best case, where all the elements of slot i are validated. Thus, $WS_i = \sum_{k=1}^{n} CF_k$. Therefore, $NF_i \leq WS_i$. With this relation, the normalization helps to increase the contribution of slot i to the likelihood of the overall context. The score might be greater than 1 with data that confirm all the expected values of a slot in a disease descriptor. The overall likelihood of a context is thus defined as

$$L_c = \frac{\sum_{j=1}^{NV} L_{slot_j}}{NV}$$

where NV equals the number of valid slots in a context.

Let us illustrate this calculation by using the example disease in Fig. 3 and the patient in Fig. 4. If the lab_tests slot is considered, it is seen that the normalization_factor, 2.7, is calculated as described above. The calculation of CMs for each of the keys in this slot is illustrated in Section 4.3. Respectively, they are 0.7, 0.8, 0 and 0.9. The sum of these CMs is 2.4. Using these values, $L_{slot}$ is set to 2.4/2.7 = 0.89. Since there is only one slot in this context, $L_{lab\_tests}$ is equal to 0.89.

4.6. The patient analysis

When the MATCHER calculates the likelihood scores of a disease, special information related to the patient with respect to each disease is recorded along with the likelihood scores. This information is used to produce an evaluation report about the status of a patient. It consists of the list of findings which are expected but not present in the patient data, those which are contradictory to the evaluated disease, and the important concepts which have not been validated during the analysis of the MATCHER. The findings of the evaluation are divided into four categories: questions, contradictions, possible contradictions and unknowns. To record this information, again the four categories of CF and AF values are used. The code in the previous section is suitably adapted.

The evaluation report can be used to guide the subsequent stages of clinical diagnosis in the screening process shown in Fig. 2. For example, the necessary tests to check a specific condition that have not been performed are suggested by questions for a disease. Contradictions are the set of facts in the patient profile that contradict the existence of the disease. Possible contradictions are the unknown classes of information which might be critical. They can contradict the disease if their definite absence is proven. Unknowns is the category of data that can be used for confirmation but are unknown at the time of evaluation.

5. PERFORMANCE OF DIJEST

The development time for DIJEST was about nine months, including our learning about aspects of jaundice, the diseases and the related anatomy and physiology. The knowledge representation scheme and the uncertainty reasoning mechanism reflect our perception of medical concepts and clinical reasoning provided by our experts.
DIJEST has been tested with cases taken from medical textbooks and real patient records. For example, Table 1 shows a differential diagnosis produced by DIJEST for a patient with choledocholithiasis. The medical history of this patient is shown in Fig. 4. During testing, the evaluation of all the domain diseases was included. In clinical use, a threshold may be used to inhibit unlikely diseases.

The analysis shows that choledocholithiasis is given the highest likelihood score by DIJEST, even though it does not get the highest score in each context. The score for acute cholecystitis shows the way large absence factors can prevent a disease from being considered seriously as explaining the jaundice. The scores from the contexts of clinical examination and lab tests strongly suggest that cholecystitis could explain the jaundice, more so than choledocholithiasis, but the patient's history strongly contradicts the disease. The evaluation report of this patient points out, for example, the lack of information about critical findings of hepatitis, such as the presence of a prodrome, or the exposure to the use of needles in the past. The evaluation report is not shown here.

Later on in the course of the disease, the same patient contracted pancreatitis, directly caused by the choledocholithiasis. We added new test results to the patient profile and re-ran DIJEST. The result of the second differential diagnosis is given in Table 2. The only changed scores are those of diseases related to the pancreas. Note especially that the likelihood score of pancreatitis has significantly increased.

Table 2 demonstrates the ability of DIJEST to cope with multiple diseases. Knowledge is still necessary, for example, to realize that hepatitis and choledocholithiasis do not in general co-exist, whereas choledocholithiasis may cause pancreatitis.
Such reasoning, which would form part of the screening process, allows us to place more significance on the score for pancreatitis than for hepatitis even though it is actually marginally lower.

Table 1. Likelihood scores for Patient 10001

    Disease               History   Clinical   Tests   Total score
    choledocholithiasis     1.00      0.84      0.89      0.91
    viral hepatitis        -0.05      0.99      0.81      0.58
    hepatitis              -0.75      0.99      1.00      0.41
    acute cholecystitis    -1.50      1.17      1.08      0.25
    pancreatitis            0.28      0.20      0.21      0.23
    pancr. pseudo cyst      0.16      0.00      0.13      0.10
    cirrhosis               0.69      0.90     -1.50      0.03
    pancreatic cancer      -0.50      0.17     -0.87     -0.40

Table 2. Likelihood scores for Patient 10001

    Disease               History   Clinical   Tests   Total score
    choledocholithiasis     1.00      0.84      0.89      0.91
    viral hepatitis        -0.05      0.99      0.81      0.58
    pancreatitis            0.28      0.20      1.06      0.51
    hepatitis              -0.75      0.99      1.00      0.41
    acute cholecystitis    -1.50      1.17      1.08      0.25
    cirrhosis               0.69      0.90     -1.50      0.03
    pancr. pseudo cyst      0.16      0.00     -0.30     -0.05
    pancreatic cancer      -0.50      0.17     -1.23     -0.52

DIJEST was implemented by using Prolog constructs which are standard in almost all Prologs. It currently runs under SICStus and Quintus Prologs. In terms of speed, producing a table such as those above together with the evaluation report takes only a few seconds on average.

6. CONCLUSIONS

The features of DIJEST in its current state can be summarized as follows. Medical knowledge is represented declaratively. The domain specific knowledge and domain specific reasoning are clearly distinguished from domain independent knowledge by the MATCHER by using different types of keys.
The complex medical knowledge related to the diseases, the characteristics of different testing procedures and the basic anatomical and physiological structure of the body are all represented independently of patient information and illustrate characteristics of jaundice. Using Prolog enabled us to reach our objective, to have this separation and to write a specialized interpreter very easily. The interpreter has also been generalized to handle domains other than that of DIJEST by customizing the general matching capabilities of the interpreter.

Representing the likelihood estimates by using two separate factors, contribution and absence factors, can distinguish between valid, invalid, unknown and absent data. DIJEST presents very realistic likelihood estimates of the presence of the candidate diseases by evaluating the patient profiles, which may be incomplete. Of special importance is the calculation of likelihood scores of the individual contexts and their effect on the final diagnosis. DIJEST also emphasises significant factors in the evaluation of each disease. Contradictory findings and important data which may be required for further evaluation of the patient are noted. DIJEST is very promising in the early detection of co-existing diseases in a patient and provides good likelihood estimates in cases with multiple diseases.

The most difficult task in DIJEST is to obtain the contribution and absence factors for different keys. In particular, representing the experts' qualitative view of the subject by using those factors needs successive experiments and adjustment.
A weak point of DIJEST is its neglect of unexplained factors that are contained in the patient profile. The presence of a screening process for presenting the results of the MATCHER in a user-oriented manner and for removing redundant information would enhance the performance of DIJEST. The consistency checking is also only partially complete.

At this stage, however, DIJEST is encouraging in its expressive power for medical knowledge and in providing useful likelihood estimates to indicate the presence of domain diseases. It has potential for detecting the co-existence of multiple diseases. It is unique in both its knowledge representation scheme and its reasoning with uncertainty.

Acknowledgements. We would like to thank our medical expert, Professor David Ransohoff, for providing the medical knowledge embodied in DIJEST. We are grateful for his valuable time in attending the knowledge engineering sessions and acknowledge his influence in shaping DIJEST. Drs Arnold Shmerling and Lawrence Widman also supplied valuable medical insights. Dr Len Samuels commented on an earlier draft and provided the correspondence between contribution and absence factors of DIJEST and the scoring mechanism of Internist. We also thank Yuval Lirov for inviting us to submit this paper to the special issue of Computers & Mathematics with Applications.

REFERENCES

1. S. Pauker, A. Gorry, J. Kassirer and W. Schwartz, Towards the simulation of clinical cognition: taking a present illness by computer. Am. J. Med. 60, 981-996 (1976).
2. B. G. Buchanan and E. Shortliffe, Rule Based Expert Systems: The MYCIN Experiments of the Stanford Programming Project. Addison-Wesley, Reading, Mass. (1984).
3. R. Miller, H. Pople and J. D. Myers, Internist-I, an experimental computer based diagnostic consultant for general internal medicine. New Engl. J. Med. 307, 468-476 (1982).
4. F. E. Masarie, R. Miller and J. D. Myers, Internist-I properties: representing common sense and good medical practice in a computerized medical knowledge base. Computers Biomed. Res. 18, 458-479 (1985).
5. P. Szolovits and S. Pauker, Categorical and probabilistic reasoning in medical diagnosis. Artif. Intell. 11, 115-144 (1978).
6. Harrison's Principles of Internal Medicine, 11th edn. McGraw-Hill, New York (1987).
7. L. Ü. Yalçinalp, Uncertainty reasoning in a medical expert system: DIJEST. M.S. Thesis, Department of Computer Engineering and Science, Case Western Reserve University, Cleveland, Ohio (1987).
8. J. Richter, M. Silverstein and R. Shapiro, Suspected obstructive jaundice: a decision analysis of diagnostic strategies. Ann. Internal Med. 99, 46-51 (1983).