VALIDATION OF EXPERT SYSTEMS WITH APPLICATIONS TO AUDITING AND ACCOUNTING EXPERT SYSTEMS*

Daniel E. O'Leary
Graduate School of Business, University of Southern California, Los Angeles, CA 90089

*The author would like to thank Doug Andrews, Nils Kandelin, Chen-en Ko, and Paul Watkins from the University of Southern California 1986 Conference on Expert Systems; participants at the American Accounting Association 1986 national meeting; Paul Watkins and the anonymous referees of an earlier version of this paper; and, in particular, John Henderson for their comments. Earlier versions of this paper were presented at the University of Southern California Symposium on Expert Systems, February 1986, and the American Accounting Association National Meeting, August 1986.

ABSTRACT

This paper proposes a set of definitions for the concepts "validation" and "assessment" applied to expert systems (ESs). It develops a framework for this validation and demonstrates the framework on existing accounting and auditing ESs to elicit some of the research issues involved in ES validation. Validation is critical to the design and implementation of decision-making ESs. In a setting where objectivity is sought and variance is avoided, validation ascertains what a system knows, knows incorrectly, or does not know. Validation ascertains the system's level of expertise and investigates the theoretical basis on which the system is based. It evaluates the reliability of decisions made by the system. The validation framework developed in this paper is based on research methods. It is designed to reflect the unique aspects of ESs (in contrast to other types of computer programs) and can be used by ES developers as a basis from which to perform validation and by researchers as a framework to elicit research issues in validation.

Subject Areas: Accounting Theory and Auditing.

INTRODUCTION

In addition to the processes of designing, developing, and implementing expert systems (ESs), validation is important to the decision-making success of a system and to the continued use of an ES. An ES that has not been validated sufficiently may make poor decisions. This can lead to a loss of confidence in the particular ES or in other systems, resulting in discontinued use and financial loss.

A number of different approaches to validating particular auditing and accounting ESs [16] [17] [18] [38] [39] and medical ESs [52] [8] [7] have been reported. Validation of general ESs [21] [33] and potential bases for the validation of ESs [4] also have been discussed. This paper presents a theory-based framework that is useful not only for guiding the validation of an ES, but also for eliciting other validation research issues.

Validation of ESs

Developing, designing, and implementing systems for making expert decisions requires analyzing the knowledge base and the decision-making capabilities of the system. That process, referred to as validation, requires

1. ascertaining what the system knows, does not know, or knows incorrectly
2. ascertaining the level of expertise of the system
3. determining if the system is based on a theory for decision making in the particular domain
4. determining the reliability of the system

Once these concerns have been satisfactorily addressed, the system is updated to reflect the findings and may be revalidated. The process should be performed in an environment designed to provide an objective and cost-effective validation.

Validation ascertains what the system knows, does not know, or knows incorrectly. For example, when the expert system R1 (a system for configuring computers used by Digital Equipment [36]) was validated, errors and omissions in the knowledge base were identified. These errors and omissions were corrected before the system was placed in service [21].
Validation ascertains the level of decision-making expertise of the system. The types of problems a system can solve and the quality of its solutions define the level of expertise. For example, in education it is common to define an individual's level of accomplishment based on the types of problems that the individual can solve and the quality of the solutions produced.

Validation determines if the ES is theory based. Davis [12] [13] argued that basing an expert system on a theory is an efficient approach to developing such a system. Lack of a theory base has resulted in the failure of at least one system [48].

Validation analyzes the reliability of the ES. Given similar inputs, the ES should produce similar outputs. In addition, before and after revalidation a system generally should give similar responses to similar sets of test problems. (A minimal illustrative check of this kind is sketched at the end of this introduction.)

Validation Process

Previous analyses of ES validation have stressed the importance of periodic informal validation rather than a single, formal validation at the end of a project [44]. This validation will not be the same in each situation but will differ in formality and in the extent to which the validation process is implemented. As suggested by software engineering [44] and ES validation [21], formal acceptance testing may be appropriate. (A formal acceptance test might consist of a test where the user formally signs off on the quality of the decisions made by the system.)

ES Assessment

Although the focus of this paper is on ES validation, other issues addressed in developing, designing, and implementing ESs do not relate to decisions that the system makes; nevertheless, these other issues do contribute to the overall success of a system. The assessment process involves analyzing the user interface and the quality of the development effort. For example, it involves

1. ascertaining the quality of the user interface [31]
2. evaluating the documentation of the system (a typical weakness of most systems, with good reason; since ESs are evolutionary, the documentation does not evolve at the same rate as the rest of the system)
3. determining what language or shell should be used to develop the system
4. analyzing the quality of the system programming

Paper Objectives

This paper has four primary objectives: to propose a set of definitions for the concepts "validation" and "assessment" applied to ESs; to develop a framework for the validation of ESs; to demonstrate that framework on some previously published auditing and accounting ESs; and to use that demonstration to elicit some of the research issues involved in ES validation.

The paper proceeds as follows. The objective of developing a framework is to take advantage of the unique aspects of ESs. Therefore the next section discusses some unique aspects of ESs in general and of auditing and accounting ESs in particular. Using a research methods approach, these unique characteristics are incorporated into a framework for ES validation. The framework is demonstrated on accounting and auditing ESs, and that demonstration is used to elicit some of the research issues involved in the validation of ESs.
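As a minimal illustration of the reliability notion introduced above, the following Python sketch assumes a hypothetical consult(case) function that runs an expert system on a single test case; neither the function nor any data is drawn from a published system.

# A minimal sketch of a reliability check, assuming a hypothetical
# consult(case) function that runs the expert system on one test case.
# The function and the notion of a "retained" test set are illustrative
# assumptions, not features of any particular system.

def is_repeatable(consult, case, runs=5):
    """The same input should yield the same recommendation on every run."""
    first = consult(case)
    return all(consult(case) == first for _ in range(runs - 1))

def revalidation_agreement(old_consult, new_consult, test_cases):
    """Share of retained test problems on which the pre- and post-revalidation
    versions of the system give the same response."""
    agree = sum(old_consult(c) == new_consult(c) for c in test_cases)
    return agree / len(test_cases)

In practice, such checks would be run over a retained library of test problems each time the knowledge base is revised.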
UNIQUE CHARACTERISTICS OF AUDITING AND ACCOUNTING ESs

If ESs are like other computer systems, then validating an ES should be the same as validating any other computer system. However, if ESs are different, then their unique characteristics can be used to develop a specific framework for their validation. A number of technical, environmental, design, and domain characteristics distinguish ESs from other computer-based systems.

The technical aspects which distinguish ESs include the following. First, ESs process symbolic information (e.g., "If... then" rules) rather than just numeric information [1] [43]. This ability to process symbolic information allows ESs to solve nonnumeric problems, which generally are less precise than numeric problems. Second, experience with representing knowledge shows that a fraction (less than 10 percent) of this knowledge escapes standard representation schemes and requires special "fixes" [19] to make it accessible. Since other systems do not use knowledge representation, they do not face this problem. Third, ESs often are developed using either artificial intelligence (AI) languages or ES shells [25]. An AI language is a computer language aimed at processing symbolic information (e.g., Prolog [11] or Lisp [51]). An ES shell is software designed to aid the development of an ES by prespecifying the inference engine and making it easier to input knowledge into the system. Some of the first ES shells were EMYCIN [7] and AL/X [15]. More recently developed shells include Texas Instruments' Personal Consultant Plus, Teknowledge's M.1, and Inference Corporation's ART. The characteristics of ES shells and AI languages (e.g., their ease of examination by nonprogrammers) can be made a part of an ES validation framework.
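To make the contrast with purely numeric processing concrete, the following Python sketch is a toy illustration only: the symbolic facts and rules are invented, and the small forward-chaining loop merely suggests the kind of inference engine an ES shell prespecifies; it does not correspond to any particular shell or published system.

# Toy illustration: invented symbolic "If ... then" rules and a minimal
# forward-chaining loop of the sort an ES shell prespecifies as its
# inference engine. Facts and conclusions are symbols, not numbers.

RULES = [
    ({"receivables growing", "sales flat"}, "collectibility risk"),
    ({"collectibility risk", "weak credit controls"}, "expand substantive testing"),
]

def forward_chain(facts, rules):
    """Repeatedly apply rules until no new symbolic conclusions can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain({"receivables growing", "sales flat", "weak credit controls"}, RULES))

The point of the sketch is that the objects being manipulated are symbolic conditions and conclusions rather than numbers, which is one reason validation must address the content of the rules themselves.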
Environmental characteristics which distinguish ESs include the following. First, ESs directly influence or make decisions [1] [25] [49]. Other systems (e.g., decision support systems (DSSs)) simply support decision making or have an indirect impact on decisions. Second, the expertise being modeled by an ES generally is in short supply, is an expensive resource, or is not readily available at a particular geographic location [20]. This is in contrast to other computer systems (e.g., accounting systems) where generally a number of personnel understand what the system is designed to accomplish.

In addition to these characteristics that differentiate ESs from virtually all other types of computer systems, the dominant design methodology for both ESs and DSSs differentiates them from traditional computer systems. First, ESs (and DSSs) often are developed using a "middle-out" design rather than the traditional data processing approach of top-down or bottom-up design. A middle-out design philosophy starts with a prototype and gradually expands the system to meet the needs of the decision [27]. Second, like DSSs, ESs evolve over time; the system changes as the decision-making process gradually is understood and modeled [28]. As a consequence, traditional validation models based on other design philosophies are not likely to meet the needs of an ES.

Finally, some domain characteristics of auditing and accounting decisions (and other business-based decisions) often distinguish these decisions from decisions made in other domains. First, in contrast to some scientific decisions that have a unique solution (such as those represented by the ES DENDRAL [1]), auditing and accounting decisions generally do not have a single solution. Second, the decision reached often can be evaluated only by how similar it is to decisions other decision makers develop (i.e., by consensus); there may be no way to rank the decisions a priori. Third, different schools of thought may represent alternative knowledge bases. This means experts may not agree on a recommended solution. (This can apply to other disciplines as well.) Fourth, a "good" decision does not necessarily result in "good" consequences. Decisions are based on information available at the time they are made; this does not guarantee desirable consequences at a later time. Fifth, in contrast to some disciplines where a decision has no direct dollar value, the decision modeled by an ES can have substantial monetary value.

VALIDATION FRAMEWORK

Software engineering encompasses the general set of tools and techniques that aid programmers in software development. Since ESs are computer programs, software engineering might appear to be a likely candidate to supply a framework for ES validation. However, the unique characteristics of ESs in general, and of accounting and auditing ESs in particular, indicate that such a framework will not be appropriate.

An examination of one such approach [44] supports this view. First, in software engineering the focus of validation is on finding errors in the program. The validation of ESs is more than just a process of finding errors. Here the validation process meets the development needs of defining the level of expertise of the system, identifying what the system can and cannot do, understanding the quality of the decisions produced by the system, and describing the theory on which the system is based. Because software engineering generally is not concerned directly with human expertise, it is not directly concerned with these issues.

Second, the unique characteristics of ESs indicate they are different from other computer programs. Since ESs process symbolic information, use ES shells and AI languages, and require special fixes in their knowledge bases, they also require different validation approaches than other computer programs. Further, ESs directly affect decisions and they model expertise that is in short supply; the environment in which they operate is different from that of other computer programs.

Third, the domain-based decisions of auditing and accounting ESs often are not well enough understood to use traditional software engineering approaches. While software engineering generally uses a structured top-down or bottom-up approach in the design and evaluation of software, ESs are evolutionary and often are developed using a middle-out approach.

One alternative to the software engineering approach is a research methods approach. This approach views the development of ESs as experimental representations of human expertise, that is, as research designs. Kerlinger defined research design as "the plan, structure and strategy of investigation conceived so as to obtain answers to research questions and to control variance" [30, p. 300]. In a similar sense, validation can be defined as the plan, structure, and strategy of investigation conceived so as to obtain answers to questions about the knowledge and decision processes used in ESs and to control variance in that process.
Kerlinger also noted, "research designs are invented to enable the researcher to answer research questions as validly, accurately, objectively, and economically as possible" [30, p. 301]. He also noted that accuracy consists of four concepts: reliability, systematic variance, extraneous systematic variance, and error variance. We use these characteristics of research design (validity, objectivity, economics, and accuracy) to formulate a framework (Table 1) for the validation of ESs. Since Kerlinger noted the existence of three types of validity in research methods (content validity, criterion-related validity, and construct validity), each will be treated separately.

Table 1: Summary of validation framework.

1. Content validity
   a. Direct examination of the system by the expert
   b. System test against human experts (Turing test)
      (1) Intraexpert test
      (2) Interexpert test
   c. System test against other models
2. Criterion validity
   a. Definition of the level of expertise of the system
      (1) Human evaluation criteria
      (2) Test problems to define the level of expertise
      (3) Quality of responses defined
   b. Knowledge-base criteria
   c. Clarification of evaluation criteria
3. Construct validity
4. Objectivity
   a. Programmer validation
   b. Independent administration of validation
   c. Sponsor/end-user validation
   d. Biasing and blinding
   e. Different development and test data
5. Economics (cost-benefit)
6. Reliability
   a. System test against itself (sensitivity analysis)
   b. Test problems for revalidation
7. Systematic variance (experimental variance)
   a. Problems reflecting range of problems encountered
   b. Variation in the test problems
   c. Number of test problems
   d. Type I and Type II errors
8. Extraneous variance
   a. Complexity of the system
   b. ES's location in the system life cycle
   c. Recognition, examination, and testing of special fixes
   d. Location of judges during testing
   e. Learning on part of judges
9. Error variance

Content Validity

"Content validity is the representativeness or sampling adequacy of the content - the substance, the matter, the topics" [30, p. 458]. In validating ESs, content validity refers to ascertaining what the system knows, does not know, or knows incorrectly. This can be operationalized in at least two ways: by direct examination of the system components or by testing the system.

Direct Examination of the System Components. An ES is based on the knowledge of experts. Accordingly, it is important that these experts know what is contained in the system. The expert can examine the knowledge base directly in one of two ways. First, the ES could develop, for example, a list of the rules in its knowledge base for periodic review or a summary of the process it uses for the inference engine. Second, the expert could examine the storage of the information by the ES directly. The first solution might produce reports that could be read easily but, being an intermediate step, it also would require validation. As a result, it would be preferable if the expert could examine the knowledge base directly. In the second case, the primary concern is the format of the knowledge to be reviewed. If the system were built using an ES shell or an AI language such as Prolog, direct review likely is feasible.

A knowledge expert might not be able to investigate the inference engine to see whether it is correct because of the complexity of the computer code. However, if an ES shell is used, the inference engine normally would be prespecified.
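As a hypothetical sketch of the first, report-based approach, the following Python fragment renders a small rule base as plain "If... then" statements for periodic expert review; the rules, identifiers, and field names are invented and do not come from any published auditing system.

# Hypothetical sketch: rendering a small rule base as readable "If ... then"
# statements so a domain expert can review what the system knows. The rules
# and field names are invented for illustration only.

RULES = [
    {"id": "R1",
     "if": ["inventory turnover has declined", "obsolete stock was noted"],
     "then": "consider an inventory write-down"},
    {"id": "R2",
     "if": ["consider an inventory write-down", "the amounts are material"],
     "then": "expand the year-end inventory observation"},
]

def render_for_review(rules):
    """Produce a plain-language listing of the knowledge base for expert review."""
    lines = []
    for rule in rules:
        conditions = " and ".join(rule["if"])
        lines.append(f"{rule['id']}: If {conditions}, then {rule['then']}.")
    return "\n".join(lines)

print(render_for_review(RULES))

As the text notes, such a report is an intermediate representation that itself requires validation; it is shown here only to indicate what a reviewable listing of the knowledge base might look like.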
The direct examination of the components has some limitations. First, fear of computers [21] may generate hostility toward the system. Second, human information processing is limited. Direct examination may not correctly process the links between parts of the knowledge base; also, the breadth of information contained in the knowledge base could limit the success of a direct investigation. Third, the direct approach may require substantial resources. If the expertise is scarce, purchasing expertise may be costly. Fourth, direct examination may be a tedious job that could lead to errors in the validation. Fifth, if the knowledge base is large or complex, examination could prove very time consuming and complex and lead to information overload. Sixth, current technology is not designed to facilitate direct examination.

Further, any direct analysis of the knowledge base is limited to looking at the pieces of the base and not at how they interact. In addition, translation of the computer code makes any direct analysis of the heuristics in the inference engine difficult. Accordingly, the best solution is to test the system as a whole.

System Test against Human Experts. If the system is designed to perform as an expert, it should be tested against an expert (or experts). To ensure that the ES has captured the expertise of the expert it was designed around, its decisions should be compared to that expert's. However, such an intraexpert test procedure has the potential to introduce bias into the validation process through the acquisition of information from memory or the manner in which information is processed [26]. In terms of ESs, this means that the knowledge base or the relationships between sets of rules or the heuristics in the inference engine all may contain bias. This suggests that the ES also must be tested against other experts from the same "school" as the original expert but who are not biased or invested in the particular ES.

Since such interexpert tests, when conducted using experts from alternative schools, may yield contradictory decisions, this type of comparison is not recommended. However, alternative views of the world might produce additional knowledge for the system.

System Test against Other Models. System tests against human experts may be preferred to tests against other models. However, tests against human experts can be expensive. Experts also face time constraints [7]. Since the system is a model, one important characteristic should be its relationship to other models. An ES should

1. perform in a similar or better fashion than other models for the same problem
2. be able to solve problems that are not amenable to other solution methodologies

One type of model used to analyze decisions is regression analysis [6] [32], where independent variables represent the variables used by a decision maker. Other types of models may be used, but the model used largely is a function of the problem and previous recommended solutions to the problem. In particular, the preferred model would be the one that has provided the best solution to the problem so far. For example, an ES for production scheduling would be tested against an existing operations research model.

Unfortunately, in some cases comparison of the ES to regression or to other approaches may not be feasible because of the structure of the ES knowledge base (e.g., if "If... then" statements are present). It may be very difficult to translate such rules into numeric variables. However, simulation generally is a feasible alternative [9] [10].
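The following Python sketch, with invented decisions, illustrates how a system's responses to a set of test problems might be compared against the original expert (an intraexpert test) and against an independent expert from the same school (an interexpert test); a simple agreement rate is used only as a stand-in for the evaluation criteria discussed in the next section.

# Illustrative sketch with invented decisions: comparing the system's outputs
# on test cases with those of the original expert (intraexpert test) and with
# an independent expert from the same school (interexpert test).

def agreement_rate(system_decisions, expert_decisions):
    """Fraction of test cases on which the system and the expert agree."""
    matches = sum(s == e for s, e in zip(system_decisions, expert_decisions))
    return matches / len(system_decisions)

system_out      = ["qualify", "unqualified", "qualify", "unqualified", "qualify"]
original_expert = ["qualify", "unqualified", "qualify", "qualify", "qualify"]
second_expert   = ["qualify", "unqualified", "unqualified", "qualify", "qualify"]

print("intraexpert agreement:", agreement_rate(system_out, original_expert))
print("interexpert agreement:", agreement_rate(system_out, second_expert))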
Criterion Validity

"Criterion-related validity is studied by comparing test or scale scores with one or more external variables, or criteria, known or believed to measure the attribute under study" [30, p. 459]. In ES validation, this refers to the criteria used to validate the system, for example, to ascertain the level of the system's expertise. The primary criterion for system validation is the relationship between the decisions developed by the system and decisions developed by human experts. In AI this is referred to broadly as a Turing test. However, this sort of relationship is not evaluated easily. Difficulties arise for a number of reasons: differing definitions of the level of expertise, differing kn