College and Research Libraries, September 1986

Performance Appraisal: Behavioralism and Its Discontents

Charles A. Schwartz

In the area of performance appraisal, libraries and other nonprofit organizations face a special dilemma of providing job feedback while maintaining a nonauthoritarian climate. This paper assesses the utility of four approaches to this problem: behavioralism, minimal model, heuristic literature, and tacit knowledge from in-house experimentation. Part one argues that sources of rater error are generally intractable and, concomitantly, that behavioral appraisal methods are ineffective. Part two takes up the widespread notion that performance appraisal, as a matter of ethics, should be less ambitious and important. The third part describes the heuristic literature in this field as a "folk-science." The next part surveys the experience with in-house experiments in the social and behavioral sciences. Then, the final part of the paper draws some conclusions about the relative utility of the different approaches and offers a suggestion as to the future procedure in this field.

The literature on performance appraisal has been written largely in terms of two contrasting emphases. On the one hand are those writers, principally in the fields of business and applied psychology, who propose multitrait-multimethod appraisal systems. This behavioral approach has a rich, half-century history of research, but it has developed unevenly. Investigation of most of the possible types of rating formats has probably been pushed to the limits of its usefulness, while study of organizational factors surrounding the appraisal process is still rudimentary.1 Underlying the concepts and procedures of the behavioral approach is the basic notion that some combination of appraisal methods can be devised to control rater error.
At the other extreme is a smaller group, which argues that appraisals should be scaled down and made less important, especially for salary and rewards administration. This argument is roughly a quarter-century old but has not been advanced empirically. It draws on Douglas McGregor's classic contention that supervisors and subordinates distrust appraisals for fundamentally sound reasons: few supervisors want to play God; at the same time, every subordinate seeks to protect the integrity of his or her own personality, as well as to obtain valued rewards.2

In the gap between the two emphases is a heuristic perspective common to journals geared toward practitioners (e.g., Harvard Business Review) and to library science. It is found mainly in lists of recommendations about what to avoid when evaluating people and how to motivate them in the process. Although the literature in the library field over the past ten years testifies to a growing concern about the need for better appraisal methods,3 general discussions have not been followed up by experimental research that describes how particular methods affect performance at work, the state of morale, or the margin of error.

Charles A. Schwartz is Social Sciences Bibliographer at the University of Northern Iowa, Cedar Falls, Iowa 50613.

As an approach to performance appraisal, in-house experimentation is a novelty in the social and behavioral sciences. In part, interest in it can be traced to the gradual unfolding of two interdisciplinary concerns that pervade the so-called post-behavioral movement. These concerns are the issue of certainty versus usefulness of research and the question of whether human interaction and social arrangements can be understood with a disengaged consciousness. However, the actual practice of in-house experimentation has been limited because of the difficulty involved.
This paper assesses the utility of different approaches to performance appraisal for nonprofit organizations. The first section reviews the behavioral approach in three areas: (1) sources of rater bias and variance in the traditional supervisor-subordinate context; (2) problems inherent in alternative appraisal contexts; and (3) the status of several common appraisal techniques. The second section discusses the minimal appraisal, with particular reference to the dilemma of providing better job feedback while preserving a nonauthoritarian climate. The third section describes the heuristic approach to performance appraisal, focusing on two problems in this area: the valuing aspect of appraisals, and the limitations of common sense. The fourth section contains a review of the experience with in-house experiments. The fifth and concluding part provides a brief summary that calls for adaptation of in-house experimentation and development of tacit (personal) knowledge.

The term performance appraisal must be broadly interpreted since it covers placement and promotion decisions, salary and rewards administration, motivational and developmental counseling, and criteria for the evaluation of personnel practices. This paper discusses all of these matters, but focuses on the last one; legal concerns, such as due process, are not discussed.

The term nonprofit organization requires comment. Administration texts distinguish among three sectors: business enterprises, government agencies, and nonprofit organizations. Nonprofit organizations are distinguished from the other sectors by having ambiguous means and ends, little or no job feedback, and a tendency not to monitor their environments.
Whereas the characteristics and earnings of nonprofit workers may differ substantially from those of workers in business or in government, there is no evidence that job orientations or other subjective behaviors vary across the sectors.4 But the term is preferable to some others, such as knowledge-based industries5 or organized anarchies.6

I. BEHAVIORALISM

Rater Bias and Variance

There exist severe problems in obtaining (1) accurate supervisor ratings of job behavior, even on a single performance dimension; and (2) agreement among raters on a ratee's composite, comparative evaluation. Although rater agreement is no guarantee of accurate results, rater disagreement does signal the presence of error. Two fundamental sources of rater variance can be singled out: bias and halo error. Many specific sources of variance can be partially remedied or at least accounted for in appraisals, whereas bias and halo error remain intractable.7

By all accounts, rater bias (i.e., preference or prejudice, whether conscious or unconscious) is the most serious common problem of performance appraisals. A substantial body of empirical literature demonstrates what one might expect from everyday experience: people tend to rate people like themselves more favorably.8 According to personal construct theory, each individual relies on a set of his or her own characteristics to make judgments about others. Thus, "more effective" managers value initiative, planning ability, perseverance, and broad knowledge; "less effective" managers value consideration, tact, cooperation, teamwork, and loyalty.9 In the same way, raters are differentially accurate in identifying "correct" and "incorrect" job behaviors.10 Even when raters focus on a single behavior or trait, subjectivity breeds variance.11

Halo error is committed by raters relying on a summary impression of each ratee rather than distinguishing among the levels of performance that individuals exhibit on different job dimensions. Evaluation of any one dimension is not synonymous with performance appraisal as a whole.12 A number of reports conclude that no appraisal instrument or amount of education or experience is satisfactorily resistant to halo error.13 Training sessions in which participants are urged not to commit this error run two related risks. One is that individual raters are not capable of accurately identifying ratees' strong and weak performance areas. "Thus, it is possible that reducing halo may also lead to reduced reliability."14 The other risk is that "making greater distinctions among different performance categories may lead to a reduction in interrater reliability" as well.15

Possible upper limits in reliability of performance ratings were explored in one experiment by creating a nearly ideal environment in which actors played jobs familiar to the raters. Results indicated an inborn "ceiling for interrater agreement in judging performance."16 Given what appear to be human limitations associated with halo error, some behavioral researchers are turning away from efforts to control halo error and toward study of the kinds of cues people depend upon in making summary evaluations of others. Personal construct theory is one example of this shift in theoretical approach.

Another example of the new research centers on attribution theory in the field of social psychology.17 This theory seeks to explain by a two-stage process people's naive assumptions about the causes of their own and others' behavior.
In the first, or diagnostic, stage, the supervisor, presented with an incident of poor performance (e.g., tardiness, a missed deadline), tries to determine the cause by sorting through a variety of informational cues. Such attribution involves a judgment about whether the cause was internal to the subordinate (e.g., his or her attitude, ability, or effort) or external to the subordinate (e.g., task difficulty, lack of support, insufficient information).

The diagnostic stage is typically dominated by self-serving biases. A number of empirical studies show that people think their own behavior stems from external forces, while others' behavior stems from internal factors.18 Thus, in the case of unsatisfactory job performance, the supervisor is likely to blame internal causes (i.e., the ratee), while the subordinate is likely to blame external events (i.e., things beyond his or her control).

At the second, or decision, stage the supervisor can either modify the job or tell the subordinate to "change." Usually, the latter course is easier. This tendency toward internal attribution is strengthened when the outcome of unsatisfactory performance is serious, even though the outcome may have been beyond the subordinate's control. However, when poor performance occurs without serious consequence, the supervisor is more likely to overlook the problem.19 Attribution theory holds that, for more effective appraisals, supervisors should focus on the behavior and not the outcome.

Alternative Rating Patterns

In view of the inherent difficulties of the supervisor-subordinate relationship, various other rating patterns have received attention.
Research on subordinate evaluation of the supervisor is slim; also, the results of experiments do not lend themselves to clear generalization, apart from the evidence of a potential for frictions and misunderstandings in the use of this appraisal.20

Laboratory experiments, lacking any real threat of ratee retaliation, are theoretically interesting but tend to be artificial. One set of experiments on reverse evaluation suggests a declining role for sex bias. Traditionally, women have been rated differently from men, very often more negatively.21 At the University of Dayton, male subjects and female confederates worked in dyads on a task in which the female confederate was always chosen leader over the male subject. The selection of leaders was designed to appear in one of three ways to the men: as an arbitrary decision based on sex; as a matter of "chance"; or as a determination of "merit," based on a (secretly rigged) male-female competition on a prior task. It was arranged for half of the teams to succeed at the task, the other half to fail.22

Results of the experiments indicate that appraisal of women as leaders is mainly a dual function of perceived fairness of the selection method and of task outcome. In the first instance, male subordinates who think that a woman was handed a leadership position by virtue of her sex may try to restore psychological equity by downgrading the woman's qualifications. The other, more powerful criterion of appraisal for the woman leader is whether she succeeds at the task.

Although these findings can reasonably be extended to a leader of either sex when the selection process is perceived to be unfair, they carry a particular warning against clumsy affirmative action programs. Also, they suggest that organizations can best ensure a person's acceptance in a leadership role by assigning relatively easy tasks at first.

Peer evaluations have a curious quality.
They tend to show more halo error than supervisor ratings (i.e., less sensitivity to differences among job behaviors);23 yet they have high reliability in the prediction of leadership potential.24 No general explanation of this reliability seems to exist, but a number of specific findings are well documented: (1) friendship plays a minor role; (2) racial bias is to be expected; (3) brief observations of physical appearance and manner are highly similar to observations based on considerable prior contact; and (4) peer evaluations are stable across groups, or as group composition changes.

Self-ratings are dichotomous. When employees compare their own job performance with that of others in similar jobs at about the same salary, self-ratings are greatly inflated. As a rule, the higher the job level, the greater the inflation. For example, among managers more than 80 percent rated themselves in the top 10 percent category.26

On the other hand, self-appraisals for developmental purposes are much more reliable indicators of one's relative strength in different capacities. Research has shown that self-appraisals of managerial potential, when done confidentially, have fairly good job-behavior discriminant validity and are not unduly influenced by halo error.27 Overall, most people have an intuitive grasp of their ability despite an unrealistically favorable perception of their current performance compared to others' performance. This self-delusion is not necessarily bad. As research has also shown, high self-esteem is associated with many desirable work characteristics, such as motivation, pride in one's work, and physical and mental well-being.28

Appraisal Techniques

Rater Training. Considerable attention has been directed toward attempts to reduce various rating errors through rater training. Recent studies suggest few solid leads about how to train people to observe and record job behaviors reliably.29 Some writers argue that training programs should not be encouraged because, lacking any theoretical model, "we do not know what truth in performance appraisal is."30 Others contend that some training, at least in communication skills, is necessary because raters without any training "have the potential of doing more harm than good."31

One drawback to rater training was mentioned above: the reduction in determinant validity and expansion of interrater variance after lectures about avoiding halo error. Another consideration is that the effects of training on subsequent performance appraisals do not last long. The most commonly cited "washout" period is one week.32 The essential point is that rater errors appear to be well-developed habits resistant to lectures, practice, or warning.33

Rating Formats. Trait checklists for grading complex personal characteristics, e.g., cooperativeness and independence, are looked upon unfavorably in management circles and by the courts. From an organizational standpoint, the main problem with trait measures is poor feedback: telling someone to "show more initiative" or to "be a better listener" does little good without telling the person how to implement the advice in behavioral terms.34 From a legal perspective, personnel decisions based on trait measures and affecting protected classes (e.g., minorities) are deemed "susceptible to partiality and to the personal taste, whim, or fancy of the evaluator."35

These problems are well known. What is less widely recognized is that trait measures and other ad hoc instruments have been shown to be no worse than, and most often superior to, behavioral scaling strategies. In one study of police job performance, three rating instruments (a trait checklist, a behaviorally anchored rating scale [BARS], and a simpler numerical rating scale) were compared with respect to several criteria.
Although no instrument was clearly superior for all purposes, the results taken together suggest the following ranking from best to worst: the numerical rating scale, trait ratings, and the BARS format.36 This ordering of the relative superiority of the instruments is contrary to theoretical expectations in a number of fields but is in general agreement with the reviews of the literature bearing on empirical testing of scaling strategies.37

Trait measures continue to be the most widely used type of performance appraisal method. By one estimate, 90 percent of today's organizations evaluate an employee's effectiveness in terms of personal qualities thought desirable by those in the organizational hierarchy.39

MBO. Management by objectives (MBO) and performance appraisal are often confused. The key to MBO's success is goal setting tailored to the individual, while performance appraisal attempts to apply uniform standards to all employee behavior. Nevertheless, writers have commonly adopted two interconnected postulates: first, that MBO leads to higher performance; and second, that performance appraisal should focus on goal attainment.

Underlying the first postulate is the belief that participative goal setting is superior to assigning goals. The research on this issue is equivocal. It would appear that goal specificity and goal acceptance can be attained as easily through assigned as through mutually set goals but that the establishment of higher goals is enhanced more by employee participation than by assigning higher goals or telling employees to do their best.40 Although there is a heavy emphasis in the literature on mutual goal setting for the sake of professional employees, one review article found that such participation is "most effective among uneducated and minority-group members, perhaps because it gives them a sense of control over their fate."41

The second postulate, that performance raters should emphasize goal attainment, ignores the maxim of attribution theory: for more effective appraisals, raters should focus on behavior and not outcome. Research has shown that if employees are evaluated on goal attainment, "they are likely to set low goals or reject higher goals imposed by supervisors," but that if employees are evaluated on performance, "they will continue to set high goals whether the goals are attained or not."42

Again, the role of participation is unclear. When one of the parties in mutual goal setting is "more equal" than the other (as is the norm), the question of management by whose objectives inevitably arises.43 Although quantification is not possible, observers seem to agree that MBO includes a lot of spurious participation, in which the fiction is maintained that the subordinate is making a real input into work planning and goal setting.44

Moreover, MBO is not for everyone. Job experience is necessary before participation makes much sense. For new employees, as well as troublesome ones and those who do not wish to participate, a more appropriate personnel style than MBO is what behavioralists call the watcher. This means getting individual employees to perform the way you want by reminding them that you are overseeing the situation.45

Interview. During the appraisal interview, the supervisor and the subordinate are generally working at cross purposes. Supervisors have to make judgments affecting the subordinates' careers, morale, and relative share of rewards. Subordinates need to verify their self-esteem and also obtain enough rewards. Such conflicts tend to generate ambivalent patterns of avoidance and defensiveness.46

Mutual avoidance of the risks of negative feedback has given rise to a process called the "vanishing appraisal," in which both parties implicitly collude to keep matters short and sweet.47 Defensiveness springs from criticism of any sort.48 To be effective at all, negative feedback should focus on specific behavioral changes to improve job performance; but there is no hard evidence that well-intentioned "constructive criticism" actually works.49

Research in this area focuses on the relationship between appraisal interview characteristics and subsequent job performance. At least six major characteristics of an effective appraisal interview are well documented (although it is unclear whether they are all conceptually distinct). These are (1) the subordinate's prior preparation for the interview, (2) his or her opportunity to present ideas and feelings, (3) the degree of helpfulness of the supervisor, (4) the extent to which job problems are cleared up, (5) the extent to which goals are set, and (6) the absence of threats. Each characteristic has been found to be significantly related to improvement in later job performance, as well as to satisfaction with the appraisal process.50

II. THE MINIMAL APPROACH

The nonprofit organization faces a special dilemma in the appraisal process not ordinarily found in the business or governmental sectors. The area of this dilemma, noted in a national survey of the quality of employment,51 is located between work characteristics and organizational climate. On the one hand, the survey found that "nonprofit employees, less fettered by centralization and control, have more autonomy in doing their jobs, report more variety and challenge, and find that their education is matched to their job demands." The cost of such job autonomy and employee influence is lack of job feedback, whether from the work itself or from supervisors and peers.
On the other hand, as the survey also showed, nonprofit organizations generally attempt to draw together their employees by democratic arrangements, so that participation "may create a greater sense of commitment and involvement for employees and serve to clarify jobs and soften role stresses." The survey concluded that efforts to import more businesslike performance appraisal systems into nonprofit organizations for the sake of job feedback could "centralize authority, limit flexibility, stifle innovation, and . . . alienate employees not only from their service but also from their ideals."52

This argument parallels the general call in some quarters to make performance appraisals, as an ethical matter of fairness, less ambitious and important.53 Although no "minimal model" has been drafted, the basic position is that managers should only attempt to identify extremes of performance and leave salary administration to impersonal forces, such as the seniority system.

Apart from likely violation of legal ground rules, there are two fundamental weaknesses with the minimal approach. One is that the large area of indeterminacy in appraisals is an important fiduciary component of the supervisor's role, not an excuse to skip or minimize the appraisal process. On the whole, employees want valid job feedback. They may not be looking for criticism, but they do not want to spend their professional lives uninformed about the character and results of their work. The other weakness of the minimal approach is that it downplays the need for goal setting and job feedback, and apparently ignores the "Pygmalion effect": research has shown again and again that people perform best when they have high supervisor expectations, encouragement, and visual attention.

III. HEURISTIC KNOWLEDGE

The heuristic side of performance appraisal consists of strategies and rules of thumb for deciding which among several alternative courses of action promises to be the most expedient one. In the mathematical and physical sciences, some heuristic rules can leap across logical gaps in problem solving. However, in immature or ineffective fields such as social sciences, heuristic knowledge consists of intuitive generalities, or aphorisms, which have neither the objectivity nor the permanence of an impersonal law.

The heuristic potential of any field depends on whether the important problems in that field "can be effectively solved to yield some sort of genuine knowledge."54 Study of the appraisal process is a venture into tremendously complicated problems of organizational behavior. Behavior includes not only overt acts but also subjective behavior, such as attitudes, beliefs, motivations, and desires. No significant empirical or theoretical progress has been made in this field for a long time. A major review of the 1970s literature had little positive to report on performance appraisal: "Research convincingly showed that instrumentation [i.e., a behavioral rating scale] has minimal impacts on psychometric errors, but that conclusion was pretty well established before."55 A more current assessment of research work in organizational psychology concluded that "the theory/practice gap will close by a decrease in the relevance of theory, not by an increase in the quality of the practice."56

About the only strong consensus with empirical support in the heuristic literature on performance appraisal deals with the fundamental importance of setting goals and giving feedback.
Together, these are not only more effective than alternative methods, such as participatory decision making or job enrichment, but may also be the major mechanisms by which these other incentives affect motivation. Even money is a more effective motivator when bonuses are made contingent on the attainment of specific objectives.57

But on issues of technique having a direct bearing on how to set goals and give feedback (for example, the relative merit of assigned versus participatory goals, or of intrinsic versus extrinsic rewards, or of weekly versus annual appraisal sessions), theorists plainly disagree.

The potential development of heuristic rules about performance appraisal is undermined not only by the contemporary scarcity of empirical generalizations, but also by two chronic limitations: the valuing aspects of appraisal, and common sense.

In moving along a continuum from behavior-based, mechanistic tasks to those that are judgment based and, hence, organic in structure, the valuing aspect of appraisal becomes increasingly salient. Even for moderately uncertain tasks, "articulation of a mutually agreeable standard of measurement is normally not possible."58 A series of studies on this shift from measurement to valuing as appraisal uncertainty increases demonstrated that value is likely to be assigned on the basis of "personal factors of relevance," and concluded that personal judgments should be regarded as "critical and legitimate concerns" in those situations "where controversy and interest are joined."59

As a general surmise, common sense is not an infallible guide to performance appraisal. Herbert White60 and Richard De Gennaro,61 among other writers, rightly argue for the value of experience, the proper attitude, and intelligence in library management; but Peter Drucker singles out two areas in which such tacit knowledge is not enough. "One is personnel decisions, the other is marketing."62 Drucker's statement cannot be "proved," but there is supportive research on the nature of tacit knowledge (discussed below); and an old but unchallenged review of the literature concluded that the inability to judge the work traits and aptitudes of others is itself a personality trait.63

Behavioral study of performance appraisal is not simply stalled, nor its heuristic knowledge merely ineffective. Given both a lack of empirical generalizations and a propensity for subjective evaluations and personal feelings, the prevailing heuristic literature resembles a folk-science. This is "a body of accepted knowledge whose function is not to provide the basis for further advance, but to offer comfort and reassurance to some body of believers."64

IV. IN-HOUSE EXPERIMENTATION

Broadly defined, the in-house experiment is a nonrandom, quasi-empirical plan "to generate knowledge about human options that go beyond the status quo." Its nonrandom character derives from the self-selected population (of participants). Its quasi-empirical character involves a shift from positivist logic (normal science) to tacit knowledge, which is partial and subjective. To generate knowledge is not simply to make an inquiry; knowledge is the capacity to detect and deal with error. Finally, human options can apply to various aspects of organizational behavior: work motivation, staff productivity, job enrichment, and participatory management, as well as performance appraisal.

Any stricter definition of the in-house experiment would be arbitrary. The philosophical pluralism now current in the field of organizational psychology is one of its chief sources of strength,66 but it is also a broad and general feature; and the paucity of published reports on this type of experiment makes it difficult to find models or paradigms.67 The library science literature, for example, is full of narratives about performance appraisal, but only one article stands out as a true experiment, in which a definite plan leads to action that detects and attempts to correct a mismatch between intentions and organizational outcomes.68

Reports on in-house experimentation have been made, for the most part, by a small band of theorist-practitioners who have tied adult learning strategies to the study of organizational culture.69 Their findings suggest two general reasons why an in-house experiment is not an ordinary plan of action: "the inability of individuals to learn the skills to produce conditions of trust, candidness, and risk-taking under online conditions, and the lack of organizational support for such changes."70

With reference to the first reason, a fairly sophisticated study of the types of "master plans" people hold about managing their actions determined that "individuals who are young or old, male or female, white or black, rich or poor, powerful or powerless, may hold different espoused theories, they may act differently, but their interpersonal theories-in-use do not seem to vary." A model of this more or less universal behavioral pattern has four objectives: "(1) control the purpose of the situation, (2) minimize losing and maximize winning, (3) suppress negative feelings, and (4) emphasize rationality."71

Additionally, research work on leadership points up a marked tendency for organizational administrations to avoid in-house experiments. Supervisors at every level tend to have a basic lack of confidence in others and do not believe that self-management traits (initiative, judgment, responsibility) are widely distributed among subordinates.72 And a large number of supervisors hold to a lump-of-control theory.
This type of theory-in-use assumes that there is a virtually fixed amount of authority in any organization, so that whatever the supervisor surrenders is a political and even a personal loss.73

All this suggests behavioral reasons why in-house experiments to alter the status quo have seldom been put into action. On a theoretical plane, it seems unlikely that such experiments could evolve into strategic sources for model building because their methods and criteria consist as much of independent variables as of organizational behavior. Even MBO, in practice, has scores of variations. Accordingly, the confidence level of an in-house experiment may decay quickly when findings are removed from the original setting.74

An associated theoretical issue centers on the intrinsically mental character of in-house experiments (and of all social phenomena).75 Several labels for personal knowledge about organizational behavior in a particular setting have appeared (e.g., indwelling,76 understanding,77 armchair theorizing), but most writers have adopted Michael Polanyi's 1972 term, tacit knowledge.78

Everyone uses tacit knowledge to make sense of the interplay of intentions, interactions, and context in daily life. In an in-house experiment, the role of tacit knowledge (experience, intuition, intelligence, guesswork) would be preeminent in designing the approach and in selecting and interpreting the results:

The researcher ought to be skeptical of positivist science when the unit of analysis is, like the researcher, a self-reflecting subject, when relationships between subjects (actors) are influenced by definitions of the situation, or when the reason for undertaking the research is to solve a problem which the actors have helped to define.
Such action research makes theorists nervous because reliance on a participant's personal assessment implies a profundity that can be self-aggrandizing, defensive, or misinformed. To temper and round out subjective assessments, writers uniformly recommend the panel or team design for evaluation of nonrandom in-house experiments.80

CONCLUSIONS

Performance appraisal provides a modicum of rationality and legality to key personnel actions, but there is no neat, objective way of going about it because of common, intractable sources of rater error. Of the four approaches to performance appraisal under review (behavioralism, minimal model, heuristic knowledge, and in-house experimentation), only the last one has the capacity to detect and deal with error. All the other approaches are of relatively limited utility.

The essential difficulty with behavioralism is that our state of knowledge about the determinants of appraisal effectiveness lags far behind organizational needs and legal requirements. Behaviorally based methods, when evaluated for validity, relevance, discriminability, and freedom from bias, remain inadequate for the purposes that appraisals are intended to serve. This all-too-apparent deficiency in personnel administration has come under increasing legal scrutiny,81 but there is little documentation of the particular laws governing appraisals in small nonprofit organizations.

No dramatic breakthroughs in the theory or practice of performance appraisal are in sight. Practically all of the concepts and methods now in use have been around for decades, and the types of method still to be tested do not appear promising.82 Thus, the search for an ultimate rating format was plainly exhausted in the 1970s when researchers quit hedging about the efficacy of behavioral scaling strategies.
Rater training programs may be on the decline, but misapplications of MBO as a grand substitute for an appraisal system will likely endure. At the theory-building level, a workable model of the appraisal process would encompass such a vast array of human, social, and organizational variables that, if feasible, it would require decades to develop.83

The antithesis of behavioralism is paternalism, involving a minimal model with two chief premises: benign neglect of performance appraisal, and "unconditional rewards in the sense that the amount of reward that any individual receives is not dependent in any clear-cut way on how he behaves within the organization, but rather on the fact that he is a member of that organization."84 Although this model has been eclipsed in the management literature since the mid-1950s by others (e.g., MBO, participatory decision making, behavioralism), it is still salient in the nonprofit sector and especially in the library field.

Relaxation of performance appraisal and resort to across-the-board pay increases may be entirely reasonable in some organizations at some times, such as during periods of budgetary restraint. But, over the long haul, the premises of paternalism tend to break down. First, where performance appraisals are not conducted, "implicit evaluations will be assumed, and what managers say and do, even casually, will be interpreted and weighted as organizational evaluations."85 Second, an unconditional rewards system based on membership and egalitarianism may well be dissatisfying not only to high achievers but also to many average workers, given the high inflation of self-ratings relative to others' performance.

The aim of heuristic literature is to disseminate usable knowledge in a particular field.
In the social and behavioral sciences, the logic of positivism (that human interaction and society can be understood in terms of impersonal laws) has led to research reports that are quite narrow in scope and technical in application. One argument of an emergent postbehavioral movement is that scientific certainty of knowledge, to a degree, is the enemy of its usefulness:

The effort to establish certainty almost always leads to large numbers of studies being done on a single small topic and to more and more careful specification of the phenomenon. Once the phenomenon has been subjected to all the tests of certainty, it often winds up so complex that it is no longer useful to the practitioner.86

Hence, the important function of heuristic literature is to make scientific knowledge understandable and accessible to the public. This literature holds few empirically grounded generalizations about performance appraisal, but it certainly provides readier access to useful knowledge than could be obtained from a survey of the primary research in behavioral science journals.

Analysis of the several main approaches to performance appraisal along the foregoing lines leads to the inevitable conclusion that performance appraisal is not an effective field. However, this does not require us to assume that nothing can be done, through deliberate plan, to improve the appraisal process. It seems reasonable to expect that real improvements can be achieved through a good measure of in-house experimentation and some basic changes in the way we deal with unspoken assumptions about our ability to judge the work of others.

Unlike rater errors in performance appraisal, the problems of in-house experimentation are not intractable. The sine qua non is "double-loop learning,"87 which occurs when, in order to deal with error, behavioral values and tacit knowledge are questioned and altered.
(Single-loop learning, by contrast, involves the detection of only routine errors that do not upset the underlying values of the theory-in-use.)

The elusive goal of in-house experimentation has been called reflexive power,88 or the enlightenment effect, whereby knowledge of behavioral laws can modify or even nullify their impact on those who share that knowledge.89 However, discussions in the literature about the enlightenment effect are necessarily abstract because such tacit knowledge is "a-critical"90 in that it cannot be articulated expressly. Thus, it does not pass easily as public (or heuristic) knowledge:

Even if we begin to have an intuitive understanding of an organization's culture, we may find it extraordinarily difficult to write down that understanding in such a way that the essence of the culture can be communicated to someone else. . . .

But when we see the essence of a culture, the paradigm by which people operate, we are struck by how powerful our insight into that organization now is, and we can see instantly why certain things work the way they do, why certain proposals never get bought, why change is so difficult, why certain people leave, and so on. . . . It is the search for and the occasional finding of this central insight that makes it all worthwhile . . . even if, in the end, we can share it only with colleagues.91

In brief summary, perhaps an appropriate skepticism of the approaches to performance appraisal other than in-house experimentation is the comedian's dictum, "It isn't ignorance that does the most damage, it's knowing for sure so much that isn't so." These approaches involve only "single-loop learning" and do not have the capacity to detect and deal with basic error. In-house experimentation, by contrast, is a way of correcting overlearning without overintellectualizing the problem.
In using nonexperimental approaches we judge the work and potential of others with a set of behavioral values and tacit theories-in-use. These may or may not be accurate or corrigible, but most organizations have never thought through what they regard as desirable about personal traits and organizational behavior, what relative importance they give to the various values they hold, or how these values can be incorporated into the current appraisal system. We would be better off if we made a serious endeavor to assess and experiment with our theories-in-use, starting with more doubts about what we know "for sure."

REFERENCES

1. Thomas DeCotiis, "An Analysis of the External Validity and Applied Relevance of Three Rating Formats," Organizational Behavior and Human Performance 19:265 (Aug. 1977); Jeffrey Kane and Edward Lawler III, "Performance Appraisal: Its Assessment and Determinants," in Research in Organizational Behavior: An Annual Series of Analytical Essays and Critical Reviews, ed. by Barry Staw, v.1 (Greenwich, Conn.: JAI Press, 1979), p.456-58.
2. Douglas McGregor, "An Uneasy Look at Performance Appraisal," Harvard Business Review 35:89-94 (May-June 1957).
3. Dimitry Berkner, "Library Staff Development through Performance Appraisal," College and Research Libraries 40:335-44 (July 1979); G. Edward Evans, Management Techniques for Librarians, 2d ed. (New York: Academic, 1983), p.211-43; David Kaser and Ruth Jackson, "A Century of Personnel Concerns in Libraries," in A Century of Service: Librarianship in the United States and Canada, ed. by Sidney L. Jackson, Eleanor B. Herling, and E. J. Josey (Chicago: American Library Assn., 1976), p.129-45; H. R. Kroll, "Beyond Evaluation: Performance Appraisal as a Planning and Motivational Tool in Libraries," Journal of Academic Librarianship 9:27-32 (Mar. 1983); Murray S.
Martin, Issues in Personnel Management in Academic Libraries (Greenwich, Conn.: JAI Press, 1981), p.153-57; Maxine Reneker, "Performance Appraisal in Libraries: Purpose and Techniques," in Personnel Administration in Libraries, ed. by Sheila Creth and Frederick Duda (New York: Neal-Schuman, 1981), p.227-89; Herbert S. White, Library Personnel Management (White Plains, N.Y.: Knowledge Industry Pub., 1985), p.123-38.
4. Philip Mirvis and Edward Hackett, "Work and Work Force Characteristics in the Nonprofit Sector," Monthly Labor Review 15:3-12 (Apr. 1983).
5. Peter F. Drucker, The Age of Discontinuity: Guidelines to Our Changing Society (New York: Harper, 1969).
6. Kim Cameron, "Critical Questions in Assessing Organizational Effectiveness," Organizational Dynamics 9:66-80 (Autumn 1980).
7. Randall Schuler, Personnel and Human Resources Management, rev. ed. (New York: West Pub., 1984), p.271-79.
8. Elaine Gruenfeld, Performance Appraisal: Promise and Peril (Ithaca, N.Y.: New York State School of Industrial and Labor Relations, Cornell Univ., 1981), p.18-20; Kane, "Performance Appraisal," p.452-54.
9. W. Kirchner and D. Reisberg, "Differences between Better and Less Effective Supervisors in Appraisals of Subordinates," Personnel Psychology 15:295-302 (Autumn 1962).
10. Gruenfeld, Performance Appraisal, p.10-13.
11. Wayne Cascio and John Bernardin, "Implications of Performance Appraisal Litigation for Personnel Decisions," Personnel Psychology 34:215 (Summer 1981).
12. Walter Borman, "Consistency of Rating Accuracy and Rating Errors in the Judgment of Human Performance," Organizational Behavior and Human Performance 20:238-52 (Dec. 1977); Robert Holzbach, "Rater Bias in Performance Ratings: Superior, Self-, and Peer Ratings," Journal of Applied Psychology 63:579-88 (Nov.
1978); William Scott, "The Influence of Variations in Performance Profiles on the Performance Evaluation Process: An Examination of the Validity of the Criterion," Organizational Behavior and Human Performance 14:360-70 (Dec. 1975).
13. Walter Borman, "Effects of Instructions to Avoid Halo Error on Reliability and Validity of Performance Evaluation Ratings," Journal of Applied Psychology 60:556-60 (May 1975); Holzbach, "Rater Bias," p.581.
14. Borman, "Effects of Instructions," p.557.
15. Borman, "Effects of Instructions," p.560.
16. Walter Borman, "Exploring Upper Limits of Reliability and Validity in Job Performance Ratings," Journal of Applied Psychology 63:142 (May 1978).
17. Gary Latham, Larry Cummings, and Terence Mitchell, "Behavioral Strategies to Improve Performance," Organizational Dynamics 9:5-23 (Winter 1981); Terence Mitchell and Robert Wood, "Supervisors' Responses to Subordinate Poor Performance: A Test of an Attributional Model," Organizational Behavior and Human Performance 25:123-38 (Feb. 1980).
18. Latham, "Behavioral Strategies," p.14.
19. Mitchell, "Supervisors' Responses," p.138.
20. Jess Martin, "Staff Evaluation of Supervisors," Special Libraries 70:26-29 (Jan. 1979).
21. Holzbach, "Rater Bias," p.579-88; Benson Rosen and Thomas Jerdee, "Sex Stereotyping in the Executive Suite," Harvard Business Review 52:43-56 (Mar.-Apr. 1974).
22. Marsha Jacobson and Walter Koch, "Women as Leaders: Performance Evaluation as a Function of Method of Leader Selection," Organizational Behavior and Human Performance 20:149-57 (June 1977).
23. Holzbach, "Rater Bias," p.580.
24. Arie Lewin and Abram Zwany, "Peer Nominations: A Model, Literature Critique and a Paradigm for Research," Personnel Psychology 29:423-47 (Aug. 1976).
25. Lewin, "Peer Nominations," p.431-35.
26. Schuler, Personnel, p.272-73.
27.
Herbert Heneman and Donald Schwab, "Assessing Outcomes," in Perspectives on Personnel/Human Resource Management, rev. ed. by Herbert Heneman and Donald Schwab (Homewood, Ill.: Irwin, 1982), p.68; Holzbach, "Rater Bias," p.580.
28. Meyer, "Self-Appraisal of Job Performance," Personnel Psychology 33:293 (Summer 1980).
29. Borman, "Effects of Instructions," p.556.
30. Cascio, "Implications," p.223.
31. Nathan Winstanley, "Performance Appraisal: Another Pollution Problem?" Conference Board Record 9:59-63 (Sept. 1972).
32. Dennis Warmke and Robert Billings, "Comparisons of Training Methods for Improving the Psychometric Quality of Experimental and Administrative Performance Ratings," Journal of Applied Psychology 64:129 (Feb. 1979).
33. Latham, "Behavioral Strategies," p.19.
34. Latham, "Behavioral Strategies," p.7.
35. Wade v. Mississippi Cooperative Extension Service, 528 F.2d 416 (7th Cir. 1978).
36. DeCotiis, "An Analysis."
37. Wayne Cascio and Enzo Valenzi, "Behaviorally Anchored Rating Scales: Effects of Education and Job Experience of Raters and Ratees," Journal of Applied Psychology 62:278-82 (Aug. 1977); DeCotiis, "An Analysis"; Gruenfeld, Performance Appraisal, p.13-18; Donald Schwab, Herbert Heneman III, and Thomas DeCotiis, "Behaviorally Anchored Rating Scales: A Review of the Literature," Personnel Psychology 28:549-62 (Winter 1975).
38. Barry Friedman and Edwin Cornelius III, "Effect of Rater Participation in Scale Construction on the Psychometric Characteristics of Two Rating Scale Formats," Journal of Applied Psychology 61:210-16 (May 1976).
39. Latham, "Behavioral Strategies," p.7.
40. Gary Latham, Terence Mitchell, and Dennis Dossett, "Importance of Participative Goal Setting and Anticipated Rewards on Goal Difficulty and Job Performance," Journal of Applied Psychology 63:163-71 (May 1978).
41.
Gary Latham and Edwin Locke, "Goal Setting-A Motivational Tool That Works," in Psychological Foundations of Organizational Behavior, rev. ed. by Barry Staw (Glenview, Ill.: Scott, Foresman, 1983), p.252.
42. Latham, "Importance of Participative Goal Setting," p.169-70; Latham, "Behavioral Strategies," p.20.
43. Frederick Herzberg, "One More Time: How Do You Motivate Employees?" Harvard Business Review 46:61 (Jan.-Feb. 1968); Craig Pinder, "Concerning the Application of Human Motivation Theories in Organizational Settings," Academy of Management Review 2:384-97 (July 1977).
44. Michael Beer, "Performance Appraisal: Dilemmas and Possibilities," Organizational Dynamics 9:28 (Winter 1981); Schuler, Personnel, p.270.
45. Ron Kolgraf, "Psychology and Selling," Industrial Distribution, p.39 (Oct. 1978).
46. Beer, "Performance Appraisal," p.28; Schuler, Personnel, p.270.
47. Beer, "Performance Appraisal," p.27.
48. Ronald Burke, William Weitzel, and Tamara Weir, "Characteristics of Effective Employee Performance Review and Development Interviews: Replication and Extension," Personnel Psychology 31:903-11 (Winter 1978).
49. Schuler, Personnel, p.270.
50. Gruenfeld, Performance Appraisal, p.22-27; Burke, "Characteristics," p.915-16.
51. Mirvis, "Work and Work Force Characteristics."
52. Mirvis, "Work and Work Force Characteristics," p.11.
53. Gruenfeld, Performance Appraisal, p.23; Schuler, Personnel, p.290; N. Winstanley, "Legal and Ethical Issues in Performance Appraisals," Harvard Business Review 58:186-88 (Nov.-Dec. 1980).
54. Jerome R. Ravetz, Scientific Knowledge and Its Social Problems (Oxford, Eng.: Clarendon, 1971), p.377.
55. Lee Dyer and Donald P. Schwab, "Personnel/Human Resource Management Research," in Industrial Relations Research in the 1970s: Review and Appraisal, ed. by Thomas A. Kochan, Daniel J. B. Mitchell, and Lee Dyer (Madison, Wis.: Industrial Relations Research, 1982), p.194.
56. J. Richard Hackman, "Doing Research That Makes a Difference," in Doing Research That Is Useful for Theory and Practice, ed. by Edward E. Lawler III and others (Washington: Jossey-Bass, 1985), p.150.
57. Latham, "Goal Setting," p.45-46.
58. Michael Keeley, "A Contingency Framework for Performance Evaluation," Academy of Management Review 3:398-99 (July 1978).
59. Keeley, "A Contingency Framework," p.403.
60. White, Library Personnel Management.
61. Richard De Gennaro, "Theory vs. Practice in Library Management," Library Journal 108:1318-21 (July 1983).
62. Peter F. Drucker, "Managing the Public Service Institution," College & Research Libraries 37:4-14 (Jan. 1976).
63. Ronald Taft, "The Ability to Judge People," Psychological Bulletin 52:1-23 (Jan. 1955).
64. Ravetz, Scientific Knowledge and Its Social Problems, p.366.
65. Chris Argyris, "Theories of Action That Inhibit Individual Learning," American Psychologist 31:638 (Sept. 1976).
66. Nigel Nicholson and Toby Wall, "Introduction: Themes in the Theory and Practice of Organizational Psychology," in The Theory and Practice of Organizational Psychology: A Collection of Original Essays, ed. by Nigel Nicholson and Toby D. Wall (New York: Academic, 1982), p.12.
67. Edgar H. Schein, Organizational Culture and Leadership (Washington: Jossey-Bass, 1985), p.147.
68. Martin, "Staff Evaluation of Supervisors."
69. Chris Argyris and Donald Schon, Organizational Learning: A Theory of Action Perspective (Reading, Mass.: Addison-Wesley, 1978); Chris Argyris, Reasoning, Learning, and Action: Individual and Organization (Washington: Jossey-Bass, 1982); Donald A. Schon, The Reflective Practitioner (New York: Basic Books, 1982); David A. Kolb, Experiential Learning: Experience as the Source of Learning and Development (Englewood Cliffs, N.J.: Prentice-Hall, 1984).
70.
Chris Argyris, "Research as Action: Usable Knowledge for Understanding and Changing the Status Quo," in The Theory and Practice of Organizational Psychology, p.206.
71. Argyris, "Research as Action," p.200. Emphasis added.
72. Raymond E. Miles, "Human Relations or Human Resources?" in Organizational Psychology: A Book of Readings, ed. by David A. Kolb, Irwin M. Rubin, and James M. McIntyre (Englewood Cliffs, N.J.: Prentice-Hall, 1971), p.238.
73. Miles, "Human Relations or Human Resources?" p.237-38.
74. Frank Landy, "Models of Management: Assumptions of Theorists," in The Theory and Practice of Organizational Psychology, p.103-21; Jerald Hage, "Theory-building," in The Theory and Practice of Organizational Psychology, p.123-47.
75. John Searle, Minds, Brains and Science (Cambridge, Mass.: Harvard Univ. Pr., 1984), p.84.
76. Nicholson, "Introduction," p.15.
77. Nicholson, "Introduction," p.12.
78. Michael Polanyi, Personal Knowledge: Towards a Post-Critical Philosophy (Chicago: Univ. of Chicago Pr., 1972), p.373.
79. Gerald I. Susman and Rodger D. Evered, "An Assessment of the Scientific Merits of Action Research," Administrative Science Quarterly 23:600 (Dec. 1978).
80. Hackman, "Doing Research That Makes a Difference," p.154; Donald A. Schon, "Education for Reflection-in-Action: An Alternative to the Positivist Epistemology of Practice," in The Optimum Utilization of Knowledge: Making Knowledge Serve Human Betterment, ed. by Kenneth E. Boulding and Lawrence Senesh (Boulder, Colo.: Westview Pr., 1983), p.314-22; William R. Ferrell, "Combining Individual Judgments," in Behavioral Decision Making, ed. by George Wright (New York: Plenum Pr., 1985), p.111-45.
81. Robert Lazer, "The 'Discrimination' Danger in Performance Appraisal," Conference Board Record 13:160 (Mar. 1976).
82. Kane, "Performance Appraisal," p.442.
83. Kane, "Performance Appraisal," p.426.
84. Edward E.
Lawler III, Motivation in Work Organizations (Monterey, Calif.: Brooks/Cole, 1973), p.199.
85. Albert Shapero, Managing Professional People: Understanding Creative Performance (New York: Free Pr., 1985), p.90.
86. Edward E. Lawler III, "Challenging Traditional Research Assumptions," in Doing Research That Is Useful for Theory and Practice, p.12.
87. Argyris, "Research as Action," p.201.
88. Nicholson, "Introduction," p.3.
89. Kenneth J. Gergen, "Social Psychology as History," Journal of Personality and Social Psychology 26:309-20 (May 1973).
90. Polanyi, Personal Knowledge, p.264.
91. Schein, Organizational Culture and Leadership, p.147.