College and Research Libraries

Development and Testing of a Reference Transaction Assessment Instrument

Marjorie E. Murfin and Gary M. Gugelchuk

An instrument was designed to assess the reference outcome in terms of patron report of success. It was theorized that patrons were capable of separating reports of success in finding what was wanted from reports of general overall satisfaction, if given a suitably constructed instrument. Utilized in the design were Cronbach's theories on response set and Cuadra's theories on the fail-safe reaction. A survey was conducted. Fifteen academic libraries across the country participated. Their collections ranged from 56,000 to 4 million volumes. A total of 531 transactions with matching patron and librarian forms were analyzed in terms of 35 variables. The reliability of the instrument was found to be .81 overall: .89 for the patron form and .76 for the librarian form. It was also found that patron report of success on factual questions did not differ significantly from results obtained from unobtrusive observation.

Marjorie E. Murfin is Associate Professor in the Information Services Department of Ohio State University Libraries, Columbus, Ohio 43210. Gary M. Gugelchuk is Associate AHEC Program Director and Assistant Professor of Health Professions Education at the College of Osteopathic Medicine of the Pacific, Pomona, California. The authors wish to thank those whose generous support made this project possible. This study was supported first by a grant from the Ohio State University and then by two grants given to Charles Bunge and the first author by the Council on Library Resources. All computer time was furnished by the Ohio State University Instruction and Research Computer Center at no cost to the project.

The primary source of pride and satisfaction for the professional reference librarian is the quality of reference service he or she provides. Although quality reference service is a desired goal, its definition and assessment have been fraught with difficulty. Criteria for reference service quality abound in the literature, and numerous evaluation plans have been attempted to measure and assess quality. These valuable forays into the poorly mapped-out territory of reference service appear to some observers to exemplify the legend of the ten blind men trying to define an elephant by touching only its varied parts: the methodologies of evaluation seem restricted, and the evaluation criteria partial and incomplete.

Before 1960, evaluation of the actual outcome of the reference transaction was not a pressing issue because it was generally believed, on the basis of previous studies, that the outcome was highly satisfactory to reference patrons.[1] Summarizing these studies in 1964, Rothstein stated that "results could hardly be bettered by paying for testimonials."[2] However, in 1970, with the advent of unobtrusive observation as a research methodology, the outcome of the reference transaction began to be seriously questioned. Controlled studies using this method revealed that reference librarian success at answering factual questions averaged around 50%.[3] Reference librarians were hard pressed to reconcile the favorable attitudes of their patrons with this low success rate. In response to this dilemma, there has been an increased interest in research on the nature of reference service, and as a result of this, some important advances have been made.
These include the definition of the reference question and its incorporation into a standard by Emerson and the LAMA Committee on Statistics for Reference Service,[4] the formulation of mutually exclusive categories of questions by Hieber,[5] the work of identifying reference process factors by Jahoda and Braunagel,[6] the considerable methodological contributions of Hallman to subject recording and data collection,[7] the separation of outcomes of the reference transaction by Strong,[8] Howell and others,[9] Tessier and others,[10] Tagliacozzo,[11] and Gothberg,[12] and the work on scale construction for relevance judgments by Cuadra and others.[13]

Guided by these research breakthroughs, the authors have undertaken to design and develop a survey instrument and a data collection protocol that would assess the outcome of the reference transaction and identify factors related to successful and nonsuccessful performance. We also hoped to design this instrument in such a way that it would be practical for widespread use while demonstrating both validity and reliability. As far as the authors can determine, patron response to the traditional reference transaction has never been measured by use of a uniform instrument that (1) is designed to measure more than one or two types of outcome; (2) utilizes carefully constructed scales with clearly demonstrated power of discrimination; (3) has a sufficient number of items to demonstrate reliability; and (4) has been administered in a number of libraries under carefully controlled conditions. These standards are basic to measurement research in other disciplines, yet they have not been consistently utilized in reference research.

Before presenting the design and field-testing results of the reference transaction assessment instrument (RTAI), the research issues involved will be examined. These issues can best be understood in the historical context of evaluation research in reference.

THE LITERATURE AND METHODOLOGY OF REFERENCE EVALUATION

Between 1976 and 1982, growing pressures to evaluate reference service resulted in an outpouring of some 238 publications on assessment of reference service. Ninety-two of these items stressed the need for evaluation, suggested criteria, and proposed ideas on evaluation, while 146 were studies of various aspects of reference service. In addition, some 60 items were written on communication and interaction.[14] The spate of publication has continued to the present day.

One trend apparent in the literature during this period sought the solution to reference service problems in the individual reference librarian.[15] Studies of reference personnel increased tenfold from 1976 through 1982, covering such aspects as personality characteristics, concepts and attitudes, cognitive styles, and a variety of other areas. Also explored was the role of the reference librarian in the communication process.[16]

Another trend since 1975 has been a greater emphasis on outcome studies, with some 29 appearing in the literature. Of these, 55% utilized unobtrusive observation or judgment of outcome by outside experts, 21% librarian judgment only, and 24% judgment by patron/librarian.[17] Studies of the outcome of online searching also began to appear frequently during this period, but in contrast to outcome studies of traditional reference service, some 81% utilized patron judgment, 5% patron/librarian judgment, and 14% focused on measures of precision and recall.[18]
This volume of publication, in the opinion of some experts, has not been associated with substantial progress. In recent years, Altman notes, "there is still no consensus on how to measure, let alone evaluate, reference services."[19] Mary George, a reference head, regrets that there are no widely accepted evaluation methods on which to draw.[20]

Various other reference experts discuss reasons why progress in development of reference evaluation methodology may have been slow. For instance, some claim that not all relevant aspects of the reference transaction have been properly studied. Ellsworth and Joan Mason comment:

. . . no consideration at all [has been paid] to the wide range of other factors that can make or break reference service. . . . Many things besides eye contact or body language speak volumes to library users about the reference department's intentions toward them.[21]

A second problem has been that in-depth knowledge of the measurement process and of the reference process have seldom been found together in a single individual. Cross-disciplinary research is needed but has not taken place frequently enough. Zweizig comments on this, noting that reference managers do not feel that they understand the measurement process and that those who design measures do not understand the reference process.[22]

Unobtrusive Evaluation

In spite of these problems and concerns, some progress has been made and valuable insights gained by research done during this period. Perhaps the greatest single advance has been the development of the method of unobtrusive observation as a means of assessing the outcome of the reference transaction. In unobtrusive observation, proxies present factual test questions at the reference desk by phone or in person. Success is then judged by the percentage of correct answers obtained. Research studies using unobtrusive observation appear to have nearly doubled since 1975,[23] and, in one instance, unobtrusive observation has been adapted for use as a performance measure in a single library.[24] The authors agree with McClure that, although unobtrusive testing is but one approach to the evaluation of reference service, it is a critically essential component.[25]

Unobtrusive observation, however, has certain limitations, as has been noted by various reference experts. Test questions present a problem because they must be of a certain level of difficulty, must be representative, cannot easily be repeated, and take considerable time and effort to formulate. Results are usually not available for some time and are difficult to generalize to the department as a whole.[26] Also, this approach is generally biased toward factual questions and does not assess the full range of questions asked by patrons. Nor does it usually take into account the difficulties of garbled and incoherent questions as often presented by actual patrons. Finally, this method ordinarily admits of only right or wrong answers and generally does not allow for the other less clear-cut outcomes, such as finding answers within an acceptable range or not finding what was sought but instead finding other useful information or materials. For these reasons, practical methods of reference assessment in addition to unobtrusive observation are still needed.
This need for a variety of methods is stressed by Crowley, among others, who notes the advantages of independent multiple approaches to measuring reference capacity.[27] Pierce also supports this when she notes that any library evaluating its reference services will find any method chosen lacking in some respects.[28]

One methodology available for use in assessing reference department quality is the questionnaire survey. This approach is commonly used in sociological research, public opinion polling, and marketing, to name but a few contexts. Over the years there has been a vast improvement in the design of questionnaires to make results more reliable and valid. Thus, a suitably designed questionnaire might also work well for evaluation of the reference transaction.

CRITERIA FOR A REFERENCE ASSESSMENT INSTRUMENT AND DESIGN OF THE RTAI

The literature of reference evaluation suggests a number of desired qualities for an effective reference assessment tool. An effective assessment instrument should

A. Demonstrate validity;
B. Demonstrate reliability;
C. Utilize the natural setting;
D. Assess multiple factors in the reference process; and
E. Include sufficient input, process, and outcome factors so that cause and effect relationships may become apparent.

In addition, the use of a standardized form to collect and report data should

F. Utilize an adequate and unbiased sample;
G. Provide for comparability of data;
H. Provide for timeliness of results; and
I. Provide for interpretability.

Each of these items will be considered below as it relates to the design of our reference transaction assessment instrument (RTAI) and data collection protocol.

Construct Validity and the Influence of Positive Bias

Perhaps the most important quality for any assessment instrument is construct validity, defined as the ability of an instrument to measure the characteristics it is intended to measure. Roloff notes in regard to reference research that "Researchers rarely indicate reliability and validity measures . . . absence of descriptions of such fundamental concepts . . . is distressing."[29]

As mentioned previously, a major question that arises in the area of validity is related to patron report of outcome. Why have so many studies found high levels of patron satisfaction when, on the other hand, unobtrusive observation suggests that only about 50% of reference questions are answered correctly? Many in the profession have hypothesized that patron evaluation is not valid because patrons are biased in a positive direction toward the reference librarian and allow this bias to influence their rating of success in finding information. These opinions are summarized by Young, who says:

The very fact of receiving help or the pleasant demeanor of the reference librarian, rather than accuracy of information, will inevitably influence patron perception of service.[30]

Others express similar concerns, including Rothstein, Zweizig, Pierce, Altman, Kantor, Vathis, Weech, and McClure.[31] These concerns are based on results of past studies that are described by Rothstein and Weech in their reviews of research in reference.[32]
On the other hand, it is important to note that Gothberg, using an instrument in which the outcomes of (1) satisfaction with the interview and (2) satisfaction with the transfer of information were separated, found that satisfaction with service did not influence patrons' rating of the information found.[33] This supports the contention that unrealistically high ratings found in previous studies were due in great part to the use of inadequate instruments and methods to study a complex phenomenon. Other researchers also note imprecision in previous measurement instruments. Certain common types of imprecision may actually encourage positive bias and lead to inflated ratings.

The Too-Broad Question. One example of imprecise measurement is the use of a broad, general satisfaction question not tied to any particular aspect of the reference transaction. That is, the satisfaction question was not linked specifically to the information actually provided. Thus, responses represented only broad, general satisfaction with the entire transaction or experience. Tagliacozzo also points out the ineffectiveness of the broad satisfaction question when she notes, "Especially those questions which elicit general overall judgments of the service provided by the system should be looked at suspiciously."[34]

The Single Question. In addition to the too-broad question, another reason for lack of precision and inflated ratings in previous attempts to measure satisfaction may have been the use of a single question that did not take into account the multiple outcomes of the reference transaction. Tagliacozzo emphasizes this when she comments, "one should be wary of taking the data of a single rating scale to assess the value that an information system has for its users."[35] The importance of this is stressed by Cronbach, who recommends measuring all outcomes that might be worth striving for.[36] With regard to outcomes of the information interview in particular, Auster says that satisfaction is multidimensional. She notes that a set of questions is required, rather than a single question, and that we must identify the distinct aspects and tap the dimensions of satisfaction identified in the literature.[37] Tessier, Crouch, and Atherton (Cochrane) discuss a variety of possible outcomes of the information transaction.[38]

In designing the patron form, particular care was taken to prevent the blurring together of different outcomes in order to increase the likelihood of valid ratings. Separate scales were made for (1) success in finding what was wanted; (2) satisfaction with what was found or not found; (3) satisfaction with helpfulness, courtesy, etc.; and (4) amount learned. Other outcomes, such as communication difficulty, could also be measured by combining separate items. A copy of these forms may be seen in appendix A.

However, even avoiding the above common types of imprecision is probably not sufficient in and of itself to ensure validity of response. Positive bias toward the reference librarian does exist and must be actively controlled through sophisticated design. More specifically, positive bias toward socially desirable types of answers has been known to measurement experts for some time as one type of response set. Mehrens and Lehmann note that steps can be taken to try to control for response set.[39]
Cronbach notes that response set is particularly prevalent on tests that (1) contain ambiguous items; (2) require the individual to respond on a disagree-agree continuum; and (3) lend themselves to responses in a favorable or unfavorable direction.

Best Answer Anchoring. One technique suggested by Cronbach is the best answer choice.[40] This technique has been utilized in the patron scale for finding what was wanted, where six choices of answer (descriptive phrases not necessarily on a continuum) are presented (see appendix B). Using versions of this technique, Cuadra and others found, in studies of anchored versus unanchored scales, that anchoring the midpoint on the scale with a phrase tied to a particular reference point produced lower ratings and was especially effective in producing differentiated responses. Cuadra notes "it might prove especially effective to add additional anchors at each scale extreme, to obtain the desired greater difference effect."[41] Loftus and others also remark on the effectiveness of anchored scales when they note, "if the task required is one of estimation, we can use our knowledge of the effects of anchoring and other heuristics to guide the respondent's judgment."[42]

Specificity of Scale Items. Cuadra and Katter related positive bias on the part of raters to the specificity of scale items. They found that if a rater was uncertain about how to represent the rating on a particular scale, the tendency was to assign a higher rating. A study was done in which some 24 subjects judged the relevance of 12 articles to a certain statement of requirements. These judgments were made under two different conditions: one condition involved a very specific statement of requirements, the other a more general statement of requirements. Using the general statement, the subjects expressed greater uncertainty and tended to give higher relevance ratings. Using the more specific statement, they expressed less uncertainty and tended to give lower relevance ratings. Cuadra terms this the fail-safe reaction and discusses it further.[43]

Number of Categories in Scales. In one study, Cuadra showed that certainty of judgment at the ends of the scale is much greater when there are at least six to eight categories in the scale. Six- to eight-category scales were shown to have superior information-preserving qualities over two- and four-category scales without apparent disadvantages.[44]

In line with these research findings, and to encourage valid responses, the following goals were set for the design of the RTAI:

1. Control possible bias in favor of the reference librarian (as a reward for helpfulness) by effectively separating ratings of service from those of success in finding what was wanted.
2. Prevent blurring of the rating of success with other outcomes by effectively providing separate ratings for (a) success in finding what was wanted, (b) satisfaction with the information/materials found or not found, (c) satisfaction with service (helpfulness, etc.), (d) amount learned, and (e) communication difficulty.
3. Prevent rater uncertainty in judging success by using precise and specific language in items.
4. Prevent rater uncertainty in judging success by use of a scale with six anchored points.

Additional controls for positive bias were applied by establishing a rigorous scoring system. The criterion for complete success required three conditions: (1) that the patron mark "found just what was wanted" on the success scale, (2) that the patron mark yes on the satisfaction scale, and (3) that there be no marks for any of the nine listed reasons for dissatisfaction. Thus, each transaction had to meet a three-point criterion in order to be judged completely successful. Similarly, in regard to the six service items (understood what was wanted, enough help, time, courtesy, clear explanations, knowledge), a partly negative mark by the patron was grouped with fully negative marks as "less than completely satisfactory" on that item. So it can be seen that the standard of judgment was set at a high level.
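To make the scoring rule concrete, here is a minimal sketch of how the three-point criterion might be applied to coded patron responses. The field names and response codes are hypothetical stand-ins for the actual form items (see appendix B), not the instrument's own coding scheme.

```python
def completely_successful(patron_form):
    """Three-point criterion for judging a transaction completely successful.

    `patron_form` is assumed to be a dict such as:
        {"success": "found just what was wanted",  # six-point anchored scale
         "satisfied": "yes",                       # yes / partly / no
         "dissatisfaction_reasons": []}            # any of the 9 listed reasons
    All names and codes here are illustrative only.
    """
    return (patron_form["success"] == "found just what was wanted"  # condition 1
            and patron_form["satisfied"] == "yes"                   # condition 2
            and not patron_form["dissatisfaction_reasons"])         # condition 3

def service_item_fully_satisfactory(mark):
    """A partly negative mark is grouped with the fully negative marks,
    so only an unqualified 'yes' counts as completely satisfactory."""
    return mark == "yes"
```

A transaction must pass all three checks at once, which is what sets the standard of judgment at a high level.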
Reliability

Mehrens and Lehmann note, "In general longer tests are more reliable. Test length influences reliability."[45] The RTAI includes multiple factors on both patron and librarian forms that, it is hoped, will enhance reliability and permit cross-checking of results. Results of reliability tests on the RTAI are treated in detail in the following section on pilot testing.

Utilization of the Natural Setting

Shavit says:

Qualitative research has the natural setting as the direct source of data. . . . Reference service can best be understood when it is observed in the setting in which it occurs.[46]

In the natural setting, patron input is an essential dimension of evaluation of reference service. Cronin and others support this conclusion:

It would only be possible to carry out a proper evaluation exercise by enlisting the cooperation of the user. . . . Few evaluation studies can be conducted without . . . the views and requirements of the end user being taken into account.[47]

However, discussions such as those between Cooper[48] and Soergel[49] argue the issue of whether patron judgment of success is sufficient, in and of itself, as the criterion for success. One argument given against use of the patron's report of success is that it often does not represent a judgment of the best possible information or material that could have been provided, as judged against an external criterion of quality. We believe this to be true, and this dimension of reference evaluation must be left to expert peer review.

On the other hand, patrons judge the success and satisfactoriness of the information or material on how well these meet their own complex pattern of internal and external needs, purposes, and states of knowledge. The work of Cuadra and Katter illustrates this in demonstrating that the patron's purpose or intended use for information strongly influences his or her judgment of relevance, as, for example, self-generated needs for certain materials, following someone else's instructions, prejudices against certain sources or types of materials or against certain styles of presentation, and certain knowledge or lack of knowledge that distorts the interpretation of certain kinds of material.[50]

This suggests that unless we fully understand the patron's state of knowledge, purpose, and intended use for information, we cannot adequately judge how successfully his or her needs have been met. The patron is the only one in a position to know his or her own complex pattern of needs, motivations, abilities, and situational constraints, many of which are not expressed and can never be grasped in their entirety by the reference librarian or intermediary.
Tagliacozzo emphasizes the need for patron judgment of outcomes when she notes, "Perhaps we will never be able to determine what the real information needs of the users are and should therefore content ourselves with what the users believe - and tell us - that they are."[51]

Unless we are unfailingly expert at interviewing and have limitless time at our disposal, we may miss some key aspect or fact that renders our interpretation of need wrong. If the patron appears to be dissatisfied with good material or is satisfied with scanty material, it is just as likely that we have not uncovered some key constraint or requirement as that this response is due to the patron's inability to judge information critically.[52]

Others have, in fact, advocated or used a measure of user satisfaction. These include Pritchard, Auckland, and Castens,[53] and Rzasa and Baker,[54] who considered user satisfaction one component of effectiveness. Lancaster defines effectiveness as "to what extent a service satisfies its user's demands."[55] The Consortium for Public Library Innovation also includes user satisfaction as a measure of effectiveness,[56] as did Strong[57] in his study. Rzasa and Baker note that "the conclusion can be drawn that a measure of effectiveness must adequately reflect satisfaction of user needs."[58]

Multidimensional Nature of the Reference Process

The reference process is complex, consisting of a considerable number of input, process, and outcome factors. Reference experts stress the need to take these multiple factors into account. For example, the interview process alone is recognized in other disciplines as extremely complex: Pittenger, Hockett, and Danehy devote 81 pages of text to analysis of only the first five minutes of the therapeutic interview.[59] Taylor underscores this by noting that the reference interview is one of the most complex acts of human communication, for "in this act, one person tries to describe for another person, not something he knows, but rather something he does not know."[60]

As for the complexity of the process overall, Lancaster identifies at least 55 factors[61] and S. D. Neill identifies some 66 factors that may play a part in the reference process at one time or another.[62] In order for a reference transaction assessment instrument to have content validity, it would need to represent this complexity. Dick and Hagerty define content validity as how well items selected for inclusion in an instrument represent the actual situation being considered.[63] Mehrens and Lehmann note that poor content validity is always deplorable but particularly bad when results are to be used for purposes of accountability.[64]

Analysis of Cause and Effect

Shavit summarizes well when he comments that the question is "not only how accurate is the reference librarian's response . . . but also why is the answer inaccurate?" He adds that we need to know what causes searches to fail.[65] Lancaster, McClure, and Pierce also support the need to identify factors associated with failure.[66]

In order to link cause and effect, it is necessary to include as many input factors as possible in a reference evaluation instrument, because it cannot be known ahead of time with certainty which inputs will later prove to be causal factors. All outcomes should also be included so that complex patterns of cause and effect can be discerned.
If cause-and-effect relationships cannot be discerned, it is often difficult to interpret results, and the likelihood of misinterpretation is increased. Mehrens and Lehmann note that to establish causal relationships, certain variables must be taken into account, including input, surrounding conditions, process, and outcome.[67]

In designing the RTAI, this model was followed, and an attempt was made to represent multiple input, condition, process, and outcome factors. A librarian form was designed based on input and process variables suggested by previous studies and by other reference practitioners and experts[68] and judged by the authors as most likely to affect the outcome of the reference transaction. This method of constructing an instrument, termed common sense or deductive, is supported by Burisch, who says "[there is] no consistent superiority of any strategy in terms of validity or predictive effectiveness . . . deductive scales normally communicate more directly . . . whenever there is a genuine choice, the simple deductive approach is recommended."[69]

Input factors in this model included type of staff handling questions, patron status and discipline, type and subject of question, and importance of question to patron. Process factors included response (directing and suggesting only, or helping with the search), time taken, instruction given, and number and types of sources used. Factors related to surroundings were included in a 17-item checklist of problems. As further support for the selection of these factors, mobility, or helping with the search, has been shown by Gers and Seward to affect the outcome of the reference transaction.[70] A copy of the librarian form may be seen in appendix A.

The authors were aware that the librarian's recording task would be increased to some extent by a librarian form with multiple factors. But it was felt that the greater good would come of being able to link cause and effect, thus tending to prevent harmful misinterpretation of results. Also, inclusion of multiple factors should contribute to reliability and permit cross-checking of results.

Utilization of an Adequate and Unbiased Sample

The sample utilized should be, as Roloff notes, both adequate to justify generalization and selected so as to avoid potential bias.[71] To counteract the possibility of bias, we required that all transactions during the sample period, both reference and directional, be recorded. Directional transactions were clearly defined, and all other questions were to be recorded as reference, in line with the definition as outlined in ANSI Standard Z39.7-1983.[72]

Comparability

Brown notes that "without a common language and common understandings . . . we cannot effectively and efficiently address our common problems. . . . It is becoming less and less possible for the individual library to exist in isolation from others."[73] The RTAI addresses this need for comparability by providing norms. Mehrens and Lehmann argue the merits of norm-referenced instruments when they note that without norms for purposes of comparison, we have no clear idea of the meaning of a person's score in and of itself. Norm-referenced measurement can often assist in decision making and is necessary both for program evaluation and for making predictions.
They note that norms or "group means are always more reliable than individual scores because the random errors of the individual scores tend to cancel themselves out, thus making the means reasonably accurate."[74] The RTAI is a norm-referenced survey instrument designed to provide a profile for an individual library, with data for each item in the profile for (1) all libraries of the same size (and type), (2) the top-scoring library in that size group (three size groups), and (3) all libraries in the sample, based upon use of a standardized form and sampling strategy.

Timeliness of Results

Zweizig notes that situations should be avoided in which data arrive too late to affect decisions.[75] By utilizing the technology of computerized scanning, the RTAI should have a turnaround time of several weeks at the most.

Interpretability

Shavit, following John van Maanen, notes that high technical sophistication often makes research results incomprehensible to all but the highly trained few.[76] Brown notes that "data should be organized into a framework intelligible to both lay and library persons."[77] The RTAI reports all scores in terms of simple percentages. Thus, if a percentage were to be established in the future as a criterion for reference department performance, the RTAI could be used as a criterion-referenced instrument.[78] Mehrens and Lehmann note that the most meaningful test should be related to both norms and criteria.[79] They also note that presenting a number of scores in profile form, as in the RTAI, is a useful aid in interpretation.[80]

In summary, the RTAI was designed over a period of several years, utilizing the first author's own reference work as a basis and in consultation with the second author with regard to measurement and statistical questions. It was hoped that the RTAI would be able to be used by a library for the following purposes:

1. To provide feedback from the patron as to whether his or her particular needs were met.
2. To identify factors associated with success or failure on each individual transaction.
3. To assess quality of service on all types of questions, rather than on factual questions alone.
4. To assess quality of service in the actual reference setting, using the questions asked by a library's own patrons.
5. To provide feedback on a variety of outcomes of the reference transaction, such as success; satisfaction; quality of helpfulness, courtesy, etc.; amount learned; and quality of communication and the interview.
6. To compare other factors, as well as success, with other libraries of the same size and type.
7. To provide feedback to those in charge from librarians on factors that they report as having hindered their efforts on a particular transaction.

PILOT-TESTING THE REFERENCE TRANSACTION ASSESSMENT INSTRUMENT (RTAI)

In order to evaluate the reference transaction assessment instrument and its proposed method of utilization, heads of reference departments across the country were asked to volunteer in a pilot test of the procedure. Of the libraries volunteering to participate in this pilot test, one public and 17 academic libraries were selected in order to vary as much as possible the range of institutional contexts in which the assessment scheme would be used.
These 18 libraries ranged in collection size from 56,000 to 4 million volumes and included both publicly and privately funded institutions located in regions of the country from California to New England. Also included among the 17 academic libraries were one medical school library and one undergraduate library.

Because of the particular natures of the public, undergraduate, and medical libraries, it was decided to eliminate them from the results presented here. Thus, the data presented in the remainder of this paper are restricted to the 15 academic library reference departments contained in our sample and described in table 1.

TABLE 1
CHARACTERISTICS OF LIBRARIES PARTICIPATING IN PILOT TEST

                                Small                Medium                     Large
Collection size                 (under 500,000       (500,000-1,000,000         (over 1,000,000
                                vols.)               vols.)                     vols.)
Publicly funded                 1                    4                          4
Privately funded                3                    1                          2
Total number of transactions    138                  179                        214

Each library received a packet of 50 forms for reference questions and 50 forms for directional questions, plus detailed instructions. Reference forms consisted of a checklist to be marked by the reference staff member for each question and a tear-off sheet to be marked by the patron for the same question. The two forms were machine coded so that librarian and patron responses could be matched after the forms were separated. A copy of the assessment instrument appears as appendix A.

Each library was instructed to administer the forms on a typical day in the Autumn quarter and to record each transaction until all 50 forms were administered. After a request for assistance had been made, the patron was informed that the reference department was conducting a survey and was asked to fill out a brief checklist at a later time and deposit the completed form in a labeled container some distance away. The patron portion of the form was then torn off and given to the patron. The staff member noted the question in the upper right-hand corner of the librarian's form and completed the rest of the items on the form as soon as possible after the transaction had been completed.[81]

Upon completion of the 50 forms, librarian and patron forms were collected and mailed to the Ohio State University. The forms were computer scanned, and data were read onto magnetic tapes. The computer also matched serial numbers from the librarian and patron forms so that both halves of the transaction could be reunited.

Return Rate

In the 15 participating libraries, 88.46% of all forms given out were returned. The number of transactions per library for which both the patron and librarian halves of the form were returned ranged from a low of 80% to a high of 97.67%.
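The serial-number matching described above lends itself to a simple illustration. The sketch below pairs the two halves of each transaction; the record layout (a "serial" key on each scanned record) is a hypothetical stand-in for the machine coding actually used.

```python
def match_forms(librarian_records, patron_records):
    """Reunite the librarian and patron halves of each transaction.

    Each record is assumed to be a dict carrying the machine-coded
    serial number under a "serial" key. Only transactions for which
    both halves were returned are kept (531 in the pilot test).
    """
    patron_by_serial = {rec["serial"]: rec for rec in patron_records}
    return [(lib, patron_by_serial[lib["serial"]])
            for lib in librarian_records
            if lib["serial"] in patron_by_serial]
```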
Reliability

Reliability of measures, the degree to which replication of measurement yields the same result, is a key concern in the formation of any measuring instrument. By definition, the best way to obtain an estimate of the reliability of an instrument is via the test-retest method. Because of the time and costs involved in conducting such a study with the reference transaction assessment instrument, an alternate approach to estimating the instrument's reliability was employed: an analysis of the intercorrelation of multiple, parallel measures.

To this end, the reference transaction assessment instrument was partitioned into a set of 35 variables that appeared to be distinct measures of the quality of library reference service. The definitions of these 35 variables are found in appendix B, along with the method of their quantification. Variables whose manner of quantification was expected to correlate negatively with quality of library reference service were inverted to correlate positively. In addition, examination of the distributions of the variables derived from the 531 responses in our sample revealed that several of them were skewed with respect to a normal distribution (that is, the bell-shaped curve). These variables were transformed to make the shape of their distributions conform more closely to a normal distribution. After these necessary modifications were performed, Pearson zero-order product-moment correlations were computed for each pair of the 35 variables. The resulting correlation matrix appears as figure 1.

[Figure 1. Correlation matrix: Pearson zero-order product-moment correlations among the 35 variables.]

As an initial measure of the instrument's reliability, Cronbach's alpha, a commonly used estimator of reliability, was computed:[82]

    α = Nr / (1 + (N - 1)r),

where r is the average interitem correlation and N is the total number of variables, 35 in our case. The average interitem correlation for the matrix in figure 1 is 0.1128, which yields a value for Cronbach's α of 0.81652.[83]
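This computation can be reproduced from the correlation matrix alone. The sketch below (NumPy assumed) implements the formula above; with N = 35 and an average interitem correlation of 0.1128, it returns approximately 0.8165, matching the reported value.

```python
import numpy as np

def cronbach_alpha_from_corr(corr):
    """Cronbach's alpha for standardized items, from a correlation matrix:
    alpha = N * r_bar / (1 + (N - 1) * r_bar), where r_bar is the mean
    off-diagonal (interitem) correlation and N the number of variables."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    r_bar = corr[~np.eye(n, dtype=bool)].mean()  # average off-diagonal entry
    return n * r_bar / (1 + (n - 1) * r_bar)

# Check against the pilot-test values: N = 35, r_bar = 0.1128
# 35 * 0.1128 / (1 + 34 * 0.1128) = 3.948 / 4.8352 ≈ 0.8165
```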
Although the average interitem correlation is low, the sheer number of variables under consideration here accounts for the large value of Cronbach's α.

Computation of Cronbach's α assumes that the 35 variables are parallel measures, that is, different measures of the same underlying phenomenon. It ignores the fact that two or more distinct dimensions of a complex phenomenon may be measured. If such is the case, Cronbach's α is a poor estimator of the true reliability of the measuring instrument. Close inspection of the correlation matrix (figure 1) suggests that this may be the case for the reference transaction assessment instrument. In some sections of the matrix, correlations are quite small, while in other sections, groups of highly intercorrelated variables can be seen. To explore this possibility in a more formal manner, a cluster analysis of the 35 variables was conducted.

Cluster analysis is one of a number of statistical methods used in the examination of multivariate data sets and is akin to factor analysis, principal components analysis, and canonical correlation analysis. Whereas these other techniques aim to create a new set of variables that represent a mathematically simpler combination of the original set of variables, the results of cluster analysis are groupings of the original variables based upon their overall degree of similarity. Variables in the same cluster are more closely related to each other than to any variable found in a different cluster. In addition, these groups, termed clusters, are nonoverlapping: a variable will be assigned to only one cluster. In the other previously mentioned multivariate techniques, a single variable may be divided up into two or more new variables, making interpretation of the results rather difficult.

There are a number of distinct techniques for performing cluster analyses.[84] The procedure used in this study is a hierarchical, divisive method using the SAS program VARCLUS.[85] The procedure is divisive in that it starts out with the full 35-variable set and begins to divide it into clusters in a step-by-step manner. It first divides the total variable set into a best two-cluster solution. It then examines the two clusters, finds the cluster containing the greater amount of dissimilarity among the variables contained in that cluster, and divides it into two clusters. The result is a three-cluster solution of the variables.
The three clusters are then examined, a four-cluster solution is obtained, and so on. Ultimately, if this procedure were allowed to keep clustering in this manner, it would eventually produce 35 clusters, with one variable in each cluster. However, the subdividing process is stopped when there is statistically little improvement to be gained by further subdivision. In the context of reliability studies, variables contained in the same cluster when the program terminates can be considered sufficiently similar so as to represent parallel measures of an underlying dimension. In addition, the VARCLUS program is a form of oblique principal components analysis, so that the reliability estimator theta (θ), an extension of Cronbach's α to the results of principal components analysis,[86] can be computed for each resulting cluster.

Table 2 lists the 12 clusters produced from the original 35 variables by the procedure described above.

TABLE 2
CLUSTERS PRODUCED FROM ORIGINAL 35 VARIABLES

Cluster 1
  TOTCKSER: Number of times patron checks negative responses on 6 service items (understanding, knowledgeability, courtesy, clear explanations, enough time, enough help)
  TOTCKTH: Number of times patron reports receiving only partly or not enough time and/or help
  ZPR12: Patron reports whether enough time was given (yes, partly, no)
  ZPR8: Patron reports whether enough help was given (yes, partly, no)

Cluster 2
  TOTCKLR3: Number of times librarian checks 17 listed problems (see appendix B)
  TOTCKSPB: Number of times librarian checks 3 searching problems (difficult to think of source, difficult to find subject headings, source is difficult to consult)
  TOTCKQPB: Number of times librarian checks 4 question problems (lacking and misinformation, very recent information wanted, concerned with foreign country/language, difficult citation)

Cluster 3
  ZFND: Patron reports degree to which he/she found what was wanted and was satisfied (found exactly and satisfied, found approximately and satisfied, found approximately and not satisfied, not found and not satisfied)
  PR3: Patron reports degree of satisfaction with what was found or not found (satisfied, partly satisfied, not satisfied)
  TOTCKNEI: Number of times patron checks negative items in regard to quantity of information (found nothing, couldn't find information in source, not enough, not enough in-depth)
  AG: Number of times patron and librarian agree on outcome (agree found, agree partly found, agree not found)
  TOTCKRNS: Number of times patron checks 9 listed reasons for being partly or not satisfied (found nothing, not enough, need more in-depth, not relevant enough, want different viewpoint, couldn't find information in source, not sure if correct)

Cluster 4
  TOTCKUK: Number of times that patron reports that librarian only partly or did not understand, and/or was only partly or not knowledgeable
  ZPR7: Patron reports whether librarian understood what was wanted (yes, partly, no)
  ZPR10: Patron reports whether librarian appeared knowledgeable (yes, partly, no)
  ZPR11: Patron reports whether librarian was courteous (yes, partly, no)

Cluster 5
  ZBD: Librarian reports whether busy and whether directed or helped with search (busy and direct, busy and search, not busy and direct, not busy and search)
  TOTCKAPB: Number of times librarian reports 3 administrative problems (busy, books off shelf, cataloging or technical problem)

Cluster 6
  TOTCKREL: Number of times patron checks negative items in regard to quality of information (not relevant enough, want different viewpoint, not sure information correct, what I asked for but not really what I wanted)
  TOTCKPCD: Number of times patron checks items indicating communication difficulty (what I asked for but not really what I wanted, librarian only partly or did not understand what was wanted, want different viewpoint)
  TOTCKCD: Number of times patron checks items indicating communication difficulty plus number of times librarian reports that patron was difficult to communicate with

Cluster 7
  ZPRS: Patron reports that librarian helped with the search or directed and suggested
  ZLRPS: Librarian and patron report that librarian helped with search, or librarian or patron report that librarian directed and suggested only

Cluster 8
  TOTCKPPB: Number of times librarian reports 5 patron problems (needs extra help, in hurry, wants number of things, returns frequently, difficult to communicate with)
  QT: Librarian reports type of question (4 types, most specific to least specific)
  ZT: Librarian reports time taken in minutes (0-2, 3-5, 5-15, more than 15)
  LLR5NUM: Librarian reports number of sources used (1-5+)
  TOTCKLI: Number of times librarian reports giving 2 types of library instruction (sources, citations, search strategy and/or catalog, computer, holdings, locations)
  TOTCKLRN: Number of times patron reports learning (degree of learning about sources and/or about library in general)

Cluster 9
  ZRT: Librarian reports degree of success in finding what was wanted by patron (found, partly found, not found)
  TOTCKCPB: Number of times librarian reports collection problems (collection weak or out-of-date and/or needs books in another area)

Cluster 10
  TOTCKLEV: Number of times patron checks items indicating difficulty in comprehending and utilizing information (need more simple, too much, explanations not clear)

Cluster 11
  ZPR9: Patron reports whether librarian's explanations were clear (yes, partly, no)
  LRS: Librarian reports whether helped with the search or directed and suggested only

Cluster 12
  ZI: Patron reports degree of importance of transaction to him/her (very important, important, moderately important, somewhat important, not important)

The steps in the formation of this 12-cluster solution are represented in figure 2, where gaps in the column of Xs correspond to separate clusters as produced at each step in the analysis.

[Figure 2. Cluster solution: the step-by-step division of the 35 variables into the 12 clusters of table 2, shown as columns of Xs with gaps marking cluster boundaries at each iteration.]

The results observed in figure 2 are quite revealing. At the two-cluster stage, the analysis almost exactly divides the variables into those derived from the librarian half of the form and those from the patron half. The correlation between the two clusters is 0.04. The only exception is that TOTCKLRN, a patron variable measuring how much the patron learned about the library or reference sources, is contained in the librarian cluster. It should be noted that at the final stage, this variable is clustered with TOTCKLI, a measure of the amount of instruction the librarian gave the patron. The views of both the librarian and patron appear to agree on the issue of instruction.

At the third step in the derivation, the patron variables are divided into a cluster of variables evaluating the services provided by the librarian and a cluster of variables measuring the quality of the information the patron received. The correlation between these two clusters is 0.39. The other intermediate steps in the derivation of the 12-cluster solution are not so easily interpretable.
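The clustering itself was done with SAS VARCLUS, which is not directly reproduced here. As a rough, hypothetical analogue, non-overlapping variable clusters can be obtained by ordinary hierarchical clustering on 1 - r distances (SciPy assumed). This agglomerative sketch is only an approximation of the divisive, oblique principal-components procedure the authors used, so its memberships will generally resemble, but not reproduce, table 2.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_variables(corr, n_clusters=12):
    """Assign each variable to exactly one cluster, grouping variables
    whose pairwise correlations are high (dissimilarity = 1 - r)."""
    corr = np.asarray(corr, dtype=float)
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)  # clear rounding noise on the diagonal
    tree = linkage(squareform(dist, checks=False), method="average")
    # labels[i] is the cluster number (1..n_clusters) of variable i
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Cutting the tree at twelve clusters plays the role of VARCLUS's statistical stopping rule in this sketch.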
In general, however, the variables within each of the 12 clusters are semantically related, suggesting that the results are sufficiently valid for reliability considerations.

Given that the 12 clusters produced from our 35 variables (see table 2) represent distinct submeasurements of the quality of library reference departments, reliability estimates for each cluster can be computed. These are listed in table 3. It should be noted that theta for a single-variable cluster cannot be computed.

TABLE 3
RELIABILITY ESTIMATES FOR CLUSTERS

Cluster    Theta
1          0.90
2          0.62
3          0.89
4          0.76
5          0.87
6          0.88
7          0.79
8          0.62
9          0.46
10         0.93
11
12

Clearly, cluster nine has an extremely low reliability estimate vis-à-vis the other clusters. It combines the presence of collection problems in the area of the patron's question with the librarian's report of whether the information needed was found. Although this is a sensible relationship of the two variables, in that a weak collection in a given area will adversely affect the ability of the librarian to find the information the patron requests, the relationship read the other way around (a strong collection in a given area) is not so directly related to finding the information. A host of factors intervene in going from the patron's question to determining the correct source. It is significant that cluster nine was produced from a larger cluster that contained cluster two, a set of variables consisting of overall problems with the transaction, the presence of searching problems, and the presence of question problems. If clusters two and nine are recombined, the resulting cluster has a reliability estimate (theta) of 0.63.
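Theta for a cluster is computed from the first principal component of that cluster's correlation matrix. One standard formulation of this principal-components reliability coefficient is θ = (p/(p - 1)) * (1 - 1/λ1), where p is the number of variables in the cluster and λ1 the largest eigenvalue of its correlation matrix; since the authors do not print the formula, we assume this is the variant intended. A minimal sketch (NumPy assumed):

```python
import numpy as np

def theta(cluster_corr):
    """Principal-components reliability (theta) for one cluster:
    theta = (p / (p - 1)) * (1 - 1 / lambda_1), with lambda_1 the
    largest eigenvalue of the cluster's correlation matrix."""
    cluster_corr = np.asarray(cluster_corr, dtype=float)
    p = cluster_corr.shape[0]
    if p == 1:
        raise ValueError("theta is undefined for a single-variable cluster")
    lambda_1 = np.linalg.eigvalsh(cluster_corr)[-1]  # eigenvalues ascend
    return (p / (p - 1)) * (1 - 1 / lambda_1)
```

This also shows why table 3 leaves the single-variable clusters blank: with p = 1 the coefficient is undefined.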
Finally, the reliability estimate theta can be computed for intermediary clusters. In particular, at the two-cluster stage, the reliability of the patron-derived variable cluster is 0.89, and that of the librarian-derived cluster is 0.76.

External Validity

In addition to being reliable, measures must be valid. In contrast to reliability, validity of measures concerns the degree to which an instrument measures the concept it is designed to measure. Evidence for the validity of an instrument can come from a number of sources, but external validity, the degree to which results from a measurement instrument correspond to the results from other studies of the same phenomenon, is most commonly utilized in addition to the construct validity issues previously discussed.

Many of the variables assessed by the reference transaction assessment instrument have not been independently examined on a sample of library transactions and so cannot be assessed in terms of external validity. One item, however (the percentage of times the patron found exactly what was wanted), should be comparable to the results of unobtrusive observation by Myers and by Jirjees. In these studies, proxies presented test questions by telephone to reference librarians in academic libraries in the Southeast and Northeast. Myers, in her study of academic reference service in the U.S. Southeast, found that 49.12% of the responses to her list of 14 factual questions were answered correctly.[87] Jirjees, using 35 factual questions, found that 56.6% were answered correctly.[88] In our sample of 15 libraries, patrons report that they received "just what was wanted" on 46.32% of the "facts and statistics" questions. A chi-square test comparing these percentages in the three studies is not statistically significant (table 4). Thus, results of success on factual-type questions as reported by patrons in this study correspond to those obtained by testing reference librarians by unobtrusive observation on a preselected set of reference questions.

TABLE 4
COMPARISON OF RESULTS OF UNOBTRUSIVE OBSERVATION WITH RESULTS OF THIS STUDY

                     Myers    Jirjees    RTAI    Total
Found exactly        236      99         44      379
Not found exactly    244      76         51      371
Total                480      175        95      750

χ² = 3.64, df = 2, p > 0.05
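The test in table 4 can be rerun directly from the cell counts; a minimal sketch (SciPy assumed) follows. Recomputing from these integers gives χ² of about 3.59 rather than the published 3.64, a small difference presumably due to rounding somewhere upstream; either way the result is nonsignificant at 2 degrees of freedom.

```python
from scipy.stats import chi2_contingency

# Rows: found exactly / not found exactly. Columns: Myers, Jirjees, RTAI.
observed = [[236, 99, 44],
            [244, 76, 51]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")
# chi2 ≈ 3.59, df = 2, p ≈ 0.17 (greater than 0.05): the three success
# rates do not differ significantly, consistent with the published result.
```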
A second comparison with external studies involves previously reported data on patron satisfaction with reference department service, as opposed to quality of answer, where approximately 81% and 84% of reference department patrons reported they were satisfied with the "interpersonal" service they received.89 In this study, where patrons were asked to evaluate the quality of the "interpersonal" service they received independently of their evaluation of their satisfaction with the information they obtained, 79.48% of the patrons in the sample reported they were completely satisfied with the service. Although no statistical test of these percentages can be provided, the results of these three studies of patron satisfaction with service are quite similar.

DISCUSSION

For the two variables on the reference transaction assessment instrument for which independent measures exist (i.e., ZFND and TOTCKSER), the results from our pilot study agree with those of other studies. Assessment of external validity for the other variables derived from the RTAI must await independent study. For the complete set of 35 variables, estimated reliability as measures of reference department quality is 0.81, which is quite respectable for a questionnaire. Given the high return rate, the RTAI and its employment as an evaluation tool appear to be a viable means of conducting reference department evaluation.
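The 0.81 figure is Cronbach's alpha (see references 82 and 83), computed over the 448 completely filled-out forms. As a minimal sketch of that computation, assuming scored responses are arranged as a respondents-by-items array (the names below are illustrative assumptions, not the study's code):

    import numpy as np

    def cronbach_alpha(items):
        # items: respondents-by-k array of scored questionnaire variables.
        # alpha = (k/(k-1)) * (1 - sum of item variances / variance of total score)
        k = items.shape[1]
        item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
        total_var = items.sum(axis=1).var(ddof=1)    # variance of summed score
        return (k / (k - 1)) * (1.0 - item_var / total_var)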
Results from the cluster analysis indicate that the 35 variables derived from the RTAI group into 12 clusters, with reliability estimates ranging from 0.46 to 0.93. Further interpretations and inferences from the clustering results must be cautiously drawn, as the clustering technique used in this analysis is designed to be exploratory and, as such, does not serve to confirm or refute a priori hypotheses concerning the relationships among variables. Alternate techniques would be needed to test different models for the relationships among clusters, and a more rigorous sampling strategy would be required for testing hypotheses and generalizing results to the total population of reference transactions.

Two aspects of the clustering solution presented in the reliability section, which were only mentioned in passing, require further explanation and exploration. As was mentioned, at the two-cluster stage the procedure employed in this study divided the 35-variable set into a patron form-derived cluster and a librarian form-derived cluster, with a correlation of 0.04 between these two clusters. This low correlation may surprise the reader, suggesting that there is no statistically significant relationship between patron and librarian judgments. This may not be a valid inference, however, because the librarian form consists of variables that, by and large, measure factors in the situation that are expected to influence librarian performance. The patron form variables largely measure outcomes of the interaction. Although factors that influence outcomes should relate to those same outcomes more strongly than this result indicates, the clustering procedure used here was selected to create clusters that are maximally distinct from one another while, at the same time, placing variables that are as similar as possible into a cluster. The most likely interpretation of our results is that outcome variables, as a group, are more closely related to each other than they are to influencing variables as a group. Alternate analyses would need to be performed to determine the strength of relationship between particular situational factors and particular outcome measures.

At the 3-cluster stage of the 12-cluster final solution, the procedure separates the patron cluster into a service evaluation subcluster and a quality-of-information-received subcluster, with a correlation of 0.39 between these two subclusters. This result may have some bearing on the controversy surrounding the ability of patrons to evaluate reference department quality. Our results indicate that, as judged by the patron, reference librarians satisfactorily respond to the patron's information needs on factual questions approximately 50% of the time. Patrons, however, also appear to distinguish between their satisfaction with the services provided (courtesy, amount of help and time given, understanding and knowledge, clear explanations) and their satisfaction with the information they obtained. With a correlation of 0.39 between the service measures and the information measures, only about 15% of the variance (0.39 squared) in service satisfaction could be explained by the patrons' evaluation of the quality of information received. Although quality of information received is a factor influencing the patron's judgment of the quality of service, it is not the sole determinant. Inferences concerning the patron's ability to judge information critically, based only upon measures of their satisfaction with the services provided, are unwarranted.

These interpretations, nevertheless, require further research; our results are merely suggestive. They do, however, indicate an additional use of the reference transaction assessment instrument as a research tool. By presenting the results of our pilot test and its use as an evaluation tool, it is our hope that other researchers in the area of reference department behavior will be encouraged to further study the issue of what makes a reference department good.

REFERENCES AND NOTES

1. Mary Jo Lynch, "Research in Library Reference/Information Service," Library Trends 12:462 (Winter 1983).
2. Samuel Rothstein, "The Measurement and Evaluation of Reference Service," Library Trends 12:464 (Winter 1983).
3. Alvin M. Schrader, "Performance Standards for Accuracy in Reference and Information Services: The Impact of Unobtrusive Measurement Methodology," The Reference Librarian 11:197-214 (Fall/Winter 1984).
4. Katherine Emerson, "National Reporting on Reference Transactions, 1976-78," RQ 16:199-207 (Spring 1977); American National Standard for Library and Information Sciences and Related Publishing Practices: Library Statistics (New York: American National Standards Institute, 1983) (ANSI Z39.7-1983).
5. Carolyn Hieber, An Analysis of Questions and Answers in Libraries, Studies in the Man-System Interface in Libraries, Report no. 1 (Bethlehem, Pa.: Lehigh University Center for Information Sciences, 1966) (ED 014 995).
6. Gerald Jahoda, The Process of Answering Reference Questions: A Test of a Descriptive Model (Tallahassee, Fla.: Florida State University, School of Library Science, 1977) (ED 136 769); Gerald Jahoda and Judith Braunagel, The Librarian and Reference Queries: A Systematic Approach (New York: Academic Pr., 1980).
7. Clark Hallman, "Designing Optical Mark Forms for Reference Statistics," RQ 20:257-64 (Spring 1981).
8. Gary Strong, "Evaluating the Reference Product," RQ 19:367-72 (Summer 1980).
9. Benita J. Howell, Edward B. Reeves, and John Van Willigen, "Fleeting Encounters-A Role Analysis for Reference Librarian-Patron Interaction," RQ 16:124-29 (Winter 1976).
10. Judith A. Tessier, Wayne W. Crouch, and Pauline Atherton (Cochrane), "New Measures of User Satisfaction with Computer-Based Literature Searches," Special Libraries 68:383-89 (Nov. 1977).
11. Renata Tagliacozzo, "Estimating the Satisfaction of Information Users," Bulletin of the Medical Library Association 65:243-49 (Apr. 1977).
12. Helen Gothberg, "Immediacy: A Study of Communication Effect on the Reference Process," Journal of Academic Librarianship 2:128 (July 1986).
13. Carlos Cuadra, Robert Katter, Emory Holmes, and Everett Wallace, Experimental Studies of Relevance Judgments: Final Report. Volume II. Description of Individual Studies (Santa Monica, Calif.: System Development Corporation, 1967).
14. Marjorie Murfin and Lubomyr Wynar, Reference Service: An Annotated Bibliographic Guide; Supplement 1976-1982 (Littleton, Colo.: Libraries Unlimited, 1984), p.179-215, 9-10, 104-14, 245-47.
15. William F. Young, "Evaluating the Reference Librarian," The Reference Librarian 11:125-26 (Winter 1984).
16. Murfin and Wynar, Supplement 1976-1982, p.193-97, 201-2.
17. Murfin and Wynar, Supplement 1976-1982, p.208-15.
18. Murfin and Wynar, Supplement 1976-1982, p.285-97.
19. Ellen Altman, "Assessment of Reference Services," in The Service Imperative for Libraries: Essays in Honor of Margaret E. Monroe, ed. Gail Schlachter (Littleton, Colo.: Libraries Unlimited, 1982), p.170.
20. Mary W. George, "Questions Sheehy Can't Answer: Reflections on Becoming Head of Reference," The Reference Librarian 3:13 (Spring 1982).
21. Ellsworth Mason and Joan Mason, "The Whole Shebang-Comprehensive Evaluation of Reference Operations," The Reference Librarian 11:26 (Fall/Winter 1984).
22. Douglas Zweizig, "Tailoring Measures to Fit Your Service: A Guide for the Managers of Reference Services," The Reference Librarian 11:53 (Fall/Winter 1984).
23. Murfin and Wynar, Supplement 1976-1982, p.212-15.
24. Linda M. Olson, "Reference Service Evaluation in Medium-Sized Academic Libraries: A Model," Journal of Academic Librarianship 9:322-29 (Jan. 1984).
25. Charles McClure, "Output Measures, Unobtrusive Testing, and Assessing the Quality of Reference Services," The Reference Librarian 11:232 (Fall/Winter 1984).
26. McClure, "Output Measures," p.229; Sydney Pierce, "In Pursuit of the Possible: Evaluating Reference Services," The Reference Librarian 11:16 (Fall/Winter 1984).
27. Terence Crowley, "Referred Reference Questions: How Well Are They Answered?" The Reference Librarian 11:90-91 (Fall/Winter 1984).
28. Pierce, "In Pursuit of the Possible," p.18.
29. Michael Roloff, "Communication at the User-System Interface: A Review of Research," Library Research 1:16 (Spring 1979).
30. Young, "Evaluating the Reference Librarian," p.124.
Rothstein, "The Measurement and Evaluation of Reference Service," p.464; Zweizig, "Tailoring Measures," p.56; Pierce, "In Pursuit of the Possible," p.15-16; Altman, "Assessment of Refer- ence Services," p.175-76; Paul Kantor, "Evaluation of and Feedback in Information Storage and Retrieval Systems," in Annual Review of Information Science and Technology, v.17 (White Plains, N.Y.: Knowledge Industry, 1982), p.114; Alma C. Vathis, "Reference Transaction and End Prod- uct as Viewed by the Patron," RQ 23:61 (Fall1983); Terry Weech, "Who's Giving All Those Wrong Answers? Direct Service and Reference Personnel Evaluation," The Reference Librarian 11:119-20 (Fall/Winter 1984); McClure, "Output Measures," p.231. 32. Rothstein, "The Measurement and Evaluation of Reference Service," p.456-72; Terry Weech, "Evaluation of Adult Reference Service," Library Trends 22:315-35 Oan. 1974). 33. Gothberg, "Immediacy: A Study of Communication Effect on the Reference Process," p.128. 34. Tagliacozzo, "Estimating the Satisfaction of Information Users," p.248. 35. Tagliacozzo, "Estimating the Satisfaction of Information Users," p.243. 36. L. J. Cronbach, "Course Improvement through Evaluation," Teacher's College Record 64:680 (May 1983). 37. Ethel Auster, "User Satisfaction with the Online Negotiation Interview: Contemporary Concern in Traditional Perspective," RQ 23:55 (Fall1983). 38. Tessier, Crouch, and Atherton (Cochrane), "New Measures of User Satisfaction," p.383-89. 39. William Mehrens and Irvin Lehmann, Standardized Tests in Education, 3d ed. (New York: Holt, 1980), p.215. 40. L. J. Cronbach, ''Further Evidence on Response Sets and Test Design,'' Educational and Psychologi- cal Measurement 10, no.1:4-5, 22-23 (1950). 41. Cuadra and others, Experimental Studies, p.75. 42. Elizabeth Loftus, Stephen Fienberg, and Judith Tanur, "Cognitive Psychology Meets the National Survey," American Psychologist 40:178 (Feb. 1985). 43. Carlos Cuadra and Robert V. Katter, "Opening the Black Box of Relevance," Journal of Documenta- tion 23:291, 293 (Dec. 1967). 44. Cuadra and others, Experimental Studies, p.66. 45. Mehrens and Lehman, Standardized Tests, p.61, 53-54. 46. David Shavit, "Qualitative Evaluation of Reference Service," The Reference Librarian 11:239-40 (Fall/Winter 1984). 47. Blaise Cronin, "Performance Measurement and Information Management," Aslib Proceedings 34:231 (May 1982). See also Shavit, "Qualitative Evaluation," p.241; Douglas Zweizig and Brenda Dervin, "Public Library Use, Users, Uses-Advances in Knowledge of the Characteristics and Needs of the Adult Clientele of American Public Libraries," in Advances in Librarianship v.7, ed. Melvin Vogt (New York: Academic, 1977), p.249; P. V. Rzasa and N. R. Baker, "Measures of Effec- tiveness for a University Library," Journal of the American Society for Information Science 23:248 Ouly 1972); Vathis, "Reference Transactions and End Product," p.60; F. W. Lancaster, The Measurement and Evaluation of Library Services (Washington, D.C.: Information Resources, 1977), p.1; Helen Focke, "Library Users of Information and Their Needs," in Present Status and Future Prospects of Reference/Information Service (Chicago: American Library Assn., 1967), p.32. 48. WilliamS. Cooper, "On Selecting a Measure of Retrieval Effectiveness," Journal of the American Society for Information Science 24:87-100 (Mar./ Apr. 1973). 49. Dagobert Soergel, "Is User Satisfaction a Hobgoblin?" Journal of the American Society for Information Science 27:256-59 Ouly/Aug. 1976). . 50. 
50. Cuadra and Katter, "Opening the Black Box of Relevance," p.294.
51. Tagliacozzo, "Estimating the Satisfaction of Information Users," p.243.
52. For arguments in support of this viewpoint, see Cooper, "On Selecting a Measure," p.87-99. Also see Tessier, Crouch, and Atherton (Cochrane), "New Measures of User Satisfaction," p.386.
53. A. Pritchard, M. Auckland, and M. Castens, Library Effectiveness Study (London: City of London Polytechnic, 1973) (ED 089 761).
54. Rzasa and Baker, "Measures of Effectiveness," p.248.
55. Lancaster, The Measurement and Evaluation of Library Services, p.1.
56. Gary E. Strong, "The Consortium for Public Library Innovation," Public Library Quarterly 1:253 (1979).
57. Strong, "Evaluating the Reference Product," p.367-72.
58. Rzasa and Baker, "Measures of Effectiveness," p.248.
59. Robert Pittenger, Charles Hockett, and John Danehy, The First Five Minutes: A Sample of Microscopic Interview Analysis (Ithaca, N.Y.: Martineau, 1960).
60. Robert S. Taylor, "Question-Negotiation and Information Seeking in Libraries," College & Research Libraries 29:180 (May 1968).
61. F. W. Lancaster, "Factors Influencing the Effectiveness of Question-Answering Services in Libraries," The Reference Librarian 11:95-108 (Fall/Winter 1984).
62. S. D. Neill, "The Reference Process and the Philosophy of Karl Popper," RQ 24:314-18 (Spring 1985).
63. Walter Dick and Nancy Hagerty, Topics in Measurement: Reliability and Validity (New York: McGraw-Hill, 1971), p.85.
64. Mehrens and Lehmann, Standardized Tests, p.267.
65. Shavit, "Qualitative Evaluation," p.237, 241-42.
66. Lancaster, "Factors Influencing the Effectiveness of Question-Answering," p.106-7; McClure, "Output Measures," p.218; Pierce, "In Pursuit of the Possible," p.19-20.
67. Mehrens and Lehmann, Standardized Tests, p.266-67.
68. Jahoda, Process of Answering Reference Questions.
69. Matthias Burisch, "Approaches to Personality Inventory Construction: A Comparison of Merits," American Psychologist 39:214 (Mar. 1984).
70. Ralph Gers and Lillie Seward, "Improving Reference Performance," Library Journal 110:34 (Nov. 1, 1985).
71. Roloff, "Communication at the User-System Interface: A Review of Research," p.16.
72. American National Standard for Library and Information Sciences and Related Publishing Practices: Library Statistics (New York: American National Standards Institute, 1983) (ANSI Z39.7-1983).
73. Maryann Kevin Brown, "Library Data, Statistics and Information: Progress toward Comparability," Special Libraries 71:477, 484, 478 (Nov. 1980). See also Marcella Ciucki, "Recording of Reference/Information Service Activities: A Study of Forms Currently Used," RQ 16:273-83 (Summer 1977); Auster, "User Satisfaction," p.56; Katherine Emerson, "Definitions for Planning and Evaluating Reference Service," The Reference Librarian 11:74 (Fall/Winter 1984); Pierce, "In Pursuit of the Possible," p.14.
74. Mehrens and Lehmann, Standardized Tests, p.27, 13, 86, 57.
75. Zweizig, "Tailoring Measures," p.58-59.
76. Shavit, "Qualitative Evaluation," p.235.
77. Brown, "Library Data," p.478.
78. Mehrens and Lehmann, Standardized Tests, p.12-15, 27.
79. Ibid., p.27.
80. Ibid., p.106.
81. The 50-form limit was decided upon for two major reasons. First, the questionnaire form of research is somewhat intrusive; therefore, restricting the intrusion to as small a degree as possible seemed a desirable goal.
Second, a sample of fewer than 30 transactions per library would severely threaten the security of making statistical inferences for a given library. Thus, 30 forms would be the minimally acceptable number of transactions. Not knowing the degree of compliance of both librarians and patrons in filling out and depositing the forms, some number above 30 would be necessary to ensure that a sufficient number of complete transactions would be obtained. A 60% return rate was assumed, thereby yielding the number 50.
82. For a discussion of Cronbach's alpha, see E. G. Carmines and R. A. Zeller, Reliability and Validity Assessment (Beverly Hills, Calif.: Sage Publ., 1979), p.43-46.
83. The value of Cronbach's alpha reported here differs slightly from that published by C. Bunge. Bunge provides preliminary results based on a simplified correlation matrix where all statistically nonsignificant correlations were set equal to zero. The results here are based on the empirically derived correlation matrix given in figure 1. See C. Bunge, "Factors Related to Reference Question Answering Success," RQ 24:483 (Summer 1985).
84. For an overview of various clustering techniques, see M. S. Aldenderfer and R. K. Blashfield, Cluster Analysis (Beverly Hills, Calif.: Sage Publ., 1984).
85. SAS Institute, SAS User's Guide: Statistics, 1982 ed. (Cary, N.C.: SAS Institute, 1982), p.461-62. An N of 448 forms, representing all forms that were completely filled out for all variables, was used in this analysis.
86. Carmines and Zeller, Reliability and Validity, p.60-61.
87. Marcia J. Myers and Jassim M. Jirjees, The Accuracy of Telephone Reference/Information Services in Academic Libraries (Metuchen, N.J.: Scarecrow, 1983), p.60.
88. Myers and Jirjees, p.240.
89. Strong, "Evaluating the Reference Product," p.370; Howell and others, "Fleeting Encounters," p.127.

APPENDIX A: REFERENCE TRANSACTION ASSESSMENT INSTRUMENT

[Patron form. The printed checksheet does not reproduce legibly; its wording is given below as far as it can be recovered, with the checkbox response options abridged.]

The Reference Department is doing a survey of reference use and would appreciate it if you would mark the following brief checksheet. Thank you! (Deposit checksheet UNFOLDED in container on leaving this area or on leaving the library.) THANKS AGAIN FOR YOUR HELP!

1. Did you locate what you asked about at the reference desk? (Recoverable options include "yes," "not what I asked for, but other information or materials that will be helpful," and "yes, but only partially.")
2. If yes, how did you find the information or materials? (Options include "librarian found or helped find," "found on my own," and "didn't follow suggestions but found on my own.")
3. Were you satisfied with the information or materials found or suggested?
4. If not or only partly satisfied, why? MARK ALL THAT APPLY. (Found nothing; too much; want different viewpoint; not enough; need more in-depth; couldn't find information in source; need more simple; not relevant enough; not sure if information given me is correct.)
5. [A patron-status item (sophomore, junior, senior, faculty, staff, alumni, not affiliated with the university, etc.) and a major-or-teaching-field item (business/management, other social sciences, medicine/health, agricultural/biological sciences, technology/engineering, interdisciplinary/other, etc.).]
6. Was the librarian busy (e.g., phone ringing, others waiting)?
7. Did the librarian understand what you wanted?
8. Did you get enough help and explanation?
9. Were the explanations clear?
10. Did the librarian appear knowledgeable about your question?
11. Was the service you received courteous and considerate?
12. Did the librarian give you enough time?
13. Did you learn something about reference sources or use of the library as a result of consulting the reference librarian?
14. Did you become acquainted with any reference sources you hadn't previously known about, as a result of consulting the reference librarian?

[An office-use-only mark-sense area follows on the printed form.]

APPENDIX A. Continued

[Librarian form. Only fragments reproduce legibly: a box to mark if the question was received by phone; boxes recording who answered it (librarian, library assistant, other assistant); space to jot down the question; and item 1, TYPE OF QUESTION ("Select only one category in A-D below that best fits type of answer wanted"), whose recoverable categories include a specific item in a larger publication (article, speech, quote, poem); anything, or a certain type of thing, by a particular author; a general explanation of the catalog, library, or a printed reference source wanted (rather than a short answer); and short factual answers (names, addresses, definitions, statistics, ratings, rankings, etc.).]