College and Research Libraries, March 1991

"Existing Collection Strength" and Shelflist Count Correlations in RLG's Conspectus for Music

William E. McGrath and Nancy B. Nuzzo

"Existing Collection Strength" (ECS) is an estimate of the extent to which subject areas in Research Libraries Group (RLG) collections meet research level objectives on a scale of 0 to 5. ECS can be interpreted as a proxy for shelflist counts. To test that assumption, two main sets of correlations were computed: 1) 17 correlations for 17 RLG libraries across 138 LC class ranges, indicating how well the scale represents collection strength within a library; 2) 138 correlations for 138 LC ranges across 17 libraries, indicating how consistently those ECS estimates were made from library to library. "Within library" correlations ranged from 0.36 to 0.89, all significant, with a confidence interval of the mean (0.66) from 0.55 to 0.74. "Across library" correlations ranged from -0.34 to 0.94, with none significant below 0.47. The mean was 0.54, with a confidence interval of 0.08 to 0.81. As a proxy variable for shelflist within libraries, ECS is expeditious but, necessarily, less than perfect, and is, therefore, a trade-off between convenience and accuracy. However, the wide range of correlations across libraries, from inverse to highly positive, indicates that using ECS to compare libraries is not appropriate, probably because each library interprets and applies the ECS scale differently.

William E. McGrath is Professor in the School of Information and Library Studies and Nancy B. Nuzzo is an Associate Librarian in the Music Library at State University of New York at Buffalo, Buffalo, New York 14260. This study was supported by the Council on Library Resources (CLR Grant 4015). The authors wish to thank Leslie Hume and James Coleman of the Research Libraries Group for their interest and support. We also thank the Council on Library Resources and Deanna Marcum, formerly of CLR, for their support. RLG staff and member libraries do not necessarily endorse these findings; all errors, interpretations and misinterpretations are solely our own.

The Research Libraries Group (RLG) Conspectus is an unusual source of data about library collections.[1] According to Nancy E. Gwinn and Paul H. Mosher, it was intended to help RLG members evaluate and compare their collections and to facilitate coordinated collection development by summarizing existing and future collection strengths.[2] Its large size and widespread recognition by RLG members and others suggest that a close examination of its data would yield interesting insights for collection development and analysis in general.

PURPOSE

A novel concept of the Conspectus is the variable "existing collection strength," or ECS. ECS was intended to be a shortcut for representing collections by assigning numerical values to predefined "levels" of RLG member library holdings. How well ECS represents collections is an important consideration in its continued use. This study addresses that question by examining the data in one Conspectus, that for music libraries.[3] In addition to ECS values, the Conspectus for Music contains shelflist counts of member library collections. These counts can be directly compared or correlated to ECS values.

RLG itself has conducted numerous "verification" studies of its collections. These interesting studies are available from RLG, and have been reported by Paul H. Mosher and Jim Coleman.[4] Our study was not affiliated with those verification studies, though we received complete cooperation from RLG.
Rationale for correlating ECS and shelflist counts can be found on page 2-1 of the RLG Collection Development Manual, which contains the following description of ECS:

1. These values [ECS] describe collections or collection policies absolutely, not relatively. They assume, therefore, a national perspective and a broad cognizance of all facets of collecting.
2. When the value describes existing collection strength, it should relate to national shelflist measurement, reflecting what is actually on the shelves.[5]

We interpret these statements to mean that ECS is intended to mirror library holdings, i.e., to reflect actual shelflist counts.

METHOD

ECS values are expert judgments intended to represent predefined levels of collection strength in each library in specified LC classification ranges. These values, or "levels" as they are called in the Conspectus, range from 0 to 5.[6] The designers of the Conspectus assigned each of these levels a specific definition of collection strength. For the purpose of this analysis, these definitions are irrelevant, except to note that 0 is the minimum and 5 the maximum, and that each value signifies the strength of holdings as judged by collection development librarians.

The ECS 6-point scale can be interpreted as an interval scale variable. An interval scale usually has a fixed range, and the difference between two adjacent scores is the same as that between any other adjacent scores. For example, in the Fahrenheit temperature scale, which ranges from 0 to 100, the difference between 60 and 62 degrees is the same as that between 62 and 64 degrees. With ECS, the difference between level 3 and level 4 is taken to be the same as that between any other adjacent levels, such as 2 and 3 or 4 and 5, without regard to the actual number of holdings the levels are intended to represent.
A ratio-level scale is one in which 0 represents a true absence of value, rather than merely the smallest number, and the largest number has no theoretical maximum. Any two values can be expressed as a ratio. Thus, ECS is not quite a ratio-scale variable because a level of 4, for example, is not twice the level of 2.

An ordinal scale is one in which scores or counts are not implicit in the ranks; one rank is simply lower or higher than another, while ranks have no theoretical maximum. Thus, ECS is probably more than ordinal because shelflist counts are implicit, even though not expressed, and because it also has a maximum permissible value. Interpretation of the data as ordinal would result in a large number of tied ranks, with most of them tied at 2 or 3. Rank order correlation methods do not handle large numbers of ties very efficiently.

ECS values are intended to "reflect what is actually on the shelves," whether 10, 100, or 10,000 volumes. Most elementary statistics texts explain that ratio scales contain more information than interval, interval more than ordinal, and ordinal more than nominal. A collection containing 10,000 volumes is twice that of 5,000, but an ECS of 4 is not twice that of 2. Thus, when existing strength is estimated, much information about actual, highly variable shelflist counts is lost. Of course, that is presumably the reason for using ECS in the first place: shelflist data may be unavailable, or too difficult, expensive, or time-consuming to collect. The important question is, how much information is lost? In this study, that question becomes, how well does the ECS interval scale represent the shelflist ratio scale?

Thus, for the purpose of analysis only, ECS values are treated as interval scale. The danger therein is that the ECS values will be regarded as containing more information than they in fact do.
The question of whether they are truly interval, ordinal, or something else is left unanswered.

Another major statistical question is whether each library interprets the scale values in the same way. For example, is a 3 in one library the same as a 3 in another? The question may be addressed in part by computing cross-library correlations.

If ECS values and shelflist counts agree fairly well, then collection development librarians could feel confident in using ECS. To measure the extent of that agreement, three sets of Pearson product-moment correlations were computed between music library ECS values and their corresponding shelflist counts (SHELFLIST): 1) correlations within libraries, with LC ranges as the units of analysis; 2) correlations across libraries, with libraries as the units of analysis; and 3) correlations with the units of analysis as a composite of LC ranges and libraries. Correlational methods depend a great deal on how the variables are defined (whether they are continuous or dichotomous, nominal, ordinal, interval or ratio) and on the research objective.[7]

Pearson correlations are appropriate for interval-level data but require the data to be normally distributed and to have the same variance. Both ECS and SHELFLIST violate this assumption to some degree. This violation requires that one or both variables undergo some transformation to bring them into line with one another. In this study, a logarithmic transformation of SHELFLIST was made, rendering it fairly normal, prior to computation of correlations. This transformation has the effect of improving the correlations somewhat.

RESULTS

Correlation within Libraries

"Correlation within libraries" tests the hypothesis that the ECS values of a particular library and its corresponding shelflist counts are the same.
Complete shelflist counts of 138 LC music classification ranges were available for 17 music libraries. Here, the unit of analysis is the LC music classification range with a sample size of 138 classes (figure 1). Correlation coefficients for these 17 are tabulated in table 1. All coefficients were computed in the LOTUS 1-2-3 microcomputer spreadsheet program.[8]

The computed coefficients are estimates of the true correlation, which we cannot know precisely, but which can be reported as confidence intervals. Their accuracy depends on the sample size: the larger the sample, the more accurate the coefficient. The best we can say is that 1) the true correlations lie somewhere between the lower and upper limits of the confidence intervals, or that 2) the true correlation must be at least some minimum value to be significant. Table 1 shows confidence intervals as well as the smallest coefficient required to be significant.

Correlations range from 0.36 to 0.89, with a mean of 0.66 and a confidence interval of 0.55 to 0.74. The interval for the computed mean is also shown. The true mean lies somewhere between 0.55 and 0.74, with n = 17. The overall correlation of 0.67 agrees with the mean for libraries, 0.66.

LC Classification Range    ECS    SHELFLIST    Log of SHELFLIST
M1                           3           94    1.97
M2                           4         1920    3.28
M3-3.1                       4          486    2.68
M5                           3           37    1.57
M6-19                        3          810    2.90
M20-39                       4         2114    3.25
-etc.-

Figure 1. Partial tabulation of ECS values, with shelflist counts and their logs within one library, NYBS (SUNY-Buffalo), from the Conspectus for Music, May 1986. Sample size = 138. Correlation coefficients were computed for this library and 16 others.

Random error, sample size, and variables not under a library's control all affect correlations. When confidence intervals for two libraries overlap, it is incorrect to say that one library has done better than another. Their correlation coefficients are unlikely to be significantly different.
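The within-library computation can be sketched with the six rows printed in figure 1. This is a minimal illustration in Python, not the LOTUS 1-2-3 worksheet the authors used; the function name `pearson_r` and the variable names are assumptions made for the example, and base-10 logarithms are assumed because they match the logs printed in the figure. Six rows are far too few to reproduce the coefficient reported for the full 138 ranges, so the result only shows the mechanics.

```python
import math

# Rows as printed in figure 1 (library NYBS): (LC range, ECS, shelflist count).
FIGURE_1_ROWS = [
    ("M1", 3, 94),
    ("M2", 4, 1920),
    ("M3-3.1", 4, 486),
    ("M5", 3, 37),
    ("M6-19", 3, 810),
    ("M20-39", 4, 2114),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


ecs = [row[1] for row in FIGURE_1_ROWS]
# Logarithmic transformation of SHELFLIST (base 10 assumed).
log_shelflist = [math.log10(row[2]) for row in FIGURE_1_ROWS]

r_six_rows = pearson_r(ecs, log_shelflist)
print(f"r over the six printed rows: {r_six_rows:.2f}")
```

A range with a shelflist count of zero has no logarithm; how such ranges were handled is not stated in the article, so the sketch leaves that case out.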
Because our intent was not to compare libraries, they are listed in table 1 alphabetically rather than ranked by correlation coefficients.

Correlation across Libraries by LC Classification Range

Correlation by LC range tests the hypothesis that the ECS values and their corresponding shelflist counts are the same for a particular classification range. In a very important sense, what is being tested is whether each library interprets collection strength in the same way. That is, does "existing collection strength" mean the same thing in each library? Is a value of 3 in one library the same as a value of 3 in another library? High correlations would indicate that libraries tend to interpret ECS values in the same way.

Here the unit of analysis is the library, and the sample size is 17 (figure 2). One hundred thirty-eight correlation coefficients between ECS and SHELFLIST were computed, one for each of the 138 LC classification ranges. These coefficients are tabulated in table 2. They ranged very broadly, from -0.34 to 0.94. It should be noted immediately that a sample of 17 is not large; hence it is not surprising that a substantial number of correlations, about 37%, were not significant. Given the small sample size, only correlations higher than 0.47 can be considered significant at the 95% confidence level.

The small sample size also accounts for the lack of confidence in the mean correlation, which has a broad interval of 0.08 to 0.81. This interval indicates that the true mean could have any positive value between those two extremes, from virtually no correlation to a very high correlation. When only 17 libraries have counted their shelflists, the mean ECS across libraries is not especially reliable. Should the number of libraries counting their shelflists be increased substantially, the mean ECS might then be somewhat more reliable.
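The effect of sample size on such an interval can be sketched as follows. The article does not state how its confidence intervals were computed; the sketch assumes the common textbook approach based on Fisher's z transformation, and the function name `fisher_interval` is hypothetical. Under that assumption, a correlation of 0.54 with n = 17 at the 95% level gives roughly 0.08 to 0.81, close to the interval quoted above, while the same coefficient with n = 138 gives a much narrower range.

```python
import math
from statistics import NormalDist


def fisher_interval(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z.

    Assumed method: the article does not say which procedure it used.
    """
    z = math.atanh(r)                               # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)                     # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low_17, high_17 = fisher_interval(0.54, 17)
low_138, high_138 = fisher_interval(0.54, 138)
print(f"n = 17:  {low_17:.2f} to {high_17:.2f}")
print(f"n = 138: {low_138:.2f} to {high_138:.2f}")
```

The width of the interval depends only on n and the chosen confidence level once r is fixed, which is why adding libraries, rather than refining the estimates of the existing 17, is what would tighten the across-library figures.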
General Correlation between ECS Values and Shelflist Counts Irrespective of Library or LC Classification

Correlations may also be tested without categorizing ECS and shelflist values by library or music classification ranges, and instead may be tested by treating each ECS/shelflist comparison as an observation either from the overall Music Conspectus, from music scores only, or from music literature only. Here, the unit of analysis is neither "Library" nor "LC range." It is, instead, a composite, i.e., "Library/LC range," as in figure 3. Here, the sample is much larger (virtually the entire population of ECS values and shelflist counts) and the confidence much better. Three coefficients were computed, 0.67, 0.71 and 0.64, for the overall Music Conspectus, for Music Scores, and for Music Literature (table 3).

TABLE 1. "WITHIN-LIBRARY" CORRELATION COEFFICIENTS* BETWEEN ECS VALUES AND LOGARITHMS OF SHELFLIST FOR 17 MUSIC LIBRARIES.

Library                    Code    Correlation Coefficient    Confidence Interval
Brigham Young Univ.        UTBG    0.36                       (0.16-0.54)
Colorado State Univ.       COSG    0.49                       (0.30-0.63)
Columbia University        NYCG    0.56                       (0.39-0.69)
Cornell University         NYCX    0.73                       (0.61-0.82)
Dartmouth                  NHDG    0.54                       (0.37-0.68)
Johns Hopkins              MDJG    0.81                       (0.71-0.87)
New York University        NYUG    0.67                       (0.53-0.77)
Northwestern Univ.         ILNG    0.61                       (0.45-0.73)
Notre Dame                 INDG    0.89                       (0.83-0.92)
Rutgers University         NJRG    0.68                       (0.55-0.78)
Stanford University        CSUG    0.46                       (0.26-0.61)
SUNY-Buffalo               NYBS    0.76                       (0.65-0.84)
Temple University          PATG    0.72                       (0.60-0.81)
Univ. Calif.-Berkeley      CUBG    0.63                       (0.47-0.74)
Univ. of Michigan          MIUG    0.77                       (0.66-0.84)
Univ. of Oklahoma          OKUG    0.71                       (0.58-0.81)
Yale University            CtYG    0.77                       (0.66-0.84)
Mean correlation (n=17)            0.66                       (0.55-0.74)

* All are significant at the 99% confidence level for 137 df. (Smallest correlation needed to be significant = 0.23)
Though these correlations may seem moderately high, and though the large sample size may seem to provide high statistical confidence, they should be interpreted with caution for reasons given previously.

DISCUSSION

ECS as a Surrogate Measure

The methods discussed above show that 1) correlations between estimates of collection strength and actual counts of holdings are always somewhat less, and sometimes substantially less, than perfect; 2) correlations vary from library to library and from subject to subject; and 3) confidence in those correlations can be expressed only in terms of a range between some high and some low. The correlational approach measures the extent to which ECS agrees with actual shelflist counts and the statistical confidence we can have in those correlations. Because ECS values are surrogates of shelflist counts, they are necessarily and by definition less reliable than actual counts. Qualitatively, the agreement between ECS and shelflist counts might be considered moderately low to moderately high, about what should be expected for judgmental data.

Library    ECS    SHELFLIST    Log of SHELFLIST
COSG         2           84    1.92
CSUG         3          399    2.60
CTYG         4          692    2.84
CUBG         3          437    2.64
ILNG         3          330    2.52
MDJG                     95    1.98
MIUG         2          443    2.65
NHDG         2          118    2.07
NJRG         3          198    2.30
NYBS         2          296    2.47
NYCG         3          440    2.64
NYCX         4          512    2.71
NYUG         2          218    2.34
OKUG                     68    1.83
PATG         3          202    2.31
UTBG         3          113    2.05

Figure 2. Typical array of ECS, with shelflist counts and their logarithms, for LC range ML2900-3275, Conspectus for Music, May 1986 (see appendix A); 138 correlations between ECS and log of SHELFLIST, with N = 17, were computed from such data.

Within-Library Correlations

The correlation coefficient in the "within-library" test is for only one library at a time. Thus, a coefficient for one library says nothing about that for any other library.

Furthermore, a within-library coefficient explains nothing about the library or its collection.
It is, at best, a statistic describing a tenuous relationship between two ways of measuring the same thing. No inferences can or should be made from it.

Across-Library Correlations

"Across-library" correlations ranged broadly, from moderately negative to nearly perfect. When a large number of statistics such as correlations and means are computed, a very broad range of results is not unusual in many contexts. But how could estimates of collection strength result in negative correlations? It happens this way: one library estimates its strength to be 5 in some LC range, with a shelflist count of 450 volumes; another estimates its strength in the same range to be 3, with a count of 700 volumes. Obviously these numbers have an inverse relationship, a negative correlation, and obviously both libraries have different perceptions of 3 or 5.

TABLE 2. FREQUENCY DISTRIBUTION OF CORRELATION COEFFICIENTS BETWEEN ECS VALUES AND LOGARITHMS OF SHELFLIST FOR 138 LC MUSIC CLASSES ACROSS LIBRARIES

Distribution       Frequency class limits of         Significance (c)
(no. of ranges)    correlation coefficients (a,b)
  1                -0.34 to -0.20                    not sig (95%)
  2                -0.19 to -0.05                    not sig (95%)
  4                -0.04 to  0.08                    not sig (95%)
  9                 0.09 to  0.23                    not sig (95%)
 13                 0.24 to  0.37                    not sig (95%)
 22                 0.38 to  0.51                    not sig (99%)
 35                 0.52 to  0.65                    sig (95%)
 39                 0.66 to  0.79                    sig (99%)
 13                 0.80 to  0.94                    sig (99%)
138                Total

a) These are frequency limits, not confidence intervals.
b) Mean correlation = 0.54, conf. interv. = (0.08-0.81). Median = 0.59.
c) Correlations smaller than 0.47 and 0.59 not significant at the 95% and 99% confidence levels, respectively, df = 15.

In theory, each library estimates the strength of an LC range relative to some absolute strength. In practice, estimates are probably sometimes made relative to perceptions of strength in other libraries.
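The two-library example given above (an ECS of 5 with 450 volumes against an ECS of 3 with 700 volumes) can be checked numerically. The sketch below is illustrative only; `pearson_r` is a hypothetical helper, and base-10 logarithms are assumed to match the figures. With only two observations a correlation is always exactly +1 or -1, so the point is the sign, not the magnitude.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# The two libraries described in the text: (ECS estimate, shelflist count).
library_a = (5, 450)
library_b = (3, 700)

ecs = [library_a[0], library_b[0]]
log_counts = [math.log10(library_a[1]), math.log10(library_b[1])]

# The higher ECS goes with the smaller count, so the sign is negative.
r = pearson_r(ecs, log_counts)
print(f"r = {r:.2f}")
```

Across 17 libraries the same kind of disagreement, repeated for enough pairs, is what pulls a range's coefficient toward or below zero.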
Because different perceptions are inevitable, and because negative correlations are hardly acceptable relationships between estimated and actual collection strengths, cross-library comparisons should not be trusted and are probably not intended by RLG. The very broad range of correlations may substantiate what RLG meant when it said ECS values "describe collections absolutely, not relatively," i.e., not relative to other libraries.

General Correlations: Overall, Music Scores, Literature

The moderately high general correlations for the composite library/LC data are somewhat more reliable than those for "within" and "across" because of the large sample size. They should still be treated with caution, of course, for the same reason that all correlations in this study should be suspect. Because one purpose for making ECS judgments in the first place was to establish a measure for facilitating library cooperation, coefficients having no reference to specific libraries may be academic. Yet, these correlations could be useful as standard indexes, to which all other correlations could be compared. For example, if a new library were to join the Group and wished to evaluate its collection, it would be helpful to know that the overall correlation of 0.67 could be used as a frame of reference.

Questions of Methodology

One question about using logarithmic transformations of SHELFLIST is that ECS is also a transformation of SHELFLIST. ECS is a judgmental transformation from true, essentially continuous, ratio data (SHELFLIST), to perceptual, discrete, interval-level data.
In effect, we are, in a circuitous procedure, correlating SHELFLIST with itself; more specifically, two transformations of shelflist are correlated with each other: 1) ECS, the judgmental transformation of SHELFLIST, and 2) the logarithmic transformation of SHELFLIST. A more satisfactory approach would be to identify or define other behavioral variables to correlate with SHELFLIST. Data for behavioral variables, of course, may be hard to come by, whereas ECS is relatively easy to obtain.

Sources of Error

RLG librarians are the first to point out the sources of statistical error in the Conspectus: the inherent variability of collections, differences in data tabulation, and different procedures for counting, estimating, and reporting. Some of these sources, particularly those concerning the definition of ECS, are noted in the Conspectus itself. Even if these sources were perfectly controlled, random error would still exist. If correlations were perfect or nearly perfect, then both the statistical methods and the statistician would be suspect.

Library/LC range       ECS    SHELFLIST    Log of SHELFLIST
CSUG/ML47-54             3         1064    3.00
CSUG/ML2900-3275         3          399    2.60
CSUG/ML3300-3354         2            3    0.47
CTYG/ML47-54             4          806    2.90
CTYG/ML2900-3275         4          692    2.84
CTYG/ML3300-3354         3            3    0.47
NYCX/ML47-54             4         1568    3.19
NYCX/ML2900-3275         4          512    2.71
NYCX/ML3300-3354         3           18    1.25
-etc.-

Figure 3. Partial tabulation of ECS values, shelflist counts, and their logarithms from two composite library/LC ranges in the Conspectus for Music, May 1986. Data for general test contains all ranges and all libraries, for a total sample of 2,346 (17 x 138).

TABLE 3. ONE-TO-ONE CORRELATIONS BETWEEN ECS AND LOGARITHMS OF SHELFLIST.

                       Correlation    Sample Size (n)
Overall                0.67           2346
Music scores only      0.71            952
Literature only        0.64           1394
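The three units of analysis used in this study can be sketched from the nine composite rows printed in figure 3. The sketch is illustrative only: the helper names `pearson_r` and `grouped_r` are hypothetical, base-10 logarithms are assumed, and nine rows cannot reproduce the coefficients in tables 1 through 3. It shows only how the same records yield a within-library coefficient (grouped by library), an across-library coefficient (grouped by LC range), and a pooled coefficient (no grouping).

```python
import math
from collections import defaultdict

# (library, LC range, ECS, shelflist count) as read from figure 3.
RECORDS = [
    ("CSUG", "ML47-54", 3, 1064),
    ("CSUG", "ML2900-3275", 3, 399),
    ("CSUG", "ML3300-3354", 2, 3),
    ("CTYG", "ML47-54", 4, 806),
    ("CTYG", "ML2900-3275", 4, 692),
    ("CTYG", "ML3300-3354", 3, 3),
    ("NYCX", "ML47-54", 4, 1568),
    ("NYCX", "ML2900-3275", 4, 512),
    ("NYCX", "ML3300-3354", 3, 18),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ecs_vs_log_count(rows):
    """Correlate ECS with log10 of the shelflist count for a set of rows."""
    return pearson_r([row[2] for row in rows],
                     [math.log10(row[3]) for row in rows])


def grouped_r(records, key_index):
    """One coefficient per group; key_index 0 = library, 1 = LC range."""
    groups = defaultdict(list)
    for row in records:
        groups[row[key_index]].append(row)
    return {key: ecs_vs_log_count(rows) for key, rows in groups.items()}


within_library = grouped_r(RECORDS, 0)  # unit of analysis: LC range
across_library = grouped_r(RECORDS, 1)  # unit of analysis: library
pooled = ecs_vs_log_count(RECORDS)      # unit of analysis: library/LC range

for name, value in within_library.items():
    print(f"within {name}: {value:.2f}")
for name, value in across_library.items():
    print(f"across {name}: {value:.2f}")
print(f"pooled: {pooled:.2f}")
```

Grouping by a key and correlating within each group is the only structural difference between the three tests; the variables and the transformation are the same in all of them.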
The Question of Quality

At the start of this study we assumed that ECS was defined and adopted as a surrogate for actual strength and that actual strength, if measurable at all, could only be measured by the number of items in a library's shelflist. We were, of course, very careful not to interpret "strength" as quality, an interpretation which would generate vigorous debate. Some librarians maintain that the number of items in a collection is a sufficient measure of quality, while others say that quality is too nebulous a thing to pin down with counts of anything. Though all would agree that strength, quantity, and quality are different things, we offer no rigorous definitions here.

Many attributes of the collection may affect quality: age, recency, specific titles, current acquisitions, budget, selectivity, accessibility, availability, location, as well as the manner of the collection's organization and administration. Additionally, user impressions, demand, testimonials, reputation of the parent institution, and a host of other internal and external attributes all may have an effect. Whether these attributes do or do not measure quality depends very much on how they are perceived and defined. One thing is certain: the extent to which these attributes enter into the estimates of strength must also affect confidence in the ECS concept as well as trust in shelflist counts.[9]

Shelflist as a Measure of Collection Strength

An alternative way of looking at collection strength is to think of it as a theoretical concept, like Plato's ideal, and to accept that neither ECS nor SHELFLIST can be considered actual strength. Rather, both are estimates of that mystical ideal. Strength would then be the sum of all the significant attributes discussed above plus numerical counts and anything else one might want to throw in. At best, ECS and SHELFLIST alone would then be less than perfect indicators of that strength.
Practically speaking, because there are as many ideals as collection development librarians, and none of them more attainable than another, perhaps the most sensible approach is to think of a shelflist count as a reasonable measure of the number of books and of ECS as a flawed but expeditious substitute for those counts.

Further Research

Further research might involve more detailed investigation into the ordinal and interval properties of ECS as well as alternative ways of estimating collection strength. Sampling shelflists as an alternative to judgment may be a more reliable and a more orthodox procedure. Identification of other variables which could be correlated with shelflist counts is another possible direction for research. If ECS becomes a more widely used method for estimating collection strength, investigations into the statistical concept of validity should be undertaken.

Other issues that require investigation include determining whether low within-library correlations suggest a flaw in Conspectus methodology or hint at the presence of collections with unique strengths, or whether they imply characteristics of the literature that are not evident from shelflist counts. Do low between-library correlations indicate subject areas that require special attention? Researchers need to take up such issues.

REFERENCES AND NOTES

1. "Conspectus Points Way to Cooperative Collection Development," Research Libraries Group News 5:12-14 (September 1984).
2. Nancy E. Gwinn and Paul H. Mosher, "Coordinating Collection Development," College & Research Libraries 44:128-40 (March 1983).
3. Research Libraries Group, RLG Conspectus; Conspectus for Music (Stanford, Calif.: Research Libraries Group, July 1986).
4. Paul Mosher, "The Nature and Uses of the RLG Verification Studies," C&RL News 46 no. 7:336-38 (July/Aug. 1985); Jim Coleman, "Verification Studies: Design and Implementation," C&RL News 46 no. 7:338-40 (July/Aug. 1985).
5. Research Libraries Group, Collection Development Manual, 2d ed. (Stanford, Calif.: Research Libraries Group, unpublished document), p. 2-1.
6. Ibid.
7. Edward G. Carmines and Richard A. Zeller, Reliability and Validity Assessment, Sage University Paper Series on Quantitative Applications in the Social Sciences, No. 17 (Beverly Hills, Calif., and London: Sage Publications, 1979), p. 17-27; Walter Dick and Nancy Hagerty, Topics in Measurement: Reliability and Validity (New York: McGraw-Hill Book Company, 1971), p. 101-28.
8. LOTUS 1-2-3 Computer Programs (Cambridge, Mass.: Lotus Development Corporation).
9. Numerous notes in the Conspectus describing special features, emphases, exceptions, restrictions, qualifications, and other factors which contribute to the erosion of statistical confidence testify to the difficulties in establishing consistent measurement.