Improving Database Vendors' Usage Statistics Reporting through Collaboration between Libraries and Vendors

Wonsik Shim and Charles R. McClure

Wonsik Shim is an Assistant Professor in the School of Information Studies at Florida State University; e-mail: wshim@lis.fsu.edu. Charles R. McClure is the Francis Eppes Professor and Director of the Information Use Management and Policy Institute in the School of Information Studies at Florida State University; e-mail: cmcclure@lis.fsu.edu.

The article reports results from the Association of Research Libraries (ARL) E-Metrics study investigating issues associated with the usage statistics provided by database vendors. The ARL E-Metrics study was a concerted effort by twenty-four ARL libraries to develop and test statistics and measures that describe electronic resources and services in ARL libraries. This article describes a series of activities and investigations, including a meeting with major database vendors and the field-testing of usage statistics from eight major vendors, undertaken to evaluate the degree to which vendor reports are useful for library decision making. Overall, the usage statistics from the vendors studied are easy to obtain and process. However, the standardization of key usage statistics and reporting formats is critical, and validation of reported statistics also remains a critical issue. The article offers a set of recommendations for libraries and calls for continuous collaboration between libraries and major database vendors.

The move to a networked environment has significantly increased the range of services and resources that the library provides its users. The library has become a twenty-four-hour-a-day access point to information services where users obtain services and resources on their own terms and when they want them. Often users do not enter the library physically, nor do they interact directly with library staff. The costs of providing these networked services and resources can be significant. As a result, library managers are seeking ways to measure the use of these digital services and resources.

One result of the networked information provision environment is that libraries increasingly depend on external providers of academic and scholarly information content and services. Recent statistics estimate that in 2000–2001, research libraries spent, on average, 16.25 percent of their materials budgets on electronic resources, a sharp increase from a mere 3.6 percent in 1992–1993.1 This information has traditionally existed in the library as print journal subscriptions, print indexes and abstracts, books, and so on. However, there is a substantial difference in ownership and control between traditional and digital information content. With physical media, the library owned the objects and controlled their use. For example, the library catalog, whether card catalog or online catalog, represented what the library owned and could make available to its users. With electronic media, however, the library is only one of many access points to the information resources and, as a result, has much less control over use.2 The library catalog now includes many pointers to external information sources that, in some cases, may no longer exist when the user tries to access them.
Figure 1 depicts a simplified view of the differences between the traditional library environment and the networked library environment, which is characterized by the Internet as the primary information delivery medium and the growing presence of external electronic information resources and services in the library.

FIGURE 1. Changed Library Environment3

In the traditional library, most library materials were housed in a physical library building, and users typically needed to come to the library to use its materials and services.4 Availability was an important concern because of the physical characteristics of the materials. In the networked library, however, library materials and services increasingly reside outside the physical library building. Libraries now depend, in large measure, on the publishers of electronic journals (e.g., Elsevier's Science Direct and Academic Press's IDEAL), electronic content aggregators (e.g., Ebsco and Gale), and other electronic information providers to meet user demands for resources and services.5 Availability has become less of an issue in the networked library environment because the electronic medium allows several people to use the same material at the same time.6

On an experiential basis, many academic librarians describe the use of their networked information services with terms such as "exponential growth" or "we can't keep up with demand." At the same time, a number of academic libraries have seen stagnant or declining statistics for traditional indicators of library service, such as turnstile counts, in-house reference transactions, and circulation. Librarians need reliable and accurate statistics that will allow them to make good resource allocation decisions (e.g., cost-benefit analysis, contract negotiation, justification of expenditure), meet user needs (e.g., identifying barriers to access, understanding user behaviors), and develop strategic plans (e.g., user education, peer comparison) for the development and operation of electronic services and resources. Although some progress has been made over the past several years, most notably the guidelines produced by the International Coalition of Library Consortia (ICOLC), the provision of usage statistics by electronic content providers is problematic at best.7

This article focuses on the problem of acquiring and using the statistics provided by external, fee-based electronic content providers and describes the work done in the Association of Research Libraries (ARL) E-Metrics Project to standardize usage statistics and to promote dialogue between database vendors and research libraries.8

Previous Work

The growing presence of electronic information resources and networked services has prompted interest and research in developing statistics and measures to describe the emerging information environment. The most relevant work is a manual published by the ALA in 2001.9 Written by John Carlo Bertot, Charles R. McClure, and Joe Ryan, the work is based on Developing Public Library Statistics and Performance Measures for the Networked Environment, a research project funded by the Institute of Museum and Library Services (IMLS).
Intended primarily for public library managers, the manual not only contains step-by-step procedures for collecting some key usage statistics but also presents a set of issues that library administrators need to consider in collecting and using those statistics. Many of the proposed statistics can be easily transferred to an academic library setting.

In an article published in 2000, Carol Tenopir and Eleanor Read offered an example of cross-institutional analysis of database use.10 Using data from a vendor for fifty-seven academic institutions, Tenopir and Read found that, regardless of the type of academic library, user demands are concentrated in a fairly predictable span of time: "early in the week, at midday, in the month when term papers are due." The authors also concluded that, compared with other electronic media such as chat rooms and general Internet resources, students underutilize electronic library databases. At the individual institution level, Deborah D. Blecic, Joan B. Fiscella, and Stephen E. Wiberley Jr. identified ways that libraries can use vendor-supplied usage statistics to understand the scope of use.11 Sally A. Rogers also has provided a good comparison of print and e-journal usage at Ohio State University.12 Recognizing the need for ways to compare patron usage of electronic and print materials, Kathleen Bauer proposed the use of indexes to combine multiple usage indicators of both electronic and print resources.13

A recent compilation by McClure and Bertot provides an overview of a wide array of issues surrounding the evaluation of networked information services in several different contexts, including usage statistics from database vendors.14 As mentioned earlier, the ICOLC guidelines are widely recognized as the de facto standard regarding usage statistics supplied by database vendors.

Finally, the discussion of database vendors' usage statistics is not complete without mentioning two very active mailing lists that deal with the topic: Library License (liblicense-l@lists.yale.edu), hosted by the Yale University Library, and the Electronic Resources in Libraries list (eril-l@listserv.binghamton.edu), hosted by the Binghamton University Library. Although these mailing lists do not cover vendor statistics exclusively, both have carried a considerable number of postings and threads on the topic. The lists also have served as a catalyst for formulating the library community's response to major challenges from database vendors.

The current work focuses on issues related to acquiring, processing, and using vendor usage statistics at research libraries under the ARL E-Metrics Project. It is important to point out that the E-Metrics Project is one of many initiatives working toward standardized, comparable statistics for electronic content and services. However, the ARL E-Metrics Project is unique in that it is a cooperative effort among a large number of research libraries and that it seeks the participation of major database vendors in attempting to find solutions.

ARL E-Metrics Project

Usage statistics in the context of electronic subscription-based databases mainly refer to indicators of the volume of user access to the electronic resources and services available from database vendors. Examples of such indicators are the count of sessions in a specific database, the time per session, the count of searches, and the count of full-text downloads per time period per database. In addition, usage statistics can show a variety of other information, including the success or failure of user access (e.g., turn-aways per time period per specific database), user access methods (e.g., telnet versus browsers), access levels of one institution compared against peer institutions, cost of access (e.g., cost per downloaded item), and other items pertaining to user behavior.
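As a simple illustration of what such indicators could look like in a standardized, machine-readable form, the following Python sketch defines a hypothetical monthly usage record. It is the authors' illustration only: the field names, the example vendor and database, and all figures are invented and do not come from the E-Metrics specifications or any vendor report.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MonthlyUsageRecord:
    """One hypothetical row of a standardized vendor usage report (illustrative only)."""
    vendor: str                    # e.g., an aggregator or publisher name
    database_title: str            # database, journal, or book title
    year: int
    month: int
    sessions: Optional[int]        # None if the vendor does not report the statistic
    searches: Optional[int]
    items_requested: Optional[int] # full text, abstracts, citations, etc.
    turnaways: Optional[int]       # meaningful only under simultaneous-user licenses

# Invented example record; the subscription cost below is likewise assumed.
example = MonthlyUsageRecord(vendor="Example Vendor", database_title="Example Database",
                             year=2001, month=4,
                             sessions=12500, searches=30200,
                             items_requested=8400, turnaways=None)

annual_cost = 25000.00  # assumed annual subscription cost for the illustration
if example.items_requested:
    print(f"Cost per item requested: ${annual_cost / example.items_requested:.2f}")
```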
According to a survey conducted with the participants of the February 2000 ARL Project Planning Session on Usage Measures for Electronic Information Resources, held in Scottsdale, Arizona, the following problems are associated with usage reports from database vendors:15

• Reports do not provide detailed information about usage. For example, many vendors did not provide usage figures by journal or database title.
• Reports are inconsistent. For example, vendors use their own terminologies and do not provide adequate explanations for understanding the reported statistics.
• Reports are not comparable. Because usage reports come in different formats and contain different statistics, it is impossible to compile accurate statistics within the library and to compare them with those of other libraries.

However, the biggest problem with usage reports is that many vendors simply do not provide any data at all.

The ARL E-Metrics Project was a concerted effort by selected members of the research library community to investigate various issues and problems related to collecting and using data on electronic materials and services. The project, which began in April 2000 and finished in December 2001, was funded by a group of twenty-four ARL libraries. Figure 2 identifies the project's participants.

FIGURE 2. ARL E-Metrics Project Participants
University of Alberta
Auburn University
University of Connecticut
University of Illinois-Chicago
University of Maryland-College Park
University of Nebraska-Lincoln
University of Pennsylvania
University of Pittsburgh
University of Southern California
Virginia Polytechnic Institute and State University
University of Wisconsin-Madison
Library of Congress
Arizona State University
University of Chicago
Cornell University
University of Manitoba
University of Massachusetts
University of Notre Dame
Pennsylvania State University
Purdue University
Texas A&M University
University of Western Ontario
Yale University
New York Public Library, The Research Libraries

One of the aims of the E-Metrics Project was to engage in a collaborative effort with selected database vendors to establish an ongoing means of producing selected descriptive statistics on database use, users, and services. A complete project description, project reports, and the data collection manual are available at the ARL E-Metrics Project site at http://www.arl.org/stats/newmeas/emetrics/index.html.

The E-Metrics Project should be viewed in the context of a number of related initiatives, both national and international, that are under way to assist libraries in assessing their networked resources and services. Although these initiatives take different approaches, focus on different types of libraries, and work within various operating environments, they all focus on developing library electronic statistics and performance measures. These efforts include:

• International Coalition of Library Consortia (ICOLC): Since the mid-1990s, this international coalition of libraries, predominantly academic, has been working toward a standard set of definitions for subscription online content.
It published the first guidelines in November 1998 (see http://www.library.yale.edu/consortia/webstats.html) and a revised version in December 2001 (see http://www.library.yale.edu/consortia/2001webstats.htm).
• National Information Standards Organization (NISO): NISO is updating its Z39.7 Library Statistics standard to include statistics and performance measures for networked services and resources. The draft standard was completed in 2002 (see http://www.niso.org/emetrics/current/complete.html).
• National Commission on Libraries and Information Science (NCLIS): Over the years, NCLIS has continued its work in standardizing online database usage statistics and reporting mechanisms. This project largely focuses on the public library environment (see http://www.nclis.gov).
• Institute of Museum and Library Services (IMLS): IMLS sponsored a project to develop national network statistics and performance measures for public libraries. The project resulted in a network statistics manual for public libraries.16
• Project COUNTER (Counting Online Usage of Networked Electronic Resources): COUNTER is supported by a group of publishers, library associations, and other library-related national bodies whose primary aim is to formulate an international code of practice governing the recording and reporting of usage statistics. The release of the first code of practice is expected by the end of 2002 (see http://projectcounter.org).
• National Clearinghouse for Library and Information Center Networked Statistics: Proposed by Charles R. McClure and his associates at the Information Use Management and Policy Institute, Florida State University, the clearinghouse will facilitate the sharing and dissemination of primary data, tools, education, and research regarding statistics of networked resources and services (see http://www.ii.fsu.edu).

One important issue regarding these initiatives is the extent to which they and their sponsoring organizations coordinate with one another. For a host of reasons, including vendor cooperation, library reporting requirements, and library management needs, more coordination and cooperation is necessary across these projects. The authors are involved in a number of the projects mentioned above and, to the extent possible, will cooperate with the other groups.

ARL Meeting with Database Vendors
A meeting with a select group of large database vendors occurred on March 14, 2001, in conjunction with the ACRL annual meeting in Denver. The goal of this meeting was to engage the community of vendors, publishers, and libraries in building consensus for reporting data on the use of vendor databases and to promote an understanding of what can and cannot be done regarding the provision of data from the vendor community. The meeting served as a discussion forum for:

• sharing information about the development and standardization of selected statistics that describe users and uses of databases;
• reaching agreement on the important data elements and definitions;
• engaging vendors in a test of the data elements being designed;
• understanding the issues that affect vendor-supplied statistics describing database use and users;
• developing a process by which the library community and the vendor community can work together in developing and standardizing a core set of statistics.

A total of nine vendors attended the meeting, as shown in figure 3.

FIGURE 3. Database Vendors Attending the ARL Meeting
Elsevier/ScienceDirect; netLibrary; OCLC/FirstSearch; JSTOR; ProQuest; Ovid; Lexis-Nexis; Gale Group; EBSCO

During the meeting, both the vendor and the library representatives agreed that the reported statistics should be based on the ICOLC guidelines. It was noted that the market is increasingly diversified in terms of business models, content provided by vendors, and other factors. Accordingly, developing a standardized set of statistics that covers all of these will continue to be a challenge.

Everyone agreed that technologies and technology changes have a lot to do with what statistics can be collected and reported, and how. For instance, Z39.50 clients do not allow statistics to be collected. Solutions, such as digital certificates, also are technology based. However, in most cases, the costs of buying and implementing these technologies may outweigh the benefit of producing more reliable and detailed data. It also appears that different vendors use different counting mechanisms. As a result, the compiled statistics have limited reliability and validity. Additional investigation into these and related questions is needed.

Overall, the meeting was very useful in that it brought libraries and vendors together and established a dialogue.17 The meeting also was a necessary first step for the upcoming field-testing of proposed statistics developed by the E-Metrics study team. As a result of the meeting, all of the vendors present agreed to participate in the vendor statistics field-testing.

Vendor Statistics Field-testing

The primary goal of the field-testing was to assess usage statistics from major database vendors in terms of the comparability of statistics and their definitions, the breakdown of data, and report formats.

Methodologies

Invitations were sent to several vendors, including those that participated in the ARL meeting. All the vendors contacted, twelve in all, agreed to participate in the field-testing. The invitation explained the goals and objectives of the field-testing and provided a brief summary of the expected deliverables from each participating vendor.

A set of field-testing guidelines was developed and an electronic copy distributed to the vendors. In addition, project participants (libraries) were contacted and their participation in the field-testing was solicited.
Because not all field-testing libraries subscribed to all of the services, three or four vendors were assigned to each library based on its subscription matrix. The intent was to alleviate the burden on the libraries of evaluating too many vendor reports. In addition, from the vendors' standpoint, it seemed to make sense to concentrate on a few libraries rather than on all of the libraries subscribing to their services.

The guidelines asked specifically for four deliverables from each vendor:
1. a monthly report (April 2001) in a standardized text format (specific guidelines were given for data elements and their arrangement);
2. a detailed, step-by-step description of the process used to collect the statistics, including the rules and assumptions applied in the process;
3. a monthly (April 2001) raw data log file;
4. issues and suggestions related to providing usage statistics.

The vendors were asked to send the field-testing data to their assigned libraries and to the authors at Florida State University by the last week of May 2001. A separate evaluation questionnaire was developed and distributed to the field-testing libraries.

Field-test Findings

Vendor statistics change constantly and can therefore be considered a moving target. The information presented here is for illustration purposes only and may not correctly reflect the current practices and offerings of the database vendors mentioned in this report.

A total of eight vendors participated in the field-testing. Table 1 shows the data formats in which the field-test reports were provided by the vendors and the availability of documentation received from the vendors regarding the definitions of the statistics provided and how the data were collected, filtered, and aggregated.

TABLE 1. Vendor Statistics Field-testing Participation
Vendor | Data Format | Availability of Documentation
Academic Press | txt, Excel | n.a.
ProQuest | Excel, txt, PDF | Yes
Ebsco | txt | Yes
Gale Group | csv | Yes
Lexis-Nexis | zip (csv), Word, txt | Yes
NetLibrary | zip (txt), csv | Yes
Science Direct | txt | Yes
SilverPlatter | csv | n.a.
n.a.: Not available from the vendor during the field-testing.

The majority of the vendors investigated provided usage reports in a text format as well as other formats. Compared with the results of the vendor statistics analysis conducted earlier in the E-Metrics Project, the evidence indicates that vendors have made good efforts, especially in making documentation available.18 Many vendors simply did not have any documentation about usage statistics at all when the authors initially analyzed their reports. However, much of the vendors' documentation did not provide enough detail about the definitions of the reported statistics to aid in understanding them.

Table 2 shows the key ICOLC statistics included in each vendor's field-testing report. It is important to understand that no attempt has been made to validate compliance with the ICOLC guidelines.

TABLE 2. Key ICOLC Statistics Included in the Vendor Reports (by vendor)
Vendor | Items Requested | Searches | Sessions | Turn-aways
Academic Press/IDEAL | Full text, reference, abstract, table of contents | Yes | Yes | n/a
ProQuest | Full text, abstract, citation | Yes | No | n/a
Ebsco | Full text, abstract | Yes | Yes | n/a
Gale Group | Full text, citation, abstract, hits, views, print station | Yes | Yes | Yes
Lexis-Nexis | Full text, document retrievals | Yes | No | n/a
NetLibrary | Page view, browse, checkout, dictionary use | Yes | Yes | Yes
Science Direct | Full text, abstract | Yes | Yes | n/a
SilverPlatter | Full text, abstract | Yes | Yes | Yes
n/a: Not applicable

Beyond the ICOLC guidelines, there are many instances in which the same statistics from different vendors are not equal measures. An obvious example is how vendors apply time-out parameters to compute session counts. Vendor documentation indicated a wide range of time-outs (e.g., Gale, 6 minutes; Ebsco, 10 minutes; Science Direct, 30 minutes).
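To make the effect of the time-out parameter concrete, the following sketch counts sessions from one sequence of time-stamped requests under different inactivity time-outs. It is the authors' illustration, not any vendor's actual counting algorithm, and the request times are invented.

```python
from datetime import datetime, timedelta

def count_sessions(timestamps, timeout_minutes):
    """Count sessions for one user/IP: a new session begins whenever the gap
    between consecutive requests exceeds the inactivity time-out."""
    times = sorted(timestamps)
    if not times:
        return 0
    sessions = 1
    for prev, curr in zip(times, times[1:]):
        if curr - prev > timedelta(minutes=timeout_minutes):
            sessions += 1
    return sessions

# Invented request times for a single user on one day (illustration only).
requests = [datetime(2001, 4, 2, 10, 0), datetime(2001, 4, 2, 10, 7),
            datetime(2001, 4, 2, 10, 20), datetime(2001, 4, 2, 10, 48)]

for timeout in (6, 10, 30):   # e.g., the Gale, Ebsco, and Science Direct time-outs
    print(timeout, "min time-out ->", count_sessions(requests, timeout), "sessions")
# The same activity yields 4, 3, and 1 session(s), respectively.
```

The identical stream of user activity thus produces very different "session" totals depending solely on the time-out each vendor chooses.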
A more serious problem results from the fact that vendors use different methodologies to count the same user activity. As a result, even the most seemingly simple statistics, such as searches and items requested, might not be directly comparable. Is a search against multiple database packages (as in the case of Gale or Ebsco) counted as a single search or as a separate search for each database chosen? Is browsing a secondary database such as an author, subject, or journal list counted as a search or as a menu selection? Does the vendor take into consideration multiple requests for the same document in a short time period (say, less than 10 seconds) and treat them as one request or as multiple requests? Is clicking the "next" button to retrieve the next set of results counted as a separate search? The list of questions goes on and on. The answers to all of these questions can significantly inflate or deflate the reported usage counts. Furthermore, what happens if a vendor changes its counting methodology and does not disclose it?

There is widespread suspicion among librarians that even identically labeled statistics are counted differently. A close examination of the vendor documentation provided suggests that the suspicion is not unfounded. Indeed, the answers to the above-mentioned questions differ among vendors. Another important problem is that most vendors do not provide detailed information to libraries, making it difficult for librarians to determine whether two comparable statistics from two different vendors refer to the same thing and can be compared accordingly. All of these issues seriously undermine the usefulness of usage statistics and threaten the validity of the data. (The issue of validity is addressed later in this article.)

Because the types of content available through vendors are increasingly diverse and the terms referring to information items have not been fully standardized, cross-comparison of the items-requested statistic can be difficult. For example, netLibrary, which has a growing presence in research libraries, does not lend itself easily to the kinds of statistics with which we are now familiar. This presents a challenge if libraries try to aggregate the total number of items accessed for cross-vendor comparison or to gauge the total amount of information transferred from the licensed materials available at their institutions.

The turn-away statistic has been useful in determining whether to increase the number of simultaneous user licenses. However, the statistic applies only to those vendors that impose such a restriction.
Table 2 shows that, of the eight vendors, only three have simultaneous user limits, and all three report the turn-away measure.

Table 3 shows a breakdown of the reported statistics according to the ICOLC-recommended categories; it also lists other breakdown categories that the vendors reported. It appears that vendors, in general, satisfied the title-level (journal, database, or book) breakdown requirement. The IP (Internet protocol) breakdown requirement also was generally respected, but in all cases the statistics were lumped at the subnet level (a group of IP addresses) rather than reported for individual IP addresses. The individual-address tabulation might not belong in summary statistics anyway, because it can be made available in log files. Unfortunately, most vendors were unable to furnish log data files because of technical and legal concerns. Half the vendors currently provide some time-related breakdowns.

TABLE 3. Breakdown of Statistics in the Vendor Reports (by vendor)
Vendor | By Journal or Database Title | IP | Time/Day
Academic Press/IDEAL | Journal title | Yes | No
ProQuest | Database title, journal title | Yes | Time
Ebsco | Database title | Yes | No
Gale Group | Database title, journal title | No | Time, Day
Lexis-Nexis | Database title | Yes | Time, Day
NetLibrary | Book title | Yes | Time, Day
Science Direct | Journal title | Yes | No
SilverPlatter | Database title | No | No
Other breakdown categories reported by individual vendors: client ID; group and profile ID; subscribed versus nonsubscribed; peak time and duration.

Libraries' Evaluation of Vendor Reports

Overall, libraries reported that the data files were easy to read and process. The majority of libraries used Microsoft Excel to import and display the data files. In one case, a vendor sent part of the data files in PDF format, which forced the recipient libraries to enter the numbers manually. The results show that libraries prefer data formats, notably text formats, that can be easily imported into data analysis programs such as Excel and Lotus 1-2-3 without extra time and effort spent manipulating or entering the data.

Although all participating libraries at least opened the data files, only a few attempted to analyze the data. There seemed to be several reasons why libraries were hesitant about in-depth analysis. One library commented that it did not test the data because they were summary data and not the raw data the library expected from the field-testing. The following comment from another library also explains why libraries have not done further analysis: "We currently place raw vendor statistics on our staff intranet and do not compile them for comparison purposes, as we have yet to define what statistics and what format would best suit our institutional needs for such a compilation."

At least one library reported specifically how it processed the field-testing data. For each vendor analyzed, the library compared the session counts from the library redirect page (all requests to external vendor databases pass through a Web page that counts how many times different databases are accessed) with those in the vendor report. This produced, for each database, a rough idea of what portion of attempted log-ins (sessions) originated from people who bypass the library database Web page. The library also calculated the estimated cost per article viewed and the distribution of articles viewed by title, which confirmed that 25 percent of the titles account for 80 percent of the articles viewed for the particular database.
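A minimal sketch of that kind of analysis follows. It uses invented title-level view counts and an assumed subscription cost rather than any vendor's actual data; with real vendor figures, the same loop would show whether a handful of titles accounts for most use, as in the 25 percent/80 percent pattern reported above.

```python
# Illustrative only: invented per-title article-view counts and an assumed annual cost.
annual_cost = 24000.00                      # hypothetical subscription cost for one database
views_by_title = {"Title A": 4100, "Title B": 2600, "Title C": 900,
                  "Title D": 250, "Title E": 100, "Title F": 50}

total_views = sum(views_by_title.values())
print(f"Estimated cost per article viewed: ${annual_cost / total_views:.2f}")

# How concentrated is use? Cumulative share of views covered by the top titles.
ranked = sorted(views_by_title.values(), reverse=True)
cumulative = 0
for rank, views in enumerate(ranked, start=1):
    cumulative += views
    share_titles = rank / len(ranked)
    share_views = cumulative / total_views
    print(f"top {share_titles:4.0%} of titles -> {share_views:5.1%} of article views")
```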
The field-testing instructions provided guidelines on essential data elements, data arrangement, and file format. Contrary to the authors' expectations, all of the vendors simply repackaged their monthly usage reports and submitted them to the libraries. Therefore, the only practical difference between the field-testing report and the report that libraries would normally access from the vendor Web site was that libraries received the data files directly from the vendors instead of retrieving them from the vendor Web sites. Several libraries appreciated the fact that they could receive data files in text format, which is much easier to handle than, say, HTML format. Another minor difference was the availability of data definitions and descriptions of statistics collection processes from some of the participating vendors. In some cases, this was the first time such explanations were available to the libraries. Typically, documents containing definitions of statistics and other background information, if they are available at all, are provided on the vendors' Web sites.

Even when the sets of data were available from vendors, it was difficult for the libraries to make valid comparisons of the data because of insufficient descriptions of data definitions and limited explanation of how the data sets were collected and summarized. Many libraries feared that, without explanatory information on what each data element in the vendor reports meant and how the counts were filtered, such a comparison would have been faulty at best. This suggests that until there is a satisfactory degree of assurance that the statistics provided by the different vendors, based on the documentation they provide, are consistent enough for cross-comparison, libraries will not commit major resources to compiling vendor data into a standardized format or repository.

Another problem with comparing data from multiple vendors was the inconsistent data formats. The task of combining data fields and adjusting data arrangements from even three or four vendors proved to be extremely time-consuming. What libraries want is a standardized usage report containing common data elements, arranged in a predetermined, agreed-upon order, and provided separately from vendor-specific data elements or additional data. Even the different placement of field headings, in a column or in a row, requires special handling by the libraries.
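As an illustration of the kind of special handling involved, the following sketch normalizes two differently arranged vendor reports, one with metrics in columns and one with metrics in rows, into a single common layout. The file layouts, field names, and figures are invented for the example and do not correspond to any particular vendor.

```python
import csv
import io

# Hypothetical Vendor A: one row per database, metrics in columns (invented data).
vendor_a = """database,sessions,searches,items_requested
Database One,1200,3400,950
Database Two,800,2100,600
"""

# Hypothetical Vendor B: metrics in rows, databases in columns (invented data).
vendor_b = """metric,Database Three,Database Four
Sessions,450,300
Searches,1700,900
Full-text requests,520,210
"""

def normalize_a(text, vendor):
    rows = []
    for r in csv.DictReader(io.StringIO(text)):
        rows.append({"vendor": vendor, "database": r["database"],
                     "sessions": int(r["sessions"]), "searches": int(r["searches"]),
                     "items_requested": int(r["items_requested"])})
    return rows

def normalize_b(text, vendor):
    table = list(csv.reader(io.StringIO(text)))
    header, metric_rows = table[0], table[1:]
    # Map this vendor's labels onto the common data element names.
    label_map = {"Sessions": "sessions", "Searches": "searches",
                 "Full-text requests": "items_requested"}
    rows = []
    for col, database in enumerate(header[1:], start=1):
        record = {"vendor": vendor, "database": database}
        for row in metric_rows:
            record[label_map[row[0]]] = int(row[col])
        rows.append(record)
    return rows

combined = normalize_a(vendor_a, "Vendor A") + normalize_b(vendor_b, "Vendor B")
for record in combined:
    print(record)
```

Each new vendor layout requires its own small adapter like normalize_b, which is precisely the recurring effort the field-testing libraries described.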
The majority of respondents said that the data provided by these vendors are "necessary and valuable." They liked the fact that the data are "very straightforward and easy to use" and, more important, that they provide some indication of the extent to which subscription-based services are being utilized. Of course, the relative value depends on the quality of the data and the importance of the database to the library (e.g., the amount of money the library spends on a particular database compared with the other databases to which it subscribes).

Although the majority of libraries believed that the usage reports provided by individual vendors are useful, some questioned the cumulative value of usage reports combined across vendors. Given that typical ARL libraries deal with several dozen database vendors, normalizing the data, in their current forms, from all of these vendors would require a prohibitive amount of effort.

Usage reports deal almost exclusively with the specific use of vendor databases in terms of frequencies (e.g., searches and sessions), duration (e.g., connection time), and amount of information transferred (e.g., items requested), while largely ignoring another dimension that many libraries consider very important: information about user behavior. The current usage metrics provide information about user behavior to a degree, but not at the level many libraries would hope for. To be useful, information about user behavior would need to be correlated with individual user profiles. But the current environment for database access, which is heavily rooted in IP-based authentication, does not permit the kind of data collection that libraries expect. Although there is a desire to receive more detailed information about user behaviors, it conflicts with current practices and with libraries' concern about user privacy.

Optimally, vendors would provide an option that allows libraries to access raw data log files that contain sufficient information for useful analysis, follow standardized definitions, and are collected consistently over time. Unfortunately, many vendors were unable to provide log data files because of technical, legal, or other concerns.

Because the field-testing dealt with only one month's data (April 2001), it is difficult to know whether what was collected is typical. However, the authors have not heard from the field-testing libraries of any unusual discrepancy between the field-testing data and the data they received before the field-testing. The authors realize that simply comparing data from the same vendors will not provide a satisfactory answer to the problem of collecting accurate, reliable, and standardized data.

During the course of writing this report, the authors came across an e-mail message from a major database vendor acknowledging errors in its usage reports. This suggests that libraries are not in a good position to know exactly what goes into the vendor reports. Some unusual numbers or patterns are relatively easy to identify, but consistent under- or overcounts are harder to detect.

The authors believe that the data provided by the vendors studied are easy to obtain and manipulate. Most vendors offer several data formats, including text (e.g., comma-separated files) and spreadsheet formats (e.g., MS Excel), in addition to standard HTML for easy viewing in Web browsers. Also, many vendors offer an ad hoc report generation facility with which libraries can customize the fields and set the time periods they want to examine. However, processing reports from multiple vendors may become a burden on libraries in terms of time and staff effort because the formats and data arrangements vary considerably from vendor to vendor.

Dealing with vendor usage reports raises a number of other issues. The market for electronic content providers is becoming more diverse and complicated, and the types of statistics that best serve libraries in this changing environment need to be considered. Companies such as netLibrary did not even exist when the ICOLC guidelines were first drafted. A related issue is the effect of the mega-mergers taking place in the electronic content providers' market and how these mergers will affect statistical reporting.
For the most part, libraries have relied on the ICOLC guidelines as the de facto standard for usage statistics for licensed materials. Indeed, the guidelines brought the issue of usage statistics into full view for many practicing librarians and database vendors. Although most vendors included in the study claimed a high level of compliance with the guidelines, some librarians remain skeptical, citing the differences in the way statistics are collected by different vendors (e.g., different time-outs) and the lack of concrete documentation. The ICOLC guidelines are concerned mainly with defining basic usage statistics and do not contain detailed information that can be used to validate whether vendor reports adhere to the standard. In addition, the library community may have different opinions about how statistics should be counted. What level of specificity are we pursuing in the standardized reports? And who is going to ensure that a vendor report meets the accepted standard?

The validity of usage statistics is a critical issue that needs to be addressed seriously. First, there should be more detailed information with which to analyze the validity of the statistics reported by database vendors. The current documentation, albeit improved, is simply not adequate. In this regard, Project COUNTER is an important initiative because it attempts to define an agreed-upon code of practice. All related parties need to work together to draw up the specifics of the practices to dispel the persistent suspicion that even the same statistics are counted differently. For this to happen, libraries, publishers, and aggregators need to continue a healthy dialogue regarding their expectations. Vendors need to be more forthcoming in the discussion and better describe what they do, and how they do it, in usage reporting. Because practitioners themselves sometimes do not agree on what is valid, the library community needs to determine what a valid metric is. Establishment of the National Clearinghouse for Library and Information Center Networked Statistics described earlier can help formulate consensus among practitioners. Finally, an external validation service or organization could be considered as part of the solution. Such a validating service would enforce compliance with industry standards and monitor actual use. The authors mention this simply as a long-term possibility and suggest that it be examined thoroughly before being put forward for implementation.

This study has not dealt with issues related to usage reporting in consortial arrangements. As such arrangements become increasingly common in research libraries, librarians will need to make sure that the individual members of a consortium receive the same level of usage statistics for their institutions as they would under individual site-licensing agreements.

Usage statistics currently provided by vendors give useful information regarding the utilization of external subscription-based information services. Libraries use the data for a variety of purposes: tracking usage trends over time, justifying expenditures, analyzing costs, modifying service provision, and so on. Related to the issue of the value of the data is the trustworthiness (reliability) of the data. And, as discussed earlier, there also is some concern about the lack of user-related information in usage statistics.
Recommendations

Based on the findings of this study, the authors make several suggestions that may be useful for ARL libraries (and perhaps other libraries) to consider in dealing specifically with vendor statistics, including:

• Focusing data analysis on high-impact databases: Libraries should not treat all databases equally when it comes to data analysis. Because of inconsistencies in data elements and report delivery, it is difficult to normalize usage statistics from all vendors that report data. Instead, libraries need to investigate the usage patterns of "major" databases, whatever those might be locally, and the ways that improvements can be made in access to and use of materials.

• Collecting locally obtainable data for external databases: Although libraries need to depend on database vendors for usage statistics, they have several ways (e.g., through redirect page counters for licensed databases or through proxy server logs) to capture at least partial information on user access to the external databases (e.g., attempted log-ins). This kind of internal data helps libraries spot-check the reliability of vendor-supplied usage statistics. Moreover, because the data will be under the control of libraries, they will be more consistent than measures reported by different vendors.

• Keeping track of aggregate key statistics and using them: Libraries often need gross figures of user access to external licensed databases for various internal and external reporting. The aggregate numbers are good indicators of overall trends in user demand for, and access to, external databases. It is important to keep some level of consistency in the way the gross figures are calculated and reported. One way to maintain consistency is to gather data from the same pool of database vendors or database titles over a specified period of time (e.g., "Total searches conducted in existing licensed databases grew by 20% in 2000, to 1,200,000, as compared to the 1999 total of 1,000,000 searches. The data are based on the same thirty-five vendors that report the statistic."). A sketch of this kind of calculation appears after this list.

• Validating reliability: The library community needs to consider concrete ways (e.g., third-party validation) to ensure consistent and reliable reporting from vendors.

• Demanding documentation: Libraries should demand better documentation of the data collection and filtering processes from the various vendors. Such documentation should describe how the sets of data are collected and defined and discuss any issues or concerns in the reporting of these data to libraries.

• Organizing the library for data collection, analysis, and working with the vendors: Many libraries simply lack adequate staff, or the staff members lack adequate knowledge and training, to work effectively with the statistics and information that some of the vendors can supply. Library staff need to understand the statistics and know how to manipulate the files and how to organize and report such data. In addition, the library needs to be able to commit organizational resources to working with and using such vendor statistics.
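The following is a minimal sketch of the consistent-pool calculation described above. The per-vendor search counts and vendor names are invented; the point is simply that the year-over-year comparison is restricted to the vendors that report the statistic in both years.

```python
# Invented per-vendor annual search counts; keys are placeholder vendor names.
searches_1999 = {"Vendor A": 400_000, "Vendor B": 350_000, "Vendor C": 250_000}
searches_2000 = {"Vendor A": 480_000, "Vendor B": 420_000, "Vendor C": 300_000,
                 "Vendor D": 150_000}   # Vendor D began reporting only in 2000

# Restrict the comparison to the same pool of vendors reporting in both years.
common = searches_1999.keys() & searches_2000.keys()
total_1999 = sum(searches_1999[v] for v in common)
total_2000 = sum(searches_2000[v] for v in common)
growth = (total_2000 - total_1999) / total_1999

print(f"Vendors in the comparison pool: {len(common)}")
print(f"1999 total: {total_1999:,}  2000 total: {total_2000:,}  growth: {growth:.0%}")
```

Holding the vendor pool constant is what makes a figure like the 20 percent growth in the example above meaningful from one year to the next.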
The use of different system parameters (e.g., time-outs), the application of different assumptions about user behavior (e.g., how to treat or count multiple clicks on the same document within a session), and the lack of adequate explanation in vendor documentation regarding specific definitions and data collection and filtering processes all contribute to the reporting problem. The comprehensive standardization of usage statistics and data delivery methods (e.g., file format and data arrangement) cannot be achieved easily in the short term. These are long-term goals toward which vendors and libraries need to work together, and the ARL community should continue to make progress in this area by working among themselves and with the database vendor community. In the meantime, the authors recommend that comparisons be limited to data from the same vendors or to data that are known to be collected, defined, and reported similarly.

The authors strongly recommend that vendors report standardized usage statistics, such as those recommended by the ICOLC and those defined in the final manual that resulted from the project.19 These should appear in standardized column and row arrangements, with a separate report containing any additional vendor-specific data.

Continuing the Momentum

ARL libraries have needed consistent, comparable, easy-to-use, and useful usage statistics from content providers (database vendors) ever since they embraced the notion of maintaining statistics on the use of external licensed materials. The ARL E-Metrics Project provided an opportunity for the ARL community to look at the issues and problems related to vendor usage reporting in a more systematic way and to begin working toward developing more useful reports. However, much more work remains in this area.

Members of the study team found that some library staff had little knowledge about the vendor statistics, had limited training in manipulating and analyzing the reported data, and were quite surprised that such evaluation and manipulation of data required special training and knowledge. Some libraries were not organized for ongoing data collection and analysis of vendor statistics: it was unclear who was responsible for such efforts and whether resources were available to support them. And finally, most libraries simply had no management information system (even in the most basic sense of the term) for organizing, analyzing, and reporting such data. The study team found that, in general, libraries were not prepared to commit the necessary resources, staff time, training, and effort to the evaluation.20

Thus, one difficulty in some of the discussions with the vendors was a lack of knowledge and skills on the part of the librarians in using and analyzing the data. The fact of the matter is that both the library community and the vendor community have much to learn about how best to define, collect, analyze, report, and validate such statistics.
For their part, many libraries simply do not have a culture of evaluation that supports the assessment effort needed to use vendor-based statistics successfully.21 Organizational development, staff training, and strategic planning for exploiting such data will be key components in moving forward in this area.

Several organizations in the library and vendor communities, including national and international bodies, are currently working in this area. Although these initiatives do not overlap exactly in goals and scope, there is a danger that they may result in conflicting reporting requirements. Specific ways to coordinate them and encourage cooperation have yet to be developed. Indeed, the number and range of organizations interested in developing standardized statistics is significant.

From the vendors' point of view, it is impossible to respond to multiple and conflicting requests for data from the library community. As one vendor commented, until the library community can decide how best it wants the data defined, collected, validated, and reported, vendors cannot provide endless responses to requests for "customized" data sets. Thus, to some degree, the members of the library community must continue to work among themselves to reach agreement regarding these standards.

In addition, different types of libraries (academic, school, public, special, etc.) need to reconsider the degree to which their needs are truly unique to their particular settings. A "full-text download" is not going to vary across types of library. Librarians who argue that they have unique or special data needs simply reinforce the view of some vendors that it is impossible to provide multiple data types, defined differently, for different libraries, and little progress will be made on standardizing these statistics. The members of the library community must work together in the development of such standards, definitions, and reporting.

Both vendors and the library community need to realize that the development, testing, refinement, and standardization of vendor-based statistics is an ongoing process. Given the changes in technology, database structures, and other factors, the life span of these statistics may be short compared with that of more traditional library statistics. Thus, obtaining longitudinal data from vendors may be difficult, and there will be a need to be much more pragmatic about the availability and use of vendor-based statistics. Establishment of the National Clearinghouse for Library and Information Center Networked Statistics (http://www.ii.fsu.edu) at Florida State University's Information Use Management and Policy Institute will provide a coordinating role in the collection, use, and analysis of network data sources including, but not limited to, database vendor statistics. The clearinghouse will facilitate cross-fertilization so that the various efforts to date can build on one another and integrate activities for meaningful library assessment in support of decision making and analysis.

One important accomplishment during the project was the initiation of conversations and cooperation with major database vendors. Currently, library leadership in this area is diffuse and lacks coordination.
Work needs to continue, especially on the standardization of key usage statistics, data delivery, and better documentation of definitions and reporting procedures. An ongoing, more formalized mechanism is essential to ensure that such meetings take place, that progress is made, and that better standards for vendor-based statistics are developed.

Notes

1. Association of Research Libraries, ARL Supplementary Statistics 2000–2001 (Washington, D.C.: ARL, 2002). Available online from the ARL Web site.
2. It is quite possible that users may not even realize that the library evaluated the electronic sources and negotiated the licensing contracts. Most electronic databases validate legitimate user log-ins by examining the originating IP (Internet protocol) addresses included in the requests. Users can bypass the library Web site when they access external electronic databases as long as they use computers carrying legitimate IP addresses. Remote users connecting through Internet service providers (ISPs) may need to use a proxy server to access electronic databases; the proxy allows them to use the institution's IP addresses.
3. The arrows denote the movement of information content and the location of user access. In the depiction of the traditional library, materials reside on library premises and users come to the library to use the collection. In the networked library, on the other hand, not all of the library collection resides within the library's physical boundaries. Also, part of user access occurs outside the library, as in the case of access to most subscription-based, electronically licensed materials.
4. These include the off-site storage facilities libraries use to reduce the cost of warehousing less frequently requested materials.
5. The authors are not suggesting here that only external, licensed materials are of concern to scholars and libraries. They acknowledge that other freely available or community-based services are heavily used by academic users.
6. There can be arbitrary limitations, such as simultaneous log-in limits, but they do not originate from the characteristics of the electronic format.
7. International Coalition of Library Consortia, "Guidelines for Statistical Measures of Usage of Web-Based Information Resources" (ICOLC, 2001; revised December 2001). Available online from http://www.library.yale.edu/consortia/2001webstats.htm.
8. There is a large and diverse number of electronic content providers in the market, and it is difficult to describe them collectively. The term "database vendors" is used in this article to denote various content providers such as traditional journal publishers providing electronic counterparts, aggregators of full-text journals and reference databases (e.g., JSTOR, Project Muse, Ebsco, Gale, ProQuest), electronic book providers (e.g., Questia, netLibrary), and so on.
9. John Carlo Bertot, Charles R. McClure, and Joe Ryan, Statistics and Performance Measures for Public Library Networked Services (Chicago: ALA, 2001).
10. Carol Tenopir and Eleanor Read, "Patterns of Use and Usage Factors for Online Databases in Academic Libraries," College & Research Libraries 61 (May 2000): 234–46.
11. Deborah D. Blecic, Joan B. Fiscella, and Stephen E. Wiberley Jr., "The Measurement of Use of Web-based Information Resources: An Early Look at Vendor-supplied Data," College & Research Libraries 62 (Sept. 2001): 434–53.
12. Sally A. Rogers, "Electronic Journal Usage at Ohio State University," College & Research Libraries 62 (Jan. 2001): 25–34.
13. Kathleen Bauer, "Indexes as Tools for Measuring Usage of Print and Electronic Resources," College & Research Libraries 62 (Jan. 2001): 36–42.
14. Charles R. McClure and John Carlo Bertot, eds., Evaluating Networked Information Services: Techniques, Policy, and Issues (Medford, N.J.: American Society for Information Science and Technology, 2001).
15. The meeting served as the planning session for the project that later became known as the ARL E-Metrics Project. Thirty-five ARL institutions were represented at the meeting. A survey questionnaire was sent out before the meeting, and twenty-one libraries responded. The survey contained four open-ended questions on data needs for electronic information resources, the status of data collection at participating libraries, and expectations for the meeting. The summary presentation given during the meeting (in Microsoft PowerPoint format) is available online from http://www.arl.org/stats/newmeas/scottsdale/jeffshim/.
16. Bertot, McClure, and Ryan, Statistics and Performance Measures for Public Library Networked Services.
17. Additional meetings between vendors and other members of the library community regarding statistics occurred at the ALA 2001 and 2002 midwinter meetings. These meetings also have attempted to better coordinate and validate the vendor-based statistics being reported to libraries. The National Commission on Libraries and Information Science has sponsored these meetings; summaries, as well as other related reports, are available online from http://www.nclis.gov/statsurv/statsurv.html.
18. An interim Phase I report describing current practices of participating ARL member libraries related to network statistics and measures was issued November 7, 2000, and is available online from http://www.arl.org/stats/newmeas/emetrics/index.html.
19. Wonsik Shim, Charles R. McClure, Bruce T. Fraser, and John Carlo Bertot, Data Collection Manual for Academic and Research Library Network Statistics and Performance Measures (Tallahassee, Fla.: Information Use Management and Policy Institute, Dec. 2001); also available online from the ARL at http://www.arl.org/stats/newmeas/emetrics/index.html.
20. As part of the E-Metrics Project, the study team produced three PowerPoint presentations covering preparing the organization for evaluation, the importance of evaluation and the use of vendor statistics, and an overview of and introduction to the recommended statistics. These presentations are available from the ARL in Washington, D.C., and from http://www.arl.org/stats/newmeas/emetrics/index.html.
21. Amos Lakos, "The Missing Ingredient: Culture of Assessment in Libraries," Performance Measurement and Metrics: The International Journal for Library and Information Services 1, no. 1: 3–7.