
Improving Database Vendors’ Usage
Statistics Reporting through
Collaboration between Libraries
and Vendors

Wonsik Shim and Charles R. McClure

Wonsik Shim is an Assistant Professor in the School of Information Studies at Florida State University; e-
mail: wshim@lis.fsu.edu. Charles R. McClure is the Francis Eppes Professor and Director of the Informa-
tion Use Management and Policy Institute in the School of Information Studies at Florida State Univer-
sity; e-mail: cmcclure@lis.fsu.edu.

The article reports the results from the Association of Research Libraries
(ARL) E-Metrics study to investigate issues associated with the usage sta-
tistics provided by database vendors. The ARL E-Metrics study was a con-
certed effort by twenty-four ARL libraries to develop and test statistics and
measures in order to describe electronic resources and services in ARL
libraries. This article describes a series of activities and investigations that
included a meeting with major database vendors and the field-testing of
usage statistics from eight major vendors to evaluate the degree to which
the reports are useful for library decision-making. Overall, the usage statis-
tics from the vendors studied are easy to obtain and process. However, the
standardization of key usage statistics and reporting format is critical. Vali-
dation of reported statistics also remains a critical issue. This article offers a
set of recommendations for libraries and calls for continuous collaboration
between libraries and major database vendors.

The move to a networked environment
has significantly increased the range of
services and resources that the library
provides its users. The library has become
a twenty-four-hour-a-day access point to
information services where users obtain
services and resources on their terms and
when they want such services. Often us-
ers do not enter the library physically nor
do they interact directly with library staff.
The costs of providing these networked
services and resources can be significant.
As a result, library managers are seeking
ways to measure the use of these digital
services and resources.

One of the results of the networked
information provision environment is
that libraries increasingly depend on ex-
ternal providers of academic and schol-
arly information content and services.
Recent statistics estimate that in 2000–
2001, research libraries spent, on average,
16.25 percent of their materials budget on
electronic resources, a sharp increase from
a mere 3.6 percent in 1992–1993.1 This in-
formation has traditionally existed in the
library as subscription print journals,
print indexes and abstracts, books, and
so on. However, there is a big difference
in terms of ownership and control be-
tween traditional information contents
and digital information contents.

500  College & Research Libraries November 2002

With physical media, the library
owned the objects and controlled their
use. For example, the library catalog—be
it card catalog or online catalog—repre-
sented what the library owned and could
make available to its users. But with elec-
tronic media, the library is only one of
many access points to the information
resources. As a result, the library has
much less control over use.2 The library
catalog now includes many pointers to
external information sources that, in some
cases, may no longer exist when the user
tries to access them.

Figure 1 depicts a simplified view of
the differences between the traditional
library environment and the networked
library environment characterized by the
Internet as the primary information de-
livery medium and the growing presence
of external electronic information re-
sources and services in the library.

FIGURE 1
Changed Library Environment3

In the traditional library, most library
materials were housed in a physical li-
brary building, and users typically
needed to come to the library to use its
materials and services.4 Availability was
an important concern because of the
physical characteristics of the materials.
In the networked library, however, library
materials and services increasingly reside
outside the physical library building. Li-
braries now depend, in large measure, on
the publishers of electronic journals (e.g.,
Elsevier’s Science Direct and Academic
Press’s IDEAL), electronic content
aggregators (e.g., Ebsco and Gale), and
other electronic information providers to
meet user demands for resources and ser-
vices.5 Availability has become less an is-
sue in the networked library environment
because the electronic medium allows
several people to use the same material
at the same time.6

On an experiential basis, many aca-
demic librarians describe the use of their
networked information services with
terms such as “exponential growth” or
“we can’t keep up with demand.” At the
same time, a number of academic librar-
ies have seen stagnant or declining sta-
tistics of traditional indicators of library
service, such as turnstile counts, in-house
reference transactions, and circulation.


Librarians need reliable and accurate sta-
tistics that will allow them to make good
resource allocation decisions (e.g., cost-
benefit analysis, contract negotiation, jus-
tification of expenditure), meet user needs
(e.g., identifying barriers to access, under-
standing user behaviors), and develop
strategic plans (e.g., user education, peer
comparison) for the development and
operation of electronic services and re-
sources.

Although some progress has been
made over the past several years, most
notably, the guidelines produced by the
International Coalition of Library Consor-
tia (ICOLC), the provision of usage sta-
tistics by electronic content providers is
problematic at best.7

This article focuses on the problem of
acquiring and using the statistics pro-
vided by external, fee-based electronic
content providers and describes the work
done in the Association of Research Li-
braries (ARL) E-Metrics Project to stan-
dardize the usage statistics and promote
dialogue among database vendors and re-
search libraries.8

Previous Work
The growing presence of electronic infor-
mation resources and networked services
prompted interest and research in devel-
oping statistics and measures to describe
the emerging information environment.
The most relevant work is a manual pub-
lished by the ALA in 2001.9 Written by
John Carlo Bertot, Charles R. McClure,
and Joe Ryan, the work is based on De-
veloping Public Library Statistics and
Performance Measures for the Net-
worked Environment, a research project
funded by the Institute of Museum and
Library Services (IMLS). Intended prima-
rily for public library managers, the
manual not only contains step-by-step
procedures to collect some key usage sta-
tistics but also provides a set of issues that
library administrators need to consider in
collecting and using those statistics. Many
of the proposed statistics can be easily
transferred into an academic library set-
ting.

In an article published in 2000, Carol
Tenopir and Eleanor Read offered an ex-
ample of cross-institutional analysis of
database use.10 Using data from a vendor
for fifty-seven academic institutions,
Tenopir and Read found that regardless
of the type of academic library, user de-
mands are concentrated on a fairly pre-
dictable span of time—“early in the week,
at midday, in the month when term pa-
pers are due.” The authors also concluded
that, compared with other electronic me-
dia such as chat rooms and general
Internet resources, students underutilize
electronic library databases. At the indi-
vidual institution level, Deborah D. Blecic,
Joan B. Fiscella, and Stephen E. Wiberley
Jr. identified ways that libraries can use
vendor-supplied usage statistics to under-
stand the scope of use.11 Sally A. Rogers
also has provided a good comparison of
print and e-journal usage at Ohio State
University.12 Recognizing the need for
ways to compare patron usage of elec-
tronic and print materials, Kathleen Bauer
proposed the use of indexes to combine
multiple usage indicators of both elec-
tronic and print resources.13

A recent compilation by McClure and
Bertot has provided an overview of a
wide array of issues surrounding the
evaluation of networked information ser-
vices in several different contexts, includ-
ing usage statistics from database ven-
dors.14 As mentioned earlier, the ICOLC
guidelines are widely recognized as the
de facto standard regarding usage statis-
tics supplied by database vendors.

Finally, the discussion of usage statis-
tics of database vendors is not complete
without mentioning two very active mail-
ing lists that deal with the topic: Library
License (liblicense-l@lists.yale.edu),
hosted by the Yale University Library; and
the Electronic Resources in Libraries list
(eril-l@listserv.binghamton.edu), hosted
by the Binghamton University Library.
Although these mailing lists do not cover
vendor statistics exclusively, there have
been a considerable number of postings
and threads on both regarding the topic.
These mailing lists also have been used




as a catalyst to formulate the library
community’s response to major chal-
lenges from database vendors.

The current work focuses on issues re-
lated to acquiring, processing, and using
vendor usage statistics at research librar-
ies under the ARL E-Metrics Project. It is
important to point out that the E-Metrics
Project is one of many initiatives that are
working toward establishing standard-
ized, comparable statistics for electronic
contents and services. However, the ARL
E-Metrics Project is unique in that it is a
cooperative effort among a large number
of research libraries and that it seeks the
participation of major database vendors
in attempting to find solutions.

ARL E-Metrics Project
Usage statistics in the context of electronic
subscription–based databases mainly re-
fer to the indicators of the volume of user
access to the electronic resources and ser-
vices available from database vendors.
Examples of those indicators are a count
of sessions in a specific database, the time
per session in a specific database, the
count of searches in a specific database,
and the count of full-text downloads per
time period per database. In addition,
usage statistics can show a variety of in-
formation, including success or failure of
user access (e.g., turn-aways per time
period per specific database), user access
methods (e.g., telnet versus browsers),
access levels of one institution compared
against peer institutions, cost of access
(e.g., cost per downloaded item), and
other items pertaining to user behaviors.

According to a survey conducted with
the participants of the February 2000 ARL
Project Planning Session on Usage Mea-
sures for Electronic Information Re-
sources, held in Scottsdale, Arizona, the
following problems are associated with
usage reports from database vendors:15

• Reports do not provide detailed
information about usage. For example,
many vendors did not provide usage fig-
ures by journal or database title.

• Reports are inconsistent. For ex-
ample, vendors use their own terminolo-
gies and do not provide adequate expla-
nations to understand the reported sta-
tistics.

• Reports are not comparable. Be-
cause usage reports come in different for-
mats and contain different statistics, it is
impossible to compile accurate statistics
within the library and to compare with
other libraries.

However, the biggest problem with
usage reports is that many vendors sim-
ply do not provide any data at all.

The ARL E-Metrics Project was a con-
certed effort by selected members of the
research library community to investigate
various issues and problems related to
collecting and using data on electronic
materials and services. The project, which
began in April 2000 and finished in De-
cember 2001, was funded by a group of
twenty-four ARL libraries. Figure 2 iden-
tifies the project’s participants.

One of the aims of the E-Metrics Project
was to engage in a collaborative effort with
selected database vendors to establish an
ongoing means of producing selected de-
scriptive statistics on database use, users,
and services. A complete project descrip-
tion, project reports, and the data collection
manual are available at the ARL E-Metrics
Project site at http://www.arl.org/stats/
newmeas/emetrics/index.html.

The E-Metrics Project should be
viewed in the context of a number of re-
lated initiatives, both national and inter-
national, that are under way to assist li-
braries in assessing their networked re-
sources and services. Although these ini-
tiatives take different approaches, focus
on different types of libraries, and work
within various operating environments,
they all focus on developing library elec-
tronic statistics and performance mea-
sures. These efforts include:

• International Coalition of Library
Consortia (ICOLC): Since the mid-1990s,
this international coalition of libraries—
predominantly academic—has been
working toward a standard set of
definitions for subscription online
contents. It published the first guidelines
in November 1998 (see http://
www.library.yale.edu/consortia/
webstats.html) and a revised version in
December 2001 (see http://
www.library.yale.edu/consortia/
2001webstats.htm).

FIGURE 2
ARL E-Metrics Project Participants

University of Alberta
Arizona State University
Auburn University
University of Chicago
University of Connecticut
Cornell University
University of Illinois-Chicago
University of Manitoba
University of Maryland-College Park
University of Massachusetts
University of Nebraska-Lincoln
University of Notre Dame
University of Pennsylvania
Pennsylvania State University
University of Pittsburgh
Purdue University
University of Southern California
Texas A&M University
Virginia Polytechnic Institute and State University
University of Western Ontario
University of Wisconsin-Madison
Yale University
Library of Congress
New York Public Library, The Research Libraries

• National Information Standards Orga-
nization (NISO): NISO is updating its
Z39.7—Library Statistics Standard to in-
clude network services and resources sta-
tistics and performance measures. The
draft standard was completed in 2002 (see
http://www.niso.org/emetrics/current/
complete.html).

• National Commission on Libraries and
Information Science (NCLIS): Over the years,
NCLIS has continued its work in standard-
izing online database usage statistics and
reporting mechanisms. This project largely
focuses on the public library environment
(see http://www.nclis.gov).

• Institute of Museum and Library Ser-
vices (IMLS): IMLS sponsored a project to
develop national network statistics and
performance measures for public librar-
ies. The project resulted in a network sta-
tistics manual for public libraries.16

• Project COUNTER (Counting Online
Usage of Networked Electronic Resources):
COUNTER is supported by a group of
publishers, library associations, and other
library-related national bodies whose pri-
mary aim is to formulate an international
code of practice (COD) governing the re-
cording and reporting of usage statistics.

The release of the first COD is expected
by the end of 2002 (see http://
projectcounter.org).

• National Clearinghouse for Library and
Information Center Networked Statistics:
Proposed by Charles R. McClure and his
associates at the Information Use Manage-
ment and Policy Institute, Florida State
University, establishment of the clearing-
house will facilitate the sharing and dis-
semination of primary data, tools, edu-
cation, and research regarding statistics
of networked resources and services (see
http://www.ii.fsu.edu).

One important issue regarding these ini-
tiatives is the extent to which the initiatives
and organizations coordinate with one an-
other. For a host of reasons, including ven-
dor cooperation, library reporting require-
ments, and library management needs,
more coordination and cooperation is nec-
essary throughout these projects. The au-
thors are involved in a number of projects
mentioned above and, to the extent pos-
sible, will cooperate with other groups.

ARL Meeting with Database Vendors
FIGURE 3
Database Vendors Attending the ARL Meeting

Elsevier/ScienceDirect
netLibrary
OCLC/FirstSearch
JSTOR
ProQuest
Ovid
Lexis-Nexis
Gale Group
EBSCO

A meeting with a select group of large
database vendors occurred on March 14,
2001, in conjunction with the ACRL an-
nual meeting in Denver. The goal of this
meeting was to engage the community of
vendors, publishers, and libraries in
building consensus for reporting data on
the use of vendor databases and to pro-
mote an understanding of what can and
cannot be done vis-à-vis the provision of
data from the vendor community. The
meeting served as a discussion forum for:

• sharing information about the de-
velopment and standardization of se-
lected statistics that describe users and
uses of databases;

• reaching agreement on the impor-
tant data elements and definitions;

• engaging vendors in a test of data
elements being designed;

• understanding the issues that affect
vendor-supplied statistics describing da-
tabase use and users;

• developing a process so that the li-
brary community and the vendor com-
munity can work together in developing
and standardizing a core set of statistics.

A total of nine vendors attended the
meeting, as shown in figure 3.

During the meeting, both the vendor
and the library representatives agreed
that the reported statistics should be
based on the ICOLC guidelines. It was
noted that the market is increasingly di-
versified in terms of business models,
content provided by vendors, and other
factors. Accordingly, developing a stan-
dardized set of statistics that cover all of
these will continue to be a challenge.

Everyone agreed that technologies and
technology changes have a lot to do with
what and how statistics can be collected
and reported. For instance, Z39.50 clients
do not allow statistics to be collected. So-
lutions, such as digital certificates, also are
technology based. However, in most
cases, the costs of buying and implement-
ing these technologies may be difficult to
justify against the more reliable and de-
tailed data they would produce.

It also appears that different vendors use
different counting mechanisms. As a result,
the compiled statistics have limited reliabil-
ity and validity. Additional investigation
into these and related questions is needed.

Overall, the meeting was very useful
in that it brought libraries and vendors
together and established a dialogue.17 The
meeting also was a necessary first step for
the upcoming field-testing of proposed
statistics developed by the E-Metrics
study team. As a result of the meeting, all
of the vendors present agreed to partici-
pate in the vendor statistics field-testing.

Vendor Statistics Field-testing
The primary goal of the field-testing was
to assess usage statistics from major da-
tabase vendors in terms of comparability
of statistics and their definitions, break-
down of data, and report formats.

Methodologies
Invitations were sent to several vendors,
including those that participated in the
ARL meeting. All the vendors contacted,
twelve in all, agreed to participate in the
field-testing. The invitation explained the
goals and objectives of the field-testing
and provided a brief summary of ex-
pected deliverables from each participat-
ing vendor.

A set of field-testing guidelines was
developed and an electronic copy distrib-
uted to the vendors. In addition, project
participants (libraries) were contacted
and their participation in the field-test-
ing was solicited. Because not all field-
testing libraries subscribed to all of the
services, three or four vendors were as-
signed to each library based on its sub-
scription matrix. The intent was to allevi-




ate the burden on the libraries of evaluat-
ing too many vendor reports. In addition,
from the standpoint of vendors, it seemed
to make sense to concentrate on a few li-
braries rather than all of the libraries sub-
scribing to their services.

The guidelines asked specifically for
four deliverables from each vendor:

1. a monthly report (April 2001) in a
standardized text format (specific guide-
lines were given for data elements and
their arrangement);

2. a detailed, step-by-step description
of the process used to collect the statis-
tics, including the rules and assumptions
applied in the process;

3. a monthly (April 2001) raw data log
file;

4. issues and suggestions related to
providing usage statistics.

The vendors were asked to send the
field-testing data to their assigned librar-
ies and to the authors at Florida State
University by the last week of May 2001.
A separate evaluation questionnaire was
developed and distributed to the field-
testing libraries.

Field-test Findings
Vendor statistics change constantly and
can therefore be considered a moving tar-
get. The information presented here is for
illustration purposes only and may not
correctly reflect the current practices and
offerings of the database vendors men-
tioned in this report.

A total of eight vendors participated in
the field-testing. Table 1 shows the data
formats in which the field test reports
were provided by the vendors and the
availability of documentation received
from the vendors with regard to the defi-
nitions of the statistics provided and in-
formation on how data were collected,
filtered, and aggregated.

The majority of vendors investigated
provided usage reports in a text format
as well as other formats. Compared with
the results from the vendor statistics
analysis during the E-Metrics Project, the
evidence indicates that vendors have
made good efforts, especially in the area
of making documentation available.18

Many vendors simply did not have any
documentation about usage statistics at
all when the authors initially analyzed
their reports. However, many of the ven-
dors’ documentation did not provide
enough details concerning the definitions
of reported statistics to aid in an under-
standing of those statistics.

Table 2 shows key ICOLC statistics in-
cluded in each vendor’s field-testing re-
port. It is important to understand that
no attempt has been made to validate
compliance with the ICOLC guidelines.
Aside from the ICOLC guidelines, there
are many instances where the same sta-
tistics from different vendors are not
equal measures. An obvious example is
how vendors apply time-out parameters
to compute session counts. Vendor docu-
mentation indicated a wide range of time-
outs (e.g., Gale, 6 minutes; Ebsco, 10 min-
utes; and Science Direct, 30 minutes).

A more serious problem results from the
fact that different vendors use different
methodologies to count the same user
activity. As a result, even the most seem-
ingly simple statistics, such as searches
and items requested, might not be duly
compared. Is a search to multiple data-
base packages (as in the case of Gale or

TABLE 1
Vendor Statistics Field-testing Participation

Vendor           Data Format            Availability of Documentation
Academic Press   txt, Excel             n.a.
ProQuest         Excel, txt, PDF        Yes
Ebsco            txt                    Yes
Gale Group       csv                    Yes
Lexis-Nexis      zip (csv), Word, txt   Yes
NetLibrary       zip (txt), csv         Yes
Science Direct   txt                    Yes
SilverPlatter    csv                    n.a.

n.a.: Not available from the vendor during the field-testing.




Ebsco) counted as a single search or as a
separate search for each database chosen?
Is browsing a secondary database such as
author, subject, or journal list counted as
a search or a menu selection? Does the
vendor take into consideration multiple
requests for the same document in a short
time period (say, less than 10 seconds) and
treat them as one request or multiple re-
quests? Is clicking the next button to re-
trieve the next set of results counted as a
separate search? The list of questions goes
on and on. The answers to all of these
questions can significantly inflate or de-
flate the reported usage counts. Further-
more, what happens if a vendor changes
its counting methodology and does not
disclose it?

There is widespread suspicion among
librarians that even the identically la-
beled statistics are counted differently. A
close examination of vendor documen-
tation provided seems to suggest that the
suspicion is not unfounded. Indeed, the
answers to the above-mentioned ques-
tions differ among vendors. Another im-
portant problem has to do with the fact

that most vendors do not provide de-
tailed information to libraries, making it
difficult for librarians to determine
whether two comparable statistics from
two different vendors refer to the same
thing and can be compared accordingly.
All of these issues seriously undermine
the usefulness of usage statistics and
threaten the validity of data. (The issue
of validity is addressed later in this ar-
ticle.)

Because the types of content available
through vendors are increasingly diverse
and the terms referring to information
items have not been fully standardized, a
cross-comparison of the items-requested
statistic can be difficult. For example,
netLibrary, which is
gaining increased presence in research
libraries, does not lend itself easily to the
kinds of statistics with which we are now
familiar. This presents a challenge if li-
braries try to aggregate the total number
of items accessed for cross-vendor com-
parison or to gauge the total amount of
information transfer from licensed mate-
rials available at their institutions.
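The aggregation challenge can be made concrete with a small sketch. In the following Python fragment every vendor name, category label, and count is invented; the point is that a library must first decide which vendor-specific labels count as a delivered item before a cross-vendor total is meaningful.

```python
# Invented monthly "items requested" figures, keyed by each vendor's
# own category labels; all names and numbers are hypothetical.
reports = {
    "Vendor A": {"full text": 900, "abstract": 400},
    "Vendor B": {"full text": 300, "citation": 150, "hits": 70},
    "Vendor C": {"page view": 1200, "checkout": 55},  # an e-book provider
}

# A local policy decision: which labels count as a delivered item.
counts_as_item = {"full text", "page view", "checkout"}

total_items = sum(
    count
    for categories in reports.values()
    for label, count in categories.items()
    if label in counts_as_item
)
print(total_items)  # 2455 under this particular mapping
```

A different (equally defensible) mapping, say one that excludes e-book page views, would produce a very different total, which is precisely why such aggregates are hard to compare across institutions.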

TABLE 2
Key ICOLC Statistics Included in the Vendor Reports (by vendor)

Vendor                 Items Requested                                     Searches   Sessions   Turn-aways
Academic Press/IDEAL   Full text, reference, abstract,                     Yes        Yes        n/a
                       table of contents
ProQuest               Full text, abstract, citation                       Yes        No         n/a
Ebsco                  Full text, abstract                                 Yes        Yes        n/a
Gale Group             Full text, citation, abstract, hits,                Yes        Yes        Yes
                       views, print station
Lexis-Nexis            Full text, document retrievals                      Yes        No         n/a
NetLibrary             Page view, browse, checkout,                        Yes        Yes        Yes
                       dictionary use
Science Direct         Full text, abstract                                 Yes        Yes        n/a
SilverPlatter          Full text, abstract                                 Yes        Yes        Yes

n/a: Not applicable




The turn-away statistic has been use-
ful in determining whether to increment
the number of simultaneous user licenses.
However, the statistic applies only to
those vendors that have such a restriction.
Table 2 shows that out of the eight ven-
dors, only three have simultaneous user
limits and all three report the turn-away
measure.

Table 3 shows a breakdown of reported
statistics according to the ICOLC-recom-
mended categories. It also lists other
breakdown categories that the vendors
reported. It appears that vendors, in gen-
eral, satisfied the title-level (journal, da-
tabase, or book) breakdown requirement.
The IP (Internet protocol) breakdown re-
quirement also was being generally re-
spected. But in all cases, the statistics were
lumped at the subnet (a block of IP ad-
dresses) level rather than at the indi-
vidual IP address level. Individual-level
tabulation might not have been included
in summary statistics anyway because it
can be made available in log files. Unfor-
tunately, most vendors were unable to
furnish log data files because of technical and legal
concerns. Half the vendors currently pro-
vide some time-related breakdowns.

Libraries’ Evaluation of Vendor
Reports
Overall, libraries reported that the data
files were easy to read and process. The
majority of libraries used Microsoft Ex-
cel to import and display data files. In one
case, a vendor sent part of the data files
in pdf format, which forced the recipient
libraries to enter the numbers manually.
The results show that libraries would pre-
fer data formats, notably, text formats that
can be easily imported into data analysis
programs such as Excel and Lotus 1-2-3
without having to spend extra time and
effort to manipulate or enter the data.

Although all participating libraries at
least opened the data files, only a few at-
tempted to analyze the data. There
seemed to be several reasons why librar-
ies were hesitant about in-depth analysis
of data. One library commented that it did
not test the data because they were the

TABLE 3
Breakdown of Statistics in the Vendor Reports (by vendor)

Vendor                 By Journal or Database Title     IP    Time/Day    Other
Academic Press/IDEAL   Journal title                    Yes   No
ProQuest               Database title, journal title    Yes   Time        Client ID
Ebsco                  Database title                   Yes   No          Group and profile ID
Gale Group             Database title, journal title    No    Time, Day
Lexis-Nexis            Database title                   Yes   Time, Day
NetLibrary             Book title                       Yes   Time, Day
Science Direct         Journal title                    Yes   No          Subscribed versus non-subscribed
SilverPlatter          Database title                   No    No          Peak time and duration




summary data and not the raw data the
library expected from the field-testing.
The following comment from another li-
brary also explains why libraries have not
done further analysis: “We currently place
raw vendor statistics on our staff intranet
and do not compile them for comparison
purposes, as we have yet to define what
statistics and what format would best suit
our institutional needs for such a compi-
lation.”

At least one library reported specifi-
cally how it processed the field-testing
data. For each vendor analyzed, the li-
brary compared the session counts from
the library redirect page (all requests to
external vendor databases pass through
a Web page that counts how many times
different databases are accessed) and the
vendor report. This produced, for each
database, a rough idea of what portion of
attempted log-ins (sessions) originated
from people who bypass the library da-
tabase Web page. The library also calcu-
lated the estimated cost per article viewed
and the distribution of articles viewed by
title, which confirmed that 25 percent of
the titles account for 80 percent of articles
viewed for the particular database.
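The calculations that library performed can be sketched in a few lines. The journal names, view counts, and subscription cost below are invented for illustration; in practice the per-title counts come from the vendor report and the cost from the license.

```python
# Invented per-title full-text view counts and an assumed annual
# subscription cost; journal names are placeholders.
views_by_title = {"Journal A": 400, "Journal B": 250, "Journal C": 120,
                  "Journal D": 20, "Journal E": 8, "Journal F": 2}
annual_cost = 10_000.00

total_views = sum(views_by_title.values())
cost_per_view = annual_cost / total_views
print(f"estimated cost per article viewed: ${cost_per_view:.2f}")

# Concentration of use: the share of all views produced by the
# top quarter of titles, ranked by views (cf. the 25%/80% pattern).
ranked = sorted(views_by_title.values(), reverse=True)
top_quarter = ranked[: max(1, len(ranked) // 4)]
share = sum(top_quarter) / total_views
print(f"top 25% of titles account for {share:.0%} of views")
```

Even this minimal analysis presupposes title-level breakdowns in the vendor report, which, as table 3 shows, not every vendor supplied.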

The field-testing instructions provided
guidelines in terms of essential data ele-
ments, data arrangement, and file format.
Contrary to the authors’ expectations, all
of the vendors simply repackaged their
monthly usage reports and submitted
them to the libraries. Therefore, the only
practical difference between the field-test-
ing report and the report that libraries ac-
cessed from the vendor Web site in a nor-
mal situation was that libraries received
the data files directly from the vendors
instead of retrieving them from the ven-
dor Web sites. Several libraries appreci-
ated the fact that they could receive data

files in text format, which is much easier
to handle than, say, HTML format. An-
other minor difference was the availabil-
ity of data definitions and statistics col-
lection processes from some of the par-
ticipating vendors. In some cases, this was
the first time that explanations were avail-
able to the libraries. Typically, documents
that contain definitions of statistics and
other background information, if they are
available, are provided on the vendors’
Web sites.

Even when the sets of data were avail-
able from vendors, it was difficult for the
libraries to do valid comparisons of the
data because of insufficient descriptions
of data definitions and limited explana-
tion of how the data sets were collected
and summarized. Many libraries feared
that, without explanatory information on
what each data element in the vendor re-
ports meant and how the counts were fil-
tered, such a comparison would have
been faulty at best. This suggests that until
there is a satisfactory degree of assurance
that the statistics provided by the differ-
ent vendors—based on the documenta-
tion they provide—are consistent enough
for cross-comparison, libraries will not
commit major resources in an attempt to
compile vendor data into a standardized
format or repository.

Another problem with comparing data
from multiple vendors was the inconsis-
tent data formats. The task of combining
data fields and adjusting data arrangement
from even three or four vendors proved
to be extremely time-consuming. What li-
braries want is a standardized usage re-
port containing common data elements
and arranged in a predetermined, agreed-
upon order that is provided separately
from vendor-specific data elements or ad-
ditional data. Even the different placement
of field headings, in a column or in a row,
requires special handling by the libraries.
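The special handling mentioned above, where the same statistics arrive row-wise from one vendor and column-wise from another, can be illustrated with a short sketch; the vendor layouts, field names, and figures are all hypothetical.

```python
# Sketch of normalizing two vendor reports into one common layout.
# Vendor A puts one database per row; Vendor B puts databases in columns.
# All field names and figures are hypothetical.
import csv
import io

vendor_a = """database,month,searches,sessions
PsycINFO,2001-04,900,400
MEDLINE,2001-04,1500,700
"""

vendor_b = """metric,PsycINFO,MEDLINE
searches,950,1480
sessions,410,690
"""

rows = []

# Vendor A is already row-oriented; read it directly.
for r in csv.DictReader(io.StringIO(vendor_a)):
    rows.append({"vendor": "A", "database": r["database"],
                 "searches": int(r["searches"]),
                 "sessions": int(r["sessions"])})

# Vendor B must be transposed: metrics are rows, databases are columns.
b = {r["metric"]: r for r in csv.DictReader(io.StringIO(vendor_b))}
for db in ("PsycINFO", "MEDLINE"):
    rows.append({"vendor": "B", "database": db,
                 "searches": int(b["searches"][db]),
                 "sessions": int(b["sessions"][db])})

for row in rows:
    print(row)
```

Even this toy example needs vendor-specific transposition code, which is why a predetermined, agreed-upon arrangement would save libraries substantial effort.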

The majority of respondents said that
the data provided by these vendors are
“necessary and valuable.” They liked the
fact that the data are “very straightfor-
ward and easy to use” and, more impor-
tant, that they provide some indication of
the extent to which subscription-based
services are being utilized. Of course, the
relative value depends on the quality of
data and the importance of the database
to the library (e.g., the amount of money
the library spends for a particular data-
base as compared with other databases
to which they subscribe).

Although the majority of libraries be-
lieved that the usage reports provided by
the individual vendors are useful, some
questioned the cumulative value of usage
reports combined across vendors. Given
the fact that typical ARL libraries deal
with several dozen database vendors,
normalizing the data, in the current
forms, from these vendors will require a
prohibitive amount of effort.

Usage reports deal almost exclusively
with the specific use of vendor databases
in terms of frequencies (e.g., searches and
sessions), duration (e.g., connection time),
and amount of information transfer (e.g.,
items requested) while largely ignoring
another dimension that many libraries
consider very important—information
about user behavior. The current usage
metrics provide information about user
behavior to a degree, but not at the level
many libraries would like.

To be useful, information about user
behavior will need to be correlated with
individual user profiles. But the current
environment for database access, which
is heavily rooted in IP-based authentica-
tion, does not permit the kind of data col-
lection that libraries expect. Although
there is a desire to receive more detailed
information about user behaviors, it con-
flicts with the current practices and the
libraries’ concern about user privacy.

Optimally, vendors would give libraries
the option of accessing raw log data files
that contain sufficient information for
useful analysis, follow standardized defi-
nitions, and are collected consistently
over time. Unfortunately, many vendors
were unable to provide log data files be-
cause of technical, legal, or other concerns.

Because the field-testing dealt with
only one month’s data (April 2001), it is
difficult to know if what was collected is
typical. However, the authors have not
heard from the field-testing libraries of
any unusual discrepancy between the
field-testing data and data they received
before the field-testing. The authors real-
ize, however, that merely comparing data
from the same vendors will not provide a
satisfactory answer to the problem of col-
lecting accurate, reliable, and standard-
ized data.

During the course of writing this re-
port, the authors came across an e-mail
message from a major database vendor
acknowledging errors in its usage reports.
This suggests that libraries are not in a
good position to know what exactly goes
into the vendor reports. Some unusual
numbers or patterns are relatively easy
to identify, but consistent under- or
overcounts are harder to detect.

The authors believe that the data pro-
vided from the vendors studied are easy
to obtain and manipulate. Most vendors
offer several data formats, including text
format (e.g., comma-separated file) and
spreadsheet format (e.g., MS Excel), in
addition to standard HTML format for
easy viewing in Web browsers. Also,
many vendors offer an ad hoc report gen-
eration facility whereby libraries can cus-
tomize the fields and set desired time
periods they want to examine. However,
processing vendor reports from multiple
vendors may become a burden on librar-
ies in terms of time and staff efforts be-
cause the formats and data arrangements
vary considerably from vendor to vendor.

Dealing with vendor usage reports
raises a number of other issues. The mar-
ket for electronic content providers is be-
coming more diverse and complicated,
and the types of statistics that best serve
libraries in this changing environment
need to be considered. Companies such
as netLibrary did not even exist when the
ICOLC guidelines were first drafted. A
related issue is the effect of mega-merg-
ers taking place in the electronic content
providers’ market and how these merg-
ers will affect statistical reporting.

For the most part, libraries have relied
on the ICOLC guidelines as the de facto



510  College & Research Libraries November 2002

standard for usage statistics for licensed
materials. Indeed, the guidelines brought
the issue of usage statistics to full view for
many practicing librarians and database
vendors. Although most vendors included
in the study claimed a high level of compli-
ance with the guidelines, some librarians
remain skeptical, citing the differences in
the way statistics are collected by different
vendors (e.g., different time-outs) and the
lack of concrete documentation. The ICOLC
guidelines are concerned mainly with de-
fining basic usage statistics and do not con-
tain detailed information that can be used
to validate whether the vendor reports ad-
here to the standard. In addition, the library
community may have different opinions
about how statistics should be counted.
What level of specificity are we pursuing
in the standardized reports? And who is
going to ensure that a vendor report meets
the accepted standard?

The validity of usage statistics is a criti-
cal issue that needs to be addressed seri-
ously. First, there should be more detailed
information to analyze the validity of re-
ported statistics from database vendors.
The current documentation, albeit im-
proved, is simply not adequate. In this re-
gard, Project COUNTER is an important
initiative because it attempts to define an
agreed-upon code of practice. All related
parties need to work together to draw up
the specifics of the practices to dispel the
persistent suspicion that even the same
statistics are counted differently. For this
to happen, libraries, publishers, and
aggregators need to continue a healthy
dialogue regarding their expectations.
Vendors need to be more forthcoming in
the discussion and better describe what
they do and how they do it in usage re-
porting. Because practitioners themselves
sometimes do not agree on what is valid,
the library community needs to deter-
mine what a valid metric is. Establish-
ment of the National Clearinghouse for
Library and Information Center Net-
worked Statistics described earlier can
help formulate consensus among practi-
tioners. Finally, an external validation
service or organization can be considered

as a part of the solution. The validating
service would then enforce compliance
with industry standards and monitor actual
use. The authors mention this simply as
a possibility in the long term and suggest
that it be thoroughly examined before
being put forward for implementation.

This study has not dealt with issues re-
lated to usage reporting in consortial ar-
rangements. As such arrangements be-
come increasingly common in research
libraries, librarians will need to make sure
that individual members of a consortium
receive the same level of usage statistics
for their institutions as they would under
individual site-licensing agreements.

Usage statistics currently provided by
vendors give useful information regard-
ing the utilization of external subscrip-
tion-based information services. Librar-
ies use the data for a variety of purposes:
usage trends over time, justification for
expenditures, cost analysis, modification
of service provision, and so on. Related
to the issue of the value of the data is the
trustworthiness (reliability) of the data.
And, as discussed earlier, there also is
some concern about the lack of user-re-
lated information in usage statistics.

Recommendations
Based on the findings of this study, the
authors make several suggestions that
may be useful for ARL libraries (and per-
haps other libraries) to consider in deal-
ing specifically with vendor statistics, in-
cluding:

• Focusing data analysis on high-impact
databases: Libraries should not treat all da-
tabases equally when it comes to data
analysis. Because of inconsistencies in
data elements and report delivery, it is
difficult to normalize usage statistics from
all vendors who report data. Instead, li-
braries need to investigate the usage pat-
terns of “major” databases, whatever
those might be locally, and ways that im-
provements can be made in terms of ac-
cess and use of materials.

• Collecting locally obtainable data for
external databases: Although libraries need
to depend on database vendors for usage
statistics, they have several ways (e.g.,
through redirect page counters for li-
censed databases or through proxy server
logs) to capture at least partial informa-
tion on user access to the external data-
bases (e.g., attempted log-ins). This kind
of internal data helps libraries spot-check
the reliability of vendor-supplied usage
statistics. Moreover, because the data will
be under the control of libraries, they will
be more consistent than measures re-
ported by different vendors.

• Keeping track of aggregate key statis-
tics and using them: Libraries often find
themselves in need of gross figures of user
access to external licensed databases for
various internal and external reporting.
The aggregate numbers are good indica-
tors of overall trends in user demand for,
and access to, external databases. It is
important to keep some level of consis-
tency in the way the gross figures are cal-
culated and reported. One way to main-
tain consistency is to gather data from the
same pool of database vendors or data-
base titles over a specified period of time
(e.g., Total number of searches conducted
in existing licensed databases grew by
20% in 2000 to 1,200,000 as compared to
the 1999 total of 1,000,000 searches. The
data are based on the same thirty-five
vendors that report the statistic.).

• Validating reliability: The library
community needs to consider concrete
ways (e.g., third-party validation) to en-
sure consistent and reliable reporting
from vendors.

• Demanding documentation: Libraries
should demand better documentation of
the data collection and filtering process
from the various vendors. Such documen-
tation should describe how the sets of
data are collected and defined, and dis-
cuss any issues or concerns in the report-
ing of these data to libraries.

• Organizing the library for data collec-
tion, analysis, and working with the vendors:
Many libraries simply lack adequate staff,
or the staff members lack adequate
knowledge and training, to work effec-
tively with the statistics and information
that some of the vendors can supply. Li-
brary staff need to have an understand-
ing of the statistics and to know how to
manipulate the files and how to organize
and report such data. In addition, the li-
brary needs to be able to commit organi-
zational resources to working with and
using such vendor statistics.
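The consistent-pool approach recommended under "keeping track of aggregate key statistics" can be sketched as follows; the vendor names and counts are hypothetical, chosen to reproduce the 20 percent growth example given earlier.

```python
# Sketch of the aggregate-trend calculation recommended above: total
# searches are compared year over year using only the vendors that report
# in BOTH years, so the gross figure stays consistent. Numbers are
# hypothetical.

searches_1999 = {"VendorA": 400_000, "VendorB": 350_000,
                 "VendorC": 250_000}
searches_2000 = {"VendorA": 480_000, "VendorB": 430_000,
                 "VendorC": 290_000,
                 "VendorD": 120_000}  # new vendor: excluded from the trend

common = searches_1999.keys() & searches_2000.keys()
total_1999 = sum(searches_1999[v] for v in common)
total_2000 = sum(searches_2000[v] for v in common)
growth = (total_2000 - total_1999) / total_1999

print(f"searches ({len(common)} common vendors): "
      f"{total_1999:,} -> {total_2000:,} ({growth:+.0%})")
```

Restricting the totals to the common pool is what keeps the gross figure comparable from year to year; adding VendorD's counts to 2000 alone would overstate the growth in user demand.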

The use of different system parameters
(e.g., time-out), the application of different
assumptions about user behavior (e.g., how
to treat or count multiple clicks on the same
document within a session), and the lack
of adequate explanation in vendor docu-
mentation regarding specific definitions
and data collection and filtering processes
all contribute to the reporting problem. The
comprehensive standardization of usage
statistics and data delivery methods (e.g.,
file format and data arrangement) cannot
be easily achieved in the short term. These
are long-term goals toward which vendors
and libraries need to work together. The
ARL community should continue to make
progress in this area by working among
themselves and with the database vendor
community. In the meantime, the authors
recommend that comparisons be limited
to data from the same vendors or to data
that are known to be collected, defined,
and reported similarly.

The authors strongly recommend that
vendors report standardized usage statis-
tics, such as those recommended by the
ICOLC and those defined in the final
manual that resulted from the project.19
These statistics should appear in stan-
dardized column and row arrangements,
with any additional vendor-specific data
provided in a separate report.

Continuing the Momentum
ARL libraries have needed consistent,
comparable, easy-to-use, and useful us-
age statistics from content providers (da-
tabase vendors) ever since they embraced
the notion of maintaining statistics on the
use of external licensed materials. The
ARL E-Metrics Project provided an op-
portunity for the ARL community to look
at the issues and problems related to ven-
dor usage reporting in a more systematic
way and to begin working toward devel-
oping more useful reports. However,
much more work remains in this area.

Members of the study team found that
some library staff had little knowledge
about the vendor statistics, had limited
training in being able to manipulate and
analyze the reported data, and were quite
surprised that such evaluation and ma-
nipulation of data required special train-
ing and knowledge. Some libraries were
not organized for ongoing data collection
and analysis of vendor statistics: it was
unclear who was responsible for such ef-
forts and whether resources were avail-
able to support these efforts. And finally,
most libraries simply had no manage-
ment information system (even in the
most basic sense of the term) for orga-
nizing, analyzing, and reporting such
data. The study team found that, in gen-
eral, libraries were not prepared to com-
mit the necessary resources, staff time,
training, and effort into the evaluation.20

Thus, one difficulty in some of the dis-
cussions with the vendors was a lack of
knowledge and skills on the part of the li-
brarians in using and analyzing the data.
The fact of the matter is that both the library
community and the vendor community
have much to learn in terms of understand-
ing how best to define, collect, analyze, re-
port, and validate such statistics. For their
part, many libraries simply do not have a
culture of evaluation that supports the as-
sessment effort needed to use vendor-based
statistics successfully.21 Organizational de-
velopment, staff training, and strategic
planning for exploiting such data on the
part of libraries will be key components in
moving forward in this area.

Several organizations in the library and
vendor communities, as well as national
and international bodies, are currently
working in this area. Although these ini-
tiatives do not overlap exactly in terms of goals and
scopes, there is a danger that they may
result in conflicting reporting require-
ments. Specific ways to coordinate and
encourage cooperation have yet to be de-
veloped. Indeed, the number and range
of organizations interested in developing
standardized statistics is significant.

From the vendors’ point of view, it is
impossible to respond to multiple and
conflicting requests for data from the li-
brary community. As one vendor com-
mented, until the library community can
decide how best it wants the data defined,
collected, validated, and reported, ven-
dors cannot endlessly respond to requests
for “customized” data sets. Thus, to some
degree, the members of the library com-
munity must continue to work among
themselves to reach
such an agreement regarding these stan-
dards.

In addition, different types of libraries
(academic, school, public, special, etc.)
need to reconsider the degree to which
they think their needs are unique to their
particular settings. A “full-text down-
load” does not vary across library types.
Librarians who argue that they have
unique or special data needs simply re-
inforce the view of some vendors that it
is impossible to provide multiple data
types, defined differently, for different
libraries; if that view prevails, little
progress will be made on standardizing
these statistics.
The members of the library community
must work together in the development
of such standards, definitions, and report-
ing.

Both vendors and the library commu-
nity need to realize that the development,
testing, refinement, and standardization of
vendor-based statistics is an ongoing pro-
cess. Given the changes in technology, da-
tabase structures, and other factors, the life
span of the statistics may be short (com-
pared with more traditional library statis-
tics). Thus, obtaining longitudinal data
from vendors may be difficult, and librar-
ies will need to be much more pragmatic
about the availability and use of vendor-
based statistics. The National Clearing-
house for Library and Information Cen-
ter Networked Statistics (http://
www.ii.fsu.edu), established at Florida
State University’s Information Use Man-
agement and Policy Institute, will play a
coordinating role in the collection, use,
and analysis of network data sources in-
cluding, but not limited to, database ven-
dor statistics. The clearinghouse will help
the various efforts to date build on one
another and will integrate activities for
meaningful library assessment in support
of decision making and analysis.

One important accomplishment dur-
ing the project was the initiation of con-
versations and cooperation with major
database vendors. Currently, library lead-
ership in this area is diffuse and lacks co-
ordination. Work needs to continue, es-
pecially in the standardization of key us-
age statistics, data delivery, and better
documentation of definitions and report-
ing procedures. An ongoing, more for-
malized mechanism is essential to ensure
that such meetings take place, progress
is made, and better standards for vendor-
based statistics are developed.

Notes

1. Association of Research Libraries, ARL Supplementary Statistics 2000–2001 (Washington,
D.C.: ARL, 2002). Available online from: <http://www.arl.org/stats/sup/sup01.pdf>.

2. It is quite possible that the users may not even realize that the library evaluated the elec-
tronic sources and negotiated the licensing contracts. Most electronic databases validate legiti-
mate user log-ins by examining the origination IP (Internet protocol) addresses included in the
requests. Users can bypass the library Web site when they access external electronic databases as
long as they use computers carrying legitimate IP addresses. Remote users connecting through
Internet service providers (ISPs) may need to go through a proxy server to access electronic
databases; the proxy allows them to use the institution’s IP addresses.

3. The arrows denote movement of information contents and location of user access. In the
depiction of the traditional library, materials reside in library premises and users come to the
library to use the collection. On the other hand, in the networked library, not all of the library
collection resides within the library’s physical boundaries. Also, part of user access occurs out-
side the library, as in the case of access to most subscription-based, electronic-licensed materials.

4. These include off-site storage facilities libraries use to reduce the cost of warehousing less
frequently requested materials.

5. The authors are not suggesting here that only external, licensed materials are of concern
to scholars and libraries. They acknowledge that other freely available or community-based ser-
vices are being heavily used by academic users.

6. There can be a set of arbitrary limitations such as simultaneous log-in limits, but they do
not originate from the characteristics of the electronic format.

7. International Coalition of Library Consortia, “Guidelines for Statistical Measures of Us-
age of Web-Based Information Resources” (ICOLC, 2001). [revised December 2001]. Available
online from http://www.library.yale.edu/consortia/2001webstats.htm.

8. There is a large and diverse number of electronic content providers in the market, and it is
difficult to describe them collectively. The term “database vendors” is used in this article to
denote various content providers such as traditional journal publishers providing electronic coun-
terparts, aggregators of full-text journals and reference databases (e.g., JSTOR, Project Muse,
Ebsco, Gale, ProQuest), electronic book providers (e.g., Questia, netLibrary), and so on.

9. John Carlo Bertot, Charles R. McClure, and Joe Ryan, Statistics and Performance Measures
for Public Library Networked Services (Chicago: ALA, 2001).

10. Carol Tenopir and Eleanor Read, “Patterns of Use and Usage Factors for Online Databases
in Academic Libraries,” College & Research Libraries 61 (May 2000): 234–46.

11. Deborah D. Blecic, Joan B. Fiscella, and Stephen E. Wiberley Jr., “The Measurement of Use
of Web-based Information Resources: An Early Look at Vendor-supplied Data,” College & Re-
search Libraries 62 (Sept. 2001): 434–53.

12. Sally A. Rogers, “Electronic Journal Usage at Ohio State University,” College & Research
Libraries 62 (Jan. 2001): 25–34.

13. Kathleen Bauer, “Indexes as Tools for Measuring Usage of Print and Electronic Resources,”
College & Research Libraries 62 (Jan. 2001): 36–42.

14. Charles R. McClure and John Carlo Bertot, eds., Evaluating Networked Information Services:
Techniques, Policy, and Issues (Medford, N.J.: American Society for Information Science and Tech-
nology, 2001).
15. The meeting served as the planning session for the project that later became known as the
ARL E-Metrics Project. Thirty-five ARL institutions were represented at the meeting. A survey
questionnaire was sent out before the meeting, and twenty-one libraries responded. The survey
contained four open-ended questions on the data needs for electronic information resources, the
status of data collections at participating libraries, and the expectation for the meeting. The sum-
mary presentation during the meeting (in Microsoft PowerPoint format) is available online
from http://www.arl.org/stats/newmeas/scottsdale/jeffshim/.

16. Bertot, McClure, and Ryan, Statistics and Performance Measures for Public Library Networked
Services.

17. Additional meetings between vendors and other members of the library community re-
garding statistics have occurred at ALA Midwinter 2001 and 2002. These meetings also have
attempted to better coordinate and validate vendor-based statistics that are being reported to
libraries. The National Commission on Libraries and Information Science has sponsored these
meetings; summaries, as well as other related reports, are available online from http://
www.nclis.gov/statsurv/statsurv.html.

18. An interim Phase I report describing current practices of participating ARL member li-
braries related to network statistics and measures was issued November 7, 2000, and is available
online from http://www.arl.org/stats/newmeas/emetrics/index.html.

19. Wonsik Shim, Charles R. McClure, Bruce T. Fraser, and John Carlo Bertot, Data Collection
Manual for Academic and Research Library Network Statistics and Performance Measures (Tallahassee,
Fla.: Information Use Management and Policy Institute, Dec. 2001); also available online from the
ARL at http://www.arl.org/stats/newmeas/emetrics/index.html.

20. As part of the E-Metrics Project, the study team produced three PowerPoint presentations
about preparing the organization for evaluation, the importance of evaluation and use of vendor
statistics, and an overview/introduction to the recommended statistics. These presentations are
available from the ARL in Washington, D.C., and from http://www.arl.org/stats/newmeas/
emetrics/index.html.

21. Amos Lakos, “The Missing Ingredient: Culture of Assessment in Libraries,” Performance
Measurement and Metrics: The International Journal for Library and Information Services 1 (1): 3–7.