Recommendations for Benchmarking Web Site Usage among Academic Libraries

Christy Hightower, Julie Sih, and Adam Tilghman

The Web sites that academic libraries are developing for their research communities represent an important new aspect of information management. Comparative statistical analysis of Web site usage among similar institutions would improve librarians' ability to evaluate the effectiveness of their efforts. A centralized voluntary reporting structure for Web server usage statistics, coordinated by the Association of Research Libraries' (ARL's) Office of Statistics, would provide a significant service to academic librarians. Factors to consider in designing such a benchmarking program are discussed, based on a pilot study of Web site usage statistics from fourteen science and technology libraries.

Christy Hightower is the Web Coordinator in the Science & Engineering Library at the University of California-San Diego; e-mail: chightow@ucsd.edu. Julie Sih is formerly the Corporate Programs Librarian at the University of California-San Diego; e-mail: jsih@gort.ucsd.edu. Adam Tilghman is the Web Programmer Analyst in the Science Libraries at the University of California-San Diego; e-mail: agt@ucsd.edu.

New measures of library activity attract the attention of practitioners and administrators alike because they promise answers to two eternal questions: (1) How effectively are librarians meeting the information needs of their primary clientele? and (2) Have their own approaches to budgetary and technological challenges been more or less successful than those of comparable institutions? As we progress toward the largely digital library of the future, the active role of librarians in designing user interfaces and expert systems further whets their already voracious appetite for usage data.

The foundation for many institutions' digital library efforts is their development of highly customized Web sites. This is fortuitous because Web servers automatically log data about the demand for specific resources within these sites.

One expects to find copious professional literature on how academic libraries can capitalize on readily available data on the size and characteristics of their own Web site audiences. After all, the popular press has seen fit to devote lengthy articles to the scores of tools and services available for Web server log analysis, as have trade journals in the fields of business and computing. Therefore, it is astonishing to discover how little the library and information science journals have published about the potential of these statistics as a measure of library activity.

[FIGURE 1: Total Site Page Requests by User Origin (Feb. 1996). Bar chart of total page requests for the fourteen physical sciences library Web sites (A through N, including UCSD), subdivided by visitor origin: internal, unresolved, and external. Asterisks denote incomplete data.]

This dearth of literature has left librarians ill-informed as to the capabilities of Web server log analysis software. Many academic libraries undervalue, or even ignore, their Web traffic data. Even libraries that regularly analyze their server logs have difficulty interpreting the results in the absence of benchmarks.
Without external comparisons, judging the success of a site is difficult because the size of the potential audience is unknown. Fourteen thousand page requests per month may sound impressive, but how does one know whether that indicates stellar or abysmal demand for a midsize university's sci/tech library Web site?

To help library directors and Web developers make sense of their own Web site traffic measurements, the authors of this article examine how a benchmarking program might be developed to compare the statistics of one academic library Web site against those of others. In so doing, the authors identify several practical and philosophical issues concerning intercampus comparisons of library Web site traffic. Because such Web sites are unique resources developed for specific audiences, meaningful benchmarking can occur only among carefully selected peers whose Web sites share essential characteristics. Moreover, equitable comparisons require uniform definition of measurement units, as well as establishment of a standardized approach to collection, analysis, and reporting.

TABLE 1
Page Requests by Site versus Selected Pages (Feb. 1996)

Physical Sciences     Total Site            Top Ten Pages         Engineering Web
Library Web Sites     Requests    Rank      Requests    Rank      Resources Pages(1)   Rank
A                     16,339      1         15,951      1         4,956                1
UCSD                  13,086      2          8,698      2           512                2
C                      4,394      3.5        4,340      3           n/a                n/a
D(2)                   4,390      3.5        2,712      5           218                6
E                      4,241      5          2,408      6            88                9
F                      4,001      6          3,684      4           294                4
G                      3,189      7          1,886      7           222                5
H                      2,927      8          1,514      9           155                8
I                      1,694      9          1,694      8           n/a                n/a
J                      1,395      10         1,131      10          157                7
K(2)                   1,081      11           610      13           46                10
L                        851      12           785      11           16                12
M                        679      13           679      12          360                3
N                        429      14           329      14           27                11
(1) Includes only those pages with external links
(2) Incomplete data available

Based on their experience analyzing the Web server log files of fourteen universities' science and engineering libraries, as well as their evaluation of standards proposed by two Internet advertising bodies, the authors propose voluntary guidelines and a common set of metrics for estimation of library Web site audiences. They also assert that the ARL might provide a valuable service to its member institutions by facilitating this standardization process and providing a mechanism whereby academic libraries can choose appropriate peers and models.

Methodology

With no relevant research models in library and information science journals, the authors' main sources of technical information were the Web site of the wwwstat 2.0 analysis software,1 the FAQ (frequently asked questions) document for the comp.infosystems.www USENET newsgroups,2 and anecdotal advice shared in forums such as the Web4Lib electronic discussion list.3 The authors also consulted several business and computing articles whose reports of known problems with Web site audience estimation helped in planning their approach.4

The authors began their study in November of 1995 by identifying possible peers for the Science & Engineering (S&E) Library at the University of California-San Diego (UCSD). They invited participation from the sci/tech libraries of the following institutions: sister University of California campuses, the eight institutions used by the University of California for other benchmarking purposes (e.g., faculty salary comparisons), and some additional universities noted for their sci/tech programs.
1996) perience in the process of benchmarking in order to Physical make informed recommen- Sciences Library Total Total dations on what data to mea- Web Sites Requests Rank Kilobytes Rank sure and how best to collect A 16,339 1 70,610 2 and analyze those data for UCSD 13,086 2 74,346 1 benchmarking in the library C 4,394 3.5 60,932 3 setting. The rankings pre­ D2 4,390 3.5 40,007 4 sented in figure 1 and tables E 4,241 5 763 14 1 and 2 are for purposes of F 4,001 6 16,050 6 comparing the metrics used, G 3,189 7 19,938 5 not the institutions. The au- H 2,927 8 10,349 7 thors felt that a small sample I 1,694 9 4,548 11 size was sufficient to accom- J 1,395 10 8,490 9 plish these objectives and K2 1,081 11 4,380 12 were not overly concerned L 851 12 2,038 13 when institutions with mul- M 679 13 8,724 8 tiple Web servers were able N 429 14 5,201 10 to provide data for only a 1 Multimedia files excluded from byte count 2 Incomplete data available pation from the sci/tech libraries of the following institutions: sister University of California campuses, the eight institutions used by the University of California for other benchmarking purposes (e.g., faculty salary comparisons), and some additional universities noted for their sci/tech pro­ grams. Of the twenty ARL and non-ARL institutions invited to participate, fourteen (including UCSD) participated fully by supplying their sci/tech libraries’ raw Web server log files for the month of February 1996. The participating library institutions were: the University of California cam­ puses at Berkeley, Davis, Irvine, Riverside, San Diego, Santa Barbara, and Santa Cruz; Cornell University; the Massachusetts In­ stitute of Technology; Stanford University’s Engineering Library; the State University of New York at Buffalo; the University of Illinois at Urbana-Champaign; the Uni­ versity of Michigan; and the University of Southern California. It was not the authors’ intent to assign relative performance outcomes to the par- single server. Other aspects of the methodology used, which are identified below, also reflect the pilot nature of the authors’ efforts. Most participants made their raw Feb­ ruary 1996 Web server access log files available by assigning them a uniform resource locator (URL) and allowing the authors to grab the data through their Web browsers, although two institutions sent their files via file transfer protocol (FTP). After obtaining the log files, the authors isolated the data for the specific Web pages they were interested in before running the files through analysis soft­ ware (wwwstat 2.0). Why Raw Log Files Were Requested As they currently exist, Web server sta­ tistics are based on the data contained in a server’s access log files. Each request for a document from a site is recorded as a line in that Web server ’s log (see figure 2). Most servers support the common log file format (CLF), which keeps very simple request information. It contains the visitor ’s host name (the machine F IG U R E 2 W eb S er ve r A cc es s L og C om po ne nt s r i � : " : S � { 00 S S r S � o � § . § § . § £ I � H§ ) §i �§l 0 0 o � § . § §§ HB- ) £ I 0 � H§ ) §i �§l 0 0 o � § . § § . § £ I §H- ) l i § i �§l 0 0 0 o � H� £ I j a ._: g� E jEg " ; 0�j :0 �M ;M; ;M g0g g� � 0 � g ;g �_: < 0 g : ; gg � 0 " a0 6"a a6a "i i: a � 0 n TN N N 0 g '_ 6 6 : : u < �u 0 n u� u < � a06 "a a6a "i i: a � 0 n TN N N T 0 g ' ?� 6 6 u" � u0 §i §_o i: I � _g: g a 6 ia _ a6 aii _ i:_ a � 0 n �N N N 0 g 0gg ' ?� ;.5 5\ ;s 5 \ 5\ .5. 
Obtaining participants' raw log files enabled the authors to use the same analysis software, configured in the same way, on each sample. It also allowed them to determine which Web pages on each site would be included or ignored. This degree of standardization is vital for benchmarking purposes.

Challenges of Performing Analysis on the Raw Log Files

Although the raw data approach offered the greatest level of standardization for the study, it also posed a number of problems. First, it provided only a very brief window of opportunity in which to obtain the files because some institutions' server software was configured to purge or overwrite their log files automatically at the stroke of midnight on the final day of the month. Second, because the raw data files were very large (some as large as 99.9 megabytes for a single month), their transmission over slow connections took much longer than expected. Finally, upon arrival at the authors' site, they consumed significant amounts of disc storage space (not to mention computer processing time for analysis). The authors also discovered that some of these massive files were not compliant with CLF format standards, so additional programming time was needed to convert the files into a format acceptable to the wwwstat analysis software.

In the interests of standardization, the authors eliminated multimedia files and error messages from their comparisons. Fortunately, the wwwstat software may be instructed to do this automatically; however, to be certain that only the traffic of sci/tech library pages on each server was considered, the authors were forced to visit each site individually and work their way through every link to identify potentially "relevant" pages. This list was then verified with the Web site creators to ensure that important Web pages had not been overlooked. (To avoid having these visits reflected in the February 1996 log files, this verification was performed in late January 1996.)

Selecting Pages for Analysis

One of this approach's most important and time-consuming steps proved to be the identification and isolation of the files relevant to the study before performing statistical analysis. Access logs are designed to record requests for all files on a particular server; thus, in most cases, hits on Web documents irrelevant to the study were recorded faithfully alongside hits on the Web pages of interest.

The selection of pages for analysis involved some difficult and subjective decisions.
For example, the UCSD S&E Library manages a single Web site covering the physical sciences, mathematics, and engineering disciplines, but many traditional peer institutions divide these disciplines among multiple branch libraries, usually with a separate Web site for each administrative unit (see figure 3). In an effort to make equitable comparisons with the UCSD S&E Library Web site, the authors attempted to construct virtual peers by combining statistics from multiple sci/tech library Web sites at the same institution, whenever appropriate and possible, and excluded hits on Web pages devoted to disciplines not covered by the S&E Library. However, it was impossible to obtain log files from all the Web servers the authors would like to have included in the study, and it is reasonable to expect that institutions D and K in table 1 would have had higher page request totals had the authors been able to obtain data for all of those institutions' relevant libraries.

[FIGURE 3: Web Page Selection for This Study]

Because the final report was highly customized to the authors' own institution's benchmarking goals, the results were decidedly less useful for the other participants. For these other institutions to reap the same level of benefits under this model, each would need to obtain the raw data and perform an analysis based on its own Web page selection criteria. Most participants would find this prohibitively labor-intensive for groups of more than two or three institutions. Therefore, for large-scale benchmarking efforts, the authors advocate a model that would eliminate duplication of analysis efforts by providing a more objective assessment of all participating institutions' relative performance across the board. Further recommendations and practical advice for such a program are outlined below.

Additional Background

Since the authors' initial literature review, there have been few significant contributions to the literature. Notable monographs are Rick Stout's Web Site Stats: Tracking Hits and Analyzing Traffic and Robert W. Buchanan and Charles Lukaszewski's Measuring the Impact of Your Web Site, both of which contain tutorial-level technical advice.5 Among the critiques of selected log analysis software products and services6 is Tova Stabin and Irene Owen's case study "Gathering Usage Statistics at an Environmental Health Library Web Site," which compares three freeware analysis tools' performance on the same server log file.7 Another library-oriented study is Norman Friesen's "Monitoring the Use of World Wide Web Pages," which provides a comprehensive literature review for online usage measurements.8 All the above discuss how to analyze a single server log. As of October 1997, there had been no cross-site comparative studies.9

It is advertisers who are hammering out Web measurement standards and guidelines and who are striving to push the technology beyond what it is capable of today. Librarians' needs in terms of Web usage data, scarcely articulated even in our own professional literature, certainly have not caught the attention of analysis software programmers. Consequently, in whatever respects academic libraries' situations differ from those of companies who advertise on the Web, librarians must learn to adapt their measurement methodologies.
This requires a technical understanding of not only the analysis tools themselves but also the motives of Web-based advertisers and the sites that cater to them.

Business Week reports that ad-supported Web sites have used server log analysis to justify an average ad rate of $17 CPM (cost per thousand viewers) for 1997, whereas television's average CPM hovers between $5 and $6.10 According to Matthew Kinsman, these Web sites' ability to document their growing audiences, and especially their ability to target specific niches within those audiences, allowed them to raise their ad rates more than 200 percent between the first quarters of 1996 and 1997.11 Advertisers' enthusiasm for "the most measurable of all media by far"12 explains why estimates for Internet ad expenditures range from $400 million13 to $940 million14 for 1997, and are expected to surpass $4.8 billion by the year 2000.15

Advertisers' well-publicized doubts about the reliability of Web server statistics have spawned scores of companies offering independent, third-party measurements and/or auditing.16 These include ventures of such newsworthy companies as Nielsen Media Research; magazine trackers ABC (Audit Bureau of Circulation) and BPA (Business Publications Audit) International; and the so-called Big Three of financial accounting (Ernst & Young, Coopers & Lybrand, and Price Waterhouse).17

In 1997, the proliferation of analysis software and services, each of which had developed its own units of measurement, prompted two advertising trade associations to issue standards and guidelines for gauging Web audiences.18 Both bodies attempt to define metrics and methodologies for cross-site comparisons. The first of these, the Coalition for Advertising-Supported Information and Entertainment (CASIE), is a joint project of the American Association of Advertising Agencies and the Association of National Advertisers. The CASIE Guiding Principles of Interactive Media Audience Measurements, released on April 3, 1997, are endorsed by the Advertising Research Foundation (ARF); indeed, they are based on ARF's long-standing principles for determining print, radio, and television audiences.19 On September 15, 1997, the Internet Advertising Bureau (IAB) released its own document, entitled Metrics & Methodology.20 The thirty-eight-member Media Measurement Task Force that produced this document included representation from the owners of such immensely popular sites as Yahoo and Playboy, as well as from ad buyers such as Microsoft. Although both sets of guidelines are tailored to the advertising industry, librarians can use them as models for standards appropriate to the needs of their profession.

Recommendations for Developing a Benchmarking Program

A Web site benchmarking program for academic libraries could be designed and implemented either informally among a few institutions or formally among many libraries. Based on experience in collecting and comparing usage data from fourteen test institutions, this article proposes some voluntary guidelines and a common set of metrics for the estimation of library Web site audiences so as to make future cross-site comparisons a possibility.
Determination of Web Peers

Whether seeking to establish an informal benchmarking network of a few libraries or to identify which libraries within a formal reporting structure to benchmark against, identification of the library's Web peer group is an important first step toward obtaining and using Web server usage statistics profitably. One of the most interesting conclusions from the authors' study is that libraries cannot determine their Web peer group simply by comparing numerical usage statistics. There are important nonnumeric characteristics to consider as well, and these defining characteristics should be reported along with the numerical usage statistics in benchmarking programs. Finding a Web peer group is a three-step process:

Step 1. Identify the starting pool. The first step is to identify a starting pool of those institutions or individual branch libraries whose character or activities are of interest. The pool could contain the library's ARL peers, other institutions it usually benchmarks against because of similarities in student population or academic programs, institutions in the collection development consortia, or institutions with particularly noteworthy Web sites the library admires and wants to emulate.

Surprisingly, a library's ARL peer is not necessarily its Web peer. Admittedly, a larger sample might prove otherwise, but among the fourteen libraries in this study, an institution's overall ARL ranking bore no statistical relationship to the number of hits to its Web site or to the number of bytes transferred. In the study group of ARL institutions, Web site hits were not statistically correlated with circulation statistics, reference queries, number of full-time students, number of teaching faculty, or dollar amount of research grants received by the institution. In addition, age of the Web site showed no correlation to the number of hits: some young sites received more hits than older sites. Apparently, Web site character and quality are more influential in affecting usage than are the characteristics of the parent institution. By all means, ARL peers should be included in the starting pool because the library should benchmark itself against institutions it cares about; however, the starting pool should not be limited exclusively to the library's peers in the traditional sense.

Step 2. Narrow the list after site examination. The second step is to find Web peers from among those in the starting pool by examining each Web site. In the study, the authors were tempted to skip this step and merely rank their site against those with similar numbers of page requests because they had the luxury of having data from so many institutions in hand. When the authors did this, institution A (see table 1) appeared to be the only candidate to consider partnering with for future exchanges of benchmarking information. However, after examining the Web sites more carefully, it became clear that institution E was actually the closest peer because of its similarity in subject scope and design philosophy, site architecture, and target audience. These site characteristics are the important nonnumerical data elements that benchmarking programs should also collect.
Figure 4 illustrates how the essential nonnumeric characteristics of each participant, as well as the numeric data, might be reported, whether the reporting form is paper or electronic. In locating one's Web peer, the ability to sort on these nonnumeric characteristics to narrow the starting pool becomes important.

[FIGURE 4: Sample Web Site Usage Data Report]

Subject Scope and Philosophy. Obviously, a good subject match is important among peers. The information-seeking behavior of individuals, the availability of Web data sources, and the suitability of subject-specific data to presentation in a Web environment do differ somewhat by discipline. Equally important, however, is how the site is designed to present the various subjects to the visitor.

It bears repeating that in the study the authors attempted to ignore the structure of the Web sites sampled, with unsatisfactory results. Their attempts to compile statistics from multiple sci/tech library sites in order to construct at each institution a virtual counterpart to their own library's Web site (see figure 3) were time-consuming and awkward, and left lingering doubts about equality because the structure of any site influences the visitor's path (and thus page request counts) so greatly. Combining separate virtual branch library sites statistically after the fact is not the same as presenting a combined, unified site to users and then measuring the resulting traffic.

The authors contend that a visitor to a site designed as a single virtual library, containing links to all subject areas, has a different experience than one who comes to a site that presents itself as several virtual branches, each serving different disciplines. This virtual structure will affect hit counts in much the same manner as physical gate counts are affected by the existence of physical branch libraries. Single virtual "central" libraries and separate virtual branches are each valid design choices, but because the choice may affect page request counts, it is wise to select peers whose design is a match. In figure 4, each participant's design philosophy is categorized as being either a single virtual library site or a site with multiple virtual branch libraries.

In cases of differing numbers of virtual branch libraries, it may be useful to consider constructing a peer group of individual subject pages rather than whole Web sites. Thus, an engineering (or patents, or chemistry) subject page, for example, could be compared to other libraries' engineering (or patents, or chemistry) page rather than to their whole engineering library Web site. (See table 1 for engineering page request totals in the study group.)

Design factors other than the presence or absence of virtual branches also can affect page request counts. In this study, the authors considered using edited byte counts (site totals minus bytes for multimedia files) in relation to hit counts as a numeric indicator of the size and richness of libraries' HTML documents (see table 2). However, subjective assessments, based on visits to each Web site, were found to be more informative for peer choices.
The presence or absence of link annotations, the number of useful links on each page, the presence or absence of a "home" button on every page, and the number of layers or "clicks" separating common starting and ending points also affect hit counts.21 The number of additional design factors to consider in choosing a peer group is a judgment call.

Site Architecture. In addition to these design variables, even more fundamental differences can exist in a site's architecture or technical implementation. Traditionally, Web sites are collections of HTML documents containing links and text. These HTML pages are "handcrafted," constructed individually by subject specialists, and sit ready and waiting for a visitor to browse to find the links and information he or she needs. However, an alternative model for constructing Web sites has emerged and is gaining support as the best way to scale up or "automate" the construction of Web pages. In this new model, the site is actually a database of individual links with their associated annotations, and the HTML page the visitor sees is created mechanically on the fly from this database of links and customized to match the visitor's typed-in query. In the new model, the vocabulary used greatly affects the content of the page viewed. The two models could produce either very similar or potentially very different page structures, but at the very least the keyword search entry point probably would inspire vastly different browsing and query behavior from visitors. Usage statistics based on page requests cannot honestly be compared between sites that offer visitors such different information-seeking experiences. (In the study, only institution C used this database, or machine-generated on-the-fly, model for page generation.) In figure 4, the site architecture is noted in the participant's profile.

Target Audience. In narrowing the list of potential peers, the library also should look for a match in the site's target audience. For benchmarking purposes in academic libraries, page requests for internal and external users should be reported separately (see figure 1). Some library sites are designed and marketed to be a resource to the entire world. (In the study group, institutions A and C were of this type.) These sites have unique digitized holdings, offer unique programs, or in some other way significantly add to the value of their sites with the goal of serving the needs of users beyond their home institutions. The sites are consciously marketed to external users. Because the UCSD S&E Library's target audience is the faculty, staff, and students on campus, institutions A and C would fall outside its peer group.

Categorizing Web sites in the reporting structure according to design philosophy (single virtual library versus multiple virtual branches), site architecture (traditional handcrafted versus machine-generated, on the fly), and target audience (internal, external, or both) will give Web server page request counts more meaningful context and aid in peer identification. These three defining characteristics could easily be identified in a participant profile section of the Web site usage data report (see figure 4).
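A participant profile of this kind could be captured in a simple structured record. The sketch below is illustrative only: the field names and category labels are hypothetical stand-ins for the characteristics discussed above (design philosophy, site architecture, target audience, subject scope), not a prescribed ARL reporting format.

```python
from dataclasses import dataclass

# Hypothetical participant profile record for a benchmarking report.
@dataclass
class ParticipantProfile:
    institution: str
    design_philosophy: str   # "single virtual library" or "multiple virtual branches"
    site_architecture: str   # "handcrafted HTML" or "machine-generated on the fly"
    target_audience: str     # "internal", "external", or "both"
    subject_scope: str       # e.g., "physical sciences, mathematics, engineering"

# Example entry drawn from the study's description of the UCSD S&E Library.
ucsd = ParticipantProfile(
    institution="UCSD Science & Engineering Library",
    design_philosophy="single virtual library",
    site_architecture="handcrafted HTML",
    target_audience="internal",
    subject_scope="physical sciences, mathematics, engineering",
)
```

Sorting or filtering a list of such records on the nonnumeric fields is all that is needed to narrow the starting pool before the numeric comparisons are made.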
Finally, in examining the sites of each potential peer, the library also should look for whatever it values and admires most in a library Web site (e.g., sites with no stale pages, sites that consistently exhibit proactive and innovative uses of technology to improve service to their users). Its peer group should contain some exemplary sites that are inspiring.

Step 3. Evaluate following a trial run. Once the library has selected a peer group from the initial pool, the third and final step is to do a trial run and evaluate the usefulness of the match. Either internal, external, or total page accesses should be examined, as appropriate for the library's own site's goals. If the page access counts for some sites in the library's group are substantially lower, it is debatable how useful continued comparison with those sites will be. The library may want to consider dropping them from its pool if the time and effort necessary to obtain their data is high. (Of course, because the Web is always changing, the low-traffic sites may bear reexamination in the near future.) If a noteworthy site similar in architecture and purpose has page access counts that are substantially higher, it might be kept as a model to aim for. Also, it may be useful to consider constructing a peer group of selected subject pages rather than whole institutions if that would help the library's development goals.

Most Useful Statistics for Benchmarking Purposes

Statistical analysis of log files results in a great deal of data that are fascinating to a Web site's creators but of negligible interest to their peer institutions. This article has already discussed the negligible value of cross-site comparisons of byte transmissions in the context of peer selection; the authors further note that both sets of advertising industry guidelines examined omit mention of bytes entirely. What librarians (and advertisers) really want to know is this: How many unique individuals are using the resources on a particular Web site, and how does that number measure up to other, similar sites? Average and actual incidence of repeat visits also would be an indicator of sites' ability to maintain an audience.

Advertisers are hot on the pursuit of this "unique visitor" data, which Kirsner calls "the holy grail of site measurement."22 Unfortunately, most of the existing methods of visitor identification (e.g., sending cookie files or tracers,23 or consulting IP address tables) identify individual computers, a futile strategy in campus computer lab and shared library workstation environments. Demanding self-identification for each session via surveys or passwords is intrusive upon the user and programming-intensive for the library.

Therefore, Web audiences must be estimated based on the number of "hits" (requests for individual files) made upon the server. Selecting which of these hits to count is essential for objective comparisons. For example, in the case of a library Web page that makes liberal use of decorative graphics (e.g., library logo, backgrounds, bullets), a single visit registers several hits on the server: one for the HTML file and one for each of the nontext elements, which are treated as individual files. This phenomenon can inflate a site's aggregate hit count significantly. The authors recommend reporting "page requests" rather than "hits" as defined by the IAB.
For nonframed pages, a page request is defined as "An opportunity for an HTML document to be displayed within a browser window, which may contain text, images, media objects (i.e., Java, Shockwave, Real Audio) or other online elements."24 Thus, multimedia files are eliminated from the log before analysis. Table 1 reports page requests rather than hits under this definition.

Participants in benchmarking programs should preprogram their analysis software to ignore hits resulting from unsuccessful or rerouted requests. In addition, agreement should be reached on the treatment of hits from content-rich multimedia files and OPAC interfaces. "Multimedia files" is a class that spans from 120-byte GIF images of bullets to 100Mb+ audio/video extravaganzas. At the high end are data-rich files such as those presented by art, music, architecture, and map libraries. Because multimedia hit count comparisons would be worthwhile only between items of the same data type, these resources would be better benchmarked in a separate program designed for that purpose.

Although Web versions of OPACs do contribute to institutions' Web presence, the navigational patterns that characterize their usage are significantly different from other library Web pages. For that reason, Web OPAC usage data also should be benchmarked as part of a separate program, if at all.
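In a benchmarking exchange this filtering would be configured once in the shared analysis tool; the stand-alone sketch below simply illustrates the kind of rules involved in reducing raw hits to page requests. The extension list, the status-code handling, and the relevant_prefixes parameter are illustrative assumptions, not options of wwwstat or any particular product.

```python
import re
from collections import Counter

CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

# Extensions treated as decorative or multimedia content; illustrative list only.
MULTIMEDIA_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".au", ".wav", ".mov", ".mpg")

def count_page_requests(log_path, relevant_prefixes=("/",)):
    """Tally page requests per URL, excluding multimedia files and
    unsuccessful or rerouted requests (non-2xx responses)."""
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            entry = CLF_PATTERN.match(line)
            if entry is None:
                continue                      # skip non-CLF lines
            url, status = entry.group("url"), entry.group("status")
            if not status.startswith("2"):
                continue                      # errors and redirects are ignored
            if url.lower().endswith(MULTIMEDIA_EXTENSIONS):
                continue                      # decorative graphics inflate hit counts
            if url.startswith(relevant_prefixes):
                counts[url] += 1              # one page request
    return counts

# Example (hypothetical path and prefix):
# counts = count_page_requests("access_log", relevant_prefixes=("/sciencelib/",))
# print(sum(counts.values()), "page requests for the month")
```

The relevant_prefixes filter stands in for the page-selection step described earlier, in which only the sci/tech library's own pages on a shared server are counted.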
Level of Detail at Which Page Requests Should Be Reported

Frequency. Analysis of the authors' own site's usage over time reveals that traffic varies greatly depending on whether an academic term is in session. To facilitate comparisons between quarter-system and semester institutions, institutions should report their statistics on a monthly basis. Another argument for reporting Web server traffic on a monthly basis, rather than by academic term or year, is the great speed at which Web functionality develops. Libraries that seek to be dynamic reorganize their Web sites frequently as they create new resources and employ new functions. Monthly reports are more likely to distinguish the effects of these newly added (or newly deleted) files on overall usage patterns.

As noted previously, the CLF format contains a wealth of information, and analysis software can be configured to present this information at varying levels of detail. This gives participants in statistical exchanges a great deal of flexibility. However, some options provide more useful benchmarking indicators than others. In the study, the authors evaluated these various measures to determine the ideal format for statistical reports among peers.

Page requests for selected pages. The authors investigated whether evaluation of page request counts for the ten most heavily used Web pages at each library Web site would prove more convenient or more telling than evaluation of total page requests. The study did not reveal significant statistical differences between these two measurements (see table 1). Analyzing the top ten pages would provide a quick and dirty way to determine overall "rankings" in terms of which institutions' Web sites receive the most traffic. However, these rankings alone are of negligible value for benchmarking purposes because they do not allow one to evaluate whether another institution's Web site is truly comparable to one's own in terms of design philosophy, level of development, or intended audience. The authors do not recommend reporting page requests for subsets of pages based on numerical thresholds (i.e., top ten most-used pages).

Comparisons based on home page requests alone also are unsatisfactory for benchmarking purposes. The highly nonlinear nature of the Web, the ability to bookmark interior pages, and the fact that search engines usually send visitors directly to interior pages make counting the home pages alone inadvisable. However, requests for home pages, reference pages, and directional pages may be useful to report separately, as long as all pages intended for public use are reported in site totals. Just as reference queries are tallied separately from directional queries at most reference desks, requests for Web pages that present reference-type data and those that present directional or policy information should be tallied separately (see figure 4). These two categories of Web pages function differently and represent vastly different levels of intellectual effort to create and maintain them. Simple definitions could be established for determining whether a particular Web page should be considered primarily "reference" or primarily "directional" (see figure 5). Because library home pages usually incorporate both these elements, they would be counted in a third category. Prior to the first data collection, each page at a Web site would be classified by type. A simple script would separate the pages for analysis every time thereafter (metadata tags might prove useful here); a sketch of such a script follows figure 5. This classification requires some initial setup time but would greatly improve benchmarking quality.

FIGURE 5: Types of Web Pages

Homepage: The top-level document relating to a virtual library site. All the other pages constituting that site are usually accessible by following links from the homepage.

Directional pages: Those pages that give directions, answer short questions, or state policy, e.g., pages that list branch libraries, give overviews of collections and services, list building hours, library mission statements, circulation policies.

Reference pages: Those pages that provide substantive data, e.g., detailed guides to using the collection or specific databases, guides to Internet resources, data sets.

Combination pages: Pages that combine both directional and reference type data, such as library newsletters, should be tallied according to the type of information that predominates. For instance, if more than half of the content of the newsletter is considered directional (e.g., how to get a library card), count it as "directional," but if more than half is devoted to substantive discussions (e.g., providing instruction in the use of new databases), the entire newsletter page should be counted as "reference."
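The following is a minimal sketch of the kind of separation script mentioned above, assuming a hand-maintained register that maps each URL to one of the figure 5 categories. The register file name, its format, and the function names are hypothetical; an institution might equally well embed the category in metadata tags.

```python
from collections import Counter

# Hypothetical page-type register maintained by the site's content providers.
# Each line maps a URL to one of the figure 5 categories, e.g.:
#   /sciencelib/index.html    homepage
#   /sciencelib/hours.html    directional
#   /sciencelib/patents.html  reference
def load_page_types(register_path):
    page_types = {}
    with open(register_path) as register:
        for line in register:
            if line.strip() and not line.startswith("#"):
                url, page_type = line.split()[:2]
                page_types[url] = page_type
    return page_types

def tally_by_type(page_request_counts, page_types):
    """Subtotal page requests by category (homepage, directional, reference)."""
    subtotals = Counter()
    for url, requests in page_request_counts.items():
        subtotals[page_types.get(url, "unclassified")] += requests
    return subtotals
```

Applied to the per-URL counts produced by a filter such as the one sketched earlier, tally_by_type yields the homepage, reference, and directional subtotals that the proposed report calls for.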
Design philosophy. Page request reports should reflect whether a Web site is designed on a central or branch model because this affects the pattern of traffic flow. Web sites organized on a collaborative model, in which the visitor's experience is of a single virtual library covering all disciplines, should report Web server usage for their site as a whole. In contrast, Web sites whose major organizational divisions give users the sense of visiting separate virtual libraries should report subtotals that reflect this organization (see figure 4).

Primary versus secondary clientele. Furthermore, hit counts for each of these categories should be reported by visitor origin (internal, external, or unresolved) (see figure 1). If the site is primarily designed to serve the needs of local users and little or no effort is expended on external promotion, the library's benchmarking efforts should focus on comparing its internal hit count to the internal hit count of other institutions that share its emphasis on the local user. Because the incoming IP address of each visitor is recorded in the access log (see figure 2), the analysis software can easily check the IP address against a list of known IP addresses to tally requests originating from machines on campus separately from those originating outside the campus. This is a standard feature of the wwwstat package.
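A minimal sketch of this kind of origin tally follows. The campus network range and domain shown are illustrative placeholders for whatever address list an institution maintains, and the treatment of numeric addresses that fall outside the known campus ranges as "unresolved" is a simplifying assumption rather than wwwstat's actual behavior.

```python
import ipaddress

# Illustrative values only; a real list would come from the institution.
CAMPUS_NETWORKS = [ipaddress.ip_network("132.239.0.0/16")]
CAMPUS_DOMAIN = ".ucsd.edu"

def classify_origin(host):
    """Label a log entry's host field as internal, external, or unresolved."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # The server resolved the visitor to a host name.
        return "internal" if host.endswith(CAMPUS_DOMAIN) else "external"
    # Numeric address: attributable only if it falls in a known campus range.
    if any(address in network for network in CAMPUS_NETWORKS):
        return "internal"
    return "unresolved"

print(classify_origin("sehplib.ucsd.edu"))   # internal
print(classify_origin("198.51.100.7"))       # unresolved
```

In practice each page request would simply be added to the subtotal for the origin returned here, producing the internal, external, and unresolved columns shown in figure 1.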
Comparing Web site data across institutions requires more attention to detail than is currently the case with other types of data reported to ARL. Preserving an appropriate level of detail for equitable comparisons between sites makes Web usage data more complex than most other commonly exchanged statistics. However, once correctly programmed, the data collection is completely automatic and devoid of human error, unlike most other statistics reported to ARL.

Implementation and Administration Issues

The authors of this article strongly favor establishment of a formal reporting structure for comparing Web server statistics across academic libraries. Benchmarking efforts would be greatly facilitated by central collection and distribution of these data, according to preestablished standards (patterned after the IAB model) defining comparable metrics for estimation of library Web site audiences. Such an infrastructure would provide a larger pool of potential peers for selection and would eliminate the need to forge piecemeal partnerships with other institutions in order to obtain their data.

The Need to Inspire Commitment. In January 1996, the authors sent a questionnaire to the fourteen participating university libraries, plus three other ARL institutions, which asked both Web content providers and Web site administrators a number of questions concerning how their Web site statistics were being used. The survey data indicated that the practical applications of Web server log file analysis were sorely underutilized in the sample population. Although eleven of the seventeen technical Web site administrators surveyed (almost 65%) said they use Web server analysis software to interpret their log files, only four respondents (under 24%) claimed that the content providers of their sci/tech libraries' Web sites examined their usage statistics on a regular basis. Eight respondents (just over 47%) reported that their sci/tech Web page creators either had never requested to see usage statistics for their Web sites or were still waiting for their technical administrators to make them available. Four others said that content authors requested these data only "occasionally" (defined as less than once per academic semester or quarter). (One respondent said that monthly statistics could be found on an internal page at any time but did not indicate whether content authors were taking advantage of this.)

Fourteen of the seventeen responding institutions (just over 82%) indicated that, as of the time of the survey, usage statistics had never affected decisions as to how staff hours should be allocated among the various aspects of their Web sites.

A pessimist, upon observing these libraries' failure to capitalize on their own Web server data, would question whether these same libraries would be inclined to support a more ambitious, multicampus undertaking. However, the authors are optimistic that the visibility of such a benchmarking opportunity would capture the attention of these libraries' administrators; with increased priority given to Web server statistics would come increased awareness of, and subsequent productive use of, the wealth of information contained in Web server log files. The authors also note that several of the Web sites in the sample were only a few months old in January 1996; when surveyed, their creators' time was monopolized by basic development, but as their Web sites mature, these individuals become more receptive to the idea of using sophisticated management aids.

Possible Roles for ARL. ARL's Office of Statistics seems a logical choice for coordinating a voluntary reporting structure among ARL members. Its expertise would prove a great asset in selecting a uniform software package or commercial service to be used by all participants. To keep pace with the rapid changes in Web site technology, this office also could assist by identifying and evaluating new analysis tools. In addition, its innovative and exemplary Web page,25 where customized ARL statistical reports are already available, would be a natural home for the distribution of Web usage data.

Identification of the ideal method for analyzing participants' Web server data would be one of the most important roles of the Office of Statistics. It is unlikely that the office would want to allocate the necessary server space to collect, analyze, and archive institutions' raw log files each month, so participants would need to either analyze their own data using a uniform software tool, configured uniformly, or use the same commercial service.

The libraries in this study that were regularly analyzing their own institutions' access logs either wrote their own programs to do so or used one of the serviceable shareware or freely available software packages that have been available for a few years now, such as wwwstat. Due to the easy availability of low-cost software, none of the study participants chose to contract with a commercial service to have their Web server log file analysis performed for them.26 Commercial analysis can be done on-site by purchasing software to run at the library's end, thus allowing a high degree of interactivity and the ability to generate customized reports at any time. Or, commercial analysis can be done off-site. Off-site analysis saves time and system resources but gives less control over the format and frequency of reports. Again, the Office of Statistics would be uniquely qualified to evaluate these options and to help select the best analysis method for program participants.

Technical support staff from the Office of Statistics might direct a network of volunteers from the technical Web support staff at various ARL institutions who understand how to configure the analysis software according to established standards.
These volunteers would be available to help library Web server administrators with both the initial setup and ongoing adaptations to the configuration as their Web sites change and grow. Ideally, each institution would undergo an annual audit to confirm that the software is configured in a standardized way.

Because the Web is evolving so quickly, any large-scale benchmarking scheme will undoubtedly need periodic revision. Nevertheless, the Web itself and the software for analyzing Web server statistics are mature enough today to begin a benchmarking program. The level of effort being poured into library Web sites and the degree of importance that Web sites have to library users justify the effort required to mount such a program.

Summary of Recommendations

The authors strongly favor establishment of a voluntary reporting structure for comparing Web server statistics across academic libraries. Seven recommendations for developing a centralized voluntary benchmarking program are proposed below. The ARL Office of Statistics is well suited to coordinate such a program, and would provide a significant service to its members by doing so.

1. Selecting peers for Web site benchmarking is a three-step process:
Step 1. Identify a starting pool of institutions whose character or activities interest the library.
Step 2. Narrow the list by examining each Web site, looking for matches in subject scope and design philosophy, site architecture, target audience, and other desirable characteristics.
Step 3. Perform a trial run to evaluate the usefulness of the match.

2. Page requests, as defined by the IAB, are the most practical basis for multisite comparisons. Purely decorative or directional multimedia files (pictures, bullets, icons, etc.) and error messages should be eliminated from reported totals.

3. Hits from data-rich multimedia files, Web OPACs, and commercial products (e.g., electronic journals) would be excluded from this benchmarking program.

4. Page requests for each institution should be reported on a monthly basis, according to the various categories of component pages:
• All library Web pages at an institution should be considered, not just home pages or the top ten pages from each branch.
• Traffic subtotals should be reported in terms of Web sites' major organizational components (which frame how visitors navigate the site); these subtotals may not always parallel the host libraries' physical/administrative structure or the structure of the computers serving the data.
• "Reference"-type Web pages should be totaled separately from pages that are merely "directional."
• For each of these categories, page requests should be reported by visitor origin (internal, external, unresolved).

5. Page request comparisons are more meaningful when institutions' usage data are sorted according to shared site characteristics:
• Design philosophy: Collaborative approach versus branch by branch
• Site architecture: Handcrafted versus on the fly
• Target audience: External promotion versus no external promotion

6. A formal reporting structure for comparing Web server statistics across institutions would greatly facilitate benchmarking of usage data by:
• Motivating more institutions to participate, thus providing a larger pool of potential peers
• Eliminating the need to forge piecemeal partnerships with other institutions in order to obtain their data
7. The ARL Office of Statistics should provide centralized coordination and assistance for exchanging Web server usage data, including:
• Ensuring consistency through establishment of voluntary standards for defining comparable metrics for estimating academic library audiences
• Determining the most effective, uniform means for participants to analyze their data, through selection of a software package or an analytical service to be used by all
• Serving as the centralized collection point for institutions' reported data and by overseeing periodic audits
• Distributing the reported data through the ARL Office of Statistics Web page
• Providing a yearly evaluation of the program's effectiveness

This work was supported by a grant from the Research Grants for Librarians Program, awarded by the Librarians Association of the University of California, San Diego Division. The authors are grateful for the participation of those libraries that shared their data. They also would like to thank Susan Starr, Jim Jacobs, and Suzanne Wakerly for their many helpful comments on early drafts of this article.

Notes

1. The wwwstat software, written by Roy Fielding, is available from the University of California-Irvine Department of Information and Computer Science at http://www.ics.uci.edu/pub/websoft/wwwstat (Nov. 1996).
2. "How Can I Keep Statistics about My Web Server?" at http://www.boutell.com/faq/stats.htm (Apr. 1996) is the relevant section of Thomas Boutell, WWW FAQ: Frequently-Asked Questions (Answered, of Course!). The comp.infosystems.www USENET newsgroup has since been divided into fifteen separate newsgroups, accessible via http://www.boutell.com/faq/ngroups.htm (Apr. 1996).
3. Materials from the Web4Lib Electronic Discussion, including a searchable archive and The Library Web Manager's Reference Center, may be accessed from http://sunsite.berkeley.edu/Web4Lib/ (Sept. 1997).
4. The most helpful of these were: Steven J. Vaughn-Nichols, "Caching Could Stall Internet Commerce," Byte 20, no. 6 (June 1995): 40; Julie Chao, "Tallies of Web-Site Browsers Often Deceive," Wall Street Journal, June 21, 1995, 1(B); Ellis Booker, "Labor Day in Tuktoyaktuk?" Computerworld 29 (July 10, 1995): 58; Matthew Cutler and Devra Hall, "Sizing 'Em Up," Internet World 6, no. 8 (Aug. 1995): 22–24; Stephen Howard, "Have We Got a Hit?" MacWEEK 9 (Sept. 11, 1995): 3; Wayne Rash Jr., "Demand Grows for Commercial Accountability on the Internet," CommunicationsWeek (Nov. 6, 1995): 86; John Evan Frook, "Web-Hit Audit System Called into Question," CommunicationsWeek (Dec. 18, 1995): 1; Jeff Ubois, "The Art of the Audit," Internet World 6, no. 12 (Dec. 1995): 62–74; Niel Robertson, "Stalking the Elusive Usage Data," Internet World 7, no. 4 (Apr. 1996): 28–31.
5. Rick Stout, Web Site Stats: Tracking Hits and Analyzing Traffic (Berkeley, Calif.: Osborne McGraw-Hill, 1997); Robert W. Buchanan Jr. and Charles Lukaszewski, Measuring the Impact of Your Web Site (New York: John Wiley & Sons, 1997).
6. The most recent of these are Mitch Wagner's buyers guide for various software tools in "Tracking Web Users Is Getting Easier," Computerworld 31 (Mar. 3, 1997): 59, 63; and Linda Rich's review of several Web measurement services in "Count Them In," Mediaweek 7 (Feb. 3, 1997): IQ24–IQ29. Consult http://www.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/Log_Analysis_Tools/ for additional freeware, shareware, and commercial analysis products.
7. Tova Stabin and Irene Owen, "Gathering Usage Statistics at an Environmental Health Library Web Site," Computers in Libraries 17, no. 3 (Mar. 1997): 30–37.
8. Norman Friesen, "Monitoring the Use of World Wide Web Pages," available at http://www.ualberta.ca/~nfriesen/597/.index.html (Jan. 10, 1996).
9. Laurel A. Clyde, "The Library as Information Provider: The Home Page," Electronic Library 14, no. 6 (Dec. 1996): 549–558. This is a survey of many libraries but does not examine usage data.
10. Linda Himelstein, Ellen Neuborne, and Paul M. Eng, "Web Ads Start to Click," Business Week (Oct. 6, 1997): 128–38.
11. Matthew Kinsman, "Online Advertising: The Basics," Catalog Age 14 (Sept. 1, 1997): 70–71.
12. Michael Krantz, "The Medium Is the Measure," Mediaweek 5 (Sept. 25, 1995): IQ20–IQ24.
13. Veronis, Suhler & Associates' estimate, as reported in Catherine P. Taylor, "Everything Catches in the Net," Adweek (Eastern Ed.) 38 (Sept. 8, 1997): MO32.
14. Jupiter Communications Inc.'s projection, as reported in Himelstein, Neuborne, and Eng, "Web Ads Start to Click," 128–38.
15. Forrester Research, as reported in Mark Halper, "So, Does Your Web Site Pay?" Forbes, ASAP Supplement (Aug. 25, 1997): 117–18.
16. An excellent recent overview, from the perspective of newspapers that sell ad space on their Web sites, is Scott Kirsner, "Web of Confusion," American Journalism Review 19, no. 6 (July/Aug. 1997): 34–39. See also Paul Demery, "Keeping a Reliable Score of 'Hits,'" Practical Accountant 30 (Mar. 1997): 16; Jodi B. Cohen, "Web Audits: A Complex Art," Editor & Publisher 130 (Feb. 3, 1997): 24I–26I.
17. Damon Darlin, "Ratings Game," Forbes 158 (Dec. 2, 1996): 226; Cris Beam, "ABC and BPA Audit the Body Electric," Folio: The Magazine for Magazine Management 25 (Oct. 15, 1996): 30–32; Jane Hodges, "BPA Questions Int'l Audit Standards," Advertising Age 68 (Feb. 3, 1997): 56; Himelstein, Neuborne, and Eng, "Web Ads Start to Click," 134.
18. Consult http://www.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/Log_Analysis_Tools/Titles for an extensive list of Web server log analysis software and services.
19. Coalition for Advertising-Supported Information and Entertainment, CASIE Guiding Principles of Interactive Media Audience Measurements, available at http://www.commercepark.com/AAAA/casie/gp/guiding_principles.html (Apr. 3, 1997).
20. Internet Advertising Bureau Media Measurement Task Force, Metrics and Methodology, available at http://www.iab.net/advertise/metricsource.html (Sept. 15, 1997).
21. Kirsner, "Web of Confusion," 36–39.
22. Ibid., 36.
23. Neil Randall, "The New Cookie Monster," PC Magazine 16 (Apr. 22, 1997): 211–14; Stephen H. Wildstrom, "Privacy and the 'Cookie' Monster," Business Week no. 3506 (Dec. 16, 1996): 22.
24. IAB, Metrics and Methodology.
25. Association of Research Libraries, ARL Statistics and Information, available at http://www.arl.org/stats/statistics/stat.html (Apr. 1997).
26. See note 6 above for reviews of such software and services.